Lab 03: Building a Scheme Interpreter in Python

CS245: Programming Languages
Bryn Mawr College, Fall 2016
Prof. Blank

Put "By Your-Name-Here" here in the next cell, and delete this cell.

YOUR ANSWER HERE

1. Building an Interpreter

Consider using Python to turn the string "(+ 1 2)" into the actual number 3. How does that happen? This question is really: what does a programming language do? How does it work? In this lab, we will answer these questions by building a Scheme interpreter in Python.

1.1 Two Steps: Parse and Interpret

In designing a programming language, we break down the process into two steps:

STEP 1: The first step in implementing a programming language is to take a plain string and turn it into what is commonly called an Abstract Syntax Tree, or AST for short. This process is called parsing. ASTs are data structures.

STEP 2: The second step is to build an evaluator that takes ASTs and interprets them. This is called interpreting.

We will now go through the steps of designing a language. As this language will start off as a simple calculator with the syntax looking like Scheme, we'll call this language S-Calc, short for Scheme Calculator.

1.2 Parsing

To build a function that will take a string and produce AST, we further break down the parsing into three stages. We could do this all in one function, but it is common to break the process up into smaller chunks to make processing (and understanding/debugging) easier. The three components are:

  1. tokenize - turns a string into tokens
  2. reader - take the tokens and group them
  3. parser - turn the segmented parts into AST

The idea is that we can then take our string "(+ 1 2)" and end up with AST, like so:

parser(reader(tokenizer("(+ 1 1)")))

The form of the AST can really be any data structure that we decide. For these experiments, we will use very simple Scheme expressions (called s-exp). Thus, the above string might look like this in Scheme:

(apply-exp
 (var-exp +)
 ((lit-exp 1) (lit-exp 2)))

That is, it is an application-expression composed of the operator (a variable-expression '+') and two literal-expressions 1 and 2 as operands.

We call the syntax of the string the Concrete Syntax as compared to the Abstract Syntax of the AST.

As we have seen, Scheme is a simple language composed of lists, symbols, strings, and numbers. Everything in the language can be parsed into those components, so writing a Scheme parser is pretty easy compared to languages like Python.

1.2.1 Tokenize

To parse S-Calc we first define the lowest level of the process, the tokenizer:

In [ ]:
def tokenizer(string):
    """
    Takes a string and segments it into parts.
    We break strings up by brackets, and whitespace.
    Returns a Python list of strings.
    """
    retval = []
    current = ""
    for i in range(len(string)):
        if string[i] in ["(", "[", ")", "]"]:
            if current:
                retval.append(current)
            current = ""
            retval.append(string[i])
        elif string[i] in [" ", "\t", "\n"]:
            if current:
                retval.append(current)
            current = ""
        else:
            current += string[i]
    if current:
        retval.append(current)
    return retval
In [ ]:
tokenizer("""(this    is a
3.14 
(test))""")

Exercise 1: Try the tokenizer on many different strings. Describe what it does in simple terms based on its input and output.

Add cells below to test out the tokenizer.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()

1.2.2 Reader

The reader will take the tokenized expression (texp) and produced grouped results.

In [ ]:
def reader(texp):
    """
    Takes the output of the tokenizer, and creates
    lists of lists of items. Numbers are represented
    as numbers.
    """
    current = None
    stack = []
    for item in texp:
        if item.isdigit():
            if current is not None:
                current.append(eval(item))
            else:
                current = eval(item)
        elif item in ["[", "("]:
            if current is not None:
                stack.append(current)
            current = []
        elif item in ["]", ")"]:
            if stack:
                stack[-1].append(current)
                current = stack.pop(-1)
            else:
                pass
        else:
            if current is not None:
                current.append(item)
            else:
                current = item
    return current
In [ ]:
tokenizer("(this is (a) ((list))")
In [ ]:
reader(tokenizer("(this is (a) ((list))"))

Exercise 2: Try the reader on many different tokenized strings. Describe what it does in simple terms. How does this differ from the lexer?

Add cells below to test out the reader.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()

1.2.3 Parser

The final process of Step 1 is to take the output of the reader and parse it into an AST. For our first S-Calc expression, we just need to handle "(+ 1 2)". That is, we need to handle three things:

  • numbers - any kind of number
  • variables, like "+" - anything not a number
  • application - starts with a parenthesis
In [ ]:
## Version 1:

def parser(rexp):
    """
    Reads in a Python list of things, and returns an AST.
    """
    if isinstance(rexp, int):
        return lit_exp(rexp)
    elif isinstance(rexp, str):
        return var_exp(rexp)
    else:
        return app_exp(parser(rexp[0]), List(*map(parser, rexp[1:])))

That's it?! Yes, but we need to define some things before that will run. We need to define:

  • list_exp
  • var_exp
  • app_exp

To think like a Little Schemer, we define some utility functions in Python so that we can write code as if we were in Scheme. Specifically, let's replicate the linked-list of cons/car/cdr:

In [ ]:
EmptyList = "()"

def cons(item1, item2):
    return [item1, item2]

def car(exp):
    return exp[0]

def cdr(exp):
    return exp[1]

def cadr(exp):
    return exp[1][0]

def cddr(exp):
    return exp[1][1]

def caddr(exp):
    return exp[1][1][0]

def List(*args):
    "Create a linked-list of items"
    retval = EmptyList
    for arg in reversed(args):
        retval = cons(arg, retval)
    return retval

Note that EmptyList is a symbol (a single value, not an actual list object).

Let's test out the above Python functions to see if they behave like their Scheme counterparts.

An improper list, the dotted pair:

In [ ]:
cons(1, 2)

Make sure that car and cdr can properly deconstruct a cons cell:

In [ ]:
car(cons(1, 2))
In [ ]:
cdr(cons(1, 2))

And, a convenience method for constructing Scheme-like lists of multiple items:

In [ ]:
List(1, 2, 3, 4)

Exercise 3: Why does the list above look like this? Is this similar to how Scheme lists exist? Explain in the following cell:

YOUR ANSWER HERE

Add cells below to test the Scheme-like functions, including List. Explain how List works in a markdown cell.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()

Now, we can compose our AST constructor functions:

In [ ]:
def lit_exp(value):
    return List("lit-exp", value)

def var_exp(symbol):
    return List("var-exp", symbol)

def app_exp(f, args):
    return List("apply-exp", f, args)

And (finally!) we can put it all together and parse a string:

In [ ]:
parser(reader(tokenizer("1")))
In [ ]:
parser(reader(tokenizer("+")))
In [ ]:
parser(reader(tokenizer("(+ 1 2)")))

We'll be doing those three functions together quite often, so let's make a useful function:

In [ ]:
def scalc_parse(string):
    return parser(reader(tokenizer(string)))
In [ ]:
scalc_parse("652362")

Exercise 4: Try out the scalc_parser. Can it handle nested mathematical expressions? Why? How does the parser handle recursive expressions? Demonstrate and explain.

Add cells below to test, demonstrate, and explain what works with the parser.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()

1.3 Interpreter

Now we are ready for Step 2: the interpreter. This function takes an AST expression and interprets it (i.e., gives a result). We will call our interpreter evaluator.

Again, as we only have numbers, symbols, and applications, it only needs to handle those three items. To help with debugging, we will also now add a print application.

We start top down, defining the evaluator itself. It takes a parser expression, and does the computation:

In [ ]:
## Version 1:

def evaluator(expr):
    if car(expr) == "lit-exp":
        return cadr(expr)
    elif car(expr) == "var-exp":
        return cadr(expr) ## for now, return symbol
    elif car(expr) == "apply-exp":
        return evaluator_apply(evaluator(cadr(expr)), 
                               Map(evaluator, caddr(expr)))
    else:
        raise Exception("invalid ast: %s" % expr)

We break the evaluator into smaller functions to make it easy on us to focus on one issue at a time. We need a function evaluator_apply that takes the symbol of a mathematical operation (say, "+") and a list of operands, and does the actual calculation:

In [ ]:
def evaluator_apply(op, operands):
    if op == "print":
        Print(operands)
    elif op == "+":
        return car(operands) + cadr(operands)
    else:
        raise Exception("unknown apply operator: %s" % op)
        

Like we have seen in Scheme, we want to Map evaluator to all of the operands before we pass them to evaluator_apply. So we write a Map function to operate on Scheme-like lists:

In [ ]:
def Map(f, slist):
    if slist == EmptyList:
        return EmptyList
    else:
        return cons( f(car(slist)), Map(f, cdr(slist))) ## recursive!
    

NOTE: there is actually a map function in Python, but it operates (of course) on Python lists. Ours operates on cons-cell-based Scheme-like lists.

And we write a recursive Print function to print a Scheme-list of items:

In [ ]:
def Print(slist):
    if slist == EmptyList:
        return
    else:
        print(car(slist))
        Print(cdr(slist))

Let's test out these support functions:

In [ ]:
Map(lambda v: v, List(1, 2, 3))

Add cells and test some functions out individually:

And now (drum roll, please) we can put it all together to interpret a Scheme-like string into actual computational answers. Let's parse a string:

In [ ]:
expr = scalc_parse("3")

And evaluate it:

In [ ]:
evaluator(expr)

Woot! The string "3" actually evaluates to the number 3! We did it! But we can do more.

Let's define a utility function scalc that will parse and interpret any string:

In [ ]:
def scalc(string):
    return evaluator(scalc_parse(string))

And we begin testing of out S-Calc interpreter:

In [ ]:
scalc("34")
In [ ]:
scalc("(+ 1 1)")
In [ ]:
scalc("(print 1 2 3)")
In [ ]:
scalc("(+ 1 (+ 100 10))")

Add cells to try out the interpreter:

Exercise 5: Add the following operators:

  • subtract
  • multiply
  • divide

You can redefine the Python functions here.

Test out these operations thoroughly.

What should you do with divide by zero (and other) errors? What are the choices?

In the next cell, copy the evaluator_apply code from above and add your new operators:

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()

Exercise 6: A quoted item is a literal item. However, if it is a list, it should be converted to a proper Scheme list in the parser. You can use the following for any quoted item (symbol, number, or list). The evaluator need not change. Copy the old parser code from above to use the following sexp function.

def sexp(item):
    """
    Takes an Python list of items and returns Scheme s-exp.
    """
    if isinstance(item, list):
        return List(*map(sexp, item)) # recursion!
    else:
        return item

That is, if you encounter a "quote" in the parser, then call sexp, and return a lit_exp.

To test it, you should be able to:

scalc("(quote 42)")
scalc("(quote \"hello\")")
scalc("(quote (1 2 3))")

In the cells below, test out quoted items.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()

How could you use quoted lists? Do we need some additional operators to use them?

Exercise 7: Add the literals #t and #f that evaluate to 1 and 0.

In[ ]: scalc("#t")
Out[]: 1
In[ ]: scalc("#f")
Out[]: 0

How will you add these to the language? There is no one right answer, but you should justify your choice among possible options.

In the cell below, copy and change whatever is necessary to add booleans, and test that they work.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()

Exercise 8: Add an if-expression that works as follows:

In [1]: (if #t 2 3)
Out[1]: 2

In [2]: (if 1 2 3)
Out[2]: 2

In [3]: (if 0 2 3)
Out[3]: 3

Note that if should never evaluate the other argument, just the one it returns. For example:

(if 0 (/ 2 0) 3)

should return 3, but should not evaluate the divide-by-zero expression. This is called "short-cutting".

Copy the necessary code below and change it to add if. Make sure you test it thoroughly.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()

2. Reflections

As per usual, please reflect deeply on this week's lab. What was challenging, easy, or surprising? Connect the topics onto what you already know.

YOUR ANSWER HERE