Principles of Programming Languages

Blank, Fall 2012 Assignment #3: Calculator Language

Due: Thursday, October 2, 2014

This assignment is designed to:

  • Create a language from scratch
  • Write parser and evaluate
  • Use the define-datatype interface to define recursive structure
  • Add pre-defined variables to your language

Introduction

When we talk about a programming language, we often refer to its form, or syntax. For example, in C or Java, you might have a function that looks like:

int plus_one(int n) {
    return n + 1;
}

We will refer to this kind of syntax as concrete syntax. Concrete syntax is what one actually types to program in a particular language. However, to make processing a language easier, the first step in creating a programming language is to convert concrete syntax into abstract syntax. In fact, we will convert, or parse, concrete syntax into an Abstract Syntax Tree (AST). This is a a "tree" in that it is a recursive data structure representing a program.

Parser: Concrete Syntax to Abstract Syntax Tree (AST)

In this assignment, you will write a small, but complete, Calculator Language. Your language should be able to at least add, subtract, divide, and multiple. The language should use infix notation, as per these examples. In addition, your language should allow arbitrary recursive expressions.

Concrete Syntax

Typically, we will use strings to represent concrete syntax. So the above example might look like:

(define program "
    int plus_one(int n) {
        return n + 1;
    }")

However, to start out we will use Scheme Expressions (s-expressions) to represent programs instead of strings. These will initially look just like the simple infix expressions from the last homework.

So, our first goal is to write parser that will take an expression such as (1 + 2) and turn it into an AST.

To begin, we will have just two kinds of expressions in our Calc language:

  1. numbers, which will be tagged as literals, lit-exp
  2. addition expressions, which will be tagged as plus-exp

lit-exp will be a list composed of the symbol `lit-exp followed by the number, like:

'(lit-exp 34)

'(lit-exp -1)

'(lit-exp 12133456)

plus-exp will be a list composed of the symbol `plus-exp followed by two Calc expressions, like:

'(plus-exp (lit-exp 1) (lit-exp 2))

'(plus-exp (lit-exp 1) 
           (plus-exp (lit-exp 2)
                     (lit-exp 3)))

Our first goal is to write parser that takes abstract s-expression Calc representations and turn them into AST's:

In  [1]: (parser '(1 + 2))
Out [1]: (plus-exp (lit-exp 1)
                   (lit-exp 2))

That could look like:

In [19]:
(define parser
  (lambda (exp)
    (cond
     ((number? exp) (list 'lit-exp exp))
     ((eq? (cadr exp) '+)
      (list 'plus-exp
            (parser (car exp))
            (parser (caddr exp))))
     (else (error 'parser "Invalid concrete syntax: ~s" exp)))))
In [4]:
(parser '42)
;; (lit-exp 42)
Out[4]:
(lit-exp 42)
In [5]:
(parser '(1 + 2))
;; (plus-exp (lit-exp 1) (lit-exp 2))
Out[5]:
(plus-exp (lit-exp 1) (lit-exp 2))
In [6]:
(parser '(100 ^ 2))
;; Traceback (most recent call last):
;;   File "stdin", line 1, col 1, in 'parser'
;;   File "stdin", line 9, col 12, in 'error'
;;   File "stdin", line 9, col 12
;; RunTimeError: Error in 'parser': Invalid concrete syntax: (100 ^ 2)
Traceback (most recent call last):
  File "stdin", line 1, col 1, in 'parser'
  File "stdin", line 9, col 12, in 'error'
  File "stdin", line 9, col 12
RunTimeError: Error in 'parser': Invalid concrete syntax: (100 ^ 2)

Problem 1:

Extend the parser to include the ability to parse minus, multiply, and divide concrete syntax. Extensively test it to make sure that it works for all valid input, and fails on non-valid input. Don't add additional operators yet... we'll do that below.

Evaluator: interpret AST

In Part 2 we will design and build evaluator that will take AST's and evaluate/interpret them. That is, we will interpret the AST into their calculated form. The result of evaluator at this point will always be a number, because both of our forms, plus-exp and lit-exp both evaluate to numbers.

In [20]:
(define evaluator 
  (lambda (ast)
    (case (car ast)
     (lit-exp (cadr ast))
     (plus-exp (+ (evaluator (cadr ast))
                  (evaluator (caddr ast)))))))
In [21]:
(evaluator (parser '(3 + 2)))
;; 5
Out[21]:
5
In [22]:
(evaluator (parser '2))
;; 2
Out[22]:
2

Problem 2:

Extend evaluator to evaluate multiple, divide, and minus. Extensively test your code to make sure that it works correctly. Errors such as divide-by-zero should produce "run time" errors. Errors such as (1 + +), (+ 1 2) and (1 + 2 + 3) should produce syntax errors.

Define-datatype

Things get a bit messy with the cars and cdrs in the above evaluator. We introduce define-datatype to make these datatypes easier to deal with by giving them named-parts.

define-datatype is a special form used to create recursive data structures. It takes the form:

(define-datatype NAME TEST?
  (SUBNAME1
     (PART1 SUBTEST1?))
  (SUBNAME2
     (PART2 SUBTEST2?))
  ...)

For example, we can define calc-exp as being composed of lit-exp and plus-exp:

In [9]:
(define-datatype calc-exp calc-exp?
  (lit-exp    
   (var number?))
  (plus-exp
   (arg1 calc-exp?)
   (arg2 calc-exp?)))

The above code automatically defines the following:

  • calc-exp?
  • lit-exp
  • plus-exp
In [10]:
(list calc-exp? lit-exp plus-exp)
Out[10]:
(#<procedure> #<procedure> #<procedure>)
In [11]:
(lit-exp 1)
Out[11]:
(lit-exp 1)
In [12]:
(lit-exp 'a)
Traceback (most recent call last):
  File "stdin", line 1, col 1, in 'lit-exp'
  Source "macro-generated-exp"
RunTimeError: Error in 'lit-exp': a is not of type number?

In [13]:
(calc-exp? (lit-exp 42))
Out[13]:
#t
In [14]:
(plus-exp (lit-exp 1) (lit-exp 2))
Out[14]:
(plus-exp (lit-exp 1) (lit-exp 2))
In [15]:
(plus-exp (plus-exp (lit-exp 23)
                    (lit-exp 50))
          (lit-exp -1))
Out[15]:
(plus-exp (plus-exp (lit-exp 23) (lit-exp 50)) (lit-exp -1))

We can now clean up the parser and evaluator by using the new datatypes. Here is the base evaluator:

In [16]:
(define evaluator 
  (lambda (ast)
    (record-case ast
     (lit-exp (value) value)
     (plus-exp (arg1 arg2) (+ (evaluator arg1)
                              (evaluator arg2))))))
In [17]:
(evaluator (lit-exp 42))
Out[17]:
42
In [15]:
(evaluator (parser '((10 + 10) + 13)))
;; 33
Out[15]:
33
In [ ]:
(evaluator (parser '((100 * 100) * (26 / 2))))
;; 130000

Note that when you change the parser to use define-datatype, it will produce the exact same output (e.g., tagged lists).

Problem 3:

Rewrite the parser and evaluator to use define-datatype. Test them both thoroughly to make sure that all correct versions work, and that incorrect forms fail.

Problem 4:

Refine your grammar so that all infix operators are treated as one expression type, rather than 4 different ones. For example, instead of having plus-exp and minus-exp, you will now have something like app-exp ("application expression"). You'll need to add another field to your new expression to store the operation symbol (e.g., '+ or '-).

Add other mathematical operators to make your Calculator Language useful.

Problem 5:

Add the following functions to your language by adding them to the grammar. min, max should each take two arguments and return the min and max (respectively) of the two numbers. The concrete syntax of min and max should be:

(min 1 2)

(max 3 7)

And they should parse to AST that looks like:

(func-call-exp min calc-exp calc-exp)

(func-call-exp max calc-exp calc-exp)

When evaluated, they will give the min or the max of the two arguments.

Add the functions avg and sum which can take any number of expressions, and compute the average.

Here is one way of defining func-call-var-exp, a datatype with variable args:

In [24]:
(define-datatype calc-exp2 calc-exp2?
    (func-call-var-exp 
        (func symbol?)
        (args (list-of number?))))
In [ ]:
(func-call-var-exp 'sum '(1 2 3))
In [ ]:
(evaluator (parser '(max 10 2)))
;; 10
In [ ]:
(evaluator (parser '(min 10 2)))
;; 2
In [ ]:
(evaluator (parser '(avg 10 2 4 5 6)))

Problem 6:

Add variables to your language. We will pre-define pi as 3.141592653589793, and e 2.718281828459045.

In order to use variables, you will need to pass in the environment to the evaluate function (and everywhere you call evaluate). We will use an association list as the environment. So, an environment will be defined and used like so:

In [1]:
(define env '((pi 3.141592653589793) 
              (e  2.718281828459045)))
In [3]:
(assq 'pi env)
;; (pi 3.141592653589793)
Out[3]:
(pi 3.141592653589793)
In [4]:
(assq 'c env)
;; #f                     ;; because c is not in the association list
Out[4]:
#f

Change the evaluate function to take an additional argument, the environment. You will need to add var-exp to the define-datatype, and be able to parse and evaluate the variables. It should then work as follows:

In [ ]:
(evaluate (parser 'e) env)
;; 3.141592653589793

Finally, use your new language to compute something useful. For example, what is the area of a circle of radius 2 feet?

In [ ]:
(evaluate (parser '(pi * (2 * 2))) env)
;; 12.5663706143592

Print out your code and examples of it running and bring to class Thursday, October 2, 2014.