CS 367 Compilers
Spring 2009

Computer Science Department
The College of Arts and Sciences
Boston College

The centerpiece of the course is a series of projects implementing compilers for three programming languages.

Monotype Expression Language (MEL)

MEL is a trivial expression language, even simpler than what would be supported by a $2 calculator found on the racks at CVS. MEL allows only integer expressions under addition, subtraction, multiplication and division. Even this trivial language presents challenges for the beginning compiler writer. The lexemes of the language must be recognized and the syntactic forms must be parsed, taking into account the left-associativity of the operators and the fact that multiplication and division are of higher precedence than addition and subtraction.

Two significant features of this language are:
  1. since all expressions are of the same type (i.e., type int), the language does not allow for the possibility of a type error. It does, however, allow for an otherwise illegal operation, division by zero, so this must be accounted for in the generated code;

  2. since there are no booleans and there is no branching, the intermediate code generated by the compiler is a single procedure with a single basic block as a body.
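To make these two features concrete, here is a minimal sketch in Python (not the Ocaml used by the course infrastructure) of a single basic block of straight-line code with an explicit check for division by zero. The instruction format and the names run_block and MelDivisionByZero are assumptions made for this illustration only, as is the use of Python's floor division.

```python
class MelDivisionByZero(Exception):
    """Raised when the straight-line code divides by zero (hypothetical name)."""


def run_block(instructions):
    """Execute one basic block of (dest, op, left, right) instructions.

    Operands are int literals or names of earlier destinations. There are
    no branches, so execution simply runs from top to bottom. The value of
    the last destination is returned.
    """
    if not instructions:
        raise ValueError("empty block")
    env = {}

    def value(operand):
        return operand if isinstance(operand, int) else env[operand]

    for dest, op, left, right in instructions:
        a, b = value(left), value(right)
        if op == "+":
            env[dest] = a + b
        elif op == "-":
            env[dest] = a - b
        elif op == "*":
            env[dest] = a * b
        elif op == "/":
            if b == 0:  # the one run-time error MEL permits
                raise MelDivisionByZero(dest)
            env[dest] = a // b  # assumption: Python floor division
        else:
            raise ValueError("unknown operator " + op)
    return env[instructions[-1][0]]
```

For instance, the block [("x1", "*", 2, 3), ("x2", "+", "x1", 4)] corresponds to the MEL program 2 * 3 + 4.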

The MEL project comes in two parts. The first part is to be done individually and the second part is to be done as a team.
A Recursive Descent Parser for MEL
(20 Points) Due 5PM Wednesday February 18, 2009. The syntax of MEL is as follows:
	E ::= n | ( E ) | E + E | E - E | E * E | E / E
      
where n is an integer. The operators + and - are of equal precedence and the operators * and / are of equal precedence. The operators * and / are of higher precedence than + and -. All four operators are left-associative.

The first step in implementing a parser for MEL is to rewrite the above grammar to enforce the correct precedence and associativity. (A "layered" grammar enforcing the right precedence and associativity can be found in many places but you should figure it out for yourself.)

Once you've enforced the correct precedence and associativity, you'll have to rewrite the grammar once again in order to eliminate the left-recursion. Fortunately, the recursion will be immediate left-recursion so the transformation rule covered in class will suffice.
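As a reminder of what such a transformation looks like, here is a sketch in Python of the standard textbook rule for immediate left-recursion, which I am assuming is the rule meant here: A ::= A a | b becomes A ::= b A' and A' ::= a A' | empty. It is applied to a toy list grammar rather than to MEL, and the function name and grammar representation are made up for this example.

```python
def eliminate_immediate_left_recursion(nonterminal, productions):
    """Rewrite the productions of one nonterminal.

    productions is a list of right-hand sides, each a list of symbols;
    the empty list [] stands for the empty string. Returns a dictionary
    mapping nonterminals to their new lists of right-hand sides.
    """
    fresh = nonterminal + "'"
    # Right-hand sides of the form A alpha: keep only the alpha part.
    recursive = [rhs[1:] for rhs in productions if rhs[:1] == [nonterminal]]
    # Right-hand sides beta that do not start with A.
    others = [rhs for rhs in productions if rhs[:1] != [nonterminal]]
    if not recursive:
        return {nonterminal: productions}
    return {
        nonterminal: [beta + [fresh] for beta in others],
        fresh: [alpha + [fresh] for alpha in recursive] + [[]],
    }


# Toy grammar: L ::= L , x | x
toy = eliminate_immediate_left_recursion("L", [["L", ",", "x"], ["x"]])
```

Here toy describes the grammar L ::= x L' and L' ::= , x L' | empty, which generates the same lists without left-recursion.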

Armed with this new grammar, you are now ready to write the recursive descent parser. First, download the compressed tar archive containing the Ocaml infrastructure code and stash it away in a convenient place. (Let me know if you need help unpacking it.) You should print out at least the files: lexer.mli, ast.ml, compile.ml and parser.ml.

Your job for this part of the problem set is to write the function expression, the header of which is contained in the file parser.ml. You should make use of the resources found in the Lexer module. The Lexer module has one function for retrieving a token and another function for simply looking to see what token is ahead in the input (without advancing the input pointer).
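If the distinction between retrieving a token and looking ahead is unfamiliar, the following Python sketch may help. It is not the course's Lexer module: the names tokenize, next_token and peek and the token representation are assumptions chosen for illustration, so consult lexer.mli for the real interface.

```python
def tokenize(text):
    """Split MEL source text into (kind, value) pairs, ending with EOF."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(("INT", int(text[i:j])))
            i = j
        elif ch in "+-*/()":
            tokens.append((ch, None))
            i += 1
        else:
            raise ValueError("unexpected character " + repr(ch))
    tokens.append(("EOF", None))
    return tokens


class Lexer:
    def __init__(self, text):
        self._tokens = tokenize(text)
        self._pos = 0

    def peek(self):
        """Return the upcoming token without advancing the input pointer."""
        return self._tokens[self._pos]

    def next_token(self):
        """Return the upcoming token and advance past it (EOF is sticky)."""
        token = self._tokens[self._pos]
        if token[0] != "EOF":
            self._pos += 1
        return token
```

A recursive descent parser typically calls peek to decide which production to use and next_token once it has committed to consuming the token.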

The main challenge in this part of the project is to write the parser so that it enforces the left-associativity of the operators. (In particular, the program "4 - 1 - 1" must parse as "(4 - 1) - 1".)
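To see why the nesting matters, here is a small Python sketch that builds both possible trees for the same operands and evaluates them. It is not a parser and not part of the infrastructure code; the tuple representation of trees and the names nest_left, nest_right and evaluate are invented for this illustration.

```python
def nest_left(first, rest):
    """Build ((a op b) op c) from a first operand and (op, operand) pairs."""
    tree = first
    for op, operand in rest:
        tree = (op, tree, operand)
    return tree


def nest_right(first, rest):
    """Build (a op (b op c)) from the same input, for comparison."""
    if not rest:
        return first
    (op, operand), tail = rest[0], rest[1:]
    return (op, first, nest_right(operand, tail))


def evaluate(tree):
    """Evaluate a tree of ints and (op, left, right) tuples; no division."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return {"+": a + b, "-": a - b, "*": a * b}[op]


pairs = [("-", 1), ("-", 1)]            # the tail of "4 - 1 - 1"
left_tree = nest_left(4, pairs)          # ("-", ("-", 4, 1), 1)
right_tree = nest_right(4, pairs)        # ("-", 4, ("-", 1, 1))
```

The left-nested tree evaluates to 2, the intended meaning, while the right-nested tree evaluates to 4.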

In the process of developing your parser, you'll need to compile, debug and run your compiler. We'll use the makefile Makefile to manage compilation. One order of business: in order to use the makefile, you'll need to tell the make system about the dependencies between the various modules. In order to do this, run a Unix shell under emacs by typing:
	M-x shell
      
This will split the window with the emacs shell in one half and it will place the cursor in the Unix part. Now type:
	> ocamldep *.ml *.mli > .depend
      
This runs the Ocaml dependency-file generator (i.e., ocamldep). If all goes well, it will store dependency information in the file .depend. You are now ready to compile your code. Type:
	M-x compile
      
and then hit enter. This will execute the makefile Makefile. In order to track down compilation errors, type:
	C-x `
      
This will place the cursor at the point of the error. It takes a while to learn to interpret the messages; see me if you need help. In general, you'll find that your code usually works once it makes it through the Ocaml compiler.

When your code compiles successfully, the Ocaml compiler will create your compiler for you. The makefile tells Ocaml to store your compiler in the file melc. You'll want to run melc, your compiler, on a test program. There are a few such programs in the test subdirectory, or you can create your own. To run the melc compiler from within emacs, switch to the Unix shell buffer and then type:
	> ./melc filename.mel
      
This will run the compile function which, as you can see from studying the code, calls your parser. The compile function has been rigged to write diagnostic information to the file filename.dbg. Take a look. Note that the compile function passes the parse tree produced by your parser to the Name.translate function. The result of that function is then passed to the Lift.translate function. These two functions are the subject of part 2 of this project.
Naming and Lifting
(20 Points) Due 5PM Wednesday February 27, 2009. This part of the project is to be done with your teammate. It consists of implementing the Name and Lift phases of the melc compiler:
  1. The name phase is a source-to-source transformation on abstract syntax trees. It is invoked on the abstract syntax tree produced by the parser. The name phase is responsible for introducing variables together with let-forms to capture the results of each subexpression in the program.

  2. The lift phase is another source-to-source transformation on abstract syntax trees. It accepts the output of the name phase and lifts or "flattens" the let-expressions according to the following rule:

    	  let x1 = (let x2 = e2 in e3) in e4
    
    	  becomes
    
    	  let x2 = e2 in let x1 = e3 in e4
    	
    The output of the lift phase should look like "linear" code and should be suitable input to the code generator of the compiler.
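The two phases can be sketched in Python as follows. This is only one possible reading of the description above, not the design of the Name and Lift modules: the tuple representation of trees, the naming scheme x1, x2, ... and the function names make_namer, lift and bindings are all assumptions made for this illustration.

```python
import itertools


def make_namer():
    """Return a name function with its own supply of fresh variables."""
    counter = itertools.count(1)

    def fresh():
        return "x" + str(next(counter))

    def name(expr):
        # Integers are left alone; every operation gets let-bound operands
        # and a let-bound result.
        if isinstance(expr, int):
            return expr
        op, left, right = expr
        a, b, r = fresh(), fresh(), fresh()
        return ("let", a, name(left),
                ("let", b, name(right),
                 ("let", r, (op, a, b), r)))

    return name


def is_let(expr):
    return isinstance(expr, tuple) and expr[0] == "let"


def lift(expr):
    """Flatten lets so that no let is bound to another let."""
    if not is_let(expr):
        return expr
    _, x1, bound, body = expr
    bound = lift(bound)
    if is_let(bound):
        _, x2, e2, e3 = bound
        # let x1 = (let x2 = e2 in e3) in e4
        #   becomes  let x2 = e2 in let x1 = e3 in e4
        return ("let", x2, e2, lift(("let", x1, e3, body)))
    return ("let", x1, bound, lift(body))


def bindings(expr):
    """List the (variable, bound expression) pairs of a let chain."""
    pairs = []
    while is_let(expr):
        _, var, bound, expr = expr
        pairs.append((var, bound))
    return pairs, expr


named = make_namer()(("-", ("-", 4, 1), 1))   # the tree for (4 - 1) - 1
linear, result = bindings(lift(named))
```

With this naming scheme, linear is the sequence x4 = 4, x5 = 1, x6 = x4 - x5, x1 = x6, x2 = 1, x3 = x1 - x2, and result is x3. The rule is safe here because every variable introduced by the name phase is fresh, so x2 can never clash with a variable in e4.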

The infrastructure source code for the melc compiler can be found here.

Polytype Expression Language (PEL)

(20 Points) Due 5PM, Friday March 13, 2009. PEL is obtained from MEL by adding boolean values, the boolean operators and and or, and a conditional expression. Unlike MEL programs, PEL programs can be syntactically well-formed but semantically meaningless because the programmer can attempt to use values inappropriately (e.g., true + 4). In addition, PEL programs give rise to non-trivial (but still acyclic) control flow graphs.
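To illustrate how a static check can reject a program like true + 4 before it runs, here is a minimal Python sketch. The tree representation, the names type_of and PelTypeError, and the requirement that both branches of a conditional have the same type are assumptions made for this example, not the specification of PEL.

```python
class PelTypeError(Exception):
    """Raised for an ill-typed expression (hypothetical name)."""


def expect(expr, wanted):
    found = type_of(expr)
    if found != wanted:
        raise PelTypeError("expected " + wanted + " but found " + found)


def type_of(expr):
    """Return "int" or "bool" for a well-typed tree, else raise."""
    # Check bool first: in Python, bool is a subclass of int.
    if isinstance(expr, bool):
        return "bool"
    if isinstance(expr, int):
        return "int"
    tag = expr[0]
    if tag in ("+", "-", "*", "/"):
        expect(expr[1], "int")
        expect(expr[2], "int")
        return "int"
    if tag in ("and", "or"):
        expect(expr[1], "bool")
        expect(expr[2], "bool")
        return "bool"
    if tag == "if":
        expect(expr[1], "bool")
        branch_type = type_of(expr[2])
        expect(expr[3], branch_type)  # assumption: branches must agree
        return branch_type
    raise PelTypeError("unknown form " + repr(tag))
```

For example, type_of(("if", ("and", True, False), 1, ("+", 2, 3))) is "int", while type_of(("+", True, 4)) raises PelTypeError.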

The PEL project consists of implementing the following phases of the pelc compiler.
  1. a recursive descent parser;

  2. a static type-checker;

  3. a live variable analysis;

  4. a register allocator.
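As a rough picture of what the third phase computes, here is a Python sketch of a backward live variable analysis over a small control flow graph. The representation of instructions as (destination, used variables) pairs, the graph format and the function names are assumptions for this illustration and are not taken from the pelc infrastructure.

```python
def live_before(instructions, live_out):
    """Backward pass over one basic block.

    instructions is a list of (dest, used_vars) pairs in program order;
    dest may be None for an instruction that defines nothing. Returns the
    set of variables live just before each instruction, in program order.
    """
    live = set(live_out)
    result = []
    for dest, used in reversed(instructions):
        live = (live - {dest}) | set(used)
        result.append(set(live))
    result.reverse()
    return result


def block_live_in(cfg, exit_live=frozenset()):
    """Compute the variables live on entry to each block.

    cfg maps a block name to (instructions, successor names). The loop
    repeats until nothing changes, which also works for acyclic graphs.
    """
    live_in = {name: set() for name in cfg}
    changed = True
    while changed:
        changed = False
        for name, (instructions, successors) in cfg.items():
            if successors:
                live_out = set().union(*(live_in[s] for s in successors))
            else:
                live_out = set(exit_live)
            before = live_before(instructions, live_out)
            new_in = before[0] if before else live_out
            if new_in != live_in[name]:
                live_in[name] = new_in
                changed = True
    return live_in


# A diamond-shaped graph for a conditional: r is a in one branch, b in the other.
cfg = {
    "entry": ([("c", []), ("a", []), (None, ["c"])], ["then", "else"]),
    "then": ([("r", ["a"])], ["join"]),
    "else": ([("b", []), ("r", ["b"])], ["join"]),
    "join": ([(None, ["r"])], []),
}
live = block_live_in(cfg)
```

In this example a is live on entry to the then block but not the else block. A register allocator can use such information to let variables that are never live at the same time share a register.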

The infrastructure source code for the pelc compiler can be found here.

miniPython

(40 Points) Due 5PM, Friday May 1, 2009. MiniPython is a subset of Python. MiniPython programs should run unchanged on the Python interpreter (or compiler). The subset of Python that we'll implement essentially extends PEL with:
  1. collections of recursive functions;

  2. dynamic type checking; and

  3. imperative forms: assignment statements, conditionals and while loops.
For example, a miniPython program would be:
      def euclid(a, b):
          if b == 0:
              return a
          else:
              return euclid(b, a % b)
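
Since miniPython programs are meant to run unchanged under Python, the example can be tried directly with the standard interpreter. The sketch below repeats the definition and adds a call; the argument values and the variable name answer are chosen here purely for illustration.

```python
def euclid(a, b):
    if b == 0:
        return a
    else:
        return euclid(b, a % b)


# euclid(48, 18) -> euclid(18, 12) -> euclid(12, 6) -> euclid(6, 0) -> 6
answer = euclid(48, 18)
```

The recursion stops as soon as the second argument reaches 0, at which point the first argument is the greatest common divisor.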
    
Created on 01-12-2009 12:04.