The centerpiece of the course is a series of implementation projects:
compilers for three programming languages.
Monotype Expression Language (MEL)
MEL is a trivial expression language, even simpler than
what would be supported by a $2 calculator found on the racks at
CVS. MEL allows only integer expressions under addition,
subtraction, multiplication and division. Even this trivial
language presents challenges for the beginning compiler
writer. The lexemes of the language must be recognized and the
syntactic forms must be parsed, taking into account the
left-associativity of the operators and the fact that
multiplication and division are of higher precedence than addition
and subtraction.
Two significant features of this language are:
- since all expressions are of the same type (i.e., type
int), the language does not allow for the possibility of a type
error. But it does allow for an otherwise illegal operation,
division by zero, so this must be accounted for in the
generated code;
- since there are no booleans and there is no branching, the
intermediate code generated by the compiler is a single procedure
with a single basic block as a body.
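To make the division-by-zero point concrete, here is a minimal sketch of a MEL evaluator in OCaml. It is an interpreter, not the code generator the project asks for, and the expr type and the eval function are hypothetical names chosen for illustration; the real abstract syntax is defined in ast.ml. The only failure it can report is a zero divisor, which mirrors the observation that MEL has no type errors.

```ocaml
(* Hypothetical AST for MEL; the course's ast.ml may differ. *)
type expr =
  | Num of int
  | Add of expr * expr
  | Sub of expr * expr
  | Mul of expr * expr
  | Div of expr * expr

(* Evaluate an expression. Every value is an int, so the only
   runtime failure is a zero divisor, reported as Error. *)
let rec eval (e : expr) : (int, string) result =
  (* Evaluate both operands left to right, stopping at the first Error. *)
  let binop f a b =
    match eval a with
    | Error _ as err -> err
    | Ok x ->
      (match eval b with
       | Error _ as err -> err
       | Ok y -> f x y)
  in
  match e with
  | Num n -> Ok n
  | Add (a, b) -> binop (fun x y -> Ok (x + y)) a b
  | Sub (a, b) -> binop (fun x y -> Ok (x - y)) a b
  | Mul (a, b) -> binop (fun x y -> Ok (x * y)) a b
  | Div (a, b) ->
    (* The explicit check a compiler would have to emit before dividing. *)
    binop (fun x y -> if y = 0 then Error "division by zero" else Ok (x / y)) a b
```

A compiler would emit the same test as a guard ahead of the division instruction instead of performing it at compile time.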
The MEL project comes in two parts. The first part is to be done
individually and the second part is to be done as a team.
A Recursive Descent Parser for MEL
(20 Points) Due 5PM Wednesday February 18, 2009.
The syntax of MEL is as follows:
E ::= n | ( E ) | E + E | E - E | E * E | E / E
where n is an integer. The operators + and - are of
equal precedence and the operators * and / are of equal
precedence. The operators * and / are of higher precedence than
+ and -. All four operators are left-associative.
The first step in implementing a parser for MEL is to rewrite
the above grammar to enforce the correct precedence and
associativity. (A "layered" grammar enforcing the right
precedence and associativity can be found in many places but you
should figure it out for yourself.)
Once you've enforced the correct precedence and associativity,
you'll have to rewrite the grammar once again in order to
eliminate the left-recursion. Fortunately, the recursion will be
immediate left-recursion so the transformation rule covered in
class will suffice.
Armed with this new grammar, you are now ready to write the recursive
descent parser. First, download
the compressed tar archive containing the Ocaml infrastructure code and
stash it away in a convenient place. (Let me know if you need help
unpacking it.) You should print out at least the files:
lexer.mli,
ast.ml,
compile.ml and
parser.ml.
Your job for this part of the problem set is to write the function
expression, the header of which is contained in the
file parser.ml. You should make use of the resources
found in the Lexer module. The Lexer module has one
function for retrieving a token and another function for simply
looking to see what token is ahead in the input (without advancing
the input pointer).
The challenging part of this portion of the project is to write the
parser in such a way that it enforces the left-associativity of
the operators. (In particular, the program "4 - 1 - 1" must parse
as "(4 - 1) - 1".)
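One common way to get left-associativity out of a recursive descent parser is to pass the tree built so far as an accumulator. The sketch below shows that idea in OCaml for a toy language with integers and subtraction only. The token and expr types and the make_stream helper are assumptions made for the example; they stand in for, and are not, the interface in lexer.mli and ast.ml.

```ocaml
(* Hypothetical token and AST types, for illustration only. *)
type token = INT of int | MINUS | EOF
type expr = Num of int | Sub of expr * expr

(* A toy stand-in for a lexer: peek inspects the next token without
   consuming it, next consumes it and returns it. *)
let make_stream (tokens : token list) =
  let rest = ref tokens in
  let peek () = match !rest with [] -> EOF | t :: _ -> t in
  let next () =
    match !rest with
    | [] -> EOF
    | t :: ts ->
        rest := ts;
        t
  in
  (peek, next)

(* Parses the right-recursive grammar
     E  ::= n E'
     E' ::= - n E' | (empty)
   while building a left-leaning tree. *)
let parse (tokens : token list) : expr =
  let peek, next = make_stream tokens in
  let operand () =
    match next () with
    | INT n -> Num n
    | _ -> failwith "expected an integer"
  in
  (* left is the tree for everything parsed so far; each new operand
     becomes the right child of a new Sub node. *)
  let rec tail left =
    match peek () with
    | MINUS ->
        ignore (next ());
        let right = operand () in
        tail (Sub (left, right))
    | _ -> left
  in
  let first = operand () in
  tail first
```

With this shape, the token sequence for "4 - 1 - 1" yields Sub (Sub (Num 4, Num 1), Num 1), that is, "(4 - 1) - 1".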
In the process of developing your parser, you'll need to
compile, debug and run your compiler. We'll use the
makefile Makefile to manage compilation. One order of
business: in order to use the makefile, you'll need to tell the
make system about the dependencies between the various
modules. In order to do this, run a Unix shell under emacs by
typing:
M-x shell
This will split the window, with the emacs shell in one half, and it will
place the cursor in the shell. Now type:
> ocamldep *.ml *.mli > .depend
This runs the Ocaml dependency-file generator (i.e., ocamldep). If all
goes well, it will store dependency information in the file .depend.
You are now ready to compile your code. Type:
M-x compile
and then hit enter. This will execute the makefile Makefile.
In order to track down compilation errors, type:
C-x `
This will place the cursor at the point of the error. It takes a while
to learn to interpret the messages; see me if you need help. In general, you'll
find that your code usually works once it makes it through the Ocaml
compiler.
When your code compiles successfully, the Ocaml compiler will
create your compiler for you. The makefile tells Ocaml to store
your compiler in the file melc. You'll want to run
your compiler, melc, on a test program. There are a few such programs
in the test subdirectory, or you can create your own.
To run the melc compiler from within emacs, switch to the Unix shell
buffer and then type:
> ./melc filename.mel
This will run the compile function which, as you can
see from studying the code, calls your parser. The compile
function has been rigged to write diagnostic information to the
file filename.dbg. Take a look. Note that the compile
function passes the parse tree produced by your parser to the
Name.translate function. The result of that function is
then passed to the Lift.translate function. These two
functions are the subject of part 2 of this project.
Naming and Lifting
(20 Points) Due 5PM Wednesday February 27, 2009.
This part of the project is to be done with your teammate. It
consists of implementing the Name and
Lift phases of the melc compiler:
- The name phase is a source-to-source transformation
on abstract syntax trees. It is invoked on the abstract syntax
tree produced by the parser. The name phase is responsible for
introducing variables together with let-forms to capture the
results of each subexpression in the program.
- The lift phase is another source-to-source
transformation on abstract syntax trees. It accepts the output
of the name phase and lifts or "flattens" the
let-expressions according to the following rule:
let x1 = (let x2 = e2 in e3) in e4
becomes
let x2 = e2 in let x1 = e3 in e4
The output of the lift phase should look like "linear"
code and should be suitable input to the code generator of the
compiler.
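The lifting rule can be sketched in OCaml as follows. This is a minimal illustration under stated assumptions, not the course's implementation: the expr type and the lift function are hypothetical, and the sketch assumes that the name phase has already given every let-bound variable a unique name, so that moving the binding of x2 outward cannot capture a variable in e4.

```ocaml
(* Hypothetical AST with let-forms; the course's ast.ml may differ. *)
type expr =
  | Num of int
  | Var of string
  | Add of expr * expr
  | Let of string * expr * expr   (* let x = e1 in e2 *)

(* Flatten nested lets. Assumes all bound names are distinct. *)
let rec lift (e : expr) : expr =
  match e with
  | Let (x1, e1, e4) ->
    (match lift e1 with
     | Let (x2, e2, e3) ->
       (* let x1 = (let x2 = e2 in e3) in e4
          becomes
          let x2 = e2 in let x1 = e3 in e4 *)
       Let (x2, e2, lift (Let (x1, e3, e4)))
     | e1' -> Let (x1, e1', lift e4))
  | Add (a, b) -> Add (lift a, lift b)
  | Num _ | Var _ -> e
```

For example, lifting let x1 = (let x2 = 1 in x2 + 2) in x1 + 3 gives let x2 = 1 in let x1 = x2 + 2 in x1 + 3, a chain of bindings in which no let appears on the right-hand side of another let.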
The infrastructure source code for the melc compiler can be
found here.
Polytype Expression Language (PEL)
(20 Points) Due 5PM, Friday March 13, 2009. PEL is obtained
from MEL by adding boolean values, the boolean operators "and"
and "or", and a conditional expression. Unlike MEL programs,
PEL programs can be syntactically well-formed but semantically
meaningless because the programmer can attempt to use values
inappropriately (e.g., true + 4). In addition, PEL
programs give rise to non-trivial (but still acyclic) control flow
graphs.
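To illustrate how an expression such as true + 4 can be rejected before any code is generated, here is a minimal sketch of a static type checker in OCaml. The ty and expr types, the Type_error exception and the type_of function are hypothetical names, and only a few PEL forms are covered. The rule that both branches of a conditional must have the same type is an assumption of this sketch, not something stated in the project description.

```ocaml
(* Hypothetical types for a fragment of PEL. *)
type ty = TInt | TBool

type expr =
  | Int of int
  | Bool of bool
  | Add of expr * expr
  | And of expr * expr
  | If of expr * expr * expr

exception Type_error of string

(* Compute the type of an expression or raise Type_error. *)
let rec type_of (e : expr) : ty =
  match e with
  | Int _ -> TInt
  | Bool _ -> TBool
  | Add (a, b) ->
    if type_of a = TInt && type_of b = TInt then TInt
    else raise (Type_error "+ expects two ints")
  | And (a, b) ->
    if type_of a = TBool && type_of b = TBool then TBool
    else raise (Type_error "and expects two booleans")
  | If (c, t, f) ->
    if type_of c <> TBool then raise (Type_error "condition must be a boolean");
    let branch_ty = type_of t in
    (* Assumption: both branches must agree on their type. *)
    if type_of f = branch_ty then branch_ty
    else raise (Type_error "branches must have the same type")
```

Applied to Add (Bool true, Int 4), the encoding of true + 4, type_of raises Type_error.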
The PEL project consists of implementing the following phases of
the pelc compiler.
- a recursive descent parser;
- a static type-checker;
- a live variable analysis;
- a register allocator.
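As a rough picture of what live variable analysis computes, the OCaml sketch below handles the simplest case: a single basic block of straight-line code. The instr record and the liveness function are hypothetical, and because PEL has conditionals, the real analysis has to work over a control flow graph with several blocks; this sketch only shows the backward step applied within one block.

```ocaml
module S = Set.Make (String)

(* Hypothetical three-address instruction: def := some operation on uses. *)
type instr = { def : string; uses : string list }

(* Given a block and the set of variables live at its exit, return the
   set of variables live immediately before each instruction.
   Backward step: live_in = uses + (live_out - def). *)
let liveness (code : instr list) (live_out : S.t) : S.t list =
  let _, result =
    List.fold_right
      (fun i (live, acc) ->
         let live_in = S.union (S.of_list i.uses) (S.remove i.def live) in
         (live_in, live_in :: acc))
      code (live_out, [])
  in
  result
```

For the block x := 1; y := x + 2; z := x + y with only z live at exit, the live-before sets are {}, {x} and {x, y}. A register allocator can use such sets to decide which variables must not share a register.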
The infrastructure source code for the pelc compiler can be
found here.
miniPython
(40 Points) Due 5PM, Friday May 1, 2009. MiniPython is a subset of Python.
MiniPython programs should run unchanged on the Python interpreter (or compiler).
The subset of Python that we'll implement essentially extends PEL with:
- collections of recursive functions;
- dynamic type checking; and
- imperative forms: assignment statements, conditionals and while loops.
For example, a miniPython program would be:
def euclid(a, b):
    if b == 0:
        return a
    else:
        return euclid(b, a % b)