This is the home page for Computer Science 652, Graduate Programming Languages, at the University of San Francisco. This course is all about implementing programming languages.
Instructor: Terence Parr
Fall 2010: COWELL 413 MWF 11:45am-12:50pm
Last day of class: Wednesday December 8, 2010
Midterm exam: October 27, 2010
2nd midterm exam: Wednesday December 8, 2010 (serves as final)
Materials use ANTLR v3.2 or above
Textbook is Language ImplementationPatterns (LIP):
You'll also find The Definitive ANTLR Reference (TDAR) useful.
No laptops are allowed in my lectures ('cept mine)
Abstract
Because programming languages are at the core of writing software, programmers should have a thorough understanding of how languages are designed, implemented, and manipulated. This course concerns itself specifically with the implementation and translation of computer languages, leaving an in-depth study of language design to another course. Students will learn the formalisms behind computer languages, but the focus will be on developing the ability to build languages and their translators.
This class can be challenging conceptually. Some of the language formalisms take a while to sink in. Well, actually you have one major hurdle to get over and then it's easy--abstraction in the sense of recursion, meta-language, programs that generate other programs (or even themselves), etc... If you get a headache when you try to figure out how the first C compiler could have been written in C, you might invest in a big bottle of aspirin.
Students are required to build projects in Java unless specified otherwise. You should have a good understanding of data structures, algorithms, and recursion. Prior experience with language implementation is helpful, but not required. You will be expected to write a lot of code this semester, culminating in a complete programming language implementation.
Two graduate classes is considered full-time at USF and, hence, you can expect this class to require about 20 hours/week of class time and homework/development time. You should start early on every project. Note that there is no such thing as a late project. Late projects get a 0 grade as I will be handing out the solutions the day projects are due.
Course goal
By the end of this course, you will have implemented a complete programming language. The lecture flow will lead you through all of the formalisms and techniques you'll need and we'll closely follow the chapter sequence in the textbook for the first part of the semester. For the Fall 2010 semester, we're going to build a bytecode compiler and interpreter for SmallTalk. The main tasks are:
- parser and AST constructor
- bytecode generator
- bytecode interpreter
- SmallTalk primitives and runtime support
After the first stage, you will all start from my solution for the subsequent stages. This will show you what a proper implementation looks like and prevents a single bad stage from killing your chances of success on the remaining stages.
Student Learning Outcomes
At the end of this course, you will
- be familiar with the evolution of programming languages
- be able to build NFA, DFA, and related state machines
- understand the nature and structure of language
- be able to build symbol tables and intermediate representations
- be able to build lexers and parsers by hand and via ANTLR
- be able to describe how object-oriented languages implement inheritance, polymorphism, encapsulation, and data-hiding
- be very comfortable with grammars and parsing
- know how to build source to source translators
- be able to build a programming language to bytecode compiler
- be able to implement a bytecode interpreter
In short, you will be able to intelligently discuss the implementation and translation of programming languages and will have built a nontrivial programming language.
Syllabus
under construction
Introduction
- The evolution of programming languages
- The difference between compilers and interpreters
- Chapter 2: The nature of computer languages (TDAR)
- Chapter 1: Language applications cracked open (LIP)
Language recognition
- Basic parsing patterns
- Resources: LL Parsing and recursive descent design pattern, 11.1-11.4 from ANTLR reference
- Grammar to Recursive-Descent Parser Mapping
- Recursive-descent Lexer
- Canonical LL(1) Recursive-Descent Parser
- LL(k) Recursive-Descent Parser
- Advanced parsing patterns
- Backtracking
- Memoizing
- Predicated
- ANTLR introduction: Chapter 3 from ANTLR reference
- Ambiguities and Nondeterminisms: Section 11.5 from ANTLR reference
- Using semantic and syntactic predicates: Chapter 12 from ANTLR reference (required reading: Chapters 13, 14 from ANTLR reference)
Labs:
SmallTalk intro
Internal representations
Symbol tables
Semantic analysis
Code generation
Interpreters
Advanced topics
-----------------
I. Introduction
- Course overview
- A trip down memory lane
- The evolution of programming languages
- Chapter 2: The nature of computer languages from Definitive ANTLR Reference
- Overview of language implementation and supporting tools
- Grammars
Homeworks:
Labs:
II. Parsing
- Basic parsing patterns
- Resources: LL Parsing and recursive descent design pattern, 11.1-11.4 from ANTLR reference
- Grammar to Recursive-Descent Parser Mapping
- Recursive-descent Lexer
- Canonical LL(1) Recursive-Descent Parser
- LL(k) Recursive-Descent Parser
- Advanced parsing patterns
- Backtracking
- Memoizing
- Predicated
- ANTLR introduction: Chapter 3 from ANTLR reference
- Ambiguities and Nondeterminisms: Section 11.5 from ANTLR reference
- Using semantic and syntactic predicates: Chapter 12 from ANTLR reference (required reading: Chapters 13, 14 from ANTLR reference)
Labs:
- Parse DOT files the easy way
- Grammars galore
- [calculator using ANTLR]
- [class hierarchy diagrams via Java grammar]
III. Intermediate forms and semantic analysis
- Intermediate form trees
- Resource: Chapter 7 from ANTLR reference
- Parse trees
- ASTs: homogeneous, Normalized Heterogeneous AST, Irregular Heterogeneous AST
- Tree walking patterns
- Embedded heterogeneous tree walker
- External visitor
- Tree grammar
- Tree pattern matcher
- Symbol table patterns
- Monolithic
- Nested scopes
- Data aggregates like C structs
- Class hierarchies
Resource: Symbol tables
- Semantic analysis: static type computation and enforcing static type equivalency
Labs:
IV. Interpreters and program execution
- Overview of interpreters
- Stack machines
- PostScript case study
- Tree-based interpreters
- Memory allocation and garbage collection
- Implementing Polymorphism
Labs:
- [AST-based calculator]
- PostScript interpreter
V. Translators
- Overview of source to source translation strategies
- Generating structured text with templates and grammars: Chapter 9 from ANTLR reference
- [Generating byte code]
- [Generating source code]
- Tree rewriting
- Tree walking instruction generators
- [Intermediate representations for compilers]
- gUnit - Grammar Unit Testing
Labs:
- Java profiling
- Tree walking instruction generator JBurg
- Generating machine executable binaries with LLVM
Projects
Parsing Graphviz DOT files the hard way 5% Due Sept 8, 2010
Build a lexer and parser by hand that checks DOT files for correct syntactic structure.
[Implement X in SmallTalk] Due Sept. 22
[SmallTalk grammar and AST construction] Due Oct 4
[Basic SmallTalk interpreter and primitives] Due Nov 3
[Blocks and message sends] Due Nov 17
[SmallTalk bytecode compiler] Due Dec 3
There are no late projects and incomplete features do not count.
All projects will be graded and must run under UNIX. Projects will be graded interactively and may require students to meet with the instructor outside class.
SVN repository
All projects must be submitted in SVN. For example, here is how I would create my cs652 directory:
svn mkdir https://www/svn/parrt/cs652 -m 'add dir'
Exams
There will be two written exams. The midterm exam is on ??.
Grading
Your grade will be computed according to the following relationship:
15% Exam 1
15% Exam 2
10% Labs/Homework/class participation
60% Projects
I consider an "A" grade to be above and beyond what most students have achieved. A "B" grade is an average grade or what you could call "competence" in a business setting. A "C" grade means that you either did not or could not achieve competence. An "F" grade implies you did very little work or had great difficulty with the class compared to other students.
Unless you are sick or have a family emergency, I will not change deadlines for projects nor exam times. For example, I will not give you a special final exam just because you want to fly home early. Consult the university academic calendar before making travel plans.
Miscellaneous
Attendance is part of your grade. Even if you know this subject well, I will be providing a number of anecdotes from industry and other goodies during my lectures. Anything I say in class is fair game for an exam
Tardiness. Please be on time. It's very disruptive for you to come in late and you could miss vital information.
Academic honesty. You must abide by the copyright laws of the United States and academic honesty policies of USF. If told you may for a particular project, use any code from the net that you find as long as it does not violate the software's license. You may not borrow code from other current or previous students. All suspicious activity will be investigated and, if warranted, passed to the Dean of Sciences for action.
Mailing list. You must all join:
http://cs.usfca.edu/mailman/listinfo/cs652
I will make lots of announcements on it.
Resources
Tools
Smalltalk
- smalltalk cheat sheet
- Squeak by example
- Lots of free smalltalk books
- Smalltalk-80: The Language and its Implementation
- A little Smalltalk
- Tim Budd's Java implementation of "a Little Smalltalk"
- "Simple" smalltalk implementations
- Overview of GNU Smalltalk syntax
- Paper describing GNU Smalltalk implementation
- Dan Ingalls on the History of Smalltalk and the Lively Kernel


Add Comment