410/510 1 of 20
Compiler Construction
Week 1 – Lecture 1
• Introduction
• The Textbook
• Assessment
• Programming & Tools
• A very small compiler
The Big Picture
• In this course we will be constructing a compiler!
• A compiler moves a program from a High Level Language (HLL) to a Low Level Language (LLL)
• Compilers are complex programs
  – > 10,000 lines of code
• Integrate aspects from many different areas of CS
  – Formal language theory, algorithms, data structures, HLL & LLL (obviously), user interaction (error reporting)
What is a compiler?
• A specialization of a language translator
• Usually in CS:
  – the Source is a high level programming language
  – the Target is the machine code of a microprocessor
[Diagram: Source (L1) → Target (L2), e.g. C → x86 processor]
Applications of Compiler Techniques
• Potential Source languages include:
  – Natural languages (English, French, …)
  – Circuit layout languages
  – Mark-up languages (HTML, XML, …)
  – Command line languages (SQL interface)
• Potential Target languages include:
  – Natural languages
  – Printer drivers
  – Mark-up languages
• e.g. HTML to RTF converter
  – Could involve many of the aspects we will cover in compiler construction
Compilers for Programming Languages
• If we had 1 compiler for each {Source, Target} pair then we would have a lot of compilers!
[Diagram: Source Languages → Compilers → Target Languages]
Source Languages: C, Prolog, Java, Lisp, Haskell, C++, C#, Fortran, Pascal, Sather
Target Languages: x86 (MMX), JVM, PowerPC 750 (G3), ARM, SPARC, AMD K6
Modularity for Code Generation
[Diagram: Source → Intermediate Representation → Compilers → x86, ARM, G4]
• Compiler portability (man gcc lists different target machines)
Modularity for Source Languages?
[Diagram: Sources (C, Java, Prolog) → Compilers → Intermediate Representation → Targets]
• Typically compilers only compile one source language
  – but the techniques used are very similar and are shared across different compilers
Typical Compiler
[Diagram: Source → Front-end (Analysis) → Intermediate Representation → Back-end (Synthesis) → Target]
• The Intermediate Representation is independent of Source and Target languages
[Timeline under diagram: course, now → week 12]
• Ideally:
  – For a new Source language we can add a new front-end to an existing back-end
  – For a new Target language we can add a new back-end to an existing front-end
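The front-end / back-end split can be sketched in C. This is only an illustration under stated assumptions: the three-operation IR, the names IrInstr, BackEnd, emit_accumulator, emit_stack and generate, and the stack-machine mnemonics (PUSH, POP) are all invented here, not taken from the course or from any real compiler. The IR array passed in stands in for the output of some front-end.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical intermediate representation (IR): three operations only. */
typedef enum { IR_LOAD_CONST, IR_ADD_CONST, IR_STORE } IrOp;
typedef struct { IrOp op; int arg; } IrInstr;

/* A back-end translates one IR instruction into target text. */
typedef int (*BackEnd)(const IrInstr *in, char *out, size_t size);

/* Back-end 1: a single-accumulator machine (LDA / ADD / STA). */
int emit_accumulator(const IrInstr *in, char *out, size_t size)
{
    switch (in->op) {
    case IR_LOAD_CONST: return snprintf(out, size, "LDA #%d\n", in->arg);
    case IR_ADD_CONST:  return snprintf(out, size, "ADD #%d\n", in->arg);
    case IR_STORE:      return snprintf(out, size, "STA %d\n", in->arg);
    }
    return -1;
}

/* Back-end 2: an invented stack machine (PUSH / ADD / POP). */
int emit_stack(const IrInstr *in, char *out, size_t size)
{
    switch (in->op) {
    case IR_LOAD_CONST: return snprintf(out, size, "PUSH #%d\n", in->arg);
    case IR_ADD_CONST:  return snprintf(out, size, "PUSH #%d\nADD\n", in->arg);
    case IR_STORE:      return snprintf(out, size, "POP %d\n", in->arg);
    }
    return -1;
}

/* Shared driver: the same IR can be handed to any back-end.
   Returns 0 on success, -1 on an unknown operation or a full buffer. */
int generate(const IrInstr *ir, size_t n, BackEnd backend,
             char *out, size_t size)
{
    size_t used = 0;
    assert(ir != NULL && out != NULL && size > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w = backend(&ir[i], out + used, size - used);
        if (w < 0 || (size_t)w >= size - used)
            return -1;
        used += strlen(out + used);  /* advance past the text just written */
    }
    return 0;
}
```

Adding a third target would mean writing one more emit function; neither generate nor the code that produces the IR would need to change.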
Front End
• Knowledge about the source language
  – Lexical structure (tokens)
  – Syntax
    • Programming constructs
      – Conditionals, iteration etc.
  – Semantics
    • Type checking
• Error-reporting
  – UI component
    • Often basic (and unhelpful!)
    • May vary if part of an IDE or standalone
[Diagram: Source program → Lexical analyser → Syntax analyser → Semantic analyser, connected to the Symbol table and Error handler]
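As an illustration of what "lexical structure (tokens)" means in practice, here is a minimal hand-written lexical analyser in C. It is a sketch only: the token set and the names TokenKind, Token and next_token are assumptions made for this example, and it recognises just enough to tokenise a statement such as int a = 3;.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical token kinds for a tiny C-like language. */
typedef enum {
    TOK_INT, TOK_IDENT, TOK_NUMBER, TOK_ASSIGN,
    TOK_PLUS, TOK_SEMI, TOK_END, TOK_ERROR
} TokenKind;

typedef struct {
    TokenKind kind;
    const char *start;  /* first character of the lexeme in the input */
    size_t length;      /* number of characters in the lexeme */
} Token;

/* Scan one token starting at *cursor and advance *cursor past it. */
Token next_token(const char **cursor)
{
    Token t;
    const char *p;

    assert(cursor != NULL && *cursor != NULL);
    p = *cursor;
    while (isspace((unsigned char)*p))      /* skip white space */
        p++;
    t.start = p;
    t.length = 1;

    if (*p == '\0') {                        /* end of input */
        t.kind = TOK_END;
        t.length = 0;
    } else if (isdigit((unsigned char)*p)) { /* integer literal */
        while (isdigit((unsigned char)*p))
            p++;
        t.kind = TOK_NUMBER;
        t.length = (size_t)(p - t.start);
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
        t.length = (size_t)(p - t.start);
        /* the only keyword this sketch knows is "int" */
        t.kind = (t.length == 3 && strncmp(t.start, "int", 3) == 0)
                     ? TOK_INT : TOK_IDENT;
    } else {                                 /* single-character tokens */
        switch (*p) {
        case '=': t.kind = TOK_ASSIGN; break;
        case '+': t.kind = TOK_PLUS;   break;
        case ';': t.kind = TOK_SEMI;   break;
        default:  t.kind = TOK_ERROR;  break;
        }
        p++;
    }
    *cursor = p;
    return t;
}
```

Calling next_token repeatedly on "int a = 3;" yields TOK_INT, TOK_IDENT, TOK_ASSIGN, TOK_NUMBER, TOK_SEMI and then TOK_END. A generated lexer (lex / flex) would replace this hand-written function with one produced from regular expressions.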
Back-end
• Knowledge about target processor / virtual machine
  – Instruction set
    • ‘costs’ of different:
      – op-codes
      – instructions
  – Registers
  – Memory
[Diagram: Semantic analyser → Intermediate code generator → Code optimiser → Code generator, connected to the Symbol table manager and Error handler]
Putting it together
[Diagram: Compiler. Source program → Lexical analyser → Syntax analyser → Semantic analyser → Intermediate code generator → Code optimiser → Code generator, connected to the Symbol table and Error handler]
[Diagram: A language-processing system. Skeletal source program → preprocessor → Source program → compiler → Target assembly program → assembler → Relocatable machine code → Loader / link-editor → Absolute machine code]
The Textbook
Compilers: Principles, Techniques, and Tools
Aho, Sethi & Ullman
Addison-Wesley
(‘The Dragon Book’)
Assessment
• Building a compiler for a new language
• Front-end
  – Lexical analysis
  – Parsing
• Back-end
  – Generating assembler code
• Some formal and some practical
  – Formal more at the front-end
Programming & Tools
• Lexical analyser generator: lex / flex
• Parser generator: yacc / bison
• C / C++
  – To implement the remainder of the compiler
• Unix environment
  – makefiles will be useful for coordinating lex and yacc
History of Compilers
• Grace Hopper
  – A-0 1952
  – B-0 (Flow-Matic) 1956
• Fortran compiler: 1957
  – (in the top 10 algorithms of the 20th century)
  – (also quicksort, fast Fourier transform, simplex LP, …)
• BC (before compilers)
  – Low-level programming
  – Assemblers were a major advance
• Q: can an automatic translation to low-level languages be as efficient as writing it directly?
• The only way to show this was to do it, and the Fortran compiler provided a clear ‘yes’ answer
Instant Compilation
• Consider the program:
main()
{ int a = 3; a = a + 1; }
Given a reasonably sensible assembly language, a hand-compilation might be:
LDA #3
STA 1
LDA 1
ADD a, #1
STA 1
& an Instant Compiler could look like …
switch (source_code_construct) {
case INT_DEC:                        /* int a = <value> */
    print("LDA #", INT.value);
    print("STA 1");
    break;
case INT_ADD:                        /* a = a + <value> */
    print("LDA 1");
    print("ADD a,#", ADD.value);
    print("STA 1");
    break;
} /* end switch */
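The pseudocode can be turned into compilable C along the following lines. This is a sketch, not the course's implementation: the Construct struct (replacing INT.value and ADD.value) and the function name instant_compile are assumptions, and the assembly is appended to a string buffer rather than printed so that the result can be inspected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The two constructs the instant compiler recognises. */
typedef enum { INT_DEC, INT_ADD } ConstructKind;

/* Hypothetical record for one recognised construct and its constant. */
typedef struct {
    ConstructKind kind;
    int value;
} Construct;

/* Append the assembly for one construct to the NUL-terminated string
   in out. Returns 0 on success, -1 on an unknown kind or a full buffer. */
int instant_compile(const Construct *c, char *out, size_t size)
{
    size_t used;
    int n;

    assert(c != NULL && out != NULL);
    used = strlen(out);
    if (used >= size)
        return -1;

    switch (c->kind) {
    case INT_DEC:   /* int a = value; */
        n = snprintf(out + used, size - used,
                     "LDA #%d\nSTA 1\n", c->value);
        break;
    case INT_ADD:   /* a = a + value; */
        n = snprintf(out + used, size - used,
                     "LDA 1\nADD a,#%d\nSTA 1\n", c->value);
        break;
    default:
        return -1;
    }
    return (n < 0 || (size_t)n >= size - used) ? -1 : 0;
}
```

Starting from an empty buffer, compiling { INT_DEC, 3 } followed by { INT_ADD, 1 } produces the same five instructions as the hand-compilation.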
The Problems ….
• Not efficient: the whole program could be compiled to just LDA #4; STA 1
• Only works for 1 variable
• Only works at one location in memory
  – (usually let assembler deal with symbolic addresses)
• Only has 2 programming constructs!
• Not even slightly portable:
  – 1 instruction set & 1 source language
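The efficiency point can be made concrete with a small constant-folding sketch in C. It assumes the same two constructs as the instant compiler, a single variable stored at location 1, and constants small enough not to overflow an int; the names Construct and compile_folded are invented for this example. Instead of emitting code per construct, it tracks the variable's value at compile time and emits one load and one store.

```c
#include <assert.h>
#include <stdio.h>

typedef enum { INT_DEC, INT_ADD } ConstructKind;

typedef struct {
    ConstructKind kind;
    int value;
} Construct;

/* Compile a straight-line sequence of constructs on one variable,
   folding all the arithmetic at compile time.
   Returns 0 on success, -1 if the variable is used before it is
   declared, the kind is unknown, or the buffer is too small. */
int compile_folded(const Construct *prog, size_t n, char *out, size_t size)
{
    int known = 0;   /* has the variable been given a value yet? */
    int value = 0;   /* its value, computed by the compiler */
    int w;

    assert(prog != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        switch (prog[i].kind) {
        case INT_DEC:
            value = prog[i].value;
            known = 1;
            break;
        case INT_ADD:
            if (!known)
                return -1;
            value += prog[i].value;
            break;
        default:
            return -1;
        }
    }
    if (!known)
        return -1;
    w = snprintf(out, size, "LDA #%d\nSTA 1\n", value);
    return (w < 0 || (size_t)w >= size) ? -1 : 0;
}
```

For int a = 3; a = a + 1; this emits LDA #4 and STA 1, two instructions instead of five. A real optimiser works on a general intermediate representation rather than on this special case.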
More problems…
• No error reporting
  – type checking?
• Assumes:
  – Program is correct
  – Recognition of programming language constructs
    • int a = 3 → INT_DEC
  – Access to values
    • INT.value, ADD.value
  – 1:1 relationship between integers and memory locations
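The single-variable and fixed-location assumptions are what a symbol table removes. Below is a minimal sketch in C of a table that maps each declared name to its own memory location. The fixed-size array, the limits MAX_SYMBOLS and MAX_NAME, the choice to number locations from 1, and the names symtab_lookup and symtab_declare are all assumptions for illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMBOLS 64
#define MAX_NAME    32

typedef struct {
    char name[MAX_NAME];
    int address;          /* memory location assigned to this variable */
} Symbol;

typedef struct {
    Symbol entries[MAX_SYMBOLS];
    int count;            /* number of declared variables; start at 0 */
} SymbolTable;

/* Return the address of name, or -1 if it has not been declared. */
int symtab_lookup(const SymbolTable *t, const char *name)
{
    assert(t != NULL && name != NULL);
    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->entries[i].name, name) == 0)
            return t->entries[i].address;
    }
    return -1;
}

/* Declare name and give it the next free location (1, 2, 3, ...).
   Returns the new address, or -1 if the table is full, the name is
   too long, or the name is already declared. */
int symtab_declare(SymbolTable *t, const char *name)
{
    Symbol *s;

    assert(t != NULL && name != NULL);
    if (t->count >= MAX_SYMBOLS || strlen(name) >= MAX_NAME ||
        symtab_lookup(t, name) != -1)
        return -1;
    s = &t->entries[t->count];
    strcpy(s->name, name);
    s->address = t->count + 1;
    t->count++;
    return s->address;
}
```

With such a table, a code generator can emit STA followed by the looked-up address instead of the hard-wired STA 1, and a failed lookup gives a natural place to report an undeclared variable.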
Solutions
• We can view compilers as a solution to all of these problems
• E.g.
  – Only compile correct programs to object code
  – Recognise all constructs in the language
  – Improve the efficiency of code
    • Execution speed
    • Memory usage
  – Meaningful error messages to the user
  – Cope with different target architectures
Why are compilers called compilers?
• In early compilers one of the main tasks was connecting the object program to
  – standard library functions, I/O devices
• This meant collecting information from different sources (e.g. libraries)
  – OS and processor dependent
• This is now performed by ‘linkers’
• Compile: ‘construct by collecting from different sources’