lecture1 - overview of compiler
-
8/3/2019 Lecture1 - Overview of Compiler
Overview of Compiler
A compiler is a program (written in a high-level language) that converts (translates, compiles) a source program written in a high-level language into equivalent machine code.
source program -> compiler -> machine code (or object code)
-
What is a Compiler?
Definition: A compiler is a program that translates one language to another
Usually, the translation takes place between a high-level language and a low-level language
Clearly, our first step is to discuss some terminology
-
Terminology
Source language: the language that is being translated
Object language: the language into which the translation is being done
High-level language: a language that is far removed from a computer; one which is close to the problem area(s) for which the language is designed
-
Terminology
Low-level language: a language that is close to the machine (computer) upon which the language will run (execute)
Object language (sometimes called machine code): the language of some computer. This language usually is not human readable (and is expressed in bits or hex)
-
Terminology
Intermediate language: a language that is used either:
because it is a temporary step in the translation process; or,
because it is neither particularly high nor low, and is the output of a translation
Assembly language: a language that translates almost one-to-one to machine language, but is in human-readable form
-
What's a Compiler?...
Today, compilers are written using high-level languages (such as Java, C++, etc.)
The earliest compilers were written using assembly language (e.g., the first FORTRAN compiler, developed between 1954 and 1957, and the first COBOL compilers around 1960)
Sometimes a compiler is written in the same language for which one is writing a compiler. This is done through bootstrapping.
-
Why Should I Learn Compiler Construction?
How do compilers work?
How do computers work? (instruction set, registers, addressing modes, run-time data structures, ...)
What machine code is generated for certain language constructs? (efficiency considerations)
Getting "a feeling" for good language design
-
Why Compilers? A Brief History
The first computers were hard-wired
That is, they were collections of physical devices that connected to one another, in an assemblage designed to calculate particular kinds of results
-
Why Compilers? A Brief History
For example, Babbage's Analytical Engine and his Difference Engine were assemblages of gears that solved numeric problems
The primary driving force was the calculation of ballistics tables for artillery
Jacquard's loom is another example
And Hollerith's work for the US Census Bureau is another
-
Why Compilers? A Brief History
In the late 1940s John von Neumann invented the stored-program computer
The invention is the observation that just as you can store data in the memory of a computer, the data can be machine instructions
Then the computer can not only take its instructions from memory
-
Why Compilers? A Brief History
But the computer can modify the instructions in its memory
And, in fact, can write its own programs, storing them in memory
It quickly became apparent that the simplest way to store information in a computer was in the form of binary numbers
-
Why Compilers? A Brief History
So, to program a computer, you only needed to enter a sequence of binary numbers into memory, and then tell the computer at which memory address to start execution
This was programming in machine language
Instructions (and data) were entered from a console, one word (in binary) at a time
-
Why Compilers? A Brief History
This form of "coding" (note the word!) quickly was replaced by programming in assembly language
A program was written (in machine language) which translated assembly language to machine language (called an assembler)
-
Why Compilers? A Brief History
After the first assembler was written, no one needed to code in machine language any longer
But, coding x = 3; can take many instructions
So, the thought was: "can we create a program that translates something like x = 3; into assembly language or into machine language?"
-
Why Compilers? A Brief History. Formal Languages
About the same time, in the mid-1950s, Noam Chomsky (M.I.T.) began investigating the formal structure of natural languages
His work led to the Chomsky hierarchy of type 0, 1, 2, 3 languages and their associated grammars
-
Why Compilers? A Brief History. Formal Languages
The type 2 (context-free) grammars turned out to be very good at describing computer languages
And, efficient ways to recognize the structure of a source program using a type 2 grammar were developed
Such recognition is called parsing
-
Why Compilers? A Brief History. Formal Languages
Very closely related to context-free grammars are the type 3 grammars
These are the regular grammars, which are equivalent to finite automata and regular expressions
An entire sub-branch of mathematics studies automata; it's called automata theory
-
Why Compilers? A Brief History. Formal Languages
It turns out that type 3 (regular) grammars are very good at describing the "atoms" used in computer languages
These atoms are the reserved words, symbols, and user-defined words that are used in a computer language
Recognizing atoms is called scanning (or lexing)
-
Why Compilers? A Brief History
By far the most difficult and complicated problem has been how to generate object code that is concise and, most importantly, executes efficiently
This is called optimization
-
Why Compilers? A Brief History
Far simpler are the front-end issues of scanning and parsing, i.e., recognizing the source code
This is due to the fact that we've developed (semi-) automatic ways to create scanners and parsers using scanner generators and parser generators
-
Programs Related to Compilers
Interpreters: directly execute the code upon recognition; usually statement by statement
Assemblers: translate assembly language to machine language
Macro Assemblers: ditto, but with (powerful) macro capabilities
-
Programs Related to Compilers
Linkers: combine object modules to produce an executable module
Linkage Editors: manage the linking process, and are able to create/maintain object libraries
-
Programs Related to Compilers
Loaders: load executable modules into memory, and launch execution
Dynamic Loaders: loaders that stay around during execution to handle the loading of DLLs (dynamically loadable libraries)
-
Programs Related to Compilers
Preprocessors: usually a separate program whose input is source code and whose output is source code; perform macro expansion, comment deletion, etc. Sometimes the first phase of a compiler
-
Programs Related to Compilers
Editors: allow the user to create and update source code
Smart Editors: include syntax coloring, parenthesis balancing, etc.
Debuggers: a program that provides an environment in which code may be debugged; including single stepping, symbol tables, etc.
-
Programs Related to Compilers
IDEs: integrated development environments; provide integrated editor-debugger-execution environments
Profilers: collect statistics about where programs spend their time during execution; important for optimizing at the source code level
-
Programs Related to Compilers
Project Managers: programs that help software managers deal with hundreds or thousands of modules; build reports, etc.
SCCS: source code control systems; provide for multiple access to shared code in a controlled manner
-
The Translation Process
The translation process consists of a collection of phases, with the output of one phase feeding the input of the next
The original source code is transformed into a sequence of intermediate representations (IRs) during this process
-
[Figure: The Translation Process]
-
Phases of Compiler
Parallel to all other phases are two activities:
Symbol table manipulation. The symbol table is one of the primary data structures that a compiler uses. This data structure is used by all of the phases.
Error detecting and handling
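To make the symbol table idea concrete, here is a minimal sketch in Python. It assumes one common design (a stack of dictionaries, one per scope); the class name SymbolTable and its methods are hypothetical and are not taken from the lecture. It also shows one simple form of error detection: rejecting a redeclaration in the same scope.

```python
class SymbolTable:
    """Scoped symbol table: a stack of dictionaries, innermost scope last."""

    def __init__(self):
        self.scopes = [{}]  # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_name):
        scope = self.scopes[-1]
        if name in scope:
            # Error detection: redeclaration within the same scope
            raise KeyError(f"'{name}' is already declared in this scope")
        scope[name] = {"type": type_name}

    def lookup(self, name):
        # Search from the innermost scope outwards
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None  # undeclared name


table = SymbolTable()
table.declare("a", "int")
table.declare("b", "int")
table.enter_scope()
table.declare("a", "float")          # shadows the outer 'a'
inner = table.lookup("a")["type"]    # type seen inside the nested scope
table.exit_scope()
outer = table.lookup("a")["type"]    # type seen in the global scope
```

Later phases (semantic analysis, code generation) would consult the same table through lookup, which is why it is described as shared by all phases.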
-
The Scanner
The scanner reads the source program, as a stream of characters, and it performs lexical analysis: collecting sequences of characters into meaningful units called tokens
The scanner also may create a symbol table and a literal table
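A scanner of this kind can be sketched with regular expressions. The following Python example is only an illustration: the token names (NUMBER, IDENT, OP, KEYWORD) and the keyword set are assumptions chosen for the sketch, not the lecture's definitions.

```python
import re

# Token classes, tried in order (names are assumptions for this sketch)
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[=+\-*/();{},]"),
    ("SKIP",   r"\s+"),
]
KEYWORDS = {"int"}
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source):
    """Turn a character stream into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue                      # whitespace produces no token
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"              # reserved words are a separate class
        tokens.append((kind, text))
    return tokens


tokens = scan("int a; a = 100;")
```

Each named group in the combined pattern corresponds to one token class, so match.lastgroup tells the scanner which class matched.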
-
The Parser
The parser reads the tokens produced by the scanner and performs syntactic analysis, creating an IR (a parse tree or a syntax tree) showing the structure of the program
Syntax trees (abstract syntax trees) are reduced representations of the parse tree, with many irrelevant nodes eliminated
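As an illustration of syntactic analysis, the sketch below is a small recursive-descent parser in Python that builds an abstract syntax tree as nested tuples. The grammar (only +, *, and parentheses) and the tuple layout are assumptions made for this example; real parsers are often produced by parser generators instead.

```python
def parse_expression(tokens):
    """Parse tokens like ['7', '+', '9', '*', 'x'] into a nested-tuple syntax tree.

    Grammar (an assumption made for this sketch):
        expr   -> term   ('+' term)*
        term   -> factor ('*' factor)*
        factor -> NUMBER | IDENT | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node               # parentheses leave no node in the tree
        advance()
        if token.isdigit():
            return ("num", int(token))
        if token.isidentifier():
            return ("var", token)
        raise SyntaxError(f"unexpected token {token!r}")

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


tree = parse_expression(["7", "+", "9", "*", "x"])
grouped = parse_expression(["(", "1", "+", "2", ")", "*", "3"])
```

Note that the parentheses in the second input do not appear in the resulting tree: the grouping is carried by the tree shape itself, which is the sense in which a syntax tree drops irrelevant nodes.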
-
The Semantic Analyzer
The semantics of a program are its meaning: what it is intended to accomplish
The semantic analyzer creates an intermediate data structure that contains this meaning; these are the static semantics
The dynamic semantics of a program only can be determined by executing the program
-
The Semantic Analyzer
An example of the static semantics of a program is the data types of the variables (and expressions)
These static semantics usually are represented in the intermediate representations (IRs) as attributes
The IR usually is a tree, decorated with these attributes
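The idea of a tree decorated with attributes can be sketched as follows. This Python example assumes a dictionary-based tree and a deliberately simple typing rule (int combined with int gives int, anything involving float gives float); both are assumptions for illustration, not the lecture's rules.

```python
def annotate(node, symbols):
    """Return a copy of the tree in which every node carries a 'type' attribute."""
    kind = node["kind"]
    if kind == "num":
        return {**node, "type": "float" if isinstance(node["value"], float) else "int"}
    if kind == "var":
        if node["name"] not in symbols:
            raise NameError(f"undeclared variable {node['name']!r}")
        return {**node, "type": symbols[node["name"]]}
    if kind == "binop":
        left = annotate(node["left"], symbols)
        right = annotate(node["right"], symbols)
        # Assumed typing rule: int op int is int; anything involving float is float
        result = "int" if left["type"] == right["type"] == "int" else "float"
        return {**node, "left": left, "right": right, "type": result}
    raise ValueError(f"unknown node kind {kind!r}")


symbols = {"a": "int", "x": "float"}   # declared types, as a symbol table would hold them
tree = {"kind": "binop", "op": "+",
        "left": {"kind": "var", "name": "a"},
        "right": {"kind": "binop", "op": "*",
                  "left": {"kind": "num", "value": 3},
                  "right": {"kind": "var", "name": "x"}}}
typed = annotate(tree, symbols)
```

All of this is computed without running the program, which is what makes the types part of the static semantics.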
-
(Source) Code Optimization
Optimization may occur during several phases
Source code optimization rearranges the source (or the IR of the source) in order to produce more efficient results
E.g., x = 7 + 9; can become x = 16;
This is called constant folding
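Constant folding can be sketched as a bottom-up rewrite of an expression tree. The Python example below assumes a nested-tuple tree of the form ("num", value), ("var", name), or (operator, left, right); this representation is an assumption for the sketch.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Fold constant sub-expressions in a nested-tuple expression tree."""
    if node[0] in ("num", "var"):
        return node                                   # leaves are already as simple as possible
    op, left, right = node[0], fold(node[1]), fold(node[2])
    if left[0] == "num" and right[0] == "num":
        return ("num", OPS[op](left[1], right[1]))    # evaluate at compile time
    return (op, left, right)


folded = fold(("+", ("num", 7), ("num", 9)))                          # 7 + 9
partial = fold(("*", ("var", "y"), ("+", ("num", 7), ("num", 9))))    # y * (7 + 9)
```

In the second call only the constant sub-expression is folded; the multiplication by y must remain because the value of y is not known until run time.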
-
(Source) Code Optimization
Duplicated computations can be saved as temporaries and then their values re-used
Recursion can be converted to iteration
Repeated calculations can be moved out of loops
The possibilities are endless
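Two of these transformations are shown below, applied by hand to Python source purely for illustration; a compiler would perform them on its IR. The function names are hypothetical.

```python
def scaled_sum_naive(values, a, b):
    total = 0
    for v in values:
        total += v * (a * a + b)   # a * a + b is recomputed on every iteration
    return total


def scaled_sum_hoisted(values, a, b):
    factor = a * a + b             # loop-invariant: computed once, saved in a temporary
    total = 0
    for v in values:
        total += v * factor
    return total


def factorial_recursive(n):
    return 1 if n <= 1 else n * factorial_recursive(n - 1)


def factorial_iterative(n):
    result = 1                     # same computation, with the recursion turned into a loop
    for k in range(2, n + 1):
        result *= k
    return result
```

In both pairs the optimized version computes the same result as the original; preserving behavior is the constraint every optimization must respect.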
-
The Code Generator
The code generator takes the IR and generates code for the target machine
Here the details of how various numeric and non-numeric quantities are represented become important
E.g., word length, hardware stack, hardware calling conventions, memory access, etc.
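As a rough illustration of code generation, the following Python sketch walks a nested-tuple tree and emits instructions for a hypothetical stack machine. The instruction names (PUSH, LOAD, STORE, ADD, SUB, MUL) are invented for this example; a real code generator targets an actual instruction set and must handle the machine details listed above.

```python
def generate(node, code=None):
    """Emit instructions for a hypothetical stack machine (post-order traversal)."""
    if code is None:
        code = []
    kind = node[0]
    if kind == "num":
        code.append(f"PUSH {node[1]}")
    elif kind == "var":
        code.append(f"LOAD {node[1]}")
    elif kind == "assign":            # ("assign", name, expression)
        generate(node[2], code)
        code.append(f"STORE {node[1]}")
    else:                             # binary operator: ("+", left, right)
        generate(node[1], code)
        generate(node[2], code)
        code.append({"+": "ADD", "-": "SUB", "*": "MUL"}[kind])
    return code


# x = a + 3 * b
program = generate(("assign", "x", ("+", ("var", "a"), ("*", ("num", 3), ("var", "b")))))
```

Operands are emitted before their operator, so each instruction finds its inputs already on the stack.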
-
The Target Code Optimizer
The target code optimizer examines the emitted target code to see if further possibilities for optimization are present and then capitalizes upon them
E.g., reuse of registers, using a shift instruction to replace a multiplication or division, etc.
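The shift-for-multiplication idea can be sketched as a peephole pass over emitted instructions. This Python example assumes instructions are (opcode, register, operand) tuples with hypothetical opcodes MUL and SHL, and it handles only multiplication by a positive power of two; division needs more care with signed values and is left out.

```python
def peephole(instructions):
    """Replace multiplication by a power of two with a left shift."""
    optimized = []
    for op, register, operand in instructions:
        is_power_of_two = (
            isinstance(operand, int)
            and operand > 0
            and (operand & (operand - 1)) == 0   # exactly one bit set
        )
        if op == "MUL" and is_power_of_two:
            # operand == 2 ** shift, so multiplying equals shifting left by 'shift' bits
            optimized.append(("SHL", register, operand.bit_length() - 1))
        else:
            optimized.append((op, register, operand))
    return optimized


result = peephole([("MUL", "r1", 8), ("MUL", "r2", 6), ("ADD", "r1", 1)])
```

Only the first instruction is rewritten: 8 is a power of two, whereas 6 is not, and the ADD is left untouched.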
-
Phases of the compiler
Source Program
-> Lexical Analyzer (Scanner) -> Tokens
-> Syntax Analyzer (Parser) -> Parse Tree
-> Semantic Analyzer -> Abstract Syntax Tree with attributes
-
Sample Program Compiled
Consider the example:
int a, b
{
    a = 100;
    b = f(a) + 3
}
Source Program -> Lexical Analyzer -> Token stream
-
Sample Program Compiled
Tokens are entities defined by the compiler writer which are of interest. A sequence of characters with a collective meaning is grouped to form a token.
Examples of tokens:
Single-character operators: = + - * >