CPSC 325 - Compiler Tutorial 9: Review of Compiler
Post on 18-Dec-2015
Compiler and compilation
A high-level programming language is usually described in terms of a grammar
– The grammar specifies the form, or syntax, of legal statements in the language
– Compilation = matching statements written by the programmer to structures defined by the grammar and generating the appropriate object code for each statement
We can see a source program as a sequence of tokens
– Keywords, variables, block, etc.
Lexical analysis/scanner
The task of scanning the source statement, recognizing and classifying the various tokens
The scanner is the part of the compiler that performs lexical analysis
It helps the parser to parse and makes the parser run more efficiently
Parser
Each statement in the program is recognized as some language construct, such as a declaration or an assignment statement, described by the grammar.
Symbol table/Analyzer (Optional)
Builds the symbol table and allocates the memory locations for the program. It can be a very messy task.
There are many different ways to implement it
The symbol table is used throughout the whole program
Once the locations have been allocated, we do NOT need the symbol table anymore
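One of the many possible implementations, as a minimal sketch: a dictionary keyed by identifier name that hands out consecutive memory offsets. The class name `SymbolTable`, the methods `declare` and `lookup`, and the fixed size of 4 per variable are assumptions made for illustration, not something the tutorial prescribes.

```python
class SymbolTable:
    """Maps each identifier to its type and an allocated memory offset."""

    def __init__(self):
        self.entries = {}      # name -> entry dictionary
        self.next_offset = 0   # next free memory location

    def declare(self, name, type_name, size=4):
        """Add a new identifier and allocate `size` units of memory for it."""
        if name in self.entries:
            raise KeyError(f"{name} is already declared")
        entry = {"name": name, "type": type_name, "offset": self.next_offset}
        self.entries[name] = entry
        self.next_offset += size
        return entry

    def lookup(self, name):
        """Return the entry for a declared identifier (KeyError if unknown)."""
        return self.entries[name]


table = SymbolTable()
table.declare("a", "int")
table.declare("b", "int")
```

After the two declarations, `a` sits at offset 0 and `b` at offset 4; later phases only need those offsets.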
Code generator
Generates the object/target code
Sometimes the target/object code is optimized by the optimizer
Note1: the optimizer is totally optional
Note2: It is possible to compile a program in a single pass.
Note3: Compilers that perform code optimization generally make several passes
Compiler ideas
Compilers divide their problem into steps or passes to conquer it
Initial pass takes the source program as input
The last pass outputs the code for execution
Passes
Pass 1: Preprocessor
– Macro substitution
– Strips comments from source code
Pass 2: Lexical analyzer, Parser, Code generator
– Heart of the compiler
– Translates source into a platform-independent language much like assembler (intermediate code)
Passes (cont.)
Pass 3: Optimizer
– Improves the quality of the intermediate code
Pass 4: Back end
– Translates the optimized code to real assembler language or directly to some binary executable code
– Provides target independence for earlier phases
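To make the first pass concrete, here is a rough sketch of a preprocessor that does macro substitution and strips comments. It assumes C-style `#define NAME value` macros and `//` line comments, and it ignores string literals entirely; the function name `preprocess` is made up for this example.

```python
import re


def preprocess(source):
    """Pass 1 sketch: expand simple macros and strip // comments."""
    macros = {}   # macro name -> replacement text
    output = []
    for line in source.splitlines():
        # Strip a trailing // comment (naive: does not respect string literals).
        line = line.split("//", 1)[0].rstrip()
        if line.startswith("#define "):
            # Assumes the form "#define NAME value" with a value present.
            _, name, value = line.split(None, 2)
            macros[name] = value
            continue
        for name, value in macros.items():
            # Replace whole-word occurrences of the macro name only.
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, v=value: v, line)
        if line:
            output.append(line)
    return "\n".join(output)


cleaned = preprocess("#define N 10\nx = N + 1 // add one\n// just a comment\ny = NN")
```

The result is plain source text again, which is what the next pass takes as input.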
Lexical analyzer/Scanner
Scanning the program to be compiled and recognizing the tokens that make up the source statements
Converts the incoming source into a series of basic language elements
– A = B + 3 has 5 tokens
Tokens have meaning and are indivisible
– In C, “while” is one token; you can’t split it into “wh” and “ile”
– Can be placed into the symbol table and have information associated with them
– Type, value, name, relationship to other structures
– Can be referenced by a unique integer for later usage
Lexical Analyzer/Scanner (cont)
Scanners are usually designed to recognize keywords, operators, and identifiers, as well as integers, floating-point numbers and others
The “Longest Match Rule”: unless otherwise stated, the scanner matches the longest token it knows. (For example, >> is ONE token, NOT > followed by >)
Variables are recognized as ONE token instead of many characters
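A small hand-written scanner can illustrate the longest match rule. This is only a sketch: the keyword set, the operator list and the token class names (`KEYWORD`, `IDENT`, `INT`, `OP`) are assumptions chosen for the example, and floating-point numbers are left out.

```python
KEYWORDS = {"while", "if", "else"}
# Sorted longest first so that ">>" is tried before ">".
OPERATORS = sorted([">>", "<<", ">=", "<=", "==", ">", "<", "=", "+", "-", "*", "/"],
                   key=len, reverse=True)


def scan(source):
    """Return a list of (token_class, lexeme) pairs."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha() or ch == "_":
            # Consume the whole word: a variable is ONE token.
            j = i
            while j < len(source) and (source[j].isalnum() or source[j] == "_"):
                j += 1
            word = source[i:j]
            tokens.append(("KEYWORD" if word in KEYWORDS else "IDENT", word))
            i = j
        elif ch.isdigit():
            j = i
            while j < len(source) and source[j].isdigit():
                j += 1
            tokens.append(("INT", source[i:j]))
            i = j
        else:
            for op in OPERATORS:
                if source.startswith(op, i):
                    tokens.append(("OP", op))
                    i += len(op)
                    break
            else:
                raise SyntaxError(f"unexpected character {ch!r} at position {i}")
    return tokens
```

For `A = B +3` this yields five tokens, and `x >> 2` yields a single `>>` operator token rather than two `>` tokens.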
Lexical Analyzer/Scanner (cont)
The output of the scanner is a sequence of token codes
Token specifier: gives the identifier name, value, etc., that was found by the scanner
– Some scanners are designed to enter identifiers directly into a symbol table
– The token specifier for an identifier might be a pointer to the symbol-table entry for that identifier
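As an illustration of token codes and specifiers, the sketch below turns already-separated lexemes into (code, specifier) pairs and enters identifiers directly into a symbol table. The integer codes `IDENT`, `INT`, `OP` and the function name `encode` are hypothetical; a list index stands in for the "pointer" to the symbol-table entry.

```python
IDENT, INT, OP = 1, 2, 3   # hypothetical token codes


def encode(lexemes):
    """Return (coded_tokens, symbol_table) for a list of lexeme strings."""
    symbols = []    # symbol table: list position = entry number
    index_of = {}   # identifier name -> entry number
    coded = []
    for lex in lexemes:
        if lex.isdigit():
            coded.append((INT, int(lex)))            # specifier = the value
        elif lex.isidentifier():
            if lex not in index_of:                  # first occurrence: new entry
                index_of[lex] = len(symbols)
                symbols.append({"name": lex})
            coded.append((IDENT, index_of[lex]))     # specifier = table index
        else:
            coded.append((OP, lex))                  # specifier = the operator
    return coded, symbols


coded, symbols = encode(["A", "=", "B", "+", "A"])
```

Both occurrences of `A` carry the same specifier, so later phases can attach type or address information to a single entry.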
Parser
The parser analyses the source grammatically to determine whether it meets the language specification and to develop a representation better suited to code generation
Parser invokes the lexical analyzer to get the next token (reference into symbol table) and its corresponding lexeme
Check the syntax of a sentence
Parser (cont)
To summarize
– The parser breaks the token stream into a parse tree
– The parse tree is a structural representation of the sentence or program being parsed
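A minimal sketch of a parser that builds a tree for one kind of statement. The tiny grammar (assign ::= name "=" expr, expr ::= term {("+" | "-") term}) and the nested-tuple tree shape are assumptions for this example only; tokens are passed in as plain strings.

```python
def parse_assignment(tokens):
    """Parse e.g. ["A", "=", "B", "+", "3"] into a nested-tuple parse tree."""
    pos = 0

    def next_token():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def term():
        tok = next_token()
        if not tok.isalnum():
            raise SyntaxError(f"expected an operand, got {tok!r}")
        return tok

    def expr():
        node = term()
        # Each further operator makes the tree built so far its left child.
        while pos < len(tokens) and tokens[pos] in ("+", "-"):
            op = next_token()
            node = (op, node, term())
        return node

    target = next_token()
    if not target.isidentifier():
        raise SyntaxError(f"expected a variable name, got {target!r}")
    if next_token() != "=":
        raise SyntaxError("expected '='")
    tree = ("=", target, expr())
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return tree


tree = parse_assignment(["A", "=", "B", "+", "3"])
```

Here `tree` is `("=", "A", ("+", "B", "3"))`: the root is the assignment, and the expression is a subtree.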
Analyzer and Symbol Table
Omitted, since not everyone in the class does it
The analyzer generates the symbol table for later use
Code Generator
The last task of compilation: generation of object code
Most compilers generate the output of the code generator as the parse progresses instead of leaving it until after a parse tree is built
Small parts of the parse tree fill in code templates that are generated by the code generator
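The idea of emitting code while parsing, with small pieces of the statement filling in templates, can be sketched as follows. The instruction names (`LOAD`, `PUSH`, `ADD`, `SUB`, `STORE`) describe a hypothetical stack machine, and the statement form is limited to `name = operand {op operand}`; none of this comes from a real target.

```python
# Hypothetical code templates; "{0}" is filled in from the token being parsed.
TEMPLATES = {
    "load": "LOAD {0}",
    "const": "PUSH {0}",
    "store": "STORE {0}",
    "+": "ADD",
    "-": "SUB",
}


def compile_assignment(tokens):
    """Emit code for e.g. ["A", "=", "B", "+", "3"] while parsing it."""
    code = []
    pos = 2   # parsing resumes after "name ="

    def emit(template, *args):
        code.append(TEMPLATES[template].format(*args))

    def operand():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("expected an operand")
        tok = tokens[pos]
        pos += 1
        emit("const" if tok.isdigit() else "load", tok)

    if len(tokens) < 3 or tokens[1] != "=":
        raise SyntaxError("expected: name = expression")
    operand()
    while pos < len(tokens):
        op = tokens[pos]
        if op not in ("+", "-"):
            raise SyntaxError(f"unknown operator {op!r}")
        pos += 1
        operand()    # code for the right operand is emitted first
        emit(op)     # then the operator template
    emit("store", tokens[0])
    return code


code = compile_assignment(["A", "=", "B", "+", "3"])
```

No parse tree is ever stored: each construct produces its instructions as soon as it is recognized.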
Code generator (cont)
The code generator can generate
– An executable
– Advantage: fast
– Some aspects of optimization can still take place by observing the final linear instruction stream
OR
– An intermediate language representation that is close to assembler but has additional information
– Makes it easier for optimizers to perform further optimizations to generate faster code
Intermediate Language
All code generation is machine dependent as we must know the instruction set of a computer to generate code for it
Intermediate form: syntax and semantics of the source statements have been completely analyzed, but the actual translation into machine code has not yet been performed.
Transportable: from one machine to another (Intel, Motorola, etc.)
Processed by interpreters (TM, JM – byte code)
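A sketch of how an interpreter might process an intermediate form. The stack-based instructions used here (`PUSH`, `LOAD`, `STORE`, `ADD`, `SUB`) are a made-up intermediate language for illustration, not the byte code of any real machine.

```python
def run(code, memory):
    """Interpret a list of instruction strings against a variable store."""
    stack = []
    for instr in code:
        op, _, arg = instr.partition(" ")
        if op == "PUSH":
            stack.append(int(arg))        # push a constant
        elif op == "LOAD":
            stack.append(memory[arg])     # push a variable's value
        elif op == "STORE":
            memory[arg] = stack.pop()     # pop into a variable
        elif op in ("ADD", "SUB"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if op == "ADD" else left - right)
        else:
            raise ValueError(f"unknown instruction {instr!r}")
    return memory


result = run(["LOAD B", "PUSH 3", "ADD", "STORE A"], {"B": 4})
```

Because only the interpreter knows about the real machine, the same intermediate code can be carried to any platform that has such an interpreter.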
BNF – Backus Naur Form
Describes the grammar of a language
– Set of tokens called terminal symbols
– For things like numbers, keywords, predefined symbols
– Set of definitions called non-terminal symbols
– For example: a ::= b | c (a is either b or c)
– Definitions create a system in which every legal structure can be represented
– Grammars are typically recursive, so recursion can be used to parse the grammar
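To show how a recursive grammar maps onto recursive code, take this small example grammar (an assumption for illustration, not one from the course):

    expr ::= term | term "+" expr
    term ::= NUMBER | "(" expr ")"

Each non-terminal becomes one function, and the recursion in the grammar becomes recursion in the code. The function name `matches_expr` is hypothetical.

```python
def matches_expr(tokens):
    """Return True if the token list is a legal expr under the grammar above."""

    def expr(pos):
        # expr ::= term | term "+" expr
        pos = term(pos)
        if pos < len(tokens) and tokens[pos] == "+":
            return expr(pos + 1)
        return pos

    def term(pos):
        # term ::= NUMBER | "(" expr ")"
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        if tok.isdigit():
            return pos + 1
        if tok == "(":
            pos = expr(pos + 1)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("expected ')'")
            return pos + 1
        raise SyntaxError(f"unexpected token {tok!r}")

    try:
        # Legal only if the whole input was consumed.
        return expr(0) == len(tokens)
    except SyntaxError:
        return False


ok = matches_expr(["1", "+", "(", "2", "+", "3", ")"])
```

Each function returns the position just past what it matched, so nested parentheses are handled by `term` calling back into `expr`.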
Summary
Compilers can recognize when templates or objects are instantiated and destroyed
– These are part of the language definition
– Once the pattern is matched, it can output intermediate level code to support these operations
– Template parameters can be filled in
– Calls are made to appropriate routines to construct/destroy objects