C Program Compiler Presentation
Rigvendra Kumar Vardhan
M.TECH ECE
Pondicherry University
- Interpreters (direct execution)
- Assemblers
- Preprocessors
- Text formatters (non-WYSIWYG)
- Analysis tools
CS 540 Spring 2013 GMU 2
Interpreter: a program that reads a source program and produces the results of executing that program.
Compiler: a program that translates a program from one language (the source) to another (the target).
- Correctness
- Speed (runtime and compile time)
  - Degrees of optimization
  - Multiple passes
- Space
- Feedback to user
- Debugging
Interpreter
- Execution engine
- Program execution interleaved with analysis:
    running = true;
    while (running) {
        analyze next statement;
        execute that statement;
    }
- May involve repeated analysis of some statements (loops, functions)
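The loop above can be sketched in Python as follows. This is a minimal illustration, not a real interpreter: the toy language (statements of the form `name = integer` and `print name`, separated by semicolons) and the function name `interpret` are assumptions made only for this example.

```python
def interpret(source):
    """Analyze and execute one statement at a time; return the printed values."""
    env = {}        # variable name -> integer value
    output = []     # values produced by `print` statements
    statements = [s.strip() for s in source.split(";") if s.strip()]
    pc = 0          # index of the next statement
    running = True
    while running:
        if pc == len(statements):
            running = False
            continue
        # Analysis: classify the statement. This work is repeated every
        # time the statement is reached, which is the interpreter overhead.
        words = statements[pc].split()
        if len(words) == 3 and words[1] == "=":
            # Execution of an assignment (toy language: integer literals only).
            env[words[0]] = int(words[2])
        elif len(words) == 2 and words[0] == "print":
            # Execution of a print statement.
            output.append(env[words[1]])
        else:
            raise SyntaxError(f"cannot analyze statement: {statements[pc]!r}")
        pc += 1
    return output
```

Each pass through the loop analyzes a statement and then executes it immediately, so nothing is translated ahead of time.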
Compiler
- Read and analyze entire program
- Translate to semantically equivalent program in another language
  - Presumably easier to execute or more efficient
  - Should “improve” the program in some fashion
- Offline process
  - Tradeoff: compile time overhead (preprocessing step) vs execution performance
Compilers
- FORTRAN, C, C++, Java, COBOL, etc.
- Strong need for optimization, etc.
Interpreters
- Perl, Python, awk, sed, sh, csh, PostScript printer, Java VM
- Effective if interpreter overhead is low relative to execution cost of language statements
Source code -> Compiler -> Assembly code -> Assembler -> Object code (machine code) -> Linker -> Fully-resolved object code (machine code) -> Loader -> Executable image
- Series of program representations
- Intermediate representations optimized for program manipulations of various kinds (checking, optimization)
- Become more machine-specific, less language-specific as translation proceeds
First approximation
- Front end: analysis
  - Read source program and understand its structure and meaning
- Back end: synthesis
  - Generate equivalent target language program
Source -> Front End -> Back End -> Target
- Must recognize legal programs (& complain about illegal ones)
- Must generate correct code
- Must manage storage of all variables
- Must agree with OS & linker on target format
- Need some sort of Intermediate Representation (IR)
- Front end maps source into IR
- Back end maps IR to target machine code
Source language -> Scanner (lexical analysis) -> tokens -> Parser (syntax analysis) -> Syntactic structure -> Semantic Analysis (IC generator) -> Intermediate Language -> Code Optimizer -> Intermediate Language -> Code Generator -> Target language
Symbol Table: shared by the phases above
- Lexical Analysis
- Syntax Analysis
- Semantic Analysis
- Runtime environments
- Code Generation
- Code Optimization
Source code (character stream) -> Lexical analysis -> Token stream -> Parsing -> Abstract syntax tree -> Intermediate Code Generation -> Intermediate code -> Optimization -> Intermediate code -> Code generation -> Assembly code
The earlier stages form the front end (machine-independent); the later stages form the back end (machine-dependent).
Front end: split into two parts
- Scanner: responsible for converting character stream to token stream
  - Also strips out white space, comments
- Parser: reads token stream; generates IR
Both of these can be generated automatically
- Source language specified by a formal grammar
- Tools read the grammar and generate scanner & parser (either table-driven or hard coded)
source -> Scanner -> tokens -> Parser -> IR
Token stream: each significant lexical chunk of the program is represented by a token
- Operators & punctuation: {}[]!+-=*;: …
- Keywords: if while return goto
- Identifiers: id & actual name
- Constants: kind & value; int, floating-point, character, string, …
Input text
    // this statement does very little
    if (x >= y) y = 42;
Token stream
    IF LPAREN ID(x) GEQ ID(y) RPAREN ID(y) BECOMES INT(42) SCOLON
Note: tokens are atomic items, not character strings
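A scanner for this one statement can be sketched with Python's standard `re` module. The token names follow the slide; the regular expressions, the `(kind, value)` pair representation, and the function name `scan` are assumptions made for illustration, and only the tokens needed by the example are covered.

```python
import re

# Token names follow the slide; the patterns themselves are assumptions.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),       # stripped by the scanner
    ("SKIP",    r"\s+"),            # white space, also stripped
    ("IF",      r"if\b"),           # keyword, tried before ID
    ("INT",     r"\d+"),
    ("ID",      r"[A-Za-z_]\w*"),
    ("GEQ",     r">="),             # tried before BECOMES
    ("BECOMES", r"="),
    ("LPAREN",  r"\("),
    ("RPAREN",  r"\)"),
    ("SCOLON",  r";"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(text):
    """Convert a character stream into a list of (kind, value) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind = match.lastgroup
        if kind == "ID":
            tokens.append(("ID", match.group()))        # identifier keeps its name
        elif kind == "INT":
            tokens.append(("INT", int(match.group())))  # constant keeps its value
        elif kind not in ("COMMENT", "SKIP"):
            tokens.append((kind, None))                 # atomic token, no text attached
        pos = match.end()
    return tokens
```

Applied to the input text above, `scan` yields the ten tokens shown in the token stream; the comment and the white space produce no tokens.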
- Many different forms (engineering tradeoffs)
- Common output from a parser is an abstract syntax tree
  - Essential meaning of the program without the syntactic noise
Token stream input
    IF LPAREN ID(x) GEQ ID(y) RPAREN ID(y) BECOMES INT(42) SCOLON
Abstract syntax tree
    ifStmt
        >=
            ID(x)
            ID(y)
        assign
            ID(y)
            INT(42)
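The step from token stream to abstract syntax tree can be sketched as a small recursive-descent parser. This is only an illustration under stated assumptions: the `Parser` class, its method names, and the nested-tuple tree format are hypothetical, the token list is written out by hand so the block stands alone, and only the constructs in the example statement are handled.

```python
# Tokens are (kind, value) pairs, written out by hand for this example.
TOKENS = [
    ("IF", None), ("LPAREN", None), ("ID", "x"), ("GEQ", None), ("ID", "y"),
    ("RPAREN", None), ("ID", "y"), ("BECOMES", None), ("INT", 42), ("SCOLON", None),
]

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Kind of the next token, or "EOF" when the input is exhausted."""
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def expect(self, kind):
        """Consume and return the next token, which must have the given kind."""
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()}")
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def statement(self):
        if self.peek() == "IF":
            self.expect("IF")
            self.expect("LPAREN")
            condition = self.comparison()
            self.expect("RPAREN")
            return ("ifStmt", condition, self.statement())
        target = self.expect("ID")
        self.expect("BECOMES")
        value = self.operand()
        self.expect("SCOLON")
        return ("assign", target, value)

    def comparison(self):
        left = self.operand()
        self.expect("GEQ")
        return (">=", left, self.operand())

    def operand(self):
        return self.expect("INT") if self.peek() == "INT" else self.expect("ID")

ast = Parser(TOKENS).statement()
```

The parentheses and the semicolon are checked but leave no node in `ast`, which is the sense in which the tree drops syntactic noise.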
- During or (more common) after parsing
- Type checking
  - Check for language requirements like “declare before use”, type compatibility
- Preliminary resource allocation
- Collect other information needed by back end analysis and code generation
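The two checks named above, declare before use and type compatibility, can be sketched as a recursive walk over a tuple-based syntax tree. The tree format, the type names "int" and "boolean", the symbol-table dictionary, and the function name `check` are assumptions for illustration only.

```python
# Hand-written tree for: if (x >= y) y = 42;
AST = ("ifStmt", (">=", ("ID", "x"), ("ID", "y")), ("assign", ("ID", "y"), ("INT", 42)))

def check(node, symbols):
    """Return the type of `node`; `symbols` maps declared names to types."""
    kind = node[0]
    if kind == "INT":
        return "int"
    if kind == "ID":
        name = node[1]
        if name not in symbols:
            # "declare before use" violation
            raise NameError(f"{name} used before declaration")
        return symbols[name]
    if kind == ">=":
        left, right = check(node[1], symbols), check(node[2], symbols)
        if left != "int" or right != "int":
            raise TypeError(">= needs int operands")
        return "boolean"
    if kind == "assign":
        target, value = check(node[1], symbols), check(node[2], symbols)
        if target != value:
            raise TypeError(f"cannot assign {value} to {target}")
        return target
    if kind == "ifStmt":
        if check(node[1], symbols) != "boolean":
            raise TypeError("if condition must be boolean")
        check(node[2], symbols)
        return None     # a statement has no type in this sketch
    raise ValueError(f"unknown node kind {kind!r}")
```

Calling `check(AST, {"x": "int", "y": "int"})` succeeds, while omitting `y` from the symbol table raises `NameError`.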
Back end responsibilities
- Translate IR into target machine code
- Should produce fast, compact code
- Should use machine resources effectively
  - Registers
  - Instructions
  - Memory hierarchy
Typically split into two major parts with sub phases
- “Optimization”: code improvements
  - May well translate parser IR into another IR
- Code generation
  - Instruction selection & scheduling
  - Register allocation
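As one small instance of a code improvement, the sketch below performs constant folding on a tuple-based expression tree. Constant folding is chosen here only for illustration; the slides do not name a specific optimization, and the tree format and the function name `fold` are assumptions.

```python
def fold(node):
    """Replace + and * nodes whose operands are both constants by their value."""
    if node[0] in ("INT", "ID"):
        return node                     # leaves are already as simple as possible
    op, left, right = node[0], fold(node[1]), fold(node[2])
    if left[0] == "INT" and right[0] == "INT":
        if op == "+":
            return ("INT", left[1] + right[1])
        if op == "*":
            return ("INT", left[1] * right[1])
    return (op, left, right)            # operands improved, operator kept
```

The result is another tree of the same form, so the output of this pass can feed further passes or code generation.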
Input
    if (x >= y) y = 42;
Output
    mov eax,[ebp+16]
    cmp eax,[ebp-8]
    jl L17
    mov [ebp-8],42
L17:
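The mapping from syntax tree to the instructions above can be sketched as a tiny code generator. It handles only this one statement shape; the frame offsets are read off the slide's output rather than computed by storage allocation, and the names `OFFSETS`, `address`, and `gen_if` are hypothetical.

```python
# Frame offsets relative to ebp, copied from the slide's output (assumption:
# a real compiler would obtain them from storage allocation).
OFFSETS = {"x": 16, "y": -8}

def address(name):
    """Format the stack slot of a variable, e.g. [ebp+16] or [ebp-8]."""
    return f"[ebp{OFFSETS[name]:+d}]"

def gen_if(ast, label):
    """Emit code for: if (ID >= ID) ID = INT; given as a nested tuple."""
    _, (op, left, right), (_, target, value) = ast
    if op != ">=":
        raise ValueError("this sketch only handles >= conditions")
    return [
        f"mov eax,{address(left[1])}",          # load the left operand
        f"cmp eax,{address(right[1])}",         # compare with the right operand
        f"jl {label}",                          # skip the body when left < right
        f"mov {address(target[1])},{value[1]}", # the assignment
        f"{label}:",
    ]

AST = ("ifStmt", (">=", ("ID", "x"), ("ID", "y")), ("assign", ("ID", "y"), ("INT", 42)))
code = gen_if(AST, "L17")
```

The branch tests the opposite of the source condition (`jl` for `>=`) because it jumps past the body rather than into it.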
Unoptimized code
    lda $30,-32($30)
    stq $26,0($30)
    stq $15,8($30)
    bis $30,$30,$15
    bis $16,$16,$1
    stl $1,16($15)
    lds $f1,16($15)
    sts $f1,24($15)
    ldl $5,24($15)
    bis $5,$5,$2
    s4addq $2,0,$3
    ldl $4,16($15)
    mull $4,$3,$2
    ldl $3,16($15)
    addq $3,1,$4
    mull $2,$4,$2
    ldl $3,16($15)
    addq $3,1,$4
    mull $2,$4,$2
    stl $2,20($15)
    ldl $0,20($15)
    br $31,$33
$33:
    bis $15,$15,$30
    ldq $26,0($30)
    ldq $15,8($30)
    addq $30,32,$30
    ret $31,($26),1
Optimized code
    s4addq $16,0,$0
    mull $16,$0,$0
    addq $16,1,$16
    mull $0,$16,$0
    mull $0,$16,$0
    ret $31,($26),1
Source code (character stream)
    if (b == 0) a = b;
  -> Lexical analysis
Token stream
    if ( b == 0 ) a = b ;
  -> Parsing
Abstract syntax tree (AST)
    if
        ==
            b
            0
        =
            a
            b
  -> Semantic Analysis
Decorated AST
    if
        == (boolean)
            int b
            int 0
        = (int)
            int a lvalue
            int b
Decorated AST
    if
        == (boolean)
            int b
            int 0
        = (int)
            int a lvalue
            int b
  -> Intermediate Code Generation
Intermediate code
    CJUMP ==
        MEM(fp + 8)
        CONST 0
        MOVE
            MEM(fp + 4)
            MEM(fp + 8)
        NOP
  -> Optimization
Intermediate code
    CJUMP ==
        CX
        CONST 0
        MOVE
            DX
            CX
        NOP
  -> Code generation
Assembly code
    CMP CX, 0
    CMOVZ DX,CX
Compiler techniques are everywhere
- Parsing (little languages, interpreters)
- Database engines
- AI: domain-specific languages
- Text processing
  - TeX/LaTeX -> dvi -> PostScript -> PDF
- Hardware: VHDL; model-checking tools
- Mathematics (Mathematica, Matlab)
Fascinating blend of theory and engineering
- Direct applications of theory to practice
  - Parsing, scanning, static analysis
- Some very difficult problems (NP-hard or worse)
  - Resource allocation, “optimization”, etc.
  - Need to come up with good-enough solutions
Ideas from many parts of CSE
- AI: greedy algorithms, heuristic search
- Algorithms: graph algorithms, dynamic programming, approximation algorithms
- Theory: grammars, DFAs and PDAs, pattern matching, fixed-point algorithms
- Systems: allocation & naming, synchronization, locality
- Architecture: pipelines & hierarchy management, instruction set use
program ::= statement | program statement
statement ::= assignStmt | ifStmt
assignStmt ::= id = expr ;
ifStmt ::= if ( expr ) stmt
expr ::= id | int | expr + expr
id ::= a | b | c | i | j | k | n | x | y | z
int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
There are several syntax notations for productions in common use; all mean the same thing
    ifStmt ::= if ( expr ) stmt
    ifStmt -> if ( expr ) stmt
    <ifStmt> ::= if ( <expr> ) <stmt>
a = 1 ;
if ( a + 1 )
    b = 2 ;
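A recognizer for this grammar can be sketched by hand in Python. Several assumptions are made for the sketch: tokens are separated by white space, `stmt` is treated as shorthand for `statement`, and the left-recursive rule `expr ::= expr + expr` is parsed as an operand followed by zero or more `+ operand` (grouping to the left), since recursive descent cannot follow left recursion directly. The function name `parse_program` and the tuple output are hypothetical.

```python
IDS = set("abcijknxyz")     # id  ::= a | b | c | i | j | k | n | x | y | z
INTS = set("0123456789")    # int ::= 0 | 1 | ... | 9

def parse_program(text):
    """Parse white-space separated tokens; return a list of statement trees."""
    tokens = text.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, found {tok!r}")
        pos += 1
        return tok

    def operand():
        tok = take()
        if tok in IDS:
            return ("id", tok)
        if tok in INTS:
            return ("int", int(tok))
        raise SyntaxError(f"expected id or int, found {tok!r}")

    def expr():
        node = operand()
        while peek() == "+":            # operand (+ operand)*
            take("+")
            node = ("+", node, operand())
        return node

    def statement():
        if peek() == "if":              # ifStmt ::= if ( expr ) stmt
            take("if")
            take("(")
            condition = expr()
            take(")")
            return ("ifStmt", condition, statement())
        target = operand()              # assignStmt ::= id = expr ;
        if target[0] != "id":
            raise SyntaxError("assignment target must be an id")
        take("=")
        value = expr()
        take(";")
        return ("assignStmt", target, value)

    statements = []
    while peek() is not None:           # program ::= statement | program statement
        statements.append(statement())
    if not statements:
        raise SyntaxError("a program needs at least one statement")
    return statements
```

For the example program above, `parse_program` returns two statement trees: an `assignStmt` for `a = 1 ;` and an `ifStmt` whose body is the assignment `b = 2 ;`.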
program ::= statement | program statement
statement ::= assignStmt | ifStmt
assignStmt ::= id = expr ;
ifStmt ::= if ( expr ) stmt
expr ::= id | int | expr + expr
id ::= a | b | c | i | j | k | n | x | y | z
int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Parse tree for: a = 1 ; if ( a + 1 ) b = 2 ;
    program
        program
            stmt
                assign
                    ID(a)
                    expr
                        int (1)
        stmt
            ifStmt
                expr
                    expr
                        ID(a)
                    expr
                        int (1)
                stmt
                    assign
                        ID(b)
                        expr
                            int (2)