c program compiler presentation

34
Rigvendra Kumar Vardhan M.TECH ECE Pondicherry University 1

Upload: rigvendra-kumar-vardhan

Post on 18-Jul-2015

139 views

Category:

Engineering


3 download

TRANSCRIPT

Page 1: C program compiler presentation

Rigvendra Kumar Vardhan

M.TECH ECE

Pondicherry University

1

Page 2: C program compiler presentation

Interpreters (direct execution) Assemblers Preprocessors Text formatters (non-WYSIWYG) Analysis tools

CS 540 Spring 2013 GMU 2

Page 3: C program compiler presentation

InterpreterA program that reads a source program

and produces the results of executing that program

CompilerA program that translates a program

from one language (the source) to another (the target)

3

Page 4: C program compiler presentation

Correctness Speed (runtime and compile time)

Degrees of optimization Multiple passes

Space Feedback to user Debugging

CS 540 Spring 2013 GMU 4

Page 5: C program compiler presentation

InterpreterExecution engineProgram execution interleaved with

analysisrunning = true;while (running) { analyze next statement; execute that statement;}

May involve repeated analysis of some statements (loops, functions)

5

Page 6: C program compiler presentation

Read and analyze entire program Translate to semantically equivalent

program in another languagePresumably easier to execute or more

efficientShould “improve” the program in some

fashion Offline process

Tradeoff: compile time overhead (preprocessing step) vs execution performance

6

Page 7: C program compiler presentation

CompilersFORTRAN, C, C++, Java, COBOL, etc.

etc.Strong need for optimization, etc.

InterpretersPERL, Python, awk, sed, sh, csh,

postscript printer, Java VMEffective if interpreter overhead is low

relative to execution cost of language statements

7

Page 8: C program compiler presentation

8

Source code

Compiler

Assembly code

AssemblerObject code(machine code)

Fully-resolved objectcode (machine code)

Executable image

Linker

Loader

Page 9: C program compiler presentation

Series of program representations

Intermediate representations optimized

for program manipulations of various

kinds (checking, optimization)

Become more machine-specific, less

language-specific as translation proceeds

9

Page 10: C program compiler presentation

First approximation Front end: analysis

Read source program and understand its structure and meaning

Back end: synthesis Generate equivalent target language program

10

Source TargetFront End Back End

Page 11: C program compiler presentation

Must recognize legal programs (& complain about illegal ones)

Must generate correct code Must manage storage of all variables Must agree with OS & linker on target

format

11

Source TargetFront End Back End

Page 12: C program compiler presentation

Need some sort of Intermediate Representation (IR)

Front end maps source into IR Back end maps IR to target machine code

12

Source TargetFront End Back End

Page 13: C program compiler presentation

CS 540 Spring 2013 GMU 13

Scanner(lexical

analysis)

Parser(syntax

analysis)

CodeOptimizer

SemanticAnalysis

(IC generator)

CodeGenerator

SymbolTable

Sourcelanguage

tokens Syntacticstructure

IntermediateLanguage

Targetlanguage

IntermediateLanguage

Page 14: C program compiler presentation

Lexical Analysis Syntax Analysis Semantic Analysis Runtime environments Code Generation Code Optimization

CS 540 Spring 2013 GMU 14

Page 15: C program compiler presentation

15

Source code(character stream)

Lexical analysis

ParsingToken stream

Abstract syntax tree

Intermediate Code Generation

Intermediate code

Optimization

Code generationAssembly code

Intermediate code

Front end(machine-independent)

Back end(machine-dependent)

Page 16: C program compiler presentation

Split into two partsScanner: Responsible for converting

character stream to token stream Also strips out white space, comments

Parser: Reads token stream; generates IR Both of these can be generated

automaticallySource language specified by a formal

grammarTools read the grammar and generate

scanner & parser (either table-driven or hard coded)

16

Scanner Parsersource tokens IR

Page 17: C program compiler presentation

Token stream: Each significant lexical chunk of the program is represented by a tokenOperators & Punctuation: {}[]!+-=*;: …Keywords: if while return gotoIdentifiers: id & actual nameConstants: kind & value; int, floating-

point character, string, …

17

Page 18: C program compiler presentation

Input text// this statement does very l i t t leif (x >= y) y = 42;

Token Stream

Note: tokens are atomic items, not character strings

18

IF LPAREN ID(x) GEQ ID(y)

RPAREN ID(y) BECOMES INT(42) SCOLON

Page 19: C program compiler presentation

Many different forms(Engineering tradeoffs)

Common output from a parser is an abstract syntax treeEssential meaning of the program

without the syntactic noise

19

Page 20: C program compiler presentation

Token Stream Input Abstract Syntax Tree

20

IF LPAREN ID(x)

GEQ ID(y) RPAREN

ID(y) BECOMES

INT(42) SCOLON

ifStmt

>=

ID(x) ID(y)

assign

ID(y) INT(42)

Page 21: C program compiler presentation

During or (more common) after parsingType checkingCheck for language requirements like

“declare before use”, type compatibilityPreliminary resource allocationCollect other information needed by

back end analysis and code generation

21

Page 22: C program compiler presentation

ResponsibilitiesTranslate IR into target machine codeShould produce fast, compact codeShould use machine resources

effectivelyRegisters InstructionsMemory hierarchy

22

Page 23: C program compiler presentation

Typically split into two major parts with sub phases“Optimization” – code improvements

May well translate parser IR into another IR

Code generation Instruction selection & schedulingRegister allocation

23

Page 24: C program compiler presentation

Input

if (x >= y) y = 42;

Output

mov eax,[ebp+16] cmp eax,[ebp-8] jl L17 mov [ebp-8],42L17:

24

Page 25: C program compiler presentation

lda $30,-32($30)stq $26,0($30)stq $15,8($30)bis $30,$30,$15bis $16,$16,$1stl $1,16($15)lds $f1,16($15)sts $f1,24($15)ldl $5,24($15)bis $5,$5,$2s4addq $2,0,$3ldl $4,16($15)mull $4,$3,$2ldl $3,16($15)addq $3,1,$4mull $2,$4,$2ldl $3,16($15)addq $3,1,$4mull $2,$4,$2stl $2,20($15)ldl $0,20($15)br $31,$33

$33:bis $15,$15,$30ldq $26,0($30)ldq $15,8($30)addq $30,32,$30ret $31,($26),1

25

Optimized Codes4addq $16,0,$0mull $16,$0,$0addq $16,1,$16mull $0,$16,$0mull $0,$16,$0ret $31,($26),1

Unoptimized Code

Page 26: C program compiler presentation

26

Source code(character stream)

Lexical analysis

Parsing

Token stream

Abstract syntax tree(AST)

Semantic Analysis

if (b == 0) a = b;

if ( b ) a = b ;0==

if==

b 0

=

a b

if

==

int b int 0

=

int alvalue

int b

boolean

Decorated ASTint

;

;

Page 27: C program compiler presentation

27

Intermediate Code Generation

Optimization

Code generation

if

==

int b int 0

=

int alvalue

int b

boolean int;

CJUMP ==

MEM

fp 8

+

CONST MOVE

0 MEM MEM

fp 4 fp 8

NOP

+ +

CJUMP ==

CONST MOVE

0 DX CX

NOPCXCMP CX, 0CMOVZ DX,CX

Page 28: C program compiler presentation

Compiler techniques are everywhereParsing (little languages, interpreters)Database enginesAI: domain-specific languagesText processing

Tex/LaTex -> dvi -> Postscript -> pdfHardware: VHDL; model-checking toolsMathematics (Mathematica, Matlab)

28

Page 29: C program compiler presentation

Fascinating blend of theory and engineeringDirect applications of theory to practice

Parsing, scanning, static analysisSome very difficult problems (NP-hard or

worse)Resource allocation, “optimization”,

etc.Need to come up with good-enough

solutions

29

Page 30: C program compiler presentation

Ideas from many parts of CSEAI: Greedy algorithms, heuristic searchAlgorithms: graph algorithms, dynamic

programming, approximation algorithms

Theory: Grammars DFAs and PDAs, pattern matching, fixed-point algorithms

Systems: Allocation & naming, synchronization, locality

Architecture: pipelines & hierarchy management, instruction set use

30

Page 31: C program compiler presentation

program ::= statement | program statement

statement ::= assignStmt | ifStmt assignStmt ::= id = expr ; ifStmt ::= if ( expr ) stmt expr ::= id | int | expr + expr Id ::= a | b | c | i | j | k | n | x | y | z int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

31

Page 32: C program compiler presentation

There are several syntax notations for productions in common use; all mean the same thingifStmt ::= if ( expr ) stmtifStmt if ( expr ) stmt<ifStmt> ::= if ( <expr> ) <stmt>

32

Page 33: C program compiler presentation

a = 1 ; if ( a + 1 )

b = 2 ;

33

program ::= statement | program statementstatement ::= assignStmt | i fStmtassignStmt ::= id = expr ;i fStmt ::= if ( expr ) stmtexpr ::= id | int | expr + exprid ::= a | b | c | i | j | k | n | x | y | zint ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

program

program

ID(a) expr

stmt

stmt

assign

int (1)

ifStmt

expr stmt

expr expr

int (1)ID(a) ID(b) expr

assign

int (2)

Page 34: C program compiler presentation

34