CS308 Compiler Principles Introduction Fan Wu Department of Computer Science and Engineering Shanghai Jiao Tong University

Upload: charlotte-mcdaniel

Post on 02-Jan-2016


Page 1: CS308 Compiler Principles Introduction Fan Wu Department of Computer Science and Engineering Shanghai Jiao Tong University

CS308 Compiler Principles

Introduction

Fan Wu

Department of Computer Science and Engineering

Shanghai Jiao Tong University

Page 2:

Why study compiling?

• Importance:

– Programs written in high-level languages have to be translated into binary code before they can execute

– Compilers reduce the execution overhead of programs

– Compilers make high-performance computer architectures effective on users' programs

• Influence:

– Language design

– Computer architecture (the influence is bi-directional)

• Compiler techniques are also used in other areas:

– Text editors, information retrieval systems, and pattern recognition programs

– Query processing systems such as SQL

– Equation solvers

– Natural language processing

– Debugging and finding security holes in code

– …

Page 3:

Compiler Concept

• A compiler is a program that takes a program written in a source language and translates it into an equivalent program in a target language.

– Source program: normally a program written in a high-level programming language

– Target program: normally the equivalent program in machine code (a relocatable object file)

source program → COMPILER → target program

COMPILER → error messages

Page 4:

Interpreter

• An interpreter directly executes the operations specified in the source program on inputs supplied by the user.

source program, input → INTERPRETER → output

INTERPRETER → error messages

Page 5:

Programming Languages

• Compiled languages:

– Fortran, Pascal, C, C++, C#, Delphi, Visual Basic, …

• Interpreted languages:

– BASIC, Perl, PHP, Ruby, TCL, MATLAB, …

• Jointly compiled and interpreted languages:

– Java, Python, …

Page 6:

Compiler vs. Interpreter

• Preprocessing

– Compilers do extensive preprocessing

– Interpreters run programs "as is", with little or no preprocessing

• Efficiency

– The target program produced by a compiler usually runs much faster than an interpreter executing the source code

Page 7:

Compiler Structure

[Figure: Source Language → Front End (analysis, language specific) → Intermediate Language → Back End (synthesis, machine specific) → Target Language, with a Symbol Table]

• Separation of concerns

• Retargeting

Page 8:

Two Main Phases

• Analysis phase: breaks up a source program into its constituent pieces and produces an internal representation of it, called intermediate code.

• Synthesis phase: translates the intermediate code into the target program.

Page 9:

Phases of Compilation

[Figure: Source Language → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Intermediate Code Generator → Intermediate Language → Code Optimizer → Code Generator → Target Language; the phases are grouped into Analysis and Synthesis, with a Symbol Table]

• Compilers work in a sequence of phases.

• Each phase transforms the source program from one representation into another.

• The phases use the symbol table to store information about the entire source program.

Page 10:

A Model of a Compiler Front End

• The lexical analyzer reads the source program character by character and returns the tokens of the source program.

• The parser creates the tree-like syntactic structure of the given program.

• The intermediate-code generator translates the syntax tree into three-address code.

Page 11:

Lexical Analysis

Page 12:

Lexical Analysis

• The lexical analyzer reads the source program character by character and returns the tokens of the source program, each of the form

<token-name, attribute-value>

– Example: <NUM, 60>

• A token describes a pattern of characters that have the same meaning in the source program (such as identifiers, operators, keywords, numbers, delimiters, and so on).

Page 13:

White Space Removal

• No blanks, tabs, newlines, or comments appear in the grammar; the lexical analyzer skips them.

[Figure: skipping white space]

Page 14:

Constants

• When a sequence of digits appears in the input stream, the lexical analyzer passes to the parser a token consisting of the terminal num along with an integer-valued attribute computed from the digits.

– Example: 31+28+59 becomes <num, 31> <+> <num, 28> <+> <num, 59>

• Simulate parsing some number ....
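The token stream above can be reproduced with a small hand-written scanner. This is only a minimal sketch: the function name `tokenize` and the tuple representation of tokens are assumptions made for illustration, and only digits, `+`, `-`, and white space are handled.

```python
DIGITS = "0123456789"

def tokenize(text):
    """Turn a string such as '31+28+59' into a list of token tuples."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in " \t\n":
            # White space never reaches the parser.
            i += 1
        elif ch in DIGITS:
            # Compute the integer-valued attribute from the digit sequence.
            value = 0
            while i < len(text) and text[i] in DIGITS:
                value = value * 10 + int(text[i])
                i += 1
            tokens.append(("num", value))
        elif ch in "+-":
            # Operators carry no attribute value.
            tokens.append((ch,))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return tokens
```

Calling `tokenize("31 + 28 + 59")` yields the same five tokens as the slide's example, with the blanks silently dropped.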

Page 15:

Keywords and Identifiers

• Keywords: fixed character strings used as punctuation marks or to identify constructs.

• Identifiers: a character string forms an identifier only if it is not a keyword.
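One common way to apply this rule is to match the general identifier pattern first and then look the lexeme up in a table of reserved words. The sketch below assumes a hypothetical keyword set and identifier pattern; neither comes from the slides.

```python
import re

# Assumed keyword set and identifier pattern, for illustration only.
KEYWORDS = {"if", "else", "for", "while"}
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def classify(lexeme):
    """Return a keyword token or an <id, lexeme> token for a word."""
    if IDENTIFIER.fullmatch(lexeme) is None:
        raise ValueError(f"{lexeme!r} is neither a keyword nor an identifier")
    if lexeme in KEYWORDS:
        # Reserved words are checked first, so they never become identifiers.
        return (lexeme,)
    return ("id", lexeme)
```

With this ordering, `classify("while")` gives a keyword token, whereas `classify("count")` gives an identifier token carrying the lexeme as its attribute.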

Page 16:

Lexical Analysis Cont’d

• The lexical analyzer puts information about identifiers into the symbol table.

• Regular expressions are used to describe tokens (lexical constructs).

• A (deterministic) finite state automaton can be used to implement a lexical analyzer.

Page 17:

Symbol Table

Page 18:

Symbol Table

• Symbol tables are data structures that compilers use to hold information about source-program constructs.

• For each identifier, there is an entry in the symbol table containing its information.

• Symbol tables need to support multiple declarations of the same identifier

– One symbol table per scope (of declaration)...

{ int x; char y; { bool y; x; y; } x; y; }

Outer symbol table: x int, y char

Inner symbol table: y bool
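The one-table-per-scope idea can be sketched as a chain of tables, where each table points to the table of the enclosing scope. The class name `Env` and its methods `put` and `get` are assumptions chosen for this illustration.

```python
class Env:
    """Symbol table for one scope, chained to the enclosing scope's table."""

    def __init__(self, prev=None):
        self.table = {}
        self.prev = prev  # table of the enclosing scope, or None

    def put(self, name, type_name):
        self.table[name] = type_name

    def get(self, name):
        # Search from the innermost scope outwards.
        env = self
        while env is not None:
            if name in env.table:
                return env.table[name]
            env = env.prev
        return None

# The block { int x; char y; { bool y; x; y; } x; y; } from the slide:
outer = Env()
outer.put("x", "int")
outer.put("y", "char")
inner = Env(outer)
inner.put("y", "bool")
```

Inside the inner block, `inner.get("y")` finds `bool` and `inner.get("x")` falls through to the outer table and finds `int`; after the block, `outer.get("y")` is `char` again.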

Page 19:

Parsing

• A Syntax/Semantic Analyzer (Parser) creates the syntactic structure (generally a parse tree) of the given program.

• Parsing is the problem of taking a string of terminals and figuring out how to derive it from the start symbol of the grammar.

Page 20:

Syntax Definition

• A context-free grammar (CFG) is used to specify the syntax of a formal language (for example, a programming language like C or Java).

• A grammar describes the structure (usually hierarchical) of programming languages.

– Example: in Java an if statement should fit the form

• if ( expression ) statement else statement

– Production: statement → if ( expression ) statement else statement

– Note the recursive nature of statement.

Page 21:

Definition of CFG

• Four components:

– A set of terminal symbols (tokens): the elementary symbols of the language defined by the grammar

– A set of non-terminals (syntactic variables): each represents a set of strings of terminals

– A set of productions: non-terminal → a sequence of terminals and/or non-terminals

– A designation of one of the non-terminals as the start symbol.

Page 22:

A Grammar Example

List of digits separated by plus or minus signs

• Accepts strings such as 9-5+2, 3-1, or 7.

• 0, 1, …, 9, +, - are the terminal symbols

• list and digit are non-terminals

• Every "line" is a production

• list is the start symbol

• Grouping: list → list + digit | list - digit | digit

Page 23:

Derivations

• A grammar derives strings by beginning with the start symbol and repeatedly replacing a non-terminal by the body of a production for that non-terminal.

• Language: the set of terminal strings that can be derived from the start symbol of the grammar.

• Example: derivation of 9-5+2

– 9 is a list, since 9 is a digit.

– 9-5 is a list, since 9 is a list and 5 is a digit.

– 9-5+2 is a list, since 9-5 is a list and 2 is a digit.
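The reasoning in the example mirrors the productions list → list + digit | list - digit | digit directly, so it can be written as a recursive check. This is a sketch for illustration only; the names `is_digit` and `is_list` are hypothetical, and the input is assumed to contain no white space.

```python
def is_digit(s):
    """digit → 0 | 1 | ... | 9"""
    return len(s) == 1 and s in "0123456789"

def is_list(s):
    """list → list + digit | list - digit | digit"""
    if is_digit(s):
        return True  # list → digit
    # list → list + digit  or  list → list - digit:
    # the last symbol must be a digit, preceded by a sign, preceded by a list.
    return (len(s) >= 3
            and s[-2] in "+-"
            and is_digit(s[-1])
            and is_list(s[:-2]))
```

For `is_list("9-5+2")` the recursion checks `9-5` and then `9`, which is the same chain of facts as in the slide, read from bottom to top.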

Page 24:

Parse Trees

• A parse tree shows how the start symbol of a grammar derives a string in the language.

A → XYZ

Page 25:

Parse Tree Properties

• The root is labeled by the start symbol.

• Each leaf is labeled by a terminal or by ε.

• Each interior node is labeled by a non-terminal.

• If A is the non-terminal labeling some interior node and X1, X2, …, Xn are the labels of the children of that node from left to right, then there must be a production A → X1 X2 · · · Xn.

Page 26:

Parse Tree for 9-5+2

Page 27:

Ambiguity

• A grammar can have more than one parse tree generating a given string of terminals; such a grammar is ambiguous.

– Unambiguous grammar:

list → list + digit | list - digit | digit

digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

– Ambiguous grammar:

string → string + string | string - string | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

– With the ambiguous grammar, 9-5+2 has two parse trees: (9-5)+2 = 6 and 9-(5+2) = 2
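To see the two readings concretely, the sketch below enumerates every parse tree of a string under the ambiguous grammar string → string + string | string - string | digit and evaluates each one. The function name `values` is an assumption for this illustration, and the brute-force enumeration is only practical for short inputs.

```python
def values(s):
    """Return the value of every parse tree of s under the ambiguous grammar."""
    results = []
    if len(s) == 1 and s in "0123456789":
        results.append(int(s))  # string → digit
    for i, ch in enumerate(s):
        if ch in "+-":
            # Treat the operator at position i as the root of the parse tree.
            for left in values(s[:i]):
                for right in values(s[i + 1:]):
                    results.append(left + right if ch == "+" else left - right)
    return results
```

For `values("9-5+2")` the result contains both 2 and 6, one for each choice of root operator.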

Page 28:

Eliminating Ambiguity

• Operator associativity: in most programming languages arithmetic operators are left-associative.

– Example: 9+5-2 = (9+5)-2

– Exception: the assignment operator = is right-associative: a=b=c is equivalent to a=(b=c)

• Operator precedence: an operator with higher precedence binds to its operands first.

– Example: * has higher precedence than +, therefore 9+5*2 = 9+(5*2)
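Both rules can be built into the grammar by using one non-terminal per precedence level. The sketch below assumes the layered grammar expr → term { (+|-) term }, term → factor { * factor }, factor → digit, which is not taken from the slides, and evaluates single-digit expressions; the name `evaluate` is hypothetical.

```python
def evaluate(text):
    """Evaluate single-digit expressions with left-associative +, -, and *."""
    tokens = list(text.replace(" ", ""))
    pos = 0

    def factor():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] not in "0123456789":
            raise SyntaxError("digit expected")
        value = int(tokens[pos])
        pos += 1
        return value

    def term():
        # The higher-precedence level: * groups before + and -.
        nonlocal pos
        value = factor()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            value *= factor()
        return value

    def expr():
        # The loop folds operands from the left, giving left associativity.
        nonlocal pos
        value = term()
        while pos < len(tokens) and tokens[pos] in "+-":
            op = tokens[pos]
            pos += 1
            right = term()
            value = value + right if op == "+" else value - right
        return value

    result = expr()
    if pos != len(tokens):
        raise SyntaxError("unexpected input")
    return result
```

Because `term` is called from `expr`, a product is always completed before it takes part in a sum, which is how 9+5*2 comes out as 9+(5*2).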

Page 29:

Parsing

• Parsing is the process of determining how a string of terminals can be generated by a grammar.

• Two classes:

– Top-down: construction of the parse tree starts at the root and proceeds towards the leaves

– Bottom-up: construction of the parse tree starts at the leaves and proceeds towards the root

Page 30:

Top-Down Parsing

• The top-down construction of a parse tree is done by starting from the root and repeatedly performing the following two steps.

– At node N, labeled with non-terminal A, select the proper production of A and construct children at N for the symbols in the production body.

– Find the next node at which a subtree is to be constructed, typically the leftmost unexpanded non-terminal of the tree.

Page 31:

Top-Down Parsing

Page 32:

Predictive Parsing

• Recursive-descent parsing: a top-down method of syntax analysis in which a set of recursive procedures is used to process the input.

• Predictive parsing: a simple form of recursive-descent parsing

– The lookahead symbol unambiguously determines the flow of control, based on the first terminal(s) that each production body of the non-terminal can generate

Page 33:

Procedure for stmt

• Necessary condition to use predictive parsing? No conflict among the first symbols of the bodies for the same head.
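The procedure itself did not survive in this transcript, so the following is only a sketch of what a predictive procedure for stmt can look like. It assumes the grammar stmt → expr ; | if ( expr ) stmt | for ( optexpr ; optexpr ; optexpr ) stmt | other and optexpr → expr | ε, with `expr` and `other` treated as single tokens; the class and method names are hypothetical.

```python
class Parser:
    """Predictive parser: one procedure per non-terminal, one token of lookahead."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, terminal):
        if self.lookahead != terminal:
            raise SyntaxError(f"expected {terminal!r}, found {self.lookahead!r}")
        self.pos += 1

    def stmt(self):
        # Each production body starts with a different terminal,
        # so the lookahead alone selects the production.
        if self.lookahead == "expr":
            self.match("expr"); self.match(";")
        elif self.lookahead == "if":
            self.match("if"); self.match("("); self.match("expr"); self.match(")")
            self.stmt()
        elif self.lookahead == "for":
            self.match("for"); self.match("(")
            self.optexpr(); self.match(";")
            self.optexpr(); self.match(";")
            self.optexpr(); self.match(")")
            self.stmt()
        elif self.lookahead == "other":
            self.match("other")
        else:
            raise SyntaxError(f"unexpected token {self.lookahead!r}")

    def optexpr(self):
        if self.lookahead == "expr":
            self.match("expr")
        # Otherwise use optexpr → ε and consume nothing.

def parse(tokens):
    parser = Parser(tokens)
    parser.stmt()
    if parser.lookahead is not None:
        raise SyntaxError("input remains after statement")
    return True
```

For example, `parse("for ( ; expr ; expr ) other".split())` succeeds, with the first `optexpr` call choosing the ε-production because the lookahead is `;`.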

Page 34:

Left Recursion Elimination

• A production is left-recursive when the leftmost symbol of its body is the same as the non-terminal at its head:

• Left recursion can be eliminated by rewriting the offending production:
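The formulas for this slide are missing from the transcript. The standard textbook rewrite replaces A → A α | β with A → β R and R → α R | ε, where R is a new non-terminal. The sketch below applies that rewrite to immediate left recursion only; the function name, the list-of-symbols representation, and the primed name for the new non-terminal are all assumptions for illustration.

```python
def eliminate_left_recursion(head, bodies):
    """Rewrite A → A α | β as A → β R, R → α R | ε (immediate recursion only).

    Each body is a list of symbols; the empty list stands for ε.
    """
    # α parts: bodies that start with the head, with the head removed.
    alphas = [body[1:] for body in bodies if body and body[0] == head]
    # β parts: all remaining bodies.
    betas = [body for body in bodies if not body or body[0] != head]
    if not alphas:
        return {head: bodies}  # nothing to rewrite
    rest = head + "'"  # hypothetical name for the new non-terminal R
    return {
        head: [beta + [rest] for beta in betas],
        rest: [alpha + [rest] for alpha in alphas] + [[]],
    }
```

Applied to list → list + digit | list - digit | digit, this gives list → digit list' and list' → + digit list' | - digit list' | ε, which a predictive parser can handle.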

Page 35:

Syntax Analyzer vs. Lexical Analyzer

• Both of them do similar things

• Granularity

– The lexical analyzer works on characters to recognize the smallest meaningful units (tokens) in a source program.

– The syntax analyzer works on those tokens to recognize meaningful structures in the programming language.

• Recursion

– The lexical analyzer deals with simple non-recursive constructs of the language.

– The syntax analyzer deals with recursive constructs of the language.

Page 36:

Semantic Analysis

• Semantic analyzer

– adds semantic information to the parse tree (syntax-directed translation)

– checks the source program for semantic errors

– collects type information for code generation

– type checking: checks whether each operator has matching operands

– coercion: type conversion

Page 37:

Semantic Analysis

• A semantic analyzer checks the source program for semantic errors and collects type information for code generation.

• Type checking is an important part of semantic analysis.

[Figure: syntax tree → semantic tree]

Page 38:

Syntax-Directed Translation

• Syntax-directed translation is done by attaching rules or program fragments to productions in a grammar.

• Infix expression → postfix expression

• Techniques: attributes and translation schemes

Page 39:

Postfix Notation

• Definition (E’ denotes the postfix notation of E):

– If E is a variable or constant,

• E → E

– If E is an expression of the form E1 op E2,

• E1 op E2 → E’1 E’2 op

– If E is a parenthesized expression of the form (E1),

• (E1) → E’1

• Examples:

– 9-5+2 → 95-2+

– 9-(5+2) → 952+-
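The three rules translate directly into a recursive function over expression trees. In this sketch an expression is assumed to be either a string (a variable or constant) or a tuple `(op, E1, E2)`; parentheses only decide the shape of the tree, so the third rule needs no case of its own. The name `postfix` is hypothetical.

```python
def postfix(e):
    """Return the postfix notation of an expression tree."""
    if isinstance(e, str):
        return e  # a variable or constant is its own postfix form
    op, e1, e2 = e
    # E1 op E2 becomes E1' E2' op
    return postfix(e1) + postfix(e2) + op

# 9-5+2 groups as (9-5)+2, while 9-(5+2) groups the sum first.
left_grouped = ("+", ("-", "9", "5"), "2")
right_grouped = ("-", "9", ("+", "5", "2"))
```

Here `postfix(left_grouped)` gives `95-2+` and `postfix(right_grouped)` gives `952+-`, matching the two examples.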

Page 40:

Attributes

• A syntax-directed definition

– associates attributes with non-terminals and terminals in a grammar

– attaches semantic rules to the productions of the grammar

• An attribute is said to be synthesized if its value at a parse-tree node is determined from the attribute values at the children of that node and at the node itself.

Page 41:

Semantic Rules for Infix to Postfix

9-5+2 → 95-2+

[Figures: syntax-directed definition; annotated parse tree]

Page 42:

Translation Schemes

• A Syntax-Directed Translation Scheme is a notation for specifying a translation by attaching program fragments to productions in a grammar.

• The program fragments are called semantic actions.

Page 43:

A Translation Scheme

9-5+2 → 95-2+

[Figures: translation scheme; parse tree]

Page 44:

Attribute vs. Translation Scheme

• The attribute approach (syntax-directed definition) attaches strings as attributes to the nodes in the parse tree

• A syntax-directed translation scheme prints the translation incrementally, through semantic actions

Page 45:

A Simple Translator

Grammar of a list of digits separated by plus or minus signs

Page 46:

Translation of 9-5+2 to 95-2+

Left recursion eliminated

Page 47:

Procedures for Simple Translator
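The procedures shown on this slide are not in the transcript. The following is a sketch of such a translator, assuming the left-recursion-free scheme expr → term rest, rest → + term { print('+') } rest | - term { print('-') } rest | ε, term → digit { print(digit) }. The name `translate` is hypothetical, output is collected in a list instead of being printed, and the tail recursion in rest is written as a loop.

```python
def translate(text):
    """Translate an infix list of digits, such as '9-5+2', into postfix."""
    out = []
    pos = 0

    def lookahead():
        return text[pos] if pos < len(text) else None

    def term():
        # term → digit { emit digit }
        nonlocal pos
        ch = lookahead()
        if ch is None or ch not in "0123456789":
            raise SyntaxError(f"digit expected at position {pos}")
        out.append(ch)
        pos += 1

    def rest():
        # rest → + term { emit '+' } rest | - term { emit '-' } rest | ε
        nonlocal pos
        while lookahead() in ("+", "-"):
            op = lookahead()
            pos += 1
            term()
            out.append(op)  # the semantic action runs after the operand

    term()
    rest()
    if pos != len(text):
        raise SyntaxError(f"unexpected input at position {pos}")
    return "".join(out)
```

Calling `translate("9-5+2")` emits 9, then 5 and `-`, then 2 and `+`, so the output is built incrementally as the slides describe for translation schemes.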

Page 48:

Syntax vs. Semantics

• The syntax of a programming language describes the proper form of its programs.

• The semantics of the language defines what its programs mean, that is, what each program does when it executes.

Page 49:

Intermediate Code Generation

Page 50:

Intermediate Code Generation

• The front end of a compiler constructs an intermediate representation of the source program, from which the back end generates the target program.

• Two kinds of intermediate representations

– Tree: parse trees and (abstract) syntax trees

– Linear representation: three-address code

Page 51:

Three-Address Codes

• Three-address code is a sequence of instructions of the form

x = y op z

• Arrays are handled by two variants of this instruction:

x [ y ] = z
x = y [ z ]

• Instructions for control flow:

ifFalse x goto L
ifTrue x goto L
goto L

• Instruction for copying a value:

x = y

Page 52:

Translation of Statements

• Use jump instructions to implement the flow of control through the statement.

• The translation of

if expr then stmt1
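The rest of this slide is missing from the transcript. One plausible layout, using the `ifFalse` instruction from the three-address code slide, is: code to compute expr into some x, then `ifFalse x goto after`, then the code for stmt1, then the label `after`. The sketch below assembles that layout from already-generated instruction lists; `new_label` and `gen_if` are hypothetical helper names.

```python
import itertools

_label_numbers = itertools.count(1)

def new_label():
    """Return a fresh label name such as L1, L2, ..."""
    return f"L{next(_label_numbers)}"

def gen_if(cond_code, cond_place, body_code):
    """Assemble three-address code for: if expr then stmt1.

    cond_code  -- instructions that compute expr
    cond_place -- the name holding the value of expr
    body_code  -- instructions for stmt1
    """
    after = new_label()
    return (cond_code
            + [f"ifFalse {cond_place} goto {after}"]  # skip stmt1 when expr is false
            + body_code
            + [f"{after}:"])
```

For instance, `gen_if(["t1 = x < y"], "t1", ["x = y"])` produces the comparison, a conditional jump over the body, the body, and the target label, in that order.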

Page 53:

Translation of Expressions

• Approach:

– No code is generated for identifiers and constants

– If a node x of class Expr has operator op, then an instruction is emitted to compute the value at node x into a temporary.

• Expression: i-j+k translates into

t1 = i-j
t2 = t1+k

• Expression: 2 * a[i] translates into

t1 = a [ i ]
t2 = 2 * t1

* Do not use a temporary in place of a[i] if a[i] appears on the left side of an assignment.
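The approach can be sketched as a post-order walk over an expression tree. Here a tree is assumed to be either a string (identifier or constant) or a tuple `(op, left, right)`, with the hypothetical operator `"[]"` standing for an array access; the function name `three_address` is also an assumption. Only right-hand-side expressions are handled, so the caveat about a[i] on the left of an assignment does not arise.

```python
def three_address(expr):
    """Return (instructions, place) for an expression tree."""
    code = []

    def gen(node):
        if isinstance(node, str):
            return node  # no code for identifiers and constants
        op, left, right = node
        left_place = gen(left)
        right_place = gen(right)
        # Each operator node computes its value into a fresh temporary.
        temp = f"t{len(code) + 1}"
        if op == "[]":
            code.append(f"{temp} = {left_place} [ {right_place} ]")
        else:
            code.append(f"{temp} = {left_place} {op} {right_place}")
        return temp

    place = gen(expr)
    return code, place
```

With the tree `("+", ("-", "i", "j"), "k")` this yields `t1 = i - j` followed by `t2 = t1 + k`, and with `("*", "2", ("[]", "a", "i"))` it yields `t1 = a [ i ]` followed by `t2 = 2 * t1`, as on the slide.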

Page 54:

Translation of Expressions

• Example:

Page 55:

Test Yourself

• Generate three-address code for

if (x[2*a]==y[b]) x[2*a+1]=y[b+1];

t4 = 2*a
t2 = x[t4]
t3 = y[b]
t1 = t2 == t3
ifFalse t1 goto after
t5 = t4+1
t7 = b+1
t6 = y[t7]
x[t5] = t6
after:

Page 56:

Code Optimization

• The code optimizer optimizes the code produced by the intermediate code generator in terms of time and space.

Page 57:

Code Generation

• The code generator takes as input an intermediate representation of the source program and maps it into the target language.

• Example:

MOVE id3, R1
MULT #60.0, R1
ADD id2, R1
MOVE R1, id1

Page 58:

Tools

• Lexical Analysis – Lex, Flex, JLex

• Syntax Analysis – Yacc, JavaCC, SableCC

• Semantic Analysis – Yacc, JavaCC, SableCC

Page 59:

Homework

• Reading

– Chapters 1 and 2