language processing systemshamada/lp/l11-lp.pdfast dag linear ir • low level il before final code...
TRANSCRIPT
![Page 1: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/1.jpg)
Language Processing Systems
Prof. Mohamed Hamada
Software Engineering Lab. The University of Aizu
Japan
![Page 2: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/2.jpg)
Scanner (lexical
analysis)
Parser (syntax
analysis)
Code Optimizer
Code Generator
Source language
tokens Parse tree Intermediate
Language
Target language
Intermediate Language
Semantic Analysis
IC generator
AST
Error Handler
Symbol Table
OIL
Intermediate/Code Generation
Intr. Code Generator
![Page 3: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/3.jpg)
Intermediate Code Generation
![Page 4: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/4.jpg)
Intermediate Code
• Similar terms: Intermediate representation, intermediate language
• Ties the front and back ends together • Language and Machine neutral • Many forms • More than one intermediate language may
be used by a compiler
![Page 5: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/5.jpg)
• There is no standard Intermediate Representation. IR is a step in expressing a source program so that machine understands it
• As the translation takes place, IR is repeatedly analyzed and
transformed • Compiler users want analysis and translation to be fast and
correct • Compiler writers want optimizations to be simple to write, easy to
understand and easy to extend • IR should be simple and light weight while allowing easy
expression of optimizations and transformations.
Intermediate Representation
![Page 6: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/6.jpg)
Issues in new IR Design
• How much machine dependent • Expressiveness: how many languages are covered • Appropriateness for code optimization • Appropriateness for code generation
Front end Optimizer
![Page 7: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/7.jpg)
Intermediate Languages Types
• Graphical IRs: – Abstract Syntax trees, – DAGs, – Control Flow Graphs, – etc.
• Linear IRs: – Stack based (postfix) – Three address code (quadruples) – etc.
![Page 8: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/8.jpg)
Graphical IRs
• Abstract Syntax Trees (AST) – retain essential structure of the parse tree, eliminating unneeded nodes.
• Directed Acyclic Graphs (DAG) – compacted AST to avoid duplication – smaller footprint as well
• Control Flow Graphs (CFG) – explicitly model control flow
![Page 9: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/9.jpg)
Abstract Syntax Tree/DAG • Condensed form of parse tree • Useful for representing language constructs • Depicts the natural hierarchical structure of the
source program
– Each internal node represents an operator – Children of the nodes represent operands – Leaf nodes represent operands
• DAG is more compact than abstract syntax tree
because common sub expressions are eliminated
![Page 10: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/10.jpg)
ASTs and DAGs:
:=
a +
* *
b - (uni)
c
- (uni) b
c
:=
a +
b
*
- (uni)
c
a := b *-c + b*-cExample:
AST DAG
![Page 11: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/11.jpg)
Linear IR • Low level IL before final code generation
– A linear sequence of low-level instructions
– Resemble assembly code for an abstract machine
• Explicit conditional branches and goto jumps
– Reflect instruction sets of the target machine
• Stack-machine code and three-address code
– Implemented as a collection (table or list) of tuples
• Linear IR examples
– Stack based (postfix)
– Three address code (quadruples)
![Page 12: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/12.jpg)
Stack based: Postfix notation • Linearized representation of a syntax tree • List of nodes of the tree • Nodes appear immediately after its children • The postfix notation for an expression E is defined as follows:
– If E is a variable or constant then the postfix notation is E itself – If E is an expression of the form E1 op E2 where op is a binary
operator then the postfix notation for E is • E1' E2' op where E1' and E2‘ are the postfix notations for E1 and E2
respectively
– If E is an expression of the form (E1) then the postfix notation for E1 is also the postfix notation for E
![Page 13: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/13.jpg)
Stack based: Postfix notation
• No parenthesis are needed in postfix notation because – the position and parity of the operators permits
only one decoding of a postfix expression
• Postfix notation for a = b * -c + b * - c is a b c - * b c - * + =
![Page 14: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/14.jpg)
Stack-machine code
• Also called one-address code – Assumes an operand stack – Operations take operands from the stack and push results back
onto the stack – Need special operations such as
• Swapping two operands on top of the stack
• Compact in space, simple to generate and execute • Most operands do not need names • Results are transitory unless explicitly moved to memory
• Used as IR for Smalltalk and Java Push 2 Push y Multiply Push x subtract
Stack-machine code for x – 2 * y
![Page 15: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/15.jpg)
Three address code
• It is a sequence of statements of the general form X := Y op Z where
– X, Y or Z are names, constants or compiler
generated temporaries – op stands for any operator such as a fixed- or
floating-point arithmetic operator, or a logical operator
![Page 16: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/16.jpg)
Three address code • Only one operator on the right hand side is allowed • Source expression like x + y * z might be translated into
t1 := y * z t2 := x + t1
where t1 and t2 are compiler generated temporary names
• Unraveling of complicated arithmetic expressions and of control flow
makes 3-address code desirable for code generation and optimization • The use of names for intermediate values allows 3-address code to be
easily rearranged • Three address code is a linearized representation of a syntax tree where
explicit names correspond to the interior nodes of the graph
![Page 17: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/17.jpg)
Three address instructions • Assignment
– x = y op z – x = op y – x = y
• Jump
– goto L – if x relop y goto L
• Indexed assignment – x = y[i] – x[i] = y
• Function – param x – call p,n – return y
• Pointer – x = &y – x = *y – *x = y
![Page 18: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/18.jpg)
Three address code
• Every instruction manipulates at most two operands and one result. Typical forms include:
– Arithmetic operations: x := y op z | x := op y – Data movement: x := y [ z ] | x[z] := y | x := y – Control flow: if y op z goto x | goto x – Function call: param x | return y | call foo
t1 := 2 t2 := y t3 := t1*t2 t4 := x t5 := t4-t3
Three-address code for x – 2 * y
Example:
![Page 19: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/19.jpg)
Storing three-address code
• Store all instructions in a quadruple table – Every instruction has four fields: op, arg1, arg2, result – The label of instructions è index of instruction in table
op arg1 arg2 result
(0) Uminus c t1
(1) Mult b t1 t2
(2) Uminus c t3
(3) Mult b t3 t4
(4) Plus t2 t4 t5
(5) Assign t5 a
t1 := - c t2 := b * t1 t3 := -c t4 := b * t3 t5 := t2 + t4 a := t5
Three-address code
Quadruple entries
Alternative: store all the instructions in a singly/doubly linked list What is the tradeoff?
Example:
![Page 20: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/20.jpg)
Linear IR: Example
Push 2 Push y Multiply Push x subtract
Linear IR for x – 2 * y
MOV 2 => t1 MOV y => t2 MULT t2 => t1 MOV x => t4 SUB t1 => t4
Stack-machine code two-address code three-address code
t1 := 2 t2 := y t3 := t1*t2 t4 := x t5 := t4-t3
Example:
![Page 21: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/21.jpg)
Generating Intermediate Code
As we parse, generate IC for the given input. Use attributes to pass information about temporary variables up the tree
![Page 22: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/22.jpg)
a := b *-c + b*-c
t0 = b
b t0
Example1:
![Page 23: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/23.jpg)
t0 = b t1 = -c
b - (uni)
c
t1
t1
t0
a := b *-c + b*-c Example1:
![Page 24: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/24.jpg)
t0 = b t1 = -c t1 = t1 * t0
*
b - (uni)
c
t1
t1
t0
t1
a := b *-c + b*-c Example1:
![Page 25: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/25.jpg)
t0 = b t1 = -c t1 = t1 * t0 t0 = b
*
b - (uni) b
c
t1
t1
t0
t1
t0
a := b *-c + b*-c Example1:
![Page 26: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/26.jpg)
t0 = b t1 = -c t1 = t1 * t0 t0 = b t2 = -c
*
b - (uni)
c
- (uni) b
c
t1 t2
t1 t2
t0 t0
t1
a := b *-c + b*-c Example1:
![Page 27: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/27.jpg)
t0 = b t1 = -c t1 = t1 * t0 t0 = b t2 = -c t0 = t0 * t2
* *
b - (uni)
c
- (uni) b
c
t1 t2
t1 t2
t0 t0
t0 t1
a := b *-c + b*-c Example1:
![Page 28: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/28.jpg)
t0 = b t1 = -c t1 = t1 * t0 t0 = b t2 = -c t0 = t0 * t2 t1 = t0 + t1
+
* *
b - (uni)
c
- (uni) b
c
t1 t2
t1 t2
t0 t0
t0 t1
t1
a := b *-c + b*-c Example1:
![Page 29: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/29.jpg)
t0 = b t1 = -c t1 = t1 * t0 t0 = b t2 = -c t0 = t0 * t2 t1 = t0 + t1 a = t1
assign
a +
* *
b - (uni)
c
- (uni) b
c
t1 t2
t1 t2
t0 t0
t0 t1
t1
a := b *-c + b*-c Example1:
![Page 30: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/30.jpg)
assign
a +
b
*
- (uni)
c
t1 t0
t0
t1
t0 t0 = b t1 = -c t1 = t1 * t0 t0 = t1 * t1 a = t10
a := b *-c + b*-c Example1:
![Page 31: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/31.jpg)
Expressions
Grammar: S à id := E E à E + E E à id
S
a := b + c + d + e
E
E E E
E
E
E
Generate: t1 = b
1
Example 2:
![Page 32: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/32.jpg)
Grammar: S à id := E E à E + E E à id
S
a := b + c + d + e
E
E E E
E
E
E
Generate: t1 = b t2 = c
1 2
Each number corresponds to a temporary variable.
Expressions Example 2:
![Page 33: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/33.jpg)
Grammar: S à id := E E à E + E E à id
S
a := b + c + d + e
E
E E E
E
E
E
Generate: t1 = b t2 = c t3 = t1 + t2
1 2
3
Each number corresponds to a temporary variable.
Expressions Example 2:
![Page 34: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/34.jpg)
Grammar: S à id := E E à E + E E à id
S
a := b + c + d + e
E
E E E
E
E
E
Generate: t1 = b t2 = c t3 = t1 + t2 t4 = d
1 2
3 4
Each number corresponds to a temporary variable.
Expressions Example 2:
![Page 35: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/35.jpg)
Grammar: S à id := E E à E + E E à id
S
a := b + c + d + e
E
E E E
E
E
E
Generate: t1 = b t2 = c t3 = t1 + t2 t4 = d t5 = t3 + t4
1 2
3 4
5
Each number corresponds to a temporary variable.
Expressions Example 2:
![Page 36: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/36.jpg)
Grammar: S à id := E E à E + E E à id
S
a := b + c + d + e
E
E E E
E
E
E
Generate: t1 = b t2 = c t3 = t1 + t2 t4 = d t5 = t3 + t4 t6 = e 1 2
3 4
5 6
Each number corresponds to a temporary variable.
Expressions Example 2:
![Page 37: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/37.jpg)
Grammar: S à id := E E à E + E E à id
S
a := b + c + d + e
E
E E E
E
E
E
1 2
3 4
5 6
7
Each number corresponds to a temporary variable.
Generate: t1 = b t2 = c t3 = t1 + t2 t4 = d t5 = t3 + t4 t6 = e t7 = t5 + t6
Expressions Example 2:
![Page 38: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/38.jpg)
Grammar: S à id := E E à E + E E à id
S
a := b + c + d + e
E
E E E
E
E
E
1 2
3 4
5 6
7
Each number corresponds to a temporary variable.
Generate: t1 = b t2 = c t3 = t1 + t2 t4 = d t5 = t3 + t4 t6 = e t7 = t5 + t6 a = t7
Expressions Example 2:
![Page 39: Language Processing Systemshamada/LP/L11-LP.pdfAST DAG Linear IR • Low level IL before final code generation – A linear sequence of low-level instructions – Resemble assembly](https://reader035.vdocument.in/reader035/viewer/2022071010/5fc79b2c3632e4497453959b/html5/thumbnails/39.jpg)
Other representations • SSA: Single Static Assignment • RTL: Register transfer language • Stack machines: P-code • CFG: Control Flow Graph • Dominator Trees • DJ-graph: dominator tree augmented with join edges • PDG: Program Dependence Graph • VDG: Value Dependence Graph • GURRR: Global unified resource requirement
representation. Combines PDG with resource requirements
• Java intermediate bytecodes • The list goes on ......