![Page 1: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/1.jpg)
INTRODUCTION TO COMPILERBaishakhi Ray
Programming Languages & Translators
These slides are motivated from Prof. Calvin Lin, UT Austin
![Page 2: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/2.jpg)
What is a Compiler?
Compiler
#include <iostream>
int main() { std::cout << "Hello World!"; return 0;}
001100110011001001001100110000000111001000110000000110110011
Source Program Target Program
Input
Output
![Page 3: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/3.jpg)
What is a Compiler?
CompilerSource Program Target Program
Input
Output
InterpreterSource Program
InputOutput
![Page 4: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/4.jpg)
A Hybrid Compiler
Translator
Source Program
InterpreterIntermediate Program
InputOutput
![Page 5: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/5.jpg)
A language processing system
Preprocessor
Compiler
Assembler
Linker/Loader
Source Program
Modified Source Program
Target Assembly Program
Relocatable Machine Code
Target Machine CodeLibrary filesRelocatable object files
A source code typically have many modules written in different files.
Collect source program from different files, expand macros, etc.
Assembly code is easier to produce and debug.
A linker resolves external memory addresses, where one file’s code
refers another file’s location. Loader puts all the executive object files to memory for
execution.
![Page 6: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/6.jpg)
What is a Compiler?
Compiler
#include <iostream>
int main() { std::cout << "Hello World!"; return 0;}
001100110011001001001100110000000111001000110000000110110011
Intermediate Code Generation
optimization
Code Generation
Lexical Analysis
Syntactic Analysis
Semantic Analysis
Interpreter
Character stream
Token stream
Syntax trees
Syntax trees
IR
IR
Target Language
![Page 7: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/7.jpg)
Structure of a Typical Compiler
Intermediate Code Generation
optimization
Code Generation
Lexical Analysis
Syntactic Analysis
Semantic Analysis
Interpreter
Character stream
Token stream
Syntax trees
Syntax trees
IR
IR
Target Language
Analysis Phase
Front End
Synthesis Phase
Back End
![Page 8: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/8.jpg)
Input to Compiler
for i = 1 to 10 doa[i] = x * 5;
f o r i = 1 t o 1 0 d o a [ i ] = x * 5 ;
Intermediate Code Generation
optimization
Code Generation
Lexical Analysis
Syntactic Analysis
Semantic Analysis
Interpreter
Character stream
Token stream
Syntax trees
Syntax trees
IR
IR
Target Language
![Page 9: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/9.jpg)
Lexical Analysis
Intermediate Code Generation
optimization
Code Generation
Lexical Analysis
Syntactic Analysis
Semantic Analysis
Interpreter
Character stream
Token stream
Syntax trees
Syntax trees
IR
IR
Target Language
for i = 1 to 10 doa[i] = x * 5;
Break character stream into tokens (“words”)
for id(i) <=> number(1) to number(10) do id(a) <[> id(i) <]> <=> id(x) <*> number(5) <;>
![Page 10: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/10.jpg)
Compiler Data Structure
▪ Symbol Tables▪ Compile-time data structures▪ Hold names, type information, and scope information for variables
▪ Scopes ▪ A name space e.g., In C/C++, each set of curly braces defines a new scope ▪ Can create a separate symbol table for each scope
![Page 11: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/11.jpg)
Lexical Analysis
Intermediate Code Generation
optimization
Code Generation
Lexical Analysis
Syntactic Analysis
Semantic Analysis
Interpreter
Character stream
Token stream
Syntax trees
Syntax trees
IR
IR
Target Language
for i = 1 to 10 doa[i] = x * 5;
Break character stream into tokens (“words”)
for id(i) <=> number(1) to number(10) do id(a) <[> id(i) <]> <=> id(x) <*> number(5) <;>
for <id,1> <=> number(1) to number(10) do <id,2> <[> <id,1> <]> <=> <id,3> <*> number(5) <;>
1 i …2 a …3 x …
Symbol Table
![Page 12: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/12.jpg)
Syntactic Analysis (Parsing)
Intermediate Code Generation
optimization
Code Generation
Lexical Analysis
Syntactic Analysis
Semantic Analysis
Interpreter
Character stream
Token stream
Syntax trees
Syntax trees
IR
IR
Target Language
for i = 1 to 10 doa[i] = x * 5;
Impose Structure to Token Stream
for
i 1 10 assign
a i x 5
array timesIn a typical syntax tree, intermediate nodes represent operations and Leaf node represent the arguments of the operations.
![Page 13: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/13.jpg)
Semantic Analysis
Intermediate Code Generation
optimization
Code Generation
Lexical Analysis
Syntactic Analysis
Semantic Analysis
Interpreter
Character stream
Token stream
Syntax trees
Syntax trees
IR
IR
Target Language
for i = 1 to 10 doa[i] = x * 5;
Determine whether source is meaningful
▪ Check for semantic errors▪ Check for type errors▪ Gather type information for subsequent stages
▪ Relate variable uses to their declarations
![Page 14: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/14.jpg)
Usage of Symbol Tables
▪ For each variable declaration: ▪ Check for symbol table entry ▪ Add new entry (parsing)▪ add type info (semantic analysis)
▪ For each variable use: ▪ Check symbol table entry (semantic analysis)
![Page 15: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/15.jpg)
Intermediate Code Generation
Intermediate Code Generation
optimization
Code Generation
Lexical Analysis
Syntactic Analysis
Semantic Analysis
Interpreter
Character stream
Token stream
Syntax trees
Syntax trees
IR
IR
Target Language
Transform AST into low-level intermediate representation (IR)
Simplifies the IR
• Removes high-level control structures: - for, while, do, switch • Removes high-level data structures: - arrays, structs, unions, enums
![Page 16: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/16.jpg)
Intermediate Code Generation
Intermediate Code Generation
optimization
Code Generation
Lexical Analysis
Syntactic Analysis
Semantic Analysis
Interpreter
Character stream
Token stream
Syntax trees
Syntax trees
IR
IR
Target Language
Transform AST into low-level intermediate representation (IR)
One possible result is assembly-like code
• Semantic lowering • Control-flow expressed in terms of “gotos” • Each expression is very simple (three-address code) e.g., x := a * b * c
t := a * b x := t * c
![Page 17: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/17.jpg)
Intermediate Code Generation
Intermediate Code Generation
optimization
Code Generation
Lexical Analysis
Syntactic Analysis
Semantic Analysis
Interpreter
Character stream
Token stream
Syntax trees
Syntax trees
IR
IR
Target Language
for i = 1 to 10 doa[i] = x * 5;
i := 1 loop1: t1 := x * 5 t2 := &a t3 := sizeof(int) t4 := t3 * i t5 := t2 + t4 *t5 := t1 i := i + 1 if i <= 10 goto loop1
![Page 18: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/18.jpg)
Optimization
Intermediate Code Generation
optimization
Code Generation
Lexical Analysis
Syntactic Analysis
Semantic Analysis
Interpreter
Character stream
Token stream
Syntax trees
Syntax trees
IR
IR
Target Language
Mostly machine independent optimization Phase aims to generate better code.
Better can be• Faster• Shorter• Energy efficient• …
![Page 19: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/19.jpg)
Optimization
Intermediate Code Generation
optimization
Code Generation
Lexical Analysis
Syntactic Analysis
Semantic Analysis
Interpreter
Character stream
Token stream
Syntax trees
Syntax trees
IR
IR
Target Language
for i = 1 to 10 doa[i] = x * 5;
i := 1 t3 := sizeof(int) loop1: t1 := x * 5 t2 := &a t3 := sizeof(int) t4 := t3 * i t5 := t2 + t4 *t5 := t1 i := i + 1 if i <= 10 goto loop1
![Page 20: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/20.jpg)
Low Level Code Generation
Intermediate Code Generation
optimization
Code Generation
Lexical Analysis
Syntactic Analysis
Semantic Analysis
Interpreter
Character stream
Token stream
Syntax trees
Syntax trees
IR
IR
Target Language
Register Transfer Language (RTL) – Linear representation – Typically language-independent – Nearly corresponds to machine instructions
Example operations • Assignment x := y • Unary op x := op y • Binary op x := y op z • Call x := f() • Cbranch if (x==3) goto L• Address of p := & y • Load x := *(p+4) • Store *(p+4) := y
![Page 21: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/21.jpg)
Exercise:
a = b + c * 5
![Page 22: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/22.jpg)
Compiler vs Interpreter
Compiler InterpreterOptimization Compiler sees the entire program. Thus
optimization is easy.Interpreter sees program line by line. Thus, optimization is not robust.
Running time Compiled code runs faster Interpreted code runs slower
Program Generation
Compiler generates output program, which can be run independently at a later point of time.
Do not generate output program. Evaluate each line one by one during program execution.
Error Execution Emits compilation errors after the whole compilation process.
Reads each line and shows errors, if any.
Example C, C++ Command Lines
![Page 23: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/23.jpg)
Why studying compiler?
Isn’t it a solved problem?
▪ Machines keep changing ▪ New features present new problems (e.g., GPU) ▪ Changing costs lead to different concerns
▪ Languages keep changing ▪ New ideas (e.g., OOP and GC) have gone mainstream
▪ Applications keep changing ▪ Interactive, real-time, mobile, machine-learning based applications
![Page 24: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/24.jpg)
Why studying compiler?
▪ Values keep changing
▪ We used to just care about run-time performance
▪ Now? ▪ Compile-time performance▪ Code size ▪ Correctness ▪ Energy consumption▪ Security ▪ Fault tolerance
![Page 25: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/25.jpg)
Value added compilation
▪ The more we rely on software, the more we demand more of it
▪ Compilers can help– treat code as data ▪ Analyze the code
▪ Correctness
▪ Security
![Page 26: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/26.jpg)
Correctness and Security
▪ Can we check whether pointers and addresses are valid?
▪ Can we detect when untrusted code accesses a sensitive part of a system?
▪ Can we detect whether locks are used properly?
▪ Can we use compilers to certify that code is correct?
▪ Can we use compilers to verify that a given compiler transformation is correct?
![Page 27: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/27.jpg)
Value-added Compilation
▪ The more we rely on software, the more we demand more of it
▪ Compilers can help– treat code as data ▪ Analyze the code
▪ Correctness ▪ Security ▪ Reliability ▪ Program understanding ▪ Program evolution
▪ Software testing ▪ Reverse engineering ▪ Program obfuscation ▪ Code compaction ▪ Energy efficiency
Computation important understanding computation important
![Page 28: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/28.jpg)
Why studying compiler?
▪ Compilers are a fundamental building block of modern systems
▪ We need to understand their power and limitations ▪ Computer architects ▪ Language designers ▪ Software engineers ▪ OS/Runtime system researchers ▪ Security researchers ▪ Formal methods researchers (model checking, automated theorem proving)
![Page 29: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution](https://reader035.vdocument.in/reader035/viewer/2022071300/609090c83023623f4e495ec9/html5/thumbnails/29.jpg)
Due
• Programming assignment 0 is due next Wednesday.• Set up Git environment