Introduction to Optimizations


Page 1: Introduction to Optimizations

School of EECS, Peking University

“Advanced Compiler Techniques” (Fall 2011)

Introduction to Optimizations

Guo, Yao

Page 2: Introduction to Optimizations

Outline

Optimization Rules
Basic Blocks
Control Flow Graph (CFG)
Loops
Local Optimizations
Peephole optimization

Page 3: Introduction to Optimizations

Levels of Optimizations

Local: inside a basic block
Global (intraprocedural): across basic blocks; whole-procedure analysis
Interprocedural: across procedures; whole-program analysis

Page 4: Introduction to Optimizations

The Golden Rules of Optimization

Premature Optimization is Evil

Donald Knuth: “premature optimization is the root of all evil”
Optimization can introduce new, subtle bugs
Optimization usually makes code harder to understand and maintain
Get your code right first; then, if really needed, optimize it
Document optimizations carefully
Keep the non-optimized version handy, or even as a comment in your code

Page 5: Introduction to Optimizations

The Golden Rules of Optimization

The 80/20 Rule

In general, 80% of a program’s execution time is spent executing 20% of the code
(90%/10% for performance-hungry programs)
Spend your time optimizing the important 10-20% of your program
Optimize the common case, even at the cost of making the uncommon case slower

Page 6: Introduction to Optimizations

The Golden Rules of Optimization

Good Algorithms Rule

The best and most important way of optimizing a program is using good algorithms
E.g., O(n log n) rather than O(n^2)
However, we still need lower-level optimizations to get more out of our programs
In addition, asymptotic complexity is not always an appropriate metric of efficiency
Hidden constants may be misleading
E.g., a linear-time algorithm that runs in 100*n + 100 time is slower than a cubic-time algorithm that runs in n^3 + 10 time if the problem size is small

Page 7: Introduction to Optimizations

Asymptotic Complexity: Hidden Constants

[Chart: execution time vs. problem size (0 to 15) for 100*n+100 and n*n*n+10; the cubic algorithm is faster while the problem size stays small.]

Page 8: Introduction to Optimizations

General Optimization Techniques

Strength reduction: use the fastest version of an operation
E.g.
x >> 2 instead of x / 4
x << 1 instead of x * 2

Common subexpression elimination: eliminate redundant calculations
E.g.

double x = d * (lim / max) * sx;
double y = d * (lim / max) * sy;

becomes

double depth = d * (lim / max);
double x = depth * sx;
double y = depth * sy;

Page 9: Introduction to Optimizations

General Optimization Techniques

Code motion: invariant expressions should be executed only once
E.g.

for (int i = 0; i < x.length; i++)
    x[i] *= Math.PI * Math.cos(y);

becomes

double picosy = Math.PI * Math.cos(y);
for (int i = 0; i < x.length; i++)
    x[i] *= picosy;

Page 10: Introduction to Optimizations

General Optimization Techniques

Loop unrolling: the overhead of the loop control code can be reduced by executing more than one iteration in the body of the loop.
E.g.

double picosy = Math.PI * Math.cos(y);
for (int i = 0; i < x.length; i++)
    x[i] *= picosy;

becomes

double picosy = Math.PI * Math.cos(y);
for (int i = 0; i < x.length; i += 2) {
    x[i]   *= picosy;
    x[i+1] *= picosy;
}

An efficient “+1” in array indexing is required
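Since the unrolled body reads x[i+1], the loop as written only works when x.length is even. A hedged sketch of unrolling by two with a scalar epilogue for the leftover element (the class and method names are illustrative, not from the slides):

```java
// Loop unrolled by 2 with an epilogue: the main loop stops one short
// of the end so the x[i+1] access stays in bounds, and any leftover
// element of an odd-length array is handled by one final scalar step.
public class Unroll {
    public static void scale(double[] x, double factor) {
        int i = 0;
        for (; i < x.length - 1; i += 2) {  // unrolled-by-2 main loop
            x[i]     *= factor;
            x[i + 1] *= factor;
        }
        if (i < x.length) {                 // epilogue for odd lengths
            x[i] *= factor;
        }
    }
}
```

The epilogue is the usual price of unrolling: the transformed loop must remain correct for every trip count, not just multiples of the unroll factor.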

Page 11: Introduction to Optimizations

Compiler Optimizations

Compilers try to generate good code, i.e., fast
Code improvement is challenging: many problems are NP-hard
Code improvement may slow down the compilation process
In some domains, such as just-in-time compilation, compilation speed is critical

Page 12: Introduction to Optimizations

Phases of Compilation

The first three phases are language-dependent
The last two are machine-dependent
The middle two depend on neither the language nor the machine

Page 13: Introduction to Optimizations

Phases

Page 14: Introduction to Optimizations

Outline

Optimization Rules
Basic Blocks
Control Flow Graph (CFG)
Loops
Local Optimizations
Peephole optimization

Page 15: Introduction to Optimizations

Basic Blocks

A basic block is a maximal sequence of consecutive three-address instructions with the following properties:
The flow of control can only enter the basic block through the first instruction
Control will leave the block without halting or branching, except possibly at the last instruction
Basic blocks become the nodes of a flow graph, with edges indicating the order

Page 16: Introduction to Optimizations

Examples

for i from 1 to 10 do
    for j from 1 to 10 do
        a[i,j] = 0.0
for i from 1 to 10 do
    a[i,i] = 1.0

1)  i = 1
2)  j = 1
3)  t1 = 10 * i
4)  t2 = t1 + j
5)  t3 = 8 * t2
6)  t4 = t3 - 88
7)  a[t4] = 0.0
8)  j = j + 1
9)  if j <= 10 goto (3)
10) i = i + 1
11) if i <= 10 goto (2)
12) i = 1
13) t5 = i - 1
14) t6 = 88 * t5
15) a[t6] = 1.0
16) i = i + 1
17) if i <= 10 goto (13)

Page 17: Introduction to Optimizations

Identifying Basic Blocks

Input: sequence of instructions instr(i)
Output: a list of basic blocks
Method:
Identify leaders: the first instruction of a basic block
Iterate: add subsequent instructions to the basic block until we reach another leader

Page 18: Introduction to Optimizations

Identifying Leaders

Rules for finding leaders in code:
The first instruction in the code is a leader
Any instruction that is the target of a (conditional or unconditional) jump is a leader
Any instruction that immediately follows a (conditional or unconditional) jump is a leader

Page 19: Introduction to Optimizations

Basic Block Partition Algorithm

leaders = {1}                          // start of program
for i = 1 to |n|                       // all instructions
    if instr(i) is a branch
        leaders = leaders U targets of instr(i) U {instr(i+1)}
worklist = leaders
while worklist not empty
    x = first instruction in worklist
    worklist = worklist - {x}
    block(x) = {x}
    for i = x + 1; i <= |n| && i not in leaders; i++
        block(x) = block(x) U {i}
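The partition algorithm can be sketched as runnable code; the instruction encoding below (an array where entry i-1 holds the branch target of instruction i, or -1 for non-branches) is an assumption for illustration, not the slides' representation:

```java
import java.util.*;

// Basic-block partitioning sketch. branchTarget[i-1] is the target of
// instruction i, or -1 when i is not a branch (illustrative encoding).
// Returns the blocks in order, each as a list of instruction numbers.
public class BlockPartition {
    public static List<List<Integer>> partition(int[] branchTarget) {
        int n = branchTarget.length;
        SortedSet<Integer> leaders = new TreeSet<>();
        leaders.add(1);                               // first instruction leads
        for (int i = 1; i <= n; i++) {
            if (branchTarget[i - 1] != -1) {          // i is a branch:
                leaders.add(branchTarget[i - 1]);     // its target is a leader
                if (i + 1 <= n) leaders.add(i + 1);   // so is the fall-through
            }
        }
        List<List<Integer>> blocks = new ArrayList<>();
        for (int leader : leaders) {                  // grow each block until
            List<Integer> block = new ArrayList<>();  // the next leader
            for (int i = leader;
                 i <= n && (i == leader || !leaders.contains(i)); i++)
                block.add(i);
            blocks.add(block);
        }
        return blocks;
    }
}
```

On the 17-instruction example from the earlier slide (branches at 9, 11 and 17, targeting 3, 2 and 13), this yields leaders 1, 2, 3, 10, 12, 13 and hence six blocks.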

Page 20: Introduction to Optimizations

Basic Block Example

Leaders: instructions 1, 2, 3, 10, 12, and 13
Basic blocks: A = {1}, B = {2}, C = {3-9}, D = {10-11}, E = {12}, F = {13-17}

1)  i = 1
2)  j = 1
3)  t1 = 10 * i
4)  t2 = t1 + j
5)  t3 = 8 * t2
6)  t4 = t3 - 88
7)  a[t4] = 0.0
8)  j = j + 1
9)  if j <= 10 goto (3)
10) i = i + 1
11) if i <= 10 goto (2)
12) i = 1
13) t5 = i - 1
14) t6 = 88 * t5
15) a[t6] = 1.0
16) i = i + 1
17) if i <= 10 goto (13)

Page 21: Introduction to Optimizations

Outline

Optimization Rules
Basic Blocks
Control Flow Graph (CFG)
Loops
Local Optimizations
Peephole optimization

Page 22: Introduction to Optimizations

Control-Flow Graphs

Control-flow graph:
Node: an instruction or sequence of instructions (a basic block)
Two instructions i, j are in the same basic block iff execution of i guarantees execution of j
Directed edge: potential flow of control
Distinguished start node
Entry & Exit: the first & last instruction in the program

Page 23: Introduction to Optimizations

Control-Flow Edges

Basic blocks = nodes
Edges: add a directed edge between B1 and B2 if:
There is a branch from the last statement of B1 to the first statement of B2 (B2 is a leader), or
B2 immediately follows B1 in program order and B1 does not end with an unconditional branch (goto)
Definition of predecessor and successor:
B1 is a predecessor of B2; B2 is a successor of B1

Page 24: Introduction to Optimizations

Control-Flow Edge Algorithm

Input: block(i), sequence of basic blocks
Output: CFG where nodes are basic blocks

for i = 1 to the number of blocks
    x = last instruction of block(i)
    if instr(x) is a branch
        for each target y of instr(x)
            create edge (i -> y)
    if instr(x) is not an unconditional branch
        create edge (i -> i+1)
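The edge algorithm can be sketched under a deliberately simple encoding (an assumption for illustration): for each block i, blockTarget[i] is the block its final branch jumps to, or -1 if the block does not end in a branch, and uncond[i] marks that branch as unconditional.

```java
import java.util.*;

// CFG edge construction sketch over blocks 0..n-1 (illustrative
// encoding, see text). Each edge is returned as an int pair {from, to}.
public class CfgEdges {
    public static List<int[]> edges(int[] blockTarget, boolean[] uncond) {
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i < blockTarget.length; i++) {
            if (blockTarget[i] != -1)                       // branch-target edge
                result.add(new int[]{i, blockTarget[i]});
            boolean endsUnconditional = blockTarget[i] != -1 && uncond[i];
            if (!endsUnconditional && i + 1 < blockTarget.length)
                result.add(new int[]{i, i + 1});            // fall-through edge
        }
        return result;
    }
}
```

For blocks A-F of the earlier example (conditional branches C->C, D->B, F->F), this produces the eight edges A->B, B->C, C->C, C->D, D->B, D->E, E->F, F->F.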

Page 25: Introduction to Optimizations

CFG Example

Page 26: Introduction to Optimizations

Loops

Loops come from while, do-while, for, goto...
Loop definition: a set of nodes L in a CFG is a loop if:
1. There is a node called the loop entry: no other node in L has a predecessor outside L.
2. Every node in L has a nonempty path (within L) to the entry of L.

Page 27: Introduction to Optimizations

Loop Examples

{B3}, {B6}, {B2, B3, B4}

Page 28: Introduction to Optimizations

Identifying Loops

Motivation: loops account for the majority of runtime, so focus optimization on loop bodies!
Remove redundant code, replace expensive operations => speed up program

Finding loops is easy...

for i = 1 to 1000
    for j = 1 to 1000
        for k = 1 to 1000
            do something

...or harder (GOTOs):

1  i = 1; j = 1; k = 1;
2  A1: if i > 1000 goto L1;
3  A2: if j > 1000 goto L2;
4  A3: if k > 1000 goto L3;
5  do something
6  k = k + 1; goto A3;
7  L3: j = j + 1; goto A2;
8  L2: i = i + 1; goto A1;
9  L1: halt

Page 29: Introduction to Optimizations

Outline

Optimization Rules
Basic Blocks
Control Flow Graph (CFG)
Loops
Local Optimizations
Peephole optimization

Page 30: Introduction to Optimizations

Local Optimization

Optimization of basic blocks (Dragon §8.5)

Page 31: Introduction to Optimizations

Transformations on Basic Blocks

Common subexpression elimination: recognize redundant computations, replace with a single temporary
Dead-code elimination: recognize computations not used subsequently, remove quadruples
Interchange statements, for better scheduling
Renaming of temporaries, for better register usage

All of the above require symbolic execution of the basic block, to obtain def/use information

Page 32: Introduction to Optimizations

Simple Symbolic Interpretation: Next-Use Information

If x is computed in statement i, and is an operand of statement j, j > i, its value must be preserved (in a register or memory) until j.
If x is recomputed at k, k > i, the value computed at i has no further use after the last use before k, and can be discarded (i.e., the register reused).
Next-use information is annotated over statements and the symbol table.
It is computed in one backwards pass over the statements.

Page 33: Introduction to Optimizations

Next-Use Information: Definitions

1. Statement i assigns a value to x;
2. Statement j has x as an operand;
3. Control can flow from i to j along a path with no intervening assignments to x.
Then statement j uses the value of x computed at statement i, i.e., x is live at statement i.

Page 34: Introduction to Optimizations

Computing Next-Use

Use the symbol table to annotate the status of variables.
Each operand in a statement carries additional information:
Operand liveness (boolean)
Operand next use (later statement)
On exit from the block, all temporaries are dead (no next use).

Page 35: Introduction to Optimizations

Algorithm

INPUT: a basic block B
OUTPUT: at each statement i: x = y op z in B, create liveness and next-use for x, y, z
METHOD: for each statement in B (backward):
1. Retrieve liveness & next-use info from a table
2. Set x to “not live” and “no next use”
3. Set y, z to “live” and the next uses of y, z to “i”
Note: steps 2 & 3 cannot be interchanged. E.g., x = x + y
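A minimal sketch of this backward pass, tracking liveness only (the next-use statement numbers could be recorded the same way); the encoding of statements as {x, y, z} name triples is an assumption for illustration:

```java
import java.util.*;

// Backward liveness sketch for a block of statements "x = y op z",
// each given as an {x, y, z} triple (y or z may be null, e.g. for
// "x = constant"). Step order matters: x is killed before y and z are
// marked live, so a statement like "x = x + y" is handled correctly.
public class NextUse {
    public static List<Set<String>> liveBefore(String[][] stmts,
                                               Set<String> liveOnExit) {
        Set<String> live = new HashSet<>(liveOnExit);
        List<Set<String>> result =
                new ArrayList<>(Collections.nCopies(stmts.length, null));
        for (int i = stmts.length - 1; i >= 0; i--) {  // backward over block
            String[] s = stmts[i];
            live.remove(s[0]);                         // x: not live, no next use
            if (s[1] != null) live.add(s[1]);          // y: live, next use at i
            if (s[2] != null) live.add(s[2]);          // z: live, next use at i
            result.set(i, new HashSet<>(live));        // live just before stmt i
        }
        return result;
    }
}
```

On the five-statement example of the next slide, with only x live on exit, the sets before statements 1..5 come out as {}, {x}, {x, y}, {y}, {y, z}.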

Page 36: Introduction to Optimizations

Example

1. x = 1
2. y = 1
3. x = x + y
4. z = y
5. x = y + z

On exit: x: live (next use: statement 6); y: not live; z: not live

Page 37: Introduction to Optimizations

Computing Dependencies in a Basic Block: the DAG

Use a directed acyclic graph (DAG) to recognize common subexpressions and remove redundant quadruples.
Intermediate code optimization:
basic block => DAG => improved block => assembly
Leaves are labeled with identifiers and constants.
Internal nodes are labeled with operators and identifiers.

Page 38: Introduction to Optimizations

DAG Construction

Forward pass over the basic block
For x = y op z:
Find a node labeled y, or create one
Find a node labeled z, or create one
Create a new node for op, or find an existing one with descendants y, z (need a hash scheme)
Add x to the list of labels for the new node
Remove label x from the node on which it appeared
For x = y:
Add x to the list of labels of the node which currently holds y
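The construction can be sketched as follows; operator nodes are interned by (op, left, right) through a hash map, so a repeated computation finds its existing node. The representation is an illustrative assumption, not the slides' data structure:

```java
import java.util.*;

// DAG construction sketch for a basic block. assign("x","y","+","z")
// models x = y + z: operand nodes are found or created, the operator
// node is interned by (op, left-id, right-id), and label x moves from
// its old node to the new one.
public class BlockDag {
    static int counter = 0;
    static class Node {
        final int id = counter++;
        final String op;                               // operator, or "b0"-style leaf
        final Node left, right;                        // null for leaves
        final List<String> labels = new ArrayList<>(); // variables naming this value
        Node(String op, Node l, Node r) { this.op = op; left = l; right = r; }
    }
    final Map<String, Node> nodeFor = new HashMap<>();   // current node of each var
    final Map<String, Node> interned = new HashMap<>();  // (op,l,r) -> node

    Node operand(String name) {                          // find leaf, or create y0
        return nodeFor.computeIfAbsent(name, n -> new Node(n + "0", null, null));
    }
    void assign(String x, String y, String op, String z) {
        Node l = operand(y), r = operand(z);
        String key = op + "#" + l.id + "#" + r.id;       // the hash scheme
        Node n = interned.computeIfAbsent(key, k -> new Node(op, l, r));
        Node old = nodeFor.get(x);
        if (old != null) old.labels.remove(x);           // x no longer names old value
        n.labels.add(x);
        nodeFor.put(x, n);
    }
}
```

Running the four statements of the example on the next slide through this sketch leaves b and d attached to the same “-” node, matching the shared subexpression a - d.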

Page 39: Introduction to Optimizations

DAG Example

Transform a basic block into a DAG.

a = b + c
b = a - d
c = b + c
d = a - d

[DAG: leaves b0, c0, d0; a “+” node labeled a over (b0, c0); a “-” node labeled b, d over (a, d0); a “+” node labeled c over (the b/d node, c0).]

Page 40: Introduction to Optimizations

Local Common Subexpressions (LCS)

Suppose b is not live on exit.

a = b + c
b = a - d
c = b + c
d = a - d

becomes

a = b + c
d = a - d
c = d + c

Page 41: Introduction to Optimizations

LCS: Another Example

a = b + c
b = b - d
c = c + d
e = b + c

[DAG: leaves b0, c0, d0; a “+” node labeled a over (b0, c0); a “-” node labeled b over (b0, d0); a “+” node labeled c over (c0, d0); a “+” node labeled e over (the b node, the c node).]

Page 42: Introduction to Optimizations

Common Subexpressions

Programmers don’t produce common subexpressions, code generators do!

Page 43: Introduction to Optimizations

Dead Code Elimination

Delete any root that has no live variables attached.

a = b + c
b = b - d
c = c + d
e = b + c

On exit: a, b live; c, e not live

becomes

a = b + c
b = b - d

Page 44: Introduction to Optimizations

Outline

Optimization Rules
Basic Blocks
Control Flow Graph (CFG)
Loops
Local Optimizations
Peephole optimization

Page 45: Introduction to Optimizations

Peephole Optimization (Dragon §8.7)

Introduction to peephole
Common techniques
Algebraic identities
An example

Page 46: Introduction to Optimizations

Peephole Optimization

Simple compilers do not perform machine-independent code improvement; they generate naive code
It is possible to take the target code and optimize it
Sub-optimal sequences of instructions that match an optimization pattern are transformed into optimal sequences of instructions
This technique is known as peephole optimization
Peephole optimization usually works by sliding a window of several instructions (a peephole)

Page 47: Introduction to Optimizations

Peephole Optimization

Goals:
- improve performance
- reduce memory footprint
- reduce code size

Method:
1. Examine short sequences of target instructions
2. Replace the sequence by a more efficient one

- redundant-instruction elimination
- algebraic simplifications
- flow-of-control optimizations
- use of machine idioms

Page 48: Introduction to Optimizations

Peephole Optimization: Common Techniques


Page 52: Introduction to Optimizations

Algebraic Identities

Worth recognizing single instructions with a constant operand
Eliminate computations:
A * 1 = A
A * 0 = 0
A / 1 = A
Reduce strength:
A * 2 = A + A
A / 2 = A * 0.5
Constant folding:
2 * 3.14 = 6.28
More delicate with floating-point
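In a peephole pass these identities amount to a small pattern match on a single instruction with a constant operand. A hedged sketch (the textual instruction representation is an assumption for illustration, not a real IR):

```java
// Algebraic simplification of "A op c" for a constant c. Returns the
// simplified right-hand side as text, or null when no identity applies.
public class AlgebraicIdentities {
    public static String simplify(String op, double c) {
        if (op.equals("*") && c == 1) return "A";      // A * 1 = A
        if (op.equals("*") && c == 0) return "0";      // A * 0 = 0
        if (op.equals("/") && c == 1) return "A";      // A / 1 = A
        if (op.equals("*") && c == 2) return "A + A";  // strength reduction
        return null;                                   // leave instruction alone
    }
}
```

A real pass would return rewritten instructions rather than strings, and (per the note above) would have to be far more careful before applying such rewrites to floating-point code.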

Page 53: Introduction to Optimizations

Is This Ever Helpful?

Why would anyone write X * 1? Why bother to correct such obvious junk code?
In fact one might write:

#define MAX_TASKS 1
...
a = b * MAX_TASKS;

Also, seemingly redundant code can be produced by other optimizations. This is an important effect.

Page 54: Introduction to Optimizations

Replace Multiply by Shift

A := A * 4;
Can be replaced by a 2-bit left shift (signed/unsigned)
But must worry about overflow if the language does

A := A / 4;
If unsigned, can replace with a shift right
But arithmetic shift right is a well-known problem
Language may allow it anyway (traditional C)

Page 55: Introduction to Optimizations

The Right Shift Problem

Arithmetic right shift: shift right and use the sign bit to fill the most significant bits

-5 = 111111...1111111011
SAR gives 111111...1111111101, which is -3, not -2
In most languages -5/2 = -2
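In Java this is directly observable: >> is an arithmetic shift, while integer division truncates toward zero, so the two disagree on negative odd values. A small demo (class and method names are illustrative):

```java
// Arithmetic right shift vs. integer division in Java:
// a >> 1 rounds toward negative infinity, a / 2 truncates toward
// zero, so they differ on negative odd operands such as -5.
public class ShiftVsDivide {
    public static int halfByShift(int a)  { return a >> 1; }
    public static int halfByDivide(int a) { return a / 2; }
}
```

halfByShift(-5) is -3 while halfByDivide(-5) is -2; for non-negative values the two agree, which is why the unsigned case on the previous slide is safe.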

Page 56: Introduction to Optimizations

Addition Chains for Multiplication

If multiply is very slow (or on a machine with no multiply instruction, like the original SPARC), decomposing a constant operand into a sum of powers of two can be effective:
X * 125 = X * 128 - X * 4 + X
Two shifts, one subtract and one add, which may be faster than one multiply
Note the similarity with the efficient exponentiation method
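The decomposition can be written out directly: since 125 = 128 - 4 + 1, the product costs two shifts, one subtract and one add. A sketch (whether this actually beats the hardware multiplier depends on the target):

```java
// x * 125 via an addition chain: 125 = 128 - 4 + 1,
// so x * 125 == (x << 7) - (x << 2) + x.
public class AdditionChain {
    public static int times125(int x) {
        return (x << 7) - (x << 2) + x;
    }
}
```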

Page 57: Introduction to Optimizations

Flow-of-Control Optimizations

Jump to a jump. Before:

    goto L1
    ...
L1: goto L2

After:

    goto L2
    ...
L1: goto L2

Conditional jump to a jump. Before:

    if a < b goto L1
    ...
L1: goto L2

After:

    if a < b goto L2
    ...
L1: goto L2

Jump to a conditional jump. Before:

    goto L1
    ...
L1: if a < b goto L2
L3:

After:

    if a < b goto L2
    goto L3
    ...
L3:

Page 58: Introduction to Optimizations

Peephole Optimization: an Example

Source code:

debug = 0
...
if (debug) { print debugging information }

Intermediate code:

    debug = 0
    ...
    if debug = 1 goto L1
    goto L2
L1: print debugging information
L2:

Page 59: Introduction to Optimizations

Eliminate Jump after Jump

Before:

    debug = 0
    ...
    if debug = 1 goto L1
    goto L2
L1: print debugging information
L2:

After:

    debug = 0
    ...
    if debug != 1 goto L2
    print debugging information
L2:

Page 60: Introduction to Optimizations

Constant Propagation

Before:

    debug = 0
    ...
    if debug != 1 goto L2
    print debugging information
L2:

After:

    debug = 0
    ...
    if 0 != 1 goto L2
    print debugging information
L2:

Page 61: Introduction to Optimizations

Unreachable Code (Dead Code Elimination)

Before:

    debug = 0
    ...
    if 0 != 1 goto L2
    print debugging information
L2:

After:

    debug = 0
    ...

Page 62: Introduction to Optimizations

Peephole Optimization Summary

Peephole optimization is very fast
Small overhead per instruction, since it uses a small, fixed-size window
It is often easier to generate naive code and run peephole optimization than to generate good code!

Page 63: Introduction to Optimizations

Summary

Introduction to optimization
Basic knowledge:
Basic blocks
Control-flow graphs
Local optimizations
Peephole optimizations

Page 64: Introduction to Optimizations

Next Time

Dataflow analysis (Dragon §9.2)