Dynamic Removal of Redundant Computations
DESCRIPTION
ICS'99, Rhodes (Greece) - June 20-25, 1999. Dynamic Removal of Redundant Computations. Carlos Molina, Antonio González and Jordi Tubella, Universitat Politècnica de Catalunya - Barcelona, {cmolina,antonio,jordit}@ac.upc.es. Motivation. Quasi-common subexpression. Quasi-invariant.
TRANSCRIPT
![Page 1: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/1.jpg)
Dynamic Removal of Redundant Computations
Carlos Molina, Antonio González and Jordi Tubella
Universitat Politècnica de Catalunya - Barcelona
{cmolina,antonio,jordit}@ac.upc.es
ICS'99, Rhodes (Greece) - June 20-25, 1999
![Page 2: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/2.jpg)
Motivation
Quasi-invariant
for (i=0; i<N; i++)
  A[i] = B[i]+C[i];
Quasi-common subexpression
. . . . .
R = S / T ;
. . . . .
X = S / U ;
. . . . .
![Page 3: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/3.jpg)
Outline
Instruction Reuse
Related Work
Redundant Computation Buffer
Performance Results
Conclusions
![Page 4: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/4.jpg)
Instruction Reuse
[Diagram: pipeline with Fetch, Decode & Rename, OOO Execution and Commit stages, plus a Reuse Mechanism accessed through an index]
![Page 5: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/5.jpg)
Related Work
Instruction Reuse
– Value Cache for the Tree Machine (Harbison 82)
– Result Cache (Richardson 92, Oberman et al. 95)
– Reuse Buffer (Sodani and Sohi 97)
– Physical Register Reuse (Jourdan et al. 98)
Trace Reuse
– Basic blocks (Huang and Lilja 99)
– General traces (González et al. 99)
![Page 6: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/6.jpg)
Related Work
Result Cache (Richardson 92, Oberman & Flynn 95)
– Special purpose (long latency operations)
– Indexed by operand values
– No reuse chaining
– Can reuse dynamic instances of other static instructions
Reuse Buffer (Sodani & Sohi 97)
– General purpose
– Indexed by PC
– Reuse chaining
– Only reuse dynamic instances of same static instructions
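The indexing difference between the two schemes can be illustrated with a minimal C sketch. It is not the design from either cited paper: the table size, the hash function, the restriction to division, and all names (`table_t`, `reuse_div`, `index_by_pc`) are assumptions made for illustration, and reuse chaining is not modelled. Both tables apply the same value-based reuse test; only the index differs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ENTRIES 64  /* hypothetical table size, not taken from the slides */

typedef struct {
    bool     valid;
    uint32_t pc;             /* checked as a tag only when indexing by PC */
    int32_t  opnd1, opnd2;
    int32_t  result;
} entry_t;

typedef struct {
    entry_t slot[ENTRIES];
    bool    index_by_pc;     /* true: Reuse Buffer style; false: Result Cache style */
} table_t;

/* Pick a slot from the PC or from the operand values (hypothetical hash). */
static unsigned slot_of(const table_t *t, uint32_t pc, int32_t a, int32_t b)
{
    uint32_t key = t->index_by_pc ? pc : ((uint32_t)a * 31u + (uint32_t)b);
    return key % ENTRIES;
}

/* Compute a / b through the table; *reused reports whether the division was skipped. */
int32_t reuse_div(table_t *t, uint32_t pc, int32_t a, int32_t b, bool *reused)
{
    assert(b != 0);
    entry_t *e = &t->slot[slot_of(t, pc, a, b)];
    bool tag_ok = t->index_by_pc ? (e->pc == pc) : true;

    /* Reuse test: the stored operands must match the current ones. */
    *reused = e->valid && tag_ok && e->opnd1 == a && e->opnd2 == b;
    if (!*reused) {
        e->valid = true;
        e->pc = pc;
        e->opnd1 = a;
        e->opnd2 = b;
        e->result = a / b;   /* execute and record for later reuse */
    }
    return e->result;
}
```

With a zero-initialised `table_t` in each mode, feeding 8 / 2 first from PC 10 and then from PC 20 hits only in the operand-indexed table; the PC-indexed table hits once PC 10 itself repeats 8 / 2.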
![Page 7: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/7.jpg)
Redundant Computation Buffer
Vtable: Atable pointer
Atable: opcode result/address opnd1 opnd2 pointer
Mtable: address tag result
Outputs: Reuse Test, Reused Value, Reused Memory Value
![Page 8: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/8.jpg)
RCB (Working Example)
while (cond) {
  r = s / t ;
  ......
  x = s / u ;
}
I1: 8 / 2 = 4
Vtable 10: Atable entry div 8 2 4 nil
Value shown: 4
![Page 9: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/9.jpg)
RCB (Working Example)
while (cond) {
  r = s / t ;
  ......
  x = s / u ;
}
I2: 8 / 2 = 4
Vtable 10: Atable entry div 8 2 4 nil
Vtable 20: Atable entry div 8 2 4 nil
Value shown: 4
![Page 10: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/10.jpg)
RCB (Working Example)
while (cond) {
  r = s / t ;
  ......
  x = s / u ;
}
I2: 8 / 2 = 4
Vtable 10: Atable entry div 8 2 4 nil
Vtable 20: Atable entry div 8 2 4
Value shown: 4
![Page 11: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/11.jpg)
RCB (Working Example)
while (cond) {
  r = s / t ;
  ......
  x = s / u ;
}
I1: 9 / 3 = 3
I2: 9 / 3 = 3
Vtable 10: Atable entries div 8 2 4 nil, div 9 3 3 nil
Vtable 20: Atable entry div 8 2 4 nil
Values shown: 4, 3
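The working example can be replayed with a simplified C model. This is one possible reading of the slides, not the authors' implementation: it keeps only a PC-indexed table of pointers (standing in for the Vtable) and an operand-indexed table of results (standing in for the Atable), and it leaves out the opcode field, the Atable pointer field and the Mtable. The table sizes, the hash and all identifiers (`rcb_t`, `rcb_div`, `outcome_t`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ATABLE_ENTRIES 64   /* hypothetical sizes, not taken from the slides */
#define VTABLE_ENTRIES 64
#define NIL (-1)

typedef enum { MISS, HIT_VIA_PC, HIT_VIA_OPERANDS } outcome_t;

typedef struct {
    bool    valid;
    int32_t opnd1, opnd2, result;   /* opcode omitted: only div is modelled */
} aentry_t;

typedef struct {
    aentry_t atable[ATABLE_ENTRIES];  /* indexed by a hash of the operand values */
    int      vtable[VTABLE_ENTRIES];  /* per-PC pointer into atable, or NIL */
} rcb_t;

void rcb_init(rcb_t *r)
{
    for (int i = 0; i < ATABLE_ENTRIES; i++) r->atable[i].valid = false;
    for (int i = 0; i < VTABLE_ENTRIES; i++) r->vtable[i] = NIL;
}

/* Reuse test: the stored operands must equal the current ones. */
static bool reuse_test(const aentry_t *e, int32_t a, int32_t b)
{
    return e->valid && e->opnd1 == a && e->opnd2 == b;
}

/* Run "a / b" at address pc through the buffer and report how the result was obtained. */
outcome_t rcb_div(rcb_t *r, uint32_t pc, int32_t a, int32_t b, int32_t *result)
{
    assert(b != 0);
    int *ptr = &r->vtable[pc % VTABLE_ENTRIES];
    int idx = (int)(((uint32_t)a * 31u + (uint32_t)b) % ATABLE_ENTRIES);
    outcome_t how = MISS;

    if (*ptr != NIL && reuse_test(&r->atable[*ptr], a, b)) {
        how = HIT_VIA_PC;            /* 1st lookup: entry remembered for this PC */
        idx = *ptr;
    } else if (reuse_test(&r->atable[idx], a, b)) {
        how = HIT_VIA_OPERANDS;      /* 2nd lookup: entry left by any instruction */
    } else {
        r->atable[idx] = (aentry_t){ true, a, b, a / b };  /* execute and record */
    }
    *ptr = idx;                      /* remember this entry for the PC */
    *result = r->atable[idx].result;
    return how;
}
```

Replaying the slides' sequence in this model, I1 at PC 10 misses on 8 / 2, I2 at PC 20 then reuses that result through the operand lookup, and the same pattern repeats for 9 / 3. A hit through the PC pointer only appears when the same static instruction sees the same operands again, which corresponds to the quasi-invariant case.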
![Page 12: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/12.jpg)
Enhancements to Other Schemes
Enhanced Result Cache
Atable (indexed by Operands): opcode result/address opnd1 opnd2
Mtable: address tag result
Enhanced Reuse Buffer
Atable (indexed by PC): opcode result/address opnd1 opnd2
Mtable: address tag result
![Page 13: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/13.jpg)
Timing Considerations
Pipeline Stages: fetch, decode & rename, issue, opnd read & dispatch, execute, write back, commit
Latency of the Reuse Buffer: Atable lookup, reuse test
Latency of the RCB: 1st Atable lookup, reuse test, 2nd Atable lookup
Latency of the Result Cache: Atable lookup, reuse test
![Page 14: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/14.jpg)
Experimental Framework
Simulator
- Alpha version of the SimpleScalar Toolset
Benchmarks
- SPEC95
Maximum Optimization Level
- DEC C & F77 compilers with -non_shared -O5
Statistics collected for 125 million instructions
- Skipping initializations
![Page 15: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/15.jpg)
Basic Reuse Statistics
We evaluate different schemes
- Enhanced Result Cache (ERC)
- Enhanced Reuse Buffer (ERB)
- Redundant Computation Buffer (RCB)
We find best configuration for each scheme
- Number of entries
- History depth
Best configurations will be evaluated
- Percentage of reuse
- Speedup
![Page 16: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/16.jpg)
Quasi-Common Subexpressions
[Chart: Percentage of Reuse (0 to 45) for ERB and RCB at 32 KB]
![Page 17: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/17.jpg)
Study of Reuse (ERB)
[Plot: Percentage of Reuse (10 to 55) versus Size in Kbytes (8 to 4096), one curve per table size: 128, 256, 512, 1K, 2K, 4K, 8K and 16K entries]
![Page 18: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/18.jpg)
Study of Reuse (RCB)
[Plot: Percentage of Reuse (15 to 60) versus Size in Kbytes (8 to 4096), one curve per table size: 128, 256, 512, 1K, 2K, 4K, 8K and 16K entries]
![Page 19: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/19.jpg)
Study of Reuse (Comparative)
[Plot: Percentage of Reuse (10 to 70) versus Size in Kbytes (8 to 4096) for ERB, RCB and ERC]
![Page 20: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/20.jpg)
Performance Evaluation
Two different capacities are evaluated
- 32 KB
- 200 KB
Best configuration has been chosen for each reuse scheme
We present a performance evaluation for a superscalar processor
- Speedup
- Percentage of reuse
![Page 21: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/21.jpg)
Base Microarchitecture
Instruction fetch: 4 instructions per cycle
Branch predictor: 2048-entry bimodal predictor
Data cache: 16 KB, 2-way set associative, 32-byte block, 6-cycle miss latency
Instruction cache: 16 KB, direct mapped, 32-byte cache line, 6-cycle miss latency
Instruction issue/commit: Out-of-order issue, 4 instructions committed per cycle, 32-entry reorder buffer, loads execute if preceding stores are known, store-load forwarding
Architected registers: 32 integer and 32 FP
Functional units: 4 integer ALUs, 2 load/store units, 4 FP adders, 1 integer mult/div, 1 FP mult/div
FU latency/repeat time: Integer ALU 1/1, load/store 1/1, integer mult 3/, integer div 20/19, FP adder 2/1, FP mult 4/1, FP div 12/12
![Page 22: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/22.jpg)
Speedup (32 KB)
[Bar chart: speedup (1.00 to 1.20) of ERB, RCB and ERC for Applu, Compress, Gcc, Go, Li, M88ksim, Mgrid, Perl, Swim, Turb3d, Vortex and H_Mean]
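The charts summarise speedups with an H_Mean bar and reuse percentages with an A_Mean bar. The slides do not define these labels; assuming they stand for the harmonic and arithmetic mean, the two summaries could be computed as in the C sketch below. The function names are hypothetical and no values from the charts are used.

```c
#include <assert.h>
#include <stddef.h>

/* Harmonic mean of n positive values, e.g. per-benchmark speedups. */
double harmonic_mean(const double *x, size_t n)
{
    assert(n > 0);
    double inv_sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(x[i] > 0.0);
        inv_sum += 1.0 / x[i];
    }
    return (double)n / inv_sum;
}

/* Arithmetic mean of n values, e.g. per-benchmark reuse percentages. */
double arithmetic_mean(const double *x, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
    }
    return sum / (double)n;
}
```

The harmonic mean never exceeds the arithmetic mean for positive inputs, so a single benchmark with a large speedup moves the summary bar less than it would under an arithmetic mean.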
![Page 23: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/23.jpg)
Speedup (200 KB)
[Bar chart: speedup (1.00 to 1.25) of ERB, RCB and ERC for Applu, Compress, Gcc, Go, Li, M88ksim, Mgrid, Perl, Swim, Turb3d, Vortex and H_Mean]
![Page 24: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/24.jpg)
Reuse (32 KB)
[Two bar charts: reuse (0 to 70) of ERB, RCB and ERC for Applu, Compress, Gcc, Go, Li, M88ksim, Mgrid, Perl, Swim, Turb3d, Vortex and A_Mean; one chart is labeled "Ops ready"]
![Page 25: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/25.jpg)
Reuse (200 KB)
[Two bar charts: reuse (0 to 80) of ERB, RCB and ERC for Applu, Compress, Gcc, Go, Li, M88ksim, Mgrid, Perl, Swim, Turb3d, Vortex and A_Mean; one chart is labeled "Ops ready"]
![Page 26: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/26.jpg)
Reuse by Instruction Category
[Four bar charts, one per category (Load Value, Memory Address, Arithmetic, Cond Branch): reuse (0 to 100) of ERB, RCB and ERC for Applu, Compress, Gcc, Go, Li, M88ksim, Mgrid, Perl, Swim, Turb3d, Vortex and A_Mean]
![Page 27: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/27.jpg)
Hybrid Scheme
[Diagram: Atables with entries of the form opcode result/addr opnd1 opnd2 pointer, accessed by PC and by operands (Opnds)]
![Page 28: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/28.jpg)
Speedup (Hybrid Scheme)
[Bar chart: speedup (1.00 to 1.20) of RCB and Hybrid for Applu, Compress, Gcc, Go, Li, M88ksim, Mgrid, Perl, Swim, Turb3d, Vortex and H_Mean]
![Page 29: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/29.jpg)
Reuse (Hybrid Scheme)
[Bar chart: reuse (0 to 70) of RCB and Hybrid for Applu, Compress, Gcc, Go, Li, M88ksim, Mgrid, Perl, Swim, Turb3d, Vortex and A_Mean]
![Page 30: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/30.jpg)
Speedup (Perfect Reuse Engine)
[Bar chart: speedup (1.00 to 2.20) for Applu, Compress, Gcc, Go, Li, M88ksim, Mgrid, Perl, Swim, Turb3d, Vortex and H_Mean]
![Page 31: Dynamic Removal of Redundant Computations](https://reader036.vdocument.in/reader036/viewer/2022062323/5681584e550346895dc5ab4a/html5/thumbnails/31.jpg)
Conclusions
Redundant Computation Buffer
- Quasi-invariants
- Quasi-common subexpressions
High reuse coverage and low latency
- 30% reuse
- 10% speedup
- Outperforms previous schemes