
Enabling Efficient On-the-fly Microarchitecture Simulation

Thierry Lafage

[email protected]

September 2000

September 2000 Thierry Lafage 2

Introduction

• Microarchitecture simulation:
  – Accurate, but slow (execution slowdown of 1000× to 10000×)

  – “On-the-fly” (vs. trace-driven):
    • Enables execution-driven simulation (complex microprocessors)
    • Simulation of long-running workloads

• Complete microprocessor simulation requires:
  – Realistic workloads and working sets
  – A huge amount of CPU time


Introduction (2)

• Realistic simulations in an affordable time ⇒ simulations of a reduced number of instructions:
  • One “big slice” (e.g., after the program start-up phase)
  • Trace sampling

Representativeness of the simulated execution slices?

• On-the-fly simulations ⇒ fast forwarding
  – Current tools’ “fast” forwarding mode: >20× execution slowdown

[Figure: execution timelines in instructions (0, 500M, 1B, 1.5B, ...) marking the simulated slices]


Outline

1. Speeding up the fast forwarding mode
   – Approach
   – Implementation
   – Performance on the SPEC95 benchmarks
   – Conclusion
2. Selecting representative execution slices
   – Approach
   – Application to data cache simulations
   – Conclusion

Conclusion and Future Work


Speeding up the fast forwarding mode

Two execution modes:

• A really fast mode (static code annotation): rapid positioning, through direct execution, at the point where the simulation must begin

• An emulation mode (embedded instruction-set emulator): calls to user-provided analysis routines

At run time: dynamic switches between both modes
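The following Python sketch models the two execution modes and the switches between them. It is a toy, not calvin2+DICE: the program is a list of instruction strings, "fast mode" simply jumps to the next checkpoint without calling anything, and the hypothetical `run_two_modes` function treats every `checkpoint_interval`-th instruction as a checkpoint, as a stand-in for checkpoints at procedure calls and inside loops. The slice schedule (`slice_starts`, `slice_length`) is also an assumption.

```python
from typing import Callable, List


def run_two_modes(program: List[str],
                  slice_starts: List[int],
                  slice_length: int,
                  analysis: Callable[[int, str], None],
                  checkpoint_interval: int = 100) -> int:
    """Toy model of switching between a fast mode and an emulation mode.

    Returns the number of instructions handed to the analysis routine.
    """
    pending = sorted(slice_starts)
    emulated = 0
    pc = 0
    while pc < len(program):
        # Fast mode: run straight to the next checkpoint, no analysis calls.
        pc = min((pc // checkpoint_interval + 1) * checkpoint_interval, len(program))
        # Switching event: a requested slice start has been reached.
        if pending and pc >= pending[0]:
            pending.pop(0)
            end = min(pc + slice_length, len(program))
            # Emulation mode: every instruction goes through the analysis routine.
            while pc < end:
                analysis(pc, program[pc])
                pc += 1
                emulated += 1
            # Requests that fell inside the emulated slice are dropped.
            while pending and pending[0] < pc:
                pending.pop(0)
    return emulated
```

Because switches can only happen at checkpoints, emulation starts at the first checkpoint at or after each requested position; denser checkpoints give finer positioning at the cost of more checks in fast mode.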


Implementation

[Figure: the original code (SPARC V9 assembly code) is annotated by calvin2, a static code annotation tool, which inserts checkpoints. At run time, a switching event at a checkpoint transfers control to DICE, a host-ISA emulator that runs in emulation mode and calls the user analysis routines.]
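As a rough illustration of static code annotation, the sketch below inserts a checkpoint before every call and every backward branch (a loop back-edge), matching the checkpoint placement reported for the SPEC95 runs (procedure calls and inside loops). It works on a made-up mini-IR of `(opcode, operand)` pairs rather than SPARC V9 assembly, and `annotate` is a hypothetical name; an annotator working on real assembly would also need to preserve machine state around each checkpoint.

```python
from typing import List, Tuple

# Hypothetical mini-IR: ("label", name), ("call", proc), ("branch", label), ...
Instr = Tuple[str, str]


def annotate(code: List[Instr]) -> List[Instr]:
    """Insert a checkpoint before each call and each backward branch."""
    defined_labels = set()
    annotated: List[Instr] = []
    for op, arg in code:
        if op == "label":
            defined_labels.add(arg)
        # A branch to an already defined label jumps backward: a loop back-edge.
        back_edge = op == "branch" and arg in defined_labels
        if op == "call" or back_edge:
            annotated.append(("checkpoint", ""))
        annotated.append((op, arg))
    return annotated
```

Forward branches (for example, a loop exit) are left alone, so only calls and loop iterations pay the checkpoint cost in fast mode.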


Performance on the SPEC95 Benchmarks

• calvin2+DICE:
  – Average slowdown in fast mode: 1.31 (checkpoints at procedure calls and inside loops)
  – Average slowdown in emulation mode (instruction and data address trace): 117.47

• Shade (instruction and data address generation enabled):
  – Average slowdown in “fast forward” mode: 17.07 (empty analysis routine)
  – Average slowdown in emulation mode: 82.19 (tracing analysis routine)


A Simple Example of Microprocessor Simulation

• Simulation of 1% of a 1-hour workload

• Additional 1000× slowdown for the simulation itself

With calvin2+DICE (direct execution, then emulation + simulation):
0.99 × 1.31 + 0.01 × (117.45 + 1000) ≈ 12.5 hours

With Shade (fast forward, then emulation + simulation):
0.99 × 17.07 + 0.01 × (82.19 + 1000) ≈ 27.7 hours
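These estimates follow a simple cost model: the skipped 99% runs at the fast-forward (or direct-execution) slowdown, and the simulated 1% pays the emulation slowdown plus the 1000× simulator slowdown. A Python version, with a function and parameter names of our choosing:

```python
def simulation_time_hours(workload_hours: float,
                          simulated_fraction: float,
                          skip_slowdown: float,
                          emulation_slowdown: float,
                          simulator_slowdown: float) -> float:
    """Total time when a fraction of the run is simulated and the rest is skipped."""
    skipped = (1.0 - simulated_fraction) * skip_slowdown
    simulated = simulated_fraction * (emulation_slowdown + simulator_slowdown)
    return workload_hours * (skipped + simulated)


calvin2_dice = simulation_time_hours(1.0, 0.01, 1.31, 117.45, 1000)
shade = simulation_time_hours(1.0, 0.01, 17.07, 82.19, 1000)
print(f"calvin2+DICE: {calvin2_dice:.1f} h, Shade: {shade:.1f} h")
```

This prints `calvin2+DICE: 12.5 h, Shade: 27.7 h`. The performance slide lists 117.47 for the emulation slowdown; using that value instead still gives 12.5 hours. The skipped part dominates Shade's total, which is why the fast forwarding mode drives overall performance.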


Conclusion for calvin2+DICE

• Performance of the emulator: not an issue

• Overall performance is determined by the performance of the fast forwarding mode (long-running workloads)

⇒ calvin2+DICE enables simulations on slices spread over a whole application


Introduction

• On-the-fly simulations using realistic applications in an affordable time ⇒ simulations of a reduced number of instructions
  – Before: one “big slice” (after the program start-up phase)
  – With calvin2+DICE: on-the-fly statistical sampling

• Number of simulated instructions often determined by:
  – The simulation time
  – Empirical results

Representativeness of the simulated instructions?

[Figure: execution timelines in instructions (0, 500M, 1B, 1.5B, ...) marking the simulated slices]


Our Approach

• Dynamic characterization of the target programs

• Selection of representative execution slices for simulation (classification)

Aim: tune a per-program amount of simulated activity, to reduce simulation time or increase the accuracy of simulation results


Dynamic Characterization of the Target Programs

[Figure: the execution is split into slices 0, 1, 2, ..., N, each of which goes through program characterization]

Metrics independent from the implementation details of the simulated components
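One way to picture per-slice characterization: cut the instruction stream into fixed-size slices and describe each slice by a vector that does not depend on any cache or predictor parameter. The sketch uses the instruction mix, one of the metrics named in the conclusions; the trace format (one instruction-class string per instruction), the slice size, and the `slice_profiles` function are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def slice_profiles(trace: Sequence[str], slice_size: int,
                   classes: Sequence[str]) -> List[List[float]]:
    """Split a trace of instruction classes into fixed-size slices and
    return, for each slice, the relative frequency of every class."""
    profiles = []
    for start in range(0, len(trace), slice_size):
        window = trace[start:start + slice_size]
        counts = Counter(window)
        # A trailing partial slice is normalized by its own length.
        profiles.append([counts[c] / len(window) for c in classes])
    return profiles
```

Each vector describes only program behavior, so the same characterization can serve several simulated cache or predictor configurations.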


Selection of Representative Execution Slices

[Figure: hierarchical classification (dendrogram) of execution slices 0 to 4]

Classes {2,1,3} and {0,4}: two slices selected
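The slides name the method only as hierarchical classification (the results label it CHAVL). As a stand-in, the sketch below uses plain average-linkage agglomerative clustering with Euclidean distance on the slice vectors, merging the two closest classes until `k` remain. It is not CHAVL, and the function names are ours.

```python
import math
from itertools import combinations
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(points: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Average-linkage agglomerative clustering; merge until k classes remain."""
    clusters: List[List[int]] = [[i] for i in range(len(points))]

    def linkage(c1: List[int], c2: List[int]) -> float:
        # Mean pairwise distance between members of the two clusters.
        total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find the closest pair of clusters (a < b) and merge b into a.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters
```

With five slice vectors forming two well-separated groups, `agglomerate(points, 2)` returns the two groups, mirroring the {2,1,3}, {0,4} example; one slice per class is then simulated.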


Selection of Class Representatives

Wmdc indicator: weighted mean of distances from class centers

[Figure: class centers and class representatives]
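A minimal sketch of representative selection, assuming each class center is the mean of its members' vectors and the representative is the member closest to that center. The slide does not spell out how Wmdc is computed; here we assume it is the distance from each representative to its class center, weighted by class size, which measures how well the chosen slices stand in for their classes. `select_representatives` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_representatives(points: Sequence[Sequence[float]],
                           classes: List[List[int]]) -> Tuple[List[int], float]:
    """Return one representative slice per class and an assumed Wmdc value."""
    total = sum(len(members) for members in classes)
    reps: List[int] = []
    wmdc = 0.0
    for members in classes:
        dim = len(points[members[0]])
        # Class center: mean vector of the member slices.
        center = [sum(points[i][d] for i in members) / len(members) for d in range(dim)]
        # Representative: the member slice closest to the center (first one on ties).
        rep = min(members, key=lambda i: distance(points[i], center))
        reps.append(rep)
        # Assumed Wmdc: representative-to-center distance weighted by class size.
        wmdc += len(members) / total * distance(points[rep], center)
    return reps, wmdc
```

A lower value means the selected slices sit closer to the centers of the classes they represent.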


Application to the Data Stream

Data stream characterization:
  – Temporal locality: data reuse distances
  – Spatial locality: data reuse distances with several line sizes

[Figure: relative frequency (%) vs. data reuse distance (in instructions)]
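Data reuse distances can be computed in one pass over the data accesses. This sketch assumes that the distance is the number of instructions between two consecutive accesses to the same data, and that spatial locality is captured by repeating the computation at coarser line granularities (`line_size=1` compares exact byte addresses). The input format, pairs of (instruction index, byte address), and both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def reuse_distances(accesses: Sequence[Tuple[int, int]], line_size: int) -> List[int]:
    """accesses: (instruction_index, byte_address) pairs in program order.
    Returns, for each access to a previously seen line, the number of
    instructions since the previous access to that line."""
    last_use: Dict[int, int] = {}
    distances: List[int] = []
    for instr, addr in accesses:
        line = addr // line_size
        if line in last_use:
            distances.append(instr - last_use[line])
        last_use[line] = instr
    return distances


def relative_frequency(distances: Sequence[int]) -> Dict[int, float]:
    """Histogram of reuse distances, in percent of all reuses."""
    if not distances:
        return {}
    counts = Counter(distances)
    return {d: 100.0 * c / len(distances) for d, c in sorted(counts.items())}
```

Running `reuse_distances` for several `line_size` values (for example 16 to 128 bytes) yields one histogram per granularity; together they form the slice's data-stream vector.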


Results for Trained Cache Simulations on the SPEC95 Benchmarks

[Figure: average relative error, Avg. RE (%), on a 0 to 16 scale, for CHAVL (3.3%), sampling (5%), sampling (10%), and big slice (10%)]

Cache configurations: 4-way set-associative, LRU, write-back, write-allocate; sizes from 4 KB to 512 KB; line sizes from 16 B to 128 B
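For reference, a small cache model in the style of the listed configurations (4-way set-associative, LRU, write-back, write-allocate), plus an average relative error over configurations. The slide does not define Avg. RE; we assume it is the mean of |estimate − reference| / reference in percent, where the reference is the miss ratio of the full run and the estimate comes from the selected slices. Class and function names are ours.

```python
from collections import OrderedDict
from typing import Sequence


class LRUCache:
    """Set-associative cache with LRU replacement, write-back, write-allocate."""

    def __init__(self, size: int, line_size: int, ways: int = 4):
        self.line_size = line_size
        self.ways = ways
        self.num_sets = size // (line_size * ways)
        # One ordered dict per set: tag -> dirty flag, least recently used first.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.accesses = self.misses = self.writebacks = 0

    def access(self, addr: int, is_write: bool) -> None:
        self.accesses += 1
        line = addr // self.line_size
        s = self.sets[line % self.num_sets]
        tag = line // self.num_sets
        if tag in s:
            s.move_to_end(tag)  # hit: becomes most recently used
        else:
            self.misses += 1
            if len(s) == self.ways:
                _, dirty = s.popitem(last=False)  # evict the LRU line
                if dirty:
                    self.writebacks += 1  # write-back: dirty victims go to memory
            s[tag] = False  # write-allocate: write misses also bring the line in
        if is_write:
            s[tag] = True  # write-back: a write only marks the line dirty

    def miss_ratio(self) -> float:
        return self.misses / self.accesses if self.accesses else 0.0


def average_relative_error(reference: Sequence[float],
                           estimate: Sequence[float]) -> float:
    """Mean of |estimate - reference| / reference, in percent."""
    errors = [abs(e - r) / r * 100.0 for r, e in zip(reference, estimate)]
    return sum(errors) / len(errors)
```

One `LRUCache` per configuration (size from 4 KB to 512 KB, line size from 16 B to 128 B) would be fed the full trace for the reference and the selected slices for the estimate.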


Conclusion for representative slice selection

• Similar results with:
  – Branch characterization for branch predictor simulations
  – Data stream characterization, branch characterization, instruction mix, and basic block sizes for data cache simulations and branch predictor simulations

⇒ Program characterization actually helps in tuning the amount of simulated activity


General Conclusion

• calvin2+DICE enables simulations on slices spread over a whole application
• Our approach makes it possible to select representative execution slices

Future Work

• Complete execution-driven simulations (complex microprocessors)
• Operating system activity: LiKE, a Linux Kernel Emulator