Electrical and Computer Engineering Fun Size Your Data: Using Statistical Techniques to Efficiently Compress and Exploit Benchmarking Results David J. Lilja Electrical and Computer Engineering University of Minnesota [email protected]

Electrical and Computer Engineering

Fun Size Your Data: Using Statistical Techniques to

Efficiently Compress and Exploit Benchmarking Results

David J. Lilja

Electrical and Computer Engineering

University of Minnesota

[email protected]

The Problem

• We can generate heaps of data
• But it’s noisy
• Too much to understand or use efficiently

[Figure: benchmark programs producing "heaps o’ data" (a stream of raw measurements)]

A Solution

• Statistical design of experiments techniques
  • Compress complex benchmark results
  • Exploit the results in interesting ways
  • Extract new insights
• Demonstrate using
  • Microarchitecture-aware floorplanning
  • Benchmark classification

Why Do We Need Statistics?

• Draw meaningful conclusions in the presence of noisy measurements
  • Noise filtering
• Aggregate data into meaningful information
  • Data compression

[Figure: "heaps o’ data" reduced to ... x]

Design of Experiments for Data Compression

[Figure: "heaps o’ data" (raw measurements)]

A B C

V1 √ √

V2 √ √ √

V3 √ √

V4 √ √

Effects of each input A, B, C

Effects of interactions AB, AC, BC, ABC
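As a rough illustration of how the effects of inputs and interactions can be extracted from raw results, the sketch below estimates every main and interaction effect from a two-level full factorial experiment. The function name `factorial_effects` and the response values are hypothetical and are not taken from the talk; the responses are generated from a made-up noise-free model so the recovered effects are easy to check.

```python
from itertools import combinations, product
from math import prod


def factorial_effects(names, responses):
    """Estimate main and interaction effects from a two-level full factorial.

    `responses` must follow the run order of product((-1, 1), repeat=m),
    i.e. the first factor changes slowest.
    """
    m = len(names)
    runs = list(product((-1, 1), repeat=m))
    if len(responses) != len(runs):
        raise ValueError("need one response per run (2**m in total)")
    effects = {}
    for size in range(1, m + 1):
        for combo in combinations(range(m), size):
            label = "".join(names[i] for i in combo)
            # Contrast: each response weighted by the sign of this term in its run.
            contrast = sum(prod(run[i] for i in combo) * y
                           for run, y in zip(runs, responses))
            # Mean response at +1 minus mean response at -1.
            effects[label] = contrast / (len(runs) / 2)
    return effects


# Hypothetical responses (assumption): y = 10 + 3A + C + 2AB, no noise.
responses = [10 + 3 * a + c + 2 * a * b
             for a, b, c in product((-1, 1), repeat=3)]
effects = factorial_effects("ABC", responses)
for term, value in effects.items():
    print(f"{term:>3}: {value:+.1f}")
```

Eight raw measurements collapse into seven effect estimates, and only A, C, and AB come out non-zero, which is the kind of compression the slide describes.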

Types of Designs of Experiments

• Full factorial design with replication
  • O(v^m) experiments = O(4^3)
• Fractional factorial designs
  • O(2^m) experiments = O(2^3)
• Multifactorial design (P&B)
  • O(m) experiments = O(3)
  • Main effects only, no interactions
• m-factor resolution x designs
  • k O(2^m) experiments = k O(2^3)
  • Selected interactions
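The experiment counts above can be checked with a few lines of arithmetic. This is a minimal sketch with hypothetical helper names; it assumes v = 4 values per factor and m = 3 factors, as in the slide's example, and uses the standard property that a Plackett-Burman (P&B) design needs a run count that is a multiple of 4 and larger than the number of factors.

```python
def full_factorial_runs(levels, factors, replications=1):
    """Every combination of `levels` values for each factor, repeated."""
    return replications * levels ** factors


def two_level_runs(factors):
    """Every combination when each factor takes only two values."""
    return 2 ** factors


def plackett_burman_runs(factors):
    """Smallest multiple of 4 that is strictly greater than `factors`."""
    return 4 * (factors // 4 + 1)


v, m = 4, 3  # values per factor and number of factors from the example
print("full factorial:  ", full_factorial_runs(v, m))
print("two-level design:", two_level_runs(m))
print("Plackett-Burman: ", plackett_burman_runs(m))
```

The gap widens quickly as m grows: the first two counts are exponential in m, while the P&B count grows linearly.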

Example: Architecture-Aware Floor-Planner

V. Nookala, S. Sapatnekar, D. Lilja, DAC’05.

Motivation

• Imbalance between device and wire delays
• Global wire delays > system clock cycle in nanometer technology

[Figure: layout with a wire]

Solution

• Wire-pipelining
  • If delay > a clock cycle → insert flip-flops (FFs) along a wire
  • Several methods for optimal FF insertion on a wire
    • Li et al. [DATE 02]
    • Cocchini et al. [ICCAD 02]
    • Hassoun et al. [ICCAD 02]

[Figure: layout with flip-flops (FF) inserted along a wire]

But what about the performance impact of the pipeline delays?
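To make the wire-pipelining rule concrete, here is a minimal sketch that counts how many flip-flops a wire would need for a given clock period. It is a simplification, not one of the cited insertion methods: it assumes the wire can be cut into equal segments and ignores flip-flop setup and clock-to-output overhead. The function name and the delay values are hypothetical.

```python
import math


def flip_flops_needed(wire_delay_ns, cycle_time_ns):
    """Flip-flops required so that every wire segment fits in one clock cycle."""
    if wire_delay_ns < 0 or cycle_time_ns <= 0:
        raise ValueError("delay must be >= 0 and cycle time must be > 0")
    # Number of clock cycles the signal needs to cross the wire.
    stages = max(1, math.ceil(wire_delay_ns / cycle_time_ns))
    # One flip-flop sits between each pair of consecutive stages.
    return stages - 1


for delay in (0.8, 1.0, 2.5):
    print(f"delay {delay} ns at 1.0 ns cycle -> {flip_flops_needed(delay, 1.0)} FF(s)")
```

Each inserted flip-flop adds one cycle of latency to the bus, which is the pipeline delay whose performance impact the following slides examine.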

Impact on Performance

Execution time = num-instr * cycles/instr (CPI) * cycle-time

[Figure: wire-pipelining annotation on the execution-time equation]

• Key idea
  • Some buses are critical
  • Some can be freely pipelined without (much) penalty
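The execution-time equation can be evaluated for a few cases to show why it matters which bus is pipelined. All numbers below are invented for illustration and are not results from the talk: the sketch assumes that pipelining halves the cycle time in both cases, and that the added latency raises CPI a lot on a critical bus and only slightly on a non-critical one.

```python
def execution_time(num_instr, cpi, cycle_time_s):
    """Execution time = num-instr * cycles/instr (CPI) * cycle-time."""
    return num_instr * cpi * cycle_time_s


NUM_INSTR = 1_000_000_000  # hypothetical instruction count

# Hypothetical (CPI, cycle time in seconds) for three design points.
cases = {
    "no pipelining":             (1.00, 1.0e-9),
    "critical bus pipelined":    (1.40, 0.5e-9),
    "non-critical bus pipelined": (1.05, 0.5e-9),
}

times = {name: execution_time(NUM_INSTR, cpi, cycle)
         for name, (cpi, cycle) in cases.items()}
for name, seconds in times.items():
    print(f"{name:<27} {seconds:.3f} s")
```

In this made-up example both pipelined designs beat the baseline, but the one that pipelines the non-critical bus keeps most of the clock-speed gain because its CPI barely moves.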

Change Objective Function

• Traditional physical design objectives
  • Minimize area, total wire length, etc.
• New objective
  • Optimize only throughput-critical wires to maximize overall performance

Execution time = num-instr * cycles/instr (CPI) * cycle-time

[Figure: wire-pipelining annotation on the execution-time equation]

Conventional Microarchitecture Interaction with Floor Planner

[Figure: flow diagram. Boxes: Simulation Methodology, Physical Design. Labels: µ-arch, Benchmarks, CPI info, Frequency]

Microarchitecture-aware Physical Design

• Incorporate wire-pipelining models into the simulator
  • Extra pipeline stages in processor
  • Simulator needs to adjust operation latencies

[Figure: flow diagram. Boxes: Simulation Methodology, Physical Design. Labels: µ-arch, Benchmarks, CPI info, Frequency, Layout]

But There are Problems

• Simulation is too slow
  • 2000-3000 host instructions per simulated instruction
  • Numerous benchmark programs to consider
• Exponential search space
  • Thousands of combinations tried in physical design step

[Figure: flow diagram. Boxes: Simulation Methodology, Physical Design. Labels: µ-arch, Benchmarks, CPI info, Frequency, Layout]

Design of Experiments Methodology

[Figure: flow diagram. Boxes: Design of Experiments based Simulation Methodology, Floorplanning, Validation. Labels: µ-arch, benchmarks, bus and interaction weights, Layout, Frequency, MinneSPEC reduced input sets]

• Number of simulations is linear in the number of buses (if no interactions)

Related Floorplanning Work

• Simulated Annealing (SA)
  • CPI look up table [Liao et al, DAC 04]
  • Bus access ratios from simulation profiles
    • Minimize the weighted sum of bus latencies [Ekpanyapong et al, DAC 04]
  • Throughput sensitivity models for a selected few critical paths
    • Limited sampling for a large solution space [Jagannathan et al, ASPDAC 05]
• Our approach
  • Design of experiments to identify criticality of each bus

Microarchitecture and factors

• 22 buses → 19 factors in experimental design
  • Some factors model multiple buses

[Figure: microarchitecture block diagram. Blocks: Fetch, Decode, RUU, REG, BPRED, IL1, DL1, L2, ITLB, LSQ, DTLB, IADD1, IADD2, IADD3, IMULT, FMULT, FADD]

2-level Resolution III Design

• 2 levels for each factor
  • Lowest and highest possible values (range)
• Latency range of buses
  • Min = 0
  • Max = chip corner-to-corner wire latency
• 19 factors → 32 simulations (next power of 2 above 19)
  • Captured by a design matrix (32 x 19)
    • 32 rows: 32 simulations
    • 19 columns: factor values
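One standard way to obtain a 32 x 19 two-level design matrix is sketched below; it is not necessarily the matrix used in the talk. It assumes a full factorial in 5 base factors (2^5 = 32 runs), with each further column formed as the product of a distinct subset of base columns. Here -1 stands for the minimum bus latency and +1 for the maximum. The function name `two_level_design` is hypothetical.

```python
from itertools import combinations, product
from math import prod


def two_level_design(num_factors, num_runs):
    """Return a num_runs x num_factors matrix with entries -1 (min) / +1 (max)."""
    k = num_runs.bit_length() - 1  # number of base factors
    if 2 ** k != num_runs or not 0 < num_factors < num_runs:
        raise ValueError("num_runs must be a power of 2 larger than num_factors")
    # Full factorial in the k base factors gives the rows (one per simulation).
    base_runs = list(product((-1, 1), repeat=k))
    # Each column is a product of a distinct non-empty subset of base columns:
    # the single base factors first, then pairs, triples, and so on.
    words = [w for size in range(1, k + 1)
             for w in combinations(range(k), size)][:num_factors]
    return [[prod(run[i] for i in w) for w in words] for run in base_runs]


design = two_level_design(num_factors=19, num_runs=32)
columns = list(zip(*design))
print(len(design), "simulations x", len(columns), "factors")
print("a full two-level factorial would need", 2 ** 19, "simulations")
```

Because every column is balanced and orthogonal to every other column, main effects can be separated from one another. Some columns coincide with two-factor interactions of the base factors, which is why a design like this is only resolution III.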

Experimental setup

• Nine SPEC 2000 benchmarks
  • MinneSPEC reduced input sets
• SimpleScalar simulator
• Floorplanner: PARQUET
  • Simulated annealing based
• Objective function
  • Minimize the weighted sum of bus latencies
  • Secondarily minimize aspect ratio and area
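The primary objective can be written as a short cost function. This is a sketch only: the bus names, weights, and latencies are made up, and PARQUET's actual cost interface is not shown in the talk. The weights stand in for the bus criticalities produced by the design-of-experiments step, and the latencies are in clock cycles.

```python
def weighted_bus_latency(weights, latencies):
    """Primary floorplanning cost: sum of weight * latency over all buses."""
    missing = set(weights) - set(latencies)
    if missing:
        raise KeyError(f"no latency given for buses: {sorted(missing)}")
    return sum(weights[bus] * latencies[bus] for bus in weights)


# Hypothetical criticality weights (higher = more performance-critical).
weights = {"fetch-il1": 9, "ruu-iadd1": 6, "l2-dl1": 1}

# Hypothetical per-bus latencies (extra cycles) for two candidate layouts.
layout_a = {"fetch-il1": 0, "ruu-iadd1": 1, "l2-dl1": 3}
layout_b = {"fetch-il1": 2, "ruu-iadd1": 1, "l2-dl1": 0}

cost_a = weighted_bus_latency(weights, layout_a)
cost_b = weighted_bus_latency(weights, layout_b)
print("layout A cost:", cost_a)
print("layout B cost:", cost_b)
```

Both layouts insert the same total number of extra cycles (4 and 3 respectively are close), yet layout A scores far better because its delay falls on the least critical bus. A simulated-annealing floorplanner would compare candidate moves with a cost of this kind.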

Comparisons

Case     Description
SFP      Our “statistical floorplanner”
acc      Access ratios from [Ekpanyapong et al, DAC 04]
minWL    Traditional floorplanning

Typical Results for Single Benchmark

Averaged Over All Benchmarks

• Compared to acc
  • 3-7 percentage point improvement
  • Better improvements over acc at higher frequencies
• SFP-comb ≈ SFP (within about 1-3 percentage points)

Summary

• Use statistical design of experiments
  • Compress benchmark data into critical bus weights
• Used by microarchitecture-aware floorplanner
  • Optimizes insertion of pipeline delays on wires to maximize performance
• Extend methodology for other critical objectives
  • Power consumption
  • Heat distribution

Collaborators and Funders

• Vidyasagar Nookala
• Joshua J. Yi
• Sachin Sapatnekar
• Semiconductor Research Corporation (SRC)
• Intel
• IBM