slack analysis in the system design loop

27
Slack Analysis in the System Design Loop Girish Venkataramani Carnegie Mellon University, The MathWorks Seth C. Goldstein Carnegie Mellon University

Upload: ivie

Post on 05-Jan-2016

17 views

Category:

Documents


0 download

DESCRIPTION

Slack Analysis in the System Design Loop. Girish VenkataramaniCarnegie Mellon University, The MathWorks Seth C. Goldstein Carnegie Mellon University. Typical System Design Flow. Spec. Scalability Issues: Simulation takes minutes to hours Synthesis takes many hours. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Slack Analysis in the System Design Loop

Slack Analysis in the System Design Loop

Girish Venkataramani Carnegie Mellon University,The MathWorks

Seth C. Goldstein Carnegie Mellon University

Page 2: Slack Analysis in the System Design Loop

2

Typical System Design Flow

Spec.

Mapping + Allocation

Physical Synthesis

System Partitioning

RTL (*.v, *.vhd)Simulation

Code Emission

SystemDesign Loop

Too slow!

Scalability Issues:Simulation takes minutes to hoursSynthesis takes many hours

IR

Page 3: Slack Analysis in the System Design Loop

3

Proposed Design Flow

Spec.

Mapping + Allocation

Physical Synthesis

System Partitioning

RTL (*.v, *.vhd)Simulation

Code Emission

IRTraditionalSystemDesign Loop

Timing Analysis

Optimize Design

Update timing in IR

New Optimization

Loopreplace with

Page 4: Slack Analysis in the System Design Loop

4

Key Contributions

Spec.

Mapping + Allocation

Physical Synthesis

System Partitioning

RTL (*.v, *.vhd)Simulation

Code Emission

IR

Timing Analysis

Optimize Design

Update timing in IR

1. Slack is a distributed representation of system timing2. Linear-time update algorithm based on slack3. > 100x reduction in total design time:

• Optimization Loop runs in seconds/minutes while • System Design Loop runs in hours/days

TraditionalSystemDesign Loop

replace withNew

OptimizationLoop

Page 5: Slack Analysis in the System Design Loop

5

Outline

• Motivation

• Timing Metrics– Cycle time– Slack: An alternative view of cycle time

• Slack Update Algorithm• Experimental Evaluation• Conclusions

Page 6: Slack Analysis in the System Design Loop

6

The Intermediate Representation (IR)

• Models a dynamic system

• Concurrent sub-systems– PE, FSM, S/W, Memory

• Communication between sub-systems based on pre-defined protocols– FIFOs, NoC, shared bus

Sub-System

Sub-System

Sub-System

Sub-System

Network

… …

… …

Transaction-Level Modeling (TLM), [Cai, ISSS 03]Adopted by System-C, Bluespec, Balsa, Tangram

Page 7: Slack Analysis in the System Design Loop

7

Marked Graphs

• Model dynamic system interactions

• Events and transitions– Event: An edge acquires a

token– Transition: Node consumes

inputs and generates outputs

• Encode the communication protocols

S1 S2

S3

token

Page 8: Slack Analysis in the System Design Loop

8

Timing Analysis of IR

• Time Separation between Event (TSEs)– TSE between consecutive firings of same event in steady state is

the mean cycle time

• Mean Cycle Time, CT

• Computing CT is about O(|E|3) complexity [Dasdan 04]

cycleon tokensofnumber is N

cycle,graph a oflatency isC :

)(

i

iwhere

CTi

i

i

NC

CMax

Page 9: Slack Analysis in the System Design Loop

9

Slack as a Timing Metric

• Distributed representation of cycle time– Different type of TSE– Defined on each (input edge, node) pair– How early this input arrives

Slack(S1, S3) = 3Slack(S2, S3) = 0Slack(S3, S1) = 0Slack(S3, S2) = 0

• Zero-slack input is locally critical

• Longest chain of zero-slack events yields the critical cycle or the Global Critical Path (GCP)– Cycle time = Latency of the GCP– Given slack values, computing cycle time has linear complexity

*2 5

8

S1 S2

S3

locallycritical

Page 10: Slack Analysis in the System Design Loop

10

Slack is an Annotation on the IR

Sub-System

Sub-System

Sub-System

Sub-System

Network

… …

… …

SlackSlack

Slack

Slack

Helps in discovering hotspots and applying optimizations

Page 11: Slack Analysis in the System Design Loop

11

Outline

• Motivation

• Timing Metrics

• Slack Update Algorithm

• Experimental Evaluation

• Conclusions

Page 12: Slack Analysis in the System Design Loop

12

Optimizations Change the IR

Spec.

Mapping + Allocation

Physical Synthesis

System Partitioning

RTL (*.v, *.vhd)Simulation

Code Emission

IR+

slack

Optimize Design

Update timing in IR

New Optimization

Loop

Causes changes in component delays,in turn, changing in slack values

Need to update slack on each change

Page 13: Slack Analysis in the System Design Loop

13

Problem Description

• Given a graph model and its current slack values, compute new values of slack when latency of a node changes by a given Δ

2 5

8

S1 S2

S3 6, Δ = -2

Page 14: Slack Analysis in the System Design Loop

14

Insight behind Update Algorithm

• Slack is also latency difference of two branches of a re-convergent fork-join

• If delay of node, sc, increases by Δ– Update slack in surrounding

re-convergent fork-joins– Propagate change globally

sfork

sjoin

e1 e2

P1

P2

Assume δ(P1) > δ(P2), Di = δ(P1) - δ(P2)Slack(e1, sjoin) = 0Slack(e2, sjoin) = Di

Update (let Δ ≤ Di):Slack(e2, sjoin) = Di + Δ

+Δ sc

Page 15: Slack Analysis in the System Design Loop

15

Insight behind Update Algorithm

1. If there is a path from change point to every input of sjoin, then no change in slack

2. If not, then slack changes occur

sjoin

e1 e2

+Δ sc

sfork+Δ

Easy for acyclic graphs, but what about scc graphs?

Page 16: Slack Analysis in the System Design Loop

16

Insight for Cyclic Graphs

• Use token knowledge– Count tokens from change point to every input– If value is equal for all inputs, then no change in slack values

2 5

8

S1 S2

S3

2, Δ = -3

# toks = 1# toks = 2

Change in slack exists

Page 17: Slack Analysis in the System Design Loop

17

Insight for Cyclic Graphs

• Use token knowledge– Count tokens from change point to every input– If value is equal for all inputs, then no change in slack values

2 5

8

S1 S2

S3 6, Δ = -2

# toks = 0 # toks = 0

No change in slack

Page 18: Slack Analysis in the System Design Loop

18

Algorithm Summary

• Initially, find # tokens between every pair of nodes in the graph– Problem formulated as a flow lattice

– Invoked once, complexity is O(|M0| |V|)

• After inducing every change in graph– Compute slack change at each node– Propagate new change to neighboring outputs– Overall complexity is O(|V|)

Page 19: Slack Analysis in the System Design Loop

19

Outline

• Motivation

• Timing Metrics

• Slack Update Algorithm

• Experimental Evaluation

• Conclusions

Page 20: Slack Analysis in the System Design Loop

20

Experimental Setup

• Slack Update loop incorporated into CASH compiler [ASPLOS 04, DAC 07]

– Synthesizes asynchronous circuits from ANSI-C programs

• Applied three different optimizations– SM: Slack Matching [ICCAD 06]

– OC: Operation Chaining [ICCAD 07]– ASU: Heterogeneous Pipeline Synthesis [Async 08]

• Benchmarks: Fifteen frequently executed kernels from Mediabench suite, [Lee 97]

• All results are post-synthesis mapped to ST Micro 180nm library

Page 21: Slack Analysis in the System Design Loop

21

Absolute Accuracy

• After SM, compare computed values of slack against actual values of slack for adpcm_d

0

10

20

30

40

50

60

70

80

90

100

-100 -50 0 50 100

% Slack Difference = [(Actual - Estimate)*100 / Actual]

% o

f T

ota

l Eve

nts

• Close to 100 changes applied (algo invoked for each change)

• 1.2x performance speedup

• Update inaccuracy due to unknown latency values during circuit transformation

Page 22: Slack Analysis in the System Design Loop

22

Design Loop Experiments

Spec.

Mapping + Allocation

Physical Synthesis

System Partitioning

RTL (*.v, *.vhd)Simulation

Code Emission

IRSystemDesign Loop

Timing Analysis

Optimize Design

Update timing in IR

OptimizationLoop

Run the same N optimizations in both loops and compare:1. Overall Performance change2. Overall design time

Page 23: Slack Analysis in the System Design Loop

23

Design Loop Experiments

• Three optimization sequences1. SM-ASU: Slack matching followed by

Heterogenous latch insertion

2. ASU-SM

3. SM-ASU-OC

• Compare final circuit timing and loop traversal (design) times between the two methodologies

Page 24: Slack Analysis in the System Design Loop

24

Design Loop Experiments

• About ~500 circuit changes, on average• Design Loop time: 0.5 – 4 hours• Optimization loop: 10 - 100 seconds

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

K1.ad

pcm

_d

K2.ad

pcm

_e

K3.g7

21_d

K4.g7

21_e

K5.gs

m_d

K6.gs

m_d

K7.gs

m_e

K8.gs

m_e

K9.jpe

g_d

K10.jp

eg_e

K11.m

peg2

_d

K12.m

peg2

_d

K13.m

peg2

_e

K14.p

gp_d

K15.p

gp_e

Sp

ee

d-u

p o

ve

r B

as

e

Design Loop

Optimization Loop

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

K1.ad

pcm

_d

K2.ad

pcm

_e

K3.g7

21_d

K4.g7

21_e

K5.gs

m_d

K6.gs

m_d

K7.gs

m_e

K8.gs

m_e

K9.jpe

g_d

K10.jp

eg_e

K11.m

peg2

_d

K12.m

peg2

_d

K13.m

peg2

_e

K14.p

gp_d

K15.p

gp_e

Sp

ee

du

p o

ve

r B

as

e

Design Loop

Optimization Loop

SM-ASU ASU-SM

Page 25: Slack Analysis in the System Design Loop

25

Design Loop Experiments: SM-ASU-OC

• About ~1000-3000 circuit changes, on average• Design Loop time: 1 – 10 hours• Optimization loop: 20 - 200 seconds

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

K1.ad

pcm

_d

K2.ad

pcm

_e

K3.g7

21_d

K4.g7

21_e

K5.gs

m_d

K6.gs

m_d

K7.gs

m_e

K8.gs

m_e

K9.jpe

g_d

K10.jp

eg_e

K11.m

peg2

_d

K12.m

peg2

_d

K13.m

peg2

_e

K14.p

gp_d

K15.p

gp_e

Sp

ee

du

p o

ve

r B

as

e

Design Loop

Optimization Loop

d

300x Speedup

Page 26: Slack Analysis in the System Design Loop

26

Outline

• Motivation

• Timing Metrics

• Slack Update Algorithm

• Experimental Evaluation

• Conclusions

Page 27: Slack Analysis in the System Design Loop

27

Conclusions

• Slack update algorithm to speed up design loop in TLM-based workflows

• Use slack to track system-level timing changes

• Orders of magnitude reduction in design time at negligible loss in accuracy

• New optimizations loop enables scalability in iterative system design flows