Clustering of Flip-Flops for Useful-Skew Clock Tree Synthesis

Chuan Yean Tan • Purdue University
Rickard Ewetz • University of Central Florida
Cheng-Kok Koh • Purdue University


Outline
• MBFF Introduction
• Previous MBFF/Clustering Solutions
• Bounded Arrival Time Useful Skew Tree
• Clustering of Flip-Flops Mechanics
• Proposed Methodology
• Evaluation & Experimental Results
• Summary


Multi-Bit Flip Flop (MBFF)

[Figure: a 1-Bit Flip-Flop (D0/Q0), a 2-Bit Flip-Flop (D0/Q0, D1/Q1), and a 4-Bit Flip-Flop (D0/Q0 through D3/Q3), each drawn with a single CLK pin.]

Benefits of utilizing MBFF

• Reduces power consumption
• Reduces routing complexity and clock tree complexity: fewer elements to connect means less routing

[Figure: MBFF power and area savings [3].]

[3] L. Chen et al. Using MBFF for clock power saving by Design Compiler. Proceedings of Synopsys User Group, 2010.


Previous Works Summary

Key Methodologies of Previous Works:
• Timing-driven intersection of feasible regions [14]
• Maximum independent set based on a heuristic approach [2]
• x- and y-interval graphs to determine the maximum clique [9]
• K-means algorithm [15, 10]

[2] Y. T. Chang et al. Post-placement power optimization with multi-bit flip-flops. ICCAD 2010.
[9] I. H. R. Jiang et al. INTEGRA: Fast multi-bit flip-flop clustering for clock power saving. IEEE Transactions on CAD of Integrated Circuits and Systems, 2015.
[10] A. B. Kahng et al. Improved flop tray-based design implementation for power reduction. ICCAD 2016.
[14] S. H. Wang et al. Power-driven flip-flop merging and relocation. IEEE Transactions on CAD of Integrated Circuits and Systems, 2012.
[15] G. Wu et al. Flip-flop clustering by weighted k-means algorithm. DAC 2016.

Timing-Driven Clustering of Flip-Flops

[Figure: a flip-flop (D, Q, CLK) between its fanin_i and fanout_i gates, annotated with the fanout’s slack and the flip-flop’s feasible region.]

[Figure: a flip-flop (D, Q, CLK) with fanin_i and fanout_i, and its feasible region.]

The flip-flop is allowed to move within its feasible region.

Analysis of Previous Works

• Previous works cluster flip-flops without considering clock tree synthesis (CTS)

• Timing slacks are obtained pre-CTS, so they do not correlate accurately with post-CTS timing

• Most modern designs adopt a useful-skew tree (UST) to reduce area and power consumption

• However, once the UST has been built, updating or displacing an FF’s placement is expensive


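[Figure: launch flip-flop FF_i drives capture flip-flop FF_j through combinational logic. Annotated delays: $t_i^{CQ}$, $t_{ij}^{\max}$, $t_{ij}^{\min}$, $t_j^{S}$, $t_j^{H}$.]

Setup Constraints: $t_i + t_i^{CQ} + t_{ij}^{\max} + t_j^{S} \le t_j + T$

Hold Constraints: $t_i + t_i^{CQ} + t_{ij}^{\min} \ge t_j + t_j^{H}$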

Setup Constraints: $t_i + t_i^{CQ} + t_{ij}^{\max} + t_j^{S} \le t_j + T$

Hold Constraints: $t_i + t_i^{CQ} + t_{ij}^{\min} \ge t_j + t_j^{H}$

Constraints Formulation:
$u_{ij} = T - t_i^{CQ} - t_{ij}^{\max} - t_j^{S}$
$l_{ij} = t_j^{H} - t_i^{CQ} - t_{ij}^{\min}$

Skew Constraints: $l_{ij} \le t_i - t_j \le u_{ij}$, i.e. $l_{ij} \le \mathrm{skew}_{ij} \le u_{ij}$
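As a worked illustration of the formulation above, the following Python sketch computes the skew bounds for one launch/capture pair and checks a candidate pair of clock arrival times against them. The function names `skew_bounds` and `skew_is_feasible` and all numeric values are hypothetical, chosen only for illustration.

```python
def skew_bounds(T, t_cq_i, t_max_ij, t_min_ij, t_setup_j, t_hold_j):
    """Return (l_ij, u_ij) such that l_ij <= t_i - t_j <= u_ij."""
    u_ij = T - t_cq_i - t_max_ij - t_setup_j   # rearranged setup constraint
    l_ij = t_hold_j - t_cq_i - t_min_ij        # rearranged hold constraint
    return l_ij, u_ij


def skew_is_feasible(t_i, t_j, l_ij, u_ij):
    """Check l_ij <= t_i - t_j <= u_ij for clock arrival times t_i and t_j."""
    return l_ij <= (t_i - t_j) <= u_ij


# Hypothetical values in picoseconds.
l_ij, u_ij = skew_bounds(T=1000, t_cq_i=50, t_max_ij=800, t_min_ij=100,
                         t_setup_j=30, t_hold_j=20)
print(l_ij, u_ij)  # -130 120
```

With these values the launch clock may arrive up to 120 ps later or up to 130 ps earlier than the capture clock.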

Skew Constraint Graph (SCG)

[Figure: skew constraint graph with vertices FF1, FF2, FF3, and FF4, and one skew constraint per edge:]
• $l_{12} \le t_1 - t_2 \le u_{12}$
• $l_{13} \le t_1 - t_3 \le u_{13}$
• $l_{24} \le t_2 - t_4 \le u_{24}$
• $l_{34} \le t_3 - t_4 \le u_{34}$

From the skew constraints: $t_1 - t_2 \le u_{12}$ and $t_2 - t_1 \le -l_{12}$

Weighted Edges: $w_{12} = u_{12}$, $w_{21} = -l_{12}$

Generalized weighted constraint: $t_i - t_j \le w_{ij}$
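The edge-weight rule can be sketched in a few lines of Python. The dictionary layout, the name `build_scg`, and the numeric bounds are assumptions made for illustration; only the rule $w_{ij} = u_{ij}$, $w_{ji} = -l_{ij}$ comes from the slide.

```python
def build_scg(skew_constraints):
    """Turn {(i, j): (l_ij, u_ij)} into weighted edges {(i, j): w_ij}.

    Every edge (i, j) encodes the single inequality t_i - t_j <= w_ij.
    """
    edges = {}
    for (i, j), (l_ij, u_ij) in skew_constraints.items():
        # Keep the tightest weight if the same edge is produced twice.
        edges[(i, j)] = min(u_ij, edges.get((i, j), float("inf")))
        edges[(j, i)] = min(-l_ij, edges.get((j, i), float("inf")))
    return edges


# Hypothetical bounds (ps) for the four constraints of the example graph.
constraints = {
    (1, 2): (-130, 120),
    (1, 3): (-80, 60),
    (2, 4): (-40, 200),
    (3, 4): (-10, 90),
}
scg = build_scg(constraints)
print(scg[(1, 2)], scg[(2, 1)])  # 120 130
```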

Useful-Skew Tree based on Arrival Time Constraints [5]

[Figure: the skew constraint graph (SCG) over FF1, FF2, FF3, and FF4 is converted by an LP formulation into one arrival time range per flip-flop, [ff1_lb, ff1_ub] through [ff4_lb, ff4_ub]. Annotation: no negative cycle from the SCG.]

[5] R. Ewetz et al. Clock tree construction based on arrival time constraints. ACM 2017.
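The "no negative cycle" condition can be checked with a standard Bellman-Ford pass over the constraints $t_i - t_j \le w_{ij}$. The sketch below is a generic difference-constraint feasibility check, not the LP formulation of [5]; the function name and the edge values are hypothetical.

```python
def feasible_arrival_times(nodes, edges):
    """Return {node: t} satisfying every t_i - t_j <= w_ij, or None.

    edges maps (i, j) to w_ij. None means the graph has a negative cycle,
    so no assignment of arrival times can satisfy all skew constraints.
    """
    t = {v: 0 for v in nodes}            # acts as a virtual source at distance 0
    for _ in range(len(nodes)):
        changed = False
        for (i, j), w in edges.items():
            if t[j] + w < t[i]:          # constraint violated: tighten t_i
                t[i] = t[j] + w
                changed = True
        if not changed:
            return t                     # converged: all constraints hold
    return None                          # still relaxing: negative cycle


# Hypothetical two-FF examples (weights in ps).
ok = feasible_arrival_times([1, 2], {(1, 2): 120, (2, 1): 130})
bad = feasible_arrival_times([1, 2], {(1, 2): -10, (2, 1): 5})
print(ok, bad)
```

The second example has cycle weight -10 + 5 < 0, so it is reported as infeasible.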

Useful-Skew Tree based on Arrival Time Constraints [5]

[Figure: the arrival time ranges [ff1_lb, ff1_ub] through [ff4_lb, ff4_ub] are the input to clock tree construction (clock tree synthesis) for the four flip-flops.]


Clustering of FFs Mechanics

[Figure: two merge candidates, FF_i (Merge FF Candidate 1) and FF_j (Merge FF Candidate 2), each with its own fanin and fanout gates (fanin_i, fanout_i, fanin_j, fanout_j).]

Displacing FFs

[Figure: FF_i (Merge FF Candidate 1) and FF_j (Merge FF Candidate 2) are displaced to the MBFF merged location, with the displacements labeled Δk_i and Δk_j.]

Worst case: displacing an FF increases wirelength.

Displacing FFs

[Figure: FF_i (Merge FF Candidate 1) and FF_j (Merge FF Candidate 2) are merged into MBFF_ij (D0/Q0, D1/Q1, CLK) through two updates, update1 and update2.]

Moving each FF requires an update to the SCG to avoid outdated skew constraints.

Displacing FFs

[Figure: FF_i (Merge FF Candidate 1) and FF_j (Merge FF Candidate 2) are merged into MBFF_ij (D0/Q0, D1/Q1, CLK) through update1 and update2.]

Furthermore, displacing FF_i does not guarantee that all skew constraints will be met after update1.

Timing Delays

[Figure: FF_i (Merge FF Candidate 1) and FF_j (Merge FF Candidate 2) are moved to the merge location, with the displacements labeled Δk_i and Δk_j.]

Let Δk = timing delay caused by the increase in wirelength.

Timing Delays

[Figure: MBFF_ij (D0/Q0, D1/Q1, CLK) connected to fanin_i, fanout_i, fanin_j, and fanout_j, with the four connections labeled Δk1 through Δk4.]

ΣΔk_i = total increase in timing delay from displacing the FFs
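A minimal sketch of how ΣΔk_i could be tallied is shown below. The slides do not give a delay model, so this assumes, purely for illustration, that every fanin and fanout net of a displaced FF grows by the FF's Manhattan displacement and that delay grows linearly with added wirelength. The function names, coordinates, and the 2 ps per unit figure are hypothetical.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def displacement_delays(ff_locations, merged_location, delay_per_unit):
    """Return {net: delta_k} for the fanin and fanout net of every displaced FF.

    Worst-case assumption: each net attached to an FF grows by the FF's full
    displacement, and delay is linear in the added wirelength.
    """
    deltas = {}
    for name, loc in ff_locations.items():
        d = delay_per_unit * manhattan(loc, merged_location)
        deltas[f"fanin_{name}"] = d
        deltas[f"fanout_{name}"] = d
    return deltas


# Hypothetical placement: FF_i and FF_j merge at (4, 0); 2 ps per unit length.
deltas = displacement_delays({"i": (0, 0), "j": (10, 0)}, (4, 0), delay_per_unit=2)
total_delay = sum(deltas.values())
print(total_delay)  # 40
```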

Comparing FFs’ Arrival Time Ranges

[Figure: example arrival time ranges, each drawn from a lower bound (lb) to an upper bound (ub), for FF1 through FF5, derived from the skew constraints.]

Merging Flip-Flop Candidates

[Figure: the arrival time ranges of FF1 through FF5, with a “Merge Candidate” label marking the FFs considered for merging.]

Overlapping Arrival Time Ranges Between FFs

[Figure: the arrival time ranges of FF1 through FF5, with the overlapping region marked.]

Updating New Arrival Time Range

[Figure: arrival time ranges for FF1, FF2, MBFF1, and FF5. The MBFF1 range, from MBFF1_lb to MBFF1_ub, is the resulting merged FF arrival time range.]

Fast computation to update arrival time ranges after merging FFs.
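The update can be sketched as an interval intersection, assuming (as the "Overlapping Region" figure suggests, but the slides do not state explicitly) that the merged FF inherits the overlap of its candidates' ranges. The function name and the numeric ranges are hypothetical.

```python
def merged_arrival_range(range_a, range_b):
    """Intersect two (lb, ub) arrival time ranges.

    Returns the overlapping (lb, ub), or None when the ranges are disjoint
    and the two FFs therefore cannot share one clock pin.
    """
    lb = max(range_a[0], range_b[0])
    ub = min(range_a[1], range_b[1])
    return (lb, ub) if lb <= ub else None


# Hypothetical ranges in picoseconds.
ff3 = (20, 60)
ff4 = (40, 90)
ff1 = (100, 140)
print(merged_arrival_range(ff3, ff4))  # (40, 60)
print(merged_arrival_range(ff3, ff1))  # None
```

Each merge costs two comparisons, which is consistent with the "fast computation" remark: no skew constraint graph has to be rebuilt to obtain the new range.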


Proposed Methodology

[Flowchart: Design Netlist → Specify Arrival Time Range → Select a FF Candidate Randomly → Cluster Flip-Flops Algorithm → Clock Tree Synthesis. If not all FFs have been evaluated, the flow loops back from the Cluster Flip-Flops Algorithm to selecting another FF candidate.]

Cluster Flip-Flops Algorithm

[Flowchart: Find Flip-Flop Nearest Neighbor → Arrival Time Ranges Overlap? → (YES) Compute Cluster Location → Scaled Arrival Time Ranges Overlap? → (YES) Cluster Flip-Flops. A NO at either check leads to New FF Candidate.]
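The control flow of the flowchart can be sketched as a small Python driver. This is a skeleton under stated assumptions, not the authors' implementation: the four steps are passed in as callables, the function name `cluster_flip_flops` is hypothetical, and the toy 1-D data and the stand-in scaled-range check exist only to make the example runnable.

```python
import random


def cluster_flip_flops(ffs, nearest_neighbor, ranges_overlap,
                       cluster_location, scaled_ranges_overlap, seed=0):
    """Walk the flowchart once per FF candidate and return the clustered pairs."""
    order = list(ffs)
    random.Random(seed).shuffle(order)            # select FF candidates randomly
    clustered = set()
    pairs = []
    for cand in order:
        if cand in clustered:
            continue
        pool = [f for f in ffs if f != cand and f not in clustered]
        if not pool:
            continue
        nn = nearest_neighbor(cand, pool)         # find nearest neighbor
        if not ranges_overlap(cand, nn):          # NO: new FF candidate
            continue
        loc = cluster_location(cand, nn)          # compute cluster location
        if not scaled_ranges_overlap(cand, nn, loc):
            continue                              # NO: new FF candidate
        clustered.update((cand, nn))              # YES: cluster flip-flops
        pairs.append((cand, nn, loc))
    return pairs


# Toy 1-D data (hypothetical): position and arrival time range per FF.
pos = {"a": 0.0, "b": 2.0, "c": 100.0}
arr = {"a": (0, 100), "b": (50, 150), "c": (500, 600)}


def overlaps(r, s):
    return max(r[0], s[0]) <= min(r[1], s[1])


pairs = cluster_flip_flops(
    list(pos),
    nearest_neighbor=lambda c, pool: min(pool, key=lambda f: abs(pos[f] - pos[c])),
    ranges_overlap=lambda a, b: overlaps(arr[a], arr[b]),
    cluster_location=lambda a, b: (pos[a] + pos[b]) / 2,
    # Stand-in: reuses the unscaled check; the scaling rule is not modeled here.
    scaled_ranges_overlap=lambda a, b, loc: overlaps(arr[a], arr[b]),
)
print(pairs)
```

In this toy input, FFs a and b end up in one cluster regardless of the visiting order, while c is left alone because its range overlaps neither of the others.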


Nearest Neighbor FF Selection Heuristic

[Figure: seven flip-flops, with nearest neighbors grouped into Cluster Pair 1, Cluster Pair 2, and Cluster Pair 3.]


Compute Clustering Location

Cluster Location Algorithm:
• Weight each FF candidate by its fanout
• Higher fanout → higher capacitive load

[Figure: Merge FF Candidate 1, FF_i (No. of Fanout = 5), and Merge FF Candidate 2, FF_j (No. of Fanout = 2), with the cluster location between them.]

Compute Clustering Location

[Figure: Merge FF Candidate 1, FF_i (No. of Fanout = 5), and Merge FF Candidate 2, FF_j (No. of Fanout = 2), with the cluster location between them.]

Displacing an FF with a higher capacitive load incurs a larger delay change.
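One simple way to realize fanout weighting is a fanout-weighted centroid, which pulls the cluster location toward the FF with the larger fanout so that the heavier-loaded FF moves less. The slides state only that candidates are weighted by fanout, so the centroid formula, the function name, and the coordinates below are assumptions for illustration.

```python
def cluster_location(candidates):
    """Return the fanout-weighted centroid of the merge candidates.

    candidates is a list of ((x, y), fanout) tuples; fanout must be positive.
    """
    total = sum(fanout for _, fanout in candidates)
    x = sum(p[0] * fanout for p, fanout in candidates) / total
    y = sum(p[1] * fanout for p, fanout in candidates) / total
    return (x, y)


# Fanouts 5 and 2 as on the slide; the coordinates are hypothetical.
ff_i = ((0.0, 0.0), 5)
ff_j = ((7.0, 0.0), 2)
print(cluster_location([ff_i, ff_j]))  # (2.0, 0.0)
```

Here FF_i (fanout 5) moves 2 units while FF_j (fanout 2) moves 5 units.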


Arrival Time Range Overlaps

[Figure: the arrival time ranges of FF1 through FF5, with the overlapping region marked.]

Updating Scaled Arrival Time Ranges

[Figure: arrival time ranges for FF1, FF2, MBFF1, and FF5. The MBFF1 range is scaled according to the FFs’ displacement.]

The scaled arrival time range is updated after the FF has been moved.
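The slides do not spell out the scaling rule, so the sketch below assumes one plausible form: each candidate's range is tightened at both ends by the delay its displacement introduces, and the merge is accepted only if the tightened ranges still overlap. The function names and all numbers are hypothetical.

```python
def scale_range(arrival_range, delay):
    """Shrink an (lb, ub) range by `delay` at both ends; None if it collapses."""
    lb, ub = arrival_range[0] + delay, arrival_range[1] - delay
    return (lb, ub) if lb <= ub else None


def scaled_overlap(range_i, delay_i, range_j, delay_j):
    """Overlap of the two scaled ranges, or None if the merge must be rejected."""
    scaled_i = scale_range(range_i, delay_i)
    scaled_j = scale_range(range_j, delay_j)
    if scaled_i is None or scaled_j is None:
        return None
    lb = max(scaled_i[0], scaled_j[0])
    ub = min(scaled_i[1], scaled_j[1])
    return (lb, ub) if lb <= ub else None


# Hypothetical ranges and displacement delays in picoseconds.
print(scaled_overlap((20, 60), 5, (40, 90), 10))   # (50, 55)
print(scaled_overlap((20, 60), 15, (40, 90), 20))  # None
```

The second call shows a pair whose unscaled ranges overlap but whose scaled ranges do not, which is the case the second check in the flowchart is meant to catch.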


Evaluation Framework

Monte Carlo Framework:
• Process Variation (10%)
• Voltage Variation (15%)
• Temperature Variation (30%)

Tree Structure:
• Zero-Skew Tree (ZST)
• Useful-Skew Tree (UST) [5]

[Flowchart: Cluster FF → CTS [5] → CTO [16, 17, 18] → Quality of Results (QOR)]

[5] R. Ewetz et al. Clock tree construction based on arrival time constraints. ACM 2017.
[16] R. Ewetz et al. MCMM clock tree optimization based on slack redistribution using a reduced slack graph. ASP-DAC ’16.
[17] V. Ramachandran. Construction of minimal functional skew clock trees. ISPD ’12.
[18] S. Roy et al. Clock tree resynthesis for multi-corner multi-mode timing closure. ISPD ’14.

Clustering FF Results

• 8% – 14% decrease in total capacitance
• Benchmarks from IWLS 2005 [8]

[8] International Workshop on Logic and Synthesis. http://irwls.org/iwls2005/benchmarks.html, 2005.

Clustering FF + MBFF Transformation Results

20% – 34% decrease in total capacitance

Data & Clock Path Cap Reduction

Assuming a worst-case data path wirelength increase:

Clock Path Wire Cap Reduction >> Data Path Wire Cap Reduction


Summary

• Displacing FFs requires updating the skew constraints to ensure timing is met

• Clustering of flip-flops based on bounded arrival time range clock tree synthesis provides a fast update to the skew constraints while ensuring timing correctness

• Clustering of FFs reduces routing complexity

• MBFF transformation further reduces power consumption

Q&A

Backup Slides

Bounded Arrival Time Ranges

[Figure: arrival time ranges for FF1, FF2, MBFF1, and FF5. The MBFF1 range is the resulting merged FF arrival time range.]

Skew Constraint Graph (SCG)

[Figure: skew constraint graph with vertices FF1, FF2, FF3, and FF4, and one skew constraint per edge:]
• $l_{12} \le t_1 - t_2 \le u_{12}$
• $l_{13} \le t_1 - t_3 \le u_{13}$
• $l_{24} \le t_2 - t_4 \le u_{24}$
• $l_{34} \le t_3 - t_4 \le u_{34}$