ECE 260B – CSE 241A Clocking 1 http://vlsicad.ucsd.edu
ECE260B – CSE241A
Winter 2005
Clocking
Website: http://vlsicad.ucsd.edu/courses/ece260b-w05
Slides courtesy of Prof. Andrew B. Kahng
Outline
Problem Statement
Clock Distribution Structures
Robustness / Signal Integrity Control
Clock Design:
Skew Scheduling
Topology Construction
Embedding
Why Clocks?
Clocks provide the means to synchronize
- By allowing events to happen at known timing boundaries, we can sequence these events
Greatly simplifies building of state machines
No need to worry about variable delay through combinational logic (CL)
- All signals delayed until clock edge (clock imposes the worst case delay)
[Figure: dataflow pipeline and FSM, each built from combinational logic (CombLogic) blocks and registers]
Courtesy K. Yang, UCLA
Clock Distribution Network
General goal of clock distribution
- Deliver clock to all memory elements with acceptable skew
- Deliver clock edges with acceptable sharpness
Clocking network design is one of the greatest challenges in the design of a large chip
- Consumes up to 1/3 of chip power
- Accurate signal delay
- Signal integrity
- Subject to uncertainty / variation of different processes / operating conditions
Clock Design Components
Oscillator
Dividers
Buffers
- Strong drivers
- Reduce delay
- Signal integrity / slew rate
Interconnects
- Balanced trees, meshes, etc.
- Shielding (e.g., for crosstalk reduction)
- Non-tree links / feedback loops
Clock Distribution Objective
Minimum / bounded skew
- Performance / hold time requirements
Guaranteed slew rate / signal integrity
Small insertion delay
Robustness under process / operating condition variation
Minimum cell / routing area
Minimum power consumption
Clock Distribution Robustness
Subject to:
Radically different loading (flip-flop density)
- Across the die
- ECO (Engineering Change Order)
Interconnect coupling
- Signal integrity
- Delay variation
Process variation
- From lot-to-lot
- Across the die
- Buffers
- Metal width
Supply voltage variation across the die
- Both static IR drop and dynamic voltage drop
Temperature
Issues in Clock Distribution Network Design
Skew
- Process, voltage, and temperature
- Data dependence
- Noise coupling
- Load balancing
Power, CV^2 f (consumes up to 1/3 of total chip power)
- Clock gating
Flexibility / Tunability
- Compactness: fit into existing layout/design
- Facilitate ECO
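As a rough illustration of the CV^2 f term, the sketch below evaluates switching power for a clock network. The capacitance, supply voltage, frequency, and the `activity` factor used to mimic clock gating are hypothetical values chosen for this example, not figures from the lecture.

```python
def dynamic_power_watts(c_farads, vdd_volts, f_hertz, activity=1.0):
    """Switching power P = activity * C * Vdd^2 * f.

    `activity` is an assumed knob: the fraction of the clock capacitance
    that actually toggles (1.0 = ungated clock).
    """
    return activity * c_farads * vdd_volts ** 2 * f_hertz


# Hypothetical numbers, for illustration only.
clock_cap = 2e-9   # 2 nF of total clock network capacitance
vdd = 1.2          # volts
freq = 1e9         # 1 GHz

p_full = dynamic_power_watts(clock_cap, vdd, freq)
p_gated = dynamic_power_watts(clock_cap, vdd, freq, activity=0.6)
print(f"ungated: {p_full:.3f} W, with 40% of the load gated off: {p_gated:.3f} W")
```

Because power is linear in the switched capacitance, gating off a fraction of the clock load removes the same fraction of clock power in this simple model.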
Skew: Clock Delay Varies With Position
Clock Skew Causes
Designed (unavoidable) variations – mismatch in buffer load sizes, interconnect lengths
Process variation – process spread across die yielding different Leff, Tox, etc. values
Temperature gradients – changes MOSFET performance across die
IR voltage drop in power supply – changes MOSFET performance across die
Note: Delay from clock generator to fan-out points (clock latency) is not important by itself
BUT: increased latency leads to larger skew for same amount of relative variation
Sylvester / Shepard, 2001
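A back-of-the-envelope sketch of the latency remark. It assumes two clock branches with the same nominal latency, each of which can deviate by the same relative amount in opposite directions; that model and the numbers are assumptions made only for this example.

```python
def worst_case_skew(latency_ps, rel_variation):
    """Skew between a slow branch and a fast branch of equal nominal latency."""
    slow = latency_ps * (1 + rel_variation)
    fast = latency_ps * (1 - rel_variation)
    return slow - fast


# Same 5% relative variation, two different insertion delays (hypothetical).
skew_short = worst_case_skew(500.0, 0.05)
skew_long = worst_case_skew(1000.0, 0.05)
print(skew_short, skew_long)
```

Doubling the latency doubles the worst-case skew in this model, which is why small insertion delay appears among the clock distribution objectives.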
Outline
Problem Statement
Clock Distribution Structures
Robustness / Signal Integrity Control
Clock Design:
Skew Scheduling
Topology Construction
Embedding
Clock Distribution Structures
RC-Tree
- Less capacitance
- More accuracy
- Flexible wiring
Grids
- Reliable
- Less data dependency
- Tunable (late in design)
Shown here for final stage drivers driving F/F loads
Grids
Gridded clock distribution common on earlier DEC Alpha microprocessors
Advantages:
- Skew determined by grid density, not too sensitive to load position
- Clock signals available everywhere
- Tolerant to process variations
- Usually yields extremely low skew values
Disadvantages:
- Huge amount of wiring and power
- To minimize such penalties, need to make grid pitch coarser, which loses the grid advantage
[Figure: pre-drivers feeding a global grid]
Sylvester / Shepard, 2001
H-Tree
H-tree (Bakoglu)
- One large central driver, recursive structure to match wirelengths
- Halve wire width at branching points to reduce reflections
Disadvantages
- Slew degradation along long RC paths
- Unrealistically large central driver
  - Clock drivers can create large temperature gradients (e.g., Alpha 21064 ~30°C)
- Non-uniform load distribution
- Inherently non-scalable (wire R growth)
- Partial solution: intermediate buffers at branching points
courtesy of P. Zarkesh-Ha
Sylvester / Shepard, 2001
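The "recursive structure to match wirelengths" can be sketched as follows. This is a minimal geometric model only (no buffers, no RC), and the function name, arm lengths, and level count are hypothetical choices for the example.

```python
def h_tree_sinks(x, y, arm, levels, horizontal=True, path=0.0):
    """Return (sink_x, sink_y, wire_length_from_root) for a recursive H-tree.

    Each level branches both ways along one axis. The axis alternates every
    level, and the arm length halves after each vertical stage, so every
    pair of levels draws a smaller "H".
    """
    if levels == 0:
        return [(x, y, path)]
    sinks = []
    for sign in (-1, 1):
        nx = x + sign * arm if horizontal else x
        ny = y if horizontal else y + sign * arm
        next_arm = arm if horizontal else arm / 2
        sinks += h_tree_sinks(nx, ny, next_arm, levels - 1,
                              not horizontal, path + arm)
    return sinks


sinks = h_tree_sinks(0.0, 0.0, arm=4.0, levels=4)
lengths = {length for _, _, length in sinks}
print(len(sinks), "sinks, distinct source-to-sink wirelengths:", lengths)
```

Every sink sees the same wirelength by construction, which is the "ideally zero-skew" property; the disadvantages listed above come from everything this model leaves out (non-uniform loads, wire resistance, driver size).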
Buffered H-tree
Advantages
- Ideally zero-skew
- Can be low power (depending on skew requirements)
- Low area (silicon and wiring)
- CAD tool friendly (regular)
Disadvantages
- Sensitive to process variations
  - Devices: want same size buffers at each level of tree
  - Wires: want similar segment lengths on each layer in each source-sink path
- Local clocking loads inherently non-uniform
Sylvester / Shepard, 2001
Tree Balancing
Some techniques:
a) Introduce dummy loads
b) Snaking of wirelength to match delays
Con: Routing area often more valuable than Silicon
Sylvester / Shepard, 2001
Examples From Processor Chips
H-Tree, Asymmetric RC-Tree (IBM)
Grids: DEC [Alphas]
Serpentines: Intel x86 [Young ISSCC97]
Example Skews From Processor Chips
DEC-Alpha 21064 clock spines
DEC-Alpha 21064 RC delays
DEC-Alpha 21164 RC delays for Global Distribution (Spine + Grid)
DEC-Alpha 21164 RC local delays
ReShape Clocks Example (High-End ASIC)
Balanced, shielded H-tree for pre-clock distribution
Mesh for block level distribution
output mesh
All routes 5-6u M6/5, shielded with 1u grounds
~10 buffers per node
- E.g., ganged BUFx20’s
Output mesh must hit every sub-block
Block Level Mesh (.18u)
Max 600u stride
1u m5 ribs every 20 - 30 u (4 to 6 rows)
Shielded input and output m6 shorting straps
Clumps of 1-6 clock buffers, surrounded by capacitor pads
Pre-clock connects to input shorting straps
Problems with Meshes
Burn more power at low frequencies
Blocks more routing resources (solution: integrated power distribution with ribs can provide shielding for ‘free’)
Difficult for ‘spare’ clock domains that will not tolerate regioning
Post placement (and routing) tuning required
No ‘beneficial skew’ possible
Clock gating only easy at root
Fighting tools to do analysis:
- Clumped buffers a problem in Static Timing Analysis (STA) tools
- Large shorted meshes a problem for STA tools
- What does Elmore delay calculation look like for a non-tree?
- Need full extraction and SPICE-like simulation to determine skew
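To see why the Elmore question is awkward for meshes, it helps to look at the tree case. The sketch below computes Elmore delays on an RC tree in two linear passes; it relies on every node having exactly one parent (one root-to-node path), which a shorted mesh does not provide. The node numbering convention (parent[i] < i) and the R and C values are assumptions made for this example.

```python
def elmore_delays(parent, res, cap):
    """Elmore delay from the root (node 0) to every node of an RC tree.

    parent[i]: index of node i's parent (parent[0] is -1, parent[i] < i)
    res[i]:    resistance of the wire from parent[i] to node i
    cap[i]:    capacitance lumped at node i
    """
    n = len(parent)
    # Pass 1 (leaves to root): total capacitance at or below each node.
    downstream = list(cap)
    for i in range(n - 1, 0, -1):
        downstream[parent[i]] += downstream[i]
    # Pass 2 (root to leaves): each edge adds R_edge * downstream capacitance.
    delay = [0.0] * n
    for i in range(1, n):
        delay[i] = delay[parent[i]] + res[i] * downstream[i]
    return delay


# Hypothetical 4-node tree: 0 -> 1, then 1 -> 2 and 1 -> 3.
parent = [-1, 0, 1, 1]
res = [0.0, 1.0, 2.0, 1.0]
cap = [0.0, 1.0, 3.0, 2.0]
delays = elmore_delays(parent, res, cap)
print(delays, "skew between sinks 2 and 3:", abs(delays[2] - delays[3]))
```

In a mesh, "downstream capacitance" is no longer well defined because current can reach a node along several paths, which is consistent with the slide's point that full extraction and SPICE-like simulation are needed.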
Benefits of Meshes
Deterministic since shielded all the way down to rib distribution
No ECO placement required: all buffers preplaced before block placement
Low latency since uses shorted (= ganged, parallel) drivers, therefore lower skew
ECO placements of FFs later do not require rebalancing of tree
“Idealized” clocking environment for “concurrent dance” of RTL design and timing convergence
Hybrid Structure
Balanced tree on the top
Mesh in the middle
- Minimize skew
Steiner minimum tree at the bottom
- Minimize cost
- Facilitate ECO
Outline
Problem Statement
Clock Distribution Structures
Robustness / Signal Integrity Control
Clock Design:
Skew Scheduling
Topology Construction
Embedding
Process Variation
Intra-die and inter-die variations
- Intra-die variation is increasingly significant since 0.13um technology
Systematic and random variations
- Systematic variation is due to equipment, process, etc.
  - Global lens aberration in lithography causes systematic variation
  - Pattern-dependent optical proximity, chemical mechanical polish (CMP)
- Random variation is due to inherent variation
Spatial correlation across a chip
Fast vs. slow corners
Process Variation
Metal wires
- Width variation can be estimated by LUT(width, spacing)
- Thickness variation
  - CMP local density
  - Thickness variation also depends on wire width and spacing
  - Could be up to 30-40% in 90nm process
Transistors
- Channel length variation (delay ~ L^1.5)
- Thin gate oxide tox variation
- Vth variation
- Up to 30% variation in terms of driving capability
Process Variations – SPICE model
Process variations are reflected into a statistical SPICE model
Usually only a few parameters have a statistical distribution (e.g., {L, W, TOX, VTn, VTp}) and the others are set to a nominal value
The nominal SPICE model is obtained by setting the statistical parameters to their nominal value
Slide courtesy of A. Nardi, J. Rabaey, K. Keutzer of UCB
Global Variations (Inter-die)
Process variations cause performance variations
Critical path delay of a 16-bit adder
All devices have the same set of model parameter values
Slide courtesy of A. Nardi, J. Rabaey, K. Keutzer of UCB
Local Variations (Intra-die)
Each device instance has a slightly different set of model parameter values (aka device mismatch)
The performance of some analog circuits strongly depends on the degree of matching of device properties
Digital circuits are in general more immune to mismatch, but clock distribution network is sensitive (clock skew)
Slide courtesy of A. Nardi, J. Rabaey, K. Keutzer of UCB
Statistical Design
Need to account for process variations during design phase
Statistical design
- Nominal design
- Yield optimization
- Design centering
Slide courtesy of A. Nardi, J. Rabaey, K. Keutzer of UCB
Statistical Design
Slide courtesy of A. Nardi, J. Rabaey, K. Keutzer of UCB
Process Variation Tolerance Enhancement
Rule of thumb: balanced tree
- Identical buffers at identical heights
- Drive identical subtree loads
Can we do better than this?
Process variation tolerant clock design
- Bounded-skew DME
- Topology construction
  - With process variation tolerance in objective
- Useful skew scheduling
  - To the center of permissible ranges
Signal Integrity
Crosstalk
- Capacitive, inductive
Supply voltage drop
- IR, L dI/dt, LC resonance
Temperature
- Increased resistance with higher temperature
Substrate coupling
- Parasitic resistance, capacitance in the substrate layer
Crosstalk
Due to the coupling capacitance between interconnections, a signal switching on a net (aggressor) may affect the voltage waveform on a neighboring net (victim)
Noise Propagation
Increased Delay
Circuit Model for Crosstalk
Crosstalk Simulation
Design for Crosstalk
It can be both capacitive and inductive
- Capacitive is dominant at current switching speeds
To reduce it:
- Use of shielding layer (inter-layer)
- Use of shielding wire (intra-layer)
[Figure: shielding example with labels GND, VDD, GND, Substrate]
Clock Gating
Reduce power consumption by temporarily shutting down part of the circuit
Additional cost of enabling circuits
[Figure: two FFs with combinational logic between them, clocks CLK1 and CLK2, and a CLK ENABLING block]
Outline
Problem Statement
Clock Distribution Structures
Robustness / Signal Integrity Control
Clock Design:
Skew Scheduling
Topology Construction
Embedding
Skew = Local Constraint
D : longest path
d : shortest path
[Figure: two FFs and a skew axis; race condition below the permissible range, cycle time violation above it, safe region in between]
Permissible range: $-d + t_{hold} < \text{skew} < T_{period} - D - t_{setup}$
Timing is correct as long as the clock signals of sequentially adjacent FFs arrive within a permissible skew range
W. Dai, UC Santa Cruz
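A minimal sketch of the permissible-range check. It assumes skew is measured as the clock arrival at the launching FF minus the clock arrival at the capturing FF (the sign convention under which the lower bound is the hold limit and the upper bound is the setup limit); the function names and the numbers are hypothetical.

```python
def permissible_skew_range(d_short, d_long, t_setup, t_hold, t_period):
    """Open interval (lower, upper) of safe skew for one FF-to-FF pair.

    Below `lower` the short path races through (hold violation);
    above `upper` the long path misses the next edge (cycle time violation).
    """
    lower = -d_short + t_hold
    upper = t_period - d_long - t_setup
    return lower, upper


def is_safe(skew, skew_range):
    lower, upper = skew_range
    return lower < skew < upper


# Hypothetical pair: d = 2 ns, D = 5 ns, setup 0.2 ns, hold 0.1 ns, T = 6 ns.
rng = permissible_skew_range(2.0, 5.0, 0.2, 0.1, 6.0)
print(rng, is_safe(0.0, rng), is_safe(1.0, rng), is_safe(-2.5, rng))
```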
“Useful Skew” Design Robustness
[Figure: three FFs in series with path delays 2 ns and 6 ns; T = 6 ns]
Clock schedule “0 0 0”: at verge of violation
Clock schedule “2 0 2”: more safety margin
Design will be more robust if clock signal arrival time is in the middle of permissible skew range, rather than on edge
W. Dai, UC Santa Cruz
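The three-FF example can be checked numerically. The sketch below assumes zero setup and hold times and that each path's shortest and longest delays are equal (2 ns and 6 ns); those assumptions, and reading “0 0 0” and “2 0 2” as clock arrival times in ns at the three FFs, are interpretations made for this example.

```python
def min_slack(arrivals, paths, period, setup=0.0, hold=0.0):
    """Smallest setup or hold slack over all FF-to-FF paths.

    arrivals[k]: clock arrival time at FF k
    paths:       (launch_ff, capture_ff, shortest_delay, longest_delay)
    """
    slacks = []
    for i, j, d_min, d_max in paths:
        # Setup: data launched at arrivals[i] must settle before the next
        # capturing edge at arrivals[j] + period.
        slacks.append(arrivals[j] + period - (arrivals[i] + d_max + setup))
        # Hold: the earliest new data must not reach FF j before the
        # current capturing edge (plus hold time).
        slacks.append(arrivals[i] + d_min - (arrivals[j] + hold))
    return min(slacks)


paths = [(0, 1, 2.0, 2.0), (1, 2, 6.0, 6.0)]  # FF0 -> FF1 -> FF2
slack_zero_skew = min_slack([0.0, 0.0, 0.0], paths, period=6.0)
slack_useful_skew = min_slack([2.0, 0.0, 2.0], paths, period=6.0)
print(slack_zero_skew, slack_useful_skew)
```

Under these assumptions the zero-skew schedule has no margin on the 6 ns path, while the skewed schedule leaves 2 ns on every constraint.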
Constraints on Skews
FF_i receives clock signal delayed by $x_i \ge \text{MIN\_DEL}$
$0 < \alpha \le 1 \le \beta$: if nominal clock delay is $x_i$, then actual clock delay $x$ must fall within interval $\alpha x_i \le x \le \beta x_i$
For FF to operate correctly when clock edge arrives at time x, the correct input data must be present and stable during the time interval (x – SETUP, x + HOLD)
For $1 \le i, j \le L$ (#FFs), we compute lower and upper bounds MIN(i,j) and MAX(i,j) for the time that is required for a signal edge to propagate from FF_i to FF_j
Avoid double-clocking (race condition): $x_i + \text{MIN}(i,j) \ge x_j + \text{HOLD}$
Avoid zero-clocking: $x_i + \text{SETUP} + \text{MAX}(i,j) \le x_j + P$; P = clock period
Optimal Useful Skews by Linear Programming
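LP_SPEED (clock period reduction):
minimize P s.t.
$x_i - x_j \ge \text{HOLD} - \text{MIN}(i,j)$
$x_j - x_i + P \ge \text{SETUP} + \text{MAX}(i,j)$
$x_i \ge \text{MIN\_DEL}$
LP_SAFETY (robustness):
maximize M s.t.
$x_i - x_j - M \ge \text{HOLD} - \text{MIN}(i,j)$
$x_j - x_i - M \ge \text{SETUP} + \text{MAX}(i,j) - P$
$x_i \ge \text{MIN\_DEL}$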
Notes
- J. P. Fishburn, “Clock Skew Optimization”, IEEE Trans. Computers 39(7) (1990), pp. 945-951.
- T. G. Szymanski, “Computing Optimal Clock Schedules”, Proc. DAC, June 1992, pp. 399-404.
- Useful Skew optimization is similar to Retiming optimization
- Peak current reductions are a side benefit
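A small sketch of LP_SPEED that avoids a general LP solver. For a fixed P, the double-clocking and zero-clocking constraints are difference constraints on the x values, so feasibility can be tested with Bellman-Ford (no negative cycle), and P can then be found by binary search. This is a swapped-in technique for illustration, not the formulation's required solution method; it ignores the variation factors, and all names and delay values are hypothetical.

```python
def schedule_for_period(n, paths, period, setup=0.0, hold=0.0, min_del=0.0):
    """Return clock delays x[0..n-1] meeting all constraints, or None."""
    edges = []  # (u, v, w) encodes x[v] - x[u] <= w
    for i, j, d_min, d_max in paths:
        edges.append((i, j, d_min - hold))            # x_i - x_j >= HOLD - MIN
        edges.append((j, i, period - setup - d_max))  # x_j - x_i + P >= SETUP + MAX
    x = [0.0] * n
    for _ in range(n):
        changed = False
        for u, v, w in edges:
            if x[u] + w < x[v] - 1e-12:
                x[v] = x[u] + w
                changed = True
        if not changed:
            shift = min_del - min(x)  # enforce x_i >= MIN_DEL by a common shift
            return [xi + shift for xi in x]
    return None  # still relaxing after n passes: negative cycle, infeasible


def min_period(n, paths, setup=0.0, hold=0.0, tol=1e-6):
    hi = 1.0
    while schedule_for_period(n, paths, hi, setup, hold) is None:
        hi *= 2.0
        if hi > 1e9:
            raise ValueError("no feasible period: hold constraints conflict")
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if schedule_for_period(n, paths, mid, setup, hold) is None:
            lo = mid
        else:
            hi = mid
    return hi, schedule_for_period(n, paths, hi, setup, hold)


# Hypothetical loop of three FFs: (launch, capture, MIN, MAX) in ns.
paths = [(0, 1, 2.0, 2.0), (1, 2, 6.0, 6.0), (2, 0, 4.0, 4.0)]
period, x = min_period(3, paths)
print(f"minimum period ~ {period:.3f} ns with clock delays {x}")
```

With zero skew this loop would need P = 6 ns (the longest path); the search finds a period of about 4 ns, the average delay around the loop, which illustrates the similarity to retiming noted above.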
Outline
Problem Statement
Clock Distribution Structures
Robustness / Signal Integrity Control
Clock Design:
Skew Scheduling
Topology Design
Embedding
- For zero skew (ZST-DME)
- For bounded skew (BST-DME)
Zero-Skew Tree (ZST) Problem
Zero Skew Clock Routing Problem (S,G): Given a set S of sink locations and a connection topology G, construct a ZST T(S) with topology G and having minimum cost.
Skew = maximum value of |td(s0,si) – td(s0,sj)| over all sink pairs si, sj in S.
td = signal delay (from source s0)
Connection topology G = rooted binary tree with nodes of S as leaves
- Edge ea in G is the edge from a to its parent
- |ea| is the (assigned) length of edge ea
Cost = total edge length
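The skew and cost definitions can be made concrete with a small sketch. It assumes a linear delay model (delay equals path length), and the node names and edge lengths below are a hypothetical topology, not the one drawn in the lecture figures.

```python
def sink_delays(parent, length, sinks, root="s0"):
    """t_d(s0, s) for each sink s, as total edge length from root to s."""
    delays = {}
    for s in sinks:
        node, total = s, 0.0
        while node != root:
            total += length[node]  # length[node] is |e_node|, the edge to its parent
            node = parent[node]
        delays[s] = total
    return delays


def skew_and_cost(parent, length, sinks, root="s0"):
    d = sink_delays(parent, length, sinks, root)
    # The max over all sink pairs of |t_i - t_j| equals max delay minus min delay.
    return max(d.values()) - min(d.values()), sum(length.values())


parent = {"v": "s0", "a": "v", "b": "v",
          "s1": "a", "s2": "a", "s3": "b", "s4": "b"}
length = {"v": 1, "a": 2, "b": 3, "s1": 3, "s2": 3, "s3": 2, "s4": 2}
sinks = ["s1", "s2", "s3", "s4"]

skew, cost = skew_and_cost(parent, length, sinks)

unbalanced = dict(length, s4=4)  # lengthen one sink edge by 2 units
skew2, cost2 = skew_and_cost(parent, unbalanced, sinks)
print(skew, cost, skew2, cost2)
```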
Zero-Skew Example (555 sinks, 40 obstacles)
A Zero-Skew Routing Algorithm
Finds a ZST under linear delay model with minimum cost over all ZSTs with topology G and sink set S
Terms
- Manhattan arc: line segment with slope +1 or –1
- Tilted Rectangular Region (TRR): collection of points within a fixed distance of a Manhattan arc
  - Core = Manhattan arc
  - Radius = distance
- Merging segment = locus of feasible locations for a node v in the topology, consistent with minimum wirelength
  - If v is a sink, then ms(v) = {v}
  - If v is an internal node with children a and b, then ms(v) is the set of all points within distance |ea| of ms(a), and within distance |eb| of ms(b)
Phase 1: Tree of Merging Segments
Goal: Construct a tree of merging segments corresponding to topology G
- Merging segment of a node depends on merging segments of its children, hence bottom-up construction
- Let a, b be children of v. We want placements of v that allow TSa and TSb (the subtrees rooted at a and b) to be merged with minimum added wire while preserving zero skew
- Merging cost = |ea| + |eb|
Fact: The intersection of two TRRs is also a TRR and can be found in constant time
- Constant time per new merging segment, hence linear time (in size of S) to construct entire tree
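One bottom-up merge can be sketched as follows, under the linear delay model. The implementation choice here is an assumption of this example: coordinates are rotated to (u, v) = (x + y, x - y), where Manhattan distance becomes max(|du|, |dv|), Manhattan arcs become axis-parallel segments, and TRRs become ordinary rectangles, so TRR intersection is a rectangle intersection. Function names and sink positions are hypothetical.

```python
def to_rot(p):
    """Point (x, y) as a degenerate rectangle (u_lo, u_hi, v_lo, v_hi)."""
    u, v = p[0] + p[1], p[0] - p[1]
    return (u, u, v, v)


def from_rot(u, v):
    return ((u + v) / 2, (u - v) / 2)


def dist(r1, r2):
    """Manhattan distance between two regions given in rotated coordinates."""
    gap_u = max(0, r1[0] - r2[1], r2[0] - r1[1])
    gap_v = max(0, r1[2] - r2[3], r2[2] - r1[3])
    return max(gap_u, gap_v)


def expand(r, radius):
    """TRR with core r and the given radius."""
    return (r[0] - radius, r[1] + radius, r[2] - radius, r[3] + radius)


def intersect(r1, r2):
    return (max(r1[0], r2[0]), min(r1[1], r2[1]),
            max(r1[2], r2[2]), min(r1[3], r2[3]))


def merge(ms_a, t_a, ms_b, t_b):
    """Merge children a and b; return (ms(v), delay at v, |e_a|, |e_b|)."""
    d = dist(ms_a, ms_b)
    e_a = (d + t_b - t_a) / 2       # balance t_a + e_a = t_b + e_b, e_a + e_b = d
    if e_a < 0:                     # a is already too slow: detour wire on b
        e_a, e_b = 0.0, t_a - t_b
    elif e_a > d:                   # b is already too slow: detour wire on a
        e_a, e_b = t_b - t_a, 0.0
    else:
        e_b = d - e_a
    ms_v = intersect(expand(ms_a, e_a), expand(ms_b, e_b))
    return ms_v, t_a + e_a, e_a, e_b


# Merge sinks s1 = (0, 0) and s2 = (4, 2), then merge the result with s3 = (10, 1).
ms_a, t_a, ea1, eb1 = merge(to_rot((0, 0)), 0.0, to_rot((4, 2)), 0.0)
ms_v, t_v, ea2, eb2 = merge(ms_a, t_a, to_rot((10, 1)), 0.0)
print("ms(a) endpoints:", from_rot(ms_a[0], ms_a[2]), from_rot(ms_a[1], ms_a[3]))
print("root delay:", t_v, "total wirelength:", ea1 + eb1 + ea2 + eb2)
```

Each merge uses a constant number of arithmetic operations, which matches the linear-time claim for the bottom-up phase.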
Phase 2: Find Node Placements
Goal: Find exact locations (“embeddings”) pl(v) of internal nodes v in the ZST topology
If v is the root node, then any point on ms(v) can be chosen as pl(v)
If v is an internal node other than the root, and p is the parent of v, then v can be embedded at any point in ms(v) that is at distance |ev| or less from pl(p)
- Detail: create square TRR trr_p with radius |ev| and core equal to pl(p); placement of v can be any point in $ms(v) \cap trr_p$
Each instruction executed at most once for each node in G, and TRR intersection is O(1) time
- Find_Exact_Placements is O(n), so DME is O(n)
Outline
Problem Statement
Clock Distribution Structures
Robustness / Signal Integrity Control
Clock Design:
Skew Scheduling
Topology Design
Embedding
- For zero skew (ZST-DME)
- For bounded skew (BST-DME)
Non-Zero Skew Bounds
[Figure: example topology with source s0, sinks s1 to s4, and internal nodes a, b, v, drawn on a grid with axes 0 to 6 for zero skew and for a non-zero skew bound]
Given a skew bound, where can internal nodes of the given topology (e.g., a, b, v) be placed?
BST-DME Bottom-Up Phase
[Figure: topology (source s0, sinks s1 to s4, internal nodes a, b, v) and merging regions mr(a), mr(b), mr(v) for skew bound B = 4]
Bottom-Up: build tree of merging regions corresponding to given topology
BST-DME Top-Down Phase
[Figure: topology (source s0, sinks s1 to s4) with embedded internal nodes a, b, v; skew bound B = 4]
Good Luck for the Mid-Term!