Download - Clock Design
![Page 1: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/1.jpg)
Clock Design
Adopted from David Harris of Harvey Mudd College
![Page 2: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/2.jpg)
2
Outline Clock Distribution Clock Skew Skew-Tolerant Static Circuits Traditional Domino Circuits Skew-Tolerant Domino Circuits
![Page 3: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/3.jpg)
3
Clocking Synchronous systems use a clock to keep
operations in sequence– Distinguish this from previous or next– Determine speed at which machine operates
Clock must be distributed to all the sequencing elements– Flip-flops and latches
Also distribute clock to other elements – Domino circuits and memories
![Page 4: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/4.jpg)
4
Clock Distribution On a small chip, the clock distribution network is just
a wire– And possibly an inverter
On practical chips, the RC delay of the wire resistance and gate load is very long– Variations in this delay cause clock to get to
different elements at different times– This is called clock skew
Most chips use repeaters to buffer the clock and equalize the delay– Reduces but doesn’t eliminate skew
![Page 5: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/5.jpg)
5
Example Skew comes from differences in gate and wire delay
– With right buffer sizing, clk1 and clk2 could ideally arrive at the same time.
– But power supply noise changes buffer delays– clk2 and clk3 will always see RC skew
3 mm
1.3 pF
3.1 mmgclk
clk1
0.5 mm
clk2clk3
0.4 pF 0.4 pF
![Page 6: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/6.jpg)
6
Review: Skew Impact
F1 F2
clk
clk clk
Combinational Logic
Tc
Q1 D2
Q1
D2
tskew
CL
Q1
D2
F1
clk
Q1
F2
clk
D2
clk
tskew
tsetup
tpcq
tpdq
tcd
thold
tccq
setup skew
sequencing overhead
hold skew
pd c pcq
cd ccq
t T t t t
t t t t
Ideally full cycle isavailable for work
Skew adds sequencingoverhead
Increases hold time too
![Page 7: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/7.jpg)
7
Cycle Time Trends Much of CPU performance comes from higher f
– f is improving faster than simple process shrinks– Sequencing overhead is bigger part of cycle
0.0 1
0 .1
1
10
1 00
80 3 8 680 4 8 6P e ntiu mP e ntiu m II / II I
Spe
cInt
95
1985 1988 1991 1994 1997 2000
1 .2 0.8 0 .6 0.3 5 2.0
Process
100
200
500VD D = 5 VDD = 3.3
VDD = 2.5
5 0Fano
ut-o
f-4 (F
O4)
Inve
rter D
elay
(ps)
0 .25
1 0
1 0 0
1 0 0 0
80 3 8 680 4 8 6P e n tiumP e n tium II / II I
MH
z
1 9 8 8 1 9 9 1 1 9 9 4 19 9 7 2 0 0 01 9 85
1 0
1 0 0
1 9 8 5 1 9 88 1 9 9 1 1 9 9 4 1 9 9 7
8 0 3 8 68 0 4 8 6P e n tiu mP e n tiu m II / II IFO
4 in
verte
r del
ays
/ cyc
le
50
20
2 0 0 0
![Page 8: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/8.jpg)
8
Solutions Reduce clock skew
– Careful clock distribution network design– Plenty of metal wiring resources
Analyze clock skew– Only budget actual, not worst case skews– Local vs. global skew budgets
Tolerate clock skew– Choose circuit structures insensitive to skew
Post-fabrication adjustment– Intel, IBM, etc
GALS (Global Synchronous Locally Asynchronous)
![Page 9: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/9.jpg)
9
Clock Dist. Networks Ad hoc Grids H-tree Hybrid
![Page 10: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/10.jpg)
10
Clock Grids Use grid on two or more levels to carry clock Make wires wide to reduce RC delay Ensures low skew between nearby points But possibly large skew across die
![Page 11: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/11.jpg)
11
Alpha Clock Grids
PLL
gclk grid
Alpha 21064 Alpha 21164 Alpha 21264
gclk grid
Alpha 21064 Alpha 21164 Alpha 21264
![Page 12: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/12.jpg)
12
H-Trees Symmetric structure
– Gets clock arbitrarily close to any point– Matched delay along all paths
Delay variations cause skew A and B might see big skew A B
![Page 13: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/13.jpg)
13
Itanium 2 H-Tree Four levels of buffering:
– Primary driver– Repeater– Second-level
clock buffer– Gater
Route aroundobstructions
Primary Buffer
Repeaters
Typical SLCBLocations
![Page 14: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/14.jpg)
14
Hybrid Networks Use H-tree to distribute clock to many points Tie these points together with a grid
Ex: IBM Power4, PowerPC– H-tree drives 16-64 sector buffers– Buffers drive total of 1024 points– All points shorted together with grid
![Page 15: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/15.jpg)
15
Skew Tolerance Flip-flops are sensitive to skew because of hard edges
– Data launches at latest rising edge of clock– Must setup before earliest next rising edge of clock– Overhead would shrink if we can soften edge
Latches tolerate moderate amounts of skew– Data can arrive anytime latch is transparent
![Page 16: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/16.jpg)
16
Skew: LatchesQ1
L1
1
2
L2 L3
1 12
CombinationalLogic 1
CombinationalLogic 2
Q2 Q3D1 D2 D3
sequencing overhead
1 2 hold nonoverlap skew
borrow setup nonoverlap skew
2
,
2
pd c pdq
cd cd ccq
c
t T t
t t t t t t
Tt t t t
2-Phase Latches
setup skew
sequencing overhead
hold skew
borrow setup skew
max ,pd c pdq pcq pw
cd pw ccq
pw
t T t t t t t
t t t t t
t t t t
Pulsed Latches
![Page 17: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/17.jpg)
17
Post Fab Adjustment Build test circuits and programmable capacitors on
the die Test skew after fabrication Program the capacitors to de-skew
![Page 18: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/18.jpg)
18
GALS Since clock/data signal delay across chip is 5-10
cycles, it is impossible to maintain a synchronous clock
GALS– Globally asynchronous, using protocols to
communicate– Locally synchronous, just like we used to do
Research???
![Page 19: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/19.jpg)
19
Dynamic Circuit Review Static circuits are slow because fat pMOS load input Dynamic gates use precharge to remove pMOS
transistors from the inputs– Precharge: = 0 output forced high– Evaluate: = 1 output may pull low
A B
Y
C DY
A B C D
A
B
C
D
![Page 20: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/20.jpg)
20
Domino Circuits Dynamic inputs must monotonically rise during
evaluation– Place inverting stage between each dynamic gate– Dynamic / static pair called domino gate
Domino gates can be safely cascaded
A
W
B
X
domino AND
dynamicNAND
staticinverter
![Page 21: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/21.jpg)
21
Domino Timing Domino gates are 1.5 – 2x faster than static CMOS
– Lower logical effort because of reduced Cin
Challenge is to keep precharge off critical path Look at clocking schemes for precharge and eval
– Traditional schemes have severe overhead– Skew-tolerant domino hides this overhead
![Page 22: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/22.jpg)
22
Traditional Domino Ckts Hide precharge time by ping-ponging between half-
cycles– One evaluates while other precharges– Latches hold results during precharge
Tc
Sta
tic
Dyn
amic
Latc
h
clk
Sta
tic
Dyn
amic
clkS
tatic
Dyn
amic
clk
Dyn
amic
clk clk
Sta
tic
Dyn
amic
Latc
h
Sta
tic
Dyn
amic
Sta
tic
Dyn
amic
Dyn
amic
clk clk clk clk clk
clk
clk
tpdq tpdq
2pd c pdqt T t
![Page 23: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/23.jpg)
23
Clock Skew Skew increases sequencing overhead
– Traditional domino has hard edges– Evaluate at latest rising edge– Setup at latch by earliest falling edge
Sta
tic
Dyn
amic
Latc
hclk
Sta
tic
Dyn
amic
clkD
ynam
icclk clk
Sta
tic
Dyn
amic
Latc
h
Sta
tic
Dyn
amic
Dyn
amic
clk clk clk clk
clk
clk
tskewtsetup
setup skew2 2pd ct T t t
![Page 24: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/24.jpg)
24
Time Borrowing Logic may not exactly fit half-cycle
– No flexibility to borrow time to balance logic between half cycles
Traditional domino sequencing overhead is about 25% of cycle time in fast systems!
Sta
tic
Dyn
amic
Latc
hclk
Sta
tic
Dyn
amic
clk clk
Sta
tic
Dyn
amic
Latc
h
Sta
tic
Dyn
amic
clk clk clk
clk
clk
tskewtsetup
![Page 25: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/25.jpg)
25
Relaxing the Timing Sequencing overhead caused by hard edges
– Data departs dynamic gate on late rising edge– Must setup at latch on early falling edge
Latch functions– Prevent glitches on inputs of domino gates– Holds results during precharge
Is the latch really necessary?– No glitches if inputs come from other domino– Can we hold the results in another way?
![Page 26: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/26.jpg)
26
Skew-Tolerant Domino Use overlapping clocks to eliminate latches at phase
boundaries.– Second phase evaluates using results of first
a
Sta
tic
Dyn
amic
1
Sta
tic
Dyn
amic
2
b c d
a
1
2
b
c
a
1
2
b
c
No latch atphase boundary
![Page 27: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/27.jpg)
27
Full Keeper After second phase evaluates, first phase precharges Input to second phase falls
– Violates monotonicity? But we no longer need the value Now the second gate has a floating output
– Need full keeper to hold it either high or low
weak fullkeepertransistors
f
XH
![Page 28: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/28.jpg)
28
Time Borrowing Overlap can be used to
– Tolerate clock skew– Permit time borrowing
No sequencing overheadtskew
Sta
tic
Dyn
amic
Sta
tic
Dyn
amic
Sta
tic
Dyn
amic
Dyn
amic
Sta
tic
Dyn
amic
Sta
tic
Dyn
amic
Sta
tic
Dyn
amic
Dyn
amic
Sta
tic
Sta
tic
1
2
1 1 1 1 1 2 2 2
Phase 1 Phase 2
toverlap
tborrow
pd ct T
![Page 29: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/29.jpg)
29
Multiple Phases With more clock phases, each phase overlaps more
– Permits more skew tolerance and time borrowingS
tatic
Dyn
amic
Sta
tic
Dyn
amic
Sta
tic
Dyn
amic
Dyn
amic
Sta
tic
Dyn
amic
Sta
tic
Dyn
amic
Sta
tic
Dyn
amic
Dyn
amic
Sta
tic
Sta
tic
3
4
1 1 2 2 3 3 4 4
Phase 1 Phase 2 Phase 3 Phase 4
1
2
![Page 30: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/30.jpg)
30
Clock Generationclken
1
2
3
4
![Page 31: Clock Design](https://reader035.vdocument.in/reader035/viewer/2022081513/56816331550346895dd3b51a/html5/thumbnails/31.jpg)
31
Summary Clock skew effectively increases setup and hold
times in systems with hard edges Managing skew
– Reduce: good clock distribution network– Analyze: local vs. global skew– Tolerate: use systems with soft edges
Flip-flops and traditional domino are costly Latches and skew-tolerant domino perform at full
speed even with moderate clock skews.