feb. 17, 2011
Feb. 17, 2011
• Midterm overview
• Real life examples of built chips
  – Clock Skew
• Arithmetic
• Data Centers
• Power reduction techniques
  – Dynamic Voltage / Frequency Scaling
  – Clock Throttling
  – Power Gating
  – Others?
• Project
  – 4b adder with Razor recovery
Go Over Problems
• 1c
• 2a; 2b
• 3c
Crossbar Design
Mirror Adder: Stick Diagram
[Figure: stick diagram of the mirror adder cell between the VDD and GND rails, with inputs A, B, and Ci and outputs Co and S.]
The Mirror Adder
• The NMOS and PMOS chains are completely symmetrical. A maximum of two series transistors can be observed in the carry-generation circuitry.
• When laying out the cell, the most critical issue is the minimization of the capacitance at node Co. The reduction of the diffusion capacitances is particularly important.
• The capacitance at node Co is composed of four diffusion capacitances, two internal gate capacitances, and six gate capacitances in the connecting adder cell.
• The transistors connected to Ci are placed closest to the output.
• Only the transistors in the carry stage have to be optimized for speed. All transistors in the sum stage can be minimal size.
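The symmetry of the NMOS and PMOS chains can be traced to the carry and sum functions being self-dual: inverting all inputs inverts the output. The sketch below checks this exhaustively. It assumes the usual mirror-adder formulation in which the sum stage reuses the inverted carry; that expression and the function names are illustrative and are not spelled out on the slides.

```python
from itertools import product

def carry(a, b, ci):
    # Co = A.B + Ci.(A + B), the majority function
    return (a & b) | (ci & (a | b))

def mirror_sum(a, b, ci):
    # Assumed sum-stage form: S = A.B.Ci + (not Co).(A + B + Ci)
    not_co = 1 - carry(a, b, ci)
    return (a & b & ci) | (not_co & (a | b | ci))

def check_mirror_adder():
    for a, b, ci in product((0, 1), repeat=3):
        co, s = carry(a, b, ci), mirror_sum(a, b, ci)
        # The pair (Co, S) must encode A + B + Ci
        assert 2 * co + s == a + b + ci
        # Self-duality: complementing every input complements the output
        assert carry(1 - a, 1 - b, 1 - ci) == 1 - co
        assert mirror_sum(1 - a, 1 - b, 1 - ci) == 1 - s
    return True

check_mirror_adder()
```

Because each function equals its own dual, the pull-up network can use the same topology as the pull-down network instead of the series/parallel-swapped one.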
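Transmission Gate Full Adder
[Figure: transmission-gate full adder in three parts: Setup (A and B produce the propagate signal P and its complement), Sum Generation (P and Ci produce S), and Carry Generation (P selects between A and Ci to produce Co).]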
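Manchester Carry Chain
[Figure: two single-bit Manchester carry gates connecting Ci to Co. One is controlled by Pi, Gi, and Di; the other uses only Pi and Gi. Both are supplied from VDD.]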
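Manchester Carry Chain
[Figure: four-bit Manchester carry chain supplied from VDD. Ci,0 enters the chain, the pass transistors are controlled by P0 to P3, the generate signals G0 to G3 drive the carry nodes, and the outputs are C0 to C3.]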
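A behavioral sketch of the chain may help when reading the figure. It assumes the usual signal definitions (propagate P = A xor B, generate G = A and B, delete D = not A and not B), under which exactly one of the three is active per bit, so each carry node is pulled high, pulled low, or connected to the previous node. The function name and the LSB-first bit order are choices made for this example.

```python
def manchester_carries(a_bits, b_bits, c_in):
    """Return the carry out of each bit position (bit lists are LSB first)."""
    carries = []
    c = c_in
    for a, b in zip(a_bits, b_bits):
        p = a ^ b                # propagate: pass transistor conducts
        g = a & b                # generate: node forced to 1
        d = (1 - a) & (1 - b)    # delete: node forced to 0
        assert p + g + d == 1    # exactly one condition holds per bit
        if g:
            c = 1
        elif d:
            c = 0
        # otherwise p is set and the incoming carry passes through unchanged
        carries.append(c)
    return carries

# 7 + 1: the carry generated at bit 0 propagates through bits 1 and 2
print(manchester_carries([1, 1, 1, 0], [1, 0, 0, 0], 0))  # [1, 1, 1, 0]
```

The model captures only the logic; the electrical behavior of the pass-transistor chain (series resistance, precharge in a dynamic version) is not represented.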
Carry-Bypass Adder
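[Figure: a 4-bit block of full adders (FA) with propagate/generate inputs P0 G0 to P3 G3 and carries Ci,0, Co,0, Co,1, Co,2, Co,3, shown first as a plain ripple block and then with a bypass multiplexer at the block output. The multiplexer is controlled by BP = P0·P1·P2·P3.]
Idea: If (P0 and P1 and P2 and P3) = 1, then Co,3 = Ci,0; else the block will “kill” or “generate” the carry.
Also called Carry-Skip.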
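The bypass idea can be sketched behaviorally as follows. The helper names (`to_bits`, `ripple_block`, `bypass_block`) are made up for this example, and bits are LSB first. The bypass does not change the result; it only lets the carry-out become valid without waiting for the ripple when every bit propagates.

```python
def to_bits(x, n=4):
    """Integer to a list of n bits, LSB first."""
    return [(x >> i) & 1 for i in range(n)]

def ripple_block(a_bits, b_bits, c_in):
    """Ripple through one block; returns (sum_bits, carry_out, propagate_bits)."""
    sums, props, c = [], [], c_in
    for a, b in zip(a_bits, b_bits):
        p, g = a ^ b, a & b
        sums.append(p ^ c)
        props.append(p)
        c = g | (p & c)
    return sums, c, props

def bypass_block(a_bits, b_bits, c_in):
    """Same block with a bypass multiplexer on the carry output."""
    sums, rippled, props = ripple_block(a_bits, b_bits, c_in)
    bp = int(all(props))                 # BP = P0.P1.P2.P3
    c_out = c_in if bp else rippled      # multiplexer selects the fast path
    return sums, c_out, bp

# 10 + 5 + 1 = 16: every bit propagates, so the carry-in is forwarded directly
print(bypass_block(to_bits(0b1010), to_bits(0b0101), 1))  # ([0, 0, 0, 0], 1, 1)
```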
Carry-Bypass Adder (cont.)
Carry Ripple versus Carry Bypass
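[Figure: propagation delay tp versus number of bits N for the ripple adder and the bypass adder. The two curves cross at N of about 4..8.]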
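Carry-Select Adder
[Figure: one carry-select stage. A Setup block produces P,G; a "0" Carry Propagation chain and a "1" Carry Propagation chain compute both candidate carry vectors; a Multiplexer controlled by Co,k-1 selects the carry vector and produces Co,k+3; Sum Generation follows.]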
Carry Select Adder: Critical Path
Linear Carry Select
[Figure: 16-bit linear carry-select adder with four equal stages (Bit 0-3, Bit 4-7, Bit 8-11, Bit 12-15) producing S0-3, S4-7, S8-11, S12-15 from Ci,0. Each stage has Setup, "0" Carry, "1" Carry, Multiplexer, and Sum Generation. Annotated arrival times: setup (1), carry chains (5) in every stage, multiplexer outputs (6), (7), (8), (9), final sum (10).]
Square Root Carry Select
[Figure: 20-bit square-root carry-select adder with stages of increasing width (Bit 0-1, Bit 2-4, Bit 5-8, Bit 9-13, Bit 14-19) producing S0-1, S2-4, S5-8, S9-13, S14-19 from Ci,0. Each stage has Setup, "0" Carry, "1" Carry, Multiplexer, and Sum Generation. Annotated arrival times: setup (1), carry chains (3) to (7), multiplexer outputs (4) to (8), final sum (9).]
Adder Delays - Comparison
LookAhead - Basic Idea
$C_{o,k} = f(A_k, B_k, C_{o,k-1}) = G_k + P_k C_{o,k-1}$
Look-Ahead: Topology
Expanding Lookahead equations:
$C_{o,k} = G_k + P_k (G_{k-1} + P_{k-1} C_{o,k-2})$
All the way:
$C_{o,k} = G_k + P_k (G_{k-1} + P_{k-1} (\cdots + P_1 (G_0 + P_0 C_{i,0})))$
Carry Lookahead Trees
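$C_{o,0} = G_0 + P_0 C_{i,0}$
$C_{o,1} = G_1 + P_1 G_0 + P_1 P_0 C_{i,0}$
$C_{o,2} = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_{i,0}$
$\phantom{C_{o,2}} = (G_2 + P_2 G_1) + (P_2 P_1)(G_0 + P_0 C_{i,0}) = G_{2:1} + P_{2:1} C_{o,0}$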
Can continue building the tree hierarchically.
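The grouping step in the last equation can be written as a small combine operation on (G, P) pairs. The sketch below is illustrative only: the function names are hypothetical, P is taken as A xor B, bits are LSB first, and the prefix is accumulated linearly rather than as a balanced tree.

```python
def combine(hi, lo):
    """Merge the (G, P) pair of a group with the adjacent lower group."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)

def lookahead_carries(a_bits, b_bits, c_in):
    """Carry out of every bit, computed from group (G, P) signals."""
    gp = [(a & b, a ^ b) for a, b in zip(a_bits, b_bits)]
    carries, prefix = [], None
    for cur in gp:
        prefix = cur if prefix is None else combine(cur, prefix)
        g, p = prefix                     # G_{k:0}, P_{k:0}
        carries.append(g | (p & c_in))    # Co,k = G_{k:0} + P_{k:0}.Ci,0
    return carries

def ripple_carries(a_bits, b_bits, c_in):
    """Reference: Co,k = Gk + Pk.Co,k-1 evaluated bit by bit."""
    out, c = [], c_in
    for a, b in zip(a_bits, b_bits):
        c = (a & b) | ((a ^ b) & c)
        out.append(c)
    return out

print(lookahead_carries([1, 1, 0], [1, 0, 1], 0))  # [1, 1, 1]
```

Because `combine` is associative, the pairs can be merged in any grouping, which is what allows the tree to be built hierarchically.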
Power Reduction Techniques
• Stop the clock
  – Dynamic power reduction
• Power gating
  – Reduce the leakage
• How fast can you turn something on/off?
  – Nothing to do: sleep
• How can you save power while in operation?
  – Near-threshold design
Power Gating
Kevin Nowka, IBM
Gate Leakage
Digital Parallelization
Y[n] = X[n] + X[n-1]
[Figure: an analog signal is digitized at the ANALOG/DIGITAL boundary into the input (5 bits @ 5 GS/s, or 8 bits @ 100 MHz). Two registers clocked by clk hold X[n] and X[n-1], which are combined into Y[n]. Clk = 5 GHz.]
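DSP Parallelization
Y[n] = X[n] + X[n-1]
Y[n-1] = X[n-1] + X[n-2]
[Figure: the input (5 bits @ 5 GS/s), sampled with CLK = 5 GHz, is split by clk and clkb into two paths running at CLK = 2.5 GHz. Registers hold X[n], X[n-1], and X[n-2], and two parallel datapaths produce Y[n] and Y[n-1].]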
DSP Parallelization
• Clock speed reduced by ½
  – Can parallelize further
  – Increase number of MACs (multiply/accumulates) by 2
• Intuition?
  – Area goes up by 2
  – Power decreases (clock rate down by 2, computations up by 2, but easier timing constraints)
  – What about clock power?
• Save a little power, but double the area?
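A behavioral sketch of the two-way parallel version of Y[n] = X[n] + X[n-1] is shown below. The function names are hypothetical and X[-1] is assumed to be 0. Each pass through the loop in `parallel_filter` stands for one cycle of the half-rate clock, in which two adders each produce one output.

```python
def serial_filter(x):
    """One adder, one output per full-rate clock cycle."""
    prev, y = 0, []
    for sample in x:
        y.append(sample + prev)
        prev = sample
    return y

def parallel_filter(x):
    """Two adders, two outputs per half-rate clock cycle."""
    assert len(x) % 2 == 0, "this sketch expects an even number of samples"
    prev, y = 0, []
    for i in range(0, len(x), 2):
        x_even, x_odd = x[i], x[i + 1]
        y_even = x_even + prev     # adder 1: uses the last sample of the previous pair
        y_odd = x_odd + x_even     # adder 2: uses both samples of this pair
        y += [y_even, y_odd]
        prev = x_odd
    return y

samples = [1, 2, 3, 4, 5, 6]
print(serial_filter(samples))    # [1, 3, 5, 7, 9, 11]
print(parallel_filter(samples))  # [1, 3, 5, 7, 9, 11]
```

The output stream is identical; what changes is that each adder now has twice as long to finish, at the cost of duplicated hardware.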
Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation
• http://www.eecs.umich.edu/~taustin/papers/MICRO36-Razor.pdf
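A rough behavioral sketch of the timing-speculation idea follows, assuming the commonly described Razor register: a main flip-flop sampled at the clock edge, backed by a shadow latch sampled a short time later, with a mismatch flagging an error and the shadow value used for recovery. The function name, parameters, and time units are illustrative; setup/hold times, metastability, and short-path constraints are ignored.

```python
def razor_stage(arrival, new_value, old_value, period, shadow_delay):
    """Model one clock edge of a Razor-style register.

    arrival is the time at which new_value becomes valid at the input.
    Returns (value_after_recovery, error_flag).
    """
    if arrival > period + shadow_delay:
        # Assumption of this sketch: data must at least meet the shadow latch
        raise ValueError("data arrived after the shadow latch closed")
    # Main flip-flop: captures the new value only if it arrived in time
    main = new_value if arrival <= period else old_value
    # Shadow latch: samples shadow_delay later, so it sees the correct value
    shadow = new_value
    error = main != shadow
    # On an error the shadow value is restored into the pipeline
    return shadow, error

print(razor_stage(0.8, 1, 0, period=1.0, shadow_delay=0.3))  # (1, False)
print(razor_stage(1.2, 1, 0, period=1.0, shadow_delay=0.3))  # (1, True)
```

Lowering the supply voltage stretches `arrival`; as long as it stays inside the shadow window, late data costs a recovery cycle rather than a wrong result.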
Project Description
• Minimal: 4b Adder, Implemented with Razor
  – Simulations into near-threshold domain
• Grad. Student: requires more advanced design
  – Analog: Opamps built using inverters
  – Digital: Adiabatic Near-Threshold
  – Power Gating: add power gating to your design
• Undergrad: extra credit if you do any of the above
Problem 1: On-Chip Wires Consume Energy
• On-chip wire power does not scale
  – Dominated by interconnect capacitance (C·VDD²)
ON-CHIP (Status Quo): 100 - 300 fJ/bit/mm
NOTE: Sub/Near-Threshold doesn’t help this problem!
OUR GOAL: < 5 fJ/bit/mm
[DOE, Exascale Workshop]
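A back-of-the-envelope sketch of where a figure of this order comes from is given below. The wire capacitance of 0.2 pF/mm is an assumed round number, not a value from the slides, and the energy per bit is taken as C·VDD² for a full-swing wire.

```python
def wire_energy_fj_per_bit_mm(c_pf_per_mm, vdd):
    """Energy C * VDD^2 for one millimetre of full-swing wire, in femtojoules."""
    # 1 pF * 1 V^2 = 1 pJ = 1000 fJ
    return c_pf_per_mm * vdd ** 2 * 1000.0

ASSUMED_C_PF_PER_MM = 0.2  # hypothetical wire capacitance for illustration

for vdd in (0.7, 1.0, 1.2):
    energy = wire_energy_fj_per_bit_mm(ASSUMED_C_PF_PER_MM, vdd)
    print(f"VDD = {vdd:.1f} V: {energy:.0f} fJ/bit/mm")
```

Under this assumption, supplies between 0.7 V and 1.2 V give roughly 100 to 290 fJ/bit/mm, which is in line with the status quo range quoted above.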
Data Center Design
• http://www.spectrum.ieee.org/feb09/7327