

Feb. 17, 2011
• Midterm overview
• Real life examples of built chips
  – Clock Skew
• Arithmetic
• Data Centers
• Power reduction techniques
  – Dynamic Voltage / Frequency Scaling
  – Clock Throttling
  – Power Gating
  – Others?
• Project – 4b adder with Razor recovery

Go Over Problems
• 1c
• 2a; 2b
• 3c

Crossbar Design

Mirror Adder Stick Diagram

[Figure: stick diagram of the mirror adder cell between the VDD and GND rails; inputs A, B, Ci; outputs Co and S]

The Mirror Adder
• The NMOS and PMOS chains are completely symmetrical. A maximum of two series transistors can be observed in the carry-generation circuitry.
• When laying out the cell, the most critical issue is the minimization of the capacitance at node Co. The reduction of the diffusion capacitances is particularly important.
• The capacitance at node Co is composed of four diffusion capacitances, two internal gate capacitances, and six gate capacitances in the connecting adder cell.
• The transistors connected to Ci are placed closest to the output.
• Only the transistors in the carry stage have to be optimized for speed. All transistors in the sum stage can be minimal size.

Transmission Gate Full Adder

[Figure: transmission-gate full adder schematic with Setup, Sum Generation, and Carry Generation sections; inputs A, B, Ci; internal propagate signal P; outputs S and Co]

Manchester Carry Chain

[Figure: two single-bit Manchester carry gate circuits between Ci and Co; labels Gi, Di, Pi, VDD]

Manchester Carry Chain

[Figure: four-bit Manchester carry chain; labels Ci,0, P0-P3, G0-G3, C0-C3, VDD]

Carry-Bypass Adder

[Figure: four full adders (FA) with propagate/generate inputs, rippling the carry from Ci,0 through Co,0, Co,1, Co,2 to Co,3; shown first without a bypass and then with a multiplexer on Co,3 controlled by BP = P0 P1 P2 P3]

Idea: If (P0 and P1 and P2 and P3 = 1) then Co,3 = Ci,0, else "kill" or "generate".

Also called Carry-Skip
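The bypass idea above can be sketched as a small behavioral model. This is only an illustration in Python, not the circuit from the slides: the function names `full_adder` and `bypass_block` are hypothetical, and the propagate signal is assumed to be P = A xor B.

```python
def full_adder(a, b, c):
    """One-bit full adder; returns (sum, carry_out, propagate)."""
    p = a ^ b                 # propagate (assumed definition: A xor B)
    g = a & b                 # generate
    return p ^ c, g | (p & c), p


def bypass_block(a_bits, b_bits, ci):
    """Behavioral model of one carry-bypass block. Bit lists are LSB first."""
    carry = ci
    sums = []
    bp = 1                    # BP = P0 P1 P2 P3
    for a, b in zip(a_bits, b_bits):
        s, carry, p = full_adder(a, b, carry)
        sums.append(s)
        bp &= p
    # Multiplexer: when every bit propagates, forward Ci,0 directly
    # instead of waiting for the rippled carry.
    co = ci if bp else carry
    return sums, co, bp


# Usage: 0b0101 + 0b1010 propagates in every bit, so the carry-in is bypassed.
sums, co, bp = bypass_block([1, 0, 1, 0], [0, 1, 0, 1], 1)
```

The model returns the same carry whether or not the bypass is taken; in hardware the benefit is timing, since the multiplexer output is valid without waiting for the ripple through all four bits.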

Carry-Bypass Adder (cont.)

Carry Ripple versus Carry Bypass

[Figure: propagation delay tp versus number of bits N for the ripple adder and the bypass adder; the crossover is marked at N = 4..8]

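Carry-Select Adder

[Figure: one carry-select stage with Setup (producing P, G), "0" Carry Propagation, "1" Carry Propagation, Multiplexer, and Sum Generation blocks; the incoming carry Co,k-1 selects between the "0" and "1" results to produce Co,k+3 and the Carry Vector]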

Carry Select Adder: Critical Path


Linear Carry Select

[Figure: 16-bit linear carry-select adder built from four equal stages (Bit 0-3, Bit 4-7, Bit 8-11, Bit 12-15), each with Setup, "0" Carry, "1" Carry, Multiplexer, and Sum Generation blocks; carry input Ci,0; outputs S0-3, S4-7, S8-11, S12-15; delay annotations (1) through (10) mark signal arrival times along the path]

Square Root Carry Select

[Figure: 20-bit square-root carry-select adder built from stages of increasing width (Bit 0-1, Bit 2-4, Bit 5-8, Bit 9-13, Bit 14-19), each with Setup, "0" Carry, "1" Carry, Multiplexer (Mux), and Sum Generation (Sum) blocks; carry input Ci,0; outputs S0-1, S2-4, S5-8, S9-13, S14-19; delay annotations (1) through (9) mark signal arrival times along the path]
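The carry-select organization in the two figures above can be sketched behaviorally. The Python below is a minimal illustration, not the slides' circuit: `ripple` and `carry_select_add` are hypothetical names, and the default stage widths (2, 3, 4, 5, 6) are taken from the bit ranges of the square-root figure. Passing `stages=(4, 4, 4, 4)` gives the linear version.

```python
def ripple(a_bits, b_bits, ci):
    """Ripple-carry add of two LSB-first bit lists; returns (sum_bits, carry_out)."""
    carry = ci
    sums = []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)
    return sums, carry


def carry_select_add(a, b, ci=0, stages=(2, 3, 4, 5, 6)):
    """Add two integers with a carry-select structure; returns (sum, carry_out)."""
    width = sum(stages)
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    carry = ci
    sum_bits = []
    lo = 0
    for size in stages:
        hi = lo + size
        # Both candidate results are computed without waiting for the real carry.
        s0, c0 = ripple(a_bits[lo:hi], b_bits[lo:hi], 0)   # "0" carry path
        s1, c1 = ripple(a_bits[lo:hi], b_bits[lo:hi], 1)   # "1" carry path
        # Multiplexer: the incoming carry only selects between the two.
        sum_bits += s1 if carry else s0
        carry = c1 if carry else c0
        lo = hi
    return sum(bit << i for i, bit in enumerate(sum_bits)), carry
```

Growing the stage widths lets each later stage finish its two ripple chains at about the time the select signal arrives, which is the motivation for the square-root arrangement.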

Adder Delays - Comparison


LookAhead - Basic Idea

$C_{o,k} = f(A_k, B_k, C_{o,k-1}) = G_k + P_k C_{o,k-1}$

Look-Ahead: Topology

Expanding Lookahead equations:

$C_{o,k} = G_k + P_k (G_{k-1} + P_{k-1} C_{o,k-2})$

All the way:

$C_{o,k} = G_k + P_k (G_{k-1} + P_{k-1} (\cdots + P_1 (G_0 + P_0 C_{i,0})))$

Carry Lookahead Trees

$C_{o,0} = G_0 + P_0 C_{i,0}$

$C_{o,1} = G_1 + P_1 G_0 + P_1 P_0 C_{i,0}$

$C_{o,2} = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_{i,0}$

$\quad\;\; = (G_2 + P_2 G_1) + (P_2 P_1)(G_0 + P_0 C_{i,0}) = G_{2:1} + P_{2:1} C_{o,0}$

Can continue building the tree hierarchically.
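As a quick check of the lookahead equations, the Python sketch below evaluates the fully expanded sum-of-products form for each carry and the grouped form for the carry out of bit 2. The function names are hypothetical, and G_k = A_k B_k with P_k = A_k xor B_k is assumed, since the slides do not spell out the definitions.

```python
def lookahead_carries(g, p, ci):
    """Carries from the fully expanded equations.

    g, p: lists of generate/propagate bits, LSB first.
    Returns [C_o,0, C_o,1, ...], where
    C_o,k = G_k + P_k G_(k-1) + ... + P_k ... P_0 C_i,0.
    """
    carries = []
    for k in range(len(g)):
        prefix = 1            # running product P_k P_(k-1) ... P_(j+1)
        c = 0
        for j in range(k, -1, -1):
            c |= prefix & g[j]
            prefix &= p[j]
        c |= prefix & ci      # last term: P_k ... P_0 C_i,0
        carries.append(c)
    return carries


def group_2_1(g, p):
    """Group signals for bits 2:1: (G_2:1, P_2:1)."""
    return g[2] | (p[2] & g[1]), p[2] & p[1]


# Usage: A = 0b011, B = 0b101, C_i,0 = 0
a_bits, b_bits = [1, 1, 0], [1, 0, 1]
g = [a & b for a, b in zip(a_bits, b_bits)]
p = [a ^ b for a, b in zip(a_bits, b_bits)]
carries = lookahead_carries(g, p, 0)
g21, p21 = group_2_1(g, p)
grouped_c2 = g21 | (p21 & carries[0])   # G_2:1 + P_2:1 C_o,0
```

The grouped value `grouped_c2` matches `carries[2]`, which is the identity that lets the tree be built hierarchically from group generate and propagate signals.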

Power Reduction Techniques

• Stop the clock
  – Dynamic power reduction
• Power gating
  – Reduce the leakage
• How fast can you turn something on/off?
  – Nothing to do: sleep
• How can you save power while in operation?
  – Near-threshold design

Power Gating

Kevin Nowka, IBM

Gate Leakage

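Digital Parallelization: Y[n] = X[n] + X[n-1]

[Figure: an analog signal crosses the ANALOG / DIGITAL boundary as an input of 5 bits @ 5 GS/s (or 8 bits @ 100 MHz); registers clocked by clk at Clk = 5 GHz hold X[n] and X[n-1], which feed the arithmetic (+, x) producing Y[n]]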

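DSP Parallelization: Y[n] = X[n] + X[n-1]

[Figure: two-way parallel version of the same filter. The input (5 bits @ 5 GS/s) is sampled at CLK = 5 GHz using clk and clkb into X[n], X[n-1], and X[n-2]; two arithmetic (+, x) paths clocked at CLK = 2.5 GHz produce Y[n] and Y[n-1] = X[n-1] + X[n-2]]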

DSP Parallelization
• Clock speed reduced by ½
  – Can parallelize further
  – Increase number of MACs (multiply/accumulates) by 2
• Intuition?
  – Area goes up by 2
  – Power decreases (clock rate down by 2, computations up by 2, but easier timing constraints)
  – What about clock power?
• Save a little power, but double the area?
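The equivalence behind the parallel structure can be checked in software. The sketch below is only an illustration with hypothetical function names: `serial_filter` produces one output per fast clock, while `parallel_filter` consumes two samples per half-rate cycle using two adders. X[-1] = 0 is assumed as the initial condition.

```python
def serial_filter(x):
    """Y[n] = X[n] + X[n-1], one output per full-rate clock cycle."""
    prev = 0                      # assumed initial condition X[-1] = 0
    y = []
    for sample in x:
        y.append(sample + prev)
        prev = sample
    return y


def parallel_filter(x):
    """Same filter with two adders, each used once per half-rate clock cycle."""
    if len(x) % 2:
        raise ValueError("this sketch expects an even number of samples")
    prev = 0                      # X[n-2], carried over from the previous slow cycle
    y = []
    for i in range(0, len(x), 2):
        x_nm1, x_n = x[i], x[i + 1]
        y_nm1 = x_nm1 + prev      # Y[n-1] = X[n-1] + X[n-2]
        y_n = x_n + x_nm1         # Y[n]   = X[n]   + X[n-1]
        y += [y_nm1, y_n]
        prev = x_n
    return y


samples = [3, 1, 4, 1, 5, 9]
same = serial_filter(samples) == parallel_filter(samples)
```

Each loop iteration of `parallel_filter` stands for one cycle of the slower clock, so the two additions in it have twice as long to complete as the single addition in the serial version.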

Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation

• http://www.eecs.umich.edu/~taustin/papers/MICRO36-Razor.pdf

Project Description

• Minimal: 4b Adder, Implemented with Razor
  – Simulations into near-threshold domain
• Grad. Student: requires more advanced design
  – Analog: Opamps built using inverters
  – Digital: Adiabatic Near-Threshold
  – Power Gating: add power gating to your design
• Undergrad: extra credit if you do any of the above
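For orientation on the project, the Razor scheme in the paper linked above pairs each main flip-flop with a shadow latch on a delayed clock and flags an error when the two disagree. The Python below is a rough behavioral sketch under stated assumptions, not the paper's circuit: all names are hypothetical, the logic arrival time is supplied by the caller, the wires are assumed to hold the previous result until the logic settles, recovery is assumed to cost one extra cycle, and metastability and short-path constraints are ignored.

```python
from dataclasses import dataclass


@dataclass
class RazorResult:
    value: int     # value passed downstream after any recovery
    error: bool    # main flip-flop and shadow latch disagreed
    cycles: int    # 1 if captured on time, 2 if a recovery cycle was needed


def razor_capture(stale, correct, arrival, period, shadow_delay):
    """Model one capture by a Razor register.

    stale:   value on the wires before the logic settles (assumption)
    correct: settled logic value
    arrival: time at which the logic settles, in the same units as period
    """
    if arrival > period + shadow_delay:
        raise ValueError("logic settles after the shadow latch samples")
    main = correct if arrival <= period else stale   # main flip-flop at the clock edge
    shadow = correct                                  # shadow latch on the delayed clock
    error = main != shadow
    return RazorResult(value=shadow, error=error, cycles=2 if error else 1)


def razor_add4(a, b, previous_sum, arrival, period=1.0, shadow_delay=0.5):
    """4-bit add whose result is captured by the Razor register model above."""
    return razor_capture(previous_sum, (a + b) & 0xF, arrival, period, shadow_delay)


on_time = razor_add4(7, 8, previous_sum=3, arrival=0.8)
late = razor_add4(7, 8, previous_sum=3, arrival=1.2)
```

In this model `late` reports an error and still delivers the correct sum from the shadow latch, which mirrors the detect-and-recover behavior the project asks for. Lowering the supply toward near-threshold would show up here as a larger `arrival`.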

Problem 1: On-Chip Wires Consume Energy
• On-chip wire power does not scale
  – Dominated by interconnect capacitance (C·VDD²)

ON-CHIP (Status Quo): 100 - 300 fJ/bit/mm

NOTE: Sub/Near-Threshold doesn’t help this problem!

OUR GOAL: < 5 fJ/bit/mm

[DOE, Exascale Workshop]
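To put the fJ/bit/mm figures in perspective, the short Python sketch below converts them to power for a made-up link. The link parameters (10 mm length, 64 wires, 1 Gb/s per wire) are purely hypothetical and are not from the slides; only the 100, 300, and 5 fJ/bit/mm values are.

```python
def wire_power_watts(energy_fj_per_bit_mm, length_mm, bits_per_second):
    """Power = energy per bit per mm x wire length x bit rate."""
    return energy_fj_per_bit_mm * 1e-15 * length_mm * bits_per_second


# Hypothetical link: 64 wires, 10 mm long, 1 Gb/s each.
LENGTH_MM = 10
BIT_RATE = 64 * 1e9

status_quo_low = wire_power_watts(100, LENGTH_MM, BIT_RATE)    # about 0.064 W
status_quo_high = wire_power_watts(300, LENGTH_MM, BIT_RATE)   # about 0.192 W
goal = wire_power_watts(5, LENGTH_MM, BIT_RATE)                # about 0.0032 W
```

For this assumed link, the stated goal corresponds to a 20x to 60x reduction relative to the status-quo range.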

Data Center Design

• http://www.spectrum.ieee.org/feb09/7327
