Datapath Designs

CK Cheng

CSE Department

UC, San Diego

Prefix Adder – Well-known and Well-developed?

• Classic prefix networks: Sklansky, Kogge-Stone, Brent-Kung, Ladner-Fischer, Han-Carlson, Knowles, etc.
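As a concrete instance of one of the networks named above, the following Python sketch models a Kogge-Stone prefix adder at the bit level. It is only an illustration: the function name `kogge_stone_add` and the list-based signal representation are assumptions made here, not taken from the slides. Each level combines (generate, propagate) pairs at a distance that doubles, so a width-w adder needs about log2(w) levels.

```python
def kogge_stone_add(a, b, width=8):
    """Bit-level model of a Kogge-Stone prefix adder (illustrative sketch).

    Returns (sum modulo 2**width, carry_out, number_of_prefix_levels).
    """
    # Per-bit generate and propagate signals.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]

    G, P = g[:], p[:]
    distance = 1
    levels = 0
    while distance < width:
        G_next, P_next = G[:], P[:]
        for i in range(distance, width):
            # Prefix operator: (G, P)[i] o (G, P)[i - distance]
            G_next[i] = G[i] | (P[i] & G[i - distance])
            P_next[i] = P[i] & P[i - distance]
        G, P = G_next, P_next
        distance *= 2
        levels += 1

    # Carry into bit i is the group generate of bits i-1..0 (carry-in is 0).
    carries = [0] + G[:-1]
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total, G[width - 1], levels


s, carry_out, levels = kogge_stone_add(0b1011, 0b0110, width=4)
print(f"sum={s:04b} carry_out={carry_out} levels={levels}")
# sum=0001 carry_out=1 levels=2   (11 + 6 = 17)
```

The other networks in the list use the same prefix operator and differ in which pairs are combined at each level, which is what trades logic depth against fanout and wiring.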

Prefix Adder – New Aspects, New Method

• Realistic design considerations: Timing, Power and Area.

• Integer Linear Programming for prefix adder:

– Logical effort timing model (gate cap. + wire cap.)

– Activity-statistic power model

– Non-uniform signal arrival/required times

[Figure: design metrics. Logic Levels, Max Fanouts, Max Wire Tracks; Timing, Power, Area]

Prefix Adder – Optimum Prefix adders

• Uniform signal arrival/required times

[Figure panels: Sklansky adder; Kogge-Stone adder; fastest depth-4 optimal prefix adder; fastest depth-3 optimal prefix adder]

• Uniform signal arrival/required times

[Figure: timing plot, axis range 30 to 60. Series: Depth = 3, Depth = 4, Depth = 5. Reference points: Brent-Kung, Kogge-Stone, Sklansky]

• Non-uniform signal arrival/required times

[Figure panels: increasing signal arrival times; decreasing signal arrival times; convex signal arrival times]

Division – Iteration effort

• Pencil-and-paper method: $A = QB + 2^{-n}R$ and $R < B$

• 1-bit partial quotient per iteration, n iterations

A = 0.1001, B = 0.1010; Q = A / B

Qi: partial quotient

Ri: partial remainder

Ri = Ri−1 – B Qi, with R0 = A

Partial remainders are listed as the 4-bit rows left after each subtraction in the long division:

Q1 = 0.1, R1 = 1000

Q2 = 0.01, R2 = 0110

Q3 = 0.001, R3 = 0010

Q4 = 0.0000, R4 = 0100

Q = Q1 + Q2 + Q3 + Q4 = 0.1110
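The pencil-and-paper method can be sketched as a restoring division on integers that stand for n-bit fractions. This is a minimal illustration, not the slides' own code: the function name `restoring_divide` and the integer encoding (value = integer / 2**n) are assumptions made here. The loop produces one quotient bit per iteration and maintains the relation A = QB + 2^-n R with R < B.

```python
def restoring_divide(a, b, n):
    """One quotient bit per iteration, n iterations (illustrative sketch).

    a and b are integers encoding n-bit fractions (value = x / 2**n), a < b.
    Returns (q, r) with a * 2**n == q * b + r and 0 <= r < b.
    """
    q, r = 0, a
    for _ in range(n):
        r <<= 1          # bring down the next bit
        q <<= 1
        if r >= b:       # trial subtraction succeeds: quotient bit is 1
            r -= b
            q |= 1
    return q, r


# Operands A = 0.1001 and B = 0.1010 with n = 4.
q, r = restoring_divide(0b1001, 0b1010, 4)
print(f"Q = 0.{q:04b}, R = {r:04b}")  # Q = 0.1110, R = 0100
```

With these operands the check is 0.1001 = 0.1110 × 0.1010 + 2^-4 × 0.0100, that is 144/256 = 140/256 + 4/256.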

Division – Memory effort

• A lookup table is the simplest way to obtain multiple partial quotient bits in each iteration.

• SRT method: a lookup table stores m-bit partial quotients selected by m bits of the partial remainder and m bits of the divisor.

Table size: $2^{2m} \times m$ bits

• The SRT method is limited by the memory wall.
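To see why the table approach hits a memory wall, the short sketch below evaluates the table size for a few values of m. It reads the table as 2^(2m) entries of m bits each (m remainder bits plus m divisor bits form the address); the function name `srt_table_bits` is a hypothetical helper introduced for illustration.

```python
def srt_table_bits(m):
    """Bits in a table addressed by 2m bits, with m bits per entry."""
    return (2 ** (2 * m)) * m


for m in (2, 4, 8, 12, 16):
    bits = srt_table_bits(m)
    print(f"m = {m:2d}: {bits} bits ({bits / 8 / 1024:.3f} KiB)")
```

Doubling m squares the number of entries, so m = 4 needs 1,024 bits while m = 16 already needs 2^36 bits (8 GiB).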

Division – Arithmetic effort

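• Partial quotient is calculated by arithmetic functions.

• Prescaling: multiply A and B by a common factor so that the scaled divisor is close to 1.

• Taylor expansion: $\frac{1}{B} = \frac{1}{1 - X} = 1 + X + X^2 + X^3 + \cdots$

• Series expansion: $\frac{1}{1 - X} = (1 + X)(1 + X^2) \cdots$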
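The difference between the two expansions can be checked numerically. The sketch below assumes the divisor is written as B = 1 − X with |X| < 1 and compares the additive Taylor form 1 + X + X^2 + ... with the product form (1 + X)(1 + X^2)(1 + X^4)...; the function names and the sample value X = 1/8 are illustrative choices, not from the slides. Exact rational arithmetic is used so the comparison is free of rounding.

```python
from fractions import Fraction


def taylor_reciprocal(x, terms):
    """Sum of the first `terms` terms of 1 + x + x^2 + ..."""
    return sum(x ** k for k in range(terms))


def series_reciprocal(x, factors):
    """Product of the first `factors` factors of (1 + x)(1 + x^2)(1 + x^4)..."""
    result = Fraction(1)
    for k in range(factors):
        result *= 1 + x ** (2 ** k)
    return result


x = Fraction(1, 8)       # corresponds to B = 7/8
exact = 1 / (1 - x)
for k in (1, 2, 3):
    taylor_err = float(exact - taylor_reciprocal(x, k))
    series_err = float(exact - series_reciprocal(x, k))
    print(f"k = {k}: Taylor error {taylor_err:.2e}, series error {series_err:.2e}")
```

A product of k factors equals the Taylor sum of 2^k terms, so each extra multiplication in the product form doubles the number of correct terms, while each extra Taylor term adds only one.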

Division – Solution space

• Modern FPGAs contain plenty of memory and built-in multipliers, which enable high-performance dividers.

[Figure: divider solution space with axes Iteration Effort, Memory Effort, and Arithmetic Effort. Labels: Memory Wall, Pencil-and-paper, Prescaling, Taylor Expansion, Series Expansion, Low area, Low latency, Our target]

Division – PST algorithm

• Use the power of series expansion, which needs a good starting point.

• Prescaling provides a scaled divisor close to 1.

• 0-order Taylor expansion iterates to reach the final quotient.

With $B_1 = 1 - x$ and $E_1 = 2 - B_1 = 1 + x$: $B_1 E_1 = (1 - x)(1 + x) = 1 - x^2$

$Q_i = R_{i-1} E_1$

Division – PST algorithm

E0 = Table(B(m)) ≈ 1/B

A1 = A × E0; B1 = B × E0

E1 = (2 − B1) ≈ INV(B1(2m))

Qi = Ri−1 × E1, with R0 = A1

Ri = Ri−1 − Qi × B1

Q = Q + Qi

A = 0.1011,0110; B = 0.1100,1011

B(m) = 0.1100; E0 = 1.0011

A1 = A × E0 = 0.1101,1000,0010; B1 = B × E0 = 0.1111,0001,0001

E1 = INV(B1(2m)) = 1.0000,1110

Q1 = A1 × E1 = 0.1110,0011; R1 = A1 – Q1 × B1 = 0.0000,0010,0101,1110,1101

Q2 = R1 × E1 = 0.1001,1111; R2 = R1 – Q2 × B1 = 0.0000,0001,1111,1011,0001

Q = 0.1110,0011 + 0.0000,0010,0111,11 = 0.1110,0101,0111,11
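The worked example can be reproduced with a small integer fixed-point model. This is a sketch under stated assumptions, not the authors' implementation: the table rule for E0 (floor of 1/(B(m) + 2^-m)), the reading of INV as a bitwise inversion in a 1-integer-bit, 2m-fraction-bit format, and the quotient widths of 8 and then 14 fractional bits are all chosen here because they match the example's intermediate values. The function name `pst_divide` is hypothetical, and B is assumed normalized (B ≥ 0.5).

```python
def pst_divide(a, b, n=8, m=4, q_widths=(8, 14)):
    """Fixed-point sketch of the PST steps (prescale, INV, iterate).

    a, b: integers encoding n-bit fractions (value = x / 2**n).
    q_widths: fractional bits kept in the quotient after each iteration
              (assumed values, picked to match the worked example).
    Returns (q, q_frac_bits, trace) where trace lists (Qi, Ri) per iteration.
    """
    # E0 = Table(B(m)) ~ 1/B, with m fractional bits (assumed table rule).
    b_m = b >> (n - m)
    e0 = (1 << (2 * m)) // (b_m + 1)

    # Prescaling: A1 = A * E0, B1 = B * E0, both with n + m fractional bits.
    a1, b1 = a * e0, b * e0
    f = n + m

    # E1 = INV(B1(2m)): invert the top 2m fraction bits of B1 (~ 2 - B1).
    e1 = ~(b1 >> (f - 2 * m)) & ((1 << (2 * m + 1)) - 1)

    q, q_frac = 0, 0
    r, r_frac = a1, f                      # R0 = A1
    trace = []
    for w in q_widths:
        qi = (r * e1) >> (r_frac + 2 * m - w)      # Qi = R(i-1) * E1, truncated
        r = (r << (w + f - r_frac)) - qi * b1      # Ri = R(i-1) - Qi * B1
        r_frac = w + f
        q = (q << (w - q_frac)) + qi               # Q = Q + Qi
        q_frac = w
        trace.append((qi, r))
    return q, q_frac, trace


q, q_frac, trace = pst_divide(0b10110110, 0b11001011)
print(f"Q = 0.{q:0{q_frac}b}")            # Q = 0.11100101011111
print(f"Q ~ {q / 2 ** q_frac:.6f}, A/B = {182 / 203:.6f}")
```

In this model the first iteration gives Q1 = 1110,0011 and the second adds 1001,1111 six bit positions lower, which is the sum shown in the example.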

Division – FPGA Implementation

• PST algorithm is suitable for high-performance division unit design in FPGAs.

Design | Fmax (Period) | | Memory Bits | DSP Blocks | Power Consumption (Dynamic + Static) | Throughput
--- | --- | --- | --- | --- | --- | ---
IP Core (no DSP) | 50.16 MHz (19.935 ns) | 1203 | 84 | 0 | 381 mW (52 mW + 329 mW) | 50.16 Mdiv/s
PST (DSP) | 72.8 MHz (13.737 ns) | 213 | 768 | 28 | 350 mW (23 mW + 327 mW) | 24.3 Mdiv/s
PST (no DSP) | 73.20 MHz (13.661 ns) | 1437 | 768 | 0 | 378 mW (50 mW + 328 mW) | 24.4 Mdiv/s
PST-pipelined (DSP) | 74.15 MHz (13.486 ns) | 261 | 768 | 40 | 344 mW (17 mW + 327 mW) | 74.15 Mdiv/s
PST-pipelined (no DSP) | 76.05 MHz (13.150 ns) | 1940 | 768 | 0 | 359 mW (31 mW + 328 mW) | 76.05 Mdiv/s

32-bit division with 5-cycle latency
