eecs 150 - components and design techniques for digital systems lec 16 – arithmetic ii...

EECS 150 - Components and Design

Techniques for Digital Systems

Lec 16 – Arithmetic II (Multiplication)

David CullerElectrical Engineering and Computer Sciences

University of California, Berkeley

http://www.eecs.berkeley.edu/~cullerhttp://inst.eecs.berkeley.edu/~cs150

http://www.eecs.berkeley.edu/~culler

http://inst.eecs.berkeley.edu/~cs150

Overview

• Review of Addition

• Overflow

• Multiplication

• Further adder optimizations for multiplication

• CLA in the large – parallel prefix

Review

• Circuit design for unsigned addition– Full adder per bit slice– Delay limited by Carry Propagation

» Ripple is algorithmically slow, but wires are short

• Carry select– Simple, resource-intensive– Excellent layout

• Carry look-ahead– Excellent asymptotic behavior– Great at the board level, but wire length effects are significant on chip

• Digital number systems– How to represent negative numbers– Simple operations– Clean algorithmic properties

• 2s complement is most widely used– Circuit for unsigned arithmetic– Subtract by complement and carry in– Overflow when cin xor cout of sign-bit is 1

Computer Number Systems

• Positional notation– Dn-1 Dn-2 …D0 represents Dn-1Bn-1 + Dn-2Bn-2 + …+ D0 B0

where Di { 0, …, B-1 }

• 2s Complement– Dn-1 Dn-2 …D0 represents: - Dn-12n-1 + Dn-22n-2 + …+ D0 20

– MSB has negative weight

0000

0001

0010

0011

1000

0101

0110

0100

1001

1010

1011

1100

1101

0111

1110

1111

+0

+1

+2

+3

+4

+5

+6

+7-8

-7

-6

-5

-4

-3

-2

-1

-8 + 5

2s Complement OverflowAdd two positive numbers to get a negative number

or two negative numbers to get a positive number

5 + 3 = -8! -7 - 2 = +7!

0000

0001

0010

0011

1000

0101

0110

0100

1001

1010

1011

1100

1101

0111

1110

1111

+0

+1

+2

+3

+4

+5

+6

+7-8

-7

-6

-5

-4

-3

-2

-1

0000

0001

0010

0011

1000

0101

0110

0100

1001

1010

1011

1100

1101

0111

1110

1111

+0

+1

+2

+3

+4

+5

+6

+7-8

-7

-6

-5

-4

-3

-2

-1

How can you tell an overflow occurred?

2s comp. Overflow Detection

5

3

-8

0 1 1 1 0 1 0 1

0 0 1 1

1 0 0 0

-7

-2

7

1 0 0 0 1 0 0 1

1 1 0 0

1 0 1 1 1

5

2

7

0 0 0 0 0 1 0 1

0 0 1 0

0 1 1 1

-3

-5

-8

1 1 1 1 1 1 0 1

1 0 1 1

1 1 0 0 0

Overflow Overflow

No overflow No overflow

Overflow occurs when carry in to sign does not equal carry out

2s Complement Adder/Subtractor

A - B = A + (-B) = A + B + 1

A B

CO

S

+ CI

A B

CO

S

+ CI

A B

CO

S

+ CI

A B

CO

S

+ CI

0 1

Add/Subtract

A 3 B 3 B 3

0 1

A 2 B 2 B 2

0 1

A 1 B 1 B 1

0 1

A 0 B 0 B 0

Sel Sel Sel Sel

S 3 S 2 S 1 S 0

Overflow

Adders on the Xilinx Virtex

• Dedicated carry logic provides fast arithmetic carry capability for high-speed arithmetic functions. The Virtex-E CLB supports two separate carry chains, one per Slice. The height of the carry chains is two bits per CLB.

• The arithmetic logic includes an XOR gate and AND gate that allows a 2-bit full adder to be implemented within a slice.

• Cin to Cout delay = 0.1ns, versus 0.4ns for F to X delay. How do we map a 2-bit adder to one slice?

Time / Space (resource) Trade-offs

• Carry select and CLA utilize more silicon to reduce time.

• Can we use more time to reduce silicon?

• How few FAs does it take to do addition?

Bit-serial Adder

• Addition of 2 n-bit numbers:– takes n clock cycles,

– uses 1 FF, 1 FA cell, plus registers

– the bit streams may come from or go to other circuits, therefore the registers may be optional.

• Requires controller– What does the FSM look like? Implemented?

• Final carry out?

• A, B, and R held in shift-registers. Shift right once per clock cycle.

• Reset is asserted by controller.

n-bit shift register

n-bit shift registers

sc

reset

R

FAFF

B

A

lsb

Discussion

• What is sign extension and why does it work?

• Where is addition used in the project?

• Where might you want more powerful arithmetic operations?

Announcements

• Reading: 5.8 (4 pages!)

• Digital Design in the news – from UCB– UC Berkeley is among six universities to be part of the

program started by IBM Corp. and Google Inc. on college campuses to promote computer-programming techniques for clusters of processors known as "clouds". Cloud computing allows computers in remote data centers to run parallel, increasing their processing power. Each company will spend between $20 million and $25 million for hardware, software and services that can be used by computer-science professors and students.

http://www.ibm.com/us/

http://www.google.com/about.html

Basic concept of multiplication

multiplicand

multiplier

1101 (13)

1011 (11)

1101

1101

0000

1101

*

10001111 (143)

Partial products

• product of 2 n-bit numbers is an 2n-bit number– sum of n n-bit partial products

• unsigned

Combinational Multiplier: accumulation of partial

products A0

B0

A0 B0

A1

B1

A1 B0

A0 B1

A2

B2

A2 B0

A1 B1

A0 B2

A3

B3

A2 B0

A2 B1

A1 B2

A0 B3

A3 B1

A2 B2

A1 B3

A3 B2

A2 B3A3 B3

S6 S5 S4 S3 S2 S1 S0S7

Array Multiplier

b3 0 b2 0 b1 0 b0 0

P7 P6 P5 P4

a0

0

a1

0

a2

0

a3

0

P0

P1

P2

P3

FA

bj sum in

sum out

carryout

ai

carryin

Each row: n-bit adder with AND gates

What is the critical path?

Generates all n partial products simultaneously.

“Shift and Add” Multiplier

• Sums each partial product, one at a time.

• In binary, each partial product is shifted versions of A or 0.

Control Algorithm:

1. P 0, A multiplicand,

B multiplier

2. If LSB of B==1 then add A to P

else add 0

3. Shift [P][B] right 1

4. Repeat steps 2 and 3 n-1 times.

5. [P][B] has product.

Bn-bit shift registers

P

An-bit register

+0

10

n-bit adder

• Cost n, = n clock cycles.

• What is the critical path for determining the min clock period?

Carry-save Addition• Speeding up multiplication is

a matter of speeding up the summing of the partial products.

• “Carry-save” addition can help.

• Carry-save addition passes (saves) the carries to the output, rather than propagating them.

• Example: sum three numbers,

310 = 0011, 210 = 0010, 310 = 0011

310 0011

+ 210 0010

c 0100 = 410

s 0001 = 110

310 0011

c 0010 = 210

s 0110 = 610

1000 = 810

carry-save add

carry-save add

carry-propagate add

• In general, carry-save addition takes in 3 numbers and produces 2.

• Whereas, carry-propagate takes 2 and produces 1.

• With this technique, we can avoid carry propagation until final addition

Carry-save Circuits

• When adding sets of numbers, carry-save can be used on all but the final sum.

• Standard adder (carry propagate) is used for final sum.

FA FA FA FA FA FA FA FA 0

CSA

s cs cs cs cs cs cs cs cc

CSA

CPA

CSA

CSA

x0

x1

x2

Array Mult. using Carry-save Additionb3 0 b2 0 b1 0 b0 0

P7 P6 P5 P4

a0

0

a1

a2

a3

P0

P1

P2

P3

1

0

0

0

0

0

0

0

000

FA

bj sum in

sum out

carryout

ai

carryin

Fast carry-propagate adder

Another Representation

A3 B0

SC

A2 B0

SC

A1 B0

SC

A0 B0

SC

A3 B1

SC

A2 B1

SC

A1 B1

SC

A0 B1

SC

A3 B2

SC

A2 B2

SC

A1 B2

SC

A0 B2

SC

A3 B3

SC

A2 B3

S

A1 B3

S

A0 B3

S

B0

B1

B2

B3

P7 P6 P5 P4 P3 P2 P1 P0

A3 A2 A1 A0

Building block: full adder + and

4 x 4 array of building blocks

F A

X

Y

A B

S CI CO

Cin Sum In

Sum Out Cout

Add

CPA

Carry-save Addition

CSA is associative and commutative. For example:

(((X0 + X1) + X2 ) + X3 ) = ((X0 + X1) +( X2 + X3 ))

• A balanced tree can be used to reduce the logic delay.

• This structure is the basis of the Wallace Tree Multiplier.

• Partial products are summed with the CSA tree. Fast CPA (ex: CLA) is used for final sum.

• Multiplier delay log3/2N + log2NCSA

CPA

CSA

CSA

x0x1x2

CSA

CSACSA

x3x4x5x6x7

log3/2N

log2N

Signed Multiplier

Signed Multiplication:Remember for 2’s complement numbers MSB has negative weight:

ex: -6 = 110102 = 0•20 + 1•21 + 0•22 + 1•23 - 1•24

= 0 + 2 + 0 + 8 - 16 = -6

• Therefore for multiplication:

a) subtract final partial product

b) sign-extend partial products

• Modifications to shift & add circuit:

a) adder/subtractor

b) sign-extender on P shifter register

11

2

0

22

nn

iN

ii xxX

Signed multiplication

multiplicand

multiplier

1101 (-3)

1011 (-5)

1101

11010

000000

1101000

*

(15)

• product of 2 n-bit numbers is an 2n-bit number– sum of n n-bit partial products

1111 Note: 2s complement

Sign extension111

00

1

+(-3)+

+

+

-

+(-6)

-(-24)

00001111

Signed Array Multiplier

b3 0 b2 0 b1 0 b0 0

P7 P6 P5 P4

a0

0

a1

0

a2

0

a3

0

P0

P1

P2

P3

Implicit

Sign

extension

- - - -

“Shift and Add” Signed Multiplier

Bn-bit shift registers

P

An-bit register

+0

10

n-bit adder

• Signed extend partial product at each stage

• Final step is a subtract

Carry Look-ahead Adders

• In general, for n-bit addition best we can achieve is

delay log(n)

• How do we arrange this? (think trees)

• First, reformulate basic adder stage:

carry “kill” ki = ai’ bi’

carry “propagate” pi = ai bi

carry “generate” gi = ai bi

ci+1 = gi + pici

si = pi ci

0 0 0 0 00 0 1 0 10 1 0 0 10 1 1 1 01 0 0 0 11 0 1 1 01 1 0 1 01 1 1 1 1

a b ci ci+1 s

Carry Look-ahead Adders – in blocks• “Group” propagate and generate signals:

• P true if the group as a whole propagates a carry to cout

• G true if the group as a whole generates a carry

• Group P and G can be generated hierarchically.

pi

gi

pi+1

gi+1

pi+k

gi+k

P = pi pi+1 … pi+k

G = gi+k + pi+kgi+k-1 + … + (pi+1pi+2 … pi+k)gi

cin

cout

Cout = G + PCin

Carry Look-ahead Adders

a0b0

a1b1a2b2

a

a3b3

a4b4a5b5

b

c3 = Ga + Pac0

Pa

Ga

Pb

Gb

a6b6

a7b7a8b8

c

c6 = Gb + Pbc3

Pc

Gc

P = PaPbPc

G = Gc + PcGb + PbPcGa

c9 = G + Pc0

c0

9-bit Example of hierarchically generated P and G signals:

Parallel Prefix (generalizing CLA)

• Compute all the prefixes Fi = Fi-1 op Fi-2 op … op F0

• Assume associative and commutative

B ABA

BAx

BAxAx

01234567

76 54 32 10

10

10

32

32

54

54

76

76

6 4 2 0

74 30

3074

54 10

70

30

54 10

74 64 30 20

30

30

70 60 50 40

c0

a0b0

s0

a1b1

s1

c1

a2b2

s2

a3b3

s3

c3

c2

c0

c0

a4b4

s4

a5b5

s5

c5

a6b6

s6

a7b7

s7

c7

c6

c0

c4

c0

c8

p,g

P,G

P,G

cin

cout

P,GPa,Ga

Pb,Gb

P = PaPb

G = Gb + GaPb

Cout = G + cinP

aibi

si

p,g

ci

ci+1

p = a bg = ab

s = p ci

ci+1 = g + cip

8-bit Carry Look-ahead Adder

Summary

• 2 complement number systems– Algebraic and corresponding bit manipulations

– Overflow detection

– Signficance of “sign bit” -2n-1

• Carry look ahead is form a parallel prefix

• Time / Space tradeoffs– Bit serial adder

• Binary Multiplication algorithm– Array multiplier

– Serial multiply (with bit parallel adder)

• Signed multiplication– Sign extend multipicand

– Sign bit of multiplier treated as subtract

eecs 150 - components and design techniques for digital systems lec 16 – arithmetic ii...

Documents