eecs 150 - components and design techniques for digital systems lec 16 – arithmetic ii...
TRANSCRIPT
EECS 150 - Components and Design
Techniques for Digital Systems
Lec 16 – Arithmetic II (Multiplication)
David CullerElectrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~cullerhttp://inst.eecs.berkeley.edu/~cs150
Overview
• Review of Addition
• Overflow
• Multiplication
• Further adder optimizations for multiplication
• CLA in the large – parallel prefix
Review
• Circuit design for unsigned addition– Full adder per bit slice– Delay limited by Carry Propagation
» Ripple is algorithmically slow, but wires are short
• Carry select– Simple, resource-intensive– Excellent layout
• Carry look-ahead– Excellent asymptotic behavior– Great at the board level, but wire length effects are significant on chip
• Digital number systems– How to represent negative numbers– Simple operations– Clean algorithmic properties
• 2s complement is most widely used– Circuit for unsigned arithmetic– Subtract by complement and carry in– Overflow when cin xor cout of sign-bit is 1
Computer Number Systems
• Positional notation– Dn-1 Dn-2 …D0 represents Dn-1Bn-1 + Dn-2Bn-2 + …+ D0 B0
where Di { 0, …, B-1 }
• 2s Complement– Dn-1 Dn-2 …D0 represents: - Dn-12n-1 + Dn-22n-2 + …+ D0 20
– MSB has negative weight
0000
0001
0010
0011
1000
0101
0110
0100
1001
1010
1011
1100
1101
0111
1110
1111
+0
+1
+2
+3
+4
+5
+6
+7-8
-7
-6
-5
-4
-3
-2
-1
-8 + 5
2s Complement OverflowAdd two positive numbers to get a negative number
or two negative numbers to get a positive number
5 + 3 = -8! -7 - 2 = +7!
0000
0001
0010
0011
1000
0101
0110
0100
1001
1010
1011
1100
1101
0111
1110
1111
+0
+1
+2
+3
+4
+5
+6
+7-8
-7
-6
-5
-4
-3
-2
-1
0000
0001
0010
0011
1000
0101
0110
0100
1001
1010
1011
1100
1101
0111
1110
1111
+0
+1
+2
+3
+4
+5
+6
+7-8
-7
-6
-5
-4
-3
-2
-1
How can you tell an overflow occurred?
2s comp. Overflow Detection
5
3
-8
0 1 1 1 0 1 0 1
0 0 1 1
1 0 0 0
-7
-2
7
1 0 0 0 1 0 0 1
1 1 0 0
1 0 1 1 1
5
2
7
0 0 0 0 0 1 0 1
0 0 1 0
0 1 1 1
-3
-5
-8
1 1 1 1 1 1 0 1
1 0 1 1
1 1 0 0 0
Overflow Overflow
No overflow No overflow
Overflow occurs when carry in to sign does not equal carry out
2s Complement Adder/Subtractor
A - B = A + (-B) = A + B + 1
A B
CO
S
+ CI
A B
CO
S
+ CI
A B
CO
S
+ CI
A B
CO
S
+ CI
0 1
Add/Subtract
A 3 B 3 B 3
0 1
A 2 B 2 B 2
0 1
A 1 B 1 B 1
0 1
A 0 B 0 B 0
Sel Sel Sel Sel
S 3 S 2 S 1 S 0
Overflow
Adders on the Xilinx Virtex
• Dedicated carry logic provides fast arithmetic carry capability for high-speed arithmetic functions. The Virtex-E CLB supports two separate carry chains, one per Slice. The height of the carry chains is two bits per CLB.
• The arithmetic logic includes an XOR gate and AND gate that allows a 2-bit full adder to be implemented within a slice.
• Cin to Cout delay = 0.1ns, versus 0.4ns for F to X delay. How do we map a 2-bit adder to one slice?
Time / Space (resource) Trade-offs
• Carry select and CLA utilize more silicon to reduce time.
• Can we use more time to reduce silicon?
• How few FAs does it take to do addition?
Bit-serial Adder
• Addition of 2 n-bit numbers:– takes n clock cycles,
– uses 1 FF, 1 FA cell, plus registers
– the bit streams may come from or go to other circuits, therefore the registers may be optional.
• Requires controller– What does the FSM look like? Implemented?
• Final carry out?
• A, B, and R held in shift-registers. Shift right once per clock cycle.
• Reset is asserted by controller.
n-bit shift register
n-bit shift registers
sc
reset
R
FAFF
B
A
lsb
Discussion
• What is sign extension and why does it work?
• Where is addition used in the project?
• Where might you want more powerful arithmetic operations?
Announcements
• Reading: 5.8 (4 pages!)
• Digital Design in the news – from UCB– UC Berkeley is among six universities to be part of the
program started by IBM Corp. and Google Inc. on college campuses to promote computer-programming techniques for clusters of processors known as "clouds". Cloud computing allows computers in remote data centers to run parallel, increasing their processing power. Each company will spend between $20 million and $25 million for hardware, software and services that can be used by computer-science professors and students.
Basic concept of multiplication
multiplicand
multiplier
1101 (13)
1011 (11)
1101
1101
0000
1101
*
10001111 (143)
Partial products
• product of 2 n-bit numbers is an 2n-bit number– sum of n n-bit partial products
• unsigned
Combinational Multiplier: accumulation of partial
products A0
B0
A0 B0
A1
B1
A1 B0
A0 B1
A2
B2
A2 B0
A1 B1
A0 B2
A3
B3
A2 B0
A2 B1
A1 B2
A0 B3
A3 B1
A2 B2
A1 B3
A3 B2
A2 B3A3 B3
S6 S5 S4 S3 S2 S1 S0S7
Array Multiplier
b3 0 b2 0 b1 0 b0 0
P7 P6 P5 P4
a0
0
a1
0
a2
0
a3
0
P0
P1
P2
P3
FA
bj sum in
sum out
carryout
ai
carryin
Each row: n-bit adder with AND gates
What is the critical path?
Generates all n partial products simultaneously.
“Shift and Add” Multiplier
• Sums each partial product, one at a time.
• In binary, each partial product is shifted versions of A or 0.
Control Algorithm:
1. P 0, A multiplicand,
B multiplier
2. If LSB of B==1 then add A to P
else add 0
3. Shift [P][B] right 1
4. Repeat steps 2 and 3 n-1 times.
5. [P][B] has product.
Bn-bit shift registers
P
An-bit register
+0
10
n-bit adder
• Cost n, = n clock cycles.
• What is the critical path for determining the min clock period?
Carry-save Addition• Speeding up multiplication is
a matter of speeding up the summing of the partial products.
• “Carry-save” addition can help.
• Carry-save addition passes (saves) the carries to the output, rather than propagating them.
• Example: sum three numbers,
310 = 0011, 210 = 0010, 310 = 0011
310 0011
+ 210 0010
c 0100 = 410
s 0001 = 110
310 0011
c 0010 = 210
s 0110 = 610
1000 = 810
carry-save add
carry-save add
carry-propagate add
• In general, carry-save addition takes in 3 numbers and produces 2.
• Whereas, carry-propagate takes 2 and produces 1.
• With this technique, we can avoid carry propagation until final addition
Carry-save Circuits
• When adding sets of numbers, carry-save can be used on all but the final sum.
• Standard adder (carry propagate) is used for final sum.
FA FA FA FA FA FA FA FA 0
CSA
s cs cs cs cs cs cs cs cc
CSA
CPA
CSA
CSA
x0
x1
x2
Array Mult. using Carry-save Additionb3 0 b2 0 b1 0 b0 0
P7 P6 P5 P4
a0
0
a1
a2
a3
P0
P1
P2
P3
1
0
0
0
0
0
0
0
000
FA
bj sum in
sum out
carryout
ai
carryin
Fast carry-propagate adder
Another Representation
A3 B0
SC
A2 B0
SC
A1 B0
SC
A0 B0
SC
A3 B1
SC
A2 B1
SC
A1 B1
SC
A0 B1
SC
A3 B2
SC
A2 B2
SC
A1 B2
SC
A0 B2
SC
A3 B3
SC
A2 B3
S
A1 B3
S
A0 B3
S
B0
B1
B2
B3
P7 P6 P5 P4 P3 P2 P1 P0
A3 A2 A1 A0
Building block: full adder + and
4 x 4 array of building blocks
F A
X
Y
A B
S CI CO
Cin Sum In
Sum Out Cout
Add
CPA
Carry-save Addition
CSA is associative and commutative. For example:
(((X0 + X1) + X2 ) + X3 ) = ((X0 + X1) +( X2 + X3 ))
• A balanced tree can be used to reduce the logic delay.
• This structure is the basis of the Wallace Tree Multiplier.
• Partial products are summed with the CSA tree. Fast CPA (ex: CLA) is used for final sum.
• Multiplier delay log3/2N + log2NCSA
CPA
CSA
CSA
x0x1x2
CSA
CSACSA
x3x4x5x6x7
log3/2N
log2N
Signed Multiplier
Signed Multiplication:Remember for 2’s complement numbers MSB has negative weight:
ex: -6 = 110102 = 0•20 + 1•21 + 0•22 + 1•23 - 1•24
= 0 + 2 + 0 + 8 - 16 = -6
• Therefore for multiplication:
a) subtract final partial product
b) sign-extend partial products
• Modifications to shift & add circuit:
a) adder/subtractor
b) sign-extender on P shifter register
11
2
0
22
nn
iN
ii xxX
Signed multiplication
multiplicand
multiplier
1101 (-3)
1011 (-5)
1101
11010
000000
1101000
*
(15)
• product of 2 n-bit numbers is an 2n-bit number– sum of n n-bit partial products
1111 Note: 2s complement
Sign extension111
00
1
+(-3)+
+
+
-
+(-6)
-(-24)
00001111
Signed Array Multiplier
b3 0 b2 0 b1 0 b0 0
P7 P6 P5 P4
a0
0
a1
0
a2
0
a3
0
P0
P1
P2
P3
Implicit
Sign
extension
- - - -
“Shift and Add” Signed Multiplier
Bn-bit shift registers
P
An-bit register
+0
10
n-bit adder
• Signed extend partial product at each stage
• Final step is a subtract
Carry Look-ahead Adders
• In general, for n-bit addition best we can achieve is
delay log(n)
• How do we arrange this? (think trees)
• First, reformulate basic adder stage:
carry “kill” ki = ai’ bi’
carry “propagate” pi = ai bi
carry “generate” gi = ai bi
ci+1 = gi + pici
si = pi ci
0 0 0 0 00 0 1 0 10 1 0 0 10 1 1 1 01 0 0 0 11 0 1 1 01 1 0 1 01 1 1 1 1
a b ci ci+1 s
Carry Look-ahead Adders – in blocks• “Group” propagate and generate signals:
• P true if the group as a whole propagates a carry to cout
• G true if the group as a whole generates a carry
• Group P and G can be generated hierarchically.
pi
gi
pi+1
gi+1
pi+k
gi+k
P = pi pi+1 … pi+k
G = gi+k + pi+kgi+k-1 + … + (pi+1pi+2 … pi+k)gi
cin
cout
Cout = G + PCin
Carry Look-ahead Adders
a0b0
a1b1a2b2
a
a3b3
a4b4a5b5
b
c3 = Ga + Pac0
Pa
Ga
Pb
Gb
a6b6
a7b7a8b8
c
c6 = Gb + Pbc3
Pc
Gc
P = PaPbPc
G = Gc + PcGb + PbPcGa
c9 = G + Pc0
c0
9-bit Example of hierarchically generated P and G signals:
Parallel Prefix (generalizing CLA)
• Compute all the prefixes Fi = Fi-1 op Fi-2 op … op F0
• Assume associative and commutative
B ABA
BAx
BAxAx
01234567
76 54 32 10
10
10
32
32
54
54
76
76
6 4 2 0
74 30
3074
54 10
70
30
54 10
74 64 30 20
30
30
70 60 50 40
c0
a0b0
s0
a1b1
s1
c1
a2b2
s2
a3b3
s3
c3
c2
c0
c0
a4b4
s4
a5b5
s5
c5
a6b6
s6
a7b7
s7
c7
c6
c0
c4
c0
c8
p,g
P,G
P,G
cin
cout
P,GPa,Ga
Pb,Gb
P = PaPb
G = Gb + GaPb
Cout = G + cinP
aibi
si
p,g
ci
ci+1
p = a bg = ab
s = p ci
ci+1 = g + cip
8-bit Carry Look-ahead Adder
Summary
• 2 complement number systems– Algebraic and corresponding bit manipulations
– Overflow detection
– Signficance of “sign bit” -2n-1
• Carry look ahead is form a parallel prefix
• Time / Space tradeoffs– Bit serial adder
• Binary Multiplication algorithm– Array multiplier
– Serial multiply (with bit parallel adder)
• Signed multiplication– Sign extend multipicand
– Sign bit of multiplier treated as subtract