![Page 1: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/1.jpg)
6.1
ALU Blocks and Control
Contents
1. Adder
2. Multiplier
3. Datapath Generation
![Page 2: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/2.jpg)
6.2
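1. Adder
Full Adder
Boolean equation
$$\mathrm{CARRY} = A B + B C + C A = A B + C\,(A + B)$$
$$\mathrm{SUM} = A B C + A \bar{B} \bar{C} + \bar{A} B \bar{C} + \bar{A} \bar{B} C = A B C + \overline{\mathrm{CARRY}}\,(A + B + C)$$
SUM is the odd-parity function of A, B, C; CARRY is 1 when the arithmetic sum A + B + C is 2 or more.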
![Page 3: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/3.jpg)
6.3
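Which is better?
Boolean Equation 1 :
$$\mathrm{CARRY} = A B + C\,(A + B)$$
$$\mathrm{SUM} = A B C + \overline{\mathrm{CARRY}}\,(A + B + C)$$
Boolean Equation 2 :
$$\mathrm{SUM} = A B C + A \bar{B} \bar{C} + \bar{A} B \bar{C} + \bar{A} \bar{B} C$$
$$\mathrm{CARRY} = A B C + \overline{\mathrm{SUM}}\,(A + B + C)$$
Equation 1 forms CARRY first, which is preferable: CARRY evaluation is more urgent since CARRY is in the critical path.
[ Ripple Carry Adder ]
[Figure: chain of ADDER blocks. Stage i takes Ai, Bi and carry-in Ci and produces Si and the carry-out for the next stage; stages 0, 1, 2, ..., n are linked through C1, C2, ..., Cn.]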
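The ripple-carry structure can be sketched in a few lines of Python. This is a minimal illustration, assuming the Equation 1 form (CARRY first, SUM derived from the complemented CARRY); the function names `full_adder`, `ripple_carry_add`, `to_bits`, and `from_bits` are hypothetical helpers, not part of the slides.

```python
def full_adder(a, b, c):
    """One full-adder stage: CARRY is formed first, SUM reuses its complement."""
    carry = (a & b) | (c & (a | b))
    total = (a & b & c) | ((1 - carry) & (a | b | c))
    return total, carry


def ripple_carry_add(a_bits, b_bits, c0=0):
    """Add two little-endian bit lists; the carry ripples from stage to stage."""
    carry = c0
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value, width):
    """Little-endian bit list of a non-negative integer."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))
```

Because each stage waits for the carry of the previous one, the loop order mirrors the critical path: the carry is the only value passed between iterations.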
![Page 4: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/4.jpg)
6.4
Alternating Complementary Form
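At odd stages (inputs A, B, C; outputs in complemented form):
$$\overline{\mathrm{CARRY}} = \overline{A B + C\,(A + B)}$$
$$\overline{\mathrm{SUM}} = \overline{A B C + \overline{\mathrm{CARRY}}\,(A + B + C)}$$
At even stages (inputs in complemented form; outputs true):
$$\mathrm{CARRY} = (A + B)\,(C + A B)$$
$$\mathrm{SUM} = (A + B + C)\,(\overline{\mathrm{CARRY}} + A B C)$$
[Figure: two adjacent stages, each with inputs A, B, C and outputs SUM, CARRY, alternating between true and complemented polarity.]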
![Page 5: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/5.jpg)
6.5
Alternating Complementary Form
![Page 6: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/6.jpg)
6.6
Dynamic Serial Adder
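[Figure: dynamic serial adder. Operand bits a(n-1) ... a0 and b(n-1) ... b0 are applied to inputs A and B one pair per CLOCK; a D flip-flop (D, Q, with clear/set) feeds CARRY back as input C; sum bits s(n-1) ... s0 appear serially on S.]
At each clock step the adder evaluates the full-adder functions on the current operand bits, with C the carry held in the flip-flop from the previous step:
$$\mathrm{CARRY} = A B + C\,[A + B]$$
$$\mathrm{SUM} = A B C + \overline{\mathrm{CARRY}}\,[A + B + C]$$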
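A bit-serial adder of this kind can be simulated as follows. This is a sketch under stated assumptions: bits arrive least-significant first, the carry flip-flop is cleared before the first bit, and the hypothetical name `serial_add` is used only for illustration.

```python
def serial_add(a_bits, b_bits):
    """Bit-serial addition, LSB first, one bit pair per clock step."""
    carry = 0  # assumption: the carry flip-flop is cleared before the first bit
    out = []
    for a, b in zip(a_bits, b_bits):
        new_carry = (a & b) | (carry & (a | b))
        s = (a & b & carry) | ((1 - new_carry) & (a | b | carry))
        out.append(s)
        carry = new_carry  # the D flip-flop captures CARRY for the next step
    return out, carry
```

For example, `serial_add([1, 1, 0, 1], [0, 1, 1, 0])` adds 11 and 6 and returns the low four sum bits together with the final carry.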
![Page 7: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/7.jpg)
6.7
Dynamic Configuration
[Figure: dynamic configuration. A CARRY GATE and a SUM GATE, each clocked by CK and each with an optional precharge device, take inputs A, B, C; a set/reset circuit (S, R) is clocked by CK. Outputs are C (CARRY) and SUM.]
$$\mathrm{CARRY} = A B + C\,[A + B]$$
![Page 8: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/8.jpg)
6.8
Full Adder Truth Table
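| Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| A | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| B | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| C | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| CARRY | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 |
| SUM | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |

Mutually complementary columns: 0 and 7, 1 and 6, 2 and 5, 3 and 4
F_C on-terms: 3, 5, 6, 7
F_S on-terms: 1, 2, 4, 7
Conjugate symmetry: inverting the inputs inverts the outputs
$$\mathrm{SUM} = F_S(A, B, C), \qquad \mathrm{CARRY} = F_C(A, B, C)$$
$$\overline{\mathrm{SUM}} = F_S(\bar{A}, \bar{B}, \bar{C}), \qquad \overline{\mathrm{CARRY}} = F_C(\bar{A}, \bar{B}, \bar{C})$$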
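The symmetry property can be checked exhaustively. The sketch below is illustrative only; `f_carry`, `f_sum`, and `is_self_dual` are hypothetical names for the functions F_C, F_S and the check that complementing every input complements the output.

```python
from itertools import product


def f_carry(a, b, c):
    """Majority function: 1 when at least two inputs are 1."""
    return (a & b) | (b & c) | (c & a)


def f_sum(a, b, c):
    """Odd-parity function."""
    return a ^ b ^ c


def is_self_dual(f):
    """True if inverting all three inputs always inverts the output."""
    return all(
        f(1 - a, 1 - b, 1 - c) == 1 - f(a, b, c)
        for a, b, c in product((0, 1), repeat=3)
    )
```

This property is what allows the same gate to serve in both polarities of an alternating complementary chain.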
![Page 9: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/9.jpg)
6.9
Another Configuration of Carry & Sum Logic
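[Figure: CARRY STAGE with inputs A, B, C and "1 PROPAGATE" / "1 GENERATE" paths producing CARRY; SUM STAGE with inputs A, B, C and CARRY producing SUM.]
$$\mathrm{CARRY}(t+1) = F_C(A, B, C) = A B + B C + C A = A B + C\,(A + B)$$
$$\mathrm{SUM}(t+1) = F_S(A, B, C) = A B C + A \bar{B} \bar{C} + \bar{A} B \bar{C} + \bar{A} \bar{B} C = A B C + \overline{\mathrm{CARRY}}\,(A + B + C)$$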
![Page 10: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/10.jpg)
6.10
Dynamic full adder using np CMOS logic style
![Page 11: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/11.jpg)
6.11
Layout of the dynamic full adder
![Page 12: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/12.jpg)
6.12
Looking at the FA Truth Table
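| A | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
|---|---|---|---|---|---|---|---|---|
| B | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| C | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| CARRY | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 |
| SUM | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |

$$\mathrm{SUM} = P \bar{C} + \bar{P} C$$
$$\mathrm{CARRY} = P C + \bar{P} B, \quad \text{where } P = A \oplus B$$
SUM = C when A ⊕ B = 0; SUM = C̄ when A ⊕ B = 1
CARRY = A (or B) when A ⊕ B = 0; CARRY = C when A ⊕ B = 1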
![Page 13: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/13.jpg)
6.13
Transmission Gate Implementation
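[Figure: transmission-gate full adder. P = (A ⊕ B) and its complement steer the transmission gates; SUM = P·C̄ + P̄·C; CARRY passes C when P = 1 and A (or B) otherwise.]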
![Page 14: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/14.jpg)
6.14
CLA (Carry Lookahead Adder)
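[Figure: carry lookahead chain. Inputs An, Bn produce Gn and Pn; C0 and the pairs (P1, G1) ... (P4, G4) produce C1 ... C4.]
$$C_i = G_i + P_i C_{i-1}, \quad \text{where } G_i = A_i B_i,\ P_i = A_i \oplus B_i$$
$$C_i = G_i + P_i G_{i-1} + P_i P_{i-1} G_{i-2} + \dots + P_i P_{i-1} \cdots P_2 P_1 C_0$$
$$S_i = C_{i-1} \oplus P_i$$
Available for (# of inputs ≤ 4)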
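A 4-bit lookahead can be sketched directly from the expanded carry expression. The function name `cla_add4` is hypothetical, and the code indexes bits from 0 (so `carries[i]` is the carry into bit i), whereas the slide numbers the stages 1 to 4; P is taken as the XOR of the operand bits so that it can also form the sum.

```python
def cla_add4(a, b, c0=0):
    """4-bit carry-lookahead addition using the expanded carry equations."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # propagate
    carries = [c0]
    for i in range(4):
        # carry out of bit i = g_i + p_i g_{i-1} + ... + p_i ... p_0 c0
        term = g[i]
        prod = p[i]
        for j in range(i - 1, -1, -1):
            term |= prod & g[j]
            prod &= p[j]
        term |= prod & c0
        carries.append(term)
    s = 0
    for i in range(4):
        s |= (carries[i] ^ p[i]) << i  # sum bit = carry-in XOR propagate
    return s, carries[4]
```

Each carry is computed from the inputs and `c0` alone, never from another carry, which is the point of the lookahead; the inner loop grows with the bit position, which is why the scheme is limited to about four inputs per block.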
![Page 15: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/15.jpg)
6.15
Carry bypass structure - basic concept
![Page 16: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/16.jpg)
6.16
(N=16)-bit carry bypass adder (each stage: M bits)
Worst case delay
$$t_p = t_{setup} + M\,t_{carry} + (N/M - 1)\,t_{bypass} + M\,t_{carry} + t_{sum}$$
t_setup : time to create G and P signals
t_carry : propagation delay through a single bit
t_bypass : propagation delay through MUX
t_sum : time to generate sum
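The delay expression can be evaluated for different stage widths. This is a sketch only: the function name `bypass_delay` is hypothetical and the unit delays of 1.0 are illustrative assumptions, not values from the slides.

```python
def bypass_delay(n, m, t_setup=1.0, t_carry=1.0, t_bypass=1.0, t_sum=1.0):
    """Worst-case delay of an n-bit carry bypass adder with m-bit stages."""
    return (
        t_setup
        + m * t_carry               # ripple through the first stage
        + (n / m - 1) * t_bypass    # skip over the middle stages
        + m * t_carry               # ripple through the last stage
        + t_sum
    )


if __name__ == "__main__":
    for m in (2, 4, 8):
        print(m, bypass_delay(16, m))
```

With equal unit delays, small stages pay for many bypass multiplexers and large stages pay for long ripples, so an intermediate stage width tends to be the better choice.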
![Page 17: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/17.jpg)
6.17
Combining 4 Domino Carry Lookahead Blocks
Manchester Carry Chain (4-bit)
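Limited to 4 stages
In the worst case, 6 series transistors to ground.
[Figure: 4-bit Manchester carry chain clocked by CK, with pass transistors P1 ... P4 between carry nodes C0 ... C4 and generate inputs G1 ... G4; block diagram of GP block, MANCHESTER CARRY CHAIN (C0 in, C4 out), and sum block.]
$$C_1 = G_1 + P_1 C_0$$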
![Page 18: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/18.jpg)
6.18
Improving Worst Case Carry Prop. Time
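[Figure: MANCHESTER CARRY CHAIN from C0 to C4, in parallel with a second pass-transistor chain from C0 to C4 gated by P1, P2, P3, P4 and clocked by CK.]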
Faster pass transistor chain due to lower parasitic C loading
![Page 19: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/19.jpg)
6.19
Manchester CC Adder Floorplan
Dual CC scheme: one carry chain for carry propagation, the other for off-loading the first carry chain from the SUM block.
[Figure: floorplan of a 4-bit adder. Each bit slice (BIT 1 to BIT 4) contains a GP block fed by Ai and Bi, two MANCHESTER CARRY CHAIN cells, and a SUM GENERATE block producing Si; C0 enters at BIT 1 and C4 leaves at BIT 4.]
![Page 20: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/20.jpg)
6.20
CSA (Carry Select Adder)
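[Figure: carry select adder. Bits 0 to 3 (A0 ~ A3, B0 ~ B3, C0) produce S0 ~ S3 and C4. Bits 4 to 7 (A4 ~ A7, B4 ~ B7) are added twice, once with carry-in 0 (giving S4^0 ~ S7^0 and C8^0) and once with carry-in 1 (giving S4^1 ~ S7^1 and C8^1); C4 selects S4 ~ S7 and C8.]
Carry Selection
$$C_8 = C_4 C_8^1 + \overline{C_4}\, C_8^0$$
$$C_8 = C_8^0 + C_4 C_8^1 = C_8^1\,(C_4 + C_8^0) \qquad (\text{since } \overline{C_8^1}\, C_8^0 = 0 \text{ always})$$
Realization of MUX with restoring logic
[Figure: C8 formed from C4, C8^0, C8^1 and C12 formed from C8, C12^0, C12^1 using restoring gates.]
Note) Realization of MUX with pass-transistor gates
Threshold voltage loss per stage: Vdd, Vdd - Vt, Vdd - 2Vt
Use restoring logic for critical path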
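The carry selection can be checked with a small behavioural model. The sketch below assumes an 8-bit adder split into two 4-bit blocks; `block_add` and `carry_select_add8` are hypothetical names, and the blocks are modelled with integer addition rather than gate by gate.

```python
def block_add(a, b, cin, width=4):
    """Behavioural model of one adder block: returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add8(a, b, c0=0):
    """8-bit carry select adder built from two 4-bit blocks."""
    s_lo, c4 = block_add(a & 0xF, b & 0xF, c0)
    # The upper block is evaluated for both possible carry-in values.
    s_hi0, c8_0 = block_add(a >> 4, b >> 4, 0)
    s_hi1, c8_1 = block_add(a >> 4, b >> 4, 1)
    s_hi = s_hi1 if c4 else s_hi0   # sum multiplexer
    c8 = c8_0 | (c4 & c8_1)         # simplified selection, no inverted C4 needed
    return s_lo | (s_hi << 4), c8
```

The simplification works because a block that produces a carry with carry-in 0 also produces one with carry-in 1, so the carry path needs only an AND-OR gate rather than a full multiplexer.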
![Page 21: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/21.jpg)
6.21
CSA (Carry Select Adder)
For carry propagation, use restoring logic in the alternating pattern
[Figure: first block (A0 ~ A3, B0 ~ B3, C0) produces S0 ~ S3 and C4; the following stages form C8 from C8^0, C8^1 and C12 from C12^0, C12^1.]
Number of bits for each stage
ex1) 32-bit case : 4, 4, 5, 6, 7, 6 (or 4, 4, 5, 6, 6, 7)
ex2) 64-bit case : 4, 4, 5, 6, 7, 8, 9, 10
![Page 22: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/22.jpg)
6.22
Minimization of Carry Propagation Path Delay
Carry select scheme (prepare the result for each case, Cin = 1 and Cin = 0)
Simplify the carry selection using the relationship between Ci^0 and Ci^1
Take complemented carries, alternating between the even and odd stages
Adjust each block size in consideration of the delay of the carry select logic, so that the carry propagation delay of each block equals the carry propagation delay to that block
e.g. for a 32-bit path: 4 4 5 6 6 7
![Page 23: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/23.jpg)
6.23
16-bit Linear CSA(Carry Select Adder)
$$t_{add} = t_{setup} + M\,t_{carry} + (N/M)\,t_{mux} + t_{sum}$$
M : # of bits/stage
N : total # of bits
![Page 24: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/24.jpg)
6.24
Square Root CSA
$$t_{add} = t_{setup} + M\,t_{carry} + \sqrt{2N}\,t_{mux} + t_{sum}$$
$$N = M + (M+1) + \dots + (M+P-1) = MP + \frac{P(P-1)}{2} = \frac{P^2}{2} + P\left(M - \frac{1}{2}\right) \approx \frac{P^2}{2}$$
Assumed MUX delay is comparable to 1-stage carry prop delay
[Figure: 9-stage example; annotation "12 ~ 6 (?)": number of clock cycles for this signal to be obtained.]
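The two delay expressions can be compared numerically. This sketch assumes the square-root form uses a multiplexer term of sqrt(2N) times t_mux, which follows from N being roughly P²/2 for P stages; the function names and the unit delays of 1.0 are illustrative assumptions.

```python
import math


def linear_csa_delay(n, m, t_setup=1.0, t_carry=1.0, t_mux=1.0, t_sum=1.0):
    """Linear carry select adder: N/M equal stages of M bits."""
    return t_setup + m * t_carry + (n / m) * t_mux + t_sum


def sqrt_csa_delay(n, m, t_setup=1.0, t_carry=1.0, t_mux=1.0, t_sum=1.0):
    """Square-root carry select adder: stage widths M, M+1, M+2, ..."""
    return t_setup + m * t_carry + math.sqrt(2 * n) * t_mux + t_sum


if __name__ == "__main__":
    for n in (16, 32, 64):
        print(n, linear_csa_delay(n, 4), sqrt_csa_delay(n, 2))
```

The linear form grows in proportion to N for a fixed stage width, while the square-root form grows only with the square root of N, so the gap widens as the word length increases.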
![Page 25: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/25.jpg)
6.25
Propagation Delay of Linear and Square Root CSA and linear RCA
![Page 26: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/26.jpg)
6.26
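Carry Skip Adder
A compromise between the ripple carry adder and the CLA adder
$$P_{0,3} = p_0\, p_1\, p_2\, p_3$$
$$G_{0,3} = g_3 + g_2 p_3 + g_1 p_3 p_2 + g_0 p_3 p_2 p_1$$
[Figure: 16-bit carry skip adder built from four 4-bit blocks (a0 b0 ... a3 b3 through a12 b12 ... a15 b15). Block signals P4,7, P8,11, P12,15 and G4,7, G8,11, G12,15 form c8, c12, c16 from c4; c0 enters the first block.]
Worst case delay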
![Page 27: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/27.jpg)
6.27
The p_i and g_i are computed from p_i = a_i ⊕ b_i and g_i = a_i·b_i
Initially, c4, c8 and c12 are cleared
After 4 clock cycles (at T0+4Tc), the G-values are calculated as the cout of each block assuming ci=0 (the P-values are also calculated by then)
At this time (at T0+4Tc), the true cout of the first stage, c4, is obtained.
After one, two and three further clock cycles respectively, assuming the delay of each AOI gate is Tc, the true values of c8, c12 and c16 are obtained.
The sum and cout of the last block are obtained at (T0+4Tc+2Tc+4Tc), which is the worst case delay
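The block-level carries of a 16-bit carry skip adder can be modelled as below. This is a sketch, not the slide's circuit: `block_pg` and `skip_carries` are hypothetical names, the blocks are assumed to be 4 bits wide, and timing is not modelled, only the logic values.

```python
def block_pg(a, b, lo, width=4):
    """Group propagate P and generate G for bits lo .. lo+width-1."""
    P, G = 1, 0
    for i in range(lo, lo + width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        p, g = ai ^ bi, ai & bi
        G = g | (p & G)   # builds g3 + p3 g2 + p3 p2 g1 + p3 p2 p1 g0
        P &= p            # builds p0 p1 p2 p3
    return P, G


def skip_carries(a, b, c0=0):
    """Return [c0, c4, c8, c12, c16] for 16-bit operands a and b."""
    carries = [c0]
    for lo in (0, 4, 8, 12):
        P, G = block_pg(a, b, lo)
        carries.append(G | (P & carries[-1]))  # one AND-OR step per block
    return carries
```

G is the carry-out a block would produce with carry-in 0, so once G and P are ready each true block carry costs only one AND-OR step, matching the one-gate-delay-per-block reasoning above.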
![Page 28: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/28.jpg)
6.28
Comparison of Carry Select & Carry Skip Adder
A 32-bit Carry Select Adder
Area = 2 × Area(RCA)

| Stage # | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| bits/stage | 4 | 4 | 5 | 6 | 7 | 6 |
| inc. delay | 4 | 1 | 1 | 1 | 1 | 1 |

32-bit delay: 9 k2 (k2 = delay due to 1-bit addition or MUX)

A 32-bit Carry Skip Adder
Area = Area(RCA) + Area(P-logic)

| Stage # | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| bits/stage | 4 | 5 | 6 | 7 | 8 | 2 |
| inc. delay | 4 | 1 | 1 | 1 | 1 | 2 |

32-bit delay: 10 k2
![Page 29: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/29.jpg)
6.29
Conditional Sum Adder
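[Figure: conditional sum adder. For each bit i (A0 B0, A1 B1, A2 B2) the sum and carry-out are prepared for both carry-in values (Si^1, Ci+1^1 for carry-in 1 and Si^0, Ci+1^0 for carry-in 0) and selected by multiplexers (MPX). C0 selects S0 and C1; a triple 2-input MUX controlled by C1 then selects S1, S2 and C3 from the (C1=1) and (C1=0) candidates.]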
![Page 30: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/30.jpg)
6.30
Carry Lookahead Tree Adder
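The previous CLA implementation is not very adequate because of its fan-in and fan-out problems and its irregularity, despite the small number (5) of logic levels. Make it regular, using log2 n logic levels.
[ 1st Part ]
[Figure: inputs a_i, b_i produce g_i, p_i; the pairs for bits 0, 1 and bits 2, 3 combine into (G0,1, P0,1) and (G2,3, P2,3), which combine into (G0,3, P0,3). The generic node combines (G_i,j, P_i,j) and (G_j+1,k, P_j+1,k) into (G_i,k, P_i,k).]
$$g_i = a_i b_i, \qquad p_i = a_i \oplus b_i$$
$$G_{i,k} = G_{j+1,k} + P_{j+1,k}\, G_{i,j}$$
$$P_{i,k} = P_{i,j}\, P_{j+1,k}$$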
![Page 31: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/31.jpg)
6.31
Carry Lookahead Tree Adder
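$$C_{j+1} = G_{i,j} + P_{i,j}\, C_i$$
$$g_i = a_i b_i, \qquad p_i = a_i \oplus b_i, \qquad S_i = a_i \oplus b_i \oplus c_i$$
[ 2nd Part ]
[Figure: the carries are distributed back down the tree. C0 and (G0,1, P0,1) give C2; C0 and (g0, p0) give C1; C2 and (g2, p2) give C3. The generic node takes Ci and (G_i,j, P_i,j) and outputs Cj+1 and Ci.]
[ Complete CLA Tree Adder ]
[Figure: complete 4-bit tree with inputs a3 b3 ... a0 b0 and C0, internal carries C1, C2, C3, and outputs S3, S2, S1, S0.]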
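The two parts of the tree (group signals combined upward, carries distributed downward) can be sketched recursively. The names `tree_carries` and `tree_add` are hypothetical, and this software model recomputes group signals where hardware would compute each one once in the first part.

```python
def tree_carries(a, b, n, c0=0):
    """Carries c[0..n] of an n-bit addition using the G/P tree equations."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    carries = [0] * (n + 1)

    def group(i, k):
        """1st part: (G, P) for bits i..k, combined from two halves."""
        if i == k:
            return g[i], p[i]
        j = (i + k) // 2
        g_lo, p_lo = group(i, j)
        g_hi, p_hi = group(j + 1, k)
        return g_hi | (p_hi & g_lo), p_lo & p_hi

    def distribute(i, k, c_in):
        """2nd part: push the carry into bits i..k down to the leaves."""
        carries[i] = c_in
        if i == k:
            return
        j = (i + k) // 2
        g_lo, p_lo = group(i, j)
        distribute(i, j, c_in)
        distribute(j + 1, k, g_lo | (p_lo & c_in))  # C_{j+1} = G + P C_i

    distribute(0, n - 1, c0)
    g_all, p_all = group(0, n - 1)
    carries[n] = g_all | (p_all & c0)
    return carries


def tree_add(a, b, n, c0=0):
    """n-bit addition: S_i = a_i xor b_i xor c_i with tree-generated carries."""
    c = tree_carries(a, b, n, c0)
    s = sum((((a >> i) ^ (b >> i) ^ c[i]) & 1) << i for i in range(n))
    return s, c[n]
```

The recursion depth is about log2 n in each direction, which corresponds to the log2 n logic levels named on the slide.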
![Page 32: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/32.jpg)
6.32
Carry Save Adder
Ripple Carry Adder
Carry Lookahead Adder
CSA (Conditional Sum Adder)
CSA (Carry Select Adder)
CSA (Carry Skip Adder)
CSA (Carry Save Adder)
Carry Propagate Adder
![Page 33: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/33.jpg)
6.33
Carry Save Adder
Carry Save Adder is used wherever a large number of operands have to be added.
[Figure: carry save adder array. Several CSA stages, each a row of full adders (F.A) with inputs a_i, b_i, c_i and no carry propagation along the row, are followed by a carry propagate adder (CPA) row. A sequential version stores the previous cycle's sum and carry in Sum F/F and Carry F/F registers and adds a new operand each cycle.]
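Carry-save accumulation of many operands can be sketched on whole integers, treating each bit position as one full adder. The names `carry_save_step` and `carry_save_sum` are hypothetical, and operands are assumed to be non-negative.

```python
def carry_save_step(x, y, z):
    """One CSA row: three operands in, a sum word and a carry word out."""
    s = x ^ y ^ z                            # per-bit sum, no propagation
    c = ((x & y) | (y & z) | (z & x)) << 1   # per-bit carry, moved one place up
    return s, c


def carry_save_sum(operands):
    """Accumulate operands in carry-save form, then resolve with one CPA."""
    s, c = 0, 0
    for op in operands:
        s, c = carry_save_step(s, c, op)
    return s + c   # the only carry-propagating addition
```

Each step keeps the invariant that the sum word plus the carry word equals the total so far, so the slow carry propagation is paid only once at the end.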
![Page 34: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/34.jpg)
6.34
2. Multiplier
Add-and-Shift Algorithm
[Figure: a worked binary multiplication of a multiplicand by a multiplier, shown twice: the multiplication procedure by the pencil-and-paper method, and the multiplication procedure by the add-and-shift algorithm.]
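The add-and-shift procedure can be sketched as follows, assuming unsigned n-bit operands and a 2n-bit accumulator whose upper half receives the multiplicand; the function name `add_and_shift` is hypothetical.

```python
def add_and_shift(multiplicand, multiplier, n):
    """Unsigned n-bit multiplication by repeated add and right shift."""
    acc = 0
    for i in range(n):
        if (multiplier >> i) & 1:
            acc += multiplicand << n   # add into the upper half of the accumulator
        acc >>= 1                      # shift the partial result right by one bit
    return acc
```

Unlike the pencil-and-paper method, which shifts each partial product left before a final multi-operand sum, this form keeps the adder at a fixed position and moves the accumulated result instead.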
![Page 35: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/35.jpg)
6.35
The Serial-Parallel Multiplier
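If A = (a_n, a_{n-1}, ..., a_0) and B = (b_n, b_{n-1}, ..., b_0), the product A·B is expressed as
$$A \cdot B = A\,2^n b_n + A\,2^{n-1} b_{n-1} + \dots + A\,2^0 b_0$$
[Figure: serial-parallel multiplier with operand bits a3 a2 a1 a0 and b2 b1 b0, built from rows of full adders (F.A) and D flip-flops (D), with a 0 injected at the start of the chain and the product delivered at Output.]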
![Page 36: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/36.jpg)
6.36
4x4 array multiplier
![Page 37: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/37.jpg)
6.37
$$t_{mult} = [(M-1) + (N-1)]\,t_{carry} + (N-1)\,t_{sum} + t_{and}$$
Both t_carry and t_sum are important
Sum and Carry generation times need to be similar.
[Figure: critical path through the array, with N = 4 and M = 3.]
![Page 38: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/38.jpg)
6.38
Carry-save Multiplier(CSM)
Rectangular floorplan of CSM
![Page 39: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/39.jpg)
6.39
The Modified Booth Algorithm (cont’)
Booth Encoder Table
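| b2k+1 | b2k | b2k-1 | multiplied by |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | + x |
| 0 | 1 | 0 | + x |
| 0 | 1 | 1 | + 2x |
| 1 | 0 | 0 | - 2x |
| 1 | 0 | 1 | - x |
| 1 | 1 | 0 | - x |
| 1 | 1 | 1 | 0 |

[Figure: Booth Encoder with inputs b2k-1, b2k, b2k+1 and outputs A, 2A, negative.]
A = b2k ⊕ b2k-1
negative = b2k+1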
![Page 40: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/40.jpg)
6.40
Booth Multiplication Example
[Figure: worked example with A = 17 and X = -9, giving the product -153. Sequence of operations: Initial 0; Add -A; 2-bit Shift; Add 2A; 2-bit Shift; Add -A.]
![Page 41: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/41.jpg)
6.41
The Modified Booth Algorithm
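Let's consider a number B = (bn-1, bn-2, ... , b1, b0) written in 2's-complement.
$$B = -b_{n-1}\, 2^{n-1} + \sum_{k=0}^{n-2} b_k\, 2^k$$
B may be rewritten as follows :
$$B = \sum_{k=0}^{n/2-1} \left(b_{2k-1} + b_{2k} - 2\,b_{2k+1}\right) 2^{2k} \qquad (\text{assume } b_{-1} = 0)$$
Example
$$B = (b_{-1} + b_0 - 2 b_1)\,2^0 + (b_1 + b_2 - 2 b_3)\,2^2 + (b_3 + b_4 - 2 b_5)\,2^4$$
$$= -b_5\, 2^5 + b_4\, 2^4 + b_3\, 2^3 + b_2\, 2^2 + b_1\, 2^1 + b_0\, 2^0$$
In this equation, each term in brackets is in the set {-2, -1, 0, 1, 2}
An n-bit multiplier generates exactly n/2 partial products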
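The recoding can be sketched directly from the bracketed terms. The names `booth_digits` and `booth_multiply` are hypothetical; n is assumed even, and the multiplier is assumed to fit in n-bit two's complement.

```python
def booth_digits(x, n):
    """Radix-4 Booth digits (LSB group first) of an n-bit two's-complement x."""
    # Python's >> on negative ints is arithmetic, so this yields two's-complement bits.
    bits = [(x >> i) & 1 for i in range(n)]
    digits = []
    prev = 0  # b_{-1} is assumed to be 0
    for k in range(n // 2):
        b0, b1 = bits[2 * k], bits[2 * k + 1]
        digits.append(prev + b0 - 2 * b1)   # value in {-2, -1, 0, 1, 2}
        prev = b1
    return digits


def booth_multiply(a, x, n):
    """Multiply a by x using n/2 partial products, each weighted by 4**k."""
    return sum(d * a * 4 ** k for k, d in enumerate(booth_digits(x, n)))
```

For a 6-bit multiplier of -9 the digits are -1, +2, -1, which corresponds to the operation sequence add -A, add 2A, add -A with a 2-bit shift between them.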
![Page 42: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/42.jpg)
6.42
Parallel Multiplier
Multiplier has two basic operations
The generation of partial products
The summation of partial products
Parallel multiplier avoids the overhead that is due to the separate controls of these two operations
The gain in speed is obtained at the expense of extra hardware
Parallel multiplier can be implemented such that it supports a high rate of pipelining
![Page 43: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/43.jpg)
6.43
The Braun Multiplier
Partial-product terms of a 4 x 4 multiplication (A = a3 a2 a1 a0, B = b3 b2 b1 b0), listed by the product column they contribute to:
P0: a0b0
P1: a1b0, a0b1
P2: a2b0, a1b1, a0b2
P3: a3b0, a2b1, a1b2, a0b3
P4: a3b1, a2b2, a1b3
P5: a3b2, a2b3
P6: a3b3
A straightforward implementation
Each cell adds three inputs: one bit of the new partial product ( ai . bj ), one bit of the previous partial product, and the carry in
In the first four rows there is no horizontal carry propagation (using carry-save adder)
![Page 44: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/44.jpg)
6.44
The Braun Multiplier (cont’)
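[Figure: 4 x 4 Braun multiplier array. Inputs a3 a2 a1 a0 run across the columns and b0, b1, b2, b3 across the rows; four rows of three full adders (F.A) each, with 0 applied to the unused inputs; outputs p0 ... p7.]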
![Page 45: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/45.jpg)
6.45
Baugh-Wooley Multiplier
Modified in order to allow multiplication of signed numbers
Let's consider two numbers A and B (2's complement numbers)
$$A = (a_{n-1} \dots a_0) = -a_{n-1}\, 2^{n-1} + \sum_{i=0}^{n-2} a_i\, 2^i$$
$$B = (b_{n-1} \dots b_0) = -b_{n-1}\, 2^{n-1} + \sum_{i=0}^{n-2} b_i\, 2^i$$
a_{n-1} : sign bit in 2's complement
$$A = \sum_{i=0}^{n-2} a_i\, 2^i, \quad \text{when } a_{n-1} = 0$$
$$A = -a_{n-1}\, 2^{n-1} + \sum_{i=0}^{n-2} a_i\, 2^i, \quad \text{when } a_{n-1} = 1$$
The product A.B is
$$A \cdot B = a_{n-1} b_{n-1}\, 2^{2n-2} + \sum_{i=0}^{n-2} \sum_{j=0}^{n-2} a_i b_j\, 2^{i+j} - a_{n-1} \sum_{i=0}^{n-2} b_i\, 2^{n-1+i} - b_{n-1} \sum_{i=0}^{n-2} a_i\, 2^{n-1+i}$$
The two subtracted sums are then rewritten with complemented bits so that the array only has to add.
![Page 46: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/46.jpg)
6.46
Baugh-Wooley Multiplier (cont’)
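[Figure: 4 x 4 Baugh-Wooley multiplier array. Inputs a3 a2 a1 a0 and b0, b1, b2, b3; rows of full adders (F.A) with 0 applied to the unused inputs of the first row, extra full adders fed by a3, b3 and a constant 1 for the sign correction; outputs p0 ... p7.]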
![Page 47: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/47.jpg)
6.47
Wallace Tree Multipliers
Full adder vs Wallace tree
Useful whenever a large number of operands are to be added.
Completion time in a Braun or Baugh-Wooley multiplier using a ripple carry adder: proportional to 2n, twice the number of bits n
Using Wallace trees: proportional to log2 (n)
[Figure: a full adder takes three inputs of weight 2^0 and produces outputs of weight 2^1 and 2^0; a Wallace tree takes n inputs of weight 2^0 and produces outputs of weight 2^n ... 2^1, 2^0.]
![Page 48: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/48.jpg)
6.48
Recursive Decomposition of the Multiplication
Partitioning two operands
$$A = 2^p A_H + A_L$$
$$B = 2^p B_H + B_L$$
$$A \cdot B = 2^{2p} A_H B_H + 2^p (A_H B_L + A_L B_H) + A_L B_L$$
Four terms (AH.BH, AH.BL, AL.BH, AL.BL) are computed using 4 p-bit multipliers
The results are collected through a Wallace tree
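The decomposition can be checked with a short sketch. The name `split_multiply` is hypothetical, the operands are assumed to be unsigned, and ordinary integer addition stands in for the Wallace tree that collects the four products.

```python
def split_multiply(a, b, p):
    """Multiply via four p-bit sub-products, aligned by 0, p, p and 2p bits."""
    mask = (1 << p) - 1
    a_h, a_l = a >> p, a & mask
    b_h, b_l = b >> p, b & mask
    return (
        ((a_h * b_h) << (2 * p))
        + ((a_h * b_l + a_l * b_h) << p)
        + a_l * b_l
    )
```

For instance, `split_multiply(0xAB, 0xCD, 4)` forms the 8-bit product from four 4-bit multiplications.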
![Page 49: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/49.jpg)
6.49
Recursive Decomposition of the Multiplication
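Aligning the four partial products
[Figure: operands split as (AH, AL) and (BH, BL). The four products AL x BL, AH x BL, AL x BH and AH x BH are aligned by weight, with AL x BL lowest, AH x BL and AL x BH in the middle, and AH x BH highest, then reduced by 4 x W3 Wallace trees followed by an adder.]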
![Page 50: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/50.jpg)
6.50
Booth’s Algorithm Array Multiplication
Another approach to the design of a parallel multiplier for two’s complement operands
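The basic cell in row i performs an add, subtract or transfer-only operation
CASS (Controlled Add/Subtract/Shift) Cell
[Figure: CASS cell with inputs Pin (partial product), a, cin, control inputs H and D, and outputs Pout and cout.]
$$P_{out} = P_{in} \oplus (a \cdot H) \oplus (c_{in} \cdot H)$$
If H = 0, Pout = Pin (transfer)
If H = 1, Pout = Pin ⊕ a ⊕ cin (sum)
$$c_{out} = (P_{in} \oplus D)\,(a + c_{in}) + a \cdot c_{in}$$
If D = 0, cout = Pin (a + cin) + a cin (add)
If D = 1, cout = P̄in (a + cin) + a cin (subtract)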
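The cell equations can be checked against ordinary arithmetic. In this sketch the name `cass` is hypothetical, and cout is interpreted as a carry when D = 0 and as a borrow when D = 1, which is an assumption about how the array uses it.

```python
def cass(p_in, a, c_in, h, d):
    """Controlled add/subtract/shift cell; returns (p_out, c_out)."""
    p_out = p_in ^ (a & h) ^ (c_in & h)
    c_out = ((p_in ^ d) & (a | c_in)) | (a & c_in)
    return p_out, c_out
```

With H = 1 and D = 0 the cell behaves as a full adder (p_out + 2 c_out = p_in + a + c_in); with H = 1 and D = 1 it behaves as a full subtractor (p_out - 2 c_out = p_in - a - c_in); with H = 0 it simply passes p_in through.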
![Page 51: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/51.jpg)
6.51
Booth’s Algorithm Array Multiplication (cont’)
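[Figure: array of CASS cells. Operand bits a3 a2 a1 a0 enter the columns with 0 applied to the unused inputs; each row i has a CTRL block driven by multiplier bit x0, x1, x2 or x3 that supplies H and D to the row; outputs P6 ... P0.]
$$H = X_i \oplus X_{i-1}, \qquad D = X_i$$

| Xi | Xi-1 | Operation | H | D |
|---|---|---|---|---|
| 0 | 0 | Shift | 0 | d |
| 0 | 1 | Add | 1 | 0 |
| 1 | 0 | Subtract | 1 | 1 |
| 1 | 1 | Shift | 0 | d |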
![Page 52: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/52.jpg)
6.52
Generalized block diagram of an array multiplier
![Page 53: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/53.jpg)
6.53
Q. Why use an array multiplier if it requires as many addition steps?
A1) An array multiplier is a combinational circuit, where the signals flow without being clocked.
Multi-pass Array Multiplier : normally uses a clock, but the cycle time for passing through k arrays is < kTc
![Page 54: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/54.jpg)
6.54
A2) Some speed-up schemes are possible.
e.g. E/O array, Wallace-tree
Even-Odd Array
![Page 55: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/55.jpg)
6.55
Wallace-tree Multiplier
![Page 56: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/56.jpg)
6.56
6 x 6 Wallace-tree Multiplier Example
$$\mathrm{Delay} \propto \log_{3/2} n \qquad (n : \text{width of the Wallace tree})$$
e.g. For 32-bit, number of adders necessary for each stage is
32 - 22 - 16 - 12 - 8 - 6 - 4 - 3 - 2
Total delay = 9 x adder delay
![Page 57: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/57.jpg)
6.57
![Page 58: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/58.jpg)
6.58
3. Datapath Generation
Datapath and its elements in bit-slice organization
[Figure: block diagram with CONTROL, DATAPATH, MEMORY and INPUT-OUTPUT blocks.]
![Page 59: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/59.jpg)
6.59
Two layout strategies for bit-slice datapath
![Page 60: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/60.jpg)
6.60
Layout of 4-bit DP using layout strategy II (feedthrough)
![Page 61: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/61.jpg)
6.61
1-D placement vs. 2-D placement
![Page 62: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/62.jpg)
6.62
1-D placement vs. 2-D placement(Cont’)
![Page 63: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/63.jpg)
6.63
Datapath Layout Flow
circuit design
floorplan : block ordering, bus track assignment
schematic drawing : transistor sizing
layout
cell drawing : leaf cell layout
layout assemble : leaf cell integration (routing)
DRC / LVS : design rule check, layout vs. schematic
back-annotation
simulation with the exact capacitance
[Figure: flow chart: RTL description, Floorplan, Schematic Drawing, Cell Drawing, Layout Assemble, DRC / LVS, Back-Annotation, ending in Datapath Layout.]
![Page 64: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/64.jpg)
6.64
Datapath Design Case (ACCENT HK386)
real mode support of x86 instruction set
enhanced (pipelined) datapath
problems & practices of general DP layout
![Page 65: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/65.jpg)
6.65
Datapath structure
3 major blocks
ALU, register file (32bit)
barrel shifter (40bit)
segment/effective address (32bit)
[Figure: datapath floorplan with Segment/EA, ALU, Register File and Barrel Shifter blocks.]
![Page 66: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/66.jpg)
6.66
Track capacity
6 vertical wires/track in metal 1
metal3 reserved for P & G routing
[Figure: cell cross-section with VSS and VDD power lines, TRACK(6), control and clock lines, N-well and P-well regions; layers metal1 and metal2.]
![Page 67: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/67.jpg)
6.67
Power Grid
From bottom & left (chip edges)
Considering IR drop
[Figure: power grid over the Segment/EA, ALU, RF and BSH blocks.]
![Page 68: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/68.jpg)
6.68
Cell Structure
Initial cell template decision
Nwell in the left
Pwell in the right
data flow vertical
control flow horizontal
Similar cell structure as VTI
Cell width
– 80 for PMOS
– 70 for NMOS
[Figure: cell template with N-well and P-well regions; dimension labels 25, 10, 35, 45, 10, 25 and 70, 80.]
![Page 69: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/69.jpg)
6.69
Cell Structure
A power line passes through every cell
power line width: 10 (2 contacts)
power line location: 25 to the inside from the boundary
![Page 70: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/70.jpg)
6.70
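Accent Cell Layout Flow (one student's lament)
Block Spec.
Schematic
SPICE
Simulated at first with an assumed capacitance
Finished transistor sizing quickly
Thought that, since the capacitance values were not accurate, optimization was unnecessary and meeting the spec was enough
Accurate capacitance is available only after the whole design is assembled, so stopped working on it for a long while
If the layout is changed after assembly, everything has to be assembled again, which is an enormous amount of grunt work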
![Page 71: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/71.jpg)
6.71
Cell Design(I)
Using 45 degree line for cell design
[Figure: cell layout annotated with data flow and control flow directions.]
![Page 72: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/72.jpg)
6.72
Cell Design(II)
needless effort to reduce cell size
ugly poly; current crowding
[Figure: cell layout annotated with the data flow direction.]
![Page 73: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/73.jpg)
6.73
Critical path used for transistor sizing in relevant datapath element
![Page 74: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/74.jpg)
6.74
•Track assignment needs to be done before the cell layout (not after).
[Figure: assembled cells annotated with the data flow direction.]
![Page 75: 6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents](https://reader035.vdocument.in/reader035/viewer/2022062304/56649ec05503460f94bcaea1/html5/thumbnails/75.jpg)
6.75
The value of grades
College grades and success in society are not strongly correlated, and this is actually not surprising.
The factors behind success in society and the criteria for college grades are often quite different.