introduction to computer organization and architecture lecture 10 by juthawut chantharamalee jutha...

Introduction to Computer Organization and Architecture

Lecture 10

By Juthawut Chantharamalee

http://dusithost.dusit.ac.th/~juthawut_cha/home.htm

Outline

Adders
Comparators
Shifters
Multipliers
Dividers
Floating Point Numbers


Binary Representations of Numbers

To find negative numbers:
  Sign and magnitude: msb = ‘1’
  1’s complement: complement each bit to change sign
  2’s complement: 2^n - positive number

b2 b1 b0   Unsigned   Sign and Magnitude   1’s Complement   2’s Complement
0  1  1    3          +3                   +3               +3
0  1  0    2          +2                   +2               +2
0  0  1    1          +1                   +1               +1
0  0  0    0          +0                   +0               +0
1  0  0    4          -0                   -3               -4
1  0  1    5          -1                   -2               -3
1  1  0    6          -2                   -1               -2
1  1  1    7          -3                   -0               -1
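The table can be reproduced with a short sketch. The function name `interpretations` is hypothetical and chosen only for illustration; Python integers have no negative zero, so the table's -0 entries come out as 0.

```python
def interpretations(bits: str) -> dict:
    """Interpret a bit string under the four representations in the table."""
    n = len(bits)
    unsigned = int(bits, 2)
    negative = bits[0] == "1"
    magnitude = int(bits[1:], 2)
    # Sign and magnitude: the msb carries only the sign.
    sign_magnitude = -magnitude if negative else magnitude
    # 1's complement: a negative pattern is the bitwise complement of its magnitude.
    ones = -((2**n - 1) - unsigned) if negative else unsigned
    # 2's complement: a negative pattern stores 2^n minus the positive number.
    twos = unsigned - 2**n if negative else unsigned
    return {
        "unsigned": unsigned,
        "sign_magnitude": sign_magnitude,
        "ones_complement": ones,
        "twos_complement": twos,
    }


for pattern in ("011", "000", "100", "101", "111"):
    print(pattern, interpretations(pattern))
```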



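Single-Bit Addition

Half Adder

A B   Cout S
0 0   0    0
0 1   0    1
1 0   0    1
1 1   1    0

Full Adder

A B C   Cout S
0 0 0   0    0
0 0 1   0    1
0 1 0   0    1
0 1 1   1    0
1 0 0   0    1
1 0 1   1    0
1 1 0   1    0
1 1 1   1    1

[Figure: half adder symbol with inputs A, B and outputs S, Cout; full adder symbol with inputs A, B, C and outputs S, Cout]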
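Both truth tables follow from two Boolean expressions per cell. The sketch below assumes the usual gate-level definitions (XOR for the sum, AND or majority for the carry); the function names are illustrative.

```python
def half_adder(a: int, b: int) -> tuple:
    """Return (Cout, S) for two input bits."""
    return a & b, a ^ b


def full_adder(a: int, b: int, c: int) -> tuple:
    """Return (Cout, S) for three input bits."""
    cout = (a & b) | (a & c) | (b & c)  # majority of the three inputs
    s = a ^ b ^ c                       # odd parity of the three inputs
    return cout, s


# Print the full adder truth table in the same column order as above.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            cout, s = full_adder(a, b, c)
            print(a, b, c, cout, s)
```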


Carry-Ripple Adder

Simplest design: cascade full adders
  Critical path goes from Cin to Cout
  Design full adder to have fast carry delay

[Figure: four cascaded full adders with inputs A1...A4, B1...B4 and Cin, internal carries C1, C2, C3, outputs S1...S4 and Cout]
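A behavioral sketch of the cascade: each stage's carry-out becomes the next stage's carry-in, which is why the critical path runs from Cin to Cout. Bit lists are least significant bit first; the helper names `to_bits` and `from_bits` are hypothetical.

```python
def full_adder(a, b, c):
    """Return (Cout, S) for three input bits."""
    return (a & b) | (a & c) | (b & c), a ^ b ^ c


def to_bits(x, n):
    """Split x into n bits, least significant first."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Reassemble an integer from bits given least significant first."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists by chaining full adders."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        carry, s = full_adder(a, b, carry)  # carry ripples to the next stage
        sums.append(s)
    return sums, carry


sums, cout = ripple_add(to_bits(0b1111, 4), to_bits(0b0000, 4), cin=1)
print(from_bits(sums), cout)  # 1111 + 0000 + 1: sum bits 0000, Cout 1
```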


Carry Propagate Adders

N-bit adder called CPA
  Each sum bit depends on all previous carries
  How do we compute all these carries quickly?

[Figure: N-bit adder block with inputs A_{N...1}, B_{N...1}, Cin and outputs S_{N...1}, Cout]

Examples (the carries row runs from Cout on the left to Cin on the right):

carries   11111      00000
A4...1     1111       1111
B4...1    +0000      +0000
S4...1     0000       1111


Propagate and Generate - Define

An n-bit adder is just a combinational circuit
Want to write s_i in “sum of products” form:
  s_i = a_i XOR b_i XOR c_i = a_i b_i’ c_i’ + a_i’ b_i c_i’ + a_i’ b_i’ c_i + a_i b_i c_i
  c_{i+1} = MAJ(a_i, b_i, c_i) = a_i b_i + a_i c_i + b_i c_i

Rewrite the carry as c_{i+1} = g_i + p_i c_i, where g_i = a_i b_i, p_i = a_i + b_i
  if g_i is true, then c_{i+1} is true, thus a carry is generated
  if p_i is true and c_i is true, then c_i is propagated

Note that the p_i equation can also be written as p_i = a_i XOR b_i, since g_i = 1 when a_i = b_i = 1 (i.e. generate occurs, so propagate is a “don’t care”)
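The claim that either definition of the propagate signal gives the same carry can be checked exhaustively. This is a small verification sketch; the function names are illustrative.

```python
from itertools import product


def majority(a, b, c):
    """Carry out written directly as the majority function."""
    return (a & b) | (a & c) | (b & c)


def carry_from_pg(a, b, c, use_xor=False):
    """Carry out written as g + p*c, with p = a OR b or p = a XOR b."""
    g = a & b
    p = (a ^ b) if use_xor else (a | b)
    return g | (p & c)


# Both forms of p agree with the majority function on all eight inputs.
for a, b, c in product((0, 1), repeat=3):
    assert carry_from_pg(a, b, c) == majority(a, b, c)
    assert carry_from_pg(a, b, c, use_xor=True) == majority(a, b, c)
print("g + p*c matches MAJ(a, b, c) for both definitions of p")
```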


Propagate and Generate - Blocks

In the general case, for any j with i < j, j+1 < k:
  c_{k+1} = G_{ik} + P_{ik} c_i
  G_{ik} = G_{j+1,k} + P_{j+1,k} G_{ij}
  P_{ik} = P_{ij} P_{j+1,k}

G_{ik} equation in words: A carry is generated out of the block consisting of bits i through k inclusive if it is generated in the high-order part of the block (j+1, k), or it is generated in the low-order (i, j) part of the block and then propagated through the high part

With the carry-in treated as bit 0: G_{0:0} = C_in, P_{0:0} = 0
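The block equations amount to one combining operator on (G, P) pairs. The sketch below is a behavioral check, not a gate-level design; bits are given least significant first and the function names are hypothetical.

```python
def bit_pg(a, b):
    """(g, p) for a single bit position, using p = a OR b."""
    return a & b, a | b


def combine(low, high):
    """Merge a low block (G_ij, P_ij) with the adjacent high block (G_{j+1,k}, P_{j+1,k})."""
    g_low, p_low = low
    g_high, p_high = high
    # Generated in the high part, or generated low and propagated through high.
    return g_high | (p_high & g_low), p_low & p_high


def block_pg(a_bits, b_bits):
    """(G, P) for a whole block of bits, least significant first."""
    acc = bit_pg(a_bits[0], b_bits[0])
    for a, b in zip(a_bits[1:], b_bits[1:]):
        acc = combine(acc, bit_pg(a, b))
    return acc


def carry_out(a_bits, b_bits, cin):
    """c_{k+1} = G_ik + P_ik * c_i for the block."""
    g, p = block_pg(a_bits, b_bits)
    return g | (p & cin)


a_bits, b_bits = [1, 1, 0, 1], [1, 0, 1, 0]  # 1011 + 0101, LSB first
print(block_pg(a_bits, b_bits), carry_out(a_bits, b_bits, 0))
```

Because `combine` is associative, the block can be split at any j, which is what lets tree-structured adders compute the group signals in parallel.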


Propagate and Generate - Lookahead

Recursively apply to eliminate carry terms:
  c_{i+1} = g_i + p_i g_{i-1} + p_i p_{i-1} g_{i-2} + … + p_i p_{i-1} … p_1 g_0 + p_i p_{i-1} … p_1 p_0 c_0

This is a carry-lookahead adder
  Note large fan-in of OR gate and last AND gate
  Too big! Build p’s and g’s in steps:
    c_1 = g_0 + p_0 c_0
    c_2 = G_{01} + P_{01} c_0
    where G_{01} = g_1 + p_1 g_0, P_{01} = p_1 p_0
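The fully expanded formula can be evaluated term by term to see both that it yields the right carries and how the number of product terms grows with i. This is an illustrative sketch with a hypothetical function name; bits are least significant first.

```python
def lookahead_carries(a_bits, b_bits, c0):
    """Return [c_1, ..., c_n] from the expanded sum-of-products carry formula."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a | b for a, b in zip(a_bits, b_bits)]
    carries = []
    for i in range(len(g)):
        # Terms g_j * p_{j+1} * ... * p_i for j = i down to 0.
        terms = []
        for j in range(i, -1, -1):
            term = g[j]
            for m in range(j + 1, i + 1):
                term &= p[m]
            terms.append(term)
        # Final term p_i * ... * p_0 * c_0.
        last = c0
        for m in range(i + 1):
            last &= p[m]
        terms.append(last)
        carry = 0
        for term in terms:  # one wide OR over i + 2 product terms
            carry |= term
        carries.append(carry)
    return carries


print(lookahead_carries([1, 1, 1, 1], [0, 0, 0, 0], 1))  # 1111 + 0000 with c0 = 1
```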


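PG Logic

[Figure: 4-bit adder drawn in three stages. 1: Bitwise PG logic takes A1...A4, B1...B4 and Cin and produces G0...G4, P0...P4. 2: Block PG logic produces the group generate signals G_{0:0}, G_{0:1}, G_{0:2}, G_{0:3}, which are the carries C0...C3. 3: Sum logic produces S1...S4. Cout is C4.]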


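Carry-Ripple Revisited

[Figure: the same three-stage PG diagram for the carry-ripple adder, with inputs A1...A4, B1...B4, Cin, group generate signals G_{0:0}...G_{0:3} (carries C0...C3), sums S1...S4 and Cout = C4; each group generate is formed from the previous one.]

G_{03} = G_3 + P_3 G_{02}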


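Carry-Skip Adder

Carry-ripple is slow through all N stages
Carry-skip allows carry to skip over groups of n bits
  Decision based on n-bit propagate signal

[Figure: 16-bit carry-skip adder built from four 4-bit adder groups (A4:1 B4:1, A8:5 B8:5, A12:9 B12:9, A16:13 B16:13) producing S4:1, S8:5, S12:9, S16:13. Each group's propagate signal (P4:1, P8:5, P12:9, P16:13) controls a 2:1 multiplexer that chooses between the group's own carry-out and the carry into the group, giving C4, C8, C12 and Cout.]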


Carry-Lookahead Adder

Carry-lookahead adder computes G_{0i} for many bits in parallel.
Uses higher-valency cells with more than two inputs.

[Figure: 16-bit carry-lookahead adder built from four 4-bit groups (A4:1 B4:1 through A16:13 B16:13). Each group produces its sum bits and group signals G4:1 P4:1, G8:5 P8:5, G12:9 P12:9, G16:13 P16:13, which form the carries C4, C8, C12 and Cout.]


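Carry-Select Adder

Trick for critical paths dependent on late input X
  Precompute two possible outputs for X = 0, 1
  Select proper output when X arrives

Carry-select adder precomputes n-bit sums
  For both possible carries into n-bit group

[Figure: 16-bit carry-select adder. The first group adds A4:1 and B4:1 with Cin to give S4:1 and C4. Each later group (A8:5 B8:5, A12:9 B12:9, A16:13 B16:13) uses two adders, one with carry-in 0 and one with carry-in 1; multiplexers driven by C4, C8 and C12 select the sums S8:5, S12:9, S16:13 and the carries C8, C12 and Cout.]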
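A behavioral sketch of the carry-select idea, assuming 4-bit groups in a 16-bit adder as in the figure. Each group's adder is modeled with Python's `+` rather than gates, and the function names are hypothetical.

```python
def add_group(a, b, cin, n):
    """n-bit group adder: return (sum, carry out)."""
    total = a + b + cin
    return total & ((1 << n) - 1), total >> n


def carry_select_add(a, b, cin=0, width=16, group=4):
    """Add two width-bit numbers group by group, selecting precomputed results."""
    mask = (1 << group) - 1
    result, carry = 0, cin
    for shift in range(0, width, group):
        ga, gb = (a >> shift) & mask, (b >> shift) & mask
        if shift == 0:
            # The first group sees the real carry-in directly.
            s, carry = add_group(ga, gb, carry, group)
        else:
            # Precompute both outcomes; the incoming carry only drives a mux.
            s0, c0 = add_group(ga, gb, 0, group)
            s1, c1 = add_group(ga, gb, 1, group)
            s, carry = (s1, c1) if carry else (s0, c0)
        result |= s << shift
    return result, carry


print(carry_select_add(0x0FFF, 0x0001))
```

In hardware the two candidate sums of every group are computed at the same time, so the delay after the first group is one multiplexer per group rather than one full group adder.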


Comparators

0’s detector: A = 00…000
1’s detector: A = 11…111
Equality comparator: A = B
Magnitude comparator: A < B


1’s & 0’s Detectors

1’s detector: N-input AND gate
0’s detector: NOTs + 1’s detector (N-input NOR)

[Figure: gate-level detectors: 1’s detectors taking A0...A7 and producing allones, and a 0’s detector taking A0...A3 and producing allzeros]


Equality Comparator

Check if each bit is equal (XNOR, aka equality gate)
1’s detect on bitwise equality

[Figure: XNOR gates compare A[0]...A[3] with B[0]...B[3]; their outputs feed a 1’s detector that produces A = B]


Magnitude Comparator

Compute B-A and look at sign
B-A = B + ~A + 1
For unsigned numbers, carry out is sign bit


Signed vs. Unsigned

For signed numbers, comparison is harder
  C: carry out
  Z: zero (all bits of A-B are 0)
  N: negative (MSB of result)
  V: overflow (inputs had different signs, output sign ≠ B)
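A sketch of how the four flags could be derived from B - A = B + ~A + 1 and then used for comparison. It assumes V is set when the inputs have different signs and the result's sign differs from B's, which is our reading of the overflow rule; the derived conditions (unsigned A ≤ B when C = 1, signed A ≤ B when N = V) are standard results rather than statements from the slide.

```python
def compare_flags(a, b, n=4):
    """Flags of the n-bit subtraction B - A, computed as B + ~A + 1."""
    mask = (1 << n) - 1
    total = b + (~a & mask) + 1
    result = total & mask
    sign_a, sign_b = a >> (n - 1), b >> (n - 1)
    neg = result >> (n - 1)
    return {
        "C": total >> n,                            # carry out
        "Z": int(result == 0),                      # all result bits are 0
        "N": neg,                                   # MSB of result
        "V": int(sign_a != sign_b and neg != sign_b),  # assumed overflow rule
    }


def to_signed(x, n=4):
    """Read an n-bit pattern as a 2's complement value."""
    return x - (1 << n) if x >> (n - 1) else x


f = compare_flags(0b0111, 0b1000)  # A = 7, B = 8 unsigned; A = +7, B = -8 signed
print(f, "unsigned A <= B:", f["C"] == 1, "signed A <= B:", f["N"] == f["V"])
```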


Shifters

Logical Shift: Shifts number left or right and fills with 0’s
  1011 LSR 1 = ____    1011 LSL 1 = ____

Arithmetic Shift: Shifts number left or right. Right shift sign extends
  1011 ASR 1 = ____    1011 ASL 1 = ____

Rotate: Shifts number left or right and fills with lost bits
  1011 ROR 1 = ____    1011 ROL 1 = ____


Shifters

Logical Shift: Shifts number left or right and fills with 0’s
  1011 LSR 1 = 0101    1011 LSL 1 = 0110

Arithmetic Shift: Shifts number left or right. Right shift sign extends
  1011 ASR 1 = 1101    1011 ASL 1 = 0110

Rotate: Shifts number left or right and fills with lost bits
  1011 ROR 1 = 1101    1011 ROL 1 = 0111
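The six one-bit shifts can be written directly on an N-bit value. This is a minimal sketch for 4-bit words with illustrative function names; it reproduces the answers above for 1011.

```python
N = 4
MASK = (1 << N) - 1


def lsr(x):
    """Logical shift right: fill the msb with 0."""
    return x >> 1


def lsl(x):
    """Logical shift left: fill the lsb with 0, drop the old msb."""
    return (x << 1) & MASK


def asr(x):
    """Arithmetic shift right: keep (sign extend) the msb."""
    return (x >> 1) | (x & (1 << (N - 1)))


def asl(x):
    """Arithmetic shift left: same bit pattern as a logical shift left."""
    return (x << 1) & MASK


def ror(x):
    """Rotate right: the lost lsb re-enters at the msb."""
    return (x >> 1) | ((x & 1) << (N - 1))


def rol(x):
    """Rotate left: the lost msb re-enters at the lsb."""
    return ((x << 1) & MASK) | (x >> (N - 1))


for op in (lsr, lsl, asr, asl, ror, rol):
    print(op.__name__, format(op(0b1011), "04b"))
```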


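Funnel Shifter

A funnel shifter can do all six types of shifts
Selects N-bit field Y from 2N-bit input
  Shift by k bits (0 ≤ k < N)

[Figure: 2N-bit input made of B and C, bit positions 2N-1 ... N-1 ... 0; the output Y is the N-bit field from bit offset to bit offset + N-1]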


Funnel Shifter Operation

Computing N-k requires an adder
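A behavioral sketch of a funnel shifter. The choice of B, C and offset for each shift type is the commonly used configuration and is an assumption here, since the operation table did not survive in the text; B is assumed to be the high half of the 2N-bit input. Left shifts use offset N-k, which is where the adder mentioned above comes in.

```python
def funnel_shift(b, c, offset, n):
    """Select the n-bit field [offset + n - 1 : offset] of the 2n-bit word {b, c}."""
    z = (b << n) | c
    return (z >> offset) & ((1 << n) - 1)


def shift(a, k, kind, n=4):
    """Perform one of six shift types on the n-bit value a by k bits (0 <= k < n)."""
    mask = (1 << n) - 1
    sign_fill = mask if (a >> (n - 1)) & 1 else 0
    # Assumed configuration: (B, C, offset) for each shift type.
    table = {
        "lsr": (0, a, k),
        "lsl": (a, 0, n - k),
        "asr": (sign_fill, a, k),
        "asl": (a, 0, n - k),
        "ror": (a, a, k),
        "rol": (a, a, n - k),
    }
    b, c, offset = table[kind]
    return funnel_shift(b, c, offset, n)


for kind in ("lsr", "lsl", "asr", "asl", "ror", "rol"):
    print(kind, format(shift(0b1011, 1, kind), "04b"))
```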


Simplified Funnel Shifter

Optimize down to 2N-1 bit input


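Funnel Shifter Design 1

N N-input multiplexers
  Use 1-of-N hot select signals for shift amount

[Figure: 4-bit example: inputs Z0...Z6; the shift amount k[1:0] and the left control pass through inverters and a decoder to produce the one-hot selects s0...s3; outputs Y0...Y3]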


Funnel Shifter Design 2

Log N stages of 2-input muxes
  No select decoding needed

[Figure: 4-bit example: inputs Z0...Z6 pass through mux stages controlled by k1 and k0 (with the left control) to outputs Y0...Y3]


Multi-input Adders

Suppose we want to add k N-bit words
  Ex: 0001 + 0111 + 1101 + 0010 = 10111

Straightforward solution: k-1 N-bit CPAs
  Large and slow

[Figure: chain of CPAs: 0001 + 0111, then + 1101 giving 10101, then + 0010 giving 10111]


Carry Save Addition

A full adder sums 3 inputs and produces 2 outputs
  Carry output has twice weight of sum output

N full adders in parallel are called carry save adder
  Produce N sums and N carry outs

[Figure: four full adders in parallel with inputs X_i, Y_i, Z_i and outputs S_i, C_i (i = 1...4), also drawn as one n-bit CSA block with inputs X_{N...1}, Y_{N...1}, Z_{N...1} and outputs S_{N...1}, C_{N...1}]


CSA Application

Use k-2 stages of CSAs
  Keep result in carry-save redundant form
Final CPA computes actual result

Example: 0001 + 0111 + 1101 + 0010

4-bit CSA:  X = 0001,   Y = 0111,  Z = 1101   gives  S = 1011,   C = 0101_
5-bit CSA:  X = 0101_,  Y = 1011,  Z = 0010   gives  S = 00011,  C = 01010_
CPA:        A = 01010_, B = 00011             gives  S = 10111

(The _ marks the empty low-order position of a carry word, which has twice the weight of its sum word.)
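The worked example can be replayed with a word-level sketch of a carry save adder: the sum word is the bitwise XOR of the three inputs and the carry word is the bitwise majority shifted left by one. The function names are illustrative, and the final CPA is modeled with Python's built-in addition.

```python
def carry_save_add(x, y, z):
    """One CSA stage: reduce three words to a sum word and a carry word."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1  # carries have twice the weight
    return s, c


def add_words(words):
    """Add k words with k-2 CSA stages followed by one carry-propagate add."""
    words = list(words)
    while len(words) > 2:
        x, y, z = words[:3]
        s, c = carry_save_add(x, y, z)
        words = [c, s] + words[3:]  # result stays in carry-save form
    return sum(words)               # final CPA


s1, c1 = carry_save_add(0b0001, 0b0111, 0b1101)
s2, c2 = carry_save_add(c1, s1, 0b0010)
print(format(s1, "04b"), format(c1, "05b"), format(s2, "05b"), format(c2, "06b"))
print(format(add_words([0b0001, 0b0111, 0b1101, 0b0010]), "05b"))
```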




Multiplication Example:

M x N-bit multiplication
  Produce N M-bit partial products
  Sum these to produce M+N-bit product

      1100 : 12 (decimal)   multiplicand
      0101 :  5 (decimal)   multiplier
      ----
      1100                  partial products
     0000
    1100
   0000
  --------
  00111100 : 60 (decimal)   product
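A short sketch of the same procedure: one partial product per multiplier bit, each shifted to its column, then summed. The function names are illustrative.

```python
def partial_products(multiplicand, multiplier, n):
    """One partial product per multiplier bit, least significant bit first."""
    return [
        (multiplicand << i) if (multiplier >> i) & 1 else 0
        for i in range(n)
    ]


def multiply(multiplicand, multiplier, n):
    """Sum the shifted partial products to form the product."""
    return sum(partial_products(multiplicand, multiplier, n))


for pp in partial_products(0b1100, 0b0101, 4):
    print(format(pp, "08b"))
print(format(multiply(0b1100, 0b0101, 4), "08b"))
```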


General Form

Multiplicand: Y = (y_{M-1}, y_{M-2}, …, y_1, y_0)
Multiplier: X = (x_{N-1}, x_{N-2}, …, x_1, x_0)
Product:

P = \left( \sum_{j=0}^{M-1} y_j 2^j \right) \left( \sum_{i=0}^{N-1} x_i 2^i \right) = \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} x_i y_j 2^{i+j}

[Figure: partial-product array for a 6 x 6 multiplication: multiplicand y5...y0, multiplier x5...x0, rows of partial products x_i y_j (first row x0y5 ... x0y0) and product bits p11...p0]


Dot Diagram

Each dot represents a bit

[Figure: dot diagram of the partial products for a 16-bit multiplier x (x0 to x15)]


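Array Multiplier

[Figure: 4 x 4 array multiplier with inputs y0...y3 and x0...x3 and outputs p0...p7: a CSA array followed by a CPA, with the critical path marked. Each CSA cell takes A, B, Sin, Cin and produces Sout, Cout; each CPA cell takes A, B, Cin and produces Sout, Cout.]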


Rectangular Array

Squash array to fit rectangular floorplan

[Figure: the 4 x 4 array multiplier redrawn as a rectangle, with inputs y0...y3 and x0...x3 and outputs p0...p7]


Fewer Partial Products

Array multiplier requires N partial products
If we looked at groups of r bits, we could form N/r partial products
  Faster and smaller?
  Called radix-2^r encoding

Ex: r = 2: look at pairs of bits
  Form partial products of 0, Y, 2Y, 3Y
  First three are easy, but 3Y requires adder


Booth Encoding

Instead of 3Y, try -Y, then increment next partial product to add 4Y
Similarly, for 2Y, try -2Y + 4Y in next partial product
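One way to make this concrete is the standard radix-4 Booth recoding, in which each digit is computed from an overlapping triple of multiplier bits as -2·x_{i+1} + x_i + x_{i-1} with x_{-1} = 0. The sketch below assumes an unsigned multiplier that is zero-extended at the top; the function names are hypothetical.

```python
def booth_radix4_digits(x, n):
    """Recode an n-bit unsigned multiplier into digits in {-2, -1, 0, 1, 2}.

    Digits are returned least significant first; digit k has weight 4**k.
    """
    padded = x << 1  # append the implicit bit x_{-1} = 0 below the lsb
    digits = []
    for i in range(0, n + 1, 2):
        triple = (padded >> i) & 0b111  # bits x_{i+1}, x_i, x_{i-1}
        hi, mid, lo = (triple >> 2) & 1, (triple >> 1) & 1, triple & 1
        digits.append(-2 * hi + mid + lo)
    return digits


def booth_multiply(y, x, n):
    """Multiply y by the n-bit unsigned x using the recoded partial products."""
    return sum(d * y * 4**k for k, d in enumerate(booth_radix4_digits(x, n)))


print(booth_radix4_digits(0b0111, 4))  # 7 = -1 + 2*4
print(booth_multiply(0b1100, 0b0101, 4))
```

Every partial product is now one of 0, ±Y or ±2Y, so no 3Y adder is needed; the cost is that partial products can be negative.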


Booth Hardware

Booth encoder generates control lines for each PP
  Booth selectors choose PP bits


Sign Extension

Partial products can be negative
  Require sign extension, which is cumbersome
  High fanout on most significant bit

[Figure: dot diagram of partial products PP0...PP8 for a 16-bit multiplier x (x0...x15, padded with x-1 = 0 below and x16 = x17 = 0 above); each partial product is extended to the left with copies of its sign bit s]


Simplified Sign Ext.

Sign bits are either all 0’s or all 1’s
  Note that all 0’s is all 1’s + 1 in proper column
  Use this to reduce loading on MSB

[Figure: partial products PP0...PP8 with each run of sign-extension bits rewritten as a run of constant 1’s bounded by s-derived bits]


Even Simpler Sign Ext.

No need to add all the 1’s in hardware
  Precompute the answer!

[Figure: partial products PP0...PP8 with the constant 1’s pre-added, leaving only a few s-derived bits and a single 1 at the left of each row]


Division - Restoring

Do the following n times:
  Shift A and Q left one bit
  Subtract M from A, put answer in A
  If the sign of A is 1, set q0 to 0 and add M back to A (restore A)
  If the sign of A is 0, set q0 to 1
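The loop can be sketched as follows. Register A is modeled with a Python signed integer instead of an (n+1)-bit 2's complement register, and the operands 1000 and 11 are chosen here only for illustration.

```python
def restoring_divide(dividend, divisor, n):
    """Divide two unsigned n-bit numbers; return (quotient, remainder)."""
    a, q, m = 0, dividend, divisor
    mask = (1 << n) - 1
    for _ in range(n):
        # Shift A and Q left one bit: the msb of Q moves into the lsb of A.
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & mask
        a -= m            # subtract M from A
        if a < 0:         # sign of A is 1
            a += m        # restore A; q0 stays 0
        else:
            q |= 1        # set q0 to 1
    return q, a


quotient, remainder = restoring_divide(0b1000, 0b11, 4)
print(format(quotient, "04b"), format(remainder, "04b"))
```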

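[Figure: circuit arrangement for binary division: an (n+1)-bit register A (a_n ... a_0) and the dividend register Q (q_{n-1} ... q_0) shift left together; the divisor register M (0, m_{n-1} ... m_0) feeds an (n+1)-bit adder with an Add/Subtract control; a control sequencer handles quotient setting of q0]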

Division – Restoring Example

[Figure: worked restoring-division trace showing the initial register contents and four cycles (first to fourth) of Shift, Subtract, Set q0 and, when needed, Restore, ending with the quotient and the remainder]

Division - Nonrestoring

Do the following n times:
  If the sign of A is 0, shift A and Q left and subtract M from A
  Else shift A and Q left and add M to A
  Now if the sign of A is 0, set q0 to 1
  Else set q0 to 0

After the n iterations:
  If the sign of A is 1, add M to A
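A sketch of the nonrestoring loop under the same modeling assumption as before: A is a Python signed integer rather than an (n+1)-bit register, and the operands 1000 and 11 are illustrative.

```python
def nonrestoring_divide(dividend, divisor, n):
    """Divide two unsigned n-bit numbers; return (quotient, remainder)."""
    a, q, m = 0, dividend, divisor
    mask = (1 << n) - 1
    for _ in range(n):
        was_negative = a < 0
        # Shift A and Q left one bit: the msb of Q moves into the lsb of A.
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & mask
        # Add M if A was negative before the shift, otherwise subtract M.
        a = a + m if was_negative else a - m
        if a >= 0:
            q |= 1        # set q0 to 1; otherwise q0 stays 0
    if a < 0:
        a += m            # final correction of the remainder
    return q, a


quotient, remainder = nonrestoring_divide(0b1000, 0b11, 4)
print(format(quotient, "04b"), format(remainder, "04b"))
```

Compared with the restoring version, each cycle performs exactly one add or subtract, and at most one extra add is needed at the very end.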


[Figure: circuit arrangement for binary division: (n+1)-bit register A, dividend register Q, divisor register M, an (n+1)-bit adder with Add/Subtract control, and a control sequencer that sets q0]

Division – Nonrestoring Example

[Figure: worked nonrestoring-division trace showing the initial register contents and four cycles (first to fourth) of Shift, Subtract or Add, and Set q0, followed by a final Add to restore the remainder; the quotient and remainder are read at the end]

Floating Point – Single Precision

IEEE-754, 854
Decimal point can “move”, hence it’s “floating”
Floating point is useful for scientific calculations
Can represent very large integers and very small fractions: ~10^(±38)

(a) Single precision: 32 bits
  S: sign of number (0 signifies +, 1 signifies -)
  E: 8-bit signed exponent in excess-127 representation
  M: 23-bit mantissa fraction
  Value represented = ±1.M × 2^(E-127)

(b) Example of a single-precision number
  S = 0, E = 00101000, M = 001010...0
  Value represented = +1.001010...0 × 2^(-87)
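The field layout can be checked by decoding the example bit pattern by hand and comparing against the machine's own interpretation. This sketch handles only normalized numbers (0 < E < 255); the function names are illustrative.

```python
import struct


def decode_single(bits):
    """Split a 32-bit pattern into (S, E, M) and evaluate 1.M x 2^(E-127)."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    value = (-1) ** sign * (1 + mantissa / 2**23) * 2.0 ** (exponent - 127)
    return sign, exponent, mantissa, value


def as_python_float(bits):
    """Let the struct module reinterpret the same 32 bits as a float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


# S = 0, E = 00101000, M = 001010 followed by zeros.
example = (0b00101000 << 23) | (0b001010 << 17)
sign, exponent, mantissa, value = decode_single(example)
print(sign, exponent - 127, value, value == as_python_float(example))
```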

Floating Point – Double Precision

Double precision can represent ~10^(±308)

64 bits
  S: sign
  E: 11-bit excess-1023 exponent
  M: 52-bit mantissa fraction
  Value represented = ±1.M × 2^(E-1023)

Floating Point

The IEEE Standard requires these operations, at a minimum:
  Add, Subtract, Multiply, Divide, Remainder, Square Root, Decimal/Binary Conversion

Special Values

E’     M      Value
0      0      ±0
255    0      ±∞
0      ≠ 0    ±0.M × 2^(-126)
255    ≠ 0    Not a Number (NaN)

Exceptions
  Underflow, Overflow, divide by 0, inexact, invalid

FP Arithmetic Operations

Add/Subtract
  Shift mantissa of smaller exponent number right by the difference in exponents
  Set the exponent of the result = the larger exponent
  Add/Sub mantissas, get sign
  Normalize

Multiply/Divide
  Add/Sub exponents, Subtract/Add 127
  Multiply/Divide mantissas, determine sign
  Normalize
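The multiply steps can be sketched on single-precision bit fields. This is a simplified illustration under stated assumptions: both operands are normalized and nonzero, the result neither overflows nor underflows, and extra product bits are simply chopped rather than rounded. The function names are hypothetical.

```python
import struct


def to_fields(x):
    """Return (S, E, M) of x stored as a single-precision number."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def from_fields(sign, exponent, mantissa):
    """Assemble (S, E, M) back into a Python float."""
    bits = (sign << 31) | (exponent << 23) | mantissa
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def fp_multiply(x, y):
    sx, ex, mx = to_fields(x)
    sy, ey, my = to_fields(y)
    sign = sx ^ sy                     # determine sign
    exponent = ex + ey - 127           # add exponents, subtract 127
    product = ((1 << 23) | mx) * ((1 << 23) | my)  # multiply 1.M mantissas
    if product >> 47:                  # product in [2, 4): normalize
        product >>= 1
        exponent += 1
    mantissa = (product >> 23) & 0x7FFFFF  # chop the extra low-order bits
    return from_fields(sign, exponent, mantissa)


print(fp_multiply(1.5, 2.5), fp_multiply(-3.0, 0.5), fp_multiply(3.0, 3.0))
```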


FP Guard Bits and Truncation

Guard bits
  Extra bits during intermediate steps to yield maximum accuracy in the final result
  They need to be removed when generating the final result

Chopping
  Simply remove guard bits

Von Neumann rounding
  If all guard bits are 0, chop; else set the LSB of the result to 1

Rounding
  Add 1 to LSB if guard MSB = 1
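The three truncation methods differ only in how the guard bits influence the retained bits. The sketch below treats the mantissa as an integer whose lowest `guard` bits are the guard bits; it follows the simple rules as stated, so ties are not treated specially and a carry out of the rounded result (which would require renormalization) is not handled. The function names are illustrative.

```python
def chop(value, guard):
    """Chopping: simply drop the guard bits."""
    return value >> guard


def von_neumann(value, guard):
    """Von Neumann rounding: if any guard bit is 1, force the LSB to 1."""
    kept = value >> guard
    if value & ((1 << guard) - 1):
        kept |= 1
    return kept


def round_nearest(value, guard):
    """Rounding: add 1 to the LSB if the most significant guard bit is 1."""
    kept = value >> guard
    if (value >> (guard - 1)) & 1:
        kept += 1
    return kept


sample = 0b010110  # retained bits 010, guard bits 110
for method in (chop, von_neumann, round_nearest):
    print(method.__name__, format(method(sample, 3), "03b"))
```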


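FP Add-Subtract Unit

[Figure: block diagram of a floating-point add/subtract unit. 32-bit operands A: S_A, E_A, M_A and B: S_B, E_B, M_B; 32-bit result R = A + B: S_R, E_R, M_R. An 8-bit subtractor computes n = |E_A - E_B| and its sign. A SWAP network sends the M of the number with the smaller E to a SHIFTER (n bits to the right) and the M of the number with the larger E directly to the mantissa adder/subtractor. A combinational CONTROL network uses S_A, S_B and the Add/Sub input to choose Add/Subtract and the result sign. A MUX selects the larger of E_A, E_B as E. A leading zeros detector produces X, the magnitude M is normalized and rounded, and a second 8-bit subtractor computes E - X.]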

The End Lecture 10
