prof. yih huang


Post on 27-Mar-2022


CS365 1

Arithmetic and Logic Unit

CS 365 Lecture 5

Prof. Yih Huang

CS365 2

Inside a Processor

[Figure: block diagram of a processor. Components: Data Cache, Instruction Cache, (Internal) Bus, Integer Arithmetic Circuits, Floating Point Arithmetic Circuits, Branch Control Logic, Registers.]

CS365 3

Arithmetic and Logic Unit (ALU)

• The part of a processor circuit that actually gets the computations done.

[Figure: ALU block with two 32-bit inputs a and b, an operation control input, and a 32-bit result output.]

CS365 4

Numbers

• Bits are just bits (no inherent meaning)

• Binary numbers (base 2) ⇒ decimal: 0 ... 2^n − 1

• ASCII codes

• Of course it gets more complicated:

– numbers are finite (overflow)

– fractions and real numbers

– negative numbers

• How do we represent negative numbers?

CS365 5

Possible Representations

Bits   Sign Magnitude   One's Complement   Two's Complement
000    +0               +0                 +0
001    +1               +1                 +1
010    +2               +2                 +2
011    +3               +3                 +3
100    -0               -3                 -4
101    -1               -2                 -3
110    -2               -1                 -2
111    -3               -0                 -1

• Most modern architectures use two's complement.
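The three encodings in the table can be checked with a short sketch. The function names below are illustrative and not part of the lecture; Python integers have no negative zero, so the two "-0" patterns print as 0.

```python
def sign_magnitude(bits: int, n: int = 3) -> int:
    """Top bit is the sign; the remaining n-1 bits are the magnitude."""
    magnitude = bits & ((1 << (n - 1)) - 1)
    return -magnitude if bits >> (n - 1) else magnitude

def ones_complement(bits: int, n: int = 3) -> int:
    """A negative value is the bitwise inverse of its magnitude."""
    if bits >> (n - 1):
        return -(~bits & ((1 << n) - 1))
    return bits

def twos_complement(bits: int, n: int = 3) -> int:
    """The top bit carries weight -2^(n-1)."""
    return bits - (1 << n) if bits >> (n - 1) else bits

# Print one row per 3-bit pattern, in the same column order as the table.
for pattern in range(8):
    print(f"{pattern:03b}",
          sign_magnitude(pattern),
          ones_complement(pattern),
          twos_complement(pattern))
```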

CS365 6

Two’s Complement Numbers

• 0010 =

• 1010 =

• -10 in 8-bit two's complement =

Bit:      X3     X2    X1    X0
Weight:   −2^3   2^2   2^1   2^0
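A minimal sketch of the weighted-sum reading of a two's complement pattern, where the leftmost bit has a negative weight. The helper names `twos_complement_value` and `encode` are hypothetical, and the sample values are chosen for illustration rather than taken from the slide.

```python
def twos_complement_value(bits: str) -> int:
    """Sum the bit weights; the leftmost bit has weight -2^(n-1)."""
    n = len(bits)
    value = -int(bits[0]) * 2 ** (n - 1)
    for position, bit in enumerate(bits[1:], start=1):
        value += int(bit) * 2 ** (n - 1 - position)
    return value

def encode(value: int, n: int) -> str:
    """n-bit two's complement pattern of value (assumed to be in range)."""
    return format(value & ((1 << n) - 1), f"0{n}b")

print(twos_complement_value("0111"))  # 7
print(twos_complement_value("1001"))  # -7
print(encode(-3, 8))                  # 11111101
```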

CS365 7

32-bit Signed Numbers

Two's Complement                              Decimal
0000 0000 0000 0000 0000 0000 0000 0000  =  0
0000 0000 0000 0000 0000 0000 0000 0001  =  +1
0000 0000 0000 0000 0000 0000 0000 0010  =  +2
...
0111 1111 1111 1111 1111 1111 1111 1110  =  +2,147,483,646
0111 1111 1111 1111 1111 1111 1111 1111  =  +2,147,483,647
1000 0000 0000 0000 0000 0000 0000 0000  =  –2,147,483,648
1000 0000 0000 0000 0000 0000 0000 0001  =  –2,147,483,647
1000 0000 0000 0000 0000 0000 0000 0010  =  –2,147,483,646
...
1111 1111 1111 1111 1111 1111 1111 1101  =  –3
1111 1111 1111 1111 1111 1111 1111 1110  =  –2
1111 1111 1111 1111 1111 1111 1111 1111  =  –1

CS365 8

Two's Complement Operations

• Negating a two's complement number: invert all bits and add 1

– remember: “negate” and “invert” are different!

• Exercises (in 6 bits)

– Negate 12

– Negate -5
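The invert-and-add-1 rule can be sketched as follows. The function name `negate` and the sample value 3 are illustrative assumptions; the width is passed explicitly because the result depends on it.

```python
def negate(bits: int, n: int) -> int:
    """Negate an n-bit two's complement pattern: invert all bits, then add 1."""
    mask = (1 << n) - 1
    inverted = ~bits & mask        # "invert": flip every bit
    return (inverted + 1) & mask   # "negate": inverted pattern plus one

pattern = 0b000011                 # +3 in 6 bits
print(format(negate(pattern, 6), "06b"))             # 111101, i.e. -3
print(format(negate(negate(pattern, 6), 6), "06b"))  # 000011 again
```

Negating twice returns the original pattern, which is a quick sanity check for any width.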

CS365 9

Sign Extensions

• MIPS 16-bit immediate gets converted to 32 bits for arithmetic

• Copy the most significant bit (the sign bit) into the other bits

4-bit number     8-bit equivalent
0010          ⇒  0000 0010
1010          ⇒  1111 1010
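A small sketch of sign extension on raw bit patterns. The name `sign_extend` and its parameters are assumptions made for illustration.

```python
def sign_extend(bits: int, from_bits: int, to_bits: int) -> int:
    """Widen a two's complement pattern by copying its sign bit upward."""
    sign = (bits >> (from_bits - 1)) & 1
    if sign:
        # Set every new upper bit to 1 (a copy of the sign bit).
        bits |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return bits

print(format(sign_extend(0b0010, 4, 8), "08b"))   # 00000010
print(format(sign_extend(0b1010, 4, 8), "08b"))   # 11111010
print(hex(sign_extend(0xFFFE, 16, 32)))           # 0xfffffffe
```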

CS365 10

Additions & Subtractions

• Just like regular binary numbers

  0010      1111      1111
+ 0110    + 0001    + 1111

  0010      1111      1111
− 0110    − 0001    − 1111

CS365 11

Overflows

• Result too large to store in a finite-size computer word

– e.g., adding two n-bit numbers does not always yield an n-bit number

• Whether overflow occurs depends on the kind of numbers you have in mind: signed or unsigned

  0010      1000
+ 0110    − 0001

CS365 12

Detecting Overflow

• No overflow when adding a positive and a negative number

• No overflow when signs are the same for subtraction

• Overflow occurs when the value affects the sign:

Operation   A     B     Result
A + B       > 0   > 0   < 0
A + B       < 0   < 0   > 0
A − B       > 0   < 0   < 0
A − B       < 0   > 0   > 0
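The sign rules for overflow can be sketched on n-bit patterns. The function `signed_op` is a hypothetical helper; it treats zero as non-negative (sign bit 0), which is an assumption about how the ">0" rows apply to zero.

```python
def signed_op(a: int, b: int, n: int = 4, subtract: bool = False):
    """Return (result bits, overflow flag) for n-bit two's complement a+b or a-b."""
    mask = (1 << n) - 1

    def sign(x: int) -> int:
        return (x >> (n - 1)) & 1

    result = (a - b if subtract else a + b) & mask
    if subtract:
        # A - B overflows only when the signs differ and the result's sign is not A's.
        overflow = sign(a) != sign(b) and sign(result) != sign(a)
    else:
        # A + B overflows only when the signs match and the result's sign flips.
        overflow = sign(a) == sign(b) and sign(result) != sign(a)
    return result, overflow

print(signed_op(0b0010, 0b0110))                 # (8, True):  2 + 6 does not fit in 4 bits
print(signed_op(0b1000, 0b0001, subtract=True))  # (7, True):  -8 - 1 does not fit
print(signed_op(0b1111, 0b0001))                 # (0, False): -1 + 1 = 0
```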

CS365 13

Effects of Overflow

• Architecture and case dependent

• Solution 1: just remember it and leave the handling to software.

– The condition/flag register of IA32

• Solution 2: exception/interrupt

– Control jumps to a predefined address for the exception

– The interrupted address is saved for possible resumption

– Used by MIPS

CS365 14

Discussion

• IA32 provides an adc (add with carry) instruction. What is its use?
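One common use of an add-with-carry instruction is multi-word arithmetic: the carry out of the low words is fed into the addition of the high words. The sketch below models this with 32-bit words; the helper names `add_words` and `add64` are assumptions, not IA32 mnemonics.

```python
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1

def add_words(a: int, b: int, carry_in: int = 0):
    """Add two 32-bit words plus a carry; return (sum word, carry out)."""
    total = a + b + carry_in
    return total & WORD_MASK, total >> WORD_BITS

def add64(a_hi: int, a_lo: int, b_hi: int, b_lo: int):
    """64-bit addition from two 32-bit additions."""
    lo, carry = add_words(a_lo, b_lo)       # plain add on the low words
    hi, _ = add_words(a_hi, b_hi, carry)    # add-with-carry on the high words
    return hi, lo

print(add64(0, 0xFFFFFFFF, 0, 1))   # (1, 0): the carry moves into the high word
```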

CS365 15

Review: Boolean Algebra & Gates

• Problem: Consider a logic function with three inputs: A, B, and C.

– Output D is true if at least one input is true

– Output E is true if exactly two inputs are true

– Output F is true only if all three inputs are true

• Show the truth table for these three functions.

• Show the Boolean equations for the three functions.

• Show an implementation consisting of inverters, AND, and OR gates
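A sketch that enumerates the truth table and checks one possible set of sum-of-products equations against the word definitions. The function names are illustrative, and F is read here as "true exactly when all three inputs are true", which is an assumption about the problem statement.

```python
from itertools import product

def by_definition(a: int, b: int, c: int):
    """D, E, F computed directly from the word descriptions."""
    ones = a + b + c
    return int(ones >= 1), int(ones == 2), int(ones == 3)

def by_equations(a: int, b: int, c: int):
    """D, E, F as Boolean equations using only NOT (1 - x), AND (&), OR (|)."""
    d = a | b | c
    e = (a & b & (1 - c)) | (a & (1 - b) & c) | ((1 - a) & b & c)
    f = a & b & c
    return d, e, f

print("A B C | D E F")
for a, b, c in product((0, 1), repeat=3):
    d, e, f = by_equations(a, b, c)
    print(a, b, c, "|", d, e, f)
```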

CS365 16

Design An Overflow Detector

• Inputs: SA (sign of A), SB (sign of B), OP (operation, 0 for add, 1 for sub).

• Output: OF = 0 for no overflow, 1 for overflow

• Truth Table:

OP   SA   SB   OF
0    0    0
0    0    1
0    1    0
0    1    1
1    0    0
1    0    1
1    1    0
1    1    1

• Boolean equation for OF.

• A circuit design of OF according to the equation above.

CS365 17

Review: The Multiplexer

• Selects one of the inputs to be the output, based on a control input

• Note: we call this a 2-input multiplexer even though it actually has three inputs

[Figure: multiplexer with data inputs A and B, a Select control input, and one Output.]

CS365 18

More Inputs

• The general case: an N-input multiplexer needs log2 N select lines.

• You should be able to design its logic circuit.

[Figure: multiplexer with data inputs A, B, C, and D, a 2-bit Select input, and one Output.]
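A gate-level sketch of the multiplexer. It assumes select = 0 picks the first input, which the slides do not state, and builds the 4-input version from three 2-input ones as one possible design.

```python
def mux2(a: int, b: int, select: int) -> int:
    """2-input multiplexer as gates: (a AND NOT select) OR (b AND select)."""
    return (a & (1 - select)) | (b & select)

def mux4(a: int, b: int, c: int, d: int, s1: int, s0: int) -> int:
    """4-input multiplexer; the 2-bit select s1 s0 picks a, b, c, or d."""
    return mux2(mux2(a, b, s0), mux2(c, d, s0), s1)

# Route each input in turn to the output (inputs a..d = 1, 0, 0, 1).
for s1 in (0, 1):
    for s0 in (0, 1):
        print(s1, s0, mux4(1, 0, 0, 1, s1, s0))
```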

CS365 19

Second Exercise

• Let us build a one-bit ALU to support addition and logical or.

– Operation: 0 for add, 1 for or

[Figure: ALU block with inputs a and b, an operation control input, and a result output.]

CS365 20

Solution

• Truth table

• Sum of products

CS365 21

Supporting MIPS Logic Instructions

• MIPS provides bit-wise and, or, xor, and nor instructions.

• The operation input (3 bits) determines the output.

[Figure: ALU block with inputs a and b, a 3-bit operation control input, and a result output.]

CS365 22

32-bit ALU

• Both inputs A and B are 32 bits wide.

– Size of the truth table?

• Rather, we will just cascade 32 1-bit ALUs.

– How about carries?

– We need to refine the spec of the 1-bit ALU

CS365 23

Two Solutions

• Truth table and sum of products

• Use a multiplexer

CS365 24

1-bit Adder

• How could we build a 1-bit ALU for add, and, or?

• How could we build a 32-bit ALU?

[Figure: 1-bit adder (+) with inputs A, B, and Cin and outputs Sum and Cout.]

Cout = AB + ACin + BCin

Sum = A xor B xor Cin
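The two adder equations translate directly into a sketch of a full adder. The function name `full_adder` is an assumption; the loop compares the gate outputs against ordinary integer addition.

```python
def full_adder(a: int, b: int, cin: int):
    """1-bit adder: Sum = A xor B xor Cin, Cout = AB + ACin + BCin."""
    total = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return total, cout

# Check every input combination against a + b + cin.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            assert 2 * cout + s == a + b + cin
            print(a, b, cin, "->", cout, s)
```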

CS365 25

Building a 32-bit ALU

[Figure: a 1-bit ALU with inputs a, b, and CarryIn, a multiplexer with inputs 0 to 2 controlled by Operation, and outputs Result and CarryOut; and a 32-bit ALU built by chaining ALU0, ALU1, ALU2, ..., ALU31, where the CarryOut of each stage feeds the CarryIn of the next, Operation goes to every stage, and stage i takes ai and bi and produces Resulti.]
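A sketch of the cascade: thirty-two 1-bit ALUs with each carry out feeding the next carry in. The operation numbering (0 = and, 1 = or, 2 = add) is an assumption about the multiplexer inputs, and the function names are illustrative.

```python
def alu1(a: int, b: int, carry_in: int, operation: int):
    """1-bit ALU: compute and, or, and sum together, then select one result."""
    total = a + b + carry_in
    results = (a & b, a | b, total & 1)   # assumed mux order: and, or, add
    return results[operation], total >> 1

def alu32(a: int, b: int, operation: int, carry_in: int = 0):
    """Ripple through 32 one-bit ALUs, from bit 0 up to bit 31."""
    result, carry = 0, carry_in
    for i in range(32):
        bit, carry = alu1((a >> i) & 1, (b >> i) & 1, carry, operation)
        result |= bit << i
    return result, carry

print(alu32(123456, 654321, 2))    # (777777, 0)
print(alu32(0xFFFFFFFF, 1, 2))     # (0, 1): the carry ripples out of bit 31
```

Note that the adder in every stage runs even for and/or; the multiplexer simply ignores its sum.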

CS365 26

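What about subtraction (a – b)?

• Two's complement approach: negate b and add.

• How do we negate?

• A clever solution:

[Figure: 1-bit ALU with a Binvert control. A 2-way multiplexer (inputs 0 and 1) selects b or its inverse; the other signals are a, CarryIn, Operation, Result, and CarryOut, with a result multiplexer whose inputs are numbered 0 to 2.]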
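A sketch of subtraction by reusing the adder, assuming the usual arrangement in which Binvert selects the inverted b and the "+1" of negation is supplied as the carry into bit 0. The function name `add_sub` is hypothetical.

```python
MASK = 0xFFFFFFFF  # 32-bit words

def add_sub(a: int, b: int, binvert: int) -> int:
    """binvert = 0: a + b; binvert = 1: a + (not b) + 1 = a - b, modulo 2^32."""
    operand = (~b & MASK) if binvert else b
    carry_in = binvert   # assumed: the +1 enters as the carry into bit 0
    return (a + operand + carry_in) & MASK

print(add_sub(7, 5, 1))        # 2
print(hex(add_sub(5, 7, 1)))   # 0xfffffffe, i.e. -2 in two's complement
print(add_sub(5, 7, 0))        # 12
```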

CS365 27

Tailoring the ALU to the MIPS

• Need to support the set-on-less-than instruction (slt)

– remember: slt is an arithmetic instruction

– produces a 1 if rs < rt and 0 otherwise

– use subtraction: (a-b) < 0 implies a < b

• Need to support test for equality (beq $t5, $t6, offset)

– use subtraction: (a-b) = 0 implies a = b

CS365 28

Supporting slt

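[Figure: (a) 1-bit ALU with inputs a, b, CarryIn, and Less, controls Binvert and Operation, a result multiplexer with inputs 0 to 3, and outputs Result and CarryOut. (b) 1-bit ALU for the most significant bit, with the same inputs and controls, which additionally outputs Set and contains overflow detection driving an Overflow output.]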

CS365 29

[Figure: 32-bit ALU built from ALU0, ALU1, ALU2, ..., ALU31. Binvert, CarryIn, and Operation are the control inputs; the CarryOut of each stage feeds the CarryIn of the next; the Less inputs of ALU1 to ALU31 are 0; the Set output of ALU31 feeds the Less input of ALU0; ALU31 also outputs Overflow. Stage i takes ai and bi and produces Resulti.]

CS365 30

Test for equality

• Notice control lines:

000 = and
001 = or
010 = add
110 = subtract
111 = slt

• Output Zero = 1 when the result is 0.

[Figure: 32-bit ALU built from ALU0, ALU1, ALU2, ..., ALU31 with control inputs Bnegate and Operation. The CarryOut of each stage feeds the CarryIn of the next; the Less inputs of ALU1 to ALU31 are 0; Set from ALU31 feeds the Less input of ALU0; outputs are Result0 to Result31, Overflow, and Zero.]
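A behavioral sketch of the ALU with the control codes listed above. It works on whole 32-bit words rather than on gates, and its slt takes the sign bit of a − b exactly as described in the lecture, so it ignores overflow. The function name `alu` is an assumption.

```python
MASK = 0xFFFFFFFF  # 32-bit words

def alu(a: int, b: int, control: int):
    """Return (result, zero) for the 3-bit control codes on the slide."""
    if control == 0b000:
        result = a & b
    elif control == 0b001:
        result = a | b
    elif control == 0b010:
        result = (a + b) & MASK
    elif control == 0b110:
        result = (a - b) & MASK
    elif control == 0b111:
        # slt: the sign bit of a - b becomes bit 0 of the result.
        result = ((a - b) & MASK) >> 31
    else:
        raise ValueError("unsupported control code")
    return result, int(result == 0)

print(alu(5, 5, 0b110))   # (0, 1): equal operands set Zero, as beq needs
print(alu(3, 5, 0b111))   # (1, 0): 3 < 5
print(alu(5, 3, 0b111))   # (0, 1): 5 is not less than 3
```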

CS365 31

• Important points about hardware

– all of the gates are always working

– the speed of a gate is affected by the number of inputs to the gate

– the speed of a circuit is affected by the number of gates in series (on the “critical path” or the “deepest level of logic”)

– What is the critical path in our 32-bit ALU?

CS365 32

Ripple Carry Adder Is Slow

• Logic circuit speed is determined by the number of gates a signal has to pass through in the worst case.

• Assuming each 1-bit ALU adds a delay of x gates, what is the delay of a 32-bit ALU?

CS365 33

Carry Look Ahead

A   B   Cout
0   0   0     “kill”
0   1   Cin   “propagate”
1   0   Cin   “propagate”
1   1   1     “generate”

G = A and B
P = A xor B

[Figure: four 1-bit stages with inputs A0 B0, A1 B1, A2 B2, A3 B3 and Cin; each stage produces S, G, and P. The block as a whole outputs G and P.]

C1 = G0 + C0 P0

C2 = G1 + G0 P1 + C0 P0 P1

C3 = G2 + G1 P2 + G0 P1 P2 + C0 P0 P1 P2

C4 = . . .
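The lookahead equations for C1 to C3 can be sketched and compared with carries computed one stage at a time. The function names are illustrative; bit lists are indexed with position 0 as the least significant bit, and C0 is the carry into the block.

```python
def lookahead_carries(a_bits, b_bits, c0):
    """Return [C1, C2, C3], each computed directly from G, P, and C0."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate
    c1 = g[0] | (c0 & p[0])
    c2 = g[1] | (g[0] & p[1]) | (c0 & p[0] & p[1])
    c3 = g[2] | (g[1] & p[2]) | (g[0] & p[1] & p[2]) | (c0 & p[0] & p[1] & p[2])
    return [c1, c2, c3]

def ripple_carries(a_bits, b_bits, c0):
    """Return [C1, C2, C3] by passing the carry through one stage at a time."""
    carries, c = [], c0
    for a, b in zip(a_bits[:3], b_bits[:3]):
        c = (a & b) | (a & c) | (b & c)
        carries.append(c)
    return carries

a = [1, 1, 0, 1]   # A0..A3
b = [1, 0, 1, 0]   # B0..B3
print(lookahead_carries(a, b, 0), ripple_carries(a, b, 0))
```

Both functions give the same carries; the difference in hardware is that the lookahead form needs only a fixed number of gate levels instead of one adder delay per bit.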

CS365 34

16-Bit Adder

[Figure: 16-bit adder built from four 4-bit ALUs. ALU0 takes a0 b0 to a3 b3 and produces Result0–3; ALU1 takes a4 b4 to a7 b7 and produces Result4–7; ALU2 takes a8 b8 to a11 b11 and produces Result8–11; ALU3 takes a12 b12 to a15 b15 and produces Result12–15. Each ALUi sends Pi and Gi (labeled pi, gi through pi+3, gi+3) to a carry-lookahead unit, which receives CarryIn and produces C1, C2, C3 (labeled ci+1 to ci+3) as the CarryIn of ALU1 to ALU3 and C4 (ci+4) as CarryOut.]

CS365 35

Exercise

A   B   Cout
0   0   0     “kill”
0   1   Cin   “propagate”
1   0   Cin   “propagate”
1   1   1     “generate”

G = A and B
P = A xor B

[Figure: four 1-bit stages A0 to A3 with Cin; each stage produces S, G, and P.]

C1 =

C2 =

C3 =

G =

C4 = . . .

P =

CS365 36

Multiplications

• N bits × N bits → 2N-bit result

• Paper and pencil example (unsigned):

Multiplicand: 0 0 1 1

Multiplier: 0 1 0 1

CS365 37

Unsigned Combinational Multiplier

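• Stage i accumulates A × 2^i if Bi == 1

[Figure: combinational multiplier for 4-bit operands. Each of four stages takes A3 A2 A1 A0 gated by one multiplier bit (B0, B1, B2, B3); the first stage's other input is 0 0 0 0; the outputs are the product bits P7 to P0.]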

CS365 38

Discussions

• Multiplication is expensive

• A combinational multiplier uses a great deal of silicon

– 32 32-bit adders needed

• We will discuss designs that are slower but less silicon-demanding.

– Due to its complexity, we will first present a basic but suboptimal design, and refine it twice.

CS365 39

Shift and Add

• One step per clock tick; n clock cycles needed for an n-bit multiplication

[Figure: the same four multiplier stages (A3 A2 A1 A0 gated by B0 to B3, producing P7 to P0), with one stage performed per clock tick.]

CS365 40

Example: 0101 × 0011

[Table with columns: Product, Multiplicand, To add or not to add, Multiplier]

CS365 41

Unsigned Shift-Add Multiplier

• 64-bit Multiplicand reg, 64-bit ALU, 64-bit Product reg, 32-bit Multiplier reg

[Figure: datapath with a 64-bit Multiplicand register (Shift Left), a 32-bit Multiplier register (Shift Right), a 64-bit ALU, a 64-bit Product register (Write), and a Control unit.]

Multiplier = datapath + control
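A sketch of this first design: each clock tick tests the low multiplier bit, conditionally adds the multiplicand to the product, then shifts the multiplicand left and the multiplier right. The function name `multiply_v1` and the parameter `n` (operand width) are assumptions for illustration.

```python
def multiply_v1(multiplicand: int, multiplier: int, n: int = 32) -> int:
    """Unsigned shift-add multiply with a 2n-bit multiplicand and product."""
    product_mask = (1 << (2 * n)) - 1
    product = 0
    for _ in range(n):                 # one iteration per clock tick
        if multiplier & 1:             # to add or not to add
            product = (product + multiplicand) & product_mask
        multiplicand <<= 1             # Multiplicand register: Shift Left
        multiplier >>= 1               # Multiplier register: Shift Right
    return product

print(format(multiply_v1(0b0011, 0b0101, n=4), "08b"))   # 00001111 (3 x 5 = 15)
```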

CS365 42

Observations

• Half of the bits in the multiplicand are always 0

– a 64-bit adder is a waste

• Improvement:

– Use a 32-bit multiplicand

– Don't shift the multiplicand left; shift the product right instead

CS365 43

Example: 0101 × 0011

[Table with columns: Product, Multiplicand, To add or not to add, Multiplier]

CS365 44

Shift-Add Multiplier Version 2

• 32-bit Multiplicand reg, 32-bit ALU, 64-bit Product reg, 32-bit Multiplier reg

[Figure: datapath with a 32-bit Multiplicand register, a 32-bit Multiplier register (Shift Right), a 32-bit ALU, a 64-bit Product register (Shift Right, Write), and a Control unit.]

CS365 45

A Second Example

Product      Multiplier   Multiplicand
0000 0000    0011         0010
0010 0000
0001 0000    0001         0010
0011 0000    0001         0010
0001 1000    0000         0010
0000 1100    0000         0010
0000 0110    0000         0010
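The trace above can be reproduced with a sketch of the second design, where the multiplicand is added into the upper half of the product and the product is shifted right. The function name `multiply_v2` is hypothetical, and the Python integer is assumed to keep the adder's carry out until the shift brings it back into range.

```python
def multiply_v2(multiplicand: int, multiplier: int, n: int = 4):
    """Return (product, list of product register values after each add or shift)."""
    product = 0
    trace = [product]
    for _ in range(n):
        if multiplier & 1:
            product += multiplicand << n   # add into the upper half
            trace.append(product)
        product >>= 1                      # Product register: Shift Right
        multiplier >>= 1                   # Multiplier register: Shift Right
        trace.append(product)
    return product, trace

product, trace = multiply_v2(0b0010, 0b0011)
for value in trace:
    print(format(value, "08b"))
```

With multiplicand 0010 and multiplier 0011 the printed values are the seven Product entries of the table, ending in 00000110.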

CS365 46

Observations

• Product register wastes space that exactly matches the size of the multiplier

• Improvement: combine the Multiplier register and the Product register

CS365 47

Example: 0101 × 0011

[Table with columns: Product, Multiplicand, To add or not to add, Multiplier]

CS365 48

A Second Example

Multiplicand:              0 1 0 0 1

Initial Product:           0 0 0 0 0 1 0 0 1 1

Product after 1st shift:

Product after 2nd shift:

Product after 3rd shift:

Product after 4th shift:

Product after 5th shift:

CS365 49

Multiplier Hardware Version 3

• 32-bit Multiplicand reg, 32-bit ALU, 64-bit Product reg, (0-bit Multiplier reg)

[Figure: datapath with a 32-bit Multiplicand register, a 32-bit ALU, a 64-bit Product register that also holds the Multiplier (Shift Right, Write), and a Control unit.]
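A sketch of the third design, in which the multiplier starts in the lower half of the product register and is shifted out as the product is shifted in. The name `multiply_v3` is an assumption, and as before the Python integer is assumed to hold the adder's carry out until the next shift.

```python
def multiply_v3(multiplicand: int, multiplier: int, n: int = 32) -> int:
    """Unsigned shift-add multiply with a combined Product/Multiplier register."""
    product = multiplier                   # lower half starts as the multiplier
    for _ in range(n):
        if product & 1:                    # low bit is the current multiplier bit
            product += multiplicand << n   # add into the upper half
        product >>= 1                      # one Shift Right serves both roles
    return product

print(format(multiply_v3(0b0011, 0b0101, n=4), "08b"))   # 00001111 (3 x 5 = 15)
```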

CS365 50

Discussions

• Can you see where the MIPS Hi and Lo registers come from?

• Can you see the special hardware associated with IA32 EAX and EDX?