Chapter 4: Arithmetic for Computers (Part 1)
CS 447, Jason Bakos

Page 1:

Chapter 4: Arithmetic for Computers (Part 1)

CS 447, Jason Bakos

Page 2:

Notes on Project 1

• There are two different ways the following two words can be stored in computer memory…
  – word1: .byte 0,1,2,3
  – word2: .half 0,1
• One way is big-endian, where the word is stored in memory in its original order…
  – word1: 00 01 02 03
  – word2: 0000 0001
• Another way is little-endian, where the word is stored in memory in reverse order…
  – word1: 03 02 01 00
  – word2: 0001 0000
• Of course, this affects the way in which the lw instruction works…

Page 3:

Notes on Project 1

• MIPS (under SPIM) uses the byte ordering of the architecture underneath it
  – Intel uses little-endian, so we need to deal with that
  – This affects assignment 1 because the input data is stored as a series of bytes
  – If you use lw’s on your data set, the values will be loaded into your dest. register in reverse order
  – Hint: Try the lb/sb instructions
    • These instructions load/store a byte from an unaligned address and perform the translation for you
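For instance, here is a small C sketch (not part of the project code, just an illustration) of how the same four bytes read back as a word on a little-endian versus a big-endian host:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        unsigned char word1[4] = {0, 1, 2, 3};   /* like word1: .byte 0,1,2,3 */
        uint32_t w;
        memcpy(&w, word1, sizeof w);             /* read the 4 bytes as one word (like lw) */
        /* Prints 0x03020100 on a little-endian host (e.g., Intel);
           a big-endian host would print 0x00010203. */
        printf("0x%08X\n", (unsigned)w);
        return 0;
    }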

Page 4:

Notes on Project 1

• Hint: Use SPIM’s breakpoint and single-step features to help debug your program
  – Also, make sure you use the registers and memory/stack displays
• Hint: You may want to temporarily store your input set into a word array for sorting
• Make sure you check Appendix A for additional useful instructions that I didn’t cover in class
• Make sure you comment your code!

Page 5:

Goals of Chapter 4

• Data representation
• Hardware mechanisms for performing arithmetic on data
• Hardware implications on the instruction set design

Page 6:

Review of Binary Representation

• Binary/Hex -> Decimal conversion
• Decimal -> Binary/Hex conversion
• Least/Most significant bits
• Highest representable number / maximum number of unique representable symbols
• Two’s complement representation
  – One’s complement
  – Finding signed number ranges (-2^(n-1) to 2^(n-1) - 1)
  – Doing arithmetic with two’s complement
• Sign extending with load half/byte
  – Unsigned loads
• Signed/unsigned comparison

Page 7:

Binary Addition/Subtraction

• Binary subtraction works exactly like addition, except the second operand is first replaced by its two’s complement

• Overflow in signed arithmetic occurs under the following conditions:

  Operation   Operand A   Operand B   Result
  A+B         Positive    Positive    Negative
  A+B         Negative    Negative    Positive
  A-B         Positive    Negative    Negative
  A-B         Negative    Positive    Positive
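The same conditions can be checked in software; a minimal C sketch (illustrative only, not from the slides):

    #include <stdint.h>

    /* Overflow: the operands' signs agree (add) or disagree (subtract),
       and the result's sign differs from operand A's sign. */
    int add_overflows(int32_t a, int32_t b) {
        int32_t r = (int32_t)((uint32_t)a + (uint32_t)b);   /* wraparound add */
        return ((a < 0) == (b < 0)) && ((r < 0) != (a < 0));
    }

    int sub_overflows(int32_t a, int32_t b) {
        int32_t r = (int32_t)((uint32_t)a - (uint32_t)b);   /* wraparound subtract */
        return ((a < 0) != (b < 0)) && ((r < 0) != (a < 0));
    }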

Page 8:

What Happens When Overflow Occurs?

• MIPS detects overflow with an exception/interrupt

• When an interrupt occurs, a branch occurs to code in the kernel at address 0x80000080, where special registers (BadVAddr, Status, Cause, and EPC) are used to handle the interrupt

• SPIM has a simple interrupt handler built-in that deals with interrupts

• We may come back to interrupts later

Page 9:

Review of Shift and Logical Operations

• MIPS has operations for SLL, SRL, and SRA
  – We covered these in the last chapter
• MIPS implements bit-wise AND, OR, and XOR logical operations
  – These operations perform a bit-by-bit parallel logical operation on two registers
  – In C, use << and >> for shifts, and &, |, ^, and ~ for bitwise AND, OR, XOR, and NOT, respectively
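A quick C illustration of those operators (the values are chosen only to make the bit patterns easy to see):

    #include <stdio.h>

    int main(void) {
        unsigned int x = 0x000000F0;
        printf("%08X\n", x << 4);    /* shift left  (like sll): 00000F00 */
        printf("%08X\n", x >> 4);    /* shift right (like srl on an unsigned value): 0000000F */
        printf("%08X\n", x & 0xFF);  /* bitwise AND: 000000F0 */
        printf("%08X\n", x | 0x0F);  /* bitwise OR:  000000FF */
        printf("%08X\n", x ^ 0xFF);  /* bitwise XOR: 0000000F */
        printf("%08X\n", ~x);        /* bitwise NOT: FFFFFF0F */
        return 0;
    }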

Page 10:

Review of Logic Operations

• The three main parts of a CPU
  – ALU (Arithmetic and Logic Unit)
    • Performs all logical, arithmetic, and shift operations
  – CU (Control Unit)
    • Controls the CPU – performs load/store, branch, and instruction fetch
  – Registers
    • Physical storage locations for data

Page 11:

Review of Logic Operations

• In this chapter, our goal is to learn how the ALU is implemented
• The ALU is entirely constructed using boolean functions as hardware building blocks
  – The 3 basic digital logic building blocks can be used to construct any digital logic system: AND, OR, and NOT
  – These functions can be directly implemented using electric circuits (wires and transistors)

Page 12:

Review of Logic Operations

• These “combinational” logic devices can be assembled to create a much more complex digital logic system

  A B | A AND B
  0 0 |    0
  0 1 |    0
  1 0 |    0
  1 1 |    1

  A B | A OR B
  0 0 |   0
  0 1 |   1
  1 0 |   1
  1 1 |   1

  A | NOT A
  0 |   1
  1 |   0

Page 13:

Review of Logic Operations

• We need another device to build an ALU…

• This is called a multiplexor… it implements an if-then-else in hardware

  A B D | C (out)
  0 0 0 | 0 (a)
  0 0 1 | 0 (b)
  0 1 0 | 0 (a)
  0 1 1 | 1 (b)
  1 0 0 | 1 (a)
  1 0 1 | 0 (b)
  1 1 0 | 1 (a)
  1 1 1 | 1 (b)

Page 14:

A 1-bit ALU

• Perform the logic operations in parallel and mux the outputs
• Next, we want to include addition, so let’s build a single-bit adder
  – Called a full adder

Page 15:

Full Adder

• From the following table, we can construct the circuit for a full adder and link multiple full adders together to form a multi-bit adder
  – We can also add this adder to our ALU
  – How do we give subtraction ability to our adder?
  – How do we detect overflow and zero results?

  Inputs          Outputs
  A B CarryIn | CarryOut Sum | Comments
  0 0 0       |    0      0  | 0+0+0 = 00
  0 0 1       |    0      1  | 0+0+1 = 01
  0 1 0       |    0      1  | 0+1+0 = 01
  0 1 1       |    1      0  | 0+1+1 = 10
  1 0 0       |    0      1  | 1+0+0 = 01
  1 0 1       |    1      0  | 1+0+1 = 10
  1 1 0       |    1      0  | 1+1+0 = 10
  1 1 1       |    1      1  | 1+1+1 = 11
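A one-bit full adder written directly from this table (a C sketch; the XOR form of Sum is equivalent to its sum-of-products form):

    /* One-bit full adder: a, b, and cin are each 0 or 1. */
    void full_adder(unsigned a, unsigned b, unsigned cin,
                    unsigned *sum, unsigned *carry_out) {
        *sum       = (a ^ b ^ cin) & 1u;
        *carry_out = ((a & b) | (a & cin) | (b & cin)) & 1u;
    }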

Page 16:

Chapter 4: Arithmetic for Computers (Part 2)

CS 447, Jason Bakos

Page 17:

Logic/Arithmetic

• From the truth table for the mux, we can use sum-of-products to derive the logic equation
  – With sum-of-products, for each ‘1’ row of each output, we AND together all the inputs (inverting the inputs that are 0), then OR all the row products together
• To make it simpler, let’s add “don’t cares” to the table…

Page 18:

Logic/Arithmetic

• This gives us the following equation:
  – (A and (not D)) or (B and D)
  – We don’t need the “don’t care” inputs in our product terms
  – This is one way to simplify our logic equation
• Other ways include propositional calculus, Karnaugh Maps, and the Quine-McCluskey algorithm

  A B D | C (out)
  0 X 0 | 0 (a)
  X 0 1 | 0 (b)
  1 X 0 | 1 (a)
  X 1 1 | 1 (b)
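In C, that simplified equation is simply (a sketch with 1-bit values):

    /* 2-to-1 mux from the equation above: C = (A AND (NOT D)) OR (B AND D). */
    unsigned mux2(unsigned a, unsigned b, unsigned d) {
        return ((a & ~d) | (b & d)) & 1u;
    }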

Page 19:

Logic/Arithmetic

• Here is a (crude) digital logic design for the 2-to-1 mux
• Note that multiple muxes can be assembled in stages to implement multiple-input muxes

[Figure: gate-level 2-to-1 mux with data inputs A and B and select input D]

Page 20:

Logic/Arithmetic

• For the adder, let’s minimize the logic using a Karnaugh Map…
• For CarryOut, we need 2^3 = 8 entries…
• We can minimize this to
  – CarryOut = A·B + CarryIn·A + CarryIn·B

  CarryOut K-map:
  CarryIn \ AB   00  01  11  10
       0          0   0   1   0
       1          0   1   1   1

Page 21:

Logic/Arithmetic

• There’s no way to minimize this equation, so we need the full sum of products:
  – Sum = (not A)(not B)CarryIn + (not A)B(not CarryIn) + A(not B)(not CarryIn) + A·B·CarryIn

  Sum K-map:
  CarryIn \ AB   00  01  11  10
       0          0   1   0   1
       1          1   0   1   0
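A small check (illustrative C) that the minimized CarryOut and the full Sum expression really do match binary addition for all eight input rows:

    #include <assert.h>

    int main(void) {
        for (unsigned a = 0; a <= 1; a++)
            for (unsigned b = 0; b <= 1; b++)
                for (unsigned cin = 0; cin <= 1; cin++) {
                    unsigned carry_out = (a & b) | (cin & a) | (cin & b);
                    unsigned sum = ((~a & ~b & cin) | (~a & b & ~cin) |
                                    (a & ~b & ~cin) | (a & b & cin)) & 1u;
                    assert(((carry_out << 1) | sum) == a + b + cin);
                }
        return 0;
    }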

Page 22:

Logic/Arithmetic

• In order to implement subtraction, we can invert the B input to the adder and set CarryIn to 1
  – This can be implemented with a mux: select B or not B (call this select input Binvert)
• Now we can build a 1-bit ALU with an AND, OR, addition, and subtraction operation
  – We can perform the AND, OR, and ADD in parallel and switch between the results with a 4-input mux (Operation will be our D input)
  – To make the adder a subtractor, we’ll need to set both Binvert and CarryIn to 1

Page 23:

Lecture 4: Arithmetic for Computers (Part 3)

CS 447, Jason Bakos

Page 24:

Chapter 4 Review

• So far, we’ve covered the following topics for this chapter
  – Binary representation of signed integers
    • 16- to 32-bit signed conversion
  – Binary addition/subtraction
    • Overflow detection / overflow exception handling
  – Shift and logical operations
  – Parts of the CPU
  – AND, OR, XOR, and inverter gates
  – Multiplexor (mux) and full adder
  – Sum-of-products logic equations (truth tables)
  – Logic minimization techniques
    • Don’t cares and Karnaugh Maps

Page 25:

1-bit ALU Design

• A 1-bit ALU can be constructed as follows (a sketch in C follows this list)
  – Components
    • AND, OR, and adder
    • 4-to-1 mux
    • “Binverter” (inverter and 2-to-1 mux)
  – Interface
    • Inputs: A, B, Binvert, Operation (2 bits), CarryIn, and Less
    • Outputs: CarryOut and Result
  – The digital functions are performed in parallel and their outputs are routed into the mux
    • The mux also accepts a Less input, which we’ll accept from outside the 1-bit ALU
  – The select lines of the mux make up the “Operation” input to the ALU
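One possible behavioral sketch of that 1-bit ALU in C (the names and encoding are illustrative, not from the slides; Operation 00 = AND, 01 = OR, 10 = add, 11 = pass Less):

    typedef struct { unsigned result, carry_out; } AluBit;

    AluBit alu1(unsigned a, unsigned b, unsigned binvert,
                unsigned operation, unsigned carry_in, unsigned less) {
        AluBit out = {0, 0};
        unsigned bb  = (binvert ? ~b : b) & 1u;                    /* the "Binverter" */
        unsigned sum = (a ^ bb ^ carry_in) & 1u;                   /* full adder */
        out.carry_out = ((a & bb) | (a & carry_in) | (bb & carry_in)) & 1u;
        switch (operation & 3u) {                                  /* the 4-to-1 mux */
            case 0: out.result = a & b; break;
            case 1: out.result = a | b; break;
            case 2: out.result = sum;   break;
            case 3: out.result = less;  break;                     /* Less comes from outside */
        }
        out.result &= 1u;
        return out;
    }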

Page 26:

32-bit ALU

• In order to create a multi-bit ALU, array 32 1-bit ALUs (a behavioral sketch follows this list)
  – Connect the CarryOut of each bit to the CarryIn of the next bit
  – A and B of each 1-bit ALU are connected to the successive bits of the 32-bit A and B
  – The Result outputs of the 1-bit ALUs form the 32-bit result
• We need to add an SLT unit and connect its output to the least significant 1-bit ALU’s Less input
  – Hardwire the other Less inputs to 0
• We need to add an Overflow unit
• We need to add a Zero detection unit
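A behavioral C sketch of the whole 32-bit ALU (illustrative; ripple carry, with the operation encoding from the table on page 32; Less is derived here from the sign of A-B corrected for overflow, which is equivalent to the sign-based SLT table on the next page):

    #include <stdint.h>

    /* op: 000 = and, 001 = or, 010 = add, 110 = subtract, 111 = set-on-less-than. */
    uint32_t alu32(uint32_t a, uint32_t b, unsigned op,
                   unsigned *zero, unsigned *overflow) {
        unsigned binvert = (op >> 2) & 1u;            /* 1 for subtract and slt */
        uint32_t bb = binvert ? ~b : b;
        uint32_t carry = binvert;                     /* CarryIn of bit 0 */
        uint32_t sum = 0;
        for (int i = 0; i < 32; i++) {                /* ripple the carry upward */
            uint32_t ai = (a >> i) & 1u, bi = (bb >> i) & 1u;
            sum  |= ((ai ^ bi ^ carry) & 1u) << i;
            carry = (ai & bi) | (ai & carry) | (bi & carry);
        }
        unsigned a31 = a >> 31, bb31 = bb >> 31, r31 = (sum >> 31) & 1u;
        *overflow = (a31 == bb31) && (r31 != a31);    /* sign-based check from the table */
        uint32_t result;
        switch (op & 3u) {
            case 0:  result = a & b; break;
            case 1:  result = a | b; break;
            case 2:  result = sum;   break;
            default: result = r31 ^ *overflow; break; /* slt: Less routed into bit 0 */
        }
        *zero = (result == 0);                        /* Zero detection unit */
        return result;
    }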

Page 27:

SLT Unit

• To compute SLT, we need to make sure that when the 1-bit ALUs’ Operation is set to 11, a subtract operation is also being computed
  – With this happening, the SLT unit can compute Less based on the MSB (sign) of A, B, and the Result

  Asign Bsign Rsign | Less
    0     0     0   |  0
    0     0     1   |  1
    0     1     X   |  0
    1     0     X   |  1
    1     1     0   |  0
    1     1     1   |  1

  (When the signs of A and B agree, the subtraction cannot overflow, so Less is simply the sign of the result.)

Page 28:

Overflow Unit

• When doing signed arithmetic, we need to follow this table, as we covered previously…
• How do we implement this in hardware?

  Operation   Operand A   Operand B   Result
  A+B         Positive    Positive    Negative
  A+B         Negative    Negative    Positive
  A-B         Positive    Negative    Negative
  A-B         Negative    Positive    Positive

Page 29:

Overflow Unit

• We need a truth table…

• Since we’ll be computing the logic equation with SOP, we only need the rows where the output is 1

  Operation   A(31) B(31) R(31) | Overflow
  010 (add)     0     0     1   |    1
  010 (add)     1     1     0   |    1
  110 (sub)     0     1     1   |    1
  110 (sub)     1     0     0   |    1
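Translated directly into a boolean expression (a C sketch; sub is the high bit of Operation, and a31/b31/r31 are the sign bits, each 0 or 1):

    /* Overflow detection built from the four '1' rows of the table above. */
    unsigned overflow_bit(unsigned sub, unsigned a31, unsigned b31, unsigned r31) {
        return (!sub & !a31 & !b31 &  r31)   /* add: (+) + (+) gave (-) */
             | (!sub &  a31 &  b31 & !r31)   /* add: (-) + (-) gave (+) */
             | ( sub & !a31 &  b31 &  r31)   /* sub: (+) - (-) gave (-) */
             | ( sub &  a31 & !b31 & !r31);  /* sub: (-) - (+) gave (+) */
    }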

Page 30:

Zero Detection Unit

• NOR together all the 1-bit ALU Result outputs (OR them, then invert) – the output is the ALU’s Zero flag, which is 1 only when the result is all 0’s

Page 31:

32-bit ALU Operation

• We need a 3-bit ALU Operation input into our 32-bit ALU

• The two least significant bits can be routed into all the 1-bit ALUs internally

• The most significant bit can be routed into the least significant 1-bit ALU’s CarryIn, and to Binvert of all the 1-bit ALUs

Page 32:

32-bit ALU Operation

• Here’s the final ALU Operation table:

  ALU Operation | Function
  000           | and
  001           | or
  010           | add
  110           | subtract
  111           | set on less than

Page 33:

32-bit ALU

• In the end, our ALU will have the following interface:
  – Inputs:
    • A and B (32 bits each)
    • ALU Operation (3 bits)
  – Outputs:
    • CarryOut (1 bit)
    • Zero (1 bit)
    • Result (32 bits)
    • Overflow (1 bit)

Page 34:

Carry Lookahead

• The adder architecture we previously looked at requires 2n gate delays to compute its result (worst case)
  – The longest path that a digital signal must propagate through is called the “critical path”
  – This is WAAAYYYY too slow!
• There are other ways to build an adder that require lg n delay
• Obviously, using SOP, we can build a circuit that will compute ANY function in 2 gate delays (2 levels of logic)
  – Obviously, in the case of a 64-input system, the resulting design will be too big and too complex

Page 35:

Carry Lookahead

• For example, we can easily see that the CarryIn for bit 1 is computed as:
  – c1 = (a0·b0) + (a0·c0) + (b0·c0)
  – c2 = (a1·b1) + (a1·c1) + (b1·c1)
• Hardware executes in parallel, so using the following fast CarryIn computation, we can perform an add with 3 gate delays
  – c2 = (a1·b1) + (a1·a0·b0) + (a1·a0·c0) + (a1·b0·c0) + (b1·a0·b0) + (b1·a0·c0) + (b1·b0·c0)
    • I used the logical distributive law to compute this
    • As you can see, the CarryIn logic gets bigger and bigger for consecutive bits

Page 36:

Carry Lookahead

• Carry Lookahead adders are faster than ripple-carry adders
• Recall:
  – c(i+1) = (ai·bi) + (ai·ci) + (bi·ci)
• ci can be factored out…
  – c(i+1) = (ai·bi) + (ai + bi)·ci
  – So…
  – c2 = (a1·b1) + (a1 + b1)·((a0·b0) + (a0 + b0)·c0)

Page 37:

Carry Lookahead

• Note the repeated appearance of (ai·bi) and (ai + bi)
• They are called generate (gi) and propagate (pi)
  – gi = ai·bi,  pi = ai + bi
  – c(i+1) = gi + pi·ci
  – This means that if gi = 1, a CarryOut is generated
  – If pi = 1, a CarryOut is propagated from CarryIn

Page 38:

Carry Lookahead

• c1 = g0 + (p0·c0)
• c2 = g1 + (p1·g0) + (p1·p0·c0)
• c3 = g2 + (p2·g1) + (p2·p1·g0) + (p2·p1·p0·c0)
• c4 = g3 + (p3·g2) + (p3·p2·g1) + (p3·p2·p1·g0) + (p3·p2·p1·p0·c0)
• …This scheme gives us an adder with 5 gate delays, but it is still too complex
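The same equations written out as a 4-bit carry-lookahead slice in C (an illustrative sketch; a and b hold 4-bit values, c0 is the carry in):

    typedef struct { unsigned sum, carry_out; } Cla4;

    Cla4 cla4(unsigned a, unsigned b, unsigned c0) {
        unsigned g[4], p[4], c[5];
        c[0] = c0 & 1u;
        for (int i = 0; i < 4; i++) {
            unsigned ai = (a >> i) & 1u, bi = (b >> i) & 1u;
            g[i] = ai & bi;                            /* generate  */
            p[i] = ai | bi;                            /* propagate */
        }
        c[1] = g[0] | (p[0] & c[0]);
        c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c[0]);
        c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c[0]);
        c[4] = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
             | (p[3] & p[2] & p[1] & p[0] & c[0]);
        Cla4 out = {0, c[4]};
        for (int i = 0; i < 4; i++)                    /* the sum bits use the precomputed carries */
            out.sum |= (((a >> i) ^ (b >> i) ^ c[i]) & 1u) << i;
        return out;
    }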

Page 39:

Carry Lookahead

• To solve this, we’ll build our adder using 4-bit adders with carry lookahead, and connect them using “super” propagate and generate logic
• The superpropagate is only true if all the bits propagate a carry
  – P0 = p0·p1·p2·p3
  – P1 = p4·p5·p6·p7
  – P2 = p8·p9·p10·p11
  – P3 = p12·p13·p14·p15

Page 40:

Carry Lookahead

• The supergenerate follows a similar equation:
  – G0 = g3 + (p3·g2) + (p3·p2·g1) + (p3·p2·p1·g0)
  – G1 = g7 + (p7·g6) + (p7·p6·g5) + (p7·p6·p5·g4)
  – G2 = g11 + (p11·g10) + (p11·p10·g9) + (p11·p10·p9·g8)
  – G3 = g15 + (p15·g14) + (p15·p14·g13) + (p15·p14·p13·g12)
• The supergenerate and superpropagate logic for the four 4-bit carry lookahead adders is contained in a Carry Lookahead Unit
• This yields a worst-case delay of 7 gate delays
  – Reason?
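The block-level signals follow the same pattern one level up; a small illustrative C helper for one 4-bit group:

    /* "Super" propagate and generate for one 4-bit group (a and b are 4-bit values). */
    void super_pg(unsigned a, unsigned b, unsigned *P, unsigned *G) {
        unsigned p[4], g[4];
        for (int i = 0; i < 4; i++) {
            p[i] = ((a >> i) | (b >> i)) & 1u;
            g[i] = ((a >> i) & (b >> i)) & 1u;
        }
        *P = p[0] & p[1] & p[2] & p[3];
        *G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0]);
    }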

Page 41:

Carry Lookahead

• We’ve covered all the ALU functions except for the shifter
• We’ll talk about the shifter later

Page 42:

Lecture 4: Arithmetic for Computers (Part 4)

CS 447, Jason Bakos

Page 43:

Binary Multiplication

• In multiplication, the first operand is called the multiplicand, and the second is called the multiplier
• The result is called the product
• Not counting the sign bits, if we multiply an n-bit multiplicand by an m-bit multiplier, we get an (n+m)-bit product

Page 44:

Binary Multiplication

• Binary multiplication works exactly like decimal multiplication

• In fact, multiply 100101 by 111001 and pretend you’re using decimal numbers

Page 45:

First Hardware Design for Multiplier

Note that the multiplier is not routed into the ALU

Page 46:

Second Hardware Design for Multiplier

• Architects realized that, at any time, at least half of the bits in the 64-bit multiplicand register are 0
• Reduce the ALU to 32 bits, and shift the product right instead of shifting the multiplicand left
• In this case, the multiplicand register and each addition are only 32 bits wide

Page 47:

Second Hardware Design for Multiplier

Page 48:

Final Hardware Design for Multiplier

• Let’s combine the product register with the multiplier register…
  – Put the multiplier in the right half of the product register and initialize the left half with zeros – when we’re done, the 64-bit product will fill the product register

Page 49:

Final Hardware Design for Multiplier

Page 50:

Final Hardware Design for Multiplier

• For the first two designs, the multiplicand and the multiplier must first be converted to positive values
  – The signs would need to be remembered so the product can be converted to whatever sign it needs to be
• The third design will deal with signed numbers, as long as the sign bit is extended when the product register is shifted right
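A C sketch of the unsigned version of this final design, with the 64-bit product register modeled as a uint64_t (illustrative; the explicit carry models the adder's carry-out being shifted into the vacated bit):

    #include <stdint.h>

    uint64_t multiply_shift_add(uint32_t multiplicand, uint32_t multiplier) {
        uint64_t product = multiplier;                 /* multiplier starts in the right half */
        for (int i = 0; i < 32; i++) {
            uint64_t carry = 0;
            if (product & 1u) {                        /* test bit 0 of the product register */
                uint64_t hi = (product >> 32) + multiplicand;   /* add into the left half */
                carry = hi >> 32;                      /* the adder's carry-out */
                product = (hi << 32) | (product & 0xFFFFFFFFu);
            }
            product = (carry << 63) | (product >> 1);  /* shift right, carry shifted back in */
        }
        return product;
    }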

Page 51:

Booth’s Algorithm

• Booth’s Algorithm starts with the observation that if we have the ability to both add and subtract, there are multiple ways to compute a product
  – For every 0 in the multiplier, we shift the multiplicand
  – For every 1 in the multiplier, we add the multiplicand to the product, then shift the multiplicand

Page 52:

Booth’s Algorithm

• Instead, when the first 1 of a run is seen in the multiplier, subtract instead of add
• Keep shifting for all the 1’s after it; when the first 0 after the run is seen, add
• The method was developed because in Booth’s era, shifters were faster than adders

Page 53:

Booth’s Algorithm

• Example: 0010 x 0110 (2 x 6)

      0010  == 2
    x 0110  == 6

  bit 0: 0             shift only
  bit 1: 1 (first 1)   subtract: -0010 x 2^1 = -4
  bit 2: 1             shift only (middle of the run of 1’s)
  bit 3: 0 (first 0)   add:      +0010 x 2^3 = +16

  -4 + 16 = 12 = 2 x 6
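A behavioral C sketch of Booth’s algorithm for the 4-bit case above (illustrative; prev is the bit to the right of the current multiplier bit, initially 0):

    /* Multiplies multiplicand by the low 4 bits of multiplier (treated as signed). */
    int booth_multiply(int multiplicand, int multiplier) {
        int product = 0;
        int prev = 0;                              /* implicit bit to the right */
        for (int i = 0; i < 4; i++) {
            int bit = (multiplier >> i) & 1;
            if (bit == 1 && prev == 0)             /* first 1 of a run: subtract */
                product -= multiplicand << i;
            else if (bit == 0 && prev == 1)        /* first 0 after a run: add */
                product += multiplicand << i;
            prev = bit;
        }
        return product;                            /* booth_multiply(2, 6) == 12 */
    }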

Page 54:

Lecture 4: Arithmetic for Computers (Part 5)

CS 447, Jason Bakos

Page 55:

Binary Division

• Like last lecture, we’ll start with some basic terminology…
  – Again, let’s assume our numbers are base 10, but let’s only use 0’s and 1’s

                 1001      <- Quotient
  Divisor ->  1000 ) 1001010   <- Dividend
                     1000
                     ----
                       1010
                       1000
                       ----
                         10    <- Remainder

Page 56:

Binary Division

• Recall:
  – Dividend = Quotient x Divisor + Remainder
• Let’s assume that both the dividend and divisor are positive, and hence the quotient and the remainder are nonnegative
• The division operands and both results are 32-bit values, and we will ignore the sign for now

Page 57:

First Hardware Design for Divider

Initialize the Quotient register to 0, initialize the left-half of the Divisor register with the divisor, and initialize the Remainder register with the dividend (right-aligned)

Page 58:

Second Hardware Design for Divider

Much like with the multiplier, the divisor register and the ALU can be reduced to 32 bits if we shift the remainder left instead of shifting the divisor right

Also, the algorithm must be changed so the remainder is shifted left before the subtraction takes place

Page 59:

Third Hardware Design for Divider

Shift the bits of the quotient into the remainder register…

Also, the last step of the algorithm is to shift the left half of the remainder right 1 bit
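At the behavioral level, the shift-and-subtract loop these designs implement looks like this in C (an unsigned sketch, divisor != 0; it models the algorithm rather than the exact register layout):

    #include <stdint.h>

    void divide_unsigned(uint32_t dividend, uint32_t divisor,
                         uint32_t *quotient, uint32_t *remainder) {
        uint64_t rem = 0;                                /* wide enough for the left shift */
        uint32_t q = 0;
        for (int i = 31; i >= 0; i--) {
            rem = (rem << 1) | ((dividend >> i) & 1u);   /* shift left, bring in the next bit */
            if (rem >= divisor) {                        /* the subtraction would not go negative */
                rem -= divisor;
                q |= 1u << i;                            /* this quotient bit is 1 */
            }
        }
        *quotient  = q;
        *remainder = (uint32_t)rem;
    }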

Page 60:

Signed Division

• Simplest solution: remember the signs of the divisor and the dividend and then negate the quotient if the signs disagree

• The dividend and the remainder must have the same signs

Page 61:

Considerations

• The same hardware can be used for both multiply and divide
  – Requirement: a 64-bit register that can shift left or right, and a 32-bit ALU that can add or subtract

Page 62:

Floating Point

• Floating point (also called real) numbers are used to represent values that are fractional or that are too big to fit in a 32-bit integer
• Floating point numbers are expressed in scientific notation (base 2) and are normalized (no leading 0’s)
  – 1.xxxx (binary) x 2^yyyy
• In this case, xxxx is the significand and yyyy is the exponent

Page 63:

Floating Point

• In MIPS, a floating point number is represented in the following manner (IEEE 754 standard):
  – bit 31: sign
  – bits 30..23 (8 bits): exponent (biased – see the next page)
  – bits 22..0 (23 bits): significand
  – Note that the sizes of the exponent and significand must be traded off... accuracy vs. range
• This gives us a representation for signed numbers with magnitudes from roughly 2 x 10^-38 to 2 x 10^38
• Overflow and underflow must be detected
• Double-precision floating point numbers are 2 words... the significand is extended to 52 bits and the exponent to 11 bits
• Also, the first bit of the significand is implicit (only the fractional part is specified)
• In order to represent 0 in a float, put 0 in the exponent field
  – So here’s the equation we use: (-1)^S x (1 + Significand) x 2^E
  – Or: (-1)^S x (1 + (s1 x 2^-1) + (s2 x 2^-2) + (s3 x 2^-3) + (s4 x 2^-4) + ...) x 2^E
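A small C sketch that pulls those fields out of a single-precision value (for normalized numbers; the bias of 127 used below is introduced on the next page):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        float f = -0.75f;                              /* = -1.1 (binary) x 2^-1 */
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);
        unsigned sign     = bits >> 31;
        unsigned exponent = (bits >> 23) & 0xFF;       /* stored with a bias of 127 */
        unsigned fraction = bits & 0x7FFFFF;           /* the 23 stored significand bits */
        /* value = (-1)^sign * (1 + fraction/2^23) * 2^(exponent - 127) */
        printf("sign=%u exponent=%u fraction=0x%06X\n", sign, exponent, fraction);
        return 0;
    }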

Page 64:

Considerations

• IEEE 754 sought to make floating-point numbers easier to sort
  – The sign is the first bit
  – The exponent comes first (before the significand)
• But we want an all-0 (+1) exponent to represent the most-negative exponent and an all-1 exponent to be the most positive
• This is called biased notation, so we’ll use the following equation:
  – (-1)^S x (1 + Significand) x 2^(Exponent - Bias)
  – Bias is 127 for single-precision and 1023 for double-precision

Page 65:

Lecture 4: Arithmetic for Computers (Part 6)

CS 447, Jason Bakos

Page 66:

Converting Decimal Floating Point to Binary

• Use the method I showed last lecture... (a short worked example follows this list)
  – Significand:
    • Use the iterative method to convert the fractional part to binary
    • Convert the integer part to binary using the “old-fashioned” method
    • Shift the binary point to the left until the number is normalized
    • Drop the leading 1, and set the exponent to the number of positions you shifted the binary point
    • Adjust the exponent for the bias (127/1023)
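For example, 5.75 converts as follows: 5 = 101 (binary) and 0.75 = .11 (binary), so 5.75 = 101.11 (binary) = 1.0111 (binary) x 2^2 after shifting the binary point two places to the left; the stored significand is 0111000...0 (the leading 1 is dropped), and the stored single-precision exponent is 2 + 127 = 129 = 1000 0001 (binary).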

Page 67:

Floating Point Addition

• Let’s add two decimal floating point numbers...
  – Let’s try 9.999 x 10^1 + 1.610 x 10^-1
  – Assume we can only store 4 digits of the significand and two digits of the exponent

Page 68:

Floating Point Addition

• Match the exponents of both operands by un-normalizing one of them
  – Match to the exponent of the larger number
• Add the significands
• Normalize the result
• Round the significand (the example from the previous page is worked below)
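Walking through the example from the previous page with these steps: 1.610 x 10^-1 is un-normalized to 0.016 x 10^1 (digits are lost because only four significand digits are kept); adding gives 9.999 + 0.016 = 10.015, i.e. 10.015 x 10^1; normalizing gives 1.0015 x 10^2; and rounding to four significand digits gives 1.002 x 10^2.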

Page 69:

Binary Floating Point Addition

Page 70:

Floating Point Multiplication

• Example: 1.110 x 10^10 X 9.200 x 10^-5
  – Assume 4 digits for the significand and 2 digits for the exponent
  – Calculate the exponent of the product by simply adding the exponents of the operands
    • 10 + (-5) = 5
  – Bias the exponents
    • 137 + 122 = 259
  – Something’s wrong! We added the biases along with the exponents...
    • The correct biased exponent is 5 + 127 = 132

Page 71:

Floating Point Multiplication

• Multiply the significands...
  – 1.110 x 9.200 = 10.212000
• Normalize the result and add 1 to the exponent
  – 1.0212 x 10^6
• Round the significand to four digits
  – 1.021
• Set the sign based on the signs of the operands
  – = +1.021 x 10^6

Page 72:

Floating Point Multiplication

Page 73:

Accurate Arithmetic

• Integers can represent every value between the largest and smallest possible values
• This is not the case with floating point
  – Only 2^53 unique significand values can be represented with double precision fp
  – IEEE 754 always keeps 2 extra bits on the right of the significand during intermediate calculations, called guard and round, to minimize rounding errors

Page 74:

Accurate Arithmetic

• Since the worst case for rounding would be when the actual number is halfway between two floating point representations, accuracy is measured as number of least-significant error bits– This is called units in the last place (ulp)

• IEEE 754 guarantees that the computer is within .5 ulp (using guard and round)