CS 232: Computer Architecture II Prof. Laxmikant (Sanjay) Kale Floating point arithmetic

Upload: hyatt-burt

Post on 31-Dec-2015


CS 232: Computer Architecture II

Prof. Laxmikant (Sanjay) Kale

Floating point arithmetic

Floating Point (a brief look)

• We need a way to represent

– numbers with fractions, e.g., 3.1416

– very small numbers, e.g., .000000001

– very large numbers, e.g., 3.15576 x 10^9

• Representation:

– sign, exponent, significand: (–1)^sign x significand x 2^exponent

– more bits for significand gives more accuracy

– more bits for exponent increases range

• IEEE 754 floating point standard:

– single precision: 1 sign bit, 8-bit exponent, 23-bit significand (32 bits in all)

– double precision: 1 sign bit, 11-bit exponent, 52-bit significand (64 bits in all)
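
To make the accuracy point concrete, here is a minimal Python sketch. It rounds a Python float (which is double precision) through the 32-bit single-precision format and reports the error this introduces. The helper name `to_single` is our own and does not come from the slides.

```python
import struct

def to_single(x: float) -> float:
    """Round a double-precision Python float to the nearest single-precision value."""
    # Packing with the "f" format stores x in 32 bits; unpacking widens it again.
    (rounded,) = struct.unpack(">f", struct.pack(">f", x))
    return rounded

# The three sample numbers from the slide.
for x in (3.1416, 0.000000001, 3.15576e9):
    s = to_single(x)
    print(f"{x!r}: single = {s!r}, relative error = {abs(s - x) / x:.2e}")
```

Values that need more than 23 significand bits come back slightly changed, while values such as 0.5 survive the round trip exactly.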

Floating point representation:

• The idea is to normalize all numbers, so the significand has exactly one nonzero digit to the left of the decimal point.

– 12345 = 1.2345 * 10^4

– .0000012345 = 1.2345 * 10^-6

– Do this in binary: 1.01110 x 2^(1011)

• IEEE FP representation

– (+/-) 1.0101010101010101010101 * 2 ^ (10101010)

– This is single precision

– Double precision: 64 bits in all.

• Where does one need accuracy of that level?
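
The binary normalization step described on this slide can be sketched in Python with `math.frexp`. That function returns a fraction in [0.5, 1), so the sketch rescales it to the 1.xxx form used here. The function name `normalize` is a hypothetical helper, and the sketch assumes a nonzero finite input.

```python
import math

def normalize(x: float) -> tuple[float, int]:
    """Return (significand, exponent) with x == significand * 2**exponent
    and 1 <= |significand| < 2. Assumes x is nonzero and finite."""
    fraction, exponent = math.frexp(x)   # x == fraction * 2**exponent, 0.5 <= |fraction| < 1
    return fraction * 2, exponent - 1    # move the binary point one place to the right

sig, exp = normalize(12345.0)
print(sig, exp)                # binary analogue of 1.2345 * 10^4
print(sig * 2**exp == 12345.0)
```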

Floating point numbers

• Representation issues:

– sign bit, exponent, significand

– Question: how to represent each field

– Question: which order to lay them out in a word?

– Factor: should be easy to do comparisons (for sorting)

• For arithmetic, we will have special hardware anyway

– Choice:

• Sign + magnitude representation

• Sign bit, followed by exponent, then significand (why?)

• exponent: represented with a “bias”: the stored field is the true exponent plus 127 (plus 1023 for double precision)

• significand: assume implicit 1. (so 00001 means 1.00001)
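
One way to see why this layout helps comparisons is the sketch below. It assumes non-negative values only. With the biased exponent placed above the significand, sorting the raw 32-bit patterns as unsigned integers gives the same order as sorting the numbers themselves. The helper `bits_of` is our own name. Negative values would need extra handling, because sign + magnitude makes a larger bit pattern mean a more negative number.

```python
import struct

def bits_of(x: float) -> int:
    """Return the single-precision bit pattern of x as an unsigned 32-bit integer."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    return word

# Non-negative sample values only (an assumption of this sketch).
values = [0.5, 3.1416, 1.0e-9, 3.15576e9, 2.0, 1.5]

by_value = sorted(values)              # ordinary floating point comparison
by_bits = sorted(values, key=bits_of)  # plain integer comparison of the encodings

print(by_value == by_bits)
```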

Floating point representation

• So:

– The value of a floating point number is (–1)^sign x (1 + significand) x 2^(exponent - bias)

– Example: 0 00001000 01010000000000000000000

– Example: convert -.41 to single precision form
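
Both examples can be checked with a short Python sketch. `decode_single` evaluates the value formula on a string of bits, and `encode_single` asks the machine for the single-precision fields of a number. Both function names are our own. The sketch assumes a normalized number and ignores the special encodings that are not covered here.

```python
import struct

BIAS = 127  # single-precision exponent bias

def decode_single(bits: str) -> float:
    """Evaluate (-1)^sign * (1 + significand) * 2^(exponent - bias)."""
    bits = bits.replace(" ", "")
    assert len(bits) == 32, "expected 1 + 8 + 23 bits"
    sign = int(bits[0], 2)
    exponent = int(bits[1:9], 2)
    significand = int(bits[9:], 2) / 2**23   # the 23 stored bits as a fraction in [0, 1)
    return (-1) ** sign * (1 + significand) * 2.0 ** (exponent - BIAS)

def encode_single(x: float) -> str:
    """Return 'sign exponent significand' for x rounded to single precision."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    b = f"{word:032b}"
    return f"{b[0]} {b[1:9]} {b[9:]}"

print(decode_single("0 00001000 01010000000000000000000"))  # 1.3125 * 2^(8 - 127)
print(encode_single(-0.41))
```

Decoding the output of `encode_single(-0.41)` gives a value very close to, but not exactly, -0.41, because .41 has no finite binary expansion.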

IEEE 754 floating-point standard

• Leading “1” bit of significand is implicit

• Exponent is “biased” to make sorting easier

– all 0s is the smallest exponent, all 1s is the largest

– bias of 127 for single precision and 1023 for double precision

– summary: (–1)^sign x (1 + significand) x 2^(exponent – bias)

• Example:

– decimal: -.75 = -3/4 = -3/2^2

– binary: -.11 = -1.1 x 2^-1

– floating point: exponent = 126 = 01111110

– IEEE single precision: 10111111010000000000000000000000
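
The -.75 example can be reproduced by assembling the three fields by hand and comparing the result with the encoding the machine produces. This is a small Python sketch, and the variable names are our own.

```python
import struct

BIAS = 127

sign = 1                     # the number is negative
exponent_field = -1 + BIAS   # true exponent -1 is stored as 126
fraction_field = 1 << 22     # 1.1 in binary: only the first bit after the point is set

# Lay the fields out as sign (1 bit), exponent (8 bits), significand (23 bits).
word = (sign << 31) | (exponent_field << 23) | fraction_field
hand_built = f"{word:032b}"

(machine_word,) = struct.unpack(">I", struct.pack(">f", -0.75))

print(hand_built)
print(hand_built == f"{machine_word:032b}")
```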

Floating point addition

• The problem is: the exponents of numbers being added may be different

– 2.0 * 10^1 + 3.0 * 10^(-1)

– 2.0 * 10^1 + 0.03 * 10^1 : Now we can add them

– 2.03 * 10^1

– But we are not necessarily done!

– E.g. 9.74 * 10^0 + 3.3 * 10^(-1)

– 10.07 * 10^0 is not in normalized form!

– Shift again to get the correct form: 1.007 * 10^1
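
The align, add, renormalize sequence above can be sketched in decimal using Python's `decimal` module. The function `fp_add` and its (significand, exponent) interface are our own illustration. Unlike real hardware, the sketch keeps every digit and does not round to a fixed significand width.

```python
from decimal import Decimal

def fp_add(sig_a: str, exp_a: int, sig_b: str, exp_b: int) -> tuple[Decimal, int]:
    """Add sig_a * 10^exp_a and sig_b * 10^exp_b, returning a normalized pair."""
    a, b = Decimal(sig_a), Decimal(sig_b)
    # Step 1: align. Make a the operand with the larger exponent,
    # then shift b right until both share that exponent.
    if exp_a < exp_b:
        a, exp_a, b, exp_b = b, exp_b, a, exp_a
    b = b.scaleb(exp_b - exp_a)          # e.g. 3.0 * 10^-1 becomes 0.030 * 10^1
    # Step 2: add the aligned significands.
    total, exp = a + b, exp_a
    # Step 3: renormalize so that 1 <= |total| < 10.
    while abs(total) >= 10:
        total, exp = total.scaleb(-1), exp + 1
    while total != 0 and abs(total) < 1:
        total, exp = total.scaleb(1), exp - 1
    return total, exp

print(fp_add("2.0", 1, "3.0", -1))
print(fp_add("9.74", 0, "3.3", -1))
```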

You can get different results

• A + B + C = A + (B+C) = (A+B) + C

– Right?

• Can you see a problem?

• When do you lose bits?
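
A small Python experiment shows one way the two groupings can disagree. Python floats are double precision, and the three values are our own choice: one operand is so much smaller than the others that its bits are lost when the exponents are aligned.

```python
a, b, c = 1.0e20, -1.0e20, 1.0

left = (a + b) + c    # a and b cancel exactly first, so c survives
right = a + (b + c)   # c is shifted out during alignment: b + c rounds back to b

print(left, right)
print(left == right)
```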

Floating point multiplication

• Add exponents, but subtract bias

• Then multiply significands

• Then normalize
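
The three steps can be sketched on raw single-precision fields in Python. The functions `fields` and `fp_mul` are hypothetical helpers of our own. The sketch assumes normalized, nonzero inputs with no overflow or underflow, and it truncates the extra product bits instead of rounding them as real hardware would.

```python
import struct

BIAS = 127

def fields(x: float) -> tuple[int, int, int]:
    """Split x (as single precision) into sign, biased exponent, and 23-bit significand."""
    (w,) = struct.unpack(">I", struct.pack(">f", x))
    return w >> 31, (w >> 23) & 0xFF, w & 0x7FFFFF

def fp_mul(x: float, y: float) -> float:
    sx, ex, fx = fields(x)
    sy, ey, fy = fields(y)
    sign = sx ^ sy
    # Step 1: add exponents; both carry the bias, so subtract one copy of it.
    exponent = ex + ey - BIAS
    # Step 2: multiply the 24-bit significands (implicit leading 1 restored).
    product = ((1 << 23) | fx) * ((1 << 23) | fy)   # represents a value in [1, 4)
    # Step 3: normalize; if the product is 2 or more, shift right and bump the exponent.
    if product >> 47:
        product >>= 1
        exponent += 1
    fraction = (product >> 23) & 0x7FFFFF           # drop the leading 1, truncate low bits
    word = (sign << 31) | (exponent << 23) | fraction
    (result,) = struct.unpack(">f", struct.pack(">I", word))
    return result

print(fp_mul(1.5, -2.5))
print(fp_mul(3.0, 3.0))
```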