Datorteknik (Computer Engineering): Floating Point, slide 1

Floating point

Number system corresponding to the decimal notation

$1.837 \times 10^{4}$

significand: 1.837, exponent: 4

A great number of corresponding binary formats exist, but

there is one common standard:

IEEE 754-1985 (IEC 559)


IEEE 754-1985

Number representation

Single precision (32 bits)

sign: 1 bit

exponent: 8 bits

fraction: 23 bits

Double precision (64 bits)

sign: 1 bit

exponent: 11 bits

fraction: 52 bits

Single-extended and double-extended formats exist inside the floating-point hardware


IEEE 754-1985

Field layout:

S (1 bit) | E (8 bits) | M (23 bits)

S: sign

E: exponent, excess-127 binary integer

M: mantissa, sign + magnitude, normalized binary significand with hidden integer bit: 1.M

Single Precision:

actual exponent is $e = E - 127$

$N = (-1)^{S} \times 2^{E-127} \times (1.M)$

$0 < E < 255$

Examples (S E M):

0 = 0 00000000 0...0

-1.5 = 1 01111111 10...0
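The field layout and the formula for N can be checked with a short Python sketch. The helper names `decode_single` and `value_of` are made up for this illustration; the sketch relies only on the standard `struct` module and handles normalized numbers only.

```python
import struct

def decode_single(x):
    """Round x to single precision and split the 32 bits into (S, E, M)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                  # S: 1 bit
    exponent = (bits >> 23) & 0xFF     # E: 8 bits, biased by 127
    fraction = bits & 0x7FFFFF         # M: 23 bits, hidden bit not stored
    return sign, exponent, fraction

def value_of(sign, exponent, fraction):
    """Evaluate N = (-1)^S * 2^(E-127) * 1.M for a normalized number."""
    assert 0 < exponent < 255, "only normalized numbers are handled here"
    significand = 1 + fraction / 2**23   # restore the hidden bit
    return (-1) ** sign * 2.0 ** (exponent - 127) * significand

s, e, m = decode_single(-1.5)
print(s, format(e, "08b"), format(m, "023b"))  # 1 01111111 10000000000000000000000
print(value_of(s, e, m))                       # -1.5
```

The printed bit pattern matches the -1.5 example on the slide. Zero is not covered by `value_of` because it uses the reserved exponent E = 0.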

The magnitude of the normalized numbers that can be represented is in the range:

$2^{-126} \times (1.0)$ to $2^{127} \times (2 - 2^{-23})$

which is approximately:

$1.18 \times 10^{-38}$ to $3.40 \times 10^{38}$
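As a quick check of these limits, the following Python sketch builds the two extreme normalized bit patterns (E = 1 with M = 0, and E = 254 with M all ones) and compares them with the closed-form expressions. The helper `from_bits` is a name chosen for this illustration.

```python
import struct

def from_bits(bits):
    """Interpret a 32-bit integer as an IEEE 754 single-precision number."""
    (x,) = struct.unpack(">f", struct.pack(">I", bits))
    return x

smallest_normal = 2.0 ** -126 * 1.0
largest = 2.0 ** 127 * (2 - 2.0 ** -23)

# E = 1, M = 0 is the smallest normalized magnitude
assert from_bits(0x00800000) == smallest_normal
# E = 254, M = all ones is the largest finite magnitude
assert from_bits(0x7F7FFFFF) == largest

print(f"{smallest_normal:.3e} to {largest:.3e}")  # 1.175e-38 to 3.403e+38
```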


IEEE 754-1985

Fraction part: 23 / 52 bits;

$0 \le x < 1$

Significand: 1 + fraction part

“1” is not stored; “hidden bit”

corresponds to about 7 and 16 decimal digits, respectively

Exponent: 127 / 1023 added to the exponent;

“biased exponent”

corresponds to approximately $10^{-38}$ to $10^{38}$

and $10^{-308}$ to $10^{308}$, respectively
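The decimal figures follow from converting bit counts with $\log_{10} 2$. The sketch below is only a back-of-the-envelope calculation; the function names are invented for this example, and the hidden bit is counted as part of the significand.

```python
import math

def decimal_digits(fraction_bits):
    """Decimal digits carried by a significand with one hidden bit."""
    return (fraction_bits + 1) * math.log10(2)

def decimal_exponent_range(bias):
    """Powers of ten bounding the normalized magnitudes for a given bias."""
    lowest = (1 - bias) * math.log10(2)    # smallest normalized: 2**(1 - bias)
    highest = (bias + 1) * math.log10(2)   # just above the largest: 2**(bias + 1)
    return lowest, highest

for name, fraction_bits, bias in (("single", 23, 127), ("double", 52, 1023)):
    low, high = decimal_exponent_range(bias)
    print(f"{name}: {decimal_digits(fraction_bits):.1f} digits, "
          f"10^{low:.1f} to 10^{high:.1f}")
```

This gives roughly 7.2 and 16.0 digits, with decimal exponents near ±38 and ±308.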


IEEE 754-1985

Special features:

Correct rounding of “halfway” result (to even number)

Includes special values:

NaN Not a number

∞ Infinity

-∞ Negative infinity

Uses denormal numbers to represent

numbers smaller in magnitude than $2^{E_{min}}$

Rounds to nearest by default; three other rounding modes exist.

Sophisticated exception handling
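Some of these features can be observed from Python, as in the sketch below. It assumes that `struct.pack(">f", x)` rounds a double to single precision in the default round-to-nearest-even mode, which is the usual behaviour but depends on the platform. `to_single` is a helper name made up for this example.

```python
import math
import struct

def to_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    (y,) = struct.unpack(">f", struct.pack(">f", x))
    return y

inf = float("inf")
print(math.isnan(inf - inf))   # True: infinity minus infinity is NaN
print(-inf < -3.4e38)          # True: -infinity is below every finite number

# Halfway cases round to the neighbour whose last fraction bit is even (0).
print(to_single(1 + 2**-24) == 1.0)             # True: tie between 1 and 1 + 2^-23
print(to_single(1 + 3 * 2**-24) == 1 + 2**-22)  # True: tie resolved upwards

# Denormals fill the gap between 0 and the smallest normalized number 2^-126.
print(to_single(2.0**-130) == 2.0**-130)        # True
print(to_single(2.0**-149) > 0.0)               # True: smallest denormal
```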


Multiplication

$(s_1 \times 2^{e_1}) \times (s_2 \times 2^{e_2}) = (s_1 \times s_2) \times 2^{e_1 + e_2}$

so, multiply significands and add exponents

Problem: the significand is coded in sign-magnitude, so use unsigned multiplication and take care of the sign separately

Round the 2n-bit significand product to an n-bit significand

Compute the new exponent with respect to the bias
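The identity can be tried out with `math.frexp` and `math.ldexp` from the Python standard library. This is only an illustration on doubles: `frexp` normalizes the significand to [0.5, 1) rather than [1, 2), and the function name `multiply_via_parts` is made up here.

```python
import math

def multiply_via_parts(a, b):
    """Multiply by handling significands and exponents separately."""
    s1, e1 = math.frexp(a)   # a = s1 * 2**e1 with 0.5 <= |s1| < 1
    s2, e2 = math.frexp(b)   # b = s2 * 2**e2
    # multiply significands, add exponents, then recombine
    return math.ldexp(s1 * s2, e1 + e2)

print(multiply_via_parts(6.0, 0.75))    # 4.5
print(multiply_via_parts(-3.0, 1.25))   # -3.75
```

The sign travels with the significand here; hardware that stores sign-magnitude values computes it separately as the XOR of the two sign bits.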


Rounding

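1. Multiply the two significands to get the 2n-bit product, held in the halves P (upper) and A (lower):

P: x0 x1 x2 x3 x4 x5    A: g r s s s s

g is the guard bit and r is the round bit; the remaining four bits are ORed together (“sticky bit” s).

Case 1: x0 = 0, shift needed:

P: x1 x2 x3 x4 x5 g    A: r s s s s

Case 2: x0 = 1, increment the exponent; the old guard bit g becomes the round bit r, and the old r is ORed into the sticky bit (s = s or r):

P: x0 x1 x2 x3 x4 x5    A: r s s s s s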


Rounding

2. For both cases:

if r = 0, P is the correctly rounded product.

if r = 1 and s = 1, then P + 1 is the correctly rounded product.

if r = 1 and s = 0 (the “halfway case”), then

P is the correctly rounded product if the least significant bit of P (x5, or g in case 1) is 0

P + 1 is the correctly rounded product if the least significant bit of P (x5, or g in case 1) is 1
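The two rounding steps can be sketched in Python for single precision (n = 24 significand bits including the hidden bit). This is a minimal illustration, not a full implementation: it assumes normalized operands and a normalized result, ignores overflow, underflow and special values, and uses helper names (`fields`, `from_fields`, `multiply_single`) invented for this example.

```python
import struct

def fields(x):
    """Round x to single precision and return (S, E, M)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def from_fields(sign, exponent, fraction):
    """Assemble a single-precision number from (S, E, M)."""
    bits = (sign << 31) | (exponent << 23) | fraction
    (x,) = struct.unpack(">f", struct.pack(">I", bits))
    return x

def multiply_single(a, b):
    s1, e1, m1 = fields(a)
    s2, e2, m2 = fields(b)
    sign = s1 ^ s2                       # sign handled separately
    exponent = e1 + e2 - 127             # both inputs are biased: remove one bias
    # unsigned 24 x 24 -> 48-bit product of the significands 1.M
    product = ((1 << 23) | m1) * ((1 << 23) | m2)
    if product >> 47:                    # case 2: x0 = 1, product in [2, 4)
        exponent += 1
        drop = 24
    else:                                # case 1: x0 = 0, product in [1, 2)
        drop = 23
    p = product >> drop                              # candidate significand P
    r = (product >> (drop - 1)) & 1                  # round bit
    s = (product & ((1 << (drop - 1)) - 1)) != 0     # sticky: OR of the rest
    if r and (s or (p & 1)):             # above halfway, or halfway with odd P
        p += 1
        if p >> 24:                      # rounding overflowed to 10.00...0
            p >>= 1
            exponent += 1
    return from_fields(sign, exponent, p & 0x7FFFFF)

print(multiply_single(1.5, 1.5))   # 2.25
```

Because the product of two single-precision values is exact in a Python double, the result can be compared with `struct`'s own rounding of `a * b` as a sanity check.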


Add / Sub

$(s_1 \times 2^{e_1}) + (s_2 \times 2^{e_2}) = s_3 \times 2^{e_3}$

1: Shift summands so they have the same exponent.

(e.g. if e2 < e1: shift s2 right and increment e2 until e1 = e2)

2: Add significands

3: Normalize the number (shift s3 left and decrement e3 until MSB = 1; if the addition carried out of the MSB, shift s3 right once and increment e3 instead)

4: Round s3 correctly (under the common assumption that more than 23 / 52 bits are used internally for the addition)

Subtraction uses the same method
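A minimal Python sketch of the four steps for single precision is given below. It is restricted to two positive, normalized operands, so the only normalization that can occur is the right shift after a carry; the left-shift case arises when the signs differ and is not covered. Three extra internal bits (guard, round, sticky) are an assumption of this sketch, and the helper names are invented for the example.

```python
import struct

def fields(x):
    """Round x to single precision and return (S, E, M)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def from_fields(sign, exponent, fraction):
    """Assemble a single-precision number from (S, E, M)."""
    bits = (sign << 31) | (exponent << 23) | fraction
    (x,) = struct.unpack(">f", struct.pack(">I", bits))
    return x

EXTRA = 3  # guard, round and sticky bits kept below the significand

def add_single(a, b):
    """Add two positive normalized single-precision numbers."""
    _, e1, m1 = fields(a)
    _, e2, m2 = fields(b)
    if e1 < e2:                          # make operand 1 the larger exponent
        e1, m1, e2, m2 = e2, m2, e1, m1
    s1 = ((1 << 23) | m1) << EXTRA
    s2 = ((1 << 23) | m2) << EXTRA
    # 1: align, shifting s2 right and folding lost bits into the sticky bit
    shift = e1 - e2
    sticky = (s2 & ((1 << shift) - 1)) != 0
    s2 = (s2 >> shift) | sticky
    # 2: add significands
    s3, e3 = s1 + s2, e1
    # 3: normalize (a carry out needs one right shift, keeping the sticky bit)
    if s3 >> (24 + EXTRA):
        s3 = (s3 >> 1) | (s3 & 1)
        e3 += 1
    # 4: round to nearest, ties to even
    p, rest = s3 >> EXTRA, s3 & 0b111
    if rest > 0b100 or (rest == 0b100 and p & 1):
        p += 1
        if p >> 24:                      # rounding overflowed to 10.00...0
            p >>= 1
            e3 += 1
    return from_fields(0, e3, p & 0x7FFFFF)

print(add_single(1.5, 2.25))   # 3.75
```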


Division

$(s_1 \times 2^{e_1}) / (s_2 \times 2^{e_2}) = (s_1 / s_2) \times 2^{e_1 - e_2}$

so, divide significands and subtract exponents

Problem: the significand is coded in sign-magnitude, so use unsigned division (different algorithms exist) and take care of the sign separately

Round the (n + 2)-bit significand (n bits plus guard and round bits) to an n-bit significand

Compute new exponent with respect to bias
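The division steps can be sketched in the same style for single precision. Python's integer `divmod` stands in for the hardware division algorithm, which the slides leave open. The sketch assumes normalized operands and a normalized result, and ignores division by zero, overflow and underflow. The helper names are invented for this example.

```python
import struct

def fields(x):
    """Round x to single precision and return (S, E, M)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def from_fields(sign, exponent, fraction):
    """Assemble a single-precision number from (S, E, M)."""
    bits = (sign << 31) | (exponent << 23) | fraction
    (x,) = struct.unpack(">f", struct.pack(">I", bits))
    return x

def divide_single(a, b):
    s1, e1, m1 = fields(a)
    s2, e2, m2 = fields(b)
    sign = s1 ^ s2                       # sign handled separately
    exponent = e1 - e2 + 127             # the biases cancel: add one back
    n = (1 << 23) | m1                   # unsigned significands 1.M
    d = (1 << 23) | m2
    # quotient with 25 fraction bits; n/d lies strictly between 0.5 and 2
    q, rem = divmod(n << 25, d)
    if q < (1 << 25):                    # n/d < 1: normalize by one position
        exponent -= 1
        p, r = q >> 1, q & 1
        sticky = rem != 0
    else:                                # n/d >= 1: already normalized
        p, r = q >> 2, (q >> 1) & 1
        sticky = bool(q & 1) or rem != 0
    if r and (sticky or (p & 1)):        # round to nearest, ties to even
        p += 1
        if p >> 24:                      # rounding overflowed to 10.00...0
            p >>= 1
            exponent += 1
    return from_fields(sign, exponent, p & 0x7FFFFF)

print(divide_single(4.5, 1.5))   # 3.0
```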
