IEEE Floating Point Revision Guide for Phase Test Week 5

Post on 19-Dec-2015

Floating Point

15900000000000000 could be represented as

159 * 10^14 (mantissa 159, exponent 14)

15.9 * 10^15

1.59 * 10^16

A calculator might display 159 E14

Binary Fractions

The value of real binary numbers…

Scientific   2^2   2^1   2^0   .   2^-1   2^-2   2^-3
Fractions                      .   ½      ¼      ⅛
Decimal      4     2     1     .   .5     .25    .125
Binary       1     0     1     .   1      0      1

101.101 = 4 + 1 + 1/2 + 1/8 = 4 + 1 + .5 + .125 = 5.625 = 5 ⅝
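The place-value sum above can be checked with a short Python sketch. The function name `binary_to_decimal` is made up for illustration, and exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction

def binary_to_decimal(text):
    """Evaluate a binary string such as '101.101' by summing place values."""
    whole, _, frac = text.partition(".")
    total = Fraction(0)
    # Integer digits carry weights ... 2^2, 2^1, 2^0 (read right to left).
    for power, digit in enumerate(reversed(whole)):
        total += int(digit) * Fraction(2) ** power
    # Fraction digits carry weights 2^-1, 2^-2, 2^-3 ... (read left to right).
    for power, digit in enumerate(frac, start=1):
        total += int(digit) * Fraction(1, 2 ** power)
    return total

print(float(binary_to_decimal("101.101")))  # 5.625
```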

IEEE Single Precision

The number will occupy 32 bits

The first bit represents the sign of the number: 1 = negative, 0 = positive.

The next 8 bits will specify the exponent stored in biased 127 form.

The remaining 23 bits will carry the mantissa normalised to be between 1 and 2.

i.e. 1 <= mantissa < 2
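As a rough illustration of the 1 + 8 + 23 bit layout, the sketch below uses Python's standard `struct` module to view a value as a 32 bit pattern and split it into its three fields. The helper name `single_precision_fields` is an assumption, not part of any library.

```python
import struct

def single_precision_fields(value):
    """Return (sign, exponent_bits, mantissa_bits) of a 32-bit float."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an unsigned int.
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]

sign, exponent, mantissa = single_precision_fields(-0.5)
print(sign, exponent, mantissa)  # 1 01111110 00000000000000000000000
```

Here -0.5 is -1.0 * 2^-1, so the sign bit is 1 and the stored exponent is -1 + 127 = 126.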

Basic Conversion

Converting a decimal number to a floating point number.

1. Take the integer part of the number and generate the binary equivalent.

2. Take the fractional part and generate a binary fraction.

3. Then place the two parts together and normalise.
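The three steps can be sketched in Python as follows. The function names are illustrative only, and the sketch assumes a positive, non-zero number; 5.625 is used as the test value.

```python
def integer_to_binary(n):
    """Step 1: repeated division by 2, reading the remainders last to first."""
    digits = ""
    while n > 0:
        n, remainder = divmod(n, 2)
        digits = str(remainder) + digits
    return digits or "0"

def fraction_to_binary(frac, max_bits=23):
    """Step 2: repeated multiplication by 2, taking the integer part each time."""
    digits = ""
    while frac > 0 and len(digits) < max_bits:
        frac *= 2
        bit = int(frac)
        digits += str(bit)
        frac -= bit
    return digits

def normalise(whole_bits, frac_bits):
    """Step 3: move the point to just after the leading 1."""
    if "1" in whole_bits:
        whole_bits = whole_bits.lstrip("0")
        exponent = len(whole_bits) - 1
        digits = whole_bits + frac_bits
    else:
        first_one = frac_bits.index("1")  # raises ValueError for zero
        exponent = -(first_one + 1)
        digits = frac_bits[first_one:]
    return digits[0] + "." + (digits[1:] or "0"), exponent

whole = integer_to_binary(5)        # '101'
frac = fraction_to_binary(0.625)    # '101'
print(normalise(whole, frac))       # ('1.01101', 2)
```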

IEEE – Example 1

Convert 6.75 to 32 bit IEEE format.

1. The Mantissa. The integer first.

   6 / 2 = 3 r 0
   3 / 2 = 1 r 1
   1 / 2 = 0 r 1

   = 110 in binary (read the remainders from last to first)

2. Fraction next.

   .75 * 2 = 1.5
   .5 * 2 = 1.0

   = 0.11 in binary (read the integer digits from first to last)

3. Put the two parts together… 110.11

   Now normalise: 1.1011 * 2^2

IEEE Biased 127 Exponent

To generate a biased 127 exponent, take the value of the signed exponent and add 127.

Example.

2^16 becomes 2^(127+16) = 2^143, and my value for the exponent would be 143 = 10001111 in binary.

So it is now simply an unsigned value ....

Possible Representations of an Exponent

Binary      Sign Magnitude   2's Complement   Biased 127 Exponent
00000000    0                0                -127 {reserved}
00000001    1                1                -126
00000010    2                2                -125
01111110    126              126              -1
01111111    127              127              0
10000000    -0               -128             1
10000001    -1               -127             2
11111110    -126             -2               127
11111111    -127             -1               128 {reserved}
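The three readings of an 8 bit pattern can be compared with a small Python sketch. The function names are made up for illustration. Note that Python integers have no negative zero, so the sign magnitude reading of 10000000 comes out as plain 0.

```python
def sign_magnitude(pattern):
    """First bit is the sign, the remaining 7 bits are the magnitude."""
    magnitude = int(pattern[1:], 2)
    return -magnitude if pattern[0] == "1" else magnitude

def twos_complement(pattern):
    """Patterns with the top bit set represent value - 256."""
    value = int(pattern, 2)
    return value - 256 if pattern[0] == "1" else value

def biased_127(pattern):
    """Read the pattern as unsigned, then subtract the bias of 127."""
    return int(pattern, 2) - 127

for pattern in ("00000001", "01111111", "10000000", "11111110"):
    print(pattern, sign_magnitude(pattern), twos_complement(pattern), biased_127(pattern))
```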

Why Biased ?

The smallest exponent is 00000000.

There is only one pattern for an exponent of zero: 01111111.

The highest exponent is 11111111.

To increase the exponent by one simply add 1 to the present pattern.

Back to the example

Our original example revisited…. 1.1011 * 2^2

Exponent is 2 + 127 = 129 or 10000001 in binary.

NOTE: The mantissa always ends up with a value of ‘1’ before the dot. Storing it would be a waste of storage, therefore it is implied but not actually stored. 1.1000 is stored as .1000

6.75 in 32 bit floating point IEEE representation:-

0 10000001 10110000000000000000000

sign(1) exponent(8) mantissa(23)
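As a cross-check, the sketch below assembles the three fields by hand and compares the result with the bits Python's standard `struct` module produces for 6.75. The variable names are illustrative.

```python
import struct

sign = "0"                           # positive number
exponent = format(2 + 127, "08b")    # signed exponent 2 plus the bias = 129
mantissa = "1011".ljust(23, "0")     # 1.1011 with the implied leading 1 dropped

pattern = sign + exponent + mantissa

# Compare with the 32-bit pattern Python generates for the same value.
(raw,) = struct.unpack(">I", struct.pack(">f", 6.75))
print(pattern == format(raw, "032b"))  # True
```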

Special cases

0, + infinity and - infinity.

Zero is a pattern that only contains ‘0’s

00000000000000000000000000000000

Positive infinity is the pattern

011111111….

Negative infinity is the pattern

111111111….
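To see the complete patterns, a minimal sketch using Python's standard `struct` module prints the 32 bits for each special value, split into sign, exponent and mantissa. The helper name `bit_pattern` is an assumption.

```python
import struct

def bit_pattern(value):
    """Return the 32-bit pattern of a float as 'sign exponent mantissa'."""
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    bits = format(raw, "032b")
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"

for value in (0.0, float("inf"), float("-inf")):
    print(value, bit_pattern(value))
```

Both infinities have an exponent field of all 1s and a mantissa field of all 0s; only the sign bit differs.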

Truncation and Rounding

Following arithmetic operations on a floating point number, we may have increased the number of mantissa bits.

Since we have fixed storage (23 places) for the mantissa, we need to limit these bits.

The simplest approach is to truncate the result prior to storage.

Example: 0.1101101 stored in 4 bits => 0.1101 (loss 0.0000101)

Rounding

If the lost digits are > ½ of the LSB then add 1 to the LSB.

Example – in 4 bits

0.1101101 => 0.1101 + 0.0001 = 0.1110 (rounded UP)

0.1101011 => 0.1101 (rounded DOWN)

NOTE:

Rounding is always preferred to truncation, partly because it is intrinsically more accurate, and partly because we end up with a FAIR error.
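Truncation and rounding of a binary fraction can be sketched on bit strings as below. The function names are illustrative, the input is the run of bits after the point, and the sketch does not handle a carry out of the top kept bit (for example 1111 + 1).

```python
def truncate(bits, places):
    """Keep the first `places` fraction bits and drop the rest."""
    return bits[:places]

def round_bits(bits, places):
    """Add 1 to the LSB when the dropped bits are more than half an LSB."""
    kept, lost = bits[:places], bits[places:]
    half = "1" + "0" * (len(lost) - 1)   # exactly half of one LSB
    # Bit strings of equal length compare in the same order as their values.
    if lost and lost > half:
        kept = format(int(kept, 2) + 1, f"0{places}b")
    return kept

print(truncate("1101101", 4))    # 1101
print(round_bits("1101101", 4))  # 1110 (lost bits 101 are more than half)
print(round_bits("1101011", 4))  # 1101 (lost bits 011 are less than half)
```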

Other Considerations

Truncation always undervalues the magnitude of the result, and can lead to a systematic error.

Rounding has one major disadvantage: it requires up to two further arithmetic operations.

Note: When we use floating point, care has to be taken when comparing numbers, because we are generating binary fractions of a predefined length. There is always the chance of recurring fractions, like 1/3 in decimal (0.333333333333333333333…).
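A common illustration of the comparison problem, shown here in Python: 0.1 and 0.2 are recurring fractions in binary, so their stored sum is not exactly the stored 0.3. Python floats are 64 bit double precision rather than the 32 bit format in this guide, but the effect is the same. Comparing within a tolerance, here with the standard `math.isclose`, is one usual workaround.

```python
import math

total = 0.1 + 0.2
print(total == 0.3)              # False: the stored values differ in the last bits
print(math.isclose(total, 0.3))  # True: equal within a small relative tolerance
```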

From Floating Point Binary to Decimal

Example: 1 01111011 11100000100000000000000

Sign = 1, therefore this number is a negative number.

Exponent 01111011 = 64+32+16+8+2+1 = 123

Subtract the 127 = -4

Mantissa = 1.111000001

1.111000001 * 2^-4 = -ve 0.0001111000001

= -(1/16 + 1/32 + 1/64 + 1/128 + 1/8192)

or -0.1173095703125
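The decoding steps above can be sketched in Python. The function name `decode_single` is made up for illustration, and the sketch only covers normalised numbers; the reserved exponent patterns are not handled.

```python
from fractions import Fraction

def decode_single(pattern):
    """Decode a normalised pattern written as 'sign exponent mantissa'."""
    sign_bit, exponent_bits, mantissa_bits = pattern.split()
    exponent = int(exponent_bits, 2) - 127   # remove the bias
    # Restore the implied leading 1, then add the stored fraction bits.
    mantissa = 1 + Fraction(int(mantissa_bits, 2), 2 ** len(mantissa_bits))
    value = mantissa * Fraction(2) ** exponent
    return -value if sign_bit == "1" else value

print(float(decode_single("1 01111011 11100000100000000000000")))  # -0.1173095703125
```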

Floating Point Maths

Floating point addition and subtraction.

1. Make sure that the two numbers are of the same magnitude. Their exponents have to be equal.

2. We then add or subtract the mantissas.

3. Starting with the existing exponent, re-normalise if needed.

Example

1.1 * 2^3 + 1.1 * 2^2

Select the smaller number and make the mantissa smaller by moving the point whilst increasing the exponent until the exponents match.

1.1 * 2^2 => 0.11 * 2^3

Add the mantissas.

Re-normalise.

Example

  1.1 * 2^3      001.1  * 2^3

+ 1.1 * 2^2      000.11 * 2^3

               = 010.01 * 2^3

Re-normalise 010.01 * 2^3

= 1.001 * 2^4
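The align, add and re-normalise steps can be sketched in Python with exact fractions for the mantissas. The function name `fp_add` is illustrative, and the sketch ignores the fixed 23 bit storage limit.

```python
from fractions import Fraction

def fp_add(mant_a, exp_a, mant_b, exp_b):
    """Add two (mantissa, exponent) pairs: align, add, re-normalise."""
    # Step 1: shift the number with the smaller exponent until the exponents match.
    while exp_a < exp_b:
        mant_a, exp_a = mant_a / 2, exp_a + 1
    while exp_b < exp_a:
        mant_b, exp_b = mant_b / 2, exp_b + 1
    # Step 2: add the mantissas.
    mantissa, exponent = mant_a + mant_b, exp_a
    # Step 3: re-normalise so that 1 <= |mantissa| < 2.
    while abs(mantissa) >= 2:
        mantissa, exponent = mantissa / 2, exponent + 1
    while 0 < abs(mantissa) < 1:
        mantissa, exponent = mantissa * 2, exponent - 1
    return mantissa, exponent

one_point_one = Fraction(3, 2)                     # binary 1.1
print(fp_add(one_point_one, 3, one_point_one, 2))  # (Fraction(9, 8), 4)
```

The result 9/8 is binary 1.001, so the output corresponds to 1.001 * 2^4.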

FP math

Floating Point Multiplication

Assume two numbers a x 2^m and b x 2^n

Result: (a x 2^m) x (b x 2^n) = (a x b) x 2^(m+n)

Floating Point Division

Assume two numbers a x 2^m and b x 2^n

Result: (a x 2^m) / (b x 2^n) = (a / b) x 2^(m-n)
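The two rules can be sketched in Python as below. The function names are illustrative, and the final re-normalising step is an assumption added here so that the result has a mantissa between 1 and 2; the rules above do not mention it.

```python
from fractions import Fraction

def renormalise(mantissa, exponent):
    """Shift until 1 <= |mantissa| < 2 (assumes a non-zero mantissa)."""
    while abs(mantissa) >= 2:
        mantissa, exponent = mantissa / 2, exponent + 1
    while abs(mantissa) < 1:
        mantissa, exponent = mantissa * 2, exponent - 1
    return mantissa, exponent

def fp_multiply(a, m, b, n):
    """(a x 2^m) x (b x 2^n) = (a x b) x 2^(m+n)"""
    return renormalise(a * b, m + n)

def fp_divide(a, m, b, n):
    """(a x 2^m) / (b x 2^n) = (a / b) x 2^(m-n)"""
    return renormalise(a / b, m - n)

x = Fraction(3, 2)               # binary 1.1
print(fp_multiply(x, 3, x, 2))   # (Fraction(9, 8), 6), i.e. 12 * 6 = 72
print(fp_divide(x, 3, x, 2))     # (Fraction(1, 1), 1), i.e. 12 / 6 = 2
```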