Datorteknik FloatingPoint bild 1
Floating point
Number system corresponding to the decimal notation
$1.837 \times 10^{4}$ (significand 1.837, exponent 4)
A great number of corresponding binary standards exist;
there is one common standard:
IEEE 754-1985 (IEC 559)
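As a quick illustration (an example chosen for this transcript, not from the slides), Python's standard `math.frexp` splits a value into a binary significand and exponent, the base-2 counterpart of the decimal notation above. Note that `frexp` normalizes to $0.5 \le m < 1$, not to the $1.M$ form used by IEEE 754, so the sketch also shows the rescaled form:

```python
import math

x = 1.837e4           # decimal notation: significand 1.837, exponent 4
m, e = math.frexp(x)  # binary: x == m * 2**e, with 0.5 <= m < 1
print(m, e)           # 0.56060791015625 15

# Rewritten in the 1.M form used by IEEE 754: x == (2*m) * 2**(e - 1)
print(2 * m, e - 1)   # 1.1212158203125 14
```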
IEEE 754-1985
Number representation
Single precision (32 bits)
sign: 1 bit
exponent: 8 bits
fraction: 23 bits
Double precision (64 bits)
sign: 1 bit
exponent: 11 bits
fraction: 52 bits
Single extended and double extended formats also exist, mainly used inside the floating-point hardware
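To see the field widths in practice, the sketch below reinterprets a Python float as a 32- or 64-bit integer with the standard `struct` module and splits the bits into sign, exponent and fraction. The function name `bit_fields` and the `FORMATS` table are hypothetical helpers for illustration:

```python
import struct

# (exponent bits, fraction bits, struct float code, struct unsigned-int code)
FORMATS = {
    "single": (8, 23, ">f", ">I"),   # 1 + 8 + 23 = 32 bits
    "double": (11, 52, ">d", ">Q"),  # 1 + 11 + 52 = 64 bits
}

def bit_fields(x: float, fmt: str = "single") -> str:
    """Return the IEEE 754 bit pattern of x as 'S E...E M...M'."""
    e_bits, f_bits, float_code, int_code = FORMATS[fmt]
    # Pack as a big-endian float, then read the same bytes back as an unsigned integer
    (bits,) = struct.unpack(int_code, struct.pack(float_code, x))
    pattern = format(bits, f"0{1 + e_bits + f_bits}b")
    return f"{pattern[0]} {pattern[1:1 + e_bits]} {pattern[1 + e_bits:]}"
```

For example, `bit_fields(-1.5)` gives sign `1`, exponent `01111111` and a fraction of `1` followed by 22 zeros; with `"double"` the exponent field becomes the 11 bits `01111111111`.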
IEEE 754-1985
Field layout: S (sign, 1 bit) | E (exponent, 8 bits) | M (mantissa, 23 bits)
exponent: excess-127 binary integer
mantissa: sign + magnitude, normalized binary significand with a hidden integer bit: 1.M
Single precision:
the actual exponent is $e = E - 127$
$N = (-1)^S \times 2^{E-127} \times (1.M)$, for $0 < E < 255$
Examples: 0 = 0 00000000 0...0 (zero is a special encoding with $E = 0$); -1.5 = 1 01111111 10...0
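The formula $N = (-1)^S \times 2^{E-127} \times (1.M)$ can be evaluated directly from the three bit fields. A minimal sketch, assuming a normalized 32-bit pattern; `single_bits` and `decode_single` are hypothetical helper names, and `struct` is only used to obtain real bit patterns for comparison:

```python
import struct

def single_bits(x: float) -> int:
    """Bit pattern of x rounded to single precision, as an unsigned 32-bit integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def decode_single(bits: int) -> float:
    """Evaluate N = (-1)^S * 2^(E-127) * (1.M) for a normalized 32-bit pattern."""
    S = (bits >> 31) & 0x1       # sign bit
    E = (bits >> 23) & 0xFF      # 8-bit biased exponent (excess 127)
    M = bits & 0x7FFFFF          # 23-bit fraction
    if not 0 < E < 255:
        raise ValueError("only normalized numbers (0 < E < 255) are handled here")
    significand = 1 + M / 2**23  # restore the hidden integer bit: 1.M
    return (-1) ** S * 2.0 ** (E - 127) * significand

print(hex(single_bits(-1.5)))     # 0xbfc00000
print(decode_single(0xBFC00000))  # -1.5
```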
The magnitude of normalized numbers that can be represented is in the range:
$2^{-126} \times (1.0)$ to $2^{127} \times (2 - 2^{-23})$
which is approximately:
$1.2 \times 10^{-38}$ to $3.40 \times 10^{38}$
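Both end points of the range can be checked numerically. The sketch below computes them from the formula and compares them with the bit patterns for $E = 1, M = 0$ and $E = 254, M = 1\ldots1$; `from_bits` is a hypothetical helper built on the standard `struct` module:

```python
import struct

def from_bits(bits: int) -> float:
    """Interpret an unsigned 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# Range end points computed from N = 2^(E-127) * (1.M)
smallest_normal = 2.0 ** -126 * 1.0       # E = 1,   M = 0...0
largest = 2.0 ** 127 * (2 - 2.0 ** -23)   # E = 254, M = 1...1

# The same values read back from their bit patterns
assert from_bits(0x00800000) == smallest_normal
assert from_bits(0x7F7FFFFF) == largest
print(f"{smallest_normal:.3g} to {largest:.3g}")  # 1.18e-38 to 3.4e+38
```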
IEEE 754-1985
Fraction part: 23 / 52 bits; $0 \le x < 1$
Significand: 1 + fraction part
the leading “1” is not stored (the “hidden bit”)
corresponds to about 7 and 16 decimal digits, respectively
Exponent: 127 / 1023 is added to the exponent (“biased exponent”)
corresponds to roughly $10^{-38}$ to $10^{38}$ and $10^{-308}$ to $10^{308}$, respectively
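The bias, the decimal precision and the decimal range all follow from the field widths. A minimal sketch (the function name `format_info` is hypothetical) that derives them for both formats, counting the hidden bit in the precision:

```python
import math

def format_info(fraction_bits: int, exponent_bits: int) -> tuple[int, float, float, float]:
    """Return (bias, decimal digits, log10 of smallest and largest normalized magnitude)."""
    bias = 2 ** (exponent_bits - 1) - 1    # 127 for 8 bits, 1023 for 11 bits
    precision = fraction_bits + 1          # + 1 for the hidden bit
    digits = precision * math.log10(2)     # equivalent number of decimal digits
    e_min, e_max = 1 - bias, bias          # exponent range of normalized numbers
    lo = e_min * math.log10(2)             # log10(2^e_min * 1.0)
    hi = e_max * math.log10(2) + math.log10(2 - 2.0 ** -fraction_bits)
    return bias, digits, lo, hi

for name, fmt in (("single", (23, 8)), ("double", (52, 11))):
    bias, digits, lo, hi = format_info(*fmt)
    print(f"{name}: bias {bias}, {digits:.1f} digits, 10^{lo:.1f} to 10^{hi:.1f}")
```

The printed exponents are fractional because the exact limits are $1.18 \times 10^{-38}$ / $3.40 \times 10^{38}$ and $2.2 \times 10^{-308}$ / $1.8 \times 10^{308}$, which the slide rounds to powers of ten.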
IEEE 754-1985
Special features:
Correct rounding of “halfway” results (round to the even number)
Includes special values:
NaN Not a number
∞ Infinity
-∞ Negative infinity
Uses denormal numbers to represent numbers smaller in magnitude than $2^{E_{min}}$
Rounds to nearest by default; three other rounding modes exist (toward +∞, toward -∞ and toward zero).
Sophisticated exception handling
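The special values are recognized from the exponent field: $E = 255$ encodes infinity ($M = 0$) or NaN ($M \ne 0$), and $E = 0$ encodes zero ($M = 0$) or a denormal number ($M \ne 0$, value $(-1)^S \times 2^{-126} \times (0.M)$, with no hidden bit). A small classifier for 32-bit patterns, where `classify_single` is a hypothetical name used only for illustration:

```python
def classify_single(bits: int) -> str:
    """Classify a 32-bit IEEE 754 pattern into its value category."""
    sign = "-" if (bits >> 31) & 1 else "+"
    E = (bits >> 23) & 0xFF   # biased exponent
    M = bits & 0x7FFFFF       # fraction
    if E == 0xFF:
        # All-ones exponent: infinity if the fraction is zero, otherwise NaN
        return "NaN" if M else sign + "infinity"
    if E == 0:
        # All-zeros exponent: zero, or a denormal (no hidden bit, exponent -126)
        return sign + "zero" if M == 0 else sign + "denormal"
    return sign + "normal"

print(classify_single(0x00000001))  # +denormal (smallest positive denormal, 2^-149)
```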
Multiplication
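$(s_1 \times 2^{e_1}) \times (s_2 \times 2^{e_2}) = (s_1 \times s_2) \times 2^{e_1+e_2}$
so, multiply the significands and add the exponents
Problem: the significand is coded in sign-magnitude form; use unsigned multiplication and handle the sign separately
Round the 2n-bit significand product to an n-bit significand
Compute the new exponent with respect to the bias (with biased exponents, $E = E_1 + E_2 - 127$ in single precision)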
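The steps above can be sketched for single precision as follows. This is an illustrative sketch, assuming normalized inputs and result (no overflow, underflow or special values) and round to nearest even; `fields` and `mul_single` are hypothetical names, and `struct` is used only to get at the bit patterns:

```python
import struct

BIAS = 127

def fields(x: float) -> tuple[int, int, int]:
    """Sign, biased exponent and 24-bit significand (hidden bit restored) of x as a single."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, (1 << 23) | (bits & 0x7FFFFF)

def mul_single(a: float, b: float) -> float:
    """Multiply two normalized single-precision numbers field by field (sketch)."""
    s1, e1, m1 = fields(a)
    s2, e2, m2 = fields(b)
    sign = s1 ^ s2                  # sign handled separately (sign-magnitude)
    product = m1 * m2               # unsigned 48-bit (2n-bit) product
    exp = e1 + e2 - BIAS            # add exponents, subtract one bias
    if product >> 47:               # significand product in [2, 4): increment exponent
        shift, exp = 24, exp + 1
    else:                           # significand product in [1, 2)
        shift = 23
    P = product >> shift            # keep n = 24 bits
    rest = product & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if rest > half or (rest == half and P & 1):  # round to nearest, ties to even
        P += 1
        if P >> 24:                 # rounding carried out: renormalize
            P, exp = P >> 1, exp + 1
    bits = (sign << 31) | (exp << 23) | (P & 0x7FFFFF)
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(mul_single(1.5, -2.25))       # -3.375
```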
Rounding
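1. Multiply the two significands to get the 2n-bit product (n = 6 in this example); P is the upper half and A the lower half:
P = x0 x1 x2 x3 x4 x5, A = g r s s s s
g is the guard bit, r the round bit, and the four s bits ORed together form the “sticky bit” s.
Case 1: x0 = 0 (product in [1, 2)), a left shift is needed:
P = x1 x2 x3 x4 x5 g, followed by the round bit r and the sticky bits s s s s
Case 2: x0 = 1 (product in [2, 4)), increment the exponent; the old g becomes the new round bit r, and the new sticky bit is s = r OR s:
P = x0 x1 x2 x3 x4 x5, followed by the new round bit (old g) and the sticky bits r s s s s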
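Step 1 can be sketched on plain integers: take the 2n-bit product, check x0, and extract P, the round bit and the sticky bit for the matching case. The function name `split_product` is hypothetical, and the product is assumed to come from two normalized n-bit significands:

```python
def split_product(product: int, n: int) -> tuple[int, int, int, int]:
    """Step 1: split a 2n-bit significand product into (P, r, s, exponent increment).

    P is the n-bit result before rounding, r the round bit and s the sticky bit
    (OR of every bit below r).
    """
    x0 = (product >> (2 * n - 1)) & 1
    if x0 == 0:
        # Case 1: shift left one step, so g becomes the last bit of P
        shift, exp_inc = n - 1, 0
    else:
        # Case 2: keep x0..x(n-1); the old g becomes r and the old r joins the sticky bit
        shift, exp_inc = n, 1
    P = product >> shift
    r = (product >> (shift - 1)) & 1
    s = int((product & ((1 << (shift - 1)) - 1)) != 0)
    return P, r, s, exp_inc

# Example with n = 6: x0..x5 = 011010, g r = 11, sticky bits = 0101
print(split_product(0b011010110101, 6))  # (53, 1, 1, 0)
```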
Rounding
2. For both cases:
if r = 0, P is the correctly rounded product.
if r = 1 and s = 1, then P + 1 is the correctly rounded product
if r = 1 and s = 0 (the “halfway case”), then
P is the correctly rounded product if the last bit of P (x5 in case 2, g in case 1) is 0
P + 1 is the correctly rounded product if that bit is 1
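The decision in step 2 is round to nearest, ties to even. A minimal sketch that takes P, r and s as plain integers (the function name is hypothetical):

```python
def round_to_nearest_even(P: int, r: int, s: int) -> int:
    """Step 2: choose between P and P + 1 from the round bit r and sticky bit s."""
    if r == 0:
        return P             # less than halfway: P is already correct
    if s == 1:
        return P + 1         # more than halfway: round up
    # Halfway case (r = 1, s = 0): round to the neighbour whose last bit is 0
    return P + (P & 1)

# P + 1 can carry out to 2**n; the result must then be renormalized
print(round_to_nearest_even(0b101101, 1, 0))  # 46 (0b101110)
```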
Add / Sub
$(s_1 \times 2^{e_1}) + (s_2 \times 2^{e_2}) = s_3 \times 2^{e_3}$
1: Shift the summands so they have the same exponent
(e.g. if $e_2 < e_1$: shift $s_2$ right and increment $e_2$ until $e_1 = e_2$)
2: Add the significands
3: Normalize the result (if the addition carried out, shift $s_3$ right one step and increment $e_3$; otherwise shift $s_3$ left and decrement $e_3$ until MSB = 1)
4: Round $s_3$ correctly (under the common assumption that more than 23 / 52 bits are used internally for the addition)
Subtraction uses the same method
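The four steps can be sketched for the simplest case: two positive, normalized single-precision numbers. As an assumption of this sketch, three extra bits (guard, round, sticky) are kept during the addition, and bits shifted out during alignment are ORed into the sticky bit; overflow is not handled, and subtraction (which needs the left-shift normalization) is left out. `fields` and `add_single_positive` are hypothetical names:

```python
import struct

def fields(x: float) -> tuple[int, int]:
    """Biased exponent and 24-bit significand (hidden bit restored) of x as a single."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return (bits >> 23) & 0xFF, (1 << 23) | (bits & 0x7FFFFF)

def add_single_positive(a: float, b: float) -> float:
    """Add two positive, normalized single-precision numbers (sketch)."""
    e1, s1 = fields(a)
    e2, s2 = fields(b)
    if e1 < e2:                                 # operand 1 gets the larger exponent
        (e1, s1), (e2, s2) = (e2, s2), (e1, s1)
    s1, s2 = s1 << 3, s2 << 3                   # 3 extra bits: guard, round, sticky
    # 1: align, shifting s2 right until e2 == e1 and ORing lost bits into the sticky bit
    d = e1 - e2
    lost = s2 & ((1 << d) - 1)
    s2 = (s2 >> d) | (1 if lost else 0)
    # 2: add the significands
    s3, e3 = s1 + s2, e1
    # 3: normalize; for two positive operands only a carry-out (one right shift) can occur
    if s3 >> 27:
        s3, e3 = (s3 >> 1) | (s3 & 1), e3 + 1
    # 4: round to nearest even using the 3 extra bits
    low, s3 = s3 & 0b111, s3 >> 3
    if low > 0b100 or (low == 0b100 and s3 & 1):
        s3 += 1
        if s3 >> 24:                            # rounding carried out: renormalize
            s3, e3 = s3 >> 1, e3 + 1
    bits = (e3 << 23) | (s3 & 0x7FFFFF)
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(add_single_positive(1.5, 2.25))           # 3.75
```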
Division
$(s_1 \times 2^{e_1}) / (s_2 \times 2^{e_2}) = (s_1 / s_2) \times 2^{e_1 - e_2}$
so, divide the significands and subtract the exponents
Problem: the significand is coded in sign-magnitude form; use unsigned division (different algorithms exist) and handle the sign separately
Round the (n + 2)-bit significand (n bits plus guard and round bits) to an n-bit significand
Compute the new exponent with respect to the bias (with biased exponents, $E = E_1 - E_2 + 127$ in single precision)
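A sketch of single-precision division along these lines, using Python's integer `divmod` as the unsigned division algorithm. Beyond what the slide states, this sketch also treats a nonzero remainder as a sticky bit, which is what makes the halfway decision exact; inputs are assumed normalized with a nonzero divisor and a normalized result. `fields` and `div_single` are hypothetical names:

```python
import struct

BIAS = 127

def fields(x: float) -> tuple[int, int, int]:
    """Sign, biased exponent and 24-bit significand (hidden bit restored) of x as a single."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, (1 << 23) | (bits & 0x7FFFFF)

def div_single(a: float, b: float) -> float:
    """Divide two normalized single-precision numbers (sketch)."""
    s1, e1, m1 = fields(a)
    s2, e2, m2 = fields(b)
    sign = s1 ^ s2                  # sign handled separately
    exp = e1 - e2 + BIAS            # subtract exponents, add the bias back once
    if m1 < m2:                     # quotient would be below 1: pre-normalize
        m1, exp = m1 << 1, exp - 1
    # Unsigned division giving n + 2 = 26 quotient bits: 24 result bits, guard, round
    q, rem = divmod(m1 << 25, m2)
    P, guard, rnd = q >> 2, (q >> 1) & 1, q & 1
    sticky = rem != 0               # nonzero remainder: true quotient lies above q
    if guard and (rnd or sticky or P & 1):  # round to nearest, ties to even
        P += 1
        if P >> 24:                 # rounding carried out: renormalize
            P, exp = P >> 1, exp + 1
    bits = (sign << 31) | (exp << 23) | (P & 0x7FFFFF)
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(div_single(1.0, 3.0))         # 0.3333333432674408
```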