CSE 246: Computer Arithmetic Algorithms and Hardware Design
Instructor: Prof. Chung-Kuan Cheng
Winter 2004
Lecture 9
Topics:
Floating Point Numbers (IEEE P754)
Standard
Operations
Exceptional Situations
Rounding Modes
Standard
2^32 typically
Goal: dynamic range = largest # / smallest #
If the range is too large, there are holes (gaps) between representable #’s
Standard
ulp (unit in the last place): difference between two consecutive values of the significand.
3 parts: x = ± s × b^e
Sign bit (±)
8-bit exponent (e)
Significand (s)
Standard
Bit layout: ± a1a2a3a4a5a6a7a8 b1b2b3…b22b23
1.* normalized number, 0.* denormalized number
Exponent field   Value
0                0.b1b2b3…b22b23 × 2^-126
1                1.b1b2b3…b22b23 × 2^-126
2 … 253          …
254              1.b1b2b3…b22b23 × 2^127
255              ±∞ if bi = 0 for all i = 1,2,…,23; NaN otherwise
NaN: Not a Number
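The exponent cases on this slide can be turned into a small decoder. The Python sketch below is only an illustration: the names `decode_single` and `hardware_single` are made up for this example, and it assumes the usual single-precision layout of 1 sign bit, 8 exponent bits (a1…a8) and 23 fraction bits (b1…b23).

```python
import struct

def decode_single(bits: int) -> float:
    """Decode a 32-bit pattern by following the exponent-field cases."""
    sign = -1.0 if (bits >> 31) & 1 else 1.0
    exp = (bits >> 23) & 0xFF      # a1..a8
    frac = bits & 0x7FFFFF         # b1..b23
    if exp == 0:
        # Denormalized: 0.b1...b23 x 2^-126
        return sign * (frac / 2**23) * 2.0**-126
    if exp == 255:
        # All fraction bits zero: infinity, otherwise NaN
        return sign * float("inf") if frac == 0 else float("nan")
    # Normalized: 1.b1...b23 x 2^(exp - 127)
    return sign * (1 + frac / 2**23) * 2.0**(exp - 127)

def hardware_single(bits: int) -> float:
    """Let the struct module interpret the same 32 bits, for comparison."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

if __name__ == "__main__":
    for pattern in (0x00000001, 0x00800000, 0x3F800000, 0x7F7FFFFF):
        print(f"{pattern:#010x}", decode_single(pattern), hardware_single(pattern))
```

Comparing the two functions on a few bit patterns is a quick way to check that the exponent cases were read correctly.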
Standard
0.01 × 2^-3 = 0.001 × 2^-2
Same number, so normalize to remove redundancy
Smallest number: 0.00…01 × 2^-126 = 1.0 × 2^-23 × 2^-126 = 1 × 2^-149
1.1101111001110011100101
Normalized: the difference between 2 consecutive #’s is small compared to their magnitudes
0.0001
0.0010
Denormalized: consecutive #’s can differ by a factor of 2
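The point about spacing can be checked numerically. This is a minimal sketch, assuming IEEE 754 single precision and using two hypothetical helpers, `single_from_bits` and `relative_gap`, that are not part of the lecture. Adding 1 to the bit pattern of a positive number gives the next larger representable number.

```python
import struct

def single_from_bits(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single-precision number."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def relative_gap(bits: int) -> float:
    """Gap to the next representable number, divided by the number itself."""
    x = single_from_bits(bits)
    y = single_from_bits(bits + 1)   # next bit pattern = next larger positive number
    return (y - x) / x

normalized_gap = relative_gap(0x3F800000)     # from 1.0 to 1.0 + 2^-23
denormalized_gap = relative_gap(0x00000001)   # from 2^-149 to 2^-148

if __name__ == "__main__":
    print("normalized relative gap:  ", normalized_gap)
    print("denormalized relative gap:", denormalized_gap)
```

For the normalized number the gap is a tiny fraction of the value, while the smallest denormalized number is followed by a value twice as large.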
Standard - Examples
±.eeeeeeee nnnnnnnnnnnnnnnnnnnnnnn
0.00000000 00000000000000000000000 = 0.000…0 × 2^-126 = +0
1.00000000 00000000000000000000000 = -0
0.00000001 00000000000000000000000 = 1.000…0 × 2^-126 - minimal normalized #
0.00000001 00000000000000000000001 = 1.000…1 × 2^-126
...
0.01111111 00000000000000000000001 = 1.000…1 × 2^0
0.10000000 00000000000000000000001 = 1.000…1 × 2^1
Standard - Examples (cont.)
0.11111110 00000000000000000000001 = 1.000…1 × 2^127
0.11111110 11111111111111111111111 = 1.111…1 × 2^127 - normalized maximum
0.11111111 00000000000000000000000 = +∞
Nmin = 1.0 × 2^-126
Nmax = (2 - 2^-23) × 2^127
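As a quick numeric check of Nmin and Nmax, and of the dynamic range goal stated earlier (largest # / smallest #), here is a short Python sketch. The constant names are chosen for this example only, and the range is taken over normalized numbers.

```python
import math

N_MIN = 2.0**-126                    # 1.0 x 2^-126
N_MAX = (2 - 2.0**-23) * 2.0**127    # 1.11...1 x 2^127
DYNAMIC_RANGE = N_MAX / N_MIN        # largest normalized # / smallest normalized #

if __name__ == "__main__":
    print(f"Nmin  = {N_MIN:.6e}")
    print(f"Nmax  = {N_MAX:.6e}")
    print(f"range = 2^{math.log2(DYNAMIC_RANGE):.4f}")
```

The ratio works out to (2 - 2^-23) × 2^253, which is just under 2^254.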
Double Floating Point
Bit layout: ± a1a2…a11 b1b2…b52
Exponent field   Value
000…00           0.b1b2…b52 × 2^-1022
000…01           1.b1b2…b52 × 2^-1022
...
011…11           1.b1b2…b52 × 2^0
100…00           1.b1b2…b52 × 2^1
...
111…10           1.b1b2…b52 × 2^1023
111…11           ±∞ if bi = 0 for all i = 1,2,…,52; NaN otherwise
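Python floats are IEEE 754 doubles on common platforms, so the boundary patterns of the double format can be inspected directly. The sketch below assumes that platform behavior; `double_from_bits` and the variable names are hypothetical helpers for this example. Under IEEE 754, the all-zero exponent field uses the same scale factor, 2^-1022, as the exponent field 000…01.

```python
import math
import struct
import sys

def double_from_bits(bits: int) -> float:
    """Interpret a 64-bit pattern as an IEEE 754 double-precision number."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

smallest_denormal = double_from_bits(0x0000000000000001)  # 0.00...01 x 2^-1022
smallest_normal = double_from_bits(0x0010000000000000)    # 1.00...00 x 2^-1022
largest_normal = double_from_bits(0x7FEFFFFFFFFFFFFF)     # 1.11...11 x 2^1023
infinity = double_from_bits(0x7FF0000000000000)           # exponent all ones, b = 0
not_a_number = double_from_bits(0x7FF0000000000001)       # exponent all ones, b != 0

if __name__ == "__main__":
    print(smallest_denormal, smallest_normal, largest_normal)
    print(infinity, not_a_number)
    print(sys.float_info.min, sys.float_info.max)
```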
Overflow/Underflow
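[Figure: number line from Nmin to Nmax. Representable numbers are denser near Nmin and sparser near Nmax. Underflow lies below Nmin; overflow lies above Nmax.]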
Addition/Multiplication
(±s1 × b^e1) + (±s2 × b^e2) = ±s × b^e
  = (±s1 × b^e1) ± (s2 / b^(e1-e2)) × b^e1
  = (±s1 ± s2 / b^(e1-e2)) × b^e1
(±s1 × b^e1) × (±s2 × b^e2) = ±(s1 × s2) × b^(e1+e2)
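The two identities can be sketched with exact rational arithmetic. This is an illustration under stated assumptions, not a hardware algorithm: `fp_add`, `fp_mul` and `value` are hypothetical names, the sign is carried inside the significand, and no normalization or rounding is performed.

```python
from fractions import Fraction

def fp_add(s1, e1, s2, e2, b=2):
    """(s1 x b^e1) + (s2 x b^e2) = (s1 + s2 / b^(e1-e2)) x b^e1."""
    if e1 < e2:
        # Make operand 1 the one with the larger exponent.
        s1, e1, s2, e2 = s2, e2, s1, e1
    aligned = Fraction(s2) / Fraction(b) ** (e1 - e2)   # shift s2 right by e1 - e2 digits
    return Fraction(s1) + aligned, e1

def fp_mul(s1, e1, s2, e2):
    """(s1 x b^e1) x (s2 x b^e2) = (s1 x s2) x b^(e1+e2)."""
    return Fraction(s1) * Fraction(s2), e1 + e2

def value(s, e, b=2):
    """Numeric value of the pair (significand, exponent)."""
    return Fraction(s) * Fraction(b) ** e

if __name__ == "__main__":
    a = (Fraction(0b1101, 8), 3)   # 1.101 x 2^3 = 13
    c = (Fraction(0b1110, 8), 3)   # 1.110 x 2^3 = 14
    print(value(*fp_add(*a, *c)))  # sum of the two values
    print(value(*fp_mul(*a, *c)))  # product of the two values
```

A real adder would follow the alignment step with normalization and rounding of the resulting significand.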
CSE 246 12
Exceptions
a/0 = +∞ if a > 0
a/∞ = 0 if a ≠ ±∞
a·0 = 0
a·∞ = +∞ if a > 0
0·∞ = invalid operation (NaN)
0/0 = invalid operation (NaN)
NaN op a = NaN
a + ∞ = ∞
∞ - ∞ = NaN
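Most of these rules can be observed with ordinary Python floats, which behave as IEEE 754 doubles on common platforms. The sketch below is illustrative; `results` and `divide_or_report` are names made up for this example. One caveat: Python itself raises `ZeroDivisionError` for float division by zero instead of returning ±∞ or NaN, so those two cases are shown separately.

```python
import math

inf = math.inf
a = 5.0   # an arbitrary positive finite value

results = {
    "a/inf": a / inf,            # 0
    "a*0": a * 0.0,              # 0
    "a*inf": a * inf,            # +inf for a > 0
    "0*inf": 0.0 * inf,          # invalid operation -> NaN
    "nan op a": math.nan + a,    # NaN propagates
    "a+inf": a + inf,            # inf
    "inf-inf": inf - inf,        # invalid operation -> NaN
}

def divide_or_report(x: float, y: float):
    """Python raises on float division by zero; report that instead of crashing."""
    try:
        return x / y
    except ZeroDivisionError:
        return "ZeroDivisionError"

if __name__ == "__main__":
    for name, result in results.items():
        print(f"{name:9s} -> {result}")
    print("a/0       ->", divide_or_report(a, 0.0))
    print("0/0       ->", divide_or_report(0.0, 0.0))
```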
Rounding Mode
Adder output = Cout z1 z0 . z-1 z-2 … z-l G R S
G: guard bit
R: round bit
S: sticky bit, OR of all bits below bit R
  1.101 × 2^3
+ 1.110 × 2^3
= 11.011 × 2^3
= 1.1011 × 2^4   (normalize; need to round or …)
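The roles of the guard, round and sticky bits can be sketched on an integer significand. This is a minimal illustration under assumptions: `split_grs` and its parameters are hypothetical, and the significand is stored as an integer whose lowest `extra_bits` bits lie below the last kept position z-l.

```python
def split_grs(sig: int, extra_bits: int):
    """Split a wide significand into (kept bits, G, R, S).

    G is the first bit below the kept part, R the next one,
    and S is the OR of every bit below R.
    """
    if extra_bits < 2:
        raise ValueError("need at least a guard bit and a round bit")
    kept = sig >> extra_bits
    guard = (sig >> (extra_bits - 1)) & 1
    round_bit = (sig >> (extra_bits - 2)) & 1
    below_round = sig & ((1 << (extra_bits - 2)) - 1)
    sticky = 1 if below_round != 0 else 0
    return kept, guard, round_bit, sticky

if __name__ == "__main__":
    # 1.101|1011 with four bits below the kept part
    kept, g, r, s = split_grs(0b11011011, 4)
    print(bin(kept), g, r, s)
```

Because S collapses all lower bits into one, the rounding logic only needs three extra bits regardless of how far an operand was shifted during alignment.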
Rounding
  1.110 × 2^3
- 1.101 × 2^3
= 0.001 × 2^3
= 1.000 × 2^0   (normalize)

  1.101 × 2^3
- 1.111 × 2^2
=
  1.101  × 2^3
- 0.1111 × 2^3   (the last 1 falls in the guard bit)
= 0.1011 × 2^3
= 1.011 × 2^2
Rounding
Round to the nearest even
Toward 0: 1.1011
Toward +∞: 1.1100
Toward -∞: 1.1011
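The four rounding modes can be compared with exact fractions. In this sketch the function `round_binary`, the mode names and the input value 1.10111 (binary) are assumptions made for illustration; the transcript does not preserve the number being rounded. Any positive value strictly between 1.1011 and 1.1100 reproduces the three directed results listed on the slide.

```python
from fractions import Fraction
import math

def round_binary(x: Fraction, frac_bits: int, mode: str) -> Fraction:
    """Round x to `frac_bits` binary fraction bits under the given mode."""
    scaled = x * 2**frac_bits            # x measured in units of one ulp
    if mode == "toward_zero":
        n = math.trunc(scaled)
    elif mode == "toward_plus_inf":
        n = math.ceil(scaled)
    elif mode == "toward_minus_inf":
        n = math.floor(scaled)
    elif mode == "nearest_even":
        n = round(scaled)                # Python rounds exact ties to the even integer
    else:
        raise ValueError(f"unknown mode: {mode}")
    return Fraction(n, 2**frac_bits)

if __name__ == "__main__":
    x = Fraction(0b110111, 2**5)         # 1.10111 in binary (assumed input)
    for mode in ("nearest_even", "toward_zero", "toward_plus_inf", "toward_minus_inf"):
        r = round_binary(x, 4, mode)
        print(f"{mode:17s} {r.numerator * 16 // r.denominator:05b}")  # bits of 1.xxxx
```

With this assumed input, 1.10111 sits exactly halfway between 1.1011 and 1.1100, so round to nearest even picks 1.1100, the neighbor whose last bit is 0.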
Conventional Rounding Error
Value      Rounded   Rounding error (in ulp)
1.10100    1.101      0
1.10101    1.101     -0.25
1.10110    1.110     +0.5
1.10111    1.110     +0.25
Average Error = 0.5/4 = 0.125
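The average can be reproduced with a few lines of Python. This sketch assumes that "conventional rounding" means round half up (add half an ulp, then truncate), which matches the four rows of the slide; the names `conventional_round`, `inputs` and `errors_in_ulp` are chosen for this example.

```python
from fractions import Fraction
import math

def conventional_round(x: Fraction, frac_bits: int) -> Fraction:
    """Round half up: add half an ulp, then drop the extra bits."""
    return Fraction(math.floor(x * 2**frac_bits + Fraction(1, 2)), 2**frac_bits)

# 1.10100, 1.10101, 1.10110, 1.10111 in binary
inputs = [Fraction(n, 2**5) for n in (0b110100, 0b110101, 0b110110, 0b110111)]

# Error = rounded - exact, expressed in ulps of the 3-fraction-bit result
errors_in_ulp = [(conventional_round(x, 3) - x) * 2**3 for x in inputs]
average_error = sum(errors_in_ulp) / len(errors_in_ulp)

if __name__ == "__main__":
    for x, err in zip(inputs, errors_in_ulp):
        print(float(x), float(err))
    print("average:", float(average_error))
```

The positive average shows the upward bias that comes from always rounding the halfway case up.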