Chapter 13
Numerical Issues
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002
Chapter 13, Slide 2
Learning Objectives
- Numerical issues and data formats.
- Fixed point.
- Fractional number.
- Floating point.
- Comparison of formats and dynamic ranges.
Chapter 13, Slide 3
Numerical Issues and Data Formats
C6000 Numerical Representation
- Fixed point arithmetic:
  - 16-bit (integer or fractional).
  - Signed or unsigned.
- Floating point arithmetic:
  - 32-bit single precision.
  - 64-bit double precision.
Chapter 13, Slide 4
Fixed Point Arithmetic - Definition
For simplicity a 4-bit representation is used.

Unsigned integer numbers (bit weights: $2^3$, $2^2$, $2^1$, $2^0$):

    Binary Number    Decimal Equivalent
    0 0 0 0          0
    0 0 0 1          1
    0 0 1 0          2
    0 0 1 1          3
    0 1 0 0          4
    0 1 0 1          5
    0 1 1 0          6
    0 1 1 1          7
    1 0 0 0          8
    1 0 0 1          9
    1 0 1 0          10
    1 0 1 1          11
    1 1 0 0          12
    1 1 0 1          13
    1 1 1 0          14
    1 1 1 1          15
Chapter 13, Slide 8
Fixed Point Arithmetic - Definition
For simplicity a 4-bit representation is used.

Signed integer numbers (bit weights: $-2^3$, $2^2$, $2^1$, $2^0$):

    Binary Number    Decimal Equivalent
    0 0 0 0          0
    0 0 0 1          1
    0 0 1 0          2
    0 0 1 1          3
    0 1 0 0          4
    0 1 0 1          5
    0 1 1 0          6
    0 1 1 1          7
    1 0 0 0          -8
    1 0 0 1          -7
    1 0 1 0          -6
    1 0 1 1          -5
    1 1 0 0          -4
    1 1 0 1          -3
    1 1 1 0          -2
    1 1 1 1          -1
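The two readings of a 4-bit pattern differ only in the weight given to the top bit. A minimal sketch in C, assuming the pattern is passed in the low four bits of a byte (the function names are illustrative and not part of the course material):

```c
#include <stdint.h>

/* Value of a 4-bit pattern read as an unsigned integer (weights 8, 4, 2, 1). */
int unsigned4_value(uint8_t bits)
{
    return bits & 0x0F;
}

/* Value of the same pattern read as a signed integer:
   the top bit carries weight -8 instead of +8. */
int signed4_value(uint8_t bits)
{
    int top = (bits >> 3) & 1;   /* bit with weight -2^3 */
    int low = bits & 0x07;       /* bits with weights 4, 2, 1 */
    return -8 * top + low;
}
```

For example, the pattern 1 0 0 1 reads as 9 when unsigned and as -7 when signed.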
Chapter 13, Slide 15
Fixed Point Arithmetic - Problems
The following equation is the basis of many DSP algorithms (see Chapter 1):

$y[n] = \sum_{k=0}^{N-1} a_k\, x[n-k]$

Two problems arise when using signed and unsigned integers:
- Multiplication overflow.
- Addition overflow.
Chapter 13, Slide 16
Multiplication Overflow
- 16-bit x 16-bit = 32-bit.
- Example using a 4-bit representation:

              0 0 1 1        3
            x 1 0 0 0      x 8
      ---------------      ---
      0 0 0 1 1 0 0 0       24

- 24 cannot be represented with 4 bits.
Chapter 13, Slide 17
Addition Overflow
- 32-bit + 32-bit = 33-bit.
- Example using a 4-bit representation:

        1 0 0 0        8
      + 1 0 0 0      + 8
      ---------      ---
      1 0 0 0 0       16

- 16 cannot be represented with 4 bits.
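Both overflows can be reproduced in C by discarding everything above the fourth bit. This is a sketch only: `wrap4`, `mul4_wrapped`, `add4_wrapped` and `mul16_wide` are hypothetical helper names, and the 4-bit register is modelled by masking an ordinary `unsigned`.

```c
#include <stdint.h>

/* Keep only the low 4 bits, as a 4-bit register would. */
static unsigned wrap4(unsigned value)
{
    return value & 0x0Fu;
}

/* 4-bit x 4-bit product forced back into 4 bits: the upper bits are lost. */
unsigned mul4_wrapped(unsigned a, unsigned b)
{
    return wrap4(a * b);
}

/* 4-bit + 4-bit sum forced back into 4 bits: the carry is lost. */
unsigned add4_wrapped(unsigned a, unsigned b)
{
    return wrap4(a + b);
}

/* Widening avoids the loss: a 16-bit x 16-bit product always fits in 32 bits. */
int32_t mul16_wide(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}
```

With these helpers 3 x 8 comes back as 8 (the low four bits of 0001 1000) and 8 + 8 comes back as 0, which is the wrap-around the two examples describe.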
Chapter 13, Slide 18
Fixed Point Arithmetic - Solution
The solutions for reducing the overflow problem are:
- Saturate the result.
- Use double precision result.
- Use fractional arithmetic.
- Use floating point arithmetic.
Chapter 13, Slide 19
Solution - Saturate the result
Unsigned numbers:
- If A x B ≤ 15: result = A x B
- If A x B > 15: result = 15

              0 0 1 1        3
            x 1 0 0 0      x 8
      ---------------      ---
      0 0 0 1 1 0 0 0       24
              1 1 1 1       15   (saturated)
Chapter 13, Slide 20
Solution - Saturate the result
Signed numbers:
- If -8 ≤ A x B ≤ 7: result = A x B
- If A x B > 7: result = 7
- If A x B < -8: result = -8

              0 0 1 1         3
            x 1 0 0 0      x -8
      ---------------      ----
      1 1 1 0 1 0 0 0       -24
              1 0 0 0        -8   (saturated)
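The saturation rules for the 4-bit case translate directly into clamping code. The sketch below is illustrative only: the function names are made up, and the operands are ordinary `int` values assumed to already lie in the 4-bit range.

```c
/* Unsigned 4-bit multiply with saturation: results above 15 clamp to 15. */
int sat_mul4_unsigned(int a, int b)
{
    int product = a * b;
    return (product > 15) ? 15 : product;
}

/* Signed 4-bit multiply with saturation: results clamp to the range -8..7. */
int sat_mul4_signed(int a, int b)
{
    int product = a * b;
    if (product > 7)
        return 7;
    if (product < -8)
        return -8;
    return product;
}
```

Saturation limits the error but does not remove it: 3 x 8 still returns 15 rather than 24.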
Chapter 13, Slide 21
Solution - Double precision result
For a 4-bit x 4-bit multiplication, hold the result in an 8-bit location.
Problems:
- Uses more memory for storing data.
- If the result is used in another multiplication, the data needs to be converted back into single precision format (e.g. prod = prod x sum).
- Results need to be scaled down if they are to be sent to a D/A converter.
Chapter 13, Slide 22
Solution - Fractional arithmetic
If A and B are fractional (magnitude less than 1) then:
- |A x B| ≤ min(|A|, |B|)
- i.e. the magnitude of the result never exceeds that of either operand, hence the multiplication will never overflow.
Examples:
- 0.6 x 0.2 = 0.12 (0.12 < 0.6 and 0.12 < 0.2)
- 0.9 x 0.9 = 0.81 (0.81 < 0.9)
- 0.1 x 0.1 = 0.01 (0.01 < 0.1)
Chapter 13, Slide 23
Fractional numbers
Definition (bit weights of an N-bit signed fraction):

    $-2^0$    $2^{-1}$    $2^{-2}$    ...    $2^{-(N-1)}$
    -1        0.5         0.25

What is the largest number?

      0 1 1 ... 1    = MAX
    + 0 0 0 ... 1    = $2^{-(N-1)}$
    -------------
      1 0 0 ... 0    = MAX + $2^{-(N-1)}$ = 1

Largest number: MAX = $1 - 2^{-(N-1)}$
Chapter 13, Slide 24
Fractional numbers
Definition (bit weights of an N-bit signed fraction):

    $-2^0$    $2^{-1}$    $2^{-2}$    ...    $2^{-(N-1)}$

What is the smallest number?

    1 0 0 ... 0    = MIN = -1

Smallest number: MIN = -1

For 16-bit representation:
- MAX = $1 - 2^{-15}$ = 0.999969
- MIN = -1
- $-1 \le x < 1$
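The limits of the fractional format can be checked numerically. The following is a minimal sketch, assuming a 16-bit fraction is stored in an `int16_t` (commonly called Q15); `frac_max` and `q15_to_double` are names chosen for this illustration.

```c
#include <stdint.h>

/* Largest value of an N-bit signed fraction: 1 - 2^-(N-1), for 2 <= n <= 32. */
double frac_max(int n)
{
    return 1.0 - 1.0 / (double)(1ULL << (n - 1));
}

/* Convert a 16-bit signed fraction to its real value by dividing by 2^15. */
double q15_to_double(int16_t q)
{
    return (double)q / 32768.0;
}
```

`frac_max(16)` gives the 0.999969 quoted for the 16-bit case, and the pattern 0x8000 (`INT16_MIN`) converts to exactly -1.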
Chapter 13, Slide 25
Fractional numbers - Sign Extension
To keep the same resolution as the operands we need to select 4 bits of the 8-bit product:

    a = 0 1 1 0 = 0.5 + 0.25 = 0.75
    b = 1 1 1 0 = -1 + 0.5 + 0.25 = -0.25

                0 1 1 0      a
              x 1 1 1 0      b
        ---------------
        0 0 0 0 0 0 0 0
        0 0 0 0 1 1 0 .
        0 0 0 1 1 0 . .
        1 1 0 1 0 . . .
        ---------------
        1 1 1 1 0 1 0 0      product
          1 1 1 0            selected bits

The leading bits of each partial product are sign extension bits, and the leading 1 of the product is a sign extension bit.

The way to do it is to shift left by one bit and store the upper 4 bits, or shift right by three and store the lower 4 bits.
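The 4-bit example can be reproduced in C. This is a sketch under two assumptions: operands are passed as small signed integers in the range -8..7 (the 4-bit patterns read as signed integers), and `>>` on a negative `int` is an arithmetic shift, which the C standard leaves implementation-defined but which common compilers provide. The names `q3_mul` and `q3_to_double` are illustrative.

```c
/* Multiply two 4-bit signed fractions (weights -1, 0.5, 0.25, 0.125).
   The 8-bit product has two sign bits; shifting right by three and keeping
   the low 4 bits selects the same bits as shifting left by one and keeping
   the upper 4 bits. */
int q3_mul(int a, int b)
{
    int product = a * b;     /* 8-bit product, weights down to 2^-6 */
    return product >> 3;     /* back to 4-bit resolution (truncates) */
}

/* Real value of a 4-bit signed fraction. */
double q3_to_double(int q)
{
    return q / 8.0;
}
```

For a = 0110 (6, i.e. 0.75) and b = 1110 (-2, i.e. -0.25) the function returns -2, the pattern 1110. The exact product is -0.1875, so the stored -0.25 shows the truncation caused by keeping only 4 bits.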
Chapter 13, Slide 27
15-bit * 15-bit Multiplication

      Q15      s.xxxxxxxxxxxxxxx
    x Q15      s.yyyyyyyyyyyyyyy
    ------------------------------------------
      Q30      ss.zzzzzzzzzzzzzzzzzzzzzzzzzzzzzz

CPU:
    MPY  A3, A4, A6
    NOP

Store to Data Memory:
    SHR  A6, 15, A6
    STH  A6, *A7

      Q15      s.zzzzzzzzzzzzzzz
Chapter 13, Slide 28
'C6000 C Data Types

    Type                 Size       Representation
    char, signed char    8 bits     ASCII
    unsigned char        8 bits     ASCII
    short                16 bits    2's complement
    unsigned short       16 bits    binary
    int, signed int      32 bits    2's complement
    unsigned int         32 bits    binary
    long, signed long    40 bits    2's complement
    unsigned long        40 bits    binary
    enum                 32 bits    2's complement
    float                32 bits    IEEE 32-bit
    double               64 bits    IEEE 64-bit
    long double          64 bits    IEEE 64-bit
    pointers             32 bits    binary
Chapter 13, Slide 29
Fractional numbers - Sign Extension
Pseudo assembly language:

    A0 = 0x80000000      ; initial value
    A1 = 0.5             ; initial value
    A2 = 0.5             ; initial value
    A3 = 0               ; initial value

    MPY  A1, A2, A3      ; A3 = 0x10000000
    SHL  A3, 1, A3       ; A3 = 0x20000000
    STH  A3, *A0         ; 0x2000 -> 0x80000000

or

    MPY  A1, A2, A3      ; A3 = 0x10000000
    SHR  A3, 15, A3      ; A3 = 0x00002000
    STH  A3, *A0         ; 0x2000 -> 0x80000000

Pseudo 'C' language:

    short a, b, result;
    int prod;

    prod = a * b;
    prod = prod >> 15;
    result = (short) prod;
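The pseudo 'C' above can be wrapped into a compilable function. This is a sketch using the fixed-width types from `<stdint.h>` in place of `short` and `int`; the name `q15_mul` is an assumption, and `>>` on a negative value is assumed to be an arithmetic shift, as it is on common compilers.

```c
#include <stdint.h>

/* Multiply two 16-bit signed fractions and return a 16-bit signed fraction. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b;   /* 32-bit product, two sign bits */
    prod = prod >> 15;                        /* drop the extra sign bit and low bits */
    return (int16_t)prod;
}
```

With a = b = 0x4000 (0.5) the intermediate product is 0x10000000 and the stored result is 0x2000 (0.25), matching the register values in the assembly comments. The case -1 x -1 is not handled by this sketch.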
Chapter 13, Slide 30
Fractional numbers - Problems
There are some problems that need to be resolved when using fractional numbers. These are:
- Result of -1 x -1 = 1
- Accumulative overflow.
Chapter 13, Slide 31
Problem of -1 x -1
We have seen that:
- $-1 \le x < 1$
- -1 x -1 = 1, which cannot be represented.
Solution:
- There are two instructions that saturate the result if you have -1 x -1:
  - SMPY
  - SMPYH
Chapter 13, Slide 32
Problem of -1 x -1
In one cycle these instructions do the following:
- Multiply.
- Shift left by 1-bit.
- Saturate if the sign bits are 01.
It can be shown that:

                       Result of MPY(H)    Result of SMPY(H)
    Positive Result    00.xxx-xb           0.xxx-xb
    Negative Result    11.xxx-xb           1.xxx-xb
    -1 x -1 Result     01.xxx-xb           0.xxx-xb
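The three steps (multiply, shift left by one, saturate) can be emulated in portable C. The sketch below is not the instruction itself or a compiler intrinsic: `smpy_sketch` is a made-up name, and it only mirrors the behaviour described in the table above.

```c
#include <stdint.h>

/* Multiply two 16-bit signed fractions, shift left by one bit, and saturate
   the single case (-1 x -1) whose product has sign bits 01. */
int32_t smpy_sketch(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b;   /* two sign bits */
    if (prod == 0x40000000)                   /* only -1 x -1 gives sign bits 01 */
        return INT32_MAX;                     /* saturate to 0x7FFFFFFF */
    return prod * 2;                          /* same as shifting left by one bit */
}
```

Multiplying by 2 is used instead of `<<` because left-shifting a negative value is undefined in C; for every other input the doubled product fits in 32 bits.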
Chapter 13, Slide 33
Problem of Accumulative Overflow
In this case the overflow is due to the summation:

$y[n] = \sum_{k=0}^{99} a_k\, x[n-k]$

[Figure: diagram marking the 16-bit values 0x7ffe, 0x7fff, 0x8001, 0xffff and 0x0000]

Examples of overflow:
- 0x7fff + 0x0002 = 0x8001 (positive number + positive number = negative number!)
- 0xffff + 0x0002 = 0x0001 (negative number + positive number = positive number; -1 + 2 = 1 is correct, so the signed result has not overflowed)
Chapter 13, Slide 34
Problem of Accumulative Overflow
Solutions:
(1) Saturate the intermediate results by using these add instructions:
    SADD
    SSUB
    If saturation occurs the SAT bit in the CSR is set to 1. You must clear it.
(2) Use guard bits:
    e.g. ADD A1, A2, A1:A0
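Both remedies can be illustrated in C on 16-bit values. This is a scaled-down sketch, not a model of the actual instructions: the names are hypothetical, the SAT status bit is not modelled, and a 32-bit accumulator stands in for the wider register pair that provides the guard bits.

```c
#include <stdint.h>

/* Solution (1): saturating 16-bit addition. */
int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;   /* exact sum in a wider type */
    if (sum > INT16_MAX)
        return INT16_MAX;
    if (sum < INT16_MIN)
        return INT16_MIN;
    return (int16_t)sum;
}

/* Solution (2): accumulate in a wider register so the extra (guard) bits
   absorb intermediate growth. */
int32_t sum_with_guard_bits(const int16_t *terms, int count)
{
    int32_t acc = 0;
    for (int i = 0; i < count; i++)
        acc += terms[i];
    return acc;
}
```

With saturation, 0x7fff + 0x0002 stays at 0x7fff instead of wrapping to a negative value; with guard bits the same sum is held exactly as 0x8001 in the wider accumulator.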
Chapter 13, Slide 35
Problem of Accumulative Overflow
Solutions:
(3) Do nothing if the system is non-gain:
    With a non-gain system the final result is always less than unity.

Example system:

$y[n] = \sum_{k=0}^{99} a_k\, x[n-k]$

This will be non-gain if:

$\sum_{k=0}^{99} |a_k| < 1$, with inputs bounded by $|x_i| \le 1$
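Whether a set of coefficients is non-gain can be checked before the filter is run. The sketch below assumes the condition is that the absolute values of the coefficients sum to less than one, and that the coefficients are stored as 16-bit signed fractions (scaled by $2^{15}$); `is_non_gain_q15` is a hypothetical helper, not part of the course code.

```c
#include <stdint.h>

/* Returns 1 if the absolute coefficient sum is below 1.0 (32768 in the
   16-bit fractional scale), 0 otherwise. The 32-bit sum is safe for
   coefficient counts up to a few tens of thousands. */
int is_non_gain_q15(const int16_t *coeffs, int count)
{
    int32_t sum = 0;
    for (int i = 0; i < count; i++) {
        int32_t c = coeffs[i];
        sum += (c < 0) ? -c : c;   /* |a_k| */
    }
    return sum < 32768;
}
```

Coefficients 0.25, -0.25 and 0.125 pass the check (sum 0.625), while 0.5 and -0.5 do not (sum exactly 1).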
Chapter 13, Slide 36
Floating Point Arithmetic
The C67xx supports both single and double precision floating point formats.
The single precision format is as follows:

    bit:      31       30 ... 23      22 ... 0
    field:    s        e ... e        m ... m
    width:    1-bit    8-bits         23-bits

value = $(-1)^{\text{sign}} \times (1.\text{mantissa}) \times 2^{(\text{exponent}-127)}$

s = sign bit
e = exponent (8-bit, biased by 127)
m = mantissa (23-bit normalised fraction)
Chapter 13, Slide 37
Floating Point Arithmetic Example
Example: Conversion between integer and floating point.
Convert 'dd' to the IEEE floating point format:

    int dd = 0x60000000;
    flot1 = (float) dd;
Chapter 13, Slide 38
Floating Point Arithmetic Example
To view the value of "flot1" use:

    View: Memory: Address = &flot1

We find:

    flot1 = 0x4EC0 0000
Chapter 13, Slide 39
Floating Point Arithmetic Example
Let us check to see if we have the same number:

    hex:       4    E    C    0    0    0    0    0
    binary:    0100 1110 1100 0000 0000 0000 0000 0000
    fields:    s = 0 | exponent = 10011101 | mantissa = 100 0000 0000 0000 0000 0000

s = 0
e = 10011101b = 128 + 16 + 8 + 4 + 1 = 157
m = 0.100b = 0.5

flot1 = $(-1)^0 \times (1.5) \times 2^{(157-127)}$ = $1.5 \times 2^{30}$
      = 1610612736 decimal
      = 0x6000 0000
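The same check can be done in code by pulling the three fields out of the bit pattern. A minimal sketch, assuming `float` is the IEEE 32-bit format on the target (as the C data types table states for the 'C6000); the helper names are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Raw 32-bit pattern of a float, copied without any conversion. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Sign bit (bit 31). */
unsigned float_sign(float f)
{
    return float_bits(f) >> 31;
}

/* Biased exponent (bits 30..23). */
unsigned float_exponent(float f)
{
    return (float_bits(f) >> 23) & 0xFFu;
}

/* Mantissa field (bits 22..0), without the implied leading 1. */
uint32_t float_mantissa(float f)
{
    return float_bits(f) & 0x7FFFFFu;
}
```

For `(float)0x60000000` the pattern is 0x4EC00000, the exponent field is 157 and the mantissa field is 0x400000, i.e. 0.100b.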
Chapter 13, Slide 40
Floating Point Arithmetic Example
The previous example can be seen in:
- numerical.pjt
- Numerical_.ws
Use the mixed mode display to see the assembly code.
Chapter 13, Slide 41
Floating Point IEEE Standard
Special values:

    s    e              m     Number
    0    0              0     0
    1    0              0     -0
    s    0              ≠0    $(-1)^s \times 0.m \times 2^{-126}$
    s    0 < e < 255    m     $(-1)^s \times 1.m \times 2^{e-127}$
    0    255            0     $+\infty$
    1    255            0     $-\infty$
    s    255            ≠0    NaN (not a number)
Chapter 13, Slide 42
Floating Point IEEE Standard
Dynamic range:
- Largest positive number:
  e(max) = 254 (e = 255 is reserved for infinity and NaN), m(max) = $1 - 2^{-23}$
  max = $[1 + (1 - 2^{-23})] \times 2^{254-127}$ ≈ $3.4 \times 10^{38}$
- Smallest positive (normalised) number:
  e(min) = 1 (e = 0 is reserved for zero and denormalised numbers), m(min) = 0
  min = $1.0 \times 2^{1-127}$ ≈ $1.18 \times 10^{-38}$

value = $(-1)^{\text{sign}} \times (1.\text{mantissa}) \times 2^{(\text{exponent}-127)}$
Chapter 13, Slide 43
Floating Point IEEE Standard
Dynamic range:
- Largest negative number (largest magnitude):
  e(max) = 254 (e = 255 is reserved for infinity and NaN), m(max) = $1 - 2^{-23}$
  max = $-[1 + (1 - 2^{-23})] \times 2^{254-127}$ ≈ $-3.4 \times 10^{38}$
- Smallest negative (normalised) number:
  e(min) = 1 (e = 0 is reserved for zero and denormalised numbers), m(min) = 0
  min = $-1.0 \times 2^{1-127}$ ≈ $-1.18 \times 10^{-38}$

value = $(-1)^{\text{sign}} \times (1.\text{mantissa}) \times 2^{(\text{exponent}-127)}$
Chapter 13, Slide 44
Floating/Fixed Point Summary
Floating point single precision:

    bit:      31       30 ... 23      22 ... 0
    field:    s        e ... e        m ... m
    width:    1-bit    8-bits         23-bits

    value = $(-1)^s \times 1.m \times 2^{e-127}$

Floating point double precision (held in odd:even registers):

    bit:      63       62 ... 52      51 ... 0
    field:    s        e ... e        m ... m
    width:    1-bit    11-bits        52-bits

    value = $(-1)^s \times 1.m \times 2^{e-1023}$
Chapter 13, Slide 45
Floating/Fixed Point Summary (Short: N = 16; Int: N = 32)
Bit weights, most significant bit first:
- Unsigned integer:   $2^{N-1}$  ...  $2^1$  $2^0$
- Signed integer:     $-2^{N-1}$  ...  $2^1$  $2^0$
- Signed fractional:  $-2^0$  $2^{-1}$  $2^{-2}$  ...  $2^{-(N-1)}$
Chapter 13, Slide 46
Floating/Fixed Point Dynamic Range

    Format                              Largest Number          Smallest Number           Smallest Number
                                        (positive)              (positive)                (negative)
    Floating Point Single Precision     $3.4 \times 10^{38}$    $1.18 \times 10^{-38}$    $-3.4 \times 10^{38}$
    Fixed Point Integer, 16-bit         $2^{15} - 1$            1                         $-2^{15}$
    Fixed Point Integer, 32-bit         $2^{31} - 1$            1                         $-2^{31}$
    Fixed Point Fractional, 16-bit      $1 - 2^{-15}$           $2^{-15}$                 -1
    Fixed Point Fractional, 32-bit      $1 - 2^{-31}$           $2^{-31}$                 -1

The fixed point rows are for signed formats; the floating point smallest positive value is the smallest normalised number.
Chapter 13, Slide 47
Numerical Issues - Useful Tips
- Multiply by 2: use shift left.
- Divide by 2: use shift right.
- $\log_2 N$: use shift.
- Sine, Cosine, Log: use look up tables.
- To convert a fractional number to hex: compute Num x $2^{15}$, then convert to hex.
  e.g. convert 0.5 to hex: 0.5 x $2^{15}$ = 16384; (16384)dec = (0x4000)hex
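The conversion tip can be written as a small helper. This is a sketch: `double_to_q15` is an illustrative name, and clamping out-of-range inputs and truncating toward zero are design choices made here, not something the tip prescribes.

```c
#include <stdint.h>

/* Convert a real value in [-1, 1) to a 16-bit signed fraction by scaling
   with 2^15. Values outside the representable range are clamped. */
int16_t double_to_q15(double x)
{
    double scaled = x * 32768.0;
    if (scaled >= 32767.0)
        return INT16_MAX;       /* 0x7FFF, the largest fraction */
    if (scaled <= -32768.0)
        return INT16_MIN;       /* 0x8000, i.e. -1 */
    return (int16_t)scaled;     /* truncates toward zero */
}
```

`double_to_q15(0.5)` returns 16384, which prints as 0x4000 with the `%x` format specifier.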
Chapter 13, Slide 48
Numerical Issues - 32-bit Multiplication
It is possible to perform 32-bit multiplication using 16-bit multipliers.
Example: c = a x b (with 32-bit values).

    a = | a_h | a_l |     (32 bits, split into two 16-bit halves)
    b = | b_h | b_l |

    a * b = ((a_h << 16) + a_l) * ((b_h << 16) + b_l)
          = [(a_h * b_h) << 32] + [(a_l * b_h) << 16] +
            [(a_h * b_l) << 16] + [a_l * b_l]
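The decomposition can be checked in C with four 16-bit x 16-bit partial products. The sketch assumes unsigned operands to keep the partial products simple; `mul32_from_16` is a hypothetical name, and signed operands would need extra sign handling that is not shown.

```c
#include <stdint.h>

/* 32-bit x 32-bit unsigned multiply built from four 16-bit x 16-bit products. */
uint64_t mul32_from_16(uint32_t a, uint32_t b)
{
    uint32_t ah = a >> 16, al = a & 0xFFFFu;   /* upper and lower halves of a */
    uint32_t bh = b >> 16, bl = b & 0xFFFFu;   /* upper and lower halves of b */

    uint64_t hh = (uint64_t)ah * bh;           /* weight 2^32 */
    uint64_t lh = (uint64_t)al * bh;           /* weight 2^16 */
    uint64_t hl = (uint64_t)ah * bl;           /* weight 2^16 */
    uint64_t ll = (uint64_t)al * bl;           /* weight 2^0  */

    return (hh << 32) + (lh << 16) + (hl << 16) + ll;
}
```

The result matches a direct 64-bit multiplication of the same operands, which gives a simple way to test the decomposition.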
Chapter 13, Slide 49
Links
Further reading:
- Understanding TMS320C62xx DSP Single-precision Floating-Point Functions: \Links\spra515.pdf
- TMS320C6000 Integer Division: \Links\spra707.pdf

Chapter 13
Numerical Issues
- End -