na10-02-floating_pkjjkkjoint.pdf

7/30/2019 Na10-02-floating_pkjjkkjoint.pdf


Chapter 3.

Approximation and Round-Off Errors

Speed: 48.X

Mileage: 87324.4X


Significant Figures

Number of significant figures indicates precision. Significant digits of a number are those that can be used with confidence, e.g., the number of certain digits plus one estimated digit.

53,800: how many significant figures?

$5.38 \times 10^{4}$: 3

$5.380 \times 10^{4}$: 4

$5.3800 \times 10^{4}$: 5

Zeros are sometimes used only to locate the decimal point; those zeros are not significant figures.

0.00001753: 4

0.0001753: 4

0.001753: 4
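A minimal sketch of how scientific notation makes the number of significant figures explicit. The helper name `to_sig_figs` is hypothetical (not from the slides); it relies only on Python's standard format specification.

```python
def to_sig_figs(x: float, n: int) -> str:
    """Format x in scientific notation with n significant figures."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # One digit before the decimal point plus n - 1 digits after it.
    return f"{x:.{n - 1}e}"


# 53,800 written with 3, 4 and 5 significant figures.
for n in (3, 4, 5):
    print(n, to_sig_figs(53800, n))

# Leading zeros only locate the decimal point: 4 significant figures remain.
print(to_sig_figs(0.00001753, 4))
```

In this notation the leading zeros of 0.00001753 move into the exponent, so only the four significant digits are left in the mantissa.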


Approximations and Round-Off Errors

For many engineering problems, no analytical solution is available.

Numerical methods yield approximate results. We cannot exactly compute the errors associated with numerical methods.

Given data are only rarely exact, since they originate from measurements. Therefore there is probably error in the input information.

The algorithm itself usually introduces errors as well, e.g., unavoidable round-offs.

The output information will then contain error from both of these sources.

How confident are we in our approximate result?

The question is: how much error is present in our calculation, and is it tolerable?


Accuracy

How close a computed or measured value is to the true value.

Precision (or reproducibility)

How close a computed or measured value is to previously computed or measured values.

Inaccuracy (or bias)

A systematic deviation from the actual value.

Imprecision (or uncertainty)

Magnitude of scatter.


Fig 3.2


Error Definitions

True Value = Approximation + Error

$E_t = \text{true value} - \text{approximation}$ (+/-), the true error

True fractional relative error $= \dfrac{\text{true error}}{\text{true value}}$

True percent relative error, $\varepsilon_t = \dfrac{\text{true error}}{\text{true value}} \times 100\%$
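The definitions above can be sketched in a few lines. The function names and the sample values (a true value of 10 approximated by 9) are illustrative assumptions, not taken from the slides.

```python
def true_error(true_value: float, approximation: float) -> float:
    """E_t = true value - approximation (may be positive or negative)."""
    return true_value - approximation


def true_percent_relative_error(true_value: float, approximation: float) -> float:
    """eps_t = (true error / true value) * 100%."""
    if true_value == 0:
        raise ZeroDivisionError("relative error is undefined for a true value of 0")
    return true_error(true_value, approximation) / true_value * 100.0


# Hypothetical measurement: true value 10, approximation 9.
print(true_error(10.0, 9.0))
print(true_percent_relative_error(10.0, 9.0))
```

The same absolute error of 1 would give a much smaller percent relative error for a true value of 10,000, which is why the relative form is usually the more informative one.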


For numerical methods, the true value will be known only when we deal with functions that can be solved analytically (simple systems). In real-world applications, we usually do not know the answer a priori. Then

$\varepsilon_a = \dfrac{\text{approximate error}}{\text{approximation}} \times 100\%$

Iterative approach (example: Newton's method):

$\varepsilon_a = \dfrac{\text{current approximation} - \text{previous approximation}}{\text{current approximation}} \times 100\%$ (+/-)


Use the absolute value. Computations are repeated until the stopping criterion is satisfied:

$|\varepsilon_a| < \varepsilon_s$

where $\varepsilon_s$ is a pre-specified % tolerance based on the knowledge of your solution.

If the following criterion is met,

$\varepsilon_s = (0.5 \times 10^{2-n})\%$

you can be sure that the result is correct to at least n significant figures.
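A sketch of the stopping criterion in use. The choice of problem (summing the Maclaurin series of e^x term by term) and the names `stopping_tolerance` and `exp_series` are assumptions made for illustration; the slides only give the criterion itself.

```python
import math


def stopping_tolerance(n: int) -> float:
    """Percent tolerance eps_s = 0.5 * 10**(2 - n) for n significant figures."""
    return 0.5 * 10.0 ** (2 - n)


def exp_series(x: float, n_sig: int, max_terms: int = 100):
    """Sum 1 + x + x**2/2! + ... until |eps_a| < eps_s.

    Returns (approximation, number of terms used, final eps_a in percent).
    Assumes the partial sums never become exactly zero.
    """
    eps_s = stopping_tolerance(n_sig)
    approx = 1.0  # first term of the series
    term = 1.0
    for k in range(1, max_terms):
        term *= x / k  # x**k / k!
        previous, approx = approx, approx + term
        eps_a = abs((approx - previous) / approx) * 100.0
        if eps_a < eps_s:
            return approx, k + 1, eps_a
    raise RuntimeError("tolerance not reached within max_terms")


value, terms, eps_a = exp_series(0.5, n_sig=3)
eps_t = abs(math.exp(0.5) - value) / math.exp(0.5) * 100.0
print(value, terms, eps_a, eps_t)
```

For x = 0.5 and n = 3 the loop stops after six terms, and the true percent error ends up well below the tolerance: the criterion is conservative.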


Fig 3.3: decimal, binary


Fig 3.4: Signed binary

1000 0000 0000 0001 = -1 (the first bit holds the sign)

2's complement

0000 0000 0000 0001 = 1

0000 0000 0000 0000 = 0

1111 1111 1111 1111 = -1

1111 1111 1111 1110 = -2

Number range? How to compute -a from a?
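One way to explore both questions for 16-bit 2's complement words. The helper names are hypothetical; the rule used for negation (invert every bit, then add 1) is the standard one for 2's complement.

```python
BITS = 16


def from_twos_complement(bits: str, width: int = BITS) -> int:
    """Interpret a bit string (spaces allowed) as a 2's complement integer."""
    pattern = bits.replace(" ", "")
    if len(pattern) != width or set(pattern) - {"0", "1"}:
        raise ValueError(f"expected exactly {width} binary digits")
    value = int(pattern, 2)
    # A leading 1 marks a negative number: subtract 2**width.
    return value - (1 << width) if pattern[0] == "1" else value


def negate(a: int, width: int = BITS) -> str:
    """Bit pattern of -a: invert all bits of a, then add 1 (modulo 2**width)."""
    mask = (1 << width) - 1
    return format(((a ^ mask) + 1) & mask, f"0{width}b")


print(from_twos_complement("1111 1111 1111 1110"))
print(negate(1))
# Range of a 16-bit 2's complement word.
print(-(1 << (BITS - 1)), (1 << (BITS - 1)) - 1)
```

The range is asymmetric, -32768 to 32767, because the pattern for zero uses one of the codes with a leading 0.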


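Fractional number: decimal, binary

$123.456 = 1 \times 10^{2} + 2 \times 10^{1} + 3 \times 10^{0} + 4 \times 10^{-1} + 5 \times 10^{-2} + 6 \times 10^{-3}$

$101.011_2 = 1 \times 2^{2} + 0 \times 2^{1} + 1 \times 2^{0} + 0 \times 2^{-1} + 1 \times 2^{-2} + 1 \times 2^{-3} = 4 + 1 + 0.25 + 0.125 = 5.375$

Fixed point number: the digits after the point form the fraction.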
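The positional expansion above can be checked with a short sketch. The function name `positional_value` is an assumption made for illustration.

```python
def positional_value(digits: str, base: int) -> float:
    """Evaluate a fixed point digit string such as '101.011' in the given base."""
    whole, _, frac = digits.partition(".")
    total = 0.0
    # Digits left of the point carry powers base**0, base**1, ...
    for power, d in enumerate(reversed(whole)):
        total += int(d, base) * base ** power
    # Digits right of the point carry powers base**-1, base**-2, ...
    for power, d in enumerate(frac, start=1):
        total += int(d, base) * base ** -power
    return total


print(positional_value("101.011", 2))
print(positional_value("123.456", 10))
```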


Floating point number (base-10)

156.78 is written as $0.15678 \times 10^{3}$ in a floating point base-10 system.

Suppose only 4 decimal places are to be stored:

$\dfrac{1}{34} = 0.029411765 \;\rightarrow\; 0.0294 \times 10^{0}$

Normalized to remove the leading zeroes: multiply the mantissa by 10 and lower the exponent by 1.

$0.2941 \times 10^{-1}$
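A minimal sketch of base-10 normalization with a 4-digit mantissa, using Python's `decimal` module so the digits are handled exactly. The function name `normalize` and the choice to chop (rather than round) the extra digits are assumptions.

```python
from decimal import Decimal, ROUND_DOWN


def normalize(x: Decimal, digits: int = 4):
    """Return (mantissa, exponent) with 0.1 <= |mantissa| < 1,
    keeping `digits` mantissa digits and chopping the rest."""
    if x == 0:
        raise ValueError("zero has no normalized form")
    # adjusted() is the power of ten of the leading nonzero digit.
    exponent = x.adjusted() + 1
    mantissa = x.scaleb(-exponent)
    quantum = Decimal(1).scaleb(-digits)  # 0.0001 when digits == 4
    return mantissa.quantize(quantum, rounding=ROUND_DOWN), exponent


print(normalize(Decimal("0.029411765")))
print(normalize(Decimal("156.78")))
```

Normalizing before chopping keeps the digit 1 that the unnormalized form 0.0294 x 10^0 would have lost.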


Floating point number (binary, base-2)

$101.011_2 = 0.101011_2 \times 2^{3}$

$0.00101101_2 = 0.101101_2 \times 2^{-2}$

Fig 3.5
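The two binary normalizations above can be reproduced with `math.frexp`, which splits a float into a mantissa in [0.5, 1) and a power of 2. The helper `mantissa_bits` is a hypothetical name introduced here to print the mantissa in binary.

```python
import math


def mantissa_bits(m: float, count: int) -> str:
    """First `count` binary digits of a mantissa m with 0 <= m < 1."""
    digits = []
    for _ in range(count):
        m *= 2
        bit = int(m)  # the integer part is the next binary digit
        digits.append(str(bit))
        m -= bit
    return "0." + "".join(digits)


for x in (5.375, 0.17578125):  # 101.011 and 0.00101101 in binary
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= m < 1
    print(x, mantissa_bits(m, 6), e)
```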


Floating point number

Numbers such as $\pi$, $e$, or $\sqrt{7}$ cannot be expressed by a fixed number of significant figures.

Because computers use a base-2 representation, they cannot precisely represent certain exact base-10 numbers.

Fractional quantities are typically represented in computers using floating point form, e.g.,

$m \cdot b^{e}$

where $m$ is the mantissa, $b$ is the base of the number system used, and $e$ is the exponent.
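A short illustration of the base-2 limitation, assuming Python floats are IEEE 754 doubles (true on common platforms). `Decimal(float)` converts the stored binary value exactly, so it reveals what the machine actually holds.

```python
from decimal import Decimal

# The exact value of the double nearest to 0.1: slightly above 0.1.
stored = Decimal(0.1)
print(stored)

# 0.5 is a power of 2, so it is stored exactly.
print(Decimal(0.5))

# The small representation errors surface in ordinary arithmetic.
print(0.1 + 0.2 == 0.3)
```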


Floating point number

Therefore, for a base-10 system, the normalized mantissa satisfies $0.1 \le m < 1$.


Chopping, Rounding

Example:

$\pi = 3.14159265358$ is to be stored on a base-10 system carrying 7 significant digits.

Chopping: $\pi = 3.141592$, error $|E_t| = 0.00000065$

If rounded: $\pi = 3.141593$, error $|E_t| = 0.00000035$

Some machines use chopping because rounding adds to the computational overhead. Since the number of significant figures is large enough, the resulting chopping error is negligible.
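The example above can be reproduced with the `decimal` module, where `ROUND_DOWN` plays the role of chopping. The variable names are illustrative.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

PI = Decimal("3.14159265358")
# 7 significant digits for a value between 1 and 10: 6 digits after the point.
QUANTUM = Decimal("0.000001")

chopped = PI.quantize(QUANTUM, rounding=ROUND_DOWN)
rounded = PI.quantize(QUANTUM, rounding=ROUND_HALF_UP)

print(chopped, abs(PI - chopped))
print(rounded, abs(PI - rounded))
```

Rounding roughly halves the error here, at the cost of inspecting the discarded digits.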


Fig 3.6 - example


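Example 3.4 (p. 61)

$2^{-3}(1 \cdot 2^{-1} + 0 \cdot 2^{-2} + 0 \cdot 2^{-3}) = 0.062500$ (the smallest)

$2^{-3}(1 \cdot 2^{-1} + 0 \cdot 2^{-2} + 1 \cdot 2^{-3}) = 0.078125$

$2^{-3}(1 \cdot 2^{-1} + 1 \cdot 2^{-2} + 0 \cdot 2^{-3}) = 0.093750$

$2^{-3}(1 \cdot 2^{-1} + 1 \cdot 2^{-2} + 1 \cdot 2^{-3}) = 0.109375$

Evenly spaced by

$2^{-3}(0 \cdot 2^{-1} + 0 \cdot 2^{-2} + 1 \cdot 2^{-3}) = 0.015625$

$2^{-2}(1 \cdot 2^{-1} + 0 \cdot 2^{-2} + 0 \cdot 2^{-3}) = 0.125000$

$2^{-2}(1 \cdot 2^{-1} + 0 \cdot 2^{-2} + 1 \cdot 2^{-3}) = 0.156250$

$2^{-2}(1 \cdot 2^{-1} + 1 \cdot 2^{-2} + 0 \cdot 2^{-3}) = 0.187500$

$2^{-2}(1 \cdot 2^{-1} + 1 \cdot 2^{-2} + 1 \cdot 2^{-3}) = 0.218750$

Evenly spaced by

$2^{-2}(0 \cdot 2^{-1} + 0 \cdot 2^{-2} + 1 \cdot 2^{-3}) = 0.03125$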


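Example 3.4 (p. 61)

$2^{2}(1 \cdot 2^{-1} + 0 \cdot 2^{-2} + 0 \cdot 2^{-3}) = 2$

$2^{2}(1 \cdot 2^{-1} + 0 \cdot 2^{-2} + 1 \cdot 2^{-3}) = 2.5$

$2^{2}(1 \cdot 2^{-1} + 1 \cdot 2^{-2} + 0 \cdot 2^{-3}) = 3$

$2^{2}(1 \cdot 2^{-1} + 1 \cdot 2^{-2} + 1 \cdot 2^{-3}) = 3.5$

Evenly spaced by

$2^{2}(0 \cdot 2^{-1} + 0 \cdot 2^{-2} + 1 \cdot 2^{-3}) = 0.5$

$2^{3}(1 \cdot 2^{-1} + 0 \cdot 2^{-2} + 0 \cdot 2^{-3}) = 4$

$2^{3}(1 \cdot 2^{-1} + 0 \cdot 2^{-2} + 1 \cdot 2^{-3}) = 5$

$2^{3}(1 \cdot 2^{-1} + 1 \cdot 2^{-2} + 0 \cdot 2^{-3}) = 6$

$2^{3}(1 \cdot 2^{-1} + 1 \cdot 2^{-2} + 1 \cdot 2^{-3}) = 7$ (the largest)

Evenly spaced by

$2^{3}(0 \cdot 2^{-1} + 0 \cdot 2^{-2} + 1 \cdot 2^{-3}) = 1$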
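The whole positive number set of this example can be enumerated with a short sketch. It assumes the exponent runs over every integer from -3 to 3 (the slides list only -3, -2, 2 and 3) and that the 3-bit mantissa is normalized, so its leading bit is always 1. The function name is hypothetical.

```python
from itertools import product


def machine_numbers(e_min: int = -3, e_max: int = 3, mantissa_bits: int = 3):
    """All positive values m * 2**e with a normalized mantissa."""
    values = []
    for e in range(e_min, e_max + 1):
        # Normalized: the first mantissa bit is fixed at 1, the rest vary.
        for tail in product((0, 1), repeat=mantissa_bits - 1):
            bits = (1,) + tail
            m = sum(b * 2.0 ** -(i + 1) for i, b in enumerate(bits))
            values.append(m * 2.0 ** e)
    return sorted(values)


numbers = machine_numbers()
print(len(numbers), numbers[0], numbers[-1])
print(numbers[:4])
print(numbers[-4:])
```

Within one exponent the values are evenly spaced, but the spacing doubles each time the exponent increases by 1, so the absolute gap between representable numbers grows with magnitude.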


Fig 3.7


IEEE Standard 754 Floating Point Numbers

Single precision (32-bit)

sign: 1 bit, exponent: 8 bits, mantissa: 23 bits

7 significant base-10 digits with range $10^{-38}$ to $10^{39}$

Double precision (64-bit)

sign: 1 bit, exponent: 11 bits, mantissa: 52 bits

15-16 significant base-10 digits with range $10^{-308}$ to $10^{308}$
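A sketch that splits a number into the three fields listed above, using the standard `struct` module to obtain the raw bits. The function names are assumptions; the field widths are those given on the slide.

```python
import struct
import sys


def float32_fields(x: float):
    """(sign, exponent, mantissa) bit fields of x stored in single precision."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    return raw >> 31, (raw >> 23) & 0xFF, raw & 0x7FFFFF


def float64_fields(x: float):
    """(sign, exponent, mantissa) bit fields of x stored in double precision."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    return raw >> 63, (raw >> 52) & 0x7FF, raw & ((1 << 52) - 1)


print(float32_fields(1.0))
print(float32_fields(-0.15625))
print(float64_fields(1.0))
# Python floats are doubles: 53 mantissa bits counting the implicit leading 1.
print(sys.float_info.mant_dig, sys.float_info.dig, sys.float_info.max)
```

The stored exponent is biased (127 for single precision, 1023 for double), which is why 1.0 does not show an exponent field of 0.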


Arithmetic Manipulations

Common arithmetic operations

Addition: the mantissa of the number with the smaller exponent is modified so that the exponents are the same.

$0.1557 \times 10^{1} + 0.4381 \times 10^{-1}$

$$
\begin{array}{r}
0.1557 \times 10^{1} \\
+\;0.004381 \times 10^{1} \\
\hline
0.160081 \times 10^{1}
\end{array}
\;\rightarrow\; 0.1600 \times 10^{1}
$$
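The addition above can be simulated with a `decimal` context limited to 4 significant digits. Using `ROUND_DOWN` (chopping) for the stored result is an assumption; for this particular sum rounding would give the same digits.

```python
from decimal import Context, Decimal, ROUND_DOWN

# A hypothetical machine with a 4-digit mantissa that chops extra digits.
FOUR_DIGIT = Context(prec=4, rounding=ROUND_DOWN)

a = Decimal("0.1557E1")
b = Decimal("0.4381E-1")

exact = a + b                  # default context: 28 digits, exact here
stored = FOUR_DIGIT.add(a, b)  # exact sum cut back to 4 digits

print(exact, stored, exact - stored)
```

The digits 81 of the smaller operand never reach the stored result.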


Subtraction

$0.3641 \times 10^{2} - 0.2686 \times 10^{2}$

$$
\begin{array}{r}
0.3641 \times 10^{2} \\
-\;0.2686 \times 10^{2} \\
\hline
0.0955 \times 10^{2}
\end{array}
\;\rightarrow\; 0.9550 \times 10^{1}
$$


Subtraction

$0.7642 \times 10^{3} - 0.7641 \times 10^{3}$

$$
\begin{array}{r}
0.7642 \times 10^{3} \\
-\;0.7641 \times 10^{3} \\
\hline
0.0001 \times 10^{3}
\end{array}
\;\rightarrow\; 0.1000 \times 10^{0}
$$
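Both subtractions can be replayed with a 4-digit `decimal` context. Counting the digits of the exact difference shows how many of the four stored digits still carry information; the zeros appended during normalization are not significant. The context and helper names are illustrative assumptions.

```python
from decimal import Context, Decimal, ROUND_DOWN

FOUR_DIGIT = Context(prec=4, rounding=ROUND_DOWN)


def informative_digits(d: Decimal) -> int:
    """Number of digits in the coefficient of d."""
    return len(d.as_tuple().digits)


first = FOUR_DIGIT.subtract(Decimal("0.3641E2"), Decimal("0.2686E2"))
second = FOUR_DIGIT.subtract(Decimal("0.7642E3"), Decimal("0.7641E3"))

print(first, informative_digits(first))
print(second, informative_digits(second))
```

The closer the two operands, the fewer meaningful digits survive: three in the first case, only one in the second.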


Multiplication

$0.1363 \times 10^{3} \times 0.6423 \times 10^{-1}$

Exponents are added: $3 + (-1) = 2$

Mantissas are multiplied: $0.1363 \times 0.6423 = 0.08754549$

$0.08754549 \times 10^{2} \;\rightarrow\; 0.8754549 \times 10^{1} \;\rightarrow\; 0.8754 \times 10^{1}$
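The three steps (add exponents, multiply mantissas, normalize and chop) can be sketched as follows. The function name `multiply` and the decision to chop to 4 digits are assumptions made for illustration.

```python
from decimal import Decimal, ROUND_DOWN


def multiply(m1: Decimal, e1: int, m2: Decimal, e2: int, digits: int = 4):
    """Multiply m1 * 10**e1 by m2 * 10**e2 on a `digits`-digit mantissa machine."""
    mantissa = m1 * m2  # mantissas are multiplied
    exponent = e1 + e2  # exponents are added
    # Normalize: shift the mantissa left until 0.1 <= |mantissa| < 1.
    while mantissa != 0 and abs(mantissa) < Decimal("0.1"):
        mantissa *= 10
        exponent -= 1
    quantum = Decimal(1).scaleb(-digits)
    return mantissa.quantize(quantum, rounding=ROUND_DOWN), exponent


print(multiply(Decimal("0.1363"), 3, Decimal("0.6423"), -1))
```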

Errors

Adding a large and a small number

$0.4000 \times 10^{4} + 0.1000 \times 10^{-2}$

$$
\begin{array}{r}
0.4000 \times 10^{4} \\
+\;0.0000001 \times 10^{4} \\
\hline
0.4000001 \times 10^{4}
\end{array}
\;\rightarrow\; 0.4000 \times 10^{4}
$$
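The same loss can be observed both on a simulated 4-digit machine and in ordinary double precision. The 4-digit context is a hypothetical stand-in for the machine in the example, and the double precision case assumes IEEE 754 floats.

```python
from decimal import Context, Decimal, ROUND_DOWN

FOUR_DIGIT = Context(prec=4, rounding=ROUND_DOWN)

large = Decimal("0.4000E4")
small = Decimal("0.1000E-2")

# The small number falls entirely below the last stored digit.
stored = FOUR_DIGIT.add(large, small)
print(stored, stored == large)

# Doubles carry about 16 digits, so the effect appears at larger ratios.
print(1e16 + 1.0 == 1e16)
```

In both cases the addition might as well not have been performed, which matters when many small terms are accumulated into a large running sum.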

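Errors

Subtractive Cancellation

$x = \dfrac{-b \pm \sqrt{b^{2} - 4ac}}{2a}$

If $b^{2} \gg 4ac$:

$\sqrt{b^{2} - 4ac} = \tilde{b} \approx b$

$-b + \tilde{b} = \text{“small”}$

$$
\begin{array}{r}
0.12345678 \\
-\;0.12345666 \\
\hline
0.00000012
\end{array}
\qquad
\begin{array}{r}
0.12345??? \\
-\;0.12345??? \\
\hline
0.00000???
\end{array}
$$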
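A sketch of the cancellation in double precision, with coefficients chosen (as an assumption, not from the slides) so that $b^{2} \gg 4ac$. The second function uses a common reformulation that is not part of the slides: it only adds quantities of the same sign and gets the small root from the product of the roots, c/a. Both functions assume real roots and a nonzero b.

```python
import math


def roots_naive(a: float, b: float, c: float):
    """Textbook quadratic formula; returns (root with +sqrt, root with -sqrt)."""
    d = math.sqrt(b * b - 4.0 * a * c)
    return (-b + d) / (2.0 * a), (-b - d) / (2.0 * a)


def roots_stable(a: float, b: float, c: float):
    """Avoids subtracting nearly equal numbers; returns (small root, large root)."""
    d = math.sqrt(b * b - 4.0 * a * c)
    q = -0.5 * (b + math.copysign(d, b))  # b and d combined with the same sign
    return c / q, q / a


a, b, c = 1.0, 1.0e8, 1.0
reference = -1.0e-8  # the small root is -c/b to about 16 digits here

naive_small = roots_naive(a, b, c)[0]
stable_small = roots_stable(a, b, c)[0]
print(naive_small, abs(naive_small - reference) / abs(reference))
print(stable_small, abs(stable_small - reference) / abs(reference))
```

With these coefficients the naive formula loses most of its significant digits in the small root, while the reformulated one stays accurate to nearly full precision.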
