
Engineering Maths 1 (Part 1) - Basic Statistics and Error Analysis

Course Structure:

Lecture 1

Nature of errors
Summarising data:
o Measures of central tendency – mean, median, mode
o Measures of dispersion – variance, standard deviation, median absolute deviation
Combination and propagation of errors

Lecture 2

Distributions – normal, log-normal, Student's t distribution
Confidence limits
Central limit theorem
Tests for normality:
o Kolmogorov-Smirnov
o Lilliefors

Lecture 3

Parametric statistical tests:
o F test (variances)
o t tests (means)
Outliers
Rejection of data

Lecture 4

Regression and correlation
Correlation coefficient
Least squares fitting

Lecture 5

Software packages:
o Excel/Openoffice Calc
o Sigmaplot
o Origin
o SPSS


Introduction

This course is intended to provide a basic introduction to the subject of statistics and to impart the necessary skills for primary data analysis. The emphasis will be on the use of straightforward statistical tests and methods that will ensure that data can be clearly and unambiguously interpreted and used.

The principal thesis underlying this lecture course is that any quantitative result must be accompanied by an estimate of the errors it contains for it to be of real value. The five lectures will describe techniques that can be used to determine the effect of errors as well as to provide estimates for the uncertainty in a given measurement or calculation. The emphasis of this course is on the pragmatic rather than the theoretical.

Reading List

Statistics and Chemometrics for Analytical Chemistry, 6th Edition, J.N. Miller and J.C. Miller, Prentice-Hall, ISBN 0273730428.

Chemometrics: Statistics and Computer Applications in Analytical Chemistry, M. Otto, Wiley-VCH, ISBN 3527314180.

Statistical Procedures for Analysis of Environmental Monitoring Data and Risk Assessment, E.A. McBean and F.A. Rovers, Prentice-Hall, ISBN 0136750184.

Statistical Tables, J. Murdoch and J.A. Barnes, 4th Edition, Macmillan, ISBN 0333558596.

In addition, there is a very useful Web resource for statistics at:

http://www.itl.nist.gov/div898/handbook/


Introduction to Statistics

Classification of Data and Errors
Summarising Data
Summarising Dispersion

Classification of Data and Errors

Statistics may be defined as the collection, ordering and analysis of data. Data consists of sets of recorded observations or values. Any quantity that may be described by more than one value is a variable, and there are two types of variable:

Discrete: also known as step variables

Discrete variables are those that can be counted (one bean, two beans, three beans, etc.) or, equivalently, those that take values from a fixed set. In other words, each value of the variable can be associated with an integer index.

Continuous

Continuous variables are those described by a continuous range of values. The range of possible values may be limited or may extend from -∞ to +∞. The result is dependent upon the precision of the measurement or the accuracy of the observer.

Every measurement is subject to two types of errors:

Systematic: also known as determinate errors

These errors are built into the observation and affect the accuracy of measurement. They are caused by imperfections in the instruments or methods used to make the measurement. In theory, a determinate error may be quantified and corrected for. Figure 1 illustrates a very simple situation in which a systematic error might occur.

Random: also known as indeterminate errors

These are due to random and uncontrolled variations in the implementation of the measurement. Such errors affect the precision of the observation. Statistical methods may be used to assess the effect of a random or indeterminate error.


Summarising Data

The Mean

The mean of a set of measurements is defined by Equation (1):

\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i    Equation (1)

where x is the sample mean, xi is a member of the sample and n is the size of the sample. The mean is also referred to as the first moment of a sample.

There are two other ways of reporting repeated measurements of the same variable, the mode and the median:

The Mode

The mode is the value of the most frequent observation in a set of data of the same variable. Furthermore, for data presented as a frequency histogram (a plot of number of occurrences against value of the variable) the mode may be evaluated using graphical techniques. In this case, the mode is the value that gives the tallest bar in the histogram.

The Median

The median is the mid-point in a set of data of the same variable. If the number of observations is odd, the median is the centre value in a sorted list of observations, while for an even number of observations the median is the average of the two observations on either side of the centre.
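As a quick illustration, the three measures of central tendency can be computed with Python's standard statistics module (a minimal sketch; the data values here are arbitrary and not taken from these notes):

import statistics

data = [4.1, 4.3, 4.3, 4.5, 4.2, 4.3, 4.6]   # hypothetical observations

print(statistics.mean(data))     # arithmetic mean, Equation (1)
print(statistics.median(data))   # mid-point of the sorted data
print(statistics.mode(data))     # most frequent observation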


Figure 1. An example of a source of systematic error in a measurement.

(Diagram labels: stimulus, transducer, power supply, voltmeter V, copper and aluminium connections, temperature T.)


Measurements of Dispersion

The most obvious measurement of dispersion is of the range, which is the difference between the largest and smallest value observed for a variable. The range does not, however, convey any useful information about the distribution of values within the range.

Median Absolute Deviation (MAD)

The median absolute deviation is defined by Equation (2):

\mathrm{MAD} = \mathrm{median}_i\left(\left|x_i - \mathrm{median}(x)\right|\right)    Equation (2)

The MAD is useful in statistical analysis as it is a way of independently estimating the population standard deviation, σ. In this case, MAD/0.6745 is used as the estimate of σ. We will come on to the difference between sample and population standard deviations later.
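A minimal sketch of the MAD calculation in Python (standard library only; the data values and the helper-function name are illustrative, not from the notes):

import statistics

def mad(values):
    # median absolute deviation from the median, Equation (2)
    m = statistics.median(values)
    return statistics.median(abs(x - m) for x in values)

data = [4.1, 4.3, 4.3, 4.5, 4.2, 4.3, 4.6]   # hypothetical observations
sigma_estimate = mad(data) / 0.6745           # robust estimate of the population sigma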

The Variance (S2)

The variance of a sample is given by Equation (3):

S^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}    Equation (3)

Using the square of the difference between a sample value and the mean ensures that the variance is always positive.

The standard form of Equation (3) can be rewritten to avoid the need to first calculate the mean then subtract it from each xi:

S^2 = \frac{\sum_{i=1}^{n}x_i^2 - \frac{\left(\sum_{i=1}^{n}x_i\right)^2}{n}}{n - 1}    Equation (3a)

In this case, all that needs to be calculated are the sums of x and x2.

Standard deviation (S)

Standard deviation is the square root of the variance and so may be described simply by Equation (4):

S = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}    Equation (4)

Or using the square root of Equation (3a):


S = \sqrt{\frac{\sum_{i=1}^{n}x_i^2 - \frac{\left(\sum_{i=1}^{n}x_i\right)^2}{n}}{n - 1}}    Equation (4a)

To use the alternate forms of these equations, we simply accumulate the sums of xi and xi2

and use these values in Equations (3a) or (4a).

NOTE ON THE EQUATIONS TO USE:

You should always use the forms involving sums (Equations 3a and 4a) instead of the formal definitions (Equations 3 and 4) as the formal definitions are prone to accumulated rounding errors. Later examples where you should use an alternative equation will also be noted.
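The two routes to the same quantity can be compared directly; the sketch below (plain Python, no libraries assumed) implements Equations (4) and (4a). With full floating-point precision the two functions agree closely; the warning above concerns rounding intermediate values when working by hand or with a calculator that carries too few digits.

from math import sqrt

def stdev_definition(xs):
    # Equation (4): two passes - compute the mean first, then the squared deviations
    n = len(xs)
    mean = sum(xs) / n
    return sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

def stdev_sums(xs):
    # Equation (4a): a single pass, accumulating sum(x) and sum(x**2)
    n = len(xs)
    s = sum(xs)
    s2 = sum(x * x for x in xs)
    return sqrt((s2 - s * s / n) / (n - 1))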

Alternative Ways of Expressing Dispersion

There are other ways of expressing the standard deviation. The relative standard deviation expresses the ratio of the standard deviation to the mean as a percentage, as shown in Equation (5):

\mathrm{RSD} = \frac{S}{\bar{x}} \times 100\%    Equation (5)

while the coefficient of variation simply expresses the dispersion as the ratio of the standard deviation over the sample mean, Equation (6):

\mathrm{Coefficient\ of\ Variation} = \frac{S}{\bar{x}}    Equation (6)

For scales that begin at zero, the coefficient of variation is independent of units, and consequently is sometimes a more convenient way of reporting dispersion. However, the coefficient of variation should not be used for scales that do not have a common zero origin or for data sets that contain negative values. If negative values are possible, the mean could turn out to be close to zero, leading to very large values for the coefficient of variation.

Example 1

20 mass measurements were made of a flask of reagent. The measurements were in grams:

12.475  12.469  12.481  12.466
12.474  12.465  12.475  12.473
12.481  12.472  12.482  12.475
12.485  12.473  12.465  12.485
12.468  12.477  12.450  12.513

The first task is to sort the data into ascending order. The mean is given by Equation (1) above, and in this case n is 20, so Equation 1 reduces to:


\bar{x} = \frac{\sum_{i=1}^{20} x_i}{20} = \frac{249.504}{20} = 12.4752\ \mathrm{g}

12.4752 g claims a greater precision than the observation was capable of and so is rounded to the same precision as the original data, giving x̄ = 12.475 g to three decimal places. The table below shows the individual measurements, their squares, deviations from the mean and the square of the deviation from the mean for this set of measurements, as well as the sums of these values.

Observation   xi        xi²           xi − x̄    (xi − x̄)²
1             12.513    156.575169    0.038     0.001444
2             12.485    155.875225    0.010     0.000100
3             12.485    155.875225    0.010     0.000100
4             12.482    155.800324    0.007     0.000049
5             12.481    155.775361    0.006     0.000036
6             12.481    155.775361    0.006     0.000036
7             12.477    155.675529    0.002     0.000004
8             12.475    155.625625    0.000     0.000000
9             12.475    155.625625    0.000     0.000000
10            12.475    155.625625    0.000     0.000000
11            12.474    155.600676   -0.001     0.000001
12            12.473    155.575729   -0.002     0.000004
13            12.473    155.575729   -0.002     0.000004
14            12.472    155.550784   -0.003     0.000009
15            12.469    155.475961   -0.006     0.000036
16            12.468    155.451024   -0.007     0.000049
17            12.466    155.401156   -0.009     0.000081
18            12.465    155.376225   -0.010     0.000100
19            12.465    155.376225   -0.010     0.000100
20            12.450    155.002500   -0.025     0.000625
Sums:         249.504   3112.615078   -         0.002778

The variance, S2, of this sample is obtained by:

\sum_{i=1}^{20}(x_i - \bar{x})^2 = 2.778 \times 10^{-3}

S^2 = \frac{\sum_{i=1}^{20}(x_i - \bar{x})^2}{19} = \frac{2.778 \times 10^{-3}}{19} = 1.462 \times 10^{-4}\ \mathrm{g^2}

S = \sqrt{1.462 \times 10^{-4}} = 0.012\ \mathrm{g}

Alternatively,


S = \sqrt{\frac{3112.615078 - \frac{249.504^2}{20}}{19}} = \sqrt{\frac{3112.615078 - 3112.612301}{19}} = 0.012\ \mathrm{g}

\mathrm{RSD} = \frac{S}{\bar{x}} \times 100\% = \frac{0.012}{12.475} \times 100\% = 0.096\%

What happens if you round too soon? The table below shows the result of rounding the squares of the deviations to different numbers of decimal places:

Obs   xi       xi − x̄    (xi − x̄)²   (xi − x̄)²   (xi − x̄)²   (xi − x̄)²
                          3 dp        4 dp        5 dp        6 dp
1     12.513   0.038     0.001       0.0014      0.00144     0.001444

2 12.485 0.01 0 0.0001 0.00010 0.000100

3 12.485 0.01 0 0.0001 0.00010 0.000100

4 12.482 0.007 0 0 0.00004 0.000049

5 12.481 0.006 0 0 0.00003 0.000036

6 12.481 0.006 0 0 0.00003 0.000036

7 12.477 0.002 0 0 0.00000 0.000004

8 12.475 0 0 0 0.00000 0.000000

9 12.475 0 0 0 0.00000 0.000000

10 12.475 0 0 0 0.00000 0.000000

11 12.474 -0.001 0 0 0.00000 0.000001

12 12.473 -0.002 0 0 0.00000 0.000004

13 12.473 -0.002 0 0 0.00000 0.000004

14 12.472 -0.003 0 0 0.00000 0.000009

15 12.469 -0.006 0 0 0.00003 0.000036

16 12.468 -0.007 0 0 0.00004 0.000049

17 12.466 -0.009 0 0 0.00008 0.000081

18 12.465 -0.01 0 0.0001 0.00010 0.000100

19 12.465 -0.01 0 0.0001 0.00010 0.000100

20 12.45 -0.025 0 0.0006 0.00062 0.000625

Sums: 249.504 0 0.001 0.0024 0.00271 0.002778

S:                        0.007255    0.011239    0.011943    0.012092

To finish with this data set, we note that the mode (the most frequent observation) is 12.475g, while the median is given by the average of observations 10 and 11:

\mathrm{Median} = \frac{12.475 + 12.474}{2}\ \mathrm{g} = 12.4745\ \mathrm{g} = 12.475\ \mathrm{g\ (rounded)}

In this case, the Mean, Mode and Median are the same.

The MAD of this set of measurements is the median of the absolute deviations from the median. The table below gives a sorted list of these mass measurements and their absolute deviations from the median.


Observation   xi (g)    |xi − median|
1             12.513    0.038
2             12.450    0.025
3             12.485    0.010
4             12.485    0.010
5             12.465    0.010
6             12.465    0.010
7             12.466    0.009
8             12.482    0.007
9             12.468    0.007
10            12.481    0.006
11            12.481    0.006
12            12.469    0.006
13            12.472    0.003
14            12.477    0.002
15            12.473    0.002
16            12.473    0.002
17            12.474    0.001
18            12.475    0.000
19            12.475    0.000
20            12.475    0.000

Since the median of the absolute deviations is the average of observations 10 and 11, we can see that in this case MAD = 0.006 g. Our estimate of the population standard deviation will be σ = 0.006/0.6745 = 8.895×10⁻³ g or 0.009 g (rounded to the same number of decimal places as the original measurements.) This is somewhat lower than the sample standard deviation calculated from the measurements. The estimate derived from the MAD is likely to be closer to the true value, as it effectively ignores the values furthest away from the median, which have a disproportionate effect on the calculation of the sample standard deviation (because the deviation from the mean is squared.)
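The whole of Example 1 can be checked in a few lines; below is a sketch using Python's standard library with the 20 mass measurements listed above (the expected results in the comments are those derived in the text):

import statistics

masses = [12.475, 12.469, 12.481, 12.466, 12.474, 12.465, 12.475, 12.473,
          12.481, 12.472, 12.482, 12.475, 12.485, 12.473, 12.465, 12.485,
          12.468, 12.477, 12.450, 12.513]

print(statistics.mean(masses))     # 12.4752 g, i.e. 12.475 g rounded
print(statistics.stdev(masses))    # sample standard deviation, ~0.0121 g
print(statistics.mode(masses))     # 12.475 g
print(statistics.median(masses))   # 12.4745 g, i.e. 12.475 g rounded

# MAD, using the median rounded to the precision of the data (12.475 g), as in the table above
mad = statistics.median(abs(x - 12.475) for x in masses)   # ~0.006 g
print(mad / 0.6745)                                        # ~0.009 g estimate of sigma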

NOTE ON ROUNDING:

While the results of your data manipulations should be presented to the same precision as the original data, avoid the trap of rounding too soon. Only round your data at the last stage of the calculation, keeping any extra significant figures until the final result. This can be illustrated clearly in the alternative calculation of standard deviation above, where the answer depends on the small difference between two very large numbers. If these were rounded to, say, three decimal places the result would be erroneous. Similarly, in the table on the previous page, the differences xi − x̄ have three decimal places, so when squared require six decimal places. If these squares are rounded to fewer than six places, the final answer will almost certainly be wrong. In addition, if you are using a calculated value in subsequent calculations, use the unrounded value if possible to preserve the precision in the final answer.

NOTE ON UNITS:


Do not forget to state the units of a measurement. In the example above, the units of the variable are grams (g), and it is incorrect to state a result without these units. The units for standard deviation are always the same as the units of the original measurement. You should note, however, that measures of dispersion such as the RSD and Coefficient of Variation are always dimensionless, as you are dividing one unit by the same unit (as the standard deviation and mean have the same units). You will lose marks in examinations if you leave out the units (or add units to a dimensionless quantity).

Figure 2 shows the effect of an outlier value on the mean, median and mode using the dataset shown above. The data has been generated by removing one of the readings at 12.475g and replacing it with the value shown on the x axis. The mean and mode are shown to be seriously affected by the outlier, while the median value is unaffected. The mode is particularly affected in this case as there are five values which then occur twice in the remaining dataset. If the outlier value is one of these five values, then a single value of the mode is defined, otherwise the dataset is multi-modal.

Propagation of Errors

Many measurements are composite; that is, they are the result of combining more than one measurement into a single value. For example, the density of a substance can be determined by dividing the mass by the volume. Since the mass and volume measurements each have their own errors, if we are to estimate the error present in the density measurement we need to know how to combine errors.

Figure 2. Effect of an Outlier on the Mean, Median and Mode.

Random errors tend to cancel each other out (consider the drunkard's walk), but systematic errors are vectors which do not cancel out. Thus, the propagation of systematic and random errors is undertaken in slightly different ways.

Consider the trivial example where the final result of an experiment, x, is given by the equation:

x = a + b

If a and b each have a systematic error of +1 then the whole systematic error of x is +2. On the other hand, if a and b each have an indeterminate error of ±1 the random error in x is not ±2, for there will be occasions when the random error in a is positive while that in b is negative, and vice versa.

In addition, if we have measuring instruments that have a finite resolution, we can use the propagation of determinate errors to determine the resolution of a composite measurement. In this context, the resolution of an instrument is typically half of the smallest scale reading. In the case of a ruler marked in millimetres, the resolution would be 0.5 mm.

Propagation of Determinate Errors

Figure 3 shows how an error in a length measurement rapidly propagates, as length is used to determine area and then volume. In many systems a small error in a critical factor results in a disproportionate bias in the final response. Error analysis of such a system enables key factors to be identified and hence controlled. Error analysis undertaken at the beginning of an experiment will often highlight the measurements whose errors most need to be reduced, and sometimes experimental workers will change their design when error analysis highlights that their efforts are concentrated in the wrong place. Error analysis should be part of experimental design and planning at the early stages.


Figure 3. Accumulation of Errors in Composite Measurements. A small error Δx in a length measurement x leads to an increased error in an area estimate (the error is 2xΔx + Δx²) and a still larger error in a volume estimate (the error is 3x²Δx + 3xΔx² + Δx³).

For most practical applications the contribution of an uncertainty in a factor towards the uncertainty in the final response is obtained by the product of the (partial) differential of the response with respect to the factor and the uncertainty, or error, in the respective factor. For a two-factor system:

y = f(x, z)

where the uncertainties in x and z are Δx and Δz. If

y = x z

then

y + \Delta y = (x + \Delta x)(z + \Delta z) = xz + x\Delta z + z\Delta x + \Delta x\,\Delta z

giving

\Delta y = x\Delta z + z\Delta x + \Delta x\,\Delta z

Since Δx·Δz will be very small, we can discard it, giving:

\Delta y = x\Delta z + z\Delta x

but since

\frac{\partial y}{\partial x} = z \quad\text{and}\quad \frac{\partial y}{\partial z} = x


then

\Delta y = \frac{\partial y}{\partial x}\,\Delta x + \frac{\partial y}{\partial z}\,\Delta z    Equation (7)

For a multivariate system, y = f(x_1, x_2, \ldots, x_n):

\Delta y = \frac{\partial y}{\partial x_1}\Delta x_1 + \frac{\partial y}{\partial x_2}\Delta x_2 + \cdots + \frac{\partial y}{\partial x_n}\Delta x_n    Equation (8)

with the partial differentials evaluated at the measured values x̄₁, x̄₂, …, x̄ₙ.

For resolution calculations:

\Delta y = \left|\frac{\partial y}{\partial x_1}\right|\Delta x_1 + \left|\frac{\partial y}{\partial x_2}\right|\Delta x_2 + \cdots + \left|\frac{\partial y}{\partial x_n}\right|\Delta x_n    Equation (9)

Equation (9) above outlines the calculation of an experimental resolution. Measurements and their associated calculations have a resolution determined by the sensitivity of the instruments used. Unless explicitly stated it may be taken as half the scale division of the instrument concerned. Equation (9) is used to determine the experimental resolution at the levels for which measurements were made.

Example 3

Determination of the density, ρ, of the material weighed in Example 1. The volume of the material was measured five times. The data obtained were: 6.0, 6.0, 5.8, 5.7, and 6.3 cm³. This data may be summarised as:

x̄ = 5.96 cm³, S_V = 0.230 cm³, n = 5 and the resolution = 0.05 cm³

Or, rounded:

x̄ = 6.0 cm³, S_V = 0.2 cm³, n = 5 and the resolution = 0.05 cm³

The density of the material is determined from Equation (10):

\rho = \frac{m}{V}    Equation (10)

The resolution of the experiment is given by:

\Delta\rho = \frac{\partial\rho}{\partial m}\Delta m + \frac{\partial\rho}{\partial V}\Delta V

since

\frac{\partial\rho}{\partial m} = \frac{1}{V} \quad\text{and}\quad \frac{\partial\rho}{\partial V} = -\frac{m}{V^2}

then


\Delta\rho = \frac{\Delta m}{V} + \frac{m\,\Delta V}{V^2}

Since Δm = 5×10⁻⁴ g, ΔV = 5×10⁻² cm³, m = 12.475 g and V = 6.0 cm³, then

\Delta\rho = \frac{5\times10^{-4}}{6.0} + \frac{12.475 \times 5\times10^{-2}}{6.0^2} = 1.74\times10^{-2}\ \mathrm{g\,cm^{-3}}

The resolution of the density determination experiment was 0.017 g cm-3.
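A sketch of the same calculation in Python (plain arithmetic, following Equation (9) and the partial derivatives above):

m, V = 12.475, 6.0       # measured mass (g) and volume (cm^3)
dm, dV = 5e-4, 5e-2      # instrument resolutions: half the smallest scale division

d_rho = dm / V + m * dV / V ** 2    # |d(rho)/dm| * dm + |d(rho)/dV| * dV
print(round(d_rho, 3))              # 0.017 g cm^-3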

Determinate errors caused by external factors such as temperature, pressure, relative humidity or supply voltage for example should be corrected for using Equation (8). It is important not to confuse the calculation of an experimental resolution with the calculation of bias or error resulting from a system operating outside its calibrated parameters.

Propagation of Indeterminate Errors

There are a variety of treatments for analysing the propagation of indeterminate errors and these fall under the headings of linear combinations and multiplicative expressions. Examples of these treatments may be found in Statistics and Chemometrics for Analytical Chemistry, 6th Edition, by J.N. Miller and J.C. Miller. Firstly, we will consider the case where the derived value is a function of a single variable, as shown in Equation (11):

S_y = \left|\frac{dy}{dx}\right| S_x    Equation (11)

Note the similar form of Equation (11) to Equation (9).

In some cases variance is used in the place of standard deviation in Equation (11) to give an estimate for the variance in the derived quantity, as shown in Equation (12):

S_y^2 = \left(\frac{dy}{dx}\right)^2 S_x^2    Equation (12)

NOTE: (dy/dx)² is the square of the first differential and is not the same as d²y/dx², the second differential.

differential.

And for y = f(x_1, x_2, \ldots, x_n):

S_y^2 = \left(\frac{\partial y}{\partial x_1}\right)^2 S_{x_1}^2 + \left(\frac{\partial y}{\partial x_2}\right)^2 S_{x_2}^2 + \cdots + \left(\frac{\partial y}{\partial x_n}\right)^2 S_{x_n}^2    Equation (13)

Equations (12) and (13) are strictly applicable only for linear functions of the form:

y = k_0 + k_1 x_1 + k_2 x_2 + \cdots + k_n x_n

Since \frac{\partial y}{\partial x_i} = k_i, then:

S_y = \sqrt{k_1^2 S_{x_1}^2 + k_2^2 S_{x_2}^2 + \cdots + k_n^2 S_{x_n}^2}    Equation (14)

For multiplicative combinations, the treatment is slightly different:

\left(\frac{S_y}{y}\right)^2 = \left(\frac{S_{x_1}}{x_1}\right)^2 + \left(\frac{S_{x_2}}{x_2}\right)^2 + \cdots + \left(\frac{S_{x_n}}{x_n}\right)^2    Equation (15)

for y formed as a product and/or quotient of the x_i.

When we have a combination such as:

y = x^n

We do not treat this as x·x·x·x…, as the errors in x are no longer independent (the error is the same for each occurrence of x). In this case, we would use a term:

\frac{S_y}{y} = n\,\frac{S_x}{x}

This arises from Equation (12).

An additional treatment of Equation (15) can be applied to the coefficients of variation of the different functions, resulting in Equation (16):

C_y^2 = C_{x_1}^2 + C_{x_2}^2 + \cdots + C_{x_n}^2    Equation (16)

where C_i is the coefficient of variation of x_i.

With such a range of possible methods for determining the propagation of indeterminate error through an experiment it is important that you are consistent in your treatment of data and that you clearly state how you have determined your estimates for mean and standard deviation. Whatever method you choose, it is important to ensure that the original data is summarised so that readers may use it to perform their own calculations, if they so choose. For many functions and applications Equation (15) yields an estimate for standard deviation that is larger than other treatments. This is a prudent measure to adopt, as it means that you will not be underestimating the variance in your measurement.
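As a sketch of the multiplicative rule, the relative-error form of Equation (15) above can be applied to the density of Example 3, using the sample standard deviations quoted in Examples 1 and 3 (the numbers here are purely illustrative of the method, not a definitive treatment):

from math import sqrt

m, S_m = 12.475, 0.012   # mass and its sample standard deviation (g)
V, S_V = 6.0, 0.2        # volume and its sample standard deviation (cm^3)

rho = m / V                                          # Equation (10)
S_rho = rho * sqrt((S_m / m) ** 2 + (S_V / V) ** 2)  # Equation (15) for a quotient
print(rho, S_rho)                                    # ~2.08 g cm^-3 with S_rho ~0.07 g cm^-3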

Example 4


A common operation is the subtractive measurement of the mass of some material. In this case, a vessel is weighed empty and then weighed with the material. The mass of material is then determined by subtraction. What then is the indeterminate error in the weight of material?

Firstly, we must distinguish between repeated weighings, where the same object is weighed a number of times using the same instrument, and replicate determinations, where the analysis is repeated a number of times. In the first case the repeated measurements allow us to estimate the indeterminate error in the instrument, while the second case allows us to estimate the indeterminate error in the entire analysis procedure, including the weighing instrument.

In the case of repeated weighings, we should note that the repeated weighings of the empty and full vessel are not paired. That is, we cannot make an estimate of the standard deviation of the weight of precipitate by subtracting the first, then second, then third etc empty and full weights, then working out the standard deviation of these values. The example below will illustrate this:

Observation          Full weight (g)   Empty weight (g)   Difference (g)
1                    12.465            9.965              2.500
2                    12.466            9.966              2.500
3                    12.482            9.982              2.500
4                    12.468            9.968              2.500
5                    12.481            9.981              2.500
6                    12.481            9.981              2.500
7                    12.469            9.969              2.500
8                    12.472            9.972              2.500
9                    12.477            9.977              2.500
10                   12.473            9.973              2.500
Mean                 12.473            9.973              2.500
Standard deviation   0.006             0.006              0.000

The standard deviations of the full and empty weights are both 0.006 g, which we can take to be the indeterminate error in the weighing scales. The standard deviation of the difference, however, is zero. This cannot be correct, as we know that the weighing scales have a significant indeterminate error. This arises because of the false pairing of the weighings. If we take a different order of weighings, the results are different:

Observation          Full weight (g)   Empty weight (g)   Difference (g)
1                    12.465            9.965              2.500
2                    12.466            9.966              2.500
3                    12.482            9.968              2.514
4                    12.468            9.969              2.499
5                    12.481            9.972              2.509
6                    12.481            9.973              2.508
7                    12.469            9.969              2.500
8                    12.472            9.977              2.495
9                    12.477            9.981              2.496
10                   12.473            9.981              2.492
Mean                 12.473            9.972              2.501
Standard deviation   0.006             0.006              0.007

Now we can see that the difference has a non-zero standard deviation, which is different from the standard deviations of the full and empty weighings. How do we make a good estimate of the standard deviation of the difference?

The relationship between the mass of material and the mass of the full and empty vessels is given by:

m = m_1 - m_2

where m₁ is the mass of the vessel plus material and m₂ is the mass of the vessel alone. This is a linear combination, so we can use Equation 14 to estimate the indeterminate error in the mass of material. In this case k₀ = 0, k₁ = 1, k₂ = −1, k₃…ₙ = 0, and the x_i are the measured masses m₁ and m₂.

Substituting these parameters in Equation 14 gives:

S_m = \sqrt{S_{m_1}^2 + S_{m_2}^2}

In general, if using the same instrument to perform the measurements, then:

S_{m_1} = S_{m_2}

So:

S_m = \sqrt{2}\, S_{m_1}

That is, the error in the final mass measurement is larger by a factor of √2 than the individual errors in the mass measurements. We can see why this should be by noting that the errors in the two measurements are independent of each other. In other words, the indeterminate error in one measurement has no effect on the magnitude of the error in the other measurement. In more mathematical language, we can consider that the errors are orthogonal. This can be illustrated graphically:

So, for our examples above, the standard deviations of the full and empty weights are both 0.006 g. Our estimate of the standard deviation of the mass of precipitate is then S_m = √2 × 0.006 g = 0.0085 g (0.008 g rounded to the resolution of the original measurements).


(Diagram: S_{m_1} and S_{m_2} drawn as perpendicular vectors; their resultant is S_m.)


If the values were paired, then it would be appropriate to use the standard deviation of the differences to estimate the indeterminate error in the mass of precipitate.
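A brief numerical check of the √2 result (a sketch; the 0.006 g figure is the repeated-weighing standard deviation quoted above):

from math import sqrt

S_full, S_empty = 0.006, 0.006              # repeated-weighing standard deviations (g)

S_mass = sqrt(S_full ** 2 + S_empty ** 2)   # Equation (14) with k1 = 1, k2 = -1
print(round(S_mass, 4))                     # 0.0085 g, i.e. sqrt(2) * 0.006 g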

Distributions

The data in Example 1 may be plotted as number of occurrences against observed value. Such a graph is known as a frequency distribution or frequency histogram (Figure 4a). If the data are plotted as summed number of occurrences against observed value the graph is known as a cumulative frequency distribution curve, Figure 4b.

The standard deviation (S) provides a measure of the spread of data, but it does not necessarily indicate the way in which the data are distributed. In the example above the data are clustered around the centre of the distribution, which is the mean. We could in theory make an infinite number of mass measurements and so we could completely define the values of recorded mass.

The infinite set of possible observations is the population. The 20 measurements taken are a sample of the population. If there are no determinate errors then the mean of the population is the true value of the mass. The mean of the population is denoted by μ. Similarly the standard deviation of the population would be a measure of the true distribution, and is denoted by σ.

The true mean of a distribution is given by the symbol μ and, for a symmetric distribution such as the normal, coincides with the maximum of the probability density function. Its formal definition is given by Equation (17):

\mu = \int_{-\infty}^{+\infty} x\, f(x)\, dx    Equation (17)

A measurement is an estimate of the probability density function of the observed variable. The position of the measurement is given by an estimate of the mean, while the shape, or spread, of the distribution is provided by an estimate for the standard deviation:

x̄ is an estimate for μ. S is an estimate for σ.

All of the statistical tests described in these notes are parametric tests; they make the assumption that the data follow a particular distribution, in most cases the Normal or Gaussian distribution.


As more observations are made of a variable so a continuous distribution begins to be defined. As the number of observations increases so too does the quality of the definition of the distribution until, when an infinite number of observations are available, the distribution is completely defined.

If the distribution curve obtained from an infinite number of observations is normalised so that the area under it is equal to 1 then the function that describes that distribution is known as the probability density function (PDF). The area under the probability density function is 1 (by definition) and is the probability of observing x over all possible values:


Figure 4. (a) Frequency distribution and (b) cumulative frequency distribution of the data in Example 1.


P = \int_{-\infty}^{+\infty} F(x)\, dx = 1    Equation (18)

Consequently the area under the curve between two values for x is the probability of observing a value for x in the defined range, see Figure 1 and Equation (19):

P(x \to x+\Delta x) = \int_{x}^{x+\Delta x} F(x)\, dx    Equation (19)

There are many types of probability density function, but the three most important are:

Normal distribution, also known as a Gaussian distribution
Poisson distribution, also known as a stochastic distribution
Binomial distribution

The probability density function of most physical measurements can be described or approximated by a normal distribution. Other important distributions include the χ², exponential and bivariate. It should be emphasised, though, that non-normality is rare and that some distributions which are not normal may be rendered normal by taking the logarithm of the variable. In addition, binomial distributions approximate towards a normal distribution for large numbers of observations.

A normal distribution is defined by μ and σ and the general form is given in Equation (20), and Figure 5 summarises the form of a normal distribution with a mean of 0 and a standard deviation of 1 (N(0,1)).

f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}    Equation (20)

For the normal distribution, the area under the curve bounded by ±1σ will contain approximately 68.3% of the population, increasing the bounded area to ±2σ will draw in approximately 95.5% of the population, while a further increase to ±3σ will increase the proportion of the population to approximately 99.7%, as illustrated in Figure 6.
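These areas can be reproduced from the cumulative Normal distribution; a minimal sketch, assuming SciPy is available:

from scipy.stats import norm

for k in (1, 2, 3):
    area = norm.cdf(k) - norm.cdf(-k)   # area within +/- k sigma of the mean
    print(k, round(100 * area, 2))      # 68.27, 95.45, 99.73 per cent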


Figure 5. A Normal distribution having a mean of 0 and a standard deviation of 1. (N(0,1)).

Figure 6. Areas under the Normal distribution for ±1σ, ±2σ and ±3σ.


Another important distribution is the log-normal. This is often found in cases where the variable cannot take values below a particular limit, unlike the normal distribution which is defined from -∞ to +∞. Examples include aerosol particle size distributions and antibody concentrations in blood, where the variable cannot go below zero. The log-normal distribution can be converted into a normal distribution by taking the log of the variable, as shown in Figure 7.


Figure 7. The Log-Normal distribution and its transformation into a Normal distribution.


The final distribution we will consider is the t-distribution. The derivation of the t-distribution was first published in 1908 by William Sealy Gosset, who worked at the Guinness Brewery in Dublin. He was not allowed to publish under his own name, so the paper was written under the pseudonym “Student”. The t-distribution and the associated theory became well-known through the work of R.A. Fisher, who called the distribution “Student's distribution”.

Gosset studied the distribution of:

t = \frac{\bar{x} - \mu}{S/\sqrt{n}}

And showed it was of the form:

f(t) \propto \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}

where ν = n − 1. ν is known as the "degrees of freedom" of the variable. In this case it is one less than the number of observations. This is because if we know the mean we can determine any observation from the remaining n − 1 observations. In other words, we only have n − 1 independent observations. The t-distribution is independent of μ and σ, so no estimate of σ is required and S can be used instead. For large n, the t-distribution tends towards the normal distribution, while at low n the tails of the t-distribution are higher than those of the normal distribution. Figure 8 shows the normal distribution (black line) against the t distribution for degrees of freedom of 1, 2, 3, 4, 5, 10, 15, 20 and 30. The t distribution is of use in establishing probabilities where the number of observations is less than ~30, while the normal distribution can be used where the number of observations is > 30, as the difference between the two distributions is then insignificant.
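The convergence of the t-distribution towards the Normal can be seen from its two-tailed 95% critical values; a sketch assuming SciPy is available:

from scipy.stats import norm, t

for df in (1, 5, 10, 30, 120):
    print(df, round(t.ppf(0.975, df), 3))   # 12.706, 2.571, 2.228, 2.042, 1.980

print(round(norm.ppf(0.975), 3))            # 1.960, the Normal (large-n) limit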


Figure 8. The Normal distribution compared to the t distribution for degrees of freedom of 1, 2, 3, 4, 5, 10, 15, 20 and 30.


Sampling Distributions

If a limited set of observations is made on a variable, a range of values is obtained and from these the sample mean and standard deviation can be obtained. It is unlikely that the sample mean is equal to the true value for the mean, μ, and similarly that the sample standard deviation S is equal to the true standard deviation σ.

Furthermore, if another set of readings is taken giving new values for the sample mean and standard deviation it is unlikely that these new values obtained in a second set would agree with those obtained from a first set of observations.

If this process is repeated a distribution for x is obtained, and this distribution is the sampling distribution of the mean. The sampling distribution of the mean has a mean equal to that of the original population. Its standard deviation, however, is different and is referred to as the standard error of the mean and is defined in Equation (21):

\mathrm{SEM} = \frac{\sigma}{\sqrt{n}}    Equation (21)

where SEM is the standard error of the mean, σ is the standard deviation of the observed variable's PDF and n is the number of observations per sample.


The obvious corollary to this definition is that the more measurements that are used to define a variable the more precise the result will be.

The Central Limit Theorem

The central limit theorem is important and lies at the centre of many statistical techniques applied to experimental data. It may be summarised as follows.

If we take samples randomly from a non-normally distributed population, the resulting distribution of the mean becomes more normally distributed as the sample size increases. More generally, the central limit theorem states that the distribution of a sum or average of many random quantities is close to normal. This is true even if the quantities are not independent (as long as they are not too strongly associated) and even if they have different distributions (as long as no one random quantity is so large it outweighs all the others). It is also only true if the underlying distributions have a finite standard deviation.

The central limit theorem suggests why normal distributions are found to be common models for observed data. Any measurement that is the sum of many small random influences will have an approximately normal distribution, even if the distributions for these individual influences are not normally distributed. This theorem is also important because many statistical tests assume that data are normally distributed, and the central limit theorem shows that normal or near-normal distributions occur naturally.

To summarise, the sample mean of a random sample of a population having a mean μ and a standard deviation σ will have:

\mu(\bar{x}) = \mu

\sigma(\bar{x}) = \frac{\sigma}{\sqrt{n}}

The sample mean is an unbiased estimator of the population mean μ. In addition, the central limit theorem indicates that for large samples, the sampling distribution of x̄ is approximately normally distributed with a mean of μ and a standard deviation of σ/√n.

Figure 9 illustrates how increasing sample size changes the observed distribution of even non-normally distributed variables to a more normal distribution.
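A small simulation makes the theorem concrete; a sketch assuming NumPy is available, drawing samples from a deliberately non-normal (uniform) population and comparing the spread of the sample means with σ/√n:

import numpy as np

rng = np.random.default_rng(0)
sigma = np.sqrt(1.0 / 12.0)     # standard deviation of a uniform(0, 1) population

for n in (2, 10, 50):
    means = rng.uniform(0.0, 1.0, size=(10000, n)).mean(axis=1)   # 10000 sample means
    print(n, means.std(ddof=1), sigma / np.sqrt(n))               # observed vs predicted SEM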


Accuracy and Precision

Accuracy is defined by the trueness of a measurement. An accurate measurement is one which produces a value for x̄ equal to μ, without any systematic error.

A precise measurement produces a value for S that is as close to zero as possible. Indeterminate error may arise from the variable under observation as well as from the measurement technique being applied. We assume that the variable under observation has no variance associated with it. If the variable does have some variance, then a precise measurement would produce a sample standard deviation (S) as close as possible to the underlying population standard deviation (σ) of the observed variable.

It is important to remember that precision and accuracy describe two different properties of a measurement, and that precise data does not imply accurate data and vice-versa. Accuracy is tested by calibration and validation methods, in particular through the analysis of certified national and international standards.

Precision, that is indeterminate error, is best reported through the use of confidence limits at a specified level of probability. The use of confidence limits has much to recommend it, in that it explicitly states that indeterminate errors are being reported, whereas a bold statement giving a range may give the impression that it represents a determinate error.

Figure 9. The Central Limit Theorem in action.

A clear and straightforward format for reporting confidence limits would be:

x̄ ± E at the P% confidence limit for n measurements

Figure 10 shows the relationship between the form of the probability distribution function and the terms accurate and precise.

Confidence Limits

Confidence limits define a range with a probability that the true value, μ, lies within it. The format is:

x̄ ± E at the P% confidence level for n measurements.    Equation (22)

Confidence limits are determined by the value for the standard deviation. However, as shown previously, for a small sample (n < ~30) the estimate, S, is unlikely to be an accurate assessment of the true value, σ. Consequently, Student's t-distribution is used to derive a value for E:

E = t_{P,\,n-1}\,\frac{S}{\sqrt{n}}    Equation (23)

Example 2


Figure 10. Accuracy and Precision

A: precise and accurate. B: imprecise and accurate. C: precise and inaccurate. D: imprecise and inaccurate. (Observed value on the x axis, normalised number of observations on the y axis.)


The mass measurements used in Example 1 have been summarised as:

N = 20, x̄ = 12.475 g, S = 0.012 g

The data are rounded to the resolution of the original measurement.

Table 1 is a copy of a table of Student's t-distribution for different percentage confidence limits. This is similar in layout to Table 7 (page 17) in Murdoch and Barnes, with the addition of column headings for 2α, the two-tailed probability. From Table 1 it can be seen that for ν = n − 1 = 19 the values for t are 2.093, 2.861 and 3.883 for the 95, 99 and 99.9% probabilities respectively. We use the two-tailed value in this case (use the column with the appropriate 2α value: 95% confidence is 2α = 0.05, 99% is 2α = 0.01, etc). The calculation of the 95% confidence limits would be as follows:

Mass = 12.475 ± (2.093 × 0.012/√20) g = 12.475 ± 0.0056 g at the 95% confidence limit for 20 measurements

In this case the error is reported to two significant figures as rounding to the same resolution as the original mass measurement would result in a significantly different value of probability.

The confidence level simply says that there is 95%, 99% or 99.9% confidence that the true value lies within the specified confidence limits. In other words, there is less than a 1 in 20 (for 95%), 1 in 100 (for 99%) or 1 in 1000 (for 99.9%) chance that the true value lies outside the confidence limits. The drawback to using higher confidence levels is that the confidence limits have to be drawn wider (the t value increases).

If the number of observations is > 30, we can use the Normal distribution instead of the t distribution, as the two are virtually the same. Thus, for 95, 99 and 99.9% confidence levels the appropriate values would be 1.960, 2.576 and 3.291, as shown in the last row of Table 1, where the Normal and t-distributions are the same.
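The Example 2 limits can be reproduced directly; a sketch assuming SciPy is available for the t critical value:

from math import sqrt
from scipy.stats import t

n, xbar, s = 20, 12.475, 0.012

t_crit = t.ppf(1 - 0.05 / 2, n - 1)   # two-tailed 95%, nu = 19 -> 2.093
E = t_crit * s / sqrt(n)              # Equation (23) -> ~0.0056 g

print(f"{xbar} +/- {E:.4f} g at the 95% confidence limit for {n} measurements")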


Table 1. Critical Values for t-Test

ν       α = 0.1   0.05    0.025   0.01    0.005   0.0025   0.001    0.0005
       2α = 0.2   0.1     0.05    0.02    0.01    0.005    0.002    0.001
1       3.078     6.314   12.706  31.821  63.657  127.321  318.309  636.619
2       1.886     2.920   4.303   6.965   9.925   14.089   22.327   31.599
3       1.638     2.353   3.182   4.541   5.841   7.453    10.215   12.924
4       1.533     2.132   2.776   3.747   4.604   5.598    7.173    8.610
5       1.476     2.015   2.571   3.365   4.032   4.773    5.893    6.869
6       1.440     1.943   2.447   3.143   3.707   4.317    5.208    5.959
7       1.415     1.895   2.365   2.998   3.499   4.029    4.785    5.408
8       1.397     1.860   2.306   2.896   3.355   3.833    4.501    5.041
9       1.383     1.833   2.262   2.821   3.250   3.690    4.297    4.781
10      1.372     1.812   2.228   2.764   3.169   3.581    4.144    4.587
11      1.363     1.796   2.201   2.718   3.106   3.497    4.025    4.437
12      1.356     1.782   2.179   2.681   3.055   3.428    3.930    4.318
13      1.350     1.771   2.160   2.650   3.012   3.372    3.852    4.221
14      1.345     1.761   2.145   2.624   2.977   3.326    3.787    4.140
15      1.341     1.753   2.131   2.602   2.947   3.286    3.733    4.073
16      1.337     1.746   2.120   2.583   2.921   3.252    3.686    4.015
17      1.333     1.740   2.110   2.567   2.898   3.222    3.646    3.965
18      1.330     1.734   2.101   2.552   2.878   3.197    3.610    3.922
19      1.328     1.729   2.093   2.539   2.861   3.174    3.579    3.883
20      1.325     1.725   2.086   2.528   2.845   3.153    3.552    3.850
21      1.323     1.721   2.080   2.518   2.831   3.135    3.527    3.819
22      1.321     1.717   2.074   2.508   2.819   3.119    3.505    3.792
23      1.319     1.714   2.069   2.500   2.807   3.104    3.485    3.768
24      1.318     1.711   2.064   2.492   2.797   3.091    3.467    3.745
25      1.316     1.708   2.060   2.485   2.787   3.078    3.450    3.725
26      1.315     1.706   2.056   2.479   2.779   3.067    3.435    3.707
27      1.314     1.703   2.052   2.473   2.771   3.057    3.421    3.690
28      1.313     1.701   2.048   2.467   2.763   3.047    3.408    3.674
29      1.311     1.699   2.045   2.462   2.756   3.038    3.396    3.659
30      1.310     1.697   2.042   2.457   2.750   3.030    3.385    3.646
40      1.303     1.684   2.021   2.423   2.704   2.971    3.307    3.551
60      1.296     1.671   2.000   2.390   2.660   2.915    3.232    3.460
120     1.289     1.658   1.980   2.358   2.617   2.860    3.160    3.373
∞       1.282     1.645   1.960   2.326   2.576   2.807    3.090    3.291

The α values are used for the 1-tailed test, and the 2α values for the 2-tailed test.


Statistical Tests

Indeterminate error means that it is unlikely that the mean observed value of a variable will exactly agree with a previous set of observations, or with the mean derived from an alternative measurement technique. In evaluating data, a decision often needs to be made as to whether the difference is real or whether it is due to indeterminate error in each of the measurements. When the difference between means is small compared to their respective standard deviations, statistical tests can be used to support the judgment of the analyst.

Statistical tests are no replacement for common sense. Sometimes statistical tests result in non-sensible conclusions, and in such cases it is up to the experimental worker to decide the result. In many cases, the decision will be to seek more experimental data.

The statistical tests described here are:

Kolmogorov-Smirnov and Lilliefors tests, used to determine the probability that a set of observations is drawn from a particular distribution.

F-tests, used for the comparison of standard deviations.

t-tests, which are used to compare means.

Before we can perform any statistical test, we must first establish a null hypothesis (H0). This is an exact statement of something we initially suppose to be true. For example, we may propose that the means of two samples are the same, and that any observed difference is a result of random error:

H0: μ1 = μ2

The alternate hypothesis (H1) is not simply the opposite of H0. In this case, there are three possibilities:

H1: μ1 > μ2
H1: μ1 < μ2
H1: μ1 ≠ μ2

The first two alternate hypotheses are one-tailed. The third alternate hypothesis is two-tailed. We use one-tailed tests where the direction of difference is important, while two-tailed tests are used where the direction of difference is unimportant. We also need to establish beforehand the tails of the test. For example, if we were performing a clinical trial of a drug designed to reduce blood pressure the null hypothesis would be that the blood pressure before and after treatment would be the same, while the alternate hypothesis would be that the mean blood pressure after treatment would be lower than before treatment. In this case we would use a one-tailed test because we have an expectation before performing the trial that the direction of difference would be important.

Alternatively, if we are just testing whether two analytical methods give the same result, the alternate hypothesis would be simply that the two results are different, and we would use a two-tailed test.

As with the setting of confidence limits, we also need to set an appropriate confidence level before performing the test. The confidence level (P) is expressed as a percentage, and can be related to the probability (α) in the following ways:


For a one-tailed test: P = 100(1 − α)%

For a two-tailed test: P = 100(1 − 2α)%

There are two types of possible errors when performing statistical tests:

Type 1: rejection of the null hypothesis even though it is in fact true

Type 2: acceptance of the null hypothesis even though it is false

Figure 11 illustrates the meaning of the two types of error. The solid line shows the sampling distribution of the mean if the null hypothesis (the exact statement) is true. The dashed line shows a possible distribution that fits the alternate hypothesis.

Figure 11. Illustration of Type I and Type II Errors.

Reducing the chance of a Type 1 error increases the chance of a Type 2 error, and conversely reducing the chance of a Type 2 error increases the chance of a Type 1 error. We can see in Figure 11 that if we increase the critical value x_c we reduce the area (or probability) of the Type 1 error, but at the expense of increasing the area (probability) of the Type 2 error. Only by increasing the sample size can we reduce the chances of both types of error. This is because we then reduce the standard error of the mean. Figure 12 illustrates this concept.

Figure 12. Increasing sample size to reduce both Type 1 and Type 2 errors.


Tests for Normality

If we look at the weight data given earlier, we can plot it as a frequency distribution (Figure 4a). We can see that the data appears at least approximately normally distributed. More rigorously, we can create a fractional cumulative frequency distribution and plot this against the value normalised to the normal distribution with a mean of 0 and a standard deviation of 1 (N(0,1)). This is done by subtracting the mean and dividing by the standard deviation:

\mathrm{SNV} = \frac{x - \bar{x}}{S}    (Equation 22)

The fractional cumulative frequency for any value is simply given by the cumulative frequency divided by the number of data points PLUS ONE:

\mathrm{fractional\ cumulative\ frequency} = \frac{\mathrm{cumulative\ frequency}}{n + 1}

The divisor is n+1 to ensure that the centre of the fractional cumulative frequency distribution is at 0.5. If the divisor was n, then the range of fractional cumulative frequencies would vary from a minimum of 1/n to n/n, giving a mean or central value of (n+1)/2n. Using the divisor n+1 means that the range of fractional cumulative frequencies would vary from a minimum of 1/(n+1) to n/(n+1), giving a mean or central value of (n+1)/2(n+1), or 0.5.

We can replot our weight data as the fractional cumulative frequency against the standard normal variable with an overlaid line showing the expected cumulative distribution for a normal distribution, as shown in Figure 13. The individual points are close to the expected frequencies (as shown by the solid line). This indicates that this dataset is reasonably close to a normal distribution. This method can be performed manually using Normal probability graph paper, where the Y axis has been made non-linear to give a straight line instead of the sigmoidal curve shown below in Figure 13.

Figure 13. Graph of Data Points on Cumulative Frequency Plot


Once we have this graph, how can we tell if the individual data points are far enough from the line to be considered not part of the normal distribution? We can apply the Kolmogorov-Smirnov (K-S) test to determine if any of the points lie so far from the line that the data can be considered non-normally distributed. In this test (and its variants) the null hypothesis is that the observations are all drawn from the hypothesized distribution, and the alternate hypothesis is that they are not all drawn from the hypothesized distribution. The maximum deviation from the expected cumulative frequency is determined and compared to a table of critical values at a given confidence level. If the maximum deviation is greater than this critical value, then the null hypothesis is rejected and the alternate hypothesis accepted.

Two tests are possible with this method; the first determines whether the data fits a particular normal distribution whose parameters are determined in advance and the second whether the data fits a normal distribution whose parameters are the sample mean and standard deviation. The first method, which requires the distribution to be known in advance, uses the critical values in Table 3 (also Table 16, page 28 in Murdoch and Barnes). The second method uses a modified table of critical values derived by Hubert Lilliefors, with Table 4 giving the critical values for a number of different confidence levels. There is no table of Lilliefors values in Murdoch and Barnes, so this will be given in the paper if an exam question requires the use of this table. We will use the second method for this data set. The two last columns of the table are the expected cumulative frequency for a given value of the standard normal value and the absolute difference between the expected and actual cumulative frequencies. To apply the Lilliefors variant of the K-S test, we first establish our null hypothesis, that the data comes from a normally distributed population with a mean and standard deviation equal to the sample mean and standard deviation. We then find the maximum difference between the expected and actual cumulative frequencies (0.119). We then compare this to the critical value from the Lilliefors table at the appropriate confidence level (95% in this case). For 20


For 20 values we find the critical value to be 0.192. Our maximum difference is less than the critical value, so we accept the null hypothesis that the data are normally distributed with a mean given by the sample mean and a standard deviation given by the sample standard deviation.

In this case, we have generated the standard normal values using the mean and standard deviation determined from the data, so we are using Lilliefors’ variant of the K-S test. To determine if the data fit this particular normal distribution, we simply use the mean and standard deviation of that distribution to generate the standard normal values. The expected cumulative frequency can be obtained from tables, or generated directly in spreadsheets such as Microsoft Excel.

Table 5 gives the area in the upper tail of the Normal distribution. This is the same as the table in Murdoch and Barnes. The required value can be found from this table very simply. If the SNV whose cumulative frequency you wish to determine is negative, you simply find the row that corresponds to the first two digits of the SNV (ignoring the sign), then look along this row to the column that corresponds to the third digit. That is then the expected cumulative frequency. If the SNV is positive, then you look up the cumulative frequency as before, then subtract it from 1 to get the required value. As an example, consider the SNV of 0.579 in the table below. Rounded, this is 0.58, which gives a value of 0.2810. To get the correct value, we subtract this from 1 to give 0.7190. This is close to the true value in the table, which was generated directly from the unrounded SNV of 0.5790.

Table 2. Sorted list of weights showing the actual and expected cumulative frequencies.

Value    SNV (normalised to N(0,1))   Number of occurrences   Cumulative frequency   Fractional cumulative frequency   Expected cumulative frequency   |expected − actual|
12.450   -2.0679                      1                       1                      0.0476                            0.0193                          0.0283
12.465   -0.8272                      2                       3                      0.1429                            0.2041                          0.0612
12.466   -0.7444                      1                       4                      0.1905                            0.2283                          0.0378
12.468   -0.5790                      1                       5                      0.2381                            0.2813                          0.0432
12.469   -0.4963                      1                       6                      0.2857                            0.3098                          0.0241
12.472   -0.2481                      1                       7                      0.3333                            0.4020                          0.0688
12.473   -0.1654                      2                       9                      0.4286                            0.4343                          0.0057
12.474   -0.0827                      1                       10                     0.4762                            0.4670                          0.0092
12.475    0.0000                      3                       13                     0.6190                            0.5000                          0.1190
12.477    0.1654                      1                       14                     0.6667                            0.5657                          0.1010
12.481    0.4963                      2                       16                     0.7619                            0.6902                          0.0717
12.482    0.5790                      1                       17                     0.8095                            0.7187                          0.0908
12.485    0.8272                      2                       19                     0.9048                            0.7959                          0.1089
12.513    3.1432                      1                       20                     0.9524                            0.9992                          0.0468
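The maximum difference in the last column of Table 2 (the Lilliefors test statistic) can be computed directly; a sketch assuming SciPy for the Normal cumulative distribution, using the fractional cumulative frequency (divisor n + 1) and, as in the table, the mean rounded to the precision of the data:

import numpy as np
from scipy.stats import norm

masses = np.sort(np.array([12.475, 12.469, 12.481, 12.466, 12.474, 12.465, 12.475,
                           12.473, 12.481, 12.472, 12.482, 12.475, 12.485, 12.473,
                           12.465, 12.485, 12.468, 12.477, 12.450, 12.513]))
n = len(masses)

snv = (masses - 12.475) / masses.std(ddof=1)   # standard normal values (rounded mean, unrounded S)
actual = np.arange(1, n + 1) / (n + 1)         # fractional cumulative frequencies
expected = norm.cdf(snv)                       # expected cumulative frequencies

D = np.max(np.abs(expected - actual))
print(round(D, 3))   # ~0.119, below the 95% Lilliefors critical value of 0.192 for n = 20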


Table 3. Kolmogorov-Smirnov critical values

n     α=0.20   α=0.15   α=0.10   α=0.05   α=0.01
1     0.900    0.925    0.950    0.975    0.995
2     0.684    0.726    0.776    0.842    0.929
3     0.565    0.597    0.642    0.708    0.828
4     0.494    0.525    0.564    0.624    0.733
5     0.446    0.474    0.510    0.565    0.669
6     0.410    0.436    0.470    0.521    0.618
7     0.381    0.405    0.438    0.486    0.577
8     0.358    0.381    0.411    0.457    0.543
9     0.339    0.360    0.388    0.432    0.514
10    0.322    0.342    0.368    0.410    0.490
11    0.307    0.326    0.352    0.391    0.468
12    0.295    0.313    0.338    0.375    0.450
13    0.284    0.302    0.325    0.361    0.433
14    0.274    0.292    0.314    0.349    0.418
15    0.266    0.283    0.304    0.338    0.404
16    0.258    0.274    0.295    0.328    0.392
17    0.250    0.266    0.286    0.318    0.381
18    0.244    0.259    0.278    0.309    0.371
19    0.237    0.252    0.272    0.301    0.363
20    0.231    0.246    0.264    0.294    0.356
25    0.210    0.220    0.240    0.270    0.320
30    0.190    0.200    0.220    0.240    0.290
35    0.180    0.190    0.210    0.230    0.270

>35


Table 4. Lilliefors modified critical values

n     α = 0.20   α = 0.15   α = 0.10   α = 0.05   α = 0.01
4     0.3027     0.3216     0.3456     0.3754     0.4129
5     0.2893     0.3027     0.3188     0.3427     0.3959
6     0.2694     0.2816     0.2982     0.3245     0.3728
7     0.2521     0.2641     0.2802     0.3041     0.3504
8     0.2387     0.2502     0.2649     0.2875     0.3331
9     0.2273     0.2382     0.2522     0.2744     0.3162
10    0.2171     0.2273     0.2410     0.2616     0.3037
11    0.2080     0.2179     0.2306     0.2506     0.2905
12    0.2004     0.2101     0.2228     0.2426     0.2812
13    0.1932     0.2025     0.2147     0.2337     0.2714
14    0.1869     0.1959     0.2077     0.2257     0.2627
15    0.1811     0.1899     0.2016     0.2196     0.2545
16    0.1758     0.1843     0.1956     0.2128     0.2477
17    0.1711     0.1794     0.1902     0.2071     0.2408
18    0.1666     0.1747     0.1852     0.2018     0.2345
19    0.1624     0.1700     0.1803     0.1965     0.2285
20    0.1589     0.1666     0.1764     0.1920     0.2226
21    0.1553     0.1629     0.1726     0.1881     0.2190
22    0.1517     0.1592     0.1690     0.1840     0.2141
23    0.1484     0.1555     0.1650     0.1798     0.2090
24    0.1458     0.1527     0.1619     0.1766     0.2053
25    0.1429     0.1498     0.1589     0.1726     0.2010
26    0.1406     0.1472     0.1562     0.1699     0.1985
27    0.1381     0.1448     0.1533     0.1665     0.1941
28    0.1358     0.1423     0.1509     0.1641     0.1911
29    0.1334     0.1398     0.1483     0.1614     0.1886
30    0.1315     0.1378     0.1460     0.1590     0.1848
31    0.1291     0.1353     0.1432     0.1559     0.1820
32    0.1274     0.1336     0.1415     0.1542     0.1798
33    0.1254     0.1314     0.1392     0.1518     0.1770
34    0.1236     0.1295     0.1373     0.1497     0.1747
35    0.1220     0.1278     0.1356     0.1478     0.1720


Table 5. Area in the upper tail of the Normal distribution

z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.5000  0.4960  0.4920  0.4880  0.4840  0.4801  0.4761  0.4721  0.4681  0.4641
0.1   0.4602  0.4562  0.4522  0.4483  0.4443  0.4404  0.4364  0.4325  0.4286  0.4247
0.2   0.4207  0.4168  0.4129  0.4090  0.4052  0.4013  0.3974  0.3936  0.3897  0.3859
0.3   0.3821  0.3783  0.3745  0.3707  0.3669  0.3632  0.3594  0.3557  0.3520  0.3483
0.4   0.3446  0.3409  0.3372  0.3336  0.3300  0.3264  0.3228  0.3192  0.3156  0.3121
0.5   0.3085  0.3050  0.3015  0.2981  0.2946  0.2912  0.2877  0.2843  0.2810  0.2776
0.6   0.2743  0.2709  0.2676  0.2643  0.2611  0.2578  0.2546  0.2514  0.2483  0.2451
0.7   0.2420  0.2389  0.2358  0.2327  0.2296  0.2266  0.2236  0.2206  0.2177  0.2148
0.8   0.2119  0.2090  0.2061  0.2033  0.2005  0.1977  0.1949  0.1922  0.1894  0.1867
0.9   0.1841  0.1814  0.1788  0.1762  0.1736  0.1711  0.1685  0.1660  0.1635  0.1611
1.0   0.1587  0.1562  0.1539  0.1515  0.1492  0.1469  0.1446  0.1423  0.1401  0.1379
1.1   0.1357  0.1335  0.1314  0.1292  0.1271  0.1251  0.1230  0.1210  0.1190  0.1170
1.2   0.1151  0.1131  0.1112  0.1093  0.1075  0.1056  0.1038  0.1020  0.1003  0.0985
1.3   0.0968  0.0951  0.0934  0.0918  0.0901  0.0885  0.0869  0.0853  0.0838  0.0823
1.4   0.0808  0.0793  0.0778  0.0764  0.0749  0.0735  0.0721  0.0708  0.0694  0.0681
1.5   0.0668  0.0655  0.0643  0.0630  0.0618  0.0606  0.0594  0.0582  0.0571  0.0559
1.6   0.0548  0.0537  0.0526  0.0516  0.0505  0.0495  0.0485  0.0475  0.0465  0.0455
1.7   0.0446  0.0436  0.0427  0.0418  0.0409  0.0401  0.0392  0.0384  0.0375  0.0367
1.8   0.0359  0.0351  0.0344  0.0336  0.0329  0.0322  0.0314  0.0307  0.0301  0.0294
1.9   0.0287  0.0281  0.0274  0.0268  0.0262  0.0256  0.0250  0.0244  0.0239  0.0233
2.0   0.0228  0.0222  0.0217  0.0212  0.0207  0.0202  0.0197  0.0192  0.0188  0.0183
2.1   0.0179  0.0174  0.0170  0.0166  0.0162  0.0158  0.0154  0.0150  0.0146  0.0143
2.2   0.0139  0.0136  0.0132  0.0129  0.0125  0.0122  0.0119  0.0116  0.0113  0.0110
2.3   0.0107  0.0104  0.0102  0.0099  0.0096  0.0094  0.0091  0.0089  0.0087  0.0084
2.4   0.0082  0.0080  0.0078  0.0075  0.0073  0.0071  0.0069  0.0068  0.0066  0.0064
2.5   0.0062  0.0060  0.0059  0.0057  0.0055  0.0054  0.0052  0.0051  0.0049  0.0048
2.6   0.0047  0.0045  0.0044  0.0043  0.0041  0.0040  0.0039  0.0038  0.0037  0.0036
2.7   0.0035  0.0034  0.0033  0.0032  0.0031  0.0030  0.0029  0.0028  0.0027  0.0026
2.8   0.0026  0.0025  0.0024  0.0023  0.0023  0.0022  0.0021  0.0021  0.0020  0.0019
2.9   0.0019  0.0018  0.0018  0.0017  0.0016  0.0016  0.0015  0.0015  0.0014  0.0014
3.0   0.0013  0.0013  0.0013  0.0012  0.0012  0.0011  0.0011  0.0011  0.0010  0.0010
3.1   0.0010  0.0009  0.0009  0.0009  0.0008  0.0008  0.0008  0.0008  0.0007  0.0007
3.2   0.0007  0.0007  0.0006  0.0006  0.0006  0.0006  0.0006  0.0005  0.0005  0.0005
3.3   0.0005  0.0005  0.0005  0.0004  0.0004  0.0004  0.0004  0.0004  0.0004  0.0003
3.4   0.0003  0.0003  0.0003  0.0003  0.0003  0.0003  0.0003  0.0003  0.0003  0.0002

F-Tests

F-tests are used for the comparison of the standard deviations of two samples. They can be used to determine whether one set of data is more precise than another (a one-tailed test), or whether the two differ in precision (a two-tailed test). The F-test looks at the ratio of two sample variances,

$F = \dfrac{S_1^2}{S_2^2}$   Equation (23)

$S_1$ and $S_2$ are chosen such that $F \geq 1$.

The critical values for F are determined by the number of observations in each of the two samples, n1 and n2, the confidence level and the type of test performed. The degrees of freedom for an F-test are given by n1 − 1 and n2 − 1. The null hypothesis is that the variances are equal:

$S_1^2 = S_2^2$

Table 6 is an example of a one-tailed F-test table at the 95% confidence limit, while Table 7 is the two-tailed variant. If the calculated value for F is less than the critical value obtained from the appropriate F-table then the hypothesis that the two variances of the sample populations are equal at the stated confidence level is accepted. If the calculated value for F is greater than the critical value given in the F table then this hypothesis is rejected, and the alternative hypothesis accepted.

Example 4

A different experimental worker repeated the measurement in Example 1. The data obtained were:

$\bar{x}_2$ = 12.501 g, $S_2$ = 0.019 g, $n_2$ = 5

The original data were:

$\bar{x}_1$ = 12.475 g, $S_1$ = 0.012 g, $n_1$ = 20

The null hypothesis adopted is

$S_1^2 = S_2^2$

that is, that there is no significant difference in the variance of both samples at the 95% (P = 0.05) confidence level.

$S_1^2 = 1.44\times10^{-4}\ \mathrm{g}^2, \qquad S_2^2 = 3.61\times10^{-4}\ \mathrm{g}^2$

$F = \dfrac{3.61\times10^{-4}}{1.44\times10^{-4}} = 2.51$

The critical value for F for a two-tailed test at the 95% confidence level, P = 0.05, for degrees of freedom of 4 and 19 is F0.05,4,19 = 3.56. A two-tailed test is used because we have no reason to suppose that one set of measurements will be more precise than the other.

In this case, the calculated F value is less than the critical value (2.51 < 3.56), in other words, we accept the null hypothesis that there is no significant difference between the two variances.
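The same comparison can be scripted. Below is a minimal Python sketch (scipy assumed available) using the summary statistics of Examples 1 and 4:

from scipy.stats import f

s1, n1 = 0.012, 20   # original data
s2, n2 = 0.019, 5    # repeat measurements
F = max(s1, s2) ** 2 / min(s1, s2) ** 2           # larger variance on top so F >= 1
df_num = (n2 - 1) if s2 > s1 else (n1 - 1)
df_den = (n1 - 1) if s2 > s1 else (n2 - 1)
F_crit = f.ppf(1 - 0.025, df_num, df_den)         # two-tailed at 95%: alpha/2 in the upper tail
print(F, F_crit)   # F ~ 2.51, F_crit ~ 3.56, so the variances are not significantly different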

The table of F values in Murdoch and Barnes (Table 9, pages 20-21) is laid out somewhat differently to Tables 6 and 7 below. In Murdoch and Barnes the one-tailed values for α = 0.05, 0.025, 0.01 and 0.001 are given for each combination of numerator (columns) and denominator (rows) degrees of freedom. This corresponds to P = 95, 97.5, 99 and 99.9% one-tailed or P = 90, 95, 98 and 99.8% two-tailed. The value corresponding to α = 0.025 is bracketed to make it easier to see.


Table 6. Table of one-tailed critical F values at the 95% confidence level.

Numerator degrees of freedom (ν1 = n1 − 1) run across the columns; denominator degrees of freedom (ν2 = n2 − 1) run down the rows.

ν2\ν1   1       2       3       4       5       6       7       8       9       10
1     161.45  199.50  215.71  224.58  230.16  233.99  236.77  238.88  240.54  241.88
2      18.51   19.00   19.16   19.25   19.30   19.33   19.35   19.37   19.38   19.40
3      10.13    9.55    9.28    9.12    9.01    8.94    8.89    8.85    8.81    8.79
4       7.71    6.94    6.59    6.39    6.26    6.16    6.09    6.04    6.00    5.96
5       6.61    5.79    5.41    5.19    5.05    4.95    4.88    4.82    4.77    4.74
6       5.99    5.14    4.76    4.53    4.39    4.28    4.21    4.15    4.10    4.06
7       5.59    4.74    4.35    4.12    3.97    3.87    3.79    3.73    3.68    3.64
8       5.32    4.46    4.07    3.84    3.69    3.58    3.50    3.44    3.39    3.35
9       5.12    4.26    3.86    3.63    3.48    3.37    3.29    3.23    3.18    3.14
10      4.96    4.10    3.71    3.48    3.33    3.22    3.14    3.07    3.02    2.98
11      4.84    3.98    3.59    3.36    3.20    3.09    3.01    2.95    2.90    2.85
12      4.75    3.89    3.49    3.26    3.11    3.00    2.91    2.85    2.80    2.75
13      4.67    3.81    3.41    3.18    3.03    2.92    2.83    2.77    2.71    2.67
14      4.60    3.74    3.34    3.11    2.96    2.85    2.76    2.70    2.65    2.60
15      4.54    3.68    3.29    3.06    2.90    2.79    2.71    2.64    2.59    2.54
16      4.49    3.63    3.24    3.01    2.85    2.74    2.66    2.59    2.54    2.49
17      4.45    3.59    3.20    2.96    2.81    2.70    2.61    2.55    2.49    2.45
18      4.41    3.55    3.16    2.93    2.77    2.66    2.58    2.51    2.46    2.41
19      4.38    3.52    3.13    2.90    2.74    2.63    2.54    2.48    2.42    2.38
20      4.35    3.49    3.10    2.87    2.71    2.60    2.51    2.45    2.39    2.35

ν2\ν1   11      12      13      14      15      16      17      18      19      20
1     242.98  243.91  244.69  245.36  245.95  246.46  246.92  247.32  247.69  248.01
2      19.40   19.41   19.42   19.42   19.43   19.43   19.44   19.44   19.44   19.45
3       8.76    8.74    8.73    8.71    8.70    8.69    8.68    8.67    8.67    8.66
4       5.94    5.91    5.89    5.87    5.86    5.84    5.83    5.82    5.81    5.80
5       4.70    4.68    4.66    4.64    4.62    4.60    4.59    4.58    4.57    4.56
6       4.03    4.00    3.98    3.96    3.94    3.92    3.91    3.90    3.88    3.87
7       3.60    3.57    3.55    3.53    3.51    3.49    3.48    3.47    3.46    3.44
8       3.31    3.28    3.26    3.24    3.22    3.20    3.19    3.17    3.16    3.15
9       3.10    3.07    3.05    3.03    3.01    2.99    2.97    2.96    2.95    2.94
10      2.94    2.91    2.89    2.86    2.85    2.83    2.81    2.80    2.79    2.77
11      2.82    2.79    2.76    2.74    2.72    2.70    2.69    2.67    2.66    2.65
12      2.72    2.69    2.66    2.64    2.62    2.60    2.58    2.57    2.56    2.54
13      2.63    2.60    2.58    2.55    2.53    2.51    2.50    2.48    2.47    2.46
14      2.57    2.53    2.51    2.48    2.46    2.44    2.43    2.41    2.40    2.39
15      2.51    2.48    2.45    2.42    2.40    2.38    2.37    2.35    2.34    2.33
16      2.46    2.42    2.40    2.37    2.35    2.33    2.32    2.30    2.29    2.28
17      2.41    2.38    2.35    2.33    2.31    2.29    2.27    2.26    2.24    2.23
18      2.37    2.34    2.31    2.29    2.27    2.25    2.23    2.22    2.20    2.19
19      2.34    2.31    2.28    2.26    2.23    2.21    2.20    2.18    2.17    2.16
20      2.31    2.28    2.25    2.22    2.20    2.18    2.17    2.15    2.14    2.12


Table 7. Table of two-tailed critical F values at the 95% confidence level.

Numerator degrees of freedom (ν1 = n1 − 1) run across the columns; denominator degrees of freedom (ν2 = n2 − 1) run down the rows.

ν2\ν1   1       2       3       4       5       6       7       8       9       10
1     647.79  799.50  864.16  899.58  921.85  937.11  948.22  956.66  963.28  968.63
2      38.51   39.00   39.17   39.25   39.30   39.33   39.36   39.37   39.39   39.40
3      17.44   16.04   15.44   15.10   14.88   14.73   14.62   14.54   14.47   14.42
4      12.22   10.65    9.98    9.60    9.36    9.20    9.07    8.98    8.90    8.84
5      10.01    8.43    7.76    7.39    7.15    6.98    6.85    6.76    6.68    6.62
6       8.81    7.26    6.60    6.23    5.99    5.82    5.70    5.60    5.52    5.46
7       8.07    6.54    5.89    5.52    5.29    5.12    4.99    4.90    4.82    4.76
8       7.57    6.06    5.42    5.05    4.82    4.65    4.53    4.43    4.36    4.30
9       7.21    5.71    5.08    4.72    4.48    4.32    4.20    4.10    4.03    3.96
10      6.94    5.46    4.83    4.47    4.24    4.07    3.95    3.85    3.78    3.72
11      6.72    5.26    4.63    4.28    4.04    3.88    3.76    3.66    3.59    3.53
12      6.55    5.10    4.47    4.12    3.89    3.73    3.61    3.51    3.44    3.37
13      6.41    4.97    4.35    4.00    3.77    3.60    3.48    3.39    3.31    3.25
14      6.30    4.86    4.24    3.89    3.66    3.50    3.38    3.29    3.21    3.15
15      6.20    4.77    4.15    3.80    3.58    3.41    3.29    3.20    3.12    3.06
16      6.12    4.69    4.08    3.73    3.50    3.34    3.22    3.12    3.05    2.99
17      6.04    4.62    4.01    3.66    3.44    3.28    3.16    3.06    2.98    2.92
18      5.98    4.56    3.95    3.61    3.38    3.22    3.10    3.01    2.93    2.87
19      5.92    4.51    3.90    3.56    3.33    3.17    3.05    2.96    2.88    2.82
20      5.87    4.46    3.86    3.51    3.29    3.13    3.01    2.91    2.84    2.77

ν2\ν1   11      12      13      14      15      16      17      18      19      20
1     973.03  976.71  979.84  982.53  984.87  986.92  988.73  990.35  991.80  993.10
2      39.41   39.41   39.42   39.43   39.43   39.44   39.44   39.44   39.45   39.45
3      14.37   14.34   14.30   14.28   14.25   14.23   14.21   14.20   14.18   14.17
4       8.79    8.75    8.71    8.68    8.66    8.63    8.61    8.59    8.58    8.56
5       6.57    6.52    6.49    6.46    6.43    6.40    6.38    6.36    6.34    6.33
6       5.41    5.37    5.33    5.30    5.27    5.24    5.22    5.20    5.18    5.17
7       4.71    4.67    4.63    4.60    4.57    4.54    4.52    4.50    4.48    4.47
8       4.24    4.20    4.16    4.13    4.10    4.08    4.05    4.03    4.02    4.00
9       3.91    3.87    3.83    3.80    3.77    3.74    3.72    3.70    3.68    3.67
10      3.66    3.62    3.58    3.55    3.52    3.50    3.47    3.45    3.44    3.42
11      3.47    3.43    3.39    3.36    3.33    3.30    3.28    3.26    3.24    3.23
12      3.32    3.28    3.24    3.21    3.18    3.15    3.13    3.11    3.09    3.07
13      3.20    3.15    3.12    3.08    3.05    3.03    3.00    2.98    2.96    2.95
14      3.09    3.05    3.01    2.98    2.95    2.92    2.90    2.88    2.86    2.84
15      3.01    2.96    2.92    2.89    2.86    2.84    2.81    2.79    2.77    2.76
16      2.93    2.89    2.85    2.82    2.79    2.76    2.74    2.72    2.70    2.68
17      2.87    2.82    2.79    2.75    2.72    2.70    2.67    2.65    2.63    2.62
18      2.81    2.77    2.73    2.70    2.67    2.64    2.62    2.60    2.58    2.56
19      2.76    2.72    2.68    2.65    2.62    2.59    2.57    2.55    2.53    2.51
20      2.72    2.68    2.64    2.60    2.57    2.55    2.52    2.50    2.48    2.46

It should also be noted that in Table 9 in Murdoch and Barnes some numerator degrees of freedom are missing (for example, ν1 = 9), so values must be interpolated. If, for example, the two-tailed critical F value for ν1 = 9, ν2 = 9 (F0.025,9,9) was required, we would use the bracketed (α = 0.025) values on either side, F0.025,8,9 = 4.10 and F0.025,10,9 = 3.96, giving an interpolated value of approximately 4.03.

We can see that this is correct from Table 7 above.


The t-Test

t-Tests are used to evaluate the likelihood that observed differences between means are a result of indeterminate error. In a t-test, the null hypothesis is that $\bar{x}_1 = \bar{x}_2$, that is, that there is no difference between the means.

Statistical theory is used to calculate the probability that observed differences between the means are due to indeterminate errors. The null hypothesis will be rejected if the probability that the differences are due to chance is less than the significance level corresponding to the confidence level adopted. There are three forms of t-test:

Comparison of a mean against a reference
Comparison of the means of two samples
Comparison of sets of means

Comparison of a Mean Against a Reference

Remembering the equation for confidence limits (equation 22), we can express it as:

$\mu = \bar{x} \pm \dfrac{t_{p,n-1}\,S}{\sqrt{n}}$   Equation (24)

where μ is the reference value, $\bar{x}$ is the mean, $t_{p,n-1}$ is Student's t-value for the adopted confidence level p with n − 1 degrees of freedom, S is the sample standard deviation and n is the number of observations.

This expression can be re-arranged to give Equation (25)

$t = \dfrac{(\bar{x} - \mu)\sqrt{n}}{S}$   Equation (25)

This expression may be used to decide whether $\bar{x}$ and μ are equivalent or significantly different. In this case, the null hypothesis is that $\bar{x} = \mu$.

First we calculate t, and then look up the critical value for t with n − 1 degrees of freedom at the selected confidence limit. If the modulus of t is greater than the critical value then the null hypothesis is rejected. Since we are only interested in determining if the means are different, we use a two-tailed t-test. If we wanted to determine if the mean was greater or less than the reference value (a particular direction of difference), we would use a one-tailed test.

Example 5


The material weighed in Example 1 was obtained from a machine set to deliver 12.45g per operation. Is the unit operating outside its specification?

In this case the null hypothesis is that $\bar{x} = \mu$.

μ = 12.45 g, $\bar{x}$ = 12.475 g, n = 20, S = 0.012 g

$t = \dfrac{(12.475 - 12.45)\sqrt{20}}{0.012} = 9.317$

The critical two-tailed value for t at the 95% confidence limit with 19 degrees of freedom is 2.093. The calculated t-value is much greater, so the null hypothesis (that the means are equal) is rejected and the alternative hypothesis (that the means are different) is accepted. The machine is operating out of specification.
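A minimal Python sketch of this one-sample t-test (scipy assumed available):

import math
from scipy.stats import t

mu, xbar, s, n = 12.45, 12.475, 0.012, 20
t_calc = abs(xbar - mu) * math.sqrt(n) / s
t_crit = t.ppf(1 - 0.05 / 2, n - 1)     # two-tailed, 95% confidence
print(t_calc, t_crit)                   # ~9.32 vs 2.093: reject the null hypothesis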

Comparison of the Means of Two Samples: t-Tests

The t-test is used to test whether the difference between the means of two samples is significant, or whether it can be accounted for by indeterminate error. The first part of a t-test is to check whether the variances of the two samples are significantly different, so a two-tailed F-test is used. For samples where the variances are not significantly different (also known as homoscedastic samples), the t-value is calculated by the following expression, Equation (26):

$t = \dfrac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$   Equation (26)

The number of degrees of freedom for t in Equation (26) is given by:

$\nu = n_1 + n_2 - 2$

and the pooled standard deviation S is obtained from:

$S^2 = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$   Equation (27)

S is a pooled value for the standard deviation, and should be used when the F-test shows no significant difference between the standard deviations. This pooled value is appropriate because the F-test indicates that the observations are drawn from distributions with the same standard deviation; it is a weighted average of the two sample variances.

If the standard deviations are significantly different, a modification of Equation (26) can be used:

$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$   Equation (28)

N.J. Goddard Basic Statistics and Error Analysis 41

Page 42: Chen10011 Notes(1)

and the degrees of freedom are calculated from:

$\nu = \dfrac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{\left(S_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(S_2^2/n_2\right)^2}{n_2 - 1}}$   Equation (29)

rounded to the nearest integer.

Example 6

Although there is a clear difference between the mean mass values given in Examples 1 and 4, the uncertainty associated with the second measurement is larger than that of the first as a result of the smaller sample size. A t-test is needed to establish whether the difference in the means is significant. The null hypothesis is that $\bar{x}_1 = \bar{x}_2$.

We have already performed the F-test to determine if the standard deviations are significantly different, and found that there is no significant difference at the 95% confidence level. Calculating the pooled standard deviation from Equation (27) and t from Equation (26):

$S = \sqrt{\dfrac{19\times(0.012)^2 + 4\times(0.019)^2}{23}} = 0.0135\ \mathrm{g}$

and

$t = \dfrac{|12.475 - 12.501|}{0.0135\sqrt{\dfrac{1}{20} + \dfrac{1}{5}}} = 3.86$

The critical value for t for (20 + 5 - 2 ) = 23 degrees of freedom at the 95% confidence level (P = 0.05) is 2.069. Since our t value is much larger, we reject the null hypothesis that the means are the same and accept the alternative hypothesis that the means are different at the 95% confidence level. We use a two-tailed test in this case as we are only testing if there is a difference in the two means, not whether one mean is greater than the other.
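A minimal Python sketch of the pooled t-test of Equations (26) and (27) using these summary statistics (scipy assumed available; with the raw observations, scipy.stats.ttest_ind(data1, data2, equal_var=True) gives the same result):

import math
from scipy.stats import t

x1, s1, n1 = 12.475, 0.012, 20
x2, s2, n2 = 12.501, 0.019, 5

s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
t_calc = abs(x1 - x2) / (s_pooled * math.sqrt(1 / n1 + 1 / n2))
dof = n1 + n2 - 2
t_crit = t.ppf(1 - 0.05 / 2, dof)       # two-tailed, 95% confidence
print(t_calc, t_crit)                   # ~3.9 vs 2.069: the means differ significantly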

Rejection of Data

It is often the case that a set of data may contain a datum that is clearly different from the rest of the sample. Such data may contain a determinate error as well as an indeterminate error. If the experiment was not operating correctly at the time that the "outlier" was measured then to include it is misleading. However, removing inconvenient results so that the recorded observations fit preconceived models is wrong - even though some very well known scientists have succumbed to this temptation (Mendel, Darwin). Consequently, a great deal of care needs to be exercised when dealing with outliers.

If an outlier exists in experimental data, the first thing to be done is to check the experimental procedures operating at the time the datum was obtained. Check records and observations to try to identify the cause of the outlier. At this point deficiencies in record keeping are often highlighted. Should this be the case, changes to the experimental procedure need to be made to ensure that appropriate records are kept so that error tracing in future becomes more reliable.

If no determinate error is identified the argument is applied that while no changes to the system have been observed it is nevertheless highly unlikely that the outlier lies on the probability density function, as the frequency of such an observation is so low that it would require a very large sample size normally to observe it. Consequently such an outlier is more likely to contain determinate error, even though that error is not known, and therefore the outlier may be rejected.

Note: The most important data are the ones that don't conform to existing models.

Criteria for the Rejection of Data

There are three simple tests to use in deciding whether or not to reject outliers.

Chauvenet's Criterion

In this case the null hypothesis is that all the observations are drawn from the same distribution. The presumed outlier is removed from the sample and new values for the mean and standard deviation are calculated. The confidence limits based on the new mean and standard deviation are then determined at the appropriate confidence level and if the rejected outlier lies outside the confidence limits then it may be discarded. This criterion may only be applied once to a set of data, and it carries substantially more weight if strict confidence limits are applied, e.g. 99% or even better 99.9%.

Dixon's Q, Also Known as a Q-Test

The null hypothesis again is that all the observations are drawn from the same Normal distribution. A ratio, Dixon's Q, is calculated using the equations in Table 8, which also gives the critical values for = 0.1, 0.05 and 0.01 (P = 90, 95 and 99%). In this case the distinction between one and two tailed tests is irrelevant, as the outlier must lie either below or above the rest of the data points. The Q value is calculated using the equations depending on the sample size n and whether the presumed outlier is below (left hand equation) or above (right hand equation) the rest of the data. The calculated Q value is compared to the critical value at the required confidence level and if it is below the critical value the null hypothesis is accepted. If it is larger than the critical value the null hypothesis is rejected and the alternate hypothesis (that the point is drawn from a different distribution (is an outlier)) is accepted.

Table 8. Dixon's Q test critical values.

n    α = 0.10   α = 0.05   α = 0.01
3    0.886      0.941      0.988
4    0.679      0.765      0.889
5    0.557      0.642      0.780
6    0.482      0.560      0.698
7    0.434      0.507      0.637
8    0.650      0.710      0.829
9    0.594      0.657      0.776
10   0.551      0.612      0.726
11   0.517      0.576      0.679
12   0.490      0.546      0.642
13   0.467      0.521      0.615
14   0.448      0.501      0.593
15   0.472      0.525      0.616
16   0.454      0.507      0.595
17   0.438      0.490      0.577
18   0.424      0.475      0.561
19   0.412      0.462      0.547
20   0.401      0.450      0.535
21   0.391      0.440      0.524
22   0.382      0.430      0.514
23   0.374      0.421      0.505
24   0.367      0.413      0.497
25   0.360      0.406      0.489

Grubbs' Test

Grubbs' test is the ISO-recommended method for identifying outliers. The null hypothesis is again that all measurements are from the same population. The suspect value is the one furthest away from the mean. This test assumes that the observations are Normally distributed.

We calculate:

$G = \dfrac{\left|x_{\mathrm{suspect}} - \bar{x}\right|}{S}$   Equation (30)

which is the standard normal value for the suspected outlier. The presence of an outlier increases both the numerator and the denominator (the outlier inflates S as well), so the G statistic cannot increase indefinitely. In fact, G cannot exceed $\dfrac{n-1}{\sqrt{n}}$.

We compare the calculated G statistic against critical values at an appropriate confidence level. If the calculated G value is less than the critical value we accept the null hypothesis; otherwise we reject it and accept the alternate hypothesis (that the observation is an outlier). This test should generally only be used once to remove an outlier. Table 9 gives critical values for Grubbs' test at α = 0.05 and 0.01 (P = 95, 99%) for sample sizes from 3 up to 600.


Table 9. Critical values for Grubbs' test at α = 0.05 and 0.01 (P = 95, 99%)

n    Gcrit (α=0.05)   Gcrit (α=0.01)     n    Gcrit (α=0.05)   Gcrit (α=0.01)     n     Gcrit (α=0.05)   Gcrit (α=0.01)
3    1.1543   1.1547     15   2.5483   2.8061     80    3.3061   3.6729
4    1.4812   1.4962     16   2.5857   2.8521     90    3.3477   3.7163
5    1.7150   1.7637     17   2.6200   2.8940     100   3.3841   3.7540
6    1.8871   1.9728     18   2.6516   2.9325     120   3.4451   3.8167
7    2.0200   2.1391     19   2.6809   2.9680     140   3.4951   3.8673
8    2.1266   2.2744     20   2.7082   3.0008     160   3.5373   3.9097
9    2.2150   2.3868     25   2.8217   3.1353     180   3.5736   3.9460
10   2.2900   2.4821     30   2.9085   3.2361     200   3.6055   3.9777
11   2.3547   2.5641     40   3.0361   3.3807     300   3.7236   4.0935
12   2.4116   2.6357     50   3.1282   3.4825     400   3.8032   4.1707
13   2.4620   2.6990     60   3.1997   3.5599     500   3.8631   4.2283
14   2.5073   2.7554     70   3.2576   3.6217     600   3.9109   4.2740

Example 7.

The weight data in Example 1 has one point (12.513 g) that is 3.14 standard deviations away from the mean. Is this observation an outlier?

We will apply all three tests to this data, using the null hypothesis that this observation is from the same (Normal) distribution as the rest of the observations. We will use a confidence level of 95% for all three tests.

Chauvenet's criterion

Removing the suspected outlier from the set leaves 19 observations with a mean of 12.473 g and a standard deviation of 0.00841 g.

The confidence limits are given by:

$\bar{x}' \pm \dfrac{t_{P,n'-1}\,S'}{\sqrt{n'}}$

where n′ = n − 1, n is the original number of observations before removal of the outlier, and $\bar{x}'$ and S′ are the mean and standard deviation after removal.

The t value (two-tailed, P = 95%, with 18 degrees of freedom) is 2.101. Since our presumed outlier lies outside the confidence range of 12.469 to 12.477 g, we reject the null hypothesis (that the observation is part of the same population as the rest of the data) and accept the alternate hypothesis that the observation is an outlier from a different population.
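A minimal Python sketch of this criterion applied to the 20 weights (numpy and scipy assumed available):

import numpy as np
from scipy.stats import t

data = np.array([12.450, 12.465, 12.465, 12.466, 12.468, 12.469, 12.472, 12.473,
                 12.473, 12.474, 12.475, 12.475, 12.475, 12.477, 12.481, 12.481,
                 12.482, 12.485, 12.485, 12.513])
suspect = 12.513
rest = data[data != suspect]                          # remove the suspect point
n_prime = len(rest)                                   # n' = n - 1 = 19
mean, sd = rest.mean(), rest.std(ddof=1)
half_width = t.ppf(0.975, n_prime - 1) * sd / np.sqrt(n_prime)
low, high = mean - half_width, mean + half_width
print(low, high, not (low <= suspect <= high))        # True -> treat as an outlier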

Dixon’s Q


The null hypothesis is that all of the observations are drawn from the same population. Since the sample size is 20 and the presumed outlier is greater than the rest of the observations, we calculate the Q statistic thus:

n    Observation
1    12.450
2    12.465
3    12.465   (x3)
4    12.466
5    12.468
6    12.469
7    12.472
8    12.473
9    12.473
10   12.474
11   12.475
12   12.475
13   12.475
14   12.477
15   12.481
16   12.481
17   12.482
18   12.485   (xn-2)
19   12.485
20   12.513   (xn)

$Q = \dfrac{x_n - x_{n-2}}{x_n - x_3} = \dfrac{12.513 - 12.485}{12.513 - 12.465} = 0.583$

The critical value for P = 95% is 0.450. Our Q statistic is greater than the critical value, so we reject the null hypothesis and accept the alternate hypothesis that the observation is an outlier from a different population.
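A minimal Python sketch of the Q calculation for this example (the ratio used here for a high outlier with n = 20 is (xn − xn−2)/(xn − x3)):

x = sorted([12.450, 12.465, 12.465, 12.466, 12.468, 12.469, 12.472, 12.473,
            12.473, 12.474, 12.475, 12.475, 12.475, 12.477, 12.481, 12.481,
            12.482, 12.485, 12.485, 12.513])
Q = (x[-1] - x[-3]) / (x[-1] - x[2])     # (12.513 - 12.485) / (12.513 - 12.465)
print(Q)                                 # ~0.583, greater than the critical value 0.450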

Grubbs' test

The null hypothesis is that all of the observations are drawn from the same population. We have already calculated the standard normal value for the outlier, but will show the calculation again:

$G = \dfrac{\left|x_{\mathrm{suspect}} - \bar{x}\right|}{S} = \dfrac{|12.513 - 12.475|}{0.0121} = 3.14$


The critical value for P = 95% (α = 0.05) for 20 samples is 2.7082. Our calculated G value is well above this, so we reject the null hypothesis and accept the alternate hypothesis that the observation is an outlier from a different population.
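A minimal Python sketch of the Grubbs statistic of Equation (30) for the same data (numpy assumed available):

import numpy as np

data = np.array([12.450, 12.465, 12.465, 12.466, 12.468, 12.469, 12.472, 12.473,
                 12.473, 12.474, 12.475, 12.475, 12.475, 12.477, 12.481, 12.481,
                 12.482, 12.485, 12.485, 12.513])
G = np.abs(data - data.mean()).max() / data.std(ddof=1)
print(G)                                 # ~3.1, above the n = 20 critical value 2.7082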

We can see that all three methods give the same result, that the observation 12.513 g is an outlier.

Concluding Comments on the Rejection of Data

It is important to always retain and report outliers, as they may contain important information that you are unaware of. Explain the basis for their exclusion from your data analysis. Do not hide such data; even if it is highly inconvenient or even embarrassing to report, history may attach a great deal more importance to it than you do.


Linear Regression Analysis

Instrumental analysis techniques are often used to determine the concentration of an analyte over a wide range. Such comparative methods of analysis usually rely on a calibration curve obtained from the analysis of reference standards. The material presented to the instrument containing an unknown concentration of analyte yields a response from which the analyte concentration can be interpolated using the calibration graph.

Sometimes, in the investigation of an unknown system, the relationship between the system's response and a factor may be studied over a wide range of factor levels. In using such data a curve is fitted to show the relationship between the factor and the response, or between the analyte concentration and the instrument signal.

Important questions are raised in adopting such an approach.

Is the graph linear? If not, what is the form of the curve?
As each calibration point is subject to indeterminate error, what is the best straight line through the data?
What errors are present in the fitted curve?
What is the error in a determined concentration?
What is the limit of detection?

An important assumption in conventional linear regression analysis, and other curve fitting techniques, is that there is no error in x axis values, that is, the standard deviation of the individual x-axis values is very much less than the standard deviation of the individual y-axis values. More complex forms of regression analysis, such as the Reduced Major Axis method, can produce a regression line where both the x and y axis values have significant variances.

When presented with a set of data where a linear relationship is claimed between a factor and its associated response, an assessment of the accuracy of that claim may be obtained through the product-moment correlation coefficient, r. This is also known simply as the correlation coefficient. The correlation coefficient is used to estimate the "goodness of fit" of data to a straight line and is given by Equation (31):

$r = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \cdot \sum_{i=1}^{n}(y_i - \bar{y})^2}}$   Equation (31)

or:

$r = \dfrac{n\sum_{i=1}^{n}x_i y_i - \sum_{i=1}^{n}x_i \sum_{i=1}^{n}y_i}{\sqrt{\left[n\sum_{i=1}^{n}x_i^2 - \left(\sum_{i=1}^{n}x_i\right)^2\right]\left[n\sum_{i=1}^{n}y_i^2 - \left(\sum_{i=1}^{n}y_i\right)^2\right]}}$   Equation (31a)

The advantage of equation 31a is that only the sums of x, y, xy, x2 and y2 need be accumulated.
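A minimal Python sketch of Equation (31a) (numpy assumed available); the x and y arrays below are hypothetical calibration data, and np.corrcoef gives the same result directly:

import numpy as np

def correlation_from_sums(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    sx, sy = x.sum(), y.sum()
    sxy, sxx, syy = (x * y).sum(), (x * x).sum(), (y * y).sum()
    return (n * sxy - sx * sy) / np.sqrt((n * sxx - sx**2) * (n * syy - sy**2))

x = [0, 1, 2, 3, 4, 5]                    # hypothetical factor levels
y = [0.02, 0.21, 0.39, 0.62, 0.79, 1.01]  # hypothetical responses
print(correlation_from_sums(x, y), np.corrcoef(x, y)[0, 1])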


A "perfect" straight line fit will result in a value for r of 1. The sign of r is determined by the sign of the gradient of the line. The correlation coefficient effectively measures how much of the variance in the y (dependent) variable is accounted for by the variance in the x (independent) variable. If all of the variance in y is accounted for by the variance in x, then the variables are perfectly correlated with r = 1. It should be noted that the correlation coefficient can give very low correlations when there is an obvious relationship between the dependent and independent variables. Figure 14 illustrates such a case.

Sometimes there is a large indeterminate error present in the measurement of the response, which is always plotted on the y-axis. In such cases a t-test may be used to determine if a low value for r is significant. The null hypothesis adopted in such instances is that y is not correlated with x, or, as an exact statement, that r = 0. A two-tailed t-test is used if the sign of the slope is not of interest (we only want to detect r ≠ 0, i.e. H1: r ≠ 0); a one-tailed test is used if we have some reason to believe in advance that the slope will be of a particular sign (we want to detect r < 0 or r > 0). If t is greater than the critical value then the null hypothesis is rejected. We calculate a t statistic using Equation (32).

$t = \dfrac{|r|\sqrt{n-2}}{\sqrt{1-r^2}}$   Equation (32)

where the number of degrees of freedom is given by $\nu = n - 2$.

The degrees of freedom are n − 2, not n − 1, because two parameters (the slope and the intercept) are estimated from the data, leaving only n − 2 independent residuals. An alternative method is to use tables of critical values of the correlation coefficient, as found in Table 10 (page 22) of Murdoch and Barnes. Table 10 below (generated in Excel) also gives the critical values. The α values are for the one-tailed test and the 2α values are for the two-tailed test. It can be seen that, for many degrees of freedom, even low values of the correlation coefficient indicate significant correlation.

Figure 14. Illustration of obvious correlation where the correlation coefficient is low.

Table 10. Critical values of the correlation coefficient.

       α    0.05      0.025     0.005      0.0025     0.0005       0.00025
       2α   0.1       0.05      0.01       0.005      0.001        0.0005
ν = 1       0.98769   0.99692   0.999877   0.999969   0.99999877   0.99999969
    2       0.9000    0.9500    0.9900     0.995000   0.999000     0.999500
    3       0.8054    0.8783    0.9587     0.97404    0.99114      0.99442
    4       0.7293    0.8114    0.9172     0.9417     0.9741       0.98169
    5       0.6694    0.7545    0.8745     0.9056     0.9509       0.96287
    6       0.6215    0.7067    0.8343     0.8697     0.9249       0.9406
    7       0.5822    0.6664    0.7977     0.8359     0.8983       0.9170
    8       0.5494    0.6319    0.7646     0.8046     0.8721       0.8932
    9       0.5214    0.6021    0.7348     0.7759     0.8470       0.8699
    10      0.4973    0.5760    0.7079     0.7496     0.8233       0.8475
    11      0.4762    0.5529    0.6835     0.7255     0.8010       0.8262
    12      0.4575    0.5324    0.6614     0.7034     0.7800       0.8060
    13      0.4409    0.5140    0.6411     0.6831     0.7604       0.7869
    14      0.4259    0.4973    0.6226     0.6643     0.7419       0.7689
    15      0.4124    0.4821    0.6055     0.6470     0.7247       0.7519
    16      0.4000    0.4683    0.5897     0.6308     0.7084       0.7358
    17      0.3887    0.4555    0.5751     0.6158     0.6932       0.7207
    18      0.3783    0.4438    0.5614     0.6018     0.6788       0.7063
    19      0.3687    0.4329    0.5487     0.5886     0.6652       0.6927
    20      0.3598    0.4227    0.5368     0.5763     0.6524       0.6799
    25      0.3233    0.3809    0.4869     0.5243     0.5974       0.6244
    30      0.2960    0.3494    0.4487     0.4840     0.5541       0.5802
    40      0.2573    0.3044    0.3932     0.4252     0.4896       0.5139
    50      0.2306    0.2732    0.3542     0.3836     0.4432       0.4659

Least Squares Fit

A least squares fit is used to draw a straight line through data in a way that minimises the residuals in the y-axis, as shown in Figure 15. We are, in fact, minimising the sum of the squares of the residuals (the differences between the actual y values and the y values calculated from the regression line). For a straight line of the form y = a + bx the coefficients b and a are given by:

$b = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$   Equation (33)

or,

$b = \dfrac{n\sum_{i=1}^{n}x_i y_i - \sum_{i=1}^{n}x_i \sum_{i=1}^{n}y_i}{n\sum_{i=1}^{n}x_i^2 - \left(\sum_{i=1}^{n}x_i\right)^2}$   Equation (33a)

$a = \bar{y} - b\bar{x}$   Equation (34)


$a = \dfrac{\sum_{i=1}^{n}y_i \sum_{i=1}^{n}x_i^2 - \sum_{i=1}^{n}x_i \sum_{i=1}^{n}x_i y_i}{n\sum_{i=1}^{n}x_i^2 - \left(\sum_{i=1}^{n}x_i\right)^2}$   Equation (34a)

The alternate forms of both expressions use only sums of x, y, xy and x2.

It is important to provide an estimate of the uncertainty in the slope and intercept calculated through a least squares fit. This is especially so when characterising a system's response to a proposed factor.

The first stage is to calculate the y residuals. These are the differences between the calculated values $\hat{y}_i$ from the fitted line and the observed values $y_i$ for a given value of $x_i$. Having done this, a statistic $S_{y/x}$ is obtained from Equation (35):

$S_{y/x} = \sqrt{\dfrac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{n-2}}$   Equation (35)

This is the standard deviation of the y residuals – that is, the standard deviation of the difference between each data point and the y value given by the best fit line.

Figure 15. Residuals for least-squares fitting.

Alternatively, if we define:


$Q_x = \sum_{i=1}^{n}x_i^2 - \dfrac{\left(\sum_{i=1}^{n}x_i\right)^2}{n}, \qquad Q_y = \sum_{i=1}^{n}y_i^2 - \dfrac{\left(\sum_{i=1}^{n}y_i\right)^2}{n}, \qquad Q_{xy} = \sum_{i=1}^{n}x_i y_i - \dfrac{\sum_{i=1}^{n}x_i \sum_{i=1}^{n}y_i}{n}$

then,

$S_{y/x} = \sqrt{\dfrac{Q_y - \dfrac{Q_{xy}^2}{Q_x}}{n-2}}$   Equation (35a)

This form does not involve calculating the individual $\hat{y}_i$ values and can be evaluated using only the sums of x, y, xy, x² and y².

$S_{y/x}$ is used to estimate the standard deviation in b, $S_b$, and in a, $S_a$, as given in Equations (36) and (37):

$S_b = \dfrac{S_{y/x}}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}} = \dfrac{S_{y/x}}{\sqrt{Q_x}}$   Equation (36)

$S_a = S_{y/x}\sqrt{\dfrac{\sum_{i=1}^{n}x_i^2}{n\sum_{i=1}^{n}(x_i - \bar{x})^2}} = S_b\sqrt{\dfrac{\sum_{i=1}^{n}x_i^2}{n}}$   Equation (37)

These standard deviation estimates may be used in the normal way in t- and F-tests and to provide confidence limits. They may also be used to determine the uncertainty that accompanies a value x0 obtained by interpolation when an unknown sample yields a response y0. An alternative approach is to calculate the standard deviation of x0 using Equation (38), which gives an approximate value:

$S_{x_0} = \dfrac{S_{y/x}}{b}\sqrt{1 + \dfrac{1}{n} + \dfrac{(y_0 - \bar{y})^2}{b^2\sum_{i=1}^{n}(x_i - \bar{x})^2}}$   Equation (38)


If y0 is the mean of m measurements then Equation (38) is modified and becomes:

$S_{x_0} = \dfrac{S_{y/x}}{b}\sqrt{\dfrac{1}{m} + \dfrac{1}{n} + \dfrac{(y_0 - \bar{y})^2}{b^2\sum_{i=1}^{n}(x_i - \bar{x})^2}}$   Equation (39)

Alternatively, we can use the forms involving the Q parameters:

$S_{x_0} = \dfrac{S_{y/x}}{b}\sqrt{1 + \dfrac{1}{n} + \dfrac{(y_0 - \bar{y})^2}{b^2 Q_x}}$

or

$S_{x_0} = \dfrac{S_{y/x}}{b}\sqrt{\dfrac{1}{m} + \dfrac{1}{n} + \dfrac{(y_0 - \bar{y})^2}{b^2 Q_x}}$

These expressions are valid if:

$\left(\dfrac{t\,S_b}{b}\right)^2 \ll 1$

where t is the value of the t statistic for the appropriate confidence level and n − 2 degrees of freedom. See http://www.rsc.org/images/Brief22_tcm18-51117.pdf for a fuller discussion.

How can we minimise the error $S_{x_0}$? We can reduce the 1/m term by making replicate measurements (m > 1). We can also reduce the 1/n term by increasing the number of data points in our calibration line. We can reduce the $(y_0 - \bar{y})^2$ factor by working close to the centre of the data, and we can also use a well-determined line, where b ≫ 0. Finally, we can maximise the $\sum_{i=1}^{n}(x_i - \bar{x})^2$ factor (that is, $Q_x$) by using a wide range of x values.
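The calculations of Equations (33) to (39) are easily scripted. The following is a minimal Python sketch (numpy and scipy assumed available; the calibration data are hypothetical):

import numpy as np
from scipy.stats import t

x = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0])        # hypothetical concentrations
y = np.array([0.05, 0.51, 0.98, 1.52, 2.03, 2.48])   # hypothetical responses
n = len(x)

Qx = np.sum((x - x.mean())**2)
Qy = np.sum((y - y.mean())**2)
Qxy = np.sum((x - x.mean()) * (y - y.mean()))

b = Qxy / Qx                                  # slope, Equation (33)
a = y.mean() - b * x.mean()                   # intercept, Equation (34)

Syx = np.sqrt((Qy - Qxy**2 / Qx) / (n - 2))   # Equation (35a)
Sb = Syx / np.sqrt(Qx)                        # Equation (36)
Sa = Syx * np.sqrt(np.sum(x**2) / (n * Qx))   # Equation (37)

# Uncertainty in an interpolated x0 for a single unknown response y0, Equations (38)/(39)
y0, m = 1.20, 1
x0 = (y0 - a) / b
Sx0 = (Syx / b) * np.sqrt(1 / m + 1 / n + (y0 - y.mean())**2 / (b**2 * Qx))
ci = t.ppf(0.975, n - 2) * Sx0                # 95% confidence interval for x0
print(b, a, x0, Sx0, ci)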

Some graphing packages (such as Sigmaplot) can show the confidence interval for a best-fit line. Figure 16 shows such a plot. The curved lines are the confidence interval, in this case at a confidence level of 95%. In essence, these lines show the region over which there is a 95% probability of finding the true best fit line.

The confidence interval shows why interpolation is acceptable, while extrapolation can be very dangerous. As can be seen in Figure 16, the confidence interval is smallest in the centre of the data range. The confidence interval diverges away from the centre of the data, as shown in Figure 17. The further away from the centre of the data range, the larger the confidence interval. This can be seen clearly in the estimate of the standard deviation of x0 using Equations (38) and (39), which both include a $(y_0 - \bar{y})^2$ term that increases rapidly the further $y_0$ is from $\bar{y}$.


Figure 16. Sigmaplot graph showing best fit line and 95% confidence interval.

Figure 17. Interpolation versus Extrapolation

Summary of Least Squares Fit

The approach described above is widely used, but it is flawed:

It assumes that x values are free of errors. This is not necessarily true.
It assumes that errors in y values are constant. This is rarely true. All y values are given equal weighting regardless of the uncertainty associated with them.

Nevertheless, used carefully, linear regression analysis and least squares fitting provide useful information.


[Figure 16 plot: ozone concentration (ppm) versus time (hours), showing the best fit line and 95% confidence intervals. Figure 17 plots: coagulation time (s) versus age of plasma (s), over a narrow and a wide range of plasma ages.]


Non-linear Regression

The linear regression method outlined above works for any linear relationship. Many relationships between variables do not follow a simple linear relationship, but may be linearised by the appropriate transformation, allowing linear regression to be performed. An example would be:

$y = ax^b$

which may be linearised by taking logs of both sides:

$\log y = \log a + b\log x$

We can now plot log y against log x to determine the slope and intercept, and hence the values of a and b. Figure 18 shows how this is done for a data set of fish length and mass.

From the log-log plot we obtain log a = −1.954 (a = 0.0111 g) and b = 3.013. The equation relating fish mass to length is then:

$y = 0.0111\,x^{3.013}$

The correlation coefficient for the transformed data is 0.8733 for 29 observations. If we wish to see if this correlation is significant, we can determine the t value as shown in Equation (32):

$t = \dfrac{|r|\sqrt{n-2}}{\sqrt{1-r^2}} = \dfrac{0.8733\sqrt{29-2}}{\sqrt{1-0.8733^2}} = 9.314$

Our null hypothesis is that there is no correlation between the variables (r = 0). Because we expect in advance that the fish mass should increase with fish length (positive correlation), the alternate hypothesis will be that r > 0, which means that we should use a one-tailed test. The critical value for a one-tailed t-test with 27 degrees of freedom at the 99% confidence level is 2.473, which is much less than our calculated t value. Accordingly, we reject the null hypothesis (that there is no correlation) and accept the alternative hypothesis that there is a positive correlation between fish mass and length. Figure 18 shows graphs of both the original and transformed data, along with 99% confidence lines on either side of the least-squares fit line.

We see that the exponent in the relationship is very close to 3, which is what we would expect if the fish show isometric growth - that is, the fish grow uniformly in all three dimensions. We can perform a t-test on the exponent against the reference value (3), by first calculating the standard deviation of the exponent (the slope in this case).

$S_{y/x} = 0.1944$

$S_b = \dfrac{0.1944}{0.601} = 0.3235$

We calculate t using:


$t = \dfrac{|3.013 - 3.000|\sqrt{29}}{0.3235} = 0.2164$

The critical value for 27 degrees of freedom at the 99% confidence level for a two-tailed test is 2.771. Our t value is well below this value, so we accept the null hypothesis that the slope is not significantly different from 3.000 at the 99% confidence level.
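A minimal Python sketch of the log-log fit (numpy and scipy assumed available). The length and mass arrays are hypothetical stand-ins for the roach data set, and the slope test here divides by the standard error of the slope reported by scipy rather than using Equation (25):

import numpy as np
from scipy.stats import linregress, t

length = np.array([5.0, 6.5, 8.0, 9.5, 11.0, 12.5, 14.0, 15.5])  # cm, hypothetical
mass = np.array([1.2, 2.9, 5.8, 9.0, 15.0, 22.0, 30.0, 43.0])    # g, hypothetical

fit = linregress(np.log10(length), np.log10(mass))
a = 10 ** fit.intercept                  # y = a * x**b
b = fit.slope
print(a, b, fit.rvalue)

# t-test of the exponent against the isometric value of 3
n = len(length)
t_calc = abs(b - 3.0) / fit.stderr       # stderr is the standard error of the slope
t_crit = t.ppf(0.995, n - 2)             # two-tailed, 99% confidence
print(t_calc, t_crit)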

Limits of Detection

Much debate has taken place regarding the definition for the limit of detection of a measurement system. IUPAC has proposed that the limit of detection be the response at zero analyte concentration plus three standard deviations of the response at zero analyte. If a calibration curve is used then the limit of detection is given by the y axis intercept plus three times the standard deviation associated with that value, Equation (40):

Limit of detection (LOD) = $a + 3S_a$   Equation (40)

Or using Sy/x:

Limit of detection (LOD) = $a + 3S_{y/x}$   Equation (41)

Equation (40) should be used if Sa has been estimated independently by making replicate blank measurements. Equation (41) should be used where replicate blank measurements have not been made. It may be possible to record the presence of an analyte below this level, but the uncertainty associated with such an observation is such as to make it unreliable. Figure 19 illustrates the determination of the LOD from the intercept and standard deviation of the intercept.
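A minimal Python sketch of Equation (40), with hypothetical values for the intercept, its standard deviation and the slope; the concentration corresponding to the LOD signal is read back through the slope, as in Figure 19:

a, Sa, b = 0.05, 0.008, 0.245     # hypothetical intercept, its standard deviation, slope

lod_signal = a + 3 * Sa           # smallest signal distinguishable from the blank, Equation (40)
lod_conc = 3 * Sa / b             # corresponding concentration above zero, read from the fitted line
print(lod_signal, lod_conc)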


Figure 18. Linear-linear and log-log plots of fish mass (g) against fish length (cm)

[Figure 18 panels: Roach Data - Raw Data, mass (g) versus length (cm); Roach Data - Log data, log(mass) versus log(length).]


Figure 19. Determination of the limit of detection (LOD) from the intercept and standard deviation of the intercept.

Data Processing

All these statistical tests and calculations, along with curve and line fitting can be done using statistical packages and spreadsheets. These are an important part of a modern measurement laboratory's equipment.

Often procedures will be given as to how to determine the appropriate statistical parameters and produce calibration curves which require no thought on the part of the operator. May I warn you against falling into this trap of using a computer package that you are unfamiliar with. Make sure you understand what the program does before you trust it with your data and always highlight results derived from computer programs that you are unfamiliar with. Fortunately, most software packages come with excellent tutorials and texts that fully describe how they operate and the basic theory behind the programs.

Packages such as MINITAB and SPSS give a far more comprehensive range of statistical tests than covered in these lectures. Most of the graphs shown in these lecture notes (with associated confidence limits) were produced using the SigmaPlot for windows package, while most spreadsheets have sophisticated graphing facilities.

Of the software available within The University of Manchester, SAS is the most comprehensive package for statistics. It can do far more than you are ever likely to require. It is also possible to perform statistical manipulations using spreadsheets such as Microsoft Excel. It should be noted, however, that Excel reports the results of its statistical tests in terms of probabilities, not confidence levels.


[Figure 19 plot: response versus concentration, marking the intercept a, the level a + 3Sa and the corresponding LOD on the concentration axis.]


Sigmaplot for Windows is a sophisticated scientific graphing package, permitting the creation of 2D and 3D plots. It has limited statistical capabilities, with its strong point being its ability to add confidence limits and prediction intervals to graphs. Origin is another graphics package with similar capabilities.

Finally, the arithmetic behind most of the methods outlined here is very simple, involving summations and the odd square root. It is not impossible to write your own programs to perform these tests using packages such as Visual BASIC or Borland Delphi. Of course, if you write your own programs you should check the results using test data before performing any significant analyses.
