-
8/9/2019 Statistical Methods in Experimental Chemistry
1/103
Statistical Methods in Experimental Chemistry
Philip J. Grandinetti
January 20, 2000
Contents
1 Statistical Description of Experimental Data 7
1.1 Random Error Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.1.1 Univariate Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
The Mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
The Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
The Skewness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
The Kurtosis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.1.2 Bivariate Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
The Covariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.1.3 Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.1.3.1 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.1.3.2 Combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.1.3.3 Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . 17
Binomial Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Poisson Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Gaussian Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Bi-dimensional Gaussian Distribution . . . . . . . . . . . . . . . . . . 24
Multi-dimensional Gaussian Distribution . . . . . . . . . . . . . . . . . 25
1.1.3.4 Central Limit Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.2 The χ² test of a distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.3 Systematic Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Instrument Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Personal Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Method Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.4 Gross Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.5 Propagation of Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
1.6 Confidence Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
1.6.1 Student's t-distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
1.7 The Two Sample Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
1.7.1 The t-test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
1.7.1.1 Comparing a measured mean to a true value . . . . . . . . . . . . . 40
1.7.1.2 Comparing two measured means . . . . . . . . . . . . . . . . . . . . 41
1.7.2 Comparing Variances - The F-test . . . . . . . . . . . . . . . . . . . . . . . . 42
1.8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2 Modeling of Data 53
2.1 Maximum Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.2 A Simple Example of Linear Least-Squares . . . . . . . . . . . . . . . . . . . . . . . 56
2.2.1 Finding Best-Fit Model Parameters . . . . . . . . . . . . . . . . . . . . . . . 56
2.2.2 Finding Uncertainty in Model Parameters . . . . . . . . . . . . . . . . . . . . 58
2.2.3 Finding Goodness of Fit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2.2.4 What to do if you don't know σyi? . . . . . . . . . . . . . . . . . . . . . . . . 60
2.3 General Linear Least Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.4 General Non-Linear Least Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
2.4.1 Minimizing χ² without Derivative Information . . . . . . . . . . . . . . . . . 67
2.4.1.1 Downhill Simplex Method . . . . . . . . . . . . . . . . . . . . . . . . 67
2.4.2 Minimizing χ² with Derivative Information . . . . . . . . . . . . . . . . . . . 69
2.4.2.1 Directional Derivatives and Gradients . . . . . . . . . . . . . . . . . 72
2.4.2.2 Steepest Descent Method . . . . . . . . . . . . . . . . . . . . . . . . 74
2.4.2.3 Newton Raphson Method . . . . . . . . . . . . . . . . . . . . . . . . 75
2.4.2.4 Marquardt Method . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
2.5 Confidence Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
2.5.1 Constant Chi-Squared Boundaries as Confidence Limits . . . . . . . . . . . . 79
2.5.2 Getting Parameter Errors from χ² with Non-Gaussian Errors . . . . . . . . . 81
2.5.2.1 Monte Carlo Simulations . . . . . . . . . . . . . . . . . . . . . . . . 81
2.5.2.2 Bootstrap Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
2.6 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
A Computers 89
The Bit: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Integers: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Signed Integer Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Alphanumeric Characters: . . . . . . . . . . . . . . . . . . . . . . . . . 92
Floating point numbers: . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Roundoff Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
A.1 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
B Programming in C - A quick tutorial 95
Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
#include . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
main() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
Another Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
To program or not to program . . . . . . . . . . . . . . . . . . . . . . 100
B.1 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
Chapter 1
Statistical Description of
Experimental Data
A fundamental problem encountered by every scientist who performs a measurement is whether
her measured quantity is correct. Put another way, how can she know the difference between her
measured value and the true value?

Error in Measurement = Measured Value − True Value

The basic difficulty is that to answer this very important question she needs to know the true
value. Of course, if she knew the true value she wouldn't be making the measurement in the first
place! So, how can she resolve this catch-22 situation?
The solution is for her to go back into the lab, perform the measurement many times on known
quantities, and study the errors. While she can never know the true value of her unknown quantity,
she can use her error measurements and the mathematics of probability theory to tell her the
probability that the true value for her unknown quantity lies within a given range.
Errors can be classified into two classes, random and systematic. Random errors are irrepro-
ducible, and usually are caused by a large number of uncorrelated sources. In contrast, systematic
errors are reproducible, and usually can be attributed to a single source. Random errors could also
be caused by a single source, particularly when the experimentalist cannot identify the source of her
errors. If she could identify the source she might be able to make the error behave systematically.
Often systematic errors can be eliminated by simply recalibrating the measurement device. You
may hear of a third class of errors called gross errors, described as errors that occur occasionally
and are large. We will discuss this particular type of error later on; technically, however, these
errors do not define a new class. All errors, regardless of class, can be studied and used to
establish limits between which the true value lies.
Figure 1.1: (A) A set of 128 temperature measurements of a solution in a constant temperature
bath. (B) A normalized histogram of the measured values.
1.1 Random Error Distributions
1.1.1 Univariate Data
In this section we will assume that when we perform a measurement there are no systematic errors.
We are all familiar with random errors. For example, suppose you were measuring the temperature
of a solution in a constant temperature bath. You know that the temperature should be constant,
but on close inspection you find that it is not constant and appears to fluctuate randomly
around the expected constant value (see Fig. 1.1a). What do you report for the temperature of the
solution? On one extreme we could report only the average value; on the other extreme we
could report all the values measured in the form of a histogram of measured values (see Fig. 1.1b).
To use the mathematics of probability theory in helping us understand errors we make the
assumption that our histogram of measured values is governed by an underlying probability distri-
bution called the parent distribution. In the limit of an infinite number of measurements our
histogram, or sample distribution, becomes the parent distribution. This point is illustrated in
Fig. 1.2. Notice how the histogram of measured values more closely follows the values predicted
by the parent distribution as the number of measurements used to construct the histogram in-
creases. Thus, our first goal in understanding the errors in our measurements is to learn the
underlying parent distribution¹, p(x), that predicts the spread in our measured values. Once
we know that distribution we can calculate the probability that the measured value lies within a
¹We define our parent distribution so that it is normalized. That is, the area under the distribution is unity,

\int_{-\infty}^{\infty} p(x)\, dx = 1. \tag{1.1}
Figure 1.2: In (a), (b), and (c) are histograms constructed from three different sets of 4096 measure-
ments, all drawn from the same parent distribution. Shown from top to bottom are the histograms
constructed using only the first 128, the first 1024, and finally all 4096 measurements, respectively.
Notice how the histogram bin heights vary at low N and fall closer to the bin heights predicted by
the parent distribution for high N.
given range. That is,

P(x_-, x_+) = \int_{x_-}^{x_+} p(x)\, dx, \tag{1.2}

where p(x) is our parent distribution for the measured value x, and P(x_-, x_+) is the probability
that the measured value lies between x_- and x_+. In general, x_- and x_+ are called the confidence
limits associated with a given probability P(x_-, x_+). We will have further discussions on confidence
limits later in the text.
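Equation 1.2 can be evaluated numerically for any normalized parent distribution. As an illustrative sketch (not from the text; the function name, the midpoint rule, and the Gaussian stand-in with μ = 31.1 °C and σ = 0.1 °C are my own choices, loosely echoing Fig. 1.1):

```python
import math

def prob_between(p, x_lo, x_hi, n=10_000):
    """Numerically integrate Eq. 1.2 with the midpoint rule (name is my own)."""
    dx = (x_hi - x_lo) / n
    return sum(p(x_lo + (i + 0.5) * dx) for i in range(n)) * dx

def p(x):
    # Hypothetical Gaussian parent distribution: mu = 31.1 C, sigma = 0.1 C.
    mu, sigma = 31.1, 0.1
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Probability that a measurement falls within one sigma of the mean:
print(prob_between(p, 31.0, 31.2))   # ≈ 0.683
```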
Note that when you report confidence limits you are not reporting any information concerning
the shape of the parent distribution. It is often useful to have such information. In the interests
of not having to plot a histogram for every measurement we report we ask the question: Can we
mathematically describe the parent distribution for our measured values by a finite (and perhaps
even small) number of parameters? While the answer to this question depends on the particular
parent distribution, there are a few parameters that by convention are often used to describe (in
part, or sometimes completely) the parent distribution. These are the mean, variance, skewness,
and kurtosis.
The Mean describes the average value of the distribution. It is also called the first moment
about the origin. It is defined as:

\mu = \lim_{N \to \infty} \frac{1}{N} \sum_i x_i, \tag{1.3}

where N corresponds to the number of measurements x_i and \mu is the mean of the parent distribution.
Experimentally we cannot make an infinite number of measurements, so we define the experimental
mean according to:

\bar{x} = \frac{1}{N} \sum_i x_i, \tag{1.4}

where \bar{x} is the mean of the experimental distribution. The mean is usually approximated to be the
true value, particularly when the distribution is symmetric and assuming no systematic errors.
If the distribution is not symmetric the mean is often supplemented with two other parameters,
the median and the mode. The median cuts the area of the parent distribution in half. That is,

\int_{-\infty}^{x_{\rm median}} p(x)\, dx = 1/2. \tag{1.5}

When the distribution is symmetric the mean and median are the same. The mode is the
most probable value of the parent distribution. That is,

\frac{dp(x_{\rm mode})}{dx} = 0, \quad \text{and} \quad \frac{d^2 p(x_{\rm mode})}{dx^2} < 0. \tag{1.6}
The Variance characterizes the width of the distribution. It is also called the second moment
about the mean. It is defined as:

\sigma^2 = \lim_{N \to \infty} \frac{1}{N} \sum_i^N (x_i - \mu)^2, \tag{1.7}

where \sigma^2 is the variance of the parent distribution. The experimental variance is defined as:

s^2 = \frac{1}{N-1} \sum_i^N (x_i - \bar{x})^2, \tag{1.8}
Figure 1.3: Position of the mean, median, and mode for an asymmetric distribution. Note that in
a symmetric distribution the mean and median are the same.
where s^2 is the variance of the experimental distribution. \sigma and s are defined as the
standard deviation of the parent distribution and experimental distribution, respectively.
The variance is approximated as the error dispersion of the set of measurements, and the standard
deviation is approximated as the uncertainty in the true value. Notice that in the expression
for s^2 we use N-1 in the denominator, in contrast to N for \sigma^2. Without going into detail, the
difference arises because s^2 is calculated using \bar{x} instead of \mu. In practice, if the difference between N
and N-1 in the denominator affects the conclusions you make about your data, then you probably
need to collect more data (i.e., increase N) and reanalyze your data.
The Skewness characterizes the degree of asymmetry of a distribution around its mean. It is
also called the third moment about the mean. It is defined as:

\text{skewness} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{x_i - \mu}{\sigma} \right)^3. \tag{1.9}

Skewness is defined to have no dimensions. Positive skewness signifies a distribution with an asym-
metric tail extending out towards more positive x, while negative skewness signifies a distribution
whose tail extends out toward more negative x. A symmetric distribution (e.g., Gaussian) should
have zero skewness. It is good practice to believe you have a statistically significant non-zero
skewness only if it is larger than \sqrt{15/N}.
Figure 1.4: Positive skewness signifies a distribution with an asymmetric tail extending out towards
more positive x, while negative skewness signifies a distribution whose tail extends out toward more
negative x.
The Kurtosis measures the relative peakedness or flatness of a distribution relative to a normal
(i.e., Gaussian) distribution. It is also called the fourth moment about the mean. It is defined as:

\text{kurtosis} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{x_i - \mu}{\sigma} \right)^4 - 3. \tag{1.10}

Subtracting 3 makes the kurtosis zero for a normal distribution. A positive kurtosis is called
leptokurtic, a negative kurtosis is called platykurtic, and in between is called mesokurtic.
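The four statistics above are straightforward to compute from raw measurements. A minimal sketch (the function name is my own; skewness and kurtosis substitute the sample mean and standard deviation for μ and σ, as one must in practice):

```python
import math

def moments(x):
    """Sample statistics from Eqs. 1.4 and 1.8-1.10 (function name is my own).

    Skewness and kurtosis use the N-denominator forms of Eqs. 1.9-1.10 with
    the sample mean and standard deviation standing in for mu and sigma."""
    n = len(x)
    mean = sum(x) / n                                        # Eq. 1.4
    s2 = sum((xi - mean) ** 2 for xi in x) / (n - 1)         # Eq. 1.8
    s = math.sqrt(s2)
    skew = sum(((xi - mean) / s) ** 3 for xi in x) / n       # Eq. 1.9
    kurt = sum(((xi - mean) / s) ** 4 for xi in x) / n - 3   # Eq. 1.10
    return mean, s2, skew, kurt
```

For a symmetric data set such as `[1, 2, 3, 4, 5]` the skewness comes out zero, as the text predicts for symmetric distributions.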
1.1.2 Bivariate Data
Let's consider a slightly different measurement situation. In this situation when we make a mea-
surement we obtain not a single number, but a pair of numbers. That is, each measurement yields
an (x, y) pair. Just as in the one parameter case, there are random errors that will result in a
distribution of (x, y) pairs around the true (x, y) pair value. Also, like the one parameter case, we
can construct a histogram of the N measured (x, y) pairs, except in this case our histogram will be
a three dimensional plot of occurrence versus x versus y.

As before, in the limit of an infinite number of (x, y) pair measurements we would have the two
parameter parent distribution² p(x, y). The mean (x, y) pair is calculated

\mu_x = \lim_{N \to \infty} \frac{1}{N} \sum_i x_i, \quad \text{and} \quad \mu_y = \lim_{N \to \infty} \frac{1}{N} \sum_i y_i. \tag{1.12}
2We define our two parameter parent distribution so that it is normalized. That is, the area under the distribution
Figure 1.5: The Kurtosis measures the relative peakedness or flatness of a distribution relative to
a normal (i.e., Gaussian) distribution. A positive kurtosis is called leptokurtic, a negative kurtosis
is called platykurtic, and in between is called mesokurtic.
Similarly the variances are calculated

\sigma_x^2 = \lim_{N \to \infty} \frac{1}{N} \sum_i (x_i - \mu_x)^2, \quad \text{and} \quad \sigma_y^2 = \lim_{N \to \infty} \frac{1}{N} \sum_i (y_i - \mu_y)^2. \tag{1.13}
The Covariance In the two parameter case, however, we need to consider another type of
variance called the covariance. It is defined as

\sigma_{xy}^2 = \lim_{N \to \infty} \frac{1}{N} \sum_i (x_i - \mu_x)(y_i - \mu_y). \tag{1.14}
If the errors in x and y are uncorrelated this term will be zero. What distinguishes these two
situations? Usually, a non-zero covariance implies that the two measured parameters are dependent
on each other, that is,

y = F(x), \quad \text{or} \quad x = F(y),

or that both x and y are dependent on a common third parameter z, that is,

x = F(z), \quad \text{or} \quad y = F(z).

If x is the temperature of a solution in a water bath in New York City, and y is the temperature
of a solution in a water bath in San Francisco, then it's unlikely that the errors in x and y will be
is unity,

\iint p(x, y)\, dx\, dy = 1. \tag{1.11}
Figure 1.6: (a) Contour plot of a two parameter parent distribution having no correlation between
the two parameter errors. (b) Contour plot of a two parameter parent distribution having a strong
correlation between parameter errors.
correlated. This situation would have a distribution similar to the one in Fig. 1.6a. If, however,
x and y were the temperature of two different solutions in the same water bath, then it would be
reasonable if their errors were correlated as shown for the distribution in Fig. 1.6b.
Another parameter often used that is related to the covariance is the linear correlation coefficient
r. It is defined as

r_{xy} = \frac{\sigma_{xy}^2}{\sigma_x \sigma_y}. \tag{1.15}

An r_{xy}^2 value of 1 implies a complete linear correlation between the measured x and y parameters,
while an r_{xy}^2 value of zero implies no correlation whatsoever. It should be pointed out that even if
y is dependent on x in a non-linear fashion, the non-linear function may be fairly linear over the
small range of x and y associated with the random errors in their measurement.
A high covariance between two parameters means that any increase (or decrease) in the uncer-
tainty of one parameter will lead to a corresponding increase (or decrease) in the uncertainty of
the other parameter. This is illustrated in Fig. 1.7.
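As a sketch of Eqs. 1.14 and 1.15 applied to a finite sample (the function name and test data below are my own, not from the text):

```python
import math

def correlation(xs, ys):
    """Sample analogs of Eqs. 1.14-1.15 (function name is my own).

    Returns the covariance and the linear correlation coefficient r_xy,
    using 1/N sums as in the parent-distribution definitions."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n  # Eq. 1.14
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov, cov / (sx * sy)                                 # Eq. 1.15

# A perfectly linear relation (y = 2x + 1) gives r_xy = 1:
print(correlation([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]))
```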
Finding the probability that a measured (x, y) pair will lie within a given range is a more
complicated problem than in one dimension. One could define a rectangular region delimited by x_-
and x_+ on the x-axis and y_- and y_+ on the y-axis, such that

P(x_-, x_+, y_-, y_+) = \int_{x_-}^{x_+} \int_{y_-}^{y_+} p(x, y)\, dy\, dx, \tag{1.16}

as shown in Fig. 1.8. In the case of high correlation (i.e., r^2 close to 1), this approach will include
Figure 1.7: When the distributions in two parameters are strongly correlated, the width of the
distribution in one of the parameters will be greatly reduced if the values of the other parameter
are restricted to a smaller set of possible values.

regions of low probability, and thus doesn't give an accurate representation of where the data is
likely to occur. Finding the probability that a measured (x, y) pair will lie within a given ellipse is
a better approach, which we will discuss later in the course.
1.1.3 Probability
Before we can go deeper with this statistical picture of random experimental errors, we need to
review the mathematics of probability theory. This will help us better understand the shapes ofparent distributions and their application to the study of random errors.
As you might have guessed, the mathematical theory of probability arose when a gambler
(Chevalier de Mere) wanted to adjust the odds so he could be certain of winning. Thus he enlisted
the help of the famous French mathematicians Blaise PASCAL and Pierre DE FERMAT.
Figure 1.8: Cartesian confidence limits of a two-dimensional probability distribution.
In principle, calculating the probability of an event is straightforward:

\text{Probability} = \frac{\text{Number of outcomes that are successful (winning)}}{\text{Total number of outcomes (winning and losing)}}. \tag{1.17}

The difficulty lies in counting. In order to count the number of outcomes we appeal to combinatorics.
1.1.3.1 Permutations
A Permutation is an arrangement of outcomes in which the order is important. A common example
is the Election problem. Consider a club with 5 members: Joe, Cathy, Sally, Bob, and Pat. In
how many ways can we elect a president and a secretary? One solution is to make a tree, such as
the one below:
Joe
Joe, Cathy
Joe, Sally
Joe, Bob
Joe, Pat
Cathy
Cathy, Joe
Cathy, Sally
Cathy, Bob
Cathy, Pat
Pat
Pat, Joe
Pat, Cathy
Pat, Sally
Pat, Bob
Sally
Sally, Joe
Sally, Cathy
Sally, Bob
Sally, Pat
Bob
Bob, Joe
Bob, Cathy
Bob, Sally
Bob, Pat
Using such a tree diagram we can count that there are a total of 20 possible ways to elect a
president and a secretary in a club with 5 members. Another approach is to think of two boxes,
one for president and one for secretary. If you pick the president first and the secretary second,
then you'll have five choices for president and four choices for secretary. The total number of ways
is the product of the two numbers:

5 × 4 = 20.

What if we wanted to elect a president, secretary, and treasurer? In this case a tree would be
a lot of work. Using the boxes approach we would have 5 × 4 × 3 = 60 possibilities. In general, the
number of ways r objects can be selected from n objects is

{}_{n}P_{r} = n(n-1)(n-2) \cdots (n-r+1), \tag{1.18}

or more generally written as

{}_{n}P_{r} = \frac{n!}{(n-r)!}. \tag{1.19}
For example, how many ways can a baseball coach arrange the batting order with 9 players?

Answer: {}_{9}P_{9} = \frac{9!}{(9-9)!} = 9! = 362{,}880 \text{ ways}.
1.1.3.2 Combinations
A Combination is an arrangement of outcomes in which the order is not important. A common
example is the Committee problem. Consider again our club with 5 members. In how many
ways can we form a three member committee? In this case the order is not important. That is,
{Joe, Cathy, Sally} = {Cathy, Joe, Sally} = {Cathy, Sally, Joe}; all arrangements of Cathy, Joe,
and Sally are equivalent.
The total number of combinations of n objects taken r at a time is

{}_{n}C_{r} = \frac{{}_{n}P_{r}}{r!} = \frac{n!}{r!(n-r)!}. \tag{1.20}

For example, there are {}_{5}C_{3} = \frac{5!}{3!\,2!} = 10 possible combinations for a 3 member committee starting
with 5 members.

Here's another example: what is the number of ways a 5 card hand can be drawn from a 52
card deck?

{}_{52}C_{5} = \frac{52!}{5!\,47!} = 2{,}598{,}960 \text{ ways}.

{}_{n}C_{r} is called the binomial coefficient, and is also often written as \binom{n}{r}. Both notations will be
used in this text.
Permutations and Combinations cannot be used to solve every counting problem, however, they
can be extremely helpful for certain types of counting problems.
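Python's standard library implements Eqs. 1.19 and 1.20 directly, which makes it easy to check the counting examples above (this snippet is mine, not from the text):

```python
import math

# math.perm and math.comb implement Eqs. 1.19 and 1.20 directly.
print(math.perm(5, 2))    # president and secretary from 5 members: 20
print(math.perm(9, 9))    # batting orders for 9 players: 362880
print(math.comb(5, 3))    # three-member committees from 5 members: 10
print(math.comb(52, 5))   # five-card hands from a 52-card deck: 2598960
```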
1.1.3.3 Probability Distributions
Now that we know how to count, let's go back to the original problem of calculating the probability
of an event. Our starting point is the equation

\text{Probability} = \frac{\text{Number of successful outcomes}}{\text{Total number of outcomes}}. \tag{1.21}

What if the names of everyone in our 5 member club are thrown in a hat, and two names are drawn,
with the first becoming president and the second becoming secretary? With this approach, what is
the probability that Pat is chosen as president and Cathy as secretary?
To calculate this probability we first have to count the outcomes. In this case there is only one
successful outcome, namely Pat as president and Cathy as secretary. The total number of outcomes
can be calculated as the total number of permutations of drawing two names out of a group of five,
or

\text{total number of outcomes} = {}_{5}P_{2} = 20.

Therefore, the probability is

\text{Probability} = \frac{1}{20} = 0.05.

That is, there's a 5% chance that Pat is chosen as president and Cathy as secretary.
Let's look at another example. What is the probability of being dealt a particular five card
hand from a 52 card deck?

\text{number of successful outcomes} = 1,
\text{total number of outcomes} = {}_{52}C_{5} = 2{,}598{,}960,
\text{Probability} = \frac{1}{2{,}598{,}960} = 3.847692 \times 10^{-7}.

Yet another example: what is the probability of being dealt five cards that are all spades from
a 52 card deck?

\text{number of successful outcomes} = {}_{13}C_{5} = \frac{13!}{5!\,8!} = 1{,}287,
\text{total number of outcomes} = {}_{52}C_{5} = 2{,}598{,}960,
\text{Probability} = \frac{1{,}287}{2{,}598{,}960} = 4.951980 \times 10^{-4}.
Binomial Distribution
Now let's look at the probabilities associated with independent events having the same probabil-
ities. For example, if I roll a die ten times, what is the probability that only 3 will come up sixes?
The first question we have to ask is: what is the probability of rolling a particular sequence with
only 3 sixes? For example, one possibility is

X, X, 6, X, X, X, 6, 6, X, X,

where X is a roll that was not 6. The probability of this particular sequence of independent events
is

\frac{5}{6} \cdot \frac{5}{6} \cdot \frac{1}{6} \cdot \frac{5}{6} \cdot \frac{5}{6} \cdot \frac{5}{6} \cdot \frac{1}{6} \cdot \frac{1}{6} \cdot \frac{5}{6} \cdot \frac{5}{6} = \left( \frac{5}{6} \right)^7 \left( \frac{1}{6} \right)^3 = 1.292044 \times 10^{-3}.
The second question we ask is: how many different ways can we roll only 3 sixes? Clearly, the
total number of possibilities (i.e., combinations) is {}_{10}C_{3}, or \binom{10}{3}. Assuming that all possible
combinations are equally probable, then to obtain the overall probability that I will roll only 3
sixes we simply multiply our calculated probability above by the number of combinations that give
only 3 sixes. That is,

P(3 \text{ sixes out of } 10 \text{ rolls}) = {}_{10}C_{3} \left( \frac{5}{6} \right)^7 \left( \frac{1}{6} \right)^3 = 120 \times 1.292044 \times 10^{-3} = 0.15504536,

or roughly a 1 in 6.5 chance. Note that the solution to this problem would have been no different
if I had rolled ten different dice all at once and asked for the probability that only 3 came up sixes.

We can generalize this reasoning to the case where the probability of success is p (instead of
1/6), the probability of failure is (1-p) (instead of 5/6), the number of trials is n (instead of 10),
and the number of successes is r (instead of 3). That is,

P(r, n, p) = \binom{n}{r} p^r (1-p)^{n-r}. \tag{1.22}

This distribution of probabilities, for r = 0, 1, 2, \ldots, n, is called the binomial distribution.
In general, the mean and variance of a discrete distribution³ are given by

\bar{r} = \sum_{r=0}^{r_{\max}} r\, P(r), \quad \text{and} \quad \sigma_r^2 = \sum_{r=0}^{r_{\max}} (r - \bar{r})^2 P(r). \tag{1.24}

Thus, we find the mean of the binomial distribution to be

\mu = \sum_{r=0}^{n} r \binom{n}{r} p^r (1-p)^{n-r} = np, \tag{1.25}

and the variance of the binomial distribution to be

\sigma^2 = \sum_{r=0}^{n} (r - \mu)^2 \binom{n}{r} p^r (1-p)^{n-r} = np(1-p). \tag{1.26}
For example, a coin is flipped 10 times. What is the probability of getting exactly r heads?
Since p = 1/2 and (1-p) = 1/2, we have

P(r, 10, 1/2) = \binom{10}{r} (1/2)^r (1/2)^{10-r} = \binom{10}{r} (1/2)^{10}.

³For a continuous distribution we have

\mu_x = \int x\, p(x)\, dx, \quad \text{and} \quad \sigma_x^2 = \int (x - \mu_x)^2 p(x)\, dx, \tag{1.23}

where x is a continuous variable.
Figure 1.9: The binomial distribution. (a) A symmetric case where p = 1/2 and n = 10. (b) An
asymmetric case where p = 1/6 and n = 10.
In this case we get a symmetric distribution (see Fig. 1.9a) with a mean of \mu = 5 (i.e., 5 out of 10
times we will get heads) and a variance of \sigma^2 = \mu/2 = 2.5, so \sigma = \sqrt{2.5}.

Let's look at another example. A die is rolled 10 times. What is the probability of getting a
six r times? Since p = 1/6 and (1-p) = 5/6, we have

P(r, 10, 1/6) = \frac{10!}{r!(10-r)!} (1/6)^r (5/6)^{10-r}.

In this case we get an asymmetric distribution (see Fig. 1.9b) with a mean of \mu = 10/6, \sigma^2 = 50/36,
and \sigma = \sqrt{50/36}.
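The binomial calculations above are easy to reproduce; a short sketch (the function name is my own):

```python
import math

def binomial(r, n, p):
    """Binomial probability, Eq. 1.22."""
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)

# Ten coin flips (p = 1/2): the distribution peaks at r = 5.
print(binomial(5, 10, 0.5))      # ≈ 0.246

# Ten die rolls (p = 1/6): probability of exactly 3 sixes.
print(binomial(3, 10, 1/6))      # ≈ 0.155

# The mean and variance agree with Eqs. 1.25-1.26: mu = np, sigma^2 = np(1-p).
mu = sum(r * binomial(r, 10, 1/6) for r in range(11))
var = sum((r - mu) ** 2 * binomial(r, 10, 1/6) for r in range(11))
print(mu, var)                    # ≈ 1.667, 1.389
```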
Poisson Distribution
Let's look at a problem more oriented towards physics and chemistry. If the probability
that an isolated excited state atom will emit a single photon in one second is 0.00050, what is the
probability that two photons would be emitted in one second from five identical non-interacting
excited state atoms?

Answer: P(2, 5, 0.00050) = \frac{5!}{2!\,3!} (0.00050)^2 (0.99950)^3 = 2.5 \times 10^{-6}.

While this is a straightforward calculation, we run into difficulties in this type of calculation when
working with a larger number of atoms (i.e., higher n). For example, how would you calculate the
Figure 1.10: The Poisson distribution.
probability that two photons would be emitted from 10,000 identical non-interacting excited stateatoms? My calculator cannot do factorial calculations with numbers greater than 69.
In this situation we can make an approximation to the binomial distribution in the limit that n → ∞ and p → 0 such that np remains finite. This approximation is called the Poisson distribution and is given by

P_Poisson(r, n, p) = ((np)^r / r!) e^{−np}.   (1.27)
This distribution most often describes the parent distribution when you are observing independent
random events that are occurring at a constant rate, such as photon counting experiments. The
mean of the Poisson distribution is
μ = Σ_{r=0}^∞ r ((np)^r / r!) e^{−np} = np,   (1.28)
and the variance of the Poisson distribution is
σ² = Σ_{r=0}^∞ (r − np)² ((np)^r / r!) e^{−np} = np.   (1.29)
These two results are as expected for the binomial distribution in the limit that p → 0. We define μ = np and rewrite the Poisson distribution as

P_Poisson(r, μ) = (μ^r / r!) e^{−μ}.   (1.30)

The Poisson distribution gives the probability of obtaining r events in a given time interval when the expected number of events is μ.
Let's use the Poisson distribution to solve our problem above with 10,000 atoms. In this problem n = 10,000 and p = 0.00050, so μ = np = 5.0. Thus,

P(2, 5.0) = ((5.0)²/2!) e^{−5.0} = 0.084.

There's an 8.4% chance that two photons would be emitted from 10,000 identical non-interacting excited-state atoms.
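Unlike a pocket calculator, Python's integers never overflow, so we can compare the exact binomial answer for n = 10,000 with the Poisson approximation directly. A minimal sketch:

```python
from math import comb, exp, factorial

def binomial_pmf(r, n, p):
    return comb(n, r) * p**r * (1 - p)**(n - r)

def poisson_pmf(r, mu):
    return mu**r / factorial(r) * exp(-mu)

# n = 10,000 atoms, p = 0.00050 per atom, so mu = np = 5.0
exact = binomial_pmf(2, 10_000, 0.00050)   # exact binomial, no overflow
approx = poisson_pmf(2, 5.0)               # Eq. 1.30

assert abs(approx - 0.0842) < 0.0005
assert abs(exact - approx) < 1e-4          # the approximation is excellent here
```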
Gaussian Distribution  In the limit of large n and p → 0 the Poisson distribution serves as a good approximation for the binomial distribution. Is there an approximation that holds in the limit of large n when p is not close to zero? Yes, under these conditions we can use the Gaussian distribution as an approximation for the binomial. That is,
P_Gaussian(r, n, p) = (1/√[2π np(1 − p)]) exp{ −(1/2)(r − np)² / [np(1 − p)] }.   (1.31)

The mean of the Gaussian distribution is
μ = Σ_{r=0}^∞ (r / √[2π np(1 − p)]) exp{ −(1/2)(r − np)² / [np(1 − p)] } = np,   (1.32)
and the variance of the Gaussian distribution is
σ² = Σ_{r=0}^∞ ((r − np)² / √[2π np(1 − p)]) exp{ −(1/2)(r − np)² / [np(1 − p)] } = np(1 − p).   (1.33)
Making the substitutions μ = np and σ² = np(1 − p), we can rewrite the Gaussian distribution in the form

P_Gaussian(r; μ, σ) = (1/(σ√(2π))) exp[ −(1/2)((r − μ)/σ)² ].   (1.34)
It turns out that the Gaussian distribution describes the distribution of random errors in many
kinds of measurements.
In the same way that we make the transition from quantized to continuous observables in going from quantum to classical mechanics, we replace the integer r with a continuous parameter x and get

p_Gaussian(x; μ, σ) = (1/(σ√(2π))) exp[ −(1/2)((x − μ)/σ)² ].   (1.35)
An important distinction in this transition is that p_Gaussian now represents a probability density, not a probability. Recall from our earlier discussion that to get the probability that a continuous measured quantity lies within a given range we need to integrate the probability density between the limits that define the range. If your parent distribution is known to be Gaussian, and if you choose your confidence limits to be equidistant from the mean, then the confidence limits for a given probability can be written in the simple form

x = μ ± zσ,   (1.36)

where z is a factor determined by the percent confidence desired and is given in Table 1.1. For example, z = 0.67 for 50% confidence; that is, 50% of the total area under the Gaussian curve lies within ±0.67σ of the mean μ. Conversely, we can say that for a given single measurement, x, the true mean will lie within the interval

μ = x ± zσ,   (1.37)
Figure 1.11: Gaussian Distribution.
with a percent confidence determined by z.
Bi-dimensional Gaussian Distribution  So far we've only discussed univariate data. Let's look at the case of a bivariate Gaussian distribution. Remember that we have to take into account the covariance (or linear correlation r_xy) between the two variables. The functional form of this distribution is
p_Gaussian(x, y) = 1 / [2π σ_x σ_y √(1 − r_xy²)] × exp{ −1/[2(1 − r_xy²)] [ ((x − μ_x)/σ_x)² + ((y − μ_y)/σ_y)² − 2 r_xy ((x − μ_x)/σ_x)((y − μ_y)/σ_y) ] }.   (1.38)
If you take a cross-section through the distribution to obtain, for example, the distribution in
x for a fixed value of y, then the mean of this cross-section distribution will be
μ_x(y) = μ_x + r_xy (σ_x/σ_y)(y − μ_y),   (1.39)
and the standard deviation will be
σ_x(y) = σ_x √(1 − r_xy²).   (1.40)
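Equations 1.39 and 1.40 are easy to encode. The sketch below uses hypothetical parameter values (not taken from the text) purely to illustrate them:

```python
from math import sqrt

def conditional_mean_std(mu_x, mu_y, sigma_x, sigma_y, r_xy, y):
    """Mean and standard deviation of x in a cross-section of a bivariate
    Gaussian taken at a fixed value of y (Eqs. 1.39 and 1.40)."""
    mean = mu_x + r_xy * (sigma_x / sigma_y) * (y - mu_y)
    std = sigma_x * sqrt(1 - r_xy**2)
    return mean, std

# Hypothetical numbers: mu_x = 0, mu_y = 0, sigma_x = 2, sigma_y = 1, r = 0.8
m, s = conditional_mean_std(0.0, 0.0, 2.0, 1.0, 0.8, y=1.0)
assert abs(m - 1.6) < 1e-9   # 0 + 0.8 * (2/1) * 1
assert abs(s - 1.2) < 1e-9   # 2 * sqrt(1 - 0.64)
```

Note that the cross-section width σ_x(y) does not depend on y; only the cross-section mean slides with y.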
Figure 1.12: 50% of the area under a Gaussian distribution lies between the limits ±0.67σ from the mean. These limits define the 50% confidence limits for the true value.
% confidence      z
50              0.67
60              0.84
75              1.15
90              1.64
95              1.96
97.5            2.24
99.0            2.58
99.5            2.81
99.95           3.50

Table 1.1: Percent confidence (i.e., area under the Gaussian distribution) as a function of z.
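The entries of Table 1.1 follow from integrating the unit Gaussian between −z and +z, which the standard error function provides: P = erf(z/√2). A quick check in Python (agreement to about half a percentage point is the best we can expect, since the tabulated z values are rounded to two decimals):

```python
from math import erf, sqrt

def confidence_percent(z):
    """Area under a Gaussian within +/- z standard deviations of the mean."""
    return 100.0 * erf(z / sqrt(2.0))

# The rows of Table 1.1: (percent confidence, z)
table = [(50, 0.67), (60, 0.84), (75, 1.15), (90, 1.64), (95, 1.96),
         (97.5, 2.24), (99.0, 2.58), (99.5, 2.81), (99.95, 3.50)]
for conf, z in table:
    assert abs(confidence_percent(z) - conf) < 0.5
```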
Multi-dimensional Gaussian Distribution In the more general case of multi-variate data we
have
p_Gaussian(x) = 1 / [(2π)^{m/2} |V|^{1/2}] exp[ −(1/2)(x − μ)^T V^{−1} (x − μ) ].   (1.41)
Here we use the symbol x to represent not one variable but m variables. For example, instead of the variables (x, y, z) we use a 3-dimensional vector x, or (x_1, x_2, x_3). In general x is an m-dimensional vector whose elements (x_1, x_2, x_3, ..., x_m) represent the m variables in our multivariate data. Likewise μ is an m-dimensional vector whose elements (μ_{x_1}, μ_{x_2}, μ_{x_3}, ..., μ_{x_m}) represent the means of the m variables in our multivariate data.
To represent the variance, however, we use an m × m symmetric (i.e., V_{i,j} = V_{j,i}) matrix so that we can include all the covariances between variables. For example, in the bivariate
case we have
V = | σ_x²    σ_xy² |
    | σ_xy²   σ_y²  |     (1.42)
Or in the three variable case we have
V = | σ_1²    σ_12²   σ_13² |
    | σ_12²   σ_2²    σ_23² |
    | σ_13²   σ_23²   σ_3²  |     (1.43)
Figure 1.13: Bivariate Gaussian Distribution.
V^{−1}, the inverse of the covariance matrix, is called the curvature matrix, α:

V^{−1} = α.   (1.44)
We will learn more about the curvature matrix later when we discuss modeling of data.
1.1.3.4 Central Limit Theorem
Our earlier assumption that the random errors in our measurements are governed by a parent distribution seems to be reasonable in terms of the binomial theorem and the probabilities associated with quantum measurements. Wouldn't it be great if our measurement uncertainties were dominated by just the uncertainty principle of quantum mechanics! But alas, we also have the workman outside our building running a jackhammer that sends random vibrations into our instrument, and/or the carelessness of Homer Simpson at the local power plant sending random voltage fluctuations into our instrument, and/or a number of other seemingly random perturbations leading to random errors in our measurements. Not being able to eliminate them, we are still faced with the task of determining the parent distribution governing our errors.
One helpful simplification is the Central Limit Theorem. This theorem says that if you have different random error sources, each with its own parent distribution, then in the limit that you have an infinite number of random error sources the final parent distribution for your random errors will be Gaussian, even if none of the individual random error sources have Gaussian
parent distributions. An important condition is that the component errors have the same order of magnitude and that no single source of error dominates all the others. Since you really can't know if you have a large number of random error sources, you should be cautious about assuming Gaussian errors. Nonetheless, the Gaussian distribution does seem to describe the errors in most experiments, other than situations where the Poisson distribution should apply.
1.2 The χ² test of a distribution
If we suspect that a given set of observations comes from a particular parent distribution, how can we test them for agreement with this distribution? For example, if I gave you two dice and asked you to tell me whether the dice are loaded^4, what would you do?
You could roll the dice a large number of times and make a histogram of the results. You don't expect that you will roll 12 (i.e., both dice come up as 6) exactly 1/36 times the number of rolls, but it should be close. In contrast, if the dice came up 12 in over half the number of total rolls then
you might suspect that the dice are not obeying the statistics of the binomial distribution. How
much disagreement between the parent (binomial) distribution and our sample distribution can we
reasonably expect? To answer this question, we need a quantitative index of the difference between
the two distributions and a way to interpret this index.
Let's consider a variable x that we've measured N times. We can construct a histogram of the measured x values using a bin width of dx. If the errors in x are governed by a parent distribution p(x), then the expected number of x observations in the range dx around x_i is given by

h(x_i) = N p(x_i) dx.
How do we find the uncertainty in the bin heights? Pay close attention, this part is tricky. For each bin height h(x_i) there's a probability distribution governing the distribution of measured heights. If we construct many histograms from different groups of measurements, then we would expect some distribution of bin heights around the expected bin heights. Recall that counting events are typically governed by a Poisson distribution. So while the spread in x can be governed by a particular parent distribution, the spread in the frequency of occurrence of a particular x value is governed by the Poisson distribution. Recall that for a Poisson distribution the variance is equal to the mean of the distribution. Thus we estimate the standard deviation of the bin heights as

σ_i(h(x_i)) = √h(x_i).   (1.45)
^4 One side of the die is weighted so that the probabilities for any given side coming up are not equal.
Now back to our original problem. The experimental histogram can be tested against a parent
distribution using the χ² test:

χ² = Σ_i^{N_bin} [h(x_i) − N p(x_i) dx]² / h(x_i).   (1.46)
The χ² statistic characterizes the dispersion of the observed frequencies from the expected frequencies. For good agreement between the two distributions you might expect each bin height to differ from the expected height by about one standard deviation, so we might expect χ² to be approximately equal to N_bin. More precisely, the expectation value of χ² is

⟨χ²⟩ = ν = N_bin − n_c,   (1.47)

where ν is the number of degrees of freedom, N_bin is the number of sample frequencies, and n_c is the number of constraints or parameters calculated from the data to describe the potential parent distribution p(x). Simply normalizing the parent distribution to the total number of events N is one constraint.
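Equation 1.46 is a one-liner in code. The bin heights below are hypothetical, chosen only to illustrate the bookkeeping:

```python
def chi_squared(observed, expected):
    """Eq. 1.46: sum over bins of [h(x_i) - N p(x_i) dx]^2 / h(x_i).
    The Poisson uncertainty in each bin height is sqrt(h), which is why
    the observed count appears in the denominator."""
    return sum((h - e)**2 / h for h, e in zip(observed, expected))

# Hypothetical histogram: observed bin heights h(x_i) and the heights
# N p(x_i) dx expected from an assumed parent distribution.
h = [5, 9, 16, 9, 5]
e = [4.4, 9.8, 15.6, 9.8, 4.4]

chi2 = chi_squared(h, e)
assert abs(chi2 - 0.2962) < 0.001
```

Here χ² is far below N_bin = 5, which (as discussed below) would actually be suspiciously good agreement.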
Figure 1.14: The uncertainty in the number of occurrences within a given bin is governed by the Poisson distribution.
If you obtained the mean and variance from your data and used them to describe your parent distribution, then you have used another two constraints.
Often you will see the reduced χ², or

χ_ν² = χ²/ν,   (1.48)

which has an expectation value of ⟨χ_ν²⟩ = 1. Large values of χ_ν² (i.e., ≫ 1) indicate poor agreement. Very small values (i.e., ≪ 1) are also unreasonable, and imply something is wrong somewhere.
We can also use the χ² test to compare two experimental distributions, to decide whether or not they were drawn from the same parent distribution. For example, to compare h(x_i) and g(x_i) you would calculate

χ² = Σ_i^{N_bin} [g(x_i) − h(x_i)]² / [g(x_i) + h(x_i)].   (1.49)
If a given bin contains no occurrences in both histograms, then the sum over that bin is skipped and the number of degrees of freedom (i.e., N_bin) is reduced by one. If the two experimental distributions are constructed from different numbers of measurements, then you need to scale the individual distributions. In this case you would calculate

χ² = Σ_i^{N_bin} [√(H/G) g(x_i) − √(G/H) h(x_i)]² / [g(x_i) + h(x_i)],   (1.50)

where H = Σ_i h(x_i) and G = Σ_i g(x_i). Having to scale the histograms for different numbers of measurements is another constraint that reduces the degrees of freedom by one.
Interpreting the χ² test result can be simpler if the χ² probability function is used,
p(χ², ν) = (1 / [2^{ν/2} Γ(ν/2)]) (χ²)^{(ν−2)/2} e^{−χ²/2},   (1.51)
where Γ(x) is the Gamma function, given by

Γ(x) = ∫_0^∞ e^{−u} u^{x−1} du,  with 0 ≤ x ≤ ∞.   (1.52)
p(χ², ν) is the distribution of χ² values as a function of the number of degrees of freedom, ν, and has a mean of ν and a variance of 2ν. Q(χ₀²|ν) is the probability that another distribution would give a higher (i.e., worse) χ² than your value χ₀². It is calculated as follows:

Q(χ₀²|ν) = (1 / [2^{ν/2} Γ(ν/2)]) ∫_{χ₀²}^∞ (χ²)^{(ν−2)/2} e^{−χ²/2} d(χ²).   (1.53)
See Table C.4 in Bevington, where this function is tabulated. If Q(χ²|ν) is reasonably close to 1, then the assumed distribution describes the spread of the data points well. If Q(χ²|ν) is small, the assumed
Figure 1.15: The χ² distribution p(χ², ν) for ν = 1, 2, 4, 6, 8, and 10.
parent distribution is not a good estimate of the parent distribution. For example, for a sample with 10 degrees of freedom (ν = 10) and χ² = 2.56, the probability is 99% that another distribution would give a higher (i.e., worse) χ² (i.e., the distributions agree well). As another example, if ν = 10 and χ² = 29.59, the probability is 0.1% that another distribution would give a higher (i.e., worse) χ² (i.e., the distributions don't agree well).
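For even ν the integral in Eq. 1.53 can be done in closed form, which allows a table-free check of the two worked examples above. A sketch (the closed form below holds only for even ν; odd ν would require the incomplete gamma function):

```python
from math import exp

def chi2_Q_even(x, nu):
    """Q(chi^2 | nu): probability of exceeding chi^2 = x, for EVEN nu.
    For even nu, Eq. 1.53 reduces to a truncated Poisson sum:
    Q = exp(-x/2) * sum_{k=0}^{nu/2 - 1} (x/2)^k / k!."""
    assert nu % 2 == 0
    term, total = 1.0, 1.0
    for k in range(1, nu // 2):
        term *= (x / 2.0) / k
        total += term
    return exp(-x / 2.0) * total

# The two worked examples from the text (nu = 10):
assert abs(chi2_Q_even(2.56, 10) - 0.99) < 0.005
assert abs(chi2_Q_even(29.59, 10) - 0.001) < 0.0005
```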
1.3 Systematic Errors
Systematic errors lead to bias in a measurement technique. Bias is the difference between your measured mean and the true value,

bias = μ_B − x_t.   (1.54)
Systematic errors are reproducible, and usually can be attributed to a single source. The sources
of error can be divided into three subclasses:
Instrument Errors Arise from imperfections in the measuring device. For example, uncalibrated
glassware, bad power supply, faulty Pentium chip, ... If these errors are found they can be corrected
by calibration. Periodic calibration of equipment is always desirable because the response of most
Figure 1.16: Method A has no bias, so μ_A = x_true.
instruments changes with time as a result of wear, corrosion, or mistreatment.
Personal Errors  Arise from carelessness, inattention, or other personal limitations. For example, missed end points, poor record keeping, ... These errors can be minimized by care and self-discipline. It is always a good habit to check instrument readings, notebook entries, and calculations often.
Method Errors  Arise from non-ideal chemical or physical behavior of the system under study. For example, incomplete reactions, unknown decomposition, system nonlinearities, ... These errors are the most difficult to detect. Some steps to take are analyses of standard samples, independent analyses (i.e., another technique that confirms your result), and blank determinations, which can reveal contaminants in the instrument.
1.4 Gross Errors
As we mentioned earlier, these errors are not a third class of errors; they are either systematic or random errors. It can be dangerous to reject gross errors (or outliers), and there is no universal rule for doing so. A common approach is the Q test. For more details see Rorabacher, Anal. Chem. 63, 139 (1991).
Figure 1.17: Questionable measurements can be rejected if Q_exp = d/w is greater than the value of Q_crit given in Table 1.2.
If Q_exp is greater than Q_crit then x_q can be rejected:

Q_exp = d/w = |x_q − x_n| / |x_1 − x_q|,   (1.55)

where d is the difference between the questionable result and its nearest neighbor, and w is the spread of the entire data set. That is, x_q is the questionable result, x_n is its nearest neighbor, and x_1 is the result furthest from x_q. Ideally, the only valid reason for rejecting data from a small set of data is knowledge that a mistake was made in the measurement.
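The Q test is mechanical enough to automate. The sketch below uses a hypothetical five-point data set (not from the text) and the 90% confidence column of Table 1.2:

```python
# 90% confidence column of Table 1.2, keyed by number of observations.
Qcrit_90 = {3: 0.941, 4: 0.765, 5: 0.642, 6: 0.560, 7: 0.507,
            8: 0.468, 9: 0.437, 10: 0.412}

def q_test(data, q_value):
    """Dixon Q test, Eq. 1.55: gap to the nearest neighbor divided by the
    total spread of the data. Returns (Qexp, reject?) for q_value."""
    rest = [x for x in data if x != q_value]
    nearest = min(rest, key=lambda x: abs(x - q_value))
    farthest = max(rest, key=lambda x: abs(x - q_value))
    q_exp = abs(q_value - nearest) / abs(farthest - q_value)
    return q_exp, q_exp > Qcrit_90[len(data)]

# Hypothetical data set with one suspicious high value:
data = [30.9, 31.0, 31.1, 31.2, 31.5]
q_exp, reject = q_test(data, 31.5)
assert abs(q_exp - 0.5) < 1e-6   # (31.5 - 31.2) / (31.5 - 30.9)
assert not reject                # 0.5 < Qcrit(5, 90%) = 0.642
```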
1.5 Propagation of Errors
If we calculate a result using experimentally measured variables, how do we relate the uncertainty
in our experimental variable to the uncertainty in our calculated variable? Recall that the area
under the error distribution curve between the confidence limits is proportional to how confident
we are (i.e., the uncertainty) in our measurements. In general, propagating errors requires us to
Number of                        Q_crit
observations   90% confidence   95% confidence   99% confidence
3              0.941            0.970            0.994
4              0.765            0.829            0.926
5              0.642            0.710            0.821
6              0.560            0.625            0.740
7              0.507            0.568            0.680
8              0.468            0.526            0.634
9              0.437            0.493            0.598
10             0.412            0.466            0.568

Table 1.2: Q_crit values for rejecting data from a small set of data.
map our experimental error distribution curve through our calculation into an error distribution curve for our calculated variable, and then find the interval in our calculated variable that gives us the area that corresponds to our desired confidence. Fig. 1.18 shows an example where a Gaussian experimental error distribution curve is mapped through a linear function to obtain a calculated error distribution. In this example, the shape of the calculated error distribution is also Gaussian.
Fig. 1.19 shows another example where a Gaussian experimental error distribution curve is mapped through a nonlinear function to obtain a calculated error distribution that is highly asymmetric, and cannot be characterized by just a mean and variance. Clearly, it is important to realize that the shape of the error distribution curve for the calculated variable depends on the functional relationship between the experimentally measured variables and the calculated variables. This last example would be one of the worst cases for error propagation. With this caveat, let's look at error propagation and assume that the error distributions are Gaussian and map through the calculations without significant distortions in shape.
Previously we defined the uncertainty in a measured value in terms of its standard deviation.
We can do the same here,
σ_y²(u, v, ...) = lim_{N→∞} (1/N) Σ_i [ y(u_i, v_i, ...) − y(ū, v̄, ...) ]².   (1.56)
Here u_i, v_i, ... are the experimental measurements. For example, suppose we measured u_i = 2.25, 2.05, 1.85, 1.79, and 2.12, which have a mean of ū = 2.01 and a variance of s_u² = 0.03632. What is s_y²(u) if y(u) = 4u + 3? If we wanted to use Eq. 1.56 directly we would first calculate y(u_i):

u_i      y_i
2.25     12.00
2.05     11.20
1.85     10.40
1.79     10.16
2.12     11.48

Then we calculate the mean ȳ = 11.048 and the variance s_y² = 0.581.
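For a linear function the propagation shortcut can be checked directly against the route through y(u_i). A quick sketch using the five measurements above:

```python
def sample_variance(xs):
    """Variance with the usual N - 1 normalization."""
    m = sum(xs) / len(xs)
    return sum((x - m)**2 for x in xs) / (len(xs) - 1)

u = [2.25, 2.05, 1.85, 1.79, 2.12]
y = [4 * ui + 3 for ui in u]      # y(u) = 4u + 3

su2 = sample_variance(u)          # variance of the measurements
sy2 = sample_variance(y)          # variance of the computed y values

# For a linear function, sigma_y^2 = (dy/du)^2 sigma_u^2 = 16 sigma_u^2:
assert abs(su2 - 0.03632) < 1e-5
assert abs(sy2 - 0.581) < 0.001
assert abs(sy2 - 16 * su2) < 1e-9
```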
Alternatively we could do a Taylor series expansion of y(u_i, v_i, ...) around y(ū, v̄, ...) and use it in our expression for σ_y²:

y(u_i, v_i, ...) = y(ū, v̄, ...) + (u_i − ū)(∂y/∂u) + (v_i − v̄)(∂y/∂v) + ... + higher-order terms.

Assuming ∂y/∂u and ∂y/∂v are not near zero, as they were in our parabola example earlier, we can neglect the higher-order terms and write
σ_y²(u, v, ...) = lim_{N→∞} (1/N) Σ_i [ (u_i − ū)(∂y/∂u) + (v_i − v̄)(∂y/∂v) + ... ]².
Expanding the squared term in brackets we obtain
σ_y²(u, v, ...) = lim_{N→∞} (1/N) Σ_i [ (u_i − ū)²(∂y/∂u)² + (v_i − v̄)²(∂y/∂v)² + ... + 2(u_i − ū)(v_i − v̄)(∂y/∂u)(∂y/∂v) + ... ].
Breaking this into individual sums we have
σ_y²(u, v, ...) = lim_{N→∞} (1/N) Σ_i (u_i − ū)²(∂y/∂u)² + lim_{N→∞} (1/N) Σ_i (v_i − v̄)²(∂y/∂v)² + ... + lim_{N→∞} (1/N) Σ_i 2(u_i − ū)(v_i − v̄)(∂y/∂u)(∂y/∂v) + ... .
Finally we can write
σ_y²(u, v, ...) = σ_u²(∂y/∂u)² + σ_v²(∂y/∂v)² + ... + 2σ_uv²(∂y/∂u)(∂y/∂v) + ... .   (1.57)
This is the error propagation equation. If u_i, v_i, ... are all uncorrelated then σ_uv² = 0, and the error propagation equation simplifies to

σ_y²(u, v, ...) = σ_u²(∂y/∂u)² + σ_v²(∂y/∂v)² + ... .   (1.58)
Thus, in our earlier example where y = 4u + 3, we can use the error propagation equation to relate the variance in u to the variance in y. That is,

s_y² = 16 s_u² = 0.581.
Now let's look at some specific functions y(u, v, ...) and see how the error propagation equation is applied.

Simple Sums and Differences:

y = u ± a,  where a is a constant,

∂y/∂u = 1,   so   σ_y² = σ_u².
Weighted Sums and Differences:

y = au ± bv,  where a and b are constants,

∂y/∂u = a   and   ∂y/∂v = ±b,

σ_y² = a²σ_u² + b²σ_v² ± 2ab σ_uv².
Multiplication and Division:

y = auw,

∂y/∂u = aw   and   ∂y/∂w = au,

σ_y² = a²w²σ_u² + a²u²σ_w² + 2a²uw σ_uw².

If we divide both sides by y² we get something that might be more familiar, that is

(σ_y/y)² = (σ_u/u)² + (σ_w/w)² + 2σ_uw²/(uw).

σ_y/y is called the relative error in y, while σ_y is the absolute error in y.
Powers:

y = au^b,

∂y/∂u = abu^{b−1} = by/u,

σ_y² = (σ_u²/u²) b²y²   or   (σ_y/y)² = b² σ_u²/u².
Exponentials:

y = ae^{bu},

∂y/∂u = abe^{bu} = by,

σ_y² = σ_u² b²y²   or   (σ_y/y)² = b²σ_u².
Logarithms:

y = a ln(bu),

∂y/∂u = (a/(bu)) b = a/u,

σ_y² = σ_u² a²/u² = a² σ_u²/u².
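Each of the rules above is just Eq. 1.58 with a particular derivative. When the derivative is awkward to take by hand, a numerical sketch works too; the numbers below are hypothetical, and the closed-form logarithm rule serves as the check:

```python
from math import log

def propagate_1d(f, u, sigma_u, h=1e-6):
    """Error propagation for y = f(u) with uncorrelated errors (Eq. 1.58),
    using a central-difference estimate of dy/du."""
    dydu = (f(u + h) - f(u - h)) / (2 * h)
    return abs(dydu) * sigma_u

# Check against the logarithm rule: y = a ln(bu) gives sigma_y = |a| sigma_u / u.
a, b, u, su = 2.0, 3.0, 1.7, 0.05   # hypothetical values
sy = propagate_1d(lambda x: a * log(b * x), u, su)
assert abs(sy - a * su / u) < 1e-7
```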
1.6 Confidence Limits
As we learned earlier, once you know the parent distribution governing the errors in your measurements you can calculate the probability that a given measurement will lie within a certain interval of possible values. That is,

P(x_−, x_+) = ∫_{x_−}^{x_+} p(x) dx,   (1.59)

where P(x_−, x_+) is the probability that the measured value x lies between x_− and x_+. The interval between x_− and x_+ is called the confidence interval, and x_− and x_+ are called confidence limits.
Based on the Central Limit Theorem we can often assume that our parent distribution is Gaussian. In certain situations you may also know the variance of this Gaussian distribution but not the mean. For example, most analytical balance manufacturers will print on the front of the balance the standard deviation (σ) for any measurement made with the balance. Assuming the distribution of errors is Gaussian, then given a single measurement, x, we can estimate the true mean μ to lie within the interval

μ = x ± zσ,   (1.60)

with a percent confidence determined by z, as given in Table 1.1.
If we performed the measurement twice we can use the average of the two measurements as a
better estimate for , but how do we determine the confidence limits in this case? We use the error
propagation equation. Given
x̄ = (x_1 + x_2)/2,   (1.61)

we can calculate the variance in x̄ given the variances in x_1 and x_2,

σ_x̄² = (1/2)² σ_1² + (1/2)² σ_2².   (1.62)

Assuming σ_1² = σ_2² = σ², we obtain

σ_x̄² = σ²/2   or   σ_x̄ = σ/√2.   (1.63)
Figure 1.20: Student's t distribution for ν = 1, 2, 3, 4, 6, and 10. The unit Gaussian is also shown.
In general, after N measurements the standard deviation of the mean is given by
σ_x̄ = σ/√N.   (1.64)

Therefore the confidence limits of the mean after N measurements are given by

μ = x̄ ± zσ/√N.   (1.65)
Note that this applies only if there are no systematic errors (or bias).
1.6.1 Student's t-distribution

A problem with these equations is that they assume we already know σ, which would require an infinite number of measurements. For a finite number of measurements we only have the experimental variance s². In the case where our distribution was Gaussian with variance σ², we knew that the distribution in z = (x − μ)/σ had zero mean and unit variance. If we know our distribution is Gaussian but only know s², then we need to know the distribution in t = (x − μ)/s. This distribution was derived by William Gosset and is called Student's t distribution. It is

p(t, ν) = [Γ((ν + 1)/2) / (√(νπ) Γ(ν/2))] (1 + t²/ν)^{−(ν+1)/2}.   (1.66)
Degrees of Freedom
ν = N − 1    t(95%)    t(99%)
1            12.7      63.7
2            4.30      9.92
3            3.18      5.84
4            2.78      4.60
5            2.57      4.03
6            2.45      3.71
7            2.36      3.50
10           2.23      3.17
30           2.04      2.75
60           2.00      2.66
∞            1.96      2.58

Table 1.3: Student's t values as a function of ν for 95 and 99 percent confidence (i.e., 95 and 99 percent of the area under Student's t distribution).
In contrast to the z distribution, this expression depends on t and ν, the number of degrees of freedom. As expected, in the limit that N → ∞, t → z. Table 1.3 gives the values of t as a function of the degrees of freedom (i.e., ν = N − 1) for the 95% and 99% confidence limits. Using Student's t distribution we can use the average of N measurements and estimate the true mean μ to lie within the interval

μ = x̄ ± ts/√N,   (1.67)

with a percent confidence determined by t and ν, as given in Table 1.3. From this table you will notice that, for a given confidence level, t inflates the confidence limits to take into account the finite number of measurements.
Let's look at an example. A chemist obtained the following data for the alcohol content of a sample of blood: 0.084%, 0.089%, and 0.079%. How would he calculate the 95% confidence limits for the mean, assuming there is no additional knowledge about the precision of the method? First he would calculate the mean,

x̄ = (0.084% + 0.089% + 0.079%)/3 = 0.084%,

and then the standard deviation,

s = √{ [(0.084% − 0.084%)² + (0.089% − 0.084%)² + (0.079% − 0.084%)²] / 2 } = 0.0050%.
Then for 2 degrees of freedom he uses t = 4.30 to calculate the 95% confidence limits for the mean as

μ = 0.084% ± 0.012%.

How would the calculation change if, on the basis of previous experience, he knew that σ = 0.005% (i.e., he knows the parent distribution variance)? In this case he would use z = 1.96 instead of t = 4.30, and would calculate

μ = 0.084% ± 0.006%

for the 95% confidence limits.
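The blood-alcohol example can be reproduced in a few lines; the only difference between the two confidence limits is whether t (σ unknown) or z (σ known) multiplies s/√N:

```python
from math import sqrt

def mean_and_s(xs):
    """Sample mean and standard deviation (N - 1 normalization)."""
    m = sum(xs) / len(xs)
    s = sqrt(sum((x - m)**2 for x in xs) / (len(xs) - 1))
    return m, s

x = [0.084, 0.089, 0.079]        # % alcohol
m, s = mean_and_s(x)

t_95 = 4.30                      # Table 1.3, nu = 2
z_95 = 1.96                      # Table 1.1

ci_t = t_95 * s / sqrt(len(x))   # sigma unknown: use Student's t
ci_z = z_95 * s / sqrt(len(x))   # sigma known (and happens to equal s here)

assert abs(m - 0.084) < 1e-9
assert abs(s - 0.0050) < 1e-4
assert round(ci_t, 3) == 0.012
assert round(ci_z, 3) == 0.006
```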
1.7 The Two Sample Problem
1.7.1 The t-test
1.7.1.1 Comparing a measured mean to a true value
The best way to detect a systematic error in an experimental method of analysis is to analyze a standard reference material and compare the results with the known true value. If there is a difference between your measured mean and the true value, how would you know whether it's due to random error or to some reproducible systematic error? Using the expression

μ = x̄ ± ts/√N,   (1.68)

we can rewrite it as

x̄ − μ = ± ts/√N,   (1.69)

where ts/√N is called the critical value. If |x̄ − μ| is less than or equal to ts/√N, then the difference is statistically insignificant at the percent confidence determined by t. Otherwise the difference is statistically significant.
A procedure for the determination of sulfur in kerosenes was tested on a sample known from preparation to contain 0.123% S. The results were %S = 0.112, 0.118, 0.115, and 0.119. Is there a systematic error (or bias) in the method?

The first step is to calculate the mean:

x̄ = (0.112 + 0.118 + 0.115 + 0.119)/4 = 0.116 %S.

Next we calculate the difference between our mean and the true value:

x̄ − μ = 0.116 − 0.123 = −0.007 %S.
The experimental standard deviation is

s = √{ [(0.112 − 0.116)² + (0.118 − 0.116)² + (0.115 − 0.116)² + (0.119 − 0.116)²] / 3 } = 0.0032.

At 95% confidence, t has a value of 3.18 for N = 4 (ν = 3). The critical value is

ts/√N = (3.18)(0.0032%)/√4 = 0.0051 %S.

Since the magnitude of the difference between our mean and the true value (0.007% S) is greater than the critical value (0.0051% S), we conclude that a systematic error is present with 95% confidence.
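The sulfur example can be packaged into a small helper; the same function works for any data set, true value, and tabulated t:

```python
from math import sqrt

def t_test_vs_true(xs, true_value, t):
    """Compare a measured mean to a known true value (Eq. 1.69).
    Returns the mean, the critical value ts/sqrt(N), and whether the
    difference is statistically significant (i.e., bias is indicated)."""
    n = len(xs)
    m = sum(xs) / n
    s = sqrt(sum((x - m)**2 for x in xs) / (n - 1))
    critical = t * s / sqrt(n)
    return m, critical, abs(m - true_value) > critical

xs = [0.112, 0.118, 0.115, 0.119]                     # %S results
m, crit, biased = t_test_vs_true(xs, 0.123, t=3.18)   # t(95%, nu = 3)

assert abs(m - 0.116) < 1e-9
assert abs(crit - 0.0051) < 1e-4
assert biased    # |0.116 - 0.123| = 0.007 exceeds the critical value
```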
1.7.1.2 Comparing two measured means
Often chemical analyses are used to determine whether two materials are identical. Then the question arises: how can you know whether the difference in the means of two sets of (allegedly identical) analyses is real, and constitutes evidence that the samples are different, or is simply due to random errors in the two data sets?

To solve this problem we use the same approach as in the last section. The difference between the two experimental means can be compared to a critical value to determine if the difference is statistically significant:
x̄_1 − x̄_2 = ± t s_pooled √[(N_1 + N_2)/(N_1 N_2)],   (1.70)

where t s_pooled √[(N_1 + N_2)/(N_1 N_2)] is the critical value and

s_pooled = √{ [s_1²(N_1 − 1) + s_2²(N_2 − 1)] / (N_1 + N_2 − 2) }.   (1.71)
As before, if |x̄_1 − x̄_2| is less than or equal to the critical value then the difference is probably due to random error, i.e., the samples are probably the same. If, however, |x̄_1 − x̄_2| is greater than the critical value then the difference is probably real, i.e., the samples are not identical.
Two wine bottles were analyzed for alcohol content to determine whether they were from different sources. Six analyses of the first bottle gave a mean alcohol content of 12.61% and s = 0.0721%. Four analyses of the second bottle gave a mean alcohol content of 12.53% and s = 0.0782%.

First we calculate s_pooled:

s_pooled = √{ [5(0.0721)² + 3(0.0782)²] / 8 } = 0.0744%.
At 95% confidence, t = 2.31 (for 8 degrees of freedom, i.e., (6 − 1) + (4 − 1) = 8). Then

t s_pooled √[(N_1 + N_2)/(N_1 N_2)] = (2.31)(0.0744%) √[(6 + 4)/(6 · 4)] = 0.111%.

Since x̄_1 − x̄_2 = 12.61 − 12.53 = 0.08% is less than this critical value, we conclude that at the 95% confidence level there is no difference in the alcohol content of the two wine bottles.
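The wine-bottle comparison follows Eqs. 1.70 and 1.71 mechanically, so it too can be packaged as a helper:

```python
from math import sqrt

def two_sample_critical(s1, n1, s2, n2, t):
    """Pooled standard deviation (Eq. 1.71) and the critical value for
    comparing two measured means (Eq. 1.70)."""
    sp = sqrt((s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / (n1 + n2 - 2))
    return sp, t * sp * sqrt((n1 + n2) / (n1 * n2))

# Wine-bottle example: t = 2.31 at 95% confidence for nu = 8.
sp, crit = two_sample_critical(0.0721, 6, 0.0782, 4, t=2.31)

assert abs(sp - 0.0744) < 1e-4
assert abs(crit - 0.111) < 1e-3
assert abs(12.61 - 12.53) < crit   # difference not significant at 95%
```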
1.7.2 Comparing Variances - The F-test
Sometimes you need to compare two measured variances. For example, is one method of analysis more precise than another? Or, are the differences between the variances for the two methods of analysis statistically significant?

To answer these types of questions we use the F-test. F is the ratio of the variance of sampling A, with ν_A = N_A − 1 degrees of freedom, to the variance of sampling B, with ν_B = N_B − 1 degrees of freedom; that is,

F_exp = s_A²/s_B².   (1.72)

In the limit ν_A → ∞ and ν_B → ∞,

F = σ_A²/σ_B².   (1.73)
Figure 1.21: The F distribution for (ν_A = 3, ν_B = 5) and (ν_A = 10, ν_B = 12).
An F value of 1 indicates that the two variances are the same, whereas F values differing from one
indicates that the two variances are different. When A and B are finite we will, of course, haverandom errors such that even if the variances are identical we would often expect to obtain Fexp
values different from 1. How likely is it that Fexp differs from one when A = B? That depends
on the shape of the distribution of F values. The parent distribution for F values is given by
p(F, A, B) =((A + B)/2)
(A/2)(B/2)
AA BB
F(A/2)1
(B + AF)(A+B)/2). (1.74)
A plot of this distribution is given in Fig. 1.21. As you can see, for ν_A = 10 and ν_B = 12 an F value of 2 is not that unlikely. The basic approach, then, is to integrate the F distribution from infinity down to some critical F value to define the range of F values that covers, say, the top 5% worst F values. It is common to use tables that give the critical F value (limit) corresponding to a given probability. Tables 1.5 and 1.4 list critical F values corresponding to 2.5% and 5% significance.
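Instead of interpolating in a printed table, the critical F value can be obtained directly as a quantile of the F distribution. A sketch, assuming scipy is available (the critical value at significance P is the (1 − P) quantile, since the upper-tail integral from F_crit to infinity equals P):

```python
from scipy.stats import f

# One-tailed critical values as quantiles of the F distribution
f_crit_5pct = f.ppf(0.95, dfn=4, dfd=4)     # P = 0.05, nu1 = nu2 = 4
f_crit_2p5pct = f.ppf(0.975, dfn=3, dfd=5)  # P = 0.025, nu1 = 3, nu2 = 5

print(round(f_crit_5pct, 3))    # 6.388, matching Table 1.4
print(round(f_crit_2p5pct, 3))  # 7.764, matching Table 1.5
```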
Sometimes you will see two types of F tables, called the one- and two-tailed F distribution tables. If you are asking whether the variance in A is greater than the variance in B, or whether the variance in B is greater than the variance in A, then you use the one-tailed F distribution table. On the other hand, if you are asking whether the variance in A is not equal to the variance in B, then you use the two-tailed distribution table. In most reference texts, however, you will only find the one-tailed F distribution table, because the values for the two-tailed F distribution table can be obtained from the one-tailed table. That is,

P_one-tailed = P_two-tailed / 2. (1.75)

In other words, the one-tailed F table for 5% probability is the same as the two-tailed F table for 10% probability.
Let's look at an example where we ask if the variance in A is not equal to the variance in B.

Two chemists determined the purity of a crude sample of ethyl acetate. Chemist A made 6 measurements with a variance of s²_A = 0.00428 and Chemist B made 4 measurements with a variance of s²_B = 0.00983. Can we be 95% confident that these two chemists differ in their measurement precision?

In this case we are not asking if A is better than B, but only if A and B are different. Therefore we use the two-tailed F distribution table. When calculating the experimental ratio F_exp, the convention is to define s²_A to be the larger of the two variances s²_A and s²_B. With this definition F_exp will always be greater than 1. So we have

F_exp = 0.00983/0.00428 = 2.30.
From the two-tailed table with ν_A = 3, ν_B = 5, and P = 0.05 we find F_crit = 7.764. Since 2.30 < 7.764 we conclude that the F test shows no difference at the 95% confidence level.
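This two-tailed comparison can be sketched in a few lines, assuming scipy is available (the variable names are ours):

```python
from scipy.stats import f

# Larger variance goes in the numerator, per the convention above
s2_A, n_A = 0.00983, 4   # Chemist B's variance, 4 measurements
s2_B, n_B = 0.00428, 6   # Chemist A's variance, 6 measurements

F_exp = s2_A / s2_B                              # about 2.30
# Two-tailed 95% test: one-tailed P = 0.05/2 = 0.025 in the upper tail
F_crit = f.ppf(1 - 0.05 / 2, n_A - 1, n_B - 1)   # about 7.764

print(F_exp < F_crit)   # True: no significant difference in precision
```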
Let's look at another example.
A water sample is analyzed for silicon content by two methods, one of which is supposed to have improved precision. The original method gives x̄_A = 143.6 and s²_A = 66.8 after 5 measurements. The newer method gives x̄_B = 149.0 and s²_B = 8.50 after 5 measurements. Is the new method an improvement over the original at 95% confidence (i.e., less than 5% chance that they're the same)?
First we calculate the F value from the measurements.

F_exp = 66.8/8.50 = 7.86.

Then we use the one-tailed table and find F_crit = 6.388 for ν_A = 5 − 1 = 4 and ν_B = 5 − 1 = 4. Since 7.86 > 6.388 we conclude that we are 95% confident that the new method is better than the original.
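The same one-tailed test, sketched with scipy (an assumption; the variable names are ours):

```python
from scipy.stats import f

s2_old, n_old = 66.8, 5   # original method
s2_new, n_new = 8.50, 5   # newer, supposedly more precise method

F_exp = s2_old / s2_new                        # about 7.86
# One-tailed 95% test: 5% in the upper tail only
F_crit = f.ppf(0.95, n_old - 1, n_new - 1)     # about 6.388

print(F_exp > F_crit)   # True: the new method is more precise at 95%
```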
ν₂\ν₁      1      2      3      4      5      6      7      8      9     10     12     15     20      ∞
  1    161.4  199.5  215.7  224.6  230.2  234.0  236.8  238.9  240.5  241.9  243.9  245.9  248.0  254.3
  2    18.51  19.00  19.16  19.25  19.30  19.33  19.35  19.37  19.38  19.40  19.41  19.43  19.45  19.50
  3    10.13  9.552  9.277  9.117  9.013  8.941  8.887  8.845  8.812  8.786  8.745  8.703  8.660   8.53
  4    7.709  6.944  6.591  6.388  6.256  6.163  6.094  6.041  5.999  5.964  5.912  5.858  5.803   5.63
  5    6.608  5.786  5.409  5.192  5.050  4.950  4.876  4.818  4.772  4.735  4.678  4.619  4.558   4.36
  6    5.987  5.143  4.757  4.535  4.387  4.284  4.207  4.147  4.099  4.060  4.000  3.938  3.874   3.67
  7    5.591  4.737  4.347  4.120  3.972  3.866  3.787  3.726  3.677  3.637  3.575  3.511  3.445   3.23
  8    5.318  4.459  4.066  3.838  3.687  3.581  3.500  3.438  3.388  3.347  3.284  3.218  3.150   2.93
  9    5.117  4.256  3.863  3.633  3.482  3.374  3.293  3.230  3.179  3.137  3.073  3.006  2.936   2.71
 10    4.965  4.103  3.708  3.478  3.326  3.217  3.135  3.072  3.020  2.978  2.913  2.845  2.774   2.54
 12    4.747  3.885  3.490  3.259  3.106  2.996  2.913  2.849  2.796  2.753  2.687  2.617  2.544   2.30
 15    4.543  3.682  3.287  3.056  2.901  2.790  2.707  2.641  2.588  2.544  2.475  2.403  2.328   2.07
 20    4.351  3.493  3.098  2.866  2.711  2.599  2.514  2.447  2.393  2.348  2.278  2.203  2.124   1.84

Table 1.4: Critical values of F for a one-tailed test P = 0.05 (or two-tailed test with P = 0.10). ν₁ is the number of degrees of freedom of the numerator and ν₂ is the number of degrees of freedom of the denominator.
ν₂\ν₁      1      2      3      4      5      6      7      8      9     10     12     15     20      ∞
  1    647.8  799.5  864.2  899.6  921.8  937.1  948.2  956.7  963.3  968.6  976.7  984.9  993.1   1018
  2    38.51  39.00  39.17  39.25  39.30  39.33  39.36  39.37  39.39  39.40  39.41  39.43  39.45  39.50
  3    17.44  16.04  15.44  15.10  14.88  14.73  14.62  14.54  14.47  14.42  14.34  14.25  14.17  13.90
  4    12.22  10.65  9.979  9.605  9.364  9.197  9.074  8.980  8.905  8.844  8.751  8.657  8.560   8.26
  5    10.01  8.434  7.764  7.388  7.146  6.978  6.853  6.757  6.681  6.619  6.525  6.428  6.329   6.02
  6    8.813  7.260  6.599  6.227  5.988  5.820  5.695  5.600  5.523  5.461  5.366  5.269  5.168   4.85
  7    8.073  6.542  5.890  5.523  5.285  5.119  4.995  4.899  4.823  4.761  4.666  4.568  4.467   4.14
  8    7.571  6.059  5.416  5.053  4.817  4.652  4.529  4.433  4.357  4.295  4.200  4.101  3.999   3.67
  9    7.209  5.715  5.078  4.718  4.484  4.320  4.197  4.102  4.026  3.964  3.868  3.769  3.667   3.33
 10    6.937  5.456  4.826  4.468  4.236  4.072  3.950  3.855  3.779  3.717  3.621  3.522  3.419   3.08
 12    6.554  5.096  4.474  4.121  3.891  3.728  3.607  3.512  3.436  3.374  3.277  3.177  3.073   2.72
 15    6.200  4.765  4.153  3.804  3.576  3.415  3.293  3.199  3.123  3.060  2.963  2.862  2.756   2.40
 20    5.871  4.461  3.859  3.515  3.289  3.128  3.007  2.913  2.837  2.774  2.676  2.573  2.464   2.09

Table 1.5: Critical values of F for a one-tailed test P = 0.025 (or two-tailed test with P = 0.05). ν₁ is the number of degrees of freedom of the numerator and ν₂ is the number of degrees of freedom of the denominator.
1.8 Problems
1. Why is it often assumed that experimental random errors follow a Gaussian parent distribu-
tion?
2. In what type of experiments would you expect random errors to follow a Poisson parent
distribution?
3. Indicate whether the Skewness and Kurtosis are positive, negative, or zero for the following
distribution.
[Figure: a distribution plotted as relative frequency versus measured value.]
4. Describe the three classifications of systematic errors and what can be done to minimize each one.
5. Describe the Q-test. When is it reasonable to reject a data point from a small data set?
6. (a) Give a general explanation of confidence limits. (b) How would you find the 99% confidence
limits for a measured parameter that had the distribution in question 3 above?
7. The error propagation equation is

σ²_y(u, v, …) = σ²_u (∂y/∂u)² + σ²_v (∂y/∂v)² + 2σ²_uv (∂y/∂u)(∂y/∂v) + ⋯

Explain the assumptions involved in using this equation to relate the variances in u, v, … to the variance in y.
8. The mean is defined as the first moment about the origin. Explain why the variance is defined
as the second moment about the mean and not the second moment about the origin.
9. A negatively charged oxygen is coordinated by 4 potassium cations. 50% of the potassium cations are ion exchanged out and replaced by sodium cations. Assuming that the sodiums simply replace potassium cations randomly, what is the probability that the oxygen is coordinated by four potassiums in the ion exchanged material?
10. We discussed two equations for obtaining the confidence limits for the mean after N measurements of a variable. They were

μ = x̄ ± zσ/√N,

and

μ = x̄ ± ts/√N.
If you had measured a concentration in triplicate, which equation would you use and why?
Describe a situation where you could correctly use the other equation for your triplicate
measurements.
11. In the χ² test of a distribution explain what value is used for the variance of the bin heights and why.
12. In your own words describe the Central Limit theorem and one experimental situation where
the experimental distribution of noise is not described by this theorem.
13. Given the mean and standard deviation from the parent distribution in question 3, would it be correct if you calculated the 50% confidence limits for a single measurement using the equation μ = x ± 0.67σ? Explain why or why not.
14. Draw a distribution that has a positive skewness.
15. Draw a distribution that is leptokurtic.
16. If the probability of getting an A in this course is 38%, what is the probability that 15 people
will receive an A if the class has 30 students enrolled?
17. (a) Explain the conditions under which the Poisson distribution is a good approximation for
the binomial distribution. (b) Explain the conditions under which the Gaussian distribution
is a good approximation for the binomial distribution.
18. If you had to guess which parent distribution governs the errors in your experiment before
you made a single measurement, which distribution would you guess and why?
19. When using the chi-squared test to compare distributions A and B below, how many degrees of freedom would you use to calculate χ² and why?

[Figure: two histograms, (A) and (B), of probability density versus measured value over the range 30.8 to 31.4.]
20. Given the following measurements 2.12, 2.20, 2.12, 2.39, and 2.25, use the Q-test and deter-
mine which, if any, of these numbers can be rejected at the 90% confidence level.
21. Seven measurements of the pH of a buffer solution gave the following results:
5.12, 5.20, 5.15, 5.17, 5.16, 5.19, 5.15
Calculate (i) the 95% and (ii) 99% confidence limits for the true pH. (Assume that there are
no systematic errors.)
22. The solubility product of barium sulfate is 1.3 × 10⁻¹⁰, with a standard deviation of 0.1 × 10⁻¹⁰. Calculate the standard deviation of the calculated solubility of barium sulfate in water.
23. A 0.1 M solution of acid was used to titrate 10 mL of 0.1 M solution of alkali and the following
volumes of acid were recorded:
9.88, 10.18, 10.23, 10.39, 10.25 mL
Calculate the 95% confidence limits of the mean and use them to decide if there is any evidence
of systematic error.
24. Seven measurements of the concentration of thiol in a blood lysate gave:
1.84, 1.92, 1.94, 1.92, 1.85, 1.91, 2.07
Verify that 2.07 is not an outlier.
25. The following data give the recovery of bromide from spiked samples of vegetable matter, measured by using a gas-liquid chromatographic method. The same amount of bromide was added to each specimen.
Tomato: 777 790 759 790 770 758 764 µg/g
Cucumber: 782 773 778 765 789 797 782 µg/g
(a) Test whether the recoveries from the two vegetables have variances which differ signifi-
cantly.
(b) Test whether the mean recovery rates differ significantly.
26. Calculate the probability of rolling a one exactly 120 times in 720 tosses of a die.
27. Explain the circumstances under which the Student's t distribution is used, and its relationship to the Gaussian distribution.
28. (a) Give a general explanation of confidence limits. (b) How would you find the 99% confidence
limits for a measured parameter that had the distribution in question 3?
29. Give a real-life example of a bivariate data set expected to have a linear correlation coefficient r_xy = σ²_xy/(σ_x σ_y) near zero and an example o