
Page 1:

Statistics for Physicists

Lectures Thursdays at 9:00,

18 Sept. to 11 Dec. 2014

F. James (CERN) Statistics for Physicists, 1: Basic Concepts Sept-Dec 2014 1 / 52

Page 2:

Page 3:

Basic Concepts Introduction

A Statistician’s view of a physics experiment:

P(data|hyp) is always known.

It is the function (or algorithm) which describes the experiment. It gives the probability of observing the data when the laws of physics are given by the hypothesis hyp. It describes the forward process (hyp→data).

The forward process (hyp→data) occurs in the real experiment (hyp true but unknown).
The forward process (hyp→data) occurs in the simulation (hyp known but untrue).

Data are random: repeat the experiment, and the data will be different.

Hypothesis is NOT random: the mass of the Higgs is fixed, even if it is unknown.

Let’s look at three examples of a P(data|hyp).

Page 4:

Basic Concepts Introduction

Three examples of a P(data|hyp).

Ex 1. A counting experiment (proton decay):

P(data|hyp) = P(n|µ) = e^(−µ) µ^n / n!

where n is the observed number of decays (an integer), and µ is the expected number = decay rate × number of protons × time.
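This probability is easy to evaluate directly; a minimal Python sketch (the rate µ = 2.3 is an arbitrary illustrative value):

```python
import math

def poisson_prob(n, mu):
    """P(n | mu) = exp(-mu) * mu**n / n! for a counting experiment."""
    return math.exp(-mu) * mu**n / math.factorial(n)

# Probability of observing no decays when mu = 2.3 are expected:
p0 = poisson_prob(0, 2.3)   # equals exp(-2.3), about 0.10

# The probabilities sum to 1 over all possible n, as any probability must:
total = sum(poisson_prob(n, 2.3) for n in range(60))
```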

Ex 2. Measuring a particle mass from an invariant mass distribution:

f(X|µ) = N(µ, σ²) = (1 / (σ√(2π))) exp[ −(X − µ)² / (2σ²) ]

where X are the measured mass values, µ is the true value, and σ is the width of the Gaussian resolution. Now the data X are continuous, so f is a probability density function (pdf).

Ex 3. In a big experiment, P(data|hyp) is given by the Monte Carlo.

Page 5:

Basic Concepts Introduction

The backward process (data→hyp.) is called Statistics

There are two different ways of inverting the forward reasoning to do statistics:

The Bayesian way, and the Frequentist way.

For both ways, the methods will depend on the kind of hypothesis involved:

1. Measuring a parameter: Point Estimation

2. Finding the error on the above: Interval Estimation

3. Comparing two hypotheses: Hypothesis Testing

4. Testing one hypothesis: Goodness-of-fit Testing (Frequentist only)

5. Making Decisions: Decision Theory (Bayesian only)

Page 6:

Basic Concepts Introduction

How the course is organized:
The first half of the course will consist of about one day (i.e. one week) on each of these main topics:

1. Basic Concepts: Probability, random variables, distributions, convergence, law of large numbers, Central Limit Theorem, etc.

2. Point Estimation

3. Interval Estimation

4. Hypothesis Testing

5. Goodness-of-Fit Testing

6. Decision Theory

The second half will be devoted to applications of the above theory, including Monte Carlo methods and pseudorandom number generation.

There is also a book: Statistical Methods in Experimental Physics (second edition, 2006), by F. James.

This is the second edition of Statistical Methods in Experimental Physics (1971), by Eadie, Drijard, James, Roos and Sadoulet.

Page 7:

Basic Concepts Introduction

The slides for the course will be uploaded to the site:
physique.cuso.ch/notes-de-cours/cours-communs

file name            subject                       lecture date
james.chap1.pdf      Basic Concepts                Sept. 18
james.chap2.pdf      Point Estimation              Sept. 25
james.chap3.1.pdf    Interval Estimation, Part 1   Oct. 2
james.chap4.pdf      Tests of Hypotheses           Oct. 9
james.chap3.2.pdf    Interval Estimation, Part 2   .
james.chap5.pdf      Goodness-of-fit               .
james.chap6.pdf      Decision Theory               .
james.chap7.pdf      .                             .

Page 8:

Probability Definitions

Probability

All statistical methods are based on calculations of probability. We will define three different kinds of probability:

▶ Mathematical probability is an abstract concept which obeys the Kolmogorov axioms, and is defined by those axioms alone. It cannot be measured.

We will need a specific operational definition that allows us to measure probability. There are two such definitions we will use:

▶ Frequentist probability is defined as the limiting frequency of favourable outcomes in a large number of identical experiments.

▶ Bayesian probability is defined as the degree of belief in a favourable outcome of a single experiment.

Page 9:

Probability Definitions

Mathematical Probability

The Theory of Probability was properly formalized as a branch of mathematics only in 1930, by Kolmogorov.

Let the set Xi ∈ Ω be exclusive events (one and only one Xi can occur).

Then P(Xi) is a probability if it satisfies the Kolmogorov axioms:

▶ (a) P(Xi) ≥ 0 for all i.

▶ (b) P(Xi or Xj) = P(Xi) + P(Xj).

▶ (c) Σ_Ω P(Xi) = 1.

Page 10:

Probability Definitions

Mathematical Probability 2

Some basic properties that follow directly from the axioms:

▶ If A is certain, then P(A) = 1.

▶ If A cannot happen, then P(A) = 0.

However, surprisingly:

▶ If P(A) = 1, then A is not necessarily certain. It may be almost certain.

Example: Let R be a real number drawn randomly between zero and one. Then P(R ≠ 1/2) = 1, but it is only almost certain, since R could be 1/2, with probability zero!

But mathematical probability is an abstract concept. It cannot be measured. We need a probability with an operational definition. There are two such definitions of probability: Frequentist and Bayesian.

Page 11:

Probability Definitions

Frequentist Probability

First defined by John Venn in 1866.

The frequentist probability of an event A is defined as the number of times A occurs, divided by the total number of trials N, in the limit of a large number of identical trials:

P(A) = lim_{N→∞} N(A)/N

where A occurs N(A) times in N trials.
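The limiting frequency can be illustrated with a short simulation (an illustrative sketch; the event probability 0.3 is arbitrary):

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

def frequency(p_true, n_trials):
    """Estimate P(A) as N(A)/N for a simulated event with true probability p_true."""
    n_a = sum(random.random() < p_true for _ in range(n_trials))
    return n_a / n_trials

# N(A)/N approaches p_true as N grows, illustrating the limit in the definition:
for n in (100, 10_000, 100_000):
    print(n, frequency(0.3, n))
```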

Frequentist probability is used in most scientific work, because it is objective. It can (in principle) be determined to any desired accuracy and is the same for all observers.

It is the probability of Quantum Mechanics.

Page 12:

Probability Definitions

Frequentist Probability 2

Objection to the definition of Frequentist probability: It requires an infinite number of experiments.

Objection overruled: Many scientific concepts are defined as limits.

For example the electric field:

E⃗ = lim_{q→0} F⃗/q

where F⃗ is the force due to the field on the charge q.

Since the charge disturbs the field, it has to be infinitesimally small. In this case, the limit is not even possible because charge is quantized, but still the definition is perfectly valid.

Page 13:

Probability Definitions

Frequentist Probability 3

However, Frequentist Probability has an important limitation:

It can only be applied to repeatable phenomena.

Objection: There are no repeatable phenomena, since no experiment can be repeated exactly; for example, the age of the universe will be greater the second time.
Objection overruled: We will not apply it to phenomena that depend critically on the age of the universe.

Conclusion: Most scientific work involves repeatable phenomena, and frequentist probability is well defined for this work. But we need in addition a more general kind of probability if we want to apply it to non-repeatable phenomena, for example decisions.

Page 14:

Probability Definitions

Bayesian Probability

For phenomena that are not repeatable, we need a more general definition. (For example, in order to make a decision, we may want to know the probability that it will rain tomorrow.)

Bayesian probability is not easy to define. All the definitions I have seen are based on the observer’s degree of belief that A will happen. This probability is therefore subjective, as it depends on the observer.

The Bayesian probability of A is therefore not only a property of A, but depends also on the state of knowledge and beliefs of the observer, and it will in general change with time as the observer gains more knowledge.

We cannot verify if the Bayesian probability is “correct” by counting frequencies.

Page 15:

Probability Definitions

de Finetti’s definition of Probability

Bruno de Finetti (1906–1985) was the foremost Bayesian theoretician, and is responsible for the logical and mathematical foundations of Bayesian statistics (which he called simply Probability Theory), including the definition of Bayesian probability.

The operational definition of Bayesian probability is based on the coherent bet of de Finetti (around 1930).

The probability of A is defined as the amount the observer would bet on A, if he receives one unit when A happens, and no units if A doesn’t happen.

In order to make the bet coherent, the observer must agree to bet either for or against A happening, at the same odds.

De Finetti’s probability satisfies the axioms of Kolmogorov. But there are problems, such as whether the value of money is linear.

Page 16:

Probability Properties

Some Properties of (any) Probability

P(A or B) means A or B or both.
P(A and B) means both A and B.

From the Venn diagram, we see that:

P(A or B) = P(A) + P(B)− P(A and B)

Conditional probability: P(A|B) means the probability that A is true, given that B is true.

If A and B are independent, then P(A|B) = P(A).

Example of conditional probability: HB is a human being.
A: HB is pregnant.
B: HB is a woman.

Then: P(A|B) ≈ 1%, and P(B|A) = 1.

Page 17:

Probability Properties

Bayes’ Theorem

Rev. Thomas Bayes; his theorem was published posthumously in 1763.

Bayes’ Theorem says that the probability of both A and B being true simultaneously can be written:

P(A and B) = P(A|B)P(B) = P(B|A)P(A)

which implies:

P(B|A) = P(A|B) P(B) / P(A)

which can be written:

P(B|A) = P(A|B) P(B) / [ P(A|B) P(B) + P(A|not B) P(not B) ]

Page 18:

Probability Properties

An Application of Bayes’ Theorem

Suppose we have a particle ID detector designed to identify K particles, with the property that if a K hits the detector, the probability that it will produce a positive pulse (T+) is 0.9:

P(T+ |K) = 0.9 [90% acceptance]

and 1% if some other particle goes through:

P(T+ |not K) = 0.01 [1% background]

Now a particle gives a positive pulse. What is the probability that it is a K? The answer by Bayes’ Theorem:

P(K|T+) = P(T+|K) P(K) / [ P(T+|K) P(K) + P(T+|not K) P(not K) ]

Page 19:

Probability Properties

Bayes Prior Probability

So the answer depends on the Prior Probability of the particle being a K, that is, the proportion of K in the beam. Let us consider two possibilities: P(K) = 1%, and P(K) = 10^−6.

We would get the following probabilities:

K in beam     K = 1%    K = 10^−6

P(K|T+)       0.48      10^−4

P(K|T−)       0.001     10^−7
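The P(K|T+) entries can be reproduced directly from Bayes’ Theorem; a minimal Python sketch using the acceptance and background from the previous slide:

```python
def p_k_given_t(p_t_k=0.9, p_t_notk=0.01, p_k=0.01):
    """P(K | T+) from Bayes' Theorem, for the kaon-ID detector described above."""
    num = p_t_k * p_k
    return num / (num + p_t_notk * (1.0 - p_k))

print(p_k_given_t(p_k=1e-2))   # about 0.48 for a 1% kaon beam
print(p_k_given_t(p_k=1e-6))   # about 1e-4: mostly background, despite the positive pulse
```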

We have learned that:
• Prior Probability is very important.
• Bayes’ Theorem is useful in non-Bayesian analysis.
• This detector is not very useful if P(K) is small.

Page 20:

Other Definitions

Other Fundamental Concepts

The Hypothesis is what we want to test, verify, measure.

Examples:
H: The standard model is correct.
H: The mass of the proton is mp (continuous range of hypotheses).
H: The tau neutrino is massless.

A Random Variable is a variable which will take on different values if the experiment is repeated, unpredictable except in probability:

P(data|hypothesis)

is assumed known, provided any unknowns in the hypothesis are given some assumed values. Example: for a Poisson process

P(N|µ) = e^(−µ) µ^N / N!

where the possible values of the data N are discrete, and µ is the parameter of interest (the hypothesis).

Page 21:

Other Definitions

Other Fundamental Concepts: pdf

When the data are continuous, the probability P becomes a Probability Density Function, or pdf, as in:

dP(x | µ, σ) = (1/√(2πσ²)) e^(−(x−µ)²/(2σ²)) dx ,

which we write: P(x | µ, σ) = (1/√(2πσ²)) e^(−(x−µ)²/(2σ²)) ,

where µ is the true value of the quantity being measured, x is the measured value, and σ is a parameter, the width of the Gaussian.

In the above example, µ is the parameter of interest (what we are trying to measure), and σ, if it is not known, is a Nuisance parameter: an unknown whose value does not interest us, but is unfortunately necessary for the calculation of P(data|hypothesis).

Page 22:

Other Definitions

The Likelihood Function

If in P(data|hypothesis), we put in the values of the data observed in the experiment, and consider the resulting function as a function of the unknown parameter(s) in the hypothesis, it becomes

P(data|hypothesis)|data obs. = L(hypothesis)

L is called the Likelihood Function.

R. A. Fisher, the first person to use it, knew that it was not a probability, so he called it a likelihood. It will turn out to have some important properties.
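As a sketch (the three data values below are invented for illustration), here is the likelihood of µ for Gaussian-distributed measurements with σ = 1:

```python
import math

def gaussian_pdf(x, mu, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

data = [4.8, 5.1, 5.3]   # hypothetical observed values (illustration only)

def likelihood(mu):
    """L(mu): P(data|mu) evaluated at the observed data, viewed as a function of mu."""
    prod = 1.0
    for x in data:
        prod *= gaussian_pdf(x, mu)
    return prod

# Scanning a grid shows that L(mu) peaks near the sample mean, as expected for a Gaussian:
best = max((likelihood(mu), mu) for mu in [m / 100 for m in range(400, 600)])[1]
```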

Page 23:

Other Definitions

How Likelihoods and pdfs Transform

Suppose we wish to transform variables, either the data X → Y(X) or the parameters θ → τ(θ).

▶ For a likelihood function, the function values remain invariant, and one simply substitutes the transformed parameter values: L(τ(θ)) = L(θ).

▶ However, for a pdf, the invariant is the integrated probability between corresponding points, so one must in addition multiply by the Jacobian of the transformation X → Y(X):

pdf(Y) = J(X, Y) pdf(X)

where the Jacobian J is just ∂X/∂Y in one dimension, and is the determinant of the matrix of partial derivatives of the Xi with respect to the Yj in many dimensions.

So the peaks and valleys in a likelihood are invariant, whereas the shape of a pdf can be transformed into anything.
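A numerical sketch of the Jacobian rule (illustrative): take X uniform on (0,1) and transform to Y = X², whose pdf must then be f_Y(y) = 1/(2√y):

```python
import random

random.seed(2)

# X uniform on (0,1); Y = X**2. Since X = sqrt(Y), the Jacobian is dX/dY = 1/(2*sqrt(Y)),
# so the transformed pdf is f_Y(y) = f_X(sqrt(y)) * 1/(2*sqrt(y)) = 1/(2*sqrt(y)).
samples = [random.random() ** 2 for _ in range(200_000)]

def empirical_density(lo, hi):
    """Fraction of samples in (lo, hi), divided by the bin width."""
    return sum(lo < y < hi for y in samples) / (len(samples) * (hi - lo))

# Near y = 0.25 the predicted density is 1/(2*sqrt(0.25)) = 1.0:
print(empirical_density(0.24, 0.26))
```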

Page 24:

Probability Distributions

Probability Distributions: Expectation

Let X represent continuous data, and Xi discrete data; in both cases the entire space of the data is Ω. Let the X be distributed with pdf f(X) if continuous, and with probability f(Xi) if discrete. Then we define the expectation of a function g(X) as:

E(g) = ∫_Ω g(X) f(X) dX     (X continuous)
E(g) = Σ_Ω g(Xi) f(Xi)      (X discrete)

The expectation E is a linear operator:
E[a g(X) + b h(X)] = a E[g(X)] + b E[h(X)].

The expectation of X is called the mean µ: µ = ∫_Ω X f(X) dX.

Remember that f is always normalized: ∫_Ω f(X) dX = Σ_Ω f(Xi) = 1.

Page 25:

Probability Distributions

Probability Distributions: Variance

The expectation of the function (X − µ)² is called the variance V(X) of the density f(X):

V(X) = σ² = E[(X − µ)²]
          = E[X² − 2µX + µ²]
          = E(X²) − µ²
          = ∫ (X − µ)² f(X) dX.

The square root of the variance is the standard deviation σ.

The expectation and the variance do not always exist (Cauchy, Landau distributions).
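The Cauchy case can be seen in a short simulation (an illustrative sketch): the sample variance never settles down, because V(X) does not exist:

```python
import math
import random

random.seed(3)

def cauchy_sample():
    """A standard Cauchy variate: tan(pi*(U - 1/2)) with U uniform on (0, 1)."""
    return math.tan(math.pi * (random.random() - 0.5))

# The sample variance keeps jumping around as N grows, driven by rare huge values:
for n in (1_000, 100_000):
    xs = [cauchy_sample() for _ in range(n)]
    m = sum(xs) / n
    print(n, sum((x - m) ** 2 for x in xs) / n)
```

The median, by contrast, is perfectly well behaved for a Cauchy distribution.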

Page 26:

Probability Distributions

Probability Distributions: Variance of a Sum

From the definition, the variance of a sum of random variables is

V(αX + Y) = α²V(X) + V(Y) + 2α cov(X, Y)

where

cov(X, Y) = E[(X − µX)(Y − µY)] = E(XY) − E(X)E(Y)

and we define the correlation in terms of the covariance:

corr(X, Y) = ρ(X, Y) = cov(X, Y) / (σX σY).

The correlation coefficient ρ satisfies −1 ≤ ρ ≤ 1 .

If X and Y are independent, they are also uncorrelated and ρ = 0.

Page 27:

Probability Distributions

Probability Distributions: Variance of a RatioSuppose X and Y are independently distributed, with density functionsf (X ) and g(Y ), respectively. Let E (X ) = µX and V (X ) = σ2

X , andsimilarly for Y . We wish to consider the distribution of the randomvariable U = X/Y .

Approximate variance formula:

V(XY

)'(µXµY

)2[σ2X

µ2X

+σ2Y

µ2Y

− 2ρXYµX

σXσYµY

],

The above “rule of thumb” is well known, but may be very wrong.In particular, if f and g are Gaussian distributions, V (X/Y ) is infinite.

Unfortunately, Silverman et al. didn’t read our book. The above result was in our first edition and is also in the 2nd (pp. 30–31), but these authors calculated V(X/Y) and got the wrong answer (that it was finite). Their MC calculation agreed because it was also wrong.
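The point can be explored numerically; an illustrative Python sketch that computes the rule-of-thumb value, then runs a Monte Carlo with the Gaussian denominator truncated away from zero (without the truncation the sample variance would not converge):

```python
import random

random.seed(4)

def approx_ratio_variance(mu_x, sig_x, mu_y, sig_y, rho=0.0):
    """The rule-of-thumb V(X/Y); the exact variance is infinite for a Gaussian Y."""
    return (mu_x / mu_y) ** 2 * (
        sig_x ** 2 / mu_x ** 2
        + sig_y ** 2 / mu_y ** 2
        - 2 * rho * sig_x * sig_y / (mu_x * mu_y)
    )

print(approx_ratio_variance(10, 1, 5, 1))   # 0.2 for N(10,1)/N(5,1)

# Monte Carlo for N(10,1)/N(5,1), truncating the denominator at y > 1:
ratios = []
while len(ratios) < 200_000:
    x = random.gauss(10.0, 1.0)
    y = random.gauss(5.0, 1.0)
    if y > 1.0:
        ratios.append(x / y)
m = sum(ratios) / len(ratios)
v = sum((r - m) ** 2 for r in ratios) / len(ratios)  # same order as the rule of thumb
```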

See Cousins and James, Am. J. Phys. 74, 159 (Feb. 2006).

Page 28:

NOTES AND DISCUSSIONS

Comment on “The distribution of composite measurements: How to be certain of the uncertainties in what we measure,” by M. P. Silverman, W. Strange, and T. C. Lipscombe [Am. J. Phys. 72 (8), 1068–1081 (2004)]

Robert D. Cousins^a

Department of Physics and Astronomy, University of California, Los Angeles, California 90095

Frederick James^b

CERN, CH-1211 Geneva, Switzerland

(Received 10 July 2005; accepted 1 November 2005)
DOI: 10.1119/1.2142798

Silverman et al.¹ recently reviewed how to calculate the exact probability density function (pdf) of a function of several variables for which the individual pdf’s are known. For the ratio Z of two Gaussian (normal) variables X and Y, Ref. 1 gives the full expression for the pdf of Z, denoted by p_Z(z), in Eq. (25b). This result is the same as that in Eadie et al.² except for a missing factor of 1/2 in the last exponent of Eq. (25b), not relevant to this comment. However, the values of the second moment of Z in Table 1 of Ref. 1 are finite, whereas, as Ref. 2 discusses, the second moment of Z is infinite.

The divergence of the second moment of Z is evident from observing that the expressions for p_Z(z) in Eq. (25) go as 1/z² for large z. Hence the integrand in the calculation of the second moment approaches a constant, and the integral diverges. For the example in Ref. 1 in which the pdf’s for X and Y are centered at zero, this divergence is expected because p_Z(z) is a Cauchy density, which is well known to have undefined mean and infinite variance. But it is perhaps less expected for the example in Ref. 1 in which Z = N(10,1)/N(5,1), that is, in which the denominator is sampled from a Gaussian density in which the mean is five standard deviations away from zero. Reference 1 shows excellent agreement with the predictions of a finite value for the second moment of Z provided by several numerical methods, including a Monte Carlo simulation with 50 000 samples. Insight into how this apparent agreement can occur for a second moment that in fact diverges can be obtained from Fig. 4 in Ref. 1, in which the graphs of p_Z(z) and the computer-simulated histogram are in good agreement over the range shown. When the second moment is calculated, all of the numerical methods in Ref. 1 miss the fact that although the integrated probability in the distant tails of p_Z(z) is very small, these tails make the second moment diverge. The example Z = N(10,1)/N(5,1) would have required many more samples and a good Gaussian random number generator in order for the Monte Carlo result to exhibit a divergent trend, but if one chooses a smaller mean in the denominator, say N(3,1), the growth in the variance is already evident with 50 000 samples.

This example serves as a reminder of the pitfalls that can be encountered when evaluating improper integrals numerically. For example, in calculating the mean of the Cauchy density, one can obtain zero (which is the Cauchy principal value) even though the integral is undefined, because the contributions from the positive and negative values of the integrand each diverge. The divergence of the variance can be missed if every last bit of the pdf is not included, as in Ref. 1.

As noted in Ref. 1, care must be taken in interpreting the interval formed from ±1 standard deviations in Z; for example, the rule that 68% of the sampled values of Z are within one standard deviation of the true mean is valid for a Gaussian p_Z(z) but not in general. Indeed, for conveying a sense of the width of p_Z(z), the incorrect finite values in Table 1 are in some contexts more useful than the interval (−∞, +∞) based on the correct standard deviation. Still, using the finite interval as even an approximate confidence interval, with a specified confidence level, requires great care. In some applications, eliminating the divergence in the ratio by truncating the pdf of the denominator Y sufficiently far away from zero can be a physically reasonable procedure, and more reasonable than assuming that the pdf for Y is exactly Gaussian out to many standard deviations. Thus, with care, the values in Table 1 can be interpreted as the correct answers to a differently posed problem, which could be a good approximation in a real physical situation.

^a Electronic mail: [email protected]
^b Electronic mail: [email protected]
¹ M. P. Silverman, W. Strange, and T. C. Lipscombe, “The distribution of composite measurements: How to be certain of the uncertainties in what we measure,” Am. J. Phys. 72, 1068–1081 (2004).

² W. T. Eadie, D. Drijard, F. E. James, M. Roos, and B. Sadoulet, Statistical Methods in Experimental Physics (North-Holland, Amsterdam, 1971), pp. 25–26.

Am. J. Phys. 74 (2), 159 (February 2006). http://aapt.org/ajp © 2006 American Association of Physics Teachers

Page 29:

Probability Distributions

Probability Distributions: Higher Moments

The mean and variance are just the first two moments of a probability distribution. More generally, the nth algebraic moment of f(X) is the expectation of X^n given the density f(X). Other moments are then defined as follows:

µ′_n = E(X^n)             is the nth algebraic moment,
µ_n  = E[X − E(X)]^n      is the nth central moment,
ν′_n = E(|X|^n)           is the nth absolute moment,
ν_n  = E[|X − E(X)|^n]    is the nth absolute central moment.

In particular, the mean µ is the first algebraic moment, and the variance is the second central moment of f(X). The third and fourth moments, relative to the Normal distribution, are called the skewness and kurtosis.
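A numerical sketch (illustrative): the exponential density e^(−x), which has mean 1 and variance 1, has skewness 2, computable directly from its central moments:

```python
import math

def central_moment(n, pdf, mean, lo, hi, steps=100_000):
    """nth central moment: integral of (x - mean)**n * pdf(x) dx (midpoint rule)."""
    dx = (hi - lo) / steps
    return sum((lo + (i + 0.5) * dx - mean) ** n * pdf(lo + (i + 0.5) * dx)
               for i in range(steps)) * dx

expo = lambda x: math.exp(-x)   # exponential pdf on (0, inf): mean 1, variance 1

mu2 = central_moment(2, expo, 1.0, 0.0, 40.0)   # variance: 1
mu3 = central_moment(3, expo, 1.0, 0.0, 40.0)   # third central moment: 2
skewness = mu3 / mu2 ** 1.5                     # dimensionless skewness: 2
```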

Page 30:

Probability Distributions

Probability Distributions – Characteristic Function

The Fourier transform of the probability density function is called the characteristic function of the random variable. It has many useful and important properties.

Given a random variable X with density f(X), the characteristic function is defined as:

φ_X(t) = E(e^{itX})                      (t real)
       = ∫_{−∞}^{∞} e^{itX} f(X) dX      (X continuous)
       = Σ_k p_k e^{itX_k}               (X discrete).

The characteristic function φ_X(t) determines completely the probability distribution of the random variable. In particular,

f(X) = (1/2π) ∫_{−∞}^{∞} φ_X(t) e^{−iXt} dt.
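The definition can be checked numerically; an illustrative sketch for the standard normal, whose characteristic function is exp(−t²/2):

```python
import cmath
import math

def char_fn(t, pdf, lo=-10.0, hi=10.0, steps=100_000):
    """phi_X(t) = E(exp(i t X)), by direct numerical integration of the pdf."""
    dx = (hi - lo) / steps
    return sum(cmath.exp(1j * t * (lo + (k + 0.5) * dx)) * pdf(lo + (k + 0.5) * dx)
               for k in range(steps)) * dx

normal = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

phi1 = char_fn(1.0, normal)   # exp(-1/2), about 0.607 (real, since the pdf is symmetric)
phi0 = char_fn(0.0, normal)   # 1, the normalization of the pdf
```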

Page 31:

Probability Distributions

Probability Distributions – Characteristic Function

The function φ(t) has the properties

φ(0) = 1 ;  |φ(t)| ≤ 1

and it exists for all t.

If a and b are constants, the characteristic function for aX + b is

φ_{aX+b}(t) = e^{ibt} φ_X(at).

If X and Y are independent random variables, with characteristic functions φ_X(t), φ_Y(t), then the characteristic function of the sum X + Y is

φ_{X+Y}(t) = φ_X(t) · φ_Y(t).

Page 32:

Probability Distributions Discrete

Discrete Distributions: Binomial

Consider the case where there are two possible outcomes (which we may call success and failure) for each of N trials.

The binomial distribution gives the probability of finding exactly r successes in N trials, when the probability of success in each single trial is a constant, p.

The distribution of the number of events in a single bin of a histogram is binomial (if the bin contents are independent, but the total number of events N is fixed).

Page 33:

Probability Distributions Discrete

Discrete Distributions: Binomial

Variable: r, positive integer ≤ N.
Parameters: N, positive integer;
            p, 0 ≤ p ≤ 1.

Probability function: P(r) = (N choose r) p^r (1 − p)^(N−r),  r = 0, 1, . . . , N.

Expected value: E(r) = Np.
Variance: V(r) = Np(1 − p).
Probability generating function: G(Z) = [pZ + (1 − p)]^N.
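These formulas are easy to verify numerically; a minimal sketch (N = 20 and p = 0.3 are chosen arbitrarily):

```python
import math

def binom_pmf(r, n, p):
    """P(r) = C(n, r) * p**r * (1 - p)**(n - r)."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)

n, p = 20, 0.3
mean = sum(r * binom_pmf(r, n, p) for r in range(n + 1))               # N p = 6.0
var = sum((r - mean) ** 2 * binom_pmf(r, n, p) for r in range(n + 1))  # N p (1-p) = 4.2
```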

Page 34:

Probability Distributions Discrete

Discrete Distributions: Multinomial

The generalization of the binomial distribution to the case of more than two possible outcomes of an experiment is called the multinomial distribution.

It gives the probability of exactly ri outcomes of type i in N independent trials, where the probability of outcome i in a single trial is pi, i = 1, 2, . . . , k.

Note that, as with the binomial distribution, the total number of trials, N = Σ_{i=1}^{k} ri, is fixed.


Probability Distributions Discrete

Discrete Distributions: Multinomial

Variables r_i, i = 1, 2, . . . , k, non-negative integers ≤ N.
Parameters N, positive integer; k, positive integer;
p_1 ≥ 0, p_2 ≥ 0, . . . , p_k ≥ 0, with Σ_{i=1}^{k} p_i = 1.

Probability function:

P(r_1, r_2, . . . , r_k) = [N! / (r_1! r_2! · · · r_k!)] p_1^{r_1} p_2^{r_2} · · · p_k^{r_k} .

Expected values: E(r_i) = N p_i .
Variances: V(r_i) = N p_i (1 − p_i) .
Covariances: cov(r_i, r_j) = −N p_i p_j , i ≠ j.
Probability generating function: G(Z_2, . . . , Z_k) = (p_1 + p_2 Z_2 + · · · + p_k Z_k)^N .

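A short check of these formulas (a sketch, standard library only; k = 3 and the p_i below are arbitrary). Summing the joint probabilities over the other indices recovers the binomial marginal for r_1:

```python
from math import factorial, comb

def multinomial_pmf(r, p):
    """P(r1,...,rk) = N!/(r1!...rk!) * p1^r1 * ... * pk^rk, with N = sum(r)."""
    coef = factorial(sum(r))
    for ri in r:
        coef //= factorial(ri)
    prob = float(coef)
    for ri, pi in zip(r, p):
        prob *= pi**ri
    return prob

N, p = 5, (0.2, 0.3, 0.5)
total, marginal_r1 = 0.0, [0.0] * (N + 1)
for r1 in range(N + 1):
    for r2 in range(N + 1 - r1):
        r3 = N - r1 - r2
        q = multinomial_pmf((r1, r2, r3), p)
        total += q            # joint probabilities sum to 1
        marginal_r1[r1] += q  # marginal of r1 is binomial(N, p1)

binom_check = comb(N, 2) * 0.2**2 * 0.8**3   # binomial P(r1 = 2)
```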

Probability Distributions Discrete

Discrete Distributions: Poisson

The Poisson distribution gives the probability of finding exactly r events in a given length of time, if the events occur independently at a constant rate.

It is a limiting case of the binomial distribution for p → 0 and N → ∞, with Np = μ, a finite constant.

As μ → ∞, the Poisson distribution converges to the Normal distribution.

If events occur randomly and independently in time, so that the number of events occurring in a fixed time interval is Poisson-distributed, then the time between two successive events is exponentially distributed.


Probability Distributions Discrete

Discrete Distributions: Poisson

Variable r, non-negative integer.
Parameter μ, positive real number.

Probability function: P(r) = μ^r e^{−μ} / r! .
Expected value: E(r) = μ .
Variance: V(r) = μ .
Skewness: γ_1 = 1/√μ .
Kurtosis: γ_2 = 1/μ .
Probability generating function: G(Z) = e^{μ(Z−1)} .

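The binomial limit mentioned two slides back is easy to see numerically (a sketch; μ = 3, r = 2 and the values of N are arbitrary):

```python
from math import comb, exp, factorial

def poisson_pmf(r, mu):
    """P(r) = mu^r * exp(-mu) / r!"""
    return mu**r * exp(-mu) / factorial(r)

mu, r = 3.0, 2
exact = poisson_pmf(r, mu)

# binomial with p = mu/N approaches the Poisson pmf as N grows
errors = []
for N in (10, 100, 10000):
    p = mu / N
    approx = comb(N, r) * p**r * (1 - p)**(N - r)
    errors.append(abs(approx - exact))
```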

Probability Distributions Continuous

Continuous Distributions: Normal (Gaussian)

The most important theoretical distribution in statistics is the Normal probability density function, or Gaussian, usually abbreviated N(μ, σ²). Its cumulative distribution is called the Normal probability integral or error function. One may find in the literature several variations of the definition of the error function.

Note that the half-width of the pdf at half-height is not σ, but 1.176σ.

The probability content of various intervals is given below:

P( −1.00 ≤ (X − μ)/σ ≤ 1.00 ) = 0.683
P( −1.64 ≤ (X − μ)/σ ≤ 1.64 ) = 0.90
P( −1.96 ≤ (X − μ)/σ ≤ 1.96 ) = 0.95

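These probability contents follow from the error function in Python's standard library: for a Normal variable, P(−z ≤ (X − μ)/σ ≤ z) = erf(z/√2). A minimal check of the quoted table:

```python
from math import erf, sqrt

def central_prob(z):
    """Probability that a Normal variable lies within z sigma of its mean."""
    return erf(z / sqrt(2.0))

# the quoted values are rounded to 2-3 significant figures
table = {1.00: 0.683, 1.64: 0.90, 1.96: 0.95}
for z, quoted in table.items():
    assert abs(central_prob(z) - quoted) < 0.002
```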

Probability Distributions Continuous

Continuous Distributions: Normal (Gaussian)

Parameters: μ, real; σ, positive real number.

Probability density function:

f(X) = N(μ, σ²) = 1/(σ√(2π)) · exp[ −(X − μ)²/(2σ²) ] .

Cumulative distribution:

F(X) = Φ((X − μ)/σ) , where Φ(Z) = 1/√(2π) ∫_{−∞}^{Z} e^{−x²/2} dx .

Expected value: E(X) = μ .
Variance: V(X) = σ² .
Characteristic function: φ(t) = exp[ itμ − t²σ²/2 ] .


Probability Distributions Continuous

Normal Distribution in many variables

Variables: X, k-dimensional real vector.
Parameters: μ, k-dimensional real vector; V, k × k matrix, positive semi-definite.

Probability density function:

f(X) = 1/((2π)^{k/2} |V|^{1/2}) · exp[ −(1/2)(X − μ)^T V^{−1} (X − μ) ] .

Expected values: E(X) = μ .
Covariances: cov(X) = V, that is, V(X_i) = V_ii and cov(X_i, X_j) = V_ij, the (i, j)th element of V.
Characteristic function: φ(t) = exp[ i t^T μ − (1/2) t^T V t ] .


Probability Distributions Continuous

Continuous Distributions: Chi-square

Suppose that X_1, . . . , X_N are independent, standard Normal variables, N(0, 1). Then the sum of squares

X²(N) = Σ_{i=1}^{N} X_i²

is said to have a chi-square distribution χ²(N), with N degrees of freedom.

The pdf of the chi-square distribution was first derived by Karl Pearson when he published his famous Chi-square Test (1900).

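The definition can be checked by simulation: squaring and summing N standard Normal deviates should give a sample with mean ≈ N and variance ≈ 2N (a sketch; N = 3, the seed, and the sample size are arbitrary):

```python
import random

random.seed(1)
N, draws = 3, 100000

# sum of squares of N independent N(0,1) variables ~ chi-square(N)
samples = [sum(random.gauss(0.0, 1.0)**2 for _ in range(N))
           for _ in range(draws)]

mean = sum(samples) / draws                         # expect about N
var  = sum((s - mean)**2 for s in samples) / draws  # expect about 2N
```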

Probability Distributions Continuous

Continuous Distributions: Chi-square

Variable X, positive real number.
Parameter N, positive integer ("degrees of freedom").

Probability density:

f(X) = (1/2) (X/2)^{(N/2)−1} e^{−X/2} / Γ(N/2) .

Expected value: E(X) = N .
Variance: V(X) = 2N .
Characteristic function: φ(t) = (1 − 2it)^{−N/2} .


Probability Distributions Continuous

Continuous Distributions: Chi-square

An important relationship exists between the cumulative Poisson distribution and the cumulative χ² distribution:

P(r ≤ N_0 | μ) = 1 − P[χ²(2N_0 + 2) < 2μ] ,
or P(r > N_0 | μ) = P[χ²(2N_0 + 2) < 2μ] ,
or P(r ≥ N_0 | μ) = P[χ²(2N_0) < 2μ] .

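The first of these identities can be verified numerically. The χ² cumulative distribution below is computed from the standard series for the regularized lower incomplete gamma function (a sketch, standard library only; μ = 4.2 and N_0 = 3 are arbitrary):

```python
from math import exp, factorial, gamma

def chi2_cdf(x, ndf):
    """P[chi2(ndf) < x], via the series for the lower incomplete gamma."""
    a, z = ndf / 2.0, x / 2.0
    term = z**a * exp(-z) / gamma(a + 1.0)
    total, n = term, 1
    while term > 1e-16 * total:
        term *= z / (a + n)   # next series term
        total += term
        n += 1
    return total

def poisson_cdf(n0, mu):
    """P(r <= n0 | mu) for a Poisson variable."""
    return sum(mu**r * exp(-mu) / factorial(r) for r in range(n0 + 1))

mu, n0 = 4.2, 3
lhs = poisson_cdf(n0, mu)
rhs = 1.0 - chi2_cdf(2.0 * mu, 2 * n0 + 2)   # agrees to rounding error
```

This identity is what allows Poisson confidence limits to be read off χ² tables.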

Probability Distributions Continuous

Continuous Distributions: Chi-square

[Figure: the chi-square probability density for N = 1, . . . , 6, plotted for 0 ≤ X ≤ 10.]


Probability Distributions Continuous

Continuous Distributions: Cauchy or Breit-Wigner

Probability density function:

f(X) = (1/π) · 1/(1 + X²) .

Expected value: E(X) is undefined. The variance, skewness and kurtosis are all divergent.
The characteristic function is φ(t) = e^{−|t|} .

The physically important Breit–Wigner distribution is a form of Cauchy, usually written as

f(X) = (1/π) · Γ / (Γ² + (X − X_0)²) .

The parameters X_0 and Γ are location and scale parameters, being respectively the mode and the half-width at half-height. Note, however, that the mean and moments of the distribution are still undefined.

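Since the standard Cauchy cumulative distribution is F(X) = 1/2 + arctan(X)/π, probability contents are easy to compute even though the moments do not exist, and they show how heavy the tails are (a minimal sketch):

```python
from math import atan, pi

def cauchy_cdf(x):
    """F(X) = 1/2 + arctan(X)/pi for the standard Cauchy density."""
    return 0.5 + atan(x) / pi

# exactly half the probability lies in (-1, 1) ...
central = cauchy_cdf(1.0) - cauchy_cdf(-1.0)

# ... while the tails fall off only like 1/X: about 6% beyond |X| = 10
tail = 2.0 * (1.0 - cauchy_cdf(10.0))
```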

Probability Distributions Continuous

Continuous Distributions: Uniform

Probability density function:

f(X) = 1/(b − a) , a ≤ X ≤ b ;
f(X) = 0 , otherwise.

Expected value: E(X) = (a + b)/2 .
Variance: V(X) = (b − a)²/12 .
Skewness: γ_1 = 0 .
Kurtosis: γ_2 = −1.2 .
Characteristic function: φ(t) = (e^{itb} − e^{ita}) / (it(b − a)) .


Probability Distributions Continuous

Continuous Distributions: Landau

The Landau density function:

φ(x) = (1/π) ∫_0^∞ y^{−y} sin(πy) exp(−xy) dy ,

where x is a linear function of the energy loss of a charged particle traversing a very thin layer of matter.

This density has an infinite tail going to zero so slowly (like the Cauchy) that the variance of x diverges and its expectation is undefined. Its properties are not easily expressible in closed form, but it is so important that many programs exist to handle it (notably in ROOT, in the CERN Program Library, and in GSL).


Probability Distributions Continuous

The Landau density

[Figure: the Landau density plotted as a function of X for −5 ≤ X ≤ 25.]


Convergence

Convergence

We say that the sequence t_n (n = 1, . . . , ∞) converges to T if:

The usual convergence: you give me an ε > 0, and I'll give you an N such that, for all n > N, |t_n − T| < ε.

Convergence in probability: you give me an ε > 0 and a p < 1, and I'll give you an N such that, for all n > N, P(|t_n − T| < ε) > p.

The types of convergence, ordered from stronger to weaker:
in quadratic mean / almost certain → in probability → in distribution.


Convergence

The Law of Large Numbers

Let X_1, . . . , X_N be a sequence of independent random variables, each having the same mean μ, and variances σ_i². Then the Law of Large Numbers says that the average converges to μ as N → ∞:

lim_{N→∞} X̄ = lim_{N→∞} (1/N) Σ_{i=1}^{N} X_i = μ .

If the σ_i are all the same, one can show that the above converges in probability. Depending on the behaviour of the σ_i for large i, one can prove stronger laws of large numbers, which differ from this weak law by having stronger types of convergence. These stronger laws will not be of interest to us.

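A quick simulation of the weak law (a sketch; the exponential distribution with mean μ = 2, the seed, and the sample sizes are all arbitrary choices of this example): the sample average tightens around μ as N grows.

```python
import random

random.seed(7)
MU = 2.0   # true mean of the chosen (exponential) distribution

def average(N):
    """Average of N independent exponential variables with mean MU."""
    return sum(random.expovariate(1.0 / MU) for _ in range(N)) / N

# the deviation from the true mean shrinks (in probability) as N grows
deviations = {N: abs(average(N) - MU) for N in (100, 10000, 1000000)}
```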

Convergence

The Central Limit Theorem

Given a sequence of independent random variables X_i, each from a distribution with mean μ_i and variance σ_i², the distribution of the sum S = Σ X_i has mean Σ μ_i and variance Σ σ_i². This holds for any distributions, provided that the X_i are independent and the individual means and variances exist and do not increase too rapidly with i.

Under the same conditions, the Central Limit Theorem states that, as N → ∞, the sum S is distributed such that

( S − Σ_{i=1}^{N} μ_i ) / √( Σ_{i=1}^{N} σ_i² ) → N(0, 1) ,

where N(0, 1) is the Normal or Gaussian distribution with mean zero and variance one.


Convergence

Example of Central Limit Theorem

Let r_i be a random variable uniformly distributed between 0 and 1.

Then the sum S = Σ_{i=1}^{n} (r_i − 0.5) × √12 is distributed as shown in the figure (not reproduced here).

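This can be tried directly. In the sketch below the sum is scaled by √(12/n) rather than √12, so that its variance stays at 1 while its shape approaches N(0, 1); already at n = 12 the 1σ probability content matches the Normal value 0.683 closely (the seed and sample sizes are arbitrary):

```python
import random
from math import sqrt

random.seed(42)

def standardized_sum(n):
    """Sum of n centred uniforms, scaled to mean 0 and variance 1."""
    return sum(random.random() - 0.5 for _ in range(n)) * sqrt(12.0 / n)

draws = 100000
s = [standardized_sum(12) for _ in range(draws)]

mean = sum(s) / draws
var  = sum((x - mean)**2 for x in s) / draws
within_1sigma = sum(abs(x) < 1.0 for x in s) / draws   # compare with 0.683
```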

Convergence

Other Limiting Distributions

All of the standard distributions are connected by limiting relations, with the Normal as the common limit:

Binomial → Poisson : p → 0, N → ∞ with Np = μ fixed.
Binomial → Normal : N → ∞.
Multinomial → Binomial : k = 2.
Multinomial → Normal : N → ∞.
Poisson → Normal : μ → ∞.
Chi-square → Normal : N → ∞.
Student's t → Normal : N → ∞.
F-distribution → Chi-square : ν_1 → ∞ or ν_2 → ∞.
