
PHYSICS MATHEMATICS AND ASTRONOMY DIVISION
CALIFORNIA INSTITUTE OF TECHNOLOGY

Freshman Physics Laboratory (PH003)

VadeMecum for Data Analysis Beginners
(September 25, 2003)

Copyright © Virgínio de Oliveira Sannibale, June 2001


Acknowledgments

I started this work with the aim of improving the course of Physics Laboratory for Caltech freshmen, the so-called ph3 course. Thanks to Donald Skelton, ph3 was already a very good course, well designed to satisfy the needs of new students eager to learn the basics of laboratory techniques and data analysis.

Because of the need to introduce new experiments and new topics in the data analysis notes, I decided to rewrite the didactical material, trying to keep intact the spirit of the course, i.e. emphasis on techniques and not on the details of the theory.

Anyway, I believe and hope that this attempt to reorganize old experiments and introduce new ones constitutes an improvement of the course.

I would like to thank, in particular, Eugene W. Cowan for the incommensurable help he gave me with critiques, suggestions, discussions, and corrections to the notes. His experience as a professor at Caltech for several years was really valuable in making the content of these notes suitable for students in the first year of the undergraduate course.

I would like to thank also all the teaching assistants that make this course work, for their patience and the valuable comments that I constantly received during the academic terms. Christophe Basset deserves a particular mention for his effort, which went beyond his assignments.

Sincerely,

Virgínio de Oliveira Sannibale


Contents

1 Basics on Error Analysis
  1.1 Random Variables and Measurements
  1.2 Uncertainties on Measurements
    1.2.1 Accuracy and Precision
  1.3 Statistics and Measurements
    1.3.1 Measurement and Probability Distribution
      1.3.1.1 Gaussian Distribution Parameter Estimation for a Single Variable
      1.3.1.2 Gaussian Distribution Parameter Estimation for the Average Variable
      1.3.1.3 Gaussian Distribution Parameter Estimation for the Weighted Average
    1.3.2 Example (Unweighted Average)
    1.3.3 Example (Weighted Average)

2 Probability Distributions
  2.1 Definitions
    2.1.1 Probability and Probability Density Function (PDF)
    2.1.2 Distribution Function (DF)
    2.1.3 Probability and Frequency
    2.1.4 Continuous Random Variable vs. Discrete Random Variable
    2.1.5 Expectation Value
    2.1.6 Intuitive Meaning of the Expectation Value
    2.1.7 Variance
    2.1.8 Intuitive Meaning of the Variance
    2.1.9 Standard Deviation
  2.2 Gaussian Distribution (NPDF)
    2.2.1 Standard Probability Density Function
    2.2.2 Gaussianity
  2.3 Uniform Distribution
    2.3.1 Random Variable Uniformly Distributed
      2.3.1.1 Example: Ruler Measurements
      2.3.1.2 Example: Analog to Digital Conversion
  2.4 Binomial/Bernoulli Distribution
  2.5 Poisson Distribution
    2.5.1 Example: Silver Activation Experiment

3 Propagation of Errors
  3.1 Propagation of Errors Law
  3.2 Statistical Propagation of Errors Law (SPEL)
    3.2.1 Example 1: Area of a Surface
    3.2.2 Example 2: Power Dissipated by a Circuit
    3.2.3 Example 4: Improper Use of the Formula
  3.3 Relative Uncertainties
    3.3.1 Example 1
  3.4 Measurement Comparison
    3.4.1 Example

4 Graphical Representation of Data
  4.1 Introduction
  4.2 Graphic Fit
    4.2.1 Linear Graphic Fit
    4.2.2 Theoretical Points Imposition
  4.3 Linear Plot and Linearization
    4.3.1 Example 1: Square Function
    4.3.2 Example 2: Power Function
    4.3.3 Example 3: Exponential Function
  4.4 Logarithmic Scales
    4.4.1 Linearization with Logarithmic Graph Sheets
  4.5 Difference Plots
    4.5.1 Difference Plot of Logarithmic Scales

5 Parameter Estimation
  5.1 The Maximum Likelihood Principle (MLP)
    5.1.1 Example: σ and µ of a Normally Distributed Random Variable
    5.1.2 Example: µ of a Set of Normally Distributed Random Variables
  5.2 The Least Square Principle (LSP)
    5.2.1 Geometrical Meaning of the LSP
    5.2.2 Example: Linear Function
    5.2.3 The Reduced χ² (Fit Goodness)
  5.3 The LSP with the Effective Variance
  5.4 Fit Example (Thermistor)
    5.4.1 Linear Fit
    5.4.2 Quadratic Fit
    5.4.3 Cubic Fit
  5.5 Fit Example (Offset Constant)

A Central Limit Theorem
B Statistical Propagation of Errors
C NPDF Random Variable Uncertainties
D The Effective Variance

Chapter 1

Basics on Error Analysis

1.1 Random Variables and Measurements

During an experiment, the result of a measurement of a physical quantity¹ x (a number or a set of numbers) is always somewhat indeterminate. In other words, if we repeat the measurement we can get a different result.

Apart from any philosophical point of view, the reasons for this indetermination can be explained by considering that we are able to control or measure just a few of the physical quantities involved in the experiment, and we don't completely know the dependency on each one of them. Moreover, all those variables can change with time, and it becomes impossible to measure their evolution.

Fortunately and quite often, this ignorance does not preclude measurement with the required "precision".

Let's consider as an example a physical system (see figure 1.1) made of a thermally isolated liquid and a heater, a paddle wheel turning at constant velocity. Let's assume that we want to measure the average liquid temperature vs. time using a mercury thermometer as the instrument. Let's try to list some of the potential perturbation mechanisms that can affect the measurement:

• During the measurement process the liquid temperature changes non-uniformly because the system is not perfectly isolated and loses heat.

¹ Any measurable quantity is a physical quantity.


[Figure: sketch of the apparatus, labeling the thermometer, isolator, liquid, paddle wheel, and the current driving the motor.]

Figure 1.1: Example of a physical system under measurement, i.e. a variant of Joule's experiment (1845). The isolated liquid is heated up by the paddle wheel movement driven by an electric motor. The mercury thermometer measures the temperature changes.

• During the measurement the liquid is not uniformly heated up because of the location of the paddle wheel.

• The measurement should be taken when the liquid and the instrument are in thermal equilibrium (same temperature), and this cannot really happen because the liquid is being heated up and because of the thermometer heat capacity. In other words, the thermometer is not fast enough to follow temperature changes.

• The instrument reading is affected by the parallax error, i.e. the position of the mercury column cannot be accurately read.

• The accuracy of the thermometer scale divisions is not perfect, i.e. the scale calibration is not perfect (the scale origin and/or the average division distance is not completely right, the divisions don't have the same distance, et cetera).

• Liquid currents produced by the paddle wheel movement are turbulent (chaotic), affecting the uniformity of the liquid temperature.

• The heat flow due to the paddle wheel movement is not completely constant and is affected by small unpredictable fluctuations. For example, measuring the current flowing through the electric motor we see small random fluctuations around a constant value.

Some of those perturbations are probably completely negligible (the instrument is unable to see them), some others can be estimated and minimized, and some others cannot².

[Figure: block diagram showing the ideal physical system, the ambient, and the instrument, with disturbances ξ₁, ξ₂, ... acting on each block, an excitation acting on the system, and the response x + ξ delivered by the instrument.]

Figure 1.2: Model of a physical system under measurement.

Figure 1.2 shows a quite general model of the experimental situation. The physical system, as an ideal system, follows a known theoretical model. The instrument interacting with the system allows us to measure the physical quantity x. External unpredictable disturbances ξ₁, ξ₂, ..., ξₙ, ... perturb the system and the instrument. The instrument itself perturbs the physical system when we perform the measurement.

These considerations can be summarized by a simple but widely used linear model for any physical quantity x. If we call x*(t) the value of a physical quantity with no disturbances at the time t, and ξ(t) its random fluctuations, the measured value of x at the time t will be

\[ x(t) = x^*(t) + \xi(t) . \]

² One can argue that we are trying to measure something with something worse than a kludge. In other words, if we want the average liquid temperature, we need for example a more sophisticated apparatus that allows us to map the liquid temperature very accurately. Anyway, what we can do is only minimize perturbations, never get rid of them.

Any physical quantity x is indeed a random variable or a stochastic variable.

1.2 Uncertainties on Measurements

We can distinguish types of uncertainties based on their nature, i.e. uncertainties that can in principle be eliminated, and uncertainties that cannot be eliminated.

Starting from this criterion, we can divide the sources of uncertainties, also called errors, into two categories:

• Random errors: an error which is not, or does not appear to be, directly connected to any cause (the cause and effect principle doesn't work), and is indeed not repeatable but random. Random errors cannot be completely eliminated.

• Systematic errors: any error which is not random. Quite often, this kind of error algebraically adds a constant unknown value to the measurement. This value can also change with time. A typical systematic error comes from a faulty calibration of the instrument used. This kind of error is the hardest one to minimize and is quite often difficult to detect. Sometimes, it can be found by repeating the measurement using different procedures and/or instruments. We can also have systematic errors due to the measurement procedure or definition. Let's consider as an example the measurement of the thickness of an elastic material. Lack in the procedure definition: measurement with a micrometer without defining the instrument's applied pressure, the temperature, the humidity, etc. Lack in the procedure execution: drift of physical quantities supposed to be stationary, such as pressure, temperature, etc. Systematic errors can in principle be completely removed.

1.2.1 Accuracy and Precision

It is worthwhile to make some new definitions about the properties of a measurement:

[Figure: sketch comparing two measurements x₁ and x₂ with their uncertainty strips relative to the unperturbed value x.]

Figure 1.3: Accuracy and precision comparison. The gray strips represent the uncertainty associated with the measurement xᵢ; x represents the value of the measured physical quantity with no perturbations. Measurement x₁ is more accurate but less precise than measurement x₂.

• Accuracy: a measurement is said to be accurate if it is not affected by systematic errors. This characteristic does not preclude the presence of small or large random errors.

• Precision: a measurement is said to be precise if it is not affected by random errors. This characteristic does not preclude the presence of any type of systematic error.

Real measurements approximate the properties of precision and accuracy. Figure 1.3 shows an example of a precise but not accurate measurement x₂ and an accurate but not precise measurement x₁.

In general, accuracy and precision are antithetic properties. A measurement with very large random errors can be extremely accurate if the systematic error is negligible with respect to the random error. An analogous statement can be formulated for extremely precise measurements.

1.3 Statistics and Measurements

1.3.1 Measurement and Probability Distribution

It is experimentally well known (or accepted) that a large number of physical quantities are random variables whose distribution follows with good precision the so-called Gaussian or Normal Distribution³.

³ In general, statistics is not able to provide a necessary and sufficient test to check if a random variable follows the Gaussian distribution (Gaussianity) [4].

[Figure: probability density (1/A.U.) versus a normally distributed physical quantity (A.U.), with the x axis marked at µ − 3σ, µ − 2σ, µ − σ, µ, µ + σ, µ + 2σ, µ + 3σ.]

Figure 1.4: Probability density of a physical quantity following the Gauss/Normal distribution. The x axis units are normalized to σ, i.e. the square root of the variance of the distribution. The dashed area represents a probability of 68.3%, which corresponds to the probability of having x in the symmetric interval (µ − σ, µ + σ).

The Gaussian distribution

\[ p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]

is a continuous function (see figure 1.4) with one peak, symmetric around a vertical axis crossing the value µ, and with tails exponentially decreasing to 0. It indeed has one absolute maximum at x = µ. The peak width is defined by the parameter σ.

The probability dP to measure a value in the interval (x, x + dx) is

\[ dP(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx , \]

and in general the probability to measure a value in the interval (a, b) is

\[ P(a < x < b) = \int_a^b \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx . \]

The "most probable value" lies indeed inside an interval centered around µ. The probability to have the true value in the interval (µ − σ, µ + σ) is 68.3% and is represented by the dashed area of figure 1.4. The statistical result of a measurement with a probability/confidence level of 68.3% is written as

x = (x₀ ± σ) Units, [68.3% confidence]

The half width of the interval, σ, is the uncertainty, or the experimental error, or simply the error that we associate with the measurement of x.

Quite often, the Gaussian distribution parameters are unknown, and the problem arises of experimentally estimating µ and σ. The theory of statistics comes into play to give us the tools to estimate the distribution parameters, which allow us to define the uncertainty of the measurement. The basic idea of statistics is to use a large number of measurements of the physical quantity (samples) to estimate the parameters of the distribution. In the next paragraphs we will just state some basic results with some intuitive and naive explanations. They will be explained in more detail in chapter 5.

1.3.1.1 Gaussian Distribution Parameter Estimation for a Single Variable

Let's suppose that we measure a normally distributed physical quantity x N times, obtaining the results x₁, x₂, ..., x_N. If no systematic errors are present, it is probable that the average x̄

\[ \bar{x} = \frac{x_1 + x_2 + \dots + x_N}{N} \]

becomes closer to the value µ when the number of measurements N increases⁴. We can assume indeed that the average x̄ is the so-called estimator µ̂ of µ. To distinguish between the parameter and its estimator we will use the symbol "^", i.e.

\[ \hat{\mu} = \bar{x}, \qquad \Rightarrow \qquad \hat{\mu} \simeq \mu . \]

Averaging the square of the distance of each single measurement xᵢ from µ̂, we will have

\[ \hat{\sigma}^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_N - \bar{x})^2}{N} , \]

which is the square of the average distance of the measured points from the estimated "most probable value" x̄.

It is reasonable to assume that this value is an estimator of the uncertainty of each single measurement xᵢ. A rigorous approach shows that this is an estimator of the σ of the distribution. It can be demonstrated that a better estimator is not the standard average, but the sum of the squares of the distances divided by N − 1 instead of N, i.e.

\[ \hat{\sigma}^2 = \frac{1}{N-1} \sum_{i=1}^{N} (\hat{\mu} - x_i)^2, \qquad \Rightarrow \qquad \hat{\sigma} \simeq \sigma . \]

1.3.1.2 Gaussian Distribution Parameter Estimation for the Average Variable

The average x̄ of a normally distributed variable x is a random variable itself, and it must follow the normal distribution. What is the σ_x̄ of the normal distribution associated with x̄? It can be proved that

\[ \sigma_{\bar{x}}^2 = \frac{\sigma^2}{N} \qquad \Rightarrow \qquad \hat{\sigma}_{\bar{x}} \simeq \sigma_{\bar{x}} . \]

⁴ It is also possible and probable to have a single measurement much closer to µ than the average, but this does not help to find an estimate of µ.

1.3.1.3 Gaussian Distribution Parameter Estimation for the Weighted Average

Let's suppose now that the N measurements of the same physical quantity x₁, x₂, ..., x_N are still normally distributed, but each measurement has known variance σ₁², σ₂², ..., σ_N².

Considering that the variance is a statistical parameter of the measurement precision, to calculate the average we can use the σᵢ as a weight for each measurement, i.e.

\[ \bar{x} = \frac{1}{\sum_{i=1}^{N} w_i} \sum_{i=1}^{N} w_i x_i \qquad \text{(weighted average)} \]

where

\[ w_i = \frac{1}{\sigma_i^2}, \qquad i = 1, 2, ..., N . \]

The uncertainty associated with the weighted average x̄ is

\[ \sigma_{\bar{x}}^2 \simeq \frac{1}{\sum_{i=1}^{N} 1/\sigma_i^2} . \]

It is left as an exercise to check what happens to the previous equation when σ₁ = σ₂ = ... = σ_N.

To try to digest these new fundamental formulas, let's see two examples.

1.3.2 Example (Unweighted Average)

The voltage difference V across a resistor, directly measured 10 times, gives the following values:

i        1      2      3      4      5      6      7      8      9      10
Vᵢ (mV)  123.5  125.3  124.1  123.9  123.7  124.2  123.2  123.7  124.0  123.2

The voltage difference average V̄ is indeed

\[ \bar{V} = \frac{1}{10} \sum_{i=1}^{10} V_i = 123.880\ \mathrm{mV} . \]

Assuming that V is a normally distributed random variable, the uncertainty s_V on each single measurement Vᵢ is

\[ s_V = \sqrt{\frac{1}{10-1} \sum_{i=1}^{10} (V_i - \bar{V})^2} = 0.6070\ \mathrm{mV} , \]

and the uncertainty on V̄ will be

\[ s_{\bar{V}} = \frac{s_V}{\sqrt{10}} = 0.1919\ \mathrm{mV} . \]

Finally, we will have⁵

V1 = (123.5 ± 0.6)mV, V2 = (125.3 ± 0.6)mV, ..., V10 = (123.2 ± 0.6)mV,

V = (123.880 ± 0.192)mV.
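As a cross-check, these numbers can be reproduced with a few lines of Python (a minimal sketch, not part of the original notes, using only the standard library and the values of the table above):

```python
import math

# Voltage measurements from the table above (mV)
V = [123.5, 125.3, 124.1, 123.9, 123.7, 124.2, 123.2, 123.7, 124.0, 123.2]
N = len(V)

mean = sum(V) / N                                         # estimator of mu
s = math.sqrt(sum((v - mean)**2 for v in V) / (N - 1))    # uncertainty on each single V_i
s_mean = s / math.sqrt(N)                                 # uncertainty on the average

print(f"V mean = {mean:.3f} mV")    # 123.880 mV
print(f"s_V    = {s:.4f} mV")       # 0.6070 mV
print(f"s_Vbar = {s_mean:.4f} mV")  # 0.1919 mV
```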

1.3.3 Example (Weighted Average)

The reflectivity R⁶ of a dielectric mirror, measured with 5 sets of measurements, gives the following table:

i    1       2       3       4       5
Rᵢ   0.4932  0.4947  0.4901  0.4921  0.4915
sᵢ   0.0021  0.0025  0.0032  0.0018  0.0027

⁵ The cumbersome notation for each single measurement is used here to avoid any kind of ambiguity. Whenever possible, results should be tabulated.

⁶ The reflectivity of a dielectric mirror, for polarized monochromatic light of wavelength λ with an incident angle θ, can be defined as the ratio of the reflected light power to the impinging power.

Assuming that R is a normally distributed random variable, the uncertainty s_R̄ on the weighted average R̄ will be

\[ s_{\bar{R}} = \sqrt{\frac{1}{\sum_{i=1}^{5} 1/s_i^2}} = 0.001037 . \]

The weighted average R̄ is

\[ \bar{R} = s_{\bar{R}}^2 \sum_{i=1}^{5} \frac{R_i}{s_i^2} = 0.492517 . \]

Finally, we will have

R = 0.49252 ± 0.00103.
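The same weighted-average bookkeeping can be scripted; the following Python sketch (an added illustration, not part of the original notes) uses the reflectivity table above:

```python
# Weighted average of the reflectivity measurements from the table above
R = [0.4932, 0.4947, 0.4901, 0.4921, 0.4915]
s = [0.0021, 0.0025, 0.0032, 0.0018, 0.0027]

w = [1.0 / si**2 for si in s]                         # weights w_i = 1/s_i^2
R_bar = sum(wi * ri for wi, ri in zip(w, R)) / sum(w)
s_Rbar = (1.0 / sum(w))**0.5                          # uncertainty on the weighted average

print(f"R = {R_bar:.6f} +- {s_Rbar:.6f}")             # ~0.492517 +- 0.001037
```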


Chapter 2

Probability Distributions

This chapter reports the basic definitions describing probability density functions (PDF) and some of the frequently used distributions with their main properties. To comprehend the next chapters it is important to become familiar mainly with the first section, where some new concepts are introduced. The understanding of each distribution is required, since they will be used or mentioned in the next chapters.

2.1 Definitions

2.1.1 Probability and Probability Density Function (PDF)

Let's consider a continuous random variable x. We define the probability that x assumes a value in the interval (a, b) as the following integral

\[ P\{a < x < b\} = \int_a^b p(x)\,dx , \]

where p(x) is a function called the probability density function (PDF) of the random variable x. From the previous definition, p(x)dx represents the probability of having x in the interval (x, x + dx). It is worth noticing that in general p(x) has dimensions, i.e. [p(x)] = [x⁻¹].

2.1.2 Distribution Function (DF)

The following function

\[ F(x') = \int_{-\infty}^{x'} p(x)\,dx \qquad (2.1) \]

is called the distribution function (DF) or cumulative distribution function of p(x).

Considering the properties of the integral and equation (2.1), we will have

\[ P\{a < x < b\} = F(b) - F(a) . \]

The two following relations must also be satisfied:

\[ \lim_{x \to +\infty} F(x) = 1 , \qquad (2.2) \]

\[ \lim_{x \to -\infty} F(x) = 0 . \qquad (2.3) \]

The first limit, the so-called normalization condition, represents the probability that x assumes any one of its possible values. The second limit represents the probability that x does not assume a value.

2.1.3 Probability and Frequency

Let's suppose that we measure N times a random variable x, which can assume any value inside the interval (a, b). Partitioning the interval into n intervals, and counting how many times nᵢ the measured value falls inside the i-th interval, we will have

\[ f_i = \frac{n_i}{N}, \qquad i = 1, ..., n , \]

which is called the frequency of the random variable x. The limit

\[ P_i = \lim_{N \to \infty} \frac{n_i}{N}, \qquad i = 1, ..., n , \]

is the probability of obtaining a measurement of x in the i-th interval. This experimental (impractical) definition will be used in the next subsections.

2.1.4 Continuous Random Variable vs. Discrete Random Variable

We can always make a continuous random variable x become a discrete random variable X. Defining a partition Σ of the domain (a, b) of x¹

\[ \Sigma = \{ a = x_0, x_1, x_2, ..., x_n = b \} , \]

we can compute the probability Pₙ associated with each interval (xₙ, xₙ₊₁). If we define the discrete variable X, assuming one value per each interval², X = {X₁, X₂, ...}, then we will have defined a new discrete random variable X with probability Pₙ.

2.1.5 Expectation Value

The following integral

\[ E[x] = \int_{-\infty}^{+\infty} x\, p(x)\,dx \]

is defined as the expectation value of the random variable x. Because of the linearity of the integration operation, we have

\[ E[\alpha x + \beta y] = \alpha E[x] + \beta E[y] , \]

where α and β are constant values.

2.1.6 Intuitive Meaning of the Expectation Value

The intuitive meaning of the expectation value can be easily understood in the case of a discrete random variable.

In this case, we have that the expectation value becomes³

\[ E[X] = \sum_{i=1}^{+\infty} X_i\, P(X_i) , \]

¹ In general, the partition Σ can be numerable. In other words, it can have an infinite number of intervals, but we can associate an integer to each one of them.

² There is an arbitrariness in the choice of the values of Xₙ because we are assuming that any value in each given interval is equiprobable.

³ In general, when we change from the continuous to the discrete variable, we have that \( \int p(x)\,dx \;\Rightarrow\; \sum_i P(x_i) \).

where P(Xᵢ) is the probability that X assumes the value Xᵢ.

Let's suppose that measuring X N times, we obtain M different values of X. Let nᵢ be the number of times that we measure the value Xᵢ, with i = 1, 2, ..., M. If we estimate P(Xᵢ) with the frequency nᵢ/N of having the event X = Xᵢ,

\[ P(X_i) \simeq \frac{n_i}{N} , \]

we will have

\[ E[X] \simeq \frac{1}{N} \sum_{i=1}^{M} X_i n_i , \]

which corresponds to the average of the N measurements of the variable X.

We can conclude that experimentally, the expectation value of a random variable is estimated by the average of the measured values of the random variable.

2.1.7 Variance

The expectation value of the square of the difference between the random variable x and its expectation value is called the variance of x, i.e.

\[ V[x] = E[(x - E[x])^2] . \]

A more explicit expression gives

\[ V[x] = \int (x - \mu)^2 p(x)\,dx . \]

Using the properties of the expectation value and of the distribution function, we obtain

\[ V[x] = E[x^2] - E[x]^2 . \]

A common symbol used as a shortcut for the variance is the Greek letter sigma, i.e.

\[ \sigma^2 = V[x] . \]

The variance has the so-called pseudo-linearity property. Considering two random variables x and y, we will have

\[ V[\alpha x + \beta y] = \alpha^2 V[x] + \beta^2 V[y] , \]

where α and β are constant values.

2.1.8 Intuitive Meaning of the Variance

To provide an intuitive meaning of the variance, we can still make use of a discrete random variable X.

In this case, we will have

\[ V[X] = \sum_{i=1}^{+\infty} (X_i - E[X])^2 P(X_i) , \]

which shows that the variance is just the sum of the squares of the distances of each experimental point from the expectation value, weighted by the probability of obtaining that measurement. Estimating the probability with the frequency, we will have that V[X] is just the average of the square of the distances of the experimental points Xᵢ from their average, i.e.

\[ V[X] \simeq \frac{1}{N} \sum_{i=1}^{M} (X_i - E[X])^2 n_i . \]

We can conclude that the variance is estimated by the average of the square of the distances of the experimental values from their expectation value.

2.1.9 Standard Deviation

The square root of the variance is defined as the standard deviation of the random variable x, i.e.

\[ \sigma = \sqrt{\int (x - \mu)^2 p(x)\,dx} . \]
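To make the definitions of this section concrete, the following Python sketch (an added illustration; the fair six-sided die is an arbitrary example) evaluates E[X], V[X], and σ for a simple discrete random variable:

```python
from fractions import Fraction

# Discrete random variable: a fair six-sided die, P(X = i) = 1/6
values = [1, 2, 3, 4, 5, 6]
P = {x: Fraction(1, 6) for x in values}

E = sum(x * P[x] for x in values)            # expectation value E[X]
V = sum((x - E)**2 * P[x] for x in values)   # variance V[X] = E[(X - E[X])^2]
sigma = float(V) ** 0.5                      # standard deviation

print(E, V, sigma)                           # 7/2, 35/12, ~1.708
```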

[Figure: standard normal PDF P(t) versus t, with µ = 0 and σ = 1 indicated.]

Figure 2.1: Normal probability distribution centered around 0 with unitary variance.

The following PDF of the random variable x,

\[ p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} , \qquad (2.4) \]

is said to be the Gauss/Normal probability density function (NPDF) centered in µ and with variance σ². The variable x is said to be normally distributed around µ (see figure 2.1).

Let’s make a list of some important properties of the NPDF:

• The expectation value and the variance are respectively

\[ E[x] = \mu, \qquad V[x] = \sigma^2 . \]

The calculation of E[x] and V[x] is left as an exercise.

• Eq. (2.4) is symmetric around the vertical axis crossing the point x = µ.

• Eq. (2.4) has one absolute maximum at x = µ.

• Eq. (2.4) has exponentially decaying tails:

\[ |x| \gg |\mu| \quad \Rightarrow \quad p(x) \propto e^{-\frac{x^2}{2\sigma^2}} . \]

• The analytical cumulative function

\[ P(x) = \int_{-\infty}^{x} p(x')\,dx' = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{x} e^{-\frac{(x'-\mu)^2}{2\sigma^2}}\, dx' \qquad (2.5) \]

is not known in closed form.

2.2.1 Standard Probability Density Function

Applying the following transformation to the NPDF

\[ t = \frac{x - \mu}{\sigma} , \]

we obtain the so-called standard probability density function

\[ p(t) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{t^2}{2}}, \qquad E[t] = 0, \qquad V[t] = 1 . \]

It is left to the student to verify the normalization condition (2.2) for the distribution function of the previous PDF.
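Although the cumulative function has no elementary closed form, the normalization can be checked numerically; here is a small Python sketch (an added illustration) that integrates the standard NPDF with a midpoint rule:

```python
import math

def p(t):
    """Standard normal probability density function."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

# Midpoint-rule integration; the tails beyond |t| = 8 are negligible.
a, b, n = -8.0, 8.0, 100_000
h = (b - a) / n
integral = sum(p(a + (i + 0.5) * h) for i in range(n)) * h

print(integral)   # ~1.0, as required by the normalization condition (2.2)
```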

2.2.2 Gaussianity

As has been said before, it is experimentally known (or accepted) that a large number of physical quantities are random variables following with good approximation the NPDF.

This experimental evidence is theoretically corroborated by the so-called central limit theorem (see appendix A). Under a reasonably limited number of hypotheses, this theorem states that the average x̄ of any random variable x is a normally distributed random variable when the number of averaged values tends to infinity. Quite often, the measurement of a physical quantity is the result of a wanted or unwanted average of several measurements, and indeed this physical quantity tends to follow the Gaussian distribution.

Deviations from a Gaussian distribution are quite often time dependent. In other words, a physical quantity behaves as a Gaussian random variable for a given period of time. This is true mainly because it is always quite difficult to keep the experimental conditions constant or controlled during the time needed to perform all the measurements. Sudden or slow uncontrolled changes of the system can easily modify the parameters or the PDF of the physical quantity we are measuring.

Anyway, it is important to stress that the Gaussianity of a random variable, or more generally the type of PDF, should always be investigated.

2.3 Uniform Distribution

Figure 2.2: Uniform probability density function p(x), equal to 1/(b − a) between a and b, and its cumulative function P(x).

The following PDF of the random variable x,

\[ p(x) = \begin{cases} 1/(b-a) & x \in [a, b) \\ 0 & x \notin [a, b) \end{cases} \]

is said to be the uniform probability density function, and x is said to be uniformly distributed in the interval [a, b) (see figure 2.2). This PDF says that any value in the interval [a, b) has the same probability.

The cumulative function is

\[ P(x) = \begin{cases} 0 & x < a , \\ \dfrac{x-a}{b-a} & a \le x \le b , \\ 1 & x > b . \end{cases} \]

The expectation value of x is

\[ E[x] = \frac{a + b}{2} , \]

and the variance

\[ V[x] = \frac{1}{12} (b - a)^2 . \]

The calculation of E[x] and V[x] is left as an exercise.

2.3.1 Random Variable Uniformly Distributed

Let's suppose that measuring N times a given physical quantity x, we always obtain the same result x₀. In this case, we cannot make any assumption about how x is statistically distributed. A reasonable hypothesis is that x is uniformly distributed in the interval [x₀ − ∆x/2, x₀ + ∆x/2], where ∆x is the instrument resolution. Under this assumption, the best estimate of the uncertainty on x is indeed

\[ \sigma_x = \frac{b - a}{\sqrt{12}} = \frac{\Delta x}{2\sqrt{3}} . \]

This is a case where the statistical uncertainty cannot be evaluated from the measurements, and has to be estimated from the instrument resolution.

2.3.1.1 Example: Ruler Measurements

The measurement of a distance d with a ruler with a resolution ∆x = 0.5 mm (half of the smallest division) is repeated several times, and always gives the same value of 12.5 mm. Assuming that d is uniformly distributed in the interval [12.25, 12.75] mm, the statistical uncertainty associated with d will be

\[ \sigma_d = \frac{0.5}{2\sqrt{3}} = 0.144\ \mathrm{mm} . \]

2.3.1.2 Example: Analog to Digital Conversion

The conversion of an analog signal to a number through an Analog to Digital Converter (ADC) is a typical example of "creation" of a uniformly distributed random variable. In fact, the conversion rounds the analog value to a given integer number. The integer value depends on which interval the analog value lies in, and the interval length ∆V is the ADC resolution. Then, it is reasonable to assume that the converted value follows the uniform PDF.

If the ADC numerical representation is 12 bits long, and the input/dynamic range is from −10 V to 10 V, the interval length is

\[ \Delta V = \frac{10 - (-10)}{2^{12}} \simeq 4.88\ \mathrm{mV} , \]

and the uncertainty associated with each conversion will be

\[ \sigma_V = \frac{4.88}{2\sqrt{3}} \simeq 1.4\ \mathrm{mV} . \]

Anyway, the previous statement about the distribution followed by the converted value is not general, and not applicable to any ADC. There are some numerical techniques applied to converters that change the statistical distribution of the converted values. As usual, we have to investigate which PDF the random variable follows.
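For the ideal case described above, the quantization uncertainty can be computed directly; the following Python sketch (an added illustration using the numbers of this example) does so:

```python
import math

# Ideal 12-bit ADC with a -10 V to +10 V input range (numbers from the example above)
n_bits = 12
v_min, v_max = -10.0, 10.0

delta_V = (v_max - v_min) / 2**n_bits       # width of one conversion interval
sigma_V = delta_V / (2 * math.sqrt(3))      # uniform-PDF uncertainty of a conversion

print(f"Delta V = {delta_V * 1e3:.2f} mV")  # ~4.88 mV
print(f"sigma_V = {sigma_V * 1e3:.2f} mV")  # ~1.41 mV
```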

2.4 Binomial/Bernoulli Distribution

Let's suppose that we perform an experiment which only allows us to obtain two results/events⁴, A and Ā. The first event has probability P(A) = p, and the other event necessarily has probability P(Ā) = 1 − p.

If we repeat the experiment N times, the probability of having k times the event A (and indeed having N − k times Ā) is

\[ P_{N,p}(k) = \binom{N}{k}\, p^k (1 - p)^{N-k}, \qquad 0 \le p \le 1 , \qquad (2.6) \]

where

\[ \binom{N}{k} = \frac{N!}{(N-k)!\, k!} \]

is the binomial coefficient. Eq. (2.6) is called the binomial or Bernoulli distribution.

The expectation value of k and its variance are respectively

E[k] = Np,

V [k] = Np(1 − p).

The calculation of E[k] and V[k] is left as an exercise.

For large values of N (N > 20, and especially for p ≃ 1/2), P_{N,p}(k) becomes well approximated by the NPDF.

The Poisson distribution, which is described in the next section, is an asymptotic limit of the binomial distribution when N → ∞.

⁴ Any kind of experiment which has more than one (numerable or infinite) result can be arbitrarily arranged into two sets of results, and indeed into two possible events.
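As a quick numerical check of these formulas, the following Python sketch (an added illustration; N = 20 and p = 0.3 are arbitrary choices) evaluates the binomial PMF of eq. (2.6) and compares E[k] and V[k] with Np and Np(1 − p):

```python
import math

N, p = 20, 0.3   # arbitrary example values

def P(k):
    """Binomial probability P_{N,p}(k), eq. (2.6)."""
    return math.comb(N, k) * p**k * (1 - p)**(N - k)

E = sum(k * P(k) for k in range(N + 1))
V = sum((k - E)**2 * P(k) for k in range(N + 1))

print(f"E[k] = {E:.4f}   (N p      = {N * p:.4f})")
print(f"V[k] = {V:.4f}   (N p(1-p) = {N * p * (1 - p):.4f})")
```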

2.5 Poisson Distribution

Figure 2.3: Poisson distribution Pₘ(k) for different values of the parameter m (5, 10, 15, 25).

Imposing the following conditions on the binomial distribution

\[ N \to \infty, \qquad p \to 0, \qquad Np = m = \text{const.} , \]

we obtain the so-called Poisson distribution (see figure 2.3)

\[ P_m(k) = \frac{m^k}{k!}\, e^{-m} . \]

The expectation value of k and its variance are respectively

\[ E[k] = m, \qquad V[k] = m . \]

The calculation of E[k] and V[k] is left as an exercise.

The meaning of this distribution is essentially the same as that of the binomial distribution. It represents the probability of measuring an event k times when the number of measurements N is very large, i.e. N ≫ m.

Let's mention some important qualitative properties of the Poisson distribution:

• For m < 20, Pₘ(k) is quite asymmetric and the expectation value does not coincide with the maximum.

• For m ≥ 20, Pₘ(k) is quite symmetric, the expectation value is very close to the maximum, and the curve is quite well approximated by a Gaussian distribution with µ = m and σ = √m.

If k₁, k₂, ..., kₙ are n measurements of k, a good estimator of m is the average of the measurements, i.e.

\[ \hat{m} = \frac{1}{n} \sum_{i=1}^{n} k_i , \]

and the uncertainty on each single measurement is indeed

\[ \sigma_k = \sqrt{\hat{m}} . \]

The uncertainty on the estimator m̂ is

\[ \sigma_{\hat{m}} = \frac{\sigma_k}{\sqrt{n}} = \sqrt{\frac{\hat{m}}{n}} . \]

The demonstration of the validity of these estimators is based on concepts explained in chapter 5.

2.5.1 Example: Silver Activation Experiment

A classical example of application of the Poisson distribution is the statistical analysis of atomic decay. In this case, the Poisson variable is the total number of decays measured during a given time τ. The number N is the number of atoms that can potentially decay, which is normally very large (it is quite difficult to make it small), making the approximation N → ∞ quite good.

Let's consider here some real data taken from the activation of silver with a radioactive source:

• Number of measurements n = 20

• Measurement time τ = 8s

• Table of measurements (with the average radioactive background decays already removed):

i            1   2   3   4   5   6   7   8   9   10
kᵢ (counts)  52  46  61  60  48  55  53  59  56  53

i            11  12  13  14  15  16  17  18  19  20
kᵢ (counts)  59  48  63  50  55  56  55  61  49  39

Using the average as the estimator of m, we get

\[ \hat{m} = \frac{1}{n} \sum_{i=1}^{n} k_i = \frac{1078}{20} = 53.90\ \text{counts} . \]

The uncertainty on each single measurement is

\[ \sigma_k = \sqrt{\hat{m}} = 7.3417\ \text{counts} , \]

and the uncertainty on m̂ is

\[ \sigma_{\hat{m}} = \frac{\sigma_k}{\sqrt{n}} = 1.6416\ \text{counts} . \]

The mean number of decaying atoms during a period of τ = 8 s is indeed

m = (53.90 ± 1.64) counts

Finally, neglecting the uncertainty on the measurement time τ, the statistical measurement of the decay rate, obtained dividing by τ, is

R = (6.74 ± 0.21) decays/s
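The whole analysis of this example fits in a few lines of Python (a sketch added for illustration, reusing the counts of the table above):

```python
import math

# Decay counts from the table above (20 runs of tau = 8 s each)
k = [52, 46, 61, 60, 48, 55, 53, 59, 56, 53,
     59, 48, 63, 50, 55, 56, 55, 61, 49, 39]
tau, n = 8.0, len(k)

m = sum(k) / n                         # Poisson mean estimator
sigma_k = math.sqrt(m)                 # uncertainty on a single run
sigma_m = sigma_k / math.sqrt(n)       # uncertainty on the estimator

R, sigma_R = m / tau, sigma_m / tau    # decay rate, neglecting the uncertainty on tau
print(f"m = {m:.2f} +- {sigma_m:.2f} counts")     # 53.90 +- 1.64
print(f"R = {R:.2f} +- {sigma_R:.2f} decays/s")   # 6.74 +- 0.21
```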

It is important to notice that a single long measurement gives the same result. In fact, considering the overall number of counts, we will have

\[ \hat{m} = 1078\ \text{counts}, \qquad \sigma = \sqrt{1078} = 32.83\ \text{counts} \quad \Rightarrow \quad m = (1078.0 \pm 32.8)\ \text{counts} . \]

Because the integration time is now nτ = 8 · 20 = 160 s, we will have

R = (6.74 ± 0.21) decays/s

The calculation of R using a single long measurement has essentially two issues:

• It does not allow us to check the assumption on the statistical distribution.

• There is no way to check for any anomalies during the data taking with just one single datum.

Chapter 3

Propagation of Errors

3.1 Propagation of Errors Law

[Figure: the curve f(x), with the variations df corresponding to the same step dx at two points x₀ and x₁.]

Figure 3.1: Variation of f(x) at two points x₀ and x₁. The derivative accounts for the difference in magnitude of the variation of the function f(x) at different points.

We want to find an approximate method to compute the uncertainty of a physical quantity f which has been indirectly determined, i.e. f is a function of a set of random variables that are physical quantities.

To focus the problem, it is better to consider the case of f as a function of a single variable x with uncertainty σₓ. The differential

\[ df = \left( \frac{df}{dx} \right)_{x = x_0} dx \]

represents the variation of f for the corresponding variation dx around x₀ (see figure 3.1). Imposing

\[ dx = \sigma_x , \]

we can interpret the following expression as an estimation of the uncertainty on f

\[ \Delta f \simeq \left| \frac{df}{dx} \right|_{x_0} \sigma_x , \]

where x₀ is the measured value. The absolute value is taken to account for a possible negative sign of the derivative. This formula can be easily extended, considering the definition of the differential of a function of n variables x₁, x₂, ..., xₙ, i.e.

\[ \Delta f \simeq \left| \frac{\partial f}{\partial x_1} \right| \sigma_{x_1} + \left| \frac{\partial f}{\partial x_2} \right| \sigma_{x_2} + \dots + \left| \frac{\partial f}{\partial x_n} \right| \sigma_{x_n} . \qquad (3.1) \]

The previous expression is said to be the law of propagation of errors, which gives an estimation of the maximum uncertainty on f for a given set of uncertainties σ_{x₁}, ..., σ_{xₙ}. The derivatives are calculated using the measured values of the physical quantities x₁, x₂, ..., xₙ.

This law is rigorously exact (but not statistically correct, as shown in the next section) in the case of a linear function, where the Taylor expansion coincides with the function itself.

This formula is quite useful during the design stages of an experiment. In fact, because of its linearity, it is easy to apply and to estimate the contribution of each measured physical quantity to the error. Moreover, this estimate can be used to minimize the error and optimize the measurement (see example 3.3.1).
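As a tiny illustration of eq. (3.1) (added here; the function P = V·I and the numbers are arbitrary choices, not from the notes), the maximum uncertainty on a power computed from measured V and I is:

```python
# Maximum-error propagation, eq. (3.1), for P = V * I (arbitrary example numbers)
V, sigma_V = 10.0, 0.1    # volts
I, sigma_I = 2.0, 0.05    # amperes

# Delta P ~ |dP/dV| sigma_V + |dP/dI| sigma_I = |I| sigma_V + |V| sigma_I
delta_P = abs(I) * sigma_V + abs(V) * sigma_I

print(f"P = {V * I:.2f} W, maximum uncertainty ~ {delta_P:.2f} W")   # 20.00 W, 0.70 W
```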

3.2 Statistical Propagation of Errors Law (SPEL)

A more orthodox approach, which starts from a Taylor expansion formula of the variance, gives the statistically correct formula for the propagation of errors (see appendix B). For the case of a function f of two random variables x and y, and the two data points (x₀ ± σ_{x₀}), (y₀ ± σ_{y₀}), we have

\[ \sigma_f^2 = \left( \frac{\partial f}{\partial x} \right)^2 \sigma_{x_0}^2 + \left( \frac{\partial f}{\partial y} \right)^2 \sigma_{y_0}^2 + 2 \left( \frac{\partial f}{\partial x} \right) \left( \frac{\partial f}{\partial y} \right) E\left[ (x - x_0)(y - y_0) \right] , \]

where the partial derivatives are calculated in (x₀, y₀). The previous expression is said to be the law of statistical propagation of errors (SPEL).

In the special case of uncorrelated random variables, i.e. variables having independent PDFs, the previous equation becomes

\[ \sigma_f^2 = \left( \frac{\partial f}{\partial x} \right)^2 \sigma_{x_0}^2 + \left( \frac{\partial f}{\partial y} \right)^2 \sigma_{y_0}^2 . \]

The most general expression and the derivation of the SPEL can be found in appendix B.

3.2.1 Example 1: Area of a Surface

Let's suppose that the area A of a rectangular surface of sides a and b is indirectly measured by measuring the sides. We will have

\[ A = ab, \qquad \Rightarrow \qquad \sigma_A^2 \simeq b^2 \sigma_a^2 + a^2 \sigma_b^2 . \]

If the surface is a square, we could write

\[ A = a^2, \qquad \Rightarrow \qquad \sigma_A^2 \simeq 4 a^2 \sigma_a^2 , \]

which implies that we are assuming that the square is perfect, and that it is sufficient to measure one side of the square.

3.2.2 Example 2: Power Dissipated by a Circuit

Suppose we want to know the uncertainty on the power

\[ P = V I \cos\varphi \]

dissipated by an AC circuit, where V and I are the sinusoidal voltage and current of the circuit, and φ is the phase difference between V and I.

Applying the SPEL and supposing that there is no correlation among the variables, we will have

\[ \sigma_P^2 \simeq (I\cos\varphi)^2 \sigma_V^2 + (V\cos\varphi)^2 \sigma_I^2 + (V I \sin\varphi)^2 \sigma_\varphi^2 \qquad (3.2) \]

\[ \phantom{\sigma_P^2 \simeq} = P^2 \left[ \left( \frac{\sigma_V}{V} \right)^2 + \left( \frac{\sigma_I}{I} \right)^2 + \left( \tan\varphi\; \sigma_\varphi \right)^2 \right] . \qquad (3.3) \]

Considering the following experimental values with the respective estimates of their expectation values and uncertainties, we get

V = (77.78 ± 0.71) V,
I = (1.21 ± 0.071) A,
φ = (0.283 ± 0.017) rad,

⇒ P = (90.37 ± 5.39) W.
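The same result can be reproduced numerically; here is a Python sketch (added for illustration) applying eq. (3.3) to the values above:

```python
import math

# Measured values and uncertainties from the example above
V, sV     = 77.78, 0.71     # volts
I, sI     = 1.21, 0.071     # amperes
phi, sphi = 0.283, 0.017    # radians

P = V * I * math.cos(phi)

# SPEL for uncorrelated variables, eq. (3.3)
sP = P * math.sqrt((sV / V)**2 + (sI / I)**2 + (math.tan(phi) * sphi)**2)

print(f"P = ({P:.2f} +- {sP:.2f}) W")   # ~ (90.37 +- 5.39) W
```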

3.2.3 Example 4: Improper Use of the Formula

Let's consider a random variable x which follows an NPDF with

\[ E[x] = 0, \qquad V[x] = \alpha , \]

and a function f of x, f(x) = x². Applying the SPEL to f(x), we obtain

\[ \sigma_f = \left[ 2x \right]_{x=0} \sigma_x = 0 , \]

which leads to a wrong result, because the approximation (see eq. B.1 in appendix B) is not legitimate here. Expanding the function up to the second order term, we have

\[ f(x) = f(\mu) + \left[ 2x \right]_{x=0}\, x + \frac{1}{2} \left[ 2 \right]_{x=0}\, x^2 = x^2 , \]

which shows that at x = 0 we are neglecting a second order contribution which is not negligible with respect to the first order one.

Considering the definition of variance instead, and with the aid of the integration by parts formula, we get the correct result

\[ \sigma_f^2 = V[f(x)] = V[x^2] = 2\alpha^2 . \]

3.3 Relative Uncertainties

Let f be a physical quantity and σ_f its uncertainty. The ratio

\[ \Delta_f = \frac{\sigma_f}{f} \]

is said to be the relative uncertainty or fractional error of the physical quantity f.

The importance of this quantity arises mainly when we have to analyze the contribution of each uncertainty to the uncertainty of the physical quantity f.

Expression (3.1) becomes quite useful for a quick estimate of the relative uncertainty.

3.3.1 Example 1:

We want to measure the gravitational acceleration g with a precision of 0.1% (∆_g = 0.001) using the equation of the resonant angular frequency of a simple pendulum of length l, i.e.

\[ \omega_0 = \sqrt{\frac{g}{l}} . \]

Applying the logarithm to the previous expression and taking the derivative of both sides, we get

\[ \frac{\Delta \omega_0}{\omega_0} = \frac{1}{2} \left( \frac{\Delta g}{g} + \frac{\Delta l}{l} \right) . \]

Considering that uncertainties cannot exactly cancel out, we will have

\[ \frac{\Delta g}{g} = 2\, \frac{\Delta \omega_0}{\omega_0} + \frac{\Delta l}{l} , \]

which means that the contributions of the uncertainty on ω₀ and on l are linear and differ just by a factor of two.

Supposing that we are able to measure ω₀ to well within 0.1%, we have to be able to measure l at least with the same precision to guarantee the required precision. If we have

\[ l = 1\ \mathrm{m} \quad \Rightarrow \quad \sigma_l < 0.001\ \mathrm{m} . \]

This formula tells us that the uncertainty on the knowledge of l must be smaller than 1 mm to measure g within 0.1%.

3.4 Measurement Comparison

We can use the SPEL to determine if two different measurements of the same physical quantity x are statistically the same. Let's suppose that the following measurements

x1 ± σx1 , x2 ± σx2 ,

are two independent measurements of the physical quantity x. The difference and the uncertainty on the difference will be, indeed,

\[ \Delta x = |x_2 - x_1|, \qquad \sigma_{\Delta x} = \sqrt{\sigma_{x_1}^2 + \sigma_{x_2}^2} . \]

We can assume as a "test of confidence" that the two measurements are statistically the same if

\[ \Delta x < 2 \sigma_{\Delta x} . \]

3.4.1 Example

Suppose that the following measurements

I₁ = (4.398 ± 1.256) · 10⁻² kg m²,
I₂ = (4.431 ± 1.324) · 10⁻² kg m²,

are two measurements of the moment of inertia of a cylinder. We will have

∆I = (0.033 ± 1.825) · 10⁻² kg m²,

which shows that ∆I is within two σ of zero. Statistically, the two measurements must be considered two determinations of the same physical quantity.
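The comparison test is easy to script; here is a Python sketch (added for illustration) with the two moments of inertia above:

```python
import math

# Two independent measurements of the same moment of inertia (kg m^2)
I1, s1 = 4.398e-2, 1.256e-2
I2, s2 = 4.431e-2, 1.324e-2

dI = abs(I2 - I1)                   # difference
s_dI = math.sqrt(s1**2 + s2**2)     # uncertainty on the difference

print(f"Delta I = ({dI:.3e} +- {s_dI:.3e}) kg m^2")
print("statistically the same" if dI < 2 * s_dI else "statistically different")
```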

Chapter 4

Graphical Representation of Data

4.1 Introduction

A good way to analyze a set of experimental data, and to summarize the results, is to plot them in a graph. It becomes important to provide on it all the information needed to correctly and easily read the graph. The choice of the proper scale and of the type of scale is also important.

Judgment of the curve fit goodness is quite often done by inspection of the graph, of the data points and the theoretical fitted curve, or better, by analyzing the so-called "difference plot".

For a reasonably good understanding of a graph, it should contain the following information:

• a title,

• axis labels to define the plotted physical quantities,

• the physics units of the plotted physical quantities,

• a dot in correspondence of each experimental point, and error bars or error rectangles,

• graphical analysis made on the graph, in particular the data points used, clearly labeled,

• a legend if more than one data set is plotted.

Figure 4.1 shows an example of a graph containing all the information needed to properly read the plot.

[Figure: plot titled "Current-Voltage Characteristic for a Carbon Resistor"; x axis: Current through the Resistor (mA), y axis: Voltage difference across the Resistor (V); fit annotation: y(x) = ax + b, a = (996.3 ± 12.9) Ω, b = (−0.103 ± 0.094) mA.]

Figure 4.1: Example of Graph.

4.2 Graphic Fit

Nowadays, graphic curve fitting of experimental data can be considered a romantic or nostalgic way to obtain an estimate of function parameters. Anyway, we can still face a situation where the omnipresent computer cannot be accessed to perform a statistical numerical fit. Moreover, this kind of exercise is still pedagogically valuable, and indeed it deserves to be studied.

Statistical curve fit techniques will be explained in chapter 5.

4.2.1 Linear Graphic Fit

Let’s consider the case of the fit of a straight line

y = ax + b (4.1)

where the two parameters, the slope a and the intercept b, must be graphically determined.

Let's suppose that we are able to trace a straight line which fits the experimental points reasonably well. Considering then two points A = (x₁, y₁) and B = (x₂, y₂) belonging to the straight line, and eq. (4.1), after some trivial algebra we will have the two estimators of a and b:

\[ a = \frac{y_2 - y_1}{x_2 - x_1}, \qquad b = \frac{x_2 y_1 - x_1 y_2}{x_2 - x_1}, \qquad x_1 < x_2 . \]

Anyway, because of the previous assumption, we still need an objective criterion to trace the straight line.

If we draw a rectangle centered on each data point, having sides corresponding to the data-point uncertainties, we will obtain a plot similar to that shown in figure 4.2. Using a ruler, and applying the following two criteria, we will be able to trace two straight lines with maximum and minimum slopes:

• The maximum slope is such that if we try to draw a straight line with a steeper slope, not all the rectangles will be intercepted.

• The minimum slope is such that if we try to draw a straight line with a less steep slope, not all the rectangles will be intercepted.

We will have indeed two estimations of a and b, whose averages will give the estimated values of the slope and the intercept. Their semi-differences will give the maximum uncertainties associated with them:

\[ a = \frac{a_{max} + a_{min}}{2}, \qquad \Delta a = \frac{a_{max} - a_{min}}{2}, \qquad (4.2) \]

\[ b = \frac{b_{max} + b_{min}}{2}, \qquad \Delta b = \frac{b_{max} - b_{min}}{2}. \qquad (4.3) \]

Figure 4.2 shows an example of a graphic fit applying the mentioned max/min slope criteria. Using the data points A, B, C, and D we obtain

\[ a_{max} = \frac{12.4 - (-0.9)}{12.0 - 0.0} = 1.108\ \mathrm{k\Omega}, \qquad a_{min} = \frac{12.0 - 1.0}{13.5 - 0.0} = 0.815\ \mathrm{k\Omega}, \]

\[ b_{max} = \frac{12.0 \cdot (-0.9) - 0.0 \cdot 12.4}{12.0 - 0.0} = -0.9\ \mathrm{V}, \qquad b_{min} = \frac{13.5 \cdot 1.0 - 0.0 \cdot 12.0}{13.5 - 0.0} = 1.0\ \mathrm{V}, \]

and finally

a = (1.0 ± 0.1)kΩ

b = (0 ± 1)V
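The arithmetic of eqs. (4.2) and (4.3) can also be checked with a short Python sketch (added for illustration; the point coordinates are the ones used in the worked example above, with currents in mA and voltages in V so that slopes come out in kΩ):

```python
def line_through(p1, p2):
    """Slope and intercept of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    a = (y2 - y1) / (x2 - x1)
    b = (x2 * y1 - x1 * y2) / (x2 - x1)
    return a, b

# Two points on the maximum-slope line and two on the minimum-slope line
a_max, b_max = line_through((0.0, -0.9), (12.0, 12.4))
a_min, b_min = line_through((0.0, 1.0), (13.5, 12.0))

a, da = (a_max + a_min) / 2, (a_max - a_min) / 2        # eq. (4.2)
b, db = (b_max + b_min) / 2, abs(b_max - b_min) / 2     # eq. (4.3)

print(f"a = ({a:.2f} +- {da:.2f}) kOhm")  # ~0.96 +- 0.15, rounded in the text to (1.0 +- 0.1)
print(f"b = ({b:.2f} +- {db:.2f}) V")     # ~0.05 +- 0.95, i.e. (0 +- 1) V
```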

If we cannot have all the points intercepted by a straight line, we could use this additional but very subjective rule of thumb:

• The maximum and the minimum slope straight lines are those lines which make the straight line computed using eqs. (4.2) and (4.3) intercept at least 2/3 of the rectangles.

This rule tries to empirically take into account the results of statistics when applied to a curve fit. It is indeed better to use a statistical fitting method as explained in chapter 5.

4.2.2 Theoretical Points Imposition.

Imposing theoretical points on the fit curve implies that we are assuming that each separate experimental point uncertainty is not dominated by any systematic error (negligible compared to random errors). In fact, if we have a systematic error, the theoretical points will not be properly aligned with the experimental points. This can more likely cause a large systematic error on the parameter estimation, especially in the case of statistical fits.

Figure 4.3 shows an example of a graphic fit which imposes on the max/min straight lines the crossing of the zero point.

Using the data points A and B we obtain

\[ a_{max} = \frac{12.1}{12.0} = 1.008\ \mathrm{k\Omega}, \qquad a_{min} = \frac{12.0}{13.1} = 0.916\ \mathrm{k\Omega}, \]

and finally

\[ a = (0.96 \pm 0.05)\ \mathrm{k\Omega} . \]

Statistically, this new measurement of a agrees with the previous one within their uncertainties. Which measurement is more accurate is difficult to say. A statistical analysis can reduce the uncertainty, giving a more precise measurement.

4.3 Linear Plot and Linearization

The application of the graphical fitting methods can be extended to non-linear functions through the so-called process of linearization. In other words, if we have a function which is not linear, we can apply functions to linearize it.

We can mathematically formulate the problem in the following terms. Let's suppose that

\[ y = y(x; a, b) \]

is a non-linear function with two parameters a and b. If we find the transformations

\[ Y = Y(x, y), \qquad X = X(x, y) , \]

that allow us to write the following relation

\[ Y = X F(a, b) + G(a, b) , \]

where F and G are known expressions which only depend on a and b, then we have linearized y. Once the F and G values are found with graphical or numerical methods, a and b can then be found by inversion of F

[Figure: plot titled "Voltage-Current Characteristic for a Carbon Resistor"; x axis: Current through the Resistor (mA), y axis: Voltage Difference Across the Resistor (V); marked points: A = (0.0 mA, 1.1 V) and B = (13.5 mA, 12.0 V) limiting the minimum slope, C = (0.0 mA, −0.9 V) and D = (12.0 mA, 12.4 V) limiting the maximum slope.]

Figure 4.2: Maximum and minimum slope straight lines intercept all the experimental points. If we try to draw a straight line with a steeper slope, not all the points will be intercepted. The same is true for the minimum slope straight line.

[Figure: plot titled "Voltage-Current Characteristic for a Carbon Resistor"; x axis: Current through the Resistor (mA), y axis: Voltage Difference Across the Resistor (V); marked points A = (12.0 mA, 12.1 V) and B = (13.1 mA, 12.0 V), which limit the maximum and minimum slope lines forced through the origin O.]

Figure 4.3: Maximum and minimum slope straight lines with the zero crossing point imposed. The same comments as for the previous graph apply to this figure.

and G. The uncertainty on a and b can be calculated using the SPEL. Quite often, the complexity of inverting F and G can make the linearization method impractical.

Sometimes, the linearization of a function can be achieved with non-linear scales as shown in the next sections.

4.3.1 Example 1: Square Function

Let's suppose that

\[ y = a x^2 . \]

If we define (the two functions to linearize the equation)

\[ X(x) = x^2, \qquad Y(y) = y , \]

then we will have the linear function

\[ Y(X) = a X , \]

that can be plotted in a linear graph. The parameter a is now the slope of a straight line, and it can be estimated using the already explained method.

4.3.2 Example 2: Power Function

If we have

\[ y = b x^a , \qquad (4.4) \]

applying the logarithm to both sides of the previous expression, we will have

\[ \log y = a \log x + \log b . \]

If we define the following functions

\[ Y = \log y, \qquad X = \log x , \]

we will have the linear function

\[ Y = a X + \log b . \]

In this case the slope of the straight line is the exponent a, and its intercept log b gives the coefficient b of the function y.
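When a computer is available, the same log-log linearization can be combined with a numerical straight-line fit; the following Python sketch (an added illustration using NumPy and synthetic data, with a = 1.5 and b = 2.0 as assumed true values) shows the idea:

```python
import numpy as np

# Synthetic power-law data y = b * x**a with a little scatter (a = 1.5, b = 2.0 assumed)
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 20)
y = 2.0 * x**1.5 * (1.0 + 0.02 * rng.standard_normal(x.size))

# Linearization: Y = a X + log(b) with X = log(x), Y = log(y)
X, Y = np.log(x), np.log(y)
a_fit, logb_fit = np.polyfit(X, Y, 1)

print(f"a ~ {a_fit:.3f}, b ~ {np.exp(logb_fit):.3f}")   # close to 1.5 and 2.0
```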

4.3.3 Example 3: Exponential Function

If we have

\[ y = b\, e^{a x} , \]

applying the logarithm to both sides, we have

\[ \log y = a x + \log b . \]

If we define the following functions Y = log y, X = x, we will finally have the linear function to plot

\[ Y = a X + \log b . \]

4.4 Logarithmic Scales

A logarithmic scale is an axis whose scale is proportional to the logarithm of the plotted value. The base of the logarithm is arbitrary for logarithmic scales, and for the sake of simplicity we will assume base 10.

For example, if we plot the numbers 0.1, 1, and 10 on a logarithmic scale, they will be exactly equi-spaced, since log₁₀ 0.1 = −1, log₁₀ 1 = 0, and log₁₀ 10 = 1. In general, any number and its multiples by powers of 10 will also be equi-spaced. This distance is a characteristic of the logarithmic scale sheet and is called a decade.

One of the main advantages of using logarithmic scales is that we are able to plot ranges of several orders of magnitude on a manageable sheet. The inconvenience is the great distortion of the plotted curves, which can sometimes lead to a wrong interpretation of the results. This distortion becomes small when the range amplitude of the scale, ∆x = (b − a), is smaller than the magnitude a of the range. In fact, if ∆x ≪ a, we will have

\[ \log(a + \Delta x) = \log\!\left[ a \left( 1 + \frac{\Delta x}{a} \right) \right] = \log(a) + \log\!\left( 1 + \frac{\Delta x}{a} \right) \simeq \log(a) + \frac{\Delta x}{a} , \]

and the logarithmic scale is well approximated by the linear scale.

Another advantage of using logarithmic scales is for the linearization of functions, as briefly discussed in the next subsection.

4.4.1 Linearization with Logarithmic Graph Sheets

There are essentially two cases that can be linearized using logarithmic graph sheets:

If the experimental points follow a power law (y = axᵇ), we will obtain a straight line if we plot both y and x on logarithmic scales.

If the experimental points follow an exponential law (y = abˣ), the linearization procedure of example 3 gives a straight line if we plot the logarithm of y versus x (semi-logarithmic scales).

4.5 Difference Plots

The possibility of visualizing in a graph how the data points scatter from the theoretical curve is quite important to assess the fit goodness. Quite often, if we plot the experimental points and the fitting curve together, it becomes difficult to appreciate and analyze the difference between the experimental points and the curve. In fact, if the measurement range of a physical quantity is much greater than the average distance between the experimental points and the curve, data points and curve become indistinguishable.

A way to avoid this problem is to produce the so-called difference plot, i.e. the plot of the difference between the theoretical and the measured points. In a difference plot, the goodness of a fit and the poorness of a theoretical model can be better analyzed.
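A difference plot is straightforward to produce with standard plotting tools; here is a Python/matplotlib sketch (an added illustration with made-up data) that draws the data with the fitted line on top and the residuals below:

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data roughly following a straight line
rng = np.random.default_rng(2)
x = np.linspace(0.0, 12.0, 13)
y = 1.0 * x + 0.05 + rng.normal(0.0, 0.08, x.size)
sigma = np.full_like(x, 0.08)

a, b = np.polyfit(x, y, 1)          # fitted straight line
residuals = y - (a * x + b)         # measured minus theoretical points

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.errorbar(x, y, yerr=sigma, fmt="o", label="data")
ax1.plot(x, a * x + b, label="fit")
ax1.set_ylabel("y")
ax1.legend()
ax2.errorbar(x, residuals, yerr=sigma, fmt="o")   # the difference plot
ax2.axhline(0.0)
ax2.set_xlabel("x")
ax2.set_ylabel("y - fit")
plt.show()
```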

4.5.1 Difference Plot of Logarithmic Scales

The difference between the experimental point and the theoretical point on a logarithmic scale is

\[ \Delta y = \log(y) - \log(y_0) = \log\!\left( \frac{y}{y_0} \right) . \]

Supposing that

\[ y = y_0 + \delta y , \]

and considering the first order expansion

\[ \log(1 + \varepsilon) \simeq \varepsilon, \qquad \varepsilon \ll 1 , \]

we will get

\[ \Delta y = \log\!\left( 1 + \frac{\delta y}{y_0} \right) \simeq \frac{\delta y}{y_0} = \frac{y - y_0}{y_0} . \]

Assuming small deviations between the experimental and the theoretical data, a difference plot with a vertical logarithmic scale is therefore well approximated by a (relative) difference plot with a linear scale.


Chapter 5

Parameter Estimation

When a function depends on a set of parameters, the problem arises of estimating the values of those parameters. Starting from a finite set of measurements, we can find a statistical determination of the parameter set.

We will see in the next sections two standard methods, the maximum-likelihood and least-squares methods, to estimate the parameters of a PDF and of a general function of one independent variable.

5.1 The Maximum Likelihood Principle (MLP)

Let x be a random variable and f its PDF, which depends on a set of unknown parameters θ⃗ = (θ₁, θ₂, ..., θₙ),

\[ f = f(x; \vec{\theta}) . \]

If we have N independent samples x⃗ = (x₁, x₂, ..., x_N) of x, the following quantity

\[ L(\vec{x}; \vec{\theta}) = \prod_{i=1}^{N} f(x_i; \vec{\theta}) \]

is called the likelihood of f. L is proportional to the probability of getting the set of samples x⃗ if the N samples are supposed to be independent.

The maximum likelihood principle (MLP) states that the best estimate of the parameters θ⃗ is the set of values which maximizes L(x⃗; θ⃗).

The MLP reduces the problem of parameter estimation to that of maximizing the function L. In general, because it is not possible to compute the parameters θ⃗ analytically, numerical methods implemented in computers are quite often used.
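As a sketch of such a numerical approach (added here for illustration; it assumes NumPy and SciPy are available and uses simulated Gaussian data), one can minimize the negative logarithm of L:

```python
import numpy as np
from scipy.optimize import minimize

# Simulated measurements: normally distributed with mu = 5, sigma = 2 (assumed values)
rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=200)

def neg_log_likelihood(theta):
    mu, sigma = theta
    if sigma <= 0:
        return np.inf
    # -log L = N log(sqrt(2 pi) sigma) + sum_i (x_i - mu)^2 / (2 sigma^2)
    return x.size * np.log(np.sqrt(2 * np.pi) * sigma) + np.sum((x - mu)**2) / (2 * sigma**2)

result = minimize(neg_log_likelihood, x0=[x.mean(), x.std()], method="Nelder-Mead")
mu_hat, sigma_hat = result.x
print(f"mu ~ {mu_hat:.3f}, sigma ~ {sigma_hat:.3f}")   # close to 5 and 2
```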

5.1.1 Example: σ and µ of a Normally Distributed Random Variable

Let's suppose we have N independent samples of a normally distributed random variable x, whose σ and µ are unknown. Experimentally, this case corresponds to measuring the same physical quantity x several times with the same instrument.

L in this case is

$$L(\vec{x}; \vec{\theta}) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{N} \exp\left[-\sum_{i=1}^{N} \frac{(x_i - \mu)^2}{2\sigma^2}\right],$$

and we have to maximize it. Since the logarithm is a monotonic function, maximizing L is equivalent to maximizing log L, which is easier to handle. Imposing the following conditions

$$\frac{\partial}{\partial \mu} \log L = 0, \qquad \frac{\partial}{\partial \sigma} \log L = 0, \qquad \text{with} \quad \log L = -N\log\!\left(\sqrt{2\pi}\,\sigma\right) - \sum_{i=1}^{N}\frac{(x_i-\mu)^2}{2\sigma^2},$$

is sufficient to determine the absolute maximum of L. Solving the first equation with respect to µ, we obtain the estimator µ̂ of µ

$$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i,$$

which, replaced into the second equation, gives the estimator σ̂ of σ

$$\hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \hat{\mu})^2.$$


The estimator of the variance is biased, i.e. the expectation value of the estimator is not the parameter itself

$$E[\hat{\sigma}^2] = \left(1 - \frac{1}{N}\right)\sigma^2.$$

Because of that it is preferable to use the following unbiased estimator

$$s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \hat{\mu})^2 \quad \Rightarrow \quad E[s^2] = \sigma^2.$$
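As a sketch of these two estimators, assuming a small set of hypothetical repeated measurements, numpy exposes them through the ddof argument of np.var:

```python
# A minimal sketch, assuming hypothetical repeated measurements: the biased
# (1/N) and unbiased (1/(N-1)) variance estimators are ddof=0 and ddof=1.
import numpy as np

x = np.array([9.98, 10.02, 10.05, 9.97, 10.01, 10.03])  # hypothetical measurements

mu_hat = x.mean()                      # maximum-likelihood estimate of mu
sigma2_mle = np.var(x, ddof=0)         # biased estimator, divides by N
s2 = np.var(x, ddof=1)                 # unbiased estimator, divides by N-1
sigma_mean = np.sqrt(s2 / x.size)      # uncertainty of the average, s/sqrt(N)

print(mu_hat, sigma2_mle, s2, sigma_mean)
```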

What is the variance associated with the average? To answer this question, let's consider the average variable,

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i,$$

which must be a Gaussian random variable. Using the pseudo-linearity property, its variance can be computed directly, i.e.

$$V[\bar{x}] = \frac{1}{N}\sigma^2.$$

5.1.2 Example: µ of a Set of Normally Distributed Random Variables

Let's suppose we have N independent samples of N normally distributed random variables x1, x2, ..., xN, having the same unknown expected value µ but different known variances σ1², σ2², ..., σN². Experimentally, this case corresponds to measuring the same physical quantity x several times with different instruments.

L in this case is

$$L(\vec{x}; \vec{\theta}) = \left(\frac{1}{\sqrt{2\pi}}\right)^{N} \prod_{i=1}^{N}\frac{1}{\sigma_i}\, \exp\left[-\sum_{i=1}^{N}\frac{(x_i - \mu)^2}{2\sigma_i^2}\right],$$

and we have to maximize it. Imposing the following condition

$$\frac{d}{d\mu}\left[-\sum_{i=1}^{N}\frac{(x_i-\mu)^2}{2\sigma_i^2}\right] = 0,$$


is sufficient to determine the absolute maximum of L. Solving the equation with respect to µ, we obtain the estimator µ̂ of µ

$$\hat{\mu} = \sum_{i=1}^{N}\frac{1/\sigma_i^2}{\sum_{k=1}^{N} 1/\sigma_k^2}\, x_i,$$

which is the weighted average. What is the variance associated with the weighted average?

To answer this question, let's consider the weighted average as a random variable,

$$\bar{x} = \sum_{i=1}^{N}\frac{1/\sigma_i^2}{\sum_{k=1}^{N} 1/\sigma_k^2}\, x_i.$$

Using the pseudo-linearity property, its variance can be directly computed, i.e.

$$V[\bar{x}] = \frac{1}{\sum_{i=1}^{N} 1/\sigma_i^2}.$$
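A minimal sketch of the weighted average and its uncertainty, assuming a few hypothetical measurements with known uncertainties, is:

```python
# A minimal sketch, assuming hypothetical measurements of the same quantity
# taken with instruments of different accuracy.
import numpy as np

x = np.array([10.1, 9.9, 10.3])         # hypothetical measurements
sigma = np.array([0.1, 0.2, 0.5])       # their known uncertainties

w = 1.0 / sigma**2                      # weights 1/sigma_i^2
x_wavg = np.sum(w * x) / np.sum(w)      # weighted average (MLP estimator of mu)
sigma_wavg = np.sqrt(1.0 / np.sum(w))   # its uncertainty, sqrt(1 / sum(1/sigma_i^2))

print(f"{x_wavg:.3f} +/- {sigma_wavg:.3f}")
```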

5.2 The Least Square Principle (LSP)

Let's suppose that we have a function y of one variable x and of a set of parameters ~θ = (θ1, θ2, ..., θn)

$$y = y(x; \vec{\theta}).$$

If we have N pairs of values (xi, yi ± σi) with i = 1, ..., N, the following quantity

$$\chi^2(\vec{x}; \vec{\theta}) = \sum_{i=1}^{N}\left[\frac{y_i - y(x_i; \vec{\theta})}{\sigma_i}\right]^2,$$

is called the chi square¹ of y.

The Least Square Principle (LSP) states that the best estimate of the parameters ~θ is the set of values which minimizes the χ².

If σi = σ for i = 1, ..., N, the χ² expression can be simplified because σ can be factored out and the search for the χ² minimum becomes easier.

¹The χ² must be considered a symbol, i.e. we cannot write χ = √(χ²).


It is important to notice that in this formulation the LSP requires the knowledge of the σi, which are the uncertainties of each measurement yi, and assumes no uncertainties are associated with the xi. In a more general and correct formulation the uncertainties on the xi should be accounted for (see appendix D). In general, uncertainties on the x variable can be neglected if

$$\frac{\sigma_{x_i}}{x_i} \ll \frac{\sigma_{y_k}}{y_k}, \qquad i = 1, 2, ..., N, \quad k = 1, 2, ..., N.$$

It can be easily proved that, in the case of a normally distributed random variable, the MLP and the LSP are equivalent, i.e. applying the two principles to the Gaussian function, we get the same estimators for σ and µ.

5.2.1 Geometrical Meaning of the LSP

Neglecting the uncertainties σi, the χ² is just the sum of the squares of the distances between the curve's points y(xi) and the points yi. The χ² minimization corresponds to the search for the best curve, the one which minimizes the distance between the points yi and the curve's points, varying the parameters θk.

The introduction of the uncertainties is necessary if we want to perform a statistical analysis instead of just solving a pure geometrical problem.

5.2.2 Example: Linear Function

Let’s suppose that the function we want to fit is a straight line

y(x) = ax + b,

where a and b are the parameters we have to determine. For the sake of simplicity, all the experimental points y1, y2, ..., yN have the same uncertainty σy, and the uncertainties on x1, x2, ..., xN are negligible.

Applying the LSP to y(x), we get

$$\chi^2(\vec{x}; a, b) = \frac{1}{\sigma_y^2}\sum_{i=1}^{N}\left(y_i - a x_i - b\right)^2.$$


Computing the partial derivatives with respect to the parameters a and b, equating them to zero

$$\frac{\partial \chi^2}{\partial a} = 0, \qquad \frac{\partial \chi^2}{\partial b} = 0,$$

and solving the linear system for a and b, we finally get²

$$\hat{a} = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N}(x_i - \bar{x})^2}, \qquad \hat{b} = \frac{\bar{y}\sum_{i=1}^{N} x_i^2 - \bar{x}\sum_{i=1}^{N} x_i y_i}{\sum_{i=1}^{N}(x_i - \bar{x})^2},$$

where

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i.$$

The functions â and b̂ are the best estimators of the parameters a and b given by the LSP.

Uncertainties on â and b̂ can be estimated by applying the SPEL to the â and b̂ expressions. After some tedious algebra we get

$$\sigma_a \simeq \sigma_y\sqrt{\sum_{i=1}^{N}\left(\frac{\partial \hat{a}}{\partial y_i}\right)^2}, \qquad \sigma_b \simeq \sigma_y\sqrt{\sum_{i=1}^{N}\left(\frac{\partial \hat{b}}{\partial y_i}\right)^2},$$

where the partial derivatives with respect to x1, ..., xN are neglected because we are supposing that their uncertainties are negligible.
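A minimal sketch of these closed-form estimators, assuming hypothetical data with a common uncertainty σy, is given below; the expressions used for σ_a and σ_b are the result of evaluating the sums above (an evaluation carried out here, not spelled out in the notes).

```python
# A minimal sketch, assuming hypothetical data: closed-form least-square
# estimators for y = a*x + b when all y_i share the same uncertainty sigma_y.
import numpy as np

def linear_lsp(x, y, sigma_y):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x.mean(), y.mean()
    D = np.sum((x - xm) ** 2)

    a_hat = np.sum((x - xm) * (y - ym)) / D
    b_hat = (ym * np.sum(x**2) - xm * np.sum(x * y)) / D

    # Uncertainties from the statistical propagation of errors on the y_i.
    sigma_a = sigma_y / np.sqrt(D)
    sigma_b = sigma_y * np.sqrt(np.sum(x**2) / (x.size * D))
    return a_hat, b_hat, sigma_a, sigma_b

# Hypothetical data with sigma_y = 0.1 on every point.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
print(linear_lsp(x, y, 0.1))
```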

5.2.3 The Reduced χ² (Fit Goodness)

The Reduced χ2 is defined as

$$\chi^2/(N-d) = \frac{1}{N-d}\sum_{i=1}^{N}\left[\frac{y_i - y(x_i; \vec{\theta})}{\sigma_i}\right]^2,$$

where d is the number of parameters to be estimated and N − d is called the number of degrees of freedom.

²As already mentioned, the use of the “hat symbol” is just to distinguish the parameter from its estimator, i.e. â is the estimator of a.


The meaning of the reduced χ² can be intuitively understood as follows. Because the difference between the fitted value y(xi; ~θ) and the experimental value yi is statistically close to σi, we can naively estimate the χ² value to be about N(σi/σi)² = N. Dividing χ² by N, we therefore expect to obtain a number close to 1. A rigorous theoretical approach can explain the need to subtract d from N, but it is far from the intent of this introductory note. It is worthwhile to notice that this subtraction is not negligible for small values of N, comparable to d.

For N − d < 20, the reduced χ² should be slightly less than one.

The list reported below contains the main possible reasons for a reduced χ² not close to one:

• uncertainties on xi and/or on yi are too small ⇒ χ²/(N − d) > 1,

• uncertainties on xi and/or on yi are too large ⇒ χ²/(N − d) < 1,

• a small number of data points scattered too far from the theoretical curve ⇒ χ²/(N − d) > 1,

• wrong fitting function or poor physical model ⇒ χ²/(N − d) > 1,

• poor Gaussianity of the data-points distribution ⇒ χ²/(N − d) > 1.

A more rigorous statistical interpretation of the reduced χ² value implies the study of its PDF.
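A minimal sketch of the reduced χ² computation, with hypothetical data and model values, is:

```python
# A minimal sketch, assuming hypothetical data: reduced chi-square of a fit
# given the measurements, the fitted model values, and the uncertainties.
import numpy as np

def reduced_chi2(y, y_model, sigma, n_params):
    y, y_model, sigma = map(np.asarray, (y, y_model, sigma))
    chi2 = np.sum(((y - y_model) / sigma) ** 2)
    dof = y.size - n_params                  # N - d
    return chi2 / dof

# A value far above 1 hints at underestimated uncertainties or a wrong model;
# a value far below 1 hints at overestimated uncertainties.
y     = np.array([1.02, 2.05, 2.98, 4.10])
y_fit = np.array([1.00, 2.00, 3.00, 4.00])
sigma = np.array([0.05, 0.05, 0.05, 0.05])
print(reduced_chi2(y, y_fit, sigma, n_params=2))
```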

5.3 The LSP with the Effective Variance

To generalize the LSP to the case of appreciable uncertainties on the independent variable x, we can use the so-called effective variance, i.e.

$$\sigma_i^2 = \left(\frac{\partial f}{\partial x}\right)^2 \sigma_{x_i}^2 + \sigma_{y_i}^2,$$

where σ_{xi} and σ_{yi} are the uncertainties associated with xi and yi respectively. The derivative is evaluated at xi.

Replacing this new definition of σi in the previous definition of the χ², we take into account the effect of the uncertainty on x.

The proof of this formula is given in appendix D.
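A minimal sketch of a χ² built with the effective variance, assuming a hypothetical straight-line model and hypothetical data (such a function would then be minimized numerically, e.g. with a general-purpose optimizer), is:

```python
# A minimal sketch, assuming a hypothetical model f and data: a chi-square that
# uses the effective variance to account for uncertainties on both x and y.
import numpy as np

def chi2_effective(params, x, y, sigma_x, sigma_y, f, dfdx):
    """Chi-square with sigma_i^2 = (df/dx)^2 * sigma_x^2 + sigma_y^2."""
    sigma_eff2 = dfdx(x, params) ** 2 * sigma_x**2 + sigma_y**2
    return np.sum((y - f(x, params)) ** 2 / sigma_eff2)

# Example with a straight-line model f(x) = a*x + b, so df/dx = a.
f    = lambda x, p: p[0] * x + p[1]
dfdx = lambda x, p: p[0] * np.ones_like(x)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.2, 3.9, 6.1, 8.2])
sigma_x = np.full_like(x, 0.05)
sigma_y = np.full_like(y, 0.10)
print(chi2_effective([2.0, 0.0], x, y, sigma_x, sigma_y, f, dfdx))
```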


5.4 Fit Example (Thermistor)

Thermistors, devices made of a semiconductor material [5], are remarkably good candidates for studying fit techniques. In fact, the temperature dependence of their energy band gap Eg(T) makes the resistance vs. temperature characteristic ideal for illustrating some of the issues and aspects of fitting.

The thermistor response is described by the equation

$$R(T) = R_0\, e^{E_g(T)/2k_b T},$$

where R is the resistance, T is the temperature in kelvin, and kb is the Boltzmann constant.

Taking the logarithm of the previous expression, we will have

$$\log R = \log R_0 + \frac{E_g(T)}{2 k_b T}.$$

Neglecting the temperature dependence of Eg, we can linearize the thermistor response as follows

$$\frac{1}{T} = \frac{2k_b}{E_g}\log R - \frac{2k_b}{E_g}\log R_0, \qquad \Rightarrow \qquad y = \frac{1}{T}, \quad x = \log R.$$

Following what has been done in a published article [6], the corrections to the linear fit can be empirically introduced using a polynomial expansion in log R, i.e.

$$\frac{1}{T} = C_0 + C_1 \log R + C_2 \log^2 R + C_3 \log^3 R.$$

5.4.1 Linear Fit

Applying a linear curve fit, we obtain

$$\frac{1}{T} = C_0 + C_1 \log R$$

C0 = (2.67800 ± 0.0005) · 10⁻³ a.u.
C1 = (3.0074 ± 0.00024) · 10⁻⁴ a.u.
χ²/(N − 2) = 4727.5


It is clear from the difference plot in figure 5.1 and from the reduced χ² value that there is more than just a linear trend in the data set. In fact, the difference plot shows a quadratic curve instead of a random distribution of data points around the horizontal axis. It is worth noticing that it is quite hard to appreciate from figure 5.1 any difference between the straight line and the experimental points.

5.4.2 Quadratic Fit

Applying a quadratic curve fit, we obtain

$$\frac{1}{T} = C_0 + C_1 \log R + C_2 \log^2 R$$

C0 = (2.68800 ± 0.0005) · 10⁻³ a.u.
C1 = (2.7717 ± 0.00067) · 10⁻⁴ a.u.
C2 = (5.460 ± 0.014) · 10⁻⁶ a.u.
χ²/(N − 3) = 34.2

Again, it is difficult to appreciate any trend of the data points in the plot shown in figure 5.2. However, the reduced χ² is still larger than 1, and the difference plot clearly shows a residual trend in the data.

5.4.3 Cubic Fit

Applying a cubic curve fit, we obtain

$$\frac{1}{T} = C_0 + C_1 \log R + C_2 \log^2 R + C_3 \log^3 R$$

C0 = (2.6874 ± 0.0003) · 10⁻³ a.u.
C1 = (2.8021 ± 0.0012) · 10⁻⁴ a.u.
C2 = (3.376 ± 0.066) · 10⁻⁶ a.u.
C3 = (2.88 ± 0.09) · 10⁻⁷ a.u.
χ²/(N − 4) = 0.403

In this final case, the reduced χ² value smaller than one and the scatter of the data about the horizontal axis in the difference plot of figure 5.3 suggest that the assumed uncertainty on the temperature, σT = 0.02 K, is probably a little too large. Apart from indicating this slight overestimate of the data-point uncertainties, the difference plot does not seem to show any special trend.

The following table contains the data points used for the thermistor characteristic fits.

Point  Resist.  Temp.    ∆T    ∆R      Point  Resist.  Temp.    ∆T    ∆R
(#)    R(Ω)     T(K)     (K)   (Ω)     (#)    R(Ω)     T(K)     (K)   (Ω)
 1      0.76    383.15   0.02  0       17      10.00   298.15   0.02  0
 2      0.86    378.15   0.02  0       18      12.09   293.15   0.02  0
 3      0.97    373.15   0.02  0       19      14.68   288.15   0.02  0
 4      1.11    368.15   0.02  0       20      17.96   283.15   0.02  0
 5      1.45    358.15   0.02  0       21      22.05   278.15   0.02  0
 6      1.67    353.15   0.02  0       22      27.28   273.15   0.02  0
 7      1.92    348.15   0.02  0       23      33.89   268.15   0.02  0
 8      2.23    343.15   0.02  0       24      42.45   263.15   0.02  0
 9      2.59    338.15   0.02  0       25      53.39   258.15   0.02  0
10      3.02    333.15   0.02  0       26      67.74   253.15   0.02  0
11      3.54    328.15   0.02  0       27      86.39   248.15   0.02  0
12      4.16    323.15   0.02  0       28     111.3    243.15   0.02  0
13      4.91    318.15   0.02  0       29     144.0    238.15   0.02  0
14      5.83    313.15   0.02  0       30     188.4    233.15   0.02  0
15      6.94    308.15   0.02  0       31     247.5    228.15   0.02  0
16      8.31    303.15   0.02  0       32     329.2    223.15   0.02  0
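A minimal sketch of how fits of this kind could be reproduced from the table is given below. It is an assumed implementation, not the code used for the notes: it takes log R as the natural logarithm and propagates σ_{1/T} = σ_T/T² from the quoted σ_T = 0.02 K, and whether the printed coefficients match the values quoted above depends on such conventions and on rounding.

```python
# A minimal sketch, assuming the conventions stated above: weighted polynomial
# fits of 1/T versus log(R) for the thermistor data in the table.
import numpy as np

R = np.array([0.76, 0.86, 0.97, 1.11, 1.45, 1.67, 1.92, 2.23, 2.59, 3.02,
              3.54, 4.16, 4.91, 5.83, 6.94, 8.31, 10.00, 12.09, 14.68, 17.96,
              22.05, 27.28, 33.89, 42.45, 53.39, 67.74, 86.39, 111.3, 144.0,
              188.4, 247.5, 329.2])                                    # Ohm
T = np.array([383.15, 378.15, 373.15, 368.15, 358.15, 353.15, 348.15, 343.15,
              338.15, 333.15, 328.15, 323.15, 318.15, 313.15, 308.15, 303.15,
              298.15, 293.15, 288.15, 283.15, 278.15, 273.15, 268.15, 263.15,
              258.15, 253.15, 248.15, 243.15, 238.15, 233.15, 228.15, 223.15])  # K

x = np.log(R)                    # natural logarithm, as in the linearization above
y = 1.0 / T
sigma_y = 0.02 / T**2            # propagate sigma_T = 0.02 K to 1/T

for deg in (1, 2, 3):
    # np.polyfit expects weights 1/sigma (not 1/sigma^2) for Gaussian errors.
    coeffs = np.polyfit(x, y, deg, w=1.0 / sigma_y)
    chi2 = np.sum(((y - np.polyval(coeffs, x)) / sigma_y) ** 2)
    print(deg, coeffs[::-1], chi2 / (x.size - (deg + 1)))   # C0, C1, ... and reduced chi2
```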

5.5 Fit Example (Offset Constant)

The following data are measurements of the voltage-current (V versus I) characteristic of a silicon diode [5], i.e.

$$I(V) = I_0\left(e^{qV/\eta k_b T} - 1\right),$$

where I0 is the reverse bias current, q is the electron charge, kb the Boltzmann constant, T the absolute temperature, and η a parameter which is equal to 2 for silicon diodes.


Figure 5.1: Linear fit of the thermistor data (thermistor characteristic curve fit, 1/T (K) versus resistance (log(Ohm)), with the corresponding difference plot). The difference plot clearly shows the bad approximation of the linear fit.


Figure 5.2: Quadratic fit of the thermistor data (thermistor characteristic curve fit, 1/T (K) versus resistance (log(Ohm)), with the corresponding difference plot). The plot of the experimental points and the theoretical curve seems to show a good agreement. The difference plot clearly still shows the coarse approximation of the quadratic fit.


Figure 5.3: Cubic fit of the thermistor data (thermistor characteristic curve fit, 1/T (K) versus resistance (log(Ohm)), with the corresponding difference plot). The plot of the experimental points and the theoretical curve, and the difference plot, do not show any clear systematic difference between the experimental points and the theoretical curve.


Figure ?, which has been made using the following fitting curve

$$y = a e^{bx},$$

shows a systematic quadratic residual in the difference plot. Fitting the experimental points with the following curve

$$y = a e^{bx} + c,$$

the quadratic trend in the difference plot disappears. This effect is typical of fits in which an offset constant is not introduced, and it produces systematic trends in the difference plots.
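A minimal sketch of the two fits, with synthetic data standing in for the diode measurements and scipy.optimize.curve_fit as the fitting routine (the data and starting values are assumptions), is:

```python
# A minimal sketch, assuming synthetic data: fitting an exponential with and
# without an offset constant; only the second removes the systematic residual.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 40)
y = 0.5 * np.exp(4.0 * x) + 2.0 + rng.normal(0.0, 0.05, x.size)  # hypothetical data with offset

def no_offset(x, a, b):
    return a * np.exp(b * x)

def with_offset(x, a, b, c):
    return a * np.exp(b * x) + c

p1, _ = curve_fit(no_offset, x, y, p0=[1.0, 1.0])
p2, _ = curve_fit(with_offset, x, y, p0=[1.0, 1.0, 0.0])

# Compare the residuals: the first set keeps a systematic trend, the second
# scatters randomly around zero.
print(np.sum((y - no_offset(x, *p1)) ** 2), np.sum((y - with_offset(x, *p2)) ** 2))
```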

Appendix A

Central Limit Theorem

• Let's consider a set of N independent random variables ξ1, ξ2, ..., ξN, all having the same PDF p(ξ) with the following parameters¹

$$E[\xi_i] = \mu, \qquad V[\xi_i] = \sigma^2, \qquad \forall i \in \{1, 2, ..., N\}.$$

• Let's then consider the following random variable

$$x = \sum_{i=1}^{N} \xi_i.$$

The central limit theorem states that, for N → ∞, x is normally distributed, its expectation value is the sum of the single expectation values, and its variance is the sum of the single variances, i.e.

$$N \gg 1 \quad \Rightarrow \quad p(x) \simeq \frac{1}{\sqrt{2\pi N\sigma^2}}\, e^{-\frac{(x - N\mu)^2}{2N\sigma^2}}, \qquad E[x] = N\mu, \qquad V[x] = N\sigma^2.$$

The demonstration of the theorem is rather complicated and is beyond the purpose of this work.

It is worthwhile to notice that this theorem suggests a simple way to generate values of a normally distributed variable. In fact, each normally distributed value can be computed by just adding a large number of values of any given random variable. The σ and the µ can then be estimated from the set of generated values.
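A minimal sketch of this generation method, assuming sums of N uniform random numbers, is:

```python
# A minimal sketch, assuming uniform random numbers: generating approximately
# Gaussian values by summing many of them, as suggested by the theorem.
import numpy as np

rng = np.random.default_rng(4)
N = 100                                 # number of uniform values added per sample
n_samples = 10000

# Each row is one sample: the sum of N independent uniform(0,1) variables.
x = rng.uniform(0.0, 1.0, size=(n_samples, N)).sum(axis=1)

# Estimate mu and sigma from the generated set and compare with the expected
# values N*1/2 and sqrt(N/12) for the sum of N uniform(0,1) variables.
print(x.mean(), x.std(ddof=1))
print(N * 0.5, np.sqrt(N / 12.0))
```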

¹This theorem has a more general formulation. It was first proved by Laplace and then extended by other mathematicians such as P. L. Chebyshev, A. A. Markov, and A. M. Lyapunov.


Appendix B

Statistical Propagation of Errors

We want to find an approximate formula to compute the variance of a function of random variables by using the standard approach of the Taylor expansion.

Let f be a function of n random variables x = (x1, x2, ..., xn). The first-order Taylor expansion of f(x) about µ = (µ1, µ2, ..., µn), where

$$\mu_i = E[x_i], \qquad i = 1, 2, ..., n,$$

is

$$f(x) = f(\mu) + \sum_{i=1}^{n}\left.\frac{\partial f(x)}{\partial x_i}\right|_{x_i=\mu_i}(x_i - \mu_i) + O(x_i - \mu_i), \qquad \text{(B.1)}$$

where O indicates the terms of order higher than its argument.

Using (B.1) in the definition of the variance of f(x), we get

$$V[f(x)] = E\left[\left(f(x) - f(\mu)\right)^2\right] \qquad \text{(B.2)}$$
$$\simeq E\left[\left(\sum_{i=1}^{n}\left.\frac{\partial f(x)}{\partial x_i}\right|_{x_i=\mu_i}(x_i - \mu_i)\right)^2\right]. \qquad \text{(B.3)}$$

With the aid of the expectation value properties, we obtain

$$V[f(x)] = \sum_{i=1,\,j=1}^{n,\,n}\left.\frac{\partial f(x)}{\partial x_i}\right|_{x_i=\mu_i}\left.\frac{\partial f(x)}{\partial x_j}\right|_{x_j=\mu_j} E[(x_i - \mu_i)(x_j - \mu_j)].$$


Defining the covariance matrix Vij as follows

$$V_{ij} = E[(x_i - \mu_i)(x_j - \mu_j)], \qquad i, j = 1, 2, ..., n,$$

the previous expression of the variance of f finally becomes

$$V[f(x)] = \sum_{i=1,\,j=1}^{n,\,n}\left.\frac{\partial f(x)}{\partial x_i}\right|_{x_i=\mu_i}\left.\frac{\partial f(x)}{\partial x_j}\right|_{x_j=\mu_j} V_{ij}, \qquad \text{(B.4)}$$

which is the law of statistical propagation of errors.
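A minimal sketch of (B.4) applied numerically, with the partial derivatives estimated by finite differences and an assumed diagonal covariance matrix, is:

```python
# A minimal sketch, assuming a hypothetical function and covariance matrix:
# the law of statistical propagation of errors (B.4) evaluated numerically.
import numpy as np

def propagate_variance(f, mu, cov, eps=1e-6):
    """Return V[f(x)] ~ sum_ij (df/dx_i)(df/dx_j) V_ij evaluated at x = mu."""
    mu = np.asarray(mu, dtype=float)
    grad = np.empty_like(mu)
    for i in range(mu.size):
        step = np.zeros_like(mu)
        step[i] = eps
        grad[i] = (f(mu + step) - f(mu - step)) / (2.0 * eps)   # central difference
    return grad @ cov @ grad

# Example: f(x1, x2) = x1 * x2 with independent (diagonal covariance) inputs.
f = lambda x: x[0] * x[1]
mu = [2.0, 3.0]
cov = np.diag([0.1**2, 0.2**2])
print(propagate_variance(f, mu, cov))   # ~ (3*0.1)^2 + (2*0.2)^2 = 0.25
```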

Appendix C

NPDF Random Variable Uncertainties

Let's consider a set of independent measurements X = {x1, x2, ..., xN} of the same physical quantity x following the NPDF.

Measurement Set with no Uncertainties (Unweighted Case)

The uncertainty s of each single measurement xi, and the way to report it, are

$$s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2, \qquad \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \Rightarrow \qquad x = (x_i \pm s)\ \text{units}.$$

For the mean of the measurements x̄ and its uncertainty s_x̄, we have

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad s_{\bar{x}}^2 = \frac{s^2}{N}, \qquad \Rightarrow \qquad x = (\bar{x} \pm s_{\bar{x}})\ \text{units}.$$

Measurement Set with Uncertainties (Weighted Case)

If each measurement has an uncertainty, Σ = {σ1, σ2, ..., σN}, we will trivially have

$$x = (x_i \pm \sigma_i)\ \text{units}.$$

For the mean of the measurements and its uncertainty, we have


$$s_{\bar{x}}^2 = \frac{1}{\sum_{i=1}^{N} 1/\sigma_i^2}, \qquad \bar{x} = s_{\bar{x}}^2\sum_{i=1}^{N}\frac{x_i}{\sigma_i^2}, \qquad \Rightarrow \qquad x = (\bar{x} \pm s_{\bar{x}})\ \text{units}.$$

Appendix D

The Effective Variance

Let x be a normally distributed random variable and y = f(x) another random variable. Let's suppose that each data point (xi ± σ_{xi}, yi ± σ_{yi}), i = 1, ..., N, is distributed around the true values (x̄i, ȳi). Applying the MLP to y and x, we obtain

$$L(\bar{x}, \bar{y}) = \prod_{i=1}^{N}\frac{1}{\sigma_{x_i}\sqrt{2\pi}}\exp\left[-\frac{(x_i - \bar{x}_i)^2}{2\sigma_{x_i}^2}\right]\times\frac{1}{\sigma_{y_i}\sqrt{2\pi}}\exp\left[-\frac{(y_i - \bar{y}_i)^2}{2\sigma_{y_i}^2}\right] = \left(\prod_{i=1}^{N}\frac{1}{2\pi\,\sigma_{x_i}\sigma_{y_i}}\right)e^{-S},$$

where

$$S = \sum_{i=1}^{N}\left[\frac{(x_i - \bar{x}_i)^2}{2\sigma_{x_i}^2} + \frac{(y_i - \bar{y}_i)^2}{2\sigma_{y_i}^2}\right].$$

Making the following approximation

$$\bar{y}_i = f(\bar{x}_i) \simeq f(x_i) + (\bar{x}_i - x_i)f'(x_i),$$

S becomes

$$S \simeq \sum_{i=1}^{N}\left[\frac{(x_i - \bar{x}_i)^2}{2\sigma_{x_i}^2} + \frac{\left[f(x_i) + (\bar{x}_i - x_i)f'(x_i) - y_i\right]^2}{2\sigma_{y_i}^2}\right].$$

In this case, the condition for the maximization of L(x̄, ȳ), i.e. the minimization of S,

$$\frac{\partial S}{\partial \bar{x}_i} = 0, \qquad i = 1, ..., N,$$


gives

$$-\frac{x_i - \bar{x}_i}{\sigma_{x_i}^2} + \frac{\left[f(x_i) + (\bar{x}_i - x_i)f'(x_i) - y_i\right]f'(x_i)}{\sigma_{y_i}^2} = 0.$$

Solving with respect to the parameters x̄i, we obtain the expression

$$\bar{x}_i = x_i - \frac{\sigma_{x_i}^2}{\sigma_i^2}\left[f(x_i) - y_i\right]f'(x_i),$$

where

$$\sigma_i^2 = \left[f'(x_i)\right]^2\sigma_{x_i}^2 + \sigma_{y_i}^2.$$

Replacing this expression in the definition of S we obtain

$$S = \sum_{i=1}^{N}\frac{\left[y_i - f(x_i)\right]^2}{2\sigma_i^2}.$$

The previous expression is indeed the new S that must be minimized.

Bibliography

[1] P. R. Bevington, D. K. Robinson, Data Reduction and Error Analysis for the Physical Sciences, second edition, WCB McGraw-Hill.

[2] J. Orear, Least squares when both variables have uncertainties, Am. J. Phys. 50(10), Oct. 1982.

[3] S. G. Rabinovich, Measurement Errors and Uncertainties: Theory and Practice, second edition, Springer.

[4] C. L. Nikias, A. P. Petropulu, Higher-Order Spectral Analysis, PTR Prentice Hall.

[5] V. de O. Sannibale, Basics on Semiconductor Physics, Freshman Laboratory Notes, http://www.ligo.caltech.edu/~vsanni/ph3/

[6] Deep Sea Research, 1968, Vol. 15, pp. 497-501, Pergamon Press (printed in Great Britain).
