
Modern Methods of Data Analysis - SS 2009 Stephanie Hansmann-Menzemer

Modern Methods of Data Analysis

Lecture VI (05.05.09)

Contents:

● Uncertainties:
– Error propagation
– Correlated uncertainties
– Systematic uncertainties

● Introduction to Estimators


Error Propagation: 2D Example

● We would like to measure the speed v of a car; to do so, we measure the distance s and the time t (uncorrelated): v = s/t

● measurements of the distance s:

● measurements of the time t:

● first-order Taylor expansion:
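A minimal sketch of the 2D example, assuming uncorrelated s and t. Expanding v = s/t to first order around the measured values gives the derivatives ∂v/∂s = 1/t and ∂v/∂t = −s/t². The function name and all numbers are hypothetical and not taken from the lecture.

```python
import math

def speed_with_uncertainty(s, sigma_s, t, sigma_t):
    """First-order error propagation for v = s/t with uncorrelated s and t."""
    v = s / t
    dv_ds = 1.0 / t        # partial derivative of v with respect to s
    dv_dt = -s / t**2      # partial derivative of v with respect to t
    sigma_v = math.sqrt((dv_ds * sigma_s) ** 2 + (dv_dt * sigma_t) ** 2)
    return v, sigma_v

# Hypothetical measurement: 100 m +/- 0.5 m in 8 s +/- 0.2 s
v, sigma_v = speed_with_uncertainty(s=100.0, sigma_s=0.5, t=8.0, sigma_t=0.2)
print(f"v = {v:.2f} +/- {sigma_v:.2f} m/s")
```

For a quotient, this is equivalent to adding the relative uncertainties of s and t in quadrature.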


Error Propagation (I)

● x = (x_1, ..., x_n) is a set of random variables

● the covariance matrix V_ij and the mean values µ_i are known

● y(x) is a function of x

● first-order Taylor expansion ...


Error Propagation (II)

(annotations in the derivation: E[x_i − µ_i] = 0 and E[(x_i − µ_i)(x_j − µ_j)] = V_ij)


Gaussian error propagation

● Error estimate for a function y(x) of several correlated variables:

σ_y² = Σ_i (∂y/∂x_i)² σ_i² + Σ_{i≠j} (∂y/∂x_i)(∂y/∂x_j) V_ij

– first sum: normal errors for uncorrelated variables
– second sum: additional terms accounting for correlations

Special case, uncorrelated variables:

σ_y² = Σ_i (∂y/∂x_i)² σ_i²

This is called Gaussian error propagation; however, it has nothing to do with Gaussian distributions.
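A small sketch of this propagation for one function of several correlated variables. The derivatives are approximated by central finite differences, which is a choice made here for illustration and not something the slides prescribe. The function name, step size and example numbers are hypothetical.

```python
def propagate_variance(func, means, cov, rel_step=1e-6):
    """Variance of y = func(x) from first-order error propagation.

    The double sum over i and j covers both the diagonal terms
    (uncorrelated part) and the off-diagonal correlation terms.
    """
    n = len(means)
    grad = []
    for i in range(n):
        h = rel_step * max(abs(means[i]), 1.0)
        up = list(means)
        down = list(means)
        up[i] += h
        down[i] -= h
        grad.append((func(up) - func(down)) / (2.0 * h))  # dy/dx_i at the means
    return sum(grad[i] * grad[j] * cov[i][j] for i in range(n) for j in range(n))

# Hypothetical example: y = x1 * x2 with positively correlated inputs
means = [2.0, 3.0]
cov = [[0.01, 0.005],
       [0.005, 0.04]]
var_y = propagate_variance(lambda x: x[0] * x[1], means, cov)

# Same inputs with the correlation switched off
cov_uncorr = [[0.01, 0.0], [0.0, 0.04]]
var_y_uncorr = propagate_variance(lambda x: x[0] * x[1], means, cov_uncorr)
print(var_y, var_y_uncorr)
```

In this example the positive correlation increases the variance of the product; for a ratio it would decrease it.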


And the same in more dimensions

V_y = A V_x Aᵀ, with A_ki = ∂y_k/∂x_i evaluated at the mean values

(A is the Jacobian matrix)
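The matrix form can be sketched with plain nested lists. The helper name and the example transformation (sum and difference of two uncorrelated variables) are assumptions chosen for illustration.

```python
def propagate_covariance(jacobian, cov_x):
    """Return V_y = A V_x A^T for matrices given as nested lists."""
    n_y, n_x = len(jacobian), len(cov_x)
    return [[sum(jacobian[k][i] * cov_x[i][j] * jacobian[m][j]
                 for i in range(n_x) for j in range(n_x))
             for m in range(n_y)]
            for k in range(n_y)]

# Hypothetical example: y1 = x1 + x2, y2 = x1 - x2
A = [[1.0, 1.0],
     [1.0, -1.0]]
V_x = [[1.0, 0.0],     # x1 and x2 uncorrelated, variances 1 and 4
       [0.0, 4.0]]
V_y = propagate_covariance(A, V_x)
print(V_y)
```

Although x1 and x2 are uncorrelated, y1 and y2 come out correlated because they share the same inputs.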


Be aware ...

● The approximation using the Taylor expansion breaks down if the function is significantly non-linear in the region ± 1σ around the mean value.

Example: momentum estimate in a B field; p ~ 1/κ

10 % momentum bias!


Second-Order Taylor Expansion
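A toy Monte Carlo sketch of the bias for y = 1/κ, assuming a Gaussian-distributed κ. To second order, E[y] ≈ y(µ) + ½ y''(µ) σ², which for y = 1/κ gives (1/κ₀)(1 + σ²/κ₀²). The values of κ₀ and the 10 % resolution are hypothetical; with these numbers the bias is about 1 %, so this does not reproduce the 10 % figure from the previous slide.

```python
import random
import statistics

random.seed(1)
kappa0 = 1.0        # hypothetical true curvature (arbitrary units)
rel_sigma = 0.1     # hypothetical Gaussian resolution on kappa (10 %)
n_samples = 200_000

# Sample kappa, transform each value to 1/kappa, then average
samples = [1.0 / random.gauss(kappa0, rel_sigma * kappa0) for _ in range(n_samples)]
mc_mean = statistics.fmean(samples)

first_order = 1.0 / kappa0                            # y(mu), no bias term
second_order = first_order * (1.0 + rel_sigma ** 2)   # adds 0.5 * y''(mu) * sigma^2

print(f"Monte Carlo mean of 1/kappa: {mc_mean:.4f}")
print(f"first-order expectation:     {first_order:.4f}")
print(f"second-order expectation:    {second_order:.4f}")
```

The first-order expansion predicts no shift of the mean at all; the second-order term recovers most of the bias seen in the toy simulation.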


Example Systematic Error

Measurements are taken with a steel ruler; the ruler was calibrated at 15 °C, the measurements were done at 22 °C.

● This is a systematic bias and not a systematic uncertainty! Neglecting this effect is a systematic mistake.

● Such effects can be corrected for! If the temperature coefficient and the lab temperature are known (exactly), then there is no systematic uncertainty.

● If we correct for the effect, but the corrections are not known exactly, then we have to introduce a systematic uncertainty.

● In practice (unfortunately), such effects are often not corrected for, but just “included in the sys. uncertainties”.
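A sketch of the steel ruler case: correct the reading for thermal expansion and propagate the uncertainty of the correction. The expansion coefficient of 1.2e-5 per kelvin is an assumed typical value for steel; the reading, the uncertainty on the coefficient and the uncertainty on the lab temperature are hypothetical.

```python
import math

def corrected_length(reading, alpha, sigma_alpha, t_cal, t_meas, sigma_t_meas):
    """Correct a ruler reading for thermal expansion of the ruler.

    A ruler used above its calibration temperature has expanded, so it
    reads too short: true length = reading * (1 + alpha * dT).
    Returns the corrected length and the systematic uncertainty that
    remains because alpha and the lab temperature are not known exactly.
    """
    d_temp = t_meas - t_cal
    length = reading * (1.0 + alpha * d_temp)
    sigma_sys = reading * math.sqrt((d_temp * sigma_alpha) ** 2
                                    + (alpha * sigma_t_meas) ** 2)
    return length, sigma_sys

length, sigma_sys = corrected_length(
    reading=1000.0,        # mm, hypothetical reading
    alpha=1.2e-5,          # 1/K, assumed typical value for steel
    sigma_alpha=0.1e-5,    # 1/K, hypothetical
    t_cal=15.0, t_meas=22.0,
    sigma_t_meas=0.5,      # K, hypothetical
)
print(f"corrected length = {length:.3f} mm, remaining sys. uncertainty = {sigma_sys:.3f} mm")
```

With these numbers the bias (0.084 mm) is about nine times larger than the systematic uncertainty left after the correction.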


Systematic Error

● Definition: A systematic uncertainty denotes the uncertainty in effects caused by systematic mistakes (bias) and caused by neglecting systematic mistakes.

● A systematic mistake is not a systematic uncertainty.

● Note:
– sys. errors do NOT decrease with 1/√N
– statistical and systematic errors can in general be added in quadrature (if uncorrelated; else include correlations)
– they need to be quoted separately in the results, since systematic errors are often correlated among experiments:

m(B0) = 5279.63 ± 0.53 (stat) ± 0.33 (sys)
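As a short illustration with the numbers quoted above, the quadrature sum and its behaviour for larger samples can be sketched as follows. The helper function and the assumption that the systematic part stays fixed while the statistical part scales with 1/√N are illustrative.

```python
import math

stat, syst = 0.53, 0.33                # uncertainties quoted for m(B0)
total = math.hypot(stat, syst)         # quadrature sum, valid if uncorrelated
print(f"total uncertainty: {total:.2f}")

def total_after_more_data(factor):
    """Total uncertainty if the sample grew by `factor`.

    Assumption: only the statistical part shrinks (as 1/sqrt(N));
    the systematic part stays as it is.
    """
    return math.hypot(stat / math.sqrt(factor), syst)

for factor in (1, 4, 100):
    print(factor, round(total_after_more_data(factor), 3))
```

The total approaches the systematic uncertainty from above, which is one reason to quote the two parts separately.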


Combining Errors (I)

● Suppose you have two measurements x_1 and x_2, with random (statistical) uncertainties σ_1 and σ_2 and a common systematic error S. How do we construct the covariance matrix?


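Combining Errors (II)

● Consider x_1 and x_2 as sums built from three uncorrelated random variables: x_1 = r_1 + s and x_2 = r_2 + s, where r_1 and r_2 carry the statistical uncertainties σ_1 and σ_2, and s carries the common systematic uncertainty S.

Now just use error propagation ...

V_11 = σ_1² + S², V_22 = σ_2² + S², V_12 = V_21 = S²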


Combining Errors (II)

More extended case: three measurements with one systematic uncertainty S common to all of them, and one systematic uncertainty T common to two of the measurements.

Consider x_1, x_2 and x_3 as sums of random variables and assign the corresponding uncertainties.
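A sketch of how such covariance matrices can be built: each measurement gets its own statistical variance on the diagonal, and every shared systematic source adds its variance to all matrix elements of the measurements it affects. The function name, the data layout and the numbers are hypothetical.

```python
def covariance_matrix(stat_sigmas, shared_sources):
    """Build the covariance matrix of several measurements.

    stat_sigmas    : statistical uncertainty of each measurement
    shared_sources : list of (sigma, indices) pairs; each source is fully
                     correlated between the measurements listed in indices
    """
    n = len(stat_sigmas)
    cov = [[0.0] * n for _ in range(n)]
    for i, sigma in enumerate(stat_sigmas):
        cov[i][i] += sigma ** 2
    for sigma, members in shared_sources:
        for i in members:
            for j in members:
                cov[i][j] += sigma ** 2
    return cov

S, T = 0.2, 0.1   # hypothetical systematic uncertainties

# Two measurements with one common systematic S
cov2 = covariance_matrix([0.3, 0.4], [(S, [0, 1])])

# Three measurements: S common to all, T common to the first two
cov3 = covariance_matrix([0.3, 0.4, 0.5], [(S, [0, 1, 2]), (T, [0, 1])])

for row in cov3:
    print([round(v, 4) for v in row])
```

The off-diagonal elements contain only the shared sources, so measurements without a common systematic stay uncorrelated.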


Example: Pendulum

● Measure the length of a bar by measuring the period of a pendulum. Take two time measurements at different temperatures. Compute the difference in length and the associated uncertainties.

● Given: statistical uncertainties on the time measurements, an additional common systematic uncertainty on the time measurement, and a common systematic uncertainty on g.
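One possible way to work through this example, assuming the ideal pendulum relation L = g T² / (4π²) and treating the common time systematic as the same additive shift in both periods. Both assumptions, the function name and all numbers are illustrative and may differ from the solution shown in the lecture.

```python
import math

def length_difference(T1, T2, sigma_stat1, sigma_stat2, sigma_T_sys, g, sigma_g):
    """Difference L1 - L2 with L = g T^2 / (4 pi^2), first-order propagation.

    The four inputs (stat. part of T1, stat. part of T2, common time shift,
    g) are treated as uncorrelated, so their contributions add in quadrature.
    """
    k = 1.0 / (4.0 * math.pi ** 2)
    delta_L = g * k * (T1 ** 2 - T2 ** 2)
    contributions = {
        "stat_T1": 2.0 * g * k * T1 * sigma_stat1,
        "stat_T2": 2.0 * g * k * T2 * sigma_stat2,
        "sys_T": 2.0 * g * k * (T1 - T2) * sigma_T_sys,   # common shift mostly cancels
        "sys_g": k * (T1 ** 2 - T2 ** 2) * sigma_g,
    }
    sigma = math.sqrt(sum(c ** 2 for c in contributions.values()))
    return delta_L, sigma, contributions

# Hypothetical inputs (seconds and m/s^2)
delta_L, sigma, parts = length_difference(
    T1=2.007, T2=2.006, sigma_stat1=0.0002, sigma_stat2=0.0002,
    sigma_T_sys=0.001, g=9.81, sigma_g=0.01)
print(f"delta L = {delta_L * 1e3:.3f} mm +/- {sigma * 1e3:.3f} mm")
for name, value in parts.items():
    print(f"  {name}: {abs(value) * 1e3:.4f} mm")
```

In this sketch the common systematics enter only through the small difference T1 − T2 (or T1² − T2²), so they largely cancel in the difference of lengths while the statistical uncertainties do not.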


Evaluating Systematic Errors (I)

● Distinguish systematic errors from known and from unsuspected sources

● known sources:
– errors on factors in the analysis: energy calibration, tracking efficiencies, corrections, ...
– errors on external input: theory errors, errors on branching ratios, masses, fragmentation

● evaluate systematic uncertainties from known sources s(i) on the result R:
– take several typical assumptions on s(i) and compute R for each of them; compute the standard deviation of R
– take two extreme assumptions and compute R for each; take the difference of the results divided by √12
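The two recipes can be sketched as below. The factor √12 is the ratio between the width of a uniform distribution and its standard deviation, so the second recipe treats the two extremes as the edges of a flat distribution. The function names and result values are hypothetical, and the use of the sample standard deviation in the first recipe is a choice made here.

```python
import math
import statistics

def syst_from_variations(results):
    """Standard deviation of R over several typical assumptions."""
    return statistics.stdev(results)

def syst_from_extremes(r_low, r_high):
    """Difference between two extreme assumptions divided by sqrt(12)."""
    return abs(r_high - r_low) / math.sqrt(12.0)

# Hypothetical results R obtained under different assumptions on one source
typical_results = [10.1, 10.3, 9.9, 10.2]
sigma_typical = syst_from_variations(typical_results)
sigma_extreme = syst_from_extremes(9.8, 10.4)

print(f"from typical variations: {sigma_typical:.3f}")
print(f"from extreme variations: {sigma_extreme:.3f}")
```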


Evaluating Systematic Errors (II)

● Errors from unsuspected sources first need to be identified

● repeating the analysis in a different form helps to find systematic effects:
– vary the range of data used for the extraction of the result, use subsets of the data
– change cuts, change histogram binning
– change parameterizations, change fit techniques
– look for impossibilities

● It is clearly wrong to add the resulting deviations from the checklist in quadrature as a systematic error; this is a misconception. Otherwise, the more careful you are and the more checks you do, the bigger your systematic error would become. This cannot be right!


Evaluating Systematic Errors (III)

● define pass/fail criteria before doing the consistency checks. Remember: with 20 checks you expect on average one 2σ deviation. However, the uncertainties are highly correlated!

● if you do not expect a systematic effect a priori and the deviation is not significant, then do not add it to the systematic error

● if there is a deviation, try to understand where the mistake in the analysis is and fix it!

● only as a last resort include the discrepancy in the systematic error
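The statement about 20 checks can be verified with a few lines, assuming Gaussian deviations and, for simplicity, independent checks (which, as noted above, real checks usually are not).

```python
import math

# Two-sided probability for a Gaussian deviation of more than 2 sigma
p_2sigma = math.erfc(2.0 / math.sqrt(2.0))

n_checks = 20
expected_deviations = n_checks * p_2sigma
# Assumption: independent checks
p_at_least_one = 1.0 - (1.0 - p_2sigma) ** n_checks

print(f"P(|deviation| > 2 sigma)      = {p_2sigma:.4f}")
print(f"expected number in 20 checks  = {expected_deviations:.2f}")
print(f"P(at least one) if independent = {p_at_least_one:.2f}")
```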


Evaluating Systematic Errors (IV)

● Conservative estimate of uncertainties ...

● Physicists tend to overestimate their systematics:

– “If we estimate them conservatively, we are safe in case we have forgotten to evaluate one source.”

– How can we be sure that this unidentified source is covered by the conservative uncertainties?

– This is (commonly used) nonsense.


Estimators


Estimator

● Definition: An estimator is a procedure applied to a data sample which gives a numerical value for a property or parameter of the underlying parent distribution.

● We are mainly concerned with 3 properties of estimators:

– Consistency

– Bias

– Efficiency


Consistency/Bias/Efficiency

● Estimator â of the true value a: if â → a for N → ∞, then â is a consistent estimator

● if E[â] = a, then â is an unbiased estimator

● if V[â_1] < V[â_2], then â_1 is more efficient than â_2
– there is a most efficient estimator
– the most efficient estimator is very often biased; however, this is no reason not to use it!


Example

● You have a dataset distributed according to a Gaussian of unknown mean µ

● Give examples of unbiased estimators for µ.

● Give examples for biased but consistent estimators for µ.
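One possible pair of answers, checked with a toy simulation: the arithmetic mean is unbiased, while the sum divided by N + 1 is biased for finite N but consistent, since its expectation µ·N/(N + 1) approaches µ for large N. The second estimator, the function names and the true parameters are illustrative choices, not taken from the lecture.

```python
import random
import statistics

random.seed(7)
MU, SIGMA = 10.0, 1.0   # hypothetical true parameters

def mean_estimator(sample):
    """Arithmetic mean: unbiased and consistent."""
    return sum(sample) / len(sample)

def shrunk_estimator(sample):
    """Sum divided by N + 1: biased for finite N, but consistent."""
    return sum(sample) / (len(sample) + 1)

def expectation(estimator, n, n_experiments=2000):
    """Average of the estimator over many toy experiments of size n."""
    return statistics.fmean(
        estimator([random.gauss(MU, SIGMA) for _ in range(n)])
        for _ in range(n_experiments))

results = {}
for n in (5, 200):
    results[n] = (expectation(mean_estimator, n), expectation(shrunk_estimator, n))
    print(n, [round(value, 3) for value in results[n]])
```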


Likelihood Function

● How to compute or ?

● Likelihood L: L(x_1, ..., x_N | a) = Π_i f(x_i | a)

f: pdf depending on the parameter a


Minimum Variance Bound (I)

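Given an unbiased estimator â: E[â] = ∫ â L dx = a (with dx = dx_1 ... dx_N)

-> differentiate with respect to a: ∫ â (∂L/∂a) dx = 1

with the normalization ∫ L dx = 1

-> differentiate with respect to a: ∫ (∂L/∂a) dx = 0

-> ∫ (â − a) (∂L/∂a) dx = 1 -> E[(â − a) ∂ln L/∂a] = 1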


Minimum Variance Bound (II)

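Using the Schwarz inequality:

E[(â − a)²] · E[(∂ln L/∂a)²] ≥ (E[(â − a) ∂ln L/∂a])² = 1

-> V[â] ≥ 1 / E[(∂ln L/∂a)²] (minimum variance bound, MVB)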


Minimum Variance Bound (III)

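● If V[â] = MVB, the estimator is efficient.

● Otherwise its efficiency is defined as MVB / V[â].

● I(a) = E[(∂ln L/∂a)²] is called the Fisher information.

● The MVB is also called the Rao-Cramér-Fréchet bound.

● For biased estimators with bias b(a): V[â] ≥ (1 + ∂b/∂a)² / I(a)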


Some side calculation ...


Is the Arithmetic Mean Efficient for a Gaussian?

-> The arithmetic mean is an efficient estimator.
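A toy check of this statement: for a Gaussian with known σ the minimum variance bound on the mean is σ²/N, and the efficiency of an estimator is this bound divided by its variance. The comparison with the median, the sample size and the number of toy experiments are choices made here for illustration.

```python
import random
import statistics

random.seed(3)
N, N_EXPERIMENTS, SIGMA = 25, 4000, 1.0   # hypothetical settings

means, medians = [], []
for _ in range(N_EXPERIMENTS):
    sample = [random.gauss(0.0, SIGMA) for _ in range(N)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

mvb = SIGMA ** 2 / N                      # minimum variance bound for the mean
eff_mean = mvb / statistics.pvariance(means)
eff_median = mvb / statistics.pvariance(medians)

print(f"efficiency of the arithmetic mean: {eff_mean:.2f}")
print(f"efficiency of the median:          {eff_median:.2f}")
```

Within the statistical precision of the toy experiments the mean reaches the bound, while the median stays clearly below it.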


Example: exponential pdf

● Exponential distribution with unknown parameter τ: f(t|τ) = (1/τ) exp(−t/τ)

E[t] = τ, V[t] = τ²

Is the arithmetic mean the best unbiased estimator?
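A sketch of how the question can be checked. For f(t|τ) = (1/τ) exp(−t/τ), ln L = −N ln τ − Σ t_i/τ, so ∂ln L/∂τ = N (t̄ − τ)/τ² and the bound becomes τ²/N, which equals V[t̄] = τ²/N. The toy simulation below estimates both sides numerically; τ, N and the number of experiments are hypothetical.

```python
import random
import statistics

random.seed(11)
TAU, N, N_EXPERIMENTS = 2.0, 20, 5000   # hypothetical settings

means, scores = [], []
for _ in range(N_EXPERIMENTS):
    sample = [random.expovariate(1.0 / TAU) for _ in range(N)]
    t_bar = statistics.fmean(sample)
    means.append(t_bar)
    scores.append(N * (t_bar - TAU) / TAU ** 2)   # d ln L / d tau at the true tau

fisher_mc = statistics.fmean(s * s for s in scores)   # estimate of E[(d ln L / d tau)^2]
mvb_mc = 1.0 / fisher_mc
mvb_exact = TAU ** 2 / N
var_mean = statistics.pvariance(means)

print(f"mean of the estimates: {statistics.fmean(means):.3f} (true tau = {TAU})")
print(f"variance of the mean:  {var_mean:.4f}")
print(f"MVB (toy / exact):     {mvb_mc:.4f} / {mvb_exact:.4f}")
```

The arithmetic mean is unbiased and its variance agrees with the bound, so under these assumptions it is an efficient estimator for τ.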


Re: Truncated Mean

● truncated mean (German: “getrimmter Mittelwert”):
– e.g. r = 40% truncated mean: the 10% lowest and 10% highest values are ignored; calculate the mean of the 80% central values
– r = 50% truncated mean -> arithmetic mean
– r -> 0% -> median

The r = 0.23 truncated mean is the best estimator for an unknown symmetric distribution.
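A minimal sketch of the truncated mean in the convention used above, where a fraction 0.5 − r of the values is dropped at each end. The function name, the rounding of the number of dropped values and the example data are assumptions.

```python
def truncated_mean(values, r):
    """Mean of the central fraction 2*r of the sorted values (0 < r <= 0.5).

    r = 0.5 keeps everything (arithmetic mean); for r -> 0 only the
    central value(s) remain, which gives the median.
    """
    if not 0.0 < r <= 0.5:
        raise ValueError("r must be in (0, 0.5]")
    data = sorted(values)
    n = len(data)
    n_drop = int(round((0.5 - r) * n))     # values dropped at each end
    n_drop = min(n_drop, (n - 1) // 2)     # always keep the central value(s)
    kept = data[n_drop:n - n_drop]
    return sum(kept) / len(kept)

# Hypothetical sample with one outlier
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
print(truncated_mean(sample, 0.5))    # arithmetic mean, pulled by the outlier
print(truncated_mean(sample, 0.4))    # lowest and highest value ignored
print(truncated_mean(sample, 0.01))   # close to r -> 0: the median
```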


Maximum Likelihood (I)

● Assume N measurements x_1, ..., x_N of a random variable x

● Assume them to be independent and distributed according to a known pdf f(x|a) with unknown parameter a

● We want to get the best estimate for the true parameter a.

● General concept: Maximum Likelihood (= ML) estimation

● the probability to have the first measurement in [x_1, x_1 + dx_1] is f(x_1|a) dx_1

● joint probability: Π_i f(x_i|a) dx_i

● Define the likelihood function as the joint pdf: L(a) = Π_i f(x_i|a)


Maximum Likelihood (II)

● Note that the likelihood is a sampling function, i.e., a random variable; it is not an analytical probability density of the true parameter a. Subtlety: it is called likelihood and not probability!

● For a given set of measurements x_1, ..., x_N, the likelihood function L = L(a) is the probability (density) to obtain exactly these values for a given value of a.

● If the hypothesis f(x|a) and the parameter a are correct, then we expect a high probability for the measured data set.


Maximum Likelihood (III)

● According to the ML principle, the best estimate of a is the value which maximizes L(a), i.e., which maximizes the probability to obtain the observed data: L(a) = maximum

● It is very important to keep the normalization of f(x|a) in every step. The maximum is computed from dL/da = 0

● Or for several parameters simultaneously: ∂L/∂a_k = 0 for all k; often only numerically solvable!


Maximum Likelihood (IV)

● In practice one works with the (natural) logarithm of the likelihood, the so-called log-likelihood. The logarithm is a monotonically rising function, thus L and ln L have their maximum at the same place.

● Often numerically easier: minimise −ln L(a) -> negative log-likelihood

● Note: The ML estimation gives a value â. For this value, the observed data correspond to the most probable (plausible) measurement (compared to other values of the parameter a). This does not mean that â is the most probable value.
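A minimal sketch of a numerical ML fit, here for the exponential pdf f(t|τ) = (1/τ) exp(−t/τ): the negative log-likelihood is minimised with a golden-section search, which is a simple choice made for this illustration (real analyses typically use a dedicated minimiser). The toy data, the search interval and the function names are hypothetical. For this pdf the result can be compared with the analytic solution, the arithmetic mean.

```python
import math
import random

def neg_log_likelihood(tau, data):
    """-ln L(tau) for the normalised pdf f(t|tau) = exp(-t/tau) / tau."""
    return sum(math.log(tau) + t / tau for t in data)

def golden_section_minimize(func, lo, hi, tol=1e-6):
    """Minimise a function with a single minimum inside [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if func(c) < func(d):
            b = d          # minimum lies in [a, d]
        else:
            a = c          # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)

random.seed(5)
true_tau = 2.0                                   # hypothetical true parameter
data = [random.expovariate(1.0 / true_tau) for _ in range(1000)]

tau_hat = golden_section_minimize(lambda tau: neg_log_likelihood(tau, data), 0.1, 10.0)
sample_mean = sum(data) / len(data)

print(f"numerical ML estimate: {tau_hat:.4f}")
print(f"arithmetic mean:       {sample_mean:.4f}")
```

The log term keeps the normalisation of the pdf; dropping it would move the minimum, which is why the normalisation has to be kept in every step.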