Modern Methods of Data Analysis - WS 07/08 Stephanie Hansmann-Menzemer
Modern Methods of Data Analysis
Lecture II (22.10.07)
Contents:
● Characterize distributions: average, spread ...
● Correlations, covariance
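Reminder: Average
● arithmetic mean of data set: $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$
● weighted mean of data set: $\bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i}$
● mode: most probable value (peak of the distribution)
● median: smallest value that is ≥ 50% of the events. When outliers are present, use the median rather than the mean: it is more robust against them!
● quantiles are defined similarly: the median is the 50% quantile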
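As a small illustration of the robustness argument, here is a sketch with made-up numbers (the data values and weights are assumptions, not taken from the lecture). Note that `np.median` interpolates between the two central values for even N, which differs slightly from the "smallest value" definition above.

```python
import numpy as np

# Hypothetical sample: five well-behaved measurements plus one outlier.
data = np.array([4.8, 5.1, 4.9, 5.2, 5.0, 50.0])

mean = data.mean()                           # arithmetic mean, pulled up by the outlier
median = np.median(data)                     # 50% quantile, barely affected
q25, q75 = np.quantile(data, [0.25, 0.75])   # other quantiles work the same way

# Weighted mean: here the outlier gets a small (assumed) weight.
weights = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 0.01])
weighted_mean = np.sum(weights * data) / np.sum(weights)

print(f"mean = {mean:.2f}, median = {median:.2f}, weighted mean = {weighted_mean:.2f}")
```

A single outlier moves the mean far away from the bulk of the data, while the median stays next to it.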
Reminder: Variance
● the mean square deviation is called the sample variance: $\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2$
● RMS (root mean square): standard deviation σ, the square root of the variance
● FWHM (Full Width at Half Maximum)
FWHM is more robust against outliers than RMS! For describing the core of a distribution use FWHM, for describing the tails use RMS.
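The following sketch compares RMS and FWHM on a toy sample. The Gaussian core with 1% flat outliers, the binning, and the helper names `rms_spread` and `fwhm_from_histogram` are all assumptions for illustration. For a pure Gaussian one expects FWHM = 2√(2 ln 2) σ ≈ 2.355 σ.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
core = rng.normal(loc=0.0, scale=1.0, size=10_000)   # Gaussian core, sigma = 1
tail = rng.uniform(low=-50.0, high=50.0, size=100)   # 1% far outliers (assumed)
sample = np.concatenate([core, tail])

def rms_spread(x):
    """Root of the mean square deviation from the sample mean."""
    return np.sqrt(np.mean((x - np.mean(x)) ** 2))

def fwhm_from_histogram(x, bins, value_range):
    """Crude FWHM: distance between the outermost bins above half maximum."""
    counts, edges = np.histogram(x, bins=bins, range=value_range)
    above = np.nonzero(counts >= counts.max() / 2.0)[0]
    return edges[above[-1] + 1] - edges[above[0]]

rms_core, rms_all = rms_spread(core), rms_spread(sample)
fwhm_core = fwhm_from_histogram(core, bins=100, value_range=(-5.0, 5.0))
fwhm_all = fwhm_from_histogram(sample, bins=100, value_range=(-5.0, 5.0))

print(f"RMS : core only = {rms_core:.2f}, with outliers = {rms_all:.2f}")
print(f"FWHM: core only = {fwhm_core:.2f}, with outliers = {fwhm_all:.2f}")
```

The outliers inflate the RMS considerably, while the histogram-based FWHM stays at the width of the core.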
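Expectation Values
● So far we have characterized a given realization of an experiment (sum over N) by its sample mean, sample spread ...
● Now we talk about the mean and spread of a distribution: $E[x] = \int x\, f(x)\, dx = \mu$
Note: in general $\bar{x} \neq \mu$. However, for N → ∞, $\bar{x} \to \mu$ (law of large numbers).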
Variance of a Distribution:
● V[x] = E[(x-μ)²] = $\int (x-\mu)^2 f(x)\,dx$
● V[x] = E[(x-μ)²] = E[x²] – 2µE[x] + µ²
● V[x] = E[x²] – µ²
V[x] is a measure of the spread of the distribution, not of how well the mean is determined!
Example:
[Figure: samples of size N = 100, N = 1000 and N = 10000 drawn from a distribution with µ = 5, σ = 1]
How to determine uncertainty on the mean?
● E[$\bar{x}$] = ???
● V[$\bar{x}$] = ???
Uncertainty of Mean
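A standard way to obtain the expectation value and variance of the sample mean, sketched under the assumption of N independent measurements $x_i$ drawn from the same distribution with mean µ and variance σ²:

```latex
\begin{align}
E[\bar{x}] &= E\Big[\frac{1}{N}\sum_{i=1}^{N} x_i\Big]
            = \frac{1}{N}\sum_{i=1}^{N} E[x_i] = \mu \\
V[\bar{x}] &= V\Big[\frac{1}{N}\sum_{i=1}^{N} x_i\Big]
            = \frac{1}{N^2}\sum_{i=1}^{N} V[x_i] = \frac{\sigma^2}{N}
\quad\Rightarrow\quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}}
\end{align}
```

The second line uses independence: the variance of a sum of independent variables is the sum of their variances.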
● CDF has a mass resolution of 16 MeV
– the reconstructed mass of a single B meson is spread around the true B mass with σ = 16 MeV
● The B mass can be measured with far better precision:
m(B0) = 5279.63 ± 0.53 (stat) ± 0.33 (sys) MeV
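A rough numerical illustration of why the mean is known so much better than a single measurement, assuming the statistical error scales as σ/√N. The "implied N" below is only an order-of-magnitude estimate under that assumption; it is not a number from the lecture and ignores background and fit details.

```python
import math

sigma_single = 16.0   # MeV, single-candidate mass resolution quoted above
stat_error = 0.53     # MeV, statistical uncertainty quoted above

# If the statistical error were purely sigma / sqrt(N), the implied number
# of signal candidates would be (sigma / error)^2.
implied_n = (sigma_single / stat_error) ** 2

def error_on_mean(sigma, n):
    """Uncertainty of the mean of n independent measurements with spread sigma."""
    return sigma / math.sqrt(n)

print(f"implied N ~ {implied_n:.0f}")
for n in (1, 100, 1000):
    print(f"N = {n:5d}: error on mean = {error_on_mean(sigma_single, n):.2f} MeV")
```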
Unbiased Estimators:
Unbiased estimator (German: “erwartungstreuer Schätzer”)
● The unbiased estimator for the true mean µ is: $\hat{\mu} = \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$
● If the true mean µ is known, then for N data points we estimate the true variance V(x) by the “sample variance s²”: $s^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2$
● If the true mean is unknown, then an unbiased estimator for the variance σ² is the “sample variance s²”: $s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})^2$ (beware of N-1!)
“One single value is not enough to determine mean and spread.”
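The effect of the N-1 can be checked with a toy simulation. This is a sketch with assumed parameters (Gaussian data, σ = 2, N = 5 points per pseudo-experiment): averaged over many pseudo-experiments, the 1/N version underestimates the true variance by a factor (N-1)/N, while the 1/(N-1) version does not.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
true_sigma = 2.0
n_points = 5             # small N makes the bias clearly visible
n_experiments = 200_000

# Each row is one pseudo-experiment with n_points measurements.
samples = rng.normal(loc=0.0, scale=true_sigma, size=(n_experiments, n_points))

var_biased = samples.var(axis=1, ddof=0).mean()     # divide by N
var_unbiased = samples.var(axis=1, ddof=1).mean()   # divide by N - 1

print(f"true variance        : {true_sigma**2:.3f}")
print(f"average with 1/N     : {var_biased:.3f}")
print(f"average with 1/(N-1) : {var_unbiased:.3f}")
```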
Solution: Unbiased Estimators
Efficiency of Estimators
● Optimal estimator: result of a Maximum Likelihood fit (see later lectures); “optimal” ↔ smallest variance
● Efficiency of an estimator: “variance of optimal estimator / variance of estimator” (≤ 1)
● For a Gaussian distribution the sample mean $\bar{x}$ is the optimal estimator
● Optimality and robustness are different properties: a non-optimal estimator (e.g. the median) can be more robust against outliers
● E.g. the median of a Gaussian distribution has 64% efficiency (2/π ≈ 0.64 for large N)
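The 64% can be reproduced approximately with pseudo-experiments. This sketch assumes Gaussian data and arbitrary sample sizes, and takes the sample mean as the optimal estimator, so the efficiency is the variance of the mean divided by the variance of the median.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n_points = 201           # odd N so the median is a single data point
n_experiments = 20_000

samples = rng.normal(loc=0.0, scale=1.0, size=(n_experiments, n_points))
var_of_mean = samples.mean(axis=1).var()
var_of_median = np.median(samples, axis=1).var()

efficiency = var_of_mean / var_of_median
print(f"efficiency of the median: {efficiency:.2f} (asymptotic value 2/pi = {2 / np.pi:.2f})")
```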
Truncated Mean
● truncated mean (German: “getrimmter Mittelwert”): mean of the central fraction 2r of the sorted values
– e.g. r = 40% truncated mean: the 10% lowest and 10% highest values are ignored, calculate the mean of the 80% central values
– r = 50% truncated mean → arithmetic mean $\bar{x}$
– r → 0% → median
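A minimal sketch of a truncated mean, using the convention read off the example above: the central fraction 2r of the sorted values is kept. The function name `truncated_mean`, the rounding of the number of dropped values, and the data are assumptions.

```python
import numpy as np

def truncated_mean(values, r):
    """Mean of the central fraction 2r of the sorted values (0 < r <= 0.5).

    Assumed convention: r = 0.5 keeps everything (plain mean), r = 0.4 drops
    the lowest and highest 10%, and r -> 0 approaches the median.
    """
    if not 0.0 < r <= 0.5:
        raise ValueError("r must be in (0, 0.5]")
    x = np.sort(np.asarray(values, dtype=float))
    n_drop = int(round(len(x) * (1.0 - 2.0 * r) / 2.0))   # dropped on each side
    n_drop = min(n_drop, (len(x) - 1) // 2)                # keep at least one value
    return x[n_drop:len(x) - n_drop].mean()

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]
print(truncated_mean(data, 0.5))    # plain mean, dominated by the outlier
print(truncated_mean(data, 0.4))    # drops 1.0 and 100.0
print(truncated_mean(data, 0.01))   # only the two central values remain
```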
Truncated Mean
[Figure panels: Cauchy; Laplace or double exponential]
The r = 0.23 truncated mean is the best estimator for an unknown symmetric distribution.
Moments
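● r-th algebraic moment: $\mu'_r = E[x^r]$
● r-th central moment: $\mu_r = E[(x-\mu)^r]$
Expectation value: 1st algebraic moment
Variance: 2nd central moment
“Schiefe”/skewness: $\gamma_1 = \mu_3/\sigma^3$
– dimensionless, positive for distributions with a longer tail on the right
“Wölbung”/kurtosis: $\gamma_2 = \mu_4/\sigma^4 - 3$
– measure of the weight of the tails relative to the core
– positive kurtosis: more pronounced tails than a Gaussian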
Skewness & Kurtosis
[Figure: example distributions with kurtosis < 0 and kurtosis > 0]
A Gaussian distribution has kurtosis = 0
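Skewness and kurtosis can be estimated from sample moments. The sketch below assumes the common definitions skewness = µ₃/σ³ and (excess) kurtosis = µ₄/σ⁴ − 3, which give 0 for a Gaussian. The three test distributions are arbitrary examples.

```python
import numpy as np

def central_moment(x, r):
    """Sample estimate of the r-th central moment."""
    x = np.asarray(x, dtype=float)
    return np.mean((x - x.mean()) ** r)

def skewness(x):
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5

def kurtosis(x):
    # Excess kurtosis: the "- 3" makes a Gaussian come out at 0.
    return central_moment(x, 4) / central_moment(x, 2) ** 2 - 3.0

rng = np.random.default_rng(seed=4)
n = 200_000
samples = {
    "gauss": rng.normal(size=n),              # symmetric, kurtosis 0
    "uniform": rng.uniform(-1.0, 1.0, n),     # no tails, kurtosis -1.2
    "exponential": rng.exponential(size=n),   # long right tail, skewness 2
}
for name, x in samples.items():
    print(f"{name:12s} skewness = {skewness(x):+.2f}, kurtosis = {kurtosis(x):+.2f}")
```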
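Which fraction of events is within 1, 2, 3 σ?
[Figure: Gaussian distribution with the ±1σ, ±2σ, ±3σ and ±4σ intervals marked; they contain 68.27%, 95.45%, 99.73% and 99.9937% of the events]
This is only true for Gaussian distributions!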
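Bienaymé-Tchebycheff Inequality
For every distribution the following inequality is valid: $P(|x-\mu| \ge k\sigma) \le \frac{1}{k^2}$
Probability $P(|x-\mu| \ge k\sigma)$ for a Gaussian, compared with the Tchebycheff limit:
k   Gauss      Tchebycheff
1   0.317      1.0
2   0.0455     0.25
3   0.0027     0.1111
4   0.000063   0.0625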
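The two columns of the table can be recomputed with the standard library. The sketch assumes that the Gauss column is the two-sided tail probability P(|x − µ| ≥ kσ), which for a Gaussian equals erfc(k/√2); the function names are made up for this example.

```python
import math

def gaussian_tail(k):
    """P(|x - mu| >= k sigma) for a Gaussian distribution."""
    return math.erfc(k / math.sqrt(2.0))

def chebyshev_bound(k):
    """Upper limit on P(|x - mu| >= k sigma), valid for any distribution."""
    return 1.0 / k ** 2

for k in (1, 2, 3, 4):
    print(f"k = {k}: Gauss = {gaussian_tail(k):.6f}, bound = {chebyshev_bound(k):.4f}")
```

The bound is very loose for a Gaussian, which is the price for holding for every distribution with finite variance.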
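Solution: Bienaymé-Tchebycheff Inequality
Given a PDF f(x) and a function w(x)≥0: $P(w(x) \ge K) \le \frac{E[w(x)]}{K}$ for any K > 0,
since $E[w(x)] = \int w(x) f(x)\,dx \ge \int_{w(x)\ge K} w(x) f(x)\,dx \ge K \cdot P(w(x) \ge K)$
with $w(x) = (x-\mu)^2$ and $K = k^2\sigma^2$: $P(|x-\mu| \ge k\sigma) \le \frac{\sigma^2}{k^2\sigma^2} = \frac{1}{k^2}$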
Two Dimensional Distributions
Multiple ways to visualize 2-dim distributions:
● box plot
● lego plot
● surface plot
● numbers
● scatter plot
● color map
● contour plot
● ...
Two dimensional Distributions
● straightforward generalization of 1-dim PDFs
A 2-dim PDF is a function f(x,y)≥0 with $\int\!\!\int f(x,y)\,dx\,dy = 1$
Marginal Distributions (“Randverteilungen”)
● Marginal distributions: projections of f(x,y) onto the axes
$f_x(x) = \int f(x,y)\,dy$, $f_y(y) = \int f(x,y)\,dx$
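Conditional Probability
● conditional density of y for fixed x: $h(y|x) = \frac{f(x,y)}{f_x(x)}$, and of x for fixed y: $h(x|y) = \frac{f(x,y)}{f_y(y)}$, where $f_x$, $f_y$ are the marginal distributions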
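Marginal and conditional distributions can be illustrated with a binned estimate of f(x,y). The toy data (y depends linearly on x plus noise), the binning and the variable names are assumptions; the marginals are obtained by integrating the 2-dim histogram over one axis, the conditional density by dividing a slice by the marginal.

```python
import numpy as np

rng = np.random.default_rng(seed=5)
x = rng.normal(size=50_000)
y = 0.5 * x + rng.normal(size=50_000)   # assumed toy dependence of y on x

# Binned estimate of f(x, y), normalised so that it integrates to 1.
density, x_edges, y_edges = np.histogram2d(
    x, y, bins=40, range=[[-4.0, 4.0], [-4.0, 4.0]], density=True
)
dx, dy = np.diff(x_edges), np.diff(y_edges)

marginal_x = (density * dy[np.newaxis, :]).sum(axis=1)   # integrate over y
marginal_y = (density * dx[:, np.newaxis]).sum(axis=0)   # integrate over x

# Conditional density h(y | x) in one x-bin: slice of f(x, y) divided by f_x(x).
i = 20                                                   # bin starting at x = 0
conditional_y = density[i, :] / marginal_x[i]

print("integral of marginal in x :", (marginal_x * dx).sum())
print("integral of h(y | x-bin)  :", (conditional_y * dy).sum())
```

Both the marginals and each conditional slice are normalised to 1, as a PDF must be.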
Covariance (I)
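● x,y independent:
– $h(x|y) = f_x(x)$ for all y and $h(y|x) = f_y(y)$ for all x
– $f(x,y) = f_x(x)\,f_y(y)$ (with the marginal distributions $f_x$, $f_y$)
covariance: $cov(x,y) = E[(x-\mu_x)(y-\mu_y)] = E[xy] - \mu_x\mu_y$
Note: cov(x,x) = V(x)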
Covariance (II)
● estimate for cov(x,y) from sample: $\widehat{cov}(x,y) = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y})$
Exercises:
● Prove: V(x+y) = V(x) + V(y) + 2 cov(x,y)
● Given are the following data points (x,y): (1,1), (2,1), (-3,-1), (2,2), (1,5). Give an estimate for cov(x,y).
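One way to check a hand calculation for the second exercise is the short sketch below. It assumes the 1/(N-1) normalisation for the sample covariance (the means are estimated from the same data), and it also verifies the identity from the first exercise numerically on this sample, which is not a proof.

```python
import numpy as np

points = np.array([(1, 1), (2, 1), (-3, -1), (2, 2), (1, 5)], dtype=float)
x, y = points[:, 0], points[:, 1]
n = len(x)

# Sample covariance with the 1/(N-1) normalisation.
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# Cross-check against NumPy, whose np.cov also divides by N-1 by default.
cov_numpy = np.cov(x, y)[0, 1]

# Numerical check of V(x+y) = V(x) + V(y) + 2 cov(x,y) on the same sample.
lhs = np.var(x + y, ddof=1)
rhs = np.var(x, ddof=1) + np.var(y, ddof=1) + 2.0 * cov_xy

print(f"cov(x,y) = {cov_xy:.2f} (numpy: {cov_numpy:.2f})")
print(f"V(x+y) = {lhs:.2f}, V(x) + V(y) + 2 cov(x,y) = {rhs:.2f}")
```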
Correlation
● based on the covariance between two variables x,y
● define correlation coefficient ρ: $\rho = \frac{cov(x,y)}{\sigma_x \sigma_y}$
– ρ ranges between +1 and -1
– if two variables are independent, then ρ = 0; variables with ρ = 0 are called uncorrelated
Example Correlations
● Two correlated Gaussian distributions:
[Figure: scatter plots for ρ = 0.97, ρ = -0.7, ρ = 0.0 and ρ = 0.5]
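Samples like those in the scatter plots can be generated as follows. This is a sketch assuming unit-variance Gaussians and NumPy's `Generator.multivariate_normal`; the helper name `correlated_gaussians` and the sample size are made up. The correlation coefficient is then estimated back from the sample with `np.corrcoef`.

```python
import numpy as np

def correlated_gaussians(rho, n, rng):
    """Draw n pairs (x, y) of unit Gaussians with correlation coefficient rho."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    xy = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
    return xy[:, 0], xy[:, 1]

rng = np.random.default_rng(seed=6)
for rho in (0.97, -0.7, 0.0, 0.5):
    x, y = correlated_gaussians(rho, 20_000, rng)
    rho_hat = np.corrcoef(x, y)[0, 1]   # cov(x, y) / (sigma_x * sigma_y)
    print(f"rho = {rho:+.2f}, estimated = {rho_hat:+.3f}")
```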
Example: Linear Correlation
● y = ax+b; f(x) = 0.5 in [-1,1], else 0
● Calculate <y>
● Calculate <xy>
● Calculate cov(x,y) = <xy>-<x><y>
● Calculate ρ
Example: Parabola
● y = x²; f(x) = 0.5 in [-1,1], else 0
● calculate <y>
● calculate <xy>
● calculate cov(x,y)
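Both examples can be cross-checked by Monte Carlo, as a complement to the analytic calculation. The sketch draws x uniformly in [-1,1] and uses cov(x,y) = <xy> − <x><y>; the values a = 2, b = 1 and the sample size are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
x = rng.uniform(-1.0, 1.0, size=500_000)   # f(x) = 0.5 on [-1, 1]

def sample_cov(u, v):
    """cov(u, v) = <uv> - <u><v>, estimated from the sample."""
    return np.mean(u * v) - np.mean(u) * np.mean(v)

y_linear = 2.0 * x + 1.0    # a = 2, b = 1 are arbitrary example values
y_parabola = x ** 2

cov_linear = sample_cov(x, y_linear)
rho_linear = cov_linear / (np.std(x) * np.std(y_linear))
cov_parabola = sample_cov(x, y_parabola)

print(f"linear  : cov = {cov_linear:.3f}, rho = {rho_linear:.3f}")
print(f"parabola: <y> = {np.mean(y_parabola):.3f}, cov = {cov_parabola:.4f}")
```

For the parabola the covariance comes out compatible with zero even though y is completely determined by x.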
Example ρ=0.0
● By construction the covariance vanishes for independent variables. The converse is not true: zero covariance does not necessarily mean that the variables are independent.
● E.g.
Example ρ = 0:
Correlation ≠ Causality
As read in a newspaper:
If you take your time over your studies, you earn more money!
(taken from “So lügt man mit Statistik”)