summary statistics
DESCRIPTION
Summary Statistics. Jake Blanchard, Spring 2008. Summarizing and Interpreting Data. It is useful to have some metrics for summarizing statistical data (both input and output). Three key characteristics are central tendency (mean, median, mode), dispersion (variance), and shape (skewness, kurtosis).
TRANSCRIPT
![Page 1: Summary Statistics](https://reader035.vdocument.in/reader035/viewer/2022062520/56815bcc550346895dc9c1ae/html5/thumbnails/1.jpg)
Uncertainty Analysis for Engineers
Summary Statistics
Jake Blanchard
Spring 2008
![Page 2: Summary Statistics](https://reader035.vdocument.in/reader035/viewer/2022062520/56815bcc550346895dc9c1ae/html5/thumbnails/2.jpg)
Summarizing and Interpreting Data
It is useful to have some metrics for summarizing statistical data (both input and output).
Three key characteristics are:
◦ Central tendency (mean, median, mode)
◦ Dispersion (variance)
◦ Shape (skewness, kurtosis)
![Page 3: Summary Statistics](https://reader035.vdocument.in/reader035/viewer/2022062520/56815bcc550346895dc9c1ae/html5/thumbnails/3.jpg)
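Central Tendency
Mean = expected value:
$$E(x) = \sum_{i=1}^{n} x_i\,p(x_i) \quad \text{(discrete)}, \qquad E(x) = \int_{-\infty}^{\infty} x\,f(x)\,dx \quad \text{(continuous)}$$
Median = point $z$ such that exactly half of the probability is associated with lower values and half with greater values:
$$\int_{-\infty}^{z} f(x)\,dx = 0.5$$
Mode = most likely value (maximum of the pdf)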
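For sampled data rather than a known pdf, the three measures can be estimated directly. The following is a minimal sketch using base MATLAB functions; the data vector `x` is a hypothetical sample, not data from the slides.

```matlab
% Hypothetical sample (an assumption for illustration only)
x = [2 4 4 4 5 5 7 9];

sample_mean   = mean(x);    % arithmetic average of the data
sample_median = median(x);  % middle value of the sorted data
sample_mode   = mode(x);    % most frequently occurring value

fprintf('mean = %g, median = %g, mode = %g\n', ...
        sample_mean, sample_median, sample_mode);
```

For this sample the three measures differ (5, 4.5, and 4), which is typical of data with a longer tail on one side.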
![Page 4: Summary Statistics](https://reader035.vdocument.in/reader035/viewer/2022062520/56815bcc550346895dc9c1ae/html5/thumbnails/4.jpg)
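For One Die
$$\text{mean} = E(x) = \sum_{i=1}^{6} x_i\,p(x_i) = 1\cdot\tfrac{1}{6} + 2\cdot\tfrac{1}{6} + 3\cdot\tfrac{1}{6} + 4\cdot\tfrac{1}{6} + 5\cdot\tfrac{1}{6} + 6\cdot\tfrac{1}{6} = 3.5$$
$$\text{median} = 3.5$$
The median is taken as the midpoint of 3 and 4; any value between them splits the probability in half. All six faces are equally likely, so the probability function has no single maximum and the mode is not unique.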
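The die calculation can be checked numerically. This is a small sketch under the assumption of a fair six-sided die; the variable names are illustrative.

```matlab
% Fair six-sided die: outcomes and their probabilities
x = 1:6;
p = ones(1, 6) / 6;

mu  = sum(x .* p);   % expected value, sum of x_i * p(x_i)
cdf = cumsum(p);     % cumulative probability P(X <= x_i)

fprintf('E(x) = %g\n', mu);
fprintf('P(X <= 3) = %g\n', cdf(3));
```

Because the cumulative probability reaches one half at x = 3, every point between 3 and 4 satisfies the median condition, and 3.5 is the conventional choice.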
![Page 5: Summary Statistics](https://reader035.vdocument.in/reader035/viewer/2022062520/56815bcc550346895dc9c1ae/html5/thumbnails/5.jpg)
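Radioactive Decay
For our example, with pdf $f(t) = \lambda e^{-\lambda t}$ for $t \ge 0$, the mean, median, and mode are given by
$$\text{mean} = E(t) = \int_{0}^{\infty} t\,f(t)\,dt = \int_{0}^{\infty} t\,\lambda e^{-\lambda t}\,dt = \frac{1}{\lambda}$$
$$\text{median: } \int_{0}^{z} \lambda e^{-\lambda t}\,dt = 0.5 \;\Rightarrow\; z = \frac{\ln(2)}{\lambda}$$
The mode is $t = 0$, where the pdf is largest.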
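The closed-form results for the decay example can be verified numerically. The sketch below assumes an exponential pdf with a hypothetical decay constant `lambda = 2` (the slides do not give a value) and uses the base MATLAB functions `integral` and `fzero`.

```matlab
% Hypothetical decay constant (an assumption; not given in the slides)
lambda = 2;

% Exponential pdf f(t) = lambda * exp(-lambda * t), for t >= 0
f = @(t) lambda .* exp(-lambda .* t);

% Mean: integrate t * f(t) from 0 to infinity
mean_num = integral(@(t) t .* f(t), 0, Inf);

% Median: solve 1 - exp(-lambda * z) = 0.5 for z
median_num = fzero(@(z) 1 - exp(-lambda * z) - 0.5, [0 10]);

fprintf('mean:   numeric %.6f, exact %.6f\n', mean_num, 1 / lambda);
fprintf('median: numeric %.6f, exact %.6f\n', median_num, log(2) / lambda);
```

The median is smaller than the mean, as expected for a pdf with a long right tail.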
![Page 6: Summary Statistics](https://reader035.vdocument.in/reader035/viewer/2022062520/56815bcc550346895dc9c1ae/html5/thumbnails/6.jpg)
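Other Characteristics
We can calculate the expected value of any function of our random variable as
$$E[h(x)] = \int_{-\infty}^{\infty} h(x)\,f(x)\,dx \quad \text{(continuous)}, \qquad E[h(x)] = \sum_{i} h(x_i)\,p(x_i) \quad \text{(discrete)}$$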
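As an illustration of the discrete form, the sketch below evaluates the expected value of h(x) = x^2 for a fair six-sided die. The choice of `h` is an assumption made only for this example.

```matlab
% Fair six-sided die
x = 1:6;
p = ones(1, 6) / 6;

% Hypothetical function of the random variable
h = @(x) x.^2;

E_h  = sum(h(x) .* p);   % E[h(x)] = sum of h(x_i) * p(x_i)
E_x  = sum(x .* p);      % E(x), for comparison

fprintf('E[x^2] = %.4f, (E[x])^2 = %.4f\n', E_h, E_x^2);
```

The two printed values differ, which shows that in general E[h(x)] is not the same as h(E[x]).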
![Page 7: Summary Statistics](https://reader035.vdocument.in/reader035/viewer/2022062520/56815bcc550346895dc9c1ae/html5/thumbnails/7.jpg)
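Some Results
$$E(c) = c, \qquad E(cx) = c\,E(x)$$
$$E\!\left(\sum_{j=1}^{n} x_j\right) = \sum_{j=1}^{n} E(x_j)$$
$$E\!\left(\sum_{j=1}^{n} b_j x_j\right) = \sum_{j=1}^{n} b_j\,E(x_j)$$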
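A short worked example may help here. Assuming two fair dice with outcomes $x_1$ and $x_2$, each with $E(x_j) = 3.5$, the linearity results give the following (the coefficients 2 and 3 are arbitrary illustrative choices):

```latex
\begin{align*}
E(x_1 + x_2) &= E(x_1) + E(x_2) = 3.5 + 3.5 = 7 \\
E(2x_1 + 3x_2) &= 2\,E(x_1) + 3\,E(x_2) = 7 + 10.5 = 17.5
\end{align*}
```

Neither result requires the joint distribution of the two dice; the sum rule for expected values holds whether or not the variables are independent.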
![Page 8: Summary Statistics](https://reader035.vdocument.in/reader035/viewer/2022062520/56815bcc550346895dc9c1ae/html5/thumbnails/8.jpg)
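Moments of Distributions
We can define many of these parameters in terms of moments of the distribution:
$$\mu_1 = \int_{-\infty}^{\infty} x\,f(x)\,dx$$
$$\mu_k = E\!\left[(x - \mu_1)^k\right] = \int_{-\infty}^{\infty} (x - \mu_1)^k f(x)\,dx \quad \text{(continuous)}, \qquad \mu_k = \sum_{i} (x_i - \mu_1)^k\,p(x_i) \quad \text{(discrete)}$$
The mean is the first moment. The variance is the second moment about the mean.
The third and fourth moments are related to skewness and kurtosis.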
![Page 9: Summary Statistics](https://reader035.vdocument.in/reader035/viewer/2022062520/56815bcc550346895dc9c1ae/html5/thumbnails/9.jpg)
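Spread (Variance)
Variance is a measure of spread or dispersion:
$$\sigma^2 = \mu_2 = E\!\left[(x - \mu_1)^2\right] = \int_{-\infty}^{\infty} (x - \mu_1)^2 f(x)\,dx$$
For discrete data sets, the biased variance is
$$s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
and the unbiased variance is
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
The standard deviation is the square root of the variance.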
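The two variance estimates can be compared on a small sample. This sketch uses a hypothetical data vector and checks the hand-written sums against base MATLAB's `var`, whose second argument selects the normalization (1 divides by n, the default divides by n-1).

```matlab
% Hypothetical sample (an assumption for illustration only)
x = [2 4 4 4 5 5 7 9];
n = numel(x);
xbar = mean(x);

s2_biased   = sum((x - xbar).^2) / n;        % divide by n
s2_unbiased = sum((x - xbar).^2) / (n - 1);  % divide by n - 1

% Built-in equivalents
v_biased   = var(x, 1);
v_unbiased = var(x);

fprintf('biased:   %.4f (var(x,1) = %.4f)\n', s2_biased, v_biased);
fprintf('unbiased: %.4f (var(x)   = %.4f)\n', s2_unbiased, v_unbiased);
fprintf('standard deviation (unbiased): %.4f\n', sqrt(s2_unbiased));
```

The difference between the two estimates shrinks as n grows, so the distinction matters most for small samples.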
![Page 10: Summary Statistics](https://reader035.vdocument.in/reader035/viewer/2022062520/56815bcc550346895dc9c1ae/html5/thumbnails/10.jpg)
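Skewness
Skewness is a measure of asymmetry:
$$\mu_3 = E\!\left[(x - \mu_1)^3\right] = \int_{-\infty}^{\infty} (x - \mu_1)^3 f(x)\,dx$$
For discrete data sets, the biased skewness is related to
$$m_3 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^3$$
The skewness is often defined as
$$\gamma_1 = \frac{\mu_3}{\sigma^3}$$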
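A sample skewness can be computed directly from the sums above. The sketch below uses a hypothetical data vector and the biased (divide by n) moments; MATLAB's Statistics Toolbox also provides a `skewness` function, which is deliberately avoided here so the example runs in base MATLAB.

```matlab
% Hypothetical sample (an assumption for illustration only)
x = [2 4 4 4 5 5 7 9];
n = numel(x);
xbar = mean(x);

m2 = sum((x - xbar).^2) / n;   % biased second central moment (variance)
m3 = sum((x - xbar).^3) / n;   % biased third central moment

g1 = m3 / m2^1.5;              % skewness: third moment over sigma^3

fprintf('m3 = %.4f, skewness = %.4f\n', m3, g1);
```

The positive result reflects the longer tail toward large values in this sample; a symmetric sample would give a value near zero.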
![Page 11: Summary Statistics](https://reader035.vdocument.in/reader035/viewer/2022062520/56815bcc550346895dc9c1ae/html5/thumbnails/11.jpg)
Skewness
![Page 12: Summary Statistics](https://reader035.vdocument.in/reader035/viewer/2022062520/56815bcc550346895dc9c1ae/html5/thumbnails/12.jpg)
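Kurtosis
Kurtosis is a measure of peakedness:
$$\mu_4 = E\!\left[(x - \mu_1)^4\right] = \int_{-\infty}^{\infty} (x - \mu_1)^4 f(x)\,dx$$
For discrete data sets, the biased kurtosis is related to
$$m_4 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^4$$
The kurtosis is often defined as
$$\gamma_2 = \frac{\mu_4}{\sigma^4} - 3$$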
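The same approach gives a sample kurtosis. This sketch again uses a hypothetical data vector and biased moments, and subtracts 3 so that a normal distribution would score zero. It is written with base MATLAB only and does not rely on the Statistics Toolbox.

```matlab
% Hypothetical sample (an assumption for illustration only)
x = [2 4 4 4 5 5 7 9];
n = numel(x);
xbar = mean(x);

m2 = sum((x - xbar).^2) / n;   % biased second central moment (variance)
m4 = sum((x - xbar).^4) / n;   % biased fourth central moment

g2 = m4 / m2^2 - 3;            % kurtosis relative to a normal distribution

fprintf('m4 = %.4f, kurtosis (minus 3) = %.5f\n', m4, g2);
```

A value slightly below zero, as here, indicates tails a little lighter than those of a normal distribution with the same variance.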
![Page 13: Summary Statistics](https://reader035.vdocument.in/reader035/viewer/2022062520/56815bcc550346895dc9c1ae/html5/thumbnails/13.jpg)
Kurtosis
Pdf of Pearson type VII distribution with kurtosis of infinity (red), 2 (blue), and 0 (black)
![Page 14: Summary Statistics](https://reader035.vdocument.in/reader035/viewer/2022062520/56815bcc550346895dc9c1ae/html5/thumbnails/14.jpg)
Using Matlab
The sample data are the lengths of time a person was able to hold their breath (40 attempts).
Try a scatter plot:

    load RobPracticeHolds;                  % loads the vector breathholds
    y = ones(size(breathholds));            % plot every point at height 1
    h1 = figure('Position',[100 100 400 100],'Color','w');
    scatter(breathholds,y);
![Page 15: Summary Statistics](https://reader035.vdocument.in/reader035/viewer/2022062520/56815bcc550346895dc9c1ae/html5/thumbnails/15.jpg)
Adding Information

    disp(['The mean is ',num2str(mean(breathholds)),' seconds (green line).']);
    disp(['The median is ',num2str(median(breathholds)),' seconds (red line).']);
    hold all;
    line([mean(breathholds) mean(breathholds)],[0.5 1.5],'color','g');
    line([median(breathholds) median(breathholds)],[0.5 1.5],'color','r');
![Page 16: Summary Statistics](https://reader035.vdocument.in/reader035/viewer/2022062520/56815bcc550346895dc9c1ae/html5/thumbnails/16.jpg)
Box Plot

    title('Scatter with Min, 25%iqr, Median, Mean, 75%iqr, & Max lines');
    xlabel('');
    h3 = figure('Position',[100 100 400 100],'Color','w');
    boxplot(breathholds,'orientation','horizontal','widths',.5);
    set(gca,'XLim',[40 140]);
    title('A Boxplot of the same data');
    xlabel('');
    set(gca,'Yticklabel',[]);
    ylabel('');
![Page 17: Summary Statistics](https://reader035.vdocument.in/reader035/viewer/2022062520/56815bcc550346895dc9c1ae/html5/thumbnails/17.jpg)
Box Plot
[Figure: box plot annotated with Min, Max, Median, and Outlier. The box represents the inter-quartile range (the middle half of the data).]
![Page 18: Summary Statistics](https://reader035.vdocument.in/reader035/viewer/2022062520/56815bcc550346895dc9c1ae/html5/thumbnails/18.jpg)
Empirical cdf

    h3 = figure('Position',[100 100 600 400],'Color','w');
    cdfplot(breathholds);
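The empirical cdf can also be built by hand, which shows what the plot represents: at each sorted data value the curve steps up by 1/n. The sketch below uses a short hypothetical vector of hold times `t`, since the `RobPracticeHolds` data file is not reproduced here.

```matlab
% Hypothetical hold times in seconds (not the RobPracticeHolds data)
t = [62 75 58 90 110 75 84 66];
n = numel(t);

ts = sort(t);        % data values in increasing order
F  = (1:n) / n;      % cumulative fraction of the sample at each sorted value

% Fraction of attempts lasting 75 seconds or less
frac75 = mean(t <= 75);

fprintf('Fraction of holds <= 75 s: %.3f\n', frac75);
```

Calling `stairs(ts, F)` on these two vectors draws the step curve using base MATLAB graphics only.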
![Page 19: Summary Statistics](https://reader035.vdocument.in/reader035/viewer/2022062520/56815bcc550346895dc9c1ae/html5/thumbnails/19.jpg)
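Multivariate Data Sets
When there are multiple input variables, we need some additional ways to characterize the data:
$$E[h(x,y)] = \begin{cases} \displaystyle\int\!\!\int h(x,y)\,f(x,y)\,dx\,dy, & \text{continuous} \\[2ex] \displaystyle\sum_{i}\sum_{j} h(x_i,y_j)\,p(x_i,y_j), & \text{discrete} \end{cases}$$
$$Cov(x,y) = E(xy) - E(x)\,E(y)$$
If x and y are independent, then Cov(x,y)=0.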
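The covariance formula can be applied to paired samples by replacing each expected value with a sample average. The sketch below uses two short hypothetical vectors and compares the result with base MATLAB's `cov`, called with a third argument of 1 so that it also divides by n.

```matlab
% Hypothetical paired samples (an assumption for illustration only)
x = [1 2 3 4 5]';
y = [2 4 5 4 5]';

% Cov(x,y) = E(xy) - E(x)E(y), with sample means as the expected values
cov_manual = mean(x .* y) - mean(x) * mean(y);

% Built-in: returns a 2-by-2 matrix; the off-diagonal entry is Cov(x,y)
C = cov(x, y, 1);
cov_builtin = C(1, 2);

fprintf('manual = %.4f, built-in = %.4f\n', cov_manual, cov_builtin);
```

Without the third argument, `cov` divides by n-1 instead, so the two numbers would no longer agree exactly.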
![Page 20: Summary Statistics](https://reader035.vdocument.in/reader035/viewer/2022062520/56815bcc550346895dc9c1ae/html5/thumbnails/20.jpg)
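Correlation Coefficients
Two random variables may be related.
Define the correlation coefficient of input (x) and output (y) as
$$\rho_{x,y} = \frac{Cov(x,y)}{\sigma_x \sigma_y} = \frac{\sum_{k=1}^{m} (x_k - \bar{x})(y_k - \bar{y})}{\sqrt{\sum_{k=1}^{m} (x_k - \bar{x})^2 \; \sum_{k=1}^{m} (y_k - \bar{y})^2}}$$
ρ = 1 implies linear dependence, positive slope
ρ = 0 implies no linear dependence
ρ = -1 implies linear dependence, negative slope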
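The sample form of the correlation coefficient translates line by line into code. This sketch uses two hypothetical vectors and checks the hand-written sums against `corrcoef`, which returns a 2-by-2 matrix whose off-diagonal entry is the coefficient.

```matlab
% Hypothetical paired samples (an assumption for illustration only)
x = [1 2 3 4 5]';
y = [2 4 5 4 5]';

dx = x - mean(x);   % deviations of x from its mean
dy = y - mean(y);   % deviations of y from its mean

% Sum of products over the square root of the product of sums of squares
rho_manual = sum(dx .* dy) / sqrt(sum(dx.^2) * sum(dy.^2));

R = corrcoef(x, y);
rho_builtin = R(1, 2);

fprintf('manual = %.4f, corrcoef = %.4f\n', rho_manual, rho_builtin);
```

The coefficient is dimensionless and always lies between -1 and 1, unlike the covariance, whose size depends on the units of x and y.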
![Page 21: Summary Statistics](https://reader035.vdocument.in/reader035/viewer/2022062520/56815bcc550346895dc9c1ae/html5/thumbnails/21.jpg)
Example
[Figure: four scatter plots with correlation coefficients ρ = 0.98, ρ = -0.38, ρ = 1, and ρ = -0.98]
![Page 22: Summary Statistics](https://reader035.vdocument.in/reader035/viewer/2022062520/56815bcc550346895dc9c1ae/html5/thumbnails/22.jpg)
Example

    x=rand(25,1)-0.5;
    y=x;                        % exact linear relation
    corrcoef(x,y)
    subplot(2,2,1), plot(x,y,'o')
    y2=x+0.2*rand(25,1);        % positive slope with noise
    corrcoef(x,y2)
    subplot(2,2,2), plot(x,y2,'o')
    y3=-x+0.2*rand(25,1);       % negative slope with noise
    corrcoef(x,y3)
    subplot(2,2,3), plot(x,y3,'o')
    y4=rand(25,1)-0.5;          % generated independently of x
    corrcoef(x,y4)
    subplot(2,2,4), plot(x,y4,'o')
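One further case is worth trying alongside the four above: a correlation coefficient of zero only rules out a linear relationship. In the sketch below, which uses hypothetical values chosen for illustration, y is completely determined by x and yet the coefficient is zero.

```matlab
% Hypothetical data: y depends on x exactly, but not linearly
x = (-2:2)';
y = x.^2;

R = corrcoef(x, y);
rho = R(1, 2);       % off-diagonal entry is the correlation coefficient

fprintf('rho = %.4f\n', rho);
```

The positive and negative halves of the parabola cancel in the sum of products, so a scatter plot remains a useful check before reading a small coefficient as "unrelated".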