summary
![Page 1: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/1.jpg)
Summary
• Five-number summary, percentiles, mean
• Box plot, modified box plot
• Robust statistics – mean, median, trimmed mean
• Outlier
• Measures of variability
  • range, IQR, average absolute deviation, variance and standard deviation
  • The average signed deviation of the data values from the mean is zero, which is why absolute or squared deviations are used instead.
![Page 2: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/2.jpg)
Standard deviation – empirical rule
![Page 3: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/3.jpg)
Standard deviation – empirical rule
![Page 4: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/4.jpg)
Standard deviation – empirical rule
![Page 5: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/5.jpg)
Population – parameter: mean μ, standard deviation σ
Sample – statistic: sample mean x̄, sample standard deviation s
population (census) vs. sample
parameter (population) vs. statistic (sample)
![Page 6: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/6.jpg)
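Bias, sampling
• Sampling – how do we construct a sample from the population?
• Bias – a sample is biased if it differs from the population in a systematic way.
• Unbiased sample variance – divide by n − 1 (Bessel's correction); the sample standard deviation s is its square root.

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} \approx \sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{n}}$$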
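The two denominators can be compared on a small data set. The following is a minimal sketch, assuming made-up values in `data` (they are not from the slides) and a hypothetical helper `std`; the standard library functions `statistics.stdev` and `statistics.pstdev` are used only as a cross-check.

```python
import math
import statistics

# Hypothetical data, chosen only for illustration (not from the slides).
data = [2, 4, 4, 4, 5, 5, 7, 9]


def std(values, ddof):
    """Standard deviation with denominator len(values) - ddof."""
    m = sum(values) / len(values)
    sum_of_squares = sum((x - m) ** 2 for x in values)
    return math.sqrt(sum_of_squares / (len(values) - ddof))


s = std(data, ddof=1)        # divide by n - 1 (sample formula)
sigma_n = std(data, ddof=0)  # divide by n (population formula)

print(f"divide by n - 1: {s:.4f}")
print(f"divide by n:     {sigma_n:.4f}")

# Cross-check against the standard library.
assert math.isclose(s, statistics.stdev(data))
assert math.isclose(sigma_n, statistics.pstdev(data))
```

Dividing by n − 1 always gives the slightly larger value, and the gap shrinks as n grows.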
![Page 7: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/7.jpg)
SRS (simple random sample)
• Sampling with replacement
  • Generates independent samples.
  • Two sample values are independent if what we get on the first one doesn't affect what we get on the second.
• Sampling without replacement
  • Deliberately avoids choosing any member of the population more than once.
  • This type of sampling is not independent; however, it is more common.
  • The error is small as long as
    1. the sample is large
    2. the sample size is no more than 10% of the population size
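Both sampling schemes can be sketched with Python's standard `random` module. The population of 100 numbered members, the sample size of 10 and the seed are assumptions made for illustration only.

```python
import random

# Hypothetical population: 100 members labelled 1..100.
population = list(range(1, 101))
rng = random.Random(42)  # fixed seed so the sketch is repeatable

# With replacement: each draw is independent, so repeats are possible.
with_replacement = rng.choices(population, k=10)

# Without replacement: no member can be chosen more than once.
without_replacement = rng.sample(population, k=10)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)

# The sample is 10% of the population, the upper limit named above.
print("sampling fraction:", len(without_replacement) / len(population))
```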
![Page 8: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/8.jpg)
• Suppose you have a bag with 3 cards in it. The cards are numbered 0, 2 and 4.
• Population mean = 2
• Population variance = 8/3
• An important property of a sample statistic that estimates a population parameter is that if you evaluate the sample statistic for every possible sample and average them all, the average of the sample statistic should equal the population parameter.
• We want the average of the sample statistic over all possible samples to equal the population parameter. A statistic with this property is called unbiased.
![Page 9: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/9.jpg)
Bessel’s game
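| Sample | Sample average | Sample variance (n − 1) | Sample variance (n) |
|---|---|---|---|
| 0, 2 | 1 | 2 | 1 |
| 0, 4 | 2 | 8 | 4 |
| 2, 0 | 1 | 2 | 1 |
| 2, 4 | 3 | 2 | 1 |
| 4, 0 | 2 | 8 | 4 |
| 4, 2 | 3 | 2 | 1 |
| 0, 0 | 0 | 0 | 0 |
| 2, 2 | 2 | 0 | 0 |
| 4, 4 | 4 | 0 | 0 |
| average | 2 | 8/3 | 4/3 |

$\mu = 2,\ \sigma^2 = 8/3$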
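The game can be replayed by enumerating every sample of size 2 drawn with replacement from the cards 0, 2 and 4. This is a minimal sketch; the helper names `mean` and `variance` are introduced here for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

cards = [0, 2, 4]


def mean(values):
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


def variance(values, ddof):
    """Variance with denominator len(values) - ddof."""
    m = mean(values)
    sum_of_squares = sum((v - m) ** 2 for v in values)
    return sum_of_squares / (len(values) - ddof)


# All 9 ordered samples of size 2, drawn with replacement.
samples = list(product(cards, repeat=2))

avg_sample_mean = mean(mean(s) for s in samples)
avg_var_n_minus_1 = mean(variance(s, ddof=1) for s in samples)
avg_var_n = mean(variance(s, ddof=0) for s in samples)
population_variance = variance(cards, ddof=0)

print("average of sample averages:     ", avg_sample_mean)
print("average sample variance (n - 1):", avg_var_n_minus_1)
print("average sample variance (n):    ", avg_var_n)
print("population variance:            ", population_variance)
```

On average, dividing by n − 1 recovers the population variance 8/3, while dividing by n gives 4/3 and underestimates it.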
![Page 10: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/10.jpg)
Histogram revision
• Distribution – the pattern of values in the data
• Histogram – visualizing the distribution
• We can see
  • whether the data tend to be close to a particular value
  • whether the data vary a lot or a little about the most common values
  • whether that variation tends to be more above or below the common values
  • whether there are unusually large or small values in the data
![Page 11: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/11.jpg)
Life expectancy data – histogram
• Use the interactive histogram applet to generate a histogram with a bin size of 10, starting at 40.
[Figure: histogram; x-axis: life expectancy, y-axis: frequency]
![Page 12: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/12.jpg)
Life expectancy data – histogram
[Figure: histogram; x-axis: life expectancy, y-axis: frequency]
![Page 13: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/13.jpg)
Making conclusions from a histogram
• What can you tell about the life expectancy data?
  • How many modes?
  • Where is the mode?
  • Symmetric, left skewed or right skewed?
  • Outliers – yes or no?
[Figure: histogram; x-axis: life expectancy, y-axis: frequency]
![Page 14: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/14.jpg)
Making conclusions from a histogram
• Where is the mode, the median, the mean?
[Figure: histogram; x-axis: life expectancy, y-axis: frequency]
![Page 15: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/15.jpg)
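Five-number summary

| Min. | Q1 | Median | Q3 | Max. |
|---|---|---|---|---|
| 47.79 | 64.67 | 73.24 | 76.65 | 83.39 |

Median − Q1 vs. Q3 − Median: roughly 8.5 > 3.5
Median − Min. vs. Max. − Median: roughly 25.4 > 10.2
What is the position of the mean and the median?
mean = 69.9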
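The comparisons on this slide can be recomputed from the five-number summary and the mean. This is a minimal sketch; the variable names and the simple left-skew check are assumptions made for illustration, not a formal skewness test.

```python
# Five-number summary and mean of the life expectancy data, as given above.
minimum, q1, median, q3, maximum = 47.79, 64.67, 73.24, 76.65, 83.39
mean = 69.9

lower_quartile_gap = median - q1   # spread of the lower middle quarter
upper_quartile_gap = q3 - median   # spread of the upper middle quarter
lower_tail = median - minimum      # spread of the whole lower half
upper_tail = maximum - median      # spread of the whole upper half

print(f"median - Q1  = {lower_quartile_gap:.2f}, Q3 - median  = {upper_quartile_gap:.2f}")
print(f"median - min = {lower_tail:.2f}, max - median = {upper_tail:.2f}")

# A longer lower side together with mean < median points to left skew.
suggests_left_skew = (
    lower_quartile_gap > upper_quartile_gap
    and lower_tail > upper_tail
    and mean < median
)
print("suggests left skew:", suggests_left_skew)
```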
![Page 16: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/16.jpg)
![Page 17: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/17.jpg)
Symmetric, left skewed or right skewed?
![Page 18: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/18.jpg)
STANDARDIZING
![Page 19: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/19.jpg)
Playing chess
• Pretend I am a chess player.
• Which of the following tells you the most about how good I am?
  1. My rating is 1800.
  2. 8110th place among world competitive chess players.
  3. Ranked higher than 88% of competitive chess players.
![Page 20: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/20.jpg)
Distribution
Distribution of scores in one particular year
We should use relative frequencies and convert all absolute frequencies to proportions.
![Page 21: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/21.jpg)
Height data – absolute frequencies
http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights
![Page 22: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/22.jpg)
Height data – relative frequencies
![Page 23: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/23.jpg)
![Page 24: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/24.jpg)
Height data – relative frequencies
What proportion of values is between 170 cm and 173.75 cm?
30%
![Page 25: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/25.jpg)
Height data – relative frequencies
What proportion of values is between 170 cm and 175 cm?
We can't tell for certain.
![Page 26: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/26.jpg)
• How should we modify the data or the histogram to get more detail?
  1. Adding more values to the dataset
  2. Increasing the bin size
  3. Using a smaller bin size
![Page 27: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/27.jpg)
Height data – relative frequencies
What proportion of values is between 170 cm and 175 cm?
36%
![Page 28: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/28.jpg)
Height data – relative frequencies
![Page 29: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/29.jpg)
Decreasing bin size
• Check out what happens with the smallest bin size for Physics Test Scores from http://quarknet.fnal.gov/cosmics/histo.shtml.
![Page 30: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/30.jpg)
Height
![Page 31: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/31.jpg)
Height data – relative frequencies
![Page 32: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/32.jpg)
Normal distribution
recall the empirical rule
68-95-99.7
$\bar{x} = 3$
![Page 33: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/33.jpg)
Empirical rule
| Z | −3 | −2 | −1 | 0 | +1 | +2 | +3 |
|---|---|---|---|---|---|---|---|
| Value | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
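The 68-95-99.7 rule can be checked numerically. The sketch below uses `statistics.NormalDist` from the standard library and assumes a normal distribution with mean 3 and standard deviation 1, which is what the axis labels on this slide suggest; the helper `within` is introduced here for illustration.

```python
from statistics import NormalDist

# Assumed from the axis labels: mean 3, standard deviation 1.
dist = NormalDist(mu=3.0, sigma=1.0)


def within(k):
    """Proportion of values within k standard deviations of the mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)


coverage = {k: within(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} s.d.: {100 * p:.1f}%")
```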
![Page 34: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/34.jpg)
Z
Z – number of standard deviations away from the mean
If the Z-value is 1, what percentage of values is less than that value?
approx. 84%
[Figure: normal curve; x-axis: Z from −3 to +3]
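The figure of about 84% follows from the empirical rule: 50% of values lie below the mean, and half of the central 68% lies between the mean and Z = 1. A minimal sketch, assuming the data are normally distributed, compares this rule of thumb with the exact value from `statistics.NormalDist`:

```python
from statistics import NormalDist

# Rule of thumb: half of all values, plus half of the central 68%.
rule_of_thumb = 0.50 + 0.68 / 2

# Exact proportion below Z = 1 for a standard normal distribution.
below_z1 = NormalDist().cdf(1.0)

print(f"rule of thumb: {100 * rule_of_thumb:.0f}%")
print(f"normal CDF:    {100 * below_z1:.2f}%")
```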
![Page 35: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/35.jpg)
Who is more popular?
Let's demonstrate the importance of Z-scores with the following example.
![Page 36: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/36.jpg)
Who is more popular
s.d. = 36
s.d. = 60
Z = -3.53
Z = -2.57
![Page 37: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/37.jpg)
Standardizing
![Page 38: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/38.jpg)
Formula
• What formula describes what we did?
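Standardizing subtracts the mean from a value and divides the result by the standard deviation, Z = (x − μ) / σ. Below is a minimal sketch of that formula; the function name `z_score` is hypothetical, and the example assumes mean 3 and standard deviation 1, matching the empirical rule slide.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies away from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# Assumed mean 3 and standard deviation 1, as on the empirical rule slide.
for value in (0, 3, 5):
    print(value, "->", z_score(value, mean=3, sd=1))
```

A value below the mean gives a negative Z-score, and the mean itself gives Z = 0.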
![Page 39: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/39.jpg)
Quiz
• What does a negative Z-score mean?
  1. The original value is negative.
  2. The original value is less than the mean.
  3. The original value is less than 0.
  4. The original value minus the mean is negative.
![Page 40: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/40.jpg)
Quiz II
• If we standardize a distribution by converting every value to a Z-score, what will be the new mean of this standardized distribution?
• If we standardize a distribution by converting every value to a Z-score, what will be the new standard deviation of this standardized distribution?
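Both questions can be explored by standardizing a small data set and recomputing its mean and standard deviation. This is a minimal sketch with made-up heights (not the height data used in the slides); the population formula `statistics.pstdev` is assumed for the standard deviation.

```python
import statistics

# Hypothetical heights in cm, chosen only for illustration.
heights = [150, 160, 165, 170, 180]

mu = statistics.mean(heights)
sigma = statistics.pstdev(heights)  # population standard deviation

# Convert every value to a Z-score.
z_scores = [(h - mu) / sigma for h in heights]

new_mean = statistics.mean(z_scores)
new_sd = statistics.pstdev(z_scores)

print("Z-scores:", z_scores)
print("mean of Z-scores:", new_mean)
print("standard deviation of Z-scores:", new_sd)
```

Whatever the original units, the standardized values have mean 0 and standard deviation 1, up to floating-point rounding.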
![Page 41: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/41.jpg)
Standard normal distribution
General normal distribution: N(μ, σ)
Standard normal distribution: N(0, 1)
![Page 42: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/42.jpg)
Standard normal distribution
![Page 43: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/43.jpg)
Meaning of relative frequencies
Data: 5, 2, 3, 2, 4, 1, 3, 4, 3, 3
Sorted: 1, 2, 2, 3, 3, 3, 3, 4, 4, 5
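The relative frequency of a value is its count divided by the total number of observations. A minimal sketch for the ten values on this slide, using `collections.Counter` (the variable names are chosen for illustration):

```python
from collections import Counter

data = [5, 2, 3, 2, 4, 1, 3, 4, 3, 3]

counts = Counter(data)  # absolute frequencies
n = len(data)

# Relative frequency = absolute frequency / number of observations.
relative = {value: counts[value] / n for value in sorted(counts)}

for value, proportion in relative.items():
    print(f"{value}: {counts[value]} of {n} = {proportion:.0%}")
```

The relative frequencies sum to 1, so each one can be read as the proportion of the data taking that value.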
![Page 44: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/44.jpg)
Histogram of these data
![Page 45: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/45.jpg)
Probability density function
Probability density function (PDF)
![Page 46: Summary](https://reader036.vdocument.in/reader036/viewer/2022081515/5681645a550346895dd62b7a/html5/thumbnails/46.jpg)
Standard normal distribution
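$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$$

For the standard normal distribution, μ = 0 and σ = 1.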
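The normal density, f(x) = exp(−(x − μ)² / (2σ²)) / (σ√(2π)), can be evaluated directly. The following is a minimal sketch; the function name `normal_pdf` is hypothetical, and `statistics.NormalDist.pdf` from the standard library serves as a cross-check.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and s.d. sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Standard normal (mu = 0, sigma = 1): the density peaks at x = 0.
peak = normal_pdf(0.0)
print(f"f(0) = {peak:.4f}")

# Cross-check against the standard library for a non-standard case.
assert math.isclose(normal_pdf(1.3, mu=3.0, sigma=2.0),
                    NormalDist(mu=3.0, sigma=2.0).pdf(1.3))
```

The density is not itself a probability; probabilities correspond to areas under the curve.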