1 tr 555 statistics “refresher” lecture 1: probability concepts references: – penn state...
TRANSCRIPT
![Page 1: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/1.jpg)
1
TR 555 Statistics “Refresher”
Lecture 1: Probability Concepts
References:
– Penn State University, Dept. of Statistics, Statistical Education Resource Kit: a collection of resources used by faculty in Penn State's Department of Statistics in teaching introductory statistics courses. Page maintained by Laura J. Simon, Sept. 2003
– Statistics: Making Sense of Data (MIT), William Stout, John Marden and Kenneth Travers, http://www.introductorystatistics.com/, Sept. 2003
– Tom Maze, stat course prepared for KDOT, 2003
![Page 2: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/2.jpg)
2
Outline
Overview of statistics
Types of data
Describing data numerically and graphically
Probability and random variables
![Page 3: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/3.jpg)
3
Probability and Statistics
Probability is the likelihood of an event occurring relative to all other possible events.
– Example: If a coin is flipped, what is the probability of getting heads? 0.5
– Given that the last flip was heads, what is the probability that the next will be heads? 0.5
Statistics is the measurement and modeling of random variables.
– Example: If our state averages 200 fatal crashes per year, what is the probability of having exactly one fatal crash today?
– Poisson distribution: λ = average per time period = 200/365 = 0.55 per day
– $P(X = x) = \frac{(\lambda t)^x}{x!} e^{-\lambda t}$, so $P(X = 1) = \frac{(0.55 \cdot 1)^1}{1!} e^{-0.55(1)} = 0.32$
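The Poisson calculation on this slide can be checked with a few lines of Python. This is a minimal sketch: the function name `poisson_pmf` and its parameters are illustrative choices, not part of the lecture, and the rate is taken as 200 fatal crashes spread evenly over 365 days.

```python
import math

def poisson_pmf(x: int, rate: float, t: float = 1.0) -> float:
    """Probability of exactly x events in t time periods, given an average rate per period."""
    mean = rate * t
    return (mean ** x) / math.factorial(x) * math.exp(-mean)

rate_per_day = 200 / 365          # about 0.55 fatal crashes per day
p_one_today = poisson_pmf(1, rate_per_day)
print(round(p_one_today, 2))      # 0.32
```

Changing the first argument gives the probability of any other count, for example `poisson_pmf(0, rate_per_day)` for a day with no fatal crashes.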
![Page 4: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/4.jpg)
4
Data Collection
Designing experiments
– Does aspirin help reduce the risk of heart attacks?
Observational studies
– Polls: Clinton’s approval rating
![Page 5: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/5.jpg)
5
Variable Types
Deterministic
– Assume away variation and randomness
– Known with certainty
– One-to-one mapping of independent variable to dependent variable
[Figure: a relationship mapping X1 to Y1]
![Page 6: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/6.jpg)
6
Variable Types Continued
Random or Stochastic
– Recognized uncertainty of an event
– One to one distribution mapping of independent variable to dependent variable
[Figure: a distribution of possible values, captioned “Probability that it could be any of these values”; the center is labeled “Most Likely” and both tails “Less Likely”]
![Page 7: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/7.jpg)
7
Population
The set of data (numerical or otherwise) corresponding to the entire collection of units about which information is sought
![Page 8: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/8.jpg)
8
Sample
A subset of the population data that are actually collected in the course of a study.
![Page 9: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/9.jpg)
9
WHO CARES?
In most studies, it is difficult to obtain information from the entire population. We rely on samples to make estimates or inferences related to the population.
![Page 10: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/10.jpg)
10
Organization and Description of Data
Qualitative vs. Quantitative data
Discrete vs. Continuous Data
Graphical Displays
Measures of Center
Measures of Variation
![Page 11: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/11.jpg)
11
Qualitative (Categorical) Data
The raw (unsummarized) data are merely labels or categories
Quantitative (Numerical) Data
The raw (unsummarized) data are numerical
![Page 12: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/12.jpg)
12
Qualitative Data Examples
Class Standing (Fr, So, Ju, Sr)
Section # (1,2,3,4,5,6)
Automobile Make (Ford, Chevrolet, Nissan)
Questionnaire response (disagree, neutral, agree)
![Page 13: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/13.jpg)
13
Quantitative Data Examples (measures)
Voltage
Height
Weight
SAT Score
Number of students arriving late for class
Time to complete a task
![Page 14: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/14.jpg)
14
Discrete Data
Only certain values are possible (there are gaps between the possible values)
Continuous Data
Theoretically, any value within an interval is possible with a fine enough measuring device
![Page 15: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/15.jpg)
15
Discrete Data Examples
Number of students late for class
Number of crimes reported to SC police
Number of times the word “number” is used
(generally, discrete data are counts)
![Page 16: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/16.jpg)
16
Discrete Variable Model: Poisson Distribution
$P(X = x) = \frac{(0.55\,t)^x}{x!} e^{-0.55 t}$
[Figure: bar chart “Probability of # of Fatals per one day”; x-axis: # of Fatal Crashes (0 to 15); y-axis: Probability (0 to 0.7)]
![Page 17: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/17.jpg)
17
Continuous Data Examples
Voltage
Height
Weight
Time to complete a homework assignment
![Page 18: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/18.jpg)
18
Continuous Variable Model: Exponential Distribution
[Figure: “Fatality Probability Density Function”; x-axis: Time till the first fatal accident (0 to 5.5); y-axis: Probability (0 to 0.6)]
Probability density of the first fatal at time t: $f(t) = \lambda e^{-\lambda t}$
![Page 19: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/19.jpg)
19
Continuous Probability Function
[Figure: “Cumulative Probability till first fatal”; x-axis: Days (0 to 5.5); y-axis: Cumulative Probability (0 to 1.2)]
Cumulative probability of time till first fatal: $F(t) = 1 - e^{-\lambda t}$
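The two exponential curves can be evaluated numerically. The sketch below assumes the rate λ is the same 0.55 fatal crashes per day used in the earlier Poisson example (200/365); the function names `exp_pdf` and `exp_cdf` are illustrative.

```python
import math

RATE = 0.55  # assumed: fatal crashes per day, from 200/365

def exp_pdf(t: float, rate: float = RATE) -> float:
    """Exponential density of the waiting time until the first event."""
    return rate * math.exp(-rate * t)

def exp_cdf(t: float, rate: float = RATE) -> float:
    """Probability that the first event has occurred by time t."""
    return 1.0 - math.exp(-rate * t)

for days in (0, 1, 2, 4):
    print(days, round(exp_pdf(days), 3), round(exp_cdf(days), 3))
```

Under this assumption the density starts at 0.55 for t = 0 and decays, while the cumulative probability rises from 0 toward 1.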
![Page 20: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/20.jpg)
20
Nominal Data
A type of categorical data in which objects fall into unordered categories, for example:
– Hair color: blonde, brown, red, black, etc.
– Race: Caucasian, African-American, Asian, etc.
– Smoking status: smoker, non-smoker
![Page 21: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/21.jpg)
21
Ordinal Data
A type of categorical data in which order is important. For example:
– Class: freshman, sophomore, junior, senior, super senior
– Degree of illness: none, mild, moderate, severe, …, going, going, gone
– Opinion of students about riots: ticked off, neutral, happy
![Page 22: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/22.jpg)
22
Binary Data
A type of categorical data in which there are only two categories.
Binary data can be either nominal or ordinal, for example:
– Smoking status: smoker, non-smoker
– Attendance: present, absent
– Class: lower classman, upper classman
![Page 23: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/23.jpg)
23
Interval and Ratio Data
Interval
– Interval is important, but no meaningful zero
– e.g., temperature in Fahrenheit
Ratio
– Has a meaningful zero value
– e.g., temperature in Kelvin, crash rate
![Page 24: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/24.jpg)
24
Who Cares?
The type(s) of data collected in a study determine the type of statistical analysis used.
![Page 25: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/25.jpg)
25
Proportions
Categorical data are commonly summarized using “percentages” (or “proportions”).
– 11% of students have a tattoo
– 2%, 33%, 39%, and 26% of the students in class are, respectively, freshmen, sophomores, juniors, and seniors
![Page 26: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/26.jpg)
26
Averages
Measurement data are typically summarized using “averages” (or “means”).
– Average number of siblings Fall 1998 Stat 250 students have is 1.9.
– Average weight of male Fall 1998 Stat 250 students is 173 pounds.
– Average weight of female Fall 1998 Stat 250 students is 138 pounds.
![Page 27: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/27.jpg)
27
Descriptive statistics
Describing data with numbers: measures of location
![Page 28: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/28.jpg)
28
Mean
Another name for average.
If describing a population, denoted as μ, the Greek letter “mu”.
If describing a sample, denoted as $\bar{x}$, called “x-bar”.
Appropriate for describing measurement data.
Seriously affected by unusual values called “outliers”.
![Page 29: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/29.jpg)
29
Calculating Sample Mean
Formula: $\bar{X} = \frac{\sum X_i}{n}$
That is, add up all of the data points and divide by the number of data points.
Data (# of classes skipped): 2 8 3 4 1
Sample Mean = (2+8+3+4+1)/5 = 3.6
Do not round! Mean need not be a whole number.
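The sample mean of the classes-skipped data can be reproduced as follows. This is a small sketch using the slide's five data points; the variable names are illustrative.

```python
from statistics import mean

skipped = [2, 8, 3, 4, 1]                  # number of classes skipped, from the slide

manual_mean = sum(skipped) / len(skipped)  # add up the data points, divide by how many there are
print(manual_mean)                         # 3.6
print(mean(skipped))                       # same result from the standard library
```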
![Page 30: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/30.jpg)
30
Population Mean
The mean of a random variable X is called the population mean and is denoted μ.
It is also called the expected value of X or the expectation of X and is denoted E(X).
$\mu = E(X) = \sum_i x_i f(x_i)$
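As an illustration of the expected value as a probability-weighted sum of the possible values, the sketch below uses a fair six-sided die. The die is an assumed example and does not come from the lecture.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, where each face has probability 1/6.
faces = [1, 2, 3, 4, 5, 6]
f = {x: Fraction(1, 6) for x in faces}

# Expected value: multiply each value by its probability and add the products.
expected = sum(x * f[x] for x in faces)
print(float(expected))  # 3.5
```

The result, 3.5, is not itself a possible roll; an expected value is a long-run average, not a predicted outcome.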
![Page 31: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/31.jpg)
31
Median
Another name for 50th percentile.
Appropriate for describing measurement data.
“Robust to outliers,” that is, not affected much by unusual values.
![Page 32: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/32.jpg)
32
Calculating Sample Median
Order data from smallest to largest.
If odd number of data points, the median is the middle value.
Data (# of classes skipped): 2 8 3 4 1
Ordered Data: 1 2 3 4 8
Median = 3
![Page 33: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/33.jpg)
33
Calculating Sample Median
Order data from smallest to largest.
If even number of data points, the median is the average of the two middle values.
Data (# of classes skipped): 2 8 3 4 1 8
Ordered Data: 1 2 3 4 8 8
Median = (3+4)/2 = 3.5
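Both median rules (odd and even number of data points) can be sketched in one short function. The name `sample_median` is illustrative; the data are the two classes-skipped examples from the slides.

```python
def sample_median(data):
    """Middle value of the ordered data, or the average of the two middle values."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the two middle values

print(sample_median([2, 8, 3, 4, 1]))      # 3
print(sample_median([2, 8, 3, 4, 1, 8]))   # 3.5
```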
![Page 34: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/34.jpg)
34
Mode
The value that occurs most frequently.
One data set can have many modes.
Appropriate for all types of data, but most useful for categorical data or discrete data with only a small number of possible values.
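A quick way to find the mode or modes in Python is `statistics.multimode` (available from Python 3.8), which returns every value tied for most frequent. In the sketch below the first list reuses the classes-skipped data from the median slide; the list of automobile makes is a made-up sample for illustration.

```python
from statistics import multimode

skipped = [2, 8, 3, 4, 1, 8]
print(multimode(skipped))   # [8]

# Hypothetical categorical sample: two makes tie, so there are two modes.
makes = ["Ford", "Chevrolet", "Ford", "Nissan", "Chevrolet"]
print(multimode(makes))     # ['Ford', 'Chevrolet']
```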
![Page 35: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/35.jpg)
35
Most appropriate measure of location
Depends on whether or not data are “symmetric” or “skewed”.
Depends on whether or not data have one (“unimodal”) or more (“multimodal”) modes.
![Page 36: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/36.jpg)
36
Symmetric and Unimodal
![Page 37: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/37.jpg)
37
Symmetric and Bimodal
![Page 38: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/38.jpg)
38
Skewed Right
[Figure: histogram “Number of Music CDs of Spring 1998 Stat 250 Students”; x-axis: Number of Music CDs (0 to 400); y-axis: Frequency (0 to 20)]
![Page 39: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/39.jpg)
39
Skewed Left
![Page 40: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/40.jpg)
40
Choosing Appropriate Measure of Location
If data are symmetric, the mean, median, and mode will be approximately the same.
If data are multimodal, report the mean, median and/or mode for each subgroup.
If data are skewed, report the median.
![Page 41: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/41.jpg)
41
Descriptive statistics
Describing data with numbers: measures of variability
![Page 42: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/42.jpg)
42
Range
The difference between largest and smallest data point.
Highly affected by outliers.
Best for symmetric data with no outliers.
[Figure: histogram “GPAs of Spring 1998 Stat 250 Students”; x-axis: GPA (2.0 to 4.0); y-axis: Frequency (0 to 20)]
![Page 43: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/43.jpg)
43
Interquartile range
The difference between the “third quartile” (75th percentile) and the “first quartile” (25th percentile). So, the “middle-half” of the values.
IQR = Q3 − Q1
Robust to outliers or extreme observations.
Works well for skewed data.
[Figure: histogram “GPAs of Spring 1998 Stat 250 Students”; x-axis: GPA (2.0 to 4.0); y-axis: Frequency (0 to 20)]
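The range and the interquartile range can be compared on a small data set. The sketch below reuses the classes-skipped data from the mean slide. Note that software packages use slightly different quartile conventions; the `method="inclusive"` choice here is an assumption and may not match the package used for the lecture's output.

```python
from statistics import quantiles

skipped = [2, 8, 3, 4, 1]

data_range = max(skipped) - min(skipped)                   # largest minus smallest
q1, q2, q3 = quantiles(skipped, n=4, method="inclusive")   # quartile convention is an assumption
iqr = q3 - q1

print(data_range)   # 7
print(q1, q3, iqr)  # 2.0 4.0 2.0
```

The single large value (8) stretches the range to 7, while the IQR stays at 2, which is the sense in which the IQR is robust to outliers.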
![Page 44: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/44.jpg)
44
Variance
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
1. Find difference between each data point and mean.
2. Square the differences, and add them up.
3. Divide by one less than the number of data points.
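The three steps above can be sketched directly in Python, here applied to the classes-skipped data from the sample mean slide. Variable names are illustrative.

```python
import math

skipped = [2, 8, 3, 4, 1]
n = len(skipped)
x_bar = sum(skipped) / n

# Step 1: difference between each data point and the mean.
deviations = [x - x_bar for x in skipped]
# Step 2: square the differences and add them up.
sum_of_squares = sum(d ** 2 for d in deviations)
# Step 3: divide by one less than the number of data points.
s2 = sum_of_squares / (n - 1)

print(round(s2, 2))             # 7.3
print(round(math.sqrt(s2), 2))  # 2.7 (sample standard deviation)
```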
![Page 45: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/45.jpg)
45
Variance
If measuring variance of population, denoted by σ² (“sigma-squared”).
If measuring variance of sample, denoted by s² (“s-squared”).
Measures average squared deviation of data points from their mean.
Highly affected by outliers. Best for symmetric data.
Problem is units are squared.
![Page 46: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/46.jpg)
46
Population Variance
The variance of a random variable X is called the population variance and is denoted σ².
$\sigma^2 = \sum_i (x_i - \mu)^2 f(x_i)$
![Page 47: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/47.jpg)
47
Standard deviation
Sample standard deviation is square root of sample variance, and so is denoted by s.
Units are the original units.
Measures average deviation of data points from their mean.
Also, highly affected by outliers.
![Page 48: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/48.jpg)
48
Population Standard Deviation
The population standard deviation is the square root of the population variance and is denoted σ.
$\sigma = \sqrt{\sum_i (x_i - \mu)^2 f(x_i)}$
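To illustrate the population variance and standard deviation of a discrete random variable, the sketch below uses a made-up probability function on the values 0, 1, and 2. The distribution is hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math

# Hypothetical probability function f(x) for a discrete random variable X.
f = {0: 0.5, 1: 0.3, 2: 0.2}

mu = sum(x * p for x, p in f.items())                    # population mean
sigma2 = sum((x - mu) ** 2 * p for x, p in f.items())    # population variance
sigma = math.sqrt(sigma2)                                # population standard deviation

print(round(mu, 2), round(sigma2, 2), round(sigma, 3))   # 0.7 0.61 0.781
```

Unlike the sample variance, there is no division by n − 1 here: the probabilities f(x) already weight each squared deviation.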
![Page 49: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/49.jpg)
49
What is the variance or standard deviation?
(MPH)
![Page 50: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/50.jpg)
50
Variance or standard deviation
Sex     N    Mean    Median  TrMean  StDev  SE Mean
female  126  91.23   90.00   90.83   11.32  1.01
male    100  106.79  110.00  105.62  17.39  1.74

Sex     Minimum  Maximum  Q1     Q3
female  65.00    120.00   85.00  98.25
male    75.00    162.00   95.00  118.75

Females: s = 11.32 mph and s² = 11.32² = 128.1 mph²
Males: s = 17.39 mph and s² = 17.39² = 302.4 mph²
![Page 51: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/51.jpg)
51
Coefficient of Variation (COV) – not covariance!
Ratio of sample standard deviation to sample mean multiplied by 100.
Measures relative variability, that is, variability relative to the magnitude of the data.
Unitless, so good for comparing variation between two groups.
![Page 52: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/52.jpg)
52
Coefficient of variation (MPH)
Sex     N    Mean    Median  TrMean  StDev  SE Mean
female  126  91.23   90.00   90.83   11.32  1.01
male    100  106.79  110.00  105.62  17.39  1.74

Sex     Minimum  Maximum  Q1     Q3
female  65.00    120.00   85.00  98.25
male    75.00    162.00   95.00  118.75
Females: CV = (11.32/91.23) x 100 = 12.4
Males: CV = (17.39/106.79) x 100 = 16.3
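The two coefficients of variation can be recomputed from the summary statistics in the table. This is a minimal sketch; the function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(std_dev: float, mean: float) -> float:
    """Sample standard deviation divided by sample mean, multiplied by 100."""
    return std_dev / mean * 100

cv_female = coefficient_of_variation(11.32, 91.23)
cv_male = coefficient_of_variation(17.39, 106.79)

print(round(cv_female, 1))  # 12.4
print(round(cv_male, 1))    # 16.3
```

Because the units (mph) cancel, the two values can be compared directly: relative to their mean, the males' fastest speeds vary more than the females'.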
![Page 53: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/53.jpg)
53
Choosing Appropriate Measure of Variability
If data are symmetric, with no serious outliers, use range and standard deviation.
If data are skewed, and/or have serious outliers, use IQR.
If comparing variation across two data sets, use coefficient of variation.
![Page 54: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/54.jpg)
54
Descriptive Statistics
Summarizing data using graphs
![Page 55: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/55.jpg)
55
Which graph to use?
Depends on type of data
Depends on what you want to illustrate
Depends on available statistical software
![Page 56: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/56.jpg)
56
Bar Chart
Summarizes categorical data.
Horizontal axis represents categories, while vertical axis represents either counts (“frequencies”) or percentages (“relative frequencies”).
Used to illustrate the differences in percentages (or counts) between categories.
[Figure: bar chart “Birth Order of Spring 1998 Stat 250 Students”, n=92 students; x-axis: Birth Order (Middle, Oldest, Only, Youngest); y-axis: Percent (10 to 40)]
![Page 57: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/57.jpg)
57
Histogram
Divide measurement up into equal-sized categories.
Determine number (or percentage) of measurements falling into each category.
Draw a bar for each category so bars’ heights represent number (or percent) falling into the categories.
Label and title appropriately.
[Figure: histogram “Age of Spring 1998 Stat 250 Students”, n=92 students; x-axis: Age (in years), 18 to 27; y-axis: Frequency (Count), 0 to 50]
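The binning step behind a histogram can be sketched without any plotting library. The ages below are made-up values, not the Stat 250 data, and `histogram_counts` is an illustrative helper name.

```python
from collections import Counter

def histogram_counts(values, start, width):
    """Count values in equal-width bins [start + k*width, start + (k+1)*width)."""
    bin_index = Counter((v - start) // width for v in values)
    return {start + k * width: bin_index[k] for k in sorted(bin_index)}

ages = [18, 19, 19, 20, 20, 20, 21, 21, 22, 24, 27]   # hypothetical ages in years
counts = histogram_counts(ages, start=18, width=2)

# Text rendering: one row per bin, one asterisk per observation.
for left, count in counts.items():
    print(f"{left} to under {left + 2}: {'*' * count}")
```

Rerunning with a different `width` shows how the choice of category size changes the apparent shape of the data.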
![Page 58: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/58.jpg)
58
Use common sense in determining number of categories to use.
(Trial-and-error works fine, too.)
Number of ranges (see Tufte)
[Figure: histogram “Age of Spring 1998 Stat 250 Students”, n=92 students; x-axis: Age (in years), 18 to 28; y-axis: Frequency (Count), 0 to 60]
[Figure: histogram “GPAs of Spring 1998 Stat 250 Students”, n=92 students; x-axis: GPA, 2 to 4; y-axis: Frequency (Count), 0 to 7]
![Page 59: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/59.jpg)
59
Dot Plot
Summarizes measurement data.
Horizontal axis represents measurement scale.
Plot one dot for each data point.
[Figure: dot plot “Fastest Ever Driving Speed”, 226 Stat 100 Students, Fall '98; x-axis: Speed (70 to 160); groups: Women (n=126), Men (n=100)]
![Page 60: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/60.jpg)
60
Stem-and-Leaf Plot
Summarizes measurement data.
Each data point is broken down into a “stem” and a “leaf.”
First, “stems” are aligned in a column.
Then, “leaves” are attached to the stems.
![Page 61: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/61.jpg)
61
Boxplot
smallest observation = 3.20
Q1 = 43.645
Q2 (median) = 60.345
Q3 = 84.96
largest observation = 124.27
[Figure: boxplot of these five values on an axis from 0 to 130]
![Page 62: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/62.jpg)
62
Box Plot
“Whiskers” are drawn to the most extreme data points that are not more than 1.5 times the length of the box beyond either quartile.
– Whiskers are useful for identifying outliers.
“Outliers,” or extreme observations, are denoted by asterisks.
– Generally, data points falling beyond the whiskers are considered outliers.
Useful for comparing two distributions
[Figure: boxplot “Amount of sleep in past 24 hours of Spring 1998 Stat 250 Students”; y-axis: Hours of sleep (0 to 10)]
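The 1.5-times-the-box-length rule for whiskers and outliers can be sketched with the five-number summary from the earlier Boxplot slide (Q1 = 43.645, Q3 = 84.96, smallest = 3.20, largest = 124.27). The names `lower_fence`, `upper_fence`, and `is_outlier` are illustrative.

```python
q1, q3 = 43.645, 84.96
smallest, largest = 3.20, 124.27

iqr = q3 - q1                   # length of the box
lower_fence = q1 - 1.5 * iqr    # whiskers may not extend below this
upper_fence = q3 + 1.5 * iqr    # whiskers may not extend above this

def is_outlier(x: float) -> bool:
    """True if x lies more than 1.5 box lengths beyond either quartile."""
    return x < lower_fence or x > upper_fence

print(round(lower_fence, 4), round(upper_fence, 4))
print(is_outlier(smallest), is_outlier(largest))
```

For this summary both extremes fall inside the fences, so the whiskers would reach the smallest and largest observations and no points would be flagged as outliers.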
![Page 63: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/63.jpg)
63
Using Box Plots to Compare
[Figure: side-by-side boxplots “Fastest Ever Driving Speed”, 226 Stat 100 Students, Fall 1998; x-axis: Gender (female, male); y-axis: Fastest Speed (mph), 60 to 160]
![Page 64: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/64.jpg)
64
Scatter Plots
Summarizes the relationship between two measurement variables.
Horizontal axis represents one variable and vertical axis represents second variable.
Plot one point for each pair of measurements.
[Figure: scatter plot “Foot sizes of Spring 1998 Stat 250 students”, n=88 students; x-axis: Left foot (in cm), 22 to 31; y-axis: Right foot (in cm), 22 to 31]
![Page 65: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/65.jpg)
65
No relationship
[Figure: scatter plot “Lengths of left forearms and head circumferences of Spring 1998 Stat 250 Students”, n=89 students; x-axis: Head circumference (in cm), 52 to 62; y-axis: Left forearm (in cm), 22 to 32]
![Page 66: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/66.jpg)
66
Closing comments
Many possible types of graphs.
Use common sense in reading graphs.
When creating graphs, don’t summarize your data too much or too little.
When creating graphs, label everything for others. Remember you are trying to communicate something to others!
![Page 67: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/67.jpg)
67
Probability
You’ll probably like it!
![Page 68: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/68.jpg)
68
Before we begin …
What is the probability that 2 or more people share the same birthday if …
– 5 people are in the sample?
– 23 people?
– 50 people?
– This class?
![Page 69: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/69.jpg)
69
Probability Properties
The probability of an event “A” (the proportion of times the event is expected to occur in repeated experiments), is denoted P(A).
All probabilities are between 0 and 1, inclusive (i.e. 0 ≤ P(A) ≤ 1)
The sum of the probabilities of all possible outcomes must be 1.
![Page 70: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/70.jpg)
70
Probability Basics
Given that a crash has occurred, what is the probability that it is a fatal crash?
– Possible events: fatal, injury, and property damage only (PDO)
– Fatal: 37,000 crashes, P(F) = 0.58%
– Injury: 2,026,000 crashes, P(I) = 32.16%
– PDO: 4,226,000 crashes, P(D) = 67.08%
– Total crashes: 6,300,000
![Page 71: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/71.jpg)
71
Complement
The complement of an event A, denoted by Ā, is the set of outcomes that are not in A
Ā means A does not occur
P(Ā) = 1 - P(A)
Some texts use A^c to denote the complement of A
![Page 72: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/72.jpg)
72
Union
The union of two events A and B, denoted by A U B, is the set of outcomes that are in A, or B, or both
If A U B occurs, then either A or B or both occur
![Page 73: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/73.jpg)
73
Intersection
The intersection of two events A and B, denoted by AB (also written A ∩ B), is the set of outcomes that are in both A and B.
If AB occurs, then both A and B occur
![Page 74: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/74.jpg)
74
Combinations of Events
[Figure: Venn diagram inside all fatal crashes (37,795) with two overlapping circles, single vehicle crashes (21,052) and speed related crashes (13,357). Callouts: “Union of fatal speed related and run-off the road crashes” and “Intersection of fatal and run-off the road crashes”.]
![Page 75: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/75.jpg)
75
Addition Law
P(A U B) = P(A) + P(B) - P(AB)
(The probability of the union of A and B is the probability of A plus the probability of B minus the probability of the intersection of A and B)
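As a rough numerical check of the addition law, the sketch below plugs in the fatal-crash counts quoted in this lecture's crash examples (37,795 fatal crashes, 21,052 single vehicle, 13,357 speed related, 8,600 both). The variable names are illustrative assumptions and do not come from the slides.

```python
# Fatal-crash counts quoted in the lecture's crash examples.
total = 37_795           # all fatal crashes
single_vehicle = 21_052  # event A: single vehicle crash
speed_related = 13_357   # event B: speed related crash
both = 8_600             # event AB: single vehicle and speed related

p_a = single_vehicle / total
p_b = speed_related / total
p_ab = both / total

# Addition law: P(A U B) = P(A) + P(B) - P(AB)
p_union = p_a + p_b - p_ab

# Direct count of crashes in A or B; the overlap is subtracted once
# so that it is not counted twice.
p_union_direct = (single_vehicle + speed_related - both) / total

print(f"P(A U B) by addition law: {p_union:.4f}")
print(f"P(A U B) by direct count: {p_union_direct:.4f}")
```

Both routes give the same value, which is the point of subtracting P(AB).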
![Page 76: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/76.jpg)
76
Mutually Exclusive Events
Two events are mutually exclusive if their intersection is empty.
Two events, A and B, are mutually exclusive if and only if P(AB) = 0
For mutually exclusive events the addition law reduces to P(A U B) = P(A) + P(B)
![Page 77: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/77.jpg)
77
Conditional Probability
The probability of event A occurring, given that event B has occurred, is called the conditional probability of event A given event B, denoted P(A|B)
![Page 78: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/78.jpg)
78
Multiplication Rule
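General form: P(A/B) = P(A,B)/P(B), where P(A/B) is the conditional probability P(A|B) and P(A,B) is the probability that both A and B occur
e.g., what is the probability of a single vehicle accident given that it was speed related?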
![Page 79: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/79.jpg)
79
Conditional Probability Example
Total fatal crashes – 37,795
Total speed related crashes – 13,357
Total single vehicle crashes – 21,052
Total single vehicle, speed related crashes – 8,600
If the crash was speed related, what is the probability that it was a single vehicle crash?
– P(sv/sp) = 8600/13357 = 64.38%
If the crash was speed related, what is the probability that it was not a single vehicle crash?
– P(not sv/sp) = 1 – P(sv/sp) = 1 – 0.6438 = 35.62%
[Figure: Venn diagram of all fatal crashes (37,795): single vehicle crashes (21,052), speed related crashes (13,357), and their overlap SR+SV (8,600)]
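The conditional probability on this slide can be reproduced from the raw counts. The following is a minimal sketch, with variable names chosen here for illustration only. At full precision the results may differ from the slide in the last digit, because the slide appears to truncate rather than round.

```python
# Counts from the conditional probability example.
total = 37_795          # all fatal crashes
speed_related = 13_357  # speed related crashes (sp)
sv_and_sp = 8_600       # single vehicle and speed related crashes (sv,sp)

p_sp = speed_related / total
p_sv_and_sp = sv_and_sp / total

# Multiplication rule rearranged: P(sv/sp) = P(sv,sp) / P(sp)
p_sv_given_sp = p_sv_and_sp / p_sp

# Complement among the speed related crashes.
p_not_sv_given_sp = 1 - p_sv_given_sp

print(f"P(sv/sp)     = {p_sv_given_sp:.2%}")
print(f"P(not sv/sp) = {p_not_sv_given_sp:.2%}")
```

Dividing the two probabilities cancels the total, so P(sv/sp) is simply 8,600/13,357.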
![Page 80: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/80.jpg)
80
Conditional Probability Example (Cont)
Probability that a fatal crash was speed related = P(sp)
– 13,357/37,795 = 35.34%
Probability that a fatal crash was a single vehicle crash = P(sv)
– 21,052/37,795 = 55.70%
Probability that a fatal crash is both speeding related and a single vehicle crash = P(sv,sp)
– 8,600/37,795 = 22.75%
[Figure: Venn diagram of all fatal crashes (37,795): single vehicle crashes (21,052), speed related crashes (13,357), and their overlap SR+SV (8,600)]
![Page 81: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/81.jpg)
81
Bayes’ Theorem
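P(A/B)P(B) = P(B/A)P(A)
P(B/A) = P(A/B)P(B)/P(A)
With A = single vehicle (sv) and B = speed related (sp):
– P(sv) = 55.70%
– P(sp) = 35.34%
– P(sv/sp) = 64.38%
– P(sp/sv) = ?
P(sp/sv) = ((0.6438)*(0.3534))/0.5570 = 0.4085
This agrees with the direct count 8,600/21,052 = 40.85%.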
[Figure: Venn diagram of all fatal crashes (37,795): single vehicle crashes (21,052), speed related crashes (13,357), and their overlap SR+SV (8,600)]
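One way to sanity-check a Bayes' theorem calculation is to compute the reversed conditional probability twice, once through the theorem and once directly from the counts. The sketch below does this for P(sp/sv). The variable names are assumptions made for illustration.

```python
# Counts from the fatal crash example.
total = 37_795
single_vehicle = 21_052  # sv
speed_related = 13_357   # sp
sv_and_sp = 8_600        # sv and sp together

p_sv = single_vehicle / total
p_sp = speed_related / total
p_sv_given_sp = sv_and_sp / speed_related

# Bayes' theorem: P(sp/sv) = P(sv/sp) * P(sp) / P(sv)
p_sp_given_sv = p_sv_given_sp * p_sp / p_sv

# Direct route: share of single vehicle crashes that were speed related.
p_sp_given_sv_direct = sv_and_sp / single_vehicle

print(f"Bayes:  {p_sp_given_sv:.4f}")
print(f"Direct: {p_sp_given_sv_direct:.4f}")
```

The two routes agree because Bayes' theorem is the multiplication rule written in both orders.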
![Page 82: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/82.jpg)
82
Bayes’ Theorem Problem
Given
– There were 11,696 off-road fixed object fatal crashes involving a single vehicle
– There were 13,357 fatal crashes involving a speeding vehicle
– There were 8,600 fatal crashes involving speeding and single vehicles
– There were 5,400 fatal crashes involving single vehicles, speeding, and off-road fixed object crashes
– The total number of fatal crashes is 37,795
– Given that a crash is speeding related, what is the probability that it will be an off-road single vehicle crash?
![Page 83: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/83.jpg)
83
Bayes’ Problem Answer
What we need to know: P(or,sv/sp)
What we know:
– P(or,sv) = 30.95%
– P(sp) = 35.34%
– P(sv) = 55.70%
– P(sv,sp) = 22.75%
– P(sp,or,sv) = 14.29%
– P(or,sv/sv) = 55.56%
![Page 84: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/84.jpg)
84
Answer Continued
Multiplication Rule
– P(sp/or,sv)P(or,sv) = P(sp,or,sv)
– P(sp/or,sv) = P(sp,or,sv)/P(or,sv)
– 46.17% = 0.1429/0.3095
Bayes’ Theorem
– P(or,sv/sp) = (P(sp/or,sv)*P(or,sv))/P(sp)
– 40.43% = (0.4617*0.3095)/0.3534
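The two steps of this answer can be traced in a few lines of Python. This is only a sketch from the counts given in the problem statement, with illustrative variable names that are not part of the slides.

```python
# Counts from the Bayes' theorem problem.
total = 37_795
offroad_sv = 11_696          # off-road fixed object, single vehicle (or,sv)
speed_related = 13_357       # speeding related (sp)
sp_and_offroad_sv = 5_400    # speeding, single vehicle, off-road (sp,or,sv)

p_or_sv = offroad_sv / total
p_sp = speed_related / total
p_sp_or_sv = sp_and_offroad_sv / total

# Step 1, multiplication rule: P(sp/or,sv) = P(sp,or,sv) / P(or,sv)
p_sp_given_or_sv = p_sp_or_sv / p_or_sv

# Step 2, Bayes' theorem: P(or,sv/sp) = P(sp/or,sv) * P(or,sv) / P(sp)
p_or_sv_given_sp = p_sp_given_or_sv * p_or_sv / p_sp

print(f"P(sp/or,sv) = {p_sp_given_or_sv:.2%}")
print(f"P(or,sv/sp) = {p_or_sv_given_sp:.2%}")
```

Because the joint count is known here, the same answer also follows directly as 5,400/13,357.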
![Page 85: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/85.jpg)
85
Independence
Two events A and B are independent if
P(A|B) = P(A)
or
P(B|A) = P(B)
or
P(AB) = P(A)P(B)
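The product condition gives a quick way to test whether two events look independent. As a hedged illustration, the sketch below applies it to the single vehicle and speed related counts from the lecture's fatal crash example. The 1% tolerance is an arbitrary assumption for the illustration, not a statistical test.

```python
import math

# Counts from the lecture's fatal crash example.
total = 37_795
p_sv = 21_052 / total        # P(A): single vehicle
p_sp = 13_357 / total        # P(B): speed related
p_sv_and_sp = 8_600 / total  # P(AB): both

# Under independence, P(AB) would equal P(A) * P(B).
product = p_sv * p_sp

# Arbitrary 1% relative tolerance, chosen only for illustration.
looks_independent = math.isclose(p_sv_and_sp, product, rel_tol=0.01)

print(f"P(AB)     = {p_sv_and_sp:.4f}")
print(f"P(A)P(B)  = {product:.4f}")
print(f"Independent under this crude check? {looks_independent}")
```

Here P(AB) is noticeably larger than P(A)P(B), which suggests that speed related fatal crashes are more likely than average to involve a single vehicle.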
![Page 86: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/86.jpg)
86
Probability Concepts
Randomness
Independence
![Page 87: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/87.jpg)
87
Thought Question 1
What does it mean to say that a deck of cards is “randomly” shuffled?
Every ordering of the cards is equally likely
There are about 8 × 10^67 (8 followed by 67 zeros) possible orderings of a 52 card deck
Every card has the same probability to end up in any specified location
![Page 88: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/88.jpg)
88
The question continued
A 52 card deck is randomly shuffled
How often will the tenth card down from the top be a Club?
1/4 of the time
Every card has the same chance to end up 10th. There are 13 clubs and 13/52 = 1/4
![Page 89: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/89.jpg)
89
Law of Large Numbers
Relative frequency of an event gets closer to true probability as number of trials gets larger
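A small simulation can illustrate the law of large numbers. The sketch below flips a simulated fair coin and reports the relative frequency of heads for increasing numbers of flips. The function name, the seed, and the choice of a coin are assumptions made for this illustration.

```python
import random


def relative_frequency_of_heads(n_flips, seed=0):
    """Simulate n_flips fair coin flips and return the fraction of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


for n in (10, 100, 10_000, 100_000):
    freq = relative_frequency_of_heads(n)
    print(f"{n:>7} flips: relative frequency of heads = {freq:.4f}")
```

With few flips the relative frequency can be far from 0.5, while for large n it typically settles close to the true probability.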
![Page 90: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/90.jpg)
90
Probability values
Probabilities are between 0 and 1
Total probabilities of all possible outcomes = 1
Probability = 1 means an event always happens
Probability = 0 means an event never happens
![Page 91: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/91.jpg)
91
Does a prior event matter?
A fair coin is flipped four times.
First three flips are heads
What’s the probability that the fourth flip is heads?
1/2 assuming flips are independent
Results of first three flips don’t matter
![Page 92: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/92.jpg)
92
Independence
The chance that B happens is not affected by whether A has happened.
![Page 93: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/93.jpg)
93
Does prior event matter?
Ten cards are drawn without replacement from a 52 card deck.
2 Aces are among these 10 cards
What’s the probability the next (eleventh) card is an Ace?
2/42 = 1/21
After ten draws, 42 cards remain, and 2 of them are Aces
![Page 94: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/94.jpg)
94
Dependence
The chance that B happens is affected by whether A has happened.
![Page 95: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/95.jpg)
95
Sequence of Events
You guess at five True/False questions. What’s the probability you get them all right?
![Page 96: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/96.jpg)
96
Five right in five guesses
For each question, Pr(correct) = 1/2
Multiply probabilities
(1/2) x (1/2) x (1/2) x (1/2) x (1/2) = 1/32 = 0.031
![Page 97: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/97.jpg)
97
Card Example
Two cards are taken, without replacement, from a normal 52 card deck.
What’s the probability that both are Hearts?
Note - there’s dependence between the two cards
Answer = (13/52) x (12/51) = 1/17 = 0.059
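The two sequence calculations, five independent guesses and two dependent card draws, can be checked with exact fractions. This is a minimal sketch using Python's standard `fractions` module. The variable names are illustrative only.

```python
from fractions import Fraction

# Independent events: each of five True/False guesses is right with
# probability 1/2, so the probabilities simply multiply.
p_five_right = Fraction(1, 2) ** 5

# Dependent events: the second draw depends on the first.
p_first_heart = Fraction(13, 52)               # 13 hearts among 52 cards
p_second_heart_given_first = Fraction(12, 51)  # 12 hearts left among 51
p_two_hearts = p_first_heart * p_second_heart_given_first

print(f"Five right in five guesses: {p_five_right} = {float(p_five_right):.3f}")
print(f"Two hearts in two draws:    {p_two_hearts} = {float(p_two_hearts):.3f}")
```

In both cases the probabilities are multiplied; with dependence, the second factor is a conditional probability.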
![Page 98: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/98.jpg)
98
The Birthday Problem
What is the probability that at least two people in this class share the same birthday?
![Page 99: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/99.jpg)
99
Assumptions
Only 365 days each year.
Birthdays are evenly distributed throughout the year, so that each day of the year has an equal chance of being someone’s birthday.
![Page 100: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/100.jpg)
100
Take group of 5 people….
Let A = event no one in group shares same birthday.
Then A^c = event at least 2 people share same birthday.
P(A) = 365/365 × 364/365 × 363/365 × 362/365 × 361/365 = 0.973
P(A^c) = 1 - 0.973 = 0.027
That is, about a 3% chance that in a group of 5 people at least two people share the same birthday.
![Page 101: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/101.jpg)
101
Take group of 23 people….
Let A = event no one in group shares same birthday.
Then A^c = event at least 2 people share same birthday.
P(A) = 365/365 × 364/365 × … × 343/365 = 0.493
P(A^c) = 1 - 0.493 = 0.507
That is, about a 50% chance that in a group of 23 people at least two people share the same birthday.
![Page 102: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/102.jpg)
102
Take group of 50 people….
Let A = event no one in group shares same birthday.
Then A^c = event at least 2 people share same birthday.
P(A) = 365/365 × 364/365 × … × 316/365 = 0.03
P(A^c) = 1 - 0.03 = 0.97
That is, “virtually certain” that in a group of 50 people at least two people share the same birthday.
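The same complement calculation works for any group size, for example the size of this class. The following sketch assumes, as the slides do, 365 equally likely birthdays. The function name is a hypothetical one chosen for illustration.

```python
def prob_shared_birthday(n_people, days=365):
    """Probability that at least two of n_people share a birthday,
    assuming `days` equally likely birthdays."""
    p_all_different = 1.0
    for k in range(n_people):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_different *= (days - k) / days
    return 1.0 - p_all_different


for n in (5, 23, 50):
    print(f"{n:>2} people: P(at least one shared birthday) = "
          f"{prob_shared_birthday(n):.3f}")
```

Calling `prob_shared_birthday` with the class size answers the question posed at the start of the probability section.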
![Page 103: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/103.jpg)
103
Two-way Tables
And various probabilities...
![Page 104: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/104.jpg)
104
Two-way table of counts
Rows: gender. Columns: pierced ears. Cell contents: count.

| Gender | N | Y | All |
|---|---|---|---|
| M | 71 | 19 | 90 |
| F | 4 | 84 | 88 |
| All | 75 | 103 | 178 |
![Page 105: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/105.jpg)
105
Joint probabilities
Rows: gender. Columns: pierced ears. Cell contents: count (% of table).

| Gender | N | Y | All |
|---|---|---|---|
| M | 71 (39.89%) | 19 (10.67%) | 90 (50.56%) |
| F | 4 (2.25%) | 84 (47.19%) | 88 (49.44%) |
| All | 75 (42.13%) | 103 (57.87%) | 178 (100.00%) |
![Page 106: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/106.jpg)
106
Row conditional probabilities
Rows: gender. Columns: pierced ears. Cell contents: count (% of row).

| Gender | N | Y | All |
|---|---|---|---|
| M | 71 (78.89%) | 19 (21.11%) | 90 (100.00%) |
| F | 4 (4.55%) | 84 (95.45%) | 88 (100.00%) |
| All | 75 (42.13%) | 103 (57.87%) | 178 (100.00%) |
![Page 107: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/107.jpg)
107
Column conditional probabilities
Rows: gender. Columns: pierced ears. Cell contents: count (% of column).

| Gender | N | Y | All |
|---|---|---|---|
| M | 71 (94.67%) | 19 (18.45%) | 90 (50.56%) |
| F | 4 (5.33%) | 84 (81.55%) | 88 (49.44%) |
| All | 75 (100.00%) | 103 (100.00%) | 178 (100.00%) |
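The three kinds of percentages in these tables differ only in the denominator: the grand total for joint probabilities, the row total for row conditionals, and the column total for column conditionals. The sketch below recomputes them from the counts. The dictionary layout and helper name are assumptions made for illustration.

```python
# Counts from the two-way table (gender by pierced ears).
counts = {
    "M": {"N": 71, "Y": 19},
    "F": {"N": 4, "Y": 84},
}
columns = ("N", "Y")

grand_total = sum(sum(row.values()) for row in counts.values())
row_totals = {g: sum(row.values()) for g, row in counts.items()}
col_totals = {c: sum(counts[g][c] for g in counts) for c in columns}


def pct(part, whole):
    """Percentage rounded to two decimals, as in the tables."""
    return round(100 * part / whole, 2)


cells = [(g, c) for g in counts for c in columns]
joint = {(g, c): pct(counts[g][c], grand_total) for g, c in cells}
row_cond = {(g, c): pct(counts[g][c], row_totals[g]) for g, c in cells}
col_cond = {(g, c): pct(counts[g][c], col_totals[c]) for g, c in cells}

print("P(M and N)  =", joint[("M", "N")], "%")
print("P(Y given F) =", row_cond[("F", "Y")], "%")
print("P(M given Y) =", col_cond[("M", "Y")], "%")
```

Reading a conditional percentage correctly comes down to checking which total it was divided by.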
![Page 108: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/108.jpg)
108
Expected Value
Coincidences
![Page 109: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/109.jpg)
109
Roulette Color Bet
18 black, 18 red, and 2 green numbers
Bet on one of black or red
If correct, win $1
If wrong, lose $1
![Page 110: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/110.jpg)
110
Is the bet fair?
Fair game: expected value is 0
Expected value = sum of (outcome x prob)
Exp. Val. = (+1)(18/38) + (-1)(20/38) = -2/38, an average loss of about 5 cents per $1 bet
Not fair since expected value is not 0.
![Page 111: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/111.jpg)
111
Color Bet versus Number bet
Both have same expected value
How are the bets the same? Long run result is same
How are they different? Short run results can be quite different
![Page 112: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/112.jpg)
112
Prob of Five Straight Losses
Color Bet = (20/38)^5 = 0.04, 4%
Number Bet = (37/38)^5 = 0.88, 88%
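The comparison of the two bets can be sketched in code. The slides do not state the payout on a single number, so the 35-to-1 payout used below is an assumption (the usual payout on a 38-slot wheel). The function name is also illustrative.

```python
from fractions import Fraction

SLOTS = 38  # 18 black, 18 red, 2 green


def expected_value(win_amount, winning_slots):
    """Expected value of a $1 bet: win win_amount or lose $1."""
    p_win = Fraction(winning_slots, SLOTS)
    return win_amount * p_win + (-1) * (1 - p_win)


ev_color = expected_value(win_amount=1, winning_slots=18)
# Assumption: a single-number bet pays 35 to 1.
ev_number = expected_value(win_amount=35, winning_slots=1)

# Probability of losing five bets in a row (spins are independent).
p_five_losses_color = float(Fraction(20, 38) ** 5)
p_five_losses_number = float(Fraction(37, 38) ** 5)

print(f"EV color bet:  {ev_color} = {float(ev_color):.4f}")
print(f"EV number bet: {ev_number} = {float(ev_number):.4f}")
print(f"Five straight losses, color:  {p_five_losses_color:.2f}")
print(f"Five straight losses, number: {p_five_losses_number:.2f}")
```

Under that payout assumption the long-run averages match, while the short-run chance of a losing streak differs sharply between the two bets.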
![Page 113: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/113.jpg)
113
A Spectacular Coincidence ?
Many states draw four digit lottery numbers
Several years ago Mass. and N.H. both drew the same number on the same night
Associated Press wrote that this was a spectacular 1 in 100 million coincidence
![Page 114: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/114.jpg)
114
Was Associated Press Right ?
Only if number picked is specified in advance of the draws.
Chance both pick the same pre-specified number, for example 2963, is (1/10,000) × (1/10,000)
This is 1 in 100 million
But the match could have been on any of 10,000 possibilities
![Page 115: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/115.jpg)
115
The correct analysis
First state could have picked any number
Chance the second state matches is 1/10,000
Answer for two specific states is 1/10,000
But there were 15 states doing this almost every night.
![Page 116: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/116.jpg)
116
The prob that the 15 states all differ
First state can be any number
Prob second state differs = 9,999/10,000
Prob third state is unique = 9,998/10,000
And so on, for 15 states
Multiply these prob.'s to get probability that all 15 differ
Answer is about 0.99 that all picked different numbers
![Page 117: 1 TR 555 Statistics “Refresher” Lecture 1: Probability Concepts References: – Penn State University, Dept. of Statistics Statistical Education Resource](https://reader030.vdocument.in/reader030/viewer/2022032707/56649e205503460f94b0c389/html5/thumbnails/117.jpg)
117
Prob at least two states are same
Opposite from all different
Prob at least two the same = 1 - Prob(all differ)
1 - 0.99 = 0.01
About 1 in 100; a far cry from 1 in 100 million
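The lottery argument has the same structure as the birthday problem, with 10,000 possible numbers in place of 365 days. The sketch below follows the multiplication described above, assuming each state draws independently and uniformly from 10,000 four digit numbers. The function name is a hypothetical one used for illustration.

```python
def prob_some_match(n_states, n_numbers=10_000):
    """Probability that at least two of n_states draw the same number,
    assuming independent, equally likely draws from n_numbers values."""
    p_all_differ = 1.0
    for k in range(n_states):
        # The (k+1)-th state must avoid the k numbers already drawn.
        p_all_differ *= (n_numbers - k) / n_numbers
    return 1.0 - p_all_differ


p_two_states = prob_some_match(2)       # two specific states
p_fifteen_states = prob_some_match(15)  # any pair among 15 states

print(f"Two specific states match: {p_two_states:.6f}")
print(f"Some pair among 15 states: {p_fifteen_states:.4f}")
```

On any single night the chance is about 1 in 100, so with draws almost every night such a match is expected fairly often.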