chapter 4
CHAPTER 4
• 4.1 - Discrete Models General distributions Classical: Binomial, Poisson, etc.
• 4.2 - Continuous Models General distributions Classical: Normal, etc.
What is the connection between probability and random variables? Events (and their corresponding probabilities) that involve experimental measurements can be described by random variables (e.g., “X = # Males” in previous gender equity example).
POPULATION: random variable X (Discrete)
Example: X = Cholesterol level (mg/dL)

Probability Table
Pop values xi    Probabilities f(xi)
x1               f(x1)
x2               f(x2)
x3               f(x3)
⋮                ⋮
Total            1

SAMPLE of size n: data values x1, x2, x3, x4, x5, x6, …, xn

Frequency Table
Data values xi   Relative Frequencies f(xi) = fi /n
x1               f(x1)
x2               f(x2)
x3               f(x3)
⋮                ⋮
xk               f(xk)
Total            1
Probability Histogram of X (Total Area = 1)
f(x) = Probability that the random variable X is equal to a specific value x, i.e.,
f(x) = P(X = x), the “probability mass function” (pmf)
F(x) = Probability that the random variable X is less than or equal to a specific value x, i.e.,
F(x) = P(X ≤ x), the “cumulative distribution function” (cdf)
Hey!!! What about the population mean $\mu$ and the population variance $\sigma^2$ ???
Calculating probabilities…
$P(a \le X \le b) = \sum_{x=a}^{b} f(x) = F(b) - F(a^-)$, where $F(a^-) = P(X < a)$
(For a discrete X the value a itself carries probability, so it must not be subtracted away: $F(b) - F(a)$ gives $P(a < X \le b)$.)
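The interval calculation can be checked numerically. The following is a minimal sketch, not part of the original slides; it borrows the three-value cholesterol distribution from Example 1, and the helper names `cdf`, `cdf_left`, and `prob_between` are made up for illustration.

```python
from fractions import Fraction

# pmf taken from Example 1 of these slides: value (mg/dL) -> probability
pmf = {210: Fraction(1, 6), 240: Fraction(1, 3), 270: Fraction(1, 2)}

def cdf(x):
    """F(x) = P(X <= x): add f(v) over all values v <= x."""
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))

def cdf_left(x):
    """F(x-) = P(X < x): the cdf just below x."""
    return sum((p for v, p in pmf.items() if v < x), Fraction(0))

def prob_between(a, b):
    """P(a <= X <= b), summing the pmf directly."""
    return sum((p for v, p in pmf.items() if a <= v <= b), Fraction(0))

a, b = 240, 270
direct = prob_between(a, b)        # f(240) + f(270)
via_cdf = cdf(b) - cdf_left(a)     # F(b) - F(a-)
print(direct, via_cdf)             # 5/6 5/6
print(cdf(b) - cdf(a))             # 1/2, because the mass at a = 240 is lost
```

Exact fractions are used so the comparison is not blurred by floating-point rounding.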
Just as the sample mean $\bar{x}$ and sample variance $s^2$ were used to characterize “measure of center” and “measure of spread” of a dataset, we can now define the “true” population mean $\mu$ and population variance $\sigma^2$, using probabilities.
• Population mean: $\mu = \sum x \, f(x)$
Also denoted by E[X], the “expected value” of the variable X.
• Population variance: $\sigma^2 = \sum (x - \mu)^2 \, f(x)$
Example 1: X = Cholesterol level (mg/dL), discrete random variable

Pop values xi    Probabilities f(xi)
210              1/6
240              1/3
270              1/2
Total            1

$\mu = (210)(1/6) + (240)(1/3) + (270)(1/2) = 250$
$\sigma^2 = (-40)^2(1/6) + (-10)^2(1/3) + (20)^2(1/2) = 500$
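The arithmetic of Example 1 can be reproduced in a few lines. This is only an illustrative sketch; the function name `mean_and_variance` is an assumption and does not come from the slides.

```python
from fractions import Fraction

def mean_and_variance(pmf):
    """Return (mu, sigma^2) for a discrete pmf given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())                # mu = sum of x f(x)
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())   # sum of (x - mu)^2 f(x)
    return mu, var

# Example 1 from the slides
example1 = {210: Fraction(1, 6), 240: Fraction(1, 3), 270: Fraction(1, 2)}
assert sum(example1.values()) == 1   # a pmf must total 1

mu, var = mean_and_variance(example1)
print(mu, var)   # 250 500
```

The same function applies unchanged to any finite probability table.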
Example 2: X = Cholesterol level (mg/dL), discrete random variable

Pop values xi    Probabilities f(xi)
180              1/3
210              1/3
240              1/3
Total            1

Equally likely outcomes result in a “uniform distribution.”
$\mu = (180)(1/3) + (210)(1/3) + (240)(1/3) = 210$ (clear from symmetry)
$\sigma^2 = (-30)^2(1/3) + (0)^2(1/3) + (30)^2(1/3) = 600$
To summarize…
Discrete random variable X

POPULATION
Probability Table
Pop xi    Probabilities f(xi)
x1        f(x1)
x2        f(x2)
x3        f(x3)
⋮         ⋮
Total     1
$\mu = \sum x \, f(x)$
$\sigma^2 = \sum (x - \mu)^2 \, f(x)$
Probability Histogram of X (Total Area = 1)

SAMPLE of size n
Data: x1, x2, x3, x4, x5, x6, …, xn
Frequency Table
Data xi   Relative Frequencies f(xi) = fi /n
x1        f(x1)
x2        f(x2)
x3        f(x3)
⋮         ⋮
xk        f(xk)
Total     1
$\bar{x} = \sum x \, f(x)$
$s^2 = \frac{n}{n-1} \sum (x - \bar{x})^2 \, f(x)$
Density Histogram of X (Total Area = 1)
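To see the sample-side formulas at work, here is a small sketch that computes the sample mean and sample variance from relative frequencies, including the n/(n - 1) factor. The six data values are hypothetical, chosen only for illustration; the result is compared with Python's standard `statistics` module.

```python
from collections import Counter
from fractions import Fraction
import statistics

# Hypothetical sample of size n = 6 (not from the slides)
data = [210, 240, 240, 270, 270, 270]
n = len(data)

# Relative frequencies f(x_i) = f_i / n
rel_freq = {x: Fraction(count, n) for x, count in Counter(data).items()}

# Sample mean and sample variance written in terms of relative frequencies
x_bar = sum(x * f for x, f in rel_freq.items())
s2 = Fraction(n, n - 1) * sum((x - x_bar) ** 2 * f for x, f in rel_freq.items())

print(x_bar, s2)                             # 250 600
print(s2 == statistics.variance(data))       # True
```

The relative frequencies here happen to equal the probabilities of Example 1, so the only difference from the population variance of 500 is the factor n/(n - 1) = 6/5.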
Discrete random variable X → Continuous random variable X ?
One final example…
Example 3: TWO INDEPENDENT POPULATIONS

X1 = Cholesterol level (mg/dL)
x        f1(x)
210      1/6
240      1/3
270      1/2
Total    1
$\mu_1 = 250$, $\sigma_1^2 = 500$

X2 = Cholesterol level (mg/dL)
x        f2(x)
180      1/3
210      1/3
240      1/3
Total    1
$\mu_2 = 210$, $\sigma_2^2 = 600$

D = X1 – X2 ~ ???

d      Outcomes (x1, x2)
-30    (210, 240)
0      (210, 210), (240, 240)
+30    (210, 180), (240, 210), (270, 240)
+60    (240, 180), (270, 210)
+90    (270, 180)
NOTE: By definition, this is the sample space of the experiment!

What are the probabilities of the corresponding events “D = d” for d = -30, 0, 30, 60, 90?

d      Probabilities f(d)
-30    1/9 ?
0      2/9 ?
+30    3/9 ?
+60    2/9 ?
+90    1/9 ?
NO!!! The outcomes of D are NOT EQUALLY LIKELY!!!
Via independence, the probability of each outcome (x1, x2) is the product f1(x1) f2(x2), and f(d) adds these products over all outcomes with x1 – x2 = d. For example, f(-30) = (1/6)(1/3) = 1/18 and f(0) = (1/6)(1/3) + (1/3)(1/3) = 3/18.
d      Probabilities f(d)
-30    (1/6)(1/3) = 1/18 via independence
0      (1/6)(1/3) + (1/3)(1/3) = 3/18
+30    (1/6)(1/3) + (1/3)(1/3) + (1/2)(1/3) = 6/18
+60    (1/3)(1/3) + (1/2)(1/3) = 5/18
+90    (1/2)(1/3) = 3/18
Probability Histogram of D: bar heights 1/18, 3/18, 6/18, 5/18, 3/18
What happens if the two populations are dependent? SEE LECTURE NOTES!
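The table of f(d) can be rebuilt mechanically by enumerating every pair (x1, x2) and multiplying the marginal probabilities, which is valid only because the two populations are assumed independent. A minimal sketch, with variable names chosen for illustration:

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Marginal pmfs from Example 3
f1 = {210: Fraction(1, 6), 240: Fraction(1, 3), 270: Fraction(1, 2)}
f2 = {180: Fraction(1, 3), 210: Fraction(1, 3), 240: Fraction(1, 3)}

# f(d) = sum of f1(x1) * f2(x2) over all pairs with x1 - x2 = d
f_d = defaultdict(Fraction)   # Fraction() is 0, so missing keys start at zero
for (x1, p1), (x2, p2) in product(f1.items(), f2.items()):
    f_d[x1 - x2] += p1 * p2   # product rule uses independence

for d in sorted(f_d):
    # Fraction prints in lowest terms, e.g. 3/18 appears as 1/6
    print(d, f_d[d])
```

Note that `Fraction` reduces automatically, so 3/18 and 6/18 are shown as 1/6 and 1/3.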
$\mu_D = (-30)(1/18) + (0)(3/18) + (30)(6/18) + (60)(5/18) + (90)(3/18) = 40$
$\sigma_D^2 = (-70)^2(1/18) + (-40)^2(3/18) + (-10)^2(6/18) + (20)^2(5/18) + (50)^2(3/18) = 1100$
$\mu_D = \mu_1 - \mu_2$ (40 = 250 – 210)
$\sigma_D^2 = \sigma_1^2 + \sigma_2^2$ (1100 = 500 + 600)
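As a quick check of these two identities, the sketch below computes the mean and variance of D directly from its probability table and compares them with the values obtained from X1 and X2 separately. The helper `mean_and_variance` is an illustrative name, not something defined in the slides.

```python
from fractions import Fraction

def mean_and_variance(pmf):
    """Return (mu, sigma^2) for a discrete pmf given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    return mu, var

# Marginal pmfs and the pmf of D = X1 - X2 from Example 3
f1 = {210: Fraction(1, 6), 240: Fraction(1, 3), 270: Fraction(1, 2)}
f2 = {180: Fraction(1, 3), 210: Fraction(1, 3), 240: Fraction(1, 3)}
f_d = {-30: Fraction(1, 18), 0: Fraction(3, 18), 30: Fraction(6, 18),
       60: Fraction(5, 18), 90: Fraction(3, 18)}

mu1, var1 = mean_and_variance(f1)
mu2, var2 = mean_and_variance(f2)
mu_d, var_d = mean_and_variance(f_d)

print(mu_d, var_d)                 # 40 1100
print(mu_d == mu1 - mu2)           # True
print(var_d == var1 + var2)        # True (relies on independence)
```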
General: TWO INDEPENDENT POPULATIONS
D = X1 – X2
$\mu_D = \mu_1 - \mu_2$, i.e., Mean (X1 – X2) = Mean (X1) – Mean (X2)
$\sigma_D^2 = \sigma_1^2 + \sigma_2^2$, i.e., Var (X1 – X2) = Var (X1) + Var (X2)
These two formulas are valid for continuous as well as discrete distributions.
IF the two populations are dependent…
…then the formula for the mean still holds,
BUT the variance formula needs a covariance term: Var (X1 – X2) = Var (X1) + Var (X2) – 2 Cov (X1, X2)
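The dependent case can be illustrated with a small joint distribution. The joint table below is entirely hypothetical (it is not from the slides or the lecture notes) and is chosen only so that X1 and X2 are visibly dependent; the helper `expect` is likewise an assumed name.

```python
from fractions import Fraction

# Hypothetical joint pmf of (X1, X2); the variables are NOT independent here
joint = {
    (210, 180): Fraction(1, 3),
    (240, 210): Fraction(1, 3),
    (270, 240): Fraction(1, 6),
    (270, 180): Fraction(1, 6),
}
assert sum(joint.values()) == 1

def expect(g):
    """E[g(X1, X2)] under the joint pmf."""
    return sum(g(x1, x2) * p for (x1, x2), p in joint.items())

mu1 = expect(lambda x1, x2: x1)
mu2 = expect(lambda x1, x2: x2)
var1 = expect(lambda x1, x2: (x1 - mu1) ** 2)
var2 = expect(lambda x1, x2: (x2 - mu2) ** 2)
cov = expect(lambda x1, x2: (x1 - mu1) * (x2 - mu2))

mu_d = expect(lambda x1, x2: x1 - x2)
var_d = expect(lambda x1, x2: (x1 - x2 - mu_d) ** 2)

print(mu_d == mu1 - mu2)                  # True: the mean formula still holds
print(var_d == var1 + var2 - 2 * cov)     # True: the covariance term is needed
print(var_d == var1 + var2)               # False: independence formula fails
```

With this table the covariance is 300, so dropping the covariance term would overstate Var (X1 – X2) by 600.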