Random Variables & Probability Distributions
Chapter 6 of the textbook
Pages 167-208
Lecture Overview
Schedule
Clarification from Friday
Discrete Random Variables
Schedule
Today:
– Discrete random variables
– Homework #4 will be posted this afternoon
Wednesday:
– Continuous random variables & bivariate random variables
– Homework #3 due
Friday:
– Homework #4 help & Excel show and tell
Next Monday:
– Any remaining chapter 6 slides
– Exam #1 review
– Homework #4 due
Next Wednesday:
– Exam #1 (you're allowed 1 sheet of paper (front & back) for notes & equations)
– Test questions will be very reminiscent of homework problems
Next Friday:
– Go over exam #1 questions
– Intro to S-Plus
Clarification From Friday
On HW3, question #15
P(A) = .3, P(B) = .5, P(B|A) = .4
What is P(A|B)? What is P(A∩B)?
Using the multiplication theorem
P(B∩A) = P(B|A)*P(A) = 0.4 * 0.3 = .12
Using the definition of conditional probability
P(A|B) = P(A∩B) / P(B) = .12 / .5 = .24
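These two steps can be sketched in a few lines of Python (an illustration only, not part of the course materials) to confirm the arithmetic:

```python
# HW3 #15 check: multiplication theorem, then the definition of
# conditional probability.
p_a = 0.3
p_b = 0.5
p_b_given_a = 0.4

p_a_and_b = p_b_given_a * p_a   # P(A ∩ B) = P(B|A) * P(A)
p_a_given_b = p_a_and_b / p_b   # P(A|B) = P(A ∩ B) / P(B)

print(round(p_a_and_b, 2))    # 0.12
print(round(p_a_given_b, 2))  # 0.24
```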
Clarification From Friday
Why doesn’t P(A ∩ B) = P(A) * P(B)?
Answer: because (A) and (B) aren’t statistically independent
Recall that statistical independence is defined as:
– P(A|B) = P(A)
– OR
– P(B|A) = P(B)
This is not true for this problem
If A and B are statistically independent, the multiplication theorem becomes: P(A ∩ B) = P(A) * P(B) since we can just replace P(A|B) with P(A)
Definitions
Random Sample (from Ch. 1)
Variable (from Ch. 1)
Random Variable
– "any numerically valued function that is defined over a sample space"
– For the household example in the book: "the variable is random not because the household makes a random decision to include a certain number of people, but because our sample experiment selects a household randomly"
Example
Imagine randomly sampling students in the union and asking them how many books they are carrying
Example Data (Elementary Outcomes)
– Student 1: 3 books
– Student 2: 2 books
– Student 3: 0 books
– Student 4: 1 book
– Student 5: 2 books
– Student 6: 1 book
– Student 7: 0 books
– Student 8: 1 book
– Student 9: 4 books
– Student 10: 1 book
Sample Space : {0,1,2,3,4}
Random Variable (X) (Function: X = # of books)
– X(Student 1) = 3
– X(Student 2) = 2
– X(Student 3) = 0
– Etc.
Probability of Random Variables
– So X can be any value from the full set of possible #s of books
– x can = any number in the sample space (0,1,2,3,4)
– P(x) is the probability of getting an x in a random sample
– Example: P(0 books) = 2/10 = .2
– Example: P(3) = 1/10 = .1
Clarification: X and x
X is the random variable
– Can be any of the possible values and their associated probabilities
– In other words, this can equal any element in a sample space, each with a probability of occurring
x is one possible outcome of the random variable (i.e., an event)
– For example, x can be 0 books, 1 book, 2 books, etc.
Why does this matter?
– When we figure out probabilities we are usually concerned with P(xi) since P(X) = 1
– When we figure out expected values (E) or variances (V) we are concerned with X because we want to know the expected values with respect to all possibilities
Definition
Probability Distribution or Function
– "a table, graph, or mathematical function that describes the potential values of a random variable X and their corresponding probabilities"
Example Continued
Probability Distribution or Function

Table Form:
xi   P(xi)
0    2/10 = 0.2
1    4/10 = 0.4
2    2/10 = 0.2
3    1/10 = 0.1
4    1/10 = 0.1

Graph Form: [bar chart of Probability P(x) (y-axis, 0 to 0.45) vs. Number of Books (x-axis, 0 to 4)]
Question: What do these remind you of from past chapters?
Key Concept
Discrete Random Variables
– "The set of possible values (i.e., the sample space) is finite or countably infinite."
Continuous Random Variables
– The set of possible values can be any real number in the range of possible values (i.e., infinite possible values)
Questions:
– What type of random variable is the student / book example?
– Can you come up with examples of each?
Probability Mass Function
Specifies the probability distribution for discrete variables
The tables and graphs are examples of the probability mass function (i.e., the probability is “massed” at the discrete possible values)
The probability mass function preserves the provision:

P(X) = Σ P(xi) = 1  (summed over i = 1 to k)

k = the different values of the discrete variable
i = 1, 2, …, k
This is identical to the rule from last chapter
Example Continued

Σ P(xi) = P(0) + P(1) + P(2) + P(3) + P(4) = 1
0.2 + 0.4 + 0.2 + 0.1 + 0.1 = 1
Expected Values for Discrete Random Variables
“E” is the term for expected values
E(X) = the expected value of a discrete random variable
To calculate E we need the probability distribution (e.g., the probability distribution table)
E(X) = Σ P(xi) * xi  (summed over i = 1 to k)
Example Continued
For our students & books example:

E(X) = P(x0)*x0 + P(x1)*x1 + … + P(xk)*xk
E(X) = 0.2*0 + 0.4*1 + 0.2*2 + 0.1*3 + 0.1*4
E(X) = 0 + 0.4 + 0.4 + 0.3 + 0.4 = 1.5

So if we randomly selected a student we would expect them to have 1.5 books with them.
Question: does this remind you of any other statistic?
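The expected-value sum above can be sketched in Python (illustrative only; the `pmf` dictionary is just the probability distribution table from the slides):

```python
# E(X) = sum of P(xi) * xi for the students & books example.
pmf = {0: 0.2, 1: 0.4, 2: 0.2, 3: 0.1, 4: 0.1}  # xi -> P(xi)

e_x = sum(p * x for x, p in pmf.items())
print(round(e_x, 2))  # 1.5
```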
Variance Values for Discrete Random Variables
“V” is the term for the variance of a probability distribution
V(X) = the variance value of a discrete random variable
To calculate V we need E and the probability distribution (e.g., the probability distribution table)
Note: I wrote the equation a little differently than the book to make it clear that you do the sum first and then subtract the E(X)²
V(X) = [Σ P(xi) * xi²] − [E(X)]²  (summed over i = 1 to k)
Example Continued
For our students & books example:

V(X) = [P(x0)*x0² + P(x1)*x1² + … + P(xk)*xk²] − [E(X)]²
V(X) = [0.2*0² + 0.4*1² + 0.2*2² + 0.1*3² + 0.1*4²] − [1.5]²
V(X) = [3.7] − [2.25] = 1.45

As with univariate statistics, the standard deviation is the square root of the variance.
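The same variance calculation can be sketched in Python (again, an illustration rather than the course's Excel approach):

```python
# V(X) = [sum of P(xi) * xi**2] - E(X)**2 for the students & books example.
pmf = {0: 0.2, 1: 0.4, 2: 0.2, 3: 0.1, 4: 0.1}  # xi -> P(xi)

e_x = sum(p * x for x, p in pmf.items())              # expected value, 1.5
v_x = sum(p * x**2 for x, p in pmf.items()) - e_x**2  # 3.7 - 2.25
sd = v_x**0.5                                          # standard deviation

print(round(v_x, 2))  # 1.45
print(round(sd, 2))   # 1.2
```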
Discrete Probability Models
If we have a census, the probability distribution (table, graph, etc.) is complete/finished/appropriate/accurate
If we have a sample, we try to match our sample probability distribution to a known probability distribution
Discrete Probability Models
Common probability models for discrete random variables include– Uniform distribution
– Binomial distribution
– Poisson distribution
The benefit of using these models is that they have known properties and corresponding equations already made
Discrete Uniform Distribution
The probability of each possible value of the random variable is equal
This equates to a rectangular graph (i.e., flat on top)
Discrete Uniform Distribution Equations
Probability: P(x) = 1/k
Expected: E(X) = Σ x * (1/k)  (summed over the k possible values of x)
Variance: V(X) = (k² − 1) / 12
Discrete Binomial Distribution
There are 2 and only 2 possible outcomes of a statistical experiment (e.g., flipping a coin)
Discrete Binomial Distribution Equations
Probability: P(x) = nCx * π^x * (1 − π)^(n−x)
Expected: E(X) = nπ
Variance: V(X) = nπ(1 − π)
π = the probability of one of the two outcomes (e.g., a success)
Example (from book)
Quiz with 10 multiple choice questions
Each question has 5 possible answers
P(guessing correctly) = 1/5 = 0.2
P(guessing incorrectly) = 4/5 = 0.8
What is the probability of guessing 5 correct answers?
Answering this question with what we learned last chapter
Imagine we only have 2 questions. Since the questions are independent, P(A∩B) = P(A) * P(B). This result is in the upper left box.
Answering this question with what we learned last chapter
Now imagine we add a third question. Since the questions are independent, P(A∩B∩C) = P(A) * P(B) * P(C) is in the upper left box.
Answering this question with what we learned last chapter
So 3 out of 3 right answers = P(RRR) = 0.2 * 0.2 * 0.2 = .008
Follow this out to 5 correct & 5 incorrect:
– = 0.2*0.2*0.2*0.2*0.2*0.8*0.8*0.8*0.8*0.8 (i.e., 0.2^5 * 0.8^5)
– = 0.000104858 for one option of 5 R, 5 W
– Another option would be 4 R, 5 W, 1 R (i.e., questions 1-4 & 10 correct)
How many combinations of 5 Right & 5 Wrong are there?
Use combinations rule: C(10,5) = 252
Answer = 252 * 0.000104858 = 0.026
Answer Question Using the New Equation

P(x) = nCx * π^x * (1 − π)^(n−x)
P(5) = 10C5 * 0.2^5 * (1 − 0.2)^(10−5) = 0.026
This answer can also be found in a table in the back of the book on P. 605
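As a quick sketch (in Python, not the book's tables), the binomial formula above is a one-liner; `math.comb` plays the role of the nCx combinations rule:

```python
from math import comb

def binom_pmf(x, n, pi):
    """P(x) = nCx * pi**x * (1 - pi)**(n - x)."""
    return comb(n, x) * pi**x * (1 - pi)**(n - x)

# Quiz example: n = 10 questions, P(correct guess) = 0.2, exactly 5 right.
print(round(binom_pmf(5, 10, 0.2), 3))  # 0.026
```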
Poisson Discrete Distribution
Poisson distributions are often used to determine the probability of a number of events (x) occurring in a fixed space or over a fixed period of time
But they can be used to determine probabilities for other variables as well (the book used the example of lengths of rope)
Poisson distributions are also called the “distribution of rare events”
Poisson Discrete Distribution
Requirements for using a Poisson distribution:
– Mutually exclusive events are independent
– The probability of an event occurring is small and proportional to the size of the area (or to the length of the interval)
– The probability of 2 or more events occurring in a small area or interval is near zero
Rules 2 and 3 are where the phrase "distribution of rare events" comes from
Poisson Discrete Distribution
The parts of a Poisson distribution equation:
– λ – The average occurrence of an event in time or space
  • 8 houses per block
  • 1 hiccup per minute
  • 1 lightning strike per square mile per decade
  • The "answers" to questions will be in the same units as λ (e.g., "per minute")
– e – base of the natural logarithm (e = 2.71828...)
– X – the Poisson random variable
  • Just like the random variable for the other distributions
  • This is the value for which we determine E and V
– x – the values from X for which we find probabilities etc.
  • E.g., what is the probability of x if x = 2 hiccups per minute?
  • x can be any non-negative integer (0, 1, 2, …)
Poisson Discrete Distribution
Probability: P(x) = (λ^x * e^−λ) / x!
Expected: E(X) = Σ x * (λ^x * e^−λ) / x! = λ  (summed over x = 0, 1, 2, …)
Variance: V(X) = Σ [x − E(X)]² * (λ^x * e^−λ) / x! = λ  (summed over x = 0, 1, 2, …)
Poisson Discrete Distribution
The Poisson discrete distribution is actually a family of distributions– The members of the family relate to one λ each
For example:– 1 hiccup per minute uses one family– 2 hiccups per minute uses another family
Poisson Discrete Distribution
1 Hiccup Per Minute (i.e., λ = 1)
What is the probability of hiccupping 4 times per minute (i.e., x = 4)?
P(x) = (λ^x * e^−λ) / x!
P(4) = (1^4 * 2.71828^−1) / 4!
P(4) = 0.36788 / 24
P(4) = 0.01533
This answer can also be found in a table in the back of the book on P. 606
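The hiccup calculation can be sketched in Python as well (illustrative, not the book's table lookup):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(x) = lam**x * exp(-lam) / x!"""
    return lam**x * exp(-lam) / factorial(x)

# Hiccup example: lambda = 1 per minute, x = 4 hiccups per minute.
print(round(poisson_pmf(4, 1.0), 5))  # 0.01533
```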
Continuous Random Variables
Review:
– Continuous Random Variables: The set of possible values can be any real number in the range of possible values (i.e., infinite possible values)
For continuous random variables we use probability density functions rather than probability mass functions
Probability Density Functions
Specify the probability distribution for continuous variables
Unlike the probability mass functions we used with discrete variables, where all the P(xi) added up to 1, with probability density functions the area under the curve = 1
Also unlike probability mass functions we aren’t concerned with P(xi) because each xi is a vertical line with an area equal to zero
Instead we are concerned with probabilities such as P(x > some amount A) or P(x between values B and C)
Probability Density Functions
The probability density function of a random continuous variable X is denoted as f(X)
The “function” part (i.e., the “f”) relates to the equation that produces a curve (i.e., it is used to graph the line)
Conditions satisfied by probability density functions (assume the min and max values are a and b respectively):
– f(x) ≥ 0 for a ≤ x ≤ b
– The area under f(x) from x = a to x = b = 1
Because we are ultimately concerned with areas under portions of the curve, what type of math do we need?
Continuous Probability Distribution Models
Probability Distribution Models
– As with discrete random variables we usually have a sample rather than a census
– To calculate probabilities from a sample we assume the data conform to some known distribution for which we have handy tables
– This is how we avoid having to do calculus
Continuous Probability Models
Common probability models for continuous random variables include:
– Uniform distribution
  • Rectangular distribution
– Normal distribution (a.k.a. Gaussian)
  • The "bell shaped curve"
Uniform Continuous Distribution
Probability: P(c to d) = (d-c) / (b-a) given a ≤ c ≤ d ≤ b
Distribution Function: f(x) = 1 / (b − a) for a ≤ x ≤ b
Expected: E(X) = (a + b) / 2
Variance: V(X) = (b − a)² / 12
Normal Probability Distribution
Probability: convert to z-scores first (explained in a few slides)
Distribution Function: f(x) = (1 / (σ * √(2π))) * e^(−(1/2) * ((x − μ)/σ)²)
Expected: E(X) = μ
Variance: V(X) = σ²
Note: π = 3.14…
Features of the Normal Probability Distribution
The mean, median, and mode values are all equal to the peak of the distribution
The distribution is symmetrical
½ of the area under the curve lies above the mean and ½ below
Z scores
To avoid having to calculate probabilities for curves with varying μ, σ, and shapes we can convert any normally distributed random variable to a standard form for which we have tables
To do this we use the z transformation:

z = (x − μ) / σ

This conversion changes our variable measured in units of x (e.g., meters, miles, pounds) to units of z (i.e., standard deviation units)
Example
If we have a normally distributed dataset of bowling scores with μ = 150, σ = 10, what is the z-score of 175?
– What does a z-score of 2.5 mean?
– Answer: the value of 175 is 2.5 standard deviations above the mean for this particular normal probability distribution
z = (175 − μ) / σ = (175 − 150) / 10 = 25 / 10 = 2.5
Probability z-scores
Remember that one of the reasons we calculate z-scores is to ask questions about probability
For example what is the probability of bowling over 175 given our previous example?
To answer this question it is easiest to use a z-table like the one on page 207 of your book
Using Standard Normal Probabilities (i.e., a z-table)
The table in our book is atypical of what you usually see, but more user friendly thanks to the pictures
For our bowling example, find the z-score (2.5) in the column on the left
Now choose the column of interest, in this case column #3: P(Z > z)
The probability value we get from the table is 0.006
– This means that the probability of bowling over 175, for our fictional dataset, is 0.006 (i.e., 0.6%)
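The z-score and the table lookup can both be sketched in Python. This is an illustration, not the book's method: instead of a z-table, the standard-normal upper tail is computed with the complementary error function, P(Z > z) = 0.5 * erfc(z / √2).

```python
from math import erfc, sqrt

mu, sigma = 150.0, 10.0   # bowling example parameters
z = (175 - mu) / sigma    # z = (x - mu) / sigma

# Standard-normal upper tail without a z-table:
p_over = 0.5 * erfc(z / sqrt(2))  # P(Z > z)

print(z)                 # 2.5
print(round(p_over, 3))  # 0.006
```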
More Conventional Z tables
Normally we see z-tables with the following characteristics:
– 2 digits of precision in the far left column
– 1 additional digit of precision in each of the 10 other columns to the right
– A value indicating a one-directional probability (i.e., the total probability of values less than a z-score)
  • This is equivalent to the 5th column in the z-table in our book
Bivariate Random Variables
Now we turn our attention to the relationship between two variables (hence the name “bivariate”)
The random variables can be discrete or continuous
Most of the following slides & equations should look very familiar from chapter 5
Bivariate Probability Functions
Conditions:
– 0 ≤ P(x,y) ≤ 1
– Σx Σy P(x,y) = 1
For 2 discrete random variables (x & y) it is useful to set up a contingency table
These contingency tables are just like those for 2 events and they may contain actual counts or probabilities
Example (from book)
100 households sampled and asked how many people are in the household (x) and how many cars are owned by members of the household (y)
Our book did us the disservice of switching the x and y axes
The data can be summarized in the following table:

Household Size (x) \ Cars (y):   0    1    2    3
2                                10   8    3    2
3                                7    10   6    3
4                                4    5    12   6
5                                1    2    6    15
Marginal Totals
Marginal probabilities are the sums of the rows and columns
The marginal totals for household size (x) are in red
The marginal totals for cars (y) are in blue
The total number of households sampled is in green

Household Size (x) \ Cars (y):   0    1    2    3    Total
2                                10   8    3    2    23
3                                7    10   6    3    26
4                                4    5    12   6    27
5                                1    2    6    15   24
Total                            22   25   27   26   100
Marginal Probabilities
All probabilities are just the totals from each box (see last slide) divided by the total number of households (100)
The marginal probabilities for household size (x) are in red
The marginal probabilities for cars (y) are in blue
The sum of each set of marginal probabilities is in green

Household Size (x) \ Cars (y):   0     1     2     3     Total
2                                .10   .08   .03   .02   .23
3                                .07   .10   .06   .03   .26
4                                .04   .05   .12   .06   .27
5                                .01   .02   .06   .15   .24
Total                            .22   .25   .27   .26   1.0
Conditional Probabilities
Equation: P(x|y) = P(x,y) / P(y)
For example: what is the probability of having a household size of 4 (x=4) given the household has 3 cars (y=3)?

P(4|3) = 0.06 / 0.26 = 0.231
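The same lookup can be sketched in Python: store the joint table as a dictionary, sum one column to get the marginal, then divide (illustrative code only; the probabilities are the household/cars table from the book).

```python
# P(x|y) = P(x,y) / P(y), using the household/cars joint table.
joint = {  # (household size x, cars y): P(x, y)
    (2, 0): .10, (2, 1): .08, (2, 2): .03, (2, 3): .02,
    (3, 0): .07, (3, 1): .10, (3, 2): .06, (3, 3): .03,
    (4, 0): .04, (4, 1): .05, (4, 2): .12, (4, 3): .06,
    (5, 0): .01, (5, 1): .02, (5, 2): .06, (5, 3): .15,
}

p_y3 = sum(p for (x, y), p in joint.items() if y == 3)  # marginal P(y=3) = 0.26
p_4_given_3 = joint[(4, 3)] / p_y3                      # P(x=4 | y=3)
print(round(p_4_given_3, 3))  # 0.231
```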
Covariance
“Covariance is a direct statistical measure of the degree to which two random variables X and Y tend to vary together”
Covariance is positive when X and Y increase together (and likewise decrease together)
– Ex. the amount of ice cream you eat and the temperature outside
Covariance is negative when X and Y are inversely related
– Ex. the number of layers of clothes you tend to wear and the temperature outside
When there is no pattern the covariance is close to zero
Covariance
Covariance Equation:
Note 1: There is another option for calculating covariance in your book
Note 2: there are also nice tables showing how you would go about calculating these values on page 202 & 203
C(X,Y) = Σx Σy P(x,y) * [x − E(X)] * [y − E(Y)]
Covariance
Covariance Equation: C(X,Y) = Σx Σy P(x,y) * [x − E(X)] * [y − E(Y)]
Parts of this equation:
– The P(x,y) values come from the covariance table
– The E(X) and E(Y) values are calculated using this formula (from about 40 slides ago): E(X) = Σ P(xi) * xi
Covariance Example (from book)
First we need to calculate the expected (E) values:
We can use the marginal probabilities for this
For E(X) multiply the xi by the row totals (i.e., orange # * red #)
For E(Y) multiply the yi by the column totals (i.e., purple # * blue #)

E(X) = Σ P(xi) * xi
E(X) = 2*0.23 + 3*0.26 + 4*0.27 + 5*0.24
E(X) = 0.46 + 0.78 + 1.08 + 1.20 = 3.52

E(Y) = Σ P(yi) * yi
E(Y) = 0*0.22 + 1*0.25 + 2*0.27 + 3*0.26
E(Y) = 0 + 0.25 + 0.54 + 0.78 = 1.57

Household Size (x) \ Cars (y):   0     1     2     3     Total
2                                .10   .08   .03   .02   .23
3                                .07   .10   .06   .03   .26
4                                .04   .05   .12   .06   .27
5                                .01   .02   .06   .15   .24
Total                            .22   .25   .27   .26   1.0
Covariance Example (book is wrong again!)

(x,y)  P(x,y)  x−E(X)  y−E(Y)  [x−E(X)][y−E(Y)]  P(x,y)*[x−E(X)][y−E(Y)]
2,0 .1 -1.52 -1.57 2.3864 0.23864
2,1 .08 -1.52 -0.57 0.8664 0.069312
2,2 .03 -1.52 0.43 -0.6536 -0.019608
2,3 .02 -1.52 1.43 -2.1736 -0.043472
3,0 .07 -0.52 -1.57 0.8164 0.057148
3,1 .1 -0.52 -0.57 0.2964 0.02964
3,2 .06 -0.52 0.43 -0.2236 -0.013416
3,3 .03 -0.52 1.43 -0.7436 -0.022308
4,0 .04 0.48 -1.57 -0.7536 -0.030144
4,1 .05 0.48 -0.57 -0.2736 -0.01368
4,2 .12 0.48 0.43 0.2064 0.024768
4,3 .06 0.48 1.43 0.6864 0.041184
5,0 .01 1.48 -1.57 -2.3236 -0.023236
5,1 .02 1.48 -0.57 -0.8436 -0.016872
5,2 .06 1.48 0.43 0.6364 0.038184
5,3 .15 1.48 1.43 2.1164 0.31746
Sum 0.6336
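The whole table above collapses to one sum in code. A minimal Python sketch (illustrative; the joint probabilities are the household/cars table from the book):

```python
# C(X,Y) = sum over all (x,y) of P(x,y) * [x - E(X)] * [y - E(Y)].
joint = {  # (household size x, cars y): P(x, y)
    (2, 0): .10, (2, 1): .08, (2, 2): .03, (2, 3): .02,
    (3, 0): .07, (3, 1): .10, (3, 2): .06, (3, 3): .03,
    (4, 0): .04, (4, 1): .05, (4, 2): .12, (4, 3): .06,
    (5, 0): .01, (5, 1): .02, (5, 2): .06, (5, 3): .15,
}

e_x = sum(p * x for (x, y), p in joint.items())  # 3.52
e_y = sum(p * y for (x, y), p in joint.items())  # 1.57
cov = sum(p * (x - e_x) * (y - e_y) for (x, y), p in joint.items())

print(round(cov, 4))  # 0.6336
```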
Independence
As with events (e.g., A and B) from last chapter, x and y are independent if P(x,y) = P(x)P(y) for all values of x and y
Independence and covariance are closely related, but not the same
– Independent variables will have a covariance of 0
– But random variables with a covariance of 0 may not be independent
Problems With Covariance
The sign (+ or -) of the calculated covariance is meaningful, but not the magnitude
This is because the covariance is dependent on the scale of the input data
Therefore, if we multiplied x or y by 10 and recalculated the covariance, it would change even though the relationship between x and y, strictly speaking, is the same
Correlation Coefficient
The correlation coefficient is a standardized statistic that measures the relationship between random variables
Correlation coefficients range from -1 to 1– 1 is a positive relationship (both ↑ or ↓ together)– -1 is an inverse relationship (one ↑ while the other ↓)– 0 suggests, but doesn’t guarantee independence
Unlike covariance the scale of the data does not matter
Correlation Coefficient
In chapter 6 the book introduces this statistic with the assumption that the population covariance (C) & standard deviation (σ) are known or can be calculated
The correlation coefficient for a sample is discussed in chapter 12
Correlation Coefficient
Equation:
yxxy
YXC
),(
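Continuing the covariance example in Python (illustrative; the standard deviations come from the same discrete-variance formula used earlier in the chapter):

```python
# rho = C(X,Y) / (sigma_x * sigma_y) for the household/cars table.
joint = {  # (household size x, cars y): P(x, y)
    (2, 0): .10, (2, 1): .08, (2, 2): .03, (2, 3): .02,
    (3, 0): .07, (3, 1): .10, (3, 2): .06, (3, 3): .03,
    (4, 0): .04, (4, 1): .05, (4, 2): .12, (4, 3): .06,
    (5, 0): .01, (5, 1): .02, (5, 2): .06, (5, 3): .15,
}

e_x = sum(p * x for (x, y), p in joint.items())
e_y = sum(p * y for (x, y), p in joint.items())
cov = sum(p * (x - e_x) * (y - e_y) for (x, y), p in joint.items())

# sigma = sqrt(V) where V = E(X**2) - E(X)**2, as in the variance slides
sd_x = (sum(p * x**2 for (x, y), p in joint.items()) - e_x**2) ** 0.5
sd_y = (sum(p * y**2 for (x, y), p in joint.items()) - e_y**2) ** 0.5

rho = cov / (sd_x * sd_y)
print(round(rho, 3))  # approximately 0.529
```

Note that rescaling x or y (say, multiplying by 10) changes the covariance but leaves rho unchanged, which is exactly the standardization point made above.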