chapter 2: probability
2.1 Chapter 2: Probability
A random variable (r.v.) is a variable whose value is unknown until it is observed. The value of a random variable results from an experiment.
Experiments can be either controlled (laboratory) or uncontrolled (observational). Most economic variables are random and are the result of uncontrolled experiments.
2.2 Random Variables
A discrete random variable can take on only a finite (or countably infinite) number of values, such as:
• The number of visits to a doctor’s office
• Number of children in a household
• Flip of a coin
• Dummy (binary) variable: D=0 if male, D=1 if female
A continuous random variable can take any real value (not just whole numbers) in an interval on the real number line such as:
• Gross Domestic Product next year
• Price of a share in Microsoft
• Interest rate on a 30-year mortgage
2.3 Probability Distributions of Random Variables
• All random variables have probability distributions that describe the values the random variable can take on and the associated probabilities of these values.
• Knowing the probability distribution of a random variable gives us some indication of the value the r.v. may take on.
2.4 Probability Distribution for a Discrete Random Variable
Expressed as a table, graph or function
1. Suppose X = number of tails when a coin is flipped twice. X can take on the values 0, 1 or 2. Let f(x) be the associated probabilities:
Table:
X    f(x)
0    0.25
1    0.50
2    0.25
Graph: [bar chart of f(x) against x, with bars of height 0.25, 0.50 and 0.25 at x = 0, 1 and 2]
Probability is represented as height on this bar graph.
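The table for the two-flip example can be rebuilt by listing the outcomes. This is a minimal sketch that assumes a fair coin, so that all four outcomes are equally likely; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The four equally likely outcomes of two fair coin flips: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# X = number of tails in each outcome.
tail_counts = Counter(outcome.count("T") for outcome in outcomes)

# f(x) = (number of outcomes with x tails) / (total number of outcomes).
pmf = {x: Fraction(count, len(outcomes)) for x, count in sorted(tail_counts.items())}

for x, prob in pmf.items():
    print(x, float(prob))
```

The printed pairs should match the table: 0.25, 0.50 and 0.25 for x = 0, 1 and 2.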
2.5
2. Suppose X is a binary variable that can take on two values: 0 or 1. Furthermore, assume P(X=1) = p and P(X=0) = (1-p)
Function:
$P(X=x) = f(x) = p^x (1-p)^{1-x}$ for $x = 0, 1$
Table
X f(x)
0 (1-p)
1 p
Suppose p = 0.10
Then X takes on 0 with probability 0.90 and X takes on 1 with probability 0.10
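The function form can be checked against the table for p = 0.10. A small sketch; `bernoulli_pmf` is a name chosen here for illustration, not something from the text.

```python
def bernoulli_pmf(x, p):
    """Return f(x) = p**x * (1 - p)**(1 - x) for a binary variable, x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return p**x * (1 - p) ** (1 - x)


p = 0.10
f0 = bernoulli_pmf(0, p)  # probability that X = 0, i.e. 1 - p
f1 = bernoulli_pmf(1, p)  # probability that X = 1, i.e. p
print(f0, f1)
```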
2.6 Facts about discrete probability distribution functions
1. Each probability P(X=x) = f(x) must lie between 0 and 1: $0 \le f(x) \le 1$
2. The sum of the probabilities must be 1. If X can take on n different values then:
$f(x_1) + f(x_2) + \dots + f(x_n) = 1$
2.7
Probability Distribution (Density) for Continuous Random Variables
Expressed as a function or graph.
Continuous r.v.’s can take on an infinite number of values in a given interval
– A table isn’t appropriate to express a p.d.f.
EX: $f(x) = 2x$ for $0 \le x \le 1$
$f(x) = 0$ otherwise
2.8
Because a continuous random variable has an uncountably infinite number of possible values, the probability of any single value occurring is zero.
P(X = a) = 0
Instead, we ask “What is the probability that X is between a and b?”
P[a < X < b] = ?
The probability P[a < X < b] is the proportion of the time, over many repetitions of the experiment, that X will fall between a and b.
2.9
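Probability is represented as area under the function. Total area must be 1.0.
For f(x) = 2x on the interval from 0 to 1, the area of the triangle is 1.0.
Probability that x lies between 0 and 1/2: $P[0 \le X \le 1/2] = 0.25$
[Area of any triangle is ½*Base*Height]
[Figure: f(x) plotted against x, with 1/2 and 1 marked on the x axis and 2 marked on the f(x) axis]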
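Both areas follow from the triangle formula. A short sketch for the density f(x) = 2x on the interval from 0 to 1; the helper names are illustrative.

```python
def pdf(x):
    """f(x) = 2x for 0 <= x <= 1, and 0 otherwise."""
    return 2 * x if 0 <= x <= 1 else 0.0


def triangle_area(base, height):
    """Area of a triangle: 1/2 * base * height."""
    return 0.5 * base * height


# Whole triangle: base 1, height f(1) = 2.
total_area = triangle_area(1, pdf(1))

# Triangle from 0 to 1/2: base 1/2, height f(1/2) = 1.
prob_0_to_half = triangle_area(0.5, pdf(0.5))

print(total_area, prob_0_to_half)
```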
2.10
Uniform Random Variable: u is distributed uniformly between a and b
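• p.d.f. is a horizontal line between a and b of height 1/(b - a)
• $f(u) = 1/(b - a)$ if $a \le u \le b$
  $f(u) = 0$ otherwise
EX: Spin a dial on a clock
a = 0 and b = 12
Find the probability that u lies between 1 and 2
[Figure: f(u) plotted against u from 0 to 12 at constant height 1/12, with u = 1 and u = 2 marked]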
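For a uniform r.v. the probability is the area of a rectangle: width of the interval times the height 1/(b - a). A minimal sketch of the clock example; `uniform_prob` is a hypothetical helper name.

```python
def uniform_prob(lo, hi, a, b):
    """P[lo <= u <= hi] for u uniform on [a, b]: overlap width times height 1/(b - a)."""
    if b <= a:
        raise ValueError("need a < b")
    # Only the part of [lo, hi] that lies inside [a, b] carries probability.
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0) / (b - a)


# Clock dial: a = 0, b = 12, probability that u lies between 1 and 2.
prob = uniform_prob(1, 2, a=0, b=12)
print(prob)  # equals 1/12
```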
2.11
In calculus, the integral of a function defines the area under it:
$P[a \le X \le b] = \int_a^b f(x)\,dx$
For continuous random variables it is the area under f(x), and not f(x) itself, which defines the probability of an event. We will NOT be integrating functions; when necessary we use tables and/or computers to calculate the necessary probability (integral).
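As an illustration of letting a computer do the integral, the sketch below approximates the area under f(x) = 2x with a midpoint rule. The function `integrate` and the step count are assumptions made for this example, not a method from the text.

```python
def integrate(f, a, b, n=10_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    width = (b - a) / n
    # Sum the heights at the midpoint of each slice, then multiply by slice width.
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


def pdf(x):
    """f(x) = 2x for 0 <= x <= 1, and 0 otherwise."""
    return 2 * x if 0 <= x <= 1 else 0.0


prob_0_to_half = integrate(pdf, 0.0, 0.5)
total_area = integrate(pdf, 0.0, 1.0)
print(round(prob_0_to_half, 6), round(total_area, 6))
```

The two results should agree with the triangle areas 0.25 and 1.0 up to rounding error.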
2.12 Rules of Summation
Rule 1: $\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n$
Rule 2: $\sum_{i=1}^{n} a = na$
Rule 3: $\sum_{i=1}^{n} a x_i = a \sum_{i=1}^{n} x_i$
Rule 4: $\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$
2.13 Rules of Summation (continued)
Rule 5: $\sum_{i=1}^{n} (a x_i + b y_i) = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} y_i$
Rule 6: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \dots + x_n}{n}$
From Rule 6, we can prove (in class) that:
$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$
2.14 Rules of Summation (continued)
Rule 7: $\sum_{i=1}^{n} f(x_i) = f(x_1) + f(x_2) + \dots + f(x_n)$
Notation: $\sum_{x} f(x_i) = \sum_{i} f(x_i) = \sum_{i=1}^{n} f(x_i)$
Rule 8: $\sum_{i=1}^{n} \sum_{j=1}^{m} f(x_i, y_j) = \sum_{i=1}^{n} [ f(x_i, y_1) + f(x_i, y_2) + \dots + f(x_i, y_m) ]$
The order of summation does not matter:
$\sum_{i=1}^{n} \sum_{j=1}^{m} f(x_i, y_j) = \sum_{j=1}^{m} \sum_{i=1}^{n} f(x_i, y_j)$
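The summation rules can be checked numerically. The sketch below uses made-up numbers for x, y, a and b, and an arbitrary function f(x, y); none of these come from the text.

```python
# Hypothetical data and constants, chosen only to exercise the rules.
x = [2, 4, 6, 8]
y = [1, 3, 5, 7]
a, b = 3, 5
n = len(x)

# Summing a constant n times gives n * a.
assert sum(a for _ in range(n)) == n * a

# A constant factor can be pulled outside the sum.
assert sum(a * xi for xi in x) == a * sum(x)

# The sum of (x_i + y_i) is the sum of the x_i plus the sum of the y_i.
assert sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y)

# Linear combinations: sum(a*x_i + b*y_i) = a*sum(x_i) + b*sum(y_i).
assert sum(a * xi + b * yi for xi, yi in zip(x, y)) == a * sum(x) + b * sum(y)

# Deviations from the arithmetic mean sum to zero.
x_bar = sum(x) / n
deviation_sum = sum(xi - x_bar for xi in x)


# Double sums give the same total in either order of summation.
def f(xi, yj):
    return xi * yj + xi


sum_over_j_first = sum(sum(f(xi, yj) for yj in y) for xi in x)
sum_over_i_first = sum(sum(f(xi, yj) for xi in x) for yj in y)

print(deviation_sum, sum_over_j_first, sum_over_i_first)
```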
2.15 The Mean of a Random Variable
The mean of a random variable is its mathematical expectation, or expected value. For a discrete random variable, this is:
$E(X) = \sum_{i=1}^{n} x_i f(x_i) = x_1 f(x_1) + x_2 f(x_2) + \dots + x_n f(x_n)$
where n is the number of values X can take on.
It is a probability-weighted average of the possible values the random variable X can take on. This is a sum for discrete r.v.’s and an integral for continuous r.v.’s.
2.16
• E(X) tells us the “long-run” average value for X. It is not necessarily a value one would expect X to take on in any single draw.
• If you were to randomly draw values of X from its p.d.f. a very large number of times and average these values, the average would approach E(X).
• E(X) = μ. The Greek letter “mu” is not used in your text but is commonly used to denote the mean of X.
2.17 Example: Roll a fair die
$E(X) = \sum_{i=1}^{6} x_i f(x_i) = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 21/6 = 3.5$
Interpretation: In a large number of rolls of a fair die, one-sixth of the values will be 1’s, one-sixth of the values will be 2’s, etc., and the average of these values will be 3.5.
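The die calculation is a probability-weighted sum and can be reproduced directly. A minimal sketch using exact fractions; variable names are illustrative.

```python
from fractions import Fraction

# Fair die: each face 1..6 has probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# E(X) = sum over all faces of x * f(x).
expected_value = sum(x * prob for x, prob in pmf.items())

print(expected_value, float(expected_value))  # 7/2 3.5
```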
2.18 Mathematical Expectation
• Think of E(.) as an operator that requires you to weight by probabilities any expression inside the parentheses, and then sum.
• $E(g(X)) = \sum_{i=1}^{n} g(x_i) f(x_i) = g(x_1) f(x_1) + g(x_2) f(x_2) + \dots + g(x_n) f(x_n)$
2.19 Rules of Mathematical Expectation
• E(c) = c where c is a constant
• E(cX) = cE(X) where c is a constant and X is a random variable
• E(a + cX) = a + cE(X) where a and c are constants and X is a random variable.
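These rules can be verified for the fair die by treating E(.) as a function that weights by probabilities and sums. The helper `expect` and the constants a = 2, c = 3 are assumptions made for this sketch.

```python
from fractions import Fraction

# Fair die p.d.f.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def expect(g, pmf):
    """E[g(X)]: weight g(x) by f(x) for every value x, then sum."""
    return sum(g(x) * prob for x, prob in pmf.items())


a, c = 2, 3  # hypothetical constants

mean_x = expect(lambda x: x, pmf)
e_constant = expect(lambda x: c, pmf)        # E(c)
e_scaled = expect(lambda x: c * x, pmf)      # E(cX)
e_linear = expect(lambda x: a + c * x, pmf)  # E(a + cX)

print(e_constant == c, e_scaled == c * mean_x, e_linear == a + c * mean_x)
```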
2.20 Variance of a Random Variable
• Like the mean, the variance of a r.v. is an expected value, but it is the expected value of the squared deviations from the mean
• Let $g(x) = (x - E(x))^2$
• Variance: $\sigma^2 = Var(x) = E[(x - E(x))^2] = \sum_i g(x_i) f(x_i) = \sum_i (x_i - E(x))^2 f(x_i)$
• It measures the amount of dispersion in the possible values for X.
2.21 About Variance
• Unit of measurement is X units squared
• When we create a new random variable as a linear transformation of X:
y = a + cx
We know that E(y) = a + cE(x)
But $Var(y) = c^2 Var(x)$
(proof in class) This property tells us that the amount of variation in y is determined by the amount of variation in x and the constant c. The additive constant a shifts the values but in no way alters the amount of variation.
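One way to see why the additive constant drops out is the short derivation below. It is a sketch based only on the definition of variance and the rule E(a + cx) = a + cE(x), and is not necessarily the proof given in class.

```latex
\begin{align*}
Var(y) &= E\big[(y - E(y))^2\big] \\
       &= E\big[(a + cx - a - cE(x))^2\big] \\
       &= E\big[c^2 (x - E(x))^2\big] \\
       &= c^2 E\big[(x - E(x))^2\big] \\
       &= c^2 \, Var(x)
\end{align*}
```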
2.22 About Variance (con’t)
• $E[(x - E(x))^2] = E[x^2 - 2E(x)x + E(x)^2]$
$= E(x^2) - 2E(x)E(x) + E(x)^2$
$= E(x^2) - 2E(x)^2 + E(x)^2$
$= E(x^2) - E(x)^2$
• Run the E(.) operator through, pulling out constants and stopping on random variables. Remember that E(x) is itself a constant, so
• E(E(x)) = E(x)
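The shortcut formula can be compared with the definition of variance for a fair die. This is a sketch with exact fractions; the fair-die p.d.f. is reused here as a convenient example.

```python
from fractions import Fraction

# Fair die p.d.f.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * prob for x, prob in pmf.items())

# Definition: expected squared deviation from the mean.
var_definition = sum((x - mean) ** 2 * prob for x, prob in pmf.items())

# Shortcut: E(x^2) - E(x)^2.
mean_of_squares = sum(x**2 * prob for x, prob in pmf.items())
var_shortcut = mean_of_squares - mean**2

print(var_definition, var_shortcut)  # 35/12 35/12
```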
2.23 Standard Deviation
• Because variance is in squared units of the r.v., we can take the square root of the variance to obtain the standard deviation.
$\sigma = \sqrt{\sigma^2} = \sqrt{Var(x)}$
Be sure to take the square root after you square and sum the deviations from the mean.
2.24 Joint Probability
• An experiment can randomly determine the outcome of more than one variable.
• When there are 2 random variables of interest, we study the joint probability density function
• When there are more than 2 random variables of interest, we study the multivariate probability density function.
2.25 For a discrete joint p.d.f., probability is expressed in a matrix:
Let X = return on stocks, Y = return on bonds
              X
Y        -10     0       10      20      f(y)
6        0       0       0.10    0.10
8        0       0.10    0.30    0.20
10       0.10    0.10    0       0
f(x)
P(X=x,Y=y) = f(x,y)
e.g. P(X=10,Y=8) = 0.30
2.26 About Joint p.d.f.’s
• Marginal Probability Distribution: what is the probability distribution for X regardless of what values Y takes on?
$f(x) = \sum_y f(x,y)$
What is the probability distribution for Y regardless of what values X takes on?
$f(y) = \sum_x f(x,y)$
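The marginal distributions for the stock and bond returns table are the column sums and row sums of the joint probabilities. A minimal sketch, with the table entered in hundredths so that the arithmetic is exact:

```python
from fractions import Fraction

x_values = [-10, 0, 10, 20]  # return on stocks (columns)
y_values = [6, 8, 10]        # return on bonds (rows)

# Joint probabilities in hundredths: one row per y, one column per x.
table = [
    [0, 0, 10, 10],
    [0, 10, 30, 20],
    [10, 10, 0, 0],
]

joint = {
    (x, y): Fraction(table[row][col], 100)
    for row, y in enumerate(y_values)
    for col, x in enumerate(x_values)
}

# f(x): sum f(x, y) over all y.  f(y): sum f(x, y) over all x.
f_x = {x: sum(joint[(x, y)] for y in y_values) for x in x_values}
f_y = {y: sum(joint[(x, y)] for x in x_values) for y in y_values}

print({x: float(p) for x, p in f_x.items()})
print({y: float(p) for y, p in f_y.items()})
```

Each marginal distribution should itself sum to 1.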
2.27
• Conditional Probability Distribution:
What is the probability distribution for X given that Y takes on a particular value?
f(x|y) = f(x,y)/f(y)
What is the probability distribution for Y given that X takes on a particular value?
f(y|x) = f(x,y)/f(x)
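As an illustration, the conditional distribution of stock returns X given a bond return of Y = 8 can be computed from the joint table. The function name `conditional_x_given_y` is hypothetical, and cells with probability 0 are left out of the dictionary.

```python
from fractions import Fraction

# Nonzero joint probabilities f(x, y) from the stock/bond table, in hundredths.
hundredths = {
    (10, 6): 10, (20, 6): 10,
    (0, 8): 10, (10, 8): 30, (20, 8): 20,
    (-10, 10): 10, (0, 10): 10,
}
joint = {xy: Fraction(n, 100) for xy, n in hundredths.items()}


def conditional_x_given_y(joint, y_given):
    """f(x|y) = f(x, y) / f(y), for every x with f(x, y) > 0."""
    f_y = sum(p for (x, y), p in joint.items() if y == y_given)
    return {x: p / f_y for (x, y), p in joint.items() if y == y_given}


f_x_given_8 = conditional_x_given_y(joint, 8)
print(f_x_given_8)  # probabilities 1/6, 1/2 and 1/3 for x = 0, 10 and 20
```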
2.28
• Covariance: A measure that summarizes the joint probability distribution between two random variables.
$cov(x,y) = E[(x - E(x))(y - E(y))]$
$= \sum_x \sum_y (x_i - E(x))(y_j - E(y)) f(x_i, y_j)$
Ex:
2.29 About Covariance:
It measures the joint association between 2 random variables. Try asking: “When X is large, is Y more or less likely to also be large?”
If the answer is that Y is likely to be large when X is large, then we say X and Y have a positive relationship. Cov(x,y) > 0
If the answer is that Y is likely to be small when X is large, then we say that X and Y have a negative relationship. Cov(x,y) < 0.
cov(x,y) = E[(x – E(x))(y – E(y))]
= E[xy – E(x)y – xE(y) + E(x)E(y)]
= E(xy) – E(x)E(y) – E(x)E(y) + E(x)E(y)
= E(xy) – E(x)E(y) useful!!
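Both forms of the covariance can be evaluated for the stock and bond returns table. A sketch using exact fractions; `expect` is an illustrative helper, and zero-probability cells are omitted because they add nothing to any sum.

```python
from fractions import Fraction

# Nonzero joint probabilities f(x, y) from the stock/bond table, in hundredths.
hundredths = {
    (10, 6): 10, (20, 6): 10,
    (0, 8): 10, (10, 8): 30, (20, 8): 20,
    (-10, 10): 10, (0, 10): 10,
}
joint = {xy: Fraction(n, 100) for xy, n in hundredths.items()}


def expect(g):
    """E[g(X, Y)]: weight g(x, y) by f(x, y) and sum over all cells."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)

# Definition: E[(x - E(x))(y - E(y))].
cov_definition = expect(lambda x, y: (x - mean_x) * (y - mean_y))

# Shortcut: E(xy) - E(x)E(y).
cov_shortcut = expect(lambda x, y: x * y) - mean_x * mean_y

print(mean_x, mean_y, cov_definition, cov_shortcut)  # 9 8 -8 -8
```

The negative sign says that, in this table, large stock returns tend to go with small bond returns.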
2.30
• Correlation
Covariance has awkward units of measurement. Correlation removes all units of measurement by dividing covariance by the product of the standard deviations:
$\rho_{xy} = Cov(x,y)/(\sigma_x \sigma_y)$ and $-1 \le \rho_{xy} \le 1$
Ex:
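For the stock and bond returns table, the correlation can be computed from the covariance and the two variances. This is a sketch under the same assumptions as the table; the helper `expect` is illustrative.

```python
import math
from fractions import Fraction

# Nonzero joint probabilities f(x, y) from the stock/bond table, in hundredths.
hundredths = {
    (10, 6): 10, (20, 6): 10,
    (0, 8): 10, (10, 8): 30, (20, 8): 20,
    (-10, 10): 10, (0, 10): 10,
}
joint = {xy: Fraction(n, 100) for xy, n in hundredths.items()}


def expect(g):
    """E[g(X, Y)]: weight g(x, y) by f(x, y) and sum over all cells."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mean_x) ** 2)
var_y = expect(lambda x, y: (y - mean_y) ** 2)
cov_xy = expect(lambda x, y: (x - mean_x) * (y - mean_y))

# Correlation: covariance divided by the product of the standard deviations.
rho = float(cov_xy) / math.sqrt(float(var_x * var_y))

print(var_x, var_y, cov_xy, round(rho, 2))
```

The result is unit-free and lies between -1 and 1; here it is roughly -0.67.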
2.31 What does correlation look like?
[Figure: four panels labelled ρ = 0, ρ = 0.3, ρ = 0.7 and ρ = 0.9]
2.32 Statistical Independence
Two random variables are statistically independent if knowing the value that one will take on does not reveal anything about what value the other may take on:
f(x|y) = f(x) or f(y|x) = f(y)
This implies that f(x,y) = f(x)f(y) if X and Y are independent.
If 2 r.v.’s are independent, then their covariance will necessarily be equal to 0.
2.33 Functions of more than one Random Variable
Suppose that X and Y are two random variables. If we form a weighted sum of them, we create a new random variable that has the following mean and variance:
Z = aX + bY
E(Z) = E(aX + bY) = aE(X) + bE(Y)
Var(Z) = Var(aX + bY)
$= a^2 Var(X) + b^2 Var(Y) + 2ab\,Cov(X,Y)$
If X and Y are independent, then Cov(X,Y) = 0 and
Var(Z) = Var(aX + bY)
$= a^2 Var(X) + b^2 Var(Y)$ (see page 31)
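The variance formula can be checked against a direct calculation using the stock and bond returns table. The weights a = b = 1/2 (an equally weighted portfolio) are a hypothetical choice for this sketch.

```python
from fractions import Fraction

# Nonzero joint probabilities f(x, y) from the stock/bond table, in hundredths.
hundredths = {
    (10, 6): 10, (20, 6): 10,
    (0, 8): 10, (10, 8): 30, (20, 8): 20,
    (-10, 10): 10, (0, 10): 10,
}
joint = {xy: Fraction(n, 100) for xy, n in hundredths.items()}


def expect(g):
    """E[g(X, Y)]: weight g(x, y) by f(x, y) and sum over all cells."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


a, b = Fraction(1, 2), Fraction(1, 2)  # hypothetical weights

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mean_x) ** 2)
var_y = expect(lambda x, y: (y - mean_y) ** 2)
cov_xy = expect(lambda x, y: (x - mean_x) * (y - mean_y))

# Direct calculation: variance of Z = aX + bY from its own deviations.
mean_z = expect(lambda x, y: a * x + b * y)
var_z_direct = expect(lambda x, y: (a * x + b * y - mean_z) ** 2)

# Formula: a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y).
var_z_formula = a**2 * var_x + b**2 * var_y + 2 * a * b * cov_xy

print(float(mean_z), float(var_z_direct), float(var_z_formula))
```

Because the covariance in this table is negative, the variance of the sum is smaller than it would be if X and Y were independent.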
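2.34 Normal Probability Distribution
• Many random variables tend to have a normal distribution (a well-known bell shape)
• Theoretically, $x \sim N(\beta, \sigma^2)$ where $E(x) = \beta$ and $Var(x) = \sigma^2$
The probability density function is
$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \beta)^2}{2\sigma^2} \right), \quad -\infty < x < \infty$
[Figure: f(x) plotted against x, with points a and b marked on the x axis]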
2.35 Normal Distribution (con’t)
• A family of distributions, each with its own mean and variance. The mean anchors the distribution’s center and the variance captures the spread of the bell-shaped curve.
• Finding the area under the curve would require integrating the p.d.f., which is too complicated. A computer-generated table gives all the probabilities we need for a normal r.v. that has mean 0 and variance 1.
To use the table (pg. 389), we need to take a normal random variable $x \sim N(\beta, \sigma^2)$ and transform it by subtracting the mean and dividing by the standard deviation. This is a linear transformation of x that creates a new random variable that has mean 0 and variance 1.
$Z = (x - \beta)/\sigma$ where $Z \sim N(0,1)$
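In place of the printed table, the standard normal probabilities can be obtained from the error function in Python's standard library. The sketch below standardises both endpoints of an interval; the example X ~ N(3, 9) and the function names are assumptions made for illustration.

```python
import math


def standard_normal_cdf(z):
    """P(Z <= z) for Z ~ N(0, 1), computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_prob(a, b, mean, variance):
    """P[a <= X <= b] for X ~ N(mean, variance), by standardising a and b."""
    sd = math.sqrt(variance)
    z_a = (a - mean) / sd
    z_b = (b - mean) / sd
    return standard_normal_cdf(z_b) - standard_normal_cdf(z_a)


# Hypothetical example: X ~ N(3, 9), probability that X lies between 4 and 6.
prob = normal_prob(4, 6, mean=3, variance=9)
print(round(prob, 4))
```

Here the endpoints 4 and 6 become z = 1/3 and z = 1, which are the values one would look up in the table.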
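2.36 Statistical inference: drawing conclusions about a population based on a sample
Population quantities:
$E(X) = \sum x_i f(x_i)$
$Var(X) = E[(X - E(X))^2] = E(X^2) - [E(X)]^2$
$\sigma_x = \sqrt{\sigma_x^2} = \sqrt{Var(X)}$
$Cov(X,Y) = E[(X - \mu_x)(Y - \mu_y)] = E(XY) - \mu_x \mu_y$
$\rho_{xy} = \frac{Cov(X,Y)}{\sqrt{Var(X)\,Var(Y)}}$
Sample counterparts:
$\bar{X} = \frac{1}{T} \sum_{t=1}^{T} X_t$
$s_x^2 = \frac{\sum (x_t - \bar{x})^2}{T - 1}$
$s_x = \sqrt{s_x^2}$
$S_{xy} = \frac{1}{T - 1} \sum (x_t - \bar{x})(y_t - \bar{y})$
$r = \frac{S_{xy}}{s_x s_y} = \frac{\sum (x_t - \bar{x})(y_t - \bar{y})}{\sqrt{\sum (x_t - \bar{x})^2 \sum (y_t - \bar{y})^2}}$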
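The sample formulas can be applied to a small data set. The four observations below are made up for illustration and do not come from the text.

```python
import math

# Hypothetical sample of T = 4 paired observations.
x = [1.0, 2.0, 4.0, 5.0]
y = [2.0, 3.0, 3.0, 6.0]
T = len(x)

# Sample means.
x_bar = sum(x) / T
y_bar = sum(y) / T

# Sample variances and standard deviations (divide by T - 1).
s2_x = sum((xt - x_bar) ** 2 for xt in x) / (T - 1)
s2_y = sum((yt - y_bar) ** 2 for yt in y) / (T - 1)
s_x, s_y = math.sqrt(s2_x), math.sqrt(s2_y)

# Sample covariance and sample correlation.
s_xy = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y)) / (T - 1)
r = s_xy / (s_x * s_y)

print(x_bar, y_bar, round(s2_x, 4), round(s2_y, 4), round(s_xy, 4), round(r, 4))
```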