

    MIT (14.32), Spring 2009, J. Angrist

    Preliminaries

    Reading: MHE Chapters 1-2

    Ultimately, we're interested in measuring causal relationships. Alas, we have to pay some prob & stats dues before we learn how. But causality is a big and deep concept, so we should start thinking about it now.

    We make sense of causal relationships using potential outcomes. These capture "what ifs" about the world.

    For example,

    Y1i = my health if I go to the hospital

    Y0i = my health if I stay away

    (Here, we're using an explicit notation for potential outcomes. Sometimes we'll keep this in the background.)

    My friend Mike, who runs emergency medicine at Hartford Hospital, describes the causal effect of hospitalization like this:

    "People come to the ER and they want to be admitted. They figure they'll just get admitted to the hospital and we'll take over and make them better. They don't realize that the hospital can be a pretty dangerous place. Unless you're really sick, you're really better off going home."

    How does Y1i compare with Y0i? We can never know for sure, so we try to look at expectations or averages:

    E[Y1i − Y0i | Di=1] = E[Y1i | Di=1] − E[Y0i | Di=1]

    In general, E[Y0i | Di=1] ≠ E[Y0i | Di=0]: those who select into treatment would differ from those who do not, even in the absence of treatment.
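    Spelling out the decomposition this sets up (it is the standard one from MHE, Chapter 2, written out here for completeness):

    E[Yi | Di=1] − E[Yi | Di=0]
        = E[Y1i | Di=1] − E[Y0i | Di=0]          (using Yi = Y1i when Di=1 and Yi = Y0i when Di=0)
        = {E[Y1i | Di=1] − E[Y0i | Di=1]} + {E[Y0i | Di=1] − E[Y0i | Di=0]}
        = (average causal effect on the treated) + (selection bias)

    In the hospital example, people who go to the hospital are presumably sicker to begin with, so E[Y0i | Di=1] is likely below E[Y0i | Di=0], and the naive comparison of averages understates any benefit of hospitalization.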


    Lecture Note 1

    Probability and Distribution

    Reading: Wooldridge Appendices A and B

    A. Probability

    "A system for quantifying chance and making predictions about future events"

    Concepts

    Sample space: S = {a1, a2, a3, ..., aJ}, the basic elements of the experiment

    example: toss two coins (J=4)

    (to make this interesting, we could place bets)

    Random variable: X(a), the data

    A function that assigns numerical values to events

    example: number of heads in two coin tosses

    Probability: a function defined over events or random variables.

    When defined over events, probability satisfies axioms:

    0 ≤ P(A) ≤ 1; P(S) = 1

    P(∪j Aj) = Σj P(Aj) for disjoint events Aj

    and has properties

    P(∅) = 0; P(Aᶜ) = 1 − P(A)

    A ⊆ B ⇒ P(A) ≤ P(B); P(A∪B) = P(A) + P(B) − P(A∩B)

    When we write P(x) for a discrete r.v., this is shorthand for P(the union of all events aj such that X(aj) = x).

    For a continuous r.v., we write P(X ≤ x) to mean P(the union of all events aj such that X(aj) ≤ x).

    But what is probability really?

    The relative frequency of an event in many (→ ∞) repeated trials.

    A personal and subjective assessment of the likelihood of an event, where the assessment obeys the

    axioms of probability
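    To make the frequency interpretation concrete, here is a minimal Python sketch of the two-coin experiment above; the event "at least one head" is an illustrative choice, not from the notes.

        import random

        # Sample space for two coin tosses: S = {HH, HT, TH, TT}, so J = 4.
        S = ["HH", "HT", "TH", "TT"]

        # Random variable X(a): the number of heads in outcome a.
        def X(a):
            return a.count("H")

        # Event A = "at least one head" = {HH, HT, TH}; P(A) = 3/4 under equally likely outcomes.
        n_trials = 100_000
        hits = sum(1 for _ in range(n_trials) if X(random.choice(S)) >= 1)

        # Relative-frequency interpretation: hits / n_trials should be close to 0.75.
        print(hits / n_trials)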



    Probability (cont.)

    Conditional probability: P(A|B) ≡ P(A∩B)/P(B)

    Conditional probability obeys the axioms of probability and has its properties

    Bayes Rule: Let the set {Ci; i = 1, ..., I} be a partition of the sample space. Then:

    P(Ci | A) = P(A|Ci)P(Ci) / Σi P(A|Ci)P(Ci)

    Proof: use P(Ci | A) = P(A|Ci)P(Ci)/P(A) and the fact that {Ci; i = 1, ..., I} is a partition, so P(A) = Σi P(A|Ci)P(Ci).

    Bayes rule is useful for reversing conditional probability statements.
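    A small numerical sketch of reversing a conditional probability with Bayes Rule; the two-set partition and all the numbers are hypothetical, chosen only to illustrate the formula above.

        # Hypothetical partition of the sample space: C1 and C2, with priors P(Ci)
        # and likelihoods P(A | Ci) for some event A.
        P_C = {"C1": 0.30, "C2": 0.70}          # P(Ci): must sum to 1 over the partition
        P_A_given_C = {"C1": 0.90, "C2": 0.20}  # P(A | Ci)

        # Law of total probability: P(A) = sum over i of P(A | Ci) P(Ci)
        P_A = sum(P_A_given_C[c] * P_C[c] for c in P_C)

        # Bayes Rule: P(Ci | A) = P(A | Ci) P(Ci) / P(A)
        posterior = {c: P_A_given_C[c] * P_C[c] / P_A for c in P_C}
        print(posterior)   # roughly {'C1': 0.66, 'C2': 0.34}: observing A shifts the odds toward C1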

    Independence: A is said to be independent of B iff P(A∩B) = P(A)P(B)

    Sometimes we write: A ⊥ B

    Note: A ⊥ B ⇒ P(A|B) = P(A)

    Note: r.v.s are independent if their distribution or density functions factor (more below)

    B. Distribution and density functions (how we characterize r.v.s)

    For the rest of the course, our probability statements will apply directly to r.v.s

    1. Discrete random variables

    Empirical distribution functions

    Example: years of schooling

    Probability mass function (pmf)

    Parametric examples: Bernoulli, binomial, multinomial, geometric

    Cumulative distribution functions (cdf) -- discrete r.v.

    Obtain by summation
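    A short Python sketch of obtaining a discrete cdf by summation; the binomial parameters n = 4 and p = 0.5 are illustrative choices, not from the notes.

        from math import comb

        n, p = 4, 0.5   # illustrative binomial parameters

        # pmf: f(k) = C(n, k) p^k (1 - p)^(n - k), for k = 0, 1, ..., n
        pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

        # cdf obtained by summation: F(x) = sum of f(k) over k <= x
        cdf = [sum(pmf[:k + 1]) for k in range(n + 1)]

        print(pmf)   # [0.0625, 0.25, 0.375, 0.25, 0.0625]
        print(cdf)   # [0.0625, 0.3125, 0.6875, 0.9375, 1.0]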

    2. Continuous random variables

    Probability density functions (pdf); note: P(X=x) = 0



    Parametric examples: uniform, exponential, normal;

    empirical PDFs of students' grades

    Cumulative distribution functions -- continuous r.v.

    Obtain by integration

    P(X ≤ c) = F(c) = ∫_{−∞}^{c} f(t) dt

    P(a ≤ X ≤ b) = ∫_{a}^{b} f(t) dt = F(b) − F(a)

    P(X > c) = 1 − F(c)

    Relationship between cdf and pdf

    F′(x) = f(x).
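    A quick numerical check of P(a ≤ X ≤ b) = F(b) − F(a), sketched in Python for an exponential r.v.; the rate λ = 2 and the interval [0.5, 1.5] are illustrative choices.

        import math

        lam = 2.0                                   # illustrative exponential rate
        f = lambda t: lam * math.exp(-lam * t)      # pdf on t >= 0
        F = lambda c: 1 - math.exp(-lam * c)        # cdf: integral of f from 0 to c

        a, b = 0.5, 1.5
        n = 100_000
        width = (b - a) / n

        # Midpoint Riemann sum for the integral of f over [a, b] ...
        integral = sum(f(a + (i + 0.5) * width) for i in range(n)) * width

        # ... which should agree closely with F(b) - F(a).
        print(integral, F(b) - F(a))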

    3. Functions of random variables

    Mantra: A function of a random variable is a random variable and therefore has a distribution

    Discrete r.v.

    Y=r(X); P(X=xj) = f(xj); then

    g(y) = P[r(X)=y] = Σ_{x: r(x)=y} f(x)

    Continuous r.v.

    Examples: (i) Y = ln X; X ∼ F

    G(y) = P(Y ≤ y) = P(X ≤ e^y) = F(e^y), so g(y) = f(e^y)e^y

    More generally, for Y = r(X) with r monotone and inverse X = s(Y): if r is increasing, G(y) = F[s(y)] and g(y) = f[s(y)]s′(y)


    If r is decreasing, G(y) = 1 − F[s(y)] and g(y) = −f[s(y)]s′(y)

    Important special case: Y = r(X) =a + bX; b>0

    X = (Y-a)/b = s(Y)

    G(Y) = F[(Y-a)/b]

    g(Y)=f[(Y-a)/b](1/b)

    Standardize r.v. X by setting a = −E(X)/σX and b = 1/σX.
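    A quick Python sketch of the standardization step: draws of X are transformed by Y = a + bX with a = −E(X)/σX and b = 1/σX, so the result should have mean about 0 and standard deviation about 1; the exponential source distribution is an arbitrary illustrative choice.

        import random
        import statistics

        # Illustrative source distribution: exponential with mean 2.
        x = [random.expovariate(0.5) for _ in range(100_000)]

        mu = statistics.fmean(x)      # sample stand-in for E(X)
        sigma = statistics.pstdev(x)  # sample stand-in for sigma_X

        # Y = a + bX with a = -E(X)/sigma_X, b = 1/sigma_X, i.e. Y = (X - E(X))/sigma_X
        a, b = -mu / sigma, 1 / sigma
        y = [a + b * xi for xi in x]

        print(statistics.fmean(y), statistics.pstdev(y))   # approximately 0 and 1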

    C. Bivariate distribution functions: how r.v.s move together

    For discrete r.v.s: f(x,y) = P(X=x, Y=y)

    For continuous r.v.s: f(x,y) is the joint density

    Probability statements for joint continuous r.v.s. use the cdf:

    F(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(s,t) dt ds

    Marginal distributions

    Marginal for X: f1(x); obtain by integrating the joint density or summing the joint pmf over Y

    Marginal for Y: f2(y); obtain by integrating the joint density or summing the joint pmf over X

    Conditional distributions

    Divide the joint density or pmf by the marginal density or pmf

    f2(y|x)=f(x,y)/f1(x); f1(x|y)=f(x,y)/f2(y)

    Example:

    Joint normal: marginal and conditional are also normal

    f(x,y) = [(2π)²(1−ρ²)]^(−1/2) exp{ −[2(1−ρ²)]^(−1) [(x−μx)² − 2ρ(x−μx)(y−μy) + (y−μy)²] }

    X and Y are normally distributed with means μx and μy, standard deviation 1,


    and correlation ρ.
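    A useful fact implied by this example (for the unit-standard-deviation bivariate normal written above): the conditional distribution of Y given X = x is normal with mean μy + ρ(x − μx) and variance 1 − ρ², so the conditional mean is linear in x, the population analogue of regressing Y on X.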

    Example:

    Roof distribution

    f(x,y) = x + y for 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1.

    f1(x)=x+(1/2)

    f2(y)=y+(1/2)

    f2(y|x)= 2(x+y)/(2x+1)
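    A numerical Python sketch verifying the marginal and conditional just derived, by integrating the joint density over a grid; the evaluation points x = 0.3 and y = 0.6 are arbitrary.

        # Roof distribution: f(x, y) = x + y on the unit square.
        def f(x, y):
            return x + y

        x0, y0 = 0.3, 0.6
        n = 100_000
        width = 1.0 / n

        # Marginal for X: f1(x0) = integral over y in [0, 1] of f(x0, y) dy = x0 + 1/2
        f1_x0 = sum(f(x0, (j + 0.5) * width) for j in range(n)) * width
        print(f1_x0, x0 + 0.5)                                   # both about 0.8

        # Conditional: f2(y0 | x0) = f(x0, y0) / f1(x0) = 2(x0 + y0) / (2*x0 + 1)
        print(f(x0, y0) / f1_x0, 2 * (x0 + y0) / (2 * x0 + 1))   # both about 1.125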

    D. Example: the effect of a wage voucher (Burtless, 1985)

    Simple conditional distributions for Bernoulli outcomes in a randomized trial
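    As a preview of what that looks like, a minimal Python sketch with made-up counts (these are not Burtless's actual numbers): the outcome Y (employed or not) is Bernoulli, and with random assignment of the voucher D, the difference in conditional means P(Y=1 | D=1) − P(Y=1 | D=0) estimates the average causal effect.

        # Hypothetical 2x2 counts: D = 1 if offered a wage voucher, Y = 1 if employed.
        # Purely illustrative values, not the Burtless (1985) data.
        counts = {
            (1, 1): 120, (1, 0): 380,   # voucher group: employed / not employed
            (0, 1): 150, (0, 0): 350,   # no-voucher group: employed / not employed
        }

        def p_employed_given(d):
            """Conditional probability P(Y = 1 | D = d) computed from the cell counts."""
            return counts[(d, 1)] / (counts[(d, 1)] + counts[(d, 0)])

        # Under random assignment, this difference in conditional means estimates the treatment effect.
        print(p_employed_given(1) - p_employed_given(0))   # about -0.06 with these made-up counts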
