chapters 1 & 2-final.ppt econmetrics- smith/watson
TRANSCRIPT
-
7/27/2019 Chapters 1 & 2-final.ppt Econmetrics- Smith/Watson
1/71
Copyright 2011 Pearson Addison-Wesley. All rights reserved.
Introduction to Econometrics
Chapters 1 and 2
The statistical analysis of
economic (and related) data
and
Review of Probability
Introduction to Econometrics is the title of the text

What is econometrics?
What is it? Science (& art!)
Broadly, using theory and statistical methods to analyze data

What are some uses?
Test theories
Forecast values (e.g., firms' sales, unemployment, stock prices, path of a hurricane, & much, much more)
Fit mathematical economic models to data
Use data to make numerical policy recommendations in govt. and business
Brief Overview of the Course
Economics suggests important relationships, often with policy implications, but virtually never suggests quantitative magnitudes of causal effects.

What is the quantitative effect of reducing class size on student achievement?
How does a bachelor's degree change earnings?
What is the price elasticity of cigarettes?
What is the effect on output growth of a 1 percentage point increase in interest rates by the Fed?
What is the effect on housing prices of environmental improvements?
How much does knowing econometrics improve your love life?
Economic Questions We'll Examine

1. Does reducing class size improve elementary school education?
2. Is there racial discrimination in the market for home loans?
3. How much do cigarette taxes reduce smoking?
4. What will be the rate of inflation next year?
(in today's economy, a bigger question might be "What will be the unemployment rate next year?")
5. How much does knowing econometrics improve your love life?
This course is about using data to measure causal effects.

Ideally, we would like an experiment
What would be an experiment to estimate the effect of class size on standardized test scores?

But almost always we only have observational (nonexperimental) data.
returns to education
cigarette prices
monetary policy

Most of the course deals with difficulties arising from using observational data to estimate causal effects
confounding effects (omitted factors)
simultaneous causality
correlation does not imply causation
In this course you will:

Learn methods for estimating causal effects using observational data
Learn some tools that can be used for other purposes; for example, forecasting using time series data;
Focus on applications; theory is used only as needed to understand the whys of the methods;
Learn to evaluate the regression analysis of others; this means you will be able to read/understand empirical economics papers in other econ courses;
Get some hands-on experience with regression analysis in your problem sets.
Three types of data

Cross-sectional: Different entities, single time period
Time series: Single entity, multiple time periods
Panel: Multiple entities, two or more time periods

Speaking of using observational data. . .
Empirical problem: Class size and educational output

Policy question: What is the effect on test scores (or some other outcome measure) of reducing class size by one student per class? by 8 students/class?
We must use data to find out (is there any way to answer this without data?)

Review of Probability and Statistics (Chapter 2)
The California Test Score Data Set (note 1-1)
All K through 8 California school districts (n = 420)
1999
Variables:
5th grade test scores
district-wide mean of reading and math scores for fifth graders
Student-teacher ratio (STR)
no. of students in the district divided by no. of full-time teachers
Initial look at the data: (note 1-2)
(You should already know how to interpret this table)

What does this table tell us about the relationship between test scores and the STR?
Do districts with smaller classes have higher test scores?

Scatterplot of test score v. student-teacher ratio
What does this figure show?
We need to get some numerical evidence on whether districts with low STRs have higher test scores. But how?

1. Compare average test scores in districts with low STRs to those with high STRs (estimation)
2. Test the null hypothesis that the mean test scores in the two types of districts are the same, against the alternative hypothesis that they differ (hypothesis testing)
3. Estimate an interval for the difference in the mean test scores, high v. low STR districts (confidence interval)
Initial data analysis: Compare districts with small (STR < 20) and large (STR ≥ 20) class sizes: (note 1-3)

1. Estimation of Δ = difference between group means
2. Test the hypothesis that Δ = 0
3. Construct a confidence interval for Δ

Class Size | Average score (Ȳ) | Standard deviation (sY) | n
Small      | 657.4             | 19.4                    | 238
Large      | 650.0             | 17.9                    | 182
1. Estimation (note 1-4)

Ȳsmall − Ȳlarge = (1/nsmall) Σ_{i=1}^{nsmall} Yi − (1/nlarge) Σ_{i=1}^{nlarge} Yi
= 657.4 − 650.0
= 7.4

Is this a large difference in a real-world sense?
Standard deviation across districts = 19.1
Is this a big enough difference to be important for school reform discussions, for parents, or for a school committee?
What does this tell us about the population?
2. Hypothesis testing (note 1-5)

Difference-in-means test: compute the t-statistic (remember this?),

t = (Ȳs − Ȳl) / SE(Ȳs − Ȳl) = (Ȳs − Ȳl) / √(ss²/ns + sl²/nl)

where SE(Ȳs − Ȳl) is the standard error of Ȳs − Ȳl, the subscripts s and l refer to small and large STR districts, and

ss² = [1/(ns − 1)] Σ_{i=1}^{ns} (Yi − Ȳs)²   (etc.)
2. Hypothesis testing (note 1-6)
Before testing. . . what are the H0 and HA for this test?
Compute the difference-of-means t-statistic: (note 1-7)

Size  | Ȳ     | sY   | n
small | 657.4 | 19.4 | 238
large | 650.0 | 17.9 | 182

t = (Ȳs − Ȳl) / √(ss²/ns + sl²/nl) = (657.4 − 650.0) / √(19.4²/238 + 17.9²/182) = 7.4 / 1.83 = 4.05

(note: p-value = .000061)

So. . . reject the null hypothesis that the two means are the same or not? Explain your decision.
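This arithmetic can be double-checked in a few lines of Python (a sketch using only the standard library; the variable names are mine):

```python
import math

# Group summaries from the table above
ybar_s, s_s, n_s = 657.4, 19.4, 238  # small-class districts
ybar_l, s_l, n_l = 650.0, 17.9, 182  # large-class districts

se = math.sqrt(s_s**2 / n_s + s_l**2 / n_l)  # SE of the difference in means
t = (ybar_s - ybar_l) / se                   # difference-of-means t-statistic

print(round(se, 2))  # 1.83
print(round(t, 2))   # 4.05
```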
3. Confidence interval (note 1-8)

A 95% confidence interval for the difference between the means is

(Ȳs − Ȳl) ± 1.96·SE(Ȳs − Ȳl) = 7.4 ± 1.96×1.83 = (3.8, 11.0)

So. . . reject the null hypothesis that the two means are the same or not? Explain your decision.
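The interval can be verified the same way (again a sketch; names are mine):

```python
import math

# Same group summaries as the previous slide
diff = 657.4 - 650.0
se = math.sqrt(19.4**2 / 238 + 17.9**2 / 182)

lo = diff - 1.96 * se  # lower end of the 95% CI
hi = diff + 1.96 * se  # upper end

print(round(lo, 1), round(hi, 1))  # 3.8 11.0
```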
What comes next
The mechanics of estimation, hypothesis testing, and confidence intervals should be familiar
These concepts extend directly to regression and its variants
Before turning to regression, however, we will review some of the underlying theory of estimation, hypothesis testing, and confidence intervals:
Why do these procedures work, and why use these rather than others?
We will review the intellectual foundations of statistics and econometrics
Review of Statistical Theory (note 1-9)
Why review probability?
Randomness is everywhere; we use the theory of probability to describe that randomness
Structure of notes:
1. The probability framework for statistical inference -now
2. Estimation
3. Testing
4. Confidence Intervals
Review of Statistical Theory
The probability framework for statistical inference

Single random variable:
Population, random variable, and distribution
Moments of a distribution (mean, variance, standard deviation, covariance, correlation)

Two random variables:
Conditional distributions and conditional means

Four useful distributions:
Normal, chi-squared, Student's t, F

Random sampling & sampling distribution:
Distribution of a sample of data drawn randomly from a population: Y1, …, Yn
(a) Single random variable (note 1-10)
Population
The group or collection of all possible entities of interest (school districts)
We will think of populations as infinitely large (∞ is an approximation to "very big")

Sample
What's a sample?
(a) Single random variable (note 1-11)
Fundamental concepts
Outcomes
Probability
Event

Random variable Y
Numerical summary of a random outcome (district average test score, district STR)

Types of random variables
Discrete
Continuous
(a) Single random variable (note 1-12)
Probability distributions - discrete
Definition
Probabilities of events
c.d.f.
Bernoulli
(a) Single random variable (note 1-13)
Probability distributions - continuous
p.d.f.
c.d.f.
Population distribution of Y
The probabilities of different values of Y that occur in the population, for ex. Pr[Y = 650] (when Y is discrete)
or: The probabilities of sets of these values, for ex. Pr[640 ≤ Y ≤ 660] (when Y is continuous).
(b) Moments of a population distribution: mean, variance, standard deviation (note 1-14)

mean = expected value (expectation) of Y
     = E(Y)
     = μY
     = long-run average value of Y over many repeated occurrences of Y
Moments (cont.) (note 1-15)
variance = E[(Y − μY)²]
         = σY²
         = measure of the squared spread of the distribution around its mean

standard deviation = √variance = σY
Moments (cont.) (note 1-16)
skewness = E[(Y − μY)³] / σY³
         = measure of asymmetry (lack of symmetry) of a distribution

skewness = 0: distribution is symmetric
skewness > (<) 0: distribution has a long right (left) tail
Moments (cont.) (note 1-17)
kurtosis = E[(Y − μY)⁴] / σY⁴
         = measure of mass in tails
         = measure of probability of large values

kurtosis = 3: normal distribution
kurtosis > 3: heavy tails (leptokurtotic)
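These four moments can be computed directly from the definitions above for a small discrete distribution (a sketch; the values and probabilities below are hypothetical, chosen only for illustration):

```python
# Moments computed straight from the definitions on these slides, for a
# hypothetical discrete distribution (made-up values and probabilities)
values = [-1.0, 0.0, 1.0, 4.0]
probs = [0.25, 0.25, 0.40, 0.10]

mean = sum(p * y for p, y in zip(probs, values))               # E(Y)
var = sum(p * (y - mean) ** 2 for p, y in zip(probs, values))  # E[(Y - mu)^2]
sd = var ** 0.5
skew = sum(p * (y - mean) ** 3 for p, y in zip(probs, values)) / sd ** 3
kurt = sum(p * (y - mean) ** 4 for p, y in zip(probs, values)) / sd ** 4

print(round(mean, 3), round(var, 3), round(skew, 3), round(kurt, 3))
# skew > 0: the lone large value (4.0) gives a long right tail
```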
Two random variables
Random variables X and Y
Together they have a joint distribution
Each one has a marginal distribution
Each one has a conditional distribution

Joint distribution of two discrete rvs X and Y
Probability that X and Y simultaneously take on certain values, say x and y
Pr(X = x, Y = y) or Pr(x, y) or P(X = x, Y = y) or P(x, y)
NOTE: lower case symbols x and y denote values and. . .
upper case symbols X and Y denote random variables
Probabilities of all possible (x, y) combinations sum to what?
Joint distribution (cont.)
After recording data for many commutes:
prob. of long, rainy commute = P(X=0, Y=0) = .15
prob. of long, clear commute = P(X=1, Y=0) = ??
prob. of short, rainy commute = P(X=0, Y=1) = ??
prob. of short, clear commute = P(X=1, Y=1) = ??

These four outcomes are mutually exclusive and exhaust all possibilities
So, they must sum to ??
Marginal distribution
Marginal distribution is P(X=x) or P(Y=y)
Sum of joint probabilities:
prob. of long commute = P(X=0, Y=0) + P(X=1, Y=0) = .15 + .07 = .22
prob. of short commute = ??
prob. of rainy commute = ??
prob. of clear commute = ??
P(Y = y) = Σ_{i=1}^{l} P(X = xᵢ, Y = y)   (sum over the l values that X can take)
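The commute example can be worked through in a few lines (a sketch; the four joint probabilities follow from the numbers on these slides, since .15 and .07 are given, P(X=0) = .30, and the four cells must sum to 1):

```python
# Rain/commute joint distribution (X: 0 = rain, 1 = no rain;
# Y: 0 = long, 1 = short). The .15 and .07 are given on these slides;
# P(X=0) = .30 and "sum to 1" pin down the other two cells.
joint = {(0, 0): 0.15, (1, 0): 0.07, (0, 1): 0.15, (1, 1): 0.63}

# Marginals: P(Y=y) = sum over i of P(X=x_i, Y=y), and likewise for X
p_y = {y: sum(p for (xi, yi), p in joint.items() if yi == y) for y in (0, 1)}
p_x = {x: sum(p for (xi, yi), p in joint.items() if xi == x) for x in (0, 1)}

print(round(p_y[0], 2))  # P(long commute) = 0.22
print(round(p_x[0], 2))  # P(rain) = 0.3

# A conditional probability, previewing the next slides:
cond = joint[(0, 0)] / p_x[0]  # P(Y=0 | X=0)
print(round(cond, 2))  # 0.5
```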
Conditional Distribution

Conditional distribution of Y given X
Probability that Y is some value conditional on (depending on, or after) X taking on a specified value

Examples: distribution of. . .
test scores, given that STR < 20
wages of all female workers (Y = wages, X = gender)
mortality rate of those given an experimental treatment (Y = live/die; X = treated/not treated)

P(Y = y | X = x) or P(y | x)

P(Y = y | X = x) = P(X = x, Y = y) / P(X = x),   or   P(y | x) = P(x, y) / P(x)
Conditional Distribution (cont.)

Example: prob. of long commute (Y=0) if you know it's raining (X=0)

P(Y=0 | X=0) = P(X=0, Y=0) / P(X=0) = .15/.30 = .50

If it's raining, only two possibilities. What are they?
So, prob. of short commute (Y=1) if you know it's raining (X=0)
= P(Y=1 | X=0) = ?? (hint: recall the answer above)
Conditional Distribution (cont.)

Question from previous slide (cont.): prob. of short commute (Y=1) if you know it's raining (X=0)

Now, check your answer by calculation:

P(Y=1 | X=0) = P(X=0, Y=1) / P(X=0) = ??
Conditional Distribution (cont.) (note 1-18)
Questions:
What is prob. of long commute (Y=0) if you know it's not raining (X=1)?
What is prob. of short commute (Y=1) if you know it's not raining (X=1)?
What do these two probabilities sum to?
Conditional Distribution Examples (note 1-19)
Figure 2.4: Average Hourly Earnings of U.S. Full-Time Workers in 2008. Why do I say that these are conditional distributions?
Independence
Two rvs X and Y are independent if
Knowing the value of one tells you nothing about the value of the other
Conditional distribution of Y given X = marginal distribution of Y (and likewise for X)
P(Y=y | X=x) = P(Y=y) or. . .
P(X=x | Y=y) = P(X=x)
Independence (cont.)
Recall: rvs X and Y are independent if
P(Y=y | X=x) = P(Y=y)
Example
M = number of PC crashes & A = age of PC (0 = old & 1 = new)
P(M = 0) = 0.80 and P(M = 1) = 0.07
Are M and A independent? Explain your answer.
Case 1: P(M=0 | A = 0) = 0.70
Case 2: P(M=1 | A = 1) = 0.07
Two random variables: joint distributions and covariance (note 1-20)

Random variables X and Z have a joint distribution
The covariance between X and Z is

cov(X, Z) = E[(X − μX)(Z − μZ)] = σXZ

The covariance is a measure of the linear association between X and Z; its units are (units of X) × (units of Z)
cov(X, Z) > 0 means a positive relation between X and Z
If X and Z are independently distributed, then cov(X, Z) = 0 (but not vice versa!!)
The covariance between Test Score and STR is negative:
So is the correlation
Covariance vs. Correlation
Recall: The covariance. . . units are (units of X) × (units of Z).

If X & Z in feet, then covariance is feet²
If X & Z (the same variables) in meters, then covariance is meters²
Same association but different values of covariance
What if X in feet and Z in lbs.? What units for covariance?
Problems!
The correlation coefficient is defined in terms of the covariance:

corr(X, Z) = cov(X, Z) / √(var(X)·var(Z)) = σXZ / (σX·σZ) = rXZ

−1 ≤ corr(X, Z) ≤ 1
corr(X, Z) = 1 means perfect positive linear association
corr(X, Z) = −1 means perfect negative linear association
corr(X, Z) = 0 means no linear association

The correlation coefficient is unitless, so it avoids the problems of the covariance.
corr(X, Z) when measured in feet is the same as corr(X, Z) when X & Z are in meters or pounds or. . .
The correlation coefficient measures linear association
Four Distributions: normal, chi-squared, Student t, F (note 1-21)

Normal Distribution
bell-shaped probability density
X ~ N(μ, σ²)
Standard normal: Z ~ N(0, 1)
Standardizing a normal r.v. (z-score): Z = (X − μ)/σ
Used for finding probabilities about X ~ N(μ, σ²)
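For instance, Pr[640 ≤ X ≤ 660] for X ~ N(650, 19.1²) can be found by standardizing (a sketch using `statistics.NormalDist`; the mean and SD are borrowed from the test-score slides purely for illustration):

```python
from statistics import NormalDist

# P(640 <= X <= 660) for X ~ N(650, 19.1**2), via standardization
mu, sigma = 650.0, 19.1
z_lo = (640 - mu) / sigma  # standardize the endpoints
z_hi = (660 - mu) / sigma

prob = NormalDist(0, 1).cdf(z_hi) - NormalDist(0, 1).cdf(z_lo)
print(round(prob, 3))
```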
Normal Distribution (cont.)
A Bad Day on Wall Street (note 1-22)

The box "A Bad Day on Wall Street" has an example of the normal distribution in the U.S. stock market
A Bad Day on Wall Street (cont.) (note 1-22)
The Chi-squared Distribution (note 1-23)

Usually written as χ²ₘ
Shape of distribution
Shape depends on degrees of freedom m
When used
The Student t Distribution (note 1-24)

Always lower case t
Shape of distribution
Symmetric like normal distribution
Shape depends on degrees of freedom m
m < 20: fatter tails than normal distribution
m > 30: shape close to normal distribution
m → ∞: exactly like normal distribution
When used
The F Distribution (note 1-25)

Shape of distribution
Shape depends on two degrees of freedom
Numerator d.f. n
Denominator d.f. m
When used
(d) Distribution of a sample of data drawn randomly from a population: Y1, …, Yn (note 1-26)

We will assume simple random sampling
Choose an individual (district, entity) at random from the population

Randomness and data
Prior to sample selection, the value of Y is random because the individual selected is random
Once the individual is selected and the value of Y is observed, then Y is just a number, not random

The data set is (Y1, Y2, …, Yn), where Yi = value of Y for the ith individual (district, entity) sampled
Distribution of Y1, …, Yn under simple random sampling (note 1-27)

Because individuals #1 and #2 are selected at random, the value of Y1 has no information content for Y2. Thus:
Y1 and Y2 are independently distributed
Y1 and Y2 come from the same population (distribution). That is, Y1 and Y2 are identically distributed

So, under simple random sampling, Y1 and Y2 are independently and identically distributed (i.i.d.).
More generally, under simple random sampling, {Yi}, i = 1, …, n, are i.i.d.
Simple Random Sampling (note 1-28)

Recall: Under simple random sampling, {Yi}, i = 1, …, n, are i.i.d.
This framework allows rigorous statistical inferences about moments of population distributions using a sample of data from that population

Structure of notes:
1. The probability framework for statistical inference
2. Estimation - now
3. Testing
4. Confidence Intervals
Estimation

Ȳ is the natural estimator of the population mean. But:
a) What are the properties of Ȳ?
b) Why should we use Ȳ rather than some other estimator?
   Y1 (the first observation)
   maybe unequal weights, not simple average
   median(Y1, …, Yn)

The starting point is the sampling distribution of Ȳ
(a) The sampling distribution of Ȳ (note 1-29)

Ȳ is a random variable, and its properties are determined by the sampling distribution of Ȳ
The individuals in the sample are drawn at random.
Thus the values of (Y1, …, Yn) are random
Thus functions of (Y1, …, Yn), such as Ȳ, are random: had a different sample been drawn, they would have taken on a different value

The distribution of Ȳ over ALL possible different samples of size n is called the. . .
sampling distribution of Ȳ.
(a) The sampling distribution of Ȳ

Recall: The distribution of Ȳ over ALL possible different samples of size n is called the. . .
sampling distribution of Ȳ.

The mean and variance of all of the Ȳ values are the mean and variance of its sampling distribution, E(Ȳ) and var(Ȳ).
(remember: Ȳ is a sample statistic.)

VIP: The concept of the sampling distribution underpins all of inference in econometrics.
The sampling distribution of Ȳ (cont.) (note 1-30)

Example: Suppose Y takes on 0 or 1 (a Bernoulli random variable) with the probability distribution

Pr[Y = 0] = .22,  Pr[Y = 1] = .78

Then
E(Y) = p×1 + (1 − p)×0 = p = .78
σY² = E[Y − E(Y)]² = p(1 − p)  [remember this?]
    = .78×(1 − .78) = 0.1716

The sampling distribution of Ȳ depends on n.
Consider n = 2. The sampling distribution of Ȳ is:
Pr(Ȳ = 0) = .22² = .0484
Pr(Ȳ = ½) = 2×.22×.78 = .3432
Pr(Ȳ = 1) = .78² = .6084
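This n = 2 enumeration can be reproduced by brute force (a sketch; the same loop generalizes to any small n):

```python
from itertools import product

# Brute-force the sampling distribution of Ybar for n = 2 i.i.d.
# Bernoulli(p = .78) draws
p, n = 0.78, 2
dist = {}
for draws in product([0, 1], repeat=n):
    prob = 1.0
    for d in draws:
        prob *= p if d == 1 else 1 - p
    ybar = sum(draws) / n
    dist[ybar] = dist.get(ybar, 0.0) + prob

print({k: round(v, 4) for k, v in sorted(dist.items())})
# {0.0: 0.0484, 0.5: 0.3432, 1.0: 0.6084}
```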
The sampling distribution of Ȳ when Y is Bernoulli (p = .78): (note 1-31)
Things we want to know about the sampling distribution:

What is the mean of Ȳ?
If E(Ȳ) = true μY = .78, then Ȳ is an unbiased estimator of μY
What is the variance of Ȳ?
How does var(Ȳ) depend on n (famous 1/n formula)
Does Ȳ become close to μY when n is large?
Law of large numbers: Ȳ is a consistent estimator of μY
Distribution of Ȳ appears bell shaped for n large. . . is this generally true?
Wait until the next section (2.6 in 3rd ed.) to answer this question about the SHAPE of the sampling distribution of Ȳ.
The mean and variance of the sampling distribution of Ȳ

General case, that is, for Yi i.i.d. from ANY distribution, not just Bernoulli:

mean: E(Ȳ) = E[(1/n) Σ_{i=1}^{n} Yi] = (1/n) Σ_{i=1}^{n} E(Yi) = (1/n)·n·μY = μY

variance: var(Ȳ) = E[Ȳ − E(Ȳ)]²
= E[Ȳ − μY]²
= E[((1/n) Σ_{i=1}^{n} Yi) − μY]²
= E[(1/n) Σ_{i=1}^{n} (Yi − μY)]²
so var(Ȳ) = E[(1/n) Σ_{i=1}^{n} (Yi − μY)]²
= E{[(1/n) Σ_{i=1}^{n} (Yi − μY)] × [(1/n) Σ_{j=1}^{n} (Yj − μY)]}
= (1/n²) Σ_{i=1}^{n} Σ_{j=1}^{n} E[(Yi − μY)(Yj − μY)]
= (1/n²) Σ_{i=1}^{n} Σ_{j=1}^{n} cov(Yi, Yj)
= (1/n²) × n × σY²   [note: cov(Yi, Yj) = 0 for i ≠ j, and cov(Yi, Yi) = var(Yi) = σY²]
= σY²/n
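The σY²/n result can be confirmed by exact enumeration over all possible samples from a Bernoulli population (a sketch; the particular p and n here are arbitrary choices):

```python
from itertools import product

# Check var(Ybar) = sigma_Y^2 / n by exact enumeration over all samples,
# for a Bernoulli(p) population (p and n are arbitrary choices)
p, n = 0.78, 5
var_y = p * (1 - p)  # population variance sigma_Y^2

e_ybar = e_ybar2 = 0.0
for draws in product([0, 1], repeat=n):
    prob = 1.0
    for d in draws:
        prob *= p if d == 1 else 1 - p
    ybar = sum(draws) / n
    e_ybar += prob * ybar
    e_ybar2 += prob * ybar ** 2

var_ybar = e_ybar2 - e_ybar ** 2
print(round(var_ybar, 6), round(var_y / n, 6))  # the two numbers agree
```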
Mean and variance of sampling distribution of Ȳ (cont.) (note 1-32)

E(Ȳ) = μY
var(Ȳ) = σY²/n

Implications:
1. Ȳ is an unbiased estimator of μY (that is, E(Ȳ) = μY)
2. var(Ȳ) is inversely proportional to n
   1. the spread (standard deviation) of the sampling distribution is proportional to 1/√n
   2. Thus the sampling uncertainty associated with Ȳ is proportional to 1/√n (larger samples, less uncertainty, but square-root law)
The sampling distribution of Ȳ when n is large (note 1-33)

For small sample sizes, the distribution of Ȳ will usually be complicated (unless. . . what is true about the distribution of the Yi values in the population?)
But if n is large, the sampling distribution is simple!
1. As n increases, the distribution of Ȳ becomes more tightly centered around μY (the Law of Large Numbers)
2. Moreover, the distribution of Ȳ − μY becomes normal (the Central Limit Theorem)
The Law of Large Numbers: (note 1-34)

An estimator is consistent if the probability that it falls within an interval of the true population value tends to one as the sample size increases.

If (Y1, …, Yn) are i.i.d. and σY² < ∞, then Ȳ is a consistent estimator of μY, that is,

Pr[|Ȳ − μY| < ε] → 1 as n → ∞

which can be written, Ȳ →p μY
("→p" means Ȳ converges in probability to μY).
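A seeded simulation illustrates the LLN for the Bernoulli(p = .78) example from the earlier slides (a sketch, not part of the original slides; the sample sizes are arbitrary):

```python
import random

# LLN illustration for the Bernoulli(p = .78) population: the sample mean
# settles near p as n grows (seeded, so the run is reproducible)
random.seed(0)
p = 0.78

for n in (10, 1_000, 100_000):
    ybar = sum(random.random() < p for _ in range(n)) / n
    print(n, round(ybar, 3))
```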
The Central Limit Theorem (CLT): (note 1-35)

If (Y1, …, Yn) are i.i.d. and 0 < σY² < ∞, then when n is large the distribution of Ȳ is well approximated by a normal distribution.

Ȳ is approximately distributed N(μY, σY²/n) (normal distribution with mean μY and variance σY²/n) AND. . .

√n·(Ȳ − μY)/σY is approximately distributed N(0, 1) (standard normal)

That is, standardized Ȳ = [Ȳ − E(Ȳ)]/√var(Ȳ) = (Ȳ − μY)/(σY/√n) is approximately distributed as N(0, 1)

VIP: The larger is n, the better is the approximation.
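A seeded simulation of the CLT for the same Bernoulli population (again a sketch; n and the number of replications are arbitrary choices):

```python
import random
import statistics

# CLT illustration: standardized sample means of Bernoulli(p = .78) draws
# should look roughly N(0, 1) once n is moderately large (seeded run)
random.seed(1)
p = 0.78
sigma = (p * (1 - p)) ** 0.5
n, reps = 100, 2_000

z = []
for _ in range(reps):
    ybar = sum(random.random() < p for _ in range(n)) / n
    z.append((ybar - p) / (sigma / n ** 0.5))

print(round(statistics.mean(z), 2), round(statistics.stdev(z), 2))
# mean near 0, stdev near 1
```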
Fig. 2.8 Sampling distribution of Ȳ when Y is Bernoulli, p = 0.78 (n = 2, 5, 25, 100)
Fig. 2.8 Sampling distribution of Ȳ (cont.) (note 1-36)

In the figure on the previous slide (fig. 2.8), when n = 100, it might not be easy to see that the distribution of Ȳ is normal.
It's easier to see this if we examine the distribution of standardized Ȳ = (Ȳ − μY)/(σY/√n)
See next slide
Same example: sampling distribution of [Ȳ − E(Ȳ)]/√var(Ȳ) (n = 2, 5, 25, 100) (Fig. 2.9 in book)
Summary: The Sampling Distribution of Ȳ

For Y1, …, Yn i.i.d. with 0 < σY² < ∞:

The exact (finite sample) sampling distribution of Ȳ has mean μY (Ȳ is an unbiased estimator of μY) and variance σY²/n
Other than its mean and variance, the exact distribution of Ȳ is complicated and depends on the distribution of Y (the population distribution)

When n is large, the sampling distribution simplifies:
Ȳ →p μY (Law of Large Numbers)
[Ȳ − E(Ȳ)]/√var(Ȳ) is approximately N(0, 1) (CLT)