
Using Matching, Instrumental Variables

and Control Functions to Estimate

Economic Choice Models

James J. Heckman and Salvador Navarro, The University of Chicago

Review of Economics and Statistics 86(1) (2004). This draft for Nuffield College, Oxford.

August 7, 2005


1 Introduction

1. We orient the discussion of the selection of alternative estimators around the economic theory of choice.

We compare the different roles that the propensity score plays in three widely used econometric methods, and the implicit economic assumptions that underlie applications of these methods.


2. Conventional matching methods do not distinguish between excluded and included variables. We show that matching breaks down when there are variables that predict the choice of treatment perfectly, whereas control function methods take advantage of exclusion restrictions and use the information available from perfect prediction to obtain identification. Matching assumes away the possibility of perfect prediction, while selection models rely on this property in limit sets.


3. We define the concepts of "relevant" information and "minimal relevant" information, and distinguish agent and analyst information sets. We state clearly what information is required to identify different treatment parameters. In particular, we show that when the analyst does not have access to the "minimal relevant" information, matching estimators of different treatment parameters are biased.


• Having more information, but not all "minimal relevant" information, can increase bias compared to having less information.

• Enlarging the analyst's information set with variables that do not belong in the relevant information set may either increase or decrease the bias from matching.

• Because the method of control functions explicitly models omitted relevant variables, rather than assuming that there are none, it is more robust to omitted conditioning variables.


4. The method of matching offers no guidance as to which variables to include or exclude in conditioning sets. Such choices can greatly affect inference. There is no support for the commonly used rules of selecting matching variables by choosing the set of variables that maximizes the probability of successful prediction into treatment, or by including variables in conditioning sets that are statistically significant in choice equations. This weakness is shared by many econometric procedures, but it is not fully appreciated in recent applications of matching, which apply these selection rules when choosing conditioning sets.


2 A Prototypical Model of Economic Choice

Two potential outcomes $(Y_0, Y_1)$. $D = 1$ if $Y_1$ is selected; $D = 0$ if $Y_0$ is selected.

Let $V$ be utility:

$V = \mu_V(Z, U_V)$, $\quad D = \mathbf{1}(V > 0)$   (1)

$Z$: observed factors determining choices; $U_V$: unobserved.

$Y_1 = \mu_1(X, U_1)$   (2a)

$Y_0 = \mu_0(X, U_0)$   (2b)

$(Y_0, Y_1)$ are (absolutely) continuous.

$\Delta = Y_1 - Y_0$

Additively Separable Case (for familiarity; not essential):

$V = \mu_V(Z) + U_V$, $\quad E(U_V) = 0$   (1$'$)

$Y_1 = \mu_1(X) + U_1$, $\quad E(U_1) = 0$   (2a$'$)

$Y_0 = \mu_0(X) + U_0$, $\quad E(U_0) = 0$   (2b$'$)

3 Parameters Discussed Today

ATE: $E(Y_1 - Y_0 \mid X)$ (average treatment effect)

TT: $E(Y_1 - Y_0 \mid X, D = 1)$ (treatment on the treated)

MTE: $E(Y_1 - Y_0 \mid X, V = 0)$ (marginal treatment effect)

These are familiar, but by no means the only parameters we could consider. From the MTE we can identify many other parameters.

(Recall the IV lectures.)
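The following minimal simulation sketch illustrates the additively separable model and the three parameters just defined. All concrete choices (the index $\mu_V(Z) = 0.5Z$, the means $\mu_1 = 1$, $\mu_0 = 0$, and the error correlations) are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Choice index: V = mu_V(Z) + U_V, D = 1(V > 0)
Z = rng.normal(size=n)
mu_V = 0.5 * Z                               # assumed linear index
# (U_1, U_0, U_V) jointly normal; correlations are illustrative
cov = np.array([[1.0, 0.5, 0.6],
                [0.5, 1.0, 0.3],
                [0.6, 0.3, 1.0]])
U1, U0, UV = rng.multivariate_normal(np.zeros(3), cov, size=n).T
D = (mu_V + UV > 0).astype(int)

# Outcomes: Y_1 = mu_1 + U_1, Y_0 = mu_0 + U_0 (constants for simplicity)
Y1, Y0 = 1.0 + U1, 0.0 + U0
Y = D * Y1 + (1 - D) * Y0

ate = (Y1 - Y0).mean()                       # E(Y1 - Y0)
tt = (Y1 - Y0)[D == 1].mean()                # E(Y1 - Y0 | D = 1)
margin = np.abs(mu_V + UV) < 0.02            # approximates the event V = 0
mte = (Y1 - Y0)[margin].mean()               # E(Y1 - Y0 | V = 0)
naive = Y[D == 1].mean() - Y[D == 0].mean()  # raw mean difference
print(f"ATE={ate:.3f}  TT={tt:.3f}  MTE(V~0)={mte:.3f}  naive={naive:.3f}")
```

Because the errors are correlated with $U_V$, the three parameters differ from one another and from the raw mean difference, which motivates the selection problem discussed next.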

4 The Selection Problem

Samples are generated by choices:

$E(Y \mid X, D = 1) = E(Y_1 \mid X, D = 1)$

$E(Y \mid X, D = 0) = E(Y_0 \mid X, D = 0)$

Data: $\Pr(D = 1 \mid X, Z)$, $E(Y_1 \mid X, D = 1)$, and $E(Y_0 \mid X, D = 0)$.

From raw means we get biases. We can form $E(Y_1 \mid X, D = 1) - E(Y_0 \mid X, D = 0)$.

In general this produces BIASES.

TT:

Bias$^{\mathrm{TT}} = [E(Y \mid X, D = 1) - E(Y \mid X, D = 0)] - E(Y_1 - Y_0 \mid X, D = 1)$
$= E(Y_0 \mid X, D = 1) - E(Y_0 \mid X, D = 0)$

Under additive separability:

Bias$^{\mathrm{TT}} = E(U_0 \mid X, D = 1) - E(U_0 \mid X, D = 0)$

ATE:

Bias$^{\mathrm{ATE}} = [E(Y \mid X, D = 1) - E(Y \mid X, D = 0)] - E(Y_1 - Y_0 \mid X)$

Under additive separability:

Bias$^{\mathrm{ATE}} = [E(U_1 \mid X, D = 1) - E(U_1 \mid X)] - [E(U_0 \mid X, D = 0) - E(U_0 \mid X)]$

MTE:

Bias$^{\mathrm{MTE}} = [E(Y \mid X, D = 1) - E(Y \mid X, D = 0)] - E(Y_1 - Y_0 \mid X, V = 0)$

Under additive separability:

Bias$^{\mathrm{MTE}} = [E(U_1 \mid X, D = 1) - E(U_1 \mid X, V = 0)] - [E(U_0 \mid X, D = 0) - E(U_0 \mid X, V = 0)]$
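A self-contained Monte Carlo sketch of these three biases in the additively separable normal case; the correlations and the fixed index value are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
rho1, rho0 = 0.6, 0.3             # Corr(U_1, U_V), Corr(U_0, U_V); assumed
UV = rng.normal(size=n)
U1 = rho1 * UV + np.sqrt(1 - rho1**2) * rng.normal(size=n)
U0 = rho0 * UV + np.sqrt(1 - rho0**2) * rng.normal(size=n)
D = 0.2 + UV > 0                  # V = mu_V + U_V with mu_V fixed at 0.2
margin = np.abs(0.2 + UV) < 0.01  # approximates the event V = 0

# Under additive separability the Y-level biases reduce to U-level terms:
bias_tt = U0[D].mean() - U0[~D].mean()
bias_ate = (U1[D].mean() - U1.mean()) - (U0[~D].mean() - U0.mean())
bias_mte = (U1[D].mean() - U1[margin].mean()) \
         - (U0[~D].mean() - U0[margin].mean())
print(f"TT bias={bias_tt:.3f}  ATE bias={bias_ate:.3f}  MTE bias={bias_mte:.3f}")
```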

5 How Different Methods Solve the Bias Problem

5.1 Matching

Let $W = (X, Z)$ denote the conditioning variables:

$(Y_1, Y_0) \perp D \mid W$   (M-1)

("$\perp$" denotes independence given the conditioning variables)

$0 < \Pr(D = 1 \mid W) = P(W) < 1$   (M-2)

Rosenbaum and Rubin (1983) show that (M-1) and (M-2) imply

$(Y_1, Y_0) \perp D \mid P(W)$   (M-3)

so that

$E(Y_1 \mid D = 0, P(W)) = E(Y_1 \mid D = 1, P(W)) = E(Y_1 \mid P(W))$

$E(Y_0 \mid D = 1, P(W)) = E(Y_0 \mid D = 0, P(W)) = E(Y_0 \mid P(W))$

Dependence between $D$ and $(Y_1, Y_0)$ is eliminated by conditioning on $W$:

$(Y_1, Y_0) \perp D \mid W$: "selection on observables."

If $P(W) = 1$ or $P(W) = 0$, the method breaks down for those values.
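A minimal nearest-neighbour sketch of matching on the propensity score when (M-1) and (M-2) hold; the data-generating process and the assumption that the score is known (rather than estimated) are illustrative simplifications:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
X = rng.normal(size=(n, 2))
p = 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1])))   # true P(W), known here
D = rng.uniform(size=n) < p
Y1 = 1.0 + X[:, 0] + rng.normal(size=n)            # (Y1, Y0) independent of D given X
Y0 = X[:, 0] + rng.normal(size=n)
Y = np.where(D, Y1, Y0)

# Match each treated unit to the nearest control on the propensity score.
pt, pc = p[D], p[~D]
Yt, Yc = Y[D], Y[~D]
order = np.argsort(pc)
pos = np.clip(np.searchsorted(pc[order], pt), 1, len(pc) - 1)
left, right = order[pos - 1], order[pos]
nearest = np.where(np.abs(pc[left] - pt) <= np.abs(pc[right] - pt), left, right)
tt_hat = (Yt - Yc[nearest]).mean()                 # estimates TT = E(Y1-Y0 | D=1)
print(f"matching TT estimate: {tt_hat:.3f} (true TT here = 1.0)")
```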

Extensions (Heckman, Ichimura, Smith and Todd)

Distinction between $X$ and $Z$. Introducing $Z$ allows one to solve the breakdown problem arising from

$P(X) = 1$ or $P(X) = 0$.

Thus, if outcomes are defined in terms of $X$,

$E(Y_1 \mid X, Z) = E(Y_1 \mid X)$,

and if we can find another value $Z'$ such that $\Pr(D = 1 \mid X, Z') \neq 1$, we can match using this value (an IV assumption).

Only weaker mean independence assumptions are required:

$E(Y_1 \mid X, D = 1) = E(Y_1 \mid X)$

$E(Y_0 \mid X, D = 0) = E(Y_0 \mid X)$

These can be used for means.

Matching is "for free" (Gill and Robins (2001)):

$E(Y_0 \mid D = 1, X)$ is not observed, so we can just as well replace it by

$E(Y_0 \mid D = 1, X) = E(Y_0 \mid D = 0, X)$.

However, the implied economic restrictions are not "for free." This imposes that, conditional on $X$ and $Z$, the marginal person is the same as the average person.

This is the same as a flat MTE in $u_V$ (the MTE does not depend on $U_V$).

5.2 Control Functions

Additively separable case. We observe the left-hand sides of

$E(Y_1 \mid X, Z, D = 1) = \mu_1(X) + E(U_1 \mid X, Z, D = 1)$

$E(Y_0 \mid X, Z, D = 0) = \mu_0(X) + E(U_0 \mid X, Z, D = 0)$

If $(U_1, U_V) \perp (X, Z)$:

$E(U_1 \mid X, Z, D = 1) = E(U_1 \mid P(Z), D = 1) = K_1(P(Z))$

If $(U_0, U_V) \perp (X, Z)$:

$E(U_0 \mid X, Z, D = 0) = E(U_0 \mid P(Z), D = 0) = K_0(P(Z))$

So the key assumption is

$(U_1, U_0, U_V) \perp (X, Z)$.

Under this condition,

$E(Y_1 \mid X, Z, D = 1) = \mu_1(X) + K_1(P(Z))$

$E(Y_0 \mid X, Z, D = 0) = \mu_0(X) + K_0(P(Z))$

We need limit-set results:

$\lim_{P(Z) \to 1} K_1(P(Z)) = 0$ and $\lim_{P(Z) \to 0} K_0(P(Z)) = 0$.

• If there are limit sets $\mathcal{Z}^0$ and $\mathcal{Z}^1$ such that $\lim_{Z \to \mathcal{Z}^0} P(Z) = 0$ and $\lim_{Z \to \mathcal{Z}^1} P(Z) = 1$, then we can identify the constants.

• There are semiparametric versions of these estimators.

• Use polynomials in $P$; local linear regression in $P$.
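As a concrete, fully parametric illustration of this two-step logic, here is a sketch of a control-function estimator in the normal case, where $K_1$ and $K_0$ reduce to inverse Mills ratio terms; all parameter values, and the assumption that the choice index is known, are illustrative simplifications rather than the paper's procedure:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n = 100_000
Z = rng.normal(size=n)
UV = rng.normal(size=n)
U1 = 0.6 * UV + rng.normal(size=n, scale=0.8)   # Cov(U1, UV) = 0.6
U0 = 0.3 * UV + rng.normal(size=n, scale=0.8)   # Cov(U0, UV) = 0.3
D = 0.8 * Z + UV > 0                            # V = mu_V(Z) + U_V
Y = np.where(D, 1.0 + U1, 0.0 + U0)             # mu_1 = 1, mu_0 = 0

P = norm.cdf(0.8 * Z).clip(1e-6, 1 - 1e-6)      # propensity score
# Control functions for the normal model (inverse Mills ratio terms):
c = norm.ppf(1 - P)
K1 = norm.pdf(c) / P             # proportional to E(U1 | D=1, P)
K0 = -norm.pdf(c) / (1 - P)      # proportional to E(U0 | D=0, P)

# Regress Y on a constant and the control function within each arm;
# the intercepts estimate mu_1 and mu_0.
b1, mu1_hat = np.linalg.lstsq(
    np.column_stack([K1[D], np.ones(D.sum())]), Y[D], rcond=None)[0]
b0, mu0_hat = np.linalg.lstsq(
    np.column_stack([K0[~D], np.ones((~D).sum())]), Y[~D], rcond=None)[0]
print(f"ATE_hat = {mu1_hat - mu0_hat:.3f} (true 1.0)")
```

The semiparametric versions mentioned above replace these Mills-ratio terms with flexible functions of $P$ (polynomials or local linear regression) and recover the intercepts from the limit sets where $P \to 0$ or $P \to 1$.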

From this model we can obviously identify

$\Delta^{\mathrm{ATE}} = \mu_1(X) - \mu_0(X)$

(as we have seen). Plus,

$\Delta^{\mathrm{TT}} = \mu_1(X) - \mu_0(X) + E(U_1 - U_0 \mid X, D = 1)$

$= \mu_1(X) - \mu_0(X) + K_1(P(Z)) + \left( \dfrac{1 - P(Z)}{P(Z)} \right) K_0(P(Z))$

$\Delta^{\mathrm{MTE}} = \mu_1(X) - \mu_0(X) + \dfrac{\partial \left[ E(U_1 - U_0 \mid X, D = 1, P(Z)) \, P(Z) \right]}{\partial P(Z)}$

$= \mu_1(X) - \mu_0(X) + \dfrac{\partial \left[ P(Z) \left\{ K_1(P(Z)) + \frac{1 - P(Z)}{P(Z)} K_0(P(Z)) \right\} \right]}{\partial P(Z)}$

The marginal and the average effects are allowed to be different.

Both matching and control functions are defined only over the common support

Support$(P \mid D = 1) \cap$ Support$(P \mid D = 0)$.

The method of control functions does not require

$(Y_0, Y_1) \perp D \mid (X, Z)$,

but matching does.

Matching is a special case of control functions in the additively separable case. Additive separability and the control function assumptions are central to this claim. Under the matching assumptions,

$E(Y_1 \mid X, D = 1) = E(Y_1 \mid X) = E(Y_1 \mid P(X))$

$E(Y_0 \mid X, D = 0) = E(Y_0 \mid X) = E(Y_0 \mid P(X))$

If

$\mu_1(X) = E(Y_1 \mid X)$ and $\mu_0(X) = E(Y_0 \mid X)$,

then $E(U_1 \mid P(X)) = 0$ and $E(U_0 \mid P(X)) = 0$: the control functions vanish.

In the method of control functions, if $(X, Z) \perp (U_0, U_1, U_V)$:

$E(Y \mid X, Z)$

$= E(Y_1 \mid X, Z, D = 1) P(Z) + E(Y_0 \mid X, Z, D = 0)(1 - P(Z))$

$= \mu_0(X) + (\mu_1(X) - \mu_0(X)) P(Z) + E(U_1 \mid X, Z, D = 1) P(Z) + E(U_0 \mid X, Z, D = 0)(1 - P(Z))$

$= \mu_0(X) + (\mu_1(X) - \mu_0(X)) P(Z) + E(U_1 \mid P(Z), D = 1) P(Z) + E(U_0 \mid P(Z), D = 0)(1 - P(Z))$

$= \mu_0(X) + P(Z) \left[ \mu_1(X) - \mu_0(X) + K_1(P(Z)) - K_0(P(Z)) \right] + K_0(P(Z))$

To identify

$\mu_1(X) - \mu_0(X)$,

we must isolate it from

$K_1(P(Z))$ and $K_0(P(Z))$.

Under the matching assumptions,

$E(Y \mid P(X), D) = \mu_0(P(X)) + D(\mu_1(P(X)) - \mu_0(P(X))) + D \left[ E(U_1 \mid P(X)) - E(U_0 \mid P(X)) \right] + E(U_0 \mid P(X))$

$E(Y \mid P(X), D) = \mu_0(P(X)) + D(\mu_1(P(X)) - \mu_0(P(X)))$

since $E(U_1 \mid P(X)) = E(U_0 \mid P(X)) = 0$.

We can get identification without exclusion, conditional on $X$:

$(Y_1, Y_0) \perp D \mid X$.

Tables 1 and 2 present a sensitivity analysis for the case

$(U_1, U_0, U_V)' \sim N(0, \Sigma)$, $\quad \mathrm{Var}(U_j) = \sigma_j^2$, $\; j \in \{0, 1\}$, $\quad \mathrm{Var}(U_V) = 1$,

so that (using the appendix notation, with $\alpha_j = \mathrm{Cov}(U_j, U_V)$)

Bias$^{\mathrm{TT}}(P(Z) = p) = \alpha_0 M(p)$

Bias$^{\mathrm{ATE}}(P(Z) = p) = M(p) \left[ \alpha_1 (1 - p) + \alpha_0 p \right]$

where $M(p) = \dfrac{\phi(\Phi^{-1}(1 - p))}{p(1 - p)}$.

Even if the correlation between the observables and the unobservables is small, so that one might think that selection on unobservables is relatively unimportant, we still get substantial biases if we do not control for relevant omitted conditioning variables.

These examples also demonstrate that sensitivity analyses can be conducted for control function models even when they are not fully identified.

[Table 1: sensitivity analysis of the matching bias; the numerical entries are not recoverable from this transcript.]

[Table 2: sensitivity analysis of the matching bias, continued; the numerical entries are not recoverable from this transcript.]

5.3 Instrumental Variables

Matching and the method of control functions work with $E(Y \mid X, Z)$ and $\Pr(D = 1 \mid X, Z)$.

$Y = D Y_1 + (1 - D) Y_0$

$= \mu_0(X) + D(\mu_1(X) - \mu_0(X) + U_1 - U_0) + U_0$

$= \mu_0(X) + D \Delta + U_0$

If $U_1 = U_0$ (a common-effect model), linear IV requires:

$E(U_0 \mid X, P(Z)) = E(U_0 \mid X)$   (IV-1)

$\Pr(D = 1 \mid X, Z)$ is a nontrivial function of $Z$ for each $X$   (IV-2)
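A sketch of the Wald/IV estimand in this common-effect case with a binary instrument; the data-generating process is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500_000
Z = rng.integers(0, 2, size=n)             # binary instrument
UV = rng.normal(size=n)
U0 = 0.5 * UV + rng.normal(size=n)         # selection on U_0, but E(U0 | Z) = E(U0)
D = (Z - 0.5 + UV > 0).astype(int)
delta = 1.0                                # common effect: U_1 = U_0
Y = delta * D + U0                         # Y = mu_0 + D*delta + U_0, mu_0 = 0

wald = (Y[Z == 1].mean() - Y[Z == 0].mean()) / \
       (D[Z == 1].mean() - D[Z == 0].mean())
naive = Y[D == 1].mean() - Y[D == 0].mean()   # biased by selection on U_0
print(f"Wald IV = {wald:.3f} (true {delta}),  naive gap = {naive:.3f}")
```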

When $U_1 \neq U_0$ but $(U_1 - U_0) \perp D \mid X$,

or alternatively $E(U_1 - U_0 \mid X, D = 1) = E(U_1 - U_0 \mid X)$,

we have

$\Delta^{\mathrm{ATE}} = E(Y_1 - Y_0 \mid X) = E(\Delta \mid X)$

$\Delta^{\mathrm{TT}} = E(Y_1 - Y_0 \mid X, D = 1) = E(Y_1 - Y_0 \mid X)$

so $\Delta^{\mathrm{ATE}} = \Delta^{\mathrm{TT}} = \Delta^{\mathrm{MTE}}$.

Analytically more interesting case: $U_1 \neq U_0$ and $D \not\perp (U_1 - U_0)$.

For $\Delta^{\mathrm{ATE}}$:

$E(U_0 + D(U_1 - U_0) \mid X, P(Z)) = E(U_0 + D(U_1 - U_0) \mid X)$   (IV-3)

For $\Delta^{\mathrm{TT}}$:

$E(U_0 + D(U_1 - U_0) - E(U_0 + D(U_1 - U_0) \mid X) \mid X, P(Z))$

$= E(U_0 + D(U_1 - U_0) - E(U_0 + D(U_1 - U_0) \mid X) \mid X)$

For $\Delta^{\mathrm{TT}}$ we can rewrite:

$E(U_0 \mid X, P(Z)) + E(U_1 - U_0 \mid X, D = 1, P(Z)) \, P(Z)$

$= E(U_0 \mid X) + E(U_1 - U_0 \mid X, D = 1) \, P(Z)$

These conditions are satisfied if $U_1 = U_0$, or if $(U_1 - U_0) \perp D \mid X, P(Z)$.

1. The method of control functions models the dependence between $(U_1, U_0)$ and $D$.

2. Matching assumes $(U_1, U_0) \perp D \mid (X, Z)$.

3. Linear IV requires that $U_0 + D(U_1 - U_0)$ be mean independent of $Z$ (or $P(Z)$) given $X$.

Local instrumental variables (LIV) require that

$\mu_V(Z)$ be a nondegenerate random variable given $X$ (existence of an exclusion restriction)   (LIV-1)

$(U_0, U_1, U_V) \perp Z \mid X$   (LIV-2)

$0 < \Pr(D = 1 \mid X) < 1$   (LIV-3)

Support$(P(Z) \mid X) = [0, 1]$   (LIV-4)

Under these conditions,

$\dfrac{\partial E(Y \mid X, P(Z))}{\partial P(Z)} = E(Y_1 - Y_0 \mid X, V = 0) = \Delta^{\mathrm{MTE}}$.

Proof:

$E(Y \mid X, P(Z)) = E(Y_1 \mid D = 1, X, P(Z)) \, P(Z) + E(Y_0 \mid D = 0, X, P(Z)) (1 - P(Z))$

$= \displaystyle\int \int_{-v}^{\infty} y_1 \, f(y_1, u_V \mid X) \, du_V \, dy_1 + \int \int_{-\infty}^{-v} y_0 \, f(y_0, u_V \mid X) \, du_V \, dy_0$

where $v = \mu_V(Z)$. Thus

$\dfrac{\partial E(Y \mid X, P(Z))}{\partial P(Z)} = E(Y_1 - Y_0 \mid X, U_V = -\mu_V(Z)) = \Delta^{\mathrm{MTE}}$.
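A numeric sketch checking the LIV identity $\partial E(Y \mid X, P(Z)) / \partial P(Z) = \Delta^{\mathrm{MTE}}$ in the additively separable normal model; the correlations and means are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

rho1, rho0 = 0.6, 0.3        # Corr(U_s, U_V); all variances normalized to 1
mu1, mu0 = 1.0, 0.0

def EY_given_P(p):
    # E(Y|P=p) = mu0 + p(mu1-mu0) + p*K1(p) + (1-p)*K0(p), normal model
    c = norm.ppf(1 - p)
    K1 = rho1 * norm.pdf(c) / p          # E(U1 | D=1, P=p)
    K0 = -rho0 * norm.pdf(c) / (1 - p)   # E(U0 | D=0, P=p)
    return mu0 + p * (mu1 - mu0) + p * K1 + (1 - p) * K0

def MTE(p):
    # marginal agent at P=p has U_V = Phi^{-1}(1-p)
    return mu1 - mu0 + (rho1 - rho0) * norm.ppf(1 - p)

p, h = 0.4, 1e-5
numeric = (EY_given_P(p + h) - EY_given_P(p - h)) / (2 * h)  # dE(Y|P)/dP
print(f"dE(Y|P)/dP = {numeric:.4f}  vs  MTE(p) = {MTE(p):.4f}")
```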

[Table: comparison of the identifying assumptions that underlie matching, control functions, IV, and LIV (exclusion requirements, separability of the outcome equations, and the key conditions for identifying means); the entries are not recoverable from this transcript.]

6 The Bias from Matching: Information Requirements

Fundamental problem: the information of the analyst is often less than that of the agent.

Definition 1. We say that $\sigma(I_R)$ is a relevant information set if its associated random variable, $I_R$, satisfies (M-1), so

$(Y_1, Y_0) \perp D \mid I_R$.

Definition 2. We say that $\sigma(I_R^*)$ is a minimal relevant information set if it is the intersection of all sets $\sigma(I_R)$ such that $(Y_1, Y_0) \perp D \mid I_R$. The associated random variable $I_R^*$ is the minimum amount of information that guarantees that (M-1) is satisfied. The intersection may be empty, and there may not be a unique minimal information set.

Definition 3. The agent's information set, $\sigma(I_A)$, is defined by the information $I_A$ used by the agent when choosing among treatments. Accordingly, we call $I_A$ the agent's information.

Definition 4. The econometrician's full information set, $\sigma(I_{E^*})$, is defined by all of the information available to the econometrician, $I_{E^*}$.

Definition 5. The econometrician's information set, $\sigma(I_E)$, is defined by the information $I_E$ used by the econometrician when analyzing the agent's choice of treatment.

Obvious inclusions: $\sigma(I_R^*) \subseteq \sigma(I_R)$, $\sigma(I_R^*) \subseteq \sigma(I_A)$, and $\sigma(I_E) \subseteq \sigma(I_{E^*})$.

This assumes that a minimal relevant set exists.

Matching implies $\sigma(I_R^*) \subseteq \sigma(I_E)$.

Generalized Roy examples (assume a factor structure for the error terms):

$V = Z\gamma + U_V$

$U_V = \alpha_{V1} f_1 + \alpha_{V2} f_2 + \varepsilon_V$

$D = 1$ if $V > 0$; $D = 0$ otherwise

$Y_1 = \mu_1 + U_1 = \mu_1 + \alpha_{11} f_1 + \alpha_{12} f_2 + \varepsilon_1$

$Y_0 = \mu_0 + U_0 = \mu_0 + \alpha_{01} f_1 + \alpha_{02} f_2 + \varepsilon_0$

$(f_1, f_2, \varepsilon_V, \varepsilon_1, \varepsilon_0)$ are mean-zero random variables, mutually independent of each other and of $Z$.

The minimal relevant information set when the factor loadings are not zero:

$I_R^* = \{f_1, f_2\}$

Agent information sets may include different variables. If the shocks to the outcomes are not known, but the other terms are:

$I_A = \{Z, f_1, f_2, \varepsilon_V\}$

Under perfect certainty, $I_A = \{Z, f_1, f_2, \varepsilon_V, \varepsilon_1, \varepsilon_0\}$. (This is the original Roy model.)

Construct examples using:

$(f_1, f_2, \varepsilon_V, \varepsilon_1, \varepsilon_0) \sim N(0, \Sigma)$, $\quad \Sigma = \mathrm{diag}\left( \sigma_{f_1}^2, \sigma_{f_2}^2, \sigma_{\varepsilon_V}^2, \sigma_{\varepsilon_1}^2, \sigma_{\varepsilon_0}^2 \right)$.

6.1 The economist uses the minimal relevant information: $\sigma(I_R^*) \subseteq \sigma(I_E)$

Suppose $I_E = \{Z, f_1, f_2\}$. Then

$E(Y_1 \mid D = 1, I_E) - E(Y_0 \mid D = 0, I_E) = \mu_1 - \mu_0 + (\alpha_{11} - \alpha_{01}) f_1 + (\alpha_{12} - \alpha_{02}) f_2$

Knowledge of $(Z, f_1, f_2)$ and knowledge of $P(Z, f_1, f_2)$ are equivalent:

$P(Z, f_1, f_2) = \Pr\left( \dfrac{\varepsilon_V}{\sigma_{\varepsilon_V}} > \dfrac{-(Z\gamma + \alpha_{V1} f_1 + \alpha_{V2} f_2)}{\sigma_{\varepsilon_V}} \right) = 1 - \Phi\left( \dfrac{-(Z\gamma + \alpha_{V1} f_1 + \alpha_{V2} f_2)}{\sigma_{\varepsilon_V}} \right) = p$

$E(Y_1 \mid D = 1, P(I_E) = p) - E(Y_0 \mid D = 0, P(I_E) = p)$

$= \mu_1 - \mu_0 + E(U_1 \mid D = 1, P(I_E) = p) - E(U_0 \mid D = 0, P(I_E) = p)$

$= \mu_1 - \mu_0 + E\left( \varepsilon_1 \,\middle|\, \dfrac{\varepsilon_V}{\sigma_{\varepsilon_V}} > \Phi^{-1}(1 - p) \right) - E\left( \varepsilon_0 \,\middle|\, \dfrac{\varepsilon_V}{\sigma_{\varepsilon_V}} \leq \Phi^{-1}(1 - p) \right)$

$= \mu_1 - \mu_0$

This is equal to all of the treatment parameters

(MTE = ATE = LATE = TT):

$E\left( \varepsilon_1 \,\middle|\, \dfrac{\varepsilon_V}{\sigma_{\varepsilon_V}} > \Phi^{-1}(1 - p) \right) = \dfrac{\mathrm{Cov}(\varepsilon_1, \varepsilon_V)}{\sigma_{\varepsilon_V}} M_1(p)$

$E\left( \varepsilon_0 \,\middle|\, \dfrac{\varepsilon_V}{\sigma_{\varepsilon_V}} \leq \Phi^{-1}(1 - p) \right) = \dfrac{\mathrm{Cov}(\varepsilon_0, \varepsilon_V)}{\sigma_{\varepsilon_V}} M_0(p)$

where

$M_1(p) = \dfrac{\phi(\Phi^{-1}(1 - p))}{p}$ and $M_0(p) = -\dfrac{\phi(\Phi^{-1}(1 - p))}{1 - p}$,

and both covariances are zero, because

$\mathrm{Cov}(\varepsilon_s, U_V) = \mathrm{Cov}(\varepsilon_s, \alpha_{V1} f_1 + \alpha_{V2} f_2 + \varepsilon_V) = 0$, $\quad s = 0, 1$.

6.2 The economist does not use all of the minimal relevant information

$I_E = \{Z\}$

Conditional on $P(Z) = p$, the choice is determined by

$\dfrac{\alpha_{V1} f_1 + \alpha_{V2} f_2 + \varepsilon_V}{\sqrt{\alpha_{V1}^2 \sigma_{f_1}^2 + \alpha_{V2}^2 \sigma_{f_2}^2 + \sigma_{\varepsilon_V}^2}} \gtrless \Phi^{-1}(1 - p)$

Bias$^{\mathrm{TT}}(P(Z) = p) = \beta_0 M(p)$, with $M(p) = M_1(p) - M_0(p)$

Bias$^{\mathrm{ATE}}(P(Z) = p) = M(p) \left[ \beta_1 (1 - p) + \beta_0 p \right]$

Bias$^{\mathrm{MTE}}(P(Z) = p) = M(p) \left[ \beta_1 (1 - p) + \beta_0 p \right] - \Phi^{-1}(1 - p) \left[ \beta_1 - \beta_0 \right]$

where

$M(p) = \dfrac{\phi(\Phi^{-1}(1 - p))}{p (1 - p)}$

$\beta_1 = \dfrac{\alpha_{V1} \alpha_{11} \sigma_{f_1}^2 + \alpha_{V2} \alpha_{12} \sigma_{f_2}^2}{\sqrt{\alpha_{V1}^2 \sigma_{f_1}^2 + \alpha_{V2}^2 \sigma_{f_2}^2 + \sigma_{\varepsilon_V}^2}}$

$\beta_0 = \dfrac{\alpha_{V1} \alpha_{01} \sigma_{f_1}^2 + \alpha_{V2} \alpha_{02} \sigma_{f_2}^2}{\sqrt{\alpha_{V1}^2 \sigma_{f_1}^2 + \alpha_{V2}^2 \sigma_{f_2}^2 + \sigma_{\varepsilon_V}^2}}$
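A sketch evaluating these bias formulas for $I_E = \{Z\}$ over a grid of $p$; the loadings and variances are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

# assumed loadings and variances (all variance terms set to 1 here)
aV1, aV2, a11, a12, a01, a02 = 1.0, 1.0, 2.0, 1.0, 1.0, 1.0
s2_f1 = s2_f2 = s2_eV = 1.0

denom = np.sqrt(aV1**2 * s2_f1 + aV2**2 * s2_f2 + s2_eV)
b1 = (aV1 * a11 * s2_f1 + aV2 * a12 * s2_f2) / denom
b0 = (aV1 * a01 * s2_f1 + aV2 * a02 * s2_f2) / denom

p = np.linspace(0.01, 0.99, 99)
c = norm.ppf(1 - p)
M = norm.pdf(c) / (p * (1 - p))

bias_tt = b0 * M
bias_ate = M * (b1 * (1 - p) + b0 * p)
bias_mte = bias_ate - c * (b1 - b0)
print(f"average |bias|: TT={abs(bias_tt).mean():.3f}, "
      f"ATE={abs(bias_ate).mean():.3f}, MTE={abs(bias_mte).mean():.3f}")
```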

6.3 Adding information to the econometrician's information set: using some, but not all, information from the minimal relevant information set

$I_E' = \{Z, f_2\}$

This may raise or lower the bias. Comparably to $\beta_1$ and $\beta_0$ above, we can define

$\beta_1' = \dfrac{\alpha_{V1} \alpha_{11} \sigma_{f_1}^2}{\sqrt{\alpha_{V1}^2 \sigma_{f_1}^2 + \sigma_{\varepsilon_V}^2}}$

$\beta_0' = \dfrac{\alpha_{V1} \alpha_{01} \sigma_{f_1}^2}{\sqrt{\alpha_{V1}^2 \sigma_{f_1}^2 + \sigma_{\varepsilon_V}^2}}$

Condition 1. The bias produced by using matching to estimate TT is smaller in absolute value for any given $p$ when the new information set $(Z, f_2)$ is used if

$|\beta_0| > |\beta_0'|$.

Condition 2. The bias produced by using matching to estimate ATE is smaller in absolute value for any given $p$ when the new information set $(Z, f_2)$ is used if

$|\beta_1 (1 - p) + \beta_0 p| > |\beta_1' (1 - p) + \beta_0' p|$.

Condition 3. The bias produced by using matching to estimate MTE is smaller in absolute value for any given $p$ when the new information set $(Z, f_2)$ is used if

$\left| M(p) \left[ \beta_1 (1 - p) + \beta_0 p \right] - \Phi^{-1}(1 - p) \left[ \beta_1 - \beta_0 \right] \right| > \left| M(p) \left[ \beta_1' (1 - p) + \beta_0' p \right] - \Phi^{-1}(1 - p) \left[ \beta_1' - \beta_0' \right] \right|$.

Proof of Condition 1.

Compare

$\beta_0 = \dfrac{\alpha_{V1} \alpha_{01} \sigma_{f_1}^2 + \alpha_{V2} \alpha_{02} \sigma_{f_2}^2}{\sqrt{\alpha_{V1}^2 \sigma_{f_1}^2 + \alpha_{V2}^2 \sigma_{f_2}^2 + \sigma_{\varepsilon_V}^2}}$ (when $f_2$ is not in the information set)

with

$\beta_0' = \dfrac{\alpha_{V1} \alpha_{01} \sigma_{f_1}^2}{\sqrt{\alpha_{V1}^2 \sigma_{f_1}^2 + \sigma_{\varepsilon_V}^2}}$ (when it is).

When $\alpha_{V2} \alpha_{02} = 0$, $|\beta_0| < |\beta_0'|$, and

$\dfrac{\partial \beta_0}{\partial (\alpha_{V2} \alpha_{02})} = \dfrac{\sigma_{f_2}^2}{\sqrt{\alpha_{V1}^2 \sigma_{f_1}^2 + \alpha_{V2}^2 \sigma_{f_2}^2 + \sigma_{\varepsilon_V}^2}} > 0,$

so there is some critical value $(\alpha_{V2} \alpha_{02})^*$ beyond which $|\beta_0| > |\beta_0'|$.

For the figures, assume

$\alpha_{01} = \alpha_{V1} = \alpha_{V2} = 1$, $\quad \alpha_{02} = \alpha_{12} = 1$, $\quad \alpha_{11} = 2$.

[Figure 1. Bias for treatment on the treated as a function of $p$. Special case: adding the relevant information $f_2$ increases the bias. Average bias matching on $P(Z)$: 1.1380; matching on $P(Z, f_2)$: 1.2671.]

[Figure 2. Bias for the average treatment effect as a function of $p$. Special case: adding the relevant information $f_2$ increases the bias. Average bias matching on $P(Z)$: 1.6553; matching on $P(Z, f_2)$: 1.9007.]

[Figure 3. Bias for the marginal treatment effect as a function of $p$. Special case: adding the relevant information $f_2$ increases the bias. Average bias matching on $P(Z)$: 1.6553; matching on $P(Z, f_2)$: 1.9007.]

Control Function Method Models the Bias

In the control function method, adding $f_2$ simply changes the control function. We go from

$K_1(P(Z) = p) = \beta_1 M_1(p)$

$K_0(P(Z) = p) = \beta_0 M_0(p)$

to

$K_1'(P(Z, f_2) = p) = \beta_1' M_1(p)$

$K_0'(P(Z, f_2) = p) = \beta_0' M_0(p)$

where $M_1(p) = \dfrac{\phi(\Phi^{-1}(1 - p))}{p}$ and $M_0(p) = -\dfrac{\phi(\Phi^{-1}(1 - p))}{1 - p}$.

This protects us against misspecification.

6.4 Adding information to the econometrician's information set: using proxies for the relevant information

Suppose we do not know $f_2$, but proxy it by $\tilde Z$:

$I_E^{\sim} = \{Z, \tilde Z\}$

Suppose further that

$\tilde Z \sim N(0, \sigma_{\tilde Z}^2)$, $\quad \mathrm{Corr}(\tilde Z, f_2) = \rho$, $\quad \tilde Z \perp (\varepsilon_0, \varepsilon_1, \varepsilon_V, f_1)$.

The expressions corresponding to $\beta_0$ and $\beta_1$ are:

$\tilde\beta_1 = \dfrac{\alpha_{V1} \alpha_{11} \sigma_{f_1}^2 + \alpha_{V2} \alpha_{12} (1 - \rho^2) \sigma_{f_2}^2}{\sqrt{\alpha_{V1}^2 \sigma_{f_1}^2 + \alpha_{V2}^2 \sigma_{f_2}^2 (1 - \rho^2) + \sigma_{\varepsilon_V}^2}}$

$\tilde\beta_0 = \dfrac{\alpha_{V1} \alpha_{01} \sigma_{f_1}^2 + \alpha_{V2} \alpha_{02} (1 - \rho^2) \sigma_{f_2}^2}{\sqrt{\alpha_{V1}^2 \sigma_{f_1}^2 + \alpha_{V2}^2 \sigma_{f_2}^2 (1 - \rho^2) + \sigma_{\varepsilon_V}^2}}$
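A sketch tracing how these proxy-based coefficients vary with the proxy quality $\rho$: note that $\rho = 0$ reproduces $\beta_s$ (the $I_E = \{Z\}$ case) and $\rho = 1$ reproduces $\beta_s'$ (the $I_E' = \{Z, f_2\}$ case). The loadings and variances are illustrative assumptions:

```python
import numpy as np

aV1, aV2, a11, a12, a01, a02 = 1.0, 1.0, 2.0, 1.0, 1.0, 1.0
s2_f1 = s2_f2 = s2_eV = 1.0

def beta_tilde(a_s1, a_s2, rho):
    num = aV1 * a_s1 * s2_f1 + aV2 * a_s2 * (1 - rho**2) * s2_f2
    den = np.sqrt(aV1**2 * s2_f1 + aV2**2 * s2_f2 * (1 - rho**2) + s2_eV)
    return num / den

for rho in (0.0, 0.5, 0.9, 1.0):
    # rho=0: proxy is pure noise; rho=1: proxy reveals f2 exactly
    print(f"rho={rho:.1f}: beta1~={beta_tilde(a11, a12, rho):.3f}, "
          f"beta0~={beta_tilde(a01, a02, rho):.3f}")
```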

• By substituting $\tilde\beta_0$ for $\beta_0'$ and $\tilde\beta_1$ for $\beta_1'$ ($s = 0, 1$) into Conditions (1), (2), and (3), we obtain equivalent results for this case. Whether $\tilde Z$ will be bias-reducing depends on how well it spans $f_2$ and on the signs of the terms in the absolute values.

• In this case, there is another parameter: $\rho$.

• The bias generated when the econometrician's information is $I_E^{\sim}$ can also be smaller than when it is $I_E'$. It can be the case that knowing the proxy $\tilde Z$ is better than knowing the actual variable $f_2$. Take treatment on the treated as the parameter. The bias is reduced when $\tilde Z$ is used instead of $f_2$ if

$\left| \dfrac{\alpha_{V1} \alpha_{01} \sigma_{f_1}^2 + \alpha_{V2} \alpha_{02} (1 - \rho^2) \sigma_{f_2}^2}{\sqrt{\alpha_{V1}^2 \sigma_{f_1}^2 + \alpha_{V2}^2 \sigma_{f_2}^2 (1 - \rho^2) + \sigma_{\varepsilon_V}^2}} \right| < \left| \dfrac{\alpha_{V1} \alpha_{01} \sigma_{f_1}^2}{\sqrt{\alpha_{V1}^2 \sigma_{f_1}^2 + \sigma_{\varepsilon_V}^2}} \right|$

[Figure 4. Bias for treatment on the treated as a function of $p$. Special case: adding irrelevant information $\tilde Z$, with correlation$(\tilde Z, f_2) = 0.5$, increases the bias. Average bias matching on $P(Z)$: 1.1380; matching on $P(Z, \tilde Z)$: 1.1616.]

[Figure 5. Bias for the average treatment effect as a function of $p$. Special case: adding irrelevant information $\tilde Z$, with correlation$(\tilde Z, f_2) = 0.5$, increases the bias. Average bias matching on $P(Z)$: 1.6553; matching on $P(Z, \tilde Z)$: 1.7019.]

[Figure 6. Bias for the marginal treatment effect as a function of $p$. Special case: adding irrelevant information $\tilde Z$, with correlation$(\tilde Z, f_2) = 0.5$, increases the bias. Average bias matching on $P(Z)$: 1.6553; matching on $P(Z, \tilde Z)$: 1.7019.]

[Figure 7. Bias for treatment on the treated as a function of $p$. Using the proxy $\tilde Z$ for $f_2$, with correlation$(\tilde Z, f_2) = 0.5$, increases the bias. Average bias matching on $P(Z, f_2)$: 1.2671; matching on $P(Z, \tilde Z)$: 1.8910.]

[Figure 8. Bias for the average treatment effect as a function of $p$. Using the proxy $\tilde Z$ for $f_2$, with correlation$(\tilde Z, f_2) = 0.5$, increases the bias. Average bias matching on $P(Z, f_2)$: 1.9007; matching on $P(Z, \tilde Z)$: 2.0666.]

[Figure 9. Bias for the marginal treatment effect as a function of $p$. Using the proxy $\tilde Z$ for $f_2$, with correlation$(\tilde Z, f_2) = 0.5$, increases the bias. Average bias matching on $P(Z, f_2)$: 1.9007; matching on $P(Z, \tilde Z)$: 2.0666.]

6.5 The case of a discrete treatment

$Y_s^* = \mu_s + U_s$

$U_s = \alpha_{s1} f_1 + \alpha_{s2} f_2 + \varepsilon_s$, $\quad s = 0, 1$

$Y_s = 1$ if $Y_s^* > 0$; $Y_s = 0$ otherwise.

People receive treatment according to the rule

$V = \mu_V + U_V$

$U_V = \alpha_{V1} f_1 + \alpha_{V2} f_2 + \varepsilon_V$

$D = 1$ if $V > 0$; $D = 0$ otherwise,

with $f_1 \perp f_2$, $\varepsilon_0, \varepsilon_1, \varepsilon_V$ mutually independent, and $(f_1, f_2) \perp (\varepsilon_0, \varepsilon_1, \varepsilon_V)$.

The effect of treatment is given by:

$\Delta_1(I_E) = \Pr(Y_1 = 1 \mid D = 1, I_E) - \Pr(Y_0 = 1 \mid D = 1, I_E)$

A second definition works with odds ratios:

$\Delta_2(I_E) = \dfrac{\Pr(Y_1 = 1 \mid D = 1, I_E) \,/\, \Pr(Y_1 = 0 \mid D = 1, I_E)}{\Pr(Y_0 = 1 \mid D = 1, I_E) \,/\, \Pr(Y_0 = 0 \mid D = 1, I_E)}$

One could also work with log odds ratios. Under the null hypothesis of no effect of treatment, $\Delta_1 = 0$ and $\Delta_2 = 1$.

The matching estimator is

$\hat\Delta_1(I_E) = \Pr(Y_1 = 1 \mid D = 1, I_E) - \Pr(Y_0 = 1 \mid D = 0, I_E)$

The second term replaces the desired probability

$\Pr(Y_0 = 1 \mid D = 1, I_E)$

by

$\Pr(Y_0 = 1 \mid D = 0, I_E)$.

Under the null hypothesis of no "real" effect of treatment,

$\mu_1 = \mu_0 = \mu$

$U_1 = U_0 = U$,

which can be generated by

$\alpha_{11} = \alpha_{01} = \alpha_1$

$\alpha_{12} = \alpha_{02} = \alpha_2$

$\varepsilon_1 = \varepsilon_0 = \varepsilon$.

Assume initially that

$I_E = \{f_1, f_2\}$. Then

$\hat\Delta_1(I_E) = \Pr(Y_1 = 1 \mid D = 1, f_1, f_2) - \Pr(Y_0 = 1 \mid D = 0, f_1, f_2)$

$= \Pr(Y_1 = 1 \mid f_1, f_2) - \Pr(Y_0 = 1 \mid f_1, f_2) = \Delta_1(I_E)$

In general,

$\Delta_1(I_E) = \Pr(Y_1 = 1 \mid D = 1, I_E) - \Pr(Y_0 = 1 \mid D = 1, I_E)$

$\neq \Pr(Y_1 = 1 \mid D = 1, I_E) - \Pr(Y_0 = 1 \mid D = 0, I_E) = \hat\Delta_1(I_E)$.
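A Monte Carlo sketch of this point: under no treatment effect, $\hat\Delta_1$ is approximately zero when we condition on the full set $\{f_1, f_2\}$ but not when we condition on $\{f_2\}$ alone. All parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000
f1, f2 = rng.normal(size=n), rng.normal(size=n)
epsV, eps = rng.normal(size=n), rng.normal(size=n)

D = f1 + f2 + epsV > 0                  # U_V = f1 + f2 + eps_V, mu_V = 0
Y = (f1 + f2 + eps > 0).astype(float)   # no effect: Y1 = Y0 = 1(U > 0)

def delta1_hat(cell):
    # Pr(Y = 1 | D = 1, cell) - Pr(Y = 1 | D = 0, cell)
    return Y[D & cell].mean() - Y[~D & cell].mean()

cell_full = (np.abs(f1 - 0.5) < 0.05) & (np.abs(f2 - 0.5) < 0.05)
cell_f2only = np.abs(f2 - 0.5) < 0.05
print(f"I_E = {{f1, f2}}: {delta1_hat(cell_full):+.3f} (should be ~0)")
print(f"I_E = {{f2}}   : {delta1_hat(cell_f2only):+.3f} (biased away from 0)")
```

Conditioning only on $f_2$ leaves $f_1$ in both the choice and the outcome unobservables, so $D$ and $Y_0$ remain dependent within cells and the matching estimator signals a spurious treatment effect.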

[Figure 10. Estimated effect of treatment $\hat\Delta_1$ under different information sets, $I_E = \{f_1, f_2\}$ versus $I_E = \{f_2\}$, plotted against $f_2$. No effect of treatment; $\alpha_{V2} = 1$.]

[Figure 11. Estimated effect of treatment $\hat\Delta_1$ under different information sets, $I_E = \{f_1, f_2\}$ versus $I_E = \{f_2\}$, plotted against $f_2$. No effect of treatment; $\alpha_{V2} = -1$.]

[Figure 12. Estimated effect of treatment $\hat\Delta_2$ under different information sets, $I_E = \{f_1, f_2\}$ versus $I_E = \{f_2\}$, plotted against $f_2$. No effect of treatment; $\alpha_{V2} = 1$.]

[Figure 13. Estimated effect of treatment $\hat\Delta_2$ under different information sets, $I_E = \{f_1, f_2\}$ versus $I_E = \{f_2\}$, plotted against $f_2$. No effect of treatment; $\alpha_{V2} = -1$.]

[Figure 14. Estimated effects of treatment $\hat\Delta_3$ under different information sets, $I_E = \{f_1, f_2\}$ versus $I_E = \{f_2\}$, plotted against $f_2$. No effect of treatment; $\alpha_{V2} = 1$.]

[Figure 15. Estimated effects of treatment $\hat\Delta_3$ under different information sets, $I_E = \{f_1, f_2\}$ versus $I_E = \{f_2\}$, plotted against $f_2$. No effect of treatment; $\alpha_{V2} = -1$.]

[Figure 16. Estimated effect of treatment $\hat\Delta_4$ under different information sets, $I_E = \{f_1, f_2\}$ versus $I_E = \{f_2\}$, plotted against $f_2$. No effect of treatment; $\alpha_{V2} = 1$.]

[Figure 17. Estimated effects of treatment $\hat\Delta_4$ under different information sets, $I_E = \{f_1, f_2\}$ versus $I_E = \{f_2\}$, plotted against $f_2$. No effect of treatment; $\alpha_{V2} = -1$.]

6.6 On the use of model selection criteria to choose matching variables

• Adding more variables to the information set may increase the bias.

• How should one choose the relevant variables?

• Standard model selection criteria fail.

• An implicit assumption underlying such procedures is that the added conditioning variables are exogenous in the following sense:

$(Y_0, Y_1) \perp D \mid Q$   (M-4)

($Q$ is the list of initial variables used as conditioning variables.)

• Procedures are sometimes suggested along the lines of: "add variables when their $t$ ratios are big in the propensity score."

• Improve fit.

• Such procedures can raise the bias.

Consider the following example:

$I_E^{\approx} = \{Z, \breve Z\}$, where

$\breve Z = U_V + \omega$, $\quad \omega \sim N(0, \sigma_\omega^2)$, $\quad \omega \perp (f_1, f_2, \varepsilon_0, \varepsilon_1, \varepsilon_V)$.

$\breve Z$ might be an elicitation from a questionnaire.

The same expressions for the biases apply using $\breve\beta_s$ ($s = 0, 1$) instead of $\beta_s$, where:

$\breve\beta_1 = \dfrac{\left( \alpha_{V1} \alpha_{11} \sigma_{f_1}^2 + \alpha_{V2} \alpha_{12} \sigma_{f_2}^2 \right) \tau}{\sqrt{\alpha_{V1}^2 \sigma_{f_1}^2 + \alpha_{V2}^2 \sigma_{f_2}^2 + \sigma_{\varepsilon_V}^2}}$

$\breve\beta_0 = \dfrac{\left( \alpha_{V1} \alpha_{01} \sigma_{f_1}^2 + \alpha_{V2} \alpha_{02} \sigma_{f_2}^2 \right) \tau}{\sqrt{\alpha_{V1}^2 \sigma_{f_1}^2 + \alpha_{V2}^2 \sigma_{f_2}^2 + \sigma_{\varepsilon_V}^2}}$

$\tau = \dfrac{\sigma_\omega}{\sqrt{\alpha_{V1}^2 \sigma_{f_1}^2 + \alpha_{V2}^2 \sigma_{f_2}^2 + \sigma_{\varepsilon_V}^2 + \sigma_\omega^2}}$

In general these expressions are not zero; the bias can be increased or decreased.

When $\sigma_\omega^2 = 0$ we can perfectly predict $D$ (and the choice model will pass a goodness-of-fit criterion). As $\sigma_\omega^2 \to 0$,

$\lim_{\sigma_\omega^2 \to 0} \Pr(D = 1 \mid Z, \breve Z) = 1$ for $Z\gamma + \breve Z > 0$

$\lim_{\sigma_\omega^2 \to 0} \Pr(D = 1 \mid Z, \breve Z) = 0$ for $Z\gamma + \breve Z < 0$

Assumption (M-2) is violated and matching breaks down. Making $\sigma_\omega^2$ arbitrarily small, we can predict $D$ arbitrarily well, and we can improve over the fit obtained with $(f_1, f_2)$ in the conditioning set, which produces no bias.
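A sketch of this perfect-prediction problem: as the noise in the elicited proxy $\breve Z = U_V + \omega$ shrinks, the propensity score given $(Z, \breve Z)$ piles up near 0 and 1 and the overlap condition (M-2) fails. The normal parameterization is an illustrative assumption:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
n = 100_000
Z = rng.normal(size=n)
UV = rng.normal(size=n)      # stands in for alpha_V1*f1 + alpha_V2*f2 + eps_V
D = Z + UV > 0               # choice rule V = Z + U_V

for sd_omega in (1.0, 0.1, 0.01):
    Zb = UV + rng.normal(size=n, scale=sd_omega)   # proxy Z_breve = U_V + omega
    # Pr(D=1 | Z, Z_breve): U_V given Z_breve is normal with
    m = Zb / (1 + sd_omega**2)                     # mean E(U_V | Z_breve), Var(U_V)=1
    s = sd_omega / np.sqrt(1 + sd_omega**2)        # sd(U_V | Z_breve)
    P = norm.cdf((Z + m) / s)
    share_extreme = ((P < 0.01) | (P > 0.99)).mean()
    print(f"sd(omega)={sd_omega}: share of P outside [0.01, 0.99] = "
          f"{share_extreme:.1%}")
```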

[Table: comparison of alternative conditioning sets, contrasting the predictive fit of the choice model with the resulting matching bias; the entries are not recoverable from this transcript.]

A More General Example

Consider the use of a proxy regressor

$N = \delta_0 + \delta_1 f_1 + \delta_2 f_2 + \theta + \varepsilon_N$

where

• $(f_1, f_2, \theta, \varepsilon_N)$ has mean zero;

• $f_1 \perp f_2$, and $(f_1, f_2) \perp (\theta, \varepsilon_N)$;

• $\theta$ is possibly dependent on $U_V$ in the latent variable generating the treatment choice;

• $\varepsilon_N$ is measurement error.

• For different levels of dependence between $\theta$ and $U_V$, different weights $\delta_1, \delta_2$, and different scales of the measurement error, $N$ can be a better predictor of $D$ than $f_1$, $f_2$, or even $(f_1, f_2)$ together.

• However, in general, $(Y_1, Y_0) \not\perp D \mid N$, because $N$ is an imperfect proxy for the combinations of $f_1$ and $f_2$ entering $Y_1$ and $Y_0$.

• Thus conditioning on $N$ can produce a better fit for $D$ but greater bias for the treatment parameters.

Consider the following example, where $Y$ is an outcome and $D^*$ is an index:

$Y = \theta + \varepsilon_Y$

$D^* = \theta + \varepsilon_{D^*}$

where

$\varepsilon_Y \perp \varepsilon_{D^*}$ (and both are independent of $\theta$).

Obviously $Y \perp D^* \mid \theta$.

• Suppose instead that we have a candidate conditioning variable $N = \theta + \varepsilon_Y + \varepsilon_{D^*}$.

• Suppose that all variables are normal with zero mean and are mutually independent.

• Then we may write

$\theta = \lambda N + \eta$

where

$\lambda = \dfrac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma_{\varepsilon_Y}^2 + \sigma_{\varepsilon_{D^*}}^2}$.

It is assumed that $\eta$ is independent of all other error components on the right-hand sides of the equations for $Y$ and $D^*$.

• From normal regression theory we know that conditioning is equivalent to residualizing.

• Constructing the residuals, we obtain

$Y - \lambda N = (1 - \lambda)\theta + (\varepsilon_Y - \lambda(\varepsilon_Y + \varepsilon_{D^*}))$

• By a parallel argument,

$D^* - \lambda N = (1 - \lambda)\theta + (\varepsilon_{D^*} - \lambda(\varepsilon_Y + \varepsilon_{D^*}))$

• $Y \perp D^* \mid N$ requires that the residualized variables be uncorrelated, which in general does not happen.

• Letting the dependence between $D^*$ and $\theta$ get large, and setting the variances to suitable values, we can predict $D^*$ better (in the sense of $R^2$) with $N$ than with $\theta$.

• Letting $D = \mathbf{1}(D^* > 0)$ produces a simple version of the example, because better prediction of $D^*$ produces better prediction of $D$.
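A simulation sketch of this example: $Y$ and $D^*$ are independent given $\theta$, but conditioning on the better-predicting proxy $N$ leaves them dependent. The variances are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000_000
theta = rng.normal(size=n, scale=1.0)
eY = rng.normal(size=n, scale=0.7)
eD = rng.normal(size=n, scale=0.7)
Y, Dstar = theta + eY, theta + eD
N = theta + eY + eD                     # candidate conditioning variable

def partial_corr(a, b, c):
    # correlation of a and b after linearly removing c
    # (for jointly normal variables, equivalent to conditioning on c)
    ra = a - np.polyfit(c, a, 1)[0] * c
    rb = b - np.polyfit(c, b, 1)[0] * c
    return np.corrcoef(ra, rb)[0, 1]

print(f"corr(Y, D* | theta) = {partial_corr(Y, Dstar, theta):+.4f}  (~0)")
print(f"corr(Y, D* | N)     = {partial_corr(Y, Dstar, N):+.4f}  (nonzero)")
print(f"R^2 of D* on N = {np.corrcoef(Dstar, N)[0, 1]**2:.3f}, "
      f"on theta = {np.corrcoef(Dstar, theta)[0, 1]**2:.3f}")
```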

Appendix

Consider a general model of the form:

$Y_1 = \mu_1 + U_1$

$Y_0 = \mu_0 + U_0$

$V = \mu_V(Z) + U_V$

$D = 1$ if $V > 0$; $D = 0$ otherwise

$Y = D Y_1 + (1 - D) Y_0$

where

$(U_1, U_0, U_V)' \sim N(0, \Sigma)$

$\mathrm{Var}(U_j) = \sigma_j^2$, $\; j \in \{0, 1\}$

$\mathrm{Cov}(U_1, U_0) = \sigma_{10}$

$E(U_V) = 0$; $\mathrm{Var}(U_V) = 1$

$\mathrm{Cov}(U_1, U_V) = \alpha_1$

$\mathrm{Cov}(U_0, U_V) = \alpha_0$

Let $\phi(\cdot)$ and $\Phi(\cdot)$ be the pdf and the cdf of a standard normal random variable. Then the propensity score for this model is given by

$\Pr(V > 0 \mid \mu_V(Z)) = P(\mu_V(Z)) = \Pr(U_V > -\mu_V(Z)) = 1 - \Phi(-\mu_V(Z)) = p$

so

$\mu_V(Z) = -\Phi^{-1}(1 - p)$.

Since the event $\left( V \gtrless 0, \, P(\mu_V(Z)) = p \right)$ can be written as

$\left( U_V \gtrless -\mu_V(Z) \right) = \left( U_V \gtrless \Phi^{-1}(1 - p) \right)$,

we can write the conditional expectations required to get the biases as functions of $p$.

For $U_1$:

$E(U_1 \mid V > 0, P(\mu_V(Z)) = p)$

$= \alpha_1 E(U_V \mid U_V > -\mu_V(Z), P(\mu_V(Z)) = p)$

$= \alpha_1 E(U_V \mid U_V > \Phi^{-1}(1 - p)) = \alpha_1 M_1(p)$

$E(U_1 \mid V = 0, P(\mu_V(Z)) = p)$

$= \alpha_1 E(U_V \mid U_V = -\mu_V(Z), P(\mu_V(Z)) = p)$

$= \alpha_1 E(U_V \mid U_V = \Phi^{-1}(1 - p)) = \alpha_1 \Phi^{-1}(1 - p)$

where $\alpha_1 = \mathrm{Cov}(U_1, U_V)$.

Similarly, for $U_0$:

$E(U_0 \mid V > 0, P(\mu_V(Z)) = p) = \alpha_0 M_1(p)$

$E(U_0 \mid V \leq 0, P(\mu_V(Z)) = p) = \alpha_0 M_0(p)$

$E(U_0 \mid V = 0, P(\mu_V(Z)) = p) = \alpha_0 \Phi^{-1}(1 - p)$

where $\alpha_0 = \mathrm{Cov}(U_0, U_V)$, and

$M_1(p) = \dfrac{\phi(\Phi^{-1}(1 - p))}{p}$, $\qquad M_0(p) = -\dfrac{\phi(\Phi^{-1}(1 - p))}{1 - p}$

are inverse Mills ratio terms. Substituting these into the expressions for the biases:

Bias$^{\mathrm{TT}}(p) = \alpha_0 M_1(p) - \alpha_0 M_0(p) = \alpha_0 M(p)$

Bias$^{\mathrm{ATE}}(p) = \alpha_1 M_1(p) - \alpha_0 M_0(p)$

$= M(p) \left( \alpha_1 (1 - p) + \alpha_0 p \right)$

Bias$^{\mathrm{MTE}}(p) = \alpha_1 M_1(p) - \alpha_0 M_0(p) - \left[ \alpha_1 \Phi^{-1}(1 - p) - \alpha_0 \Phi^{-1}(1 - p) \right]$

$= M(p) \left( \alpha_1 (1 - p) + \alpha_0 p \right) - \Phi^{-1}(1 - p) \left[ \alpha_1 - \alpha_0 \right]$

where

$M(p) = M_1(p) - M_0(p) = \dfrac{\phi(\Phi^{-1}(1 - p))}{p (1 - p)}$
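A sketch verifying these appendix formulas against a direct simulation; the values of $\alpha_1$ and $\alpha_0$ are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

a1, a0 = 0.6, 0.3          # alpha_1 = Cov(U1, UV), alpha_0 = Cov(U0, UV); assumed
p = 0.3
c = norm.ppf(1 - p)        # at P = p, D = 1 iff U_V > c
M1, M0 = norm.pdf(c) / p, -norm.pdf(c) / (1 - p)
M = M1 - M0

tt_formula = a0 * M
ate_formula = M * (a1 * (1 - p) + a0 * p)

rng = np.random.default_rng(8)
UV = rng.normal(size=2_000_000)
U1 = a1 * UV + np.sqrt(1 - a1**2) * rng.normal(size=UV.size)
U0 = a0 * UV + np.sqrt(1 - a0**2) * rng.normal(size=UV.size)
D = UV > c
tt_mc = U0[D].mean() - U0[~D].mean()
ate_mc = U1[D].mean() - U0[~D].mean()      # E(U1) = E(U0) = 0
print(f"TT bias : formula {tt_formula:.4f} vs simulation {tt_mc:.4f}")
print(f"ATE bias: formula {ate_formula:.4f} vs simulation {ate_mc:.4f}")
```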
