the - diw.de · tor of the treatmen t e ect. the timing of ev en ts con v eys useful information on...

The Non-Parametric Identi�cation of

Treatment E�ects in Duration Models

Jaap H. Abbring �

Gerard J. van den Berg y

Revised version, December 28, 2001

Abstract

This paper analyzes the speci�cation and identi�cation of causal bivariate duration

models. We focus on the case in which one duration concerns the point in time

a treatment is initiated and we are interested in the e�ect of this treatment on

some outcome duration. We de�ne \no anticipation of treatment" and relate it to a

common assumption in biostatistics. We show that (i) no anticipation, and (ii) ran-

domized treatment assignment can be imposed without restricting the observational

data. We impose (i) but not (ii) and prove identi�cation of models that impose some

structure. We allow for dependent unobserved heterogeneity and we do not exploit

exclusion restrictions on covariates. We provide results for both single-spell and

multiple-spell data. We also develop an intuitive strati�ed partial likelihood estima-

tor of the treatment e�ect. The timing of events conveys useful information on the

treatment e�ect.

�Department of Economics, Free University Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam,

The Netherlands. Phone/fax: +31-20-444-6030/6005. Email: [email protected] of Economics, Free University Amsterdam, Tinbergen Institute, CEPR, and IFAU

Uppsala. Email: [email protected].

Keywords: program evaluation, bivariate duration analysis, selectivity bias, hazard rate, partial

likelihood, unobserved heterogeneity, anticipation.

JEL codes: C14, C31, C41.

Thanks to Jim Heckman, John Kennan, Tony Lancaster, Arthur Lewbel, Geert Ridder, Michael

Visser, Ed Vytlacil, the editor and anonymous referees for useful comments and suggestions. Also thanks

to seminar participants at University College London, NYU, IFAU Uppsala, University of Chicago,

University of Iowa, INSEE-CREST, Tinbergen Institute Amsterdam, and participants at various

conferences. Jaap Abbring acknowledges �nancial support by the Royal Netherlands Academy of Arts

and Sciences.

1 Introduction

This paper discusses the speci�cation and identi�cation of causal bivariate duration mod-

els in economics. Consider a subject in a certain state. After a stochastic amount of time,

the subject leaves this state. A di�erent type of event may occur at some other random

time. We are interested in the causal e�ect of the latter event on the duration in the state

of interest. Examples include the e�ect of training programs or punitive bene�ts reduc-

tions on unemployment durations, the e�ect of the hiring of replacement workers on strike

durations and the e�ect of promotions on tenure. In biostatistics one is often interested in

the e�ect of a speci�c medical treatment. We borrow this terminology and refer to training

programs, punitive bene�ts reductions, etcetera, as \treatments", to their causal e�ects

on the outcome durations as \treatment e�ects" and to the subjects as \individuals".

At the core of the paper is a discussion of the causal relation between two durations,

the outcome duration Y and the treatment time S. A duration is simply a non-negative

random variable and it may seem that we can apply standard models for relating any

two non-negative random variables. However, durations are special in the sense that they

are inherently dynamic objects. This can be formalized by considering the corresponding

counting processes. In our case, we have an outcome process fNY (y); 0 � y < 1g such

that NY (y) = 0 if y < Y and NY (y) = 1 if y � Y and a similarly de�ned treatment process

fNSg. A model in terms of these processes, rather than the corresponding durations,

facilitates an explicit discussion of the dynamic e�ects of treatment. A practical matter

is that treatment may not be observed, or even well-de�ned, after the subject has left the

state of interest. This is more easily discussed in an explicitly dynamic model. The same

is true for censoring and time aggregation.

We specify this model in terms of the corresponding hazard rates. The outcome hazard

�Y (yj�) at time y is the intensity of leaving the state of interest at y given that the

subject is still in this state at y and conditional on the treatment history and other

relevant conditions (which are represented by a \�"). The hazard rate is the focal point

and basic building block of econometric duration models. Dynamic economic theories

often aim to explain the rate of leaving a state of interest at y in terms of economic

behavior and external conditions (see Van den Berg, 2001, for a survey). An analogy with

discrete choice models may clarify this. Let NY (y�) := limt"yNY (t). The outcome process

increments dNY (y) := NY ((y + dy)�) � NY (y�) form a sequence of binary choices. We

have dNY (y) = 1 if Y 2 [y; y + dy) and dNY (y) = 0 if y 62 [y; y + dy). The hazard can

be interpreted by the heuristic �Y (yj�)dy = Pr(dNY (y) = 1jNY (y�) = 0; �) (see Fleming

and Harrington, 1991). This is the continuous-time equivalent of a conditional choice

probability.

We construct a causal model that speci�es how the outcome varies if we hypothetically

manipulate the treatment time. The model also speci�es a (not necessarily randomized)

treatment process. Biostatistical studies routinely impose a \consistency" condition that

1

can be interpreted as claiming that each cause should precede its e�ect (e.g. Robins, 1998,

and Lok, 2001). Formally, it requires that the outcome paths corresponding to any two

treatment times coincide up to a duration y if the treatment paths are the same up to

y. In more familiar terms, the subject should be in the same outcome state at time y

if both treatments start after y. We impose a weak version of this condition that only

requires that the corresponding hazard paths coincide up to y. In economics this embodies

a substantial informational and behavioral assumption. It requires that individuals either

have no access to information on treatment beyond that encoded in the treatment history

fNS(t); 0 � t < yg or simply do not act on that information, at each time y. Otherwise,

there are anticipatory e�ects of future treatment and our consistency condition is violated.

Therefore, we refer to this condition as a \no anticipation" assumption.

Rational forward-looking individuals may be expected to exploit information on the

(perceived) properties of the treatment assignment process. In that case, changes in these

properties may a�ect the hazard rate out of the state of interest before any actual real-

ization of a treatment. Our model framework enables the speci�cation and interpretation

of such e�ects. Our analysis, however, focuses on contrasting outcomes between di�erent

realized treatment times for a given assignment mechanism. This is already an ambi-

tious task. We prove a basic non-identi�cation theorem that states that no-anticipation

and randomized assignment of treatment can be imposed on the causal model without

restricting the observational data. This highlights two inference problems. First, we can-

not identify anticipation e�ects of treatment. Second, we have the standard econometric

selection problem. Without further assumptions, we cannot distinguish between causal

e�ects and selection e�ects. In the identi�cation analysis we take the no-anticipation as-

sumption as fundamental and we only address the selection issue. It should be judged in

each application whether it makes sense to assume that the observed treatment cannot

be anticipated.

Identi�cation requires further structure. We build on the assumption that all selection

e�ects can be captured by related observed and unobserved covariates in the treatment and

outcome processes. We examine single-spell and multiple-spell settings. Single-spell data

provide information on a single outcome spell and possibly treatment for each individual.

We prove identi�cation of two bivariate duration models that allow for time-varying and

heterogeneous treatment e�ects and that borrow structure from the well-known non-

parametric1 mixed proportional hazard (MPH) model. MPH models are the most popular

reduced-form duration models in econometrics (see Van den Berg, 2001, for a survey).

With multiple-spell data we observe multiple outcome spells, with or without treat-

ment, for each individual or within groups of individuals who share the same unobserv-

able characteristics. We prove identi�cation of multiple-spell model generalizations. In

particular, we can allow for general non-proportionalities. The single-spell results concern

1In Subsection 3.2 we justify this terminology.

2

bivariate models that embed a marginal model for the treatment process, or a \selection

equation". For the multiple-spell case we show that it is suÆcient to specify the condi-

tional outcome model in order to identify treatment e�ects. As a by-product, we derive a

simple and intuitive strati�ed partial likelihood (SPL) estimator of the treatment e�ect.

This paper contributes to the extensive literature on the econometric evaluation of

social programs and treatment e�ects (see Heckman, LaLonde and Smith, 1999, for an

overview). This literature provides relatively little insight in problems with dynamically

assigned treatments and duration outcomes. We provide some guidance in tackling such

problems, which are frequently encountered in empirical research. We also complement the

closely related biostatistical literature (e.g. Robins, 1998). This literature allows for more

complex dynamic treatment regimes, but is not tailored to economic problems and data.

We o�er a dynamic economic perspective on the causal model and identi�cation results

for models with unobserved selection e�ects that cannot be found in the biostatistical

literature.

Our identi�cation results also extend the existing literature on the non-parametric

identi�cation of MPH-type duration models (see Elbers and Ridder, 1982, Heckman and

Singer, 1984, Heckman and Honor�e, 1989, Honor�e, 1993, and the surveys in Heckman

and Taber, 1994, and Van den Berg, 2001). The conditional outcome parts of the models

can be interpreted as univariate duration models with an endogenous time-varying re-

gressor, the treatment process, where the endogeneity originates from a dependence on

unobserved characteristics. We demonstrate that these models are identi�ed under similar

assumptions as those in the literature.

The empirical literature contains studies in which models are estimated that are similar

to the models we consider. For example, Eberwein, Ham and LaLonde (1997), Abbring,

Van den Berg and Van Ours (1997), and Van den Berg, Van der Klaauw and Van Ours

(1998) study the e�ect of a treatment of unemployed workers on the transition rate from

unemployment to work, allowing for a causal e�ect on the transition rate as well as for

related unobserved heterogeneity in order to deal with selectivity. Lillard (1993) estimates

a model for the joint durations of marriage and time until conception of a child, and

his model allows the rate at which the marriage dissolves to shift to another level at

moments of child birth. Lillard and Panis (1996) estimate a model on the joint durations

of marriage, non-marriage, and life, and their model allows the death rate to shift to

another level at moments of marriage formation and dissolution. These studies also allow

for related unobserved heterogeneity.

We do not impose exclusion restrictions on covariates, i.e. we do not require that

the data contain a variable that a�ects the treatment assignment but does not a�ect the

outcome of interest other than by way of the treatment. A variable that is observed by the

analyst is often also observable to the individuals under consideration. If such a variable

a�ects the treatment process then a rational individual will take this variable into account

in determining his optimal strategy. This behavioral response in turn a�ects the rate at

3

which the individual leaves the state of interest. As a result, exclusion restrictions are

diÆcult to justify and instrumental variable methods of inference are not likely to be of

help. A practical issue is that the latter methods are often not tailored to our dynamic

framework. One may be tempted to time-aggregate the treatment process into a binary

indicator for treatment in an interval and apply standard methods for binary treatments.

However, this way we would lose track of the parameters of interest and throw away

information that we need for identi�cation.

We also avoid conditional-independence assumptions, which require that all selectivity

in the assignment of treatment can be controlled by conditioning on observed covariates.

In many econometric applications, unobserved heterogeneity in treatment assignment and

outcomes cannot be excluded. By the argument against instrumental variables assump-

tions above, any unobserved heterogeneity in treatment assignment is likely to imply

unobserved selectivity and violations of conditional independence.

Finally, we show that multiple-spell data are similar to linear panel data. The intuition

for identi�cation in linear panel-data models carries over to our non-linear models.

The outline of the paper is a follows. Section 2 develops the causal framework and the

basic non-identi�cation result. Sections 3 and 4 derive the identi�cation results for the

single-spell and multiple-spell cases, respectively. Section 5 provides discussion. We pro-

vide some economic-theoretical justi�cation of the proportional hazard model structure.

We also discuss the applicability of the models as reduced form model frameworks, with

extensive examples. Finally, we review the identi�cation from the perspective of more

conventional latent variable and panel-data models. The Appendix contains the proofs.

2 A framework for causal inference

2.1 A potential outcome framework

Consider a population of individuals owing into a state of interest and the durations

these individuals subsequently spend in that state. We are interested in the causal e�ect

of a single binary treatment that is either assigned at some time in R+ := [0;1) after

entering the state or not assigned at all. We can cast this problem in the standard potential

outcome framework pioneered by Neyman (1923) by recognizing that our dynamically

assigned binary treatment can be reinterpreted as a set of mutually exclusive treatments

indexed by R+ := R+ [f1g. Here, the point1 represents the no-treatment case. To each

treatment s 2 R+ corresponds a random variable Y �(s) � 0, the potential outcome in the

case that we would intervene and assign treatment s. Causal inference is concerned with

contrasting potential outcomes corresponding to di�erent treatments. As the treatments

are mutually exclusive, all but one of the potential outcomes are counterfactual. This

is what Holland (1986) calls the \fundamental problem of causal inference". A solution

is to impose suÆcient structure on the model to allow for causal inference from actual

4

(experimental or observational) data. The inherently dynamic setting of our model and

its application in economics call for a separate discussion.

Unlike Neyman and much of the more recent literature (e.g. Rubin, 1974, Holland,

1986, and Heckman, LaLonde and Smith, 1999), we have to deal with in�nitely| actually

uncountably| many potential outcome variables.2 To facilitate a careful discussion, we

explicitly introduce the probability space (;F ;P) on which all random variables are

de�ned. Treatments are assigned according to an R+-valued random variable S. The

collection fY �g := fY �(s); s � 0g of all potential outcomes is a R+ -valued stochastic

process, which we require to be measurable. The actual outcome is Y := Y �(S); all other

potential outcomes are counterfactual. Measurability of fY �g ensures that Y is a random

variable (by application of Billingsley, 1995, Theorem 13.1).

Now consider a sample of n individuals, named 1; 2; : : : ; n. We assume that potential

outcomes and treatments (fY �i g; Si) are independent across individuals i, and distributed

like (fY �g; S) above. For any individual i, we only observe the outcome Yi corresponding to

the actual treatment Si, and we cannot contrast this outcome to any of the counterfactual

outcomes. This is Holland's fundamental problem of causal inference again. We are forced

to tone down our ambitions, and focus on inference of some causal parameters, for example

the means �(s) := E [Y �(s)], instead. For now, suppose that �(s) <1 for all s 2 R+ and

that we have been able to ensure randomized assignment, so that fY �g??S.3 Then, we have

that �(S) = E [Y jS] almost surely. So, under some additional smoothness assumptions,

we can estimate � by standard non-parametric regression techniques.

2.2 Dynamics and information

By adopting Neyman's static framework we have ignored the fact that our problem is

inherently dynamic. To facilitate an explicit discussion of dynamics, we assume that Y �(s)

is continuous and focus on its hazard rate, which is given almost everywhere by

�Y �(s)(y) = limdy#0

Pr(Y �(s) 2 [y; y + dy)jY �(s) � y)

dy:

The corresponding integrated hazards are �Y �(s)(y) =R y0�Y �(s)(u)du <1 for all y 2 R+ ,

with limy!1�Y �(s)(y) = 1. It is well-known, and easy to check, that �Y �(s)(Y�(s))

2Freedman (2001) and Gill and Robins (2001), however, discuss potential outcome models with con-

tinuous treatments. Also, structural models routinely allow for causal e�ects of continuous variables. See

Heckman (2000) for a review of structural models and causality in economics and Pearl (2000) for a

discussion of the link between structural models and potential outcome models.3Note that at this stage we have not speci�ed a model, so that we can not formulate randomized

assignment in a di�erent way (see Heckman, LaLonde and Smith (1999) for binary analogues). If fY �g

is not independent of S then there is selectivity. Of course, the issue whether fY �g is independent of S

or not is entirely di�erent from the issue whether Y �(s) varies with s. In the latter case there is a causal

e�ect.

5

has a unit exponential distribution. This suggests the following causal model. Explicitly

including the sample point ! as an argument, let

Y �(s; !) = ��1Y �(s) (E(s; !)) ;

with ��1Y �(s)(�) = inffy 2 R+ : �Y �(s)(y) � �g the generalized inverse of �Y �(s) and E(s; �),

s 2 R+, unit exponential random variables. Then, it is easy to derive that Y �(s; �) is

indeed a continuous random variable with integrated hazard �Y �(s), for given s 2 R+.

We could use the following causal interpretation. First, nature selects an ! according

to the model (;F ;P). Then, we intervene and set s, and thus �Y �(s) and E(s; �). Nature

does not re-randomize after our intervention, so ! is �xed throughout the experiment.

As the distribution of E(s; �) is invariant to our choice of s, the potential integrated

hazards �Y �(s) have a causal interpretation. We deliberately avoid imposing invariance

of the actual realization E(s; !), so that we are not forced to specify joint distributions

of counterfactual outcomes. This is consistent with our goal of only inferring potential

integrated hazards �Y �(s), but not realized outcomes Y �(s; !).

Now that we have explicitly stated the causal structure of our model and our lack

of interest in fully specifying joint distributions of counterfactuals, explicitly allowing for

dependence of E on s and carrying along the argument ! are unnecessarily cumbersome.

There is no substantial loss in simply specifying the model as Y �(s) = ��1Y �(s)(E), with E a

unit exponential random variable.4 By analogy to Freedman (2001), we give a causal inter-

pretation to �Y �(s) or, similarly, the implied stochastic kernel K(s; A) = Pr(��1Y �(s)(E) 2

A) for s 2 R+ and A 2 F . If we would again have randomized assignment and observe the

distribution of Y conditional on S, potential integrated hazards can be estimated using

standard hazard regression techniques (see e.g. Fleming and Harrington, 1991).

We are now ready to discuss more substantive dynamic issues. Heuristically, �Y �(s)(y)dy

is the probability that the outcome spell ends in a small interval [y; y + dy), conditional

on survival up to y and under treatment s. For each treatment s 2 R+, introduce a

dynamic treatment indicator fNs(y); 0 � y < 1g such that Ns(y) = 0 if y < s and

Ns(y) = 1 if y � s. Then, fNSg is a counting process that tracks the actual treatment

status over time. An issue that is central to a behavioral science like economics is the dy-

namic accumulation of information concerning treatment by the individual. As a routine

\consistency" condition, biostatisticians typically impose that the outcomes up to some

time y only depend on the treatment history fNs(u); 0 � u < yg, for a given assigned

treatment s 2 R+ (Robins, 1998, and Lok, 2001). A weak version of this condition re-

quires that hazard rates for any two treatments s; t 2 R+ coincide almost everywhere up

to the �rst of the two treatments, minfs; tg. This consistency condition seems natural, by

simply requiring that cause precedes e�ect. However, in economics outcomes are typically

4We can ensure that Y � is measurable by requiring that (s; e) 7! ��1Y �(s)(e) is measurable (again by

Billingsley, 1995, Theorem 13.1).

6

a�ected by the behavior of individuals acting on information about treatments. The po-

tential outcome model discussed so far is incomplete because it only discusses variation

in actual treatments and does not specify how information on the treatment accumulates

to the individuals over time. The consistency condition above can be interpreted as the

informational and behavioral assumption of \no anticipation",

Assumption 1. (no anticipation) For all s; t 2 R+,

�Y �(s)(y) = �Y �(t)(y) for all y � minfs; tg:

A suÆcient condition for Assumption 1 is that the individual's information at any time y

only varies with s through the treatment history fNs(u); 0 � u < yg whenever we would

intervene and manipulate s. Subsection 2.4 discusses a clarifying structural example.

Suppose that individuals are informed about the treatment time at the start of the

spell. If individuals act on this information and their actions a�ect outcomes, there are an-

ticipatory e�ects and potential hazards corresponding to di�erent treatments will diverge

from time 0 onwards. Similar but subtler patterns arise whenever individuals dynamically

accumulate treatment-speci�c information that strictly includes the actual treatment his-

tory. An example is that the treatment time is revealed at some �xed interval before the

treatment time. In either of these \anticipation" cases, the consistency condition above

is violated and we seem to end up with violations of the rule that any cause precedes its

e�ect. However, one should properly line up causes and e�ects. The above anticipatory

e�ects are not e�ects of the actual treatment, but of the revelation of information about

the treatment. So, if we extend the potential outcome model so that it recognizes all rel-

evant causes, notably information shocks, consistency should hold. After all, information

shocks cannot be anticipated.

As stated at the top of this section, we focus on a single treatment in this paper. If

we observe the timing of all information shocks, we can use a multiple-treatment exten-

sion of our model that is not hard to envision. In contrast, in the next subsection we

show that it is generally impossible to identify anticipatory e�ects if we do not know

how information about treatment evolves. Like statisticians, we therefore take a consis-

tency condition like Assumption 1 as primitive. A practical implication is that meaningful

contrasts of potential outcomes to the no-treatment baseline can be de�ned without al-

lowing for defects in the distribution of S. After all, Assumption 1 implies that �Y �(s)

coincides with the no-treatment integrated-hazard �Y �(1) on [0; s] for all s 2 R+ and that

lims!1�Y �(s)(y) = �Y �(1)(y) for all y 2 R+ .

It is important to stress that Assumption 1 does not exclude that forward-looking

individuals act on properties of the treatment process. It only requires that such e�ects

are the same between interventions manipulating the future treatment status. In that

sense, our potential outcome model is a model for contrasting treatments for a given

information structure. It does not allow us to contrast outcomes to outcomes in a world

7

without treatment. One of the contributions of this paper is to make these problems

explicit in a dynamic setting. Heckman, LaLonde and Smith (1999) have addressed similar

issues within the context of the static binary-treatment potential-outcome model. They

di�erentiate between the no-treatment outcome in a world with treatments and outcomes

in a world without treatments. However, considerations like this are usually absent from

the statistical literature.

2.3 A basic non-identi�cation result

So far, we have assumed that the econometrician always observes S and therefore fNSg.

However, in many applications treatment is irrelevant and/or unobserved if it occurs after

the spell of interest has ended. From now on, we explicitly entertain the possibility that

we only observe the treatment history fNS(y); 0 � y < Y g realized at Y . Assume that

S and fY �g are such that Pr(Y = S) = 0 (see also Assumption 2 below). In this case, a

large data set would provide

QS(y; s) := Pr(Y > y; S > s; Y > S) and QY (y) := Pr(Y > y; Y < S) (1)

for all (y; s) 2 R2+ . These are the sub-survival-functions of (Y; S) and Y for the sub-

populations with respectively Y > S and Y < S. So, we have

De�nition 1. Two causal models (fY �g; S) and (feY �g; eS) are observationally equivalentif QS = eQS and QY = eQY , where eQS and eQY are de�ned by (1) for eY := eY �(eS) and eS.Note that the distribution of the identi�ed minimum of (Y; S), i.e. the smallest of Y and

S joint with the identity of this smallest duration, is fully characterized by (Q0S; QY ), with

Q0S(s) = QS(�1; s) for all s 2 R+ (Tsiatis, 1975). We will occasionally exploit that our

models of (QS; QY ) embed competing risks models for (Q0S; QY ).

In the Appendix we show that any data pair (QS; QY ) can be supported by a causal

model that satis�es Assumption 1 and randomized assignment.5 This implies

Proposition 1. To each causal model (fY �g; S) corresponds an observationally equiva-

lent model (feY �g; eS) that satis�es Assumption 1 and randomized assignment (feY �g??eS).Proposition 1 provides two insights. First, without further structure, we cannot distin-

guish between causal e�ects and spurious selection e�ects. This is the standard selection

problem. Second, we cannot distinguish between potential outcome models with and with-

out anticipation (as de�ned by Assumption 1). Throughout this paper we assume that

Assumption 1 is true, and we focus on separating causal and spurious e�ects. So, we

address the �rst non-identi�cation problem, but not the second.6

5Gill and Robins (2001) provide a related result for a di�erent model.6Alternatively, it is easy to derive non-parametric identi�cation results under the assumption of ran-

domized assignment, but without restrictions on anticipatory e�ects. This is beyond the scope of this

paper.

8

We impose the necessary structure by explicitly modeling selection e�ects as related

e�ects of observed and unobserved (to the econometrician) covariates on outcomes and

assignment. We rely on

Assumption 2. For some random vectors X of observed and V of unobserved covari-

ates, fY �g??Sj(X; V ) and the distribution of (Y; S)j(X; V ) is absolutely continuous with

respect to Lebesgue measure on R2+ .

Assumption 2 requires that all selection e�ects can be captured by a �nite-dimensional

vector of covariates. It is reminiscent of a standard conditional-independence assumption,

but it allows for conditioning on both observed and unobserved covariates. The covari-

ates should include all joint determinants of outcomes and assignment, including any

information that triggers relevant behavioral responses. The restriction to time-invariant

covariates is unnecessarily strong for the exposition of the causal model and we could eas-

ily extend the analysis to external time-varying covariates.7 However, in our identi�cation

analysis we will only deal with time-invariant covariates, which is in line with most of the

literature on the identi�cation of duration models (see Heckman and Taber, 1994, and

Van den Berg, 2001, for surveys). The main reason to restrict attention to time-invariant

observed covariates in this literature and our paper is expositional convenience. Results

by Honor�e (1991) and Heckman and Taber (1994) suggest that identi�cation often bene-

�ts from any additional variation of the observed covariates over time. In this sense, our

analysis provides a baseline that can be enriched by extensions to observed time-varying

covariates. Note that, in contrast, we cannot deal with time-varying unobserved covariates

(unless the covariate paths are common within strata; see Section 4). In the single-spell

case we are e�ectively dealing with cross-sectional data. Identi�cation of models with

time-varying unobservables is an unreasonable goal to pursue and would take us well

beyond the current practice in empirical research.

We are now ready to present the causal model that underlies our identi�cation anal-

ysis in the following sections. Throughout, we assume that we have access to vectors

of covariates X and V as in Assumption 2. We focus on inference conditional on the

observed covariates X. Assumption 1 is assumed to hold conditional on (X; V ). In the

sequel, it is understood that we mean this slightly stronger assumption if we refer to As-

sumption 1. Denote the integrated Y �(s)-hazard conditional on (X; V ) by �Y �(s)(tjX; V ).

Let ��1Y �(s)(�jX; V ) = inffy 2 R+ : �Y �(s)(yjX; V ) � �g for s 2 R+. We have that

�Y �(s)(Y�(s)jX; V ) is unit exponential and independent of (X; V ). It is also independent

of S by Assumption 2. So, we can now specify

Y �(s) = ��1Y �(s) (EjX; V ) ;

7See Kalb eisch and Prentice (1980, Section 5.3), Heckman and Taber (1994), and Abbring (2001a,

Section 3.2) for discussions of external covariates in survival analysis.

9

with E a unit exponential random variable such that E??(S;X; V ). A causal interpreta-

tion of �Y �(s) follows as before, with the additional requirement that (X; V ) is invariant to

the intervention. Nature sets (X; V ) and E by picking a sample point from according

to (;F ;P). (X; V ) represents heterogeneity inherent and revealed immediately to the

individuals. E is an ex post random shock on which information only evolves naturally as

time proceeds. In this sense we allow for randomness at the individual level like Freed-

man (2001) and unlike for example Holland (1986). Because (X; V ) is invariant to the

intervention, it preserves the perceived assignment process. At each point in time t, the

model allows individuals to form expectations based on the covariates (X; V ) and the as-

signment and transition histories. In equilibrium, these expectations should be consistent

with actual assignment, Sj(X; V ).

Assumptions 1 and 2 provide some additional structure, but not enough to ensure iden-

ti�cation. After all, Assumption 2 allows for selective assignment through the unobserved

covariates V . By Proposition 1 (applied conditional on X) there exists an observationally

equivalent model such that fY �g??SjX. Rather than providing identi�cation directly,

Assumptions 1 and 2 are meant to provide a point of departure for a variety of identi-

fying assumptions. With single-spell data, further (MPH-like) separability assumptions

are unavoidable. This is explored in Section 3. In Section 4, we show how multiple-spell

data can be used to control for selection on unobservables in a much more exible model

framework.

We have focused on causal e�ects of treatment on outcomes and not the other way

around. This asymmetry is not important for our results. We could extend our analysis to

a simultaneous causal model for S and Y by adding a \potential treatment process" that

satis�es a consistency condition like Assumption 1. By analogy to the interpretation of

Assumption 1, we could interpret the new consistency condition as excluding anticipatory

e�ects of the outcome on treatment. As a practical matter, the consistency conditions

would imply a convenient recursive structure and guarantee the absence of coherency

problems. They would also provide a foundation for an asymmetric analysis with the

above model in the case that we only observe fNS(y); 0 � y < Y g and not S itself.

2.4 Example: the e�ect of time-varying bene�ts on the unem-

ployment duration

In this subsection we illustrate the non-identi�cation result in Proposition 1 in the con-

text of a speci�c economic-theoretical (job search) model framework. The treatment is a

reduction of the bene�ts level of an unemployed individual at time S and the outcome Y

is the unemployment duration. We contrast two job search models. Both models assume

rational expectations, but in the �rst model individuals have perfect foresight concerning

the realization of S whereas in the second model they only know the stochastic assign-

ment rule. In other words, in the �rst model there is full anticipation whereas in the

10

second there is no anticipation. We demonstrate that the models can be observationally

equivalent. The Appendix contains derivation details.

To shape thoughts, let the bene�ts reduction be a punishment in response to a violation

of minimum requirements on search e�ort (see Grubb, 1999, for a review of such sanction

policies in OECD countries). In reality, the realization of S may not be known to anyone

in advance, for example because of imperfect monitoring and discretionary behavior of

the case worker. Alternatively, individuals may anticipate a sanction if they are warned

in advance or if they willingly choose to behave such that they know they will receive

a sanction at realization s of S. Typically, the bene�ts reduction is not observed if it

is realized after the end of the unemployment spell (Abbring, Van den Berg and Van

Ours, 1997). This is what we assume here as well. To facilitate the exposition, we denote

potential outcomes by Y � in both models and we assume that assignment is randomized

(i.e. that S??fY �g). Of course, this does not mean that Y �(s) does not depend on s.

In both models, job o�ers arrive at a Poisson rate. An o�er is an independent draw from

a wage o�er distribution. Once rejected, job o�ers cannot be recalled and, once accepted,

individuals are employed forever at the accepted wage. Individuals aim to maximize their

expected discounted lifetime utility by choosing search e�ort and deciding which o�ers to

accept. The latter decision can be characterized by a reservation wage, which typically

depends on all parameters of the theoretical model. The hazard rate �Y �(s) (which is the

exit rate to work in the case that S = s) equals the product of the corresponding job o�er

arrival rate and the probability that a wage o�er exceeds the reservation wage.

The perfect-foresight model follows Mortensen (1977) and Van den Berg (1990).8 For

given s, the hazard rate of Y �(s) is increasing up to the point y = s, after which it

is constant. It is continuous at the point y = s. By choosing a suitable non-degenerate

distribution for S, the observable pre-sanction hazard rate (mixed over all possible realiza-

tions of S) can be made constant over time. This illustrates the principle that unobserved

heterogeneity mitigates positive duration dependence e�ects on an observable hazard rate.

In the model without anticipation (Abbring, Van den Berg and Van Ours, 1997), the

hazard rate of Y �(s) is constant before and after s, but it jumps upward at s. We now

choose another particular non-degenerate distribution for S. The pre-sanction hazard does

not depend on s, so the observable pre-sanction hazard rate (mixed over all possible real-

izations of S) is now also constant over time. In addition, the other observable quantities

can be made to equal those in the �rst model.

Basically, in the perfect-foresight model anticipation e�ects have to be integrated with

respect to the distribution of possible future sanction times in the cases in which no

sanction is imposed during the outcome spell. The resulting shape of the hazard rate can

also be generated by a model in which individuals do not anticipate and all face the same

8It is usually applied to analyze the e�ect of �nite bene�ts entitlement periods. In that case, S is

typically always observed by the econometrician. Here we use the model as an easy-to-analyze extreme

version of a sanctions model with anticipation.

11

sanction rate. This problem is a manifestation of the problem of distinguishing duration

dependence and heterogeneity.

3 Single-spell data

3.1 Model framework

Single-spell data ideally provide the joint distribution of (Y; fNS(y); 0 � y < Y g), condi-

tional on observed covariates X. In obvious notation, suppose we are given this data in the

form of a collection fQS; QY g := f(QS(�jx); QY (�jx)); x 2 Xg of conditional sub-survival

functions. Here, X � Rk , 1 � k < 1, is the support of X. In Section 2, we have seen

that Assumptions 1 and 2 are not suÆcient to identify the causal model from fQS; QY g.

In this section, we show that the model is identi�ed under additional separability (pro-

portionality) assumptions.

Assumption 2 implies that any dependence between Y and S conditional on (X; V )

stems from causal e�ects of S on Y . Unobserved selection e�ects are fully captured by the

unobserved covariates V . So, it is convenient to specify the model as a mixture of the joint

distribution of (Y; S)j(X; V ) over the distribution of V jX. In turn, the joint distribution of

(Y; S)j(X; V ) is the product of the distributions of Y j(S;X; V ) and Sj(X; V ). So, fQS; QY g

is fully speci�ed by

Model 1a. The hazard rates of Y j(S;X = x; V ) and Sj(X = x; V ) are given by

�Y (tjS; x; V ) =

(�Y (t)�Y (x)VY if t � S

�Y (t)�Y (x)Æ(tjS; x)VY if t > S(2)

and

�S(tjx; V ) = �S(t)�S(x)VS; (3)

respectively. V is an R2+-valued random vector (VY ; VS)

0 distributed with distribution G

independent of x. The following regularity conditions and normalizations hold.

(i). �Y and �S are continuous functions �S : X ! (0;1) and �Y : X ! (0;1) such

that �Y (x�) = �S(x

�) = 1 for some a priori chosen x� 2 X .

(ii). The functions �Y : R+ ! (0;1) and �S : R+ ! (0;1) have integrals

�Y (t) :=

Z t

0

�Y (�)d� <1 and �S(t) :=

Z t

0

�S(�)d� <1

for all t 2 R+ . For some a priori chosen t� 2 (0;1), �Y (t�) = �S(t

�) = 1.

(iii). G is such that Pr(V 2 (0;1)2) > 0.

12

(iv). Æ : R2+ � X ! (0;1) is such that

�(tjs; x) :=

Z t

s

Æ(� js; x)d� <1 and �(tjs; x) :=

Z t

s

�Y (�)Æ(� js; x)d� <1

exist and are continuous on f(�; �) 2 R2+ : � > �g � X .

Equations (2) and (3) have mixed proportional hazard (MPH) speci�cations, except for

the factor involving Æ. An assumption implicit in equation (2) is that Y??VSj(S;X; VY ).

Similarly, equation (3) embodies the assumption that S??VY j(X; VS). One could say that

VY captures the \unobserved heterogeneity" or \frailty" in Y and VS captures the unob-

served determinants of S. The factors �Y (x) and �S(x) are called the \systematic parts"

of the hazards. In applied duration analysis, it is common to specify �i(x) = exp(x0�i),

so that the hazard function is multiplicative in all separate elements of x. Note that we

allow the elements in X to be discrete and/or continuous.

Conditional on (X; V ), the variables Y and S are only dependent through Æ(tjS;X).

Thus, by Assumption 2, this factor can be given a causal interpretation as the \treatment

e�ect" of S on Y . Note that Æ(tjS; x) only enters �Y (tjS; x; V ) at durations t > S. So,

Model 1a satis�es Assumption 1.9 If Æ = 1, treatment is ine�ective. On the other hand,

suppose that Æ equals some constant larger than one. If S is realized then �Y increases by

(Æ� 1) � 100%. This stochastically reduces the remaining outcome duration in comparison

to the case where the treatment is given at a later point of time. More generally, Model 1a

allows the treatment e�ect to depend on elapsed duration t, the treatment time S and the

observed covariatesX. By implication, it may also vary with the time t�S since treatment.

If the treatment itself takes some time, the e�ect of t� S may for instance capture a low

exit rate during treatment. In Subsection 3.3, we discuss an alternative model that allows

Æ to depend on a third unobserved component but not on the treatment time.

Finally, the functions �Y and �S are called the \baseline hazards". The corresponding

hazard rates are said to be duration dependent if the functions �Y and �S are non-

trivial. The functions �Y , �S and � appear naturally whenever we integrate the hazard

rates over their �rst (duration) argument. We do not require that limt!1�Y (t) = 1,

limt!1 �S(t) =1 or limt!1�(tjs; x) =1. Also, we allow VY and VS to have mass points

at 0. Thus, we allow for two sources of defectiveness (see Abbring, 2001b, for discussion).

In particular, this allows for a mass of individuals that never receives treatment.

Recall that, under consistency conditions like Assumption 1, the causal model from

Section 2 can be trivially extended with causal e�ects of the outcome on treatment. Model

1a can similarly be extended by introducing the mirror image of Æ in equation (3). The

9This also ensures that the stochastic process Pr(Y � tjS;X; V ) is measurable with respect to

�(fNS(s); 0 � s < tg; X; V ) and Pr(Y � tjS;X; V ) = Pr(Y � tjfNS(s); 0 � s < tg; X; V ). We may

therefore apply the conventional exponential representation of the survivor function (Fleming and Har-

rington, 1991).

13

resulting symmetric model is an extension with observed covariates, duration dependence

and unobserved heterogeneity of the bivariate exponential model developed by Freund

(1961). An example discussed by Freund is the bivariate distribution of the failure times

of the two engines of a jet. Failure of one engine increases the stress on, and therefore the

failure rate of, the remaining engine. This e�ect is symmetric. In Section 5, we provide

some examples of economic applications of a symmetric model. In such applications, as in

Freund's example, we typically observe not only fQS; QY g, but the full joint distribution

of (Y; S)jX. We need this full distribution to identify causal e�ects in both directions. In

this paper, we focus on the asymmetric case in which we only observe fQS; QY g and the

treatment process fNSg may not even be well-de�ned after Y . This is why we use the

asymmetric terms \treatment" and \outcome" throughout.

3.2 Identi�cation

The components of Model 1a that can be freely varied are �Y , �S, �A, �S, � and G (we

take the support X of X as given). If we specify each of these components, we have fully

pinned down the the data fQS; QY g. Our identi�cation analysis is non-parametric in the

sense that we allow each of the model components to vary in in�nite-dimensional func-

tion spaces, restricted by the regularity conditions of Subsection 3.1 and some additional

assumptions. Let M be the collection of all (�Y ;�S; �A; �S;�; G) that are admissible

under a given set of assumptions. We simply refer to M as the \model" (under a given

set of assumptions and for a given structure as in Model 1a) and to each M 2 M as

a \speci�cation" of the model. Each speci�cation M 2 M maps into exactly one data

collection fQS; QY g. By analogy to De�nition 1, two speci�cations M;fM 2 M are said

to be observationally equivalent if the corresponding data fQS; QY g and f eQS; eQY g are

equal. Thus, we have

De�nition 2. A model M is identi�ed (from fQS; QY g) if observational equivalence of

any two speci�cations M;fM 2 M implies that M = fM .

IfM is identi�ed, then we also say that the components �Y , etcetera, are identi�ed.

Recall that, in the case of Model 1a, � is only de�ned, and only needs to be identi�ed,

on f(�; �) 2 R2+ : � > �g � X . Once we know �, we can identify Æ(tjs; x) for almost all

t > s, for all s 2 R+ and x 2 X . Because Æ only enters the model through the factor

Æ(tjS; x)NS(t�), only this restriction of Æ to f(�; �) 2 R2+ : � > �g � X is relevant. Also

note that we focus on non-parametric identi�cation of the maps �Y and �S. In practice,

one may start o� with a parametric speci�cation of, for example, �Y and require that all

parameters can be recovered from its graph f(x; �Y (x)); x 2 Xg. In the case where �Y (x)

is log-linear in x0�i, this requires that X is not contained in a proper linear subspace of

the k-dimensional Euclidian space Rk .

Now, consider the following assumptions.

14

Assumption 3. f(�Y (x); �S(x)); x 2 Xg contains a non-empty open set in R2 .

Assumption 4. E [VY ] <1 and E [VS ] <1.

Note that Assumption 3 does not impose exclusion restrictions on the structural com-

ponents of the hazards, but does require independent variation in these components. If

�Y (x) = exp(x0�Y ) and �S(x) = exp(x0�S) then it is suÆcient that X contains a non-

empty open set in R2 . Assumption 4 is a common assumption in the analysis of single-spell

duration data with MPH-type models (e.g. Elbers and Ridder, 1982).

The identi�cation analysis in this section exploits that Model 1a embeds a competing

risks model for the distribution fQ0S; QY g of the identi�ed minimum. This embedded

model is a standard MPH competing risks model that does not involve the treatment

e�ect Æ. Conditional on the observed covariates X, the competing risks are only dependent

through the unobserved covariates V . Heckman and Honor�e (1989) study the identi�ability

of similar competing risk models. Here, we rely on the following result of Abbring and

Van den Berg (2001a),

Proposition 2. Under Assumptions 3 and 4, �Y , �S, �A, �S and G in Model 1a are

identi�ed from fQ0S; QY g.

This leads to the following main result of this paper,

Proposition 3. Under Assumptions 3 and 4, �Y , �S, �A, �S, � and G in Model 1a are

identi�ed from fQS; QY g.

The proof of this proposition provides some intuition on what drives the identi�cation

of the treatment e�ect. Consider individuals who receive a treatment at s. The natural

control group consists of individuals whose spells have not ended at s and who have not yet

received a treatment. A necessary condition for a meaningful comparison of these groups

is that there is some randomization in the treatment assignment at s. The model allows

for such randomization because we specify assignment by way of the hazard rate of S. The

integrated hazard rateR S0�S(� jx; VS)d� has a unit exponential distribution independently

of (x; V ) (see e.g. Ridder, 1990, and Horowitz, 1999). Thus, there is a random component

in the assignment that is independent of (x; V ). This component a�ects Y only by way

of the treatment.10 (In Section 5 we discuss the interpretation of this component in some

applications.) However, the presence of this component is not suÆcient for a meaningful

comparison. We still have to deal with a selection e�ect: the unobserved heterogeneity

distribution is di�erent between the treatment and control groups at s. This can be

10It is tempting to interpret this random component in terms of the randomness of the outcome of S at

t that remains if the hazard rate �S(tjx; VS) at t is speci�ed. Consider an individual who has not yet been

given a treatment and whose spell has not yet ended, at t. Basically, in a small time interval [t; t+dt), the

probability of treatment is �S(tjx; VS)dt, and the probability of no treatment is 1� �S(tjx; VS)dt. This is

a Bernoulli trial. Given the value of �S(tjx; VS)dt, its outcome is completely random.

15

corrected by exploiting the information in the competing risks part of the data. The

competing risks part of the model determines the distribution of VY in the two groups at

s and the competing risks part of the data enables the identi�cation of this.

The results in our companion paper Abbring and Van den Berg (2001b) provide some

additional intuition for the case of a constant treatment e�ect Æ. We show that single-

spell data are informative on the sign of ln(Æ) even if there is no variation in X. If

treatment and outcome are typically realized very quickly after each other, no matter

how long the elapsed duration in the state of interest before the treatment, then this is

evidence of a positive causal treatment e�ect. The selection e�ect does not give rise to

the same type of quick succession of events. If there is no such pattern but yet there is a

correlation between the points in time of treatment and outcome then this is evidence of a

dominating selection e�ect. Somewhat loosely, local dependence indicates a causal e�ect

whereas global dependence indicates a selection e�ect. This is not surprising given that

the treatment only works after it has been realized, whereas the unobserved heterogeneity

values a�ect the hazard rates everywhere. This illustrates the usefulness of the information

on the timing of events to assess treatment e�ects. We return to this in Subsection 5.3,

where we make a comparison with di�erence-in-di�erence methods of inference.

3.3 Heterogeneous treatment e�ects

Model 1a excludes unobserved heterogeneity in the treatment e�ect. Recent contributions

to the literature on treatment evaluation are often concerned with treatment e�ects that

may depend on unobservables (see e.g. Heckman, LaLonde and Smith, 1999, for a survey).

The following model allows Æ to depend on unobservables, but, unlike Model 1a, excludes

variation of Æ with S.

Model 1b. The hazard rates of Y j(S;X = x; V ) and Sj(X = x; V ) are given by

�Y (tjS; x; V ) =

(�Y (t)�Y (x)VY if t � S

��(t)��(x)V� if t > S(4)

and

�S(tjx; V ) = �S(t)�S(x)VS; (5)

respectively. V is now a R3+-valued random vector (VY ; V�; VS)

0 distributed with distribu-

tion G� independent of x. The relevant regularity conditions and normalizations of Model

1a hold and

(i). �� : X ! (0;1) is such that ��(x�) = 1 for some a priori chosen x� 2 X .

(ii). The function �� : R+ ! (0;1) has an integral

��(t) :=

Z t

0

��(�)d� <1

for all t 2 R+ . For some a priori chosen t� 2 (0;1), ��(t�) = 1.

16

(iii). G� is such that Pr(V 2 (0;1)3) > 0.

The treatment e�ect,

Æ(tjx; VY ; V�) :=��(t)��(x)V��Y (t)�Y (x)VY

;

indeed depends on unobservables, but not on S. Note that Æ is only well-de�ned for VY > 0

here. We could use the point1 to represent the case VY = 0 but, instead, we focus directly

on identi�cation of �Y , ��, �Y , �� and G�.

First, note that Model 1b embeds the same MPH competing risks model as Model 1a,

so that Proposition 2 again applies. We need some additional assumptions,

Assumption 5. f��(x); x 2 Xg contains a non-empty open interval in R.

Assumption 6. E [V�VS] <1.

These new assumptions are simply standard MPH assumptions (e.g. Elbers and Ridder,

1982) on the embedded model for (Y � S)j(S; Y > S;X = x; V ). After all, this is a MPH

model with hazard ��(y)��(x)V� at time y � S (since S). The �nite-mean Assumption

6 applies to V�VS rather than V� because of the conditioning on S.

Proposition 4. Under Assumptions 3{6, �Y , ��, �S, �Y , ��, �S and G in Model 1b

are identi�ed from fQY ; QSg.

The results in this section require separability assumptions, independence of X and V ,

suÆcient variation in the observed covariates and �niteness of the means of V . Although

it is common to make such assumptions in applied duration analysis, they are sometimes

hard to justify (see Subsection 5.2 for an economic-theoretic perspective on the propor-

tionality assumption). To deal with this, we extend our analysis in two directions. In the

next subsection we show that observed covariates, and most of the assumptions above,

are not needed for identi�cation if we have multiple-spell data. In Abbring and Van den

Berg (2001b) we show that single spells are informative on aspects of Æ and G even if

there is no variation in X.

4 Identi�cation in case of multiple-spell data

4.1 Multiple-spell data

In this section, we study identi�cation from data that cover multiple outcome and treat-

ment spells for each individual. We focus on the case in which the data provide information

on the distribution of two full spells over the population of individuals. The extension to

more than two spells is trivial. Actually, the use of the term \individual" is not very ap-

propriate here, because the setup includes cases in which physically di�erent individuals

17

share the same value of V and we observe one duration for each of these individuals. It is

convenient to refer to either such a group of individuals or a single individual for which

we have multiple spells as a stratum. For each stratum, we observe a random draw from

the joint distribution of (Y1; fNS1(y); 0 � y < Y1g; Y2; fNS2(y); 0 � y < Y2g). We add

subscripts 1 and 2 to random variables and functions corresponding to respectively spell

1 and 2 in a stratum. So, Y1 is the outcome duration in the �rst spell, Y2 the outcome

duration in the second spell in a stratum, etcetera. By analogy to (1), the multiple-spell

data distribution can be characterized by

QSS(y1; y2; s1; s2) := Pr(Y1 > y1;Y2 > y2; S1 > s1;S2 > s2; Y1 > S1; Y2 > S2);

QY S(y1; y2; s2) := Pr(Y1 > y1;Y2 > y2; S2 > s2; S1 > Y1; Y2 > S2);

QSY (y1; y2; s1) := Pr(Y1 > y1;Y2 > y2; S1 > s1; Y1 > S1; S2 > Y2)

and

QY Y (y1; y2) := Pr(Y1 > y1;Y2 > y2; S1 > Y1; S2 > Y2)

for all (y1; y2; s1; s2) 2 R4+ . Throughout this section we suppress observed covariates. The

identi�cation results do not require variation in observed covariates. Moreover, the model

allows for full interaction of such covariates with all other determinants. All results can

be interpreted as being conditional on observed covariates.

Unlike single-spell data, multiple-spell data form a true, non-linear panel. As with

linear panel data, the panel structure can be exploited to deal with unobserved hetero-

geneity under conditions that are mild relative to the single-spell (i.e. cross-sectional)

case. Intuitively, we need some separability of the outcome hazards in treatment e�ect

and unobserved covariate components. Then, if the unobserved components are constant

between consecutive spells of each single individual, variation between spells and within

individuals can be used to control for selection e�ects and identify the treatment e�ects.

In the next subsection, we study the multiple-spell analogues of Models 1a and 1b.

These models allow for di�erent baseline hazards and treatment e�ects between spells

within a stratum, but have multiplicative unobserved heterogeneity in the hazards. In

the vein of the literature on identi�cation of multiple-spell duration models (Honor�e,

1993), we explore identi�ability of fully speci�ed bivariate models. In Subsection 4.3, we

focus on identi�cation of the treatment e�ect in models that only specify the conditional

distribution of the outcome spells. Strati�ed partial likelihood arguments are applied to

prove identi�cation of the treatment e�ects in a model that allows for interactions between

duration e�ects and unobserved heterogeneity.

4.2 Identi�cation of the full model

Consider the following multiple-spell analogues of Models 1a and 1b,

18

Model 2a. Multiple spells within a stratum are only dependent through the covariates V ,

i.e. (Y1; S1)??(Y2; S2)jV . The hazard rates of Ykj(Sk; V ) and SkjV are

�Yk(tjSk; V ) =

(�Yk(t)VY if t � Sk

�Yk(t)Æk(tjSk)VY if t > Skand �Sk(tjV ) = �Sk(t)VS; k = 1; 2;

respectively. V := (VY ; VS) has distribution G. Regularity conditions as in Model 1a apply.

For some a priori chosen t� 2 (0;1), �Y1(t�) = �S1(t

�) = 1 (in obvious notation).

and

Model 2b. Same as Model 2a except that

�Yk(tjSk; V ) =

(�Yk(t)VY if t � Sk

��k(t)V� if t > Sk

; k = 1; 2;

and V := (VY ; V�; VS) has distribution G�. For some a priori chosen t� 2 (0;1),

��1(t�) = 1.

The �rst line excludes any causal e�ects across di�erent spells within a stratum. Outcomes

and treatments may be related between spells, but only through common unobservable

components. The normalizations are innocuous because the unobserved covariates can

capture the scale of �rst-period hazards. In both models, the same unobservables enter

the �rst and second period hazards multiplicatively. This suggests that we can exploit the

panel structure of the data, even though the baseline hazards and treatment e�ects are

not speci�ed to be the same in the �rst and second spell of a stratum. Indeed, we have

Proposition 5. �Y1, �Y2, �S1, �S2 and, respectively, �1, �2 and G in Model 2a and

��1, ��2 and G� in Model 2b are identi�ed from (QSS; QY S; QSY ; QY Y ).

The proof exploits that both models embed a two-spell version of the competing risks

model with hazards that are proportional in V and t. Abbring and Van den Berg (2001a)

show that �Y1 , �Y2 , �S1 and �S2 (but not G) in Model 2a or 2b are identi�ed. Unlike the

single-spell case, we do not invoke a competing risks result that also provides identi�cation

of G. This would impose additional conditions on the model that we can circumvent here

by exploiting the additional (non-competing-risk) information in (QSS; QY S; QSY ; QY Y ).

Note that we do not need �nite E [V ]. Also, recall that we implicitly allow observed

covariates X to enter the model in a general way. We could add X as an argument to

each of the model components in Models 1a and 1b and allow V and X to be dependent.

The gain (in terms of identi�cation) from using multiple-spell data is similar to the gain

from using such data for MPH models without treatment e�ects (Honor�e, 1993).

19

4.3 Identi�cation and estimation from a strati�ed partial likeli-

hood

The results in the previous subsection show that multiple-spell data allow for robust

inference on the treatment e�ects under the assumption of proportional unobserved het-

erogeneity that is common within strata. We have provided some intuition by pointing out

the analogy with linear panel regression models with individual-speci�c e�ects. Provided

that there is suÆcient variation of the regressors over time, the \within estimator" can

be used to estimate the regression parameters in such models. This analogy seems to be

tailored to an analysis of the model of Y j(S; V ) only, i.e. without specifying the marginal

distribution of (S; V ). In this subsection, we analyze identi�cation of treatment e�ects in

such a conditional model. We use arguments based on an application of Cox (1972) partial

likelihood to strati�ed data pioneered by Holt and Prentice (1974). This \strati�ed partial

likelihood" (SPL) estimator is indeed the closest we can get to a \within" estimator in a

proportional hazard model.

First, we have to slightly adapt notation. The SPL method allows for general interac-

tions between time and unobserved covariates. Therefore, we replace VY by a (for now)

stochastic process fVY g. It is implicitly understood that fVY g replaces VY in V . We have

Model 2c. Multiple outcome spells within a stratum are only dependent through the co-

variates V and related treatments S1 and S2, i.e. Y1??(Y2; S2)j(S1; V ) and Y2??(Y1; S1)j(S2; V ).

The hazard rate of Ykj(Sk; V ) is

�Yk(tjSk; V ) =

(�Yk(t)VY (t) if t � Sk

�Yk(t)Æk(tjSk)VY (t) if t > Sk; k = 1; 2:

Regularity conditions as in Model 1a apply. In particular �Yk(�)VY (�) is integrable on

bounded intervals. We normalize �Y1 = 1.

The unobserved heterogeneity fVY g only enters hazard rates at t through its current

value VY (t). It is crucial that fVY g is the same between the spells in a stratum. The spell-

speci�c baseline hazards �Y1 and �Y2 allow for di�erent duration dependence patterns

between the spells. The normalization �Y1 = 1 is innocuous, as fVY g can absorb the

duration dependence and scale of �Y1 .

Let Y(1) := minfY1; Y2g be given. Intuitively, the probability that, say, the �rst spell

in the stratum is the shortest, i.e. Y1 = Y(1), does not depend on the factor fVY g that is

common between the spells in the stratum. It only depends on the duration dependence

factor �Y2 and the treatment e�ects, which may be di�erent between the spells if the

20

treatment histories fNS1(�); 0 � � < Y(1)g and fNS2(�); 0 � � < Y(1)g) are. Formally,

Pr(Y1 = Y(1)jY(1);fNS1(�); 0 � � < Y(1)g; fNS2(�); 0 � � < Y(1)g); V )

=Æ1(Y(1)jS1)

NS1(Y(1)�)

Æ1(Y(1)jS1)NS1

(Y(1)�) + �Y2(Y(1))Æ2(Y(1)jS2)NS2

(Y(1)�)

=

8>>>>>>>><>>>>>>>>:

11+�Y2 (Y(1))

if S1; S2 > Y(1)

Æ1(Y(1)jS1)

Æ1(Y(1)jS1)+�Y2 (Y(1))if S1 < Y(1) < S2

11+�Y2 (Y(1))Æ2(Y(1)jS2)

if S1 > Y(1) > S2

Æ1(Y(1)jS1)

Æ1(Y(1)jS1)+�Y2 (Y(1))Æ2(Y(1)jS2)if S1; S2 < Y(1):

(6)

The right-hand side does not depend on V . Indeed, the only unknown components are

the second period baseline hazard, which captures the duration-dependence di�erence

between spells and the treatment e�ects. For the �rst sub-population of strata in (6),

with S1; S2 > Y(1), the only reason the spells may not be equally likely to be the shortest

is a di�erence in the baselines at Y(1). Treatment e�ects are irrelevant because in both

spells treatment has not started at time Y(1). The second and third sub-populations are

directly informative on the treatment e�ects as the treatment status of the spells at Y(1) is

di�erent. Finally, the strata in which S1; S2 < Y(1) provide some more di�use information

on the parameters because the treatment e�ect enters for both spells.

Because V does not enter the right-hand side of (6), the same expression is valid for

Pr(Y1 = Y(1)jY(1); fNS1(�); 0 � � < Y(1)g; fNS2(�); 0 � � < Y(1)g)), which is data. So, we

can �rst identify �Y2 from data on the �rst sub-population of strata and then �1 and �2

from the second and third subpopulations, respectively. So, we have

Proposition 6. �Y2, �1 and �2 in Model 2c are identi�ed from (QSS; QY S; QSY ; QY Y ).

We �nish by exploring the relation to SPL estimation. Suppose we have a random

sample of n strata f(Y1;i; fNS1;i(y); 0 � y < Y1;ig; Y2;i; fNS2;i(y); 0 � y < Y2;ig); i =

1; : : : ; ng from (Y1; fNS1(y); 0 � y < Y1g; Y2; fNS2(y); 0 � y < Y2g). First, consider a

special case of Model 2c with �Y2 = 1 and Æ1 = Æ2 = Æ for some constant Æ 2 (0;1), so

that the outcome hazards for stratum i are

�Yk;i(tjSk;i; Vi) = exp�ln(Æ)NSk;i(t�)

�VY;i(t); k = 1; 2: (7)

Here, it is irrelevant that fVY;ig is a stochastic process. We can simply treat each VY;i(t)

as an incidental parameter and VY;i as a stratum-speci�c baseline function. A partial

likelihood can be based on the ranks of the outcome spells within strata. Equation (6)

delivers

Pr(Y1;i = Y(1);ijY(1);i;fNS1;i(�); 0 � � < Y(1);ig; fNS2;i(�); 0 � � < Y(1);ig))

=1

1 + exp�ln(Æ)

�NS2;i(Y(1);i�)�NS1;i(Y(1);i�)

�� : (8)

21

Clearly, a partial likelihood contribution based on (8) is only informative on Æ ifNS2;i(Y(1);i�) 6=

NS1;i(Y(1);i�), i.e. if exactly one of the two spells in the stratum has been treated by the

time Y(1);i. With S(1);i := minfS1;i; S2;ig and S(2);i := maxfS1;i; S2;ig, this gives

# strata such that with SPL contribution

M1 S(1);i < Y(1);i < S(2);i Æ(1 + Æ)�1

M2 S(1);i > Y(1);i > S(2);i (1 + Æ)�1

The �rst row corresponds to the strata in which the shortest outcome spell is the spell

that has been treated by Y(1);i. We denote the number of these strata in the sample by

M1. The second row corresponds to the M2 strata in which the longest outcome spell is

the one treated at Y(1);i. None of the other strata contributes to the SPL based on (8),

which therefore is proportional to�Æ

1 + Æ

�M1�

1

1 + Æ

�M2

:

Thus, we have

Proposition 7. The maximum SPL estimator bÆ of the treatment e�ect in Model 2c with

�Y2 = 1 and Æ1 = Æ2 = Æ for some Æ 2 (0;1) only uses strata in which exactly one of the

two spells has been treated at the time at which the spell with the shortest outcome ends.

Speci�cally,

bÆ = M1

M2=

number of strata with S(1);i < Y(1);i < S(2);i

number of strata with S(2);i > Y(1);i > S(1);i:

This is intuitively plausible. First, suppose that as many treated as untreated spells are

shortest among the strata in which exactly one of the spells has been treated by Y(1);i.

Then, there is no evidence that treatment has any e�ect and bÆ = 1. On the other hand,

if the treated spell is more often the shortest than the longest in the relevant strata,

treatment is likely to increase the outcome hazard and bÆ > 1.

Holt and Prentice (1974) discuss a proportional hazard model with time-varying co-

variates Xk;i(t) and hazard rate

�i(tjfXk;ig) = exp (Xk;i(t)0�)Vi(t) (9)

for spell k in stratum i. We attach the subscript i to the hazard instead of conditioning on

fVig because the Vi(t) are treated as incidental (stratum-speci�c) \parameters". In this

model, � can be estimated by maximizing a SPL based on within-stratum ranks (see also

Kalb eisch and Prentice, 1980, and Ridder and Tunal�, 1999, for a more formal analysis

with ample attention for practical problems like censoring). Our simple model (7) is a

special case of (9) with � = ln(Æ) and Xk;i(t) = NSk;i(t�). The estimatorbÆ in Proposition

22

7 is a special case of the SPL estimator of �. By analogy to the general SPL method, we

can treat fVig as an incidental \parameter" instead of a stochastic process.

Our estimator bÆ is interesting by itself because of its intuitive and simple form, which

is due to the special structure of the treatment e�ects in our model. From Proposition

6 it is clear that our estimator can be extended to allow for spell-speci�c baselines and

treatment e�ects that depend on t and S, at the cost of its simple structure. It is also

interesting that we can allow for di�erent treatment e�ects between spells in a stratum

even though identi�cation is based on contrasting outcomes between spells within strata.

By Assumption 1, treatment in one spell can always be contrasted with no treatment

(yet) in the other spell. It should be noted that a model with spell-speci�c treatment

e�ects and baselines requires that we can unambiguously order spells in each stratum.

Such an order is naturally provided by the chronology of the consecutive spells of a single

individual, but seems absent in, for example, a stratum consisting of a pair of twins.

5 Discussion

This section serves three purposes. First, we discuss the applicability of the model frame-

work as a reduced-form model. Secondly, we address the extent to which the proportional

speci�cation of the hazard rates at the individual level can be founded on economic-

theoretical grounds. Thirdly, we relate our results to the methodological literature on

treatment e�ects. Speci�cally, we compare our model framework and its identi�cation to

popular limited-dependent variable and panel-data model frameworks and their identi�-

cation.

5.1 Applicability

We start this subsection by brie y repeating some distinguishing properties of the model

as a reduced-form model of treatment e�ects in economics. We use this as a guideline

for a characterization of the type of situations for which our model provides a reasonably

accurate description of the individual treatment e�ect. After that, we give some concrete

examples. Throughout this subsection we take the proportionality assumptions as given.

Individuals can be aware of the existence of the treatment assignment process and we

allow them to in uence this process as well as the rate at which they leave the state of

interest.11 However, individuals do not use knowledge of the point in time at which they

receive treatment or the point in time at which they leave the state of interest.12 We do

11For example, the individuals may know that �Y (t) increases in the near future and modify their

strategy accordingly, which may a�ect their �S . The latter can be captured in the model by �S(t).12If the time span between the point in time at which anticipation of the treatment would occur

and the point in time of the actual treatment is short, relative to the durations S and Y � S, and if

the anticipatory e�ect is not very large, then estimation results may be rather insensitive to the no

23

not assume that the analyst observes determinants of the treatment assignment process

that the individual does not use himself to update his strategy. Finally, from the point

of view of the individual there is a purely random component in the duration until the

actual treatment. In practice, this may re ect behavior of the institution that supplies

or imposes the treatments or it may re ect purely random external shocks (see below for

examples).

It follows that the model is typically not suitable for situations in which the individual

himself has permanent full control over the assignment of treatment. In such situations, the

moment of treatment is a deterministic function of the (determinants of the) individual's

strategy, so that there is no random component in the duration until that moment. In

addition, in such situations the moment of treatment is by de�nition anticipated. Our

model is therefore more suitable for situations in which, from the perspective of the

individual, there is a surprise element in the occurrence of a treatment. The treatment

process may capture self-selection behavior on the basis of observable and unobservable

determinants of costs and bene�ts of the treatment.

Of course, as long as there are no multiple observations on the outcome of interest for

given unobserved determinants, all such considerations equally apply to other methods for

inference of treatment e�ects in the presence of selectivity. In general, some independent

randomness in treatment assignment is needed and anticipation is (often tacitly) ruled

out.13 An advantage of our dynamic analysis is that it highlights these issues.

We now discuss a number of applications, partly from empirical studies in which

treatments are analyzed with models similar to our Model 1a.

Example 1. Punitive sanctions for unemployment bene�ts recipients.

This has already served as an example in Subsection 2.4. Abbring, Van den Berg and Van

Ours (1997) and Van den Berg, Van der Klaauw and Van Ours (1998) analyze the e�ect

of sanctions on re-employment rates by estimating models similar to Model 1a. They

argue that imperfect monitoring and discretionary powers of the bene�ts agency give rise

to random components in the imposition of a sanction and that these are unpredictable

before a sanction is actually imposed. They support this with results from �eld research

among the case workers who decide on the imposition of a sanction. It is plausible that

individuals are aware of the relation between their behavior and their sanction risk. Both

the re-employment rate and the rate at which a sanction is imposed are a�ected by the

individual's strategy and all of its determinants. Note that sanctions can be supplemented

with compulsory training.

anticipation assumption.13For example, many standard treatment evaluation studies su�er from a potential bias due to anticipa-

tory e�ects. This includes studies using \di�erence-in-di�erence" methods where one \di�erence" concerns

a comparison between pre- and post-treatment circumstances (see Heckman, LaLonde and Smith, 1999,

for an overview).

24

Example 2. The hiring of replacement workers during strikes.

From the point of view of the employees who are on strike, Y is the duration of the

strike and S is the duration until the employer hires replacement workers. This is another

example where the treatment is a punishment. Punishments occur in a world with im-

perfect monitoring and in general the individuals do not know in advance if and exactly

when they are punished. However, the rate at which replacement workers are brought in

depends on characteristics of the con ict, including the behavior and characteristics of

the union which manages the strike. Also, the strike duration is a�ected by the hiring

of replacement workers. Cramton and Tracy (1998) survey the existing theoretical and

empirical literature on these issues.

Example 3. Labor market transition of spouse.

Consider an individual and spouse who have jobs in the same city. Suppose that the spouse

leaves her job in order to accept another one in another city. This will a�ect the individual's

own transition rate to another job. Let, from the point of view of the individual, Y denote

his own job duration and S the spouse's job duration. It is obvious that the (unobserved)

determinants of both job durations are related. Moreover, the realization of S is not fully

predictable because it depends on the inherent randomness in the process according to

which workers and employers meet and form matches. Note that this is an example in

which the symmetric extension of the model can be applied.

Example 4. Marriage, child birth and death.

Lillard (1993) estimates a model for the joint durations of marriage and time until con-

ception of a child, allowing the rate at which the marriage dissolves to change in response

to child birth. Lillard and Panis (1996) estimate a model on the joint durations of mar-

riage, non-marriage, and life, allowing the death rate to change in response to marriage

formation and dissolution. The model allows for selection on the basis of unobservables. It

does not capture that individuals know some time in advance when they marry or when

a child is born. However, the time span between the moment at which the anticipation

occurs and the moment of the actual treatment may be short relative to the durations

between the observed events.14

Example 5. Training of unemployed workers.

This example is similar to Example 1, albeit that here the worker often has a larger in u-

ence on whether the treatment occurs. However, if the training program is in short supply

then the availability of a slot depends on the labor market behavior of other (potential)

applicants and this is random from the point of view of the individual. Card and Sullivan

(1988) and Eberwein, Ham and LaLonde (1997) allow the exit rate out of unemployment

14Hougaard, Harvald and Holm (1992) study the joint distribution of life times of twins. They estimate

two models: the Freund (1961) model, which allows for a causal e�ect of the death of one twin on the

hazard of the other, and a model that does not allow for such a causal e�ect but that allows for identical

unobserved determinants.

25

to shift to another level at the moment of entry into a training program. They allow for

selection on the basis of unobservables, but they do not allow for anticipation.15

Example 6. Stepping-stone jobs.

Consider an individual who wants to obtain a highly desirable type of job that is in short

supply. The chances of the individual may be higher if he has some work experience in

another (less desirable) type of job, which is also in short supply. During the search for

the highly desirable job the individual may accept such a less desirable job if he �nds one.

Then, from the point of view of the individual, Y is the duration until the highly desirable

job is found and S the duration until the less desirable job is found. Like in Example 3, it

is obvious that the (unobserved) determinants of both search durations are related. Also,

the realization of S is not fully predictable because it depends on the inherent randomness

in the process according to with workers and employers are matched. For an empirical

example see Van den Berg, Holm and Van Ours (2001), in which the \highly desirable

job" is a slot in the educational program to become a medical specialist. To enter this

program, the student must follow an application procedure. The less desirable job is a job

as a medical assistant.

5.2 Theoretical foundation of proportionality

The MPH model and its special cases and extensions are often regarded to be useful

reduced-form models for duration analysis. However, the MPH model speci�cation and

our treatment e�ect speci�cation are not derived from economic theory. Let us consider

the model of Subsection 3.1. The determinants of the hazard rates act multiplicatively

on them. Of course, the distinction between two of the determinants (the observed and

unobserved explanatory variables) is only relevant from an empirical point of view. From

a theoretical point of view, the most relevant assumptions of the model are that the

elapsed duration, the explanatory variables and the treatment act multiplicatively on the

hazard rate of Y and that the �rst two act multiplicatively on the hazard rate of S. In

this subsection we therefore ignore unobserved heterogeneity.

In theoretical models the hazard rate �Y of Y depends on the individual's strategy.

Typically, the optimal strategy involves a comparison of the present value of leaving

the state of interest to the present value of staying. These present values depend on

the current and future values of the structural parameters in a highly non-linear way.

15In the applied literature on the e�ects of training on unemployment durations \training" is sometimes

seen as a separate labor market state. The fact that one has had any training may a�ect subsequent

transitions. See Gritz (1993) and Bonnal, Foug�ere and S�erandon (1997) for examples. In order to deal with

selectivity of those who enrol in training, they allow for related unobserved heterogeneity terms a�ecting

entry into training as well as the other transition rates. The models are estimated with multiple-spell data.

Ham and LaLonde (1996) use experimental data to investigate the e�ect of training on unemployment

durations.

26

As a result, �Y has a complicated non-linear relation to t, x and the treatment even if

the structural parameters are multiplicative in these determinants. Van den Berg (2001)

shows this in detail for the relation between search models and the MPH model without

treatment e�ects. Here, we discuss conditions under which a multiplicative speci�cation

of the structural parameters carries over to the implied hazard �Y .

The theoretical expression for �Y simpli�es considerably if the individual's strategy

is myopic, i.e. if it amounts to a comparison of the instantaneous utility ows of the

di�erent options. Cameron and Heckman (1998) and Van den Berg (2001) use this to

obtain MPH-type speci�cations for the hazard rate in theoretical models for a single

duration variable. Here we follow this line in the context of models with a treatment

e�ect and with treatment assignment.

Intuitively, the optimal strategy for exiting the current state is myopic if the individual

either does not throw away any option by making a transition or does not care about the

future because of an in�nite discount rate. For both of these cases the Appendix provides

detailed examples. In all of these examples, we consider job search models where the

transition rate out of the current state equals the product of the job o�er arrival rate

and the probability that a wage o�er exceeds the reservation wage. The reservation wage

equals the current instantaneous income ow if the optimal strategy is myopic.

In the �rst set of examples, the optimal strategy is myopic because search is repeated,

i.e. the individual is able to search further for better matches after a match has been

formed and no options are thrown away by a transition. The individual's search environ-

ment is subject to treatments that are not worker-speci�c or job-speci�c but that instead

act similarly on all workers (e.g. aggregate shocks). Here it does not matter whether such

changes are known in advance, because the optimal strategy does not depend on that.

In addition, there may be unanticipated treatments that are worker-speci�c but not job-

speci�c in the sense that the moment of occurrence is unrelated to the actual job held at

the time. We include an example that theoretically underpins Example 3 (labor market

transition of spouse) of the previous subsection.

In the second set of examples we assume sequential search and in�nite discounting.

Sequential search models do not allow the economic agent to search further for better

matches after a match has been formed. We focus on unemployed individuals that may

participate in a training program that increases their job o�er arrival rate (recall Example

5 from the previous subsection). Again, it does not matter whether participation is known

in advance. The speci�cation of the hazard rate of the duration S is determined by pro�ling

of unemployed workers by case workers.

We conclude that, under some conditions, the multiplicative speci�cation follows from

economic-theoretical models. In the examples, the optimal strategy is not a�ected by

information on the moment of future treatment. In the terminology of Section 2, there

is no anticipation. In this sense, Assumption 1 (\no anticipation") and the MPH model

framework are a natural combination.

27

The economic justi�cation of other popular reduced-form duration model speci�cations

is at least as diÆcult as the justi�cation of the (M)PH speci�cation. This applies in

particular for the Accelerated Failure Time model, in which the mean of ln(Y ) is speci�ed

as a linear function of x, so that ln(Y ) = �x0� + �, and the additive hazard model, in

which �Y (tjx) is speci�ed as �Y (t) + �Y (x). Discrete-time reduced-form duration model

speci�cations are also diÆcult to justify and often do not follow from underlying economic

models such as discrete-time search models or dynamic discrete-choice models.

5.3 Comparison with identi�cation in limited-dependent vari-

able and panel-data model frameworks

5.3.1 Limited-dependent variable model frameworks

We start with the identi�cation of the treatment e�ect in the prototypical limited-dependent

variable model with treatments (see Maddala, 1983, for an overview). We restrict atten-

tion to the relations between the mean of the endogenous variables on the one hand and

the explanatory variables on the other. This is in line with the spirit in which these models

are interpreted.16 The exposition is kept rather informal.

Consider

Y = x0Y �Y + x0Y Æ0 � I(Z > 0) + "Y

Z = x0Z�Z + "Z(10)

where Y , Z, "Y and "Z are continuous random variables, Z > 0 indicates treatment and

x0Y Æ0 is the treatment e�ect. We assume that inference is based on a random sample of

subjects with information on Y , xY , xZ and I(Z > 0) for each subject. Furthermore,

E ["Y jxY ] = E ["Z jxZ ] = 0. We take the parameter �Z to be identi�ed from the data

on I(Z > 0) and xZ . It is generally acknowledged that either exclusion restrictions or

parametric functional-form assumptions on the joint distribution of ("Y ; "Z) are required

for identi�cation of the treatment-e�ect parameter Æ0. Exclusion restrictions state that

some covariates in xZ with non-zero parameters in �Z are not included in xY . Recall that

these restrictions are often hard to justify. Similarly, it is usually hard to �nd convincing

arguments in favor of a particular functional form of the distribution of ("Y ; "Z).

Now suppose that we do not observe Y but only I(Y > 0). This is the \binary treat-

ment { binary outcome" speci�cation. For example, Z > 0 may indicate whether an

unemployed individual has participated in a training program, and Y > 0 may indicate

whether he has found a job. The binary speci�cation for Y e�ectively reduces the infor-

mation in the data, so identi�cation needs at least as many assumptions as above (see

Cameron and Heckman, 1998, for results).

16See e.g. Heckman (1990) for identi�cation results based on full information.

28

To make comparisons with our Model 1a, it is useful to rewrite the latter as a

regression-type model. Again using that the integrated hazard of a duration variable

evaluated at that random duration has a unit exponential distribution we can write

ln ([�Y (minfY; sg) + I(Y > s)�(Y js; x)]) = � ln(�Y (x))� ln(VY ) + ln(EY ) (11)

ln(�S(S)) = � ln(�S(x))� ln(VS) + ln(ES); (12)

with EY and ES unit exponential random variables such that

EY ?? ES; and (EY ; ES) ?? (X; VY ; VS)

but VY and VS possibly dependent. Note that ES captures the random component in the

treatment assignment that we discussed in Subsection 3.2.

We conclude, �rst of all, that the proportionality assumption is similar to the additiv-

ity assumption that is made in regression equations. Both impose some \smoothness" by

excluding certain interactions of time and explanatory variables at the individual level.

Secondly, let, in the above limited-dependent variable model, Y be a (log) duration vari-

able. The most fundamental di�erence between that model and Model 1a concerns the

fact that the latter incorporates the timing of the treatment whereas the former does

not. Since we do not need exclusion restrictions to identify Model 1a, it follows that the

information on the timing of events is very useful.17

Using limited-dependent variable models or discrete-time panel-data models when Y

is in fact a (function of a) duration variable also leads to a number of intractable practical

problems. First, it requires aggregation over time. Often, it is recorded whether a treat-

ment occurs in a baseline period, and it is recorded whether the outcome (i.e. the duration

variable) is realized in the subsequent period. Then, the question arises what to do when

the treatment and the outcome occur in the same period. In addition, it is not clear how to

deal with observations that are right-censored before the end of the observation window.

It is common that a spell in a state of interest can end in di�erent ways. Usually, only

a subset of these are deemed interesting. For example, an unemployment spell can end

because of a transition to work, but also because of a transition into education, military

service, etcetera. A study of the transition rate to work may treat transitions to other

destinations as independently right-censoring the duration until work. The treatment ef-

fect estimate may be biased if such observations are discarded or if they are treated as

observations of exit to work.

17There is a very remote similarity to an approach in Vella and Verbeek (1997) to identify a treatment

e�ect in a Heckman selection model without exclusion restrictions. They decompose the error term of

the latent equation into a component that basically captures the rank order of the error term and a

completely independent residual component and assume that the selection e�ect only works by way of

the �rst component.

29

5.3.2 Panel data model frameworks

Consider the dynamic panel-data model

Wt = x0W;t�W + x0W;tÆ0 � I(Zt > 0) + VW + �W;t

Zt = x0Z;t�Z + VZ + �Z;t;(13)

where the index t denotes time. We deliberately introduce new notation Wt for the out-

come because we will relateWt to duration outcomes in two di�erent ways later. Inference

is based on a random sample of subjects with information on Wt, Wt�1, xW;t, xW;t�1, xZ;t,

xZ;t�1, I(Zt > 0) and I(Zt�1 > 0) for each subject. We take �j;t to be i.i.d. across time and

across j = W;Z, so that the spurious dependence between treatment and outcome (the

\selection e�ect") runs by way of the relation between VW and VZ . We take E [�j;t jxj] = 0,

where for sake of brevity we use xj := xj;t; xj;t�1. We can treat VW as a �xed e�ect by

taking �rst di�erences of Wt across time,

E [Wt �Wt�1jZt�1 � 0; Zt > 0; xY ; xZ ] = (xW;t � xW;t�1)0�W + x0W;tÆ0: (14)

Clearly, Æ0 is identi�ed.

Now suppose that �W varies over time. The equivalent of the right-hand side of equa-

tion (14) equals x0W;t�W;t � x0W;t�1�W;t�1 + x0W;tÆ0. As a result, Æ0 is unidenti�ed from

E [Wt �Wt�1jZt�1 � 0; Zt > 0; xW ; xZ ]. A further \di�erence-in-di�erence" however gives

E [Wt �Wt�1jZt�1 � 0;Zt > 0; xW ; xZ]

� E [Wt �Wt�1jZt�1 � 0; Zt � 0; xW ; xZ ] = x0W;tÆ0:(15)

Clearly, Æ0 can be allowed to depend on t.

We can apply this panel setup to our problem in two ways. We can either (i) let Wt

be a survival indicator for a single outcome duration Y and set Wt = NY (t�) or (ii)

relate each Wt to a separate outcome duration Yt. Option (i) suggests a link between a

single-spell duration model and a multi-period panel-data model. However, this apparent

similarity is deceptive. In the case of (i),Wt�1 = 1 =) Wt = 1, which is not necessarily so

in the general case of speci�cation (13). The latter speci�cation allows for more variation

in Wt and Wt�1 given Zt�1; Zt; xW ; xZ . This variation distinguishes panel-data models

from our duration model. As a result, we cannot apply (14) and (15). In our companion

paper Abbring and Van den Berg (2001b) we develop a somewhat related, but much

more involved, approach for our model. We require that the unobserved explanatory

variables are independent of the observed explanatory variables and we have to specify the

treatment assignment process. All this is not required in the panel-data model framework.

Now consider option (ii). Let Zt indicate whether a treatment is given during the

t-th spell. This interpretation allows for a straightforward comparison with our multiple-

spell duration model. In both cases we observe multiple outcomes for an individual with

the same unobserved covariates. The SPL estimator developed in Subsection 4.3 is the

30

analogue of the �xed-e�ect panel-data estimator based on (15). Neither of these requires

any observed explanatory variables or a speci�cation of the treatment assignment process.

In conclusion, we can identify the treatment e�ect without exclusion restrictions or

parametric functional-form restrictions on the distribution of the unobservables, even

though the observational scheme in a single-spell duration model with treatments is more

restrictive than in panel-data models. Thus, with treatments, duration data are more

informative than the data used for standard limited-dependent variable models. In com-

parison to standard panel-data models, the single-spell model requires independence of

the unobserved and observed covariates. Furthermore, the treatment assignment process

has to be speci�ed. This can be remedied by using multiple-spell duration data.

Appendix.

Formal derivations for Subsection 2.4

Consider a standard partial search model (e.g. Mortensen, 1986). Job o�ers arrive at a rate � � e, where

e � 0 is the intensity of search and � > 0 is a structural parameter of the model. The corresponding

ow of search costs is denoted by c(e) and the wage o�er distribution is denoted by F . An individual

initially receives unemployment bene�ts b0 while searching, but these are reduced to b1 < b0 at some time

S. Utility is derived from consumption only. We abstract from any saving, so that consumption equals

income net of search costs at each point in time. We assume that utility is intertemporally separable,

with instantaneous utility function u and discount rate � > 0.

First, assume that individuals observe their individual realization of S at time 0. We consider two

simple alternative versions of this �rst model. Both lead to the same expression of the optimal re-

employment hazard rate. The �rst version is in the spirit of Mortensen (1977), and has an F that is

degenerate at some wage w > b0, search costs c(e) = e2=2 and linear instantaneous utility u(x) = x.

In this version, individuals only choose search e�ort to maximize their discounted ow of income. The

second version has e exogenously �xed at 1, search costs c(1) = 0 and logarithmic instantaneous utility

u(x) = ln(x). Furthermore, it has a dispersed log-uniform wage o�er distribution, with F (w) = 0 if w � �

and F (w) = ln(w=�)= ln(�=�) if � < w < �, with b0 � � < � <1 (Van den Berg, 1990). In this version,

the individuals only have to decide when to accept or reject job o�ers. In either case, the resulting hazard

rate �Y �(s)(y) at time y, if bene�ts are reduced at time s, can be expressed as follows. For y 2 [s;1), it

equals �1 := ��+p�2 + 2 (b1). Here, the function : R ! R takes a form speci�c to either version. We

are only interested in some properties of that hold in both models: � 0, 0 < 0, and (w) = 0, where

w equals the highest wage in the support of F (i.e., w = w in the �rst model and w = � in the second

model). For y < s, �Y �(s)(y) satis�es the di�erential equation

�0Y �(s)(y) = � (b0) + ��Y �(s)(y) +1

2�Y �(s)(y)

2

(see also Van den Berg, 1995). Continuity at s applies because there is no shock in the individual's

information set or the search technology parameters at s.

As a speci�c example, we take b0 and b1 such that (b0) = 0, (b1) = (2�+ 1)=2, and we take � # 0,

which gives

�Y �(s)(y) =

(2

2+s�y if 0 � y � s and

1 if y > s:(16)

31

Figure 1: Anticipated and unanticipated UI changes: potential hazards

Note: this graph plots �Y �(s) for the cases of anticipated (\perfect foresight") and unanticipated (\no

anticipation") UI changes at times s = 1:5, 2:0 and 2:5.

Figure 1 illustrates the potential hazard rates for s = 1:5, 2:0 and 2:5. Although (16) is only valid for

particular parameterizations of the models, the qualitative features that can be discerned in Figure 1 are

robust.

In the second model, individuals only observe the sanction history fNS(u); 0 � u < yg and not S

itself, at each time y. We assume that sanctions arrive at some exogenously given constant rate that

is known to the individuals. Our intervention involves a manipulation of the sanction time s without

changing this perceived arrival rate of sanctions. At s, the resulting hazard rate jumps from a constant

level �0 to �1.

We return to the �rst model and take S to have the density g(s) = (2 + s)2e�s=10. Recall that we

observe (QS ; QY ). Direct calculations show that

QS(y; s) =2

5(1 +maxf0; y � sg) e�maxfy;sg and QY (y) =

3

5e�y:

Our non-identi�cation example centers around the fact that the Y �(s)-hazard at time y < s cannot

be identi�ed from data on subjects with S = s. At the very best, we can pin down the mixture haz-

ard rateR1y �Y �(s)(y)e

��Y �(s)(y)g(s)ds=R1y e��Y �(s)(y)g(s)ds over the potential hazards corresponding to

treatments s in [y;1). This hazard rate equals

limdy#0

Pr(y � Y < y + dyjY � y; S � y)

dy=

�Q0Y (y)

Q0S(s) +QY (y)

=3

5:

Now note that the same mixture hazard would result if the potential hazards would equal 3=5 for 0 � y � s

for all s. We base our construction of an observationally equivalent model on this idea. Let eY �(s) have

hazard

�eY �(s)(y) =

(35 if 0 � y � s

1 if y > s

and eS have density eg(s) = (2=5)e�(2=5)s. It is easily checked that (fY �g; S) and (feY �g; eS) are indeed

observationally equivalent. Figure 1 illustrates the potential hazards for both equivalent models, and

s = 1:5, 2:0 and 2:5.

32

We end with a note concerning aggregation over values of the structural parameters and the meaning

of Assumptions 1 and 2. The structural models of this section imply potential re-employment rates �Y �(s)

for each sanction time s, given values of the structural \parameters" �, F , � and the sanction rate, say

�S . Suppose that there is heterogeneity in these parameters, which we model by letting �, � and �S be

random variables and fFg a stochastic process. The models above specify

Y �(s) = ��1Y �(s) (EY j�; fFg; �; �S) and S = ��1S ES ;

with EY and ES unit exponential random variables such that EY??ES and (EY ; ES)??(�; fFg; �; �S). As

in the general setup, we have made the dependence of �Y �(s) on the random parameters explicit. We have

that fY �g??Sj(�; fFg; �; �S), so that Assumption 2 is satis�ed with any (X;V ) such that (�; fFg; �; �S)

is measurable with respect to �(X;V ). In the sanctions model without anticipation, Assumption 1 is true

conditional on the parameters and, by implication, conditional on only a subset of parameters. However,

in general Assumption 2 will not hold in the latter case. In general, Assumptions 1 and 2 may hold after

aggregation over some of the parameters observed by individuals, even if there are anticipatory e�ects.

In this sense, the interpretation of Assumption 1 in terms of no anticipation is too strict.

Proof of Proposition 1

Take some (QS ; QY ) as given. By Miller (1977), Theorem 1, there exists some (eS; eY ) such that eS??eY ,eQ0S = Q0

S and eQY = QY . Let

�eY �(s)(y) :=

(�eY (y) if 0 � y � s

�eY (s)� ln [Pr(Y > yjY > S; S = s)] if y > s:

Let E be a unit exponential random variable such that E??eS. Set eY �(s) = ��1eY �(s)

(E) for all s 2 R+.

Then, it is easy to check that (feY �g; eS) also satis�es Assumption 1 and is observationally equivalent to

(fY �g; S).

The theorem by Miller (1977) that is invoked is an extension to general distributions of the non-

identi�cation results by Cox (1962) and Tsiatis (1975) for continuous competing risks models. The proof

and theorem therefore extend to mixed discrete-continuous models.


By Proposition 2, �Y , �S , �Y , �S and G are identi�ed from fQ0S; QY g. For �xed x 2 X ,

@QS (y; sjx)

@s= �S (x)�S (s)L

(S)G (�S (x) [�Y (s) + � (yjs; x)] ; �S (x) �S (s)) (17)

for all y 2 R+ and almost all s 2 R+ such that s < y. LG is the bivariate Laplace transform of G,

LG(z1; z2) :=

Z 1

0

Z 1

0

exp(�z1vY � z2vS)dG(vY ; vS)

and L(S)G (z1; z2) := @LG(z1; z2)=@z2 for all (z1; z2) 2 (0;1)2. For given s and x, the right-hand side of

(17) is an strictly increasing and identi�ed function of � (yjs; x). This identi�es � (yjs; x) and therefore

� (yjs; x) =

Z y

s

@�(� js; x) =@�

�Y (�)d�

for all x 2 X and all y; s 2 R+ such that s < y (using continuity).

33


For �xed x 2 X , @2QS(y; sjx)=@y@s and @2QS(y; sjx�)=@y@s exist for almost all y; s 2 R+ such that

s < y. For these (y; s)

@2QS(y; sjx)=@y@s

@2QS(y; sjx�)@y@s= �S(x)��(x)

L(�S)G�

(�Y (x)�Y (s); ��(x)(��(y)� ��(s)); �S(x)�S(s))

L(�S)G�

(�Y (s);��(y)� ��(s);�S(s)): (18)

LG� is the trivariate Laplace transform of the distribution G� of (VY ; V�; VS) and L(�S)G�

(z1; z2; z3) :=

@2LG�(z1; z2; z3)=@z2@z3 for all (z1; z2; z3) 2 (0;1)3. Letting y # 0 and s # 0, both sides of (18) above

reduce to �S(x)��(x) because E [V�VS ] = limz#(0;0;0) L(�S)G�

(z) < 1. As we have identi�ed �S and the

left-hand side is data, this identi�es ��.

Again for arbitrary x 2 X and y; s 2 R+ such that s < y,

@QS(y; sjx)=@s

�S(s)�S(x)= L

(S)G (�Y (x)�Y (s); ��(x)(��(y)� ��(s)); �S(x)�S(s)); (19)

with L(S)G�

(z1; z2; z3) := @LG�(z1; z2; z3)=@z3. Note that the left-hand side of this equation is already

identi�ed. As s # 0, the right-hand side reduces to

L(S)G (0; ��(x)��(y); 0): (20)

After imposing y = t�, we can identify the completely monotone function �L(S)G (0; �; 0) on a nonempty

open set in R by appropriately varying x in equation (20). This identi�es �L(S)G (0; z; 0) for all z 2 (0;1)

because of the real analyticity of �L(S)G (0; �; 0) (see below; note that this gives the marginal distribution

of V�). Subsequently, �� is identi�ed from (20), because the right-hand side is strictly monotone in ��.

Finally, by appropriately varying x and y in equation (19), we can trace L(S)G on an nonempty open

subset of R3 . This identi�es L(S)G on (0;1)3 because �L

(S)G is real analytic. A proof of these statements

is in the previous version of this paper, which is available (as a 2000 working paper) upon request. With

LG(0; 0; 0) = 1, this identi�es LG.


By Abbring and Van den Berg (2001a), �Y1 , �S1 , �Y2 and �S2 are identi�ed.

(i). For Model 2a, de�ne �k(yjs) :=R ys�Yk (�)Æk(� js)d� , k = 1; 2. Note that

�Y2(t�)

(Z t�

0

�Z y

s

@2 [QSY (�; t; s) +QSS(�; t; s; 0)] =@�@s

@2QSY (�; t; s)=@s@td�

��1dt

)�1

= �1(yjs):

Because we know �Y2(t�) and �Y1 , this identi�es �1. Similarly, we can identify �2 and �2 from

QY S and QSS.

Now, consider

@ [QSY (y; 0; s) +QSS(y; 0; s; 0)] =@s

�S1(s)= L

(S)G (�Y1(s) + �1(yjs);�S1(s));

for y > s. As we know �Y1 , �S1 and �1, we can �nd a non-empty open set in R2 on which we know

L(S)G . The function �L

(S)G is real analytic, so this determines L

(S)G on (0;1)2 (recall the discussion

in the proof of Proposition 4). Together with LG(0; 0) = 1 this identi�es LG.

34

(ii). For Model 2b, t1; s1 2 R such that s1 < t1 and t2; s2 2 R such that s2 < t2Z t1

s1

�Z t2

s2

@3QSS(y1; y2; s1; s2)=@s1@s2@y2@3QSS(y1; y2; s1; s2)=@s1@s2@y1

dy2

��1dy1 =

��1(t1)� ��1(s1)

��2(t2)� ��2(s2):

Setting s1 = s2 = 0 and consecutively t1 = t� and t2 = t� identi�es ��2 and ��1 . Identi�cation

of G� follows as under (i).

Examples of theoretical foundations for Model 1a

We �rst consider models of repeated search. For ease of exposition, we phrase the theoretical model as a

model for on-the-job search for better jobs (see Mortensen, 1986, for an overview). A job is characterized

by its wage w, which is constant within a job. Workers obtain wage o�ers, which are random drawings

from the wage o�er distribution F (w), at an exogenous rate �. Whenever an o�er arrives, the decision

has to be made whether to accept it or to reject it and search further for a better o�er. If the individual's

search environment (i.e. � and F ) is time-invariant (i.e. the model is stationary) then the optimal strategy

is constant during a job spell. In particular, as is well known, the optimal strategy is myopic: accept a job

if and only if the o�ered wage exceeds the current wage. Now let the individual's search environment be

subject to changes that are not worker-speci�c or job-speci�c but rather such that they act similarly on all

employed workers (e.g. business cycles). The changes may be anticipated or unanticipated. It is obvious

that this nonstationarity does not a�ect the optimal strategy: it remains optimal to accept another job

if and only if its wage exceeds the current wage. In the absence of other treatments we thus obtain for

the job-to-job transition rate,

�Y (tjex) = �(t; x)(1� F (wjt; x))

where ex := (x;w). We allow structural determinants to di�er across individuals. This is captured by time-

invariant explanatory variables x. We assume that the analyst observes x (and the duration Y ) but does

not directly observe how the structural determinants, the optimal strategy or the acceptance probability

change with t. The optimal strategy is also una�ected if we allow for unanticipated treatments that are

worker-speci�c but not job-speci�c in the sense that the moment of occurrence is unrelated to the actual

job held at the time.

Consider the following speci�c example. An individual and his spouse have jobs in the same city.

The treatment consists of the spouse �nding and accepting a job in another city. The e�ect of this is

a reduction of the instantaneous disposable income of the individual by a fraction � (e.g. because of

traveling costs; for ease of exposition we abstract from subsequent treatments like this). Let � vary with

the business cycle and let F be a Pareto distribution that varies with x by way of 1�F (wjx) = (w0(x)=w)�

for w 2 (w0(x);1). Then

�Y (tjS; ex) = �(t) � [w0(x)=w]� � [�� ]I(t>S)

which satis�es the speci�cation of Model 1a. (The derived theoretical speci�cation actually requires some

more justi�cation; note that F does not depend on whether the treatment has occurred.) Note that, for

reasons of symmetry, this structural model also gives a multiplicative speci�cation for the hazard rate of

the duration S.

Obviously, there are many more ways to obtain a Model 1a speci�cation in this example. We may

allow � and � to depend on x (in which case the treatment e�ect Æ depends on x), or we may treat w

as unobserved, or we may let � be multiplicative in t and x, or we may take F to have a transposed

exponential distribution, or we may allow the treatment to have a multiplicative e�ect on �, etcetera.

Also, obviously, many other repeated search models are able to generate Model 1a speci�cations.

35

Now let us consider sequential search models with treatments and in�nite discounting. For ease

of exposition, we phrase the theoretical model as a job search model for unemployed individuals (see

Mortensen, 1986, for an overview). The unemployed individual's search environment is as above, except

that a job, once obtained, is held forever. If the search environment is time-invariant then the optimal

strategy is constant during a job spell. In particular, if the discount rate is in�nity, so that workers do not

care about the future, then the optimal strategy is myopic: accept a job if and only if the o�ered wage

exceeds the current instantaneous utility ow b. Now let the individual's search environment change over

time, anticipated or unanticipated. Because of the in�nite discounting, this does not a�ect the optimal

strategy: it remains optimal to accept a job if and only if its wage exceeds b. In the absence of other

treatments we thus obtain for the exit rate out of unemployment,

�Y (tjex) = �(t; x)(1� F (bjt; x))

where ex := (x; b). The optimal strategy is also una�ected if we allow for treatments. Suppose that the

treatment consists of participation in a training course, and that the e�ect of this is a multiplicative

change of � in the job o�er arrival rate �. Let � also vary with t (e.g. because the long-term unemployed

are stigmatized), and let F be the same Pareto distribution as above, with b > w0(x). Then

�Y (tjS; ex) = �I(t>S) � �(t) � [w0(x)=b]�

which satis�es the speci�cation of Model 1a.

The speci�cation for the hazard rate of the duration S may follow from the assignment of individuals

to the course by case workers. They may pro�le the individuals, as follows: estimate a Weibull unemploy-

ment duration model with external data, predict the individual's hazard rate using this estimated model,

and use this as the hazard rate for assignment.

There are many more ways to obtain a multiplicative speci�cation for �Y . Note also that if b < w0

then 1� F (b) = 1 always, which facilitates the construction of a multiplicative speci�cation.

36

References

Abbring, J.H. (2001a), \Econometric duration and event-history analysis", Lecture notes,

Department of Economics, The University of Chicago.

Abbring, J.H. (2001b), \Stayers versus defecting movers: A note on the identi�cation of

defective duration models", Economics Letters [forthcoming].

Abbring, J.H. and G.J. van den Berg (2001a), \The identi�ability of the mixed propor-

tional hazards competing risks model", Working paper, Free University, Amsterdam.

Abbring, J.H. and G.J. van den Berg (2001b), \A simple procedure for inference on

treatment e�ects in duration models", Mimeo, Free University, Amsterdam.

Abbring, J.H., G.J. van den Berg, and J.C. van Ours (1997), \The e�ect of unemploy-

ment insurance sanctions on the transition rate from unemployment to employment",

Working paper, Tinbergen Institute, Amsterdam.

Billingsley, P. (1995), Probability and Measure, Wiley, New York, third edition.

Bonnal, L., D. Foug�ere, and A. S�erandon (1997), \Evaluating the impact of French em-

ployment policies on individual labour market histories", Review of Economic Studies,

64, 683{713.

Cameron, S.V. and J.J. Heckman (1998), \Life cycle schooling and dynamic selection

bias: Models and evidence for �ve cohorts of American males", Journal of Political

Economy, 106, 262{333.

Card, D. and D. Sullivan (1988), \Measuring the e�ect of subsidized training programs

on movements in and out of employment", Econometrica, 56, 497{530.

Cox, D.R. (1962), Renewal Theory, Methuen, London.

Cox, D.R. (1972), \Regression models and life-tables (with discussion)", Journal of the

Royal Statistical Society Series B, 34, 187{202.

Cramton, P. and J. Tracy (1998), \The use of replacement workers in union contract

negotiations: The U.S. experience, 1980{1989", Journal of Labor Economics, 16, 667{

701.

Eberwein, C., J.C. Ham, and R.J. LaLonde (1997), \The impact of being o�ered and

receiving classroom training on the employment histories of disadvantaged women:

Evidence from experimental data", Review of Economic Studies, 64, 655{682.

Elbers, C. and G. Ridder (1982), \True and spurious duration dependence: The identi�-

ability of the proportional hazard model", Review of Economic Studies, 64, 403{409.

Fleming, T.R. and D.P. Harrington (1991), Counting Processes and Survival Analysis,

Wiley, New York.

37

Freedman, D.A., \On specifying graphical models for causation, and the identi�cation

problem", Technical Report 601, Department of Statistics, University of California,

Berkeley, CA.

Freund, J.E. (1961), \A bivariate extension of the exponential distribution", Journal of

the American Statistical Association, 56, 971{977.

Gill, R.D. and J.M. Robins (2001), \Causal inference for complex longitudinal data: The

continuous case", Annals of Statistics, 29 [in press].

Gritz, R.M. (1993), \The impact of training on the frequency and duration of employ-

ment", Journal of Econometrics, 57, 21{51.

Grubb, D. (1999), \Making work pay: The role of eligibility criteria for unemployment

bene�ts", Mimeo, OECD, Paris.

Ham, J.C. and R.J. LaLonde (1996), \The e�ect of sample selection and initial conditions

in duration models: Evidence from experimental data on training", Econometrica, 64,

175{205.

Heckman, J.J. (1990), \Varieties of selection bias", American Economic Review, 80 (sup-

plement), 313{318.

Heckman, J.J. (2000), \Causal parameters and policy analysis in economics: A twentieth

century retrospective", Quarterly Journal of Economics, 115, 45{97.

Heckman, J.J. and B.E. Honor�e (1989), \The identi�ability of the competing risks model",

Biometrika, 76, 325{330.

Heckman, J.J., R.J. LaLonde, and J.A. Smith (1999), \The economics and econometrics

of active labor market programs", in O. Ashenfelter and D. Card, editors, Handbook

of Labor Economics, Volume III, North-Holland, Amsterdam.

Heckman, J.J. and B. Singer (1984), \The identi�ability of the proportional hazard

model", Review of Economic Studies, 51, 231{241.

Heckman, J.J. and C.R. Taber (1994), \Econometric mixture models and more general

models for unobservables in duration analysis", Statistical Methods in Medical Research,

3, 279{302.

Holland, P.W. (1986), \Statistics and causal inference", Journal of the American Statis-

tical Association, 81, 945{960.

Holt, J.D. and R.L. Prentice (1974), \Survival analysis in twin studies and matched pair

experiments", Biometrika, 61, 17{30.

Honor�e, B.E. (1993), \Identi�cation results for duration models with multiple spells",

Review of Economic Studies, 60, 241{246.

Horowitz, J.L. (1999), \Semiparametric estimation of a proportional hazard model with

unobserved heterogeneity", Econometrica, 67, 1001{1028.

38

Hougaard, P., B. Harvald, and N.V. Holm (1992), \Assessment of dependence in the life

times of twins", in J.P. Klein and P.K. Goel, editors, Survival Analysis: State of the

Art, Kluwer Academic Publishers, Dordrecht.

Kalb eisch, J.D. and R.L. Prentice (1980), The Statistical Analysis of Failure Time Data,

Wiley, New York.

Lancaster, T. (1990), The Econometric Analysis of Transition Data, Cambridge Univer-

sity Press, Cambridge.

Lillard, L.A. (1993), \Simultaneous equations for hazards", Journal of Econometrics, 56,

189{217.

Lillard, L.A. and C.W.A. Panis (1996), \Marital status and mortality: The role of health",

Demography, 33, 313{327.

Lok, J.J. (2001), Statistical Modelling of Causal E�ects in Time, PhD thesis, Division of

Mathematics and Computer Science, Faculty of Sciences, Free University, Amsterdam.

Maddala, G.S. (1983), Limited-dependent and qualitative variables in econometrics, Cam-

bridge University Press, Cambridge.

Miller, D.R. (1977), \A note on independence of multivariate lifetimes in competing risks

models", Annals of Statistics, 5, 576{579.

Mortensen, D.T. (1977), \Unemployment insurance and job search decisions", Industrial

and Labor Relations Review, 30, 505{517.

Mortensen, D.T. (1986), \Job search and labor market analysis", in O. Ashenfelter and

R. Layard, editors, Handbook of Labor Economics, North-Holland, Amsterdam.

Neyman, J. (1923), \On the application of probability theory to agricultural experiments.

Essay on principles.", Roczniki Nauk Rolniczych, 10, 1{51 [in Polish; edited and trans-

lated version of Section 9 by D.M. Dabrowska and T.P Speed (1990), Statistical Science,

5, 465{472].

Pearl, J. (2000), Causality: Models, Reasoning, and Inference, Cambridge University

Press, Cambridge, UK.

Ridder, G. (1990), \The non-parametric identi�cation of generalized accelerated failure-

time models", Review of Economic Studies, 57, 167{182.

Ridder, G. and I. Tunal� (1999), \Strati�ed partial likelihood estimation", Journal of

Econometrics.

Robins, J.M. (1998), \Structural nested failure time models", in P. Armitage and T. Colton,

editors, The Encyclopedia of Biostatistics, John Wiley and Sons, Chichester.

Rubin, D.B. (1974), \Estimating causal e�ects of treatments in randomized and nonran-

domized studies", Journal of Educational Psychology, 66, 688{701.

39

Tsiatis, A. (1975), \A nonidenti�ability aspect of the problem of competing risks", Pro-

ceedings of the National Academy of Sciences, 72, 20{22.

Van den Berg, G.J. (1990), \Nonstationarity in job search theory", Review of Economic

Studies, 57, 255{277.

Van den Berg, G.J. (1995), \Explicit expressions for the reservation wage path and the un-

employment duration density in nonstationary job search models", Labour Economics,

2, 187{198.

Van den Berg, G.J. (2001), \Duration models: Speci�cation, identi�cation, and multi-

ple durations", in J.J. Heckman and E. Leamer, editors, Handbook of Econometrics,

Volume V, North Holland, Amsterdam.

Van den Berg, G.J., A. Holm, and J.C. van Ours (2001), \Do stepping-stone jobs ex-

ist? Early career paths in the medical profession", Journal of Population Economics

[forthcoming].

Van den Berg, G.J., B. van der Klaauw, and J.C. van Ours (1998), \Punitive sanctions

and the transition rate from welfare to work", Working paper, Tinbergen Institute,

Amsterdam.

Vella, F. and M. Verbeek (1997), \Using rank order as an instrumental variable: An

application to the return to schooling", Working paper, Catholic University Leuven,

Leuven.

40

the - diw.de · tor of the treatmen t e ect. the timing of ev en ts con v eys useful information on...

Documents