The Non-Parametric Identification of
Treatment Effects in Duration Models
Jaap H. Abbring*
Gerard J. van den Berg†
Revised version, December 28, 2001
Abstract
This paper analyzes the specification and identification of causal bivariate duration models. We focus on the case in which one duration concerns the point in time a treatment is initiated and we are interested in the effect of this treatment on some outcome duration. We define "no anticipation of treatment" and relate it to a common assumption in biostatistics. We show that (i) no anticipation and (ii) randomized treatment assignment can be imposed without restricting the observational data. We impose (i) but not (ii) and prove identification of models that impose some structure. We allow for dependent unobserved heterogeneity and we do not exploit exclusion restrictions on covariates. We provide results for both single-spell and multiple-spell data. We also develop an intuitive stratified partial likelihood estimator of the treatment effect. The timing of events conveys useful information on the treatment effect.
*Department of Economics, Free University Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands. Phone/fax: +31-20-444-6030/6005. Email: [email protected]
†Department of Economics, Free University Amsterdam, Tinbergen Institute, CEPR, and IFAU Uppsala. Email: [email protected].
Keywords: program evaluation, bivariate duration analysis, selectivity bias, hazard rate, partial
likelihood, unobserved heterogeneity, anticipation.
JEL codes: C14, C31, C41.
Thanks to Jim Heckman, John Kennan, Tony Lancaster, Arthur Lewbel, Geert Ridder, Michael
Visser, Ed Vytlacil, the editor and anonymous referees for useful comments and suggestions. Also thanks
to seminar participants at University College London, NYU, IFAU Uppsala, University of Chicago,
University of Iowa, INSEE-CREST, Tinbergen Institute Amsterdam, and participants at various
conferences. Jaap Abbring acknowledges financial support by the Royal Netherlands Academy of Arts
and Sciences.
1 Introduction
This paper discusses the specification and identification of causal bivariate duration models in economics. Consider a subject in a certain state. After a stochastic amount of time, the subject leaves this state. A different type of event may occur at some other random time. We are interested in the causal effect of the latter event on the duration in the state of interest. Examples include the effect of training programs or punitive benefits reductions on unemployment durations, the effect of the hiring of replacement workers on strike durations and the effect of promotions on tenure. In biostatistics one is often interested in the effect of a specific medical treatment. We borrow this terminology and refer to training programs, punitive benefits reductions, etcetera, as "treatments", to their causal effects on the outcome durations as "treatment effects" and to the subjects as "individuals".
At the core of the paper is a discussion of the causal relation between two durations, the outcome duration $Y$ and the treatment time $S$. A duration is simply a non-negative random variable and it may seem that we can apply standard models for relating any two non-negative random variables. However, durations are special in the sense that they are inherently dynamic objects. This can be formalized by considering the corresponding counting processes. In our case, we have an outcome process $\{N_Y(y); 0 \le y < \infty\}$ such that $N_Y(y) = 0$ if $y < Y$ and $N_Y(y) = 1$ if $y \ge Y$, and a similarly defined treatment process $\{N_S\}$. A model in terms of these processes, rather than the corresponding durations, facilitates an explicit discussion of the dynamic effects of treatment. A practical matter is that treatment may not be observed, or even well-defined, after the subject has left the state of interest. This is more easily discussed in an explicitly dynamic model. The same is true for censoring and time aggregation.
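The counting-process representation can be illustrated with a minimal Python sketch. The helper names (`counting_process`, `observed_treatment_path`) and all numerical values are hypothetical and chosen only for illustration; they are not part of the formal framework. The sketch also reflects the practical point made above: the treatment path is only tracked while the subject is still in the state of interest.

```python
import math


def counting_process(event_time):
    """Return the function N(y) = 1{y >= event_time} for a single event."""
    def n(y):
        return 1 if y >= event_time else 0
    return n


def observed_treatment_path(n_s, y_end, grid):
    """Treatment history {N_S(y); 0 <= y < Y} evaluated on a grid.

    Grid points at or after the outcome time y_end are dropped, because
    treatment is assumed to be unobserved once the spell has ended.
    """
    return [(y, n_s(y)) for y in grid if y < y_end]


# Hypothetical spell: the outcome duration is Y = 5, treatment starts at S = 3.
N_Y = counting_process(5.0)
N_S = counting_process(3.0)

# A never-treated individual can be represented by S = infinity.
N_never = counting_process(math.inf)

path = observed_treatment_path(N_S, 5.0, [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
```

Here `path` records the treatment indicator only at grid points strictly before the end of the spell, which mirrors the restriction to the observed history.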
We specify this model in terms of the corresponding hazard rates. The outcome hazard $\theta_Y(y \mid \cdot)$ at time $y$ is the intensity of leaving the state of interest at $y$ given that the subject is still in this state at $y$ and conditional on the treatment history and other relevant conditions (which are represented by a "$\cdot$"). The hazard rate is the focal point and basic building block of econometric duration models. Dynamic economic theories often aim to explain the rate of leaving a state of interest at $y$ in terms of economic behavior and external conditions (see Van den Berg, 2001, for a survey). An analogy with discrete choice models may clarify this. Let $N_Y(y-) := \lim_{t \uparrow y} N_Y(t)$. The outcome process increments $dN_Y(y) := N_Y((y + dy)-) - N_Y(y-)$ form a sequence of binary choices. We have $dN_Y(y) = 1$ if $Y \in [y, y + dy)$ and $dN_Y(y) = 0$ if $Y \notin [y, y + dy)$. The hazard can be interpreted by the heuristic $\theta_Y(y \mid \cdot)\,dy = \Pr(dN_Y(y) = 1 \mid N_Y(y-) = 0, \cdot)$ (see Fleming and Harrington, 1991). This is the continuous-time equivalent of a conditional choice probability.
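The heuristic can be checked numerically in a simple special case. The sketch below assumes, purely for illustration, an exponential duration with a constant hazard `rate`; this distributional choice and the function names are assumptions, not part of the model in the text. It computes the conditional exit probability over $[y, y + dy)$ divided by $dy$ and shows that it approaches the hazard as $dy$ shrinks.

```python
import math


def survival(y, rate):
    """Survival function Pr(Y > y) of an exponential duration with hazard `rate`."""
    return math.exp(-rate * y)


def exit_probability(y, dy, rate):
    """Pr(Y in [y, y + dy) | Y >= y) for the exponential duration."""
    return (survival(y, rate) - survival(y + dy, rate)) / survival(y, rate)


rate = 0.3  # assumed constant hazard, for illustration only
y = 2.0

# Conditional exit probability per unit of time, for shrinking interval lengths.
approx_hazards = [exit_probability(y, dy, rate) / dy for dy in (1.0, 0.1, 0.001)]
```

For this example the ratios increase towards `rate` from below, in line with the interpretation of the hazard as a continuous-time conditional choice probability.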
We construct a causal model that specifies how the outcome varies if we hypothetically manipulate the treatment time. The model also specifies a (not necessarily randomized) treatment process. Biostatistical studies routinely impose a "consistency" condition that can be interpreted as claiming that each cause should precede its effect (e.g. Robins, 1998, and Lok, 2001). Formally, it requires that the outcome paths corresponding to any two treatment times coincide up to a duration $y$ if the treatment paths are the same up to $y$. In more familiar terms, the subject should be in the same outcome state at time $y$ if both treatments start after $y$. We impose a weak version of this condition that only requires that the corresponding hazard paths coincide up to $y$. In economics this embodies a substantial informational and behavioral assumption. It requires that, at each time $y$, individuals either have no access to information on treatment beyond that encoded in the treatment history $\{N_S(t); 0 \le t < y\}$ or simply do not act on that information. Otherwise, there are anticipatory effects of future treatment and our consistency condition is violated. Therefore, we refer to this condition as a "no anticipation" assumption.
Rational forward-looking individuals may be expected to exploit information on the (perceived) properties of the treatment assignment process. In that case, changes in these properties may affect the hazard rate out of the state of interest before any actual realization of a treatment. Our model framework enables the specification and interpretation of such effects. Our analysis, however, focuses on contrasting outcomes between different realized treatment times for a given assignment mechanism. This is already an ambitious task. We prove a basic non-identification theorem that states that no anticipation and randomized assignment of treatment can be imposed on the causal model without restricting the observational data. This highlights two inference problems. First, we cannot identify anticipation effects of treatment. Second, we have the standard econometric selection problem. Without further assumptions, we cannot distinguish between causal effects and selection effects. In the identification analysis we take the no-anticipation assumption as fundamental and we only address the selection issue. It should be judged in each application whether it makes sense to assume that the observed treatment cannot be anticipated.
Identification requires further structure. We build on the assumption that all selection effects can be captured by related observed and unobserved covariates in the treatment and outcome processes. We examine single-spell and multiple-spell settings. Single-spell data provide information on a single outcome spell and possibly treatment for each individual. We prove identification of two bivariate duration models that allow for time-varying and heterogeneous treatment effects and that borrow structure from the well-known non-parametric¹ mixed proportional hazard (MPH) model. MPH models are the most popular reduced-form duration models in econometrics (see Van den Berg, 2001, for a survey).
With multiple-spell data we observe multiple outcome spells, with or without treatment, for each individual or within groups of individuals who share the same unobservable characteristics. We prove identification of multiple-spell model generalizations. In particular, we can allow for general non-proportionalities. The single-spell results concern bivariate models that embed a marginal model for the treatment process, or a "selection equation". For the multiple-spell case we show that it is sufficient to specify the conditional outcome model in order to identify treatment effects. As a by-product, we derive a simple and intuitive stratified partial likelihood (SPL) estimator of the treatment effect.
¹ In Subsection 3.2 we justify this terminology.
This paper contributes to the extensive literature on the econometric evaluation of social programs and treatment effects (see Heckman, LaLonde and Smith, 1999, for an overview). This literature provides relatively little insight into problems with dynamically assigned treatments and duration outcomes. We provide some guidance in tackling such problems, which are frequently encountered in empirical research. We also complement the closely related biostatistical literature (e.g. Robins, 1998). This literature allows for more complex dynamic treatment regimes, but is not tailored to economic problems and data. We offer a dynamic economic perspective on the causal model and identification results for models with unobserved selection effects that cannot be found in the biostatistical literature.
Our identification results also extend the existing literature on the non-parametric identification of MPH-type duration models (see Elbers and Ridder, 1982, Heckman and Singer, 1984, Heckman and Honoré, 1989, Honoré, 1993, and the surveys in Heckman and Taber, 1994, and Van den Berg, 2001). The conditional outcome parts of the models can be interpreted as univariate duration models with an endogenous time-varying regressor, the treatment process, where the endogeneity originates from a dependence on unobserved characteristics. We demonstrate that these models are identified under assumptions similar to those in the literature.
The empirical literature contains studies in which models are estimated that are similar
to the models we consider. For example, Eberwein, Ham and LaLonde (1997), Abbring,
Van den Berg and Van Ours (1997), and Van den Berg, Van der Klaauw and Van Ours
(1998) study the effect of a treatment of unemployed workers on the transition rate from
unemployment to work, allowing for a causal effect on the transition rate as well as for
related unobserved heterogeneity in order to deal with selectivity. Lillard (1993) estimates
a model for the joint durations of marriage and time until conception of a child, and
his model allows the rate at which the marriage dissolves to shift to another level at
moments of child birth. Lillard and Panis (1996) estimate a model on the joint durations
of marriage, non-marriage, and life, and their model allows the death rate to shift to
another level at moments of marriage formation and dissolution. These studies also allow
for related unobserved heterogeneity.
We do not impose exclusion restrictions on covariates, i.e. we do not require that the data contain a variable that affects the treatment assignment but does not affect the outcome of interest other than by way of the treatment. A variable that is observed by the analyst is often also observable to the individuals under consideration. If such a variable affects the treatment process then a rational individual will take this variable into account in determining his optimal strategy. This behavioral response in turn affects the rate at which the individual leaves the state of interest. As a result, exclusion restrictions are difficult to justify and instrumental variable methods of inference are not likely to be of help. A practical issue is that the latter methods are often not tailored to our dynamic framework. One may be tempted to time-aggregate the treatment process into a binary indicator for treatment in an interval and apply standard methods for binary treatments. However, this way we would lose track of the parameters of interest and throw away information that we need for identification.
We also avoid conditional-independence assumptions, which require that all selectivity
in the assignment of treatment can be controlled by conditioning on observed covariates.
In many econometric applications, unobserved heterogeneity in treatment assignment and
outcomes cannot be excluded. By the argument against instrumental variables assump-
tions above, any unobserved heterogeneity in treatment assignment is likely to imply
unobserved selectivity and violations of conditional independence.
Finally, we show that multiple-spell data are similar to linear panel data. The intuition
for identi�cation in linear panel-data models carries over to our non-linear models.
The outline of the paper is as follows. Section 2 develops the causal framework and the basic non-identification result. Sections 3 and 4 derive the identification results for the single-spell and multiple-spell cases, respectively. Section 5 provides discussion. We provide some economic-theoretical justification of the proportional hazard model structure. We also discuss the applicability of the models as reduced-form model frameworks, with extensive examples. Finally, we review the identification from the perspective of more conventional latent variable and panel-data models. The Appendix contains the proofs.
2 A framework for causal inference
2.1 A potential outcome framework
Consider a population of individuals flowing into a state of interest and the durations these individuals subsequently spend in that state. We are interested in the causal effect of a single binary treatment that is either assigned at some time in $\mathbb{R}_+ := [0, \infty)$ after entering the state or not assigned at all. We can cast this problem in the standard potential outcome framework pioneered by Neyman (1923) by recognizing that our dynamically assigned binary treatment can be reinterpreted as a set of mutually exclusive treatments indexed by $\overline{\mathbb{R}}_+ := \mathbb{R}_+ \cup \{\infty\}$. Here, the point $\infty$ represents the no-treatment case. To each treatment $s \in \overline{\mathbb{R}}_+$ corresponds a random variable $Y^*(s) \ge 0$, the potential outcome in the case that we would intervene and assign treatment $s$. Causal inference is concerned with contrasting potential outcomes corresponding to different treatments. As the treatments are mutually exclusive, all but one of the potential outcomes are counterfactual. This is what Holland (1986) calls the "fundamental problem of causal inference". A solution is to impose sufficient structure on the model to allow for causal inference from actual (experimental or observational) data. The inherently dynamic setting of our model and its application in economics call for a separate discussion.
Unlike Neyman and much of the more recent literature (e.g. Rubin, 1974, Holland, 1986, and Heckman, LaLonde and Smith, 1999), we have to deal with infinitely (actually uncountably) many potential outcome variables.² To facilitate a careful discussion, we explicitly introduce the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ on which all random variables are defined. Treatments are assigned according to an $\overline{\mathbb{R}}_+$-valued random variable $S$. The collection $\{Y^*\} := \{Y^*(s); s \ge 0\}$ of all potential outcomes is an $\mathbb{R}_+$-valued stochastic process, which we require to be measurable. The actual outcome is $Y := Y^*(S)$; all other potential outcomes are counterfactual. Measurability of $\{Y^*\}$ ensures that $Y$ is a random variable (by application of Billingsley, 1995, Theorem 13.1).
Now consider a sample of $n$ individuals, named $1, 2, \ldots, n$. We assume that potential outcomes and treatments $(\{Y^*_i\}, S_i)$ are independent across individuals $i$, and distributed like $(\{Y^*\}, S)$ above. For any individual $i$, we only observe the outcome $Y_i$ corresponding to the actual treatment $S_i$, and we cannot contrast this outcome to any of the counterfactual outcomes. This is Holland's fundamental problem of causal inference again. We are forced to tone down our ambitions, and focus instead on inference of some causal parameters, for example the means $\mu(s) := E[Y^*(s)]$. For now, suppose that $\mu(s) < \infty$ for all $s \in \overline{\mathbb{R}}_+$ and that we have been able to ensure randomized assignment, so that $\{Y^*\} \perp\!\!\!\perp S$.³ Then, we have that $\mu(S) = E[Y \mid S]$ almost surely. So, under some additional smoothness assumptions, we can estimate $\mu$ by standard non-parametric regression techniques.
2.2 Dynamics and information
By adopting Neyman's static framework we have ignored the fact that our problem is inherently dynamic. To facilitate an explicit discussion of dynamics, we assume that $Y^*(s)$ is continuous and focus on its hazard rate, which is given almost everywhere by
$$\theta_{Y^*(s)}(y) = \lim_{dy \downarrow 0} \frac{\Pr(Y^*(s) \in [y, y + dy) \mid Y^*(s) \ge y)}{dy}.$$
The corresponding integrated hazards are $\Theta_{Y^*(s)}(y) = \int_0^y \theta_{Y^*(s)}(u)\,du < \infty$ for all $y \in \mathbb{R}_+$, with $\lim_{y \to \infty} \Theta_{Y^*(s)}(y) = \infty$. It is well-known, and easy to check, that $\Theta_{Y^*(s)}(Y^*(s))$ has a unit exponential distribution. This suggests the following causal model. Explicitly including the sample point $\omega$ as an argument, let
$$Y^*(s, \omega) = \Theta^{-1}_{Y^*(s)}(E(s, \omega)),$$
with $\Theta^{-1}_{Y^*(s)}(e) = \inf\{y \in \mathbb{R}_+ : \Theta_{Y^*(s)}(y) \ge e\}$ the generalized inverse of $\Theta_{Y^*(s)}$ and $E(s, \cdot)$, $s \in \overline{\mathbb{R}}_+$, unit exponential random variables. Then, it is easy to derive that $Y^*(s, \cdot)$ is indeed a continuous random variable with integrated hazard $\Theta_{Y^*(s)}$, for given $s \in \overline{\mathbb{R}}_+$.
² Freedman (2001) and Gill and Robins (2001), however, discuss potential outcome models with continuous treatments. Also, structural models routinely allow for causal effects of continuous variables. See Heckman (2000) for a review of structural models and causality in economics and Pearl (2000) for a discussion of the link between structural models and potential outcome models.
³ Note that at this stage we have not specified a model, so that we cannot formulate randomized assignment in a different way (see Heckman, LaLonde and Smith (1999) for binary analogues). If $\{Y^*\}$ is not independent of $S$ then there is selectivity. Of course, the issue whether $\{Y^*\}$ is independent of $S$ or not is entirely different from the issue whether $Y^*(s)$ varies with $s$. In the latter case there is a causal effect.
We could use the following causal interpretation. First, nature selects an $\omega$ according to the model $(\Omega, \mathcal{F}, \mathbb{P})$. Then, we intervene and set $s$, and thus $\Theta_{Y^*(s)}$ and $E(s, \cdot)$. Nature does not re-randomize after our intervention, so $\omega$ is fixed throughout the experiment. As the distribution of $E(s, \cdot)$ is invariant to our choice of $s$, the potential integrated hazards $\Theta_{Y^*(s)}$ have a causal interpretation. We deliberately avoid imposing invariance of the actual realization $E(s, \omega)$, so that we are not forced to specify joint distributions of counterfactual outcomes. This is consistent with our goal of only inferring potential integrated hazards $\Theta_{Y^*(s)}$, but not realized outcomes $Y^*(s, \omega)$.
Now that we have explicitly stated the causal structure of our model and our lack of interest in fully specifying joint distributions of counterfactuals, explicitly allowing for dependence of $E$ on $s$ and carrying along the argument $\omega$ are unnecessarily cumbersome. There is no substantial loss in simply specifying the model as $Y^*(s) = \Theta^{-1}_{Y^*(s)}(E)$, with $E$ a unit exponential random variable.⁴ By analogy to Freedman (2001), we give a causal interpretation to $\Theta_{Y^*(s)}$ or, similarly, the implied stochastic kernel $K(s, A) = \Pr(\Theta^{-1}_{Y^*(s)}(E) \in A)$ for $s \in \overline{\mathbb{R}}_+$ and $A \in \mathcal{F}$. If we again had randomized assignment and observed the distribution of $Y$ conditional on $S$, potential integrated hazards could be estimated using standard hazard regression techniques (see e.g. Fleming and Harrington, 1991).
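The construction $Y^*(s) = \Theta^{-1}_{Y^*(s)}(E)$ can be made concrete with a small Python sketch. The piecewise-constant hazard used here (a baseline hazard `base` that is multiplied by `delta` after the treatment time `s`) is a hypothetical specification chosen only because its integrated hazard inverts in closed form; the paper does not impose it, and the function names and parameter values are likewise assumptions.

```python
import math


def integrated_hazard(y, s, base=0.2, delta=2.0):
    """Integrated hazard of Y*(s): hazard `base` up to s, `base * delta` afterwards."""
    if y <= s:  # with s = math.inf (no treatment) this branch always applies
        return base * y
    return base * s + base * delta * (y - s)


def potential_outcome(e, s, base=0.2, delta=2.0):
    """Y*(s) obtained by inverting the integrated hazard at the exponential draw e."""
    if e <= base * s:
        return e / base
    return s + (e - base * s) / (base * delta)


e = 1.0  # one fixed realization of the unit exponential variable E

y_untreated = potential_outcome(e, math.inf)  # no-treatment outcome, about 5.0
y_treated = potential_outcome(e, 2.0)         # treatment at s = 2, about 3.5

# Round trip: evaluating the integrated hazard at Y*(s) recovers e.
round_trip = integrated_hazard(y_treated, 2.0)
```

In practice `e` could be drawn with `random.expovariate(1.0)`. Note that holding `e` fixed across values of `s` is a convenience of this sketch; the text deliberately avoids restricting the joint distribution of counterfactual outcomes.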
⁴ We can ensure that $Y^*$ is measurable by requiring that $(s, e) \mapsto \Theta^{-1}_{Y^*(s)}(e)$ is measurable (again by Billingsley, 1995, Theorem 13.1).
We are now ready to discuss more substantive dynamic issues. Heuristically, $\theta_{Y^*(s)}(y)\,dy$ is the probability that the outcome spell ends in a small interval $[y, y + dy)$, conditional on survival up to $y$ and under treatment $s$. For each treatment $s \in \overline{\mathbb{R}}_+$, introduce a dynamic treatment indicator $\{N_s(y); 0 \le y < \infty\}$ such that $N_s(y) = 0$ if $y < s$ and $N_s(y) = 1$ if $y \ge s$. Then, $\{N_S\}$ is a counting process that tracks the actual treatment status over time. An issue that is central to a behavioral science like economics is the dynamic accumulation of information concerning treatment by the individual. As a routine "consistency" condition, biostatisticians typically impose that the outcomes up to some time $y$ only depend on the treatment history $\{N_s(u); 0 \le u < y\}$, for a given assigned treatment $s \in \overline{\mathbb{R}}_+$ (Robins, 1998, and Lok, 2001). A weak version of this condition requires that hazard rates for any two treatments $s, t \in \overline{\mathbb{R}}_+$ coincide almost everywhere up to the first of the two treatments, $\min\{s, t\}$. This consistency condition seems natural, by simply requiring that cause precedes effect. However, in economics outcomes are typically affected by the behavior of individuals acting on information about treatments. The potential outcome model discussed so far is incomplete because it only discusses variation in actual treatments and does not specify how information on the treatment accumulates to the individuals over time. The consistency condition above can be interpreted as the informational and behavioral assumption of "no anticipation":
Assumption 1. (no anticipation) For all $s, t \in \overline{\mathbb{R}}_+$,
$$\Theta_{Y^*(s)}(y) = \Theta_{Y^*(t)}(y) \quad \text{for all } y \le \min\{s, t\}.$$
A sufficient condition for Assumption 1 is that the individual's information at any time $y$ only varies with $s$ through the treatment history $\{N_s(u); 0 \le u < y\}$ whenever we would intervene and manipulate $s$. Subsection 2.4 discusses a clarifying structural example.
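To see what Assumption 1 rules in and out, consider the following hedged Python sketch. It compares integrated hazards for two treatment times on a grid up to the earlier of the two. The hazard specification, the `lead` parameter (a hypothetical period before treatment during which individuals already react) and all function names are assumptions made for illustration only.

```python
import math


def integrated_hazard(y, s, base=0.2, delta=2.0, lead=0.0):
    """Integrated hazard when the hazard switches from `base` to `base * delta`
    at time s - lead; lead > 0 means behavior changes before treatment starts."""
    switch = max(s - lead, 0.0)
    if y <= switch:
        return base * y
    return base * switch + base * delta * (y - switch)


def satisfies_no_anticipation(theta, s, t, grid):
    """Check theta(y, s) == theta(y, t) at all grid points y <= min(s, t)."""
    cutoff = min(s, t)
    return all(math.isclose(theta(y, s), theta(y, t)) for y in grid if y <= cutoff)


def anticipating(y, s):
    """Variant in which individuals react one time unit before treatment."""
    return integrated_hazard(y, s, lead=1.0)


grid = [0.25 * k for k in range(41)]  # 0, 0.25, ..., 10

ok = satisfies_no_anticipation(integrated_hazard, 3.0, 6.0, grid)
violated = not satisfies_no_anticipation(anticipating, 3.0, 6.0, grid)
```

With `lead = 0` the two integrated hazards agree up to time 3, so the check passes. With `lead = 1` they already diverge after time 2, which is the kind of anticipatory effect that the assumption excludes.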
Suppose that individuals are informed about the treatment time at the start of the spell. If individuals act on this information and their actions affect outcomes, there are anticipatory effects and potential hazards corresponding to different treatments will diverge from time 0 onwards. Similar but subtler patterns arise whenever individuals dynamically accumulate treatment-specific information that strictly includes the actual treatment history. An example is that the treatment time is revealed at some fixed interval before the treatment time. In either of these "anticipation" cases, the consistency condition above is violated and we seem to end up with violations of the rule that any cause precedes its effect. However, one should properly line up causes and effects. The above anticipatory effects are not effects of the actual treatment, but of the revelation of information about the treatment. So, if we extend the potential outcome model so that it recognizes all relevant causes, notably information shocks, consistency should hold. After all, information shocks cannot be anticipated.
As stated at the top of this section, we focus on a single treatment in this paper. If we observe the timing of all information shocks, we can use a multiple-treatment extension of our model that is not hard to envision. In contrast, in the next subsection we show that it is generally impossible to identify anticipatory effects if we do not know how information about treatment evolves. Like statisticians, we therefore take a consistency condition like Assumption 1 as primitive. A practical implication is that meaningful contrasts of potential outcomes to the no-treatment baseline can be defined without allowing for defects in the distribution of $S$. After all, Assumption 1 implies that $\Theta_{Y^*(s)}$ coincides with the no-treatment integrated hazard $\Theta_{Y^*(\infty)}$ on $[0, s]$ for all $s \in \mathbb{R}_+$ and that $\lim_{s \to \infty} \Theta_{Y^*(s)}(y) = \Theta_{Y^*(\infty)}(y)$ for all $y \in \mathbb{R}_+$.
It is important to stress that Assumption 1 does not exclude that forward-looking individuals act on properties of the treatment process. It only requires that such effects are the same between interventions manipulating the future treatment status. In that sense, our potential outcome model is a model for contrasting treatments for a given information structure. It does not allow us to contrast outcomes to outcomes in a world without treatment. One of the contributions of this paper is to make these problems explicit in a dynamic setting. Heckman, LaLonde and Smith (1999) have addressed similar issues within the context of the static binary-treatment potential-outcome model. They differentiate between the no-treatment outcome in a world with treatments and outcomes in a world without treatments. However, considerations like this are usually absent from the statistical literature.
2.3 A basic non-identification result
So far, we have assumed that the econometrician always observes $S$ and therefore $\{N_S\}$. However, in many applications treatment is irrelevant and/or unobserved if it occurs after the spell of interest has ended. From now on, we explicitly entertain the possibility that we only observe the treatment history $\{N_S(y); 0 \le y < Y\}$ realized at $Y$. Assume that $S$ and $\{Y^*\}$ are such that $\Pr(Y = S) = 0$ (see also Assumption 2 below). In this case, a large data set would provide
$$Q_S(y, s) := \Pr(Y > y, S > s, Y > S) \quad \text{and} \quad Q_Y(y) := \Pr(Y > y, Y < S) \qquad (1)$$
for all $(y, s) \in \mathbb{R}^2_+$. These are the sub-survival functions of $(Y, S)$ and $Y$ for the sub-populations with respectively $Y > S$ and $Y < S$. So, we have
Definition 1. Two causal models $(\{Y^*\}, S)$ and $(\{\tilde{Y}^*\}, \tilde{S})$ are observationally equivalent if $Q_S = \tilde{Q}_S$ and $Q_Y = \tilde{Q}_Y$, where $\tilde{Q}_S$ and $\tilde{Q}_Y$ are defined by (1) for $\tilde{Y} := \tilde{Y}^*(\tilde{S})$ and $\tilde{S}$.
Note that the distribution of the identified minimum of $(Y, S)$, i.e. the smallest of $Y$ and $S$ jointly with the identity of this smallest duration, is fully characterized by $(Q^0_S, Q_Y)$, with $Q^0_S(s) = Q_S(-\infty, s)$ for all $s \in \mathbb{R}_+$ (Tsiatis, 1975). We will occasionally exploit that our models of $(Q_S, Q_Y)$ embed competing risks models for $(Q^0_S, Q_Y)$.
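As an illustration of what "a large data set would provide", the following Python sketch computes empirical counterparts of the sub-survival functions in (1). The data layout (pairs `(y, s)` with `s = None` when no treatment was observed before the spell ended), the function names and the toy sample are all hypothetical; this is a sketch of the definitions under those assumptions, not an estimator proposed in the paper.

```python
import math


def empirical_sub_survival(data):
    """Return empirical versions of Q_S and Q_Y.

    data: list of (y, s) pairs, where y is the outcome duration and s is the
    treatment time if treatment was observed before exit (S < Y), else None.
    """
    n = len(data)

    def q_s(y, s):
        # Share of spells with observed treatment (Y > S), Y > y and S > s.
        return sum(1 for (yi, si) in data
                   if si is not None and yi > y and si > s) / n

    def q_y(y):
        # Share of spells that ended before treatment (Y < S) with Y > y.
        return sum(1 for (yi, si) in data if si is None and yi > y) / n

    return q_s, q_y


# Toy sample: three spells with treatment before exit, two without.
sample = [(5.0, 2.0), (3.0, None), (7.0, 6.0), (1.0, None), (4.0, 1.0)]
Q_S, Q_Y = empirical_sub_survival(sample)

# Q_S(-inf, s) corresponds to the function Q^0_S(s) used in the text.
q0_s_at_zero = Q_S(-math.inf, 0.0)

# The two sub-populations partition the sample, so the masses add up to one.
total = Q_S(-math.inf, -math.inf) + Q_Y(-math.inf)
```

Because the treatment time is only recorded when it precedes exit, `Q_S` and `Q_Y` use exactly the information assumed to be available in this subsection.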
In the Appendix we show that any data pair $(Q_S, Q_Y)$ can be supported by a causal model that satisfies Assumption 1 and randomized assignment.⁵ This implies
Proposition 1. To each causal model $(\{Y^*\}, S)$ corresponds an observationally equivalent model $(\{\tilde{Y}^*\}, \tilde{S})$ that satisfies Assumption 1 and randomized assignment ($\{\tilde{Y}^*\} \perp\!\!\!\perp \tilde{S}$).
Proposition 1 provides two insights. First, without further structure, we cannot distinguish between causal effects and spurious selection effects. This is the standard selection problem. Second, we cannot distinguish between potential outcome models with and without anticipation (as defined by Assumption 1). Throughout this paper we assume that Assumption 1 is true, and we focus on separating causal and spurious effects. So, we address the first non-identification problem, but not the second.⁶
⁵ Gill and Robins (2001) provide a related result for a different model.
⁶ Alternatively, it is easy to derive non-parametric identification results under the assumption of randomized assignment, but without restrictions on anticipatory effects. This is beyond the scope of this paper.
We impose the necessary structure by explicitly modeling selection effects as related effects of observed and unobserved (to the econometrician) covariates on outcomes and assignment. We rely on
Assumption 2. For some random vectors $X$ of observed and $V$ of unobserved covariates, $\{Y^*\} \perp\!\!\!\perp S \mid (X, V)$ and the distribution of $(Y, S) \mid (X, V)$ is absolutely continuous with respect to Lebesgue measure on $\mathbb{R}^2_+$.
Assumption 2 requires that all selection effects can be captured by a finite-dimensional vector of covariates. It is reminiscent of a standard conditional-independence assumption, but it allows for conditioning on both observed and unobserved covariates. The covariates should include all joint determinants of outcomes and assignment, including any information that triggers relevant behavioral responses. The restriction to time-invariant covariates is unnecessarily strong for the exposition of the causal model and we could easily extend the analysis to external time-varying covariates.⁷ However, in our identification analysis we will only deal with time-invariant covariates, which is in line with most of the literature on the identification of duration models (see Heckman and Taber, 1994, and Van den Berg, 2001, for surveys). The main reason to restrict attention to time-invariant observed covariates in this literature and our paper is expositional convenience. Results by Honoré (1991) and Heckman and Taber (1994) suggest that identification often benefits from any additional variation of the observed covariates over time. In this sense, our analysis provides a baseline that can be enriched by extensions to observed time-varying covariates. Note that, in contrast, we cannot deal with time-varying unobserved covariates (unless the covariate paths are common within strata; see Section 4). In the single-spell case we are effectively dealing with cross-sectional data. Identification of models with time-varying unobservables is an unreasonable goal to pursue and would take us well beyond the current practice in empirical research.
⁷ See Kalbfleisch and Prentice (1980, Section 5.3), Heckman and Taber (1994), and Abbring (2001a, Section 3.2) for discussions of external covariates in survival analysis.
We are now ready to present the causal model that underlies our identification analysis in the following sections. Throughout, we assume that we have access to vectors of covariates $X$ and $V$ as in Assumption 2. We focus on inference conditional on the observed covariates $X$. Assumption 1 is assumed to hold conditional on $(X, V)$. In the sequel, it is understood that we mean this slightly stronger assumption if we refer to Assumption 1. Denote the integrated $Y^*(s)$-hazard conditional on $(X, V)$ by $\Theta_{Y^*(s)}(t \mid X, V)$. Let $\Theta^{-1}_{Y^*(s)}(e \mid X, V) = \inf\{y \in \mathbb{R}_+ : \Theta_{Y^*(s)}(y \mid X, V) \ge e\}$ for $s \in \overline{\mathbb{R}}_+$. We have that $\Theta_{Y^*(s)}(Y^*(s) \mid X, V)$ is unit exponential and independent of $(X, V)$. It is also independent of $S$ by Assumption 2. So, we can now specify
$$Y^*(s) = \Theta^{-1}_{Y^*(s)}(E \mid X, V),$$
with $E$ a unit exponential random variable such that $E \perp\!\!\!\perp (S, X, V)$. A causal interpretation of $\Theta_{Y^*(s)}$ follows as before, with the additional requirement that $(X, V)$ is invariant to the intervention. Nature sets $(X, V)$ and $E$ by picking a sample point from $\Omega$ according to $(\Omega, \mathcal{F}, \mathbb{P})$. $(X, V)$ represents heterogeneity that is inherent to the individuals and revealed to them immediately. $E$ is an ex post random shock on which information only evolves naturally as time proceeds. In this sense we allow for randomness at the individual level like Freedman (2001) and unlike for example Holland (1986). Because $(X, V)$ is invariant to the intervention, it preserves the perceived assignment process. At each point in time $t$, the model allows individuals to form expectations based on the covariates $(X, V)$ and the assignment and transition histories. In equilibrium, these expectations should be consistent with actual assignment, $S \mid (X, V)$.
Assumptions 1 and 2 provide some additional structure, but not enough to ensure identification. After all, Assumption 2 allows for selective assignment through the unobserved covariates $V$. By Proposition 1 (applied conditional on $X$) there exists an observationally equivalent model such that $\{Y^*\} \perp\!\!\!\perp S \mid X$. Rather than providing identification directly, Assumptions 1 and 2 are meant to provide a point of departure for a variety of identifying assumptions. With single-spell data, further (MPH-like) separability assumptions are unavoidable. This is explored in Section 3. In Section 4, we show how multiple-spell data can be used to control for selection on unobservables in a much more flexible model framework.
We have focused on causal effects of treatment on outcomes and not the other way around. This asymmetry is not important for our results. We could extend our analysis to a simultaneous causal model for $S$ and $Y$ by adding a "potential treatment process" that satisfies a consistency condition like Assumption 1. By analogy to the interpretation of Assumption 1, we could interpret the new consistency condition as excluding anticipatory effects of the outcome on treatment. As a practical matter, the consistency conditions would imply a convenient recursive structure and guarantee the absence of coherency problems. They would also provide a foundation for an asymmetric analysis with the above model in the case that we only observe $\{N_S(y); 0 \le y < Y\}$ and not $S$ itself.
2.4 Example: the effect of time-varying benefits on the unemployment duration
In this subsection we illustrate the non-identification result in Proposition 1 in the context
of a specific economic-theoretical (job search) model framework. The treatment is a
reduction of the benefits level of an unemployed individual at time $S$ and the outcome $Y$
is the unemployment duration. We contrast two job search models. Both models assume
rational expectations, but in the first model individuals have perfect foresight concerning
the realization of $S$ whereas in the second model they only know the stochastic assignment
rule. In other words, in the first model there is full anticipation whereas in the
second there is no anticipation. We demonstrate that the models can be observationally
equivalent. The Appendix contains derivation details.
To fix ideas, let the benefits reduction be a punishment in response to a violation
of minimum requirements on search effort (see Grubb, 1999, for a review of such sanction
policies in OECD countries). In reality, the realization of $S$ may not be known to anyone
in advance, for example because of imperfect monitoring and discretionary behavior of
the case worker. Alternatively, individuals may anticipate a sanction if they are warned
in advance or if they willingly choose to behave such that they know they will receive
a sanction at realization $s$ of $S$. Typically, the benefits reduction is not observed if it
is realized after the end of the unemployment spell (Abbring, Van den Berg and Van
Ours, 1997). This is what we assume here as well. To facilitate the exposition, we denote
potential outcomes by $Y^*$ in both models and we assume that assignment is randomized
(i.e. that $S \perp\!\!\!\perp \{Y^*\}$). Of course, this does not mean that $Y^*(s)$ does not depend on $s$.
In both models, job offers arrive at a Poisson rate. An offer is an independent draw from
a wage offer distribution. Once rejected, job offers cannot be recalled and, once accepted,
individuals are employed forever at the accepted wage. Individuals aim to maximize their
expected discounted lifetime utility by choosing search effort and deciding which offers to
accept. The latter decision can be characterized by a reservation wage, which typically
depends on all parameters of the theoretical model. The hazard rate $\theta_{Y^*(s)}$ (which is the
exit rate to work in the case that $S = s$) equals the product of the corresponding job offer
arrival rate and the probability that a wage offer exceeds the reservation wage.
The perfect-foresight model follows Mortensen (1977) and Van den Berg (1990).8 For
given $s$, the hazard rate of $Y^*(s)$ is increasing up to the point $y = s$, after which it
is constant. It is continuous at the point $y = s$. By choosing a suitable non-degenerate
distribution for $S$, the observable pre-sanction hazard rate (mixed over all possible realizations
of $S$) can be made constant over time. This illustrates the principle that unobserved
heterogeneity mitigates positive duration dependence effects on an observable hazard rate.
In the model without anticipation (Abbring, Van den Berg and Van Ours, 1997), the
hazard rate of $Y^*(s)$ is constant before and after $s$, but it jumps upward at $s$. We now
choose another particular non-degenerate distribution for $S$. The pre-sanction hazard does
not depend on $s$, so the observable pre-sanction hazard rate (mixed over all possible realizations
of $S$) is now also constant over time. In addition, the other observable quantities
can be made to equal those in the first model.
Basically, in the perfect-foresight model anticipation effects have to be integrated with
respect to the distribution of possible future sanction times in the cases in which no
sanction is imposed during the outcome spell. The resulting shape of the hazard rate can
also be generated by a model in which individuals do not anticipate and all face the same
8 It is usually applied to analyze the effect of finite benefits entitlement periods. In that case, $S$ is
typically always observed by the econometrician. Here we use the model as an easy-to-analyze extreme
version of a sanctions model with anticipation.
sanction rate. This problem is a manifestation of the problem of distinguishing duration
dependence and heterogeneity.
3 Single-spell data
3.1 Model framework
Single-spell data ideally provide the joint distribution of $(Y, \{N_S(y); 0 \leq y < Y\})$, conditional
on observed covariates $X$. In obvious notation, suppose we are given these data in the
form of a collection $\{Q_S, Q_Y\} := \{(Q_S(\cdot|x), Q_Y(\cdot|x)); x \in \mathcal{X}\}$ of conditional sub-survival
functions. Here, $\mathcal{X} \subset \mathbb{R}^k$, $1 \leq k < \infty$, is the support of $X$. In Section 2, we have seen
that Assumptions 1 and 2 are not sufficient to identify the causal model from $\{Q_S, Q_Y\}$.
In this section, we show that the model is identified under additional separability (proportionality)
assumptions.
Assumption 2 implies that any dependence between $Y$ and $S$ conditional on $(X,V)$
stems from causal effects of $S$ on $Y$. Unobserved selection effects are fully captured by the
unobserved covariates $V$. So, it is convenient to specify the model as a mixture of the joint
distribution of $(Y,S)|(X,V)$ over the distribution of $V|X$. In turn, the joint distribution of
$(Y,S)|(X,V)$ is the product of the distributions of $Y|(S,X,V)$ and $S|(X,V)$. So, $\{Q_S, Q_Y\}$
is fully specified by
Model 1a. The hazard rates of $Y|(S,X=x,V)$ and $S|(X=x,V)$ are given by
\[ \theta_Y(t|S,x,V) = \begin{cases} \lambda_Y(t)\phi_Y(x)V_Y & \text{if } t \leq S \\ \lambda_Y(t)\phi_Y(x)\delta(t|S,x)V_Y & \text{if } t > S \end{cases} \tag{2} \]
and
\[ \theta_S(t|x,V) = \lambda_S(t)\phi_S(x)V_S, \tag{3} \]
respectively. $V$ is an $\mathbb{R}^2_+$-valued random vector $(V_Y,V_S)'$ with distribution $G$,
independent of $x$. The following regularity conditions and normalizations hold.
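(i). $\phi_Y$ and $\phi_S$ are continuous functions $\phi_S : \mathcal{X} \to (0,\infty)$ and $\phi_Y : \mathcal{X} \to (0,\infty)$ such
that $\phi_Y(x^*) = \phi_S(x^*) = 1$ for some a priori chosen $x^* \in \mathcal{X}$.
(ii). The functions $\lambda_Y : \mathbb{R}_+ \to (0,\infty)$ and $\lambda_S : \mathbb{R}_+ \to (0,\infty)$ have integrals
\[ \Lambda_Y(t) := \int_0^t \lambda_Y(\tau)d\tau < \infty \quad\text{and}\quad \Lambda_S(t) := \int_0^t \lambda_S(\tau)d\tau < \infty \]
for all $t \in \mathbb{R}_+$. For some a priori chosen $t^* \in (0,\infty)$, $\Lambda_Y(t^*) = \Lambda_S(t^*) = 1$.
(iii). $G$ is such that $\Pr(V \in (0,\infty)^2) > 0$.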
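(iv). $\delta : \mathbb{R}^2_+ \times \mathcal{X} \to (0,\infty)$ is such that
\[ \Delta(t|s,x) := \int_s^t \delta(\tau|s,x)d\tau < \infty \quad\text{and}\quad \int_s^t \lambda_Y(\tau)\delta(\tau|s,x)d\tau < \infty \]
exist and are continuous on $\{(t,s) \in \mathbb{R}^2_+ : t > s\} \times \mathcal{X}$.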
Equations (2) and (3) have mixed proportional hazard (MPH) specifications, except for
the factor involving $\delta$. An assumption implicit in equation (2) is that $Y \perp\!\!\!\perp V_S \mid (S,X,V_Y)$.
Similarly, equation (3) embodies the assumption that $S \perp\!\!\!\perp V_Y \mid (X,V_S)$. One could say that
$V_Y$ captures the "unobserved heterogeneity" or "frailty" in $Y$ and $V_S$ captures the unobserved
determinants of $S$. The factors $\phi_Y(x)$ and $\phi_S(x)$ are called the "systematic parts"
of the hazards. In applied duration analysis, it is common to specify $\phi_i(x) = \exp(x'\beta_i)$,
so that the hazard function is multiplicative in all separate elements of $x$. Note that we
allow the elements in $X$ to be discrete and/or continuous.
Conditional on $(X,V)$, the variables $Y$ and $S$ are only dependent through $\delta(t|S,X)$.
Thus, by Assumption 2, this factor can be given a causal interpretation as the "treatment
effect" of $S$ on $Y$. Note that $\delta(t|S,x)$ only enters $\theta_Y(t|S,x,V)$ at durations $t > S$. So,
Model 1a satisfies Assumption 1.9 If $\delta = 1$, treatment is ineffective. On the other hand,
suppose that $\delta$ equals some constant larger than one. If $S$ is realized then $\theta_Y$ increases by
$(\delta - 1) \times 100\%$. This stochastically reduces the remaining outcome duration in comparison
to the case where the treatment is given at a later point of time. More generally, Model 1a
allows the treatment effect to depend on elapsed duration $t$, the treatment time $S$ and the
observed covariates $X$. By implication, it may also vary with the time $t - S$ since treatment.
If the treatment itself takes some time, the effect of $t - S$ may for instance capture a low
exit rate during treatment. In Subsection 3.3, we discuss an alternative model that allows
$\delta$ to depend on a third unobserved component but not on the treatment time.
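As a rough illustration of how Model 1a generates data, the following Python sketch simulates single spells under simplifying assumptions that are ours and not the paper's: constant baseline hazards $\lambda_Y = \lambda_S = 1$, a constant treatment effect $\delta$, and fixed values of $\phi_Y(x)$, $\phi_S(x)$, $V_Y$ and $V_S$ passed in as plain numbers. All function and argument names are hypothetical.

```python
import random


def draw_spell(rng, delta, phi_y=1.0, phi_s=1.0, v_y=1.0, v_s=1.0):
    """Draw (Y, S) from Model 1a with constant baselines lambda_Y = lambda_S = 1
    and a constant treatment effect delta (illustrative assumptions)."""
    rate_s = phi_s * v_s               # hazard of S, as in equation (3)
    rate_y = phi_y * v_y               # pre-treatment hazard of Y, as in equation (2)
    s = rng.expovariate(rate_s)        # treatment time
    e = rng.expovariate(1.0)           # unit exponential shock for Y
    if e <= rate_y * s:                # outcome occurs before treatment
        y = e / rate_y
    else:                              # hazard is rate_y * delta after s
        y = s + (e - rate_y * s) / (rate_y * delta)
    return y, s


def mean_duration(delta, n=20000, seed=0):
    """Average outcome duration over n simulated spells."""
    rng = random.Random(seed)
    return sum(draw_spell(rng, delta)[0] for _ in range(n)) / n


if __name__ == "__main__":
    print("delta = 1:", mean_duration(1.0))
    print("delta = 3:", mean_duration(3.0))
```

With $\delta = 1$ the mean duration in this parametrization should be close to 1, and with $\delta > 1$ it should be shorter. In observed data, $S$ would only be recorded when $S < Y$.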
Finally, the functions $\lambda_Y$ and $\lambda_S$ are called the "baseline hazards". The corresponding
hazard rates are said to be duration dependent if the functions $\lambda_Y$ and $\lambda_S$ are non-trivial.
The functions $\Lambda_Y$, $\Lambda_S$ and $\Delta$ appear naturally whenever we integrate the hazard
rates over their first (duration) argument. We do not require that $\lim_{t\to\infty}\Lambda_Y(t) = \infty$,
$\lim_{t\to\infty}\Lambda_S(t) = \infty$ or $\lim_{t\to\infty}\Delta(t|s,x) = \infty$. Also, we allow $V_Y$ and $V_S$ to have mass points
at 0. Thus, we allow for two sources of defectiveness (see Abbring, 2001b, for discussion).
In particular, this allows for a mass of individuals that never receives treatment.
Recall that, under consistency conditions like Assumption 1, the causal model from
Section 2 can be trivially extended with causal effects of the outcome on treatment. Model
1a can similarly be extended by introducing the mirror image of $\delta$ in equation (3). The
9 This also ensures that the stochastic process $\Pr(Y \geq t|S,X,V)$ is measurable with respect to
$\sigma(\{N_S(s); 0 \leq s < t\}, X, V)$ and $\Pr(Y \geq t|S,X,V) = \Pr(Y \geq t|\{N_S(s); 0 \leq s < t\}, X, V)$. We may
therefore apply the conventional exponential representation of the survivor function (Fleming and
Harrington, 1991).
resulting symmetric model is an extension with observed covariates, duration dependence
and unobserved heterogeneity of the bivariate exponential model developed by Freund
(1961). An example discussed by Freund is the bivariate distribution of the failure times
of the two engines of a jet. Failure of one engine increases the stress on, and therefore the
failure rate of, the remaining engine. This effect is symmetric. In Section 5, we provide
some examples of economic applications of a symmetric model. In such applications, as in
Freund's example, we typically observe not only $\{Q_S, Q_Y\}$, but the full joint distribution
of $(Y,S)|X$. We need this full distribution to identify causal effects in both directions. In
this paper, we focus on the asymmetric case in which we only observe $\{Q_S, Q_Y\}$ and the
treatment process $\{N_S\}$ may not even be well-defined after $Y$. This is why we use the
asymmetric terms "treatment" and "outcome" throughout.
3.2 Identification
The components of Model 1a that can be freely varied are $\Lambda_Y$, $\Lambda_S$, $\phi_Y$, $\phi_S$, $\Delta$ and $G$ (we
take the support $\mathcal{X}$ of $X$ as given). If we specify each of these components, we have fully
pinned down the data $\{Q_S, Q_Y\}$. Our identification analysis is non-parametric in the
sense that we allow each of the model components to vary in infinite-dimensional function
spaces, restricted by the regularity conditions of Subsection 3.1 and some additional
assumptions. Let $\mathcal{M}$ be the collection of all $(\Lambda_Y, \Lambda_S, \phi_Y, \phi_S, \Delta, G)$ that are admissible
under a given set of assumptions. We simply refer to $\mathcal{M}$ as the "model" (under a given
set of assumptions and for a given structure as in Model 1a) and to each $M \in \mathcal{M}$ as
a "specification" of the model. Each specification $M \in \mathcal{M}$ maps into exactly one data
collection $\{Q_S, Q_Y\}$. By analogy to Definition 1, two specifications $M, \widetilde{M} \in \mathcal{M}$ are said
to be observationally equivalent if the corresponding data $\{Q_S, Q_Y\}$ and $\{\widetilde{Q}_S, \widetilde{Q}_Y\}$ are
equal. Thus, we have
Definition 2. A model $\mathcal{M}$ is identified (from $\{Q_S, Q_Y\}$) if observational equivalence of
any two specifications $M, \widetilde{M} \in \mathcal{M}$ implies that $M = \widetilde{M}$.
If $\mathcal{M}$ is identified, then we also say that the components $\Lambda_Y$, etcetera, are identified.
Recall that, in the case of Model 1a, $\Delta$ is only defined, and only needs to be identified,
on $\{(t,s) \in \mathbb{R}^2_+ : t > s\} \times \mathcal{X}$. Once we know $\Delta$, we can identify $\delta(t|s,x)$ for almost all
$t > s$, for all $s \in \mathbb{R}_+$ and $x \in \mathcal{X}$. Because $\delta$ only enters the model through the factor
$\delta(t|S,x)^{N_S(t-)}$, only this restriction of $\delta$ to $\{(t,s) \in \mathbb{R}^2_+ : t > s\} \times \mathcal{X}$ is relevant. Also
note that we focus on non-parametric identification of the maps $\phi_Y$ and $\phi_S$. In practice,
one may start off with a parametric specification of, for example, $\phi_Y$ and require that all
parameters can be recovered from its graph $\{(x, \phi_Y(x)); x \in \mathcal{X}\}$. In the case where $\phi_Y(x)$
is log-linear in $x'\beta_Y$, this requires that $\mathcal{X}$ is not contained in a proper linear subspace of
the $k$-dimensional Euclidean space $\mathbb{R}^k$.
Now, consider the following assumptions.
Assumption 3. $\{(\phi_Y(x), \phi_S(x)); x \in \mathcal{X}\}$ contains a non-empty open set in $\mathbb{R}^2$.
Assumption 4. $\mathbb{E}[V_Y] < \infty$ and $\mathbb{E}[V_S] < \infty$.
Note that Assumption 3 does not impose exclusion restrictions on the structural components
of the hazards, but does require independent variation in these components. If
$\phi_Y(x) = \exp(x'\beta_Y)$ and $\phi_S(x) = \exp(x'\beta_S)$ then it is sufficient that $\mathcal{X}$ contains a
non-empty open set in $\mathbb{R}^2$. Assumption 4 is a common assumption in the analysis of single-spell
duration data with MPH-type models (e.g. Elbers and Ridder, 1982).
The identification analysis in this section exploits that Model 1a embeds a competing
risks model for the distribution $\{Q^0_S, Q_Y\}$ of the identified minimum. This embedded
model is a standard MPH competing risks model that does not involve the treatment
effect $\delta$. Conditional on the observed covariates $X$, the competing risks are only dependent
through the unobserved covariates $V$. Heckman and Honoré (1989) study the identifiability
of similar competing risk models. Here, we rely on the following result of Abbring and
Van den Berg (2001a),
Proposition 2. Under Assumptions 3 and 4, $\Lambda_Y$, $\Lambda_S$, $\phi_Y$, $\phi_S$ and $G$ in Model 1a are
identified from $\{Q^0_S, Q_Y\}$.
This leads to the following main result of this paper,
Proposition 3. Under Assumptions 3 and 4, $\Lambda_Y$, $\Lambda_S$, $\phi_Y$, $\phi_S$, $\Delta$ and $G$ in Model 1a are
identified from $\{Q_S, Q_Y\}$.
The proof of this proposition provides some intuition on what drives the identification
of the treatment effect. Consider individuals who receive a treatment at $s$. The natural
control group consists of individuals whose spells have not ended at $s$ and who have not yet
received a treatment. A necessary condition for a meaningful comparison of these groups
is that there is some randomization in the treatment assignment at $s$. The model allows
for such randomization because we specify assignment by way of the hazard rate of $S$. The
integrated hazard rate $\int_0^S \theta_S(\tau|x,V_S)d\tau$ has a unit exponential distribution independently
of $(x,V)$ (see e.g. Ridder, 1990, and Horowitz, 1999). Thus, there is a random component
in the assignment that is independent of $(x,V)$. This component affects $Y$ only by way
of the treatment.10 (In Section 5 we discuss the interpretation of this component in some
applications.) However, the presence of this component is not sufficient for a meaningful
comparison. We still have to deal with a selection effect: the unobserved heterogeneity
distribution is different between the treatment and control groups at $s$. This can be
10 It is tempting to interpret this random component in terms of the randomness of the outcome of $S$ at
$t$ that remains if the hazard rate $\theta_S(t|x,V_S)$ at $t$ is specified. Consider an individual who has not yet been
given a treatment and whose spell has not yet ended, at $t$. Basically, in a small time interval $[t, t+dt)$, the
probability of treatment is $\theta_S(t|x,V_S)dt$, and the probability of no treatment is $1 - \theta_S(t|x,V_S)dt$. This is
a Bernoulli trial. Given the value of $\theta_S(t|x,V_S)dt$, its outcome is completely random.
corrected by exploiting the information in the competing risks part of the data. The
competing risks part of the model determines the distribution of $V_Y$ in the two groups at
$s$ and the competing risks part of the data enables the identification of this.
The results in our companion paper Abbring and Van den Berg (2001b) provide some
additional intuition for the case of a constant treatment effect $\delta$. We show that single-spell
data are informative on the sign of $\ln(\delta)$ even if there is no variation in $X$. If
treatment and outcome are typically realized very quickly after each other, no matter
how long the elapsed duration in the state of interest before the treatment, then this is
evidence of a positive causal treatment effect. The selection effect does not give rise to
the same type of quick succession of events. If there is no such pattern but yet there is a
correlation between the points in time of treatment and outcome then this is evidence of a
dominating selection effect. Somewhat loosely, local dependence indicates a causal effect
whereas global dependence indicates a selection effect. This is not surprising given that
the treatment only works after it has been realized, whereas the unobserved heterogeneity
values affect the hazard rates everywhere. This illustrates the usefulness of the information
on the timing of events to assess treatment effects. We return to this in Subsection 5.3,
where we make a comparison with difference-in-difference methods of inference.
3.3 Heterogeneous treatment effects
Model 1a excludes unobserved heterogeneity in the treatment effect. Recent contributions
to the literature on treatment evaluation are often concerned with treatment effects that
may depend on unobservables (see e.g. Heckman, LaLonde and Smith, 1999, for a survey).
The following model allows $\delta$ to depend on unobservables, but, unlike Model 1a, excludes
variation of $\delta$ with $S$.
Model 1b. The hazard rates of $Y|(S,X=x,V)$ and $S|(X=x,V)$ are given by
\[ \theta_Y(t|S,x,V) = \begin{cases} \lambda_Y(t)\phi_Y(x)V_Y & \text{if } t \leq S \\ \lambda_\Delta(t)\phi_\Delta(x)V_\Delta & \text{if } t > S \end{cases} \tag{4} \]
and
\[ \theta_S(t|x,V) = \lambda_S(t)\phi_S(x)V_S, \tag{5} \]
respectively. $V$ is now an $\mathbb{R}^3_+$-valued random vector $(V_Y, V_\Delta, V_S)'$ with distribution
$G_\Delta$, independent of $x$. The relevant regularity conditions and normalizations of Model
1a hold and
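(i). $\phi_\Delta : \mathcal{X} \to (0,\infty)$ is such that $\phi_\Delta(x^*) = 1$ for some a priori chosen $x^* \in \mathcal{X}$.
(ii). The function $\lambda_\Delta : \mathbb{R}_+ \to (0,\infty)$ has an integral
\[ \Lambda_\Delta(t) := \int_0^t \lambda_\Delta(\tau)d\tau < \infty \]
for all $t \in \mathbb{R}_+$. For some a priori chosen $t^* \in (0,\infty)$, $\Lambda_\Delta(t^*) = 1$.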
(iii). $G_\Delta$ is such that $\Pr(V \in (0,\infty)^3) > 0$.
The treatment effect,
\[ \delta(t|x,V_Y,V_\Delta) := \frac{\lambda_\Delta(t)\phi_\Delta(x)V_\Delta}{\lambda_Y(t)\phi_Y(x)V_Y}, \]
indeed depends on unobservables, but not on $S$. Note that $\delta$ is only well-defined for $V_Y > 0$
here. We could use the point $\infty$ to represent the case $V_Y = 0$ but, instead, we focus directly
on identification of $\Lambda_Y$, $\Lambda_\Delta$, $\phi_Y$, $\phi_\Delta$ and $G_\Delta$.
First, note that Model 1b embeds the same MPH competing risks model as Model 1a,
so that Proposition 2 again applies. We need some additional assumptions,
Assumption 5. $\{\phi_\Delta(x); x \in \mathcal{X}\}$ contains a non-empty open interval in $\mathbb{R}$.
Assumption 6. $\mathbb{E}[V_\Delta V_S] < \infty$.
These new assumptions are simply standard MPH assumptions (e.g. Elbers and Ridder,
1982) on the embedded model for $(Y - S)|(S, Y > S, X = x, V)$. After all, this is an MPH
model with hazard $\lambda_\Delta(y)\phi_\Delta(x)V_\Delta$ at time $y - S$ (since $S$). The finite-mean Assumption
6 applies to $V_\Delta V_S$ rather than $V_\Delta$ because of the conditioning on $S$.
Proposition 4. Under Assumptions 3-6, $\Lambda_Y$, $\Lambda_\Delta$, $\Lambda_S$, $\phi_Y$, $\phi_\Delta$, $\phi_S$ and $G_\Delta$ in Model 1b
are identified from $\{Q_Y, Q_S\}$.
The results in this section require separability assumptions, independence of $X$ and $V$,
sufficient variation in the observed covariates and finiteness of the means of $V$. Although
it is common to make such assumptions in applied duration analysis, they are sometimes
hard to justify (see Subsection 5.2 for an economic-theoretic perspective on the proportionality
assumption). To deal with this, we extend our analysis in two directions. In the
next section we show that observed covariates, and most of the assumptions above,
are not needed for identification if we have multiple-spell data. In Abbring and Van den
Berg (2001b) we show that single spells are informative on aspects of $\delta$ and $G$ even if
there is no variation in $X$.
4 Identification in case of multiple-spell data
4.1 Multiple-spell data
In this section, we study identification from data that cover multiple outcome and treatment
spells for each individual. We focus on the case in which the data provide information
on the distribution of two full spells over the population of individuals. The extension to
more than two spells is trivial. Actually, the use of the term "individual" is not very appropriate
here, because the setup includes cases in which physically different individuals
share the same value of $V$ and we observe one duration for each of these individuals. It is
convenient to refer to either such a group of individuals or a single individual for which
we have multiple spells as a stratum. For each stratum, we observe a random draw from
the joint distribution of $(Y_1, \{N_{S_1}(y); 0 \leq y < Y_1\}, Y_2, \{N_{S_2}(y); 0 \leq y < Y_2\})$. We add
subscripts 1 and 2 to random variables and functions corresponding to, respectively, spell
1 and 2 in a stratum. So, $Y_1$ is the outcome duration in the first spell, $Y_2$ the outcome
duration in the second spell in a stratum, etcetera. By analogy to (1), the multiple-spell
data distribution can be characterized by
\[ Q_{SS}(y_1,y_2,s_1,s_2) := \Pr(Y_1 > y_1, Y_2 > y_2, S_1 > s_1, S_2 > s_2, Y_1 > S_1, Y_2 > S_2), \]
\[ Q_{YS}(y_1,y_2,s_2) := \Pr(Y_1 > y_1, Y_2 > y_2, S_2 > s_2, S_1 > Y_1, Y_2 > S_2), \]
\[ Q_{SY}(y_1,y_2,s_1) := \Pr(Y_1 > y_1, Y_2 > y_2, S_1 > s_1, Y_1 > S_1, S_2 > Y_2) \]
and
\[ Q_{YY}(y_1,y_2) := \Pr(Y_1 > y_1, Y_2 > y_2, S_1 > Y_1, S_2 > Y_2) \]
for all $(y_1,y_2,s_1,s_2) \in \mathbb{R}^4_+$. Throughout this section we suppress observed covariates. The
identification results do not require variation in observed covariates. Moreover, the model
allows for full interaction of such covariates with all other determinants. All results can
be interpreted as being conditional on observed covariates.
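To make the four sub-survival functions more tangible, the following Python sketch computes their empirical counterparts from a sample of strata. It is only an illustration: the data layout (tuples `(Y1, S1, Y2, S2)` with `None` for a treatment that is not observed because it would start after the outcome spell ends) and the function names are our own assumptions.

```python
def treated(s, y):
    """True if treatment time s is observed, i.e. it precedes the outcome y."""
    return s is not None and s < y


def q_ss(data, y1, y2, s1, s2):
    """Empirical Q_SS: both spells are treated before they end."""
    return sum(
        treated(a, ya) and treated(b, yb)
        and ya > y1 and yb > y2 and a > s1 and b > s2
        for ya, a, yb, b in data
    ) / len(data)


def q_ys(data, y1, y2, s2):
    """Empirical Q_YS: only the second spell is treated before it ends."""
    return sum(
        (not treated(a, ya)) and treated(b, yb)
        and ya > y1 and yb > y2 and b > s2
        for ya, a, yb, b in data
    ) / len(data)


def q_sy(data, y1, y2, s1):
    """Empirical Q_SY: only the first spell is treated before it ends."""
    return sum(
        treated(a, ya) and (not treated(b, yb))
        and ya > y1 and yb > y2 and a > s1
        for ya, a, yb, b in data
    ) / len(data)


def q_yy(data, y1, y2):
    """Empirical Q_YY: neither spell is treated before it ends."""
    return sum(
        (not treated(a, ya)) and (not treated(b, yb))
        and ya > y1 and yb > y2
        for ya, a, yb, b in data
    ) / len(data)


if __name__ == "__main__":
    sample = [(1.0, None, 2.0, 0.5), (3.0, 1.0, 2.0, 1.5),
              (0.5, None, 0.7, None), (2.0, 0.2, 1.0, None)]
    print(q_ss(sample, 0, 0, 0, 0), q_ys(sample, 0, 0, 0),
          q_sy(sample, 0, 0, 0), q_yy(sample, 0, 0))
```

Evaluated at zero thresholds, the four empirical functions partition the sample by treatment status, so they sum to one when all durations are strictly positive.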
Unlike single-spell data, multiple-spell data form a true, non-linear panel. As with
linear panel data, the panel structure can be exploited to deal with unobserved heterogeneity
under conditions that are mild relative to the single-spell (i.e. cross-sectional)
case. Intuitively, we need some separability of the outcome hazards in treatment effect
and unobserved covariate components. Then, if the unobserved components are constant
between consecutive spells of each single individual, variation between spells and within
individuals can be used to control for selection effects and identify the treatment effects.
In the next subsection, we study the multiple-spell analogues of Models 1a and 1b.
These models allow for different baseline hazards and treatment effects between spells
within a stratum, but have multiplicative unobserved heterogeneity in the hazards. In
the vein of the literature on identification of multiple-spell duration models (Honoré,
1993), we explore identifiability of fully specified bivariate models. In Subsection 4.3, we
focus on identification of the treatment effect in models that only specify the conditional
distribution of the outcome spells. Stratified partial likelihood arguments are applied to
prove identification of the treatment effects in a model that allows for interactions between
duration effects and unobserved heterogeneity.
4.2 Identification of the full model
Consider the following multiple-spell analogues of Models 1a and 1b,
Model 2a. Multiple spells within a stratum are only dependent through the covariates $V$,
i.e. $(Y_1,S_1) \perp\!\!\!\perp (Y_2,S_2) \mid V$. The hazard rates of $Y_k|(S_k,V)$ and $S_k|V$ are
\[ \theta_{Y_k}(t|S_k,V) = \begin{cases} \lambda_{Y_k}(t)V_Y & \text{if } t \leq S_k \\ \lambda_{Y_k}(t)\delta_k(t|S_k)V_Y & \text{if } t > S_k \end{cases} \quad\text{and}\quad \theta_{S_k}(t|V) = \lambda_{S_k}(t)V_S, \quad k = 1,2, \]
respectively. $V := (V_Y,V_S)$ has distribution $G$. Regularity conditions as in Model 1a apply.
For some a priori chosen $t^* \in (0,\infty)$, $\Lambda_{Y_1}(t^*) = \Lambda_{S_1}(t^*) = 1$ (in obvious notation).
and
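Model 2b. Same as Model 2a except that
\[ \theta_{Y_k}(t|S_k,V) = \begin{cases} \lambda_{Y_k}(t)V_Y & \text{if } t \leq S_k \\ \lambda_{\Delta_k}(t)V_\Delta & \text{if } t > S_k \end{cases}, \quad k = 1,2, \]
and $V := (V_Y, V_\Delta, V_S)$ has distribution $G_\Delta$. For some a priori chosen $t^* \in (0,\infty)$,
$\Lambda_{\Delta_1}(t^*) = 1$.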
The first line excludes any causal effects across different spells within a stratum. Outcomes
and treatments may be related between spells, but only through common unobservable
components. The normalizations are innocuous because the unobserved covariates can
capture the scale of first-period hazards. In both models, the same unobservables enter
the first and second period hazards multiplicatively. This suggests that we can exploit the
panel structure of the data, even though the baseline hazards and treatment effects are
not specified to be the same in the first and second spell of a stratum. Indeed, we have
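Proposition 5. $\Lambda_{Y_1}$, $\Lambda_{Y_2}$, $\Lambda_{S_1}$, $\Lambda_{S_2}$ and, respectively, $\Delta_1$, $\Delta_2$ and $G$ in Model 2a and
$\Lambda_{\Delta_1}$, $\Lambda_{\Delta_2}$ and $G_\Delta$ in Model 2b are identified from $(Q_{SS}, Q_{YS}, Q_{SY}, Q_{YY})$.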
The proof exploits that both models embed a two-spell version of the competing risks
model with hazards that are proportional in $V$ and $t$. Abbring and Van den Berg (2001a)
show that $\Lambda_{Y_1}$, $\Lambda_{Y_2}$, $\Lambda_{S_1}$ and $\Lambda_{S_2}$ (but not $G$) in Model 2a or 2b are identified. Unlike the
single-spell case, we do not invoke a competing risks result that also provides identification
of $G$. This would impose additional conditions on the model that we can circumvent here
by exploiting the additional (non-competing-risk) information in $(Q_{SS}, Q_{YS}, Q_{SY}, Q_{YY})$.
Note that we do not need finite $\mathbb{E}[V]$. Also, recall that we implicitly allow observed
covariates $X$ to enter the model in a general way. We could add $X$ as an argument to
each of the model components in Models 2a and 2b and allow $V$ and $X$ to be dependent.
The gain (in terms of identification) from using multiple-spell data is similar to the gain
from using such data for MPH models without treatment effects (Honoré, 1993).
4.3 Identification and estimation from a stratified partial likelihood
The results in the previous subsection show that multiple-spell data allow for robust
inference on the treatment effects under the assumption of proportional unobserved heterogeneity
that is common within strata. We have provided some intuition by pointing out
the analogy with linear panel regression models with individual-specific effects. Provided
that there is sufficient variation of the regressors over time, the "within estimator" can
be used to estimate the regression parameters in such models. This analogy seems to be
tailored to an analysis of the model of $Y|(S,V)$ only, i.e. without specifying the marginal
distribution of $(S,V)$. In this subsection, we analyze identification of treatment effects in
such a conditional model. We use arguments based on an application of Cox (1972) partial
likelihood to stratified data pioneered by Holt and Prentice (1974). This "stratified partial
likelihood" (SPL) estimator is indeed the closest we can get to a "within" estimator in a
proportional hazard model.
First, we have to slightly adapt notation. The SPL method allows for general interactions
between time and unobserved covariates. Therefore, we replace $V_Y$ by a (for now)
stochastic process $\{V_Y\}$. It is implicitly understood that $\{V_Y\}$ replaces $V_Y$ in $V$. We have
Model 2c. Multiple outcome spells within a stratum are only dependent through the covariates
$V$ and related treatments $S_1$ and $S_2$, i.e. $Y_1 \perp\!\!\!\perp (Y_2,S_2) \mid (S_1,V)$ and $Y_2 \perp\!\!\!\perp (Y_1,S_1) \mid (S_2,V)$.
The hazard rate of $Y_k|(S_k,V)$ is
\[ \theta_{Y_k}(t|S_k,V) = \begin{cases} \lambda_{Y_k}(t)V_Y(t) & \text{if } t \leq S_k \\ \lambda_{Y_k}(t)\delta_k(t|S_k)V_Y(t) & \text{if } t > S_k \end{cases}, \quad k = 1,2. \]
Regularity conditions as in Model 1a apply. In particular $\lambda_{Y_k}(\cdot)V_Y(\cdot)$ is integrable on
bounded intervals. We normalize $\lambda_{Y_1} = 1$.
The unobserved heterogeneity $\{V_Y\}$ only enters hazard rates at $t$ through its current
value $V_Y(t)$. It is crucial that $\{V_Y\}$ is the same between the spells in a stratum. The spell-specific
baseline hazards $\lambda_{Y_1}$ and $\lambda_{Y_2}$ allow for different duration dependence patterns
between the spells. The normalization $\lambda_{Y_1} = 1$ is innocuous, as $\{V_Y\}$ can absorb the
duration dependence and scale of $\lambda_{Y_1}$.
Let $Y_{(1)} := \min\{Y_1, Y_2\}$ be given. Intuitively, the probability that, say, the first spell
in the stratum is the shortest, i.e. $Y_1 = Y_{(1)}$, does not depend on the factor $\{V_Y\}$ that is
common between the spells in the stratum. It only depends on the duration dependence
factor $\lambda_{Y_2}$ and the treatment effects, which may be different between the spells if the
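treatment histories $\{N_{S_1}(\tau); 0 \leq \tau < Y_{(1)}\}$ and $\{N_{S_2}(\tau); 0 \leq \tau < Y_{(1)}\}$ are. Formally,
\[
\begin{aligned}
&\Pr\big(Y_1 = Y_{(1)} \,\big|\, Y_{(1)}, \{N_{S_1}(\tau); 0 \leq \tau < Y_{(1)}\}, \{N_{S_2}(\tau); 0 \leq \tau < Y_{(1)}\}, V\big) \\
&\quad = \frac{\delta_1(Y_{(1)}|S_1)^{N_{S_1}(Y_{(1)}-)}}{\delta_1(Y_{(1)}|S_1)^{N_{S_1}(Y_{(1)}-)} + \lambda_{Y_2}(Y_{(1)})\,\delta_2(Y_{(1)}|S_2)^{N_{S_2}(Y_{(1)}-)}} \\
&\quad = \begin{cases}
\dfrac{1}{1 + \lambda_{Y_2}(Y_{(1)})} & \text{if } S_1, S_2 > Y_{(1)} \\[2ex]
\dfrac{\delta_1(Y_{(1)}|S_1)}{\delta_1(Y_{(1)}|S_1) + \lambda_{Y_2}(Y_{(1)})} & \text{if } S_1 < Y_{(1)} < S_2 \\[2ex]
\dfrac{1}{1 + \lambda_{Y_2}(Y_{(1)})\,\delta_2(Y_{(1)}|S_2)} & \text{if } S_1 > Y_{(1)} > S_2 \\[2ex]
\dfrac{\delta_1(Y_{(1)}|S_1)}{\delta_1(Y_{(1)}|S_1) + \lambda_{Y_2}(Y_{(1)})\,\delta_2(Y_{(1)}|S_2)} & \text{if } S_1, S_2 < Y_{(1)}.
\end{cases}
\end{aligned}
\tag{6}
\]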
The right-hand side does not depend on $V$. Indeed, the only unknown components are
the second period baseline hazard, which captures the duration-dependence difference
between spells, and the treatment effects. For the first sub-population of strata in (6),
with $S_1, S_2 > Y_{(1)}$, the only reason the spells may not be equally likely to be the shortest
is a difference in the baselines at $Y_{(1)}$. Treatment effects are irrelevant because in both
spells treatment has not started at time $Y_{(1)}$. The second and third sub-populations are
directly informative on the treatment effects as the treatment status of the spells at $Y_{(1)}$ is
different. Finally, the strata in which $S_1, S_2 < Y_{(1)}$ provide some more diffuse information
on the parameters because the treatment effect enters for both spells.
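The four cases of (6) can be written as one small function. The Python sketch below is illustrative only. The function name, the use of `math.inf` for a treatment that has not been observed, and the representation of the second-spell baseline and the treatment effects as callables are our own choices, and the first-spell baseline is normalized to one as in Model 2c.

```python
import math


def prob_first_is_shortest(t, s1, s2, lam2, delta1, delta2):
    """Right-hand side of equation (6) evaluated at Y_(1) = t.

    s1, s2:  treatment times (math.inf if no treatment is observed);
    lam2:    callable t -> second-spell baseline hazard at t;
    delta1, delta2: callables (t, s) -> treatment effect in spell 1 and 2.
    """
    # A treatment effect only enters if the spell was treated strictly before t.
    h1 = delta1(t, s1) if s1 < t else 1.0
    h2 = lam2(t) * (delta2(t, s2) if s2 < t else 1.0)
    return h1 / (h1 + h2)


if __name__ == "__main__":
    def const_delta(t, s):
        return 2.0        # hypothetical constant treatment effect

    def flat(t):
        return 1.0        # hypothetical flat second-spell baseline

    for s1, s2 in [(math.inf, math.inf), (0.5, math.inf), (math.inf, 0.5), (0.2, 0.5)]:
        print(s1, s2, prob_first_is_shortest(1.0, s1, s2, flat, const_delta, const_delta))
```

With equal baselines and a common constant effect, the probability differs from one half only when exactly one of the two spells has been treated by $Y_{(1)}$.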
Because $V$ does not enter the right-hand side of (6), the same expression is valid for
$\Pr(Y_1 = Y_{(1)} | Y_{(1)}, \{N_{S_1}(\tau); 0 \leq \tau < Y_{(1)}\}, \{N_{S_2}(\tau); 0 \leq \tau < Y_{(1)}\})$, which is observable. So, we
can first identify $\Lambda_{Y_2}$ from data on the first sub-population of strata and then $\Delta_1$ and $\Delta_2$
from the second and third sub-populations, respectively. So, we have
Proposition 6. $\Lambda_{Y_2}$, $\Delta_1$ and $\Delta_2$ in Model 2c are identified from $(Q_{SS}, Q_{YS}, Q_{SY}, Q_{YY})$.
We finish by exploring the relation to SPL estimation. Suppose we have a random
sample of $n$ strata $\{(Y_{1,i}, \{N_{S_{1,i}}(y); 0 \leq y < Y_{1,i}\}, Y_{2,i}, \{N_{S_{2,i}}(y); 0 \leq y < Y_{2,i}\}); i = 1,\ldots,n\}$
from $(Y_1, \{N_{S_1}(y); 0 \leq y < Y_1\}, Y_2, \{N_{S_2}(y); 0 \leq y < Y_2\})$. First, consider a
special case of Model 2c with $\lambda_{Y_2} = 1$ and $\delta_1 = \delta_2 = \delta$ for some constant $\delta \in (0,\infty)$, so
that the outcome hazards for stratum $i$ are
\[ \theta_{Y_{k,i}}(t|S_{k,i},V_i) = \exp\{\ln(\delta) N_{S_{k,i}}(t-)\} V_{Y,i}(t), \quad k = 1,2. \tag{7} \]
Here, it is irrelevant that $\{V_{Y,i}\}$ is a stochastic process. We can simply treat each $V_{Y,i}(t)$
as an incidental parameter and $V_{Y,i}$ as a stratum-specific baseline function. A partial
likelihood can be based on the ranks of the outcome spells within strata. Equation (6)
delivers
\[ \Pr\big(Y_{1,i} = Y_{(1),i} \,\big|\, Y_{(1),i}, \{N_{S_{1,i}}(\tau); 0 \leq \tau < Y_{(1),i}\}, \{N_{S_{2,i}}(\tau); 0 \leq \tau < Y_{(1),i}\}\big) = \frac{1}{1 + \exp\{\ln(\delta)[N_{S_{2,i}}(Y_{(1),i}-) - N_{S_{1,i}}(Y_{(1),i}-)]\}}. \tag{8} \]
Clearly, a partial likelihood contribution based on (8) is only informative on $\delta$ if
$N_{S_{2,i}}(Y_{(1),i}-) \neq N_{S_{1,i}}(Y_{(1),i}-)$, i.e. if exactly one of the two spells in the stratum has been
treated by the time $Y_{(1),i}$. With $S_{(1),i}$ and $S_{(2),i}$ denoting the treatment times in the spells with
the shortest and the longest outcome in stratum $i$, respectively, this gives

  #       strata such that                        with SPL contribution
  $M_1$   $S_{(1),i} < Y_{(1),i} < S_{(2),i}$     $\delta(1+\delta)^{-1}$
  $M_2$   $S_{(1),i} > Y_{(1),i} > S_{(2),i}$     $(1+\delta)^{-1}$

The first row corresponds to the strata in which the shortest outcome spell is the spell
that has been treated by $Y_{(1),i}$. We denote the number of these strata in the sample by
$M_1$. The second row corresponds to the $M_2$ strata in which the longest outcome spell is
the one treated at $Y_{(1),i}$. None of the other strata contributes to the SPL based on (8),
which therefore is proportional to
\[ \left(\frac{\delta}{1+\delta}\right)^{M_1} \left(\frac{1}{1+\delta}\right)^{M_2}. \]
Thus, we have
Proposition 7. The maximum SPL estimator $\hat\delta$ of the treatment effect in Model 2c with
$\lambda_{Y_2} = 1$ and $\delta_1 = \delta_2 = \delta$ for some $\delta \in (0,\infty)$ only uses strata in which exactly one of the
two spells has been treated at the time at which the spell with the shortest outcome ends.
Specifically,
\[ \hat\delta = \frac{M_1}{M_2} = \frac{\text{number of strata with } S_{(1),i} < Y_{(1),i} < S_{(2),i}}{\text{number of strata with } S_{(1),i} > Y_{(1),i} > S_{(2),i}}. \]
This is intuitively plausible. First, suppose that as many treated as untreated spells are shortest among the strata in which exactly one of the spells has been treated by $Y_{(1),i}$. Then, there is no evidence that treatment has any effect and $\hat\delta = 1$. On the other hand, if the treated spell is more often the shortest than the longest in the relevant strata, treatment is likely to increase the outcome hazard and $\hat\delta > 1$.
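The counting rule behind the estimator can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the data layout (each stratum as two `(outcome, treatment_time)` pairs) and the simulated data-generating process (a lognormal stratum effect `v`, hazard `v` before treatment and `delta * v` after, treatment rate `0.5 * v`) are all assumptions made here for illustration.

```python
import math
import random


def spl_delta_hat(strata):
    """Return (delta_hat, m1, m2) for strata of two spells.

    Each stratum is a pair ((y1, s1), (y2, s2)) of outcome duration and
    treatment time per spell; math.inf marks a spell that is never treated.
    """
    m1 = m2 = 0
    for spell_a, spell_b in strata:
        # Sort the two spells by outcome duration (ties have probability zero).
        (y_short, s_short), (_, s_long) = sorted((spell_a, spell_b))
        treated_short = s_short < y_short  # shortest spell treated before it ends
        treated_long = s_long < y_short    # other spell treated by that time
        if treated_short and not treated_long:
            m1 += 1
        elif treated_long and not treated_short:
            m2 += 1
    delta_hat = m1 / m2 if m2 > 0 else math.inf
    return delta_hat, m1, m2


def simulate_strata(n, delta, seed=0):
    """Hypothetical generator: hazard v before treatment, delta * v after."""
    rng = random.Random(seed)
    strata = []
    for _ in range(n):
        v = rng.lognormvariate(0.0, 1.0)   # stratum-specific heterogeneity
        spells = []
        for _ in range(2):
            s = rng.expovariate(0.5 * v)   # treatment time depends on v (selection)
            e = rng.expovariate(1.0)       # unit exponential integrated hazard
            y = e / v if e <= v * s else s + (e - v * s) / (delta * v)
            spells.append((y, s))
        strata.append(tuple(spells))
    return strata


if __name__ == "__main__":
    print(spl_delta_hat(simulate_strata(20000, delta=2.0)))
```

Strata in which both or neither of the spells have been treated by the first exit are skipped, so the stratum effect `v` never enters the computation, even though it drives both treatment and outcome in the simulated data.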
Holt and Prentice (1974) discuss a proportional hazard model with time-varying covariates $X_{k,i}(t)$ and hazard rate
$$\theta_i(t \mid \{X_{k,i}\}) = \exp\bigl(X_{k,i}(t)'\beta\bigr)\, V_i(t) \tag{9}$$
for spell $k$ in stratum $i$. We attach the subscript $i$ to the hazard instead of conditioning on $\{V_i\}$ because the $V_i(t)$ are treated as incidental (stratum-specific) "parameters". In this model, $\beta$ can be estimated by maximizing a SPL based on within-stratum ranks (see also Kalbfleisch and Prentice, 1980, and Ridder and Tunalı, 1999, for a more formal analysis with ample attention for practical problems like censoring). Our simple model (7) is a special case of (9) with $\beta = \ln(\delta)$ and $X_{k,i}(t) = N_{S_{k,i}}(t-)$. The estimator $\hat\delta$ in Proposition 7 is a special case of the SPL estimator of $\beta$. By analogy to the general SPL method, we can treat $\{V_i\}$ as an incidental "parameter" instead of a stochastic process.
Our estimator $\hat\delta$ is interesting by itself because of its intuitive and simple form, which is due to the special structure of the treatment effects in our model. From Proposition 6 it is clear that our estimator can be extended to allow for spell-specific baselines and treatment effects that depend on $t$ and $S$, at the cost of its simple structure. It is also interesting that we can allow for different treatment effects between spells in a stratum even though identification is based on contrasting outcomes between spells within strata. By Assumption 1, treatment in one spell can always be contrasted with no treatment (yet) in the other spell. It should be noted that a model with spell-specific treatment effects and baselines requires that we can unambiguously order spells in each stratum. Such an order is naturally provided by the chronology of the consecutive spells of a single individual, but seems absent in, for example, a stratum consisting of a pair of twins.
5 Discussion
This section serves three purposes. First, we discuss the applicability of the model framework as a reduced-form model. Secondly, we address the extent to which the proportional specification of the hazard rates at the individual level can be founded on economic-theoretical grounds. Thirdly, we relate our results to the methodological literature on treatment effects. Specifically, we compare our model framework and its identification to popular limited-dependent variable and panel-data model frameworks and their identification.
5.1 Applicability
We start this subsection by briefly repeating some distinguishing properties of the model as a reduced-form model of treatment effects in economics. We use this as a guideline for a characterization of the type of situations for which our model provides a reasonably accurate description of the individual treatment effect. After that, we give some concrete examples. Throughout this subsection we take the proportionality assumptions as given.
Individuals can be aware of the existence of the treatment assignment process and we allow them to influence this process as well as the rate at which they leave the state of interest.11 However, individuals do not use knowledge of the point in time at which they receive treatment or the point in time at which they leave the state of interest.12 We do not assume that the analyst observes determinants of the treatment assignment process that the individual does not use himself to update his strategy. Finally, from the point of view of the individual there is a purely random component in the duration until the actual treatment. In practice, this may reflect behavior of the institution that supplies or imposes the treatments or it may reflect purely random external shocks (see below for examples).
11 For example, the individuals may know that $\lambda_Y(t)$ increases in the near future and modify their strategy accordingly, which may affect their $\theta_S$. The latter can be captured in the model by $\lambda_S(t)$.
12 If the time span between the point in time at which anticipation of the treatment would occur and the point in time of the actual treatment is short, relative to the durations $S$ and $Y - S$, and if the anticipatory effect is not very large, then estimation results may be rather insensitive to the no anticipation assumption.
It follows that the model is typically not suitable for situations in which the individual himself has permanent full control over the assignment of treatment. In such situations, the moment of treatment is a deterministic function of the (determinants of the) individual's strategy, so that there is no random component in the duration until that moment. In addition, in such situations the moment of treatment is by definition anticipated. Our model is therefore more suitable for situations in which, from the perspective of the individual, there is a surprise element in the occurrence of a treatment. The treatment process may capture self-selection behavior on the basis of observable and unobservable determinants of costs and benefits of the treatment.
Of course, as long as there are no multiple observations on the outcome of interest for given unobserved determinants, all such considerations equally apply to other methods for inference of treatment effects in the presence of selectivity. In general, some independent randomness in treatment assignment is needed and anticipation is (often tacitly) ruled out.13 An advantage of our dynamic analysis is that it highlights these issues.
13 For example, many standard treatment evaluation studies suffer from a potential bias due to anticipatory effects. This includes studies using "difference-in-difference" methods where one "difference" concerns a comparison between pre- and post-treatment circumstances (see Heckman, LaLonde and Smith, 1999, for an overview).
We now discuss a number of applications, partly from empirical studies in which treatments are analyzed with models similar to our Model 1a.
Example 1. Punitive sanctions for unemployment benefits recipients.
This has already served as an example in Subsection 2.4. Abbring, Van den Berg and Van Ours (1997) and Van den Berg, Van der Klaauw and Van Ours (1998) analyze the effect of sanctions on re-employment rates by estimating models similar to Model 1a. They argue that imperfect monitoring and discretionary powers of the benefits agency give rise to random components in the imposition of a sanction and that these are unpredictable before a sanction is actually imposed. They support this with results from field research among the case workers who decide on the imposition of a sanction. It is plausible that individuals are aware of the relation between their behavior and their sanction risk. Both the re-employment rate and the rate at which a sanction is imposed are affected by the individual's strategy and all of its determinants. Note that sanctions can be supplemented with compulsory training.
Example 2. The hiring of replacement workers during strikes.
From the point of view of the employees who are on strike, $Y$ is the duration of the strike and $S$ is the duration until the employer hires replacement workers. This is another example where the treatment is a punishment. Punishments occur in a world with imperfect monitoring and in general the individuals do not know in advance if and exactly when they are punished. However, the rate at which replacement workers are brought in depends on characteristics of the conflict, including the behavior and characteristics of the union which manages the strike. Also, the strike duration is affected by the hiring of replacement workers. Cramton and Tracy (1998) survey the existing theoretical and empirical literature on these issues.
Example 3. Labor market transition of spouse.
Consider an individual and spouse who have jobs in the same city. Suppose that the spouse leaves her job in order to accept another one in another city. This will affect the individual's own transition rate to another job. Let, from the point of view of the individual, $Y$ denote his own job duration and $S$ the spouse's job duration. It is obvious that the (unobserved) determinants of both job durations are related. Moreover, the realization of $S$ is not fully predictable because it depends on the inherent randomness in the process according to which workers and employers meet and form matches. Note that this is an example in which the symmetric extension of the model can be applied.
Example 4. Marriage, child birth and death.
Lillard (1993) estimates a model for the joint durations of marriage and time until conception of a child, allowing the rate at which the marriage dissolves to change in response to child birth. Lillard and Panis (1996) estimate a model on the joint durations of marriage, non-marriage, and life, allowing the death rate to change in response to marriage formation and dissolution. The model allows for selection on the basis of unobservables. It does not capture that individuals know some time in advance when they marry or when a child is born. However, the time span between the moment at which the anticipation occurs and the moment of the actual treatment may be short relative to the durations between the observed events.14
14 Hougaard, Harvald and Holm (1992) study the joint distribution of life times of twins. They estimate two models: the Freund (1961) model, which allows for a causal effect of the death of one twin on the hazard of the other, and a model that does not allow for such a causal effect but that allows for identical unobserved determinants.
Example 5. Training of unemployed workers.
This example is similar to Example 1, albeit that here the worker often has a larger influence on whether the treatment occurs. However, if the training program is in short supply then the availability of a slot depends on the labor market behavior of other (potential) applicants and this is random from the point of view of the individual. Card and Sullivan (1988) and Eberwein, Ham and LaLonde (1997) allow the exit rate out of unemployment to shift to another level at the moment of entry into a training program. They allow for selection on the basis of unobservables, but they do not allow for anticipation.15
15 In the applied literature on the effects of training on unemployment durations "training" is sometimes seen as a separate labor market state. The fact that one has had any training may affect subsequent transitions. See Gritz (1993) and Bonnal, Fougère and Sérandon (1997) for examples. In order to deal with selectivity of those who enrol in training, they allow for related unobserved heterogeneity terms affecting entry into training as well as the other transition rates. The models are estimated with multiple-spell data. Ham and LaLonde (1996) use experimental data to investigate the effect of training on unemployment durations.
Example 6. Stepping-stone jobs.
Consider an individual who wants to obtain a highly desirable type of job that is in short supply. The chances of the individual may be higher if he has some work experience in another (less desirable) type of job, which is also in short supply. During the search for the highly desirable job the individual may accept such a less desirable job if he finds one. Then, from the point of view of the individual, $Y$ is the duration until the highly desirable job is found and $S$ the duration until the less desirable job is found. Like in Example 3, it is obvious that the (unobserved) determinants of both search durations are related. Also, the realization of $S$ is not fully predictable because it depends on the inherent randomness in the process according to which workers and employers are matched. For an empirical example see Van den Berg, Holm and Van Ours (2001), in which the "highly desirable job" is a slot in the educational program to become a medical specialist. To enter this program, the student must follow an application procedure. The less desirable job is a job as a medical assistant.
5.2 Theoretical foundation of proportionality
The MPH model and its special cases and extensions are often regarded to be useful reduced-form models for duration analysis. However, the MPH model specification and our treatment effect specification are not derived from economic theory. Let us consider the model of Subsection 3.1. The determinants of the hazard rates act multiplicatively on them. Of course, the distinction between two of the determinants (the observed and unobserved explanatory variables) is only relevant from an empirical point of view. From a theoretical point of view, the most relevant assumptions of the model are that the elapsed duration, the explanatory variables and the treatment act multiplicatively on the hazard rate of $Y$ and that the first two act multiplicatively on the hazard rate of $S$. In this subsection we therefore ignore unobserved heterogeneity.
In theoretical models the hazard rate $\theta_Y$ of $Y$ depends on the individual's strategy. Typically, the optimal strategy involves a comparison of the present value of leaving the state of interest to the present value of staying. These present values depend on the current and future values of the structural parameters in a highly non-linear way. As a result, $\theta_Y$ has a complicated non-linear relation to $t$, $x$ and the treatment even if the structural parameters are multiplicative in these determinants. Van den Berg (2001) shows this in detail for the relation between search models and the MPH model without treatment effects. Here, we discuss conditions under which a multiplicative specification of the structural parameters carries over to the implied hazard $\theta_Y$.
The theoretical expression for $\theta_Y$ simplifies considerably if the individual's strategy is myopic, i.e. if it amounts to a comparison of the instantaneous utility flows of the different options. Cameron and Heckman (1998) and Van den Berg (2001) use this to obtain MPH-type specifications for the hazard rate in theoretical models for a single duration variable. Here we follow this line in the context of models with a treatment effect and with treatment assignment.
Intuitively, the optimal strategy for exiting the current state is myopic if the individual either does not throw away any option by making a transition or does not care about the future because of an infinite discount rate. For both of these cases the Appendix provides detailed examples. In all of these examples, we consider job search models where the transition rate out of the current state equals the product of the job offer arrival rate and the probability that a wage offer exceeds the reservation wage. The reservation wage equals the current instantaneous income flow if the optimal strategy is myopic.
In the first set of examples, the optimal strategy is myopic because search is repeated, i.e. the individual is able to search further for better matches after a match has been formed and no options are thrown away by a transition. The individual's search environment is subject to treatments that are not worker-specific or job-specific but that instead act similarly on all workers (e.g. aggregate shocks). Here it does not matter whether such changes are known in advance, because the optimal strategy does not depend on that. In addition, there may be unanticipated treatments that are worker-specific but not job-specific in the sense that the moment of occurrence is unrelated to the actual job held at the time. We include an example that theoretically underpins Example 3 (labor market transition of spouse) of the previous subsection.
In the second set of examples we assume sequential search and infinite discounting. Sequential search models do not allow the economic agent to search further for better matches after a match has been formed. We focus on unemployed individuals that may participate in a training program that increases their job offer arrival rate (recall Example 5 from the previous subsection). Again, it does not matter whether participation is known in advance. The specification of the hazard rate of the duration $S$ is determined by profiling of unemployed workers by case workers.
We conclude that, under some conditions, the multiplicative specification follows from economic-theoretical models. In the examples, the optimal strategy is not affected by information on the moment of future treatment. In the terminology of Section 2, there is no anticipation. In this sense, Assumption 1 ("no anticipation") and the MPH model framework are a natural combination.
The economic justification of other popular reduced-form duration model specifications is at least as difficult as the justification of the (M)PH specification. This applies in particular for the Accelerated Failure Time model, in which the mean of $\ln(Y)$ is specified as a linear function of $x$, so that $\ln(Y) = -x'\beta + \epsilon$, and the additive hazard model, in which $\theta_Y(t \mid x)$ is specified as $\lambda_Y(t) + \phi_Y(x)$. Discrete-time reduced-form duration model specifications are also difficult to justify and often do not follow from underlying economic models such as discrete-time search models or dynamic discrete-choice models.
5.3 Comparison with identification in limited-dependent variable and panel-data model frameworks
5.3.1 Limited-dependent variable model frameworks
We start with the identification of the treatment effect in the prototypical limited-dependent variable model with treatments (see Maddala, 1983, for an overview). We restrict attention to the relations between the mean of the endogenous variables on the one hand and the explanatory variables on the other. This is in line with the spirit in which these models are interpreted.16 The exposition is kept rather informal.
16 See e.g. Heckman (1990) for identification results based on full information.
Consider
$$\begin{aligned} Y &= x_Y'\beta_Y + x_Y'\delta_0 \cdot I(Z > 0) + \varepsilon_Y \\ Z &= x_Z'\beta_Z + \varepsilon_Z \end{aligned} \tag{10}$$
where $Y$, $Z$, $\varepsilon_Y$ and $\varepsilon_Z$ are continuous random variables, $Z > 0$ indicates treatment and $x_Y'\delta_0$ is the treatment effect. We assume that inference is based on a random sample of subjects with information on $Y$, $x_Y$, $x_Z$ and $I(Z > 0)$ for each subject. Furthermore, $E[\varepsilon_Y \mid x_Y] = E[\varepsilon_Z \mid x_Z] = 0$. We take the parameter $\beta_Z$ to be identified from the data on $I(Z > 0)$ and $x_Z$. It is generally acknowledged that either exclusion restrictions or parametric functional-form assumptions on the joint distribution of $(\varepsilon_Y, \varepsilon_Z)$ are required for identification of the treatment-effect parameter $\delta_0$. Exclusion restrictions state that some covariates in $x_Z$ with non-zero parameters in $\beta_Z$ are not included in $x_Y$. Recall that these restrictions are often hard to justify. Similarly, it is usually hard to find convincing arguments in favor of a particular functional form of the distribution of $(\varepsilon_Y, \varepsilon_Z)$.
Now suppose that we do not observe $Y$ but only $I(Y > 0)$. This is the "binary treatment – binary outcome" specification. For example, $Z > 0$ may indicate whether an unemployed individual has participated in a training program, and $Y > 0$ may indicate whether he has found a job. The binary specification for $Y$ effectively reduces the information in the data, so identification needs at least as many assumptions as above (see Cameron and Heckman, 1998, for results).
To make comparisons with our Model 1a, it is useful to rewrite the latter as a regression-type model. Again using that the integrated hazard of a duration variable evaluated at that random duration has a unit exponential distribution we can write
$$\ln\bigl(\Lambda_Y(\min\{Y, s\}) + I(Y > s)\, \Delta(Y \mid s, x)\bigr) = -\ln(\phi_Y(x)) - \ln(V_Y) + \ln(E_Y) \tag{11}$$
$$\ln(\Lambda_S(S)) = -\ln(\phi_S(x)) - \ln(V_S) + \ln(E_S), \tag{12}$$
with $E_Y$ and $E_S$ unit exponential random variables such that
$$E_Y \perp\!\!\!\perp E_S, \quad \text{and} \quad (E_Y, E_S) \perp\!\!\!\perp (X, V_Y, V_S)$$
but $V_Y$ and $V_S$ possibly dependent. Note that $E_S$ captures the random component in the treatment assignment that we discussed in Subsection 3.2.
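The regression-type representation also gives a direct recipe for simulating from a model of this kind: draw the unit exponentials and the heterogeneity terms, then invert (12) and (11). The sketch below is illustrative only and rests on assumptions that are not in the text: a Weibull integrated baseline $\Lambda_Y(t) = t^a$, $\Lambda_S(t) = t$, a constant treatment effect so that $\Delta(y \mid s) = \delta\,(\Lambda_Y(y) - \Lambda_Y(s))$, log-linear regressor functions, and gamma heterogeneity with a shared component. All function and parameter names are hypothetical.

```python
import math
import random


def big_lambda(t, a):
    """Integrated baseline Lambda_Y(t) = t**a (an assumed Weibull form)."""
    return t ** a


def integrated_hazard(y, s, phi_y, v_y, delta, a):
    """phi_Y * V_Y * [Lambda_Y(min(y, s)) + I(y > s) * Delta(y | s)],
    with Delta(y | s) = delta * (Lambda_Y(y) - Lambda_Y(s))."""
    base = big_lambda(min(y, s), a)
    extra = delta * (big_lambda(y, a) - big_lambda(s, a)) if y > s else 0.0
    return phi_y * v_y * (base + extra)


def draw_outcome(e_y, s, phi_y, v_y, delta, a):
    """Solve integrated_hazard(y, ...) = e_y for y, i.e. invert (11)."""
    target = e_y / (phi_y * v_y)
    if target <= big_lambda(s, a):
        return target ** (1.0 / a)
    return (big_lambda(s, a) + (target - big_lambda(s, a)) / delta) ** (1.0 / a)


def simulate(n, delta=2.0, a=1.5, beta_y=0.5, beta_s=-0.3, seed=1):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        common = rng.gammavariate(2.0, 0.5)         # shared component
        v_y = common * rng.gammavariate(2.0, 0.5)
        v_s = common * rng.gammavariate(2.0, 0.5)   # dependent on v_y
        e_y, e_s = rng.expovariate(1.0), rng.expovariate(1.0)
        s = e_s / (math.exp(beta_s * x) * v_s)      # (12) with Lambda_S(t) = t
        y = draw_outcome(e_y, s, math.exp(beta_y * x), v_y, delta, a)
        rows.append((x, v_y, s, y, e_y))
    return rows
```

Here the only link between treatment and outcome other than the causal effect `delta` runs through `common`, which mirrors the role of dependent $V_Y$ and $V_S$ in the text, while `e_s` plays the role of the random component in treatment assignment.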
We conclude, first of all, that the proportionality assumption is similar to the additivity assumption that is made in regression equations. Both impose some "smoothness" by excluding certain interactions of time and explanatory variables at the individual level. Secondly, let, in the above limited-dependent variable model, $Y$ be a (log) duration variable. The most fundamental difference between that model and Model 1a concerns the fact that the latter incorporates the timing of the treatment whereas the former does not. Since we do not need exclusion restrictions to identify Model 1a, it follows that the information on the timing of events is very useful.17
17 There is a very remote similarity to an approach in Vella and Verbeek (1997) to identify a treatment effect in a Heckman selection model without exclusion restrictions. They decompose the error term of the latent equation into a component that basically captures the rank order of the error term and a completely independent residual component and assume that the selection effect only works by way of the first component.
Using limited-dependent variable models or discrete-time panel-data models when $Y$ is in fact a (function of a) duration variable also leads to a number of intractable practical problems. First, it requires aggregation over time. Often, it is recorded whether a treatment occurs in a baseline period, and it is recorded whether the outcome (i.e. the duration variable) is realized in the subsequent period. Then, the question arises what to do when the treatment and the outcome occur in the same period. In addition, it is not clear how to deal with observations that are right-censored before the end of the observation window. It is common that a spell in a state of interest can end in different ways. Usually, only a subset of these are deemed interesting. For example, an unemployment spell can end because of a transition to work, but also because of a transition into education, military service, etcetera. A study of the transition rate to work may treat transitions to other destinations as independently right-censoring the duration until work. The treatment effect estimate may be biased if such observations are discarded or if they are treated as observations of exit to work.
5.3.2 Panel data model frameworks
Consider the dynamic panel-data model
$$\begin{aligned} W_t &= x_{W,t}'\beta_W + x_{W,t}'\delta_0 \cdot I(Z_t > 0) + V_W + \epsilon_{W,t} \\ Z_t &= x_{Z,t}'\beta_Z + V_Z + \epsilon_{Z,t}, \end{aligned} \tag{13}$$
where the index $t$ denotes time. We deliberately introduce new notation $W_t$ for the outcome because we will relate $W_t$ to duration outcomes in two different ways later. Inference is based on a random sample of subjects with information on $W_t$, $W_{t-1}$, $x_{W,t}$, $x_{W,t-1}$, $x_{Z,t}$, $x_{Z,t-1}$, $I(Z_t > 0)$ and $I(Z_{t-1} > 0)$ for each subject. We take $\epsilon_{j,t}$ to be i.i.d. across time and across $j = W, Z$, so that the spurious dependence between treatment and outcome (the "selection effect") runs by way of the relation between $V_W$ and $V_Z$. We take $E[\epsilon_{j,t} \mid x_j] = 0$, where for sake of brevity we use $x_j := (x_{j,t}, x_{j,t-1})$. We can treat $V_W$ as a fixed effect by taking first differences of $W_t$ across time,
$$E[W_t - W_{t-1} \mid Z_{t-1} \le 0, Z_t > 0, x_W, x_Z] = (x_{W,t} - x_{W,t-1})'\beta_W + x_{W,t}'\delta_0. \tag{14}$$
Clearly, $\delta_0$ is identified.
Now suppose that $\beta_W$ varies over time. The equivalent of the right-hand side of equation (14) equals $x_{W,t}'\beta_{W,t} - x_{W,t-1}'\beta_{W,t-1} + x_{W,t}'\delta_0$. As a result, $\delta_0$ is unidentified from $E[W_t - W_{t-1} \mid Z_{t-1} \le 0, Z_t > 0, x_W, x_Z]$. A further "difference-in-difference" however gives
$$E[W_t - W_{t-1} \mid Z_{t-1} \le 0, Z_t > 0, x_W, x_Z] - E[W_t - W_{t-1} \mid Z_{t-1} \le 0, Z_t \le 0, x_W, x_Z] = x_{W,t}'\delta_0. \tag{15}$$
Clearly, $\delta_0$ can be allowed to depend on $t$.
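A small simulation may help to see why the second difference works when the coefficients vary over time. The sketch below is a hypothetical two-period special case of (13) with an intercept as the only regressor, so that the treatment effect is the scalar `delta0`; the normal distributions, the correlation between the individual effects and all names are assumptions made for illustration. It contrasts the sample analogue of (15) with a naive cross-sectional comparison in the second period.

```python
import random
from statistics import fmean


def simulate_panel(n, delta0=1.0, beta=(0.0, 0.7), seed=2):
    """Two periods t = 0, 1; beta[t] is a time-varying intercept."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        v_w = rng.gauss(0.0, 1.0)
        v_z = 0.8 * v_w + 0.6 * rng.gauss(0.0, 1.0)   # V_Z correlated with V_W
        z = [v_z + rng.gauss(0.0, 1.0) for _ in range(2)]
        w = [beta[t] + delta0 * (z[t] > 0) + v_w + rng.gauss(0.0, 1.0)
             for t in range(2)]
        rows.append((z, w))
    return rows


def did_estimate(rows):
    """Sample analogue of (15): switchers minus never-treated, in first differences."""
    switchers = [w[1] - w[0] for z, w in rows if z[0] <= 0 < z[1]]
    never = [w[1] - w[0] for z, w in rows if z[0] <= 0 and z[1] <= 0]
    return fmean(switchers) - fmean(never)


def naive_estimate(rows):
    """Second-period comparison of treated and untreated; ignores V_W."""
    treated = [w[1] for z, w in rows if z[1] > 0]
    untreated = [w[1] for z, w in rows if z[1] <= 0]
    return fmean(treated) - fmean(untreated)


if __name__ == "__main__":
    panel = simulate_panel(40000)
    print("difference-in-difference:", did_estimate(panel))
    print("naive cross-section:     ", naive_estimate(panel))
```

In this design the first difference removes `v_w` and the second difference removes the change in the intercept, whereas the naive comparison picks up the dependence between `v_w` and `v_z`.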
We can apply this panel setup to our problem in two ways. We can either (i) let $W_t$ be a survival indicator for a single outcome duration $Y$ and set $W_t = N_Y(t-)$ or (ii) relate each $W_t$ to a separate outcome duration $Y_t$. Option (i) suggests a link between a single-spell duration model and a multi-period panel-data model. However, this apparent similarity is deceptive. In the case of (i), $W_{t-1} = 1 \implies W_t = 1$, which is not necessarily so in the general case of specification (13). The latter specification allows for more variation in $W_t$ and $W_{t-1}$ given $Z_{t-1}, Z_t, x_W, x_Z$. This variation distinguishes panel-data models from our duration model. As a result, we cannot apply (14) and (15). In our companion paper Abbring and Van den Berg (2001b) we develop a somewhat related, but much more involved, approach for our model. We require that the unobserved explanatory variables are independent of the observed explanatory variables and we have to specify the treatment assignment process. All this is not required in the panel-data model framework.
Now consider option (ii). Let $Z_t$ indicate whether a treatment is given during the $t$-th spell. This interpretation allows for a straightforward comparison with our multiple-spell duration model. In both cases we observe multiple outcomes for an individual with the same unobserved covariates. The SPL estimator developed in Subsection 4.3 is the analogue of the fixed-effect panel-data estimator based on (15). Neither of these requires any observed explanatory variables or a specification of the treatment assignment process.
In conclusion, we can identify the treatment effect without exclusion restrictions or parametric functional-form restrictions on the distribution of the unobservables, even though the observational scheme in a single-spell duration model with treatments is more restrictive than in panel-data models. Thus, with treatments, duration data are more informative than the data used for standard limited-dependent variable models. In comparison to standard panel-data models, the single-spell model requires independence of the unobserved and observed covariates. Furthermore, the treatment assignment process has to be specified. This can be remedied by using multiple-spell duration data.
Appendix.
Formal derivations for Subsection 2.4
Consider a standard partial search model (e.g. Mortensen, 1986). Job offers arrive at a rate $\lambda \cdot e$, where $e \ge 0$ is the intensity of search and $\lambda > 0$ is a structural parameter of the model. The corresponding flow of search costs is denoted by $c(e)$ and the wage offer distribution is denoted by $F$. An individual initially receives unemployment benefits $b_0$ while searching, but these are reduced to $b_1 < b_0$ at some time $S$. Utility is derived from consumption only. We abstract from any saving, so that consumption equals income net of search costs at each point in time. We assume that utility is intertemporally separable, with instantaneous utility function $u$ and discount rate $\rho > 0$.
First, assume that individuals observe their individual realization of $S$ at time 0. We consider two simple alternative versions of this first model. Both lead to the same expression of the optimal re-employment hazard rate. The first version is in the spirit of Mortensen (1977), and has an $F$ that is degenerate at some wage $w > b_0$, search costs $c(e) = e^2/2$ and linear instantaneous utility $u(x) = x$. In this version, individuals only choose search effort to maximize their discounted flow of income. The second version has $e$ exogenously fixed at 1, search costs $c(1) = 0$ and logarithmic instantaneous utility $u(x) = \ln(x)$. Furthermore, it has a dispersed log-uniform wage offer distribution, with $F(w) = 0$ if $w \le \alpha$ and $F(w) = \ln(w/\alpha)/\ln(\beta/\alpha)$ if $\alpha < w < \beta$, with $b_0 \le \alpha < \beta < \infty$ (Van den Berg, 1990). In this version, the individuals only have to decide when to accept or reject job offers. In either case, the resulting hazard rate $\theta_{Y^*(s)}(y)$ at time $y$, if benefits are reduced at time $s$, can be expressed as follows. For $y \in [s, \infty)$, it equals $\theta_1 := -\rho + \sqrt{\rho^2 + 2\psi(b_1)}$. Here, the function $\psi : \mathbb{R} \to \mathbb{R}$ takes a form specific to either version. We are only interested in some properties of $\psi$ that hold in both models: $\psi \ge 0$, $\psi' < 0$, and $\psi(\bar w) = 0$, where $\bar w$ equals the highest wage in the support of $F$ (i.e., $\bar w = w$ in the first model and $\bar w = \beta$ in the second model). For $y < s$, $\theta_{Y^*(s)}(y)$ satisfies the differential equation
$$\theta_{Y^*(s)}'(y) = -\psi(b_0) + \rho\, \theta_{Y^*(s)}(y) + \tfrac{1}{2}\, \theta_{Y^*(s)}(y)^2$$
(see also Van den Berg, 1995). Continuity at $s$ applies because there is no shock in the individual's information set or the search technology parameters at $s$.
As a specific example, we take $b_0$ and $b_1$ such that $\psi(b_0) = 0$, $\psi(b_1) = (2\rho + 1)/2$, and we take $\rho \downarrow 0$, which gives
$$\theta_{Y^*(s)}(y) = \begin{cases} \dfrac{2}{2 + s - y} & \text{if } 0 \le y \le s \text{ and} \\ 1 & \text{if } y > s. \end{cases} \tag{16}$$
Figure 1: Anticipated and unanticipated UI changes: potential hazards
Note: this graph plots $\theta_{Y^*(s)}$ for the cases of anticipated ("perfect foresight") and unanticipated ("no anticipation") UI changes at times $s = 1.5$, $2.0$ and $2.5$.
Figure 1 illustrates the potential hazard rates for $s = 1.5$, $2.0$ and $2.5$. Although (16) is only valid for particular parameterizations of the models, the qualitative features that can be discerned in Figure 1 are robust.
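The closed form in (16) can be checked numerically. Under the parameterization of the specific example (the constant term of the differential equation set to zero and the discount rate taken to zero), the equation for $y < s$ reduces to the hazard's derivative being half its square, with value 1 at $y = s$ by continuity. The sketch below integrates this reduced equation backward from $s$ with a classical fourth-order Runge-Kutta scheme; the function names and the step count are choices made here, not part of the text.

```python
def closed_form(y, s):
    """Hazard in (16): 2 / (2 + s - y) before s, 1 afterwards."""
    return 2.0 / (2.0 + s - y) if y <= s else 1.0


def rk4_backward(s, y_target, steps=2000):
    """Integrate theta' = theta**2 / 2 from theta(s) = 1 back to y_target <= s."""
    h = (y_target - s) / steps  # negative step size: we move from s down to y_target

    def f(theta):
        return 0.5 * theta * theta

    theta = 1.0
    for _ in range(steps):
        k1 = f(theta)
        k2 = f(theta + 0.5 * h * k1)
        k3 = f(theta + 0.5 * h * k2)
        k4 = f(theta + h * k3)
        theta += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return theta


if __name__ == "__main__":
    for s in (1.5, 2.0, 2.5):
        print(s, rk4_backward(s, 0.0), closed_form(0.0, s))
```

The numerical and closed-form hazards at $y = 0$ should agree closely for each of the three values of $s$ used in Figure 1, and the hazard rises toward 1 as $y$ approaches $s$, which is the anticipation pattern described in the text.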
In the second model, individuals only observe the sanction history $\{N_S(u);\, 0 \le u < y\}$ and not $S$ itself, at each time $y$. We assume that sanctions arrive at some exogenously given constant rate that is known to the individuals. Our intervention involves a manipulation of the sanction time $s$ without changing this perceived arrival rate of sanctions. At $s$, the resulting hazard rate jumps from a constant level $\theta_0$ to $\theta_1$.
We return to the first model and take $S$ to have the density $g(s) = (2 + s)^2 e^{-s}/10$. Recall that we observe $(Q_S, Q_Y)$. Direct calculations show that
$$Q_S(y, s) = \tfrac{2}{5}\, (1 + \max\{0, y - s\})\, e^{-\max\{y, s\}} \quad \text{and} \quad Q_Y(y) = \tfrac{3}{5}\, e^{-y}.$$
Our non-identification example centers around the fact that the $Y^*(s)$-hazard at time $y < s$ cannot be identified from data on subjects with $S = s$. At the very best, we can pin down the mixture hazard rate
$$\frac{\int_y^\infty \theta_{Y^*(s)}(y)\, e^{-\Theta_{Y^*(s)}(y)}\, g(s)\, ds}{\int_y^\infty e^{-\Theta_{Y^*(s)}(y)}\, g(s)\, ds}$$
over the potential hazards corresponding to treatments $s$ in $[y, \infty)$. This hazard rate equals
$$\lim_{dy \downarrow 0} \frac{\Pr(y \le Y < y + dy \mid Y \ge y, S \ge y)}{dy} = \frac{-Q_Y'(y)}{Q_S^0(y) + Q_Y(y)} = \frac{3}{5}.$$
Now note that the same mixture hazard would result if the potential hazards would equal $3/5$ for $0 \le y \le s$ for all $s$. We base our construction of an observationally equivalent model on this idea. Let $\tilde Y^*(s)$ have hazard
$$\theta_{\tilde Y^*(s)}(y) = \begin{cases} \tfrac{3}{5} & \text{if } 0 \le y \le s \\ 1 & \text{if } y > s \end{cases}$$
and $\tilde S$ have density $\tilde g(s) = (2/5)\, e^{-(2/5) s}$. It is easily checked that $(\{Y^*\}, S)$ and $(\{\tilde Y^*\}, \tilde S)$ are indeed observationally equivalent. Figure 1 illustrates the potential hazards for both equivalent models, and $s = 1.5$, $2.0$ and $2.5$.
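The observational equivalence claimed above can also be checked numerically. The sketch below evaluates the joint sub-survival function of treated spells and the sub-survival function of untreated exits by Simpson quadrature for both models and compares them with the closed forms in the text. It takes the first model's pre-treatment survival to be the square of $(2 + s - y)/(2 + s)$, which follows from integrating the hazard in (16), and it assumes a post-treatment hazard of 1 in both models. The helper names, the truncation point `upper` and the grid size are choices made here.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


# Model 1: anticipation, hazard 2 / (2 + s - y) before the treatment time s.
def g1(s):
    return (2.0 + s) ** 2 * math.exp(-s) / 10.0


def surv1(y, s):  # Pr(Y*(s) > y) for y <= s
    return ((2.0 + s - y) / (2.0 + s)) ** 2


# Model 2: no anticipation, constant hazard 3/5 before treatment.
def g2(s):
    return 0.4 * math.exp(-0.4 * s)


def surv2(y, s):  # Pr(Y*(s) > y) for y <= s
    return math.exp(-0.6 * y)


def q_y(y, g, surv, upper=80.0):
    """Pr(Y > y, Y < S): exit after y but before treatment."""
    return simpson(lambda s: g(s) * (surv(y, s) - surv(s, s)), y, upper)


def q_s(y, s, g, surv, upper=80.0):
    """Pr(Y > y, S > s, Y > S), with hazard 1 after treatment."""
    def integrand(sig):
        return g(sig) * surv(sig, sig) * math.exp(-max(y - sig, 0.0))
    mid = max(s, y)  # split at the kink of the integrand
    return simpson(integrand, s, mid) + simpson(integrand, mid, upper)


def q_s_closed(y, s):
    return 0.4 * (1.0 + max(0.0, y - s)) * math.exp(-max(y, s))


def q_y_closed(y):
    return 0.6 * math.exp(-y)
```

Evaluating `q_s` and `q_y` with `(g1, surv1)` and with `(g2, surv2)` on a grid of `(y, s)` values should reproduce `q_s_closed` and `q_y_closed` up to quadrature error, which is what observational equivalence of the two models amounts to here.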
We end with a note concerning aggregation over values of the structural parameters and the meaning of Assumptions 1 and 2. The structural models of this section imply potential re-employment rates $\theta_{Y^*(s)}$ for each sanction time $s$, given values of the structural "parameters" $\lambda$, $F$, $\rho$ and the sanction rate, say $\theta_S$. Suppose that there is heterogeneity in these parameters, which we model by letting $\lambda$, $\rho$ and $\theta_S$ be random variables and $\{F\}$ a stochastic process. The models above specify
$$Y^*(s) = \Theta_{Y^*(s)}^{-1}(E_Y \mid \lambda, \{F\}, \rho, \theta_S) \quad \text{and} \quad S = \theta_S^{-1} E_S,$$
with $E_Y$ and $E_S$ unit exponential random variables such that $E_Y \perp\!\!\!\perp E_S$ and $(E_Y, E_S) \perp\!\!\!\perp (\lambda, \{F\}, \rho, \theta_S)$. As in the general setup, we have made the dependence of $\Theta_{Y^*(s)}$ on the random parameters explicit. We have that $\{Y^*\} \perp\!\!\!\perp S \mid (\lambda, \{F\}, \rho, \theta_S)$, so that Assumption 2 is satisfied with any $(X, V)$ such that $(\lambda, \{F\}, \rho, \theta_S)$ is measurable with respect to $\sigma(X, V)$. In the sanctions model without anticipation, Assumption 1 is true conditional on the parameters and, by implication, conditional on only a subset of parameters. However, in general Assumption 2 will not hold in the latter case. In general, Assumptions 1 and 2 may hold after aggregation over some of the parameters observed by individuals, even if there are anticipatory effects. In this sense, the interpretation of Assumption 1 in terms of no anticipation is too strict.
Proof of Proposition 1
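Take some $(Q_S, Q_Y)$ as given. By Miller (1977), Theorem 1, there exists some $(\tilde{S}, \tilde{Y})$ such that $\tilde{S} \perp\!\!\!\perp \tilde{Y}$,
$\tilde{Q}^0_S = Q^0_S$ and $\tilde{Q}_Y = Q_Y$. Let
$$\Theta_{\tilde{Y}^*(s)}(y) := \begin{cases} \Theta_{\tilde{Y}}(y) & \text{if } 0 \le y \le s \\ \Theta_{\tilde{Y}}(s) - \ln\left[\Pr(Y > y \mid Y > S, S = s)\right] & \text{if } y > s. \end{cases}$$
Let $E$ be a unit exponential random variable such that $E \perp\!\!\!\perp \tilde{S}$. Set $\tilde{Y}^*(s) = \Theta^{-1}_{\tilde{Y}^*(s)}(E)$ for all $s \in \mathbb{R}_+$.
Then, it is easy to check that $(\{\tilde{Y}^*\}, \tilde{S})$ also satisfies Assumption 1 and is observationally equivalent to
$(\{Y^*\}, S)$.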
The theorem by Miller (1977) that is invoked is an extension to general distributions of the
non-identification results by Cox (1962) and Tsiatis (1975) for continuous competing risks models. The proof
and theorem therefore extend to mixed discrete-continuous models.
Proof of Proposition 3
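By Proposition 2, $\Lambda_Y$, $\Lambda_S$, $\phi_Y$, $\phi_S$ and $G$ are identified from $\{Q^0_S, Q_Y\}$. For fixed $x \in \mathcal{X}$,
$$\frac{\partial Q_S(y, s|x)}{\partial s} = \phi_S(x)\lambda_S(s)\,\mathcal{L}^{(S)}_G\big(\phi_Y(x)\left[\Lambda_Y(s) + \Delta(y|s, x)\right],\ \phi_S(x)\Lambda_S(s)\big) \tag{17}$$
for all $y \in \mathbb{R}_+$ and almost all $s \in \mathbb{R}_+$ such that $s < y$. $\mathcal{L}_G$ is the bivariate Laplace transform of $G$,
$$\mathcal{L}_G(z_1, z_2) := \int_0^\infty \int_0^\infty \exp(-z_1 v_Y - z_2 v_S)\,dG(v_Y, v_S)$$
and $\mathcal{L}^{(S)}_G(z_1, z_2) := \partial \mathcal{L}_G(z_1, z_2)/\partial z_2$ for all $(z_1, z_2) \in (0, \infty)^2$. For given $s$ and $x$, the right-hand side of
(17) is a strictly increasing and identified function of $\Delta(y|s, x)$. This identifies $\Delta(y|s, x)$ and therefore
$$\int_s^y \delta(\tau|s, x)\,d\tau = \int_s^y \frac{\partial \Delta(\tau|s, x)/\partial \tau}{\lambda_Y(\tau)}\,d\tau$$
for all $x \in \mathcal{X}$ and all $y, s \in \mathbb{R}_+$ such that $s < y$ (using continuity).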
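The inversion step in this proof, recovering the integrated treatment effect from a strictly increasing right-hand side, can be illustrated numerically. The sketch below assumes a hypothetical four-point distribution for the unobserved heterogeneity and arbitrary values for the already identified quantities; none of these numbers or function names come from the paper.

```python
import math

# Hypothetical discrete distribution G of (V_Y, V_S): (v_Y, v_S, probability).
G_POINTS = [(0.5, 0.5, 0.3), (0.5, 2.0, 0.2), (2.0, 0.5, 0.1), (2.0, 2.0, 0.4)]


def laplace_dS(z1, z2):
    """Derivative of the bivariate Laplace transform of G with respect to z2."""
    return -sum(p * vs * math.exp(-z1 * vy - z2 * vs) for vy, vs, p in G_POINTS)


def rhs_17(delta_int, phi_y, phi_s, lam_s, Lam_y, Lam_s):
    """Right-hand side of (17) as a function of the integrated effect delta_int."""
    return phi_s * lam_s * laplace_dS(phi_y * (Lam_y + delta_int), phi_s * Lam_s)


def recover_delta(target, phi_y, phi_s, lam_s, Lam_y, Lam_s, iterations=200):
    """Solve rhs_17(delta_int, ...) = target for delta_int >= 0 by bisection."""
    args = (phi_y, phi_s, lam_s, Lam_y, Lam_s)
    if not rhs_17(0.0, *args) <= target < 0.0:
        raise ValueError("target is outside the range of the right-hand side")
    lo, hi = 0.0, 1.0
    while rhs_17(hi, *args) < target:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if rhs_17(mid, *args) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    known = (1.3, 0.8, 1.1, 0.4, 0.6)  # phi_Y(x), phi_S(x), lambda_S(s), Lambda_Y(s), Lambda_S(s)
    observed = rhs_17(0.7, *known)     # plays the role of the data on the left-hand side
    print(recover_delta(observed, *known))
```

Bisection is used only because monotonicity is all the proof relies on; any root finder for monotone functions would serve equally well.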
Proof of Proposition 4
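For fixed $x \in \mathcal{X}$, $\partial^2 Q_S(y, s|x)/\partial y \partial s$ and $\partial^2 Q_S(y, s|x^*)/\partial y \partial s$ exist for almost all $y, s \in \mathbb{R}_+$ such that
$s < y$. For these $(y, s)$,
$$\frac{\partial^2 Q_S(y, s|x)/\partial y \partial s}{\partial^2 Q_S(y, s|x^*)/\partial y \partial s} = \phi_S(x)\phi_\Delta(x)\,\frac{\mathcal{L}^{(\Delta S)}_{G_\Delta}\big(\phi_Y(x)\Lambda_Y(s),\ \phi_\Delta(x)(\Lambda_\Delta(y) - \Lambda_\Delta(s)),\ \phi_S(x)\Lambda_S(s)\big)}{\mathcal{L}^{(\Delta S)}_{G_\Delta}\big(\Lambda_Y(s),\ \Lambda_\Delta(y) - \Lambda_\Delta(s),\ \Lambda_S(s)\big)}. \tag{18}$$
$\mathcal{L}_{G_\Delta}$ is the trivariate Laplace transform of the distribution $G_\Delta$ of $(V_Y, V_\Delta, V_S)$ and
$\mathcal{L}^{(\Delta S)}_{G_\Delta}(z_1, z_2, z_3) := \partial^2 \mathcal{L}_{G_\Delta}(z_1, z_2, z_3)/\partial z_2 \partial z_3$ for all $(z_1, z_2, z_3) \in (0, \infty)^3$. Letting $y \downarrow 0$ and $s \downarrow 0$, both sides of (18)
reduce to $\phi_S(x)\phi_\Delta(x)$ because $E[V_\Delta V_S] = \lim_{z \downarrow (0,0,0)} \mathcal{L}^{(\Delta S)}_{G_\Delta}(z) < \infty$. As we have identified $\phi_S$ and the
left-hand side is data, this identifies $\phi_\Delta$.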
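The step from the model to the ratio in (18) can be spelled out as follows. This is a sketch under stated assumptions rather than the authors' own derivation: it assumes that $Q_S(y, s|x) = \Pr(Y > y, S > s, Y > S \mid x)$, that the hazard of $Y$ after treatment is $\lambda_\Delta(t)\phi_\Delta(x)V_\Delta$, and that the regressor functions are normalized to one at $x^*$.

```latex
% Assumptions (not restated in this appendix): Q_S(y,s|x) = Pr(Y>y, S>s, Y>S | x),
% post-treatment hazard \lambda_\Delta(t)\phi_\Delta(x)V_\Delta, and
% \phi_Y(x^*) = \phi_\Delta(x^*) = \phi_S(x^*) = 1.
\begin{align*}
z(y,s,x) &:= \big(\phi_Y(x)\Lambda_Y(s),\ \phi_\Delta(x)[\Lambda_\Delta(y)-\Lambda_\Delta(s)],\ \phi_S(x)\Lambda_S(s)\big),\\
\Pr(Y>y,\ S\in ds \mid x, V)
  &= \phi_S(x)\lambda_S(s)\,V_S\,
     \exp\!\big(-z(y,s,x)\cdot(V_Y, V_\Delta, V_S)\big)\,ds, \qquad s<y,\\
\frac{\partial Q_S(y,s|x)}{\partial s}
  &= -\,\phi_S(x)\lambda_S(s)\,E\big[V_S\, e^{-z(y,s,x)\cdot V}\big]
   = \phi_S(x)\lambda_S(s)\,\mathcal{L}^{(S)}_{G_\Delta}\big(z(y,s,x)\big),\\
\frac{\partial^2 Q_S(y,s|x)}{\partial y\,\partial s}
  &= \phi_S(x)\lambda_S(s)\,\phi_\Delta(x)\lambda_\Delta(y)\,
     \mathcal{L}^{(\Delta S)}_{G_\Delta}\big(z(y,s,x)\big).
\end{align*}
% Dividing by the same expression at x^* cancels \lambda_S(s)\lambda_\Delta(y) and yields (18).
% As y, s \downarrow 0 we have z \to (0,0,0), so numerator and denominator both tend to
% E[V_\Delta V_S] and the ratio tends to \phi_S(x)\phi_\Delta(x).
```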
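Again for arbitrary $x \in \mathcal{X}$ and $y, s \in \mathbb{R}_+$ such that $s < y$,
$$\frac{\partial Q_S(y, s|x)/\partial s}{\lambda_S(s)\phi_S(x)} = \mathcal{L}^{(S)}_{G_\Delta}\big(\phi_Y(x)\Lambda_Y(s),\ \phi_\Delta(x)(\Lambda_\Delta(y) - \Lambda_\Delta(s)),\ \phi_S(x)\Lambda_S(s)\big), \tag{19}$$
with $\mathcal{L}^{(S)}_{G_\Delta}(z_1, z_2, z_3) := \partial \mathcal{L}_{G_\Delta}(z_1, z_2, z_3)/\partial z_3$. Note that the left-hand side of this equation is already
identified. As $s \downarrow 0$, the right-hand side reduces to
$$\mathcal{L}^{(S)}_{G_\Delta}\big(0,\ \phi_\Delta(x)\Lambda_\Delta(y),\ 0\big). \tag{20}$$
After imposing $y = t^*$, we can identify the completely monotone function $-\mathcal{L}^{(S)}_{G_\Delta}(0, \cdot, 0)$ on a nonempty
open set in $\mathbb{R}$ by appropriately varying $x$ in equation (20). This identifies $-\mathcal{L}^{(S)}_{G_\Delta}(0, z, 0)$ for all $z \in (0, \infty)$
because of the real analyticity of $-\mathcal{L}^{(S)}_{G_\Delta}(0, \cdot, 0)$ (see below; note that this gives the marginal distribution
of $V_\Delta$). Subsequently, $\Lambda_\Delta$ is identified from (20), because the right-hand side is strictly monotone in $\Lambda_\Delta$.
Finally, by appropriately varying $x$ and $y$ in equation (19), we can trace $\mathcal{L}^{(S)}_{G_\Delta}$ on a nonempty open
subset of $\mathbb{R}^3$. This identifies $\mathcal{L}^{(S)}_{G_\Delta}$ on $(0, \infty)^3$ because $-\mathcal{L}^{(S)}_{G_\Delta}$ is real analytic. A proof of these statements
is in the previous version of this paper, which is available (as a 2000 working paper) upon request. With
$\mathcal{L}_{G_\Delta}(0, 0, 0) = 1$, this identifies $\mathcal{L}_{G_\Delta}$.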
Proof of Proposition 5
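By Abbring and Van den Berg (2001a), $\Lambda_{Y_1}$, $\Lambda_{S_1}$, $\Lambda_{Y_2}$ and $\Lambda_{S_2}$ are identified.
(i). For Model 2a, define $\Delta_k(y|s) := \int_s^y \lambda_{Y_k}(\tau)\delta_k(\tau|s)\,d\tau$, $k = 1, 2$. Note that
$$\Lambda_{Y_2}(t^*)\left(\int_0^{t^*}\left[\int_s^y \frac{\partial^2\left[Q_{SY}(\tau, t, s) + Q_{SS}(\tau, t, s, 0)\right]/\partial\tau\partial s}{\partial^2 Q_{SY}(\tau, t, s)/\partial s\partial t}\,d\tau\right]^{-1} dt\right)^{-1} = \Delta_1(y|s).$$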
Because we know $\Lambda_{Y_2}(t^*)$ and $\lambda_{Y_1}$, this identifies $\delta_1$. Similarly, we can identify $\Delta_2$ and $\delta_2$ from
$Q_{YS}$ and $Q_{SS}$.
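Now, consider
$$\frac{\partial\left[Q_{SY}(y, 0, s) + Q_{SS}(y, 0, s, 0)\right]/\partial s}{\lambda_{S_1}(s)} = \mathcal{L}^{(S)}_G\big(\Lambda_{Y_1}(s) + \Delta_1(y|s),\ \Lambda_{S_1}(s)\big),$$
for $y > s$. As we know $\Lambda_{Y_1}$, $\Lambda_{S_1}$ and $\Delta_1$, we can find a non-empty open set in $\mathbb{R}^2$ on which we know
$\mathcal{L}^{(S)}_G$. The function $-\mathcal{L}^{(S)}_G$ is real analytic, so this determines $\mathcal{L}^{(S)}_G$ on $(0, \infty)^2$ (recall the discussion
in the proof of Proposition 4). Together with $\mathcal{L}_G(0, 0) = 1$ this identifies $\mathcal{L}_G$.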
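(ii). For Model 2b, for $t_1, s_1 \in \mathbb{R}$ such that $s_1 < t_1$ and $t_2, s_2 \in \mathbb{R}$ such that $s_2 < t_2$,
$$\int_{s_1}^{t_1}\left[\int_{s_2}^{t_2} \frac{\partial^3 Q_{SS}(y_1, y_2, s_1, s_2)/\partial s_1\partial s_2\partial y_2}{\partial^3 Q_{SS}(y_1, y_2, s_1, s_2)/\partial s_1\partial s_2\partial y_1}\,dy_2\right]^{-1} dy_1 = \frac{\Lambda_{\Delta_1}(t_1) - \Lambda_{\Delta_1}(s_1)}{\Lambda_{\Delta_2}(t_2) - \Lambda_{\Delta_2}(s_2)}.$$
Setting $s_1 = s_2 = 0$ and consecutively $t_1 = t^*$ and $t_2 = t^*$ identifies $\Lambda_{\Delta_2}$ and $\Lambda_{\Delta_1}$. Identification
of $G_\Delta$ follows as under (i).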
Examples of theoretical foundations for Model 1a
We first consider models of repeated search. For ease of exposition, we phrase the theoretical model as a
model for on-the-job search for better jobs (see Mortensen, 1986, for an overview). A job is characterized
by its wage $w$, which is constant within a job. Workers obtain wage offers, which are random drawings
from the wage offer distribution $F(w)$, at an exogenous rate $\lambda$. Whenever an offer arrives, the decision
has to be made whether to accept it or to reject it and search further for a better offer. If the individual's
search environment (i.e. $\lambda$ and $F$) is time-invariant (i.e. the model is stationary) then the optimal strategy
is constant during a job spell. In particular, as is well known, the optimal strategy is myopic: accept a job
if and only if the offered wage exceeds the current wage. Now let the individual's search environment be
subject to changes that are not worker-specific or job-specific but rather such that they act similarly on all
employed workers (e.g. business cycles). The changes may be anticipated or unanticipated. It is obvious
that this nonstationarity does not affect the optimal strategy: it remains optimal to accept another job
if and only if its wage exceeds the current wage. In the absence of other treatments we thus obtain for
the job-to-job transition rate,
$$\theta_Y(t|\tilde{x}) = \lambda(t, x)\,(1 - F(w|t, x))$$
where $\tilde{x} := (x, w)$. We allow structural determinants to differ across individuals. This is captured by
time-invariant explanatory variables $x$. We assume that the analyst observes $x$ (and the duration $Y$) but does
not directly observe how the structural determinants, the optimal strategy or the acceptance probability
change with $t$. The optimal strategy is also unaffected if we allow for unanticipated treatments that are
worker-specific but not job-specific in the sense that the moment of occurrence is unrelated to the actual
job held at the time.
Consider the following specific example. An individual and his spouse have jobs in the same city.
The treatment consists of the spouse finding and accepting a job in another city. The effect of this is
a reduction of the instantaneous disposable income of the individual by a fraction $\kappa$ (e.g. because of
traveling costs; for ease of exposition we abstract from subsequent treatments like this). Let $\lambda$ vary with
the business cycle and let $F$ be a Pareto distribution that varies with $x$ by way of $1 - F(w|x) = (w_0(x)/w)^{\alpha}$
for $w \in (w_0(x), \infty)$. Then
$$\theta_Y(t|S, \tilde{x}) = \lambda(t) \cdot [w_0(x)/w]^{\alpha} \cdot [\kappa^{-\alpha}]^{I(t>S)}$$
which satisfies the specification of Model 1a. (The derived theoretical specification actually requires some
more justification; note that $F$ does not depend on whether the treatment has occurred.) Note that, for
reasons of symmetry, this structural model also gives a multiplicative specification for the hazard rate of
the duration $S$.
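The multiplicative structure of this example is easy to check numerically. The sketch below is illustrative only: the names `alpha` and `kappa` stand for the Pareto exponent and the income factor (read here, as an assumption, as the share of income that remains after treatment), and `arrival_rate` is any user-supplied function of time.

```python
def job_to_job_hazard(t, s, w, w0, alpha, kappa, arrival_rate):
    """Hazard arrival_rate(t) * (w0 / w)**alpha * (kappa**-alpha)**I(t > s).

    arrival_rate is a callable t -> offer arrival rate; every argument name
    is illustrative and not taken from the paper.
    """
    if not w > w0 > 0:
        raise ValueError("the Pareto tail formula requires w > w0 > 0")
    if not 0 < kappa <= 1:
        raise ValueError("kappa is read here as the share of income that remains")
    if kappa * w <= w0:
        raise ValueError("the post-treatment income must stay above w0")
    acceptance = (w0 / w) ** alpha            # 1 - F(w | x) for the Pareto case
    effect = kappa ** (-alpha) if t > s else 1.0
    return arrival_rate(t) * acceptance * effect


if __name__ == "__main__":
    def cycle(t):
        return 0.2 + 0.1 * t  # hypothetical time-varying arrival rate

    before = job_to_job_hazard(3.0, 5.0, 2.0, 1.0, 2.0, 0.8, cycle)
    after = job_to_job_hazard(3.0, 2.0, 2.0, 1.0, 2.0, 0.8, cycle)
    print(before, after, after / before)
```

The ratio of the treated to the untreated hazard equals `kappa ** (-alpha)` at every elapsed duration, which is the proportional treatment effect that the Model 1a specification requires.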
Obviously, there are many more ways to obtain a Model 1a specification in this example. We may
allow $\kappa$ and $\alpha$ to depend on $x$ (in which case the treatment effect $\delta$ depends on $x$), or we may treat $w$
as unobserved, or we may let $\lambda$ be multiplicative in $t$ and $x$, or we may take $F$ to have a transposed
exponential distribution, or we may allow the treatment to have a multiplicative effect on $\lambda$, etcetera.
Also, obviously, many other repeated search models are able to generate Model 1a specifications.
Now let us consider sequential search models with treatments and infinite discounting. For ease
of exposition, we phrase the theoretical model as a job search model for unemployed individuals (see
Mortensen, 1986, for an overview). The unemployed individual's search environment is as above, except
that a job, once obtained, is held forever. If the search environment is time-invariant then the optimal
strategy is constant during a job spell. In particular, if the discount rate is infinity, so that workers do not
care about the future, then the optimal strategy is myopic: accept a job if and only if the offered wage
exceeds the current instantaneous utility flow $b$. Now let the individual's search environment change over
time, anticipated or unanticipated. Because of the infinite discounting, this does not affect the optimal
strategy: it remains optimal to accept a job if and only if its wage exceeds $b$. In the absence of other
treatments we thus obtain for the exit rate out of unemployment,
$$\theta_Y(t|\tilde{x}) = \lambda(t, x)\,(1 - F(b|t, x))$$
where $\tilde{x} := (x, b)$. The optimal strategy is also unaffected if we allow for treatments. Suppose that the
treatment consists of participation in a training course, and that the effect of this is a multiplicative
change of $\gamma$ in the job offer arrival rate $\lambda$. Let $\lambda$ also vary with $t$ (e.g. because the long-term unemployed
are stigmatized), and let $F$ be the same Pareto distribution as above, with $b > w_0(x)$. Then
$$\theta_Y(t|S, \tilde{x}) = \gamma^{I(t>S)} \cdot \lambda(t) \cdot [w_0(x)/b]^{\alpha}$$
which satisfies the specification of Model 1a.
The specification for the hazard rate of the duration $S$ may follow from the assignment of individuals
to the course by case workers. They may profile the individuals, as follows: estimate a Weibull
unemployment duration model with external data, predict the individual's hazard rate using this estimated model,
and use this as the hazard rate for assignment.
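The profiling rule described above can be sketched as follows. This is an illustration under assumptions that are not in the paper: a proportional-hazards Weibull parameterization with hazard `shape * t**(shape - 1) * exp(index)`, and hypothetical coefficient values standing in for the estimates obtained from external data.

```python
import math


def linear_index(x, beta):
    """Covariate index x'beta for one individual."""
    return sum(xi * bi for xi, bi in zip(x, beta))


def weibull_hazard(t, shape, index):
    """Predicted hazard shape * t**(shape - 1) * exp(index) at duration t > 0."""
    return shape * t ** (shape - 1.0) * math.exp(index)


def weibull_integrated_hazard(t, shape, index):
    """Integrated hazard t**shape * exp(index)."""
    return t ** shape * math.exp(index)


def assignment_time(e, shape, index):
    """Treatment time whose hazard is the predicted one: solves integrated hazard = e."""
    return (e * math.exp(-index)) ** (1.0 / shape)


if __name__ == "__main__":
    shape_hat, beta_hat = 0.8, [0.5, -0.2]  # hypothetical "estimates"
    index = linear_index([1.0, 2.0], beta_hat)
    # Feeding a unit exponential draw through the inverse gives a treatment time S.
    print(assignment_time(1.3, shape_hat, index))
```

Because the assignment hazard is multiplicative in a function of time and a function of the covariates, it has the form that the specification for the duration $S$ requires.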
There are many more ways to obtain a multiplicative specification for $\theta_Y$. Note also that if $b < w_0$
then $1 - F(b) = 1$ always, which facilitates the construction of a multiplicative specification.
References
Abbring, J.H. (2001a), "Econometric duration and event-history analysis", Lecture notes, Department of Economics, The University of Chicago.
Abbring, J.H. (2001b), "Stayers versus defecting movers: A note on the identification of defective duration models", Economics Letters [forthcoming].
Abbring, J.H. and G.J. van den Berg (2001a), "The identifiability of the mixed proportional hazards competing risks model", Working paper, Free University, Amsterdam.
Abbring, J.H. and G.J. van den Berg (2001b), "A simple procedure for inference on treatment effects in duration models", Mimeo, Free University, Amsterdam.
Abbring, J.H., G.J. van den Berg, and J.C. van Ours (1997), "The effect of unemployment insurance sanctions on the transition rate from unemployment to employment", Working paper, Tinbergen Institute, Amsterdam.
Billingsley, P. (1995), Probability and Measure, Wiley, New York, third edition.
Bonnal, L., D. Fougère, and A. Sérandon (1997), "Evaluating the impact of French employment policies on individual labour market histories", Review of Economic Studies, 64, 683–713.
Cameron, S.V. and J.J. Heckman (1998), "Life cycle schooling and dynamic selection bias: Models and evidence for five cohorts of American males", Journal of Political Economy, 106, 262–333.
Card, D. and D. Sullivan (1988), "Measuring the effect of subsidized training programs on movements in and out of employment", Econometrica, 56, 497–530.
Cox, D.R. (1962), Renewal Theory, Methuen, London.
Cox, D.R. (1972), "Regression models and life-tables (with discussion)", Journal of the Royal Statistical Society Series B, 34, 187–202.
Cramton, P. and J. Tracy (1998), "The use of replacement workers in union contract negotiations: The U.S. experience, 1980–1989", Journal of Labor Economics, 16, 667–701.
Eberwein, C., J.C. Ham, and R.J. LaLonde (1997), "The impact of being offered and receiving classroom training on the employment histories of disadvantaged women: Evidence from experimental data", Review of Economic Studies, 64, 655–682.
Elbers, C. and G. Ridder (1982), "True and spurious duration dependence: The identifiability of the proportional hazard model", Review of Economic Studies, 49, 403–409.
Fleming, T.R. and D.P. Harrington (1991), Counting Processes and Survival Analysis, Wiley, New York.
Freedman, D.A., "On specifying graphical models for causation, and the identification problem", Technical Report 601, Department of Statistics, University of California, Berkeley, CA.
Freund, J.E. (1961), "A bivariate extension of the exponential distribution", Journal of the American Statistical Association, 56, 971–977.
Gill, R.D. and J.M. Robins (2001), "Causal inference for complex longitudinal data: The continuous case", Annals of Statistics, 29 [in press].
Gritz, R.M. (1993), "The impact of training on the frequency and duration of employment", Journal of Econometrics, 57, 21–51.
Grubb, D. (1999), "Making work pay: The role of eligibility criteria for unemployment benefits", Mimeo, OECD, Paris.
Ham, J.C. and R.J. LaLonde (1996), "The effect of sample selection and initial conditions in duration models: Evidence from experimental data on training", Econometrica, 64, 175–205.
Heckman, J.J. (1990), "Varieties of selection bias", American Economic Review, 80 (supplement), 313–318.
Heckman, J.J. (2000), "Causal parameters and policy analysis in economics: A twentieth century retrospective", Quarterly Journal of Economics, 115, 45–97.
Heckman, J.J. and B.E. Honoré (1989), "The identifiability of the competing risks model", Biometrika, 76, 325–330.
Heckman, J.J., R.J. LaLonde, and J.A. Smith (1999), "The economics and econometrics of active labor market programs", in O. Ashenfelter and D. Card, editors, Handbook of Labor Economics, Volume III, North-Holland, Amsterdam.
Heckman, J.J. and B. Singer (1984), "The identifiability of the proportional hazard model", Review of Economic Studies, 51, 231–241.
Heckman, J.J. and C.R. Taber (1994), "Econometric mixture models and more general models for unobservables in duration analysis", Statistical Methods in Medical Research, 3, 279–302.
Holland, P.W. (1986), "Statistics and causal inference", Journal of the American Statistical Association, 81, 945–960.
Holt, J.D. and R.L. Prentice (1974), "Survival analysis in twin studies and matched pair experiments", Biometrika, 61, 17–30.
Honoré, B.E. (1993), "Identification results for duration models with multiple spells", Review of Economic Studies, 60, 241–246.
Horowitz, J.L. (1999), "Semiparametric estimation of a proportional hazard model with unobserved heterogeneity", Econometrica, 67, 1001–1028.
Hougaard, P., B. Harvald, and N.V. Holm (1992), "Assessment of dependence in the life times of twins", in J.P. Klein and P.K. Goel, editors, Survival Analysis: State of the Art, Kluwer Academic Publishers, Dordrecht.
Kalbfleisch, J.D. and R.L. Prentice (1980), The Statistical Analysis of Failure Time Data, Wiley, New York.
Lancaster, T. (1990), The Econometric Analysis of Transition Data, Cambridge University Press, Cambridge.
Lillard, L.A. (1993), "Simultaneous equations for hazards", Journal of Econometrics, 56, 189–217.
Lillard, L.A. and C.W.A. Panis (1996), "Marital status and mortality: The role of health", Demography, 33, 313–327.
Lok, J.J. (2001), Statistical Modelling of Causal Effects in Time, PhD thesis, Division of Mathematics and Computer Science, Faculty of Sciences, Free University, Amsterdam.
Maddala, G.S. (1983), Limited-dependent and qualitative variables in econometrics, Cambridge University Press, Cambridge.
Miller, D.R. (1977), "A note on independence of multivariate lifetimes in competing risks models", Annals of Statistics, 5, 576–579.
Mortensen, D.T. (1977), "Unemployment insurance and job search decisions", Industrial and Labor Relations Review, 30, 505–517.
Mortensen, D.T. (1986), "Job search and labor market analysis", in O. Ashenfelter and R. Layard, editors, Handbook of Labor Economics, North-Holland, Amsterdam.
Neyman, J. (1923), "On the application of probability theory to agricultural experiments. Essay on principles.", Roczniki Nauk Rolniczych, 10, 1–51 [in Polish; edited and translated version of Section 9 by D.M. Dabrowska and T.P. Speed (1990), Statistical Science, 5, 465–472].
Pearl, J. (2000), Causality: Models, Reasoning, and Inference, Cambridge University Press, Cambridge, UK.
Ridder, G. (1990), "The non-parametric identification of generalized accelerated failure-time models", Review of Economic Studies, 57, 167–182.
Ridder, G. and I. Tunalı (1999), "Stratified partial likelihood estimation", Journal of Econometrics.
Robins, J.M. (1998), "Structural nested failure time models", in P. Armitage and T. Colton, editors, The Encyclopedia of Biostatistics, John Wiley and Sons, Chichester.
Rubin, D.B. (1974), "Estimating causal effects of treatments in randomized and nonrandomized studies", Journal of Educational Psychology, 66, 688–701.
Tsiatis, A. (1975), "A nonidentifiability aspect of the problem of competing risks", Proceedings of the National Academy of Sciences, 72, 20–22.
Van den Berg, G.J. (1990), "Nonstationarity in job search theory", Review of Economic Studies, 57, 255–277.
Van den Berg, G.J. (1995), "Explicit expressions for the reservation wage path and the unemployment duration density in nonstationary job search models", Labour Economics, 2, 187–198.
Van den Berg, G.J. (2001), "Duration models: Specification, identification, and multiple durations", in J.J. Heckman and E. Leamer, editors, Handbook of Econometrics, Volume V, North Holland, Amsterdam.
Van den Berg, G.J., A. Holm, and J.C. van Ours (2001), "Do stepping-stone jobs exist? Early career paths in the medical profession", Journal of Population Economics [forthcoming].
Van den Berg, G.J., B. van der Klaauw, and J.C. van Ours (1998), "Punitive sanctions and the transition rate from welfare to work", Working paper, Tinbergen Institute, Amsterdam.
Vella, F. and M. Verbeek (1997), "Using rank order as an instrumental variable: An application to the return to schooling", Working paper, Catholic University Leuven, Leuven.