

Disability Income Insurance

Explaining the structural changes in the disability duration of Dutch self-employed with a focus on business cycle-related variables.

Dorthe van Waarden

July 14, 2012

Master’s Thesis Mathematics

Supervisors: Michel Mandjes, Theo Beekman, Folkert de Jong, Martin Heijnsbroek

Faculty of Science

University of Amsterdam


Abstract

This thesis analyses the dynamics of the return-to-work process of Dutch self-employed using a unique data set containing more than 30,000 sick leave claims during the period 2003-2011. We estimate a multi-state model and analyse the transitions from one state to another during the incapacity spell using both a proportional hazards model and a logit model. In particular, we focus on the influence of the business cycle on the full and partial recovery rates, as well as on the fall-back rate. Finally, the influence of the various risk factors is quantified by calculating the expected duration until recovery for different values of the risk factors and comparing these to that of the benchmark self-employed.

Details

Title: Disability Income Insurance
Author: Dorthe van Waarden, 5801974
Supervisors: Prof.dr. Michel Mandjes (UvA), Theo Beekman (Achmea), Folkert de Jong and Martin Heijnsbroek (MIcompany)
Second reviewer: Prof.dr. Rudesindo Nunez-Queija
Date: July 14, 2012
Faculty of Science, University of Amsterdam
Science Park 904, 1098 XH Amsterdam
http://www.science.uva.nl/math


Preface

This Master’s thesis is part of my internship at MIcompany, a specialized commercial analytics agency that helps companies deal with two key challenges. Firstly, it helps them discover growth opportunities based on granular analysis of customer data. Secondly, MIcompany helps to build an in-house capability to leverage analytics in the business: it helps its customers with the recruitment and development of analytical talent, builds analytical tools that facilitate easy replication of smart analyses, and provides specialized support. One of its clients is a large Dutch insurance company, which provided the data and research question for this thesis. I really enjoyed the practical side of this internship and being able to work at two different companies.

Foremost, I would like to express my sincere gratitude to all my supervisors. First of all, I would like to thank Prof.dr. Michel Mandjes for assisting me in finding an internship and supporting me throughout the whole process. Second, I would like to thank Theo Beekman for taking the time to explain everything about disability insurance and the various models to me, and for providing the data and background information needed for this thesis. Last, but certainly not least, I would like to express my thanks to Folkert de Jong and Martin Heijnsbroek from MIcompany, for giving me the opportunity to do my internship and for initiating and supporting the contact with the insurance company.


Contents

Preface

1 Introduction

2 Disability Income Insurance
2.1 History
2.2 Features of the insurance
2.3 Data description

3 Recovery models
3.1 Current model
3.2 Multi-state model
3.2.1 Markov chains
3.3 Explanatory variables
3.3.1 Business cycle

4 Survival Theory
4.1 Hazard function
4.2 Kaplan-Meier estimator
4.3 Censored data and the Likelihood function
4.4 Cox Proportional Hazards model
4.4.1 PH assumption
4.4.2 Partial Likelihood
4.5 Time-varying covariates
4.5.1 Episode splitting
4.6 Competing risks
4.7 Unobserved heterogeneity: Frailty models
4.8 Goodness of fit

5 Binary regression
5.1 The logit model
5.1.1 Likelihood function
5.2 Panel data
5.2.1 Linear panel models
5.2.2 Binary panel models
5.3 Goodness of fit
5.3.1 Maximum likelihood theory
5.3.2 Pseudo R² measures


6 Results
6.1 Data set
6.2 Model without the business cycle
6.2.1 Unobserved heterogeneity
6.2.2 Proportional hazards assumption
6.3 Business cycle

7 Quantifying the influence of the risk factors
7.1 Transition probabilities in the multi-state model
7.2 Expected duration until recovery
7.2.1 Other causes of the fluctuations in the loss ratio

8 Conclusion and advice

A Estimation results of the MPH model
A.1 PH assumption
A.2 Business cycle
A.2.1 Influence business cycle per profession
A.2.2 Influence business cycle per disorder

B Estimation results of the logit model
B.1 Business cycle

C Business cycle

D Comparison of models
D.1 Comparing non-nested models
D.1.1 Maximum likelihood
D.1.2 Information Criteria


Chapter 1

Introduction

When an employee is sick or injured and unable to perform his normal work, his employer is obliged to continue paying his salary. In the first year this amounts to a full 100% of his last salary and in the second year to 70%. After these two years the government provides benefit payments until the employee has recovered. However, about 14% of the Dutch labour force consists of self-employed, who cannot benefit from this arrangement. In order to ensure continuation of their income and the survival of their business in case of long-term sickness or disability, self-employed can buy a so-called disability income insurance.

In this thesis we consider a large Dutch insurance company that provides disability income insurance for self-employed. For this company it is crucial to know which risk factors affect the probability that a client becomes disabled and which affect the return-to-work process. Understanding these risk factors gives the company more insight into the individual risk of each applicant, and it can help to determine the premium that should be charged and the amount of capital that should be held. To illustrate why it matters to know these factors and their effects, we start by sketching the financial and economic background. The yearly premium of the disability income insurance amounts to €250-275 million. This means that a difference in the benefit payments equal to one percent of the premium results in a profit or loss of almost €3 million. The financial interest is further increased by the planned introduction of Solvency II in January 2013. This is the newest regulatory framework for risk management, developed by the European Union (EU), and consists of a three-pillar structure of insurance supervision. The most important of these three pillars contains the quantitative requirements, a set of rules for determining the minimal capital and the target capital. The minimal capital partly depends on whether the business is related to life or non-life insurance, whereas the target capital corresponds to the insurer's economic capital for running its business within a given safety level. To determine the target capital, an insurance company can use either a standard or an internal risk model. The latter is a model constructed by the insurer itself for its specific needs, based on its own data. In contrast, a standard model is designed by the EU and is used uniformly across insurers. Internal models are expected to give a more accurate analysis of the insurer's financial situation than the more generic standard models. However, before an internal model may be used, it has to be approved by the supervisory authority. This process requires detailed documentation of the model and its underlying assumptions. Furthermore, the model has to be reviewed periodically to ensure that it is properly adjusted to the dynamic financial environment.
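The figure of almost €3 million is simply one percent of the yearly premium, evaluated at both ends of the stated premium range (reading "one percentage point" as one percent of the premium):

```latex
0.01 \times 250 = 2.5, \qquad 0.01 \times 275 = 2.75 \qquad (\text{million euros})
```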

It is reasonable to assume that more knowledge about which risk factors affect the disability process will result in a better-fitting model. A better fit leaves less unexplained variance in the payments, which, with Solvency II in mind, means that the company could hold less target capital. The capital released in this way can then be used for other purposes that benefit the company.

In this thesis we focus on the loss ratio, which is the ratio between the benefit payments and the premium. Ideally, the company would be able to predict the payments so accurately that it could adjust its premiums to keep this ratio approximately constant over the years. However, the loss ratio of the company under consideration was far from constant and showed large variations over the last years, as shown in Figure 1.1.

Figure 1.1: The loss ratio of the company under consideration in the period 2001-2010.

The aim of this thesis is to explain these fluctuations and to give advice on how this explanation could be used to improve the current models. The loss ratio is determined by three things: the probability that an insured becomes disabled, the duration of the disability, and the premium that is received. Our research question can therefore be divided into three parts. We start by briefly discussing the last one. The premium is determined solely by the company itself and is not affected by external factors, so it seems reasonable not to analyse it in this thesis. Still, it is worth mentioning that the policy of the company has changed in recent years. In order to satisfy its loyal customers, the company felt that a new insurance should not be much cheaper than an existing one. This resulted in a price reduction for existing clients, amounting to a total of €10 million. However, even if the company had not implemented this reduction, that is, if we add this €10 million back to the premium in the loss ratio, the fluctuations would still be clearly observable. Hence the fluctuations are not solely caused by this price reduction, and the research question reduces to: What is the cause of the variations in the benefit payments?
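A minimal sketch of the loss ratio and of the adjustment just described, in which the €10 million price reduction is added back to the premium. The function names and the payment and premium figures are hypothetical (the premium is merely taken inside the €250-275 million range mentioned earlier); this is an illustration, not the company's calculation.

```python
def loss_ratio(benefit_payments: float, premium: float) -> float:
    """Loss ratio: benefit payments divided by premium income."""
    if premium <= 0:
        raise ValueError("premium must be positive")
    return benefit_payments / premium


def adjusted_loss_ratio(benefit_payments: float, premium: float,
                        premium_reduction: float = 10e6) -> float:
    """Loss ratio recomputed as if the price reduction had not been granted,
    i.e. with the forgone premium added back to the denominator."""
    return loss_ratio(benefit_payments, premium + premium_reduction)


# Hypothetical yearly figures in euros, for illustration only.
payments, premium = 200e6, 260e6
print(f"{loss_ratio(payments, premium):.3f}")           # 0.769
print(f"{adjusted_loss_ratio(payments, premium):.3f}")  # 0.741
```

Adding the reduction back lowers the ratio in every year by a similar amount, which is why it cannot by itself remove year-to-year fluctuations.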

To answer this question we first look at the probability that an insured becomes disabled. As it turns out, this probability is rather constant and shows no large deviations, as shown in Figure 1.2.


Figure 1.2: The percentage of insured who become disabled in a given month.

Therefore there are no indications that the fluctuations in the loss ratio are caused by changes in the disability probability. The percentage of insured who are disabled, on the other hand, does vary over time, as Figure 1.3 clearly illustrates. Since the influx did not change substantially, an increase in the percentage disabled must be caused by an increase in disability duration, and vice versa; hence the average duration of a claim varies over time.

Figure 1.3: The percentage of insured who are disabled.
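The step from a constant influx to "the percentage disabled tracks the duration" is an instance of a standard stock-flow identity (Little's law: in steady state, stock = inflow × mean duration), which the thesis does not use explicitly. The sketch below assumes a constant monthly exit probability, a simplification, since recovery in fact slows as a spell lengthens; the function name and numbers are hypothetical.

```python
def steady_state_stock(inflow_per_month: float, monthly_exit_prob: float,
                       months: int = 600) -> float:
    """Iterate a simple stock-flow model: each month a fixed fraction of the
    current stock of claims ends, then new claims flow in. A claim is then
    counted for 1 / monthly_exit_prob months on average."""
    stock = 0.0
    for _ in range(months):
        stock = stock * (1.0 - monthly_exit_prob) + inflow_per_month
    return stock


# Same hypothetical inflow of 100 new claims per month in both scenarios.
short_spells = steady_state_stock(100, 0.10)  # mean duration 10 months
long_spells = steady_state_stock(100, 0.05)   # mean duration 20 months
print(round(short_spells), round(long_spells))  # 1000 2000
```

With the inflow held fixed, doubling the mean duration doubles the number of open claims, which is the reasoning used above to attribute the rise in the percentage disabled to longer durations.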

Therefore the final question we would like to answer in this thesis is: What caused the fluctuations in disability durations? First of all, the current models do not include certain variables, such as the year in which the disability started, the brand, and the deferment period, which we assume to be important factors in disability duration. The literature mentions various other causes. For example, De Ravin [15] gives the following reasons for increasing disability experience:

• Weaker underwriting standards;

• Greater awareness of the cover and the right to claim;

• Changing work ethic and social attitudes to insurance;

• Changes in the economic environment;

• More liberal definitions of disability;

• Weakening of other policy terms and conditions;

• Under-resourced or under-skilled claims management.

The hypothesis that changes in the economic environment influence the recovery process is also stated in [30] and [31]. Figure 1.3 shows that the increase in the percentage of disabled insured started in the first quarter of 2008, the year in which the financial and economic crisis started. It therefore seems reasonable to hypothesize that the economic environment is a major contributor to the variation in disability duration.

With respect to the business cycle, the hypothesis is that self-employed want to return to their business as soon as possible when the economy is booming, since a large profit and hence a large income can then be achieved. When the economy is in a downturn, moral hazard could play a role: during a recession the income of a self-employed is likely to be lower than in periods of high economic growth, so a replacement income paid by disability insurance may seem an attractive alternative. These expectations are in line with Smoluk [31], who states that when the consumption-to-wealth ratio¹ is high, long-term disability claim rates are low, and vice versa. We also expect the influence to differ per disorder and profession. For example, it is plausible that the influence on claimants with cancer is relatively small, whereas it will probably be more noticeable for stress-related disorders, such as backache or other locomotive disorders. Since the economic crisis heavily affected the construction and retail sectors, we expect those claimants to be more sensitive to changes in the economic environment. On the other hand, we assume that claimants working in the (para)medical sector are hardly affected by the business cycle.

The articles by Spierdijk and Koning [32], Amelink [7] and Bultena [10] are used as a starting point for this thesis. These articles use similar data sets to identify risk factors for positive and negative recovery and to estimate the claim reserves and their uncertainty. We extend this analysis by including extra variables and by using not only survival analysis but also logistic regression methods. The outline of this thesis is as follows. In chapter 2 we explain how disability income insurance for self-employed is organized in the Netherlands; some aspects of the insurance company under consideration and properties of the data set are also discussed. In chapter 3 a multi-state model for the recovery process is introduced, together with the principle of Markov chains. The chapter finishes with an overview of all explanatory variables that will be considered. In chapters 4 and 5 we introduce survival and logit models, respectively. These models are used to estimate the four transitions in the multi-state model. The results of these methods are discussed in chapter 6. In chapter 7 we quantify the influence of the various risk factors by calculating the expected duration until recovery for different values of the risk factors. Finally, in chapter 8 we draw our conclusions and give advice on how the results could be used in practice.

¹ Financial economic theory suggests that the consumption-to-wealth ratio reflects consumption smoothing and reveals expectations about future wealth. For individuals contemplating submitting a long-term disability (LTD) claim, the expected payoff to exercising this insurance option is a function of their expectations about their future wealth.


Chapter 2

Disability Income Insurance

In this chapter we first explain how income insurance for self-employed is organized in the Netherlands. Second, we discuss some features of the income insurance sold by the company that provided the data for this thesis. We finish with an overview of the characteristics of the data used in our analysis.

2.1 History

Until 2004, the state provided income insurance for self-employed in the Netherlands by means of the WAZ, the 'Wet Arbeidsongeschiktheidsverzekering Zelfstandigen' (English: Self-Employed Income Insurance Act). The costs of the WAZ were funded from the tax paid by the self-employed. This insurance, however, only compensated for loss of income after one year of disability, and for at most 70% of the statutory minimum wage. For those who wanted earlier or extra compensation, private insurance companies provided additional insurance. Such insurance consisted of two parts, an A-cover and a B-cover: the first provided income in the first year of sickness, whereas the second covered the income loss after the first year.

The cover of the WAZ was very limited, but at the same time it was quite expensive for many self-employed. This led to many complaints, and in August 2004 the WAZ was abolished by the Dutch government, partly at the request of MKB-Nederland (the largest entrepreneurs' organization in the Netherlands, representing small and medium-sized companies). Since then, income insurance for self-employed has only been available from private insurance companies.

2.2 Features of the insurance

In this section we take a closer look at the policy conditions of the income insurance sold by the company that provided the data for this study.

When buying income insurance from the company under consideration, a self-employed has to make a number of choices. First of all, he has to decide on the amount insured. In order to avoid moral hazard, this annual replacement income can never exceed 80% of the income of the self-employed, with a maximum of €250,000. Second, he has to choose the deferment period, which is the time between becoming disabled and the start of the benefit payments. This period can be 14 days, or 1, 2, 3, 6, 12 or even 24 months, and the choice depends on how long the insured can manage on other income or savings. The longer the deferment period, the lower the premium that has to be paid. Furthermore, the insured has to decide which criterion is used to determine whether he is disabled: being unable to perform his own profession, or being unable to perform any kind of job. Finally, a client has to choose the end age, which is the age at which the payments stop regardless of the health state of the insured. This can be 50 or any age between 55 and 65. Instead of choosing an end age, there is also the option of receiving benefit payments for a fixed period of time, namely 1, 2, 3, 4 or 5 years.
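To make the interplay between deferment period and end age concrete, here is a minimal sketch of the number of benefit months for a single claim. It rests on a simplified reading of the conditions above, which is our assumption: benefits run from the end of the deferment period until recovery or until the end age is reached, all measured in months since the start of the disability. The fixed-term option and partial disability are ignored, and the function name is hypothetical.

```python
def benefit_months(disability_months: float, deferment_months: float,
                   months_until_end_age: float) -> float:
    """Months with benefit payments for one claim (simplified, hypothetical).

    All arguments are measured from the start of the disability. Payments
    start once the deferment period has passed and stop at recovery or when
    the end age is reached, whichever comes first.
    """
    payment_end = min(disability_months, months_until_end_age)
    return max(0.0, float(payment_end - deferment_months))


print(benefit_months(20, 1, 240))   # 19.0
print(benefit_months(20, 24, 240))  # 0.0  (recovered within the deferment period)
print(benefit_months(20, 3, 12))    # 9.0  (end age reached after 12 months)
```

The second call shows why a long deferment period lowers the premium: short spells never reach the payment phase.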

The benefit payments depend on the disability percentage of the self-employed, which is also known as the replacement rate. There are two different cases.

1. The benefit payments equal the percentage of incapacity times the amount insured.

2. The benefit payments equal the payout percentage times the amount insured, where the payout percentages are defined as follows:

Disability percentage    Payout percentage
0-24%                    0%
25-34%                   30%
35-44%                   40%
45-54%                   50%
55-64%                   60%
65-79%                   75%
80-100%                  100%

In both cases the client chooses a lower bound: if his disability percentage is below this bound, he does not receive any benefit payments. In the first case the lower bound can be 25%, 35%, 50% or 75%, whereas in the second case the choice is between 25%, 35%, 45%, 55%, 65% and 80%. For simplicity, in this thesis we only consider case 1 and assume that the lower bound is set to 25%, which is the case for the majority of the insured.
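The two benefit rules and the lower bound can be summarized in a few lines of code. This is a minimal sketch under our own assumptions: benefits are annual amounts proportional to the amount insured, the payout bands of case 2 are read as half-open intervals (so that, for example, 34.5% disability falls in the 30% band), and the function names are hypothetical.

```python
# Case 2 payout bands: (lowest disability % of the band, payout %), from the
# highest band down. Bands are read as half-open, e.g. 25 <= d < 35 pays 30%.
PAYOUT_BANDS = [(80, 100), (65, 75), (55, 60), (45, 50), (35, 40), (25, 30)]


def payout_percentage(disability_pct: float) -> int:
    """Payout percentage under case 2 (0 below 25% disability)."""
    for lower, payout in PAYOUT_BANDS:
        if disability_pct >= lower:
            return payout
    return 0


def benefit(disability_pct: float, amount_insured: float,
            case: int = 1, lower_bound: float = 25.0) -> float:
    """Annual benefit for a given disability percentage.

    case 1: disability % times the amount insured
    case 2: payout % from the band table times the amount insured
    Nothing is paid when disability is below the chosen lower bound.
    """
    if disability_pct < lower_bound:
        return 0.0
    pct = disability_pct if case == 1 else payout_percentage(disability_pct)
    return pct * amount_insured / 100.0


print(benefit(70, 40_000, case=1))  # 28000.0
print(benefit(70, 40_000, case=2))  # 30000.0
```

The example shows how case 2 rounds a 70% disability up to a 75% payout, whereas case 1 pays exactly in proportion.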

When an insured is fully disabled, no insurance premium has to be paid. In all other cases the insured pays a premium equal to the original premium times the percentage he is still able to work. The benefit payments can end for various reasons, for example if the insured has recovered, passes away or reaches his end age.
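A minimal sketch of the premium rule just described, applied to a hypothetical monthly trajectory of disability percentages. The function name and numbers are illustrative only, and the rule is taken literally from the text (it is also applied below the 25% benefit threshold).

```python
def premium_due(original_premium: float, disability_pct: float) -> float:
    """Premium payable during disability: the original premium times the
    share the insured is still able to work (zero at 100% disability)."""
    return original_premium * (100 - disability_pct) / 100.0


trajectory = [100, 100, 60, 40, 0]  # hypothetical monthly disability percentages
print([premium_due(250.0, d) for d in trajectory])
# [0.0, 0.0, 100.0, 150.0, 250.0]
```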

2.3 Data description

The data set used for this thesis has been provided by a large Dutch insurance company and consists of 62451 approved sick leave claims in the period from December 2002 up to February 2012. For each claim a wide range of characteristics is given. Besides the gender and the date of birth, we also know the profession of the claimant. In addition, the date on which the disability started and the current status of the claim are known. Furthermore, for each claim we know the total disability duration, measured in months, and the evolution of the replacement rate during the incapacity spell. This rate is a time-varying variable reported on a monthly basis. Finally, we have information about the postal code, the amount insured and the brand of insurance (1, 2, 3 or 4).

For some of the claims the first few replacement rates are missing. This can have several causes. For example, the insurance company may not have been able to determine the rate for the first month(s), which can happen when the claim is reported on the last day of a given month. Another explanation could be that the claimant has a deferment period of a couple of days, so that there is some time between the start of the disability and the start of the observation. It could also be that a client does not report his claim immediately, but waits for a period that can even last a couple of years. If the first replacement rates are missing, the claim is treated as a delayed entry (left truncation). Claims for which all entries are either missing or below 25% are removed from our data set. Claims due to pregnancy without complications (the so-called G600) are also removed, since these claims follow a different process than claims due to other illnesses: pregnancy claims almost always concern women and are more frequent among younger insured, whereas the prevalence of other claims is higher among older clients. The final adjustment concerns the different brands. In our data set, 95% of the claims correspond to brand 1. The remaining 5% belong to brands 2, 3 or 4, which were added to our data set after the merger of the four brands in 2009/2010. As a result, these brands contain relatively few short claims and relatively many long-term disability claims, which causes a bias. We therefore only consider the claims belonging to brand 1.
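The selection rules above can be sketched as a filter over claim records. The record layout (a dict with hypothetical keys `brand`, `diagnosis` and `rates`, the latter a list of monthly disability percentages with `None` for a missing month) is an assumption for illustration, not the insurer's actual data format. Leading missing months are turned into a delayed-entry time rather than discarded.

```python
def keep_claim(claim: dict) -> bool:
    """True if the claim survives the three selection rules above."""
    rates = claim["rates"]
    # At least one month with a disability percentage of 25% or more.
    has_benefit_month = any(r is not None and r >= 25 for r in rates)
    return (has_benefit_month
            and claim["diagnosis"] != "G600"  # pregnancy without complications
            and claim["brand"] == 1)


def entry_month(rates: list) -> int:
    """Index of the first observed month; leading missing rates are treated
    as delayed entry (left truncation), not as data."""
    for month, rate in enumerate(rates):
        if rate is not None:
            return month
    return len(rates)


claims = [  # hypothetical records
    {"id": 1, "brand": 1, "diagnosis": "back", "rates": [None, 50, 50, 0]},
    {"id": 2, "brand": 1, "diagnosis": "G600", "rates": [100, 100, 0]},
    {"id": 3, "brand": 2, "diagnosis": "back", "rates": [80, 40, 0]},
    {"id": 4, "brand": 1, "diagnosis": "flu", "rates": [None, 10, 20]},
]
print([c["id"] for c in claims if keep_claim(c)])  # [1]
print(entry_month(claims[0]["rates"]))             # 1
```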

This resulted in a final data set containing 31780 claims. However, before we could start analyzing these data, we had to make some corrections.

• If the claim had ended because of death or expiry, the zeros after the last positive disability percentage were replaced by missing values. In this way only the changes in disability during the validity of the claim are considered, and not the apparent recovery at the end.

• Zeros preceding the first positive disability percentage were replaced by missing values. This was the case for so-called IBNER claims (incurred, but not enough reported), meaning that the claim was only fully reported some time after the disability started. It would be incorrect to assign the value 0 in these cases, since that would mean that the client is not disabled.

• Two claims stood out because of a strikingly deviating recovery process. On inspection, some of their replacement rates turned out to be wrong; these were corrected manually.
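The first two corrections can be expressed as a small function on the monthly series of one claim. This sketch uses a hypothetical list representation (`None` for a missing month) and hypothetical labels such as `"death"` and `"expiry"` for the reason a claim ended.

```python
def clean_rates(rates: list, ended_by: str) -> list:
    """Return a copy of a monthly disability series in which zeros that do
    not reflect a real state are replaced by None (missing):
    - zeros before the first positive percentage (late / IBNER reporting);
    - zeros after the last positive percentage if the claim ended by death
      or expiry, so that the end of the claim is not read as recovery.
    """
    out = list(rates)
    positive = [i for i, r in enumerate(out) if r is not None and r > 0]
    if not positive:
        return out
    first, last = positive[0], positive[-1]
    for i in range(first):
        if out[i] == 0:
            out[i] = None
    if ended_by in ("death", "expiry"):
        for i in range(last + 1, len(out)):
            if out[i] == 0:
                out[i] = None
    return out


print(clean_rates([0, 0, 60, 60, 0, 0], "death"))
# [None, None, 60, 60, None, None]
print(clean_rates([0, 0, 60, 60, 0, 0], "recovery"))
# [None, None, 60, 60, 0, 0]
```

Trailing zeros are kept for a claim that ended in recovery, because there they record a genuine transition.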

Of the claims in the final data set, 6171 are still running. Since they have not ended within our observation window, we treat these claims as right-censored. Most of the claims that have ended did so because of recovery of the insured. There are, however, other reasons for the benefit payments to stop, for example because the insured passed away or reached his end age. For our analysis we are only interested in claims that ended because of full recovery. Claims that ended for another reason are therefore also treated as right-censored, resulting in a total of 7083 right-censored claims, which is 22.29% of the total data set. We end this chapter with an overview of some characteristics of our data:
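A sketch of the resulting event indicator, together with a check of the counts quoted above. The label `"recovery"` is hypothetical; every other outcome, including a claim that is still running, is coded as right-censored, mirroring the rule in the text.

```python
def event_indicator(ended_by: str) -> int:
    """1 if the claim ended in full recovery (the event of interest),
    0 if it is right-censored (still running, or ended for another reason)."""
    return 1 if ended_by == "recovery" else 0


n_claims, n_running, n_censored = 31780, 6171, 7083
print(n_censored - n_running)                 # 912 (ended for another reason)
print(f"{100 * n_censored / n_claims:.2f}%")  # 22.29%
```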

Gender    Number    Percentage
Male      27956     87.97%
Female    3824      12.03%

Table 2.1: Preliminary statistics: gender.


Profession       Number    Percentage
Agriculture      13639     42.92%
Construction     5987      18.84%
Shopkeeper       2592      8.16%
(Para)Medical    2407      7.57%
Service          1655      5.21%
Other            5500      17.30%

Table 2.2: Preliminary statistics: type of profession.

Disorder                Symptoms   Cancer   Infections   Injury   Other
Locomotive disease      28.7%      0.2%     0%           23.0%    12.5%
Psychological disease   4.2%       0%       0%           0%       5.3%
Digestive disease       0.4%       0.5%     0.2%         0%       3.8%
Other                   4.5%       1.9%     1.3%         2.0%     11.8%

Table 2.3: Preliminary statistics: type of disorder.

The age of the claimants at the start of the disability varies between 18 and 64 years. The average age is 43 years.


Chapter 3

Recovery models

In order to determine to what extent the business cycle affects the expected duration until recovery, we first need to know what the expected duration until recovery would be if the influence of the business cycle were neglected. Therefore we first discuss how the disability durations have been modeled by the insurance company so far. A drawback of this current model is that it only focuses on the total disability volume of all insured. Hence no individual information on the claimants is used, and no prognoses about partial or negative recovery (that is, a transition to a higher disability percentage) are provided. However, information about these types of rehabilitation can be very useful, since claimants whose health deteriorates during their disability spell may be particularly costly for insurers, due to a higher replacement income and a possibly prolonged sick leave. It is therefore important to analyze the transitions from one health state to another and to know which variables affect the various transition rates. Furthermore, the future recovery process of a claimant is likely to depend on his current health state; for example, we expect slower recovery for a self-employed with a higher disability percentage. By analyzing disability spells in relation to the health condition of the claimant, we can assess the precise role of the risk factors in each stage of the return-to-work process. Another drawback of the current model is that it only contains a limited number of explanatory variables, namely age, gender and time. In order to improve on this and to give more insight into the entire interrelated trajectory of the rehabilitation process, we present a multi-state model based on Markov processes. We add extra variables that are expected to affect the recovery process, such as type of disorder and deferment period.

3.1 Current model

In the model currently used by the insurance company, a distinction is made between two states:

• Healthy (H): Disability of 0–24%. In this state no benefit payments are made by the insurer, so the insured is either at work or in an unpaid sick leave situation.

• Disabled (D): Disability of 25–100%.

In order to model the transitions between these states the following definitions are used:

Definition 1. The disability volume at time t is the sum of the disability percentages of all insured at time t. We denote by DIS%(X, A, t) the disability volume at time t of all insured who are X years old and who, at time t, have been disabled for A to A + 1 months.


Figure 3.1: The two-state model which is currently used by the insurance company. Transition λ reflects invalidation and µ recovery.

Definition 2. The recovery probability r(X, A, t) is the probability that an insured, who became disabled at the age of X and who at time t − 1 is disabled for A months, recovers in the subsequent month.

The recovery probability is determined by the difference between the disability volumes of all insured of age X in two consecutive months, divided by the volume in the first of these months. In formula form (for t ≥ 1):

$$r(X,A,t) = \frac{\mathrm{DIS\%}(X,A-1,t-1) - \mathrm{DIS\%}(X,A,t)}{\mathrm{DIS\%}(X,A-1,t-1)}.$$
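The computation can be sketched as follows, assuming hypothetical individual monthly records of the form (age X, duration A, month t, disability percentage). The helper names `disability_volume` and `recovery_probability` are our own; the toy example applies the formula to three 43-year-old claimants, one of whom recovers fully and one partially.

```python
from collections import defaultdict

def disability_volume(records):
    """DIS%(X, A, t): sum of disability percentages per (age X, duration A, month t).

    records: iterable of (X, A, t, percentage) tuples, one per insured per month
    (a hypothetical input layout chosen for this sketch).
    """
    volume = defaultdict(float)
    for age, duration, month, pct in records:
        volume[(age, duration, month)] += pct
    return volume

def recovery_probability(volume, age, duration, month):
    """r(X, A, t) = (DIS%(X, A-1, t-1) - DIS%(X, A, t)) / DIS%(X, A-1, t-1), for t >= 1."""
    start = volume.get((age, duration - 1, month - 1), 0.0)
    if start == 0.0:
        raise ValueError("no disability volume at (X, A - 1, t - 1)")
    return (start - volume.get((age, duration, month), 0.0)) / start

# Hypothetical records for three 43-year-old claimants.
records = [
    (43, 0, 0, 100.0), (43, 0, 0, 50.0), (43, 0, 0, 80.0),  # month 0
    (43, 1, 1, 100.0), (43, 1, 1, 40.0),                    # month 1: one full, one partial recovery
]
volume = disability_volume(records)
print(recovery_probability(volume, age=43, duration=1, month=1))  # (230 - 140) / 230
```

Partial recoveries count proportionally: the claimant going from 80% to 40% contributes half of his volume to the numerator.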

Applying this formula to our data set with X = 43 results in the graph shown in Figure 3.2 (averaged over time t). The graph is wiggly for large A, since there are relatively few observations at these durations. Nevertheless, it is clear that the recovery probability declines strongly as the disability spell continues.

Figure 3.2: Recovery probabilities for a 43-year-old insured.

Besides the influence of time, it was also shown that age is an important risk factor: in general, the older the claimant, the slower the recovery. Therefore a linear relation between age and recovery was chosen. Third, it was pointed out that the graphs show some cracks. These were modeled by dividing the formula into five parts and by including three extra parameters for the first three months of disability. Because of the confidentiality of the information, the exact formulas cannot be shown in this thesis.


3.2 Multi-state model

As mentioned before, the current model has the drawback that it only considers the total disability volume of all insured and does not provide individual information about partial or negative recovery. To give more insight into the recovery process, we split the state of disability into several states, which allows us to distinguish between mild and severe disability. When an insured is disabled, his level of disability is expressed as a percentage ranging from 0 to 100%. If we included all these percentages in our model, this would result in 101 states and 101 · 100 = 10100 possible transitions. Since we want a simple model to describe the behavior of the insured, this is not practical. Therefore we define a model with only three disability states. The choice of these states is based on the graph shown in Figure 3.3: the most common percentages in the third month of disability are 0, around 50 and 100.

Figure 3.3: The frequency of disability percentages in the third month of disability.

Based on these frequencies, the disability percentages are classified into three states¹:

• State 0, Healthy: Disability of 0–24%. In this state no benefit payments are made by the insurer, so the insured is either at work or in an unpaid sick leave situation.

• State 1, Partially disabled: Disability of 25–75%.

• State 2, Disabled: Disability of 76–100%.
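A direct translation of this classification into code is sketched below, assuming integer disability percentages; how fractional values between 24 and 25 or between 75 and 76 would be assigned is our own assumption. The function name `health_state` is hypothetical.

```python
def health_state(disability_pct):
    """Map a disability percentage (0-100) to state 0, 1 or 2 of the multi-state model."""
    if not 0 <= disability_pct <= 100:
        raise ValueError("disability percentage must lie in [0, 100]")
    if disability_pct < 25:
        return 0   # healthy: 0-24%, no benefit payments
    if disability_pct <= 75:
        return 1   # partially disabled: 25-75%
    return 2       # disabled: 76-100%

# One claimant's monthly disability percentages (hypothetical).
monthly_pct = [100, 100, 80, 55, 55, 20]
print([health_state(p) for p in monthly_pct])  # [2, 2, 2, 1, 1, 0]
```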

During a disability spell, a claimant can jump between these three states. There are six possible transitions: 0 → 1, 0 → 2, 1 → 0, 1 → 2, 2 → 0 and 2 → 1. However, the first two transitions represent the event that an insured becomes disabled, or that a claimant falls back within four weeks after having fully recovered. These transitions are not discussed in this thesis, which leaves four transitions to focus on. A graphical representation of the multi-state model is given in Figure 3.4. Note that transition 1 → 2 represents a decline in health status, whereas the other three reflect a full or partial recovery.

A claimant makes a transition from one state to another at some moment in time, and there are two ways to specify time here. The first is to define it as the total time spent in the system; the second is to define it as the time spent in the current state. In the first case, the time at which a claimant makes a transition is the time since the start of the disability; in the second case, it is the time spent in a particular state since

¹ We have performed some robustness checks, and our final results turned out to be robust to changes in the definition of the three health states.


Figure 3.4: A multi-state model with three states, representing the different degrees of disability.The two transitions which are illustrated by the dashed arrows are not observed.

the previous transition. As the purpose of this thesis is to address the influence of the business cycle on the expected duration until recovery, it is reasonable to consider the time in the system. Claims with a duration of more than 3 years are regarded as chronic, and it is assumed that the recovery processes of these claimants are not affected by the business cycle. Therefore we only consider transitions that take place before the 37th month. This results in a total of 33595 transitions, an overview of which is given in Table 3.1.

Current \ Next   0                1               2
1                14584 (43.4%)    -               3118 (9.3%)
2                6554 (19.5%)     9339 (27.8%)    -

Table 3.1: An overview of the transitions in the data set.
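The percentages in Table 3.1 are shares of all 33595 transitions. The short script below, a sketch that uses only the counts from the table, reproduces them and also computes the shares per current state, which show how the exits from each state are split.

```python
# Transition counts from Table 3.1, keyed by (current state, next state).
counts = {(1, 0): 14584, (1, 2): 3118, (2, 0): 6554, (2, 1): 9339}

total = sum(counts.values())
shares = {pair: round(100 * n / total, 1) for pair, n in counts.items()}
print(total)   # 33595
print(shares)  # {(1, 0): 43.4, (1, 2): 9.3, (2, 0): 19.5, (2, 1): 27.8}

# Share of each destination among the transitions out of the same current state.
out_of = {state: sum(n for (i, _), n in counts.items() if i == state) for state in (1, 2)}
row_shares = {pair: round(100 * n / out_of[pair[0]], 1) for pair, n in counts.items()}
print(row_shares)  # {(1, 0): 82.4, (1, 2): 17.6, (2, 0): 41.2, (2, 1): 58.8}
```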

3.2.1 Markov chains

A natural way to model the recovery process is by a multi-state model. Andersen et al. [3] studied such models using a finite-state Markov process model in which the hazard rate of each possible transition is modeled by a separate Cox proportional hazards model. Proportional hazards models are explained in Chapter 4 on survival analysis. In this section we give a more formal definition of the multi-state model introduced in the previous section. For this we need the definitions of a stochastic process and a Markov chain, which we discuss now.

Loosely speaking, a stochastic process is a phenomenon that can be thought of as evolving in time in a random manner. More formally:

Definition 3. A stochastic process is a collection $X = (X_t)_{t \in T}$ of measurable maps from a probability space $(\Omega, \mathcal{F}, P)$ to the state space $(E, \mathcal{E})$.

The index $t$ is a time parameter, and we view the index set $T$ as the set of all observation instants of the process. The stochastic process is called a discrete-time process when $T$ is countable. By contrast, when $T$ is an interval of the real line, we say that $X = (X_t)_{t \in T}$ is a continuous-time process.

Definition 4. Let $X = (X_t)_{t \in T}$ be a stochastic process with finite state space $E = \{x_0, x_1, \ldots, x_m\}$ and let $T = \{0, 1, \ldots, N\}$ be a finite set of time points. Then $X$ is called a Markov chain if the Markov property is satisfied: for all $n = 1, \ldots, N$ and all $x_0, \ldots, x_n \in E$ it holds that

$$P(X_n = x_n \mid X_0 = x_0, \ldots, X_{n-1} = x_{n-1}) = P(X_n = x_n \mid X_{n-1} = x_{n-1}). \tag{3.1}$$

The Markov property thus states that the transition probability depends only on the current state $X_{n-1} = x_{n-1}$ and not on the path that led to $x_{n-1}$.

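Equivalently, a Markov chain can be characterized by the factorization of the joint probability mass function $f$ of $(X_0, \ldots, X_n)$; the first equality below is the chain rule and holds for any process, while the second uses the Markov property: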

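$$\begin{aligned} f(x_0, \ldots, x_n) &= f(x_0)\, f(x_1 \mid x_0)\, f(x_2 \mid x_0, x_1) \cdots f(x_n \mid x_0, \ldots, x_{n-1}) \\ &= f(x_0)\, f(x_1 \mid x_0)\, f(x_2 \mid x_1) \cdots f(x_n \mid x_{n-1}). \end{aligned}$$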
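A small sketch of the second factorization: under the Markov property, the probability of a whole monthly path reduces to the initial probability times a product of one-step transition probabilities. The probabilities below are hypothetical and, for simplicity, assumed identical in every month (time-homogeneous), which the model in this thesis does not require; `path_probability` is a name of our own.

```python
def path_probability(path, initial, P):
    """Joint probability f(x_0, ..., x_n) = f(x_0) * prod_k f(x_k | x_{k-1}).

    initial: dict mapping a state to f(x_0);
    P: dict mapping (i, j) to the one-step probability f(x_k = j | x_{k-1} = i),
       assumed here to be the same in every month (time-homogeneous).
    """
    prob = initial.get(path[0], 0.0)
    for prev, cur in zip(path, path[1:]):
        prob *= P.get((prev, cur), 0.0)
    return prob

# Hypothetical monthly one-step probabilities (each row sums to one).
P = {
    (0, 0): 1.0,
    (1, 0): 0.15, (1, 1): 0.80, (1, 2): 0.05,
    (2, 0): 0.05, (2, 1): 0.10, (2, 2): 0.85,
}
initial = {1: 0.3, 2: 0.7}

# Disabled for two months, then partially disabled, then healthy:
print(path_probability([2, 2, 1, 0], initial, P))  # 0.7 * 0.85 * 0.10 * 0.15
```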

Transitions can take place between two different states of $E$. Formally, the set of transitions is

$$\{(i, j) \mid i \neq j;\ i, j \in E\}.$$

Definition 5. For $0 \le t < s$ and $i, j \in \{0, \ldots, m\}$, the transition probabilities are given by

$$P_{ij}(t, s) = P(X_s = x_j \mid X_t = x_i).$$

For $i \neq j$, the transition intensities are defined by

$$q_{ij}(t) = \lim_{s \downarrow t} \frac{P_{ij}(t, s)}{s - t}.$$

The transition intensity can therefore be interpreted as the instantaneous rate (probability per unit of time) of going from state $x_i$ to state $x_j$. Its advantage over the corresponding transition probability is that it depends on a single time variable instead of two.
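To make the link between intensities and probabilities concrete, here is a minimal sketch that assumes, unlike the thesis, a time-homogeneous continuous-time chain with a hypothetical constant intensity matrix Q (off-diagonal entries $q_{ij}$, rows summing to zero). For such a chain $P(t, t+h) = \exp(Qh)$, computed here with a truncated Taylor series, and $P_{ij}(t, t+h)/h$ approaches $q_{ij}$ as $h$ shrinks. The values in Q and the helper names are illustrative only.

```python
# Hypothetical monthly intensities (not estimates from the thesis data).
Q = [
    [0.00, 0.00, 0.00],    # transitions out of state 0 are not modelled here
    [0.30, -0.38, 0.08],   # q_10, diagonal, q_12
    [0.10, 0.15, -0.25],   # q_20, q_21, diagonal
]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def transition_matrix(q, h, terms=30):
    """P(t, t + h) = exp(Q h) for a time-homogeneous chain, via a truncated Taylor series."""
    n = len(q)
    qh = [[q[i][j] * h for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, qh)]  # (Q h)^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

for h in (1.0, 0.1, 0.001):
    P = transition_matrix(Q, h)
    # P_10(t, t + h) / h approaches q_10 = 0.30 as h -> 0
    print(h, P[1][0] / h)
```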

We end this section by formally defining our multi-state model as a Markov chain $X = (X_t)_{t \in T}$ with $T = \{0, 1, \ldots, 36\}$ (in months) and state space $(E, \mathcal{E}) = (\{0, 1, 2\}, 2^{E})$. The possible transitions are $(0, 1)$, $(0, 2)$, $(1, 0)$, $(1, 2)$, $(2, 0)$ and $(2, 1)$, of which we only study the last four.
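As an illustration of the discrete-time formulation only, the sketch below estimates one-step transition probabilities by counting observed monthly moves, $\hat p_{ij} = n_{ij} / n_i$, which is the maximum likelihood estimator for a time-homogeneous chain without covariates. This is not the model estimated in this thesis, which lets the transition rates depend on covariates and on the time in the system; the function name and the three example claims are hypothetical.

```python
from collections import Counter

def estimate_transition_probabilities(sequences):
    """Estimate p_ij = n_ij / n_i from observed monthly state paths.

    n_ij counts the months in which state i is followed by state j (including i == j);
    n_i counts all months that start in state i. Time homogeneity is assumed.
    """
    pair_counts, from_counts = Counter(), Counter()
    for seq in sequences:
        for i, j in zip(seq, seq[1:]):
            pair_counts[(i, j)] += 1
            from_counts[i] += 1
    return {(i, j): n / from_counts[i] for (i, j), n in pair_counts.items()}

# Three hypothetical claims, observed monthly until recovery (state 0).
claims = [
    [2, 2, 2, 1, 1, 0],
    [1, 1, 0],
    [2, 1, 2, 2, 0],
]
p = estimate_transition_probabilities(claims)
for pair in sorted(p):
    print(pair, round(p[pair], 3))
```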

3.3 Explanatory variables

In this section we discuss the independent variables that might affect the recovery process. First, we give an overview of the individual-specific covariates that are considered. Second, we discuss the variables related to the business cycle.


Regressor                     Description
Socio-economic status
  Gender                      Dummy variable for gender (male = 1)
  Age                         Age at the start of the disability spell (in years)
Classification of disorder
  Locomotive                  Dummy variable for locomotive disease
  Psychological               Dummy variable for psychological disease
  Digestive                   Dummy variable for digestive disease
Type of disorder
  1                           Dummy variable for symptoms
  2                           Dummy variable for cancer
  4                           Dummy variable for infections
  5                           Dummy variable for injury
Occupational class            Dummy variables for Agricultural sector, Construction, Shopkeeper,
                              (Para)Medical sector, Service and Other
Contract characteristics
  Compensation                Insured income
  Deferment²                  Deferment period
Other
  Dis. Year                   The year in which the disability started
  Previous                    Dummy variable for previous state

Table 3.2: Description of (possible) explanatory variables.

3.3.1 Business cycle

The term business cycle (or economic cycle) refers to economy-wide fluctuations in production or economic activity over several months or years. These fluctuations occur around a long-term growth trend, and typically involve shifts over time between periods of relatively rapid economic growth (an expansion or boom) and periods of relative stagnation or decline (a contraction or recession).

² The possible deferment periods are divided into four groups: A (≤ 14 days), B (1 month), C (2–6 months) and D (> 6 months).


Figure 3.5: The different phases of the business cycle.

To assist in the analysis of the state and the course of the Dutch economy, Statistics Netherlands (CBS) developed the Business Cycle Tracer (BCT). As its name indicates, the BCT traces the cyclical nature of economic developments. The state of the business cycle is determined using a selection of fifteen key macro-economic indicators. Portraying these fifteen indicators together results in a coherent picture of the state of the economy at a particular moment in time, as illustrated in Figure 3.6.

Figure 3.6: The Business Cycle Tracer in March 2012.

For each indicator, the deviation from its long-term trend is given on the y-axis and the period-on-period change on the x-axis. Four situations can be distinguished, corresponding to the four quadrants:

• Above trend and decreased (upper left-hand quadrant).

• Below trend and decreased (lower left-hand quadrant).

• Below trend and increased (lower right-hand quadrant).

• Above trend and increased (upper right-hand quadrant).


For each indicator this results in a point in one of the four quadrants. The distribution of the indicators across these quadrants, that is, their average position and movement, indicates the state and the course of the Dutch business cycle. In a period of high economic growth most of the indicators will be above trend, whereas in a period of economic decline they will be below trend.
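A minimal sketch of how a single indicator is placed in the tracer and how the average position of several indicators can be read off. The indicator values are hypothetical, and treating a value of exactly zero as "below trend" or "decreased" is our own convention, not necessarily that of CBS.

```python
from collections import Counter

def bct_quadrant(deviation_from_trend, period_change):
    """Quadrant of one indicator in the tracer; zero is treated as below trend / decreased."""
    above = deviation_from_trend > 0
    increased = period_change > 0
    if above and not increased:
        return "upper left (above trend, decreased)"
    if not above and not increased:
        return "lower left (below trend, decreased)"
    if not above and increased:
        return "lower right (below trend, increased)"
    return "upper right (above trend, increased)"

# Hypothetical (deviation from trend, period-on-period change) for four unnamed indicators.
indicators = {"A": (0.4, -0.2), "B": (-0.3, -0.1), "C": (-0.5, 0.2), "D": (-0.2, -0.3)}

distribution = Counter(bct_quadrant(dev, chg) for dev, chg in indicators.values())
mean_dev = sum(dev for dev, _ in indicators.values()) / len(indicators)
mean_chg = sum(chg for _, chg in indicators.values()) / len(indicators)
print(distribution)
print("average position:", bct_quadrant(mean_dev, mean_chg))
```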

The 15 indicators are divided into three macro-economic clusters: confidence, economy and labour market. To analyse the business cycle, it is also important to know how the 15 indicators relate to each other in time, which is called business cycle phasing. To this end, the indicators are divided into leading, coincident and lagging indicators with respect to upward and downward movements in the business cycle. The extent to which each indicator leads, coincides or lags is calculated by correlating it with the Business Cycle Tracer Indicator, which is the unweighted arithmetic mean of the fifteen indicators [38].
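One plausible way to quantify leading, coincident and lagging behaviour is sketched below: form the BCT Indicator as the unweighted mean of the indicators, and for each indicator find the shift k (in months) that maximizes its Pearson correlation with that mean. The exact procedure used by CBS may differ; the synthetic sine-wave series, the 12-month search window and the function names are assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bct_indicator(series):
    """Unweighted arithmetic mean of the indicators in each period."""
    return [sum(values) / len(values) for values in zip(*series)]

def best_lead(indicator, reference, max_lag=12):
    """Shift k maximising corr(indicator[t], reference[t + k]).

    k > 0: the indicator leads the reference; k < 0: it lags; k = 0: coincident.
    """
    n = len(reference)
    best_k, best_r = 0, -2.0
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            x, y = indicator[: n - k], reference[k:]
        else:
            x, y = indicator[-k:], reference[: n + k]
        r = pearson(x, y)
        if r > best_r:
            best_k, best_r = k, r
    return best_k, best_r

def cycle(t):
    """Synthetic business cycle with a period of 48 months (illustration only)."""
    return math.sin(2 * math.pi * t / 48)

n = 144
leading = [cycle(t + 6) for t in range(n)]   # runs 6 months ahead of the cycle
coincident = [cycle(t) for t in range(n)]
lagging = [cycle(t - 4) for t in range(n)]   # runs 4 months behind the cycle
bct = bct_indicator([leading, coincident, lagging])

for name, series in [("leading", leading), ("coincident", coincident), ("lagging", lagging)]:
    print(name, best_lead(series, bct))
```

With these synthetic series the leading indicator obtains a positive shift and the lagging one a negative shift. The shifts are measured relative to the BCT Indicator, which is itself an average of the series, not relative to the underlying cycle.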

Leading indicators are the first to show which way macro-economic activity is headed in the medium term, so it is important to look at these indicators first in the BCT. Normally they move into a subsequent phase on average six months earlier than the coincident indicators. There are 5 leading indicators in total, of which 4 are confidence indicators. As a sentiment factor, confidence in the economy adjusts to the business cycle more quickly than the physical economic and labour market indicators. The fifth leading indicator is temp hours: because of its temporary nature, work via temp agencies can also adapt to economic circumstances more quickly.

Coincident indicators correlate most closely in time with the upward and downward movements in macro-economic activity. There are 7 coincident indicators in the BCT: 6 economic indicators and 1 labour market indicator (bankruptcies). The coincident indicators are very important for the BCT, as they provide the actual information for a reliable, up-to-date picture of the Dutch business cycle.

Lastly, there is the group of lagging indicators. In the BCT these lagging indicators provide the second confirmation that the business cycle has moved to, or is in, a next phase. It is no coincidence that the 3 lagging indicators are all labour market indicators: labour volume, job vacancies and unemployment. As a result of the strict labour regulations in the Netherlands, it takes some time for the rigid labour market to adapt to changes in the economy. Compared with the other two groups, the movements of the lagging indicators are the calmest, and this is exactly why they are included in the BCT: once they begin to change, there is no doubt about the way the economy is headed.

The BCT indicator shows the state of the business cycle in one figure. The leading indicators will have shown this on average six months earlier; they should be seen as the quick business cycle indicators. But it is the coincident indicators that show the actual changes in the business cycle. The role of the lagging indicators is mainly to confirm the durability of the business cycle changes. This is important because the course of the cycle is not constant but variable. An overview of the phasing of the 15 indicators is shown in Figure 3.7.


Figure 3.7: Phasing of the BCT indicators

To address the influence of the business cycle on the recovery process, a proper measure should be used. However, as illustrated above, there is no universally agreed single measure of the economy. Data on interest rates, unemployment, GDP and inflation are frequently quoted in the media, but this is by no means a definitive list of economic variables. Other measures, such as the number of bankruptcies and retail sales, are also useful in explaining aspects of the economy. Each economic statistic captures some part of the economy, but no measure has been found that adequately describes the overall situation [30]. Furthermore, these measures differ in their degree of correlation with the producer environment. Based on these differences and on exploratory analyses, we chose to consider the 5 indicators described below³.

DNB business cycle indicator

The DNB business cycle indicator provides insight into the short-term economic outlook and aims to identify turning points in the Dutch business cycle up to seven months ahead. The indicator is drawn from consumer and producer surveys, financial indicators and export indicators [39].

Confidence

As a summary measure of the leading indicators we consider confidence, defined as the weighted average of the producer and consumer confidence indices:

• The producer confidence index (PCI), or business confidence, is based on a survey of 1700 manufacturing companies that gathers up-to-date information on economic developments for all activities of the manufacturing industry. The basis of the producer confidence consists of three components of the economic survey: how companies evaluate their order positions, the number of finished products in stock, and the anticipated economic activity in the next three months.

• The consumer confidence index (CCI) is an indicator designed to measure consumer confidence, defined as the degree of optimism about the state of the economy that consumers express through their saving and spending.

Since the values of the PCI range from −23.5 to 9.4, whereas the values of the CCI range from −40 to 18, we assign them the weights 2 and 1 respectively. Hence:

$$\text{Confidence} = \frac{2 \cdot \text{PCI} + \text{CCI}}{3}.$$
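The rationale for the weights can be checked with a few lines of arithmetic: doubling the PCI gives it a spread comparable to that of the CCI. The sketch below uses only the ranges quoted above; the function name `confidence` is our own.

```python
PCI_RANGE = (-23.5, 9.4)   # range of the PCI quoted in the text
CCI_RANGE = (-40.0, 18.0)  # range of the CCI quoted in the text

def confidence(pci, cci):
    """Confidence = (2 * PCI + CCI) / 3."""
    return (2 * pci + cci) / 3

# Doubling the PCI makes its spread comparable to that of the CCI.
pci_spread = 2 * (PCI_RANGE[1] - PCI_RANGE[0])
cci_spread = CCI_RANGE[1] - CCI_RANGE[0]
print(round(pci_spread, 1), cci_spread)   # 65.8 58.0
print(confidence(-23.5, -40.0))           # -29.0
```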

³ Information about these variables can be found at www.cbs.nl and www.dnb.nl.


Gross domestic product

As coincident indicator we use the gross domestic product (GDP), since it is the main indicator for the development of the economy [38]. The GDP is the market value of all officially recognized final goods and services produced in the Netherlands. Economic growth is measured in terms of the volume change in GDP. In formula form, the GDP can be described as

$$\text{GDP} = \text{private consumption} + \text{gross investment} + \text{government spending} + (\text{exports} - \text{imports}).$$

Labour

As a summary measure of the lagging indicators we use the arithmetic average of the labour volume and the number of vacancies.

Average income

As an unconventional measure of the business climate, we consider the average income of a self-employed person, corrected for inflation.

It is assumed that an increase in each of these measures corresponds to an improvement in the business climate, and vice versa. In his thesis [10], Pieter Bultena measured the influence of the unemployment rate on the different transitions. We have chosen to neglect this variable, since it is a measure of employees rather than of self-employed. Remko Amelink [7], on the other hand, used the GDP growth rate and the business confidence index as business cycle related variables. He concludes that a decrease in the GDP growth rate leads to higher recovery rates, hence shorter durations. For the coefficient of the business confidence index he allows for a structural change after March 2009, after which most self-employed started to experience the consequences of the financial and economic crisis. He concludes that until March 2009 an increase in the business confidence index led to a significant increase in recovery rates, whereas after this date a change in the index no longer has a significant effect.


Chapter 4

Survival Theory

In this and the following chapter we discuss econometric models that could be used for the four transitions introduced in Section 3.2. An econometric model is a set of joint probability distributions to which the true joint probability distribution of the variables under study is supposed to belong. If the elements of this set can be indexed by a finite number of real-valued parameters, the model is called parametric; otherwise it is nonparametric or semi-parametric.

In our case there are only two possible outcomes for each transition (jump or no jump), so the number of models that can be used is limited. In this chapter we discuss survival analysis; in the next one we continue with binary regression.

In survival analysis, interest centers on a group or groups of individuals for each of whom (or which) there is defined a point event, often called a failure, occurring after a length of time called the failure time. To determine this time precisely, there are three requirements:

1. A time origin must be unambiguously defined.

2. A scale for measuring the passage of time must be agreed.

3. The meaning of failure must be entirely clear.

A special source of difficulty in the analysis of survival data is the possibility that some individuals are not observed for the full time to failure. Ideally, both the 'birth' and 'death' dates of all subjects are known (for our purpose, the dates on which a claimant enters and leaves a specific state). In practice, however, this is often not the case. Sometimes it is only known that the failure time is after some date, which is called right censoring. Right censoring occurs for subjects whose date of birth is known, but who are still alive when they withdraw from the study or when the study ends. If a subject's lifetime is only known to be less than a certain duration, the lifetime is said to be left-censored.

It can also happen that subjects with a lifetime less than some threshold are not observed at all; this is called truncation. Note that truncation differs from left-censoring: for a left-censored datum we know that the subject exists, whereas for a truncated datum we may be completely unaware of the subject.

In the rest of this chapter we think of the failure time as a continuous random variable $T$, with distribution function $F(t) = P(T \le t) = P(T < t)$ and probability density function


$f(t) = dF/dt$. We consider a large population of people who enter some given state at a time we identify as $T = 0$. The calendar time of entry need not be the same for all people, and in most practical cases it will not be. Thus $T$ measures time on 'person-specific' clocks that are each set to zero at the moment that person enters the state under consideration. $T$ is then referred to as the duration of stay in the state. For now it is assumed that the population is homogeneous with respect to regressor variables that affect the distribution of $T$. This means that each person's duration of stay is a realization of a random variable with the same probability distribution.

4.1 Hazard function

The probability that a person who has occupied a state for a time $t$ leaves it in the short interval $dt$ after $t$ is $P(t \le T < t + dt \mid T \ge t)$. However, our interest is focused on the instantaneous rate of leaving per unit of time at $t$. Therefore we need the following definition:

Definition 6. The hazard function is defined as

$$\lambda(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}.$$

Loosely stated, the hazard function, also called the hazard rate, is the rate at which a certain event occurs at time $t$, given that it has not occurred before $t$. Note that the hazard rate is not a probability: it is a probability divided by the length $\Delta t$ of the interval, and it can therefore exceed 1. It is most useful to think of the hazard as a characteristic of individuals, not of populations or samples (unless everyone in the population is exactly the same). Each individual may have a hazard function that is completely different from anyone else's.

By using the definition of conditional probability, we can express the hazard function in terms of the distribution function and probability density function of the continuous random variable $T$:

$$\lambda(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t)}{P(T \ge t)} \frac{1}{\Delta t} = \lim_{\Delta t \to 0} \frac{F(t + \Delta t) - F(t)}{1 - F(t)} \frac{1}{\Delta t} = \frac{F'(t)}{1 - F(t)} = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{S(t)}, \tag{4.1}$$

where $S(t) := 1 - F(t) = P(T > t)$ is called the survival function, since it gives the probability of survival to time $t$.

Since $f(t) = dF(t)/dt = -dS(t)/dt$, we can view (4.1) as a differential equation in $t$ whose solution, subject to the initial condition $S(0) = 1$, is given by

$$S(t) = \exp\left(-\int_0^t \lambda(s)\, ds\right), \tag{4.2}$$

as can be verified by differentiation.


This shows how one can calculate the probability distribution of the duration of state occupancy given the hazard function. It follows that the density function of $T$ can be written as

$$f(t) = \lambda(t) \exp\left(-\int_0^t \lambda(s)\, ds\right) = \lambda(t)\, S(t). \tag{4.3}$$

So $\lambda(t)$, $S(t)$ and $f(t)$ are alternative ways to describe the distribution of the probability of exit over the positive real axis; if we know one, we can deduce the others.

Finally, we introduce the cumulative hazard, or integrated hazard, which is defined as

$$\Lambda(t) = -\ln S(t) = \int_0^t \lambda(s)\, ds.$$

We can think of $\Lambda(t)$ as the total risk one faces going from duration 0 to $t$.

Suppose we are interested in the expectation of life (or the expected duration in a specific state). Let $E(T) = \mu$. By definition,

$$\mu = \int_0^\infty t f(t)\, dt.$$

Integrating by parts, using $f(t) = -dS(t)/dt$ with $S(0) = 1$ and $S(\infty) = 0$, and assuming that $t\,S(t) \to 0$ as $t \to \infty$, one can show that

$$\mu = \big[-t\,S(t)\big]_0^\infty + \int_0^\infty S(t)\, dt = \int_0^\infty S(t)\, dt. \tag{4.4}$$
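A minimal numerical sketch of (4.2) and (4.4), assuming a Weibull hazard $\lambda a t^{a-1}$ with hypothetical parameters $\lambda = 0.01$ and $a = 2$ (time in months): it integrates the hazard with the trapezoid rule to obtain $S(t)$, then integrates $S(t)$ to approximate $\mu$, and compares the result with the known Weibull mean $\Gamma(1 + 1/a)\,\lambda^{-1/a}$. The function name `survival_and_mean` and the truncation point `t_max` are our own choices.

```python
import math

def survival_and_mean(hazard, t_max, steps=100_000):
    """Approximate S(t_max) via (4.2) and mu via (4.4) with the trapezoid rule.

    The integral in (4.4) is truncated at t_max, so t_max must be large enough
    that S(t_max) is negligible.
    """
    h = t_max / steps
    cum_hazard = 0.0                     # Lambda(t), built up step by step
    mean = 0.0                           # running integral of S(t)
    s_prev, lam_prev = 1.0, hazard(0.0)  # S(0) = 1
    for k in range(1, steps + 1):
        lam = hazard(k * h)
        cum_hazard += 0.5 * (lam_prev + lam) * h
        s = math.exp(-cum_hazard)        # (4.2)
        mean += 0.5 * (s_prev + s) * h   # (4.4)
        s_prev, lam_prev = s, lam
    return s_prev, mean

lam, a = 0.01, 2.0   # hypothetical Weibull parameters
s_end, mu = survival_and_mean(lambda t: lam * a * t ** (a - 1), t_max=60.0)
exact = math.gamma(1 + 1 / a) * lam ** (-1 / a)
print(f"S(60) = {s_end:.2e}, numerical mu = {mu:.4f}, closed form = {exact:.4f}")
```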

We have seen that the hazard function is useful to describe the probability distribution of the time of event occurrence. Every hazard function has a corresponding probability distribution. However, hazard functions can be extremely complicated, and the associated probability distributions may be rather complex. We will only examine some simple hazard functions and discuss their associated probability distributions. These hazard functions are the basis for some widely employed regression models.

Example 1. The simplest hazard function is constant over time: $\lambda(t) = \lambda$ or, equivalently, $\log \lambda(t) = \log \lambda = \mu$ (here $\mu$ denotes a constant, not the expected duration). Substituting this hazard into equation (4.2) and carrying out the integration implies that the survival function is $S(t) = e^{-\lambda t}$. Then, from equation (4.3), we get the density function $f(t) = \lambda e^{-\lambda t}$. This is the density function of the well-known exponential distribution with parameter $\lambda$. Thus, a constant hazard implies an exponential distribution for the time until an event occurs (or the time between events), which is consistent with the memoryless property of this distribution.

Example 2. The next step up in complexity is to let the natural logarithm of the hazard be a linear function of time: $\log \lambda(t) = \mu + a t$. Taking the logarithm is a convenient and popular way to ensure that $\lambda(t)$ is nonnegative, regardless of the values of $\mu$, $a$ and $t$. We can rewrite the equation as $\lambda(t) = e^{\mu} e^{a t} = \lambda e^{a t}$. Integration gives $\Lambda(t) = \frac{\lambda}{a}(e^{a t} - 1)$, so that $S(t) = \exp\big(-\frac{\lambda}{a}(e^{a t} - 1)\big)$ and $F(t) = 1 - \exp\big(-\frac{\lambda}{a}(e^{a t} - 1)\big)$. Hence this hazard function implies that the time of event occurrence has a Gompertz distribution.

Example 3. Another possibility is to assume $\log \lambda(t) = \log \lambda + \log a + (a - 1) \log t$ or, equivalently, $\lambda(t) = \lambda a t^{a-1}$. Then $\Lambda(t) = \lambda t^{a}$ and the survival function equals $S(t) = e^{-\lambda t^{a}}$. The cumulative distribution function becomes $F(t) = 1 - e^{-\lambda t^{a}}$, in which we recognize the Weibull distribution.

The Weibull model is the one most frequently used in economic applications.
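The closed-form survival functions of Examples 1-3 can be checked numerically by integrating each hazard and applying (4.2). This is a sketch with hypothetical parameter values ($\lambda = 0.05$, $a = 0.1$ for the Gompertz hazard and shape 1.5 for the Weibull hazard, evaluated at $t = 12$); the trapezoid rule stands in for exact integration, and the function name `cumulative_hazard` is our own.

```python
import math

def cumulative_hazard(hazard, t, steps=10_000):
    """Lambda(t) = integral of the hazard over [0, t], via the trapezoid rule."""
    h = t / steps
    inner = sum(hazard(k * h) for k in range(1, steps))
    return h * (0.5 * hazard(0.0) + inner + 0.5 * hazard(t))

lam, a, shape, t = 0.05, 0.1, 1.5, 12.0   # hypothetical parameter values

# name: (hazard function, closed-form survival function at t)
models = {
    "exponential": (lambda s: lam, math.exp(-lam * t)),
    "Gompertz": (lambda s: lam * math.exp(a * s), math.exp(-lam / a * (math.exp(a * t) - 1))),
    "Weibull": (lambda s: lam * shape * s ** (shape - 1), math.exp(-lam * t ** shape)),
}
for name, (hazard, s_closed) in models.items():
    s_numeric = math.exp(-cumulative_hazard(hazard, t))   # (4.2)
    print(f"{name:12s} closed form {s_closed:.6f}   numerical {s_numeric:.6f}")
```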


4.2 Kaplan-Meier estimator

In this section we introduce the nonparametric Kaplan-Meier estimator of the survival function, together with the related Nelson-Aalen estimator of the cumulative hazard (see also [16]). These estimators are mostly used for a preliminary analysis of the data, since they do not allow for covariates.

Suppose we consider a data set with $n$ subjects with ordered survival times $t_1 < \ldots < t_n < \infty$. We use the counting process notation $\{N_i(t), Y_i(t) \mid 0 < t < \infty\}$, where $N_i(t)$ takes the value one if subject $i$ has been observed to fail at or before time $t$ and zero otherwise, and $Y_i(t)$ takes the value one if subject $i$ is at risk at time $t$ and zero otherwise. We denote the aggregated processes by:

• $N(t) = \sum_i N_i(t) = \sum_i \mathbf{1}\{t_i \le t,\ \text{spell } i \text{ completed}\}$: the number of spells completed up to and including time $t$.

• $Y(t) = \sum_i Y_i(t)$: the number of persons at risk of making a transition immediately prior to $t$, made up of those who have a censored or completed spell of length $t$ or longer.

We start by estimating the integrated hazard. For a small interval of time,

$$\Lambda(s + h) - \Lambda(s) \approx \lambda(s)\, h \approx P(\text{event in } (s, s + h] \mid \text{at risk at } s).$$

It is reasonable to estimate this probability by

$$\frac{N(s + h) - N(s)}{Y(s)}.$$

Integrating this over the range $(0, t]$ yields the so-called Nelson-Aalen estimator

$$\hat\Lambda(t) = \int_0^t \frac{dN(s)}{Y(s)}.$$

Since we deal with discrete time intervals in the disability data set, it is more convenient to define the Nelson-Aalen estimator by the equivalent sum

$$\hat\Lambda(t) = \sum_{i:\, t_i \le t} \frac{\Delta N(t_i)}{Y(t_i)},$$

where $\Delta N(t_i)$ denotes the number of events occurring precisely at time $t_i$, the time until the $i$-th event. Since the integrated hazard has no direct interpretation, we transform it into a survival function. A natural estimator, proposed by Breslow (1972), is

$$\hat S_B(t) = \exp\big[-\hat\Lambda(t)\big]. \tag{4.5}$$

The Kaplan-Meier estimator, however, uses the increment of the Nelson-Aalen estimator at the $i$-th failure, $d\hat\Lambda(t_i) = dN(t_i)/Y(t_i)$. The proportion of those entering a state who survive to the first observed survival time $t_1$ is simply one minus the proportion who made a transition out of the state by that time: $\hat S_{KM}(t_1) = 1 - dN(t_1)/Y(t_1) = 1 - d\hat\Lambda(t_1)$. Similarly, the proportion surviving to the second observed survival time $t_2$ is $\hat S_{KM}(t_1)$ multiplied by one minus the proportion who made a transition out of the state between $t_1$ and $t_2$. More generally, the Kaplan-Meier estimator of the survival function is defined as

$$\hat S_{KM}(t) = \prod_{i:\, t_i \le t} \big(1 - d\hat\Lambda(t_i)\big). \tag{4.6}$$


This estimator differs slightly from $\hat{S}_B$, but since $e^{-x} \approx 1 - x$ for small x, the two estimators are close when the increments $d\hat{\Lambda}$ are small, that is, when many subjects are still at risk. The two estimates are in fact asymptotically equivalent, since as $n \to \infty$ the individual increments become arbitrarily small [34].
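The relation between the two estimators can be checked numerically. In the sketch below (pure Python, hypothetical function name `km_and_breslow`), both are computed from the same Nelson-Aalen increments. Because $e^{-x} \ge 1 - x$, the Breslow curve always lies on or above the Kaplan-Meier curve, and the gap shrinks when many subjects are at risk.

```python
import math
from collections import Counter

def km_and_breslow(times, events):
    """Return (t_i, S_KM(t_i), S_B(t_i)) at every distinct event time.

    times  : observed durations (event or censoring time)
    events : 1 for a transition, 0 for a right-censored spell
    """
    deaths = Counter(t for t, d in zip(times, events) if d == 1)
    s_km, cum_hazard, rows = 1.0, 0.0, []
    for t in sorted(deaths):
        at_risk = sum(1 for s in times if s >= t)       # Y(t_i)
        increment = deaths[t] / at_risk                  # d Lambda_hat(t_i)
        s_km *= 1.0 - increment                          # product-limit step, eq. (4.6)
        cum_hazard += increment                          # Nelson-Aalen sum
        rows.append((t, s_km, math.exp(-cum_hazard)))    # Breslow, eq. (4.5)
    return rows

# Small hypothetical sample: the two curves drift apart towards the end
for t, km, b in km_and_breslow([1, 2, 2, 3, 4, 4], [1, 1, 0, 1, 0, 1]):
    print(t, round(km, 4), round(b, 4))
```

With only six spells the increments are large, so the curves differ noticeably at the last event time; with many subjects at risk they nearly coincide.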

4.3 Censored data and the Likelihood function

One of the main advantages of survival theory is that it can handle censored data. The only type of censoring we will consider is right-censoring. For this type we observe spells from time 0 until a censoring time c. Some spells will have ended by this time, but others will be incomplete, and all we know is that they will end somewhere in the interval (c, ∞). In this section we will discuss how these censored observations can be incorporated into the likelihood function. For a brief introduction to maximum likelihood theory we refer to (D.1.1).

Suppose we have n individuals with transition lifetimes according to the survivor function S(t), with associated density f(t) and hazard λ(t). We further assume that person i is observed during $t_i$ time units. If he jumps at $t_i$, his contribution to the likelihood function is the density at that duration, which according to (4.3) can be written as

$$L_i = f(t_i) = S(t_i)\,\lambda(t_i).$$

However, if we are dealing with a censored observation, all we know is that the lifetime exceeds $t_i$. The probability of this event is

$$L_i = S(t_i),$$

which becomes the contribution of a censored observation to the likelihood.

We now introduce a transition indicator $d_i$, taking the value one if person i jumps and the value zero if the observation is censored. Then the likelihood function can be written as

$$L = \prod_{i=1}^{n} L_i = \prod_{i=1}^{n} S(t_i)\,\lambda(t_i)^{d_i}.$$

Taking the natural logarithm and recalling that $S(t) = e^{-\Lambda(t)}$, we obtain the log-likelihood function

$$l = \ln L = \sum_{i=1}^{n} \left(d_i \ln \lambda(t_i) - \Lambda(t_i)\right).$$
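For the exponential model of Example 1 the hazard is a constant λ, so $\Lambda(t) = \lambda t$ and the log-likelihood becomes $l(\lambda) = D \ln\lambda - \lambda \sum_i t_i$, with $D = \sum_i d_i$ the number of observed transitions. Setting the derivative to zero gives $\hat{\lambda} = D / \sum_i t_i$: censored spells add exposure time but no transition. The sketch below (pure Python, hypothetical names and data) evaluates the censored log-likelihood and checks that this closed form is indeed its maximum.

```python
import math

def exp_loglik(lam, times, events):
    """Censored log-likelihood l = sum_i (d_i ln lambda(t_i) - Lambda(t_i))
    for a constant hazard lambda(t) = lam, so that Lambda(t) = lam * t."""
    return sum(d * math.log(lam) - lam * t for t, d in zip(times, events))

def exp_mle(times, events):
    """dl/dlam = D / lam - sum(t_i) = 0  gives  lam_hat = D / sum(t_i)."""
    return sum(events) / sum(times)

times = [1, 2, 2, 3, 4, 4]        # hypothetical durations in months
events = [1, 1, 0, 1, 0, 1]       # 0 = right-censored
lam_hat = exp_mle(times, events)  # 4 transitions / 16 months = 0.25
for lam in (0.2, lam_hat, 0.3):
    print(lam, round(exp_loglik(lam, times, events), 4))
```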

4.4 Cox Proportional Hazards model

Up to this point we have been concerned with a homogeneous population, where the lifetimes of all individuals are governed by the same survival function S(t) and hazard λ(t). However, individuals have distinctive features, such as age, gender and social environment, which are likely to affect their lifetimes. To cope with this we will introduce a vector of covariates and consider the general problem of modeling the influence of these independent variables on the survival time. This can be done by using a parametric model, for instance the exponential (see Example 1) or Weibull (see Example 3) distribution. Such models are relatively easy to estimate in the presence of censoring, but they produce inconsistent parameter estimates if any part of the parametric model is misspecified. One way of resolving this is to choose parametric functional forms that are flexible and hence provide some protection against misspecification. Unfortunately, identification and estimation of such flexible functional forms can be rather complicated. However, there is a semi-parametric method that requires less than complete distributional specification: the Proportional Hazards model, developed by David Cox (1987, [13]). In fact, this method has proved so successful empirically that it has become the standard method for analyzing survival data [11].

In the proportional hazards (PH) model the hazard at time t for an individual with covariates $\mathbf{x}_i$ (not including a constant) is assumed to be

$$\lambda(t \mid \mathbf{x}_i) = \lambda_0(t)\,\phi(\mathbf{x}_i). \qquad (4.7)$$

Note that the model separates the effect of time from the effect of the covariates. The time-dependent function $\lambda_0(t)$ is the baseline hazard function that describes the risk for individuals with $\mathbf{x}_i = 0$, who serve as a reference cell. The function $\phi(\mathbf{x}_i)$ is the relative risk: a proportionate increase or reduction in risk associated with the set of characteristics $\mathbf{x}_i$. Note that the increase or reduction in risk is the same at all durations t. Usually $\phi(\mathbf{x}_i)$ is chosen to be equal to $e^{\beta^T\mathbf{x}_i}$, since this ensures $\phi(\mathbf{x}_i) > 0$. Furthermore, it makes the coefficients easy to interpret. Suppose the jth regressor $x_j$ increases by one unit, while the other regressors are unchanged. Then

$$\lambda(t \mid \mathbf{x}_{\text{new}}) = \lambda_0(t)\,e^{\beta^T\mathbf{x} + \beta_j} = e^{\beta_j}\,\lambda(t \mid \mathbf{x}).$$

Thus the new hazard is $e^{\beta_j}$ times the original hazard. Besides these properties, there are further reasons for considering this model:

1. There is a simple and easily understood interpretation to the idea that the effect of a variable (say, treatment) is to multiply the hazard by a constant factor.

2. In some fields there is empirical evidence to support the assumption of proportional hazardsin distinct treatment groups.

3. Within this formulation, censoring and the occurrence of several types of failure are relatively easily accommodated.

4. It is possible to incorporate time-varying covariates with relative ease.

Therefore the following formulation of the proportional hazards model will be used:

$$\lambda(t \mid \mathbf{x}_i) = \lambda_0(t)\,e^{\beta^T\mathbf{x}_i}. \qquad (4.8)$$

The baseline function $\lambda_0(t)$ is an unspecified function, which makes the Cox model semi-parametric. Further, we note that all hazard functions $\lambda(t \mid \mathbf{x})$ of this form are proportional to the baseline hazard, with scale factor $e^{\beta^T\mathbf{x}_i}$, which is not an explicit function of t.
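A quick numerical check of the proportionality property: whatever the baseline hazard, raising one regressor by one unit multiplies the hazard by $e^{\beta_j}$ at every duration. In the sketch below the Weibull-type baseline, the coefficients and the covariate values are all hypothetical.

```python
import math

def ph_hazard(t, x, beta, baseline):
    """lambda(t | x) = lambda_0(t) * exp(beta^T x), eq. (4.8)."""
    return baseline(t) * math.exp(sum(b * xj for b, xj in zip(beta, x)))

def baseline(t):
    # Hypothetical Weibull-type baseline hazard, for illustration only
    return 1.5 * t ** 0.5

beta = [0.4, -0.2]       # hypothetical coefficients
x = [1.0, 3.0]
x_new = [2.0, 3.0]       # first regressor raised by one unit, second unchanged

ratios = [ph_hazard(t, x_new, beta, baseline) / ph_hazard(t, x, beta, baseline)
          for t in (0.5, 1.0, 4.0, 10.0)]
print(ratios)            # every entry equals exp(0.4), whatever t is
```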

4.4.1 PH assumption

The proportional hazards assumption requires that the hazard ratio is constant over time or, equivalently, that the hazard for one individual is proportional to the hazard for any other individual, where the proportionality constant is independent of time.

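Let $\mathbf{x}^*$ and $\mathbf{x}$ denote the sets of predictors for two different individuals. Then we can write the hazard ratio as

$$HR = \frac{\lambda(t \mid \mathbf{x}^*)}{\lambda(t \mid \mathbf{x})} = e^{\sum_{i=1}^{k} \beta_i (x_i^* - x_i)}.$$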


So the proportional hazards assumption can be stated as:

$$e^{\sum_{i=1}^{k} \beta_i (x_i^* - x_i)} = \text{constant},$$

or equivalently

$$\sum_{i=1}^{k} \beta_i (x_i^* - x_i) = \text{constant}. \qquad (4.9)$$

David Cox observed that if the proportional hazards assumption holds (or is assumed to hold), it is possible to estimate the effect parameter(s) without any consideration of the baseline hazard function. We can integrate both sides of (4.8) from 0 to t to obtain the cumulative hazards

$$\Lambda(t \mid \mathbf{x}_i) = \Lambda_0(t)\,e^{\beta^T\mathbf{x}_i},$$

which are also proportional. By changing the signs and exponentiating we obtain the survivor functions

$$S(t \mid \mathbf{x}_i) = S_0(t)^{e^{\beta^T\mathbf{x}_i}}, \qquad (4.10)$$

where $S_0(t) = e^{-\Lambda_0(t)}$ is the baseline survivor function. Thus, the effect of the covariate value $\mathbf{x}_i$ on the survivor function is to raise it to a power given by the relative risk $e^{\beta^T\mathbf{x}_i}$.
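The power relation (4.10) can be verified numerically. The sketch below assumes a hypothetical Weibull-type baseline cumulative hazard $\Lambda_0(t) = (t/2)^{1.5}$ and compares $S_0(t)^{r}$ with $\exp(-\Lambda_0(t)\,r)$ for a hypothetical relative risk $r = e^{\beta^T\mathbf{x}}$; a relative risk above one pulls the whole survivor curve down.

```python
import math

def baseline_cum_hazard(t):
    # Hypothetical baseline: Lambda_0(t) = (t / 2) ** 1.5 (Weibull form)
    return (t / 2.0) ** 1.5

def survival_ph(t, risk_score):
    """S(t | x) = S_0(t) ** exp(beta^T x), eq. (4.10); risk_score = exp(beta^T x)."""
    s0 = math.exp(-baseline_cum_hazard(t))
    return s0 ** risk_score

r = math.exp(0.7)        # hypothetical relative risk
for t in (1.0, 2.0, 5.0):
    direct = math.exp(-baseline_cum_hazard(t) * r)   # exp(-Lambda(t | x))
    print(t, survival_ph(t, 1.0), survival_ph(t, r),
          abs(survival_ph(t, r) - direct) < 1e-12)
```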

The most important question now arising is how to check the proportional hazards assumption. As a general rule: if two hazards cross, the PH assumption is not met, so the Cox PH model is inappropriate. But what if the hazards do not cross? In that case there are two general approaches to verify the PH assumption: graphically or by means of a goodness-of-fit (GOF) test. The GOF approach is more appealing than the graphical one, since it provides a single test statistic for each variable and is not as subjective as the graphical approach. Nevertheless, a GOF test may be too "global", in that it may not detect specific departures from the PH assumption that may be observed graphically. We will now explain the two methods in detail.

1. Graphical approach

There are two types of graphical techniques available. The most popular of these involves comparing estimated log-log survivor curves over different categories of variables. A log-log survival curve is a transformation of an estimated survival curve that results from taking the natural logarithm of minus the natural logarithm of an estimated survival probability. Taking logarithms twice in (4.10) gives

$$\ln\left[-\ln S(t \mid \mathbf{x})\right] = \beta^T\mathbf{x} + \ln\left[-\ln S_0(t)\right] = \sum_{j=1}^{k} \beta_j x_j + \ln\left[-\ln S_0(t)\right].$$

Now suppose we consider two different individuals and let $\mathbf{x}^*$ and $\mathbf{x}$ denote the sets of predictors for these two. Subtracting the second log-log function from the first yields

$$\ln\left[-\ln S(t \mid \mathbf{x}^*)\right] - \ln\left[-\ln S(t \mid \mathbf{x})\right] = \sum_{j=1}^{k} \beta_j (x_j^* - x_j).$$

According to (4.9) this should be constant if the PH assumption is satisfied. Hence we can conclude that if the log-log survival curves of two individuals are parallel over time, they satisfy the proportional hazards assumption.
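To see why parallel log-log curves indicate proportional hazards, the sketch below evaluates $\ln[-\ln S(t \mid x)]$ for a binary covariate under a hypothetical baseline $S_0(t) = \exp(-(t/2)^{1.5})$ and a hypothetical coefficient β = 0.7. It uses exact survivor functions for clarity; in practice the curves are computed from Kaplan-Meier estimates per category and compared by eye, so the vertical gap is only approximately constant.

```python
import math

def loglog(s):
    """Log-log transform ln(-ln S) used for the graphical PH check."""
    return math.log(-math.log(s))

def baseline_survival(t):
    # Hypothetical baseline survivor function S_0(t) = exp(-(t / 2) ** 1.5)
    return math.exp(-(t / 2.0) ** 1.5)

beta = 0.7                           # hypothetical coefficient of a 0/1 covariate
times = [0.5, 1.0, 2.0, 4.0, 8.0]
gaps = []
for t in times:
    s_ref = baseline_survival(t)                     # x = 0
    s_trt = baseline_survival(t) ** math.exp(beta)   # x = 1, by (4.10)
    gaps.append(loglog(s_trt) - loglog(s_ref))
print(gaps)   # each gap is (up to rounding) beta: the two curves are parallel
```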


An alternative graphical approach is to compare the observed with the predicted survivor curves. The observed curves are derived for categories of the variable being checked, without putting this variable in the PH model. The predicted curves are derived with this variable included in the model. If both curves are close, then the PH assumption is satisfied. This method is the graphical analog of the goodness-of-fit testing approach we will describe later and is therefore a reasonable alternative to the first graphical approach. In particular, the GOF test uses the same observed and expected survival probability estimates that are used to obtain the observed and expected plots.

The first step is to stratify the data by categories of the predictor that is being verified. Then the 'observed' plots are drawn by deriving the Kaplan-Meier curves for each category separately. To obtain 'expected' plots, we fit a Cox PH model containing the predictor and substitute the value of each category into the formula for the estimated survival curve, which yields a separate estimated survival curve for each category. Finally, both sets of plots are drawn on the same graph and compared. When the observed and expected plots are "close" to one another, it can be concluded that the PH assumption is satisfied. If, however, one or more categories show quite discrepant plots, we conclude that the PH assumption is violated.

An obvious drawback of both graphical approaches is deciding how parallel or close the two graphs should be. This is a subjective decision. Most often a conservative strategy is used, meaning that the PH assumption is assumed to be satisfied unless there is strong evidence that the two curves are nonparallel or strongly discrepant.

2. Goodness-of-fit tests

In contrast to the graphical approach, the goodness-of-fit testing method provides a test statistic, or equivalent p-value, for assessing the PH assumption for a given predictor. Hence it is possible to make a more clear-cut and evidence-based decision. A number of different tests for assessing the PH assumption have been proposed in the literature. The test proposed by Schoenfeld [29] is the most common in practice. Instead of a single residual for each individual, it uses a separate residual for each individual and each covariate. The Schoenfeld residuals are, however, not defined for censored individuals.

The Schoenfeld residual vector is calculated on a per event time basis as:

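$$S_i(t) = \mathbf{x}_i - \bar{\mathbf{x}}_i,$$

where $\bar{\mathbf{x}}_i$ is a weighted average of the covariates over the risk set at time t, given by

$$\bar{\mathbf{x}}_i = \frac{\sum_{j=1}^{n} \mathbf{x}_j(t)\,Y_j(t)\exp(\beta^T\mathbf{x}_j)}{\sum_{j=1}^{n} Y_j(t)\exp(\beta^T\mathbf{x}_j)}.$$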

The implementation of the test can be performed in three steps.

• Fit a Cox PH model and obtain the Schoenfeld residuals for each covariate and individual.

• Create a variable that ranks the order of failures. The subject who has the first event gets a value of 1, the next gets a value of 2, and so on.


• Test the correlation between the variables created in steps 1 and 2. The null hypothesis is that there is no correlation between the Schoenfeld residuals and the ranked failure times.

The idea behind this statistical test is that if the PH assumption holds for a particular variable, then the Schoenfeld residuals for that covariate will not be related to survival time. A non-significant (i.e., large) p-value suggests that the PH assumption is reasonable, whereas a small p-value, say less than 0.05, suggests that the independent variable being tested does not satisfy this assumption.
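The three steps can be sketched in pure Python for a single time-fixed covariate. This is a simplified illustration under stated assumptions: the coefficient β is taken as given (in practice it is the Cox estimate from step 1), there are no tied event times, and only the plain Pearson correlation with the failure ranks is computed. Turning it into a p-value, and the scaled residuals used by standard software, are left out. All names and data are hypothetical.

```python
import math

def schoenfeld_residuals(times, events, x, beta):
    """Schoenfeld residuals x_i - xbar(t_i) for one time-fixed covariate,
    in order of event time. Ties are not handled in this sketch."""
    residuals = []
    for i in sorted(range(len(times)), key=lambda k: times[k]):
        if events[i] != 1:
            continue                      # not defined for censored subjects
        risk_set = [j for j in range(len(times)) if times[j] >= times[i]]
        weights = [math.exp(beta * x[j]) for j in risk_set]
        xbar = sum(w * x[j] for w, j in zip(weights, risk_set)) / sum(weights)
        residuals.append(x[i] - xbar)
    return residuals

def pearson(a, b):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

times  = [2, 3, 5, 7, 8, 10, 12, 15]     # hypothetical durations
events = [1, 1, 0, 1, 1, 0, 1, 1]        # 0 = right-censored
x      = [1, 0, 1, 1, 0, 0, 1, 0]        # hypothetical binary covariate
res = schoenfeld_residuals(times, events, x, beta=0.3)   # step 1 (beta assumed)
ranks = list(range(1, len(res) + 1))                     # step 2
rho = pearson(res, ranks)                                # step 3
print(len(res), rho)
```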

4.4.2 Partial Likelihood

Cox proposed a method to estimate β in the PH model without simultaneously estimating the baseline hazard function $\lambda_0(t)$. If desired, an estimate of the baseline hazard can be recovered after estimation of β. The likelihood function for the proportional hazards model can be factored into two parts: one that depends on both $\lambda_0(t)$ and β, and one that depends on β alone. Partial likelihood discards the first part and treats the second part, the partial likelihood function, as if it were an ordinary likelihood function. Because there is some information about β in the discarded portion of the likelihood function, the resulting estimates are not fully efficient: their standard errors are larger than they would be if we had used the entire likelihood function to obtain the estimates. In most cases, however, the loss of efficiency is quite small, and in return we gain robustness, because the estimates have good properties regardless of the actual shape of the baseline hazard function. Partial likelihood estimates still have two of the three standard properties of ML estimates: they are consistent and asymptotically normal [11].

In this section we will discuss the basics of partial likelihood. We will assume that the failure times of our observations are ordered and that we have a partition of observations into two groups: those who make a transition to another state, and those who are at risk. Let $t_1 < \dots < t_i < \dots < t_k$ denote the ordered discrete failure times of the spells in the sample of size n ($n \ge k$). We define:

• The risk set $R(t_i) = \{j : t_j \ge t_i\}$: the set of individuals who are at risk of failing just before the ith ordered failure time. It includes all spells that are not yet completed or censored.

• The death set $D_i = D(t_i) = \{j : t_j = t_i\}$: the set of subjects that die at time $t_i$.

• The risk score $r_i(\beta) = e^{\beta^T\mathbf{x}_i} = r_i$ for subject i.

Furthermore, we recall the indicator function

$$Y_j(t_i) = \begin{cases} 1 & \text{if } j \in R(t_i), \\ 0 & \text{otherwise.} \end{cases}$$

We now consider the probability that a particular spell ends at time $t_i$. First we compute the probability that spell i is the one that ends, given that a spell in the risk set $R(t_i)$ ends at $t_i$:


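$$\begin{aligned}
P(T_i = t_i \mid i \in R(t_i)) &= \frac{P(T_i = t_i \mid T_i \ge t_i)}{\sum_{j \in R(t_i)} P(T_j = t_i \mid T_j \ge t_i)} \\
&= \frac{\lambda(t_i \mid \mathbf{x}_i)}{\sum_{j \in R(t_i)} \lambda(t_i \mid \mathbf{x}_j)} \\
&= \frac{e^{\beta^T\mathbf{x}_i}}{\sum_{j \in R(t_i)} e^{\beta^T\mathbf{x}_j}} \qquad (4.11) \\
&= \frac{r_i}{\sum_j Y_j(t_i)\,r_j}, \qquad (4.12)
\end{aligned}$$

where in (4.11) the baseline hazard $\lambda_0(t)$ has dropped out, as a consequence of the PH assumption.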

The partial likelihood function is now defined to be the product of $P(T_i = t_i \mid i \in R(t_i))$ over the k ordered failure times:

$$PL(\beta) = \prod_{i=1}^{k} \frac{r_i}{\sum_j Y_j(t_i)\,r_j}. \qquad (4.13)$$

The partial log-likelihood is then given by

$$\ln PL(\beta) = \sum_{i=1}^{k} \left[\ln r_i - \ln \sum_j Y_j(t_i)\,r_j\right] = \sum_{i=1}^{k} \left[\beta^T\mathbf{x}_i - \ln \sum_j Y_j(t_i)\,r_j\right]. \qquad (4.14)$$
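The partial log-likelihood (4.14) can be maximized numerically without ever specifying $\lambda_0(t)$. The sketch below assumes a single time-fixed covariate and no tied event times, and uses Newton-Raphson with the analytic score $\sum_i [x_i - \bar{x}(t_i)]$ and information $\sum_i \mathrm{Var}_{R(t_i)}(x)$ (variance under the risk-score weights); the data and function names are hypothetical. The score is exactly the sum of the Schoenfeld residuals, which therefore sum to zero at $\hat{\beta}$.

```python
import math

def partial_loglik(beta, times, events, x):
    """ln PL(beta) = sum over events [beta*x_i - ln sum_{j in R(t_i)} exp(beta*x_j)],
    eq. (4.14), for one covariate and no tied event times."""
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 1:
            denom = sum(math.exp(beta * x[j]) for j, t_j in enumerate(times) if t_j >= t_i)
            total += beta * x[i] - math.log(denom)
    return total

def score_and_info(beta, times, events, x):
    """First derivative (score) and minus the second derivative (information)."""
    score, info = 0.0, 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 1:
            risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            score += x[i] - s1 / s0                 # Schoenfeld residual at t_i
            info += s2 / s0 - (s1 / s0) ** 2        # weighted variance of x over R(t_i)
    return score, info

def fit_cox(times, events, x, beta=0.0, tol=1e-10, max_iter=50):
    """Maximize ln PL(beta) by Newton-Raphson."""
    for _ in range(max_iter):
        score, info = score_and_info(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

times  = [2, 3, 5, 7, 8, 10, 12, 15]    # hypothetical durations
events = [1, 1, 0, 1, 1, 0, 1, 1]       # 0 = right-censored
x      = [1, 0, 1, 1, 0, 0, 1, 0]       # hypothetical binary covariate
beta_hat = fit_cox(times, events, x)
print(beta_hat, math.exp(beta_hat))     # coefficient and hazard ratio
```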

Equations (4.13) and (4.14) are derived by assuming continuous survival times and do not allow for tied events. This assumption is doubtful here, because survival times, in our case recovery times, are measured in discrete time units, so there will be ties in the survival times. One option to deal with discrete time data is to use a discrete survival model, for example the logistic model, which will be discussed in Chapter 5. Another option is to adjust equation (4.12). We will discuss three ways in which this can be done. For each method we give the contribution to the partial likelihood function, $L_i(\beta)$. The partial likelihood is then given by $PL(\beta) = \prod_{i=1}^{k} L_i(\beta)$.

The exact method assumes that the survival time has a continuous distribution and that the tied survival times are in fact different. The exact likelihood contribution is given by

$$L_i(\beta) = \frac{\prod_{k \in D_i} r_k}{\sum_{q \in Q_i} r_q}, \qquad (4.15)$$

where $Q_i$ is the set of all $d_i$-tuples that could be selected from $R(t_i)$, with $d_i = |D_i|$, and $r_q$ is the product of $r_j$ over all members j of the $d_i$-tuple q. We notice that if $d_i$ individuals have tied failure times, there are $d_i!$ possible orderings of these individuals to take into account. As one can imagine, this becomes quite complicated with many tied values. Under the continuous-time interpretation, the exact contribution can be expressed as an integral (using integration by parts, see [14]):


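$$L_i(\beta) = \int_0^\infty \prod_{k \in D_i} \left[1 - \exp\left(-\frac{r_k\, t}{\sum_{j \in R(t_i) \setminus D_i} r_j}\right)\right] e^{-t}\, dt.$$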
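The integral form can be checked against the direct definition of the continuous-time exact contribution: the probability that the tied subjects fail first in some order, summed over all $d_i!$ orderings. The sketch below is a numerical illustration with hypothetical risk scores; it evaluates the integral with a simple midpoint rule truncated at an upper limit, an approximation chosen for transparency rather than efficiency.

```python
import math
from itertools import permutations

def exact_by_orderings(tied, others):
    """Sum over all orderings of the tied subjects of the sequential Cox probabilities.

    tied   : risk scores r_k of the subjects failing at t_i
    others : risk scores of the remaining members of R(t_i) (assumed non-empty)
    """
    total = 0.0
    for order in permutations(tied):
        remaining, prob = sum(tied) + sum(others), 1.0
        for r in order:
            prob *= r / remaining
            remaining -= r           # the failed subject leaves the risk set
        total += prob
    return total

def exact_by_integral(tied, others, upper=60.0, n=200_000):
    """int_0^inf prod_k [1 - exp(-r_k t / S)] e^{-t} dt, with S the total risk
    score of R(t_i) without the tied subjects; midpoint rule on [0, upper]."""
    s = sum(others)
    h = upper / n
    total = 0.0
    for m in range(n):
        t = (m + 0.5) * h
        integrand = math.exp(-t)
        for r in tied:
            integrand *= 1.0 - math.exp(-r * t / s)
        total += integrand * h
    return total

tied, others = [1.0, 2.0], [0.5, 1.5]     # hypothetical risk scores
print(exact_by_orderings(tied, others))   # 1/5 * 2/4 + 2/5 * 1/3 = 7/30
print(exact_by_integral(tied, others))    # agrees up to discretization error
```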

Due to the computational difficulty of this method, two other methods have been proposed to approximate the exact partial likelihood. A standard one, due to Breslow (1974), is the Breslow approximation:

$$L_i(\beta) = \frac{\prod_{k \in D_i} r_k}{\left(\sum_j Y_j(t_i)\,r_j\right)^{|D_i|}}. \qquad (4.16)$$

This approximation works well if the number of failures at time $t_i$ is small relative to the number at risk. If this is not the case, however, it lacks accuracy. Therefore Efron (1977) proposed an approximation of the contribution to the partial likelihood in which the tied subjects are removed from the risk-set sum gradually: in the lth factor of the denominator, a fraction $(l-1)/|D_i|$ of their total risk score is subtracted¹:

$$L_i(\beta) = \frac{\prod_{k \in D_i} r_k}{\prod_{l=1}^{|D_i|} \left(\sum_j Y_j(t_i)\,r_j - \frac{l-1}{|D_i|} \sum_{j \in D_i} r_j\right)}. \qquad (4.17)$$

One can check that in the absence of ties these approximations give the same partial likelihoodas (4.13).

We will end this section with an example to illustrate the three methods.

Example 4. Suppose we consider four subjects for which we measure the time until an event, where the first two subjects have an event at exactly the same time and subjects 3 and 4 are still at risk at that time. Without any knowledge of the true ordering of the survival times of subjects 1 and 2, we have to consider all possible orderings, of which there are 2! = 2. If subject 1 fails before subject 2, the contribution to the partial likelihood is given by

$$\left(\frac{r_1}{r_1 + r_2 + r_3 + r_4}\right)\left(\frac{r_2}{r_2 + r_3 + r_4}\right).$$

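A similar term arises if subject 2 fails before subject 1. The exact likelihood contribution is the sum of these two possibilities (dividing by the number of orderings, 2, turns it into their average; such a constant factor does not affect the estimate of β):

$$L_i(\beta) = \left(\frac{r_1}{r_1 + r_2 + r_3 + r_4}\right)\left(\frac{r_2}{r_2 + r_3 + r_4}\right) + \left(\frac{r_2}{r_1 + r_2 + r_3 + r_4}\right)\left(\frac{r_1}{r_1 + r_3 + r_4}\right) = \frac{r_1 r_2}{r_1 + r_2 + r_3 + r_4}\left(\frac{1}{r_2 + r_3 + r_4} + \frac{1}{r_1 + r_3 + r_4}\right).$$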

Using (4.16), we see that the Breslow contribution is equal to

$$L_i(\beta) = \frac{r_1 r_2}{(r_1 + r_2 + r_3 + r_4)^2}.$$

Finally, the Efron approximation is given by

$$L_i(\beta) = \frac{r_1 r_2}{(r_1 + r_2 + r_3 + r_4)\left(\tfrac{1}{2} r_1 + \tfrac{1}{2} r_2 + r_3 + r_4\right)}.$$

¹Since there are many ties in our data set, we will use the Efron approximation in our estimations.
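Example 4 can be made concrete with hypothetical risk scores $r_1 = 2$ and $r_2 = r_3 = r_4 = 1$. The sketch below computes the exact contribution (sum over orderings), its average over the 2! = 2 orderings, and the Breslow (4.16) and Efron (4.17) contributions; the function names are hypothetical. Since both approximations target the probability of a single ordering, they are most naturally compared with the average, and Efron's value lies closer to it than Breslow's.

```python
from itertools import permutations

def exact_contribution(tied, others):
    """Sum over all orderings of the tied subjects of the sequential Cox probabilities."""
    total = 0.0
    for order in permutations(tied):
        remaining, prob = sum(tied) + sum(others), 1.0
        for r in order:
            prob *= r / remaining
            remaining -= r                  # failed subject leaves the risk set
        total += prob
    return total

def breslow_contribution(tied, others):
    """Eq. (4.16): product of tied scores over (full risk-set sum) ** |D_i|."""
    prod = 1.0
    for r in tied:
        prod *= r
    return prod / (sum(tied) + sum(others)) ** len(tied)

def efron_contribution(tied, others):
    """Eq. (4.17): the l-th factor removes a fraction (l - 1)/|D_i| of the tied scores."""
    d, prod, denom = len(tied), 1.0, 1.0
    for r in tied:
        prod *= r
    for l in range(1, d + 1):
        denom *= sum(tied) + sum(others) - (l - 1) / d * sum(tied)
    return prod / denom

tied, others = [2.0, 1.0], [1.0, 1.0]      # hypothetical r1, r2 and r3, r4
exact = exact_contribution(tied, others)   # 7/30
print(exact, exact / 2)                    # sum and average over 2! orderings
print(breslow_contribution(tied, others))  # 2/25 = 0.08
print(efron_contribution(tied, others))    # 4/35, about 0.1143
```

With a single failure (no ties) both approximations reduce to the ordinary term of (4.13), for example 2/4 = 0.5 for `efron_contribution([2.0], [1.0, 1.0])`.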


4.5 Time-varying covariates

All preceding results have been restricted to models where the regressors are variables that vary across individuals but, for a given individual, not over time. In our case, however, we will consider variables related to the business cycle, which do vary over time. It would be incorrect to treat these variables as if they were fixed, since the entire history of the covariate over the spell may be relevant. There are two different types of time-dependent variables:

• An external time-dependent covariate is one whose path is generated without any influence of the individual.

• An internal time-dependent covariate is one where the change of the covariate over time is related to the behavior of the individual.

Suppose we have a vector of time-dependent covariates which, for the ith individual in our sample, is denoted by $\mathbf{x}_i(t) = (x_{i1}(t), \dots, x_{iq}(t))^T$, corresponding to the value of these covariates at time t. Note that this notation allows us to use time-independent variables as well. For example, if the jth variable is time-independent, we have that $x_{ij}(t) = x_{ij}$ for all t.

We now let $\mathbf{x}_i^H(t)$ denote the history of the vector of time-dependent covariates up to time t:

$$\mathbf{x}_i^H(t) = \{\mathbf{x}_i(s) : 0 \le s \le t\}.$$

We can now define the hazard rate at time t conditional on this history by

$$\lambda(t \mid \mathbf{x}_i^H(t)) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t, \mathbf{x}_i^H(t))}{\Delta t}.$$

This is the instantaneous rate of failure at time t, given that the individual was at risk at time t and has history $\mathbf{x}_i^H(t)$. For such a conditional hazard rate, we can consider a proportional hazards model

$$\lambda(t \mid \mathbf{x}_i^H(t)) = \lambda_0(t)\,e^{\beta^T g(\mathbf{x}_i^H(t))},$$

where $g(\mathbf{x}_i^H(t))$ is a vector of functions of the history.

Some care is needed when interpreting hazard rates with time-dependent variables, since the hazard function cannot always be used to construct survival distributions. For example, if x is an external time-dependent covariate, then the conditional survival distribution

$$S(t \mid x) = P(T \ge t \mid x) = \exp\left(-\int_0^t \lambda(s \mid x)\, ds\right)$$

is well-defined and meaningful. However, the distribution

$$S(t \mid x^H(t)) = P(T \ge t \mid x^H(t))$$

may not make any sense, since $x^H(t)$ can only be measured while the individual is still alive at time t.

The estimation of the regression parameters in the model, as well as of the underlying cumulative hazard functions, does not create additional difficulties: we can use the theory developed so far for time-independent covariates with only slight modifications. Recall the definition of the partial likelihood (4.13). What matters at each failure time $t_j$ is the value of the regressors $\mathbf{x}_i^H(t_j)$ for those observations in the risk set $R(t_j)$. Thus, for the ith subject, the time-independent vector $\mathbf{x}_i$ is replaced by the function of time-dependent covariates $g(\mathbf{x}_i^H(t_j))$, and the partial likelihood changes accordingly.


4.5.1 Episode splitting

One important way of dealing with time-dependent covariates in practice is episode splitting. Here, every time a covariate changes its value during an episode, the episode is split at that point in time, resulting in two new episodes. In our case, the variables related to the business cycle are measured on a monthly basis, so the recovery process is split up several times accordingly. Suppose individual i is disabled for k months. For each month in which he receives benefit payments, the corresponding values of the business cycle variables are known. By means of episode splitting this claim is split into k different observations, each covering a period of 1 month, together with the values of the variables for that specific calendar month. Each of these episodes thus looks like a complete case, in the sense that it carries all the necessary variables. However, there is one important difference: if an episode has been split, all sub-episodes except the last one have to be considered as censored, since the event (if any) occurs only in the last sub-episode.

Note that this procedure, even though it looks as if we have increased the number of cases in the data, does not invalidate the statistical inference. This is because the real 'units' of event history analysis are not individuals but individuals × time, and the time span covered by the claims does not change at all by episode splitting.

Example 5. Suppose we consider a claim which started in state 1 at t = 0, corresponding to the calendar date 04-2005. The health status of the claimant remains unchanged for two months; in the third month he recovers and the claim ends. Further, suppose there are two variables: the time-independent covariate 'age' and the time-dependent covariate 'producer confidence'. For this particular spell the data can be written as a three-line record, rather than a one-line record, as follows:

ID   Time   Age   Prod. Conf.   Transition   State   Previous   Date
1    1      42    −0,6          0            1       NA         04-2005
1    2      42    −0,1          0            1       NA         05-2005
1    3      42    1,4           1            0       1          06-2005

Table 4.1: Episode split observation.
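Episode splitting is easy to automate. The sketch below turns one claim into monthly sub-episodes and reproduces Table 4.1. It is a hypothetical helper, not the procedure used for the thesis data; the meaning given to the columns State and Previous (state at the end of the month, and state before an observed transition) is an assumption inferred from Table 4.1.

```python
def split_episode(claim_id, age, start_month, start_year, monthly_covariate,
                  final_transition, final_state, initial_state=1):
    """Split one claim into monthly sub-episodes (episode splitting).

    monthly_covariate : one value of the time-dependent covariate per month
                        the claim is open (e.g. producer confidence)
    final_transition  : 1 if the claim ends with a transition, 0 if censored
    Every sub-episode except the last is treated as censored (Transition = 0).
    """
    rows = []
    k = len(monthly_covariate)
    month, year = start_month, start_year
    for m, value in enumerate(monthly_covariate, start=1):
        last = (m == k)
        rows.append({
            "ID": claim_id,
            "Time": m,
            "Age": age,                                   # time-independent
            "ProdConf": value,                            # time-dependent
            "Transition": final_transition if last else 0,
            "State": final_state if last else initial_state,
            "Previous": initial_state if (last and final_transition) else None,
            "Date": f"{month:02d}-{year}",
        })
        month += 1
        if month > 12:                                    # roll over to January
            month, year = 1, year + 1
    return rows

# Example 5 / Table 4.1: claim starting 04-2005, recovery in the third month
rows = split_episode(1, 42, 4, 2005, [-0.6, -0.1, 1.4],
                     final_transition=1, final_state=0)
for row in rows:
    print(row)
```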

4.6 Competing risks

Until now we have only considered transitions from one state to another. However, if a client is in state 1, he can jump either to state 0 or to state 2. Therefore the possibility of exit to one of several destination states has to be considered. For this purpose we will use a so-called competing risks model (CRM) with the latent approach. This model is applicable to modeling the time spent in one state when exit is to one of a number of competing states. It is attractive because it is relatively straightforward to implement in a PH model. In the latent approach, a survival analysis is performed separately for each event type, where the other competing events are treated as right-censored observations. Separate hazards are thus estimated for each failure type.

We will only treat the case with two competing risks, since an insured can jump to two different states, depending on the state he is in. However, it is relatively easy to generalize this model to the situation with m competing risks. The setup of the model is as follows. Each claim has an underlying failure time, which is subject to censoring. The failure may be of one of two different types, given by the set {1, 2}. We may think of this as a situation with two distinct causes of transition from a given state. However, the occurrence of a failure of one kind removes the individual from the risk of other kinds of events. Therefore, given censoring of the remaining duration, we observe at most one complete duration for each individual.

In a CRM with 2 types of failure, there are 3 states {0, 1, 2}, where 0 represents the initial state and 1, 2 are the possible destination states. The model provides the joint distribution of the spell duration T and the exit route R, which is an indicator random variable that takes one of the values in the set {1, 2}. We define:

• $\lambda_1(t)$ is the latent hazard rate of exit to destination 1, with survival times characterized by the density function $f_1(t)$ and latent failure time given by the random variable $T_1$:

$$\lambda_1(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t,\, R = 1 \mid T \ge t)}{\Delta t}.$$

• $\lambda_2(t)$ is the latent hazard rate of exit to destination 2, with survival times characterized by the density function $f_2(t)$ and latent failure time given by the random variable $T_2$:

$$\lambda_2(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t,\, R = 2 \mid T \ge t)}{\Delta t}.$$

• $\lambda(t)$: the hazard rate for exit to one of the two destinations.

Each destination-specific hazard rate can be thought of as the hazard rate that would apply if transition to the other state were not possible. If this were the case, we would be able to link the observed hazards directly with the destination-specific hazards. However, since there are competing risks, the hazard rates are latent rather than observed in this way. What we observe in the data is either no event at all (a censored case, with spell length $T_{ci}$) or an exit to state 1 or 2. The observed failure time is $T_i = \min\{T_{1i}, T_{2i}, T_{ci}\}$ and the corresponding exit route is given by $r = \arg\min\{T_{1i}, T_{2i}, T_{ci}\}$. We now define destination-specific censoring indicators:

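$$\delta_i^1 = \begin{cases} 1 & \text{if } i \text{ exits to } 1, \\ 0 & \text{otherwise,} \end{cases} \qquad (4.18)$$

$$\delta_i^2 = \begin{cases} 1 & \text{if } i \text{ exits to } 2, \\ 0 & \text{otherwise.} \end{cases} \qquad (4.19)$$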

Now, for each individual i we have a vector of the form $(\mathbf{x}_i, T_i, \delta_i^1, \delta_i^2)$. Sometimes it can be useful to include a censoring indicator $\delta_i^c$, which equals 1 if the spell is censored (for example because of withdrawal from the study) and 0 otherwise. However, we note that $\delta_i^c = 1 - \delta_i^1 - \delta_i^2$, so all information is already captured in the vector.

Our goal now is to develop a method to estimate the destination-specific hazard rates. First, we assume that the latent failure times $T_1$ and $T_2$ are independent. This implies that

$$\lambda(t) = \lambda_1(t) + \lambda_2(t).$$

Given that failure occurs at time t, the conditional probability that the failure is of type i = 1, 2 is $\lambda_i(t)/\lambda(t)$; hence the marginal probability that the failure is of type i = 1, 2 is

$$P(T = T_i) = \int_0^\infty \lambda_i(t) \exp\left(-\int_0^t \lambda(z)\, dz\right) dt.$$
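This formula can be evaluated numerically for any pair of latent hazards. In the sketch below (hypothetical hazards, midpoint rule truncated at an upper limit), constant hazards reproduce the closed form $P(R = 1) = \lambda_1/(\lambda_1 + \lambda_2)$, and for general hazards the two probabilities add up to one, since without censoring every spell eventually ends in one of the two destinations.

```python
import math

def cause_probabilities(lam1, lam2, upper=100.0, n=100_000):
    """P(exit to i) = int_0^inf lambda_i(t) exp(-int_0^t lambda(z) dz) dt, i = 1, 2,
    with lambda = lambda_1 + lambda_2; midpoint rule on [0, upper]."""
    h = upper / n
    cum = 0.0                          # int_0^t lambda(z) dz at the left end of the cell
    p1 = p2 = 0.0
    for m in range(n):
        t = (m + 0.5) * h
        l1, l2 = lam1(t), lam2(t)
        surv = math.exp(-(cum + 0.5 * h * (l1 + l2)))   # S(t) at the midpoint
        p1 += l1 * surv * h
        p2 += l2 * surv * h
        cum += h * (l1 + l2)
    return p1, p2

# Constant hypothetical hazards: expect 0.3 / 0.4 = 0.75 and 0.25
p1c, p2c = cause_probabilities(lambda t: 0.3, lambda t: 0.1)
print(p1c, p2c)
# Time-varying hypothetical hazards: the probabilities still sum to one
p1, p2 = cause_probabilities(lambda t: 0.5, lambda t: 0.2 * t)
print(p1, p2, p1 + p2)
```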


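Independence also means that the survivor function for exit can be factored into a product of destination-specific survivor functions:

$$\begin{aligned}
S(t) &= \exp\left[-\int_0^t \lambda(s)\, ds\right] = \exp\left[-\int_0^t \left(\lambda_1(s) + \lambda_2(s)\right) ds\right] \\
&= \exp\left[-\int_0^t \lambda_1(s)\, ds\right] \cdot \exp\left[-\int_0^t \lambda_2(s)\, ds\right] = S_1(t)\,S_2(t).
\end{aligned}$$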

Now we consider the likelihood in this independent competing risks model with two destinations. The individual sample likelihood contribution is of three types:

1. Exit to 1: $L^1 = f_1(T)\,S_2(T)$. This summarizes the chances of a transition to 1 combined with no transition to 2.

2. Exit to 2: $L^2 = f_2(T)\,S_1(T)$. This summarizes the chances of a transition to 2 combined with no transition to 1.

3. Censored spell: $L^c = S(T) = S_1(T)\,S_2(T)$.

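By using the destination-specific censoring indicators (4.18) and (4.19), we obtain that the overall contribution of individual i to the likelihood L is given by

$$\begin{aligned}
L_i &= (L_i^1)^{\delta_i^1}\,(L_i^2)^{\delta_i^2}\,(L_i^c)^{1 - \delta_i^1 - \delta_i^2} \\
&= [f_1(T_i)S_2(T_i)]^{\delta_i^1}\,[f_2(T_i)S_1(T_i)]^{\delta_i^2}\,[S_1(T_i)S_2(T_i)]^{1 - \delta_i^1 - \delta_i^2} \\
&= \left[\frac{f_1(T_i)}{S_1(T_i)}\right]^{\delta_i^1} S_1(T_i)\,\left[\frac{f_2(T_i)}{S_2(T_i)}\right]^{\delta_i^2} S_2(T_i) \\
&= [\lambda_1(T_i)]^{\delta_i^1}\, S_1(T_i)\,[\lambda_2(T_i)]^{\delta_i^2}\, S_2(T_i).
\end{aligned}$$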

Or, equivalently,

$$\ln L_i = \left[\delta_i^1 \ln \lambda_1(T_i) + \ln S_1(T_i)\right] + \left[\delta_i^2 \ln \lambda_2(T_i) + \ln S_2(T_i)\right].$$

The log-likelihood for the whole sample equals the sum of this expression over all individuals. In other words, the log-likelihood factors into two parts, each of which depends only on parameters specific to that destination. Hence one can maximize the overall log-likelihood by maximizing the two components separately. These results generalize straightforwardly to the situation with more than two independent competing risks. This means that the model can be estimated very easily: one defines new destination-specific censoring variables as above and then estimates separate models for each transition.
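For constant latent hazards (exponential latent failure times) the factorization can be seen directly. The sketch below, with hypothetical spells and names, fits each destination separately by treating exits to the other destination as censored, which gives $\hat{\lambda}_k = D_k / \sum_i T_i$ with $D_k$ the number of exits to k, and confirms that the joint log-likelihood equals the sum of the two single-destination log-likelihoods of Section 4.3.

```python
import math

def crm_loglik(lam1, lam2, times, exit_to):
    """Joint log-likelihood of the two-destination model with constant latent hazards.

    exit_to[i] is 1 or 2 for an observed exit and 0 for a censored spell, so
    ln L_i = [d1 ln lam1 - lam1 T_i] + [d2 ln lam2 - lam2 T_i]  (ln S_k(T) = -lam_k T).
    """
    total = 0.0
    for t, r in zip(times, exit_to):
        d1, d2 = int(r == 1), int(r == 2)
        total += d1 * math.log(lam1) - lam1 * t + d2 * math.log(lam2) - lam2 * t
    return total

def single_loglik(lam, times, events):
    """Censored exponential log-likelihood of Section 4.3 for one transition."""
    return sum(d * math.log(lam) - lam * t for t, d in zip(times, events))

def fit_separately(times, exit_to):
    """Destination-specific MLEs; exits to the other destination count as censored."""
    exposure = sum(times)
    lam1 = sum(1 for r in exit_to if r == 1) / exposure
    lam2 = sum(1 for r in exit_to if r == 2) / exposure
    return lam1, lam2

times = [2.0, 5.0, 1.0, 7.0, 3.0, 4.0]   # hypothetical spell lengths
exit_to = [1, 2, 1, 0, 2, 1]            # 0 = censored
lam1, lam2 = fit_separately(times, exit_to)
d1 = [int(r == 1) for r in exit_to]     # destination-specific censoring indicators
d2 = [int(r == 2) for r in exit_to]
joint = crm_loglik(lam1, lam2, times, exit_to)
parts = single_loglik(lam1, times, d1) + single_loglik(lam2, times, d2)
print(lam1, lam2, joint, parts)         # joint and parts coincide
```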

4.7 Unobserved heterogeneity: Frailty models

In the PH model, it is implicitly assumed that a homogeneous population is studied. This means that all individuals sampled into the study are in principle similar and only differ in certain observed covariates. In many applications, however, it is impossible to measure all relevant covariates related to the disease of interest. The frailty approach is a statistical modeling concept which aims to account for heterogeneity caused by such unmeasured covariates. Frailty models are an extension of the proportional hazards model.

We start this section by illustrating the importance of identifying unobserved heterogeneity in survival analysis. Suppose the aggregated hazard rate out of disability is known to be a declining function of the length of the spell. If all individuals were identical, this would imply negative duration dependence, that is, a falling probability of recovery the longer an individual is disabled. However, suppose that the population consists of two different groups of equal size, type F (fast) and type S (slow), where F has a higher hazard rate than S, so that members of group F tend to leave the state earlier. This implies that the proportion of type F among those still at risk declines over time. So the hazard rate in the total population will appear to fall over time, despite the fact that the hazard for both groups remains constant. We conclude that unobserved heterogeneity may give the appearance of a declining hazard rate, even when the individual rates are constant. As a result, the estimated hazard rate in models that do not allow for unobserved heterogeneity can be biased towards negative duration dependence.
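The two-group argument can be reproduced with a few lines of arithmetic. The sketch below uses hypothetical constant hazards of 1.0 for type F and 0.2 for type S, with equal group sizes: the population hazard starts at the average 0.6 and declines towards 0.2 as the fast group is depleted, although neither group's hazard changes.

```python
import math

def population_hazard(t, lam_fast=1.0, lam_slow=0.2, share_fast=0.5):
    """Hazard among survivors of a two-group mixture with constant group hazards:
    h(t) = sum_g p_g lam_g exp(-lam_g t) / sum_g p_g exp(-lam_g t)."""
    surv_f = share_fast * math.exp(-lam_fast * t)          # fast group still at risk
    surv_s = (1.0 - share_fast) * math.exp(-lam_slow * t)  # slow group still at risk
    return (lam_fast * surv_f + lam_slow * surv_s) / (surv_f + surv_s)

grid = [0, 1, 2, 5, 10, 20]
hazards = [population_hazard(t) for t in grid]
for t, h in zip(grid, hazards):
    print(t, round(h, 4))   # falls from 0.6 towards 0.2
```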

To allow for unobserved heterogeneity we can add a frailty term, or random effect, to the proportional hazards model (4.8). This random effect can be interpreted as a function of unobserved explanatory variables such as risk aversion, motivation to recover, lifestyle or education. We use the specification in which the frailty v = e^u, with u the random effect, acts multiplicatively on the hazard rate. Hence the hazard rate can be written as:

$$\lambda(t \mid x_i, v) = \lambda_0(t)\, e^{\beta^T x_i + u} = \lambda(t \mid x_i)\, v.$$

Because the frailty term is unobserved, the data contain no information about the individual values of v, so we must make assumptions about its distribution. First, it is assumed that v is independent of time and of the vector of covariates x. Furthermore, v should have unit mean (a normalization required for identification) and finite variance σ² > 0. Finally, since the hazard rate cannot be negative, the distribution of v has to be chosen from a class of distributions on the positive half-line. The most frequently used frailty distribution is the gamma distribution. This choice is convenient from a computational and analytical point of view, because closed-form expressions for the survival, density and hazard functions are easy to derive, owing to the simple form of its Laplace transform. For z > 0 the gamma density is given by:

$$f(z; \alpha, n) = \frac{\alpha^n}{\Gamma(n)}\, z^{n-1} e^{-\alpha z},$$

with E[z] = n/α and Var(z) = n/α². The normalization n = α gives E[z] = 1 and Var(z) = 1/α.

A PH model with a frailty term is better known as a mixed proportional hazard (MPH) model. It can be shown that the relationship between the frailty survival function and the survival function without frailty is given by:

S(t|x, v) = [S(t|x)]v.

Thus the individual effect v rescales the survival function in such a way that individuals with above-average frailty leave the state relatively fast, while the opposite occurs for individuals with below-average values of v.
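A minimal sketch of what the gamma choice buys. It assumes a gamma frailty whose shape and rate both equal α, so that E[v] = 1 and Var(v) = 1/α, and writes S(t|x) = exp(−Λ) with cumulative hazard Λ. Averaging S(t|x)^v over v gives the population survival function. Through the Laplace transform of the gamma distribution, this average has the closed form (1 + Λ/α)^(−α). The sketch compares the closed form with a Monte Carlo average. Function names and parameter values are illustrative.

```python
import math
import random


def marginal_survival(cum_hazard, alpha):
    """E[S(t|x)^v] for v ~ Gamma(shape=alpha, rate=alpha), via the gamma Laplace transform."""
    return (1.0 + cum_hazard / alpha) ** (-alpha)


def marginal_survival_mc(cum_hazard, alpha, draws=200_000, seed=7):
    """Monte Carlo version of the same average, for comparison."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        # gammavariate takes (shape, scale); scale 1/alpha gives mean 1 and variance 1/alpha.
        v = rng.gammavariate(alpha, 1.0 / alpha)
        total += math.exp(-v * cum_hazard)  # S(t|x)^v with S(t|x) = exp(-cum_hazard)
    return total / draws


closed = marginal_survival(1.0, alpha=2.0)       # 1.5 ** -2, about 0.444
simulated = marginal_survival_mc(1.0, alpha=2.0)
```

For the same Λ, the population curve lies above exp(−Λ) ≈ 0.368. This is the flattening that heterogeneity produces.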


Another advantage of adding a frailty term is that it can help to solve the problem of multiple spells. Suppose that person i has c(i) different claims, each with its own frailty term v_i^j, j = 1, ..., c(i). Conditional on the covariates x_i^j and the frailty v_i^j, the individual hazard function λ(t | x_i^j, v_i^j) has the same form for all these spells. The likelihood contribution of person i is then given by:

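$$\int_0^\infty \cdots \int_0^\infty \prod_{j=1}^{c(i)} S(t_i^j \mid x_i^j, v_i^j)\, \lambda(t_i^j \mid x_i^j, v_i^j)^{d_i^j} \, dG(v_i^1, \ldots, v_i^{c(i)}), \qquad (4.20)$$

where t_i^j is the length of the j-th duration of individual i, d_i^j is the corresponding event indicator and G(v_i^1, ..., v_i^{c(i)}) is a joint distribution, see [9].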

However, if it is assumed that an individual has a single frailty term that is shared by all of his or her spells, the individual likelihood contribution in (4.20) reduces to:

$$\int_0^\infty \prod_{j=1}^{c(i)} S(t_i^j \mid x_i^j, v_i)\, \lambda(t_i^j \mid x_i^j, v_i)^{d_i^j} \, dG(v_i),$$

where G(v_i) is a particular distribution, for example the gamma distribution.
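With a gamma frailty this one-dimensional integral has a closed form. The sketch assumes that v_i follows a gamma distribution with shape and rate both equal to α, and that the frailty acts multiplicatively: λ(t|x, v) = v λ(t|x) and S(t|x, v) = exp(−v Λ(t|x)). Let D_i be the number of spells ending in a transition and H_i the sum of the cumulative hazards. The contribution then equals ∏ λ(t_i^j|x_i^j)^{d_i^j} · α^α Γ(α + D_i) / (Γ(α)(α + H_i)^{α + D_i}). The sketch evaluates this expression and checks it against numerical integration. The hazard values are made up.

```python
import math


def shared_gamma_contribution(hazards, cum_hazards, events, alpha):
    """Closed-form likelihood contribution of one individual with several spells."""
    D = sum(events)        # number of spells ending in a transition
    H = sum(cum_hazards)   # total cumulative hazard over all spells
    log_value = (sum(d * math.log(h) for h, d in zip(hazards, events))
                 + alpha * math.log(alpha) + math.lgamma(alpha + D)
                 - math.lgamma(alpha) - (alpha + D) * math.log(alpha + H))
    return math.exp(log_value)


def numeric_contribution(hazards, cum_hazards, events, alpha, upper=60.0, n=60_000):
    """Same integral over v, evaluated with the midpoint rule."""
    def gamma_pdf(v):
        return math.exp(alpha * math.log(alpha) - math.lgamma(alpha)
                        + (alpha - 1) * math.log(v) - alpha * v)

    def conditional(v):
        out = 1.0
        for h, cum, d in zip(hazards, cum_hazards, events):
            out *= math.exp(-v * cum) * (v * h) ** d   # S(t|x,v) * lambda(t|x,v)^d
        return out

    step = upper / n
    return step * sum(conditional((k + 0.5) * step) * gamma_pdf((k + 0.5) * step)
                      for k in range(n))


# Hypothetical spells of one claimant: hazards and cumulative hazards at the spell ends.
hazards, cum_hazards, events = [0.5, 0.3, 0.4], [0.8, 0.6, 1.1], [1, 0, 1]
closed = shared_gamma_contribution(hazards, cum_hazards, events, alpha=2.0)
numeric = numeric_contribution(hazards, cum_hazards, events, alpha=2.0)
```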

4.8 Goodness of fit

One way to assess whether a Cox PH model is adequately specified is the diagnostic approach based on Cox-Snell residuals, which are defined by

$$(r_{CS})_i = \exp(\hat\beta^T x_i)\, \hat\Lambda_0(t_i),$$

where Λ̂0(t) is the estimated cumulative baseline hazard function at time t, defined as

$$\hat\Lambda_0(t) = \sum_{i=1}^{n} \int_0^t \frac{dN_i(s)}{\sum_{j=1}^{n} Y_j(s) \exp(\hat\beta^T x_j(s))}.$$

If the fitted model is correct and β̂ is close to the true value of β, then the (r_CS)_i's should be a plausible sample from a unit exponential distribution. Because this distribution is skewed, a plot of Cox-Snell residuals against observation number or time will not give a display that is symmetric around zero [1].
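The sketch below computes the Breslow estimate of Λ0 and the Cox-Snell residuals for given linear predictors β̂ᵀx_i, assuming no tied event times. To stay dependency-free it does not fit the Cox model. Instead, the linear predictors are taken as known: they are the true ones from a simulation with Λ0(t) = t, which is an assumption of the sketch. One exact property serves as a check. With the Breslow estimator, the residuals sum to the number of observed events.

```python
import math
import random


def breslow_cumhaz(times, events, lin_pred):
    """Breslow estimate of Lambda_0 at each subject's own time (assumes no tied times)."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    risk = [math.exp(eta) for eta in lin_pred]
    # Risk-set sums: total of exp(beta^T x_j) over subjects with time >= the current time.
    at_risk = [0.0] * len(order)
    running = 0.0
    for pos in range(len(order) - 1, -1, -1):
        running += risk[order[pos]]
        at_risk[pos] = running
    cumhaz = [0.0] * len(times)
    total = 0.0
    for pos, i in enumerate(order):
        total += events[i] / at_risk[pos]   # jump dN_i divided by the risk-set sum
        cumhaz[i] = total
    return cumhaz


def cox_snell_residuals(times, events, lin_pred):
    """(r_CS)_i = exp(beta^T x_i) * Lambda_0_hat(t_i)."""
    cumhaz = breslow_cumhaz(times, events, lin_pred)
    return [math.exp(eta) * h for eta, h in zip(lin_pred, cumhaz)]


# Simulated data from a correctly specified PH model with Lambda_0(t) = t.
rng = random.Random(1)
beta = 0.7
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
lin_pred = [beta * xi for xi in x]
times = [rng.expovariate(math.exp(eta)) for eta in lin_pred]
events = [1] * len(times)   # no censoring in this toy example
res = cox_snell_residuals(times, events, lin_pred)
```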

Another goodness-of-fit statistic can be formed by first obtaining the partial likelihood estimate of φ_i = exp(β^T x_i), which is φ̂_i = exp(β̂^T x_i). The subjects are then grouped into regions based on the percentiles of φ̂_i, which are called percentiles of risk. Abeysekera and Sooriyarachchi [1] suggest forming G regions of approximately equal size, so that the first group contains the n/G subjects with the smallest φ̂_i's and the last group contains the n/G subjects with the largest φ̂_i's. In general, this classification groups together subjects that have similar risks of the event at any given time. For g = 1, ..., G − 1, group indicators are defined by

$$I_{ig} = \begin{cases} 1 & \text{if } \hat\phi_i \text{ is in region } g, \\ 0 & \text{otherwise.} \end{cases}$$


In order to assess the goodness of fit of the PH model λ(t | x_i) = λ0(t) exp(β^T x_i), we consider the alternative Cox model

$$\lambda(t \mid x_i) = \lambda_0(t) \exp\Big(\beta^T x_i + \sum_{g=1}^{G-1} \gamma_g I_{ig}\Big). \qquad (4.21)$$

If the PH model is correctly specified, one should have γ_g = 0 for all g. To test the goodness of fit of the PH model against the alternative (4.21), one can use the likelihood ratio, Wald or score statistic to test the null hypothesis H0: γ1 = γ2 = ... = γ_{G−1} = 0. If the PH model has been correctly specified, each of these statistics should have an approximate chi-squared distribution with G − 1 degrees of freedom.
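A sketch of the grouping step. It assumes the risk scores φ̂_i = exp(β̂ᵀx_i) are already available from a fitted Cox model; here they are made-up numbers. Subjects are ranked by φ̂_i and split into G groups of (nearly) equal size. Each subject then gets the G − 1 indicators I_ig, with group G as the reference category. These indicator columns would be added to the Cox model to obtain (4.21).

```python
def risk_group_indicators(phi_hat, G):
    """Rank subjects by phi_hat, split them into G groups of (nearly) equal size and
    return (indicators, groups): indicators[i] lists I_ig for g = 1, ..., G - 1."""
    n = len(phi_hat)
    order = sorted(range(n), key=lambda i: phi_hat[i])
    groups = [0] * n
    for rank, i in enumerate(order):
        groups[i] = rank * G // n + 1   # groups are numbered 1, ..., G
    indicators = [[1 if groups[i] == g else 0 for g in range(1, G)] for i in range(n)]
    return indicators, groups


# Hypothetical risk scores phi_hat_i = exp(beta_hat^T x_i) for ten subjects.
phi_hat = [3.2, 0.5, 1.1, 7.4, 2.2, 0.9, 5.0, 1.7, 4.1, 0.2]
indicators, groups = risk_group_indicators(phi_hat, G=5)
```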


Chapter 5

Binary regression

In the previous chapter it was assumed that the survival process was continuous and that change could occur anywhere during the time interval. In many cases, however, measurements of time are imprecise. Many researchers rely on event history data collected at particular intervals in panel studies. This is also the case for the disability data in our study, which were collected on a monthly basis. This means that we do not know the exact timing of an event, but only the interval in which it occurred. For this reason, this type of data is often referred to as interval-censored. Discrete-time models were developed to analyse time-to-event data in which survival times are grouped into discrete intervals.

As in the previous chapter, the focus is on how the hazard function λ(t) depends on covariates. Both continuous- and discrete-time models describe the risk of the event occurring at time t. In a continuous-time model the dependent variable is the hazard rate, whereas in a discrete-time (logit) model it is the log-odds of the event. Although the dependent variables might at first glance appear different, the approximation is close and they convey essentially the same information [27]. The central difference is that in discrete time the hazard function is the probability that an event occurs during the interval (t − 1, t], conditional on the event not having occurred before this interval. With T denoting the event time, this is written as:

λ(t) = P (T ∈ (t− 1, t]|T > t− 1).

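Given survival up to the start of interval t, the probability of also surviving that interval is P(T > t | T > t − 1) = 1 − λ(t). Multiplying these conditional probabilities gives the survival function, which is the probability that the event has not occurred by time t:

$$S(t) = P(T > t) = \prod_{s=1}^{t} \big(1 - \lambda(s)\big).$$

The cumulative distribution function, which measures the probability that the event has occurred by time t (the probability of failure), is written as:

$$F(t) = P(T \le t) = 1 - S(t).$$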
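For illustration, the product formula can be evaluated directly from a sequence of interval-specific hazards. The hazard values below are hypothetical.

```python
def discrete_survival(hazards):
    """S(t) = prod_{s <= t} (1 - lambda(s)) for interval hazards lambda(1), lambda(2), ..."""
    surv, out = 1.0, []
    for h in hazards:
        surv *= 1.0 - h   # survive interval t given survival up to its start
        out.append(surv)
    return out


hazards = [0.10, 0.20, 0.25]     # hypothetical monthly hazards
S = discrete_survival(hazards)   # about [0.9, 0.72, 0.54]
F = [1.0 - s for s in S]         # probability of failure by the end of each month
```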

Since our discrete-time data take the form of a binary variable (transition or no transition), commonly used binary regression models can be estimated; these are discussed in this chapter. A basic knowledge of linear regression is presumed; for a brief introduction, see [23] or [35].
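To make the binary form concrete: a common way to prepare such data, often called a person-period layout, is to expand each spell into one record per month. The outcome is y = 1 only in the month in which the transition occurs. The sketch below assumes that layout. The field names, including the covariate `age`, are hypothetical.

```python
def person_period_rows(spell_id, n_months, event, covariates):
    """Expand one spell observed for n_months into monthly binary records.
    y = 1 only in the final month and only if the spell ended in a transition
    (event = 1); a censored spell contributes zeros only."""
    rows = []
    for month in range(1, n_months + 1):
        y = 1 if (event == 1 and month == n_months) else 0
        rows.append({"spell": spell_id, "month": month, "y": y, **covariates})
    return rows


recovered = person_period_rows("A", 4, event=1, covariates={"age": 45})
censored = person_period_rows("B", 3, event=0, covariates={"age": 52})
```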

Binary analysis is in many ways the complement of ordinary linear regression for the case in which the dependent variable is not a continuous variable but a state which may or may not hold, or a category in a given classification. When such discrete variables occur among the regressors, they are handled by introducing one or several (0, 1) dummy variables. However, when the dependent variable is of this type, the linear regression model breaks down. Binary analysis then provides a good alternative.

Suppose we consider n individuals, corresponding to n independent observations y_1, ..., y_n. The ith observation can be treated as a realization of a random variable Y_i with a Bernoulli distribution with success probability p_i. Hence

$$Y_i = \begin{cases} 1 & \text{with probability } p_i, \\ 0 & \text{with probability } 1 - p_i. \end{cases}$$

We now assume that for each trial i there is a set of explanatory variables that might influence p_i = P(y_i = 1). These variables can be thought of as a k-dimensional vector x_i = (x_{i1}, ..., x_{ik})^T. The regression model defined by

yi = β0 + β1xi1 + β2xi2 + ...+ βkxik + εi,

is called the linear probability model. Here β0 is called the intercept and β1, β2, ... the regression coefficients of x_{i1}, x_{i2}, ..., respectively. The intercept is the expected value of y_i when all the explanatory variables are equal to 0; hence it corresponds to someone without risk factors. Each regression coefficient describes the size of the contribution of the corresponding risk factor. A positive value means that the variable increases the probability of the outcome, whereas a negative value means that the risk factor decreases it. The larger the absolute value, the greater the influence. Finally, ε_i is the error term or disturbance. It contains factors other than x_{i1}, ..., x_{ik} that affect y_i. The key assumption for every regression model is:

E[εi|xi1, ..., xik] := E[εi|xi] = 0 (5.1)

This requires that all factors in the unobserved error term are uncorrelated with the explanatory variables. It also means that we have correctly specified the functional relationship between y_i and the explanatory variables.

If (5.1) is fulfilled, we obtain:

E[yi|xi] = β0 + β1xi1 + β2xi2 + ...+ βkxik = βTxi,

where we use x_{i0} = 1 and β = (β0, ..., βk)^T. Since the variable y_i can only take the values 0 and 1, we have

pi = P (yi = 1|xi) (5.2)

= 1 · P (yi = 1|xi) + 0 · P (yi = 0|xi)

= E[yi|xi] (5.3)

= βTxi. (5.4)

The linear probability model has several disadvantages. For example, it places implicit restrictions on the parameters β, as (5.4) requires that 0 ≤ β^T x_i ≤ 1 for all i = 1, ..., n. Furthermore, the error terms ε_i are not normally distributed, because y_i is binary. Since ε_i = y_i − β^T x_i, it follows that ε_i is a random variable with the discrete distribution given by:

εi = 1− βTxi with probability βTxi

εi = −βTxi with probability 1− βTxi.

Hence the distribution of ε_i depends on x_i. In particular, Var(ε_i | x_i) = β^T x_i (1 − β^T x_i), so the errors are heteroskedastic and the conventional ordinary least squares (OLS) formula for the standard errors does not apply. Further, suppose the OLS estimates β̂ are used to compute the estimated probabilities P̂(y_i = 1) = β̂^T x_i. These values may be smaller than zero or larger than one, in which case they cannot be interpreted as real 'probabilities'.
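A small numerical illustration with made-up data. Fitting the linear probability model by OLS to ten binary outcomes produces fitted 'probabilities' below zero and above one at the ends of the covariate range. The implied error variances p(1 − p) also differ across observations.

```python
def ols_simple(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


x = list(range(10))                  # hypothetical regressor
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]   # hypothetical binary outcomes
b0, b1 = ols_simple(x, y)
fitted = [b0 + b1 * xi for xi in x]  # ranges from about -0.13 to about 1.13
# Implied error variances p (1 - p) at the admissible fitted values: not constant.
variances = [p * (1 - p) for p in fitted if 0 <= p <= 1]
```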

However, the probabilities can be confined to values between zero and one by using a non-linear model. Let F be a function with range [0, 1] and let

pi = P (yi = 1|xi) = F (βTxi) (5.5)

The choice of such a function F corresponds to assuming a specific distribution for the unobserved factors, that is, for the error term of an underlying latent-variable model. Often F is chosen to be a cumulative distribution function, because such a function is monotonically non-decreasing. This has the advantage that positive coefficients β_j correspond to positive effects on the success probability and vice versa. In the following section we discuss the most common choice for F, which results in the logit model. Other choices are possible; for example, taking F to be the standard normal distribution function results in the probit model. However, the results of these models turn out to be very similar [17].

5.1 The logit model

The logit model arises if F is chosen to be the logistic function.

Definition 7. The logistic function of y ∈ R is given by:

$$F(y) = \Lambda(y) = \frac{1}{1 + e^{-y}} = \frac{e^y}{1 + e^y}.$$

Figure 5.1: The logistic function

The logistic function is the cdf of the logistic distribution, whose density function is given by:

$$f(y) = \lambda(y) = \frac{e^y}{(1 + e^y)^2}.$$


The main interest lies in determining the marginal effect of a change in a regressor on the conditional probability that y = 1. For a change in the jth regressor, which is assumed to be continuous, this equals (using (5.5)):

$$\frac{\partial P(y_i = 1 \mid x_i)}{\partial x_{ij}} = \Lambda'(\beta^T x_i)\, \beta_j = \Lambda(\beta^T x_i)\big[1 - \Lambda(\beta^T x_i)\big] \beta_j = p_i (1 - p_i)\, \beta_j.$$
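The sketch below evaluates p_i(1 − p_i)β_j for a hypothetical coefficient vector, with x_{i0} = 1 for the intercept. It checks the result against a central finite difference of the fitted probability. It also illustrates that the marginal effect is at most 0.25·|β_j|, the value attained at p_i = 0.5.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def probability(beta, x):
    """P(y = 1 | x) = Lambda(beta^T x) for the logit model."""
    return logistic(sum(b * xi for b, xi in zip(beta, x)))


def marginal_effect(beta, x, j):
    """Analytical marginal effect p (1 - p) beta_j of a continuous regressor x_j."""
    p = probability(beta, x)
    return p * (1.0 - p) * beta[j]


beta = [-1.0, 0.8]   # hypothetical intercept and slope
x = [1.0, 0.5]       # x_0 = 1 for the intercept
h = 1e-5
x_up, x_down = [1.0, 0.5 + h], [1.0, 0.5 - h]
numerical = (probability(beta, x_up) - probability(beta, x_down)) / (2 * h)
analytical = marginal_effect(beta, x, j=1)
```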

A very common alternative is to interpret the coefficients in terms of their effect on the odds rather than on the probability. We have:

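$$p_i = \Lambda(\beta^T x_i) = \frac{e^{\beta^T x_i}}{1 + e^{\beta^T x_i}}, \qquad \frac{p_i}{1 - p_i} = e^{\beta^T x_i}, \qquad \ln \frac{p_i}{1 - p_i} = \beta^T x_i.$$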

Here p_i/(1 − p_i) measures the probability that y_i = 1 relative to the probability that y_i = 0 and is called the odds. We notice that for the logit model the log-odds are linear in the regressors. Consequently, a one-unit increase in x_ij multiplies the odds by e^{β_j}; this factor is the odds ratio.

The function g(p) = ln(p/(1 − p)) is called the logit, which explains the name of the corresponding model.
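A quick numerical check of this interpretation, using hypothetical coefficients. Raising x_{i1} by one unit multiplies the odds by e^{β_1}. Applying the logit to the fitted probability returns the linear predictor.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def fitted_probability(beta, x):
    return 1.0 / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))


def odds(beta, x):
    p = fitted_probability(beta, x)
    return p / (1.0 - p)


beta = [-1.0, 0.8]                      # hypothetical coefficients
x_low, x_high = [1.0, 0.5], [1.0, 1.5]  # x_1 raised by one unit
odds_ratio = odds(beta, x_high) / odds(beta, x_low)        # exp(0.8), about 2.2255
linear_predictor = logit(fitted_probability(beta, x_low))  # -1.0 + 0.8 * 0.5 = -0.6
```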

5.1.1 Likelihood function

The preferred method of estimation in probability models is maximum likelihood, since it permits estimation of the parameters of almost any analytical specification of the probability function. Furthermore, it yields estimates that are consistent and asymptotically efficient, together with estimates of their asymptotic covariance matrix.

Suppose we have n observations i = 1, ..., n on the occurrence of a certain event, denoted by the (0, 1) variable y_i, and a number of covariates arranged in the vector x_i = (x_{i1}, ..., x_{ik})^T. The density function of y_i is given by:

$$f(y_i, x_i) = p_i^{y_i} (1 - p_i)^{1 - y_i}, \qquad y_i \in \{0, 1\},$$

where p_i = Λ(β^T x_i). Since the observations are independent, the sample density of a vector y of zeros and ones is written as:

$$f(y, x) = \prod_{i=1}^{n} f(y_i, x_i).$$


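Hence, the log-likelihood is given by:

$$\begin{aligned} l(\beta) &= \ln \prod_{i=1}^{n} f(y_i, x_i) = \sum_{i=1}^{n} \big[y_i \ln(p_i) + (1 - y_i) \ln(1 - p_i)\big] \\ &= \sum_{i=1}^{n} \big[y_i \ln \Lambda(\beta^T x_i) + (1 - y_i) \ln\big(1 - \Lambda(\beta^T x_i)\big)\big]. \end{aligned}$$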

Note that the ordering of the observations affects neither their joint density nor the (log-)likelihood: since the observations are independent, the ordering is arbitrary. Differentiating with respect to β shows that the maximum likelihood estimator β̂ solves:

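$$\begin{aligned} 0 &= \sum_{i=1}^{n} \left[\frac{y_i}{\Lambda(\hat\beta^T x_i)} \Lambda'(\hat\beta^T x_i)\, x_i - \frac{1 - y_i}{1 - \Lambda(\hat\beta^T x_i)} \Lambda'(\hat\beta^T x_i)\, x_i\right] \\ &= \sum_{i=1}^{n} \frac{y_i - \Lambda(\hat\beta^T x_i)}{\Lambda(\hat\beta^T x_i)\big(1 - \Lambda(\hat\beta^T x_i)\big)} \Lambda'(\hat\beta^T x_i)\, x_i \\ &= \sum_{i=1}^{n} \big(y_i - \Lambda(\hat\beta^T x_i)\big)\, x_i. \end{aligned}$$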
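The score equation has no closed-form solution. A standard way to solve it is Newton-Raphson, using the fact that the information matrix of the logit model is Σ p_i(1 − p_i) x_i x_iᵀ. This is not necessarily the algorithm used by the software in this thesis. The sketch below handles an intercept plus one regressor, so the 2 × 2 system can be solved by hand. The data are made up.

```python
import math


def fit_logit_newton(x, y, max_iter=50, tol=1e-12):
    """Solve sum_i (y_i - Lambda(b0 + b1 x_i)) (1, x_i) = 0 by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Score vector: sum_i (y_i - p_i) * (1, x_i).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        # Information matrix: sum_i p_i (1 - p_i) * (1, x_i)(1, x_i)^T.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step = information^{-1} * score, with the 2 x 2 inverse written out.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


x = list(range(10))                  # hypothetical regressor
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]   # hypothetical outcomes (not perfectly separated)
b0, b1 = fit_logit_newton(x, y)
p_hat = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
```

At the solution, the fitted probabilities sum to the number of ones, which is the intercept component of the score equation.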

5.2 Panel data

Panel or longitudinal data provide information on individual behavior both across time and across individuals. Panel data are obtained by initially selecting a sample S and then collecting observations for a sequence of time periods, t = 1, ..., T. This produces a sequence of data vectors w_1, ..., w_T, which is used to make inferences about either the behavior of the population or the behavior of the particular sample drawn from a non-stationary population. A major advantage of panel data is the increased precision in estimation. This results from the larger number of observations obtained by combining (also called pooling) several time periods of data for each individual. In this section we discuss linear panel data models, which will later be adapted to panel data models for logistic regression.

5.2.1 Linear panel models

A very general linear model for panel data permits the intercept and slope coefficients to vary over both individuals and time:

yit = αit + βTitxit + εit.

where i indexes individuals in a cross section and t indexes time. As before, y_it is a scalar dependent variable, x_it a (k × 1) vector of independent variables and ε_it an error term. However, this model is too general and is not estimable, since there are more parameters to estimate than there are observations. Consequently, further restrictions need to be placed on the extent to which α_it and β_it vary with i and t, and on the behavior of the error term ε_it.


The most restrictive model is a pooled model that specifies constant coefficients, which is also the usual assumption for cross-section analysis:

y_it = α + β^T x_it + ε_it. (5.6)

If this model is correctly specified and the regressors are uncorrelated with the error, then it can be consistently estimated using pooled OLS.

A simple variant of model (5.6) permits intercepts to vary across individuals and over time, while the slope parameters do not. Then

yit = αi + γt + βTxit + εit, (5.7)

where it is assumed that x_it does not include an intercept. If there are n individuals and T time points, this model has n + (T − 1) + dim[x] parameters. All of these can be consistently estimated if both n → ∞ and T → ∞. Since patients are followed for a finite time span, however, T does not go to infinity, whereas n does. The finitely many γ_t can then still be consistently estimated, so the time dummies are simply incorporated into the regressors x_it. The challenge then lies in estimating the parameters β while controlling for the n individual intercepts α_i [11].

The individual-specific effects model allows each cross-sectional unit to have a different intercept term:

yit = αi + βTxit + εit, (5.8)

where ε_it is assumed to be iid over both i and t. This is a more parsimonious way to express model (5.7), with the time dummies included in the regressors x_it. The α_i are random variables that capture unobserved heterogeneity. Throughout this section it is assumed that the error term has mean zero conditional on past, current and future values of the regressors:

E[εit|αi,xi1, ...,xiT ] = 0 t = 1, ..., T (5.9)

There are two variants of the individual-specific effects model:

• The fixed effects (FE) model: α_i is treated as an unobserved random variable that is potentially correlated with the observed regressors x_it.

• The random effects (RE) model: the unobservable individual effects α_i are assumed to be random variables that are distributed independently of the regressors. Usually additional assumptions are made:

$$\alpha_i \sim [\alpha, \sigma_\alpha^2], \qquad \varepsilon_{it} \sim [0, \sigma_\varepsilon^2],$$

where [m, s²] denotes a distribution with mean m and variance s², so that both the random effects and the error terms are assumed to be iid. Note that no specific distributions have been specified.

Because of (5.9) both models assume that

E[yit|αi,xit] = αi + βTxit.

The individual-specific effect α_i is unknown and cannot be consistently estimated in short panels, so we cannot estimate E[y_it | α_i, x_it]. Instead, we can eliminate α_i by conditioning only on x_it, which results in


E[yit|xit] = E[αi|xit] + βTxit.

For the RE model it is assumed that E[α_i | x_it] = α, so E[y_it | x_it] = α + β^T x_it. In the FE model, however, E[α_i | x_it] varies with x_it in an unknown way, so E[y_it | x_it] cannot be identified. It is nonetheless possible to consistently estimate β in the FE model with short panels and to identify the marginal effect

$$\beta = \frac{\partial E[y_{it} \mid \alpha_i, x_{it}]}{\partial x_{it}},$$

even though the conditional mean is not identified. A drawback, however, is that the marginal effects can only be identified for time-varying regressors. In contrast, the RE model has the advantage of permitting consistent estimation of all parameters, including the coefficients of time-invariant regressors [11].
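For the linear model (5.8), one standard way to estimate β consistently in the FE model is the within transformation. Subtracting each individual's time average removes α_i. The text above does not name an estimator; this is a textbook technique swapped in for illustration. In the made-up panel below, the effects α_i are deliberately correlated with x_it (α_i equals five times the individual mean of x), and there is no noise. The within estimator therefore returns the true slope 2, while pooled OLS does not.

```python
def within_slope(panel):
    """Fixed-effects (within) slope for one regressor: demean x and y per individual."""
    num = den = 0.0
    for obs in panel.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            num += (x - x_bar) * (y - y_bar)
            den += (x - x_bar) ** 2
    return num / den


def pooled_slope(panel):
    """Pooled OLS slope that ignores the individual effects."""
    points = [pair for obs in panel.values() for pair in obs]
    x_bar = sum(x for x, _ in points) / len(points)
    y_bar = sum(y for _, y in points) / len(points)
    num = sum((x - x_bar) * (y - y_bar) for x, y in points)
    den = sum((x - x_bar) ** 2 for x, _ in points)
    return num / den


# Hypothetical panel: y_it = alpha_i + 2 * x_it, with alpha_i = 5 * mean_t(x_it).
x_values = {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]}
panel = {i: [(x, 5 * sum(xs) / len(xs) + 2 * x) for x in xs] for i, xs in x_values.items()}
beta_fe = within_slope(panel)      # 2.0
beta_pooled = pooled_slope(panel)  # 6.5, biased by the correlated alpha_i
```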

In our study, the individual-specific term is used to capture unobserved characteristics such as lifestyle, willingness to recover and education. It is reasonable to assume that these characteristics are uncorrelated with the observed regressors. In the following section we therefore only consider the random effects model.

5.2.2 Binary panel models

The natural extension of the binary outcome model from cross-section data (Section 5.1) to panel data with individual-specific effects is to specify that y_it only takes the values 0 and 1, with

$$P(y_{it} = 1 \mid x_{it}, \alpha_i) = \Lambda(\alpha_i + \beta^T x_{it}) = \frac{e^{\alpha_i + \beta^T x_{it}}}{1 + e^{\alpha_i + \beta^T x_{it}}}.$$

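Using this and assuming conditional independence, the joint density of the ith observation y_i = (y_{i1}, ..., y_{iT}) is given by

$$f(y_i \mid x_i, \alpha_i) = \prod_{t=1}^{T} \Lambda(\alpha_i + \beta^T x_{it})^{y_{it}} \big(1 - \Lambda(\alpha_i + \beta^T x_{it})\big)^{1 - y_{it}}. \qquad (5.10)$$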

For binary data, the conditional probability is also the conditional mean (see (5.2)), so

E[yit|xit, αi] = Λ(αi + βTxit).

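The random effects MLE assumes that the individual effects are normally distributed, with α_i ∼ N(0, σ_α²). The random effects MLE of β and σ_α² maximizes the log-likelihood

$$\ln L = \sum_{i=1}^{N} \ln f(y_i \mid x_i, \sigma_\alpha^2).$$

Here

$$f(y_i \mid x_i, \sigma_\alpha^2) = \int f(y_i \mid x_i, \alpha_i)\, \frac{1}{\sqrt{2\pi\sigma_\alpha^2}} \exp\!\left(-\frac{\alpha_i^2}{2\sigma_\alpha^2}\right) d\alpha_i, \qquad (5.11)$$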


and f(y_i | x_i, α_i) is given by (5.10). This integral has no closed-form solution, and it is standard to compute it numerically using quadrature methods [35]. If fixed effects are absent, an alternative to the RE model is a pooled binary model that simply specifies P(y_it = 1 | x_it) = Λ(β^T x_it). Statistical inference should then be based on panel-robust standard errors. More efficient estimation is possible using a generalized estimating equation (GEE) approach, see Liang and Zeger [26]. This is also the approach implemented in the statistical package SAS, with which the estimations are performed.¹
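A dependency-free sketch of (5.11) for a single individual. Gauss-Hermite quadrature is the usual choice for this integral. To avoid needing a table of nodes, the sketch swaps in a plain midpoint rule over ±8σ_α, which is accurate here because the normal density decays very fast. The outcomes and the linear predictors βᵀx_it are hypothetical. Summing the logarithm of this quantity over individuals gives ln L.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def conditional_likelihood(y_i, eta_i, alpha):
    """f(y_i | x_i, alpha_i) from (5.10); eta_i holds beta^T x_it for t = 1, ..., T."""
    out = 1.0
    for y, eta in zip(y_i, eta_i):
        p = logistic(alpha + eta)
        out *= p if y == 1 else 1.0 - p
    return out


def re_likelihood(y_i, eta_i, sigma, n_nodes=4000, width=8.0):
    """(5.11): integrate alpha_i ~ N(0, sigma^2) out with a midpoint rule on
    [-width * sigma, width * sigma]."""
    lo = -width * sigma
    step = 2 * width * sigma / n_nodes
    total = 0.0
    for k in range(n_nodes):
        a = lo + (k + 0.5) * step
        density = math.exp(-a * a / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)
        total += conditional_likelihood(y_i, eta_i, a) * density
    return total * step


y_i = [0, 0, 1, 1]                # hypothetical monthly outcomes of one individual
eta_i = [-1.2, -0.8, -0.5, -0.3]  # hypothetical values of beta^T x_it
log_contribution = math.log(re_likelihood(y_i, eta_i, sigma=0.9))
```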

5.3 Goodness of fit

Measures of model fit and tests of significance for logistic regression are not identical to those in OLS regression, although they are conceptually related. In OLS regression, measures of variation in the form of sums of squares are the building blocks of R², as well as of tests of the significance of overall prediction and of gain in prediction. In logistic regression, measures of deviance replace these sums of squares as the building blocks of measures of fit and statistical tests. Each deviance measure quantifies the lack of fit of the data to a logit model. Two of them are particularly useful. The first is the null deviance, D_null = −2 ln L_null, which summarizes all the deviance that could potentially be accounted for. It can be thought of as a measure of the lack of fit of the data to a model containing an intercept but no predictors. It provides a baseline against which to compare models that contain at least one predictor. The second is the model deviance of a model containing k predictors, D_k = −2 ln L_k, which summarizes the deviance that remains after prediction from the set of k predictors. If the model containing k predictors fits better than a model without predictors, the model deviance should be smaller than the null deviance.
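Both deviances can be computed directly from fitted probabilities. The sketch uses made-up outcomes. The null model predicts the sample mean for everyone, and the 'model' probabilities come from illustrative (not estimated) logit coefficients.

```python
import math


def deviance(y, p):
    """D = -2 ln L for binary outcomes y and fitted probabilities p."""
    return -2.0 * sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))


y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]                               # hypothetical outcomes
x = list(range(10))                                              # hypothetical predictor
p_null = [sum(y) / len(y)] * len(y)                              # intercept-only model
p_model = [1 / (1 + math.exp(-(-4.0 + 0.9 * xi))) for xi in x]   # illustrative coefficients
d_null = deviance(y, p_null)    # 20 ln 2, about 13.86
d_model = deviance(y, p_model)
```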

5.3.1 Maximum likelihood theory

As mentioned, measures of goodness of fit and test statistics in logistic regression are constructed from deviance measures. Since these are derived from ratios of maximum likelihoods under different models, the statistical tests built on deviances are referred to as likelihood ratio tests. In this section we discuss these kinds of tests together with some other measures of goodness of fit.

One of the basic ingredients of maximum likelihood theory is the definition of a parameter space in which both the true parameter θ and its maximum likelihood estimator θ̂ must lie. We test nested hypotheses that restrict θ to a subspace of a wider but still acceptable parameter space. The restriction may constrain one element of θ to a particular value (most often zero), but it may also take a more general form, for example an interval. There are three types of tests of such restrictions against the wider alternative, namely

• Likelihood ratio tests (LR), based on a comparison of the maximum value of the log-likelihood with and without restrictions.

• Wald tests, based on a comparison of the restricted values with the asymptotic normal distribution of the unrestricted parameter estimates. Under the Wald test, the maximum likelihood estimate θ̂ of the parameter of interest θ is compared with the proposed value θ0, under the assumption that the difference between the two is approximately normally distributed.

¹ The procedure proc genmod is used with the logistic function as link function.

• Lagrange multiplier tests (LM), based on a comparison of the score vector of the unrestricted model, evaluated at the constrained parameter estimates, with its value at the unrestricted estimates (which is zero). These tests are also known as score tests.

Any nested hypothesis can be subjected to all three tests. They are asymptotically equivalent, but they may give different results in any particular instance. We briefly discuss them here, following the approach taken in [19].

Suppose we are testing the null hypothesis that the data y are generated by a joint density function f(y, θ0) against the alternative hypothesis that the data are generated by f(y, θ), for θ ∈ R^k. The log-likelihood is defined as

L(y, θ) = log f(y, θ).

Further we define the score function as

$$s(y, \theta) = \frac{\partial L(y, \theta)}{\partial \theta}, \qquad (5.12)$$

and the Fisher Information as

$$I(\theta) = -E\left[\frac{\partial^2 L}{\partial \theta\, \partial \theta^T}(\theta)\right]. \qquad (5.13)$$

The maximum likelihood estimator θ̂ solves the equation s(y, θ̂) = 0. If θ̂ is (asymptotically) normally distributed and I(θ) is consistently estimated by I(θ̂), then

$$\xi_W = (\hat\theta - \theta_0)^T I(\hat\theta) (\hat\theta - \theta_0)$$

has (asymptotically) a χ² distribution with k degrees of freedom when the null hypothesis is true. Since the variance of θ̂ can be estimated by the inverse of the Fisher information, var(θ̂) = I^{−1}(θ̂), in the one-dimensional case this reduces to

$$\xi_W = \frac{(\hat\theta - \theta_0)^2}{\mathrm{var}(\hat\theta)}. \qquad (5.14)$$

This is commonly known as the Wald test.

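The likelihood ratio test is based on the difference between the maximum of the log-likelihood under the null hypothesis and under the alternative hypothesis. The corresponding statistic is given by

$$\xi_{LR} = -2\big[L(y, \theta_0) - L(y, \hat\theta)\big], \qquad (5.15)$$

which equals D_null − D_k when the null hypothesis is the intercept-only model and the alternative is the model with k predictors. It can be shown that ξ_LR asymptotically has a χ² distribution, with degrees of freedom equal to the number of restrictions, if the null hypothesis is true.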
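For a single restriction, the χ²₁ tail probability can be written with the complementary error function: P(χ²₁ > x) = erfc(√(x/2)). An LR test therefore needs only the two maximized log-likelihoods. The log-likelihood values below are hypothetical.

```python
import math


def chi2_sf_1df(x):
    """P(chi^2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z."""
    return math.erfc(math.sqrt(x / 2.0))


def lr_test_one_restriction(loglik_restricted, loglik_unrestricted):
    """Likelihood ratio statistic (5.15) and its asymptotic p-value for one restriction."""
    stat = -2.0 * (loglik_restricted - loglik_unrestricted)
    return stat, chi2_sf_1df(stat)


stat, p_value = lr_test_one_restriction(-120.4, -118.5)   # stat = 3.8, p just above 0.05
```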

There are several reasons to prefer the likelihood ratio test to the Wald test. One is that the Wald test can give different answers to the same question, depending on how the question is phrased. For example, asking whether β = 1 is the same as asking whether log β = 0. However, the Wald statistic for the first case is not the same as the Wald statistic for the second. This is caused by the fact that there is in general no simple relationship between the standard errors of β̂ and log β̂. By contrast, the likelihood ratio test gives exactly the same answer whether we work with β, log β or any other monotonic transformation of β. Another reason is that the Wald test is based on two assumptions (that we know the standard error, and that the distribution is chi-squared), whereas the likelihood ratio test uses only one assumption (that the distribution is chi-squared).
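A numerical illustration with a hypothetical estimate β̂ = 2 and standard error 0.5. It assumes the standard error of log β̂ is obtained by the delta method, se(log β̂) ≈ se(β̂)/β̂. Under that assumption, the two Wald statistics for the same hypothesis differ.

```python
import math


def wald(estimate, se, null_value):
    """One-dimensional Wald statistic (5.14)."""
    return ((estimate - null_value) / se) ** 2


beta_hat, se_beta = 2.0, 0.5                # hypothetical estimate and standard error
w_beta = wald(beta_hat, se_beta, 1.0)       # H0: beta = 1, gives 4.0
# Delta method (an assumption of this sketch): se(log beta_hat) ~ se(beta_hat) / beta_hat.
w_log = wald(math.log(beta_hat), se_beta / beta_hat, 0.0)   # H0: log beta = 0, about 7.69
```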

Another possible test is the Lagrange multiplier (LM) or score test, which has the advantage that it can be formulated in situations where the variability is difficult to estimate. The statistic to test the null hypothesis H0: θ = θ0 is given by:

$$\xi_{LM} = \frac{s(y, \theta_0)^2}{I(\theta_0)}, \qquad (5.16)$$

which asymptotically has a χ²₁ distribution under the null hypothesis. Alternatively, one can test the statistic √ξ_LM against a standard normal distribution; for a two-sided test this approach is equivalent and yields identical results. The main advantage of the Lagrange multiplier test is that it requires neither the unconstrained maximum likelihood estimate nor an estimate of the information under the alternative hypothesis. This makes testing feasible even when the maximum likelihood estimate is a boundary point of the parameter space.

The three tests are based on different statistics, each measuring the distance between H0 and H1. The Wald test is formulated in terms of θ0 − θ̂, the LR test in terms of L(θ0) − L(θ̂), and the LM test in terms of s(θ0). This difference is illustrated by Figure 5.2.

Figure 5.2: The log-likelihood function plotted against θ for a particular realization of y, for k = 1.

The Wald test is based on the horizontal difference between θ0 and θ̂, the LR test on the vertical difference, and the LM test on the slope of the likelihood function at θ0. Each of these is a reasonable measure of the distance between H0 and H1. In fact, when L is a smooth curve that is well approximated by a quadratic, all three give the same test statistic.

Theorem 1. If

$$L(\theta) = b - \tfrac{1}{2} (\theta - \hat\theta)^T A (\theta - \hat\theta),$$

where A is a symmetric positive definite matrix, b is a scalar and θ̂ is a function of the data, then the Wald, LR and LM tests are identical.

Proof. By direct substitution in (5.15) one can see that

$$\xi_{LR} = (\theta_0 - \hat\theta)^T A (\theta_0 - \hat\theta).$$

Further, differentiating gives

$$s(\theta) = \frac{\partial L}{\partial \theta}(\theta) = -A(\theta - \hat\theta) \qquad \text{and} \qquad -I = \frac{\partial^2 L}{\partial \theta\, \partial \theta^T} = -A.$$

Substituting this in (5.14) and (5.16) results in

$$\xi_W = (\theta_0 - \hat\theta)^T A (\theta_0 - \hat\theta), \qquad \xi_{LM} = s(\theta_0)^T A^{-1} s(\theta_0) = (\theta_0 - \hat\theta)^T A (\theta_0 - \hat\theta).$$

Whenever the true value of θ is equal or close to θ0, the log-likelihood function in the neighborhood of θ0 will be approximately quadratic for large samples. This is the reason for the asymptotic equivalence of the three likelihood-based tests.
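As an illustration of this equivalence, consider a hypothetical sample of n Bernoulli trials with k successes and the hypothesis H0: p = p0. The Wald statistic uses the information at p̂. The LM statistic uses the score and information at p0. The LR statistic uses the two log-likelihoods. For this model the three statistics take the forms below, and in a large sample they nearly coincide.

```python
import math


def three_tests(k, n, p0):
    """Wald, LR and LM statistics for H0: p = p0 with k successes in n Bernoulli trials."""
    p_hat = k / n
    wald = (p_hat - p0) ** 2 / (p_hat * (1 - p_hat) / n)   # information evaluated at p_hat
    lm = (p_hat - p0) ** 2 / (p0 * (1 - p0) / n)           # score and information at p0
    lr = 2 * (k * math.log(p_hat / p0) + (n - k) * math.log((1 - p_hat) / (1 - p0)))
    return wald, lr, lm


wald, lr, lm = three_tests(k=520, n=1000, p0=0.5)   # roughly 1.603, 1.600 and 1.600
```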

5.3.2 Pseudo R² measures

In addition to the likelihood tests, several different "R²-like" measures have been developed. These measures are interpreted in a manner similar to the coefficient of determination in multiple regression. The pseudo R² value for a logit model can be calculated as

$$R^2_{\text{logit}} = \frac{D_{\text{null}} - D_k}{D_{\text{null}}}.$$

Just like its multiple regression counterpart, R²_logit ranges from 0 to 1. As the model fit improves, the deviance decreases. A perfect fit has a deviance of 0 and an R²_logit of 1.
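Since D = −2 ln L, this pseudo R² coincides with McFadden's measure 1 − ln L_k / ln L_null. The sketch checks this for hypothetical log-likelihood values.

```python
def pseudo_r2_from_deviance(d_null, d_model):
    """R^2_logit = (D_null - D_k) / D_null."""
    return (d_null - d_model) / d_null


loglik_null, loglik_model = -69.3, -55.0   # hypothetical maximized log-likelihoods
d_null, d_model = -2 * loglik_null, -2 * loglik_model
r2 = pseudo_r2_from_deviance(d_null, d_model)
mcfadden = 1 - loglik_model / loglik_null
```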


Chapter 6

Results

In this chapter the effects of the risk factors on the different transitions are discussed, based on the survival and logit models of Chapters 4 and 5, respectively.

6.1 Data set

The results in this section are obtained by fitting separate models for each transition. This is done on an expanded data set, derived from the original data set, containing the durations of the transitions made by the claimants. To correctly account for competing risks, for each claimant with a transition i → j (i ≠ j), a row is added for the other (unobserved) transition that is possible from the initial state. This additional row contains the same information as the original row. Only the event variable differs: it equals zero, since the added transition does not actually take place. Since we also applied the method of episode splitting (see Section 4.5.1), there are many left-truncated durations in our data, so each duration has to be characterized by a 'start' and a 'stop' variable. For censored transitions we do not observe the end state. We deal with this by including two additional rows in the augmented data set, corresponding to the two possible transitions, each with event indicator equal to zero. See [27] for a more extensive explanation of preparing data sets for (discrete) survival analysis. We have chosen to lag the time-dependent covariates corresponding to the business cycle by one month. Lagging a variable ensures that the cause precedes the effect. If the value of the variable that supposedly 'causes' the effect is measured at the same time as the effect, the cause-effect ordering is lost and cannot show up in the results. In this case, lagging ensures that changes in the time-dependent variable precede the actual event. In addition, lagging takes into account that changes in the disability percentages have to be communicated to and administered by the insurance company, which can take some time.
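A sketch of these two preparation steps. The mapping from each state to its possible destinations is an assumption for illustration, based on the four transitions discussed in Section 6.2: state 1 to states 0 and 2, and state 2 to states 0 and 1. The field names are hypothetical. Censored spells (to_state = None) get an event indicator of zero for both destinations.

```python
# Hypothetical destination sets, assumed from the transitions discussed in Section 6.2.
DESTINATIONS = {1: (0, 2), 2: (0, 1)}


def augment(spell):
    """One row per possible destination; event = 1 only for the observed destination.
    spell: dict with 'id', 'start', 'stop', 'from_state' and 'to_state' (None if censored)."""
    rows = []
    for dest in DESTINATIONS[spell["from_state"]]:
        rows.append({
            "id": spell["id"],
            "start": spell["start"],   # left truncation from episode splitting
            "stop": spell["stop"],
            "transition": (spell["from_state"], dest),
            "event": int(spell["to_state"] == dest),
        })
    return rows


def lag_one_month(monthly_values):
    """Value used in month m is the value observed in month m - 1 (None in the first month)."""
    if not monthly_values:
        return []
    return [None] + list(monthly_values[:-1])


observed = augment({"id": 1, "start": 0, "stop": 3, "from_state": 2, "to_state": 1})
censored = augment({"id": 2, "start": 3, "stop": 7, "from_state": 1, "to_state": None})
lagged = lag_one_month([1.0, 2.0, 3.0])   # hypothetical business-cycle indicator
```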


CHAPTER 6. RESULTS 50

6.2 Model without the business cycle

Figure 6.1: Kaplan-Meier estimates of transition-specific survival functions.

On the vertical axis, figure 6.1 shows the proportion of people entering a specific state at time 0 (on a person-specific clock) who are still in that state after a given number of days. Calendar time is ignored in this figure, meaning that the different start dates of different clients play no role in its construction. As expected, each function starts at one and declines monotonically towards zero, indicating that all people eventually jump to another state. We observe that transitions from state 2 to 0 mainly take place in the first months. After this period the probability of a full recovery from a severe disorder is minimal. On the other hand, for the transition from state 1 to state 2 there is still a substantial probability of falling back even after two years. The probabilities of the other two transitions, which reflect a small improvement in health status, are distributed almost equally.
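A transition-specific Kaplan-Meier curve such as those in figure 6.1 can be computed with the product-limit formula. The sketch below is a plain-Python illustration, not the code used for the thesis. It assumes, as is common for transition-specific curves, that exits to the competing state are treated as censored observations; `kaplan_meier` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(durations: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate of S(t) for one transition.

    durations: time spent in the state (e.g. in days)
    events:    1 if the transition of interest was observed, 0 if censored
               (here, exits to the competing state also count as censored)
    Returns (t, S(t)) at each distinct time with at least one observed event.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # events and censorings both leave the risk set after t
    return curve
```

For example, durations `[1, 2, 2, 3, 4]` with events `[1, 1, 0, 1, 0]` give the steps 0,8, 0,6 and 0,3 at times 1, 2 and 3. The curve only reaches zero if the last observation is an event, which is why empirical curves need not end at zero.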

Before the influence of the economy could be addressed, the model without business cycle-related variables had to be estimated. In this model (further referred to as 'the original model') all other possible covariates, as listed in table 3.2, were included. Then, using the information criteria discussed in section D.1.2, insignificant variables were removed by stepwise exclusion. The results of this estimation, together with the exponents of the coefficients and the results of the significance tests, can be found in Appendices A and B. There, a "−" for a variable denotes that this covariate is not included in the model. The estimated coefficients are asymptotically normally distributed and are tested against the null hypothesis H0 : β = 0. In this section the effects of the included risk factors on the different transition probabilities are discussed. Since the results of the survival and logit models are very similar, we restrict the discussion to the survival model. First we note that the variable for the compensation benefit level is insignificant for all transitions. This contradicts the findings of Spierdijk and Koning [32], but is in accordance with Amelink [7].
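Because the estimated coefficients are asymptotically normal, the test of H0 : β = 0 is a Wald test, and the percentages quoted in the discussion below are read off the hazard ratio exp(β). A minimal sketch, assuming a coefficient `beta` and its standard error `se` are available; the helper name `wald_test` and the example standard error are hypothetical.

```python
import math


def wald_test(beta: float, se: float):
    """Two-sided Wald test of H0: beta = 0 for an asymptotically normal estimate.

    Returns (z, p_value, hazard_ratio). The hazard ratio exp(beta) is the factor by
    which the transition intensity is multiplied when the covariate rises by one
    unit, or when a dummy switches from 0 to 1.
    """
    z = beta / se
    # For standard normal Z: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value, math.exp(beta)


# Illustration: a coefficient of log(1,14) corresponds to a 14% higher transition
# rate; the standard error 0,05 is a made-up value for the example.
z, p, hr = wald_test(math.log(1.14), 0.05)
```

With these illustrative inputs z is about 2,6 and p is below 0,05, so the covariate would be kept at the 5% level.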


Transition 1 → 2 (see table A.1)

A transition from state 1 to 2 means that a claimant falls back from a mild disability to a more severe illness. The coefficient of age is insignificant, and this regressor is not incorporated in the final model. Therefore the fall-back rate of an older self-employed is not significantly lower or higher than that of a younger self-employed. At the 5% significance level, females have a 14% higher fall-back rate than males. Almost all dummies for occupational class are significant. Working in construction is an indicator for a higher fall-back rate, while working in the (para)medical or agricultural sector, or in a shop, indicates a lower fall-back rate. Most of the disorder type dummies have an insignificant coefficient; only those corresponding to digestive disorders, psychological disorders and cancer have a p-value smaller than 0,05. Compared to the other types of disorder, claimants with a digestive disorder have a 114% higher fall-back rate and claimants with a psychological disorder a 15% lower one, whereas the presence of cancer indicates an increase of 104%. Finally, neither the previous state nor the year in which the disability started has a significant influence on the transition intensity from a mild to a severe disorder, so these variables are not included in the model. The coefficients of the deferment period are not significant either.

Transition 1 → 0 (see table A.2)

We now consider which risk factors affect the probability of recovery from a mild disability. First we note that the coefficient for age is negative and significant. This implies that older claimants recover more slowly. Surprisingly, the dummy variable for female is insignificant and not included in the model. Hence, recovery from a mild disorder does not depend on the gender of the claimant. As for the occupational classes, a self-employed working in service or in the (para)medical sector has an approximately 20% higher recovery rate, and a claimant working in the agricultural sector a 6% higher recovery rate, than insured working in other professions. The recovery rate is also significantly influenced by almost all disorder types. A digestive disorder indicates a higher recovery rate, whereas locomotive or psychological disorders are indicators for slow recovery. Furthermore, claimants with symptoms, infections or injuries recover faster than average. Cancer, on the other hand, is an indicator for slow recovery. From the significant coefficient for the previous state it can be concluded that a claimant who just experienced a partial recovery (that is, made a transition from state 2 to 1) has a 52% higher recovery rate than a claimant who was healthy before entering state 1. All dummy variables for the year in which the disability started are insignificant, except for the years 2005 and 2006; compared to the year 2003, these years indicate faster recovery. For the deferment period, the longer the period, the lower the recovery rate.

Transition 2 → 0 (see table A.3)

A transition from state 2 to 0 reflects a full recovery from a severe illness. The coefficients for both age and female are significant and negative. This means that females recover more slowly than males, and that the recovery rate decreases with age. There are two significant occupational dummies: shopkeeper and construction. The first decreases the recovery rate by 12%, whereas the latter increases it by 24%. Most of the disorder type dummies are significant, except for infections. Both locomotive and psychological disorders slow down the recovery, although the effect of the latter is much stronger. As with transition 1 → 0, insured with a digestive disorder, symptoms or injuries recover faster. Cancer, on the other hand, is an indicator for a longer duration in state 2, reducing the recovery rate by 75%. Further, compared to the year 2003, insured who became disabled in 2006, 2008 or 2009 recover more slowly. Again, the shorter the deferment period, the higher the recovery rate.


Transition 2 → 1 (see table A.4)

The last transition to discuss is the partial recovery from a severe disorder to a mild disorder. The significant coefficient of age is negative, so the older the self-employed, the lower the recovery rate. Almost all occupational dummies are significant, except the dummy for service. Working in the agricultural or (para)medical sector, or in a shop, influences the recovery rate positively, while persons working in construction recover more slowly. All disorder dummy variables are significant. Psychological disorders and cancer slow down the recovery, whereas the other types of disorder increase the recovery rate. From the significant coefficient for the previous state 1 it can be concluded that a claimant has a lower recovery rate if he was healthy before entering state 2. In contrast to transition 2 → 0, insured who became disabled in later years recover faster than those who became disabled in 2003. As before, the shorter the deferment period, the higher the recovery rate.

6.2.1 Unobserved heterogeneity

Besides the estimation results for the various risk factors, tables A.1-A.4 also report the frailty term, which is significant for all four transition rates. The question arises, however, whether the results would be very different if this frailty term were omitted. Therefore we have run the same model without unobserved heterogeneity. In 7 cases a variable was significant in one model and insignificant in the other. We conclude that adding a frailty term really influences the outcomes of the model and that removing it would lead to different conclusions.

6.2.2 Proportional hazards assumption

After fitting a proportional hazards model, one should test the proportional hazards (PH) assumption. As explained in section 4.4.1, this can be done visually or formally, based on the Schoenfeld residuals¹. The results of this Schoenfeld residuals test for the different transitions are given in Appendix A.1. Note that a large test statistic (and hence a small p-value) indicates a violation of proportionality. From these results it is concluded that, at the 5% significance level, proportionality does not hold in about half of the cases. It should now be investigated how serious these violations of the PH assumption are. Therefore we have plotted the time-dependent coefficients of the covariates for which the proportionality assumption was violated; for transition 1 → 2 these plots are shown in Appendix A.1. In most cases the deviation outside the confidence bounds of βj is numerically small. In some cases there are relatively large deviations, for example for the covariate psychological disorder. However, these deviations mostly occur at the end of the time horizon, when there are relatively few observations. We therefore conclude that the results are 'good enough' to use the proportional hazards model.

¹ This is done using the function cox.zph from the R package survival.
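To make the idea behind the test concrete, the sketch below computes unscaled Schoenfeld residuals for a Cox model with a single covariate and correlates them with the event times. This is a simplified illustration only: cox.zph works with scaled residuals and a formal score-type test, whereas this sketch ignores left truncation (start/stop data) and tie corrections, and its function names are hypothetical. A clear trend of the residuals over time suggests that the coefficient is not constant, i.e. that proportionality is violated.

```python
import math
from typing import List, Sequence, Tuple


def schoenfeld_residuals(times: Sequence[float], events: Sequence[int],
                         x: Sequence[float], beta: float) -> List[Tuple[float, float]]:
    """Unscaled Schoenfeld residuals for a Cox model with one covariate.

    For each observed event at time t_i:
        r_i = x_i - sum_{j in R(t_i)} w_j x_j / sum_{j in R(t_i)} w_j,
    with w_j = exp(beta * x_j) and risk set R(t_i) = {j : times[j] >= t_i}.
    """
    residuals = []
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored observations have no Schoenfeld residual
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        weights = [math.exp(beta * x[j]) for j in risk_set]
        x_bar = sum(w * x[j] for w, j in zip(weights, risk_set)) / sum(weights)
        residuals.append((t_i, x[i] - x_bar))
    return residuals


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation, used here as an informal trend measure of residuals over time."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)
```

In practice `beta` would be the fitted coefficient, and one would correlate the residuals with (a transform of) the event times, which is roughly what the plots of time-dependent coefficients visualize.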


6.3 Business cycle

The influence of a business cycle-related variable was measured by adding it to the original model and considering the values of the information criteria and the corresponding p-value. If the BIC of the extended model was lower than that of the original model and the p-value was significant, we concluded that the variable affects the transition.
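The decision rule above can be written down explicitly. The following is a sketch under assumptions: it takes the p-value from a likelihood-ratio test with one degree of freedom, whereas the thesis may have used the Wald p-value of the added coefficient, and the number of observations entering the BIC (claims, rows or events) is left to the caller. The function names are hypothetical.

```python
import math


def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: -2 log L + k log n (lower is better)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


def lr_pvalue_1df(log_lik_small: float, log_lik_big: float) -> float:
    """p-value of the likelihood-ratio test for one extra parameter (chi-square, 1 df)."""
    stat = 2.0 * (log_lik_big - log_lik_small)
    # For 1 degree of freedom: P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def keeps_variable(ll_orig: float, k_orig: int, ll_ext: float, n_obs: int,
                   alpha: float = 0.05) -> bool:
    """Keep the business cycle variable only if the BIC drops AND the p-value is below alpha."""
    lower_bic = bic(ll_ext, k_orig + 1, n_obs) < bic(ll_orig, k_orig, n_obs)
    return lower_bic and lr_pvalue_1df(ll_orig, ll_ext) < alpha
```

Requiring both conditions is stricter than the p-value alone: with many observations the BIC penalty log(n) per parameter is larger than the 3,84 needed for significance at 5%, so a barely significant variable can still be rejected.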

Survival model

An overview of the coefficients of all variables per transition can be found in tables A.9-A.12. The significant variables and the exponents of their corresponding coefficients are shown in table 6.1. These values should be interpreted as follows. If, for example, the DNB-indicator increases by 1 point, then for transition 1 → 2 the hazard function satisfies $\lambda_{12}^{\text{new}} = 0{,}971 \cdot \lambda_{12}^{\text{old}}$, and for the survival function it holds that $S_{12}^{\text{new}} = \left(S_{12}^{\text{old}}\right)^{0{,}971}$.

                 1 → 2    1 → 0    2 → 0    2 → 1
DNB indicator    0,971    1,021    −        −
Confidence       0,996    1,004    −        −
GDP              0,971    1,013    −        −
Labour market    −        −        −        −
Income           −        −        −        −

Table 6.1: Summary of the significant variables per transition, together with the exponent of the estimated coefficient (based on a 5% significance level).
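The interpretation of table 6.1 follows from the proportional hazards structure: multiplying the hazard by a factor HR multiplies the cumulative hazard by HR, so the survival function is raised to the power HR, and a change of several points compounds multiplicatively. A small sketch (hypothetical helper names; the value 0,5 for the old survival probability is purely illustrative):

```python
def hazard_ratio_for_change(hr_per_unit: float, delta: float) -> float:
    """Hazard ratio for a change of `delta` units in a covariate: exp(beta * delta) = hr ** delta."""
    return hr_per_unit ** delta


def survival_after_shift(s_old: float, hazard_ratio: float) -> float:
    """Proportional hazards: lambda_new = HR * lambda_old, so
    S_new(t) = exp(-HR * Lambda_old(t)) = S_old(t) ** HR."""
    return s_old ** hazard_ratio


# Transition 1 -> 2, DNB-indicator one point higher (HR = 0,971 from table 6.1):
s_new = survival_after_shift(0.5, 0.971)          # about 0,510: fewer fall-backs
two_points = hazard_ratio_for_change(0.971, 2)    # 0,971 ** 2
```

So if half of the claimants would have fallen back by some time t, a one-point higher DNB-indicator lowers that share to roughly 49%, and a two-point increase lowers the fall-back hazard by almost 6%.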

Logit model

The significant variables and the exponents of their corresponding coefficients are shown in table 6.2. An overview of the coefficients of all variables per transition can be found in tables B.5-B.8.

                 1 → 2    1 → 0    2 → 0    2 → 1
DNB indicator    0,970    1,020    −        −
Confidence       0,996    1,004    −        −
GDP              0,970    1,013    −        −
Labour market    −        −        −        −
Income           −        −        −        −

Table 6.2: Summary of the significant variables per transition, together with the exponent of the estimated coefficient.

Again, the results of both models are very similar. Therefore, the results of the logit model are viewed as a confirmation of the results obtained by survival analysis. The variables 'labour market' and 'income' are insignificant for each transition, so we remove these two from further analysis. In the remainder of this thesis only the DNB-indicator, the confidence and the GDP are considered.

Furthermore, at the 5% significance level, the business cycle only affects the transitions 1 → 2 and 1 → 0. In general, a positive business environment leads to a higher recovery rate and a lower fall-back rate. The question arises whether this influence is the same for all professions and disorders. To answer this question, we considered subgroups of claimants by profession and by disorder type and addressed the influence of the various business cycle indicators within each subgroup. The results for the different transitions can be found in Appendices A.2.1 and A.2.2. We summarize them below.

Influence per profession

• The influence of the business cycle on claimants working in construction is significant for the fall-back rate and the recovery from a mild disorder. The variable 'confidence' also significantly affects full recovery from a severe disorder. However, there are no indications of a relation between the business cycle and partial recovery from a severe disorder.

• Insured working in the agricultural sector or as a shopkeeper are not affected by changes in the business cycle.

• For claimants in the (para)medical sector the economic environment only influences full recovery from a mild disorder. All other transitions are unaffected.

• There is only one significant coefficient for insured working in service: the higher the GDP, the lower the fall-back rate, and vice versa.

It can be concluded that the influence of the business cycle is most noticeable for claimants working in construction. The influence of the economy on the other professions is less evident.

Influence per disorder

• The business cycle clearly affects the recovery process of insured with a locomotive disorder, since the variables are significant for all transitions except 2 → 0. Hence, the probability of full recovery from a severe locomotive disorder is unaffected by the economic environment.

• There is no significant relationship between the state of the economy and the recovery process of claimants with a digestive disorder.

• Insured with cancer or a psychological disorder are hardly affected by the business cycle. For both there is only one significant coefficient.

• For insured with disorder type 'symptoms' the economic situation mainly affects one-step recovery, that is, transitions 1 → 0 and 2 → 1. The influence on the fall-back rate is less clear, and there are no significant coefficients for full recovery from a severe disorder.

• When a claimant suffers from an infection, the business cycle only influences transition 1 → 0: the better the economy, the higher the probability of recovery from a mild infection.

• There are indications that the business cycle affects the recovery process of claimants with an injury. However, there is no consistent influence on any one transition in particular.

Referring back to table 2.3, the influence of the business cycle is most evident for claimants with locomotive symptoms. There are also indications of an influence on claimants with a locomotive injury. On the other hand, there is hardly any influence on insured with other types of disability.

In almost all cases, either all three business cycle-related variables are significant or they are all insignificant. The graphs of the DNB-indicator, the confidence and the GDP in appendix C show that they approximately follow the same pattern, only on a different scale. Besides this, the variables are strongly correlated with each other, so adding them all to the model would lead to multicollinearity. Therefore it would be incorrect to include all three; instead, we should pick one. The question is then: which variable should be used to measure the business cycle? Different motivations lead to different answers. For example, the GDP has the advantage of being objective, since its value is determined by the formula:

GDP = private consumption + gross investment + government spending + (exports - imports).

The DNB-indicator, on the other hand, has the benefit of looking 7 months ahead. The insurance company could therefore use this variable to adjust the premium or to develop special recovery programmes at an early stage. Because of this forecasting property, we add the DNB-indicator to our model as the business cycle-related variable.


Chapter 7

Quantifying the influence of the risk factors

In the previous chapter we considered the coefficients and corresponding hazard ratios of the various variables. These only provide information about the influence of a variable on the transition rate. The insurance company, however, is mainly interested in the influence of the risk factors on the average recovery time. In this chapter we calculate the expected duration until recovery, and we show how this duration changes if the value of one of the covariates changes. First we consider the transition probabilities in the multi-state model. These are used to calculate the so-called taboo probabilities, which are needed to determine the expected recovery period. Finally, the results of these calculations are discussed in section 7.2.

7.1 Transition probabilities in the multi-state model

We start this section with a short review of the multi-state model. For an introduction to Markov chains and transition probabilities we refer to section 3.2.1.

Our multi-state model consists of three states, 0, 1 and 2, so the state space is given by $\mathcal{S} = \{0, 1, 2\}$. We define $S(t)$ as the random state at time $t$, where $S(t) \in \mathcal{S}$ and $t > 0$. A transition from one state to another takes place with intensity $\lambda_{ij}(t) = \lambda_{ij}(t \mid x)$, for $i, j \in \mathcal{S}$. The transition probabilities are denoted by

$$P_{ij}(t, s) = \mathbb{P}\bigl(S(s) = j \mid S(t) = i\bigr).$$

The relationship between the transition probabilities and intensities is given by

$$\lambda_{ij}(t) = \lim_{s \downarrow t} \frac{P_{ij}(t, s)}{s - t}, \qquad i \neq j. \tag{7.1}$$

The transition intensities of our model have been estimated in chapter 6. We now want to express the transition probabilities in terms of these transition intensities.

Theorem 2. The transition probabilities and transition intensities satisfy the Kolmogorov backward differential equations

$$\frac{\partial}{\partial t} P_{ij}(t, s) = P_{ij}(t, s) \sum_{l:\, l \neq i} \lambda_{il}(t) - \sum_{k:\, k \neq i} P_{kj}(t, s)\, \lambda_{ik}(t), \tag{7.2}$$


CHAPTER 7. QUANTIFYING THE INFLUENCE OF THE RISK FACTORS 57

and the Kolmogorov forward differential equations

$$\frac{\partial}{\partial s} P_{ij}(t, s) = \sum_{k:\, k \neq j} P_{ik}(t, s)\, \lambda_{kj}(s) - P_{ij}(t, s) \sum_{l:\, l \neq j} \lambda_{jl}(s). \tag{7.3}$$

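Proof. By the Markov property (definition 3.1) and the law of total probability we have, for $t < u < s$,

$$\begin{aligned}
\mathbb{P}\bigl(S(t) = i, S(s) = j\bigr) &= \sum_{k} \mathbb{P}\bigl(S(t) = i, S(u) = k, S(s) = j\bigr) \\
&= \sum_{k} \mathbb{P}\bigl(S(s) = j \mid S(u) = k\bigr)\, \mathbb{P}\bigl(S(u) = k \mid S(t) = i\bigr)\, \mathbb{P}\bigl(S(t) = i\bigr).
\end{aligned}$$

Dividing both sides by $\mathbb{P}(S(t) = i)$ results in

$$P_{ij}(t, s) = \sum_{k} P_{ik}(t, u)\, P_{kj}(u, s) \qquad \forall\, i, j \in \mathcal{S},\ 0 < t < u < s.$$

These equations are known as the Chapman-Kolmogorov equations. Next we consider

$$\begin{aligned}
P_{ij}(u, s) - P_{ij}(t, s) &= P_{ij}(u, s) - \sum_{k} P_{ik}(t, u)\, P_{kj}(u, s) \\
&= \bigl[1 - P_{ii}(t, u)\bigr] P_{ij}(u, s) - \sum_{k:\, k \neq i} P_{ik}(t, u)\, P_{kj}(u, s).
\end{aligned}$$

Dividing by $u - t$, using $1 - P_{ii}(t, u) = \sum_{l:\, l \neq i} P_{il}(t, u)$ and taking the limit $u \downarrow t$ results in

$$\begin{aligned}
\lim_{u \downarrow t} \frac{P_{ij}(u, s) - P_{ij}(t, s)}{u - t}
&= \lim_{u \downarrow t} \left[ \frac{1 - P_{ii}(t, u)}{u - t}\, P_{ij}(u, s) - \sum_{k:\, k \neq i} \frac{P_{ik}(t, u)}{u - t}\, P_{kj}(u, s) \right] \\
&= \lim_{u \downarrow t} \left[ \sum_{l:\, l \neq i} \frac{P_{il}(t, u)}{u - t}\, P_{ij}(u, s) - \sum_{k:\, k \neq i} P_{kj}(u, s)\, \frac{P_{ik}(t, u)}{u - t} \right].
\end{aligned}$$

Since the state space is finite, the sums have finitely many terms, so we may interchange the limit and the summation. Using (7.1) and the continuity of $P_{kj}(\cdot, s)$, this results in

$$\frac{\partial}{\partial t} P_{ij}(t, s) = P_{ij}(t, s) \sum_{l:\, l \neq i} \lambda_{il}(t) - \sum_{k:\, k \neq i} P_{kj}(t, s)\, \lambda_{ik}(t).$$

Hence (7.2) is proven. The proof of (7.3) is analogous.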
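As a numerical sanity check on these equations, one can integrate the forward equations (7.3), which in matrix form read ∂P(t, s)/∂s = P(t, s) G(s) with generator G(s) having off-diagonal entries λ_kj(s) and diagonal entries minus the row sums, and verify that the rows sum to one and that the Chapman-Kolmogorov equations hold. The sketch below uses a crude Euler scheme and hypothetical intensities (with state 0 treated as absorbing); it is an illustration, not the method used in this thesis.

```python
from typing import Callable, List

Matrix = List[List[float]]


def generator(lam: Matrix) -> Matrix:
    """Generator matrix: off-diagonal intensities lam[i][j], diagonal -sum_{l != i} lam[i][l]."""
    n = len(lam)
    G = [[lam[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        G[i][i] = -sum(G[i][j] for j in range(n) if j != i)
    return G


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def forward_solve(lam_fn: Callable[[float], Matrix], t: float, s: float,
                  steps: int = 2000) -> Matrix:
    """Euler scheme for the forward equations: dP(t, s)/ds = P(t, s) G(s), P(t, t) = I."""
    n = len(lam_fn(t))
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    h = (s - t) / steps
    for k in range(steps):
        dP = matmul(P, generator(lam_fn(t + k * h)))
        P = [[P[i][j] + h * dP[i][j] for j in range(n)] for i in range(n)]
    return P


def example_intensities(u: float) -> Matrix:
    """Hypothetical monthly intensities with decreasing recovery rates (not estimates)."""
    return [[0.0, 0.0, 0.0],
            [0.30 / (1 + 0.1 * u), 0.0, 0.02],
            [0.10 / (1 + 0.1 * u), 0.15, 0.0]]
```

Comparing `forward_solve(example_intensities, 0, 2)` with the product of the solutions over [0, 1] and [1, 2] checks the Chapman-Kolmogorov equations up to discretization error.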

However, deriving the transition probabilities from the Kolmogorov differential equations is rather complex. To simplify this, Haberman and Pitacco [20] propose to assume time-homogeneous transition intensities. Unfortunately, this approach is not applicable in our situation, since there is negative duration dependence, as shown in chapter 6. Another possible solution is to consider the occupancy probabilities:

Definition 8. The occupancy probability is the probability of staying in a state without interruption; for $i \in \mathcal{S}$ and $t < s$ it is defined as

$$\overline{P}_{ii}(t, s) := \mathbb{P}\bigl(S(u) = i\ \ \forall u \in [t, s] \mid S(t) = i\bigr). \tag{7.4}$$


This definition differs from that of $P_{ii}(t, s)$, which is the probability of being in state $i$ at both times $t$ and $s$, while other states $j \neq i$ may have been visited between $t$ and $s$. As before, the relationship between occupancy probabilities and transition intensities can be derived from the Kolmogorov backward differential equations:

$$\frac{\partial}{\partial t} \overline{P}_{ii}(t, s) = \overline{P}_{ii}(t, s) \sum_{j:\, j \neq i} \lambda_{ij}(t).$$

These equations can be solved using the boundary condition $\overline{P}_{ii}(t, t) = 1$, resulting in

$$\overline{P}_{ii}(t, s) = \exp\left( -\int_t^s \sum_{j:\, j \neq i} \lambda_{ij}(u)\, du \right), \tag{7.5}$$

as can be verified by differentiation.

In order to determine the expected duration until recovery we need the transition probabilities between the different states. As explained above, however, we have no closed-form expression for $P_{ij}(t, s)$, whereas we do have a closed-form expression for the occupancy probabilities. Using (7.5) we have, for $t < s$:

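$$\overline{P}_{11}(t, s) = \exp\left( -\int_t^s \bigl[\lambda_{12}(u) + \lambda_{10}(u)\bigr]\, du \right); \tag{7.6}$$

$$\overline{P}_{22}(t, s) = \exp\left( -\int_t^s \bigl[\lambda_{21}(u) + \lambda_{20}(u)\bigr]\, du \right). \tag{7.7}$$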

Since our time axis $t$ is on a monthly scale, we assume that an insured makes at most one transition between $t$ and $t + 1$; otherwise he stays in his current state. In our case we therefore have $\overline{P}_{ii}(t, t + 1) = P_{ii}(t, t + 1)$, which makes it possible to derive closed-form expressions for the one-month transition probabilities $P_{ij}(t, t + 1)$. The probability that a claimant leaves state $i$ in the interval $(t, t + 1]$ is equal to $1 - P_{ii}(t, t + 1)$. This probability is divided over the two possible destination states in proportion to the integrated transition intensities. This results in the following one-month transition and occupancy probabilities:

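$$P_{11}(t, t+1) = \exp\left( -\int_t^{t+1} \bigl[\lambda_{10}(u) + \lambda_{12}(u)\bigr]\, du \right); \tag{7.8}$$

$$P_{10}(t, t+1) = \bigl(1 - P_{11}(t, t+1)\bigr) \frac{\int_t^{t+1} \lambda_{10}(u)\, du}{\int_t^{t+1} \bigl[\lambda_{10}(u) + \lambda_{12}(u)\bigr]\, du}; \tag{7.9}$$

$$P_{12}(t, t+1) = 1 - P_{11}(t, t+1) - P_{10}(t, t+1); \tag{7.10}$$

$$P_{22}(t, t+1) = \exp\left( -\int_t^{t+1} \bigl[\lambda_{20}(u) + \lambda_{21}(u)\bigr]\, du \right); \tag{7.11}$$

$$P_{20}(t, t+1) = \bigl(1 - P_{22}(t, t+1)\bigr) \frac{\int_t^{t+1} \lambda_{20}(u)\, du}{\int_t^{t+1} \bigl[\lambda_{20}(u) + \lambda_{21}(u)\bigr]\, du}; \tag{7.12}$$

$$P_{21}(t, t+1) = 1 - P_{22}(t, t+1) - P_{20}(t, t+1). \tag{7.13}$$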
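Equations (7.8)-(7.13) translate directly into code once the intensities can be evaluated. A minimal sketch, assuming the estimated intensities are available as Python functions of time (hypothetical inputs `lam_to0` and `lam_other`) and using a trapezoidal rule for the integrals; the thesis does not state which quadrature was used.

```python
import math
from typing import Callable, Tuple


def integrate(f: Callable[[float], float], a: float, b: float, n: int = 200) -> float:
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, n)))


def one_month_probs(lam_to0: Callable[[float], float],
                    lam_other: Callable[[float], float],
                    t: float) -> Tuple[float, float, float]:
    """One-month probabilities out of disability state i over (t, t + 1], as in (7.8)-(7.13).

    lam_to0:   intensity lambda_i0 towards recovery (state 0)
    lam_other: intensity towards the other disability state
    Returns (P_ii, P_i0, P_i,other).
    """
    L0 = integrate(lam_to0, t, t + 1)
    L_other = integrate(lam_other, t, t + 1)
    total = L0 + L_other
    if total == 0.0:
        return 1.0, 0.0, 0.0                  # no exit possible in this month
    p_stay = math.exp(-total)                 # (7.8) / (7.11)
    p_to0 = (1.0 - p_stay) * L0 / total       # (7.9) / (7.12): split by integrated intensity
    p_other = 1.0 - p_stay - p_to0            # (7.10) / (7.13)
    return p_stay, p_to0, p_other
```

For constant intensities 0,2 (to state 0) and 0,1 (to the other state), the claimant stays with probability exp(−0,3) ≈ 0,741, and two thirds of the remaining probability goes to recovery.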

With these probabilities it is now possible to calculate $P_{ij}(t, s)$ for every pair $t < s$. However, in order to determine the expected duration until recovery we do not need all these probabilities. We only need the probability of starting in state 1 or 2, following some path through these two states and arriving in state 0 for the first time at time $t$ (for all $t = 2, 3, \dots$). This concept is captured in the so-called taboo probabilities.

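Definition 9. For $i \in \{1, 2\}$ the taboo probability is given by

$$q_{i0}(1, t) = \mathbb{P}_i\bigl[S(2) \neq 0, \dots, S(t - 1) \neq 0,\ S(t) = 0\bigr], \qquad t = 2, 3, \dots,$$

$$q_{i0}(1, 1) = 0,$$

where $\mathbb{P}_i$ denotes the probability conditional on $S(1) = i$.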

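In order to calculate the taboo probabilities we define the matrices $Q(t)$ and $R(t)$, $t = 1, 2, \dots$, by

$$Q(t) = \begin{pmatrix} 0 & 0 & 0 \\ 0 & P_{11}(t, t+1) & P_{12}(t, t+1) \\ 0 & P_{21}(t, t+1) & P_{22}(t, t+1) \end{pmatrix}, \tag{7.14}$$

$$R(t) = \begin{pmatrix} 0 & 0 & 0 \\ P_{10}(t, t+1) & 0 & 0 \\ P_{20}(t, t+1) & 0 & 0 \end{pmatrix}, \tag{7.15}$$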

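where the entries are defined by (7.8)-(7.13), and rows and columns are indexed by the states 0, 1, 2. Combining this, we obtain, for $i \in \{1, 2\}$ and $t = 2, 3, \dots$,

$$q_{i0}(1, t) = \left( \prod_{s=1}^{t-2} Q(s) \cdot R(t - 1) \right)_{i0}, \tag{7.16}$$

where the empty product (for $t = 2$) is the identity matrix.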

Using definition 9, the expected duration until recovery $ER_i$ of a self-employed with risk factors $X = x$, starting in state $i$, is given by

$$ER_i = \sum_{t=1}^{\infty} q_{i0}(1, t) \cdot (t - 1). \tag{7.17}$$

Note that this sum runs from $t = 1$ to infinity, but the transition intensities are only known for $t = 1, \dots, 100$. At $t = 101$ some claims are still continuing. Neglecting these claims would seriously bias the expected duration downwards. Therefore we assume that these claims last for 101 months.
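The recursion (7.16) and the truncated sum (7.17) can be combined in one loop that keeps a running matrix product. This is a sketch under assumptions: `Q(s)` and `R(s)` are hypothetical callables returning the 3 × 3 matrices (7.14)-(7.15) as nested lists, for example filled with one-month probabilities computed as in (7.8)-(7.13), and the remaining probability mass is assigned 101 months as described above.

```python
from typing import Callable, List

Matrix = List[List[float]]


def matmul3(A: Matrix, B: Matrix) -> Matrix:
    """Product of two 3x3 matrices (rows/columns indexed by states 0, 1, 2)."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def expected_duration(Q: Callable[[int], Matrix], R: Callable[[int], Matrix], i: int,
                      n_known: int = 100, tail_duration: float = 101.0) -> float:
    """Expected months until recovery ER_i from state i, following (7.16)-(7.17).

    Q(s), R(s) return the one-month matrices (7.14)-(7.15) for s = 1, ..., n_known.
    Probability mass not yet in state 0 at t = n_known + 1 is assigned `tail_duration` months.
    """
    M = [[1.0 if a == b else 0.0 for b in range(3)] for a in range(3)]  # empty product = I
    er, absorbed = 0.0, 0.0
    for t in range(2, n_known + 2):          # t = 2, ..., n_known + 1
        q = matmul3(M, R(t - 1))[i][0]       # taboo probability q_i0(1, t)
        er += q * (t - 1)                    # recovery at time t means a duration of t - 1 months
        absorbed += q
        M = matmul3(M, Q(t - 1))             # M becomes Q(1) ... Q(t - 1)
    return er + (1.0 - absorbed) * tail_duration
```

With constant one-month probabilities P11 = P10 = 0,5 and P12 = 0, for instance, the taboo probabilities form a geometric distribution and the result is close to 2 months, which is a convenient check of the indexing.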

7.2 Expected duration until recovery

In order to determine the influence of the different risk factors on the expected recovery time, we calculate the expected duration until recovery as described in the previous section. First, we do the calculations for a benchmark self-employed starting in state 1, and we repeat this for the same self-employed starting in state 2. Subsequently, we change the value of one of the covariates and re-calculate the expected duration until recovery. In this way we can measure the impact of a change in one of the risk factors.

As benchmark self-employed we choose the 'average' insured: a male of 42, working in the agricultural sector and suffering from locomotive symptoms (also known as an L1-disorder). Like the majority of the claimants, he has a deferment period of 14 days. The time-dependent DNB-indicator representing the business cycle poses a problem: since the future values of this variable are unknown, it cannot be used directly to predict the expected recovery time. We solve this by assuming that the value of the indicator is constant during a spell. For the benchmark self-employed it is assumed that this value is equal to the average value during 2003−2011, namely 0.

                                State 1   State 2
Benchmark                          9,88     14,64
Age = 60                          18,20     25,34
Gender = Female                   11,16     16,36
Profession = Construction         10,16     15,36
Profession = (Para)Medical         7,67     12,32
Profession = Service               8,14     13,25
Profession = Shopkeeper           11,37     17,61
Disorder = Loc. injury             5,19      9,27
Disorder = Psych. symptoms         9,92     20,57
Disorder = Digestive               5,74      8,38
Disorder = Dig. cancer            13,40     30,28
Deferment = B                     12,91     19,12
Deferment = C                     19,35     25,93
Deferment = D                     28,58     40,42
DNB = −2                          10,72     15,46
DNB = 2                            9,11     13,86

Table 7.1: Estimates of the expected recovery time (in months) for self-employed with different characteristics, starting in disability state 1 or 2. The benchmark self-employed is a male of 42 working in the agricultural sector, suffering from locomotive symptoms, with a deferment period of 14 days and the DNB-indicator equal to 0.

The results of the calculations can be found in table 7.1. The first column shows the expected duration until recovery for claimants starting with a mild disorder; the second column gives the results for claimants starting with a severe disorder. We consider a difference in expectation significant if and only if it is an increase or decrease of at least 5% relative to the benchmark; this is a practical materiality threshold rather than a statistical test. So for state 1, expectations are considered significantly different when they are outside the interval [9,39; 10,37], and for state 2 when they are outside the interval [13,91; 15,37].

We start by discussing the result for the benchmark self-employed and subsequently discuss the influence of the various risk factors. The expected duration until recovery of the benchmark self-employed depends significantly on whether he starts in state 1 or 2. If he starts in state 1, his expected duration until recovery is 9,9 months. If he had started in state 2 instead, his disability spell would last an expected 14,6 months. Comparing the columns of table 7.1 for the other characteristics, in all cases the expected duration when starting in state 2 is significantly higher than when starting in state 1. We conclude that the expected duration until recovery of a self-employed starting with a mild disorder is significantly lower than that of a self-employed starting with a severe disorder. Furthermore, in both cases the expected duration until recovery rises significantly when the claimant is 60 years old. For female insured we also see an increase in the expected recovery time.
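The ±5% rule can be checked against table 7.1 with a few lines of arithmetic. A small sketch; the helper names are hypothetical, and the rule is the practical threshold stated above, not a statistical test.

```python
from typing import Tuple


def materiality_band(benchmark: float, tol: float = 0.05) -> Tuple[float, float]:
    """Interval within which an expectation counts as 'not significantly different'."""
    return benchmark * (1.0 - tol), benchmark * (1.0 + tol)


def differs(value: float, benchmark: float, tol: float = 0.05) -> bool:
    """True if `value` deviates from the benchmark by at least `tol` (relative)."""
    low, high = materiality_band(benchmark, tol)
    return value <= low or value >= high


# Benchmarks from table 7.1: 9,88 months (state 1) and 14,64 months (state 2).
band_state1 = materiality_band(9.88)    # (9,386; 10,374), i.e. [9,39; 10,37] rounded
band_state2 = materiality_band(14.64)   # (13,908; 15,372), i.e. [13,91; 15,37] rounded
```

Applied to the construction row (10,16 and 15,36 months) both values fall inside the bands, which is why that effect is called insignificant below, whereas the shopkeeper value 11,37 for state 1 lies outside.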

Concentrating on the different occupational classes, claimants working in the (para)medical sector or in service recover significantly faster than claimants in the agricultural sector. On the other hand, working as a shopkeeper significantly increases the expected duration until recovery. For claimants working in construction we see a slightly increased expectation of 0,3 (state 1) and 0,7 months (state 2). This is an increase of less than 5%, hence it is insignificant.

Changes in the disorder type lead in almost all cases to significantly different estimates of the expected duration until recovery from both a mild and a severe disorder. In both cases, claimants with locomotive injuries or a digestive disorder recover significantly faster than claimants with locomotive symptoms. On the other hand, claimants with digestive cancer experience a much longer disability spell: when they start in state 2, the expected duration until recovery is more than double that of the benchmark self-employed. Insured with a mild psychological disorder show no significant deviation from the expected duration of the benchmark. However, for insured with a severe psychological disorder we observe an increase in expected duration of almost 6 months (20,57 versus 14,64 months).

For insured starting in state 1 as well as in state 2, the length of the deferment period is a significant indicator for the duration of the disability spell: the longer the deferment period, the longer it takes for a claimant to recover. There are probably two reasons for this. First, claimants sometimes do not report their disability when its duration is shorter than the deferment period. Second, a selection effect can play an important role. Most claimants choose a longer deferment period either when they have enough savings to cover the first period, or when they expect not to experience (many) short spells. So when they do suffer from a disorder, there is a higher chance that the spell will last longer than average.

An increase in the DNB-indicator reflects an improvement of the economic situation. When the DNB-indicator increases from −2 to 2, the expected recovery period decreases significantly, for both state 1 and 2. For claimants starting with a mild disorder the expected duration is 10,72 months when the indicator equals −2, and 9,11 months when it equals 2. For claimants starting with a severe disorder these values are 15,46 and 13,86 months, respectively. Hence, in both cases we observe a decline in expected recovery period of 1,6 months. We conclude that an increase in the DNB-indicator leads to a decrease in the expected duration until recovery.
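A rough way to translate these numbers into an effect per point of the indicator is linear interpolation between the two evaluated values. This is an assumption for illustration only: the expected duration is not exactly linear in the DNB-indicator. The sketch uses the values of table 7.1 and the observed 2003−2011 range of the indicator (−2,02 to 2,74); `months_per_point` is a hypothetical helper.

```python
def months_per_point(duration_low: float, duration_high: float,
                     dnb_low: float = -2.0, dnb_high: float = 2.0) -> float:
    """Average change in expected duration per point of the DNB-indicator,
    obtained by linear interpolation between two evaluated indicator values."""
    return (duration_high - duration_low) / (dnb_high - dnb_low)


# Table 7.1 values (DNB = -2 and DNB = 2) and the observed range of the indicator.
observed_range = 2.74 - (-2.02)
for label, d_low, d_high in [("state 1", 10.72, 9.11), ("state 2", 15.46, 13.86)]:
    slope = months_per_point(d_low, d_high)
    print(label, "months per point:", round(slope, 2),
          "effect over observed range:", round(slope * observed_range, 2))
```

Both states give roughly −0,4 months per point, so over the observed range of 4,76 points the implied difference is about 1,9 months.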

7.2.1 Other causes of the fluctuations in the loss ratio

Over the period 2003–2011 the DNB-indicator ranged from −2,02 (August 2008) to 2,74 (July 2007). This implies a maximum difference in expected duration until recovery of approximately 1,9 months. Comparing Figures 1.3 and C.1a, we note that in the first years an increase in the DNB-indicator corresponds to a decrease in the percentage of insured with a claim, and vice versa, as expected. In the period after June 2009, however, this is no longer the case. We therefore assume that the fluctuations in the loss ratio are not explained solely by changes in the business cycle, but that other processes are also at work. For example, both the men/women ratio and the ratio between the different professions have changed over time. In 2003 over 91% of the claims were made by men; this share declined to approximately 85% in 2011. Since women were shown to recover more slowly in general, this shift is another reason for the increase in average recovery time. Another change in the composition of the portfolio concerns the shares of the different occupational classes, as shown in Figure 7.1.


CHAPTER 7. QUANTIFYING THE INFLUENCE OF THE RISK FACTORS 62

Figure 7.1: Change in the shares of the professions by year in which the disability started.

The percentage of claimants working in the agricultural sector decreased rapidly, while the percentages of the other professions increased. For example, the percentage of claimants working in the service sector doubled over the last two years. This matters because the different professions have different expected durations until recovery, as shown in Table 7.1. Furthermore, the characteristics of the claimants differ per profession, as shown in Figures 7.2a and 7.2b.

Figure 7.2: (a) Disorders per profession (in %). (b) Ratio men/women per profession (in %).


Chapter 8

Conclusion and advice

The aim of this thesis was to explain the fluctuations in the loss ratio of the disability insurance company and to advise on how this explanation could be used to improve the current models. We used a unique data set containing more than 30.000 claims during the period 2003–2011. The focus was primarily on the influence of the business cycle on the recovery process. In order to differentiate between negative, partial and full recovery, a multi-state model was introduced. The model consisted of three states: one healthy state and two disability states. Only the transitions between the two disability states and from the disability states to the healthy state were considered. These transitions were estimated by both a proportional hazards model and a logistic regression model. First a basic model was created, after which the influence of the various business cycle-related variables was addressed. The results of the two estimation methods were almost identical: the business cycle mainly affects the fall-back rate and full recovery from a mild disorder. Both transitions are affected by the DNB-indicator, the leading variable 'confidence' and the coincident variable GDP. On the other hand, the lagging variable 'labour' and the average income of self-employed were shown to have no significant influence on the recovery process.
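One common way to set up a logit model for transitions like these is as a discrete-time hazard model: each stay in a disability state is expanded into one record per month, with a binary outcome indicating whether the transition of interest occurs in that month. The sketch below assumes monthly observation and a hypothetical record layout (`Stay`, `person_months`); it only builds the data set, and the resulting rows, together with the covariates, would then be passed to any logistic regression routine. Here the month counter restarts in each state; the 'time in system' covariate of Appendix B may instead be measured from the start of the claim.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Stay:
    """One uninterrupted stay in a disability state (hypothetical record layout)."""
    claim_id: int
    state: int              # 1 = mild disorder, 2 = severe disorder
    months: int             # whole months spent in this state
    exit_to: Optional[int]  # 0, 1 or 2; None if the stay is censored


def person_months(stays: List[Stay], origin: int, target: int) -> List[Tuple[int, int, int]]:
    """Expand stays in `origin` into monthly rows for the transition origin -> target.

    Each row is (claim_id, month_in_state, event). The event indicator is 1
    only in the last month of a stay that ends in `target`; exits to any
    other state, and censored stays, contribute rows with event 0 only.
    """
    rows = []
    for stay in stays:
        if stay.state != origin:
            continue
        for month in range(1, stay.months + 1):
            event = int(month == stay.months and stay.exit_to == target)
            rows.append((stay.claim_id, month, event))
    return rows


# Claim 1: mild for 3 months, then recovered. Claim 2: mild for 2 months,
# fell back to severe for 4 months, then recovered.
stays = [Stay(1, 1, 3, 0), Stay(2, 1, 2, 2), Stay(2, 2, 4, 0)]
rows_1_to_0 = person_months(stays, origin=1, target=0)
rows_1_to_2 = person_months(stays, origin=1, target=2)
```

Treating exits to other states as censoring is what makes each transition (1→0, 1→2, 2→0, 2→1) a separate binary model on the same expanded data.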

For transition 1 → 0 the business cycle-related variables have a positive effect: the higher their values, the higher the rate of full recovery from a mild disorder. For transition 1 → 2 it is exactly the other way around: the higher the values of the business cycle-related variables, the lower the fall-back rate. From these findings it can be concluded that a positive business climate results in less negative recovery and more full recovery from a mild disorder. We also investigated whether these results differ per profession or disorder. The influence of the business cycle turned out to be most significant for claimants working in construction and claimants suffering from locomotive symptoms. For the other occupational classes and types of disability the influence was barely noticeable or absent.
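In a proportional hazards model, a change Δx in a covariate multiplies the transition intensity by exp(βΔx). As an illustration using the DNB-indicator coefficients of Tables A.10 and A.9, a change from −2 to 2 (Δx = 4) gives, for full recovery from a mild disorder and for the fall-back rate respectively:

$$\exp(4 \times 0{,}022) \approx 1{,}09, \qquad \exp\bigl(4 \times (-0{,}030)\bigr) \approx 0{,}89,$$

that is, roughly a 9% higher rate of full recovery and an 11% lower fall-back rate, holding the other covariates fixed.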

These findings could be used when constructing an internal model for Solvency II. Since disability spells last longer in periods of low confidence, low GDP and a low value of the DNB-indicator, the insurance company should increase the required loss reserves in these periods, and vice versa. Another advantage of knowing what causes slow recovery is that the company can provide extra services, such as prevention or reintegration services, to counter long sick-leave durations in times of economic downturn. Since claimants with a psychological disorder or cancer were shown to be hardly, if at all, influenced by changes in the business climate, extra reintegration services are unlikely to be beneficial for these claimants. The same holds, for example, for claimants working in the agricultural sector or as a shopkeeper.


CHAPTER 8. CONCLUSION AND ADVICE 64

In order to quantify the influence of the different risk factors, one of the business cycle-related variables had to be added to the model. The DNB-indicator was chosen because of its forecasting property. Subsequently, the influence of the risk factors was assessed by determining the expected duration until recovery, both when starting with a mild disorder and when starting with a severe disorder. These results were compared to the expected durations of the benchmark self-employed. It was shown that a difference in the DNB-indicator can account for a longer expected duration until recovery of at most 1,9 months. It can therefore be concluded that changes in the business cycle did influence the percentage of insured with a claim and thus affected the loss ratio of the insurance company. However, compared to the other risk factors, the influence of the DNB-indicator is relatively small. Moreover, the composition of the portfolio was not constant but changed over time. For instance, the men/women ratio declined, meaning that in recent years there were more women and fewer men than in earlier years. The shares of the different professions also changed during the last decade: the percentage of claimants working in the agricultural sector decreased, while the contribution of the other professions to the portfolio increased. This development is important because the characteristics of the claimants, for example the frequency of the various disorders, differ per profession.
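The 1,9-month bound can be reproduced by scaling the decline of 1,6 months found for an indicator change from −2 to 2 to the observed range of the indicator, assuming the effect on the expected duration is approximately linear over this range (a simplification, since the covariate acts multiplicatively on the hazards):

$$\frac{1{,}6}{2-(-2)} \times \bigl(2{,}74-(-2{,}02)\bigr) = 0{,}4 \times 4{,}76 \approx 1{,}9 \text{ months}.$$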

Finally, some recommendations for further research are given. First, it would be interesting to quantify the influence of the change in the portfolio. In this way it could be determined which part of the increase in the average disability duration is caused by the business cycle and which part by changes in the composition of the insured. Second, when determining the expected duration until recovery, it was assumed that the value of the time-dependent DNB-indicator was constant during a spell. To obtain more accurate expectations, one should account for changes in the DNB-indicator over time. Last, we only considered data from the years 2003–2011, which cover one period of economic growth and one of economic decline. By adding the years 1995–2002 to the data set one would be able to compare two economic cycles, since that period includes the peak of the dot-com bubble.
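The first recommendation could, for instance, be approached with a shift-share decomposition, which splits the change in mean duration between two periods into a composition part (changing group shares, with durations held at their initial values) and a within-group part (changing durations within each group, which would absorb business-cycle effects). The sketch below is one such decomposition, not a method used in this thesis; the male/female shares are the 91% and 85% mentioned in Section 7.2.1, but the mean durations per group are made-up numbers.

```python
def shift_share(shares0, means0, shares1, means1):
    """Split the change in an overall mean into composition and within-group parts.

    Overall mean = sum_g share_g * mean_g. With period-0 means as reference:
      composition = sum_g (share1_g - share0_g) * mean0_g
      within      = sum_g share1_g * (mean1_g - mean0_g)
    The two parts add up exactly to the total change.
    """
    total0 = sum(s * m for s, m in zip(shares0, means0))
    total1 = sum(s * m for s, m in zip(shares1, means1))
    composition = sum((s1 - s0) * m0 for s0, s1, m0 in zip(shares0, shares1, means0))
    within = sum(s1 * (m1 - m0) for s1, m0, m1 in zip(shares1, means0, means1))
    return total1 - total0, composition, within


# Groups: (men, women). Shares from the text; mean durations (months) are hypothetical.
total, composition, within = shift_share([0.91, 0.09], [9.0, 11.0], [0.85, 0.15], [9.5, 11.5])
```

With these inputs the mean rises by 0,62 months, of which 0,12 is attributed to the changing gender mix and 0,50 to longer durations within both groups. Other reference choices (for example, period-1 durations in the composition term) can give a different split, so the decomposition is not unique.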


APPENDIX A. ESTIMATION RESULTS OF THE MPH MODEL 66

Appendix A

Estimation results of the MPH model

                  Coef     Exp(Coef)  se(Coef)  p-Value
Age               -        -          -         -
Female            0,142    1,153      0,054     0,008
Agriculture       −0,110   0,895      0,049     0,024
Construction      0,178    1,195      0,054     0,003
(Para)Medical     −0,204   0,816      0,077     0,008
Service           −0,177   0,838      0,101     0,078
Shopkeeper        −0,242   0,785      0,074     0,001
Locomotive        0,006    1,006      0,049     0,900
Psychological     −0,146   0,864      0,070     0,036
Digestive         0,761    2,141      0,084     0,000
Symptoms          −0,002   0,998      0,042     0,970
Cancer            0,695    2,004      0,101     0,000
Infections        −0,257   0,773      0,160     0,110
Injury            −0,043   0,957      0,053     0,410
Prev. state       -        -          -         -
Year 2004         -        -          -         -
Year 2005         -        -          -         -
Year 2006         -        -          -         -
Year 2007         -        -          -         -
Year 2008         -        -          -         -
Year 2009         -        -          -         -
Year 2010         -        -          -         -
Year 2011         -        -          -         -
Deferment B       −0,050   0,951      0,042     0,230
Deferment C       −0,074   0,929      0,053     0,160
Deferment D       −0,479   0,619      0,542     0,380
Amount insured    -        -          -         -

Frailty 0,001

Table A.1: Estimation results for the MPH model, transition 1→ 2.
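The reported p-values appear consistent with two-sided Wald tests, in which z = coef/se(coef) is compared with a standard normal distribution; for Female in Table A.1, for instance, 0,142/0,054 ≈ 2,63 gives p ≈ 0,009, close to the reported 0,008 given the rounding of the coefficient and standard error. The tables do not state which test was used, so the sketch below is an assumption, and the function name is hypothetical.

```python
import math


def wald_p_value(coef, se):
    """Two-sided p-value of the Wald statistic z = coef / se under N(0, 1).

    Uses 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    z = coef / se
    return math.erfc(abs(z) / math.sqrt(2.0))


p_female = wald_p_value(0.142, 0.054)  # Female, Table A.1
```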


                  Coef     Exp(Coef)  se(Coef)  p-Value
Age               −0,020   0,980      0,001     0,000
Female            -        -          -         -
Agriculture       0,060    1,062      0,026     0,018
Construction      0,055    1,057      0,032     0,079
(Para)Medical     0,176    1,193      0,040     0,000
Service           0,180    1,197      0,050     0,000
Shopkeeper        −0,044   0,957      0,038     0,250
Locomotive        −0,115   0,891      0,026     0,000
Psychological     −0,167   0,846      0,035     0,000
Digestive         0,452    1,572      0,049     0,000
Symptoms          0,185    1,203      0,022     0,000
Cancer            −0,155   0,856      0,077     0,043
Infections        0,145    1,156      0,073     0,046
Injury            0,528    1,695      0,026     0,000
Prev. state 2     0,419    1,521      0,020     0,000
Year 2004         0,024    1,024      0,035     0,500
Year 2005         0,120    1,127      0,035     0,001
Year 2006         0,072    1,074      0,035     0,040
Year 2007         0,062    1,064      0,035     0,079
Year 2008         0,034    1,035      0,035     0,330
Year 2009         0,029    1,030      0,035     0,410
Year 2010         0,056    1,058      0,038     0,140
Year 2011         0,007    1,007      0,068     0,920
Deferment B       −0,161   0,851      0,021     0,000
Deferment C       −0,431   0,650      0,029     0,000
Deferment D       −0,722   0,486      0,360     0,045
Amount insured    -        -          -         -

Frailty 0,000

Table A.2: Estimation results for the MPH model, transition 1→ 0.


                  Coef     Exp(Coef)  se(Coef)  p-Value
Age               −0,026   0,975      0,002     0,000
Female            −0,142   0,862      0,055     0,010
Agriculture       −0,089   0,970      0,046     0,054
Construction      0,244    1,243      0,050     0,000
(Para)Medical     0,059    1,084      0,076     0,440
Service           0,155    1,170      0,089     0,080
Shopkeeper        −0,160   0,881      0,073     0,028
Locomotive        −0,410   0,708      0,045     0,000
Psychological     −1,007   0,398      0,067     0,000
Digestive         0,593    1,786      0,068     0,000
Symptoms          0,277    1,285      0,043     0,000
Cancer            −1,484   0,247      0,126     0,000
Infections        0,010    1,016      0,132     0,940
Injury            0,485    1,564      0,048     0,000
Prev. state 1     0,059    1,061      0,057     0,300
Year 2004         −0,026   0,003      0,062     0,680
Year 2005         −0,054   0,114      0,062     0,390
Year 2006         −0,134   0,063      0,063     0,033
Year 2007         −0,056   0,040      0,063     0,370
Year 2008         −0,232   0,015      0,063     0,000
Year 2009         −0,188   0,007      0,063     0,003
Year 2010         −0,116   0,030      0,066     0,077
Year 2011         −0,313   0,978      0,113     0,006
Deferment B       −0,237   1,251      0,038     0,000
Deferment C       −0,402   1,040      0,055     0,000
Deferment D       −0,322   1,062      0,421     0,440
Amount insured    -        -          -         -

Frailty 0,000

Table A.3: Estimation results for the MPH model, transition 2→ 0.


                  Coef     Exp(Coef)  se(Coef)  p-Value
Age               −0,006   0,994      0,001     0,011
Female            -        -          -         -
Agriculture       0,344    1,410      0,033     0,000
Construction      −0,194   0,823      0,039     0,000
(Para)Medical     0,262    1,300      0,051     0,000
Service           0,007    1,007      0,065     0,910
Shopkeeper        0,147    1,159      0,051     0,004
Locomotive        0,270    1,310      0,034     0,000
Psychological     −0,263   0,769      0,045     0,000
Digestive         0,126    1,134      0,059     0,033
Symptoms          0,152    1,164      0,030     0,000
Cancer            −1,059   0,347      0,073     0,000
Infections        0,270    1,310      0,103     0,009
Injury            0,115    1,122      0,035     0,001
Prev. state 1     0,100    1,105      0,032     0,002
Year 2004         0,169    1,184      0,047     0,000
Year 2005         0,238    1,269      0,047     0,000
Year 2006         0,191    1,211      0,047     0,000
Year 2007         0,274    1,316      0,047     0,000
Year 2008         0,167    1,182      0,046     0,000
Year 2009         0,142    1,153      0,047     0,002
Year 2010         0,207    1,230      0,049     0,000
Year 2011         0,132    1,142      0,083     0,110
Deferment B       −0,118   0,888      0,028     0,000
Deferment C       −0,230   0,794      0,038     0,000
Deferment D       −1,378   0,252      0,343     0,000
Amount insured    -        -          -         -

Frailty 0,000

Table A.4: Estimation results for the MPH model, transition 2→ 1.


A.1 PH assumption

                  χ²-stat.  p-value
Female            1,561     0,211
Agriculture       1,160     0,281
Construction      0,036     0,851
(Para)Medical     3,148     0,076
Service           3,997     0,046
Shopkeeper        0,040     0,841
Locomotive        3,475     0,062
Psychological     0,023     0,879
Digestive         4,507     0,034
Symptoms          1,770     0,183
Cancer            7,766     0,005
Infections        10,294    0,001
Injury            11,054    0,001
Deferment B       7,882     0,005
Deferment C       1,877     0,171
Deferment D       0,967     0,325
GLOBAL            105,129   0,000

Table A.5: Test statistics of the univariate tests for non-proportionality and of the global test of proportional hazards for transition 1→2.

                  χ²-stat.  p-value
Age               19,10     0,000
Female            15,00     0,000
Agriculture       3,690     0,055
Construction      2,540     0,111
(Para)Medical     1,850     0,174
Service           0,567     0,452
Shopkeeper        2,030     0,154
Locomotive        9,410     0,002
Psychological     273,0     0,000
Digestive         0,858     0,354
Symptoms          19,30     0,000
Cancer            0,123     0,726
Infections        0,161     0,688
Injury            108,0     0,000
Year 2004         0,670     0,413
Year 2005         0,450     0,502
Year 2006         0,360     0,548
Year 2007         4,800     0,028
Year 2008         0,118     0,731
Year 2009         1,260     0,261
Year 2010         0,472     0,492
Year 2011         5,090     0,024
Previous state 2  34,00     0,000
Deferment B       52,00     0,000
Deferment C       102,0     0,000
Deferment D       0,002     0,966
GLOBAL            920,0     0,000

Table A.6: Test statistics of the univariate tests for non-proportionality and of the global test of proportional hazards for transition 1→0.


                  χ²-stat.  p-value
Age               0,844     0,358
Female            5,720     0,017
Agriculture       4,280     0,039
Construction      0,003     0,955
(Para)Medical     14,80     0,000
Service           0,255     0,613
Shopkeeper        0,102     0,750
Locomotive        47,20     0,000
Psychological     167,0     0,000
Digestive         14,20     0,000
Symptoms          7,970     0,005
Cancer            0,264     0,608
Infections        0,594     0,441
Injury            12,10     0,001
Year 2004         4,160     0,041
Year 2005         1,690     0,193
Year 2006         7,780     0,005
Year 2007         2,060     0,152
Year 2008         4,110     0,043
Year 2009         8,690     0,003
Year 2010         12,70     0,000
Year 2011         1,070     0,301
Previous state 2  0,909     0,340
Deferment B       2,260     0,133
Deferment C       0,039     0,843
Deferment D       0,261     0,609
GLOBAL            251,0     0,000

Table A.7: Test statistics of the univariate tests for non-proportionality and of the global test of proportional hazards for transition 2→0.

                  χ²-stat.  p-value
Age               9,106     0,003
Agriculture       0,014     0,906
Construction      6,095     0,014
(Para)Medical     0,683     0,408
Service           3,447     0,063
Shopkeeper        5,526     0,019
Locomotive        3,239     0,072
Psychological     16,621    0,000
Digestive         0,013     0,910
Symptoms          8,606     0,003
Cancer            21,125    0,000
Infections        0,868     0,352
Injury            2,110     0,146
Year 2004         0,522     0,470
Year 2005         1,097     0,295
Year 2006         6,228     0,013
Year 2007         2,513     0,113
Year 2008         6,919     0,008
Year 2009         21,064    0,000
Year 2010         15,137    0,000
Year 2011         6,389     0,012
Previous state 2  80,522    0,000
Deferment B       8,140     0,004
Deferment C       77,106    0,000
Deferment D       0,639     0,424
GLOBAL            89,486    0,000

Table A.8: Test statistics of the univariate tests for non-proportionality and of the global test of proportional hazards for transition 2→1.


(a) Service, p-value: 0,046 (b) Digestive, p-value: 0,034

(c) Cancer, p-value: 0,005 (d) Infections, p-value: 0,001

(e) Injury, p-value: 0,001 (f) Deferment B, p-value: 0,005

Figure A.1: Time-dependent coefficient plots for the variables for which the proportional hazards assumption is violated (transition 1→2, see Table A.5).


A.2 Business cycle

                 Coefficient  Exp(coef)  p-value
DNB indicator    −0,030       0,971      0,013
Confidence       −0,004       0,996      0,050
GDP              −0,029       0,971      0,000
Labour market    −0,001       0,999      0,610
Income           −0,013       0,987      0,110

Table A.9: Coefficients, hazard ratios and p-values for the business cycle variables, transition 1→2.

                 Coefficient  Exp(coef)  p-value
DNB indicator    0,022        1,022      0,026
Confidence       0,004        1,004      0,003
GDP              0,013        1,013      0,018
Labour market    −0,000       1,000      0,750
Income           0,005        1,005      0,520

Table A.10: Coefficients, hazard ratios and p-values for the business cycle variables, transition 1→0.

                 Coefficient  Exp(coef)  p-value
DNB indicator    0,004        1,004      0,870
Confidence       0,005        1,005      0,110
GDP              0,014        1,014      0,280
Labour market    −0,001       0,999      0,670
Income           0,021        1,021      0,220

Table A.11: Coefficients, hazard ratios and p-values for the business cycle variables, transition 2→0.

                 Coefficient  Exp(coef)  p-value
DNB indicator    −0,019       0,981      0,110
Confidence       −0,002       0,998      0,210
GDP              −0,013       0,987      0,056
Labour market    0,000        1,000      0,830
Income           0,001        1,001      0,900

Table A.12: Coefficients, hazard ratios and p-values for the business cycle variables, transition 2→1.


A.2.1 Influence business cycle per profession

Agriculture

               1→2     1→0     2→0     2→1
DNB indicator  -       -       -       -
Confidence     -       -       -       -
GDP            -       -       -       -

Table A.13: Profession = agriculture: summary of the significant variables per transition together with the exponent of the estimated coefficient (based on a 5% significance level).

Construction

               1→2     1→0     2→0     2→1
DNB indicator  0,941   1,059   -       -
Confidence     0,988   1,010   1,015   -
GDP            0,950   1,038   -       -

Table A.14: Profession = construction: summary of the significant variables per transition together with the exponent of the estimated coefficient (based on a 5% significance level).

(Para)Medical

               1→2     1→0     2→0     2→1
DNB indicator  -       1,064   -       -
Confidence     -       1,010   -       -
GDP            -       -       -       -

Table A.15: Profession = (para)medical: summary of the significant variables per transition together with the exponent of the estimated coefficient (based on a 5% significance level).

Service

               1→2     1→0     2→0     2→1
DNB indicator  -       -       -       -
Confidence     -       -       -       -
GDP            0,928   -       -       -

Table A.16: Profession = service: summary of the significant variables per transition together with the exponent of the estimated coefficient (based on a 5% significance level).


Shopkeeper

               1→2     1→0     2→0     2→1
DNB indicator  -       -       -       -
Confidence     -       -       -       -
GDP            -       -       -       -

Table A.17: Profession = shopkeeper: summary of the significant variables per transition together with the exponent of the estimated coefficient (based on a 5% significance level).

A.2.2 Influence business cycle per disorder

Locomotive

               1→2     1→0     2→0     2→1
DNB indicator  0,971   1,013   -       1,024
Confidence     -       1,003   -       1,004
GDP            0,966   1,009   -       -

Table A.18: Disorder = locomotive: summary of the significant variables per transition together with the exponent of the estimated coefficient (based on a 5% significance level).

Psychological

               1→2     1→0     2→0     2→1
DNB indicator  -       -       -       -
Confidence     -       -       -       -
GDP            0,952   -       -       -

Table A.19: Disorder = psychological: summary of the significant variables per transition together with the exponent of the estimated coefficient (based on a 5% significance level).

Digestive

               1→2     1→0     2→0     2→1
DNB indicator  -       -       -       -
Confidence     -       -       -       -
GDP            -       -       -       -

Table A.20: Disorder = digestive: summary of the significant variables per transition together with the exponent of the estimated coefficient (based on a 5% significance level).


Symptoms

               1→2     1→0     2→0     2→1
DNB indicator  -       1,024   -       1,026
Confidence     -       1,004   -       1,004
GDP            0,973   1,013   -       1,014

Table A.21: Disorder type = symptoms: summary of the significant variables per transition together with the exponent of the estimated coefficient (based on a 5% significance level).

Cancer

               1→2     1→0     2→0     2→1
DNB indicator  -       -       -       -
Confidence     -       -       -       -
GDP            -       -       -       0,949

Table A.22: Disorder type = cancer: summary of the significant variables per transition together with the exponent of the estimated coefficient (based on a 5% significance level).

Infections

               1→2     1→0     2→0     2→1
DNB indicator  -       1,131   -       -
Confidence     -       1,018   -       -
GDP            -       1,050   -       -

Table A.23: Disorder type = infections: summary of the significant variables per transition together with the exponent of the estimated coefficient (based on a 5% significance level).

Injury

               1→2     1→0     2→0     2→1
DNB indicator  -       -       -       1,026
Confidence     -       -       -       -
GDP            0,968   -       -       -

Table A.24: Disorder type = injury: summary of the significant variables per transition together with the exponent of the estimated coefficient (based on a 5% significance level).


APPENDIX B. ESTIMATION RESULTS OF THE LOGIT MODEL 78

Appendix B

Estimation results of the logit model

                  Coef     Exp(Coef)  se(Coef)  p-Value
Age               -        -          -         -
Female            0,147    1,158      0,054     0,006
Agriculture       −0,104   0,925      0,050     0,036
Construction      0,180    1,110      0,059     0,002
(Para)Medical     −0,202   0,817      0,078     0,010
Service           −0,150   0,861      0,105     0,152
Shopkeeper        −0,242   0,785      0,074     0,001
Locomotive        0,036    1,037      0,048     0,940
Psychological     −0,154   0,857      0,069     0,0025
Digestive         0,759    2,136      0,085     0,000
Symptoms          0,010    1,010      0,041     0,804
Cancer            0,638    1,893      0,101     0,000
Infections        −0,220   0,803      0,163     0,177
Injury            −0,003   0,997      0,052     0,948
Prev. state 2     -        -          -         -
Year 2003         -        -          -         -
Year 2004         -        -          -         -
Year 2005         -        -          -         -
Year 2006         -        -          -         -
Year 2007         -        -          -         -
Year 2008         -        -          -         -
Year 2009         -        -          -         -
Year 2010         -        -          -         -
Deferment B       −0,063   0,939      0,042     0,136
Deferment C       −0,099   0,906      0,053     0,060
Deferment D       −0,446   0,640      0,575     0,439
Amount insured    -        -          -         -
Time in system    −0,041   0,960      0,002     0,000

Table B.1: Estimation results for the logit model, transition 1→ 2.


                  Coef     Exp(Coef)  se(Coef)  p-Value
Age               −0,022   0,978      0,001     0,000
Female            −0,042   0,959      0,029     0,145
Agriculture       0,048    1,049      0,026     0,068
Construction      0,042    1,043      0,033     0,206
(Para)Medical     0,184    1,202      0,040     0,000
Service           0,184    1,202      0,050     0,003
Shopkeeper        −0,044   0,957      0,038     0,237
Locomotive        −0,111   0,895      0,027     0,000
Psychological     −0,092   0,912      0,032     0,004
Digestive         0,467    1,595      0,053     0,000
Symptoms          0,172    1,188      0,022     0,000
Cancer            −0,108   0,898      0,075     0,150
Infections        0,165    1,179      0,071     0,020
Injury            0,512    1,669      0,028     0,000
Prev. state 2     0,532    1,702      0,020     0,000
Year 2004         0,003    1,003      0,037     0,936
Year 2005         0,108    1,114      0,036     0,003
Year 2006         0,061    1,063      0,036     0,089
Year 2007         0,039    1,040      0,037     0,286
Year 2008         0,015    1,015      0,036     0,674
Year 2009         0,007    1,007      0,036     0,851
Year 2010         0,030    1,030      0,039     0,447
Year 2011         −0,022   0,978      0,072     0,763
Deferment B       −0,147   0,834      0,022     0,000
Deferment C       −0,405   0,641      0,029     0,000
Deferment D       −0,706   0,463      0,346     0,042
Amount insured    -        -          -         -
Time in system    −0,126   0,882      0,002     0,000

Table B.2: Estimation results for the logit model, transition 1→ 0.


                  Coef     Exp(Coef)  se(Coef)  p-Value
Age               −0,024   0,976      0,002     0,000
Female            −0,125   0,882      0,051     0,014
Agriculture       −0,080   0,923      0,043     0,063
Construction      0,226    1,254      0,046     0,000
(Para)Medical     0,049    1,050      0,071     0,487
Service           0,139    1,149      0,082     0,090
Shopkeeper        −0,135   0,874      0,067     0,045
Locomotive        −0,301   0,740      0,042     0,000
Psychological     −0,829   0,436      0,060     0,000
Digestive         0,628    1,874      0,064     0,000
Symptoms          0,231    1,260      0,039     0,000
Cancer            −1,376   0,253      0,121     0,000
Infections        0,052    1,053      0,124     0,673
Injury            0,409    1,505      0,044     0,000
Prev. state 1     0,151    1,163      0,054     0,005
Year 2004         −0,067   0,935      0,057     0,240
Year 2005         −0,075   0,928      0,057     0,189
Year 2006         −0,165   0,848      0,058     0,005
Year 2007         −0,085   0,919      0,058     0,142
Year 2008         −0,260   0,771      0,058     0,000
Year 2009         −0,219   0,803      0,057     0,000
Year 2010         −0,164   0,849      0,061     0,007
Year 2011         −0,347   0,707      0,107     0,001
Deferment B       −0,217   0,805      0,035     0,000
Deferment C       −0,376   0,687      0,051     0,000
Deferment D       −0,300   0,741      0,382     0,433
Amount insured    -        -          -         -
Time in system    −0,180   0,835      0,006     0,000

Table B.3: Estimation results for the logit model, transition 2→ 0.


                  Coef     Exp(Coef)  se(Coef)  p-Value
Age               −0,007   0,993      0,001     0,000
Female            -        -          -         -
Agriculture       0,317    1,373      0,031     0,000
Construction      −0,158   0,854      0,036     0,000
(Para)Medical     0,239    1,270      0,048     0,000
Service           0,032    1,033      0,062     0,603
Shopkeeper        0,091    1,095      0,048     0,058
Locomotive        0,293    1,340      0,034     0,000
Psychological     −0,131   0,877      0,041     0,001
Digestive         0,124    1,132      0,057     0,030
Symptoms          0,112    1,119      0,028     0,000
Cancer            −0,783   0,457      0,065     0,000
Infections        0,308    1,361      0,096     0,001
Injury            0,083    1,087      0,033     0,011
Prev. state 1     0,610    1,840      0,032     0,000
Year 2004         0,114    1,121      0,043     0,008
Year 2005         0,170    1,185      0,044     0,000
Year 2006         0,108    1,114      0,044     0,013
Year 2007         0,201    1,223      0,044     0,000
Year 2008         0,083    1,087      0,043     0,054
Year 2009         0,023    1,023      0,044     0,606
Year 2010         0,116    1,123      0,045     0,010
Year 2011         0,061    1,063      0,073     0,405
Deferment B       −0,093   0,911      0,027     0,001
Deferment C       −0,118   0,889      0,034     0,001
Deferment D       −1,042   0,353      0,289     0,000
Amount insured    -        -          -         -
Time in system    −0,064   0,938      0,002     0,000

Table B.4: Estimation results for the logit model, transition 2→ 1.

B.1 Business cycle

                 Coefficient  Exp(coef)  p-value
DNB indicator    −0,030       0,970      0,012
Confidence       −0,004       0,996      0,032
GDP              −0,030       0,970      0,000
Labour           −0,005       0,995      0,707
Income           −0,012       0,988      0,117

Table B.5: Coefficients, odds ratios and p-values for the business cycle variables, transition 1→2.


                 Coefficient  Exp(coef)  p-value
DNB indicator    0,020        1,020      0,006
Confidence       0,004        1,004      0,006
GDP              0,013        1,013      0,027
Labour           −0,007       0,993      0,599
Income           0,003        1,003      0,686

Table B.6: Coefficients, odds ratios and p-values for the business cycle variables, transition 1→0.

                 Coefficient  Exp(coef)  p-value
DNB indicator    −0,005       0,995      0,818
Confidence       0,004        1,004      0,224
GDP              0,013        1,013      0,321
Labour           −0,003       0,997      0,388
Income           0,012        1,012      0,494

Table B.7: Coefficients, odds ratios and p-values for the business cycle variables, transition 2→0.

                 Coefficient  Exp(coef)  p-value
DNB indicator    −0,017       0,983      0,165
Confidence       −0,001       0,999      0,544
GDP              −0,011       0,989      0,094
Labour           0,005        1,005      0,739
Income           0,004        1,004      0,631

Table B.8: Coefficients, odds ratios and p-values for the business cycle variables, transition 2→1.


Appendix C

Business cycle

This appendix plots the significant business cycle-related variables for the years 2003–2011. All three graphs show approximately the same pattern, but on a different scale.

(a) DNB-indicator (b) Confidence

(c) GDP

Figure C.1: The values of the significant business cycle-related variables during the period 2003−2011.


Appendix D

Comparison of models

This appendix explains how different models can be compared. We used these methods in Chapters 4 and 5 to obtain the best-fitting survival and logit models.

D.1 Comparing non-nested models

Models are non-nested if neither can be expressed as a constrained version of the other. Non-nested models may include the same predictor variables, but they may also involve variables that are unique to each model. Discriminating between nested models is possible with a standard hypothesis test of the parametric restriction that reduces one model to the other. In the non-nested case, however, alternative methods are needed, for example information criteria such as Akaike's Information Criterion (AIC) or the Bayesian Information Criterion (BIC). These are full-sample criteria that, in model selection, seek to balance accuracy of estimation against the 'best' approximation to reality. The essential intuition behind all these criteria is that there is a tension between model fit, as measured by the maximized log-likelihood, and the principle of parsimony, which favours a simple model. The fit of a model can be improved by increasing its complexity, but parameters are only added if the resulting improvement in fit sufficiently compensates for the loss of parsimony. The information criteria differ in how steeply they penalize model complexity. Before introducing them, we first explain the principle of maximum likelihood estimation.
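For reference, AIC and BIC are usually defined as AIC = −2l + 2k and BIC = −2l + k ln n, where l is the maximized log-likelihood, k the number of estimated parameters and n the number of observations; the model with the lower value is preferred, and BIC penalizes extra parameters more steeply than AIC once n ≥ 8 (so that ln n > 2). The sketch below compares two hypothetical non-nested models with made-up log-likelihoods and parameter counts.

```python
import math


def aic(loglik, k):
    """Akaike's Information Criterion: -2 * l + 2 * k."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Bayesian Information Criterion: -2 * l + k * ln(n)."""
    return -2.0 * loglik + k * math.log(n)


# Two hypothetical non-nested models fitted to the same n observations.
n = 1000
models = {"model A": (-1520.4, 12), "model B": (-1515.9, 18)}
for name, (ll, k) in models.items():
    print(name, round(aic(ll, k), 1), round(bic(ll, k, n), 1))
```

Here model B fits better (higher log-likelihood), but its six extra parameters outweigh the gain under both criteria, so model A is preferred.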

D.1.1 Maximum likelihood

The principle of maximum likelihood (ML) is based on distributional assumptions about the data. Suppose we have $n$ random variables with observations $y_i$ ($i = 1, \dots, n$) for a sample of $n$ independent subjects. The probability density function of $y_i$ is denoted by $f(y_i \mid \theta)$, where $\theta$ is a parameter characterizing the distribution. ML is an estimation principle that finds the estimate of one or more unknown parameters (such as $\theta$) that maximizes the likelihood of observing the data $y_i$, $i = 1, \dots, n$. The likelihood $L$ of a model can be interpreted as the probability (for continuous data, the probability density) of the observed data $y$, given that model. The probability (density) of observing $y_i$ is given by $f(y_i \mid \theta)$. Because the observations on the $n$ subjects are assumed to be independent, the joint density function of all observations is the product of the individual densities:

$$L(\theta) = \prod_{i=1}^{n} f(y_i \mid \theta) \qquad \text{(D.1)}$$


This is called the likelihood and is a function of the unknown parameter $\theta$. A parameter value $\theta_1$ is more likely than another value $\theta_2$, in light of the observed data, if it makes those data more probable. In that case:

$$L(\theta_1 \mid y) > L(\theta_2 \mid y).$$

The expression for the likelihood can be simplified by taking its natural logarithm, in which case the product in (D.1) is replaced by a sum:

$$l(\theta) = \ln\left(\prod_{i=1}^{n} f(y_i \mid \theta)\right) = \sum_{i=1}^{n} \ln f(y_i \mid \theta). \qquad \text{(D.2)}$$

Since the natural logarithm is a strictly increasing function, maximizing the log-likelihood (D.2) yields the same estimates as maximizing the likelihood (D.1).
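To make (D.1) and (D.2) concrete, the sketch below assumes, purely for illustration, exponentially distributed durations with density $f(y \mid \theta) = \theta e^{-\theta y}$ and a small made-up sample (not data from this thesis). It evaluates the likelihood and the log-likelihood on a grid of $\theta$ values and checks that both are maximized at the same point, which for this model coincides with the closed-form estimate $\hat\theta = n / \sum_i y_i$.

```python
import math


def likelihood(theta, ys):
    """L(theta) as in (D.1): product of exponential densities theta * exp(-theta * y)."""
    result = 1.0
    for y in ys:
        result *= theta * math.exp(-theta * y)
    return result


def log_likelihood(theta, ys):
    """l(theta) as in (D.2): sum of log-densities ln(theta) - theta * y."""
    return sum(math.log(theta) - theta * y for y in ys)


# Hypothetical durations (e.g. weeks until recovery); illustrative only.
durations = [2.0, 5.0, 1.5, 8.0, 3.5]

# Grid of candidate parameter values: 0.001, 0.002, ..., 2.000.
grid = [i / 1000 for i in range(1, 2001)]

# Grid search for the maximizer of L and of l separately.
theta_L = max(grid, key=lambda t: likelihood(t, durations))
theta_l = max(grid, key=lambda t: log_likelihood(t, durations))

# Analytic ML estimate for the exponential model: n / sum(y) = 1 / mean.
theta_closed = len(durations) / sum(durations)

print(theta_L, theta_l, theta_closed)
```

Both grid searches return 0.25, equal to $5/20$. For realistic sample sizes the product in (D.1) quickly underflows to zero in floating-point arithmetic, which is a practical reason, besides analytic convenience, to work with the sum in (D.2).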

D.1.2 Information Criteria

Akaike (1974) proposed Akaike's Information Criterion as a simple criterion for comparing model fit. It takes into account the number of regression coefficients being estimated; given equal fit of two models, the more parsimonious model (i.e. the one with fewer predictors) has a lower, and hence better, AIC. The criterion consists of minus twice the maximized log-likelihood plus an additional term that penalizes lack of parsimony:

$$\mathrm{AIC} = -2\ln L + 2k,$$

where $L$ is the maximized value of the likelihood (D.1) and $k$ is the number of estimated parameters, including the intercept.

Schwarz (1978) criticized Akaike's criterion as being asymptotically non-optimal. He proposed a revised form of the penalty function by introducing the Bayesian Information Criterion:

$$\mathrm{BIC} = -2\ln L + (\ln N)\,k,$$

where $N$ is the number of observations. The BIC may be negative or positive; the lower (more negative) its value, the better the model. The same holds for the AIC.
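The two criteria can disagree. The sketch below computes both for two hypothetical fitted models; the maximized log-likelihoods, parameter counts and sample size are invented for illustration and are not results from this thesis.

```python
import math


def aic(log_lik, k):
    """Akaike's Information Criterion: -2 ln L + 2k (lower is better)."""
    return -2.0 * log_lik + 2.0 * k


def bic(log_lik, k, n_obs):
    """Bayesian Information Criterion: -2 ln L + (ln N) k (lower is better)."""
    return -2.0 * log_lik + math.log(n_obs) * k


# Hypothetical models: (maximized log-likelihood, number of parameters k).
models = {"A": (-1000.0, 5), "B": (-995.0, 8)}
N = 1000  # hypothetical number of observations

for name, (ll, k) in models.items():
    print(name, round(aic(ll, k), 2), round(bic(ll, k, N), 2))

# Select the model with the smallest criterion value.
best_aic = min(models, key=lambda m: aic(*models[m]))
best_bic = min(models, key=lambda m: bic(*models[m], N))
print("AIC prefers", best_aic, "| BIC prefers", best_bic)
```

Here the three extra parameters of model B lower $-2\ln L$ by 10. That exceeds the AIC penalty increase of $2 \times 3 = 6$, so the AIC prefers B, but not the BIC penalty increase of $3 \ln 1000 \approx 20.7$, so the BIC prefers A.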

If model parsimony is important, the BIC is more widely used, since the model-size penalty of the AIC is relatively low. Given their simplicity, penalized likelihood criteria are often used for selecting 'the best model'. However, there is no clear answer as to which criterion, if any, should be preferred. From a decision-theoretic point of view, the choice of a model from a set of candidate models should depend on the intended use of that model [8].
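The difference in penalty can be made explicit. Comparing the two formulas above, each additional parameter raises the AIC by 2 and the BIC by $\ln N$, so the BIC penalizes complexity more heavily as soon as

$$\ln N > 2 \iff N > e^{2} \approx 7.39,$$

that is, for every sample of 8 or more observations. Taking, purely for illustration, $N = 30\,000$, each extra parameter costs $\ln 30\,000 \approx 10.31$ in the BIC against 2 in the AIC.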


Bibliography

[1] W.W.M. Abeysekera and M.R. Sooriyarachchi, Use of Schoenfeld's Global Test to Test the Proportional Hazards Assumption in the Cox Proportional Hazards Model: an Application to a Clinical Study, J. Natn. Sci. Foundation Sri Lanka 2009 37(1): 41-51.

[2] A. Agresti, Categorical Data Analysis, New York: John Wiley & Sons Inc, 2002.

[3] P.K. Andersen, O. Borgan, R.D. Gill and N. Keiding, Statistical Models Based on Counting Processes, Springer Series in Statistics, Springer-Verlag, 1993.

[4] B.H. Baltagi, Econometric Analysis of Panel Data, John Wiley & Sons, Ltd, second edition, 2001.

[5] P.D. Allison, Logistic Regression Using the SAS System: Theory and Application, SAS Institute Inc., 1999.

[6] P.D. Allison, Survival Analysis using SAS: A Practical Guide, SAS Institute Inc., USA, 2010.

[7] R. Amelink, Disability Durations of Dutch Self-Employed; assessing how certain risk factors, particular variables related to the business cycle, influence disability durations of Dutch self-employed, Faculty of Economics and Business, Groningen, 2010.

[8] D. Beal, Information Criteria Methods in SAS for Multiple Linear Regression Models, Science Applications International Corporation, Oak Ridge, TN.

[9] G. J. van den Berg, Duration Models: Specification, Identification and Multiple Durations, Handbook of Econometrics, 2001.

[10] P. W. Bultena, Application of Claim Reserving in a Multiple State Model for Disability Insurance, Faculty of Economics and Business, Groningen, 2009.

[11] A. Cameron and P. Trivedi, Microeconometrics; methods and applications, Cambridge University Press, 2005.

[12] J. Cohen, P. Cohen, S. West and L. Aiken, Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, Lawrence Erlbaum Associates, Publisher, London, 2003.

[13] D.R. Cox and D. Oakes, Analysis of Survival Data, Chapman and Hall, London, New York, 1984.

[14] D. Delong, G. Guirguis and Y. So, Efficient Computation of Subset Selection Probabilities with Application to Cox Regression, Biometrika 81: 607-611, 1994.


[15] J. De Ravin, The Management of Disability Income Claims, Australian Actuarial Journal, Volume IV Issue 4, The Institute of Actuaries of Australia, 1998.

[16] S.P. Jenkins, Survival Analysis, Lecture notes, University of Essex, 2005.

[17] J. Cramer, Logit Models From Economics and Other Fields, Cambridge University Press, 2003.

[18] J. Cramer, The Logit Model for Economists, Routledge, Chapman and Hall Inc, 1991.

[19] R. F. Engle, Wald, Likelihood Ratio, and Lagrange Multiplier Tests in Econometrics, Handbook of Econometrics, Volume II, Elsevier Science Publishers BV, 1984.

[20] S. Haberman and E. Pitacco, Actuarial Models for Disability Insurance, Chapman & Hall, 1999.

[21] Hair, Black, Babin, Anderson and Tatham, Multivariate Data Analysis, Pearson Prentice Hall, New Jersey, 2006.

[22] K. van Harn and P. Holewijn, Markov-ketens in diskrete tijd, Epsilon Uitgaven, Utrecht, 1991.

[23] C. Heij, P. de Boer, P. H. Franses, T. Kloek and H. van Dijk, Econometric Methods with Applications in Business and Economics, Oxford University Press, 2004.

[24] T. Lancaster, The Econometric Analysis of Transition Data, Cambridge University Press, 1990.

[25] P. Leeflang, D. Wittink, M. Wedel and P. Naert, Building Models for Marketing Decisions, Kluwer Academic Publishers, Dordrecht, 2000.

[26] K. Liang and S. Zeger, Longitudinal Data Analysis Using Generalized Linear Models, Biometrika 73 (1): 13-22, 1986.

[27] M. Mills, Introducing Survival and Event History Analysis, Sage Publications Ltd, 2011.

[28] F. van Ruth, B. Schouten and R. Wekker, The Statistics Netherlands' Business Cycle Tracer. Methodological Aspects; concept, cycle computation and indicator selection, 2005.

[29] D. Schoenfeld, Residuals for the proportional hazards regression model, Biometrika, 1982.

[30] D. Service and K. Ferris, Disability Experience and Economic Correlations, Institute of Actuaries of Australia Convention, 2001.

[31] H.J. Smoluk, Long-Term Disability Claims Rates and the Consumption-to-Wealth Ratio, The Journal of Risk and Insurance 2009, Vol. 76, No. 1, 109-131.

[32] L. Spierdijk and R. Koning, Sickness Absenteeism Among Self-Employed: Determinants of Return to Work, 2010.

[33] L. Spierdijk, et al., The Determinants of Sick Leave Durations of Dutch Self-Employed, J. Health Econ. 2009.

[34] T. Therneau and P. Grambsch, Modeling Survival Data: Extending the Cox Model, Springer-Verlag, 2000.

[35] J.M. Wooldridge, Introductory Econometrics: A Modern Approach, 2008.


[36] D. Zhang, Analysis of Survival Data, Lecture notes, Department of Statistics, North Carolina State University, 2005.

[37] http://www.ats.ucla.edu/stat/sas/seminars/sas logistic/logistic1.htm.

[38] http://www.cbs.nl/.

[39] http://www.dnb.nl/en/onderzoek-2/dnb238497.jsp.