applications of copulas



UNDERSTANDING RELATIONSHIPS USING COPULAS*

Edward W. Frees† and Emiliano A. Valdez‡

ABSTRACT

This article introduces actuaries to the concept of "copulas," a tool for understanding relationships among multivariate outcomes. A copula is a function that links univariate marginals to their full multivariate distribution. Copulas were introduced in 1959 in the context of probabilistic metric spaces. The literature on the statistical properties and applications of copulas has been developing rapidly in recent years. This article explores some of these practical applications, including estimation of joint life mortality and multidecrement models. In addition, we describe basic properties of copulas, their relationships to measures of dependence, and several families of copulas that have appeared in the literature. An annotated bibliography provides a resource for researchers and practitioners who wish to continue their study of copulas. For those who wish to use copulas for statistical inference, we illustrate statistical inference procedures by using insurance company data on losses and expenses. For these data, we (1) show how to fit copulas and (2) describe their usefulness by pricing a reinsurance contract and estimating expenses for pre-specified losses.

1. INTRODUCTION

As emphasized in "General Principles of Actuarial Science" (Committee on Actuarial Principles 1997), actuaries strive to understand stochastic outcomes of financial security systems. Because these systems are generally complex, outcomes are measured in several dimensions. Describing relationships among different dimensions of an outcome is a basic actuarial technique for explaining the behavior of financial security systems to concerned business and public policy decision-makers. This article introduces the concept of a "copula" function as a tool for relating different dimensions of an outcome.

Understanding relationships among multivariate outcomes is a basic problem in statistical science; it is not specific to actuarial science nor is it new. In the late nineteenth century, Sir Francis Galton made a fundamental contribution to understanding multivariate relationships with his introduction of regression analysis. In one dataset described in his 1885 presidential address to the anthropological section of the British Association for the Advancement of Science, Galton linked the distribution of heights of adult children to the distribution of their parents' heights. Galton showed not only that each distribution was approximately normal but also that the joint distribution could be described as a bivariate normal. Thus, the conditional distribution of adult children's height, given the parents' height, could also be described by using a normal distribution. As a by-product of his analysis, Galton observed that "tall parents tend to have tall children although not as tall as the parents" (and vice versa for short children). From this, he incorrectly inferred that children would "regress to mediocrity" in subsequent generations, hence suggesting the term that has become known as regression analysis. See Seal (1967) and Stigler (1986) for additional accounts of the works of Galton and other early contributors to statistical science.

*This paper was originally presented at the 32nd Actuarial Research Conference, held August 6–8, 1997, at the University of Calgary, Calgary, Alberta, Canada.

†Edward W. (Jed) Frees, F.S.A., Ph.D., is Time Insurance Professor of Actuarial Science, School of Business, University of Wisconsin–Madison, 975 University Avenue, Madison, Wisconsin 53706.

‡Emiliano A. Valdez, F.S.A., is a doctoral student at the Actuarial Science Risk Management and Insurance Department, School of Business, University of Wisconsin–Madison, 975 University Ave., Madison, Wisconsin 53706-1323.

Regression analysis has developed into the most widely applied statistical methodology; see, for example, Frees (1996) for an introduction. It is an important component of multivariate analysis because it allows researchers to focus on the effects of explanatory variables. To illustrate, in the Galton dataset of family heights, regression allows the analyst to describe the effect of parents' height on a child's adult height. Regression analysis also is widely applied in actuarial science; as evidence, it is a required educational component of the two main actuarial bodies in the U.S. and Canada, the Society of Actuaries and the Casualty Actuarial Society.

NORTH AMERICAN ACTUARIAL JOURNAL, VOLUME 2, NUMBER 1

Although widely applicable, regression analysis is limited by the basic setup that requires the analyst to identify one dimension of the outcome as the primary measure of interest (the dependent variable) and other dimensions as supporting or "explaining" this variable (the independent variables). This article examines problems in which this relationship is not of primary interest; hence, we focus on the more basic problem of understanding the distribution of several outcomes, a multivariate distribution. For an example in actuarial science, when two lives are subject to failure, such as under a joint life insurance or annuity policy, we are concerned with the joint distribution of lifetimes. As another example, when we simulate the distribution of a scenario that arises out of a financial security system, we need to understand the distribution of several variables interacting simultaneously, not in isolation from one another.

The normal distribution has long dominated the study of multivariate distributions. For example, leading references on multivariate analysis, such as Anderson (1958) and Johnson and Wichern (1988), focus exclusively on the multivariate normal and related distributions that can be derived from normal distributions, including multivariate extensions of Student's t- and Fisher's F-distributions. Multivariate normal distributions are appealing because the marginal distributions are also normal. For example, in the Galton dataset, the distribution of adult children's height and the distribution of parents' height are each approximately normal, in isolation of the other. Multivariate normal distributions are also appealing because the association between any two random outcomes can be fully described knowing only (1) the marginal distributions and (2) one additional parameter, the correlation coefficient.

More recent texts on multivariate analysis, such as Krzanowski (1988), have begun to recognize the need for examining alternatives to the normal distribution setup. This is certainly true for actuarial science applications such as for lifetime random variables (Bowers et al. 1997, Chap. 3) and long-tailed claims variables (Hogg and Klugman 1984), where the normal distribution does not provide an adequate approximation to many datasets. An extensive literature in statistics deals with nonnormal multivariate distributions; see, for example, Johnson and Kotz (1973) and Johnson, Kotz and Balakrishnan (1997). However, historically many multivariate distributions have been developed as immediate extensions of univariate distributions, examples being the bivariate Pareto, bivariate gamma, and so on. The drawbacks of these types of distributions are that (1) a different family is needed for each marginal distribution, (2) extensions to more than just the bivariate case are not clear, and (3) measures of association often appear in the marginal distributions.

A construction of multivariate distributions that does not suffer from these drawbacks is based on the copula function. To define a copula, begin as you might in a simulation study by considering $p$ uniform (on the unit interval) random variables, $u_1, u_2, \ldots, u_p$. Here, $p$ is the number of outcomes that you wish to understand. Unlike many simulation applications, we do not assume that $u_1, u_2, \ldots, u_p$ are independent; they may be related. This relationship is described through their joint distribution function

$$C(u_1, u_2, \ldots, u_p) = \mathrm{Prob}(U_1 \le u_1, U_2 \le u_2, \ldots, U_p \le u_p).$$

Here, we call the function $C$ a copula. Further, $U$ is an (ex-ante) uniform random variable, whereas $u$ is the corresponding (ex-post) realization. To complete the construction, we select arbitrary marginal distribution functions $F_1(x_1), F_2(x_2), \ldots, F_p(x_p)$. Then, the function

$$C[F_1(x_1), F_2(x_2), \ldots, F_p(x_p)] = F(x_1, x_2, \ldots, x_p) \tag{1.1}$$

defines a multivariate distribution function, evaluated at $x_1, x_2, \ldots, x_p$, with marginal distributions $F_1, F_2, \ldots, F_p$.
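As a minimal sketch of the construction in Equation (1.1), the Python fragment below composes a copula with two marginal distribution functions. The independence copula, the exponential and Pareto marginals, and every function name and parameter value are illustrative assumptions, not choices made in the article.

```python
import math

def independence_copula(u1, u2):
    """C(u1, u2) = u1 * u2: the copula of two independent uniforms."""
    return u1 * u2

def exponential_cdf(rate):
    """Return F(x) = 1 - exp(-rate * x) for x > 0, and 0 otherwise."""
    return lambda x: 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def pareto_cdf(shape, scale):
    """Return F(x) = 1 - (1 + x / scale) ** (-shape) for x > 0, and 0 otherwise."""
    return lambda x: 1.0 - (1.0 + x / scale) ** (-shape) if x > 0 else 0.0

def joint_cdf(copula, marginals):
    """Equation (1.1): F(x1, ..., xp) = C[F1(x1), ..., Fp(xp)]."""
    def cdf(*xs):
        return copula(*(F(x) for F, x in zip(marginals, xs)))
    return cdf

# Hypothetical marginals: one exponential outcome, one Pareto outcome.
F1 = exponential_cdf(rate=0.5)
F2 = pareto_cdf(shape=3.0, scale=2.0)
F = joint_cdf(independence_copula, [F1, F2])

print(F(1.0, 2.0))    # joint distribution function evaluated at (1, 2)
print(F(1.0, 1e12))   # approaches the marginal F1(1.0) as x2 grows
```

Passing a different copula to `joint_cdf` changes the dependence structure while leaving both marginals untouched, which is the point of the construction.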

With the copula construction in Equation (1.1), we can select different marginals for each outcome. For example, suppose we are considering modeling male and female lifetimes for a joint-life annuity product. Then, with $p = 2$, we might choose the Gompertz distribution to represent mortality at the older ages, yet with different parameters to reflect gender differences in mortality. As another example, in Section 4, we consider a bivariate outcome associated with the loss and the expense associated with administering a property and casualty claim. There, we could elect to use a lognormal distribution for expenses and a longer tail distribution, such as Pareto, for losses associated with the claim. The copula construction does not constrain the choice of marginal distributions.

In Section 2 we see that the copula method for understanding multivariate distributions has a relatively short history in the statistics literature; most of the statistical applications have arisen in the last ten years. However, copulas have been studied in the probability literature for about 40 years (Schweizer 1991), and thus several desirable properties of copulas are now widely known. To begin, it is easy to check from the construction in Equation (1.1) that $F$ is a multivariate distribution function. Sklar (1959) established the converse. He showed that any multivariate distribution function $F$ can be written in the form of Equation (1.1), that is, using a copula representation. Sklar also showed that if the marginal distributions are continuous, then there is a unique copula representation. In this sense copulas provide a unifying theme for our study of multivariate distributions. Sections 3 and 5 describe other desirable properties of copulas.

Given that copulas are fundamental building blocks for studying multivariate distributions, we now turn to the question of how to build a copula function for a problem at hand. Despite Sklar's result that a copula function always exists, Example 1.1 shows that it is not always convenient to identify the copula. Example 1.2 illustrates a useful way of building a copula, using the method of compounding. We describe this method of constructing copulas in detail in Section 3.1.

Example 1.1 Marshall-Olkin (1967) Exponential Shock Model

Suppose that we wish to model $p = 2$ lifetimes that we suspect are subject to some common disaster, or "shock," that may induce a dependency between the lives. For simplicity, let us assume that $Y_1$ and $Y_2$ are two independent (underlying) lifetimes with distribution functions $H_1$ and $H_2$. We further assume there exists an independent exponential random variable $Z$ with parameter $\lambda$ that represents the time until common disaster. Both lives are subject to the same disaster, so that actual ages-at-death are represented by $X_1 = \min(Y_1, Z)$ and $X_2 = \min(Y_2, Z)$. Thus, the marginal distributions are

$$\mathrm{Prob}(X_j \le x_j) = F_j(x_j) = 1 - \exp(-\lambda x_j)\,(1 - H_j(x_j)), \quad \text{for } j = 1, 2.$$

Basic calculations show that the joint distribution is¹

$$F(x_1, x_2) = F_1(x_1) + F_2(x_2) - 1 + \exp(\lambda \min(x_1, x_2))\,(1 - F_1(x_1))(1 - F_2(x_2)).$$

This expression, although intuitively appealing, is not in the form of the copula construction (1.1) because the joint distribution function $F$ is not a function of the marginals $F_1(x_1)$ and $F_2(x_2)$ alone; it also depends on $x_1$ and $x_2$ directly through the factor $\exp(\lambda \min(x_1, x_2))$. For further discussions in the actuarial literature of this bivariate distribution, see Frees (1996) and Bowers et al. (1997, Sec. 9.6).

¹ Both lives survive the shock only if $Z > \max(x_1, x_2)$, so

$$\begin{aligned}
F(x_1, x_2) &= \mathrm{Prob}(X_1 \le x_1, X_2 \le x_2) \\
&= 1 - \mathrm{Prob}(X_1 > x_1) - \mathrm{Prob}(X_2 > x_2) + \mathrm{Prob}(X_1 > x_1, X_2 > x_2) \\
&= 1 - e^{-\lambda x_1}(1 - H_1(x_1)) - e^{-\lambda x_2}(1 - H_2(x_2)) + e^{-\lambda \max(x_1, x_2)}(1 - H_1(x_1))(1 - H_2(x_2)) \\
&= 1 - (1 - F_1(x_1)) - (1 - F_2(x_2)) + e^{-\lambda \max(x_1, x_2)}\, e^{\lambda (x_1 + x_2)}\,(1 - F_1(x_1))(1 - F_2(x_2)).
\end{aligned}$$

Because $x_1 + x_2 - \max(x_1, x_2) = \min(x_1, x_2)$, the last term equals $e^{\lambda \min(x_1, x_2)}(1 - F_1(x_1))(1 - F_2(x_2))$.
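The common-shock construction of Example 1.1 can be checked numerically. The sketch below assumes, purely for illustration, that the underlying lifetimes $Y_1$ and $Y_2$ are exponential with rates `MU1` and `MU2`, so that each $X_j = \min(Y_j, Z)$ is exponential with rate `MU_j + LAM`. It compares the closed form $F_1 + F_2 - 1 + e^{\lambda \min(x_1, x_2)}(1 - F_1)(1 - F_2)$ with a seeded Monte Carlo estimate. All names, rates, and the sample size are hypothetical.

```python
import math
import random

MU1, MU2, LAM = 0.5, 1.0, 0.3   # hypothetical hazard rates for Y1, Y2 and the shock Z

def marginal_cdf(x, mu):
    """F_j(x) = 1 - exp(-LAM x)(1 - H_j(x)) when H_j is exponential with rate mu."""
    return 1.0 - math.exp(-(mu + LAM) * x)

def joint_cdf(x1, x2):
    """F(x1, x2) = F1 + F2 - 1 + exp(LAM min(x1, x2)) (1 - F1)(1 - F2)."""
    f1, f2 = marginal_cdf(x1, MU1), marginal_cdf(x2, MU2)
    return f1 + f2 - 1.0 + math.exp(LAM * min(x1, x2)) * (1.0 - f1) * (1.0 - f2)

def simulated_joint_cdf(x1, x2, n=100_000, seed=1):
    """Monte Carlo estimate of Prob(X1 <= x1, X2 <= x2) with X_j = min(Y_j, Z)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        y1 = rng.expovariate(MU1)
        y2 = rng.expovariate(MU2)
        z = rng.expovariate(LAM)          # time of the common shock
        if min(y1, z) <= x1 and min(y2, z) <= x2:
            hits += 1
    return hits / n

exact = joint_cdf(0.8, 1.5)
estimate = simulated_joint_cdf(0.8, 1.5)
print(exact, estimate)   # the two values should be close
```

The closed form still takes `x1` and `x2` as arguments in addition to the marginal probabilities, which is why it is not of the form $C[F_1(x_1), F_2(x_2)]$.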

Example 1.2 Bivariate Pareto Model

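Consider a claims random variable $X$ that, given a risk classification parameter $\gamma$, can be modeled as an exponential distribution; that is,

$$\mathrm{Prob}(X \le x \mid \gamma) = 1 - e^{-\gamma x}.$$

As is well known in credibility theory (see, for example, Klugman et al. 1997), if $\gamma$ has a gamma distribution, then the marginal distribution (over all risk classes) of $X$ is Pareto. That is, if $\gamma$ is gamma($\alpha$, $\lambda$), then

$$\begin{aligned}
F(x) = \mathrm{Prob}(X \le x) &= \int_0^\infty \mathrm{Prob}(X \le x \mid \gamma)\, \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, \gamma^{\alpha - 1} e^{-\lambda \gamma}\, d\gamma \\
&= 1 - \int_0^\infty e^{-\gamma x}\, \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, \gamma^{\alpha - 1} e^{-\lambda \gamma}\, d\gamma \\
&= 1 - (1 + x/\lambda)^{-\alpha},
\end{aligned} \tag{1.2}$$

a Pareto distribution.

Suppose, conditional on the risk class $\gamma$, that $X_1$ and $X_2$ are independent and identically distributed. Assuming that they come from the same risk class $\gamma$ induces a dependency. The joint distribution is²

$$F(x_1, x_2) = F_1(x_1) + F_2(x_2) - 1 + \left[ (1 - F_1(x_1))^{-1/\alpha} + (1 - F_2(x_2))^{-1/\alpha} - 1 \right]^{-\alpha}.$$

This yields the copula function

$$C(u_1, u_2) = u_1 + u_2 - 1 + \left[ (1 - u_1)^{-1/\alpha} + (1 - u_2)^{-1/\alpha} - 1 \right]^{-\alpha}. \tag{1.3}$$

With this function, we can express the bivariate distribution function as $F(x_1, x_2) = C(F_1(x_1), F_2(x_2))$.

Alternatively, we can consider the copula

$$C^*(u_1, u_2) = u_1 + u_2 - 1 + C(1 - u_1, 1 - u_2) = \left( u_1^{-1/\alpha} + u_2^{-1/\alpha} - 1 \right)^{-\alpha}$$

and express the joint survival function as $\mathrm{Prob}(X_1 > x_1, X_2 > x_2) = C^*(S_1(x_1), S_2(x_2))$, where $S = 1 - F$. Because our motivating examples in Section 2 concern lifetime (positive) random variables, we often find it intuitively appealing to work with survival in lieu of distribution functions.

² The joint distribution follows from

$$\begin{aligned}
F(x_1, x_2) &= 1 - \mathrm{Prob}(X_1 > x_1) - \mathrm{Prob}(X_2 > x_2) + \mathrm{Prob}(X_1 > x_1, X_2 > x_2) \\
&= 1 - \left(1 + \frac{x_1}{\lambda}\right)^{-\alpha} - \left(1 + \frac{x_2}{\lambda}\right)^{-\alpha} + \int_0^\infty \mathrm{Prob}(X_1 > x_1 \mid \gamma)\, \mathrm{Prob}(X_2 > x_2 \mid \gamma)\, \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, \gamma^{\alpha - 1} e^{-\lambda \gamma}\, d\gamma \\
&= 1 - \left(1 + \frac{x_1}{\lambda}\right)^{-\alpha} - \left(1 + \frac{x_2}{\lambda}\right)^{-\alpha} + \int_0^\infty e^{-\gamma x_1} e^{-\gamma x_2}\, \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, \gamma^{\alpha - 1} e^{-\lambda \gamma}\, d\gamma \\
&= 1 - \left(1 + \frac{x_1}{\lambda}\right)^{-\alpha} - \left(1 + \frac{x_2}{\lambda}\right)^{-\alpha} + \left[1 + \frac{x_1 + x_2}{\lambda}\right]^{-\alpha}.
\end{aligned}$$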
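The algebra of Example 1.2 can be verified with a few lines of Python. This is a sketch under assumed parameter values (`alpha`, `lam`, `x1`, `x2` are arbitrary, and the function names are hypothetical): it evaluates the copula (1.3) at the Pareto marginals (1.2) and compares the result with the closed form $1 - (1 + x_1/\lambda)^{-\alpha} - (1 + x_2/\lambda)^{-\alpha} + [1 + (x_1 + x_2)/\lambda]^{-\alpha}$ obtained by integrating over the risk class.

```python
import math

def pareto_cdf(x, alpha, lam):
    """Equation (1.2): F(x) = 1 - (1 + x / lam) ** (-alpha)."""
    return 1.0 - (1.0 + x / lam) ** (-alpha)

def copula_1_3(u1, u2, alpha):
    """Equation (1.3), applied to distribution-function values u1, u2."""
    inner = (1.0 - u1) ** (-1.0 / alpha) + (1.0 - u2) ** (-1.0 / alpha) - 1.0
    return u1 + u2 - 1.0 + inner ** (-alpha)

def survival_copula(s1, s2, alpha):
    """C*(s1, s2) = (s1^(-1/alpha) + s2^(-1/alpha) - 1)^(-alpha), applied to survival values."""
    return (s1 ** (-1.0 / alpha) + s2 ** (-1.0 / alpha) - 1.0) ** (-alpha)

def joint_cdf_direct(x1, x2, alpha, lam):
    """Closed form from integrating the conditional model over gamma(alpha, lam)."""
    return (1.0 - (1.0 + x1 / lam) ** (-alpha) - (1.0 + x2 / lam) ** (-alpha)
            + (1.0 + (x1 + x2) / lam) ** (-alpha))

alpha, lam, x1, x2 = 2.5, 4.0, 1.0, 3.0      # arbitrary illustrative values
u1, u2 = pareto_cdf(x1, alpha, lam), pareto_cdf(x2, alpha, lam)

via_copula = copula_1_3(u1, u2, alpha)
direct = joint_cdf_direct(x1, x2, alpha, lam)
joint_survival = survival_copula(1.0 - u1, 1.0 - u2, alpha)
print(via_copula, direct, joint_survival)
```

The scale parameter `lam` enters only through the marginals; the copula itself depends on `alpha` alone.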

Several methods are available for constructing multivariate distributions; see Hougaard (1987) and Hutchinson and Lai (1990) for detailed reviews. Example 1.1 illustrates the so-called "variables-in-common" technique in which a common element serves to induce dependencies among several random variables. This article focuses on the compounding method illustrated in Example 1.2 for two reasons. First, there is a long history of using compound distributions for risk classification in the actuarial science literature, particularly within the credibility framework. Second, Marshall and Olkin (1988) showed that compounding can be used to generate several important families of copulas. Additional discussion of this point appears in Sections 2 and 3.

Examples 1.1 and 1.2 each describe bivariate distributions through probabilistic interpretations of random quantities. It is also useful to explore (in Section 3) a class of functions called "Archimedean copulas," which arise from the mathematical theory of associativity. An important special case of this class, due to Frank (1979), is

$$C(u, v) = \frac{1}{\alpha} \ln\left( 1 + \frac{(e^{\alpha u} - 1)(e^{\alpha v} - 1)}{e^{\alpha} - 1} \right). \tag{1.4}$$

Although Frank's copula does not appear to have a natural probabilistic interpretation, its other desirable properties make it well suited for empirical applications (Nelsen 1986 and Genest 1987).
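Frank's copula in Equation (1.4), $C(u, v) = \alpha^{-1} \ln\left(1 + (e^{\alpha u} - 1)(e^{\alpha v} - 1)/(e^{\alpha} - 1)\right)$, is simple to evaluate directly. The sketch below is one possible implementation; the function name and the use of `math.expm1` and `math.log1p` for numerical stability near $\alpha = 0$ are choices made here, not part of the article.

```python
import math

def frank_copula(u, v, alpha):
    """Equation (1.4) for nonzero alpha; the limit alpha -> 0 is the product u * v."""
    if alpha == 0.0:
        raise ValueError("alpha must be nonzero")
    ratio = math.expm1(alpha * u) * math.expm1(alpha * v) / math.expm1(alpha)
    return math.log1p(ratio) / alpha

# Under independence C(0.5, 0.5) would equal 0.25.
for alpha in (-5.0, 1e-6, 5.0):
    print(alpha, frank_copula(0.5, 0.5, alpha))
```

With the parameterization of Equation (1.4), negative values of `alpha` give $C(u, v) > uv$ (positive dependence) and positive values give $C(u, v) < uv$, as the printed values illustrate.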

The purpose of this paper is to introduce copulas, their characteristics and properties, and their applicability to specific situations. Section 2 reviews empirical applications of copulas in analyzing survival of multiple lives and competing risks. Both are familiar topics to actuaries. Section 3 discusses properties and characteristics of copulas. In particular, we show (1) how to specify a copula, (2) how the association structure of copulas can be summarized in terms of familiar measures of dependence, and (3) how simulation of multivariate outcomes can be easily accomplished when the distribution is expressed as a copula. Section 4 provides an illustration of fitting a copula to insurance company losses and expenses. Section 5 reviews additional applications of copulas. We conclude in Section 6.

2. EMPIRICAL APPLICATIONS

Copulas are useful for examining the dependence structure of multivariate random vectors. In this section, we describe two biological science subject areas that are related to actuarial science and that have used copulas to understand empirical relationships among multivariate observations.

2.1 Survival of Multiple Lives

In epidemiological and actuarial studies, it is often of interest to examine the joint mortality pattern of groups of more than a single individual. This group could be, for example, a husband and wife, a family with children, or twins (identical or nonidentical). There is strong empirical evidence of dependence in mortality within pairs of individuals. For example, statistical analyses of mortality patterns of married couples are frequently made to test the "broken heart" syndrome. Using a dataset consisting of 4,486 55-year-old widowers, Parkes et al. (1969) showed that there is a 40% increase in mortality among the widowers during the first few months after the death of their wives; see also Ward (1976). Intuitively, pairs of individuals exhibit dependence in mortality because they share common risk factors. These factors may be purely genetic, as in the case of twins, or environmental, as in the case of a married couple.

The first application of copulas in joint-life models arose indirectly through the work of Clayton (1978) in his study of bivariate life tables of fathers and sons. Clayton developed the bivariate distribution function given in Equation (1.3) as the solution of a second-order partial differential equation. Clayton also pointed out the random effects interpretation of the model that was subsequently developed by Oakes (1982). See also Cook and Johnson (1981).

Random effects models are important in biological and epidemiological studies because they provide a method of modeling heterogeneity. A random effects model particularly suited for multivariate survival analysis is the frailty model, due to Vaupel, Manton and Stallard (1979) and Hougaard (1984). To describe frailty models, we first introduce some notation. In survival analysis, it is customary to consider the complement of the distribution function, the survival function, and the negative derivative of its logarithmic transform, the hazard function. Thus, for a continuous random survival time $T$, we define

$$S(t) = \mathrm{Prob}(T > t) = 1 - F(t)$$

and

$$h(t) = -\frac{\partial \ln S(t)}{\partial t} = \frac{f(t)}{S(t)}.$$

Actuaries know the hazard function $h(t)$ as the force of mortality (see, for example, Bowers et al. 1997, Chap. 3).

To understand explanatory variables $Z$ in survival analysis, we can use the Cox (1972) proportional hazards model, which represents the hazard function as

$$h(t, Z) = e^{\beta Z}\, b(t),$$

where $b(t)$ is the so-called "baseline" hazard function and $\beta$ is a vector of regression parameters. It is proportional in the sense that all the information contained in the explanatory variables is in the multiplicative factor $\gamma = e^{\beta Z}$. Integrating and exponentiating the negative hazard, we can also express Cox's proportional hazard model as

$$S(t \mid \gamma) = \exp\left( -\int_0^t h(s, Z)\, ds \right) = B(t)^{\gamma}.$$

Here,

$$B(t) = \exp\left( -\int_0^t b(s)\, ds \right)$$

is the survival function corresponding to the baseline hazard. Frailty models arise when $Z$, and hence $\gamma$, is unobserved. The factor $\gamma$ is called a frailty because larger values of $\gamma$ imply a smaller survival function, $S(t \mid \gamma)$, indicating poorer survival. As demonstrated in Example 1.2, the marginal distribution for a single life $T$ is obtained by taking expectations over the potential values of $\gamma$; that is, $S(t) = E_{\gamma} S(t \mid \gamma)$.
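To make the step $S(t) = E_{\gamma} S(t \mid \gamma)$ concrete, the sketch below assumes a constant baseline hazard `b` and a gamma($\alpha$, $\lambda$) frailty, as in Example 1.2; both assumptions, the parameter values, and the midpoint-rule integration are illustrative choices made here. Under these assumptions the expectation has the Pareto closed form $(1 + bt/\lambda)^{-\alpha}$, which the numerical integral should reproduce.

```python
import math

def baseline_survival(t, b):
    """B(t) = exp(-integral of the baseline hazard) for a constant hazard b."""
    return math.exp(-b * t)

def conditional_survival(t, gamma, b):
    """S(t | gamma) = B(t) ** gamma: a larger frailty means poorer survival."""
    return baseline_survival(t, b) ** gamma

def gamma_density(g, alpha, lam):
    """Density of a gamma(alpha, lam) frailty (shape alpha, rate lam)."""
    return lam ** alpha * g ** (alpha - 1.0) * math.exp(-lam * g) / math.gamma(alpha)

def marginal_survival(t, b, alpha, lam, upper=80.0, steps=80_000):
    """S(t) = E_gamma[S(t | gamma)], approximated by the midpoint rule on [0, upper]."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        g = (k + 0.5) * h
        total += conditional_survival(t, g, b) * gamma_density(g, alpha, lam)
    return total * h

numeric = marginal_survival(t=5.0, b=0.1, alpha=2.0, lam=1.5)
closed_form = (1.0 + 0.1 * 5.0 / 1.5) ** (-2.0)
print(numeric, closed_form)
```

The truncation point `upper` and the number of `steps` are tuning assumptions that suit these parameter values; a shape parameter below one would need a more careful quadrature near zero.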

Oakes (1989, 1994) described how frailties can be used to model the dependencies among multiple lives. Other studies, such as Jagger and Sutton (1991), used a Cox regression model with known explanatory variables $Z$ to account for the dependencies among multiple lives. Multivariate frailty models are obtained when the investigator does not wish to attribute, or does not have knowledge of, specific characteristics that may induce dependencies. For multivariate frailty models, we assume that $p$ lives $T_1, T_2, \ldots, T_p$ are independent given the frailty $\gamma$. That is,

$$\begin{aligned}
\mathrm{Prob}(T_1 > t_1, \ldots, T_p > t_p \mid \gamma) &= \mathrm{Prob}(T_1 > t_1 \mid \gamma) \cdots \mathrm{Prob}(T_p > t_p \mid \gamma) \\
&= S_1(t_1 \mid \gamma) \cdots S_p(t_p \mid \gamma) \\
&= B_1(t_1)^{\gamma} \cdots B_p(t_p)^{\gamma}.
\end{aligned}$$

The joint multivariate survival function is defined as

$$\mathrm{Prob}(T_1 > t_1, \ldots, T_p > t_p) = E_{\gamma} \left\{ B_1(t_1) \cdots B_p(t_p) \right\}^{\gamma}. \tag{2.1}$$

Example 2.1 Hougaard’s Copula Family

To illustrate, an important frailty model was given by Hougaard (1986), who assumed that the distribution of $\gamma$ could be modeled as a positive "stable distribution" with Laplace transform $E_{\gamma} e^{-s\gamma} = \exp(-s^{\alpha})$ and parameter $\alpha$. Recall that the Laplace transform of a positive random variable $\gamma$ is defined by

$$\tau(s) = E_{\gamma} e^{-s\gamma} = \int e^{-st}\, dG_{\gamma}(t),$$

where $G_{\gamma}$ is the distribution function of $\gamma$. This is also the moment generating function evaluated at $-s$; thus, knowledge of $\tau(s)$ determines the distribution.

With a positive stable distribution for $\gamma$, using Equation (2.1) we have

$$\begin{aligned}
\mathrm{Prob}(T_1 > t_1, \ldots, T_p > t_p) &= E_{\gamma} \exp\left\{ \gamma \ln\left( B_1(t_1) \cdots B_p(t_p) \right) \right\} \\
&= \exp\left\{ -\left( -\ln B_1(t_1) - \cdots - \ln B_p(t_p) \right)^{\alpha} \right\}.
\end{aligned}$$

Because

$$S_i(t_i) = \exp\left\{ -\left( -\ln B_i(t_i) \right)^{\alpha} \right\},$$

we can write the joint survival function as

$$\mathrm{Prob}(T_1 > t_1, \ldots, T_p > t_p) = \exp\left\{ -\left[ \{-\ln S_1(t_1)\}^{1/\alpha} + \cdots + \{-\ln S_p(t_p)\}^{1/\alpha} \right]^{\alpha} \right\}, \tag{2.2}$$

a copula expression. In particular, for bivariate lifetimes with $p = 2$, Hougaard proposed examining Weibull marginals so that $B_i(t) = \exp(-a_i t^{b_i})$ and $S_i(t \mid \gamma) = \exp(-a_i \gamma t^{b_i})$. This yields the bivariate survivor function

$$\mathrm{Prob}(T_1 > t_1, T_2 > t_2) = \exp\left\{ -\left( a_1 t_1^{b_1} + a_2 t_2^{b_2} \right)^{\alpha} \right\}. \tag{2.3}$$

This is desirable in the sense that both the conditional and marginal distributions are Weibull.
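The link between Equations (2.2) and (2.3) can be checked numerically. In the sketch below the function names and the parameter values (`alpha`, `a1`, `b1`, `a2`, `b2`, `t1`, `t2`) are hypothetical. It plugs the Weibull marginals $S_i(t) = \exp\{-(a_i t^{b_i})^{\alpha}\}$ into the copula form $\exp\{-[\sum_i (-\ln S_i)^{1/\alpha}]^{\alpha}\}$ and compares the result with $\exp\{-(a_1 t_1^{b_1} + a_2 t_2^{b_2})^{\alpha}\}$.

```python
import math

def hougaard_copula(survival_probs, alpha):
    """Equation (2.2): exp{-[sum_i (-ln s_i)^(1/alpha)]^alpha} for survival values s_i."""
    total = sum((-math.log(s)) ** (1.0 / alpha) for s in survival_probs)
    return math.exp(-total ** alpha)

def weibull_marginal_survival(t, a, b, alpha):
    """S_i(t) = exp{-(a t^b)^alpha}: the marginal after mixing over the stable frailty."""
    return math.exp(-(a * t ** b) ** alpha)

def bivariate_weibull_survival(t1, t2, a1, b1, a2, b2, alpha):
    """Equation (2.3): exp{-(a1 t1^b1 + a2 t2^b2)^alpha}."""
    return math.exp(-(a1 * t1 ** b1 + a2 * t2 ** b2) ** alpha)

alpha = 0.6                                   # arbitrary value in (0, 1]
a1, b1, a2, b2 = 0.02, 1.5, 0.01, 2.0         # hypothetical Weibull parameters
t1, t2 = 10.0, 12.0

s1 = weibull_marginal_survival(t1, a1, b1, alpha)
s2 = weibull_marginal_survival(t2, a2, b2, alpha)
via_copula = hougaard_copula([s1, s2], alpha)
direct = bivariate_weibull_survival(t1, t2, a1, b1, a2, b2, alpha)
print(via_copula, direct, s1 * s2)
```

Setting `alpha = 1.0` collapses the copula to the product of the marginals; for `alpha` below one the joint survival probability exceeds that product, reflecting positive dependence between the two lives.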

Equations (1.3) of Example 1.2 and (2.2) of Example 2.1 show that these frailty models can be written as copulas. Marshall and Olkin (1988) showed that these are special cases of a more general result; they demonstrated that all frailty models of the form in Equation (2.1) can be easily written as copulas. Further, the copula form is a special type called an Archimedean copula, which we will introduce in Section 3.

In addition to the Clayton and Oakes studies, other works have investigated the use of copula models in studying behavior of multiple lives. Hougaard et al. (1992) analyzed the joint survival of Danish twins born between 1881 and 1930. They used the frailty model arising from a positive stable distribution as well as Cox's proportional hazard model. Frees et al. (1995) investigated mortality of annuitants in joint- and last-survivor annuity contracts using Frank's copula (Equation 1.4). They found that accounting for dependency in mortality produced approximately a 3% to 5% reduction in annuity values when compared to standard models that assume independence.

2.2 Competing Risks: Multiple Decrement Theory

The subject of competing risks deals with the study of the lifetime distribution of a system subject to several competing causes; this subject is called multiple decrement theory in actuarial science (see, for example, Bowers et al. 1997, Chaps. 10 and 11). The problem of competing risks arises in survival analysis, systems reliability, and medical studies as well as in actuarial science. For example, a person dies because of one of several possible causes: cancer, heart disease, accident, and so on. As yet another example, a mechanical device fails because a component fails. For mathematical convenience, the general framework begins with an unobserved multivariate lifetime vector $(T_1, T_2, \ldots, T_p)$; each element in the vector denotes the lifetime due to one of $p$ competing causes. The quantities typically observed are $T = \min(T_1, T_2, \ldots, T_p)$ and the cause of failure $J$. To illustrate, in life insurance, $T$ usually denotes the lifetime of the insured individual and $J$ denotes the cause of death such as cancer or accidental death. Several texts lay the foundation of the theory of competing risks. For example, see Bowers et al. (1997), Cox and Oakes (1984), David and Moeschberger (1978), and Elandt-Johnson and Johnson (1980).

When formulating the competing risk model, it is often assumed that the component lifetimes $T_i$ are statistically independent. With independence, the model is easily tractable and avoids the problem of identifiability encountered in inference. However, many authors, practitioners, and academicians recognize that this assumption is not practical, realistic, or reasonable; see Carriere (1994), Makeham (1874), and Seal (1977).

To account for dependence in competing risk models, one general approach is to apply copulas. In particular, the frailty model seems well suited for handling competing risks. Assuming that causes of death are independent given a frailty $\gamma$, we have


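TABLE 1
ARCHIMEDEAN COPULAS AND THEIR GENERATORS

| Family | Generator $\phi(t)$ | Dependence Parameter ($\alpha$) Space | Bivariate Copula $C_{\phi}(u, v)$ |
|---|---|---|---|
| Independence | $-\ln t$ | Not applicable | $uv$ |
| Clayton (1978), Cook-Johnson (1981), Oakes (1982) | $t^{-\alpha} - 1$ | $\alpha > 0$ | $(u^{-\alpha} + v^{-\alpha} - 1)^{-1/\alpha}$ |
| Gumbel (1960), Hougaard (1986) | $(-\ln t)^{\alpha}$ | $\alpha \ge 1$ | $\exp\left\{ -\left[ (-\ln u)^{\alpha} + (-\ln v)^{\alpha} \right]^{1/\alpha} \right\}$ |
| Frank (1979) | $-\ln \dfrac{e^{\alpha t} - 1}{e^{\alpha} - 1}$ | $-\infty < \alpha < \infty$ | $\dfrac{1}{\alpha} \ln\left( 1 + \dfrac{(e^{\alpha u} - 1)(e^{\alpha v} - 1)}{e^{\alpha} - 1} \right)$ |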

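$$\begin{aligned}
\mathrm{Prob}(T > t \mid \gamma) &= \mathrm{Prob}(\min(T_1, \ldots, T_p) > t \mid \gamma) \\
&= \mathrm{Prob}(T_1 > t \mid \gamma) \cdots \mathrm{Prob}(T_p > t \mid \gamma) \\
&= B_1(t)^{\gamma} \cdots B_p(t)^{\gamma}.
\end{aligned}$$

Thus, similar to Equation (2.1), the overall survival function is

$$\mathrm{Prob}(T > t) = E_{\gamma} \left\{ B_1(t) \cdots B_p(t) \right\}^{\gamma}. \tag{2.4}$$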
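Equation (2.4) can be illustrated by simulating the observable pair $(T, J)$. The sketch below swaps in a gamma($\alpha$, $\lambda$) frailty (not the positive stable frailty of Example 2.1) and constant baseline hazards, because both are easy to simulate with the standard library; these choices, the function names, and all parameter values are assumptions made for illustration. Under them, Equation (2.4) reduces to $\mathrm{Prob}(T > t) = (1 + t \sum_i b_i / \lambda)^{-\alpha}$.

```python
import random

def simulate_competing_risks(n, hazards, alpha, lam, seed):
    """Draw n observations (T, J): time of first failure and its cause (1-based)."""
    rng = random.Random(seed)
    observations = []
    for _ in range(n):
        # random.gammavariate takes (shape, scale), so the rate lam becomes scale 1/lam.
        frailty = rng.gammavariate(alpha, 1.0 / lam)
        # Given the frailty, the latent lifetimes are independent exponentials.
        latent = [rng.expovariate(frailty * b) for b in hazards]
        t = min(latent)
        observations.append((t, latent.index(t) + 1))
    return observations

def overall_survival(t, hazards, alpha, lam):
    """Equation (2.4) under a gamma frailty and constant baseline hazards."""
    return (1.0 + sum(hazards) * t / lam) ** (-alpha)

HAZARDS = [0.03, 0.05]                        # hypothetical cause-specific baseline hazards
observations = simulate_competing_risks(50_000, HAZARDS, alpha=2.0, lam=1.5, seed=7)
empirical = sum(1 for t, _ in observations if t > 10.0) / len(observations)
share_cause_1 = sum(1 for _, j in observations if j == 1) / len(observations)
print(empirical, overall_survival(10.0, HAZARDS, 2.0, 1.5), share_cause_1)
```

Only `T` and `J` would be available to an analyst; the latent vector and the frailty are kept inside the simulator to mirror that.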

Example 2.1 (Continued)

For a positive stable distribution for $\gamma$, the survival function is

$$\mathrm{Prob}(T > t) = \exp\left\{ -\left[ \{-\ln S_1(t)\}^{1/\alpha} + \cdots + \{-\ln S_p(t)\}^{1/\alpha} \right]^{\alpha} \right\},$$

similar to Equation (2.2). For bivariate lifetimes with Weibull marginals, we have

$$\mathrm{Prob}(T > t) = \exp\left\{ -\left( a_1 t^{b_1} + a_2 t^{b_2} \right)^{\alpha} \right\}.$$

There have been several applications of frailty models for studying competing risk situations. Oakes (1989) discussed the number of cycles of two chemotherapy regimes tolerated by 109 cancer patients. Shih and Louis (1995) analyzed HIV-infected patients by using Clayton's family, positive stable frailties, and Frank's copula. Zheng and Klein (1995) considered data from a clinical trial of patients with non-Hodgkin's lymphoma using a gamma copula (as in Clayton's family). In a nonbiological context, Hougaard (1987) described how dependent competing risk models using positive stable copulas can be used to assess machine failure.

3. PROPERTIES OF COPULAS

This section discusses several properties and characteristics of copulas, specifically (1) how to generate copulas, (2) how copulas can summarize association between random variables, and (3) how to simulate copula distributions.

3.1 Specifying Copulas: Archimedean and Compounding Approaches

Copulas provide a general structure for modeling mul-tivariate distributions. The two main methods forspecifying a family of copulas are the Archimedeanapproach and the compounding approach, the latterillustrated in Example 1.2.

The Archimedean representation allows us to re-duce the study of a multivariate copula to a singleunivariate function. For simplicity, we first considerbivariate copulas so that p52. Assume that f is a con-vex, decreasing function with domain (0, 1] and range[0, `) such that f(1)50. Use f21 for the inverse func-tion of f. Then the function

21C (u,v) 5 f f(u) 1 f(v) for u, v [ (0, 1]~ !f

is said to be an Archimedean copula. We call f a gen-erator of the copula Cf. Genest and McKay (1986a,1986b) give proofs of several basic properties of Cf,including the fact that it is a distribution function. Asseen in Table 1, different choices of generator yieldseveral important families of copulas. A generatoruniquely determines (up to a scalar multiple) an Ar-chimedean copula. Thus, this representation helpsidentify the copula form. This point is further devel-oped in Section 3.2.
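The Archimedean construction can be sketched in a few lines of Python. This is only an illustration, not code from the article: the function names (`archimedean_copula`, `clayton`, `gumbel`, `frank`) are hypothetical, and the generators and inverses are taken as printed in Tables 1 and 2.

```python
import math

def archimedean_copula(phi, phi_inv):
    """Build C(u, v) = phi_inv(phi(u) + phi(v)) for u, v in (0, 1]."""
    def copula(u, v):
        return phi_inv(phi(u) + phi(v))
    return copula

def clayton(alpha):
    # Generator t^(-alpha) - 1 with inverse (1 + s)^(-1/alpha).
    return archimedean_copula(lambda t: t ** (-alpha) - 1.0,
                              lambda s: (1.0 + s) ** (-1.0 / alpha))

def gumbel(alpha):
    # Generator (-ln t)^alpha with inverse exp(-s^(1/alpha)).
    return archimedean_copula(lambda t: (-math.log(t)) ** alpha,
                              lambda s: math.exp(-s ** (1.0 / alpha)))

def frank(alpha):
    # Generator ln((e^{alpha t} - 1)/(e^alpha - 1)) as printed in Table 1,
    # with inverse (1/alpha) ln(1 + e^s (e^alpha - 1)); alpha must be nonzero.
    return archimedean_copula(
        lambda t: math.log(math.expm1(alpha * t) / math.expm1(alpha)),
        lambda s: math.log1p(math.exp(s) * math.expm1(alpha)) / alpha)

def frank_closed_form(u, v, alpha):
    # Bivariate copula column of Table 1, used only as a cross-check.
    return math.log1p(math.expm1(alpha * u) * math.expm1(alpha * v)
                      / math.expm1(alpha)) / alpha
```

Each family is specified entirely by a generator and its inverse, which is the sense in which the study of the copula reduces to a single univariate function.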

Examples 1.2 and 2.1 show that compound distributions can be used to generate copulas of interest.

8 NORTH AMERICAN ACTUARIAL JOURNAL, VOLUME 2, NUMBER 1

TABLE 2. ARCHIMEDEAN GENERATORS AND THEIR INVERSES

| Family | Generator $\phi(t)$ | Inverse Generator (Laplace Transform) $\tau(s) = \phi^{-1}(s)$ | Laplace Transform Distribution |
| --- | --- | --- | --- |
| Independence | $-\ln t$ | $\exp(-s)$ | Degenerate |
| Clayton (1978), Cook-Johnson (1981), Oakes (1982) | $t^{-\alpha} - 1$ | $(1 + s)^{-1/\alpha}$ | Gamma |
| Gumbel (1960), Hougaard (1986) | $(-\ln t)^{\alpha}$ | $\exp(-s^{1/\alpha})$ | Positive stable |
| Frank (1979) | $\ln\dfrac{e^{\alpha t} - 1}{e^{\alpha} - 1}$ | $\alpha^{-1}\ln[1 + e^{s}(e^{\alpha} - 1)]$ | Logarithmic series distribution on the positive integers |

These examples are special cases of a general method for constructing copulas due to Marshall and Olkin (1988). To describe this method, suppose that $X_i$ is a random variable whose conditional distribution function, given a positive latent variable $\gamma_i$, is specified by $H_i(x \mid \gamma_i) = H_i(x)^{\gamma_i}$, where $H_i(\cdot)$ is some baseline distribution function, for $i = 1, \ldots, p$. Marshall and Olkin considered multivariate distribution functions of the form

$$F(x_1, x_2, \ldots, x_p) = E\,K\left(H_1(x_1)^{\gamma_1}, \ldots, H_p(x_p)^{\gamma_p}\right).$$

Here, $K$ is a distribution function with uniform marginals and the expectation is over $\gamma_1, \gamma_2, \ldots, \gamma_p$. As a special case, take all latent variables equal to one another so that $\gamma_1 = \gamma_2 = \cdots = \gamma_p = \gamma$ and use the distribution function corresponding to independent marginals. Marshall and Olkin (1988) showed that

$$
\begin{aligned}
F(x_1, x_2, \ldots, x_p) &= E_{\gamma}\left(H_1(x_1)^{\gamma} \cdots H_p(x_p)^{\gamma}\right) \\
&= \tau\left(\tau^{-1}\{F_1(x_1)\} + \cdots + \tau^{-1}\{F_p(x_p)\}\right) \qquad (3.2)
\end{aligned}
$$

where $F_i$ is the $i$-th marginal distribution of $F$ and $\tau(\cdot)$ is the Laplace transform of $\gamma$, defined by $\tau(s) = E_{\gamma}\, e^{-s\gamma}$.

Laplace transforms have well-defined inverses. Thus, from Equation (3.2), we see that the inverse function $\tau^{-1}$ serves as the generator for an Archimedean copula. In this sense, Equation (3.2) provides a probabilistic interpretation of generators. To illustrate, Table 2 provides the inverse Laplace transform for the generators listed in Table 1. Here, we see how well-known distributions can be used to generate compound distributions. Because generators are defined uniquely only up to scalar multiple, any positive constant in the family of Laplace transforms determines the same class of generators. (Indeed, this methodology suggests new copula families.) Thus, the inverse of a Laplace transform represents an important type of generator for Archimedean copulas.
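As a worked illustration of Equation (3.2) (added here as a check, using only the gamma row of Table 2), take $p = 2$ and the Laplace transform $\tau(s) = (1 + s)^{-1/\alpha}$, whose inverse is $\tau^{-1}(u) = u^{-\alpha} - 1$. Substituting into Equation (3.2) gives

$$
\begin{aligned}
F(x_1, x_2) &= \tau\left(\tau^{-1}\{F_1(x_1)\} + \tau^{-1}\{F_2(x_2)\}\right) \\
&= \left(1 + \{F_1(x_1)^{-\alpha} - 1\} + \{F_2(x_2)^{-\alpha} - 1\}\right)^{-1/\alpha} \\
&= \left(F_1(x_1)^{-\alpha} + F_2(x_2)^{-\alpha} - 1\right)^{-1/\alpha},
\end{aligned}
$$

which is the Clayton form of Table 1 evaluated at $u = F_1(x_1)$ and $v = F_2(x_2)$.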

To summarize, assume that $X_1, X_2, \ldots, X_p$ are conditionally independent, given $\gamma$, with distribution functions $H_i(x)^{\gamma}$. Then, the multivariate distribution is given by the copula form with the generator being the inverse of the Laplace transform of the latent variable $\gamma$. Because of the form of the conditional distribution, we follow Joe (1997) and call this a mixture of powers distribution. We remark that

$$\tau[-\ln H_i(x)] = E_{\gamma} \exp\{-\gamma[-\ln H_i(x)]\} = F_i(x)$$

so that

$$H_i(x) = \exp\{-\tau^{-1}[F_i(x)]\}.$$

This provides a way of specifying the baseline function given the marginal distribution and the distribution of the latent variable.

For the applications involving lifetimes in Section 2, we found it natural to discuss distributions in terms of survival functions. Marshall and Olkin (1988) pointed out that the construction as in Equation (3.2) could also be used for survivor functions. That is, with the frailty model $\operatorname{Prob}(T_i > t \mid \gamma) = B_i(t)^{\gamma}$ and conditional independence of $T_1, T_2, \ldots, T_p$, from Equation (2.1), we have

$$
\begin{aligned}
\operatorname{Prob}(T_1 > t_1, \ldots, T_p > t_p) &= E_{\gamma}\{B_1(t_1) \cdots B_p(t_p)\}^{\gamma} \\
&= \tau\left(\tau^{-1}\{S_1(t_1)\} + \cdots + \tau^{-1}\{S_p(t_p)\}\right). \qquad (3.3)
\end{aligned}
$$

As before,

$$\tau[-\ln B_i(t)] = S_i(t) = 1 - F_i(t)$$

and

$$B_i(t) = \exp\{-\tau^{-1}[S_i(t)]\}.$$

If the mixing distribution remains the same, the Laplace transform and hence the generator remain the same. However, the two constructions yield different multivariate distributions because

$$\operatorname{Prob}(T \le t \mid \gamma) = 1 - B(t)^{\gamma} \ne \{1 - F(t)\}^{\gamma}.$$

We follow Marshall and Olkin and first present the copula construction in Equation (3.2) by using distribution functions because it is useful for all random variables. However, for positive lifetime random variables, the concept of frailty models is intuitively appealing; thus, using survival functions in the construction is preferred for these applications. We finally remark that, unlike the gamma and positive stable families, Frank’s copula $C_\phi(u,v)$ is symmetric about the point $(1/2, 1/2)$ [for example, $C_\phi(u,v) = C_\phi(1/2 - u, 1/2 - v)$]. Thus, it is invariant to the choice of $F$ or $S = 1 - F$ in the construction (Genest 1987).

3.2 Measures of Association

Recall the copula representation of a distribution function in Equation (1.1),

$$F(x_1, x_2, \ldots, x_p) = C\left(F_1(x_1), F_2(x_2), \ldots, F_p(x_p)\right).$$

With this expression, we see that $F$ is a function of its marginals and the copula. As pointed out by Genest and Rivest (1993), this suggests that a natural way of specifying the distribution function is to examine the copula and marginals separately. Moreover, the case of independence is a special form of the copula $C(u_1, u_2, \ldots, u_p) = u_1 \cdot u_2 \cdots u_p$ (regardless of the marginals), and this suggests that we examine the copula function to understand the association among random variables. Because we are concerned with correlation measures, we restrict our consideration to $p = 2$.

Schweizer and Wolff (1981) established that the copula accounts for all the dependence between two random variables, $X_1$ and $X_2$, in the following sense. Consider $g_1$ and $g_2$, strictly increasing (but otherwise arbitrary) functions over the range of $X_1$ and $X_2$. Then, Schweizer and Wolff showed that the transformed variables $g_1(X_1)$ and $g_2(X_2)$ have the same copula as $X_1$ and $X_2$. Thus, the manner in which $X_1$ and $X_2$ ‘‘move together’’ is captured by the copula, regardless of the scale in which each variable is measured.

Schweizer and Wolff also showed that two standard nonparametric correlation measures could be expressed solely in terms of the copula function. These are Spearman’s correlation coefficient, defined by

$$
\begin{aligned}
\rho(X_1, X_2) &= 12\,E\left\{\left(F_1(X_1) - 1/2\right)\left(F_2(X_2) - 1/2\right)\right\} \\
&= 12 \iint \{C(u,v) - uv\}\, du\, dv
\end{aligned}
$$

and Kendall’s correlation coefficient, defined by

$$
\begin{aligned}
\tau(X_1, X_2) &= \operatorname{Prob}\{(X_1 - X_1^*)(X_2 - X_2^*) > 0\} - \operatorname{Prob}\{(X_1 - X_1^*)(X_2 - X_2^*) < 0\} \\
&= 4 \iint C(u,v)\, dC(u,v) - 1.
\end{aligned}
$$

For these expressions, we assume that $X_1$ and $X_2$ have a jointly continuous distribution function. Further, the definition of Kendall’s $\tau$ uses an independent copy of $(X_1, X_2)$, denoted $(X_1^*, X_2^*)$, to define the measure of ‘‘concordance.’’ See Section 5 for more details. Also, the widely used Pearson correlation coefficient, $\operatorname{Cov}(X_1, X_2)/(\operatorname{Var} X_1 \operatorname{Var} X_2)^{1/2}$, depends not only on the copula but also on the marginal distributions. Thus, this measure is affected by (nonlinear) changes of scale.
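The contrast between the two kinds of measure can be checked numerically. The sketch below is an illustration only: the simulated data, the transformations `exp` and cubing, and the helper names are assumptions made for the example. It computes the sample Kendall coefficient and the Pearson coefficient before and after strictly increasing transformations.

```python
import math
import random

def kendall_tau(xs, ys):
    """Sample Kendall coefficient: average sign of (x_i - x_j)(y_i - y_j) over pairs i < j."""
    n = len(xs)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
            total += (prod > 0) - (prod < 0)
    return total / (n * (n - 1) / 2)

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)                          # fixed seed, illustration only
x1 = [rng.gauss(0.0, 1.0) for _ in range(200)]
x2 = [a + rng.gauss(0.0, 1.0) for a in x1]      # positively dependent on x1
g1 = [math.exp(a) for a in x1]                  # strictly increasing transform of x1
g2 = [b ** 3 for b in x2]                       # strictly increasing transform of x2
```

Because the transformations preserve the ordering of the observations, `kendall_tau(x1, x2)` and `kendall_tau(g1, g2)` coincide, whereas the two Pearson coefficients generally differ.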

Table 3 illustrates the calculation of these correlation measures. The correlations from Frank’s copula rely on the so-called ‘‘Debye’’ functions, defined as

$$D_k(x) = \frac{k}{x^k} \int_0^x \frac{t^k}{e^t - 1}\, dt,$$

for $k = 1, 2$. To evaluate negative arguments of the Debye function $D_k$, basic calculus shows that

$$D_k(-x) = D_k(x) + \frac{kx}{k + 1}.$$
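The Debye functions have no elementary closed form, so in practice they are evaluated numerically. The following is a minimal sketch, assuming composite Simpson integration with a hypothetical step count `n`; `frank_kendall_tau` applies the Frank entry for Kendall’s coefficient in Table 3, $1 - (4/\alpha)\{D_1(-\alpha) - 1\}$.

```python
import math

def debye(k, x, n=2000):
    """D_k(x) = (k / x^k) * integral from 0 to x of t^k / (e^t - 1) dt.

    Composite Simpson rule with n (even) subintervals; x may be negative,
    in which case the step h is negative and the integral is signed.
    """
    def integrand(t):
        if t == 0.0:
            return 1.0 if k == 1 else 0.0   # limit of t^k / (e^t - 1) as t -> 0
        return t ** k / math.expm1(t)
    h = x / n
    total = integrand(0.0) + integrand(x)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return k / x ** k * total * h / 3.0

def frank_kendall_tau(alpha):
    """Kendall's coefficient for Frank's copula (Table 3 entry); alpha must be nonzero."""
    return 1.0 - 4.0 / alpha * (debye(1, -alpha) - 1.0)
```

The same routine can be used to confirm the reflection identity for negative arguments, for example by comparing `debye(1, -2.0)` with `debye(1, 2.0) + 1.0`.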

An important point of this table is that there is a one-to-one relationship between each correlation measure and the association parameter $\alpha$. Further, Table 3 allows us to see a drawback of the Clayton/Cook-Johnson/Oakes and Gumbel-Hougaard copula families. Because of the limited dependence parameter space as shown in Table 1, these families permit only nonnegative correlations, a consequence of the latent variable model. However, Frank’s family permits negative as well as positive dependence.

Correlation measures summarize information in the copula concerning the dependence, or association, between random variables. Following a procedure due to Genest and Rivest (1993), we can also use the dependence measure to specify a copula form in empirical applications, as follows.

TABLE 3. ARCHIMEDEAN COPULAS AND THEIR MEASURES OF DEPENDENCE

| Family | Bivariate Copula $C_\phi(u,v)$ | Kendall’s $\tau$ | Spearman’s $\rho$ |
| --- | --- | --- | --- |
| Independence | $uv$ | 0 | 0 |
| Clayton (1978), Cook-Johnson (1981), Oakes (1982) | $(u^{-\alpha} + v^{-\alpha} - 1)^{-1/\alpha}$ | $\dfrac{\alpha}{\alpha + 2}$ | Complicated form |
| Gumbel (1960), Hougaard (1986) | $\exp\{-[(-\ln u)^{\alpha} + (-\ln v)^{\alpha}]^{1/\alpha}\}$ | $1 - \alpha^{-1}$ | No closed form |
| Frank (1979) | $\dfrac{1}{\alpha}\ln\left(1 + \dfrac{(e^{\alpha u} - 1)(e^{\alpha v} - 1)}{e^{\alpha} - 1}\right)$ | $1 - \dfrac{4}{\alpha}\{D_1(-\alpha) - 1\}$ | $1 - \dfrac{12}{\alpha}\{D_2(-\alpha) - D_1(-\alpha)\}$ |

Genest and Rivest’s procedure for identifying a copula begins by assuming that we have available a random sample of bivariate observations, $(X_{11}, X_{21}), \ldots, (X_{1n}, X_{2n})$. Assume that the distribution function $F$ has associated Archimedean copula $C_\phi$; we wish to identify the form of $\phi$. We work with an intermediate (unobserved) random variable $Z_i = F(X_{1i}, X_{2i})$ that has distribution function $K(z) = \operatorname{Prob}(Z_i \le z)$. Genest and Rivest showed that this distribution function is related to the generator of an Archimedean copula through the expression

$$K(z) = z - \frac{\phi(z)}{\phi'(z)}.$$

To identify $\phi$, we:

1. Estimate Kendall’s correlation coefficient using the usual (nonparametric or distribution-free) estimate

$$\tau_n = \binom{n}{2}^{-1} \sum_{i<j} \operatorname{sign}\left[(X_{1i} - X_{1j})(X_{2i} - X_{2j})\right].$$

2. Construct a nonparametric estimate of $K$, as follows:
   a. First, define the pseudo-observations $Z_i = \{\text{number of } (X_{1j}, X_{2j}) \text{ such that } X_{1j} < X_{1i} \text{ and } X_{2j} < X_{2i}\}/(n - 1)$ for $i = 1, \ldots, n$.
   b. Second, construct the estimate of $K$ as $K_n(z) = $ proportion of $Z_i$’s $\le z$.

3. Now construct a parametric estimate of $K$ using the relationship

$$K_\phi(z) = z - \frac{\phi(z)}{\phi'(z)}.$$

   For example, refer to Table 1 for various choices of $\phi$ and use the estimate $\tau_n$ to calculate an estimate of $\alpha$, say $\alpha_n$. Use $\alpha_n$ to estimate $\phi(x)$, say $\phi_n(x)$. Finally, use $\phi_n(x)$ to estimate $K_\phi(z)$, say $K_{\phi_n}(z)$.

Repeat step 3 for several choices of $\phi$. Then, compare each parametric estimate to the nonparametric estimate constructed in step 2. Select the choice of $\phi$ so that the parametric estimate $K_{\phi_n}(z)$ most closely resembles the nonparametric estimate $K_n(z)$. Measuring ‘‘closeness’’ can be done by minimizing a distance such as $\int [K_{\phi_n}(z) - K_n(z)]^2\, dK_n(z)$ or graphically. Graphical representations include (1) plots of $K_n(z)$ and $K_{\phi_n}(z)$ versus $z$, and (2) the corresponding quantile plots. See Section 4 for an example.
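The three steps can be sketched in Python as follows. This is an illustrative reading of the procedure, not the authors’ code: the function names are hypothetical, only the Clayton and Gumbel-Hougaard candidates are included, $\alpha_n$ is obtained by inverting the Kendall entries of Table 3, and the sketch assumes $0 < \tau_n < 1$. For Clayton, $\phi(z)/\phi'(z) = -z(1 - z^{\alpha})/\alpha$; for Gumbel-Hougaard, $\phi(z)/\phi'(z) = z \ln z/\alpha$.

```python
import math

def kendall_tau(x1, x2):
    """Step 1: tau_n, the average sign of (X1i - X1j)(X2i - X2j) over pairs i < j."""
    n = len(x1)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x1[i] - x1[j]) * (x2[i] - x2[j])
            total += (prod > 0) - (prod < 0)
    return total / (n * (n - 1) / 2)

def pseudo_observations(x1, x2):
    """Step 2a: Z_i = #{j : X1j < X1i and X2j < X2i} / (n - 1)."""
    n = len(x1)
    return [sum(1 for j in range(n) if x1[j] < x1[i] and x2[j] < x2[i]) / (n - 1)
            for i in range(n)]

def k_empirical(z_obs, z):
    """Step 2b: K_n(z), the proportion of pseudo-observations <= z."""
    return sum(1 for zi in z_obs if zi <= z) / len(z_obs)

def k_clayton(z, alpha):
    """Step 3 for Clayton: K(z) = z + z (1 - z^alpha) / alpha."""
    return z + z * (1.0 - z ** alpha) / alpha

def k_gumbel(z, alpha):
    """Step 3 for Gumbel-Hougaard: K(z) = z - z ln(z) / alpha, with K(0) = 0."""
    return z - z * math.log(z) / alpha if z > 0.0 else 0.0

def distances(x1, x2):
    """Discrete version of the integral of [K_phi_n - K_n]^2 dK_n for each candidate."""
    tau = kendall_tau(x1, x2)                   # assumed to lie strictly in (0, 1)
    z_obs = pseudo_observations(x1, x2)
    candidates = {
        "Clayton": (k_clayton, 2.0 * tau / (1.0 - tau)),    # tau = alpha / (alpha + 2)
        "Gumbel-Hougaard": (k_gumbel, 1.0 / (1.0 - tau)),   # tau = 1 - 1 / alpha
    }
    return {name: sum((k_fun(z, alpha) - k_empirical(z_obs, z)) ** 2
                      for z in z_obs) / len(z_obs)
            for name, (k_fun, alpha) in candidates.items()}
```

The candidate with the smallest distance would be the one suggested by the procedure; a graphical comparison of the same quantities is the alternative mentioned in the text.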

3.3 Simulation

Actuaries routinely deal with complex nonlinear functions, such as present values, of random variables. Simulation is a widely used tool for summarizing the distribution of stochastic outcomes and for communicating the results of complex models. The copula construction allows us to simulate outcomes from a multivariate distribution easily.

The two primary simulation strategies are the Archimedean and compounding methods for constructing copulas; each has relative advantages and disadvantages.

We begin with the Archimedean construction. Our goal is to construct an algorithm to generate $X_1, X_2, \ldots, X_p$ having known distribution function $F(x_1, x_2, \ldots, x_p) = C\left(F_1(x_1), F_2(x_2), \ldots, F_p(x_p)\right)$, where the copula function is

$$C(u_1, u_2, \ldots, u_p) = \phi^{-1}\left(\phi(u_1) + \cdots + \phi(u_p)\right).$$

For this construction, Genest and Rivest (1986b) and Genest (1987) introduced the idea of simulating the full distribution of $(X_1, X_2, \ldots, X_p)$ by recursively simulating the conditional distribution of $X_k$ given $X_1, \ldots, X_{k-1}$, for $k = 2, \ldots, p$. This idea, subsequently developed by Lee (1993), is as follows. For simplicity, we assume that the joint probability density function of $X_1, X_2, \ldots, X_p$ exists. Using the copula construction, the joint probability density function of $X_1, \ldots, X_k$ is

$$
\begin{aligned}
f_k(x_1, \ldots, x_k) &= \frac{\partial^k}{\partial x_1 \cdots \partial x_k}\, \phi^{-1}\{\phi[F_1(x_1)] + \cdots + \phi[F_k(x_k)]\} \\
&= \phi^{-1(k)}\{\phi[F_1(x_1)] + \cdots + \phi[F_k(x_k)]\} \prod_{j=1}^{k} \phi^{(1)}[F_j(x_j)]\, F_j^{(1)}(x_j).
\end{aligned}
$$

Here, the superscript notation $(j)$ means the $j$-th mixed partial derivative. Thus, the conditional density of $X_k$ given $X_1, \ldots, X_{k-1}$ is

$$
\begin{aligned}
f_k(x_k \mid x_1, \ldots, x_{k-1}) &= \frac{f_k(x_1, \ldots, x_k)}{f_{k-1}(x_1, \ldots, x_{k-1})} \\
&= \phi^{(1)}[F_k(x_k)]\, F_k^{(1)}(x_k)\, \frac{\phi^{-1(k)}\{\phi[F_1(x_1)] + \cdots + \phi[F_k(x_k)]\}}{\phi^{-1(k-1)}\{\phi[F_1(x_1)] + \cdots + \phi[F_{k-1}(x_{k-1})]\}}.
\end{aligned}
$$

Further, the conditional distribution function of $X_k$ given $X_1, \ldots, X_{k-1}$ is

$$
\begin{aligned}
F_k(x_k \mid x_1, \ldots, x_{k-1}) &= \int_{-\infty}^{x_k} f_k(x \mid x_1, \ldots, x_{k-1})\, dx \\
&= \frac{\phi^{-1(k-1)}\{\phi[F_1(x_1)] + \cdots + \phi[F_k(x_k)]\}}{\phi^{-1(k-1)}\{\phi[F_1(x_1)] + \cdots + \phi[F_{k-1}(x_{k-1})]\}} \\
&= \frac{\phi^{-1(k-1)}\{c_{k-1} + \phi[F_k(x_k)]\}}{\phi^{-1(k-1)}(c_{k-1})},
\end{aligned}
$$

where $c_k = \phi[F_1(x_1)] + \cdots + \phi[F_k(x_k)]$. With this distribution function, we can now use the usual procedure of solving for the inverse distribution function and evaluating this at a uniform random number; that is, use $F^{-1}(U) = X_k$.

To summarize, the algorithm is (Lee 1993):

Algorithm 3.1 Generating Multivariate Outcomes from an Archimedean Copula

1. Generate $U_1, U_2, \ldots, U_p$ independent uniform (0, 1) random numbers.

2. Set $X_1 = F_1^{-1}(U_1)$ and $c_0 = 0$.

3. For $k = 2, \ldots, p$, recursively calculate $X_k$ as the solution of

$$U_k = F_k(X_k \mid x_1, \ldots, x_{k-1}) = \frac{\phi^{-1(k-1)}\{c_{k-1} + \phi[F_k(X_k)]\}}{\phi^{-1(k-1)}(c_{k-1})}. \qquad (3.4)$$

This algorithm was initially introduced in the context of Frank’s copula for $p = 2$. Here, pleasant calculations show that the algorithm reduces to:

Algorithm 3.2 Generating Bivariate Outcomes from Frank’s Copula

1. Generate $U_1, U_2$ independent uniform (0, 1) random numbers.

2. Set $X_1 = F_1^{-1}(U_1)$.

3. Calculate $X_2$ as the solution of

$$U_2 = \left(1 + e^{-\alpha U_1}\, \frac{e^{\alpha} - e^{\alpha F_2(X_2)}}{e^{\alpha F_2(X_2)} - 1}\right)^{-1},$$

   written here in the parameterization of Table 1. That is, calculate $X_2 = F_2^{-1}(U_2^*)$ where

$$U_2^* = \frac{1}{\alpha} \ln\left\{\frac{U_2 e^{\alpha} + e^{\alpha U_1}(1 - U_2)}{U_2 + e^{\alpha U_1}(1 - U_2)}\right\}.$$

Genest (1987) gave this algorithm.

The algorithm can also be readily used to simulate distributions from Clayton’s family. From Table 2, we have $\phi^{-1}(s) = (1 + s)^{-1/\alpha}$ so that

$$\phi^{-1(1)}(s) = -\alpha^{-1}(1 + s)^{-(1/\alpha) - 1}.$$

This expression with $p = 2$ and Equation (3.4) show that

$$U_2 = \frac{-\alpha^{-1}\left(1 + F_2(X_2)^{-\alpha} - 1 + U_1^{-\alpha} - 1\right)^{-(1/\alpha) - 1}}{-\alpha^{-1}\left(1 + U_1^{-\alpha} - 1\right)^{-(1/\alpha) - 1}}.$$

That is, calculate $X_2 = F_2^{-1}(U_2^*)$ where

$$U_2^* = \left(1 + U_1^{-\alpha}\left(U_2^{-\alpha/(\alpha + 1)} - 1\right)\right)^{-1/\alpha}.$$
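The conditional-distribution method for the Frank and Clayton families can be sketched as below. The function names are hypothetical, and the formulas are derived directly from the copulas of Table 1 by solving $U_2 = \partial C(U_1, v)/\partial u$ for $v$, so the sketch should be read as a consistency check rather than as the original implementation.

```python
import math
import random

def frank_c1(u, v, alpha):
    """dC/du for Frank's copula in the Table 1 parameterization (alpha nonzero)."""
    a, b = math.expm1(alpha * u), math.expm1(alpha * v)
    return math.exp(alpha * u) * b / (math.expm1(alpha) + a * b)

def frank_conditional_inverse(u1, u2, alpha):
    """Solve u2 = frank_c1(u1, v, alpha) for v."""
    w = math.exp(alpha * u1) * (1.0 - u2)
    return math.log((u2 * math.exp(alpha) + w) / (u2 + w)) / alpha

def clayton_c1(u, v, alpha):
    """dC/du for Clayton's copula."""
    return u ** (-alpha - 1.0) * (u ** -alpha + v ** -alpha - 1.0) ** (-1.0 / alpha - 1.0)

def clayton_conditional_inverse(u1, u2, alpha):
    """Solve u2 = clayton_c1(u1, v, alpha) for v."""
    return (1.0 + u1 ** -alpha * (u2 ** (-alpha / (alpha + 1.0)) - 1.0)) ** (-1.0 / alpha)

def simulate_pair(conditional_inverse, alpha, f1_inv, f2_inv, rng):
    """Steps 1-3: draw (U1, U2), then X1 = F1^{-1}(U1), X2 = F2^{-1}(U2*)."""
    u1 = 1.0 - rng.random()      # in (0, 1], avoids u1 == 0
    u2 = 1.0 - rng.random()
    return f1_inv(u1), f2_inv(conditional_inverse(u1, u2, alpha))

if __name__ == "__main__":
    rng = random.Random(1)
    identity = lambda u: u       # uniform marginals, for illustration only
    print(simulate_pair(frank_conditional_inverse, -3.158, identity, identity, rng))
```

A useful property of this formulation is that each inverse can be verified by substituting it back into the corresponding conditional distribution function.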

For the Gumbel-Hougaard copula, determining $X_2$ in Equation (3.4) requires an iterative solution. Although straightforward, this is computationally expensive because many applications require large numbers of simulated values. This drawback leads us to introduce an alternative algorithm suggested by Marshall and Olkin (1988) for compound constructions of copulas.

TABLE 4. SUMMARY STATISTICS OF LOSSES AND EXPENSES

| | ALAE | Loss | Policy Limit | Loss (Uncensored) | Loss (Censored) |
| --- | --- | --- | --- | --- | --- |
| Number | 1,500 | 1,500 | 1,352 | 1,466 | 34 |
| Mean | 12,588 | 41,208 | 559,098 | 37,110 | 217,491 |
| Median | 5,471 | 12,000 | 500,000 | 11,048 | 100,000 |
| Standard Deviation | 28,146 | 102,748 | 418,649 | 92,513 | 258,205 |
| Minimum | 15 | 10 | 5,000 | 10 | 5,000 |
| Maximum | 501,863 | 2,173,595 | 7,500,000 | 2,173,595 | 1,000,000 |
| 25th Percentile | 2,333 | 4,000 | 300,000 | 3,750 | 50,000 |
| 75th Percentile | 12,577 | 35,000 | 1,000,000 | 32,000 | 300,000 |

To generate $X_1, X_2, \ldots, X_p$ having a mixture of powers distribution specified in Equation (3.2), the algorithm is:

Algorithm 3.3 Generating Multivariate Outcomes from a Compound Copula

1. Generate a (latent) random variable $\gamma$ having Laplace transform $\tau$.

2. Independently of step 1, generate $U_1, U_2, \ldots, U_p$ independent uniform (0, 1) random numbers.

3. For $k = 1, \ldots, p$, calculate $X_k = F_k^{-1}(U_k^*)$ where

$$U_k^* = \tau\left(-\gamma^{-1} \ln U_k\right). \qquad (3.5)$$

Recall that the marginal distribution function can be calculated from the baseline distribution function using $F_k(x) = \tau\left(-\ln H_k(x)\right)$. To illustrate for the Gumbel-Hougaard copula, from Equation (3.5), we have

$$U_k^* = \exp\left[-\left(-\gamma^{-1} \ln U_k\right)^{1/\alpha}\right]. \qquad (3.6)$$

The algorithm is straightforward for most copulas of interest that are generated by the compounding method. Like the conditional distribution approach, it can be easily implemented for more than two dimensions ($p > 2$). It is computationally more straightforward than the conditional distribution approach. A disadvantage is that it requires the generation of an additional variable, $\gamma$. For bivariate applications, this means generating 50% more uniform random variates; this can be expensive in applications.
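As a minimal sketch of Algorithm 3.3, the example below uses the gamma row of Table 2, so that $\tau(s) = (1 + s)^{-1/\alpha}$ and the resulting copula is Clayton’s. The choice of the Clayton family, the function names, and the unit scale of the gamma variate are assumptions made for illustration; the Gumbel-Hougaard case would need a positive stable generator, which is not shown.

```python
import math
import random

def sample_clayton_compound(alpha, p, rng):
    """One draw (U*_1, ..., U*_p) from a Clayton copula by the compound method.

    Step 1: gamma ~ Gamma(shape 1/alpha, scale 1), with Laplace transform (1 + s)^(-1/alpha).
    Step 2: U_1, ..., U_p independent uniform, independent of gamma.
    Step 3: U*_k = tau(-ln(U_k) / gamma), Equation (3.5).
    """
    gamma = rng.gammavariate(1.0 / alpha, 1.0)
    us = [1.0 - rng.random() for _ in range(p)]          # values in (0, 1]
    return [(1.0 - math.log(u) / gamma) ** (-1.0 / alpha) for u in us]

def kendall_tau(xs, ys):
    """Sample Kendall coefficient, used here only to check the simulated dependence."""
    n = len(xs)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
            total += (prod > 0) - (prod < 0)
    return total / (n * (n - 1) / 2)

rng = random.Random(2024)                                # fixed seed, illustration only
draws = [sample_clayton_compound(2.0, 2, rng) for _ in range(400)]
tau_hat = kendall_tau([d[0] for d in draws], [d[1] for d in draws])
```

With $\alpha = 2$, Table 3 gives a Kendall coefficient of $\alpha/(\alpha + 2) = 0.5$, so `tau_hat` should fall near that value up to sampling error. Outcomes on the original scale follow by applying $F_k^{-1}$ to each component.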

4. INSURANCE COMPANY LOSS AND EXPENSE APPLICATION

This section illustrates methods of fitting copulas to insurance company indemnity claims. The data comprise 1,500 general liability claims randomly chosen from late settlement lags and were provided by Insurance Services Office, Inc. Each claim consists of an indemnity payment (the loss, $X_1$) and an allocated loss adjustment expense (ALAE, $X_2$). Here, ALAE are types of insurance company expenses that are specifically attributable to the settlement of individual claims such as lawyers’ fees and claims investigation expenses; see, for example, Hogg and Klugman (1984). Our objective is to describe the joint distribution of losses and expenses.

Estimation of the joint distribution of losses and expenses is complicated by the presence of censoring, a common feature of loss data (Hogg and Klugman 1984). Specifically, in addition to loss and expense information, for each claim we have a record of the policy limit, the maximal claim amount. With the presence of the policy limit, the loss variable is censored because the amount of claim cannot exceed the stated policy limit. For some claims, the policy limit was unknown, and for these policies, we assumed there was no policy limit.

Table 4 summarizes the data. Here, only 34 of 1,500 policies have claims that equaled the policy limit and thus are considered censored. However, the censored losses cannot be ignored; for example, the mean loss of censored claims is much higher than the corresponding mean for uncensored claims. Table 4 also shows that our sample comprises only claims with positive losses and expenses. Separate models would be required for claims with positive losses but no expenses or for claims with zero losses but positive expenses.

Figure 1 is a scatterplot of loss versus ALAE. The corresponding correlation coefficient is 0.41. This statistic, together with the plot, suggests a strong relationship between losses and expenses.

FIGURE 1. PLOT OF ALAE VERSUS LOSS. BOTH VARIABLES ARE ON A LOGARITHMIC SCALE. THIRTY-FOUR LOSS OBSERVATIONS ARE CENSORED. THIS PLOT DEMONSTRATES A STRONG RELATIONSHIP BETWEEN ALAE AND LOSS.

4.1 Fitting Marginal Distributions

The initial step in our model fitting is to determine the appropriate marginals. Determining appropriate parametric distributions for univariate data is well described in Hogg and Klugman (1984). For these data, this step was examined in detail earlier by Klugman and Parsa (1995). Thus, for simplicity, we present here only the fit of the univariate marginals using a Pareto distribution. With parameters $\lambda$ and $\theta$, the distribution function is

$$F(x) = 1 - \left(\frac{\lambda}{\lambda + x}\right)^{\theta}.$$

The quality of the fit of the marginal distributions can be examined with a graphical comparison of the fitted distribution functions against their empirical versions, as displayed in Figures 2 and 3. Because of censoring, we used the Kaplan-Meier empirical distribution function for the loss variable.

FIGURE 2. FITTED DISTRIBUTION FUNCTIONS OF ALAE. THE DOTTED CURVE IS THE EMPIRICAL DISTRIBUTION FUNCTION. THE SMOOTH CURVE IS A FIT USING THE PARETO DISTRIBUTION.

4.2 Fitting a Copula to the Bivariate Data

To fit the copula, we first identify the form of the copula in Section 4.2.1 and then estimate it using maximum likelihood in Section 4.2.2.

4.2.1 Identifying a Copula

We use the procedure developed by Genest and Rivest (1993) for identifying an appropriate copula, outlined in Section 3.2. According to the procedure, we examine the degree of closeness of the parametric and nonparametric versions of the distribution function $K(z)$. This procedure is based on estimates of $K(z)$, the distribution function of pseudo-observations $Z = F(X_1, X_2)$. The idea behind the procedure is to compare a nonparametric estimate of $K(z)$ to those based on a specific form of the copula. To compare the two estimates of $K(z)$, we examine quantiles from each estimated distribution. Scatterplots of the two sets of quantiles are widely known in statistics as quantile-quantile, or q-q, plots. For identification purposes, we ignore the mild censoring in the loss variable although we do accommodate it in the more formal maximum likelihood fitting in Section 4.2.2.

FIGURE 3. FITTED DISTRIBUTION FUNCTIONS OF LOSS. THE DOTTED CURVE IS A KAPLAN-MEIER EMPIRICAL DISTRIBUTION FUNCTION. THE SMOOTH CURVE IS A FIT USING THE PARETO DISTRIBUTION.

We now discuss identification for three widely used copulas, namely, the Gumbel-Hougaard, Frank, and Cook-Johnson copulas. The form of the copula for each of these families appears in Table 3. The comparison of the resulting q-q plots is displayed in Figure 4. Because of the close agreement between nonparametric and parametric quantiles, the procedure suggests the use of the Gumbel-Hougaard copula. The quantiles based on Frank’s copula are also close to the nonparametric quantiles, although there is greater disparity at the higher quantiles, corresponding to high losses and expenses. We therefore identify both Frank’s and Gumbel-Hougaard’s as copulas that we fit more formally in Section 4.2.2.

4.2.2 Fitting a Copula Using Maximum Likelihood

Recall that our data consist of losses ($X_1$) and expenses ($X_2$) and that we also have available an indicator for censoring ($\delta$), so that $\delta = 1$ indicates the claim is censored. Parameters were estimated using maximum likelihood procedures that were programmed using the SAS procedure IML. In the development of the likelihood equation, we use the following partial derivatives:

$$F_1(x_1, x_2) = \frac{\partial F(x_1, x_2)}{\partial x_1},$$

$$F_2(x_1, x_2) = \frac{\partial F(x_1, x_2)}{\partial x_2},$$

and

$$f(x_1, x_2) = \frac{\partial^2 F(x_1, x_2)}{\partial x_1\, \partial x_2}.$$

Similarly, the first partial derivatives for the copula will be denoted by $C_1$ and $C_2$; the second mixed partial derivative by $C_{12}$.

Using a one-parameter copula and Pareto marginals, we have a total of five parameters to estimate: two each for the marginals and one for the dependence parameter. To develop the likelihood, we distinguish between the censored and uncensored cases. If the loss variable is not censored, then $\delta = 0$ and the contribution to the likelihood function is

$$f(x_1, x_2) = f_1(x_1)\, f_2(x_2)\, C_{12}[F_1(x_1), F_2(x_2)]. \qquad (4.1)$$

If the loss variable is censored, then $\delta = 1$ and the joint probability is given by

$$\operatorname{Prob}(X_1 \ge x_1, X_2 \le x_2) = F_2(x_2) - F(x_1, x_2).$$

Thus, the contribution to the likelihood when the observation is censored is

$$f_2(x_2) - F_2(x_1, x_2) = f_2(x_2)\left\{1 - C_2[F_1(x_1), F_2(x_2)]\right\}. \qquad (4.2)$$

Combining Equations (4.1) and (4.2), for a single observation, the logarithm of the likelihood function is

$$\log L(x_1, x_2, \delta) = (1 - \delta) \log f(x_1, x_2) + \delta\left\{\log f_2(x_2) + \log\left(1 - C_2[F_1(x_1), F_2(x_2)]\right)\right\}. \qquad (4.3)$$

The parameter estimates are then determined by maximizing the likelihood for the entire dataset:

$$\sum_{i=1}^{n} \log L(x_{1i}, x_{2i}, \delta_i).$$

FIGURE 4. QUANTILE-QUANTILE (Q-Q) PLOTS, CORRESPONDING TO THE PARAMETRIC AND NONPARAMETRIC DISTRIBUTION ESTIMATES OF THE PSEUDO-OBSERVATIONS DEFINED IN SECTION 3.2. THE DOTTED LINES CORRESPOND TO THE QUANTILES OF NONPARAMETRIC AND PARAMETRIC ESTIMATES OF THE ARCHIMEDEAN GENERATOR $\phi$. THE SMOOTHED LINES CORRESPOND TO THE CASE WHERE THE QUANTILES ARE EQUAL.
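The log-likelihood in Equation (4.3) can be sketched in Python as follows. This is an illustration under stated assumptions, not the SAS/IML program used for the fit: it uses Frank’s copula in the Table 1 parameterization (because its partial derivatives are short), Pareto marginals, and hypothetical function names. The derivatives $C_2$ and $C_{12}$ were obtained by differentiating the Table 1 copula and can be checked against finite differences.

```python
import math

def pareto_cdf(x, lam, theta):
    return 1.0 - (lam / (lam + x)) ** theta

def pareto_pdf(x, lam, theta):
    return theta * lam ** theta / (lam + x) ** (theta + 1.0)

def frank_copula(u, v, alpha):
    return math.log1p(math.expm1(alpha * u) * math.expm1(alpha * v)
                      / math.expm1(alpha)) / alpha

def frank_c2(u, v, alpha):
    """First partial derivative dC/dv."""
    a, b = math.expm1(alpha * u), math.expm1(alpha * v)
    return math.exp(alpha * v) * a / (math.expm1(alpha) + a * b)

def frank_c12(u, v, alpha):
    """Mixed partial derivative d^2 C / (du dv), the copula density."""
    a, b = math.expm1(alpha * u), math.expm1(alpha * v)
    d = math.expm1(alpha) + a * b
    return alpha * math.exp(alpha * (u + v)) * math.expm1(alpha) / d ** 2

def log_lik_one(x1, x2, delta, params):
    """Equation (4.3) for one claim; delta = 1 means the loss x1 is censored."""
    lam1, theta1, lam2, theta2, alpha = params
    u = pareto_cdf(x1, lam1, theta1)
    v = pareto_cdf(x2, lam2, theta2)
    log_f2 = math.log(pareto_pdf(x2, lam2, theta2))
    if delta == 0:
        return (math.log(pareto_pdf(x1, lam1, theta1)) + log_f2
                + math.log(frank_c12(u, v, alpha)))
    return log_f2 + math.log(1.0 - frank_c2(u, v, alpha))

def log_lik(data, params):
    """Sum over claims; data is an iterable of (x1, x2, delta) triples."""
    return sum(log_lik_one(x1, x2, d, params) for x1, x2, d in data)
```

In practice `log_lik` would be passed (with a sign change) to a numerical optimizer over the five parameters; the optimizer itself is outside the scope of this sketch.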

Results of the maximum likelihood estimation fitting the Gumbel-Hougaard copula, whose partial derivatives are derived in Appendix A, are summarized in Table 5. Here, we see that the parameter estimates of the marginal distributions are largely unchanged when we compare the univariate to the bivariate estimation. Standard errors are smaller in the bivariate fit, indicating greater precision of the parameter estimates. The estimate of the dependence parameter is significantly different from 1; the estimate of $\alpha$ is approximately 13 standard errors above 1. This provides strong statistical evidence that losses and expenses are not independent. Using Table 3 for the Gumbel-Hougaard copula, we can convert the dependence parameter into a more familiar measure of association. Thus, the parameter value of $\hat{\alpha} = 1.453$ corresponds to an approximate Spearman’s correlation measure of 31%. A 95% confidence interval for $\alpha$ is therefore given by $\hat{\alpha} \pm 1.96\,se(\hat{\alpha}) = 1.453 \pm 1.96(0.034) = (1.386, 1.520)$. This corresponds to a 95% confidence interval of (28%, 34%) for the Spearman’s correlation.

Results of the maximum likelihood estimation fitting the Frank copula are summarized in Table 6. Here, the behavior of parameter estimates and standard errors is similar to the fit using the Gumbel-Hougaard copula. For our parameterization of Frank’s copula, the case of independence corresponds to $\alpha = 0$. In our case, the parameter estimate of $\hat{\alpha} = -3.158$ corresponds to an approximate Spearman’s correlation measure of 32%, which is close to the estimate using the Gumbel-Hougaard copula.

It is difficult to compare the fit of the two copulas directly because they are non-nested models. However, we did compute Akaike’s Information Criteria (AIC) for each model, given by AIC = [−2 ln (maximized likelihood) + 2(5)]/1500. The results are 15.02 and 15.06 for the Gumbel-Hougaard and Frank copula models, respectively. The smaller AIC value for the Gumbel-Hougaard model indicates that this model is preferred.
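As a quick arithmetic check (added here, not part of the original text), the Gumbel-Hougaard entry for Kendall’s $\tau$ in Table 3, $1 - \alpha^{-1}$, can be evaluated at the point estimate and at the interval endpoints:

$$1 - \frac{1}{1.453} \approx 0.312, \qquad 1 - \frac{1}{1.386} \approx 0.278, \qquad 1 - \frac{1}{1.520} \approx 0.342.$$

These values agree with the 31% and (28%, 34%) figures quoted for the Gumbel-Hougaard fit, which suggests that those figures may be on the Kendall’s $\tau$ scale; Table 3 lists no closed form for Spearman’s $\rho$ in this family.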

4.3 Uses of the Bivariate Fit

This section describes two applications using the estimated bivariate distribution of losses and expenses.

TABLE 5. BIVARIATE DATA PARAMETER ESTIMATES USING GUMBEL-HOUGAARD’S COPULA WITH PARETO MARGINALS

| | Parameter | Bivariate Estimate | Bivariate Standard Error | Univariate Estimate | Univariate Standard Error |
| --- | --- | --- | --- | --- | --- |
| Loss ($X_1$) | $\lambda_1$ | 14,036 | 1,298 | 14,453 | 1,397 |
| | $\theta_1$ | 1.122 | 0.062 | 1.135 | 0.066 |
| ALAE ($X_2$) | $\lambda_2$ | 14,219 | 1,426 | 15,133 | 1,633 |
| | $\theta_2$ | 2.118 | 0.153 | 2.223 | 0.175 |
| Dependence | $\alpha$ | 1.453 | 0.034 | Not applicable | Not applicable |

TABLE 6. BIVARIATE DATA PARAMETER ESTIMATES USING FRANK’S COPULA WITH PARETO MARGINALS

| | Parameter | Bivariate Estimate | Bivariate Standard Error | Univariate Estimate | Univariate Standard Error |
| --- | --- | --- | --- | --- | --- |
| Loss ($X_1$) | $\lambda_1$ | 14,558 | 1,390 | 14,453 | 1,397 |
| | $\theta_1$ | 1.115 | 0.065 | 1.135 | 0.066 |
| ALAE ($X_2$) | $\lambda_2$ | 16,678 | 1,824 | 15,133 | 1,633 |
| | $\theta_2$ | 2.309 | 0.187 | 2.223 | 0.175 |
| Dependence | $\alpha$ | −3.158 | 0.174 | Not applicable | Not applicable |

4.3.1 Calculating Reinsurance Premiums

After having identified the joint distribution of $(X_1, X_2)$, we can examine the distribution of any known function of $X_1$ and $X_2$, say, $g(X_1, X_2)$. To illustrate, consider a reinsurer’s expected payment on a policy with limit $L$ and insurer’s retention $R$. Then, assuming a pro-rata sharing of expenses, we have

$$
g(X_1, X_2) =
\begin{cases}
0 & \text{if } X_1 < R \\[4pt]
X_1 - R + \dfrac{X_1 - R}{X_1}\, X_2 & \text{if } R \le X_1 < L \\[8pt]
L - R + \dfrac{L - R}{L}\, X_2 & \text{if } X_1 \ge L.
\end{cases}
$$

The expected payment $E\,g(X_1, X_2)$ could be calculated using numerical integration when the joint density of losses and expenses is available. However, simulation is a simpler, numerical evaluation tool. The procedure for simulation is described in Section 3.3.

The idea with simulation is to generate a sequence of bivariate data $(x_{1i}, x_{2i})$ from the bivariate distribution model. The procedure for the Gumbel-Hougaard copula is summarized in Algorithm 3.3, using Equation (3.6) for Equation (3.5). In the procedure, the inverse of the marginal distribution functions is needed. In the Pareto case, it is not difficult to verify that $F^{-1}(x) = \lambda[(1 - x)^{-1/\theta} - 1]$.

The simulation steps outlined in Algorithm 3.3, using the Gumbel-Hougaard copula, are repeated a large number of times. Let nsim be the number of simulations performed so that we generate the sample sequence $(x_{1i}, x_{2i})$, $i = 1, \ldots, nsim$. Thus, the estimated value for the reinsurer’s expected payment is given by

$$g^*(L, R) = \frac{1}{nsim} \sum_{i=1}^{nsim} g(x_{1i}, x_{2i}),$$

with standard error

$$\widehat{se}\left(g^*(L, R)\right) = \sqrt{\frac{\dfrac{1}{nsim} \displaystyle\sum_{i=1}^{nsim} g(x_{1i}, x_{2i})^2 - g^*(L, R)^2}{nsim}}.$$

We performed a simulation study of size nsim = 100,000; the results are summarized in Table 7.
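The payment function and the simulation estimate can be sketched as follows. The function names are hypothetical, and the sketch takes the simulated pairs as given (for example from one of the algorithms of Section 3.3); it does not reproduce the figures in Table 7.

```python
import math

def reinsurer_payment(x1, x2, limit, retention):
    """g(X1, X2) with pro-rata sharing of expenses; x1 is assumed positive."""
    if x1 < retention:
        return 0.0
    if x1 < limit:
        return (x1 - retention) + (x1 - retention) / x1 * x2
    return (limit - retention) + (limit - retention) / limit * x2

def pareto_inverse(u, lam, theta):
    """Inverse Pareto distribution function: lam * ((1 - u)^(-1/theta) - 1)."""
    return lam * ((1.0 - u) ** (-1.0 / theta) - 1.0)

def monte_carlo_premium(pairs, limit, retention):
    """Return (g*, se) for a list of simulated (x1, x2) pairs."""
    n = len(pairs)
    values = [reinsurer_payment(x1, x2, limit, retention) for x1, x2 in pairs]
    mean = sum(values) / n
    mean_sq = sum(v * v for v in values) / n
    se = math.sqrt(max(mean_sq - mean * mean, 0.0) / n)   # guard against round-off
    return mean, se
```

Given dependent uniforms $(U_1^*, U_2^*)$ from the copula, the pairs would be formed as `(pareto_inverse(u1, lam1, theta1), pareto_inverse(u2, lam2, theta2))` before calling `monte_carlo_premium`.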

The results in Table 7 provide the adjusted premiums the reinsurer would have assessed to cover costs of losses and expenses according to various policy limits and ratios of insurer’s retention to policy limit. On a crude basis, these results appear to make sense. For example, from the summary statistics in Table 4, the average policy limit is 559,098 with an average of losses plus expenses of 53,796. Without any reinsurance, our results indicate an adjusted premium of 49,367, with a standard error of 733, for a policy limit of 500,000. Furthermore, the results are intuitively appealing because, as expected, we observe (1) higher premiums for larger policy limits and (2) lower premiums when the ratio $R/L$ is higher, that is, when the insurer retains a larger amount of losses.

Because it is common practice to assume independence, we provide Table 8, which gives ratios of dependence to independence reinsurance premiums. The dependence assumption is based on the Gumbel-Hougaard estimation results, while the independence assumption is based on the estimation results when the joint distribution is assumed to be the product of the marginals. In previous sections, we argued that the estimation results using dependence are statistically significant. Table 8 shows that substantial mispricing could result if the unrealistic assumption of independence between losses and expenses is made. A ratio below 1.0 from the table suggests an overvalued reinsurance premium; a ratio above 1.0 suggests undervalued premiums. According to the table, undervalued premiums result from higher retention by the reinsured. This undervaluation is more important for higher policy limits. The greatest overvalued premiums are for large retention levels and policy limits. This is intuitively plausible; pricing in the tail of the distribution is very sensitive to model misspecification.

TABLE 7. SIMULATION-BASED REINSURANCE PREMIUMS (SIMULATION STANDARD ERRORS ARE IN PARENTHESES)

| Policy Limit ($L$) | $R/L$ = 0.00 | $R/L$ = 0.25 | $R/L$ = 0.50 | $R/L$ = 0.75 | $R/L$ = 0.95 |
| --- | --- | --- | --- | --- | --- |
| 10,000 | 15,636 (640) | 11,232 (480) | 7,220 (320) | 3,498 (160) | 684 (32) |
| 100,000 | 34,264 (655) | 17,965 (493) | 10,003 (328) | 4,425 (164) | 819 (33) |
| 500,000 | 49,367 (733) | 17,457 (544) | 9,234 (359) | 4,007 (179) | 739 (36) |
| 1,000,000 | 55,683 (818) | 16,762 (597) | 8,740 (390) | 3,716 (193) | 672 (38) |

Columns give the ratio of insurer’s retention to policy limit ($R/L$).

TABLE 8. SIMULATION-BASED RATIOS OF DEPENDENCE TO INDEPENDENCE REINSURANCE PREMIUMS

| Policy Limit ($L$) | $R/L$ = 0.00 | $R/L$ = 0.25 | $R/L$ = 0.50 | $R/L$ = 0.75 | $R/L$ = 0.95 |
| --- | --- | --- | --- | --- | --- |
| 10,000 | 0.80 | 0.95 | 1.02 | 1.07 | 1.10 |
| 100,000 | 0.89 | 1.24 | 1.36 | 1.44 | 1.50 |
| 500,000 | 0.92 | 1.31 | 1.40 | 1.47 | 1.52 |
| 1,000,000 | 0.93 | 1.31 | 1.39 | 1.47 | 1.53 |

Columns give the ratio of insurer’s retention to policy limit ($R/L$).

4.3.2 Estimating Regression Functions

As described in Section 1, the regression function is the most widely used tool for describing multivariate relationships. To illustrate in the context of losses and claims, we examine situations in which it is useful to estimate the expected amount of expenses for a given level of loss. Copulas can help us understand the full joint distribution and thus be used to address some important applications described in Sections 2, 4.3.1, and 5. We can also use copulas to define regression functions, as follows.

To be specific, let us assume an Archimedean form of the copula as in Section 3.3 so that the conditional distribution of $X_k$ given $X_1, \ldots, X_{k-1}$ is

$$F_k(x_k \mid x_1, \ldots, x_{k-1}) = \frac{\varphi^{-1(k-1)}\{c_{k-1} + \varphi[F_k(x_k)]\}}{\varphi^{-1(k-1)}(c_{k-1})},$$

where $c_k = \varphi\{F_1(x_1)\} + \cdots + \varphi\{F_k(x_k)\}$. Basic results from mathematical statistics show that the regression function can be expressed as

$$E(X_k \mid x_1, \ldots, x_{k-1}) = \int_0^{\infty} \left[1 - F_k(x \mid x_1, \ldots, x_{k-1})\right] dx - \int_{-\infty}^{0} F_k(x \mid x_1, \ldots, x_{k-1})\, dx.$$

In certain situations, this expression is convenient to evaluate. For example, assuming $k = 2$ and using uniform marginals and Frank’s copula, Genest (1987) gave the regression function

$$E(X_2 \mid X_1 = x) = \frac{(1 - e^{-\alpha})\, x e^{-\alpha x} + e^{-\alpha}(e^{-\alpha x} - 1)}{(e^{-\alpha x} - 1)(e^{-\alpha} - e^{-\alpha x})}.$$
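As a rough numerical check of the closed form above, the following sketch integrates one minus the conditional distribution over the unit interval and compares the result with the formula. It assumes Frank’s copula written in base $e^{-\alpha}$, which is the parametrization under which the closed form was checked here; the function names are illustrative and do not come from the paper.

```python
import math

def frank_conditional_cdf(x, v, alpha):
    """Pr(X2 <= v | X1 = x) for Frank's copula with uniform marginals.

    Assumption: the copula is parametrized in base exp(-alpha), alpha != 0.
    """
    a = math.exp(-alpha * x)
    b = math.exp(-alpha)
    t = math.exp(-alpha * v)
    return a * (t - 1.0) / ((b - a) + (a - 1.0) * t)

def regression_closed_form(x, alpha):
    """Closed-form E(X2 | X1 = x) as displayed above."""
    a = math.exp(-alpha * x)
    b = math.exp(-alpha)
    numerator = (1.0 - b) * x * a + b * (a - 1.0)
    denominator = (a - 1.0) * (b - a)
    return numerator / denominator

def regression_numeric(x, alpha, n=2000):
    """Simpson's rule for the integral over [0, 1] of 1 - F(v | x); n must be even."""
    step = 1.0 / n
    total = 0.0
    for i in range(n + 1):
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * (1.0 - frank_conditional_cdf(x, i * step, alpha))
    return total * step / 3.0
```

For values of $\alpha$ near zero the regression function is close to 1/2, the independence case, and for large positive $\alpha$ it approaches $x$.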

In general, however, the calculation of the regression function is tedious. As an alternative, copulas are well-suited to the concept of “quantile” regression. Here, in lieu of examining the mean of a conditional distribution, one looks at the median or some other percentile (quantile) of the distribution. Specifically, define the $p$-th quantile to be the solution $x_p$ of the equation:

$$p = F_k(x_p \mid x_1, \ldots, x_{k-1}).$$

For the case $k = 2$, we have

$$p = F_2(x_p \mid X_1 = x_1) = C_1[F_1(x_1), F_2(x_p)]. \tag{4.5}$$

The first partial derivative for the case of the Gumbel-Hougaard copula is derived in Appendix A. For the case of Frank’s copula, the first partial derivative is given by

$$C_1(u, v) = \frac{e^{\alpha u}(e^{\alpha v} - 1)}{e^{\alpha} - 1 + (e^{\alpha u} - 1)(e^{\alpha v} - 1)}. \tag{4.6}$$

For a complete derivation of the partial derivatives of Frank’s copula, see also Frees et al. (1996).

Thus, for a specified proportion $p$ and amount of loss $x_1$, we can find the percentile of the corresponding expenses by solving Equation (4.5). For the case of the Gumbel-Hougaard copula, we use Equation (A.2) to get the $p$-th percentile of the expense level, which is given by

$$x_p = F_2^{-1}(v_p),$$

where $v_p$ is the solution to the following equation:

$$\left[\frac{\ln F_1(x_1)}{\ln C(F_1(x_1), v_p)}\right]^{\alpha - 1} \frac{C(F_1(x_1), v_p)}{F_1(x_1)} = p.$$
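The equation for $v_p$ has no closed-form solution in general, so it is solved numerically. The sketch below uses plain bisection, one possible choice, and assumes the standard form of the Gumbel-Hougaard copula, $C(u, v) = \exp\{-[(-\ln u)^{\alpha} + (-\ln v)^{\alpha}]^{1/\alpha}\}$ with $\alpha \geq 1$. The function names are illustrative, and the marginals $F_1$ and $F_2$ are left to the user.

```python
import math

def gumbel_copula(u, v, alpha):
    """Gumbel-Hougaard copula C(u, v) for 0 < u, v < 1 and alpha >= 1."""
    s = (-math.log(u)) ** alpha + (-math.log(v)) ** alpha
    return math.exp(-(s ** (1.0 / alpha)))

def gumbel_c1(u, v, alpha):
    """First partial derivative of C with respect to u (left side of the equation above)."""
    c = gumbel_copula(u, v, alpha)
    return (math.log(u) / math.log(c)) ** (alpha - 1.0) * c / u

def conditional_quantile(u, p, alpha, tol=1e-12):
    """Solve C_1(u, v_p) = p for v_p by bisection; C_1 is increasing in v."""
    lo, hi = 1e-15, 1.0 - 1e-15
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gumbel_c1(u, mid, alpha) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With $u = F_1(x_1)$, the expense percentile is then $x_p = F_2^{-1}(v_p)$ for whatever marginal $F_2$ has been fitted. When $\alpha = 1$ the copula reduces to independence and the routine returns $v_p = p$.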

For various percentiles, Figure 5 graphically displays the result of regression curves that provide estimates of expenses conditional on the amount of loss using the Gumbel-Hougaard copula. We superimposed these regression curves on the scatterplot of losses and expenses. This plot allows a manager to estimate expenses for a prespecified loss amount. By providing several percentiles, the manager can choose the degree of conservatism that is appropriate for the business decision at hand.


FIGURE 5
SCATTERPLOT OF ALAE VERSUS LOSS WITH QUANTILE REGRESSION CURVES SUPERIMPOSED

5. ADDITIONAL APPLICATIONS OF COPULAS

We now discuss three different subject areas useful to the actuary in which copulas have been applied: stochastic ordering, fuzzy logic, and insurance pricing. Stochastic ordering refers to relationships among distribution functions of random variables. Fuzzy logic is an approach for dealing with uncertainty, analogous to probability theory. All actuaries are familiar with insurance pricing, which involves premium calculation.

5.1 Stochastic Ordering

In actuarial science and the economics of decision-making, the comparison of the attractiveness of various risks is usually of interest; therefore the subject of stochastic orderings is of prime importance. To illustrate, we say the random variable $X_1$ stochastically dominates the random variable $X_2$ with respect to a class of functions, say $U$, if for any function $u \in U$, we have

$$E\,u(X_1) \geq E\,u(X_2).$$

Often, $u$ is the utility-of-wealth function that defines risk preferences of decision-makers. As special cases, we consider the well-known classes of first (FSD) and second (SSD) stochastic dominance. In FSD, the class $U$ is taken to be the class of increasing utility functions. In SSD, $U$ is the smaller class of increasing, concave utility functions, applicable to risk-averse decision-makers; see Kaas et al. (1994), Heilman and Schroter (1991), and Goovaerts et al. (1982).

Copulas are used in the analysis of the demand for insurance coverage studied by Tibiletti (1995). The problem is to find the optimal insurance coverage for a decision-maker when a proportion of the wealth is uninsurable. Suppose that $X_1$ is the amount of uninsurable asset and $X_2$ is the insurable loss with $0 \leq x_2 \leq m$; that is, $m$ is the maximum value of the insurable asset. In a two-period model, define the final wealth to be

$$Z = X_1 + m - X_2 + d(X_2 - p). \tag{5.1}$$

Here, $d$ is the coinsurance rate and $p$ is the premium rate. In this insurance setup, the questions typically explored are: (1) what coinsurance rate $d$ maximizes $E(Z)$ and (2) when do certain “beneficial” changes that increase expected utility affect optimal coverage? Tibiletti explored beneficial changes such as (1) a shift in the distribution of either $X_1$ or $X_2$, (2) a change in the dependence between $X_1$ and $X_2$, and (3) a change in both (1) and (2). Because dependence comes into play, copulas provide a natural tool in this situation. Most of the theorems described and proved use conditions that involve a certain dependence ordering called more concordance. If $(X_1, X_2)$ and $(Y_1, Y_2)$ are pairs with associated copulas $C_X$ and $C_Y$, then the pair $(X_1, X_2)$ is said to be more concordant than $(Y_1, Y_2)$ if $C_X(a, b) \geq C_Y(a, b)$ for all $(a, b) \in [0, 1]^2$.
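To make the more concordant ordering concrete, the following sketch compares three familiar copulas on a grid of points in the unit square. A grid comparison can refute the ordering but cannot prove it, so it is a screening device only. The function names and the grid size are illustrative assumptions.

```python
def frechet_upper(a, b):
    """Frechet upper bound copula."""
    return min(a, b)

def independence(a, b):
    """Independence copula."""
    return a * b

def frechet_lower(a, b):
    """Frechet lower bound copula (bivariate case)."""
    return max(a + b - 1.0, 0.0)

def more_concordant_on_grid(c_x, c_y, n=50, tol=1e-12):
    """True if c_x(a, b) >= c_y(a, b) at every point of an (n+1) x (n+1) grid on [0, 1]^2."""
    grid = [i / n for i in range(n + 1)]
    return all(c_x(a, b) >= c_y(a, b) - tol for a in grid for b in grid)
```

On this grid the Frechet upper bound dominates the independence copula, which in turn dominates the Frechet lower bound, while the reverse comparisons fail.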

The more concordant ordering is just one of several types of dependence ordering used to order multivariate distributions. See Tchen (1980), Kimeldorf and Sampson (1987, 1989), and Metry and Sampson (1991) for alternative orderings. For example, another type is the so-called more regression dependent ordering. Here, the random vector $(X_1, X_2)$ is said to be more regression dependent than $(Y_1, Y_2)$ if for any $x_1' > x_1$, we have

$$\frac{\Pr(X_2 \leq x_2 \mid X_1 = x_1')}{\Pr(Y_2 \leq y_2 \mid Y_1 = x_1')} \geq 1$$

whenever we have

$$\frac{\Pr(X_2 \leq x_2 \mid X_1 = x_1)}{\Pr(Y_2 \leq y_2 \mid Y_1 = x_1)} \geq 1.$$

See Bilodeau (1989). Dependence ordering is particularly useful for determining the range of possible dependence in multivariate random variables. Other areas in which stochastic orderings are useful are probability theory, reliability, and operations research; see Mosler and Scarsini (1991).

5.2 Fuzzy Sets

Like probability theory, fuzzy set theory deals with uncertainty. To start, let $U$ be a non-empty set whose subsets are of primary interest; in fuzzy set theory, it is called the universe of discourse. A fuzzy set is defined to be a pair $(E, \mu_E)$ where $E \subset U$ and $\mu_E : U \to [0, 1]$. For any $x$ in $U$, $\mu_E(x)$ denotes the degree to which $x$ belongs to the fuzzy set, almost like a probability.

Operations, such as unions and intersections, on fuzzy sets are common. To illustrate, if we have two fuzzy sets $(A, \mu_A)$ and $(B, \mu_B)$, then their union is the fuzzy set $(C, \mu_C)$ where $\mu_C(x) = \max[\mu_A(x), \mu_B(x)]$ and their intersection is the fuzzy set $(D, \mu_D)$ with $\mu_D(x) = \min[\mu_A(x), \mu_B(x)]$. Other operations on fuzzy sets are performed using triangular norms. Triangular norms were considered originally in the context of probabilistic metric spaces and provide the link to copulas.

Following Ostaszewski (1993) and Klement (1982a, 1982b), a triangular norm (t-norm for short), $T$, is a mapping $T : [0, 1]^2 \to [0, 1]$ with the following properties: (1) boundary condition: $T(x, 1) = x$; (2) monotonicity: $T(x, y) \leq T(u, v)$ whenever $x \leq u$, $y \leq v$; (3) commutativity: $T(x, y) = T(y, x)$; and (4) associativity: $T(T(x, y), z) = T(x, T(y, z))$. In Schweizer and Sklar (1983), several results are given that relate triangular norms and copulas. Many copulas can serve as triangular norms, as the following examples illustrate.

Example 5.1 Frechet Bounds

$$T(x, y) = \min(x, y) \quad \text{and} \quad T(x, y) = \max(x + y - 1, 0).$$

Example 5.2 Independence

$$T(x, y) = xy.$$

Example 5.3 Frank

$$T(x, y) = \frac{\ln\left[1 + \dfrac{(h^x - 1)(h^y - 1)}{h - 1}\right]}{\ln h},$$

where $h > 0$, $h \neq 1$.

Therefore, by considering familiar copulas, as well as newly constructed copulas, we can define new operations that can be performed on fuzzy sets. Fuzzy set theory has been applied in specific areas of interest to many actuaries such as risk economics, interest theory, and underwriting/classification of risks. See Ostaszewski (1993), Lemaire (1990), and Young (1993) for descriptions of these applications.
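The four t-norm properties can be spot-checked numerically for the copulas in Examples 5.1 to 5.3. The sketch below tests them on a coarse grid, which can expose a violation but is not a proof. The function names, the grid size, and the choice $h = 2$ for Frank’s family are illustrative assumptions.

```python
import math

def t_min(x, y):
    """Frechet upper bound (Example 5.1)."""
    return min(x, y)

def t_lukasiewicz(x, y):
    """Frechet lower bound (Example 5.1)."""
    return max(x + y - 1.0, 0.0)

def t_product(x, y):
    """Independence (Example 5.2)."""
    return x * y

def t_frank(x, y, h=2.0):
    """Frank (Example 5.3), valid for h > 0 and h != 1."""
    return math.log(1.0 + (h ** x - 1.0) * (h ** y - 1.0) / (h - 1.0)) / math.log(h)

def is_t_norm_on_grid(t, n=10, tol=1e-9):
    """Check boundary, commutativity, associativity, and monotonicity on a grid."""
    grid = [i / n for i in range(n + 1)]
    for x in grid:
        if abs(t(x, 1.0) - x) > tol:                         # boundary condition
            return False
        for y in grid:
            if abs(t(x, y) - t(y, x)) > tol:                 # commutativity
                return False
            for z in grid:
                if abs(t(t(x, y), z) - t(x, t(y, z))) > tol:  # associativity
                    return False
    for i in range(n):
        for y in grid:
            if t(grid[i], y) > t(grid[i + 1], y) + tol:      # monotone in first argument
                return False
            if t(y, grid[i]) > t(y, grid[i + 1]) + tol:      # monotone in second argument
                return False
    return True
```

Applied to membership degrees such as $\mu_A(x) = 0.7$ and $\mu_B(x) = 0.4$, each of these functions yields a candidate degree of membership in the intersection, with the minimum giving the largest value and the Frechet lower bound the smallest.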

5.3 Insurance Pricing

The calculation of a premium to be assigned to an insurance risk $X$ is a fundamental job of the actuary. Recently, Wang (1996) developed a principle that exhibits several desirable properties of a premium principle. Wang (1997) used the concept of a distortion function and extended this, using copulas, to a portfolio of dependent risks.

Let $X$ be an insurance risk with survival function $S(x) = \Pr(X > x)$. A class of premium principles can be defined by $p(X) = \int_0^{\infty} g[S(z)]\,dz$, where $g$, called the distortion function, is increasing with $g(0) = 0$ and $g(1) = 1$. In the special case of the function $g(t) = t^{1/h}$ with $h \geq 0$, we have the class of proportional hazard (PH) transforms.
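As an illustration of the premium integral, the sketch below evaluates it numerically for an exponential risk under the PH transform. For an exponential risk with a given mean, $g[S(z)] = e^{-z/(h \cdot \text{mean})}$, so the premium should equal $h$ times the mean. The exponential risk, the value $h = 1.5$, the truncation point, and the function names are all assumptions made for the example.

```python
import math

def distortion_premium(survival, g, upper, n=20000):
    """Approximate the integral of g(S(z)) over [0, upper] by Simpson's rule (n even).

    The infinite upper limit is truncated at `upper`, which must be large
    enough that g(S(upper)) is negligible.
    """
    step = upper / n
    total = 0.0
    for i in range(n + 1):
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * g(survival(i * step))
    return total * step / 3.0

def ph_transform(h):
    """Proportional hazard distortion g(t) = t**(1/h)."""
    return lambda t: t ** (1.0 / h)

mean_loss = 1000.0                                   # hypothetical expected loss
exp_survival = lambda z: math.exp(-z / mean_loss)    # exponential survival function

net_premium = distortion_premium(exp_survival, ph_transform(1.0), upper=60000.0)
loaded_premium = distortion_premium(exp_survival, ph_transform(1.5), upper=60000.0)
```

With $h = 1$ the distortion is the identity and the integral returns the expected loss; with $h = 1.5$ the premium carries a 50 percent loading for this particular risk.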

For the bivariate case, consider a random vector $(X_1, X_2)$ with distribution function $F(x_1, x_2) = C[F_1(x_1), F_2(x_2)]$; then the function $\tilde{F}(x_1, x_2) = g[F(x_1, x_2)]$ is another joint distribution function with marginals $g(F_1)$ and $g(F_2)$. The associated copula is therefore $\tilde{C}(u_1, u_2) = g[C(g^{-1}(u_1), g^{-1}(u_2))]$. Some illustrations of distortion functions follow.

Example 5.4 Power Distortion Function

Here, we have $g(t) = t^{1/h}$, where $h \geq 0$. The independence structure is preserved under this distortion function. To see this, note that if $C(u_1, u_2) = u_1 u_2$, then $\tilde{C}(u_1, u_2) = g[C(u_1^h, u_2^h)] = u_1 u_2$.
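The invariance of the independence copula under the power distortion can be confirmed numerically. The sketch below builds the distorted copula $g[C(g^{-1}(u_1), g^{-1}(u_2))]$ from a copula and a distortion function supplied by the user; the helper name and the value $h = 2.5$ are illustrative assumptions.

```python
def distorted_copula(copula, g, g_inverse):
    """Return the copula (u1, u2) -> g(C(g^{-1}(u1), g^{-1}(u2)))."""
    return lambda u1, u2: g(copula(g_inverse(u1), g_inverse(u2)))

h = 2.5                                    # illustrative power-distortion parameter
power_g = lambda t: t ** (1.0 / h)         # g(t) = t^(1/h)
power_g_inverse = lambda u: u ** h         # g^{-1}(u) = u^h

independence = lambda u1, u2: u1 * u2
distorted = distorted_copula(independence, power_g, power_g_inverse)

# Largest deviation from u1 * u2 over a grid of points in the unit square.
grid = [i / 20 for i in range(21)]
max_gap = max(abs(distorted(a, b) - a * b) for a in grid for b in grid)
```

The quantity `max_gap` is zero up to floating-point rounding, in line with the algebra $[(u_1 u_2)^h]^{1/h} = u_1 u_2$.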

Example 5.5 Exponential Distortion Function

Here, we have

$$g(t) = \frac{1 - e^{-ht}}{1 - e^{-h}},$$

where $h > 0$. Again, beginning with the independence copula $C(u_1, u_2) = u_1 u_2$, it is straightforward to show that we can generate Frank’s family of copulas

$$\tilde{C}(u_1, u_2) = \frac{1}{\ln h} \ln\left[1 + \frac{(h^{u_1} - 1)(h^{u_2} - 1)}{h - 1}\right].$$

Using the concept of copulas, Wang (1997) extends the PH measure to a portfolio of risks. In many situations, for a portfolio of risks, the proposed measure is useful because individual risks are considered dependent. Wang justified dependence of individual risks by stating that they are “influenced by the same underlying market environment.”


6. SUMMARY AND CONCLUSIONS

In analyzing the impact of future contingent events, actuaries are faced with problems involving multivariate outcomes. In this paper, we reviewed the problems of (1) estimating distributions of joint lifetimes of paired individuals, useful in the analysis of survivorship insurance protection, and (2) investigating mortality experience, for the actuary who needs to distinguish among causes of death. We introduced, and provided a solution for, the problem of dependence between an insurer’s losses and expenses. Ignoring such dependencies can lead to mispricing. Thus, it is important for actuaries to be able to adequately model multivariate outcomes.

The tool used to study multivariate outcomes is the copula function; it couples univariate marginals to the full multivariate distribution. The biological frailty models and the mathematical Archimedean models can motivate copulas. A statistical mixture of powers model serves as a bridge between these two sets of families.

Because copulas are parametric families, standard techniques such as maximum likelihood can be used for estimation. Other statistical tools have been recently developed to help fit copulas. We described a graphical tool to identify the form of the copula. We discussed how copulas could be used to simulate multivariate outcomes, an important tool for actuaries. We also developed the connection between copulas and the regression function, a widely used tool for summarizing what we expect based on conditional distributions.

This article has focused on the connection between copulas and statistics, the theory of data. Yet, much of the development of copulas has historically arisen from probability theory. To recognize this connection, we briefly reviewed topics in applied probability theory pertaining to copulas that are of the greatest interest to actuaries: stochastic ordering, fuzzy set theory, and insurance pricing.

Our knowledge of copulas has been rapidly developing recently. Many of the articles cited in this review paper were written in the 1990s. In another recent survey paper, Kotz (1997) cites three conferences in the last five years that were largely devoted to copulas. Copulas offer analysts an intuitively appealing structure, first for investigating univariate distributions and second for specifying a dependence structure. Copulas offer a flexible structure that can be applied in many situations. We hope that this article encourages actuaries to seek new applications for this promising tool.

ACKNOWLEDGMENTS

We thank Stuart Klugman for introducing the first author to the Section 4 insurance loss and expense problem at the 1995 Casualty Actuarial Society Ratemaking Seminar. Clive Keatinge of Insurance Services Office, Inc. provided the data for this problem and suggested the reinsurer’s pricing example. We also thank Donald Jones, Marjorie Rosenberg, and Virginia Young for helpful comments on an early draft of this paper. The research of Emiliano Valdez was supported by a doctoral fellowship from the Society of Actuaries. The research of Edward Frees was partially supported by a professorship funded by the Time Insurance Company.

ANNOTATED BIBLIOGRAPHY*

*Articles directly related to copulas have been annotated for the reader’s convenience.

ANDERSON, T.W. 1958. An Introduction to Multivariate Statistical Analysis. New York: John Wiley.

BILODEAU, M. 1989. “On the Monotone Regression Dependence for Archimedean Bivariate Uniform,” Communications in Statistics–Theory & Methods A 18:981–988.
This paper provides necessary and sufficient conditions for Archimedean copulas to exhibit the “monotone regression dependence” property. This property is desired for testing independence because of the resulting monotone power function. Frank’s family and Cook and Johnson’s family of copulas are shown to possess the property.

BOWERS, N., GERBER, H., HICKMAN, J., JONES, D., AND NESBITT, C. 1997. Actuarial Mathematics. 2nd ed. Schaumburg, Ill.: Society of Actuaries.

CARRIERE, J. 1994. “Dependent Decrement Theory,” Transactions of the Society of Actuaries XLVI:45–74.
Copulas are used to model dependence in a multiple decrement framework. The problem of identifiability was briefly discussed. As an application, the paper uses U.S. population data to investigate the effect of removing heart/cerebrovascular diseases as a cause of death when assumed to be dependent with other causes.

CARRIERE, J. 1994. “A Large Sample Test for One-Parameter Families of Copulas,” Communications in Statistics–Theory and Methods 23, no. 5:1311–1317.
This article provides an asymptotic test of whether or not identically and independently distributed bivariate data arise from a specified single-parameter copula family.

CLAYTON, D.G. 1978. “A Model for Association in Bivariate Life Tables and its Application in Epidemiological Studies of Familial Tendency in Chronic Disease Incidence,” Biometrika 65:141–151.

COMMITTEES ON ACTUARIAL PRINCIPLES OF THE SOCIETY OF ACTUARIES AND THE CASUALTY ACTUARIAL SOCIETY. “General Principles of Actuarial Science,” April 30, 1997, discussion draft.

COOK, R.D., AND JOHNSON, M.E. 1981. “A Family of Distributions for Modeling Non-Elliptically Symmetric Multivariate Data,” Journal of the Royal Statistical Society B 43:210–218.
The Cook and Johnson family of copulas was first proposed in this paper. Designed for joint distributions that are not elliptically symmetric, the family includes, as special cases, the multivariate Burr, logistic, and Pareto distributions. Finally, the family was used to fit a uranium mining dataset.

COX, D.R., AND OAKES, D. 1984. Analysis of Survival Data. New York: Chapman and Hall.

DAVID, H.A., AND MOESCHBERGER, M.L. 1978. The Theory of Competing Risks. New York: MacMillan Publications.

ELANDT-JOHNSON, R.C., AND JOHNSON, N.L. 1980. Survival Models and Data Analysis. New York: John Wiley.

FRANK, M.J. 1979. “On the Simultaneous Associativity of F(x,y) and x + y − F(x,y),” Aequationes Mathematicae 19:194–226.
Frank’s copula first appeared in this paper as a solution to a functional equation problem. That problem involved finding all continuous functions such that F(x,y) and x + y − F(x,y) are associative, as defined in the paper. Associative functions are typically studied in the context of topological semigroups.

FREES, E. 1996. Data Analysis Using Regression Models: The Business Perspective. Englewood Cliffs: Prentice Hall.

FREES, E., CARRIERE, J., AND VALDEZ, E. 1996. “Annuity Valuation with Dependent Mortality,” Journal of Risk and Insurance 63:229–261.
Copulas are used to value annuities based on more than one life in order to capture the possible dependence of lives. The study shows a reduction of approximately five percent in annuity values when dependent mortality models are used, compared to standard models of independence. The model was calibrated using data from a large insurer.

GALTON, F. 1885. “Regression Towards Mediocrity in Hereditary Stature,” Journal of Anthropological Institute 15:246–263.

GENEST, C. 1987. “Frank’s Family of Bivariate Distributions,” Biometrika 74:549–555.
Basic properties of Frank’s family of copulas are further investigated in this paper. Members of this family exhibit properties of stochastically ordered and likelihood ratio dependent distributions. Nonparametric estimators of the dependence parameter are recommended. Simulation procedures were used to evaluate performance of these estimators.

GENEST, C., AND MACKAY, J. 1986. “Copules Archimédiennes et Familles de lois Bidimensionnelles Dont les Marges Sont Données,” The Canadian Journal of Statistics 14:154–159.
This paper studies Archimedean copulas. Limits of sequences of members from this special class of copulas are investigated. Further, conditions for which these copulas exhibit properties of stochastically ordered distributions are provided. (Written in French.)

GENEST, C., AND MACKAY, J. 1986. “The Joy of Copulas: Bivariate Distributions with Uniform Marginals,” The American Statistician 40:280–283.
The authors extend their previous work on Archimedean copulas by proving properties using basic calculus only. In particular, they prove a result that relates the corresponding Archimedean function to the familiar Kendall’s tau measure of association.

GENEST, C., AND RIVEST, L. 1993. “Statistical Inference Procedures for Bivariate Archimedean Copulas,” Journal of the American Statistical Association 88:1034–1043.
Assuming uniform marginals, this article introduces a procedure for estimating the function which determines an Archimedean copula. The estimate is fully nonparametric and is (pointwise) root-n consistent. Using a uranium exploration dataset, the article illustrates how to use the new procedure to choose a copula.

GENEST, C., GHOUDI, K., AND RIVEST, L. 1995. “A Semi-parametric Estimation Procedure of Dependence Parameters in Multivariate Families of Distributions,” Biometrika 82:543–552.

GOOVAERTS, M.J., DE VYLDER, F., AND HAEZENDONCK, J. 1982. “Ordering of Risks: A Review,” Insurance: Mathematics and Economics 1:131–163.

GOOVAERTS, M.J., DE VYLDER, F., AND HAEZENDONCK, J. 1984. Insurance Premiums: Theory and Applications. Amsterdam: North Holland.

GUMBEL, E.J. 1960. “Bivariate Exponential Distributions,” Journal of the American Statistical Association 55:698–707.

HADAR, J., AND SEO, T.K. 1992. “A Note on Beneficial Changes in Random Variables,” The Geneva Papers on Risk and Insurance: Theory 17:171–179.

HEILMAN, W.R., AND SCHROTER, K.J. 1991. “Ordering of Risks and their Actuarial Applications,” in Stochastic Orders and Decision under Risk, edited by K. Mosler and M. Scarsini, Hayward, California: Institute of Mathematical Statistics.

HOUGAARD, P. 1984. “Life Table Methods for Heterogeneous Populations: Distributions Describing the Heterogeneity,” Biometrika 71:75–83.

HOUGAARD, P. 1986. “A Class of Multivariate Failure Time Distributions,” Biometrika 73:671–678.

HOUGAARD, P. 1987. “Modeling Multivariate Survival,” Scandinavian Journal of Statistics 14:291–304.
This paper reviews models for multivariate survival and discusses practical situations for which they would be relevant. The author does not specifically suggest copula models for survival analysis but provides models based on marginal distributions. Here, the joint survival distribution is specified together with a single parameter to measure dependence. One can therefore infer that the joint survival can be expressed in terms of copulas. The paper is therefore appropriate for discussion of situations for which copula models are suitable.

HOUGAARD, P., HARVALD, B., AND HOLM, N.V. 1992. “Measuring the Similarities Between the Lifetimes of Adult Danish Twins Born Between 1881–1930,” Journal of the American Statistical Association 87:17–24.

HUTCHINSON, T.P., AND LAI, C.D. 1990. Continuous Bivariate Distributions, Emphasising Applications. Adelaide, South Australia: Rumsby Scientific Publishing.
As the title indicates, this monograph organizes and describes the many families of bivariate distributions that have appeared in the literature. In addition to applications, special attention is given to copula families. In lieu of developing material from the foundations, the monograph collects a wide variety of properties and potential applications about many distributions, with a citation of original sources.

JAGGER, C., AND SUTTON, C.J. 1991. “Death After Marital Bereavement—Is the Risk Increased?” Statistics in Medicine 10:395–404.

JOE, H. 1993. “Parametric Families of Multivariate Distributions with Given Margins,” Journal of Multivariate Analysis 46:262–282.
The author examines one-parameter bivariate families of copulas and describes some of their interesting properties. Whenever possible, these bivariate families were extended to the multivariate setting and corresponding properties studied. Several open problems are posed in the paper.

JOE, H. 1997. Multivariate Models and Dependence Concepts. London: Chapman and Hall.
An excellent, although highly technical, research monograph that describes recent results in non-normal multivariate models. Much of the monograph focuses on copula constructions. It also contains discussion of categorical, longitudinal, and time series data.

JOE, H., AND HU, T. 1996. “Multivariate Distributions from Mixtures of Max-Infinitely Divisible Distributions,” Journal of Multivariate Analysis 57:240–265.
This article explores the procedure based on the mixture of powers to generate new families of copulas. The procedure is based on distributions that are called max-infinitely divisible. Using this procedure, several families with closed-form multivariate copulas can be obtained, but only copulas that can be used for modeling positive dependence. It was mentioned in the discussion that extensions to allow for negative dependence will follow in a subsequent paper.

JOHNSON, N., AND KOTZ, S. 1972. Distributions in Statistics: Continuous Multivariate Distributions. New York: John Wiley.

JOHNSON, N., KOTZ, S., AND BALAKRISHNAN, N. 1997. Discrete Multivariate Distributions. New York: John Wiley.

JOHNSON, M., AND TENENBEIN, A. 1981. “A Bivariate Distribution Family with Specified Marginals,” Journal of the American Statistical Association 76:198–201.
This paper provides a procedure to construct families of bivariate distributions whose margins and measures of dependence are specified. The construction was based on the method of weighted linear combination. While not directly related to copulas, the procedure is well-suited for constructing bivariate families of copulas.

JOHNSON, R., AND WICHERN, D. 1988. Applied Multivariate Statistical Analysis. Englewood Cliffs, N.J.: Prentice Hall.

KAAS, R., HEERWAARDEN, A.E. VAN, AND GOOVAERTS, M.J. 1994. Ordering of Actuarial Risks. Amsterdam: North Holland.

KIMELDORF, G., AND SAMPSON, A.R. 1987. “Positive Dependence Orderings,” Annals of the Institute of Statistical Mathematics 39:113–128.

KIMELDORF, G., AND SAMPSON, A.R. 1989. “A Framework for Positive Dependence,” Annals of the Institute of Statistical Mathematics 41:31–45.

KLEMENT, E.P. 1982a. “Construction of Fuzzy σ-Algebras Using Triangular Norms,” Journal of Mathematical Analysis and Applications 85:543–565.

KLEMENT, E.P. 1982b. “Characterization of Fuzzy Measures Constructed by Means of Triangular Norms,” Journal of Mathematical Analysis and Applications 86:345–358.

KLUGMAN, S., AND PARSA, A. 1995. “Fitting Bivariate Loss Distributions with Plackett’s Model,” Paper presented at the Casualty Actuarial Society Ratemaking Seminar.

KLUGMAN, S., PANJER, H., VENTER, G., AND WILLMOT, G. 1997. Loss Models: From Data to Decisions. Unpublished Monograph.

KOTZ, S. 1997. “Some Remarks on Copulas in Relation to Modern Multivariate Analysis,” Contemporary Multivariate Analysis and Its Applications k.1–k.13.

KRZANOWSKI, W.J. 1988. Principles of Multivariate Analysis: A User’s Perspective. Oxford: Oxford University Press.

LEE, A.J. 1993. “Generating Random Binary Deviates Having Fixed Marginal Distributions and Specified Degrees of Association,” The American Statistician 47:209–215.

LEMAIRE, J. 1990. “Fuzzy Insurance,” Astin Bulletin 20:33–55.

LI, H., SCARSINI, M., AND SHAKED, M. 1996. “Linkages: A Tool for the Construction of Multivariate Distributions with Given Nonoverlapping Multivariate Marginals,” Journal of Multivariate Analysis 56:20–41.

MAGULURI, G. 1993. “Semiparametric Estimation of Association in a Bivariate Survival Function,” Annals of Statistics 21:1648–1662.
This article considers bivariate survival data with no censoring of each risk. For the dependency between risks, a one-parameter copula model is used. With nonparametric estimates of the marginal survival function, this paper derives the asymptotic lower bound for the information in the copula parameter.

MAKEHAM, W.M. 1874. “On an Application of the Theory of Decremental Forces,” Journal of the Institute of Actuaries 18:317–322.


MARSHALL, A.W., AND OLKIN, I. 1967. “A Multivariate Exponential Distribution,” Journal of the American Statistical Association 62:30–44.

MARSHALL, A.W., AND OLKIN, I. 1988. “Families of Multivariate Distributions,” Journal of the American Statistical Association 83:834–841.
Bivariate distributions with marginals as parameters can be generated using the procedure of mixtures. If F and G are univariate distribution functions, the distribution $H(x) = \int F^{\theta}(x)\,dG(\theta)$ can be generated. The paper studies properties of this mixture distribution and provides as examples familiar copulas such as Frank’s, Cook and Johnson’s, and Gumbel’s. Extensions to the multivariate case are discussed.

MEESTER, S.G., AND MACKAY, J. 1994. “A Parametric Model for Cluster Correlated Categorical Data,” Biometrics 50:954–963.
This article shows how to use copulas as a model for association in clustered or longitudinal data. The discussion of copulas is restricted to Frank’s family, although extensions to other parametric families are indicated. Symmetric dependencies are considered, which is natural for clustered data. The responses are from the generalized linear model family and, in particular, may be categorical.

METRY, M.H., AND SAMPSON, A.R. 1991. “A Family of Partial Orderings for Positive Dependence Among Fixed Marginal Bivariate Distributions,” in Advances in Probability Distributions with Given Marginals: Beyond the Copulas, edited by G. Dall’Aglio, S. Kotz, and G. Salinetti. The Netherlands: Kluwer Academic Publishers.

MEYER, J. 1992. “Beneficial Changes in Random Variables Under Multiple Sources of Risk and their Comparative Statics,” The Geneva Papers on Risk and Insurance: Theory 17:7–19.

MOSLER, K., AND SCARSINI, M. EDS. 1991. Stochastic Orders and Decision under Risk. Hayward, California: Institute of Mathematical Statistics.

NELSEN, R.B. 1986. “Properties of a One-Parameter Family of Bivariate Distributions with Specified Marginals,” Communications in Statistics—Theory and Methods 15:3277–85.
This paper explores properties of Frank’s one-parameter family of bivariate distributions. In particular, it offers formulas to evaluate three nonparametric measures of correlation, namely Spearman’s rho, Kendall’s tau, and the less familiar medial correlation coefficient. It also provides procedures to generate observations from this bivariate distribution, which may be used for simulation purposes.

OAKES, D. 1982. “A Model for Association in Bivariate Survival Data,” Journal of the Royal Statistical Society B 44:414–422.
The author offers an alternative procedure to estimate the association parameter in the Cook and Johnson family of bivariate distributions, which has been used to model association in bivariate survival data. The paper examines the likelihood procedure used to estimate the parameter by earlier authors. As an alternative, it proposes the use of nonparametric procedures using Kendall’s coefficient. The asymptotic variance of this estimator is evaluated.

OAKES, D. 1989. “Bivariate Survival Models Induced by Frailties,” Journal of the American Statistical Association 84:487–493.
This article shows how Archimedean copulas arise naturally in the context of frailty models. The work emphasizes a special case of Marshall and Olkin (1988): bivariate survival times with a common latent mixing parameter. The paper introduces a “cross-ratio” function that provides a useful diagnostic tool for identifying the survival distribution based on bivariate survival data.

OAKES, D. 1994. “Multivariate Survival Distributions,” Journal of Nonparametric Statistics, 343–354.
This article reviews properties of latent mixtures of survival distributions that give rise to frailty models.

OSTASZEWSKI, K. 1993. Fuzzy Set Methods in Actuarial Science. Schaumburg: Society of Actuaries.

PARKES, C.M., BENJAMIN, B., AND FITZGERALD, R.G. 1969. “Broken Heart: A Statistical Study of Increased Mortality Among Widowers,” British Medical Journal 1:740–743.

SCARSINI, M. 1984. “On Measures of Concordance,” Stochastica 8:201–219.

SCHWEIZER, B. 1991. “Thirty Years of Copulas,” in Advances in Probability Distributions with Given Marginals: Beyond the Copulas, edited by G. Dall’Aglio, S. Kotz, and G. Salinetti. The Netherlands: Kluwer Academic Publishers.
This article reviews the early development of copulas. The historical overview concentrates on the beginnings of the subject, with a focus on probability theory. Properties of n-dimensional distribution functions are reviewed and their extensions to probabilistic metric spaces are discussed. Relationships to triangular norming functions are developed that provide bounds on functions of distribution functions. The paper also gives a useful introduction to the role of copulas for developing measures of dependence and their relationship with the dependence axioms of Rényi.

SCHWEIZER, B., AND SKLAR, A. 1983. Probabilistic Metric Spaces. New York: North Holland.
A probabilistic metric space is a set, or space, of distribution functions combined with a function, or metric, that measures the distance between two elements of the set. This monograph develops probabilistic metric spaces. A special feature is that an entire chapter (six) is devoted to copulas. Further, the relationship between copulas and triangular norms is emphasized.

SCHWEIZER, B., AND WOLFF, E.F. 1981. “On Nonparametric Measures of Dependence for Random Variables,” The Annals of Statistics 9:879–885.
This paper uses copulas to define three nonparametric measures of dependence for pairs of random variables. In particular, the familiar measures of dependence Pearson’s correlation, Spearman’s rho, and Kendall’s tau are expressed in terms of copulas.


24 NORTH AMERICAN ACTUARIAL JOURNAL, VOLUME 2, NUMBER 1

SEAL, H.L. 1967. ‘‘Studies in the History of Probability and Statistics XV: The Historical Development of the Gauss Linear Model,’’ Biometrika 54:1–24.

SEAL, H.L. 1977. ‘‘Multiple Decrements or Competing Risks,’’ Biometrika 64:429–439.

SHIH, J.H., AND LOUIS, T.A. 1995. ‘‘Inferences on the Association Parameter in Copula Models for Bivariate Survival Data,’’ Biometrics 51:1384–1399. This paper studies bivariate survival data where the risks may be dependent but do not preclude one another. Each of the two risks is subject to censoring by an independent mechanism. The dependency between risks is modeled using a one-parameter copula function. The marginal survival functions are estimated (i) nonparametrically, using Kaplan-Meier estimators and (ii) parametrically. A two-stage estimation procedure is proposed; both finite and asymptotic properties are established. An AIDS dataset illustrates the procedures.

SKLAR, A. 1959. ‘‘Fonctions de répartition à n dimensions et leurs marges,’’ Publ. Inst. Stat. Univ. Paris 8:229–231.

SKLAR, A. 1973. ‘‘Random Variables, Joint Distribution Functions and Copulas,’’ Kybernetika 9:449–460. The author is one of the pioneers of copulas. In this paper, he proves elementary results that relate copulas to distribution functions and random variables. In particular, given a copula function C and an n-tuple distribution function $(F_1, \ldots, F_n)$, he establishes the existence of a probability space and a set of random variables $X_1, \ldots, X_n$ defined over that space with C as the associated copula. In addition, the paper describes structural properties of a special type of copula called associative copulas.

STIGLER, S.M. 1986. The History of Statistics: The Measurement of Uncertainty Before 1900. Cambridge, Mass.: Harvard University Press.

SUNGUR, E.A. 1990. ‘‘Dependence Information in Parameterized Copulas,’’ Communications in Statistics: Simulations 19:1339–1360. This paper provides mechanisms for extracting the dependence information from parameterized copulas. The author defines various dependence functions to capture dependence information. Approximations to copulas using Taylor’s expansion are discussed. Most of the derivations relate to two-dimensional copulas although a brief attempt to extend to three-dimensional copulas is made.

TCHEN, A. 1980. ‘‘Inequalities for Distributions with Given Marginals,’’ Annals of Probability 8:814–827.

TIBILETTI, L. 1995. ‘‘Beneficial Changes in Random Variables via Copulas: An Application to Insurance,’’ The Geneva Papers on Risk and Insurance: Theory 20:191–202.

VAUPEL, J.W., MANTON, K.G. AND STALLARD, E. 1979. ‘‘The Impact of Heterogeneity in Individual Frailty on the Dynamics of Mortality,’’ Demography 16:439–454.

WANG, S. 1996. ‘‘Premium Calculation by Transforming the Layer Premium Density,’’ ASTIN Bulletin 26:71–92.

WANG, S. 1997. ‘‘PH-Measure of Portfolio Risks,’’ Chapter 9 of unpublished lecture notes, Risk Measures with Applications in Insurance Ratemaking and Actuarial Valuation. Waterloo, Ont.: University of Waterloo.

WARD, A. 1976. ‘‘Mortality of Bereavement,’’ British Medical Journal 1:700–702.

YOUNG, V. 1993. ‘‘The Application of Fuzzy Sets to Group Health Underwriting,’’ Transactions of the Society of Actuaries XLV:551–590.

ZHENG, M., AND KLEIN, J.P. 1995. ‘‘Estimates of Marginal Survival for Dependent Competing Risks Based on an Assumed Copula,’’ Biometrika 82:127–138. For bivariate survival data, it is well-known that without assumptions about the relationship between the competing survival times, the marginal survival functions are unidentifiable. This article assumes a known copula and introduces nonparametric estimators of the marginal survival functions. The estimates reduce to the Kaplan-Meier product limit estimator in the case of independent survival times. Further, bounds on the survival functions are provided when the copula is known only approximately.

APPENDIX A

GUMBEL-HOUGAARD COPULA AND ITS PARTIAL DERIVATIVES

In this appendix, we derive the formulas needed to evaluate the likelihood in Equation (4.3) in the case in which we apply the Gumbel-Hougaard copula as given in Table 1. First, we re-express the Gumbel-Hougaard copula as follows:

$$[-\ln C(u, v)]^{\alpha} = (-\ln u)^{\alpha} + (-\ln v)^{\alpha}.$$

For simplicity, we denote C(u, v) by C. Now, take the partial derivative with respect to u of both sides of the equation to get:

$$\frac{(-\ln C)^{\alpha-1}}{C}\,\frac{\partial C}{\partial u} = \frac{(-\ln u)^{\alpha-1}}{u}. \tag{A.1}$$

Solving for $\partial C/\partial u$, we have:

$$\frac{\partial C}{\partial u} = \left(\frac{\ln u}{\ln C}\right)^{\alpha-1}\frac{C}{u}. \tag{A.2}$$

Applying symmetry, we get:

$$\frac{\partial C}{\partial v} = \left(\frac{\ln v}{\ln C}\right)^{\alpha-1}\frac{C}{v}. \tag{A.3}$$

From Equation (A.1), we can take the partial derivative with respect to v to get:

$$\frac{(-\ln C)^{\alpha-1}}{C}\,\frac{\partial^2 C}{\partial u\,\partial v} - \frac{\partial C}{\partial u}\,\frac{\partial C}{\partial v}\,\frac{(-\ln C)^{\alpha-1}}{C^2}\left[(\alpha-1)(-\ln C)^{-1} + 1\right] = 0.$$

Rearranging terms yields the second partial derivative of the Gumbel-Hougaard copula:

$$\frac{\partial^2 C}{\partial u\,\partial v} = \frac{1}{C}\,\frac{\partial C}{\partial u}\,\frac{\partial C}{\partial v}\left[(\alpha-1)(-\ln C)^{-1} + 1\right]. \tag{A.4}$$

Equations (A.2), (A.3), and (A.4) are used to evaluate the log-likelihood given in Equation (4.3).
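As a numerical sanity check on (A.2) through (A.4), the following Python sketch codes the Gumbel-Hougaard copula and its derivatives and compares them with central finite differences. The function names, the evaluation point, and the step size are illustrative choices and are not part of the original appendix.

```python
import math


def gumbel_copula(u, v, alpha):
    """Gumbel-Hougaard copula C(u, v) for alpha >= 1 and 0 < u, v < 1."""
    s = (-math.log(u)) ** alpha + (-math.log(v)) ** alpha
    return math.exp(-(s ** (1.0 / alpha)))


def dC_du(u, v, alpha):
    """First partial derivative in u, Equation (A.2)."""
    c = gumbel_copula(u, v, alpha)
    return (math.log(u) / math.log(c)) ** (alpha - 1.0) * c / u


def dC_dv(u, v, alpha):
    """First partial derivative in v, Equation (A.3), obtained by symmetry."""
    return dC_du(v, u, alpha)


def d2C_dudv(u, v, alpha):
    """Mixed partial derivative (the copula density), Equation (A.4)."""
    c = gumbel_copula(u, v, alpha)
    bracket = (alpha - 1.0) / (-math.log(c)) + 1.0
    return dC_du(u, v, alpha) * dC_dv(u, v, alpha) * bracket / c


def finite_difference_check(u=0.3, v=0.7, alpha=1.44, h=1e-4):
    """Absolute errors of (A.2) and (A.4) against central differences."""
    C = lambda a, b: gumbel_copula(a, b, alpha)
    fd_u = (C(u + h, v) - C(u - h, v)) / (2.0 * h)
    fd_uv = (C(u + h, v + h) - C(u + h, v - h)
             - C(u - h, v + h) + C(u - h, v - h)) / (4.0 * h * h)
    return abs(fd_u - dC_du(u, v, alpha)), abs(fd_uv - d2C_dudv(u, v, alpha))
```

With alpha = 1 the copula reduces to independence, so `d2C_dudv` should return a density of 1 everywhere, which gives a second quick check.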

APPENDIX B

TABLE OF ONE-PARAMETER FAMILY OF COPULAS

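| Family | General Form of Copula | Parameter Constraint | Kendall’s Tau | Spearman’s Rho |
|---|---|---|---|---|
| Ali-Mikhail-Haq | $uv[1 - \alpha(1-u)(1-v)]^{-1}$ | $-1 \le \alpha \le 1$ | $\frac{3\alpha - 2}{3\alpha} - \frac{2}{3}\left(1 - \frac{1}{\alpha}\right)^{2}\ln(1-\alpha)$ | Complicated form |
| Cook-Johnson | $[u^{-\alpha} + v^{-\alpha} - 1]^{-1/\alpha}$ | $\alpha \ge 0$ | $\frac{\alpha}{\alpha + 2}$ | Complicated form |
| Farlie-Gumbel-Morgenstern | $uv[1 + \alpha(1-u)(1-v)]$ | $-1 \le \alpha \le 1$ | $\frac{2}{9}\alpha$ | $\frac{1}{3}\alpha$ |
| Frank | $\frac{1}{\alpha}\ln\left[1 + \frac{(e^{\alpha u} - 1)(e^{\alpha v} - 1)}{e^{\alpha} - 1}\right]$ | $\alpha \ne 0$ | $1 - \frac{4}{\alpha}[D_1(-\alpha) - 1]$ | $1 - \frac{12}{\alpha}[D_2(-\alpha) - D_1(-\alpha)]$ |
| Gumbel-Hougaard | $\exp\{-[(-\ln u)^{\alpha} + (-\ln v)^{\alpha}]^{1/\alpha}\}$ | $\alpha \ge 1$ | $1 - \alpha^{-1}$ | No closed form |
| Normal (Bivariate) | $H(\Phi^{-1}(u), \Phi^{-1}(v))$, where H is the bivariate normal distribution function with correlation coefficient $\alpha$ and $\Phi^{-1}$ is the inverse of a univariate normal distribution function | $-1 \le \alpha \le 1$ | $\frac{2}{\pi}\arcsin(\alpha)$ | $\frac{6}{\pi}\arcsin\left(\frac{\alpha}{2}\right)$ |
| Plackett | $\frac{1}{2}(\alpha - 1)^{-1}\left\{1 + (\alpha-1)(u+v) - \left[(1 + (\alpha-1)(u+v))^{2} + 4\alpha(1-\alpha)uv\right]^{1/2}\right\}$ | $\alpha \ge 0$ | No closed form | $\frac{\alpha + 1}{\alpha - 1} - \frac{2\alpha\ln\alpha}{(\alpha - 1)^{2}}$ |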


DISCUSSIONS OF PAPERS ALREADY PUBLISHED 143


2. The GB2 model, however, is not suitable for pricing or for reinsurance since it gives a lower value of x than the gamma model for almost all values of p.

3. The mixture of two-gamma models, after applying the lemma below, becomes a four-parameter model. Its log-likelihood is marginally better than that of GB2, and we expect its confidence region is about the same width as the GB2.

Figure 4. F(x | a, b, p, q) with (a, b, p, q) Running through Its 95% Confidence Region

Mixture of Linear Exponential Objects

For both normal distribution and gamma distribution, the mean of the MLE-fitted model must match the empirical mean. For the normal, this result is found in any Course 110 textbook. For the gamma, or mixture of gamma and normal, it’s not difficult to prove.

Lemma (Mixture of Exponential Family with a Linear Term)

If X has the density function

$$f(x) = h(\alpha, x)\, c(\alpha, \beta) \exp(-\beta x), \tag{1}$$

then

$$E_f(X) = \int x f(x)\, dx = \frac{\partial c(\alpha, \beta)/\partial \beta}{c(\alpha, \beta)} \tag{2}$$

and the mean of the maximum likelihood estimated density function of the form (1) fitted to observed data would match the empirical mean:

$$E_{\hat f}(X) = \frac{\partial c(\hat\alpha, \hat\beta)/\partial \beta}{c(\hat\alpha, \hat\beta)} = \frac{\sum x_i}{n}. \tag{3}$$

And a similar result holds if X is a mixture of two densities satisfying Equation (1)

$$f(x) = p f_1(x) + (1 - p) f_2(x) = p\, h_1(\alpha_1, x)\, c_1(\alpha_1, \beta_1) \exp(-\beta_1 x) + (1 - p)\, h_2(\alpha_2, x)\, c_2(\alpha_2, \beta_2) \exp(-\beta_2 x). \tag{4}$$

Then the mean of the maximum likelihood estimated density function of the form (4) fitted to observed data $\{x_1, \ldots, x_n\}$ would match the empirical mean:

$$E_f(X) = \frac{\partial c(\alpha, \beta)/\partial \beta}{c(\alpha, \beta)} = \frac{\sum x_i}{n}.$$

This lemma says, for example, that a gamma mixed with normal fitted by MLE will match the empirical mean.
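The reasoning behind the single-density part of the lemma can be sketched in a few lines. The notation is an assumption made for illustration: the linear term is taken to enter as $\exp(-\beta x)$, with $\alpha$ and $\beta$ as placeholder names for the two parameters, and the gamma density with shape $\alpha$ and rate $\beta$ serves as the worked case.

```latex
% Normalization: \int h(\alpha,x) e^{-\beta x}\,dx = 1/c(\alpha,\beta).
% Differentiating both sides with respect to \beta gives
\begin{align*}
-\int x\, h(\alpha,x)\, e^{-\beta x}\, dx
  &= -\frac{\partial c(\alpha,\beta)/\partial\beta}{c(\alpha,\beta)^{2}}
\quad\Longrightarrow\quad
E_f(X) = c(\alpha,\beta)\int x\, h(\alpha,x)\, e^{-\beta x}\, dx
       = \frac{\partial c(\alpha,\beta)/\partial\beta}{c(\alpha,\beta)}.
\end{align*}
% Likelihood equation in \beta for data x_1,\dots,x_n:
\begin{align*}
\frac{\partial}{\partial\beta}\sum_{i=1}^{n}\log f(x_i)
  = n\,\frac{\partial c/\partial\beta}{c} - \sum_{i=1}^{n} x_i = 0
\quad\Longrightarrow\quad
E_{\hat f}(X) = \frac{1}{n}\sum_{i=1}^{n} x_i .
\end{align*}
% Gamma instance: h(\alpha,x) = x^{\alpha-1}, \; c(\alpha,\beta) = \beta^{\alpha}/\Gamma(\alpha),
% so (\partial c/\partial\beta)/c = \alpha/\beta, the familiar gamma mean.
```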

REFERENCE

COXETER, H.S.M. 1973. Regular Polytopes, 3rd ed. New York:Dover.

‘‘Understanding Relationships Using Copulas,’’ by Edward Frees and Emiliano Valdez, January 1998

CHRISTIAN GENEST,* KILANI GHOUDI,† AND LOUIS-PAUL RIVEST‡

This article aimed to introduce actuaries to ‘‘copulas’’ (that is, distributions whose univariate marginals are uniform) as a tool for understanding relationships among multivariate outcomes. Through its limpid exposition of some of the recent developments in

*Christian Genest, Ph.D., is Professor of Statistics, Département de mathématiques et de statistique, Université Laval, Sainte-Foy, Québec, Canada G1K 7P4.
†Kilani Ghoudi, Ph.D., is Professor of Statistics, Département de mathématiques et d’informatique, Université du Québec à Trois-Rivières, Trois-Rivières, Québec, Canada G9A 5H7.
‡Louis-Paul Rivest, Ph.D., is Professor of Statistics, Département de mathématiques et de statistique, Université Laval, Sainte-Foy, Québec, Canada G1K 7P4.


144 NORTH AMERICAN ACTUARIAL JOURNAL, VOLUME 2, NUMBER 3


statistical modeling with copulas, this work and its substantial annotated bibliography should contribute much to the dissemination of these techniques in actuarial circles.

In their paper, the authors explain how it is possible, using copulas, to dissociate the choice of marginal distributions from that of a model describing the dependence between pairs of correlated variables such as the economic loss and the medical indemnity component of a disability policy. Part of their discussion revolves around the issue of identifying an appropriate family of copulas and the estimation of its parameters from a multivariate random sample. In Section 4.2.1, they show specifically how an indemnity payment $X_1$ (loss) and an allocated loss adjustment expense (ALAE) $X_2$ could profitably be modeled from a random sample of 1,500 liability claims. The copula they select is of the Archimedean variety, which means that for all $0 \le u, v \le 1$, it may be written in the form

$$C(u, v) = \phi^{-1}\{\phi(u) + \phi(v)\} \tag{1}$$

for some decreasing, convex function $\phi: (0, 1] \to [0, \infty)$ such that $\phi(1) = 0$, with the convention that $\phi^{-1}(t) = 0$ for all $t \ge \lim_{s \downarrow 0} \phi(s) \equiv \phi(0)$. In that context, they find that the generator $\phi(t) = \log^{\theta}(1/t)$ of Gumbel’s family of copulas provides an adequate fit, given a suitable choice of parameter $\theta \ge 1$.
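Form (1) translates directly into code once a generator and its inverse are available. The Python sketch below builds a bivariate Archimedean copula from such a pair and instantiates it with the generator of Gumbel’s family; the function names are hypothetical, and the parameter value 1.44 is used purely as an example.

```python
import math


def archimedean_copula(phi, phi_inv):
    """Return C(u, v) = phi_inv(phi(u) + phi(v)) for a generator and its inverse."""
    def copula(u, v):
        return phi_inv(phi(u) + phi(v))
    return copula


def gumbel_generator(theta):
    """Generator log^theta(1/t) of Gumbel's family and its inverse, theta >= 1."""
    def phi(t):
        return (-math.log(t)) ** theta

    def phi_inv(s):
        return math.exp(-(s ** (1.0 / theta)))

    return phi, phi_inv


# theta = 1 gives the independence copula C(u, v) = u * v.
independence = archimedean_copula(*gumbel_generator(1.0))
gumbel = archimedean_copula(*gumbel_generator(1.44))
```

Because the generator of Gumbel’s family is unbounded near zero, the convention about the inverse beyond the limit of the generator at 0 never comes into play here; a generator with a finite limit would need that truncation coded explicitly.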

This discussion expands on the issue of selecting an appropriate copula for modeling purposes. Cast in terms of the above example, the question to be addressed is whether it is possible to improve on Gumbel’s family as a model for describing the relationship between variables loss and ALAE. Techniques for imbedding copulas of the form (1) in larger models of exchangeable and nonexchangeable variables are introduced, and the dataset considered by Frees and Valdez is used to illustrate how such extensions can yield improvements in the fit of a model. Although the proposed methodology does not lead to a better solution in this particular application, it should prove useful in a variety of contexts. Finally, the attention of the actuarial community is directed to another potentially useful collection of bivariate copulas whereof Gumbel’s family is a distinguished subset: the class of extreme value copulas.

Generating Archimedean Copula Models

As pointed out by Frees and Valdez, and several others before them, many well-known systems of bivariate distributions have underlying copulas of the form (1). These Archimedean copulas (Genest and MacKay 1986) provide a host of models that are versatile, in terms of both the nature and strength of the association they induce between the variables. It is not surprising therefore that they have been used successfully in a number of data-modeling contexts, especially in connection with the notion of ‘‘frailty’’ (Clayton 1978, Oakes 1989, Zheng and Klein 1995, Bandeen-Roche and Liang 1996, Day, Bryant and Lefkopoulou 1997).

Because copulas characterize the dependence structure of a random vector once the effect of the marginals has been factored out, identifying and fitting a copula to data poses special difficulties. The families of Clayton (1978), Gumbel (1960), and Frank (1979), for example, provide handy, one-parameter representations of the association between variables $X_i$ with marginal distribution functions $F_i$. However, they are unsuitable in those situations in which the joint dependence structure of the uniform random variables $F_i(X_i)$ is not exchangeable, because Archimedean copulas in general are symmetric in their arguments. Unfortunately, diagnostic procedures that can help delineate circumstances where model (1) is adequate have yet to be analyzed.

Starting from the assumption that the Archimedean dependence structure is appropriate in a bivariate context, Frees and Valdez explain how the choice of the generator can be made, using the technique developed by Genest and Rivest (1993). Given observations from a random pair $(X_1, X_2)$ with distribution H, this procedure relies on the estimation of the univariate distribution function K(v) associated with the ‘‘probability integral transformation,’’ $V = H(X_1, X_2)$. For each parametric Archimedean structure $C_{\theta}$ that is envisaged, an estimator of $\theta$ is obtained, and the $K_{\hat\theta}$’s are compared, graphically or otherwise, to a nonparametric estimator of K, whose stochastic behavior as a process has recently been studied by Barbe, Genest, Ghoudi and Rémillard (1996).

Having identified an Archimedean family of copulas that provides a reasonable fit to a bivariate random sample, are there simple ways of generating alternative models that might improve this fit? One suggestion consists of enlarging the selected family through various combinations of the following rules, which can be used to construct nested classes of Archimedean generators. For ease of reference, this collection of rules is presented as a proposition, whose proof is straightforward and left to the reader.

Proposition 1

Suppose that $\phi$ is the generator of a bivariate Archimedean copula. In other words, assume that $\phi: (0, 1] \to [0, \infty)$ is a decreasing, convex function such that $\phi(1) = 0$.

(i) (right composition) If $f: [0, 1] \to [0, 1]$ is an increasing, concave bijection, then $\phi \circ f(t) = \phi\{f(t)\}$ generates a bivariate Archimedean copula.

(ii) (left composition) If $f: [0, \infty) \to [0, \infty)$ is an increasing, convex function such that $f(0) = 0$, then $f \circ \phi$ generates a bivariate Archimedean copula.

(iii) (scaling) If $0 < \gamma \le 1$, then $\phi_{\gamma}(t) = \phi(\gamma t) - \phi(\gamma)$ generates a bivariate Archimedean copula.

(iv) (composition via exponentiation) If $(\phi')^{2} \le \phi''$ and if $\psi$ generates a bivariate Archimedean copula, then $\psi(e^{-\phi})$ also generates a bivariate Archimedean copula.

(v) (linear combination) If $\alpha$ and $\beta$ are positive reals and if $\psi$ generates a bivariate Archimedean copula, then $\alpha\phi + \beta\psi$ also generates a bivariate Archimedean copula.

To illustrate these composition rules, define $\phi(t) = \log(1/t)$ for all $0 < t \le 1$ and consider the function $f_{\theta}(t) = (e^{\theta t} - 1)/\theta$ on $[0, \infty)$, which satisfies the requirements of part (ii) of the proposition for arbitrary $\theta > 0$. Then

$$\phi_{\theta}(t) = f_{\theta}\{\phi(t)\} = (t^{-\theta} - 1)/\theta, \quad 0 < t \le 1$$

is immediately recognized as the generator of Clayton’s family of bivariate copulas. Frank’s and many other classical systems of bivariate Archimedean distributions can be recovered in this fashion, often with the generator log(1/t) of the independence copula as a starting point. Special cases of rules (i) and (ii) can be found in the work of Oakes (1994) and Nelsen (1997). For some examples of Archimedean (as well as non-Archimedean) copulas and for additional ways of combining them to build new bivariate and multivariate distributions, see for example Joe (1993) or Joe and Hu (1996).
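The left-composition example above is easy to replay numerically. In the Python sketch below, the helper names are hypothetical and the value 0.52 of the parameter is only an example; the code composes the independence generator with the exponential transform and checks the result against the closed forms of Clayton’s generator and copula.

```python
import math


def left_compose(f, phi):
    """Rule (ii): return the generator t -> f(phi(t))."""
    return lambda t: f(phi(t))


def phi_independence(t):
    """Generator log(1/t) of the independence copula."""
    return math.log(1.0 / t)


def f_exp(theta):
    """f_theta(s) = (exp(theta * s) - 1) / theta: increasing, convex, f(0) = 0."""
    return lambda s: math.expm1(theta * s) / theta


def clayton_copula(u, v, theta):
    """Closed form of Clayton's copula for theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


theta = 0.52
phi_clayton = left_compose(f_exp(theta), phi_independence)


def phi_clayton_inv(s):
    """Inverse of t -> (t^-theta - 1) / theta, needed to evaluate form (1)."""
    return (1.0 + theta * s) ** (-1.0 / theta)


def clayton_via_generator(u, v):
    """Clayton's copula evaluated through the Archimedean form (1)."""
    return phi_clayton_inv(phi_clayton(u) + phi_clayton(v))
```

Swapping `f_exp` for another increasing, convex function vanishing at zero would produce a different Archimedean family from the same starting generator, provided its inverse is also supplied.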

The usefulness of Proposition 1 is best demonstrated in a data analytic context. In their paper, for example, Frees and Valdez envisage Clayton’s, Frank’s, and Gumbel’s bivariate copulas for modeling the loss and expense components of an insurance company’s indemnity claims. As explained below, a specific combination of composition rules (i) and (iii) makes it possible to include these three families as special cases of a single system of bivariate Archimedean copulas, whose generator is given by

$$\phi_{\alpha,\beta,\gamma}(t) = \log\left\{\frac{1 - (1 - \gamma)^{\beta}}{1 - (1 - \gamma t^{\alpha})^{\beta}}\right\}, \quad 0 < t \le 1 \tag{2}$$

with arbitrary $\alpha > 0$, $\beta \ge 1$ and $0 < \gamma \le 1$.

The interest of this enlarged model for inferential purposes is twofold. On one hand, it may sometimes provide a significantly better fit of the data than any submodel. On the other, it suggests a simple statistical procedure for choosing between the one-parameter models: only that which is ‘‘closest’’ to the full model, in some sense, need be chosen. Tests of significance of the appropriate parameters can be used to assist in this choice.

A similar strategy can be designed to handle situations in which nonexchangeability is suspected between the $F_i(X_i)$’s. It is briefly described in the next section, and a concrete application is given in the following section using the liability data of Frees and Valdez. The following paragraphs substantiate some of the above claims concerning the three-parameter, bivariate Archimedean family of copulas generated by (2).

To check that $\phi_{\alpha,\beta,\gamma}$ is a valid generator of bivariate Archimedean copulas, start from $\phi(t) = \log(1/t)$, $0 < t \le 1$, and use the fact that $f_{\beta}(t) = 1 - (1 - t)^{\beta}$ is increasing and concave on [0, 1] for $\beta \ge 1$ to conclude from (i) that $\phi_1(t) = -\log\{1 - (1 - t)^{\beta}\}$ generates a bivariate Archimedean copula. Next, apply rule (iii) to see that $\phi_2(t) = \phi_1(\gamma t) - \phi_1(\gamma)$ also generates a bivariate Archimedean copula. Finally, observe that since $t^{\alpha}$ is a concave, increasing function on [0, 1] for $0 < \alpha \le 1$, $\phi_{\alpha,\beta,\gamma}(t) = \phi_2(t^{\alpha})$ is indeed a bivariate Archimedean copula generator because of rule (i). Note that function (2) is actually convex and decreasing even when $\alpha > 1$.
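The three-step construction can be checked numerically. The Python sketch below writes the three-parameter generator with parameters named alpha, beta and gamma (an assumed labeling: alpha is the power applied to t, beta the exponent introduced by right composition, gamma the scaling constant) and tests on a grid that it vanishes at 1, decreases, and has nonnegative second differences. A grid check of this kind is only suggestive and is no substitute for the argument in the text.

```python
import math


def phi_three_param(t, alpha, beta, gamma):
    """log[(1 - (1 - gamma)^beta) / (1 - (1 - gamma * t^alpha)^beta)], 0 < t <= 1."""
    num = 1.0 - (1.0 - gamma) ** beta
    den = 1.0 - (1.0 - gamma * t ** alpha) ** beta
    return math.log(num / den)


def looks_like_generator(phi, n=200, tol=1e-12):
    """Grid check: phi(1) = 0, phi strictly decreasing, second differences >= 0."""
    grid = [(i + 1) / n for i in range(n)]            # 1/n, 2/n, ..., 1
    vals = [phi(t) for t in grid]
    decreasing = all(b < a for a, b in zip(vals, vals[1:]))
    convex = all(a - 2.0 * b + c >= -tol
                 for a, b, c in zip(vals, vals[1:], vals[2:]))
    return abs(vals[-1]) < tol and decreasing and convex


def gumbel_limit_ratio(t, s, beta, alpha=1e-4):
    """phi(t) / phi(s) with gamma = 1 and small alpha; close to (log t / log s)^beta."""
    return (phi_three_param(t, alpha, beta, 1.0)
            / phi_three_param(s, alpha, beta, 1.0))
```

The last helper illustrates the Gumbel limit: with gamma equal to 1 and alpha near zero, the generator is approximately proportional to the beta-th power of log(1/t), so ratios of generator values approach those of Gumbel’s generator.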

The parameter values of (2) that yield the Clayton, Frank and Gumbel families of copulas are given in Table 1, in which the notation o(x) stands for a function f(x) satisfying $f(x)/x \to 0$ as $x \to 0$. Note in particular that Clayton’s and Gumbel’s families of bivariate copulas obtain when $\alpha$ and $1 - \gamma$ are both going to zero. If $\alpha$ converges to zero faster than $1 - \gamma$, Clayton’s system is actually the right limit, while when $1 - \gamma$ is going to zero faster than $\alpha$, the limit is a Gumbel copula.


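Table 1. Parameter Values of Archimedean Generator (2) Yielding Clayton’s, Frank’s, and Gumbel’s Families with Dependence Parameter $\theta$

| Family | $\alpha$ | $\beta$ | $\gamma$ |
|---|---|---|---|
| Clayton | $o(1 - \gamma)$ | $(1 - \gamma)\theta/\alpha$ | $\uparrow 1$ |
| Frank | 1 | $\uparrow \infty$ | $-\theta/\beta$ |
| Gumbel | $\downarrow 0$ | $\theta$ | $1 - o(\alpha)$ |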

Generating Asymmetric Copulas

All bivariate copulas C in the Archimedean class satisfy the exchangeability condition C(u, v) = C(v, u) on their domain. If a family of copulas of form (1) is envisaged as a model in a situation in which the appropriateness of this symmetry condition is doubtful, one may wish to enlarge the system to include nonexchangeable models. The following proposition shows how this can be done and paves the way to a simple test of nonexchangeability, which is illustrated in the next section.

Proposition 2

Let C be an exchangeable bivariate copula. A family of nonexchangeable bivariate copulas $C_{\kappa,\lambda}$ with parameters $0 < \kappa, \lambda < 1$ that includes C as a limiting case is defined by

$$C_{\kappa,\lambda}(u, v) = u^{1-\kappa}\, v^{1-\lambda}\, C(u^{\kappa}, v^{\lambda}), \quad 0 \le u, v \le 1.$$

This mechanism for generating asymmetric copulas was first studied by Khoudraji (1995) in an unpublished doctoral dissertation prepared under the supervision of the first and third authors. An interesting property of this nonexchangeable model is that it is easy to generate random variates distributed according to $C_{\kappa,\lambda}$. Indeed, if the pair $(U_1, V_1)$ is drawn from copula C(u, v) and if $U_2$ and $V_2$ are independent observations from a uniform distribution on the interval [0, 1], then $C_{\kappa,\lambda}(u, v)$ is the joint distribution of

$$U = \max\{U_1^{1/\kappa},\, U_2^{1/(1-\kappa)}\}, \quad V = \max\{V_1^{1/\lambda},\, V_2^{1/(1-\lambda)}\}.$$

The verification of this assertion also constitutes a proof that the function $C_{\kappa,\lambda}$ defined in Proposition 2 is always a bivariate copula. A natural extension of this result is the case in which the pair $(U_2, V_2)$ is distributed as a copula D(u, v), which may be different from independence. The details are given in Chapter 4 of Khoudraji (1995).
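The simulation recipe can be tried out with a short Monte Carlo experiment. The Python sketch below takes Clayton’s copula as the exchangeable starting point (an arbitrary choice made here because it can be sampled by conditional inversion), writes the two asymmetry parameters as `kappa` and `lam`, and compares an empirical joint probability with the formula of Proposition 2. All function names and parameter values are illustrative assumptions.

```python
import random


def clayton_copula(u, v, theta):
    """Clayton's copula for theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def sample_clayton(theta, rng):
    """Draw (U1, V1) from Clayton's copula by conditional inversion."""
    u = 1.0 - rng.random()                      # uniform on (0, 1]
    w = 1.0 - rng.random()
    v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** -theta + 1.0) ** (-1.0 / theta)
    return u, v


def sample_asymmetric(theta, kappa, lam, rng):
    """Draw (U, V) from the asymmetric copula, for 0 < kappa, lam < 1."""
    u1, v1 = sample_clayton(theta, rng)
    u2, v2 = rng.random(), rng.random()         # independent uniforms
    u = max(u1 ** (1.0 / kappa), u2 ** (1.0 / (1.0 - kappa)))
    v = max(v1 ** (1.0 / lam), v2 ** (1.0 / (1.0 - lam)))
    return u, v


def asymmetric_copula(u, v, theta, kappa, lam):
    """u^(1-kappa) * v^(1-lam) * C(u^kappa, v^lam) with C Clayton's copula."""
    return (u ** (1.0 - kappa) * v ** (1.0 - lam)
            * clayton_copula(u ** kappa, v ** lam, theta))


def empirical_cdf(sample, u, v):
    """Proportion of sampled pairs with first coordinate <= u and second <= v."""
    return sum(1 for a, b in sample if a <= u and b <= v) / len(sample)
```

For example, drawing 20,000 pairs with `random.Random(12345)` and comparing `empirical_cdf(sample, 0.5, 0.6)` with `asymmetric_copula(0.5, 0.6, theta, kappa, lam)` should show agreement up to Monte Carlo error.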

Seeking Improvements in the Fit of a Model

To illustrate concretely how Propositions 1 and 2 can help enhance the fit provided by an Archimedean copula model, it is convenient to reconsider the loss and ALAE data of Frees and Valdez. Specifically, this section investigates whether a better model than Gumbel’s can be found for these data, either using copula

$$C_{\alpha,\beta,\gamma}(u, v) = \phi_{\alpha,\beta,\gamma}^{-1}\{\phi_{\alpha,\beta,\gamma}(u) + \phi_{\alpha,\beta,\gamma}(v)\} \tag{3}$$

or an asymmetric copula constructed by the method described in the previous section.

In their paper, Frees and Valdez explain how maximum-likelihood-based computer algorithms can assist in estimating simultaneously the parameters associated with the marginal distributions of a random pair $(X_1, X_2)$ and those which correspond to the family of copulas selected as a model for dependence. Applying this procedure to the three-parameter copula (3) and Pareto marginals, say, would thus require the estimation of seven parameters altogether, nine if the copula were also expanded via Proposition 2 to check for asymmetry.

While this approach is sound and straightforward to implement, it might yield inappropriate values of the dependence parameters $\alpha$, $\beta$ and $\gamma$ (and eventually $\kappa$ and $\lambda$) if the parametric models chosen for $X_1$ and $X_2$ turned out to be incorrect. There is no serious reason to doubt this choice in the present case, but as a general precaution, one may wish to ensure that uncertainty about the marginals does not affect unduly the parameter estimates of the copula model. In other words, a parametric estimation procedure that is robust to the choice of marginal distributions is called for.

A margin-free parameter estimation procedure for copulas has been described in broad, nontechnical terms by Oakes (1994). This semiparametric technique was subsequently developed and studied in Genest, Ghoudi and Rivest (1995) and adapted to the case of censoring by Shih and Louis (1995). In that pseudo-likelihood-based approach, the contribution of an uncensored individual bivariate data point to the likelihood is, in the notation of Frees and Valdez,

$$C_{12}\{F_{1n}(x_1), F_{2n}(x_2)\},$$

where $C_{12}$ is the mixed partial derivative of the specified parametric copula, and $F_{in}(x_i)$ stands for the empirical distribution function of $X_i$, i = 1, 2. In the application at hand, $F_{1n}(x)$ and $F_{2n}(x)$ would thus be the proportions of claims for which loss $X_1$ and ALAE $X_2$ are less than or equal to x, respectively.
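A minimal Python sketch of this pseudo-likelihood for Gumbel’s copula is given below. It makes several assumptions that go beyond the text: the empirical distribution functions are rescaled by n/(n + 1) so that no pseudo-observation equals 1 (where the Gumbel density is not finite), ties and censoring are ignored, and the maximization is a crude grid search rather than a general-purpose optimizer. The function names are hypothetical.

```python
import math


def pseudo_observations(xs):
    """Ranks divided by n + 1 (rescaled empirical distribution); assumes no ties."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return [r / (n + 1.0) for r in ranks]


def gumbel_log_density(u, v, theta):
    """Log of the mixed partial derivative C_12(u, v) of Gumbel's copula."""
    a, b = -math.log(u), -math.log(v)
    s = (a ** theta + b ** theta) ** (1.0 / theta)       # s = -log C(u, v)
    return (-s + a + b                                    # log C - log u - log v
            + (theta - 1.0) * (math.log(a) + math.log(b))
            + (2.0 - 2.0 * theta) * math.log(s)
            + math.log((theta - 1.0) / s + 1.0))


def pseudo_log_likelihood(x1, x2, theta):
    """Sum of log C_12 evaluated at the pseudo-observations."""
    us, vs = pseudo_observations(x1), pseudo_observations(x2)
    return sum(gumbel_log_density(u, v, theta) for u, v in zip(us, vs))


def fit_gumbel_grid(x1, x2, grid):
    """Grid value of theta with the largest pseudo-log-likelihood."""
    return max(grid, key=lambda th: pseudo_log_likelihood(x1, x2, th))
```

A call such as `fit_gumbel_grid(loss, alae, [1.0 + 0.01 * k for k in range(400)])`, with `loss` and `alae` standing for two equally long lists of observations, returns the grid point with the highest pseudo-log-likelihood; the standard errors discussed in the text are not computed by this sketch.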


Table 2. Maximized Pseudo-log-likelihood and Parameter Estimates, with Their Standard Errors, Associated with Four Archimedean Copula Models

| Family | Pseudo-Log-Likelihood | $\hat\alpha$ (s.e.) | $\hat\beta$ (s.e.) | $\hat\gamma$ (s.e.) |
|---|---|---|---|---|
| Clayton | 93.8 | $\hat\theta$ = 0.52 (0.032) | | |
| Frank | 172.5 | $\hat\theta$ = −3.10 (0.57) | | |
| Gumbel | 207.0 | $\hat\theta$ = 1.44 (0.032) | | |
| Three-Parameter | 210.1 | 0.138 (0.147) | 1.55 (0.92) | 0.9986 (0.0025) |

When the copula C considered involves three parameters, as in model (3), computing the mixed partial derivative $C_{12}$ and maximizing the pseudo-likelihood may seem like an insurmountable task. Fortunately, there is no need to have a closed analytical expression for $C_{12}$ to produce numerical estimates of the dependence parameters. Using software for symbolic manipulation, such as Maple, one can feed in $\phi_{\alpha,\beta,\gamma}$ and its inverse and get, as an output, subroutines for the numerical evaluation of C and $C_{12}$. These subroutines can then be linked to a general-purpose maximization program that finds the pseudo-likelihood estimates.

The calculations presented below were obtained with a Fortran program developed by S. G. Nash from George Mason University. Formulas for calculating the standard errors of the estimates are given in Genest, Ghoudi and Rivest (1995). The slight censoring in $X_1$ was ignored in the calculations, because the numerical results presented by Frees and Valdez suggest that this has a negligible impact on the estimates. Alternatively, the pseudo-likelihood procedure of Shih and Louis (1995) could be implemented to account for the censoring in $X_1$.

The semiparametric parameter estimates for Clayton’s, Frank’s, and Gumbel’s families of copulas are given in Table 2, along with those of the three parameters of the Archimedean model generated by (2). The estimates obtained for Frank’s and Gumbel’s models are very close to those reported by Frees and Valdez (1998). This provides indirect evidence that their choice of Pareto distributions for the marginals is adequate.

As explained earlier, the graphical technique suggested by Genest and Rivest (1993) can be used to investigate the fit of the four Archimedean models at hand. However, the plots obtained with parameters estimated from the pseudo-likelihood tend to be more sensitive to model misspecification than those constructed with estimates derived from Kendall’s tau. Alternatively, the value of the maximized pseudo-log-likelihood may be used, formally or informally, to judge the models’ relative merits.

In Table 2, $1 - \hat\gamma$ is much smaller than $\hat\alpha$, and neither parameter is significantly different from 0. As mentioned earlier, this is a situation in which Gumbel’s family is indicated. The small difference observed between the maximized pseudo-log-likelihoods for the three-parameter model and Gumbel’s submodel confirms that the latter fits well.

Using Gumbel’s copula as a starting point, we may also wish to investigate the issue of asymmetry. This can be done easily with the technique presented above. The maximum pseudo-log-likelihood is 207.3 and the parameter estimates are $\hat\theta = 1.47$ (s.e. = 0.07), $\hat\kappa = 0.94$ (s.e. = 0.09), $\hat\lambda = 1$. Because $\hat\kappa$ is not significantly different from 1, it appears that in this case, the introduction of asymmetry in Gumbel’s copula does not improve the fit.

Extreme Value Copulas

Whenever Gumbel’s family of copulas seems adequate for a bivariate dataset, one may also wish to check whether a better fit could be achieved by another system in the maximum extreme value class. These are copulas that can be expressed as

$$C_A(u, v) = \exp\left[\log(uv)\, A\left\{\frac{\log(u)}{\log(uv)}\right\}\right], \quad 0 < u, v < 1 \tag{4}$$

in terms of a dependence function $A: [0, 1] \to [1/2, 1]$, which is convex and verifies $A(t) \ge \max(t, 1 - t)$ for all $0 \le t \le 1$. It is easy to see that the choice

$$A(t) = \{t^{\theta} + (1 - t)^{\theta}\}^{1/\theta}, \quad 0 \le t \le 1$$

corresponds to Gumbel’s family of copulas, which is the only one that can be written simultaneously in the forms (1) and (4) (Genest and Rivest 1989). Additional parametric families of maximum extreme value copulas are given by Tawn (1988) and Anderson and Nadarajah (1993), among others.
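The correspondence between a dependence function and its extreme value copula can be made concrete with a few lines of Python. In the sketch below, the function names are hypothetical and the parameter is written `theta`; the code builds the copula of form (4) from an arbitrary dependence function and checks that the power-mean choice of A reproduces Gumbel’s copula in its Archimedean form.

```python
import math


def ev_copula(A):
    """Extreme value copula exp[log(uv) * A(log(u) / log(uv))], 0 < u, v < 1."""
    def copula(u, v):
        log_uv = math.log(u) + math.log(v)
        return math.exp(log_uv * A(math.log(u) / log_uv))
    return copula


def gumbel_dependence(theta):
    """Dependence function A(t) = (t^theta + (1 - t)^theta)^(1/theta), theta >= 1."""
    return lambda t: (t ** theta + (1.0 - t) ** theta) ** (1.0 / theta)


def gumbel_archimedean(u, v, theta):
    """Gumbel's copula written in Archimedean form (1), for comparison."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


independence = ev_copula(lambda t: 1.0)             # A = 1 gives C(u, v) = u * v
comonotone = ev_copula(lambda t: max(t, 1.0 - t))   # lower bound of A gives min(u, v)
```

The two boundary cases bracket every dependence function: A identically 1 corresponds to independence, and A(t) = max(t, 1 − t) to perfect positive dependence.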

The terminology for model (4) can be justified as follows. Let $(X_{11}, X_{21}), \ldots, (X_{1n}, X_{2n})$ be a random sample from bivariate distribution H, define $M_{in} = \max(X_{i1}, \ldots, X_{in})$, i = 1, 2, and suppose that there exist constants $a_{in} > 0$ and $b_{in}$ for which the pair

$$\left(\frac{M_{1n} - b_{1n}}{a_{1n}},\; \frac{M_{2n} - b_{2n}}{a_{2n}}\right)$$

has a non-degenerate, joint limiting distribution H*. The marginal distributions of H* then belong to location-scale families based either on the ‘‘extreme value’’ distribution [$\exp(-e^{-x})$, $-\infty < x < \infty$], the Fréchet distribution [$\exp(-x^{-\alpha})$, $x > 0$, $\alpha > 0$], or the Weibull distribution [$\exp\{-(-x)^{\alpha}\}$, $x < 0$, $\alpha > 0$] (see Galambos 1987 for details). What may be less familiar, however, is the result of Pickands (1981) to the effect that the underlying copula associated with H* is necessarily of the form (4). Given the long-standing concern of actuaries for the prediction of extreme events and their related costs, maximum extreme value copulas should be of considerable appeal in this area, where they would be expected to arise rather naturally. For a data-oriented presentation of univariate extreme value theory, with applications to insurance and other fields, see the recent book by Beirlant, Teugels and Vynckier (1996).

Of course, copulas in the class (4) can provide an appropriate model of dependence between variables, whether the marginals are of one of the above three types or not. The appropriateness of a family of extreme value copulas can be determined without knowledge of the marginals (as it should) using the procedure recently developed by Ghoudi, Khoudraji and Rivest (1998). Their test is based on the fact that if $(X_1, X_2)$ is an observation from H*, the distribution function K of $V = H^*(X_1, X_2)$ has the form $K(v) = v - (1 - \tau)\, v \log(v)$ for $0 < v \le 1$, where

$$\tau = \int_0^1 \frac{t(1 - t)}{A(t)}\, dA'(t)$$

is the population value of Kendall’s tau. Then $8E(V) - 9E(V^2) - 1 = 0$, and since the moments of V are easy to estimate, a test statistic, Z, can be defined that rejects model (4) if the sample version of this quantity, divided by a jackknife estimate of its standard deviation, is significantly different from zero, as compared to the standard normal law.
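The moment identity underlying the test follows from the stated form of K by direct integration. The short derivation below is a worked check, using only $K(v) = v - (1-\tau)\,v\log v$ and the elementary integrals $\int_0^1 v\log v\,dv = -1/4$ and $\int_0^1 v^2\log v\,dv = -1/9$.

```latex
\begin{align*}
K'(v) &= \tau - (1-\tau)\log v, \qquad 0 < v \le 1,\\[2pt]
E(V) &= \int_0^1 v\,K'(v)\,dv = \frac{\tau}{2} + \frac{1-\tau}{4} = \frac{1+\tau}{4},\\[2pt]
E(V^2) &= \int_0^1 v^2\,K'(v)\,dv = \frac{\tau}{3} + \frac{1-\tau}{9} = \frac{1+2\tau}{9},\\[2pt]
8E(V) - 9E(V^2) - 1 &= 2(1+\tau) - (1+2\tau) - 1 = 0 .
\end{align*}
```

The value of Kendall’s tau cancels, which is why the identity can be tested without estimating the dependence function.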

When applied to the data of Frees and Valdez, the test of Ghoudi, Khoudraji and Rivest (1998) yields a value of z = 0.06 that is clearly insufficient to reject the maximum extreme value copula model. Because the procedure is consistent for Archimedean alternatives, this provides additional evidence that Clayton’s and Frank’s families are inferior to Gumbel’s in this case. This is not to say that the latter family is best within the class of extreme value copulas, however.

Since dependence functions are univariate, the search for the best copula model of the form (4) can proceed along similar lines as for Archimedean copulas. To choose between various parametric families of dependence functions $A_{\theta}$ that have been fitted to the data using the pseudo-likelihood approach of Genest, Ghoudi and Rivest (1995), one may plot the $A_{\hat\theta}$’s against a nonparametric estimator $A_n$. Tawn (1988) suggests that the classical estimator of Pickands (1981) or the variant due to Deheuvels (1991) can be used for this purpose, but a much more efficient procedure has since been proposed by Capéraà, Fougères and Genest (1997).

Alternatively, the following proposition provides ways of embedding specific parametric families of dependence functions into larger systems, so that formal or informal selection procedures may be based as before on the corresponding maximized pseudo-log-likelihoods.

Proposition 3

Let A and B be two dependence functions.

(i) (convex combination) If $0 \le \lambda \le 1$, then $\lambda A + (1 - \lambda)B$ is a dependence function.

(ii) (asymmetrization) If $0 \le \kappa, \lambda \le 1$, and if $\bar u$ denotes $1 - u$ for arbitrary $0 \le u \le 1$, the following formula defines a dependence function:

$$E(t) = (\bar\kappa t + \bar\lambda \bar t)\, A\left(\frac{\bar\kappa t}{\bar\kappa t + \bar\lambda \bar t}\right) + (\kappa t + \lambda \bar t)\, B\left(\frac{\kappa t}{\kappa t + \lambda \bar t}\right), \quad 0 \le t \le 1. \tag{5}$$

As with the previous propositions, the above list is not exhaustive. Rule (i), which is mentioned by Tawn (1988), is the special case of (ii) corresponding to $\kappa = \lambda$. Function (5) is the dependence function of the extreme value copula defined by

$$C_A(u^{1-\kappa}, v^{1-\lambda})\, C_B(u^{\kappa}, v^{\lambda}) \tag{6}$$

for all $0 \le u, v \le 1$. Rule (ii) is thus a consequence of the general asymmetrization process alluded to below the statement of Proposition 2.


To illustrate this final point, suppose that Gumbel’s model, $C_B$, is enlarged via (6) using the independence copula $C_A(u, v) = uv$, generated by $A \equiv 1$. The resulting asymmetric model, already investigated at the end of the previous section, is then a three-parameter, maximum extreme value family of copulas, whose dependence functions are of the form

$$E(t) = \{\kappa t + \lambda (1 - t)\} + \{\bar{\kappa}^{\theta} t^{\theta} + \bar{\lambda}^{\theta} (1 - t)^{\theta}\}^{1/\theta}, \quad 0 \le t \le 1.$$

Those who are familiar with the literature on multivariate extreme value theory will have recognized the class of dependence functions of what Tawn (1988) or Smith, Tawn and Yuen (1990) refer to as the asymmetric logistic model.
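As a rough numerical illustration of rule (ii), the following sketch builds the asymmetrized dependence function from the independence model and Gumbel’s model and checks that the result behaves like a dependence function. The names `gumbel_A` and `asymmetrize`, the Gumbel form $\{t^\theta + (1-t)^\theta\}^{1/\theta}$, and all parameter values are assumptions made for the example only.

```python
import numpy as np

def gumbel_A(t, theta):
    """Gumbel dependence function, assumed here as {t^theta + (1-t)^theta}^(1/theta)."""
    return (t**theta + (1.0 - t)**theta) ** (1.0 / theta)

def asymmetrize(A, B, kappa, lam):
    """Rule (ii): combine dependence functions A and B with weights kappa, lam in [0, 1]."""
    def E(t):
        t = np.asarray(t, dtype=float)
        tb = 1.0 - t
        wa = kappa * t + lam * tb                   # weight in front of A
        wb = (1.0 - kappa) * t + (1.0 - lam) * tb   # weight in front of B
        # Safe division: where a weight vanishes the term contributes zero anyway.
        xa = np.divide(kappa * t, wa, out=np.zeros_like(t), where=wa > 0)
        xb = np.divide((1.0 - kappa) * t, wb, out=np.zeros_like(t), where=wb > 0)
        return wa * A(xa) + wb * B(xb)
    return E

def independence(t):
    """Dependence function A = 1 of the independence copula."""
    return np.ones_like(np.asarray(t, dtype=float))

# Illustrative parameters: kappa = 0.3, lam = 0.6, Gumbel parameter theta = 2.
E = asymmetrize(independence, lambda t: gumbel_A(t, 2.0), kappa=0.3, lam=0.6)
t = np.linspace(0.0, 1.0, 201)
values = E(t)

# Closed form obtained by simplifying the B term for Gumbel's model.
closed_form = 0.3 * t + 0.6 * (1 - t) + ((0.7 * t) ** 2 + (0.4 * (1 - t)) ** 2) ** 0.5
```

On a grid, one can then check that `values` equals `closed_form`, that it starts and ends at 1, stays between max(t, 1 − t) and 1, and is convex.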

ACKNOWLEDGMENTS

This work was supported in part through research grants from the Natural Sciences and Engineering Research Council of Canada, from the Fonds pour la formation de chercheurs et l’aide à la recherche du Gouvernement du Québec, from the Fonds institutionnel de la recherche de l’Université du Québec à Trois-Rivières, and from the Fonds national de la recherche scientifique du Royaume de Belgique.

REFERENCES

ANDERSON, C.W., AND NADARAJAH, S. 1993. ‘‘Environmental Factors Affecting Reservoir Safety,’’ in Statistics for the Environment, edited by V. Barnett and F. Turkman. New York: John Wiley.

BANDEEN-ROCHE, K.J., AND LIANG, K.-Y. 1996. ‘‘Modelling Failure-Time Associations in Data with Multiple Levels of Clustering,’’ Biometrika 83:29–39.

BARBE, P., GENEST, C., GHOUDI, K., AND REMILLARD, B. 1996. ‘‘On Kendall’s Process,’’ Journal of Multivariate Analysis 58:197–229.

BEIRLANT, J., TEUGELS, J.L., AND VYNCKIER, P. 1996. Practical Analysis of Extreme Values. Leuven: Leuven University Press.

CAPERAA, P., FOUGERES, A.-L., AND GENEST, C. 1997. ‘‘A Nonparametric Estimation Procedure for Bivariate Extreme Value Copulas,’’ Biometrika 84:567–77.

CLAYTON, D.G. 1978. ‘‘A Model for Association in Bivariate Life Tables and its Application in Epidemiological Studies of Familial Tendency in Chronic Disease Incidence,’’ Biometrika 65:141–51.

DAY, R., BRYANT, J., AND LEFKOPOULOU, M. 1997. ‘‘Adaptation of Bivariate Frailty Models for Prediction, With Application to Biological Markers as Prognostic Indicators,’’ Biometrika 84:45–56.

DEHEUVELS, P. 1991. ‘‘On the Limiting Behavior of the Pickands Estimator for Bivariate Extreme-Value Distributions,’’ Statistics and Probability Letters 12:429–39.

FRANK, M.J. 1979. ‘‘On the Simultaneous Associativity of F(x, y) and x + y − F(x, y),’’ Aequationes Mathematicae 19:194–226.

GALAMBOS, J. 1987. The Asymptotic Theory of Extreme Order Statistics, 2nd ed. Malabar, Fla.: Krieger.

GENEST, C., GHOUDI, K., AND RIVEST, L.-P. 1995. ‘‘A Semiparametric Estimation Procedure of Dependence Parameters in Multivariate Families of Distributions,’’ Biometrika 82:543–55.

GENEST, C., AND MACKAY, R.J. 1986. ‘‘Copules archimédiennes et familles de lois bidimensionnelles dont les marges sont données,’’ The Canadian Journal of Statistics 14:145–59.

GENEST, C., AND RIVEST, L.-P. 1989. ‘‘A Characterization of Gumbel’s Family of Extreme Value Distributions,’’ Statistics and Probability Letters 8:207–211.

GENEST, C., AND RIVEST, L.-P. 1993. ‘‘Statistical Inference Procedures for Bivariate Archimedean Copulas,’’ Journal of the American Statistical Association 88:1034–43.

GHOUDI, K., KHOUDRAJI, A., AND RIVEST, L.-P. 1998. ‘‘Propriétés statistiques des copules de valeurs extrêmes bidimensionnelles,’’ The Canadian Journal of Statistics 26:187–97.

GUMBEL, E. 1960. ‘‘Distributions des valeurs extrêmes en plusieurs dimensions,’’ Publications de l’Institut de statistique de l’Université de Paris 9:171–73.

JOE, H. 1993. ‘‘Parametric Families of Multivariate Distributions with Given Margins,’’ Journal of Multivariate Analysis 46:262–82.

JOE, H., AND HU, T. 1996. ‘‘Multivariate Distributions from Mixtures of Max-Infinitely Divisible Distributions,’’ Journal of Multivariate Analysis 57:240–65.

KHOUDRAJI, A. 1995. ‘‘Contributions à l’étude des copules et à la modélisation des valeurs extrêmes bivariées.’’ Ph.D. thesis, Université Laval, Québec, Canada.

NELSEN, R.B. 1997. ‘‘Dependence and Order in Families of Archimedean Copulas,’’ Journal of Multivariate Analysis 60:111–22.

OAKES, D. 1989. ‘‘Bivariate Survival Models Induced by Frailties,’’ Journal of the American Statistical Association 84:487–93.

OAKES, D. 1994. ‘‘Multivariate Survival Distributions,’’ Journal of Nonparametric Statistics 3:343–54.

PICKANDS, J. 1981. ‘‘Multivariate Extreme Value Distributions,’’ Bulletin of the International Statistical Institute 859–78.

SHIH, J.H., AND LOUIS, T.A. 1995. ‘‘Inferences on the Association Parameter in Copula Models for Bivariate Survival Data,’’ Biometrics 51:1384–99.

SMITH, R.L., TAWN, J.A., AND YUEN, H.K. 1990. ‘‘Statistics of Multivariate Extremes,’’ International Statistical Review 58:47–58.

TAWN, J.A. 1988. ‘‘Bivariate Extreme Value Theory: Models and Estimation,’’ Biometrika 75:397–415.

ZHENG, M., AND KLEIN, J.P. 1995. ‘‘Estimates of Marginal Survival for Dependent Competing Risks Based on an Assumed Copula,’’ Biometrika 82:127–38.


DISCUSSIONS OF PAPERS ALREADY PUBLISHED

‘‘UNDERSTANDING RELATIONSHIPS USING COPULAS,’’ EDWARD FREES AND EMILIANO VALDEZ, JANUARY 1998

SHAUN S. WANG*

Recently the actuarial profession has witnessed a surge of interest in correlation models. This demand arises from many areas of actuarial practice, including modeling of some key financial market variables, multiple asset defaults, and catastrophe insurance losses. Thus, I was delighted to see the publication of the fine paper by Drs. Frees and Valdez; it provides an excellent review of state-of-the-art copula models.

In the last year I had the opportunity to undertake a research project on ‘‘Combining Correlated Risks’’ initiated by the Casualty Actuarial Society Committee on Theory of Risks. In doing that project, I investigated various models and methods for combining multiple correlated lines of business. Here I share some insights we gained into the correlation models and their relations to copulas. In this discussion, I emphasize several points:

1. Drs. Frees and Valdez devote most of their discussion to the Archimedean family of copulas. From a practical point of view, I would like to advocate the use of the ‘‘normal copula.’’ This non-Archimedean copula is especially powerful in modeling multiple (more than two) correlated variables. This is because only the normal copula allows an arbitrary correlation matrix yet still lends itself to efficient simulation techniques.

2. The copula formula was initially intended to be applied to cumulative distribution functions. Here I would like to show that the copula formula can also be applied to probability-generating functions as well as characteristic functions. As a result of this innovative use of the copula formulas, we get some interesting multivariate distributions, which lead to efficient numerical techniques for calculating the aggregate loss distribution of correlated risks.

3. Although every correlation structure implicitly corresponds to a copula, the underlying copula may not be explicitly expressible. In fact, some very useful correlation models are best described in terms other than copulas. I discuss some of these correlation models, especially with regard to modeling correlated claim-frequency variables.

*Shaun Wang, A.S.A., Ph.D., is an Associate Actuary at the SCOR Reinsurance Company, One Pierce Place, Itasca, Illinois 60143-4049, e-mail, scorus/itasca/swang%[email protected].

1. THE NORMAL COPULA AND SIMULATION

In general, the modeling and combining of correlated risks are most straightforward if the correlated risks have a multivariate normal distribution. In this section, we use the multivariate normal distribution to construct the normal copula and then use that to generate multivariate distributions with arbitrary marginal distributions. The normal copula enjoys much flexibility in the selection of correlation parameters, and lends itself well to simple Monte Carlo simulation techniques.

Assume that $(Z_1, \ldots, Z_k)$ have a multivariate normal distribution with standard normal marginals $Z_j \sim N(0, 1)$ and a positive definite correlation matrix

$$\Sigma = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1k} \\ \rho_{21} & 1 & \cdots & \rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k1} & \rho_{k2} & \cdots & 1 \end{pmatrix},$$

where $\rho_{ij} = \rho_{ji}$ is the correlation coefficient between $Z_i$ and $Z_j$. Then $(Z_1, \ldots, Z_k)$ have a joint p.d.f.:

$$f(z_1, \ldots, z_k) = \frac{1}{\sqrt{(2\pi)^k\,|\Sigma|}} \exp\left\{-\frac{1}{2}\, z' \Sigma^{-1} z\right\}, \quad z = (z_1, \ldots, z_k)'. \tag{1.1}$$

From the correlation matrix $\Sigma$ we can construct a lower triangular matrix

$$B = \begin{pmatrix} b_{11} & 0 & \cdots & 0 \\ b_{21} & b_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ b_{k1} & b_{k2} & \cdots & b_{kk} \end{pmatrix}$$

such that $\Sigma = BB'$. In other words, the correlation matrix $\Sigma$ equals the matrix product of $B$ and its transpose $B'$. The elements of the matrix $B$ can be calculated from Choleski’s algorithm (Burden and Faires 1989, Sec. 6.6; Johnson 1987, Sec. 4.1):


138 NORTH AMERICAN ACTUARIAL JOURNAL, VOLUME 3, NUMBER 1


$$b_{ij} = \frac{\rho_{ij} - \sum_{s=1}^{j-1} b_{is} b_{js}}{\sqrt{1 - \sum_{s=1}^{j-1} b_{js}^2}}, \quad 1 \le j \le i \le k, \tag{1.2}$$

with the convention that $\sum_{s=1}^{0} (\cdot) = 0$. Note that:

• For $i = j$, the denominator of Equation (1.2) equals $b_{jj}$.
• The elements of $B$ should be calculated from top to bottom and from left to right.

The following simulation algorithm can be used to generate multivariate normal variables with a joint p.d.f. given by Equation (1.1) (Herzog 1986; Fishman 1996, pp. 223–24).

Step 1. Construct the lower triangular matrix $B = (b_{ij})$ by using Equation (1.2).
Step 2. Generate a column vector of independent standard normal variables $Y = (Y_1, \ldots, Y_k)'$.
Step 3. Take the matrix product $Z = BY$ of $B$ and $Y$. Then $Z = (Z_1, \ldots, Z_k)'$ has the required joint p.d.f. given by Equation (1.1).
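The three steps above can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the function names `choleski_lower` and `simulate_normals`, the 3 × 3 correlation matrix, and the seed are all hypothetical choices, and the recursion is written for a correlation matrix with unit diagonal.

```python
import numpy as np

def choleski_lower(sigma):
    """Lower triangular B with sigma = B B', filled row by row as in Equation (1.2)."""
    sigma = np.asarray(sigma, dtype=float)
    k = sigma.shape[0]
    B = np.zeros((k, k))
    for i in range(k):               # top to bottom
        for j in range(i + 1):       # left to right
            partial = sigma[i, j] - B[i, :j] @ B[j, :j]
            if i == j:
                B[j, j] = np.sqrt(partial)       # diagonal element b_jj
            else:
                B[i, j] = partial / B[j, j]      # the denominator is b_jj
    return B

def simulate_normals(sigma, n_draws, seed=0):
    """Steps 1 to 3: one row per simulated vector (Z_1, ..., Z_k)."""
    rng = np.random.default_rng(seed)
    B = choleski_lower(sigma)                               # Step 1
    Y = rng.standard_normal((B.shape[0], n_draws))          # Step 2
    return (B @ Y).T                                        # Step 3

# Illustrative correlation matrix (an assumption for the example).
sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
B = choleski_lower(sigma)
Z = simulate_normals(sigma, n_draws=200_000)
```

The factor `B` can be compared with `numpy.linalg.cholesky(sigma)`, and the sample correlation matrix of `Z` should be close to `sigma`.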

Let $\Phi(\cdot)$ represent the c.d.f. of the standard normal distribution

$$\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt.$$

Then $\Phi(Z_1), \ldots, \Phi(Z_k)$ have a multivariate uniform distribution with Kendall’s tau (Frees and Valdez 1998, p. 25)

$$\tau[\Phi(Z_i), \Phi(Z_j)] = \tau(Z_i, Z_j) = \frac{2}{\pi} \arcsin(\rho_{ij}),$$

and (Spearman) rank correlation coefficient

$$\mathrm{RankCorr}(Z_i, Z_j) = \frac{6}{\pi} \arcsin\left(\frac{\rho_{ij}}{2}\right),$$

where $\arcsin(x)$ is an inverse trigonometric function such that $\sin(\arcsin(x)) = x$.

The following result can be easily verified but nevertheless is stated as a theorem due to its importance.

Theorem 1.1

Assume that $(Z_1, \ldots, Z_k)$ have a multivariate normal joint p.d.f. given by Equation (1.1), with correlation coefficient $\rho_{ij} = \rho(Z_i, Z_j)$. Let $H(z_1, \ldots, z_k)$ be their joint cumulative distribution function. Then

$$C(u_1, \ldots, u_k) = H[\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_k)]$$

defines a multivariate uniform c.d.f., the normal copula.

For any set of given marginal c.d.f.’s $F_1, \ldots, F_k$, the variables

$$X_1 = F_1^{-1}[\Phi(Z_1)], \; \ldots, \; X_k = F_k^{-1}[\Phi(Z_k)]$$

have a joint c.d.f.

$$F_{X_1, \ldots, X_k}(x_1, \ldots, x_k) = H\{\Phi^{-1}[F_1(x_1)], \ldots, \Phi^{-1}[F_k(x_k)]\}$$

with marginal c.d.f.’s $F_1, \ldots, F_k$. For the multivariate variables $(X_1, \ldots, X_k)$, we have Kendall’s tau

$$\tau(X_i, X_j) = \tau(Z_i, Z_j) = \frac{2}{\pi} \arcsin(\rho_{ij})$$

and Spearman’s rank correlation coefficient

$$\mathrm{RankCorr}(X_i, X_j) = \mathrm{RankCorr}(Z_i, Z_j) = \frac{6}{\pi} \arcsin\left(\frac{\rho_{ij}}{2}\right).$$

Although the normal copula does not have a simple analytical expression, it lends itself well to a very simple Monte Carlo simulation algorithm.

Suppose that we are given a set of correlated risks $(X_1, \ldots, X_k)$ with marginal c.d.f.’s $F_{X_1}, \ldots, F_{X_k}$ and Kendall’s tau $\tau_{ij} = \tau(X_i, X_j)$, or rank correlation coefficient $\mathrm{RankCorr}(X_i, X_j)$. If we assume that $(X_1, \ldots, X_k)$ can be described by the normal copula in Theorem 1.1, then the following Monte Carlo simulation procedure can be used:

Step 1. Convert the given Kendall’s tau or rank correlation coefficient to our usual measure of correlation for multivariate normal variables,

$$\rho_{ij} = \sin\left(\frac{\pi}{2}\, \tau_{ij}\right) = 2 \sin\left(\frac{\pi}{6}\, \mathrm{RankCorr}(X_i, X_j)\right),$$

and construct the lower triangular matrix $B = (b_{ij})$ by Equation (1.2).

Step 2. Generate a column vector of independent standard normal variables $Y = (Y_1, \ldots, Y_k)'$.

Step 3. Take the matrix product of $B$ and $Y$: $Z = (Z_1, \ldots, Z_k)' = BY$. Set $u_i = \Phi(Z_i)$ for $i = 1, \ldots, k$.

Step 4. Set $X_i = F_{X_i}^{-1}(u_i)$ for $i = 1, \ldots, k$.
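A minimal sketch of Steps 1 to 4 follows, assuming that the dependence is specified through a matrix of Spearman rank correlations. The function names, the two marginals (an exponential and a Pareto, given through their inverse c.d.f.’s), the target rank correlation of 0.6, and the seed are illustrative assumptions; `numpy.linalg.cholesky` is used for the lower triangular factor.

```python
import math
import numpy as np

def std_normal_cdf(z):
    """Phi(z), computed elementwise from the error function."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def simulate_normal_copula(rank_corr, inverse_cdfs, n_draws, seed=0):
    """Simulate (X_1, ..., X_k) with given marginals and a normal copula."""
    rank_corr = np.asarray(rank_corr, dtype=float)
    rho = 2.0 * np.sin(np.pi * rank_corr / 6.0)         # Step 1: rank corr -> rho
    np.fill_diagonal(rho, 1.0)
    B = np.linalg.cholesky(rho)                         # lower triangular, rho = B B'
    rng = np.random.default_rng(seed)
    Y = rng.standard_normal((rho.shape[0], n_draws))    # Step 2
    U = std_normal_cdf(B @ Y)                           # Step 3: u_i = Phi(Z_i)
    # Step 4: apply each inverse marginal c.d.f. to its row of uniforms.
    return np.column_stack([f(u) for f, u in zip(inverse_cdfs, U)])

def spearman(x, y):
    """Sample rank correlation (no ties are expected for continuous data)."""
    rank = lambda v: np.argsort(np.argsort(v))
    return np.corrcoef(rank(x), rank(y))[0, 1]

target = np.array([[1.0, 0.6],
                   [0.6, 1.0]])
inverse_cdfs = [
    lambda u: -10.0 * np.log1p(-u),                      # exponential, mean 10
    lambda u: 5.0 * ((1.0 - u) ** (-1.0 / 3.0) - 1.0),   # Pareto, shape 3, scale 5
]
X = simulate_normal_copula(target, inverse_cdfs, n_draws=50_000)
sample_rank_corr = spearman(X[:, 0], X[:, 1])
```

With these inputs `sample_rank_corr` should land near the target of 0.6, since the rank correlation is unchanged by the increasing transformations in Steps 3 and 4.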

Theorem 1.1 and the associated simulation algorithm provide a powerful tool for generating correlated variables. The normal copula is very flexible because it allows any (symmetric, positive definite) matrix of rank correlation coefficients (or alternatively, Kendall’s tau). The use of this algorithm implicitly assumes that the underlying multivariables can be described by a normal copula.

The normal copula is very practical because there are computer programs readily available for Monte Carlo simulation using a normal copula. For example, the Palisade Corporation product @RISK, a Microsoft Excel add-in, implicitly utilizes the normal copula concept (Iman and Conover 1982).

2. AN INNOVATIVE USE OF THE COPULA FORMULA

The copula formula was initially intended to be applied to cumulative distribution functions. Here I demonstrate that, as a mathematical innovation, a copula formula can be applied to probability-generating functions or characteristic functions. With these new constructions, the resulting correlation structures lend themselves to efficient numerical algorithms for combining correlated risks (that is, calculating the aggregate loss distribution for the sum of the correlated risks).

2.1 Some Basics of Multivariables

In addition to joint cumulative distribution functions $F_{X_1, \ldots, X_k}(t_1, \ldots, t_k)$, other standard tools for multivariate random variables $(X_1, \ldots, X_k)$ include the joint p.g.f., joint m.g.f., and joint ch.f., which are defined as follows (see Johnson et al. 1997, pp. 2–12):

$$P_{X_1, \ldots, X_k}(t_1, \ldots, t_k) = E[t_1^{X_1} \cdots t_k^{X_k}]$$

$$\phi_{X_1, \ldots, X_k}(t_1, \ldots, t_k) = E[e^{i(t_1 X_1 + \cdots + t_k X_k)}] = P_{X_1, \ldots, X_k}(e^{i t_1}, \ldots, e^{i t_k}).$$

The joint p.g.f. $P_{X_1, \ldots, X_k}$ or the joint ch.f. $\phi_{X_1, \ldots, X_k}$ completely specifies the joint probability distribution. Equivalent results are obtained in terms of either a p.g.f. or a ch.f.

• The p.g.f. or ch.f. for the marginal (univariate) distribution $F_{X_j}$ can be obtained by

$$P_{X_j}(t_j) = P_{X_1, \ldots, X_j, \ldots, X_k}(1, \ldots, 1, t_j, 1, \ldots, 1),$$

$$\phi_{X_j}(t_j) = \phi_{X_1, \ldots, X_j, \ldots, X_k}(0, \ldots, 0, t_j, 0, \ldots, 0).$$

• If the variables $X_1, \ldots, X_k$ are mutually independent, then

$$P_{X_1, \ldots, X_k}(t_1, \ldots, t_k) = \prod_{j=1}^{k} P_{X_j}(t_j).$$

• The covariances can be evaluated by $\mathrm{Cov}[X_i, X_j] = E[X_i X_j] - E[X_i]E[X_j]$ with

$$E[X_i X_j] = \frac{\partial^2}{\partial t_i\, \partial t_j} P_{X_1, \ldots, X_k}(1, \ldots, 1) = -\frac{\partial^2}{\partial t_i\, \partial t_j} \phi_{X_1, \ldots, X_k}(0, \ldots, 0).$$

This can be seen from the expression

$$\frac{\partial^2}{\partial t_i\, \partial t_j} P_{X_1, \ldots, X_k}(t_1, \ldots, t_k) = \sum_{(x_1, \ldots, x_k)} x_i x_j\, f_{X_1, \ldots, X_k}(x_1, \ldots, x_k)\, t_1^{x_1} \cdots t_i^{x_i - 1} \cdots t_j^{x_j - 1} \cdots t_k^{x_k}.$$

The concepts of joint p.g.f. and joint ch.f. are very useful in combining correlated variables.

Theorem 2.2

For any $k$ correlated variables $X_1, \ldots, X_k$ with joint p.g.f. $P_{X_1, \ldots, X_k}$ and joint ch.f. $\phi_{X_1, \ldots, X_k}$, the sum $Z = X_1 + \cdots + X_k$ has a p.g.f. and a ch.f.

$$P_Z(t) = P_{X_1, \ldots, X_k}(t, \ldots, t), \quad \phi_Z(t) = \phi_{X_1, \ldots, X_k}(t, \ldots, t).$$

Proof. $P_Z(t) = E[t^{X_1 + \cdots + X_k}] = E[t^{X_1} \cdots t^{X_k}] = P_{X_1, \ldots, X_k}(t, \ldots, t)$. □

If we know the joint ch.f. of the $k$ correlated variables, it is straightforward to get the ch.f. for their sum

$$\phi_Z(t) = \phi_{X_1, \ldots, X_k}(t, \ldots, t).$$

Then the probability distribution of $Z$ can be obtained by Fast Fourier Transform (FFT). The FFT is a mapping of $n$ points to $n$ points, which can be viewed as a discrete analog of the characteristic function. A relation in terms of ch.f. corresponds to a relation in terms of FFT. For more details of the FFT method, see Klugman, Panjer, and Willmot (1998).
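To make the FFT step concrete, here is a small sketch for two correlated counts. The joint p.g.f. used (a bivariate Poisson pair sharing a common Poisson component), its parameters, and the grid size `n = 64` are assumptions chosen for illustration. Evaluating $P(t, t)$ at the $n$-th roots of unity gives the discrete Fourier transform of the p.m.f. of $Z$, which `numpy.fft.ifft` inverts; a brute-force convolution is included as a cross-check.

```python
import math
import numpy as np

def joint_pgf(t1, t2, lam1=1.0, lam2=1.5, lam12=0.5):
    """Joint p.g.f. of N1 = K1 + K12 and N2 = K2 + K12 with independent Poisson K's."""
    return np.exp(lam1 * (t1 - 1) + lam2 * (t2 - 1) + lam12 * (t1 * t2 - 1))

def pmf_of_sum_fft(n=64):
    """P(Z = 0), ..., P(Z = n-1) for Z = N1 + N2, using P_Z(t) = P(t, t)."""
    t = np.exp(-2j * np.pi * np.arange(n) / n)    # n-th roots of unity
    return np.fft.ifft(joint_pgf(t, t)).real

def pmf_of_sum_direct(n=64, lam1=1.0, lam2=1.5, lam12=0.5):
    """Cross-check: Z = M + 2K with M ~ Poisson(lam1 + lam2), K ~ Poisson(lam12)."""
    pois = lambda x, lam: math.exp(-lam) * lam**x / math.factorial(x)
    return np.array([
        sum(pois(z - 2 * k, lam1 + lam2) * pois(k, lam12) for k in range(z // 2 + 1))
        for z in range(n)
    ])

pmf_fft = pmf_of_sum_fft()
pmf_direct = pmf_of_sum_direct()
```

The grid must be long enough that the probability of $Z \ge n$ is negligible; otherwise the FFT result wraps the tail around onto small values.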

Consider the aggregation of two correlated risk portfolios:

$$Z = (X_1 + \cdots + X_N) + (Y_1 + \cdots + Y_K),$$

where $N$ and $K$ are correlated, the pair $(N, K)$ is independent of the claim sizes $X$ and $Y$, and the $X_i$’s and $Y_j$’s are mutually independent. We have

$$P_Z(t) = E[t^Z] = E[t^{(X_1 + \cdots + X_N) + (Y_1 + \cdots + Y_K)}]$$

$$= E_{N,K}\, E[t^{(X_1 + \cdots + X_n) + (Y_1 + \cdots + Y_m)} \mid N = n, K = m]$$

$$= E_{N,K}[P_X(t)^N P_Y(t)^K]$$

$$= P_{N,K}[P_X(t), P_Y(t)].$$

In terms of ch.f. we have

$$\phi_Z(t) = P_{N,K}(\phi_X(t), \phi_Y(t)). \tag{2.3}$$

Having introduced some basic concepts, I now show how a copula formula can be applied to p.g.f.’s and ch.f.’s.

2.2 Multivariate Negative Binomial Distributions

Consider the following Cook-Johnson copula (Cook and Johnson 1981) with $\alpha > 0$:

$$C(u_1, u_2, \ldots, u_k) = \left\{\sum_{j=1}^{k} u_j^{-\alpha} - k + 1\right\}^{-1/\alpha}. \tag{2.4}$$

By applying the Cook-Johnson copula to the probability-generating functions, we have

$$P_{N_1, N_2, \ldots, N_k}(t_1, t_2, \ldots, t_k) = \left\{\sum_{j=1}^{k} P_{N_j}(t_j)^{-\alpha} - k + 1\right\}^{-1/\alpha}. \tag{2.5}$$

Wang (1998) shows that the joint p.g.f. in Equation (2.5) defines a multivariate negative binomial distribution if $N_j$ has a negative binomial distribution with p.g.f.

$$P_{N_j}(t_j) = [1 - \beta_j (t_j - 1)]^{-\alpha_j}$$

and provided that $0 < \alpha \le \min[1/\alpha_j]$. In this multivariate negative binomial distribution, we have

$$\mathrm{Cov}[N_i, N_j] = \alpha\, E[N_i]\, E[N_j].$$
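The construction can be checked numerically in the bivariate case. The sketch below assumes the Cook-Johnson form $\{\sum_j u_j^{-\alpha} - k + 1\}^{-1/\alpha}$ applied to negative binomial p.g.f.’s $[1 - \beta_j(t_j - 1)]^{-\alpha_j}$; the parameter values, the grid size, and the function names are illustrative assumptions. It inverts the joint p.g.f. with a two-dimensional FFT and then inspects the resulting array: whether it is a proper probability table, whether the means are $\alpha_j \beta_j$, and what the covariance is.

```python
import numpy as np

def nb_pgf(t, shape, beta):
    """Negative binomial p.g.f. [1 - beta (t - 1)]^(-shape)."""
    return (1.0 - beta * (t - 1.0)) ** (-shape)

def joint_pgf(t1, t2, alpha, shapes, betas):
    """Cook-Johnson formula (assumed form) applied to two negative binomial p.g.f.'s."""
    s = (nb_pgf(t1, shapes[0], betas[0]) ** (-alpha)
         + nb_pgf(t2, shapes[1], betas[1]) ** (-alpha) - 1.0)
    return s ** (-1.0 / alpha)

# Illustrative parameters with alpha * alpha_j <= 1 for both marginals.
alpha, shapes, betas, n = 0.5, (1.0, 2.0), (0.5, 0.4), 64

w = np.exp(-2j * np.pi * np.arange(n) / n)         # n-th roots of unity
T1, T2 = np.meshgrid(w, w, indexing="ij")
pmf = np.fft.ifft2(joint_pgf(T1, T2, alpha, shapes, betas)).real   # joint p.m.f. on the grid

x = np.arange(n)
mean1 = (pmf.sum(axis=1) * x).sum()                # E[N1]
mean2 = (pmf.sum(axis=0) * x).sum()                # E[N2]
cov = (pmf * np.outer(x, x)).sum() - mean1 * mean2
```

With these values the means come out as 0.5 and 0.8, and `cov` can be compared with `alpha * mean1 * mean2`.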

2.3 Multivariate Long-Tailed Distributions

Consider the Morgenstern copula $C(u, v) = uv[1 + \alpha(1 - u)(1 - v)]$. By applying it to characteristic functions, we get

$$\phi_{X,Y}(s, t) = \phi_X(s)\,\phi_Y(t)\,\{1 + \alpha[1 - \phi_X(s)][1 - \phi_Y(t)]\}. \tag{2.6}$$

The bivariate ch.f. in Equation (2.6) can be used to construct bivariate lognormal distributions, bivariate Pareto distributions, or lognormal/Pareto pairs. This bivariate model lends itself to the FFT method of evaluating the sum

$$\phi_{X+Y}(t) = \phi_X(t)\,\phi_Y(t)\,\{1 + \alpha[1 - \phi_X(t)][1 - \phi_Y(t)]\}.$$

3. SOME CORRELATION MODELS FOR CLAIM COUNTS

As noted by Drs. Frees and Valdez, parametric copula models may not be the best way of describing correlation models, especially for correlated frequency models.

In many situations, individual risks are correlated since they are subject to the same claim-generating mechanism or are influenced by changes in the common underlying economic or legal environment. For instance, in property insurance, risk portfolios in the same geographic location are correlated where individual claims are contingent on the occurrence and severity of a natural disaster (hurricane, tornado, earthquake, or severe weather condition). In liability insurance, new court rulings or social inflation may set new trends that affect the settlement of all liability claims for one line of business. Based on these considerations, I discuss two correlated frequency models: a common mixture model and a common shock model. Neither of these correlation models can be explicitly expressed in terms of a copula.

3.1 A Common Mixture Model

Consider $k$ discrete random variables $N_1, \ldots, N_k$. Assume that there exists a random parameter $\Theta$ such that

$$(N_j \mid \Theta = \theta) \sim \mathrm{Poisson}(\theta \beta_j), \quad j = 1, \ldots, k,$$

where the variable $\Theta$ has a p.d.f. $\pi(\theta)$ and a moment-generating function $M_\Theta$. For any given $\Theta = \theta$, the variables $(N_j \mid \theta)$ are independent and Poisson$(\beta_j \theta)$ distributed with a conditional joint p.g.f.

$$P_{N_1, \ldots, N_k \mid \Theta}(t_1, \ldots, t_k \mid \theta) = E[t_1^{N_1} \cdots t_k^{N_k} \mid \Theta = \theta] = e^{\theta[\beta_1(t_1 - 1) + \cdots + \beta_k(t_k - 1)]}.$$

However, unconditionally, $N_1, \ldots, N_k$ are correlated as they depend upon the same random parameter $\Theta$. The unconditional joint p.g.f. for $N_1, \ldots, N_k$ is

$$P_{N_1, \ldots, N_k}(t_1, \ldots, t_k) = E_\Theta[E(t_1^{N_1} \cdots t_k^{N_k} \mid \Theta)]$$

$$= \int_0^\infty e^{\theta[\beta_1(t_1 - 1) + \cdots + \beta_k(t_k - 1)]}\, \pi(\theta)\, d\theta$$

$$= M_\Theta[\beta_1(t_1 - 1) + \cdots + \beta_k(t_k - 1)].$$

It has marginal p.g.f.’s $P_{N_j}(t_j) = M_\Theta(\beta_j(t_j - 1))$ with $E[N_j] = \beta_j E[\Theta]$. Note that

$$\mathrm{Cov}[N_i, N_j] = E_\Theta\, \mathrm{Cov}[N_i \mid \Theta, N_j \mid \Theta] + \mathrm{Cov}[E[N_i \mid \Theta], E[N_j \mid \Theta]]$$

$$= \mathrm{Cov}[\Theta \beta_i, \Theta \beta_j] = \beta_i \beta_j\, \mathrm{Var}[\Theta].$$

In particular, if $\Theta$ has a gamma$(\alpha, 1)$ distribution with m.g.f. $M_\Theta(z) = (1 - z)^{-\alpha}$, then

$$P_{N_1, \ldots, N_k}(t_1, \ldots, t_k) = [1 - \beta_1(t_1 - 1) - \cdots - \beta_k(t_k - 1)]^{-\alpha} \tag{3.1}$$

defines a multivariate negative binomial with marginals NB$(\alpha, \beta_j)$ and $\mathrm{Cov}(N_i, N_j) = \alpha \beta_i \beta_j$.
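The gamma case is easy to simulate, which gives a quick empirical check on the stated means and covariances. The sketch below is illustrative only: the function name `simulate_common_mixture`, the symbols `alpha` and `betas` for the gamma shape and the Poisson scale factors, their values, and the seed are assumptions.

```python
import numpy as np

def simulate_common_mixture(alpha, betas, n_draws, seed=0):
    """Draw (N_1, ..., N_k) from the common Poisson-gamma mixture."""
    rng = np.random.default_rng(seed)
    betas = np.asarray(betas, dtype=float)
    # One common mixing variable Theta ~ gamma(alpha, 1) per scenario.
    theta = rng.gamma(shape=alpha, scale=1.0, size=n_draws)
    # Given Theta = theta, the counts are independent Poisson(theta * beta_j).
    return rng.poisson(theta[:, None] * betas[None, :])

alpha, betas = 2.0, [0.5, 1.0, 1.5]
N = simulate_common_mixture(alpha, betas, n_draws=400_000)

sample_means = N.mean(axis=0)              # theory: alpha * beta_j
sample_cov = np.cov(N, rowvar=False)       # theory (i != j): alpha * beta_i * beta_j
```

For these inputs the theoretical means are 1, 2 and 3, and the theoretical covariance between the first two counts is 1.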

Consider combining $k$ risk portfolios. Assume that the frequencies $N_j$, $j = 1, \ldots, k$, are correlated via a common Poisson-gamma mixture and have a joint p.g.f. given by Equation (3.1). If the severities $X_j$, $j = 1, \ldots, k$, are mutually independent and independent of the frequencies, there is a simple method of combining the aggregate loss distributions. Given

$$\beta = \beta_1 + \cdots + \beta_k$$

and

$$P_X(t) = \frac{\beta_1}{\beta} P_{X_1}(t) + \cdots + \frac{\beta_k}{\beta} P_{X_k}(t),$$

then

$$P_{N_1, \ldots, N_k}[P_{X_1}(t), \ldots, P_{X_k}(t)] = [1 - \beta(P_X(t) - 1)]^{-\alpha}.$$

In other words, the total loss amount for the combined risk portfolios has a compound NB$(\alpha, \beta)$ distribution with the severity distribution being a weighted average of individual severity distributions. In this case, dependency does not complicate the computation; in fact, it simplifies the calculation. It is simpler than combining independent compound negative binomial distributions.
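The identity follows in two lines from the joint p.g.f. of the common Poisson-gamma mixture; the steps are written out here for convenience, with $\alpha$ denoting the gamma shape parameter and $\beta_j$ the scale factor of the $j$-th frequency. Substituting $t_j = P_{X_j}(t)$ and collecting the severity terms gives

$$P_{N_1, \ldots, N_k}[P_{X_1}(t), \ldots, P_{X_k}(t)] = \left[1 - \sum_{j=1}^{k} \beta_j \{P_{X_j}(t) - 1\}\right]^{-\alpha}$$

$$= \left[1 - \left\{\sum_{j=1}^{k} \beta_j P_{X_j}(t) - \beta\right\}\right]^{-\alpha} = \left[1 - \beta\{P_X(t) - 1\}\right]^{-\alpha},$$

since $\sum_{j} \beta_j P_{X_j}(t) = \beta P_X(t)$ by the definition of the weighted-average severity p.g.f. $P_X$.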

Note that the common Poisson-gamma mixture model is a special case of the multivariate negative binomial family in Equation (2.5). In the Poisson-gamma mixture model, the $k$ marginals NB$(\alpha, \beta_j)$ are required to have the same parameter $\alpha$. On the other hand, the multivariate negative binomial family in Equation (2.5) allows arbitrary negative binomial frequencies NB$(\alpha_j, \beta_j)$.

3.2 A Common Shock Model

Consider a multivariate Poisson distribution defined by the following joint p.g.f.

$$P_{N_1, N_2, N_3}(t_1, t_2, t_3) = \exp\left\{\sum_{i=1}^{3} \lambda_{ii}(t_i - 1) + \sum_{i<j} \lambda_{ij}(t_i t_j - 1) + \lambda_{123}(t_1 t_2 t_3 - 1)\right\}. \tag{3.2}$$

Its marginal distributions are

$$N_j \sim \mathrm{Poisson}\left(\lambda_{123} + \sum_{i=1}^{3} \lambda_{ij}\right), \quad j = 1, 2, 3,$$

and for $i \ne j$, $\mathrm{Cov}[N_i, N_j] = \lambda_{ij} + \lambda_{123}$.

We let the distinct $K$’s below be mutually independent:
• $K_{ii} \sim \mathrm{Poisson}(\lambda_{ii})$, for $i = 1, 2, 3$
• $K_{ij} \sim \mathrm{Poisson}(\lambda_{ij})$, for $1 \le i < j \le 3$
• $K_{ij} = K_{ji}$, for $1 \le i, j \le 3$
• $K_{123} \sim \mathrm{Poisson}(\lambda_{123})$
• $N_j = K_{1j} + K_{2j} + K_{3j} + K_{123}$, for $j = 1, 2, 3$.

Then the so-constructed $(N_1, N_2, N_3)$ have a joint p.g.f. given by (3.2). In this model, $K_{123}$ represents the common shock among all three variables $(N_1, N_2, N_3)$. In addition, for $i \ne j$, $K_{ij} = K_{ji}$ represents the extra common shock between $N_i$ and $N_j$.

Note that we can simulate the correlated frequencies, $(N_1, N_2, N_3)$, component by component.

Neither the common Poisson mixture nor the common shock model can be explicitly expressed in terms of a copula.
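Component-by-component simulation of the three-variable common shock model might look like the sketch below. The function name, the way the Poisson rates are passed (a tuple of single-line rates, a dictionary of pairwise rates keyed by zero-based index pairs, and one rate common to all three), the numerical values, and the seed are assumptions made for the illustration.

```python
import numpy as np

def simulate_common_shock(lam_single, lam_pair, lam_all, n_draws, seed=0):
    """Build (N_1, N_2, N_3) from independent Poisson shock components."""
    rng = np.random.default_rng(seed)
    N = np.zeros((n_draws, 3), dtype=np.int64)
    for i, lam in enumerate(lam_single):        # shocks that hit one line only
        N[:, i] += rng.poisson(lam, n_draws)
    for (i, j), lam in lam_pair.items():        # shocks shared by a pair of lines
        shared = rng.poisson(lam, n_draws)
        N[:, i] += shared
        N[:, j] += shared
    N += rng.poisson(lam_all, n_draws)[:, None]  # shock common to all three lines
    return N

lam_single = (1.0, 2.0, 0.5)
lam_pair = {(0, 1): 0.3, (0, 2): 0.2, (1, 2): 0.4}
lam_all = 0.1
N = simulate_common_shock(lam_single, lam_pair, lam_all, n_draws=400_000)

sample_means = N.mean(axis=0)            # theory: 1.6, 2.8, 1.2
sample_cov = np.cov(N, rowvar=False)     # theory (i != j): pair rate + common rate
```

Each simulated count is Poisson with mean equal to the sum of the rates that touch it, and the covariance between two lines is the pair rate plus the rate of the shock common to all three.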

REFERENCES

BURDEN, R.L., AND FAIRES, J.D. 1989. Numerical Analysis, 4th ed. Boston: PWS-KENT Publishing Company.

COOK, R.D., AND JOHNSON, M.E. 1981. ‘‘A Family of Distributions for Modelling Non-elliptically Symmetric Multivariate Data,’’ Journal of the Royal Statistical Society B 43:210–18.

FISHMAN, G.S. 1996. Monte Carlo: Concepts, Algorithms, and Applications. New York: Springer-Verlag.

FREES, E.W., AND VALDEZ, E.A. 1998. ‘‘Understanding Relationships Using Copulas,’’ NAAJ 2, no. 1:1–25.

HERZOG, T.N. 1986. ‘‘An Introduction to Stochastic Simulation,’’ Casualty Actuarial Society Study Note 4B.

IMAN, R.L., AND CONOVER, W.J. 1982. ‘‘A Distribution-Free Approach to Inducing Rank Correlation Among Input Variables,’’ Communications in Statistics: Simulation and Computation 11, no. 3:311–34.

JOHNSON, M.E. 1987. Multivariate Statistical Simulation. New York: John Wiley and Sons, Inc.

JOHNSON, N., KOTZ, S., AND BALAKRISHNAN, N. 1997. Discrete Multivariate Distributions. New York: John Wiley and Sons, Inc.

KLUGMAN, S., PANJER, H.H., AND WILLMOT, G.E. 1998. Loss Models: From Data to Decisions. New York: John Wiley and Sons, Inc.