
Insurance: Mathematics and Economics 11 (1992) 7-16

North-Holland

A Bayesian interpretation of Whittaker-Henderson graduation

Greg Taylor

Coopers & Lybrand, Sydney, Australia

Received October 1990

Revised November 1991

Abstract: The primary purpose of the paper is to place Whittaker-Henderson graduation in a Bayesian context and show that this determines in a precise manner the extent to which goodness-of-fit should be traded off against smoothness in the Whittaker-Henderson loss function. This is done in Section 2. Section 3 generalizes the set of admissible graduating functions to a normed linear space. A specific example of this generalization is Schoenberg graduation, which is strongly related to Whittaker-Henderson but leads to particular spline graduations. These are placed in a Bayesian context parallel to that of Section 2. The similarity to shrunken or Stein-type estimators is pointed out. Section 4 considers the practical implications of these theoretical developments. Transformations of observations under graduation are examined and shown to be natural in some circumstances. The precise trade-off mentioned above is enlarged upon, and the main conclusions reached here are seen to carry over to general spline graduation. The relation between Whittaker-Henderson and spline graduation is identified.

Keywords: Whittaker-Henderson graduation, Goodness-of-fit, Smoothness, Schoenberg graduation, Transformation of observations, Spline functions.

1. Introduction

Whittaker-Henderson graduation was introduced [Whittaker (1923), Henderson (1924)] as a means of smoothing a sequence of data points with a suitable compromise between the smoothness achieved and the adherence of the graduation to the data.

Correspondence to: Prof. Dr. G.C. Taylor, Coopers & Lybrand Tower, 580 George Street, Sydney, NSW 2000, Australia.

The compromise is achieved by defining a loss function which contains separate components relating to goodness-of-fit and smoothness respectively, and a constant, referred to here as the relativity constant, which in effect specifies the extent to which one is willing to trade off one of these components against the other.

One of the difficulties of Whittaker-Henderson graduation has been the lack of theory guiding the choice of relativity constant, a criticism that was made by Lidstone as long ago as 1927. The primary purpose of the present paper is to place this form of graduation in a Bayesian context which effectively determines the proper value of the relativity constant.

This has a number of other practical implications for Whittaker-Henderson and spline graduation, which are discussed later in the paper.

2. Whittaker-Henderson graduation in a Bayesian context

2.1. Definition of Whittaker-Henderson graduation

Consider a parameter $\theta(x)$ which is a function of some real-valued argument $x$. Typically, in the actuarial context $x$ is age and $\theta$ is a mortality or some other decrement rate. For the sake of definiteness, the terms 'age' and 'mortality rate' will be used throughout this paper.

Now suppose that observations $Q(x)$ have been obtained on the $\theta(x)$, $x = a+1, a+2, \ldots, a+m$. This means that the $Q(x)$ are random variables:

$$Q(x) = \theta(x) + e(x), \qquad (2.1.1)$$

$$E e(x) = 0. \qquad (2.1.2)$$

By (2.1.2), the $Q(x)$ are unbiased estimators of the $\theta(x)$. As a sequence over $x$, the $Q(x)$ may not be very smooth. However, the $\theta(x)$ are assumed to be smooth, and it is desired that estimators $\hat\theta$ be formed which reflect this smoothness.

0167-6687/92/$05.00 © 1992 - Elsevier Science Publishers B.V. All rights reserved

The required estimators are obtained as follows. Define:

$$D(\hat\theta, Q) = \sum_{x=a+1}^{a+m} w(x)\big[Q(x) - \hat\theta(x)\big]^2, \qquad (2.1.3)$$

$$S(\hat\theta) = \sum_{x=a+1}^{a+m-n} \big[\Delta^n \hat\theta(x)\big]^2, \qquad (2.1.4)$$

$$L(\hat\theta, Q, c) = D(\hat\theta, Q) + c\,S(\hat\theta), \qquad (2.1.5)$$

where $w(x)$ is a weight function, $\Delta^n$ denotes $n$th difference, and $c$ is a constant.

The function $D(\hat\theta, Q)$ measures the deviation between observations and the associated estimators; $S(\hat\theta)$ measures the smoothness (or lack thereof) of the sequence of estimators $\hat\theta(x)$; $L(\hat\theta, Q, c)$ is a loss function reflecting both deviation and lack of smoothness. The loss function $L(\hat\theta, Q, c)$ depends on the constant $c$, which assigns relative weights to deviation and smoothness. This constant will be referred to in this paper as the relativity constant.

Note that the loss function is rather more concisely expressed in matrix notation:

$$D(\hat\theta, Q) = (Q - \hat\theta)^T W (Q - \hat\theta), \qquad (2.1.6)$$

$$S(\hat\theta) = (K\hat\theta)^T (K\hat\theta), \qquad (2.1.7)$$

$$L(\hat\theta, Q, c) = (Q - \hat\theta)^T W (Q - \hat\theta) + c\,(K\hat\theta)^T (K\hat\theta), \qquad (2.1.8)$$

where $\hat\theta$, $Q$ are $m$-vectors,

$$W = \mathrm{diag}\big(w(a+1), \ldots, w(a+m)\big), \qquad (2.1.9)$$

the superscript $T$ denotes matrix transposition, and $K$ is the $(m-n) \times m$ matrix of binomial coefficients effecting $n$th differences.

The required estimators $\hat\theta(x)$ are obtained by minimizing the loss function:

$$\partial L(\hat\theta, Q, c)/\partial \hat\theta = 0, \qquad (2.1.10)$$

yielding:

$$\hat\theta = (I + cW^{-1}K^TK)^{-1} Q. \qquad (2.1.11)$$

The resulting vector gives the Whittaker-Henderson graduated rates $\hat\theta(x)$. This method of graduation derives from Whittaker (1923) and Henderson (1924). It is included in the text by Miller (1946) and discussed in many subsequent publications, e.g. Hoem (1984).
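The closed form (2.1.11) amounts to a single linear solve. A minimal numerical sketch (not from the paper; the data and constants are illustrative):

```python
import numpy as np

def wh_graduate(q, w, c, n=2):
    """Whittaker-Henderson graduation: minimize
    sum w(x)[Q(x) - theta(x)]^2 + c * sum [Delta^n theta(x)]^2,
    whose minimizer is theta-hat = (I + c W^{-1} K'K)^{-1} Q  (eq. 2.1.11)."""
    m = len(q)
    # K is the (m-n) x m matrix of binomial coefficients effecting nth differences
    K = np.diff(np.eye(m), n=n, axis=0)
    A = np.eye(m) + c * np.diag(1.0 / w) @ K.T @ K
    return np.linalg.solve(A, q)

# Illustrative crude rates and equal weights
q = np.array([0.010, 0.012, 0.011, 0.015, 0.014, 0.018, 0.017, 0.021])
w = np.full(8, 1.0)
smoothed = wh_graduate(q, w, c=5.0, n=2)
```

With $c = 0$ the graduation returns the data unchanged; since $K$ annihilates polynomials of degree $n-1$ or less, a linear sequence passes through unchanged for any $c$ when $n = 2$, and as $c$ grows an arbitrary sequence is pulled toward such a polynomial.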

2.2. Bayesian interpretation of the loss function

Consider the deviation measure (2.1.3) with weight function given by

$$1/w(x) = V[Q(x)], \qquad (2.2.1)$$

and suppose that

$$V[Q(x)] = v(\theta(x))/N(x), \qquad (2.2.2)$$

for some function $v(\cdot)$, and with $N(x)$ denoting the sample size on which $Q(x)$ is based.

By (2.1.1), (2.1.2) and (2.1.11),

$$E[\hat\theta] = (I + cW^{-1}K^TK)^{-1}\theta = \theta + O\big(\min\{[N(x)]^{-1}: x = a+1, \ldots, a+m\}\big), \qquad (2.2.3)$$

showing that $\hat\theta$ is asymptotically unbiased for large samples.

Then (2.1.3) is, asymptotically,

$$D(\hat\theta, Q) = \sum_{x=a+1}^{a+m} \big[Q(x) - E[Q(x)]\big]^2 / V[Q(x)], \qquad (2.2.4)$$

which, but for a missing constant, is the log-likelihood for the $Q(x)$, assuming these to be i.i.d. normal variates with parameters $E[Q(x)]$ and $V[Q(x)]$.

One may note in passing that, since $\hat\theta$ is just a linear transformation of $Q$ [see (2.1.11)], asymptotic normality of $Q$ implies the same property for $\hat\theta$.

Now consider the smoothness criterion $S(\hat\theta)$ given by (2.1.4) and note that it too can be considered as a log-likelihood. Let $\xi(x)$ denote $\Delta^n\theta(x)$ and suppose that the parameters $\theta(x)$ are subject to the prior distribution

$$\xi(x) \sim N(0, \tau^2). \qquad (2.2.5)$$


If the $\xi(x)$ are assumed stochastically independent, then the prior log-likelihood (omitting a constant term) is

$$\xi^T\xi/\tau^2. \qquad (2.2.6)$$

Let $\hat\xi(x) = \Delta^n\hat\theta(x)$. Then by (2.2.3), $\hat\xi(x)$ is asymptotically equal to $\xi(x)$ for large samples. Thus, the log-likelihood (2.2.6) is asymptotically equal to

$$\hat\xi^T\hat\xi/\tau^2 = S(\hat\theta)/\tau^2. \qquad (2.2.7)$$

The fact that the prior (2.2.5) has been placed on $\Delta^n\theta(x)$ rather than $\theta(x)$ implies prior knowledge only about the shape of $\theta(x)$ as a function of $x$ and not about the level of that function. This is essentially the same as the assumption made by Whittaker (1923). Kimeldorf and Jones (1967, e.g. p. 67), on the other hand, emphasised the inclusion in the prior of knowledge on both shape and level.

Let $G(X)$ denote the log-likelihood of random variable $X$. Then (2.2.4) and (2.2.7) give:

$$G(Q \mid \theta) = -D(\hat\theta, Q), \qquad (2.2.8)$$

$$G(\xi) = -\tau^{-2} S(\hat\theta), \qquad (2.2.9)$$

asymptotically, where the true sign of the log-likelihoods has now been recognised and constant terms ignored.

Then,

$$G(Q, \theta) = G(Q \mid \theta) + G(\theta) = G(Q \mid \theta) + G(\xi) = -D(\hat\theta, Q) - \tau^{-2}S(\hat\theta) = -L(\hat\theta, Q, \tau^{-2}), \qquad (2.2.10)$$

where the second equation follows from the fact that the only prior on $\theta$ is in fact applied to $\xi$, a linear transformation of $\theta$ of less than full rank. Some discussion of the rank reduction appears in Hickman and Miller (1978, pp. 433-434).

It now follows that

$$G(\theta \mid Q) = G(Q, \theta) - G(Q) = -L(\hat\theta, Q, \tau^{-2}) - G(Q), \qquad (2.2.11)$$

with unnecessary constant terms still omitted.

All likelihoods involved in this calculation are normal, and therefore so is $G(\theta \mid Q)$. Hence the maximum likelihood estimator of $E[\theta \mid Q]$ may be found by maximizing $G(\theta \mid Q)$, i.e. by minimization of $L(\hat\theta, Q, \tau^{-2})$. This is the same as carrying out a Whittaker-Henderson graduation with relativity constant $c$ set equal to $1/\tau^2$.

This result is summarized as follows.

Theorem 2.2.1. Using the above notation, suppose that mortality rates $Q(x)$ are stochastically independent at different ages, and asymptotically normally distributed for large samples. Suppose also that $\theta = E[Q]$ is subject to a prior distribution according to which the $\xi(x) = \Delta^n\theta(x)$ are i.i.d. $N(0, \tau^2)$. Then asymptotically the maximum likelihood estimator of the Bayesian posterior expectation $E[\theta \mid Q]$ is given by a Whittaker-Henderson graduation of $Q$ provided that the weight function $w(x)$ and relativity constant $c$ are set to the following values:

$$w(x) = 1/V[Q(x)], \qquad c = 1/\tau^2.$$
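Theorem 2.2.1 turns the choice of $w(x)$ and $c$ from judgement into two estimable quantities. A hedged sketch (all numbers are invented for illustration; the plug-in binomial variance estimate and the stand-in prior value for $\tau^2$ are my assumptions, not the paper's worked example):

```python
import numpy as np

rng = np.random.default_rng(0)
ages = np.arange(41, 61)
theta = 0.0008 * 1.09 ** (ages - 40)      # hypothetical true mortality curve
N = np.full(ages.size, 20_000)            # exposed to risk at each age
Q = rng.binomial(N, theta) / N            # crude rates, unbiased for theta

# Theorem 2.2.1: w(x) = 1/V[Q(x)] and c = 1/tau^2
w = 1.0 / (Q * (1 - Q) / N)               # estimated binomial variance of Q(x)
tau2 = np.mean(np.diff(theta, n=2) ** 2)  # stand-in prior judgement about Delta^2 theta
c = 1.0 / tau2

m = Q.size
K = np.diff(np.eye(m), n=2, axis=0)
theta_hat = np.linalg.solve(np.eye(m) + c * np.diag(1.0 / w) @ K.T @ K, Q)
```

Because $\hat\theta$ minimizes $D + cS$ while $Q$ itself makes $D = 0$, the graduated sequence is never rougher than the crude one under the measure $S$.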

Interestingly, and perhaps surprisingly, Whittaker's justification of his method of graduation [Whittaker (1923), Whittaker and Robinson (1924); see also Jones (1965, p. 35)] ran very much along the lines leading to Theorem 2.2.1. Though he did not have the benefit of modern day rigorous Bayesian statistics, Whittaker nevertheless identified what was virtually (2.2.7) as the log of an 'antecedent probability', or in other words a prior.

The formal similarity between Whittaker-Henderson and Bayesian graduation was noted by Kimeldorf and Jones (1967, pp. 72-73), but without leading to a conclusion like that of the preceding theorem. Kimeldorf and Jones were more concerned with showing that Bayesian graduation with particular types of constraint on the prior covariance matrix could lead to a graduation resembling Whittaker-Henderson.

One may also note the formal similarity between the Whittaker-Henderson solution (2.1.11) and the ridge regression estimator of Hoerl and Kennard (1970a,b). The above Bayesian interpretation of the former shows that its nature derives from the same origins as many other 'shrunken' or Stein-type estimators [e.g. Efron and Morris (1970), Efron (1978)], namely from parametric uncertainty as in (2.2.5).

Estimators of these types are perhaps better known to actuaries through the credibility literature [e.g. Bühlmann (1967)]. Specific examples of the connection between Bayesian and credibility estimation are given by Mayerson (1964) and Jewell (1974, 1975). An example specific to the graduation context is given by Kimeldorf and Jones (1967, pp. 69-71).

Note that, while Whittaker-Henderson graduation is unbiased for unboundedly large samples, as remarked just after (2.2.3), it is not so in general. This is consistent with the willingness of the users of Stein-type estimators to accept a certain amount of bias in return for a reduction in mean square error of estimation.
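Because the estimator (2.1.11) is linear in $Q$, its bias and sampling variance follow in closed form, which makes the bias/mean-square-error trade-off easy to exhibit numerically. A sketch under assumed illustrative values, with homoscedastic noise $V[Q(x)] = \sigma^2$ (my simplification):

```python
import numpy as np

x = np.arange(1.0, 21.0)
theta = 0.01 + 2e-5 * (x - 10) ** 2   # curved true rates, so K @ theta != 0
sigma2 = 4e-6                         # assumed sampling variance of each crude rate
m = x.size

K = np.diff(np.eye(m), n=2, axis=0)   # second-difference matrix
A = np.eye(m) + 100.0 * K.T @ K       # A = I + c W^{-1} K'K, with c/w set to 100
Ainv = np.linalg.inv(A)

# E[theta_hat] = A^{-1} theta and Var[theta_hat] = sigma^2 A^{-1} A^{-T}
bias = Ainv @ theta - theta                     # non-zero: shrinkage is biased
var = sigma2 * np.sum(Ainv * Ainv, axis=1)      # diagonal of sigma^2 A^{-1} A^{-T}
mse_graduated = np.mean(bias ** 2 + var)
mse_crude = sigma2                              # crude rates: unbiased, variance sigma^2
```

For these values the squared bias introduced by smoothing is far smaller than the variance removed, so the graduated rates have smaller mean square error than the crude ones, exactly the Stein-type trade mentioned above.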

3. Interpretation of the Bayesian prior in a linear function space context

3.1. The linear function space

When the Whittaker-Henderson graduation is placed in a Bayesian setting, the prior is given by (2.2.5). Note that the variable regarded as subject to the prior is $\xi(x) = \Delta^n\theta(x)$. Since $\xi^T\xi$ is a quantity to be minimized, subject to goodness-of-fit considerations, it is tempting to regard $\xi$ as some sort of measure of distance of the function $\theta$ from the zero function in an appropriate function space.

With this motivation consider the space $\mathcal{V}^{n-1}[\alpha, \beta]$ of functions which are continuously differentiable $n-1$ times on a domain $[\alpha, \beta]$ which contains the values $a+1, \ldots, a+m$, and have piecewise continuous $n$th derivatives. This space is a vector space under the operations of pointwise addition and scalar multiplication of functions.

It is converted into a normed linear space $\mathcal{V}^{n-1}[\alpha, \beta]$ by definition of the norm

$$\|f - g\| = \Big[\sum_{x=a+1}^{a+m-n} \big(\Delta^n[f(x) - g(x)]\big)^2\Big]^{1/2}. \qquad (3.1.1)$$

Note that $\Delta^n$ annihilates all polynomials of degree $n-1$ or less, and so $\|f - g\| = 0$ whenever $f$ and $g$ differ only by such a polynomial. Strictly then the members of $\mathcal{V}^{n-1}[\alpha, \beta]$ are equivalence classes of functions which differ from one another by polynomials of degree $n-1$ or less.

Now the smoothness criterion $S(\hat\theta) = \hat\xi^T\hat\xi$ in (2.2.7) may be written as

$$S(\hat\theta) = \|\hat\theta\|^2. \qquad (3.1.2)$$

Then Whittaker-Henderson graduation with loss function (2.1.5) can be regarded as minimization of the distance of the estimated mortality function $\hat\theta$ from the zero function subject to satisfactory goodness-of-fit.

In the Bayesian context, (2.2.5) with $\xi^T\xi$ equal to $\|\theta\|^2$ yields:

$$\|\theta\| \sim N(0, m\tau^2). \qquad (3.1.3)$$

Then Theorem 2.2.1 takes the following somewhat different form.

Theorem 3.1.1. Suppose that mortality rates $Q(x)$ are stochastically independent at different ages, and asymptotically normally distributed for large samples. Suppose also that $\theta = E[Q] \in \mathcal{V}^{n-1}[\alpha, \beta]$ and is subject to a prior according to which $\|\theta\| \sim N(0, m\tau^2)$. Then asymptotically the maximum likelihood estimator of the Bayesian posterior expectation $E[\theta \mid Q]$ is given by Whittaker-Henderson graduation of $Q$ provided that the weight function $w(x)$ and relativity constant $c$ are set to the following values:

$$w(x) = 1/V[Q(x)], \qquad c = 1/\tau^2.$$

Note that the prior specified here is less restrictive than in Theorem 2.2.1. Here only the distribution of $\|\theta\|$ is specified, whereas in Theorem 2.2.1 a distribution (of $\xi$) is specified at each point $x$. The same result could have been obtained in Theorem 2.2.1, of course, since all that has changed between that and the present theorem is the notation. However, the present result emerges a little more naturally from the normed space setting.

While the most common application of Whittaker-Henderson graduation bases its smoothness measure on departure from polynomial smoothness, and this approach has been followed here, other choices are possible. For example, Peterson (1952) adopted a smoothness measure which measured departures from exponential smoothness. Section 4.1 of this paper considers generalization of the smoothness measure.

3.2. Schoenberg's variation of Whittaker-Henderson graduation

Greville (1969) reports a variation of Whittaker-Henderson graduation due to Schoenberg (1964) in which smoothness is measured in terms of $n$th order derivatives instead of differences. In particular, the smoothness measure (2.1.4) is replaced by

$$S(\hat\theta) = \int_\alpha^\beta \big[\hat\theta^{(n)}(x)\big]^2\, dx, \qquad (3.2.1)$$

the superscript $(n)$ denoting $n$th order differentiation.

Though the scope of this paper is intended to be wider than the mortality context (despite the mortality-specific terminology), it is worth noting that $\Delta^n\theta(x)$ and $\theta^{(n)}(x)$ do not usually differ greatly for mortality functions $\theta$, and so Whittaker-Henderson results using (2.1.4) and (3.2.1) respectively can be expected to be similar.

It is evident that all of the foregoing reasoning concerning the Bayesian context can be adapted to the smoothness measure (3.2.1). In particular, it would need to be assumed in Theorem 2.2.1 that $\theta^{(n)}(x) \sim N(0, \tau^2)$. In Section 3.1.1, the norm (3.1.1) would need to be replaced by

$$\|f - g\| = \Big(\int_\alpha^\beta \big[(f - g)^{(n)}(x)\big]^2\, dx\Big)^{1/2}. \qquad (3.2.2)$$

Then (3.1.2) continues to hold with smoothness measure now defined by (3.2.1). Thus, the counterpart of Theorem 3.1.1 would require the assumption that

$$\|\theta\| \sim N(0, (\beta - \alpha)\tau^2). \qquad (3.2.3)$$

The interest in Schoenberg's variation of the Whittaker-Henderson graduation lies in the fact that the solution is a particular spline function. This is explained below, but first the next sub-section makes a short digression in order to present some basic spline concepts.

3.3. Spline functions

The following definitions and discussion are taken from Greville (1969).

Definition. For a strictly increasing sequence of real numbers, $x_1, \ldots, x_m$, a spline function of degree $n$ with knots $x_1, \ldots, x_m$ is a function $S: \mathbb{R} \to \mathbb{R}$ with the following two properties:

(a) in each interval $(x_i, x_{i+1})$ for $i = 0, 1, \ldots, m$ (where $x_0 = -\infty$ and $x_{m+1} = +\infty$), $S(x)$ is given by some polynomial of degree $n$ or less;

(b) $S \in C^{n-1}(\mathbb{R})$, i.e. $S$ is $n-1$ times continuously differentiable everywhere.

Let $\mathcal{S}^n(x_1, \ldots, x_m)$ denote the class of spline functions of degree $n$ with knots $x_1, \ldots, x_m$. Let $\pi_n$ denote the class of polynomials of degree $n$ or less. Note that $\pi_n \subset \mathcal{S}^n(x_1, \ldots, x_m)$.

Let

$$x_+^n = x^n, \quad x \ge 0; \qquad = 0, \quad x \le 0. \qquad (3.3.1)$$

It may be shown [Greville (1969, p. 3)] that any $S \in \mathcal{S}^n(x_1, \ldots, x_m)$ may be uniquely decomposed thus:

$$S(x) = p(x) + \sum_{j=1}^m c_j (x - x_j)_+^n, \qquad (3.3.2)$$

where $p \in \pi_n$.

Definition. A natural spline of degree $2k-1$ (the degree of a natural spline must be odd) with knots $x_1, \ldots, x_m$ is a member of $\mathcal{S}^{2k-1}(x_1, \ldots, x_m)$ which is contained in $\pi_{k-1}$ when restricted to $(-\infty, x_1)$ or $(x_m, +\infty)$.

Let $\mathcal{N}^{2k-1}(x_1, \ldots, x_m)$ denote the class of natural splines of degree $2k-1$ with knots $x_1, \ldots, x_m$. It may be shown [Greville (1969, p. 3)] that any $S \in \mathcal{N}^{2k-1}(x_1, \ldots, x_m)$ may be uniquely decomposed thus:

$$S(x) = p(x) + \sum_{j=1}^m c_j (x - x_j)_+^{2k-1}, \qquad (3.3.3)$$

with $p \in \pi_{k-1}$, and the coefficients $c_j$ satisfying the relations

$$\sum_{j=1}^m c_j x_j^r = 0, \qquad r = 0, 1, \ldots, k-1. \qquad (3.3.4)$$

It is fairly clear that the constraints (3.3.4) serve to restrict the right-hand 'end polynomial' to degree $k-1$ (rather than $2k-1$). Hence the need for $k$ constraints.

The rudiments of spline functions have been familiar to actuaries for many years. They occur as the osculatory interpolation functions introduced by the actuary George King in the graduation of English Life Table No. 7 [Supplement to the Registrar-General's 75th Annual Report (1914)].
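The decomposition (3.3.3)-(3.3.4) is easy to verify numerically. A sketch for the cubic case $k = 2$ (knots and coefficients are invented; $c = (1, -2, 1)$ satisfies $\sum_j c_j = \sum_j c_j x_j = 0$ for knots $1, 2, 3$):

```python
import numpy as np

def natural_spline(x, knots, p_coef, c):
    """Evaluate S(x) = p(x) + sum_j c_j (x - x_j)_+^{2k-1}  (eq. 3.3.3),
    with p of degree k-1 (p_coef has k entries, highest power first).
    When the c_j obey sum_j c_j x_j^r = 0, r = 0..k-1 (eq. 3.3.4),
    the right-hand tail collapses to a polynomial of degree k-1."""
    k = len(p_coef)
    s = np.polyval(p_coef, x)
    for xj, cj in zip(knots, c):
        s = s + cj * np.clip(x - xj, 0.0, None) ** (2 * k - 1)
    return s

knots = np.array([1.0, 2.0, 3.0])
c = np.array([1.0, -2.0, 1.0])      # satisfies the constraints (3.3.4) for k = 2
p_coef = np.array([0.5, 1.0])       # p(x) = 0.5 x + 1, degree k-1 = 1
xs = np.array([10.0, 11.0, 12.0])   # all beyond the last knot
vals = natural_spline(xs, knots, p_coef, c)   # → [54.0, 60.5, 67.0]
```

Beyond the last knot the cubic terms cancel down to a straight line, so the second difference of `vals` vanishes, which is exactly the "natural" end behaviour the constraints enforce.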

3.4. Schoenberg graduation

If the Schoenberg measure of smoothness (3.2.1) is inserted in the loss function (2.1.5), the result is a new loss function:

$$\Lambda(f, Q, c) = \sum_{x=a+1}^{a+m} w(x)\big[Q(x) - f(x)\big]^2 + c \int_\alpha^\beta \big[f^{(n)}(x)\big]^2\, dx. \qquad (3.4.1)$$

Schoenberg graduation consists of choosing $f$ to minimize this. The result derived by Schoenberg (1964) and cited by Greville (1969) is that the minimizing $f$ is a natural spline. The precise result is as follows:

Theorem 3.4.1 (Schoenberg). Assume that $c > 0$, $w(x) > 0$, $x = a+1, \ldots, a+m$. Suppose that $\alpha \le a+1$, $\beta \ge a+m$, $n \ge 1$, and $f \in C^{n-1}[\alpha, \beta]$. Then there exists a unique $s \in \mathcal{N}^{2n-1}(a+1, \ldots, a+m)$ such that

$$\Lambda(s, Q, c) \le \Lambda(f, Q, c), \qquad (3.4.2)$$

for any $f$ satisfying the stated conditions, and with equality in (3.4.2) if and only if $f = s$. Moreover, if $s$ is expressed in the form (3.3.3), it is uniquely determined by the equations:

$$s(a+j) + (-1)^n c\,(2n-1)!\, c_j / w(a+j) = Q(a+j), \qquad (3.4.3)$$

for $j = 1, 2, \ldots, m$ [together with the constraints (3.3.4)].

The substance of Theorem 3.4.1 is that the function space from which the Schoenberg prior is chosen can be very much reduced from $\mathcal{V}^{n-1}[\alpha, \beta]$, which is $C^{n-1}[\alpha, \beta]$ supplied with norm (3.2.2). It can in fact be reduced to $\mathcal{N}^{2n-1}(a+1, \ldots, a+m)$.

4. Practical implications of the Bayesian interpretation

4.1. Choice of smoothness measure

The smoothness measure used hitherto has been (2.1.4) or (3.2.1). This is in fact typical of mortality graduation. For the sake of definiteness, consider (2.1.4):

$$S(\hat\theta) = \sum_{x=a+1}^{a+m-n} \big[\Delta^n\hat\theta(x)\big]^2. \qquad (2.1.4)$$

Note that the differences $\Delta^n\hat\theta(x)$ will tend to be large when the $\hat\theta(x)$ are large. For example, if

$$\theta(x) = k^x, \quad k \text{ const.} > 0, \qquad (4.1.1)$$

then

$$\Delta^n\theta(x) = (k-1)^n k^x, \qquad (4.1.2)$$

and $S(\hat\theta)$ will tend to be dominated by the values of $x$ at which $\hat\theta(x)$ is large.

Intuitively, it seems more reasonable to measure smoothness by

$$S(\hat\theta) = \sum_{x=a+1}^{a+m-n} \big[\Delta^n\hat\theta(x)/\hat\theta(x)\big]^2. \qquad (4.1.3)$$

Then in the case (4.1.1), all summands of (4.1.3) are equal [in fact equal to $(k-1)^{2n}$].

Theorem 3.1.1 showed that Whittaker-Henderson graduation could be viewed as Bayesian graduation with a prior chosen from the normed linear function space $\mathcal{V}^{n-1}[\alpha, \beta]$. Section 3.2 showed that a completely parallel theorem would hold for Schoenberg graduation if the norm were adjusted from (3.1.1) to (3.2.2).

The Bayesian interpretation of Whittaker-Henderson (or Schoenberg) graduation provides a theoretical justification for adjusting the smoothness measure in this way through the following theorem.

Theorem 4.1.1. Suppose that mortality rates $Q(x)$ are stochastically independent at different ages, and asymptotically normally distributed for large samples. Let $\xi(x)$ be a functional of $\theta(x)$ [e.g. it may depend on $\theta(x)$ and its differences] and define the smoothness measure:

$$S(\theta) = \sum_{x=a+1}^{a+m-n} \big[\xi(x)\big]^2, \qquad (4.1.4)$$

where $a+m-n$ is the maximum value of $x$ for which $\xi(x)$ is defined by $\theta(a+1), \ldots, \theta(a+m)$. Now suppose that $\theta = E[Q]$ is subject to a prior

Page 7: A Bayesian interpretation of Whittaker—Henderson graduation

G. Taylor / Whirtaker-Henderson graduation 13

distribution according to which the $\xi(x)$ are i.i.d. $N(0, \tau^2)$. Then asymptotically the maximum likelihood estimator of the Bayesian posterior expectation $E[\theta \mid Q]$ is given by a 'Whittaker-Henderson graduation' of $Q$ provided that the smoothness measure (4.1.4) is adopted in the loss function (2.1.5) and that the weight function $w(x)$ and relativity constant $c$ are set to the following values:

$$w(x) = 1/V[Q(x)], \qquad c = 1/\tau^2.$$

This theorem runs absolutely parallel to Theorem 2.2.1, and the proof is just as for that theorem but with the new smoothness measure (4.1.4).

Theorem 4.1.1 shows that, if

$$\xi(x) = \Delta^n\theta(x)/\theta(x) \sim N(0, \tau^2), \qquad (4.1.5)$$

then a Bayesian graduation of the $Q(x)$ is given by a Whittaker-Henderson graduation based on the smoothness measure (4.1.3) rather than (2.1.4).

Unfortunately, smoothness measures like (4.1.3) are not linear in the $\hat\theta(x)$. This is computationally awkward, since linear $S(\hat\theta)$ lead to graduated $\hat\theta$ which are just linear transformations of the $Q(x)$ while non-linear $S(\hat\theta)$ do not. Nevertheless, the graduation may be performed by any standard technique for minimizing the (now non-linear) loss function (2.1.5).
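One such standard technique is a fixed-point iteration that freezes the denominator of (4.1.3) at the previous iterate, so that each step reduces to an ordinary linear Whittaker-Henderson solve. The scheme itself is my illustration, not one prescribed by the paper:

```python
import numpy as np

def graduate_relative(Q, w, c, n=2, iters=30):
    """Approximately minimize
    sum w(x)[Q(x)-theta(x)]^2 + c * sum [Delta^n theta(x)/theta(x)]^2  (4.1.3)
    by repeatedly solving the linearized problem with the denominator
    frozen at the current iterate."""
    m = len(Q)
    K = np.diff(np.eye(m), n=n, axis=0)
    W = np.diag(w)
    theta = np.asarray(Q, float).copy()
    for _ in range(iters):
        D = np.diag(1.0 / theta[: m - n])   # freeze the 1/theta(x) scaling
        Ks = D @ K                          # rows now give Delta^n theta / theta
        # normal equations of the frozen quadratic loss
        theta = np.linalg.solve(W + c * Ks.T @ Ks, W @ Q)
    return theta

Q = 0.001 * 1.05 ** np.arange(10)   # illustrative geometric crude rates
w = np.ones(10)
graduated = graduate_relative(Q, w, c=0.5)
```

With $c = 0$ every iteration returns the data unchanged, which gives a cheap correctness check on the linear algebra.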

An alternative approach to non-linear functionals $\xi$ of $\theta$ such as (4.1.5) is to apply such a transformation $F$ to the $\theta(x)$ that the result is subject to the following prior distribution, at least approximately:

$$\Delta^n F(\theta(x)) \sim N(0, \tau^2). \qquad (4.1.6)$$

Note that (4.1.6) may be a reasonable assumption if $F(\theta(x))$ is approximately polynomial in $x$. The whole difficulty with simple $n$th differences in (4.1.2) arose from the fact that $\theta(x)$ was not polynomial but geometric. If a log transformation $[F(\theta) = \log\theta]$ were applied, then (4.1.1) would yield:

$$F(\theta(x)) = x \log k, \qquad (4.1.7)$$

$$\Delta^n F(\theta(x)) = \log k, \quad n = 1; \qquad = 0, \quad n > 1, \qquad (4.1.8)$$

and (4.1.6) may be reasonable for some $n > 1$.

In this case, the relation (4.1.6) has been achieved by a linearizing function $F(\cdot)$, i.e. a function which transforms $\theta$ to a linear function of $x$. This concept can be generalized to that of a polynomializing transformation, defined as follows.
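A quick numerical check of (4.1.7)-(4.1.8): for a geometric $\theta(x) = k^x$ the $n$th differences grow with $\theta$, but after the linearizing transform $F = \log$ the first differences are the constant $\log k$ and all higher differences vanish (the value of $k$ below is arbitrary):

```python
import numpy as np

k = 1.08
x = np.arange(30, 50)
theta = k ** x                        # geometric 'mortality' sequence (4.1.1)

d2 = np.diff(theta, n=2)              # equals (k-1)^2 * k^x: grows with theta (4.1.2)
d1_log = np.diff(np.log(theta))       # constant log k          (4.1.8, n = 1)
d2_log = np.diff(np.log(theta), n=2)  # identically zero        (4.1.8, n > 1)
```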

Definition. Let $\theta: \mathbb{R} \to \mathbb{R}$. A transformation $F: \mathbb{R} \to \mathbb{R}$ is a polynomializing transformation of degree $n$ [relative to $\theta$] if $F(\theta(x))$ is a polynomial of degree $n$ in $x$.

It follows from this definition that the application of a polynomializing transformation $F$ of degree $n-1$ to $\theta$ yields

$$\Delta^n F(\theta(x)) = 0. \qquad (4.1.9)$$

Of course, in practice the function $\theta(\cdot)$ is unknown and so it will not be possible to specify a polynomializing transformation with any precision. However, the general shape of $\theta(\cdot)$ will often be known sufficiently for an approximately polynomializing transformation to be determined.

For example, if $\theta(x)$ is thought to be approximately exponential, then it may be reasonable to assume that

$$\Delta^n F(\theta(x)) \sim N(0, \tau^2), \qquad (4.1.10)$$

with $F(\theta) = \log\theta$.

More specifically in relation to mortality rates, one might assume that (approximately) a log-logistic transformation would yield a polynomial [see e.g. Forfar, McCutcheon and Wilkie (1988, p. 16)]:

$$\theta(x) = \exp q(x)/[1 + \exp q(x)], \qquad (4.1.11)$$

where $q(x)$ is a polynomial of degree $n-1$. Then

$$F(\theta) = \log[\theta/(1-\theta)] \qquad (4.1.12)$$

is a polynomializing transformation of degree $n-1$.

It is possible to carry out a Whittaker-Henderson graduation yielding estimates of the $F(\theta(x))$ provided that the data are transformed by $G: \mathbb{R} \to \mathbb{R}$ to be unbiased with respect to these estimands, i.e.

$$E[G(Q(x))] = F(\theta(x)). \qquad (4.1.13)$$

Now it may be checked that this relation is satisfied by

$$G(Q(x)) = F(Q(x)) - \tfrac{1}{2}F''(Q(x))\,Q(x)[1 - Q(x)]/\{N(x) - 1\} + \text{other terms}, \qquad (4.1.14)$$


where the 'other terms' involve higher powers of $N^{-1}$ and/or third or higher derivatives of $F(\cdot)$.

To check this, note that

$$E[F(Q(x))] = F(E[Q(x)]) + \tfrac{1}{2}F''(E[Q(x)])\,V[Q(x)] + \text{higher order terms}$$

[by Taylor series expansion of $F$ about $E[Q(x)]$]

$$= F(\theta(x)) + \tfrac{1}{2}F''(\theta(x))\,\theta(x)[1 - \theta(x)]/N(x); \qquad (4.1.15)$$

$$E[Q^2(x)] = V[Q(x)] + E^2[Q(x)] = \big\{\theta(x) + [N(x) - 1]\theta^2(x)\big\}/N(x), \qquad (4.1.16)$$

where it has been assumed that

$$N(x)Q(x) \sim \mathrm{Bin}(N(x), \theta(x)), \qquad (4.1.17)$$

as would be the case for mortality rates. Taking the expectation of (4.1.14) and substituting (4.1.15) and (4.1.16) into the result yields (4.1.13) as required.

Summarizing all of this, the new variable $G(Q(x))$ is an unbiased estimator of $F(\theta(x))$. The situation surrounding $G(Q(x))$ is therefore quite parallel to that surrounding $Q(x)$ when a Whittaker-Henderson graduation was performed on this vector of observations in Section 2.
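The effect of the correction term in (4.1.14) can be checked by simulation for the logit transform (4.1.12). Everything below (the true rate, sample size, and number of simulations) is illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, N, sims = 0.02, 5000, 200_000
Q = rng.binomial(N, theta, size=sims) / N   # crude rates, per (4.1.17)

F = lambda q: np.log(q / (1 - q))                  # logit, eq. (4.1.12)
F2 = lambda q: (2 * q - 1) / (q * (1 - q)) ** 2    # second derivative of F

# (4.1.14): subtract the leading bias term; Q(1-Q)/(N-1) is an unbiased
# estimate of the binomial variance theta(1-theta)/N
G = F(Q) - 0.5 * F2(Q) * Q * (1 - Q) / (N - 1)

bias_raw = F(Q).mean() - F(theta)      # O(1/N) bias of the naive transform
bias_corrected = G.mean() - F(theta)   # remaining bias is of higher order
```

The raw transform $F(Q)$ shows a clear systematic bias, while the corrected variable $G(Q)$ is close to unbiased for $F(\theta)$, as (4.1.13) requires.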

The parallel is completed if one is entitled to assume that $\Delta^n F(\theta(x))$ is subject to a suitable prior:

$$\Delta^n F(\theta(x)) \sim N(0, \tau^2), \qquad (4.1.18)$$

and, at least to a reasonable approximation, $G(Q(x))$ is normally distributed. This last approximation is reasonable in cases where $N(x)$ is large enough that skewness of $Q(x)$ is small and $F(\cdot)$ [and therefore $G(\cdot)$] is not sufficiently non-linear as to introduce substantial skewness.

Under these conditions, one obtains the following parallel to Theorem 2.2.1.

Theorem 4.1.2. Suppose that mortality rates $Q(x)$ are stochastically independent at different ages. Let $F(\cdot)$ be a polynomializing transformation relative to $\theta(\cdot) = E[Q(\cdot)]$, and let $G(\cdot)$ be defined by (4.1.14). Suppose further that, for each $x$, $G(Q(x))$ is approximately normally distributed and that $\theta(x)$ is subject to a prior distribution according to which the $\Delta^n F(\theta(x))$ for various $x$ are i.i.d. $N(0, \tau^2)$. Then the maximum likelihood estimator of the Bayesian posterior expectation $E[F(\theta) \mid Q]$ is given approximately by a Whittaker-Henderson graduation of $G(Q)$ provided that the weight function $w(x)$ and relativity constant $c$ are set to the following values:

$$w(x) = 1/V[G(Q(x))], \qquad c = 1/\tau^2.$$

If the transformation $F(\cdot)$ is one-one, then the mode of the distribution of $\theta$ given $Q$ is that value of $\theta$ which gives the mode of the distribution of $F(\theta)$ given $Q$. The following corollary to Theorem 4.1.2 is then derived.

Corollary 4.1.3. Let $\hat\theta$ denote the approximate maximum likelihood estimator of $E[F(\theta) \mid Q]$ in Theorem 4.1.2. Suppose that $F(\cdot)$ is one-one. Then $F^{-1}(\hat\theta)$ is the approximate maximum likelihood estimator of $E[\theta \mid Q]$.

While the present sub-section has hitherto been concerned with transformation of the underlying sequence of mortality rates to something approximately a polynomial, it is interesting to note that Hickman and Miller (1977, pp. 12-13) apply the standard variance stabilization transformation:

$$Q(x) \mapsto [N(x)]^{1/2}\big\{\arcsin[Q(x)]^{1/2} - \arcsin[\theta(x)]^{1/2}\big\},$$

in recognition of the binomial variance

$$V[Q(x)] = \theta(x)[1 - \theta(x)]/N(x).$$
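The variance-stabilizing property is easy to confirm by simulation: after the transformation the variance is approximately $1/4$ whatever the underlying rate (the rates and sample size below are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
N, sims = 2000, 100_000
variances = []
for theta in (0.05, 0.1, 0.3):
    Q = rng.binomial(N, theta, size=sims) / N
    # Hickman-Miller transformation, in recognition of the binomial variance
    Z = np.sqrt(N) * (np.arcsin(np.sqrt(Q)) - np.arcsin(np.sqrt(theta)))
    variances.append(Z.var())
# each entry of variances is close to 0.25, independent of theta
```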

4.2. The relativity constant

All of the foregoing has placed the relativity constant $c$ in a context which assists in the selection of its value. Usually, the principles according to which this selection is made are only vaguely stated. The constant measures in some way the extent to which the user of the graduation is willing to compromise its adherence to the data in favour of smoothness, but without any precise formulation of the compromise.

The foregoing results all indicate that $1/c$ may be viewed as the variance of a prior on whatever smoothness measure is in use, usually a certain order of differences or derivatives.

In certain circumstances it may be possible to express this prior variance with reasonable precision. For example, if the table of mortality rates


under graduation falls in a sequence of national mortality tables, then variation of the relevant smoothness measure, as between previous mortality tables in the sequence, may well represent one's current opinion on the prior variation of this smoothness measure.
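Under the identification c = 1/τ², this suggests estimating τ² empirically from the variation of the chosen smoothness measure across the earlier tables in the sequence. A minimal sketch, assuming the measure is the k-th difference and using a hypothetical function name:

```python
import numpy as np

def relativity_from_history(tables, k=2):
    """Estimate tau**2 as the empirical variance of the k-th differences
    pooled across earlier graduated tables, and return c = 1 / tau**2."""
    diffs = np.concatenate([np.diff(t, n=k) for t in tables])
    return 1.0 / diffs.var(ddof=1)
```

For example, two earlier tables whose second differences are [1] and [3] give τ² = 2 and hence c = 0.5.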

Hickman and Miller (1978) discuss Bayesian graduation in which a prior is also placed on c.

4.3. Spline graduation

The foregoing results are also useful in two ways in connection with spline graduation. First, they indicate why spline graduation is likely to be preferable to Whittaker-Henderson graduation in most circumstances. Second, they assist in selection of the optimization criterion to be used in carrying out a spline graduation.

Consider the first of these two statements. Now Section 3.2 notes that there will often be little practical difference between Whittaker-Henderson and Schoenberg graduations, and Theorem 3.4.1 shows that the latter leads to a natural spline graduation.

The natural spline function concerned has m knots, one at each value of x for which an observation has been made. It may then be expressed in the form

S(x) = p(x) + Σ_{j=1}^{m} c_j (x − x_j)_+^{2k−1},   (3.3.3)

with x_j = a + j and the constants c_j satisfying (3.3.4).

More generally, a natural spline with knots corresponding to a subset M of {a + 1, ..., a + m} will take the form

S(x) = p(x) + Σ_{j∈M} c_j (x − x_j)_+^{2k−1}.   (4.3.1)

This more general form allows one to drop those terms of (3.3.3) whose coefficients c_j are too small to make worthwhile contributions. Thus the spline (4.3.1), with a general number of knots, may be viewed as admitting the most parsimonious fit to the data in terms of parameters. That is, restricting the set M by setting some c_j to 0 in (3.3.3) may degrade the quality of fit only a little while producing a model which may be described much more economically. Questions of the statistical significance of the c_j may be decided by routine procedures associated with the fitting technique adopted.
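Fitting the general form (4.3.1) and deciding which c_j are worth keeping can be sketched by ordinary least squares in a truncated-power basis. The example below takes the cubic case for concreteness and uses hypothetical names; coefficients near zero flag knots that may be dropped with little loss of fit:

```python
import numpy as np

def spline_design(x, knots):
    """Design matrix for S(x) = p(x) + sum_j c_j * (x - x_j)_+**3
    (cubic case): polynomial columns for p(x), then one
    truncated-power column per knot."""
    cols = [np.ones_like(x), x]                         # p(x), here linear
    cols += [np.maximum(x - xj, 0.0) ** 3 for xj in knots]
    return np.column_stack(cols)

# Fit by least squares; coefficients c_j near zero indicate knots
# whose omission would degrade the quality of fit only a little.
```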

The greater flexibility of the general spline function (4.3.1) will usually render it preferable to the Whittaker-Henderson graduation.

The second comment made at the start of the present sub-section concerned the optimization criterion to be used to perform a spline graduation. The reasoning here is exactly as for Whittaker-Henderson. Indeed, as has just been seen, natural spline graduation can often be regarded as approximately Whittaker-Henderson graduation with statistically insignificant terms removed.

Then all of the reasoning concerning the Whittaker-Henderson loss function (2.1.5) is still applicable, at least approximately. In particular, the Bayesian arguments of Section 2.2 are applicable. It follows therefore that a spline function (4.3.1) may be fitted to the data by minimization of loss function (2.1.5) and, in accordance with Theorem 2.2.1, with w(x) = 1/V[Q(x)], c = 1/τ². Alternatively, the spline may be fitted to a polynomialized version of Q(x) in accordance with Theorem 4.1.2 and Corollary 4.1.3, which again specify the weight function w(x) and the relativity constant c.

The end result is a rigorous formulation of the criterion (loss function) according to which the spline fitting is carried out.

It is of interest to note the spline fitting techniques of Craven and Wahba (1979). They select the relativity constant by means of cross-validation. Essentially, this means that the constant is so chosen as to minimize the sum of squared deviations between the observations and their predictions by the Schoenberg graduations which omit one observation (the one under estimation) at a time.
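The omit-one-at-a-time selection just described can be sketched as ordinary leave-one-out cross-validation over a grid of candidate constants (Craven and Wahba in fact use a generalized variant; the implementation below, with illustrative names, gives an observation zero weight rather than deleting it):

```python
import numpy as np

def wh_smooth(y, w, c, k=2):
    """Whittaker-Henderson graduation: solve (W + c D'D) g = W y."""
    D = np.diff(np.eye(len(y)), n=k, axis=0)
    return np.linalg.solve(np.diag(w) + c * D.T @ D, w * y)

def loo_cv_score(y, c, k=2):
    """Mean squared deviation between each observation and its
    prediction from a graduation that omits it (weight set to zero)."""
    m, score = len(y), 0.0
    for i in range(m):
        w = np.ones(m)
        w[i] = 0.0                    # omit the observation under estimation
        g = wh_smooth(y, w, c, k)
        score += (y[i] - g[i]) ** 2
    return score / m

# The relativity constant is chosen to minimize loo_cv_score
# over a grid of candidate values of c.
```

On data with a smooth trend plus noise, a heavily smoothed graduation typically predicts omitted observations better than a near-interpolating one, which is what drives the selection.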

5. Acknowledgement

I wish to thank Professor James C. Hickman for the helpful comments and much useful background information which he provided in relation to an earlier draft of this paper.


References

Bühlmann, H. (1967). Experience rating and credibility. Astin Bulletin 3, 199-207.

Craven, P. and G. Wahba (1979). Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische Mathematik 31, 377-403.

Efron, B. (1978). Controversies in the foundations of statistics. The American Mathematical Monthly 85, 231-246.

Efron, B. and C. Morris (1971). Limiting the risk of Bayes and empirical Bayes estimators - Part I: The Bayes case. Journal of the American Statistical Association 66, 807-815.

Forfar, D.O., J.J. McCutcheon and A.D. Wilkie (1988). On graduation by mathematical formula. Journal of the Institute of Actuaries 115, 1-149.

Greville, T.N.E., ed. (1969). Theory and applications of spline functions. Academic Press, New York.

Henderson, R. (1924). A new method of graduation. Transactions of the Actuarial Society of America 25, 29-40.

Hickman, J.C. and R.B. Miller (1977). Notes on Bayesian graduation. Transactions of the Society of Actuaries 24, 7-49.

Hickman, J.C. and R.B. Miller (1978). Discussion of ‘A linear programming approach to graduation’. Transactions of the Society of Actuaries 30, 433-436.

Hoem, J. (1984). A contribution to the statistical theory of linear graduation. Insurance: Mathematics and Economics 3, 1-17.

Hoerl, A.E. and R.W. Kennard (1970a). Ridge regression: Biased estimation for non-orthogonal problems. Technometrics 12, 55-67.

Hoerl, A.E. and R.W. Kennard (1970b). Ridge regression: Applications to non-orthogonal problems. Technometrics 12, 69-82.

Jewell, W.S. (1974). Credible means are exact Bayesian for exponential families. Astin Bulletin 8, 77-90.

Jewell, W.S. (1975). Regularity conditions for exact credibility. Astin Bulletin 8, 336-341.

Jones, D.A. (1965). Bayesian statistics. Transactions of the Society of Actuaries 18, 33-57.

Kimeldorf, G.S. and D.A. Jones (1967). Bayesian graduation. Transactions of the Society of Actuaries 19, 66-112.

Lidstone, G.J. (1926/27). Correspondence. Transactions of the Faculty of Actuaries 11, 233-237.

Mayerson, A.L. (1964). A Bayesian view of credibility. Pro- ceedings of the Casualty Actuarial Society 51, 85-104.

Miller, M.D. (1946). Elements of Graduation. Actuarial Soci- ety of America and American Institute of Actuaries.

Peterson, R.M. (1952). Group annuity mortality. Transactions of the Society of Actuaries 4, 246-307.

Schoenberg, I.J. (1964). Spline functions and the problem of graduation. Proceedings of the National Academy of Sciences of the USA 52, 947-950.

Supplement to the Registrar-General’s 75th annual report, Part 1. (1914). HMSO, Cmd. 7512.

Whittaker, E.T. (1923). On a new method of graduation. Proceedings of the Edinburgh Mathematical Society 41, 63-75.

Whittaker, E.T. and G. Robinson (1924). The Calculus of Observations. Blackie, London.