NCHS July 11, 2006

A Semiparametric Approach to Forecasting US Mortality Age Patterns

Presenter: Rong Wei (1)

Coauthors: Guanhua Lu (2), Benjamin Kedem (2) and Paul D. Williams (1)

(1) National Center for Health Statistics (NCHS)

(2) Math Dept., University of Maryland, College Park

Outline

Background

Project tasks

Model introduction

New approach: semiparametric model

Mortality forecasting: US, small states

Comparison with Lee-Carter model

Conclusion

Background

NCHS publishes race- and gender-specific life tables for each of the 50 states plus DC decennially;

Of the 300+ tables, about one fifth could not be published because of small numbers of deaths in a short time period;

Mortality data have been well documented at NCHS for every year, state, and race-gender population since 1968.

An example of life tables

Mortality age patterns: data from US and large states

[Figure, two panels: "Mortality of white male in California - data from 1998" and "Mortality of US male - 1998". x-axis: age; y-axis: ln(q).]

Mortality in small states: one year of data vs. 30 years of historical data

[Figure, two panels: "Mortality of black female in Iowa state - data from 1968 to 1998" and "Mortality of black female in Iowa state - data from 1998". x-axis: age in years; y-axis: ln(q).]

Another view of the data: time series at each age

The tasks

To address the insufficient-data problem, data from 30+ years are used to model the age-specific death pattern for small areas;

Select a time series model that better controls for the time effect and random error in multiple time series when predicting over a short horizon;

Project mortality curves (one year ahead vs. many years ahead) in small areas using historical data and robust statistical methodology.

Introduction to mortality forecasting models:

US mortality forecasting model by Lee and Carter (1992):

$$\ln(m_{x,t}) = a_x + b_x k_t + e_{x,t}$$

$$k_t = k_{t-1} + c + e_t$$

The LC model is based on principal components. It searches for the first principal component (PC) in n-dimensional time series data and solves for the age and time parameters by singular value decomposition.

The LC model explains 60-93% of the total variance (Girosi and King). For some populations, the first PC may be insufficient to explain the variance in high-dimensional data.
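The fitting step described above can be sketched in a few lines. This is a minimal illustration, not the code behind the slides: the function names and the toy data are hypothetical, the leading singular vectors are found by power iteration rather than a full SVD so that the sketch has no dependencies, and the drift c is estimated as the average yearly change in k_t.

```python
import math

def fit_lee_carter(log_m, n_iter=100):
    """Fit ln(m[x][t]) ~ a[x] + b[x] * k[t] from a matrix of log death rates.

    log_m[x][t] is the log death rate at age index x and year index t.
    Needs at least two years and rates that actually vary over time.
    """
    n_age, n_year = len(log_m), len(log_m[0])

    # a_x: average log rate at each age over the years.
    a = [sum(row) / n_year for row in log_m]

    # Centered matrix Z[x][t] = ln m[x][t] - a[x]; each row sums to zero.
    z = [[log_m[x][t] - a[x] for t in range(n_year)] for x in range(n_age)]

    # Power iteration for the leading right singular vector of Z.
    # Start from a linear trend: a constant vector would be mapped to zero.
    v = [t - (n_year - 1) / 2 for t in range(n_year)]
    for _ in range(n_iter):
        norm = math.sqrt(sum(vt * vt for vt in v))
        v = [vt / norm for vt in v]
        u = [sum(z[x][t] * v[t] for t in range(n_year)) for x in range(n_age)]
        v = [sum(z[x][t] * u[x] for x in range(n_age)) for t in range(n_year)]
    norm = math.sqrt(sum(vt * vt for vt in v))
    v = [vt / norm for vt in v]

    # u = Z v equals (singular value) * (left singular vector).
    u = [sum(z[x][t] * v[t] for t in range(n_year)) for x in range(n_age)]

    # Usual identification constraints: sum(b) = 1 and sum(k) = 0.
    scale = sum(u)
    b = [ux / scale for ux in u]
    k = [scale * vt for vt in v]
    return a, b, k

def forecast_log_rates(a, b, k, steps=1):
    """Project k with a random walk with drift and rebuild the log rates."""
    drift = (k[-1] - k[0]) / (len(k) - 1)  # average yearly change in k
    k_future = k[-1] + steps * drift
    return [ax + bx * k_future for ax, bx in zip(a, b)]

# Hypothetical toy data that follow the model exactly (3 ages, 5 years).
true_a = [-6.0, -5.0, -3.0]
true_b = [0.5, 0.3, 0.2]
true_k = [2.0, 1.0, 0.0, -1.0, -2.0]
toy = [[true_a[x] + true_b[x] * kt for kt in true_k] for x in range(3)]

a, b, k = fit_lee_carter(toy)
next_year = forecast_log_rates(a, b, k, steps=1)
```

On real data the first component leaves a residual, which is the share of variance the slide refers to; the toy data here are rank one, so the fit recovers the inputs.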

New Approach: Semiparametric model

Semiparametric approach

Short mortality time series, from 1968 to 1998, are used for consistency of data collection

More information is combined from neighboring ages

Death rates are centered

Emphasis is on predictions for the coming years

Semiparametric model

Semiparametric model in Time Series

Parameter estimation from pooled sample

Maximum likelihood function

Reference sample distribution

Application to US mortality forecasting

Data:

Mortality data come from death certificates filed in state vital statistics offices and reported to NCHS from 1968 to 2002;

Population data come from the decennial census and are interpolated between two adjacent censuses;

Age-specific mortality rates were calculated for each race-gender demographic population.
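As a rough illustration of how a rate for a non-census year could be put together from these two sources, the sketch below interpolates the population and divides deaths by it. The slide does not say which interpolation was used, so the linear form is an assumption, and the function names and all numbers are hypothetical.

```python
def interpolate_population(year, census_year_0, pop_0, census_year_1, pop_1):
    """Population in `year`, interpolated linearly between two adjacent censuses.

    Linear interpolation is an assumption; the slides only say "interpolated".
    """
    if not census_year_0 <= year <= census_year_1:
        raise ValueError("year must lie between the two census years")
    weight = (year - census_year_0) / (census_year_1 - census_year_0)
    return pop_0 + weight * (pop_1 - pop_0)

def death_rate(deaths, population):
    """Age-specific death rate: deaths divided by the population at risk."""
    return deaths / population

# Hypothetical counts for one age in one race-gender population.
population_1995 = interpolate_population(1995, 1990, 1000, 2000, 1200)
rate_1995 = death_rate(11, population_1995)
```

With these made-up inputs the interpolated population is 1100 and the rate is 11 deaths per 1100 people.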

Cont’d

There are 85 age-specific time series, for ages 1, …, 85, where the age category 85+ includes age 85 and above;

For each age, the time series runs from 1970 to 2001; 2002 data are available for comparison with the prediction result;

All 85 time series are grouped into 5-year age groups 1-5, 6-10, ..., 81-85+, a total of 17 groups;

Death rates at each age are rescaled by centering on the age-specific averages over years;

Residuals from the time series “in the middle” of each group are taken as the reference.
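The grouping and centering steps can be sketched as follows. Two readings are assumptions here: that "in the middle" means the central age of each 5-year group (for example age 33 in the group 31-35), and that centering means subtracting the age-specific mean of the log rates. The function names and the three rates are hypothetical.

```python
import math

def five_year_groups(first_age=1, last_age=85, width=5):
    """Split ages 1..85 into consecutive 5-year groups: 1-5, 6-10, ..., 81-85."""
    return [list(range(start, min(start + width, last_age + 1)))
            for start in range(first_age, last_age + 1, width)]

def reference_age(group):
    """Central age of a group, taken here as the reference series (assumption)."""
    return group[len(group) // 2]

def centered_log_rates(rates):
    """Log death rates for one age, minus their average over the years."""
    logs = [math.log(rate) for rate in rates]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

groups = five_year_groups()

# Hypothetical death rates for one age over three years.
hypothetical_rates = [0.0020, 0.0018, 0.0016]
centered = centered_log_rates(hypothetical_rates)
```

The centered series sums to zero by construction, which puts the series of all ages in a group on a comparable scale before they are pooled.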

Mortality age-patterns across four decades: 1970 – 2000: US National Vital Statistics

Age-specific time series for log-death rates

Log-death rates centered by rescaling from age-specific averages over years

Centered age-specific time series for log-death rates

Mortality forecasting procedure

Procedure cont’d

Fit of TS & histogram of residuals

Comparison for single age

Comparison of age groups 32-34 & 31-35

Combining more information improves the fit of the density curves

Empirical (solid) and estimated (dot) CDF

One-year-ahead predictive distribution

Predicted mortality curves from LC & SP models in 2002

Predicted mortality curves for age group 1-30

Predicted mortality curves for age group 31-50

Predicted mortality curves for age group 51-70

Predicted mortality curves for age group 71-85

Mean Square Error of prediction from Semiparametric model (SP) & Lee-Carter (LC)

MSE for total population

Age group   1-85   1-30   31-50   51-70   71-85
SP model    .104   .050   .015    .030    .009
LC model    .297   .078   .180    .029    .013

MSE for Female

Age group   1-85   1-30   31-50   51-70   71-85
SP model    .187   .121   .026    .032    .008
LC model    .619   .226   .341    .027    .025
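A comparison of this kind can be computed per age group as sketched below. The slides do not give the exact definition of the error measure. In the tables the four age-group values add up to the 1-85 value (to within about 0.003), which suggests a sum of squared errors, or a sum divided by a common constant, rather than a separate average within each group. The sketch therefore uses a plain sum of squared errors on the log scale, which is an assumption, and the function name and all numbers are hypothetical.

```python
def squared_error_by_group(observed_log, predicted_log, groups):
    """Sum of squared prediction errors on the log scale within each age group.

    observed_log and predicted_log map age -> log death rate;
    groups maps a label -> (first_age, last_age), both inclusive.
    """
    return {
        label: sum((observed_log[age] - predicted_log[age]) ** 2
                   for age in range(first_age, last_age + 1))
        for label, (first_age, last_age) in groups.items()
    }

# Hypothetical log death rates for four ages.
observed = {1: -7.0, 2: -7.5, 3: -6.0, 4: -5.0}
predicted = {1: -7.5, 2: -7.5, 3: -6.0, 4: -4.0}
toy_groups = {"1-2": (1, 2), "3-4": (3, 4), "1-4": (1, 4)}

errors = squared_error_by_group(observed, predicted, toy_groups)
```

Under this definition the value for the whole age range equals the sum of the values for the groups that partition it, as in the toy example.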

Semiparametric Time Series Estimate: Mortalities in Small Populations

[Figure, two panels: "Black females in IA, 1999" and "Black males in IA, 1999". x-axis: age; y-axis: log(death rate). Legend: TRUE, U75, L25, Estimate.]

Semiparametric Time Series Estimate: Mortalities in Small Populations

[Figure, two panels: "White females in DC, 1999" and "White males in DC, 1999". x-axis: age; y-axis: log(death rate). Legend: TRUE, U75, L25, Estimate.]

Conclusion

Historical data fitted by the time series semiparametric model can help when estimating mortality rates in small areas with insufficient observations;

Compared with the LC model, the semiparametric method reduces the overall MSE appreciably because it models the predictive probabilities better, using conditional distributions;

This is a non-Bayesian method. A Bayesian method would result in relatively large prediction intervals, so prediction further than one year ahead could apply.

Alternative ways to solve the problem of estimating mortalities for small areas

In addition to borrowing strength from historical data, alternatives include:

Borrow strength from national mortality data;

Borrow strength from geographic neighborhood data;

Borrow strength from other area data with similarities in cause of death.

Small area estimation by Bayesian methods: borrowing strength from national data

[Figure: "Mortality curve - black male, IA". x-axis: age; y-axis: log(q). Legend: Estimated, State observation, National observation.]
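The transcript does not describe the Bayesian model behind this figure. As a generic illustration of what borrowing strength from national data can mean, the sketch below uses a beta-binomial posterior mean that pulls a small-area death probability toward the national value. The model choice, the function name, the prior_strength parameter and all numbers are assumptions, not the presenters' method.

```python
def shrunken_death_probability(state_deaths, state_population,
                               national_q, prior_strength):
    """Posterior mean of a death probability under a Beta prior.

    The prior is centered on the national probability national_q and is worth
    prior_strength pseudo-observations (a hypothetical tuning parameter).
    """
    alpha = national_q * prior_strength
    beta = (1.0 - national_q) * prior_strength
    return (alpha + state_deaths) / (alpha + beta + state_population)

# Hypothetical small area: no deaths observed among 200 people at one age,
# while the national probability of death at that age is 0.01.
no_deaths = shrunken_death_probability(0, 200, 0.01, 800)
two_deaths = shrunken_death_probability(2, 200, 0.01, 800)
```

With no observed deaths the raw state estimate would be zero, whose logarithm is undefined on a log(q) plot; the shrunken estimate stays positive and moves toward the state data as the state population grows relative to prior_strength.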