General Session V, The Nitty Gritty of HMD · 2017-08-31
TRANSCRIPT
General Session V, The Nitty Gritty of HMD
Moderator:
R. Dale Hall, FSA, CERA, MAAA
Presenters: Magali Barbieri
The Human Mortality Database: Challenges and Methods
SOA Living to 100 Symposium, January 4‐6, 2017, Orlando, Session V
Magali Barbieri
HMD Associate Director
University of California, Berkeley, and French National Institute for Demographic Studies (INED)
Acknowledgement: This presentation is based on the work conducted by many members of the HMD team over the years, at both the University of California, Berkeley, and the Max Planck Institute for Demographic Research (MPIDR), Rostock.
www.mortality.org
What is in the HMD?
• Detailed historical life table series for 38 countries:
– Death counts and estimated population exposures (person‐years lived) at the finest detail possible
– Original estimates of age‐specific death rates and life tables in various formats (age x time)
– Cause‐specific information (forthcoming)
• All the raw data used to prepare the mortality series
• Extensive documentation (method protocol and background information)
Guiding principles
• Comparability
– Over time (from 1751 to 2014)
– Across countries (38 mostly high‐income)
• Accessibility (free and relatively painless)
• Flexibility (estimates in multiple formats)
• Reproducibility (all input data included, full documentation provided)
• Quality control (procedures to identify and correct errors in data and calculations)
Additional information in
HMD, Dec. 2016: 38 countries
Australia Finland Latvia Slovenia
Austria France Lithuania Spain
Belarus Germany Luxembourg Sweden
Belgium Greece Netherlands Switzerland
Bulgaria Hungary New Zealand Taiwan
Canada Iceland Norway Ukraine
Chile Ireland Poland United States
Czech Republic Israel Portugal United Kingdom
Denmark Italy Russia
Estonia Japan Slovakia
HMD series by country and time period
[Figure: timeline of HMD series by country, horizontal axis from 1750 to 2000. Legend: period life tables only; period and cohort life tables.]
Series shown: Sweden, France (Civilian), France (Total), Denmark, Iceland, Belgium, England and Wales (Civilian), England and Wales (Total), Norway, The Netherlands, Scotland, Italy, Switzerland, Finland, New Zealand (non-Maori), Spain, Northern Ireland, The United Kingdom, Canada, Australia, The United States, Portugal, Japan, Austria, New Zealand (Total), Slovakia, Ireland, Hungary, The Czech Republic, Bulgaria, New Zealand (Maori), West Germany, East Germany, Poland, Russia, Belarus, Ukraine, Latvia, Luxembourg, Lithuania, Estonia, Taiwan, Greece, Slovenia, Israel, Germany (Total), Chile.
Credit: Adrien Remund and Timothy Riffe.
The HMD ideal raw data
• Detailed vital statistics and population data with complete and consistent coverage, perfect quality, timely publication and freely available
1. Mortality data: death counts by year, sex, single year of age up to maximum age at death and year of birth, with timely registration
2. Birth data: live births by year, sex, and month, with timely registration
3. Population data: annual January 1st population estimates by sex and single year of age up to maximum age, with constant definition
Basic methods
• Life table calculations based on:
– Death counts by sex and Lexis triangle
– Exposure counts by sex and Lexis triangle
• Methodological steps with ideal raw data
1. Construct exposure counts by sex and Lexis triangle from births, deaths and annual population estimates
2. Compute death rates (deaths/exposure) by sex
3. Compute complete life tables from the death rates
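Step 2 above can be sketched in a few lines of Python. This is a minimal illustration, not the HMD code: the function name `period_death_rate` and the toy counts are hypothetical, and it assumes deaths and exposures are already available for the lower and upper Lexis triangles of one age and calendar year.

```python
def period_death_rate(d_lower, d_upper, e_lower, e_upper):
    """Death rate for one age-by-year cell: total deaths over total exposure."""
    exposure = e_lower + e_upper
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return (d_lower + d_upper) / exposure


# Toy numbers (hypothetical): deaths and person-years in the two triangles.
rate = period_death_rate(d_lower=12, d_upper=8, e_lower=980.0, e_upper=1020.0)
print(rate)
```

Summing the two triangles before dividing keeps numerator and denominator matched on the same age and time interval.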
Lexis diagram and Lexis triangles
[Figure: Lexis diagram with time (t) on one axis, marked at January 1st of successive years from 1/1/2011, and age (x) on the other, from age 0 to age 4. Diagonal bands show the 2011, 2012, 2013 and 2014 birth cohorts.]
Death rates computed from the ratio of deaths to population in Lexis triangles
[Figure: a Lexis square between ages X and X+1 and times t and t+1, bounded by the population on Jan. 1st and the population on Dec. 31st, with deaths/exposures shown separately for the lower and the upper Lexis triangle.]
Few country/years with perfect data
Country (series period) | Annual population estimates by single year of age up to maximum age | Death counts by Lexis triangle up to maximum age
Sweden (1751-2014) | Since 1992 | 1895-2011
Denmark (1835-2014) | Since 1976 | Since 1943
Norway (1846-2014) | Since 1846 | Since 1980
Finland (1878-2012) | Since 1995 | Since 1917
Iceland (1838-2013) | Since 1840 | Since 1981
Data challenges
• Availability
– Publication delays (e.g. CAN)
– Gaps in data series (e.g. missing deaths BEL 1914‐1918; CHL >2002)
– Lack of annual population estimates
• Details
– Single year, 5‐year or 10‐year age group mortality and population data rather than Lexis triangle
– Diversity of mortality data “shapes”
– Open age interval
• Definitions
– Live births
– De jure vs. de facto reference population
– Territorial changes
• Reliability
– Under‐registration (births, deaths, population)
– Immortals/phantoms
– Unknown age
– Age misstatement (attraction, overstatement)
A multi‐step process
• Gathering of raw data
• Input data quality checks (=> decision to publish)
• Formatting of input data for HMD processing
• Estimation of death counts by Lexis triangle
• Estimation of exposure counts by Lexis triangle
• Territorial adjustment
• Calculation of death rates by Lexis triangle
• Construction of complete and abridged life tables
• Verification (internal consistency)
• External validation
• Completion of country‐specific documentation
• Publication
HMD Methods
Full HMD Methods Protocol available online at:
http://www.mortality.org/Public/Docs/MethodsProtocol.pdf
Step 1. Processing of birth data
Birth counts are used for
1. Verification of deaths/population for first year of life
2. Estimating the size of individual cohorts from birth until their first census
3. Adjustment for non‐uniformity in births over months (estimation of exposure‐to‐risk)
Step 2. Reformatting of death counts
Variety of death count formats =>
• Proportionate redistribution of deaths of unknown age
• Estimation of death counts by Lexis triangle from 10‐year, 5‐year or single year age groups
• Redistribution of deaths in open age interval into Lexis triangles
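The first bullet, proportionate redistribution of deaths of unknown age, can be illustrated as follows. This is a sketch under simple assumptions: the function name `redistribute_unknown` and the counts are hypothetical, and deaths of unknown age are spread over ages in proportion to the deaths of known age within one year and sex.

```python
def redistribute_unknown(deaths_by_age, unknown):
    """Spread deaths of unknown age across ages in proportion to known deaths."""
    known_total = sum(deaths_by_age.values())
    if known_total <= 0:
        raise ValueError("need at least one death of known age")
    factor = 1 + unknown / known_total  # same inflation factor at every age
    return {age: d * factor for age, d in deaths_by_age.items()}


# Hypothetical counts: 100 deaths of known age, 5 of unknown age.
adjusted = redistribute_unknown({0: 50, 1: 10, 2: 40}, unknown=5)
print(adjusted)
```

The total after adjustment equals known plus unknown deaths, so no death is lost or double counted.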
A variety of shapes in the original mortality data
[Figure: shapes of the original mortality data plotted by age and time.]
Source: Tim Riffe.
Splitting deaths from age groups to single years of age
• Same method used to split death counts for all age groups below the open age interval
• Use of cubic splines fitted to the cumulative distribution of deaths within each calendar year
• Applies to any configuration of death counts but requires death counts for the first year of life (age 0) and for ages 1‐4 years
Using a spline function to redistribute deaths
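[Figure: fitted spline curve; horizontal axis from 0 to 120, vertical axis from 0.6 to 2.4 ×10^4.]
$$Y(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \alpha_3 x^3 + \beta_1 (x - k_1)^3 I(x > k_1) + \dots + \beta_n (x - k_n)^3 I(x > k_n)$$
where α0, …, α3 and β1, …, βn are coefficients to be estimated, k1, …, kn are the knots, I(·) is an indicator function, and Y(x) is the number of deaths cumulated from age 0 to x
Source: Vladimir Shkolnikov.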
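The idea of splitting grouped deaths with a spline on cumulative deaths can be sketched in plain Python. The sketch swaps in a natural cubic spline (zero curvature at both ends) for simplicity; the Methods Protocol specifies its own constraints, so this is an assumption, not the HMD fit. The names `natural_cubic_spline` and `split_deaths` and the counts are hypothetical, and the sketch does not guard against negative single-year values.

```python
import bisect


def natural_cubic_spline(xs, ys):
    """Return S(x), a natural cubic spline through the points (xs, ys)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Right-hand side of the tridiagonal system for the curvature terms c[i].
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    # Forward sweep, then back substitution; c[0] = c[n] = 0 (natural ends).
    diag, mu, z = [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]
    c = [0.0] * (n + 1)
    for j in range(n - 1, 0, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
    b = [(ys[i + 1] - ys[i]) / h[i] - h[i] * (c[i + 1] + 2 * c[i]) / 3 for i in range(n)]
    d = [(c[i + 1] - c[i]) / (3 * h[i]) for i in range(n)]

    def spline(x):
        i = max(0, min(n - 1, bisect.bisect_right(xs, x) - 1))
        dx = x - xs[i]
        return ys[i] + b[i] * dx + c[i] * dx ** 2 + d[i] * dx ** 3

    return spline


def split_deaths(bounds, group_deaths):
    """Split deaths by age group into single years via a spline on cumulative deaths.

    bounds: integer age boundaries, e.g. [0, 1, 5, 10]; group_deaths: one count per group.
    """
    cumulative = [0.0]
    for deaths in group_deaths:
        cumulative.append(cumulative[-1] + deaths)
    y = natural_cubic_spline(bounds, cumulative)
    return [y(x + 1) - y(x) for x in range(bounds[0], bounds[-1])]


# Hypothetical counts for ages 0, 1-4, 5-9, 10-14 and 15-19.
single = split_deaths([0, 1, 5, 10, 15, 20], [300, 80, 40, 30, 60])
print([round(v, 1) for v in single])
```

Because the spline passes through every cumulative total, the single-year values always add back up to the original group counts.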
Allocating deaths in single years of age to Lexis triangles (1)
• Constraints
1. Deaths at age 0 concentrated in the lower triangle
2. At any age, the distribution of deaths across the 2 Lexis triangles is affected by the relative size of the two cohorts (esp. relevant when rapid changes in cohort size due to marked discontinuities in the birth series – e.g. wars)
Allocating deaths in single years of age to Lexis triangles (2)
• Approach: Regression analysis (sep. by sex) based on data for ages 0‐104 years from three countries fitted by weighted least squares
– Sweden (1901‐1999)
– Japan (1950‐1998)
– France (1907‐1997)
Allocating deaths in single years of age to Lexis triangles (3)
Model
• Dependent variable:
– Proportion of deaths in the lower Lexis triangle
• Explanatory variables:
– Age x
– Proportion of births associated with the lower triangle
– Year 1918 and year 1919 (Spanish influenza => more deaths in July‐Dec. 1918 and Jan‐June 1919)
– Level of IMR (proxy for mortality trends)
– Interaction IMR and Age = 1
– IMR < 0.01 for Age = 0
Allocating deaths in single years of age to Lexis triangles (4)
Formula for men, with terms for:
– Share of births in the lower triangle
– Influenza epidemics
– Mortality level (as measured by infant mortality)
– Interaction with x=0
– Interaction with x=1
– Adjustment for very low levels of mortality
Source: Wilmoth et al. 2012, HMD MP V6, p.43.
Source: Wilmoth et al. 2012, HMD MP V6, p.44.
Redistributing deaths in the open age interval into Lexis triangles
• Kannisto method to model cohort mortality at older ages
Intuitive explanation: we use information on cohort survival for the 20 single years of age below the open age interval to infer the distribution of deaths inside the open age interval using the Kannisto model (e.g. if the open age interval is 100+ years, we look at cohort mortality at ages 80‐99 years)
Step 3. Territorial adjustment
• Needed to reconcile numerator and denominator when there is a change in national boundaries or in definition (e.g. from de jure to de facto)
• Adjustment carried out after producing the death and exposure counts by Lexis triangle but before calculating the death rates
Territorial adjustment for changes in definition
Trends in the official population estimates (as of December 31st) by sex, Poland, 1960‐2014
Source: Tymicki et al., 2015, cited by Domantas Jasilionis, in B&D file for POL.
Figure annotations: post‐censal estimates (1988 Census); official post‐censal estimates (2002 Census); reconstructed intercensal estimates (2011 Census).
Step 4. Adjustment to raw population data
Methods used to obtain mid‐year annual population estimates by single year of age and sex:
• Linear interpolation
• Intercensal survival
• Extinct cohorts
• Survival ratios
=> Depends on the age group
Rules
• The extinct cohort method is used for all cohorts considered extinct
• The survivor ratio method is used for all almost‐extinct cohorts (= at least age 90 at end of observation period but not yet extinct)
• The intercensal survival method is used for all other cohorts (i.e. same as for ages lower than 80 years)
Methods for estimating population at ages 80+ years
A = Intercensal estimates; B = Extinct cohort; C = Survivor ratio
Source: HMD Methods Protocol, V6.
Population counts
Below age 80
• Redistribution of population of unknown age
• If January 1st population estimates by single year of age not available => adjustment to Jan. 1st of other annual estimates by linear interpolation
• If no reliable annual population estimates available, we calculate our own intercensal estimates from census and vital statistics data (cohort component method)
1. Linear interpolation
• Used to compute January 1st population estimates when the period between two population estimates (or between a population estimate and a census count) is one year or less
• Simple linear interpolation to derive the January 1st population estimate (separately for each age and sex)
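The linear interpolation step can be sketched as below, for one age and sex. The function name `interpolate_to_jan1` and the counts are hypothetical, and the sketch assumes the two reference dates bracket the target January 1st.

```python
from datetime import date


def interpolate_to_jan1(date_a, pop_a, date_b, pop_b, year):
    """Linearly interpolate two dated population counts to January 1st of `year`."""
    target = date(year, 1, 1)
    if not (date_a < date_b and date_a <= target <= date_b):
        raise ValueError("January 1st must lie between the two reference dates")
    weight = (target - date_a).days / (date_b - date_a).days
    return pop_a + weight * (pop_b - pop_a)


# Hypothetical mid-year estimates for one age and sex.
estimate = interpolate_to_jan1(date(2010, 7, 1), 1000.0, date(2011, 7, 1), 1100.0, 2011)
print(round(estimate, 2))
```

Weighting by days rather than by half-years keeps the result correct when the two estimates are not exactly six months either side of January 1st.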
2. Intercensal survival methods
• Used to compute January 1st population estimates when the period between two population estimates (or between a population estimate and a census count) is more than one year (e.g. between two census counts)
• Two separate methods
1. One for pre‐existing cohorts
2. One for new cohorts (born during the interval)
2a. Intercensal survival method
An example for pre-existing cohorts
2b. Intercensal survival method for new cohorts
• Same idea as for existing cohorts, but we start from the births rather than from the population
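For a pre‐existing cohort (2a), the intercensal idea can be sketched as follows. This is a simplified illustration, not the protocol's formulas: it assumes the two counts fall exactly N years apart on the same calendar date, that cohort deaths are known for each intervening year, and that the gap between the expected and the observed second count (migration plus error) is spread evenly over time. The function name `intercensal_cohort` and the numbers are hypothetical.

```python
def intercensal_cohort(count_start, count_end, deaths_by_year):
    """Estimate a cohort's size at each anniversary between two counts."""
    n_years = len(deaths_by_year)
    expected_end = count_start - sum(deaths_by_year)
    closure_error = count_end - expected_end  # migration plus measurement error
    estimates, survivors = [], count_start
    for k, deaths in enumerate(deaths_by_year, start=1):
        survivors -= deaths
        # Add the share of the closure error accumulated after k of n_years.
        estimates.append(survivors + closure_error * k / n_years)
    return estimates


# Hypothetical cohort: 1000 at the first count, 940 three years later.
series = intercensal_cohort(1000, 940, [10, 20, 10])
print([round(v, 2) for v in series])
```

By construction the last estimate equals the second count, so the series joins both counts without a jump. For new cohorts (2b) the same logic would start from births instead of `count_start`.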
Post‐censal versus inter‐censal population estimates
Some countries never substitute inter‐censal estimates for previous post‐censal estimates when a new census becomes available (e.g. Germany, Italy, the Czech Republic…)
[Figure: Census vs. post‐censal estimates in Bulgaria]
Source: Dmitri Jdanov (B&D for Bulgaria, 2016)
Estimating population at ages 80+
Motivations
1. Lack of detailed population data (open age interval)
2. Age overstatement in population data
Approach
Re‐estimate population size and distribution from cohort death counts
Methods to estimate population at ages 80+ years
1. Intercensal estimates for non‐extinct cohorts aged 80‐89 by end of observation period (except for N. European countries)
2. Survivor ratio (cohorts aged 90+ by end of observation period)
3. Extinct cohort methods
Extinct cohort method
• Cohort considered extinct by comparison with survival curve in prior cohorts
• For cohorts considered as extinct, populations are estimated by reverse survival of deaths in each cohort.
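Reverse survival can be sketched as a running sum of deaths from the oldest age backwards. The sketch simplifies the bookkeeping: it assumes one death count per year of the cohort's life from some starting anniversary until the last death, and the function name `extinct_cohort_population` and the counts are hypothetical.

```python
def extinct_cohort_population(cohort_deaths):
    """Population alive at the start of each year = all later deaths in the cohort."""
    populations, alive = [], 0
    for deaths in reversed(cohort_deaths):
        alive += deaths
        populations.append(alive)
    populations.reverse()
    return populations


# Hypothetical cohort deaths in four successive years, after which no one survives.
print(extinct_cohort_population([30, 20, 10, 5]))
```

Since every member of an extinct cohort has died, the population at any point equals the deaths still to come, which avoids relying on population counts affected by age overstatement.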
Survival ratio method
• Survival ratio R :with
• When mortality is stable:• Adjustment when R changes smoothly:
such that
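The equations on this slide did not survive in the transcript. The sketch below follows the usual definition of the survivor ratio, R = P(x, t) / P(x − k, t − k), which implies P(x, t) = R / (1 − R) × D, where D is the number of cohort deaths over the last k years. That reading is an assumption about what the slide showed. The function name, the scaling constant `c` for the adjustment, and the numbers are hypothetical.

```python
def survivor_ratio_population(deaths_last_k_years, prev_at_x, prev_at_x_minus_k, c=1.0):
    """Estimate a nearly extinct cohort's current size from a survivor ratio.

    prev_at_x / prev_at_x_minus_k: sizes of earlier cohorts at the same age x
    and at age x - k, pooled to estimate the ratio under stable mortality.
    c: scaling constant for the case where the ratio drifts over time.
    """
    ratio = c * sum(prev_at_x) / sum(prev_at_x_minus_k)
    if not 0 <= ratio < 1:
        raise ValueError("survivor ratio must lie in [0, 1)")
    return ratio / (1 - ratio) * deaths_last_k_years


# Hypothetical: earlier cohorts kept 20% of their members over k years,
# and the current cohort recorded 80 deaths over the same span.
print(survivor_ratio_population(80, [20, 22, 18], [100, 110, 90]))
```

In this toy case the estimate is 20 survivors, consistent with a cohort of 100 members k years earlier that lost 80 of them.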
Step 5. Computing exposure‐to‐risk
• Calculated from the annual Jan. 1st population estimates
• Small correction reflecting the timing of deaths during the interval
• Adjustment for cohort size using birth‐by‐month data if available
Ex (exposure) ratios from one year to the next, France
Source: Tim Riffe.
Map of mortality deviations
Source of the problem
Births by month in France from 1912 to 1921
Source: Tim Riffe.
Calculation of the exposure‐to‐risk
• Start from annual population estimates
• Adjust for cohort size using births by month information
• If no birth‐by‐month data, assumption of uniform distribution of births within calendar year:
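The formula that followed the last bullet is missing from the transcript. Under the uniform assumption, the exposure expressions usually quoted from the HMD Methods Protocol are E_L = ½ P(x, t+1) + ⅓ D_L for the lower triangle and E_U = ½ P(x, t) − ⅓ D_U for the upper triangle; treat that as an assumption about the slide. The function name `triangle_exposures` and the counts are hypothetical.

```python
def triangle_exposures(pop_jan1, pop_next_jan1, d_lower, d_upper):
    """Person-years in the lower and upper Lexis triangles of one age-year cell.

    pop_jan1: population aged x on Jan. 1st of year t.
    pop_next_jan1: population aged x on Jan. 1st of year t+1.
    """
    # Survivors to year end spent half a year on average in the lower triangle;
    # those who died there lived one third of a year on average.
    e_lower = 0.5 * pop_next_jan1 + d_lower / 3
    # People present on Jan. 1st spend half a year on average in the upper
    # triangle; those who die there lose one third of a year on average.
    e_upper = 0.5 * pop_jan1 - d_upper / 3
    return e_lower, e_upper


# Hypothetical counts for one age and calendar year.
e_lower, e_upper = triangle_exposures(1000, 1010, d_lower=6, d_upper=9)
print(e_lower, e_upper)
```

The death terms are the small timing correction mentioned above; with birth‐by‐month data the fixed fractions would be replaced by coefficients computed from the distribution of birthdays.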
Using birth‐by‐month data to compute exposures more precisely
where
and
with the coefficients s1, s2, u1 and u2 calculated using the distribution of birthdays within annual cohort:
Estimations for period rates (using simple rules of algebra)
Estimating exposures with birth‐by‐month data
Estimations for cohort rates (using simple rules of calculus)
And exposure estimated as:
Where zL and zU are calculated using the distribution of birthdays within annual cohort:
Step 6. Computing the death rates
1. Ratio of death counts to exposure‐to‐risk estimates in matched intervals of age and time (Lexis triangles)
2. Smoothing of period rates at older ages by fitting a logistic function with asymptote at 1 (to account for inherent randomness of mortality at older ages) (Kannisto model)
Fluctuating mortality rates at higher ages
Source: HMD.
2. Smoothing of life table death rates
• Smoothing of central death rates above age Y using Kannisto’s parametric model
• Age Y chosen as the lowest age with at most 100 male or female deaths and so that 80 <= Y <= 95
Logistic curve of death rates with asymptote at one to estimate the instantaneous death rate function μx, with a >= 0 and b >= 0
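The two rules above can be sketched as follows. The slide's equation is missing from the transcript; the sketch uses the common form of the Kannisto model, μx = a·exp(b(x − 80)) / (1 + a·exp(b(x − 80))), which is an assumption here. The search for Y starting at age 80 is one reading of the rule, and the function names and death counts are hypothetical. Fitting a and b to data is not shown.

```python
import math


def kannisto_mu(x, a, b):
    """Logistic hazard with asymptote at one, with age measured from 80."""
    if a < 0 or b < 0:
        raise ValueError("a and b must be non-negative")
    z = a * math.exp(b * (x - 80))
    return z / (1 + z)


def choose_smoothing_age(male_deaths, female_deaths, lowest=80, highest=95):
    """Lowest age in [lowest, highest] where male or female deaths are at most 100."""
    for age in range(lowest, highest + 1):
        if min(male_deaths[age], female_deaths[age]) <= 100:
            return age
    return highest  # rule is capped at the upper bound


# Hypothetical death counts by age, declining linearly with age.
male = {age: 600 - 40 * (age - 80) for age in range(80, 96)}
female = {age: 900 - 40 * (age - 80) for age in range(80, 96)}
print(choose_smoothing_age(male, female), kannisto_mu(100, a=0.1, b=0.1))
```

The logistic form keeps the fitted rate below one at every age, which is what removes the erratic swings seen in raw rates where deaths are few.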
Fitting a mortality curve at higher ages
Source: Tim Riffe.
Validation
Source: Tim Riffe.
Sweden, 2000 Females
Step 7. Construction of complete and abridged life tables
• Start with the death rates
• Estimate a0 using Andreev and Kingkade’s formula (taking the level of mortality into account)
• For the open age interval:
• For all other ages:
Re‐estimation of a0 using Andreev and Kingkade, 2015
Source: Tim Riffe.
Life tables
• Period rates converted to probabilities of death using standard demographic method
• Cohort probabilities of death computed directly from raw data (related to cohort death rates in consistent way)
• Complete life tables by sex constructed using standard demographic methods
• Abridged and both‐sex combined life tables constructed from the complete life tables by sex (for consistency of e(x) and other quantities)
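A period life table built from death rates can be sketched with the standard relations. The assumptions are stated plainly: ax = 0.5 at all ages except age 0 and the open interval, ax = 1/mx in the open interval, and qx = mx / (1 + (1 − ax) mx). The value of a0 is passed in rather than computed, since the Andreev and Kingkade coefficients are not given in this presentation. The function name `life_table` and the rates are hypothetical.

```python
def life_table(mx, a0, radix=100000):
    """Complete period life table from death rates; the last age is open-ended."""
    if len(mx) < 2 or mx[-1] <= 0:
        raise ValueError("need at least two ages and a positive final rate")
    n = len(mx)
    ax = [a0] + [0.5] * (n - 2) + [1 / mx[-1]]
    # Convert rates to probabilities; everyone dies in the open interval.
    qx = [m / (1 + (1 - a) * m) for m, a in zip(mx[:-1], ax[:-1])] + [1.0]
    lx = [float(radix)]
    for q in qx[:-1]:
        lx.append(lx[-1] * (1 - q))
    dx = [l * q for l, q in zip(lx, qx)]
    # Person-years lived: survivors' full year minus the time lost by the dying.
    Lx = [l - (1 - a) * d for l, a, d in zip(lx[:-1], ax[:-1], dx[:-1])]
    Lx.append(lx[-1] * ax[-1])
    Tx, total = [], 0.0
    for person_years in reversed(Lx):
        total += person_years
        Tx.append(total)
    Tx.reverse()
    ex = [t / l for t, l in zip(Tx, lx)]
    return {"ax": ax, "qx": qx, "lx": lx, "dx": dx, "Lx": Lx, "Tx": Tx, "ex": ex}


# Hypothetical three-age table: ages 0, 1 and an open interval 2+.
table = life_table([0.005, 0.001, 0.5], a0=0.1)
print(round(table["ex"][0], 3))
```

Dividing dx by Lx at any age returns the input death rate, which is a quick internal consistency check on the construction.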
Many thanks to our collaborators, users, host institutions…
• To our friends and colleagues around the world who have helped us to build the database
• To the many users of the data who make our work worthwhile
• To the Max Planck Society in Germany, the Department of Demography at UC Berkeley, the French national institute for demographic studies, the US National Institutes of Health, and the Berkeley Center for the Economics and Demography of Aging for their continuing support
• To the U.S. National Institutes of Health for its long term support of the HMD UCB team
… and to our private sponsors
Thank you to
– the Society of Actuaries
– the Canadian Institute of Actuaries
– Hannover‐Re
for their financial contributions to the HMD
As well as to
– the UK Institute and Faculty of Actuaries
– SCOR‐France
– AXA‐France
– Hymans‐Robertson
for pledging forthcoming support
Disclaimer
The Human Mortality Database team is solely responsible for the content of this presentation, which does not necessarily represent the official views of the National Institutes of Health and other funders.