![Page 1: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/1.jpg)
Modern regression and Mortality
Doug NychkaNational Center for Atmospheric Research
www.cgd.ucar.edu/~nychka
• Statistical models
• GLM models
• Flexible regression
• Combining GLM and splines.
1
![Page 2: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/2.jpg)
Statistical tools for the analysis of Mortalityrelated to air pollution and temperature.
The goal of this talk is to give a gentle introductionto the models used by Zeger, Dominici et al. SPHJohns Hopkins for explaning mortality based on envi-ronmental factors.
The Johns Hopkins group is focused on the effect ofhigh levels of particulates (PM10) on short term, non-accidental mortality rates.
This leverages the National Morbidity, Mortality andAir Pollution Study Database (NMMAPS). (Welty,Peng and McDermott)
2
![Page 3: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/3.jpg)
Schematic of PM10 model:
Mortality in an urban center =
3
![Page 4: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/4.jpg)
Schematic of PM10 model:
Mortality in an urban center =
Calendar effects + Age categories
3
![Page 5: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/5.jpg)
Schematic of PM10 model:
Mortality in an urban center =
Calendar effects + Age categories
+ seasonal effects
3
![Page 6: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/6.jpg)
Schematic of PM10 model:
Mortality in an urban center =
Calendar effects + Age categories
+ seasonal effects
+ particulates + temperature/humidity stress +
3
![Page 7: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/7.jpg)
Schematic of PM10 model:
Mortality in an urban center =
Calendar effects + Age categories
+ seasonal effects
+ particulates + temperature/humidity stress +
+ interaction of temperture with age
3
![Page 8: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/8.jpg)
Schematic of PM10 model:
Mortality in an urban center =
Calendar effects + Age categories
+ seasonal effects
+ particulates + temperature/humidity stress +
+ interaction of temperture with age
+ random component.
3
![Page 9: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/9.jpg)
Schematic of PM10 model:
Mortality in an urban center =
Calendar effects + Age categories
+ seasonal effects
+ particulates + temperature/humidity stress +
+ interaction of temperture with age
+ random component.
This work is a useful motivation for understanding
Poisson Regression
Flexible regression models
3
![Page 10: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/10.jpg)
Some data atributes:
• A response depends on other variables: (Mortalitydepends ontemperature and PM.)
Yk = f (Xk) + noise
Here we interpret the noise as having a mean ofzero.
• The explanatory variables can be eithercatgorical (age category, day-of-week)or continuous (daily average temperature, PM)
• The response may be discrete or NonGaussian
4
![Page 11: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/11.jpg)
Issues for data analysis
How does one specify all of these components in a simple andunambigous fashion?
How does one estimate a multivariate functional relationship?
5
![Page 12: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/12.jpg)
Formula’s
Some variables:
mort: Daily, nonaccidental mortality for three age cat-gories.time: Day
tmp: Daily average temperaturetmp3: Daily average temperature for past three days
PM: Daily Particulate (< 10µg)
AgeCat: Three age categories <65, 65-74, >75
6
![Page 13: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/13.jpg)
Formula for a linear regression of mortality on 3-day temper-ature
mort ~ tmp3
3-day temperature and PM
mort ~ tmp + PM
Dependence on the age categories
mort ~ AgeCat + tmp + PM
These are additive models because the variables ap-pear by themselves and the contribution of each canbe inferred from their individual values.
7
![Page 14: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/14.jpg)
Interactive dependence on the age categories
mort ~ AgeCat+ tmp+ AgeCat:tmp
or justmort ~ AgeCat*tmp
Three different slopes and intercepts, the : is an in-teraction. * includes all possible terms. There areseveral conventions that make this not as transparentin the fit.
This is more interpretable.
mort ~ AgeCat + AgeCat:tmp -1
Coefficients:
AgeCat1 AgeCat2 AgeCat3 tmp:AgeCat1 tmp:AgeCat2 tmp:AgeCat3
66.804 44.522 121.158 -0.156 -0.139 -0.473
8
![Page 15: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/15.jpg)
All models are wrong, but some are useful ...
9
![Page 16: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/16.jpg)
All models are wrong, but some are useful ...
but some models are more wrong than others!
9
![Page 17: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/17.jpg)
All models are wrong, but some are useful ...
but some models are more wrong than others!
Changing variance is missed.Absolute residuals from 3 slopes model.
9
![Page 18: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/18.jpg)
Sublties of the temperature response are missed!
10
![Page 19: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/19.jpg)
A Generalized Linear Model (GLM)
Assume that mortality (Mt) is approximately Pois-son distributed
Mean model with log link function
Mt ∼ Poisson(µt)
E(Mt) = µt
model the log of the mean.
log(µt) = f (covariates)
11
![Page 20: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/20.jpg)
Variance model with over-dispersion
The Poission distribution has a variance equal to themean:V ar(Mt) = µt. This allows modeling of both the meanand variance simultaneously.
Often the Poission does not include enough variabilityso an additional parameter is included:
V ar(Mt) = φµt
φ the over-dispersion parameter.
12
![Page 21: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/21.jpg)
Example of a Poisson Response
13
![Page 22: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/22.jpg)
GLM fit to mortality
First consider the linear model but using the Pois-sion regression. log(µt) has three different slopes as afunction of temperature based on the age categories.
In R-code:
glm( mort~ AgeCat*tmp3, family=quasipoisson)
Results in residuals that better fit model assumptions.
14
![Page 23: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/23.jpg)
Flexible Regression
Response of mortality is not linear, not clear whatfunctional form to choose.
One strategy is to represent the unknown function asa linear combination of basis functions
f (temp) =∑j=1
Mψj(temp)βj
Splines: Local basis functions that allow one to con-trol the flexibility of the shape of f.
15
![Page 24: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/24.jpg)
Three factors that control splines:
• number of knots
• location of knots (usually equally spaced)
• order of fit (usually cubic)
A cubic B-spline basis for temperature (6 knots)
Functions are piecewise cubic in between knots.
16
![Page 25: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/25.jpg)
Controlling the resolution
4 knots
12 knots
17
![Page 26: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/26.jpg)
Applying the spline model to temperature
The simplest model is now
log(µt) = AgeCat + f (tmp3) = AgeCat +M∑j=1
ψj(tmp3t)βj
where ψj are the B-splines.
No interaction here between age category and thetemperature response.
An extension is to have a distinct response to temper-ature for each age category. This gives three seperateB-spline curves with different coefficients.
18
![Page 27: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/27.jpg)
Results with 4 knots
Relative Risk estimates:
19
![Page 28: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/28.jpg)
Results with 6 knots
Relative Risk estimates:
20
![Page 29: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/29.jpg)
Issues of inference
Selecting the amount of curve flexibilityThere are information criteria and cross-validation tech-niques to do this.
Uncertainty in estimatesParametric bootstrapping, fitting synthetic data setsgenerated from the fitted model gives a useful mea-sure of error.
Bayesian methods can also give an idea of the uncer-tainty of the estimated relationships.
21
![Page 30: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/30.jpg)
This talk is ending too soon!
Standardized residuals against time:
22
![Page 31: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/31.jpg)
This talk is ending too soon!
Standardized residuals against time:
Use an additive model:
log(µt) = f1(tmp3t) + f2(t)
f1 and f2 are both modeled using B-splines
22
![Page 32: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/32.jpg)
Additive curves for time trends
23
![Page 33: Modern regression and Mortality · high levels of particulates (PM10) on short term, non-accidental mortality rates. This leverages the National Morbidity, Mortality and Air Pollution](https://reader033.vdocument.in/reader033/viewer/2022050415/5f8b8ce9ec29657a0549ebc0/html5/thumbnails/33.jpg)
Remarks
• There are many new statistical tools to discernstructure in complex data sets.
• Inference can be formalized by Bayesian methods
• One challenge is to combine models across cities.
24