Exploring the Shape of the Dose-Response Function
Outline
Traditional approach to dose-response analysis: the “step function”
Alternative: a “flexible” regression line (spline regression)
Examples: logistic, linear, and Cox regression
Example: Sleep-Disordered Breathing and Stroke
Study: the Sleep Heart Health Study
Data set: cross-sectional
Exposure variable: apnea-hypopnea index (AHI)
Dependent variable: self-reported stroke
Potential confounders: known stroke risk factors
Data set
Observations: N = 5,192
Self-reported stroke: N = 204
Distribution of the apnea-hypopnea index (AHI):
Mean   5th pct   25th pct   50th pct   75th pct   95th pct
8.9    0.2       1.4        4.5        11.3       34.1
Traditional Approach: Categorical Analysis
Categorization with dummy coding (AHI quartiles; Q1 is the reference)
AHI          Q2   Q3   Q4
0 - 1.4      0    0    0
1.5 - 4.5    1    0    0
4.6 - 11.3   0    1    0
> 11.3       0    0    1
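A minimal Python sketch of the dummy coding above. It assumes AHI is recorded to one decimal place, so the table's ranges can be written as half-open intervals at the quartile cutpoints 1.4, 4.5, and 11.3. The function name `quartile_dummies` is hypothetical.

```python
def quartile_dummies(ahi: float) -> tuple[int, int, int]:
    """Return the (Q2, Q3, Q4) indicators for one AHI value.

    Cutpoints are the AHI quartile boundaries in this data set
    (25th = 1.4, 50th = 4.5, 75th = 11.3). The lowest quartile
    (AHI <= 1.4) is the reference category, coded (0, 0, 0).
    """
    q2 = int(1.4 < ahi <= 4.5)
    q3 = int(4.5 < ahi <= 11.3)
    q4 = int(ahi > 11.3)
    return q2, q3, q4


for value in (0.8, 3.0, 7.2, 34.1):
    print(value, quartile_dummies(value))
```

Each subject gets at most one indicator set to 1, which is what lets each dummy coefficient be read as a contrast with the reference quartile.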
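Traditional Approach: Step Function
Model:
$\log \text{odds(stroke)} = \beta_1 + \beta_2 Q_2 + \beta_3 Q_3 + \beta_4 Q_4 + Z$, where Z denotes the terms for the potential confounders
Maximum likelihood estimates:
$\log \text{odds(stroke)} = -9.924 + 0.301\,Q_2 + 0.344\,Q_3 + 0.454\,Q_4 + Z$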
Adjusted Odds Ratios of Prevalent Stroke by Quartile of the Apnea-Hypopnea Index
AHI quartile   Adjusted OR
I              1.0 (ref.)
II             1.35 (0.84 - 2.18)
III            1.41 (0.88 - 2.26)
IV             1.57 (0.98 - 2.53)
Traditional Approach: Step Function
[Figure: adjusted OR (log scale, 0.1 to 10) by AHI quartile: Q1 1.0, Q2 1.35, Q3 1.41, Q4 1.57]
Traditional Approach: “Step Function”
$\log \text{odds(stroke)} = \beta_1 + \beta_2 Q_2 + \beta_3 Q_3 + \beta_4 Q_4 + Z$
AHI          Fitted model
0 - 1.4      $\log \text{odds(stroke)} = \beta_1 + Z$
1.5 - 4.5    $\log \text{odds(stroke)} = \beta_1 + \beta_2 + Z$
4.6 - 11.3   $\log \text{odds(stroke)} = \beta_1 + \beta_3 + Z$
> 11.3       $\log \text{odds(stroke)} = \beta_1 + \beta_4 + Z$
Traditional Approach: “Step Function”
$\log \text{odds(stroke)} = \beta_1 + \beta_2 Q_2 + \beta_3 Q_3 + \beta_4 Q_4 + Z$
AHI          Fitted model
0 - 1.4      $\log \text{odds(stroke)} = -9.924 + Z$
1.5 - 4.5    $\log \text{odds(stroke)} = -9.623 + Z$
4.6 - 11.3   $\log \text{odds(stroke)} = -9.580 + Z$
> 11.3       $\log \text{odds(stroke)} = -9.470 + Z$
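The arithmetic linking the estimates, the fitted step function, and the odds ratio table can be checked directly. This sketch leaves out the covariate terms Z: they are the same in every quartile, so they cancel when each quartile is compared with Q1. The constant and function names are illustrative.

```python
import math

# Reported maximum likelihood estimates; covariate terms (Z) omitted.
INTERCEPT = -9.924                                        # beta_1: reference quartile Q1
QUARTILE_COEFS = {"Q2": 0.301, "Q3": 0.344, "Q4": 0.454}  # beta_2, beta_3, beta_4


def step_log_odds(quartile: str) -> float:
    """Fitted log odds of stroke in a quartile, excluding the Z terms."""
    return INTERCEPT + QUARTILE_COEFS.get(quartile, 0.0)


def adjusted_or(quartile: str) -> float:
    """Odds ratio versus Q1: exp(beta_k), because Z and beta_1 cancel."""
    return math.exp(QUARTILE_COEFS.get(quartile, 0.0))


for q in ("Q1", "Q2", "Q3", "Q4"):
    print(q, round(step_log_odds(q), 3), round(adjusted_or(q), 2))
```

Rounded to two decimals, the odds ratios reproduce the table (1.0, 1.35, 1.41, 1.57).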
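Traditional Approach: Step Function
[Figure: fitted log odds (stroke) vs. AHI as a step function, with levels -9.924 + Z (AHI 0 to 1.4), -9.623 + Z (1.4 to 4.5), -9.580 + Z (4.5 to 11.3), and -9.470 + Z (above 11.3)]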
Step Function: Problems
Unrealistic assumption: a “step function.” We don’t actually believe it; our mind tries to draw an imaginary smooth line through the steps.
The choice of categories could influence the shape.
The test for trend is not a test for a monotonic dose-response.
Relies on statistical hypothesis testing.
Alternative: “Flexible” Regression Line
Spline Regression
Categorize: specify cutoff points (as in categorical analysis)
Fit the regression line in segments (as in categorical analysis)
Enforce continuity at the junctions, called knots (new)
EXAMPLE: Linear Spline Regression
[Figure: log odds (stroke) vs. AHI; cutoff points at 1.4, 4.5, and 11.3]
Linear Spline Regression
[Figure: log odds (stroke) vs. AHI; cutoff points at 1.4, 4.5, and 11.3]
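Linear Spline Regression
Fit two straight regression lines.
Ensure continuity at the knot (AHI = 1.4).
Method:
Define a new variable S:
S = 0, if AHI ≤ 1.4
S = AHI − 1.4, if AHI > 1.4
$\log \text{odds(stroke)} = \beta_0 + \beta_1\,\mathrm{AHI} + \beta_2 S + Z$
To the left of the knot, S = 0:
$\log \text{odds(stroke)} = \beta_0 + \beta_1\,\mathrm{AHI} + Z$
To the right of the knot, S = AHI − 1.4:
$\log \text{odds(stroke)} = \beta_0 + \beta_1\,\mathrm{AHI} + \beta_2(\mathrm{AHI} - 1.4) + Z = (\beta_0 - 1.4\,\beta_2) + (\beta_1 + \beta_2)\,\mathrm{AHI} + Z$
Different slopes: $\beta_1$ to the left of the knot, $\beta_1 + \beta_2$ to the right.
Identical predicted value at the knot (AHI = 1.4): both equations give $\beta_0 + 1.4\,\beta_1 + Z$.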
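The algebra above can be checked numerically. This sketch builds S from AHI and evaluates both forms of the right-hand line. The coefficients `b0`, `b1`, `b2` are hypothetical values chosen only for illustration, not estimates from the study, and `z` stands in for the confounder terms.

```python
KNOT = 1.4  # AHI value at which the two line segments meet


def spline_term(ahi: float, knot: float = KNOT) -> float:
    """S = 0 to the left of the knot and AHI - knot to the right."""
    return max(ahi - knot, 0.0)


def linear_spline_log_odds(ahi, b0, b1, b2, z=0.0):
    """beta0 + beta1*AHI + beta2*S + Z, with z standing in for the Z terms."""
    return b0 + b1 * ahi + b2 * spline_term(ahi) + z


# Hypothetical coefficients chosen only to illustrate the algebra.
b0, b1, b2 = -3.0, 0.20, -0.15

# Rearranged right-of-knot line: intercept beta0 - 1.4*beta2, slope beta1 + beta2.
right_intercept = b0 - KNOT * b2
right_slope = b1 + b2

at_knot_left = b0 + b1 * KNOT                         # left-hand line at the knot
at_knot_right = right_intercept + right_slope * KNOT  # right-hand line at the knot
print(at_knot_left, at_knot_right)
```

The two lines agree at AHI = 1.4 while their slopes (0.20 and 0.05 here) differ, which is exactly the continuity constraint that separates a spline from the step function.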
Linear Spline Regression
More Flexible Spline Regression
Quadratic spline: $\mathrm{AHI} + \mathrm{AHI}^2$
Cubic spline: $\mathrm{AHI} + \mathrm{AHI}^2 + \mathrm{AHI}^3$
Basic quadratic spline
Step #1: Determine the cutpoints (C1, C2, C3) on the exposure scale (4 categories).
These can be percentiles or any other values; that is, choose the values of C1, C2, and C3.
C1=?;
C2=?;
C3=?;
Step #2: Create the spline variables (EXP is the exposure).
S1 = EXP**2;
S2 = 0; S3 = 0; S4 = 0;
IF EXP > C1 THEN S2 = (EXP-C1)**2;
IF EXP > C2 THEN S3 = (EXP-C2)**2;
IF EXP > C3 THEN S4 = (EXP-C3)**2;
Step #3: Regress the dependent variable on EXP, S1, S2, S3, S4, and the covariates, and find the four regression equations, one per exposure category (together they form a continuous dose-response function).
Step #4: Compute and display the dose-response function.
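Steps #2 and #4 can be sketched in Python. `quadratic_spline_terms` mirrors the SAS assignments in Step #2. The cutpoints and coefficients are hypothetical, not estimates. The check at the end shows why the pieces join smoothly: each added term (EXP − C)² and its slope 2(EXP − C) are both zero at C. As a result, the curve has no jump and no kink at a knot.

```python
def quadratic_spline_terms(exp_value, cutpoints):
    """Return [EXP, S1, S2, S3, S4] as defined in Step #2."""
    c1, c2, c3 = cutpoints
    s1 = exp_value ** 2
    # Each (EXP - C)**2 term switches on only to the right of cutpoint C.
    s2 = (exp_value - c1) ** 2 if exp_value > c1 else 0.0
    s3 = (exp_value - c2) ** 2 if exp_value > c2 else 0.0
    s4 = (exp_value - c3) ** 2 if exp_value > c3 else 0.0
    return [exp_value, s1, s2, s3, s4]


def dose_response(exp_value, intercept, coefs, cutpoints):
    """Linear predictor: intercept + beta0*EXP + beta1*S1 + ... + beta4*S4."""
    terms = quadratic_spline_terms(exp_value, cutpoints)
    return intercept + sum(b * t for b, t in zip(coefs, terms))


# Hypothetical cutpoints and coefficients, for illustration only.
CUTS = (10.0, 20.0, 30.0)
INTERCEPT = -2.0
COEFS = [0.05, 0.002, -0.004, 0.003, -0.002]


def curve(x):
    return dose_response(x, INTERCEPT, COEFS, CUTS)


def one_sided_slopes(c, h=1e-6):
    """Finite-difference slopes just left and just right of point c."""
    return (curve(c) - curve(c - h)) / h, (curve(c + h) - curve(c)) / h


for c in CUTS:
    left, right = one_sided_slopes(c)
    print(f"knot {c}: slope left = {left:.4f}, slope right = {right:.4f}")
```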
Example: pack-years of smoking and CHD (EXP = pack-years)
C1=14;
C2=29;
C3=43;
S1 = EXP**2;
S2=0; S3=0; S4=0;
IF EXP > C1 THEN S2 = (EXP-C1)**2;
IF EXP > C2 THEN S3 = (EXP-C2)**2;
IF EXP > C3 THEN S4 = (EXP-C3)**2;
PROC LOGISTIC;
MODEL DIS = EXP S1 S2 S3 S4;
RUN;
Maximum Likelihood Estimates
Parameter   DF   Estimate    Symbol
Intercept   1    -1.7022     α
EXP         1    -0.0203     β0
S1          1     0.00252    β1
S2          1    -0.00265    β2
S3          1    -0.00047    β3
S4          1     0.000305   β4
$\log \text{odds(CHD)} = \alpha + \beta_0\,\mathrm{EXP} + \beta_1 S_1 + \beta_2 S_2 + \beta_3 S_3 + \beta_4 S_4$
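Four regression equations
EXP ≤ 14 (S2 = S3 = S4 = 0): $\log \text{odds(CHD)} = \alpha + \beta_0\,\mathrm{EXP} + \beta_1\,\mathrm{EXP}^2$
14 < EXP ≤ 29 (S3 = S4 = 0): $\log \text{odds(CHD)} = \alpha + \beta_0\,\mathrm{EXP} + \beta_1\,\mathrm{EXP}^2 + \beta_2(\mathrm{EXP}-14)^2$
29 < EXP ≤ 43 (S4 = 0): $\log \text{odds(CHD)} = \alpha + \beta_0\,\mathrm{EXP} + \beta_1\,\mathrm{EXP}^2 + \beta_2(\mathrm{EXP}-14)^2 + \beta_3(\mathrm{EXP}-29)^2$
EXP > 43: $\log \text{odds(CHD)} = \alpha + \beta_0\,\mathrm{EXP} + \beta_1\,\mathrm{EXP}^2 + \beta_2(\mathrm{EXP}-14)^2 + \beta_3(\mathrm{EXP}-29)^2 + \beta_4(\mathrm{EXP}-43)^2$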
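Step #4 for this example can be sketched by plugging the reported estimates into the fitted model. The MODEL statement contains only the spline terms, so these are the fitted log odds exactly as estimated, without covariates. One formula written with truncated squares covers all four segments. The constant and function names are illustrative.

```python
# Reported estimates from PROC LOGISTIC (MODEL DIS = EXP S1 S2 S3 S4).
ALPHA = -1.7022
BETA = (-0.0203, 0.00252, -0.00265, -0.00047, 0.000305)  # beta0 .. beta4
KNOTS = (14, 29, 43)  # C1, C2, C3 in pack-years


def chd_log_odds(pack_years: float) -> float:
    """Fitted log odds of CHD; the truncated squares pick the right segment."""
    terms = [pack_years, pack_years ** 2]                     # EXP, S1
    terms += [max(pack_years - c, 0.0) ** 2 for c in KNOTS]   # S2, S3, S4
    return ALPHA + sum(b * t for b, t in zip(BETA, terms))


for py in range(0, 151, 15):
    print(f"{py:>3} pack-years: log odds = {chd_log_odds(py):.3f}")
```

Plotting these values against pack-years gives the dose-response curve in the figure that follows.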
(Unrestricted) Quadratic Spline: Pack-years and CHD
[Figure: two panels of log odds (caseness) vs. pack-years, with x-axes from 0 to 150 and from 0 to 165]
Cubic Spline Regression: Log odds (stroke) vs. AHI
3 knots: 0.2, 4.5, 34.1
[Figure: log odds (stroke), about -4.5 to -2.5, vs. AHI (0 to 50), with a secondary axis from 0 to 900]
Cubic Spline Regression: Log odds (stroke) vs. AHI
4 knots: 0.2, 1.4, 11.3, 34.1
[Figure: log odds (stroke), about -5.0 to -2.5, vs. AHI (0 to 50), with a secondary axis from 0 to 1000]
Spline Regression: Applications
Regression model   Dependent variable   SAS procedure
Logistic           log odds (Y=1)       PROC LOGISTIC
Linear             mean Y               PROC REG
Cox                log (hazard)         PROC PHREG
All models are linear functions of the predictors.
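Because all three models are linear in the predictors, the same spline terms (EXP, S1, ..., S4) can enter any of them unchanged. Only the scale of the linear predictor differs. This minimal sketch, with a hypothetical function name, maps one linear predictor value `eta` to the quantity each model describes. For Cox regression it assumes `eta` excludes the baseline hazard, as in a proportional hazards model with no intercept.

```python
import math


def model_scale(eta: float, model: str) -> float:
    """Express a linear predictor eta on the scale each model describes.

    eta is what a spline dose-response function computes: coefficients
    times (EXP, S1, ..., S4) plus covariate terms and, except for Cox,
    an intercept.
    """
    if model == "linear":    # PROC REG: eta is the mean of Y itself
        return eta
    if model == "logistic":  # PROC LOGISTIC: eta is log odds(Y=1); return P(Y=1)
        return 1.0 / (1.0 + math.exp(-eta))
    if model == "cox":       # PROC PHREG: eta is log hazard relative to baseline; return HR
        return math.exp(eta)
    raise ValueError(f"unknown model: {model!r}")


for m in ("linear", "logistic", "cox"):
    print(m, model_scale(0.7, m))
```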
Spline Regression (within PROC REG)
Systolic BP vs. AHI: 3 knots: 0.1, 3.6, 29.1
[Figure: systolic BP (124 to 130) vs. AHI (0 to 50), with a secondary axis from 0 to 700]
Spline Regression (within PROC REG)
Systolic BP vs. AHI: 4 knots: 0.1, 1.1, 9.5, 29.1
[Figure: systolic BP (124 to 130) vs. AHI (0 to 50), with a secondary axis from 0 to 700]
Spline Regression (within PROC REG)
Systolic BP vs. AHI: 5 knots: 0.1, 1.1, 3.6, 9.5, 29.1
[Figure: systolic BP (124 to 130) vs. AHI (0 to 50), with a secondary axis from 0 to 700]
Spline Regression: Key Advantages
Less restrictive assumptions
More regional flexibility
Does not rely on statistical hypothesis testing
Not as sensitive to the choice of cutoff points
Visual inspection of the dose-response pattern
Might be used to guide the choice of categories for traditional categorical analysis
Spline Regression: Key Issues
Moderately sensitive to the number of knots (especially if only 3 are specified)
What do the “bumps and valleys” really mean? Interpretation is visual (subjective):
Consider the scale of the Y-axis
Consider the amount of data at the tail(s)
Consider using straight lines at the outermost segments
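The last point concerns the tails, where data are sparse. A common way to impose straight lines there is a restricted (natural) cubic spline. These slides do not describe this technique. Its basis combines truncated cubics so that the fitted curve is linear below the first knot and above the last. The sketch below uses the unscaled form of that basis. Purely as an illustration, it uses the four AHI knots from the cubic spline example. The function name and the coefficients are hypothetical.

```python
def restricted_cubic_basis(x: float, knots: list[float]) -> list[float]:
    """Return [x, X_1, ..., X_(k-2)] for sorted knots t_1 < ... < t_k (k >= 3).

    Each X_j combines truncated cubics so that the cubic and quadratic
    parts cancel beyond the last knot; below the first knot every X_j is 0.
    """
    k = len(knots)
    if k < 3:
        raise ValueError("need at least 3 knots")
    t_km1, t_k = knots[-2], knots[-1]
    d = t_k - t_km1

    def pos_cube(u: float) -> float:
        return max(u, 0.0) ** 3

    terms = [x]
    for t_j in knots[: k - 2]:
        terms.append(
            pos_cube(x - t_j)
            - pos_cube(x - t_km1) * (t_k - t_j) / d
            + pos_cube(x - t_k) * (t_km1 - t_j) / d
        )
    return terms


AHI_KNOTS = [0.2, 1.4, 11.3, 34.1]   # knots from the 4-knot example
COEFS = [0.5, 0.02, -0.03]           # hypothetical coefficients


def fitted(x: float) -> float:
    return sum(c * t for c, t in zip(COEFS, restricted_cubic_basis(x, AHI_KNOTS)))


# Second differences are ~0 where the curve is a straight line.
print(fitted(40) - 2 * fitted(50) + fitted(60))
```

With k knots this uses k − 1 slope parameters, so restricting the tails also frees degrees of freedom compared with an unrestricted cubic spline.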