
Logistic Regression III: Advanced topics

Conditional Logistic Regression for Matched Data

Recall: Matching
Matching can control for extraneous sources of variability and increase the power of a statistical test.
Match M controls to each case based on potential confounders, such as age and gender.
If the data are matched, you must account for the matching in the statistical analysis!

Recall: Agresti example, diabetes and MI
Match each MI case to a control without MI, based on age and gender. Ask about history of diabetes to find out whether diabetes increases the risk of MI.

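$$\text{odds}(\text{favors case} \mid \text{discordant pair}) = \frac{P(\text{favors case} \mid \text{discordant pair})}{P(\text{favors control} \mid \text{discordant pair})}$$
This is estimated by the number of discordant pairs in which only the case has diabetes, divided by the number in which only the control has diabetes.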

Conditional Logistic Regression

The Conditional Likelihood: each discordant stratum (rather than each individual) gets 1 term in the likelihood.
Note: the marginal probability of disease may differ in each age-gender stratum, but we assume that the (multiplicative) increase in disease risk due to exposure is constant across strata.
For each stratum, we add to the likelihood the CONDITIONAL probability that the case got the disease and the control did not, given that we have a case-control pair.

Recall probability terms:

The conditional likelihood=
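A sketch of the standard derivation for 1:1 matching with a binary exposure. The symbols are notation introduced here, not taken from the slides: \alpha_i is the stratum-specific intercept, \beta the log odds ratio, D and E denote disease and exposure, and n_{10}, n_{01} count the discordant pairs in which only the case, or only the control, is exposed.

```latex
% Probability terms within stratum i (one matched pair):
\[
P(D \mid E) = \frac{e^{\alpha_i + \beta}}{1 + e^{\alpha_i + \beta}}, \qquad
P(D \mid \bar{E}) = \frac{e^{\alpha_i}}{1 + e^{\alpha_i}}
\]

% Contribution of a discordant pair with exposed case and unexposed control;
% the stratum intercept \alpha_i cancels:
\[
\frac{P(D \mid E)\, P(\bar{D} \mid \bar{E})}
     {P(D \mid E)\, P(\bar{D} \mid \bar{E}) + P(D \mid \bar{E})\, P(\bar{D} \mid E)}
= \frac{e^{\beta}}{1 + e^{\beta}}
\]

% A discordant pair with unexposed case and exposed control contributes 1 / (1 + e^{\beta}).
% Concordant pairs contribute a constant, so the conditional likelihood is
\[
L(\beta) = \left( \frac{e^{\beta}}{1 + e^{\beta}} \right)^{n_{10}}
           \left( \frac{1}{1 + e^{\beta}} \right)^{n_{01}},
\qquad
\hat{\beta} = \ln \frac{n_{10}}{n_{01}}
\]
```

Because \alpha_i drops out, the marginal disease probability is free to differ from stratum to stratum, which is the point of conditioning.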


Example: MI and diabetes


In SAS:
proc logistic data = YourData;
    model MI (event = "Yes") = diabetes;  /* model the probability that MI = "Yes" */
    strata PairID;                        /* one stratum per matched pair */
run;
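To make concrete what a conditional fit estimates in the simplest case (1:1 matching, one binary exposure), here is a minimal Python sketch using only the standard library. It is not the SAS algorithm. The counts are HYPOTHETICAL, and the function names are illustrative assumptions. Only discordant pairs enter the likelihood, and the maximum lands at the ratio of the two discordant counts.

```python
import math

def conditional_loglik(beta, n_case_only, n_control_only):
    """Conditional log-likelihood for 1:1 matched pairs with a binary exposure.

    Each pair where only the case is exposed contributes log(e^beta / (1 + e^beta));
    each pair where only the control is exposed contributes log(1 / (1 + e^beta)).
    """
    n_discordant = n_case_only + n_control_only
    return n_case_only * beta - n_discordant * math.log1p(math.exp(beta))

def fit_beta(n_case_only, n_control_only, lo=-10.0, hi=10.0, tol=1e-10):
    """Find the maximizer by bisection on the score (the derivative in beta)."""
    def score(beta):
        p = 1.0 / (1.0 + math.exp(-beta))
        return n_case_only - (n_case_only + n_control_only) * p

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if score(mid) > 0:   # score is decreasing in beta, so the root lies above mid
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# HYPOTHETICAL discordant-pair counts, for illustration only
n_case_only, n_control_only = 30, 15

beta_hat = fit_beta(n_case_only, n_control_only)
odds_ratio = math.exp(beta_hat)
print(f"beta_hat = {beta_hat:.4f}, odds ratio = {odds_ratio:.4f}")
```

With these made-up counts the estimated odds ratio equals 30/15, matching the closed-form matched-pair estimate.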

Example: Prenatal ultrasound examinations and risk of childhood leukemia: case-control study
BMJ 2000;320:282-283
Could there be an association between exposure to ultrasound in utero and an increased risk of childhood malignancies?
Previous studies have found no association, but they have had poor statistical power to detect an association.
Swedish researchers performed a nationwide population-based case-control study using prospectively assembled data on prenatal exposure to ultrasound.

535 cases: all children born and diagnosed as having myeloid leukemia between 1973 and 1989 in Swedish registers of birth, cancer, and causes of death.
535 matched controls: 1 control was randomly selected for each case from the Swedish Birth Registry, matched by sex and year and month of birth.

[2x2 table of the 535 matched pairs; cell counts 235, 100, 115, 85; row and column labels lost]
But this type of analysis is limited to a single dichotomous exposure.

Used conditional logistic regression to look at dose-response with number of ultrasounds.
Results:
Reference OR = 1.0; no ultrasounds
OR = 0.91 for 1-2 ultrasounds
OR = 0.64 for >=3 ultrasounds
Conclusion: no evidence of a positive association between prenatal ultrasound and childhood leukemia; even evidence of inverse association (which could be explained by reasons for frequent ultrasound)

Extension: 1:M matching
Each term in the likelihood represents a stratum of 1+M individuals.
More complicated likelihood expression!
Just as easy to implement in SAS, as we'll see Wednesday.
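As a sketch of that more complicated expression: in the standard conditional likelihood with one case per stratum, each stratum contributes exp(beta * x_case) divided by the sum of exp(beta * x) over all 1+M members. The Python below (standard library only) assumes a single covariate. The strata are HYPOTHETICAL and the function names are illustrative.

```python
import math

def stratum_loglik(beta, x_case, x_controls):
    """Log contribution of one stratum: 1 case plus M controls, single covariate x."""
    xs = [x_case] + list(x_controls)
    log_denominator = math.log(sum(math.exp(beta * x) for x in xs))
    return beta * x_case - log_denominator

def conditional_loglik(beta, strata):
    """Sum of the stratum contributions; strata is a list of (x_case, [x_controls])."""
    return sum(stratum_loglik(beta, x_case, x_controls) for x_case, x_controls in strata)

# HYPOTHETICAL 1:3 matched strata: (case exposure, [control exposures])
strata = [
    (1, [0, 0, 1]),
    (0, [0, 1, 0]),
    (1, [0, 0, 0]),
    (1, [1, 1, 1]),  # fully concordant: carries no information about beta
]

loglik_at_zero = conditional_loglik(0.0, strata)
print(f"log-likelihood at beta = 0: {loglik_at_zero:.4f}")
```

With M = 1 and a binary exposure, the stratum term reduces to the e^beta / (1 + e^beta) form used for matched pairs. A stratum in which everyone has the same exposure contributes the same constant for every beta.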

Ordinal Logistic Regression

What if your outcome variable has more than two levels?
For ordinal outcomes, use ordinal logistic regression:
*Relies on the cumulative logit
*Models the predicted probability of multiple outcomes
*Also known as the proportional odds model

Ordinal Variable Example: Likert Scale
1 = strongly disagree
2 = disagree
3 = neutral
4 = agree
5 = strongly agree
Cumulative outcomes:
*strongly agree vs. the rest
*agree or strongly agree vs. neutral or negative
*agree, strongly agree, or neutral vs. negative
*the rest vs. strongly disagree
Ordinal logistic regression gives you a way to model these cumulative outcomes all at once!

Ordinal Variable Example: Continuous variable measured crudely
1 = breastfed >=6 months
2 = breastfed 4-5 months
3 = breastfed 2-3 months
4 = breastfed

Another example, 3 levels (from my data on runners):
1 = eumenorrhea (normal menses) (66.6%)
2 = oligomenorrhea (mild irregularity) (24.6%)
3 = amenorrhea (severe irregularity) (8.6%)

Cumulative logit, 3 groups (2 potential positive outcomes)
In words: the log odds of having amenorrhea (versus everything else), and the log odds of having any irregularity (versus normal).
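In symbols, with the outcome coded as above (1 = eumenorrhea, 2 = oligomenorrhea, 3 = amenorrhea), the two cumulative logits can be written as follows. The symbol Y for the outcome category is notation introduced here.

```latex
\[
\operatorname{logit} P(Y \ge 3)
  = \ln \frac{P(Y = 3)}{P(Y = 1) + P(Y = 2)}
  \quad \text{(amenorrhea vs. everything else)}
\]
\[
\operatorname{logit} P(Y \ge 2)
  = \ln \frac{P(Y = 2) + P(Y = 3)}{P(Y = 1)}
  \quad \text{(any irregularity vs. normal)}
\]
```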

Corresponding logistic model (no predictors)
The intercept-only model, no predictors (two intercepts!):
Log odds (amenorrhea) = α_amen
Log odds (any irregularity) = α_(amen or oligo)

Fitted model:
Logit of amenorrhea:
8.6% of my sample has amenorrhea
Odds = 8.6/91.4 = 0.094
ln(0.094) ≈ -2.36
Logit of any irregularity:
About 33.3% has any irregularity (24.6% + 8.6%)
Odds = (1/3)/(2/3) = 1/2
ln(1/2) ≈ -0.69 (the fitted intercept, from the unrounded proportions, is -0.70)
Fitted models are:
Log odds (amenorrhea) = -2.36
Log odds (any irregularity) = -0.70
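The arithmetic can be checked in a few lines of Python (standard library only). The percentages are the rounded sample proportions quoted on the slides, so the results agree with the fitted intercepts only to about two decimals. The variable names are illustrative.

```python
import math

def logit(p):
    """Log odds corresponding to a probability p."""
    return math.log(p / (1.0 - p))

# Rounded sample proportions quoted on the slides
p_amenorrhea = 0.086
p_oligomenorrhea = 0.246

# In an intercept-only cumulative logit model, each intercept is just
# the logit of the corresponding cumulative sample proportion.
intercept_amen = logit(p_amenorrhea)                     # amenorrhea vs. the rest
intercept_any = logit(p_amenorrhea + p_oligomenorrhea)   # any irregularity vs. normal

print(f"log odds (amenorrhea)       = {intercept_amen:.2f}")
print(f"log odds (any irregularity) = {intercept_any:.2f}")
```

These come out at roughly -2.36 and -0.70, in line with the fitted intercept-only model.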

Logistic model with predictors:
Log odds (amenorrhea) = α_amen + β1*X1 + β2*X2
Log odds (any irregularity) = α_(amen or oligo) + β1*X1 + β2*X2
Note: different intercepts but shared betas (shared slopes)!

Odds ratio interpretation (a):

Odds ratio interpretation (b):

Odds ratio interpretation:
Interpretation of the betas: e^β = adjusted odds ratio.
For every 1-unit increase in X, it's the multiplicative increase in the odds of any menstrual irregularity compared with none, and it's also the multiplicative increase in the odds of amenorrhea compared with the other two categories (adjusted for any other predictors in the model).
Note: proportional odds assumption! The odds ratios are the same across different levels of the outcome.
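A sketch of the standard derivation behind this interpretation, using the shared-slope model with two predictors. Here x is an arbitrary value of X_1, a symbol introduced for illustration.

```latex
% (a) Amenorrhea vs. the other two categories, comparing X_1 = x + 1 with X_1 = x:
\[
\mathrm{OR}_{\text{amen}}
= \frac{\exp\!\big(\alpha_{\text{amen}} + \beta_1 (x + 1) + \beta_2 X_2\big)}
       {\exp\!\big(\alpha_{\text{amen}} + \beta_1 x + \beta_2 X_2\big)}
= e^{\beta_1}
\]

% (b) Any irregularity vs. normal, same comparison:
\[
\mathrm{OR}_{\text{any}}
= \frac{\exp\!\big(\alpha_{\text{amen or oligo}} + \beta_1 (x + 1) + \beta_2 X_2\big)}
       {\exp\!\big(\alpha_{\text{amen or oligo}} + \beta_1 x + \beta_2 X_2\big)}
= e^{\beta_1}
\]
```

The intercepts cancel in both ratios, so the two odds ratios are forced to be equal. That equality is the proportional odds assumption.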

Example predictor, EDIA:
Score on the anorexia subscale of the eating disorder inventory (EDIA)

Cumulative logit plot (4 bins)

Fitted model with EDIA:
Analysis of Maximum Likelihood Estimates

                          Standard       Wald
Parameter    DF  Estimate    Error  Chi-Square  Pr > ChiSq
Intercept 1   1   -3.2630   0.3823     72.8648

Fitted Model: Predicted logit at every level of EDIA

Compare actual data and fitted model:

Fitted model with EDIA: Odds Ratio Estimates

            Point      95% Wald
Effect   Estimate  Confidence Limits
EDIA        1.129    1.072    1.189

For every 1-unit increase in EDIA score, there's a 13% increase in the odds of being amenorrheic versus the other two categories and a 13% increase in the odds of being amenorrheic or oligomenorrheic versus normal.

Predictions:
Log odds (amenorrhea) = -3.2630 + 0.1211*EDIA
Log odds (any irregularity) = -1.3888 + 0.1211*EDIA
The model predicts that a woman with an EDIA score of 15 would have:

Predictions:
Amenorrhea: predicted logit = -1.446; predicted probability = 19%
Any irregularity: predicted logit = 0.4281; predicted probability = 60.5%
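These predictions can be reproduced with a short Python sketch (standard library only). The coefficient magnitudes are the ones quoted on the slides. The negative signs on the two intercepts are an assumption, inferred because they reproduce the predicted probabilities of 19% and 60.5%. Function and variable names are illustrative.

```python
import math

def inverse_logit(log_odds):
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Coefficients as quoted on the slides (intercept signs assumed negative)
INTERCEPT_AMEN = -3.2630   # amenorrhea vs. the other two categories
INTERCEPT_ANY = -1.3888    # any irregularity vs. normal
BETA_EDIA = 0.1211         # shared slope for the EDIA score

def predict(edia):
    """Predicted probability of each outcome category at a given EDIA score."""
    p_amen = inverse_logit(INTERCEPT_AMEN + BETA_EDIA * edia)
    p_any = inverse_logit(INTERCEPT_ANY + BETA_EDIA * edia)
    return {
        "amenorrhea": p_amen,
        "oligomenorrhea": p_any - p_amen,  # difference of cumulative probabilities
        "eumenorrhea": 1.0 - p_any,
    }

probs = predict(15)
odds_ratio = math.exp(BETA_EDIA)

for category, p in probs.items():
    print(f"P({category}) = {p:.3f}")
print(f"odds ratio per 1-unit increase in EDIA = {odds_ratio:.3f}")
```

Subtracting the two cumulative probabilities gives the probability of the middle category, which the slides do not report directly.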

Advantages & disadvantages
Ordinal logistic is better than running separate logistic models for different outcomes (e.g., one model for amenorrhea, one model for any irregularity) because of the improvement in statistical power!
Ordinal logistic prevents you from having to arbitrarily turn an ordinal variable into a binary variable!
But it does require that you meet the proportional odds assumption.