
BIOST 536 Lecture 11

Lecture 11 – Additional topics in Logistic Regression

C-statistic (“concordance statistic”)

Same as the area under the curve (AUC) reported by lroc (the ROC, receiver operating characteristic, curve for a logistic model)

Fit a model and generate logit(p) and p for each observation

Form all possible pairings of the m cases and n controls (total number of pairs is m x n)

Compare logit(p_case) to logit(p_control)

C-statistic is equal to

$$\mathrm{AUC} = \frac{\#\,\text{pairs}\,(\text{logit}_{\text{case}} > \text{logit}_{\text{control}}) + 0.50 \times \#\,\text{pairs}\,(\text{logit}_{\text{case}} = \text{logit}_{\text{control}})}{\#\,\text{pairs total}}$$
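The pairing rule above can be sketched in a few lines of Python. This is a minimal illustration, not the lroc implementation; the function name `c_statistic` and the logit values are hypothetical and are not taken from the lecture data.

```python
from itertools import product

def c_statistic(case_scores, control_scores):
    """Concordance over all case-control pairs; tied pairs count one half."""
    concordant = 0
    tied = 0
    for case, control in product(case_scores, control_scores):
        if case > control:
            concordant += 1
        elif case == control:
            tied += 1
    total_pairs = len(case_scores) * len(control_scores)
    return (concordant + 0.5 * tied) / total_pairs

# Hypothetical linear predictors (logits), chosen only to show the counting.
case_logits = [0.2, -0.5, 1.1]
control_logits = [-1.0, 0.2, -0.5, -2.0]

c_stat = c_statistic(case_logits, control_logits)
print(c_stat)  # 9 concordant pairs + 2 ties out of 12 pairs
```

Because the logit is a monotone function of p, comparing fitted probabilities instead of logits would give the same count.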

. logistic chd age sc1 sbp

Logistic regression                               Number of obs   =        910
                                                  LR chi2(3)      =      43.01
                                                  Prob > chi2     =     0.0000
Log likelihood = -428.26102                       Pseudo R2       =     0.0478

------------------------------------------------------------------------------
         chd | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   1.027559   .0142093     1.97   0.049     1.000083    1.055789
         sc1 |   1.006021   .0019341     3.12   0.002     1.002237    1.009819
         sbp |   1.016381   .0035904     4.60   0.000     1.009368    1.023443
------------------------------------------------------------------------------

. predict xb , xb

. graph box xb, over(chd)

Create all possible pairs: number of cases (m = 178) x number of controls (n = 732) = 130,296 pairs

Assess the number of pairs where the case logit > control logit

No ties – get the same result as lroc

Can also compute the c-statistic for the validation sample to test prediction in a new sample

. lroc

Logistic model for chd

number of observations =      910
area under ROC curve   =   0.6473

. tabulate concord

    concord |      Freq.     Percent        Cum.
------------+-----------------------------------
Pcase<Pcont |     45,951       35.27       35.27
Pcase>Pcont |     84,345       64.73      100.00
------------+-----------------------------------
      Total |    130,296      100.00
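As a quick arithmetic check, the counts in the tabulate output can be plugged into the c-statistic formula. The sketch below uses only the numbers reported above (178 cases, 732 controls, no ties); the variable names are illustrative.

```python
m_cases, n_controls = 178, 732
total_pairs = m_cases * n_controls

# Counts from the tabulate output; there are no tied pairs in this example.
concordant, discordant, tied = 84_345, 45_951, 0
assert concordant + discordant + tied == total_pairs

c_stat = (concordant + 0.5 * tied) / total_pairs
print(total_pairs, round(c_stat, 4))  # 130296 0.6473
```

The result agrees with the area under the ROC curve reported by lroc.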

Small sample sizes

Logistic regression LR tests, odds ratio estimates, and confidence intervals depend on asymptotic large-sample results

May not work well for small samples

May not even be able to get estimates in some cases if a category has all cases or all controls

Sir D. R. Cox proposed some small-sample exact logistic regression methods in his 1970 text Analysis of Binary Data

Not computationally feasible until an algorithm developed by Hirji, Mehta, and Patel (1987) reduced computations (programs marketed as StatXact and LogXact)
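The point about a category with all cases can be illustrated numerically. The sketch below uses hypothetical data (2 of 5 cases when x = 0, 5 of 5 cases when x = 1) and a hand-written binomial log-likelihood; it is an assumption-laden toy example, not output from any package. The log-likelihood keeps rising as the slope grows, so no finite maximum likelihood estimate exists.

```python
import math

def log_likelihood(b0, b1, data):
    """Binomial log-likelihood for logit(p) = b0 + b1*x; data = [(x, events, n), ...]."""
    total = 0.0
    for x, events, n in data:
        eta = b0 + b1 * x
        log_p = -math.log1p(math.exp(-eta))  # log of p
        log_q = -math.log1p(math.exp(eta))   # log of (1 - p)
        total += events * log_p + (n - events) * log_q
    return total

# Hypothetical data: every subject with x = 1 is a case.
data = [(0, 2, 5), (1, 5, 5)]
b0 = math.log(2 / 3)  # logit of 2/5, the fitted intercept for the x = 0 group

slopes = (1, 5, 10, 20)
values = [log_likelihood(b0, b1, data) for b1 in slopes]
for b1, value in zip(slopes, values):
    print(b1, value)  # the log-likelihood increases with every larger slope
```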

Exact logistic regression uses the sufficient statistics for all covariates in the model:

$$\sum_i x_i y_i$$

Condition on the sufficient statistics and consider all permutations of the data consistent with the sufficient statistics

Can derive estimates and confidence intervals

Small sample sizes

Computation can be extensive

Can stratify by variables that we control for

Methods now included in SAS and Stata (Version 10 on?)

Small dose escalation example: too small for ordinary logistic regression

Dose   Deaths   N
0      0        3
1      0        3
2      0        3
3      0        3
4      1        3
5      2        3

Small sample sizes

Do this example in Stata using exact logistic regression (exlogistic command)

Do an incorrect standard logistic regression first

Wald test (p = 0.099) and LR test (p = 0.0043) disagree

. list

     +----------------------+
     | dose   count   death |
     |----------------------|
  1. |    0       3       0 |
  2. |    0       0       1 |
  3. |    1       3       0 |
  4. |    1       0       1 |
  5. |    2       3       0 |
  6. |    2       0       1 |
  7. |    3       3       0 |
  8. |    3       0       1 |
  9. |    4       2       0 |
 10. |    4       1       1 |
 11. |    5       1       0 |
 12. |    5       2       1 |
     +----------------------+

. logistic death dose [fw=count]

Logistic regression                               Number of obs   =         18
                                                  LR chi2(1)      =       8.15
                                                  Prob > chi2     =     0.0043
Log likelihood = -4.0362174                       Pseudo R2       =     0.5023

------------------------------------------------------------------------------
       death | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        dose |   8.007606   10.09189     1.65   0.099     .6772423    94.68068
------------------------------------------------------------------------------

Small sample sizes

Exact logistic regression does show a significant relationship of deaths with dose and gives odds ratio and permutation-based confidence intervals

Note sufficient statistic is

$$\sum \text{dose} \times \text{deaths} = 0 \times 0 + 1 \times 0 + 2 \times 0 + 3 \times 0 + 4 \times 1 + 5 \times 2 = 14$$

. exlogistic death dose [fw=count]

Enumerating sample-space combinations:
observation 1:   enumerations =          2
observation 2:   enumerations =          4
observation 3:   enumerations =          7
observation 4:   enumerations =         10
observation 5:   enumerations =         20
observation 6:   enumerations =         30
observation 7:   enumerations =         33
observation 8:   enumerations =         16

Exact logistic regression                        Number of obs =         18
                                                 Model score   =   5.472381
                                                 Pr >= score   =     0.0245
---------------------------------------------------------------------------
       death | Odds Ratio       Suff.  2*Pr(Suff.)     [95% Conf. Interval]
-------------+-------------------------------------------------------------
        dose |   6.049377          14      0.0245      1.122698    353.0003
---------------------------------------------------------------------------
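The conditioning idea can be checked by brute force for this small example. The sketch below is an illustration under the null hypothesis of no dose effect, not a reimplementation of exlogistic: it fixes the total number of deaths at 3, treats every set of 3 subjects as equally likely to be the deaths, and tabulates the sufficient statistic (the sum of dose over the deaths). Variable names are illustrative.

```python
from itertools import combinations

# One entry per subject: 3 subjects at each dose level 0..5 (from the listing above).
doses = [dose for dose in range(6) for _ in range(3)]
n_deaths = 3                   # total deaths, held fixed by conditioning
observed_stat = 4 * 1 + 5 * 2  # sum of dose * deaths = 14

# Sufficient statistic for every possible choice of which 3 subjects died.
stats = [sum(subset) for subset in combinations(doses, n_deaths)]
n_extreme = sum(stat >= observed_stat for stat in stats)
p_one_sided = n_extreme / len(stats)

print(n_extreme, len(stats), round(2 * p_one_sided, 4))  # 10 816 0.0245
```

Doubling the one-sided tail probability gives 0.0245, which matches the 2*Pr(Suff.) column in the exlogistic output.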

Small sample sizes – Example 2

Two binary covariates

Only 3 outcomes observed

First consider Fisher’s exact test to relate A to outcome

Set up the data using frequency counts

A   B   Y   N
0   0   1   1
0   1   0   2
1   0   1   8
1   1   1   21

. tabulate y a [fw=count], exact chi2

           |           a
         y |         0          1 |     Total
-----------+----------------------+----------
         0 |         2         27 |        29
         1 |         1          2 |         3
-----------+----------------------+----------
     Total |         3         29 |        32

          Pearson chi2(1) =   2.2365   Pr = 0.135
           Fisher's exact =                 0.263
   1-sided Fisher's exact =                 0.263

Small sample sizes – Example 2

Same answer with exact logistic regression

. exlogistic y a [fw=count]

Enumerating sample-space combinations:
observation 1:   enumerations =          2
observation 2:   enumerations =          4
observation 3:   enumerations =          6
observation 4:   enumerations =          9
observation 5:   enumerations =         10
observation 6:   enumerations =          4

Exact logistic regression                        Number of obs =         32
                                                 Model score   =   2.166601
                                                 Pr >= score   =     0.2633
---------------------------------------------------------------------------
           y | Odds Ratio       Suff.  2*Pr(Suff.)     [95% Conf. Interval]
-------------+-------------------------------------------------------------
           a |   .1647192           2      0.5266      .0055396     13.0711
---------------------------------------------------------------------------

Now consider both covariates together

. exlogistic y a b [fw=count]

Enumerating sample-space combinations:
observation 1:   enumerations =          2
observation 2:   enumerations =          4
observation 3:   enumerations =          8
observation 4:   enumerations =         17
observation 5:   enumerations =         21
observation 6:   enumerations =         12

Exact logistic regression                        Number of obs =         32
                                                 Model score   =   4.360821
                                                 Pr >= score   =     0.0798
---------------------------------------------------------------------------
           y | Odds Ratio       Suff.  2*Pr(Suff.)     [95% Conf. Interval]
-------------+-------------------------------------------------------------
           a |   .1649572           2      0.5797      .0018228    14.92823
           b |   .1982354           1      0.4138      .0031804     4.02635
---------------------------------------------------------------------------
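The agreement between Fisher's exact test and the single-covariate exact logistic model can be reproduced by hand. With the margins of the 2 x 2 table fixed, the number of outcomes among subjects with a = 1 (the sufficient statistic, observed value 2) follows a hypergeometric distribution. The sketch below is a minimal check using only the table counts above; the helper name `prob` is illustrative.

```python
from math import comb

n_a0, n_a1 = 3, 29  # subjects with a = 0 and with a = 1
n_events = 3        # total number of subjects with y = 1
observed = 2        # events among a = 1 (sufficient statistic for a)

def prob(k):
    """Hypergeometric probability of k events among the a = 1 subjects."""
    return comb(n_a1, k) * comb(n_a0, n_events - k) / comb(n_a0 + n_a1, n_events)

# Lower tail: as few or fewer events in the a = 1 group than observed.
p_lower = sum(prob(k) for k in range(observed + 1))

print(round(p_lower, 4), round(2 * p_lower, 4))  # 0.2633 0.5266
```

The tail probability 0.2633 matches both the one-sided Fisher p-value and the model score p-value from exlogistic y a, and doubling it gives the 0.5266 shown in the 2*Pr(Suff.) column.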

Small sample sizes – Example 3

Crossover design

Same individuals get tested in all treatments

Outcome is recorded after each treatment

Treatment effect is assumed to wash out quickly after outcome is measured

Order of treatments may still matter

Example has 15 individuals undergoing three treatments but in different orders

     +--------------------------+
     | person   time   y   drug |
     |--------------------------|
  1. |      1      1   0      1 |
  2. |      1      2   0      2 |
  3. |      1      3   0      0 |
  4. |      2      1   1      1 |
  5. |      2      2   1      2 |
  6. |      2      3   0      0 |
  7. |      3      1   0      1 |
  8. |      3      2   1      2 |
  9. |      3      3   1      0 |
 10. |      4      1   1      1 |
 11. |      4      2   0      0 |
 12. |      4      3   1      2 |
 13. |      5      1   1      1 |
 14. |      5      2   0      0 |
 15. |      5      3   0      2 |
 16. |      6      1   0      2 |
 17. |      6      2   0      1 |
 18. |      6      3   0      0 |
 19. |      7      1   1      2 |
 20. |      7      2   1      1 |
 21. |      7      3   0      0 |
 22. |      8      1   0      2 |
 23. |      8      2   0      0 |
 24. |      8      3   1      1 |
 25. |      9      1   1      2 |
 26. |      9      2   0      0 |
 27. |      9      3   1      1 |
 28. |     10      1   0      2 |
 29. |     10      2   1      0 |
 30. |     10      3   0      1 |
 31. |     11      1   0      0 |
 32. |     11      2   1      1 |
 33. |     11      3   0      2 |
 34. |     12      1   1      0 |
 35. |     12      2   0      2 |
 36. |     12      3   1      1 |
 37. |     13      1   0      0 |
 38. |     13      2   0      2 |
 39. |     13      3   1      1 |
 40. |     14      1   0      0 |
 41. |     14      2   1      2 |
 42. |     14      3   0      1 |
 43. |     15      1   0      0 |
 44. |     15      2   1      2 |
 45. |     15      3   1      1 |
     +--------------------------+

Small sample sizes – Example 3

Drug: 0 Placebo; 1 Drug A; 2 Drug B

Treat time and drug as categorical variables

Need to group observations within individual (all comparisons are within individual)

. xi: exlogistic y i.drug i.time, group(person)
i.drug            _Idrug_0-2          (naturally coded; _Idrug_0 omitted)
i.time            _Itime_1-3          (naturally coded; _Itime_1 omitted)

Enumerating sample-space combinations:
observation 1:   enumerations =          1
observation 2:   enumerations =          1
observation 3:   enumerations =          1
observation 4:   enumerations =          2
observation 5:   enumerations =          3
observation 6:   enumerations =          3
observation 7:   enumerations =          6
observation 8:   enumerations =          8
etc.
observation 43:  enumerations =      10286
observation 44:  enumerations =      11395
observation 45:  enumerations =       6877

Small sample sizes – Example 3

Drug A is significantly different from Placebo (p = 0.0450)

Drug B has a higher odds ratio than Placebo, but the difference is not statistically significant

Time effects are not strong

Have accounted for the correlation within individual by grouping

Conditioning methods used extensively later

Exact logistic regression                        Number of obs    =         45
Group variable: person                           Number of groups =         15
                                                 Obs per group:
                                                              min =          3
                                                              avg =        3.0
                                                              max =          3
                                                 Model score      =    6.14764
                                                 Pr >= score      =     0.1835
---------------------------------------------------------------------------
           y | Odds Ratio       Suff.  2*Pr(Suff.)     [95% Conf. Interval]
-------------+-------------------------------------------------------------
    _Idrug_1 |   5.276637          10      0.0450      1.029821    49.90737
    _Idrug_2 |   2.301805           7      0.3516      .5056709    12.74542
    _Itime_2 |   1.468032           7      0.9203      .2282417     11.0921
    _Itime_3 |   .9958269           7      1.0000      .1534726     5.69607
---------------------------------------------------------------------------
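The Suff. column in this output can be recomputed directly from the crossover listing. The sketch below is a bookkeeping check, not a reimplementation of exlogistic: for each indicator it counts the observations with y = 1 and that indicator switched on. It also counts the individuals whose three outcomes are not all equal, since in a within-person conditional analysis only those individuals contribute to the comparison. The data layout and names are illustrative.

```python
# (y, drug) for times 1, 2, 3 of each person, transcribed from the listing above.
records = {
    1: [(0, 1), (0, 2), (0, 0)],   2: [(1, 1), (1, 2), (0, 0)],
    3: [(0, 1), (1, 2), (1, 0)],   4: [(1, 1), (0, 0), (1, 2)],
    5: [(1, 1), (0, 0), (0, 2)],   6: [(0, 2), (0, 1), (0, 0)],
    7: [(1, 2), (1, 1), (0, 0)],   8: [(0, 2), (0, 0), (1, 1)],
    9: [(1, 2), (0, 0), (1, 1)],  10: [(0, 2), (1, 0), (0, 1)],
    11: [(0, 0), (1, 1), (0, 2)], 12: [(1, 0), (0, 2), (1, 1)],
    13: [(0, 0), (0, 2), (1, 1)], 14: [(0, 0), (1, 2), (0, 1)],
    15: [(0, 0), (1, 2), (1, 1)],
}

suff = {"drug1": 0, "drug2": 0, "time2": 0, "time3": 0}
informative = 0
for person, visits in records.items():
    outcomes = [y for y, _ in visits]
    # A person with all 0s or all 1s has no within-person contrast.
    if 0 < sum(outcomes) < len(outcomes):
        informative += 1
    for time, (y, drug) in enumerate(visits, start=1):
        if drug in (1, 2):
            suff[f"drug{drug}"] += y
        if time in (2, 3):
            suff[f"time{time}"] += y

print(suff, informative)
```

The counts come out as 10, 7, 7, and 7, in line with the Suff. column for _Idrug_1, _Idrug_2, _Itime_2, and _Itime_3.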
