1 epi 5240: introduction to epidemiology measures used to compare groups october 5, 2009 dr. n....

EPI 5240: Introduction to Epidemiology

Measures used to compare groups
October 5, 2009

Dr. N. Birkett, Department of Epidemiology & Community Medicine, University of Ottawa

Session Overview

• Methods of comparing groups
– Risk/rate ratios
– Odds ratios
– Difference measures

ONE BIG WARNING!!!!!

Some books (e.g. the Greenberg one used in the summer course) rotate their 2×2 tables from the normal approach. That is, they have the outcomes as the rows and the exposure as the columns.

BE WARNED. This could cause confusion. My tables use the more common approach: exposure as the rows and outcome as the columns.

Comparing groups (1)

• Two main outcome measures
– Incidence (either risk or rate)
– Prevalence

• How do you determine if an exposure is related to an outcome?
– Need to compare the measure in the two groups.
  • Differences
  • Ratios (we’ll start with this one).
– Ratio measures have NO units.
– All ratio measures have the same interpretation
  • 1.0 = no effect
  • < 1.0 protective effect
  • > 1.0 increased risk
– Values over 2.0 are of strong interest

Comparing groups: Cohorts (2)

RISK RATIO

                  Disease
Exposure      YES       NO    Total
YES         1,000    9,000   10,000
NO            100    9,900   10,000
Total       1,100   18,900   20,000

Risk in exposed = 1000/10000
Risk in non-exposed = 100/10000

If exposure increases risk, you would expect the risk in the exposed to be larger than the risk in the unexposed. How much larger can be assessed by the ratio of one to the other:

Risk ratio (RR) = (Risk in exp) / (Risk in non-exp)
                = (1000/10000) / (100/10000)
                = 10.0

RISK RATIO

                Disease
Exposure      YES     NO   Total
YES             a      b     a+b
NO              c      d     c+d
Total         a+c    b+d       N

Risk in exposed = a/(a+b)
Risk in non-exposed = c/(c+d)

If exposure increases risk, you would expect a/(a+b) to be larger than c/(c+d). How much larger can be assessed by the ratio of one to the other:

Risk ratio (RR) = (Risk in exp) / (Risk in non-exp)
                = (a/(a+b)) / (c/(c+d))

                     Disease
Pollutant level    YES    NO   Total
High                42    80     122
Low                 43   302     345
Total               85   382     467

Risk in exposed = 42/122 = 0.344
Risk in non-exposed = 43/345 = 0.125

Risk ratio (RR) = (Exp risk) / (Non-exp risk)
                = 0.344/0.125
                = 2.76
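The risk ratio arithmetic above can be sketched in a few lines of Python. This is only an illustration; the function name `risk_ratio` and the a, b, c, d argument order (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases) are assumptions chosen to match the 2×2 layout used in these slides.

```python
def risk_ratio(a, b, c, d):
    """Risk ratio from a 2x2 cohort table (rows = exposure, columns = disease)."""
    risk_exposed = a / (a + b)      # cases among the exposed
    risk_unexposed = c / (c + d)    # cases among the unexposed
    return risk_exposed / risk_unexposed

# First cohort example and the pollutant example from the slides
print(round(risk_ratio(1000, 9000, 100, 9900), 2))
print(round(risk_ratio(42, 80, 43, 302), 2))
```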

95% CI’s for CIR (1)

• For a mean value, the 95% CI is given as:

    mean ± 1.96 × se(mean)

• Assumes mean has a normal (Gaussian) distribution

• Might try using the same approach to obtain a ’95% CI’ for the CIR (cumulative incidence ratio, i.e. the risk ratio) using:

    CIR ± 1.96 × se(CIR)

• BUT: CIR is NOT normally distributed
– Range from 0 to +∞
– Null value = 1.0
– Implies a non-symmetric distribution

[Figure: plot of ‘CIR’ distribution when H0 is true; horizontal axis from 0 to 8]

• Instead, use ‘log(CIR)’, where the log is taken to the ‘natural’ base ‘e’
– Often written ln(CIR)

• ln(CIR) is approximately normally distributed
– Range from -∞ to +∞
– Null value = 0.0

• 95% CI is given by:

    ln(CIR) ± 1.96 × se(ln(CIR))

• Need to find formula for ‘se(ln(CIR))’

[Figure: plot of ‘ln(CIR)’ distribution when H0 is true; horizontal axis from -6 to 6]

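Since ln(CIR) = ln(risk in exposed) – ln(risk in non-exposed):

    var(ln(CIR)) = var(ln(risk in exposed)) + var(ln(risk in non-exposed))

if exposed and unexposed are independent.

After some math, this gives the following result (next slide).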

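                Disease
Exposure      YES     NO   Total
YES             a      b     a+b
NO              c      d     c+d
Total         a+c    b+d       N

    var(ln(CIR)) = b/(a(a+b)) + d/(c(c+d))

95% CI’s for CIR (6)

We’re close now. Just take the ‘anti-logs’ (usually called the ‘exp’ function):

    95% CI = CIR × exp(±1.96 × se(ln(CIR)))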

                     Disease
Pollutant level    YES    NO   Total
High                42    80     122
Low                 43   302     345
Total               85   382     467

Risk ratio (RR) or CIR = 2.76

var(ln(CIR)) = 80/(42*122) + 302/(43*345) = 0.03597

se(ln(CIR)) = sqrt(0.03597) = 0.190

Upper 95% CI = 2.76 * exp(+1.96*0.190) = 4.00
Lower 95% CI = 2.76 * exp(-1.96*0.190) = 1.90

Conclusion: CIR is 2.76 (1.90 to 4.00)
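The log-scale confidence interval for the CIR can be sketched as follows. This is a minimal illustration, not the course's own code; the name `risk_ratio_ci` and its argument order are assumptions. Because it carries unrounded intermediate values, the limits can differ from the hand calculation in the last digit.

```python
import math

def risk_ratio_ci(a, b, c, d, z=1.96):
    """Risk ratio with an approximate 95% CI computed on the log scale."""
    rr = (a / (a + b)) / (c / (c + d))
    # var(ln(CIR)) = b/(a(a+b)) + d/(c(c+d)), as used in the worked example
    var_ln = b / (a * (a + b)) + d / (c * (c + d))
    se_ln = math.sqrt(var_ln)
    lower = rr * math.exp(-z * se_ln)
    upper = rr * math.exp(+z * se_ln)
    return rr, lower, upper

rr, lower, upper = risk_ratio_ci(42, 80, 43, 302)
print(f"CIR = {rr:.2f} ({lower:.2f} to {upper:.2f})")
```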

Comparing groups: Cohorts (6)

• Hypothesis testing (H0: CIR=1)

– Much less common than 95% CI’s
– Normal approximation test is generally OK

RISK DIFFERENCE

                  Disease
Exposure      YES       NO    Total
YES         1,000    9,000   10,000
NO            100    9,900   10,000
Total       1,100   18,900   20,000

Risk in exposed = 1000/10000
Risk in non-exposed = 100/10000

If exposure increases risk, you would expect the risk in the exposed to be larger than the risk in the unexposed. How much larger can be assessed by the difference between the two:

Risk difference (RD) = (Risk in Exp) – (Risk in Non-exp)
                     = 1000/10,000 – 100/10,000
                     = 900/10,000
                     = 0.09

RISK DIFFERENCE

                Disease
Exposure      YES     NO   Total
YES             a      b     a+b
NO              c      d     c+d
Total         a+c    b+d       N

Risk in exposed = a/(a+b)
Risk in non-exposed = c/(c+d)

If exposure increases risk, you would expect a/(a+b) to be larger than c/(c+d). How much larger can be assessed by the difference between the two:

Risk difference (RD) = (Risk in Exp) – (Risk in Non-exp)
                     = a/(a+b) – c/(c+d)

                     Disease
Pollutant level    YES    NO   Total
High                42    80     122
Low                 43   302     345
Total               85   382     467

Risk in exposed = 42/122 = 0.344
Risk in non-exposed = 43/345 = 0.125

Risk difference (RD) = (Risk in Exp) – (Risk in Non-exp)
                     = 0.344 – 0.125
                     = 0.219

95% CI’s for Risk Diff (1)

We assume that the incidence follows a binomial distribution. (It can be considered as approximately normal if the incidence isn’t too small.)

95% CI’s for Risk Diff (2)

                     Disease
Pollutant level    YES    NO   Total
High                42    80     122
Low                 43   302     345
Total               85   382     467

RD = 0.219

var(RD) = (42*80)/122^3 + (43*302)/345^3 = 0.00217
(that is, ab/(a+b)^3 + cd/(c+d)^3 in the usual 2×2 notation)

se(RD) = sqrt(0.00217) = 0.047

Upper 95% CI = 0.219 + 1.96*0.047 = 0.310
Lower 95% CI = 0.219 - 1.96*0.047 = 0.127

Conclusion: RD is 0.219 (0.127 to 0.310)
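The risk difference and its interval can be sketched like this, assuming the binomial variance ab/(a+b)^3 + cd/(c+d)^3 used in the worked example. The function name `risk_difference_ci` is an illustrative choice, and unrounded intermediates mean the limits may differ from the hand calculation in the third decimal.

```python
import math

def risk_difference_ci(a, b, c, d, z=1.96):
    """Risk difference with an approximate 95% CI (normal approximation)."""
    n1, n0 = a + b, c + d
    rd = a / n1 - c / n0
    # Binomial variance of each risk: p(1-p)/n = (a/n1)(b/n1)/n1 = a*b/n1**3
    var_rd = (a * b) / n1**3 + (c * d) / n0**3
    se_rd = math.sqrt(var_rd)
    return rd, rd - z * se_rd, rd + z * se_rd

rd, lower, upper = risk_difference_ci(42, 80, 43, 302)
print(f"RD = {rd:.3f} ({lower:.3f} to {upper:.3f})")
```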

• Which comparative measure do you use?
• Depends on the circumstances.
• Risk Ratio: RELATIVE risk measure
• Risk Difference: ABSOLUTE risk measure
• Post-menopausal estrogens & endometrial cancer
– RR = 2.3
– RD = 2/10,000

RATE RATIO

Exposure    Disease   Person-years
YES           1,000          9,500
NO              100          9,950
Total         1,100         19,450

Rate in exposed = 1000/9500
Rate in non-exposed = 100/9950

If exposure increases the rate of getting disease, you would expect the rate in the exposed to be larger than the rate in the unexposed. How much larger can be assessed by the ratio of one to the other:

Rate ratio (RR) = (Rate in Exp) / (Rate in Non-exp)
                = (1000/9500) / (100/9950)
                = 10.5

RATE RATIO

Exposure    Disease   Person-time
YES               A            Y1
NO                B            Y2
Total         A + B       Y1 + Y2

Rate in exposed = A/Y1
Rate in non-exposed = B/Y2

If exposure increases the rate of getting disease, you would expect A/Y1 to be larger than B/Y2. How much larger can be assessed by the ratio of one to the other:

Rate ratio (RR) = (Rate in Exp) / (Rate in Non-exp)
                = (A/Y1) / (B/Y2)

Pollutant level    Dead   Person-years
High                 42          101
Low                  43          323.5
Total                85          424.5

Rate in exposed = 42/101 = 0.416
Rate in non-exposed = 43/323.5 = 0.133

Rate ratio (RR) = (Rate in Exp) / (Rate in Non-exp)
                = 0.416/0.133
                = 3.13
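The rate ratio uses person-time rather than head counts as the denominator. A minimal sketch, with the hypothetical function name `rate_ratio` and arguments named after the cases and person-years in the tables above:

```python
def rate_ratio(cases_exp, py_exp, cases_unexp, py_unexp):
    """Rate ratio (IDR): events per unit person-time, exposed vs unexposed."""
    rate_exposed = cases_exp / py_exp
    rate_unexposed = cases_unexp / py_unexp
    return rate_exposed / rate_unexposed

# First person-years example and the pollutant example from the slides
print(round(rate_ratio(1000, 9500, 100, 9950), 1))
print(round(rate_ratio(42, 101, 43, 323.5), 2))
```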

95% CI’s for IDR (1)

• Use the same approach to obtain a ’95% CI’ for the IDR (incidence density ratio, i.e. the rate ratio) as we used for the CIR:

    IDR ± 1.96 × se(IDR)

• BUT: IDR is NOT normally distributed
– Range from 0 to +∞
– Null value = 1.0
– Implies a non-symmetric distribution

• Instead, use ‘log(IDR)’, where the log is taken to the ‘natural’ base ‘e’
– Often written ln(IDR)

• ln(IDR) is approximately normally distributed
– Range from -∞ to +∞
– Null value = 0.0

• Need to find formula for ‘se(ln(IDR))’

Since ln(IDR) = ln(rate in exposed) – ln(rate in non-exposed):

    var(ln(IDR)) = var(ln(rate in exposed)) + var(ln(rate in non-exposed))

if exposed and unexposed are independent.

After some math, this gives the following result (next slide).

95% CI’s for IDR (4)

Exposure    Disease   Person-time
YES               a            Y1
NO                c            Y2
Total           a+c       Y1 + Y2

    var(ln(IDR)) = 1/a + 1/c

DOES NOT DEPEND ON PERSON-TIME!!

95% CI’s for IDR (5)

We’re close now. Just take the ‘anti-logs’ (usually called the ‘exp’ function):

    95% CI = IDR × exp(±1.96 × se(ln(IDR)))

Pollutant level    Dead   Person-years
High                 42          101
Low                  43          323.5
Total                85          424.5

Rate ratio (RR) or IDR = 3.13

var(ln(IDR)) = 1/42 + 1/43 = 0.047

se(ln(IDR)) = sqrt(0.047) = 0.217

Upper 95% CI = 3.13 * exp(+1.96*0.217) = 4.79
Lower 95% CI = 3.13 * exp(-1.96*0.217) = 2.05

Conclusion: IDR is 3.13 (2.05 to 4.79)
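A sketch of the same interval in Python, assuming var(ln(IDR)) = 1/a + 1/c with a and c the case counts, as in the worked example. The function name `rate_ratio_ci` is illustrative, and the unrounded limits may differ from the hand calculation in the last digit.

```python
import math

def rate_ratio_ci(cases_exp, py_exp, cases_unexp, py_unexp, z=1.96):
    """Rate ratio with an approximate 95% CI computed on the log scale."""
    idr = (cases_exp / py_exp) / (cases_unexp / py_unexp)
    # The variance depends only on the case counts, not on person-time
    se_ln = math.sqrt(1 / cases_exp + 1 / cases_unexp)
    return idr, idr * math.exp(-z * se_ln), idr * math.exp(+z * se_ln)

idr, lower, upper = rate_ratio_ci(42, 101, 43, 323.5)
print(f"IDR = {idr:.2f} ({lower:.2f} to {upper:.2f})")
```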

• Hypothesis testing (H0: IDR=1)

RATE DIFFERENCE

Exposure    Disease   Person-years
YES           1,000          9,500
NO              100          9,950
Total         1,100         19,450

Rate in exposed = 1000/9500
Rate in non-exposed = 100/9950

If exposure increases the rate of getting disease, you would expect the rate in the exposed to be larger than the rate in the unexposed. How much larger can be assessed by the difference between the two:

Rate difference = (Rate in Exp) – (Rate in Non-exp)
                = 1000/9500 – 100/9950
                = 0.095 cases/PY

RATE DIFFERENCE

Exposure    Disease   Person-time
YES               A            Y1
NO                B            Y2
Total         A + B       Y1 + Y2

Rate in exposed = A/Y1
Rate in non-exposed = B/Y2

If exposure increases the rate of getting disease, you would expect A/Y1 to be larger than B/Y2. How much larger can be assessed by the difference between the two:

Rate difference = (Rate in Exp) – (Rate in Non-exp)
                = A/Y1 – B/Y2

Pollutant level    Dead   Person-years
High                 42          101
Low                  43          323.5
Total                85          424.5

Rate in exposed = 42/101 = 0.416
Rate in non-exposed = 43/323.5 = 0.133

Rate difference (RD) = (Rate in Exp) – (Rate in Non-exp)
                     = 0.416 – 0.133
                     = 0.283 cases/person-year

95% CI’s for Rate Diff (1)

We assume that the incidence follows a Poisson distribution. (It can be considered as approximately normal if the incidence isn’t too small.)

95% CI’s for Rate Diff (2)

Pollutant level    Dead   Person-years
High                 42          101
Low                  43          323.5
Total                85          424.5

RD = 0.283 cases/PY

var(RD) = 42/101^2 + 43/323.5^2 = 0.00453
(that is, cases/(person-years)^2 summed over the two groups)

se(RD) = sqrt(0.00453) = 0.067

Upper 95% CI = 0.283 + 1.96*0.067 = 0.415
Lower 95% CI = 0.283 - 1.96*0.067 = 0.152

Conclusion: Rate Diff is 0.283 (0.152 to 0.415) cases/PY
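The rate difference interval can be sketched the same way, assuming the Poisson variance cases/(person-years)^2 for each group, as in the worked example. The function name `rate_difference_ci` is an illustrative choice, and unrounded intermediates may shift the limits in the third decimal.

```python
import math

def rate_difference_ci(cases_exp, py_exp, cases_unexp, py_unexp, z=1.96):
    """Rate difference (cases per person-year) with an approximate 95% CI."""
    rd = cases_exp / py_exp - cases_unexp / py_unexp
    # Poisson assumption: var(cases / PY) = cases / PY**2 in each group
    var_rd = cases_exp / py_exp**2 + cases_unexp / py_unexp**2
    se_rd = math.sqrt(var_rd)
    return rd, rd - z * se_rd, rd + z * se_rd

rd, lower, upper = rate_difference_ci(42, 101, 43, 323.5)
print(f"Rate diff = {rd:.3f} ({lower:.3f} to {upper:.3f}) cases/PY")
```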

Some Issues

• What does RR (or RD) mean?
– Can mean risk or rate ratio (or difference). Some people think this is pedantic rather than correct
– Need to tell which from context.
– Sometimes referred to as Relative Risk (generic term).

• Are risk differences or ratios preferred?
– RR’s are much more common
– Both have a role to play.

Comparing groups: Case-control (1)

• CAN NOT COMPUTE A RISK RATIO!
• Can not estimate incidence from a case-control study.
• Can not compute risk differences.
• Why? We choose the subjects based on their outcome status. Usually, that means making the number of cases and controls equal. Hence, the ‘incidence’ in the case-control study is fixed at 0.50. In the real world, it is most likely much lower (e.g. 1/100,000).
• Let’s look at an example.

RISK RATIO (full cohort)

                  Disease
Exposure      YES       NO    Total
YES         1,000    9,000   10,000
NO            100    9,900   10,000
Total       1,100   18,900   20,000

Risk in exposed = 0.1
Risk in non-exposed = 0.01

RR = 0.1/0.01
   = 10.0

‘RISK RATIO’ (case-control sample from the same cohort)

Exposure    Case   Control   Total
YES        1,000       524   1,524
NO           100       576     676
Total      1,100     1,100   2,200

‘Risk’ in exposed = 1000/1524 = 0.656
‘Risk’ in non-exposed = 100/676 = 0.148

‘RR’ = 0.656/0.148
     = 4.44
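One way to see why this ‘risk ratio’ is an artifact: it changes with the number of controls sampled per case, even though the underlying cohort (true RR = 10.0) stays the same. The sketch below is illustrative only. It uses the cohort counts from the slides, assumes controls are drawn from the non-cases in proportion to their exposure, and uses expected (unrounded) control counts rather than an actual random sample; the function name is hypothetical.

```python
def pseudo_risk_ratio(controls_per_case):
    """'Risk ratio' naively computed from a case-control sample of the cohort."""
    # Source cohort from the slides (rows = exposure, columns = disease)
    exp_cases, exp_noncases = 1000, 9000
    unexp_cases, unexp_noncases = 100, 9900

    n_cases = exp_cases + unexp_cases
    n_controls = controls_per_case * n_cases
    # Expected exposure split of the controls mirrors the non-cases (9000 : 9900)
    frac_exposed = exp_noncases / (exp_noncases + unexp_noncases)
    exp_controls = n_controls * frac_exposed
    unexp_controls = n_controls - exp_controls

    risk_exp = exp_cases / (exp_cases + exp_controls)
    risk_unexp = unexp_cases / (unexp_cases + unexp_controls)
    return risk_exp / risk_unexp

for k in (1, 2, 10):
    print(f"{k} control(s) per case: 'RR' = {pseudo_risk_ratio(k):.2f}")
```

The value moves toward the true risk ratio only as the control group grows toward the full set of non-cases, which is exactly the information a case-control design does not collect.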

• CAN NOT COMPUTE A RISK RATIO!

• So, what do we do?
– Cornfield & Haenszel provided a solution in 1960. They looked at the ODDS of exposure. The ratio of the odds of exposure in the cases and controls is almost the same as the RR, if the disease is rare.

ODDS RATIO

                 Disease
Exposure      YES      NO   Total
YES           900     400   1,300
NO            100     600     700
Total       1,000   1,000   2,000

Odds of exposure in cases = 900/100
Odds of exposure in controls = 400/600

If exposure increases the rate of getting disease, you would expect to find more exposed cases than exposed controls. That is, the odds of exposure for cases would be higher. How much higher can be assessed by the ratio of one to the other:

Odds ratio (OR) = (Exp odds in cases) / (Exp odds in controls)
                = (900/100) / (400/600)
                = 13.5

ODDS RATIO

                Disease
Exposure      YES     NO   Total
YES             a      b     a+b
NO              c      d     c+d
Total         a+c    b+d       N

Odds of exposure in cases = a/c
Odds of exposure in controls = b/d

If exposure increases the rate of getting disease, you would expect to find more exposed cases than exposed controls. That is, the odds of exposure for cases would be higher (a/c > b/d). How much higher can be assessed by the ratio of one to the other:

Odds ratio (OR) = (Exp odds in cases) / (Exp odds in controls)
                = (a/c) / (b/d)
                = ad/bc

Comparing groups: Case-control (6)

                     Disease
Pollutant level    Yes    No
High                42    18
Low                 43    67

Odds of exp in cases = 42/43 = 0.977
Odds of exp in controls = 18/67 = 0.269

Odds ratio (OR) = Odds in cases / odds in controls
                = 0.977/0.269 = (42*67)/(43*18)
                = 3.64

NOTE: Risk ratio = 2.76, Rate ratio = 3.13
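The odds ratio reduces to the cross-product ad/bc, which a short function makes explicit. This is a sketch only; the name `odds_ratio` and the a, b, c, d order (exposed cases, exposed controls, unexposed cases, unexposed controls) are assumptions matching the table layout above.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table: (a/c) / (b/d), i.e. the cross-product ad/bc."""
    odds_exposure_cases = a / c
    odds_exposure_controls = b / d
    return odds_exposure_cases / odds_exposure_controls

# Generic case-control example and the pollutant example from the slides
print(round(odds_ratio(900, 400, 100, 600), 2))
print(round(odds_ratio(42, 18, 43, 67), 2))
```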

95% CI’s for OR (1)

• Use the same approach to obtain a ’95% CI’ for the OR as we used for CIR/IDR:

    OR ± 1.96 × se(OR)

• BUT: OR is NOT normally distributed
– Range from 0 to +∞
– Null value = 1.0
– Implies a non-symmetric distribution

95% CI’s for OR (2)

• Instead, use ‘log(OR)’, where the log is taken to the ‘natural’ base ‘e’
– Often written ln(OR)

• ln(OR) is approximately normally distributed
– Range from -∞ to +∞
– Null value = 0.0

• Need to find formula for ‘se(ln(OR))’

95% CI’s for OR (3)

Exposure    Case   Control
YES            a         b
NO             c         d
Total        a+c       b+d

    var(ln(OR)) = 1/a + 1/b + 1/c + 1/d

95% CI’s for OR (4)

We’re close now. Just take the ‘anti-logs’ (usually called the ‘exp’ function).

                     Disease
Pollutant level    Yes    No
High                42    18
Low                 43    67

Odds ratio (OR) = (42*67)/(43*18) = 3.64

var(ln(OR)) = 1/42 + 1/18 + 1/43 + 1/67 = 0.118

se(ln(OR)) = sqrt(0.118) = 0.343

Upper 95% CI = 3.64 * exp(+1.96*0.343) = 7.13
Lower 95% CI = 3.64 * exp(-1.96*0.343) = 1.86

Conclusion: OR is 3.64 (1.86 to 7.13)
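A sketch of the odds ratio interval, assuming var(ln(OR)) = 1/a + 1/b + 1/c + 1/d as in the worked example. The function name `odds_ratio_ci` is illustrative. The code keeps full precision throughout, so its limits can differ from a hand calculation with rounded intermediates by about 0.01.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95% CI computed on the log scale."""
    odds_ratio = (a * d) / (b * c)
    # Sum of reciprocals of the four cell counts
    se_ln = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = odds_ratio * math.exp(-z * se_ln)
    upper = odds_ratio * math.exp(+z * se_ln)
    return odds_ratio, lower, upper

or_est, lower, upper = odds_ratio_ci(42, 18, 43, 67)
print(f"OR = {or_est:.2f} ({lower:.2f} to {upper:.2f})")
```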

Comparing groups: Case-Control (7)

• Hypothesis testing (H0: OR=1)

• JUST USE THE STANDARD Chi-square TEST!
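As an illustration, the chi-square statistic for a 2×2 table can be computed with the shortcut formula N(ad – bc)²/((a+b)(c+d)(a+c)(b+d)). The sketch below assumes the Pearson statistic without continuity correction (the slides do not say which variant is meant) and obtains the 1-degree-of-freedom p-value from the standard library's `math.erfc`; the function name is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Pollutant case-control table from the slides
chi2, p = chi_square_2x2(42, 18, 43, 67)
print(f"chi-square = {chi2:.2f}, p = {p:.5f}")
```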

• You can compute an OR for a cohort. Why would you do so?
– OR’s are the key outcome measure for logistic regression, one of the most common analysis methods used in epidemiology
– Unless disease is common, the OR and the RR from the cohort will be very similar.

• But, where possible, rate ratios are preferred.
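The claim that the OR and RR agree when disease is rare can be checked numerically. In the sketch below, the first two tables come from the slides; the third ("hypothetical rare disease") is invented purely for illustration, and the function name is an assumption.

```python
def rr_and_or(a, b, c, d):
    """Return (risk ratio, odds ratio) for a 2x2 cohort table."""
    rr = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    return rr, odds_ratio

tables = {
    "slide cohort (risks 0.10 vs 0.01)": (1000, 9000, 100, 9900),
    "pollutant cohort (risks 0.34 vs 0.12)": (42, 80, 43, 302),
    # Hypothetical counts, not from the slides: risks 0.001 vs 0.0001
    "hypothetical rare disease": (10, 9990, 1, 9999),
}

for label, cells in tables.items():
    rr, odds_ratio = rr_and_or(*cells)
    print(f"{label}: RR = {rr:.2f}, OR = {odds_ratio:.2f}")
```

The gap between the two measures widens as the disease becomes more common in either group, which is why the OR overstates the RR in the pollutant example.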

Summary: comparisons

• Cohort studies
– Relative risk
– Relative rate
– Risk/rate differences

• Case-control study
– Odds ratio
