aspire session 5 part 1: interpreting resultsaspirekpco.weebly.com/uploads/1/5/9/3/15930538/...1...


Post on 14-Jul-2020


ASPIRE Session 5 Part 1: Interpreting Results

What is risk?

In medical care, we are constantly weighing risks. If I give this woman a bisphosphonate, what is her risk of developing osteonecrosis of the jaw? If I increase this man’s beta-blocker dose, what is his risk of bradycardia? If I change this patient’s SSRI from a brand-name to a generic, what is the risk she will have a conniption?

How to Calculate Risk

Risk expresses the likelihood an event will occur in a given population. Mathematically, that is:

Risk = (number of events) / (number of individuals in that population)
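As a minimal sketch of the formula above, the Python snippet below computes risk from counts. The function name `risk` is hypothetical, and the counts (60 injuries among 200 players without helmets) are the ones this module gives later in its 2x2 Table for the rugby example.

```python
def risk(events, population):
    """Return risk as a proportion: events divided by individuals in the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events / population


# Rugby example without helmets: 60 injured players out of 200 (counts from the module's 2x2 Table).
no_helmet_risk = risk(60, 200)
print(f"Risk without helmets: {no_helmet_risk:.0%}")
```

The result is a proportion between 0 and 1; multiplying by 100 (or formatting as a percentage, as here) gives the familiar "30%" form.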

Risk can be graphed, providing a handy way to explain this concept. Example: Consider the risk of head injury in rugby players. The National Urban Rugby Association League (NURAL) reports that 30% of players experienced neural injury last season. The NURAL mothers’ auxiliary decided to make the players wear helmets for a season and found the neural injury rate in NURAL players with helmets was 20%:

Quiz Question 1: What is the treatment effect of wearing helmets? Answer in your own words here: __________________________________ (hint: more than one correct answer!)


Does your answer match one of the multiple choice options below? The treatment effect of wearing helmets is:

a. 0.67
b. 33%
c. 10%
d. Cannot be determined from the data provided

Answer: In this case, options a, b, and c are all correct. They are just different ways of expressing the difference in risk. The relative risk (RR) of neural injury in the treatment group is 0.67, the relative risk reduction (RRR) is 0.33 or 33%, and 10% is the absolute risk difference (ARD, sometimes called the absolute risk reduction, but it isn’t always a reduction, so ARD is really a better term). Let’s talk more about these terms…

Relative Risk (RR)

When interpreting research results, we often use the term relative risk. Relative risk, in this sense, differs from the experience of marrying into a large, eccentric family and suddenly acquiring a bunch of zany in-laws… Our interest here is in understanding the risk of an outcome in a particular group of people exposed to something relative to the risk in non-exposed people (control group). In this case, we have a historical control: our control group outcomes were measured in the past.

Quiz Question 2: Draw a line to match the RR value to its correct interpretation

Compared to the control group, the therapy is associated with:

a. RR > 1        (a) a lower risk of the outcome
b. RR = 1        (b) a higher risk of the outcome
c. RR < 1        (c) the same risk of the outcome

Quiz Question 3: How do you calculate the Relative Risk (RR) of neural injury in players wearing helmets relative to the risk in those without helmets?

a. Risk in helmet group / (Risk in helmet group + Risk in no helmet group)

b. (Number of events in helmet group / Total people in helmet group) / (Number of events in no helmet group / Total people in no helmet group)

c. Number of events in helmet group / Number of events in no helmet group

d. Risk in no helmet group / Risk in helmet group

Answers follow, but try getting them yourself before peeking.


Q 2: a = (b), b = (c), c = (a)

Q 3: The correct answer is b. If you got it right, congrats! You’ve clearly been paying attention because you had to apply the concepts you learned earlier about risk (see “How to Calculate Risk” above). Give yourself 5 bonus points. Answer (d) actually provides the same info, but it is the inverse of what the question asked. It provides the risk of injury in the no-helmet group relative to the helmet group.

Plugging in the numbers for our rugby example, the RR of neural injury is:

RR = Risk in helmet group / Risk in non-helmet group = 20% / 30% = 0.67

Quiz Question 4: What type of variable must you be dealing with to calculate RR?

a. continuous
b. ordinal
c. dichotomous

The correct answer is c. RR, OR, and HR calculations all utilize yes/no-type variables.

Some people like to calculate RR using the statistician’s dear friend, the 2x2 Table. Allow me to introduce this useful tool:

Group                 Outcome +    Outcome -
Intervention Group    A            B
Control Group         C            D

The 2x2 Table allows us to conveniently write the RR formula as:

RR = [A/(A+B)] / [C/(C+D)]

For our NURAL folks, the 2x2 Table would look like this (we had 200 in each group):

Group         Neural injury    No neural injury    Total
Helmets       40               160                 200
No Helmets    60               140                 200

Quiz Question 5: Try calculating RR yourself… see if you get the same result I did above:

RR =

Relative Risk Reduction (RRR)

For some reason, our human brains naturally resist thinking in terms of relative risk, and tend to convert this figure to a relative risk reduction (RRR). The RRR = 1-RR, so it’s not like our brains are performing calculus here, but it’s still kind of an interesting phenomenon. For our rugby players, RRR = 1-0.67 = 0.33. In other words, helmet use is associated with a 33% relative risk reduction for neural injury, which, incidentally, made one out of 3 mothers quite happy.
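The 2x2 Table formula and the RRR can be sketched in a few lines of Python. This is only an illustration: the function name `relative_risk` is hypothetical, and the cell labels follow the module's A, B, C, D layout (A and B for the intervention group, C and D for the control group).

```python
def relative_risk(a, b, c, d):
    """RR from a 2x2 table.

    a, b: intervention group with / without the outcome
    c, d: control group with / without the outcome
    """
    risk_intervention = a / (a + b)
    risk_control = c / (c + d)
    return risk_intervention / risk_control


# NURAL table: helmets 40 injured / 160 not; no helmets 60 injured / 140 not.
rr = relative_risk(40, 160, 60, 140)
rrr = 1 - rr  # relative risk reduction

print(f"RR  = {rr:.2f}")
print(f"RRR = {rrr:.2f}")
```

Swapping the two rows would give the inverse ratio (option d in Quiz Question 3), so it helps to keep the intervention group in the first row.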


Absolute Risk Difference (ARD)

Another way to compare risk between groups that you probably know is the absolute risk difference (ARD), also called the absolute risk reduction (ARR). However, sometimes the risk difference is an increase rather than a reduction, so the more general term ARD is a little easier to use.

ARD = Risk (intervention) - Risk (control)

In cases where the risk of an event is higher in the control group, such as would be the case if your outcome is adverse drug events and your intervention therapy has a lower risk for these events, the ARD could also be calculated as:

ARD = Risk (control) - Risk (intervention)

To keep things simple, just put the bigger number first, unless you like dealing with negative numbers.

Number Needed to Treat (NNT) or Number Needed to Harm (NNH)

The cool thing about ARD is that it allows you to calculate a very useful parameter called Number Needed to Treat (NNT). A similar concept to NNT is the Number Needed to Harm (NNH). This applies to negative outcomes associated with your intervention, such as adverse drug events, and is calculated the same way.

The NNT is the inverse of the ARD, or 1/ARD. It represents the number of subjects who need to receive the intervention being studied (for the length of time of the study) in order to avoid one additional outcome. It’s important to specify one additional outcome, because we generally don’t succeed in preventing all of them. In our example, the ARD for helmets vs no helmets is 30% - 20% = 10%.

Quiz Question 6: How many NURAL players do we need to make wear a helmet for one season to avoid one additional neural injury? Write answer here: ________________

People often get confused about whether they should use % or the actual number when performing this calculation. The solution is just to be consistent with what is in the numerator and the denominator. If you like to use %, calculate NNT as:

100% / ARD (10% in our example) = 10

If you prefer the purity of plain numbers, calculate NNT as:

1 / 0.1 = 10

As you can see, you get the correct answer either way.
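The point about keeping the numerator and denominator in the same units can be sketched as follows. The function name and its `scale` parameter are hypothetical, introduced only for this illustration.

```python
def number_needed_to_treat(ard, scale=1.0):
    """NNT = scale / ARD.

    Use scale=1 when ARD is a proportion (0.1) and scale=100 when it is a percentage (10).
    """
    if ard == 0:
        raise ValueError("ARD of zero means no difference, so NNT is undefined")
    return scale / abs(ard)


# Rugby example: ARD is 10%, written two ways.
nnt_from_percent = number_needed_to_treat(10, scale=100)  # 100% / 10%
nnt_from_proportion = number_needed_to_treat(0.1)         # 1 / 0.1

print(f"NNT using percentages: {nnt_from_percent:.0f}")
print(f"NNT using proportions: {nnt_from_proportion:.0f}")
```

Mixing the two forms (for example 1 / 10) is the usual source of a wrong answer, which is why the sketch makes the scale explicit.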


Interpreting Confidence Intervals

The next logical question is whether this difference is statistically significant. Either the p value or the confidence interval will tell you this, but the CI also tells you the size and precision of the result. A 95% CI corresponds to a significance threshold of p = 0.05: when your p value is < 0.05, the 95% CI will not cross the value indicating no difference (either a 1 or a 0, depending on the outcome).

Some people get confused with how to tell when confidence intervals show statistical significance; they can never remember whether the significance “cutoff” is when 1 or 0 lies within the interval. You can figure it out by thinking “Hmmm…. What would the results look like if the two groups were the same?” There are two types of scenarios:

Most Common Scenario: For any outcome that uses a ratio (i.e. relative risk, odds ratio, hazard ratio; recall this means a dichotomous variable is in play here), a ratio of 1 means the top and bottom numbers are the same. That is, having 1 in the confidence interval means NO DIFFERENCE.

Less Common Scenario: The other scenario occurs when comparing a continuous variable between two groups, such as blood pressure, weight, or SAT scores. In this case, a value of 0 indicates no difference between groups, so a 0 within the CI means NO DIFFERENCE.

For example, say we measure the attention span for pharmacy residents reading a module on statistical interpretation. One group is given Starbucks coffee (group S), and the other is given chocolate milk (group M) before settling down to read their module. Group S lasts 22 minutes before they drop off and Group M lasts 28 minutes, for a difference of 6 minutes (95% CI = 1-11). In this case, no difference between the groups would be a value of 0 minutes. Since the CI does not include 0, we can conclude that chocolate milk is associated with a longer attention span than a pricey cup of joe. Our best guess is that this means a difference of 6 minutes, but we are 95% confident the true difference lies between 1 and 11 minutes.

If you were wondering about our NURAL case, the confidence interval for the RR (0.67) is (0.56-0.92). Is this statistically significant? A ratio of 1 would mean the 2 groups are the same, so look to see whether the interval crosses 1. It doesn’t, so the 2 groups are statistically different.

Quiz Question 7: (answers at end of module) For each scenario, write the letter corresponding to the correct description of the result

a. no difference in risk between the groups
b. outcome is less likely to occur in the experimental group vs the control group
c. outcome is more likely to occur in the experimental group vs the control group

1. RR = 1.21 (0.56 – 1.39) ____
2. RR = 1.15 (1.02 – 1.28) ____
3. RR = 0.87 (0.68 – 0.98) ____
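The "what would no difference look like?" rule can be written as a small check. This is a sketch only; the function name and the `measure` labels are hypothetical. It is applied to the two worked examples from the text, not to the quiz items.

```python
def ci_excludes_no_difference(lower, upper, measure="ratio"):
    """Return True when the interval excludes the no-difference value.

    Ratios (RR, OR, HR) use 1 as the no-difference value;
    differences in a continuous variable use 0.
    """
    if measure == "ratio":
        no_difference = 1.0
    elif measure == "difference":
        no_difference = 0.0
    else:
        raise ValueError("measure must be 'ratio' or 'difference'")
    return not (lower <= no_difference <= upper)


# Attention-span example: difference of 6 minutes, 95% CI 1 to 11.
print(ci_excludes_no_difference(1, 11, measure="difference"))

# NURAL example: RR 0.67, CI 0.56 to 0.92.
print(ci_excludes_no_difference(0.56, 0.92, measure="ratio"))
```

Both calls report that the interval excludes the no-difference value, matching the conclusions in the text. An interval that just touches 1 (or 0) is treated here as including it, which is an assumption of this sketch.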


RR vs AR: using this info wisely

Most journals like to report RR and RRR, because this number is generally more exciting than ARD. As a pharmacist evaluating new drug therapies, you need to take into account baseline risk when assessing the benefits and risks of new therapies. Here’s a story to illustrate this better:

The Badminton Association of America (BAA) got wind of the NURAL success with the new helmets, and they decided to implement the helmets in their own league. Here are their results:

As you can see, their baseline injury rate was lower than for the rugby players, but the relative effect of the helmets was the same.

Quiz Question 8: (answers at end of module) Calculate RR, RRR, ARD, and NNT for the BAA players:

RR =
RRR =
ARD =
NNT =

Notice anything interesting about the NNT? In a population where the baseline event rate is low, you have to treat a lot more individuals to prevent an outcome.

Also, you may wish to know, the confidence interval for this study of 400 BAA players was wide: RR=0.67 (95% CI 0.21 – 1.24). Is this statistically significant? Hint: The BAA players traded their helmets in for much cooler-looking sweat bands.

There’s no universally “acceptable” NNT at which a therapy should be adopted. You have to take into consideration risks, benefits, and cost.
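To see how NNT grows as baseline risk falls while the relative effect stays fixed, here is a small sketch. The 30% baseline is the rugby figure from this module; the 15% and 6% baselines are hypothetical values chosen only for illustration, and the function name is an assumption.

```python
def nnt_for_baseline(baseline_risk, relative_risk):
    """NNT implied by a baseline (control) risk and a fixed relative risk."""
    absolute_risk_difference = baseline_risk * (1 - relative_risk)
    return 1 / absolute_risk_difference


RELATIVE_RISK = 2 / 3  # same relative effect as the helmet example (RR of about 0.67)

# 0.30 is the rugby baseline; 0.15 and 0.06 are hypothetical populations.
for baseline in (0.30, 0.15, 0.06):
    nnt = nnt_for_baseline(baseline, RELATIVE_RISK)
    print(f"baseline risk {baseline:.0%}: NNT is about {nnt:.0f}")
```

With the same RRR of 33%, halving the baseline risk doubles the NNT, which is why a relative measure alone can make a therapy look more useful than it is in a low-risk population.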


Odds Ratio (OR)

What’s an odds ratio? It’s similar to RR in that it describes the strength of association between an event (e.g. experiencing a side effect or death) and an exposure, but differs in that it’s calculated using retrospective data (vs prospective for RR).

RR tells the risk of developing a disease or outcome given an exposure. It is used in prospective studies. When we start with a sample of a population, we know in advance who eats carrots and who does not. We can classify these people and measure them later to see who develops poor eyesight.

OR tells the odds of exposure, given an individual has the disease or outcome. It is used in retrospective, typically case-control studies. Case-control studies are trickier since, by design, you select subjects who already have the outcome of interest, such as poor eyesight, and match them to independently-selected controls without the outcome. Then you sort both groups according to who consumes carrots. Since you don’t have a common population base, you can’t calculate the true risk of poor eyesight.

[Diagram, prospective design: Population → Sample → Exposed or Unexposed → Outcome Yes or Outcome No in each group]

[Diagram, case-control design: Population → Cases (Disease Yes) and Controls (Disease No) → Exposure or No Exposure in each group]


If we accept two assumptions, we can make a decent estimate of disease risk within a case-control study:

Assumption 1: The control group represents the general population with respect to the presence of risk factors.
Assumption 2: The disease or outcome is relatively rare.

Let’s look at some actual case-control data using our friend the 2x2 Table. The investigators wanted to measure the relationship between hip fracture and proton pump inhibitors (PPIs). They identified subjects with a hip fracture, and then matched them to subjects without a hip fracture by birth year, sex, and calendar period. Next, they looked back at pharmacy records to quantify exposure to a PPI:

Exposure                    Hip fracture (Cases)    No Hip fracture (Controls)
> 1 year exposure to PPI    A = 571                 B = 12,985
No exposure to PPI          C = 3,351               D = 132,035

Since this outcome is rare, A and C are small compared to B and D, so we can use B and D as an estimate of A+B and C+D (as in the RR formula written from the 2x2 Table earlier). This estimate of relative risk is called the odds ratio (OR):

OR = (A/B) / (C/D)

Quiz Question 9: Calculate the OR for the above study: OR =

So we can now say that an individual with a hip fracture has odds of PPI exposure that are 1.7 times those of an individual without a hip fracture. If our assumptions are correct (the distribution of PPI use in the controls represents the general population, and hip fractures are rare), we know this is a reasonable estimate of the relative risk of a hip fracture given PPI exposure. We also should recall it is inappropriate to use the term RR, lest the guardians of statistical correctness rise up and zap us with a bell-shaped curve. Technically, you also can’t really calculate a true ARD, but you can calculate an estimate of ARD based upon the difference in odds between the two groups, and therefore an estimate of NNT or NNH.
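The OR calculation for the hip fracture table can be sketched as below. The function name is hypothetical. The second figure applies the RR-style expression to the same four cells purely to show how close the two expressions are when A and C are small; it is not a true risk ratio, because this is a case-control study.

```python
def odds_ratio(a, b, c, d):
    """OR = (A/B) / (C/D), using the module's 2x2 cell labels."""
    return (a / b) / (c / d)


# Hip fracture and PPI exposure counts from the table above.
a, b = 571, 12_985      # > 1 year PPI exposure: cases, controls
c, d = 3_351, 132_035   # no PPI exposure: cases, controls

or_estimate = odds_ratio(a, b, c, d)

# RR-style expression on the same cells, for comparison only (not a true RR here).
rr_style = (a / (a + b)) / (c / (c + d))

print(f"OR = {or_estimate:.2f}")
print(f"RR-style expression = {rr_style:.2f}")
```

Both values round to about 1.7, which illustrates the rare-outcome approximation: replacing A+B with B and C+D with D changes the result only slightly.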


Hazard Ratio (HR)

This is the last commonly-reported ratio to introduce. HRs are frequently employed in clinical trials studying new drug therapies. The calculation is more sophisticated than that for RR or OR and is based upon Cox proportional hazards regression.

The HR tells the hazard of an event or outcome over time given an exposure.

Recall, RR tells the risk of developing a disease or outcome given an exposure. It is calculated at a specific time point. For instance, we might calculate RR at the end of a 2-year study.

Notice: the main difference between these two measures is the incorporation of TIME, so you can think of HR as a RR wearing a wristwatch. Any time your outcome involves time, think about the Cox proportional hazards model.

It turns out that the sophisticated HR is too complex (some say uppity) to associate with our humble friend the 2x2 Table, but it does have a solid relationship with a nice graph named Kaplan-Meier.

This Kaplan-Meier curve represents the results of a cohort study in men with low testosterone levels. [JAMA 2013;310(17):1829-36] Of the 8709 men in the cohort, 1223 received testosterone replacement and the remainder did not. Over 5½ years of follow-up, men receiving testosterone had a significantly lower survival rate.
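For readers curious how a Kaplan-Meier curve is built, here is a minimal sketch of the product-limit estimate in plain Python. It is not the Cox model and does not use the testosterone study's data: the follow-up times and event flags below are entirely hypothetical, as is the function name.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time where an event occurred.

    times:  follow-up time for each subject
    events: True if the event was observed at that time, False if censored
    """
    subjects = sorted(zip(times, events))
    at_risk = len(subjects)
    survival = 1.0
    curve = []
    i = 0
    while i < len(subjects):
        current_time = subjects[i][0]
        event_count = 0
        leaving = 0
        # Group all subjects who share this follow-up time.
        while i < len(subjects) and subjects[i][0] == current_time:
            if subjects[i][1]:
                event_count += 1
            leaving += 1
            i += 1
        if event_count > 0:
            survival *= 1 - event_count / at_risk
            curve.append((current_time, survival))
        # Subjects with an event or censored at this time are no longer at risk.
        at_risk -= leaving
    return curve


# Hypothetical data: five subjects, follow-up in years; False marks censoring.
example_times = [1, 2, 2, 3, 4]
example_events = [True, True, False, True, False]

for time_point, probability in kaplan_meier(example_times, example_events):
    print(f"t = {time_point}: estimated survival {probability:.2f}")
```

Censored subjects never lower the curve themselves; they only shrink the number at risk for later steps, which is how the method uses partial follow-up that a single end-of-study RR would ignore.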


A word on bias and confounding: can you tell the difference?

The testosterone study brings up an opportunity to reinforce the really, really important concepts of bias and confounding. This was not a randomized trial; subjects received testosterone or not according to physician discretion, so differences may exist in patient characteristics that could affect the outcome (survival). If we look at some of the patient characteristics listed in Table 1 for this study, we see that this is true:

Table 1: (Selected) Characteristics of Patients at Study Entry

Characteristic                     No Testosterone Therapy (n=7,436)    Testosterone Therapy (n=1,223)    P value
Age                                63.8                                 60.6                              <0.001
Hypertension (%)                   92.9                                 90.0                              0.001
Hyperlipidemia (%)                 88.3                                 85.9                              0.02
Diabetes (%)                       55.7                                 53.2                              0.09
Depression (%)                     35.3                                 36.6                              0.37
Prior myocardial infarction (%)    21.7                                 18.6                              0.02

Quiz Question 10:

a. In looking at this table, which group would you assess to be “sicker”?
b. How would you expect this to affect the results?
c. How should the authors deal with this problem?
d. Is this difference an example of bias or confounding?

Answers:

a. The “no testosterone” group is older, and has significantly higher rates of hypertension, hyperlipidemia, and prior MI.

b. You would expect these differences to result in a higher mortality rate in the “no testosterone” group. However, we just saw that these folks have a lower mortality rate compared to testosterone users. You may wonder, did the authors account for these factors? If not, did they under-estimate the detrimental effects of testosterone?

c. In a randomized trial, these factors are usually evenly distributed between groups, but in a cohort study like this one, we need to use statistical methods to control for these confounders. A common technique for controlling for confounding factors is to use logistic regression modeling. This case was more complex, with lots of inter-correlated factors, so they used a different technique I’m not even going to attempt to describe. The important thing is that they controlled for the confounding factors.


d. This scenario represents confounding (in case you didn’t already conclude that from the prior paragraph). A confounder is a feature of study subjects that has not been separated from another feature; in this case, patient disease states could not be separated from their use of testosterone. It is a common problem with non-randomized studies and is important to address. In contrast, bias is a systematic error in the design, conduct, or analysis of a study that results in mistaken conclusions about the exposure’s effect on risk. If you need a brief review, here are a few examples:

Selection bias

In the 1948 presidential election, the Chicago Daily Tribune had to go to press before all the polling places had closed, so relied on a telephone poll of voters to predict the results. They incorrectly reported “Dewey Defeats Truman” on the front page of their paper. What went wrong? The poll only included voters with a telephone, and thus a disproportionate number of wealthier (& Republican) households, a classic example of selection bias.

Recall bias

Say you want to collect data from mothers whose babies are healthy and mothers whose kids are malformed. The data from the latter is usually more accurate, because mothers of malformed babies have carefully reviewed every illness that occurred during the pregnancy, every drug taken, every detail directly or remotely related to the tragedy in an attempt to find an explanation. Mothers of healthy infants, in contrast, don’t pay much attention to this information.

Observer bias

Let’s say two pairs of individuals are assigned to assess which houses in the neighborhood need painting. Jane and Kate examine the north half while Tom and Jerry take the south. The ladies report that 28% of the houses need painting, while the guys only identify 5%. Are they really that different or are the women just pickier?

Don’t forget to consider statistical versus clinical significance!


Summary of formulas covered in this unit:

Measure    Formula                          When to use              Description
RR         [A/(A+B)] / [C/(C+D)]            Prospective studies      Given exposure over a specified time period, what’s the likelihood of the outcome?
RRR        1-RR
ARD        |C/(C+D) - A/(A+B)|
OR         (A/B) / (C/D)                    Retrospective studies    Given an outcome has occurred, what’s the likelihood (odds) of having the exposure?
HR         Cox proportional hazard model    Survival analyses        Given exposure, how quickly is the event likely to occur in one group versus another?

Answers:

Quiz Question #7: 1. a  2. c  3. b
Quiz Question #8: RR=0.67  RRR=33%  ARD=1%  NNT=100