Canadian Medical Association Journal 1995; 152: 351-357
Abstract
In the third of a series of four articles, the authors illustrate the calculation of measures of association and discuss their usefulness in clinical decision making. From the rates of death or other "events" in experimental and control groups in a clinical trial, we can calculate the relative risk (RR) of the event after the experimental treatment, expressed as a percentage of the risk without such treatment. The absolute risk reduction (ARR) is the difference between the groups in the risk of an event. The relative risk reduction is the percentage of the baseline risk (the risk of an event among patients in the control group) removed as a result of the treatment. The odds ratio (OR), which is the measure of choice in case-control studies, is the ratio of the odds of an event in the experimental group to the odds of an event in the control group. The OR and the RR provide limited information in reporting the results of prospective trials because they do not reflect changes in the baseline risk. The ARR and the number needed to treat, which tells clinicians how many patients must be treated to prevent one event, reflect both the baseline risk and the relative risk reduction. If the timing of events is important (for example, to determine whether treatment prolongs life), survival curves are used to show when the events occur.
The reader familiar with the first two articles in this
series will, when presented with the results of a clinical trial, know how to
discover the range within which the treatment effect likely lies. This treatment
effect is worth considering if it comes from a study that is valid [1]. In this
article, we explore the ways investigators and representatives of pharmaceutical
companies may present the results of a trial.
When clinicians look at the results of clinical trials they are interested in the association between a treatment and an outcome. There may be no association; for example, there may be no difference between groups in the mean value of an indicator such as blood pressure, or the risk of an adverse event such as death may be the same in both groups. Alternatively, the trial results may show a decreased risk of adverse outcomes in patients receiving the experimental treatment. In a study examining a putatively harmful agent, there may be no increase in risk among patients exposed to the agent in comparison with those in a control group; alternatively, there may be an association between exposure and an adverse event, which suggests that the agent is indeed harmful. In this article, we examine how one can express the magnitude of these associations.
When investigators present results that show a difference in the mean value of a clinical measurement between two groups, the interpretation is usually straightforward. However, when they present results that show the proportion of patients who suffered an adverse event in each group, interpretation may be more difficult. In this situation they may express the strength of the association as a relative risk, an absolute risk reduction or an odds ratio. Understanding these measures is challenging and important; they will provide the focus of this article. We will examine the relative merits of the different measures of association and show how they can lead clinicians to different conclusions.
Introducing the 2 × 2 table
A crucial concept in analysing the
efficacy of therapeutic interventions is the "event." Analysis often examines
the proportion of patients who suffered a particular outcome (the "event") in
the treatment and control groups. This is always true when the outcome is
clearly a dichotomous variable — that is, a discrete event that either occurs or
does not occur. Examples of dichotomous outcomes are the occurrence of negative
events, such as stroke, myocardial infarction, death or recurrence of cancer, or
positive events, such as ulcer healing or resolution of symptoms. Not only an
event's occurrence but also its timing may be important. We will return to this
issue later.
Even if the results are not of a yes-or-no form, investigators sometimes choose to present them as if they were. Investigators may present variables such as duration of exercise before chest pain develops, number of episodes of angina per month, change in lung function or number of visits to the emergency room as mean values in each of the two groups. However, they may also transform these values into dichotomous data by specifying a threshold or degree of change that constitutes an important improvement or deterioration and then examining the proportion of patients above and below this threshold. For example, investigators in one study used forced expiratory volume in 1 second (FEV1) to assess the efficacy of therapy with corticosteroids taken orally by patients with a chronic stable airflow limitation; they defined an "event" as an improvement in FEV1 of more than 20% over the baseline value [2].
The results of trials with dichotomous outcomes can usually be presented in the form of a 2 × 2 table (Table 1). For instance, in a randomized trial investigators compared rates of death among patients with bleeding esophageal varices controlled by either endoscopic ligation or sclerotherapy [3]. After a mean follow-up period of 10 months, 18 of 64 patients assigned to ligation died, as did 29 of 65 patients assigned to sclerotherapy. Table 2 summarizes the data from this trial in a 2 × 2 table.
Table 1: Sample 2 × 2 table
Exposure | Outcome: yes | Outcome: no
Yes | A | B
No | C | D
Table 2: Results from a randomized trial comparing treatment of bleeding esophageal varices with endoscopic sclerotherapy and with ligation*
Intervention | Death, no. of patients | Survival, no. of patients | Total no. of patients treated
Ligation | 18 | 46 | 64
Sclerotherapy | 29 | 36 | 65
*Reprinted with permission from N Engl J Med 1992; 326: 1527-1532.
Relative risk
The first thing we can determine from the 2 × 2 table
is that the risk of an event (death, in this case) was 28.1% (18/64) in the
ligation group and 44.6% (29/65) in the sclerotherapy group. The ratio of these
risks is called the relative risk (RR) or the risk ratio. This value tells us
the risk of the event after the experimental treatment (in this case, ligation),
as a percentage of the original risk (in this case, the risk of death after
sclerotherapy). From Table 1, the formula for calculating the RR from the data
gathered is [A/(A + B)]/[C/(C + D)]. In our example, the RR of death after
receiving initial ligation compared with sclerotherapy is 18/64 (the risk in the
ligation group) divided by 29/65 (the risk in the sclerotherapy group), which
equals 63%. That is, the risk of death after ligation is about two thirds as
great as the risk of death after sclerotherapy.
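The calculation can be sketched in a few lines of Python. The function name `relative_risk` and its argument names are illustrative assumptions, not part of the original article; the counts are those shown in Table 2.

```python
def relative_risk(a, b, c, d):
    """RR = [A/(A + B)] / [C/(C + D)], using the cell labels of Table 1."""
    risk_experimental = a / (a + b)  # risk in the first-row (experimental) group
    risk_control = c / (c + d)       # risk in the second-row (control) group
    return risk_experimental / risk_control

# Table 2: ligation (18 deaths, 46 survivors) v. sclerotherapy (29 deaths, 36 survivors)
rr = relative_risk(18, 46, 29, 36)
print(f"risk of death, ligation:      {18 / 64:.1%}")
print(f"risk of death, sclerotherapy: {29 / 65:.1%}")
print(f"relative risk:                {rr:.2f}")
```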
Absolute risk reduction
The difference in the risk of the outcome
between patients who have undergone one therapy and those who have undergone
another is called the absolute or attributable risk reduction (ARR) or the risk
difference. The formula for its calculation, from Table 1, is [C/(C + D)] -
[A/(A + B)]. This measure tells us the percentage of patients who are spared the
adverse outcome as a result of having received the experimental rather than the
control therapy. In our example, the ARR is 0.446 minus 0.281, which equals
0.165, or 16.5%.
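As a minimal sketch, the same subtraction can be written in Python. The function name `absolute_risk_reduction` is a hypothetical label chosen for illustration; the cell labels follow Table 1 and the counts come from Table 2.

```python
def absolute_risk_reduction(a, b, c, d):
    """ARR = [C/(C + D)] - [A/(A + B)], using the cell labels of Table 1."""
    risk_experimental = a / (a + b)
    risk_control = c / (c + d)
    return risk_control - risk_experimental

# Table 2: ligation is the experimental row, sclerotherapy the control row.
arr = absolute_risk_reduction(18, 46, 29, 36)
print(f"ARR = {arr:.3f}")
```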
Relative risk reduction
Another measure used to assess the
effectiveness of treatment is relative risk reduction (RRR). One considers first
the risk of an adverse event among patients taking the placebo or, if two
therapies are being compared, the risk among patients receiving the standard or
inferior therapy. This is called the baseline risk. The relative risk reduction
is an estimate of the percentage of baseline risk that is removed as a result of
the therapy; it is calculated as the ARR between the treatment and control
groups, divided by the absolute risk among patients in the control group; from
Table 1, {[C/(C + D)] - [A/(A + B)]}/[C/(C + D)]. In our example, the RRR is
calculated by dividing 16.5% (the ARR) by 44.6% (the risk among patients
receiving sclerotherapy), which equals 37%. One may also derive the RRR by
subtracting the RR from 1. In our example, the RRR is equal to 1 minus 0.63, or
0.37 (37%).
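The two routes to the RRR can be checked against each other with a short Python sketch. The variable names are illustrative assumptions; the risks are computed from the counts in Table 2.

```python
risk_treated = 18 / 64   # risk of death after ligation
risk_control = 29 / 65   # baseline risk (death after sclerotherapy)

# Route 1: the ARR divided by the baseline risk.
rrr_from_arr = (risk_control - risk_treated) / risk_control

# Route 2: 1 minus the relative risk.
rrr_from_rr = 1 - risk_treated / risk_control

print(f"RRR from the ARR: {rrr_from_arr:.2f}")
print(f"RRR from the RR:  {rrr_from_rr:.2f}")
```

Both routes are algebraically identical, so any difference between the two printed values could only come from floating-point rounding.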
Odds ratio
Instead of looking at the risk of an event, we could
estimate the odds of an event occurring. In our example, the odds of death after
ligation are 18 (death) versus 46 (survival), or 18/46 (A/B), and the odds of
death after sclerotherapy are 29 versus 36 (C/D). The formula for the ratio of
these odds, called (not surprisingly) the odds ratio (OR), is (A/B)/(C/D). In
our example, this calculation yields (18/46)/(29/36), which equals 0.49.
The OR is probably less familiar to physicians than risk or RR. However, the OR is usually the measure of choice in the analysis of case-control studies. In general, the OR has certain optimal statistical properties that make it the fundamental measure of association in many types of studies [4]. These statistical advantages may be particularly important when data from several studies are combined, as they are in a meta-analysis. Among such advantages, the comparison of risk represented by the OR does not depend on whether the investigator chose to determine the risk of an event occurring (e.g., death) or not occurring (e.g., survival). This is not true for relative risk. In some situations the OR and the RR will be close, for example in case-control studies of a rare disease.
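The symmetry property described above can be illustrated with the Table 2 counts. In this sketch (the function names `odds_ratio` and `risk_ratio` are assumptions made for illustration), switching the outcome of interest from death to survival turns the OR into its exact reciprocal, whereas the RR for survival is not the reciprocal of the RR for death.

```python
def odds_ratio(a, b, c, d):
    """OR = (A/B)/(C/D), using the cell labels of Table 1."""
    return (a / b) / (c / d)

def risk_ratio(a, b, c, d):
    """RR = [A/(A + B)] / [C/(C + D)], using the cell labels of Table 1."""
    return (a / (a + b)) / (c / (c + d))

# Outcome counted as death: ligation 18 v. 46, sclerotherapy 29 v. 36.
or_death = odds_ratio(18, 46, 29, 36)
rr_death = risk_ratio(18, 46, 29, 36)

# Outcome counted as survival: the two columns are swapped.
or_survival = odds_ratio(46, 18, 36, 29)
rr_survival = risk_ratio(46, 18, 36, 29)

print(f"OR (death) = {or_death:.2f}, 1/OR (survival) = {1 / or_survival:.2f}")
print(f"RR (death) = {rr_death:.2f}, 1/RR (survival) = {1 / rr_survival:.2f}")
```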
RR versus OR versus ARR: Why the fuss?
The important distinction
among the ARR, the RR and the OR may be illustrated by modifying the death rates
in each of the two treatment groups shown in Table 2. In the explanation that
follows, the reader should note that the effect on the various expressions of
risk depends on the way the death rates are changed. We could alter the death
rates by the same absolute amount in each group, by the same relative amount, or
in some other way.
There is some evidence that, when treatment reduces the rate of death, the reduction in rates or proportion of deaths will often be similar in each subgroup of patients [5]. In our example, if we assume that the number of patients who died decreased by 50% in both groups, the risk of death in the ligation group would decrease from 28% to 14% and in the sclerotherapy group from 44.6% to 22.3%. The RR would be 14/22.3 or 0.63, the same as before. The OR would be (9/55)/(14.5/50.5) or 0.57, which differs moderately from the OR based on the higher death rate (0.49), and is closer to the RR. The ARR would decrease from 16.5% to approximately 8%. Thus, a decrease in the proportion of patients who died in both groups by a factor of two leaves the RR unchanged, results in a moderate increase in the OR and reduces the ARR by a factor of two. This example highlights the fact that the same RR can be associated with very different ORs and ARRs. A major change in the risk of an adverse event without treatment (or, as in this case, with the inferior treatment) will not be reflected in the RR and will be reflected only partly in the OR; in contrast, the ARR changes markedly with a change in the baseline risk.
Hence, the RR and the OR do not tell us the magnitude of the absolute risk. An RR of 33% may mean that the treatment reduces the risk of an adverse outcome from 3% to 1% or from 60% to 20%. The clinical implications of these risk reductions are very different. Consider a therapy with severe side effects. If such side effects occur in 5% of patients treated, and the treatment reduces the probability of an adverse outcome from 3% to 1%, we probably will not institute this therapy. However, we may be willing to accept this incidence of side effects if the therapy reduces the probability of an adverse outcome from 60% to 20%. In the latter situation, of every 100 patients treated 40 would benefit and 5 would suffer side effects — a trade-off that most would consider worth while.
The RRR behaves the same way as the RR: it does not reflect the change in the underlying risk in the control population. In our example, if the incidence of adverse events decreased by approximately 50% in both groups, the RRR would be the same as it was at the previous incidence rate: (22.3 - 14)/22.3 or 0.37. The RRR therefore shares with the RR the disadvantage of not reflecting the baseline risk.
These observations depend on the assumption that the death rates in the two groups change by the same proportion. If these changes are not proportional the conclusions may be different. For instance, suppose that the rates of death between the two groups differ by 10 percentage points; for example, if the death rates are 80% and 90%, respectively, the RR is 0.8/0.9 or 89%, the RRR 11%, the ARR 10% and the OR 0.44. If the rates of death then decrease by 50 percentage points in each group, to 30% and 40% respectively, the RR would be 0.3/0.4 or 75%, the RRR 25%, the ARR 10% and the OR 0.64. In this case, the ARR remains constant and thus does not reflect the change in the magnitude of risk without therapy. In contrast, the other indices differ in the two cases and hence reflect the change in the baseline risk.
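The scenarios discussed in this section can be compared side by side with a small Python sketch. The helper `measures` and the scenario labels are illustrative assumptions. The "both halved" row uses the exact survivor counts implied by halving the deaths, so its OR may differ in the second decimal place from a hand calculation based on rounded counts.

```python
def measures(risk_treated, risk_control):
    """Return the RR, RRR, ARR and OR for a pair of risks (as proportions)."""
    rr = risk_treated / risk_control
    odds_treated = risk_treated / (1 - risk_treated)
    odds_control = risk_control / (1 - risk_control)
    return {
        "RR": rr,
        "RRR": 1 - rr,
        "ARR": risk_control - risk_treated,
        "OR": odds_treated / odds_control,
    }

scenarios = {
    "Table 2 rates": (18 / 64, 29 / 65),
    "both halved":   (9 / 64, 14.5 / 65),  # same relative change in each group
    "80% v. 90%":    (0.80, 0.90),
    "30% v. 40%":    (0.30, 0.40),         # same absolute change in each group
}

for label, (risk_treated, risk_control) in scenarios.items():
    rounded = {k: round(v, 2) for k, v in measures(risk_treated, risk_control).items()}
    print(f"{label:14s} {rounded}")
```

Halving both rates leaves the RR and RRR unchanged and halves the ARR, whereas shifting both rates by 50 percentage points leaves the ARR unchanged and alters the other three measures.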
Number needed to treat
The number needed to treat (NNT) is the most
recently introduced measure of treatment efficacy [7]. Let us return to our 2 ×
2 tables for a short exercise. In Table 2 we see that the risk of death in the
ligation group is 28.1% and in the sclerotherapy group 44.6%. Therefore,
treating 100 patients with ligation rather than sclerotherapy will save the
lives of between 15 and 16 patients, as shown by the ARR. If treating 100
patients prevents 16 adverse events, how many patients do we need to treat to
prevent 1 event? The answer is 100 divided by 16, which yields approximately 6.
This is the NNT. One can also arrive at this number by taking the reciprocal of
the ARR (1/ARR). Since the NNT is related to the ARR, it is not surprising that
the NNT also changes with a change in the underlying risk.
The NNT is directly related to the proportion of patients in the control group who suffer an adverse event. For instance, if the incidence of these events (the baseline risk) decreased by a factor of two and the RRR remained constant, treating 100 patients with ligation would mean that 8 events had been avoided, and the NNT would double, from 6 to 12. In general, the NNT changes inversely in relation to the baseline risk. If the risk of an adverse event doubles, we need treat only half as many patients to prevent the same number of adverse events; if the risk decreases by a factor of four, we must treat four times as many patients to achieve the same result.
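The inverse relation between the NNT and the baseline risk can be sketched as follows, holding the RRR constant. The function name `number_needed_to_treat` is a hypothetical label; the baseline risk and RRR are taken from the Table 2 counts, and the scaling factors are chosen only for illustration.

```python
def number_needed_to_treat(baseline_risk, rrr):
    """NNT = 1/ARR, where ARR = baseline risk x relative risk reduction."""
    arr = baseline_risk * rrr
    return 1 / arr

baseline = 29 / 65                    # risk of death after sclerotherapy
rrr = 1 - (18 / 64) / (29 / 65)       # relative risk reduction with ligation

nnt_by_factor = {}
for factor in (1, 0.5, 0.25):
    nnt_by_factor[factor] = number_needed_to_treat(baseline * factor, rrr)
    print(f"baseline risk x {factor}: NNT is about {nnt_by_factor[factor]:.1f}")
```

The NNT is left unrounded here so that the doubling is visible; in practice it is reported as a whole number of patients.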
Back to the 2 × 2 table
The data we have presented so far could have
been derived from the original 2 × 2 table (Table 2). The ARR and its
reciprocal, the NNT, incorporate the influence of any change in baseline risk,
but they do not tell us the magnitude of the baseline risk. For example, an ARR
of 5% (and a corresponding NNT of 20) may represent reduction of the risk of
death from 10% to 5% or from 50% to 45%. The RR and RRR do not take into account
the baseline risk, and the clinical utility of these measures suffers as a
result.
Whichever way we choose to express the efficacy of a treatment, we must keep in mind that the 2 × 2 table reflects results at a given time. Therefore, our comments on the RR, the ARR, the RRR, the OR and the NNT must be qualified by giving them a time frame. For example, we must say that use of ligation rather than sclerotherapy for a mean period of 10 months resulted in an ARR of 17% and an NNT of 6. The results could be different if the duration of observation was very short, in which case there was little time for an event such as death to occur, or very long, in which case it is much more likely that an event will occur (e.g., if the outcome is death, after 100 years of follow-up all of the patients will have died).
Confidence intervals
We have presented all of the measures of
association for treatment with ligation versus sclerotherapy as if they
represented the true effect. As we pointed out in the previous article in this
series, the results of any experiment are an estimate of the truth. The true
effect of treatment may actually be greater or less than what we observed. The
confidence interval tells us, within the bounds of plausibility, how much
greater or smaller the true effect is likely to be. Confidence intervals can be
calculated for each of the measures of association we have discussed.
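As one illustration, a 95% confidence interval for the ARR in Table 2 can be approximated as follows. The article does not say which method should be used; this sketch assumes the simple normal (Wald) approximation for a difference between two proportions, and the function name `arr_with_ci` is hypothetical.

```python
import math

def arr_with_ci(events_treated, n_treated, events_control, n_control, z=1.96):
    """ARR with an approximate (Wald) confidence interval; z = 1.96 gives 95%."""
    p_treated = events_treated / n_treated
    p_control = events_control / n_control
    arr = p_control - p_treated
    # Standard error of a difference between two independent proportions.
    se = math.sqrt(p_treated * (1 - p_treated) / n_treated
                   + p_control * (1 - p_control) / n_control)
    return arr, arr - z * se, arr + z * se

# Table 2: 18 deaths among 64 patients (ligation) v. 29 among 65 (sclerotherapy)
arr, lower, upper = arr_with_ci(18, 64, 29, 65)
print(f"ARR = {arr:.1%} (approximate 95% CI {lower:.1%} to {upper:.1%})")
```

With a trial of this size the interval is wide, which is a reminder that the point estimates used throughout this article are only estimates.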
Survival data
As we pointed out, the analysis of a 2 × 2 table is an
examination of the data at a specific time. Such analysis is satisfactory if we
are investigating events that occur within relatively short periods and if all
patients are followed for the same duration. However, in longer-term studies we
are interested not only in the number of events but also in their timing. We
may, for instance, wish to know whether therapy for a fatal condition such as
severe congestive heart failure or unresectable lung cancer delays death.
When the timing of events is important, the results can be presented in several 2 × 2 tables constructed at certain points after the beginning of the study. In this sense, Table 2 showed the situation after a mean of 10 months of follow-up. Similar tables could be constructed to show the fate of all patients at given times after their enrolment in the trial, i.e., at 1 week, 1 month, 3 months or whatever intervals we choose. An analysis of accumulated data that takes into account the timing of events is called survival analysis. Despite the name, such analysis is not restricted to deaths; any discrete event may be studied in this way.
The survival curve of a group of patients shows the status of the patients at different times after a defined starting point [8]. In Fig. 1, we show an example of a survival curve taken from a trial of treatments of bleeding varices. Although the mean follow-up period in this trial was 286 days, the survival curve extends beyond this time, presumably to a point at which the number of patients still at risk is sufficient to make reasonably confident predictions. At a later point, prediction would become very imprecise because there would be too few patients to estimate the probability of survival. This imprecision can be captured by confidence intervals or bands extending above and below the survival curves.
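To show how such a curve can be built from follow-up data, the sketch below uses the Kaplan-Meier (product-limit) estimator. This is an assumption: the article does not state how the curve in Fig. 1 was estimated, and the follow-up times below are hypothetical, not data from the trial. Patients who leave follow-up without the event are treated as censored.

```python
def kaplan_meier(times, events):
    """Return [(time, estimated probability of surviving beyond that time)].

    times  -- follow-up time for each patient
    events -- 1 if the patient had the event at that time, 0 if censored
    """
    observations = sorted(zip(times, events))
    at_risk = len(observations)
    survival = 1.0
    curve = []
    i = 0
    while i < len(observations):
        t = observations[i][0]
        n_events = 0
        n_leaving = 0
        # Group every patient whose follow-up ends at time t.
        while i < len(observations) and observations[i][0] == t:
            n_events += observations[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            # Multiply by the proportion of at-risk patients surviving time t.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve

# Hypothetical follow-up times in months (illustrative only).
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"month {t:>2}: estimated survival {s:.3f}")
```

Because the number of patients still at risk shrinks over time, each late event moves the estimate by a larger step, which is the imprecision at the tail of the curve described above.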
Hypothesis tests can be applied to survival curves, the null hypothesis being that there is no difference between two curves. In the first article in this series, we described how an analysis based on hypothesis testing can be adjusted or corrected for differences in the two groups at the baseline. If one group were older (and thus had a higher risk of the adverse outcome) or had less severe disease (and thus had a lower risk), the investigators could conduct an analysis that takes into account these differences. Such an analysis tells us, in effect, what would have happened if the two groups had comparable risks of adverse outcomes at the start of the trial.
Case-control studies
The examples we have used so far have been
prospective randomized controlled trials. In such trials we start with an
experimental group of patients who are subject to an intervention and a control
group of patients who are not. The investigators follow the patients over time
and record the incidence of events. The process is similar in prospective cohort
studies, although in this study design the "exposure" or treatment is not
controlled by the investigators. Instead of being assigned to receive or not
receive the intervention, patients are chosen, sampled or classified according
to whether they were or were not exposed to the treatment or risk factor. In
both randomized trials and prospective cohort studies we can calculate risks,
ARRs and RRs.
In case-control studies participants are chosen or sampled not according to whether they have been exposed to the treatment or risk factor but on the basis of whether they have experienced an event. Participants start the study with or without the event rather than with or without the exposure or intervention. Patients with the adverse outcome (be it stroke, myocardial infarction or cancer) are compared with control patients who have not suffered the outcome. The investigators wish to determine if any factor seems to be more common in one of these groups than in the other.
In one case-control study investigators examined whether the use of sun-beds or sun-lamps increased the risk of melanoma [9]. They identified 583 patients with melanoma and 608 control patients. The control and case patients had similar distributions of age, sex and region of residence. The results for men and women were presented separately (those for men are shown in Table 3).
Table 3: Results from a case-control study of the association between melanoma and the use of sun-beds and sun-lamps*
Ever exposed to sun-beds or sun-lamps | Case patients, no. | Control patients, no.
Yes | 67 | 41
No | 210 | 242
*Reproduced with permission from Walter SD, Marrett LD, From L, et al: The association of cutaneous malignant melanoma with the use of sunbeds and sunlamps. Am J Epidemiol 1990; 131: 232-243.
If the information in Table 3 came from a prospective cohort study or randomized controlled trial we could begin by calculating the risk of an event in the experimental and control groups. However, this would not make sense in a case-control study because the number of patients who did not have melanoma was chosen by the investigators. For calculation of the RR we need to know the population at risk, and this information is not available in a case-control study.
The only measure of association that makes sense in a case-control study is the OR. One can investigate whether the odds of having been exposed to sun-beds or sun-lamps among the patients with melanoma are the same as the odds of exposure among the control patients. In the study the odds were 67/210 in the patients with melanoma and 41/242 in the control patients. The odds ratio is therefore (67/210)/(41/242) or 1.88 (95% confidence interval [CI] 1.20 to 2.98), which suggests an association between the use of sun-beds or sun-lamps and melanoma. The fact that the CI does not include 1.0 means that the association is unlikely to be due to chance.
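The OR for the men in this study can be reproduced from the four cell counts. The sketch below also attaches an approximate 95% CI using the logit (Woolf) method; that choice of method is an assumption, and the limits it gives need not match the published interval exactly, which may have been derived by a different method. The function name `odds_ratio_with_ci` is hypothetical.

```python
import math

def odds_ratio_with_ci(exposed_cases, exposed_controls,
                       unexposed_cases, unexposed_controls, z=1.96):
    """Exposure odds ratio with an approximate CI on the log-odds scale."""
    odds_cases = exposed_cases / unexposed_cases           # odds of exposure, cases
    odds_controls = exposed_controls / unexposed_controls  # odds of exposure, controls
    or_ = odds_cases / odds_controls
    # Woolf's standard error of log(OR): square root of the summed reciprocals.
    se_log = math.sqrt(1 / exposed_cases + 1 / exposed_controls
                       + 1 / unexposed_cases + 1 / unexposed_controls)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper

# Men in the melanoma study: 67 of 277 case patients and 41 of 283 control patients exposed
or_, lower, upper = odds_ratio_with_ci(67, 41, 210, 242)
print(f"OR = {or_:.2f} (approximate 95% CI {lower:.2f} to {upper:.2f})")
```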
Even if the association were not due to chance, this does not necessarily mean that the sun-beds or sun-lamps were the cause of melanoma in these patients. Potential explanations could include higher recollection of use of these devices among patients with melanoma (recall bias), longer exposure to sun among these patients or different skin colour. (In fact, in this study the investigators addressed many of these possible explanations.) Confirmatory studies would be needed to be confident that exposure to sun-beds or sun-lamps was the cause of melanoma.
Which measure of association is best?
In randomized trials and
cohort studies, investigators can usually choose from several measures of
association. Which should the reader hope to see? We believe that the best
option is to show all of the data, in the form of 2 × 2 tables or life tables
(deaths or other events during follow-up presented in tabular form), and then
consider both the relative and absolute figures. As the reader examines the
results, she or he will find the ARR and its reciprocal, the NNT, the most
useful measures for deciding whether to institute treatment. As we have
discussed, the RR and the RRR do not take baseline risk into account and can
therefore be misleading.
In fact, clinicians make different decisions depending on the way the results are reported. Clinicians consistently judge a therapy to be less effective when the results are presented in the form of the NNT than when any other measure of association is used [10-13].
Interpreting study results
We complete this exposition by reviewing
the results of a landmark study — the Lipid Research Clinics Coronary Primary
Prevention Trial — of the usefulness of therapy to lower serum cholesterol
levels [14]. In this randomized, placebo-controlled trial the investigators
tested the hypothesis that a reduction in cholesterol levels reduces the
incidence of coronary heart disease (CHD). They followed 3806 asymptomatic
middle-aged men with primary hypercholesterolemia (serum cholesterol levels
above the 95th percentile), of whom one third were smokers, for a mean period of
7.4 years. Patients in one group received cholestyramine (24 g/d) and those in
the other a placebo. The main outcome measures (events) were death due to CHD
and nonfatal myocardial infarction. After 7.4 years of follow-up the results
showed an ARR of 1.71% (95% CI -0.11% to 3.53%) and an NNT of 58 (the 95% CI for
the NNT ranges from the possibility that the therapy causes one additional event per 935
patients treated to the possibility that treating 28 patients prevents one event). The original
report did not provide CIs for the RR and the ARR. We used the original data to
calculate these measures and the associated CIs, so our point estimates differ
slightly from the adjusted estimates given in the original report.
The risk of an event was 9.8% among the patients taking a placebo and 8.1% among those receiving cholestyramine. The RR of an event for those taking cholestyramine versus those taking a placebo was 83% (95% CI 68% to 101%). The use of cholestyramine was associated with a 17% reduction in the incidence of an event (RRR), with a 95% CI from a 33% reduction in risk to a 1% increase in risk, and with prevention of 17 primary events per 1000 patients treated. Therefore, 58 patients (100/1.7) needed to be treated for 7 years to prevent one primary event.
In addition to calculating the NNT, one could also consider resources expended to prevent an event. The cost of a month's supply of cholestyramine is $120.49. The cost of the drug required to prevent one event is 58 (the NNT) × 7 years of follow-up × 12 months per year × $120.49 for a 1-month supply = $587 027.28. Alternatively, to prevent one event, patients need to take 24 g/d × 58 (NNT) × 365 days per year × 7 years of follow-up = 3 556 560 g, or approximately 3.56 tonnes of cholestyramine to swallow.
If one considered only patients with a lower risk of CHD (younger men, women, nonsmokers and those with cholesterol levels that are elevated but not above the 95th percentile) the NNT would rise. It is not surprising that advertisements promoting the use of cholesterol-lowering drugs cite the RRR rather than the ARR or the NNT and do not mention the cost per event prevented.
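The arithmetic behind these two figures is easy to reproduce and to rerun with other prices or doses. In this sketch the constant names are illustrative assumptions; the values are those quoted in the paragraph above.

```python
NNT = 58                 # patients treated to prevent one event
YEARS = 7                # duration of treatment used in the calculation
COST_PER_MONTH = 120.49  # dollars for a 1-month supply of cholestyramine
DOSE_G_PER_DAY = 24      # daily dose in grams

# Drug cost to prevent one event: patients x years x 12 months x monthly cost.
drug_cost = NNT * YEARS * 12 * COST_PER_MONTH

# Mass of drug taken to prevent one event: dose x patients x 365 days x years.
drug_mass_g = DOSE_G_PER_DAY * NNT * 365 * YEARS

print(f"drug cost per event prevented: ${drug_cost:,.2f}")
print(f"drug taken per event prevented: {drug_mass_g:,} g "
      f"(about {drug_mass_g / 1_000_000:.2f} tonnes)")
```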
The results of this study provide another caution for the clinician. The results we have described are based on the incidence of both fatal and nonfatal coronary events. However, the death rates shown in this study were similar in the two groups: there were 71 deaths among patients receiving placebo and 68 among patients receiving cholestyramine. Furthermore, when investigators have examined all trials of drug therapy for lowering cholesterol, they have found a possible association between administration of these agents and death from causes other than cardiovascular disease [15]. As this result highlights, the wary user of the medical literature must be sure that all relevant outcomes are reported [16].
ARRs are easy to calculate, as is their reciprocal, the NNT. If the NNT is not presented in trial results, clinicians who wish to get the best sense of the effect of an intervention should take the trouble to determine the number of patients they need to treat to prevent an event as well as the cost and toxic effects associated with treatment of that number of patients. These measures will help clinicians to weigh the benefits and costs of treatments.
References