MATHEMATICS IN MEDICINE: November 2007

Medical research relies on clinical trials to assess therapeutic benefits. Because of the effort and cost involved in these studies, investigators frequently use analyses of subgroups of study participants to extract as much information as possible. Such analyses, which assess the heterogeneity of treatment effects in subgroups of patients, may provide useful information for the care of patients and for future research. However, subgroup analyses also introduce analytic challenges and can lead to overstated and misleading results. This report outlines the challenges associated with conducting and reporting subgroup analyses, and it sets forth guidelines for their use in the Journal. Although this report focuses on the reporting of clinical trials, many of the issues discussed also apply to observational studies.

Subgroup Analyses and Related Concepts

Subgroup Analysis

By "subgroup analysis," we mean any evaluation of treatment effects for a specific end point in subgroups of patients defined by baseline characteristics. The end point may be a measure of treatment efficacy or safety. For a given end point, the treatment effect (a comparison between the treatment groups) is typically measured by a relative risk, odds ratio, or arithmetic difference. The research question usually posed is this: Do the treatment effects vary among the levels of a baseline factor?
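As a minimal sketch of the three effect measures named above, the following computes each from a two-by-two table of event counts. The function name and the counts are hypothetical and chosen only for illustration; they do not come from any trial discussed in this report.

```python
def effect_measures(events_treated, n_treated, events_control, n_control):
    """Return (relative risk, odds ratio, risk difference) for one 2x2 table."""
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    relative_risk = risk_treated / risk_control
    odds_ratio = (risk_treated / (1 - risk_treated)) / (risk_control / (1 - risk_control))
    risk_difference = risk_treated - risk_control  # the "arithmetic difference"
    return relative_risk, odds_ratio, risk_difference


# Hypothetical counts: 30 of 200 treated and 50 of 200 control patients have the event.
rr, odds_ratio, rd = effect_measures(30, 200, 50, 200)
print(f"relative risk={rr:.2f}, odds ratio={odds_ratio:.2f}, risk difference={rd:.2f}")
```

The same counts give three different numbers, which is why a report should state which measure its subgroup comparison uses.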

A subgroup analysis is sometimes undertaken to assess treatment effects for a specific patient characteristic; this assessment is often listed as a primary or secondary study objective. For example, Sacks et al. conducted a placebo-controlled trial in which the reduction in the incidence of coronary events with the use of pravastatin was examined in a diverse population of persons who had survived a myocardial infarction. In subgroup analyses, the investigators further examined whether the efficacy of pravastatin relative to placebo in preventing coronary events varied according to the patients' baseline low-density lipoprotein (LDL) levels.

Subgroup analyses are also undertaken to investigate the consistency of the trial conclusions among different subpopulations defined by each of multiple baseline characteristics of the patients. For example, Jackson et al. reported the outcomes of a study in which 36,282 postmenopausal women 50 to 79 years of age were randomly assigned to receive 1000 mg of elemental calcium with 400 IU of vitamin D₃ daily or placebo. Fractures, the primary outcome, were ascertained over an average follow-up period of 7.0 years; bone density was a secondary outcome. Overall, no treatment effect was found for the primary outcome; that is, the active treatment was not shown to prevent fractures. The effect of calcium plus vitamin D supplementation relative to placebo on the risk of each of four fracture outcomes was further analyzed for consistency in subgroups defined by 15 characteristics of the participants.

Heterogeneity and Statistical Interactions

The heterogeneity of treatment effects across the levels of a baseline variable refers to the circumstance in which the treatment effects vary across the levels of the baseline characteristic. Heterogeneity is sometimes further classified as being either quantitative or qualitative. In the first case, one treatment is always better than the other, but by various degrees, whereas in the second case, one treatment is better than the other for one subgroup of patients and worse than the other for another subgroup of patients. Such variation, also called "effect modification," is typically expressed in a statistical model as an interaction term or terms between the treatment group and the baseline variable. The presence or absence of interaction is specific to the measure of the treatment effect.
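The last point, that interaction depends on the measure of the treatment effect, can be illustrated with a small sketch. The subgroup names and event risks below are hypothetical; they are chosen so that the relative risk is identical in both subgroups while the arithmetic difference is not.

```python
# Hypothetical event risks as (control, treated) in two subgroups.
subgroups = {
    "low baseline risk": (0.10, 0.05),
    "high baseline risk": (0.40, 0.20),
}

relative_risks = {}
risk_differences = {}
for name, (risk_control, risk_treated) in subgroups.items():
    relative_risks[name] = risk_treated / risk_control
    risk_differences[name] = risk_treated - risk_control
    print(f"{name}: relative risk={relative_risks[name]:.2f}, "
          f"risk difference={risk_differences[name]:.2f}")
```

On the relative-risk scale the two subgroups show the same effect (no interaction), yet on the risk-difference scale the effect in the high-risk subgroup is four times as large (a quantitative interaction).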

The appropriate statistical method for assessing the heterogeneity of treatment effects among the levels of a baseline variable begins with a statistical test for interaction. For example, Sacks et al. showed the heterogeneity in pravastatin efficacy by reporting a statistically significant (P=0.03) result of testing for the interaction between the treatment and baseline LDL level when the measure of the treatment effect was the relative risk. Many trials lack the power to detect heterogeneity in treatment effect; thus, the inability to find significant interactions does not show that the treatment effect seen overall necessarily applies to all subjects. A common mistake is to claim heterogeneity on the basis of separate tests of treatment effects within each of the levels of the baseline variable. For example, testing the hypothesis that there is no treatment effect in women and then testing it separately in men does not address the question of whether treatment differences vary according to sex. Another common error is to claim heterogeneity on the basis of the observed treatment-effect sizes within each subgroup, ignoring the uncertainty of these estimates.
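The difference between separate within-subgroup tests and a test for interaction can be sketched as follows. This is one simple approach, assumed here for illustration: a large-sample z-test comparing the log relative risks of two subgroups. It is not necessarily the method used in any trial cited above, and the counts for women and men are hypothetical.

```python
import math


def log_rr_and_se(events_treated, n_treated, events_control, n_control):
    """Log relative risk and its large-sample standard error."""
    log_rr = math.log((events_treated / n_treated) / (events_control / n_control))
    se = math.sqrt(1 / events_treated - 1 / n_treated
                   + 1 / events_control - 1 / n_control)
    return log_rr, se


def two_sided_p(z):
    """Two-sided P value for a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def within_subgroup_p(counts):
    """Tests 'no treatment effect' inside a single subgroup."""
    log_rr, se = log_rr_and_se(*counts)
    return two_sided_p(log_rr / se)


def interaction_p(counts_a, counts_b):
    """Tests whether the log relative risk differs between two subgroups."""
    log_rr_a, se_a = log_rr_and_se(*counts_a)
    log_rr_b, se_b = log_rr_and_se(*counts_b)
    z = (log_rr_a - log_rr_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return two_sided_p(z)


# Hypothetical counts: (events treated, n treated, events control, n control).
women = (30, 500, 50, 500)
men = (45, 500, 50, 500)

p_women = within_subgroup_p(women)
p_men = within_subgroup_p(men)
p_interaction = interaction_p(women, men)
print(f"women P={p_women:.3f}, men P={p_men:.3f}, interaction P={p_interaction:.3f}")
```

With these made-up counts, the treatment effect is nominally significant in women and not in men, yet the interaction test gives no evidence that the effect differs according to sex. Reading the two separate tests as proof of heterogeneity would be the mistake described above.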

Multiplicity

It is common practice to conduct a subgroup analysis for each of several (and often many) baseline characteristics, for each of several end points, or for both. For example, the analysis by Jackson and colleagues of the effect of calcium plus vitamin D supplementation relative to placebo on the risk of each of four fracture outcomes for 15 participant characteristics resulted in a total of 60 subgroup analyses.

When multiple subgroup analyses are performed, the probability of a false positive finding can be substantial. For example, if the null hypothesis is true for each of 10 independent tests for interaction at the 0.05 significance level, the chance of at least one false positive result exceeds 40%. Thus, one must be cautious in the interpretation of such results. There are several methods for addressing multiplicity that are based on the use of more stringent criteria for statistical significance than the customary P<0.05. A less formal approach for addressing multiplicity is to note the number of nominally significant interaction tests that would be expected to occur by chance alone. For example, after noting that 60 subgroup analyses were planned, Jackson et al. pointed out that "Up to three statistically significant interaction tests (P<0.05) [would be expected] on the basis of chance alone," and then they incorporated this consideration in their interpretation of the results.
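The figures in this paragraph follow from short calculations, sketched below. The function names are illustrative. The Bonferroni threshold is included as one well-known example of a more stringent criterion; the text does not say which correction methods it has in mind.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive among independent true-null tests."""
    return 1 - (1 - alpha) ** n_tests


def expected_false_positives(n_tests, alpha=0.05):
    """Number of nominally significant tests expected by chance alone."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance level that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


fwer_10 = familywise_error_rate(10)
expected_60 = expected_false_positives(60)
threshold_60 = bonferroni_threshold(60)
print(f"10 tests: chance of at least one false positive = {fwer_10:.3f}")
print(f"60 tests: expected false positives = {expected_60:.1f}")
print(f"60 tests: Bonferroni per-test threshold = {threshold_60:.5f}")
```

For 10 independent tests the chance is 1 - 0.95^10, or about 0.401, which matches the statement that it exceeds 40%. For 60 tests, 60 × 0.05 gives the three chance findings noted by Jackson et al.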

Prespecified Analysis versus Post Hoc Analysis

A prespecified subgroup analysis is one that is planned and documented before any examination of the data, preferably in the study protocol. This analysis includes specification of the end point, the baseline characteristic, and the statistical method used to test for an interaction. For example, the Heart Outcomes Prevention Evaluation 2 investigators conducted a study involving 5522 patients with vascular disease or diabetes to assess the effect of homocysteine lowering with folic acid and B vitamins on the risk of a major cardiovascular event. The primary outcome was a composite of death from cardiovascular causes, myocardial infarction, and stroke. In the Methods section of their article, the authors noted that "Prespecified subgroup analyses involving Cox models were used to evaluate outcomes in patients from regions with folate fortification of food and regions without folate fortification, according to the baseline plasma homocysteine level and the baseline serum creatinine level." Post hoc analyses refer to those in which the hypotheses being tested are not specified before any examination of the data. Such analyses are of particular concern because it is often unclear how many were undertaken and whether some were motivated by inspection of the data. However, both prespecified and post hoc subgroup analyses are subject to inflated false positive rates arising from multiple testing. Investigators should avoid the tendency to prespecify many subgroup analyses in the mistaken belief that these analyses are free of the multiplicity problem.

Subgroup Analyses in the Journal — Assessment of Reporting Practices

As part of internal quality-control activities at the Journal, we assessed the completeness and quality of subgroup analyses reported in the Journal during the period from July 1, 2005, through June 30, 2006. A detailed description of the study methods is available with the full text of this article at www.nejm.org. In this report, we describe the clarity and completeness of subgroup-analysis reporting, evaluate the authors' interpretation and justification of the results of subgroup analyses, and recommend guidelines for reporting subgroup analyses.

Among the original articles published in the Journal during the period from July 1, 2005, through June 30, 2006, a total of 95 articles reported primary outcome results from randomized clinical trials. Among these 95 articles, 93 reported results from one clinical trial; the remaining 2 articles reported results from two trials. Thus, results from 97 trials were reported, from which subgroup analyses were reported for 59 trials (61%). A table (not reproduced here) summarizes the characteristics of the trials. We found that larger trials and multicenter trials were significantly more likely to report subgroup analyses than smaller trials and single-center trials, respectively. With the use of multivariate logistic-regression models, when trials were ranked according to the number of participants enrolled and compared with the trials with the fewest participants, the odds ratio for reporting subgroup analyses for the second quartile was 1.38 (95% confidence interval [CI], 0.45 to 4.20), for the third quartile was 1.98 (95% CI, 0.62 to 6.24), and for the fourth quartile was 8.90 (95% CI, 2.10 to 37.78) (P=0.02, trend test). The odds ratio for reporting subgroup analyses in multicenter trials as compared with single-center trials was 4.33 (95% CI, 1.56 to 12.16).
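The wide confidence intervals above can be read on the log scale, as in the rough sketch below. It assumes, without confirmation from the text, that the intervals are Wald-type intervals symmetric around the log odds ratio, with 1.96 as the normal quantile. Under that assumption the geometric midpoint of each interval should be close to the reported odds ratio, and the interval width gives the standard error of the log odds ratio. The function name is illustrative.

```python
import math

Z_95 = 1.96  # assumed normal quantile for a 95% Wald-type interval


def log_scale_summary(ci_lower, ci_upper):
    """Return (geometric midpoint of the CI, implied standard error of the log odds ratio)."""
    midpoint = math.exp((math.log(ci_lower) + math.log(ci_upper)) / 2)
    standard_error = (math.log(ci_upper) - math.log(ci_lower)) / (2 * Z_95)
    return midpoint, standard_error


# Reported values from the text: (odds ratio, CI lower limit, CI upper limit).
reported = {
    "second quartile": (1.38, 0.45, 4.20),
    "third quartile": (1.98, 0.62, 6.24),
    "fourth quartile": (8.90, 2.10, 37.78),
    "multicenter vs single-center": (4.33, 1.56, 12.16),
}

summaries = {}
for label, (odds_ratio, lower, upper) in reported.items():
    summaries[label] = log_scale_summary(lower, upper)
    midpoint, standard_error = summaries[label]
    print(f"{label}: reported OR={odds_ratio:.2f}, CI midpoint={midpoint:.2f}, "
          f"implied SE(log OR)={standard_error:.2f}")
```

The midpoints agree with the reported odds ratios to within rounding, and the large implied standard errors reflect how few trials fall into each category.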

Thursday, November 22, 2007

Statistics Reporting of Subgroup Analyses in Clinical Trials
