The significance of study designs and outcomes
The primary outcome is positive – is that good enough?
Pocock SJ and Stone GW
N Engl J Med. 2016 Sep 8;375(10):971-9
When judging whether a clinical trial is successful, attention typically focuses on whether the pre-specified criterion for the primary outcome has been met, that is, whether a P value of less than 0.05 has been achieved. However, a valid conclusion requires examination of the totality of evidence, including secondary end points, safety issues and the study design.
This review highlights important considerations when judging the outcomes of a trial and illustrates shortcomings of studies, in particular in cardiovascular clinical trials.
The value of the P value
The achievement of statistical significance is not sufficient. With a significance threshold of 0.05, there is a 5% risk of a false positive result if the treatment has no true effect.
- In contrast to the PARADIGM-HF trial, with an overwhelming P value for the treatment difference of below 0.00001, the SAINT I trial yielded a P value of 0.038 for the primary outcome ‘disability within 90 days’ when patients with acute ischemic stroke were treated with NXY-059 or placebo. Although this P value is below 0.05, suggesting efficacy of NXY-059, a second, larger trial (SAINT II) found no significant effect (P=0.33), which led to the opposite conclusion that NXY-059 was ineffective for the treatment of acute ischemic stroke.
Magnitude of treatment benefit
A treatment difference needs to be clinically meaningful. Therefore, the treatment effect must be assessed on both the relative scale (relative risk or hazard ratio) and the absolute scale (difference in event rates and number needed to treat). In addition, the 95% confidence interval indicates the level of uncertainty and should always be considered.
- In the IMPROVE-IT trial, the 7-year difference in primary event rates between ezetimibe-treated and placebo patients was only 2 percentage points (32.7% vs 34.7%), with a 95% confidence interval of 0 to 4%. Although the findings of this trial were described as ‘positive’, one might question whether the benefit of ezetimibe is large enough to warrant its cost and potential complications.
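The relation between the relative and absolute scales can be made concrete with the IMPROVE-IT figures quoted above. The following is a minimal sketch, not a calculation from the review itself: the function names are illustrative, and the number needed to treat (NNT) is simply taken as the reciprocal of the absolute risk reduction.

```python
import math

def absolute_risk_reduction(control_rate, treated_rate):
    """Absolute risk reduction as a proportion (control minus treated)."""
    return control_rate - treated_rate

def number_needed_to_treat(arr):
    """Patients to treat to prevent one event; unbounded if there is no benefit."""
    if arr <= 0:
        return math.inf
    return 1.0 / arr

# 7-year primary event rates quoted above for IMPROVE-IT.
placebo_rate = 0.347
ezetimibe_rate = 0.327

arr = absolute_risk_reduction(placebo_rate, ezetimibe_rate)
rrr = arr / placebo_rate              # relative risk reduction
nnt = number_needed_to_treat(arr)

# The quoted 95% confidence interval for the absolute difference is 0 to 4%.
# Its limits translate into a very wide range for the NNT.
nnt_at_upper_limit = number_needed_to_treat(0.04)  # most favourable limit
nnt_at_lower_limit = number_needed_to_treat(0.0)   # no benefit at all

print(f"ARR = {arr:.3f}, RRR = {rrr:.3f}, NNT = {nnt:.0f}")
print(f"NNT implied by the CI limits: {nnt_at_upper_limit:.0f} to {nnt_at_lower_limit}")
```

Under these assumptions, about 50 patients would need to be treated for 7 years to prevent one primary event, and the confidence interval is compatible with anything from 25 patients to no benefit at all.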
Surrogate markers as primary endpoint
For some diseases, a surrogate primary outcome measure has been accepted. These markers correlate with clinical outcomes, but the relationship is not guaranteed, and questions have therefore been raised about their value.
- In the ACCORD trial, intensive treatment resulted in markedly lower glycated haemoglobin levels compared to standard therapy; however, the rate of cardiovascular events was not significantly lower and mortality was even higher.
- In the LIDO trial, levosimendan resulted in greater hemodynamic improvement than dobutamine, but the benefit could not be confirmed by the subsequent, larger SURVIVE trial. Although LIDO led to approval of levosimendan in many countries, the FDA did not approve it after publication of the SURVIVE results.
Composite primary endpoints
When composite endpoints are used, it is important to critically evaluate the driving components of the result.
- In the RITA-3 trial, the decrease in the composite endpoint could largely be attributed to refractory angina, while in the short term there was no evidence of a difference in the rates of the two other components of the composite endpoint (death and myocardial infarction [MI]). Nevertheless, it was promoted as “RITA-3: First proof intervention saves lives”. Fortunately, results of later follow-up studies supported this statement.
- The EXPEDITION trial, comparing cariporide with placebo, also showed a very positive result, with a P value of 0.0002 for the composite outcome (death or MI). However, this was mainly driven by MI (P=0.000005) and not by mortality, which was even higher (P=0.02), as was the rate of cerebrovascular events (P<0.001).
The impact of secondary endpoint results
The primary outcome results are strengthened if pre-specified secondary outcomes also show benefit. On the other hand, if these outcomes do not show benefit, doubts should be raised.
- No evidence of benefit existed for two key secondary outcomes in the SAINT I trial. This absence created suspicion regarding the ‘positive’ primary outcome. Indeed, a negative result for the primary outcome was found in the subsequent SAINT II trial.
- In contrast, the composite primary outcome was borderline significant in the EMPA-REG OUTCOME trial, but the results for the secondary outcomes were robust and significant. Thus, the secondary endpoints lent more credibility to the effect of empagliflozin.
Subgroups
A consistent relative treatment effect may be observed across all patient types, but certain high-risk subgroups may have greater absolute benefits, or some patients may not appear to benefit from the new treatment. Caution is warranted, since spurious findings can arise when multiple subgroups are analysed.
- In the PLATO trial, the overall risk of cardiovascular death, MI or stroke was 16% lower with ticagrelor than with clopidogrel (P<0.001). However, subgroup analyses revealed that patients receiving a high maintenance dose of aspirin had a 45% higher risk with ticagrelor than with clopidogrel, whereas ticagrelor was associated with a lower risk of cardiovascular death, MI or stroke among patients receiving a low maintenance dose. As this observation arose from numerous exploratory subgroup analyses and lacks obvious biologic plausibility, the validity of this observation is still disputed. Nevertheless, the FDA issued a warning regarding aspirin dose in this setting.
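Why multiple subgroup analyses invite spurious findings can be sketched with a simple calculation. The example below assumes that the subgroup tests are independent and that the treatment effect truly does not differ between subgroups; both are simplifications, and the numbers of subgroups are hypothetical rather than taken from PLATO.

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n independent null tests gives P < alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

# Hypothetical numbers of exploratory subgroup tests.
for n_subgroups in (1, 5, 10, 20, 30):
    risk = prob_at_least_one_false_positive(n_subgroups)
    print(f"{n_subgroups:2d} subgroup tests -> {risk:.0%} chance of a spurious finding")
```

With 10 such tests the chance of at least one nominally significant but spurious result is already about 40%, which is why an isolated subgroup finding without biologic plausibility deserves caution.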
Size of the trial
Small trials lack power, so positive treatment effects are susceptible to exaggeration and false positives may occur.
- The N-acetylcysteine versus placebo trial concluded that “N-acetylcysteine is an effective means of preventing renal damage”. However, this statement is too strong; 1 of 41 patients receiving N-acetylcysteine had a primary event whereas 9 of the 42 placebo patients had such an event. More appropriate would be that N-acetylcysteine “may be effective”. Indeed, a meta-analysis of 10 randomized trials (1916 patients) concluded that the evidence was too weak and heterogeneous.
- In the PRAMI trial, the 65% reduction in hazard was too good to be true. This finding was based on relatively few primary events (21 versus 53). Two subsequent, similarly sized trials showed mixed results. Thus, more evidence is needed before clinical management is adjusted.
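How fragile a result based on few events can be is illustrated below with the N-acetylcysteine counts quoted above (1 of 41 versus 9 of 42). This is a sketch only: the choice of a two-sided Fisher exact test is an assumption for illustration and not necessarily the analysis used in the original trial, and the second scenario, with two more events in the N-acetylcysteine group, is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(events_a, n_a, events_b, n_b):
    """Two-sided Fisher exact P value for events in two arms.

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    total_events = events_a + events_b
    total_n = n_a + n_b
    denominator = comb(total_n, n_a)

    def weight(k):
        # Exact integer count of arrangements with k events in arm A.
        return comb(total_events, k) * comb(total_n - total_events, n_a - k)

    k_min = max(0, n_a - (total_n - total_events))
    k_max = min(total_events, n_a)
    observed = weight(events_a)
    extreme = sum(weight(k) for k in range(k_min, k_max + 1)
                  if weight(k) <= observed)
    return extreme / denominator

# Observed counts quoted above: 1/41 (N-acetylcysteine) vs 9/42 (placebo).
p_observed = fisher_exact_two_sided(1, 41, 9, 42)

# Hypothetical scenario: two additional events in the N-acetylcysteine group.
p_shifted = fisher_exact_two_sided(3, 41, 9, 42)

print(f"P with observed counts:        {p_observed:.3f}")
print(f"P with two hypothetical events: {p_shifted:.3f}")
```

Under these assumptions the observed counts give a P value of roughly 0.015, whereas two additional events in the treated group would push it above 0.05, which supports the more cautious wording "may be effective".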
Effect of prematurely stopped trials
A trial that is stopped early can exaggerate treatment efficacy. As a trial progresses, the estimated treatment effect varies randomly around the true effect. If the interim estimate happens to be randomly high, it is more likely to cross a statistical stopping boundary. Stopping early also truncates evidence for important secondary (and safety) outcomes.
- The FAME 2 trial stopped early because the hazard ratio for the primary outcome favouring PCI (compared to medical therapy alone) was 0.39 (95% CI 0.26-0.57, P<0.001). This benefit was driven by fewer urgent revascularizations, a ‘soft’ outcome in an unblinded trial. Although the rate of death or MI was lower with PCI, the difference was inconclusive. Completion of the trial would have resulted in more events, which would have greatly enhanced the value of this trial.
- The SPRINT trial was stopped early at a median of 3.26 years instead of 5. The hazard ratio for the primary outcome was 0.75 (95% CI, 0.64-0.89, P<0.001) and just 4 weeks elapsed between stopping of the trial and publication. The quality and completeness of any interim database are inevitably imperfect; there will be events yet to be ascertained and adjudicated. In addition, orderly trial closure after early stopping takes several months and is necessary for a robust interpretation of all the evidence. The time at which a trial is stopped is likely to be the time at which the estimate of efficacy is exaggerated.
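The tendency of early stopping to select randomly high estimates can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of FAME 2 or SPRINT: the true effect, interim sample size and stopping boundary are all hypothetical, the outcome is a difference in means with unit standard deviation, and the interim estimate is drawn directly from its sampling distribution instead of simulating individual patients.

```python
import math
import random

random.seed(2016)  # fixed seed so the sketch is reproducible

TRUE_EFFECT = 0.20   # hypothetical true difference in means (SD units)
N_INTERIM = 100      # hypothetical patients per arm at the interim look
Z_BOUNDARY = 2.5     # hypothetical efficacy stopping boundary
N_TRIALS = 5000      # number of simulated trials

# Standard error of a difference in means with unit SD in each arm.
se_interim = math.sqrt(2.0 / N_INTERIM)

all_estimates = []
stopped_estimates = []
for _ in range(N_TRIALS):
    # Interim estimate = true effect plus random sampling error.
    estimate = random.gauss(TRUE_EFFECT, se_interim)
    all_estimates.append(estimate)
    # The trial stops early only if the interim z statistic crosses the boundary.
    if estimate / se_interim > Z_BOUNDARY:
        stopped_estimates.append(estimate)

mean_all = sum(all_estimates) / len(all_estimates)
mean_stopped = sum(stopped_estimates) / len(stopped_estimates)

print(f"true effect:                   {TRUE_EFFECT:.2f}")
print(f"mean interim estimate, all:    {mean_all:.2f}")
print(f"mean estimate, stopped early:  {mean_stopped:.2f}")
print(f"share of trials stopped early: {len(stopped_estimates) / N_TRIALS:.1%}")
```

Averaged over all simulated trials the interim estimate is unbiased, but the subset that crosses the boundary necessarily reports an effect larger than the true one, because only randomly high estimates can trigger stopping.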
Safety versus positive efficacy
A balanced account of both efficacy and safety must be provided. Absolute benefits and risks should be presented as differences in percentages, and the number needed to treat for benefit versus the number needed to harm may provide a guide to net clinical benefit.
- The benefit in the DAPT trial, in which 18 additional months of dual antiplatelet therapy were compared with aspirin alone after placement of a drug-eluting stent, came at the cost of higher rates of major bleeding events. All-cause mortality was 0.5% higher, which was attributed to greater noncardiovascular mortality.
- In the SPRINT trial, the rate of the composite cardiovascular outcome was 1.6% lower and the rate of death was 1.2% lower with intensive than with standard blood-pressure lowering. However, the rates of hypotension, syncope and acute kidney injury were 1.4%, 1.1% and 1.8% higher, respectively, with intensive blood-pressure control.
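The comparison of number needed to treat (NNT) and number needed to harm (NNH) can be sketched with the SPRINT figures quoted above. The example assumes that the quoted percentages are absolute differences in event rates over the trial period and rounds every result up to whole patients; the helper name is illustrative.

```python
import math

def patients_per_event(absolute_difference_percent):
    """Patients treated per one event prevented or caused, rounded up."""
    return math.ceil(100.0 / absolute_difference_percent)

# Absolute differences (percent) quoted above for SPRINT.
benefits = {"composite cardiovascular outcome": 1.6, "death": 1.2}
harms = {"hypotension": 1.4, "syncope": 1.1, "acute kidney injury": 1.8}

nnt = {name: patients_per_event(diff) for name, diff in benefits.items()}
nnh = {name: patients_per_event(diff) for name, diff in harms.items()}

for name, value in nnt.items():
    print(f"NNT to prevent one {name}: {value}")
for name, value in nnh.items():
    print(f"NNH for one additional case of {name}: {value}")
```

Under these assumptions the NNT values (63 and 84) and the NNH values (72, 91 and 56) are of similar magnitude, so judging the net clinical benefit also requires weighing the severity of each outcome and not only its frequency.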
Trial design
Biases in the design and conduct of the trial must be ruled out before a genuine benefit can be acknowledged.
- SYMPLICITY HTN-2 was not blinded, which introduced major issues. The trial showed efficacy; however, the subsequent sham-controlled SYMPLICITY HTN-3 trial refuted this finding.
- In the ATLAS ACS 2-TIMI 51 trial, 27.6% of the patients discontinued treatment prematurely and data on vital status were missing for 7.2% of the patients. These problems appeared to be greater in this trial than in other comparable large trials.
Limitations of applicability of trial results
The patient population evaluated in a trial influences how far its results can be translated into changes in management and applied across different countries.
- The SPRINT trial excluded patients younger than 50 years of age and those with diabetes or a history of stroke. Thus, the trial results apply to only ~20% of all patients with hypertension who are seen in practice. Moreover, in the ACCORD trial, no effect on cardiovascular events was observed in patients with type 2 diabetes with intensive blood-pressure treatment as compared with standard therapy.
- The single-center TAPAS trial with 1071 patients showed dramatically lower mortality at 1 year after PCI with thrombus aspiration than after conventional PCI. This outcome was unrealistic given the modest benefit in reperfusion success (primary outcome). Nevertheless, the study led to widespread adoption of thrombus aspiration for many years. Later, two multicentre trials involving more than 17,000 patients convincingly showed that routine thrombus aspiration offers no advantage with regard to mortality or cardiovascular events.
- By the time the long-term findings for the primary outcome of a trial become available, advances in care may have lessened their relevance to contemporary practice. In the SYNTAX and FREEDOM trials, patients were assigned to PCI with first-generation drug-eluting stents or to CABG. However, contemporary drug-eluting stents represent a substantial improvement over first-generation devices, a fact that diminishes the applicability of these findings to current practice.