Published in: Trials 1/2020

Open Access 01-12-2020 | Methodology

A simulation study comparing the power of nine tests of the treatment effect in randomized controlled trials with a time-to-event outcome

Authors: Patrick Royston, Mahesh K. B. Parmar

Abstract

Background

The logrank test is routinely applied to design and analyse randomized controlled trials (RCTs) with time-to-event outcomes. Sample size and power calculations assume the treatment effect follows proportional hazards (PH). If the PH assumption is false, power is reduced and interpretation of the hazard ratio (HR) as the estimated treatment effect is compromised. Using statistical simulation, we investigated the type 1 error and power of the logrank (LR) test and eight alternatives. We aimed to identify test(s) that improve power with three types of non-proportional hazards (non-PH): early, late or near-PH treatment effects.

Methods

We investigated weighted logrank tests (early, LRE; late, LRL), the supremum logrank test (SupLR) and composite tests (joint, J; combined, C; weighted combined, WC; versatile and modified versatile weighted logrank, VWLR, VWLR2) with two or more components. Weighted logrank tests are intended to be sensitive to particular non-PH patterns. Composite tests attempt to improve power across a wider range of non-PH patterns. Using extensive simulations based on real trials, we studied test size and power under PH and under simple departures from PH comprising piecewise constant HRs with a single change point at various follow-up times. We systematically investigated the influence of high or low control-arm event rates on power.
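
As an illustration of the kind of scenario described above, the following minimal Python sketch simulates a two-arm trial in which the HR is constant before and after a single change point, and then computes a weighted logrank statistic of the Fleming-Harrington G(rho, gamma) family. It is not the authors' simulation code. The exponential control-arm hazard, the sample size, the change point, the censoring time and the (rho, gamma) pairs are all assumptions chosen for illustration, and the function names are hypothetical. In particular, the weights shown are not claimed to be the exact weights behind LRE and LRL.

```python
import math
import numpy as np

def simulate_arm(n, base_hazard, hr_early, hr_late, change_point, rng):
    """Draw n event times with hazard base_hazard*hr_early before
    change_point and base_hazard*hr_late afterwards (illustrative helper)."""
    # Inverse-transform sampling: solve H(T) = E with E ~ Exponential(1),
    # where H is the piecewise linear cumulative hazard.
    e = rng.exponential(1.0, size=n)
    h1 = base_hazard * hr_early
    h2 = base_hazard * hr_late
    h_at_change = h1 * change_point
    return np.where(e <= h_at_change, e / h1,
                    change_point + (e - h_at_change) / h2)

def weighted_logrank(time, event, group, rho=0.0, gamma=0.0):
    """Z statistic of a G(rho, gamma) weighted logrank test.
    rho = gamma = 0 gives the standard logrank test."""
    surv = 1.0  # pooled Kaplan-Meier estimate just before the current time
    u = 0.0     # weighted sum of observed minus expected events in group 1
    v = 0.0     # hypergeometric variance of u
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & (group == 1)).sum()
        died = (time == t) & (event == 1)
        d = died.sum()
        d1 = (died & (group == 1)).sum()
        w = surv ** rho * (1.0 - surv) ** gamma
        u += w * (d1 - d * n1 / n)
        if n > 1:
            v += w ** 2 * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        surv *= 1 - d / n
    return u / np.sqrt(v)

def two_sided_p(z):
    """Two-sided p-value from a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical late-effect scenario: HR = 1 up to time 2, HR = 0.5 afterwards.
rng = np.random.default_rng(2020)
n_per_arm = 500
control = simulate_arm(n_per_arm, 0.1, 1.0, 1.0, 2.0, rng)
treated = simulate_arm(n_per_arm, 0.1, 1.0, 0.5, 2.0, rng)
latent = np.concatenate([control, treated])
group = np.repeat([0, 1], n_per_arm)
censor_time = 10.0  # administrative censoring, also an assumption
event = (latent <= censor_time).astype(int)
time = np.minimum(latent, censor_time)

weights = {"logrank": (0, 0), "early-weighted": (1, 0), "late-weighted": (0, 1)}
for name, (rho, gamma) in weights.items():
    z = weighted_logrank(time, event, group, rho, gamma)
    print(f"{name}: Z = {z:.2f}, p = {two_sided_p(z):.4f}")
```

Repeating such a simulation many times and recording the proportion of replicates with p < 0.05 gives an empirical estimate of power under the chosen scenario, or of test size when both HRs are set to 1.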

Results

With no preconceived type of treatment effect, the preferred test is VWLR2. Expecting an early effect, tests with acceptable power are SupLR, C, VWLR2, J, LRE and WC. Expecting a late effect, acceptable tests are LRL, VWLR, VWLR2, WC and J. Under near-PH, acceptable tests are LR, LRE, VWLR, C, VWLR2 and SupLR. Type 1 error was well controlled for all tests, showing only minor deviations from the nominal 5%. The location of the HR change point relative to the cumulative proportion of control-arm events considerably affected power.
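
Whether a deviation from the nominal 5% counts as minor depends on the Monte Carlo error of the simulation. The short sketch below computes the binomial standard error of an empirical rejection rate and an approximate 95% range around the nominal level. The number of replicates used here is a hypothetical value for illustration and is not taken from the abstract.

```python
import math

def monte_carlo_se(true_rate, n_replicates):
    """Standard error of a rejection proportion estimated from
    independent simulation replicates (binomial approximation)."""
    return math.sqrt(true_rate * (1.0 - true_rate) / n_replicates)

nominal = 0.05
n_replicates = 5000  # hypothetical replicate count, an assumption
se = monte_carlo_se(nominal, n_replicates)
lower = nominal - 1.96 * se
upper = nominal + 1.96 * se
print(f"SE = {se:.4f}; approximate 95% range: {lower:.4f} to {upper:.4f}")
```

An empirical size falling inside this range is compatible with the nominal level given simulation error alone.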

Conclusions

Assuming ignorance of the likely treatment effect, the best choice is VWLR2. Several non-standard tests performed well when the correct type of treatment effect was assumed. A low control-arm event rate reduced the power of weighted logrank tests targeting early effects. Test size was generally well controlled. Further investigation of test characteristics with different types of non-proportional hazards of the treatment effect is warranted.
Metadata
Title
A simulation study comparing the power of nine tests of the treatment effect in randomized controlled trials with a time-to-event outcome
Authors
Patrick Royston
Mahesh K. B. Parmar
Publication date
01-12-2020
Publisher
BioMed Central
Published in
Trials / Issue 1/2020
Electronic ISSN: 1745-6215
DOI
https://doi.org/10.1186/s13063-020-4153-2
