Published in: Trials 1/2020

Open Access 01-12-2020 | Malaria | Methodology

Machine learning analysis plans for randomised controlled trials: detecting treatment effect heterogeneity with strict control of type I error

Authors: James A. Watson, Chris C. Holmes

Abstract

Background

Retrospective exploratory analyses of randomised controlled trials (RCTs) seeking to identify treatment effect heterogeneity (TEH) are prone to bias and false positives. Yet the desire to learn all we can from exhaustive data measurements on trial participants motivates the inclusion of such analyses within RCTs. Moreover, widespread advances in machine learning (ML) methods hold potential to utilise such data to identify subjects exhibiting heterogeneous treatment response.

Methods

We present a novel analysis strategy for detecting TEH in randomised data using ML methods, whilst ensuring proper control of the false positive discovery rate. Our approach uses random data partitioning with statistical or ML-based prediction on held-out data. This method can test for both crossover TEH (switch in optimal treatment) and non-crossover TEH (systematic variation in benefit across patients). The former is done via a two-sample hypothesis test measuring overall predictive performance. The latter is done via ‘stacking’ the ML predictors alongside a classical statistical model to formally test the added benefit of the ML algorithm. An adaptation of recent statistical theory allows for the construction of a valid aggregate p value. This testing strategy is independent of the choice of ML method.
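The partition-and-test idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the learner `fit_subgroup` is a hypothetical toy rule standing in for any ML method, the held-out comparison is a one-sided Welch z-test using a normal approximation, the data are simulated, and the p values from repeated splits are combined with the quantile-aggregation rule of Meinshausen, Meier and Bühlmann (J Am Stat Assoc, 2009), with gamma = 0.5 giving twice the median. The abstract does not say which adaptation of that theory the authors use, so the aggregation step in particular should be read as an assumption.

```python
import math
import random
import statistics


def one_sided_p(a, b):
    """P value for H1: mean(a) > mean(b), Welch z-test (normal approximation)."""
    if len(a) < 2 or len(b) < 2:
        return 1.0  # too few held-out patients: no evidence
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0:
        return 1.0
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return 0.5 * math.erfc(z / math.sqrt(2))


def fit_subgroup(train):
    """Hypothetical toy learner: flag the side(s) of the median of x
    where control patients did better than treated patients in training data."""
    thr = statistics.median(x for x, _, _ in train)
    flagged = []
    for side in (False, True):
        ctrl = [y for x, t, y in train if (x > thr) == side and t == 0]
        trt = [y for x, t, y in train if (x > thr) == side and t == 1]
        if ctrl and trt and statistics.mean(ctrl) > statistics.mean(trt):
            flagged.append(side)
    return lambda x: (x > thr) in flagged


def split_p_values(data, n_splits=50, seed=0):
    """Repeat: random half split, learn on one half, test on the held-out half."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_splits):
        shuffled = data[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train, test = shuffled[:half], shuffled[half:]
        in_subgroup = fit_subgroup(train)
        ctrl = [y for x, t, y in test if in_subgroup(x) and t == 0]
        trt = [y for x, t, y in test if in_subgroup(x) and t == 1]
        p_values.append(one_sided_p(ctrl, trt))
    return p_values


def aggregate_p(p_values, gamma=0.5):
    """Quantile aggregation: min(1, empirical gamma-quantile of p / gamma)."""
    scaled = sorted(min(1.0, p / gamma) for p in p_values)
    idx = max(0, math.ceil(gamma * len(scaled)) - 1)
    return scaled[idx]


def simulate(n, crossover, seed=1):
    """Simulated trial: (covariate x, arm t, outcome y), higher y is better."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x, t = rng.random(), rng.randint(0, 1)
        effect = -1.0 if (crossover and x > 0.5) else 1.0
        data.append((x, t, t * effect + rng.gauss(0, 1)))
    return data


if __name__ == "__main__":
    for crossover in (False, True):
        p = aggregate_p(split_p_values(simulate(400, crossover)))
        print(f"crossover={crossover}: aggregate p = {p:.3g}")
```

Because the subgroup is chosen using only the training half, the test on the held-out half is not contaminated by the search over subgroups. Swapping `fit_subgroup` for a different learner leaves the testing and aggregation steps unchanged.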

Results

We demonstrate our approach with a re-analysis of the SEAQUAMAT trial, which compared quinine to artesunate for the treatment of severe malaria in Asian adults. We find no evidence for any subgroup who would benefit from a change in treatment from the current standard of care, artesunate, but strong evidence for significant TEH within the artesunate treatment group. In particular, we find that artesunate provides a differential benefit to patients with high numbers of circulating ring stage parasites.

Conclusions

ML analysis plans using computational notebooks (documents linked to a programming language that capture the model parameter settings, data processing choices, and evaluation criteria) along with version control can improve the robustness and transparency of RCT exploratory analyses. A data-partitioning algorithm allows researchers to apply the latest ML techniques safe in the knowledge that any declared associations are statistically significant at a user-defined level.
Footnotes
1
Figure 2, panel D gives an example of a shallow decision tree. In contrast, random forests (RF) build deep decision trees from subsamples of the data, with the branches (questions) descending until only a small number of samples lie within each leaf of each tree. Predictions on new data are then averaged across all trees.
 
Metadata
Title
Machine learning analysis plans for randomised controlled trials: detecting treatment effect heterogeneity with strict control of type I error
Authors
James A. Watson
Chris C. Holmes
Publication date
01-12-2020
Publisher
BioMed Central
Keyword
Malaria
Published in
Trials / Issue 1/2020
Electronic ISSN: 1745-6215
DOI
https://doi.org/10.1186/s13063-020-4076-y
