Published in: BMC Medical Research Methodology 1/2019

Open Access 01-12-2019 | Technical advance

True and false positive rates for different criteria of evaluating statistical evidence from clinical trials

Authors: Don van Ravenzwaaij, John P. A. Ioannidis

Abstract

Background

Until recently, a typical rule for the endorsement of new medications by the Food and Drug Administration has been the existence of at least two statistically significant clinical trials favoring the new medication. This rule has consequences for the true positive rate (endorsement of an effective treatment) and the false positive rate (endorsement of an ineffective treatment).
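As a rough illustration of why this rule affects the false positive rate, suppose each trial of an ineffective treatment independently comes out significant in the treatment's favor with probability 0.025 (an assumption corresponding to a two-sided test at 0.05). The probability that at least two of k trials are significant then grows with k. The function name and the values of k below are illustrative and are not taken from the paper.

```python
def prob_at_least_two_significant(k, p_sig):
    """Probability that at least 2 of k independent trials are significant,
    when each trial is significant with probability p_sig."""
    p_none = (1 - p_sig) ** k
    p_exactly_one = k * p_sig * (1 - p_sig) ** (k - 1)
    return 1 - p_none - p_exactly_one


# Assumption: an ineffective treatment yields a significant result in its
# favor with probability 0.025 per trial (two-sided alpha = 0.05).
P_FAVORABLE_UNDER_NULL = 0.025

if __name__ == "__main__":
    for k in (2, 5, 10, 20):
        rate = prob_at_least_two_significant(k, P_FAVORABLE_UNDER_NULL)
        print(f"trials conducted = {k:2d}, false positive rate = {rate:.5f}")
```

Under these assumptions the false positive rate of the rule depends on how many trials were run in total, which is one reason the number of trials conducted matters when comparing criteria.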

Methods

In this paper, we compare true positive and false positive rates for different evaluation criteria through simulations that rely on (1) conventional p-values; (2) confidence intervals based on meta-analyses assuming fixed or random effects; and (3) Bayes factors. We varied the threshold levels for statistical evidence, the thresholds for what constitutes a clinically meaningful treatment effect, and the number of trials conducted.
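A minimal sketch of this kind of simulation, covering only the p-value criterion, might look as follows. It is not the authors' code: it assumes unit-variance normal outcomes, a z-test on the difference in means (drawn directly from its sampling distribution), and an endorsement rule of at least two significant trials favoring the treatment. All function names, sample sizes, and effect sizes are hypothetical choices for illustration.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def trial_is_significant(delta, n_per_arm, alpha, rng):
    """Simulate one two-arm trial and report whether it is significant
    in favor of the treatment (two-sided z-test, known unit variance)."""
    se = (2.0 / n_per_arm) ** 0.5
    # Shortcut: the difference in sample means is N(delta, se^2).
    diff = rng.gauss(delta, se)
    z = diff / se
    p_two_sided = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return p_two_sided < alpha and z > 0


def endorsement_rate(delta, n_trials, n_per_arm, alpha=0.05,
                     min_significant=2, n_sims=20000, seed=1):
    """Fraction of simulated treatments endorsed under the
    'at least min_significant significant trials' rule."""
    rng = random.Random(seed)
    endorsed = 0
    for _ in range(n_sims):
        hits = sum(trial_is_significant(delta, n_per_arm, alpha, rng)
                   for _ in range(n_trials))
        if hits >= min_significant:
            endorsed += 1
    return endorsed / n_sims


if __name__ == "__main__":
    # delta = 0 gives the false positive rate; delta > 0 the true positive rate.
    print("false positive rate:", endorsement_rate(0.0, n_trials=5, n_per_arm=50))
    print("true positive rate: ", endorsement_rate(0.5, n_trials=5, n_per_arm=50))
```

The meta-analytic and Bayes factor criteria would slot in as alternative decision rules applied to the same simulated trials; they are omitted here to keep the sketch short.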

Results

Our results show that Bayes factors, meta-analytic confidence intervals, and p-values often have similar performance. Bayes factors may perform better when the number of trials conducted is high and when trials have small sample sizes and clinically meaningful effects are not small, particularly in fields where the number of non-zero effects is relatively large.

Conclusions

Thinking about realistic effect sizes in conjunction with desirable levels of statistical evidence, as well as quantifying statistical evidence with Bayes factors, may help improve decision-making in some circumstances.
Metadata
Title
True and false positive rates for different criteria of evaluating statistical evidence from clinical trials
Authors
Don van Ravenzwaaij
John P. A. Ioannidis
Publication date
01-12-2019
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2019
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/s12874-019-0865-y
