Published in: European Journal of Epidemiology 4/2016

Open Access 01-04-2016 | ESSAY

Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations

Authors: Sander Greenland, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin, Charles Poole, Steven N. Goodman, Douglas G. Altman

Abstract

Misinterpretation and abuse of statistical tests, confidence intervals, and statistical power have been decried for decades, yet remain rampant. A key problem is that there are no interpretations of these concepts that are at once simple, intuitive, correct, and foolproof. Instead, correct use and interpretation of these statistics requires an attention to detail which seems to tax the patience of working scientists. This high cognitive demand has led to an epidemic of shortcut definitions and interpretations that are simply wrong, sometimes disastrously so—and yet these misinterpretations dominate much of the scientific literature. In light of this problem, we provide definitions and a discussion of basic statistics that are more general and critical than typically found in traditional introductory expositions. Our goal is to provide a resource for instructors, researchers, and consumers of statistics whose knowledge of statistical theory and technique may be limited but who wish to avoid and spot misinterpretations. We emphasize how violation of often unstated analysis protocols (such as selecting analyses for presentation based on the P values they produce) can lead to small P values even if the declared test hypothesis is correct, and can lead to large P values even if that hypothesis is incorrect. We then provide an explanatory list of 25 misinterpretations of P values, confidence intervals, and power. We conclude with guidelines for improving statistical interpretation and reporting.
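The abstract's point about selecting analyses by their P values can be illustrated with a small simulation. The sketch below is not taken from the article: it assumes independent analyses of pure-noise data, a two-sided z-test with known variance, and hypothetical function names (`two_sided_p`, `smallest_p`, `fraction_below`). The test hypothesis (population mean of zero) is true in every simulated study, so any excess of small P values comes only from reporting the smallest of several. It covers only the first half of the claim (small P values under a correct hypothesis).

```python
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()  # standard normal, used for the z-test


def two_sided_p(sample, sigma=1.0):
    """Two-sided z-test P value for H0: population mean = 0 (sigma assumed known)."""
    n = len(sample)
    z = mean(sample) / (sigma / n ** 0.5)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def smallest_p(rng, n_analyses, n_obs):
    """Run several independent analyses of noise and keep only the smallest P value."""
    return min(
        two_sided_p([rng.gauss(0.0, 1.0) for _ in range(n_obs)])
        for _ in range(n_analyses)
    )


def fraction_below(alpha, n_studies, n_analyses, n_obs=20, seed=0):
    """Fraction of simulated studies whose reported (smallest) P value is below alpha."""
    rng = random.Random(seed)
    hits = sum(smallest_p(rng, n_analyses, n_obs) < alpha for _ in range(n_studies))
    return hits / n_studies


if __name__ == "__main__":
    # One prespecified analysis per study: expected fraction is close to alpha.
    print("single analysis:", fraction_below(0.05, 2000, 1))
    # Best of ten analyses per study: theory gives 1 - 0.95**10, about 0.40.
    print("best of ten:    ", fraction_below(0.05, 2000, 10))
```

Under these assumptions, a single prespecified analysis yields P < 0.05 in about 5% of studies, while reporting the best of ten independent analyses does so in about 40% of studies, even though the test hypothesis is correct throughout. Real analyses are rarely independent, so the size of the inflation in practice will differ.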
Metadata
Title: Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations
Authors: Sander Greenland, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin, Charles Poole, Steven N. Goodman, Douglas G. Altman
Publication date: 01-04-2016
Publisher: Springer Netherlands
Published in: European Journal of Epidemiology, Issue 4/2016
Print ISSN: 0393-2990
Electronic ISSN: 1573-7284
DOI: https://doi.org/10.1007/s10654-016-0149-3