Published in: BMC Medical Research Methodology 1/2002

Open Access 01-12-2002 | Debate

Do multiple outcome measures require p-value adjustment?

Author: Ronald J Feise

Abstract

Background

Readers may question the interpretation of findings in clinical trials when multiple outcome measures are used without adjustment of the p-value. This question arises because of the increased risk of Type I errors (findings of false "significance") when multiple simultaneous hypotheses are tested at set p-values. The primary aim of this study was to assess whether p-value adjustments are needed in clinical trials to compensate for a possible increased risk of committing Type I errors when multiple outcome measures are used.

Discussion

The classicists believe that the probability of finding at least one statistically significant test result by chance alone, and so incorrectly declaring a difference, increases as the number of comparisons increases. The rationalists raise the following objections to that theory: 1) p-value adjustments are calculated from the number of tests to be considered, and that number has been defined arbitrarily and variably; 2) p-value adjustments reduce the chance of making Type I errors, but they increase the chance of making Type II errors or require a larger sample size.
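The arithmetic behind both positions can be illustrated with a minimal sketch. It is not taken from the article: it assumes the tests are independent, uses the Bonferroni rule (alpha divided by the number of tests) as one example of an adjustment, and approximates power with a two-sided z-test. The function names and the effect value `delta = 2.8` are hypothetical choices made for illustration.

```python
from statistics import NormalDist


def familywise_error_rate(alpha: float, k: int) -> float:
    """Probability of at least one false positive across k independent
    tests, each run at level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_threshold(alpha: float, k: int) -> float:
    """Per-test significance level under a Bonferroni adjustment."""
    return alpha / k


def z_test_power(delta: float, alpha: float) -> float:
    """Approximate power of a two-sided z-test.

    delta is the true effect divided by its standard error
    (an illustrative quantity, not one defined in the article).
    """
    z = NormalDist()
    critical = z.inv_cdf(1.0 - alpha / 2.0)
    return z.cdf(delta - critical) + z.cdf(-delta - critical)


alpha = 0.05
delta = 2.8  # hypothetical effect giving roughly 80% power at alpha = 0.05

for k in (1, 3, 5, 10, 20):
    adjusted = bonferroni_threshold(alpha, k)
    print(
        f"k={k:2d}  "
        f"unadjusted FWER={familywise_error_rate(alpha, k):.3f}  "
        f"adjusted alpha={adjusted:.4f}  "
        f"power at adjusted alpha={z_test_power(delta, adjusted):.3f}"
    )
```

Under these assumptions, ten unadjusted tests at 0.05 carry roughly a 40% chance of at least one false positive, which is the classicists' concern. The Bonferroni threshold of 0.005 removes that inflation but lowers the power for the hypothetical effect from about 80% to about 50%, which is the rationalists' second objection. Correlated outcome measures, common in clinical trials, would change these numbers.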

Summary

Readers should weigh a study's statistical significance against the magnitude of effect, the quality of the study, and findings from other studies. Researchers facing multiple outcome measures might want to either select a primary outcome measure or use a global assessment measure, rather than adjusting the p-value.
Metadata
Title
Do multiple outcome measures require p-value adjustment?
Author
Ronald J Feise
Publication date
01-12-2002
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2002
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/1471-2288-2-8
