Published in: BMC Medical Research Methodology 1/2013

Open Access 01-12-2013 | Research article

Correction of the significance level when attempting multiple transformations of an explanatory variable in generalized linear models

Authors: Benoit Liquet, Jérémie Riou


Abstract

Background

In statistical modeling, finding the most favorable coding for an explanatory quantitative variable involves many tests. This process raises a multiple testing problem and requires a correction of the significance level.

Methods

For each coding, a test of the nullity of the coefficient associated with the newly coded variable is computed. The selected coding is the one associated with the largest test statistic (or, equivalently, the smallest p-value). In the context of the generalized linear model, Liquet and Commenges (Stat Probability Lett, 71:33–38, 2005) proposed an asymptotic correction of the significance level. This procedure, based on the score test, was developed for dichotomous and Box-Cox transformations. In this paper, we suggest the use of resampling methods to estimate the significance level for categorical transformations with more than two levels, which by definition involve more than one parameter in the model. The categorical transformation is a more flexible way to explore the unknown shape of the effect of an explanatory variable on a dependent variable.
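
The resampling idea can be illustrated with a minimal Python sketch. It is not the authors' CPMCGLM implementation. It assumes a Gaussian outcome with no adjustment covariates, uses the between-level sum of squares as a stand-in for the score statistic, and applies a single-step minimum-p permutation correction. All function names (`quantile_cutpoints`, `code_variable`, `min_p_permutation`) and the simulated data are hypothetical.

```python
import random
from bisect import bisect_left


def quantile_cutpoints(x, n_groups):
    """Cutpoints splitting x into n_groups roughly equal-sized categories."""
    xs = sorted(x)
    return [xs[i * len(xs) // n_groups] for i in range(1, n_groups)]


def code_variable(x, cutpoints):
    """Categorical coding: the level is the number of cutpoints reached."""
    return [sum(xi >= c for c in cutpoints) for xi in x]


def between_group_ss(y, levels):
    """Between-level sum of squares of y (larger means stronger association)."""
    mean = sum(y) / len(y)
    sums, counts = {}, {}
    for yi, g in zip(y, levels):
        sums[g] = sums.get(g, 0.0) + yi
        counts[g] = counts.get(g, 0) + 1
    return sum(counts[g] * (sums[g] / counts[g] - mean) ** 2 for g in sums)


def upper_tail_p(pool_sorted, stat):
    """Share of the pooled statistics that are at least as large as stat."""
    return (len(pool_sorted) - bisect_left(pool_sorted, stat)) / len(pool_sorted)


def min_p_permutation(y, x, group_counts, n_perm=200, seed=0):
    """Return (raw p-value per coding, p-value corrected for coding selection)."""
    rng = random.Random(seed)
    codings = [code_variable(x, quantile_cutpoints(x, k)) for k in group_counts]
    observed = [between_group_ss(y, levels) for levels in codings]

    # Permuting y breaks any link with x while keeping every coding fixed,
    # so the dependence between the candidate tests is preserved.
    perm_stats = [[] for _ in codings]
    y_perm = list(y)
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        for k, levels in enumerate(codings):
            perm_stats[k].append(between_group_ss(y_perm, levels))

    # Pool the observed statistic with its permutation replicates, per coding.
    pools = [sorted(stats + [obs]) for stats, obs in zip(perm_stats, observed)]
    raw_p = [upper_tail_p(pool, obs) for pool, obs in zip(pools, observed)]
    min_p_obs = min(raw_p)

    # Null distribution of the smallest p-value across codings.
    n_as_small = 1  # the observed data set counts as one member of the pool
    for b in range(n_perm):
        min_p_b = min(
            upper_tail_p(pools[k], perm_stats[k][b]) for k in range(len(codings))
        )
        if min_p_b <= min_p_obs:
            n_as_small += 1
    return raw_p, n_as_small / (n_perm + 1)


# Simulated data with a clear linear effect (illustration only).
data_rng = random.Random(1)
x = [data_rng.uniform(0, 10) for _ in range(60)]
y = [0.8 * xi + data_rng.gauss(0, 1) for xi in x]

# Candidate codings: median split, tertiles, quartiles.
raw_p, adjusted_p = min_p_permutation(y, x, group_counts=[2, 3, 4])
print("raw p-values:", raw_p)
print("corrected p-value:", adjusted_p)
```

Comparing codings through their permutation p-values rather than their raw statistics keeps codings with different numbers of levels on a common scale. By construction, the corrected p-value is never smaller than the smallest raw p-value.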

Results

The simulations run in this study showed that the proposed methods perform well. The methods are illustrated using data from a study of the relationship between cholesterol and dementia.

Conclusion

The algorithms were implemented in R, and the associated CPMCGLM R package is available on CRAN.
Literature
1.
Bennette C, Vickers A: Against Quantiles: categorization of continuous variables in epidemiologic research, and its discontents. BMC Med Res Methodol. 2012, 12: 21-25. 10.1186/1471-2288-12-21.
2.
Altman D, Lausen B, Sauerbrei W, Schumacher M: Dangers of using optimal cutpoints in the evaluation of prognostic factors. J Natl Cancer Inst. 1994, 86 (11): 829-835. 10.1093/jnci/86.11.829.
3.
Royston P, Altman D, Sauerbrei W: Dichotomizing continuous predictors in multiple regression: a bad idea. Stat Med. 2006, 25: 127-141. 10.1002/sim.2331.
4.
Harrell FE, Lee KL, Mark DB: Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996, 15 (4): 361-387. 10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4.
5.
Miller RG: Simultaneous statistical inference. 2nd ed. 1981, New York, Heidelberg, Berlin: Springer-Verlag.
6.
Westfall PH: Improving power by dichotomizing (even under normality). Stat Biopharm Res. 2011, 3 (2): 353-362. 10.1198/sbr.2010.09055.
7.
Simes R: An improved Bonferroni procedure for multiple tests of significance. Biometrika. 1986, 73 (3): 751-754. 10.1093/biomet/73.3.751.
8.
Sidak Z: Rectangular confidence regions for the means of multivariate normal distributions. J Am Stat Assoc. 1967, 62: 626-633.
9.
Holm S: A simple sequentially rejective multiple test procedure. Scand J Stat. 1979, 6: 65-70.
10.
Hommel G: A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika. 1988, 75: 383-386. 10.1093/biomet/75.2.383.
11.
Hochberg Y: A sharper Bonferroni procedure for multiple tests of significance. Biometrika. 1988, 75: 800-802. 10.1093/biomet/75.4.800.
12.
Efron B: The length heuristic for simultaneous hypothesis tests. Biometrika. 1997, 84: 143-157. 10.1093/biomet/84.1.143.
13.
Liquet B, Commenges D: Correction of the P-value after multiple coding of an explanatory variable in logistic regression. Stat Med. 2001, 20: 2815-2826. 10.1002/sim.916.
14.
Liquet B, Commenges D: Computation of the p-value of the minimum of score tests in the generalized linear model, application to multiple coding. Stat Probability Lett. 2005, 71: 33-38. 10.1016/j.spl.2004.10.019.
15.
Hashemi R, Commenges D: Correction of the p-value after multiple tests in a Cox proportional hazard model. Lifetime Data Anal. 2002, 8: 335-348. 10.1023/A:1020514804325.
16.
Bonarek M, Barberger-Gateau P, Letenneur L, Deschamps V, Iron A, Dubroca B, Dartigues J: Relationships between cholesterol, apolipoprotein E polymorphism and dementia: a cross-sectional analysis from the PAQUID study. Neuroepidemiology. 2000, 19: 141-48. 10.1159/000026249.
17.
McCullagh P, Nelder J: Generalized Linear Models. 1989, New York: Chapman & Hall.
18.
Genz A: Numerical computation of multivariate normal probabilities. J Comput Graphical Stat. 1992, 1: 141-149.
19.
Cox D, Hinkley D: Theoretical Statistics. 1994, London: Chapman & Hall.
20.
Royen T: Expansions for the multivariate chi-Square distribution. J Multivariate Anal. 1991, 38: 213-232. 10.1016/0047-259X(91)90041-Y.
21.
Dasgupta N, Spurrier J: A class of multivariate χ2 distributions with applications to comparison with a control. Commun Stat- Theory Methods. 1997, 26: 1559-1573. 10.1080/03610929708832000.
22.
Worsley K: An improved Bonferroni inequality and applications. Biometrika. 1982, 69: 297-302. 10.1093/biomet/69.2.297.
23.
Westfall PH, Young S: Wiley Series in Probability and Mathematical Statistics. Applied Probability and Statistics. 1992, New York: NY Wiley, xvii, 340 p.
24.
Yu K, Liang F, Ciampa J, Chatterjee N: Efficient p-value evaluation for resampling-based tests. Biostatistics. 2011, 12 (3): 582-593. 10.1093/biostatistics/kxq078.
25.
Commenges D, Liquet B: Asymptotic distribution of score statistics for spatial cluster detection with censored data. Biometrics. 2008, 64 (4): 1287-1289. 10.1111/j.1541-0420.2008.01132_1.x.
26.
Romano J: On the behavior of randomization tests without a group invariance assumption. J Am Stat Assoc. 1990, 85 (411–412): 686-
27.
Xu H, Hsu J: Applying the generalized partitioning principle to control the generalized familywise error rate. Biom J. 2007, 49: 52-67. 10.1002/bimj.200610307.
28.
Kaizar E, Li Y, Hsu J: Permutation multiple tests of binary features do not uniformly control error rates. J Am Stat Assoc. 2011, 106 (495): 1067-1074. 10.1198/jasa.2011.tm10067.
29.
Commenges D: Transformations which preserve exchangeability and application to permutation tests. J Nonparametric Stat. 2003, 15 (2): 171-185. 10.1080/1048525031000089310.
31.
Good P: Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses. 2000, New York: Springer-Verlag.
32.
Efron B, Tibshirani R: An Introduction to the Bootstrap (Chapman & Hall/CRC Monographs on Statistics & Applied Probability). 1994, London: Chapman and Hall/CRC.
Metadata
Title
Correction of the significance level when attempting multiple transformations of an explanatory variable in generalized linear models
Authors
Benoit Liquet
Jérémie Riou
Publication date
01-12-2013
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2013
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/1471-2288-13-75
