Published in: BMC Medical Research Methodology 1/2012

Open Access 01-12-2012 | Research article

Fitting parametric random effects models in very large data sets with application to VHA national data

Authors: Mulugeta Gebregziabher, Leonard Egede, Gregory E Gilbert, Kelly Hunt, Paul J Nietert, Patrick Mauldin

Abstract

Background

With the current focus on personalized medicine, patient/subject level inference is often of key interest in translational research. As a result, random effects models (REM) are becoming popular for patient level inference. However, for very large data sets characterized by large sample size, fitting REM with commonly available statistical software such as SAS can be difficult: the required computing time and memory can exceed what is available, which prevents model convergence. For example, in a retrospective cohort study of over 800,000 Veterans with type 2 diabetes with longitudinal data over 5 years, fitting REM via generalized linear mixed modeling using currently available standard procedures in SAS (e.g. PROC GLIMMIX) was very difficult, and the same problems exist with Stata’s gllamm and R’s lme. Thus, this study proposes a meta regression approach, assesses its performance, and compares it with methods based on sampling of the full data.

Data

We use both simulated and real data from a national cohort of Veterans with type 2 diabetes (n=890,394) which was created by linking multiple patient and administrative files resulting in a cohort with longitudinal data collected over 5 years.

Methods and results

The outcome of interest was mean annual HbA1c measured over a 5-year period. Using this outcome, we compared parameter estimates from the proposed random effects meta regression (REMR) with estimates based on simple random sampling and on VISN (Veterans Integrated Service Networks) based stratified sampling of the full data. Our results indicate that, when the VISN level estimates are homogeneous, REMR provides parameter estimates that are less likely to be biased and have tighter confidence intervals.
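The idea of combining VISN level estimates can be illustrated with a minimal sketch. The abstract does not specify the estimator, so this is not the authors' implementation: it assumes that one coefficient and its standard error have already been obtained by fitting the model separately within each VISN, and it pools them with DerSimonian-Laird random-effects weights, a standard meta-analysis technique. The function name and the numeric inputs are hypothetical.

```python
import math


def dersimonian_laird(estimates, std_errors):
    """Pool per-stratum estimates with DerSimonian-Laird random-effects weights.

    estimates  : one coefficient estimate per stratum (e.g. per VISN)
    std_errors : the matching standard errors
    Returns (pooled estimate, pooled standard error, between-stratum variance).
    """
    k = len(estimates)
    variances = [se ** 2 for se in std_errors]

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, estimates)) / sum_w

    # Cochran's Q and the moment estimator of the between-stratum variance.
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to each within-stratum variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * b for wi, b in zip(w_star, estimates)) / sum(w_star)
    pooled_se = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled_se, tau2


# Hypothetical per-VISN slope estimates and standard errors (illustration only).
betas = [0.10, 0.12, 0.08, 0.11]
ses = [0.02, 0.03, 0.02, 0.025]
pooled, pooled_se, tau2 = dersimonian_laird(betas, ses)
print(f"pooled={pooled:.4f}  se={pooled_se:.4f}  tau2={tau2:.4f}")
```

Each stratum is fitted on its own, so no single model fit has to hold the full data in memory. A full meta regression would additionally include stratum level covariates in the pooling step.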

Conclusion

When the goal is to fit REM to repeated measures data with a very large sample size, REMR is a good alternative. It leads to reasonable inference for both Gaussian and non-Gaussian responses if parameter estimates are homogeneous across VISNs.
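Because the conclusion depends on homogeneity across VISNs, it may help to quantify that homogeneity before pooling. The abstract does not say which check the authors used. The sketch below assumes the standard Cochran's Q statistic and the I² index from the meta-analysis literature; the function name and inputs are hypothetical.

```python
def heterogeneity(estimates, std_errors):
    """Return Cochran's Q, its degrees of freedom, and the I^2 index.

    I^2 near 0 suggests the stratum-level estimates are homogeneous;
    values near 1 suggest most of the spread is between strata.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - fixed) ** 2 for w, b in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2


# Hypothetical per-VISN estimates and standard errors (illustration only).
q, df, i2 = heterogeneity([0.10, 0.12, 0.08, 0.11], [0.02, 0.03, 0.02, 0.025])
print(f"Q={q:.3f} on {df} df, I^2={i2:.2f}")
```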
Metadata
Title
Fitting parametric random effects models in very large data sets with application to VHA national data
Authors
Mulugeta Gebregziabher
Leonard Egede
Gregory E Gilbert
Kelly Hunt
Paul J Nietert
Patrick Mauldin
Publication date
01-12-2012
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2012
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/1471-2288-12-163