Published in: BMC Medical Research Methodology 1/2010

Open Access 01-12-2010 | Research article

A probit- log- skew-normal mixture model for repeated measures data with excess zeros, with application to a cohort study of paediatric respiratory symptoms

Authors: Sadia Mahmud, WY Wendy Lou, Neil W Johnston

Abstract

Background

A zero-inflated continuous outcome is characterized by the occurrence of "excess" zeros, that is, more zeros than a single distribution can explain, with the positive observations forming a skewed distribution. Mixture models are employed for regression analysis of zero-inflated data. For repeated measures zero-inflated data, the clustering structure must also be modeled for an adequate analysis.

Methods

The Diary of Asthma and Viral Infections Study (DAVIS) was a one-year (2004) cohort study conducted at McMaster University to monitor viral infection and respiratory symptoms in children aged 5-11 years with and without asthma. Respiratory symptoms were recorded daily using either an Internet or paper-based diary. Changes in symptoms were assessed by study staff and led to collection of nasal fluid specimens for virological testing. The study objectives included investigating the response of respiratory symptoms to respiratory viral infection in children with and without asthma over a one-year period. Because the data were sparse, daily respiratory symptom scores were aggregated into weekly average scores. More than 70% of the weekly average scores were zero, with the positive scores forming a skewed distribution. We propose a random effects probit/log-skew-normal mixture model to analyze the DAVIS data. The model parameters were estimated using a maximum marginal likelihood approach. A simulation study was conducted to assess the performance of the proposed mixture model when the underlying distribution of the positive response differs from the log-skew-normal.
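To make the structure of such a model concrete, the following Python sketch evaluates the marginal log-likelihood of one child's weekly scores under a two-part probit/log-skew-normal model. It is an illustration only, not the authors' implementation. Several choices are assumptions that the abstract does not specify: the probit part is written for the probability of a positive score, a single normal random intercept enters the probit part only, the integral over that intercept is approximated by Gauss-Hermite quadrature, and all function names, parameter names, and numeric values are hypothetical.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm


def log_skew_normal_logpdf(y, xi, omega, lam):
    """Log density of Y > 0 when log(Y) is skew-normal(xi, omega, lam)."""
    z = (np.log(y) - xi) / omega
    # Skew-normal density of log(Y), plus the Jacobian term -log(y).
    return (np.log(2.0) - np.log(omega) - np.log(y)
            + norm.logpdf(z) + norm.logcdf(lam * z))


def subject_marginal_loglik(y, eta, xi, omega, lam, sigma_b, n_nodes=20):
    """Marginal log-likelihood of one subject's repeated scores.

    y       : nonnegative scores (zeros allowed), one per week
    eta     : probit linear predictor for P(score > 0), per week
    xi      : location of log(score) for positive weeks, per week
    sigma_b : SD of the random intercept in the probit part (assumed structure)
    """
    y = np.asarray(y, dtype=float)
    eta = np.broadcast_to(np.asarray(eta, dtype=float), y.shape)
    xi = np.broadcast_to(np.asarray(xi, dtype=float), y.shape)
    pos = y > 0

    # Continuous part: free of the random intercept in this sketch.
    ll_pos = log_skew_normal_logpdf(y[pos], xi[pos], omega, lam).sum()

    # Binary part: integrate over b ~ N(0, sigma_b^2) by Gauss-Hermite.
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    b = np.sqrt(2.0) * sigma_b * nodes            # change of variables
    lin = eta[None, :] + b[:, None]               # shape (n_nodes, n_weeks)
    ll_bin = np.where(pos[None, :],
                      norm.logcdf(lin),           # log P(score > 0 | b)
                      norm.logcdf(-lin)).sum(axis=1)
    log_integral = logsumexp(ll_bin, b=weights / np.sqrt(np.pi))

    return log_integral + ll_pos


# Hypothetical weekly scores for one child (mostly zeros, a few positives).
scores = [0.0, 0.0, 1.4, 0.0, 0.3, 0.0]
print(subject_marginal_loglik(scores, eta=-0.6, xi=-0.2,
                              omega=0.8, lam=1.5, sigma_b=1.0))
```

Summing this quantity over subjects and maximizing it with a general-purpose optimizer would give maximum marginal likelihood estimates under these assumptions. Setting `lam` to zero reduces the positive part to an ordinary log-normal density.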

Results

Viral infection status was highly significant in both the probit and log-skew-normal model components. The probability of being symptom free was much lower for a week in which a child was viral positive than for a week in which she/he was viral negative. The severity of symptoms was also greater for weeks in which a child was viral positive. The probability of being symptom free was smaller for asthmatics than for non-asthmatics throughout the year, whereas there was no difference in symptom severity between the two groups.

Conclusions

A positive association was observed between viral infection status and both the probability of experiencing any respiratory symptoms and their severity during the year. For the DAVIS data, the random effects probit/log-skew-normal model fits significantly better than the random effects probit/log-normal model, supporting our parametric choice for the model. The simulation study indicates that the proposed model appears robust to misspecification of the distribution of the positive skewed response.
Metadata
Title
A probit- log- skew-normal mixture model for repeated measures data with excess zeros, with application to a cohort study of paediatric respiratory symptoms
Authors
Sadia Mahmud
WY Wendy Lou
Neil W Johnston
Publication date
01-12-2010
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2010
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/1471-2288-10-55
