Published in: BMC Public Health 1/2015

Open Access 01-12-2015 | Research article

Multiple imputation for non-response when estimating HIV prevalence using survey data

Authors: Amos Chinomona, Henry Mwambi

Abstract

Background

Missing data are a common feature in many areas of research, especially those involving survey data in the biological, health and social sciences. Most analyses of survey data take a complete-case approach, that is, list-wise deletion of all cases with missing values, on the assumption that the values are missing completely at random (MCAR). Methods that substitute a single value for each missing value, such as the last value carried forward, the mean or a regression prediction (single imputation), are also used. These methods often result in biased estimates, in loss of statistical information and in loss of the distributional relationships between variables. In addition, the strong MCAR assumption is not tenable in most practical instances.

Methods

Since missing data are a major problem in HIV research, this study illustrates the strength of multiple imputation as a method of handling missing data. That strength comes from its ability to draw multiple values for each missing observation from a plausible predictive distribution. This is particularly important in HIV research in sub-Saharan Africa, where accurate collection of complete data is still a challenge. Furthermore, multiple imputation accounts for the uncertainty introduced by the very process of imputing values for the missing observations. In particular, national and subgroup estimates of HIV prevalence in Zimbabwe were computed using multiply imputed data sets from the 2010–11 Zimbabwe Demographic and Health Survey (2010–11 ZDHS). A survey logistic regression model relating HIV prevalence to demographic and socio-economic variables was used as the substantive analysis model. The results of both the complete-case analysis and the multiple imputation analysis are presented and discussed.
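The idea of drawing several values for each missing observation from a predictive distribution can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the procedure used in the study: the records are hypothetical, the only predictor is a single grouping variable, the prior is an assumed Beta(1, 1), and survey weights and the sampling design are ignored. The function names `impute_once` and `multiply_impute` are invented for this sketch.

```python
import random


def impute_once(records, rng):
    """Return one completed copy of `records`, a list of (group, outcome) pairs.

    `outcome` is 1 (positive), 0 (negative) or None (missing). For each group,
    a prevalence is first drawn from the Beta posterior implied by the observed
    outcomes (assumed Beta(1, 1) prior), and each missing outcome is then drawn
    as a Bernoulli variable with that prevalence. Drawing the prevalence first
    is what carries parameter uncertainty into the imputed values.
    """
    counts = {}
    for group, outcome in records:
        pos, neg = counts.get(group, (0, 0))
        if outcome is not None:
            pos, neg = pos + outcome, neg + (1 - outcome)
        counts[group] = (pos, neg)

    # One posterior draw of the prevalence per group.
    drawn = {g: rng.betavariate(1 + pos, 1 + neg) for g, (pos, neg) in counts.items()}

    completed = []
    for group, outcome in records:
        if outcome is None:
            outcome = 1 if rng.random() < drawn[group] else 0
        completed.append((group, outcome))
    return completed


def multiply_impute(records, m, seed=0):
    """Create m completed data sets; observed values are never altered."""
    rng = random.Random(seed)
    return [impute_once(records, rng) for _ in range(m)]


# Hypothetical toy data: (residence, HIV test result), None = non-response.
records = [
    ("urban", 1), ("urban", 0), ("urban", None), ("urban", 0),
    ("rural", 0), ("rural", None), ("rural", 0), ("rural", 1),
]

for i, data in enumerate(multiply_impute(records, m=5, seed=1), start=1):
    prevalence = sum(y for _, y in data) / len(data)
    print(f"imputation {i}: crude prevalence = {prevalence:.3f}")
```

Each completed data set is analysed separately, and the spread of the resulting estimates across imputations is what reflects the extra uncertainty due to the missing values.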

Results

Across different subgroups of the population, the crude estimates of HIV prevalence are generally not identical under the two approaches (complete-case analysis and multiple imputation analysis), but their variations are consistent between them. The estimated standard errors under multiple imputation are predominantly smaller than under the complete-case analysis, leading to narrower confidence intervals. Under the logistic regression model, adjusted odds ratios vary greatly between the two approaches. The model-based confidence intervals for the adjusted odds ratios are wider under multiple imputation, which is indicative of the inclusion of a combined measure of the within- and between-imputation variability.
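The combined measure of within- and between-imputation variability is conventionally obtained with Rubin's combining rules. The abstract does not give the formulas, so the following Python sketch assumes the standard form: the pooled estimate is the mean of the m per-imputation estimates, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name `pool_rubin` and the input numbers are hypothetical.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates and their variances (Rubin's rules).

    Returns (pooled estimate, within variance, between variance,
    total variance, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates and matching variances")

    pooled = statistics.fmean(estimates)        # mean of the m estimates
    within = statistics.fmean(variances)        # average sampling variance
    between = statistics.variance(estimates)    # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between      # combined variability
    return pooled, within, between, total, math.sqrt(total)


# Hypothetical log odds ratios and squared standard errors from m = 3 imputations.
log_odds_ratios = [0.40, 0.46, 0.52]
squared_ses = [0.010, 0.012, 0.011]

pooled, within, between, total, se = pool_rubin(log_odds_ratios, squared_ses)
print(f"pooled estimate = {pooled:.3f}, standard error = {se:.3f}")
print(f"within = {within:.4f}, between = {between:.4f}, total = {total:.4f}")
```

Because the between-imputation term is added to the average within-imputation variance, the total variance can never be smaller than the within-imputation component alone.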

Conclusions

There is considerable variation between the estimates obtained under the two approaches. The use of multiple imputation allows the uncertainty brought about by the imputation process to be measured. This consequently yields more reliable estimates of the parameters of interest and reduces the chance of wrongly declaring effects significant (type I error). In addition, the powerful and flexible statistical computing packages available in R facilitate the computations.
Metadata
Title
Multiple imputation for non-response when estimating HIV prevalence using survey data
Authors
Amos Chinomona
Henry Mwambi
Publication date
01-12-2015
Publisher
BioMed Central
Published in
BMC Public Health / Issue 1/2015
Electronic ISSN: 1471-2458
DOI
https://doi.org/10.1186/s12889-015-2390-1