Published in: BMC Medical Research Methodology 1/2021

Open Access 01-12-2021 | Research

Analysis of zero inflated dichotomous variables from a Bayesian perspective: application to occupational health

Authors: David Moriña, Pedro Puig, Albert Navarro


Abstract

Background

Zero-inflated models address the problem that arises when the zeros observed in a distribution are generated by two different sources. In practice, this happens because the population studied actually consists of two subpopulations: one in which the value is zero by default (structural zeros) and another in which a zero arises circumstantially (sample zeros).

Methods

This work proposes a new methodology to fit zero-inflated Bernoulli data under a Bayesian approach that is able to distinguish between two potential sources of zeros (structural and non-structural).
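
As a rough illustration of the two sources of zeros, a zero-inflated Bernoulli variable can be written as a two-component mixture. The symbols below are assumptions made for this sketch and are not taken from the paper: \omega denotes the probability that an individual belongs to the at-risk subpopulation, and p the probability of the event given that the individual is at risk.

```latex
% Sketch of a zero-inflated Bernoulli mixture (notation assumed, not the authors').
% \omega : probability of belonging to the at-risk subpopulation
% p      : probability of the event for an at-risk individual
\begin{aligned}
P(Y = 1) &= \omega\, p, \\
P(Y = 0) &= \underbrace{(1 - \omega)}_{\text{structural zeros}}
          + \underbrace{\omega\,(1 - p)}_{\text{non-structural zeros}} .
\end{aligned}
```

Under this parameterization the two probabilities sum to one, and the observed proportion of ones estimates the product \omega p rather than p itself.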

Results

The performance of the proposed methodology has been evaluated through a comprehensive simulation study, and the methodology has been implemented as an R package freely available to the community. Its usage is illustrated with a real example from the field of occupational health, the phenomenon of sickness presenteeism, in which it is reasonable to think that some individuals will never be at risk of presenteeism because they have not been sick during the study period (structural zeros). Without separating structural and non-structural zeros, one would be studying general health status and presenteeism jointly, and would therefore obtain potentially biased estimates, because the phenomenon is implicitly underestimated when it is diluted into the general health status.
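
The dilution effect described above can be illustrated with a small simulation. This is a minimal sketch in Python using only the standard library, not the authors' method or the bayesZIB API. The function names, the parameter names `w` (probability of being at risk) and `p` (probability of the event given at risk), and the chosen values are all hypothetical.

```python
import random


def simulate_zib(n, w, p, seed=0):
    """Simulate n zero-inflated Bernoulli observations.

    w: probability that a subject is at risk (assumed name, illustration only)
    p: probability of the event for an at-risk subject (assumed name)
    Returns a list of (at_risk, y) pairs.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        at_risk = rng.random() < w
        # Subjects who are not at risk always produce a structural zero.
        y = 1 if (at_risk and rng.random() < p) else 0
        data.append((at_risk, y))
    return data


def naive_rate(data):
    """Proportion of ones over everyone, ignoring the source of the zeros."""
    return sum(y for _, y in data) / len(data)


def at_risk_rate(data):
    """Proportion of ones among at-risk subjects only."""
    ys = [y for at_risk, y in data if at_risk]
    return sum(ys) / len(ys)


data = simulate_zib(n=100_000, w=0.6, p=0.5, seed=1)
print("naive estimate:  ", round(naive_rate(data), 3))
print("at-risk estimate:", round(at_risk_rate(data), 3))
```

In this simulation the naive proportion lands near w * p = 0.3, well below the at-risk probability p = 0.5. In real data the at-risk indicator is not observed, which is why a model that separates the two sources of zeros is needed.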

Conclusions

The proposed methodology is able to distinguish two different sources of zeros (structural and non-structural) in dichotomous data, with or without covariates, in a Bayesian framework, and has been made available to any interested researcher as the bayesZIB R package (https://cran.r-project.org/package=bayesZIB).
Metadata
Title
Analysis of zero inflated dichotomous variables from a Bayesian perspective: application to occupational health
Authors
David Moriña
Pedro Puig
Albert Navarro
Publication date
01-12-2021
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2021
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/s12874-021-01427-2
