Published in: BMC Medical Informatics and Decision Making 1/2015

Open Access 01-12-2015 | Research article

Fuzzy association rule mining and classification for the prediction of malaria in South Korea

Authors: Anna L. Buczak, Benjamin Baugher, Erhan Guven, Liane C. Ramac-Thomas, Yevgeniy Elbert, Steven M. Babin, Sheri H. Lewis

Abstract

Background

Malaria is the world’s most prevalent vector-borne disease. Accurate prediction of malaria outbreaks may lead to public health interventions that mitigate disease morbidity and mortality.

Methods

We describe an application of a method for creating prediction models that uses Fuzzy Association Rule Mining (FARM) to extract relationships between epidemiological, meteorological, climatic, and socio-economic data from Korea. These relationships take the form of rules, from which the best set of rules is automatically chosen to form a classifier. Two classifiers have been built and their results fused to become a malaria prediction model. Future malaria cases are predicted as LOW, MEDIUM or HIGH, where these classes are defined as a total of 0–2, 3–16, and 17 or more cases, respectively, for a region in South Korea during a two-week period. Based on user recommendations, HIGH is considered an outbreak.
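
The three case-count classes can be illustrated with a short sketch. The function name `case_class` is hypothetical, and the code assumes that HIGH begins at 17 cases so that every non-negative count falls into exactly one class; it is an illustration of the class definitions only, not the authors' implementation.

```python
def case_class(cases: int) -> str:
    """Map a two-week regional case count to LOW, MEDIUM or HIGH.

    Thresholds follow the class definitions in the abstract; the lower
    bound of HIGH (17) is an assumption made so the classes are exhaustive.
    """
    if cases < 0:
        raise ValueError("case count must be non-negative")
    if cases <= 2:
        return "LOW"
    if cases <= 16:
        return "MEDIUM"
    return "HIGH"  # treated as an outbreak, per user recommendations


if __name__ == "__main__":
    for n in (0, 2, 3, 16, 17, 40):
        print(n, case_class(n))
```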

Results

Model accuracy is described by Positive Predictive Value (PPV), Sensitivity, and F-score for each class, computed on test data not previously used to develop the model. For predictions made 7–8 weeks in advance, model PPV and Sensitivity are 0.842 and 0.681, respectively, for the HIGH class. The F0.5 and F3 scores (which combine PPV and Sensitivity) are 0.804 and 0.694, respectively, for the HIGH class. The overall FARM results (as measured by F-scores) are significantly better than those obtained by Decision Tree, Random Forest, Support Vector Machine, and Holt-Winters methods for the HIGH class. For the MEDIUM class, Random Forest and FARM obtain comparable results, with FARM being better at F0.5, and Random Forest obtaining a higher F3.
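
The reported F-scores are consistent with the standard F-beta definition, F_beta = (1 + beta^2) * PPV * Sensitivity / (beta^2 * PPV + Sensitivity), where beta below 1 weights PPV more heavily and beta above 1 weights Sensitivity more heavily. The sketch below assumes that definition (the abstract does not state the formula) and uses a hypothetical helper name, `f_beta`, to recompute the HIGH-class values from the reported PPV and Sensitivity.

```python
def f_beta(ppv: float, sensitivity: float, beta: float) -> float:
    """Standard F-beta score combining PPV (precision) and Sensitivity (recall)."""
    b2 = beta * beta
    denom = b2 * ppv + sensitivity
    if denom == 0:
        return 0.0  # convention when both PPV and Sensitivity are zero
    return (1 + b2) * ppv * sensitivity / denom


# HIGH-class values reported in the abstract
ppv_high, sens_high = 0.842, 0.681

print(round(f_beta(ppv_high, sens_high, 0.5), 3))  # 0.804
print(round(f_beta(ppv_high, sens_high, 3.0), 3))  # 0.694
```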

Conclusions

A previously described method for creating disease prediction models has been modified and extended to build models for predicting malaria. In addition, some new input variables were used, including indicators of intervention measures. The South Korea malaria prediction models predict LOW, MEDIUM or HIGH case counts 7–8 weeks in the future. This paper demonstrates that our data-driven approach can be used for the prediction of different diseases.
Metadata
Title
Fuzzy association rule mining and classification for the prediction of malaria in South Korea
Authors
Anna L. Buczak
Benjamin Baugher
Erhan Guven
Liane C. Ramac-Thomas
Yevgeniy Elbert
Steven M. Babin
Sheri H. Lewis
Publication date
01-12-2015
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2015
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-015-0170-6