Published in: BMC Medical Research Methodology 1/2005

Open Access 01-12-2005 | Research article

Oblique decision trees for spatial pattern detection: optimal algorithm and application to malaria risk

Authors: Jean Gaudart, Belco Poudiougou, Stéphane Ranque, Ogobara Doumbo

Abstract

Background

To detect potential disease clusters when a putative source cannot be specified, classical procedures scan the geographical area with circular windows, using a specified grid superimposed on the map. However, the choice of the windows' shapes, sizes and centers is critical, and different choices may not give exactly the same results.
The aim of our work was to use an Oblique Decision Tree (ODT) model, which provides potential clusters without pre-specifying shapes, sizes or centers. For this purpose, we developed an ODT algorithm to find an oblique partition of the space defined by the geographic coordinates.

Methods

ODT is based on the classification and regression tree (CART). Whereas CART finds rectangular partitions of the covariate space, ODT provides oblique partitions maximizing the interclass variance of the dependent variable. Since this is an NP-hard problem in R^N, classical ODT algorithms use evolutionary procedures or heuristics. We developed an optimal ODT algorithm in R^2, based on the directions defined by each pair of point locations. This partition provided potential clusters, which can be tested with Monte-Carlo inference.
We applied the ODT model to a dataset in order to identify potential high-risk clusters of malaria in a village in Western Africa during the dry season. The ODT results were compared with those of Kulldorff's SaTScan™.
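
As an illustration of the idea of searching oblique splits in the plane through the directions defined by pairs of point locations, the following is a minimal brute-force sketch in Python. It is not the authors' algorithm: the function names, the use of the between-class sum of squares as the criterion, and the choice to cut only between distinct projected values are assumptions made for this example. It searches a single split; a tree would apply it recursively to each resulting class.

```python
from itertools import combinations


def between_class_ss(left, right):
    """Between-class sum of squares of the response for a two-way split."""
    n_l, n_r = len(left), len(right)
    mean_l = sum(left) / n_l
    mean_r = sum(right) / n_r
    mean_all = (sum(left) + sum(right)) / (n_l + n_r)
    return n_l * (mean_l - mean_all) ** 2 + n_r * (mean_r - mean_all) ** 2


def best_oblique_split(points, z):
    """Return (score, normal, threshold) for the best line found, or None.

    points: list of (x, y) locations; z: list of responses (e.g. 0/1).
    A point falls in the first class when normal . point <= threshold.
    Hypothetical helper for illustration only.
    """
    best = None
    for i, j in combinations(range(len(points)), 2):
        dx = points[j][0] - points[i][0]
        dy = points[j][1] - points[i][1]
        if dx == 0 and dy == 0:
            continue  # coincident locations define no direction
        normal = (-dy, dx)  # perpendicular to the direction from i to j
        # Project every location onto the normal and sort along it.
        proj = sorted(
            (normal[0] * px + normal[1] * py, zi)
            for (px, py), zi in zip(points, z)
        )
        for k in range(1, len(proj)):
            if proj[k - 1][0] == proj[k][0]:
                continue  # a line cannot pass between equal projections
            left = [zi for _, zi in proj[:k]]
            right = [zi for _, zi in proj[k:]]
            score = between_class_ss(left, right)
            if best is None or score > best[0]:
                threshold = (proj[k - 1][0] + proj[k][0]) / 2
                best = (score, normal, threshold)
    return best
```

With n locations there are n(n-1)/2 candidate directions, so this exhaustive search is only practical for small datasets. With floating-point coordinates, the equality check on projections may need a tolerance.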

Results

The ODT procedure provided four classes of risk of infection. In the first high-risk class, 60% of the children were infected (95% confidence interval (CI95%) [52.22–67.55]). Monte-Carlo inference showed that the spatial pattern obtained from the ODT model was significant (p < 0.0001).
SaTScan yielded one significant cluster where the risk of disease was high, with an infection rate of 54.21%, CI95% [47.51–60.75]. Its center was located within the first high-risk ODT class. Both procedures provided similar results, identifying a high-risk cluster in the western part of the village, where a mosquito breeding site was located.
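
A Monte-Carlo p-value of the kind reported for the ODT pattern can be sketched as follows. The abstract does not say how the null hypothesis was simulated, so this example assumes a permutation scheme in which infection status is shuffled across fixed locations. `monte_carlo_p_value` and the toy statistic `west_east_gap` are hypothetical names introduced for illustration; in practice the statistic would be the criterion of the fitted partition.

```python
import random


def monte_carlo_p_value(statistic, points, z, n_sim=999, seed=0):
    """Permutation-based Monte-Carlo p-value for a spatial statistic.

    statistic(points, z) must return a number that grows with the strength
    of the spatial pattern. Responses z are shuffled over fixed locations
    to simulate the absence of spatial pattern (an assumption made here).
    """
    rng = random.Random(seed)
    observed = statistic(points, z)
    shuffled = list(z)
    at_least_as_extreme = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        if statistic(points, shuffled) >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed data as one of the simulated outcomes.
    return (at_least_as_extreme + 1) / (n_sim + 1)


def west_east_gap(points, z):
    """Toy statistic: absolute gap in mean response, west vs east of x = 0."""
    west = [zi for (x, _), zi in zip(points, z) if x < 0]
    east = [zi for (x, _), zi in zip(points, z) if x >= 0]
    if not west or not east:
        return 0.0
    return abs(sum(west) / len(west) - sum(east) / len(east))
```

Under this estimator the smallest attainable p-value is 1/(n_sim + 1), so the number of simulations bounds how small a p-value can be reported.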

Conclusion

ODT models improve on classical scanning procedures by detecting potential disease clusters independently of any specification of the shapes, sizes or centers of the clusters.
Metadata
Title
Oblique decision trees for spatial pattern detection: optimal algorithm and application to malaria risk
Authors
Jean Gaudart
Belco Poudiougou
Stéphane Ranque
Ogobara Doumbo
Publication date
01-12-2005
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2005
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/1471-2288-5-22