Published in: Emerging Themes in Epidemiology 1/2017

Open Access 01-12-2017 | Analytic perspective

Decision trees in epidemiological research

Authors: Ashwini Venkatasubramaniam, Julian Wolfson, Nathan Mitchell, Timothy Barnes, Meghan JaKa, Simone French

Abstract

Background

In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods.

Main text

We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups whose members share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression Tree (CART) technique and the newer Conditional Inference Tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel graphical visualization of the subgroups defined by decision trees, which provides a more scientifically meaningful characterization of those subgroups.
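The contrast between the two split-selection strategies can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the implementation used in the study; R implementations of conditional inference trees exist, for example in the partykit toolkit [12]. The hypothetical function `best_cart_split` performs a CART-style exhaustive search over the cutpoints of one covariate, keeping the split that most reduces the within-node sum of squares of a continuous outcome [18]. The hypothetical `ctree_style_select` mimics the CTree idea [22] of first testing each covariate for association with the outcome and splitting only when the smallest multiplicity-adjusted p-value falls below a threshold. As an assumed stand-in for CTree's conditional inference statistics, it uses a permutation test of the absolute Pearson correlation with a Bonferroni adjustment; the defaults (alpha = 0.05, 999 permutations) are illustrative choices.

```python
import random
from statistics import mean


def sse(values):
    """Within-node sum of squared deviations from the node mean (CART impurity for a continuous outcome)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_cart_split(x, y):
    """CART-style exhaustive search over the cutpoints of a single covariate.

    Returns (cutpoint, SSE reduction) for the split x <= cutpoint vs. x > cutpoint
    that most reduces the within-node sum of squares, or None if x is constant.
    """
    parent = sse(y)
    best = None
    for c in sorted(set(x))[:-1]:  # splitting at the maximum would leave the right node empty
        left = [yi for xi, yi in zip(x, y) if xi <= c]
        right = [yi for xi, yi in zip(x, y) if xi > c]
        gain = parent - sse(left) - sse(right)
        if best is None or gain > best[1]:
            best = (c, gain)
    return best


def abs_corr(x, y):
    """Absolute Pearson correlation, used here as a simple association statistic."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Monte Carlo permutation p-value for H0: x and y are independent."""
    rng = random.Random(seed)
    observed = abs_corr(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs_corr(x, y_perm) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def ctree_style_select(covariates, y, alpha=0.05, n_perm=999):
    """Simplified CTree-style step: test every covariate, then split or stop.

    covariates maps a covariate name to its list of values. Returns the name of
    the covariate with the smallest Bonferroni-adjusted p-value if that p-value
    is <= alpha; returns None (stop: make this node a leaf) otherwise.
    """
    k = len(covariates)
    adjusted = {
        name: min(1.0, k * permutation_pvalue(x, y, n_perm))
        for name, x in covariates.items()
    }
    chosen = min(adjusted, key=adjusted.get)
    return chosen if adjusted[chosen] <= alpha else None
```

Because the covariate is chosen by a test before any cutpoint is searched, CTree-style selection avoids the tendency of exhaustive searches such as `best_cart_split` to favor covariates with many possible cutpoints [19–22]; the significance threshold also serves as a stopping rule, so no separate pruning step is needed. A full CART fit would instead search all covariates and cutpoints, grow a large tree, and prune it back [18, 23, 24].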

Conclusions

Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.
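Each subgroup produced by a decision tree is a terminal node, defined by the conjunction of the split conditions on the path from the root to that node. The sketch below (Python, with hypothetical class and function names) represents a fitted binary tree, lists each subgroup's defining rule with its predicted mean outcome, and assigns a new individual to a subgroup. The toy tree, the covariate names `x1` and `x2`, and the leaf values are illustrative only and are not results from the Box Lunch Study.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    prediction: float  # mean outcome in this terminal node (subgroup)


@dataclass
class Split:
    variable: str
    cutpoint: float
    left: "Node"   # observations with variable <= cutpoint
    right: "Node"  # observations with variable > cutpoint


Node = Union[Leaf, Split]


def subgroups(node, conditions=()):
    """List (rule, prediction) pairs, one per leaf.

    Each rule is the conjunction of split conditions on the root-to-leaf path.
    """
    if isinstance(node, Leaf):
        rule = " AND ".join(conditions) if conditions else "all observations"
        return [(rule, node.prediction)]
    left = conditions + (f"{node.variable} <= {node.cutpoint}",)
    right = conditions + (f"{node.variable} > {node.cutpoint}",)
    return subgroups(node.left, left) + subgroups(node.right, right)


def assign(node, person):
    """Follow the splits for one individual (a dict of covariates) and return the leaf prediction."""
    while isinstance(node, Split):
        node = node.left if person[node.variable] <= node.cutpoint else node.right
    return node.prediction


# Toy tree with hypothetical covariates x1, x2 and illustrative leaf values.
tree = Split("x1", 30, Leaf(1.0), Split("x2", 5, Leaf(2.0), Leaf(3.0)))
for rule, prediction in subgroups(tree):
    print(f"{rule}: {prediction}")
# x1 <= 30: 1.0
# x1 > 30 AND x2 <= 5: 2.0
# x1 > 30 AND x2 > 5: 3.0
```

Reading subgroups off as rules in this way is what makes a tree's output interpretable as combinations of individual characteristics rather than as a single regression equation.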
Footnotes
1
Some studies record participants’ self-reported levels of wanting and liking using quantitative scales (e.g., [15]), while other studies measure them via brain activity during a motivational state (e.g., [16, 17]).

2
This standardized value of 0.26 is calculated from \((2190 - 2012)/685.55\), where 2012 is the mean energy intake and 685.55 is its standard deviation.
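The arithmetic in this footnote is an ordinary z-score, (value − mean) / SD. A minimal Python check, using a hypothetical helper named `standardize`:

```python
def standardize(value, mean, sd):
    """z-score: how many standard deviations `value` lies from `mean`."""
    return (value - mean) / sd


# Values from footnote 2: 2190 standardized against a mean of 2012 and an SD of 685.55.
z = standardize(2190, 2012, 685.55)
print(round(z, 2))  # 0.26
```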
 
Literature
1.
Van Hulst A, Roy-Gagnon M-H, Gauvin L, Kestens Y, Henderson M, Barnett TA. Identifying risk profiles for childhood obesity using recursive partitioning based on individual, familial, and neighborhood environment factors. Int J Behav Nutr Phys Act. 2015;12(1):17.
2.
Garzotto M, Beer TM, Hudson RG, Peters L, Hsieh Y-C, Barrera E, Klein T, Mori M. Improved detection of prostate cancer using classification and regression tree analysis. J Clin Oncol. 2005;23(19):4322–9.
3.
Ogden CL, Carroll MD, Curtin LR, McDowell MA, Tabak CJ, Flegal KM. Prevalence of overweight and obesity in the United States, 1999–2004. JAMA. 2006;295(13):1549–55.
4.
Flegal KM, Kruszon-Moran D, Carroll MD, Fryar CD, Ogden CL. Trends in obesity among adults in the United States, 2005 to 2014. JAMA. 2016;315(21):2284–91.
5.
Gass K, Klein M, Chang HH, Flanders WD, Strickland MJ. Classification and regression trees for epidemiologic research: an air pollution example. Environ Health. 2014;13(1):17.
6.
Aguiar FS, Almeida LL, Ruffino-Netto A, Kritski AL, Mello FC, Werneck GL. Classification and regression tree (CART) model to predict pulmonary tuberculosis in hospitalized patients. BMC Pulm Med. 2012;12(1):40.
7.
Lei Y, Nollen N, Ahluwahlia JS, Yu Q, Mayo MS. An application in identifying high-risk populations in alternative tobacco product use utilizing logistic regression and CART: a heuristic comparison. BMC Public Health. 2015;15(1):341.
8.
French SA, Mitchell NR, Wolfson J, Harnack LJ, Jeffery RW, Gerlach AF, Blundell JE, Pentel PR. Portion size effects on weight gain in a free living setting. Obesity. 2014;22(6):1400–5.
9.
French SA, Mitchell NR, Wolfson J, Finlayson G, Blundell JE, Jeffery RW. Questionnaire and laboratory measures of eating behavior. Associations with energy intake and BMI in a community sample of working adults. Appetite. 2014;72:50–8.
10.
Stunkard AJ, Messick S. The three-factor eating questionnaire to measure dietary restraint, disinhibition and hunger. J Psychosom Res. 1985;29(1):71–83.
12.
Hothorn T, Zeileis A. Partykit: a modular toolkit for recursive partytioning in R. J Mach Learn Res. 2015;16:3905–9.
14.
Torgo L. Data mining with R, learning with case studies. Boca Raton: Chapman and Hall/CRC; 2010.
15.
McNeil J, Cadieux S, Finlayson G, Blundell J, Doucet E. Associations between sleep parameters and food reward. J Sleep Res. 2015;24(3):346–50.
17.
Pool E, Sennwald V, Delplanque S, Brosch T, Sander D. Measuring wanting and liking from animals to humans: a systematic review. Neurosci Biobehav Rev. 2016;63:124–42.
18.
Breiman L, Friedman J, Olshen R, Stone C. Classification and regression trees. Boca Raton: CRC Press; 1984.
19.
Loh W, Shih Y. Split selection methods for classification trees. Stat Sin. 1997;7(4):815–40.
20.
White A, Liu W. Technical note: bias in information-based measures in decision tree induction. Mach Learn. 1994;15(3):321–9.
21.
Shih Y. A note on split selection bias in classification trees. Comput Stat Data Anal. 2004;45(3):457–66.
22.
Hothorn T, Hornik K, Zeileis A. Unbiased recursive partitioning: a conditional inference framework. J Comput Graph Stat. 2006;15(3):651–74.
23.
Esposito F, Malerba D, Semeraro G, Kay J. A comparative analysis of methods for pruning decision trees. IEEE Trans Pattern Anal Mach Intell. 1997;19(5):476–91.
24.
Mingers J. An empirical comparison of pruning methods for decision tree induction. Mach Learn. 1989;4(2):227–43.
25.
Schaffer C. Overfitting avoidance as bias. Mach Learn. 1993;10(2):153–78.
26.
Atienza AA, Yaroch AL, Mâsse LC, Moser RP, Hesse BW, King AC. Identifying sedentary subgroups: the National Cancer Institute’s Health Information National Trends Survey. Am J Prev Med. 2006;31(5):383–90.
27.
King AC, Salvo D, Banda JA, Ahn DK, Gill TM, Miller M, Newman AB, Fielding RA, Siordia C, Moore S, et al. An observational study identifying obese subgroups among older adults at increased risk of mobility disability: do perceptions of the neighborhood environment matter? Int J Behav Nutr Phys Act. 2015;12(1):1.
28.
Lee Y-C, Lee W-J, Lin Y-C, Liew P-L, Lee CK, Lin S, Lee T-S. Obesity and the decision tree: predictors of sustained weight loss after bariatric surgery. Hepato Gastroenterol. 2008;56(96):1745–9.
29.
Jung SY, Vitolins MZ, Fenton J, Frazier-Wood AC, Hursting SD, Chang S. Risk profiles for weight gain among postmenopausal women: a classification and regression tree analysis approach. PLoS ONE. 2015;10(3):0121430.
Metadata
Title
Decision trees in epidemiological research
Authors
Ashwini Venkatasubramaniam
Julian Wolfson
Nathan Mitchell
Timothy Barnes
Meghan JaKa
Simone French
Publication date
01-12-2017
Publisher
BioMed Central
Published in
Emerging Themes in Epidemiology / Issue 1/2017
Electronic ISSN: 1742-7622
DOI
https://doi.org/10.1186/s12982-017-0064-4
