Published in: BMC Medical Research Methodology 1/2019

Open Access 01-12-2019 | Suicide | Research article

Explainable statistical learning in public health for policy development: the case of real-world suicide data

Authors: Paul van Schaik, Yonghong Peng, Adedokun Ojelabi, Jonathan Ling



Abstract

Background

In recent years, the amount of publicly available data related to public health has increased substantially. These data have considerable potential for developing public health policy; however, realising this potential requires meaningful and insightful analysis. Our aim is to demonstrate how data analysis techniques can be used to address data reduction, prediction and explanation in publicly available online public health data, in order to provide a sound basis for informing public health policy.

Methods

Observational suicide prevention data from an existing online United Kingdom national public health database were analysed. Multi-collinearity analysis and principal component analysis were used to reduce the set of correlated indicators, followed by regression analyses for the prediction and explanation of suicide.
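The abstract does not state which multi-collinearity criterion was applied. As a minimal sketch, assuming a variance inflation factor (VIF) screen with the common rule-of-thumb cut-off of 10, redundant indicators can be dropped one at a time, always removing the indicator with the largest VIF, where VIF = 1 / (1 - R^2) from regressing that indicator on all the others. The indicator names and values are hypothetical, and ordinary least squares is written with the Python standard library only so the example has no dependencies.

```python
# Minimal sketch of iterative VIF screening; the cut-off of 10 is an assumption.


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(predictors, y):
    """R^2 of an OLS fit of y on the predictor columns plus an intercept."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(indicators, name):
    """VIF of one indicator: 1 / (1 - R^2) from regressing it on all other indicators."""
    others = [col for key, col in indicators.items() if key != name]
    r2 = r_squared(others, indicators[name])
    return float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)


def screen_collinear(indicators, threshold=10.0):
    """Drop the indicator with the largest VIF until every remaining VIF <= threshold."""
    kept = dict(indicators)
    while len(kept) > 1:
        scores = {name: vif(kept, name) for name in kept}
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del kept[worst]
    return list(kept)


# Hypothetical area-level indicators: indicator_c is almost exactly a + b.
indicators = {
    "indicator_a": [1, 2, 3, 4, 5, 6, 7, 8],
    "indicator_b": [3, 1, 4, 1, 5, 9, 2, 6],
    "indicator_c": [4.1, 2.9, 7.05, 4.95, 10.1, 14.9, 9.05, 13.95],
}
kept = screen_collinear(indicators)
print("retained indicators:", kept)
```

Removing one indicator at a time, rather than every indicator above the cut-off at once, matters because a single redundant indicator inflates the VIFs of all the indicators it is built from.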

Results

Multi-collinearity analysis reduced the set of indicator predictors by 30%, and principal component analysis reduced it by a further 86%. Regression for prediction identified four significant indicator predictors of suicide behaviour (emergency hospital admissions for intentional self-harm, children leaving care, statutory homelessness and self-reported well-being/low happiness) and two main component predictors (relatedness dysfunction, and behavioural problems and mental illness). Regression for explanation identified significant moderation of a well-being predictor of suicide behaviour (low happiness) by a social factor (living alone), which supports existing theory and provides insight beyond the results of regression for prediction. Two independent predictors capturing relatedness needs in social care service delivery were also identified.
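Principal component analysis summarises correlated indicators as a smaller number of components. The sketch below assumes components are extracted from the correlation matrix and retained with the Kaiser criterion (eigenvalue greater than 1); the study's actual extraction and retention rules are not given in the abstract. Eigenvalues are found by power iteration with deflation, which is adequate for a handful of indicators but is not a substitute for a numerical library. The five indicators are hypothetical: three follow one underlying factor and two follow another.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of the indicator columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        z.append([(v - mean) / sd for v in col])
    p = len(z)
    return [[sum(z[a][i] * z[b][i] for i in range(n)) / (n - 1) for b in range(p)]
            for a in range(p)]


def eigenvalues(matrix, iterations=1000):
    """Eigenvalues of a symmetric PSD matrix via power iteration with deflation."""
    a = [row[:] for row in matrix]
    p = len(a)
    values = []
    for _ in range(p):
        v = [1.0] * p
        lam = 0.0
        for _ in range(iterations):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining matrix is numerically zero
                lam = 0.0
                break
            v = [x / norm for x in w]
            # Rayleigh quotient v' A v estimates the current dominant eigenvalue.
            lam = sum(v[i] * a[i][j] * v[j] for i in range(p) for j in range(p))
        values.append(lam)
        # Deflation: subtract lam * v v' so the next pass finds the next eigenvalue.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return sorted(values, reverse=True)


def kaiser_retained(columns):
    """Eigenvalues of the correlation matrix and the number exceeding 1."""
    eig = eigenvalues(correlation_matrix(columns))
    return eig, sum(1 for lam in eig if lam > 1.0)


# Hypothetical indicators: a, b, c follow one factor; d, e follow another.
indicators = {
    "indicator_a": [1.1, 1.9, 3.0, 4.1, 4.9, 6.0, 7.1, 7.9],
    "indicator_b": [2.0, 4.1, 5.9, 8.0, 10.1, 11.9, 14.0, 16.1],
    "indicator_c": [0.9, 2.1, 3.1, 3.9, 5.0, 6.1, 6.9, 8.0],
    "indicator_d": [1.1, -0.9, 1.0, -1.1, -0.9, 1.0, -1.0, 0.9],
    "indicator_e": [0.9, -1.0, 1.1, -0.9, -1.1, 0.9, -1.0, 1.1],
}
eig, n_retained = kaiser_retained(list(indicators.values()))
explained = sum(eig[:n_retained]) / len(eig)  # trace of a correlation matrix = p
print([round(x, 3) for x in eig], n_retained, f"{explained:.1%}")
```

With these data two components are retained, one per underlying factor, so five indicators reduce to two component predictors.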

Conclusions

We demonstrate the effectiveness of regression techniques in the analysis of online public health data. Regression analysis for prediction and regression analysis for explanation can both be appropriate for analysing public health data and for better understanding public health outcomes. It is therefore essential to clarify the aim of the analysis (prediction accuracy or theory development) as a basis for choosing the most appropriate model. We apply these techniques to suicide data; however, we argue that the approach presented in this study should be applied to datasets across public health in order to improve the quality of health policy recommendations.
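Moderation of the kind reported in the Results (low happiness moderated by living alone) is commonly tested in regression for explanation by adding a product term, as in the regression-based approach described by Hayes [18]. A minimal sketch, assuming mean-centred predictors: `low_happiness` and `living_alone` are hypothetical area-level values, and the outcome is generated without noise from known coefficients so that the interaction coefficient b3 = 0.5 is recovered exactly. Simple slopes then give the effect of the predictor at chosen values of the moderator.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(predictors, y):
    """OLS coefficients [b0, b1, ...] of y on the predictor columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    return solve(xtx, xty)


def centre(values):
    """Subtract the mean so that b1 and b2 are effects at the average of the other variable."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def fit_moderation(x, w, y):
    """Fit y = b0 + b1*xc + b2*wc + b3*(xc*wc) with mean-centred x and w."""
    xc, wc = centre(x), centre(w)
    product = [a * b for a, b in zip(xc, wc)]
    return ols([xc, wc, product], y)


def simple_slope(coefs, w_centred):
    """Conditional effect of x on y at a centred moderator value: b1 + b3 * w."""
    return coefs[1] + coefs[3] * w_centred


# Hypothetical data, not from the study; the outcome is noise-free by construction.
low_happiness = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
living_alone = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
outcome = [1 + 0.3 * x + 0.2 * w + 0.5 * x * w
           for x, w in zip(low_happiness, living_alone)]

coefs = fit_moderation(low_happiness, living_alone, outcome)
sd_w = (sum(v * v for v in centre(living_alone)) / (len(living_alone) - 1)) ** 0.5
print("interaction b3:", round(coefs[3], 6))
print("slope at -1 SD / +1 SD of living_alone:",
      round(simple_slope(coefs, -sd_w), 3), round(simple_slope(coefs, sd_w), 3))
```

Centring leaves b3 unchanged but makes b1 the effect of low happiness at the average level of living alone, which is easier to interpret. With real data, b3 would also need a standard error and significance test.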
Footnotes
1
“a set of tools for modeling and understanding complex datasets” ([20], p. vii). We use the term ‘statistical learning’ rather than ‘machine learning’ as the former more accurately represents the statistical analysis used in this paper.
 
2
“rich source of indicators across a range of health and wellbeing themes that has been designed to support JSNA [Joint Strategic Needs Assessment] and commissioning to improve health and wellbeing, and reduce inequalities.” [34]
 
3
Excluded were the Isles of Scilly, City of London and Rutland because of limited data availability.
 
4
To allow comparison between the different approaches, the variance accounted for (R²) is presented for each approach.
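R² is the proportion of outcome variance a model accounts for, R² = 1 - SS_res / SS_tot, which is why it can be compared across the indicator-based and component-based models. Because R² never decreases when predictors are added, the sketch also computes adjusted R², 1 - (1 - R²)(n - 1)/(n - k - 1) for n observations and k predictors; this adjustment is an addition for illustration and is not mentioned in the footnote. The data are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(predictors, y):
    """R^2 = 1 - SS_res / SS_tot for an OLS fit with an intercept."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical outcome and two candidate predictor sets.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [3, 1, 4, 1, 5, 9, 2, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

n = len(y)
r2_one = r_squared([x1], y)
r2_two = r_squared([x1, x2], y)
print(f"R2 (x1): {r2_one:.4f}, adjusted: {adjusted_r_squared(r2_one, n, 1):.4f}")
print(f"R2 (x1, x2): {r2_two:.4f}, adjusted: {adjusted_r_squared(r2_two, n, 2):.4f}")
```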
 
5
In hierarchical and stepwise regression, the squared semi-partial correlation coefficient denotes the additional variance in the outcome variable explained by each predictor, whereas in forced-entry regression it denotes the unique variance explained.
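The distinction above can be made concrete. Under hierarchical entry, a predictor's squared semi-partial correlation is the change in R² when it enters the model; under forced entry, it is the drop in R² when that predictor alone is removed from the full model. A minimal sketch with two hypothetical, correlated predictors `x1` and `x2`: the hierarchical increments sum to the full-model R², and for the last-entered predictor the two quantities coincide, while for the first-entered predictor they generally differ.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(predictors, y):
    """R^2 of an OLS fit with an intercept (0 when there are no predictors)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def hierarchical_increments(steps, y):
    """sr^2 under hierarchical entry: the R^2 change as each predictor enters in order."""
    increments, previous, entered = {}, 0.0, []
    for name, column in steps:
        entered.append(column)
        current = r_squared(entered, y)
        increments[name] = current - previous
        previous = current
    return increments


def unique_contributions(predictors, y):
    """sr^2 under forced entry: R^2(all) minus R^2(all except this predictor)."""
    full = r_squared(list(predictors.values()), y)
    return {name: full - r_squared([c for k, c in predictors.items() if k != name], y)
            for name in predictors}


# Hypothetical, correlated predictors and outcome.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [1.5, 2.0, 3.8, 3.9, 6.1, 5.8, 8.2, 7.9]

increments = hierarchical_increments([("x1", x1), ("x2", x2)], y)
unique = unique_contributions({"x1": x1, "x2": x2}, y)
print("hierarchical sr^2:", {k: round(v, 4) for k, v in increments.items()})
print("forced-entry sr^2:", {k: round(v, 4) for k, v in unique.items()})
```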
 
6
Please note that the Fingertips repository does not contain data about interventions. Instead, here we analyse variables that could be targeted by interventions.
 
7
Potential covariates (such as the predictors from a stepwise regression model) that cannot be (directly) influenced by the intervention under consideration are not included in the model.
 
Literature
1. Aísa R, Clemente J, Pueyo F. The influence of (public) health expenditure on longevity. Int J Public Health. 2014;59(5):867–75.
2. Bardsley M, Steventon A, Fothergill G. Untapped potential: Investing in health and care data analytics. 2019. ISBN 978-1-911615-30-9.
3. Barzilay S, Feldman D, Snir A, Apter A, Carli V, Hoven CW, Wasserman C, Sarchiapone M, Wasserman D. The interpersonal theory of suicide and adolescent suicidal behavior. J Affect Disord. 2015;183:68–74.
4. Bozeman SR, Hoaglin DC, Burton TM, Pashos CL, Ben-Joseph RH, Hollenbeak CS. Predicting waist circumference from body mass index. BMC Med Res Methodol. 2012;12(1):115.
5. Breiman L. Statistical modeling: the two cultures. Stat Sci. 2001;16(3):199–215.
6. Choi SB, Lee W, Yoon J, Won J, Kim DW. Risk factors of suicide attempt among people with suicidal ideation in South Korea: A cross-sectional study. BMC Public Health. 2017;17(1):579.
8. Diez-Roux AV. Multilevel analysis in public health research. Annu Rev Public Health. 2000;21:171–92.
9. Dixon BE, Pina J, Kharrazi H, Gharghabi F, Richards J. What’s past is prologue: a scoping review of recent public health and global health informatics literature. Online J Public Health Inf. 2015;7(2):e216.
11. Field A. Discovering statistics using IBM SPSS statistics. 5th ed. London: Sage; 2017.
12. Fox S, Flowers J. fingertipsR: Fingertips data for public health; 2018.
13. Franklin JC, Ribeiro JD, Fox KR, Bentley KH, Kleiman EM, Huang X, Musacchio KM, Jaroszewski AC, Chang BP, Nock MK. Risk factors for suicidal thoughts and behaviors: a meta-analysis of 50 years of research. Psychol Bull. 2017;143(2):187–232.
14. Gamache R, Kharrazi H, Weiner JP. Public and population health informatics: the bridging of big data to benefit communities. Yearb Med Inform. 2018;27(1):199–206.
15. Ghani R, Foster I. Big data and social science: a practical guide to methods and tools. Boca Raton: CRC Press; 2017.
17. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction. Springer series in statistics. New York: Springer; 2009.
18. Hayes AF. Introduction to mediation, moderation, and conditional process analysis: a regression-based approach. 2nd ed. USA: Guilford Press; 2017.
19. Hopkins WG, Marshall SW, Batterham AM, Hanin J. Progressive statistics for studies in sports medicine and exercise science. Med Sci Sports Exerc. 2009;41(1):3–12.
20. James G, Witten D, Hastie T, Tibshirani R. An introduction to statistical learning with applications in R. New York: Springer; 2017.
21. Kabacoff R. R in action. 2nd ed. Shelter Island: Manning; 2015.
22. Kharrazi H, Lasser EC, Yasnoff WA, Loonsk J, Advani A, Lehmann HP, Chin DC, Weiner JP. A proposed national research and development agenda for population health informatics: summary recommendations from a national expert workshop. J Am Med Inform Assoc. 2017;24(1):2–12.
24. Kruschke J. Doing Bayesian data analysis. 2nd ed. London: Academic Press; 2015.
25. Lee AS, Pan A, Harbarth S, Patroni A, Chalfine A, Daikos GL, Garilli S, Martínez JA, Cooper BS. Variable performance of models for predicting methicillin-resistant Staphylococcus aureus carriage in European surgical wards. BMC Infect Dis. 2015;15(1):105.
26. MacKinnon DP. Introduction to statistical mediation analysis. New York: Erlbaum; 2008.
27. Massoudi BL, Chester KG. Public health, population health, and epidemiology informatics: recent research and trends in the United States. Yearb Med Inform. 2017;26(1):241–7.
28. Messer LC, Jagai JS, Rappazzo KM, Lobdell DT. Construction of an environmental quality index for public health research. Environ Health. 2014;13(1):39.
29. Michie S, West R. Behaviour change theory and evidence: a presentation to government. Health Psychol Rev. 2013;7(1):1–22.
30. Murphy KR, Myors B. Testing the hypothesis that treatments have negligible effects: minimum-effect tests in the general linear model. J Appl Psychol. 1999;84(2):234–48.
31. Musci RJ, Kharrazi H, Wilson RF, Susukida R, Gharghabi F, Zhang A, Wissow L, Robinson KA, Wilcox HC. The study of effect moderation in youth suicide-prevention studies. Soc Psychiatry Psychiatr Epidemiol. 2018;53(12):1303–10.
32. Pedhazur E. Multiple regression in behavioral research: explanation and prediction. 3rd ed. London: Harcourt Brace; 1997.
33. Pedhazur EJ, Schmelkin LP. Measurement, design and analysis: an integrated approach. Hillsdale; Hove: Lawrence Erlbaum; 1991.
35. Reps JM, Schuemie MJ, Suchard MA, Ryan PB, Rijnbeek PR. Design and implementation of a standardized framework to generate and evaluate patient-level prediction models using observational healthcare data. J Am Med Inform Assoc. 2018;25(8):969–75.
36. Reynolds D, Hennessy E, Polek E. Is breastfeeding in infancy predictive of child mental well-being and protective against obesity at 9 years of age? Child Care Health Dev. 2014;40(6):882–90.
37. Rudin C. Please stop explaining black box models for high stakes decisions. arXiv preprint arXiv:1811.10154; 2018.
38. Samadder SR, Nagesh Kumar D, Holden NM. An empirical model to predict arsenic pollution affected life expectancy. Popul Environ. 2014;36(2):219–33.
39. Samaritans. Suicide statistics report 2017. Ewell, Surrey: Author; 2017.
40. Sheldon KM. Integrating behavioral-motive and experiential-requirement perspectives on psychological needs: a two process model. Psychol Rev. 2011;118(4):552–69.
41.
42. Simon GE, Johnson E, Lawrence JM, Rossom RC, Ahmedani B, Lynch FL, Beck A, Waitzfelder B, Ziebell R, Penfold RB, Shortreed SM. Predicting suicide attempts and suicide deaths following outpatient visits using electronic health records. Am J Psychiatry. 2018;175(10):951–60.
43. Tabachnick BG, Fidell LS. Using multivariate statistics. 6th ed. Boston, London: Pearson; 2013.
44. Tan CL, Gan VBY, Saleem F, Hassali MAA. Building intentions with the theory of planned behaviour: The mediating role of knowledge and expectations in implementing new pharmaceutical services in Malaysia. Pharm Pract. 2016;14(4):850.
45. Tu Y, Gunnell D, Gilthorpe MS. Simpson’s paradox, Lord’s paradox, and suppression effects are the same phenomenon - the reversal paradox. Emerg Themes Epidemiol. 2008;5:2.
46. Veldkamp B. Mastering the data mass. Enschede: University of Twente; 2018.
47. Wilcox H, Wissow L, Kharrazi H, Wilson R, Musci R, Zhang A, Robinson K. Data linkage strategies to advance youth suicide prevention. Evid Rep Technol Assess. 2016a;222(9):1–70.
48. Wilcox HC, Kharrazi H, Wilson RF, Musci RJ, Susukida R, Gharghabi F, Zhang A, Wissow L, Robinson KA. Data linkage strategies to advance youth suicide prevention: a systematic review for a National Institutes of health pathways to prevention workshop. Ann Intern Med. 2016b;165(11):779–85.
49. Wilson NJ, Cordier R. A narrative review of Men's sheds literature: reducing social isolation and promoting men's health and well-being. Health Soc Care Community. 2013;21(5):451–63.
Metadata
Publication date
01-12-2019
Publisher
BioMed Central
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/s12874-019-0796-7
