Published in: BMC Medical Informatics and Decision Making 1/2020

Open Access 01-12-2020 | Stroke | Research article

Assessing stroke severity using electronic health record data: a machine learning approach

Authors: Emily Kogan, Kathryn Twyman, Jesse Heap, Dejan Milentijevic, Jennifer H. Lin, Mark Alberts

Abstract

Background

Stroke severity is an important predictor of patient outcomes and is commonly measured with the National Institutes of Health Stroke Scale (NIHSS). Because NIHSS scores are often recorded as free text in physician reports, structured real-world evidence databases seldom include stroke severity. The aim of this study was to use machine learning models to impute NIHSS scores for all patients with newly diagnosed stroke from multi-institution electronic health record (EHR) data.
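
The imputation step named in the aim can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' pipeline: it assumes a trained model is available as a plain `predict` function, keeps an NLP-extracted score wherever one exists, and otherwise clips the model output to the valid NIHSS total range of 0 to 42. The function name `impute_nihss` and the toy predictor in the usage note are illustrative assumptions.

```python
from typing import Callable, List, Optional, Sequence

NIHSS_MIN, NIHSS_MAX = 0.0, 42.0  # NIHSS total scores range from 0 to 42


def impute_nihss(
    extracted: Sequence[Optional[float]],
    features: Sequence[Sequence[float]],
    predict: Callable[[Sequence[float]], float],
) -> List[float]:
    """Return one severity value per patient (hypothetical helper)."""
    imputed = []
    for score, x in zip(extracted, features):
        if score is not None:
            # Assumption: an NLP-extracted score, when present, is kept as-is.
            imputed.append(float(score))
        else:
            # Assumption: model output is clipped to the valid range, not rounded.
            imputed.append(min(max(float(predict(x)), NIHSS_MIN), NIHSS_MAX))
    return imputed
```

For example, with a toy predictor `lambda x: 2.0 + 10.0 * x[0] + 0.5 * x[1]` (where `x[0]` flags death in the stroke month and `x[1]` is length of stay in days, both hypothetical encodings), `impute_nihss([7, None, None, None], [[0, 3], [1, 40], [0, 4], [1, 80]], toy)` returns `[7.0, 32.0, 4.0, 42.0]`; the last raw prediction (52) is clipped to 42.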

Methods

NIHSS scores available in the Optum© de-identified Integrated Claims-Clinical dataset were extracted from physician notes using natural language processing (NLP) methods. The study cohort consists of 7149 patients with an inpatient or emergency room diagnosis of ischemic stroke, hemorrhagic stroke, or transient ischemic attack and a corresponding NLP-extracted NIHSS score. A subset of these patients (n = 1033, 14%) was held out for independent validation of model performance, and the remaining patients (n = 6116, 86%) were used to train the model. Several machine learning models were evaluated, with their parameters optimized by cross-validation on the training set. The best-performing model, a random forest, was then evaluated on the holdout set.
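
The validation design above (a single holdout set, then cross-validated model selection on the remaining patients) can be sketched without external dependencies. This is an illustrative sketch under stated assumptions: the random seed, the use of mean validation RMSE as the selection criterion, the number of folds, and the helper names `holdout_split`, `kfold`, `cv_rmse`, and `select_model` are all hypothetical, since the abstract does not say how the holdout was drawn, how many folds were used, or which candidate models besides the random forest were compared. In practice, scikit-learn's `train_test_split`, `KFold`, and `GridSearchCV` cover the same steps.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Predictor = Callable[[Sequence[float]], float]
Fitter = Callable[[List[Sequence[float]], List[float]], Predictor]


def holdout_split(n: int, n_holdout: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle patient indices once and set n_holdout of them aside for final validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # hypothetical seed; the study's draw is not stated
    return idx[n_holdout:], idx[:n_holdout]  # (train, holdout)


def kfold(train_idx: Sequence[int], k: int) -> List[Tuple[List[int], List[int]]]:
    """Partition the training indices into k disjoint folds; return (fit, validate) pairs."""
    folds = [list(train_idx[i::k]) for i in range(k)]
    return [
        ([j for f, fold in enumerate(folds) if f != i for j in fold], folds[i])
        for i in range(k)
    ]


def cv_rmse(fit: Fitter, X, y, folds) -> float:
    """Mean validation RMSE of one candidate model across the folds (assumed criterion)."""
    scores = []
    for fit_idx, val_idx in folds:
        model = fit([X[i] for i in fit_idx], [y[i] for i in fit_idx])
        sq_err = [(model(X[i]) - y[i]) ** 2 for i in val_idx]
        scores.append((sum(sq_err) / len(sq_err)) ** 0.5)
    return sum(scores) / len(scores)


def select_model(candidates: Dict[str, Fitter], X, y, folds) -> str:
    """Name of the candidate with the lowest cross-validated RMSE."""
    return min(candidates, key=lambda name: cv_rmse(candidates[name], X, y, folds))
```

With the cohort sizes reported above, `train, holdout = holdout_split(7149, 1033)` yields 6116 training and 1033 holdout indices; only the selected model would then be refit on all of `train` and scored once on `holdout`, so the holdout plays no part in model selection.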

Results

Leveraging machine learning, we identified the main factors in EHR data for assessing stroke severity, including death within the same month as the stroke, length of hospital stay following the stroke, an aphagia/dysphagia diagnosis, a hemiplegia diagnosis, and whether the patient was discharged to home or self-care. Comparing the imputed NIHSS scores with the NLP-extracted NIHSS scores on the holdout set yielded an R² (coefficient of determination) of 0.57, an R (Pearson correlation coefficient) of 0.76, and a root-mean-squared error (RMSE) of 4.5 NIHSS points.
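
The three agreement measures reported for the holdout set can be computed as below. This plain-Python sketch uses the standard definitions (R² = 1 - SS_res/SS_tot, Pearson R = covariance divided by the product of the standard deviations, RMSE = square root of the mean squared error); the function name `regression_metrics` is illustrative, and it assumes neither input is constant. R² and the squared Pearson R coincide only for a least-squares linear fit evaluated on its own data, so on a holdout set they can differ; here 0.76² ≈ 0.58, close to the reported 0.57.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """R^2, Pearson R and RMSE of predictions against reference scores (non-constant inputs)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)  # total sum of squares of the reference
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        # The 1/n factors cancel, so unnormalized sums give the same correlation.
        "pearson_r": cov / math.sqrt(ss_tot * var_p),
        "rmse": math.sqrt(ss_res / n),
    }
```

For instance, `regression_metrics([0, 2, 4, 6], [2, 4, 6, 8])` gives a Pearson R of 1.0 but an R² of only about 0.2 and an RMSE of 2.0, because every prediction is 2 points too high: correlation ignores a constant offset, whereas R² and RMSE penalize it.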

Conclusions

Machine learning models built on EHR data can be used to determine proxies for stroke severity. This enables severity to be incorporated in studies of stroke patient outcomes using administrative and EHR databases.
Metadata
Publisher
BioMed Central
Keyword
Stroke
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-019-1010-x
