Published in: Advances in Therapy 2/2021

Open Access 01-02-2021 | Original Research

Body Mass Index Variable Interpolation to Expand the Utility of Real-world Administrative Healthcare Claims Database Analyses

Abstract

Introduction

Administrative claims data are an important source for real-world evidence (RWE) generation, but incomplete reporting of variables such as body mass index (BMI) limits the sample sizes that can be analyzed to address certain research questions. The objective of this study was to construct models using machine-learning (ML) algorithms to predict BMI classifications (≥ 30, ≥ 35, and ≥ 40 kg/m²) in administrative healthcare claims databases, and then to validate them internally and externally.

Methods

Five advanced ML algorithms were implemented for each BMI classification on a random sample of BMI readings from the Optum PanTher Electronic Health Record database (2%) and the Optum Clinformatics Date of Death database (20%), incorporating baseline demographic and clinical characteristics. Sensitivity analyses with oversampling ratios were conducted. Model performance was validated internally and externally.
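As an illustration only, the sketch below shows one way the three binary BMI classifications and an oversampling ratio could be set up before model training. The abstract does not say which oversampling method or ratios were used, so plain random oversampling of the positive class is an assumption here, and the names `bmi_labels` and `oversample` are hypothetical.

```python
import math
import random

# BMI thresholds (kg/m^2) named in the abstract.
BMI_THRESHOLDS = (30, 35, 40)


def bmi_labels(bmi):
    """Return one binary label per threshold: 1 if bmi >= threshold, else 0."""
    return {t: int(bmi >= t) for t in BMI_THRESHOLDS}


def oversample(rows, labels, ratio, seed=0):
    """Duplicate positive rows at random until positives / negatives >= ratio.

    Assumption: simple random oversampling with replacement. The study's
    actual oversampling procedure is not described in the abstract.
    """
    rng = random.Random(seed)
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one row of each class")
    # Number of positives required to reach the requested ratio;
    # never drop below the positives already present.
    target = max(len(pos), math.ceil(ratio * len(neg)))
    extra = [rng.choice(pos) for _ in range(target - len(pos))]
    new_rows = neg + pos + extra
    new_labels = [0] * len(neg) + [1] * target
    return new_rows, new_labels


# Made-up BMI readings, used only to exercise the functions.
readings = [22, 27, 31, 24, 41, 26, 29, 23, 25, 28]
obese = [bmi_labels(b)[30] for b in readings]
balanced_rows, balanced_labels = oversample(readings, obese, ratio=0.5)
```

Each threshold is treated as its own binary problem, which matches the abstract's description of separate models per BMI classification. Oversampling would normally be applied to the training split only, so that validation data keep their original class balance.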

Results

Models trained with the Super Learner ML algorithm (SLA) yielded the best predictive performance for BMI classification. SLA model 1 used sociodemographic and clinical characteristics, including baseline BMI values; the area under the receiver operating characteristic curve (ROC AUC) was approximately 88% for the prediction of BMI classifications of ≥ 30, ≥ 35, and ≥ 40 kg/m² (internal validation), while accuracy ranged from 87.9% to 92.8% and specificity ranged from 91.8% to 94.7%. SLA model 2 used sociodemographic and clinical characteristics, excluding baseline BMI values; ROC AUC was approximately 73% for the prediction of BMI classifications of ≥ 30, ≥ 35, and ≥ 40 kg/m² (internal validation), while accuracy ranged from 73.6% to 80.0% and specificity ranged from 71.6% to 85.9%. External validation on the MarketScan Commercial Claims and Encounters database yielded broadly consistent results with slightly diminished performance.
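For readers less familiar with the reported metrics, the following minimal sketch computes accuracy, specificity, and ROC AUC for a single binary classification. It uses the standard definitions; the function names and the toy labels and scores are assumptions for illustration and are not taken from the study.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary 0/1 labels."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Share of all predictions that are correct."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def specificity(y_true, y_pred):
    """Share of true negatives among all actual negatives."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outscores a
    random negative (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: true labels and model scores for four patients.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [int(s >= 0.5) for s in scores]  # 0.5 cutoff is an assumption
```

ROC AUC is computed from the scores and does not depend on the cutoff, whereas accuracy and specificity change with the chosen threshold. This is one reason the abstract reports them separately.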

Conclusion

This study demonstrated the feasibility and validity of using ML algorithms to predict BMI classifications in administrative healthcare claims data to expand the utility for RWE generation.
Metadata
Title
Body Mass Index Variable Interpolation to Expand the Utility of Real-world Administrative Healthcare Claims Database Analyses
Publication date
01-02-2021
Published in
Advances in Therapy / Issue 2/2021
Print ISSN: 0741-238X
Electronic ISSN: 1865-8652
DOI
https://doi.org/10.1007/s12325-020-01605-6
