Published in: BMC Medical Informatics and Decision Making 1/2019

Open Access 01-12-2019 | COPD Exacerbation | Research article

Identifying clinically important COPD sub-types using data-driven approaches in primary care population based electronic health records

Authors: Maria Pikoula, Jennifer Kathleen Quint, Francis Nissen, Harry Hemingway, Liam Smeeth, Spiros Denaxas



Abstract

Background

COPD is a highly heterogeneous disease composed of phenotypes with different aetiological and prognostic profiles, and current classification systems do not fully capture this heterogeneity. In this study we sought to discover, describe and validate COPD subtypes using cluster analysis on data derived from electronic health records.

Methods

We applied two unsupervised learning algorithms (k-means and hierarchical clustering) to 30,961 current and former smokers diagnosed with COPD, using linked national structured electronic health records in England available through the CALIBER resource. We used 15 clinical features, including risk factors and comorbidities, and performed dimensionality reduction using multiple correspondence analysis. We examined the association of cluster membership with COPD exacerbations and with respiratory and cardiovascular death, with 10,736 deaths recorded over 146,466 person-years of follow-up. We also implemented and tested a process for assigning unseen patients to clusters using a decision tree classifier.
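As an illustration only, the k-means step described above can be sketched in plain Python on a toy set of binary patient features. Everything in this sketch is an assumption made for the example: the four feature names, the six synthetic patients, the choice of two clusters, and the deterministic farthest-first initialisation. It omits the multiple correspondence analysis and hierarchical clustering steps, and it assigns an unseen patient by nearest centroid as a simple stand-in for the decision tree classifier used in the study.

```python
# Hypothetical binary features per patient (not the study's 15 features):
# [anxiety, depression, cardiovascular_disease, diabetes]
patients = [
    [1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0],
    [0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1],
]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to point (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))


def init_centroids(points, k):
    """Farthest-first initialisation: deterministic, chosen here for simplicity."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=50):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each patient joins the nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the centroids are stable too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


centroids, labels = kmeans(patients, k=2)

# Stand-in for the study's decision tree: assign an unseen patient
# to the cluster with the nearest centroid.
unseen_patient = [1, 1, 0, 1]
unseen_cluster = nearest(unseen_patient, centroids)

print(labels, unseen_cluster)
```

On real data the number of clusters would need to be chosen and validated rather than fixed in advance, and the clustering would be run on the reduced dimensions rather than on the raw indicators.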

Results

We identified and characterized five COPD patient clusters that differed in demographics, comorbidities, risk of death and exacerbations. Four of the clusters were associated with 1) anxiety/depression; 2) severe airflow obstruction and frailty; 3) cardiovascular disease and diabetes; and 4) obesity/atopy. The fifth cluster was associated with low prevalence of most comorbid conditions.

Conclusions

COPD patients can be sub-classified into groups with differing risk factors, comorbidities, and prognosis, based on data included in their primary care records. The identified clusters confirm findings of previous clustering studies and draw attention to anxiety and depression as important drivers of the disease in young, female patients.
Metadata
Title
Identifying clinically important COPD sub-types using data-driven approaches in primary care population based electronic health records
Authors
Maria Pikoula
Jennifer Kathleen Quint
Francis Nissen
Harry Hemingway
Liam Smeeth
Spiros Denaxas
Publication date
01-12-2019
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2019
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-019-0805-0