Published in: Orphanet Journal of Rare Diseases 1/2024

Open Access 01-12-2024 | Research

Performance and clinical utility of a new supervised machine-learning pipeline in detecting rare ciliopathy patients based on deep phenotyping from electronic health records and semantic similarity

Authors: Carole Faviez, Marc Vincent, Nicolas Garcelon, Olivia Boyer, Bertrand Knebelmann, Laurence Heidet, Sophie Saunier, Xiaoyi Chen, Anita Burgun

Abstract

Background

Rare diseases affect approximately 400 million people worldwide, and many of these patients experience delayed diagnosis. Among rare diseases, NPHP1-related renal ciliopathies need to be diagnosed as early as possible, because potential treatments have recently been investigated with promising results. Our objective was to develop a supervised machine learning pipeline for detecting NPHP1 ciliopathy patients among a large number of nephrology patients using electronic health records (EHRs).

Methods and results

We designed a pipeline combining a phenotyping module that re-uses unstructured EHR data, a semantic similarity module to account for dependence between phenotypes, a feature selection step to deal with high dimensionality, an undersampling step to address class imbalance, and a classification step with multiple train-test splits to cope with the small number of rare cases. The pipeline was applied to 30 NPHP1 patients and 7231 controls and achieved good performance (sensitivity 86% with specificity 90%). A qualitative review of the EHRs of 40 misclassified controls showed that 25% had phenotypes belonging to the ciliopathy spectrum, which demonstrates the ability of our system to detect patients with similar conditions.
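The undersampling and repeated train-test evaluation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the phenotyping, semantic similarity and feature selection modules are omitted, the classifier is a toy threshold on a single synthetic score, and every function name and parameter (`controls_per_case`, `n_splits`, `test_fraction`) is hypothetical. Only the group sizes (30 cases, 7231 controls) come from the abstract. The sketch also assumes that undersampling is applied to the training part only, so that each test set keeps the original class imbalance.

```python
import random
from statistics import mean


def stratified_split(labels, test_fraction, rng):
    """Split indices into train/test, keeping the case/control ratio in both parts."""
    train_idx, test_idx = [], []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx


def undersample(indices, labels, controls_per_case, rng):
    """Keep every case and a random subset of controls (assumed fixed ratio)."""
    cases = [i for i in indices if labels[i] == 1]
    controls = [i for i in indices if labels[i] == 0]
    n_keep = min(len(controls), controls_per_case * len(cases))
    return cases + rng.sample(controls, n_keep)


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def fit_threshold(scores, labels, indices):
    """Toy classifier: midpoint between the mean score of cases and of controls."""
    case_mean = mean(scores[i] for i in indices if labels[i] == 1)
    control_mean = mean(scores[i] for i in indices if labels[i] == 0)
    return (case_mean + control_mean) / 2


def evaluate(scores, labels, n_splits=20, test_fraction=0.3,
             controls_per_case=3, seed=0):
    """Average sensitivity and specificity over repeated stratified splits."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_splits):
        train_idx, test_idx = stratified_split(labels, test_fraction, rng)
        # Undersample the training part only; the test part stays imbalanced.
        train_idx = undersample(train_idx, labels, controls_per_case, rng)
        threshold = fit_threshold(scores, labels, train_idx)
        y_true = [labels[i] for i in test_idx]
        y_pred = [int(scores[i] > threshold) for i in test_idx]
        results.append(sensitivity_specificity(y_true, y_pred))
    return mean(s for s, _ in results), mean(s for _, s in results)


# Synthetic data: only the group sizes come from the abstract.
data_rng = random.Random(42)
labels = [1] * 30 + [0] * 7231
scores = [data_rng.gauss(2.0 if y == 1 else 0.0, 1.0) for y in labels]

mean_sens, mean_spec = evaluate(scores, labels)
print(f"mean sensitivity={mean_sens:.2f}, mean specificity={mean_spec:.2f}")
```

Averaging over many splits matters here because, with only 30 cases, a single test set holds a handful of patients and one misclassification shifts sensitivity by several percentage points.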

Conclusions

Our pipeline reached very encouraging performance scores for pre-diagnosing ciliopathy patients. The identified patients could then undergo genetic testing. The same data-driven approach can be adapted to other rare diseases facing underdiagnosis challenges.
Metadata
Publication date
01-12-2024
Publisher
BioMed Central
Published in
Orphanet Journal of Rare Diseases / Issue 1/2024
Electronic ISSN: 1750-1172
DOI
https://doi.org/10.1186/s13023-024-03063-7
