Published in: BMC Medical Informatics and Decision Making 1/2019

Open Access 01-12-2019 | Biomarkers | Research article

Methods for a similarity measure for clinical attributes based on survival data analysis

Authors: Christian Karmen, Matthias Gietzelt, Petra Knaup-Gregori, Matthias Ganzinger

Abstract

Background

Case-based reasoning is a proven method that draws on previously learned cases to support decisions about a new case. The accuracy of such a system depends on the similarity measure applied, which quantifies how similar two cases are. This work proposes a collection of methods for similarity measures designed specifically for comparing clinical cases on the basis of survival data, such as those available from clinical trials.

Methods

Our approach is intended for scenarios in which longitudinal data, such as survival data, are to be used for case-based reasoning. This may be especially important where there is uncertainty about the ideal therapy decision. The collection of methods consists of definitions of local similarity for nominal and numeric attributes, a calculation of attribute weights, a feature selection method and, finally, a global similarity measure. All of them use survival time (consisting of survival status and overall survival) as the reference for similarity. As a baseline, we calculate a survival function for each value of a given clinical attribute.
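The baseline step, one survival function per attribute value, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a Kaplan-Meier estimator (the abstract does not name the estimator), and the case layout with the keys `time` and `event` as well as the function names `kaplan_meier` and `survival_by_value` are hypothetical.

```python
from collections import defaultdict


def kaplan_meier(times, events):
    """Return a Kaplan-Meier estimate as a list of (time, survival) steps.

    times  -- observed follow-up times (overall survival)
    events -- survival status: 1 if death was observed, 0 if censored
    """
    # Count deaths and total removals (deaths + censorings) per distinct time.
    deaths = defaultdict(int)
    removed = defaultdict(int)
    for t, e in zip(times, events):
        removed[t] += 1
        deaths[t] += e

    at_risk = len(times)
    survival = 1.0
    curve = [(0.0, 1.0)]  # everyone is alive at time 0
    for t in sorted(removed):
        if deaths[t] > 0:
            # The curve only drops at times with at least one observed death.
            survival *= 1.0 - deaths[t] / at_risk
            curve.append((t, survival))
        at_risk -= removed[t]
    return curve


def survival_by_value(cases, attribute):
    """Fit one curve per value of `attribute` (hypothetical case dicts)."""
    groups = defaultdict(lambda: ([], []))
    for case in cases:
        group_times, group_events = groups[case[attribute]]
        group_times.append(case["time"])
        group_events.append(case["event"])
    return {value: kaplan_meier(t, e) for value, (t, e) in groups.items()}
```

Calling `survival_by_value(cases, "marker")` on a list of case dictionaries would return one step curve per observed value of the hypothetical attribute `marker`.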

Results

We define the similarity between values of the same attribute by relating their estimated survival functions to each other, and we quantify it by determining the area between the corresponding survival curves. The proposed global similarity measure is designed especially for cases from randomized clinical trials or other collections of clinical data with survival information. Overall survival can be considered a suitable alternative solution for similarity calculations. It is especially useful when similarity measures that depend on the classic solution-describing attribute “applied therapy” are not applicable, as is often the case for data from clinical trials with randomized arms.
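The area between two survival curves can be computed exactly when both are step functions. The sketch below assumes each curve is a time-sorted list of (time, survival) pairs starting at time 0. The time horizon and the normalisation in `local_similarity` (one minus the area divided by the horizon) are assumptions made for illustration; the abstract does not state how the area is mapped to a similarity value.

```python
def step_value(curve, t):
    """Value at time t of a right-continuous step curve sorted by time."""
    value = curve[0][1]
    for time, surv in curve:
        if time <= t:
            value = surv
        else:
            break
    return value


def area_between(curve_a, curve_b, horizon):
    """Integrate |S_a(t) - S_b(t)| over [0, horizon] for two step curves."""
    # Both curves are constant between consecutive breakpoints, so the
    # integral is a sum of rectangles over the merged breakpoints.
    points = sorted(
        {0.0, horizon}
        | {t for t, _ in curve_a if 0 < t < horizon}
        | {t for t, _ in curve_b if 0 < t < horizon}
    )
    area = 0.0
    for left, right in zip(points, points[1:]):
        gap = abs(step_value(curve_a, left) - step_value(curve_b, left))
        area += gap * (right - left)
    return area


def local_similarity(curve_a, curve_b, horizon):
    """Assumed normalisation: 1 minus the area divided by the horizon."""
    return 1.0 - area_between(curve_a, curve_b, horizon) / horizon
```

Under this assumed normalisation, identical curves give a similarity of 1, and the value falls towards 0 as the curves diverge over the chosen horizon.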

Conclusions

In silico evaluation scenarios showed that the mean accuracy of biomarker detection in the k = 10 most similar cases is higher (0.909–0.998) than for competing similarity measures, such as the Heterogeneous Euclidean-Overlap Metric (0.657–0.831) and the Discretized Value Difference Metric (0.535–0.671). The weight calculation method assigned biomarker attributes a weight more than six times (6.59–6.95) higher than that of non-biomarker attributes. These results suggest that the similarity measure described here is suitable for applications based on survival data.
Footnotes
1 In total, this leads to 100,000 single results considered for biomarker classification.
Metadata
Title: Methods for a similarity measure for clinical attributes based on survival data analysis
Authors: Christian Karmen, Matthias Gietzelt, Petra Knaup-Gregori, Matthias Ganzinger
Publication date: 01-12-2019
Publisher: BioMed Central
Keyword: Biomarkers
Published in: BMC Medical Informatics and Decision Making, Issue 1/2019
Electronic ISSN: 1472-6947
DOI: https://doi.org/10.1186/s12911-019-0917-6