Published in: BMC Medical Informatics and Decision Making 1/2019

Open Access 01-12-2019 | Rheumatoid Arthritis | Software

AliClu - Temporal sequence alignment for clustering longitudinal clinical data

Authors: Kishan Rama, Helena Canhão, Alexandra M. Carvalho, Susana Vinga


Abstract

Background

Patient stratification is a critical task in clinical decision making since it can allow physicians to choose treatments in a personalized way. Given the increasing availability of electronic medical records (EMRs) with longitudinal data, one crucial problem is how to efficiently cluster the patients based on the temporal information from medical appointments. In this work, we propose applying the Temporal Needleman-Wunsch (TNW) algorithm to align discrete sequences with the transition time information between symbols. These symbols may correspond to a patient’s current therapy, their overall health status, or any other discrete state. The transition time information represents the duration of each of those states. The obtained TNW pairwise scores are then used to perform hierarchical clustering. To find the best number of clusters and assess their stability, a resampling technique is applied.
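As a rough illustration of aligning discrete sequences that carry duration information, the sketch below runs a Needleman-Wunsch-style dynamic program over lists of (symbol, duration) pairs. The scoring rule (a match reward reduced by the relative difference between the two durations), the function name `temporal_alignment_score`, and all parameter values are assumptions made for this example; they are not the TNW definition used by the authors.

```python
from typing import List, Tuple

Event = Tuple[str, float]  # (state symbol, duration of that state)


def temporal_alignment_score(
    a: List[Event],
    b: List[Event],
    match: float = 1.0,
    mismatch: float = -1.0,
    gap: float = -0.5,
    time_weight: float = 1.0,
) -> float:
    """Global alignment score in the Needleman-Wunsch style.

    Assumption (illustrative only): two equal symbols earn `match` minus
    `time_weight` times the relative difference of their durations.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            (sym_a, dur_a), (sym_b, dur_b) = a[i - 1], b[j - 1]
            if sym_a == sym_b:
                longest = max(dur_a, dur_b)
                rel_diff = abs(dur_a - dur_b) / longest if longest > 0 else 0.0
                pair = match - time_weight * rel_diff
            else:
                pair = mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two events
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[n][m]
```

Computing this score for every pair of patients gives a similarity matrix. Converting it to distances for hierarchical clustering (for example, by subtracting each score from the maximum score) is one possible choice, again an assumption rather than the published procedure.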

Results

We propose AliClu, a novel tool for clustering temporal clinical data that couples the TNW algorithm with bootstrap-based assessment of clustering validity. AliClu was applied to rheumatoid arthritis EMRs from the Portuguese database of rheumatologic patient visits (Reuma.pt). In particular, it was used to analyse therapy switches, which were coded as letters corresponding to biologic drugs, together with the duration of each therapy before a change occurred. The optimized clusters stratify patients by their temporal therapy profiles and support the identification of features common to each group.
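The idea of checking cluster stability through bootstrapping can be sketched as follows. This is a minimal illustration under stated assumptions, not AliClu's implementation: it uses average-linkage agglomerative clustering on a precomputed distance matrix, the Rand index as the agreement measure, and a simplified resampling step that drops duplicate draws. The function names `average_linkage`, `rand_index`, and `bootstrap_stability` are hypothetical.

```python
import random
from itertools import combinations
from typing import List, Sequence


def average_linkage(dist: Sequence[Sequence[float]], k: int) -> List[int]:
    """Agglomerative clustering (average linkage), cut at k >= 1 clusters."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            total = sum(dist[i][j] for i in clusters[x] for j in clusters[y])
            d = total / (len(clusters[x]) * len(clusters[y]))
            if best is None or d < best[0]:
                best = (d, x, y)
        _, x, y = best
        merged = clusters.pop(y)  # y > x, so index x stays valid
        clusters[x].extend(merged)
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def rand_index(u: Sequence[int], v: Sequence[int]) -> float:
    """Fraction of item pairs on which two labelings agree."""
    pairs = list(combinations(range(len(u)), 2))
    agree = sum((u[i] == u[j]) == (v[i] == v[j]) for i, j in pairs)
    return agree / len(pairs) if pairs else 1.0


def bootstrap_stability(
    dist: Sequence[Sequence[float]], k: int, n_boot: int = 50, seed: int = 0
) -> float:
    """Mean Rand index between the full clustering and bootstrap clusterings."""
    rng = random.Random(seed)
    n = len(dist)
    reference = average_linkage(dist, k)
    scores = []
    for _ in range(n_boot):
        # Simplification: sample with replacement, then keep unique items.
        sample = sorted({rng.randrange(n) for _ in range(n)})
        if len(sample) <= k:
            continue  # too few distinct items to form k clusters
        sub = [[dist[i][j] for j in sample] for i in sample]
        boot = average_linkage(sub, k)
        scores.append(rand_index([reference[i] for i in sample], boot))
    return sum(scores) / len(scores) if scores else float("nan")
```

Running `bootstrap_stability` for several values of `k` and preferring the value with the highest mean agreement is one simple way to pick the number of clusters. Other agreement indices could be substituted for the Rand index.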

Conclusions

AliClu is a promising computational strategy for analysing longitudinal patient data: it provides validated clusters and unravels the patterns that exist in clinical outcomes. Patient stratification is performed in an automatic or semi-automatic way, allowing one to tune the alignment, clustering, and validation parameters. AliClu is freely available at https://github.com/sysbiomed/AliClu.
Metadata
Publisher: BioMed Central
Electronic ISSN: 1472-6947
DOI: https://doi.org/10.1186/s12911-019-1013-7
