
High-accuracy automatic classification of Parkinsonian tremor severity using machine learning method


Published 31 October 2017 © 2017 Institute of Physics and Engineering in Medicine
Citation Hyoseon Jeon et al 2017 Physiol. Meas. 38 1980 DOI 10.1088/1361-6579/aa8e1f

0967-3334/38/11/1980

Abstract

Motivation: Although clinical aspirations for new technology to accurately measure and diagnose Parkinsonian tremors exist, automatic scoring of tremor severity using machine learning approaches has not yet been employed. Objective: This study aims to maximize the scientific validity of automatic tremor-severity classification using machine learning algorithms to score Parkinsonian tremor severity in the same manner as the unified Parkinson's disease rating scale (UPDRS) used to rate scores in real clinical practice. Approach: Eighty-five PD patients perform four tasks for severity assessment of their resting, resting with mental stress, postural, and intention tremors. The tremor signals are measured using a wristwatch-type wearable device with an accelerometer and gyroscope. Displacement and angle signals are obtained by integrating the acceleration and angular-velocity signals. Nineteen features are extracted from each of the four tremor signals. The optimal feature configuration is decided using the wrapper feature selection algorithm or principal component analysis, and decision tree, support vector machine, discriminant analysis, and k-nearest neighbour algorithms are considered to develop an automatic scoring system for UPDRS prediction. The results are compared to UPDRS ratings assigned by two neurologists. Main results: The highest accuracies are 92.3%, 86.2%, 92.1%, and 89.2% for resting, resting with mental stress, postural, and intention tremors, respectively. The weighted Cohen's kappa values are 0.745, 0.635 and 0.633 for resting, resting with mental stress, and postural tremors (substantial agreement), and 0.570 for intention tremors (moderate agreement). Significance: These results indicate the feasibility of the proposed system as a clinical decision tool for Parkinsonian tremor-severity automatic scoring.


1. Introduction

Research on smart-technology-based tremor assessment for objective and quantitative diagnosis is being actively pursued; however, to date, few studies have investigated automatic scoring of Parkinsonian tremor severity using machine learning algorithms as an alternative to clinical assessments. To overcome this research gap, this study proposes a machine-learning-based approach for automatic scoring of Parkinsonian tremor severity, which is applicable to various tremor assessment tasks that are widely and commonly used for evaluation in neurology departments. The desired result is to naturally bridge newly developed evaluative technology and the current clinical rating scale, to facilitate more objective and sophisticated diagnosis and greater clinical convenience.

1.1. Background

Tremors, which are involuntary and consist of rhythmic, oscillatory, and back-and-forth actions, are one of the main symptoms manifesting in Parkinson's disease (PD) patients with bradykinesia, rigidity, and postural instability (Hurtado et al 1999, Kandel et al 2000). Patients exhibiting Parkinsonian tremors experience related physical and/or psychological difficulties on a daily basis; thus, prevention of symptom worsening is crucial. The objective and quantitative evaluation of tremors is considered to be a critical element of PD management.

In current clinical practice, PD-patient tremor assessment is accomplished using various clinical rating scales. Among them, the unified Parkinson's disease rating scale (UPDRS) is a widely used method providing comprehensive information on the disabilities of PD patients to elucidate their conditions (Goetz et al 2008). The most important tremor-evaluation parameter for UPDRS usage is distinguishing between tremors at rest ('resting tremors') and action tremors (Puschmann and Wszolek 2011). This classification not only provides benefits with regard to pathophysiology and etiology, but is also highly relevant to the selection of the most appropriate treatment option (Puschmann and Wszolek 2011). A resting tremor occurs when a body part is relaxed and completely supported against gravity (Crawford and Zimmerman 2011). In hospitals, resting tremors are evaluated when the patient is seated comfortably on a chair, resting their arms on the chair arms. Resting tremors are typically aggravated by mental stress; this state is usually generated by asking the patient to count backwards (Crawford and Zimmerman 2011). Such tremors are referred to as 'resting tremors with mental stress' in this paper. Resting tremors are, however, diminished by voluntary movement of the affected body part; tremors that occur during voluntary movement are instead classified as action tremors (Crawford and Zimmerman 2011). Action tremors can be further subcategorized into postural and kinetic tremors (Crawford and Zimmerman 2011). Postural tremors occur when the body-part position is maintained against gravity; they are evaluated in clinical practice by having the patient hold their arms outstretched (Crawford and Zimmerman 2011). Kinetic tremors manifest during any voluntary movement and include intention tremors during target-directed movements (Kraus et al 2006, Crawford and Zimmerman 2011). Intention tremors increase during visually guided movements toward targets upon movement termination (Kraus et al 2006).

The guidelines for evaluating these tremors according to the UPDRS are listed in table 1 (Movement Disorder Society Task Force 2003). Based on the UPDRS guidelines, neurologists rate tremor severity on a 0–4 integer scale by observing the patient's state and their ability to perform various tasks. However, although this rating is the best-known and most widely used method, the results obtained via this approach are variable as they are subjective, with the tremor severity being evaluated through visual inspection. Inconsistency between raters (and thus, poor reliability) has been reported, with consistency between clinician ratings and patients' own ratings also being surprisingly low (Goetz et al 1997, Davidson et al 2012). Besides, this approach is not suitable for investigating patient conditions during daily life, as UPDRS is employed during routine clinical visits only but tremor symptoms can fluctuate several times per day (Maetzler et al 2013). Thus, clinical demand for state-of-the-art technology that can objectively collect and quantitatively interpret tremor signals has arisen. By extension, advanced technology for long-term evaluation and follow-up monitoring of PD disabilities is also required (Maetzler et al 2013).

Table 1. Tremor evaluation in motor examinations (part III) using UPDRS (copied from Movement Disorder Society Task Force on Rating Scales for Parkinson's Disease (2003)).

Score | Guide: tremor at rest (head, upper, and lower extremities) | Guide: action or postural tremor of hands
0 | Absent | Absent
1 | Slight and infrequently present | Slight; present with action
2 | Mild in amplitude and persistent or moderate in amplitude, but only intermittently present | Moderate in amplitude and present with action
3 | Moderate in amplitude and present most of the time | Moderate in amplitude and present with posture holding as well as action
4 | Marked in amplitude and present most of the time | Marked in amplitude; interferes with feeding

1.2. Related works

To meet clinical requirements for tremor-measuring technology, reliable, objective, and quantitative tremor diagnosis must be achieved. Thus, a large number of existing studies have focused on quantitative analysis of the characteristics of Parkinsonian tremors using movement sensors including accelerometers, gyroscopes, and smartphones or watches, along with electromyography (EMG). Using EMG, the effects of aging on the regularity of physiological tremors have been determined by calculating their regularity, coherence, and modal frequency (Sturman et al 2005). Further, high and significant correlations between the UPDRS and tremor signals have been reported based on feature analysis using accelerometers, gyroscopes, or smartphones; hence, the tremors of PD patients have been quantified (Salarian et al 2007, Daneault et al 2012). The linear and nonlinear tremor characteristics of tremor signals from acceleration measurements have also been determined using EMG (Meigal et al 2012). Heida et al distinguished tremors treated with deep brain stimulation (DBS) in PD patients from non-tremors using a gyroscope (Heida et al 2013). Indeed, differentiation between Parkinsonian tremors and other tremor symptoms is one of the most widely explored topics overall. For example, Kostikis et al proposed methods of distinguishing PD-induced hand tremors from those of healthy volunteers by employing various machine learning approaches (Kostikis et al 2015). In other studies, discrimination between Parkinsonian and essential tremors was achieved using accelerometers or a smart watch (Wile et al 2014, Thanawattano et al 2015). Further, Giuffrida et al tested the potential of a home-based accelerometer-containing PD assessment system for tremor quantification (Giuffrida et al 2009).

Although these studies may provide a basis for automatic Parkinsonian tremor scoring using high-level technology, it is difficult to apply their results directly in clinical practice. Martinez-Manzanera et al noted the gap between conventional clinical rating scales, which may be more familiar to neurologists through their constant use, and the scientific and objective assessment of PD symptoms in clinical practice (Martinez-Manzanera et al 2016). Thus, most clinicians continue to use the UPDRS as a gold standard, and the modern, objective approaches have not yet been incorporated into routine clinical evaluation (Maetzler et al 2013, Martinez-Manzanera et al 2016). Therefore, alternative strategies that can obtain the maximum possible approval from clinicians are needed to effectively link the present clinical rating scale and scientific and objective rating tools. A key approach is to provide clinical scores by mapping sensor-based-features relevant to PD patient symptoms onto existing clinical scales, as closely as possible to current clinician evaluations based on prevalent rating methods such as the UPDRS.

In response to this demand, many researchers have investigated the connection between sensor-based features and the UPDRS clinical rating scale using accelerometers and gyroscopes. To date, the relevant studies have considered bradykinesia (Martinez-Manzanera et al 2016, Sama et al 2017), gait-related tasks (Giuberti et al 2015a, 2015b, Parisi et al 2015, 2016), a finger-tapping task (Memedi et al 2013, Stamatakis et al 2013), and tremors (Giuffrida et al 2009, Dai 2015, Pan et al 2015). Three machine learning algorithms have been employed for automatic evaluation of gait-related tasks such as leg agility, sit-to-stand, and gait (Giuberti et al 2015a, 2015b, Parisi et al 2015, 2016). In other studies, to automatically assign bradykinesia symptoms to the UPDRS, a support vector machine (SVM) and regression model have been used (Martinez-Manzanera et al 2016, Sama et al 2017). For the finger-tapping task, a logistic regression model and a greedy backward algorithm have been explored (Memedi et al 2013, Stamatakis et al 2013). Finally, for automatic scoring of PD-patient tremors, only a regression model has been applied (Giuffrida et al 2009, Dai 2015, Pan et al 2015). To the best of our knowledge, machine learning approaches have not yet been applied for automatic Parkinsonian-tremor scoring. However, machine learning algorithms must be used to convert the high-dimensional characteristics of motion-sensor data into scientifically and clinically meaningful information (Kubota et al 2016).

In this study, we aim to automatically allocate the patient results for four tremor tasks to the UPDRS using several machine learning approaches. Thus, we design, develop, and validate an automatic scoring method to evaluate Parkinsonian-tremor severity for both clinical use and home assessment. In section 2, we first describe the participating patients, data acquisition procedure, and feature definition and selection, before discussing the machine-learning classification methods. In section 3, the performance exhibited by the examined methods is expounded. Finally, the development considerations, contributions, and limitations of this study are discussed in section 4. A conclusion is provided in section 5, which includes a brief discussion of further research directions.

2. Methods

2.1. Experiments

2.1.1. Wearable device.

A wristwatch-type wearable device was devised for this study to acquire hand-tremor symptom signals using acceleration and angular velocity measurements (figure 1). This device was designed to be small, light, wireless, and wearable and to have low power consumption; the aim was to minimize the effect of the size and weight on the body, so as to facilitate accurate tremor measurement. The size and weight specifications were 16 mm  ×  19.9 mm  ×  10 mm, 2.6 g, and 41 mm  ×  48 mm  ×  17.8 mm, 31.6 g, respectively, for the finger and wrist components. A tri-axis-accelerometer sensor (LIS3DSH, STMicroelectronics N.V., Switzerland) and a tri-axis-gyroscope sensor (L3GD20, STMicroelectronics N.V., Switzerland) were respectively equipped to the finger and wrist components. The accelerometer could measure up to  ±16 g in the X, Y, and Z directions, and the gyroscope was configured to a sensitivity of  ±  2000 ° s−1.


Figure 1. Customized wristwatch-type wearable device used to measure hand tremors in this study. A: wrist component (length (L): 41 mm; width (W): 48 mm; height (H): 17.8 mm). B: finger component (L: 16 mm; W: 19.9 mm; H: 10 mm).


2.1.2. Subjects.

The subjects in this project were 85 patients with PD (average age: 65.96  ±  9.19 y; females: 44; males: 41). The criteria of the diagnostic guidelines of the United Kingdom Parkinson's Disease Society Brain Bank were followed. All participants were outpatients of the Department of Neurology of Seoul National University Hospital (SNUH), Republic of Korea, and were diagnosed as experiencing hand-tremor symptoms in their daily lives. Patients with leg tremors or dyskinesia were excluded. This study was conducted according to the principles of the Declaration of Helsinki (2008) with prior approval of the Ethics Committee of the SNUH. All subjects were given verbal explanations and signed the informed consent forms.

2.1.3. Acquisition procedure.

The experiment was designed to acquire hand-tremor signals and assign UPDRS ratings for four tremor assessment tasks that are used to rate the hand tremors of PD patients by neurologists in real clinical practice. Four tasks that are widely used to assess resting, resting with mental stress, postural, and intention tremors in the hands were measured. For these tasks, the subjects were comfortably seated in a chair. Then, a device was attached to each subject's wrist and the fingertip of their middle finger, on both their right and left hands. The sensor position on the fingertip was changed if the tremor symptoms were relatively severe; that is, it was moved to a different finger. For the resting tremor task, subjects were asked to sit on the chair and rest their arms on the arm rest. Then, while maintaining this posture, the subjects were asked to count backwards from 100 by subtracting in threes; this task was conducted to evaluate resting tremors with mental stress. For the postural tremors, the subjects were required to hold their hands outstretched with pronation. These three tasks were conducted for 1 min each. Further, intention tremors were assessed during a finger-to-nose test. One finger-to-nose touch movement was repeated 10 times, with a subject beginning this task by placing their index finger on the tip of their nose. In addition, a pen was provided as a target and the subjects were asked to move their index finger close to the pen. Note that they were requested to try to keep their index finger close to the target only, without touching the pen. This distinction was made to avoid signal distortion due to direct contact with the pen. This procedure was repeated for the right and left index fingers. During all tasks, the subjects' hands were recorded by a video camera (Panasonic HDC-TM700, 1920  ×  1080 p HD) positioned in front of each patient at 60 frames s−1, for later evaluation by neurologists. Both hands were recorded at the highest possible magnification on the camera screen for enhanced evaluation. Two neurologists having a minimum of two years of subspecialty training in movement disorders independently assessed the hand-tremor severity by observing the fingers of both the subjects' hands in the video recording and on the basis of the UPDRS guidelines. From these video recordings, 170 recordings of the 85 patients' hands performing each of the four tasks were rated.

2.2. Data analysis

2.2.1. Dataset for analysis.

We used tremor recordings measured from the finger component of the wearable device, because hand tremors usually affect the fingers. Among all the finger-tremor recordings, a total of 238 recordings were selected based on consensus between the scores of the two neurologists. In other words, only the recordings for which both neurologists gave the same ratings were selected. In detail, the tremor signals having clear UPDRS ratings with no differences in evaluation were selected for data analysis from the total 170 tremor recordings for each of the four tasks. Then, among the consensus dataset, the tremor recordings for which UPDRS ratings  ⩾1 were assigned were used for feature extraction and modelling for automatic tremor scoring. Hence, 52, 58, 63, and 65 tremor recordings corresponding to resting, resting with mental stress, postural, and intention tremors, respectively, were analysed. Each belonged to one of the four UPDRS classes (1–4). The UPDRS rating distributions of each of the two neurologists and the selected tremor recordings for each of the four tasks in this study are shown in figure 2.
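The consensus rule described above can be sketched in a few lines. This is only an illustration: the record layout and the field names `rater_a`, `rater_b`, and `updrs` are hypothetical and do not come from the study.

```python
def select_consensus(recordings):
    """Keep recordings where both raters agree and the agreed UPDRS score is >= 1."""
    selected = []
    for rec in recordings:
        a, b = rec["rater_a"], rec["rater_b"]
        if a == b and a >= 1:
            # Store the agreed score as the class label (hypothetical field name).
            selected.append({**rec, "updrs": a})
    return selected

# Hypothetical ratings for four recordings of one task.
example = [
    {"id": "r01", "rater_a": 0, "rater_b": 0},  # agreed, but score 0: excluded
    {"id": "r02", "rater_a": 1, "rater_b": 1},  # kept
    {"id": "r03", "rater_a": 2, "rater_b": 3},  # raters disagree: excluded
    {"id": "r04", "rater_a": 3, "rater_b": 3},  # kept
]
consensus = select_consensus(example)

# Per-task counts reported in the text add up to the stated total.
total_selected = 52 + 58 + 63 + 65
```

The per-task counts quoted in the text (52, 58, 63, and 65) sum to the 238 recordings stated for the consensus dataset.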


Figure 2. Distribution of UPDRS ratings  ⩾1 for tremors in four examined tasks, for resting, resting with mental stress, postural, and intention tremors. Two neurologists independently rated the recorded tremors for the four tasks. Then, the tremors for which a rating consensus was reached were selected. The grey bars and the grey bars with diagonal lines indicate the number of ratings for each UPDRS score awarded by neurologists A and B, respectively. The black bars show the number of consensus UPDRS ratings, i.e. agreement between neurologists A and B. (a) Resting tremor. (b) Resting tremor with mental stress. (c) Postural tremor. (d) Intention tremor.


2.2.2. Signal processing.

The tremor acceleration and angular velocity were measured at a 125 Hz sampling rate by the wearable device and transferred to a computer via Bluetooth. After scaling, the signals were band-pass-filtered between 1 and 16 Hz to eliminate artefacts such as drift (from low-frequency movements) and noise (from the main electrical power line) using a fifth-order Butterworth filter. From the filtered acceleration and angular velocity, the displacements and angles were calculated by integrating the accelerations twice and the angular velocities once. These four signals, i.e. the acceleration, angular velocity, displacement, and angle, were segmented into 30 s signals. Before this segmentation, the first and last 10 s were removed to exclude the unstable segments at the beginnings and ends of the signals. Thus, segmentation was performed on the main, central part of each overall signal.
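The integration and segmentation steps can be illustrated with a minimal sketch. It assumes cumulative trapezoidal integration and a centred 30 s window; neither choice is stated in the text, and the band-pass filter is omitted. The helper names `cumtrapz` and `central_segment` are illustrative only.

```python
FS = 125  # sampling rate in Hz, as stated in the text

def cumtrapz(samples, fs=FS):
    """Cumulative trapezoidal integral of a uniformly sampled signal, starting at zero."""
    dt = 1.0 / fs
    out = [0.0]
    for prev, cur in zip(samples, samples[1:]):
        out.append(out[-1] + 0.5 * (prev + cur) * dt)
    return out

def central_segment(samples, fs=FS, trim_s=10, length_s=30):
    """Drop trim_s seconds at each end, then keep a centred window of length_s seconds.

    Centring the window is an assumption; the text only says the central part was used.
    """
    trimmed = samples[trim_s * fs: len(samples) - trim_s * fs]
    n = length_s * fs
    start = max(0, (len(trimmed) - n) // 2)
    return trimmed[start: start + n]

# Toy input: a constant acceleration of 1 unit for a 60 s recording.
acc = [1.0] * (60 * FS)
vel = cumtrapz(acc)    # first integration
disp = cumtrapz(vel)   # second integration gives displacement
segment = central_segment(disp)
```

The same `cumtrapz` helper could be applied to the angular-velocity signal. In real data, integration amplifies low-frequency drift, which is one reason the high-pass side of the filter matters.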

2.2.3. Feature extraction and selection.

To investigate the tremor characteristics that can be classified using UPDRS, 19 features were defined for each of the four segmented signals (acceleration, angular velocity, displacement, and angle). These features were computed in the time and frequency domains of each of the signals. In the time domain, the root mean square (RMS) was first calculated from the three-axis signal; then, the temporal features were extracted from the RMS signal. In the frequency domain, the averaged spectrum was calculated by averaging the three power values corresponding to the same frequency in each of the three signal spectra (for each axis), for each frequency value in the averaged spectrum. This procedure was performed to extract the spectral features from one averaged spectrum reflecting the three-dimensional tremor movement.

Four features were defined in the time domain. The mean amplitude, the logarithm of the mean amplitude, the averaged regularity, and the standard deviation (STD) of the regularity were extracted as the temporal features. The mean amplitude was calculated by averaging the peak-to-peak amplitudes in the RMS signal. The regularity was defined as the time from the present peak to the next peak. The averaged regularity for the segmented period was computed to investigate the rhythmic variability of the tremor signal. As noted above, the logarithm of the mean amplitude and the STD of the regularity were also added to the temporal features. The temporal features are graphically illustrated in figure 3 and listed in table 2.

Table 2. Definitions of temporal and spectral features.

Feature type | Feature | Definition
Temporal | Mean amplitude | Mean amplitude  =  $\frac{\sum_{i=1}^{n}\left({\rm Amp1}_{i}+{\rm Amp2}_{i}\right)}{2n}$, where $i$ is the peak order and $n$ is the number of peaks in the signal; ${\rm Amp1}_{i}=\left| pp_{i}-np_{i} \right|$, ${\rm Amp2}_{i}=\left| np_{i+1}-pp_{i} \right|$, $pp_{i}={\rm mag}\left( t_{p\_i} \right)$, $np_{i}={\rm mag}\left( t_{n\_i} \right)$ (mag(t) is the magnitude as a function of time)
Temporal | Averaged regularity | Averaged regularity  =  $\frac{\sum_{i=1}^{n-1}\left| t_{p\_i+1}-t_{p\_i} \right|}{n-1}$
Temporal | Standard deviation of regularity | STD (regularity)  =  $\sqrt{\frac{\sum_{i=1}^{n-1}\left( \left| t_{p\_i+1}-t_{p\_i} \right|-\overline{\left| t_{p\_i+1}-t_{p\_i} \right|} \right)^{2}}{n-1}}$
Temporal | Logarithm of mean amplitude | log(mean amplitude)
Spectral | Peak frequency | Frequency at maximum power
Spectral | Mean frequency | Mean frequency  =  $\frac{\sum_{i=1}^{n}f_{i}\cdot P_{i}}{\sum_{i=1}^{n}P_{i}}$, where $i$ is the spectrum sample, $f_{i}$ is the frequency at sample $i$, and $P_{i}$ is the power value at sample $i$
Spectral | Peak power | Power value at peak frequency
Spectral | Mean power | Power value at mean frequency
Spectral | Power in low-frequency band | $P_{\rm Low}=\sum_{i=1}^{i_{\rm tr1}}P_{i}$
Spectral | Power in tremor-frequency band | $P_{\rm Tr}=\sum_{i=i_{\rm tr1}}^{i_{\rm tr2}}P_{i}$
Spectral | Power in high-frequency band | $P_{\rm High}=\sum_{i=i_{\rm tr2}}^{i_{16}}P_{i}$
Spectral | Relative power in low-frequency band | $P_{\rm rl\_Low}=\frac{P_{\rm Low}}{\sum_{i=1}^{n}P_{i}}$
Spectral | Relative power in tremor-frequency band | $P_{\rm rl\_Tr}=\frac{P_{\rm Tr}}{\sum_{i=1}^{n}P_{i}}$
Spectral | Relative power in high-frequency band | $P_{\rm rl\_High}=\frac{P_{\rm High}}{\sum_{i=1}^{n}P_{i}}$
Spectral | Logarithm of peak power | log(peak power)
Spectral | Logarithm of mean power | log(mean power)
Spectral | Logarithm of PLow | log(PLow)
Spectral | Logarithm of PTr | log(PTr)
Spectral | Logarithm of PHigh | log(PHigh)

Figure 3. Graphical illustration of temporal feature definitions. Two sequential positive peaks (pp) and three sequential negative peaks (np) of the RMS signal of a tremor are shown together. The illustrated signal in this figure was produced from a tremor signal measured by the wearable device described in section 2.1.1.
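The temporal features can be sketched as follows. Several details are assumptions rather than the authors' implementation: the RMS is taken per sample across the three axes, peaks are detected as strict local extrema, only positive peaks with a negative peak on both sides contribute to the amplitude, and the natural logarithm is used. All function names are illustrative.

```python
import math

def rms_across_axes(x, y, z):
    """Per-sample RMS of the three axes (one reading of 'RMS of the three-axis signal')."""
    return [math.sqrt((a * a + b * b + c * c) / 3.0) for a, b, c in zip(x, y, z)]

def local_extrema(sig):
    """Indices of strict local maxima (positive peaks) and minima (negative peaks)."""
    maxima, minima = [], []
    for i in range(1, len(sig) - 1):
        if sig[i - 1] < sig[i] > sig[i + 1]:
            maxima.append(i)
        elif sig[i - 1] > sig[i] < sig[i + 1]:
            minima.append(i)
    return maxima, minima

def temporal_features(sig, fs):
    maxima, minima = local_extrema(sig)
    amps = []
    for p in maxima:
        before = [m for m in minima if m < p]
        after = [m for m in minima if m > p]
        if before and after:
            amps.append(abs(sig[p] - sig[before[-1]]))  # Amp1: preceding trough to peak
            amps.append(abs(sig[after[0]] - sig[p]))    # Amp2: peak to following trough
    # Regularity: time between consecutive positive peaks, in seconds.
    intervals = [(b - a) / fs for a, b in zip(maxima, maxima[1:])]
    if not amps or not intervals:
        raise ValueError("signal too short to contain two full tremor cycles")
    mean_amp = sum(amps) / len(amps)
    mean_reg = sum(intervals) / len(intervals)
    # n peaks give n - 1 intervals, so this divides by n - 1 as in table 2.
    std_reg = math.sqrt(sum((t - mean_reg) ** 2 for t in intervals) / len(intervals))
    return {
        "mean_amplitude": mean_amp,
        "log_mean_amplitude": math.log(mean_amp),  # natural log is an assumption
        "averaged_regularity": mean_reg,
        "std_regularity": std_reg,
    }

# Toy triangular signal sampled at 2 Hz: peaks of height 2 every 1 s.
demo = [0.0, 2.0, 0.0, 2.0, 0.0, 2.0, 0.0]
features = temporal_features(demo, fs=2)
```

For a perfectly periodic toy signal the standard deviation of the regularity is zero; irregular tremor would give a larger value.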


In the frequency domain, 15 features were defined, as listed in table 2. The peak (PF) and mean (MF) frequencies, the peak and mean powers, and the logarithms of the peak and mean powers were computed from an averaged spectrum. PF was calculated as the frequency at the maximum power in the averaged spectrum, whereas MF was the centre value of the power distribution across frequencies. The peak and mean powers were the power values at PF and MF, respectively, and the logarithms of the peak and mean powers were also included in the spectral features, as noted above. In addition, we derived nine other features from three variable frequency bands in the averaged spectrum, which were defined as the low-, tremor-, and high-frequency bands (Heida et al 2013). For these bands, $f_{\rm tr1}$ and $f_{\rm tr2}$ were first defined as the frequencies 3 Hz below and 3 Hz above MF, respectively. The low-, tremor-, and high-frequency bands were then defined as being from 0 Hz to $f_{\rm tr1}$, from $f_{\rm tr1}$ to $f_{\rm tr2}$, and from $f_{\rm tr2}$ to 16 Hz, respectively (figure 4). Note that 16 Hz is the highest frequency of interest considering the Parkinsonian tremor characteristics. (Unlike Heida et al (2013), the band frequencies were not fixed in this study, because every patient had a different MF.) For each of the three frequency bands, the power values were calculated, being denoted as PLow, PTr, and PHigh for the low-, tremor-, and high-frequency bands, respectively. The relative power values (Prl) in the three frequency bands were also defined, being the ratios of the sum of the power values in each frequency band to the total sum of the power values from 0 to 16 Hz. The logarithms of the power values in the three frequency bands were also included in the considered spectral features.


Figure 4. Graphical illustration of three frequency bands and spectral features in averaged spectrum. For this example, the peak (PF) and mean (MF) frequencies are depicted. (Freq. in the figure is frequency.)
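A minimal sketch of the band-based spectral features is given below. It assumes the power spectrum of each axis is already available as a list, takes the mean power at the bin nearest to MF, and assigns the boundary bins at $f_{\rm tr1}$ and $f_{\rm tr2}$ to the tremor band so that the three bands do not overlap. These choices and the function names are assumptions, not details taken from the study.

```python
def average_spectrum(px, py, pz):
    """Average the three per-axis power values at each frequency bin."""
    return [(a + b + c) / 3.0 for a, b, c in zip(px, py, pz)]

def spectral_features(freqs, power, half_width_hz=3.0, f_max=16.0):
    bins = [(f, p) for f, p in zip(freqs, power) if 0.0 <= f <= f_max]
    total = sum(p for _, p in bins)
    peak_f, peak_p = max(bins, key=lambda fp: fp[1])       # PF and peak power
    mean_f = sum(f * p for f, p in bins) / total            # MF: power-weighted mean
    mean_p = min(bins, key=lambda fp: abs(fp[0] - mean_f))[1]  # power at bin nearest MF
    f_tr1, f_tr2 = mean_f - half_width_hz, mean_f + half_width_hz
    p_low = sum(p for f, p in bins if f < f_tr1)
    p_tr = sum(p for f, p in bins if f_tr1 <= f <= f_tr2)   # boundary bins go here
    p_high = sum(p for f, p in bins if f > f_tr2)
    return {
        "peak_frequency": peak_f, "peak_power": peak_p,
        "mean_frequency": mean_f, "mean_power": mean_p,
        "p_low": p_low, "p_tr": p_tr, "p_high": p_high,
        "rel_low": p_low / total, "rel_tr": p_tr / total, "rel_high": p_high / total,
    }

# Toy spectrum on a 1 Hz grid from 0 to 16 Hz, identical on all three axes.
freqs = [float(f) for f in range(17)]
axis = [0.0] * 17
axis[1], axis[5], axis[9] = 1.0, 4.0, 1.0
spec = average_spectrum(axis, axis, axis)
feats = spectral_features(freqs, spec)
```

In this toy case MF is 5 Hz, so the tremor band spans 2 to 8 Hz and holds two thirds of the total power. The logarithmic features would follow by applying a logarithm to the non-zero power values.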


For the features listed in table 2 and for each of the four signals (acceleration, angular velocity, displacement, and angle), a wrapper feature selection algorithm (Kohavi and John 1997) and principal component analysis (PCA) (Jolliffe and Cadima 2016) were implemented for feature dimension reduction. The wrapper approach is a feature subset selection method that iteratively adds individual features to the existing feature combination and evaluates the classifier performance on each candidate subset to find an optimal feature subset (Kohavi and John 1997, Martinez-Manzanera et al 2016). The resulting feature combination is easy to interpret, as this method excludes features with meaningless variance and selects those yielding the highest performance (Kohavi and John 1997, Martinez-Manzanera et al 2016). PCA transforms the original variables into principal components, i.e. linear combinations of the original variables, each of which is obtained with maximum variance among all possible choices of the respective axis (Jolliffe and Cadima 2016). Because the principal components are uncorrelated with one another, they carry no redundant information. In this study, both feature selection methods were implemented to attain optimal features for automatic tremor scoring for the four assessment tasks; thus, eight optimal feature sets were obtained, four from each feature selection method. Each feature set obtained by the wrapper method and PCA was applied to the machine-learning classifiers as input features; then, the optimal feature configuration for the classifiers was determined at the highest accuracy.
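The wrapper idea can be illustrated with a greedy forward-selection sketch. The stopping rule (stop when no candidate improves the score) and the toy `score` function are assumptions for illustration; in a study of this kind the score would typically be the cross-validated accuracy of the classifier on the candidate subset.

```python
def forward_selection(n_features, score):
    """Greedy wrapper: repeatedly add the feature that most improves score(subset).

    `score` is any callable mapping a list of feature indices to a number
    (higher is better). Returns the selected indices in entry order and the best score.
    """
    selected, best = [], float("-inf")
    remaining = list(range(n_features))
    while remaining:
        candidates = [(score(selected + [f]), f) for f in remaining]
        top_score, top_f = max(candidates, key=lambda sf: sf[0])
        if top_score <= best:
            break  # no remaining feature improves the score
        selected.append(top_f)
        remaining.remove(top_f)
        best = top_score
    return selected, best

# Hypothetical scoring function: features 0 and 2 are informative,
# and every added feature carries a small complexity penalty.
USEFULNESS = {0: 0.5, 2: 0.3}

def toy_score(subset):
    return sum(USEFULNESS.get(f, 0.0) for f in subset) - 0.01 * len(subset)

chosen, chosen_score = forward_selection(4, toy_score)
```

The order of `chosen` records when each feature entered the subset, which is the kind of information a ranked list of selected features conveys.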

2.2.4. Classification and performance analysis.

To develop an automatic scoring system for tremor severity based on the kinematic features described in section 2.2.3 and as an alternative to the present, classical UPDRS method, we explored four classification algorithms: decision tree, SVM, discriminant analysis, and k-nearest neighbours (kNN). When the SVM was implemented, three kernels (linear, polynomial, and radial basis function (RBF)), were considered. For the kNN classification method, odd numbers between 1 and 11 were used as k values to avoid selection ambiguity for even k.
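As an illustration of the kNN setting, the sketch below classifies a query by majority vote among its k nearest training points, using Euclidean distance and the odd k values from 1 to 11. The distance metric and the tie-breaking rule are assumptions; the toy data are hypothetical.

```python
import math
from collections import Counter

ODD_K = list(range(1, 12, 2))  # 1, 3, 5, 7, 9, 11

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = [train_y[i] for i in order[:k]]
    counts = Counter(nearest)
    top = max(counts.values())
    # Tie-break (assumption): among the most frequent classes,
    # return the one whose member lies closest to the query.
    for label in nearest:
        if counts[label] == top:
            return label

# Hypothetical one-dimensional feature with UPDRS classes 1 to 3.
train_x = [[0.0], [0.1], [1.0], [1.1], [2.0]]
train_y = [1, 1, 2, 2, 3]
pred_low = knn_predict(train_x, train_y, [0.05], k=3)
pred_high_k3 = knn_predict(train_x, train_y, [1.9], k=3)
pred_high_k1 = knn_predict(train_x, train_y, [1.9], k=1)
```

An odd k rules out tied votes between two classes. With four UPDRS classes a tie among three or more classes remains possible, which is why the sketch includes an explicit tie-break.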

To validate the performance of the four employed classifiers, a leave-one-out cross-validation method was adopted to prevent bias in the classification performance, considering the tremor dataset size for the four tasks. In this validation method, one data point was moved to a new set for validation and the remaining points were used as a set of training points to construct the classifier. The trained classifiers returned the predicted UPDRS classes of the tested points. To evaluate the performance of the four classifiers constructed in this study, we determined the classification errors $e$, accuracy, the root mean square errors (RMSEs), and the weighted Cohen's kappa coefficients. Here, $e$ was defined as the absolute difference between the actual and predicted UPDRS, as follows (Parisi et al 2016):

$e=\left| u-\widehat{u} \right|$     (1)

where $u$ is the UPDRS score allocated by the two neurologists and $\widehat{u}$ is the predicted tremor class decided by a trained classifier. The cumulative distribution functions (CDFs) of $e$ were also plotted to assess the classification accuracy and precision achieved by the examined classifiers. All offline analyses were conducted using MATLAB R2016b (MATLAB, Mathworks, USA).
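The validation loop and the performance measures can be sketched as below. The text does not give the RMSE formula or the kappa weighting scheme, so the standard RMSE of $e$ and linear disagreement weights are assumed here and may not reproduce the reported values. The `nearest_label` classifier is a hypothetical stand-in for any of the four classifiers.

```python
import math

def loo_predictions(xs, ys, fit_predict):
    """Leave-one-out: predict each point from a model built on all other points."""
    preds = []
    for i in range(len(xs)):
        train_x, train_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        preds.append(fit_predict(train_x, train_y, xs[i]))
    return preds

def nearest_label(train_x, train_y, query):
    """Hypothetical stand-in classifier: label of the single nearest training point."""
    i = min(range(len(train_x)), key=lambda j: math.dist(train_x[j], query))
    return train_y[i]

def errors(actual, predicted):
    """Classification error e = |u - u_hat| for each tested point, as in equation (1)."""
    return [abs(u - u_hat) for u, u_hat in zip(actual, predicted)]

def accuracy(e):
    return sum(1 for v in e if v == 0) / len(e)

def rmse(e):
    return math.sqrt(sum(v * v for v in e) / len(e))

def error_cdf(e, max_e=3):
    """Fraction of predictions with e <= x, for x = 0..max_e."""
    return [sum(1 for v in e if v <= x) / len(e) for x in range(max_e + 1)]

def weighted_kappa(actual, predicted, classes=(1, 2, 3, 4)):
    """Cohen's kappa with linear disagreement weights |i - j| / (k - 1) (assumption)."""
    n, k = len(actual), len(classes)
    idx = {c: i for i, c in enumerate(classes)}
    obs = [[0.0] * k for _ in range(k)]
    for a, p in zip(actual, predicted):
        obs[idx[a]][idx[p]] += 1.0 / n
    row = [sum(r) for r in obs]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    d_obs = sum(abs(i - j) / (k - 1) * obs[i][j] for i in range(k) for j in range(k))
    d_exp = sum(abs(i - j) / (k - 1) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1.0 - d_obs / d_exp

# Toy ratings: one prediction is off by one class.
actual = [1, 1, 2, 2, 3, 3, 4, 4]
predicted = [1, 1, 2, 3, 3, 3, 4, 4]
e = errors(actual, predicted)
kappa = weighted_kappa(actual, predicted)
loo = loo_predictions([[0.0], [0.1], [1.0], [1.1]], [1, 1, 2, 2], nearest_label)
```

In this toy example the accuracy is 0.875, the CDF reaches 1 at e = 1, and the linearly weighted kappa is 0.9.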

3. Results

In this section, we describe the automatic tremor-severity scoring results yielded by the proposed method, focusing on the highest performance yielded by the various considered classifiers.

3.1. Resting tremors

For the resting tremors, the polynomial SVM applied to the wrapper-method-selected features yielded UPDRS predictions with an accuracy of 92.3%, RMSE of 0.039, and a weighted Cohen's kappa coefficient of 0.745, as summarized in table 3. All four features selected by the wrapper method were from the acceleration. Figure 5 illustrates the CDFs of the e values for each of the optimized classifiers on the features selected by the wrapper method or the PCA-projected data. The ability of the classifiers to predict UPDRS ratings with each e value is shown, where the CDF values on the Y-axis indicate the probability of obtaining a prediction with e  ⩽  the corresponding X value. The highest accuracy for resting tremors from all trained classifiers, corresponding to the CDF value at e  =  0 in figure 5, was 92.3%, achieved with the polynomial SVM. For the same e value, the lowest accuracy was 80.8% for the discriminant analysis and kNN on the selected features (figure 5). The probability of an estimation with e  ⩽  1 was 1 for all classifiers. Thus, the polynomial SVM returned the highest accuracy, along with a probability value of 1 with e  ⩽  1. Considering the accuracy, RMSE, and weighted Cohen's kappa coefficient together, the best classifier for automatic prediction of the UPDRS rating of resting tremors was determined to be the polynomial SVM on the four features selected by the wrapper method. Note that the weighted Cohen's kappa coefficient of 0.745 is regarded as indicating substantial agreement (Landis and Koch 1977, Jl 1981, Dg 1991) between the predicted UPDRS ratings and those given by the neurologists.
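As a quick arithmetic check, the reported percentages can be converted back to counts. The sketch assumes that accuracy was computed over all 52 analysed resting-tremor recordings, which leave-one-out validation implies but the text does not state explicitly. The kappa bands follow the commonly quoted Landis and Koch (1977) scale; the helper name is illustrative.

```python
def landis_koch_label(kappa):
    """Agreement label for a kappa value on the Landis and Koch (1977) scale."""
    if kappa < 0.0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial"), (1.00, "almost perfect")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")

n_recordings = 52  # resting-tremor recordings analysed (section 2.2.1)
best_correct = round(0.923 * n_recordings)   # polynomial SVM
worst_correct = round(0.808 * n_recordings)  # discriminant analysis and kNN
resting_label = landis_koch_label(0.745)
```

Under that assumption, 92.3% corresponds to 48 of 52 recordings scored exactly, and 80.8% to 42 of 52.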

Table 3. Highest performance achieved by polynomial SVM on wrapper-method-selected features for resting tremors.

Best classifier | Polynomial SVM
Accuracy | 92.3%
RMSE | 0.039
Kappa | 0.745
Wrapper-method-selected features (acceleration) | log(mean amplitude)$^{1,{\rm a}}$, log(peak power)$^{2}$, log(mean power)$^{3}$, log(PLow)$^{4}$

a The superscript number for each feature indicates the order with which the corresponding feature was entered into the classifier.


Figure 5. CDFs of e of each optimized classifier for resting tremors. Application of the polynomial SVM to the features selected by the wrapper method (bold black line) yielded the best performance.


3.2. Resting tremors with mental stress

For the resting tremors with mental stress, an accuracy of 86.2%, an RMSE of 0.055, and a weighted Cohen's kappa coefficient of 0.635 were obtained when the decision tree was applied to the features selected by the wrapper algorithm (table 4). A total of nine features were selected from the acceleration and angular velocity signals. The CDFs of the e values of each optimized classifier for the features selected by the wrapper algorithm or the PCA-projected data are depicted in figure 6. The accuracy corresponding to the CDF values at e = 0 ranged from 75.9% to 86.2%. The highest accuracy of 86.2% was attained by the decision tree applied to the wrapper-method-selected features, and the lowest accuracy of 75.9% was from the discriminant analysis applied to the wrapper-algorithm-selected features. As for the probability of prediction with e ⩽ 1, the lowest probability was 0.966, from the decision tree, linear SVM, and discriminant analysis, and the highest was 0.983, from the kNN. Unlike the resting tremors, for which the highest accuracy and the highest probability of estimation with e ⩽ 1 were achieved by the same classifier, here the decision tree yielded the highest accuracy but not the highest probability of a prediction with e ⩽ 1. However, the area under the CDF curve was highest for this classifier. Considering all elements related to the performance, the decision tree exhibited the best performance for automatic scoring of resting tremors with mental stress, even though its accuracy was lower and its RMSE higher than those obtained for the resting tremors. In particular, the weighted Cohen's kappa coefficient of 0.635 is relatively high. Again, this value corresponds to substantial agreement between the UPDRS predictions yielded by our method and the UPDRS ratings provided by the neurologists for resting tremors with mental stress (Landis and Koch 1977, Jl 1981, Dg 1991).

Table 4. Highest performance achieved by decision tree on selected features for resting tremors with mental stress.

Best classifier: Decision tree
Accuracy: 86.2%
RMSE: 0.055
Kappa: 0.635
Selected features (acceleration): ${{P}_{{\rm rl}\_{\rm Tr}}}$^{1,a}, log(peak power)^{2}, log(mean power)^{3}, log(P_Low)^{4}, log(P_Tr)^{5}, log(P_High)^{6}
Selected features (gyroscope): ${{P}_{{\rm rl}\_{\rm Tr}}}$^{7}, log(mean amplitude)^{8}, log(mean power)^{9}

a The superscript number for each feature indicates the order in which the corresponding feature was entered into the classifier.

Figure 6. CDFs of e of each optimized classifier for resting tremors with mental stress. Application of the decision tree to the features selected by the wrapper method (bold black line) yielded the best performance.


3.3. Postural tremors

The postural tremor ratings were predicted with an accuracy of 92.1%, an RMSE of 0.036, and a weighted Cohen's kappa coefficient of 0.633 when kNN was applied to the features selected by the wrapper algorithm, as indicated in table 5. Here, 23 features were selected by the wrapper feature-selection algorithm from the four signals, i.e. acceleration, angular velocity, displacement, and angle. The CDFs of the e values of each optimized classifier on the features selected by the wrapper method or the PCA-projected data are shown in figure 7. For the CDF values corresponding to e = 0 (figure 7), the highest accuracy for the postural tremors among the trained classifiers was 92.1%, achieved with kNN. The lowest accuracy was 85.7%, yielded by the decision tree on the PCA-projected data (figure 7). The probability of estimation with e ⩽ 1 was 1 for all optimized classifiers. The kNN thus returned the highest accuracy together with a probability of 1 for e ⩽ 1. Taking the accuracy, RMSE, and weighted Cohen's kappa coefficient into account, the kNN estimated the UPDRS classes of the postural tremors with the highest accuracy, a high weighted Cohen's kappa coefficient, and the lowest RMSE among all classifiers. The weighted Cohen's kappa coefficient of 0.633 between the predicted and actual UPDRS ratings for the postural tremors can again be regarded as indicating substantial agreement (Landis and Koch 1977, Jl 1981, Dg 1991).

Table 5. Highest performance achieved by kNN on selected features for postural tremors.

Best classifier: kNN
Accuracy: 92.1%
RMSE: 0.036
Kappa: 0.633
Selected features (acceleration): ${{P}_{{\rm rl}\_{\rm Tr}}}$^{1,a}, STD(regularity)^{7}, PF^{8}, MF^{9}, ${{P}_{{\rm rl}\_{\rm Low}}}$^{14}, ${{P}_{{\rm rl}\_{\rm High}}}$^{15}, log(peak power)^{16}, log(mean power)^{17}, log(P_Tr)^{18}
Selected features (angular velocity): ${{P}_{{\rm rl}\_{\rm Tr}}}$^{2}, PF^{10}, MF^{11}, averaged regularity^{19}, STD(regularity)^{20}, log(mean amplitude)^{21}, log(mean power)^{22}, ${{P}_{{\rm Tr}}}$^{23}
Selected features (displacement): MF^{3}, ${{P}_{{\rm rl}\_{\rm Tr}}}$^{4}, PF^{12}
Selected features (angle): MF^{5}, ${{P}_{{\rm rl}\_{\rm Tr}}}$^{6}, PF^{13}

a The superscript number for each feature indicates the order in which the corresponding feature was entered into the classifier.

Figure 7. CDFs of e of each optimized classifier for postural tremors. Application of kNN to the features selected by the wrapper method (bold black line) yielded the best performance.


3.4. Intention tremors

For the intention tremors, an accuracy of 89.2%, an RMSE of 0.041, and a weighted Cohen's kappa coefficient of 0.570 were achieved by applying the decision tree to the PCA-projected data, as detailed in table 6. The optimal feature configuration was obtained by PCA, with a reduced feature dimension of 10. Figure 8 illustrates the CDFs of the e values of each optimized classifier applied to the wrapper-method-selected features or the PCA-projected data. The accuracies attained by the optimized classifiers, corresponding to the CDF values at e = 0, ranged from 83.1% to 89.2% (figure 8). The highest accuracy of 89.2% was achieved by the decision tree on the PCA-projected data, and the lowest accuracy of 83.1% was from the linear SVM applied to the wrapper-method-selected features. The probability of estimation with e ⩽ 1 was 1 for all optimized classifiers; that is, every observation was estimated with e ⩽ 1 by the classifiers used in this study. Thus, high accuracy, a probability of 1 for prediction with e ⩽ 1, and a low RMSE were attained for the intention tremors; however, the weighted Cohen's kappa coefficient of 0.570 was lower than those for the other three tremor types. This lower coefficient indicates that the examined approach is less reliable for automatic scoring of intention tremors than for the tremors considered in the other three assessment tasks. Nevertheless, the approach shows potential, given that the accuracy of 89.2% is not low and that a weighted Cohen's kappa coefficient of 0.570 is regarded as indicating fair to good or moderate agreement (Landis and Koch 1977, Jl 1981, Dg 1991).

Table 6. Highest performance achieved by decision tree on selected features for intention tremors.

Best classifier: Decision tree
Accuracy: 89.2%
RMSE: 0.041
Kappa: 0.570
Selected features (four signals): 1st to 10th PC

Figure 8. CDFs of e of each optimized classifier for intention tremors. Application of the decision tree on the PCA-projected data (bold black line) yielded the best performance.


3.5. Multi-classification performance

To further investigate the e values of the automatic scoring of tremor severity, the confusion matrix of the predicted UPDRS ratings for each tremor type was computed; the results are listed in tables 7–10. The confusion matrices also give the sensitivity and specificity for each UPDRS class (the sensitivities are the values in parentheses along the diagonal of each table). The sensitivity ranges were 86.4–100%, 79.0–100%, 66.7–97.8%, and 54.5–96.3% for the resting, resting with mental stress, postural, and intention tremors, respectively. High sensitivities were generally obtained for the UPDRS rating predictions, especially for resting tremors and resting tremors with mental stress. For the postural tremors, the sensitivity of 66.7% at a UPDRS rating of 4 (UPDRS 4) may be considered low; however, this is acceptable given that only three samples were available at UPDRS 4, two of which were classified correctly. For the intention tremors, five of the 11 samples at UPDRS 2 were misclassified as UPDRS 1, the dominant class for this tremor type, giving the lowest sensitivity of 54.5%, whereas a high sensitivity of 96.3% was obtained for UPDRS 1. This result for the intention tremors is connected with the low specificity and the high sensitivity for UPDRS 1.

Table 7. Confusion matrix of resting tremor results.

%    1             2             3             4             Spec.^b (%)
1    21 (95.5^a)   1             0             0             90.0
2    3             19 (86.4)     0             0             96.7
3    0             0             6 (100.0)     0             100
4    0             0             0             2 (100.0)     100

a The numbers in parentheses are the sensitivities of each class. b Spec. indicates 'specificity'.

Table 8. Confusion matrix of results for resting tremors with mental stress.

%    1             2             3             4             Spec.^b (%)
1    7 (87.5^a)    1             0             0             92.0
2    2             21 (87.5)     1             0             94.1
3    2             1             15 (79.0)     1             97.4
4    0             0             0             7 (100.0)     98.0

a The numbers in parentheses are the sensitivities of each class. b Spec. indicates 'specificity'.

Table 9. Confusion matrix of postural tremor results.

%    1             2             3             4             Spec.^b (%)
1    45 (97.8^a)   1             0             0             88.2
2    2             8 (80.0)      0             0             96.2
3    0             1             3 (75.0)      0             98.3
4    0             0             1             2 (66.7)      100.0

a The numbers in parentheses are the sensitivities of each class. b Spec. indicates 'specificity'.

Table 10. Confusion matrix of intention tremor results.

%    1             2             3             4             Spec.^b (%)
1    52 (96.3^a)   2             0             0             54.5
2    5             6 (54.5)      0             0             96.3
3    0             0             0             0             0.0
4    0             0             0             0             0.0

a The numbers in parentheses are the sensitivities of each class. b Spec. indicates 'specificity'.
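
The sensitivities and specificities in these tables can be recomputed from the counts. The Python sketch below does so for table 8, assuming that rows hold the neurologists' ratings and columns hold the predicted ratings; that orientation is inferred here because it reproduces the tabulated sensitivities, and the function names are hypothetical.

```python
from typing import List, Tuple

# Counts from table 8 (resting tremors with mental stress).
# Assumed orientation: rows = neurologist rating, columns = predicted rating.
TABLE_8 = [
    [7, 1, 0, 0],
    [2, 21, 1, 0],
    [2, 1, 15, 1],
    [0, 0, 0, 7],
]


def per_class_rates(cm: List[List[int]]) -> List[Tuple[float, float]]:
    """Return (sensitivity %, specificity %) for each class of a square matrix."""
    total = sum(sum(row) for row in cm)
    rates = []
    for k in range(len(cm)):
        tp = cm[k][k]
        actual_k = sum(cm[k])                    # all samples truly in class k
        predicted_k = sum(row[k] for row in cm)  # all samples predicted as k
        fp = predicted_k - tp
        negatives = total - actual_k             # samples not in class k
        sens = 100.0 * tp / actual_k if actual_k else float("nan")
        spec = 100.0 * (negatives - fp) / negatives if negatives else float("nan")
        rates.append((round(sens, 1), round(spec, 1)))
    return rates


def accuracy_percent(cm: List[List[int]]) -> float:
    """Overall accuracy: share of samples on the diagonal, in percent."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return round(100.0 * correct / total, 1)


rates_8 = per_class_rates(TABLE_8)
acc_8 = accuracy_percent(TABLE_8)
```

Under this reading the computed values agree with table 8 to within rounding (15/19 comes out as 78.9% here and is listed as 79.0 in the table), and the overall accuracy of 86.2% matches the value reported for this task.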

As shown in figure 2, the data distribution for each UPDRS rating for the intention tremors is unbalanced. There are no data for UPDRS 3 and 4, and most data are for UPDRS 1. This distribution seems to have had a serious effect on the prediction ability for intention tremors. However, the uneven distribution at each UPDRS is inevitable, as it is related to the patient population. This point is explained in detail in section 4.3. Although the intention-tremor distribution at each UPDRS is not suitable for training machine learning classifiers, in this study, we attempted to apply a machine learning algorithm to the intention tremors to explore the automatic scoring potential for this tremor type.
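
The effect of this imbalance can be illustrated with a short Python sketch that computes Cohen's kappa from the populated part of table 10. It assumes that rows are the neurologists' ratings and columns are the predictions, and it uses the unweighted coefficient: with only two populated classes there is a single off-diagonal distance, so weighting does not change the value. This is an illustrative recomputation, not necessarily the procedure used in the study, and the function name is hypothetical.

```python
from typing import List


def cohen_kappa(cm: List[List[int]]) -> float:
    """Unweighted Cohen's kappa from a square confusion matrix of counts."""
    n = sum(sum(row) for row in cm)
    size = len(cm)
    observed = sum(cm[k][k] for k in range(size)) / n
    row_totals = [sum(cm[i]) for i in range(size)]
    col_totals = [sum(cm[i][j] for i in range(size)) for j in range(size)]
    # Agreement expected by chance from the marginal totals alone.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Populated part of table 10 (UPDRS 1 and 2 only; classes 3 and 4 are empty).
INTENTION = [
    [52, 2],
    [5, 6],
]

intention_accuracy = (52 + 6) / 65
intention_kappa = cohen_kappa(INTENTION)
```

Here intention_accuracy is about 0.892 while intention_kappa is about 0.570. Because UPDRS 1 accounts for most of the samples, much of the raw agreement is expected by chance, and kappa discounts it.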

4. Discussion

4.1. Contributions

The contribution of this paper is that it is the first study on the use of machine learning techniques for automatic scoring of Parkinsonian tremor severity in various tasks. Our approach provides predicted UPDRS ratings of the same type as those currently used in clinical practice. In contrast, other studies do not yield the same type of UPDRS ratings, so direct application of their findings may be difficult for clinicians engaged in tremor evaluation (Rissanen et al 2007, 2008, Salarian et al 2007, Meigal et al 2009, 2012, Daneault et al 2012, Heida et al 2013, Thanawattano et al 2015). We reported the results in terms of RMSE, accuracy, and weighted Cohen's kappa coefficient because several suitable reliability measures should be presented together, given the advantages and disadvantages of each (Bruton et al 2000). The accuracies were 92.3%, 86.2%, 92.1%, and 89.2% for resting, resting with mental stress, postural, and intention tremors, respectively. The weighted Cohen's kappa values from our proposed method were 0.745, 0.635, 0.633, and 0.570 for resting, resting with mental stress, postural, and intention tremors, respectively. As mentioned above, the weighted Cohen's kappa coefficients for resting, resting with mental stress, and postural tremors indicate substantial agreement. The weighted Cohen's kappa coefficient of 0.570 for the intention tremors was lower than the values for the other tremors, but still indicates moderate agreement. In addition, a comparison between the RMSE results reported in Giuffrida et al (2009) and those from this study is presented in table 11. The RMSE values obtained using the proposed method were markedly lower than those obtained in Giuffrida et al (2009) for all tremor assessment tasks reported in both studies. Thus, the machine learning techniques utilized in this study yielded more accurate and reliable results than those of Giuffrida et al (2009) and represent the most validated performance compared with other studies (Dai 2015, Pan et al 2015); therefore, the techniques presented in this study are the most logical candidates for use in a clinical or home context.
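
The agreement labels used above follow the bands of Landis and Koch (1977). A small Python helper can make that mapping explicit; the function name is hypothetical, and the scales of the other cited sources are not reproduced here.

```python
def landis_koch_label(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch (1977) agreement category."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [-1, 1]")
    if kappa < 0.0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Weighted Cohen's kappa values reported in this paper for the four tasks.
reported = {
    "resting": 0.745,
    "resting with mental stress": 0.635,
    "postural": 0.633,
    "intention": 0.570,
}
labels = {task: landis_koch_label(k) for task, k in reported.items()}
```

Applied to the four reported coefficients, the helper returns "substantial" for the resting, resting with mental stress, and postural tasks and "moderate" for the intention task.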

Table 11. Comparison of RMSEs for similar techniques and our proposed method.

Tremor type                   RMSE (Giuffrida et al)   RMSE (this paper)
Resting                       0.32                     0.039
Resting with mental stress    X                        0.055
Postural                      0.35                     0.036
Intention                     0.45                     0.041

4.2. Limitations

The main limitation of the method proposed in this paper is the unbalanced distribution of the UPDRS ratings. As shown in figure 2, most tremor trials were rated as UPDRS 1 or 2, with relatively few rated as UPDRS 3 or 4. This may have affected the performance of the classifier models built to predict the UPDRS ratings; a larger number of observations would have allowed more reliable results to be obtained. In particular, the intention-tremor results were influenced by the amount of data (table 10), with uneven performance across classes. The Cohen's kappa coefficient for the intention tremors was also affected, with a lower value (0.570) than for the other tremor types. However, the small proportion of more severe tremors corresponding to UPDRS 3 and 4 is inevitable and beyond the scope of this research, because it reflects the actual distribution of patients with tremor symptoms who visit hospitals (Parisi et al 2015). Indeed, previous studies also had to work with data containing few or even no samples of severe tremors (Giuffrida et al 2009, Giuberti et al 2015b, Parisi et al 2015).

5. Conclusions

In this paper, an automatic scoring method for the tremor severity of PD patients that employs machine learning algorithms was proposed as an alternative approach to the current clinical rating scale. Four machine learning methods (a decision tree, an SVM with three kernels, discriminant analysis, and kNN) were examined, and the best classifier was applied to the analysis of each set of tremor assessment results. To configure the optimal feature set to be entered into the classifiers, the wrapper method and PCA were implemented. With the selected features, the ability of the machine learning classifiers to automatically predict the UPDRS ratings of the four PD tremor-assessment tasks was validated using the leave-one-out method.
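
As an illustration of the leave-one-out procedure mentioned above, the Python sketch below evaluates a simple kNN classifier by holding out one sample at a time. It is a generic sketch rather than the study's implementation: the value k = 3, the squared Euclidean distance, the function names, and the toy two-feature samples are all assumptions made for the example.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, UPDRS rating)


def knn_predict(train: List[Sample], x: Sequence[float], k: int = 3) -> int:
    """Majority vote among the k nearest training samples (squared Euclidean)."""
    by_distance = sorted(
        train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x))
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data: List[Sample], k: int = 3) -> float:
    """Hold out each sample once, train on the rest, and score the prediction."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += knn_predict(train, x, k) == y
    return hits / len(data)


# Made-up two-feature samples for illustration only (not data from the study).
toy_data: List[Sample] = [
    ((0.10, 0.20), 1), ((0.20, 0.10), 1), ((0.15, 0.25), 1), ((0.05, 0.15), 1),
    ((0.90, 0.80), 2), ((0.80, 0.90), 2), ((0.85, 0.95), 2), ((0.95, 0.85), 2),
]

loo_accuracy = leave_one_out_accuracy(toy_data, k=3)
```

Because every sample is predicted by a model that never saw it, the resulting accuracy estimates performance on unseen data, which matters for small data sets such as the one described here.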

The foremost outcome of this study is a highly accurate automatic scoring system for Parkinsonian tremor severity that employs machine learning methods. We report RMSEs of 0.039, 0.055, 0.036, and 0.041 for resting, resting with mental stress, postural, and intention tremors, respectively, compared to traditional UPDRS ratings provided by neurologists. These errors are markedly lower than those of other studies cited in this paper. In addition, our method yielded performance accuracies of 92.3%, 86.2%, 92.1%, and 89.2% and weighted Cohen's kappa coefficients of 0.745, 0.635, 0.633, and 0.570 for each of the four assessment tasks, respectively. These results correspond to the best performance, to our knowledge, for automatic scoring of tremor severity.

The technique presented in this study supports current trends that aspire to realize objective and quantitative patient diagnosis, disease progression monitoring, and optimal treatment identification, by facilitating automatic scoring of the UPDRS classes. The analyses and findings reported in this paper can be further investigated to mitigate the current limitations and can also be improved by integrating the system with other PD assessment tasks. Ultimately, such investigations may simplify the application of the proposed system as a decision support tool. The proposed approach could, therefore, be highly useful for the effective diagnosis and management of PD.

Acknowledgment

This research was supported by Coway Co., Ltd.
