Published in: Journal of NeuroEngineering and Rehabilitation 1/2024

Open Access 01-12-2024 | Dysphagia | Research

Prediction of dysphagia aspiration through machine learning-based analysis of patients’ postprandial voices

Authors: Jung-Min Kim, Min-Seop Kim, Sun-Young Choi, Ju Seok Ryu


Abstract

Background

Conventional diagnostic methods for dysphagia have limitations such as long wait times, radiation risks, and restricted evaluation. Therefore, voice-based diagnostic and monitoring technologies are required to overcome these limitations. Based on our hypothesis that weakened muscle strength and the presence of aspiration affect vocal characteristics, this single-center, prospective study aimed to develop a machine-learning algorithm for predicting dysphagia status (normal or aspiration) by analyzing postprandial voice, with intake limited to 3 cc.

Methods

This single-center, prospective cohort study was conducted from September 2021 to February 2023 at Seoul National University Bundang Hospital and included 198 participants aged 40 or older: 128 without suspected dysphagia and 70 with dysphagia-aspiration. Voice data from participants were collected and used to develop dysphagia prediction models using a Multi-Layer Perceptron (MLP) with MobileNetV3. Male-only, female-only, and combined models were constructed using 10-fold cross-validation. Through the inference process, we established a model capable of probabilistically categorizing a new patient's voice as either normal or indicating the possibility of aspiration.
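
To illustrate the 10-fold cross-validation mentioned above, the following is a minimal sketch of how the 198 participants could be divided into folds. It assumes a stratified split at the participant level, which the abstract does not specify; the function name `stratified_k_fold`, the seed, and the label coding (0 = no suspected dysphagia, 1 = dysphagia-aspiration) are hypothetical choices made only for this example.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Return k lists of test indices with class counts kept roughly equal.

    Hypothetical helper: the abstract does not state how folds were built.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the previous class stopped
        # so that fold sizes stay balanced overall.
        for i, idx in enumerate(indices):
            folds[(offset + i) % k].append(idx)
        offset += len(indices)
    return folds


# Cohort sizes taken from the Methods; the label coding is an assumption.
labels = [0] * 128 + [1] * 70
folds = stratified_k_fold(labels, k=10)

for fold_number, test_idx in enumerate(folds, start=1):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    positives = sum(labels[i] for i in test_idx)
    print(f"fold {fold_number}: train={len(train_idx)} test={len(test_idx)} aspiration={positives}")
```

With these counts, each held-out fold contains 7 aspiration cases and 12 or 13 participants without suspected dysphagia. Splitting by participant rather than by recording is a common precaution against the same voice appearing in both training and test data; whether the study did so is not stated here.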

Results

The pre-trained models (mn40_as and mn30_as) exhibited superior performance compared to the non-pre-trained models (mn4.0 and mn3.0). In model names, ‘mn’ refers to MobileNet and the number that follows indicates the ‘width_mult’ parameter. Overall, the best-performing model, the pre-trained mn30_as, demonstrated the following average AUC across 10 folds: combined model 0.8361 (95% CI 0.7667–0.9056; max 0.9541), male model 0.8010 (95% CI 0.6589–0.9432; max 1.000), and female model 0.7572 (95% CI 0.6578–0.8567; max 0.9779). However, for the female model, a slightly higher result was observed with mn4.0, which scored 0.7679 (95% CI 0.6426–0.8931; max 0.9722). Additionally, the other models (pre-trained: mn40_as; non-pre-trained: mn4.0 and mn3.0) also achieved performance above 0.7 in most cases, and the highest fold-level performance for most models was approximately 0.9.
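
As an illustration of how a per-fold AUC and its average across 10 folds with a 95% confidence interval can be computed, here is a minimal sketch. It assumes a t-based interval over the fold-level AUCs, which may differ from the method the authors used; `roc_auc`, `mean_with_ci`, and the values in `example_fold_aucs` are hypothetical and are not results from the study.

```python
import math
import statistics


def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are needed to compute an AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Two-sided 95% critical value of the t distribution with 9 degrees of freedom
# (10 folds minus 1). Using a t interval here is an assumption.
T_CRIT_DF9 = 2.262


def mean_with_ci(fold_aucs, t_crit=T_CRIT_DF9):
    """Return (mean, lower, upper) for a list of fold-level AUCs."""
    mean = statistics.mean(fold_aucs)
    half_width = t_crit * statistics.stdev(fold_aucs) / math.sqrt(len(fold_aucs))
    return mean, mean - half_width, mean + half_width


# AUC for one small, made-up fold: labels are 0/1, scores are predicted probabilities.
fold_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Made-up fold-level AUCs, for illustration only.
example_fold_aucs = [0.78, 0.91, 0.85, 0.74, 0.88, 0.80, 0.95, 0.69, 0.83, 0.86]
mean_auc, ci_low, ci_high = mean_with_ci(example_fold_aucs)
print(f"fold AUC {fold_auc:.2f}; mean AUC {mean_auc:.4f} (95% CI {ci_low:.4f} to {ci_high:.4f})")
```

Because each fold holds only a handful of aspiration cases, fold-level AUCs can vary widely, which is consistent with the wide intervals and the fold maxima near 1.0 reported above.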

Conclusions

In this study, we used mel-spectrogram analysis and a MobileNetV3 model for predicting dysphagia aspiration. Our research highlights the potential of voice analysis in dysphagia screening, diagnosis, and monitoring, aiming for non-invasive, safer, and more effective interventions.

Trial registration: This study was approved by the IRB (No. B-2109-707-303) and registered on clinicaltrials.gov (ID: NCT05149976).
Metadata
Title
Prediction of dysphagia aspiration through machine learning-based analysis of patients’ postprandial voices
Authors
Jung-Min Kim
Min-Seop Kim
Sun-Young Choi
Ju Seok Ryu
Publication date
01-12-2024
Publisher
BioMed Central
Published in
Journal of NeuroEngineering and Rehabilitation / Issue 1/2024
Electronic ISSN: 1743-0003
DOI
https://doi.org/10.1186/s12984-024-01329-6