Published in: Insights into Imaging 1/2023

Open Access 01-12-2023 | Original Article

Natural language processing for automatic evaluation of free-text answers — a feasibility study based on the European Diploma in Radiology examination

Authors: Fabian Stoehr, Benedikt Kämpgen, Lukas Müller, Laura Oleaga Zufiría, Vanesa Junquero, Cristina Merino, Peter Mildenberger, Roman Kloeckner


Abstract

Background

Written medical examinations consist of multiple-choice questions and/or free-text answers. The latter require manual evaluation and rating, which is time-consuming and potentially error-prone. We tested whether natural language processing (NLP) can be used to automatically analyze free-text answers to support the review process.

Methods

The European Board of Radiology of the European Society of Radiology provided representative datasets comprising sample questions, answer keys, participant answers, and reviewer markings from European Diploma in Radiology (EDiR) examinations. Three free-text questions with the highest number of corresponding answers were selected: Questions 1 and 2 were “unstructured” and required a typical free-text answer, whereas Question 3 was “structured” and offered a selection of predefined wordings/phrases for participants to use in their free-text answer. The NLP engine was designed using word lists, rule-based synonyms, and decision tree learning based on the answer keys, and its performance was tested against the gold standard of reviewer markings.
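To illustrate the word-list/synonym idea, a minimal rule-based marking sketch could look like the following. The scoring points and synonym lists here are purely illustrative assumptions, not the actual EDiR answer keys, and the matching logic is far simpler than the engine described above:

```python
import re

# Hypothetical answer key for one question: each scoring point maps to a
# word list of acceptable wordings/synonyms (illustrative values only).
ANSWER_KEY = {
    "diagnosis": ["pneumothorax", "collapsed lung"],
    "laterality": ["right", "right sided"],
}

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())

def contains_phrase(answer: str, phrase: str) -> bool:
    """Whole-word/phrase match, so 'right' does not match 'bright'."""
    return f" {phrase} " in f" {answer} "

def mark_answer(answer: str) -> dict[str, bool]:
    """Return one boolean mark per scoring point, as a reviewer would tick."""
    norm = normalize(answer)
    return {
        point: any(contains_phrase(norm, normalize(syn)) for syn in synonyms)
        for point, synonyms in ANSWER_KEY.items()
    }
```

For example, `mark_answer("Right-sided pneumothorax.")` would credit both scoring points, while `mark_answer("Bright lung nodule")` would credit neither, since the whole-word match prevents “right” from matching inside “bright”.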

Results

After implementing the NLP approach in Python, F1 scores were calculated as a measure of NLP performance: 0.26 (unstructured question 1, n = 96), 0.33 (unstructured question 2, n = 327), and 0.5 (more structured question, n = 111). The respective precision/recall values were 0.26/0.27, 0.4/0.32, and 0.62/0.55.
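The reported values compare the engine’s marks against the reviewer markings. As a sketch of how precision, recall, and F1 are derived, assuming marks are represented as per-item booleans (an assumption for illustration, not the study’s actual code):

```python
def precision_recall_f1(predicted: list[bool], gold: list[bool]) -> tuple[float, float, float]:
    """Score NLP marks (predicted) against reviewer markings (gold standard)."""
    tp = sum(p and g for p, g in zip(predicted, gold))        # true positives
    fp = sum(p and not g for p, g in zip(predicted, gold))    # false positives
    fn = sum(g and not p for p, g in zip(predicted, gold))    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

F1 is the harmonic mean of precision and recall, which is why it sits between the two reported values for each question.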

Conclusion

This study showed the successful design of an NLP-based approach for the automatic evaluation of free-text answers in the EDiR examination. Thus, as a future field of application, NLP could serve as a decision-support system for reviewers and inform the design of examinations adjusted to the requirements of an automated, NLP-based review process.

Clinical relevance statement

Natural language processing can be successfully used to automatically evaluate free-text answers, performing better with more structured question-answer formats. Furthermore, this study provides a baseline for further work applying, e.g., more elaborate NLP approaches or large language models.

Key points

• Free-text answers require manual evaluation, which is time-consuming and potentially error-prone.
• We developed a simple NLP-based approach — requiring only minimal effort/modeling — to automatically analyze and mark free-text answers.
• Our NLP engine has the potential to support the manual evaluation process.
• NLP performance is better on a more structured question-answer format.

Metadata
Publisher: Springer Vienna
Electronic ISSN: 1869-4101
DOI: https://doi.org/10.1186/s13244-023-01507-5
