
Open Access 01-12-2024 | Research

Assessing the research landscape and clinical utility of large language models: a scoping review

Authors: Ye-Jean Park, Abhinav Pillai, Jiawen Deng, Eddie Guo, Mehul Gupta, Mike Paget, Christopher Naugler

Published in: BMC Medical Informatics and Decision Making | Issue 1/2024


Abstract

Importance

Large language models (LLMs) like OpenAI’s ChatGPT are powerful generative systems that rapidly synthesize natural language responses. Research on LLMs has revealed their potential and pitfalls, especially in clinical settings. However, the evolving landscape of LLM research in medicine has left several gaps regarding their evaluation, application, and evidence base.

Objective

This scoping review aims to (1) summarize current research evidence on the accuracy and efficacy of LLMs in medical applications, (2) discuss the ethical, legal, logistical, and socioeconomic implications of LLM use in clinical settings, (3) explore barriers and facilitators to LLM implementation in healthcare, (4) propose a standardized evaluation framework for assessing LLMs’ clinical utility, and (5) identify evidence gaps and propose future research directions for LLMs in clinical applications.

Evidence review

We screened 4,036 records from MEDLINE, EMBASE, CINAHL, medRxiv, bioRxiv, and arXiv, covering January 2023 (inception of the search) to June 26, 2023, for English-language papers, and analyzed findings from 55 studies conducted worldwide. The quality of evidence was reported according to the Oxford Centre for Evidence-Based Medicine recommendations.

Findings

Our results demonstrate that LLMs show promise in compiling patient notes, assisting patients in navigating the healthcare system, and, to some extent, supporting clinical decision-making when combined with human oversight. However, their use is limited by biases in training data that may harm patients, the generation of inaccurate but convincing information, and ethical, legal, socioeconomic, and privacy concerns. We also identified a lack of standardized methods for evaluating the effectiveness and feasibility of LLMs.

Conclusions and relevance

This review highlights potential future directions and research questions to address these limitations and to further explore the potential of LLMs to enhance healthcare delivery.
Metadata
Publisher
BioMed Central
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-024-02459-6
