Validation of the Quality Analysis of Medical Artificial Intelligence (QAMAI) tool: a new tool to assess the quality of health information provided by AI platforms

Authors: Luigi Angelo Vaira, Jerome R. Lechien, Vincenzo Abbate, Fabiana Allevi, Giovanni Audino, Giada Anna Beltramini, Michela Bergonzani, Paolo Boscolo-Rizzo, Gianluigi Califano, Giovanni Cammaroto, Carlos M. Chiesa-Estomba, Umberto Committeri, Salvatore Crimi, Nicholas R. Curran, Francesco di Bello, Arianna di Stadio, Andrea Frosolini, Guido Gabriele, Isabelle M. Gengler, Fabio Lonardi, Fabio Maglitto, Miguel Mayo-Yáñez, Marzia Petrocelli, Resi Pucci, Alberto Maria Saibene, Gianmarco Saponaro, Alessandro Tel, Franco Trabalzini, Eleonora M. C. Trecca, Valentino Vellone, Giovanni Salzano, Giacomo De Riu

Published in: European Archives of Oto-Rhino-Laryngology

Abstract

Background

The widespread diffusion of Artificial Intelligence (AI) platforms is revolutionizing how health-related information is disseminated, highlighting the need for tools to evaluate the quality of such information. This study aimed to propose and validate the Quality Analysis of Medical Artificial Intelligence (QAMAI), a tool specifically designed to assess the quality of health information provided by AI platforms.

Methods

The QAMAI tool was developed by a panel of experts following established guidelines for the development of new questionnaires. A total of 30 ChatGPT-4 responses, addressing patient queries, theoretical questions, and clinical head and neck surgery scenarios, were assessed by 27 reviewers from 25 academic centers worldwide. Construct validity, internal consistency, inter-rater reliability, and test–retest reliability were assessed to validate the tool.
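
For concreteness, the sketch below simulates the layout of such a rating study (30 AI responses, each scored by 27 reviewers on a set of questionnaire items). The item count of six and the 1–5 Likert scale are illustrative assumptions of this example, not the published QAMAI item set.

```python
# Illustrative sketch only: simulates the *layout* of a rating study
# (30 AI responses x 27 reviewers x k items). The item count (6) and the
# 1-5 Likert scale are assumptions, not the published QAMAI instrument.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_responses, n_raters, n_items = 30, 27, 6  # n_items is hypothetical

records = [
    {"response": resp, "rater": rater,
     "item": f"item_{i + 1}", "score": int(rng.integers(1, 6))}
    for resp in range(n_responses)
    for rater in range(n_raters)
    for i in range(n_items)
]
ratings = pd.DataFrame.from_records(records)

# 30 x 27 = 810 potential reviewer-response assessments; the study reports
# 792 completed ones, implying a small number of missing ratings.
print(ratings.head())
```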

Results

The validation was conducted on 792 assessments of the 30 responses given by ChatGPT-4. Exploratory factor analysis revealed a unidimensional structure of the QAMAI, with a single factor comprising all items that explained 51.1% of the variance, with factor loadings ranging from 0.449 to 0.856. Overall internal consistency was high (Cronbach's alpha = 0.837). The intraclass correlation coefficient was 0.983 (95% CI 0.973–0.991; F(29, 542) = 68.3; p < 0.001), indicating excellent inter-rater reliability. Test–retest analysis revealed a strong correlation, with a Pearson's coefficient of 0.876 (95% CI 0.859–0.891; p < 0.001).
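
As a rough, self-contained sketch of how statistics of this kind can be computed (Cronbach's alpha, one-factor exploratory factor analysis, intraclass correlation, Pearson test–retest correlation), the example below runs on simulated Likert data. The library choices (pingouin, factor_analyzer, scipy) and every parameter are assumptions of this illustration, not the authors' analysis pipeline.

```python
# Hedged sketch: computes the *kinds* of statistics the abstract reports,
# on simulated data; it does not reproduce the paper's analysis or values.
import numpy as np
import pandas as pd
import pingouin as pg                        # pip install pingouin
from factor_analyzer import FactorAnalyzer   # pip install factor-analyzer
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
n_assessments, n_items = 792, 6              # item count is an assumption

# Wide matrix (assessments x items): a shared latent "quality" signal makes
# the items correlate, mimicking a unidimensional (one-factor) structure.
latent = rng.normal(size=(n_assessments, 1))
noise = rng.normal(scale=0.8, size=(n_assessments, n_items))
wide = pd.DataFrame(
    np.clip(np.rint(3 + latent + noise), 1, 5),
    columns=[f"item_{i + 1}" for i in range(n_items)],
)

# Internal consistency across items (Cronbach's alpha).
alpha, ci = pg.cronbach_alpha(data=wide)
print(f"alpha = {alpha:.3f} (95% CI {ci[0]:.3f}-{ci[1]:.3f})")

# One-factor exploratory factor analysis: loadings and variance explained.
fa = FactorAnalyzer(n_factors=1, rotation=None).fit(wide)
print("loadings:", fa.loadings_.ravel().round(3))
print("proportion of variance:", fa.get_factor_variance()[1].round(3))

# Inter-rater reliability: ICC with responses as targets, reviewers as raters.
n_responses, n_raters = 30, 27
quality = rng.normal(size=(n_responses, 1))  # per-response "true" quality
totals = 18 + 4 * quality + rng.normal(size=(n_responses, n_raters))
long = pd.DataFrame({
    "response": np.repeat(np.arange(n_responses), n_raters),
    "rater": np.tile(np.arange(n_raters), n_responses),
    "score": totals.ravel(),
})
icc = pg.intraclass_corr(data=long, targets="response",
                         raters="rater", ratings="score")
print(icc[["Type", "ICC", "CI95%"]])

# Test-retest reliability: Pearson r between two rating sessions.
retest = totals.ravel() + rng.normal(size=totals.size)
r, p = pearsonr(totals.ravel(), retest)
print(f"test-retest r = {r:.3f}, p = {p:.1e}")
```

With simulated inputs the printed numbers will of course differ from the published ones; the sketch shows only the shape of the computation.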

Conclusions

The QAMAI tool demonstrated strong reliability and validity in assessing the quality of health information provided by AI platforms. Such a tool may become particularly useful for physicians as patients increasingly seek medical information on AI platforms.
Metadata
Publication date
04 May 2024
Publisher
Springer Berlin Heidelberg
Published in
European Archives of Oto-Rhino-Laryngology
Print ISSN: 0937-4477
Electronic ISSN: 1434-4726
DOI
https://doi.org/10.1007/s00405-024-08710-0