A Content-Aware Chatbot based on GPT 4 provides trustworthy Recommendations for Cone Beam Computed Tomography Guidelines in Dental Imaging.
PubMed ID
38180877
Description
OBJECTIVES
To develop a content-aware chatbot based on GPT-3.5-Turbo and GPT-4 with specialized knowledge of the German S2 Cone-Beam CT (CBCT) dental imaging guideline and to compare its performance against that of human practitioners.
METHODS
The LlamaIndex software library was used to integrate the guideline context into the chatbots. Based on the CBCT S2 guideline, 40 questions were posed to the content-aware chatbots; early career and senior practitioners with different levels of experience served as reference. The chatbots' performance was compared in terms of recommendation accuracy and explanation quality. A chi-square test and a one-tailed Wilcoxon signed-rank test were used to evaluate accuracy and explanation quality, respectively.
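As an illustration of the accuracy comparison named above, the following is a minimal sketch of a Pearson chi-square test on a 2x2 table of correct versus incorrect answers. The counts are hypothetical placeholders and not the study's data, the function name `chi_square_2x2` is invented for this example, and the sketch assumes no continuity correction, which the record does not specify.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rater and correctness.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, NOT the study's data:
# each row is [correct, incorrect] out of 40 questions for one rater.
hypothetical_counts = [[38, 2], [30, 10]]
chi2, p_value = chi_square_2x2(hypothetical_counts)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

The sketch fails with a division by zero if a whole column is empty (for example, if no rater gave an incorrect answer), and small expected counts would usually call for an exact test instead.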
RESULTS
The GPT-4 based chatbot provided 100% correct recommendations and superior explanation quality compared to the one based on GPT-3.5-Turbo (87.5% vs. 57.5% for GPT-3.5-Turbo; p = 0.003). Moreover, it outperformed early career practitioners in correct answers (p = 0.002 and p = 0.032) and earned higher trust than the chatbot using GPT-3.5-Turbo (p = 0.006).
CONCLUSIONS
A content-aware chatbot using GPT-4 reliably provided recommendations according to current consensus guidelines. The responses were deemed trustworthy and transparent and therefore facilitate the integration of artificial intelligence into clinical decision-making.
Date of Publication
2024-02-08
Publication Type
Article
Subject(s)
600 - Technology::610 - Medicine & health
Keyword(s)
Chatbot Cone-Beam CT Dental Imaging Natural Language Processing
Language(s)
en
Contributor(s)
Russe, Maximilian Frederik
Rau, Alexander
Ermer, Michael Andreas
Rothweiler, René
Wenger, Sina
Klöble, Klara
Bamberg, Fabian
Schmelzeisen, Rainer
Reisert, Marco
Semper-Hogg, Wiebke
Additional Credits
Zahnmedizinische Kliniken (ZMK) - Klinik für Oralchirurgie und Stomatologie
Series
Dentomaxillofacial Radiology
Publisher
British Institute of Radiology
ISSN
0250-832X
Access(Rights)
open.access