ISSN : 3023-7505
TJ-CEO 2024, Vol 19, Num 4
A comparative analysis to evaluate the effectiveness of Bing, Bard, ChatGPT-3.5 and ChatGPT-4.0 in answering glaucoma-related questions
Mehmet Canleblebici1, Ali Dal2, Murat Erdag3
1Ophthalmology Department, Kayseri State Hospital, Kayseri, Türkiye
2Ophthalmology Department, Mustafa Kemal University, Hatay, Türkiye
3Ophthalmology Department, Fırat University, Elazığ, Türkiye
DOI: 10.37844/TJ-CEO.2024.19.28

Purpose: Large language models can theoretically be used for education and training in glaucoma. The aim of this study was to determine the proficiency of, and differences among, chatbots in the field of glaucoma using self-assessment questions.

Materials and Methods: Self-assessment questions from the last decade were obtained from the American Academy of Ophthalmology Basic and Clinical Science Course Glaucoma Section books. These questions were posed one by one to ChatGPT-3.5, ChatGPT-4.0, Bing, and Bard in turn. The answers, recorded as correct or incorrect, were analyzed to evaluate the performance of the artificial intelligence chatbots. Questions were evaluated in six main categories. In addition to descriptive statistical methods, Fisher's exact test and Pearson's chi-square test were used to compare the chatbots both pairwise and as a group.
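A minimal sketch of this analysis in Python with SciPy, assuming the chatbot answers have already been scored as correct or incorrect. The counts below are hypothetical placeholders for illustration only, not the study's data; the actual question totals are reported in the full paper.

from itertools import combinations
from scipy.stats import chi2_contingency, fisher_exact

# Correct/incorrect answer counts per chatbot (hypothetical placeholder values)
results = {
    "ChatGPT-4.0": (85, 15),
    "Bing": (82, 18),
    "Bard": (68, 32),
    "ChatGPT-3.5": (64, 36),
}

# Overall comparison: 4x2 contingency table analyzed with Pearson's chi-square test
table = [list(counts) for counts in results.values()]
chi2, p_overall, dof, _ = chi2_contingency(table)
print(f"All chatbots: chi2={chi2:.2f}, p={p_overall:.4f}")

# Pairwise comparisons: 2x2 tables analyzed with Fisher's exact test
for (name_a, a), (name_b, b) in combinations(results.items(), 2):
    _, p_pair = fisher_exact([list(a), list(b)])
    print(f"{name_a} vs {name_b}: p={p_pair:.4f}")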

Results: ChatGPT-4.0 had the highest correct response rate at 85.10%, followed by Bing at 81.80%. Bard and ChatGPT-3.5 underperformed, at 67.80% and 64.50%, respectively. The difference across all four chatbots was statistically significant (p<0.05). In pairwise comparisons, ChatGPT-4.0 differed significantly from both Bard and ChatGPT-3.5, as did Bing (p<0.05). No significant difference was observed between ChatGPT-4.0 and Bing, or between Bard and ChatGPT-3.5 (p>0.05).

Conclusion: ChatGPT-4.0 and Bing showed impressive correct response rates, while ChatGPT-3.5 and Bard performed inadequately. ChatGPT-4.0 and Bing have the potential to be used in education and training, provided care is taken to avoid misinformation, inaccurate results, and bias. Bard has a low correct response rate but room for improvement.

Keywords: Large language models, glaucoma, ChatGPT, Bing, Bard
