ISSN : 3023-7505
TJ-CEO 2024, Vol 19, Num 4
A comparative analysis to evaluate the effectiveness of Bing, Bard, ChatGPT-3.5 and ChatGPT-4.0 in answering glaucoma-related questions
Mehmet Canleblebici1, Ali Dal2, Murat Erdag3
1Ophthalmology Department, Kayseri State Hospital, Kayseri, Türkiye
2Ophthalmology Department, Mustafa Kemal University, Hatay, Türkiye
3Ophthalmology Department, Fırat University, Elazığ, Türkiye
DOI: 10.37844/TJ-CEO.2024.19.28

Purpose: Large language models can theoretically be used for education and training in glaucoma. The aim of this study was to determine the proficiency of, and differences among, chatbots in the field of glaucoma using self-assessment questions.

Materials and Methods: Self-assessment questions from the last decade were obtained from the American Academy of Ophthalmology Basic and Clinical Science Course Glaucoma Section books. These questions were posed one by one to ChatGPT-3.5, ChatGPT-4.0, Bing, and Bard in turn. The answers, recorded as correct or incorrect, were analyzed to evaluate the performance of the artificial intelligence chatbots. Questions were evaluated in six main categories. In addition to descriptive statistical methods, Fisher's exact test and Pearson's chi-square test were used to compare the chatbots both pairwise and as a group.
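A minimal sketch of this analysis in Python with SciPy, assuming the chatbot answers have already been scored as correct or incorrect. The counts below are hypothetical placeholders for illustration only, not the study's data; the actual question totals are reported in the full paper.

from itertools import combinations
from scipy.stats import chi2_contingency, fisher_exact

# Correct/incorrect answer counts per chatbot (hypothetical placeholder values)
results = {
    "ChatGPT-4.0": (85, 15),
    "Bing": (82, 18),
    "Bard": (68, 32),
    "ChatGPT-3.5": (64, 36),
}

# Overall comparison: 4x2 contingency table analyzed with Pearson's chi-square test
table = [list(counts) for counts in results.values()]
chi2, p_overall, dof, _ = chi2_contingency(table)
print(f"All chatbots: chi2={chi2:.2f}, p={p_overall:.4f}")

# Pairwise comparisons: 2x2 tables analyzed with Fisher's exact test
for (name_a, a), (name_b, b) in combinations(results.items(), 2):
    _, p_pair = fisher_exact([list(a), list(b)])
    print(f"{name_a} vs {name_b}: p={p_pair:.4f}")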

Results: ChatGPT-4.0 had the highest correct response rate at 85.10%, followed by Bing at 81.80%. Bard and ChatGPT-3.5 underperformed, at 67.80% and 64.50%, respectively. The difference across all four chatbots was statistically significant (p<0.05). In pairwise comparisons, ChatGPT-4.0 differed significantly from both Bard and ChatGPT-3.5, as did Bing (p<0.05). No significant difference was observed between ChatGPT-4.0 and Bing, or between Bard and ChatGPT-3.5 (p>0.05).

Conclusion: ChatGPT-4.0 and Bing showed impressive correct response rates, while ChatGPT-3.5 and Bard performed inadequately. ChatGPT-4.0 and Bing have the potential to be used in education and training, provided care is taken to avoid misinformation, inaccurate results, and bias. Bard has a low correct response rate but room for improvement.

Keywords: Large language models, glaucoma, ChatGPT, Bing, Bard
