Computer Vision Meets Large Language Models: Performance of ChatGPT 4.0 on Dermatology Boards-Style Practice Questions

Logan Smith https://orcid.org/0000-0003-1766-9424
Rana Hanna https://orcid.org/0009-0007-5024-2998
Leigh Hatch
Karim Hanna

Keywords

ChatGPT, Artificial Intelligence, Dermatology, Medical Education, GAI, Boards Preparation, Licensing Examinations

Abstract

Background: ChatGPT is a generative artificial intelligence (GAI) chatbot with numerous professional applications, and its applications in medical education are currently being explored. The performance of ChatGPT 4.0 on image-based dermatology boards-style practice questions has not been assessed.


Objective: The objective of this study was to determine the accuracy with which ChatGPT answers dermatology board examination practice questions.


Methods: A total of 150 multiple-choice questions from the popular question bank DermQbank were entered into ChatGPT. Of these, 83 were text-only questions and 67 had associated images. The same questions were entered into ChatGPT again in July 2024, along with an additional 150 questions, for a total of 300 different questions, of which 169 were text-only and 133 had associated images.
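For readers who want to replicate this workflow at scale, the sketch below shows one plausible way to submit a boards-style multiple-choice item to GPT-4 programmatically through the OpenAI Python SDK. This is a minimal illustration only: the study entered questions into the ChatGPT interface directly, and the model name, prompt wording, question content, and ask_question helper here are assumptions rather than the authors' procedure.

```python
# Minimal sketch (assumption): submitting one boards-style multiple-choice
# question to GPT-4 via the OpenAI Python SDK. The study itself entered
# questions into the ChatGPT interface; everything below is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_question(stem: str, choices: list[str]) -> str:
    """Return the model's answer to a single multiple-choice question."""
    letters = "ABCDE"
    options = "\n".join(f"{letters[i]}. {c}" for i, c in enumerate(choices))
    prompt = (
        "Answer the following dermatology boards-style question. "
        "Reply with the single best answer choice.\n\n"
        f"{stem}\n\n{options}"
    )
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative; the study used ChatGPT 4.0
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


# Hypothetical example question, not drawn from DermQbank:
print(ask_question(
    "A 55-year-old man presents with a pearly papule with telangiectasias "
    "on the nose. What is the most likely diagnosis?",
    ["Basal cell carcinoma", "Squamous cell carcinoma",
     "Amelanotic melanoma", "Sebaceous hyperplasia", "Fibrous papule"],
))
```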


Results: Across the aggregate 300 questions, ChatGPT answered 232 correctly (77.3%). ChatGPT performed significantly better on text-only questions than on questions that included images (85.2% (144/169) vs. 67.7% (90/133), P<.001). Among image-based questions, ChatGPT performed better on clinical image questions than on dermatopathology questions (69.0% (78/113) vs. 58.8% (10/17), P=.40), although this difference was not statistically significant, owing in part to the small number of dermatopathology questions. Compared with post-graduate year 4 (PGY-4) residents, ChatGPT performed above the 46th percentile. ChatGPT agreed with the answer choice selected by the majority of question bank users 75.3% of the time. Multivariable regression identified two significant predictors of ChatGPT answering a question correctly: the percentage of dermatology trainees who answered the question correctly and whether the question was text-only (P<.001 and P=.004, respectively).
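The headline comparisons can be re-derived from the reported counts alone. The sketch below is a reconstruction under stated assumptions: it assumes a chi-square test for the text-only versus image comparison and Fisher's exact test for the small dermatopathology subgroup (the abstract does not name the tests used), and it takes 113 as the clinical-image denominator, consistent with the reported 69.0%.

```python
# Re-deriving the reported comparisons from the 2x2 counts in the abstract.
# Test choices (chi-square, Fisher's exact) are assumptions; the abstract
# does not state which tests the authors used.
from scipy.stats import chi2_contingency, fisher_exact

# Text-only vs. image-based: 144/169 vs. 90/133 correct
text_vs_image = [[144, 169 - 144], [90, 133 - 90]]
chi2, p, _, _ = chi2_contingency(text_vs_image)
print(f"text vs image: chi2={chi2:.2f}, P={p:.4f}")  # P < .001, as reported

# Clinical image vs. dermatopathology: 78/113 vs. 10/17 correct
clinical_vs_dermpath = [[78, 113 - 78], [10, 17 - 10]]
_, p = fisher_exact(clinical_vs_dermpath)
print(f"clinical vs dermpath: P={p:.2f}")  # ~.40, not significant
```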


Conclusions: ChatGPT answered 77.3% of dermatology board examination practice questions correctly, performing above the 46th percentile of PGY-4 question bank users. Residents who use ChatGPT as a study resource for dermatology board examination preparation should be judicious about exactly how they employ it, to avoid learning incorrect information.
