Computer Vision Meets Large Language Models: Performance of ChatGPT 4.0 on Dermatology Boards-Style Practice Questions
Keywords
ChatGPT, Artificial Intelligence, Dermatology, Medical Education, GAI, Boards Preparation, Licensing Examinations
Abstract
Background: ChatGPT is a generative artificial intelligence chatbot with numerous professional applications, and its applications in medical education are currently being explored. The performance of ChatGPT 4.0 on image-based dermatology boards-style practice questions has not been assessed.
Objective: The objective of this study was to determine the accuracy with which ChatGPT can answer dermatology boards examination practice questions.
Methods: A total of 150 multiple-choice questions from the popular question bank DermQbank were entered into ChatGPT; 83 were text-only questions and 67 had associated images. The same questions were entered into ChatGPT again in July 2024, along with an additional 150 questions, for a total of 300 distinct questions, of which 169 were text-only and 133 had associated images.
Results: Of the aggregate 300 questions, ChatGPT answered 232 correctly (77.3%). ChatGPT performed significantly better on text-only questions than on questions that included images (85.2% [144/169] vs 67.7% [90/133], P<.001). Among image-based questions, ChatGPT performed better on clinical image questions than on dermatopathology questions (69.0% [78/133] vs 58.8% [10/17], P=.40), but this difference was not statistically significant, owing in part to the small sample of dermatopathology questions. Compared with post-graduate year 4 (PGY-4) residents, ChatGPT performed above the 46th percentile. ChatGPT agreed with the answer choice selected by the majority of question bank users 75.3% of the time. Multivariable regression demonstrated that significant predictors of ChatGPT answering a question correctly included the percentage of dermatology trainees who answered the question correctly and whether the question was text-based (P<.001 and P=.004, respectively).
Conclusions: ChatGPT answered 77.3% of dermatology board examination practice questions correctly, performing above the 46th percentile of PGY-4 question bank users. Residents who use ChatGPT as a study resource for dermatology board examination preparation should be judicious in exactly how they employ it, to avoid learning incorrect information.
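The text-versus-image comparison reported in the results (144/169 vs 90/133 correct, P<.001) can be sanity-checked with a Pearson chi-square test on the implied 2×2 contingency table. This is an illustrative sketch only; the abstract does not state which statistical test the authors used.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (df = 1, no continuity correction)
    for the 2x2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Rows: text-only vs image-based questions; columns: correct vs incorrect.
# Text-only: 144 correct, 25 incorrect; image-based: 90 correct, 43 incorrect.
chi2 = chi_square_2x2(144, 169 - 144, 90, 133 - 90)

# The df = 1 critical value at alpha = .001 is about 10.83, so a statistic
# above 10.83 is consistent with the reported P < .001.
print(round(chi2, 2))  # statistic is roughly 13.1
```

The same table-based check applied to the dermatopathology comparison would show why P=.40 is unsurprising: with only 17 dermatopathology questions, the test has little power to detect a difference.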
References
2. Kung TH, Cheatham M, Medenilla A, et al. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLOS Digit Health. 2023;2(2):e0000198.
3. Mihalache A, Popovic MM, Muni RH. Performance of an Artificial Intelligence Chatbot in Ophthalmic Knowledge Assessment. JAMA Ophthalmol. 2023;141(6):589-597.
4. Cho SI, Sun S, Mun JH, et al. Dermatologist-level classification of malignant lip diseases using a deep convolutional neural network. Br J Dermatol. 2020;182(6):1388-1394.
5. Haenssle HA, Fink C, Toberer F, et al. Man against machine reloaded: performance of a market-approved convolutional neural network in classifying a broad spectrum of skin lesions in comparison with 96 dermatologists working under less artificial conditions. Ann Oncol. 2020;31(1):137-143.
6. Maron RC, Weichenthal M, Utikal JS, et al. Systematic outperformance of 112 dermatologists in multiclass skin cancer image classification by convolutional neural networks. Eur J Cancer. 2019;119:57-65.
7. Haenssle HA, Fink C, Schneiderbauer R, et al. Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Ann Oncol. 2018;29(8):1836-1842.
8. Martin-Gonzalez M, Azcarraga C, Martin-Gil A, Carpena-Torres C, Jaen P. Efficacy of a Deep Learning Convolutional Neural Network System for Melanoma Diagnosis in a Hospital Population. Int J Environ Res Public Health. 2022;19(7).
9. Van Molle P, Mylle S, Verbelen T, et al. Dermatologist versus artificial intelligence confidence in dermoscopy diagnosis: Complementary information that may affect decision-making. Exp Dermatol. 2023;32(10):1744-1751.
10. Passby L, Jenko N, Wernham A. Performance of ChatGPT on Specialty Certificate Examination in Dermatology multiple-choice questions. Clin Exp Dermatol. 2024;49(7):722-727. doi:10.1093/ced/llad197
11. Joly-Chevrier M, Nguyen AX, Lesko-Krleza M, Lefrançois P. Performance of ChatGPT on a Practice Dermatology Board Certification Examination. J Cutan Med Surg. 2023;27(4):407-409. doi:10.1177/12034754231188437
12. How are exams scored? American Board of Dermatology. https://www.abderm.org/residents-and-fellows/exams/how-are-exams-scored. Published 2023. Accessed 12/5/2023.
13. ABD Certification Pathway Exam Pass Rates. American Board of Dermatology. https://www.abderm.org/residents-and-fellows/abd-certification-pathway/abd-certification-pathway-exam-pass-rates. Published 2023. Accessed 12/5/2023.
14. Lukpat A. JPMorgan Restricts Employees From Using ChatGPT. Wall Street Journal. https://www.wsj.com/articles/jpmorgan-restricts-employees-from-using-chatgpt-2da5dc34. Published 2/22/2023. Accessed 12/5/2023.
15. Wang C, Liu S, Yang H, Guo J, Wu Y, Liu J. Ethical Considerations of Using ChatGPT in Health Care. J Med Internet Res. 2023;25:e48009.