Evaluation of ChatGPT’s performance on answering pediatric urology questions based on association guidelines

Authors

  • Wyatt MacNevin Dalhousie Medical School
  • Nicholas Dawe Dalhousie University, Faculty of Medicine
  • Laura Harkness Dalhousie University, Faculty of Medicine
  • Budoor Salman Dalhousie University, Department of Urology, Halifax, Nova Scotia, Canada
  • Daniel T. Keefe Dalhousie University, Department of Urology and Pediatric Urology, Halifax, Nova Scotia, Canada

DOI:

https://doi.org/10.5489/cuaj.9238

Keywords:

ChatGPT, Artificial Intelligence, Pediatric Urology, Medical Information, Patient Knowledge

Abstract

INTRODUCTION: ChatGPT has been shown to provide accurate and complete responses to clinically focused questions; however, its ability to answer common pediatric urology questions has not been explored. Furthermore, the concordance of ChatGPT's answers with association guideline recommendations has yet to be analyzed.

METHODS: A list of common pediatric urology questions of varying difficulty was developed based on publicly available guidelines and resources from the Canadian Urological Association (CUA), the American Urological Association (AUA), and the European Association of Urology (EAU). Questions were administered individually using three separate functions, and responses were evaluated for comprehensiveness and accuracy using a Likert scale. Descriptive statistics and analysis of variance (ANOVA) were used for statistical analysis.
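
For readers unfamiliar with the analysis described above, the following minimal sketch (Python) shows how per-domain descriptive statistics and a one-way ANOVA of Likert scores could be computed. The scores, domain labels, and number of ratings are hypothetical placeholders, not the study's actual data or analysis code.

    # Minimal sketch, assuming hypothetical Likert scores (1-3) from
    # multiple ratings per topic domain; illustrative only.
    import numpy as np
    from scipy import stats

    scores = {
        "phimosis": [3, 2, 2, 3, 2, 3],
        "VUR": [2, 2, 3, 2, 2, 2],
        "acute_scrotal_pathology": [2, 1, 2, 2, 2, 2],
        "cryptorchidism": [2, 2, 1, 2, 2, 2],
    }

    # Descriptive statistics: mean and sample standard deviation per domain
    for domain, vals in scores.items():
        print(f"{domain}: {np.mean(vals):.2f}/3.00 ± {np.std(vals, ddof=1):.2f}")

    # One-way ANOVA testing for a difference in mean scores across domains
    f_stat, p_value = stats.f_oneway(*scores.values())
    print(f"ANOVA: F={f_stat:.2f}, p={p_value:.3f}")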

RESULTS: ChatGPT performed best in the domains of phimosis (mean ± standard deviation 2.32/3.00±0.57) and vesicoureteral reflux (VUR) (2.11/3.00±0.63), and worst in acute scrotal pathology (1.90/3.00±0.58) and cryptorchidism (1.92/3.00±0.56) (p=0.031). "Easy" questions (2.31/3.00±0.09) had greater comprehensiveness scores than "medium" (1.92/3.00±0.07, p=0.003) and "difficult" questions (1.86/3.00±0.101, p=0.003). Definition-based questions had greater comprehensiveness scores across all guidelines. ChatGPT was more accurate and concordant with EAU-based information (2.10±0.41) than with AUA-based information (1.95±0.41, p=0.04).

CONCLUSIONS: ChatGPT answered questions with high levels of appropriateness and comprehensiveness. It performed best in the domains of phimosis and VUR, and worst in acute scrotal pathology. While ChatGPT performed well across all question domains, it performed best when referenced against EAU and CUA guidelines compared with AUA guidelines.

Published

2025-07-28

How to Cite

MacNevin, W., Dawe, N., Harkness, L., Salman, B., & Keefe, D. T. (2025). Evaluation of ChatGPT’s performance on answering pediatric urology questions based on association guidelines. Canadian Urological Association Journal, 19(11), E362–7. https://doi.org/10.5489/cuaj.9238

Issue

Vol. 19 No. 11 (2025)

Section

Original Research