Comparative assessment of AI models in addressing questions on priapism

An evaluation of response quality and clinical utility

Authors

  • Ahmet Halis, Yedikule Chest Diseases and Thoracic Surgery Training and Research Hospital
  • Hacibey Ibrahim, Bagcilar Training and Research Hospital, Istanbul, Turkey

DOI:

https://doi.org/10.5489/cuaj.9302

Keywords:

Artificial intelligence, priapism, clinical application, ChatGPT

Abstract

INTRODUCTION: This study aimed to evaluate how well three artificial intelligence (AI) models (ChatGPT, Gemini, and Copilot) answer questions about priapism. The accuracy, comprehensiveness, and clinical applicability of the AI-generated responses were systematically analyzed.

METHODS: Frequently asked questions (FAQs) about priapism were collected from medical guidelines, the literature, and online health platforms. Each AI model answered the questions, and two experts independently assessed the responses for accuracy, fluency, and clinical relevance. Responses were rated with the Global Quality Score (GQS), a five-point scale. Scores were compared using one-way ANOVA, with statistical significance set at p<0.05.
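The methods name a one-way ANOVA across the three models' GQS ratings. The following is a minimal sketch of how the F statistic for such a comparison is computed. It assumes each model's ratings form one group. The abstract does not say how the two raters' scores or the thematic categories were combined. The function name and inputs are hypothetical and illustrative, not the authors' analysis code, and no study data are reproduced.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of rating lists, one per AI model (hypothetical input).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)

    # Between-group sum of squares: how far each model's mean lies from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of ratings around their own model's mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    if ss_within == 0:
        raise ValueError("F is undefined when every group has zero variance")

    # F = mean square between / mean square within
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

To compare against the 0.05 threshold, the p-value is the upper tail of the F distribution with (df_between, df_within) degrees of freedom, for example `scipy.stats.f.sf(f_stat, df_between, df_within)`. If SciPy is available, `scipy.stats.f_oneway(*groups)` runs the whole test in one call.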

RESULTS: ChatGPT and Gemini performed comparably across all thematic categories, with mean scores ranging from 4.5 to 4.9, whereas Copilot scored significantly lower (range 3.2–4.2; p<0.001). Both ChatGPT and Gemini provided accurate, clinically relevant information, while Copilot's responses frequently lacked guideline-based recommendations.

CONCLUSIONS: ChatGPT and Gemini were statistically comparable in generating reliable, clinically useful responses, making them valuable tools for medical education and patient counseling. Copilot, however, exhibited lower accuracy and applicability. These findings highlight the need for continuous refinement of AI models to enhance their role in clinical decision-making while ensuring human expertise remains central to patient care.

Published

2025-11-25

How to Cite

Halis, A., & Ibrahim, H. (2025). Comparative assessment of AI models in addressing questions on priapism: An evaluation of response quality and clinical utility. Canadian Urological Association Journal, 20(3), E89–92. https://doi.org/10.5489/cuaj.9302

Section

Original Research