Assessing the methodologic quality of systematic reviews using generative large language models
DOI:
https://doi.org/10.5489/cuaj.9243
Keywords:
LLM, artificial intelligence, ChatGPT, Methodology, Systematic review, AMSTAR 2
Abstract
INTRODUCTION: We aimed to evaluate whether generative large language models (LLMs) can accurately assess the methodologic quality of systematic reviews (SRs).
METHODS: A total of 114 SRs from five leading urology journals were included in the study. Human reviewers graded each of the SRs in duplicate, with differences adjudicated by a third expert. We created a customized generative artificial intelligence (generative pretrained transformer [GPT]), “Urology AMSTAR 2 Quality Assessor,” and graded the 114 SRs in three iterations using a zero-shot method. We then performed an enhanced trial focusing on critical criteria by giving GPT detailed, step-by-step instructions for each of the SRs using a chain-of-thought method. Accuracy, sensitivity, specificity, and F1 score for each GPT trial were calculated against human results. Internal validity among the three trials was computed.
RESULTS: GPT had an overall congruence of 75%, with 77% in critical criteria and 73% in non-critical criteria, when compared to human results. The average F1 score was 0.66. Internal validity was high, at 85% among the three iterations. GPT accurately assigned 89% of studies to the correct overall category. When given specific, step-by-step instructions, congruence of critical criteria improved to 91%, and overall quality assessment accuracy to 93%.
CONCLUSIONS: GPT showed promising ability to efficiently and accurately assess the quality of SRs in urology.
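The agreement metrics named in the methods (accuracy, sensitivity, specificity, and F1 score against human results) can be illustrated with a minimal sketch. It assumes each AMSTAR 2 item rating is reduced to a binary label (criterion met or not met) and that the human rating is the reference standard; the abstract does not state how ratings were coded, so the function name, the binary coding, and the example data below are hypothetical and not the authors' implementation.

```python
from typing import Dict, Sequence


def binary_metrics(human: Sequence[bool], model: Sequence[bool]) -> Dict[str, float]:
    """Compare model ratings against human reference ratings.

    Assumption: True means "criterion met" and the human rating is the
    reference standard. This coding is illustrative only.
    """
    if len(human) != len(model):
        raise ValueError("human and model ratings must have the same length")

    # Count the four cells of the 2x2 confusion table.
    tp = sum(1 for h, m in zip(human, model) if h and m)
    tn = sum(1 for h, m in zip(human, model) if not h and not m)
    fp = sum(1 for h, m in zip(human, model) if not h and m)
    fn = sum(1 for h, m in zip(human, model) if h and not m)

    def ratio(numerator: int, denominator: int) -> float:
        # Return NaN rather than raising when a metric is undefined.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical ratings for eight items (not data from the study).
human_ratings = [True, True, True, False, False, False, True, False]
model_ratings = [True, True, False, False, False, True, True, False]
metrics = binary_metrics(human_ratings, model_ratings)
```

Accuracy here corresponds to the proportion of items on which the model and the human reference agree, which is one plausible reading of the "congruence" figures reported in the results.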
License
You, the Author(s), assign your copyright in and to the Article to the Canadian Urological Association. This means that you may not, without the prior written permission of the CUA:
- Post the Article on any Web site
- Translate or authorize a translation of the Article
- Copy or otherwise reproduce the Article, in any format, beyond what is permitted under Canadian copyright law, or authorize others to do so
- Copy or otherwise reproduce portions of the Article, including tables and figures, beyond what is permitted under Canadian copyright law, or authorize others to do so.
The CUA encourages use for non-commercial educational purposes and will not unreasonably deny any such permission request.
You retain your moral rights in and to the Article. This means that the CUA may not assert its copyright in such a way that would negatively reflect on your reputation or your right to be associated with the Article.
The CUA also requires you to warrant the following:
- That you are the Author(s) and sole owner(s), that the Article is original and unpublished and that you have not previously assigned copyright or granted a licence to any other third party;
- That all individuals who have made a substantive contribution to the article are acknowledged;
- That the Article does not infringe any proprietary right of any third party and that you have received the permissions necessary to include the work of others in the Article; and
- That the Article does not libel or violate the privacy rights of any third party.







