Relevance & Research Question
The advent of large language models (LLMs) has introduced the potential to automate routine tasks across many professions, including academia. This case study explores the feasibility of employing LLMs to reduce the workload of researchers by performing simple scientific review tasks. Specifically, it addresses the question: Can LLMs perform simple reviewer tasks as well as human researchers?
Methods & Data
We utilized the original text of abstracts submitted to the GOR 2024 conference, along with multiple reviewer assessments (i.e., numeric scores) for each abstract. In addition, we used ChatGPT (GPT-4) to generate several AI reviewer scores per abstract. The model was instructed to mimic the GOR conference review criteria applied by the scientific reviewers, focusing on the quality of the research, its relevance to the scientific field, and its alignment with the conference's focus. This setup allowed us to compare multiple AI assessments with multiple peer-review assessments for each abstract.
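The following sketch illustrates how such AI reviewer scores might be obtained, assuming the OpenAI Python client; the prompt wording, the criteria labels, and the 1-to-5 scale are illustrative placeholders rather than the exact instructions used in this study.

    # Sketch: eliciting multiple AI reviewer scores per abstract.
    # Assumes the OpenAI Python client (openai >= 1.0) and an
    # OPENAI_API_KEY in the environment; the prompt and scale below
    # are illustrative, not the study's exact instructions.
    from openai import OpenAI

    client = OpenAI()

    REVIEW_PROMPT = (
        "You are a scientific reviewer for the GOR 2024 conference. "
        "Rate the following abstract on a scale from 1 (poor) to 5 "
        "(excellent), considering the quality of the research, its "
        "relevance to the scientific field, and its alignment with the "
        "conference's focus. Reply with a single integer only.\n\n"
        "Abstract:\n{abstract}"
    )

    def ai_review_scores(abstract: str, n_reviews: int = 3) -> list[int]:
        """Query the model several times to obtain independent AI scores."""
        scores = []
        for _ in range(n_reviews):
            resp = client.chat.completions.create(
                model="gpt-4",
                messages=[{"role": "user",
                           "content": REVIEW_PROMPT.format(abstract=abstract)}],
                temperature=1.0,  # nonzero temperature so repeated calls vary
            )
            # Assumes the model complies and returns a bare integer.
            scores.append(int(resp.choices[0].message.content.strip()))
        return scores

Repeating the call at a nonzero temperature yields several distinct AI scores per abstract, mirroring the multiple human reviews available for comparison.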
Results
Our results indicate that ChatGPT can evaluate conference abstracts quickly and comprehensively. Its ratings were, on average, slightly more positive than those of the academic reviewers, while showing a similar variance.
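A comparison of this kind could be computed along the following lines; the sketch assumes a hypothetical file review_scores.csv with one row per (abstract, reviewer) pair, and the column names ("abstract_id", "source" with values "human" or "ai", "score") are illustrative.

    # Sketch: comparing average rating and variance between AI and
    # human reviewers, assuming the hypothetical data layout above.
    import pandas as pd

    scores = pd.read_csv("review_scores.csv")

    # Overall mean and variance per reviewer type.
    summary = scores.groupby("source")["score"].agg(["mean", "var"])
    print(summary)

    # Per-abstract difference: mean AI score minus mean human score.
    per_abstract = scores.pivot_table(index="abstract_id", columns="source",
                                      values="score", aggfunc="mean")
    print((per_abstract["ai"] - per_abstract["human"]).describe())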
Added Value
This case study contributes to the ongoing discourse on the integration of AI into academic workflows by demonstrating that LLMs such as ChatGPT can potentially reduce the burden on researchers and organizers who handle large sets of scientific contributions.