Relevance & Research Question
The advent of large language models (LLMs) has introduced the potential to automate routine tasks across many professions, including academia. This case study explores the feasibility of employing LLMs to reduce the workload of researchers by performing simple scientific review tasks. Specifically, it addresses the question: Can LLMs perform simple reviewer tasks as well as human researchers?
Methods & Data
We utilized the original text of abstracts submitted to the GOR 2024 conference, along with multiple reviewer assessments (i.e., numeric scores) for each abstract. In addition, we used ChatGPT (GPT-4) to generate several AI reviewer scores per abstract. The model was instructed to mimic the GOR conference review criteria applied by the scientific reviewers, focusing on the quality of the research, its relevance to the scientific field, and its alignment with the conference's focus. This setup allowed us to compare multiple AI assessments with multiple peer-review assessments for each abstract.
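The following sketch illustrates how such AI reviewer scores might be obtained, assuming the OpenAI Python client; the prompt wording, the criteria labels, and the 1-to-5 scale are illustrative placeholders rather than the exact instructions used in this study.

    # Sketch: eliciting multiple AI reviewer scores per abstract.
    # Assumes the OpenAI Python client (openai >= 1.0) and an
    # OPENAI_API_KEY in the environment; the prompt and scale below
    # are illustrative, not the study's exact instructions.
    from openai import OpenAI

    client = OpenAI()

    REVIEW_PROMPT = (
        "You are a scientific reviewer for the GOR 2024 conference. "
        "Rate the following abstract on a scale from 1 (poor) to 5 "
        "(excellent), considering the quality of the research, its "
        "relevance to the scientific field, and its alignment with the "
        "conference's focus. Reply with a single integer only.\n\n"
        "Abstract:\n{abstract}"
    )

    def ai_review_scores(abstract: str, n_reviews: int = 3) -> list[int]:
        """Query the model several times to obtain independent AI scores."""
        scores = []
        for _ in range(n_reviews):
            resp = client.chat.completions.create(
                model="gpt-4",
                messages=[{"role": "user",
                           "content": REVIEW_PROMPT.format(abstract=abstract)}],
                temperature=1.0,  # nonzero temperature so repeated calls vary
            )
            # Assumes the model complies and returns a bare integer.
            scores.append(int(resp.choices[0].message.content.strip()))
        return scores

Repeating the call at a nonzero temperature yields several distinct AI scores per abstract, mirroring the multiple human reviews available for comparison.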
Results
Our results indicate that ChatGPT can evaluate conference abstracts quickly and comprehensively. Its ratings were, on average, slightly more positive than those of the academic reviewers, while showing a similar variance.
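A comparison of this kind could be computed along the following lines; the sketch assumes a hypothetical file review_scores.csv with one row per (abstract, reviewer) pair, and the column names ("abstract_id", "source" with values "human" or "ai", "score") are illustrative.

    # Sketch: comparing average rating and variance between AI and
    # human reviewers, assuming the hypothetical data layout above.
    import pandas as pd

    scores = pd.read_csv("review_scores.csv")

    # Overall mean and variance per reviewer type.
    summary = scores.groupby("source")["score"].agg(["mean", "var"])
    print(summary)

    # Per-abstract difference: mean AI score minus mean human score.
    per_abstract = scores.pivot_table(index="abstract_id", columns="source",
                                      values="score", aggfunc="mean")
    print((per_abstract["ai"] - per_abstract["human"]).describe())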
Added Value
This case study contributes to the ongoing discourse on the integration of AI into academic workflows by demonstrating that LLMs such as ChatGPT can potentially reduce the burden on researchers and organizers who handle large sets of scientific contributions.