Conference Agenda

Overview and details of the sessions of this conference. Please select a date or location to show only the sessions held on that day or at that location. Please select a single session for a detailed view (with abstracts and downloads, if available).

 
 
Session Overview
Session
B6.2: AI Tools for Survey Research 2
Time: Friday, 23/Feb/2024, 2:00pm - 3:00pm

Session Chair: Florian Keusch, University of Mannheim, Germany
Location: Seminar 4 (Room 1.11)

Rheinische Fachhochschule Köln, Campus Vogelsanger Straße, Vogelsanger Str. 295, 50825 Cologne, Germany

Presentations

Vox Populi, Vox AI? Estimating German Public Opinion Through Language Models

Leah von der Heyde¹, Anna-Carolina Haensch¹, Alexander Wenz²

¹LMU Munich, Germany; ²University of Mannheim, Germany

Relevance & Research Question:
The recent development of large language models (LLMs) has spurred discussions about whether these models might provide a novel method of collecting public opinion data. As LLMs are trained on large amounts of internet data, potentially reflecting attitudes and behaviors prevalent in the population, LLM-generated “synthetic samples” could complement or replace traditional surveys. Several mostly US-based studies have prompted LLMs to mimic survey respondents, finding that the responses closely match the survey data. However, the prevalence of native-language training data, structural differences between the population reflected therein and the general population, and the relationship between a country’s socio-political structure and public opinion might affect the generalizability of such findings. Therefore, we ask: To what extent can LLMs estimate public opinion in Germany?
Methods & Data:
We use the example of vote choice as an outcome of interest in public opinion. To generate a “synthetic sample” of the voting-eligible population in Germany, we create personas matching the individual characteristics of the 2017 German Longitudinal Election Study respondents. Prompting GPT-3.5 with each persona, we ask the LLM to predict each respondent’s vote choice. We examine how the average party vote shares obtained through GPT-3.5 compare to the survey-based estimates, assess whether GPT-3.5 is able to make accurate estimates for different population subgroups, and compare the determinants of voting behavior between the two data sources.
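
The abstract does not include the authors' actual prompt or code; the following is a minimal sketch of how such persona-based prompting could be implemented with the OpenAI Python client (openai>=1.0). The persona attributes, prompt wording, and model settings are illustrative assumptions, not the study's materials.

```python
# Minimal sketch (assumptions, not the authors' materials): build a persona
# from respondent characteristics and ask GPT-3.5 for a vote-choice prediction.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Hypothetical persona loosely modeled on GLES-style respondent attributes.
persona = {
    "age": "54",
    "gender": "female",
    "education": "vocational degree",
    "federal state": "North Rhine-Westphalia",
    "party identification": "none",
}
persona_text = ", ".join(f"{key}: {value}" for key, value in persona.items())

prompt = (
    "You are a voting-eligible person in Germany with the following "
    f"characteristics: {persona_text}. In the 2017 German federal election, "
    "which party did you vote for with your second vote (Zweitstimme)? "
    "Answer with the party name only."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # the authors' actual model configuration is not specified here
)
print(response.choices[0].message.content)
```

Aggregating such predictions over all personas would then yield the “synthetic” party vote shares that the abstract compares against the survey-based estimates.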
Results:
Based on our prompt design and model configuration, we find that GPT-3.5 does not accurately predict citizens’ vote choice, exhibiting a bias towards the Left and Green parties in the aggregate and making better predictions for more “typical” voter subgroups, such as political partisans. Regarding the determinants of its predictions, it tends to overlook the multifaceted factors that sway individual vote choices.
Added Value:
By examining the prediction of voting behavior using LLMs in a new context, our study contributes to the growing body of research on the conditions under which LLMs can be leveraged for studying public opinion. The findings underscore the limitations of applying LLMs to public opinion estimation without accounting for the biases and potential limitations in their training data.



Integrating LLMs into cognitive pretesting procedures: A case study using ChatGPT

Timo Lenzner, Patricia Hadler

GESIS - Leibniz Institute for the Social Sciences, Germany

Relevance & Research Question
Since the launch of ChatGPT in November 2022, large language models (LLMs) have been the talk of the town. LLMs are artificial intelligence systems that are trained to understand and generate human language based on huge data sets. In all areas where language data play a central role, they have great potential to become part of a researcher’s methodological toolbox. One of these areas is the cognitive pretesting of questionnaires. We identify three tasks where LLMs can augment current cognitive pretesting procedures and potentially render them more effective and objective: (1) identifying potential problems of draft survey questions prior to cognitive testing, (2) suggesting cognitive probes to test draft survey questions, and (3) simulating or predicting respondents’ answers to these probes (i.e., generating ‘synthetic samples’). In this case study, we examine how well ChatGPT performs these tasks and to what extent it can improve current pretesting procedures.
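
As an illustration of tasks (1) and (2), a minimal sketch of how a draft survey question might be passed to an LLM for problem detection and probe suggestion is shown below. The prompt wording and the example question are assumptions for illustration and do not reproduce the study's materials.

```python
# Minimal sketch (assumptions, not the study's materials): ask an LLM to flag
# potential problems in a draft question and to propose cognitive probes.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

draft_question = (
    "On how many days in a typical week is your child physically active "
    "for at least 60 minutes?"
)

prompt = (
    "You are an expert in the cognitive pretesting of survey questions.\n"
    f'Draft question: "{draft_question}"\n\n'
    "1. List potential comprehension, recall, or response problems with this "
    "question.\n"
    "2. Suggest cognitive probes (e.g., comprehension or category-selection "
    "probes) for testing it in a cognitive interview."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```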
Methods & Data
We conducted a cognitive interviewing study with 24 respondents, testing four versions of a survey question on children’s activity levels. Half of the respondents were parents of children aged 3 to 15 years; the other half were adolescents aged 11 to 17 years. In parallel with our standard pretesting procedures, we prompted ChatGPT 3.5 to perform the three tasks above and analyzed similarities and differences between the outcomes of the LLM and those of the human pretesters.
Results
With respect to tasks (1) and (2), ChatGPT identified some question problems and probes that were not anticipated by humans, but it also missed important problems and probes identified by human experts. With respect to task (3), the answers generated by ChatGPT were characterized by a relatively low variation between individuals with very different characteristics (i.e., gender, age, education) and the reproduction of gender stereotypes regarding the activities of boys and girls. All in all, they only marginally matched the answers of the actual respondents.
Added Value
To our knowledge, this is one of the first studies examining how LLMs can be incorporated into the toolkit of survey methodologists, particularly in the area of cognitive pretesting.



Using Large Language Models for Evaluating and Improving Survey Questions

Alexander Wenz¹, Anna-Carolina Haensch²

¹University of Mannheim, Germany; ²LMU Munich, Germany

Relevance & Research Question: The recent advances and availability of large language models (LLMs), such as OpenAI’s GPT, have created new opportunities for research in the social and behavioral sciences. Questionnaire development and evaluation is a potential area where researchers can benefit from LLMs: Trained on large amounts of text data, LLMs might serve as an easy-to-implement and inexpensive method for both assessing and improving the design of survey questions, by detecting problems in question wordings and suggesting alternative versions. In this paper, we examine to what extent GPT-4 can be leveraged for questionnaire design and evaluation by addressing the following research questions: (1) How accurately can GPT-4 detect problematic linguistic features in survey questions compared to existing computer-based evaluation methods? (2) To what extent can GPT-4 improve the design of survey questions?

Methods & Data: We prompt GPT-4 with a set of survey questions and ask it to identify features in the question stem or the response options that can potentially cause comprehension problems, such as vague terms or complex syntax. For each survey question, we also ask the LLM to suggest an improved version. To compare the LLM-based results with an existing computer-based survey evaluation method, we use the Question Understanding Aid (QUAID; Graesser et al., 2006), which rates survey questions on different categories of comprehension problems. Based on an expert review among researchers with a PhD in survey methodology, we assess the accuracy of the GPT-4- and QUAID-based evaluation methods in identifying problematic features in the survey questions. We also ask the expert reviewers to evaluate the quality of the new question versions developed by GPT-4 compared to their original versions.
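A minimal sketch of how such a question-evaluation prompt might look with the OpenAI Python client is shown below; the example question, prompt wording, and model settings are illustrative assumptions and do not reproduce the authors' materials. The problem categories follow the five QUAID categories listed under Results.

```python
# Minimal sketch (assumptions, not the authors' materials): ask GPT-4 to flag
# problematic features of a survey question and to suggest an improved version.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

question = (
    "How often have you recently used digital public services, for example "
    "e-government portals, to handle administrative matters?"
)

prompt = (
    f'Survey question: "{question}"\n\n'
    "Evaluate this question for features that may cause comprehension problems: "
    "(1) unfamiliar technical terms, (2) vague or imprecise relative terms, "
    "(3) vague or ambiguous noun phrases, (4) complex syntax, and "
    "(5) working memory overload. Then suggest an improved version of the "
    "question and briefly explain the changes."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```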

Results: We compare both evaluation methods with regard to the number of problematic question features identified, building upon the five categories used in QUAID: (1) unfamiliar technical terms, (2) vague or imprecise relative terms, (3) vague or ambiguous noun phrases, (4) complex syntax, and (5) working memory overload.

Added Value: The results from this paper provide novel evidence on the usefulness of LLMs for facilitating survey data collection.



 