Bots in web surveys: Predicting robotic language in open narrative answers
Joshua Claassen1, Jan Karem Höhne1, Ruben Bach2
1DZHW; Leibniz University Hannover; 2University of Mannheim
Relevance & Research Question
Web survey data is key for social and political decision-making, including official statistics. Respondents are frequently recruited through online access panels or social media platforms, making it difficult to verify that answers come from humans. As a consequence, bots – programs that autonomously interact with systems – may shift web survey outcomes and social and political decisions. Bot and human answers often differ regarding word choice and lexical structure. This may allow researchers to identify bots by predicting robotic language in open narrative answers. In this study, we therefore investigate the following research question: Can we predict robotic language in open narrative answers?
Methods & Data
We conducted a web survey on equal gender partnerships, including three open narrative questions. We recruited 1,512 respondents through Facebook ads. We also programmed two AI-based bots that each ran through our web survey 100 times: the first bot is linked to the LLM Gemini Pro; the second bot additionally includes a memory feature and adopts personas defined by characteristics such as age and gender. Using a transformer model (BERT), we attempt to predict robotic language in the open narrative answers.
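For illustration, the following is a minimal sketch of how such a BERT-based classifier could be fine-tuned with the Hugging Face transformers and datasets libraries; it is not the authors' actual pipeline, and the model choice, file name, column names, and hyperparameters are assumptions.

```python
# Minimal sketch (assumptions throughout): fine-tuning BERT to classify open
# narrative answers as bot-generated (1) vs. human/"unclear" (0).
import pandas as pd
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

MODEL_NAME = "bert-base-uncased"  # assumed; in practice, match the survey language

# Hypothetical input file: one row per open narrative answer,
# with columns "text" and "label" (1 = bot, 0 = Facebook-recruited respondent).
df = pd.read_csv("open_answers.csv")
splits = Dataset.from_pandas(df).train_test_split(test_size=0.2, seed=42)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    # Truncate long answers; padding is handled dynamically by the collator.
    return tokenizer(batch["text"], truncation=True, max_length=256)

splits = splits.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

training_args = TrainingArguments(
    output_dir="bert-robotic-language",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
```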
Results
Each open narrative answer is labeled based on whether it was generated by our bots (robotic language = “yes”) or provided by respondents recruited through Facebook ads (robotic language = “unclear”). Using this dichotomous label as ground truth, we will train a series of prediction models relying on the BERT language model. We will present various performance metrics to evaluate how accurately we can predict robotic language, and thereby identify bots in our web survey. In addition, we compare these results to students’ predictions of robotic language to study whether our BERT models outperform human judgement.
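As a sketch of what such an evaluation could look like, the snippet below computes common classification metrics with scikit-learn on a held-out set; the label and prediction arrays are placeholders, not study results.

```python
# Minimal sketch: evaluating predictions against the dichotomous ground truth
# (1 = bot-generated, 0 = human/"unclear"). Arrays are placeholders.
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

y_true = [1, 0, 1, 0, 0, 1]              # held-out ground-truth labels
y_pred = [1, 0, 1, 1, 0, 1]              # hard predictions from the classifier
y_prob = [0.9, 0.2, 0.8, 0.6, 0.1, 0.7]  # predicted probability of "bot"

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))  # share of flagged answers that are bots
print("Recall   :", recall_score(y_true, y_pred))     # share of bot answers that are caught
print("F1 score :", f1_score(y_true, y_pred))
print("ROC AUC  :", roc_auc_score(y_true, y_prob))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
```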
Added Value
Our study contributes to the ongoing discussion on bot activities in web surveys. By investigating AI-based bots with different levels of sophistication that are linked to an LLM, our study stands out from previous research that mostly looked at less sophisticated, rule-based bots. In addition, it extends the methodological toolkit of social research when it comes to identifying bots in web surveys.
Addressing Biases of Sensor Data in Social Science Research: A Data Quality Perspective
Vanessa Lux1, Lukas Birkenmaier1, Johannes Breuer1, Jessica Daikeler1, Fiona Draxler2, Judith Gilsbach1,3, Julian Kohne1,4, Frank Mangold1, Indira Sen2,5, Henning Silber6, Katrin Weller1,7, Mareike Wieland1
1GESIS - Leibniz Institute for the Social Sciences, Germany; 2University of Mannheim, Germany; 3University of Konstanz, Germany; 4Ulm University, Germany; 5RWTH Aachen, Germany; 6University of Michigan, USA; 7University of Düsseldorf, Germany
Relevance & Research Question
Sensor data – social sciences – error sources – error framework
The everyday availability of sensors has opened new research avenues for the social sciences, including the combination of sensor data with traditional data types, such as survey data. However, as sensors become more prevalent for collecting digital behavioral information, concerns about the accuracy and reliability of the obtained sensor data have emerged. Error sources and biases of sensor data are highly sensor-specific, which poses a challenge to social science researchers because the necessary technical expertise is often lacking. This paper gives an overview of these concerns and proposes a general error framework for assessing the quality of sensor data in social science research, thereby contributing conceptually and methodologically to the assessment and reporting of sensor data quality.
Methods & Data
Systematic review – thematically focused content analysis – expert group
The dimensions of the sensor error framework were extracted from the results of a thematically focused systematic review (preregistration: https://osf.io/vkxbt) using qualitative content analysis and were evaluated within an expert group.
Results
Data quality – error framework – technical and human error – measurement error – representation bias
The proposed error framework outlines error sources and potential biases for measurement and representation along the full research cycle (planning, data collection, data analysis, archiving and sharing). We addressed the intricate relationship between general data quality dimensions and sensor-specific error sources by incorporating the multilayered character of sensor data arising from technical affordances and device effects. In addition, we identified three principles that structure error sources and biases for specific sensors: the interplay between researcher, study participant, and device; the spatial mobility of the sensor; and the continuous character of the error sources. The adoption of the framework is illustrated with sensor-specific examples.
Added Value
Data quality assessment – reporting standards – replicability – interpretability
The proposed general error framework for sensor data has the potential to enhance the assessment and reporting of sensor data quality in the social sciences. It provides guidance to researchers and facilitates better replicability and interpretability of sensor data.
Improving the measurement of solidarity in the European context: results from a web probing in four countries
Vera Lomazzi, Margherita Pellegrino
University of Bergamo
Relevance & Research Question
This research addresses how cultural biases affect the cross-national comparability of attitudes toward solidarity. Recent studies highlight concerns about the comparability of solidarity measurements between countries. By implementing international web probing, we aim to uncover these biases and improve the clarity of questions in future rounds of the EVS questionnaire to ensure reliable cross-country comparisons.
Cross-country comparability, solidarity, European Values Study.
Methods & Data
We conducted web probing in Italy, Portugal, Hungary, and Czechia, utilizing nine solidarity-related items from the EVS 2017 questionnaire. The method involved inserting probes after closed-ended questions to explore respondents’ interpretations. A sample of 600 participants was surveyed, and the responses were analyzed qualitatively to identify variations in how terms like “Europeans,” “immigrants,” and “concern” are understood and to explore why respondents chose a certain level of concern toward Europeans. The responses were translated and categorized using thematic coding across languages.
Web probing, open-ended questions, codification.
Results
For each response, we identified multiple categories, demonstrating the diverse interpretations of the same word within and especially across countries, despite the accurate translation of the EVS questionnaire. For instance, in Italy and Hungary, the word "concern" was primarily interpreted as "cherish" or "care," while in Portugal, it was more commonly identified with being "worried." Similarly, the term "immigrants" was understood in multiple ways. In Hungary and Portugal, it was often perceived as referring to migrants in general, whereas in the Czech Republic and Italy, respondents specified particular categories. For instance, 24 respondents from the Czech Republic described migrants specifically as Ukrainians, highlighting how context-dependent and time-bound these interpretations can be.
Cultural variations, language, context, interpretation.
Added Value
This study demonstrates the value of web probing as a tool for identifying and addressing cultural biases in international surveys. The insights gained provide a basis for refining survey instruments, ensuring that data on solidarity reflects a more accurate and culturally sensitive understanding across European countries.
Cultural biases, comparative implications, data quality improvement.