GOR 25
General Online Research Conference 2025
Freie Universität Berlin - Henry Ford Building
31 March - 2 April 2025
Conference Agenda
Overview and details of the sessions of this conference.
Session Overview
Date: Monday, 31/Mar/2025
9:00am - 10:00am | Begin Check-in Location: Foyer EG |
10:00am - 1:00pm | Workshop 1 Location: Konferenzraum II |
Web tracking - augmenting web surveys with app, URL, and search term data
DZHW, Leibniz University Hannover, Germany
Workshop slot: 2.5 hours
Target groups: Researchers and practitioners with a general methodological interest in web-based surveys and digital trace data
Is the workshop geared at an exclusively German or an international audience? International
Workshop language: English
Description of the content of the workshop: Web surveys frequently fall short of accurately measuring digital behavior because they are prone to recall error (i.e., biased recalling and reporting of past behavior) and social desirability bias (i.e., misreporting of behavior to comply with social norms and values). New advances in the collection of digital trace (or web tracking) data make it possible to directly measure digital behavior in the form of browser logs (e.g., visited URLs and search terms) and apps (e.g., duration and frequency of their use). Building on these advances, we will introduce participants to web surveys augmented with web tracking data. In this course, we initially give a thorough overview of the manifold new measurement opportunities provided by web tracking. In addition, participants obtain comprehensive insights into the collection, processing, analysis, and error sources of web tracking data as well as its application to substantive research. Importantly, the course includes applied web tracking data exercises in which participants learn how to ...
Goals of the workshop: At the end of the workshop, participants will be able to 1) independently conceptualize the collection of web tracking data, 2) decide on best practices when it comes to data handling and analysis, and 3) critically reflect upon the opportunities and challenges of web tracking data and its suitability for empirical studies in the context of social and behavioral science.
Necessary prior knowledge of participants: Basic knowledge of web-based surveys, including structured and unstructured datasets, is beneficial but not a prerequisite.
Literature that participants need to read prior to participation: none
Recommended additional literature: Bosch, O. J., & Revilla, M. (2022). When survey science met web tracking: Presenting an error framework for metered data. Journal of the Royal Statistical Society (Series A), 185, 408-436. https://doi.org/10.1111/rssa.12956
Information about the instructors: Jan Karem Höhne (hoehne@dzhw.eu) is a junior professor at Leibniz University Hannover in association with the German Center for Higher Education Research and Science Studies (DZHW). He is head of the CS3 Lab for Computational Survey and Social Science. His research focuses on new data forms and types for measuring political and social attitudes. Joshua Claassen (claassen@dzhw.eu) is a PhD student and research associate at Leibniz University Hannover in association with the German Center for Higher Education Research and Science Studies (DZHW). His research focuses on computational survey and social science with an emphasis on digital trace data.
Maximum number of participants: 25
Will participants need to bring their own devices in order to be able to access the Internet? Will they need to bring anything else to the workshop? Yes, they should bring their devices (laptop and smartphone).
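The processing step mentioned in the description is easiest to picture on a concrete example. The following minimal sketch (not workshop material; respondent IDs, column names, URLs, and durations are made up) shows how raw URL-level tracking records of the kind described above, visited URLs with dwell times, might be aggregated into per-respondent domain summaries in Python:

```python
# Minimal sketch: aggregating raw web tracking (metered) records into
# per-respondent domain visit counts and dwell times. All values are invented.
from urllib.parse import urlparse
import pandas as pd

# Example records as they might arrive from a tracking browser plug-in
log = pd.DataFrame({
    "respondent_id": [1, 1, 2],
    "url": [
        "https://www.example-news.de/politik/article123",
        "https://shop.example.com/cart",
        "https://www.example-news.de/sport",
    ],
    "duration_sec": [45, 120, 30],
})

# Reduce each URL to its domain (rough heuristic without a public-suffix list)
log["domain"] = log["url"].map(lambda u: urlparse(u).netloc.removeprefix("www."))

# Per-respondent, per-domain visit counts and total dwell time
summary = (
    log.groupby(["respondent_id", "domain"])
       .agg(visits=("url", "size"), total_sec=("duration_sec", "sum"))
       .reset_index()
)
print(summary)
```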
10:00am - 1:00pm | Workshop 2 Location: Konferenzraum III |
Structured information extraction with LLMs
Q Agentur für Forschung GmbH, Germany
Duration of the workshop: 2.5 hours
Target groups: Analysts and researchers working with text data, e.g. transcripts, news articles, social media posts or reviews
Is the workshop geared at an exclusively German or an international audience? International
Workshop language: English
Description of the content of the workshop: This workshop is an introduction to the application of Large Language Models (LLMs) for structured information extraction in market research and social sciences. Participants will implement solutions to natural language processing tasks such as text classification, entity recognition, and sentiment analysis. The session includes hands-on exercises in Python using the library "instructor". Participants will learn about strategies for prompting, few-shot examples and fine-tuning. The approaches taught are compatible with a wide range of open source and commercial models. Discussion sections of the workshop will cover the methodological and technical possibilities and limitations of LLMs for information extraction.
Goals of the workshop:
Necessary prior knowledge of participants: Basic knowledge of Python. R users can use the primer in the recommended literature to get up to speed quickly. The code examples in the workshop can be followed with minimal coding knowledge; extending them requires a bit more.
Literature that participants need to read prior to participation: A starter guide which will be sent before the workshop. It will contain instructions for using Google Colab and installing the required Python packages.
Recommended additional literature: Primer on Python for R users: https://rstudio.github.io/reticulate/articles/python_primer.html
Information about the instructor: Paul Simmering is a data scientist at Q Agentur für Forschung where he works on social media and review analysis. He has presented research on sentiment analysis at GOR 23 and GOR 24.
Maximum number of participants: 20
Will participants need to bring their own devices in order to be able to access the Internet? Will they need to bring anything else to the workshop? Participants will need to bring a laptop. An OpenAI API key will be provided for use during the workshop. The recommended development environment for beginners is Google Colab, which is free and runs in the browser. A starter guide will be provided. Advanced users are welcome to use an IDE of their choice and are also welcome to use a different LLM platform than OpenAI that is compatible with instructor, such as Anthropic, Cohere, Gemini and local models using Ollama.
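As a flavor of the kind of exercise the workshop describes, here is a minimal, hedged sketch of structured extraction with the "instructor" library and Pydantic. The field names, example text, and model choice are illustrative assumptions, not the workshop's actual material; it assumes an OpenAI API key is available in the environment, as noted above.

```python
# Minimal sketch of structured information extraction with "instructor".
# Field names, example text, and the model choice are illustrative assumptions.
import instructor
from openai import OpenAI
from pydantic import BaseModel
from typing import Literal

class ReviewAnnotation(BaseModel):
    product: str                                           # entity mentioned in the text
    sentiment: Literal["positive", "neutral", "negative"]  # classification label

# instructor patches the OpenAI client so responses are parsed into the Pydantic model
client = instructor.from_openai(OpenAI())

annotation = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=ReviewAnnotation,
    messages=[{"role": "user",
               "content": "Extract product and sentiment: 'The new coffee maker is fantastic.'"}],
)
print(annotation.model_dump())
# e.g. {'product': 'coffee maker', 'sentiment': 'positive'}
```

The same pattern works with other providers supported by instructor (Anthropic, Cohere, Gemini, or local models via Ollama), which is presumably why the workshop highlights that compatibility.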
1:00pm - 1:30pm | Break |
1:30pm - 4:30pm | Workshop 3 Location: Konferenzraum III |
Working with research partners - strategies for effectively collaborating on a research project with external agencies
1Callegaro Research, United Kingdom; 2Untold Research, United States
Duration of the workshop: 2.5 h
Target groups: Researchers who have worked or are looking into working with third-party companies for market/survey research projects and want to improve the collaboration and the quality of the project outcome
Is the workshop geared at an exclusively German or an international audience? International audience
Workshop language: English
Description of the content of the workshop: Researchers in both commercial and academic sectors are increasingly relying on research partners for the entire process of market/survey research projects. The goal of this course is to describe good practices in working with external agencies and managing a survey research project with a third-party vendor. The first part of the course will cover a broad spectrum of research management topics, including: determining the project scope and objectives, proposal preparation including requests for proposals (RFP), statements of work (SOW) and how to evaluate and select a vendor, how to best communicate with vendors, setting up timelines and schedules, types of contracts and subcontracting, budgeting, monitoring and quality controls, data security, research ethics, and deliverables and reporting. The second part of the course will discuss strategies of effective collaboration and how to best work together on a project, such as assigning and managing tasks, managing time, and managing documents/files and documenting the project. We will also discuss strategies for communicating with the research agency. We will close the workshop by looking at trends in the vendor industry and the use of AI from the vendor side.
Goals of the workshop: Dealing with the different stages of negotiations and documents needed when working with third parties. Streamlining the partnership with third parties in order to obtain high-quality data.
Necessary prior knowledge of participants: No previous knowledge necessary
Literature that participants need to read prior to participation: There is basically no literature we are aware of at this time; we might uncover something during the preparation of the final version of the slides.
Information about the instructor: Mario Callegaro has 15 years of experience at Google, having worked with large international research agencies on numerous million-dollar projects. Before that, he worked on the agency side at Knowledge Panel (now Ipsos Knowledge Panel).
Maximum number of participants: 35
Will participants need to bring their own devices in order to be able to access the Internet? Will they need to bring anything else to the workshop? No need to bring a laptop.
1:30pm - 4:30pm | Workshop 4 Location: Konferenzraum II |
Collecting and analyzing smartphone (survey) data using the GESIS AppKit
GESIS - Leibniz Institute for the Social Sciences, Germany (all)
Duration of the workshop: 2.5 hours
Target groups: Social Scientists including, but not limited to, Media & Communication Scientists, Psychologists, Quantitative Sociologists, and Political Scientists
Is the workshop geared at an exclusively German or an international audience? International audience
Workshop language: English
Description of the content of the workshop: Smartphones are a ubiquitous technology that most people have in their pockets almost
Goals of the workshop: 1. Introduction to intensive longitudinal data collections, how to design these studies,
Necessary prior knowledge of participants: Basic knowledge of R and RStudio is recommended (changing working directories, importing data, using ggplot2).
Literature that participants need to read prior to participation: none
Recommended additional literature: https://gesiscss.github.io/AppKit_Documentation/ [not required]
Information about the instructor: https://www.gesis.org/institut/ueber-uns/mitarbeitendenverzeichnis/person/Julian.Kohne
Maximum number of participants: 40
Will participants need to bring their own devices in order to be able to access the Internet? Will they need to bring anything else to the workshop? 1) Laptop with R and RStudio installed
1:30pm - 4:30pm | Workshop 5 ONLINE |
Making inference from nonprobability online surveys
Utrecht University, The Netherlands
Duration of the Workshop:
Target Groups:
Is the workshop geared at an exclusively German or an international audience?
Workshop Language:
Description of the content of the workshop: Academic and market researchers often rely on opt-in (volunteer) online panels to collect survey data. These panels allow for the rapid collection of responses, resulting in large datasets at a relatively low cost. While this approach is convenient and cost-effective, it has a major limitation: making inferences about the target population is not straightforward. The issue is that the sample is non-probabilistic; the panel consists of volunteers who self-select, introducing selection bias. Addressing this bias requires statistical adjustments under strong assumptions.
Goals of the workshop: Learn how to:
Necessary prior knowledge of participants:
Literature that participants need to read for preparation: Participants can watch this video by the Pew Research Center about nonprobability surveys: https://www.pewresearch.org/short-reads/2018/08/06/what-are-nonprobability-surveys/
Recommended additional literature:
Information about the instructors: Dr. Camilla Salvatore works as an assistant professor at the department of Methodology and Statistics at Utrecht University, where she specializes in survey research. Her interests include inference with nonprobability samples, survey weighting, nonresponse, the use of digital trace data and their integration with surveys. https://www.uu.nl/medewerkers/CSalvatore
Will participants need to bring their own devices in order to be able to access the Internet? Online workshop
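The "statistical adjustments under strong assumptions" mentioned in the description typically include weighting the opt-in sample toward known population figures. A minimal post-stratification sketch, with made-up data and category shares and assuming that selection is ignorable given the weighting variable, looks like this:

```python
# Minimal sketch (not the workshop's material): post-stratification of an opt-in
# sample to assumed population margins. Data and shares are invented.
import pandas as pd

sample = pd.DataFrame({
    "age_group": ["18-39", "18-39", "40-64", "65+", "40-64"],
    "y":         [1, 0, 1, 1, 0],          # outcome of interest
})

# Assumed population shares per cell (e.g., taken from census figures)
pop_share = {"18-39": 0.35, "40-64": 0.40, "65+": 0.25}

# Weight = population share / sample share within each cell
sample_share = sample["age_group"].value_counts(normalize=True)
sample["weight"] = sample["age_group"].map(lambda g: pop_share[g] / sample_share[g])

# Unweighted vs. post-stratified estimate of the mean of y
print("unweighted:", sample["y"].mean())
print("weighted:  ", (sample["y"] * sample["weight"]).sum() / sample["weight"].sum())
```

Real applications use several auxiliary variables jointly (e.g., raking or propensity-score weighting against a probability reference sample), which is presumably the kind of material the workshop covers in depth.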
4:30pm - 5:30pm | Break |
5:30pm - 7:00pm | DGOF Members General Meeting Location: Jung&Schönn |
7:00pm - 7:30pm | Break |
7:30pm - 8:30pm | Early Career Speed Networking Event Location: Jung&Schönn Mix - Mingle - Match: Find Your Scientific Soulmate at GOR’s Early Career Science Speed Networking
8:30pm - 11:59pm | GOR 25 Get Together Location: Jung&Schönn |
Date: Tuesday, 01/Apr/2025
8:00am - 9:00am | Begin Check-in Location: Foyer EG |
9:00am - 10:15am | 1: GOR 25 Opening & Keynote 1 Location: Max-Kade-Auditorium |
High Quality Training Data for AI Models: Lessons from 20 years in Surveys
University of Maryland
Whether future AI models are fair, trustworthy, and aligned with the public’s interests rests in part on our ability to collect accurate data about what we want the models to do. However, collecting high-quality data is difficult, and few AI/ML researchers are trained in data collection methods. The talk bridges artificial intelligence and survey methodology, demonstrating how techniques from survey research can improve training data quality and model performance. The talk concludes with practical recommendations for improving training data collection and ideas for joint research.
10:15am - 10:45am | Break |
10:45am - 11:45am | 2.1: Innovations in Market Research Location: Hörsaal A Session Chair: Georg Wittenburg, Inspirient, Germany |
Show me how you touch it – The effects of vicarious touch in online marketing videos
Rheinische Hochschule Köln gGmbH, Germany
Relevance & Research Question: Overcoming key purchase barriers and improving consumer experience is an ongoing trend in online shopping. Therefore, this research analyses the influence of vicarious touch in the digital context. It is assumed that the depiction of a hand touching the product can compensate for online shoppers' need for haptic information. Initial studies have already shown that this strategy leads to mental simulation of a product interaction and improved product evaluation (e.g. Luangrath et al., 2022; Liu et al., 2019), although no clear distinction has yet been made between product images and videos. Referring to the underlying theoretical framework of MNS activation (Adler & Gillmeister, 2019), it is assumed that the actual perception of movement is a prerequisite for such an effect, and therefore vicarious touch must always be presented in video format.
Methods & Data: The experimental study uses quantitative data (N = 1000) from an online access panel, nationally representative in terms of age and gender. In a social commerce context, respondents reported their purchase intentions for four different products selected in a pre-study that differed in their haptic importance. The two between-subjects factors touch (vicarious touch vs. no touch) and media format (image vs. video) were crossed, resulting in four different product presentation types.
Results: Indeed, only for product videos did the use of vicarious touch significantly increase purchase intent compared to static images (p < .001, d = .3) and rotating product videos (p < .05, d = .2). There was no effect when vicarious touch was presented in static images. Interestingly, the video format itself was a significant driver of purchase intent, highlighting the overall benefit of dynamic product presentation (p < .001, ηp² = .012). Furthermore, results did not differ across products, suggesting that online shopping generally creates a haptic information gap.
Added Value: Using realistic e-commerce scenarios, this study shows that vicarious touch can significantly increase purchase intent, but only in product videos, not images. The findings highlight the need for dynamic, motion-rich content to compensate for the lack of haptic feedback online, making products more tangible and driving consumer engagement.

Data-Driven Decision-Making in Real Estate with TenantCM: Unlocking Value from Tenant Satisfaction Surveys
YouGov Schweiz AG
Relevance & Research Question: Tenant satisfaction is a key performance indicator in the real estate industry, driving tenant retention, revenue stability, reputation, and operational efficiency. As tenant expectations evolve, it is crucial to efficiently capture feedback, address concerns, and benchmark performance against industry standards. Beyond benchmarking, the true value of tenant satisfaction surveys lies in generating actionable insights for decision-making at both the property and individual tenant levels. This raises the question: How can tenant satisfaction surveys be enhanced to provide detailed benchmarking opportunities and granular analysis for informed decision-making?
Methods & Data: In collaboration with over 20 property owners across residential and commercial segments, we conducted tailored online tenant satisfaction surveys in Switzerland from 2019 to 2024, leading to nearly 75’000 completed surveys.
Invitations were sent via email, postal mail, or tenant apps, with responses processed into a unified dashboard. This platform enables clients to benchmark performance against aggregated market results and allows property managers to compare performance across owners. Key challenges, such as identifying stakeholders and improving response rates, were addressed through targeted approaches. Actionable insights were ensured using multi-level evaluation, NLP for analysing open text comments, and structural equation modelling (SEM) to identify key drivers of satisfaction. Annual workshops with clients further refine survey design, dashboards, and workflows. Results Our methodology enabled dual-layer evaluations, offering detailed assessments at the tenant level and aggregated insights at property and portfolio levels. Property owners tracked satisfaction trends over time and benchmarked against market standards, while property managers addressed specific tenant concerns, expressed via numeric ratings and open comments, and compared performance across clients. Results are presented in an interactive dashboard with a measure management tool for tracking issues and enabling targeted improvements. Added Value By advancing traditional survey methodologies, our approach places tenants at the centre of decision-making. The unified dashboard delivers tailored, comparable insights across residential and commercial segments. NLP and SEM efficiently identify key satisfaction drivers, enabling timely interventions. Annual workshops ensure tools evolve with client needs, fostering tenant loyalty and supporting sustainable portfolio management. |
10:45am - 11:45am | 2.2: Web Tracking Location: Hörsaal B Session Chair: Dorian Tsolak, Bielefeld University, Germany |
Understanding Participation in Web Tracking Studies: A Comparison of Probabilistic and Nonprobabilistic Sampling Strategies
GESIS, Germany

Socioeconomic Status and Patterns of Online Behavior in Germany
GESIS Leibniz Institute for the Social Sciences, Germany
Relevance & Research Question: Do individuals from different socioeconomic status (SES) groups use the internet differently? The digital divide extends beyond disparities in adequate internet access. Being online offers potential benefits, but people may differ in their knowledge, opportunities, and capabilities to take full advantage of these benefits.
Methods & Data: Using a linked dataset from the German General Social Survey (a large, probability-based survey) and respondents' web surfing behavior (GESIS Web Tracking), this study explores whether online behavior varies by SES background. Respondents participated in a web tracking study, which collected data on every individual website visit over two months following the installation of a browser plug-in. This pilot study includes more than 4 million website visits from 500 participants, with the linked data providing around 340,000 website visits from 106 respondents. The websites visited were classified into content-based categories, such as “education,” “job search,” “healthy living,” “personal finance,” alongside categories like “shopping,” “sports,” and “video gaming,” using two different third-party service providers. Through regression analysis, I examine whether SES is associated with particular types of website visits and whether this relationship is moderated by first- and second-level digital divide factors, such as access to fast internet connections and digital literacy.
Results: Preliminary results based on the web tracking dataset show that characteristics of participants such as their education and political interest are associated with more frequent visits to particular website types, such as news websites. Linking of the data sources will be possible next week. I will be able to show results on differences in online behavior by SES at the conference.
Added Value: Differing online behavior may ultimately contribute to inequalities in education, the labor market, or financial well-being, potentially mitigating or reinforcing existing social inequalities. Thus, understanding these behavioral differences is crucial for reducing structures that exacerbate inequality. Results of this work may demonstrate the substantive value of linked survey and web tracking data.
Bridging Gaps or Deepening Divides? The Impact of Online Intermediaries on News Diversity
Department of Computational Social Science, GESIS – Leibniz Institute for the Social Sciences
Relevance & Research Question: Recent research demonstrated that intermediaries like Facebook, Twitter, search engines and news aggregators can broaden the diversity of news people consume. However, the personalized content users encounter on intermediaries remains a black box, as tracking tools have been limited in their ability to capture in-platform content. A central debate persists around whether algorithmically curated content diverges from preference-driven selective exposure -- where users actively choose to engage with specific news items based on their interests. Using a web tracking tool that captures the public content on Facebook and Twitter as well as the content encountered on other websites, this study examines whether content exposure through intermediaries affects the diversity of news accessed by German internet users. This study examines three months of web browsing histories, together with survey responses, from a sample of German internet users (N = 739) to investigate how the use of intermediaries -- and the diversity of content encountered on these platforms -- affects direct visits to news websites and the diversity of news encountered. The analysis uses random-effects within-between (REWB) models, with data hierarchically structured at individual and daily levels. Preliminary findings align with existing research, demonstrating that engagement with intermediaries is positively associated with greater diversity in news exposure. Further, results show that individuals who engage with intermediaries tend to have richer and more diverse news diets within intermediary platforms compared to the diversity encountered through direct visits to news websites. However, this increased diversity comes with a caveat: users are also more likely to encounter hyperpartisan news through these platforms. This study offers insights into the role of intermediaries in the dissemination of information and their potential impact on information diversity in digital environments. By providing a nuanced perspective on the mechanisms driving news diversity, it advances the field's understanding of the relationship between online intermediaries and diverse media diets.
10:45am - 11:45am | 2.3: Voting Behavior and Information Sources Location: Hörsaal C Session Chair: Roland Abold, infratest dimap Ges. für Trend- und Wahlforschung, Germany |
The Impact of Voting Advice Applications on Voting Behavior: Evidence from the 2024 Austrian Elections University of Vienna, Austria Relevance & Research Question: Voting advice applications (VAAs) are digital tools providing personalized voting advice based on the match between party positions and the user's opinion. As such recommendations can influence decision-making processes and bear implications for electoral outcomes, understanding how VAAs influence voting behavior is essential for assessing the role of technology in strengthening or weakening democratic processes. This study addresses the following research question: "How do voting advice applications affect voting behavior?" Assessing Voter Fatigue: Media Consumption and Political Engagement Across Israeli Election Cycles (2019-2022) The Max Stern Yezreel Valley College, Israel Relevance & Research Question This study examines voter behavior under conditions of frequent elections and political instability, focusing on Israel's unprecedented period of five national elections between 2019-2022. Analyzing this unique case of democratic stress, we investigate how recurring election cycles influence Israeli voters' media consumption and engagement patterns, and explore their relationship with voting intentions. The insights contribute to understanding the interplay between political volatility, media consumption, and democratic participation. Methods & Data The study employed survey data from 2,000 Israeli participants recruited through the Midgam Project Web Panel, using stratified sampling aligned with Israeli Central Bureau of Statistics demographics. Data collection spanned four election cycles between 2019-2022, with surveys conducted prior to each election. Variables measured included traditional and digital media consumption patterns, social media engagement with political figures, and changes in voting intentions. Results Analysis revealed distinct trends across the 2019-2022 election cycles. Traditional media consumption peaked in 2019 but declined significantly in subsequent elections, reaching its lowest point in 2021 before a modest recovery in 2022. Digital media consumption showed steady growth throughout the period. Social media engagement with political figures exhibited a complex pattern: after decreasing in the second and third election rounds, it rebounded in 2022. Notably, logistic regression analysis indicated that while higher general social media consumption correlated with stable voting intentions, tracking politicians across multiple platforms significantly increased the likelihood of voting intention changes. Added Value This research provides novel insights into voter behavior under conditions of repeated elections, challenging assumptions about voter fatigue in highly contested democratic environments. The findings demonstrate how different forms of media consumption influence political engagement and voting stability, particularly highlighting social media's nuanced role in shaping electoral behavior. These results have important implications for understanding democratic participation during periods of political instability and inform strategies for maintaining voter engagement in similar contexts. Click for Clarity? Examining the effect of optional information on prediction accuracy in Swiss Referenda 1YouGov Schweiz AG, Switzerland; 2Universität Konstanz, Germany Relevance & Research Question Methods & Data Results Added Value |
10:45am - 11:45am | GOR Thesis Award I: Bachelor/Master Location: Hörsaal D Session Chair: Olaf Wenzel, Wenzel Marktforschung, Germany |
Predicting Travel Purpose in a Smartphone-Based Travel Survey
1Utrecht University, The Netherlands; 2Statistics Netherlands, The Netherlands
Relevance & Research Question: The general population travel survey is burdensome for the respondents as each
This study aims to answer the research question: "How well can we predict the travel purpose using sensor data from a smartphone-based travel diary study?”
Methods & Data: CBS collected the data from November 2022 to February 2023 using the ODiN app; the dataset contains a total of 505 users. Administrative data, or demographic variables, from the CBS database is linked with the location data. Multiple bounding boxes with varying radii were determined from OSM for different tags associated with trip purposes. The data will be partitioned into training and testing sets with a ratio of 80:20 from the 4961 locations available. This study used an Artificial Neural Network (ANN); as a comparison, the balanced accuracy of Random Forest (RF), Naive Bayes (NB), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGB) was also evaluated. Weather data, which has the potential to assist trip-purpose prediction, was also included.
Added Value: Sensor data from a smartphone-installed travel diary application successfully predicts the purpose of a trip by utilising spatial and temporal patterns. Prediction models give more importance to trends that occur over time in specific locations. Some types of stops would be more important for making accurate predictions than others.

Satisficing in a German Self-Administered Probability-Based Panel Survey
Deutsches Institut für Wirtschaftsforschung, Germany
Relevance & Research Question: When applied to the survey research context, satisficing theory describes a range of behavioral strategies survey participants may use to reduce the cognitive effort required to answer questions conscientiously and truthfully (i.e., optimally). The resulting response error undermines data quality, affecting both reliability and validity of survey results. While general recommendations for preventing satisficing exist, they do not fully account for the complexity and individuality of the behavior. Therefore, satisficing remains a significant challenge to survey researchers. If we could predict which respondent is at risk of a certain type of satisficing behavior, we may be able to prevent satisficing using targeted interventions, particularly in a panel survey context where we have more information on satisficing types and their correlates over time. This study aims to explore the potential for such targeted approaches by investigating the robustness and predictiveness of survey satisficing in self-administered mixed-mode panels. The key research questions are:
Methods & Data: The analyses are based on the first three waves of the German Social Cohesion Panel (SCP), recruited in 2021 and conducted jointly by the German Institute for Economic Research (DIW Berlin) and the Research Institute Social Cohesion (RISC). The SCP is a mixed-mode panel survey with participants self-selecting into either paper-and-pencil (PAPI) or web (CAWI) mode. The sample consists of 17,029 individuals nested in 13,053 households. I generated several indicators of satisficing behavior, including extreme and midpoint response selection, open and closed question nonresponse, speeding (in CAWI), and nondifferentiation (Nd) across item batteries. Nd is the tendency to select the same or similar response categories across a number of items, resulting in overall limited response variability. I measured nondifferentiation using the mean root of pairs method, distinguishing between unidimensional (weak Nd) and multidimensional or reverse-coded item batteries (strong Nd). To identify distinct patterns of satisficing behavior, I employed latent class analysis (LCA). To account for the hierarchical data structure, I used robust standard errors. For each combination of survey wave (waves 1, 2, 3) and mode (PAPI, CAWI), I estimated six latent class models (assuming one to six latent classes). From the 36 models, I selected the six final models based on information criteria (BIC, CAIC, SABIC) and classification diagnostics (entropy, average posterior probabilities, odds of correct classification). I then interpreted the six resulting latent class models regarding their homogeneity and distinctiveness. Afterwards, I used multinomial logistic regression to investigate whether individual-level characteristics, such as sociodemographic factors, predict class membership. Finally, I applied logistic regression to predict the most likely latent class membership in a wave by the estimated posterior class probability for the same latent class in a previous wave.
Results: Across the online survey mode, the LCA models identified three consistent latent classes that differ in their propensity to engage in certain satisficing strategies. The largest class are Optimizers. Counterintuitively, optimizers do not completely dispense with satisficing but exhibit comparably little and unspecific satisficing behavior. Optimizers may occasionally skip questions or apply nondifferentiation to reduce cognitive effort, sometimes speeding through questions. ExtreMists are the second largest class. A typical ExtreMist is very likely to nonrespond to at least one closed as well as open question, reliably generating item missings. An ExtreMist's responses will likely be the highest or lowest extreme values of the given response scales. By providing an approximate answer, they circumvent the cognitive engagement that nuanced and differentiated responses necessitate. Indifferents are the smallest class. Typical Indifferents can be identified through nondifferentiation with a tendency to select the midpoints of response scales. Furthermore, they are at risk of speeding through the survey. However, in the paper mode, models demonstrate variability in their global-level structures across waves. A fourth class of "Missers," exclusively emerging in the second wave's paper mode, was identified as primarily engaging in item nonresponse. In addition, no class of optimizers was identified in the paper mode of the third wave.
In the multinomial analyses, I found individual-level characteristics, such as education and income, to be associated with class membership, with limited predictive power. Regarding the robustness of class membership, the regression analyses revealed that the estimated posterior probabilities of belonging to a satisficing class in a previous wave were consistently significant predictors of belonging to that same class in the future, with odds ratios ranging from around 2 to 10. In some models, I found main effects of the survey mode as well as moderation effects of the survey mode on the effect of the past on the future satisficing strategy. The effect sizes are consistently modest, with Nagelkerke R² values between 0.25 and 0.6.
Added Value: This study provides several important insights for survey research. First, it demonstrates the feasibility of using LCA to identify distinct typical satisficing patterns in self-administered mixed-mode surveys. Second, the identification of distinct satisficing patterns across survey waves and modes suggests the potential for targeted interventions to mitigate satisficing. Third, the finding that the global-level satisficing patterns replicate across survey waves for the online mode but not for the paper mode suggests that mode-specific approaches to targeted interventions may be necessary. Fourth, the moderate predictiveness of past satisficing behavior on future satisficing as well as the limited predictive power of individual-level characteristics indicate that while individual-level characteristics play a role, situational factors also substantially influence satisficing. This implies that targeted interventions should not rely solely on past behavior but should be enhanced with real-time data and the application of learning algorithms. Fifth, the finding that even optimizers have non-negligible risks of engaging in undesirable response behavior suggests that innovative prevention methods should target not only the extreme cases but also individuals who are more under the radar but potentially more approachable.
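For readers unfamiliar with the nondifferentiation measure named in the abstract above, the following is an illustrative sketch of one common "mean root of pairs" operationalization; whether it matches the author's exact implementation is an assumption, and the response vectors are made up.

```python
# Illustrative "mean root of pairs" nondifferentiation index for one respondent
# and one item battery. Lower values indicate stronger nondifferentiation.
# Whether this matches the abstract's exact implementation is an assumption.
from itertools import combinations
from math import sqrt

def mean_root_of_pairs(responses):
    """Mean of sqrt(|x_i - x_j|) over all item pairs of a battery."""
    pairs = list(combinations(responses, 2))
    return sum(sqrt(abs(a - b)) for a, b in pairs) / len(pairs)

print(mean_root_of_pairs([3, 3, 3, 3, 3]))   # straightlining -> 0.0
print(mean_root_of_pairs([1, 5, 2, 4, 3]))   # differentiated responses -> ~1.37
```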
10:45am - 11:45am | INVITED SESSION I: Session by marktforschung.de Location: Max-Kade-Auditorium Session Chair: Holger Geissler, marktforschung.de, Germany |
Panel discussion: AI in market and opinion research – where is all this actually heading?
1Moderation: Holger Geißler, Managing Director, marktforschung.de & succeet GmbH; 2Expert in brand management, neuromarketing, and artificial intelligence; 3Consultant for market research and brand management, formerly Managing Director of Kantar Deutschland; 4COO of Civey; 5Head of Business Development, KERNWERT GmbH
Panel discussion: Two years after the launch of ChatGPT, the current discussion about the use of AI in market research is still dominated by the question of where AI can be used to simplify or automate processes, or to develop new tools and methods with its help. What has so far hardly been discussed is the question of what effects the increased use of AI could have on the market, social, and opinion research industry itself. What changes are to be expected? How will the industry change under the influence of AI? The panel discussion will address the following aspects:
Panelists:
Moderation: Holger Geißler, Managing Director, marktforschung.de & succeet GmbH
11:45am - 12:00pm | Break |
12:00pm - 1:15pm | 3.1: Virtual Interviewing Location: Hörsaal A Session Chair: Tanja Kunz, GESIS – Leibniz Institute for the Social Sciences, Germany |
Video-interviewing as part of a multi-mode design in panel studies: insights from the field 1Deutsches Institut für Wirtschaftsforschung (DIW Berlin); 2GESIS - Leibniz-Institut für Sozialwissenschaften; 3Humboldt-Universität zu Berlin Relevance & Research Question To ensure continued survey participation and data quality among respondents, the panel survey landscape must adapt to the changing societal reality, especially in terms of mobility and digitalization. Computer-Assisted Web-Interviewing (CAWI) has been shown to be a useful self-administered and cost-effective survey mode. Nevertheless, in surveys with complex instruments, interviewer assistance may still be necessary. Given a growing prevalence of videoconferencing in many people’s lives, Computer-Assisted Live Video Interviewing (CALVI) has potential to be a solution.
To develop a high-usability CALVI methodology and examine fitness-for-purpose of this mode across societal subgroups, we have gathered data on panel respondents’ willingness to try CALVI in the Socio-Economic Panel (SOEP), developed and pretested an implementation of CALVI together with infas Institute for Applied Social Science, and started fieldwork in an experiment comparing CALVI to online self-administered and offline face-to-face modes in the context of the SOEP Innovation Sample (SOEP-IS).
To assess the potential of CALVI in household panel surveys in Germany, we introduced a hypothetical inquiry regarding respondents' willingness to be surveyed via videocalls in the SOEP 2022 data collection wave. Of the 22,549 respondents who provided a valid answer, 39% expressed their willingness to try CALVI. Based on the results, we designed a CALVI implementation trial experiment for the SOEP Innovation Sample. In initial pretests, 600 participants of an infas panel study were invited to try video interviews. In total, 73 of those made an appointment for the interview and 46 participated in the interview. On average, the interviews were 106 minutes long and were conducted by experienced and CALVI-trained infas interviewers. Our presentation will provide more in-depth results from the hypothetical scenario and the pretest as well as lessons learned from the experimental fieldwork.
The results of our feasibility project will inform future implementations of CALVI, especially with regard to usability for both respondents and interviewers as well as fitness-for-purpose of the new mode for different population subgroups. Furthermore, our results will be used to develop targeted multi-mode strategies for data collection in SOEP-IS.

Willingness to Participate in Surveys Administered by Smart Speakers
Technical University of Darmstadt, Germany
Relevance & Research Question: The Internet of Things (IoT) is a rapidly evolving technology, bridging the physical and virtual worlds (Zhang, 2021). By integrating Internet-connected devices supported by hardware or software, IoT enables the emergence of smart devices with communication capabilities. These capabilities create a new form of interaction between humans and computers, enabling new possibilities in survey research. Smart speakers could conduct interviews more cost-efficiently than face-to-face interviews and potentially improve data quality compared to web surveys. However, it is unclear whether people are willing to participate in interviews administered by smart speakers. This study examines socio-demographic factors associated with the willingness to participate in such interviews.
Methods & Data: Data was collected from two non-probability online access panels in the U.S. in Spring 2024 and a similar online panel in Germany in November 2024, with samples representing the general English-speaking U.S. population aged 18 and older and the respective German counterpart. Each country had around 2,000 respondents. The analysis included socio-demographic characteristics like gender, age, ethnicity, education, income, and metropolitan status. Additionally, participants’ attitudes toward the importance of surveys were considered to explore factors influencing willingness to participate.
Results: In the U.S., 25% of respondents are “very likely” to participate in a survey conducted by a smart speaker. Willingness was significantly influenced by factors such as age, ethnicity, gender and attitudes towards survey importance. Younger respondents, males, African Americans and people who placed higher importance on surveys were significantly more likely to report willingness to participate. Analysis of the German data is ongoing to compare the willingness and the factors influencing the willingness between the two countries.
Added Value: This study underscores the selective appeal of smart speaker surveys, suggesting that while not universally preferred, they may reach traditionally underrepresented groups. Smart speaker surveys can extend survey methodologies for technology-affine participants in mixed-mode designs. Although current acceptance varies across demographics, investment in this approach is advisable, as its applicability is expected to broaden with the advancing IoT technology.

Data Quality Investigations of Online Live Video Interviewing: Empirical Evidence from Several Major UK Social Surveys
1University of Southampton, United Kingdom; 2University of Queensland; 3UCL, University College London; 4NatCen; 5Ipsos
Relevance & Research Question: Many surveys in the UK are transitioning to online data collection. However, long surveys or those involving complex elements, such as data linkage consent, cognitive assessments, and sensitive questions, can be difficult to move to online self-administered data collection. As a result, several surveys in the UK explored online live video interviewing (VI), which represents a visual, video-mediated, remotely interviewer-administered, and computerised survey mode. This presentation investigates the use of VI, focusing on opportunities and barriers, and collating evidence from seven major social surveys in the UK, with an emphasis on longitudinal surveys and the collection of complex data. The research questions are:
Methods & Data: This study uses data from seven surveys conducted in the UK in 2020-2023: National Child Development Study, British Cohort Study, Next Steps, Children of the 2020s, English Longitudinal Study of Ageing, European Social Survey and Health Survey for England pilot. We examine response rates, composition of VI samples and response to complex questionnaire items.
Results: One of the main findings is that VI in the UK is used either as the primary survey mode, or as a complementary mode in mixed-mode designs. VI as the primary data collection mode can lead to lower response rates and potentially also to an increase in representation bias. On the other hand, there are encouraging findings, including that this mode proves to be a very suitable approach for collecting complex elements. This is a key finding since previous research has identified limitations of other remote methods for collecting this kind of data, which is an important component of many studies, especially longitudinal studies.
Added Value: This paper is the first to investigate the use of online VI in UK surveys. Particular benefits exist for longitudinal surveys. The collection of complex data is common practice in longitudinal studies and evidence presented here suggests VI performs well.
12:00pm - 1:15pm | 3.2: Respondent Engagement and Attrition Location: Hörsaal B Session Chair: Ellen Laupper, Swiss Federal University for Vocational Education and Training SFUVET, Switzerland |
Attrition patterns and warning signs in a long-term, high frequency probability online panel University of Mannheim, Germany Relevance & Research Question All longitudinal- and panel studies are confronted with the gradual loss of respondents, i.e., panel attrition. Over time, this leads to lower respondent numbers, loss of statistical power, and, if the attrition is nonrandom, systematic loss of specific respondents and thus biases in the remaining sample. Using data from a long term high-frequency panel, we investigate (1) which respondents are disproportionately lost over time and may be specifically targeted, oversampled, or the survey adjusted to facilitate participation, (2) what are warning signs of attrition, and (3) which survey features are associated with higher attrition and thus present opportunities for optimization. Methods & Data Using data from a probability online panel of the German population, we analyze respondents’ participation in over 70 panel waves spanning 12 years. This gives us the rare opportunity to analyze rich data from a high frequency and long running panel. Descriptively, we observe how the panel composition changes over time and which respondents are disproportionately lost. Using a survival model, we investigate risk factors for attrition on the respondent level, in their participation patterns, and in survey features. Results We observe high attrition over the first panel waves and slower but steady loss of respondents long term. Over time, the sample tends towards being higher educated, more likely to be married, and skews slightly more male. Attrition risk is lower for younger, higher educated, and full-time employed respondents. Higher attrition risk is associated with patterns of infrequent participation (breakoffs, missing panel waves, participating late during field time, item nonresponse), but not with interrupting the survey to continue later. Higher attrition risk is also associated with longer and poorly rated survey waves. Added Value For panel practitioners, it is important to understand attrition patterns to accurately predict how many respondents to expect in future waves, when and how many respondents to recruit, and which groups should be specifically targeted or oversampled. We identify several groups that are at higher risk for attrition, early warning signs that may be used to counteract attrition with targeted interventions, and opportunities to optimize surveys for continued participation. Do we need to include offliners in self-administered general population surveys? An analysis of 150 substantive variables in a probability-based mixed-mode panel survey in Germany GESIS – Leibniz Institute for the Social Sciences, Germany Relevance & Research Question: Due to concerns about bias stemming from the undercoverage of non-internet users, most probability-based surveys try to include offliners (i.e., respondents not able or willing to participate online). Previous research shows that including offliners increases accuracy for some socio-demographic characteristics while not for others. These studies lack the inclusion of substantive variables. We aim to address this research gap by answering the following research question: Does the inclusion of offliners in a probability-based panel impact measures of substantive variables? Methods & Data: We use data from the GESIS Panel.pop, a probability-based self-administered mixed-mode panel of the German general population, surveyed via web and mail mode. 
We analyze around 150 substantive variables from six core studies collected since 2014, which we compare between the whole sample and a sample of only onliners (i.e., without offliners). To assess the impact of including offliners, we compute differences between both samples for each substantive variable and compute average absolute relative bias (AARB) for each variable and by (sub-)topic. In addition, we re-run these analyses for different definitions of onliners and offliners and for different recruitment cohorts.
Results: Comparing the online-only subsample with the complete mixed-mode sample that includes offliners shows statistically significant average absolute relative biases for all six topics (subjective wellbeing; social and political participation; environmental attitudes and behavior; personality and personal values; media usage; work and leisure) and for socio-demographic variables. This finding shows that univariate estimators for a wide variety of topics differ depending on whether offliners are included or not.
Added Value: Our study contributes to the practical challenge of deciding whether to include the offline population in surveys by employing a costly and labor-intensive mail mode. We go beyond previous research by examining a wide range of substantive variables, which will enable us to draw conclusions on topic areas in which including offliners is more warranted than in others. We expect our findings to be of relevance for survey practitioners and substantive researchers alike.

Reducing Political Science Surveys’ Attrition Bias Without Limiting Substantive Research: Potentials of Adaptive Survey Design and Missing Data Strategies
GESIS - Leibniz Institute for the Social Sciences, Germany
Relevance & Research Question: Adaptive survey designs that use respondents’ topic interests can reduce the overrepresentation of politically engaged respondents in political science surveys. Politically disengaged respondents would receive a questionnaire combining political and non-political modules to boost participation, while politically engaged respondents receive a purely political questionnaire. However, assigning respondents based on political interest may distort research if variables of interest correlate with political interest. Instead, researchers could assign only a certain percentage of respondents with low political interest to tailored questionnaires and use missing data strategies based on main survey data to correct biased estimates. This paper aims to assess whether adaptive designs that rely on content variation bias substantive research and whether missing data procedures mitigate this bias.
Methods & Data: Using the probability-based GESIS Panel, I simulate several datasets in which 50% to 100% of politically disengaged respondents’ responses to various variables are set as missing. I then run several regression models using the original and simulated datasets to answer RQ1. To answer RQ2, I calculate the regression models using inverse probability weights and multiple imputation.
Results: Preliminary results suggest that adaptive survey designs that vary a questionnaire’s content can bias substantive research if more than 50% of respondents are randomly assigned to question modules that are supposed to be more interesting. However, assigning only a certain percentage of respondents and applying missing data strategies instead reduces biased estimates and corrects confidence intervals.
Added Value While adaptive survey designs with content variation may reduce attrition bias in political science surveys, researchers must be cautious if variables of interest correlate with political interest as estimates may be biased. Assigning only a certain percentage of respondents to a question module that is supposed to be more interesting to them mitigates the risk of biased estimates when using inverse probability weights or multiple imputation. Quantifying Questionnaire Design: A Holistic Approach to Data Quality and User Engagement Ipsos Relevance & Research Question The market research industry continues to face quality issues, often prioritizing the identification and removal of "problematic" respondents. However, this approach overlooks a crucial aspect of data quality: adapting questionnaires to align with respondents' lifestyles and enhancing their appeal to high-quality panel members. By focusing on survey design optimization, we can potentially improve data integrity at its source, rather than relying solely on cleansing methods. This paper presents a quantitative perspective on enhancing data quality through optimized questionnaire design. Measuring questionnaire design effectiveness is complex due to the multitude of elements influencing respondent experience. While previous research has attempted to identify factors that make questionnaires difficult to answer within one research type and/or topic, we lack clear thresholds on what a good respondent is able to answer without risking diminishing response quality. Our research addresses this gap by analyzing a comprehensive dataset from Ipsos' global operations over a multi-week period. We propose a questionnaire segmentation system, correlating design elements with Ipsos' quality indicators as a proxy for engagement. This approach allows us to identify measurable factors that minimize drops in respondent engagement and provide clearer recommendations for effective questionnaire structures. Importantly, we acknowledge the challenge of conducting research on research, given the significant influence of topic and interest levels on respondent behavior. Our methodology accounts for this variability, offering insights that are applicable across diverse research contexts. Methods & Data |
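As a reference for the AARB measure used in the offliner study earlier in this session, here is an illustrative sketch of how an average absolute relative bias is commonly computed; whether the authors use exactly this formula (for instance, with or without the percentage scaling) is an assumption, and the numbers are made up.

```python
# Illustrative average absolute relative bias (AARB) computation.
# The exact operationalization used by the authors is an assumption.
def aarb(full_sample_estimates, subsample_estimates):
    """Mean of |theta_sub - theta_full| / |theta_full| across variables, in percent."""
    rel_bias = [abs(s - f) / abs(f)
                for f, s in zip(full_sample_estimates, subsample_estimates)]
    return 100 * sum(rel_bias) / len(rel_bias)

# Benchmark: full mixed-mode sample (with offliners); comparison: online-only subsample
full   = [0.52, 3.10, 0.27]   # e.g., means/proportions of three substantive variables
online = [0.55, 3.05, 0.31]
print(f"AARB = {aarb(full, online):.1f}%")   # ~7.4% for these invented values
```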
12:00pm - 1:15pm | 3.3: - Location: Hörsaal C |
12:00pm - 1:15pm | 3.4: Digital Battlegrounds Location: Hörsaal D Session Chair: Long Nguyen, DeZIM – German Center for Integration and Migration Research, Germany |
War Journalism on Telegram? Ethical and Professional Dilemmas of Telegram Content Creators During War
The Max Stern Yezreel Valley College, Israel
Relevance & Research Question: The Iron Swords War, which erupted in October 2023, has witnessed the emergence of Telegram as a significant news platform in Israel. The war was marked by the proliferation of alternative news channels and their exponential growth in followers. This study analyzes items from the nine most-followed independent Telegram news channels during the first year of the war. It examines how content creators address ethical and professional dilemmas traditionally associated with conventional journalism, as reflected in their posts. The main research questions are: What types of professional and ethical dilemmas emerge in these Telegram channels, and how do these align with or diverge from conventional journalistic dilemmas?
Methods & Data: The study employs qualitative content analysis of posts from selected Telegram news channels, sampled between October 2023 and September 2024. Systematic random sampling yielded 300 posts (distributed equally across channels), which were thematically analyzed to identify explicit and implicit references to ethical and professional dilemmas. These were subsequently compared with the Israeli Journalists Association's ethical code of conduct to identify patterns of convergence and divergence.
Results: Content creators across all analyzed channels consistently grappled with professional and ethical dilemmas throughout the study period. The primary tension emerged between respecting victims' families' sensitivities and fulfilling the professional obligation to inform the public. A secondary dilemma involved balancing journalistic transparency against national security considerations. Notably, these channels demonstrated limited adherence to traditional journalistic practices regarding error correction and fact-verification protocols.
Added Value: This research contributes to the emerging scholarship on Telegram as a news platform and presents the first systematic analysis of its role during active warfare. By examining content creators' navigation of professional and ethical challenges, it illuminates the evolving nature of wartime journalism in digital spaces. The findings enhance our understanding of both crisis reporting and digital alternative journalism while raising important questions about accountability in emerging news platforms.

Early Warnings: Forecasting Edits and Disputes on Wikipedia Armed Conflict Pages
1Oxford Internet Institute, Oxford, United Kingdom; 2Einstein Center Digital Future, Berlin, Germany
Relevance & Research Question: In today’s digital world, news websites, online encyclopedias and other knowledge platforms are crucial for many to gain access to reliable sources of information. On Wikipedia — one of the most frequented web pages globally — every edit to an article regarding an event must cite a reliable source. Therefore, we hypothesise that edits on Wikipedia pages can be predicted using related news articles. Understanding the process of updating Wikipedia pages via news articles gives insight into how information, including misinformation, integrates into a platform used by the public as a reliable source of information. We consider pages on armed conflicts as a case study, as these are often fast-developing events with a high need for information.
Methods & Data: We examine the Russo-Ukrainian War, the Mali War, and the Sudanese Civil War over a 4-year time period. We use an API to collect online news articles as well as Wikipedia articles and edits. The Wikipedia pages and news articles are linked to each other using keywords which appear in both sources. We then use three machine learning models to predict the number of Wikipedia edits per day using the news sources. To examine how news articles relate to edits on a more granular level, we introduce a novel metric called the difference measure. Results: We find that all the machine learning models perform well, producing an MSE score of less than one that is stable across different wars and pages. Of notable interest is the fact that the titles of BBC articles act as sufficient predictors. Additionally, the difference measure is able to supplement traditional controversy measures by identifying excessive edits in response to real-life events. Added Value: This framework is the first that forecasts when information on Wikipedia is likely to change on a granular scale using exogenous sources. Hence, it could be used as an early warning tool of a page’s vulnerability on any online platform that is regularly updated by an active user community. This gives the moderators of the community more time to implement protection methods to preserve the quality of information. Is Wikipedia a battleground of the Russo-Ukrainian war? University of Oxford, United Kingdom Relevance & Research Question This article investigates the link between territorial conflict and digital disputes, focusing on Wikipedia during the Russo-Ukrainian war. Wikipedia, one of the most visited websites, is regarded as a "consensus truth" in Western societies and has increasing significance due to large language models that rely heavily on its content, making it a key data source for developing AI tools. Despite its role as a reliable information source, the platform's open-access nature makes it vulnerable to manipulation and attempts to influence public opinion. Therefore, we hypothesize that the 2022 invasion of Ukraine led to heightened attention and disputes on Wikipedia, particularly in articles about contested Ukrainian regions. Methods & Data To analyse the digital impact of territorial disputes, we use Wikipedia data and classify regions into three groups: disputed Ukrainian Oblasts, undisputed Ukrainian Oblasts, and Polish Voivodeships. Disputed Oblasts are defined as areas where Russian forces gained territory or that Russia annexed. We develop a custom Natural Language Processing (NLP) method to identify dispute edits on the Wikipedia pages of disputed Ukrainian Oblasts. In addition to using the number of dispute edits as a conflict proxy, we quantify the 2022 invasion’s impact with two further metrics: the number of revisions (proxying editor community attention) and the number of identity reverts (indicating generalized disputes). We use a difference-in-difference (DiD) regression setup to measure the invasion's effect on all three dependent variables. Results We find a significant increase in attention and dispute on articles about disputed Ukrainian regions compared to those about undisputed regions. Furthermore, we find that dispute edits frequently involve debates over the Ukrainian versus Russian spelling of place names and discussions of national identity, reflecting the broader conflict.
Added Value The results presented here confirm our initial assumption that territorial conflicts are spreading into the digital realm in the case of the Russo-Ukrainian war. While research exists on other media, such as social media, newspapers, and mass media, our findings shed light on a previously overlooked digital conflict zone: attempts to change the identity of disputed regions on Wikipedia. |
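The difference-in-differences setup described in the Wikipedia battleground abstract above can be illustrated with a short, self-contained sketch. The simulated data, region names, column names, and effect size below are assumptions for illustration only; they are not the authors' data or exact specification.

```python
# Illustrative DiD sketch for the "Is Wikipedia a battleground" analysis.
# Region names, counts, and the toy effect size are assumptions, not the authors' data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
regions = {"Kherson": 1, "Luhansk": 1, "Lviv": 0, "Volyn": 0, "Mazovia": 0, "Silesia": 0}

rows = []
for region, disputed in regions.items():
    for month in range(24):                               # 12 months before and 12 after Feb 2022
        post = int(month >= 12)
        mean = 10 + 5 * disputed + 60 * disputed * post   # toy "invasion effect"
        rows.append({"region": region, "disputed": disputed, "post": post,
                     "revisions": rng.poisson(mean)})
df = pd.DataFrame(rows)

# The coefficient on disputed:post is the difference-in-differences estimate of the
# invasion's effect on attention to pages about disputed regions.
did = smf.ols("revisions ~ disputed * post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["region"]})
print(did.params["disputed:post"], did.pvalues["disputed:post"])
```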
12:00pm - 1:15pm | GOR Thesis Award II: PhD Location: Max-Kade-Auditorium Session Chair: Olaf Wenzel, Wenzel Marktforschung, Germany |
|
Identifying, Characterizing, and Mitigating Errors in the Automated Measurement of Social Constructs from Text Data University of Mannheim, Germany In this abstract, I summarize the contents and contributions of my doctoral thesis “Identifying, characterizing, and mitigating errors in the automated measurement of social constructs from text data”. My doctoral research is highly interdisciplinary and sits at the intersection of Natural Language Processing (NLP) and Social Science. Relevance: Computational Social Science (CSS) brings a transformative approach to social science, leveraging digital trace data—data generated through online platforms and digital activities [1]. This form of data is distinct from traditional social science data like surveys, providing vast, real-time information on social constructs, i.e., human behavior and attitudes. Given the large-scale nature of digital traces, new methods of analysis are needed to draw insights from them, particularly automated techniques using NLP and Machine Learning. Yet, despite their advantages, research with digital traces and automated methods also presents unique challenges [2]. Digital trace data lacks well-established validity measures and faces biases from platform-specific artifacts, incompleteness, and methodological gaps. Traditional social science emphasizes robust frameworks to minimize biases in survey data, but such practices are underdeveloped for digital trace data, making the rigorous study of its limitations critical. By systematically identifying and addressing these issues, we can advance the field and offer more reliable insights into complex social constructs like political attitudes, prejudicial attitudes, and health-related behaviors. This is what I set out to do in my doctoral dissertation, by exploring and answering the following research questions: Research Questions: 1. RQ1: What errors and biases arise when using digital trace data to measure social constructs? 2. RQ2: Can survey items enhance the validity of computationally measured social constructs from digital trace data? 3. RQ3: Can both manual and automated data augmentation techniques improve the robustness and generalizability of these computational measurements? Methods & Data Data: I use various online data sources in this dissertation and combine survey data and survey scale items. For RQ1, we conduct a scoping review of research that uses Twitter/X, Wikipedia, search engine data, and others. For RQ2, we utilize data from Twitter/X to study sexism, and Glassdoor, a platform where employees review their workplaces, to study workplace depression. To incorporate social theory into our computational models for RQ2, we also use survey items for sexism [3, inter alia] and workplace depression [4], as well as survey-based estimates of depression in the US for validation. For RQ3, we use data from Twitter/X, Reddit, and Gab to study sexist and hateful attitudes. Methods: The study combines computational techniques, specifically NLP and Machine Learning (ML), with quantitative and qualitative approaches rooted in social science. For RQ1, we use a combination of literature review, case studies, and qualitative analysis to conceptualize an error framework inspired by traditional survey methodology, particularly the Total Survey Error (TSE) Framework [5]. Our framework, the “Total Error Framework for Digital Traces of Human Behavior on Online Platforms” (TED-On), adapts the TSE to address idiosyncrasies of digital traces, e.g., the effect of the online platform.
Our framework helps in systematically identifying errors when using digital trace data for measuring social constructs. For RQ2, we incorporate theory into computational NLP models in two ways — using survey items to guide the creation of labeled training data, which is then used to train computational models, or using them with sentence embeddings [6] for a semi-supervised approach. For RQ3, we use manual and automated data augmentation to create better NLP models. We use Large Language Models (LLMs) like GPT-3.5 to generate synthetic data automatically. Finally, we also devise robust evaluation approaches that specifically account for the generalizability of computational methods. Results RQ1: Through the TED-On framework, two main error types are highlighted: (1) measurement errors due to content misalignment with constructs and (2) representation errors due to biased or incomplete data coverage. RQ2: Survey-inspired codebooks and models help structure the analysis of digital trace data around specific and holistic theoretical dimensions. Concretely, we find that models developed on our theory-driven codebook outperform existing state-of-the-art models by 6% higher F1 scores. Our workplace depression classifier was also validated and found to correlate with state-level depression scores (r = 0.4). RQ3: Manual and automated data augmentation techniques increase computational models' robustness, especially in cross-domain applications. For example, synthetic data created to detect sexism and hate speech resulted in models that generalize better across platforms, minimizing reliance on platform-specific artifacts. Both manual and LLM-generated augmented data improve the out-of-domain generalizability of computational models with improvements of 5-12% F1 scores for different domains, compared to scores of around 55% F1 from previous models. Added Value. This thesis makes significant contributions to three foundational aspects of CSS: 1. Theory: The research introduces measurement theory adapted for digital traces, creating a shared vocabulary for the interdisciplinary field. The TED-On framework aids in error documentation and model validation tailored to digital data challenges. 2. Data: This work develops several datasets: a Twitter/X dataset on sexism, synthetic training data for detecting sexism and hate speech, and a dataset detailing workplace depression rates across companies and states, offering valuable resources for CSS research. 3. Methods: The research introduces theory-driven and generalizable NLP models for identifying sexism and hate speech, and semi-supervised models for analyzing workplace depression. These contributions provide a roadmap for future CSS studies which could further refine social construct measurement from new digital trace data sources. References
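As an illustration of the semi-supervised idea mentioned in the thesis summary above (survey items combined with sentence embeddings), the following minimal example scores posts by their similarity to survey scale items. The model name, item wordings, example posts, and threshold are assumptions, not the thesis's actual setup.

```python
# Minimal sketch of the "survey items + sentence embeddings" idea: score posts by their
# semantic similarity to items of a validated scale. Model name, item wordings, example
# posts, and the threshold are illustrative assumptions.
from sentence_transformers import SentenceTransformer

scale_items = [  # hypothetical paraphrases of sexism-scale items
    "Women are generally not as intelligent as men.",
    "Women should take care of the household rather than pursue a career.",
]
posts = [
    "honestly she only got the job because she is a woman",
    "great weather today, going for a run",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
item_emb = model.encode(scale_items, normalize_embeddings=True)
post_emb = model.encode(posts, normalize_embeddings=True)

# Cosine similarity to the closest scale item serves as a weak "sexism" score,
# which a cutoff can turn into labels for semi-supervised training.
scores = (post_emb @ item_emb.T).max(axis=1)
weak_labels = (scores > 0.4).astype(int)  # cutoff is an assumption
print(list(zip(posts, scores.round(2), weak_labels)))
```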
The Power of Language: The Use, Effect, and Spread of Gender-Inclusive Language Nuffield College, University of Oxford, United Kingdom Relevance & Research Question This dissertation investigates a new social phenomenon: the use of gender-inclusive language (GIL). Leveraging language as a strategic research site, this paper-based dissertation contributes to understanding a current social phenomenon in Germany and advances the sociological understanding of how behavioural change occurs and is experienced. It creatively combines different online research methods (web scraping, text analysis, a digital field experiment, and qualitative online interviews). GIL refers to changing person nouns to be gender-inclusive, akin to the shift from policemen to police officers in English. In German, a researcher is a male researcher (Forscher) or a female researcher (Forscherin). A gender-inclusive alternative would be Forscher*in. The convention is to use the masculine form generically, which holds a strong and empirically documented male bias. In 2020, GIL was a relatively new and highly discussed topic in Germany. Everyday observations indicated an increase in GIL use (e.g. in iOS software and Spotify), yet academic sources viewed it as a marginal phenomenon. GIL is a potential equal opportunities tool owing to extensive research documenting the ability of GIL to mitigate the male bias in language. From a sociological perspective, it provides an opportunity to retrospectively study behavioural change using large-scale data without researcher interference. This dissertation asks:
Methods & Data Paper 1 uses web scraping to craft two unique datasets that were then analysed using quantitative text analysis, part-of-speech tagging, and manual annotation. Using a programme written in Python, I accessed the Deutscher Referenzkorpus (DeReKo) to collate GIL occurrences in over 4 million newspaper articles published in five different media outlets between 2000 and 2021. This measured the frequency of GIL but not of the generic masculine. Therefore, I also web-scraped the five different media outlets to gather full-length newspaper articles, which I then annotated to measure the relative use of GIL. I conducted differential analyses by the political orientation of the media outlet (left, centre, right), the type of GIL (10 different types), and the author’s gender (identified using part-of-speech tagging). Paper 2 is co-authored with Klarita Gërxhani and Arnout van de Rijt. It is a digital field experiment that tests the effect of GIL, specifically in job applications where previous research has demonstrated an increase in girls’ and women’s attitudes and preferences towards stereotypically masculine jobs when GIL is used. We use Prolific, an online crowd-working platform, as an experimental labour market, where sign-up for our advertised task itself was an outcome variable. We advertised participation in a stereotypically masculine task (solving maths problems) and varied the use of GIL in the advertisement (on Prolific) and the task description (on Qualtrics) in a 2x2 between-subjects design, which allowed the separation of a recruitment effect and a performance effect. We then measured how many women participated under each condition (two-tailed proportion test) and how well women performed when GIL was used (two-tailed independent t-tests). The experiment was pre-registered on OSF (https://osf.io/x8ft4) with a sample size of 2,000 based on power calculations. However, the participant pool was exhausted at 1,321 participants. It was fielded in Germany and Italy. As GIL is a relatively new behaviour, Paper 3 revisits the DeReKo data for the years 2022/23 and combines it with data from Google Trends and the Dow Factiva database to see not only how GIL use develops after its initial increase but also whether and how it is talked about. It then builds on the quantitative findings with prospective longitudinal qualitative interviews in combination with ego-centric network data collection (21 cases). The qualitative interviews were conducted online using Microsoft Teams, an online tool that was essential as 1) it allowed access to a geographically diverse group of research participants and 2) it enabled participants to engage in participatory research by using the whiteboard function to draw their network. Results Paper 1: In addition to observing an unexpectedly rapid increase in GIL (reaching 800 occurrences per million words, or 16.5% of potential use), two different trends are identified: whilst non-binary inclusive forms of GIL are increasingly used in the left-leaning newspaper, GIL that adheres to a binary notion of gender is favoured in the mainstream and right-leaning media. Three conditions for difficult behavioural change are identified: having a role model, the possibility of a low-threshold adoption, and incremental adoption. Paper 2: We find no effect of GIL either on the share of women or on their performance.
In each condition, the share of women was 43%, and there was no statistically significant difference in women’s mean performance (mean number of correct answers in the maths task was 3.3 with GIL and 3.5 without GIL). This may be because GIL only influences attitudes, not behaviours. It may also be an artefact of our research design, so we are planning a follow-up experiment. Paper 3: The data show that since 2021, the use of GIL has stalled, and the contestation of GIL has spiked. The current situation is what I refer to as incipient change: GIL has increased but has not replaced the generic masculine. Yet, it has also not waned. Turning to the interview data, I uncover a fragmented understanding of GIL (i.e., everyone knows GIL, but not everyone correctly understands what GIL is) alongside a surprisingly clear individual understanding of the circumstances in which GIL can or sometimes even should be used. Rather than a general separation of GIL users and non-GIL users (polarisation), I argue that the use of GIL seems to be strongly tied to context, particularly the professional context, thus mapping onto a coordination-based micro-level theory. Added Value This thesis combines multiple online research methods to study different facets of the same highly relevant social phenomenon: GIL. First, it shows the strength of using web scraping and computational tools to measure macro-level patterns and using digital tools to access the micro level, underscoring the importance of online research not just for quantitative but also for qualitative research. Second, it demonstrates how the shift of the labour market into the digital sphere enables new research designs and, with them, a new avenue of research possibilities.
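A minimal sketch of how GIL frequency per million words could be measured in the spirit of Paper 1. The regular expressions below cover only a few common GIL variants and are simplifying assumptions; the dissertation distinguishes ten GIL types and relies on manual annotation in addition to automated counting.

```python
# Sketch of counting gender-inclusive language (GIL) forms per million words in German
# text. The regexes cover only a few common variants and are simplifications.
import re

GIL_PATTERNS = {
    "star": re.compile(r"\b\w+\*in(?:nen)?\b"),         # Forscher*in, Forscher*innen
    "colon": re.compile(r"\b\w+:in(?:nen)?\b"),         # Forscher:in
    "underscore": re.compile(r"\b\w+_in(?:nen)?\b"),    # Forscher_in
    "binnen_i": re.compile(r"\b\w+In(?:nen)?\b"),       # ForscherIn, ForscherInnen
}

def gil_per_million(text: str) -> dict:
    """Occurrences of each GIL type per million words."""
    n_words = max(len(re.findall(r"\w+", text)), 1)
    return {name: len(pattern.findall(text)) / n_words * 1_000_000
            for name, pattern in GIL_PATTERNS.items()}

sample = "Die Forscher*innen trafen die Lehrer:innen und einige ForscherInnen."
print(gil_per_million(sample))
```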
|
1:15pm - 2:30pm | Lunch Break |
2:30pm - 3:30pm | 4.1: Poster Session Location: Foyer OG |
|
Mapping the Way Forward: Integrating APIs into Travel Surveys National Centre for Social Research, United Kingdom Relevance & Research Question: Technological advances are making it increasingly possible to collect information that augments traditional survey research. This study explores the performance of a survey integrated with an Application Programming Interface (API) that allows respondents to identify locations on a map. Specifically, we examine the accuracy and usability of this API integration, focusing on its potential to improve data quality in the context of travel diaries. Methods & Data: This research is based on a large-scale pilot conducted in May and June of 2024, where 7,500 addresses in Wales were issued. The web survey was programmed using Blaise 5 and integrated with the Ordnance Survey Places and Names databases, enabling respondents to locate places on a map. Data were collected from 1,008 individuals, resulting in 2,743 reported journeys. Results: Most respondents confirmed that the location information was correct (92% on day 1 and 95% on day 2), indicating a high rate of accurate matches for searched locations. The most commonly reported errors were related to the start or end points of the journeys, followed by the inclusion of journeys that did not occur. In 278 instances (7.9% of all map entries), respondents selected the "I could not find my location" option and provided a free-text description instead. These descriptions varied widely in specificity, ranging from precise locations to vague place names that could correspond to multiple places. While the API integration performs well overall, some respondents encountered difficulties with location precision, suggesting that some degree of post-survey editing will be required. Added Value: This study highlights the potential of API integration to enhance survey data collection by capturing detailed and specific geographic information. Our findings suggest that while the approach is largely effective, there are areas for improvement to ensure that respondents can accurately and easily identify locations on a map. This research contributes to the growing field of technological augmentation in survey methodologies, offering insights into the practical challenges and opportunities of API-enabled geographic data collection. Uncovering Housing Discrimination: Lessons Learned from a Large-Scale Field Experiment via Immoscout24 1German Centre for Integration and Migration Research (DeZIM); 2Bielefeld University, Germany; 3Freie Universität Berlin, Germany Relevance & Research Question The experiment involved sending over 2,000 standardized rental applications via the online portal Immoscout24 to landlords in 10 major German cities. These applications systematically varied applicant names to signal different ethnic backgrounds. The poster emphasizes key aspects of the technical implementation:
Results Straightlining in CS versus AD items, completed by Phone versus PC University of Groningen Relevance & Research Question Straightlining has been shown to be more prevalent in surveys that are completed on a PC than on smartphones. However, differential effects of item types on straightlining, depending on the device used for self-completion, have not been researched extensively. Agree-disagree (AD) items are assumed to evoke more straightlining than construct-specific (CS) items (i.e., items with a different response scale for each item, depending on the response dimension being evaluated). To fill this gap, we aim to answer the following research question: What are the combined effects of AD versus CS items and device use on response patterns that are assumed to be indicators of satisficing, such as straightlining? Methods & Data Our survey was conducted in November 2024, with 3,500 flyers distributed across a neighborhood in a large Dutch city with subsequent face-to-face recruitment by students. The flyers included a QR code and URL for survey access. The survey was filled out by 556 individuals completing at least 50% of the questions (and 478 completing the full questionnaire), yielding a 13% response rate at the household level. A smartphone was used by 85% of the participants, whereas 15% used a PC. Respondents were randomly assigned to four blocks of either five AD items or five CS items. Straightlining was defined in two ways: in a strict definition as providing exactly the same answer to all five items asked in the same battery, and by computing within-respondent variance for each battery. Results Straightlining (both in the strict definition and in terms of lower variance) occurred more frequently in AD items than in CS items, with 18% of respondents in the AD condition showing straightlining, as opposed to 5% of respondents in the CS condition. PC respondents were more likely than smartphone respondents to straightline in battery items phrased as AD items, but this effect was not found when items were phrased as CS items. Added Value Our study shows that using CS items might be more beneficial when the questionnaire is filled out on a computer than on a smartphone. Bayesian Integration of Probability and Nonprobability Web Survey Data 1IAB, Germany; 2LMU-Munich; 3Utrecht University, the Netherlands; 4University of Manchester Relevance & Research Question The popularity of non-probability sample (NPS) web surveys is increasing due to their convenience and relatively low costs. In contrast, traditional probability-sample surveys (PS) are suffering from decreasing response rates, with a consequent increase in survey costs. Integrating the two types of samples in order to overcome their respective disadvantages is one of the current challenges. We propose an original methodology to combine probability and non-probability online samples to improve analytic inference on binary model parameters. Methods & Data To combine the information coming from the two samples, we consider the Bayesian framework where inference is based on the PS and the available information from the NPS is supplied in a natural way through the prior. We focus on the logistic regression model, and conduct a simulation study to compare the performance of several priors in terms of mean-squared error (MSE) according to different selection scenarios, selection probabilities, and sample sizes.
Finally, we present a real data analysis considering an actual probability-based survey and several parallel non-probability web surveys from different vendors which reflect different selection scenarios. Results Added Value The method provides a means of integrating probability and nonprobability web survey data to address important trade-offs between costs and quality/error. For survey practitioners, the method offers a systematic framework for leveraging information from nonprobability data sources in a cost-efficient manner to potentially improve upon probability-based data collections. This can be particularly fruitful for studies with modest budgets or small sample sizes, where the greatest gains in efficiency can be achieved. |
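The core idea of the Bayesian integration abstract above — letting the nonprobability sample inform a prior while inference rests on the probability sample — can be sketched as a prior-penalized (MAP) logistic regression. All data, the prior variance, and the two-step procedure below are illustrative assumptions, not the authors' actual method.

```python
# Sketch of prior-informed (MAP) logistic regression: the nonprobability-sample (NPS)
# estimate enters as the mean of a normal prior; inference uses the probability sample (PS).
# The simulated data, selection bias, and prior variance are assumptions for illustration.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(0)
beta_true = np.array([-0.5, 1.0])

def simulate(n, selection_bias=0.0):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = rng.binomial(1, expit(X @ beta_true + selection_bias * X[:, 1]))
    return X, y

X_nps, y_nps = simulate(5000, selection_bias=0.3)   # large but selective NPS
X_ps, y_ps = simulate(300)                          # small, well-selected PS

def neg_loglik(beta, X, y):
    eta = X @ beta
    return -np.sum(y * eta - np.logaddexp(0.0, eta))

# Step 1: plain maximum-likelihood fit on the NPS gives the prior mean.
beta_prior = minimize(neg_loglik, np.zeros(2), args=(X_nps, y_nps)).x

# Step 2: MAP estimate on the PS with a N(beta_prior, tau2 * I) prior.
tau2 = 0.5                                          # prior variance (assumption)
def neg_logpost(beta):
    return neg_loglik(beta, X_ps, y_ps) + np.sum((beta - beta_prior) ** 2) / (2 * tau2)

beta_map = minimize(neg_logpost, beta_prior).x
print("NPS-only:", beta_prior.round(2), "PS + prior (MAP):", beta_map.round(2))
```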
2:30pm - 3:30pm | 4.2: Poster Session Location: Foyer OG |
|
Everybody does it sometimes: reducing Social Desirability Bias in an online survey using face-saving strategies University of Groningen, The Netherlands Relevance & Research Question Methods & Data Results Decoding Straightlining: The Role of Question Characteristics in Satisficing Response Behavior 1GESIS - Leibniz Institute for the Social Sciences, Germany; 2University of Michigan; 3NORC at the University of Chicago Relevance & Research Question Satisficing response behavior, including straightlining, can threaten the reliability and validity of survey data. Straightlining refers to selecting identical (or nearly identical) response options across multiple items within a question, potentially compromising data quality. While straightlining has often been interpreted as a sign of low-quality responses, there is a need to distinguish between plausible and implausible straightlining (see Schonlau and Toepoel, 2015; Reuning and Plutzer, 2020). With this research, we introduce a model that classifies straightlining into plausible and implausible patterns, offering a more nuanced understanding of the conditions under which straightlining likely indicates optimized response behavior (plausible straightlining) vs. satisficing response behavior (implausible straightlining). For instance, straightlining behavior is plausible when answering attitudinal questions with items worded in the same direction, but it becomes implausible when items are reverse-worded. This study further examines how question characteristics—including grid size, design (e.g., matrix vs. single-item formats), and straightlining plausibility—influence straightlining behavior. Methods & Data For our analyses, we use the German GESIS Panel, a mixed-mode (mail and online), probability-based panel study, leveraging a change in the panel’s layout strategy in 2020 that shifted multi-item questions from matrix to single-item designs, offering a unique quasi-experimental set-up. We conduct multilevel logistic regression analyses to assess the effect of question design and grid size on straightlining behavior. We further conduct difference-in-difference analyses to examine the format change's effect on plausible and implausible straightlining. Results Our initial multilevel regression analyses, using data from 3,514 respondents and 18 grid questions from the wave before and after the design switch, show that matrix designs are associated with higher levels of straightlining compared to single-item designs. Our preliminary analyses, based on coding by five survey methodology experts, classify 22.2% of these questions as exhibiting plausible straightlining, with the remainder showing implausible patterns. Further analyses investigate how these classifications correspond to conditions under which straightlining reflects optimized versus satisficing response behavior, offering deeper insights into the role of question characteristics. This research enhances questionnaire design and the accurate identification of low-quality responses, addressing gaps in linking question characteristics to straightlining plausibility. Political Communication on Social Media: Analysis of Strategies in the Bavarian State Election Campaign 2023 University of Regensburg, Germany Relevance & Research Question The research confirms Instagram’s importance as a platform for political outreach in Bavaria as well, with high interaction rates and active participation from candidates across relevant parties.
While all identified communication motives were present, their frequency and effectiveness varied. Posts focusing on personal self-presentation and political content, such as policy issues and party-related topics, were dominant. In contrast, fundraising and internal communication were rare, reflecting their lower relevance in this electoral context. Regression analysis revealed that personal self-presentation significantly boosted engagement, aligning with findings that informal content fosters relatability and connection. Negative campaigning also had a slight positive effect, capturing audience interest in the Bavarian election context. Posts focused on key policy issues and thematic content, while frequent, had little impact on engagement rates. This supports findings in social media research indicating that personal and informal content resonates strongly with audiences, likely because it promotes a sense of connection and relatability. The findings also suggest that the identified communication strategies alone are insufficient to explain the varying success of political posts in terms of engagement rates and reach. It appears that other post characteristics - such as technical factors (e.g., video format or interactive elements), emotional triggers, or the offline popularity of candidates - play a more significant role in driving engagement on Instagram. |
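The straightlining measures discussed in this and the preceding poster session can be sketched in a few lines: a strict indicator (identical answers on every item of a grid), within-respondent variance, and a simple plausibility flag based on whether the grid contains reverse-worded items. The example data and the reverse-wording information are assumptions for illustration.

```python
# Sketch of two straightlining measures plus a plausibility flag, combining the ideas
# from the Groningen and GESIS posters. Data layout and reverse-coding are assumptions.
import pandas as pd

grid = pd.DataFrame({            # answers of four respondents to a five-item grid (1-5 scale)
    "item1": [3, 5, 2, 4],
    "item2": [3, 5, 3, 4],
    "item3": [3, 5, 2, 4],
    "item4": [3, 5, 4, 4],
    "item5": [3, 5, 2, 4],
})
grid_has_reversed_items = True   # e.g., item4 is reverse-worded (assumption)

strict = grid.nunique(axis=1).eq(1)    # same answer on every item of the battery
variance = grid.var(axis=1)            # within-respondent variance per battery

# Straightlining on a grid with reverse-worded items is treated as implausible
# (satisficing); on same-direction grids it may reflect genuine attitudes.
implausible = strict & grid_has_reversed_items
print(pd.DataFrame({"strict": strict, "variance": variance.round(2),
                    "implausible": implausible}))
```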
2:30pm - 3:30pm | 4.3: Poster Session Location: Foyer OG |
|
Modular survey design: Experimental evidence from the German Internet Panel (GIP) 1University of Mannheim, Germany; 2forsa Gesellschaft für Sozialforschung und statistische Analysen mbH Relevance & Research Question Methods & Data Results Added Value gxc - an R package for spatial linking of Earth observation data with social indicators GESIS - Leibniz Institute for the Social Sciences, Germany Relevance & Research Question The unique feature of the tool is the possibility of carrying out both geographically and temporally medium- to high-resolution queries, which at the same time function efficiently on simple workstations. Our workflow testing has identified five major levers: parameter type, indicator intensity, focal time period, baseline time period, and spatial buffer. Flexibility on these five attributes will be maximized for users. The tool also offers the functionality to automatically derive spatio-temporal links with other georeferenced data (e.g., surveys, digital behavioral data). Users benefit from the core variables integrated into the interface for social research. Examples include data on local air quality and pollutants, extreme weather events, or land use changes. Results Understanding Redemption Patterns: A Study of Points-Based Incentive Schemes in Online Panel Surveys IAB Nürnberg, Germany Research Question Many online surveys offer incentives to enhance response rates, betting on stronger motivation for a response once respondents' participation costs are rewarded. Commonly, respondents receive incentives such as cash or vouchers. Additionally, online panel surveys may include a points-based incentive program allowing respondents to accumulate reward points throughout the study and redeem them anytime to get a shopping voucher. We would like to address the following research questions: What are the distinct patterns of reward point redemption among online survey participants, and how can these be categorized into behavioural clusters? How do demographic and socioeconomic factors influence reward point redemption behaviours? Methods In 2023, the Institute for Employment Research in Germany launched a new online panel survey of the German workforce (IAB-OPAL) using a push-to-web approach. The quarterly survey utilises a post-paid points-based incentive program, allowing respondents to earn reward points in their accounts after completing the survey. They can collect these points over time and redeem them for shopping vouchers from various providers at their convenience. We comprehensively assess respondents' redemption behaviours across five survey waves using individual tracking data on inflows and outflows of reward points for 13,513 panelists. First, we analyse recurring redemption patterns and identify distinct behavioural clusters by applying time series k-means. Second, we explore other dimensions of redemption behaviour, such as the timing of point redemption across different demographic groups and specific temporal trends. Lastly, we investigate the demographic and socioeconomic drivers of redemption behaviours, giving special attention to the respondents who collect reward points without redeeming them. Results The analysis reveals several key insights into reward point redemption behaviours within the IAB-OPAL panel survey. Respondents exhibited a wide range of behaviours, from frequent small redemptions to rare but large redemptions. Through clustering methods, distinct behavioural groups were identified.
Added Value Our findings shed light on the dynamics of reward point redemption in online panels and have practical implications. We provide valuable guidance for designing online panel surveys that may incorporate a points-based incentive program. Moreover, our results can assist survey practitioners in budget planning, decision-making, and fieldwork preparation. |
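A minimal sketch of clustering redemption trajectories with time-series k-means, in the spirit of the IAB-OPAL analysis above. The simulated point balances, the number of clusters, and the choice of the tslearn library are assumptions; the study's actual features and preprocessing may differ.

```python
# Sketch: cluster simulated reward-point balances over five waves with time-series k-means.
# The trajectories, noise level, and the tslearn library choice are illustrative assumptions.
import numpy as np
from tslearn.clustering import TimeSeriesKMeans

rng = np.random.default_rng(1)
n_waves = 5

def trajectory(kind):
    if kind == "hoarder":                       # collects 10 points per wave, never redeems
        return np.full(n_waves, 10.0).cumsum()
    if kind == "frequent":                      # redeems after every wave
        return np.full(n_waves, 10.0)
    balance, out = 0.0, []                      # "rare": one large redemption in wave 4
    for wave in range(n_waves):
        balance += 10.0
        if wave == 3:
            balance = 0.0
        out.append(balance)
    return np.array(out)

X = np.stack([trajectory(k) for k in ["hoarder", "frequent", "rare"] * 50])
X = X + rng.normal(scale=0.5, size=X.shape)     # add noise

km = TimeSeriesKMeans(n_clusters=3, metric="euclidean", random_state=0)
labels = km.fit_predict(X)
print(np.bincount(labels))                      # cluster sizes
```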
2:30pm - 3:30pm | 4.4: Poster Session Location: Foyer OG |
|
Children's Social Media Behavior and Sociodemographics: A Segmentation Approach FH Wiener Neustadt GmbH, Austria Relevance & Research Question The growing influence of social media has sparked increasing concerns about its impact on digital well-being, particularly among young people. A survey from 2023 revealed that adults are highly concerned about the effects of social media on their children's mental health (Knight Foundation, 2023). Our literature review highlighted key areas of existing research, such as the relationship between parents' social media sharing and children's privacy (Ong et al., 2022), the adverse effects of digital media on toddlers (e.g., Barr, 2022), emotional developmental delays, anxiety, and depression among children (O’Riley et al., 2018; Primack et al., 2017), and the short-term benefits of social media breaks (e.g., Brown & Kuss, 2020). Building on these findings, the current research aims to (1) explore adolescents’ (10-14 years) social media behavior and (2) identify different social-media-user segments based on their social media behavior and sociodemographic variables. Methods & Data Results Added Value Exploring adolescents’ social-media-behavior is critical from both a scientific and a practical perspective by providing insights into how different segments of adolescents engage with social media, enabling tailored interventions for digital well-being, education, and digital literacy. Additionally, it supports policymakers in creating ethical, segment-specific strategies for communication, product design (e.g. apps), and online safety regulations. Smart Survey Implementation: Experiences from experiments in three European countries 1University of Mannheim, Germany; 2Statistics Norway; 3Utrecht University Relevance & Research Question Smart surveys combine surveys with smart elements from sensors, for example, the use of the smartphone camera for receipt scanning in a household budget survey and the use of geolocation tracking to identify activities in time use surveys. At this point, relatively little is known on how to best implement smart surveys in the general population for official statistics, and what influence the different features of smart surveys have on participation behavior. Methods & Data In 2024, fieldwork experiments were conducted in Norway, Belgium, and Germany to test various options in how to design and field smart surveys as part of national household budget surveys and time use surveys. The experiments in the three countries varied several design features to test their effect on recruitment and participation rates to smart surveys. In Norway, the use of different platforms from which the data collection app could be accessed and the use of CATI interviewers for recruitment and follow-up was tested. In Belgium and Germany different recruitment protocols were tested, including the use of different appeals in the invitation letters focusing on features of the smart survey (e.g., use of the camera to scan receipts) and secondary data collection modes (e.g., paper questionnaires instead of app). Results We find that recruitment and participation rates vary across countries, and that the differences between within-country experimental conditions are relatively small. The poster will present results on differential recruitment and participation rates and nonparticipation bias in the three countries. 
Added Value This research is part of the Smart Survey Implementation (SSI) project, funded by EUROSTAT, which aims to enhance data collection for official statistics across Europe through digital innovation. This experiment specifically addresses recruitment challenges in app-based surveys and evaluates the potential of mobile technology to streamline participation in official household budget and time use surveys. Scenario-Based Measures of Smartphone Skills in Online Surveys University of Mannheim, Germany Relevance & Research Question Digital skills have become important for navigating in today’s information society. While prior digital inequality research has mostly focused on studying general internet uses and skills, research on smartphone-specific inequalities is still scarce. In addition, existing measurement instruments mostly rely on survey-based self-reports or small-scale laboratory-based performance tests that are susceptible to measurement and representation errors. In this study, we examine the feasibility of using novel scenario-based measures to evaluate the level of smartphone skills in the general population. Scenario-based measures evaluate smartphone skills by assessing how well respondents perform a set of smartphone activities described in a hypothetical situation. Methods & Data Data were collected in the German Internet Panel, a probability-based online panel of the general population aged 16-75 in Germany, in March 2022. Respondents were asked to answer three scenario-based questions and rate their general smartphone skills. The scenario-based questions asked respondents to correctly order a set of steps to carry out smartphone activities, such as buying a train ticket with an app that is not yet installed on their device. We examine response distributions and correlations between the scenario-based and self-reported measures. We also assess whether predictors of smartphone skills differ between the two measures. Results The scenario-based and self-reported measures are significantly positively correlated and measure the same underlying construct as determined by an exploratory factor analysis. Compared to self-reports, the scenario-based measures, however, have substantially greater rates of item-nonresponse. Older and less educated smartphone owners are significantly less likely to respond to the scenario-based questions. The predictors of smartphone skills differ by respondents’ sociodemographic characteristics across the two measures. Older, female, and more educated respondents are more likely to underreport their smartphone skills in the self-report compared to the scenario-based questions. Added Value Methodologically, we demonstrate the feasibility of using scenario-based measures of smartphone skills in an online survey. Substantively, we contribute to the growing body of research on the second-level smartphone divide. Who Donates Their Google Search Data? Participation in a Data Donation Study During the 2025 German Federal Election 1GESIS, Germany; 2GESIS Germany RELEVANCE & RESEARCH QUESTION Data donation is a relatively new, user-centered approach to METHODS & DATA: Wave 60 participants from the GLES Tracking pre-election survey (n ≈ 2,000, CS, RESULTS: Data is collected one week after the 2025 German Federal Election (February 23, 2025). At ADDED VALUE: This study provides insights into how sociodemographic characteristics relate to |
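The validation logic of the scenario-based smartphone-skills study above can be illustrated with simulated data: correlate an aggregate scenario score with the self-rating and check whether all measures load on a single factor. Everything below is simulated and assumed for illustration; it is not the German Internet Panel data or the study's exact analysis.

```python
# Sketch: correlation between scenario-based and self-reported smartphone skills plus a
# one-factor model. Simulated data and the one-factor assumption are illustrative only.
import numpy as np
from scipy.stats import spearmanr
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(7)
n = 1000
skill = rng.normal(size=n)                              # latent smartphone skill

# Three scenario tasks (share of correctly ordered steps) and one self-report (1-5).
scenarios = np.clip(0.6 + 0.2 * skill[:, None] + rng.normal(0, 0.15, (n, 3)), 0, 1)
self_report = np.clip(np.round(3 + skill + rng.normal(0, 0.8, n)), 1, 5)

rho, p = spearmanr(scenarios.mean(axis=1), self_report)
print(f"Spearman correlation scenario vs. self-report: {rho:.2f} (p={p:.3f})")

X = np.column_stack([scenarios, self_report])
loadings = FactorAnalysis(n_components=1, random_state=0).fit(X).components_
print("One-factor loadings:", loadings.round(2))
```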
2:30pm - 3:30pm | 4.5: Poster Session Location: Foyer OG |
|
Pay to Stay? Examining the Long-Term Impact of Initial Recruitment Incentives on Panel Attrition German Center for Integration and Migration Research, Germany Relevance & Research Question This study explores the long-term effects of incentives provided during the recruitment wave of a panel study, focusing on their impact on panel consent and subsequent dropout risks. Specifically, it examines how the amount and conditionality of incentives influence participation decisions and whether transitioning from a pre-paid incentive in the recruitment wave to a post-paid incentive in subsequent waves affects dropout rates. These questions are critical for developing sustainable strategies for panel recruitment and retention, particularly in the context of probability-based online access panels. Methods & Data The analysis is based on data from the DeZIM.panel, a probability-based online access panel in Germany. Logistic regression models were used to examine panel consent and participation behaviours across different experimental conditions. The dataset includes responses from 9,168 participants in the recruitment wave and longitudinal data spanning 70,926 person-years across subsequent waves. The experimental design varied incentive types (prepaid vs. postpaid) and amounts (€5 vs. €10), enabling a detailed assessment of their effects on respondent behaviour. Results The findings reveal that pre-paid incentives significantly reduce panel consent compared to post-paid incentives. However, the amount of the incentive (€5 vs. €10) does not significantly influence consent rates. Long-term analyses show no substantial effects of either the incentive type or amount on participation rates across subsequent panel waves. Furthermore, switching from a prepaid incentive in the recruitment wave to a postpaid incentive in the first panel wave does not increase dropout risks in future waves. These results suggest that higher or unconditional incentives do not necessarily yield higher response rates or long-term participation in panel studies. Added Value This study contributes to the literature on survey methodology by challenging the common assumption that higher or unconditional incentives automatically enhance response rates. The findings emphasize the importance of designing incentive strategies that balance short-term participation gains with the long-term sustainability of panel studies. By providing empirical evidence from a probability-based online panel in Germany, the study highlights the potential for more cost-effective and nuanced incentive policies in maintaining panel participation over time. Socially Desirable Responding in Panel Studies – How Does Repeated Interviewing Affect Responses to Sensitive Questions? GESIS Leibniz Institute for the Social Sciences, Germany Relevance & Research Question Using Apps to Motivate Respondents: Experiences from an Experiment in a Refugee Panel Study 1Leibniz Institute for Educational Trajectories, Germany; 2infas Institute for Applied Social Science, Germany Relevance & Research Question Be it online panels or traditional panel studies, the commitment of participants to a longitudinal study is essential for the stability of a panel. In many panels, respondents are therefore contacted by letter at regular intervals with information about the study. 
With the growing importance of smartphones, tailored apps that use smartphones as a channel to communicate with respondents can help to keep in touch with them by sending notifications and messages about the study. In this context, the questions arise whether it makes sense to contact respondents more or less frequently via the app and how respondents react to notifications. Methods & Data In the study “Refugees in the German Educational System (ReGES)”, an experiment was conducted with the “my infas” app that varied the frequency of contact via the app. One experimental group of respondents was contacted 6 times, while the rest were contacted 12 times in the same period. The data from this experiment is used to cluster the participation behavior more precisely using a sequence analysis. Results Of the 2,740 respondents to whom we sent messages via the app as part of the experiment, only 271 people read at least one message. Due to this low number of participants, the potential for analysis is limited. Nevertheless, it is possible to draw some insights from this experiment into how respondents react to messages. Four clusters were identified: the One-Time Clickers, the Curious, the Awakened, and the Interested. If we look not only at whether the messages and the results were read, but also at when the messages were read, we see that many respondents read the messages very late (on average 28 days after receiving them). Added Value As many people do not react to messages at all, or only very late, the analyses suggest that using an app in panel studies is not a guaranteed success. Potential issues such as respondents turning off push notifications or uninstalling the app need to be considered, as time-critical messages sent via apps may not reach all respondents. |
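A compact sketch of the kind of logistic model described in the "Pay to Stay?" abstract above, with panel consent regressed on incentive timing and amount. The simulated data and coefficients are assumptions; only the 2x2 structure of the experimental design is taken from the abstract.

```python
# Sketch of a logistic regression of panel consent on incentive timing and amount,
# mirroring the described 2x2 design. Simulated data and effect sizes are assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 9168
df = pd.DataFrame({
    "prepaid": rng.integers(0, 2, n),     # 1 = prepaid, 0 = postpaid
    "ten_euro": rng.integers(0, 2, n),    # 1 = 10 euro, 0 = 5 euro
})
logit_p = -0.4 - 0.3 * df["prepaid"] + 0.05 * df["ten_euro"]
df["consent"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

model = smf.logit("consent ~ prepaid + ten_euro + prepaid:ten_euro", data=df).fit(disp=0)
print(model.summary().tables[1])
```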
3:30pm - 3:45pm | Break |
3:45pm - 4:45pm | 5.1: Innovations in Sampling Location: Hörsaal A Session Chair: Sebastian Lundmark, University of Gothenburg, Sweden |
|
Horoscoping and Sampling: Preregistered Exploration of the Impact of Birth Month on Research Outcomes via the ‘Whose Birthday Is Next’ Sampling Strategy 1GESIS - Leibniz Institute for the Social Sciences, Germany; 2Tilburg University, Netherlands; 3Utrecht University, Netherlands Relevance & Research Question A large corpus of studies across various domains has demonstrated a birth month effect, wherein individuals born in specific months display distinct outcomes compared to those born in other months with respect to areas such as health, socioeconomic status, and behavior. In contrast, the common use of birthday-based sampling methods (e.g., selecting respondents in a household whose birthday is next or was last) in large-scale surveys assumes that birth month is uncorrelated with outcome variables. If a birth month effect exists, this assumption may introduce bias, particularly when comparing groups with systematic differences in household size, such as non-Western immigrants and majority populations in Western Europe. We first develop a theoretical framework to explore the relationship between birth month effects and potential biases in birthday sampling designs. We conduct a preregistered empirical analysis using the LISS panel (Longitudinal Internet studies for the Social Sciences), a probability-based online panel of Dutch households. In the LISS panel, data is collected from all individuals within a household aged 16 and above. Through simulations across 12 different fieldwork periods (i.e., months of the year), we assess the extent of bias that might arise if the LISS panel employed the next-birthday sampling method instead. We examine 35 variables, including personality traits, health outcomes, and socioeconomic status, to evaluate the potential impact on research outcomes. Our analysis does not reveal evidence for a strong birth month effect for the selected variables. The simulations show that, for the Dutch context, the next-birthday sampling method does not introduce substantial bias for the variables of interest. Though a null finding, our study provides important insights for survey methodology. It suggests that, in the Netherlands, next-birthday sampling is unlikely to produce bias related to birth month effects, at least for the way these variables are commonly measured in social science surveys. This contributes to the ongoing discussion on sampling methods and enhances the reliability of results in large-scale surveys. Sampling Refugees in Countries of First Refuge – An International Snowball Sampling Approach with Multiple Target Populations Bielefeld University, Germany Relevance & Research Question As reported by the UNHCR for 2023, the majority of refugees are hosted in low- and middle-income countries (75%) and countries that neighbor their country of origin (69%). For refugees who are particularly vulnerable and unable to return to their country of origin, resettlement programmes aim to provide long-term prospects by resettling them to Western countries. Despite the fact that only a small proportion of those in need of resettlement are actually resettled (8% according to UNHCR), little is known about the living situation of those who are left behind. This study addresses this gap by conducting a web-survey targeting potential resettlement refugees using social contacts of already resettled refugees. Methods & Data The study uses addresses of all refugees resettled to Germany since 2013 (i.e., approx.
17,000) from the German Central Register of Foreigners (AZR) to invite participants via postal mail to a web-survey. This survey marks the start of a snowballing approach, where refugees in Germany function as seeds and are asked to forward a survey link (with a mobile “share” option) to up to three contacts, who still reside in countries of first refuge and may be eligible for resettlement (target population A). To reduce the risk of realizing too few cases – especially in the first step from Germany to abroad – participants are also asked to share the survey with refugees who may not be eligible for resettlement (target population B). In the subsequent steps, this snowballing process continues in the countries abroad. Results Although data collection is ongoing at the time of the GOR conference, findings from cognitive pretests and from data already collected will be presented, offering insights into the potential as well as the challenges of surveying refugees via international snowball sampling. Added Value This study contributes to the understanding of the living conditions of refugees while offering methodological insights into sampling hard-to-reach populations. By demonstrating how snowball sampling with multiple target populations can mitigate recruitment challenges, it provides valuable lessons for researchers focusing on vulnerable groups. Social Media Sampling for Quantitative Surveys in Hard-to-Reach Countries Bilendi & respondi, France Relevance & Research Question Traditional online survey panels often lack coverage in smaller or less digitally integrated countries, limiting researchers' ability to collect reliable data from these regions. Social media sampling presents a promising alternative for quantitative surveys in such contexts. This study investigates the feasibility, reliability, and potential biases of social media sampling as a data collection method. Using a multi-country survey spanning 18 nations—including Zimbabwe, Kazakhstan, Costa Rica, and Iceland—we address the question: Can social media sampling provide reliable insights into brand perception and societal measures in countries where online panels are unavailable? The study deployed a quantitative survey during June and July 2024 through targeted social media advertisements, optimized to recruit representative samples across diverse demographics. In total, n = 9,000 participants responded to a standardized questionnaire including brand image measures and selected questions from the World Happiness Report. Sampling quotas and algorithmic targeting ensured coverage of gender, age, regions, ethnicities, and income levels in each country. Reliability was assessed through comparative analysis against external data sources, where available, and consistency checks within the datasets. Results The results demonstrate that social media sampling can effectively generate diverse, balanced samples in countries lacking established online panels. Across the 18 countries, response rates and sample representativeness varied but were sufficient for robust analysis. Insights from brand image and happiness metrics revealed consistent trends across nations and offered valuable local context, while some limitations were noted, such as the underrepresentation of older rural populations, low-income groups, and some ethnicities in African countries. This research highlights the untapped potential of social media as a viable sampling solution for most of the investigated countries.
By showcasing a rigorous approach to survey design, execution, and evaluation, this paper contributes a practical framework for using social media to extend the reach of quantitative research globally. The findings are particularly relevant for researchers seeking solutions for data collection in emerging and underrepresented markets. |
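The simulation idea behind the birthday-sampling study in this session can be sketched as follows: within each household, select the member whose birthday comes next after a hypothetical fieldwork month and compare the selected subsample with the full sample. The household structure, outcome variable, and selection rule below are simplified assumptions, not the LISS-based implementation.

```python
# Sketch of a next-birthday sampling simulation. With no true birth-month effect in the
# simulated data, the selected subsample should closely resemble the full sample,
# consistent with the abstract's null finding. All data here are assumptions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(11)
n_households, fieldwork_month = 5000, 4                    # fieldwork in April (assumption)

members = pd.DataFrame({
    "household": np.repeat(np.arange(n_households), 2),    # two eligible adults per household
    "birth_month": rng.integers(1, 13, 2 * n_households),
    "outcome": rng.normal(size=2 * n_households),          # e.g., a personality score
})

# "Next birthday": smallest forward distance from the fieldwork month (modulo 12).
members["months_to_birthday"] = (members["birth_month"] - fieldwork_month) % 12
selected = members.loc[members.groupby("household")["months_to_birthday"].idxmin()]

bias = selected["outcome"].mean() - members["outcome"].mean()
print(f"Selected-sample mean minus full-sample mean: {bias:.3f}")
```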
3:45pm - 4:45pm | 5.2: Respondent Nudging and Incentives Location: Hörsaal B Session Chair: Georg-Christoph Haas, Institute for Employment Research, Germany |
|
Knock-to-nudge methods to improve survey participation in the UK University of Southampton, United Kingdom Relevance & Research Question The knock-to-nudge is an innovative method of household contact, first introduced during the COVID-19 pandemic when face-to-face interviewing was not possible. In this approach, interviewers visit households and encourage sampled units to participate in a survey through a remote survey mode (either web or telephone) at a later date. Interviewers may also collect contact information, such as telephone numbers or email addresses, or conduct within-household selection of individuals on the doorstep. This approach continued to be used post-pandemic, but there remains a knowledge gap regarding its advantages and disadvantages. It is still unclear whether the knock-to-nudge approach leads to improvements in sample composition and data quality. This study contributes to the under-researched area of knock-to-nudge methods. The results indicate that, when carefully designed and implemented, this approach can enhance recruitment efforts and improve the composition of the resulting samples in surveys. Recruitment incentive experiment of the probability-based panel Health in Germany: results on outcome rates, non-response bias and panel case costs 1Robert Koch-Institut, Germany; 2infas Institut für angewandte Sozialwissenschaft GmbH Relevance & Research Question The Robert Koch Institute (RKI) set up a probability-based panel infrastructure focused on public health research (‘Health in Germany’) with about 47,000 registered active panelists. Due to declining response rates in recent decades, incentives have become increasingly important. Incentive experiments are often carried out in order to achieve high data quality with a lower use of resources. Therefore, for the first recruitment study of the panel Health in Germany, the RKI conducted an incentive experiment with a random sub-sample in order to test the effectiveness of different incentive schemes. The central questions of the incentive experiment concern, differentiated by incentive group: (a) the response rate for panel registration, (b) the possible distortion due to non-response bias, and (c) the panel case costs. The study population comprises all persons aged 16 and over living in Germany. Around 170,000 addresses from the residents' registration offices were used as a random sample. The field period ran from January to May 2024. Some of the target persons were randomly selected for an incentive experiment with four groups of 1,440 individuals each. The incentives were either paid unconditionally beforehand (‘before’) or were linked to registration for the panel (‘after’). The incentive schemes of the groups were: (1) €5 before, €10 after, in cash; (2) €10 after, in cash; (3) €5 before, €10 after, as a voucher; (4) no incentive at all (control group). Results The incentive experiment replicates findings from existing incentive studies. The use of cash instead of vouchers and the unconditional payment (‘in advance’) of €5 substantially increases the willingness to participate (about 12 percentage points difference). Particularly in the hard-to-reach population group of people with a low level of education, a less biased sample composition can be observed in comparison to all other incentive groups. The results of the incentive experiment provide insights into how high data quality can be achieved with fewer resources.
In view of the large number of cases and probabilistic sampling, the findings can be transferred to similar epidemiological research projects. |
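A minimal sketch of comparing registration rates between two of the RKI incentive groups. The counts are invented for illustration; only the group size of 1,440 and the roughly 12-percentage-point gap echo figures mentioned in the abstract.

```python
# Sketch: two-proportion z-test comparing registration rates of an incentive group and
# the control group. The registration counts are hypothetical assumptions.
from statsmodels.stats.proportion import proportions_ztest

registered = [380, 210]    # hypothetical registrations: cash incentive group vs. control
invited = [1440, 1440]     # group sizes as described in the abstract

stat, pval = proportions_ztest(registered, invited)
rates = [r / n for r, n in zip(registered, invited)]
print(f"rates: {rates[0]:.1%} vs {rates[1]:.1%}, z={stat:.2f}, p={pval:.4f}")
```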
3:45pm - 4:45pm | 5.3: LLMs and Synthetic Survey Data Location: Hörsaal C Session Chair: Johanna Hölzl, University of Mannheim, Germany |
|
Synthetic Respondents, Real Bias? Investigating AI-Generated Survey Responses 1Lund University, Sweden; 2Utrecht University, The Netherlands Relevance & Research Question The idea of simulating survey respondents has lately been seen as a promising data collection tool in academia and market research. However, previous research has shown that LLMs are likely to reproduce human biases and stereotypes present in their training data. Because of this, we further investigate the potential benefits and challenges of creating synthetic response datasets by following two major aims: 1. investigate whether AI tools can replace real survey respondents, and if so, for which questions and topics, and 2. explore whether intentional prompts reveal underlying biases in AI prediction. Methods & Data We compare existing survey data from the German General Social Survey (Allbus) 2021 to AI-generated synthetic data created with the OpenAI model GPT-4. For this, we took a random sample of 100 respondents from the Allbus dataset and created a so-called AI-Agent for each. Each Agent was calibrated based on general instructions and individual background information (14 variables). We chose to predict three different types of outcomes, including a numerical, binary, and open text/string format, each of them carrying the potential to provoke certain biases, such as social desirability, gender, and age stereotypes. Furthermore, each item was tested across different contextual factors, such as AI model calibration and language settings. Results We found a deep lack of accuracy in the simulation of survey data for both numerical (r = -0.07, p = 0.6) and binary outcomes (χ²(1) = 0.61, p = 0.43, V = 0.1), while the explanatory power of the background variables for the predicted outcome was high for both the former (0.4) and the latter (0.25). Furthermore, we found no difference in the prediction accuracy between different input languages and AI model calibrations. When predicting open-text answers, individual background information was generally well considered by the AI tool. However, several potential biases became apparent, such as age, gender, and regional biases. Added Value Our research contributes to a more ethically responsible application of AI tools in data simulation, highlighting an urgent need for more caution in the already ongoing utilization of AI-generated datasets. Talk, talk, talk - unlocking AI for conversational research Human8 Europe, Belgium Relevance & Research Questions AI-moderation plugins can now also be used for conversational research. Opinions on conversational AI and AI-moderated interviews abound. These tools are often assumed to bring more speed and efficiency, or are considered an option to conduct qualitative research with large, quantitative-style samples. Human8 has used AI moderation technology and examined where and how it can provide benefits and what its limitations are.
Methods & Data In 2024, AI-moderated interviews were implemented in our research. The participant reads, or is read, a question and can then answer using voice; research participants record their responses instead of typing them. The AI automatically transcribes the feedback and goes one step further by probing intelligently, asking relevant follow-up questions that take into account both the project objectives we shared and the participant's feedback. The AI also helps with the data processing. In an A/B experiment with n=30, we compared traditional feedback in insight communities, where participants typically engage asynchronously with typed responses, to AI-moderated interviews, where participants respond vocally and receive dynamic, real-time follow-up questions from the AI.
Results We found that AI tools for conversational research are a real enrichment, linked to a variety of benefits and opening up new options for qualitative research. We captured twice as much data. Talking freely allowed participants to avoid over-rationalizing or filtering their emotions, resulting in feedback that was richer, more emotional, and more contextual. AI now enables us to use voice at scale. Participants were also highly satisfied and liked using the tool. The combination of voice, AI-driven probing, AI-supported processing of the voice data, and human analysis allowed us to unlock actionable insights and gave our client the depth they needed to fuel their activation strategy. Added Value Our results examine use cases of AI moderation technology and uncover what to consider when using these tools. We will share our learnings on how to unlock the potential of these tools with the audience and open the discussion. |
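The first abstract in this session calibrates one GPT-4 "agent" per Allbus respondent from individual background variables and then asks it to predict that respondent's survey answers. Below is a minimal sketch of how such a persona prompt could be assembled; the variable values and the ask_llm helper are hypothetical placeholders (the actual call would go through an LLM provider's API), not the authors' implementation.

```python
# Sketch: build a persona prompt from respondent background variables and
# ask an LLM to answer a survey item as that synthetic respondent.

def build_persona_prompt(background: dict, question: str, scale: str) -> str:
    # Turn background variables (e.g., age, gender, education, region) into
    # a short persona description the model should adopt.
    profile = "; ".join(f"{k}: {v}" for k, v in background.items())
    return (
        "You are answering a survey as the following person.\n"
        f"Profile: {profile}\n\n"
        f"Question: {question}\n"
        f"Answer format: {scale}\n"
        "Reply with the answer only, no explanation."
    )

def ask_llm(prompt: str) -> str:
    # Hypothetical placeholder for a call to an LLM API
    # (e.g., a chat-completion endpoint with the prompt as the user message).
    raise NotImplementedError

respondent = {"age": 47, "gender": "female", "education": "vocational degree",
              "region": "Bavaria", "employment": "part-time"}  # illustrative values
prompt = build_persona_prompt(
    respondent,
    question="How interested are you in politics?",
    scale="integer from 1 (not at all) to 5 (very strongly)",
)
# predicted_answer = ask_llm(prompt)  # then compare against the real respondent's answer
```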
3:45pm - 4:45pm | 5.4: Social Media Influence in War Times Location: Hörsaal D Session Chair: Zaza Zindel, Bielefeld University, Germany |
|
Rumor propagation across war stages: Influences of psychological, social, and mainstream media factors The Max Stern Yezreel Valley College, Israel Relevance & Research Question This study investigates the psychological and media-related factors influencing rumor propagation during the Israel-Hamas 2023-2024 war, comparing early (first month) and later stages (ninth month) of the conflict. Using the unprecedented case of prolonged conflict in Israel, we examine how situational anxiety, trait anxiety, and psychological closeness interact with media consumption patterns to affect rumor dissemination. The research explores how these relationships evolve over time, contributing to our understanding of information behavior during extended crisis situations. Methods & Data Data were collected through a longitudinal survey of 347 Jewish-Israeli participants recruited via the Midgam Panel, using stratified sampling aligned with Israeli Central Bureau of Statistics demographics. The initial sample of 500 participants achieved a 70% retention rate at the nine-month follow-up. Measurements included validated scales for state and trait anxiety (adapted from STAI), psychological closeness, media consumption across platforms, and rumor-spreading behavior. Analyses employed t-tests, correlations, and Hayes' PROCESS macro for mediation analysis. Results While rumor-spreading levels remained stable across time periods, significant decreases were observed in situational anxiety, psychological closeness, and both social and mainstream media consumption between early and late stages. Social media platforms consistently showed stronger associations with rumor spread than mainstream media, with correlation strengths increasing over time. In the early stage, psychological closeness and trait anxiety significantly predicted rumor spread, mediated by media consumption. However, in the later stage, these factors directly influenced rumor spreading without significant mediation effects, indicating an evolution in the psychological mechanisms driving information sharing. Added Value This study advances crisis communication theory by demonstrating how psychological and media factors evolve differently in their influence on rumor propagation during prolonged conflicts. The findings reveal that while overall rumor-spreading behavior remains stable, the underlying psychological and media-related mechanisms shift significantly. These insights contribute to developing more effective, stage-specific strategies for managing information flow during extended crises, particularly highlighting the distinct roles of social and mainstream media in different conflict phases. Social Media, Anxiety, and Ethnic Disparities - The Case of the Jewish and Arab Population Groups Following October 7 in Israel 1Max Stern Yezreel Valley College, Israel; 2University of Washington, USA; 3Bar Ilan University, Israel Relevance & Research Question Methods & Data What makes media contents credible? A survey experiment on the relative importance of visual layout, objective quality and confirmation bias for public opinion formation University of Konstanz, Germany Relevance & Research Question The emergence of social media has transformed the way people consume and share information. As such platforms widely lack mechanisms to ensure content quality, their increasing popularity has raised concerns about the spread of fake news and conspiracy beliefs – with potentially harmful effects on public opinion and social cohesion. 
Our research aims to understand the underlying mechanisms of media perception and sharing behaviour when people are confronted with factual vs conspiracy-based media contents. Under which circumstances do people believe a media content? Do traditional indicators of quality matter? Are pre-existing views more important than quality (confirmation bias)? How is perceived credibility linked to sharing behaviour? Methods & Data To empirically assess these questions, we administered a survey experiment to a general population sample in Germany via Bilendi in August 2023. As respondents with a general susceptibility to conspiracy beliefs are of major substantive interest, we made use of responses from a previous survey to oversample “conspiracy thinkers”. Respondents were asked to evaluate the credibility of different media contents related to three vividly debated topics: vaccines against Covid-19, the climate crisis and the Ukraine war. We analyze these evaluations regarding the quality of the content (measured by author identity and data source), its visual layout (newspaper vs tweet), and previous respondent beliefs on the respective topic to measure confirmation bias. Results Our findings suggest that the inclination to confirm pre-existing beliefs is the most important predictor of believing a media content, irrespective of its quality. This general tendency applies both to mainstream society and to “conspiracy thinkers”. However, according to self-reports the latter group is much more likely to share media contents they believe in. Added Value Methodologically, we use a survey experiment that allows us to vary opinion (in)consistency and objective quality of media contents simultaneously, meaning that we can estimate the relative effect of these features on the credibility of media contents. We provide insights into the underlying mechanisms of the often-debated spread of conspiracy beliefs through online platforms, with practical implications for public opinion formation. |
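The rumor-propagation study in this session tests whether media consumption mediates the effect of psychological closeness and trait anxiety on rumor spreading, using Hayes' PROCESS macro. Below is a minimal Python analogue of a simple single-mediator model with a bootstrapped indirect effect; it is not the PROCESS macro itself, and the variable names and simulated data are illustrative only.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def indirect_effect(df: pd.DataFrame) -> float:
    # a-path: predictor -> mediator; b-path: mediator -> outcome (controlling for predictor)
    a = smf.ols("media_consumption ~ closeness", data=df).fit().params["closeness"]
    b = smf.ols("rumor_spread ~ media_consumption + closeness",
                data=df).fit().params["media_consumption"]
    return a * b

# Simulated data standing in for the survey variables.
rng = np.random.default_rng(0)
n = 347
closeness = rng.normal(size=n)
media = 0.5 * closeness + rng.normal(size=n)                 # simulated a-path
rumor = 0.4 * media + 0.2 * closeness + rng.normal(size=n)   # simulated b- and c'-paths
df = pd.DataFrame({"closeness": closeness, "media_consumption": media, "rumor_spread": rumor})

# Percentile bootstrap of the indirect (a*b) effect.
boot = [indirect_effect(df.sample(frac=1, replace=True)) for _ in range(2000)]
print("indirect effect:", round(indirect_effect(df), 3))
print("95% bootstrap CI:", np.percentile(boot, [2.5, 97.5]).round(3))
```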
3:45pm - 4:45pm | I: Impact & Innovation Award I Location: Max-Kade-Auditorium Session Chair: Yannick Rieder, Janssen EMEA, Germany |
|
Optimizing Loyalty Programs with AI-Driven Insights: A Case Study Lakmoos AI and Prague University of Business and Economics, Czech Republic Relevance & Research Question Methods & Data To address this, Lakmoos AI designed a targeted survey combining synthetic data and customer segmentation analysis: 1. Targeted Survey Design: The survey explored customer opinions on current loyalty features, preferences for new incentives, and variations in preferences across demographic groups. 2. Synthetic Panel Data Simulation: Simulated customer profiles provided predictive insights into how different demographics might respond to features like cashback, reward points, and exclusive benefits. 3. Real-Time Data Validation: Survey findings were validated against real customer data to ensure insights were accurate and actionable. 4. Quantitative and Qualitative Integration: Alongside quantitative questions, open-ended prompts offered qualitative insights into customer priorities and suggestions for program enhancements. Results Key insights revealed:
Added Value The findings led to impactful changes for Raiffeisenbank: 1. Program Redesign: Enhanced loyalty programs now include more cashback options, personalized recommendations, and gamified features for younger users, boosting enrollment and engagement. 2. AI Integration: The successful pilot led to AI-driven research being integrated into Raiffeisenbank’s standard processes, enabling faster and more precise customer insights. 3. Synthetic Panels in Design Sprints: The adoption of synthetic panels in research democratized insights, making design sprints more inclusive and efficient. These measures strengthened Raiffeisenbank’s ability to create customer-centric, innovative loyalty programs, enhancing their competitive position in the market. From Words To Numbers: How To Quantitatively Size and Profile Qualitative Personas 1Factworks, Germany; 2Yahoo Relevance & Research Question Yahoo News sought to validate and size user personas based on preexisting qualitative findings to optimize product and marketing strategies. With a strong user base and brand identity, Yahoo News aimed to drive revenue growth. However, the team lacked a unified understanding of core users, which slowed innovation and delivery. They initially developed five personas through qualitative interviews but needed quantitative validation to determine their accuracy and distribution. Factworks was engaged to refine and enrich these personas for strategic prioritization. Methods & Data To quantify personas, Factworks translated qualitative insights into a structured survey administered to a representative Yahoo user sample. Instead of traditional segmentation techniques, the k-nearest neighbors algorithm was used to classify users based on similarity to preidentified personas. Respondents too dissimilar to any persona were excluded. This approach allowed Yahoo to size each persona group and prioritize them strategically. Statistical testing helped identify key distinguishing traits, creating more distinct personas. Additionally, a Typing Tool was developed, enabling persona classification via a short questionnaire in Excel for future use in individual and batch scoring.
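A minimal sketch of the persona-classification step described above, using scikit-learn's k-nearest neighbors classifier: labeled anchor respondents (or persona prototype profiles) serve as training data, new survey respondents are assigned to the closest persona, and respondents whose distance to the nearest anchor exceeds a cutoff are left unclassified. The persona names, feature values, and distance cutoff are illustrative assumptions, not Factworks' actual implementation.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Illustrative persona anchors: survey-item profiles with known persona labels.
X_anchor = np.array([[5, 1, 2], [4, 2, 1], [1, 5, 4], [2, 4, 5], [3, 3, 3]])
y_anchor = ["News Devotee", "News Devotee", "Casual Scroller",
            "Casual Scroller", "Headline Skimmer"]

# New survey respondents answering the same items.
X_new = np.array([[5, 2, 1], [2, 5, 5], [3, 1, 5]])

scaler = StandardScaler().fit(X_anchor)
knn = KNeighborsClassifier(n_neighbors=3).fit(scaler.transform(X_anchor), y_anchor)

X_new_scaled = scaler.transform(X_new)
labels = knn.predict(X_new_scaled)

# Exclude respondents too dissimilar to any persona: distance to the nearest
# anchor above an (illustrative) cutoff on the standardized scale.
distances, _ = knn.kneighbors(X_new_scaled, n_neighbors=1)
CUTOFF = 2.0
assigned = [lab if d <= CUTOFF else None for lab, d in zip(labels, distances.ravel())]
print(assigned)
```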
Added Value Yahoo applied these insights to redesign and modernize its Homepage, tailoring updates to the three target personas. Results from iterative experiments included:
Following the launch on June 13, 2024:
Intelligent Documents: Evolving Existing Research Workflows to Ease AI Adoption 1Inspirient; 2Verian Relevance & Research Question |
4:45pm - 5:00pm | Break |
5:00pm - 6:00pm | 6.1: Increasing Survey Response Location: Hörsaal A Session Chair: Barbara Felderer, GESIS, Germany |
|
A Simple Invitation: A Study on the Impact of Simplified Invitation Letters on the Willingness to Join a Probability-Based Web Panel in Sweden The SOM Institute, Sweden Relevance & Research Question Reducing nonresponse bias in the recruitment to probability-based online access panels is essential for any panel striving to achieve accurate inferential statistics. However, certain groups of the population tend to be more difficult to recruit. For example, research has identified foreign-born individuals and individuals with lower educational attainment as more difficult to recruit. Targeted efforts to enhance recruitment and reduce breakoffs in these groups might be especially efficient for reducing potential nonresponse bias. The present study assessed whether simplified invitation letters increased recruitment and reduced breakoffs in a probability-based web panel, and whether simplified language improved those rates even more among hard-to-recruit subgroups of the population. Methods & Data In a probability-based recruitment to the Swedish Citizen Panel (SCP) conducted in fall 2024, the sample was randomly assigned to one of two groups: one group (N = 9,000) received an invitation and reminder letter with the standard language typically used in SCP recruitment. The experimental group (N = 9,000) received a revised version of the letter, written in simpler language that avoided academic jargon and words with many syllables. Results Data will be collected in November-December 2024 and will be analyzed and reported in an updated abstract in early January 2025. The effects on recruitment and breakoffs will be analyzed for the full sample, as well as for subgroups based on register information on sex, age, education and immigrant background. Added Value The present experiment will demonstrate whether simplified invitation letters increase recruitment rates and decrease breakoffs in a probability-based online access panel, with a particular focus on the impact among hard-to-recruit populations. Simplified invitation letters may offer a cost-effective method to boost recruitment rates, reduce breakoffs, and reduce demographic skewness and, in turn, nonresponse bias. Picture this! The influence of stressing the camera feature in the mail invitation to an app-based household budget survey on participation behavior 1University of Mannheim, Germany; 2Destatis - Federal Statistical Office Germany, Germany Relevance & Research Question Ask Me Now or Lose Me Later – The Impact of Immediate Follow-Up on Participation Rates, Retention and Data Quality in Web Panels. University of Gothenburg, Sweden
Relevance & Research Question: Ensuring high participation rates and respondent retention is essential for sample and data quality in web panels. This study examined whether the time interval between recruitment and the invitation to a respondent’s first panel wave affected their likelihood of participating, their likelihood of unsubscribing from the panel, and the quality of their survey responses. Drawing on Construal Level Theory (CLT), we posit that a longer interval may increase psychological distance and therefore decrease engagement in the web panel. By varying the time between recruitment and the first survey invitation, we aim to present optimal strategies to enhance participation and reduce attrition in web panels. Methods & Data: This study employed an experimental design using a newly recruited (spring 2024) non-probability sample of panelists (N = 3,140) from the Swedish Citizen Panel. Half of the sample (n = 1,570) were randomly assigned to receive their first panel wave invitation shortly after being recruited, whereas the other half (n = 1,570) were assigned to receive their first invitation six months after being recruited. Both groups responded to the same survey in December 2024. Results: Data will be collected in December 2024 and will be analyzed and reported in an updated abstract in early January 2025. Added Value: Understanding the timing of follow-up surveys after recruitment may be essential for deciding whether panelists benefit from an immediate invitation to complete a panel wave shortly after being recruited or whether delaying the invitation may be detrimental to engagement. Our findings provide insights for the management of web panels and strategies for handling newly recruited panelists to reduce dropout rates and improve data quality. |
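The SOM Institute abstract above revises invitation letters into simpler language that avoids jargon and long words. One common way to quantify this for Swedish text is the LIX readability index (average sentence length in words plus the percentage of words longer than six characters); the sketch below compares two letter drafts. Treating LIX as the relevant measure is an illustrative assumption, not necessarily what the authors used, and the example sentences are invented.

```python
import re

def lix(text: str) -> float:
    """LIX readability index: words/sentences + 100 * long_words/words,
    where long words have more than six characters."""
    words = re.findall(r"[A-Za-zÅÄÖåäö]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100 * len(long_words) / len(words)

standard_letter = ("Vi genomför en vetenskaplig undersökning om samhällsutvecklingen. "
                   "Ditt deltagande är betydelsefullt för forskningens representativitet.")
simplified_letter = ("Vi gör en undersökning om hur Sverige mår. "
                     "Dina svar är viktiga. Det tar bara några minuter.")

print("standard letter LIX:  ", round(lix(standard_letter), 1))
print("simplified letter LIX:", round(lix(simplified_letter), 1))
```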
5:00pm - 6:00pm | 6.2: Mixed Mode and Mode Transitions Location: Hörsaal B Session Chair: Björn Rohr, GESIS - Leibniz Institute for the Social Sciences, Germany |
|
Large-scale social surveys without field interviewers in the UK: An evidence review 1University of Southampton, United Kingdom; 2University of Essex, United Kingdom; 3City, University of London, United Kingdom Relevance & Research Question Data collection organisations are shifting toward new approaches, with social surveys undergoing significant design and implementation changes. Since the COVID-19 pandemic, agencies have increasingly moved to online data collection due to dwindling response rates and rising fieldwork costs. A key challenge for self-completion general population surveys is the absence of field interviewers to facilitate recruitment and guide respondents through the survey process. This research examines the UK survey landscape, aiming to identify recruitment methods for self-administered surveys that can produce more representative samples of the general population.
Methods & Data We present findings from an information request sent to the UK’s nine most important survey agencies. We collected information on surveys without field interviewers conducted between 2018 and early 2024, including publicly available technical and methodological reports and other survey materials, along with internal reports provided by the agencies. We processed and codified this information, building a spreadsheet containing 144 instances of 59 longitudinal and cross-sectional surveys, along with 227 communication materials. Results Across the surveys in our dataset, responses were collected 57% online, 38% on paper, and 5% by telephone. Most surveys (84%) offer incentives to participants, with 92% being monetary and only 33% given unconditionally. Response rates vary widely – household-based cross-sectional surveys tend to have lower response rates (81% at 30% response or lower) than individual-based ones (47% at 30% or lower). Longitudinal surveys generally have the highest response rates. While only 35% of reports assess sample representativeness, the general trend confirms that mixed-mode surveys yield more representative samples than single-mode surveys. Added Value To our knowledge, this review is the first coordinated effort to collate and summarise recruitment strategies for surveys without field interviewers in the UK. It covers sampling design, communication strategies and materials, incentivisation, fieldwork procedures, response rates, and report quality assessments. Our dataset provides insights into the current state of survey practice and helps identify practices that might contribute towards higher response rates and better sample composition. Does web as first mode in a mixed-mode establishment survey affect the data quality? Institute for Employment Research, Germany Relevance & Research Question Due to declining response rates and higher survey costs, establishment surveys are (or have been) transitioning from traditional interviewer modes to online and mixed-mode data collection. Previous analyses have shown that mixed-mode designs maintain response rates at lower costs compared to face-to-face designs, but the question remains to what extent introducing the online mode affects measurement quality – this has rarely been addressed in the establishment survey literature. Methods & Data The Establishment Panel of the Institute for Employment Research (IAB) was primarily a face-to-face survey until 2018. Since then, the IAB has experimented with administering a sequential web-first followed by face-to-face mixed-mode design versus the traditional face-to-face design. We address our research question by using these data and comparing the survey responses from the single- and mixed-mode experimental groups to corresponding administrative data from employer-level social security notifications. The accuracy of survey responses in both mode designs is assessed and measurement equivalence is evaluated. In particular, we use a large set of open-ended variables on the numbers of employees with certain characteristics, and we additionally report on differences in accuracy between the individual web and face-to-face modes. Furthermore, we consider differences for several alternative data quality indicators, including item nonresponse, social desirability responding, and the use of filter questions. To account for selection and nonresponse bias, weights are used throughout the analysis; as sensitivity checks, the weights are estimated in different ways.
In addition to propensity scores, random forests and extreme gradient boosting were also applied. Results Preliminary results show that measurement error bias in online interviews is sometimes larger than in face-to-face interviews, but comparing the mixed-mode design as a whole to the face-to-face design, the difference is no longer significant. Looking at sensitive questions, it cannot be confirmed that online respondents answer in a more socially desirable way. Further findings indicate slightly larger item nonresponse in the online mode, but when the sequential mode design is considered, no difference can be found. Added Value Thus, the study provides comprehensive insights into data quality for mixed-mode data collection in establishment surveys and informs survey practitioners about the implications of switching from single- to mixed-mode designs in large-scale establishment panels. Examining Differences in Face-to-Face and Self-Administered Mixed-Mode Surveys: Insights from a General Social Survey GESIS, Germany Relevance & Research Question General social surveys are traditionally conducted face-to-face, maintaining long time series for tracking public opinion trends. To ensure comparability, survey designs typically change minimally over time. However, face-to-face surveys have been experiencing declining response rates and higher costs. As a result, self-administered mixed-mode designs have gained popularity due to their ability to circumvent these challenges. Since switching modes is a major methodological change, investigating data comparability with the original mode is critical. Furthermore, self-administered mixed-mode designs can be implemented in two ways: concurrent or sequential. Each results in different proportions of web and mail survey responses. This raises the question: Does this difference in proportions affect comparability with face-to-face surveys? Methods & Data This study uses the German General Social Survey (ALLBUS) 2023, which surveys the general population aged 18 and older and is traditionally conducted face-to-face. In 2023, ALLBUS included three randomized experimental groups: (1) face-to-face, (2) concurrent self-administered mixed-mode (mail and web), and (3) sequential self-administered mixed-mode (mail and web). This study examines data comparability by evaluating differences in nonresponse bias, sample composition, and measurement between face-to-face and the two mixed-mode designs. Results Overall, both self-administered mixed-mode designs produce similar results, with both showing slight strengths. The sequential design is slightly more similar to the face-to-face design in terms of nonresponse bias and sample composition. In contrast, the concurrent design achieves slightly smaller measurement differences compared to the face-to-face design. Added Value This study offers valuable insights into the shift from face-to-face to self-administered mixed-mode designs by comparing concurrent and sequential approaches. It highlights their strengths in maintaining data comparability, with both designs producing similar results overall. This indicates that the web and mail modes are comparable, since the proportion of these two modes varies between designs while the results remain similar.
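The IAB establishment-panel abstract above estimates nonresponse weights in several ways, including via random forests and extreme gradient boosting, as sensitivity checks. Below is a minimal sketch of one such variant, inverse probability weighting with response propensities from a random forest; the covariates and data are illustrative placeholders, not the IAB's actual specification.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n = 5000
frame = pd.DataFrame({
    "establishment_size": rng.integers(1, 500, n),   # illustrative frame covariates
    "industry": rng.integers(1, 20, n),
    "region": rng.integers(1, 16, n),
})
# Placeholder response indicator (in practice: observed participation in the survey).
responded = rng.random(n) < 0.3

# Estimate response propensities from frame covariates with a random forest.
rf = RandomForestClassifier(n_estimators=200, min_samples_leaf=50, random_state=0)
rf.fit(frame, responded)
propensity = rf.predict_proba(frame)[:, 1].clip(0.01, 0.99)  # guard against extreme weights

# Inverse probability weights for respondents, normalized to the respondent count.
weights = 1.0 / propensity[responded]
weights *= len(weights) / weights.sum()
print(pd.Series(weights).describe())
```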
|
5:00pm - 6:00pm | 6.3: Survey Innovations Location: Hörsaal C Session Chair: Almuth Lietz, German Center for Integration and Migration Research, Germany |
|
Optimising Online Time Use Surveys: Balancing Quality, Efficiency, and Inclusivity National Centre for Social Research, United Kingdom Relevance & Research Question Key words: time use survey, online diary data collection Time-use surveys provide invaluable insights into how individuals allocate their time across various activities. Traditionally, paper-based diaries, often administered through face-to-face interviewers, have been the primary method for gathering such data. Advances in technology have enabled the development of online diary tools, but while digital tools offer advantages like reduced costs, cleaner data, and increased flexibility, they also raise concerns about data quality and inclusivity. In recent years, two time-use surveys using online diary tools have been developed and used in the UK, providing a unique opportunity to explore the benefits and challenges of this methodology. This study assesses the impact of design choices on the quality and efficiency of online time-use surveys. By examining factors such as incentivisation, completion mode, use of support materials, and user interface, we identify strategies for minimising respondent burden while maximising data quality and response rates. Methods & Data Key words: time-use data, respondent burden, respondent journey, desk review, probability-based panel This study analyses eight waves of time-use surveys conducted in the UK between 2020 and 2023, using samples from NatCen's probability-based panel. A desk review of the online diary tools, participant materials, and analysis of available data (response rates, sample profiles, split experiments, paradata, and respondent feedback) was conducted to assess the performance of the fieldwork design and evaluate the respondent journey. Results Key words: mobile-first design, incentives, response rates, representativeness Several key factors influence the success of online time-use surveys. Invitations and reminders must be timely and effective. Online tools should be designed with a mobile-first approach. Printed support materials can guide participants and improve data quality. A telephone fieldwork option can significantly boost response rates and improve sample representativeness. While higher incentives can increase response rates, their effectiveness diminishes with increasing amounts. However, these benefits must be weighed against additional costs. Added Value Key words: practical considerations, time-use online tool, data quality We provide practical considerations and recommendations for the design of online time-use studies, ensuring they meet diverse needs and maximise data quality and sample representativeness. What do participants refer to when asked about their place of residence? 1University of Kaiserslautern-Landau, Germany; 2LMU Munich, Germany Relevance & Research Question Social research is increasingly recognizing the relevance of spatial context to social phenomena. Thus, many surveys ask respondents about neighborhood characteristics. An open question, however, is the appropriate scale of the spatial context: Is it the immediate neighborhood, the municipality, or the county that matters? We investigate the consistency between respondents' subjective perceptions and objective characteristics across these spatial scales. Additionally, we examine how these perceptions differ between socio-demographic groups. Methods & Data The analysis is based on data from the GLEN study, a nationally representative panel study started in 2024.
In the self-administered push-to-web survey, respondents were asked to rate their place of residence on a number of characteristics, including green space, labor market, mobility, and health care accessibility. We test whether the respondents’ subjective ratings are consistent with administrative data from official statistics at different spatial scales. Differential effects are tested for dimensions such as age, employment status, and urban vs. rural residence. Results We contribute to the understanding of spatial mechanisms by identifying the spatial scale that best aligns with respondents’ subjective perceptions, and by highlighting how the area of reference may vary across demographic groups. Our results can offer guidance to survey practitioners regarding the design of neighborhood questions as well as the selection of spatial context data as auxiliary variables. Will harmony last? - Harmonizing time series survey data with equating under challenging patterns of data availability GESIS - Leibniz-Institut für Sozialwissenschaften, Germany Relevance & Research Question Survey researchers often wish to combine survey data from multiple sources to create longer time series. One central problem when combining survey data is that variables of interest are measured with similar but not identical questions. Differences in the question text or the response options make data from questions incomparable, even though they measure the same construct. In this study, we evaluate the harmonization method Observed Score Equating in a Random-Groups Design (OSE-RG) for time series harmonization. OSE-RG aligns differences between survey questions by transforming the response scale of one question into the format of another question, which requires data from both questions at the same point in time. The challenge with time series is that we often do not have access to data from both questions at all points in time. To still use OSE-RG for time series harmonization, we need to re-use existing harmonizations at times where only data from one question is available. Thus, our research question is as follows: Is it possible to re-use existing OSE-RG harmonizations over time? Methods & Data To explore the re-usability of OSE-RG harmonizations over time, we harmonize time series of ten pairs of survey questions from three German general population survey programs over a period of 14 years. The central idea is that we create a response scale transformation in one year and then re-use the response scale transformation to harmonize data from other years, tracking harmonization error in the process. Results We find that OSE-RG harmonizations are re-usable over time for some questions, for example questions measuring general health, but not for others, for example political interest questions. We conclude that OSE-RG is a viable choice for harmonizing time series survey data. However, researchers need to be aware of reduced re-usability of OSE-RG in time series harmonization as a potential source of bias. Added Value The added value is threefold: First, we empirically demonstrate how time series survey data can be harmonized using OSE-RG. Second, we pinpoint the re-usability of harmonizations over time as a potential source of bias. Third, we discuss consequences for different patterns of data availability. |
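The harmonization abstract above uses Observed Score Equating in a Random-Groups Design (OSE-RG), whose core idea is to map each score on one response scale to the score on the other scale that has the same percentile rank in its randomly equivalent group. Below is a heavily simplified sketch of that mapping; real OSE-RG applications add presmoothing and careful handling of discrete scales, and the example data are invented.

```python
import numpy as np

def equate_equipercentile(scores_x: np.ndarray, scores_y: np.ndarray,
                          points_x: np.ndarray) -> np.ndarray:
    """Map scale points of question X onto the scale of question Y by matching
    percentile ranks across the two random groups (simplified OSE-RG idea)."""
    # Percentile rank of each X scale point in the X group (midpoint convention).
    ranks = np.array([(np.mean(scores_x < p) + np.mean(scores_x <= p)) / 2 for p in points_x])
    # Corresponding quantiles of the Y distribution.
    return np.quantile(scores_y, ranks)

rng = np.random.default_rng(0)
# Random groups: question X uses a 4-point scale, question Y an 11-point (0-10) scale.
scores_x = rng.integers(1, 5, 1000)    # responses to X in group A
scores_y = rng.integers(0, 11, 1000)   # responses to Y in group B

points_x = np.array([1, 2, 3, 4])
equated = equate_equipercentile(scores_x, scores_y, points_x)
for p, e in zip(points_x, equated):
    print(f"X = {p}  ->  Y-equivalent approx. {e:.2f}")
```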
5:00pm - 6:00pm | 6.4: LLMs as Analysis Tools Location: Hörsaal D Session Chair: Jeldrik Bakker, Statistics Netherlands, Netherlands, The |
|
Going Multimodal: Challenges (and Opportunities) of Streamlining Deliverable Production with AI 1Inspirient; 2Verian Relevance & Research Question Among end clients and decision makers, each individual engages differently with the results of market research studies or opinion polls: Some read the headlines, some look at the charts, some read the entire report. The Artificial Intelligence (AI) community has made strides in automating text generation, but its promise of efficiency gains comes with the caveat of lacking trustworthiness. Hence, for this contribution we ask three questions: How can we leverage AI to accurately describe our quantitative results? How can we tune this output so that it helps us produce reports more efficiently? How can we ensure AI-generated text can be trusted to be correct? Methods & Data Verian Germany and Inspirient have worked together for the past three years to make Generative AI applicable to quantitative survey data through automated statistical reasoning. These prior results comprise both visual output of analyses (incl. charts) as well as corresponding formal chains of reasoning steps, which ensures that results can be linked back to source data and thus trusted. We now combine these into “speaker notes” for a Large Language Model (LLM) that we then utilize to generate descriptive textual output for each analytical result. Results Our system is able to generate descriptive text for typical charts that one may find in a survey report. We explain how to set up LLMs to accurately link their output back to their speaker notes, and how to control for this as part of post-processing. In our evaluation, we illustrate which AI speaker notes are required for which kind of output, which aspects can be controlled via prompting, and we discuss to what extent client-ready output is achievable. Added Value While our approach does not match the nuanced writing style and proficiency of an experienced human researcher, we can claim with confidence that the speed-up in getting to draft-level output is tremendous – in particular for lengthy reports. We thus envision a setup in which researchers merely need to fine-tune an AI-written draft report, incl. charts and accompanying text, while knowing that the factual statements in this deliverable can be trusted. Meet Your New Client: Writing Reports for AIs 1Inspirient; 2Q Agentur für Forschung Relevance & Research Question As organizations adopt Retrieval-Augmented Generation (RAG) for their Knowledge Management Systems (KMS), traditional market research deliverables face new functional demands. While PDFs of reports and presentation slides have effectively served human readers, they are now also “read” by AI systems to answer questions of human users, a trend that will only increase going forward. In order to future-proof the reports that are delivered today, this study evaluates information loss when transferring market research insights through different delivery formats into RAG systems. This open question emerged from a discussion at the DGOF KI Forum between market research buyers and suppliers. Methods & Data We frame the transfer of information, incl. research insights, into clients’ KMS as a signal processing problem. The fidelity of the information transfer depends on the data format: Some formats, e.g., pictures of charts, incur an information loss while other formats, e.g., tables, do not. We model this loss using benchmarks for information extraction from different file formats and from graphs.
Further, we assess the needs handled by current reporting formats and contrast them with new needs from RAG. This is done through expert interviews and an analysis of research reports from different institutes. Results Findings indicate that classic formats, while valuable for human interpretation, are not optimal for AI systems. Key limitations include difficulties in extracting information from graphs and styled slides, which lead to altered, de-contextualized, or lost information. Text-heavy reports offer greater compatibility, yet are not optimal either, e.g., when methodology is presented separately from results. Our study suggests that transitioning to complementary special-purpose deliverables, designed explicitly for AI, enhances the retrieval accuracy of research insights within KMS, and thus for the client. Added Value The choice of reporting format is critical for delivering insights to market research clients, especially now that these reports will also be consumed by AI. This study yields insights into new demands and improved formats for reports from suppliers. It also supports buyers of reports in their assessment of proposals and effective ingestion of results into their KMS for optimal information retrieval going forward. |
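The first abstract in this session feeds structured "speaker notes" (the statistics and reasoning behind each chart) to an LLM and then checks during post-processing that the generated text stays linked to those notes. Below is a minimal sketch of that pattern; the ask_llm helper is a hypothetical placeholder for whichever LLM API is used, and the check shown (verifying that every number in the draft appears in the notes) is only one simple way to flag unsupported claims, not the authors' pipeline.

```python
import re

def build_prompt(speaker_notes: dict) -> str:
    # Collapse the structured notes into a constrained instruction for the LLM.
    facts = "\n".join(f"- {k}: {v}" for k, v in speaker_notes.items())
    return (
        "Write two sentences describing the chart for a survey report.\n"
        "Use only the facts below; do not add numbers that are not listed.\n"
        f"{facts}"
    )

def ask_llm(prompt: str) -> str:
    # Hypothetical placeholder for the actual LLM call.
    raise NotImplementedError

def numbers_supported(text: str, speaker_notes: dict) -> bool:
    # Post-processing check: every number in the draft must appear in the notes.
    allowed = set(re.findall(r"\d+(?:\.\d+)?", " ".join(map(str, speaker_notes.values()))))
    used = set(re.findall(r"\d+(?:\.\d+)?", text))
    return used <= allowed

notes = {
    "question": "Satisfaction with public transport, by age group",
    "finding": "Respondents aged 18-29 report 62% satisfaction, those aged 60+ report 41%",
    "base": "n = 2043, fieldwork March 2024",
}
prompt = build_prompt(notes)
# draft = ask_llm(prompt)
# if not numbers_supported(draft, notes): flag the chart text for manual review
```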
5:00pm - 6:00pm | II: Impact & Innovation Award II Location: Max-Kade-Auditorium Session Chair: Yannick Rieder, Janssen EMEA, Germany |
|
No Bitter Notes: Ensuring Data Quality in Stiegl’s Brand Tracking Study with ReDem 1Redem GmbH, Austria; 2Media1 Relevance & Research Question Methods & Data Results The Teilhabe-Community. An infrastructure for research projects involving individuals with disabilities 1Aktion Mensch e.V., Germany; 2Ipsos; 3Ipsos; 4Ipsos Relevance & Research Question People with various impairments or disabilities are often not adequately represented in (population) surveys, and their specific subgroups are not identifiable in result analyses. Similarly, participatory product development with people with disabilities rarely takes place. This group wants a voice, a right protected by the UN Disability Rights Convention. Aktion Mensch develops innovations for an inclusive society, including the idea of a special panel. Ipsos has been brought on board for a collaboration to establish this panel. Ipsos and Aktion Mensch launched the "Teilhabe-Community" in early 2023. This panel of around 900 individuals with disabilities is available for diverse research about their life realities and everyday experiences. Due to barriers, these individuals often require specialized communication or customer engagement, necessitating an additional panel. Methods & Data Ipsos hosts the "Teilhabe-Community" on its own online platform. Following a participatory approach, we work together with people with disabilities to ensure the community is as barrier-free as possible. We aim to capture a diverse range of impairments in our panel. In addition to online panels and social media, we leverage contacts through leading social welfare organizations (Aktion Mensch members). Workshops and events, involving advocates like facility directors and support staff, help convey our message. The exchange feature introduced on the platform, requested by individuals with disabilities, also serves the purpose of direct exchange with this group. Moderators assist inexperienced panelists in expressing and forming opinions. Results The "Teilhabe-Community" serves researchers and companies for varied inquiries, supporting both quantitative and qualitative methods, including recruitment for offline activities. Applications span from usual social research to customer and user experience, including customer journey and more. The "Teilhabe-Community" enables companies to integrate diverse experiences of people with disabilities into product or service development, fostering an inclusive society. Aktion Mensch and others have already conducted surveys and website tests. Added Value The unique feature of the "Teilhabe-Community" is its easy and quick access to this group. Panelists register in advance, sharing information on their impairments and demographics for targeted sampling for research purposes. The community also allows panelists to interact. Involving them in research significantly contributes to fostering inclusivity. Developing a detailed understanding of patient journeys and treatment pathways through a co-created digital-first approach, to rapidly collect real-world patient experience data. DontBePatient Intelligence, Germany Relevance & Research Question |
8:30pm - 11:59pm | GOR 25 Party Location: Beate Uwe |
Date: Wednesday, 02/Apr/2025 | |
8:30am - 9:00am | Begin Check-in Location: Foyer EG |
9:00am - 10:00am | 7.1: Survey Recruitment Location: Hörsaal A Session Chair: Camilla Salvatore, Utrecht University, Netherlands, The |
|
Designing passwords for web survey access: The effect of password length and complexity on survey and panel recruitment Institute for Employment Research, Germany Relevance & Research Question For online probability surveys that recruit participants via postal invitation letters, passwords are used to manage access to the survey. These passwords serve several purposes, such as blocking uninvited individuals and preventing multiple submissions from the same individual. Research on web survey passwords has primarily focused on whether providing a password for survey access affects response rates. However, the chosen password strength, that is, length and complexity, may also affect response propensities. Password length refers to the number of characters in a password. Password complexity involves the set of characters from which the password can be derived (e.g., lowercase letters and numbers). Our research evaluates the effect of password length and complexity on survey access, response, panel registration and linkage consent rates. Methods & Data We implemented a survey experiment by varying password length and complexity during the first wave of a general population online survey. For recruitment, every individual received a postal invitation letter with a web-link and QR-code directing to the survey, along with an individualized password. We conducted a 2×2 experiment that manipulated password length (five vs. eleven characters) and complexity (uppercase letters only vs. uppercase + lowercase letters + numbers). Additionally, we included a group that used the default length and complexity settings of the service hosting the survey (eight uppercase letters). Invited individuals were randomly assigned to one of these five groups across two different probability samples: employees (N=77,173) and welfare recipients (N=99,176). Results Results show that short as well as long passwords increase the access rate compared to the control group (16.7%, 19.2% vs. 14.9%). The positive effects of the password designs remain for response and panel registration rates. We also find that long passwords have a positive effect on the propensity to consent to linking survey data with administrative data. Added Value Our research sheds light on an often-overlooked aspect of postal survey invitations for web surveys: password designs. Our talk shows how researchers can strategically design survey passwords, potentially influencing not only survey response rates but also other data quality indicators.
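Below is a minimal sketch of how access passwords of the kind varied in this experiment could be generated, contrasting short uppercase-only and long mixed-character designs. The lengths and character sets mirror the experimental conditions described above, but the generation code itself is illustrative, not the IAB's.

```python
import secrets
import string

def make_password(length: int, complex_charset: bool) -> str:
    """Generate a survey access password of a given length and complexity."""
    if complex_charset:
        chars = string.ascii_uppercase + string.ascii_lowercase + string.digits
    else:
        chars = string.ascii_uppercase
    # Drop characters that are easy to confuse when typed from a printed letter.
    chars = "".join(c for c in chars if c not in "O0Il1")
    return "".join(secrets.choice(chars) for _ in range(length))

# The 2x2 experimental conditions plus the default setting (eight uppercase letters).
print("short / simple :", make_password(5, complex_charset=False))
print("short / complex:", make_password(5, complex_charset=True))
print("long  / simple :", make_password(11, complex_charset=False))
print("long  / complex:", make_password(11, complex_charset=True))
print("default        :", make_password(8, complex_charset=False))
```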
Using panelists’ self-stated motivations to craft efficient targeted email invitations to an online probability panel 1Institute for Employment Research, Germany; 2University of Bamberg Relevance & Research Question In online panels, emails are a crucial element for recruiting respondents. Email invitations may substantially affect panelists’ perception of the study's relevance, potentially influencing both response rates and sample composition. Previous research has examined the use of targeted appeals, where the wording in the invitation letter varies among pre-identified subgroups. In our study, we divide panelists into subgroups based on their self-stated motivations to participate. We then use these self-stated motivations to craft an appealing email invitation to invite panelists to a subsequent wave. Based on the Leverage-Saliency Theory, emphasizing the self-stated motivation in email invitations (saliency) should have a positive effect on the panelists' response propensity, enhancing cooperation as well as reducing attrition within the panel. Our design enables us to answer the question: Do targeted invitations based on panelists’ self-stated motivations from a previous wave increase response rates in a subsequent wave? Methods & Data We implemented a survey experiment in a German Online Probability Panel: IAB-OPAL. In wave 3, we asked 10,246 panelists to state their main motivation for participation, choosing among seven different motivations: topic, incentive, giving opinion, informing politics, curiosity, helping science, feeling obligated. In wave 4, we randomly assigned panelists either to the standard invitation or to an invitation that aligns with one of the self-stated motivations. Our treatment included a different subject line as well as a motivational email text. Results Results show that our treatment did not improve cooperation or reduce attrition within the panel. On the contrary, for the motivations “giving opinion” and “informing politics”, results show that aligning the wording of the invitation email with panelists’ self-stated motivations from the previous wave reduces response rates compared to the standard invitation email. Added Value Aligning panel communication documents with panelists' underlying motivations holds the promise to enhance cooperation and reduce attrition within the panel. At least for email invitations, we find that our results break with this promise and lead to unexpected results regarding the underlying leverage-saliency theory. Therefore, we find it noteworthy to share and discuss these results with experienced colleagues in the online survey community. Backing up a Panel with Piggybacking – The Effect of Piggybacking Recruitment on Nonresponse Bias and Panel Attrition in a Mixed Mode Panel Survey GESIS - Leibniz Institute for the Social Sciences, Germany Relevance & Research Question Sampling and recruiting respondents for (online) probability-based panels can be very expensive. One cost-intensive aspect of the process is drawing a separate sample and recruiting the respondents offline. To reduce the cost of panel recruitment, some mixed-mode or online panels (e.g., the GESIS Panel, the German Internet Panel, and the NatCen Panel) relied on piggybacking in some recruitments or refreshments. Piggybacking means that participants for the panel are recruited at the end of another probability survey so that no additional sample has to be drawn. Although this reduces the cost of panel recruitment, it might also introduce additional nonresponse.
Whether the higher amount of nonresponse also translates into a higher amount of bias in practical applications of a piggybacking survey will be analyzed in our research. In addition to the bias for the initially recruited panelists, we will also investigate the effect piggybacking has on panel attrition. Methods & Data To answer the research question, we use the GESIS Panel, a panel survey that was initially recruited in 2013 (n = 4,961) from a separate sample but later on refreshed three times with the help of piggybacking (n = 1,710, 1,607, 764). This setting allows us to compare the bias of both survey types against each other and disentangle the nonresponse bias introduced by piggybacking in contrast to regular nonresponse bias. To estimate the bias of the separate recruitment waves, we use the German Microcensus as a benchmark. The bias will be measured as a relative bias for demographic and job-related variables, as well as the difference in Pearson’s r between benchmark and survey. Results Initial results show that piggybacking did significantly increase the number of nonrespondents compared to a separate recruitment. As we are currently preparing the data for the bias analyses, actual results regarding the bias need to be added later. Added Value Our work will give researchers a better understanding of the bias introduced through piggybacking and of whether this method is still a useful tool to reduce the cost of a probability (panel) survey without introducing high amounts of nonresponse bias. |
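The piggybacking abstract above measures nonresponse bias as the relative bias of survey estimates against German Microcensus benchmarks, plus differences in Pearson's r. Below is a minimal sketch of the relative-bias part; the benchmark values and survey data are invented for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative benchmark proportions from an external source (e.g., a census-type survey).
benchmark = {"female": 0.506, "employed": 0.614, "university_degree": 0.247}

rng = np.random.default_rng(2)
n = 1500
survey = pd.DataFrame({
    "female": rng.random(n) < 0.54,            # invented respondent data
    "employed": rng.random(n) < 0.66,
    "university_degree": rng.random(n) < 0.31,
})

rows = []
for var, bench in benchmark.items():
    est = survey[var].mean()
    rows.append({
        "variable": var,
        "survey_estimate": est,
        "benchmark": bench,
        "relative_bias": (est - bench) / bench,  # relative deviation from the benchmark
    })
print(pd.DataFrame(rows).round(3))
```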
9:00am - 10:00am | 7.2: Digital Behavior and Digital Traces Location: Hörsaal B Session Chair: Julian Kohne, GESIS - Leibniz Institute for the Social Sciences, Germany |
|
Where You Are Is What You Get? Sample Inconsistencies of Google Trends Data Across Download Locations University of Mannheim, Germany Relevance & Research Question: Researchers increasingly use digital trace data sources such as Google Trends as an alternative or complement to survey data. However, besides technical limitations and issues of external and internal validity, several researchers have noticed issues with Google Trends’ reliability. The data are based on an unknown sample of all Google searches. Downloading Google Trends data for the exact same parameters (i.e., search term, region, time) but at different points in time can therefore produce unreliable values on Google Trends’ search index, especially for queries with low search volume. In this paper, we extend the research on Google Trends’ reliability beyond the retrieval date by examining the effect of the download location on inconsistencies across samples: Do we get different values from Google Trends depending on where we download the data? Methods & Data: We retrieved Google Trends data for the same regions, time periods, and terms from four different countries on three continents (Austria, Germany, the U.S., and Australia). We then compared the search index values retrieved from each respective country to those downloaded in the other countries, keeping all parameters of the query constant. Results: Our results show that values from Google Trends differ across download locations depending on the download day and the query’s total search volume. Researchers can minimize these inconsistencies by averaging samples from several days for high search volume queries. Nevertheless, our results point to an additional limitation regarding the reliability and replicability of Google Trends data for its usage in social science research. Added Value: Our findings help researchers working with Google Trends data make their research more replicable by averaging samples from several days for high search volume queries. Our results also serve as a tale of caution for research relying on APIs that provide samples of their digital trace data, as the download location might impact the findings. Online Labour Markets in the context of Human Rights and Environmental Due Diligence 1Datenwissenschaftliche Gesellschaft DWG Berlin, Germany; 2Oxford Internet Institute Relevance & Research Question: Online labour markets (OLMs) reflect the globalisation of the past three decades, combined with accelerating digitisation, and are poised to reshape the future of work. For highly educated workers in developing and emerging economies, OLMs offer significant income opportunities. However, existing literature highlights issues such as insufficient regulation, lack of transparency, and inadequate policy focus. Recently, emerging frameworks like the German Act on Due Diligence in Supply Chains (Lieferkettengesetz, LkSG) have introduced legal mechanisms to address human rights violations in global value chains. These frameworks could also help regulate OLMs by requiring clients to exercise due diligence. This obligation, however, depends on the ability to identify clients and assign them corresponding responsibilities. This study addresses two key research questions:
Methods & Data: We build on digital trace data from Braesemann et al. (2022) and the Online Labour Index (Stephany et al., 2021) who compile data on freelance projects from platforms like UpWork and Fiverr. Our analysis focuses on project histories to examine client outsourcing behaviour. Metrics such as wages, working hours, and gender imbalances are also assessed. Results: Our study demonstrates the feasibility of identifying clients through project-ID matching algorithms, using a sample of 250 project IDs. Results show that small companies dominate outsourcing activities on OLMs. Wage distributions across case studies in Serbia, Egypt, and Bangladesh reveal that average freelance wages often exceed local minimum wages. However, significant variations exist across occupations and genders, underscoring the need for targeted policy interventions to ensure fair pay and gender equity. Added Value: This study highlights the potential of supply chain regulations to address regulatory gaps in OLMs by enforcing minimum wage standards and addressing gender disparities. It also advances methods for identifying and analysing clients on OLMs, providing actionable insights for policymakers and researchers.
Measuring the accuracy of self-reported Instagram behavior - a data donation approach. University of Mannheim, Germany Relevance & Research Question Current research on online behavior heavily relies on self-reported data, which, if flawed, can lead to inaccurate inference in subsequent analyses. Researchers examining online behavior require detailed measures beyond "time spent on a platform" to explore, for example, well-being, social media use, or online privacy, particularly to differentiate between active and passive social media use. This study investigates the extent of misreporting in questions about fine-grained Instagram behavior by comparing them to objective measures collected via data donation. We also explore to what extent the accuracy of self-reports is dependent on the response format (rating scale vs. open text field) and the reference period ("last week" vs. "typical week"). Methods & Data We collected survey data from over 500 Instagram users in a German probability-based online panel regarding 25 distinct behaviors, including posting, liking, and commenting. Participants first complete survey questions on these behaviors. As part of the survey, we conduct a 2x2 experiment that randomly varies the reference period and, for a subset of behaviors, the response format. Respondents are then asked and, if they agree, instructed to download their Instagram usage data for the last three months and donate them to our research. We analyze correlation coefficients between behavioral self-reports and donated data to assess the accuracy of self-reports in general and for specific behaviors. Results Our study’s data collection phase ended on 12 November, and we cannot present any results yet. We have successfully collected self-reported and donated behavior data from 122 respondents. We will update this abstract with the respective findings before 1 March 2025. Added Value This study contributes in three ways. First, we inform the field of questionnaire design by offering insights into how to accurately inquire about specific online behaviors, which is particularly interesting for researchers who may not utilize data donation methods. Second, we examine the accuracy of self-reported data on individual Instagram behavior, helping researchers assess the validity of surveying self-reported online behaviors. Third, we illustrate the potential of data donation to gather detailed, fine-grained data on individual behaviors, which participants might be unable to report accurately. |
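The Google Trends abstract in this session recommends reducing sampling inconsistencies by averaging index values retrieved on several different days (and, implicitly, from several locations). Below is a minimal sketch of that averaging step, assuming the individual retrievals have already been saved as CSV files with date and value columns; the file names and column names are illustrative, not part of the study.

```python
import pandas as pd

# Illustrative: one CSV per retrieval of the same query (same term, region, timeframe),
# each with columns "date" and "search_index".
retrieval_files = ["trends_2025-01-10.csv", "trends_2025-01-11.csv", "trends_2025-01-12.csv"]

samples = []
for i, path in enumerate(retrieval_files):
    df = pd.read_csv(path, parse_dates=["date"])
    df["retrieval"] = i
    samples.append(df)

combined = pd.concat(samples)

# Average the search index across retrievals and quantify how much they disagree.
summary = combined.groupby("date")["search_index"].agg(["mean", "std"])
summary["coef_of_variation"] = summary["std"] / summary["mean"]
print(summary.head())
```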
9:00am - 10:00am | 7.3: Questionnaire Design Location: Hörsaal C Session Chair: Yfke Ongena, University of Groningen, Netherlands, The |
|
B:RADICAL – Co-designing online survey questions with children and young people on their understanding and their experiences of respect and disrespect Queen's University Belfast, United Kingdom Relevance & Research Question In this presentation, I will showcase how children and young people can be involved as co-researchers and research advisors in designing online survey questions and analysing survey results collected in large-scale online social attitude surveys conducted among children and young people. There is a growing interest in participatory and collaborative research approaches and an increasing recognition that the involvement of co-researchers can enhance the relevance, validity and meaningfulness of research data. Traditionally, collaborative research approaches predominantly utilised qualitative research methods, but more recently survey researchers have also been more open to adopting co-production approaches. In this study I will showcase such a collaborative approach to online surveys involving children and young people. Does Succeeding on Attention Checks Moderate Treatment Effects? 1University of Gothenburg, Sweden; 2Stanford University, USA; 3Lisanne Wichgers Consulting; 4Matt Berent Consulting Relevance & Research Question Attention checks have become a common tool in questionnaires administered online. Confirming their popularity, 58 out of the 139 articles published in the Journal of Personality and Social Psychology in 2019 featured an online experiment where at least one attention check was used to exclude participants. If participants do not pay attention to survey questions and treatments, Type-II error may be inflated, increasing the likelihood of false negative results. A few studies have found that excluding participants who failed the attention check strengthened treatment effects, presumably because excluding them reduced noise and increased data quality. However, participants failing attention checks typically differ from passers in characteristics such as age, gender, and education, so excluding failers compromises sample representativeness and reduces sample size. Methods & Data To assess the impact of excluding participants who fail attention checks on treatment effects and sample composition accuracy, data from sixty-six experiments were analyzed. In all experiments, online respondents were randomly assigned to one of two experimental conditions, with 750 people completing the questionnaire in each condition. This allowed for the assessment of whether the treatment effect became stronger when excluding attention check failers, the degree to which failing rates differed between the treatment and control conditions (which would compromise internal validity if failers are dropped), and the degree to which dropping failers compromised the sample distribution of demographic characteristics. Results The results indicated that attention checks only weakly moderated treatment effects. Participants who failed the attention check showed statistically significant treatment effects despite, ostensibly, not paying attention. Including or excluding the failing participants did not alter any of the conclusions made about each of the sixty-six experimental treatment effects. Added Value This meta-analytical study adds to the growing research investigating the appropriateness of attention checks in online questionnaire administration. The study allowed for differentiating whether certain types of attention checks were more efficient in detecting inattentive participants.
Lastly, the study results add insights into how excluding data from participants failing attention checks affects a sample's resemblance to the general population in terms of several demographic characteristics. Balancing Questionnaire Length and Response Burden: Short and Long-Term Effects in the IAB Job Vacancy Survey IAB, Germany Relevance and Research Question: Declining survey response rates are a global concern, also affecting official statistics and establishment surveys. This trend is evident in Germany, raising questions about what factors influence participation. Prior research indicates that longer questionnaires can deter respondents due to increased time and effort, leading to higher response burden and lower response rates. While this effect has been shown in household surveys, its impact on establishment surveys should also be researched. This study examines how questionnaire length affects response rates and response burden in the IAB Job Vacancy Survey, a relevant data source for understanding labor demand and recruitment in Germany. Methods & Data: In the 2023 wave of the IAB Job Vacancy Survey, around 2,000 establishments were randomly assigned to receive either a concise two-page questionnaire or a detailed four-page version, differing in the number of questions. The survey employed a mixed-mode design, combining self-selected web or paper modes of data collection. This experimental design was implemented over three consecutive quarters, allowing the estimation of immediate and long-term effects throughout the year. By comparing the response behavior between the two groups over time, we assessed the influence of questionnaire length on participation and perceived burden. Results: The findings show no significant differences in response rates between the two groups across the three quarters. However, establishments that received the longer questionnaire reported significantly higher levels of perceived response burden. This effect was particularly pronounced in the paper mode of data collection, where respondents expressed greater burden compared to those using the web mode. This increased burden did not translate into decreased short-term participation but may have implications for respondent satisfaction and data quality. Added Value: These results are significant for the design of establishment surveys and the production of official statistics. They suggest that reducing questionnaire length can lower response burden without significantly affecting response rates. This insight supports efforts to optimize the IAB Job Vacancy Survey's push-to-web design, aiming to enhance respondent experience and maintain high-quality data collection. |
9:00am - 10:00am | INVITED SESSION II: Innovation in Practice: Smart survey techniques Location: Hörsaal D Session Chair: Stefan Oglesby, data IQ AG, Switzerland |
|
Measuring the impact of OOH advertising in the Swiss Alps intervista AG, Switzerland In 2024, intervista was commissioned by APG|SGA to investigate the advertising impact of campaigns in winter sports resorts in Switzerland. The innovative research design included two essential components. On the one hand, smartphone-based GPS tracking was used to continuously measure contacts with the advertising spaces and calculate campaign reach. On the other hand, a survey was conducted after the campaigns and the survey data was analyzed in combination with the measured contacts. This research approach made it possible to understand the advertising impact in relation to the contact dose. In addition to the methodology, the presentation will also show concrete results. AI changing the insight game? A practitioner’s view on developing and implementing a RAG-based audience simulation at Samsung Samsung Electronics Austria & Switzerland, Switzerland Goal setting The primary goal of this project was to explore the broader use of Generative AI (GenAI) in consumer insights, enhancing accessibility, efficiency, and marketing decision-making. In preparation for Samsung’s Galaxy Ring launch, we developed an AI model with two key purposes:
Process Developed iteratively with a market research agency, the model was trained on proprietary research, stakeholder interviews, global marketing inputs, and social listening data. Validation tested its ability to replay ingested data, simulate buyer personas, and predict selected variables using holdout survey data. After successfully simulating responses on a nationally representative level, further validation focused on key buyer personas. As this was the first AI model of its kind at Samsung, we prioritized reliability and truthful answer behavior, ensuring the model was ready for practical application. It was officially introduced to the Samsung Marketing team in November 2024. Results The model showed strong qualitative and quantitative validation (R² > 0.9 in most cases). Confidence in its accuracy led to formal adoption within the marketing team. Ongoing usage is monitored, and findings will be shared at GOR 2025. Added Value This session provides a practitioner’s perspective on developing an AI-driven, RAG-based model for marketing decision-making. It contributes to best practices in GenAI by reflecting on development, validation, and implementation. Additionally, it explores how AI-driven insights can transform marketing and consumer insights collaboration, fostering more data-informed, impactful decisions. “Implicit Conjoint” - Why latency time is important 1YouGov Schweiz AG, Switzerland; 2bms marketing research + strategy; 3YouGov Schweiz AG, Switzerland Relevance & Research Question: Dual process theories suggest that the majority of human decisions are made by System 1, which works automatically and largely unconsciously. This system leads to highly habitualised behaviour in which consumption decisions are made relative to other options, regardless of the product category. Therefore, an optimal research design should not only create realistic decision contexts and be based on established experimental methods but also consider consumers’ implicit decision-making processes. We will present an implicit Discrete Choice Model (DCM) that is suitable for modelling System 1-influenced decisions. Methods & Data: To this end, we had participants complete a DCM and, subsequently, evaluate the concepts chosen in the DCM tasks (i.e., winner concepts) by means of a Single Implicit Association Test (SIAT). This allowed us to specifically measure the respondents’ response speed when indicating whether the concept represents a purchase option or not through a number of cognitive, affective and behavioural statements towards the respective concept. The SIAT is the most commonly used method for this purpose. Participants’ responses and response times measured in the SIAT were finally transformed into a linear function of associative strength and used to calibrate the utility estimation of the DCM. By incorporating subconscious associations into the choice model, this approach enhances predictive accuracy, bridging the gap between stated preferences and real-world purchasing behaviour. Results: Two online studies – one on “city trips” and one on “smartphones” – demonstrate the impact of integrating SIAT into DCM. Simulations reveal that incorporating measured response times enhances the predictive accuracy of purchasing behaviour, capturing decision-making dynamics beyond traditional models. Since DCM is typically influenced by System 2 processes, this integration allows for the inclusion of more intuitive, implicit preferences. 
The results show that combining SIAT with DCM provides deeper, more meaningful insights into consumer decision-making, bridging the gap between stated and subconscious preferences. Added Value: Our approach enhances DCM by integrating SIAT, capturing subconscious biases often missed in traditional models and improving predictive accuracy. |
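To make the calibration step described above concrete, the following is a minimal Python sketch of how SIAT response times might be mapped onto an associative-strength score and blended into DCM utilities. The linear transformation, the cut-off values, and the blending weight are illustrative assumptions, not the authors' actual calibration.

```python
import numpy as np

def associative_strength(rt_ms, correct, rt_min=300, rt_max=2500):
    """Map SIAT response times (ms) to a 0-1 associative-strength score.
    Faster correct responses count as stronger implicit associations;
    errors are down-weighted. Cut-offs and the linear form are assumptions."""
    rt = np.clip(rt_ms, rt_min, rt_max)
    strength = 1 - (rt - rt_min) / (rt_max - rt_min)  # fast -> close to 1
    return strength * np.where(correct, 1.0, 0.5)

def calibrated_utility(dcm_utility, strength, weight=0.3):
    """Blend explicit DCM utilities with the implicit score (weight is a free parameter)."""
    return (1 - weight) * dcm_utility + weight * strength

# Example: three winner concepts from the DCM tasks for one respondent
rts = np.array([480, 900, 1600])          # SIAT response times in ms
correct = np.array([True, True, False])   # purchase-option response correct?
utilities = np.array([0.8, 0.4, 0.6])     # utilities estimated from the DCM

print(calibrated_utility(utilities, associative_strength(rts, correct)))
```

In this toy example, the concept answered quickly and correctly retains most of its utility, while slow or incorrect SIAT responses pull the calibrated utility down.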
10:00am - 10:30am | Break |
10:30am - 11:15am | 8: Keynote 2: Location: Max-Kade-Auditorium |
|
Do large language models have a duty to tell the truth? University of Oxford, United Kingdom Careless speech is a new type of harm created by large language models (LLMs) that poses cumulative, long-term risks to science, education, and the development of shared social truths in democratic societies. LLMs produce responses that are plausible, helpful, and confident but that contain factual inaccuracies, inaccurate summaries, misleading references, and biased information. These subtle mistruths are poised to cause a severe cumulative degradation and homogenisation of knowledge over time. This talk examines the existence and feasibility of a legal duty for LLM providers to create models that “tell the truth.” LLM providers should be required to mitigate careless speech and better align their models with truth through open, democratic processes. Careless speech is defined and contrasted with the simplified concept of “ground truth” in LLMs and with prior discussions of related risks in LLMs, including hallucinations, misinformation, and disinformation. EU human rights law and liability frameworks contain some truth-related obligations for products and platforms, but they are relatively limited in scope and sectoral reach. The talk concludes by proposing a pathway to create a legal truth duty applicable to providers of both narrow- and general-purpose LLMs, and discusses “zero-shot translation” as a prompting method to constrain LLMs and better align their outputs with verified, truthful information. |
11:15am - 11:45am | 9: GOR Award Ceremony Location: Max-Kade-Auditorium |
11:45am - 12:00pm | Break |
12:00pm - 1:15pm | 10.1: Reluctant Respondents and Item Nonresponse Location: Hörsaal A Session Chair: Indira Sen, University of Mannheim, Germany |
|
Encouraging revision of ‘Don’t know’ responses: Comparing delayed and dynamic feedback in Web surveys Technical University of Darmstadt, Germany Relevance & Research Question In Web surveys, the absence of interviewers increases the risk of item nonresponse. Therefore, general wisdom suggests refraining from don’t know (DK) options as they may encourage respondents to satisfice (Krosnick & Fabrigar, 1997). This, however, may lead to situations where respondents who cannot generate a valid answer randomly select one of the substantive response categories. Previous studies indicate that interactive feedback can effectively improve response quality (Zhang, 2013; Al Baghal & Lynn, 2015). Interactive feedback can be provided either (1) after the questionnaire page is submitted (delayed feedback) or (2) immediately after respondents choose DK before they submit the page (dynamic feedback). In this study, we apply interactive feedback in difficult single-choice questions that offer an explicit DK. If respondents select DK, we follow up with either delayed or dynamic feedback to clarify question content. We assume that dynamic feedback is more effective in reducing DK since the feedback is provided while respondents are still engaged in the response process. Methods & Data In a Web survey, conducted with a German online access panel in November 2024 (n=2,000), we implemented a between-subjects experiment. In two single-choice questions, the effectiveness of providing dynamic feedback (EG1) or delayed feedback (EG2) is evaluated against a control group receiving no feedback (CG). Results Preliminary results indicate that both feedback types reduce the percentage of final DK responses. In the first experiment, positioned early in the questionnaire, delayed feedback appears to be more effective in reducing DK than dynamic feedback. In contrast, in the second experiment, placed later in the questionnaire, dynamic feedback exhibits stronger effects. These findings suggest that delayed feedback may be more effective when respondents are highly motivated, whereas dynamic feedback seems to reduce DK to a greater extent as respondent motivation decreases. Added Value This study provides insights for survey researchers seeking to minimize DK answers and improve data quality in web surveys. By examining the distinct effects of dynamic versus delayed feedback on the revision of DK answers, this study helps understand how the timing of feedback influences respondent behavior. Zooming in: Measuring Respondents’ Reactance and Receptivity to assess the effects of Error-Reducing Strategies in Web Surveys Technical University of Darmstadt, Germany Relevance & Research Question In Web surveys, respondents often exhibit signs of satisficing behavior, like speeding, non-differentiation, or item nonresponse. A common strategy to reduce such behavior uses prompts and interactive feedback to respondents (for example Al Baghal & Lynn, 2015; Kunz & Fuchs, 2019). However, the effectiveness of prompts is sometimes limited. Some respondents seem to react with an optimizing tendency while others tend to ignore such prompts. This raises concerns that prompts do not reach all respondents to the same extent. In this study we assess the interaction of two personality traits with the effectiveness of prompts concerning non-differentiation, item nonresponse and other types of satisficing behavior. 
It is assumed that the respondents’ level of reactance, which describes a person’s inner resistance to restrictions on their own freedom of action, prevents respondents from improving their answering behavior when exposed to a prompt. By contrast, a person’s receptivity describes the likelihood of a positive change in behavior due to interventions. It is assumed that respondents with higher levels of receptivity perceive prompts as a helpful resource and change their response behavior for the better. Methods & Data In a web survey on “AI and digitalization”, conducted with a general population sample drawn from a non-probability online access panel (n=2,000), we implemented a series of experiments evaluating prompts targeting various types of satisficing behavior, such as non-differentiation and item nonresponse. Also, two validated German psychometric scales were utilized to measure how responsive respondents are to interventions: one on reactance (Herzberg, 2002) and one on receptivity to instructional feedback (Bahr et al., 2024). Results Field work is still underway. In the analysis we aim to test whether these personality traits can explain the differential effectiveness of the various established satisficing prompts. Added Value The study contributes to a better understanding of the differential effectiveness of satisficing prompts. Based on the results we aim to tailor the frequency, the presentation and the wording of prompts to the respondents’ psychometric profile. We assume that an improved respondent experience may foster the effectiveness of interventions and ultimately data quality.
Understanding item-nonresponse in open questions with requests for voice responses 1Utrecht University, Netherlands, The; 2German Center for Higher Education Research and Science Studies (DZHW); 3Leibniz University Hannover Relevance & Research Question |
12:00pm - 1:15pm | 10.2: AI and Automation in (Survey) Location: Hörsaal B Session Chair: Danielle Remmerswaal, Utrecht University, Netherlands, The |
|
Bots in web surveys: Predicting robotic language in open narrative answers 1DZHW; Leibniz University Hannover; 2University of Mannheim Relevance & Research Question Web survey data is key for social and political decision-making, including official statistics. Respondents are frequently recruited through online access panels or social media platforms, making it difficult to verify that answers come from humans. As a consequence, bots – programs that autonomously interact with systems – may shift web survey outcomes and social and political decisions. Bot and human answers often differ regarding word choice and lexical structure. This may allow researchers to identify bots by predicting robotic language in open narrative answers. In this study, we therefore investigate the following research question: Can we predict robotic language in open narrative answers? Methods & Data We conducted a web survey on equal gender partnerships, including three open narrative questions. We recruited 1,512 respondents through Facebook ads. We also programmed two AI-based bots that each ran through our web survey 100 times: The first bot is linked to the LLM Gemini Pro, and the second bot additionally includes a memory feature and adopts personas, such as age and gender. Using a transformer model (BERT), we attempt to predict robotic language in the open narrative answers. Results Each open narrative answer is labeled based on whether it was generated by our bots (robotic language = “yes”) or the respondents recruited through Facebook ads (robotic language = “unclear”). Using this dichotomous label as ground truth, we will train a series of prediction models relying on the BERT language model. We will present various performance metrics to evaluate how accurately we can predict robotic language, and thereby identify bots in our web survey (a minimal illustrative sketch of such a classifier follows this session block). In addition, we compare these results to students’ predictions of robotic language to study whether our BERT models outperform human judgement. Added Value Our study contributes to the ongoing discussion on bot activities in web surveys. By investigating AI-based bots with different levels of sophistication that are linked to an LLM, our study stands out from previous research that mostly looked at less sophisticated rule-based bots. Finally, it extends the methodological toolkit of social research when it comes to identifying bots in web surveys. Addressing Biases of Sensor Data in Social Science Research: A Data Quality Perspective 1GESIS - Leibniz Institute for the Social Sciences, Germany; 2University of Mannheim, Germany; 3University of Konstanz, Germany; 4Ulm University, Germany; 5RWTH Aachen, Germany; 6University of Michigan, USA; 7University of Düsseldorf, Germany Relevance & Research Question: Sensor data – social sciences – error sources – error framework The everyday availability of sensors has opened new research avenues for the social sciences, including their combination with traditional data types, such as survey data. However, as sensors become more prevalent for the collection of digital behavioral information, concerns regarding the accuracy and reliability of the obtained sensor data have emerged. Error sources and biases of sensor data are very sensor-specific, which poses a challenge to social science researchers as the necessary technical expertise is often lacking. 
The paper gives an overview of these concerns and proposes a general error framework for the data quality assessment of sensor data in social science research, contributing conceptually and methodologically to enhancing the assessment and reporting of sensor data quality.
Methods & Data Systematic review – thematically focused content analysis – expert group Sensor error framework dimensions were extracted based on the results of a thematically focused systematic review (see preregistration here: https://osf.io/vkxbt) using qualitative content analysis and evaluated within an expert group. Results Data quality – error framework – technical and human error – measurement error – representation bias The proposed error framework outlines error sources and potential biases for measurement and representation along the full research cycle (planning, data collection, data analysis, archiving and sharing). We addressed the intricate relationship between general data quality dimensions and sensor-specific error sources by incorporating the multilayered character of sensor data arising from technical affordances and device effects. In addition, we identified three principles structuring error sources and biases for specific sensors: the interplay between researcher, study participant, and device, the spatial mobility of the sensor, and the continuous character of the error sources. The adoption of the framework is illustrated with sensor-specific examples. Added Value Data quality assessment – reporting standards – replicability – interpretability The proposed general error framework for sensor data bears the potential to enhance the assessment and reporting of sensor data quality in the social sciences. It provides guidance to researchers and facilitates better replicability and interpretability of sensor data. Improving the measurement of solidarity in the European context: results from a web probing in four countries University of Bergamo Relevance & Research Question This research addresses how cultural biases affect cross-national comparability in attitudes toward solidarity. Recent studies highlight concerns about the comparability of solidarity measurements between countries. By implementing international web-probing, we aim to uncover these biases and improve the clarity of questions in future rounds of the EVS questionnaire to ensure reliable cross-country comparisons. Cross-country comparability, solidarity, European Values Study. Methods & Data We conducted web probing in Italy, Portugal, Hungary, and Czechia, utilizing nine solidarity-related items from the EVS 2017 questionnaire. The method involved inserting probes following closed-ended questions to explore respondents’ interpretations. A sample of 600 participants was surveyed, with responses analyzed qualitatively to identify variations in how terms like “Europeans,” “immigrants,” and “concern” are understood, and to explain why respondents reported a certain level of concern toward Europeans. This data was translated and categorized using thematic coding across languages. Web probing, open-ended questions, codification. Results For each response, we identified multiple categories, demonstrating the diverse interpretations of the same word within and especially across countries, despite the accurate translation of the EVS questionnaire. Cultural variations, language, context, interpretation. Added Value This study demonstrates the value of web probing as a tool for identifying and addressing cultural biases in international surveys. The insights gained provide a basis for refining survey instruments, ensuring that data on solidarity reflects a more accurate and culturally sensitive understanding across European countries. Cultural biases, comparative implications, data quality improvement. |
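As an illustration of the bot-detection approach described in the first abstract of this session, the following is a minimal sketch of fine-tuning a BERT classifier to flag robotic language in open narrative answers. The checkpoint name, toy answers, and training setup are illustrative assumptions, not the authors' implementation.

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "bert-base-german-cased"          # assumed German-language answers
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Toy data: label 1 = generated by one of the study's bots, 0 = "unclear" (human recruit)
answers = ["Beispielantwort einer Person ...", "Beispielantwort eines Bots ..."]
labels = torch.tensor([0, 1])

enc = tokenizer(answers, truncation=True, padding=True, return_tensors="pt")
loader = DataLoader(list(zip(enc["input_ids"], enc["attention_mask"], labels)),
                    batch_size=8, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):                          # a few passes over the labeled answers
    for input_ids, attention_mask, y in loader:
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=y)
        out.loss.backward()                     # cross-entropy loss from the classification head
        optimizer.step()
        optimizer.zero_grad()

# Predicted probability that a new answer is "robotic language"
new = tokenizer(["Eine weitere offene Antwort ..."], return_tensors="pt")
print(torch.softmax(model(**new).logits, dim=-1)[0, 1].item())
```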
12:00pm - 1:15pm | 10.3: Exploring Representation in Social Media Location: Hörsaal C Session Chair: Jessica Donzowa, Max Planck Institute für demographische Forschung, Germany |
|
Dancing with Data: Understanding Gender Representation Among Viral TikTok Content Bielefeld University, Germany Relevance & Research Question: Social media platforms have evolved into significant ecosystems, where content creators derive income from their digital presence. Despite its prominence as one of the biggest social media platforms, TikTok remains understudied in social science literature, primarily due to the methodological challenges in analyzing video content. Methods & Data: This paper presents a novel computational approach to evaluate content creators' performance through systematic video data analysis. By embedding the videos via VideoMAE, a state-of-the-art model for embedding visual data, I examine the performance variations across different content categories and conduct a comparative analysis of creator performance stratified by gender. I use a unique dataset of 36,166 videos sampled with a rigorous hourly sampling approach to assess content performance over a 41-day period. Added Value: My methodology demonstrates the feasibility of video content analysis on TikTok, contributing to both the theoretical understanding of digital creator economies and the methodological toolkit for social media research. It also demonstrates how embedding models can be leveraged for social scientific studies of visual data, which is still scarce in the field. Results: I find differences in the distribution of content types across genders. The choice of content type is also related to the success of generating user engagement (views/likes). However, individual-level intercepts also strongly predict performance of shared content. Beyond Binary Bytes: Mapping the Evolution of Gender Inclusive Language on Twitter 1Bielefeld University, Germany; 2Bielefeld Graduate School in History and Sociology; 3DeZIM Institute Relevance & Research Question Languages worldwide differ significantly in how they incorporate gender into grammar and phonetics. In the German language, the generic masculine form (e.g., saying “Lehrer” [teacher, male, sing.]) is used to refer to a group of people with unknown (or non-male) sex and has been criticized for rendering women and non-binary people invisible in language, thereby reinforcing gender biases and unequal power dynamics. Gender-inclusive language (GIL) has been proposed as an alternative to the generic masculine and involves various subtypes. Our study investigates the development of GIL on Twitter between 2018 and 2023. In addition, we study individual (gender) and contextual (regional) effects on the use of GIL. Methods & Data We rely on a unique dataset of over 1 billion German-language Tweets. We present a pipeline to detect three types of GIL, namely binary feminization, non-gendered GIL and non-binary inclusive language. We do this through a combination of a fine-tuned German BERT model, regular expressions, and a corpus of German gender-inclusive language words. User names are analyzed based on lists of male, female and unisex names. By inferring the place of residence for the users of more than 300 million Tweets, we shed light on the correlations between socio-structural variables and the use of gender-inclusive language across Germany. Results We find that GIL adoption increases slightly over the studied five-year period and we identify different trends among GIL types in this adoption. Furthermore, profiles with female usernames use GIL more often than those with masculine or unisex usernames. 
In addition, we find regional patterns with more use of GIL in urban regions and regions with a higher share of young users. Added Value Our study makes several novel contributions to the understanding of gender-inclusive language adoption and digital socio-linguistics. First, it provides insights into the real-world uptake of gender-inclusive language through the largest-scale analysis of German social media communication to date. Second, by linking language use to regional socio-structural variables, we offer the first comprehensive geographic analysis of gender-inclusive language adoption patterns in Germany. Romance Dawn: Investigating the dynamics of collaboration in a cultural producer community on YouTube 1Social Monitor, Romania; 2University of Mannheim Relevance & Research Question Online communities are widely acknowledged to provide new opportunities for meaningful interaction between individuals with similar tastes in cultural consumption. However, the flip side of this coin – that online communities likewise provide new social connection opportunities for producers of cultural content – has received much less attention so far. This project, therefore, starts with the premise that cultural producers foster “competitive co-operation” among themselves for mutual benefits through collaboration. We attempt to explore the specific dynamics of how such collaborations foster the growth and success of individual producers as well as of their community as a whole.
Methods & Data To investigate these and related claims empirically, we trace the development of an online community of cultural producers creating video content on a popular Japanese manga on YouTube. We develop a new interactive interface allowing for the systematic coding and preprocessing of YouTube data. The coding interface allows for the coding of collaboration cues: words and phrases indicating collaborations (e.g. “feat”, “w”, “@”). With this, we construct a longitudinal collaboration network where cultural producers of this specific community share a directed tie if they have jointly published a video on YouTube, with the host as the source and the guest as the destination.
Results The results of negative binomial regression estimating view count indicate substantial benefits of being invited by other YouTubers. A tie to a new host, i.e. an increase in indegree, increases the yearly viewer yield of a YouTuber by around 20%. Inviting new guests, i.e. an increase in outdegree, also increases the yearly viewer yield of a YouTuber, though only by around 7%. Added Value This study contributes to the study of online producers in two ways. First, we present a method for coding and preprocessing YouTube data. Second, we explore specific effects taking place in a cultural producer community, pointing out the very different effects of in- and outdegree. Both give insight into the dynamics behind the emergence of online communities. Where is Everybody? Measuring Semantic Source Position and Creating Online Discourse Typologies from Co-Occurrence Networks Social Monitor, Romania Relevance & Research Question Tremendous advances have been made in measuring the meaning of text data, allowing us to quantify entire discourses within unified data structures such as vector spaces or networks. What is often lost in the application of such methods are the perspectives that different types of sources or actors may contribute to a discourse and how such perspectives may come to shape it. We demonstrate a method for evaluating the different positions text sources can have in co-occurrence networks, what we call the source's semantic position. We then create a typology of the networks that emerge based on the semantic position of the actors. Methods & Data We use online data from a sample of leading US sources collected with the NewsVibe platform, which allows access to web news articles and Facebook posts based on search queries. We sample articles around the 2024 US election. The co-occurrence network for each source is computed first, and then all keyword networks are added to a global co-occurrence network. This allows us to determine the position of each source within the discourse according to its keywords. Finally, we evaluate the different types of networks that emerge from such an analysis. Results We show that depending on the search query, networks as well as source contribution can be highly contextual. Discourses around “economics”, but also discourses around candidates like Trump and Kamala, are highly polarized. The sources in each of these cases split into almost separate parts of the network, displaying what one could call semantic polarization. However, networks around topics such as NATO are dominated by a few influential actors, with very little participation by other sources. Added Value |
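The co-occurrence approach in the last abstract above can be sketched as follows. The toy keywords, the per-article co-occurrence rule, the edge weighting, and the centrality-based "semantic position" score are simplifying assumptions, not the authors' exact method.

```python
# Minimal sketch: per-source keyword co-occurrence networks merged into a
# global network, with a crude centrality-based "semantic position" per source.
from itertools import combinations
import networkx as nx

articles = {
    "source_a": [["election", "economy", "inflation"], ["election", "nato"]],
    "source_b": [["election", "border", "economy"]],
}

global_net = nx.Graph()
source_nets = {}
for source, docs in articles.items():
    g = nx.Graph()
    for keywords in docs:
        # every keyword pair appearing in the same article co-occurs once
        for u, v in combinations(sorted(set(keywords)), 2):
            for net in (g, global_net):
                w = net.get_edge_data(u, v, {}).get("weight", 0)
                net.add_edge(u, v, weight=w + 1)
    source_nets[source] = g

# "Semantic position": how central a source's own keywords are in the global discourse
centrality = nx.degree_centrality(global_net)
for source, g in source_nets.items():
    position = sum(centrality[k] for k in g.nodes) / g.number_of_nodes()
    print(source, round(position, 3))
```

Comparing how these positions cluster or split across sources is one simple way to arrive at discourse typologies such as the "semantic polarization" pattern mentioned in the results.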
12:00pm - 1:15pm | INVITED SESSION III: DGOF KI (AI) Forum: Inspiration Session (Session held in English) Opportunities & Challenges in Applying AI to Market Research Location: Hörsaal D Session Chair: Yannick Rieder, Janssen EMEA, Germany Session Chair: Oliver Tabino, Q Agentur für Forschung, Germany Join us for an insightful session where experts test innovative solutions and explore the opportunities AI can bring to market research. Hear from industry experts about the current challenges in applying AI to market research and how they successfully overcome them. |
|
GOR 2025 – AI interview technology: Enhancing quality data sets with AI tools in the digital interview process 1HTW Berlin; 2horizoom GmbH, Germany; 3Susi&James; 4Susi&James; 5Xelper; 6horizoom GmbH, Germany AI is about to transform industries, including market research. AI-assisted tools are enhancing both qualitative and quantitative data collection methods.
The comparative studies focused on AI-supported data collection. AI was also used for the analytical parts of all four input types. Why do LLMs enter market research in companies so slowly? Roche Pharma AG, Germany Current Situation: With the advent of LLMs, a significant impact on various industrial activities has been anticipated. However, their current application in market research within companies remains limited. The challenges associated with this approach lie beyond the capabilities of the method itself. Problem: LLM-based methods compete with established techniques and existing data sources. Future: The limited adoption of LLMs is not due to deficiencies in the method itself but rather to competition with existing data sources and analytical approaches. The use of LLMs is likely to grow when they enable answering questions that remain unresolved with current analytical approaches. |
1:15pm - 2:30pm | Lunch Break |
2:30pm - 3:45pm | 11.1: Nonresponse Bias and Correction Location: Hörsaal A Session Chair: Anke Metzler, Technical University of Darmstadt, Germany |
|
Do machine learning techniques improve nonresponse weighting? GESIS, Germany Relevance & Research Question Nonresponse weighting is an important tool for improving the representativeness of surveys, e.g. weighting respondents according to their inverse propensity to respond (IPW). IPW estimates a person's propensity to respond based on characteristics that are available for respondents and nonrespondents. While logistic regression is typically used to estimate the response propensity, machine learning methods offer several advantages: they allow for very flexible estimation of relationships and the inclusion of a large number of potentially correlated predictor variables. ML methods are known to predict values very accurately. However, it is also known that the estimation of the relationships between the weighting variables and the response propensity suffers from regularization bias. With regard to weighting, it is unclear which of these properties is more relevant and has a greater influence on the quality of the weighted estimate. In this study, we address the question of whether machine learning methods outperform logistic regression in performing IPW (a minimal illustrative sketch of ML-based IPW follows this session block). Methods & Data In a simulation study that mimics the three nonresponse models (separate cause model, common cause model, survey variable cause model) and varies the number of features that affect nonresponse, we apply IPW weighting using five different prediction models: Regression Trees (CART), Random Forest, Boosting, Lasso, and Logistic Regression. We conclude the analysis with an application to voting decisions collected in the German Internet Panel. Results Machine learning methods perform similarly well to logistic regression and lead to a lower variance in the estimates. Overall, the advantage of an excellent prediction seems to outweigh the disadvantages of regularization bias. Added Value The presentation provides guidance on how to improve the weighting of surveys, which is a crucial task when drawing conclusions about the general population from a survey. Company Nonresponse in Gender Inequality Surveys: Challenges in Participation and Implications for Data Quality 1ARS - Associazione per la ricerca sociale; 2University of Bergamo Relevance & Research Question Sampling challenges, company engagement, labor market. Methods & Data A representative sample of 1,400 companies, each employing over 50 workers, from three Italian provinces (Milan, Bergamo, and Brescia) was contacted to participate in the survey, with the aim of exploring company policies on gender inclusion, work-life balance, career development, and diversity. Challenges related to corporate nonresponse, such as issues with contact information accuracy, email deliverability (e.g., emails marked as spam), and hesitancy to participate, were systematically documented and analyzed. To increase the number of responses, several strategies were applied. Our research offers valuable insights into how to increase companies’ responses. Representative sample, survey strategies. Results We will present our findings on how different strategies for involving companies can impact response rates. We will begin by discussing the initial recruitment attempt and the subsequent strategies employed to increase participation. Finally, we will examine the advantages of this type of research, the complexities inherent in this methodology, and strategies to overcome these challenges. Response rates, methodology. 
Added Value This study provides critical perspectives on the complexities of corporate engagement in gender inequality research. It highlights the importance of overcoming participation barriers to obtain a representative sample. By identifying strategies to improve company engagement, this research can inform future studies and help develop more effective methods to engage local samples and obtain representative results. Engagement optimization, gender inequality research, research impact. Reasons for participating in (non)probability online panels and how to ask about it GESIS – Leibniz Institute for the Social Sciences, Germany Relevance & Research Question A deeper look into education bias in web surveys 1Institute for Employment Research (IAB), Germany; 2Institute for Employment Research (IAB), Germany; 3Institute for Employment Research (IAB), Germany Relevance & Research Question The COVID-19 pandemic has accelerated a trend in survey research to use online data collection for general population samples. High-quality web surveys recently achieved response rates comparable to or even exceeding those of telephone surveys. However, selection bias concerning education is often more pronounced. To address this issue, we analyze complete employment biographies of both respondents and non-respondents and focus on three main research questions: (1) How do the different stages of the recruitment process for an online panel contribute to education bias? (2) Are there specific subgroups within the low-educated population who are even less likely to participate? (3) Are there interaction effects between education and other predictors of nonresponse? Methods & Data In 2023, the Institute for Employment Research in Germany launched a new online panel survey of the German workforce (IAB-OPAL) using a push-to-web approach. Addresses were sampled from a powerful database comprising compulsory social insurance notifications by employers as well as unemployment insurance and welfare benefit records. We utilize this unique opportunity of a sampling frame containing detailed individual-level information on complete employment biographies. Results Our findings indicate that educational bias accumulates at every stage of the recruitment process. We observed that unit nonresponse is more pronounced among individuals with lower education levels, particularly for respondents aged 50 and older and foreign nationals. Additionally, nationality appears to have a greater impact on highly educated individuals, and women are less likely to participate unless they hold advanced degrees. Added Value Using a detailed sampling frame that includes individual-level information from complete employment histories enables us to evaluate how educational bias emerges throughout the recruitment process. It also allows us to determine if response tendencies within different educational strata vary based on typically unobserved factors, such as experience with benefit receipt, occupations, or wages. |
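The inverse-propensity weighting with machine learning described in the first abstract of this session can be illustrated with a minimal sketch. The variable names, the random-forest settings, and the propensity trimming are assumptions for illustration only, not the simulation design used in the study.

```python
# Minimal sketch: estimate response propensities with a random forest and
# derive inverse-propensity weights for the respondents.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# frame: one row per sampled person; `responded` = 1 if they completed the survey
frame = pd.DataFrame({
    "age":       [25, 41, 67, 33, 58, 72],
    "female":    [1, 0, 1, 1, 0, 0],
    "urban":     [1, 1, 0, 0, 1, 0],
    "responded": [1, 0, 1, 1, 0, 1],
})

X, r = frame[["age", "female", "urban"]], frame["responded"]
rf = RandomForestClassifier(n_estimators=500, random_state=1).fit(X, r)
propensity = np.clip(rf.predict_proba(X)[:, 1], 0.05, 1.0)  # trim extreme propensities

mask = frame["responded"].to_numpy() == 1
respondents = frame[mask].copy()
respondents["ipw"] = 1.0 / propensity[mask]   # respondents with low propensity get more weight
print(respondents[["age", "ipw"]])
```

In practice the model would be fitted on frame variables known for respondents and nonrespondents alike, and the same pattern works with CART, boosting, lasso, or logistic regression simply by swapping the estimator.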
2:30pm - 3:45pm | 11.2: Bots, Avatars, and Online Labs Location: Hörsaal B Session Chair: Joachim Piepenburg, GESIS, Germany |
|
Bots in web survey interviews: a showcase 1DZHW, Leibniz University Hannover, Germany; 2DZHW, University of Magdeburg Relevance & Research Question Bringing the Lab Online: Device Effects in Psychological Bias Testing in Online Surveys 1DeZIM Institut, Germany; 2Leuphana University Lüneburg, Germany; 3FU Berlin, Germany Relevance and Research Question Self-report measures for sensitive topics, such as stereotypes and prejudices, are often compromised by social desirability bias. Indirect psychological bias tests offer a promising alternative by measuring implicit biases through reaction times, decision errors under time pressure, priming effects, and memory performance. Traditionally, these tests are conducted in controlled lab environments, which limits the sample size and diversity due to logistical constraints. However, when conducted online, researchers cannot control participants' choice of device - but device type is associated with systematic differences (e.g. screen size, input method, and test environment) that may influence results. We developed a tool for integrating indirect psychological bias tests into online surveys: MIND.set. This study used MIND.set to investigate two key questions: 1) How reliably can implicit biases be detected in online survey contexts? and 2) How does the type of device used (e.g., mobile vs. desktop) impact test outcomes? Methods and Data In 2023, we conducted an online survey with 2,707 participants from the general population, using quotas for gender, age, and education. Participants were randomly assigned to one of five indirect bias tests implemented via the MIND.set platform: Implicit Association Test (IAT), Affect Misattribution Procedure (AMP), Shooter Task (ST), Avoidance Task (AT), and Source Monitoring Paradigm (SMP). All tests focused on stereotypes of Arab-looking men versus White men, specifically regarding perceived threat. Participants self-selected their devices (mobile or desktop), and our preregistered hypotheses (OSF) examined the influence of device type on bias detection and bias scores. Results The analyses confirmed implicit biases on at least one bias indicator across all five tests. Crucially, bias scores were largely unaffected by device type. While minor variations were observed, these did not significantly undermine the reliability of results across different devices. Added Value The MIND.set platform enhances the accessibility of indirect bias testing by offering a robust infrastructure for online research. This study is the first to systematically investigate device effects across multiple indirect bias tests, providing critical insights for researchers seeking to incorporate such tests into online surveys. Enhancing Open-Answer Coding in Quantitative Surveys Using Optimized LLM Tools Bilendi & respondi, France Relevance & Research Question Accurate coding of open-ended responses in quantitative surveys is critical for generating insights. However, traditional manual coding methods are time-consuming and costly. LLMs present an opportunity to revolutionize this process. The research question explored in this study: How can LLM-based coding be optimized to outperform both human coders and baseline LLM implementations in terms of accuracy? The goal of this research is to better understand how to improve automated coding via foundation models, and to assess the impact on coding quality of various strategies aimed at improving on vanilla LLM use. 
In particular, we considered the effect of: 1/ few-shot learning: how helpful is it to provide general or case-based examples? 2/ prompt optimization: what is the best way to ask the LLM to perform the seemingly easy task of applying labels to verbatims? 3/ input optimization: is there a way to format input labels so as to make it easier for the LLM to correctly apply them? 4/ model choice: do newer generation LLMs fare better than older ones? are quicker/lighter models good enough? Methods & Data This research employed comparative tests across 4 coding methods tested on multiple datasets from real-world surveys: (1) Human manual coding by client (=benchmark), (2) Human coding by external suppliers, (3) Initial implementation of an LLM-based coding tool (BARI V1), and (4) An optimized version of the LLM tool (BARI V2) enhanced through iterative improvements in prompt engineering, training data alignment and feedback loops. The key performance metric was always coding accuracy (measured against the ‘Client’ benchmark). Our results suggest that: 1/ general few-shot learning is not particularly helpful; 2/ some prompting strategies do fare better, especially on trickier inputs; 3/ input format is actually the most crucial factor; 4/ smaller models aren't bad compared to bigger ones. Added Value This study showcases how optimizations of LLMs can bridge the gap between AI and human coders in coding open-ended survey responses. The findings provide insights into leveraging AI for more efficient and accurate data analysis, highlighting a transformative approach for researchers, practitioners, and industry stakeholders. |
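As a rough illustration of numbered code frames and case-based few-shot examples of the kind discussed above, the following sketch codes a single verbatim with an OpenAI-compatible chat API. The code frame, prompt wording, and model name are illustrative assumptions and not the BARI implementation.

```python
# Minimal sketch: assign one code-frame label to an open-ended answer via an LLM.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CODE_FRAME = {1: "price", 2: "quality", 3: "customer service", 4: "other"}

def code_verbatim(verbatim: str) -> str:
    labels = "\n".join(f"{k}. {v}" for k, v in CODE_FRAME.items())
    prompt = (
        "Assign exactly one label number from the code frame to the survey answer.\n"
        f"Code frame:\n{labels}\n\n"
        'Example answer: "Too expensive for what you get." -> 1\n'   # case-based few-shot example
        f'Answer: "{verbatim}" ->'
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

print(code_verbatim("The support team never answered my emails."))
```

Accuracy would then be computed by comparing such LLM-assigned labels against a human-coded benchmark, as in the comparative tests described above.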
2:30pm - 3:45pm | 11.3: Social Media Surveys and Recruitment Location: Hörsaal C Session Chair: Wojtek Jablonski, Erasmus University Rotterdam, Netherlands, The |
|
The COVID-19 Health Behaviour Survey: A Cross-National Survey Conducted via Facebook Max Planck Institute for Demographic Research, Germany Relevance & Research Question The COVID-19 pandemic affected daily life in unprecedented ways, posing serious challenges for governments and societies. Nonpharmaceutical interventions (NPIs), such as stay-at-home orders, physical distancing measures, and mask mandates, were pivotal in reducing transmission, particularly during the early stages when vaccines were unavailable. Understanding how populations responded to these interventions was crucial for developing effective communication strategies and policies. However, the lack of comprehensive data on behaviors and perceptions during the pandemic posed a significant challenge. This study sought to fill this gap by investigating behavioral responses to COVID-19 across diverse demographic groups and countries, examining the interplay between threat perception, preventive behaviors, and compliance with NPIs. Methods & Data To explore these dynamics, we conducted the COVID-19 Health Behavior Survey, a large-scale, cross-national online survey administered across eight countries: Belgium, France, Germany, Italy, the Netherlands, Spain, the United Kingdom, and the United States. Data collection relied on targeted Facebook advertisements, enabling rapid recruitment of participants during the pandemic’s initial wave. The survey, conducted between March 13 and August 12, 2020, yielded over 140,000 responses. It captured detailed information on participants’ health status, behaviors, social contacts, and attitudes toward COVID-19. Statistical techniques were employed to address potential sampling biases and ensure robust insights. Results The results highlighted significant demographic and national differences in pandemic responses. Women and older individuals perceived COVID-19 as a greater threat than men and younger groups, leading to higher adoption rates of preventive measures such as mask-wearing and physical distancing. Threat perception was particularly influential among vulnerable populations, including the elderly and those with preexisting conditions. Social contact patterns also changed markedly, with physical distancing guidelines leading to a 48%-85% reduction in social contacts compared to pre-pandemic levels across surveyed countries, often exceeding the impact of lockdown measures. Added Value This study provides valuable cross-national insights into behavioral responses during a global health crisis. By leveraging innovative survey methods and timely data collection, it underscores the importance of understanding population behavior to inform public health strategies and enhance preparedness for future pandemics. The findings offer actionable guidance for evidence-based policy-making and effective risk communication. Estimating Fertility Indicators in Low- and Middle-Income Countries: Evidence from a Network Reporting Online Survey in Senegal 1Max Planck Institute für demographische Forschung, Germany; 2Bielefeld University, Germany; 3University of California, Berkeley, USA Relevance & Research Question Data availability is often limited in developing countries, with timely administrative or survey data especially lacking. To address this, we propose a novel survey recruitment and estimation approach. First, we recruit survey participants through Facebook advertisements. 
While social media surveys are common in high-income countries, they are less frequently used in contexts like Sub-Saharan Africa, where internet and Facebook penetration are low. We aim to assess the potential of this approach in such settings. Additionally, we explore the feasibility of a network reporting approach to estimate fertility rates. Methods & Data We used Meta’s advertising platform to recruit survey respondents, targeting Facebook users aged 18 and over in all 14 regions of Senegal. Data collection occurred over one week in October 2024. Our survey included a network reporting component, where respondents provided information about themselves and three people from their regular social network. This approach captures unique data typically inaccessible in standard Facebook surveys, including socio-demographic information such as age, gender, education level, and number and age of children. Our analysis aims to estimate birth rates, using data from the Demographic Health Survey (DHS) as a benchmark for sample composition and fertility rate accuracy. Results Our sample includes 350 respondents, with 44% women. About 24% live in Dakar, Senegal's capital, and 37% live in rural areas. The average respondent age is 33. On average, respondents reported contact with 10 people the previous day and provided detailed information for up to three. The network sample (n=567) is gender-balanced (50% women) with an average age of 30. About 23% of network members reportedly do not use Facebook. Further analysis will focus on fertility rate estimation, comparing our findings with DHS data to assess the reliability of our approach.
Added Value Our study addresses data gaps in African fertility estimates and introduces a new data collection method using social media. By comparing our results with DHS data, we aim to evaluate the potential of this approach for providing timely fertility estimates in African contexts, thereby enhancing understanding of population trends in the region. Should We Be Worried? The Impact of Problematic Responses on Social Media Surveys 1German Centre for Integration and Migration Research (DeZIM); 2Bielefeld University, Germany Relevance & Research Question The digital age has transformed survey recruitment, with social media ads enabling cost-effective and rapid access to diverse and hard-to-reach populations. Despite this promise, this method raises critical challenges - one of these is the tendency for measurement errors in the form of problematic response behaviors. These behaviors—including satisficing, low-effort responses, and fraudulent participation—threaten data quality. While some studies have briefly mentioned these issues, most social media-recruited surveys do not address them systematically. This raises the question: Should we worry about problematic responses in social media-recruited surveys? This study examines whether problematic respondents in social media-recruited surveys systematically differ from others and assesses their impact on data quality and model estimates. The study addresses two core questions: (1) Are problematic respondents systematically different in socio-demographics and substantive answers? (2) Do they bias multivariate model estimates? Methods & Data The study analyzes data from a web survey on labor market discrimination against women with headscarves in Germany, conducted in 2021/2024. Recruitment via targeted Facebook ads yielded 3,021 completed interviews at an average cost of €1.41 per respondent. Response quality was evaluated using indicators such as item non-response, straight-lining, speeding, and identity misrepresentation. Statistical tests, including chi-square, Fisher’s exact, and Mann-Whitney U-tests, were employed to identify significant differences. Multivariate regression models assessed the impact of problematic behaviors on key outcomes, such as perceived anti-Muslim discrimination. Results Preliminary findings reveal that problematic respondents differ significantly in socio-demographic composition and substantive answers. For example, behaviors like straight-lining and speeding are more frequent among younger and less-educated respondents. Multivariate analyses show that problematic responses distort key estimates, particularly on discrimination experiences. Cleaning the dataset improves model fit and the reliability of results, emphasizing the value of robust quality control measures. Added Value This study provides a systematic investigation of problematic response behaviors in social media-recruited surveys, shedding light on their prevalence, predictors, and implications for data quality. It underscores the necessity of incorporating quality control measures in future surveys, offering practical recommendations for researchers leveraging social media recruitment strategies. Optimizing Social Media Recruitment: Balancing Costs and Sample Quality in Non-Probabilistic Panels GESIS, Germany Relevance & Research Question With social media platforms increasingly utilized for participant recruitment, it’s critical to assess their effectiveness in building balanced non-probabilistic panels. 
This study investigates whether investing in targeted social media ads on platforms like Meta (Facebook and Instagram) can effectively balance recruitment costs and sample composition quality. We explore how different targeting criteria (such as age and gender) impact sample composition and evaluate the accuracy of platform-provided socio-demographic estimates. The study aims to understand the trade-offs between advertising budget allocation and sample representativeness, addressing the overarching question: do high-cost ad strategies improve recruitment outcomes? Methods & Data Our findings reveal that targeted social media recruitment can enhance demographic balance in certain respects but does not fully eliminate sampling biases. For instance, Meta’s age and gender targeting improved representation among older individuals but showed limitations for younger demographics. Additionally, while the socio-demographic estimates provided by Meta are generally reliable, slight misclassifications (around 5%) were observed. Cost analysis revealed that lower recruitment budgets yielded the most cost-effective samples, contradicting the notion that higher spending guarantees improved sample composition. Higher ad expenditures increased reach but also raised cost per participant, suggesting that strategic budget allocation is essential for optimal sample utilization. Added Value: |
2:30pm - 3:45pm | INVITED SESSION IV: DGOF KI (AI) Forum: World Café (Session held in German) Opportunities & Challenges in Applying AI to Market Research Location: Hörsaal D Session Chair: Georg Wittenburg, Inspirient, Germany Session Chair: Oliver Tabino, Q Agentur für Forschung, Germany Engage in World Café discussions with fellow GOR participants. Discuss your personal challenges and barriers of integrating AI into market research projects and collaborate on strategies to overcome them. Share experiences and gain new insights on successfully implementing AI in market research projects. |
3:45pm - 4:00pm | Break |
4:00pm - 5:00pm | 12.1: Reducing Nonresponse Location: Hörsaal A Session Chair: Alexandra Asimov, GESIS, Germany |
|
CAPI, or not CAPI – That Is the Question: Using Administrative Data to Assign the Optimal Mode for Maximizing Response-Rates in a Household Panel Statistik Austria, Austria Relevance & Research Question: Selecting the appropriate mode(s) of data collection is a major consideration for every survey. Personal interviews (CAPI) – typically recognized as the gold standard of data collection in surveys – have been increasingly called into question as the be-all and end-all of data collection methods, with a growing shift towards self-administered modes, particularly web surveys (CAWI). Reasons for this range from representation concerns due to shifting mode preferences and the flexibility these modes offer to busy respondents, all the way to practical constraints like health and safety concerns, interviewer availability, and budgetary restrictions. Yet, recruitment using CAWI alone might result in biases due to, e.g., systematic differences in digital skills. Thus, many surveys employ mixed-mode designs, raising the question of how to determine which mode should be offered to whom. Methods & Data: The Austrian Socio-Economic Panel (ASEP), a household panel of the Austrian population, experimentally tested a tailored mode-design using administrative data to assign half of the sample's households to their presumably preferred mode (CAPI/CAWI) while also offering the other mode after persistent non-response. The other households were randomly assigned to one of the mode-designs (CAPI-First/CAWI-First) as control groups. To evaluate the utility of the tailored mode-design concept, we employed a multi-faceted analytical approach by comparing a variety of indicators, such as the rate of proxy-interviews, the number of requested mode-changes, or the overall response-rates and resulting nonresponse bias, between the different mode-designs. Added Value: With the tailored mode-design, we present an interesting and promising alternative to already established single- or mixed-mode designs. This novel approach could prove helpful in decreasing nonresponse bias and survey costs while maintaining data quality. The Framework of Survey Behaviour: An Extension of the Framework for Web Survey Participation Statistics Netherlands Relevance & Research Question: Why do people behave the way they do in surveys? The answer to this fundamental question in survey research can help increase survey participation, decrease break-off and improve data quality. Underneath this seemingly simple question is a complex interplay of factors influencing survey behaviour (i.e., the behaviour of (potential) respondents). While current frameworks, theories and models provide valuable insights into this behaviour, they all have limitations in understanding survey behaviour as a whole. Furthermore, none are generically applicable across survey behaviours for all modes, devices, and target populations (i.e., person, household, establishment). Methods & Data: We conducted an extensive literature review of both generic behavioural and survey-specific frameworks, theories and models. Using the Framework for Web Survey Participation (Peytchev, 2009) as a starting point, we extended this framework into our generic Framework of Survey Behaviour. Results: The resulting framework provides a holistic view of the factors affecting the key survey decisions and the underlying behaviours that shape those decisions. 
The key survey decisions reflect the three main goals in survey research: getting people to start the survey, complete the survey, and provide high-quality responses. These decisions are affected by five groups of factors: environmental factors, respondent factors, interviewer factors, survey design factors, and questionnaire factors. The underlying survey behaviours that shape those decisions are diverse and range from (proxy) responding, satisficing, breaking off, and straightlining to speeding. Added Value: By centralising behaviour in the framework, we offer a comprehensive approach that considers all human, organisational, and environmental elements involved in the survey process. The framework guides researchers in designing surveys and collecting high-quality data across diverse contexts. Understanding and being able to influence survey behaviour for the better is key to improving respondent engagement and data quality. Practical recommendations are provided, and future research areas are identified. References: Peytchev, A. (2009). Survey breakoff. Public Opinion Quarterly, 73(1), 74–97. Survey design features that matter: A meta-analysis using official statistics surveys of the Netherlands Statistics Netherlands Relevance & Research Question: More and more surveys are being conducted, but response rates are declining. The most effective way to deal with missing data is to avoid having any in the first place. Understanding which factors influence response rates is therefore crucial for improving survey participation and reducing nonresponse bias. This study investigates which features significantly affect response rates and how they can be optimized to improve survey participation. Methods & Data: We conducted a multilevel meta-analysis using Statistics Netherlands' data from 38 person population surveys with over 1,200 samples. The surveys were fielded over a seven-year period (2018–2024) and had a total sample size of over 7 million people. These surveys range from one-time to recurring studies with frequencies from weekly to biennially. We used 72 factors that potentially affect response rates, such as respondent factors (e.g., age, gender, nationality, device use), survey design factors (e.g., year, month, mode, incentive, fieldwork period, number of contacts, topic), and page & question factors (e.g., duration, number of blocks, pages & questions, number of questions per question type, number of introduction texts). Results: Preliminary findings suggest that the data collection mode, type of incentives, survey topic, the number and types of questions, the device used by the respondent, and age and gender have significant effects on the response rate. Interestingly, the length of the fieldwork period, the number of reminders, and the periodicity of the survey show non-significant effects. Added Value: This study offers comprehensive insights into improving response rates for official statistics surveys, highlighting effective data collection strategies and identifying survey designs to avoid. Utilizing a wide range of different design features, the study serves as a practical toolbox for national statistical agencies, survey agencies, and survey researchers alike. Notably, it represents the first practical application of the Framework of Survey Behaviour, which has also been submitted to this conference. |
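To make the multilevel approach described in the meta-analysis abstract above more concrete, the sketch below fits a mixed-effects model of response rates with a random intercept per survey. It is a minimal illustration only: the data frame, the column names (survey_id, response_rate, mode, incentive, n_questions), and the simulated effect sizes are hypothetical placeholders, not the Statistics Netherlands data or model specification.

    # Minimal sketch of a multilevel (mixed-effects) analysis of response rates.
    # All data, column names, and effects are hypothetical placeholders.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(42)
    rows = []
    for s in range(10):                                  # 10 hypothetical surveys
        survey_effect = rng.normal(0, 0.05)              # between-survey heterogeneity
        for _ in range(12):                              # 12 fielded samples per survey
            mode = rng.choice(["web", "mixed"])
            incentive = int(rng.integers(0, 2))
            n_questions = int(rng.integers(20, 90))
            rr = (0.45 + survey_effect
                  + 0.06 * (mode == "mixed")
                  + 0.04 * incentive
                  - 0.001 * n_questions
                  + rng.normal(0, 0.03))
            rows.append((f"S{s}", rr, mode, incentive, n_questions))

    df = pd.DataFrame(rows, columns=["survey_id", "response_rate", "mode",
                                     "incentive", "n_questions"])

    # Random intercept per survey; fixed effects for the (hypothetical) design features.
    model = smf.mixedlm("response_rate ~ C(mode) + incentive + n_questions",
                        data=df, groups=df["survey_id"])
    print(model.fit().summary())

In this setup, the survey-level random intercept absorbs heterogeneity between surveys, so the fixed-effect coefficients summarize how the design features relate to response rates across samples.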
4:00pm - 5:00pm | 12.2: App-based Diary Studies Location: Hörsaal B Session Chair: Otto Hellwig, Bilendi & respondi, Germany |
|
The effect of personalized feedback on participant behavior in data collection: Using paradata to understand participation rates and participant engagement in app-based data collection 1Utrecht University, The Netherlands; 2Statistics Netherlands Relevance & Research Question From 1441 households, 290 participated in the study. We find no effect on participation rates and study engagement of promising feedback at the invitation stage or of providing participants with personalized feedback. We will present detailed findings of our analysis of paradata to demonstrate how participants interacted with the personalized feedback (e.g., number and duration of engagements, changes to reporting of expenses in the feedback and no-feedback groups), as well as whether the feedback had an effect on specific subgroups. Added Value A Recipe to Handle Receipts? Usability-testing the Receipt Scanning Function of an App-based Household Budget Diary Federal Statistical Office Germany (Destatis), Germany Relevance & Research Question As part of the EU project Smart Survey Implementation (SSI), the Federal Statistical Office of Germany (Destatis) is participating in the development of Smart Surveys. Smart Surveys combine traditional question-based surveys with smart features that collect data by accessing device sensor data. One smart feature is a receipt scanner, which makes use of the smartphone camera and allows participants to upload pictures of shopping receipts in a survey app. The aim is to reduce respondent burden in diary-based household budget surveys. However, smart features can only help to simplify surveys if they can be easily used. Therefore, the usability of the receipt scanning function was tested with potential respondents. Methods & Data Destatis conducted qualitative usability tests with a diary-based household budget app featuring a prototype of a receipt scanner. Nineteen participants used the app while their interaction with it was observed, followed by an interview on their experiences. Results Given the choice between manual input and using the scanner, respondents prefer the scan function to record purchases. Participants appreciate the fast and easy way to record receipts compared to the manual input of purchases. Making use of the scan function does not raise privacy concerns – at least not as long as a clear advantage is given and the publisher of the app is known and trusted. All participants were able to use the scan function, although the user friendliness of the current state of development proved to be insufficient. As a prototype was tested, the scan results were prone to errors, which had to be corrected. Respondents do not accept having to correct data, as the effort involved is perceived as too high and results are expected to be accurate. Added Value The study shows that a receipt scanning function per se is highly appreciated. However, in order to be used by respondents, it is imperative that the function works perfectly, that its operation is easy to understand, and that it involves little effort. Concerning the further development of this smart feature, the results confirm our approach but also show where improvements are needed. How many diary days? 
Smart surveys can help to reduce burden and costs of data collection for behavioural statistics 1Utrecht University, The Netherlands; 2Statistics Netherlands Relevance & Research Question Diary studies are used to capture detailed respondent behaviour, but they are highly burdensome for respondents, resulting in nonresponse and measurement errors. Smart surveys (i.e., using a smartphone or activity tracker) can make use of sensors that can replace questions, reducing the response burden. Researchers can then collect diary data for a prolonged time period. We aim to answer the question "How many days should we collect data for a smart diary study in behavioural statistics?", considering the response burden and the inter- and intra-personal variability of statistics. Methods & Data We use data from four one-week smart diary studies: a physical activity study (2021, N = 414), a travel study (2018 and 2022, N = 185), and a budget study (2022, N = 10,421). All studies used probability-based samples, were conducted by Statistics Netherlands, and have different levels of response burden. For each data source we calculate two statistics based on the frequency and duration of the studied behaviour. For each statistic we separate the inter-individual and intra-individual variance. We calculate the intraclass correlation coefficient (ICC), here defined as the proportion of intra-individual variance in the total variance. We then calculate the reliability of a study period of 7 or fewer days based on these variance components. Results The intra-individual variance for 7 days is high for each of the studies, meaning that not every day looks the same for the studied individuals. The ICC is approximately 50% for travel statistics, 55% for physical activity statistics, and 90% for expenditure statistics. The reliability for the first two studies is around 0.80 at 7 days, meaning that 7 days provide enough information. For expenditure, in contrast, reliability is around 0.50 at 7 days, implying that more days are useful. Added Value Our results can guide the decision on the number of diary days by taking both burden and data quality into consideration. The results can help practitioners efficiently allocate resources (e.g., lower sample size, shorter data collection), reducing costs. Our conclusions and recommendations are relevant across a wide range of research areas, including studies on (time-use) behaviour and studies using (smartphone) sensors or apps. |
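The number-of-days question in the abstract above rests on a variance-decomposition argument: once the between-person and within-person (day-to-day) variance components are known, the reliability of a d-day mean follows a Spearman-Brown-type formula, reliability(d) = var_between / (var_between + var_within / d). The sketch below only illustrates this logic with placeholder variance shares; the actual estimates belong to the study, and the ICC is read, as in the abstract, as the intra-individual share of total variance.

    # Sketch of the reliability-by-number-of-diary-days logic described above.
    # Variance components are hypothetical placeholders, not the study's estimates.

    def reliability(var_between: float, var_within: float, days: int) -> float:
        """Reliability of a person-level mean over `days` diary days
        under a one-way random-effects model (Spearman-Brown-type formula)."""
        return var_between / (var_between + var_within / days)

    # Example: day-to-day (intra-individual) variance is 55% of total variance,
    # i.e., an ICC of 55% in the abstract's sense; total variance scaled to 1.
    var_within = 0.55
    var_between = 1.0 - var_within

    for d in (1, 3, 7, 14):
        print(f"{d:2d} days -> reliability = {reliability(var_between, var_within, d):.2f}")

With a large within-person share (as for expenditure), reliability at 7 days stays low and more diary days help; with a smaller share, a week already yields a stable person-level estimate.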
4:00pm - 5:00pm | 12.3: Reminders and Targeting Location: Hörsaal C Session Chair: Florian Heinritz, Leibniz Institute for Educational Trajectories, Germany |
|
I want YOU for this survey! Text interventions in email reminders to increase response rates in web surveys University of Gothenburg, Sweden Relevance & Research Question This is a 2x2 factorial experiment using the Swedish Citizen Panel, a non-commercial, non-incentivized web panel. Each group received a different intervention: (1) control group/no intervention (a general reminder), (2) social identity intervention (eliciting a sense of social identity with phrasing such as "We need more responses from people like you"), (3) descriptive norm intervention (inducing norm activation with a text like "Most people have already responded but we still need your answer"), and (4) a combination of the social identity intervention (2) and the descriptive norm intervention (3). Results Results indicate a positive pattern for all three experimental conditions. The response rate after the first reminder was significantly higher in the combined social identity and descriptive norm intervention (18.8%) compared with the control group (15.6%). Sub-group analyses indicate that certain social groups are more sensitive to the experimental manipulations. Added Value The effects of using digital mailbox reminders The SOM Institute, University of Gothenburg, Sweden Relevance & Research Question Text messages have often been used as a cost-effective way to remind sample persons to complete mailed paper-and-pencil and online questionnaires. However, in some countries, the likelihood of identifying phone numbers of sample persons has decreased rapidly. Fortuitously, digital mailboxes have been implemented in Sweden, where government authorities and companies can send mail digitally instead of physically. According to the Agency for Digital Government (2024), more than 6 million Swedes used these digital mailboxes in 2023. The present study assessed the effect of reminding potential respondents through digital mailboxes instead of through text messages. Methods & Data The preregistered experiment was implemented in a self-administered mixed-mode survey (paper-and-pencil and web questionnaire) administered to a sample of 26,250 randomly selected individuals. Prior to being invited to complete the questionnaire, all sample persons were randomly assigned to one of two groups. One group of sample persons received reminders via four regular mailings and four text messages, whereas the other group received reminders through four regular mailings, two text messages, and two digital mailbox messages. Per the preregistration, the study will assess the two reminder strategies in terms of response rates, nonresponse bias, and data quality. Results The data collection period started in August 2024 and was finalized at the beginning of January 2025. The results indicated that sending reminders through the digital mailbox had a significantly positive effect on the response rate. Added Value Can Targeted Appeals Win Over Late Respondents in Business Web Surveys? 1Swiss Federal University for Vocational Education and Training SFUVET, Switzerland; 2Konstanz University, Germany Relevance & Research Question Nonresponse is a particularly salient issue in the context of business web surveys, as response rates for such surveys are on average lower than for personal web surveys. Previous research in the field of business surveys has mainly focused on testing the utility of different contact and survey modes to increase response rates. 
The present study uses an experimental approach to test an adapted and expanded version of a design feature originally proposed by Lewis et al. (2019) to increase motivation to complete the survey. This is done by implementing more interactive and targeted communication with late respondents, i.e., potential nonrespondents. Methods & Data A random sample of late respondents (N = 3,000 in total) to a well-established Swiss business web survey was drawn and asked to indicate their primary reason for nonresponse at various stages in the survey design. Subsequently, they were randomly presented with one of three appeals: a targeted appeal addressing their previously stated reason for nonresponse, a generic appeal addressing some of the most prevalent reasons for nonresponse, or a main appeal providing arguments addressing the nonresponse reason of "no time". After reading the appeal, respondents had to indicate whether they wanted to proceed immediately to the main survey or postpone the decision to participate. Results As hypothesized, the targeted appeal demonstrated a more pronounced motivational effect than the two alternative appeal types in persuading respondents to proceed to the main survey. However, the motivational effect was limited insofar as it did not extend to a higher motivation to provide more complete data. Furthermore, more detailed analyses revealed that the conversion rates are neither independent of the type of nonresponse reason nor of timing factors. Added Value The study demonstrates that the implementation of such an interactive feature is feasible and potentially worthwhile, even in the context of a challenging business survey. It shows the potential to motivate target respondents to participate, particularly at a later stage in the fieldwork period. |
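To illustrate how the conversion-rate comparison reported above could be analysed, the sketch below cross-tabulates appeal type against the proceed/postpone decision and applies a chi-square test of independence. The counts are invented placeholders, and the test is only one plausible analysis choice, not necessarily the one used in the study.

    # Sketch of a conversion-rate comparison across the three appeal types
    # (targeted / generic / "no time") using a chi-square test of independence.
    # The counts below are invented placeholders, not the study's data.
    import pandas as pd
    from scipy.stats import chi2_contingency

    counts = pd.DataFrame(
        {"proceeded": [220, 170, 160], "postponed": [780, 830, 840]},
        index=["targeted", "generic", "no_time"],
    )

    # Share of late respondents in each appeal group who proceeded to the survey.
    counts["conversion_rate"] = counts["proceeded"] / counts.sum(axis=1)
    print(counts)

    chi2, p, dof, expected = chi2_contingency(counts[["proceeded", "postponed"]])
    print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")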