Beyond the First Questionnaire: Retaining Participants in an App-based Household Budget Survey
Maren Fritz, Florian Keusch
University of Mannheim, Germany
Relevance & Research Question
Survey attrition is a common problem in household budget surveys (HBS), as such surveys impose a high burden on participants, asking them to report their expenses daily for a specific period. We study attrition in an app-based HBS with three initial questionnaires and a 14-day diary. Participants can drop out at each of the questionnaires or during the diary, and the data obtained from respondents who drop out is incomplete. The research questions are: 1. At what stage do participants drop out of an app-based HBS? 2. Does the way the study is presented in the invitation letter influence drop-out? 3. What individual characteristics correlate with drop-out at the different stages?
Methods & Data
In 2024, we drew a probability sample of 7,049 individuals from the register of residents and invited them by mail. We included an experimental variation in the invitation letters to analyse whether stressing the effort associated with participation and whether mentioning a receipt-scanning function influence participation. The survey consisted of three initial questionnaires covering personal information, income and expenses, and information about the household. After completing them, respondents were asked to continue with an expense diary. To be eligible for an incentive, participants had to manually enter their expenses or scan their receipts on at least seven days.
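As an illustration only, the stage-wise completion rates and the seven-day incentive rule described above can be tabulated as in the following minimal pandas sketch; the data frame and column names (q1_done, diary_days, etc.) are hypothetical, not the study's actual data structure.

```python
import pandas as pd

# Hypothetical participation records, one row per sampled person; the column
# names and values are illustrative, not the actual study data.
df = pd.DataFrame({
    "q1_done":    [1, 1, 1, 0],
    "q2_done":    [1, 1, 0, 0],
    "q3_done":    [1, 1, 0, 0],
    "diary_days": [9, 3, 0, 0],   # days with at least one manual entry or scanned receipt
})

# Share of the sample completing each questionnaire stage.
retention = {stage: df[stage].mean() for stage in ["q1_done", "q2_done", "q3_done"]}

# Seven-day rule for incentive eligibility, as described in the design.
retention["incentive_eligible"] = (df["diary_days"] >= 7).mean()

print(retention)
```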
Results
Participants drop out of the study continuously, even after they have started data entry in the diary. The poster will present results on attrition and drop-out across the stages and on whether the three experimental groups differ significantly in their attrition rates. Additionally, the poster shows whether people with specific characteristics are more likely to drop out at a particular stage.
Added Value
This research is part of the Smart Survey Implementation (SSI) project, funded by Eurostat, which aims to enhance data collection for official statistics across Europe through digital innovation. The experiment addresses attrition in app-based HBS and informs decisions about which actions can be taken to reduce it.
Joint Evaluation of LLM and Human Annotations with MultiTrait–MultiError Models
Georg Ahnert1, Maximilian Kreutner1, Alexandru Cernat2, Markus Strohmaier1,3,4
1University of Mannheim; 2University of Manchester; 3GESIS–Leibniz Institute for the Social Sciences; 4CSH Vienna
Relevance & Research Question Large Language Models (LLMs) are increasingly used for text annotation and for simulating survey responses via silicon samples. To evaluate their validity, researchers commonly assess the alignment of generated annotations and survey responses with data from human participants. However, treating human data as ground truth overlooks the various sources of measurement error inherent in human data collection.
We propose applying MultiTrait–MultiError (MTME) models to jointly analyze imperfect responses from LLMs and humans. Our goal is to assess the extent to which MTME models can quantify measurement quality in responses from LLMs and humans, and whether they can improve estimates of underlying traits.
Methods & Data Our MTME models estimate latent trait and error factors based on multiple measures of the same trait, e.g., the readability of a text excerpt, obtained from humans and LLMs. LLM measures may vary by model size, model family, prompting strategy, or decoding approach.
In an initial study, we fit an MTME model on LLM-generated annotations of publication year and readability for 600 text excerpts from the CommonLit Ease of Readability (CLEAR) corpus. We evaluate four Qwen 3 LLMs (4B–32B). We include ground-truth data from the CLEAR corpus in our evaluation, purely to demonstrate the feasibility of the MTME approach.
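To make the model structure concrete, below is a minimal sketch of an MTMM/MTME-style confirmatory factor model in Python using the semopy package. The variable names (read_qwen4b, year_qwen32b, etc.), the input file, and the exact factor structure are assumptions for illustration; the authors' actual specification, identification constraints, and estimator may differ.

```python
import pandas as pd
from semopy import Model

# Lavaan-style model description: two trait factors (readability, publication year)
# each measured by four LLMs, plus one method/error factor per LLM.
# Identification constraints are omitted in this sketch and need care in practice.
desc = """
Readability =~ read_qwen4b + read_qwen8b + read_qwen14b + read_qwen32b
Year        =~ year_qwen4b + year_qwen8b + year_qwen14b + year_qwen32b

M4b  =~ read_qwen4b  + year_qwen4b
M8b  =~ read_qwen8b  + year_qwen8b
M14b =~ read_qwen14b + year_qwen14b
M32b =~ read_qwen32b + year_qwen32b
"""

# Hypothetical file with one row per text excerpt and one column per LLM annotation.
data = pd.read_csv("llm_annotations.csv")

model = Model(desc)
model.fit(data)
print(model.inspect())  # parameter estimates; loadings reflect measurement quality per LLM
```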
Results Our initial results show that the measurement quality estimated by the MTME model partially matches correlations of individual LLMs with the CLEAR data. We find that the factor scores extracted from the MTME model for readability have a higher correlation with the CLEAR data (0.76) than the responses of each individual LLM (≤ 0.70), indicating that MTME aggregation yields a more accurate estimate of the underlying trait. Next, we will incorporate human-coded annotations into the MTME model and conduct experiments involving other datasets, categorical variables, and LLM-generated survey responses.
Added Value We demonstrate that MTME models can jointly estimate measurement quality in responses from humans and LLMs. By extracting latent traits that pool information across participants and models, we show that MTME-generated predictions can improve the estimation of underlying traits. MTME models are a promising method for the joint analysis of LLM and human data.
Lessons Learned: Utilizing Social Media Influencers for Targeted Recruitment on Discrimination in the German Healthcare System
Zaza Zindel1,2, Aylin Mengi1, Zerrin Salikutluk1,3, Tae Kim1
1German Centre for Integration and Migration Research (DeZIM), Germany; 2Bielefeld University, Germany; 3Humboldt-University, Germany
Relevance & Research Question Traditional sampling strategies often fail to recruit sufficient cases from small or marginalized populations, especially for sensitive topics where distrust or fear of repercussions are common. While social media recruitment is increasingly discussed in survey methodology, most work centers on targeted advertising rather than leveraging the “social” infrastructure of platforms: trusted creators embedded in tightly segmented communities. Influencers can function as credible intermediaries and, methodologically, as seeds in a semi-network (virtual snowball) recruitment process. This poster reports lessons learned from a feasibility study leveraging a network of “Medfluencers” and anti-racism advocates to recruit participants for research on discrimination in the German healthcare system. We ask: Is engaging social media influencers worthwhile for survey recruitment, and what methodological insights emerge regarding reach, conversion, and sample composition?
Methods & Data We fielded an online survey on experiences of discrimination in the German healthcare system among patients and medical professionals. The questionnaire combined closed items with multiple open-ended text prompts to elicit detailed accounts. Recruitment centered on one primary seed influencer: a practicing medical doctor who is also a visible Muslim woman. She shared the survey link and mobilized her network of fellow influencers to repost, creating a virtual snowball mechanism through interconnected audiences. Data collection ran from February 1 to 25, 2025.
Results The seed post received 14,952 likes and 437 comments. The survey link was accessed 42,530 times; 27,710 individuals consented; 17,390 completed the questionnaire (62% post-consent completion). The campaign produced a strong multiplier effect over a short field period and reached key target groups: 3,505 Muslim participants and 2,979 respondents working in the German medical system. The sample was more female (n = 16,803) and younger (mean age 35.2 years) than the general German population. High perceived source credibility appeared to facilitate trust and candid answering; 2,387 respondents volunteered for follow-up interviews.
Added Value Although not suited for population inference, influencer sampling can efficiently generate large, highly motivated samples in under-researched groups. The poster concludes with concrete lessons learned on when and how influencer-seeded recruitment can be productively used in survey research, and which limitations researchers must communicate transparently when interpreting the resulting data.
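For reference, the conversion figures reported above can be recomputed directly; the short Python snippet below only reproduces arithmetic on the numbers given in the abstract.

```python
# Conversion funnel recomputed from the figures reported in the abstract.
accessed, consented, completed = 42_530, 27_710, 17_390

print(f"consent rate:            {consented / accessed:.1%}")   # ~65.2% of link accesses
print(f"post-consent completion: {completed / consented:.1%}")  # ~62.8%, reported as 62%
print(f"overall completion:      {completed / accessed:.1%}")   # ~40.9% of link accesses
```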
Classifying Moral Reasoning in Political Discourse: Demonstrating Interrater Reliability and Testing an AI-Based Classification Approach
Felix Schmirler, Rudolf Kerschreiter
Freie Universität Berlin, Germany
Relevance & Research Question
Moral reasoning, whether people justify political positions through rules and duties (deontological reasoning) or through expected outcomes (consequentialist reasoning), is central to understanding how people deliberate in polarized political (online) debates. While experimental moral-dilemma research shows differences in rule-based vs. outcome-based judgments depending on political ideology, it remains unclear whether such patterns manifest in real-world political communication. Capturing moral reasoning in naturalistic discourse could deepen our understanding of how polarization emerges and provide a foundation for designing interventions that adapt to people’s reasoning styles. This study asks: Can moral reasoning in political discourse be reliably classified using NLP methods, and do we find evidence supporting findings from experimental moral-dilemma research?
Methods & Data
This study presents a validation of an approach to classify moral reasoning in political discourse using a large language model (LLM). A corpus of 576 sentences from Reddit discussions and German parliamentary speeches was pre-sampled using a novel extension of the Distributed Dictionary Representations (DDR) method (Garten et al., 2018), which identifies sentences with high cosine similarity to exemplary reasoning styles. Two expert raters then independently coded each sentence as deontological, consequentialist, or neutral and adapted the coding manual based on deliberation after each round.
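As a rough illustration of the DDR-style pre-sampling idea, the sketch below ranks candidate sentences by cosine similarity to averaged embeddings of exemplar sentences. The embedding model, exemplar sentences, and similarity cutoff are hypothetical choices for demonstration, not the authors' actual extension of the DDR method.

```python
from sentence_transformers import SentenceTransformer, util

# Embedding model chosen for illustration only (the corpus includes German material,
# so a multilingual model is assumed here).
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# Hypothetical exemplar sentences defining each reasoning style ("dictionary" vectors).
deontological = [
    "It is simply wrong, no matter the consequences.",
    "We have a duty to follow the rule.",
]
consequentialist = [
    "We should do whatever leads to the best outcome.",
    "The benefits clearly outweigh the harms.",
]
candidates = [
    "Breaking this law is never acceptable.",
    "The policy saves more lives overall.",
    "The weather was nice yesterday.",
]

deon_vec = model.encode(deontological).mean(axis=0)
cons_vec = model.encode(consequentialist).mean(axis=0)
cand_vecs = model.encode(candidates)

# Keep sentences whose similarity to either style vector exceeds an illustrative cutoff.
for sentence, vec in zip(candidates, cand_vecs):
    sims = (float(util.cos_sim(vec, deon_vec)), float(util.cos_sim(vec, cons_vec)))
    if max(sims) > 0.4:
        print(sentence, [round(s, 2) for s in sims])
```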
Results Interrater reliability improved across three codebook iterations, reaching excellent levels (Krippendorff’s α = .56–.68 → .92–.93). Agreement between human and AI-assigned labels based on cosine similarity was subsequently also sufficient to demonstrate the feasibility of classifying moral reasoning styles in large text corpora through LLMs (Krippendorff’s α = .70–.73). Building on this validation, the method will be applied to a larger multilingual corpus (~12k sentences) to analyze ideological and temporal patterns in moral reasoning across political and cultural contexts.
Added Value This work offers the first validated procedure for detecting deontological and consequentialist reasoning in naturalistic political communication, bridging experimental moral psychology and large-scale text analysis. It establishes a foundation for ongoing research using a multilingual, cross-country corpus to study ideological and temporal patterns in moral reasoning, with potential downstream applications for designing tailored, depolarizing communication interventions.
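Purely to illustrate the reliability statistic used above, the following snippet computes Krippendorff's α for nominal codes with the krippendorff Python package; the toy coder-by-sentence matrix is invented, not study data.

```python
import numpy as np
import krippendorff

# Rows = coders (e.g., two human raters and one LLM), columns = sentences;
# 0 = neutral, 1 = deontological, 2 = consequentialist, np.nan = not coded.
reliability_data = np.array([
    [1, 2, 0, 1, 2, 0, 1, 2],
    [1, 2, 0, 1, 2, 0, 2, 2],
    [1, 2, 0, 1, 2, np.nan, 1, 2],
], dtype=float)

alpha = krippendorff.alpha(reliability_data=reliability_data,
                           level_of_measurement="nominal")
print(f"Krippendorff's alpha: {alpha:.2f}")
```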
Dynamic Surveys for Dynamic Life Courses: Development of a Web-App for Self-Administered Life History Data Collection
Sebastian Lang1, Heike Spangenberg2, David Ohlendorf2, Heiko Quast2, Leena Lahse2
1Leibniz Institute for Educational Trajectories (LIfBi), Germany; 2German Centre for Higher Education Research and Science Studies (DZHW), Germany
Relevance & Research Question Life history data is essential in social sciences, yet retrospective collection often suffers from memory errors, reducing data quality. The Life History Calendar (LHC) addresses some of these issues, but its traditional interviewer-based administration in surveys is costly and participation rates are declining. Self-administered web surveys (CAWI) offer a promising alternative. Our research asks: Can a dynamic, self-administered web application with an integrated LHC and continuous access improve data quality and reduce response burden compared to a classic retrospective collection?
Methods & Data To explore this, we developed a web app that enables respondents to update their life courses continuously, aiming to enhance usability, minimize recall errors, and simplify panel maintenance. We designed two experiments in which respondents are randomly assigned to treatment groups (continuous life history data collection with or without reminders) and a control group (classic retrospective collection). Preliminary results come from the first experiment, implemented as a NEPS next add-on study using starting cohort three. We analyze response rates and the effect of the new LHC implementation on response burden using a fixed-effects panel regression model.
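A minimal sketch of the kind of fixed-effects panel regression described above is given below, using the linearmodels package; the file name, variable names, index structure, and clustering choice are assumptions for illustration rather than the authors' actual specification.

```python
import pandas as pd
from linearmodels.panel import PanelOLS

# Hypothetical long-format data: one row per respondent and questionnaire section.
df = pd.read_csv("burden_panel.csv")
df = df.set_index(["respondent_id", "section"])  # entity and "time" dimensions

# Response burden regressed on an indicator for sections after the LHC module and
# its interaction with the experimental group, with respondent fixed effects.
mod = PanelOLS.from_formula(
    "burden ~ post_lhc + post_lhc:treatment_group + EntityEffects", data=df
)
res = mod.fit(cov_type="clustered", cluster_entity=True)
print(res.summary)
```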
Results First results show no differences in initial participation across groups. Similarly, response rates (RR1) for completed interviews do not differ between the existing LHC and the new web-app implementation. Regarding response burden, we observe an increase immediately after the LHC section, but this increase does not differ across experimental groups.
Added Value Our approach introduces a flexible, cost-efficient solution for self-administered life history data collection. The added flexibility in reporting life courses did not negatively affect participation or burden, which is a key prerequisite for the main experiment on continuous access. This innovation enables continuous updates, simplifies panel management, and aims to counteract attrition, offering a scalable model for improving longitudinal data quality in self-administered life history collection.