GOR 26 - Annual Conference & Workshops
Annual Conference - Rheinische Hochschule Cologne, Campus Vogelsanger Straße
26 - 27 February 2026
GOR Workshops - GESIS - Leibniz-Institut für Sozialwissenschaften in Cologne
25 February 2026
Conference Agenda
Session 2.2: Paradata and metadata
Presentations
Metadata uplift of survey data for research discovery and provenance
¹University College London; ²ScotCen; ³University of Surrey; ⁴University of Essex
Relevance & Research Question: The profusion of data since the introduction of computer-assisted interviewing (CAI) has created three main problems for researchers: volume, complexity, and the difficulty of understanding quality. The disjointed nature of many survey data collections has fragmented this information across many organisations, and in the process much valuable information is lost or remains opaque to researchers who use the data at the end of the data lifecycle.
Methods & Data: Focused, well-constructed machine learning offers the possibility of converting the available metadata resources, at scale, into standardised metadata that can be made available in repositories for discovery, and of creating the detailed, granular metadata that researchers need to evaluate the quality of a complex survey before applying for or accessing the data. A collaboration between CLOSER, the University of Essex, the University of Surrey and ScotCen has been developing machine learning models that use the CLOSER Discovery metadata store to improve the timeliness and accuracy of metadata extraction and thereby deliver high-quality metadata.
Results: Preliminary results will be presented on the successes and challenges of taking complex survey instruments and rendering them in DDI-Lifecycle for ingest into repository platforms, and on the opportunities for further enhancing these metadata resources so that questions can be reused in the survey development pipeline.
Added Value: The ability to create high-quality, reusable metadata across the survey specification, collection, management and dissemination lifecycle would reduce costs and improve the quality, discoverability and understanding of these complex data resources.

Beyond the Questionnaire: Linking Passively Metered Platform Data with Surveys for Audience Profiling
Datapods GmbH, Germany
Relevance & Research Question: The integration of large-scale passively metered data with established survey methodologies has become a central development in contemporary market and social research. In particular, digital trace data originating from major platform operators such as Google, Meta and TikTok represent a highly promising source for enhancing sociological measurement, audience segmentation and modeling. However, substantial challenges remain in obtaining continuous, consent-based access to such platform data and in linking heterogeneous data types in a methodologically robust and privacy-compliant way.
Methods & Data: Datapods has established a novel approach using its own proprietary user panel, which allows survey methods that define socio-economic, value-based and demographic profiles to be combined with direct copies of panelists' personal data held by big tech companies. We first established baseline profiles using common survey methods; these measures serve as our ground truth for subsequent validation. In a second step, we linked these baseline profiles with the corresponding behavioral data streams, including web-browsing histories, YouTube viewing histories and interaction logs on Instagram, Facebook and TikTok. For each data type, we identified the key indicators most influential for a panelist's profile, and we joined data across types to obtain a holistic picture of each user.
Results: Early results indicate that only a relatively small subset of survey items adds substantial incremental information beyond what is already embedded in the digital trace data. In practice, researchers can rely on high-quality, consent-based personal data to assign users to predefined socio-demographic and value-based target group profiles with high accuracy, and to identify fundamental clusters and segments within the panel. The results suggest that passively collected platform data can serve as a proxy for many conventional survey indicators. Final empirical results will be available by the end of 2025 and will be submitted afterwards.
Added Value: This passively metered, platform-data-based approach to user profiling and segmentation substantially enhances survey-centric designs and, for certain research questions, can partially or even fully replace conventional survey data collection. It enables more granular behavioral indicators and provides a scalable solution for continuous audience measurement and sociological analysis.

Visualizing the Answering Process: Exploring Mode Differences with Respondent-Level Paradata from the IAB Establishment Panel
¹Institute for Employment Research, Germany; ²LMU Munich
Relevance & Research Question: Understanding how respondents interact with survey instruments is crucial for facilitating the response process and improving data quality. For establishments in particular, insight into response behavior is still lacking. By analyzing respondent-level paradata, we aim to explore the answering process in detail. We investigate how establishments navigate through the online questionnaire of the IAB Establishment Panel, focusing on differences between survey modes (CAI versus Web) and samples (panel versus refreshment). We then investigate whether distinct response patterns can be identified and, using key establishment characteristics, whether a tailored response process is needed.
Methods & Data: We analyze detailed respondent-side paradata from the IAB Establishment Panel, conducted annually by the Institute for Employment Research (IAB). Since 2018, the survey has been implemented in a mixed-mode design with computer-assisted personal interviewing (CAI) and an online mode (Web), both using identical software. In 2022, we collected paradata logging every click, answer and timestamp at one-second resolution. After creating an audit trail for each respondent, we identify appropriate paradata indicators and apply cluster analysis to identify groups of establishments with similar navigation and response behaviors.
Results: Visualizing the paradata as audit trails reveals differences in navigation behavior. Some establishments follow a straightforward sequence, while others loop back or perform multiple checks before submission. The paradata indicators show that Web respondents take longer, take more breaks, use the tree view more, edit more answers and drop out more often than CAI respondents. Preliminary results of the cluster analysis show that two clusters can be identified for each combination of sample and mode. Cluster 1 appears to contain the linear respondents, while cluster 2 contains all other respondents, whose response behavior is more conspicuous.
Added Value: By combining visualization and clustering of establishments' response processes, we provide an empirical approach to using paradata. Our results help explain how establishments navigate and respond to a survey. We also derive recommendations for survey design and evaluate mixed-mode establishment surveys.
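The IAB abstract does not say how its audit trails, indicators or clusters are computed, so the following is only a minimal sketch of that general pipeline under loudly stated assumptions. The event schema (`Event` with `respondent`, `t`, `kind`, `item`), the event kinds `"answer"`, `"tree_view"` and `"submit"`, the 300-second break threshold, the rule that re-answering an item counts as an edit, and the rule that a trail without a submit event counts as a drop-out are all hypothetical. Plain two-cluster k-means on z-standardized indicators stands in for whatever cluster analysis the authors actually used. The study clustered each sample and mode combination separately; the sketch clusters a single invented group, and the toy data are not IAB results.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

# HYPOTHETICAL event schema: the abstract does not describe the IAB paradata
# format, so field names and event kinds below are illustrative assumptions.
@dataclass
class Event:
    respondent: str  # respondent (establishment) identifier
    t: int           # timestamp in whole seconds
    kind: str        # assumed vocabulary: "answer", "tree_view", "submit"
    item: str = ""   # questionnaire item the event refers to, if any

BREAK_SECONDS = 300  # assumption: a pause of 5 minutes or more counts as a break


def audit_trails(events):
    """Group events by respondent and order each respondent's events in time."""
    trails = defaultdict(list)
    for e in events:
        trails[e.respondent].append(e)
    for trail in trails.values():
        trail.sort(key=lambda e: e.t)
    return dict(trails)


def indicators(trail):
    """Return [duration, breaks, tree_views, edits, dropped_out] for one trail."""
    duration = trail[-1].t - trail[0].t
    breaks = sum(1 for a, b in zip(trail, trail[1:]) if b.t - a.t >= BREAK_SECONDS)
    tree_views = sum(1 for e in trail if e.kind == "tree_view")
    seen, edits = set(), 0
    for e in trail:
        if e.kind == "answer":
            if e.item in seen:
                edits += 1  # re-answering an item is counted as an edit
            seen.add(e.item)
    # A trail without a "submit" event is treated as a drop-out.
    dropped_out = 0 if any(e.kind == "submit" for e in trail) else 1
    return [duration, breaks, tree_views, edits, dropped_out]


def standardize(rows):
    """z-standardize each indicator column (population SD; constant columns become 0)."""
    cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col)) or 1.0
        cols.append([(x - mean) / sd for x in col])
    return [list(r) for r in zip(*cols)]


def kmeans2(points, iters=20):
    """Plain k-means with k=2, seeded with the first point and the point farthest from it."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    first = points[0]
    farthest = max(points, key=lambda p: dist2(p, first))
    centers = [list(first), list(farthest)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center, then move centers to member means.
        labels = [min(range(2), key=lambda k: dist2(p, centers[k])) for p in points]
        for k in range(2):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(v) / len(members) for v in zip(*members)]
    return labels


# Invented toy data: A and B answer linearly; C and D loop back, pause and edit.
toy = [
    Event("A", 0, "answer", "q1"), Event("A", 30, "answer", "q2"),
    Event("A", 60, "answer", "q3"), Event("A", 90, "submit"),
    Event("B", 0, "answer", "q1"), Event("B", 40, "answer", "q2"),
    Event("B", 80, "answer", "q3"), Event("B", 100, "submit"),
    Event("C", 0, "answer", "q1"), Event("C", 20, "tree_view"),
    Event("C", 50, "answer", "q2"), Event("C", 400, "answer", "q1"),
    Event("C", 420, "tree_view"), Event("C", 460, "answer", "q3"),
    Event("D", 0, "answer", "q1"), Event("D", 10, "tree_view"),
    Event("D", 30, "answer", "q1"), Event("D", 500, "answer", "q2"),
    Event("D", 520, "answer", "q3"), Event("D", 900, "submit"),
]
trails = audit_trails(toy)
ids = sorted(trails)
labels = kmeans2(standardize([indicators(trails[r]) for r in ids]))
print(dict(zip(ids, labels)))  # {'A': 0, 'B': 0, 'C': 1, 'D': 1}
```

Standardizing before clustering keeps the duration indicator, measured in seconds, from dominating the count-based indicators. The deterministic farthest-point seeding only makes the toy run reproducible; a real analysis would more likely use repeated random starts and check the number of clusters rather than fixing it at two.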