Conference Agenda

Session Overview
Session
T2: GOR Thesis Award 2024 Competition: PhD
Time:
Thursday, 22/Feb/2024:
12:00pm - 1:15pm

Session Chair: Olaf Wenzel, Wenzel Marktforschung, Germany
Location: Seminar 2 (Room 1.02)

Rheinische Fachhochschule Köln, Campus Vogelsanger Straße, Vogelsanger Str. 295, 50825 Cologne, Germany

Presentations

Challenging the Gold Standard: A Methodological Study of the Quality and Errors of Web Tracking Data

Oriol J. Bosch1,2,3

1University of Oxford, United Kingdom; 2The London School of Economics, United Kingdom; 3Universitat Pompeu Fabra, Spain

Relevance & Research Question

The advent of the Internet has ushered the social sciences into a new era of data abundance. In this era, when individuals engage with online platforms and digital technologies, they leave behind digital traces. These digital traces can be collected for scientific research through innovative data collection methods. One of these methods, web trackers, has gained popularity in recent years. This approach relies on web tracking technologies, known as meters: a diverse array of solutions that participants can install on their devices. These meters enable the tracking of various traces left by participants during their online interactions, such as visited URLs.

Historically, web tracking has been upheld as the de facto gold standard for measuring online behaviours. This thesis studies whether this prevailing notion holds true. Specifically, it explores the following questions: is web tracking data affected by errors? If so, what is the prevalence of these errors? To what extent do these errors introduce bias into web tracking measures? What is the overall validity and reliability of web tracking measures? And can anything be done to limit the impact of these errors on the measurement quality of web tracking measures?

Methods & Data

To explore these questions, this thesis uses data from the TRI-POL project. TRI-POL is a three-wave survey, conducted between 2021 and 2022 and matched at the individual level with web tracking data. Data were collected through the Netquest opt-in metered panels in Spain, Portugal, and Italy, which consist of individuals who already have meters installed on their devices and who can be contacted to conduct surveys. Cross-quotas for age and gender, educational level, and region were used to ensure that the samples matched the general online population of each country on these variables.

The thesis is composed of three interconnected papers. The first paper, “When survey science met web tracking: Presenting an error framework for metered data”, develops and presents a Total Error framework for digital traces collected with Meters (TEM). The TEM framework (1) describes the data generation and analysis process for metered data and (2) documents the sources of bias and variance that may arise in each step of this process. Using a case study, the paper also shows how the TEM can be applied in practice to identify, quantify, and reduce metered data errors.

The second paper, “Uncovering digital trace data biases: tracking undercoverage in web tracking data,” adopts an empirical approach to address tracking undercoverage. This is a key error identified in the TEM: the failure to capture data from all the devices and browsers that individuals use to go online. The paper uses a new approach that combines self-reported data on participants’ device usage with paradata about the devices tracked to identify undercoverage. Moreover, the paper estimates the bias introduced by different undercoverage scenarios through Monte Carlo simulations.
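The simulation logic can be illustrated with a small sketch. The following is a minimal, hypothetical version rather than the paper's actual design: it assumes a synthetic population whose news visits are split across two devices, and measures how untracking one device inflates the apparent share of "news avoiders" (participants with zero observed news visits).

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_bias(n=2000, p_undercovered=0.74, n_sims=500):
    """Monte Carlo sketch: bias in the share of apparent news avoiders
    when some panelists have an untracked device."""
    biases = []
    for _ in range(n_sims):
        # True news visits per person, split across desktop and mobile.
        total_visits = rng.poisson(3, size=n)      # some people truly avoid news
        mobile_share = rng.beta(2, 2, size=n)      # fraction of visits on mobile
        mobile_visits = rng.binomial(total_visits, mobile_share)
        desktop_visits = total_visits - mobile_visits

        # Undercoverage: for some panelists, the mobile device is untracked.
        untracked = rng.random(n) < p_undercovered
        observed = desktop_visits + np.where(untracked, 0, mobile_visits)

        true_avoiders = np.mean(total_visits == 0)
        observed_avoiders = np.mean(observed == 0)
        biases.append(observed_avoiders - true_avoiders)

    return np.mean(biases)

print(f"Avg. overestimation of news avoiders: {simulate_bias():+.3f}")
```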

The third and last paper, “Validity and Reliability of Digital Trace Data in Media Exposure Measures: A Multiverse of Measurements Analysis,” explores the validity and reliability of web tracking data when used to measure media exposure. To do so, the paper uses a novel multiverse of measurements analysis approach to estimate the predictive validity and true-score reliability of more than 7,000 potentially designable web tracking measures of media exposure. The reliability of the multiverse of measurements is estimated using Quasi-Markov Simplex Models, and the predictive validity of the measures is inferred from the association between media exposure and political knowledge (gains). Furthermore, the paper estimates the effect of each design choice on the reliability and validity of web tracking measures using Random Forests.
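To make the multiverse idea concrete, here is a minimal sketch of how a specification grid might be built and how design choices could be ranked by their effect on reliability. The design dimensions and the placeholder reliability scores are illustrative assumptions, not the thesis' actual specifications (which would come from Quasi-Markov Simplex Models fitted to each measure).

```python
import itertools
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical design dimensions; the thesis' multiverse spans 7,000+ measures.
design_space = {
    "trace_type": ["urls", "domains", "apps"],
    "time_window_days": [7, 14, 30, 60],
    "aggregation": ["count", "duration", "binary"],
    "news_list": ["narrow", "broad"],
    "device_filter": ["all", "desktop_only"],
}
grid = pd.DataFrame(list(itertools.product(*design_space.values())),
                    columns=list(design_space))

# Placeholder reliability scores; in the thesis these would come from
# Quasi-Markov Simplex Models estimated per specification.
grid["reliability"] = np.random.default_rng(0).uniform(0.6, 0.95, len(grid))

# Rank design choices by their influence on reliability with a random forest.
X = pd.get_dummies(grid.drop(columns="reliability"))
rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(X, grid["reliability"])
print(pd.Series(rf.feature_importances_, index=X.columns)
        .sort_values(ascending=False).head(10))
```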

Results

The TEM presented in the first paper suggests that web tracking data can indeed be affected by a plethora of error sources and, therefore, statistics computed with these data might be biased. Hence, caution should be taken when using metered data for inferential statistics. By clearly showing how web tracking data is collected and analysed, and by identifying the errors of web tracking data, the framework makes it possible to develop approaches to quantify those errors and strategies to minimise them.

Furthermore, the thesis shows in the second paper that tracking undercoverage is highly prevalent in commercial panels. Specifically, it reveals that across the countries examined, 74% of the panellists studied had at least one untracked device that they used for online activities. Additionally, the simulations show that web tracking estimates, both univariate and multivariate, are often substantially biased due to tracking undercoverage. As an example, across the different scenarios tested, undercoverage can inflate the proportion of participants identified as news avoiders by 5-21 percentage points, an overestimation of 29-123%. This represents the first empirical evidence demonstrating that web tracking data is biased. Moreover, it exposes deficiencies in the practices and procedures followed by both online fieldwork companies and researchers.

Focusing on the measurement properties of web tracking measures, the third paper shows that the median reliability of the entire universe of measurements explored is high but imperfect (≈ 0.86). Hence, in general, the explored measures of media exposure capture around 86% of the variance of their true score. Conversely, the predictive validity of the measures is low, given that, overall, the association between being exposed to media and gaining political knowledge is null. Although most self-reported measures of media exposure have been criticized precisely for their lack of predictive power, these results suggest that the problem is not limited to self-reports. Hence, with the current evidence, web tracking measures of media exposure cannot be considered an improvement over self-reports. Additionally, results from the Random Forests suggest that the decisions researchers make when designing web tracking measurements can have a substantial impact on their measurement properties.

Added Value

Collectively, this thesis challenges the prevailing belief in web tracking data as the gold standard to measure online behaviours. It shows that web tracking data is affected by errors, which can substantially bias the statistics produced, as well as harm the reliability and validity of the resulting measures. In addition, the thesis demonstrates that high-quality measures can only be achieved through conscious design decisions, both when collecting the data (e.g., making sure all devices are tracked), and when defining how to construct the measurements. Methodologically, the thesis illustrates how a combination of traditional survey and computational methods can be used to assess the quality of digital trace data.



The Language of Emotions: Smartphone-Based Sentiment Analysis

Timo Koch1,2

1University of St. Gallen, Switzerland; 2LMU Munich

Relevance & Research Question

In an era transformed by artificial intelligence (AI) and the surge of voice assistants, chatbots, and other text or speech-based systems generating massive volumes of language data, automated emotion recognition and sentiment analysis have become integral across disciplines ranging from online marketing to user experience research.

However, a major challenge has constrained previous research in this field: differentiating subjective emotional experience ("How do I feel in this moment?") from observable emotional expressions ("How do I express my feelings through language?"). While recognizing subjective emotions is of great scientific and practical relevance, the empirical difficulty of obtaining data on subjective emotional experiences together with concurrent real-time language samples has limited the research. As a consequence, prior studies and deployed algorithms have mainly relied on datasets composed of text or speech data either rated by participants for their emotional content or provided by actors, thereby focusing on emotion expression.

Here, the advent of conventional smartphones has provided a novel research tool, enabling the collection of self-reports on subjective emotional experience via apps and the gathering of everyday speech data through the smartphone's keyboard and built-in microphone. The present work leverages the ubiquity of smartphones, utilizing those capabilities to gather authentic text and speech samples, along with self-reported emotional states, bridging the gap between subjective emotional experiences and their linguistic expressions.

Thereby, the present dissertation addresses the research question of whether subjective emotional experience can be associated with and predicted from features of spoken and written natural language. Moreover, it identifies specific language characteristics, such as the use of certain word categories or voice parameters, associated with one’s subjective emotional experience. Finally, this work examines the influence of the context of language production on emotional language.

Methods & Data

The present dissertation unfolds across two pivotal studies, employing everyday smartphones to collect rich datasets of both spoken and written language as well as self-reports on momentary emotional experience.

Study 1 analyzes subjective momentary emotion experience in more than 23,000 speech samples from over 1,000 participants in Germany (Study 1.1) and the US (Study 1.2). In Study 1.1, participants uttered predetermined sentences with varying emotional valences (positive/neutral/negative) into their smartphones' microphones and self-reported their momentary emotional states through an app. From the voice logs, vocal parameters (e.g., loudness, pitch, frequency) were algorithmically extracted. In contrast, in Study 1.2, participants were free to express their current thoughts and feelings during the speech recordings alongside the emotion self-reports. Here, not only acoustic parameters but also state-of-the-art word embeddings based on a Large Language Model (LLM) were extracted from participants’ speech. Machine learning algorithms were then employed to predict self-reported emotional experience from the extracted voice parameters and word embeddings, and interpretable machine learning methods were used to identify the most important vocal features for emotion predictions.
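As a rough illustration of this pipeline, the sketch below extracts a few coarse acoustic parameters from speech samples and fits a regressor to self-reported valence. The file paths, labels, feature choices, and libraries (librosa, scikit-learn) are assumptions for illustration; the study's actual feature set and models may differ.

```python
import numpy as np
import librosa
from sklearn.ensemble import GradientBoostingRegressor

def voice_features(path):
    """A few coarse acoustic parameters from one speech sample."""
    y, sr = librosa.load(path, sr=16000)
    loudness = librosa.feature.rms(y=y)[0]
    f0, _, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)   # pitch contour
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]
    return np.array([loudness.mean(), loudness.std(),
                     np.nanmean(f0), np.nanstd(f0),
                     centroid.mean(), centroid.std()])

# Placeholder recordings and app-based valence self-reports (1-5 scale):
paths = ["sample_001.wav", "sample_002.wav", "sample_003.wav"]
valence = np.array([4.0, 2.5, 3.0])

X = np.vstack([voice_features(p) for p in paths])
model = GradientBoostingRegressor(random_state=0).fit(X, valence)
# In practice: thousands of samples, word embeddings as extra features,
# and cross-validation grouped by participant.
```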

Study 2 leverages a dataset of over 10 million typed words from 486 participants to investigate traces of subjective emotion experience in text data. Here, the smartphone’s keyboard was used to log data on typing dynamics (e.g., typing speed), word use based on sentiment dictionaries and indirect emotion markers (e.g., use of the first-person singular), and emoji and emoticon use. Moreover, the logged data were enriched with contextual information on the app in which the respective text had been produced as well as the input prompt text (e.g., "Was gibt's Neues?" ["What's new?"] on Twitter). This made it possible to distinguish between private communication, such as sending a message on WhatsApp, and public communication, such as posting on Facebook. As in Study 1, self-reported momentary emotional states and overall stable trait emotionality were assessed through an app. Then, descriptive correlations between self-reported emotion measures and language characteristics, as well as machine learning models, were investigated for different communication contexts and time aggregations (e.g., daily emotional experience vs. momentary emotions).
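A toy version of this context-sensitive text analysis might look like the sketch below; the app-to-context mapping, the mini word list, and the data are illustrative assumptions rather than the study's instruments.

```python
import pandas as pd

# Toy keyboard log: one row per typing session (placeholder data).
log = pd.DataFrame({
    "participant": [1, 1, 2, 2, 3, 3],
    "app":  ["WhatsApp", "Facebook"] * 3,
    "text": ["I am so tired today", "we had a great time",
             "me and my dog", "I think this is fine",
             "we should meet soon", "my day was long"],
})
PRIVATE_APPS = {"WhatsApp", "Signal", "Telegram"}          # assumed mapping
log["context"] = log["app"].map(
    lambda a: "private" if a in PRIVATE_APPS else "public")

FIRST_SINGULAR = {"i", "me", "my", "mine", "myself"}       # mini dictionary
def fps_rate(text):
    """Share of words that are first-person singular."""
    words = text.lower().split()
    return sum(w in FIRST_SINGULAR for w in words) / len(words)

log["fps_rate"] = log["text"].map(fps_rate)

# Aggregate per participant and context, then correlate with trait self-reports.
rates = log.pivot_table(index="participant", columns="context", values="fps_rate")
traits = pd.Series({1: 3.2, 2: 2.1, 3: 2.8}, name="neg_trait")  # placeholders
print(rates.corrwith(traits))
```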

Results

Results from Study 1 indicate that while scripted speech offers limited emotional cues, spontaneous speech significantly enhances the accuracy of emotion predictions. Further, speech content showed superior predictive performance compared to vocal acoustics in the employed machine learning models. Also, for both prompted and spontaneous speech, the emotional valence of the spoken content had no effect on the algorithmic recognition of emotions from vocal features. Finally, interpretable machine learning methods revealed vocal features related to loudness and spectral fluctuation to be most relevant for emotion predictions from vocal parameters.

Study 2 reveals that sentiment dictionaries capture subjective emotion experience over large time windows, such as overall trait emotionality or weekly emotion experience, but are limited for shorter periods, such as momentary emotions. Beyond these time effects, findings indicate that the context of language production has a significant impact on distinct emotion-related language variations. Most prominently, the use of first-person singular words (e.g., "I," "me") correlated significantly more strongly with negative trait emotionality in public communication than in private communication, while the use of the first-person plural (e.g., "we") correlated more strongly with positive trait emotionality in private communication than in public communication.

Added Value

In conclusion, the present dissertation sheds light on the complex interplay between language and subjective emotion experience. The two studies that underpin this dissertation are among the first pieces of research to collect and scientifically investigate everyday spoken and written language using conventional smartphones over an extended period, illustrating the promises of personal devices as a new data collection tool.

Moreover, the present work emphasizes the significance of the context of language production in emotion detection, demonstrating the potential for nuanced context-aware sentiment recognition systems to understand consumer sentiment and enhance user experience.

Finally, by highlighting the challenges of current emotion-recognition methodologies, this dissertation contributes to the academic discourse as well as the development of privacy-conscious sentiment detection technologies.



Imputation of missing data from split questionnaire designs in social surveys

Julian B. Axenfeld

German Institute for Economic Research (DIW Berlin), Germany

Relevance & Research Question

In the face of declining response rates and escalating costs in social survey research, more and more survey projects are switching from traditional face-to-face interviews to much less expensive self-administered online surveys. However, online surveys have comparatively narrow limits on questionnaire length due to a higher susceptibility to breakoffs. Thus, moving online may force survey designers to cut down on the number of questions asked in a survey, potentially resulting in the cancellation of important research projects due to limited resources. In this context, survey projects increasingly adopt innovative data collection designs that promise to reduce questionnaire length without dropping questions entirely from the survey, such as split questionnaire designs. These designs present each respondent with only a randomly assigned subset of the questionnaire, with the goal of imputing the planned missing data originating from this procedure afterwards. This dissertation addresses the imputation of social survey data from split questionnaire designs and the methodological decisions involved in implementing such surveys to facilitate imputation. It asks how split questionnaires may be designed, and how the resulting data may be imputed, so that estimates based on the imputed data achieve satisfactory accuracy in practice.

Methods & Data

Through a series of Monte Carlo simulations, drawing on real social survey data from the German Internet Panel and the European Social Survey, this research assesses the accuracy of estimates across various scenarios, encompassing the implementation of both the split questionnaire design and the subsequent imputation. It delves into the impacts of different split questionnaire module construction strategies, varying imputation techniques, the interplay between planned missingness and conventional item nonresponse, and the implications of general-purpose versus analysis-specific imputation on the accuracy of estimates for a multivariate model. In each simulation run, a split questionnaire design is simulated by allocating items to modules, randomly assigning a number of modules to each survey participant, and deleting all data from the modules not assigned. Thereafter, the data are multiply imputed and estimates calculated based on the imputed data. These estimates are then compared to benchmarks calculated from the complete data to assess their accuracy.
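A stripped-down version of one such simulation run might look like the sketch below. The data-generating model, module sizes, and the single-pass IterativeImputer (standing in for proper multiple imputation) are simplifying assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(1)

# Toy 'complete' survey: 1,000 respondents, 12 correlated items in 4 modules.
n, k = 1000, 12
latent = rng.normal(size=(n, 1))
data = pd.DataFrame(latent + rng.normal(size=(n, k)),
                    columns=[f"item_{i}" for i in range(k)])

# Split questionnaire: allocate items to 4 modules, assign 2 modules per
# respondent, and delete all data from the unassigned modules.
modules = np.array_split(data.columns.to_numpy(), 4)
incomplete = data.copy()
for row in range(n):
    kept = rng.choice(4, size=2, replace=False)
    for m, items in enumerate(modules):
        if m not in kept:
            incomplete.loc[row, items] = np.nan

# Impute (single pass here; the thesis uses multiple imputation) and compare
# an estimate against its complete-data benchmark.
imputed = pd.DataFrame(IterativeImputer(random_state=0).fit_transform(incomplete),
                       columns=data.columns)
print("benchmark:", round(data["item_0"].mean(), 3),
      "imputed:", round(imputed["item_0"].mean(), 3))
```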

Results

The main findings from this research include:

  1. With respect to the imputation, each respondent should receive a selection of questions from a large variety of topics rather than all questions from a selection of topics, as the latter leads to estimates with lower accuracy.
  2. One may need to simplify imputation models with respect to the applied imputation methods and predictor sets to prevent highly inaccurate estimates, especially for relations between variables. For example, the imputation may benefit from excluding variables with near-zero correlations to the imputed variable from imputation models, or from applying dimensionality reduction techniques to the predictor space to reduce the effective number of predictors (see the sketch after this list).
  3. Additional conventional item nonresponse by respondents may challenge the imputation, especially if it implies large amounts of missing data from both sources combined, even if the nonresponse is missing completely at random. In this study, combined amounts of missing data exceeding 40% appeared especially harmful to the accuracy of estimates. Thus, even though a split questionnaire design allows for collecting data on more items than are presented to each individual respondent, there seem to be practical limits on how much questionnaire length can be reduced without negative repercussions for data quality.
  4. If the data are imputed for general research purposes to be supplied to a variety of third-party data users, the imputed data appear well-suited to be used for analyses of continuous relations in the entire survey sample. Conversely, estimating models with strongly non-continuous relationships (such as interactions or quadratic terms) or models based only on a subset of the survey sample could result in considerable biases, given the current state-of-the-art imputation procedures. For such analyses, the data would need to be imputed once more for this specific research objective, rather than for general purposes.
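To illustrate point 2 above, here is a minimal sketch of two strategies for simplifying an imputation model's predictor set: correlation-based screening and PCA-based dimensionality reduction. The threshold, component count, and the crude mean-fill are illustrative assumptions; within a real MICE cycle, missing predictor values are handled iteratively.

```python
import pandas as pd
from sklearn.decomposition import PCA

def screened_predictors(data: pd.DataFrame, target: str, min_abs_corr=0.1):
    """Drop predictors nearly uncorrelated with the variable being imputed."""
    corr = data.corrwith(data[target]).abs().drop(target)
    return corr[corr >= min_abs_corr].index.tolist()

def reduced_predictors(data: pd.DataFrame, target: str, n_components=5):
    """Replace the predictor space with its first principal components."""
    X = data.drop(columns=target)
    X = X.fillna(X.mean())   # crude fill; real MICE handles this per iteration
    return PCA(n_components=n_components).fit_transform(X)
```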

Added Value

The insights gleaned from these simulations offer valuable guidance and recommendations for future implementations of split questionnaire designs in online surveys: split questionnaire designers should take care to present each respondent with questions from preferably all survey topics, and should make sure the design does not result in too large an amount of missing data, also taking into account their expectations about additional unplanned nonresponse. Furthermore, researchers applying imputation to these data may need to reduce the complexity of the imputation models to some extent, for example through dimensionality reduction. Finally, if the data are imputed for general purposes, it should be communicated clearly for which kinds of analyses the imputed data can be used and for which analyses an analysis-specific imputation may be needed.



Essays on Inference for Non-probability Samples and Survey Data Integration

Camilla Salvatore

Utrecht University, The Netherlands

Relevance & Research Question

Probability sample surveys, which are the gold standard for population inference, are facing difficulties due to declining response rates and related increasing costs. Fielding large probability samples can be cost-prohibitive for many survey researchers and study sponsors. Thus, moving towards less expensive, but potentially biased, non-probability sample surveys or alternative data sources (big data or digital trace data) is becoming more common practice.

While non-probabilistic data sources offer many advantages (convenience, timeliness, the ability to explore new aspects of phenomena), they also come with limitations. Drawing inference from non-probability samples is challenging because of the absence of a known sampling frame and random selection process. Moreover, digital trace data are often unstructured and require additional analysis to extract the information of interest. Additionally, there is no single framework for evaluating their quality, and the lack of a benchmark measure can be a problem when studying new phenomena. Furthermore, it is important to evaluate the construct being measured, as it may differ from the one measured by traditional data sources. Thus, from a statistical perspective, there are many challenges and research questions to be addressed, such as the possibility of drawing inference from non-probabilistic data, the quality of these data, and whether these data sources can replace or supplement traditional probability sample surveys.

The focus of this work is on answering three research questions: 1) How has the field of survey data integration evolved, and which new trends are emerging? 2) Can probability and non-probability sample surveys be combined to improve analytical inference and reduce survey costs? 3) How can traditional and digital trace data be combined to augment the information in traditional sources and better describe complex phenomena?

Methods & Data

The three research questions are addressed by three different studies.

The first study presents an original science mapping application using text mining and bibliometric tools. In addition to characterizing the field in terms of collaboration between authors and research trends, it also identifies research gaps and formulates a research agenda for future investigations. This research makes evident that data integration is a broad and diverse field in terms of methodologies and data sources. Thus, the second and third studies explore whether using non-probabilistic data can improve inference or allow researchers to study new aspects of a complex phenomenon.

The second study focuses on structured and more traditional volunteer web surveys. To address the second research question, the paper presents a novel Bayesian approach that integrates a small probability sample with a larger online non-probability sample (possibly affected by selection bias) to improve inferences about logistic regression coefficients and reduce survey costs. The approach can be applied in different contexts; we provide examples from socioeconomic settings (volunteering, voting behavior, trust) as well as health settings (smoking, health insurance coverage).
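The abstract does not spell out the model, so the following is only a generic sketch of one way to borrow strength from a non-probability sample: a power prior that down-weights its likelihood, sampled with random-walk Metropolis. The weight a0, the priors, and the toy data are assumptions for illustration, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)

def loglik(beta, X, y):
    """Bernoulli log-likelihood for a logistic regression."""
    eta = X @ beta
    return np.sum(y * eta - np.log1p(np.exp(eta)))

def posterior_sample(X_p, y_p, X_np, y_np, a0=0.3, n_iter=5000, scale=0.05):
    """Random-walk Metropolis for logistic coefficients, borrowing strength
    from the non-probability sample via a power prior (weight a0 in [0, 1])."""
    d = X_p.shape[1]
    beta = np.zeros(d)
    draws = []
    def logpost(b):
        prior = -0.5 * np.sum(b ** 2) / 10.0          # weak N(0, 10) prior
        return loglik(b, X_p, y_p) + a0 * loglik(b, X_np, y_np) + prior
    lp = logpost(beta)
    for _ in range(n_iter):
        prop = beta + rng.normal(scale=scale, size=d)
        lp_prop = logpost(prop)
        if np.log(rng.random()) < lp_prop - lp:       # Metropolis accept step
            beta, lp = prop, lp_prop
        draws.append(beta.copy())
    return np.array(draws[n_iter // 2:])              # discard burn-in

# Toy data: a small probability sample and a larger, biased non-probability one.
X_p = np.column_stack([np.ones(200), rng.normal(size=200)])
y_p = rng.binomial(1, 1 / (1 + np.exp(-(0.2 + 0.8 * X_p[:, 1]))))
X_np = np.column_stack([np.ones(2000), rng.normal(0.5, 1, 2000)])
y_np = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 0.8 * X_np[:, 1]))))  # biased
print(posterior_sample(X_p, y_p, X_np, y_np).mean(axis=0))
```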

The third study concerns the analysis of traditional data in combination with unstructured textual data from social media (Twitter, now X). It shows how digital trace data can be used to augment traditional data, thus feeding smart statistics. For this purpose, we propose an original general framework for combining traditional and digital-trace-based indicators. We show an application in business statistics, but the framework can be applied in all cases where traditional and new data sources are available.

Results

In the second study, through simulations and real-life data analyses, we show that the Mean Squared Errors (MSEs) of regression coefficients are generally lower with data integration than without it. Also, using assumed probability and non-probability sample costs, we show that potential cost savings are evident. This work is accompanied by an online application (Shiny App) with replication code and an interactive cost-analysis tool. By entering probability and non-probability (per-unit) sample costs, researchers can compare different cost scenarios. These results can serve as a reference for survey researchers interested in collecting and integrating a small probability sample with a larger non-probability one.
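The cost side of that comparison is simple arithmetic; here is a tiny illustrative sketch with hypothetical per-unit costs (the app's actual inputs and defaults may differ):

```python
def total_cost(n_prob, n_nonprob, c_prob=30.0, c_nonprob=3.0):
    """Fieldwork cost of a mixed design (hypothetical per-unit costs)."""
    return n_prob * c_prob + n_nonprob * c_nonprob

# A small probability sample blended with a large non-probability sample,
# versus a probability-only design of the same total size:
print(total_cost(500, 4500))   # blended design: 28,500
print(total_cost(5000, 0))     # probability-only design: 150,000
```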

The third study results in the development of a general framework for combining traditional and digital trace data. The framework is modular, composed of three layers, each describing the steps necessary for the technical construction of a smart indicator. This modularity is a key feature, as it allows for flexibility in application: researchers can use the framework to explore different methodological variants within the same architecture, improve specific modules, or test the sensitivity of the results obtained at the different levels.
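A minimal sketch of such a three-layer modular pipeline appears below; the layer contents, lexicon, and weights are invented for illustration and stand in for the study's actual modules.

```python
import numpy as np
import pandas as pd

def layer1_extract(traditional: pd.Series, tweets: list[str]) -> dict:
    """Layer 1: build one indicator per source."""
    positive = {"growth", "hiring", "record"}            # toy sentiment lexicon
    sentiment = np.mean([any(w in t.lower() for w in positive) for t in tweets])
    return {"traditional": traditional.mean(), "digital": sentiment}

def layer2_normalize(indicators: dict, ranges: dict) -> dict:
    """Layer 2: rescale each indicator to [0, 1] with min-max bounds."""
    return {k: (v - ranges[k][0]) / (ranges[k][1] - ranges[k][0])
            for k, v in indicators.items()}

def layer3_compose(normalized: dict, weights: dict) -> float:
    """Layer 3: weighted composite; swapping any layer leaves the rest intact."""
    return sum(weights[k] * v for k, v in normalized.items())

survey = pd.Series([102.0, 98.5, 101.2])                 # toy business-survey index
tweets = ["Record quarter, hiring again!", "Slow month for orders."]
ind = layer1_extract(survey, tweets)
norm = layer2_normalize(ind, {"traditional": (90, 110), "digital": (0, 1)})
print(layer3_compose(norm, {"traditional": 0.7, "digital": 0.3}))
```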

Added Value

Research in the field of survey data integration and inference for non-probability samples is expanding and becoming increasingly dynamic. Combining different data sources, especially traditional and innovative ones, is a powerful way to gain a comprehensive understanding of a topic, explore new perspectives, and generate new and valuable insights.

This work significantly contributes to the current debate in the literature by presenting original methodological findings and adopting a broad perspective in terms of analytical tools (text mining, Bayesian inference and composite indicators) and data sources (volunteer web surveys and textual data from social media).

Addressing the three research questions, it: a) enhances understanding of existing literature, identifying current trends and research gaps for future investigations, b) proposes an original Bayesian framework to combine probability and non-probability online surveys in a manner that improves analytic inference while also reducing survey costs, and c) establishes a modular framework that allows for building composite smart indicators in order to augment the information available in traditional sources through digital trace data.

The added value of this work lies in its presentation of diverse perspectives and case studies on data integration, showcasing how it can provide enhanced statistical analysis.



 