Conference Agenda

Overview and details of the sessions of this conference. Please select a date or location to show only the sessions held on that day or at that location. Please select a single session for a detailed view (with abstracts and downloads, if available).

Session Overview
Location: Seminar 2 (Room 1.02)
Rheinische Fachhochschule Köln, Campus Vogelsanger Straße, Vogelsanger Str. 295, 50825 Cologne, Germany
Date: Wednesday, 21/Feb/2024
10:00am - 1:00pm: Workshop 1
Location: Seminar 2 (Room 1.02)
Session Chair: Lisa de Vries, Bielefeld University, Germany
Session Chair: Zaza Zindel, Bielefeld University, Germany
 

Embracing Diversity: Integrating Queer Perspectives in Online Survey Research

Zaza Zindel, Lisa de Vries

Bielefeld University, Germany

Duration of the Workshop:
3 hours

Target Groups:
The workshop is designed for researchers, survey practitioners, and anyone passionate about improving the inclusivity and accuracy of their online surveys, particularly in terms of sexual and gender diversity.

Is the workshop geared at an exclusively German or an international audience?
International audience (material will be in English)

Workshop Language:
English. While the instruction material is in English, we can handle questions in German, too.

Description of the content of the workshop:

Political and social advancements have enhanced the acceptance and visibility of sexual and gender minorities in many Western countries. However, the challenge of accurately capturing their unique experiences in online survey research remains. Researchers and survey providers often struggle to incorporate queer perspectives, leaving many surveys and research designs blind to these minority groups.

This workshop offers a comprehensive introduction to the integration of sexual and gender diversity within (online) survey research. It focuses on four key areas:

1) Measurement of Sexual Orientation and Gender Identity: Exploring nuanced approaches for respectful and inclusive data collection on sexual orientation and gender identity.

2) Integrating Queer Perspectives: Learning effective strategies to craft survey questions that resonate with and capture the experiences of sexual and gender minorities.

3) Sampling Methods: Gaining insights into strategies and techniques for effectively reaching and engaging sexual and gender minority populations in online survey research.

4) Data Preparation and Analysis: Equipping participants with the skills to sensitively manage and analyze data collected from diverse populations to draw valuable insights (see the short sketch below).
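
One building block often discussed under areas 1 and 4 is the two-step gender measure (sex assigned at birth plus current gender identity). The following pandas sketch is purely hypothetical, not workshop material; column names and codes are invented to show how such items might be combined during data preparation:

```python
import pandas as pd

# Hypothetical two-step gender measure: sex assigned at birth + current identity.
df = pd.DataFrame({
    "sex_at_birth": ["female", "male", "female", "male"],
    "gender_identity": ["woman", "man", "man", "non-binary"],
})

# Map identities to the binary categories they correspond to (if any).
identity_to_sex = {"woman": "female", "man": "male"}

def classify(row):
    """Derive a coarse cis/trans/non-binary indicator from the two items."""
    matched = identity_to_sex.get(row["gender_identity"])
    if matched is None:
        return "non-binary/other"
    return "cisgender" if matched == row["sex_at_birth"] else "transgender"

df["gender_group"] = df.apply(classify, axis=1)
print(df)
```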

This dynamic workshop combines informative presentations, group discussions, and hands-on exercises, ensuring that participants leave with the confidence and skills to successfully integrate sexual and gender diversity into their research projects.

Necessary prior knowledge of participants:
Prior experience in surveys and survey research principles is beneficial but not required.

Literature that participants need to read for preparation
None

Recommended additional literature
Fischer, M., Kroh, M., de Vries, L. K., Kasprowski, D., Kühne, S., Richter, D., & Zindel, Z. (2021). Sexual and Gender Minority (SGM) Research Meets Household Panel Surveys: Research Potentials of the German Socio-Economic Panel and Its Boost Sample of SGM Households. European Sociological Review, 38(2), 321–335. https://doi.org/10.1093/esr/jcab050

Information about the instructors:

Lisa de Vries is a research associate in the Quantitative Methods of Empirical Social Research department at Bielefeld University. In her dissertation, she focused on the effects of discrimination on leadership positions and job preferences of sexual and gender minorities. Her research interests include discrimination and harassment, LGBTQI* parent families, and the measurement of sexual orientation and gender identity (SOGI). In addition, she has extensive experience with sampling LGBTQI* people in probability and non-probability surveys as well as with measuring SOGI.

Zaza Zindel is a research associate and a doctoral candidate in sociology at Bielefeld University, specializing in survey research. Her dissertation centers around the use of social media as a means of recruiting rare populations for web surveys. Her research interests include survey methodology, the potential of social media for empirical social research, and exploring new technologies to improve statistical representation of marginalized, vulnerable, or rare populations.

Will participants need to bring their own devices in order to be able to access the Internet? Will they need to bring anything else to the workshop?
No

 
1:30pm - 4:30pm: Workshop 5
Location: Seminar 2 (Room 1.02)
Session Chair: Ludger Kesting, Tivian, Germany
 

Flexible Text Categorisation in Practice: Using AI Models to Analyse Open-Ended Survey Responses

Ludger Kesting

Tivian, Germany

Duration of the Workshop:
2 hours

Target Groups:
Beginners in text analysis models

Is the workshop geared at an exclusively German or an international audience?
International

Workshop Language:
English

Description of the content of the workshop:
Gaining an understanding of a powerful but easily approachable text analysis model: its advantages and disadvantages, approach, and background; an analysis dashboard for text analytics; and background knowledge of open-source data. Participants learn about analysis representations, manual completion, the analysis approach, text classification, zero-shot text classification, visualisation of the analysis, and practical application in Tableau.
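
The abstract does not name specific tooling, so as a hedged illustration of zero-shot text classification on open-ended responses, here is a sketch using the Hugging Face transformers pipeline; the model choice, answers, and labels are illustrative assumptions, not workshop material:

```python
from transformers import pipeline

# Zero-shot classification: assign free-text answers to labels the model
# was never explicitly trained on, by framing classification as entailment.
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")  # illustrative model choice

answers = [
    "The registration process took far too long.",
    "Friendly staff, I felt well looked after.",
]
candidate_labels = ["usability", "customer service", "pricing"]

for text in answers:
    result = classifier(text, candidate_labels)
    # The first label is the highest-scoring one.
    print(result["labels"][0], round(result["scores"][0], 2), "-", text)
```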

Goals of the workshop:
Developing an understanding of your own starting point for working with language models without training them: using an existing model and working with its outcomes in an approachable way.

Necessary prior knowledge of participants:
None

Information about the instructors:

Ludger Kesting's educational background is in empirical social science and statistics, based on his studies of sociology, computer science, and ethnology. His focus and interest have always been a data-driven understanding of the connections between people, social groups, and cultures; he now applies this to employee and customer experience, helping companies listen to their employees and empower their leaders to drive success.

Will participants need to bring their own devices in order to be able to access the Internet? Will they need to bring anything else to the workshop?
They have to bring their own device.

 
Date: Thursday, 22/Feb/2024
10:45am - 11:45am: T1: GOR Thesis Award 2024 Competition: Bachelor/Master
Location: Seminar 2 (Room 1.02)
Session Chair: Olaf Wenzel, Wenzel Marktforschung, Germany
 

Fair Sampling for Global Ranking Recovery

Georg Ahnert

University of Mannheim, Germany

Relevance & Research Question

Measuring human perception of attributes such as text readability (Crossley et al., 2023) or perceived ideology of politicians (Hopkins and Noel, 2022) is oftentimes difficult because rating scales are hard to interpret. Pairwise comparisons between candidates—for instance: which of these two politicians is more conservative?—pose a viable alternative. Given such pairwise comparisons, the task is to recover a global ranking of all candidates. This is non-trivial because of its probabilistic nature, i.e., a "weaker" candidate might win a comparison by chance. Furthermore, resources are often limited and not all pairs can be compared until a satisfactory estimate of individual strength is reached. Therefore, pairs of individuals must be selected according to a specified sampling strategy.

In recent years, a subfield of machine learning has developed around the quantification of fairness. While not without criticism, researchers propose fairness metrics and integrate fairness targets into machine learning algorithms. Lately, algorithmic fairness research has expanded from classification tasks to ranking scenarios as well. Yet, the fairness of ranking recovery from pairwise comparisons remains largely unexplored. This is particularly relevant since measured human perceptions are likely biased. For instance in hiring, pairwise comparisons between candidates for a position might not lead to the identification of the ideal candidate in the presence of such biases.

To the best of my knowledge, no previous research is concerned with the combined influence of sampling strategies and ranking recovery methods on the accuracy and fairness of recovered rankings. I thus propose the following research questions:

  1. What is the effect of the sampling strategies and ranking recovery methods on overall accuracy?
  2. Under which conditions do ranking recovery methods put an unprivileged group at a disadvantage?
  3. Can sampling strategies or ranking recovery methods mitigate the effects of existing biases?

Methods & Data

In this thesis, I present a framework that manipulates the sampling of individuals for comparison in the presence of bias. I simulate individuals with latent "skill scores" on a certain task. I then separate the individuals into two groups and subtract a bias from the scores of the "unprivileged" group. I implement three distinct sampling strategies for selecting individuals from both groups for comparison: (1) random sampling (2) oversampling the unprivileged group and (3) sampling by previous success. Using the Bradley-Terry model (Bradley and Terry, 1952), I then simulate pairwise comparisons between the sampled individuals.
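
As a rough sketch of this simulation setup, assuming the standard Bradley-Terry win probability p(i beats j) = s_i / (s_i + s_j) and showing only the random sampling strategy (all quantities are illustrative, not the thesis's actual parameters):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100                                                # individuals
skill = rng.lognormal(mean=0.0, sigma=0.5, size=n)     # latent "skill scores"
group = rng.integers(0, 2, size=n)                     # 0 = privileged, 1 = unprivileged
bias = 0.3
# Biased scores: the unprivileged group's scores are reduced (clipped above zero).
observed = np.where(group == 1, np.maximum(skill - bias, 0.05), skill)

def bradley_terry_comparison(i, j):
    """Simulate one pairwise comparison; returns the index of the winner."""
    p_i_wins = observed[i] / (observed[i] + observed[j])
    return i if rng.random() < p_i_wins else j

# Random sampling strategy: draw pairs uniformly and record wins.
wins = np.zeros(n)
for _ in range(5000):
    i, j = rng.choice(n, size=2, replace=False)
    wins[bradley_terry_comparison(i, j)] += 1
```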

On the simulated pairwise comparison data, I apply various ranking recovery methods, including basic heuristics (David, 1987) and a state-of-the-art method that involves graph neural networks: GNNRank (He et al., 2022). Further, I recover rankings with Fairness-Aware PageRank (Tsioutsiouliklis et al., 2021)—an algorithm developed for a different task that is, however, group-aware and aims at eliminating bias.

In order to evaluate the interaction between sampling strategies and ranking recovery methods, I propose a novel group-conditioned accuracy measure tailored towards ranking recovery. Using this measure, I am able to evaluate both the overall accuracy of the recovered ranking and its fairness, as operationalized through group representation (exposure) and group-conditioned accuracy.
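
The abstract does not spell out the proposed measure; one hedged approximation of the idea is to score the recovered ranking against the latent scores separately within each group, e.g. with Kendall's tau:

```python
import numpy as np
from scipy.stats import kendalltau

def group_conditioned_accuracy(true_scores, recovered_scores, group):
    """Kendall's tau between true and recovered scores, computed per group.
    An illustrative stand-in, not the thesis's exact measure."""
    return {g: kendalltau(true_scores[group == g],
                          recovered_scores[group == g])[0]
            for g in np.unique(group)}
```

Fed with the latent scores, a recovered ranking's scores (e.g., win counts from the simulation sketch above), and group labels, the per-group tau values expose the kind of accuracy gap the thesis investigates.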

I provide a Python package under MIT license to facilitate replication of my findings as well as for further investigation of fairness in ranking recovery.

Results

Regarding the effect of sampling strategies, I find that both oversampling and rank-based sampling harm the accuracy of the recovered ranking. This is surprising as we would expect oversampling to improve the ranking accuracy of the unprivileged group that is oversampled. However, since this group's ranking accuracy also depends on correct comparisons against the individuals of the other group, the oversampled group's accuracy suffers as well. Oversampling thus is not a good remedy against biased comparisons.

In scenarios where no bias is present against the unprivileged group, the optimal choice of ranking recovery method depends on the sampling that was used before pairwise comparison. Under random sampling, more advanced methods add little to no benefit in accuracy compared to heuristics-based methods (i.e., David's Score). When oversampling or rank-based sampling is applied, however, GNNRank outperforms the other methods.

In the presence of bias against the unprivileged group, Fairness-Aware PageRank outperforms all other ranking recovery methods. Not only does it mitigate group representation bias from the recovered ranking, it also improves the ranking's accuracy when measured against the unbiased, latent "skill scores". This highlights the importance of group-aware ranking recovery over marginal benefits observed between the other ranking recovery methods.

Added Value

This thesis bridges the gap between previous research on fairness in machine learning and ranking recovery from pairwise comparisons. It is the first to introduce a framework for the systematic investigation of fairness in ranking recovery and focuses on real-world sampling strategies and existing ranking recovery methods. Further, I propose a novel group-conditioned accuracy measure tailored towards ranking recovery. The results highlight the importance of fairness-aware ranking recovery methods, and I supply recommendations on which ranking recovery method to use under which circumstances.



Understanding the Mobile Consumer along the Customer Journey: A Behavioural Data Analysis based on Smartphone Sensing Technology

Isabelle Halscheid1,2

1Technische Hochschule Köln, Germany; 2Murmuras GmbH, Germany

Relevance & Research Question:

Digitalisation is shaping a new consumption era characterised by high connectivity, mobility, and a broad range of easily accessible information on products, prices, and alternatives. Modern consumers are broadly connected via social media and more mobile than ever with their smart devices. This empowers them to make sophisticated buying decisions based on a comprehensive amount of easily accessible online information, while having a broad range of options to choose from. Moreover, they compare prices, ask for opinions online, and are willing to choose alternative products or services if these fit better with their lifestyle and meet their needs. As a result, it becomes more difficult than ever to understand modern consumers along their complex and dynamic path to purchase. However, since modern consumers are constantly online through their smartphones, they produce a notable amount of data about their mobile and online behaviour, such as movement, social media activities, online purchases, or Google searches. This behavioural data is immensely valuable for companies because it allows them to develop a deep understanding of the mobile consumption behaviour of their customers. Yet, there is no established solution for using this data to follow consumers on their mobile devices. Therefore, this thesis investigates the extent to which mobile data collected with sensing technologies is useful for describing mobile consumer behaviour. The goal was to propose a first approach for analysing mobile data to understand mobile consumers along their customer journey. For this purpose, an explorative analysis was conducted based on the following research question: What analyses can be performed using data generated with smartphone sensing technology to understand mobile consumer behaviour along the customer journey?

Methods & Data:

As a first step, a literature review on current customer journey analytics theories, models, and practices was conducted as the foundation for the explorative data analysis. Because no substantive research could be found that focuses on analysing the customer journeys of mobile consumers, a mobile customer journey model was developed by adapting current models used among practitioners in customer journey analytics.

For the data analysis, the author collaborated with Murmuras, which developed a smartphone sensing technology for collecting sensing data via an application on participants’ mobile phones. The collection process adheres to GDPR compliance standards, with data stored exclusively on servers located in Germany. Importantly, no personal information is tracked; instead, only consumption-relevant data is recorded. The company runs an ongoing incentivised smartphone sensing panel with a constant participant base of approximately 1,500 smartphone users in Germany. Because of this, the thesis could draw on long-term data from 01.10.2021 to 31.08.2022. This mainly included app usage data as well as mobile browser data (e.g. Google search terms, website visits, etc.) and specific in-app content, such as advertisements in the Facebook and Instagram apps and in-app shopping content from the Amazon shopping app.

The data was provided and analysed via the platform Metabase, which mainly uses SQL for analysing data. As the author had previous experience working with this data and analytics platform from a student internship, this knowledge could be used to translate the mobile customer journey model into analytics concepts. On that basis, an explorative data analysis was conducted to explore the full potential of sensing data in the context of customer journey analytics.

Results:

The results show that mobile sensing data can be used in three main research areas among customer journey analytics: examining the touchpoint performance of a brand across mobile apps, describing different target groups by their smartphone usage behaviour and deriving real customer journeys on users’ devices. For these areas interactive dashboards using different types of sensing data were developed.

The first dashboard focuses on analysing the touchpoint performance across various sensing datasets, including general app usage, in-app advertising, browser data, and Amazon shopping data. Key Performance Indicators (KPIs) were calculated to assess both general and app-related touchpoint performance. The integrated mobile customer journey provides an overview of all brand touchpoints over time, with detailed analyses of ads, browser interactions, and shopping behaviour. The second dashboard dives into target group analysis, aiming to understand mobile behaviour and preferences by providing insights into demographics, smartphone usage habits, contact channels, and mobile shopping behaviours on Amazon. The last part of the analysis employs the dashboards to conduct a deep analysis of an individual brand customer. This involved identifying relevant touchpoints, observing intercorrelations between touchpoints, analysing phone and mobile shopping habits, and mapping the customer journey stages. The insights gained from this analysis contribute to a comprehensive customer journey map and offer opportunities for the brand based on a deeper understanding of the consumers’ mobile life.
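
The actual dashboards were built in Metabase on SQL; as a hypothetical stand-in, the touchpoint KPIs of the first dashboard could be aggregated from an app-usage log along these lines (column names and values are invented):

```python
import pandas as pd

# Hypothetical app-usage log: one row per app session per panelist.
usage = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "app":     ["Amazon", "Instagram", "Amazon", "Amazon", "Instagram"],
    "seconds": [120, 300, 60, 45, 500],
})

kpis = (usage.groupby("app")
             .agg(reach=("user_id", "nunique"),      # distinct users touched
                  sessions=("user_id", "size"),       # number of sessions
                  total_minutes=("seconds", lambda s: s.sum() / 60))
             .reset_index())
print(kpis)
```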

Added Value:

Although the vast amount of sensing data and the complexity of its analysis in the context of customer journey analytics remain challenging, it could be shown that sensing data presents a big opportunity for companies and researchers in this research area. It is not only possible to follow relevant customers on their complex path to purchase, but also to act on it by knowing how and where exactly to interact with customers in the mobile world. As this has been a blind spot for companies and researchers before, they now have the ability to decode the whole customer journey of target groups by combining existing data with the insights derived from mobile sensing data. As sensing technology, sensing capabilities, and smart devices are constantly improving, an even more complete picture of mobile customer journeys is expected to become analysable, which will add further value to customer journey analytics in the future.



Effects of active and passive use on subjective well-being of users of professional networks

Constanze Roeger

TH Köln, Germany

Relevance & Research Question:

Over the past decade, online networking platforms have become integral parts of everyday life for most people, reshaping the way individuals communicate and network both privately and professionally. The growing popularity of these sites has sparked both enthusiasm and apprehension, resulting in a heated debate on the negative consequences of social network site (SNS) use for users’ well-being in both popular culture and academia. Almost simultaneously with the rise of private network sites such as Facebook, professional network sites (PNSs) such as LinkedIn have gained popularity. Despite the great interest in usage patterns (active and passive use) and the negative effects of SNS use on users’ well-being, relatively little research has been performed on PNSs. The association between PNS use and well-being, especially, has received very little academic attention so far. In view of the increasing popularity of PNSs among both private users and organizations, this is surprising. Examining the impact on well-being is important as PNSs become more popular, leading to an increasing number of users who may be affected by the potentially harmful consequences, such as decreased satisfaction with life, increased depressive symptoms, or loneliness, that some authors have previously attributed to SNS use.

The aim of this study was to transfer previous findings on SNS use to the context of PNSs, exploring the multifaceted relationship between usage patterns and users’ well-being, which led to the following research questions:

RQ1 What is the relationship between PNS usage type and users’ subjective well-being?

RQ2 What factors play a role in determining the influence of PNS usage type on the subjective well-being of the users?

RQ2.1 How does bridging social capital influence the relationship between active use and users’ subjective well-being?

RQ2.2 How do social comparison and envy influence the relationship between passive use and users’ subjective well-being?

Methods & Data:

A quantitative online survey was conducted which yielded an adjusted total sample of 526 LinkedIn users (173 male, 350 female, 2 diverse, 1 undisclosed) aged 19 to 65 (M = 28.69, SD = 8.66). A convenience sample was recruited using WhatsApp, LinkedIn and university mailing lists. Additionally, three survey sharing platforms (i.e. SurveyCircle, SurveySwap and PollPool) were used.

According to the active-passive model of SNS use (Verduyn et al., 2017), which was employed as the theoretical framework for this thesis and transferred to the context of PNSs for this purpose, the effects of active and passive use on users’ subjective well-being are explained by three mediating variables: social capital for active use, and social comparison as well as envy for passive use. Accordingly, participants were asked to fill out measures regarding their usage pattern on LinkedIn, their subjective well-being, their tendency to engage in social comparison behavior, their experiences with envy, and their levels of social capital.

Three mediation analyses were run using the PROCESS add-on (Hayes, 2013) for IBM SPSS 28.0.1.0. To test the relationship between active LinkedIn use and subjective well-being, which was predicted to be mediated by bridging social capital, a simple mediation model was tested (model 1). Next, a serial mediation analysis was run to test upward social comparison and envy as mediators in the relationship between passive LinkedIn use and subjective well-being (model 2). The same procedure was repeated, replacing upward social comparison with downward social comparison (model 3).
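
The analyses used PROCESS for SPSS; as a hedged Python sketch of model 1 (indirect effect a*b with a percentile bootstrap confidence interval), with variable roles assumed as described above and all data invented:

```python
import numpy as np
import statsmodels.api as sm

def simple_mediation(x, m, y, n_boot=5000, seed=1):
    """Indirect effect a*b of x on y through m, with a percentile bootstrap CI."""
    rng = np.random.default_rng(seed)
    n = len(x)

    def ab(idx):
        a = sm.OLS(m[idx], sm.add_constant(x[idx])).fit().params[1]   # x -> m
        Xb = sm.add_constant(np.column_stack([m[idx], x[idx]]))
        b = sm.OLS(y[idx], Xb).fit().params[1]                        # m -> y, given x
        return a * b

    boots = np.array([ab(rng.integers(0, n, n)) for _ in range(n_boot)])
    return ab(np.arange(n)), np.percentile(boots, [2.5, 97.5])

# Toy illustration (variable roles as in model 1; data invented):
rng = np.random.default_rng(0)
x = rng.normal(size=300)                  # active LinkedIn use
m = 0.5 * x + rng.normal(size=300)        # bridging social capital
y = 0.4 * m + rng.normal(size=300)        # subjective well-being
est, ci = simple_mediation(x, m, y)
print(f"ab = {est:.4f}, 95% CI {ci}")
```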

Results:

Results of the mediation analyses revealed an indirect positive relation between active use of LinkedIn and well-being. Conversely, a negative indirect relation was found between passive use of LinkedIn and subjective well-being.

Bridging social capital fully mediated the relationship between active LinkedIn use and well-being (significant positive indirect effect ab = .0624, 95%-CI [.0303; .0999] and insignificant direct effect c’ = .0967, p = .1237, 95%-CI [-.0191; .1585]).

As predicted, social comparison and envy acted as serial mediators in the relation between passive LinkedIn use and subjective well-being (model 2: a1d21b2 = -.0347, 95%-CI [-.0583; -.0120]; model 3: a1d21b2 = -.0101, 95% CI [-.0217; -.0009]).

However, the results of the two mediation models examining passive LinkedIn use indicated the possible omission of other mediating variables, as the direct effect between passive LinkedIn use and subjective well-being (model 2: c’ = .1692, p < .001, 95%-CI [.0904; .2481]; model 3: c’ = .1433, p < .001, 95%-CI [.0651; .2215]) remained significant after the mediator variables were added to the model.

Added Value:

The results of this thesis expand upon previous research by examining users of PNSs. This study extends prior findings in two ways. First, it advances the literature on online networking site use and well-being as it explores PNS use; previous research mainly examined the relation between SNS use and well-being, with special attention to Facebook. Moreover, prior studies have mainly focused on either passive or active use, while this study examined both usage patterns at once.

While the results of this study are preliminary and should not be generalized, the findings suggest that SNSs and PNSs share similarities that lead to similar effect patterns when examining the relationship between usage patterns and well-being. Testing the active-passive model of SNS use (Verduyn et al., 2017) in the context of PNSs revealed appropriate applicability. The results of this thesis also have practical relevance for both users and creators of platforms like LinkedIn. Active use should be promoted and encouraged, as it has been associated with positive effects on users’ well-being. When educated about the different effects of usage patterns, users can proactively change their behaviors, positively affecting their well-being.

 
12:00pm - 1:15pm: T2: GOR Thesis Award 2024 Competition: PhD
Location: Seminar 2 (Room 1.02)
Session Chair: Olaf Wenzel, Wenzel Marktforschung, Germany
 

Challenging the Gold Standard: A Methodological Study of the Quality and Errors of Web Tracking Data

Oriol J. Bosch1,2,3

1University of Oxford, United Kingdom; 2The London School of Economics, United Kingdom; 3Universitat Pompeu Fabra, Spain

Relevance & Research Question

The advent of the Internet has ushered the social sciences into a new era of data abundance. In this era, when individuals engage with online platforms and digital technologies, they leave behind digital traces. These digital traces can be collected for scientific research through innovative data collection methods. One of these methods, web tracking, has gained popularity in recent years. This approach hinges on the utilization of web tracking technologies, known as meters, encompassing a diverse array of solutions that participants can install onto their devices. These meters enable the tracking of various traces left by participants during their online interactions, such as visited URLs.

Historically, web tracking has been upheld as the de facto gold standard for measuring online behaviours. This thesis studies whether this prevailing notion holds true. Specifically, it explores the following questions: Is web tracking data affected by errors? If so, what is the prevalence of these errors? To what extent do these errors introduce bias into web tracking measures? What is the overall validity and reliability of web tracking measures? And can anything be done to limit the impact of these errors on the measurement quality of web tracking measures?

Methods & Data

To explore these questions, this thesis uses data from the TRI-POL project. TRI-POL is a three-wave survey, conducted between 2021 and 2022, matched at the individual level with web tracking data. Data were collected through the Netquest opt-in metered panels in Spain, Portugal, and Italy, which consist of individuals who already have meter(s) installed on their devices and who can be contacted to conduct surveys. Crossed quotas for age and gender, educational level, and region were used to ensure samples that match the general online populations of each country on these variables.

The thesis is composed of three interconnected papers. The first paper, “When survey science met web tracking: Presenting an error framework for metered data”, develops and presents a Total Error framework for digital traces collected with Meters (TEM). The TEM framework (1) describes the data generation and analysis process for metered data and (2) documents the sources of bias and variance that may arise in each step of this process. Using a case study, the paper also shows how the TEM can be applied in real life to identify, quantify, and reduce metered data errors.

The second paper, “Uncovering digital trace data biases: tracking undercoverage in web tracking data,” adopts an empirical approach to address tracking undercoverage, a key error identified in the TEM: the failure to capture data from all the devices and browsers that individuals use to go online. The paper uses a new approach that combines self-reported data on participants’ device usage with paradata about the devices tracked to identify undercoverage. Moreover, the paper estimates the bias introduced by different undercoverage scenarios through Monte Carlo simulations.
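
A hedged sketch of that undercoverage logic: simulate panelists whose untracked devices hide part of their activity, then compare a "news avoider" estimate against the full-coverage truth (all quantities are invented for illustration, not the paper's actual simulation design):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000
n_devices = rng.integers(1, 4, size=n)       # devices per panelist
extra_tracked = rng.random(n) > 0.5          # are the extra devices tracked? (scenario knob)

# News visits are spread across devices; untracked devices hide their share.
visits_total = rng.poisson(2.0, size=n)
share_on_tracked = np.where((n_devices == 1) | extra_tracked, 1.0,
                            rng.uniform(0.2, 0.8, size=n))
visits_observed = rng.binomial(visits_total, share_on_tracked)

true_avoiders = (visits_total == 0).mean()
observed_avoiders = (visits_observed == 0).mean()
print(f"true: {true_avoiders:.3f}, observed: {observed_avoiders:.3f}")
# The observed share of "news avoiders" is inflated by undercoverage.
```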

The third and last paper, “Validity and Reliability of Digital Trace Data in Media Exposure Measures: A Multiverse of Measurements Analysis,” explores the validity and reliability of web tracking data when used to measure media exposure. To do so, the paper uses a novel multiverse-of-measurements analysis approach to estimate the predictive validity and true-score reliability of more than 7,000 potentially designable web tracking measures of media exposure. The reliability of the multiverse of measurements is estimated using Quasi-Markov Simplex Models, and the predictive validity of the measures is inferred from the association between media exposure and political knowledge (gains). Furthermore, the paper estimates the effect of each design choice on the reliability and validity of web tracking measures using Random Forests.
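
The 7,000+ measures arise from crossing design choices; conceptually, such a multiverse can be enumerated like this (the listed choices are illustrative, not the paper's exact grid):

```python
from itertools import product

# Illustrative design choices when turning raw URL visits into a
# "media exposure" measure; the real paper's grid differs.
design_space = {
    "domain_list": ["news_broad", "news_strict"],
    "duration_filter": [0, 10, 30],           # minimum seconds per visit
    "aggregation": ["visits", "minutes", "days_active"],
    "time_window": ["wave", "last_30_days"],
    "transform": ["raw", "log"],
}

measurements = [dict(zip(design_space, combo))
                for combo in product(*design_space.values())]
print(len(measurements), "candidate measures")  # 2*3*3*2*2 = 72 in this toy grid
```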

Results

The TEM in the first paper suggests that web tracking data can indeed be affected by a plethora of error sources and that, therefore, statistics computed with these data might be biased. Hence, caution should be taken when using metered data for inferential statistics. By clearly showing how web tracking data is collected and analysed, and by identifying the errors of web tracking data, the framework makes it possible to develop approaches to quantify those errors and strategies to minimise them.

Furthermore, the thesis shows, in the second paper, that tracking undercoverage is highly prevalent in commercial panels. Specifically, it reveals that across the countries examined, 74% of the panellists studied had at least one device used for online activities that went untracked. Additionally, the simulations prove that web tracking estimates, both univariate and multivariate, are often substantially biased due to tracking undercoverage. As an example, across the different scenarios tested, undercoverage can inflate the proportion of participants identified as news avoiders by 5-21 percentage points, an overestimation of 29-123%. This represents the first empirical evidence demonstrating that web tracking data is biased. Moreover, it exposes deficiencies in the practices and procedures followed by both online fieldwork companies and researchers.

Focusing on the measurement properties of web tracking measures, the third paper shows that the median reliability of the entire universe of measurements explored is high but imperfect (≈ 0.86). Hence, in general, the explored measures of media exposure capture around 86% of the variance of their true score. Conversely, the predictive validity of the measures is low, given that, overall, the association between being exposed to media and gaining political knowledge is null. Although most self-reported measures of media exposure have been criticized precisely for their lack of predictive power, the results suggest that this problem is not limited to self-reports. Hence, with the current evidence, web tracking measures of media exposure cannot be considered an improvement over self-reports. Additionally, results from the Random Forests suggest that the design decisions researchers make when designing web tracking measurements can have a substantial impact on their measurement properties.

Added Value

Collectively, this thesis challenges the prevailing belief in web tracking data as the gold standard to measure online behaviours. It shows that web tracking data is affected by errors, which can substantially bias the statistics produced, as well as harm the reliability and validity of the resulting measures. In addition, the thesis demonstrates that high-quality measures can only be achieved through conscious design decisions, both when collecting the data (e.g., making sure all devices are tracked), and when defining how to construct the measurements. Methodologically, the thesis illustrates how a combination of traditional survey and computational methods can be used to assess the quality of digital trace data.



The Language of Emotions: Smartphone-Based Sentiment Analysis

Timo Koch1,2

1University of St. Gallen, Switzerland; 2LMU Munich

Relevance & Research Question:

In an era transformed by artificial intelligence (AI) and the surge of voice assistants, chatbots, and other text or speech-based systems generating massive volumes of language data, automated emotion recognition and sentiment analysis have become integral across disciplines ranging from online marketing to user experience research.

However, a main challenge has constrained previous research in this field: differentiating subjective emotional experience ("How do I feel in this moment?") from observable emotional expression ("How do I express my feelings through language?"). While recognizing subjective emotions is of great scientific and practical relevance, the empirical difficulty of obtaining data on subjective emotional experiences and concurrent real-time language samples has limited the research. As a consequence, prior studies and deployed algorithms mainly relied on datasets composed of text or speech data either rated by participants for their emotional content or provided by actors, thereby focusing on emotion expression.

Here, the advent of conventional smartphones has provided a novel research tool, enabling the collection of self-reports on subjective emotional experience via apps and the gathering of everyday speech data through the smartphone's keyboard and built-in microphone. The present work leverages the ubiquity of smartphones, utilizing those capabilities to gather authentic text and speech samples, along with self-reported emotional states, bridging the gap between subjective emotional experiences and their linguistic expressions.

The present dissertation thereby addresses the research question of whether subjective emotional experience can be associated with and predicted from features in spoken and written natural language. Moreover, it identifies specific language characteristics, such as the use of certain word categories or voice parameters, associated with one’s subjective emotional experience. Finally, this work examines the influence of the context of language production on emotional language.

Methods & Data:

The present dissertation unfolds across two pivotal studies, employing everyday smartphones to collect rich datasets of both spoken and written language as well as self-reports on momentary emotional experience.

Study 1 analyzes subjective momentary emotional experience in more than 23,000 speech samples from over 1,000 participants in Germany (Study 1.1) and the US (Study 1.2). In Study 1.1, participants uttered predetermined sentences with varying emotional valences (positive/neutral/negative) into their smartphones' microphones and self-reported their momentary emotional states through an app. From the voice logs, vocal parameters (e.g., loudness, pitch, frequency) were algorithmically extracted. In contrast, in Study 1.2, participants were given the freedom to express their current thoughts and feelings during the speech recordings alongside the emotion self-reports. Here, not only acoustic parameters but also state-of-the-art word embeddings based on a Large Language Model (LLM) were extracted from participants’ speech. Machine learning algorithms were then employed to predict self-reported emotional experience from the extracted voice parameters and word embeddings, and interpretable machine learning methods were used to identify the most important vocal features for emotion predictions.
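
A hedged sketch of that modeling step on toy data: predict self-reported valence from a few vocal features with a random forest and inspect permutation importances (feature names and data are invented; the real analysis would also respect participant-level grouping when splitting):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                  # toy loudness, pitch, spectral flux
y = 0.6 * X[:, 0] + 0.2 * X[:, 2] + rng.normal(scale=0.5, size=1000)  # toy valence

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Permutation importance: drop in test performance when a feature is shuffled.
imp = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
for name, score in zip(["loudness", "pitch", "spectral_flux"], imp.importances_mean):
    print(f"{name}: {score:.3f}")
```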

Study 2 leverages a dataset of over 10 million typed words from 486 participants to investigate traces of subjective emotional experience in text data. Here, the smartphone’s keyboard was utilized to log data on typing dynamics (e.g., typing speed), word use based on sentiment dictionaries and indirect emotion markers (e.g., use of the first person singular), and emoji and emoticon use. Moreover, the logged data were enriched with contextual information on the app in which the respective text had been produced as well as the input prompt text (e.g., “Was gibt’s Neues?” [“What’s new?”] on Twitter). This made it possible to distinguish between private communication, for example sending a message on WhatsApp, and public communication, like posting on Facebook. As in Study 1, self-reported momentary emotional states and overall stable trait emotionality were assessed through an app. Then, descriptive correlations between self-reported emotion measures and language characteristics, as well as machine learning models, were investigated for different communication contexts and time aggregations (e.g., daily emotional experience vs. momentary emotions).

Results:

Results from study 1 indicate that while scripted speech offers limited emotional cues, spontaneous speech significantly enhances the prediction accuracy for emotions. Further, speech content showed a superior predictive performance compared to vocal acoustics in the employed machine learning models. Also, for both prompted and spontaneous speech, the emotional valence of the spoken content had no effect on the algorithmic recognition of emotions from vocal features. Finally, interpretable machine learning methods revealed vocal features related to loudness and spectral fluctuation to be most relevant for emotion predictions from vocal parameters.

Study 2 reveals that sentiment dictionaries capture subjective emotional experience over large time windows, such as overall trait emotionality or weekly emotional experience, but are limited for shorter periods, like momentary emotions. Beyond these time effects, the findings indicate that the context of language production has a significant impact on distinct emotion-related language variations. Most prominently, the use of first-person singular words (e.g., "I," "me") correlated significantly more strongly with negative trait emotionality in public communication than in private communication, while the use of the first person plural (e.g., "we") correlated more strongly with positive trait emotionality in private communication than in public communication.

Added Value:

In conclusion, the present dissertation sheds light on the complex interplay between language and subjective emotion experience. The two studies that underpin this dissertation are among the first pieces of research to collect and scientifically investigate everyday spoken and written language using conventional smartphones over an extended period, illustrating the promises of personal devices as a new data collection tool.

Moreover, the present work emphasizes the significance of the context of language production in emotion detection, demonstrating the potential for nuanced context-aware sentiment recognition systems to understand consumer sentiment and enhance user experience.

Finally, by highlighting the challenges of current emotion-recognition methodologies, this dissertation contributes to the academic discourse as well as the development of privacy-conscious sentiment detection technologies.



Imputation of missing data from split questionnaire designs in social surveys

Julian B. Axenfeld

German Institute for Economic Research (DIW Berlin), Germany

Relevance & Research Question

In the face of declining response rates and escalating costs in social survey research, more and more survey projects are switching from traditional face-to-face interviews to much less expensive self-administered online surveys. However, online surveys have comparatively narrow limits on questionnaire length due to a higher susceptibility to breakoffs. Thus, moving online may force survey designers to cut down on the number of questions asked in a survey, potentially resulting in the cancellation of important research projects due to limited resources. In this context, survey projects increasingly adopt innovative data collection designs that promise to reduce questionnaire length without dropping questions entirely from the survey, such as split questionnaire designs. These present each respondent with only randomly assigned subsets of the questionnaire, with the goal of subsequently imputing the planned missing data originating from this procedure. This dissertation addresses the imputation of social survey data from split questionnaire designs and the methodological decisions involved in implementing such surveys to facilitate imputation. It asks how split questionnaires may be designed, and how the resulting data may be imputed, such that estimates based on the imputed data achieve satisfying accuracy in practice.

Methods & Data

Through a series of Monte Carlo simulations, drawing on real social survey data from the German Internet Panel and the European Social Survey, this research assesses the accuracy of estimates across various scenarios, encompassing the implementation of both the split questionnaire design and the subsequent imputation. It delves into the impacts of different split questionnaire module construction strategies, varying imputation techniques, the interplay between planned missingness and conventional item nonresponse, and the implications of general-purpose versus analysis-specific imputation on the accuracy of estimates for a multivariate model. In each simulation run, a split questionnaire design is simulated by allocating items to modules, randomly assigning a number of modules to each survey participant, and deleting all data from the modules not assigned. Thereafter, the data are multiply imputed and estimates calculated based on the imputed data. These estimates are then compared to benchmarks calculated from the complete data to assess their accuracy.
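
One simulation run might be sketched as follows, with scikit-learn's IterativeImputer standing in for the multiple-imputation machinery actually used and all design parameters invented:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(3)
n, items, n_modules = 2000, 12, 4
# Toy "complete" survey data with moderately correlated items.
data = rng.multivariate_normal(np.zeros(items),
                               0.3 + 0.7 * np.eye(items), size=n)

modules = np.array_split(np.arange(items), n_modules)   # allocate items to modules
split = data.copy()
for respondent in range(n):
    dropped = rng.choice(n_modules, size=2, replace=False)  # modules not assigned
    for m in dropped:
        split[respondent, modules[m]] = np.nan             # planned missingness

imputed = IterativeImputer(random_state=0).fit_transform(split)
bias = imputed.mean(axis=0) - data.mean(axis=0)   # compare to complete-data benchmark
print(np.round(bias, 3))
```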

Results

Main findings from this research involve:

  1. With respect to the imputation, each respondent should receive a selection of questions from a large variety of topics rather than all questions from a selection of topics, as the latter leads to estimates with lower accuracy.
  2. One may need to simplify imputation models with respect to the applied imputation methods and predictor sets to prevent highly inaccurate estimates, especially for relations between variables. For example, the imputation may benefit from excluding variables with near-zero correlations with the imputed variable from imputation models, or from applying dimensionality reduction techniques to the predictor space to reduce the effective number of predictors.
  3. Additional conventional item nonresponse by respondents may challenge the imputation especially if this implies large amounts of missing data from both sources combined, even if the nonresponse is missing completely at random. In this study, especially combined amounts of missing data exceeding 40% appeared harmful to the accuracy of estimates. Thus, even though a split questionnaire design allows for collecting data on more items than are presented to each individual respondent, there seem to be practical limitations on how much questionnaire length can be reduced without negative repercussions on data quality.
  4. If the data are imputed for general research purposes to be supplied to a variety of third-party data users, the imputed data appear well-suited to be used for analyses of continuous relations in the entire survey sample. Conversely, estimating models with strongly non-continuous relationships (such as interactions or quadratic terms) or models based only on a subset of the survey sample could result in considerable biases, given the current state-of-the-art imputation procedures. For such analyses, the data would need to be imputed once more for this specific research objective, rather than for general purposes.

Added Value

The insights gleaned from these simulations thus offer valuable guidance and recommendations for future implementations of split questionnaire designs in online surveys: Split questionnaire survey designers should take care to present questions from preferably all survey topics to each respondent and make sure the split questionnaire design does not result in too large amounts of missing data, also taking into account their expectations about additional unplanned nonresponse. Furthermore, researchers applying imputation to these data may need to reduce complexity in the imputation models to some extent, as for example through dimensionality reduction. Finally, if the data are imputed for general purposes, it should be communicated clearly for which kinds of analyses the imputed data could be used and for which analyses an analysis-specific imputation may be needed.



Essays on Inference for Non-probability Samples and Survey Data Integration

Camilla Salvatore

Utrecht University, The Netherlands

Relevance & Research Question

Probability sample surveys, which are the gold standard for population inference, are facing difficulties due to declining response rates and related increasing costs. Fielding large probability samples can be cost-prohibitive for many survey researchers and study sponsors. Thus, moving towards less expensive, but potentially biased, non-probability sample surveys or alternative data sources (big or digital trace data) is becoming a more common practice.

While non-probabilistic data sources offer many advantages (convenience, timeliness, exploring new aspects of phenomena), they also come with limitations. Drawing inference from non-probability samples is challenging because of the absence of a known sampling frame and random selection process. Moreover, digital trace data are often unstructured and require additional analysis to extract the information of interest. Additionally, there is no unique framework for evaluating their quality, and the lack of a benchmark measure can be a problem when studying new phenomena. Furthermore, it is important to evaluate the construct being measured, as it may be different from the one measured by traditional data sources. Thus, from a statistical perspective, there are many challenges and research questions that need to be addressed, such as the possibility of doing inference with non-probabilistic data, the quality of these data, and whether these data sources can replace or supplement traditional probability sample surveys.

The focus of this work is on answering three research questions: 1) What is the evolution of the field of survey data integration and what new trends are emerging?, 2) Can probability and non-probability sample surveys be combined in order to improve analytical inference and reduce survey costs?, and 3) How can traditional and digital trace data be combined to augment the information in traditional sources and better describe complex phenomena?

Methods & Data

The three research questions are addressed by three different studies.

The first study presents an original science mapping application using text mining and bibliometric tools. In addition to characterizing the field in terms of collaboration between authors and research trends, it also identifies research gaps and formulates a research agenda for future investigations. From this research, it appears evident that data integration is a broad and diverse field in terms of methodologies and data sources. Thus, the second and third studies explore whether using non-probabilistic data can improve inference or allow new aspects of a complex phenomenon to be studied.

The second study focuses on structured and more traditional volunteer web surveys. To address the second research question, the paper presents a novel Bayesian approach that integrates a small probability sample with a larger online non-probability sample (possibly affected by selection bias) to improve inferences about logistic regression coefficients and reduce survey costs. The approach can be applied in different contexts. We provide examples from socioeconomic contexts (volunteering, voting behavior, trust) as well as health contexts (smoking, health insurance coverage).
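
The published model is not reproduced here; as a deliberately crude sketch of the integration idea, one could pool the log-likelihood of the small probability sample (acting as an informative prior) with that of the larger non-probability sample and sample the logistic coefficients with a plain Metropolis algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def loglik(beta, X, y):
    """Log-likelihood of a logistic regression."""
    eta = X @ beta
    return np.sum(y * eta - np.log1p(np.exp(eta)))

# Toy data: small probability sample, larger (covariate-shifted) non-probability sample.
beta_true = np.array([-0.5, 1.0])
def draw(n, shift):
    X = np.column_stack([np.ones(n), rng.normal(loc=shift, size=n)])
    y = rng.random(n) < 1 / (1 + np.exp(-(X @ beta_true)))
    return X, y.astype(float)

X_prob, y_prob = draw(300, 0.0)    # probability sample
X_np, y_np = draw(3000, 0.5)       # non-probability sample

# Pool the two likelihoods; the probability sample acts as an informative prior.
logpost = lambda b: loglik(b, X_prob, y_prob) + loglik(b, X_np, y_np)

beta, lp, draws = np.zeros(2), logpost(np.zeros(2)), []
for _ in range(20_000):                          # plain Metropolis sampler
    prop = beta + rng.normal(scale=0.05, size=2)
    lp_prop = logpost(prop)
    if np.log(rng.random()) < lp_prop - lp:
        beta, lp = prop, lp_prop
    draws.append(beta.copy())
print(np.mean(draws[5_000:], axis=0))            # posterior means after burn-in
```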

The third study concerns the analysis of traditional data in combination with unstructured textual data from social media (Twitter, now X). It shows how digital trace data can be used to augment traditional data, thus feeding smart statistics. For this purpose, we propose an original general framework for combining traditional and digital-trace-based indicators. We show an application related to business statistics, but the framework can be applied in all cases where traditional and new data sources are available.

Results

In the second study, through simulation and real-life data analysis, we show that the Mean Squared Errors (MSEs) of regression coefficients are generally lower with data integration than without. Also, using assumed probability and non-probability sample costs, we show that the potential cost savings are evident. This work is accompanied by an online application (Shiny App) with replication code and an interactive cost-analysis tool. By entering probability and non-probability (per-unit) sample costs, researchers can compare different cost scenarios. These results can serve as a reference for survey researchers interested in collecting and integrating a small probability sample with a larger non-probability one.

The third study results in the development of a general framework for combining traditional and digital trace data. This framework is modular and composed of three layers, each describing the steps necessary for the technical construction of a smart indicator. The modularity of the framework is a key feature, as it allows for flexibility in its application. Researchers can use the framework to explore different methodological variants within the same architecture, and potentially improve specific modules or test the sensitivity of the results obtained at the different levels.

Added Value

Research in the field of survey data integration and inference for non-probability samples is expanding and becoming increasingly dynamic. Combining different data sources, especially traditional and innovative ones, is a powerful way to gain a comprehensive understanding of a topic and to explore new perspectives, and it can result in new and valuable insights.

This work significantly contributes to the current debate in the literature by presenting original methodological findings and adopting a broad perspective in terms of analytical tools (text mining, Bayesian inference and composite indicators) and data sources (volunteer web surveys and textual data from social media).

Addressing the three research questions, it: a) enhances understanding of existing literature, identifying current trends and research gaps for future investigations, b) proposes an original Bayesian framework to combine probability and non-probability online surveys in a manner that improves analytic inference while also reducing survey costs, and c) establishes a modular framework that allows for building composite smart indicators in order to augment the information available in traditional sources through digital trace data.

The added value of this work lies in its presentation of diverse perspectives and case studies on data integration, showcasing how it can provide enhanced statistical analysis.

 
3:45pm - 4:45pm: B3: The Power of Social Media Data
Location: Seminar 2 (Room 1.02)
Session Chair: Ádám Stefkovics, HUN-REN Centre for Social Sciences, Hungary
 

Bridging Survey and Twitter Data: Understanding the Sources of Differences

Josh Pasek1, Lisa Singh2, Trivellore Raghunathan1, Ceren Budak1, Michael Jackson3, Jessica Stapleton3, Leticia Bode2, Le Bao2, Michael Traugott1, Nathan Wycoff2, Yanchen Wang2

1University of Michigan, United States of America; 2Georgetown University, United States of America; 3SSRS, United States of America

Relevance & Research Question

For years, researchers have attempted to use social media data to generate inferences typically produced using surveys. But Twitter data and other social media traces do not consistently reflect contemporary survey findings. Two explanations have been proposed for why this might be the case: one posits that the set of people producing data on social media sites differs from those recruited to surveys; the other asserts that data generating processes are sufficiently different that it does not make sense to compare their social media and survey outputs directly.

Methods & Data

This study links a probability US sample of survey respondents with those same individuals’ Twitter data as well as with decahose Twitter data. We compare four datasets to understand links between samples and data generating processes. These include survey responses on three topics for (1) a probability sample of the US public (N=9544); (2) the same survey responses for the subset of individuals who use Twitter, consent to access, and tweet about the topics of interest (N=246); (3) tweets for this set of linked individuals who tweeted about the topic of interest; and (4) tweets from US individuals sampled from the Twitter decahose (N=7,363 after removing bots and non-individual accounts). Open-ended survey questions and social media posts are topic modeled using a guided topic modeling approach within topic areas to identify vaccination behaviors/attitudes, economic evaluations, and parenting challenges during the COVID pandemic.
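
The guided topic modeling itself is beyond a short sketch, but the cross-dataset comparison step can be illustrated: given topic proportions per data source, a divergence measure such as the Jensen-Shannon distance (an illustrative choice, not necessarily the authors') quantifies how similar two sources are:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Hypothetical topic proportions over the same five subtopics, per data source.
survey_full   = np.array([0.30, 0.25, 0.20, 0.15, 0.10])
survey_linked = np.array([0.28, 0.27, 0.19, 0.16, 0.10])
tweets_linked = np.array([0.10, 0.35, 0.30, 0.15, 0.10])

print(jensenshannon(survey_full, survey_linked))    # small: similar subtopic mix
print(jensenshannon(survey_linked, tweets_linked))  # larger: different data generation
```
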
Results

We find that the subset of individuals who used Twitter and consented to linkage differed slightly in demographic composition but mentioned similar distributions of subtopics in response to open-ended survey questions about all three areas. In contrast, individuals with survey and Twitter data provided similar data across these two modes for one of our three topics (economics) and different data across the other topics (vaccinations and parenting). Tweets from consented users and the decahose sample, in contrast, provided similar distributions of topics for vaccinations and parenting, but not economics.

Added Value

This suggests that motivation to post and posting frequency may matter more for the data acquired than who is represented.



Physical Proximity and Digital Connections: The Impact of Geographic Location on Twitter User Interaction

Long Nguyen1, Zoran Kovacevic2

1Bielefeld University; 2ETH Zürich

Relevance & Research Question

In the context of an online social network where geographical distance is often assumed to be inconsequential, this study examines how physical proximity relates to Twitter user interaction. In line with previous findings, the central hypothesis is that individuals who live in closer physical proximity are more likely to engage with one another, despite the virtual nature of Twitter. Moreover, the extent of this impact is expected to be contingent on the specific topic under discussion.

Methods & Data

Employing a multi-layered approach, the study integrates techniques from natural language processing, network analysis, and spatial analysis. A dataset of over 500 million geolocated German tweets (including retweets) forms the basis of the analysis. First, a BERT-like language model is trained on the tweets to categorise them into thematically similar groups, enabling a granular exploration of topic-specific interactions. Subsequently, retweet and reply networks are constructed for each thematic group as well as for the entire tweet corpus. Community detection algorithms are then used to identify clusters of users who frequently retweet and reply to each other. Spatial analysis is then applied to examine the correlation between users' physical proximity and their clustering as identified by community detection.
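
A hedged sketch of the network step: build a weighted retweet graph and detect communities, with networkx's greedy modularity algorithm standing in for whatever community detection the authors actually used:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy retweet edges: (retweeter, original author, frequency).
edges = [("a", "b", 3), ("b", "a", 1), ("c", "a", 2),
         ("d", "e", 4), ("e", "f", 2), ("f", "d", 1)]

G = nx.Graph()
for u, v, w in edges:
    if G.has_edge(u, v):
        G[u][v]["weight"] += w     # accumulate weights across directions
    else:
        G.add_edge(u, v, weight=w)

communities = greedy_modularity_communities(G, weight="weight")
print([sorted(c) for c in communities])   # e.g. [['a','b','c'], ['d','e','f']]
```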

Results

Preliminary results indicate a corpus-wide positive correlation between the spatial proximity of users and their clustering based on retweet and reply communities. However, the strength and significance of the correlation varies across the different topics discussed within the Twitter dataset. Notably, the geographical aspect of discussions can be found not only among local topics, but also in topics with a more universal appeal.

Added Value

This study offers a methodologically complex investigation of the interplay between geography and online social networks. By revealing the nuanced relationship between spatial proximity and Twitter user interaction based on topics, the study extends our understanding of online social dynamics. The findings contribute to the broader discourse on social media by highlighting the importance of local context and regional differences as a determinant of online interaction patterns.



Gender (self-)portrayal and stereotypes on TikTok

Dorian Tsolak1,2,3, Stefan Knauff1,2,3, Long H. Nguyen1,2, Rian Hedayet Zaman1, Jonas Möller1, Yasir Ammar Mohammed1, Ceren Tüfekçi1

1Bielefeld University, Germany; 2Bielefeld Graduate School in History and Sociology, Bielefeld; 3Institute for Interdisciplinary Research on Conflict and Violence, Bielefeld

Relevance & Research Question

Women and men are portrayed differently in advertising and on social media, as research on gender (self-)portrayal has shown. Most studies in this area analyzed small samples of static images to examine gender stereotypes conveyed through images on social media. We study gender (self-)portrayal on TikTok, in particular which dynamic expressions are more often used by individuals passing as women or men. For this, we present a novel method to analyze large amounts of video data with computational methods.

Methods & Data

Our data encompasses approximately 36,000 unique videos extracted from the top 1,000 trending TikTok videos in Germany over a consecutive 40-day period in 2021, supplemented by 973,000 metadata entries. Each video is processed using YOLOv8 pose detection, which dissects the videos into frames and annotates 17 key points per frame. We group the data into commonly used dynamic expressions (i.e., sequences of body movement). We employ HDBSCAN (hierarchical density-based clustering) and DTW (dynamic time warping) to deal with differences in sequence and video length and to handle ‘valid’ missing data, e.g., from certain body parts not being visible in the footage.
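A rough sketch of this pipeline is given below: YOLOv8 pose estimation per frame, pairwise DTW distances between flattened keypoint sequences, and HDBSCAN on the precomputed distance matrix. The file names, model checkpoint, and parameters are illustrative assumptions, not the authors' exact setup.

    # Sketch: pose keypoints per frame -> DTW distance matrix -> HDBSCAN.
    # Video paths and parameters are placeholders.
    import numpy as np
    from ultralytics import YOLO          # YOLOv8 pose model
    from tslearn.metrics import dtw       # tolerates unequal sequence lengths
    import hdbscan

    model = YOLO("yolov8n-pose.pt")

    def keypoint_sequence(video_path):
        # (n_frames, 34): 17 keypoints x 2 coordinates, first detected person only.
        seq = []
        for r in model(video_path, stream=True):
            if r.keypoints is not None and r.keypoints.xy.shape[0] > 0:
                seq.append(r.keypoints.xy[0].cpu().numpy().ravel())
        return np.array(seq)

    videos = ["clip_001.mp4", "clip_002.mp4", "clip_003.mp4"]
    sequences = [keypoint_sequence(p) for p in videos]

    # Pairwise DTW distances between movement sequences.
    n = len(sequences)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = dtw(sequences[i], sequences[j])

    # Cluster sequences into prototypes of dynamic expressions.
    labels = hdbscan.HDBSCAN(metric="precomputed", min_cluster_size=2).fit_predict(D)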

Results

Sequences are grouped into prototypes of dynamic expressions. Using manually annotated information, we can distinguish certain types of movement that are more commonly used by one gender. Utilizing metadata and the expressions in the videos, we are able to explain part of the variance in how a video performs, i.e., how many likes it gets or how long it stays within the top 1,000 trends. A qualitative assessment of the prototypes of the most gender-biased expressions allows for integration with sociological theory on gender-stereotypical body posing and provides insight into why some poses might perform better in terms of likes and views.

Added Value

We extend the framework for analyzing gender-stereotypical posing from static social media images to dynamic social media videos, an important step in adapting to video-based content (Snapchat, TikTok, Instagram Reels) becoming the de facto default type of social media content, especially for younger generations. Regarding methods, we offer a tractable way to analyze body posing on social media.

 
5:00pm - 6:00pm B4: Willingness to participate in passive data collection studies
Location: Seminar 2 (Room 1.02)
Session Chair: Johannes Volk, Destatis - Federal Statistical Office Germany, Germany
 

The influence of conditional and unconditional incentives on the willingness to participate in web tracking studies

Judith Gilsbach, Joachim Piepenburg, Frank Mangold, Sebastian Stier, Bernd Weiss

GESIS Leibniz Institute for the Social Sciences, Germany

Relevance & Research Question

Linking web tracking and survey data opens up new research areas. We record participants' browsing behavior via a browser plugin. Tracking allows for measuring behavior that individuals tend to recall inaccurately and reduces the survey burden. However, previous studies found that participants were reluctant to participate in tracking studies. To increase participation rates, monetary incentives are widely used. These can be granted unconditionally, conditional on participation, or as a combination of both. It is, however, unclear (1) how large conditional incentives should be and whether unconditional incentives can further increase participation rates. We are also interested in (2) whether these effects are the same for a convenience sample and a probability-based sample.

Methods & Data

To answer our research questions, we conduct a 2x3 factorial experiment with approximately 2,600 panelists of a new panel. Panelists are recruited via Meta ads and via a German general population survey (ALLBUS). The first factor is whether panelists receive a prepaid incentive of 5 Euro or not. The second factor is the amount of the postpaid incentive (10, 25, or 40 Euro), conditional on 30 out of 60 active days in the tracking period.

We investigate (1a) consent to participate in the web tracking study and (1b) actual installation of the browser plugin; we will present logistic regression models. Additionally, (2) we will investigate the differences between Meta- and ALLBUS-recruited participants.
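A minimal sketch of such a model, with synthetic data and hypothetical variable names, could look as follows.

    # Sketch of the logistic regression for consent (synthetic data;
    # variable names and effect sizes are hypothetical).
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(42)
    n = 2600
    prepaid = rng.integers(0, 2, n)          # 5 Euro unconditional vs. none
    postpaid = rng.choice([10, 25, 40], n)   # conditional amount in Euro
    p = 1 / (1 + np.exp(-(-0.5 + 0.3 * prepaid + 0.01 * postpaid)))
    df = pd.DataFrame({
        "consent": rng.binomial(1, p),
        "prepaid": prepaid,
        "postpaid": postpaid,
    })

    # 2x3 factorial: main effects plus interaction of the two incentive factors.
    m = smf.logit("consent ~ prepaid * C(postpaid)", data=df).fit(disp=False)
    print(m.summary())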

Results

Using a smaller dataset from our first field period, including only participants recruited via Meta ads, we find that the unconditional incentive is positively related to consent but not to installation. For the amount of the conditional incentive, we do not yet see an effect. We will analyze a larger dataset for the results presented at the conference. Results for our second research question will also be available by the time of the conference.

Added Value

Little is known about how incentives and other factors affect participation in tracking studies, as most research only investigates hypothetical consent. Our study adds to the knowledge on the incentive amounts needed to recruit participants into a web tracking study via Meta and via a general population survey.



Intentions vs. Reality. Validating Willingness to Participate Measures in Vignette Experiments Using Real-World Participation Data

Ádám Stefkovics1,2,3, Zoltán Kmetty1,4

1HUN-REN Centre for Social Sciences, Hungary; 2IQSS, Harvard University; 3Századvég Foundation; 4Eötvös Loránd University

Relevance & Research Question:

Vignette and conjoint experiments are extensively utilised in the social sciences. These methodologies assess preferences in hypothetical scenarios, exploring decision-making in complex choices with varied attributes, with the aim of aligning survey responses more closely with real-world decisions. However, survey experiments are only externally valid to the extent that stated intentions align with real-world behaviour. This study uses a unique dataset that allows us to compare the outcomes of a vignette experiment (which assessed willingness to participate in a social media data donation study) with actual participation in a real data donation study involving the same survey respondents.

Methods & Data:

A vignette experiment embedded in an online survey of a non-probability-based panel was conducted in Hungary in May 2022 (n=1,000). Respondents expressed their willingness to participate in hypothetical data donation studies. In a mixed factorial design, five treatment dimensions were varied in the study descriptions (platform, range of data, upload/download time, monetary incentive, and non-monetary incentive). In February 2023, the same participants were invited to a real data donation study with almost the same characteristics as those described in the vignettes.
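The core of the validation step can be sketched as below, on synthetic data; "education" stands in for the predictors compared in the study.

    # Sketch: correlate stated willingness with actual participation and
    # compare predictors across the two outcomes (synthetic data).
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 1000
    df = pd.DataFrame({
        "willing": rng.integers(0, 2, n),
        "participated": rng.integers(0, 2, n),
        "education": rng.integers(1, 6, n),   # hypothetical 1-5 scale
    })

    # Phi / point-biserial correlation between the two binary outcomes.
    print(df["willing"].corr(df["participated"]))

    # Same predictor, two outcomes: do the drivers differ?
    m_willing = smf.logit("willing ~ education", data=df).fit(disp=False)
    m_actual = smf.logit("participated ~ education", data=df).fit(disp=False)
    print(m_willing.params["education"], m_actual.params["education"])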

Results:

The correlation between self-reported willingness and actual participation was only 0.29. Moreover, the drivers of willingness and of actual participation differed. For instance, education was one of the strongest predictors of willingness, yet was not significantly associated with actual participation. We also found differences regarding the effects of privacy beliefs and the Big Five personality traits.

Added Value:

This study contributes to the literature by validating the results of a vignette experiment using within-person comparisons with behavioural data. The results suggest that vignette experiments may suffer strongly from hypothetical and other biases, at least in scenarios where the personal risk and burden are high, and underscore the importance of improving the external validity of such experiments.



Who is willing to participate in an app- or web-based travel diary study?

Danielle Remmerswaal1,2, Barry Schouten1,2, Peter Lugtig1, Bella Struminskaya1

1Utrecht University; 2Statistics Netherlands

Relevance & Research Question

Using apps as a survey mode offers promising features. Passive measurements on smartphones can reduce response burden by replacing traditional survey questions, and can improve data quality by reducing recall bias. However, not everyone is able or willing to participate in an app-based study, causing coverage issues and nonresponse. We investigate whether a mixed-mode design can be effective for our goals by analyzing who chooses to participate in an app study and who prefers a web questionnaire.

Methods & Data

We report on a study by Statistics Netherlands (winter 2022-2023) for which we invited 2,544 individuals from a cross-sectional sample of the Dutch population. We asked individuals to use a smartphone app to collect their travel data or to participate in a web questionnaire. We combine a concurrent mixed-mode design with a “push-to-app” design by offering the web questionnaire at different moments: directly in the invitation letter or in one of the reminders. Invitees are asked to participate in one mode. Using registry data, we assess whether participation is related to individual characteristics.
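For illustration, differences in registration rates between invitation conditions can be checked with a simple two-proportion z-test; the counts below are invented and only mirror the rates reported in the results.

    # Sketch: two-proportion z-test on registration rates (invented counts).
    from statsmodels.stats.proportion import proportions_ztest

    count = [168, 134]   # registrants: web offered directly vs. second reminder
    nobs = [848, 848]    # invitees per condition (hypothetical)
    stat, pval = proportions_ztest(count, nobs)
    print(f"z = {stat:.2f}, p = {pval:.3f}")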

Results

More people register in the app (11.5%) than in the questionnaire (7.0%). Total registration rates are higher when the web questionnaire is offered directly (19.8%) than in the first (18.5%) or second reminder (15.8%). The app registration rate does not increase much when the web questionnaire is offered later, suggesting that certain people have a mode preference for the app. Most striking is the age effect: the app attracts younger participants, while older participants are overrepresented in the web questionnaire. Combining the two yields a more balanced sample.

Added Value

We show that with a mixed-mode design we can attract more respondents than with an app-only design in a probability-based sample. Using population registries, we are able to improve our understanding of who participates in app and web studies. Additionally, our analysis can contribute to the design of future diary studies combining a smartphone app and a web questionnaire.

 
Date: Friday, 23/Feb/2024
11:45am - 12:45pm B5: To Trace or to Donate, That’s the Question
Location: Seminar 2 (Room 1.02)
Session Chair: Alexander Wenz, University of Mannheim, Germany
 

Exploring the Viability of Data Donations for WhatsApp Chat Logs

Julian Kohne1,2, Christian Montag2

1GESIS - Leibniz Institute for the Social Sciences; 2Ulm University

Relevance & Research Question

Data donations are a new tool for collecting research data. They can ensure informed consent; provide highly granular, retrospective, and potentially less biased behavioral traces; and are independent of APIs and web-scraping pipelines. We thus seek to explore the viability of data donations for a type of highly personal data: WhatsApp chat logs. Specifically, we explore a wide range of demographic, psychological, and relational characteristics and how they relate to people's donation willingness, censoring, and actual data donation behavior.
Methods & Data

We used an opt-in survey assessing demographics, personality, relationship characteristics of a self-selected social relationship, and privacy concerns. Participants were also asked whether they were willing to donate a 1:1 WhatsApp chat from the respective relationship. If they agreed, participants were forwarded to an online platform where they could securely upload, review, self-censor, and donate the chat log. Donated chats were anonymized automatically by first extracting variables of interest (e.g., number of words per message, emoji, smileys, sent domains, response time) and then deleting the raw message content. In a second step, participants selected which parts of the anonymized data should be included in the donation. The study was reviewed and approved by the ethics committee of Ulm University. So far, 244 people have participated in the survey, and 140 chat log files with over 1 million messages in total have been donated.
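The anonymization idea can be sketched as follows: parse a chat export, keep only aggregate per-message features, and drop the raw text. WhatsApp export formats vary by locale and app version, so the regular expression and feature set here are assumptions.

    # Sketch: extract aggregate features per message, never store raw text.
    # The export line format is an assumption; it varies across locales.
    import re

    LINE = re.compile(r"^(\d{1,2}/\d{1,2}/\d{2,4}), (\d{1,2}:\d{2}) - ([^:]+): (.*)$")
    EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

    def anonymize(path):
        records = []
        with open(path, encoding="utf-8") as f:
            for line in f:
                m = LINE.match(line.strip())
                if not m:
                    continue  # skip continuation lines and system messages
                date, time, sender, text = m.groups()
                records.append({
                    "date": date,
                    "time": time,
                    "sender": hash(sender) % 10**8,  # simple pseudonym, not cryptographic
                    "n_words": len(text.split()),
                    "n_emoji": len(EMOJI.findall(text)),
                    "has_link": "http" in text,
                })  # raw message content is never stored
        return records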

Preliminary Results

Preliminary results (based on 198 participants) show that participants were mostly university students. Self-indicated willingness to donate a chat was surprisingly high (73%), with a sizable gap to actual donations (39.4%). Interestingly, participants rarely excluded any data manually after the automatic anonymization step. Furthermore, we did not find any meaningful differences in data donation willingness and behavior with respect to demographics, personality, privacy concerns, or relationship characteristics.

Added Value
Our preliminary results highlight that opt-in data donations can be a viable method for collecting even highly sensitive digital trace data if sufficient measures are taken to ensure anonymization, transparency, and ease of use. We will discuss further implications for study design and participant incentivization based on the larger dataset.



The Mix Makes the Difference: Using Mobile Sensing Data to Foster the Understanding of Non-Compliance in Experience Sampling Studies

Ramona Schoedel1,2, Thomas Reiter2

1Charlotte Fresenius Hochschule, University of Psychology, Germany; 2LMU Munich, Department of Psychology

Relevance & Research Question

For decades, the social sciences have focused on broad one-time assessments and neglected the role of momentary experiences and behaviors. Novel digital tools now facilitate the ambulatory collection of data on a moment-to-moment basis via experience sampling methods. But compliance in answering short questionnaires in daily life varies considerably between and within participants. Compliance, and consequently the mechanisms leading to missing data in experience sampling studies, remains poorly understood. In our study we therefore explored person-, context-, and behavior-related patterns associated with participants’ compliance in experience sampling studies.

Methods & Data

We used part of a data set (N = 592) from the Smartphone Sensing Panel Study, recruited according to quotas representing the German population. We extracted over 400 different person-, context-, and behavior-related variables by combining assessments from traditional surveys (e.g., personality traits), experience sampling (e.g., mood), and passively collected mobile sensing data (e.g., smartphone usage, GPS). Based on more than 25,000 observations, we predicted participants' compliance in answering experience sampling questionnaires. For this purpose, we used a machine-learning-based modeling approach and benchmarked different classification algorithms using 10-fold cross-validation. In addition, we applied methods from interpretable machine learning to better understand the importance of single variables and of constellations of variable groups.
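A minimal sketch of the benchmarking step, with synthetic data standing in for the extracted features, might look like this; the hyperparameters are illustrative, and only the winning elastic net model is shown.

    # Sketch: elastic-net logistic regression, 10-fold CV (synthetic data).
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(1)
    X = rng.normal(size=(2500, 40))   # stand-in for ~25,000 x 400 real features
    y = rng.integers(0, 2, 2500)      # answered the questionnaire or not

    clf = make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=0.5, max_iter=5000),
    )
    print(cross_val_score(clf, X, y, cv=10, scoring="roc_auc").mean())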

Results

We found that compliance with experience sampling questionnaires could be predicted above chance and that, among the compared algorithms, the linear elastic net model performed best (MAUC = 0.723). Our follow-up analysis showed that study-related past behaviors, such as the average response rate to previous experience sampling questionnaires, were the most informative, followed by location information such as “at home” or “at work”.

Added Value

Our study shows that compliance in experience sampling studies is related to participants' behavioral and situational context. Accordingly, we illustrate systematic patterns associated with missing data. Our study is an empirical starting point for discussing the design of experience sampling studies in social sciences and for pointing out future directions in research addressing experience sampling methodology and missing data.

 
2:00pm - 3:00pm B6.1: Automatic analysis of answers to open-ended questions in surveys
Location: Seminar 2 (Room 1.02)
Session Chair: Barbara Felderer, GESIS, Germany
 

Using the Large Language Model BERT to categorize open-ended responses to the "most important political problem" in the German Longitudinal Election Study (GLES)

Julia Susanne Weiß, Jan Marquardt

GESIS, Germany

Relevance & Research Question

Open-ended survey questions are crucial, e.g., for capturing unpredictable trends, but the resulting unstructured text data poses challenges. Quantitative usability requires categorization, a labor-intensive process in terms of cost and time, especially with large datasets. In the case of the German Longitudinal Election Study (GLES) spanning 2018 to 2022, nearly 400,000 uncoded mentions prompted us to explore new ways of coding. Our objective was to test various machine learning approaches to determine the most efficient and cost-effective method for a long-term solution for coding responses while ensuring high quality. Which approach is best suited for the long-term coding of open-ended mentions regarding the "most important political problem" in the GLES?

Methods & Data

Before 2018, GLES data was coded manually. Shifting to a (partially) automated process involved revising the codebook. Subsequently, the extensive dataset comprising nearly 400,000 open responses to the question on the "most important political problem" in the GLES surveys conducted between 2018 and 2022 was employed. The coding process was facilitated by the large language model BERT (Bidirectional Encoder Representations from Transformers). Throughout the process, we tested a whole host of important aspects (hyperparameter fine-tuning, downsizing of the “other” category, simulations of different amounts of training data, quality control across survey modes, using training data from 2017) before arriving at the final implementation.
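The general fine-tuning setup can be sketched with the Hugging Face transformers library; the checkpoint, label set, and hyperparameters below are assumptions, not the GLES configuration.

    # Sketch: fine-tune a German BERT to code open-ended mentions.
    # Checkpoint, labels, and hyperparameters are illustrative.
    from datasets import Dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    data = Dataset.from_dict({
        "text": ["Klimawandel", "Zuwanderung", "Rente"],  # climate change, immigration, pensions
        "label": [0, 1, 2],
    })

    tok = AutoTokenizer.from_pretrained("bert-base-german-cased")
    data = data.map(lambda x: tok(x["text"], truncation=True, max_length=32,
                                  padding="max_length"), batched=True)

    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-german-cased", num_labels=3)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="coder-bert", num_train_epochs=3,
                               per_device_train_batch_size=16),
        train_dataset=data,
    )
    trainer.train()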
Results

The "new" codebook already demonstrates high quality and consistency, evident from its Fleiss Kappa value of 0.90 for the matching of individual codes. Utilizing this refined codebook as a foundation, 43,000 mentions were manually coded, serving as the training dataset for BERT. The final implementation of coding for the extensive dataset of almost 400,000 mentions using BERT yields excellent results, with a 0/1 loss of 0.069, a Micro F1 score of 0.946 and a Macro F1 score of 0.878.
Added Value

The outcomes highlight the efficacy of the (partially) automated coding approach, emphasizing accuracy with the refined codebook and BERT's robust performance. This strategic shift towards advanced language models signifies an innovative departure from traditional manual methods, emphasizing efficiency in the coding process.



The Genesis of Systematic Analysis Methods Using AI: An Explorative Case Study

Stephanie Gaaw, Cathleen M. Stuetzer, Maznev Petko

TU Dresden, Germany

Relevance & Research Question

The analysis of open-ended questions in large-scale surveys can provide detailed insights into respondents' views that often cannot be assessed with closed-ended questions. However, due to the large number of respondents, reviewing the answers to open-ended questions and turning them into research results takes considerable resources. This contribution aims to show the potential benefits and limitations of using AI-based tools (e.g., ChatGPT) for analyzing open-ended questions in large-scale surveys. It thereby also aims to highlight the challenge of conducting systematic analysis with AI.

Methods & Data
As part of a large-scale survey on the use of AI in higher education at a major German university, open-ended questions were included to provide insight into the perceived benefits and challenges of using AI in higher education for students and lecturers. The open-ended responses were then analyzed using qualitative content analysis. To verify whether ChatGPT could analyze the open-ended questions faster while maintaining the same quality of results, we asked ChatGPT to analyze the responses in a way similar to our own analytical process.
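The prompting setup can be sketched as follows; the model name, category labels, and instructions are placeholders, not the study's actual prompts.

    # Sketch of category assignment via the OpenAI API (placeholders throughout).
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    categories = ["time saving", "learning support", "academic integrity concerns"]
    answer = "AI helps me summarize long readings before seminars."

    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You perform qualitative content analysis of survey "
                        "responses on AI in higher education. Assign the response "
                        f"to exactly one of these categories: {categories}."},
            {"role": "user", "content": answer},
        ],
    )
    print(completion.choices[0].message.content)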

Results
The results provide a roadmap for letting ChatGPT analyze open-ended data. In our case study, it produced categories and descriptions similar to those we obtained by qualitatively analyzing the data ourselves. However, 9 out of 10 times we had to re-prompt ChatGPT and specify the context of the analysis to get appropriate results. In addition, there were some minor differences in how items were sorted into their respective categories. Yet, despite these limitations, it became clear that in 80% of cases ChatGPT assigned the responses to the derived categories more accurately than our research team did in the qualitative analysis.

Added Value
This paper provides insight into how ChatGPT can be used to simplify and accelerate the standard process of qualitative analysis under certain circumstances. We will share our prompts for ChatGPT, detailed findings from comparing its results with our own, and its limitations, in order to contribute to the further development of systematic analysis methods using AI.



Insights from the Hypersphere - Embedding Analytics in Market Research

Lars Schmedeke, Tamara Keßler

SPLENDID Research, Germany

Relevance & Research Question:

At the intersection of qualitative and quantitative research, analyzing open-ended questions remains a significant challenge for data analysts. The incorporation of AI language models introduces the complex embedding space: a realm where semantics intertwine with mathematical principles. This paper explores how Embedding Analytics, a subset of explainable AI, can be utilized to decode and analyze open-ended questions effectively.

Methods & Data:

Our approach utilized the ada_V2 encoder (OpenAI's text-embedding-ada-002) to transform market research responses into spatial representations on the surface of a 1,536-dimensional hypersphere. This process enabled us to analyze semantic similarities using traditional statistics as well as advanced machine learning techniques. We employed K-Means clustering for text grouping and respondent segmentation, and Gaussian Mixture Models for overarching topic analysis across numerous responses. Dimensionality reduction through t-SNE transformed these complex data sets into more comprehensible 2D or 3D visual representations.
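In outline, the pipeline can be sketched as below; the example answers and all parameters are illustrative.

    # Sketch: ada_V2 embeddings -> K-Means clusters -> t-SNE projection.
    import numpy as np
    from openai import OpenAI
    from sklearn.cluster import KMeans
    from sklearn.manifold import TSNE

    client = OpenAI()
    answers = ["Great value for money", "Far too expensive",
               "I love the design", "The packaging looks premium"]

    emb = client.embeddings.create(model="text-embedding-ada-002", input=answers)
    X = np.array([e.embedding for e in emb.data])   # shape (n, 1536), unit length

    labels = KMeans(n_clusters=2, n_init="auto", random_state=0).fit_predict(X)
    xy = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(X)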

Results:

Utilizing OpenAI’s ada_V2 encoder, we successfully generated text embeddings that can be plausibly clustered based on semantic content, transcending barriers of language and text length. These clusters, formed via K-Means and Gaussian Mixture Models, effectively yield insightful and automated analyses from qualitative data. The two-dimensional “cognitive constellations” created through t-SNE offer clear and accessible visualizations of intricate knowledge domains, such as brand perception or public opinion.

Added Value:

This methodology allows for a precise numerical analysis of verbatim responses without the need for labor-intensive manual coding. It facilitates automated segmentation, simplification of complex data, and even enables qualitative data to drive prediction tasks. The rich, nuanced datasets derived from semantic complexity are suitable for robust analysis using a wide range of statistical methods, thereby enhancing the efficacy and depth of market research analysis.

 
3:15pm - 4:15pm B7: Mobile Apps and Sensors
Location: Seminar 2 (Room 1.02)
Session Chair: Ramona Schoedel, Charlotte Fresenius Hochschule, University of Psychology, Germany
 

Mechanisms of Participation in Smartphone App Data Collection: A Research Synthesis

Wai Tak Tung, Alexander Wenz

University of Mannheim

Relevance & Research Question: Smartphone app data collection has recently gained increasing attention in the social and behavioral sciences, allowing researchers to integrate surveys with sensor data, such as GPS measurements of location and movement. As with other forms of surveys, participation rates of such studies in general population samples are generally low. Previous research has identified several study- and participant-level determinants of willingness to participate in smartphone app data collection. However, a comprehensive overview of which factors predict willingness and a theoretical framework are currently lacking, and some of the reported effects are inconsistent. To guide future app-based studies, we address the following research questions:

(1) Which study- and participant-level characteristics affect the willingness to participate in smartphone app data collection?

(2) Which theoretical frameworks can be used to understand participation decisions in smartphone app data collection?

Methods & Data: We conduct a systematic review and a meta-analysis of existing studies with app-based data collection, guided by the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) framework (Moher et al. 2009). We compile a list of keywords to search for relevant literature in bibliographic databases, focusing on peer-reviewed articles published in English. We also perform double coding to ensure a reliable selection of literature for the analysis. Finally, we map the identified determinants of willingness to potential theoretical frameworks that can explain participation behavior.

Results: In the systematic review, we summarize findings on study-level characteristics that are under the researchers' control, such as monetary incentives or invitation mode, and on participant-level characteristics, such as privacy concerns and socio-demographics. The meta-analysis focuses on the selected characteristics that have been covered most often in previous research.

Added Value: This study will provide a holistic understanding of the current state of research on participation decisions in app-based studies. The findings will also help researchers to design effective invitation strategies for future studies.



“The value of privacy is not as high as finding my person”: Self-disclosure practices on dating apps illustrate an existential dilemma for data protection

Lusine Petrosyan, Grant Blank

University of Oxford, United Kingdom

Relevance & Research Question: Dating apps create a unique digital sphere in which people must disclose sensitive personal information about their demographics, location, values, and lifestyle. Because of these intimate disclosures, dating apps constitute a strategic research site for exploring how privacy concerns influence personal information disclosure. We use construal-level theory to understand how context influences the decision to disclose. Construal-level theory refers to the influence of psychological distance: the more psychologically distant an event, the more mental effort is required to understand it. When people have no direct experience in a context, they rely on conventional stereotypes and quick generalizations. Using this theory, we ask: Why do people choose to disclose or not disclose personal information on their dating app profiles?
Methods & Data: We use in-depth, key-informant interviews with 27 active male and female users of the dating site Hinge. Interviews were transcribed and assigned descriptive, process-oriented and interpretative codes using Atlas.ti software.
Results: Dating site users distinguish two kinds of privacy risks. One class of threats comprises other dating app users, who may misuse their information for embarrassment, harassment, or stalking, particularly if it could identify the user. These are contexts in which users have personal experience, and they consider very carefully what information to disclose or hide at the user level. The second class is the platform level: app providers who use or sell their information for targeted advertisements. In this context users have no direct experience. Platform-level use is abstract and requires serious mental effort to understand; hence it is seen as not threatening and is ignored. These results confirm construal-level theory.
Added Value: This research uncovers a previously unnoticed mechanism that governs privacy awareness. It provides clear policy guidelines for enhancing privacy awareness on social media and the Internet in general. Specifically, to encourage people to protect their personal information, psychological distance has to be reduced. This can be done through explicit warnings about data use, or explicit statements about data sale and about what third parties may do with the information. Warnings should be easily visible on the home page or in other prominent locations.



Money or Motivation? Decision Criteria to participate in Smart Surveys

Johannes Volk, Lasse Häufglöckner

Destatis - Federal Statistical Office Germany, Germany

Relevance & Research Question

The German Federal Statistical Office (Destatis) is continuing to develop its data collection instruments and is working on smart surveys in this context. By smart surveys we mean the combination of traditional question-based survey data collection and digital trace data collection by accessing device sensor data via an application (GPS, camera, microphone, accelerometer, ...).

Unlike traditional surveys, smart surveys not only ask respondents for information but also require them to download an app and allow access to sensor data. Destatis conducted focus groups to learn more about the attitudes, motives and obstacles regarding the willingness to participate in smart surveys. This was done as part of the European Union's Smart Survey Implementation (SSI) project, in which Destatis is participating alongside other project partners.

Methods & Data

Three focus groups with a total of 16 participants were conducted at the end of October 2023. The group discussions were led by a moderator using a guideline. The discussions lasted around two hours each and were video-recorded.

Results

Overall, it became clear that participants are more willing to take part in a survey, to download an app, and to grant access to sensor data if they see a purpose in doing so, on the one hand, and have trust, on the other. Against this background, it seems particularly important for motivating people to participate to provide transparent information explaining why the survey is conducted, why they should participate, why access to the sensor data is needed, and what is being done to ensure a high level of data protection and data security.

Added Value

In official statistics, the development of new survey methods is seen as an important step towards modern data collection. However, modern survey methods can only make a positive contribution if they are used by respondents. The results are intended to provide information on how potential respondents can best be addressed to participate. In the further course of the SSI project, a quantitative field test for recruitment is planned. The results of the focus groups will also be used to prepare this test.

 

 