Do machine learning techniques improve nonresponse weighting?
Barbara Felderer, Björn Rohr, Christian Bruch
GESIS, Germany
Relevance & Research Question
Nonresponse weighting is an important tool for improving the representativeness of surveys, for example by weighting respondents according to their inverse propensity to respond (IPW). IPW estimates a person's propensity to respond based on characteristics that are available for both respondents and nonrespondents. While logistic regression is typically used to estimate the response propensity, machine learning (ML) methods offer several advantages: they allow for very flexible estimation of relationships and can include a large number of potentially correlated predictor variables. ML methods are known for highly accurate prediction. However, their estimates of the relationships between the weighting variables and the response propensity suffer from regularization bias. With regard to weighting, it is unclear which of these properties is more relevant and has a greater influence on the quality of the weighted estimate. In this study, we address the question of whether ML methods outperform logistic regression in performing IPW.
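To fix ideas, here is a minimal sketch of the IPW logic, assuming Python with pandas and scikit-learn; the toy data and variable names are purely illustrative, not the authors' setup.

```python
# Minimal IPW sketch: estimate response propensities with logistic
# regression, then weight respondents by the inverse propensity.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# One row per sampled person; the auxiliary variables must be known
# for respondents AND nonrespondents (illustrative toy data).
frame = pd.DataFrame({
    "age":       [34, 51, 28, 62, 45, 39],
    "urban":     [1, 0, 1, 1, 0, 0],
    "responded": [1, 0, 1, 1, 0, 1],
})
X = frame[["age", "urban"]]

model = LogisticRegression().fit(X, frame["responded"])
frame["p_respond"] = model.predict_proba(X)[:, 1]

# Respondents with a low estimated propensity receive a large weight,
# compensating for groups that are underrepresented among respondents.
respondents = frame.loc[frame["responded"] == 1].copy()
respondents["ipw"] = 1.0 / respondents["p_respond"]
print(respondents[["p_respond", "ipw"]])
```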
Methods & Data
In a simulation study that mimics the three nonresponse models (separate cause model, common cause model, survey variable cause model) and varies the number of features affecting nonresponse, we apply IPW using five different prediction models: regression trees (CART), random forest, boosting, lasso, and logistic regression. We conclude the analysis with an application to voting decisions collected in the German Internet Panel.
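As a companion to the simulation design, the sketch below shows one way the five propensity models could be swapped in for IPW on a single simulated sample; the scikit-learn estimators and their settings are assumptions for illustration, with lasso implemented as L1-penalized logistic regression.

```python
# Illustrative comparison of propensity models for IPW on one
# simulated sample; estimator settings are not the authors' choices.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV

# Stand-in for one simulated sample: X are the weighting variables,
# 'responded' is the simulated response indicator.
X, responded = make_classification(n_samples=2000, n_features=10, random_state=0)

models = {
    "CART":                DecisionTreeClassifier(max_depth=5),
    "Random Forest":       RandomForestClassifier(n_estimators=200, random_state=0),
    "Boosting":            GradientBoostingClassifier(random_state=0),
    "Lasso":               LogisticRegressionCV(penalty="l1", solver="saga", Cs=10, max_iter=5000),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

for name, clf in models.items():
    p = clf.fit(X, responded).predict_proba(X)[:, 1]
    ipw = 1.0 / p[responded == 1]   # weights for the simulated respondents
    print(f"{name}: mean weight {ipw.mean():.2f}")
```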
Results
The machine learning methods perform similarly well to logistic regression and yield weighted estimates with lower variance. Overall, the advantage of accurate prediction appears to outweigh the disadvantage of regularization bias.
Added Value
The presentation provides guidance on how to improve the weighting of surveys, which is a crucial task when drawing conclusions about the general population from a survey.
Company Nonresponse in Gender Inequality Surveys: Challenges in Participation and Implications for Data Quality
Elena Ferrari1, Margherita Pellegrino2, Vera Lomazzi2, Flavia Pesce1
1ARS - Associazione per la ricerca sociale; 2University of Bergamo
Relevance & Research Question
Gender inequality in the labor market remains a significant issue. However, gathering data from a local, representative sample to understand the barriers employed women face in terms of career progression, pay equity, and work-life balance remains challenging. This research examines the difficulties of engaging companies in surveys on gender inequality and the implications for data quality. Understanding these challenges is crucial for advancing research methods and, in this specific case, for better understanding social inequality and improving future survey methodologies in sensitive areas.
Methods & Data
A representative sample of 1,400 companies, each employing over 50 workers, from three Italian provinces (Milan, Bergamo, and Brescia) was contacted to participate in the survey, which explored company policies on gender inclusion, work-life balance, career development, and diversity. Challenges related to corporate nonresponse, such as inaccurate contact information, email deliverability problems (e.g., emails marked as spam), and hesitancy to participate, were systematically documented and analyzed. Several strategies were applied to increase the number of responses. Our research offers valuable insights into how to increase companies' response rates.
Results
We will present our findings on how different strategies for involving companies affect response rates. We will begin by discussing the initial recruitment attempt and the subsequent strategies employed to increase participation. Finally, we will examine the advantages of this type of research, the complexities inherent in the methodology, and strategies to overcome these challenges.
Added Value
This study provides critical perspectives on the complexities of corporate engagement in gender inequality research. It highlights the importance of overcoming participation barriers to obtain a representative sample. By identifying strategies to improve company engagement, this research can inform future studies and help develop more effective methods to engage local samples and obtain representative results.
Reasons for participating in (non)probability online panels and how to ask about it
Tanja Kunz, Irina Bauer
GESIS – Leibniz Institute for the Social Sciences, Germany
Relevance & Research Question
Knowing why respondents join online panels helps researchers design more engaging recruitment strategies, especially for hard-to-reach populations. Understanding motivations also supports retention strategies by aligning the panel experience with participants' expectations. This study explores the reasons why individuals join online panels and, beyond that, assesses how different question designs (closed versus open-ended formats asking for single versus multiple responses) affect the measurement of participation motives, and whether these motives differ between nonprobability and probability-based panels.
Methods & Data
The study was conducted using data from two types of online panels: a probability panel recruited via random sampling and a nonprobability panel relying on self-selection. Respondents (N = 19,876 in the probability panel and N = 3,256 in the nonprobability panel) were randomly assigned to one of four experimental groups: a closed question asking for a single response, a closed question asking for multiple responses, an open-ended question asking for a single reason, and an open-ended question asking for multiple reasons. Responses were analyzed to assess differences in the diversity, ranking, and completeness of reasons reported across question formats and panel types.
Results
Preliminary findings suggest that the most important reason for participating and the prioritization of influencing factors differ between probability and nonprobability panelists. In addition, open-ended questions elicit slightly more diverse responses, but at the cost of drastically higher nonresponse rates. The different question formats also lead to sometimes striking differences in the ranking of reasons.
Added Value
This study sheds light on the reasons for participation in online panels and highlights the methodological implications of question design.
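A hypothetical sketch of the four-group assignment described above, in Python with numpy and pandas; the assignment mechanics and the toy nonresponse flag are assumptions, not the study's data.

```python
# Toy illustration of randomly assigning panelists to the four
# question formats and comparing item nonresponse across groups.
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)
formats = ["closed/single", "closed/multiple",
           "open/single", "open/multiple"]

n = 1000  # stand-in sample size
panel = pd.DataFrame({
    "format":   rng.choice(formats, size=n),   # random assignment
    "answered": rng.random(n) > 0.2,           # toy indicator of answering
})

# Share answering the motivation question, by experimental group.
print(panel.groupby("format")["answered"].mean())
```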
A deeper look into education bias in web surveys
Mustafa Coban, Christine Distler, Mark Trappmann
Institute for Employment Research (IAB), Germany
Relevance & Research Question
The COVID-19 pandemic has accelerated a trend in survey research toward online data collection for general population samples. High-quality web surveys have recently achieved response rates comparable to or even exceeding those of telephone surveys. However, selection bias with respect to education is often more pronounced. To address this issue, we analyze complete employment biographies of both respondents and nonrespondents and focus on three main research questions: (1) How do the different stages of the recruitment process for an online panel contribute to education bias? (2) Are there specific subgroups within the low-educated population who are even less likely to participate? (3) Are there interaction effects between education and other predictors of nonresponse?
Methods & Data
In 2023, the Institute for Employment Research in Germany launched a new online panel survey of the German workforce (IAB-OPAL) using a push-to-web approach. Addresses were sampled from a rich administrative database comprising compulsory social insurance notifications by employers as well as unemployment insurance and welfare benefit records. We utilize this unique opportunity of a sampling frame containing detailed individual-level information on complete employment biographies.
Results
Our findings indicate that education bias accumulates at every stage of the recruitment process. Unit nonresponse is more pronounced among individuals with lower education levels, particularly among those aged 50 and older and among foreign nationals. Additionally, nationality appears to have a greater impact among highly educated individuals, and women are less likely to participate unless they hold advanced degrees.
Added Value
Using a detailed sampling frame that includes individual-level information from complete employment histories enables us to evaluate how education bias emerges throughout the recruitment process. It also allows us to determine whether response propensities within different educational strata vary with typically unobserved factors, such as benefit receipt, occupation, or wages.
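A hypothetical sketch of how stage-wise education bias could be quantified with such a frame; the stage flags, toy data, and bias measure are illustrative assumptions, not the IAB-OPAL data.

```python
# Track the share of low-educated persons from the sampling frame
# through each recruitment stage and report the relative bias.
import pandas as pd

frame = pd.DataFrame({
    "low_educ":  [1, 1, 0, 0, 1, 0, 0, 1],
    "invited":   [1, 1, 1, 1, 1, 1, 1, 1],
    "recruited": [0, 1, 1, 1, 0, 1, 1, 0],
    "responded": [0, 0, 1, 1, 0, 1, 0, 0],
})

benchmark = frame["low_educ"].mean()  # frame share of low-educated
for stage in ["invited", "recruited", "responded"]:
    share = frame.loc[frame[stage] == 1, "low_educ"].mean()
    print(f"{stage}: relative bias {(share - benchmark) / benchmark:+.2f}")
```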