ISCB46 Conference Agenda

Overview and details of the sessions of this conference.

Session Overview
Session: Causal inference: Mixed topics
Time: Wednesday, 27/Aug/2025, 2:00pm - 3:30pm
Location: Biozentrum U1.101 (Biozentrum, 122 seats)

Presentations
39-causal-inf-mixed-top: 1

Establishing when to use causal machine learning for conditional average treatment effect estimation in randomised controlled trials using simulation

Eleanor Van Vogt1, Suzie Cro1, Karla Diaz-Ordaz2

1Imperial College London, United Kingdom; 2University College London, United Kingdom

Background: Randomised controlled trials (RCTs) typically focus on estimating the average treatment effect (ATE), which often results in null conclusions. However, where heterogeneous treatment effects (HTEs) are of interest, the factors responsible for variation must be pre-specified. There is growing interest in exploring HTEs in the context of personalised treatment regimens and policy decisions, and causal machine learning methods for HTEs are increasing in popularity. They offer flexible tools for exploring HTEs across many covariates, learning the conditional average treatment effect (CATE) without the need for pre-specification.

Current usage of these methods is restricted to exploring HTEs and generating hypotheses for validation in a future dataset. Open questions relate to the sample sizes required to obtain valid CATEs and the impact of missing covariate information on the resulting CATEs.

Methods: We conduct a simulation study to compare several causal machine learning candidates and classical subgroup detection methods across simple and complex HTE scenarios with varying sample sizes and missing data mechanisms. We consider binary and continuous outcomes and the handling of competing events. We explore bias, coverage of HTE estimates, and error rates for global heterogeneity tests.
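As an indication of what one arm of such a simulation might look like, the sketch below (not the authors' code; the data-generating values and the choice of a simple random-forest T-learner are assumptions made purely for illustration) simulates an RCT with a known heterogeneous effect and estimates the CATE by fitting a separate outcome model per arm.

```python
# Illustration only: simulate an RCT with a known heterogeneous effect and
# estimate the CATE with a simple T-learner (one random forest per arm).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n, p = 2000, 5
X = rng.normal(size=(n, p))
A = rng.binomial(1, 0.5, size=n)                  # 1:1 randomisation
true_cate = 1.0 + 2.0 * (X[:, 0] > 0)             # effect differs by an X1 subgroup
Y = X[:, 1] + true_cate * A + rng.normal(size=n)  # continuous outcome

# T-learner: fit an outcome model within each arm, take the difference of predictions.
m1 = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[A == 1], Y[A == 1])
m0 = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[A == 0], Y[A == 0])
cate_hat = m1.predict(X) - m0.predict(X)

print("mean absolute error of the CATE estimates:", round(np.abs(cate_hat - true_cate).mean(), 3))
```

A full study along the lines described above would repeat this over many replications, sample sizes, missingness mechanisms, and candidate estimators, and record bias, coverage, and heterogeneity-test error rates.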

Informed by minimum sample size requirements from our simulation and results from heterogeneity testing, we additionally simulate scenarios where HTE hypotheses are generated during an interim analysis of an RCT and then validated on the later recruited participants. Large RCTs could potentially use this approach to generate and validate subgroup findings and provide treatment recommendations.

Expected Results and Discussion: By addressing challenges such as minimum sample size requirements and missing data handling, the simulation results presented will give researchers a framework for deciding whether causal machine learning methods are suitable for the RCT datasets at their disposal. Further, the proposed interim analysis framework has the potential to enhance RCT utility, enabling real-time hypothesis generation and validation for personalised, evidence-driven treatment.



39-causal-inf-mixed-top: 2

Variable selection in causal survival analysis

Charlotte Voinot1,2, Julie Josse1, Bernard Sebastien2

1Premedical, INRIA, France; 2Sanofi R&D, France

Background: In classical causal inference, it is well established that instrumental variables should not be included in adjustment models, whereas precision variables should be included because they reduce variance, even when employing weighting estimators such as inverse probability weighting (IPW). However, no analogous guidelines exist in causal survival analysis, where numerous estimators rely on different nuisance models, yet variable selection strategies remain largely unexamined. Given the various estimators available for the restricted mean survival time (RMST), understanding the role of different types of variables, including precision variables, instrumental variables, and censoring-related variables, is crucial for improving estimation efficiency.
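The classical result referred to above can be reproduced with a small Monte Carlo; the data-generating values below are assumptions chosen purely for illustration and are not taken from this work.

```python
# Toy Monte Carlo (illustration only): how a precision variable and an
# instrument affect the variance of an IPW estimator of the ATE.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
expit = lambda x: 1.0 / (1.0 + np.exp(-x))

def ipw_ate(X_ps, A, Y):
    """IPW ATE estimate with a logistic propensity model on the columns of X_ps."""
    ps = LogisticRegression().fit(X_ps, A).predict_proba(X_ps)[:, 1]
    return np.mean(A * Y / ps) - np.mean((1 - A) * Y / (1 - ps))

n, n_sim = 1000, 500
results = {"L only": [], "L + precision": [], "L + instrument": []}
for _ in range(n_sim):
    L = rng.binomial(1, 0.5, n)          # confounder (affects treatment and outcome)
    Z = rng.binomial(1, 0.5, n)          # instrument (affects treatment only)
    P = rng.normal(size=n)               # precision variable (affects outcome only)
    A = rng.binomial(1, expit(-1.0 + 1.0 * L + 1.5 * Z))
    Y = 1.0 * A + 1.0 * L + 2.0 * P + rng.normal(size=n)   # true ATE = 1
    results["L only"].append(ipw_ate(L[:, None], A, Y))
    results["L + precision"].append(ipw_ate(np.c_[L, P], A, Y))
    results["L + instrument"].append(ipw_ate(np.c_[L, Z], A, Y))

for name, est in results.items():   # precision variable lowers the SD, instrument raises it
    print(f"{name:15s}  mean = {np.mean(est):.3f}   sd = {np.std(est):.3f}")
```

All three propensity specifications control for the confounder, so all estimates are approximately unbiased; the comparison of interest is the Monte Carlo standard deviation across the three covariate sets.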

Methods: We estimate the RMST using different causal survival estimators, including the G-formula, IPTW-Buckley-James, IPTW-IPCW Kaplan-Meier, and the doubly/triply robust AIPTW-AIPCW, and analyze the impact of variable selection across the treatment, outcome, and censoring models. Our study assesses how different types of variables influence estimator variance: precision variables (affecting only the outcome), instrumental variables (affecting only treatment), and censoring-related variables. In particular, we examine the inclusion of variables that affect both censoring and outcome, which may differ from classical confounders. Variance calculations and simulations are conducted to evaluate the effects of these choices.
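For readers less familiar with the estimand: the RMST up to a horizon tau is the area under the survival curve on [0, tau]. A minimal sketch with hypothetical step-function inputs (not code from this work):

```python
# Minimal sketch (hypothetical inputs): RMST(tau) as the area under a
# step survival curve S(t) up to the horizon tau.
import numpy as np

def rmst(event_times, surv_probs, tau):
    """Area under the step function S(t) on [0, tau]; assumes sorted event times and S(t)=1 before the first jump."""
    t = np.clip(np.concatenate(([0.0], np.asarray(event_times, float), [tau])), 0.0, tau)
    s = np.concatenate(([1.0], np.asarray(surv_probs, float)))
    return float(np.sum(s * np.diff(t)))

# e.g. a Kaplan-Meier-type curve dropping at t = 1, 2, 4 with tau = 5
print(rmst([1, 2, 4], [0.9, 0.7, 0.5], tau=5))   # 1*1 + 0.9*1 + 0.7*2 + 0.5*1 = 3.8
```

The causal estimators listed above differ in which nuisance models (treatment, outcome, censoring) enter the weighted or augmented version of this quantity.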

Results: Our findings confirm that including precision variables in the outcome model improves estimator efficiency. Moreover, consistent with findings in classical causal inference, precision variables also provide benefits when incorporated into the treatment model, even in the context of weighting estimators. Similarly, instrumental variables increase variance when included in the treatment model, again in line with classical causal inference and reinforcing the need for careful variable selection. Regarding censoring-related variables, we find that those affecting both censoring and the outcome improve precision when included in the outcome model, whereas variables solely related to censoring increase variance and should not be included in the censoring model.

Conclusion: This study provides practical recommendations for variable selection in causal survival analysis. Our results highlight that precision variables should be included, while instrumental variables should be avoided. Additionally, we provide new insights specific to survival analysis: variables related only to censoring inflate variance when included in the censoring model, but variables influencing both censoring and the outcome enhance precision when included in the outcome model. These findings apply to parametric and semi-parametric models. However, in nonparametric approaches such as causal forests, the conclusions may be more nuanced. With finite sample sizes, including additional variables could introduce a bias-variance tradeoff, and the benefits observed in this study would likely hold only asymptotically.



39-causal-inf-mixed-top: 3

An Overlooked Stability Property of the Risk Ratio and Its Practical Implications

Marco Piccininni1, Mats J. Stensrud2

1Digital Health - Machine Learning Research Group, Hasso Plattner Institute for Digital Engineering, Potsdam, Germany; 2Chair of Biostatistics, Institute of Mathematics, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland

Risk ratios are widely used effect measures in empirical research, but their stability and transportability across populations remain debated. Here, we show that the causal risk ratio is stable under selection based on immune status. For example, the causal risk ratio remains unchanged when individuals who cannot experience the outcome, regardless of treatment, are excluded from a study. We term this property "immune-selection stability" (ISS).
ISS applies broadly and generalizes previous findings on the stability of risk ratios. Furthermore, unlike earlier results, ISS does not rely on assumptions about cross-world counterfactuals. We also demonstrate an analogous property for survival ratios.
Despite decades of discussion on the properties of risk ratios, ISS has received little to no attention. However, its implications for interpreting, comparing, and transporting estimates across populations are considerable. We illustrate the practical relevance of ISS by discussing the results of hypothetical HIV trials.
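As a rough intuition for why such a property can hold (the notation and simplifications below are ours, not the authors' formal argument): write S for the non-immune stratum and q for the proportion of immune individuals, who by definition have Y^a = 0 under every treatment level a. Then for each arm the population risk factors through S:

```latex
P(Y^{a}=1) = P(Y^{a}=1 \mid S)\,(1-q)
\quad\Longrightarrow\quad
\frac{P(Y^{1}=1)}{P(Y^{0}=1)} = \frac{P(Y^{1}=1 \mid S)}{P(Y^{0}=1 \mid S)} ,
```

so the factor 1 - q cancels in the ratio, whereas a risk difference would be rescaled by 1 - q when immune individuals are excluded.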



39-causal-inf-mixed-top: 4

Introducing an open access simulated benchmarking data resource to enable assessment and neutral comparison of causal inference methods

Ruth Keogh1, Nan van Geloven2, Daniala Weir3

1London School of Hygiene and Tropical Medicine (LSHTM), United Kingdom; 2Leiden University Medical Center, Leiden, The Netherlands; 3Utrecht University, Utrecht, The Netherlands

Observational data, such as from electronic health records, provide opportunities to address questions about causal effects of interventions under certain assumptions, and there is an extensive and growing literature on causal inference methods that enable this, including methods that combine statistical and machine learning techniques. The increasing availability and complexity of causal inference methods raises challenges for researchers making decisions about which methods to use in practice. Firstly, it is important to make comparisons of methods and their suitability for addressing different causal questions, but there is a lack of detailed and neutral comparisons, particularly using the type of observational data commonly faced in practice. Secondly, there is a lack of openly accessible data that researchers can use to learn, and teach others, how to implement methods and assess their practical feasibility.

In this work we have developed a simulated data resource designed to mimic complex longitudinal observational data. The data resource is intended to enable comparisons of methods for addressing a range of different types of causal question, as well as helping researchers to learn new methods. The data is based on a case study concerning the choice of second-line treatments for people with type-2 diabetes. It mimics longitudinal data including time-dependent treatments, longitudinal covariates of different types, and time-to-event outcomes, including competing events.
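Purely as an illustration of that structure (the variable names below are invented for this sketch and are not the resource's actual columns), such data might be laid out in long format, one row per person-visit:

```python
# Hypothetical sketch only: long-format layout of the kind of data described above.
# Column names are invented for illustration; they are not the resource's variables.
import pandas as pd

example = pd.DataFrame({
    "id":          [1, 1, 1, 2, 2],
    "visit":       [0, 1, 2, 0, 1],
    "treatment":   ["drug_A", "drug_A", "drug_B", "drug_A", "drug_C"],  # time-dependent second-line treatment
    "biomarker":   [8.1, 7.6, 7.9, 9.0, 8.4],   # continuous longitudinal covariate
    "comorbidity": [0, 0, 1, 0, 0],             # binary longitudinal covariate
    "event_time":  [4.2, 4.2, 4.2, 1.8, 1.8],   # time-to-event outcome (time from baseline)
    "event_type":  [1, 1, 1, 2, 2],             # 1 = outcome of interest, 2 = competing event, 0 = censored
})
print(example)
```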

The data resource will be introduced, and its potential uses as a benchmarking data set, a template for a simulation study, and an educational tool will be discussed. Benchmarking data sets are real data sets to which different methods can be applied and compared, but they have the disadvantage that the data-generating mechanism is unknown. Simulated data sets have the advantage of a known data-generating mechanism, but they tend to be simpler than real data and have been criticised for being designed to favour some analysis methods over others. Our simulated data resource combines some of the benefits of real and simulated data by mimicking the complexities of real data while retaining the advantage that the data-generating process is known. The data-generating mechanism is designed so as not to favour particular analysis methods, thereby enabling more neutral comparison studies.

The use of the data will be illustrated with a comparison of causal inference methods for estimating treatment effects on time-to-event outcomes, including g-methods and doubly-robust methods incorporating machine learning techniques. The potential for further development of the resource by the community will also be discussed.



39-causal-inf-mixed-top: 5

Simulating data from marginal structural models for a survival time outcome

Shaun Seaman1, Ruth Keogh2

1University of Cambridge, United Kingdom; 2London School of Hygiene and Tropical Medicine (LSHTM), United Kingdom

Marginal structural models (MSMs) are often used to estimate causal effects of treatments on survival time outcomes from observational data when time-dependent confounding may be present. They can be fitted using, e.g., inverse probability of treatment weighting (IPTW). It is important to evaluate the performance of statistical methods in different scenarios, and simulation studies are a key tool for such evaluations. In such simulation studies, it is common to generate data in such a way that the model of interest is correctly specified, but this is not always straightforward when the model of interest is for potential outcomes, as is an MSM. Methods have been proposed for simulating from MSMs for a survival outcome, but these methods impose restrictions on the data-generating mechanism. Here we propose a method that overcomes these restrictions. The MSM can be, for example, a marginal structural logistic model for a discrete survival time or a Cox or additive hazards MSM for a continuous survival time. The hazard of the potential survival time can be conditional on baseline covariates, and the treatment variable can be discrete or continuous. We illustrate the use of the proposed simulation algorithm by carrying out a brief simulation study. This study compares the coverage of confidence intervals calculated in two different ways for causal effect estimates obtained by fitting an MSM via IPTW.
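For orientation, the model classes named in the abstract can be written in a standard form (the notation below is assumed for illustration, not taken from the paper), shown here for a point treatment a and baseline covariates V; in the time-dependent setting a is replaced by a treatment history:

```latex
\text{Logistic MSM, discrete time } k:\quad
  \operatorname{logit} P\big(T^{a}=k \mid T^{a}\ge k,\, V\big) = \alpha_{k} + \psi a + \gamma^{\top}V \\[4pt]
\text{Cox MSM:}\quad
  \lambda_{T^{a}}(t \mid V) = \lambda_{0}(t)\,\exp\big(\psi a + \gamma^{\top}V\big) \\[4pt]
\text{Additive hazards MSM:}\quad
  \lambda_{T^{a}}(t \mid V) = \lambda_{0}(t) + \psi a + \gamma^{\top}V
```

In each case the parameter psi encodes the causal effect of interest, and fitting via IPTW amounts to the corresponding weighted regression of the observed data; the contribution described above concerns simulating data in which the chosen MSM is exactly correctly specified.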



 