ISCB46 Conference Agenda

Overview and details of the sessions of this conference.

Session Overview
Session: Causal inference: Mixed topics
Time: Wednesday, 27/Aug/2025, 2:00pm - 3:30pm
Location: Biozentrum U1.101 (Biozentrum, 122 seats)

Presentations
39-causal-inf-mixed-top: 1

Establishing when to use causal machine learning for conditional average treatment effect estimation in randomised controlled trials using simulation

Eleanor Van Vogt1, Suzie Cro1, Karla Diaz-Ordaz2

1Imperial College London, United Kingdom; 2University College London, United Kingdom

Background: Randomised controlled trials (RCTs) typically focus on estimating the average treatment effect (ATE), which often results in null conclusions. However, where heterogeneous treatment effects (HTEs) are of interest, the factors responsible for variation must be pre-specified. There is growing interest in exploring HTEs in the context of personalised treatment regimens and policy decisions, and causal machine learning methods for HTEs are increasing in popularity. They offer flexible tools for exploring HTEs across many covariates, learning the conditional average treatment effect (CATE) without the need for pre-specification.

Current usage of these methods is restricted to exploring HTEs and generating hypotheses for validation in a future dataset. Open questions relate to the sample sizes required to obtain valid CATEs and the impact of missing covariate information on the resulting CATEs.

Methods: We conduct a simulation study to compare several causal machine learning candidates and classical subgroup detection methods across simple and complex HTE scenarios with varying sample sizes and missing data mechanisms. We consider binary and continuous outcomes and the handling of competing events. We explore bias, coverage of HTE estimates, and error rates for global heterogeneity tests.
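As an indication of what one arm of such a simulation might look like, the sketch below (not the authors' code; the data-generating values and the choice of a simple random-forest T-learner are assumptions made purely for illustration) simulates an RCT with a known heterogeneous effect and estimates the CATE by fitting a separate outcome model per arm.

```python
# Illustration only: simulate an RCT with a known heterogeneous effect and
# estimate the CATE with a simple T-learner (one random forest per arm).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n, p = 2000, 5
X = rng.normal(size=(n, p))
A = rng.binomial(1, 0.5, size=n)                  # 1:1 randomisation
true_cate = 1.0 + 2.0 * (X[:, 0] > 0)             # effect differs by an X1 subgroup
Y = X[:, 1] + true_cate * A + rng.normal(size=n)  # continuous outcome

# T-learner: fit an outcome model within each arm, take the difference of predictions.
m1 = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[A == 1], Y[A == 1])
m0 = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[A == 0], Y[A == 0])
cate_hat = m1.predict(X) - m0.predict(X)

print("mean absolute error of the CATE estimates:", round(np.abs(cate_hat - true_cate).mean(), 3))
```

A full study along the lines described above would repeat this over many replications, sample sizes, missingness mechanisms, and candidate estimators, and record bias, coverage, and heterogeneity-test error rates.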

Informed by minimum sample size requirements from our simulation and results from heterogeneity testing, we additionally simulate scenarios where HTE hypotheses are generated during an interim analysis of an RCT and then validated on the later recruited participants. Large RCTs could potentially use this approach to generate and validate subgroup findings and provide treatment recommendations.

Expected Results and Discussion: By addressing challenges such as minimum sample size requirements and missing data handling, the simulation results presented will give researchers a framework for deciding whether causal machine learning methods are suitable for the RCT datasets at their disposal. Further, the proposed interim analysis framework has the potential to enhance RCT utility, enabling real-time hypothesis generation and validation for personalised, evidence-driven treatment.



39-causal-inf-mixed-top: 2

Variable selection in causal survival analysis

Charlotte Voinot1,2, Julie Josse1, Bernard Sebastien2

1Premedical, INRIA, France; 2Sanofi R&D, France

Background: In classical causal inference, it is well established that instrumental variables should not be included in adjustment models, whereas precision variables should be included because they reduce variance, even when employing weighting estimators such as inverse probability weighting (IPW). However, no analogous guidelines exist in causal survival analysis, where numerous estimators rely on different nuisance models, yet variable selection strategies remain largely unexamined. Given the various estimators available for the restricted mean survival time (RMST), understanding the role of different types of variables, including precision variables, instrumental variables, and censoring-related variables, is crucial for improving estimation efficiency.
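The classical result referred to above can be reproduced with a small Monte Carlo; the data-generating values below are assumptions chosen purely for illustration and are not taken from this work.

```python
# Toy Monte Carlo (illustration only): how a precision variable and an
# instrument affect the variance of an IPW estimator of the ATE.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
expit = lambda x: 1.0 / (1.0 + np.exp(-x))

def ipw_ate(X_ps, A, Y):
    """IPW ATE estimate with a logistic propensity model on the columns of X_ps."""
    ps = LogisticRegression().fit(X_ps, A).predict_proba(X_ps)[:, 1]
    return np.mean(A * Y / ps) - np.mean((1 - A) * Y / (1 - ps))

n, n_sim = 1000, 500
results = {"L only": [], "L + precision": [], "L + instrument": []}
for _ in range(n_sim):
    L = rng.binomial(1, 0.5, n)          # confounder (affects treatment and outcome)
    Z = rng.binomial(1, 0.5, n)          # instrument (affects treatment only)
    P = rng.normal(size=n)               # precision variable (affects outcome only)
    A = rng.binomial(1, expit(-1.0 + 1.0 * L + 1.5 * Z))
    Y = 1.0 * A + 1.0 * L + 2.0 * P + rng.normal(size=n)   # true ATE = 1
    results["L only"].append(ipw_ate(L[:, None], A, Y))
    results["L + precision"].append(ipw_ate(np.c_[L, P], A, Y))
    results["L + instrument"].append(ipw_ate(np.c_[L, Z], A, Y))

for name, est in results.items():   # precision variable lowers the SD, instrument raises it
    print(f"{name:15s}  mean = {np.mean(est):.3f}   sd = {np.std(est):.3f}")
```

All three propensity specifications control for the confounder, so all estimates are approximately unbiased; the comparison of interest is the Monte Carlo standard deviation across the three covariate sets.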

Methods: We estimate the RMST using different causal survival estimators, including the G-formula, IPTW-Buckley-James, IPTW-IPCW Kaplan-Meier, and the doubly/triply robust AIPTW-AIPCW, and analyze the impact of variable selection across the treatment, outcome, and censoring models. Our study assesses how different types of variables influence estimator variance: precision variables (affecting only the outcome), instrumental variables (affecting only treatment), and censoring-related variables. In particular, we examine the inclusion of variables that affect both censoring and outcome, which may differ from classical confounders. Variance calculations and simulations are conducted to evaluate the effects of these choices.
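For readers less familiar with the estimand: the RMST up to a horizon tau is the area under the survival curve on [0, tau]. A minimal sketch with hypothetical step-function inputs (not code from this work):

```python
# Minimal sketch (hypothetical inputs): RMST(tau) as the area under a
# step survival curve S(t) up to the horizon tau.
import numpy as np

def rmst(event_times, surv_probs, tau):
    """Area under the step function S(t) on [0, tau]; assumes sorted event times and S(t)=1 before the first jump."""
    t = np.clip(np.concatenate(([0.0], np.asarray(event_times, float), [tau])), 0.0, tau)
    s = np.concatenate(([1.0], np.asarray(surv_probs, float)))
    return float(np.sum(s * np.diff(t)))

# e.g. a Kaplan-Meier-type curve dropping at t = 1, 2, 4 with tau = 5
print(rmst([1, 2, 4], [0.9, 0.7, 0.5], tau=5))   # 1*1 + 0.9*1 + 0.7*2 + 0.5*1 = 3.8
```

The causal estimators listed above differ in which nuisance models (treatment, outcome, censoring) enter the weighted or augmented version of this quantity.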

Results: Our findings confirm that including precision variables in the outcome model improves estimator efficiency. Moreover, consistent with findings in classical causal inference, precision variables also provide benefits when incorporated into the treatment model, even in the context of weighting estimators. Similarly, instrumental variables increase variance when included in the treatment model, again in line with classical causal inference and reinforcing the need for careful variable selection. Regarding censoring-related variables, we find that those affecting both censoring and the outcome improve precision when included in the outcome model, whereas variables solely related to censoring increase variance and should not be included in the censoring model.

Conclusion: This study provides practical recommendations for variable selection in causal survival analysis. Our results highlight that precision variables should be included, while instrumental variables should be avoided. Additionally, we provide new insights specific to survival analysis: variables related only to censoring inflate variance when included in the censoring model, but variables influencing both censoring and the outcome enhance precision when included in the outcome model. These findings apply to parametric and semi-parametric models. However, in nonparametric approaches such as causal forests, the conclusions may be more nuanced. With finite sample sizes, including additional variables could introduce a bias-variance tradeoff, and the benefits observed in this study would likely hold only asymptotically.



39-causal-inf-mixed-top: 3

An Overlooked Stability Property of the Risk Ratio and Its Practical Implications

Marco Piccininni1, Mats J. Stensrud2

1Digital Health - Machine Learning Research Group, Hasso Plattner Institute for Digital Engineering, Potsdam, Germany; 2Chair of Biostatistics, Institute of Mathematics, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland

Risk ratios are widely used effect measures in empirical research, but their stability and transportability across populations remain debated. Here, we show that the causal risk ratio is stable under selection based on immune status. For example, the causal risk ratio remains unchanged when individuals who cannot experience the outcome, regardless of treatment, are excluded from a study. We term this property "immune-selection stability" (ISS).
ISS applies broadly and generalizes previous findings on the stability of risk ratios. Furthermore, unlike earlier results, ISS does not rely on assumptions about cross-world counterfactuals. We also demonstrate an analogous property for survival ratios.
Despite decades of discussion on the properties of risk ratios, ISS has received little to no attention. However, its implications for interpreting, comparing, and transporting estimates across populations are considerable. We illustrate the practical relevance of ISS by discussing the results of hypothetical HIV trials.
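As a rough intuition for why such a property can hold (the notation and simplifications below are ours, not the authors' formal argument): write S for the non-immune stratum and q for the proportion of immune individuals, who by definition have Y^a = 0 under every treatment level a. Then for each arm the population risk factors through S:

```latex
P(Y^{a}=1) = P(Y^{a}=1 \mid S)\,(1-q)
\quad\Longrightarrow\quad
\frac{P(Y^{1}=1)}{P(Y^{0}=1)} = \frac{P(Y^{1}=1 \mid S)}{P(Y^{0}=1 \mid S)} ,
```

so the factor 1 - q cancels in the ratio, whereas a risk difference would be rescaled by 1 - q when immune individuals are excluded.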



39-causal-inf-mixed-top: 4

Introducing an open access simulated benchmarking data resource to enable assessment and neutral comparison of causal inference methods

Ruth Keogh1, Nan van Geloven2, Daniala Weir3

1London School of Hygiene and Tropical Medicine (LSHTM), United Kingdom; 2Leiden University Medical Center, Leiden, The Netherlands; 3Utrecht University, Utrecht, The Netherlands

Observational data, such as from electronic health records, provide opportunities to address questions about causal effects of interventions under certain assumptions, and there is an extensive and growing literature on causal inference methods that enable this, including methods that combine statistical and machine learning techniques. The increasing availability and complexity of causal inference methods raises challenges for researchers making decisions about which methods to use in practice. Firstly, it is important to make comparisons of methods and their suitability for addressing different causal questions, but there is a lack of detailed and neutral comparisons, particularly using the type of observational data commonly faced in practice. Secondly, there is a lack of openly accessible data that researchers can use to learn, and teach others, how to implement methods and assess their practical feasibility.

In this work we have developed a simulated data resource designed to mimic complex longitudinal observational data. The data resource is intended to enable comparisons of methods for addressing a range of different types of causal question, as well as helping researchers to learn new methods. The data is based on a case study concerning the choice of second-line treatments for people with type-2 diabetes. It mimics longitudinal data including time-dependent treatments, longitudinal covariates of different types, and time-to-event outcomes, including competing events.
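Purely as an illustration of that structure (the variable names below are invented for this sketch and are not the resource's actual columns), such data might be laid out in long format, one row per person-visit:

```python
# Hypothetical sketch only: long-format layout of the kind of data described above.
# Column names are invented for illustration; they are not the resource's variables.
import pandas as pd

example = pd.DataFrame({
    "id":          [1, 1, 1, 2, 2],
    "visit":       [0, 1, 2, 0, 1],
    "treatment":   ["drug_A", "drug_A", "drug_B", "drug_A", "drug_C"],  # time-dependent second-line treatment
    "biomarker":   [8.1, 7.6, 7.9, 9.0, 8.4],   # continuous longitudinal covariate
    "comorbidity": [0, 0, 1, 0, 0],             # binary longitudinal covariate
    "event_time":  [4.2, 4.2, 4.2, 1.8, 1.8],   # time-to-event outcome (time from baseline)
    "event_type":  [1, 1, 1, 2, 2],             # 1 = outcome of interest, 2 = competing event, 0 = censored
})
print(example)
```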

The data resource will be introduced, and its potential uses as a benchmarking data set, a template for a simulation study, and an educational tool will be discussed. Benchmarking data sets are real data sets to which different methods can be applied and compared, but they have the disadvantage that the data-generating mechanism is unknown. Simulated data sets have the advantage of a known data-generating mechanism, but they tend to be simpler than real data and have been criticised for being designed to favour some analysis methods over others. Our simulated data resource combines some of the benefits of real and simulated data by mimicking the complexities of real data while retaining the advantage that the data-generating process is known. The data-generating mechanism is designed so as not to favour particular analysis methods, thereby enabling more neutral comparison studies.

The use of the data will be illustrated with a comparison of causal inference methods for estimating treatment effects on time-to-event outcomes, including g-methods and doubly-robust methods incorporating machine learning techniques. The potential for further development of the resource by the community will also be discussed.



39-causal-inf-mixed-top: 5

Simulating data from marginal structural models for a survival time outcome

Shaun Seaman1, Ruth Keogh2

1University of Cambridge, United Kingdom; 2London School of Hygiene and Tropical Medicine (LSHTM), United Kingdom

Marginal structural models (MSMs) are often used to estimate causal effects of treatments on survival time outcomes from observational data when time-dependent confounding may be present. They can be fitted using, e.g., inverse probability of treatment weighting (IPTW). It is important to evaluate the performance of statistical methods in different scenarios, and simulation studies are a key tool for such evaluations. In such simulation studies, it is common to generate data in such a way that the model of interest is correctly specified, but this is not always straightforward when the model of interest is for potential outcomes, as is an MSM. Methods have been proposed for simulating from MSMs for a survival outcome, but these methods impose restrictions on the data-generating mechanism. Here we propose a method that overcomes these restrictions. The MSM can be, for example, a marginal structural logistic model for a discrete survival time or a Cox or additive hazards MSM for a continuous survival time. The hazard of the potential survival time can be conditional on baseline covariates, and the treatment variable can be discrete or continuous. We illustrate the use of the proposed simulation algorithm by carrying out a brief simulation study. This study compares the coverage of confidence intervals calculated in two different ways for causal effect estimates obtained by fitting an MSM via IPTW.
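For orientation, the model classes named in the abstract can be written in a standard form (the notation below is assumed for illustration, not taken from the paper), shown here for a point treatment a and baseline covariates V; in the time-dependent setting a is replaced by a treatment history:

```latex
\text{Logistic MSM, discrete time } k:\quad
  \operatorname{logit} P\big(T^{a}=k \mid T^{a}\ge k,\, V\big) = \alpha_{k} + \psi a + \gamma^{\top}V \\[4pt]
\text{Cox MSM:}\quad
  \lambda_{T^{a}}(t \mid V) = \lambda_{0}(t)\,\exp\big(\psi a + \gamma^{\top}V\big) \\[4pt]
\text{Additive hazards MSM:}\quad
  \lambda_{T^{a}}(t \mid V) = \lambda_{0}(t) + \psi a + \gamma^{\top}V
```

In each case the parameter psi encodes the causal effect of interest, and fitting via IPTW amounts to the corresponding weighted regression of the observed data; the contribution described above concerns simulating data in which the chosen MSM is exactly correctly specified.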



 