Conference Agenda

Overview and details of the sessions of this conference. Please select a date or location to show only sessions held on that day or at that location. Please select a single session for a detailed view (with abstracts and downloads, if available).

 
 
Session Overview
Session: Machine learning 1
Time: Tuesday, 26/Aug/2025, 9:15am - 10:45am
Location: Biozentrum U1.101 (Biozentrum, 122 seats)

Presentations
21-machine-learning-1: 1

Causal machine learning methods for dynamic and static treatment strategies deprescribing medications in a polypharmacy population using electronic health records.

Maurice M O'Connell1, Michael Abaho2, Aseel S Abuzour3, Matin Ahmed1, Saiqa Ahmed5, Asra Aslam3, Danushka Bollegala2, Iain Buchan2, Harriet Cant1, Andrew Clegg3, Mark Gabbay2, Alan Griffiths5, Layik Hama3, Francine Jury1, Gary Leeming2, Emma Lo2, Frances S Mair4, Simon Maskell2, Erin McCloskey2, Olusegun Popoola2, Samuel Relton3, Roy A Ruddle3, Pieta Schofield2, Eduard Shantsila2, Tjeerd Van Staa1, Lauren E Walker2, Samantha A Wilson2, Alan A Woodall2, Rachael Wright2, Matthew Sperrin1

1University of Manchester; 2University of Liverpool; 3University of Leeds; 4University of Glasgow; 5Public and Patient Partner

Introduction

Despite recent advances, limited evaluation and guidance are available on the implementation of causal inference in the area of polypharmacy where high-dimensional confounding and medication interactions are present. The DynAIRx project (Artificial Intelligence for dynamic prescribing optimisation and care integration in multimorbidity) aims to develop statistical tools supporting GPs and pharmacists to find patients living with multimorbidity and polypharmacy who might be offered a better combination of medicines. We estimate treatment effects of discontinuing medications in a polypharmacy population.

Methods

Within a polypharmacy cohort, we estimate the average causal effects of both time-fixed and time-varying treatment strategies, comparing deprescribing versus continuing specific medications as advised by expert clinicians, e.g., individuals stopping antiplatelets, anticoagulants, neither, or both, and the corresponding risks of stroke, bleeding and death. We conducted a detailed causal elicitation process with expert clinicians to draw causal diagrams, pre-specify dynamic treatment strategies and identify all important variables available from electronic health records (EHRs).

We emulate target trials (using both sequential target trials and landmarking approaches) to estimate the average effects of treatment strategies using EHRs from the Clinical Practice Research Datalink (CPRD). We target different causal estimands (e.g., total, direct, or separable effects) to allow for competing events, estimated from parametric pooled-over-time logistic models using g-methods. Negative control outcomes are used to check robustness.
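To make the modelling step concrete, the following is a minimal sketch of a pooled-over-time (discrete-time hazard) logistic model combined with g-formula standardisation for a simple time-fixed strategy with baseline confounders only. It is illustrative rather than the DynAIRx pipeline: all file and column names are hypothetical, and a full analysis would additionally handle censoring, competing events and time-varying confounding via g-methods.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Person-period ("long") data: one row per patient per interval, with an event
# indicator and a deprescribing indicator (hypothetical file and column names).
pp = pd.read_csv("person_period.csv").sort_values(["patient_id", "interval"])

# Pooled-over-time logistic model for the discrete-time hazard of the outcome.
hazard_fit = smf.logit(
    "event ~ deprescribed + C(interval) + age + n_medications + frailty_score",
    data=pp,
).fit()

def standardised_risk(strategy: int, horizon: int) -> float:
    """Parametric g-formula: set the treatment column according to the strategy,
    predict each patient's interval-specific hazards, convert them to a
    cumulative risk, and average over patients."""
    risks = []
    for _, patient in pp.groupby("patient_id"):
        rows = patient.iloc[:horizon].copy()
        rows["deprescribed"] = strategy            # intervene on the treatment column
        hazards = hazard_fit.predict(rows)
        risks.append(1 - np.prod(1 - hazards))     # cumulative incidence over the horizon
    return float(np.mean(risks))

# Standardised risk difference: "deprescribe from baseline" versus "continue".
print(standardised_risk(1, horizon=12) - standardised_risk(0, horizon=12))
```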

Causal machine learning is used to semi-parametrically include a larger combination of medications and interactions than included in our expert-elicited causal diagram, e.g., targeted maximum likelihood estimators (TMLE), augmented inverse probability weighting (AIPW) with data-adaptive approaches, cross-fitting, and super learner ensemble learning.
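As a rough illustration of the cross-fitted, doubly robust estimators referred to here, the sketch below computes an AIPW estimate of the average effect of a binary deprescribing indicator. It uses a single gradient-boosting learner where a super learner ensemble would be used in practice, and `X`, `A` and `Y` are assumed generic numpy arrays rather than DynAIRx variables.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.model_selection import KFold

def aipw_cross_fit(X, A, Y, n_splits=5, seed=0):
    """Cross-fitted AIPW estimate of E[Y(1) - Y(0)] for a binary treatment A."""
    psi = np.zeros(len(Y))
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        # Nuisance models are fit on the training folds only (cross-fitting).
        ps = GradientBoostingClassifier().fit(X[train], A[train])
        m1 = GradientBoostingRegressor().fit(X[train][A[train] == 1], Y[train][A[train] == 1])
        m0 = GradientBoostingRegressor().fit(X[train][A[train] == 0], Y[train][A[train] == 0])
        e = np.clip(ps.predict_proba(X[test])[:, 1], 0.01, 0.99)
        mu1, mu0 = m1.predict(X[test]), m0.predict(X[test])
        a, y = A[test], Y[test]
        # AIPW pseudo-outcome (efficient-influence-function based).
        psi[test] = mu1 - mu0 + a * (y - mu1) / e - (1 - a) * (y - mu0) / (1 - e)
    estimate = psi.mean()
    std_error = psi.std(ddof=1) / np.sqrt(len(Y))
    return estimate, std_error
```

Regressing the same pseudo-outcome `psi` on baseline covariates, rather than averaging it, is essentially the DR-learner route to the heterogeneous effects discussed next.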

We plan to estimate individualised heterogeneous treatment effects (conditional average treatment effects over the smallest subgroups that can be supported by the data), e.g., R-, S-, T-, U-, X-, RS- and DR-learners, evaluated with impact fraction rank-weighting metrics. When selecting patients for medication review, we can then prioritise those at highest predicted risk of harm from their current treatment strategy, or with the greatest predicted benefit from a change in strategy.

Results

We aim to present interim results from a large CPRD database consisting of millions of EHRs.

Discussion

How do we give better advice in medication reviews to people with multimorbidity and polypharmacy, who have traditionally been excluded from clinical trials? In clinical practice, these prescribing decisions have been made many times and are recorded in EHRs. DynAIRx combines causal AI, guidelines and computing power linked to EHRs and, where possible, randomised controlled trials to estimate these complex causal effects.



21-machine-learning-1: 2

Overview and practical recommendations on using Shapley Values for identifying predictive biomarkers via CATE modeling

David Svensson1, Erik Hermansson1, Konstantinos Sechidis2, Nikos Nikolaou3, Ilya Lipkovich4

1AstraZeneca, Sweden; 2Novartis, Switzerland; 3UCL, London; 4Eli Lilly and Company, USA

Background / Introduction: In recent years, two parallel research trends have emerged in machine learning: the modeling of Individual Treatment Effects, particularly the Conditional Average Treatment Effect (CATE) using meta-learner techniques [1], and the field of Explainable Machine Learning (XML). While CATE modeling aims to identify causal effects from observational data, XML focuses on making complex models more interpretable, with Shapley Additive Explanations (SHAP) being a prominent technique [2]. Despite SHAP's popularity in supervised learning, its application in identifying predictive biomarkers through CATE models remains underexplored, especially in pharmaceutical precision medicine.

Methods: We address the inherent challenges of applying SHAP in multi-stage CATE strategies by introducing an approach that is agnostic to the choice of CATE strategy, effectively reducing computational burdens in high-dimensional data. Our method involves a secondary modeling step after estimating individual treatment effects, regressing the estimated CATE against baseline covariates using a boosting model, from which SHAP importance is derived for each covariate.
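A hedged sketch of this two-step workflow, on simulated data standing in for step 1 (the fitted CATE meta-learner): the estimated CATE is distilled into a boosting surrogate, and SHAP importances of the surrogate are used to rank candidate predictive biomarkers. All variable names and tuning choices are illustrative.

```python
import numpy as np
import pandas as pd
import shap
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
n, p = 2000, 10
X = pd.DataFrame(rng.normal(size=(n, p)), columns=[f"x{j}" for j in range(p)])
# Stand-in for step 1: pretend a CATE meta-learner returned these per-subject estimates,
# driven by two "true" predictive biomarkers (x0, x1) plus estimation noise.
cate_hat = 0.8 * X["x0"] - 0.5 * (X["x1"] > 0) + rng.normal(scale=0.1, size=n)

# Step 2: secondary regression of the estimated CATE on baseline covariates.
surrogate = XGBRegressor(n_estimators=300, max_depth=3, learning_rate=0.05)
surrogate.fit(X, cate_hat)

# SHAP values of the surrogate rank covariates by their contribution to the
# estimated treatment-effect heterogeneity, i.e. candidate predictive biomarkers.
explainer = shap.TreeExplainer(surrogate)
shap_values = explainer.shap_values(X)
ranking = pd.Series(np.abs(shap_values).mean(axis=0), index=X.columns).sort_values(ascending=False)
print(ranking)  # x0 and x1 should dominate the ranking
```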

Results: Using our proposed method, we conduct simulation benchmarking to evaluate the ability to accurately identify biomarkers using SHAP values derived from various CATE meta-learners and Causal Forest [3]. This two-step approach provides a novel and unified way to evaluate different CATE models based on their ability to accurately identify predictive covariates through SHAP rankings. The results suggest that the architecture of the CATE model can greatly impact performance.

Conclusion: Our study highlights key considerations when using SHAP values to explain models aimed at estimating causal quantities rather than traditional supervised learning. We investigate the operating characteristics of several popular CATE modeling choices using this new metric, providing insights into the impact of meta-learner schemes on covariate ranking accuracy. This research contributes to the understanding of predictive covariates underlying CATE estimates and offers a robust framework for future studies in causal inference and precision medicine.

References:
[1] Lipkovich I, Svensson D, Ratitch B, Dmitrienko A. Modern approaches for evaluating treatment effect heterogeneity from clinical trials and observational data. Statistics in Medicine, 2024.

[2] Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems (NeurIPS), 2017.

[3] Athey S, Imbens G. Recursive partitioning for heterogeneous causal effects. PNAS, 2016.



21-machine-learning-1: 3

Signpost testing to navigate the high-dimensional parameter space of the linear regression model

Wessel van Wieringen1,2

1Dept. Epidemiology & Data Science, Amsterdam UMC; 2Dept. Mathematics, Vrije Universiteit

Breast cancer has many subtypes with different prevalences. While fundamentally different, the subtypes share commonalities. Such commonalities may benefit statistical learning from data on the less prevalent subtypes, especially in high-dimensional situations.
We evaluate the relevance of external quantitative information on the parameter of a linear regression model estimated from high-dimensional data. The external information comes in the form of a parameter value available from a related knowledge domain or population, for instance from a more prevalent breast cancer subtype. The direction from a null value to this externally provided parameter value serves as a signpost in the vast parameter space.
We present a hypothesis test, the signpost test, to guide the search for the location of the parameter of the linear regression model in the high-dimensional setting. If the signpost test is significant, it is worthwhile to follow the signpost in the search for the true parameter value. Our test statistic measures the relevance of the signpost's direction. We derive the test statistic's limiting distribution and provide approximations for other cases. The signpost's significance is assessed by comparing the signpost's direction to random rotations of this direction. We present a Bayesian interpretation of the signpost test and its connection to the global test. In simulations we investigate the signpost test's type I error and power, with particular interest in the effect of regularisation and high-dimensionality in finite samples on these properties, and under misspecification of the alternative hypothesis.
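One possible formalisation of the setup, under our own reading of the abstract (the authors' exact notation and test statistic are not reproduced here):

```latex
% High-dimensional linear model with an externally supplied parameter value \tilde{\beta}
% (e.g., from the more prevalent subtype); \beta_0 denotes the null value.
y = X\beta + \varepsilon, \qquad \beta \in \mathbb{R}^{p}, \; p \gg n,
\qquad d = \frac{\tilde{\beta} - \beta_{0}}{\lVert \tilde{\beta} - \beta_{0} \rVert},
% The signpost test assesses whether the direction d is relevant, i.e. whether moving
% from the null value along the signpost brings us closer to the true parameter:
H_{0}\colon \beta = \beta_{0} \quad \text{vs.} \quad H_{1}\colon \beta = \beta_{0} + \tau d, \ \tau \neq 0.
```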
We employ the signpost test to illustrate how learning the regulatory mechanism of well-known cancer genes in a low-prevalence breast cancer subtype benefits from external knowledge on this mechanism obtained from data of a more prevalent, related but fundamentally different subtype.
The signpost test also finds use within the context of federated learning, where it serves as a means to evaluate the informativeness of an external parameter estimate, e.g., one provided by another institution, for the in-house model.



21-machine-learning-1: 4

Leveraging Influence Functions for Statistical Inference in R

Klaus Holst

Novo Nordisk, Denmark

Influence functions (IFs), also known as influence curves or canonical gradients, are essential for characterizing regular and asymptotically linear (RAL) estimators. They enable the direct calculation of properties such as asymptotic variance and facilitate the construction of new estimators through straightforward combinations and transformations. In this presentation, we will demonstrate how to work effectively with IFs in the statistical software R. Several examples will be provided to illustrate how to estimate and manipulate IFs, with specific applications in analyzing randomized clinical trials and multiple testing.
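For orientation, the standard identities behind these manipulations (a textbook reminder in generic notation, not material from the presentation itself):

```latex
% \hat{\theta} is regular and asymptotically linear with influence function \mathrm{IF} when
\sqrt{n}\,\bigl(\hat{\theta} - \theta_{0}\bigr)
  = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \mathrm{IF}(Z_{i}) + o_{P}(1),
\qquad \mathrm{E}\bigl[\mathrm{IF}(Z)\bigr] = 0,
% so the asymptotic variance is \mathrm{Var}\{\mathrm{IF}(Z)\}/n, and a smooth transformation
% g(\hat{\theta}_{1}, \hat{\theta}_{2}) of two RAL estimators has influence function
\mathrm{IF}_{g}(Z) \;=\; \nabla g(\theta_{1}, \theta_{2})^{\top}
\begin{pmatrix} \mathrm{IF}_{1}(Z) \\ \mathrm{IF}_{2}(Z) \end{pmatrix}.
```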



21-machine-learning-1: 5

Optimal testing for the presence of conditional average treatment effects

Feng Liang1, Kelly Van Lancker1, Stijn Vansteelandt1,2

1Ghent University, Belgium; 2London School of Hygiene and Tropical Medicine, UK

In May 2023, the U.S. Food and Drug Administration (FDA) issued industry guidance titled "Adjustment for Covariates in Randomized Clinical Trials for Drugs and Biological Products." This guidance advocates for the use of efficient estimators and tests for the average treatment effect, which adjust for baseline imbalances using flexible, data-adaptive methods while mitigating concerns about model misspecification bias. However, these approaches, though optimal for estimating the average treatment effect, do not fully leverage the information contained in baseline covariates, particularly in the presence of treatment effect heterogeneity, as the average treatment effect dilutes such heterogeneity.

To address this limitation, we develop optimal tests for the null hypothesis that the conditional average treatment effect is zero across all levels of measured baseline covariates. Our approach employs debiased machine learning and is inspired by the Projected Covariance Measure test, which we generalize to enhance its applicability. Through theoretical analysis and simulations, we compare our method with Maximum Likelihood Estimation and Augmented Inverse Probability Weighting, demonstrating that our test achieves higher statistical power while maintaining valid Type I error control. These advantages position our method as a competitive alternative for causal inference in both randomized clinical trials and observational studies, particularly in settings where treatment effect heterogeneity is anticipated.
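As a toy illustration of why testing for a zero conditional effect can gain power from covariates (this is not the authors' proposed test, which builds on debiased machine learning and a generalised Projected Covariance Measure): under randomisation with known probability pi, the transformed outcome phi below has conditional mean equal to the CATE, so under the null it is conditionally mean-zero and a sample-splitting t-test of its covariance with a fitted CATE proxy is valid. Data, learner and variable names are arbitrary.

```python
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n, pi = 4000, 0.5
X = rng.normal(size=(n, 5))
A = rng.binomial(1, pi, size=n)                       # randomised treatment
Y = X[:, 0] + A * 0.5 * X[:, 1] + rng.normal(size=n)  # effect heterogeneous in X[:, 1]

phi = (A - pi) / (pi * (1 - pi)) * Y                  # E[phi | X] = CATE(X)

# Fit a CATE proxy on one half; test E[phi * proxy(X)] = 0 on the other half.
X_tr, X_te, phi_tr, phi_te = train_test_split(X, phi, test_size=0.5, random_state=0)
proxy = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, phi_tr)
scores = phi_te * proxy.predict(X_te)                 # mean > 0 only if a CATE is present
t_stat, p_value = stats.ttest_1samp(scores, 0.0, alternative="greater")
print(f"t = {t_stat:.2f}, one-sided p = {p_value:.4f}")
```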



 