Conference Agenda

Overview and details of the sessions of this conference.

Session Overview
Session
Bayesian methods 2
Time:
Tuesday, 26/Aug/2025:
9:15am - 10:45am

Location: Biozentrum U1.141

Biozentrum, 124 seats

Presentations
20-bayesian-2: 1

Using a foundation model for detecting and reducing site-specific differences in federated meta-analysis of regression models

Patric Tippmann1,2, Max Behrens1,2, Harald Binder1,2

1Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center – University of Freiburg; 2Freiburg Center for Data Analysis, Modeling and AI, University of Freiburg

Multi-centre meta-analysis of regression models can be impacted by discrepancies in the relations between covariates across centres. The reasons for such discrepancies are particularly difficult to detect when data cannot be pooled or directly transferred between centres. We propose a two-step, federated approach to (1) detect cross-centre differences and (2) harmonise them under data sharing restrictions. Both steps require that a model capturing the relations between variables be trained at a reference centre. The trained model is then transferred to the other centres with restricted data access to diagnose and potentially remove differences. While standard regression models could be used for this task, they might not be flexible enough to reflect complex patterns. As an alternative, we consider using a foundation model, specifically TabPFN, which encodes a prior distribution on plausible patterns and can be used to obtain a posterior based on the reference centre. A further advantage of this specific approach is robustness to non-linear transformations, outliers and missing data. This flexibility is accompanied by high computational efficiency (requiring only a single forward pass) and suitability for small datasets. In practice, the trained model is applied locally at each restricted centre to generate predictions for the inaccessible variables; aggregating these predictions over multiple iterations yields a harmonised estimate of the local correlation structure.
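
For intuition only, the following sketch mimics the prediction-and-aggregation step on synthetic data. It assumes the scikit-learn-style TabPFNRegressor interface of recent tabpfn releases; the data, the bootstrap loop and the correlation summary are illustrative assumptions, not the authors' implementation.

    import numpy as np
    from tabpfn import TabPFNRegressor  # assumed scikit-learn-style interface

    rng = np.random.default_rng(0)

    # Reference centre: covariates x and the elsewhere-restricted variable y are both available.
    n_ref = 300
    x_ref = rng.normal(size=(n_ref, 2))
    y_ref = 0.8 * x_ref[:, 0] - 0.3 * x_ref[:, 1] ** 2 + rng.normal(scale=0.5, size=n_ref)

    model = TabPFNRegressor()
    model.fit(x_ref, y_ref)  # conditions the foundation model's prior on the reference data

    # Collaborating centre: y is inaccessible; only the trained model is transferred.
    n_loc = 150
    x_loc = rng.normal(loc=0.2, size=(n_loc, 2))  # slightly shifted covariate distribution

    corrs = []
    for _ in range(20):  # aggregate over bootstrap iterations; raw data never leave the site
        idx = rng.integers(0, n_loc, n_loc)
        y_hat = model.predict(x_loc[idx])
        corrs.append(np.corrcoef(x_loc[idx, 0], y_hat)[0, 1])

    print("harmonised correlation estimate:", float(np.mean(corrs)))

In the actual approach the aggregated correlations would feed into the collaborating centre's regression analysis; only the aggregation idea is shown here.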

We illustrate these benefits with clinical datasets, with a focus on stroke, demonstrating that a conspicuously different regression coefficient in a restricted collaborating centre shifts substantially toward the reference centre’s value after applying our harmonisation approach to the associated correlation structures. Because a collaborating centre retains its raw data locally—receiving only the trained model from the reference centre—the method takes into account regulatory and ethical constraints on data sharing. The resulting harmonised correlations enable more reliable multi-centre analyses while preserving individual-level privacy.



20-bayesian-2: 2

Sample size calculations for prediction model development: A general Bayesian framework using posterior distributions to examine expected performance, degradation and stability

Richard D Riley1, Rebecca Whittle1, Mohsen Sadatsafavi2, Glen Martin3, Alexander Pate3, Gary Collins4, Joie Ensor1

1University of Birmingham, United Kingdom; 2The University of British Columbia, Canada; 3University of Manchester, United Kingdom; 4University of Oxford, United Kingdom

Background

For studies developing a clinical prediction model, various sample size calculations exist. However, their underlying theory is often based on standard (unpenalised) regression, and extensions to other machine learning approaches are needed.

Objectives

To propose a general Bayesian approach to sample size calculations for model development or updating, based on drawing samples from anticipated posterior distributions and targeting a small reduction in predictive performance (‘model degradation’) compared to an assumed true model.

Methods

Researchers must provide their candidate predictors, a ‘true model’ (e.g., a regression equation with intercept and predictor effects that match the outcome incidence and c-statistic of previous models), and a (synthetic) dataset reflecting the joint distribution of the candidate predictors. Then, for a chosen sample size and development strategy, our general approach is fully simulation-based: generate thousands of models and apply each to a large evaluation dataset to produce posterior distributions of individual predictions, model performance and model degradation. However, to substantially improve computational speed for penalised regression (e.g., lasso, ridge), we propose approximating posterior distributions using a one-sample Bayesian analysis that incorporates shrinkage priors alongside the likelihood decomposed into sample size and Fisher’s unit information. The derived posterior distributions enable any criteria to be examined (e.g., mean and variance of the calibration slope; expected degradation in c-statistic; mean width of 95% intervals for individual risk; expected value of sample information for decision-making) to inform the (minimum) sample size required.
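
To make the fully simulation-based route concrete (this is not the pmssbayes module), the sketch below repeatedly develops a lasso logistic model at one candidate sample size and summarises the resulting distribution of the calibration slope and c-statistic on a large evaluation dataset. The assumed ‘true model’, case-mix distribution and penalty setting are invented for the illustration.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(1)
    beta_true = np.array([-2.2, 0.5, 0.4, -0.3, 0.2])  # assumed true model: intercept + 4 predictor effects

    def simulate(n):
        X = rng.normal(size=(n, 4))                    # assumed case-mix distribution
        p = 1 / (1 + np.exp(-(beta_true[0] + X @ beta_true[1:])))
        return X, rng.binomial(1, p)

    X_eval, y_eval = simulate(100_000)                 # large evaluation dataset

    def one_development(n_dev):
        X, y = simulate(n_dev)
        fit = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X, y)
        lp_hat = fit.decision_function(X_eval)         # estimated linear predictor
        # calibration slope: unpenalised logistic regression of outcomes on lp_hat (scikit-learn >= 1.2)
        slope = LogisticRegression(penalty=None).fit(lp_hat.reshape(-1, 1), y_eval).coef_[0, 0]
        return slope, roc_auc_score(y_eval, lp_hat)    # c-statistic

    draws = np.array([one_development(400) for _ in range(200)])  # 200 simulated developments at n = 400
    print("calibration slope: mean %.2f, sd %.2f" % (draws[:, 0].mean(), draws[:, 0].std()))
    print("c-statistic: mean %.3f" % draws[:, 1].mean())

Degradation would then be summarised against the performance of the assumed true model; the one-sample Bayesian approximation described above avoids the repeated refitting.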

Results

We illustrate the approach when developing models in pre-eclampsia and show how it encompasses the criteria of existing sample size calculations, whilst additionally allowing researchers to examine the variability (instability) of model predictions and the degradation in model performance. Focusing on ridge and lasso logistic regression, we demonstrate the Bayesian one-sample analysis via our module pmssbayes and compare results and speed with the fully simulation-based approach. We show how the approach informs the fairness of models and outline practical options for specifying the ‘true model’ and case-mix distribution.

Conclusions

Our Bayesian approaches generalise existing sample size proposals for model development, by utilising anticipated posterior distributions conditional on a chosen sample size and development strategy, to inform the sample size required to target appropriate model performance, stability and clinical utility.



20-bayesian-2: 3

Efficient Utilization of Dose-Schedule Grids for Optimal Therapeutic Outcomes in Non-Oncology Settings

Lars Andersen1, Mitchell Thomann1, Thomas Jaki2

1Boehringer Ingelheim Pharma GmbH & Co. KG, Germany; 2University of Regensburg, Germany

Topics: Bayesian Methods, Simulation Studies, clinical trials - designs

Background: Existing dose-schedule finding methods in phase I oncology studies are constrained by toxicity and small sample sizes, resulting in small grids and narrow design spaces. These methods assume a predefined dose and schedule ordering and require starting at low doses with structured escalation. In phase II, these assumptions may not hold for efficacy. This simulation study evaluates a more robust modelling framework with separate models for dose and schedule, considering various study designs that allocate across the factorial space with and without optimization criteria, as well as with and without an interim analysis.

Methods: Across the various designs and modelling frameworks, the study measured performance in terms of go/no-go decisions, correct minimum effective dose (MED) estimation, mean squared error (MSE), root mean squared error (RMSE), the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the number of patients allocated at the true MED.
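
As a toy illustration of how such operating characteristics can be tallied over simulated trials (the study's actual dose-schedule models, grid and decision rules are not shown here), the sketch below fits a simple linear dose model separately per schedule on a factorial grid and records the proportion of correct MED selections and the RMSE of the MED estimate; all numbers are invented.

    import numpy as np

    rng = np.random.default_rng(2)
    doses = np.array([0.0, 1.0, 2.0, 4.0, 8.0])           # assumed dose grid
    schedules = {"once daily": 0.25, "twice daily": 0.5}  # assumed dose-response slope per schedule
    threshold = 1.0                                       # assumed clinically relevant effect
    true_med = {s: doses[np.argmax(slope * doses >= threshold)] for s, slope in schedules.items()}

    def simulate_trial(n_per_cell=10, sd=1.0):
        med_hat = {}
        for s, slope in schedules.items():
            x = np.repeat(doses, n_per_cell)              # full factorial allocation within schedule
            y = slope * x + rng.normal(scale=sd, size=x.size)
            slope_hat = np.polyfit(x, y, 1)[0]            # simple linear dose model per schedule
            reaching = doses[slope_hat * doses >= threshold]
            med_hat[s] = reaching.min() if reaching.size else np.nan
        return med_hat

    reps = [simulate_trial() for _ in range(1000)]
    for s in schedules:
        est = np.array([r[s] for r in reps])
        print(f"{s}: P(correct MED) = {np.mean(est == true_med[s]):.2f}, "
              f"RMSE = {np.sqrt(np.nanmean((est - true_med[s]) ** 2)):.2f}")

The remaining metrics (go/no-go rates, AIC, BIC, allocation at the true MED) would be tallied inside the same replicate loop.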

Results: Correct MED estimation is less robust when the sample size drops, leaving the study potentially underpowered, or when patient-level variance changes substantially. Model misspecification worsens performance, decision-making, and estimation. Go/no-go decisions perform well across designs and scenarios if the true model is selected. MSE and RMSE are robust across scenarios, and patient allocation at the true MED mirrors correct MED estimation. The full factorial design performs well and is robust due to its allocation across the sample space. Adding an interim analysis slightly improves performance, especially patient allocation at the true MED. Optimality designs show minimal differences across scenarios.

Conclusion: This simulation study developed and tested a framework for design and modelling in this setting across a variety of scenarios. In general, the framework demonstrated the robustness of model selection criteria such as AIC and BIC, and highlighted the benefit of an interim analysis for reallocating patients based on criteria such as the MED.



20-bayesian-2: 4

On the interplay between prior weight and vague variance in Robust Mixture Priors

Marco Ratta1, Gaëlle Saint-Hilary2, Pavel Mozgunov3

1Politecnico di Torino, Italy; 2Saryga, France; 3Cambridge University, United Kingdom

The use of historical data to complement the current control arm in randomized controlled trials (RCTs) is increasingly attractive, particularly when patient recruitment presents a significant hurdle. This necessitates addressing potential conflicts between historical and current trial data. Robust Mixture Priors (RMPs) are a prominent dynamic borrowing approach to mitigate this, combining a historical informative component and a weakly informative robust component via a mixture distribution. Once the data are observed, the RMP is updated to a posterior distribution that is again a mixture of the individual posterior distributions, with updated mixture weights. The RMP’s key feature is its borrowing mechanism, which is directly proportional to the agreement between historical and current data: high agreement maximizes borrowing, while inconsistencies progressively reduce it.
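
For intuition, the sketch below evaluates the posterior mixture weight and posterior mean of a normal robust mixture prior under a normal likelihood for the current control mean, showing how borrowing switches off as the observed mean drifts away from the historical one. All numerical values are illustrative.

    import numpy as np
    from scipy.stats import norm

    def rmp_posterior(ybar, se, w, mu_h, tau_h, mu_v, tau_v):
        # prior-predictive density of the observed mean under each mixture component
        m_h = norm.pdf(ybar, mu_h, np.sqrt(tau_h**2 + se**2))
        m_v = norm.pdf(ybar, mu_v, np.sqrt(tau_v**2 + se**2))
        w_post = w * m_h / (w * m_h + (1 - w) * m_v)        # updated mixture weight
        def post_mean(mu0, tau0):                           # conjugate normal-normal update
            prec = 1 / tau0**2 + 1 / se**2
            return (mu0 / tau0**2 + ybar / se**2) / prec
        return w_post, w_post * post_mean(mu_h, tau_h) + (1 - w_post) * post_mean(mu_v, tau_v)

    # historical mean 0, informative sd 0.3, vague sd 10, prior weight 0.5, current standard error 0.5
    for ybar in (0.1, 1.0, 2.5):
        w_post, mean = rmp_posterior(ybar, se=0.5, w=0.5, mu_h=0.0, tau_h=0.3, mu_v=0.0, tau_v=10.0)
        print(f"observed mean {ybar}: weight on historical component {w_post:.2f}, posterior mean {mean:.2f}")

Rerunning the loop with different pairs of prior weight and vague variance illustrates the interplay discussed below: increasing the vague variance can be traded off against the prior weight while leaving the posterior essentially unchanged.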

Specifying the parameters of normal RMP components, particularly the variance of the robust component and the mixture weight, presents a challenge, as these parameters significantly influence posterior inferences. Improper normal distributions seem intuitive for the robust component; however, their use has been discouraged because, for a given value of the mixture weight, it leads to full borrowing even in the case of extremely large inconsistency between historical and concurrent data. This phenomenon is known in the literature as Lindley’s paradox. For this reason, weakly informative robust components have been preferred, and the unit-information prior (UIP) has become a common choice, leaving the weight of the mixture prior as the only parameter to be elicited based on the sponsor’s prior confidence in the external data. This choice poses some challenges, specifically (i) the UIP’s potential over-informativeness in trials with limited sample sizes, and (ii) inflation of the type I error (potentially up to 100%) under unequal allocation of patients to the control and experimental arms.

In this work we first prove that the posterior inference is driven by the specification of both the mixture weight and the robust variance, demonstrating in particular that infinitely many different pairs of mixture weight and robust variance lead to (almost) the same posterior inference. Moreover, we prove that jointly selecting the mixture weight and the variance of the robust component within an RMP framework effectively avoids Lindley’s paradox, guaranteeing good borrowing properties even with arbitrarily large variances. We further demonstrate that employing robust components with large variances mitigates, or even asymptotically eliminates, type I error inflation in unbalanced trials. The practical implications of these theoretical results will be demonstrated.



20-bayesian-2: 5

Bayesian nonparametric methods for inferring causal effects of longitudinal treatments amidst missing covariate data

Liangyuan Hu

Rutgers University, United States of America

Background / Introduction

Missing covariate data is a prevalent issue in longitudinal studies, posing challenges for causal inference on longitudinal treatments. Imputation is a widely used solution, with most techniques relying on parametric models that explicitly define complex relationships among longitudinal responses, treatments, and covariates. However, incorrect specification of these parametric forms can lead to biases. While machine learning methods have gained traction for handling missing data, their development has predominantly focused on cross-sectional data, leaving longitudinal settings with repeated measures relatively underexplored.

Methods

To address these limitations, we propose a flexible Bayesian nonparametric sequential imputation framework tailored for longitudinal data. We first develop a Bayesian ensemble-tree mixed-effects model, BMTrees, and its variants, which leverage nonparametric priors to capture complex, non-linear relationships over time and handle non-normal random effects and errors. We then adapt BMTrees to the sequential imputation framework, effectively modeling relationships between observed and missing variables while incorporating a fitting-with-imputing strategy to enhance computational efficiency. This flexible imputation method can be seamlessly integrated with longitudinal causal inference approaches to enable coherent estimation of time-varying treatment effects.
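
A schematic of the sequential step only (fit on observed cases, impute, carry the completed visit forward): in the sketch below a random forest is used purely as a stand-in for BMTrees, whose mixed-effects formulation and posterior draws are not shown, and all column names and data are invented.

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor

    def sequential_impute(df, visit_cols, baseline_cols):
        """Impute each visit's covariate in chronological order, conditioning on
        baseline covariates and on earlier (already completed) visits."""
        df = df.copy()
        history = list(baseline_cols)
        for col in visit_cols:
            obs = df[col].notna()
            if (~obs).any():
                model = RandomForestRegressor(n_estimators=200, random_state=0)  # stand-in for BMTrees
                model.fit(df.loc[obs, history], df.loc[obs, col])
                df.loc[~obs, col] = model.predict(df.loc[~obs, history])
            history.append(col)            # carry the completed visit forward
        return df

    # tiny synthetic example with invented variable names
    rng = np.random.default_rng(3)
    data = pd.DataFrame({"age": rng.normal(60, 10, 200), "bmi": rng.normal(27, 4, 200)})
    for v in ("sbp_v1", "sbp_v2"):
        data[v] = 120 + 0.3 * data["age"] + rng.normal(0, 5, 200)
    data.loc[rng.random(200) < 0.2, "sbp_v2"] = np.nan   # make roughly 20% of visit-2 values missing
    completed = sequential_impute(data, ["sbp_v1", "sbp_v2"], ["age", "bmi"])

In the proposed framework the point prediction would be replaced by draws from the BMTrees posterior, so that imputation uncertainty is propagated into the downstream causal estimates.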

Results

Simulation studies demonstrate that BMTrees outperforms established ensemble-tree methods, including mixedBART and mixedRF, in both prediction and imputation tasks, particularly in challenging scenarios with non-normal data structures. Through a case study, we demonstrate the use of our sequential imputation method combined with the non-iterative conditional expectation estimator to evaluate the comparative effectiveness of antihypertensive treatment initiation thresholds for reducing long-term systolic blood pressure.

Conclusion

We recommend using our proposed BMTrees method to impute longitudinal missing values, especially in scenarios where the dependence structure among longitudinal variables is nonlinear, and normality assumptions for model components are violated. This approach facilitates integrative causal analysis for evaluating time-varying treatment effects, coherently accounting for various sources of uncertainty.



 