Conference Agenda

Session Overview
Session: Poster Exhibition: W / Wednesday posters at Biozentrum
Time: Wednesday, 27/Aug/2025, 10:30am - 11:00am
Location: Biozentrum, 2nd floor (poster area)

Presentations
posters-wednesday-BioZ: 1

Cure models to compare aftercare monitoring schemes in pediatric cancer

Ulrike Pötschger1, Harm van Tinteren2, Evgenia Glogova1, Helga Arnardottir1, Paulina Kurzmann1, Sabine Taschner-Mandl1, Lieve Titgat2, Martina Mittlböck3

1St. Anna Children's Cancer Research Institute, Austria; 2Princess Maxima Center; 3Medical University of Vienna, Center for Medical Data Science

Background / Introduction

Neuroblastoma is a malignant tumor of the peripheral nervous system, and 50% of patients are high-risk with a poor outcome. Monitoring with minimally invasive liquid biopsies may now allow earlier detection of tumor recurrence than conventional follow-up evaluations based on imaging and bone marrow biopsies.

In a randomized study, two monitoring strategies for relapsed neuroblastoma are compared: minimally invasive liquid biopsy-based monitoring and conventional follow-up evaluations based on imaging and bone marrow biopsies.

The primary endpoint is disease-free survival (DFS). If liquid biopsy-based monitoring is beneficial, disease recurrences can be detected earlier. Survival curves are therefore expected to show an early group difference that vanishes in the long term, and consequently non-proportional hazards are expected.

Methods

The primary statistical evaluation of the treatment effect will be done with a Weibull mixture cure model. The crucial assumption underlying a mixture cure model is that DFS results from the survival experience of two subgroups: cured patients and uncured patients. Within this model, the proportion of cured patients and the time to event for the uncured subpopulation are modelled separately. The time to detection of a recurrence in the subpopulation of uncured patients is of primary interest here.
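
For reference, a standard Weibull mixture cure formulation of DFS (generic notation; the parameters shown are not those of the planned analysis) is:

S(t) = \pi + (1 - \pi)\, S_u(t), \qquad S_u(t) = \exp\{-(t/\lambda)^{k}\},

where \pi is the cure fraction and S_u(t) the Weibull survival function of the uncured subpopulation; covariates such as the randomized arm typically enter \pi through a logistic link and S_u through its scale parameter \lambda.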

Monte Carlo simulations were performed to evaluate the power and statistical properties of the Weibull mixture cure model. For the standard monitoring arm, the inversion method is used to simulate survival data following a mixture cure model as observed in historical populations. For the experimental arm, the effects of different data-generating processes and liquid biopsy schedules are explored.
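
As an illustration, a minimal R sketch of the inversion step for one arm is given below; the cure fraction, Weibull parameters, and censoring time are placeholders, not values taken from the historical populations.

```r
# Simulate DFS times from a Weibull mixture cure model by inversion (illustrative values only)
set.seed(1)
n       <- 150   # patients per arm
pi_cure <- 0.4   # assumed cure fraction              -- placeholder
shape   <- 1.2   # Weibull shape                      -- placeholder
scale   <- 18    # Weibull scale, in months           -- placeholder
cens    <- 60    # administrative censoring (months)  -- placeholder

cured   <- rbinom(n, 1, pi_cure) == 1
u       <- runif(n)
# Invert S_u(t) = exp(-(t/scale)^shape): t = scale * (-log(u))^(1/shape) for uncured patients
t_event <- ifelse(cured, Inf, scale * (-log(u))^(1 / shape))
time    <- pmin(t_event, cens)
status  <- as.integer(t_event <= cens)   # 1 = recurrence observed, 0 = censored
head(data.frame(time, status, cured))
```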

Results

The simulation studies explored different liquid biopsy schedules and effect sizes (the lag time between detectable signals with liquid biopsy and with imaging) under various data-generating processes, and thereby helped to refine the study design and the liquid biopsy schedule. Compared with a conventional analysis using a Cox regression model, substantial gains in statistical power could be achieved: with a two-sided alpha of 5% and n=150 patients, the simulated power to detect recurrences 5 months earlier was 81% for the cure model and 60% for the Cox model.

Conclusion

Comparing aftercare evaluations with different schedules and sensitivities is methodologically challenging. With anticipated non-proportional hazards, it is important to directly address the primary interest in earlier signal detection. Simulation studies helped to assess power and to develop an optimal monitoring schedule. Cure models provide results with a clear interpretation and lead to substantial gains in statistical power.



posters-wednesday-BioZ: 2

Comparison of treatment sequences in advanced pancreatic cancer

Norbert Marschner1,2, Nina Haug3, Susanna Hegewisch-Becker4, Marcel Reiser5, Steffen Dörfel6, Rüdiger Liersch7, Hartmut Linde8, Thomas Wolf9, Anna Hof10, Anja Kaiser-Osterhues2, Karin Potthoff2, Martina Jänicke10

1Med. Klinik 1, Universitätsklinik Freiburg, Freiburg, Germany; 2Medical Department, iOMEDICO, Freiburg, Germany; 3Biostatistics, iOMEDICO, Freiburg, Germany; 4Hämatologisch-Onkologische Praxis Eppendorf (HOPE), Hamburg, Germany.; 5PIOH-Praxis Internistische Onkologie und Hämatologie, Köln, Germany; 6Onkozentrum Dresden/Freiberg, Dresden, Germany; 7Hämatologisch-onkologische Gemeinschaftspraxis, Münster, Germany; 8MVZ für Blut- und Krebserkrankungen, Potsdam, Germany; 9BAG, Gemeinschaftspraxis Hämatologie-Onkologie, Dresden, Germany; 10Clinical Epidemiology and Health Economics, iOMEDICO, Freiburg, Germany

There are no clear guidelines regarding the optimal treatment sequence for advanced pancreatic cancer, as head-to-head phase III randomised trials are missing. We assessed real-world effectiveness of three frequently administered sequential treatment strategies: FOLFIRINOX→GEMNAB, GEMNAB→FOLFOX/OFF and GEMNAB→NALIRI+5-FU. To this end, we emulated a hypothetical target trial where patients were randomised to one of these sequences before the beginning of first-line therapy. As causal estimand, we quantified the per-protocol effect of treatment on overall survival and time-to-deterioration of health-related quality of life. Treatment effects were estimated both for the whole population and stratified by risk group according to the Pancreatic Cancer Score [1]. Our analysis included 1551 patients with advanced pancreatic cancer from the prospective, clinical cohort study Tumour Registry Pancreatic Cancer receiving FOLFIRINOX (n = 613) or gemcitabine/nab-paclitaxel (GEMNAB; n = 938) as palliative first-line treatment. We used marginal structural modeling to adjust for time-varying confounding affecting the relation between treatment and endpoint — a key challenge in real-world data analysis [2]. The estimated effectiveness of the three treatment sequences evaluated was largely comparable. Patients with poor prognosis might benefit from intensified treatment with FOLFIRINOX→GEMNAB in terms of survival and quality of life. Future randomised trials on sequential treatments in advanced pancreatic cancer are warranted [3].
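
For readers unfamiliar with marginal structural models, the stabilized inverse-probability-of-treatment weights underlying this adjustment have the generic form (notation ours, following reference [2]):

sw_i(t) = \prod_{k=0}^{t} \frac{P(A_{ik} = a_{ik} \mid \bar{A}_{i,k-1}, V_i)}{P(A_{ik} = a_{ik} \mid \bar{A}_{i,k-1}, \bar{L}_{ik}, V_i)},

where A_{ik} is the treatment received by patient i in interval k, \bar{L}_{ik} the history of time-varying confounders, and V_i the baseline covariates; weighting the outcome model by sw_i(t) removes time-varying confounding under the usual exchangeability, positivity, and consistency assumptions.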

1. Marschner N, Hegewisch-Becker S, Reiser M, von der Heyde E, Bertram M, Hollerbach SH, Kreher S, Wolf T, Binninger A, Chiabudini M, Kaiser-Osterhues A, Jänicke M, et al. FOLFIRINOX or gemcitabine/nab-paclitaxel in advanced pancreatic adenocarcinoma: A novel validated prognostic score to facilitate treatment decision-making in real-world. Int J Cancer 2023;152:458–69.

2. Robins JM, Hernán MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology 2000;11:550–60.

3. Marschner N, Haug N, Hegewisch‐Becker S, Reiser M, Dörfel S, Lerchenmüller C, Linde H, Wolf T, Hof A, Kaiser‐Osterhues A, Potthoff K, Jänicke M, et al. Head‐to‐head comparison of treatment sequences in advanced pancreatic cancer—Real‐world data from the prospective German TPK clinical cohort study. Intl Journal of Cancer 2024;155:1629–40.



posters-wednesday-BioZ: 3

Clinical Trials with Time-to-event-endpoint: Interim Prediction of Number of Events with Confidence Distributions

Edoardo Ratti, Maria Grazia Valsecchi, Stefania Galimberti

Bicocca Bioinformatics Biostatistics and Bioimaging B4 Center, School of Medicine and Surgery, University of Milan-Bicocca, Monza, Italy

Introduction. An important aspect of randomized clinical trial design is the planning of interim analyses. With time-to-event endpoints, the target sample size is a function of the number of events, and it is crucial that studies provide sufficient follow-up to observe the number of events needed to preserve power. Novel approaches have been developed in a Bayesian framework to predict the date at which the target number of events is reached (maximum information trial). However, little is available on forecasting the number of events expected at a fixed future date, with a corresponding prediction interval, in trials with a fixed follow-up time (maximum duration trial).

Methods. Based on a recent paper on the use of confidence distributions in clinical trials [1], we adapt a prediction method developed in reliability analysis [2] and show its potential in the clinical context. The proposed method obtains prediction intervals from a predictive distribution constructed on a bootstrap-based confidence distribution of the parameters of the fitted survival model. The appropriateness of the framework was assessed by application to a real phase III trial and by evaluating the coverage probability of the intervals in simulations.
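
The following R sketch conveys the general idea under simplifying assumptions (a Weibull model, toy data, and a plain nonparametric bootstrap); it is an illustration, not the authors' exact algorithm.

```r
library(survival)

# Toy data for patients already accrued: follow-up time (months) and event indicator
set.seed(42)
dat <- data.frame(time = rexp(200, rate = 0.02), status = rbinom(200, 1, 0.6))

horizon <- 6      # predict events occurring within the next 6 months
B <- 2000         # bootstrap replicates approximating the confidence distribution

pred <- replicate(B, {
  bs  <- dat[sample(nrow(dat), replace = TRUE), ]
  fit <- survreg(Surv(time, status) ~ 1, data = bs, dist = "weibull")
  mu <- unname(coef(fit)); sigma <- fit$scale           # survreg parameterisation
  S  <- function(t) exp(-(t / exp(mu))^(1 / sigma))     # Weibull survival function
  atrisk <- dat$time[dat$status == 0]                   # patients still event-free
  p <- 1 - S(atrisk + horizon) / S(atrisk)              # conditional event probability
  sum(rbinom(length(p), 1, p))                          # one draw of the future event count
})
quantile(pred, c(0.025, 0.5, 0.975))   # median prediction and 95% prediction interval
```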

Results. Using data from a published phase III trial [3], at the second interim analysis (accrual closed) and at every subsequent 6 months we predicted the number of events occurring. All intervals included the observed number of events. Simulations show that the prediction intervals have the desired coverage under the appropriate survival distribution.

Conclusions. For a maximum duration trial, it is crucial to predict the number of events at future times with proper prediction intervals. The presented approach allows valid predictive inference to be constructed from confidence distributions, accommodating different parametric models and censoring mechanisms, and offers an alternative to a Bayesian approach. Its use is proposed here for prediction after accrual closure; further work will address the modelling of accrual.

References
[1] Marschner IC., Confidence distributions for treatment effects in clinical trials: Posteriors without priors. Statistics in Medicine. 2024; 43(6): 1271-1289
[2] Tian Q., Meng F., Nordman D. J., Meeker W. Q., Predicting the Number of Future Events. Journal of the American Statistical Association. 2021; 117(539): 1296–1310
[3] Conter V, Valsecchi MG, Cario G, et al. Four Additional Doses of PEG-L-Asparaginase During the Consolidation Phase in the AIEOP-BFM ALL 2009 Protocol Do Not Improve Outcome and Increase Toxicity in High-Risk ALL: Results of a Randomized Study. J Clin Oncol. 2024 Mar 10;42(8):915-926.



posters-wednesday-BioZ: 4

A Bayesian-Informed Dose-Escalation Design for Multi-Cohort Oncology Trials with Varying Maximum Tolerated Doses

Martin Kappler1, Yuan Ji2

1Cytel Inc., Waltham, USA; 2University of Chicago, USA

In oncology dose-escalation trials, it is common to evaluate a drug across multiple cancer types within the same study. However, different cancer types may also have different maximum tolerated doses (MTDs) due to potentially different underlying patient characteristics. Standard approaches either pool all patients, potentially ignoring important differences between cancer types, or conduct separate dose-escalation processes for each type, which can lead to inefficiencies. We propose a dose-escalation design that leverages the dose-level information from faster-recruiting cohorts to inform dose-escalation and de-escalation rules for slower-recruiting cohorts, thereby balancing safety, efficiency, and cohort-specific MTD estimation.

Our approach is based on a model-assisted dose-escalation design and uses informative priors to transfer dose-toxicity information from the faster-recruiting cohort to the slower-recruiting cohort. This enables a more conservative and adaptive dose-escalation process for slower cohorts by updating the prior based on the dose-limiting toxicities observed in the faster cohort. The informative prior ensures that the dose escalation in the slower cohort is both cautious and responsive to emerging data, without requiring separate dose-escalation processes for each cancer type. Uncertainty for slower cohorts is reduced and unnecessary toxicity risks are avoided.
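
As a generic sketch of how such information borrowing can work in a model-assisted design (not the authors' exact specification): with a conjugate Beta prior on the toxicity probability p_d at dose d, the DLT data from the faster-recruiting cohort update it as

p_d \mid x_d, n_d \;\sim\; \mathrm{Beta}(a_d + x_d,\; b_d + n_d - x_d),

where x_d is the number of DLTs among n_d patients treated at dose d in the faster cohort; this posterior, possibly discounted to control the amount of borrowing, then serves as the informative prior driving the escalation and de-escalation rules for the slower cohort.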

The operating characteristics of the approach (probability to determine MTD, number of patients exposed to toxic doses, etc.) are assessed via simulations over a variety of scenarios in the two cohorts and are compared to separate or pooled escalation.



posters-wednesday-BioZ: 5

Comparison of Bayesian Approaches in Single-Agent Dose-Finding Studies

Vibha Srichand

Prasanna School of Public Health, Manipal Academy of Higher Education, India

Single-agent dose-finding studies conducted as part of phase 1 clinical trials aim to obtain sufficient information regarding the safety and tolerability of a drug, with the primary objective of determining the Maximum Tolerated Dose (MTD) – the maximum test dose that can be administered with an acceptable level of toxicity. While the 3+3 design has been the conventional choice for dose-finding studies, innovative Bayesian designs have gained prominence. These designs provide a framework to incorporate prior knowledge with data accumulated during the study to adapt the study design and efficiently estimate the MTD. However, existing Bayesian designs typically assume a specific parametric model for the dose-toxicity relationship, which reduces their adaptability to complex data patterns. To address this limitation, recent research has introduced nonparametric Bayesian methods which are model-free, robust and well-suited for small sample sizes. Thus, it is imperative to comprehensively compare the performance of parametric and nonparametric Bayesian methods and provide evidence for the implementation of different methods.

This paper aims to assess the accuracy, safety and adaptability of dose-finding methods by analysing different scenarios of target toxicity probabilities and varying cohort sizes for a predetermined sample size. The methods under review are as follows: the traditional 3+3 design; parametric methods – the continual reassessment method (CRM), the modified toxicity probability interval designs (mTPI and mTPI-2), and the keyboard and Bayesian optimal interval designs (Kurzrock et al., 2021); and nonparametric methods – the Bayesian nonparametric continual reassessment method (Tang et al., 2018) and the Bayesian stochastic approximation method (Xu et al., 2022). The performance of the designs will be assessed using four key metrics, with conclusions drawn based on extensive simulation studies.
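
For reference, one of the parametric designs compared, the CRM, commonly uses the one-parameter power model (the skeleton values and prior shown here are generic):

p_j(\theta) = \pi_j^{\exp(\theta)}, \qquad \theta \sim N(0, \sigma^2), \quad j = 1, \dots, J,

where \pi_1 < \dots < \pi_J is the pre-specified skeleton of prior toxicity probabilities; after each cohort, the posterior of \theta identifies the dose whose estimated toxicity probability is closest to the target.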

Keywords: Dose-finding, Maximum Tolerated Dose, Clinical trial design, Bayesian, Parametric, Nonparametric, Continual Reassessment method, Stochastic Approximation

References:
Kurzrock, R., Lin, C.-C., Wu, T.-C., Hobbs, B. P., Pestana, R. C., & Hong, D. S. (2021). Moving beyond 3+3: The future of clinical trial design. American Society of Clinical Oncology Educational Book, 41, e133–e144. https://doi.org/10.1200/EDBK_319783

Tang, N., Wang, S., & Ye, G. (2018). A nonparametric Bayesian continual reassessment method in single-agent dose-finding studies. BMC Medical Research Methodology, 18(1), 172. https://doi.org/10.1186/s12874-018-0604-9

Xu, J., Zhang, D., & Mu, R. (2022). A dose-finding design for phase I clinical trials based on Bayesian stochastic approximation. BMC Medical Research Methodology, 22(1), 258. https://doi.org/10.1186/s12874-022-01741-3



posters-wednesday-BioZ: 6

Evaluating the effect of different non-informative prior specifications on the Bayesian proportional odds model in randomised controlled trials

Chris J Selman1,2, Katherine J Lee1,2, Michael Dymock3,4, Ian Marschner5, Steven Y.C. Tong6,7, Mark Jones4,8, Tom Snelling3,8, Robert K Mahar1,9,10

1Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Australia; 2Department of Paediatrics, University of Melbourne, Australia; 3Wesfarmers Centre of Vaccines and Infectious Diseases, The Kids Research Institute Australia, Australia; 4School of Population and Global Health, The University of Western Australia, Australia; 5NHMRC Clinical Trials Centre, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW 2050, Australia; 6Victorian Infectious Diseases Services, The Royal Melbourne Hospital, Australia; 7Department of Infectious Diseases, University of Melbourne, Australia; 8Sydney School of Public Health, Faculty of Medicine and Health, University of Sydney, Australia; 9Centre for Epidemiology and Biostatistics, University of Melbourne, Australia; 10Methods and Implementation Support for Clinical and Health Research Hub, University of Melbourne, Australia

Background

Ordinal outcomes can be a powerful way of combining multiple distinct patient outcomes into a single endpoint in a randomised controlled trial (RCT). Such outcomes are commonly analysed using proportional odds (PO) models. When the analysis uses a Bayesian approach, it is not obvious which ‘non-informative’ priors should be used and whether these are truly ‘non-informative’, particularly in adaptive trials where early stopping decisions may be influenced by the choice of prior.
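
For a K-category ordinal outcome Y and treatment indicator x, the PO model under study has the generic form (notation ours):

\mathrm{logit}\, P(Y_i \le k \mid x_i) = \alpha_k + \beta\, x_i, \qquad k = 1, \dots, K-1,

with a single treatment log odds ratio \beta; the ‘non-informative’ prior choices examined here concern \beta and the intercepts \alpha_k, the latter equivalently expressible through a Dirichlet prior on the control-arm category probabilities.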

Methods

This study evaluates the effect of different non-informative prior specifications on the Bayesian PO model for a two-arm trial, both under a design with an early stopping rule and under a fixed design. We conducted an extensive simulation study, varying factors such as the effect size, the sample size, the number of outcome categories and the distribution of the control arm probabilities. The models are also illustrated using data from the Australian COVID-19 Trial.

Results

Our findings indicate that the prior specification can introduce bias in the estimation of the treatment effect, particularly when control arm probabilities are right-skewed. The R-square prior specification had the smallest bias and increased the likelihood of stopping early in such settings when there was a treatment effect. However, this specification exhibited larger biases when the control arm probabilities were U-shaped and in trials that incorporated an early stopping rule. Dirichlet priors with concentration parameters close to zero had the smallest bias when probabilities were right-skewed in the control arm, but were more likely to stop early for superiority in trials with early stopping rules even when there was no treatment effect. Specifying concentration parameters close to zero in the Dirichlet prior may also cause computational issues at interim analyses with small sample sizes and a larger number of outcome categories.

Conclusion

The specification of non-informative priors in Bayesian adaptive trials that use ordinal outcomes has implications for treatment effect estimation and early stopping decisions. Careful selection of priors that consider the likely distribution of the control arm probabilities, or informed sensitivity analyses, may be essential to ensure that inference is not unduly influenced by inappropriate priors.



posters-wednesday-BioZ: 7

Bayesian decision analysis for clinical trial design with binary outcome in the context of Ebola Virus Disease outbreak – Simulation study

Drifa Belhadi1,2, Joonhyuk Cho3,4,5, Pauline Manchon6, Denis Malvy7,8, France Mentré1,6, Andrew W Lo3,5,9,10, Cédric Laouénan1,6

1Université Paris Cité, Inserm, IAME, F-75018 Paris, France; 2Saryga, France; 3MIT Laboratory for Financial Engineering, Cambridge, MA, USA; 4MIT Department of Electrical Engineering and Computer Science, Cambridge, MA, USA; 5MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA; 6AP-HP, Hôpital Bichat, Département d′Epidémiologie Biostatistiques et Recherche Clinique, F-75018 Paris, France; 7UMR 1219 Inserm/EMR 271 IRD, University of Bordeaux, Bordeaux, France; 8Department for Infectious and Tropical Diseases, University Hospital Center Pellegrin, Bordeaux, France; 9MIT Operations Research Center, Cambridge, MA, USA; 10MIT Sloan School of Management, Cambridge, MA, USA

Background

When designing trials for high-mortality diseases with limited available therapies, the conventional 5% type I error rate used for sample size calculation can be questioned. Bayesian Decision Analysis (BDA) for trial design allows for the integration of multiple health consequences of the disease when designing trials. This study adapts BDA for trials with binary outcomes to calculate optimal sample sizes and type I error rates in the context of an Ebola virus disease outbreak.

Methods

We consider a fixed, two-arm randomized trial with a binary outcome and two types of clinical trial loss: post-trial loss, for not approving an effective treatment or approving an ineffective treatment; and in-trial loss, for not administering an effective treatment to patients in the control arm or for administering an ineffective treatment to patients in the experimental arm. The model accounts for side effects of an ineffective treatment and for the burden of Ebola disease. A loss function was defined to summarize the multiple consequences into a single measure, and optimal sample sizes (n) and type I error rates (α) were derived by minimizing this loss function.
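
Schematically, and in generic notation rather than the authors' exact formulation, the BDA-optimal design is obtained as

(n^{*}, \alpha^{*}) = \arg\min_{n,\,\alpha}\; \mathbb{E}\big[ L_{\text{in-trial}}(n) + L_{\text{post-trial}}(n, \alpha) \big],

where the expectation is taken over the uncertainty about treatment efficacy, the in-trial component weights harm to randomized patients (e.g. control-arm patients denied an effective treatment, or experimental-arm patients exposed to an ineffective one), and the post-trial component weights the consequences of an incorrect approval decision for the target population.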

Results

Using the mortality rate as the outcome, we varied model parameters to represent different Ebola epidemic scenarios, such as the target population size, mortality rate, and treatment efficacy. In most cases, BDA-optimal α values exceeded the conventional one-sided 2.5% rate and BDA-optimal sample sizes were smaller. Additionally, we conducted simulations comparing a BDA-optimized two-arm trial (fixed or sequential) to standard designs (two-arm/single-arm, fixed/sequential) across various outbreak scenarios. Overall, statistical power remained comparable across designs, except when sample size assumptions were incorrect or when the trial started after the outbreak peak; in these situations, BDA-optimized trials achieved superior power.

Conclusion

This BDA adaptation provides a new framework for designing trials with a binary outcome, enabling more effective evaluation of therapeutic options. It is particularly valuable for diseases with high mortality rates and limited treatment options. In an outbreak context, where case numbers decline after the epidemic peak and there is uncertainty around mortality rate and treatment efficacy, BDA-optimized trials offer an interesting approach for evaluating new experimental treatments.



posters-wednesday-BioZ: 8

Relevance of Electronic Medical Records for Clinical Trial Eligibility: A Feasibility Assessment in Acute Stroke Studies

Yusuke Sasahara1, Taizo Murata2, Yasufumi Gon3,4, Toshihiro Takeda2,5, Eisuke Hida1

1Department of Biostatistics and Data Science, Osaka University Graduate School of Medicine; 2Department of Medical Informatics, Osaka University Hospital; 3Department of Neurology, Osaka University Graduate School of Medicine; 4Academic Clinical Research Center, Osaka University Hospital; 5Department of Integrated Medicine, Medical Informatics, Osaka University Graduate School of Medicine

Electronic medical records (EMRs) are a key source of real-world data in clinical trials. In hyperacute-phase diseases, where conducting RCTs is challenging, external control arms using EMRs are expected to enhance trial feasibility. In July 2024, the FDA released guidance on evaluating EMRs and claims data to support regulatory decision-making, emphasizing the importance of ensuring data reliability and relevance. However, evidence on how well EMRs meet these criteria remains limited. This study evaluates the feasibility of extracting clinical trial eligibility criteria from EMRs, focusing on data extraction and structuring in acute stroke studies.

Five acute stroke-related clinical trials with detailed eligibility criteria were selected from the jRCT and UMIN-CTR databases. Registration forms were created for each trial, and an expert panel (physician, medical informatician, statistician, and data manager) evaluated the feasibility of extracting these criteria from EMRs at Osaka University Hospital. Data types were categorized into four groups: structured, mosaic (a mix of structured and unstructured), unstructured, and unavailable. The proportion of each type was summarized by trial and item category, and extraction feasibility was scored (structured: 3, mosaic: 2, unstructured: 1, unavailable: 0). Data were visualized using bar charts, box plots, and radar charts.

Across all five trials, structured data accounted for 37.6%, mosaic for 12.1%, unstructured for 42.3%, and unavailable for 8.1%. The proportion of unstructured data varied among trials, with Trial B having the highest (68.3%) and Trial C the lowest (15.8%). Trial A had the highest proportion of unavailable data (16.7%). Imaging-related variables were entirely unstructured (100%), and medical history/comorbidity (84.6%) and diagnosis (61.1%) also largely lacked structure. In contrast, structured data predominated for demography (80.0%), treatment applicability (62.5%), and laboratory/vital signs (56.3%).

The study assessed how well EMRs align with clinical trial eligibility criteria in order to evaluate their relevance. Due to the variability in EMR availability across trials and items, a preliminary assessment is necessary for each protocol. Since 42.3% of all items were unstructured, manual chart review may be unavoidable. Structured data were more prevalent in demography and treatment applicability, whereas imaging and medical history/comorbidity data posed major challenges. FDA guidance highlights the need for validation and bias assessment in data transformation, requiring standardized processes to enhance EMR relevance for regulatory use.

The feasibility of extracting eligibility criteria and the degree of structuring in EMRs varied across trials and items. While imaging and medical history/comorbidity data were poorly structured, developing standardized data extraction methods may enhance the relevance of EMRs.



posters-wednesday-BioZ: 9

Navigating complex and computationally demanding clinical trial simulation

Saumil Shah, Mitchell Thomann

Boehringer Ingelheim, Germany

Many diseases lacking treatment options have multiple correlated endpoints that serve as progression biomarkers. Establishing efficacy across several such endpoints within a randomised dose-finding trial represents an unmet need.

A seamless Phase IIa-IIb trial design was proposed, featuring staggered recruitment, dropouts, longitudinal and correlated endpoints, and an interim analysis. The design also incorporated historical information through Bayesian meta-analytic priors and Bayesian dose-finding methods to improve trial efficiency. Scenario planning across a wide range of effects, dose-response models and endpoint correlations is a considerable challenge. Thus, a robust trial simulation implementation was required to estimate operating characteristics precisely and optimise the study design.

We used a random-intercept and random-slope model to capture the longitudinal endpoint and patient-level variance. The correlated secondary endpoint was generated from conditional distributions. The informative historical prior was updated with the generated data to obtain a posterior, which we used in the interim analysis to compare the across-arm gains in the change from baseline. The final analysis used a multiple comparison procedure and Bayesian modelling for randomised dose-finding. We considered six appropriate candidate dose-response models for the Bayesian modelling. Each endpoint was assigned go/no-go boundaries for stop, continue or success decisions, made using the median of the posterior distribution from the fitted Bayesian models.

We used R programming language and available open-source packages to implement the trial simulation. The data generation and analysis steps were implemented as a collection of functions in a pipeline. The pipeline was managed using {targets}, a workflow management package. Such management allowed us to handle many scenarios and replicates, preventing redundant and unnecessary computations. It also helped with parallel execution, bringing execution time to the order of hours on a high-performance cluster.
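
A minimal sketch of such a {targets} pipeline is shown below; the scenario grid and the functions simulate_trial(), analyse_trial(), and summarise_oc() are illustrative stubs, not the actual implementation.

```r
# _targets.R -- illustrative skeleton of a scenario-by-replicate simulation pipeline
library(targets)
library(tarchetypes)

# Stub functions standing in for the real data generation and analysis steps
simulate_trial <- function(effect, corr) rnorm(100, mean = effect)   # placeholder generator
analyse_trial  <- function(x) t.test(x)$p.value                      # placeholder analysis
summarise_oc   <- function(p) mean(unlist(p) < 0.025)                # placeholder operating characteristic

scenarios <- expand.grid(effect = c(0, 0.3, 0.6), corr = c(0.2, 0.5))  # placeholder grid

list(
  tar_map(
    values = scenarios,
    tar_target(sims,    replicate(1000, simulate_trial(effect, corr), simplify = FALSE)),
    tar_target(results, lapply(sims, analyse_trial)),
    tar_target(oc,      summarise_oc(results))
  )
)
# tar_make() rebuilds only targets whose code or upstream data changed;
# parallel backends (e.g. crew) distribute the work on a high-performance cluster.
```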

Our implementation enabled the rapid exploration of a wide range of trial scenarios and treatment effects, allowing reliable estimation of the operating characteristics of each design aspect. This approach provides a potent tool for optimising clinical trial design across therapeutic areas.



posters-wednesday-BioZ: 10

Transforming Clinical Trials: The Power of Synthetic Data in Augmenting Control Arms

Emmanuelle Boutmy1, Shane O Meachair2, Julie Zhang5, Sabrina de Souza1, Saheli Das4, Dina Oksen3, Anna Tafuri1, Lucy Mosquera5,6

1Merck KGaA, Darmstadt, Germany; 2Aetion, Barcelona, Spain; 3Merck Biopharma Co., Ltd.; 4Merck Specialities Pvt., Ltd.; 5Aetion, New York, USA; 6CHEO Research Institute, Ottawa, Ontario, Canada

Background: Synthetic data generation (SDG) creates artificial datasets that replicate the characteristics of clinical trial (CT) data, potentially mitigating challenges when real data is scarce. This study aimed to explore methods for Synthetic Data Augmentation (SDA): augmentation (adding synthetic data to original data) of a CT control arm. Data from the INTRAPID lung 0037 control arm were used, consisting of advanced NSCLC patients with high PD-L1 expression treated with pembrolizumab (n=152).

Methods: Three generative models were employed to create synthetic data: sequential decision trees (SDT), Bayesian networks (BN), and transformer synthesis (TS), alongside a reference approach using bootstrapping (BS). Descriptive statistics, parameter estimates, and standard errors were calculated with a multiple-imputation approach based on 10 synthetic datasets. The quality of the synthetic data was assessed through utility and workload-specific assessments. The primary outcome was bias relative to the full CT control arm estimate, together with the standard deviation to assess variability across samples. Bias assessments compared augmented estimates to progression-free survival (PFS) from the full control arm, simulating scenarios with 50% of the control arm data unavailable.
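
Assuming standard multiple-imputation combining rules are applied (synthesis-specific variants exist and the study's exact rule is not reproduced here), estimates from the m = 10 synthetic datasets are pooled as

\bar{q} = \frac{1}{m}\sum_{l=1}^{m}\hat{q}_l, \qquad T = \bar{v} + \Big(1 + \frac{1}{m}\Big)\, b,

where \hat{q}_l and \hat{v}_l are the estimate and its variance from synthetic dataset l, \bar{v} is the mean of the \hat{v}_l, and b the between-dataset variance of the \hat{q}_l.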

Results: Univariate distances and multivariate relationships were below the pre-specified threshold, indicating close replication of the real data distribution for all models except TS. Synthetic data produced outcomes comparable to real data, with bias in PFS ranging from -2.69 months for TS to +0.2 months for SDT, where values closer to zero indicate better performance (-2.22 months for BN, -0.71 months for BS). SDT synthesis demonstrated the lowest bias among all augmentation methods, including the reduced control sample alone. In the sensitivity analysis, SDT was the only approach whose 95% interval included the ground-truth PFS estimate from the full CT control arm.

Discussion: These results suggest that generative models can yield nearly identical distributions for real and synthetic variables. SDA was shown to yield estimates with low bias compared to using the available CT data alone, and may be leveraged for clinical trials where patient enrollment in the control arm is difficult, for example to simulate trial scenarios or to complete datasets for underrepresented groups. Further research is needed to confirm these findings, to develop a synthetic validation framework that assesses the limits of SDA for statistical inference, and to quantify its impact on other statistical quantities such as power and type I error rates, in order to harness the transformative potential of synthetic data.



posters-wednesday-BioZ: 11

Integrating stakeholder perspectives in modeling routine data for therapeutic decision-making

Michelle Pfaffenlehner1,2, Andrea Dreßing3,4, Dietrich Knoerzer5, Markus Wagner6, Peter Heuschmann7,8,9, André Scherag10, Harald Binder1,2, Nadine Binder2,11

1Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center – University of Freiburg, Germany; 2Freiburg Center for Data Analysis and Modeling and AI, University of Freiburg, Freiburg, Germany; 3Department of Neurology and Clinical Neuroscience, Medical Center, University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany; 4Freiburg Brain Imaging Center, Faculty of Medicine, Medical Center–University of Freiburg, University of Freiburg, Freiburg, Germany; 5Roche Pharma AG, Grenzach, Germany; 6Stiftung Deutsche Schlaganfall-Hilfe, Gütersloh, Germany; 7Institute for Medical Data Sciences, University Hospital Würzburg, Würzburg, Germany; 8Institute for Clinical Epidemiology and Biometry, University Würzburg, Würzburg, Germany; 9Clinical Trial Centre, University Hospital Würzburg, Würzburg, Germany; 10Institute of Medical Statistics, Computer and Data Sciences, Jena University Hospital – Friedrich Schiller University Jena, Jena, Germany; 11Institute of General Practice/Family Medicine, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg, Germany

Background

Routine medical data offer a valuable resource for generating evidence to improve patient care in therapeutic contexts beyond randomized controlled trials. These data include patient-related parameters, diagnostic information, and treatment data recorded in digital patient records from hospital admission to discharge. With the introduction of the German Health Data Use Act (GDNG) in 2024, the use of such data is becoming more accessible in Germany. However, methodological approaches must account for the diverse needs of stakeholders, including clinicians, the pharmaceutical industry, patient advocacy groups, and statistical modelers. This study explores how different perspectives shape the use and interpretation of routine data in medical decision-making, with each perspective aiming to address specific research questions.

Methods

Building on insights from an interdisciplinary workshop that we recently organized, we examine how various stakeholder perspectives can be incorporated into the modelling of routine data. We discuss key routine data sources, such as electronic health records, and highlight statistical and artificial intelligence (AI)-based techniques that could be used to extract meaningful insights. Moreover, the linkage of patient-reported outcomes will be discussed to address the patient’s perspective. Additionally, we illustrate how different modelling approaches address distinct research questions, reflecting the priorities of the stakeholder groups. A particular focus is placed on multi-state models, which are well-suited for capturing disease and treatment trajectories by structuring diagnoses and treatments as transition events over time.

Results

Our conceptual analysis identifies multiple approaches for integrating diverse perspectives into routine data modelling. For example, clinicians prioritize clinical relevance and interpretability, the pharmaceutical industry focuses on regulatory compliance and real-world evidence, while patient representatives emphasize transparency and inclusion of patient-reported outcomes. Multi-state models are particularly advantageous because they allow the characterization of dynamic disease processes and patient transitions between states, offering a more accessible and interpretable approach to routine data analysis. Still, challenges remain in data quality.

Conclusion

Effective use of routine data in medical decision-making requires robust analytical methods that meet the needs of diverse stakeholders. Multi-state models provide a dynamic framework for capturing disease progression and treatment pathways, making them particularly suitable for clinical and regulatory applications. To maximize their impact, future research should focus on improving data integration, transparency in methods used, and making the methods practically useful, leading to better integration into healthcare decision-making.



posters-wednesday-BioZ: 12

Aligning Synthetic Trajectories from Expert-Based Models with Real Patient Data Using Low-Dimensional Representations

Hanning Yang1, Meropi Karakioulaki2, Cristina Has2, Moritz Hess1, Harald Binder1

1Institute of Medical Biometry and Statistics (IMBI), University of Freiburg, Germany; 2Department of Dermatology, University of Freiburg, Germany

Background:

Quantitative models, such as ordinary differential equations (ODEs), are widely used to describe dynamic processes such as disease progression - e.g., for subsequently generating synthetic data. However, calibrating them with real patient data, which is typically sparse, noisy, and highly heterogeneous, can be challenging. This is particularly notable in rare diseases like Epidermolysis Bullosa (EB), where observations are limited and data are often missing. To address this, we developed an approach to calibrate expert-informed ODEs with real, observational patient data using low-dimensional representations.

Methods:

We developed an expert-informed ODE system to model the dynamics of key EB biomarkers and employed an autoencoder for dimensionality reduction. Calibration of the ODE parameters was informed by a loss computed from the distance between real and ODE-derived synthetic observations in the latent space. Specifically, this loss captures key trajectory features, including temporal alignment and pointwise differences. To handle discrepancies in initial conditions, a centring approach is applied during early iterations, and an imputation layer is trained to address missing data.
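
In generic form (our notation, not the authors' exact objective), the calibration minimizes a latent-space discrepancy of the type

\mathcal{L}(\theta) = d\big( E(\mathbf{y}^{\mathrm{real}}),\; E(\mathbf{y}^{\mathrm{ODE}}(\theta)) \big),

where E(\cdot) denotes the autoencoder's encoder, \mathbf{y}^{\mathrm{ODE}}(\theta) the synthetic trajectories produced by the expert-informed ODE system with parameters \theta, and d a distance combining the temporal-alignment and pointwise terms described above.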

Results:

A simulation study demonstrated robustness under high noise and complex missing patterns, with parameters converging to the ground truth. When applied to real EB data, our method consistently improved the alignment between synthetic and real data, despite the challenges of noisy and sparse observations from only 21 highly diverse patients. As a result, relationships in synthetic data became more consistent with real patient data.

Conclusion:

This study presents a novel approach for calibrating an expert-informed synthetic data model using neural networks, supporting realistic synthetic individual patient data (IPD) generation and advancing rare disease research.



posters-wednesday-BioZ: 13

Integrating semantic information in care pathway studies with medical code embeddings, application to the case of Amyotrophic Lateral Sclerosis

Corentin FAUJOUR1,2, Stéphane BOUEE1, Corinne EMERY1, Anne-Sophie JANNOT2,3

1CEMKA, Bourg-La-Reine, France; 2Université Paris Cité, Inria, Inserm, HeKA, F-75015 Paris, France; 3French National Rare Disease Registry (BNDMR), Greater Paris University Hospitals (AP-HP), Université Paris Cité, Paris, France

Background

Modelling care pathways from claims databases is a challenging task given the tens of thousands of existing medical codes, sometimes associated with the same medical concept. In such modelling, medical codes are usually represented as sets of binary variables (one-hot encoding), which does not allow for the inclusion of semantic information. Embedding medical codes in a continuous space, so that semantically related codes are represented by numerically similar vectors, could improve care pathway modelling.

We aimed to embed codes from the International Classification of Diseases (ICD-10) and the Anatomical Therapeutic Chemical Classification (ATC) into a common latent space. A secondary goal was to use these embeddings in the prediction of amyotrophic lateral sclerosis (ALS).

Methods

A co-occurrence matrix between codes was constructed from care sequences contained in the ESND, a French claims database containing the healthcare consumption of a representative sample of 1.5 million patients over 15 years. Code embeddings for all five classification systems available in the ESND, i.e. representative numerical vectors that capture semantic relationships, were then obtained using singular value decomposition of the corresponding pointwise mutual information matrix.
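
The construction behind such embeddings can be sketched as follows (generic notation; the embedding dimension and weighting choices of the study are not reproduced here):

\mathrm{PMI}(i, j) = \log \frac{p(i, j)}{p(i)\, p(j)}, \qquad M \approx U_d \Sigma_d V_d^{\top},

where p(i, j) is the co-occurrence probability of codes i and j estimated from the care sequences, M is the matrix of pointwise mutual information values, and the code embeddings are taken from the leading d singular vectors, e.g. as the rows of U_d \Sigma_d.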

The consistency of the embeddings was assessed using UMAP visualisation and nearest-neighbour searches. The resulting embeddings were used to predict the occurrence of ALS in a penalized logistic regression model, taking as input all codes in the care sequence prior to diagnosis. Sequence-level embeddings were obtained by an average-pooling operation over the code-level embeddings. We compared the performance obtained using embeddings as input with that obtained using one-hot encoding.

Results

We obtained embeddings for 30,000 codes, including 9,900 ICD-10 codes and 1,400 ATC codes from 1.5 million care pathways representing 400 million tokens. Consistency evaluation revealed that semantically related codes form clusters in the latent space, e.g., the diagnosis code for motor neuron disease is surrounded by other muscle disorders (myopathies, muscular dystrophy, etc.) and its specific treatment (riluzole).

Using the resulting embeddings to classify sequences from 22,000 ALS patients and 22,000 matched controls, we were able to significantly improve predictive performance (AUC: 0.78, 95% CI [0.77-0.79] with embeddings vs. 0.74 [0.73-0.75] with one-hot encoding). This suggests that the inclusion of semantic information is relevant for such a prediction task.

Conclusion

This is the first semantic representation of ICD-10 and ATC codes in a common latent space, two classifications commonly used in claims databases. The resulting embeddings can be used to improve the representation of healthcare pathways.



posters-wednesday-BioZ: 14

Inequalities in impact of respiratory viruses: development and analysis of respiratory virus phenotypes in EHRs from England using OpenSAFELY

Em Prestige1, Jennifer K. Quint2, Charlotte Warren-Gash1, William Hulme3, Edward PK Parker1, Elizabeth Williamson1, Rosalind M. Eggo1

1London School of Hygiene & Tropical Medicine, United Kingdom; 2Imperial College London, United Kingdom; 3Bennett Institute for Applied Data Science, Nuffield Department of Primary Care Health Sciences, University of Oxford, United Kingdom

Background

Respiratory virus burden is large and unequally distributed in England, with a disproportionate impact in socioeconomically deprived areas and minority ethnic groups. To explore these disparities using electronic health records (EHRs), computable phenotypes must be designed to identify reported respiratory virus health events. However, many EHR codes are non-specific or uncertain; for example, a patient could have codes for ‘cough’ or ‘suspected influenza’, and neither of these would be a highly specific identifier of a flu case. The desired sensitivity and specificity of the phenotypes should therefore determine which codes to include. This research explores the design of phenotypes to identify patients with respiratory viruses - respiratory syncytial virus (RSV), influenza (flu), and COVID-19 - and their subsequent application to explore disparities in the impact of these conditions. We highlight the trade-offs between sensitivity and specificity in phenotype design and their implications for identifying health disparities.

Methods

With the approval of NHS England, we used pseudonymized GP data in OpenSAFELY, linked with Hospital Episode Statistics (HES) and ONS mortality data, to develop phenotypes for mild (primary/emergency care) and severe (secondary care) respiratory outcomes. For each virus, we created maximally sensitive and maximally specific phenotypes to capture cases with greater frequency or accuracy, respectively. Maximally sensitive phenotypes included non-specific symptoms and suspected-diagnosis codes, whereas maximally specific phenotypes included lab test results. We then identified disparities by socioeconomic status and ethnicity in these outcomes from 2016-2024. We used Poisson regression for rates of mild and severe outcomes per 1000 person-years, adjusting for age group, sex, rurality, and where relevant, vaccination status. We performed analyses on the NHS records of approximately 45% of England’s population, presenting a unique opportunity to explore respiratory outcomes in cohorts where cases are rare or under-ascertained.
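
The rate models take the generic form (covariate list abbreviated; variable coding is ours):

\log \mathbb{E}[Y_i] = \log(\mathrm{person\text{-}years}_i) + \beta_0 + \beta_1\, \mathrm{deprivation}_i + \beta_2\, \mathrm{ethnicity}_i + \boldsymbol{\gamma}^{\top}\mathbf{z}_i,

where Y_i is the count of mild or severe outcomes, the offset converts counts to rates per person-time, and \mathbf{z}_i collects the adjustment variables (age group, sex, rurality and, where relevant, vaccination status).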

Results

We report differences and overlap in cases identified using specific versus sensitive phenotypes across the three pathogens. We describe the extent to which disparities in respiratory outcomes vary by pathogen, age cohort and severity of disease and use adjusted models to explore patterns of risk across ethnicity and socioeconomic status in different phenotypes.

Conclusion

Both highly specific and sensitive computable phenotypes are essential tools in EHR research. Their design should align with research objectives, balancing accuracy with the required number of outcomes. Exploring multiple phenotype definitions supports sensitivity analyses and subgroup evaluations. Furthermore, disparities in respiratory virus outcomes highlight the pathogen-specific risks and age-related vulnerabilities that should be targeted to minimise health inequities.



posters-wednesday-BioZ: 15

Modeling Longitudinal Clinical Outcomes: Comparison of Generalized Linear Models, Generalized Estimating Equations, and Marginalized Multilevel Models in Pediatric Intensive Care

Luca Vedovelli1, Stefania Lando1, Danila Azzolina2, Corrado Lanera1, Ileana Baldi1, Dario Gregori1

1University of Padova, Italy; 2University of Ferrara, Italy

Introduction Longitudinal data analysis is essential in neonatal and pediatric intensive care, where patient outcomes evolve rapidly, such as in sepsis progression or respiratory distress. Selecting the right statistical model is critical for accurate clinical effect estimation. We compared four modeling approaches—generalized linear models (GLM), GLM with a shrinkage factor, generalized estimating equations (GEE), and marginalized multilevel models (MMM)—in scenarios replicating real-world complexity, including random effects, latent effects, and transition dynamics. Our study evaluated model accuracy, robustness, and interpretability in small and variable cluster settings typical of intensive care units, where patient populations are often limited, heterogeneous, and subject to rapid physiological changes.

Methods We conducted a simulation study reflecting the heterogeneity of clinical trajectories in neonatal and pediatric intensive care. Scenarios included non-fixed patient clusters ranging from 4 to 10 and sample sizes between 20 and 150. Models were evaluated based on Mean Absolute Percentage Error (MAPE), Type I and Type II error rates, and parameter stability. We assessed the impact of incorporating shrinkage factors in GLM to mitigate estimation biases.
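
For reference, the accuracy metric is typically computed over the R simulation replicates as (formula generic, applied to the estimated effect):

\mathrm{MAPE} = \frac{100\%}{R} \sum_{r=1}^{R} \left| \frac{\hat{\beta}_r - \beta}{\beta} \right|,

where \beta is the true parameter value and \hat{\beta}_r its estimate in replicate r.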

Results MMM consistently outperformed GEE and GLM in small sample sizes and low cluster counts, yielding lower MAPE and reduced bias. This superior performance is due to its integration of marginal and subject-specific effects while accounting for within-cluster correlation. As sample size and cluster numbers increased, performance differences diminished. GEE and GLM exhibited high variability in small samples, with GEE particularly unstable. GLM tended to overestimate effects, inflating Type I error rates. MMM maintained a controlled Type I error rate, though at the cost of slightly reduced power.

Conclusion In neonatal and pediatric intensive care, where patient populations are small and heterogeneous, MMM is a more reliable alternative to GEE and GLM. It balances interpretability and robustness, making it well suited for longitudinal clinical applications. While GLM is adequate in large datasets, its tendency to overestimate effects warrants caution, as it may misguide clinical decisions. GEE, although widely used, is less stable in small samples. Our findings support the use of MMM for clinical research requiring accurate inference of treatment effects and patient trajectories. Future work should explore Bayesian extensions of MMM for enhanced inferential precision through improved uncertainty modeling, small-sample estimation, and incorporation of prior knowledge.



posters-wednesday-BioZ: 16

Modelling the cost-effectiveness of Truvada for the Prevention of Mother-to-Child Transmission (PMTCT) of Hepatitis B Virus in Botswana

Graceful Mulenga1,2, Motswedi Anderson1,4,5, Simani Gaseitsiwe1,3

1Botswana Harvard Health Partnership, Botswana; 2Department of Mathematics and Statistical Sciences, Faculty of Science, Botswana International University of Science and Technology, Palapye , Botswana; 3Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Boston, MA, USA; 4The Francis Crick Institute, London, UK; 5Africa Health Research Institute, Durban, South Africa

Hepatitis B virus (HBV) infection remains a major public health challenge globally, with approximately 254 million people living with chronic HBV, including over 6 million children under 5 years. Mother-to-child transmission (MTCT) of HBV is responsible for a significant portion of new infections, particularly in high-prevalence regions. Infants born to HBV-infected mothers are at risk of chronic infection, which can lead to severe liver disease later in life. Truvada (TDF), a nucleoside reverse transcriptase inhibitor, is a recommended antiviral treatment for both HBV and HIV and has shown potential in reducing MTCT of HBV. However, the cost-effectiveness of TDF for preventing MTCT in resource-limited settings like Botswana is not well established. This study aims to evaluate the feasibility and cost-effectiveness of three distinct strategies for screening and managing HBV among pregnant women in Botswana.

The study will use a cohort of pregnant women in Botswana, assessing three groups as follows: (i) no HBV screening or treatment is provided, and TDF prophylaxis is not administered (control group); (ii) screening for hepatitis B surface antigen (HBsAg) is conducted for all pregnant women, with TDF prophylaxis administered to those who test positive for HBsAg, beginning at 28 weeks' gestation and continuing for four weeks postpartum; (iii) screening for both HBsAg and HBV e-antigen (HBeAg) is performed, and TDF prophylaxis is administered exclusively to women who test positive for both HBsAg and HBeAg. A cost-utility analysis (CUA) will be conducted to compare the costs and clinical outcomes of each strategy, with effectiveness measured in terms of the number of HBV transmissions prevented. Costs will include screening for HBsAg and HBeAg, TDF treatment, hepatitis B immunoglobulin (HBIG) for infants, all components of the intervention (such as training, administration, and supervision) and maternal healthcare. In addition, a decision-analytic model will be designed to generate cost-effectiveness estimates. The Incremental Cost-Effectiveness Ratio (ICER) will be calculated to assess the cost per case of HBV transmission prevented for each strategy. Moreover, sensitivity analyses will be performed to test the robustness of the results under varying assumptions related to drug costs, screening effectiveness, and intervention costs.
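
The ICER for a screening strategy s relative to a comparator takes the usual form

\mathrm{ICER}_s = \frac{C_s - C_0}{E_s - E_0},

where C denotes total cost and E effectiveness (here, the number of MTCT HBV transmissions prevented), so the ratio is interpreted as the incremental cost per additional transmission averted.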



posters-wednesday-BioZ: 17

Application of machine learning methods for the analysis of randomised controlled trials: A systematic review

Xiao Xuan Tan, Rachel Phillips, Mansour Taghavi Azar Sharabiani

Imperial College London

Background

Randomised controlled trials (RCTs) collect extensive data on adverse events (AEs), yet their analysis and presentation are often overly simplistic, leading to missed opportunities for identifying potential signals of treatment-related harm. A 2024 scoping review identified a variety of machine learning (ML) approaches being employed in RCTs to identify heterogeneous treatment effects (HTEs) across key participant subgroups [1]. This highlights the range of ML methods being explored to derive insights from RCT data. ML methods hold potential to enhance AE analysis, offering tools to better interpret complex AE data and support data-driven, personalised treatment harm profiles. This review aims to identify ML methods and evaluate their applications for the analysis of RCT data, revealing both established and potentially suitable ML approaches that could be adapted to analyse AE data in RCTs. Additionally, this review will highlight emerging trends in ML applications to RCTs, including shifts in commonly used techniques, evolving best practices, and expanding use cases beyond HTE analysis.

Methods

A systematic search was conducted in November 2024 via the Embase, MEDLINE, Web of Science and Scopus databases, alongside the preprint repositories arXiv, medRxiv and bioRxiv. Articles were eligible if they applied ML methods to analyse or reanalyse RCT datasets, irrespective of the types of outcomes examined, and accounted for the RCT’s treatment assignment in their analyses. Following screening, a pre-piloted data extraction sheet will be used to systematically collect relevant study details.

Results

After deduplication, 11,286 articles were retrieved. Following title and abstract review, 2,015 articles were eligible for full-text review. Data extraction and synthesis are underway. The results presented will describe (i) study characteristics (e.g., purpose of analysis), (ii) RCT characteristics (e.g., medical area, trial design, outcomes examined), and (iii) ML methods used, including model implementation details, use of explainability tools (e.g., SHAP, LIME), results, limitations, and reproducibility considerations (e.g., software, code availability, dataset access).

Conclusion

The findings of this review will provide a comprehensive overview of applications of ML methods in RCTs, guiding trialists in their potential use for future trial design and analysis. Additionally, it will pinpoint ML techniques most relevant to the analysis of AEs, an area where more advanced analytical approaches are needed to facilitate early identification of potential signals of harm and improve the understanding of treatment-related harm.

[1] Inoue K, et al. Machine learning approaches to evaluate heterogeneous treatment effects in randomized controlled trials: a scoping review. Journal of Clinical Epidemiology. 2024; 176: 111538.



posters-wednesday-BioZ: 18

Joint longitudinal modelling of non-normally distributed outcomes and endogenous covariates

Chiara Degan1, Bart Mertens1, Pietro Spitali2, Erik H. Niks2, Jelle Goeman1, Roula Tsonaka1

1Department of Biomedical Data Sciences, Leiden University Medical Center, The Netherlands; 2Department of Human Genetics, Leiden University Medical Center, The Netherlands

In biomedical research, longitudinal outcomes and endogenous time-dependent covariates are often recorded, creating the need for methodological approaches that assess their association, evaluate how the outcome changes in relation to the covariate, and determine how this relationship evolves over time.
To address these aspects, the endogenous covariate and the outcome are typically modelled jointly by assuming correlated random effects (Verbeke et al., 2014). We refer to this model as the Joint Mixed Model (JMM). This approach allows variables of different types to be combined, while preserving the parameter interpretation of the univariate case and accommodating unbalanced data. However, the association is interpreted through the correlation of the random effects rather than directly on the scale of the observed variables. Moreover, by making assumptions about the form of the variance-covariance matrix of the random effects, we impose constraints on the form of the association between the variables that may lead to biased estimates if misspecified.

As an alternative, we consider a modification of the joint model proposed by Rizopoulos (2017), adapting it to include only longitudinal outcomes rather than a time-to-event component. We refer to this adapted model as the Joint Scaled Model (JSM). It induces the association by copying and scaling the linear predictor of the endogenous covariate into the linear predictor of the outcome. This approach preserves the advantages of the JMM while improving interpretability.
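
Schematically, in our notation, the two specifications differ in how the association enters the outcome's linear predictor:

\text{JMM:}\quad \eta^{(1)}_i(t) = \mathbf{x}_i(t)^{\top}\boldsymbol{\beta}_1 + \mathbf{z}_i(t)^{\top}\mathbf{b}_{1i}, \qquad \eta^{(2)}_i(t) = \mathbf{x}_i(t)^{\top}\boldsymbol{\beta}_2 + \mathbf{z}_i(t)^{\top}\mathbf{b}_{2i}, \qquad (\mathbf{b}_{1i}, \mathbf{b}_{2i}) \sim N(\mathbf{0}, \mathbf{D}),

\text{JSM:}\quad \eta^{(2)}_i(t) = \mathbf{x}_i(t)^{\top}\boldsymbol{\beta}_2 + \mathbf{z}_i(t)^{\top}\mathbf{b}_{2i} + \alpha\, \eta^{(1)}_i(t),

where \eta^{(1)} and \eta^{(2)} denote the linear predictors of the endogenous covariate and the outcome, respectively; in the JMM the association is carried entirely by the covariance matrix D of the random effects, whereas in the JSM it appears directly as the scaling coefficient \alpha applied to the covariate's linear predictor.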

To compare the results of the two models and assess the impact of their underlying assumptions on the conclusions, we propose to analytically derive an association coefficient that measures the marginal relation between the variables. The purpose of this coefficient is to construct a quantity that has the same meaning in both models and can be interpreted similarly to a regression coefficient: it measures the change in the outcome in response to a unit change in the covariate. Furthermore, it is a quantity that depends on the time points of both variables, allowing it to capture the cross-sectional effect of the endogenous covariate on the outcome as well as their relationship at different time points (lag effect).

The practical application of these models is limited by computational costs, which arise from high-dimensional integration over the random effects. To address this, a flexible Bayesian estimation approach, integrated nested Laplace approximation (INLA), has been used.

We will present the results of a longitudinal study on Duchenne Muscular Dystrophy patients, with a focus on evaluating the relationship between a bounded outcome and blood biomarkers.



posters-wednesday-BioZ: 19

Mediation analysis for exploring gender differences in mortality among acute myocardial infarction

Alice Bonomi, Arianna Galotta, Francesco Maria Mattio, Lorenzo Cangiano, Giancarlo Marenzi

IRCCS Centro Cardiologico Monzino, Italy

Background. Women with acute myocardial infarction (AMI) have higher mortality rates than men, influenced by factors such as older age, comorbidities, atypical symptoms, and treatment delays. This study analyzed AMI patients (2003-2018) from the Lombardy Health Database (Italy) to investigate sex differences in in-hospital and one-year mortality, assessing the impact of age, percutaneous coronary intervention (PCI), and post-discharge therapy using mediation analysis.

Methods. Among 263,564 AMI patients (93,363 women, 170,201 men), the primary and secondary endpoints were in-hospital and one-year mortality, respectively. Mediation analysis was performed to evaluate the direct and indirect effects of sex on outcomes, incorporating age, PCI, and post-discharge therapy as mediators. The analysis was conducted using the SAS Proc CALIS procedure (SAS Institute Inc., Cary, NC, USA) based on structural equation modeling, with relationships quantified using standardized β coefficients.

Results. Women had significantly higher in-hospital mortality (10% vs. 5%; P<0.0001) and one-year mortality (24% vs. 14%; P<0.0001) compared to men. Mediation analysis revealed that female sex directly accounted for 12% of the disparity in in-hospital mortality and 4% of the disparity in one-year mortality, whereas age and undertreatment accounted for the majority of the disparity (88% [β=0.09] and 96% [β=0.15], respectively).

Conclusion. Women with AMI experience higher mortality, primarily due to older age and undertreatment, both during hospitalization and after discharge. Addressing these disparities through optimized treatment strategies may improve outcomes in women with AMI.



posters-wednesday-BioZ: 20

Bivariate random-effects models for the meta-analysis of rare events

Danyu Li, Patrick Taffe

Center for Primary Care and Public Health (unisanté), Division of Biostatistics, University of Lausanne (UNIL), Switzerland

It is well known that standard methods of meta-analysis, such as the inverse variance or DerSimonian and Laird methods, break down with rare binary events. Not only are effect sizes and within-study variances badly estimated, but heterogeneity is also generally not identifiable or strongly underestimated, and the overall summary index is biased. Many alternative estimation methods have been proposed to improve estimation in sparse-data meta-analysis. In addition to the Bivariate Generalized Linear Mixed Model (BGLMM), the Marginal Beta-Binomial and Sarmanov Beta-Binomial models are competitive alternatives. These models have already been used for meta-analysis of diagnostic accuracy studies, where the correlation between sensitivity and specificity is likely to be strongly negative. To the best of our knowledge, however, they have not been investigated in the context of rare events and sparse-data meta-analysis with a focus on estimating the Risk Difference (RD), Relative Risk (RR), and Odds Ratio (OR). The goal of this study was therefore to assess the performance and robustness of these three competing models in this context. More specifically, the robustness of each model is assessed using data-generating processes based on the other two competing models. For example, if the data were simulated from the Sarmanov distribution, then the BGLMM and Marginal Beta-Binomial models are misspecified, and assessing their robustness is of interest. According to the simulation results, the BGLMM performs worst regardless of which distribution generated the data. The Sarmanov Beta-Binomial and Marginal Beta-Binomial models perform better and are more stable owing to their lower variance.
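As a point of reference for one of the models compared here, the sketch below fits a bivariate generalized linear mixed model (BGLMM) to arm-level binomial data with lme4; the simulated data, variable names and effect sizes are illustrative assumptions, not the authors' simulation design.

```r
# Hypothetical sketch: BGLMM for a sparse-event meta-analysis (arm-level binomial data).
# Assumed data layout: one row per study arm with columns study, arm (0 = control,
# 1 = treatment), events, and n.
library(lme4)

set.seed(1)
k <- 20                                        # number of studies
dat <- data.frame(
  study = factor(rep(seq_len(k), each = 2)),
  arm   = rep(c(0, 1), times = k),
  n     = rep(sample(50:200, k, replace = TRUE), each = 2)
)
p <- plogis(-4 + 0.5 * dat$arm + rep(rnorm(k, 0, 0.4), each = 2))  # rare events
dat$events <- rbinom(nrow(dat), dat$n, p)

# Correlated random intercept and treatment effect per study (the bivariate structure).
fit <- glmer(cbind(events, n - events) ~ arm + (arm | study),
             family = binomial, data = dat)

exp(fixef(fit)["arm"])   # pooled odds ratio on the natural scale
```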



posters-wednesday-BioZ: 21

Time-varying Decomposition of Direct and Indirect Effects with Multiple Longitudinal Mediators

Yasuyuki Okuda1, Masataka Taguri2

1Daiichi Sankyo Co., Ltd., Japan; 2Tokyo Medical University

Recent advances in mediation analysis using causal inference techniques have led to the development of sophisticated methods for complex scenarios, including those involving multiple time-varying mediators. Although these approaches accommodate time-varying mediators, their estimates are typically restricted to a single timepoint of interest, thus limiting our understanding of the temporal dynamics of mediation processes. In many clinical contexts, it is essential to capture how mediator effects vary over time to elucidate underlying mechanisms and optimize intervention timing. For example, temporal variations in direct and indirect effects can reveal critical windows during which a treatment exerts its primary influence. To address these limitations, we proposed a novel framework that extends existing approaches based on interventional direct and indirect effects with multiple time-varying mediators and treatment-mediator interaction.

Our method not only decomposes the overall effect into direct and indirect effects, but also further decomposes these effects into time-varying components to investigate mediated effects both up to and beyond a timepoint t, thereby capturing their longitudinal trajectories. We also proposed a practical estimation approach using marginal structural models (MSMs) for both the outcome and the mediators, with inverse probability weighting (IPW) to account for time-varying confounders.
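As a generic illustration of the estimation machinery mentioned above, the sketch below computes stabilized inverse probability weights for a binary time-varying exposure via pooled logistic regression; the data layout and variable names (id, time, a, a_lag, l) are assumptions, and the authors' implementation for mediators and outcomes may differ.

```r
# Hypothetical sketch: stabilized IPW for a time-varying binary exposure, to be used
# when fitting a marginal structural model. Long-format data with columns:
# id, time, a (exposure), a_lag (previous exposure), l (time-varying confounder).
set.seed(2)
n <- 500; times <- 3
dat <- do.call(rbind, lapply(1:n, function(i) {
  l <- rnorm(times); a <- numeric(times)
  for (t in 1:times) {
    a_prev <- if (t == 1) 0 else a[t - 1]
    a[t] <- rbinom(1, 1, plogis(-0.5 + 0.8 * l[t] + 1.2 * a_prev))
  }
  data.frame(id = i, time = 1:times, a = a, a_lag = c(0, head(a, -1)), l = l)
}))

num <- glm(a ~ a_lag,     family = binomial, data = dat)  # numerator: past exposure only
den <- glm(a ~ a_lag + l, family = binomial, data = dat)  # denominator: adds confounder
p_n <- ifelse(dat$a == 1, fitted(num), 1 - fitted(num))
p_d <- ifelse(dat$a == 1, fitted(den), 1 - fitted(den))
dat$sw <- ave(p_n / p_d, dat$id, FUN = cumprod)           # cumulative stabilized weight

summary(dat$sw)  # these weights would then enter weighted MSMs for outcome and mediators
```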

To illustrate the utility of our method, we applied it to the data from a randomized controlled trial evaluating the effect of a mineralocorticoid receptor (MR) blocker on urinary albumin-to-creatinine ratio (UACR) reduction. Specifically, we investigated how much of the treatment effect is mediated by changes in blood pressure and renal function (measured by eGFR) and explored differences in their mediator-specific effects over time. Our analysis indicated that the mediated effects via both systolic blood pressure and eGFR were relatively small compared with other pathways, with different patterns observed in their longitudinal trajectories.

We believe our approach provides investigators with a valuable tool for understanding an agent's mechanism of action, distinguishing it from other agents, and ultimately informing treatment decisions appropriate for each patient.



posters-wednesday-BioZ: 22

Causal framework for analyzing mediation effects of clinical biomarkers

Jinesh Shah

CSL Behring, Germany

For a biomarker to be at least a "level 3 surrogate" that is "reasonably likely to predict clinical benefit for a specific disease and class of interventions" [1], it must be either a mediator [1,2] on the causal pathway between treatment and response, or causally downstream of such a mediator. We investigate causal mediation analysis as an approach to statistically infer potential mediation effects of biomarkers. The steps involve graphically stating the causal structure using DAGs, formulating the estimands of interest, and using statistical methods to derive estimates. However, longitudinal clinical data are commonplace, and causal estimation with such data is notoriously challenging; standard statistical methods might not provide appropriate estimates of the target quantities. Thus, we also explore methods to account for time-varying confounding in mediation analysis. One such method provides a reasonable approximation by "landmarking" the biomarker process at a particular timepoint t [3] and modeling the clinical outcome data after time t. We aim to outline the fundamental ideas of causal mediation analysis [4] and delineate a potential framework for its use in clinical development.
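A minimal sketch of the landmarking idea cited above [3], using a toy dataset and assumed variable names; it illustrates the general technique rather than the authors' analysis.

```r
# Hypothetical sketch: landmark analysis at time s. Patients still event-free at s are
# selected, the biomarker value observed at the landmark time is frozen as a baseline
# covariate, and follow-up is restarted from s.
library(survival)

set.seed(3)
n   <- 400
dat <- data.frame(
  trt     = rbinom(n, 1, 0.5),
  bm_at_s = rnorm(n),                         # biomarker value at the landmark time s
  time    = rexp(n, rate = 0.1),
  status  = rbinom(n, 1, 0.7)
)

s <- 2                                         # landmark time
lm_dat <- subset(dat, time > s)                # at risk (event-free) at s
lm_dat$time_from_s <- lm_dat$time - s          # reset the time origin to s

fit <- coxph(Surv(time_from_s, status) ~ trt + bm_at_s, data = lm_dat)
summary(fit)
```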

(1) Fleming, T.R. and Powers, J.H. Biomarkers and surrogate endpoints in clinical trials. Stat. Med. 31 (2012):2973–2984.

(2) Joffe, M.M. and Greene, T. Related causal frameworks for surrogate outcomes. Biometrics 65 (2009):530–538.

(3) Putter, H. and van Houwelingen, H.C. Understanding landmarking and its relation with time-dependent Cox regression. Stat. Biosci. 9 (2017):489–503.

(4) Imai, K., Keele, L. and Tingley, D. A general approach to causal mediation analysis. Psychol. Methods 15 (2010):309–334.



posters-wednesday-BioZ: 23

A Modified Doubly Robust Estimator for Clustered Causal Inference: Integrating GLMM and GBM

Samina Naznin, Dr. Mohaimen Monsur

Institute of Statistical Research and Training, University of Dhaka, Bangladesh, People's Republic of

Background: Causal inference uncovers cause-and-effect relationships, but confounding in observational studies complicates estimation. Propensity score methods such as inverse probability weighting (IPW) are sensitive to model misspecification, while doubly robust estimators are more reliable when either the treatment or the outcome model is correctly specified. Recent advancements in machine learning improve propensity score estimation, especially when traditional methods like logistic regression fail. However, most studies focus on single-level data and overlook clustered multilevel structures, which can lead to biased estimates. Recent efforts to combine machine learning with mixed-effects models have improved propensity score estimation in clustered settings but remain vulnerable to misspecification of the treatment model.

Methods: This study proposes a modified doubly robust estimator for causal effect estimation in clustered data that integrates generalized linear mixed models (GLMM) with generalized boosted models (GBM). This novel approach leverages GBM's ability to manage complex functional forms and incorporates a random intercept to handle clustering effects.
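The following is a simplified sketch of a doubly robust (AIPW-type) estimator for clustered data; for illustration, both nuisance models are random-intercept mixed models fitted with lme4 rather than the authors' GLMM-GBM combination, and the simulated data and variable names are assumptions.

```r
# Hypothetical sketch: doubly robust estimation of an average treatment effect with
# clustered data, using random-intercept models for treatment and outcome.
library(lme4)

set.seed(4)
m <- 40; nj <- 25; n <- m * nj
dat <- data.frame(
  cluster = factor(rep(1:m, each = nj)),
  x       = rnorm(n)
)
u <- rep(rnorm(m, 0, 0.5), each = nj)                        # cluster effects
dat$a <- rbinom(n, 1, plogis(-0.3 + 0.7 * dat$x + u))
dat$y <- 1 + 0.5 * dat$a + 0.8 * dat$x + u + rnorm(n)

ps_fit <- glmer(a ~ x + (1 | cluster), family = binomial, data = dat)
ps     <- fitted(ps_fit)                                     # cluster-aware propensity score

om_fit <- lmer(y ~ a + x + (1 | cluster), data = dat)        # outcome model
mu1    <- predict(om_fit, newdata = transform(dat, a = 1))
mu0    <- predict(om_fit, newdata = transform(dat, a = 0))

# Augmented inverse probability weighting (doubly robust) estimate of the ATE.
ate_dr <- mean(dat$a * (dat$y - mu1) / ps -
               (1 - dat$a) * (dat$y - mu0) / (1 - ps) + mu1 - mu0)
ate_dr
```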

Results: Through extensive simulation, the proposed doubly robust method with GLMM boosting outperforms existing methods in terms of bias and standard error, particularly when either the treatment or the outcome model is correctly specified. The proposed method only requires the selection of appropriate covariates and does not require correct specification of the functional form, as it incorporates GBM to handle misspecification. Even when both the treatment and outcome models are incorrect, our method still shows superior performance. Applying this method to Bangladesh Demographic and Health Survey (BDHS) 2022 data to estimate the causal effect of antenatal care (ANC) on birth weight, we find a modest impact (26.75–33.4 g) with large standard errors, suggesting no significant effect. Unlike existing methods that may overestimate effects, our approach provides a more conservative estimate.

Conclusion: Our study highlights the importance of robust causal inference methods, proposing a doubly robust GLMM Boosting approach that reduces bias in clustered data and outperforms IPW, g-formula, and standard methods, especially with misspecified treatment models. This approach offers a more reliable alternative for researchers facing model uncertainty in clustered settings, ensuring more accurate causal effect estimation. We have demonstrated its effectiveness by applying it to BDHS data, making it accessible and practical for practitioners to use in real-world scenarios.



posters-wednesday-BioZ: 24

DoubleMLDeep: Estimation of Causal Effects with Multimodal Data

Martin Spindler1,3, Victor Chernozhukov2, Philipp Bach1, Jan Teichert-Kluge1, Sven Klaassen1,3, Suhas Vijaykumar2

1Universität Hamburg, Germany; 2MIT, USA; 3Economic AI, Germany

This paper explores the use of unstructured, multimodal data, namely text and images, in causal inference and treatment effect estimation. We propose a neural network architecture that is adapted to the double machine learning (DML) framework, specifically the partially linear model. An additional contribution of our paper is a new method to generate a semi-synthetic dataset which can be used to evaluate the performance of causal effect estimation in the presence of text and images as confounders. The proposed methods and architectures are evaluated on the semi-synthetic dataset and compared to standard approaches, highlighting the potential benefit of using text and images directly in causal studies. Our findings have implications for researchers and practitioners in medicine, biostatistics and data science in general who are interested in estimating causal quantities using non-traditional data.



posters-wednesday-BioZ: 25

Causal Machine Learning Methods for Estimating Personalised Treatment Effects - Insights on validity from two large trials

Hongruyu Chen, Helena Aebersold, Milo Alan Puhan, Miquel Serra-Burriel

University of Zurich, Switzerland

Causal machine learning (ML) methods hold great promise for advancing precision medicine by estimating personalised treatment effects. However, their reliability remains largely unvalidated in empirical settings. In this study, we assessed the internal and external validity of 17 mainstream causal heterogeneity ML methods—including metalearners, tree-based methods, and deep learning methods—using data from two large randomized controlled trials: the International Stroke Trial (N=19,435) and the Chinese Acute Stroke Trial (N=21,106). Our findings reveal that none of the ML methods could be reliably validated, either internally or externally, showing significant discrepancies between training and test data on the proposed evaluation metrics. The individualized treatment effects estimated from training data failed to generalize to the test data, even in the absence of distribution shifts. These results raise concerns about the current applicability of causal ML models in precision medicine and highlight the need for more robust validation techniques to ensure generalizability.



posters-wednesday-BioZ: 26

Challenges with subgroup analyses in individual participant data meta-analysis of randomised trials

Alain Amstutz1,2,3, Dominique Costagliola4, Corina S. Rueegg2,5,6, Erica Ponzi2,5, Johannes M. Schwenke1, France Mentré7,8, Clément R. Massonnaud7,8, Cédric Laouénan7,8, Aliou Baldé4, Lambert Assoumou4, Inge C. Olsen2,5, Matthias Briel1,9, Stefan Schandelmaier9,10,11

1Division of Clinical Epidemiology, Department of Clinical Research, University Hospital Basel and University of Basel, Basel, Switzerland; 2Oslo Centre for Biostatistics and Epidemiology, Oslo University Hospital, Oslo, Norway; 3Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom; 4Sorbonne Université, Inserm, Institut Pierre-Louis d’Épidémiologie et de Santé Publique, Paris, France; 5Department of Research Support for Clinical Trials, Oslo University Hospital, Oslo, Norway; 6Epidemiology, Biostatistics and Prevention Institute, University of Zurich, Zurich, Switzerland; 7Université Paris Cité, Inserm, IAME, Paris, France; 8Département d’Épidémiologie, Biostatistique et Recherche Clinique, Hôpital Bichat, AP-HP, Paris, France; 9Department of Health Research Methods, Evidence, and Impact (HEI), McMaster University, Hamilton, Canada; 10School of Public Health, University College Cork, Cork, Ireland; 11MTA–PTE Lendület "Momentum" Evidence in Medicine Research Group, Medical School, University of Pécs, Pécs, Hungary

Background: Individual participant data meta-analyses (IPDMA) offer the opportunity to conduct credible subgroup analyses of randomized clinical trial data by standardising subgroup definitions across trials, avoiding between-trial information sharing, and enabling comparison of effects from trial to trial. These advantages are reflected and judged in items 1 and 2 of the Instrument for the Credibility of Effect Modification ANalyses (ICEMAN), a tool increasingly used by Cochrane meta-analysts and the Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group. However, guidance on optimal approaches to inform these ICEMAN items when conducting an IPDMA is limited and might differ between one-stage and two-stage IPDMA models. We recently conducted two large IPDMAs, analysing 20 COVID-19 trials with over 23,000 randomised participants. Here, we provide a case report on the approaches used to inform ICEMAN items 1 and 2.

Methods: Following a pre-specified protocol, we applied one- and two-stage models for these IPDMAs, and documented challenges and mitigation strategies along the subgroup analysis process to enhance guidance for future updates to the ICEMAN tool.

Results: We identified several challenges. First, ensuring that the one-stage model separates within-trial from between-trial information (ICEMAN item 1), as the two-stage model does by design, is difficult: it requires stratification of certain parameters and correct specification of the random parameters. Second, the default estimation methods may differ depending on the statistical packages used for the one- and two-stage models, resulting in different interaction estimates to inform ICEMAN item 1. Third, choosing descriptive thresholds for continuous effect modifiers in plots of the meta-analysis of interactions can mislead about the direction of effect modification in individual trials (ICEMAN item 2). We developed illustrative modular R code to inform ICEMAN item 1 with one- and two-stage models, and provided plots with meta-analysed interaction estimates alongside trial-specific subgroup effects to inform ICEMAN item 2.
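As an illustration of the first challenge, the sketch below shows one common way to separate within-trial from across-trial interaction information in a one-stage model, by splitting the effect modifier into its within-trial deviation and the trial mean; the variable names, simulated data and binary outcome are assumptions, and this is not the authors' modular R code.

```r
# Hypothetical sketch: one-stage IPDMA with a binary outcome. The effect modifier x is
# split into its within-trial deviation (x_dev) and the trial mean (x_mean) so that the
# trt:x_dev term reflects only within-trial information (ICEMAN item 1).
library(lme4)

set.seed(5)
k <- 10; nk <- 200
dat <- data.frame(trial = factor(rep(1:k, each = nk)),
                  trt   = rbinom(k * nk, 1, 0.5),
                  x     = rnorm(k * nk, mean = rep(rnorm(k), each = nk)))
dat$y <- rbinom(k * nk, 1, plogis(-0.5 - 0.4 * dat$trt + 0.3 * dat$x -
                                    0.2 * dat$trt * dat$x))

one_stage <- function(dat) {
  dat$x_mean <- ave(dat$x, dat$trial)          # trial-specific mean of the modifier
  dat$x_dev  <- dat$x - dat$x_mean             # within-trial deviation
  glmer(y ~ factor(trial) + x_dev + trt + trt:x_dev + trt:x_mean + (0 + trt | trial),
        family = binomial, data = dat)         # stratified intercepts, random trt effect
}

fit <- one_stage(dat)
fixef(fit)   # the trt:x_dev coefficient carries the within-trial interaction information
```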

Conclusion: At the conference, we will present these challenges in detail, their mitigation strategies and discuss the need for refining methods guidance to evaluate the effect modification credibility in IPDMAs using the ICEMAN tool.



posters-wednesday-BioZ: 27

Illustration and evaluation of a causal approach to sensitivity analysis for unmeasured confounding using measured proxies with a simulation study

Nerissa Nance1,2, Romain Neugebauer3

1Novo Nordisk, Denmark; 2University of California, Berkeley CA; 3Kaiser Permanente Northern California Division of Research, Pleasanton CA

Introduction

Sensitivity analysis for unmeasured confounding is a key component of applied causal analyses using observational data [1]. A general method [2] based on a rigorous causal framework has been proposed previously; this approach addresses limitations of existing methods, such as reliance on arbitrary parametric assumptions or on expert opinion that does not take advantage of the available data. We illustrate and evaluate this general method through a simulation study.

Methods
We simulated data using a parametrized nonparametric structural equation model. Our simulated observed data consisted of an unmeasured covariate, a measured covariate, an exposure, and an outcome. We studied the performance of point and interval estimation of an inverse probability weighting estimator that aims to adjust for unmeasured confounding through a measured proxy variable. We assessed this method under a range of scenarios, including interaction terms with the exposure and various strengths and directions of association between the covariates and the exposure/outcome.
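The sketch below is illustrative scaffolding for this kind of simulation, with an assumed data-generating process containing an unmeasured confounder U and a measured proxy W; it contrasts naive, proxy-based and oracle IPW estimates rather than implementing the specific sensitivity-analysis estimator of [2].

```r
# Hypothetical scaffolding: unmeasured confounder U, measured proxy W, binary exposure A,
# continuous outcome Y. The naive and oracle (U-adjusted) IPW estimates bracket what a
# proxy-based sensitivity analysis aims to recover.
set.seed(6)
ipw_est <- function(A, Y, ps) {
  w <- ifelse(A == 1, 1 / ps, 1 / (1 - ps))
  weighted.mean(Y[A == 1], w[A == 1]) - weighted.mean(Y[A == 0], w[A == 0])
}

n <- 5000
U <- rnorm(n)
W <- 0.8 * U + rnorm(n, 0, 0.6)                # proxy correlated with U
A <- rbinom(n, 1, plogis(0.8 * U))
Y <- 1 + 1.0 * A + 0.8 * U + rnorm(n)

ps_naive  <- rep(mean(A), n)                   # no adjustment at all
ps_proxy  <- fitted(glm(A ~ W, family = binomial))
ps_oracle <- fitted(glm(A ~ U, family = binomial))

c(naive  = ipw_est(A, Y, ps_naive),
  proxy  = ipw_est(A, Y, ps_proxy),            # partial bias reduction via the proxy
  oracle = ipw_est(A, Y, ps_oracle),           # unattainable in practice
  truth  = 1.0)
```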

Results

We demonstrated potential bias elimination and recovery of confidence interval coverage in the presence of unmeasured confounding when the unmeasured covariate has the same magnitude and direction of association with both exposure and outcome as the measured proxy. However, in other scenarios, such as when the measured and unmeasured confounders had antagonistic effects, recovery was only partial or minimal.

Discussion

We illustrate through simulations that when the unmeasured confounder and the measured proxy have the same magnitude and direction of association with the exposure and the outcome, the true unconfounded effect can be fully recovered. However, we also show how this recovery can break down in other situations that analysts may encounter. Results from this study inform key practical considerations for applying these methods and highlight potential limitations.

References

  1. Dang LE, et al. A causal roadmap for generating high-quality real-world evidence. J Clin Transl Sci. 2023 Sep 22;7(1):e212.

  2. Luedtke, A.R., Diaz, I. and van der Laan, M.J., 2015. The statistics of sensitivity analyses.



posters-wednesday-BioZ: 28

Quantifying causal treatment effect on binary outcome in RCTs with noncompliance: estimating risk difference, risk ratio and odds ratio

Junxian Zhu, Mark Y. Chan, Bee-Choo Tai

National University of Singapore, Singapore

Randomized controlled trials (RCTs) are currently the most reliable method for empirically evaluating the effectiveness of a new drug. However, patients may fail to adhere to the treatment protocol, for example because of side effects. Medical guidelines recommend reporting the risk difference (RD), the risk ratio (RR) and the odds ratio (OR), as they offer distinct perspectives on the effect of the same drug. Unlike for the RD, only a few methods are available to estimate the RR and OR in RCTs in the presence of non-compliance. In this paper, we propose new inverse probability weighting (IPW)-based RD, RR and OR estimators for RCTs with non-compliance. The IPW-based method creates a new categorical variable by utilizing information on non-compliance with the randomly assigned treatment. For all estimators, we prove identification and asymptotic normality and derive the corresponding asymptotic confidence intervals. We evaluate the performance of the three estimators through an extensive simulation study. Their application is further demonstrated using data from the IMMACULATE trial on remote post-discharge treatment for patients with acute myocardial infarction.
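As a generic illustration (not the authors' estimator), the sketch below shows how the RD, RR and OR all follow from weighted arm-specific risks once per-subject weights addressing non-compliance are available; the toy data and unit weights are assumptions.

```r
# Generic illustration: once per-subject weights w addressing non-compliance are
# available, RD, RR and OR all follow from the weighted arm-specific risks.
risk_measures <- function(z, y, w) {
  p1 <- weighted.mean(y[z == 1], w[z == 1])    # weighted risk in the treatment arm
  p0 <- weighted.mean(y[z == 0], w[z == 0])    # weighted risk in the control arm
  c(RD = p1 - p0,
    RR = p1 / p0,
    OR = (p1 / (1 - p1)) / (p0 / (1 - p0)))
}

# toy example with unit weights
set.seed(7)
z <- rbinom(200, 1, 0.5)
y <- rbinom(200, 1, plogis(-1 + 0.8 * z))
risk_measures(z, y, w = rep(1, 200))
```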



posters-wednesday-BioZ: 29

Blinded sample size recalculation for randomized controlled trials with analysis of covariance

Takumi Kanata, Yasuhiro Hagiwara, Koji Oba

Department of Biostatistics, School of Public Health, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan

Background / Introduction: In randomized controlled trials, covariate adjustment can improve statistical power and reduce the necessary sample size compared with an unadjusted estimator. Analysis of covariance (ANCOVA) is often used to adjust for baseline covariates when the outcome is continuous. To design the sample size for an ANCOVA-based analysis, it is necessary to pre-specify the association between the outcome and the baseline covariates, as well as that among the baseline covariates. However, determining these parameters at the design stage is challenging. While it may be possible to assess them adaptively during the trial, the statistical impact of doing so remains unclear. In this study, we propose a blinded sample size recalculation method for the ANCOVA estimator that is asymptotically valid under minimal distributional assumptions and thus allows for arbitrary model misspecification.

Methods: We show that the asymptotic variances of the ANCOVA estimator and the unadjusted estimator can be calculated from the pooled outcome and baseline covariates when treatment is randomly assigned in a 1:1 ratio independently of the baseline covariates. This result is valid under arbitrary model misspecification. Our proposal is as follows. First, we calculate the sample size based on a t-test without adjusting for baseline covariates. Then, at a specific time point (e.g. when 50% of the outcomes have been observed), we assess the relevant parameters under blinded conditions, without examining between-group differences, and recalculate the final sample size taking into account the asymptotic variance reduction achieved through covariate adjustment. We conducted simulations to evaluate the performance of the proposed method under various scenarios.
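A simplified sketch of the recalculation idea, under stated assumptions: the blinded pooled regression below ignores refinements of the proposed method (for example, the contribution of the treatment effect to the pooled variance), and the covariates, effect size and power target are illustrative.

```r
# Simplified sketch: estimate, under blinding, how much of the pooled outcome variance
# the baseline covariates explain, and deflate the unadjusted (t-test) sample size
# accordingly (ANCOVA variance is roughly sigma^2 * (1 - R^2)).
set.seed(8)
n_interim <- 100
X <- matrix(rnorm(n_interim * 2), ncol = 2)          # baseline covariates (blinded data)
y <- 0.5 * X[, 1] + 0.3 * X[, 2] + rnorm(n_interim)  # pooled outcomes, blinded to arm

r2 <- summary(lm(y ~ X))$r.squared                    # blinded estimate of R^2

n_unadjusted <- 2 * ceiling(power.t.test(delta = 0.35, sd = 1, power = 0.9)$n)
n_ancova     <- 2 * ceiling(power.t.test(delta = 0.35, sd = sqrt(1 - r2),
                                         power = 0.9)$n)
c(unadjusted = n_unadjusted, recalculated = n_ancova)
```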

Results: The proposed method achieved the nominal statistical power under various scenarios and reduced the necessary sample size at the final analysis according to the correlations between the outcome and the baseline covariates; for example, when the correlations were 0.5, the sample size reduction ranged from 15% to 36% on average. Although the proposed method is based on asymptotic results, it performed well under relatively small sample sizes. We also found that the type I error rate at the final analysis was not affected by the proposed method.

Conclusion: The proposed sample size recalculation method achieves the nominal statistical power in randomized controlled trials analysed with ANCOVA, without inflating the type I error rate. The method can reduce the necessary sample size and would thereby contribute to more efficient drug development.



posters-wednesday-BioZ: 30

Variance stabilization transformation for the intraclass correlation coefficient of agreement with an application example to meta-analyses of inter-rater reliability studies

Abderrahmane Bourredjem1,2,3, Isabelle Fournel1, Sophie Vanbelle4, Nadjia El Saadi3

1Inserm CIC1432, Centre d’investigation clinique, Module Epidémiologie Clinique/Essais cliniques, CHU de Dijon, France; 2Institut de Mathématiques de Bourgogne, UMR 5584, CNRS, Université de Bourgogne, F-21000 Dijon, France; 3LAMOPS, École Nationale Supérieure de Statistique et d’Economie Appliquée, Kolea, Algérie; 4Department of Methodology and Statistics, Care and Public Health Research Institute (CAPHRI), Maastricht University, The Netherlands

Introduction:

We consider the problem of variance stabilizing transformation (VST) for the two-way intra-class correlation coefficient of agreement (ICC2a) in inter-rater reliability studies, when both raters and subjects are assumed to be randomly selected from their respective populations. Such transformations aim to make the variance of the ICC2a independent of its estimate, improving the ICC2a confidence interval (CI) and the combination of independent ICC2as in meta-analyses. In this work, we calculate three potential VSTs for the ICC2a, evaluate their properties by simulation for single-study CIs, and demonstrate their use in a meta-analysis of inter-rater reliability studies.

Methods:

It was recently shown that the variance of the ICC2a estimate depends on a nuisance parameter, defined as the ratio of the inter-rater to the inter-subject variance. Using this variance expression, three VST approximations (denoted T0, T1 and T2) were obtained, each addressing the nuisance parameter differently. A simulation study with small to moderate sample sizes compared the properties of the obtained VSTs against two reference CI methods: 1) the modified large sample approach (MLS), and 2) a beta-distribution-based method (β). Finally, we illustrate the use of our VSTs on a single inter-rater reliability study with 10 physiotherapists evaluating the exercise performance of 42 low back pain patients, as well as on a meta-analysis of 11 inter-rater reliability studies of upper extremity muscle tone measurements.

Results:

The analytical expressions of the three VSTs vary in complexity, from T0 (the simplest) to T2 (the most complex, requiring numerical methods to compute its inverse transformation), with T1 intermediate between T0 and T2. Simulations show that for small samples (up to 30 subjects and fewer than 10 raters), the MLS and β approaches remain preferable. For medium-sized samples (from 40 subjects and 10 raters), T1 provides coverage rates close to 95% while shortening the CI length. In the meta-analysis example, T1 offers advantages, including transformed estimators that are simpler to interpret and a better handling of study weights in the synthesis of the ICC2a estimates and their CIs.

Conclusion:

We propose a novel VST (noted T1) for ICC2a, filling a gap in the literature. We recommend using T1 for ICC2a CIs in medium-sized individual studies and for meta-analyses of inter-rater reliability studies. However, more extensive simulations are required to refine this recommendation, especially for meta-analyses.



posters-wednesday-BioZ: 31

Bridging Single Arm Studies with Individual Participant Data in Network Meta-Analysis of Randomized Controlled Trials: A Simulation Study

Katerina Maria Kontouli, Stavros Nikolakopoulos, Christos Christogiannis, Dimitrios Mavridis

University of Ioannina, Greece

Background: There is a growing interest in including single-arm studies within health technology assessments (HTA). Manufacturers often have access to individual participant data (IPD) from their own studies (e.g., a single-arm study evaluating treatment B), while only aggregate data (AGD) are available from published studies (e.g., comparing treatments C, D, etc. to a reference treatment A). Several methods, such as the Matching-Adjusted Indirect Comparison (MAIC) and the Simulated Treatment Comparison (STC), have been suggested to estimate an indirect effect (e.g., B vs A, B vs C) when the distributions of prognostic factors and effect modifiers differ across studies. The aim is to evaluate MAIC and STC for estimating an indirect effect in the above scenario through a simulation study.
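For reference, the sketch below shows the standard method-of-moments estimation of MAIC weights (Signorovitch et al.), in which IPD covariates are centred at the aggregate means and exponential-form weights are found so that the weighted means match exactly; the covariates and target means are assumptions.

```r
# Hypothetical sketch of MAIC weight estimation: weights w_i = exp(x_i' beta) with beta
# chosen so that the weighted IPD covariate means equal the published aggregate means.
set.seed(9)
X_ipd    <- cbind(age = rnorm(300, 55, 8), female = rbinom(300, 1, 0.4))
agd_mean <- c(age = 60, female = 0.5)               # published aggregate means

Xc <- sweep(X_ipd, 2, agd_mean)                      # centre IPD at the AGD means
objective <- function(beta) sum(exp(Xc %*% beta))    # convex; gradient = 0 at exact balance
beta_hat  <- optim(c(0, 0), objective, method = "BFGS")$par
w <- as.vector(exp(Xc %*% beta_hat))

colSums(w * X_ipd) / sum(w)                          # matches agd_mean up to tolerance
sum(w)^2 / sum(w^2)                                  # effective sample size after weighting
```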

Methods: We examined three methods: two widely used adjusted methods for unanchored comparisons, MAIC and STC, and the naïve (unadjusted) method. We applied these methods to incorporate single-arm studies with available interventions within a connected network of randomized controlled trials. To optimize the matching process, we employed two distinct distance metrics: Gower’s and Mahalanobis distance. Our simulation study explored various scenarios, varying (i) the sample size of studies, (ii) the magnitude of the treatment effect, (iii) the correlation between continuous covariates representing study population characteristics, (iv) the baseline probability, and (v) the degree of overlap between the single-arm study and the RCTs.

Results: Our simulation results indicate that when all continuous covariates are drawn from the same distributions with zero correlation, all methods perform similarly in terms of bias, mean squared error (MSE), and coverage across all scenarios. However, when the covariate overlap between the single-arm study and the RCTs is around 80%, the naïve (Bucher) method produces more biased estimates than MAIC and STC. As the overlap decreases to approximately 60%, the differences between MAIC and STC become more pronounced, particularly in terms of coverage and MSE.

Conclusion: STC emerges as the most robust approach for integrating evidence from single-arm studies into a network of RCTs. Additionally, Mahalanobis distance proves to be effective in identifying the optimal match, enhancing the reliability of the synthesis.



posters-wednesday-BioZ: 32

Comparative Efficacy and Safety of Migraine Treatments: A Network Meta-Analysis of Clinical Outcomes

Shashank Tripathi1, Rachna Agarwal2

1University College of Medical Sciences GTB Hospital, New Delhi, India; 2Institute of Human Behavior and Allied Sciences, New Delhi, India

Introduction

Migraine is a common and debilitating neurological condition, affecting roughly 10% of the global population and placing a significant burden on public health. It occurs in episodes, often characterized by intense headaches accompanied by sensitivity to light (photophobia), sensitivity to sound (phonophobia), and a range of autonomic and sensory disturbances.

Methods

A comprehensive search of three databases was conducted up to April 30, 2023. A frequentist network meta-analysis was used to estimate both direct and indirect effects for three outcomes: mean migraine days, freedom from pain at two hours, and adverse events. Interventions were ranked independently for each outcome using the p-score. The choice of meta-analysis model was based on the I² statistic: a random-effects model was applied when I² exceeded 30%, while a fixed-effect model was used when I² was ≤30%. All statistical analyses were performed using R version 4.3.2.

Results

A total of 80 articles were included in the current investigation. For change in mean migraine days (MMD), the direct estimates suggested statistically significant effects for CGRP antagonists [SMD: -0.38 (-0.61, -0.14)], CGRP mAbs [SMD: -0.35 (-0.41, -0.31)] and Triptans [SMD: -0.36 (-0.62, -0.10)]. Similarly, the direct estimates for freedom from pain at two hours suggested statistically significant effects for CGRP antagonists [RR: 5.83 (2.50, 13.59)], Dihydroergotamine [RR: 19.92 (3.41, 116.76)], Nasal agent (NSAID) [RR: 10.27 (1.03, 102.28)], Nasal agent (Triptan) [RR: 8.27 (3.51, 19.56)], NSAID [RR: 19.1 (7.36, 49.01)], and Triptan [RR: 22.82 (16.74, 31.12)]. Additionally, for the outcome adverse events, the direct estimates suggested statistically significant results for CGRP mAbs [RR: 2.77 (1.97, 3.91)], CGRP antagonists [RR: 2.92 (1.95, 4.37)], Dihydroergotamine [RR: 3.99 (1.47, 10.82)], NSAID [RR: 4.21 (2.1, 8.1)], Nasal agent (CGRP antagonist) [RR: 7.61 (2.31, 25.19)], Triptans [RR: 8.40 (6.91, 10.22)], and Nasal agent (Triptan) [RR: 23.57 (9.01, 61.71)]. For each outcome of interest, indirect estimates were calculated simultaneously, taking all treatments under investigation as reference treatments.

Conclusion

A network meta-analysis of migraine treatments found Triptans to be highly effective for pain, though with a higher risk of adverse events. CGRP antagonists excelled at reducing monthly migraine days but also had increased side effects.



posters-wednesday-BioZ: 33

Optimal standardization as an alternative to matching using propensity scores

Ekkehard Glimm, Lillian Yau

Novartis Pharma, Switzerland

In many development programs in the pharmaceutical industry, there is a need for indirect comparisons of medical treatments that were investigated in separate trials. Usually, trials have slightly different inclusion criteria, hence the influence of confounding factors has to be removed for a “fair” comparison. The most common method applied for this is propensity score matching. This method yields a set of weights used to re-weight patients in such a way that the weighted averages of the confounding variables are rendered comparable across the studies.

Propensity score matching typically achieves "roughly matched" groups, but almost invariably some differences between the averages of the matching variables in the compared trials remain.

We have recently suggested an exact-matching approach that may serve as an alternative to propensity score matching based on a logistic regression model. It treats the matching problem as a constrained optimization problem and guarantees that, post-matching, the averages of the matching variables in the two trials are identical. While several objective functions could in principle be used to generate a set of weights, in this talk we focus on weights that maximize the effective sample size (ESS).
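A simplified sketch of the constrained-optimization view, under two stated assumptions: the weights are normalized to sum to one, and the non-negativity constraint is dropped so that a closed-form Lagrange solution exists; the authors' method may handle these aspects differently.

```r
# With weights normalized to sum to 1, maximizing ESS = sum(w)^2 / sum(w^2) is
# equivalent to minimizing sum(w^2) subject to the weighted covariate means equalling
# the target means. Without a non-negativity constraint the minimizer is w = A'(AA')^-1 b.
set.seed(10)
X      <- cbind(age = rnorm(200, 55, 8), female = rbinom(200, 1, 0.4))
target <- c(age = 60, female = 0.5)                 # covariate means in the other trial

A <- rbind(rep(1, nrow(X)), t(X))                   # constraint rows: sum(w) = 1, X'w = target
b <- c(1, target)
w <- t(A) %*% solve(A %*% t(A), b)                  # argmin sum(w^2) subject to A w = b

colSums(as.vector(w) * X)                           # equals the target means exactly
sum(w)^2 / sum(w^2)                                 # resulting effective sample size (ESS)
```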

While the approach is closely related to matching-adjusted indirect comparison (MAIC, Signorovitch et al, 2010), it goes beyond their suggestion because we do not impose a specific functional form on the matching weights. Furthermore, in the talk we focus on the case where individual patient data (IPD) is available from all trials in the analysis, whereas the original MAIC approach considered only the matching of IPD onto aggregated data.

In the talk, we illustrate the application of the approach to two studies. Furthermore, we present the results from a simulation study showing that the new suggestion leads to weights which are considerably more stable than propensity score weights.

References

Signorovitch JE, Wu EQ, Andrew P, et al. Comparative effectiveness without head-to-head trials: a method for matching-adjusted indirect comparisons applied to psoriasis treatment with adalimumab or etanercept. PharmacoEconomics. 2010;28(10):935-945.

Glimm, E. and Yau, L. (2022): Geometric approaches to assessing the numerical feasibility for conducting matching-adjusted indirect comparisons. Pharm Stat 21, 974-987.



posters-wednesday-BioZ: 34

Evaluating Diagnostic Tests Against Composite Reference Standards: Quantifying and Adjusting for Bias

Vera Hudak, Nicky J. Welton, Efthymia Derezea, Hayley Jones

University of Bristol, United Kingdom

Background: Composite reference standards (CRSs) are often used in diagnostic accuracy studies in situations where gold standards are unavailable or impractical to carry out on everyone. Here, the test under evaluation is compared with some combination (composite) of results from other tests. We consider a special case of CRS, which we refer to as a ‘check the negatives’ design. Here, all study participants receive an imperfect reference standard, and those who test negative on this are additionally tested with the gold standard. Unless the imperfect reference standard is 100% specific, some bias can be anticipated.

Methods: We derive algebraic expressions for the bias in the estimated accuracy of the test under evaluation in a ‘check the negatives’ study, under the assumption that test errors are independent given the true disease status. We then describe how bias can be adjusted for using a Bayesian model with an informative prior for the specificity of the imperfect reference standard, based on external information. Our approach is evaluated through a simulation study under two scenarios. First, we consider the case where the prior for the specificity of the imperfect reference standard is correctly centred around its true value, and we assess the impact of increasing uncertainty by increasing the prior standard deviation. Second, we examine the case where the prior is incorrectly centred, but the true value remains within the 95% prior credible interval, to explore the consequences of moderate prior misspecification.
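A small illustrative simulation of the bias mechanism in a 'check the negatives' design, with assumed prevalence and accuracy values; it is not the authors' algebraic derivation, but it reproduces the qualitative behaviour described here.

```r
# Illustrative simulation: everyone receives the imperfect reference; only reference-
# negatives also receive the gold standard. Reference false positives are therefore
# counted as diseased, leaving the index test's estimated specificity unbiased but
# pulling its estimated sensitivity downwards (under conditional independence).
set.seed(11)
n <- 1e5; prev <- 0.2
se_index <- 0.85; sp_index <- 0.95                 # true accuracy of the test under study
se_ref   <- 0.90; sp_ref   <- 0.90                 # imperfect reference standard

D     <- rbinom(n, 1, prev)
index <- rbinom(n, 1, ifelse(D == 1, se_index, 1 - sp_index))
ref   <- rbinom(n, 1, ifelse(D == 1, se_ref,   1 - sp_ref))

crs <- ifelse(ref == 1, 1, D)                      # reference-negatives are resolved by the gold standard

c(apparent_se = mean(index[crs == 1]),             # underestimates se_index
  apparent_sp = 1 - mean(index[crs == 0]),         # approximately sp_index
  true_se = se_index, true_sp = sp_index)
```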

Results/Conclusions: In a ‘check the negatives’ study, under the assumption of conditional independence of the errors made by the test under evaluation and the imperfect reference standard, the estimated specificity is unbiased but the sensitivity is underestimated. Preliminary findings suggest that, if the informative prior is correctly centred, the Bayesian model will always reduce bias and can successfully eliminate it in some, but not all, scenarios. Full simulation results, including those with an incorrectly centred prior, and their implications will be presented at the conference.



posters-wednesday-BioZ: 35

Characteristics, Design and Statistical Methods in Platform Trials: A Systematic Review

Clément R. Massonnaud1,2, Christof Manuel Schönenberger3, Malena Chiaborelli3, Selina Ehrenzeller3, Alexandra Griessbach3, André Gillibert4, Matthias Briel3, Cédric Laouénan1,2

1Université Paris Cité, Inserm, IAME, F-75018 Paris, France; 2AP-HP, Hôpital Bichat, Département d’Épidémiologie, Biostatistique et Recherche Clinique, F-75018 Paris, France; 3CLEAR Methods Center, Division of Clinical Epidemiology, Department of Clinical Research, University Hospital Basel, University of Basel, Basel, Switzerland; 4Department of Biostatistics, CHU Rouen, Rouen, France

Background

Platform trials (PTs) are gaining popularity in clinical research due to their innovative and flexible methodology. However, their complexity underscores the need for a review of how they are currently implemented. The objective of this systematic review was to determine the characteristics and the methodological and statistical practices of PTs.

Methods

We identified PTs from trial registries and bibliographic databases up to August 2024. Eligible PTs were randomized controlled trials studying multiple interventions within a single population, with flexibility to add or drop arms. Data were extracted on trial status, design, statistical methods, and reporting practices. Key variables included sample size determination, interim analyses, and type I error control. Descriptive statistics summarized findings across therapeutic areas and statistical framework (frequentist or Bayesian).

Results

We identified 190 PTs. Most focused on infectious diseases (77 [40.5%], including 57 for COVID-19) and oncology (69 [36.3%]). PT initiation peaked during the COVID-19 pandemic but has since stabilized, with 85 trials currently active and 25 PTs in planning. Non-industry sponsorship accounted for 78% (142/183) of PTs, with differences between infectious disease (95%, 71/75) and oncology trials (51%, 35/68). A complete master protocol was available for 47% (89/190) of all PTs and for 55% (83/152) of ongoing, completed, or discontinued PTs. Amendments were tracked in 61% (52/85) of protocols with multiple versions. Registry entries were considered up to date for 87% (153/175) of registered PTs. Bayesian designs featured in 59/190 PTs versus 56/190 frequentist trials, with 20/190 trials using both frameworks (the statistical framework was unclear in 55/190 PTs). Overall, 25/111 trials (23%) were designed without a pre-determined target sample size, all of which were Bayesian. Among these, 15 were explicitly reported as “perpetual” trials. The number of interim analyses was pre-determined in 19% (11/58) of Bayesian trials versus 58% (28/48) of frequentist trials. Simulations to evaluate operating characteristics were used in 93% (39/42) of Bayesian trials. Simulation reports were available in 67% (26/39) of cases, and the simulation procedure was detailed for 62% (24/39) of trials. Only two trials shared the simulation code.

Conclusions

Platform trials remain popular and increasingly diverse. Efforts to enhance transparency and reporting, especially in complex Bayesian platform trials, are essential to ensure reliability and broader acceptance.



posters-wednesday-BioZ: 36

WRestimates: An R Package for Win-Ratio Sample Size and Power Calculations

Autumn O Donnell

University of Galway, Ireland

The win-ratio has excellent potential for determining the overall efficacy of treatments and therapies in clinical trials. Its ability to hierarchically account for multiple endpoints provides a holistic metric of the treatment effect. For the win-ratio to become a prominent and reliable statistical method outside of cardiovascular disease, there is a need for a straightforward approach to study design, particularly power and sample size determination. An appropriate method for determining these quantities is vital to ensure the validity of the results obtained in a study. The WRestimates package provides easy-to-use functions for determining the required sample size and power of studies implementing the win-ratio. These allow sample size and power to be calculated from estimands or pilot data, negating the need for complex simulation-based methods which require many assumptions to be made about the data.



posters-wednesday-BioZ: 37

Randomizing With Investigator Choice of Treatment: A Powerful Pragmatic Tool in Clinical Trials

Lillian Yau, Betty Molloy

Novartis Pharma, Switzerland

Taking a patient-centric approach, pragmatic clinical trials aim for study designs that are closer to clinical practice. Results of treatment benefits and risks of new medical products from these trials can provide information for patients, health care professionals, and decision-makers that are more easily generalized to the real world.

We present as an example the design of a multi-regional, phase III registration study for a first-line cancer treatment. The study compares an experimental treatment against two generations of standards-of-care (SoC) that are approved and used worldwide for newly diagnosed patients. The first generation (1G) and second generation (2G) treatments differ with respect to efficacy and safety.

To mimic clinical practice, before randomization, trial investigators and patients together selected an SoC option based on patient-related factors such as age, comorbidities, disease characteristics, as well as on regional practice. This choice of SoC was used as a stratification factor in the randomization and subsequent data analysis. This approach facilitates causal inference on the comparison of the experimental treatment with the different SoC options (1G or 2G) separately and combined.

To satisfy the requirements of different health authorities and reimbursement agencies, joint primary endpoints as well as key secondary endpoints were designed to be tested at different time points. Strong control of the type I error rate was guaranteed by combining multiplicity adjustment with group sequential testing.

By allowing investigator choice of 4 currently available SoC in the active control arm, the study optimized patients’ treatment and reduced the risk of exclusion of patients. It was very attractive to both patients and physicians, as reflected in the fast recruitment with close to 30 patients per month, nearly double what was expected in this disease area.

The study had its primary and key secondary read-outs in 2024. The primary results were the basis of the approval of the new treatment in many countries including the US, Canada, and Switzerland. The key secondary results are used to support the submission to EMA.

This study not only advances the therapeutic landscape but also sets a benchmark for future clinical trials, demonstrating that patient-centered strategies and robust designs can address the requirements of multiple decision makers and can lead to significant advancements in clinical research and patient care.



posters-wednesday-BioZ: 38

Confirming assay sensitivity in 2-arm non-inferiority trial using meta-analytic-predictive approach

Satomi Okamura1, Eisuke Hida2

1Department of Medical Innovation, The University of Osaka Hospital, Japan; 2Graduate School of Medicine, The University of Osaka, Japan

Introduction and Objective: Assay sensitivity is a well-known issue in 2-arm non-inferiority (NI) trials. To assess assay sensitivity, a 3-arm NI trial including placebo, control, and treatment is strongly recommended, although this raises concerns about ethics and feasibility. FDA guidance on NI trials states: “In the absence of a placebo arm, knowing whether the trial had assay sensitivity relies heavily on external information (not within-study), giving NI studies some of the characteristics of a historically controlled trial.” Hence, the new NI trial must be similar to the historical trials. Additionally, the historical trials must have consistently shown that the ‘control’ in the NI trial is superior to placebo. Superiority here requires that the effect of the ‘control’ minus the NI margin, not just the effect of the ‘control’ itself, be greater than that of placebo.

Our objective is to propose a method to ensure the similarity of the NI trial to the historical trials and the superiority of ‘control’ to placebo for confirming assay sensitivity in the 2-arm NI trial. Information from historical trials usually consists of aggregate data. However, it has become clear that when effect modifiers are present, simple summary statistics for the entire population are insufficient. Therefore, it is important that the proposed method take into account the presence of effect modifiers.

Method and Results: To assess assay sensitivity, we use the meta-analytic-predictive approach. This is a Bayesian method, so the prior distribution, in this study particularly that for the between-trial heterogeneity, is crucial. We assume a half-normal prior for the between-trial standard deviation or an inverse-gamma prior for the between-trial variance. The performance of the approach is evaluated from two perspectives. First, we assess the influence of the prior setting on assay sensitivity by varying the amount of prior information about the between-trial heterogeneity. Second, we demonstrate which kinds of historical trials may reduce assay sensitivity by setting multiple conditions for the historical trials, such as the number of historical trials, sample size, effect size, and the properties of effect modifiers. For each scenario, we compute the posterior distribution of the ‘control’ effect and assess the performance of the method through joint power and the type I error rate.

Conclusions: Our simulation study suggests that the meta-analytic-predictive approach is a useful method for evaluating assay sensitivity in 2-arm NI trials. In particular, the explicit consideration of uncertainty offered by the Bayesian approach is of great benefit when only aggregate data from the historical trials are available.



posters-wednesday-BioZ: 39

Adding baskets to an ongoing basket trial with information borrowing: When do you benefit?

Libby Daniells1, Pavel Mozgunov1, Helen Barnett3, Alun Bedding4, Thomas Jaki1,2

1MRC Biostatistics Unit, Cambridge University, United Kingdom; 2Faculty of Informatics and Data Science, University of Regensburg, Regensburg, Germany; 3Department of Mathematics and Statistics, Lancaster University, Lancaster, United Kingdom; 4Roche Products, Ltd, Welwyn Garden City, United Kingdom

Innovation in trial design has led to the development of basket trials, in which a single therapeutic treatment is tested in several patient populations, each of which forms a basket. This design allows treatments to be tested in rare diseases or patient subgroups. However, limited basket sample sizes can lead to a lack of statistical power and imprecise treatment effect estimates. This is tackled through the use of Bayesian information borrowing.

To provide flexibility to these studies, adaptive features are desirable as they allow pre-specified modifications to an ongoing trial. In this talk we focus on the incorporation of one or more newly identified baskets part-way through a study. We propose and compare several approaches for adding new baskets to an ongoing basket trial under an information borrowing structure and highlight when it is beneficial to add a new basket to an ongoing trial rather than running a separate investigation. We also propose a novel calibration of the decision criteria in basket trials that is robust with respect to false decision making. Results show a substantial improvement in power for a new basket when information borrowing is utilized; however, this comes with potential inflation of error rates, which is reduced under the novel calibration procedure.



posters-wednesday-BioZ: 40

Optimizing Adaptive Trial Design to Ensure Robustness Across Varying Treatment Effect Assumptions

Valeria Mazzanti1, Dirk Klingbiel2

1Cytel Inc.; 2Bristol Myers Squibb

Background:

Strong adaptive clinical trial design relies on several key aspects: experience in a therapeutic area; expertise in statistical methodology; and appropriate technology to assess design robustness. A recent study design assessment for a compound under development in Hematology highlighted each of these aspects in an interesting way. The study’s primary endpoint was Progression-Free Survival (PFS), though there was also strong interest in monitoring observed events for Overall Survival (OS), adding to the design’s complexity. Our aim in this assessment and optimization process was to shorten the expected average study duration, while still ensuring appropriate statistical power to detect a minimally clinically viable treatment effect for this product.

Methods:

The original design targeted 90% power using a one-sided alpha of 0.025 and a 1:1 randomization ratio. The design included one interim analysis after 40% of PFS events, assessing futility only. In our simulation plan, we varied the number of interim analyses (one or two interim looks) and explored the impact of a variety of interim timings and types of assessment (futility and/or efficacy) on the expected number of events. We also employed a multi-state model to simulate PFS and OS events for each patient so that we could report how many OS events would be observed at each analysis of PFS. These variations resulted in over 5,000 parameter combinations that were ranked and scored in line with the stated strategic priority of overall reduction in study duration, using industry-standard advanced statistical software.
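A minimal sketch of the multi-state simulation idea described above: exponential transition hazards generate progression and death times from which PFS and OS follow. The rates and structure here are assumptions for illustration, not the study's parameters.

```r
# Illustrative illness-death simulation: for each patient, draw time to progression and
# time to death (before and after progression), then derive PFS and OS times.
set.seed(13)
simulate_patient <- function(h_prog = 0.05, h_death = 0.01, h_death_post = 0.08) {
  t_prog   <- rexp(1, h_prog)                    # stable -> progression
  t_death0 <- rexp(1, h_death)                   # stable -> death
  if (t_death0 < t_prog) {                       # death occurs before progression
    return(c(pfs = t_death0, os = t_death0))
  }
  t_death1 <- t_prog + rexp(1, h_death_post)     # progression -> death
  c(pfs = t_prog, os = t_death1)
}

pts <- t(replicate(500, simulate_patient()))
colMeans(pts)   # at any analysis time, PFS and OS event counts can be tabulated from pts
```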

Results:

We were able to optimize the design such that the required sample size fell by 85 patients and the expected study duration was, on average, 13 months shorter. The optimized design included two interim analyses, at which both efficacy and futility were assessed. The additional efficacy evaluations led to a high probability of early stopping without compromising the overall power of the study.

Conclusion:

The results of the exploration highlighted the value of adding an efficacy stopping boundary and a second, later interim analysis, both of which led to savings in average sample size and average study duration. Additional explorations may include assigning statistical significance to evaluate the treatment’s impact on the OS endpoint as well, and assessing the probability of success of the trial by sampling the events generated in our simulation from prior distributions identified from historical studies.



posters-wednesday-BioZ: 41

N-of-1 Trials to Estimate Individual Effects of Music on Concentration

Thomas Gärtner1, Fabian Stolp1, Stefan Konigorski1,2

1Hasso Plattner Institute for Digital Engineering, Germany; 2Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, USA

Focus and concentration are influenced by various environmental factors, and music has been shown to impact cognitive performance. However, recent research highlights the individualized nature of the effect of music, as responses vary considerably with genre and personal preference. Traditional population-level studies often obscure these differences, whereas N-of-1 trials provide a personalized approach that may be particularly suited to examining how self-selected music genres affect concentration.
This study presents the design of a series of N-of-1 trials investigating the individual effects of music on cognitive processes as the primary outcome, measured using a digitally adjusted Stroop test. In the study, participants will select one music genre, with or without lyrics, as their intervention, which will be compared to silence as a baseline. Each participant will be randomly assigned to a sequence of 3-minute music listening periods (intervention, A) and 3-minute silent periods (control, B) in a two-cycle crossover design (ABAB or BABA). To minimize carryover effects and concentration loss, a 1-minute break is scheduled between blocks. After each block, participants will complete a brief questionnaire to assess self-reported concentration and stress levels. Additionally, physiological proxies for stress and cognitive load, including heart rate, electroencephalography (EEG), and pupil dilation, will be recorded. Intervention effects will be estimated using Bayesian linear mixed models, with a primary focus on individual-level analyses and secondary analyses at the population level.

This study will provide valuable insights into the personalized effects of music on concentration, helping individuals optimize their cognitive performance. At the population level, it will identify variations in concentration effects across different music genres, contributing to the broader understanding of music as a cognitive intervention.



posters-wednesday-BioZ: 42

Simulation Study Examining Impact of Study Design Factors on Variability Measures

Laura Quinn1,2, Jon Deeks1,2, Yemisi Takwoingi1,2, Alice Sitch1,2

1Department of Applied Health Sciences, University of Birmingham; 2National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, Birmingham, UK

Introduction

Interobserver variability studies in diagnostic imaging are crucial for assessing the reliability of imaging test interpretations between different observers. The design and conduct of these studies are influenced by various factors that can impact the calculation and interpretation of variability estimates. These factors include participant sample size, condition prevalence, diagnostic test discrimination, and reader error levels.

Methods

Data were simulated for a study design with binary outcomes and two interpretations for each patient. A range of scenarios was simulated, varying participant sample size (25 to 200), condition prevalence (5% to 95%), diagnostic test discrimination (good, reasonable, poor), and reader error levels (low, medium, high). For each combination, 1,000 simulations were performed, and variability measures (percentage agreement, Cohen’s kappa, Prevalence-Adjusted Bias-Adjusted Kappa (PABAK), Krippendorff’s alpha, and Gwet’s AC coefficient) were calculated, along with sensitivity and specificity.
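A sketch of one simulation cell under assumed parameter values, where reader error levels are represented by per-reader sensitivity and specificity; percentage agreement and Cohen's kappa are computed directly, and the other measures would be added analogously.

```r
# Hypothetical sketch of one simulation cell: two readers interpret the same patients,
# and percentage agreement and Cohen's kappa are computed from the paired readings.
set.seed(12)
sim_cell <- function(n = 100, prev = 0.2, se = 0.9, sp = 0.9) {
  truth   <- rbinom(n, 1, prev)
  read_fn <- function() rbinom(n, 1, ifelse(truth == 1, se, 1 - sp))
  r1 <- read_fn(); r2 <- read_fn()                          # two readers, same error levels
  po <- mean(r1 == r2)                                      # percentage agreement
  pe <- mean(r1) * mean(r2) + mean(1 - r1) * mean(1 - r2)   # chance agreement
  c(agreement = po, kappa = (po - pe) / (1 - pe))
}

# e.g. 1,000 replicates for one combination of sample size, prevalence and reader error
res <- replicate(1000, sim_cell(n = 100, prev = 0.2, se = 0.9, sp = 0.9))
rowMeans(res)
```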

Results

The study showed that increased sample size consistently produced more precise variability estimates across all scenarios. Percentage agreement consistently showed the highest values among the variability measures. PABAK and Gwet’s AC coefficient demonstrated greater stability and less sensitivity to condition prevalence compared to Cohen’s kappa and Krippendorff’s alpha, which showed more variable performance. As diagnostic test discrimination decreased and reader error increased, all variability measures showed a decline.

Conclusion

These findings show the importance of considering different factors when assessing interobserver variability in diagnostic imaging tests. Different variability measures are affected in distinct ways by participant sample size, condition prevalence, diagnostic test discrimination, and reader error levels. By providing guidance on the design of interobserver variability studies, this work can help improve future studies, yielding more accurate information on the reliability of diagnostic imaging tests and ultimately better patient care.



posters-wednesday-BioZ: 43

Simulation-based optimization of adaptive designs using a generalized version of assurance

Pantelis Vlachos1, Valeria Mazzanti1, Boaz Adler2

1Cytel Inc, Switzerland; 2Cytel Inc, USA

The power of cloud computing is utilized to create a tool that collects information from different parts of the clinical development team (clinical, operations, commercial, etc.) and, with the statistician in the driver's seat, seeks and proposes designs that optimize a clinical study with respect to sample size, cost, duration and power. The optimization is performed using a generalized assurance measure that takes into account all possible trial scenarios with respect to treatment effect, control response, enrollment, dropouts, etc. Furthermore, this tool can be used to communicate and update information to the trial team in real time, accommodating (possibly) changing target objectives. Case studies of actual adaptive trials will be presented.



posters-wednesday-BioZ: 44

Evaluating the impact of outcome delay on adaptive designs

Aritra Mukherjee1, Michael J. Grayling2, James M. S. Wason1

1Population Health Sciences Institute, Newcastle University; 2Johnson and Johnson

Background: Adaptive designs (ADs) are a broad class of trial designs that allow pre-planned modifications to be made to a trial as patient data are accrued, without undermining its validity or integrity. ADs can improve the efficiency, patient benefit, and power of a trial. However, these advantages may be adversely affected by a delay in observing the primary outcome variable. In the presence of such delay, a choice must be made between (a) pausing recruitment until the requisite data are accrued for the interim analysis, leading to a longer trial completion period; or (b) continuing to recruit patients, which may result in a large number of participants who do not benefit from the interim analysis. In the latter case, little work has investigated the size of outcome delay at which the realised efficiency gains of ADs become negligible compared to classical fixed-sample alternatives. Our study covers different kinds of ADs and the impact of outcome delay on them.

Methods: We assess the impact of delay on the expected efficiency gains of an AD by estimating the number of pipeline patients recruited into the trial under the assumption that recruitment is not paused while treatment outcomes are awaited. We assume different recruitment models to suitably account for single- or multi-centre trials. We discuss findings for two-arm group-sequential designs as well as multi-arm multi-stage designs. Further, we focus on sample size re-estimation (SSR), a design in which the variable typically optimized to characterise trial efficiency is not the expected sample size (ESS).

Results and conclusions: Our results indicate that if outcome delay is not considered at the planning stage of a trial, much of the expected efficiency gain can be lost. The worst-affected designs are typically those with early stopping, where the efficiency gains are assessed through a reduced ESS. SSR can also suffer adversely if the initial sample size specification was substantially over-estimated.

Finally, in light of these findings, we discuss the implications of using the ratio of the total recruitment length to the outcome delay as a measure of the utility of different ADs.