Conference Agenda

Overview and details of the sessions of this conference.

 
 
Session Overview
Session: Poster Exhibition: T / Tuesday posters
Time: Tuesday, 26/Aug/2025, 1:00pm - 2:00pm
Location: Poster Location


Presentations
posters-tuesday: 1

Leveraging Wearable Data for Probabilistic Imputation in Cardiovascular Risk Calculators

Antoine Faul1, Patric Wyss2, Anja Mühlemann1, Manuela Moraru3, Danielle Bower3, Petra Stute3, Ben Spycher2, David Ginsbourger1

1Institute of Mathematical Statistics and Actuarial Science, University of Bern, Bern, Switzerland; 2Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland; 3Department of Obstetrics and Gynecology, University Women’s Hospital, Bern, Switzerland

Wearable technology for health data collection is rapidly expanding, enabling continuous, non-invasive monitoring of physiological parameters such as heart rate variability and physical activity. This advancement offers promising improvements for cardiovascular disease (CVD) risk prediction, which traditionally depends on clinical measurements that often require time-consuming and costly healthcare visits.

This study analyzes data from 193 female participants, aged 40 to 69, gathered during an observational study at Inselspital, Bern. Participants provided comprehensive medical and personal information, supplemented by wearable data collected with Garmin Vivosmart 3 devices over a week.

In this work, we explore the potential of replacing systematically missing inputs with probabilistic predictions derived from wearable data and self-reported information. By integrating this uncertainty into a risk calculator, we aim to provide probabilistic assessments of cardiovascular risk. Our approach uses an interpretable statistical model based on Gaussian copulas. This method flexibly characterizes the joint distribution, employing distinct marginals and a Gaussian dependence structure to facilitate analytical conditioning.
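
A minimal sketch of the conditioning step behind such a copula-based imputation, assuming hypothetical marginals, a hypothetical latent correlation matrix, and illustrative variable roles (this is not the authors' implementation):

```python
# Illustrative sketch: conditioning in a Gaussian copula to obtain a
# probabilistic prediction for one missing input given observed features.
# The marginals and correlation matrix below are hypothetical placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

marginals = [stats.norm(120, 15),   # missing clinical input (index 0)
             stats.norm(70, 10),    # wearable-derived feature 1
             stats.norm(2.5, 0.8)]  # wearable-derived feature 2
R = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])     # latent Gaussian correlation

def conditional_draws(x_obs, n_draws=1000):
    """Draw from the copula-implied conditional of variable 0 given the rest."""
    # Map observed values to the latent Gaussian scale via their marginal CDFs.
    z_obs = np.array([stats.norm.ppf(marginals[j + 1].cdf(v))
                      for j, v in enumerate(x_obs)])
    # Standard multivariate-normal conditioning: z0 | z_obs ~ N(mu_c, s2_c).
    S01, S11 = R[0, 1:], R[1:, 1:]
    w = np.linalg.solve(S11, S01)
    mu_c = w @ z_obs
    s2_c = 1.0 - S01 @ w
    z0 = rng.normal(mu_c, np.sqrt(s2_c), size=n_draws)
    # Map back to the original scale through the marginal quantile function.
    return marginals[0].ppf(stats.norm.cdf(z0))

draws = conditional_draws(x_obs=[65.0, 3.1])
print(draws.mean(), np.quantile(draws, [0.1, 0.9]))  # probabilistic "imputation"
```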

We extend the approach outlined by Mühlemann et al. [1] by addressing the challenge of the high dimensionality of smartwatch data. To this end, we focus on selected features obtained through both supervised and unsupervised dimensionality reduction techniques.

Proper scoring rules, such as the CRPS and the Brier score, are employed to assess the quality of the probabilistic predictions. We also compare the various methods by cross-validation in the context of high versus low CVD risk classification.

Our results demonstrate that wearable data can help in substituting clinical missing inputs in cardiovascular risk calculators, provided that an efficient dimension reduction step is implemented. However, the gains in predictive performance are moderate, suggesting that further exploration of advanced dimensionality reduction techniques could be beneficial.

[1] Mühlemann A, Stange P, Faul A, et al. Comparing imputation approaches to handle systematically missing inputs in risk calculators. PLOS Digital Health. 2025;4(1):e0000712.



posters-tuesday: 2

Stepwise Prediction of Tuberculosis Treatment Outcomes Using XGBoost and Feature-Level Analysis: A Multi-Stage Approach to Clinical Decision Support

Linfeng Wang, Jody Phelan, Taane Clark

London School of Hygiene and Tropical Medicine, United Kingdom

Tuberculosis (TB) remains a global health crisis, with multidrug-resistant (MDR-TB) and extensively drug-resistant (XDR-TB) strains posing significant challenges to treatment. Utilizing the extensive TB Portals database, comprising clinical, radiological, demographic, and genomic data from 15,997 patients across high-burden countries, we developed an XGBoost-based machine learning model to predict treatment outcomes. Our approach categorizes features into four categories of diagnostic evidence: demographic, microbiology and disease state, X-ray, and treatment variables. This framework enables the model to progressively incorporate available data while maintaining robust predictive performance, even in the presence of missing values typical of real-world healthcare settings. The model achieved high predictive accuracy (AUC-ROC: 0.96, F1-score: 0.94), with key predictors including age of onset, drug resistance, and treatment adherence. Regional analysis highlighted variability in performance, underscoring the potential for localized model adaptation. By accommodating missing data at various diagnostic stages, our model provides actionable insights for personalized TB treatment strategies and supports clinical decision-making in diverse and resource-constrained contexts.
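
As an illustration of the staged idea (not the TB Portals pipeline; data, feature blocks, and hyperparameters below are hypothetical), an XGBoost classifier can be refit as each block of diagnostic evidence becomes available, leaving missing values as NaN for the tree learner to route natively:

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 2000
# Hypothetical feature blocks mirroring the four categories of diagnostic evidence.
blocks = {
    "demographic": rng.normal(size=(n, 3)),
    "microbiology": rng.normal(size=(n, 5)),
    "xray": rng.normal(size=(n, 4)),
    "treatment": rng.normal(size=(n, 4)),
}
y = rng.binomial(1, 1 / (1 + np.exp(-blocks["treatment"][:, 0])))

# Inject missingness to mimic evidence that is unavailable at earlier stages.
for X in blocks.values():
    X[rng.random(X.shape) < 0.2] = np.nan

cols, aucs = [], {}
for stage in ["demographic", "microbiology", "xray", "treatment"]:
    cols.append(blocks[stage])
    X = np.hstack(cols)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    model = xgb.XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
    model.fit(X_tr, y_tr)  # NaNs are handled natively by the tree booster
    aucs[stage] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

print(aucs)  # discrimination as each block of evidence is added
```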



posters-tuesday: 3

The use of variable selection in clinical prediction modelling for binary outcomes: a systematic review

Xinrui Su, Gareth Ambler, Nathan Green, Menelaos Pavlou

Department of Statistical Science, University College London

Background

Clinical prediction models can serve as important tools, assisting in medical decision-making. Concise, accurate and interpretable models are more likely to be used in practice and hence an appropriate selection of predictor variables is viewed as essential. While many statistical methods for variable selection are available, data-driven selection of predictors has been criticised. For example, the use of variable selection with very low significance levels can lead to the exclusion of variables that may improve predictive ability. Hence, their use has been discouraged in prediction modelling. Instead, selection of predictors based on the literature and expert opinion is often recommended. Recent sample size guidelines also assume that predictors have been pre-specified, and no variable selection is performed. This systematic review aims to investigate current practice with respect to variable selection when developing clinical prediction models using logistic regression.

Methods

We focused on published articles in PubMed between 1-21 October 2024 that developed logistic prediction models for binary health outcomes. We extracted information on study characteristics and methodology.

Results

In total 141 papers were included in the review. We found that almost all papers (140/141) used variable selection. Univariable selection (UVS) was by far the most commonly reported method; it was used solely or sequentially alongside other methods in 78% (110/141) of papers. It was followed by backwards elimination (BE) (60/141, 43%), ‘with bulk removal’ (BR) from a single model (58/141, 41%) and LASSO (35/141, 25%). UVS and BE were frequently applied together (45/139, 32%), as were UVS and BR (43/139, 31%).

Conclusions

Despite criticisms regarding the uncritical use of data-driven variable selection methods, surprisingly almost all studies in this review employed at least one such method to reduce the number of predictors, with many studies using multiple methods. Traditional methods such as UVS and BE, as well as more modern techniques such as LASSO, are still commonly used. In the pursuit of parsimonious, as well as accurate risk models, model developers must be cautious when using methods based on significance testing, particularly with very low significance levels. However, methods such as LASSO, which directly aim to optimise out-of-sample predictive performance while also removing redundant predictors, may be promising and merit attention.



posters-tuesday: 4

Comparison of Methods for Incorporating Related Data when Developing Clinical Prediction Models: A Simulation Study

Haya Elayan, Matthew Sperrin, Glen Martin, David Jenkins

University of Manchester, United Kingdom

Background

Clinical Prediction Models (CPMs) are algorithms that compute an individual’s risk of a diagnostic or prognostic outcome, given a set of their predictors. Guidance states CPMs should be constructed using data directly sampled from the target population. However, researchers might also have access to additional and potentially related datasets (ancillary data) originating from different time points, countries, or healthcare settings, which could support model development, especially when the target dataset is small.

A critical consideration in this context is the potential heterogeneity between the target and ancillary datasets due to data distribution shifts. These occur when the distributions of predictors, event rates, or the relationships between predictors and outcome differ. Such shifts can negatively affect CPM performance in the target population. We aim to investigate in which situations, and using which methods, ancillary data should be incorporated when developing CPMs, and specifically whether the effectiveness of utilising the ancillary data is influenced by the heterogeneity between the available datasets and their relative sample sizes.

Methods

We conducted a simulation study to assess the impact of these factors on CPM performance when ancillary data is available. Target and ancillary populations were generated with varying degrees of heterogeneity. CPMs were developed using naive logistic regression (developed on data from target only), Logistic and Intercept regression updating methods (developed on ancillary data and updated to target), and importance weighting using propensity scores (developed on all available data, while weighting the ancillary data samples based on their similarity to the target). These models were then validated on independent data drawn from the same data-generating mechanism as the target population, using calibration, discrimination, and prediction stability metrics.
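
A hedged sketch of the importance-weighting strategy on simulated data (variable names and the membership model are illustrative, not the study's implementation):

```python
# Estimate the probability that a record comes from the target population,
# then up-weight ancillary records that resemble the target when fitting the
# prediction model on the pooled data. Data below are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n_t, n_a = 500, 5000
X_target = rng.normal(0.0, 1.0, size=(n_t, 3))
X_ancil = rng.normal(0.5, 1.2, size=(n_a, 3))          # shifted covariates
beta = np.array([1.0, -0.5, 0.25])
y_target = rng.binomial(1, 1 / (1 + np.exp(-X_target @ beta)))
y_ancil = rng.binomial(1, 1 / (1 + np.exp(-(X_ancil @ beta) + 0.3)))

# 1) Membership ("propensity") model: target (1) vs ancillary (0).
X_pool = np.vstack([X_target, X_ancil])
m = np.concatenate([np.ones(n_t), np.zeros(n_a)])
ps = LogisticRegression().fit(X_pool, m).predict_proba(X_pool)[:, 1]

# 2) Importance weights: target records get weight 1; ancillary records get
#    the odds of target membership, ps/(1-ps), so target-like records count more.
w = np.where(m == 1, 1.0, ps / (1 - ps))

# 3) Outcome model on all available data, weighted towards the target.
y_pool = np.concatenate([y_target, y_ancil])
cpm = LogisticRegression().fit(X_pool, y_pool, sample_weight=w)
print(cpm.coef_, cpm.intercept_)
```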

Results and Conclusion

Incorporating ancillary data consistently improved performance compared to using the target data only, especially when the target sample size was small. Both Logistic and Intercept Recalibration improved performance over naïve regression in most scenarios. However, the former showed greater variability in calibration slopes and more instability in calibration curves, while the latter performed worse in calibration slope under predictor-outcome association shift.

Importance weighting using propensity scores showed consistent results, with improved performance relative to other methods in many scenarios, particularly under predictor-outcome association shift.

While this study investigates data distribution shifts that are known a priori, their presence and type in practical settings are often unknown. Therefore, we recommend the importance weighting method for its robustness and stability across varied scenarios.



posters-tuesday: 5

A Systematic Review of Methodological Research on Multi-State Prediction Models

Chantelle Cornett, Glen Martin, Alexander Pate, Victoria Palin

University of Manchester, United Kingdom

Background:
Prediction models use information about a person to predict their risk of disease. Across healthcare, patients transition between multiple states over time, such as stages of health or disease progression. Here, multi-state models are crucial, but they require additional methodological considerations and their application in prediction modelling remains scarce. The methodological state of play of these methods in a prediction context has not been summarised.

Objectives:
This systematic review aims to summarise and critically evaluate the methodological literature on multi-state models, with a focus on development and validation techniques.

Methods:
A comprehensive search strategy was implemented across PubMed, Scopus, Web of Science, and arXiv to identify methodological papers on multi-state models up to 7th October 2024. Papers were included if they focused on methodological innovation, such as sample size determination, calibration, or novel computational methods; we excluded purely applied papers. Methodological details were extracted and summarised using thematic analysis.

Results:

The search identified 14,788 papers. After title and abstract screening, 443 papers proceeded to full-text screening, of which 299 were included.
Preliminary findings from these studies reveal that the majority of methodological research falls into the following groups:

  1. Techniques for estimating transition probabilities, state occupation time, and hazards.
  2. Hypothesis testing.
  3. Variable selection techniques.

This presentation will provide an overview of the themes of methodological work and the limitations and gaps in the methodological literature in this space, and will outline areas for future work.

Conclusions:
Early results highlight progress in the methodological development of multi-state models and emphasise areas requiring further attention, such as more research into sample size and robust validation practices. The final results of this study aim to guide future research and support the adoption of best practices in the use of multi-state models.



posters-tuesday: 6

Assessing the robustness of prediction models: A case study on in-hospital mortality prediction using MIMIC-III and MIMIC-IV

Alan Balendran1, Raphaël Porcher1,2

1Université Paris Cité, Université Sorbonne Paris Nord, INSERM, INRAE, Centre for Research in Epidemiology and StatisticS (CRESS); 2Centre d’Épidémiologie Clinique, Assistance Publique-Hôpitaux de Paris, Hôtel-Dieu

Clinical prediction models have become increasingly prevalent due to the availability of large healthcare datasets. While these models often achieve strong predictive performance, their robustness—their ability to remain stable under various perturbations—remains underexplored. Models may experience significant performance degradation when tested on perturbed data (e.g., noisy data or datasets collected at different time points). Understanding how robust a prediction model is to such perturbations is therefore essential for ensuring reliable clinical decision-making.

Building on an existing framework that identified eight key robustness concepts in healthcare (Balendran, A., Beji, C., Bouvier, F. et al. A scoping review of robustness concepts for machine learning in healthcare. npj Digit. Med. 8, 38 (2025)), we evaluate the robustness of different machine learning models using real-world critical care data from intensive care unit (ICU) patients.

We utilise the MIMIC-III and MIMIC-IV critical care databases to predict in-hospital mortality based on patient data from their first 24 hours of ICU admission. The dataset includes vital signs, laboratory test results, and demographic information (Johnson, A., Pollard, T., Shen, L. et al. MIMIC-III, a freely accessible critical care database. Sci Data 3, 160035 (2016)). To develop prediction models, we explore a range of machine learning approaches, from linear models such as logistic regression and LASSO to more complex tree-based methods, including random forest and gradient boosting. Additionally, we assess deep learning models, including a multilayer perceptron (MLP) and the recently introduced transformer-based model TabPFN (Hollmann, N., Müller, S., Purucker, L. et al. Accurate predictions on small data with a tabular foundation model. Nature 637, 319–326 (2025)), which has been reported to outperform traditional gradient boosting techniques.

Each model is evaluated across multiple robustness concepts, including input perturbations, label noise, class imbalance, missing data, temporal validation, and subgroup analysis. To better reflect real-world clinical settings, we introduce varying levels of noise and test different scenarios for some concepts.
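
A simplified illustration of one such robustness concept, input perturbation, on synthetic data (the model and noise levels are placeholders, not the authors' protocol):

```python
# Train a model, then add increasing Gaussian noise to the test inputs and
# track the drop in AUC. Data and noise grid are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=4000, n_features=20, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

rng = np.random.default_rng(0)
sd = X_tr.std(axis=0)
for noise in [0.0, 0.1, 0.25, 0.5, 1.0]:        # noise as a fraction of feature SD
    X_pert = X_te + rng.normal(0.0, noise * sd, size=X_te.shape)
    auc = roc_auc_score(y_te, model.predict_proba(X_pert)[:, 1])
    print(f"noise={noise:.2f}  AUC={auc:.3f}")
```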

Our findings demonstrate that no model is consistently robust across all concepts, with some models being particularly sensitive to specific perturbations. Our results highlight that relying solely on standard performance metrics within a dataset does not account for potential deviations that can be encountered in real clinical settings. We advocate for robustness assessments as a crucial component of model evaluation and selection in healthcare.



posters-tuesday: 7

The Influence of Variable Selection Approaches on Prediction Model Stability in Low-Dimensional Data: From Traditional Stepwise Selection to Regularisation Techniques

Noraworn Jirattikanwong1, Phichayut Phinyo1, Pakpoom Wongyikul1, Natthanaphop Isaradech2, Wachiranun Sirikul2, Wuttipat Kiratipaisarl2

1Department of Biomedical Informatics and Clinical Epidemiology (BioCE), Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand; 2Department of Community Medicine, Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand

Introduction: Prediction model instability presents significant challenges to clinical decision-making and may lead to patient harm. While several factors, such as dataset size and algorithm choice, are known to affect stability, evidence on how specific modelling decisions, particularly variable selection methods, influence stability remains limited. This study examines the impact of different variable selection approaches on prediction stability and model performance.

Methods: The German HOPE dataset of 9,924 patients, previously used to develop an anxiety prediction model, was used. We generated three datasets of different sizes (0.5, 1, and 2 times the base size), where the base size was determined using Riley’s minimum sufficient sample size method. We defined 61 candidate parameters and replicated the model to predict anxiety using logistic regression. Six variable selection approaches were examined: (1) UNIVAR – univariate screening followed by backward elimination, (2) FULL – full model including all variables, (3) FORWARD – forward selection, (4) BACKWARD – backward elimination, (5) LASSO – least absolute shrinkage and selection operator, and (6) ELASTIC – elastic net. Model performance was evaluated in terms of discrimination and calibration. Optimism in performance metrics and mean absolute prediction error (MAPE) were estimated using the bootstrap internal validation procedure proposed by Riley and Collins.
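
The bootstrap instability assessment can be sketched as follows (synthetic data and a LASSO-type model as a stand-in; this mirrors the spirit of the Riley and Collins procedure rather than reproducing it exactly):

```python
# Refit the model on bootstrap resamples, predict for the original individuals,
# and summarise the mean absolute difference from the original model's
# predictions (MAPE). Data and model choices below are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

X, y = make_classification(n_samples=1000, n_features=30, n_informative=10,
                           random_state=0)
original = LogisticRegressionCV(penalty="l1", solver="liblinear", cv=5,
                                max_iter=2000).fit(X, y)   # LASSO-type selection
p_orig = original.predict_proba(X)[:, 1]

rng = np.random.default_rng(0)
B, n = 50, len(y)
diffs = np.empty((B, n))
for b in range(B):
    idx = rng.integers(0, n, n)                             # bootstrap resample
    boot = LogisticRegressionCV(penalty="l1", solver="liblinear", cv=5,
                                max_iter=2000).fit(X[idx], y[idx])
    diffs[b] = np.abs(boot.predict_proba(X)[:, 1] - p_orig)

mape_per_person = diffs.mean(axis=0)        # individual-level instability
print(mape_per_person.mean(), np.quantile(mape_per_person, 0.95))
```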

Results: All variable selection approaches exhibited a similar level of discrimination at the base size and twice the base size. In contrast, at half the base size, both discrimination and calibration measures varied considerably. FULL achieved the highest discrimination in the smallest dataset but consistently displayed poor calibration across all sample sizes. Regularisation approaches (i.e., LASSO and ELASTIC) were well-calibrated across all dataset sizes, whereas traditional stepwise selection methods (i.e., UNIVAR, FORWARD, and BACKWARD) were only well-calibrated when the sample size was twice the base size. In terms of stability, both regularisation approaches had lower MAPE than others at the base size and twice the base size, while FULL showed lower MAPE at half the base size. All approaches required at least twice the minimum sufficient sample size to achieve a high level of individual stability.

Conclusion: Variable selection using regularisation is recommended, provided the sample size is sufficiently large. When sample sizes are around half the base size, regularisation approaches may still outperform other techniques in terms of stability and calibration. While FULL resulted in a modest improvement in stability, it exhibited significantly poorer calibration compared to UNIVAR, FORWARD, and BACKWARD.



posters-tuesday: 8

Early-detection of high-risk patient profiles admitted to hospital with respiratory infections using a multistate model

João Pedro Carmezim1, Cristian Tebé1, Natàlia Pallarès1, Roger Paredes1, Cavan Reilly2

1Germans Trias i Pujol Research Institute and Hospital (IGTP), Spain; 2University of Minnesota

Background: This study aims to identify clinically relevant prognostic factors associated with oxygen support, death or hospital discharge in a global cohort of adult patients with Influenza or COVID-19 using a multistate model.

Methods: Data was drawn from a cohort of adult patients diagnosed with respiratory infections admitted to a hospital of the Strategies and Treatments for Respiratory Infections and Viral Emergencies (STRIVE) research group. The study evaluates socio-demographic factors, medical history, comorbidities, vaccination status, virus type and clinical symptoms as prognostic factors. The multistate model was defined with the following states: hospital admission, noninvasive ventilation, invasive ventilation, oxygen support discharge, hospital discharge and death. The model estimates cause-specific hazard ratios, cumulative hazards and transition probabilities.
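
For one transition of such a model, a cause-specific Cox fit can be sketched as below (hypothetical data; competing transitions are treated as censoring), which is one common way to estimate cause-specific hazard ratios:

```python
# Cause-specific Cox model for a single transition, e.g. hospital admission ->
# noninvasive ventilation, on simulated data with made-up covariates.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(3)
n = 1000
df = pd.DataFrame({
    "age": rng.normal(62, 12, n),
    "female": rng.binomial(1, 0.48, n),
    "time": rng.exponential(10, n),          # days from admission to first exit
    "state": rng.choice(["niv", "discharge", "death"], size=n, p=[0.4, 0.5, 0.1]),
})
# Event indicator for the transition of interest; other exits are censored.
df["event_niv"] = (df["state"] == "niv").astype(int)

cph = CoxPHFitter()
cph.fit(df[["age", "female", "time", "event_niv"]],
        duration_col="time", event_col="event_niv")
cph.print_summary()   # cause-specific hazard ratios for this transition
```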

Results: A total of 4968 patients were included; the median age was 62.1 years and 47.9% were female. The number of patients who needed noninvasive ventilation was 1906 (38.4%), 277 (5.6%) required invasive ventilation, and 275 (5.5%) died. Demographic and clinical risk profiles revealed distinct progression pathways, and visualization using trajectory plots highlighted how risk factors influenced movement through disease states.

Conclusion: This study highlights the utility of a multistate model in mapping the progression of respiratory infections, providing critical insights into high-risk patient profiles. Transition probability trajectories provide clinicians with data to predict outcomes and, ideally, could help to plan resource allocation for these patients.



posters-tuesday: 9

Investigating fair data acquisition for risk prediction in resource-constrained settings

Ioanna Thoma1, Matthew Sperrin2, Karla Diaz Ordaz3, Ricardo Silva3, Brieuc Lehmann3

1The Alan Turing Institute, London, United Kingdom; 2Division of Informatics, Imaging & Data Sciences, The University of Manchester, Manchester, United Kingdom; 3Department of Statistical Science, University College London, London, United Kingdom

Introduction: Accurate risk prediction relies on robust clinical prediction models (CPMs), yet their reliability, generalisability, and fairness can be constrained by the available data. While additional covariates may improve risk prediction, collecting them for an entire population might not always be feasible due to resource constraints. For example, genetic testing can provide additional predictive power when combined with a clinical risk model, but a population-wide rollout may not be financially viable. A key question is how to allocate resources, prioritising the individuals who would benefit most from additional (genetic) testing. Our framework optimises utility and fairness when choosing between a baseline prediction model and a more costly but potentially more informative augmented model.

Methods: We develop a framework that quantifies the potential benefit to fairness and accuracy of a CPM when assessing policies for acquiring additional information for a subset of individuals. A specific use case is deploying an integrated tool that combines a traditional CPM, based on clinical risk factors, with a polygenic risk score (PRS). The goal is to evaluate the utility gained from such data integration. This involves comparing the outcomes of a conventional CPM with those of an integrated tool to assess how risk categorisation shifts when genetic information is incorporated.
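
The reclassification comparison can be illustrated schematically (risks, thresholds, and category labels below are hypothetical):

```python
# Cross-tabulate risk categories from the baseline CPM against the integrated
# (CPM + PRS) model to see who moves category.
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n = 10000
risk_base = np.clip(rng.beta(2, 18, n), 0, 1)                          # baseline CPM risk
risk_integrated = np.clip(risk_base + rng.normal(0, 0.03, n), 0, 1)    # + PRS shift

cuts = [0.0, 0.05, 0.10, 1.0]                                # illustrative bands
labels = ["low", "intermediate", "high"]
cat_base = pd.cut(risk_base, cuts, labels=labels, include_lowest=True)
cat_int = pd.cut(risk_integrated, cuts, labels=labels, include_lowest=True)

reclass = pd.crosstab(cat_base, cat_int, rownames=["baseline"],
                      colnames=["integrated"])
print(reclass)
moved = (np.asarray(cat_base) != np.asarray(cat_int)).mean()
print(f"proportion reclassified: {moved:.2%}")  # candidates for targeted testing
```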

Results: We apply our methodology to cardiovascular disease (CVD) risk prediction on a UK Biobank cohort of 96,884 individuals aged 40-75. Transitions in risk classification help identify populations that benefit most from genetic score integration. Once these population subgroups have been identified, we define sub-sampling policies to determine which individuals should be selected based on their covariates and existing model uncertainty. We investigate deterministic and stochastic policies that also account for varying subgroup proportions, ensuring a representative and fair sample composition. The methodology identifies age and gender groups that experience the most significant shifts in risk classification when transitioning from the baseline to the integrated model.

Conclusion: This framework has the potential to guide future data collection strategies, helping to prioritise population subgroups that need it the most. While our application focuses on the evaluation of an integrated tool for CVD risk prediction, we expect the methodology to be broadly applicable and adaptable to a variety of predictive models across the disease spectrum.



posters-tuesday: 10

A critical benchmark of Bayesian shrinkage estimation for subgroup analysis

Sebastian Weber1, Björn Bornkamp1, David Ohlssen2

1Novartis Pharma AG, Switzerland; 2Novartis Pharmaceuticals, USA

The estimation of subgroup-specific treatment effects is known to be a statistically difficult problem. We suggest evaluating different estimation approaches using a benchmark based on scoring the predictive distribution for the subgroup treatment effect, using late-phase clinical trial data comprising normal, binary and time-to-event endpoints. Bayesian shrinkage estimation models for subgroups are traditionally applied to non-overlapping subgroups using hierarchical models. This implies that several models need to be fitted to the same data set when several subgroup-defining variables are of interest. Recently, Wolbers et al. (2024) proposed using a single global regression model with priors, such as horseshoe priors, to induce shrinkage. This method has the benefit that there is no need to create a disjoint space of subgroups; overlapping subgroups can thus be investigated with a single model, avoiding the need to refit a given data set multiple times. We will compare the performance of different shrinkage approaches based on a real-data benchmark. The evaluated approaches include no and full shrinkage towards the overall treatment effect, Bayesian hierarchical shrinkage, and more novel priors such as the global model prior R2D2 proposed by Zhang et al. (2020).
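
As a simplified stand-in for the hierarchical shrinkage idea (the benchmark itself uses fully Bayesian models), an empirical-Bayes normal-normal calculation shows how subgroup estimates are pulled towards the overall effect; the estimates and standard errors below are made up:

```python
# Subgroup estimates are shrunk towards the overall (inverse-variance weighted)
# effect in proportion to how noisy they are, with the between-subgroup
# variance tau^2 estimated by a DerSimonian-Laird-type method of moments.
import numpy as np

theta_hat = np.array([0.10, 0.35, -0.05, 0.22, 0.18])   # subgroup effect estimates
se = np.array([0.12, 0.15, 0.20, 0.10, 0.08])            # their standard errors

w = 1 / se**2
mu = np.sum(w * theta_hat) / np.sum(w)                    # overall treatment effect
q = np.sum(w * (theta_hat - mu) ** 2)
tau2 = max(0.0, (q - (len(theta_hat) - 1)) /
           (np.sum(w) - np.sum(w**2) / np.sum(w)))        # between-subgroup variance

# Shrinkage factor: 1 (full shrinkage) when tau^2 = 0, -> 0 as tau^2 grows.
shrink = se**2 / (se**2 + tau2)
theta_shrunk = shrink * mu + (1 - shrink) * theta_hat
print(np.round(theta_shrunk, 3), "tau^2 =", round(tau2, 4))
```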



posters-tuesday: 11

Mathematical Modelling of Oxygenation Dynamics Using High-Resolution Perfusion Data: An Advanced Statistical Framework for Understanding Oxygen Metabolism

Mansour Taghavi Azar Sharabiani1, Alireza Mahani2, Richard Issitt3, Yadav Srinivasan4, Alex Bottle1, Serban Stoica5

1School of Public Health, Imperial College London, United Kingdom; 2Statman Solution Ltd, United Kingdom; 3Perfusion Department, Great Ormond Street Hospital for Children, London, United Kingdom; 4Cardiac Surgery Department, Great Ormond Street Hospital for Children, London, United Kingdom; 5Cardiac Surgery Department, Bristol Royal Children’s Hospital, Bristol, United Kingdom

Background
Balancing oxygen supply and demand during cardiopulmonary bypass (CPB) is crucial to minimising adverse outcomes. Oxygen supply is determined by cardiac index (CI), haemoglobin concentration (Hb), and arterial oxygen saturation (SaO₂), whereas oxygen demand is driven by metabolism, which itself depends on body temperature (Temp). Actual oxygen consumption is driven by oxygen extraction ratio (OER), dynamically adapting to changes in oxygen supply and demand, yet the mechanisms of this adaptation remain poorly understood. We developed GARIX and eGARIX, mathematically extending classical time-series models to incorporate nonlinear dependencies, patient-specific variabilities and minute-by-minute OER dynamics.

Methods
GARIX is a time-series model that integrates exogenous variables (CI, Hb, SaO₂, Temp) with a disequilibrium term representing the imbalance between oxygen consumption and temperature-dependent oxygen demand, initially modelled via a constant Q₁₀ framework (van’t Hoff model). The model was trained on intraoperative data from 343 CPB operations (20,000 minutes) in 334 paediatric patients at a UK centre (2019–2021). eGARIX extends GARIX by relaxing the assumption of constant Q₁₀, introducing nonparametric temperature dependence (splines) and incorporating age, weight, and their interaction. Subgroup analyses explored OER responses across different age groups.

Results
GARIX identified that OER adapts in a two-phase process: a rapid adjustment phase (<10 minutes) and a slower phase lasting several hours. Equilibrium analysis estimated Q₁₀ ≈2.25, indicating that oxygen demand doubles with every 8.5°C temperature increase. eGARIX demonstrated indexed oxygen demand following a nonlinear trajectory with age and weight, peaking at 3 years of age. In neonates and infants, oxygen demand correlated positively with weight, whereas in adolescents, the correlation was negative. Additionally, temperature dependence deviated from the classical Q₁₀ assumption, showing low sensitivity at mild hypothermia and high sensitivity at deep hypothermia. Younger patients exhibited a diminished OER response to Hb changes compared to older children.
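
The quoted doubling interval follows directly from the van't Hoff relation; a quick check of the arithmetic:

```python
# Under the van't Hoff model demand(T) = demand(T0) * Q10**((T - T0) / 10),
# the temperature increase that doubles oxygen demand is 10 * ln(2) / ln(Q10).
import math

q10 = 2.25                                   # equilibrium estimate reported above
delta_t_doubling = 10 * math.log(2) / math.log(q10)
print(round(delta_t_doubling, 1))            # ~8.5 degrees C, as stated
```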

Conclusions
The proposed GARIX and eGARIX models represent mathematical extensions of classical time-series modelling, enabling a data-driven approach to studying oxygen metabolism during CPB. By harnessing the vast amounts of recently available high-resolution perfusion data, these models compensate for the ethical limitations of direct human experimentation, providing a powerful framework to refine intraoperative oxygenation strategies. Our findings highlight the importance of advanced mathematical modelling in optimising personalised oxygen delivery strategies, adapting to individual patient characteristics, and enhancing our understanding of oxygen metabolism in paediatric CPB.



posters-tuesday: 12

Marginal structural Cox model with weighted cumulative exposure modelling for the estimation of counterfactual Population Attributable Fractions

Yue Zhai1, Ana-Maria Vilcu2, Jacques Benichou2,3, Lucas Morin2, Agnès Fournier4, Anne Thiébaut2, Vivian Viallon1

1Nutrition and Metabolism Branch, International Agency for Research on Cancer (IARC), Lyon, France; 2High Dimensional Biostatistics for Drug Safety and Genomics Team, Université Paris-Saclay, UVSQ, Inserm, CESP, Villejuif, France; 3Department of Biostatistics, Rouen University Hospital, Rouen, France; 4Exposome and Heredity Team, CESP U1018, Université Paris-Saclay, UVSQ, Inserm, Gustave Roussy, Villejuif, France

Introduction: Marginal structural Cox models (Cox MSMs) have become popular for estimating the causal effect of time-varying exposures on a time-to-event outcome, accounting for time-varying confounders affected by prior exposure levels. They can be combined with the weighted cumulative exposure (WCE) method to flexibly model the causal effect of past levels of the exposure on the hazard rate. This study evaluated the performance of the corresponding approach (Cox WCE MSM) based on regression B-splines, for the estimation of population attributable fractions (PAF) through extensive simulations.

Method: Independent samples of 10,000 and 1,000 individuals, each with 100 regular visits of follow-up, were generated. In each sample, approximately 50% of individuals experienced the event of interest before the end of follow-up. For a given hazard ratio comparing “always exposed” to “never exposed”, we considered four scenarios with different standardized weight functions reflecting how past exposure causally influences the current hazard rate as time elapses since exposure: i) monotonically decreasing weight; ii) bell-shaped weight; iii) constant weight; iv) current exposure only. Estimands of interest were the PAF and the causal effect function of past exposure. Various versions of Cox WCE MSMs were implemented to assess the influence of parameters such as the number of knots and the length of the time window. Additionally, we implemented two versions of the Cox MSM accounting for only current exposure and for unweighted cumulative exposure, respectively. The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) were used for model selection.
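
For intuition, a weighted cumulative exposure can be illustrated with a fixed, hypothetical weight function (the study instead estimates the weight function with regression B-splines):

```python
# Weighted cumulative exposure (WCE) at visit t: a weighted sum of the
# exposures over the preceding window, with more recent exposures weighted
# more heavily. Exposure history, window and weights are made up.
import numpy as np

rng = np.random.default_rng(5)
n_visits = 100
exposure = rng.binomial(1, 0.3, n_visits)        # exposure history, one subject

window = 20
lags = np.arange(window)                          # 0 = current visit
weights = np.exp(-lags / 5.0)                     # monotonically decreasing weights
weights /= weights.sum()                          # standardised weight function

wce = np.array([
    np.sum(weights[:t + 1][:window] *
           exposure[max(0, t - window + 1):t + 1][::-1])
    for t in range(n_visits)
])
print(wce[:10].round(3))
```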

Results: PAF estimates produced by most Cox WCE MSMs were unbiased in scenarios i to iii, but were biased in scenario iv. The variance of Cox WCE MSMs was comparable to that of conventional Cox MSMs. Notably, increasing the number of knots had little effect on variance. Models selected via either AIC or BIC provided unbiased PAF estimates across all scenarios. As for the causal effect of past exposure, although average estimates provided by Cox WCE MSMs were generally close to the true function, we observed large variation across samples, especially with smaller samples and weaker effects.

Conclusion: Overall, Cox WCE MSMs selected by either AIC or BIC yielded unbiased estimates of the counterfactual PAF. To ensure robust model selection, we recommend also considering the conventional Cox MSMs that account for current and unweighted cumulative exposure in the model selection process.



posters-tuesday: 13

Lost in the Forest of Forest Plots? Practical Guidelines and an All-in-One Tool for Forest Plots

Hongqiu Gu, Yong Jiang, Hao Li

Beijing Tiantan Hospital, Capital Medical University, People's Republic of China

Background: Forest plots are indispensable visualization tools in meta-analyses and other contexts of medical research, yet existing guidelines and implementation tools are often fragmented and lack a cohesive framework. In this study, we aimed to develop comprehensive guidelines and integrated tools to extend the applicability of forest plots across a wider range of research contexts.

Methods: Informed by a thorough review of existing literature and guidelines, combined with practical experience, we synthesized and developed a comprehensive classification system for forest plots driven by analysis methods. Additionally, we proposed key principles for their construction and created a versatile SAS macro to facilitate more effective application and communication of forest plots across various research scenarios.

Results: We categorized forest plots into four main types, corresponding to regression analysis, subgroup analysis, estimation analysis, and meta-analysis, across 11 scenarios independent of study design. The key principles for creating effective forest plots include providing comprehensive data, arranging items logically, ensuring accurate scaling, and applying aesthetic formatting. Furthermore, we developed versatile and integrated SAS tools that align with the proposed framework and principles.
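
A generic illustration of these principles (a matplotlib sketch with made-up estimates; it is not the authors' SAS macro):

```python
# Minimal forest plot: point estimates with confidence intervals on a ratio
# scale, one row per item, with a reference line at 1 and log-scaled axis.
import matplotlib.pyplot as plt
import numpy as np

labels = ["Age >= 65", "Female", "Diabetes", "Prior stroke", "Overall"]
hr = np.array([1.42, 0.91, 1.18, 1.65, 1.21])
lo = np.array([1.10, 0.78, 0.95, 1.22, 1.08])
hi = np.array([1.83, 1.06, 1.47, 2.23, 1.36])

y = np.arange(len(labels))[::-1]
fig, ax = plt.subplots(figsize=(5, 3))
ax.errorbar(hr, y, xerr=[hr - lo, hi - hr], fmt="s", color="black", capsize=3)
ax.axvline(1.0, linestyle="--", color="grey")    # line of no effect
ax.set_xscale("log")                              # accurate scaling for ratios
ax.set_yticks(y)
ax.set_yticklabels(labels)
ax.set_xlabel("Hazard ratio (95% CI)")
fig.tight_layout()
plt.show()
```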

Conclusion: This guideline provides a versatile, integrated solution for applying forest plots across various research contexts. It is expected to lead to improved use and visualization of forest plots.



posters-tuesday: 14

Robust Outlier Detection with Skewness-Adjusted Fences: Theoretical Foundations and Applications

Yunchae Jung, Minsu Park

Department of Statistics and Data Science, Chungnam National University, Republic of Korea

Outlier detection plays a crucial role in statistical analysis by ensuring data integrity and improving the reliability of inferences. Traditional methods, such as Tukey’s boxplot, often struggle with skewed distributions, leading to inaccurate detection and potential misinterpretation of results. While approaches like the adjusted boxplot (Hubert and Vandervieren, 2008) provide some improvements, they can be computationally demanding and less effective under extreme skewness.

In this study, we present an outlier detection framework that incorporates a skewness-adjusted fence into an enhanced boxplot design. By utilizing a robust skewness measure based on the median absolute deviation, this method addresses key limitations of existing approaches, offering a computationally efficient and statistically reliable alternative for skewed distributions. Simulation studies and real-world applications demonstrate that the proposed method consistently improves detection accuracy while maintaining efficiency.
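
For reference, the comparator adjusted boxplot of Hubert and Vandervieren (2008) cited above can be written down directly (naive O(n²) medcouple; the authors' MAD-based fence is a different, faster construction and is not shown here):

```python
import numpy as np

def medcouple(x):
    """Robust skewness: median of h(xi, xj) over xi <= med <= xj, xi != xj."""
    x = np.sort(np.asarray(x, dtype=float))
    med = np.median(x)
    lo, hi = x[x <= med], x[x >= med]
    h = []
    for xi in lo:
        for xj in hi:
            if xj != xi:
                h.append(((xj - med) - (med - xi)) / (xj - xi))
    return np.median(h)

def adjusted_fences(x, k=1.5):
    """Skewness-adjusted boxplot fences of Hubert & Vandervieren (2008)."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    mc = medcouple(x)
    if mc >= 0:
        return q1 - k * np.exp(-4 * mc) * iqr, q3 + k * np.exp(3 * mc) * iqr
    return q1 - k * np.exp(-3 * mc) * iqr, q3 + k * np.exp(4 * mc) * iqr

rng = np.random.default_rng(6)
x = rng.lognormal(mean=0, sigma=0.8, size=500)     # right-skewed sample
lower, upper = adjusted_fences(x)
print(lower, upper, np.sum((x < lower) | (x > upper)), "flagged outliers")
```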

Additionally, we extend this approach to time-dependent data, showing its effectiveness in identifying outliers in time series settings. This extension makes the method applicable to a wide range of fields, including finance, healthcare, and environmental monitoring, where detecting anomalies in structured and evolving datasets is essential.

Keywords: Robust outlier detection, Skewness-adjusted boxplot, Influence function, Median absolute deviation



posters-tuesday: 15

Minimum Area Confidence Set Optimality for Simultaneous Confidence Bands for Percentiles in Linear Regression: An Application to Estimating Shelf Life

Lingjiao Wang1, Yang Han1, Wei Liu2, Frank Bretz3,4

1Department of Mathematics, University of Manchester, UK; 2School of Mathematical Sciences and Southampton Statistical Sciences Research Institute, University of Southampton, UK; 3Novartis Pharma AG, Basel, Switzerland; 4Section for Medical Statistics, Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Austria

Background: The stability of a drug product over time is a critical property in pharmaceutical development. A key objective in drug stability studies is to estimate the shelf-life of a drug, involving a suitable definition of the true shelf-life and the construction of an appropriate estimate of the true shelf-life. Simultaneous confidence bands (SCBs) for percentiles in linear regression are valuable tools for determining drug shelf-life in drug stability studies.
Methods: In this paper, we propose a novel criterion, the Minimum Area Confidence Set (MACS), for identifying the optimal SCB for percentile regression lines. This criterion focuses on the area of the constrained regions for the newly proposed pivotal quantities, which are generated from the confidence set for the unknown parameters of a SCB. We employ the new pivotal quantities to construct exact SCBs over any finite covariate intervals and use the MACS criterion to compare several SCBs of different forms. Additionally, we introduce a computationally efficient method for calculating the critical constants of exact SCBs for percentile regression lines.
Results: The optimal SCB under the MACS criterion is demonstrated to effectively construct interval estimates of the true shelf-life. The proposed method for calculating critical constants significantly improves computational efficiency. A real-world drug stability dataset is used to illustrate the application and advantages of the proposed approach.



posters-tuesday: 16

One-sided simultaneous tolerance intervals based on kernel density estimates

Gian Louisse Roy

University of the Philippines Diliman

Tolerance intervals are informative tools with wide-ranging applications in various fields, especially in laboratory medicine. They are valuable in medical decision making as they contain a specified proportion of values of the sampled population with a high degree of confidence. When several biochemical analytes are measured from patients, simultaneous inference becomes useful. This study proposes nonparametric methods that construct simultaneous tolerance intervals (STIs) for the one-sided case. As most medical data show skewness and come from unknown underlying distributions, the proposed STIs are based on kernel density estimates. The methodologies are evaluated by examining performance metrics, such as estimated coverage probabilities and expected lengths, and by comparing them with the usual Bonferroni-correction approach (BCA). The proposed methods show accurate results, as these metrics exhibit desirable patterns, with a few exceptions that are further examined and justified. These methods also address a spurious behavior that BCA results tend to display. The proposed one-sided nonparametric STIs are generally more favourable than those from BCA and can be improved through the recommended future work that is laid out.



posters-tuesday: 17

Robust large-scale multiple testing for hidden Markov random field model

Donghwan Lee1, Jiyn Sun2

1Department of Statistics, Ewha Womans University, Republic of Korea; 2Integrated Biostatistics Branch, Division of Cancer Data Science, National Cancer Center, Republic of Korea

The hidden Markov random field (HMRF) model, an effective model for describing the local dependence of two- or three-dimensional image data, has been successfully applied to large-scale multiple testing of correlated data, image segmentation, graph discovery, and so on. Given the unobservable random field, the emission probability (the conditional distribution of the observations) is usually assumed to be known, and the Gaussian distribution is frequently used. To achieve robustness, we introduce a novel framework for large-scale multiple testing when the emission probability distribution of the HMRF is unknown or misspecified. We build the inferential procedure for estimating parameters and the false discovery rate (FDR) based on a quadratically convergent method for computing non-parametric maximum likelihood estimates of a mixing distribution. Furthermore, we integrate latent variable modeling with the knockoff filter method to improve FDR control in testing. The proposed method is validated by simulation studies, which show that it outperforms existing methods in terms of FDR validity and power. A real data example from neuroimaging demonstrates the utility of the proposed procedure.



posters-tuesday: 18

Model informed assurance approach for 3-way PK similarity studies

Rachid El Galta, Roland Baumgartner

Sandoz, Germany

In the absence of actual data, published pharmacokinetic (PK) models can be used to simulate subjects' PK profiles and estimate geometric mean ratios and coefficients of variation for parameters such as AUC and Cmax. These estimates can then inform sample size calculations for PK similarity studies. However, their accuracy depends on the quality of the PK model and its input parameters. Ignoring this uncertainty can lead to underpowered studies.

To address this, we use an assurance approach alongside power calculations. This involves simulating PK profiles with a published PK model, considering parameter uncertainty by sampling from a multivariate normal distribution. We generate multiple parameter sets, simulate PK profiles by treatment arm for each, and perform equivalence testing. Assurance is the proportion of successful equivalence tests.
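
A schematic version of this assurance calculation, under hypothetical priors and a simplified parallel-group design rather than a specific published PK model:

```python
# Draw "true" parameters from distributions reflecting uncertainty, simulate a
# study on the log scale, run the 90% CI equivalence test against 0.80-1.25,
# and report the pass rate (assurance). All inputs below are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_per_arm, n_sim = 60, 5000
passed = 0
for _ in range(n_sim):
    # Parameter uncertainty: true log-GMR and within-arm SD of log(AUC).
    true_log_gmr = rng.normal(np.log(1.02), 0.03)
    sd_log = np.abs(rng.normal(0.25, 0.03))
    # Simulate log(AUC) for test and reference arms.
    test = rng.normal(true_log_gmr, sd_log, n_per_arm)
    ref = rng.normal(0.0, sd_log, n_per_arm)
    diff = test.mean() - ref.mean()
    se = np.sqrt(test.var(ddof=1) / n_per_arm + ref.var(ddof=1) / n_per_arm)
    t_crit = stats.t.ppf(0.95, 2 * n_per_arm - 2)
    lo, hi = diff - t_crit * se, diff + t_crit * se     # 90% CI for log-GMR
    if np.log(0.80) < lo and hi < np.log(1.25):
        passed += 1

print("assurance ~", passed / n_sim)
```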

Combining assurance with traditional power calculations provides a more comprehensive assessment of sample size considerations.



posters-tuesday: 19

Korea Sequence Read Archive (KRA) - A public repository for archiving raw sequence data

Jaeho Lee

KRIBB, Korea, Republic of (South Korea)

The Korea Sequence Read Archive (KRA; https://kbds.re.kr/KRA) is a publicly available repository of high-throughput sequencing data and is part of the Korea BioData Station (K-BDS; https://kbds.re.kr/) database. KRA collects and provides key nucleotide sequence data, including files in FASTQ or FASTA format and rich metadata generated by various NGS technologies. The primary objective of the KRA is to support and promote the use of nucleotide sequencing as an experimental research platform. It achieves this by offering comprehensive services for data submission, archiving, searching, and downloading. Recently, the existing collaboration with DDBJ has been further strengthened to establish close cooperation with the INSDC. As a result, KRA now supports data submission to the INSDC via DDBJ DRA, and through enhanced browser functionalities, users can search and download data more efficiently. By ensuring the long-term preservation and accessibility of nucleotide sequence data, and through continuous development and improvement, KRA remains an important resource for researchers utilizing nucleotide sequence analysis data. KRA is available at https://kbds.re.kr/KRA.



posters-tuesday: 20

Integrative analysis of transcriptomic and epigenomic dynamics of liver organoids using single cell RNA-seq and ATAC-seq

Kwang Hoon Cho, Jong-Hwan Kim, Jimin Kim, Jahyun Yun, Dayeon Kang

Korea Research Institute of Bioscience and Biotechnology, Korea, Republic of (South Korea)

Previously, we developed a novel method to generate functionally mature human hepatic organoids derived from pluripotent stem cells (PSCs), and their maturation was validated through bulk RNA sequencing (RNA-seq). In this study, we aimed to characterize the heterogeneity and dynamic changes in the transcriptome and epigenome at the single-cell level. To achieve this, we employed single-cell RNA sequencing (scRNA-seq) and single-cell ATAC sequencing (scATAC-seq) using the 10x Chromium platform.

Hepatic organoids were cultured under two distinct medium conditions: hepatic medium (HM) and differentiation medium (DM). A total of 39,310 and 36,940 individual cells were analyzed using scRNA-seq and scATAC-seq, respectively. To validate our findings, we compared our data with publicly available RNA-seq datasets from liver organoids and liver tissues at various stages of differentiation, including induced pluripotent stem cells (iPSCs), DM-treated cells, primary human hepatocytes (PHHs), and adult liver tissues.

Our analysis revealed that cells clustered into 10 to 11 distinct subpopulations, representing different developmental stages in both scRNA-seq and scATAC-seq datasets. Furthermore, integrative analysis of scRNA-seq and scATAC-seq data identified coordinated changes in gene expression and chromatin accessibility near key liver differentiation marker genes. These findings indicate that hepatic organoids cultured under HM and DM conditions consist of heterogeneous cell populations spanning multiple stages of hepatic differentiation.

In conclusion, single-cell transcriptomic and epigenomic profiling provided insights into the cellular diversity and developmental trajectory within hepatic organoids. This study highlights the utility of scRNA-seq and scATAC-seq in elucidating the molecular dynamics underlying liver differentiation and maturation.



posters-tuesday: 21

Leveraging tumor imaging compositional data structure in model feature space for predicting recurrence in colorectal carcinoma

Olivia J Bobek, Nicholas Larson, Rish K Pai, Fang-Shu Ou

Mayo Clinic, United States of America

Background/Introduction:

The quantitative segmentation algorithm QuantCRC extracts morphologic features of digitized H&E slides in colorectal carcinoma (CRC), quantitatively decomposing the tumor bed area into stroma and stromal subtypes, necrosis, and tumor components. These features have previously been incorporated as linear predictors in a LASSO regularized regression model for cancer recurrence in a cancer registry study. However, as compositional data, representing these features as simple proportions may not maximize their informativeness for prediction. Likewise, algorithms based on linear predictors may fail to account for more complex relationships between compositional features and outcome. The objective of this research was to investigate how commonly used log-ratio transformations for compositional data impact QuantCRC-based prognostic modeling performance as well as assess competing machine learning algorithms that may offer benefits for compositional feature spaces.

Methods:

The study cohort consisted of 2411 CRC patients from the Colon Cancer Family Registry. The outcome of interest was recurrence-free survival, measured as the time from surgery to recurrence or last follow-up. The original LASSO model included 15 QuantCRC features, tumor stage (I-IV) and mismatch repair status (deficient vs. proficient). The proposed model feature space included the additive log-ratio transformations of the compositional variables in addition to the clinical variables, yielding 34 features in total. In addition to LASSO, elastic net and gradient boosting machine (GBM) algorithms were also applied using the log-ratio feature set. Training was performed using 10-fold cross-validation on 80% (n=1928) of the data and testing on the remaining 20% (n=483). Harrell’s C-index was used to assess discrimination.
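
The additive log-ratio (alr) transformation itself is straightforward; a sketch on a hypothetical 4-part composition (the last component serves as the reference denominator):

```python
# D-part composition -> D-1 unconstrained log-ratio features. A small offset
# guards against zero proportions. Example data are simulated, not QuantCRC.
import numpy as np

rng = np.random.default_rng(8)
raw = rng.dirichlet(alpha=[4, 3, 2, 1], size=5)        # rows sum to 1

def alr(comp, eps=1e-6):
    comp = np.clip(comp, eps, None)
    comp = comp / comp.sum(axis=1, keepdims=True)
    # log of each of the first D-1 parts relative to the last part.
    return np.log(comp[:, :-1] / comp[:, -1:])

features = alr(raw)
print(raw.round(3))
print(features.round(3))
```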

Results:

On the training set, the original LASSO produced a Harrell’s C-index of 0.697 (bootstrapped 95% Confidence Interval (CI): 0.672, 0.723) and the LASSO with log-ratio features produced a C-index of 0.703 (95% CI: 0.679, 0.729). The C-index for the elastic net and GBM was 0.704 (95% CI: 0.677, 0.731) and 0.719 (95% CI: 0.692, 0.744) respectively. In the test data, the LASSO with the log-ratio transformation produced a slightly improved C-index: 0.701 (95% CI: 0.650, 0.746) compared to the original features (0.697 (95% CI: 0.646, 0.743)). The elastic net resulted in a C-index of 0.703 (95% CI: 0.653, 0.749) and GBM produced a C-index of 0.702 (95% CI: 0.647, 0.751).

Conclusion:

The additive log-ratio transformation is a compositional data representation to consider for predictive models. In this application, feature engineering based on compositional structure slightly improved model performance. All algorithms with compositional data features demonstrated comparable model discrimination.



posters-tuesday: 22

BayesPIM: A Bayesian Prevalence-Incidence Mixture Model for Screening Outcomes, with an Application to Colorectal Cancer

Thomas Klausch, Birgit Lissenberg-Witte, Veerle Coupé

Amsterdam University Medical Center

Background

Screening programs for diseases such as colorectal cancer (CRC) involve inviting individuals at regular or irregular intervals for a test, such as the fecal immunochemical test (FIT) or a colonoscopy. The resulting data can be analyzed to obtain the time to (pre-state) disease which, when additionally regressed on covariates such as age and gender, is informative on risk heterogeneity. Such information helps decide whether screening intervals should be personalized to identified risk factors.

We present the R package BayesPIM – Bayesian prevalence-incidence mixture model – which is particularly suited in settings where individuals are periodically tested (interval censoring), have the disease at baseline (prevalence), baseline tests may be missing, and the screening test has imperfect sensitivity. We motivate the model using data from high-risk familial CRC surveillance through colonoscopy, where adenomas, precursors of CRC, are the primary target of screening. Besides demonstrating the functionalities of BayesPIM, we also show how to evaluate model performance using simulations based on the real-world CRC data.

Methods

BayesPIM models the interval-censored time to incidence via an accelerated failure time model while handling latent prevalence, imperfect test sensitivity, and covariate data. Internally, a Metropolis-within-Gibbs sampler and data augmentation are used, implemented through an Rcpp backend. A user-friendly R interface is available. Model fit can be assessed using information criteria and validated against a non-parametric estimator of cumulative incidence.

Additionally, performance is evaluated by resampling the real-world CRC screening data. Specifically, we set the data-generating model parameters to their estimates and then generate screening times and outcomes that closely resemble those observed in practice via an innovative algorithm. Repeatedly comparing estimates on these resampled datasets to the true values assesses model performance under realistic data conditions.

Results

In the CRC application, baseline prevalence of adenomas was estimated at 27.4% [95% CI: 22.2%, 33.3%], with higher prevalence in males and older individuals. Among those free of adenoma at baseline, incidence reached 20% at five years and 45% at ten years, with older individuals experiencing faster incidence. Resampling simulations based on the CRC data showed that model estimation remained stable if informative priors on test sensitivity were imposed, even at low sensitivity (40%).

Conclusion

BayesPIM offers robust estimation of both prevalence and incidence under complex, real-world screening conditions, including uncertain test sensitivity, latent disease status, and irregular intervals. The model demonstrated stable performance under varying test sensitivities, highlighting its practical value for designing more effective, patient-centered screening programs.



posters-tuesday: 23

Joint Modelling of Random Heterogeneity in Longitudinal and Multiple Time-to-Events in Colon Cancer

Divya Dennis, Jagathnath Krishna KM

Regional Cancer Centre, Thiruvananthapuram, Kerala, India

Background: In cancer survival studies, disease progression can be assessed with longitudinal study designs in which patients are observed over time and covariate information (biomarkers such as carcinoembryonic antigen, CEA) is measured repeatedly during the follow-up period. Apart from repeatedly measured covariates, multiple survival outcomes are observed longitudinally, and there may exist unobserved random heterogeneity between the survival outcomes. This motivated us to derive a joint multi-state frailty model (JMFM) capable of predicting the risk for multiple time-to-events simultaneously, utilizing dynamic predictors and a random heterogeneity factor, the frailty. The frailty variable was assumed to follow a gamma distribution, forming the joint multi-state gamma frailty model (JMGFM).

Methodology: To account for heterogeneity, the longitudinal outcome, and multiple time-to-events, we derived the JMGFM. The longitudinal sub-model was modeled using a linear mixed model and the survival sub-model using a multi-state gamma frailty model (MGFM). A latent variable was used to link the longitudinal and multiple time-to-event sub-models. The parameters were estimated using the maximum likelihood estimation method. The existing MGFM and the developed model were illustrated using colon cancer patient data. The covariates considered for risk prediction were composite stage, lymph node involvement, T4, age, sex, PNI, and LVE, with CEA as the longitudinal outcome.

Results: The study observed that the frailty coefficient had a significant impact on predicting the risk at each transition state, along with the longitudinally measured covariate, so the JMGFM was found to be more predictive than the MGFM. The JMGFM is also capable of providing dynamic risk prediction simultaneously, which the MGFM cannot. The present study identified PNI (transition from diagnosed disease to death), composite stage (transitions from recurrence to death, from metastasis to death, and from recurrence to metastasis), lymph-node involvement, and age, along with the longitudinally measured CEA value, as significant prognostic factors for predicting the multiple time-to-events based on the proposed JMGFM. It also found that, for each transition state, the longitudinal observation (CEA) has a strong association with the corresponding survival events (η ranges from 1.3 to 1.5).

Conclusion: We therefore conclude that the joint multi-state frailty model is a better model for simultaneous dynamic risk prediction of multiple events in the presence of random heterogeneity in a longitudinal study design.

Keywords: Multi-state model, Joint multi-state model, joint multi-state frailty model, longitudinal sub-model, Colon cancer



posters-tuesday: 24

Refining the Association between BMI, Waist Circumference, and Breast Cancer Risk in Postmenopausal Women using G-formula Method

Somin Jeon1,2, Boyoung Park1,3, Junghyun Yoon1,2

1Department of Preventive Medicine, Hanyang University College of Medicine, Seoul, Republic of Korea; 2Institute for Health and Society, Hanyang University, Seoul, South Korea; 3Hanyang Institute of Bioscience and Biotechnology, Hanyang University, Seoul, Republic of Korea

Purpose. Previous studies have shown an increased risk of postmenopausal breast cancer (BC) in obese women. However, these studies did not focus on longitudinal changes in obesity levels and did not account for time-varying covariates. This study applies the g-formula method to assess how changes in BMI and waist circumference are associated with subsequent BC risk.

Methods. Data were obtained from the Korean National Health Insurance Database. We utilized data from the national BC screening program, with baseline data including women who underwent screening in 2009-2010. Screening information in the subsequent biennial cycles (2011-2012 until 2019-2020) was examined, and only women with postmenopausal status at baseline and with at least three screenings were included in the analysis. Incident BC cases were ascertained until 2021. We applied the g-formula method to compare BC risk in women who maintained a certain BMI/waist circumference level versus the natural course. Hazard ratios (HRs) were estimated, and the model was adjusted for age, fixed covariates, and time-varying covariates.
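
For intuition only, the single-time-point version of the g-formula (standardisation) looks as follows; the study itself uses the time-varying parametric g-formula with repeated screening rounds, and all data and coefficients here are hypothetical:

```python
# Fit an outcome model, then average predicted risks over the cohort (i) at
# the observed covariate values ("natural course") and (ii) with BMI set to a
# normal level for everyone, and compare the standardised risks.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(9)
n = 20000
age = rng.normal(60, 7, n)
bmi = rng.normal(24.3, 3.1, n)
p = 1 / (1 + np.exp(-(-7 + 0.05 * age + 0.08 * bmi)))   # hypothetical risk model
bc = rng.binomial(1, p)

df = pd.DataFrame({"age": age, "bmi": bmi, "bc": bc})
model = LogisticRegression(max_iter=1000).fit(df[["age", "bmi"]], df["bc"])

# "Natural course": predicted risk at observed BMI, averaged over everyone.
risk_natural = model.predict_proba(df[["age", "bmi"]])[:, 1].mean()

# Intervention: set everyone's BMI to a normal level (<23), keep age as observed.
df_int = df.assign(bmi=np.minimum(df["bmi"], 22.9))
risk_intervened = model.predict_proba(df_int[["age", "bmi"]])[:, 1].mean()

print(risk_intervened / risk_natural)   # standardised risk ratio vs natural course
```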

Results. Of the 91,092 postmenopausal women, the mean (SD) age was 60.7 (7.5) years, and the mean (SD) BMI and waist circumference were 24.3 (3.1) and 79.9 (8.1), respectively. Results from the g-formula show that, compared to women with a natural course of BMI, those who maintained a normal BMI (<23) or an overweight BMI (23 to <25) had a decreased BC risk (adjusted hazard ratio [aHR] 0.93, 95% CI 0.90 – 0.95, and aHR 0.97, 95% CI 0.96 – 0.98, respectively). In contrast, those who maintained obese status had an increased BC risk (obese 1, BMI 25 to <27.5, with an aHR of 1.07, and obese 2, BMI ≥27.5, with an aHR of 1.20). A similar pattern was observed for waist circumference.

Conclusions. Results from the g-formula indicate that maintaining a normal BMI or waist circumference is associated with a lower BC risk, while obese women are at an increased risk of postmenopausal breast cancer.

Acknowledgments: This study was funded by the National Research Foundation of Korea (NRF) (grant no. RS-2023-00241942, RS-2024-00462658, and 2021R1A2C1011958).



posters-tuesday: 25

Building cancer risk prediction models by synthesizing national registry and prevention trial data

Oksana Chernova1, Donna Ankerst1,2, Ruth M Pfeiffer3

1Technical University of Munich, Germany; 2Department of Urology, University of Texas Health Science Center at San Antonio, USA; 3Biostatistics Branch, National Cancer Institute, NIH, HHS, Bethesda, USA

Current online United States (US) five-year prostate cancer risk calculators are based on screening trials or databases not calibrated to the heterogeneous US population. They are underpowered for the rarer outcome of high-grade disease, particularly for the subpopulation of African Americans, who are underrepresented in many national trials. The US National Cancer Institute (NCI) Surveillance, Epidemiology, and End Results (SEER) program has monitored state cancer rates since 1973, more recently adding Gleason grade. SEER rates are stratified by five-year age groups and race, filling in statistical power gaps for African Americans. This talk provides the statistical method for integrating SEER incidence and mortality rates with time-to-event data with competing risks from prevention and screening trials following the NCI Breast Cancer Risk Assessment Tool. The methodology allows development of a contemporary 5-year high-grade prostate cancer risk prediction model that is trained from merging individual-participant data from the Selenium and Vitamin E Cancer Prevention Trial (SELECT) with population aggregated data in SEER. Simulation of a contemporary US validation set is performed by merging individual-level data from the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO) with SEER.



posters-tuesday: 26

Modelling Individual-level Uncertainty from Missing Data in Personalised Breast Cancer Risk Prediction

Bethan L. White, Lorenzo Ficorella, Xin Yang, Douglas F. Easton, Antonis C. Antoniou

University of Cambridge, United Kingdom

Breast cancer risk prediction models use a range of predictors to estimate an individual’s chance of developing breast cancer in a given timeframe. These can facilitate risk stratification, to identify individuals who would benefit most from screening or preventive options. The BOADICEA breast cancer risk model, implemented in the CanRisk tool (www.canrisk.org), uses genetic, lifestyle, hormonal, family history and anthropometric data to estimate an individual’s risk. When implementing risk prediction models, risk predictor data are often incomplete. Point-estimates calculated when some risk factor data are missing can therefore hide considerable uncertainty.

We developed a methodological approach for quantifying uncertainty and the probability of risk-reclassification in the presence of missing data. We employed Monte Carlo simulation methods to estimate the distribution of breast cancer risk for individuals with missing data, using multiple imputation by chained equations (MICE) with UK Biobank and KARMA as reference datasets to sample missing covariates. We developed a framework for estimating uncertainty that can be applied to any given individual with missing risk factor data. We used exemplar cases to assess the probability that collecting all missing data would result in a change in risk categorisation, on the basis of the 10-year predicted risk from age 40, using the UK National Institute for Health and Care Excellence (NICE) guidelines.
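
A small sketch of this Monte Carlo idea, using scikit-learn's IterativeImputer (with posterior sampling) as a generic chained-equations stand-in rather than the authors' MICE pipeline; reference_X, new_profile and predict_10yr_risk are hypothetical placeholders for the reference dataset, an individual's partially observed risk-factor vector and the risk model, respectively.

    # Distribution of predicted risk for one individual with missing risk factors.
    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer

    def risk_distribution(reference_X, new_profile, predict_10yr_risk, n_draws=500, seed=0):
        """reference_X: (n, p) complete reference risk-factor data.
        new_profile: length-p vector with np.nan where risk factors are unmeasured.
        predict_10yr_risk: callable mapping a completed profile to a predicted risk."""
        rng = np.random.default_rng(seed)
        risks = []
        for _ in range(n_draws):
            imp = IterativeImputer(sample_posterior=True,
                                   random_state=int(rng.integers(0, 2**31 - 1)))
            imp.fit(reference_X)
            completed = imp.transform(np.asarray(new_profile, float).reshape(1, -1))[0]
            risks.append(predict_10yr_risk(completed))
        return np.percentile(risks, [2.5, 50, 97.5])  # approximate 95% uncertainty interval

The spread of the resulting risk draws, and the fraction falling in each risk band, can then be summarised into reclassification probabilities analogous to those reported above.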

For example, a woman whose mother and sister have both been previously diagnosed with breast cancer, but with all other personal risk factor information unmeasured, will be categorised as at “moderate risk” by the BOADICEA model, with around a 5% chance of developing breast cancer between the ages of 40 and 50. However, if all remaining risk factor information were measured, our methodology estimates a 52% chance of reclassification to the “population risk” group, and a 5% chance of reclassification to the “high risk” group. Taking into account all missing risk factor information, an estimated 95% uncertainty interval for the risk point estimate would be (0.9%, 9.0%).

These results demonstrate that there can be a considerable likelihood of reclassification into a different risk category after collecting missing data. The methodology presented in this work can identify situations where it would be most beneficial to collect additional patient information before making decisions in clinical settings.



posters-tuesday: 27

Time-varying covariates in Survival Analysis: a graphical approach to assessing the risk of cardiovascular events and aortic valve sclerosis development

Arianna Galotta1, Francesco Maria Mattio1, Veronika Myasoedova1, Elisabetta Salvioni1, Paolo Poggio1,3, Piergiuseppe Agostoni1,2, Alice Bonomi1

1Centro Cardiologico Monzino, IRCCS, Milan, Italy; 2Department of Clinical and Community Sciences, University of Milan, Milan, Italy; 3Department of Biomedical, Surgical and Dental Sciences, University of Milan, Milan, Italy

Background: Survival analysis is essential for studying the time to the occurrence of an event of interest, such as death or the onset of a disease. When covariates change over time, it is crucial to consider these variations to estimate the relationship between exposure and outcome accurately and robustly. While using the Cox model with time-dependent covariates is methodologically appropriate, its graphical representation remains challenging. This study focuses on evaluating the development of aortic valve sclerosis (AVSc) as an exposure condition to the risk of cardiovascular (CV) events, taking into account its progression over time.

Methods: The relative risk of CV events linked to AVSc development was assessed using the Cox proportional hazards model. To generate the survival curves, we applied the method proposed by Simon and Makuch (Schultz et al., 2002). This approach differs from the traditional Kaplan-Meier method, which treats covariates as fixed; the difference imposed by considering a time-dependent covariate is in the interpretation of the risk set. In our case, a time-varying covariate leads to a continuous renewal of risk sets based on the value of the covariate at each time point. Therefore, the risk set includes all individuals at risk just before time t, whose covariate value indicates their membership in the relevant group at that time.
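
For illustration, a time-dependent Cox fit of this kind can be set up in counting-process (start-stop) format; the sketch below uses the lifelines library with toy data and hypothetical column names, and the Simon and Makuch curves themselves would additionally require recomputing the risk sets by current exposure status, as described above.

    # Cox model with a time-varying AVSc exposure (toy start-stop data, hypothetical columns).
    import pandas as pd
    from lifelines import CoxTimeVaryingFitter

    long_df = pd.DataFrame({
        "id":    [1, 1, 2, 3, 3],
        "start": [0, 24, 0, 0, 36],     # months
        "stop":  [24, 60, 48, 36, 80],
        "avsc":  [0, 1, 0, 0, 1],       # 0 before AVSc development, 1 after
        "event": [0, 1, 0, 0, 1],       # CV event at `stop`
    })

    ctv = CoxTimeVaryingFitter()
    ctv.fit(long_df, id_col="id", event_col="event", start_col="start", stop_col="stop")
    ctv.print_summary()  # hazard ratio for AVSc as a time-dependent exposure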

Results: Time-dependent analyses were conducted to evaluate AVSc development as a time-sensitive exposure to CV events. Participants who developed AVSc during the follow-up period were considered unexposed from baseline until the onset of development, after which they were classified as exposed. The hazard ratio related to the AVSc development was then evaluated using the Cox proportional-hazards model. The analysis with the time-dependent covariate approach provided a more detailed understanding of the association between the AVSc development and the risk of CV events over time. The survival curves generated using this method demonstrated that accounting for the time-varying nature of AVSc exposure significantly impacted the prognosis of patients.

Conclusion: This study emphasises the importance of considering time-varying covariates in survival analysis for an accurate risk estimate. Although the Cox model with time-dependent covariates is the correct methodological choice, its graphical representation is complex. The method proposed by Simon and Makuch enhances the traditional Kaplan-Meier approach by allowing the integration of covariates that evolve over time. This is particularly relevant in medical research, where dynamic exposures must be considered to avoid misleading conclusions.



posters-tuesday: 28

Polygenic scores as tools for intervention selection in the setting of finasteride for prostate cancer prevention

Allison Meisner

Fred Hutchinson Cancer Center, United States of America

Background/introduction: Polygenic risk scores (PRS) have been proposed as tools for intervention selection. PRS are weighted combinations of single nucleotide polymorphisms (SNPs) where each SNP is weighted by its association with outcome risk. An alternative approach utilizes predictive polygenic scores (PPS), in which the weight for each SNP corresponds to its association with intervention effect. We compare the utility of PRS and PPS for identifying individuals expected to benefit from finasteride in the prevention of prostate cancer.

Methods: We used data from the Prostate Cancer Prevention Trial (PCPT), a randomized trial of finasteride for prostate cancer prevention. Of the 8,506 men with available genotype data, YY developed prostate cancer. We used the Polygenic Score Catalog to identify a recently developed prostate cancer PRS. We split the data into training (2/3 of the data) and test (1/3 of the data) sets. We constructed three scores, each of which was a combination of 198 SNPs in the PRS published on the Polygenic Score Catalog: (1) a PRS based on the coefficients published in the Polygenic Score Catalog (PRS1), (2) a PRS based on coefficients estimated in the training data via logistic regression (PRS2), and (3) a PPS based on the interaction between each SNP and randomization to finasteride, estimated in the training data via logistic regression. In the test data, we compared the three scores based on the reduction in the rate of prostate cancer when a given score is used for intervention selection.
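
A simplified sketch of the distinction between the two types of score, assuming hypothetical arrays X_snp (SNP dosages), treat (randomisation indicator) and y (prostate cancer outcome); weights are estimated one SNP at a time for illustration, which need not match the trial analysis exactly.

    # PRS: SNPs weighted by association with risk; PPS: SNPs weighted by SNP-by-treatment interaction.
    import numpy as np
    import statsmodels.api as sm

    def fit_score_weights(X_snp, treat, y):
        prs_w, pps_w = [], []
        for j in range(X_snp.shape[1]):
            # marginal association of SNP j with outcome risk
            Xj = sm.add_constant(X_snp[:, [j]])
            prs_w.append(sm.Logit(y, Xj).fit(disp=0).params[1])
            # SNP j x treatment interaction coefficient
            Xint = sm.add_constant(np.column_stack([X_snp[:, j], treat, X_snp[:, j] * treat]))
            pps_w.append(sm.Logit(y, Xint).fit(disp=0).params[3])
        return np.array(prs_w), np.array(pps_w)

    # prs_w, pps_w = fit_score_weights(X_train, treat_train, y_train)
    # prs_test = X_test @ prs_w   # higher = higher predicted risk
    # pps_test = X_test @ pps_w   # higher = larger predicted effect of finasteride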

Results: In the test data, 17.0% of men developed prostate cancer and finasteride was significantly associated with a reduction in risk of prostate cancer; thus, the default setting is to treat all men with finasteride. For PRS1, there was no threshold at which treatment with finasteride would not be recommended; thus, use of PRS1 to guide intervention use would not reduce the rate of prostate cancer. For PRS2, 0.2% of men would not be recommended finasteride, leading to a reduction in the rate of prostate cancer of < 0.001%. Finally, for the PPS, 35.3% of men would not be recommended finasteride, leading to a reduction in the rate of prostate cancer of 3.0% if the PPS were used to guide intervention use.

Conclusion: In this analysis of PCPT data, PPS demonstrated substantially greater clinical utility as tools for intervention selection compared to PRS. PPS should be considered as tools for intervention selection more broadly.



posters-tuesday: 29

Implementation of a Disease Progression Model accounting for covariates

Gabrielle Casimiro, Sofia Kaisaridi, Sophie Tezenas du Montcel

ARAMIS, Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM, CNRS, Inria, Inserm, AP-HP, Groupe Hospitalier Sorbonne Université, Paris, France

Introduction: Disease progression models are promising tools for analysing longitudinal data presenting multiple modalities. Such models can be used to estimate long-term disease progression and reconstruct individual trajectories. Inter-patient variability is often modeled as random perturbations around a fixed reference. However, much of this variability is driven by external factors such as genetic mutations, gender, level of education or socio-economic status.

In this work, we extend a non-linear mixed-effects disease progression model (Disease Course Mapping Model), implementing a multivariate logistic framework to explicitly account for covariates. We illustrate the potential of this approach by modelling the evolution of CADASIL disease, the most frequent small artery brain disease caused by pathogenic variants of the NOTCH3 gene, using the genetic mutation location as a covariate.

Methods: A general formulation involves a non-linear mapping η between timepoints and clinical markers, parametrized by fixed effects α (population level) and random effects β_i (individual level): y_i = η_α(t_i | β_i).

The Disease Course Mapping Model applies time reparameterization to realign all individual trajectories into a common timeline, accounting for spatiotemporal variability. To do so, it estimates a population parameter expressing the average disease onset time, enabling direct comparison of features (such as scores or biomarker values measured longitudinally) at this time and identifying the sequence of symptom onset.

To incorporate baseline covariates in the model, the existing paradigm was extended. Instead of estimating a fixed effect α parametrizing the average disease course, we introduced a link function f_φ that can predict an expected trajectory of the disease conditioned by a given set of covariates c_i.

Results: The proposed model has been implemented in the Open-source library Leaspy. Applied to different clinical scores, it reveals significant differences according to the mutation location: patients with the pathogenic variant located in EGFr domains 1-6, previously identified as a determinant of disease severity, showed a faster and more pronounced decline in the Rankin score assessing the severity of disability.

Conclusion: This approach allows us to explicitly model how external factors influence disease progression rather than treating variability as purely stochastic. While the current model incorporates a single binary covariate, future work will focus on extending this framework to handle multiple covariates simultaneously and to integrate continuous variables.



posters-tuesday: 30

Genetics influences LDL-C response to statin therapy: short- and long-term observational study with functional data analysis

Andrea Corbetta1,2,3, Emanuele Di Angelantonio1,4, Andrea Ganna3, Francesca Ieva1,2

1Human Technopole, Milan, Italy; 2Politecnico Di Milano, Milan, Italy; 3Institute for Molecular Medicine Finland, Helsinki, Finland; 4University of Cambridge, Cambridge UK, UK

Introduction: Understanding the genetic basis of lipid-lowering responses to statin therapy may provide critical insights into personalized cardiovascular treatment strategies. This study employs advanced statistical methods to investigate how genetic predisposition, captured through polygenic scores (PGS) for low-density lipoprotein cholesterol (LDL-C), influences short-term and long-term changes in LDL-C levels following statin initiation.

Methods: We utilized data from the FinnGen cohort, focusing on LDL-C measurements in two distinct groups: (1) a short-term group of 11,343 individuals with LDL-C measurements recorded within one year before and after initiating statin therapy and (2) a long-term group of 15,864 individuals who had maintained statin therapy for a minimum of five years. The LDL-C trajectories were modelled as functional objects, allowing us to apply functional principal components analysis (FPCA) to identify independent patterns of LDL-C response.

In the short-term group, we modelled the absolute and relative reduction of LDL-C using linear regression models with PGS as a predictor. In the long-term group, we analyzed the first two FPCA components: the first principal component (PC1) representing the baseline LDL-C level (mean pattern) and the second principal component (PC2) capturing the LDL-C reduction pattern. Genome-wide association studies (GWAS) were conducted to identify genetic variants associated with these phenotypic patterns, applying stringent Bonferroni correction for multiple testing.
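
As a simple illustration of the FPCA step, the sketch below treats each LDL-C trajectory as a curve evaluated on a common time grid (a hypothetical pre-processed matrix ldl_curves) and approximates FPCA by ordinary PCA on the discretised curves; this is a common simplification rather than the authors' pipeline.

    # FPCA scores from discretised LDL-C trajectories (plain PCA approximation).
    import numpy as np
    from sklearn.decomposition import PCA

    def fpca_scores(ldl_curves, n_components=2):
        """ldl_curves: (n_individuals, n_timepoints) LDL-C values on a shared grid."""
        pca = PCA(n_components=n_components)
        scores = pca.fit_transform(ldl_curves)          # PC1, PC2 score per individual
        return scores, pca.components_, pca.explained_variance_ratio_

    # scores, eigenfunctions, evr = fpca_scores(ldl_curves)
    # scores[:, 0] ~ overall LDL-C level pattern, scores[:, 1] ~ reduction pattern,
    # usable as GWAS phenotypes after appropriate covariate adjustment.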

Results: We observed that individuals in the highest PGS tertile experienced a greater absolute LDL-C reduction in the first year after statin initiation, with a mean reduction of 8.12 mg/dL (95% CI: 6.93–9.57) compared to the lowest tertile. However, this group demonstrated a smaller relative reduction of 1.81% (95% CI: 0.06–2.99). In the long-term group, higher PGS was associated with elevated LDL-C levels over five years but no significant association was found between PGS and LDL-C change patterns. The GWAS identified significant genome-wide loci for relative LDL-C reduction and baseline LDL-C levels, with lead variants near genes previously implicated in lipid metabolism.

Conclusion: Our findings suggest that short-term LDL-C response exhibits a genetic basis strongly linked to baseline LDL-C regulation. In contrast, long-term LDL-C changes appear predominantly influenced by non-genetic factors such as adherence. Nonetheless, individuals with higher LDL-C PGS consistently maintain higher LDL-C levels over extended periods. These results underscore the complex genetic architecture of LDL-C response to statins and highlight the utility of FPCA in characterizing dynamic lipid trajectories.



posters-tuesday: 31

Longitudinal analysis of imprecise disease status using latent Markov models: application to Italian Thyroid Cancer Observatory data

Silvia D'Elia1, Marco Alfò1, Maria Francesca Marino2

1Sapienza University of Rome (Italy); 2University of Florence (Italy)

Background

Longitudinal data are widely used in medicine to monitor patients over time, providing a dynamic view of disease progression and treatment response. Ordinal scales are often used to measure response to treatment or to summarise disease severity.

Methods

Latent Markov (LM) models represent an important class for dealing with ordinal longitudinal data. LM models are based on a latent process which is assumed to follow a Markov chain with a certain number of states (latent states). The state characterises the (conditional) response distribution at each time occasion.

LM models allow the analysis of longitudinal data in the presence of:

  • measurement error
  • unobserved heterogeneity

Such models estimate transition probabilities between latent states, also including individual covariates¹. Of particular interest is the evaluation of patient dynamics over time as a function of individual covariates (both constant and time-dependent).

Application

A latent Markov model is used to analyse data from the Italian Thyroid Cancer Observatory (ITCO), a database of over 15,000 patients with a diagnosis of thyroid cancer, treated in different clinical centres in Italy. Despite the high survival rate, the risk of recurrence remains significant, and long-term monitoring is needed to detect recurrence early and maintain appropriate therapies². The study aims to monitor and assess the effectiveness of the response to treatment over time, trying to identify factors that predict the true disease status.

Patients are monitored prospectively from the date of surgery, with follow-up visits at 12, 36, and 60 months. At each follow-up visit, the response to treatment is assessed through clinical, biochemical, and imaging findings and response is classified into 4 categories: excellent (ER, no evidence of disease), indeterminate (IND), biochemical incomplete (BIR) and structural incomplete (SIR, evidence of disease). However, this classification has limitations: categories synthesise multiple measurements prone to error, are influenced by unobserved factors, and the disease status (evidence vs. no evidence of disease) is not directly observable due to the presence of ambiguous categories (IND, BIR).

While around 50% of patients clearly show no evidence of disease at any time point, 30–40% fall into the area of indeterminacy.

Conclusion

Latent Markov models may lead to a better understanding of patients' clinical trajectories, depicting a more accurate picture of patients' dynamics while considering variables that may influence transitions between states.

¹Bartolucci et al., Latent Markov Models for Longitudinal Data, 2012.

²Haugen et al., 2015 American Thyroid Association Management Guidelines for Adult Patients with Thyroid Nodules and Differentiated Thyroid Cancer, 2016.



posters-tuesday: 32

Long-term risk prediction from short-term data – a microsimulation approach

Moritz Pamminger, Theresa Ullmann, Moritz Madern, Daniela Dunkler, Georg Heinze

Medical University of Vienna, Austria

Background
In medical research, long-term risk prediction is often desirable, e.g. to predict the risk of a cardiovascular or other health event within the next 30 years. Estimating such a prediction model requires data with a sufficiently long follow-up. Such data are rarely available and may be outdated. Our aim is to develop and evaluate methods to harness contemporary data for long-term predictions.

Methods
We assume longitudinal data with 2-5 possibly irregular measurements of 5-20 prognostic factors (e.g. cholesterol, blood pressure, etc.) per individual over a 5-year period, together with associated survival outcomes. We present a microsimulation-based strategy to obtain predictions of survival and of trajectories of the prognostic factors over a long-term prediction horizon of 20-30 years.

First, we trained models using the current values of prognostic factors to predict subsequent measurements and the event status with a short-term prediction horizon of 1-2 years. Starting with individual-specific initial values of the prognostic factors, these short-term models were then applied to generate follow-up values of the prognostic factors and of the survival state as draws from the respective predictive distributions. These values serve as the new baseline for the next prediction-and-generation step. Iteration proceeds until an event is predicted or the long-term prediction horizon is reached. For each individual, multiple (e.g. 1,000) trajectories for the prognostic factors and the survival process are generated, which can be suitably summarized.
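
A schematic of this prediction-and-generation loop is sketched below; covariate_model and hazard_model are hypothetical wrappers around the fitted short-term models (with draw and event_prob methods assumed for illustration), not the interface of the R package mentioned in the Results.

    # One simulated long-term trajectory for one individual (simplified schematic).
    import numpy as np

    def simulate_trajectory(x0, covariate_model, hazard_model, horizon_years,
                            step_years=1.0, rng=None):
        rng = rng if rng is not None else np.random.default_rng()
        x, t = np.asarray(x0, float), 0.0
        while t < horizon_years:
            if rng.uniform() < hazard_model.event_prob(x):   # event within the short-term window
                return t, True
            x = covariate_model.draw(x, rng)                 # generated follow-up values
            t += step_years                                  # new "baseline" for the next step
        return horizon_years, False                          # event-free at the horizon

    # Long-term risk: repeat e.g. 1,000 times and average the event indicator.
    # events = [simulate_trajectory(x0, cov_m, haz_m, horizon_years=30)[1] for _ in range(1000)]
    # risk_30y = np.mean(events)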

We validated the approach using various synthetic datasets for which long-term follow-up was available. We artificially censored these datasets to mimic data with short-term follow-up, which we used to train our models. Then we applied the microsimulation approach to make long-term predictions and compared the predicted outcomes with the observed ones in the training set. We also validated predictions in an independent test set.

Results
The approach was implemented in an R package for convenient application in various situations. The package provides flexible options to specify short-term models. It can perform predictions for individuals, efficiently processing entire datasets, and present results with appropriate graphical summaries.

Conclusion
Despite some limitations, the method effectively handles irregular time intervals in the training data and allows capturing nonlinear and interaction effects for prognostic factors and survival. It provides analysts with a flexible tool for long-term prognosis across various fields and in the future may provide a practically useful framework for individual long-term prognosis at routine health screenings. This work was supported through the FWF project P-36727-N.



posters-tuesday: 33

Deep learning algorithm for dynamic survival prediction with competitive risks

Tristan Margaté1,2,3, Marine Zulian1, Agathe Guilloux2, Sandrine Katsahian2,3,4

1Healthcare and Life Sciences Research, Dassault Systemes, France; 2HeKa team, INRIA, Paris, France; 3Université Paris Cité, France; 4URC HEGP, APHP Paris

Background:

The medical follow-up of a patient suffering from cancer is spread over time, making it possible to obtain repeated measurements that capture the progression of the disease state over time. The development of a prognostic solution requires the ability to update predictions of the occurrence of clinical events over time according to new measurements, i.e., to make dynamic predictions.
In oncology, patients can face the appearance of metastases, other diseases due to comorbidities, or death. It can be useful to predict which of these events will occur first; this is referred to in the literature as competing risks. We aim to develop new methodologies capable of considering longitudinal data for predicting competing risks.

Methods:
Various deep learning algorithms have recently been developed to handle longitudinal survival data and competing risks [1][2]. However, as they use the entirety of the patient's available longitudinal data to create a time-independent static embedding, they suffer from bias when used to predict survival over the patient's follow-up interval. Recent methodologies use a progressive approach to integrate the patient's data [3], allowing an embedding of features that varies over time; this approach shows superior results in the classical survival analysis setting.
We have chosen to use this type of methodology, modifying how the embedding of longitudinal data is created and extending it to the competing risks survival setting.
In addition, we have developed a new simulation scheme to obtain synthetic longitudinal survival data with competing risks.

Results/Conclusion:

We will present results on different approaches to handling both longitudinal and survival data and on their limitations with respect to producing unbiased predictions. We compare our algorithm with existing algorithms on simulated data and on a subset of real-world data from the Framingham Heart Study, which was designed to study the etiology of cardiovascular disease.

References:
[1] Lee, C. et al. (2019). Dynamic-deephit: A deep learning approach for dynamic survival analysis with competing risks based on longitudinal data. IEEE Transactions on Biomedical Engineering
[2] Moon, I. et al. (2022). SurvLatent ODE: A Neural ODE based time-to-event model with competing risks for longitudinal data improves cancer-associated Venous Thromboembolism (VTE) prediction. In Machine Learning for Healthcare Conference
[3] Bleistein, L et al. (2024). Dynamical Survival Analysis with Controlled Latent States. arXiv preprint arXiv:2401.17077.



posters-tuesday: 34

Identifying Cutoff in Predictors in Survival Analysis: An Ensemble Strategy for Flexible Knot Selection

Stefania Lando, Giulia Lorenzoni, Dario Gregori

Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, Padova, Italy

Background

Restricted cubic splines (RCS) are widely used in Cox proportional hazards models to capture nonlinear relationships between a continuous biomarker and patient outcomes. Traditional approaches for knot selection often rely on predefined quantiles (e.g., the 5th, 35th, 65th, and 95th percentiles), usually chosen arbitrarily, or on a fixed number of knots placed systematically across the biomarker range. Both strategies have limitations: quantile-based methods provide stability and reproducibility but may oversimplify the underlying nonlinear relationship, whereas fixed knots risk overlooking variations in the data.

Methods

Our study explores an ensemble methodology that seeks to combine the robustness of quantile-based knot placement with the flexibility of data-driven strategies, aiming to provide robust knot placement while preserving clinical meaning in cutoff determination. The core idea is to maintain the intuitive simplicity of quantile-based knots while introducing a selective tuning mechanism, guided by cross-validation, to refine their placement. In parallel, our approach incorporates time-dependent ROC analysis to identify clinically relevant cutoffs for risk stratification at a chosen time horizon.
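
As a concrete building block, the restricted cubic spline basis can be written down directly (Harrell's parameterisation, without the usual scaling constant), so that candidate knot placements starting from the quantile-based defaults can be scored, for example by cross-validated Cox partial likelihood; the biomarker variable below is hypothetical.

    # Nonlinear RCS basis columns for knots t_1 < ... < t_k (the linear term is added separately).
    import numpy as np

    def rcs_basis(x, knots):
        x, t = np.asarray(x, float), np.asarray(knots, float)
        k = len(t)
        d = t[k - 1] - t[k - 2]
        cols = []
        for j in range(k - 2):
            cols.append(np.clip(x - t[j], 0, None) ** 3
                        - np.clip(x - t[k - 2], 0, None) ** 3 * (t[k - 1] - t[j]) / d
                        + np.clip(x - t[k - 1], 0, None) ** 3 * (t[k - 2] - t[j]) / d)
        return np.column_stack(cols)

    # knots0 = np.percentile(biomarker, [5, 35, 65, 95])   # quantile-based starting knots
    # X_spline = np.column_stack([biomarker, rcs_basis(biomarker, knots0)])
    # Small shifts of the interior knots can then be compared by cross-validating the Cox model.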

Applications

The proposed methodology can be applied in various clinical and epidemiological settings where risk stratification based on continuous biomarkers is essential. Examples include oncology for identifying prognostic thresholds in tumor markers, cardiology for refining cardiovascular risk scores, and infectious disease modeling for determining severity cutoffs. Additionally, this approach can be extended to precision medicine, where patient subgroups with distinct risk profiles can be identified for targeted interventions.

Conclusion

By integrating flexible knot placement with an ensemble-based cutoff strategy, the method enhances the adaptability of spline-based Cox models while preserving clinical relevance.



posters-tuesday: 35

Estimating Quality Adjusted Life Years (QALYs) from a joint modeling framework: a simulation-based study

Vincent Bonnemains1, Yohann Foucher2, Philippe Tessier1, Etienne Dantan1

11. Nantes Université, Univ Tours, CHU Nantes, INSERM, MethodS in Patients-centered outcomes and HEalth Research, SPHERE, F-44000 Nantes, France; 22. INSERM, CIC-1402, Centre Hospitalier Universitaire de Poitiers, Université de Poitiers, Poitiers, France.

Background. Clinical trial investigators often choose patient survival as the primary outcome.

When health-related quality of life (HRQoL) outcomes are considered, they are usually analyzed secondarily and separately from the survival outcome, precluding the consideration of potential trade-offs between them. In contrast, Quality-Adjusted Life Years (QALYs) are a composite outcome that allows the two stakes to be considered simultaneously by weighting years of life by HRQoL indexes (utility scores) that reflect individual preferences. Hence, QALYs could be a practical primary outcome for assessing treatments benefit.

However, the estimation of QALYs usually relies on non-parametric approaches that suffer from several methodological pitfalls. This work aims to propose a sounder method for estimating QALYs using the shared random-effects joint modelling framework.

Methods. We developed a shared random-effects joint model in which the longitudinal utility scores are modelled through a mixed beta regression and the time-to-death through a proportional hazards Weibull model. We then proposed a method for estimating QALYs based on this model and compared its performance with that of the commonly used non-parametric method through a simulation study.
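
To make the estimand concrete: given arm-specific model-based predictions of survival and of the expected utility path, QALYs up to a horizon can be obtained by numerical integration, as in the sketch below (pred_survival and pred_utility are hypothetical prediction functions; treating them separately is a simplification of the joint-model estimator, which accounts for their dependence through the shared random effects).

    # QALYs at a 3-year horizon from model-based predictions (simplified illustration).
    import numpy as np

    def expected_qaly(pred_survival, pred_utility, horizon=3.0, n_grid=301):
        t = np.linspace(0.0, horizon, n_grid)
        vals = np.array([pred_survival(ti) * pred_utility(ti) for ti in t])
        return float(np.sum((vals[1:] + vals[:-1]) * np.diff(t) / 2.0))  # trapezoidal rule

    # Treatment effect on QALYs at 3 years:
    # delta_qaly = expected_qaly(S_treated, u_treated) - expected_qaly(S_control, u_control)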

We simulated a wide range of clinical trials considering the presence and absence of treatment effect, 200 and 500 included patients, one and two utility score measurements per patient per year for three years, and two censoring rates: 0% and 30% at a three-year horizon. We also considered different data generation mechanisms resulting in well-specified or misspecified models. For each scenario, we simulated 1000 data samples. The treatment effect was estimated in terms of QALYs at a three-year horizon.

Results. Our proposed method provided unbiased estimates of QALYs and significant improvements over the non-parametric approach when the joint model was well specified. This was particularly the case when a low number of repeated utility measurements per patient or a high censoring rate was simulated. Both methods performed poorly when the risk of the event was simulated with non-proportional hazards.

Conclusions. We proposed a method based on joint modeling for estimating QALYs. We reported accurate estimations for clinical trials with moderate sizes when the model is well specified. However, we found the estimations to be sensitive to model misspecification. We are working to develop additional modelling tools and deliver an R package that will allow users to accurately estimate QALYs in a wide range of situations. We hope this will encourage a larger use of QALYs in clinical trials and better consideration of patients’ preferences in medical decision-making.



posters-tuesday: 36

An Alternative Estimand for Overall Survival in the Presence of Treatment Discontinuation: Simulation Results and Case Study

Kara-Louise Royle1, David Meads2, Jennifer Visser-Rogers3, David A Cairns1, Ian R White4

1Leeds Cancer Research UK Clinical Trials Unit, Leeds Institute of Clinical Trials Research, University of Leeds, Leeds, UK; 2Academic Unit of Health Economics, Leeds Institute of Health Sciences, University of Leeds, Leeds, UK; 3Coronado Research, Kent, England; 4MRC Clinical Trials Unit at UCL, London, UK

Introduction

Overall survival (OS) is a definitive endpoint for clinical effectiveness in cancer clinical trials. However, intercurrent events, like treatment discontinuation, can affect its interpretation.

A recent literature review concluded that treatment discontinuation and the uptake of subsequent anti-cancer treatment are often considered part of the treatment strategy, i.e. researchers follow the “Treatment Policy” approach.

Our objective was to investigate the novel alternative hypothetical estimand: What is the effect on OS of the experimental trial treatment versus the control treatment, if all participants who discontinued prior to death received the same subsequent treatment?

Methods

Statistical techniques, including simple intention-to-treat (ITT) and per-protocol (PP) methods and more complex two-stage and inverse probability of censoring weighting (IPCW) methods, were applied in a simulation study. The data-generating mechanism simulated a two-arm randomised controlled trial dataset of 700 participants and three stratification factors. Observed and unobserved variables were simulated at baseline and follow-up timepoints. At each follow-up timepoint, some participants were simulated to discontinue and start one of two (A or B) subsequent treatments. Eleven different scenarios were considered, including varying the true experimental treatment effect and the timing of treatment discontinuation. The estimand of interest was the hazard ratio and 95% confidence interval of the experimental vs control arms if everyone who discontinued had received the same subsequent treatment (A rather than B). The methods were evaluated in terms of bias, coverage, and power, calculated across 1000 repetitions.

Results

The ITT method was biased across all scenarios, but mostly had adequate power and coverage. The PP methods were biased with poor coverage in all scenarios. The two-stage methods were unbiased and had adequate power and coverage in almost all scenarios. The IPCW methods’ performance fluctuated the most across the scenarios.

Discussion

The simulation study found that the estimand could be estimated, with varying levels of performance, by all implemented methods. Overall, the two-stage method was the most consistently accurate across the scenarios. The practicality of estimating the hypothetical estimand using the two-stage method will be assessed through a real clinical trial case study, presented at the meeting. The trial was chosen because second-line immunotherapy was introduced during trial follow-up. As more effective treatments are developed, this is likely to become a common scenario. We will discuss the generalisability of the hypothetical estimand, how it improves the interpretation of clinical trial results, and the necessary considerations when analysing OS in such situations.



posters-tuesday: 37

Corrections of confidence interval for differences in restricted mean survival times in clinical trials with small sample sizes

Hiroya Hashimoto1, Akiko Kada2,1

1NHO Nagoya Medical Center, Japan; 2Fujita Health University, Japan

Background / Introduction

In recent years, restricted mean survival time (RMST) has been used as a measure to demonstrate the difference in efficacy between treatment groups in time-to-event outcomes, especially when the proportional hazards assumption does not hold. However, statistical tests and interval estimations based on asymptotic normality may deviate from the normal distribution when the sample size is small, leading to an inflation of the type I error rate. In this presentation, we discuss the correction of confidence intervals for between-group differences in RMST.

Methods

Under the condition that the survival functions of two groups follow the same Weibull distribution and the censoring functions follow a uniform distribution, we conducted a simulation analysis in a two-group comparative study with fewer than 100 subjects per group under various scenarios. We examined the following methods:

(1) A method based on asymptotic normality,

(2) A method that applies bias correction to the standard error of Method (1), specifically, multiplying the standard error for each group by √{mi/(mi-1)}, where mi is the number of events in group i,

(3) A method that lies between Methods (1) and (2), specifically, multiplying the standard error for each group by √{mi/(mi-0.5)}.
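
Given per-group RMST estimates, their asymptotic standard errors and event counts, the three confidence intervals differ only in how the standard errors are inflated; a small sketch with made-up numbers is shown below.

    # Confidence intervals for an RMST difference under methods (1)-(3).
    import numpy as np
    from scipy import stats

    def rmst_diff_ci(rmst1, se1, m1, rmst0, se0, m0, method=1, alpha=0.05):
        if method == 2:
            se1, se0 = se1 * np.sqrt(m1 / (m1 - 1)), se0 * np.sqrt(m0 / (m0 - 1))
        elif method == 3:
            se1, se0 = se1 * np.sqrt(m1 / (m1 - 0.5)), se0 * np.sqrt(m0 / (m0 - 0.5))
        diff = rmst1 - rmst0
        se = np.sqrt(se1 ** 2 + se0 ** 2)
        z = stats.norm.ppf(1 - alpha / 2)
        return diff - z * se, diff + z * se

    # Example with hypothetical inputs (m = number of events per group):
    # rmst_diff_ci(rmst1=20.1, se1=1.2, m1=18, rmst0=17.5, se0=1.3, m0=21, method=2)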

Results

As expected, Method (1) had the highest Type I error rate in all scenarios considered, followed by Method (3), and Method (2) had the lowest. In the uncensored situation, Method (2) was generally the most appropriate, and Method (3) was optimal when the event rate was low (S(τ)=0.7). Method (2) also tended to be too conservative as the censoring rate increased, and this was more pronounced for smaller sample sizes.

Conclusions

Method (3) produces better results when events occur less frequently. Method (2) yields conservative results, but caution should be exercised because it is too conservative in situations with small sample sizes and high censoring. When the sample size per group exceeds 100, the difference between methods is negligible.



posters-tuesday: 38

Advantages and pitfalls of a multi-centre register collecting long-term real-world data on medical devices: Insights from a cochlear implant registry

Karin A. Koinig, Magdalena Breu, Jasmine Rinnofner, Stefano Morettini, Ilona Anderson

MED-EL Medical Electronics, Austria

Background

There is a need for real-world data (RWD) to demonstrate how medical devices function outside the setting of clinical studies and over longer time periods. One way to address this is to establish registries collecting data from routine clinical visits. Here we present our experience of evaluating pre-surgery to two-year post-surgery data from a multicentre cochlear implant registry.

Methods:

Data were extracted in anonymized form from a registry covering five clinics. The medical devices studied were cochlear implants, which help individuals with severe to profound sensorineural hearing loss (deafness) to regain their hearing. Key outcomes included speech perception, wearing time of the implant, self-perceived auditory benefit, self-reported quality of life, and safety results.

Results

The registry provided extensive data but revealed differences in clinical practices, which made summarizing data across different assessments a challenge. Not all clinics collected the same information, although a minimal measurement data set was specified in the registry protocol. For example, the methods used to assess speech perception varied between centres, including differences in noise levels and test formats. In addition, we observed a high dropout rate, which represents a possible bias: Particularly at long-term follow-up visits, those with more problems seemed more likely to return to the clinic, while those with fewer problems were more likely to be adequately cared for by the outpatient clinics and therefore more likely to be lost to follow-up. Overall, this resulted in a substantial amount of missing data, which was difficult to explain to regulatory bodies like the FDA and TÜV. To address this issue, we presented demographics and outcomes with and without the patients lost to follow-up.

Conclusion

RWD are valuable but pose a challenge when collected in routine clinical practice, as the diversity of assessments and tests leads to different reporting standards and data gaps that make it difficult to obtain homogeneous and usable data. Statisticians must work with the study team to develop clear and transparent strategies for data collection and data extraction to achieve consistent and reliable results from registries.



posters-tuesday: 39

Development and validation of prognostic models in phase I oncology clinical trials

Maria Lee Alcober1,2, Guillermo Villacampa1, Klaus Langohr2

1Statistics Unit, Vall d'Hebron Institute of Oncology (Spain); 2Department of Statistics and Operations Research, Universitat Politècnica de Catalunya (Spain)

Phase I trials are an essential part of oncology drug development. For patients, balancing the potential risks of toxicity against the benefits of investigational drugs is crucial. Consequently, participation in phase I trials requires a minimum life expectancy and the absence of relevant symptoms. However, in clinical practice, no objective measures are used to evaluate these criteria, and decisions rely on subjective judgment. Against this background, this study aims to use different statistical methods to develop and validate prognostic models to better identify oncology patients who may benefit from early-phase clinical trials.

A total of 921 patients treated at the Vall d’Hebron Institute of Oncology from January 2011 to November 2024 were included in this study (799 in the development cohort and 122 in the validation cohort). Different strategies were used to develop the prognostic models: i) stratified Cox's proportional hazards models, ii) stratified Cox models enhanced with restricted cubic splines to address non-linearity, and iii) machine learning techniques such as decision trees and random survival forests to capture complex interactions. Risk scores derived from these models provide interpretable summaries of patient risk profiles, facilitating practical clinical use.
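
For the machine-learning strategy, a random survival forest can be fitted with scikit-survival as in the sketch below; the feature matrix and column roles are hypothetical placeholders for the cohort's prognostic factors, not the study's actual variables.

    # Random survival forest for overall survival with test-set discrimination.
    from sksurv.ensemble import RandomSurvivalForest
    from sksurv.util import Surv
    from sksurv.metrics import concordance_index_censored

    def fit_rsf(X_train, event_train, time_train, X_test, event_test, time_test):
        y_train = Surv.from_arrays(event=event_train, time=time_train)
        rsf = RandomSurvivalForest(n_estimators=500, min_samples_leaf=15, random_state=0)
        rsf.fit(X_train, y_train)
        risk_test = rsf.predict(X_test)   # higher value = higher predicted risk
        cindex = concordance_index_censored(event_test, time_test, risk_test)[0]
        return rsf, cindex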

Results were validated using i) internal validation employing bootstrapping and cross-validation and ii) external validation using an independent dataset. Model performance was evaluated through discrimination (C-statistic), calibration (calibration plots and the Hosmer-Lemeshow test), overall performance (Brier score), and clinical utility (decision curve analysis).

Internal validation consistently outperformed external validation across all performance metrics, particularly in calibration and clinical utility. Among the models, random survival forests achieved the highest C-statistic, demonstrating superior discrimination. Conversely, incorporating restricted cubic splines into the Cox's proportional hazards model did not notably improve the evaluated metrics.

This work offers a replicable framework for deriving and validating risk scores that improve precision in patient selection for phase I trials. Future efforts will focus on formalising calibration methods and comparing these models and scores with other published prognostic tools using external validation.



posters-tuesday: 40

Application of Bayesian surrogacy models to select primary endpoint in phase 2 based on relationship to a phase 3 endpoint

Alexandra Jauhiainen1, Enti Spata2, Patrick Darken3, Carla A. Da Silva4

1R&I Biometrics and Statistical Innovation, Late Respiratory & Immunology, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden; 2R&I Biometrics and Statistical Innovation, Late Respiratory & Immunology, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK; 3R&I Biometrics and Statistical Innovation, Late Respiratory & Immunology, BioPharmaceuticals R&D, AstraZeneca, Gaithersburg, US; 4Early Respiratory and Immunology Clinical Development, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden

Background

A key goal of treatment in asthma is to prevent episodes of severe symptom worsening called exacerbations. Designing trials for these relatively rare events is a challenge, especially in the early phases of development of new therapies, as the studies tend to be large and lengthy. Hence, exacerbations are not usually studied as a primary endpoint until phase 3. Alternative endpoints to use in early phase trials of shorter duration can be lung function measurements like FEV1, or the novel endpoint CompEx, which is a composite endpoint enriching exacerbations by adding events defined from deteriorations in diary card variables.

Methods

All three endpoints (FEV1, CompEx, and exacerbations) were evaluated using patient-level data across a set of 14 trials with 27 treatment comparisons. FEV1 was analysed as change from baseline, while CompEx and exacerbations were modelled in both time-to-first-event and recurrent-event settings, across two timeframes (3- and 12-month durations).

Bayesian bivariate random-effects meta-analysis was applied to estimate the total correlation of treatment effects on FEV1 and CompEx with those on exacerbations. Bayesian surrogacy analysis within the Daniels & Hughes framework was applied across treatment comparisons to evaluate the trial-level relationship between CompEx and exacerbations.

Results

The change from baseline in FEV1 at 3 months had a weak correlation with the preferred phase 3 endpoint, the rate ratio for exacerbations at 12 months, and showed limitations in its ability to quantify the effect reported on exacerbations across drug modalities.

In contrast, the CompEx hazard ratio at 3 months correlated well with the 12-month rate ratio observed on exacerbations. CompEx was confirmed as a surrogate in terms of predicting treatment effects observed on exacerbations, with a high level of correspondence between the endpoints across modalities and asthma severities.

Conclusion

FEV1 remains an important respiratory endpoint, especially for drugs with bronchodilating properties, but has limitations as a primary phase 2 endpoint across modalities when the aim is to target exacerbations in phase 3.

CompEx has an increased event frequency compared to exacerbations alone, especially noticeable in populations with low exacerbation rates (mild/moderate asthma). This makes CompEx an attractive endpoint to use in design of early phase trials across a range of modalities, especially towards the milder spectrum of disease, substantially reducing sample sizes needed.

This research was funded by AstraZeneca.



posters-tuesday: 41

Discontinuation and attrition rates in phase II or phase III first-line randomized clinical trials (RCTs) of solid tumors

Virginia Delucchi1, Chiara Molinelli2, Luca Arecco2, Andrea Boutros3, Davide Soldato2, Matteo Lambertini2,3, Dario Trapani4,5, Bishal Gyawali6, Gabe S Sonke7, Sarah R Brown8, Mattew R Sydes9,10, Luca Boni1, Saskia Litiere11, Eva Blondeaux1

1U.O. Epidemiologia Clinica, IRCCS Ospedale Policlinico San Martino, Genova, Italy; 2U.O.C. Clinica di Oncologia Medica, IRCCS Ospedale Policlinico San Martino, Genova, Italy; 3Department of Internal Medicine and Medical Specialties, University of Genova, Genova, Italy; 4Division of New Drugs and Early Drug Development for Innovative Therapies, European Institute of Oncology, IRCCS, Milan 20141, Italy; 5Department of Oncology and Hemato-Oncology, University of Milan, Milan 20122, Italy; 6Division of Cancer Care and Epidemiology, Cancer Research Institute, Queen's University, Kingston, ON, Canada; 7Division of Medical Oncology, Netherlands Cancer Institute, Amsterdam, the Netherlands; 8Leeds Cancer Research UK Clinical Trials Unit, University of Leeds, Leeds, UK; 9BHF Data Science Centre, HDR UK, London, UK; 10Data for R&D, Transformation Directorate, NHS England, London, UK; 11EORTC Headquarters, Brussels, Belgium

Background

Differential discontinuation and attrition rates in randomized controlled trials (RCTs) bias efficacy assessments, potentially leading to misinterpretations of treatment effects. Despite their critical role, the extent and implications of these rates in cancer trials remain unclear. We aimed to systematically quantify discontinuation and attrition rates in RCTs of solid tumors and how variation in these rates might impact the estimated treatment effect on overall survival.

Methods

A systematic review of the published literature was carried out in Medline to identify phase II or phase III RCTs of first-line treatments for solid tumors published from Jan-2015 to Feb-2024. Reported treatment discontinuation and post-study treatment figures were extracted from the CONSORT diagram and/or text. Attrition was computed as the percentage of patients reported as discontinuing study drugs for whom a post-study treatment was not documented. We investigated differences in discontinuation and attrition rates according to type of cancer, sponsor and trial phase. Discontinuation and attrition by treatment arm were not reported due to the potential influence of the experimental treatment on progression. Simulations evaluating the impact of different discontinuation and attrition rates on overall survival will be implemented and presented at the congress.

Results

Out of 22,141 records screened, 533 trials met the inclusion criteria. The majority were phase III (56%) and industry-sponsored (54%) trials; 126 (24%) trials enrolled patients with non-small cell lung cancer, 79 (15%) breast cancer, 53 (10%) colorectal cancer, 40 (8%) other gastrointestinal cancers, 29 (5%) melanoma, 28 (5%) pancreatic cancer and 178 other tumor types. Treatment discontinuation figures were reported in 415 (78%) trials, with a median patient discontinuation rate of 83%. No difference in the patients' treatment discontinuation rate was observed according to sponsor or trial phase. Among the 415 trials reporting patients' treatment discontinuation, data on any post-study treatment were reported in 220 (53%) trials. The median patient attrition rate was 37%. The highest median patient attrition rate was observed for urothelial cancer trials (53%) and the lowest for breast cancer trials (28%). Industry-sponsored trials reported a higher median patient attrition rate than academic trials (38% vs 26%, respectively). No difference in patient attrition rate was observed between phase II and phase III trials.

Conclusions

Although most cancer trials reported treatment discontinuation rates, post-study treatments were less frequently documented. Our results highlight the need to improve the reporting of these figures to ensure transparency, reliability, and accurate assessment of treatment effects on long-term outcome measures.



posters-tuesday: 42

Enhancing Dose Selection in Phase I Cancer Trials: Extending the Bayesian Logistic Regression Model with Non-DLT Adverse Events Integration

Luca Genetti, Andrea Nizzardo, Marco Pergher

Evotec - Verona, Italy

This work presents the Burdened Bayesian Logistic Regression Model (BBLRM), an enhancement to the Bayesian Logistic Regression Model (BLRM) for dose-finding in phase I oncology trials. Traditionally, the BLRM determines the maximum tolerated dose (MTD) based on dose-limiting toxicities (DLTs) [1]. However, clinicians often perceive model-based designs like BLRM as complex and less conservative than rule-based designs, such as the widely used 3+3 method [2,3]. To address these concerns, the BBLRM incorporates non-DLT adverse events (nDLTAEs) into the model. These events, although not severe enough to qualify as DLTs, provide additional information suggesting that higher doses might result in DLTs.

In the BBLRM, an additional parameter δ is introduced to account for nDLTAEs. This parameter adjusts the toxicity probability estimates, making the model more conservative in dose escalation without compromising the accuracy of identifying the true MTD. The δ parameter is derived from the proportion of patients experiencing nDLTAEs and is tuned based on the design characteristics to balance the model's conservatism. This approach aims to reduce the likelihood of assigning toxic doses as the MTD while involving clinicians more directly in the decision-making process by identifying nDLTAEs during study conduct.
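
To fix ideas, the sketch below shows a stripped-down, grid-based BLRM posterior with one illustrative way a δ-type burden term could make the per-dose toxicity estimates more conservative; the priors, toy data and the additive form of the adjustment are assumptions for illustration only, not the BBLRM's actual specification.

    # Grid-based two-parameter BLRM with an illustrative nDLTAE adjustment (toy data).
    import numpy as np
    from scipy import stats
    from scipy.special import expit

    doses, d_ref = np.array([1.0, 2.0, 4.0, 8.0]), 4.0
    n_pat, n_dlt = np.array([3, 3, 3, 0]), np.array([0, 0, 1, 0])
    p_ndlt = np.array([0.0, 1/3, 2/3, 0.0])     # observed proportion of nDLTAEs per dose
    delta = 0.1                                  # tuning weight for the nDLTAE signal

    la = np.linspace(-4, 4, 161)                 # grid over log(alpha)
    lb = np.linspace(-3, 3, 121)                 # grid over log(beta)
    LA, LB = np.meshgrid(la, lb, indexing="ij")
    log_post = stats.norm.logpdf(LA, 0, 2) + stats.norm.logpdf(LB, 0, 1)   # independent normal priors
    for d, n, x in zip(doses, n_pat, n_dlt):
        p = expit(LA + np.exp(LB) * np.log(d / d_ref))
        log_post += x * np.log(p) + (n - x) * np.log(1 - p)                # binomial likelihood
    post = np.exp(log_post)
    post /= post.sum()

    for d, q in zip(doses, p_ndlt):
        p_grid = expit(LA + np.exp(LB) * np.log(d / d_ref))
        p_mean = float((post * p_grid).sum())
        p_adj = min(1.0, p_mean + delta * q)     # more conservative where nDLTAEs accumulate
        print(f"dose {d:4.1f}: posterior mean P(DLT) = {p_mean:.3f}, adjusted = {p_adj:.3f}")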

The work includes a simulation study comparing the BBLRM with more traditional versions of the BLRM [4,5] and with a two-stage Continual Reassessment Method (CRM) [6] that incorporates nDLTAEs, across various scenarios. The simulations demonstrate that the BBLRM significantly reduces the selection of toxic doses as the MTD without compromising the accuracy of MTD identification. These results suggest that integrating nDLTAEs into the dose-finding process can enhance the safety and acceptance of model-based designs in phase I oncology trials.

References:

1. Neuenschwander B et al. Critical aspects of the bayesian approach to phase I cancer trials. Statistics in Medicine 2008.

2. Love SB et al. Embracing model-based designs for dose-finding trials. British Journal of Cancer 2017.

3. Kurzrock R et al. Moving beyond 3+3: the future of clinical trial design. American Society of Clinical Oncology Educational Book 2021.

4. Zhang H et al. Improving the performance of Bayesian logistic regression model with overdose control in oncology dose-finding studies. Statistics in Medicine 2022.

5. Ghosh D et al. Hybrid continuous reassessment method with overdose control for safer dose escalation. Journal of Biopharmaceutical Statistics 2023.

6. Iasonos A et al. Incorporating lower grade toxicity information into dose finding designs. Clinical Trials 2011.



posters-tuesday: 43

Bayesian Inference of the Parametric Piecewise Accelerated Failure Time Models for Immune-oncology Clinical Trials

XINGZHI XU, SATOSHI HATTORI

Osaka University, Japan

Modeling delayed treatment effects poses significant challenges in survival analysis, particularly in immuno-oncology trials where Kaplan-Meier curves often exhibit overlapping patterns. Overlapping Kaplan-Meier curves imply that the proportional hazards assumption is violated and that using the hazard ratio to summarize treatment effects is not appealing. In addition, they imply that some patients do not benefit from the immuno-oncology drug. To address these issues, Sunami and Hattori (2024) introduced the piecewise Accelerated Failure Time (pAFT) model, employing a frequentist semi-parametric maximum-likelihood approach to account for delayed treatment effects and to evaluate each patient's probability of benefiting from the treatment. Their framework, while innovative, faced challenges in handling complex treatment-by-covariate interactions.
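
To illustrate the delayed-effect structure being modelled, the sketch below simulates event times from a generic piecewise AFT in which treatment leaves the time scale unchanged before a delay and stretches residual time afterwards, for a responder subgroup only; this parameterisation is a generic illustration and not necessarily the exact pAFT specification of Sunami and Hattori (2024).

    # Generic delayed-effect (piecewise) AFT simulation.
    import numpy as np

    def simulate_paft(n, shape=1.2, scale=12.0, theta=0.5, delay=3.0, p_responder=0.6, seed=0):
        rng = np.random.default_rng(seed)
        t_ctrl = scale * rng.weibull(shape, n)            # event times without treatment effect
        responder = rng.uniform(size=n) < p_responder     # only some patients benefit
        t_trt = t_ctrl.copy()
        stretch = responder & (t_ctrl > delay)
        t_trt[stretch] = delay + np.exp(theta) * (t_ctrl[stretch] - delay)
        return t_ctrl, t_trt, responder

    # Kaplan-Meier curves of t_ctrl vs. t_trt overlap before `delay` and separate afterwards,
    # mimicking the delayed-effect pattern typical of immuno-oncology trials.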

Building on their foundational work, this paper introduces two Bayesian parametric extensions: the pAFT model and the interactive piecewise Accelerated Failure Time (ipAFT) model. The Bayesian framework enhances the original model by incorporating prior knowledge and improving parameter estimation precision. The ipAFT model, in particular, extends the methodology by explicitly modeling treatment-by-covariate interactions, offering deeper insights into treatment efficacy in different subgroups.

Comprehensive simulation studies demonstrate that the proposed Bayesian models perform exceptionally well in capturing delayed treatment effects, achieving accurate estimation and reliable coverage probabilities even with small sample sizes. The ipAFT model provides two measures of patient-specific treatment effects: the probability of benefiting from the treatment and the patient-specific benefit after the delay time. By applying multivariate analysis techniques (such as hierarchical clustering) to these two measures, we can effectively characterize patients' treatment effects. Application to a real-world immuno-oncology clinical trial dataset reveals distinct patient subgroups based on the results of the ipAFT model.

By addressing key limitations of traditional survival models and extending Sunami and Hattori's pAFT framework, the proposed Bayesian models offer flexible tools for analyzing immuno-oncology clinical trials. Their stability and flexibility make these methods useful in early-phase clinical trials with small patient counts.



posters-tuesday: 44

Bayesian power-based sample size determination for single-arm clinical trials with time-to-event endpoints

Go Horiguchi1, Isao Yokota2, Satoshi Teramukai1

1Department of Biostatistics, Graduate School of Medical Science, Kyoto Prefectural University of Medicine, Japan; 2Department of Biostatistics, Hokkaido University Graduate School of Medicine, Japan

Introduction

Single-arm exploratory trials are widely used in early-phase oncology research to assess the potential of new treatments, often using time-to-event endpoints. Conventional sample size calculations under a frequentist framework typically rely on limited statistics, such as point estimates of survival rates at specific time points or a single hazard ratio (HR). By contrast, Bayesian methods can incorporate prior information and allow interim decisions with greater flexibility. We propose a Bayesian sample size determination method based on posterior and prior predictive probabilities of the hazard ratio, introducing analysis and design priors to improve decision-making accuracy and efficiency.

Methods

In our Bayesian design, we set a target hazard ratio of 1 to demonstrate the superiority of the new treatment. Using the analysis prior, we compute the posterior probability that the hazard ratio is below this target. If this probability exceeds a prespecified threshold, we conclude efficacy and stop the trial. For each candidate sample size, we draw from the design prior, generate predicted outcomes under proportional hazards, and calculate the proportion of simulated trials that would meet the stopping criterion. This proportion is the Bayesian power. The smallest sample size achieving the desired power is then selected. Here, the analysis prior encodes historical knowledge about the parameter, while the design prior represents its uncertainty at the planning stage.
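
A toy version of this power calculation, under exponential survival with a known reference hazard, a conjugate gamma analysis prior on the treatment hazard and a lognormal design prior on the true hazard ratio, is sketched below; the numbers and the exponential/conjugate simplification are assumptions for illustration, not the proposed method itself.

    # Toy Bayesian power for a single-arm time-to-event design (exponential simplification).
    import numpy as np
    from scipy import stats

    def bayes_power(n, lam_ref=0.10, a0=1.0, b0=10.0, hr_mean=0.7, hr_log_sd=0.1,
                    cens_time=24.0, threshold=0.975, n_sim=2000, seed=0):
        rng = np.random.default_rng(seed)
        success = 0
        for _ in range(n_sim):
            hr_true = rng.lognormal(np.log(hr_mean), hr_log_sd)    # draw from the design prior
            t = rng.exponential(1 / (hr_true * lam_ref), n)
            events = int((t <= cens_time).sum())
            follow_up = np.minimum(t, cens_time).sum()
            a_post, b_post = a0 + events, b0 + follow_up           # conjugate gamma posterior
            prob_hr_below_1 = stats.gamma.cdf(lam_ref, a_post, scale=1 / b_post)
            success += prob_hr_below_1 > threshold                 # stopping criterion met
        return success / n_sim

    # Smallest n achieving 80% Bayesian power:
    # for n in range(20, 201, 5):
    #     if bayes_power(n) >= 0.80:
    #         print("required n =", n); break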

Results

Simulation results show that more informative analysis priors reduce sample size, while greater uncertainty in the design priors increases it. For designs without interim analysis, the Bayesian method produces sample sizes comparable to or smaller than frequentist methods while maintaining type I error rates. Interim analyses reduce expected sample size and trial duration, with thresholds for posterior probabilities influencing early termination probabilities. Results also demonstrate flexibility in accommodating varying assumptions about survival distributions and parameter uncertainties.

Conclusion

The proposed Bayesian sample size determination method efficiently incorporates prior information and interim analyses, making it a practical alternative to traditional frequentist approaches. This approach enables flexible and rational trial designs, reducing conflicting decisions and improving resource use. Limitations include reliance on the proportional hazards assumption and computational demands for simulation-based power calculations. Future research should explore extensions to handle censoring and other complexities in clinical trials.



posters-tuesday: 45

Calibration of dose-agnostic priors for Bayesian dose-finding trial designs with multiple outcomes

Emily Alger1, Shing M. Lee2, Ying Kuen K. Cheung2, Christina Yap1

1The Institute of Cancer Research, United Kingdom; 2Columbia University, USA

Introduction: The goal of dose-finding oncology trials is to assess the safety of novel anti-cancer treatments across multiple doses and to recommend dose(s) for subsequent trials. Based on previous observed responses, trialists dynamically recommend new doses for further investigation during the trial.

Adaptive decision making lends itself to Bayesian learning, with Bayesian frameworks increasingly guiding dose recommendations in model-based dose-finding designs, such as the Continual Reassessment Method (CRM) design. However, these approaches often add complexity by incorporating multiple outcomes and require appropriate prior selection. For trialists who lack prior knowledge, we may look to adopt a dose-agnostic prior – with each dose equally likely to be the a priori optimal dose. However, applying existing methodology to a multiple-outcome CRM may inflate suboptimal, low dose recommendations.

Methods: We broaden calibration techniques for single-outcome trial designs to calibrate dose-agnostic priors for multiple-outcome trial designs, such as designs that jointly evaluate Dose Limiting Toxicities (DLTs) and efficacy responses, or DLTs and patient-reported outcomes (PROs). The a priori probability each dose is identified as the recommended dose is written analytically and optimised using divergence minimisation. A simulation study is presented to demonstrate the effectiveness of calibrated priors for both the PRO-CRM[1] trial design and the joint-outcome CRM model proposed by Wages and Tait[2] in comparison to marginally calibrated priors.
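
As a simplified illustration of calibrating a dose-agnostic prior, the sketch below considers a single-outcome power-model CRM and chooses the prior standard deviation so that the a priori dose-selection probabilities are as close to uniform as possible (Kullback-Leibler divergence to the uniform distribution), in the spirit of marginal least-informative-prior calibration. The skeleton, target, and Monte Carlo evaluation are stand-ins; the abstract's contribution concerns the analytical, jointly calibrated multiple-outcome case.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(7)
skeleton = np.array([0.05, 0.12, 0.25, 0.40, 0.55])   # made-up prior toxicity skeleton
target = 0.25
theta = rng.normal(size=200_000)                      # common random numbers, rescaled below

def prior_selection_probs(sigma):
    """P(dose i is a priori closest to the target DLT rate) when theta ~ N(0, sigma^2)
    in the power model p_i(theta) = skeleton_i ** exp(theta)."""
    p = skeleton[None, :] ** np.exp(sigma * theta)[:, None]
    chosen = np.argmin(np.abs(p - target), axis=1)
    return np.bincount(chosen, minlength=len(skeleton)) / len(theta)

def kl_to_uniform(sigma):
    probs = np.clip(prior_selection_probs(sigma), 1e-12, None)
    return float(np.sum(probs * np.log(probs * len(skeleton))))

res = minimize_scalar(kl_to_uniform, bounds=(0.1, 3.0), method="bounded")
print("calibrated prior sd:", round(res.x, 3))
print("a priori selection probabilities:", prior_selection_probs(res.x).round(3))
```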

Results: Our analytical and computationally efficient technique maintains an a priori dose-agnostic prior whilst improving the probability of correct selection (PCS) and reducing its standard deviation across most simulation scenarios. Thus, jointly calibrated priors reduce the bias present in simulation performance with marginally calibrated priors.

Conclusion: Leveraging analytical expressions for a priori optimal dose recommendations enables computationally efficient implementation and reduces the need for extensive simulations to confirm trial design performance. Moreover, this approach helps trialists develop deeper intuition about their prior choices, thus strengthening their confidence in selecting robust and suitable priors. As Bayesian dose-finding trial designs continue to advance, research and guidance on the effective calibration of design parameters is essential to support the uptake of Bayesian designs, demonstrate the importance of rigorous prior calibration, and ensure optimal performance in practice.

[1] Lee, Shing M., Xiaoqi Lu, and Bin Cheng. "Incorporating patient‐reported outcomes in dose‐finding clinical trials." Statistics in medicine 39.3 (2020): 310-325.

[2] Wages, Nolan A., and Christopher Tait. "Seamless phase I/II adaptive design for oncology trials of molecularly targeted agents." Journal of biopharmaceutical statistics 25.5 (2015): 903-920.



posters-tuesday: 46

Estimands in platform trials with time-treatment interactions

ZIYAN WANG, Dave Woods

Statistical Sciences Research Institute (S3RI), University of Southampton, United Kingdom

Background
In long-running platform trials, treatment effects may change over time due to shifts in the recruited population or changes in treatment efficacy—such as increased clinician experience with a novel surgical technique [1]. Most existing studies have assumed equal time trends across treatment arms and controls, focusing on treatment-independent time effects [2,3]. However, when time trends are unequal between treatment arms and controls, the standard estimands can lead to inflated type I error rates, reduced statistical power, and biased treatment effect estimates. In this study, we propose a novel model-based estimand designed to correct for unequal time trends, thereby ensuring robust and accurate inference in platform trials.

Methods
We propose a general model-based estimand based on a time-averaged treatment effect that is adaptable to a variety of time trend patterns in platform trials. In our study, we compare the performance of the standard treatment effect estimand with our generalized estimand in settings where time trends differ between treatment arms and the control. A simulation study is conducted within the framework of Bayesian platform trials—including those employing response-adaptive randomization (RAR)—and performance is evaluated in terms of error rates, bias, and root mean squared error.
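
The idea of a time-averaged treatment effect under unequal time trends can be sketched as below using ordinary least squares on simulated data. The trend shapes and effect sizes are invented, and the actual analyses described in the abstract are Bayesian and implemented in the authors' BayesianPlatformDesignTimeTrend R package; this is only a minimal illustration of the estimand.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 2000
t = rng.uniform(0, 1, n)                         # calendar time of enrolment (scaled 0-1)
arm = rng.integers(0, 2, n)                      # 1 = experimental, 0 = control
# Unequal time trends: the control mean drifts with time and the treatment effect strengthens.
y = 0.5 * t + arm * (0.3 + 0.4 * t) + rng.normal(scale=1.0, size=n)
df = pd.DataFrame(dict(y=y, t=t, arm=arm))

fit = smf.ols("y ~ arm * t", data=df).fit()      # flexible model with a time-treatment interaction

# Time-averaged treatment effect: average the model-based contrast over the trial period.
grid = np.linspace(0, 1, 101)
contrast = (fit.predict(pd.DataFrame(dict(arm=1, t=grid)))
            - fit.predict(pd.DataFrame(dict(arm=0, t=grid))))
print("time-averaged effect estimate:", round(contrast.mean(), 3))   # true value is 0.5
```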

Results
Our findings demonstrate that the generalized estimand is robust across various time trend patterns, including nonlinear trends. Flexible modelling with this estimand maintains unbiasedness and reduces power loss compared to the standard estimand. Moreover, the approach remains effective under adaptive randomization rules. All simulation analyses were performed using our “BayesianPlatformDesignTimeTrend” R package, which is publicly available on CRAN.

Conclusion
This work provides a practical and innovative approach for addressing time trend effects in platform trials, offering new insights into the analysis of trials where unequal strength of time trends exists.

[1] K. M. Lee, L. C. Brown, T. Jaki, N. Stallard, and J. Wason. Statistical consideration when adding new arms to ongoing clinical trials: the potentials and the caveats. Trials, 22:1–10, 2021.

[2] Roig, M. B., Krotka, P., Burman, C.-F., Glimm, E., Gold, S. M., Hees, K., Jacko, P., Koenig, F., Magirr, D., Mesenbrink, P., et al. (2022). On model-based time trend adjustments in platform trials with non-concurrent controls. BMC medical research methodology. 22.1, pp. 1–16.

[3] Marschner, I. C., & Schou, I. M. (2024). Analysis of Nonconcurrent Controls in Adaptive Platform Trials: Separating Randomized and Nonrandomized Information. Biometrical Journal, 66(6), e202300334.



posters-tuesday: 47

A Graphical Approach to Subpopulation Testing in Biomarker-Driven Clinical Trial Design

Boaz Natan Adler1, Valeria Mazzanti2, Pantelis Vlachos2, Laurent Spiess2

1Cytel Inc., United States of America; 2Cytel Inc., Geneva, Switzerland

Introduction:

As targeted therapies in oncology are fast becoming commonplace, clinical studies are increasingly focused on biomarker-driven hypotheses. This type of research, in turn, requires methods for subpopulation analysis and multiple comparison procedures (MCPs) for sound clinical trials. In our case study, we employed advanced statistical software to design and optimize such a clinical study with a novel graphical approach to the testing sequence and procedures.

Methods:

For this optimization exercise, we interrogated the typical areas of design interest: selecting an appropriate sample size, required number of events, and the timing and attributes of an interim analysis for the study. In addition, our optimization aim included a focus on the testing sequence of the study’s subpopulations, biomarker-positive, and -negative, as well as a test of the overall study population. We also sought to optimize the MCP employed for the study, examining logrank and stepdown logrank tests, alongside different options for alpha splitting among the tests. Design variations and simulation were conducted using advanced statistical software and relied on a graphical approach to testing sequence and alpha splitting, in addition to visualizations of other study parameter variations.
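
For readers unfamiliar with graphical testing, the sketch below implements the generic sequentially rejective graphical multiple comparison procedure of Bretz et al. (2009), with a hypothetical two-hypothesis graph (biomarker-positive subgroup and overall population) and an illustrative alpha split; the weights, transition matrix, and p-values are not taken from the case study.

```python
import numpy as np

def graphical_mcp(pvals, alpha, w, G):
    """Sequentially rejective graphical multiple comparison procedure (Bretz et al., 2009).
    w: initial alpha weights (summing to at most 1); G[j, l]: fraction of H_j's weight
    propagated to H_l when H_j is rejected. Returns the set of rejected hypothesis indices."""
    p = np.asarray(pvals, float)
    w = np.asarray(w, float).copy()
    G = np.asarray(G, float).copy()
    active, rejected = set(range(len(p))), set()
    while True:
        cand = [j for j in active if p[j] <= w[j] * alpha]
        if not cand:
            return rejected
        j = cand[0]
        rejected.add(j)
        active.remove(j)
        for l in active:                          # pass the rejected hypothesis' alpha onwards
            w[l] += w[j] * G[j, l]
        G_new = G.copy()
        for l in active:
            for k in active:
                if l == k:
                    continue
                denom = 1.0 - G[l, j] * G[j, l]
                G_new[l, k] = (G[l, k] + G[l, j] * G[j, k]) / denom if denom > 0 else 0.0
        G = G_new

# Hypothetical graph: H0 = biomarker-positive subgroup, H1 = overall population,
# alpha split 0.8/0.2 of a one-sided 0.025, with full propagation between the two tests.
print(graphical_mcp([0.018, 0.030], alpha=0.025, w=[0.8, 0.2], G=[[0, 1], [1, 0]]))
```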

Results:

This extensive simulation and optimization work allowed us to select a design that was tailored to the unique treatment effect assumptions of the investigational drug. We were able to convey design tradeoffs and the implications of testing sequence selection and other key design parameters in a graphical, relatable manner to the entire drug development team.

Conclusion:

A graphical approach to designing complex subpopulation analysis-driven clinical trials enables biostatisticians to assess design tradeoffs and selections clearly, while easing design and simulation work, and enhancing communication with governance committees.



posters-tuesday: 48

Optimizing Biomarker-Based Enrichment strategies in clinical trials

Djuly Asumpta PIERRE PAUL1, Irina Irincheeva2, Hong Sun3

1Nantes University (France), Bristol-Myers Squibb (Switzerland); 2Bristol-Myers Squibb Boudry (Switzerland); 3Bristol-Myers Squibb Boudry (Switzerland)

Background

Identifying patient groups based on biomarkers is crucial in oncology. Validating a biomarker as a stratification criterion in clinical trials can take several years. Choosing the threshold for continuous biomarkers is particularly challenging, often relying on a limited number of values evaluated with simplistic statistical approaches. Early dichotomization ignores the actual distribution of values and the potentially informative “grey zone”.

Methods

In this work, we adapt a biomarker enrichment design to identify the optimal threshold for determining which patients will benefit the most from the experimental treatment. We simulate the Simon & Simon design for binomial and survival endpoints. Various scenarios of chosen thresholds are studied through simulations inspired by existing studies. A ROC-curve-based approach to determine the threshold, as well as the Song-Chi closed testing procedure to assess the treatment effect in both the overall population and the biomarker-positive subgroup, are explored.
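
The abstract does not state which ROC-based criterion is used; a common choice is Youden's J statistic, illustrated in the sketch below on simulated biomarker values (the distributions and the benefit indicator are invented).

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(11)
n = 500
responder = rng.integers(0, 2, n)                              # invented benefit indicator
biomarker = rng.normal(loc=1.0 + 0.8 * responder, scale=1.0)   # invented biomarker values

fpr, tpr, thresholds = roc_curve(responder, biomarker)
youden = tpr - fpr                                  # Youden's J at each candidate cut-off
cutoff = thresholds[np.argmax(youden)]
print(f"estimated biomarker threshold: {cutoff:.2f} (J = {youden.max():.2f})")
```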

Results

Initial results suggest that our proposal effectively controls the Type I error for both binomial and survival endpoints. Additionally, switching to a ROC-curve approach for estimating the biomarker threshold improves statistical power by approximately 14%. Furthermore, incorporating the Song-Chi method allows testing of the difference in treatment effects between the standard control group and the experimental group in both the overall population (all the patients enrolled in the trial) and among the biomarker-positive patients, the patients most likely to benefit from the treatment. This method maintains rigorous Type I error control while still ensuring adequate power. Moreover, it facilitates the detection of treatment-specific fluctuations and subgroup dynamics within these two populations, leading to a more nuanced and precise analysis.

Conclusion

In conclusion, this study highlights the importance of a more nuanced approach in selecting biomarker thresholds and improving biomarker enrichment strategy for clinical trials, which is essential to accelerate the development of personalized therapies while optimizing the efficiency of clinical trials.

References

Simon N, Simon R. Adaptive enrichment designs for clinical trials. Biostatistics. 2013 Sep;14(4):613-25. doi: 10.1093/biostatistics/kxt010. Epub 2013 Mar 21. PMID: 23525452; PMCID: PMC3769998.

Song Y, Chi GY. A method for testing a prespecified subgroup in clinical trials. Stat Med. 2007 Aug 30;26(19):3535-49. doi: 10.1002/sim.2825. PMID: 17266164.



posters-tuesday: 49

Leveraging Synthetic Data for Enhanced Clinical Research Outcomes

Szymon Musik1,2, Agnieszka Kowalewska3, Gianmarco Gallone3, Jacek Zalewski3, Joanna Sasin-Kurowska3

1Late Phase Global Clinical Data Management, Clinical Data & Insights, BioPharmaceutical Clinical Operations, R&D, AstraZeneca, Warsaw, Poland; 2Department of Education and Research in Health Sciences, Medical University of Warsaw, Poland; 3Clinical Programming, Clinical Data & Insights, BioPharmaceutical Clinical Operations, R&D, AstraZeneca, Warsaw, Poland

Background / Introduction: In recent years, the pharmaceutical industry has been under immense pressure to make drug development faster and more efficient. Traditional clinical trials often face obstacles like high costs, prolonged durations, and challenges in participant recruitment, particularly for rare diseases. Additionally, testing of programming tools, databases, and software before acquiring patient data is cumbersome. Synthetic Data in Clinical Trials (SDCT) offers an innovative solution by providing high-quality, clinically realistic datasets that meet strict privacy conditions, facilitating thorough research.

Methods: We developed AstraZeneca’s Study Synthetic Data Tool (SYNDATA), which generates synthetic data for a study (referred to as the target study) using its Architect Loader Spreadsheet (ALS) and data from an ongoing or completed study (referred to as the base study). Importantly, the target study may not yet have any data collected. Our pipeline leverages the event chronology specified by the ALS, allowing scenarios for each patient to be created before data generation. We categorize dataset variables into groups based on types, such as dates or binary options (e.g., Yes/No), and use designated methods for generating these variables. This approach employs classic statistical techniques like kernel density estimation and Bayesian networks. Designed primarily for study set-up testing, SYNDATA explores potential variable values in the target study while preserving relationships from the base study. It can also incorporate incorrect values into the data if necessary.
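
A minimal sketch of the kernel-density step is shown below for a single continuous variable and a simple binary flag; the distributions are invented, and the sketch does not reflect SYNDATA's handling of the ALS, event chronology, or dependencies between variables.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(5)
# Stand-in "base study" values for one continuous variable (e.g. a lab measurement).
base_values = rng.lognormal(mean=4.0, sigma=0.3, size=300)

kde = gaussian_kde(base_values)                  # classic kernel density estimate
synthetic = kde.resample(size=500)[0]            # synthetic values drawn from the fitted density

# A simple Yes/No variable drawn to match the base-study proportion.
base_yes_rate = 0.35
synthetic_flag = rng.random(500) < base_yes_rate

print(synthetic[:5].round(1), synthetic_flag[:5])
```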

Results: Incorporating synthetic data into clinical trials has significantly eased data scarcity challenges. SYNDATA generates synthetic data as soon as the ALS for a study is available, enabling users to test programming tools, databases, software, and visualizations. Furthermore, synthetic data supports data science projects. SYNDATA is secure and ensures patient privacy.

Conclusion: Synthetic data is set to transform clinical trials by addressing the current challenges in the pharmaceutical industry. It reduces development timelines and enhances data integration efficiency, allowing more reliable trial simulations. Adopting synthetic data as a vital component of clinical research could reshape conventional practices and usher in a new era of data-driven drug development.



posters-tuesday: 50

Graph-Based Integration of Heterogeneous Biological Data for Precision Medicine: A Comparative Analysis of Neo4j and MySQL

Byoung Ha Yoon

KRIBB (Korea Research Institute of Bioscience and Biotechnology), Korea, Republic of (South Korea)

Precision medicine aims to provide personalized treatment plans tailored to individual patients. However, the complexity and scale of biomedical data, coupled with the exponential growth of clinical knowledge derived from diverse biological databases and scientific publications, pose significant challenges in clinical applications. A key challenge in this context is understanding and integrating the intricate relationships between heterogeneous biological data types.

In this study, we address this challenge by integrating multiple biological datasets—such as protein-protein interactions, drug-target associations, and gene-disease relationships—into a unified graph database. The constructed graph consists of approximately 150,000 nodes and 100 million relationships, with data pre-processed to remove redundancies. To assess the suitability of graph-based databases for handling complex biological networks, we compared the performance of Neo4j, a state-of-the-art graph database, with MySQL, a traditional relational database. Our results demonstrate that while MySQL struggled with complex queries involving multiple joins, Neo4j exhibited superior performance, providing rapid responses to the same queries.
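
The sketch below indicates the flavour of a multi-hop query of the kind compared in the study, issued through the standard neo4j Python driver. The node labels, relationship types, schema, and connection details are hypothetical, as the abstract does not specify them, and the script naturally requires a running database instance.

```python
from neo4j import GraphDatabase

# Multi-hop pattern: drugs whose protein targets interact with a protein associated
# with a given disease. The equivalent relational query needs one join per hop.
CYPHER = """
MATCH (d:Drug)-[:TARGETS]->(:Protein)-[:INTERACTS_WITH]->(:Protein)
      -[:ASSOCIATED_WITH]->(dis:Disease)
WHERE dis.name = $disease
RETURN DISTINCT d.name AS drug
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))  # placeholders
with driver.session() as session:
    for record in session.run(CYPHER, disease="chronic kidney disease"):
        print(record["drug"])
driver.close()
```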

These findings emphasize the potential of graph databases for efficiently storing and querying complex biological relationships. Moreover, the interconnected nature of biological data in graph structures facilitates the application of computational biology techniques, such as network analysis and clinical biostatistics, to uncover hidden patterns and infer new insights. This approach not only enhances the understanding of biological systems but also holds promise for improving clinical decision-making and advancing the field of precision medicine.



posters-tuesday: 51

Revolutionizing Clinical Data Management: A Strategic Roadmap for Integrating AI/ML into CDM

Joanna Magdalena Sasin-Kurowska1, Szymon Musik1, Mariusz Panczyk2

1AstraZeneca, Poland; 2Medical University of Warsaw, Poland

Clinical Data Management (CDM) is essential in clinical research, ensuring the accuracy and integrity of data for regulatory submissions. As clinical trials become more complex and generate larger volumes of data—especially in Phase III trials—there is a growing need for advanced tools to manage and analyze this information. This poster highlights key findings from our research on integrating Artificial Intelligence (AI) and Machine Learning (ML) into CDM, transforming it into Clinical Data Science (CDS). By reviewing literature from 2008 to 2025, we identified emerging trends such as the use of Natural Language Processing (NLP) to analyze unstructured data, AI/ML for automating data cleaning and analysis, and new technologies like blockchain, wearable devices, and patient-centric approaches. Our results indicate that AI/ML can improve data quality, automate processes, and enhance predictive analytics, offering a more efficient and scalable solution for clinical research. We also present a roadmap for successfully integrating AI/ML into CDM to drive innovation and advance clinical research. This review emphasizes the need for a strategic, multidisciplinary approach to fully leverage these technologies for more efficient and accurate clinical trials.



posters-tuesday: 52

Strategies to scale up model selection for analysis of proteomic datasets using multiple linear mixed-effect models

ILYA POTAPOV, MATTHEW DAVIS, ADAM BOXALL, FRANCESCO TUVERI, GEORGE WARD, SIMONE JUELIGER, HARPREET SAINI

Astex Pharmaceuticals, United Kingdom

Linear mixed-effect models (LMEM) are a key tool for modelling biomedical data with dependencies. For example, longitudinal read-outs from patients necessarily need to address the correlation between samples, which violates the assumption of independence in the standard linear modelling approach. Designing the LMEM, in terms of the factors and their interactions that constitute the model, is an elaborate process that takes into account both the formal analysis of the model variance and the endpoints of the study. While there are multiple hypotheses about how best to design LMEMs, this process normally takes place at the level of a single model. In biomedical applications, however, we are often interested in multiple comparisons. In this case, the LMEM design process should be scaled up to optimise the model design for all comparisons simultaneously. In this work, we considered an example of the multiple-design problem in a proteomic experiment. We showed how a general framework for multiple LMEM designs can be established via the analysis of variance of the full and restricted (nested) models. This analysis included the formation of the P-value distribution for each of the factor terms and subsequent analysis of that distribution. We also demonstrated that the multiple-design framework necessarily poses the question of whether all models should share one universal design or whether individually tailored models should be used per protein. Both pathways are possible from a methodological point of view, yet they may have different implications for statistical inference. We discuss these implications.
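
One way to operationalise the per-term P-value distribution idea is sketched below: a random-intercept model is fitted per protein with statsmodels, the term of interest is tested against the restricted (nested) model by a likelihood-ratio test, and the resulting P-values are summarised across proteins. The data, model formula, and single tested term are invented placeholders for the authors' richer designs.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import chi2

rng = np.random.default_rng(8)
# Toy long-format proteomics data: abundance per protein, patient and timepoint (all invented).
rows = []
for prot in range(50):
    slope = rng.normal(0, 0.3)                   # protein-specific time effect
    for pat in range(20):
        u = rng.normal(0, 0.5)                   # patient random intercept
        for time in range(3):
            rows.append(dict(protein=f"P{prot}", patient=pat, time=time,
                             abundance=u + slope * time + rng.normal(0, 0.5)))
df = pd.DataFrame(rows)

pvals = []
for prot, d in df.groupby("protein"):
    full = smf.mixedlm("abundance ~ time", d, groups=d["patient"]).fit(reml=False)
    null = smf.mixedlm("abundance ~ 1", d, groups=d["patient"]).fit(reml=False)
    lrt = 2 * (full.llf - null.llf)              # likelihood-ratio test for the 'time' term
    pvals.append(chi2.sf(lrt, df=1))

# The distribution of P-values across proteins informs the choice of a universal model design.
print("P-value quantiles (5%, 50%, 95%):", np.quantile(pvals, [0.05, 0.5, 0.95]).round(3))
```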



posters-tuesday: 53

Cost-utility analysis of sodium-glucose cotransporter-2 inhibitors on chronic kidney disease progression in diabetes patients: a real-world data in Thailand

Sukanya Siriyotha1, Amarit Tansawet2, Oraluck Pattanaprateep1, Tanawan Kongmalai3, Panu Looareesuwan1, Junwei Yang1, Suparee Wisawapipat Boonmanunt1, Gareth J McKay4, John Attia5, Ammarin Thakkinstian1

1Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand; 2Department of Research and Medical Innovation, Faculty of Medicine Vajira Hospital, Navamindradhiraj University, Bangkok, Thailand; 3Division of Endocrinology and Metabolism, Faculty of medicine Siriraj Hospital Mahidol University, Bangkok, Thailand; 4Centre for Public Health, School of Medicine, Dentistry and Biomedical Sciences, Queen’s University Belfast, Belfast, United Kingdom; 5School of Medicine and Public Health, and Hunter Medical Research Institute, University of Newcastle, New Lambton, New South Wales, Australia

Introduction and Objective(s): Type 2 diabetes (T2D) increases the risk of micro- and macro-vascular complications, including chronic kidney disease (CKD), a major burden that can significantly impair quality of life and socioeconomic status. Evidence from numerous clinical trials demonstrates the benefits of sodium-glucose co-transporter 2 inhibitors (SGLT2is) in CKD prevention. However, the high cost of SGLT2is may limit their accessibility, despite economic evaluations suggesting cost-effectiveness. Therefore, this study aims to conduct a cost-utility analysis using real-world data in Thailand to provide more realistic and relevant evidence for policy decisions.

Method(s) and Results: Clinical and cost data of CKD patients between 2012 and 2022 were retrieved from the Ramathibodi T2D data warehouse. A Markov model was constructed with the following states: CKD stage 3, 4, 5, and death. A cost-utility analysis estimating the cost per quality-adjusted life year (QALY) of two interventions, non-SGLT2i versus SGLT2i, was performed from a societal perspective. The incremental cost-effectiveness ratio (ICER) was calculated by dividing the difference in costs between the compared treatments by the difference in QALYs associated with each treatment. A total of 20,735 patients were recruited. The lifetime costs were US$72,234.98 and US$74,887.31 in patients with renal replacement therapy (RRT) and US$71,638.41 and US$74,749.86 in patients without RRT, for non-SGLT2i and SGLT2i, respectively. ICERs were US$955.40 and US$1,114.56 per QALY in patients with and without RRT.
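
The structure of such a Markov cost-utility calculation is sketched below with a simple discounted cohort simulation. Every transition probability, cost, utility, and the discount rate are invented placeholders rather than the study's inputs, and the sketch omits RRT states and probabilistic sensitivity analysis.

```python
import numpy as np

# Annual-cycle cohort Markov model over CKD stage 3, 4, 5 and death.
def run_cohort(P, cost, utility, years=40, discount=0.03):
    dist = np.array([1.0, 0.0, 0.0, 0.0])            # cohort starts in CKD stage 3
    total_cost = total_qaly = 0.0
    for t in range(years):
        disc = 1.0 / (1.0 + discount) ** t
        total_cost += disc * dist @ cost
        total_qaly += disc * dist @ utility
        dist = dist @ P                               # one annual transition
    return total_cost, total_qaly

P_no_sglt2i = np.array([[0.85, 0.10, 0.03, 0.02],
                        [0.00, 0.80, 0.13, 0.07],
                        [0.00, 0.00, 0.85, 0.15],
                        [0.00, 0.00, 0.00, 1.00]])
P_sglt2i    = np.array([[0.90, 0.06, 0.02, 0.02],
                        [0.00, 0.86, 0.08, 0.06],
                        [0.00, 0.00, 0.87, 0.13],
                        [0.00, 0.00, 0.00, 1.00]])
cost_no = np.array([1500.0, 4000.0, 18000.0, 0.0])          # annual cost per state (US$)
cost_tx = cost_no + np.array([600.0, 600.0, 600.0, 0.0])    # add drug cost while alive
utility = np.array([0.80, 0.70, 0.55, 0.0])                 # QALY weight per state

c0, q0 = run_cohort(P_no_sglt2i, cost_no, utility)
c1, q1 = run_cohort(P_sglt2i, cost_tx, utility)
print(f"ICER = {(c1 - c0) / (q1 - q0):,.0f} US$ per QALY gained")
```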

Conclusions: SGLT2i was associated with higher treatment costs compared with non-SGLT2i. However, SGLT2i was still cost-effective considering Thailand's willingness-to-pay threshold of US$4,651 per QALY.

Keywords: Cost-utility analysis (QALY), Real-world data, Type 2 diabetes (T2D), Chronic kidney disease (CKD), Sodium-glucose co-transporter 2 inhibitors (SGLT2is)

References:

[1] Beckman JA, Creager MA. Vascular Complications of Diabetes. Circulation Research. 2016;118(11):1771-85.

[2] Wanner C, Inzucchi SE, Lachin JM, Fitchett D, von Eynatten M, Mattheus M, et al. Empagliflozin and Progression of Kidney Disease in Type 2 Diabetes. N Engl J Med. 2016;375(4):323-34.

[3] Reifsnider OS, Kansal AR, Wanner C, Pfarr E, Koitka-Weber A, Brand SB, et al. Cost-Effectiveness of Empagliflozin in Patients With Diabetic Kidney Disease in the United States:



posters-tuesday: 54

Comparing the Safety and Effectiveness of Covid-19 Vaccines administered in England using OpenSAFELY: A Common Analytic Protocol

Martina Pesce1, Christopher Wood1, Helen McDonald2, Frederica Longfoot1, Venexia Walker3, Edward PK Parker4, William J Hulme1

1Bennett Institute for Applied Data Science, Nuffield Department of Primary Care Health Science, Oxford University, UK; 2University of Bath, UK; 3Population Health Sciences, Bristol Medical School, University of Bristol, UK; 4London School of Hygiene and Tropical Medicine, UK

Background

In England, Covid-19 vaccination campaigns have been delivered in Spring and Autumn each year since 2021, and this pattern is set to continue for the foreseeable future. At least two vaccine products are used each campaign to mitigate any potential unforeseen supply or safety issues.

Post-authorisation evaluations of these vaccines in routine, out-of-trial settings are crucial: the incidence of longer-term and rarer outcomes is often not reliably estimable in trials, and vaccines may perform differently in more diverse population groups or in the context of newer viral variants.

The regularity and similarity of campaigns, including future campaigns, coupled with the availability of reliable routinely-collected health data on who is getting which vaccine and when, provides an opportunity to specify a single analysis protocol that can be reused across multiple campaigns.

Methods

We developed a Common Analytic Protocol to compare the safety and effectiveness of vaccine products used in each Covid-19 vaccination campaign. Planned analyses will use the OpenSAFELY research platform which provides secure access to routinely-collected health records for millions of people in England.

The protocol uses complementary approaches to control for confounding (one-to-one matching without replacement and inverse probability of treatment weighting) to compare products for a variety of safety and effectiveness endpoints, within a variety of population subgroups, and with various accompanying sensitivity analyses and balance checks. The analogous hypothetical randomised trial that the design emulates is also described.

All design elements are specified explicitly in R scripts, fully executable against simulated dummy data before any real data is available for analysis.
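
To indicate one of the two confounding-control approaches, the sketch below implements greedy one-to-one nearest-neighbour matching without replacement on an estimated propensity score with a caliper. The protocol itself is specified in R against OpenSAFELY data, and the score, caliper, and matching variables here are illustrative assumptions rather than the protocol's choices.

```python
import numpy as np

def match_one_to_one(ps_a, ps_b, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement on a propensity score
    (the estimated probability of receiving product A rather than product B).
    Returns (index_in_a, index_in_b) pairs."""
    available = set(range(len(ps_b)))
    pairs = []
    for i in np.argsort(ps_a):                   # process product-A recipients in score order
        if not available:
            break
        j = min(available, key=lambda k: abs(ps_b[k] - ps_a[i]))
        if abs(ps_b[j] - ps_a[i]) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs

rng = np.random.default_rng(9)
pairs = match_one_to_one(rng.uniform(0.2, 0.8, 200), rng.uniform(0.1, 0.9, 400))
print(len(pairs), "matched pairs")
```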

Discussion

The ability to plan analyses comparing vaccine products well in advance of the delivery of the campaign has numerous benefits and challenges, which will be described in this talk. We invite feedback on the proposed design prior to its use in real data.



posters-tuesday: 55

Statistical requirements in medical diagnostic development across the UK, US, and EU markets: A review of regulation, guidelines and standards.

Timothy Hicks1,2, Joseph Bulmer1, Alison Bray1, Jordan L. Oakley3, Rachel L. Binks2,3, Kile Green2, Will S. Jones4, James M.S. Wason3, Kevin J. Wilson3

1Newcastle Upon Tyne Hospitals NHS Foundation Trust, United Kingdom; 2NIHR HealthTech Research Centre in Diagnostic and Technology Evaluation, United Kingdom; 3Newcastle University, United Kingdom; 4Centre of Excellence for Data Science, Artificial Intelligence and Modelling (DAIM), University of Hull, United Kingdom

Background: When developing novel medical diagnostic devices, including In Vitro Diagnostics, Medical Diagnostic Software, and General Medical Devices, developers must conform to their chosen markets’ regulations. In developing novel statistical methods to support diagnostic development, such as the use of adaptive design for sample size reassessment, it is paramount that the regulations, and associated guidance, do not preclude the proposed novel methodology. This review of legislation, official policy guidance, and standards across the UK, EU, and US aimed to identify regulatory requirements or restrictions relating to statistical methodology for diagnostics development.

Methods: Data sources identified for legislation, official policy guidance, and standards included: EUR-Lex, WestLaw UK, US Food and Drug Administration (FDA), Lexis+, Policy Commons, Medical Device Co-ordination Group (MDCG), and the British Standards Online Library. These data sources were searched for records relating to medical diagnostic development. Search terms included: Medical Device, In Vitro Diagnostic, Medical Diagnostic, Diagnostic, and IVD. Identified records were double screened for inclusion, including a within document search for 25 key terms related to statistical requirements and diagnostic development. Identified terms were coded and relevant statistical requirements both mandatory and recommended, extracted.

Results: This systematic review identified 2479 potential records, 540 of which met the inclusion criteria for data extraction, of which 139 had statistical requirements or recommendations related to medical diagnostic development. Mandatory requirements for specific tests or conditions were identified across the three regions (Total: n = 187, UK = 12, EU = 82, US = 93). Examples of requirements include minimum sample sizes and specific populations when demonstrating diagnostic accuracy in certain high-risk conditions. For example, the EU Common Technical Specifications require first line assays for anti-HIV1/2 to include ≥400 positive HIV-1 and ≥100 positive HIV-2 specimens, of which 40 are non-B subtypes, and 25 are ‘same day’ fresh serum. Whilst not mandatory, this review also identified recommendations for best practice in diagnostic development and trial design covering: evidence requirements, statistical validity, study design, and study management.

Conclusion: Whilst mandatory statistical requirements exist for high-risk areas, thereby limiting the potential benefit of an adaptive trial by mandating sample sizes, there remains a great opportunity for the development of novel methodologies and adaptive trial designs in medical diagnostics. This review will allow future development of a framework for designing adaptive trials in medical diagnostics, empowering statisticians and developers to improve efficiency whilst meeting regulatory requirements.



posters-tuesday: 56

Calf muscle development in NICU graduates compared with typically developing babies: an analysis of growth trajectories using linear mixed models

Alana Cavadino1, Sian Williams2,3, Malcolm Battin4, Ali Mirjalili5, Louise Pearce6, Amy Mulqueeney4, N. Susan Stott7

1Epidemiology & Biostatistics, Faculty of Medical and Health Sciences, University of Auckland, New Zealand; 2Curtin School of Allied Health, Faculty of Health Sciences, Curtin University, Australia; 3Liggins Institute, University of Auckland, New Zealand; 4Newborn Services, Starship Child Health, Auckland District Health Board, New Zealand; 5Department of Anatomy and Medical Imaging, Faculty of Medical and Health Sciences, University of Auckland, New Zealand; 6Auckland Children’s Physiotherapy, Auckland, New Zealand; 7Department of Surgery, Faculty of Medical and Health Sciences, University of Auckland, New Zealand

Background / Introduction

Preterm birth and Neonatal Intensive Care Unit (NICU) admission are related to adverse health consequences in early childhood and beyond. This study evaluated lower leg muscle growth and motor development in the first 12 months of life in NICU graduates compared to typically developing (TD) infants.

Methods

A prospective, longitudinal study was conducted of infants born in Auckland, New Zealand, who were either born without complications and recruited from the community (TD) or discharged from a NICU and classed as intermediate-risk (NICU-IR) or higher-risk (NICU-HR) based on additional risk factors for adverse neurodevelopmental outcomes. Muscle volume and gross motor development were assessed at term-corrected ages 3-, 6- and 12-months (±1 month). Linear mixed models with REML and Kenward-Roger small-sample adjustment were used to estimate trajectories in Triceps Surae muscle volume measurements (Medial Gastrocnemius, Lateral Gastrocnemius, Soleus, and total Triceps Surae). Models included random intercepts for individuals and slopes for term-corrected-age, and fixed effects for term-corrected-age (months), body side (left/right leg), group (TD/NICU-IR/NICU-HR), and sex. Non-linear terms and interactions (by-group and by-side) for term-corrected-age, and different variance-covariance structures, were evaluated. Estimated group trajectories and marginal means at 3-, 6- and 12-months term-corrected-age were presented.

Results

Sixty-one infants were recruited; n=24 TD, n=14 NICU-IR, and n=23 NICU-HR. NICU infants had lower birthweight (1.7±0.9kg) and length (40.3±6.2cm) compared to TD infants (3.3±0.5kg; 51.1±2.8cm). COVID-19 restrictions meant some 6- and 12-month assessments occurred late, with variable timings. For muscle volume measures, there were significant term-corrected-age×group and (term-corrected-age)²×group interactions, indicating that muscle growth trajectories over time differed by group (Medial Gastrocnemius, Lateral Gastrocnemius, Triceps Surae, p<0.001; Soleus, p=0.04). Negative correlations between random intercepts and slopes indicated that lower muscle volume at 3-months term-corrected-age was associated with faster growth. Between 3-12 months term-corrected-age, Triceps Surae increased on average by 18.1cm³ (95%CI: 16.1-20.2cm³), 13.3cm³ (10.6-16.0cm³) and 12.5cm³ (10.5-14.6cm³) in TD, NICU-IR, and NICU-HR infants, respectively. Soleus was smaller at 6- and 12-months term-corrected-age for both NICU groups, and Lateral Gastrocnemius was smaller at 12-months term-corrected-age for NICU-HR (p<0.001). At 12-months term-corrected-age, raw Gross Motor Quotient scores were lower for NICU-HR (p=0.005), and <10% of NICU infants were walking compared to 30% of TD infants.

Conclusion

Failure of typical Soleus growth over the first year contributed to a smaller Triceps Surae at 12-months term-corrected-age in NICU graduates. These findings add to the increasing body of evidence for an adverse impact of preterm birth and NICU stays on infant skeletal muscle growth.



posters-tuesday: 57

Automating Report Generation with Stata: A Case Study of NORUSE

Maria Elstad

Helse Stavanger, Norway

Abstract

The Norwegian Service User Registry (NORUSE) is a comprehensive health registry utilized by Norwegian municipalities to document service recipients with substance abuse and/or mental health issues. The primary goal of NORUSE is to gather knowledge about the extent of services and the expected demand for services for this patient group. This data supports the formulation of municipal substance abuse policies, better decision-making regarding prioritization of user groups, and improved evaluation of service offerings. Nationally, the statistics contribute to the data foundation for shaping national policies for mental health and substance abuse work.

In 2024, we generated 64 automated municipality reports using VBA code in Excel. However, we have begun exploring the use of the Stata command putdocx for creating these reports. We are already using this for subgroup analysis, regional and national reports. This exploration highlights the potential of putdocx to streamline the process of generating detailed and consistent reports. Although we have also considered other software like Power BI, we found it less flexible compared to Stata, despite its superior graphing capabilities.

By employing putdocx, we can automate the creation of reports, which is particularly beneficial for municipalities that receive community-specific reports shortly after data collection. Additionally, Helse Stavanger produces regional and national reports, further leveraging the efficiency of automated report generation. The integration of putdocx in our reporting workflow enhances the accuracy and timeliness of data presentation, supporting better decision-making and policy formulation.

As we consider employing this method more broadly, we anticipate significant improvements in our ability to provide clear snapshots of users' situations based on the latest contact status. This tool contributes significantly to the ongoing efforts to improve service delivery for individuals with substance abuse and mental health challenges. The flexibility and scalability of putdocx make it a promising solution for our future reporting needs.



posters-tuesday: 58

Maternal Mortality Rate in Sudan 2020: Causes of Death, Obstetric Characteristics and Territorial Disparity, Using Statistical Analysis.

MOHAMMED ABDU MUDAWI

Freelancer (Senior Statistician, Health Information System and Biostatistics Specialist)

Abstract

Maternal mortality refers to deaths associated with pregnancy. It is a crucial social determinant of health and a key sociodemographic indicator for measuring and evaluating the quality of health care services (in particular antenatal care), and it reflects the strength of the health system in general. Sudan was among the first countries in the Arab and African region to conduct relevant surveys (the Demographic and Health Survey 1989, the Safe Motherhood Survey 1990, the Sudan Household Health Surveys 2006 and 2010, and the Multiple Indicator Cluster Survey 2014). However, the last survey that included the maternal mortality rate was conducted in 2010, when the rate was 216 per 100,000 live births; owing to the unstable situation (the 2018 Sudanese Revolution and the wider political context), the sixth Multiple Indicator Cluster Survey (MICS 6) planned for 2018 was not conducted.

This paper focuses on estimating the maternal mortality rate (MMR) in Sudan by cause of death, place of death, obstetric characteristics, and territorial disparity. The data were collected from the Federal Ministry of Health (Annual Statistical Report and Maternal Mortality Deaths Surveillance) for the year 2020.

The national MMR in 2020 was 278.7 per 100,000 live births, with the highest rate in East Darfur state (1,531.8 per 100,000 live births). Obstetric hemorrhage was the leading cause of maternal death (35%), and 45% of maternal deaths occurred in women aged 20-30 years; 508 deaths (62%) occurred outside antenatal care (ANC) and ANC follow-up services. West Kordofan registered the largest share of maternal deaths (10% of deaths across all states), and most maternal deaths occurred in health facilities (82% of deaths by place of death), more than at home or in transit. The 2020 MMR was therefore higher than the estimate from the last SHHS survey in 2010, which was 216 per 100,000 live births.



posters-tuesday: 59

Community-Based Health Screening Attendance and All-Cause Mortality in Rural South Africa: A Causal Analysis

Faith Magut1, Stephen Olivier1, Ariane Sessego1, Lusanda Mazibuko1, Jacob Busang1, Dickman Gareta1,6, Kobus Herbst1,5, Kathy Baisely3,1, Mark Siedner1,2,4

1Africa Health Research Institute (AHRI), South Africa; 2Massachusetts General Hospital, Boston, Massachusetts, United States of America; 3London School of Hygiene & Tropical Medicine, Keppel Street, London, UK; 4University of KwaZulu-Natal, Durban, South Africa; 5DSI-SAMRC South African Population Research Infrastructure Population Infrastructure Network, Durban, South Africa; 6Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland

Background

South Africa is moving from a period marked by high mortality from HIV and tuberculosis (TB) to one characterised by a growing burden of non-communicable diseases. Community health fairs help to diagnose and refer individuals with chronic diseases in underserved areas. However, their impact on morbidity and all-cause mortality is unknown.

Methods

We enrolled individuals 15 years and older in the Africa Health Research Institute Health and Demographic Surveillance area in rural KwaZulu-Natal to a community-based health fair screening and referral program (Vukuzazi). Testing was performed for HIV, TB, hypertension and diabetes. Those with positive results were visited at home for results provision and referral to local clinics.

All individuals in the area were followed longitudinally through routine household surveillance to detect deaths. We used directed acyclic graphs to identify the following confounders of the association between health-fair attendance and mortality: age, sex, educational attainment, employment, household socio-economic status and prior healthcare-seeking behavior.

To estimate the effect of Vukuzazi health fair attendance on all-cause mortality, we first estimated inverse probability of treatment weights (IPTW) for health fair attendance, then applied weighted Kaplan-Meier analysis to compare survival and weighted Cox regression to estimate hazard ratios and marginal risk differences. We conducted a sensitivity analysis in which we excluded deaths due to external causes (e.g. injuries) that would not be expected to be prevented by health fair attendance.
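
The estimation steps described above can be sketched as follows on simulated data: a logistic model for attendance gives inverse probability of treatment weights, which then feed a weighted Kaplan-Meier fit and a weighted, robust Cox model (here via scikit-learn and lifelines). The covariates, data-generating model, and censoring choices are invented and do not reproduce the study's analysis.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from lifelines import KaplanMeierFitter, CoxPHFitter

rng = np.random.default_rng(4)
n = 5000
age = rng.normal(40, 12, n)                         # invented covariates
female = rng.integers(0, 2, n)
attend = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 + 0.02 * age + 0.7 * female))))
time = rng.exponential(scale=8 / np.exp(0.02 * (age - 40) - 0.3 * attend))
event = (time < 5).astype(int)
time = np.minimum(time, 5.0)                        # administrative censoring at 5 years

X = np.column_stack([age, female])
ps = LogisticRegression().fit(X, attend).predict_proba(X)[:, 1]
w = np.where(attend == 1, 1 / ps, 1 / (1 - ps))     # inverse probability of treatment weights

df = pd.DataFrame(dict(time=time, event=event, attend=attend, w=w))

km = KaplanMeierFitter()
km.fit(df.loc[df.attend == 1, "time"], df.loc[df.attend == 1, "event"],
       weights=df.loc[df.attend == 1, "w"], label="attenders")
print("weighted 5-year survival among attenders:", round(km.predict(5.0), 3))

cox = CoxPHFitter()
cox.fit(df[["time", "event", "attend", "w"]], duration_col="time",
        event_col="event", weights_col="w", robust=True)
print(cox.summary.loc["attend", ["exp(coef)", "exp(coef) lower 95%", "exp(coef) upper 95%"]])
```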

Results

A total of 18,041 individuals (50.0% of those eligible) attended Vukuzazi health fairs. Compared to non-attenders, attenders were more likely to be women (68% vs 49%), older (median 37 vs 31 years), unemployed (37% vs 20%) and more likely to have accessed health care in the past year (53% vs 33%). Individuals were observed after the health fairs for a median of 4.0 years (IQR 3.7-4.2 years), comprising a total of 127,625 person-years. The crude mortality rate was 12.14 (11.54-12.76) per 1,000 person-years. In weighted Kaplan-Meier analysis, attenders had better survival compared to non-attenders. In the IPTW-adjusted models, Vukuzazi health fair attendance was associated with a 25% reduction in the hazard of all-cause mortality (HR=0.75, 95%CI: 0.67, 0.84), corresponding to a 1.5% absolute reduction in mortality over five years. Findings were similar in the sensitivity analysis.

Discussion

Participation in a community-based health fair was associated with a reduction in 5-year all-cause mortality. The integration of health fairs with referral practices into standard healthcare delivery within rural areas may be an effective strategy to improve health outcomes.



posters-tuesday: 60

Reducing Uncertainty in Fertility Meta-Analysis: A Multivariate Approach to Clinical Pregnancy and Live Birth Outcomes

Mahru Ahmad, Jack Wilkinson, Andy Vail

University of Manchester, United Kingdom

Background:

Meta-analyses of assisted reproductive technology (ART) trials commonly assess clinical pregnancy and live birth as separate outcomes, despite their hierarchical dependency. Many trials report pregnancy but not live birth, limiting the applicability of univariate meta-analyses for live birth outcomes. This can lead to imprecise estimates and uncertainty about intervention effectiveness. Multivariate meta-analysis (MVMA) offers a potential solution by jointly modelling related outcomes, maximizing the use of available data and improving statistical precision.

Objectives:
This study aims to investigate whether MVMA provides a more reliable estimation of live birth outcomes than traditional univariate meta-analysis. Specifically, we:

  1. Construct an MVMA model incorporating both clinical pregnancy and live birth outcomes using data from systematic reviews of ART trials (2020–2021).
  2. Compare MVMA with univariate approaches, evaluating the extent to which MVMA improves precision and whether this would lead to different inferences.
  3. Explore different correlation structures between clinical pregnancy and live birth, assessing their impact on effect estimates.

Methods:
Systematic review data from the Cochrane systematic reviews (2020–2021) will be extracted, including trial-level counts of clinical pregnancies and live births for treatment and control groups. MVMA models will be implemented using various correlation assumptions, as well as the use of the Wei and Higgins method to account for the relationship between outcomes. The study will assess the performance of MVMA versus univariate meta-analysis by comparing uncertainty in effect estimates and methodological implications.
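
A minimal numerical illustration of how joint modelling lets trials reporting only clinical pregnancy still inform the live-birth estimate is given below, using a common-effect multivariate (GLS) pooling step with an assumed within-trial correlation. The data are invented, and the planned analyses use random-effects MVMA and the Wei and Higgins approach rather than this simplification.

```python
import numpy as np

# Invented trial-level log-odds-ratio estimates and SEs for clinical pregnancy (CP) and
# live birth (LB); NaN marks trials that did not report live birth.
y  = np.array([[0.30, 0.25], [0.10, np.nan], [0.45, 0.40], [0.20, np.nan], [0.05, 0.02]])
se = np.array([[0.15, 0.18], [0.20, np.nan], [0.25, 0.30], [0.18, np.nan], [0.22, 0.25]])
rho = 0.8                                        # assumed within-trial correlation CP-LB

XtWX, XtWy = np.zeros((2, 2)), np.zeros(2)
for yi, si in zip(y, se):
    obs = ~np.isnan(yi)                          # outcomes this trial reports
    S = np.outer(si[obs], si[obs])
    if obs.sum() == 2:
        S = S * np.array([[1, rho], [rho, 1]])   # within-trial covariance
    W = np.linalg.inv(S)
    X = np.eye(2)[obs]                           # maps the pooled effect vector to observed outcomes
    XtWX += X.T @ W @ X
    XtWy += X.T @ W @ yi[obs]

pooled = np.linalg.solve(XtWX, XtWy)             # common-effect multivariate (GLS) estimate
pooled_se = np.sqrt(np.diag(np.linalg.inv(XtWX)))
print("pooled logOR (CP, LB):", pooled.round(3), "SE:", pooled_se.round(3))
```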

Results:
This study will provide insights into whether MVMA can enhance the precision of live birth effect estimates, making better use of incomplete ART trial data. By improving the analysis of imperfectly reported data, this study aims to reduce the considerable uncertainty surrounding many fertility interventions. The findings will be available at the time of the presentation and will help determine the extent to which MVMA can enhance statistical power when live birth data are incomplete. This work will contribute to methodological advancements in fertility research by optimising the use of available trial data and improving the reliability of conclusions drawn from ART studies.



posters-tuesday: 62

Causal discovery for multi-cohort studies

Christine Bang1, Vanessa Didelez2,3

1University of Copenhagen; 2Leibniz Institute for Prevention Research and Epidemiology - BIPS; 3University of Bremen

Causal discovery methods aim to learn causal structures in a data-driven way. The availability of multiple overlapping cohort datasets enables us to learn causal pathways over an entire lifespan. Evidence of such pathways may be highly valuable, e.g. in life course epidemiology. No previous causal discovery methods tailored to this framework exist. We show how to adapt an existing causal discovery algorithm for overlapping datasets to account for the time structure embedded in cohort data. In particular, we show that this strengthens the method in multiple aspects.
We consider causal discovery methods that recover causal structures from (conditional) independencies in a given set of variables. Multiple causal structures may induce the same dependence structure and form an equivalence class. Without additional, stronger assumptions, it is usually not possible to recover more than the equivalence class; i.e. we cannot identify all causal directions. Moreover, when combining multiple datasets, if some variables are never measured jointly their (conditional in-)dependence is by construction unknown. Then, we cannot even identify the equivalence class. Hence, constraint-based causal discovery for multiple datasets suffers from two types of obstacles for identification.
Time structured data induces a partial causal ordering of the variables, which we refer to as tiered background knowledge. It is easy to see that tiered background knowledge improves the identifiability of causal directions. Additionally, we show that tiered background knowledge also improves the (partial) identifiability of the equivalence class, which is not trivial. We provide theoretical results on the informativeness as well as theoretical guarantees of the algorithm. Finally, we provide detailed examples that illustrate how the algorithm proceeds, as well as examples of cases where tiered background knowledge increases the level of informativeness.
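
The deterministic part of using tiered background knowledge, orienting any skeleton edge that crosses tiers forward in time, can be sketched as below. This is only the orientation step for cross-tier edges, with invented variable names; it is not the full constraint-based algorithm for overlapping cohort datasets discussed in the abstract.

```python
def orient_with_tiers(skeleton, tier):
    """Orient undirected skeleton edges using tiered background knowledge: a variable in an
    earlier tier (earlier in the life course) cannot be caused by one in a later tier, so every
    cross-tier edge is pointed forward in time. Same-tier edges are left for the usual rules.
    skeleton: iterable of frozenset pairs; tier: dict variable -> tier index (0 = earliest)."""
    directed, undirected = set(), set()
    for edge in skeleton:
        a, b = tuple(edge)
        if tier[a] < tier[b]:
            directed.add((a, b))
        elif tier[b] < tier[a]:
            directed.add((b, a))
        else:
            undirected.add(edge)
    return directed, undirected

# Hypothetical life-course example: birth cohort (tier 0), childhood (1), adulthood (2).
skeleton = [frozenset(e) for e in [("birthweight", "childhood_bmi"),
                                   ("childhood_bmi", "adult_bp"),
                                   ("adult_bp", "adult_chol")]]
tiers = {"birthweight": 0, "childhood_bmi": 1, "adult_bp": 2, "adult_chol": 2}
print(orient_with_tiers(skeleton, tiers))
```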



posters-tuesday: 63

Extension of Causal Interaction Estimation Techniques through Integration of Machine Learning Algorithms

A F M Tahsin Shahriar, AHM Mahbub-ul Latif

University of Dhaka, Bangladesh, People's Republic of

This study explores the challenges of causal interaction analysis, particularly in public health and policy evaluation, where understanding how multiple exposures influence outcomes is crucial. Identifying these interactions is complex due to unobserved confounding, measurement errors, and high-dimensional datasets. Traditional econometric methods, while widely used, often rely on strong assumptions that may not hold in complex real-world scenarios.

This study reviews established causal inference methods, including Difference-in-Differences (DiD), Changes-in-Changes (CiC), and matching. These methods have limitations, particularly in handling high-dimensional data and complex interactions. To address these challenges, this research investigates an alternative approach using machine learning models, specifically Causal Forests and Bayesian Additive Regression Trees (BART), to estimate causal interactions. These models are used to obtain Conditional Average Treatment Effect (CATE) estimates, which are then used to compute the Average Treatment Effect on the Treated (ATET). However, these methods did not consistently outperform traditional methods in simulations, especially with smaller samples.

A key contribution of this study is the development of causal mixture methods, which integrate the adaptability of machine learning algorithms, like Gradient Boosting Machines (GBM) and Random Forests (RF), for first-stage estimation with the interpretability and robustness of traditional econometric frameworks, such as Difference-in-Differences (DiD), to enhance resilience to unmeasured confounding and measurement errors. This approach involves first estimating propensity scores using machine learning methods to capture complex relationships between covariates and treatment assignment. These estimated propensity scores are then integrated into the standard DiD model to improve covariate balance and comparability between treated and control groups, mitigating selection bias and enhancing the robustness of causal estimates. This approach aligns with modern econometric frameworks like Double Machine Learning (DML).
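
A compact sketch of this two-stage idea is shown below: a gradient-boosting propensity model produces weights targeting the effect on the treated, which are then used in a weighted difference-in-differences regression. The data-generating process, covariates, and effect size are invented, and cluster-robust standard errors by unit would be preferable in practice.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(6)
n_units = 2000
x1, x2 = rng.normal(size=n_units), rng.normal(size=n_units)
treated = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * x1 - 0.5 * x2))))   # confounded assignment

rows = []
for unit in range(n_units):
    for post in (0, 1):
        y = (1.0 + 0.5 * x1[unit] + 0.3 * x2[unit]   # unit-level confounding
             + 1.0 * post                            # common time trend
             + 2.0 * treated[unit] * post            # true effect on the treated (ATET = 2)
             + rng.normal())
        rows.append(dict(y=y, treated=treated[unit], post=post, x1=x1[unit], x2=x2[unit]))
df = pd.DataFrame(rows)

# First stage: ML propensity scores, turned into weights targeting the effect on the treated.
ps = GradientBoostingClassifier().fit(df[["x1", "x2"]], df["treated"]).predict_proba(
        df[["x1", "x2"]])[:, 1]
df["w"] = np.where(df["treated"] == 1, 1.0, ps / (1 - ps))

# Second stage: weighted difference-in-differences regression (HC1 errors used for brevity).
did = smf.wls("y ~ treated * post", data=df, weights=df["w"]).fit(cov_type="HC1")
print("weighted DiD estimate of the ATET:", round(did.params["treated:post"], 2))
```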

Simulation studies were conducted to assess the performance of various causal inference methods. Data were generated with varying levels of noise to examine the impact of measurement error. The mixture methods, integrating ML-based propensity scores with DiD regression, produced unbiased estimates, demonstrating robustness to measurement error.

In summary, this study advances the field of causal inference by: (i) presenting a detailed comparative analysis of econometric and machine learning-based methods, (ii) proposing causal mixture models that integrate machine learning for robust first-stage estimation, and (iii) comparing bias through simulations. These contributions provide researchers with practical tools and a stronger theoretical foundation for addressing challenges in causal interaction analysis, particularly in high-dimensional and complex settings, ensuring more reliable and interpretable conclusions for decision-making in public health and policy research.



posters-tuesday: 64

Embrace Variety, Find Balance: Integrating Clinical Trial and External Data Using Causal Inference Methods

Rima Izem1, Yuan Tian2, Robin Dunn3, Weihua Cao3

1Novartis Pharma AG, Switzerland; 2China Novartis Institutes for BioMedical Research Co., Ltd.; 3Novartis Pharmaceuticals Corporation, USA

Integrating information from multiple sources is important for multiple stakeholders in the development of pharmaceutical products. For example, augmenting the control arm of a randomized controlled trial with external data from previously conducted trials can inform internal decision-making in early development or expedite development in small populations with unmet medical need. Also, leveraging external controls from a disease registry to a single arm trial can make it possible to estimate the comparative treatment effect of the study drug when a randomized comparison is unfeasible or unethical. The main challenge in this data integration is assessing potential biases, due to between-source differences, and minimizing or mitigating these biases in the integrated design and analysis.

This presentation proposes the use of a workflow implementing propensity score methods, developed in observational data, when estimating treatment effects from multiple data sources with individual-level data. First, causal inference thinking can help identify the causal estimand, establish the underlying assumptions, and focus the assessment of between-source heterogeneity on key variables. The use of target trial emulation and balance diagnostics can identify the relevant subset in the external data, assess the extent of adjustment needed, evaluate the plausibility of important assumptions, such as positivity, and assess adequacy of propensity score adjustment. Lastly, for fit-for-purpose external data, a variety of methods can leverage the propensity score to estimate the treatment effect. Our presentation will share practical considerations at each step of the workflow and illustrate its use with case studies and simulated data from pharmaceutical development.
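
One concrete piece of the balance-diagnostics step is the weighted standardized mean difference; a small helper is sketched below, with hypothetical column names ('trial', 'iptw') since the presentation's case studies and variables are not specified here.

```python
import numpy as np
import pandas as pd

def standardized_mean_difference(df, covariates, group, weights=None):
    """Weighted standardized mean differences between trial patients (group == 1) and external
    controls (group == 0); values below ~0.1 are conventionally taken as adequate balance."""
    w = np.ones(len(df)) if weights is None else np.asarray(df[weights], float)
    g = np.asarray(df[group])
    out = {}
    for cov in covariates:
        x = np.asarray(df[cov], float)
        m1 = np.average(x[g == 1], weights=w[g == 1])
        m0 = np.average(x[g == 0], weights=w[g == 0])
        v1 = np.average((x[g == 1] - m1) ** 2, weights=w[g == 1])
        v0 = np.average((x[g == 0] - m0) ** 2, weights=w[g == 0])
        out[cov] = (m1 - m0) / np.sqrt((v1 + v0) / 2)
    return pd.Series(out)

# Hypothetical usage on a pooled data set with columns 'trial' (1 = trial, 0 = external),
# 'iptw' (weights from a fitted propensity score model) and baseline covariates:
# print(standardized_mean_difference(pooled, ["age", "baseline_severity"], "trial", "iptw"))
```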



posters-tuesday: 65

Revisiting subgroup analysis: A reflection on health disparities using conditional independence

Nia Kang, Tibor Schuster

McGill University, Canada

Introduction: Comparative assessment is deeply ingrained in human nature to answer cause and effect questions. It is also an important feature of methodological rigour, underlying many research designs including randomized controlled trials, epidemiological studies and population-level evaluations for informing health policy. Programs that aim at addressing health disparities often rely on comparisons of health indicators across predefined sub-populations (i.e., groups distinguished by fixed socio-demographic characteristics), rather than by theoretically assignable exposures or interventions.

Although tailoring health policy implications to such subgroups may seem reasonable, this approach risks oversimplification, as the intersectional nature of socio-demographic factors can obscure those with the greatest need, rendering population-level interventions derived from such analyses less effective.

Methods: Using principles from probability theory, we define health parity as the stochastic independence between one or more health indicators and any subdivision of the population conditional on confounding factors. We consider the presence of two or more group-defining features that may intersect within and across subpopulations. We further assume the availability of a program or policy P that has a positive causal impact on the health indicator(s) under study but has limited resource allocation.

Using Bayes’ theorem, we derived a target function that factorizes the tradeoff between decreasing subgroup-specific health disparities and lowering the marginal prevalence of a poor health outcome given practical constraints such as resource availability. We conducted extensive Monte Carlo simulation studies to demonstrate how the proposed function can help identify the optimal P in terms of maximizing health parity. Factors considered in the simulations are the degree of impact of P, resource availability, the number and prevalence of population subgroups, and varying distributions of health outcomes.

Results/Conclusion: The proposed functional approach demonstrated utility in assessing the effectiveness of health programs and policies aimed at maximizing health parity. Although subpopulations defined by sociodemographic features provide convenient grounds for conventional comparative assessment, they may have limited capacity to inform the most effective health policies. Indeed, our findings imply that comparative subgroup analysis should be supplemented with marginal outcome distributions by leveraging the proposed target function approach.



posters-tuesday: 66

Comparison of Multiple Imputation Approaches for Skewed Outcomes in Randomised Trials: a Simulation Study

Jingya Zhao, Gareth Ambler, Baptiste Leurent

University College London, United Kingdom

Introduction

Missing outcome data is a common issue in trials, leading to information loss and potential bias. Multiple imputation (MI) is commonly used to impute missing data; one advantage is that it can include additional predictors of 'missingness' that are not in the analysis model. However, standard MI methods assume normality for continuous variables, which is often violated in practice, e.g. healthcare costs are typically highly skewed. Alternative MI approaches, involving Predictive Mean Matching (PMM) or log transformations, have been proposed for handling skewed variables. Using simulation, we compare different methods for imputing missing values of skewed outcome variables in randomised trials.

Methods

We simulated trial data with two treatment arms and correlated skewed baseline and follow-up variables. We considered three different missing data mechanisms for the follow-up variable: missing completely at random (MCAR), missingness associated with treatment arm (MAR-T), and missingness associated with baseline (MAR-B). We compared seven methods: Complete Case Analysis (CCA), Multivariate Normal Imputation (MVN), Multiple Imputation by Chained Equations (MICE), and Predictive Mean Matching (PMM), along with log-transformed versions (LogMVN, LogMICE, and LogPMM) which perform imputation on the log-transformed variables. Assessment of performance focused on bias and confidence interval (CI) coverage when estimating the mean difference between arms. These methods were also applied to the analysis of a healthcare costs trial dataset.
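
To indicate how a "log-transform then impute" strategy can be coded, the sketch below uses scikit-learn's IterativeImputer (a chained-equations-style imputer, standing in loosely for the LogMICE idea rather than the mice implementation used in the study) on simulated skewed cost data with MAR-T missingness. Rubin's rules for the variance are omitted, and all simulation settings are invented.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(2024)
n = 400
arm = rng.integers(0, 2, n)
baseline = rng.lognormal(mean=6.0, sigma=0.8, size=n)               # skewed baseline costs
followup = np.exp(0.8 * np.log(baseline) + 0.3 * arm + rng.normal(0, 0.5, n))
miss = rng.random(n) < 0.3 * (arm == 1)                             # MAR-T style missingness
log_fu = np.where(miss, np.nan, np.log(followup))                   # log-transform, then blank out

df = pd.DataFrame(dict(arm=arm, log_base=np.log(baseline), log_fu=log_fu))

estimates = []
for m in range(20):                                                 # 20 imputed data sets
    imp = IterativeImputer(sample_posterior=True, random_state=m)
    completed = pd.DataFrame(imp.fit_transform(df), columns=df.columns)
    fu = np.exp(completed["log_fu"])                                # back-transform to cost scale
    estimates.append(fu[completed["arm"] == 1].mean() - fu[completed["arm"] == 0].mean())

# Point estimate pooled across imputations (Rubin's rules for the variance omitted for brevity).
print("MI estimate of the mean difference:", round(float(np.mean(estimates)), 1))
```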

Results

The simulation results showed that LogMVN and LogMICE typically outperformed the other methods. MVN and MICE also performed well under MCAR and MAR-T but had poor performance under MAR-B. PMM and LogPMM generally performed poorly, often showing under-coverage. CCA performed well under MCAR but not under the MAR mechanisms. When applied to the trial dataset, PMM and LogPMM produced point estimates similar to that of CCA, with the narrowest CIs. Conversely, LogMVN and LogMICE yielded higher point estimates, along with the widest CIs. Additional simulations are being performed to explore further results under different outcome distributions, missing data mechanisms and sample sizes.

Conclusion

Our results suggest that a log transformation before MI might be useful for handling skewed variables (although non-positive values need careful handling). The performance of MVN and MICE depends on the specific missingness mechanism, and the PMM method cannot be recommended. However, further evaluation under alternative data generation mechanisms is needed.



posters-tuesday: 67

Assessing the effect of drug adherence on longitudinal clinical outcomes: A comparison of Instrumental Variable and Inverse Probability Weighting methods.

Xiaoran Liang1, Deniz Türkmen1, Jane A H Masoli1,2, Luke C Pilling1, Jack Bowden1,3

1University of Exeter, United Kingdom; 2Royal Devon University Healthcare NHS Foundation Trust, Exeter, United Kingdom; 3Novo Nordisk Research Centre (NNRCO), Oxford, United Kingdom

Background: Drug adherence refers to the degree to which patients comply with prescribed therapeutic regimens when taking medications, and high adherence is essential for ensuring the expected efficacy of pharmacological treatments. In routine care settings, however, low adherence is a major obstacle to achieving this efficacy. For instance, real-world studies report that adherence to commonly prescribed statin therapy can drop below 50% within the first year of treatment, substantially lower than observed in the controlled trials that led to their original approval. Method: In this paper we discuss the use of longitudinal causal modelling to estimate the time-varying causal effects of adherence on patients’ health outcomes over a sustained period. The goal of such analyses is to quantify the impact of interventions to improve adherence on long-term health. If a meaningfully large difference is estimated, the natural focus can then shift to deciding how to realize such an intervention in a cost-effective manner. Two estimation approaches, Inverse Probability Weighting (IPW) and Instrumental Variables (IV), have been proposed in the ‘Estimand framework’ literature to adjust for non-adherence in randomized clinical trials, where non-adherence is viewed as an intercurrent event. We refine and adapt these methods to assess long-term adherence in the observational data setting, which differs from a clinical trial in several key respects: first, there is no overt randomization to treatment; second, adherence and longitudinal outcomes are only available in those who are treated. We clarify the assumptions each method makes and assess the statistical properties of each approach using Monte Carlo simulation as well as real data examples on statin use for LDL cholesterol control and metformin use for HbA1c control taken from primary care data in UK Biobank.
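As a rough illustration of the IPW component in this observational setting, the sketch below (our own simplification with hypothetical column names, not the authors' analysis) builds stabilised weights for adherence from logistic models and indicates how a weighted outcome model would follow; censoring weights and robust standard errors are omitted.

import numpy as np
import statsmodels.formula.api as smf

def stabilised_weights(df):
    """df: long format, sorted by id and visit, with illustrative columns
    adherent (0/1), age, baseline_ldl, prior_ldl."""
    num = smf.logit("adherent ~ age + baseline_ldl", data=df).fit(disp=0)
    den = smf.logit("adherent ~ age + baseline_ldl + prior_ldl", data=df).fit(disp=0)
    p_num = np.where(df["adherent"] == 1, num.predict(df), 1 - num.predict(df))
    p_den = np.where(df["adherent"] == 1, den.predict(df), 1 - den.predict(df))
    df = df.assign(w=p_num / p_den)
    df["sw"] = df.groupby("id")["w"].cumprod()   # cumulative visit-specific weights
    return df

# A weighted outcome model would then be fitted, e.g.
# smf.wls("ldl ~ adherent + C(visit)", data=dfw, weights=dfw["sw"]).fit()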

Results: The findings from our simulations align with theoretical expectations. The IV method effectively accounts for time-varying observed and unobserved confounders but relies on strong, valid instruments and additional parametric assumptions on the causal effects. In contrast, the IPW method addresses observed confounders without requiring additional assumptions but remains susceptible to bias from unmeasured confounding.



posters-tuesday: 68

Compliance between different anthropometric indexes reflecting nutritional status in women with polycystic ovary syndrome

Aleksander J. Owczarek1, Marta Kochanowicz2, Paweł Madej2, Magdalena Olszanecka-Glinianowicz1

1Health Promotion and Obesity Management Unit, Department of Pathophysiology, Faculty of Medical Sciences in Katowice, Medical University of Silesia in Katowice, Poland; 2Department of Gynecological Endocrinology, Faculty of Medical Sciences in Katowice, Medical University of Silesia in Katowice, Poland

Background: Obesity (mainly diagnosed based on the body mass index – BMI) is the main risk factor for developing polycystic ovary syndrome (PCOS). Based on BMI alone, however, not all women with PCOS are diagnosed with obesity, and BMI does not assess the visceral fat deposits that play a key role in the pathogenesis of PCOS. Thus, there is an ongoing search for anthropometric indicators that allow the assessment of visceral fat deposits. This study aimed to compare various anthropometric indicators for the diagnosis of excessive fat deposits.

Methods: Based on body mass, height, waist, and hip circumference, eleven indexes were calculated: BMI, waist-to-hip ratio (WHR), waist-to-height ratio (WHtR), waist-to-hip-to-height ratio (WHHR), body adiposity index (BAI), a body shape index (ABSI), body roundness index (BRI), weight-adjusted waist index (WWI), abdominal volume index (AVI), conicity index (CI), and the Rohrer (corpulence) index (RI). To compare the indexes with each other using Passing-Bablok (PB) regression, they were scaled to the range [0,1]. The serum lipid profile (total cholesterol, LDL and HDL cholesterol, triglycerides) as well as the triglyceride-glucose index (TyG) were also determined.
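For orientation, a few of these indexes and the [0,1] rescaling can be computed as below (our own illustration using standard formulas; the remaining indexes are omitted for brevity).

import numpy as np

def some_indexes(weight_kg, height_m, waist_m, hip_m):
    return {
        "BMI": weight_kg / height_m**2,
        "WHR": waist_m / hip_m,
        "WHtR": waist_m / height_m,
        "RI": weight_kg / height_m**3,   # Rohrer (corpulence) index
    }

def minmax01(x):
    """Scale an index to [0, 1] before the PB regression comparison."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())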

Results: The study group comprised 611 women with diagnosed PCOS, with a mean age of 26.3 ± 4.8 (range: 17 – 43) years. There were significant positive linear correlations between indexes (ranging from 0.08 to 0.99), except between ABSI and BAI and between ABSI and RI. Overall, 55 pairwise comparisons between indexes were made with PB regression with respect to intercept and slope. Apart from the comparisons of BMI vs RI, WHHR vs WWI, and WHR vs ABSI, all methods differed from each other with respect to the intercept (ranging from -0.28 to 0.24). Regarding the slope, 21 (38.2%) comparisons yielded slopes that did not differ significantly from 1. The highest slope, 1.42 (95% CI: 1.27 – 1.57), was noted for the comparison of WWI vs BAI; the lowest, 0.72 (95% CI: 0.68 – 0.77), for BAI vs WHtR. The indexes most consistent with each other were WHR vs WHtR and ABSI, BRI vs RI, and BMI vs BRI and RI. The highest significant correlations with the lipid profile were observed for WHtR and the lowest for ABSI.

Conclusions: Individual anthropometric indexes are not equivalent to each other. Assessment of the level of nutrition using different indicators may lead to over- or underdiagnosis of obesity among women with PCOS.



posters-tuesday: 69

Effectiveness of different macronutrient composition diets on weight loss and blood pressure. A network meta-analysis

Katerina Nikitara1, Anna-Bettina Haidich2, Meropi Kontogianni3, Vasiliki Bountziouka1

1Computer Simulation, Genomics and Data Analysis Laboratory, Department of Food Science and Nutrition, School of the Environment, University of the Aegean; 2Department of Hygiene, Social-Preventive Medicine and Medical Statistics, School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki; 3Department of Nutrition and Dietetics, School of Health Sciences and Education, Harokopio University

Background: The scientific evidence surrounding the effectiveness of macronutrient composition on weight loss and reduction of Systolic Blood Pressure (SBP) and Diastolic Blood Pressure (DBP) is conflicting. Advanced analytical methods can be used to examine the effects of different macronutrient compositions. This study explored the effectiveness of diets with different macronutrient compositions for weight loss and blood pressure through network meta-analysis (NWMA).

Methods: A systematic review was conducted by retrieving studies from five bibliographic databases (January 2013 to May 31, 2023). The study population included adults at high risk for cardiovascular diseases, while the outcomes assessed involved markers of glycemic control, obesity, dyslipidemia, and inflammation. Specifically, in the present study, the outcomes of interest were the mean difference in Body Mass Index (BMI), Waist Circumference (WC), SBP and DBP before and after the intervention. The reference diet used for BMI and WC was low-fat (<30%), moderate-carbohydrate (45-60%), and high-protein (19-40%) (LFMCHP) and for the SBP and DBP, low-fat, moderate-carbohydrate, and moderate-protein (10-18%) (LFMCMP), according to the reference diets used in the studies included for each outcome.

Results: Ten studies (n=1,008 individuals) were included in NWMA for BMI, six (n=835) for WC, and seven (n=1,103) for SBP and DBP. The random effect model was used in the NWMA. Results revealed that, compared to the reference diet (LFMCHP), only the high-fat (36-60%), low-carbohydrate (26-44%), high-protein (HFLCHP) diet demonstrated a greater reduction in BMI after the intervention, by 0.32 kg/m² (95% CI: -0.34; -0.30, I²=0%, p<0.001). Additionally, the highest ranking in terms of certainty of effectiveness was observed for the high-fat, very low-carbohydrate (<26%), very high-protein (>40%) diet (HFVLCVHP) (P-score: 0.71) compared to other interventions, followed by the HFLCHP diet (P-score: 0.63) and the high-fat, low-carbohydrate, moderate-protein diet (HFLCMP) (P-score: 0.59). Non-significant results were found for WC, SBP, and DBP.

Conclusion: This NWMA suggests that high-fat, low-carbohydrate, high-protein diets may be more effective for BMI reduction, while no significant effects were observed for blood pressure. These findings highlight the potential role of macronutrient composition in weight management but indicate the need for further research to clarify its impact on other cardiometabolic outcomes.



posters-tuesday: 70

Going from methodological research to methods guidance: the STandards for the development REseArch Methods guidance (STREAM) initiative

Malena Chiaborelli, Julian Hirt, Matthias Briel, Stefan Schandelmaier

University Hospital Basel, Switzerland

Background: Health researchers need clear and trustworthy methods guidance (e.g. tutorials on handling baseline missing data in trials; best practice regarding calibration of prediction models) to help them plan, conduct, and analyse their studies. Methodological research (based on logic, simulation, or empirical studies) can sensibly inform methods guidance. How to go from methodological research to methods guidance, however, is currently unclear. A new initiative (Standards for the Development of Research Methods Guidance, STREAM) aims to develop a structured process to connect methodological research with methods guidance.

Methods: STREAM includes a series of studies: 1) a scoping review of existing standards to develop methods guidance, 2) a meta-study to assess the current practice of methods guidance development, 3) an interview study to understand the needs of health researchers who use methods guidance, 4) a consensus study to develop standards for methods guidance development, and 5) user testing of these standards in ongoing guidance development projects.

Results: At the conference, we will present the overall initiative and results of the first two studies. The scoping review identified 6 articles addressing the development of methods guidance. Of those, 1 mentioned methodological research (specifically: empirical studies) as an input for guidance development, without specifying a process. None of the included articles mentioned simulation studies as an input. For the meta-study, we reviewed 1202 methods guidance articles, most published after 2018. Of those, 347 reported a development process: 156 (45%) performed a systematic review of the methodological literature, 93 (27%) a consensus process, 71 (20%) user-testing, 43 (12%) empirical studies, and 36 (10%) simulation studies.

Impact: The two initial studies of the STREAM initiative reveal that the literature addressing the development of methods guidance is scarce and limited and that methods guidance articles rarely report a development process. Guidance developers use varying ad hoc approaches to create guidance and rarely seek input from their users (health researchers). The findings suggest that current methods guidance could be improved to make it more helpful for health researchers and better support the production of high-quality evidence. The new standards for the development of research methods guidance will provide explicit solutions to these challenges.



posters-tuesday: 71

Effectiveness of a Skill Check Sheet for Registered Dietitians: A Cluster Randomized Controlled Trial Protocol

Misa Adachi1,2, Asuka Suzuki2, Kazue Yamaoka1,3, Mariko Watanabe4, Toshiro Tango1,5

1Nutrition Support Network LLC, Sagamihara, Japan; 2Teikyo University Graduate School of Public Health, Japan; 3Tetsuyu Clinical Research Center, Tetsuyu Institute Medical Corporation, Tokyo, Japan; 4Showa Women’s University, Tokyo, Japan; 5Center for Medical Statistics, Tokyo, Japan

Introduction:

Registered dietitians (RDs) play a critical role in promoting lifestyle improvements through evidence-based nutrition interventions. To enhance RD competencies in nutrition education, we developed a Skill Check Sheet (SCS) designed to support self-assessment and skill improvement. A preliminary single-group intervention study (3 months) suggested that SCS might effectively improve RD skills. This study aims to evaluate its effectiveness in reducing glycated hemoglobin (HbA1c) levels among patients with type 2 diabetes (T2D) by conducting a cluster randomized controlled trial (cRCT). The intervention compares a validated nutrition education program, the SILE program (Adachi et al., 2017), with an enhanced version incorporating the SCS (SILE+SCS).

Methods and Results:

This 4-month cRCT will randomly assign RDs to one of two intervention arms (SILE+SCS vs. SILE). Each RD will manage seven T2D patients aged 20–80 years. The primary outcome is the change in HbA1c from baseline. The intervention effect will be assessed using an intention-to-treat (ITT) analysis with a generalized linear mixed-effects model, adjusting for covariates.

The sample size calculation was based on previous studies and preliminary data, assuming a standardized mean difference (SMD) of 0.33, an intraclass correlation coefficient (ICC) of 0.01, a two-sided significance level of 5%, and 80% power, with seven patients per RD cluster. This resulted in a required sample of 21 RDs per group. Accounting for a 10% dropout rate, the final target is 23 RDs per group, totaling 322 patients.
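A back-of-the-envelope check of this calculation, using the usual design-effect inflation (our own approximation; the exact result depends on the formula variant used), is sketched below.

from math import ceil
from scipy.stats import norm

def clusters_per_arm(smd=0.33, icc=0.01, m=7, alpha=0.05, power=0.80):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    n_per_arm = 2 * z**2 / smd**2 * (1 + (m - 1) * icc)   # individuals, inflated by design effect
    return ceil(n_per_arm / m)                            # clusters (RDs) per arm

print(clusters_per_arm())   # ~22, close to the 21 RDs per group reported above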

Conclusion:

Preliminary findings suggest that SCS may enhance RD skills in nutrition education. This cRCT will rigorously evaluate its effectiveness, ultimately aiming to contribute to the prevention and management of lifestyle-related diseases.

Reference

Adachi M, Yamaoka K, Watanabe M, et al. Does the behavioural type-specific approach for type 2 diabetes promote changes in lifestyle? Protocol of a cluster randomised trial in Japan. BMJ Open 2017;7:e017838. doi:10.1136/bmjopen-2017-017838



posters-tuesday: 72

Bootstrap-based approaches for inference on the total deviation index in agreement studies with replicates

Anna Felip-Badia1, Josep L Carrasco2, Sara Perez-Jaume1,2

1BiMaU, Sant Joan de Déu Pediatric Cancer Center Barcelona, Spain; 2Department of Basic Clinical Practice, Universitat de Barcelona, Spain

Introduction

The total deviation index (TDI) is an unscaled statistical measure used to evaluate the deviation between paired quantitative measurements when assessing the extent of agreement between different raters. It describes a boundary such that a large specified proportion of the differences in paired measurements lie within the boundary (Lin, 2000). Inference on the TDI involves the estimation of a 100(1-α)% upper bound (UB), where α is the significance level. Several methods to estimate the TDI and the UB have been proposed (Choudhary, 2008, 2010; Escaramis, 2010). In 2015, Perez-Jaume and Carrasco (P-J&C) proposed a non-parametric method that estimates the TDI as a quantile of the absolute value of the within-subject differences between raters and uses two bootstrap strategies to estimate the UB. Our goal is to assess an alternative bootstrap approach for estimating the UB with P-J&C’s method, and to compare its performance, as well as that of the TDI estimates, with the existing methods in the literature.

Methods
We consider two non-parametric bootstrap approaches for studies with replicates: the bootstrap of the within-subject differences and an alternative approach of a cluster bootstrap at subject level. We also consider four strategies to estimate the UB: the ones based on the basic percentile and the normal distribution from P-J&C and two additional ones based on empirical quantiles and BCa confidence limits. This leads to eight different ways of UB estimation. We implement all the above-mentioned methods to estimate the TDI and the bootstrap-based approaches for inference in an R package and conduct a simulation study to compare the performance of all the methodologies considered in this work. Furthermore, we apply them to a real case dataset.
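A minimal sketch of the non-parametric TDI estimate and the subject-level cluster bootstrap UB (our own illustration, not the package implementation) is given below.

import numpy as np

def tdi(diffs, p=0.90):
    """p-th quantile of the absolute within-subject differences."""
    return np.quantile(np.abs(diffs), p)

def tdi_cluster_ub(diff_by_subject, p=0.90, alpha=0.05, B=2000, seed=1):
    """diff_by_subject: list of arrays of replicate differences per subject."""
    rng = np.random.default_rng(seed)
    n = len(diff_by_subject)
    boot = []
    for _ in range(B):
        idx = rng.integers(0, n, n)       # resample subjects with replacement
        boot.append(tdi(np.concatenate([diff_by_subject[i] for i in idx]), p))
    return np.quantile(boot, 1 - alpha)   # percentile-type upper bound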

Results
All the methods exhibit a tendency to overestimate the TDI, except for Choudhary's 2010 method, which seems to underestimate it in all combinations considered in the simulation study. The bias and the mean squared error are reduced as the sample size increases for all methods, indicating consistent asymptotic behaviour. Regarding the empirical coverages, the cluster bootstrap approach gives values closer to the nominal 95% than the bootstrap of the within-subject differences. Finally, on the real dataset with replicates, all techniques provided similar estimates, with the BCa strategy resulting in slightly higher UBs in most cases.

Conclusion
In studies with replicates, when applying bootstrapping to estimate the UB using the P-J&C estimator, the cluster bootstrap approach is recommended.



posters-tuesday: 73

Baseline treatment group adjustment in the BEST study, a longitudinal randomised controlled trial.

Robin Young1, Alex McConnachie1, Helen Minnis2

1Robertson Centre for Biostatistics, University of Glasgow, United Kingdom; 2Centre for Developmental Adversity and Resilience (CeDAR), University of Glasgow, United Kingdom

In an RCT with measurements of the outcome variable at baseline and one or more follow-up visits, a linear mixed effects regression model can be used. Due to randomisation it would be expected that there is no difference between treatment groups at baseline, so a model term for the treatment effect at baseline can be omitted. It has been shown that such a “constrained baseline analysis” has more power than including a term for the baseline treatment effect in the analysis model [1].
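The two analysis strategies can be sketched as follows on toy long-format data (our own illustration in Python/statsmodels; the trial analysis itself is not reproduced here).

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy data: 100 subjects, three visits, a follow-up-only treatment effect of 0.5.
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "id": np.repeat(np.arange(n), 3),
    "visit": np.tile(["baseline", "fu1", "fu2"], n),
    "treat": np.repeat(rng.integers(0, 2, n), 3),
})
df["y"] = np.repeat(rng.normal(0, 1, n), 3) \
          + 0.5 * df["treat"] * (df["visit"] != "baseline") \
          + rng.normal(0, 1, 3 * n)

# Unconstrained analysis: treatment terms at every visit, including baseline.
m_uncon = smf.mixedlm("y ~ C(visit) * treat", df, groups=df["id"]).fit()

# Constrained baseline analysis: treatment-by-visit terms at follow-up only.
df["trt_fu1"] = (df["treat"] * (df["visit"] == "fu1")).astype(float)
df["trt_fu2"] = (df["treat"] * (df["visit"] == "fu2")).astype(float)
m_con = smf.mixedlm("y ~ C(visit) + trt_fu1 + trt_fu2", df, groups=df["id"]).fit()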

The BEST [2] trial was an RCT assessing the impact of the New Orleans Intervention Model on children entering foster care in the UK, with measurement of outcomes at baseline and two follow-up visits. As a result of practical and legal considerations relating to the setting of the study, over the 10-year duration of the trial there were three separate schedules of recruitment: (1) consent first, followed by baseline measures and then randomisation; (2) randomisation, followed by consent and then baseline; (3) consent, followed by randomisation and then baseline. As not all participants were recruited with randomisation occurring after baseline, it could not be guaranteed prior to unblinding at the end of the study that the treatment groups were balanced at baseline for the primary outcome. The pre-defined statistical analysis plan for the study therefore included a term for the treatment effect at baseline to account for any unexpected differences.

At the conclusion of the trial, there was some degree of difference at baseline between the unblinded treatment groups for the primary outcome, so the choice to include a term for this in the primary analysis model appeared justified. Using the data from the trial in combination with simulations, we will show that there are scenarios where, due to study design or to account for high variability in outcome measures, including the baseline treatment effect may be worth considering, either as the primary model or as a sensitivity analysis to the constrained baseline analysis.

References:

[1] Coffman CJ, Edelman D, Woolson RF, To condition or not condition? Analysing ‘change’ in longitudinal randomised controlled trials. BMJ Open 2016;6:e013096. doi: 10.1136/bmjopen-2016-013096

[2] BEST [Accepted Nature medicine]



posters-tuesday: 74

The Subtle Yet Impactful Choices in the Procedure to Conduct Matching-Adjusted Indirect Comparison - Insights from Simulation

Gregory Chen1, Michael Seo2, Isaac Gravestock2

1MSD, Switzerland; 2Roche, Switzerland

Population-adjusted indirect treatment comparisons (ITCs) play a crucial role in clinical biostatistics, particularly in the health technology assessment (HTA) space. Demonstrating the comparative effectiveness of an investigational treatment against standard-of-care comparators is essential for both clinical and economic decision-making in reimbursement submissions. However, head-to-head randomized trials for payer-interested comparators are often unavailable at the time of a HTA submission, necessitating the use of indirect comparison methods.

When only aggregate data (AgD) are available for a comparator, the Matching-Adjusted Indirect Comparison (MAIC) method, originally introduced by Signorovitch, has become the go-to approach. Over time, variations and refinements have been introduced in both research and practice. This study conducts a simulation-based evaluation of the bias and relative efficiency of different MAIC estimators for the average treatment effect among treated (ATT), along with an assessment of confidence interval (CI) coverage based on asymptotic derivations, robust variance estimators, and bootstrap methods.
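The core of Signorovitch-type MAIC is a weight-estimation step that balances the IPD effect-modifier means to the aggregate-data means; a minimal standalone sketch (ours, not the {maicplus} implementation) follows.

import numpy as np
from scipy.optimize import minimize

def maic_weights(X_ipd, agd_means):
    """X_ipd: IPD effect-modifier matrix; agd_means: matching targets (AgD means)."""
    Xc = X_ipd - agd_means                      # centre IPD covariates on AgD means
    obj = lambda a: np.sum(np.exp(Xc @ a))      # convex objective
    grad = lambda a: Xc.T @ np.exp(Xc @ a)      # its gradient: weighted-mean constraints
    a_hat = minimize(obj, np.zeros(Xc.shape[1]), jac=grad, method="BFGS").x
    w = np.exp(Xc @ a_hat)                      # weights that balance the means
    ess = w.sum() ** 2 / (w ** 2).sum()         # effective sample size after weighting
    return w, ess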

The simulation utilizes the {maicplus} R package and is designed to generate insights for both binary and time-to-event endpoints. The primary focus is on unanchored ITCs, with a secondary analysis of anchored comparisons to assess the robustness of findings. The study examines performance across various scenarios, including different sample sizes, true event rates, and degrees of prognostic factor overlap. Additionally, we investigate the impact of including non-prognostic factors, omitting key confounders, and interactions between these factors. To further contextualize MAIC findings, we incorporate inverse probability of treatment weighting (IPTW) estimators, quantifying the trade-offs in performance metrics when individual patient data (IPD) for the comparator arm are unavailable.

The findings from this study will provide critical insights into the feasibility, reliability, and trade-offs of population-adjusted ITCs, offering guidance on best practices and methodological considerations in comparative effectiveness research.



posters-tuesday: 75

Utility-based design: an improved approach to jointly analyze efficacy and safety in randomized comparative trials

Patrick Djidel, Armand Chouzy, Pierre Colin

Bristol Myers Squibb, Switzerland

Introduction

In randomized clinical trials, multiple endpoints are evaluated to assess new treatments, focusing on both efficacy and safety. Traditional oncology study designs often rely on a single primary endpoint, which can overlook other important objectives. Various frameworks, such as those proposed by Murray, Kavelaars, and Park, incorporate multivariate outcomes to improve decision-making by considering the risk-benefit tradeoff. We propose a utility-based design tool, extending Murray’s approach, that accounts for the correlation between efficacy, safety and the cause of death (due to disease progression vs. fatal adverse event).

Methods

The proposed statistical framework is based on a joint probit model as follows: the clinical endpoints are considered categorical (e.g. toxicity grade and objective response rate) and a composite endpoint is derived based on combinations of both safety and efficacy categories and numerical utilities. The utility matrix is obtained via a consensus among clinical trial physicians. Then, to evaluate the treatment effect, we calculate the mean joint probabilities via a joint probit model and combine them with the utility matrix. To support decision-making, a formal test is derived to analyze the improvement of the utility score due to the treatment effect.
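As a toy illustration of the final reduction step (all numbers made up, not from any study), the mean utility per arm is simply the utility matrix weighted by the estimated joint probabilities:

import numpy as np

# Rows: efficacy category (response / stable disease / progression);
# columns: toxicity category (none-mild / moderate / severe).
U = np.array([[100, 70, 30],
              [ 60, 40, 10],
              [ 20, 10,  0]])
P_trt = np.array([[0.25, 0.15, 0.05],
                  [0.20, 0.15, 0.05],
                  [0.08, 0.05, 0.02]])   # estimated joint probabilities, treatment arm
P_ctl = np.array([[0.15, 0.10, 0.02],
                  [0.20, 0.15, 0.05],
                  [0.18, 0.10, 0.05]])   # estimated joint probabilities, control arm

mean_utility = lambda P: float((U * P).sum())
print(mean_utility(P_trt) - mean_utility(P_ctl))   # utility gain attributed to treatment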


Results

We provide a statistical tool to efficiently compare treatment arms from randomized trials and evaluate the efficacy/safety trade-off. A statistical test and a target sample size calculation tool have been developed to properly compare treatment arms for decision making, while controlling Type I and Type II error rates. Some examples of treatment arm comparisons are available using data from oncology studies.

Conclusion

We propose a practical approach to consider the efficacy-safety tradeoff and efficiently compare treatments based on categorical outcomes. The joint probit model considers the correlation between efficacy and toxicity to support multivariate decision-making and efficiently determines whether a treatment is clinically superior to another, by reducing the multidimensional outcome to a single mean utility score. In addition, the benefit-risk ratio is often considered to compare multiple dose levels, looking for the optimal dose. The proposed utility score is useful in summarizing the benefit-risk ratio in early drug development. The statistical test we propose can also be used for dose optimization or seamless designs and combined with commonly used study designs, such as Group Sequential Design.



posters-tuesday: 76

Hierarchical Composite Endpoints and win ratio methods in cardiovascular trials: a systematic review and consequent guidance

Ruth Owen1,2,3, John Gregson1, Dylan Taylor2,3, David Cohen4,5, Stuart Pocock1

1London School of Hygiene and Tropical Medicine, United Kingdom; 2Centro Nacional de Investigaciones Cardiovasculares, Spain; 3Oxon Epidemiology, Spain; 4Cardiovascular Research Foundation, NY USA; 5St. Francis Hospital, NY USA

Introduction

The value of hierarchical composite endpoints (and their analysis using the win ratio) is being increasingly recognised, especially in cardiology trials. Their reporting in journal publications has not been previously explored.

Methods

A search of 14 general medical and cardiology journals was performed using 13 search terms, including “hierarchical composite”, “win ratio”, and “Finkelstein-Schoenfeld”, covering 01/Jan/2022 to 31/Jan/2024. We identified 61 articles (from 36 unique trials) that included analyses using the win ratio. Where a trial had multiple such articles, we selected the principal (or first) one. A standardized proforma was completed by two reviewers (DT+RO), with any inconsistencies resolved by consensus.

Results

Of the 36 trials identified, 10 were in NEJM, 20 were primary publications, and 10 had win ratio as the primary analysis. Most (N=26) were drug trials, but trials of device/surgery (N=7) and treatment strategies (N=3) also occurred. The most common conditions were heart failure (N=15) and ischemic heart disease (N=5).

The choice of hierarchical components varied: nearly all trials (N=32) had mortality as the first comparison, 30 of which had non-fatal events next. The number of non-fatal event components ranged from 0 (4 trials) to 6 (2 trials). In 27 trials, at least one component was a quantitative outcome, most commonly a quality-of-life score, of which 12 defined a minimal margin to claim a win/loss. Hierarchies ranged from 1 to 9 components, with 3 (N=11) and 4 (N=6) components being most common.

Trials usually reported the unmatched win ratio, its 95% CI and Finkelstein-Schoenfeld p-value, with results commonly presented using flowcharts (N=10) or bar charts (N=12). Win odds (4 trials) and win difference (3 trials) were occasionally reported. Stratified (9 trials) and covariate-adjusted analyses (1 trial) were not common. Of the 28 trials that reported the percentage of tied comparisons, 8 had <10% ties whilst 5 had >70% ties.

Specific examples will be presented to illustrate the diversity of good (and sometimes bad) practice in the use and reporting of the win ratio. We conclude with a set of recommendations for future use.

Discussion

This systematic review is the first to document the diversity of uses of hierarchical composite endpoints and win ratio analyses in journal publications. This portfolio of mostly appropriate applications in cardiovascular trials suggests that hierarchical composite outcomes could be relevant in other diseases where treatment response cannot be captured by a single endpoint.



posters-tuesday: 77

Power calculation using the win-ratio for composite outcomes in randomized trials

David Kronthaler1, Felix Beuschlein2, Sven Gruber2, Matthias Schwenkglenks3, Ulrike Held1

1Epidemiology, Biostatistics and Prevention Institute, Department of Biostatistics, University of Zurich, Switzerland; 2Department of Endocrinology, Diabetology and Clinical Nutrition, University Hospital Zurich, University of Zurich, Switzerland; 3Health Economics Facility, Department of Public Health, University of Basel, Switzerland

Background: The use of composite outcomes is common in clinical research. These can include, for example, death from any cause and any untoward hospitalization, and corresponding effect measures would be the risk ratio or the hazard ratio, typically addressing the time to first occurrence of either event. In these situations, the hierarchy of the outcomes is ignored, and combining different outcome distributions is difficult.

Methods: We used the win-ratio approach (Pocock et al. 2024) for the design and sample size calculation of a randomized controlled trial in patients suspected of primary aldosteronism. The win-ratio assumes N_T and N_C patients in the treatment and control group, resulting in N_T × N_C pairwise comparisons of patients across the two groups. The win-ratio is then calculated as R_W = N_W / N_L, with N_W and N_L being the counts of wins and losses of patients in the treatment group.

The trial has a composite outcome with the following hierarchy:

I Elevated blood pressure (binary, according to WHO definition) and

II Defined daily dose (DDD) of blood pressure medication.

To assess for each comparison whether the patient in the treatment group is the winner or the loser, the hierarchy I outcomes are compared first and, in the event of a tie, the hierarchy II outcomes. As a reference, the power of the trial was compared to a standard sample size calculation for a binary and a continuous outcome with the same specifications.
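A simulation-based power assessment of this kind can be sketched as follows (our own illustration with made-up event rates and DDD distributions; power would be estimated by repeating the simulation and checking how often the confidence interval for the win-ratio excludes 1).

import numpy as np

rng = np.random.default_rng(1)
n_t = n_c = 300
# Hierarchy I: elevated blood pressure indicator (1 = elevated, lower is better).
elev_t, elev_c = rng.binomial(1, 0.35, n_t), rng.binomial(1, 0.45, n_c)
# Hierarchy II: defined daily dose of BP medication (lower is better).
ddd_t, ddd_c = rng.gamma(2.0, 1.0, n_t), rng.gamma(2.2, 1.0, n_c)

wins = losses = 0
for i in range(n_t):
    for j in range(n_c):
        if elev_t[i] != elev_c[j]:            # hierarchy I decides
            wins += elev_t[i] < elev_c[j]
            losses += elev_t[i] > elev_c[j]
        elif ddd_t[i] != ddd_c[j]:            # tie on I -> hierarchy II decides
            wins += ddd_t[i] < ddd_c[j]
            losses += ddd_t[i] > ddd_c[j]
win_ratio = wins / losses                     # one simulated trial's R_W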

Results: The power of the trial was assessed with 1000 simulation runs with N_T = N_C = 300 patients, assuming 15% drop-out. Our simulation showed that the resulting power of the trial was 85% and the estimated win-ratio R_W was 1.3. Under identical assumptions, a standard power calculation would have resulted in 30% power for the hierarchy I outcome and 73% power for the hierarchy II outcome.

Conclusion: While the win-ratio has been employed in secondary analyses of randomized trials, it has rarely been used at study design level. Sample size calculation using the win-ratio as effect measure is efficient from a methodological perspective, and it captures well the complexities of using potentially censored composite outcomes with a hierarchy in clinical research.

References

Pocock, Stuart J, John Gregson, Timothy J Collier, Joao Pedro Ferreira, and Gregg W Stone. 2024. “The Win Ratio in Cardiology Trials: Lessons Learnt, New Developments, and Wise Future Use.” European Heart Journal 45 (44): 4684–99. https://doi.org/10.1093/eurheartj/ehae647.



posters-tuesday: 78

Feasibility of propensity score weighted analysis in rare disease trials: a simulation study

Alexander Przybylski1, Francesco Ambrosetti2, Lisa Hampson2, Nicolas Ballarini2

1Novartis, UK; 2Novartis, Switzerland

Introduction

Clinical trials in rare diseases often face challenges due to small sample sizes and single-arm non-randomized designs, which increase the risk of confounding bias. Propensity scoring (PS) methods are commonly applied to mitigate such biases. However, in small samples, the ability to fit adequate PS models that reduce covariate imbalance has not been widely studied. In the context of an anticipated large treatment effect where the response probability on control is very low, the statistical challenges of using PS weighting for treatment effect estimation are further complicated. Our aim was to evaluate the feasibility and performance of PS methods under these specific conditions.

Methods

A simulation study was conducted to assess the impact of covariate imbalance, sample size, and treatment effect size on the feasibility and performance of several estimators and intervals for the average treatment effect in the treated (ATT; expressed as a difference in marginal risks). The focus was on two key baseline covariates; large treatment effects informed by prior knowledge; and a small sample size of 15 subjects per arm. Weighted and unweighted ratio estimators, a hybrid approach incorporating PS model convergence and covariate imbalance criteria, and standardization-based estimators were evaluated according to estimator convergence rate, the probability of proceeding with an indirect comparison based on measures of imbalance (standardized mean difference; SMD), and bias. Coverage probabilities of intervals were also calculated.
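For reference, the ATT weighting and post-weighting SMD check used as a gatekeeper can be sketched as follows (our own minimal illustration; the simulation study itself involved additional estimators and criteria).

import numpy as np
from sklearn.linear_model import LogisticRegression

def att_weights_and_smd(X, treated):
    """X: covariate matrix; treated: 0/1 numpy array. Returns ATT weights and SMDs."""
    ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]
    w = np.where(treated == 1, 1.0, ps / (1 - ps))        # ATT weights
    smd = []
    for j in range(X.shape[1]):
        m1 = np.average(X[treated == 1, j], weights=w[treated == 1])
        m0 = np.average(X[treated == 0, j], weights=w[treated == 0])
        s = np.sqrt((X[treated == 1, j].var(ddof=1) +
                     X[treated == 0, j].var(ddof=1)) / 2)
        smd.append(abs(m1 - m0) / s)
    return w, np.array(smd)    # proceed with the comparison if max(smd) < 0.1 (or 0.25)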

Results

Standardization-based estimators were unreliable due to low sample size and complete separation issues. Propensity score models could be estimated and were able to reduce imbalance even with small sample sizes and high imbalance. Setting a 0.1 SMD threshold for adequate covariate balance, 25% of simulation runs met the criteria for performing the indirect comparison analysis. The use of a less conservative 0.25 threshold for SMD increased this probability to 50% while maintaining acceptable bias and coverage probability. Conditional on observing at least one response in the control arm, average conditional bias was marginally improved via propensity score weighting.

Conclusions

Propensity score weighting methods can address confounding biases in non-randomized studies, even with small sample sizes and large treatment effects. However, in our setting, the most suitable approach involved using a hybrid method that combines pre-specified criteria for performing the indirect comparison.



posters-tuesday: 79

A basket trial for rare diseases, with a crossover design for its substudies: a simulation study

Elena G Lara, Steven Teerenstra, Kit C.B. Roes, Joanna IntHout

Radboud University Medical Center, The Netherlands

Background. Recent advancements in precision medicine generate therapy options for rare diseases. Assessing a new treatment targeted to a rare disease subgroup can make recruiting the required sample size even more challenging. Current work recommends grouping rare diseases in a basket trial, where one drug is evaluated in multiple diseases based on a shared etiology (e.g. a gene mutation). This allows more patients to be included and information to be borrowed between substudies. A further recommendation to improve efficiency in trials involving chronic and stable conditions is the use of crossover designs. Our research focuses on basket trials with a crossover design for the substudies. These may increase the precision of the estimated treatment effect, both by borrowing information across substudies and through the more efficient substudy design.

Methods. In this study, we evaluated the operating characteristics of basket trials in which each substudy corresponds to a crossover design via Monte Carlo simulation. We generated realistic scenarios related to the SIMPATHIC project, under parallel and crossover designs, and with different numbers of substudies (from 2 to 9). We applied estimation methods including random-effects meta-analysis, Bayesian hierarchical modelling (BHA), EXNEX, adaptive lasso, stratified analysis and naïve pooling, and studied the bias, precision, power and false positive rate of the substudy estimates as well as the trial overall estimates.
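As an example of the simpler end of the borrowing spectrum, a random-effects meta-analysis across substudy estimates can be computed with the DerSimonian-Laird estimator; a minimal sketch (ours, for illustration only) is given below.

import numpy as np

def dersimonian_laird(theta, se):
    """theta, se: substudy treatment-effect estimates and their standard errors."""
    theta, se = np.asarray(theta, float), np.asarray(se, float)
    w = 1 / se**2
    theta_fixed = np.sum(w * theta) / np.sum(w)
    q = np.sum(w * (theta - theta_fixed) ** 2)            # Cochran's Q
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (len(theta) - 1)) / c)           # between-substudy variance
    w_star = 1 / (se**2 + tau2)
    overall = np.sum(w_star * theta) / np.sum(w_star)
    return overall, np.sqrt(1 / np.sum(w_star)), tau2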

Results. The efficiency gains of crossover designs in conventional trials are also present in basket trials. Methods that use information borrowing improve estimation of substudy treatment effects in terms of increased precision. This increase in precision is lower in substudies with a crossover design compared to the parallel-group design with the same number of patients; borrowing in this setting also results in lower shrinkage. Among the borrowing methods evaluated, EXNEX seems the most able to discriminate between substudies with a true effect and those with a small or null effect. Meta-analysis, BHA and naïve pooling achieve the highest power for the overall estimate, although this power is low when the treatment had a true effect in less than half of the substudies.

Conclusion. The incorporation of crossover designs into basket trial substudies - when assumptions are met - results in a more efficient design and practicable sample sizes compared to parallel-group designs. In addition, adding randomization and a control arm per substudy provides more valid inference than a single-arm design. Altogether, this design can facilitate drug development for rare diseases.

Project funded by Horizon Europe (Grant no. 101080249).



posters-tuesday: 80

Comparing randomized trial designs in rare diseases with longitudinal models: a simulation study showcased by Autosomal Recessive Cerebellar Ataxias

Niels Hendrickx1, France Mentré1, Alzahra Hamdan2, Mats Karlsson2, Andrew Hooker2, Andreas Traschütz3,4, Cynthia Gagnon5, Rebecca Schüle6, ARCA Study group7, EVIDENCE-RND Consortium7, Matthis Synofzik3,4, Emmanuelle Comets1,8

1Université Paris Cité, IAME, Inserm, F-75018, Paris, France; 2Pharmacometrics Research Group, Department of Pharmacy, Uppsala University, Uppsala, Sweden; 3Division Translational Genomics of Neurodegenerative Diseases, Hertie Institute for Clinical Brain Research (HIH), University of Tübingen, Tübingen, Germany; 4German Center for Neurodegenerative Diseases (DZNE), Tübingen, Germany.; 5Centre de Recherche du CHUS Et du Centre de Santé Et Des Services Sociaux du Saguenay-Lac-St-Jean, Faculté de Médecine, Université de Sherbrooke, Québec, Canada.; 6Hertie-Center for Neurology, University of Tübingen, Tübingen, Germany; 7Group author; 8Univ Rennes, Inserm, EHESP, Irset - UMR_S 1085, 35000, Rennes, France.

Background:

Parallel designs with an end-of-treatment analysis are commonly used for randomised trials (1), but they remain challenging to conduct in rare diseases due to small sample size and heterogeneity. A more powerful alternative could be to use model-based approaches (2,3). We investigated the performance of longitudinal modelling to evaluate disease-modifying treatments in rare diseases using simulations. Our setting was based on a model describing the progression of the standard clinician-reported outcome SARA score in patients with ARCA (Autosomal Recessive Cerebellar Ataxia), a group of ultra-rare, genetically defined, neurodegenerative diseases (4).

Methods:

We performed a simulation study to evaluate the influence of trial settings on the ability to detect a treatment effect slowing disease progression, using a previously published non-linear mixed effect logistic model (5). We compared the power of parallel, crossover and delayed start designs (6,7) across several trial settings: trial duration (2 or 5 years); disease progression rate (slower or faster); magnitude of residual error (σ=2 or σ=0.5); number of patients (100 or 40); and method of statistical analysis (longitudinal analysis with non-linear or linear models; standard statistical analysis). We investigated the influence of these settings on the type 1 error and corrected power of the randomised trials.

Results:

In all settings, using non-linear mixed effect models resulted in controlled type 1 error and higher power (88% for a parallel design) than a rich (75% for a parallel design) or sparse (49% for a parallel design) linear mixed effect model or standard statistical analysis (36% for a parallel design). Parallel and delayed start designs performed better than crossover designs. With slow disease progression and high residual error, longer durations are needed for power to be greater than 80%, 5 years for slower progression and 2 years for faster progression ataxias.

Conclusion:

In our settings, using non-linear mixed effect modelling allowed all three designs to have more power than a standard end-of-treatment analysis. Our analysis also showed that delayed start designs are promising as, in this context, they are as powerful as parallel designs, but with the advantage that all patients are treated within this design.

References:

(1) E9 Statistical Principles for Clinical Trials, https://www.fda.gov/regulatory-information/search-fda-guidance-documents/e9-statistical-principles-clinical-trials, 2020

(2) Synofzik et al. Neuron 2019

(3) Buatois et al; Statistics in Medicine 2021

(4) Karlsson et al. CPT Pharmacometrics Syst Pharmacol 2013

(5) Hamdan et al. CPT 2024

(6) Liu-Seifert et al. PLoS ONE 2015

(7) Wang et al. Pharmaceutical Statistics 2019



posters-tuesday: 81

Sequential decision making in basket trials leveraging external-trial data: with applications to rare-disease trials

Giulia Risca1, Stefania Galimberti1, Maria Grazia Valsecchi1, Haiyan Zheng2

1Bicocca Bioinformatics Biostatistics and Bioimaging B4 Center, Department of Medicine and Surgery, University of Milan-Bicocca, Monza, Italy; 2Department of Mathematical Sciences, University of Bath, Bath, UK

Introduction: Rare diseases present unique challenges in the design of clinical trials due to a small pool of eligible patients. Planning rare-disease studies within a basket trial, which can simultaneously evaluate a new treatment in patients with a shared disease trait, is practical because it allows strength to be borrowed from relevant patient subgroups. Motivated by a real rare-disease trial under planning, we develop a Bayesian sequential design that allows incorporation of both external-trial and within-trial data for basket trials involving rare diseases.

Methods: We consider two subgroups of patients that receive the same treatment before deciding whether a third one would be treated in the basket trial. The EXNEX method [1] is extended to include a prior mixture component formed using external-trial data. That is, the treatment effects in these three subgroups are assumed to be exchangeable, or non-exchangeable but consistent with the external-trial data, or completely distinct. On completion of the first two subgroups, our Bayesian meta-analytic-predictive model is used to obtain the predictive probability (PP) of an efficacious treatment in the third subgroup. Interim futility assessment is guided by a power spending function.
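The sketch below is a drastic simplification (a single Beta-binomial pooling of the first two subgroups with external pseudo-data, not the authors' EXNEX-based mixture model); it only illustrates the kind of posterior/predictive probability that drives the decision for the third subgroup. All numbers are illustrative.

from scipy.stats import beta

a0, b0 = 4, 8            # Beta prior built from external-trial pseudo-data (illustrative)
resp, n = 11, 24         # pooled responders / patients in the first two subgroups (illustrative)
p_target = 0.30          # clinically relevant response rate

post = beta(a0 + resp, b0 + n - resp)      # posterior under full pooling
pp_go = 1 - post.cdf(p_target)             # prob. the rate in an exchangeable third subgroup exceeds the target
print(round(pp_go, 2))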

Results: We assess the performance of this design through simulations; the results are sensitive to the choice of certain parameters (e.g., prior mixture weights, cut-offs for the interim and final analyses). Specifically, the PPs at the first interim are highly dependent on the allocation of the mixture weights. Pessimistic scenarios show large variability in the PPs depending on whether the exchangeability or the prior-data consistency assumption is violated. However, the design is generally robust when there is strong belief in a highly effective treatment, and all models accurately estimate the true treatment effect in each subgroup in terms of bias and mean squared error. Finally, the marginal type I error is always well controlled.

Conclusions: In conclusion, our method allows mid-course adaptation and ethical decision-making. It is novel and addresses critical gaps in rare-disease research. The principles are generalizable to other contexts.

References:

1. Neuenschwander, B., Wandel, S., Roychoudhury, S. & Bailey, S. (2016) Robust exchangeability designs for early phase clinical trials with multiple strata. Pharmaceutical Statistics, 15, 123–134. Available from: https://doi.org/10.1002/pst.1730



posters-tuesday: 82

Adaptive Designs and Bayesian Approaches: The Future of Clinical Trials

Anjali Yadav

JSS Medical Research Asia Pacific Pvt. Ltd., India

Background / Introduction

Traditional clinical trial designs rely on fixed protocols that do not allow for modifications once the study is initiated. This rigidity can lead to inefficiencies, ethical concerns, and prolonged development timelines. Adaptive designs provide a flexible framework that permits pre-specified modifications based on interim analyses, improving resource allocation and patient outcomes. Meanwhile, Bayesian approaches leverage prior knowledge and continuously update probabilities, offering a more dynamic and intuitive method for decision-making. The integration of these methodologies has the potential to revolutionize clinical trial efficiency, particularly in the era of precision medicine and rare disease research.

Methods

This study reviews key adaptive design strategies, including group sequential, response-adaptive, and platform trials, highlighting their statistical foundations and regulatory considerations. Bayesian methodologies, such as Bayesian hierarchical modeling and predictive probability monitoring, are explored in the context of trial adaptation and decision-making. Case studies from oncology, vaccine development, and rare disease trials are examined to illustrate the real-world application and advantages of these approaches.

Results

Adaptive designs have demonstrated significant reductions in trial duration and costs while maintaining scientific integrity. Bayesian methods have enhanced decision-making by incorporating historical data and real-time learning, leading to more efficient dose-finding, early stopping for efficacy or futility, and improved patient allocation. Regulatory agencies, including the FDA and EMA, have increasingly supported these innovative methodologies, providing frameworks for their implementation. Case studies highlight improved success rates, patient safety, and ethical advantages compared to traditional approaches.

Conclusion

The adoption of adaptive designs and Bayesian approaches is transforming clinical research by making trials more efficient, ethical, and informative. While challenges remain, including regulatory acceptance, operational complexity, and computational demands, ongoing advancements in statistical methods and trial simulations continue to enhance their feasibility. The future of clinical trials lies in the strategic integration of these methodologies, fostering a more flexible and patient-centric approach to drug development.