Conference Agenda

Overview and details of the sessions of this conference. Please select a date or location to show only sessions held on that day or at that location. Please select a single session for a detailed view (with abstracts and downloads, if available).

 
 
Session Overview
Session
Poster Exhibition: W / Wednesday posters
Time:
Wednesday, 27/Aug/2025:
2:00pm - 3:30pm

Location: Poster Location


Presentations
posters-wednesday: 1

Exploring the Exposome correlated with Body Mass Index in Adolescents: Findings from the 2014-2015 and 2022-2023 KNHANES

Hye Ah Lee1, Hyesook Park2

1Clinical Trial Center, Ewha Womans University Mokdong Hospital, Seoul, Republic of Korea; 2Department of Preventive Medicine, Graduate Program in System Health Science and Engineering, Ewha Womans University, Seoul, Republic of Korea

Background: To identify multifaceted features correlated with body mass index (BMI) in adolescents, we conducted an exposome-wide association study (ExWAS) using data from the Korea National Health and Nutrition Examination Survey (KNHANES), a nationally representative survey.

Methods: To obtain robust findings, we constructed a multi-year dataset covering two study periods (2014-2015 and 2022-2023). Adolescents aged 12 to 18 years with complete BMI data were included, while those dieting for weight loss or health conditions were excluded. This resulted in 941 participants from the 2014–2015 dataset and 637 from the 2022–2023 dataset. Approximately 130 features derived from questionnaires, health examinations, and dietary surveys were analyzed. Standardized BMI (zBMI) was used as the outcome, and ordinal or numeric features were standardized by sex and age using mean and standard deviation. ExWAS was performed through survey-design-based linear regression, adjusting for sociodemographic features. Additionally, pairwise relationships between features were assessed using a mixed graphical model (MGM) network.
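As a rough illustration of one step of such an ExWAS, the sketch below fits a survey-design-based linear regression for a single exposome feature with the R 'survey' package; the data frame name (knhanes) and all variable names (psu, strata_id, wt, zBMI, exposure, sex, age, income) are hypothetical placeholders, not the authors' actual variables.

```r
# Minimal sketch of one ExWAS regression under a complex survey design.
# 'knhanes' and all column names are hypothetical placeholders.
library(survey)

des <- svydesign(ids = ~psu, strata = ~strata_id, weights = ~wt,
                 nest = TRUE, data = knhanes)

fit <- svyglm(zBMI ~ exposure + sex + age + income, design = des)
summary(fit)$coefficients["exposure", ]   # estimate, SE, t value, p value

# In a full ExWAS this is repeated over all ~130 features, and the resulting
# p-values are adjusted, e.g. with p.adjust(p, method = "BH").
```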

Results: In the 2022–2023 dataset, 20.2% of boys and 15.0% of girls were classified as obese. Of the approximately 130 exposome features, 13 in boys and 9 in girls were identified as correlated with BMI. Boys who perceived themselves as unhealthy or considered their body shape as fat also had higher BMI. zBMI was positively correlated with alanine aminotransferase (ALT), white blood cell (WBC), platelets, systolic blood pressure (SBP), total cholesterol, and triglyceride (TG) and negatively correlated with high density lipoprotein cholesterol (HDL-C). These trends were also observed in the 2014–2015 dataset. Among girls, zBMI was positively correlated with ALT, WBC, SBP, and TG and negatively correlated with HDL-C. Girls who perceived their body shape as fat had higher BMI, consistent with findings from the 2014–2015 dataset. Notably, in the 2022–2023 dataset, girls who reported suicidal thoughts had higher BMI. In the MGM network analysis, ALT, WBC, and HDL-C were directly correlated with zBMI across all datasets, regardless of sex.

Conclusion: In adolescents, metabolic indices showed a clear correlation with BMI; beyond the commonly considered metabolic indices, ALT and WBC were also directly correlated with zBMI. Furthermore, subjective body shape perception, as assessed through questionnaires, was significantly correlated with BMI.



posters-wednesday: 2

Flexible statistical modeling of undernutrition among under-five children in India

Shambhavi Mishra

University of Lucknow, India

Background: Childhood undernutrition has an irreversible impact on the physical as well as mental development of the child. Nutrition-related factors were responsible for about 35% of child deaths and 11% of the total global disease burden. This health condition continues to be a major public health issue across the globe.

Methods: Three standard indices based on anthropometric measurements (weight and height) describe the nutritional status of children: height-for-age (stunting), weight-for-age (underweight) and weight-for-height (wasting). Z-scores were computed from the appropriate anthropometric indicators (weight and height) relative to the WHO international reference population for the child's age. This paper utilises unit-level data on under-five children in India from NFHS-5 (2019-2021) to identify factors that exert a differential impact on the conditional distribution of the outcome variable. A class of models that allows flexible functional dependence of an outcome on covariates through nonparametric regression was applied to determine possible factors causing undernutrition. The study also fits a Bayesian additive quantile regression model to provide a complete picture of the relationship between the outcome and the predictor variables at different quantiles of the response distribution. Different quantile regression models were fitted and compared using the Deviance Information Criterion (DIC) to determine the best model among them.
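As a frequentist illustration of quantile-specific covariate effects with smooth terms (not the Bayesian additive quantile regression model fitted in the paper), the sketch below uses the R 'quantreg' package; the data frame (nfhs) and variables (haz, child_age, mother_bmi, mother_edu) are hypothetical.

```r
# Illustrative quantile regression with smooth terms at several lower quantiles.
# 'nfhs' and all column names are hypothetical placeholders.
library(quantreg)

taus <- c(0.05, 0.10, 0.25, 0.50)          # lower quantiles of height-for-age z-score
fits <- lapply(taus, function(tau)
  rqss(haz ~ qss(child_age) + qss(mother_bmi) + mother_edu,
       tau = tau, data = nfhs))

lapply(fits, summary)                      # compare covariate effects across quantiles
```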

Results: Maternal characteristics such as nutrition and education showed a significant impact on the child's nutritional status, consistent with the findings of other studies. Child's age and mother's nutrition were among the continuous factors exerting a non-linear effect on stunting, with mother's BMI showing the largest effect size at the lower end of the distribution. The largest number of covariates was found significant for severe undernutrition, indicating a differential effect of predictors across the conditional distribution of the outcome variables.

Conclusions: Although widely applicable, the logistic regression model gives the researcher only a preliminary picture of the determinants of undernutrition. For variables such as the nutritional status of children, where the lower quantiles are of main interest, the focus should be on how factors affect the entire conditional distribution of the outcome variable rather than summarizing the distribution at its mean. This can be achieved through quantile regression modelling. A further extension enables non-parametric estimation of the linear or potentially non-linear effects of continuous covariates on the outcome using penalized splines.



posters-wednesday: 3

Comparison of deep learning models with different architectures and training populations for ECG age estimation: Accuracy, agreement, and CVD prediction

Arya Panthalanickal Vijayakumar1, Tom Wilsgaard1, Henrik Schirmer2,3, Ernest Diez Benavente4, René van Es5, Rutger R. van de Leur5, Haakon Lindekleiv6, Zachi I. Attia7, Francisco Lopez-Jimenez7, David A. Leon8, Olena Iakunchykova9

1Department of Community Medicine, UiT The Arctic University of Norway, Norway; 2Akershus University Hospital, Lørenskog, Norway; 3Institute of Clinical Medicine, Campus Ahus, University of Oslo, Norway; 4Department of Experimental Cardiology University Medical Center Utrecht, The Netherlands; 5Department of Cardiology University Medical Center Utrecht, The Netherlands; 6Department of Radiology, University Hospital of North Norway; 7Mayo Clinic College of Medicine, Rochester, MN, USA; 8Department of Noncommunicable Disease Epidemiology, London School of Hygiene & Tropical Medicine, London, United Kingdom; 9Department of Psychology, University of Oslo, Norway

Background: Several convolutional neural networks (CNNs) have been developed to estimate biological age based on 12-lead electrocardiograms (ECG) - ECG age. This new biomarker of cardiac health can be used as a predictor of cardiovascular disease (CVD) and mortality. Before implementation into clinical practice, it is crucial to compare the proposed CNN models used to estimate ECG age and to assess their accuracy, agreement, and predictive abilities in an external sample.

Methods: We used 7,108 participants from the Tromsø Study (2015-16) to compare ECG ages estimated with three different previously proposed CNNs. The CNNs differed in model architecture and/or the population on which they were trained and tested. We calculated the mean absolute error (MAE) for each CNN. Agreement was assessed using Pearson and intraclass correlation coefficients (ICC), and Bland-Altman (BA) plots. The predictive abilities of each ECG age or δ-age (difference between ECG age and chronological age) were assessed by the concordance index (C-index) and hazard ratios (HRs) from Cox proportional hazards models for myocardial infarction (MI), stroke, CVD mortality, and all-cause mortality, with and without adjustment for traditional risk factors.
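The sketch below illustrates, with hypothetical column names (age, ecg_age1, ecg_age2, sex, time_mi, event_mi in a data frame d), how the accuracy, agreement, and Cox-based prediction metrics described above can be computed in R; it is a simplified outline, not the study code.

```r
# Accuracy, agreement, and predictive ability of ECG age (hypothetical data 'd').
library(survival)

mae1 <- mean(abs(d$ecg_age1 - d$age))                 # MAE of one CNN
r12  <- cor(d$ecg_age1, d$ecg_age2)                   # Pearson agreement between CNNs

diff12 <- d$ecg_age1 - d$ecg_age2                     # Bland-Altman limits of agreement
loa    <- mean(diff12) + c(-1.96, 1.96) * sd(diff12)

d$delta1 <- d$ecg_age1 - d$age                        # delta-age
fit <- coxph(Surv(time_mi, event_mi) ~ delta1 + age + sex, data = d)
exp(coef(fit)["delta1"])                              # HR for delta-age (scale as analysed)
summary(fit)$concordance                              # C-index and its SE
```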

Results: All three CNNs had fairly close MAEs (6.82, 7.82, and 6.42 years) and similar Pearson correlation coefficients with chronological age (0.72, 0.71, and 0.73, respectively). Visual agreement using BA plots was good, and the ICC indicated good agreement (0.86; 95% CI: 0.86, 0.87). The multivariable adjusted HRs for MI and total mortality were strongest for δ-age1 (HR 1.36 (1.11, 1.67) and 1.27 (1.08, 1.50), respectively), while HRs for stroke and CVD mortality were strongest for δ-age2 (HR 1.45 (1.17, 1.80) and 1.48 (1.07, 2.05), respectively). The 6-year survival probability predictions showed excellent agreement among all δ-ages for all outcomes in terms of both BA plots and ICC. The C-index values showed no significant difference between pairwise combinations of models with ECG age1, ECG age2, or ECG age3 for all outcomes.

Conclusion: The three CNNs produced ECG age estimates with comparable accuracy, good agreement, and similar predictive ability. We did not find one CNN for ECG age to be superior to another for prediction of CVD outcomes or death in the Tromsø Study.



posters-wednesday: 4

Systematic review and real life-oriented evaluation on methods for feature selection in longitudinal biomedical data

Alexander Gieswinkel1,2,3, Gregor Buch1,3, Gökhan Gül1,4, Vincent ten Cate1,3,4, Lisa Hartung2, Philipp S. Wild1,3,4,5

1Preventive Cardiology and Preventive Medicine, Department of Cardiology, University Medical Center of the Johannes Gutenberg University Mainz, 55131 Mainz, Germany; 2Institute of Mathematics, Johannes Gutenberg University Mainz, 55128 Mainz, Germany; 3German Center for Cardiovascular Research (DZHK), partner site Rhine Main, 55131 Mainz, Germany; 4Clinical Epidemiology and Systems Medicine, Center for Thrombosis and Hemostasis, University Medical Center of the Johannes Gutenberg University Mainz, 55131 Mainz, Germany; 5Institute of Molecular Biology (IMB), 55131 Mainz, Germany

Background

High-dimensional omics data are increasingly available in longitudinal cohort studies as biochemical technology improves. Supervised feature selection based on biomedical data from multiple time points is often required. However, an overview of existing methods for this setting is lacking, which motivated a systematic review and evaluation of this area.

Methods

A systematic search of statistical software was conducted to identify relevant methods. The Comprehensive R Archive Network (CRAN) was examined via the R package ‘packagefinder’ with a search query containing relevant keywords. Eligible software was characterised by manually screening the package descriptions, and through computational testing with a fixed application example. An ADEMP-designed simulation study was conducted to evaluate the identified methods in real-world scenarios, considering varying sample sizes, numbers of predictors, time points, and signal-to-noise ratios. Only frequentist implementations with given default settings were included for a fair comparison. The estimated true positive rate (eTPR) and estimated false discovery rate (eFDR) were chosen as performance measures.
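The two performance measures can be written as simple functions of the selected and truly informative predictor sets; the sketch below is a minimal base-R version with made-up index sets for illustration only.

```r
# Estimated true positive rate and estimated false discovery rate of a selection.
eTPR <- function(selected, truth) length(intersect(selected, truth)) / length(truth)
eFDR <- function(selected, truth) {
  if (length(selected) == 0) return(0)
  length(setdiff(selected, truth)) / length(selected)
}

truth    <- 1:10                       # indices of truly informative predictors
selected <- c(1:8, 25, 31, 44, 57)     # indices selected by some method
c(eTPR = eTPR(selected, truth), eFDR = eFDR(selected, truth))   # 0.80 and 0.33
```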

Results

Of 21,528 accessible packages on CRAN, 324 packages with matching keywords in the descriptions were extracted by the search query. Screening of the descriptions identified 45 packages that were then tested in R, leading to 14 packages. Six packages were based on mixed effects models (‘buildmer’, ‘rpql’, ‘splmm’, ‘alqrfe’, ‘plsmmLasso’, ‘glmmLasso’), five on generalized estimating equations (‘sgee’, ‘LassoGEE’, ‘geeVerse’, ‘PGEE’, ‘pgee.mixed’), two were built on Bayesian frameworks (‘sparsereg’, ‘spikeSlabGAM’) and one package modelled time series (‘midasml’). All implementations were able to process continuous outcomes, while only four supported binary outcomes. A total of N=8 frequentist methods with sufficient default settings were considered in the simulation study.

The packages ‘buildmer’ and ‘plsmmLasso’ consistently demonstrated an eTPR exceeding 80% while maintaining the eFDR under 20%, across various signal-to-noise settings. By comparison, all other methods underperformed in jointly evaluating both performance metrics. ‘splmm’ achieved similar eFDR but yielded lower eTPR, whereas ‘geeVerse’ showed an opposite trend. In contrast, both ‘rpql’ and ‘alqrfe’ failed to select any variables.

Conclusions

Supervised feature selection in longitudinal biomedical data can be performed using a variety of methods. The majority of the available statistical software is based on frequentist techniques, while Bayesian procedures represent a minority. Alternative concepts like tree-based methods are notably absent. No evidence of superiority was found for modern selection techniques such as penalized regression (‘plsmmLasso’) over traditional approaches like stepwise regression (‘buildmer’) for feature selection in longitudinal data.



posters-wednesday: 5

How does acute exposure to environmental factors relate to stroke characteristics such as stroke type, severity, and impairments?

Bohan Zhang1, Andy Vail1, Craig Smith2, Amit Kishore2, Matthew Gittins1

1University of Manchester, United Kingdom; 2Manchester Centre for Clinical Neuroscience, Manchester, United Kingdom

Background and Aims: The overall aim of this project is to better understand the association between acute exposure to environmental factors, such as ambient air pollution and temperature, and stroke characteristics. We focus on short-term acute effects associated with exposure on the same day or up to 30 days before stroke. Specifically, we will look into stroke counts and stroke severity using non-identifiable patient data from the Sentinel Stroke National Audit Programme (SSNAP) for Manchester stroke units.

Methods: We may employ a cohort (or case-control) design, where the cohort is all stroke patients within Greater Manchester/Salford, the exposure is the environmental exposure leading up to the stroke, and the outcomes are the post-stroke characteristics. A cohort study is more likely, but a case-control design might help address some of the selection issues, i.e. the group is defined by being a stroke patient rather than being identified beforehand and followed up to see whether they have a stroke.

We will employ methods to model the lagged effects of air pollution, such as the lag-stratified model (where days are grouped and the average exposure within each group is modelled) and distributed lag models (where polynomial functions are applied to represent the 30 days).
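As an illustration of the distributed lag approach, the sketch below uses the R 'dlnm' package with a hypothetical daily data frame gm (columns stroke_count, pm25, temp, dow); it is a sketch only, not the planned analysis code.

```r
# Distributed lag model for PM2.5 over a 0-30 day lag window (illustrative only).
library(dlnm)
library(splines)

cb_pm <- crossbasis(gm$pm25, lag = 30,
                    argvar = list(fun = "lin"),                 # linear exposure-response
                    arglag = list(fun = "poly", degree = 3))    # polynomial lag structure

fit <- glm(stroke_count ~ cb_pm + ns(temp, 3) + dow,
           family = quasipoisson(), data = gm)

pred <- crosspred(cb_pm, fit, at = 10)   # effect of a 10-unit increase, by lag
pred$allRRfit                            # cumulative relative risk over the 30 days
```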

Results: The analysis is still under way and results will be presented at ISCB.

Conclusion: Literature on other diseases often suggests that extreme environmental exposures are likely to lead to worse outcomes. Our results are still in progress.



posters-wednesday: 6

Improving TBI Prognosis in developing world: A Machine Learning-based AutoScore Approach to Predict Six-Month Functional Outcome

Vineet Kumar Kamal1, Deepak Agrawal2

1AIIMS, Kalyani; 2AIIMS, New Delhi

Background

Traumatic brain injury (TBI) presents a significant challenge in predicting long-term functional outcomes due to its complex nature and variability among patients. Accurate prognostic tools are essential for clinicians to guide treatment decisions and set realistic expectations for recovery. To address this, AutoScore employs a machine learning-based approach that automates the generation of clinical scores, facilitating the prediction of outcomes. This study aims to develop and validate a prognostic model, and to assess its clinical utility, for accurately predicting six-month functional outcomes in adult patients with moderate or severe TBI, enhancing risk stratification.

Methods

This retrospective cohort study included 1,085 adult patients with TBI from a public, tertiary care, level-1 trauma center in India. We considered a total of 72 demographic, clinical, secondary-insult, CT and laboratory variables from admission to first discharge. We used the AutoScore framework, consisting of six distinct modules: variable ranking, variable transformation, score derivation, model selection, score fine-tuning, and model evaluation. We randomly divided the dataset into development, parameter-tuning/validation, and test sets (0.7, 0.1, 0.2). The predictive performance of the AutoScore framework was evaluated using various metrics, including receiver operating characteristic (ROC) curves, calibration curves, the Brier score, and decision curves for clinical utility analysis. All analyses were performed using R software v4.3.3.
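The sketch below outlines the random 0.7/0.1/0.2 split and the discrimination and calibration metrics reported here; the data frame (tbi) and outcome column (outcome6m) are hypothetical, and a plain logistic model stands in for the AutoScore-derived point score.

```r
# Data split and evaluation metrics (illustrative; not the AutoScore package itself).
library(pROC)

set.seed(123)
idx   <- sample(c("train", "tune", "test"), nrow(tbi), replace = TRUE,
                prob = c(0.7, 0.1, 0.2))
train <- tbi[idx == "train", ]
test  <- tbi[idx == "test", ]

fit       <- glm(outcome6m ~ ., data = train, family = binomial)  # stand-in model
pred_risk <- predict(fit, newdata = test, type = "response")

roc_test <- roc(test$outcome6m, pred_risk)
auc(roc_test); ci.auc(roc_test)                    # discrimination
mean((pred_risk - test$outcome6m)^2)               # Brier score
sum(test$outcome6m) / sum(pred_risk)               # observed-to-expected ratio
```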

Results

The AutoScore model identified only four key risk predictors: motor response at discharge, verbal response at discharge, motor response at the time of admission, and eye-opening response at discharge, with higher scores indicating an increased risk of an unfavorable six-month outcome in TBI patients. The final model achieved an AUC of 0.93 (95% CI: 0.88–0.98) on the validation set and 0.81 (95% CI: 0.76–0.86) on the test set, demonstrating strong predictive performance. The Brier score was 0.14, and the calibration plot and observed-to-expected ratio (0.978) suggested that the model was well calibrated in the test data. The model was useful in the 0.0–0.6 threshold range (offering better net benefit). The predicted risk increased steadily with the total score, as depicted in the probability plot, with patients scoring above 75 exhibiting a near-certain risk of an unfavorable outcome.

Conclusion

The AutoScore-based prognostic model demonstrated strong predictive performance for six-month functional outcomes in moderate-to-severe TBI patients using only four key predictors. These findings suggest that the model could serve as a valuable tool for clinicians in early risk assessment and decision-making. Further validation in diverse populations with recent data is warranted to confirm its generalizability and clinical applicability.



posters-wednesday: 7

Predictive Risk Index for Poor Cognitive Development Among Children Using Machine Learning Approaches

Anita Kerubo Ogero1, Patricia Kipkemoi1, Amina Abubakar1,2,3

1Aga Khan University, Nairobi, Kenya, Institute for Human Development, Aga Khan University, P.O. BOX 30270-00100, Nairobi, Kenya; 2Centre for Geographic Medicine Research Coast, Kenya Medical Research (KEMRI), P.O Box 230-80108, Kilifi, Kenya; 3Department of Psychiatry, University of Oxford, Warneford Hospital, Warneford Ln, Oxford OX37JX, United Kingdom

Poor cognitive development in early childhood is a major global concern, with over 200 million children failing to reach their developmental milestones due to factors like malnutrition and poverty - particularly in low- and middle-income countries. Cognitive abilities established during childhood are critical determinants of a child’s future academic and socio-economic outcomes. Despite extensive research on socio-demographic, environmental and nutritional influences on cognitive development, there remains a gap in developing a predictive risk index tailored for resource-constrained settings. Early identification of at-risk children is essential to enable timely interventions and inform policy. In this study, we propose to develop and validate a risk index for poor cognitive development among children using advanced machine-learning techniques. Secondary data from approximately 7,000 children, assessed with the Raven’s Progressive Matrices (RPM), will be analysed; cognitive development is classified into no-risk, low-risk, and high-risk groups based on age-adjusted percentile scores. Predictor variables integrate socio-demographic factors (e.g., parental education, socioeconomic status) and nutritional indicators (e.g., anthropometric measurements such as height, weight, head circumference, and derived indices like weight-for-age and height-for-age z-scores). Our analytic framework integrates several methods including logistic regression, Random Forest, Support Vector Machines, Artificial Neural Networks, and Extreme Gradient Boosting. Data preprocessing involves feature selection via Recursive Feature Elimination (RFE) and dimensionality reduction using Principal Component Analysis (PCA). Decision thresholds will be optimised through the Receiver Operating Characteristic (ROC) curve analysis and Youden’s Index to balance sensitivity and specificity. Key risk factors significantly associated with poor cognitive development will be identified, forming the basis for a validated risk index. The risk index will be assessed for predictive accuracy and generalisability. The developed risk index will represent a significant advancement in the early identification of children at risk for poor cognitive development in low-resource environments. Findings may inform policy decisions and the development of digital tools, such as mobile applications, for real-time cognitive risk assessment. Moreover, this tool holds promise for improving long-term developmental outcomes by optimising resource allocation and enabling targeted interventions.
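A minimal sketch of the threshold-optimisation step described above, using the R 'pROC' package; the vectors at_risk (binary label) and risk_score (predicted probability) are hypothetical placeholders.

```r
# ROC analysis with the threshold chosen by Youden's index (illustrative only).
library(pROC)

roc_obj <- roc(at_risk, risk_score)
auc(roc_obj)                                           # overall discrimination
coords(roc_obj, x = "best", best.method = "youden",
       ret = c("threshold", "sensitivity", "specificity"))
```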



posters-wednesday: 8

Development and validation of a model to predict ceiling of care in COVID-19 hospitalised patients

Natàlia Pallarès Fontanet1, Hristo Inouzhe2, Jordi Cortés3, Sam Straw4, Klaus K Witte4, Jordi Carratalà5, Sebastià Videla6, Cristian Tebé1

1Biostatistics Support and Research Unit, Germans Trias i Pujol Research Institute and Hospital (IGTP), Badalona, Spain; 2Basque Center for Applied Mathematics, BCAM, Bilbao, Spain; 3Department of Statistics and Operations Research, Universitat Politècnica de Catalunya/BarcelonaTech, Barcelona, Spain; 4Leeds Institute of Cardiovascular and Metabolic Medicine, University of Leeds, Leeds, UK; 5Department of Infectious Diseases, Bellvitge University Hospital, Barcelona, Spain; 6Clinical Research Support Area, Department of Clinical Pharmacology, Germans Trias i Pujol University Hospital, Badalona, Spain

Background: Therapeutic ceiling of care is the maximum therapeutic effort to be offered to a patient based on age, comorbidities, and the expected clinical benefit in relation to resource availability. COVID-19 patients with and without an assigned ceiling of care at hospital admission have different baseline variables and outcomes. Analysis of hospitalised COVID-19 subjects should be stratified by ceiling of care to avoid bias, but there are currently no models to predict their ceiling of care. We aimed to develop and validate a clinical prediction model to predict ceiling of care at hospital admission.

Methods: The data used to develop the model came from an observational study conducted during four waves of COVID-19 in 5 centres in Catalonia. Data were sampled 1000 times by bootstrapping. For each sample, a logistic regression model with ceiling as outcome was fitted using backward elimination. Variables retained in more than 95% of the models were candidates for the final model. Alternative variable selection methods such as Lasso, CART, and Boruta were also explored. Discrimination was assessed by estimating the area under the ROC curve and the Brier Score, and calibration by comparing observed versus expected probabilities of ceiling of care by deciles of predicted risk. The final model was validated internally, and externally using a cohort from the Leeds Teaching Hospitals NHS Trust.
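A minimal sketch of the bootstrap-stability selection step (1000 resamples, backward elimination, variables retained in more than 95% of models), assuming a hypothetical data frame dat with a binary outcome ceiling and numeric candidate predictors:

```r
# Bootstrap backward elimination and retention frequencies (illustrative only).
set.seed(2020)
vars <- setdiff(names(dat), "ceiling")
keep <- matrix(0, nrow = 1000, ncol = length(vars), dimnames = list(NULL, vars))

for (b in 1:1000) {
  db   <- dat[sample(nrow(dat), replace = TRUE), ]
  full <- glm(ceiling ~ ., data = db, family = binomial)
  sel  <- step(full, direction = "backward", trace = 0)
  keep[b, names(coef(sel))[-1]] <- 1          # assumes numeric predictors, so that
}                                             # coefficient names match column names

sort(colMeans(keep), decreasing = TRUE)       # candidates: retained in > 95% of samples
```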

Results: A total of 5813 patients were included in the development cohort, of whom 31.5% were assigned a ceiling of care on admission. A model including age, COVID-19 wave, chronic kidney disease, dementia, dyslipidaemia, heart failure, metastasis, peripheral vascular disease, chronic obstructive pulmonary disease, and stroke had excellent discrimination (AUC 0.898 [0.889; 0.907]; Brier Score 0.113) and calibration (slope of the regression line between observed and predicted β=1.01 [0.94; 1.08]) in the whole cohort and in subgroups of interest. External validation on the Leeds Teaching Hospitals cohort also showed good performance (AUC 0.934 [0.908; 0.959]; Brier Score 0.110; β=0.98 [0.80; 1.17]).

Conclusions: Ceiling of care can be predicted with great accuracy from baseline information available at hospital admission. Cohorts without information on ceiling of care could use our model to estimate the probability of ceiling of care. This model, combined with clinical expertise, may be valuable in future pandemics or emergencies requiring time-sensitive decisions about life-prolonging treatments, but further evaluation outside of COVID-19 is needed.



posters-wednesday: 9

Transformer Models for Clinical Prediction – Investigation of BEHRT in UK Biobank and prediction assessment under different scenarios

Yusuf Yildiz, Goran Nenadic, Meghna Jani, David A. Jenkins

The University of Manchester, United Kingdom

Background:

Transformer-based large language models (LLMs) like BEHRT [1] have shown potential in modelling electronic health records to predict future clinical events. These models can represent patient histories by including structured (diagnoses) and unstructured (doctor notes) data [2]. BEHRT showed superior performance over the state-of-the-art models at the time it was developed, using a large primary care dataset. However, it is unclear whether such models and high accuracy can be achieved with other real-world datasets, e.g. hospital data. Developing LLMs requires various decisions, such as data-split strategies, medical terminology selection and parameter choices. Parameter choices have been shown to impact model performance, stability and generalisability, but it is unclear to what extent this also holds for LLMs. This study aims to implement the BEHRT architecture in the UK Biobank and identify the challenges of implementing this model in a different dataset. The secondary aim is to assess the impact of parameter choices on prediction performance.

Methods:

This study uses UK Biobank data. To capture key features of patient histories, embeddings are created using diagnoses and age at diagnosis. The BEHRT workflow included pretraining with masked language modelling (MLM) and fine-tuning for next-disease prediction across different time frames. Prediction performance was evaluated using the Average Precision Score and AUROC. First, the original study was replicated using UK Biobank to assess the impact of dataset variability. Subsequently, the model’s performance was evaluated to assess the effects of different medical terminologies (ICD10 and CALIBER phenotyping) and data splits.

Results/Conclusion:

Results showed that the decisions made when developing these models on different datasets affect model performance. Our replicated BEHRT model did not achieve predictive performance as high as the original. Terminologies with larger vocabularies showed worse performance. Complete separation of the MLM and fine-tuning data resulted in a worse-performing model. However, most developed models use the complete dataset for pre-training and are therefore likely to exhibit overly optimistic performance.

A more rigorous, definitive framework and assessment workflow is needed for LLM development in clinical prediction; in particular, the clinical usefulness of these models should be examined. Reporting guidelines, such as TRIPOD-LLM [3], should be used for transparent model development.

Further work is needed on time-to-event analysis, censoring adjustment, transparent decision-making, and accounting for computational costs, for better integration into clinical prediction.



posters-wednesday: 10

What is the best way to analyse ventilator-free days?

Laurent Renard Triché1,2,3, Matthieu Jabaudon1,2, Bruno Pereira4, Sylvie Chevret2,5

1Department of Perioperative Medicine, CHU Clermont-Ferrand, Clermont-Ferrand, France; 2iGReD, INSERM, CNRS, Université Clermont Auvergne, Clermont-Ferrand, France; 3ECSTRRA Team, IRSL, INSERM UMR1342, Université Paris Cité, Paris, France; 4Biostatistics Unit, Department of Clinical Research, and Innovation (DRCI), CHU Clermont-Ferrand, Clermont-Ferrand, France; 5Department of Biostatistics, Hôpital Saint-Louis, AP-HP, Paris, France

Introduction

Ventilator-free days (VFDs) are a composite outcome increasingly used in critical care research, reflecting both survival and mechanical ventilation duration. However, inconsistencies exist in the models used to analyse VFDs. Some researchers evaluate VFDs as a count, primarily using the Mann-Whitney statistic, while others consider them as a time-to-event outcome, where death is a competing risk for extubation. Alternative approaches such as the multi-state model and the win ratio warrant investigation.

This study aimed to evaluate different statistical models to determine the best approach for analysing VFDs.

Methods

First, a clinical trial dataset (LIVE study, NCT02149589) was used to apply different statistical models to analyse VFDs. Then, 16 datasets of 300 individuals were simulated with 3,000 independent replications, comparing a control group with an intervention strategy by varying survival rates and ventilation durations derived from exponential distributions. The simulated data were analysed using the same statistical methods, and statistical power and type I error rates were compared between different models.

Eleven statistical methods were evaluated, including the Mann-Whitney test, the zero-inflated negative binomial model, the negative binomial hurdle model, the zero-inflated Poisson model, the Poisson hurdle model, the log-rank test, the Gray test, the cause-specific hazard model, the Fine-Gray model, the multistate Markov model, and the win ratio.

In addition, three sensitivity analyses were performed by adjusting the survival rates and/or ventilation durations in the control group.
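As an illustration, two of the approaches listed above can be sketched in R as follows (hypothetical data frame dat with vfd, group, time to extubation or death, and status coded 0 = censored, 1 = extubated alive, 2 = died); this is a sketch, not the study code.

```r
# Mann-Whitney test on the VFD count, and a Fine-Gray model for extubation
# with death as the competing risk (illustrative only).
library(cmprsk)

wilcox.test(vfd ~ group, data = dat)

fg <- crr(ftime   = dat$time,
          fstatus = dat$status,
          cov1    = model.matrix(~ group, data = dat)[, -1, drop = FALSE],
          failcode = 1, cencode = 0)
summary(fg)
```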

Results

In the LIVE study, almost all methods identified a significant association between VFDs (or related measures) and the patient groups, except for the count submodels, the log-rank test, and the cause-specific hazard model for the survival.

For the simulated data, the 28-day mortality rate was set at 20% and the mean duration of ventilation at 15 days for the control group. Most statistical methods effectively controlled the type I error rate, although exceptions included the zero-inflated and hurdle Poisson/negative binomial count sub-models and the cause-specific Cox regression model for survival. Statistical methods had variable power to detect survival benefits and effects on duration of ventilation, with the time-to-event approach and the win ratio generally having the highest power.

The sensitivity analyses found similar results.

Conclusion

The time-to-event approach and the win ratio were more appropriate than the count-based methods to analyse the VFDs and may be extended to other free-days outcomes. Simulation should be recommended for power calculation and sample size estimation rather than a simplified formula.



posters-wednesday: 11

Comparing the Estimation and Classification Performance of the Statistical Shrinkage Methods Ridge Regression, Lasso Regression, and Elastic Net Regression

Gamze Ozen, Fezan Mutlu

Eskisehir Osmangazi University, Medical Faculty, Department of Biostatistics, Eskisehir, Turkey

Introduction: Advances in data science highlight the need to improve the reliability of regression model estimation when the number of independent variables exceeds the number of observations in high-dimensional datasets. In such datasets, multicollinearity reduces the accuracy of prediction models. This study aims to assess the performance of Ridge, Lasso, and Elastic Net regression methods in the presence of multicollinearity in high-dimensional datasets.

Method: The performance of the three regression methods (Ridge, Lasso, and Elastic Net) was assessed by data simulation, which showed that the Elastic Net method is superior to the Ridge and Lasso methods in retaining all strongly correlated variables in the model. The models were then applied to a dataset of serum miRNA profiles from large cohorts to identify miRNAs that can be used to detect breast cancer at an early stage (Shimomura et al., 2016).
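The three shrinkage methods differ only in the elastic-net mixing parameter alpha of the R 'glmnet' package; the sketch below is illustrative, with hypothetical objects x (predictor matrix), y, x_test and y_test.

```r
# Ridge (alpha = 0), Lasso (alpha = 1) and Elastic Net (here alpha = 0.5).
library(glmnet)

cv_ridge <- cv.glmnet(x, y, family = "binomial", alpha = 0)
cv_lasso <- cv.glmnet(x, y, family = "binomial", alpha = 1)
cv_enet  <- cv.glmnet(x, y, family = "binomial", alpha = 0.5)

# Classification accuracy of the elastic net at the cross-validated lambda
pred <- predict(cv_enet, newx = x_test, s = "lambda.min", type = "class")
mean(pred == y_test)
```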

Results: Data simulations verify that Elastic Net regression produces better results, with an accuracy of 0.963, when the data are high-dimensional and exhibit strong multicollinearity. In the miRNA-based determination of breast cancer, Elastic Net achieved a classification accuracy of 96%.

Conclusion: The findings suggest that statistical shrinkage methods such as Ridge, Lasso, and Elastic Net regression are reliable and useful for prediction and classification research with linear and logistic models. This study suggests that statistical shrinkage methods can be applied in the health sciences to build stronger models.



posters-wednesday: 12

Defining Harm in Settings with Outcomes that are Not Binary

Amit Sawant, Mats Stensrud

EPFL, Switzerland

The increasing application of automated algorithms in personalised medicine necessitates that algorithm recommendations do not harm patients, in accordance with the Hippocratic maxim of “Do no harm.” A formal mathematical definition of harm is essential to guide these algorithms in adhering to this principle. A counterfactual definition of harm has been previously proposed, which asserts that a treatment is considered harmful if there exists a non-zero probability that the potential outcome under treatment for an individual is worse than the potential outcome without treatment. Existing literature on counterfactual harm has primarily focused on binary treatments and outcomes. This study aims to illustrate that in scenarios involving multiple treatments and multi-level outcomes, the counterfactual definition of harm can result in intransitivity in the ranking of treatments. Specifically, we analyse three treatments—A, B, and C—for a particular disease. We demonstrate that treatment B is less harmful than treatment A, treatment C is less harmful than treatment B, yet treatment C is more harmful than treatment A in direct comparison, if we follow the counterfactual definition. Our example highlights that the intuitive concept of counterfactual harm in binary settings does not extend to scenarios involving more than two treatments and outcomes. On the other hand, an interventionist definition of harm in terms of utility circumvents the issue of intransitivity.
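A compact formalisation of the counterfactual definition quoted above; the notation (potential outcome Y^a under treatment a, and the ordering symbol for "worse than") is ours, not fixed by the abstract.

```latex
% Y^{a}: potential outcome under treatment a;  y \prec y': outcome y is worse than y'.
\text{Treatment } a \text{ is counterfactually harmful relative to } a'
\quad\Longleftrightarrow\quad
P\!\left( Y^{a} \prec Y^{a'} \right) > 0 .
```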



posters-wednesday: 13

Brier pseudo-observation score for selecting a multiplicative, an additive or an additive-multiplicative hazards regression model

François Lefebvre1, Roch Giorgi2

1Groupe méthode en recherche clinique, service de santé publique, Hôpitaux universitaires de Strasbourg, Strasbourg, France; 2Aix Marseille Univ, APHM, Inserm, IRD, SESSTIM, ISSPAM, Hop Timone, BioSTIC, Marseille, France

Background: In survival analysis, data can be modelled in different ways: with the Cox model, with an additive hazards model such as Aalen's model, or with an additive-multiplicative model such as the Cox-Aalen model. Covariates act on the baseline hazard multiplicatively in the first model, additively in the second, and some multiplicatively and others additively in the third. Correct modelling of a covariate requires knowledge of its effect on the baseline hazard, which is rarely known a priori. Pseudo-observations have been used to evaluate the impact of a covariate on survival outcomes, in addition to verifying the assumptions inherent in the Cox (proportional hazards, log-linearity) and Aalen (linearity) models [1]. However, they do not currently indicate which of the multiplicative, additive or additive-multiplicative models is the most appropriate for a particular survival dataset. The aim of this study is to propose a method for selecting a multiplicative, an additive or an additive-multiplicative hazards regression model adapted to the survival data-generating mechanism.

Methods: We propose to use the Brier pseudo-observation score defined by Perperoglou [2] as the mean of the squared differences between the pseudo-observations and the survival estimates obtained from a regression model. For each type of regression model, the Brier pseudo-observation score can therefore be computed and the models compared. Since the Brier pseudo-observation score is analogous to the mean squared error of prediction, the lower the score the better the model. To reduce the risk of overfitting, the model parameters were estimated for each individual using the jackknife. The performance of this approach was assessed in simulation studies comparing the Brier pseudo-observation scores obtained with a multiplicative, an additive and an additive-multiplicative model, in situations in which the survival data-generating mechanism was multiplicative, additive or included both effects.
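In symbols, the score described above can be written as follows (notation assumed here: the pseudo-observation for subject i at time t, and the survival estimate from the fitted regression model given covariates x_i):

```latex
\mathrm{BS}(t) \;=\; \frac{1}{n} \sum_{i=1}^{n}
  \left\{ \hat{\theta}_i(t) - \hat{S}(t \mid x_i) \right\}^{2}
```

Lower values indicate a better model, in analogy with a mean squared error of prediction.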

Results: This measure selected the model used to generate the data in over 80% of simulations in most of the scenarios considered. The use of this approach is exemplified by an epidemiological example of female breast cancer, with the objective of ascertaining the effect of nodal status, age and tumour size on the baseline hazard.

Conclusion: This method has been demonstrated to achieve optimal performance in selecting the hazards regression model adapted to the data-generating mechanism.

[1] M. Pohar Perme, P. K. Andersen. Statistics in Medicine, 27, 2008, 5309–5328.

[2] A. Perperoglou, A. Keramopoullos, H. C. van Houwelingen. Statistics in Medicine, 26, 2007, 2666–2685.



posters-wednesday: 14

Bayesian spatio-temporal analysis of the COVID-19 pandemic in Catalonia

Pau Satorra, Cristian Tebé

Biostatistics Support and Research Unit, Germans Trias i Pujol Research Institute and Hospital (IGTP), Spain

Introduction: The COVID-19 pandemic posed an unprecedented challenge to public health systems worldwide. The spread of the pandemic varied in different geographical regions, even at the level of small areas. This study investigates the spatio-temporal evolution of COVID-19 cases and hospitalisations in the different basic health areas (ABS) of Catalonia during the pandemic period (2020-2022). Additionally, it assesses the impact of demographic and socio-economic factors, as well as vaccination coverage, on infection and hospitalisation rates at an ABS level.

Methods: Data were obtained from the official open data catalogue of the Government of Catalonia. Bayesian hierarchical spatio-temporal models were used, estimated with Integrated Nested Laplace Approximation (INLA). Demographic and socio-economic ABS variables were included in the models to assess their role as risk factors for cases and hospitalisations. Full ABS vaccination coverage was also incorporated to assess its effect. All analyses were performed using the R statistical program.
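The sketch below shows one simplified way such a model can be specified in R-INLA, with a BYM2 spatial effect and a first-order random walk in time; the data frame (abs_week), its columns (cases, E, id_abs, id_week, urban, deprivation, vax_coverage) and the adjacency graph g are hypothetical, and the authors' space-time interaction structure is not reproduced here.

```r
# Simplified Bayesian spatio-temporal Poisson model in R-INLA (illustrative only).
library(INLA)

form <- cases ~ urban + deprivation + vax_coverage +
  f(id_abs,  model = "bym2", graph = g) +    # structured + unstructured spatial effect
  f(id_week, model = "rw1")                  # smooth temporal trend

fit <- inla(form, family = "poisson", E = E, data = abs_week,
            control.compute = list(dic = TRUE, waic = TRUE))

exp(fit$summary.fixed[, c("mean", "0.025quant", "0.975quant")])   # covariate RRs
```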

Results: During the study period, a cumulative total of 2,685,568 COVID-19 cases and 144,550 hospitalisations were reported in Catalonia, representing 35% and 1.89% of the total population, respectively. The estimated spatial, temporal and spatio-temporal relative risks (RR) were visualized through maps and plots, identifying high-risk (hotspots) and low-risk (coldspots) areas and weeks. These results were presented in an interactive R-shiny application: https://brui.shinyapps.io/covidcat_evo/. Urban areas had a higher risk of cases (RR: 5%, CI95%: 2-9%) and hospitalisations (RR: 17%, CI95%: 10-25%). A higher socio-economic deprivation index was associated with an increased hospitalisation risk (RR: 19%, CI95%: 17-22%). Finally, a higher full vaccination coverage in the ABS was associated with a reduced risk of cases (RR: 12%, CI95%: 5-18%) and hospitalisations (RR: 17%, CI95%: 2-32%) during the fourth and fifth pandemic waves.

Conclusion: This study provides a comprehensive analysis of the COVID-19 pandemic across the territory of Catalonia at the small-area level, revealing the spatial, temporal and spatio-temporal patterns of the disease. Urban areas had a higher risk of COVID-19 cases and hospitalisations, socio-economic deprivation increased hospitalisations, and full vaccination was protective against cases and hospitalisations during specific pandemic waves. These findings offer valuable insights for public health policymakers to design targeted interventions against future infectious disease threats.



posters-wednesday: 15

A Simulation Study of Bayesian Approaches to Spatial Modelling Using the Besag-York-Mollie Model

Hollie Hughes, David Hughes

Department of Health Data Science, University of Liverpool, United Kingdom

Background/Introduction:

Spatial modelling can be a useful tool for analysing patterns and relationships in data to indicate how events might be spatially related. Neighbouring areas tend to be more strongly correlated and to share more similar characteristics than distant areas, creating a spatial autocorrelation problem when modelling and mapping data. Spatial models have been successfully developed to account for this autocorrelation in areal data, allowing patterns to be successfully modelled. However, doing this in a Bayesian framework with Markov Chain Monte Carlo (MCMC) methods can be computationally expensive, particularly for larger spatial datasets. Therefore, many researchers opt for the Integrated Nested Laplace Approximation (INLA) approach for computational savings. We suggest an alternative using approximate Mean Field Variational Bayes (MFVB) algorithms to decrease the computational burden, as the INLA approach does, whilst potentially retaining the accuracy promised by the MCMC approach.

Method:

We provide a comparison of the MCMC, INLA and MFVB approaches to the Besag-York-Mollie (BYM) model, which is commonly used in spatial modelling to account for spatial dependencies. We conducted a simulation study to compare the performance of the three approaches in fitting the BYM model to spatially structured data on the incidence of depression. Synthetic datasets were generated under the BYM model specification outlined in Morris (2019), incorporating both spatially structured and unstructured random effects (1).

Each method was implemented using standard Bayesian modelling tools in R: INLA via the R-INLA package, MCMC using Stan, and MFVB using Stan’s variational Bayes options. We assessed computational efficiency and accuracy for each method by comparing posterior estimates against the true simulated values and measuring the time taken to fit each model. Accuracy was assessed both in terms of distributional similarity and accuracy of point estimates.
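A minimal sketch of how the MCMC and MFVB fits can be obtained from the same Stan program with 'rstan'; the model file bym.stan, the data list stan_data and the parameter name phi are assumed to follow the Morris (2019) case study and are placeholders here.

```r
# Full MCMC versus mean-field variational Bayes on the same Stan model (sketch).
library(rstan)

mod <- stan_model("bym.stan")                                   # assumed BYM Stan program

t_mcmc <- system.time(
  fit_mcmc <- sampling(mod, data = stan_data, chains = 4, iter = 2000)
)
t_vb <- system.time(
  fit_vb <- vb(mod, data = stan_data, algorithm = "meanfield")  # MFVB approximation
)

summary(fit_mcmc, pars = "phi")$summary       # compare posterior summaries ...
summary(fit_vb,   pars = "phi")$summary
rbind(mcmc = t_mcmc[3], vb = t_vb[3])         # ... and elapsed fitting times
```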

Results:

Results will include comparisons of accuracy and performance metrics including measures comparing the ground truth with MCMC results and computation time for each model. The results will be summarised across multiple simulated datasets to evaluate consistency and robustness. Evaluation is ongoing but full results will be presented at the conference.

Conclusion:

This simulation study may indicate the usefulness of the MFVB approach as an alternative to the MCMC approach, with the potential to be comparably accurate when the simulated values are known, alongside possible gains in computation speed.

References:

1. Morris M. Spatial Models in Stan: Intrinsic Auto-Regressive Models for Areal Data. 2019. Available from: https://mc-stan.org/users/documentation/case-studies/icar_stan.html



posters-wednesday: 16

A Bayesian analysis of FINEARTS-HF

Alasdair D Henderson1, Brian L Claggett2, Akshay S Desai2, Mutthiah Vaduganathan2, Carolyn S Lam3, Bertram Pitt4, Michele Senni5, Sanjiv J Shah6, Adriaan A Voors7, Faiez Zannad8, Meike Brinker9, Flaviana Amarante10, Katja Rohwedder11, James Lay-Flurrie12, Scott D Solomon2, John JV McMurray1, Pardeep S Jhund1

1BHF Glasgow Cardiovascular Research Center, School of Cardiovascular and Metabolic Health, University of Glasgow, Glasgow, Scotland, UK; 2Cardiovascular Division, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, USA; 3National Heart Centre Singapore & Duke-National University of Singapore, Singapore; 4University of Michigan, School of Medicine, Ann Arbor, Michigan, USA; 5University Bicocca Milan, Italy, Papa Giovanni XXIII Hospital, Bergamo, Italy; 6Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA; 7University of Groningen, Groningen, Netherlands; 8Université de Lorraine, Inserm Clinical Investigation Centre, CHU, Nancy, France; 9Bayer AG, Research & Development, Pharmaceuticals, Wuppertal, Germany; 10Cardiology and Nephrology Clinical Development, Bayer SA, São Paulo, Brazil; 11Global Medical Affairs, Berlin; 12Bayer plc, Research & Development, Pharmaceuticals, Reading, UK

Background: The FINEARTS-HF trial was a large, double-blind, placebo-controlled, randomised trial of the non-steroidal mineralocorticoid receptor antagonist (MRA) finerenone. Conventional frequentist analysis of FINEARTS-HF found that finerenone reduced the primary composite endpoint of heart failure events and cardiovascular death in patients with heart failure with mildly reduced or preserved ejection fraction (HFmrEF/HFpEF) (rate ratio 0.84; 95% confidence interval, 0.74 to 0.95; P = 0.007). Bayesian methods offer alternative analytical approaches to provide probabilistic estimates of efficacy and safety, and flexibility to allow the inclusion of prior information and hierarchical modelling of subgroup effects. We analysed FINEARTS-HF with Bayesian methods to demonstrate the strengths and limitations compared to the primary frequentist analysis.

Methods: In a pre-specified Bayesian analysis of FINEARTS-HF, we estimated treatment efficacy under a range of scenarios incorporating prior information from two trials of finerenone in participants with chronic kidney disease and type 2 diabetes (FIDELIO-DKD and FIGARO-DKD, pooled in the FIDELITY program) and a steroidal MRA in patients with HFmrEF/HFpEF (TOPCAT). We also used a combination of these trials in a robust meta-analytic prior. All models of the primary recurrent endpoint were analysed with Bayesian Cox proportional hazards models, with stratum-specific baseline hazards and hierarchical structure for subject-specific random effects. Secondary endpoints were analysed with Bayesian stratified Cox proportional hazards models. We used Bayesian hierarchical models to estimate subgroup effects with reduced heterogeneity from small sample sizes in frequentist subgroup analyses.
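The sketch below is only a simplified conjugate-normal illustration of how a prior on the log rate ratio shifts the posterior probability of benefit; it is not the pre-specified Bayesian Cox analysis. The likelihood is approximated from the frequentist estimate quoted above (RR 0.84, 95% CI 0.74 to 0.95), and the prior standard deviation is a hypothetical value.

```r
# Conjugate-normal illustration on the log rate ratio (not the study's model).
est <- log(0.84)
se  <- (log(0.95) - log(0.74)) / (2 * 1.96)      # SE approximated from the reported 95% CI

mu0 <- 0                                         # prior centred on no effect
sd0 <- 0.2                                       # hypothetical prior SD

post_var  <- 1 / (1 / sd0^2 + 1 / se^2)
post_mean <- post_var * (mu0 / sd0^2 + est / se^2)

c(RR        = exp(post_mean),
  lower     = exp(post_mean - 1.96 * sqrt(post_var)),
  upper     = exp(post_mean + 1.96 * sqrt(post_var)),
  PrBenefit = pnorm(0, post_mean, sqrt(post_var)))   # Pr(rate ratio < 1)
```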

Results: A total of 6,001 patients were included and the Bayesian analysis with vague priors confirmed the primary frequentist results with a 95% probability that the rate ratio was between 0.74 and 0.94. Including prior information from previous nonsteroidal and steroidal MRA trials supported this finding and strengthened the probability of a beneficial treatment effect. Bayesian subgroup estimates were qualitatively similar to frequentist estimates but more precise and closer to the overall treatment effect. The probability that finerenone improves survival time until cardiovascular death was 79% (HR 0.93, 95% CrI: 0.79-1.09, Pr(HR<1) = 79%), and all-cause mortality was 87% (HR 0.94, 95% CrI: 0.84-1.05, Pr(HR<1) = 87%), although any benefit was likely small on an absolute scale.

Conclusion: The non-steroidal MRA finerenone reduced the rate of heart failure events and cardiovascular death, and there is a strong probability that there is a small reduction in CV death and all-cause mortality. Bayesian methods offer additional insights to the analysis of a large randomized control trial.



posters-wednesday: 17

Fast Approximation of Joint Models: A Comparative Evaluation of Bayesian Methods

Jinghao Li, David M Hughes

University of Liverpool, United Kingdom

Background

Joint models are widely employed in statistics to simultaneously analyze longitudinal data and time-to-event data, effectively capturing the dynamic relationships between the two processes. This framework has shown significant utility in biostatistics and clinical research. The widespread adoption of joint models enables clinicians to make predictions about patient-specific risk that update over time, and aids clinical decision making. However, the increased complexity of joint models compared to separate longitudinal and survival models necessitates more sophisticated parameter estimation methods. Early contributions using Maximum Likelihood Estimation (MLE) laid the foundation for joint model estimation, followed by advancements in Bayesian methods that employed Markov Chain Monte Carlo (MCMC) techniques for inference. While MCMC-based approaches, such as JMbayes and rstanarm, provide accurate parameter estimates, they are computationally expensive and exhibit slow convergence, particularly when handling large datasets and multiple longitudinal variables. More recently, the INLAjoint package has been introduced, applying the Integrated Nested Laplace Approximation (INLA) to joint models, offering faster computation but with potential trade-offs in accuracy.

Method

Variational Bayes (VB) inference, originally popularized in artificial intelligence applications, has gained increasing attention in statistical research due to its computational efficiency and scalability, as highlighted by Ormerod and Wand (2010). This study aims to provide a comprehensive evaluation of existing Variational Bayes methods for joint models, comparing their performance with established MCMC- and INLA-based approaches. The comparison focuses on key evaluation criteria, including computational efficiency, estimation accuracy, error rates, and convergence speed. Implementations from existing R packages, including Stan-based MCMC and Variational Bayes algorithms, are used in the analysis. Performance is assessed through simulation studies generated with the simsurv package (Brilleman) under controlled conditions, as well as through validation on real-world data from the Primary Biliary Cirrhosis (PBC) study.

Results

The results will include a detailed comparison of model fitting times, estimation accuracy, error metrics, and convergence properties across the different approaches. The evaluation is ongoing, and comprehensive results will be presented at the conference. Future analyses will explore potential trade-offs in estimation bias and error, providing insights into the relative advantages of different inference methods for large-scale joint model applications.

Keywords

Joint Model, Variational Bayes, Bayesian Inference, MCMC, INLA, Longitudinal Data, Survival Analysis

References

Ormerod, J.T. and Wand, M.P. (2010). Explaining Variational Approximations. The American Statistician, 64(2), pp.140–153. doi: https://doi.org/10.1198/tast.2010.09058.
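For readers unfamiliar with 'simsurv', the sketch below shows the kind of call used to generate survival data for the simulation studies described in the Method section above; the sample size and parameter values are placeholders, not those of the study.

```r
# Simulating Weibull survival times with a binary treatment effect (illustrative).
library(simsurv)

set.seed(1)
covs <- data.frame(id = 1:500, trt = rbinom(500, 1, 0.5))

simdat <- simsurv(dist = "weibull", lambdas = 0.1, gammas = 1.5,
                  betas = c(trt = -0.5), x = covs, maxt = 5)
head(merge(covs, simdat, by = "id"))   # id, trt, eventtime, status
```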



posters-wednesday: 18

Confidence Intervals for Comparing Two Independent Folded Normals

Eleonora Di Carluccio1, Sarah Ogutu2, Ozkan Köse3, Henry G. Mwambi1,2, Andreas Ziegler1,2,4,5

1Cardio-CARE, Medizincampus Davos, Davos, Switzerland; 2School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzbug, South Africa; 3Orthopedics and Traumatology Department, Antalya Training and Research Hospital, Antalya, Turkey; 4Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany; 5Centre for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany

The absolute change in the angle measured immediately after surgery and after bone healing is a clinically relevant endpoint for judging the stability of an osteotomy. Assuming the difference in angles is normally distributed, the absolute difference follows a folded normal distribution. The confidence interval for the angle change of a novel fixation screw compared with a standard fixation screw may be used to evaluate non-inferiority. In this work, we suggest that the simple two-sample t-statistic or Welch statistic may serve as the basis for confidence interval calculations for the difference between two folded normals. The coverage probabilities of the derived confidence intervals are investigated by simulation. We illustrate the approaches with data from a randomized controlled trial and an observational study on hallux valgus (bunion) surgery. In the simulation studies, asymptotic as well as non-parametric and parametric bootstrap confidence intervals based on the t-statistic and the Welch test had coverage close to nominal levels. Methods based on chi-squared distributions were not deemed appropriate for comparing two folded normals. We recommend using confidence intervals based on the t-statistic or the Welch statistic for evaluating non-inferiority in trials where the stability of angles after osteotomy is to be compared.
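A simple non-parametric bootstrap version of such an interval can be sketched as follows (a plain percentile interval rather than the studentised t/Welch-based intervals studied here; x and y are hypothetical vectors of signed angle differences for the novel and standard screws).

```r
# Percentile bootstrap CI for the difference in mean absolute angle change (sketch).
set.seed(42)
theta_hat <- mean(abs(x)) - mean(abs(y))     # point estimate: difference of folded means

boot_diff <- replicate(5000,
  mean(abs(sample(x, replace = TRUE))) - mean(abs(sample(y, replace = TRUE))))

theta_hat
quantile(boot_diff, c(0.025, 0.975))         # compare the upper limit with the NI margin
```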



posters-wednesday: 19

Towards Realistic Synthetic Nanopore Protein Signals: A Comparative Study of Stochastic and GAN-Based Methods

Göran Köber1, Jonas Bürgel1, Tobias Ensslen1,2, Oliver Amft1,2

1University of Freiburg, Germany; 2Hahn-Schickard, Germany

Nanopores provide a powerful tool for molecular analysis, enabling direct, single-molecule measurements of nucleotides, peptides, and other biopolymers. However, developing machine learning models for tasks like peptide sequence recognition is challenging due to the scarcity of labeled training data, as experimental data collection is both expensive and time-consuming. Synthetic data generation offers a promising solution by providing high-quality, customizable datasets for algorithm development and benchmarking.

We develop and compare several techniques for generating synthetic nanopore protein data, leveraging both stochastic methods and deep learning approaches, with a particular focus on Generative Adversarial Networks (GANs). The generated signals can be of arbitrary lengths, reaching up to hundreds of thousands of steps—far exceeding commonly reported time series lengths in the literature. The generation process is structured into two phases. First, a flat reference signal is synthesized to mimic the general shape of a blockade. Next, fluctuation generation algorithms introduce the fluctuating patterns of experimental data into the reference signal.

Multiple signal generation algorithms are explored, starting with a simple Gaussian noise model as a baseline. More advanced stochastic approaches, combining cubic interpolation with Gaussian noise, produce signals that closely resemble real blockade events. Additionally, an RNN-WGAN architecture is developed to generate arbitrarily long, high-fidelity signals that are challenging to distinguish from experimentally observed data. To evaluate the quality of generated signals, a discriminative score is computed using an RNN classifier, complemented by dimensionality reduction on established feature extraction libraries for time series data.
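The stochastic baseline described above can be sketched in a few lines of R; all parameter values below (signal length, reference level, noise scales, number of anchor points) are illustrative.

```r
# Flat reference level + cubic-interpolated slow fluctuations + Gaussian noise.
set.seed(7)
n_steps   <- 50000                     # arbitrary signal length
ref_level <- -0.6                      # flat reference blockade level

anchors_x <- seq(1, n_steps, length.out = 200)
anchors_y <- rnorm(200, mean = 0, sd = 0.05)                # slow random fluctuations
slow      <- spline(anchors_x, anchors_y, n = n_steps)$y    # cubic interpolation

signal <- ref_level + slow + rnorm(n_steps, sd = 0.01)      # add fast Gaussian noise
plot(signal[1:2000], type = "l", xlab = "step", ylab = "current (a.u.)")
```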

We also provide a comparative analysis of stochastic and data-driven methods, examining both their qualitative and quantitative differences and find that GAN-based methods achieve the best overall results. To the best of our knowledge, this work is the first to introduce high-quality synthetic nanopore protein sensing data generation methods, paving the way for advanced machine learning applications and addressing the critical need for labeled, customizable synthetic datasets in the field.



posters-wednesday: 20

Data Transformations in Machine Learning Approaches for Studying Microbiota as a Biomarker of Non-Response Risk to CFTR Modulators

Marta Avalos1, Céline Hosteins1, Diego Kauer1, Chloé Renault1, Raphaël Enaud2, Laurence Delhaes2

1University of Bordeaux - Inria - Inserm BPH U1219, France; 2University of Bordeaux, CHU Bordeaux, Inserm U1045, France

Cystic fibrosis (CF) is a genetic disease caused by mutations in the CF transmembrane conductance regulator (CFTR) gene. Impaired mucociliary clearance and the accumulation of respiratory secretions, combined with an altered immune response and chronic treatments, disrupt the airway microbiota and mycobiota. These dysbioses, characterized by reduced microbial diversity and a predominance of opportunistic pathogens, correlate with disease severity and may serve as biomarkers for disease progression.

The introduction of CFTR modulator therapies has transformed CF management, significantly altering the disease’s clinical course by enhancing mucosal hydration and improving patient outcomes. However, response to these therapies remains highly variable among patients, underscoring the need for predictive biomarkers. The airway and digestive microbiota, which play a crucial role in disease progression, represent promising candidates. While bacterial and fungal dysbioses in CF are well documented, their potential as biomarkers for predicting therapeutic response remains poorly explored, posing significant methodological challenges.

Microbiome studies in CF typically involve small cohorts and high-dimensional data, often compositional, zero-inflated, and sometimes longitudinal. Moreover, integrating heterogeneous data sources—including bacterial and fungal communities from different anatomical sites (lung and gut) alongside clinical factors—is essential for building robust predictive models. This requires advanced statistical and machine learning approaches to address challenges in feature selection, model interpretability, and data integration.

In this study, based on CF patients from the French LumIvaBiota cohort, we examine how transformations of relative abundance data affect both the performance and interpretability of various linear (Lasso, PLS, PCA regression) and non-linear (SVM, Random Forest, Neural Networks) machine learning methods. We compare these approaches in their ability to predict non-response to CFTR modulators, balancing the trade-off between model complexity and interpretability—a key consideration for clinical application.
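
As an illustration of the kind of pipeline being compared (a sketch under assumptions, not the study's actual code: the centred log-ratio transform with a pseudocount for zeros is only one of the transformation choices examined, and the toy data are random):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def clr(relative_abundance, pseudocount=1e-6):
        """Centred log-ratio transform of compositional data (samples x taxa)."""
        x = relative_abundance + pseudocount              # crude handling of zero inflation
        logx = np.log(x)
        return logx - logx.mean(axis=1, keepdims=True)

    rng = np.random.default_rng(0)
    X_rel = rng.dirichlet(np.ones(50), size=40)           # 40 patients, 50 taxa (toy data)
    y = np.repeat([0, 1], 20)                             # toy non-response indicator

    # sparse linear classifier (Lasso-type penalty) on the transformed abundances
    model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(clr(X_rel), y)
    print((model.coef_ != 0).sum(), "taxa selected")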

Our findings provide insights into best practices for microbiome-based predictive modeling in CF and offer methodological guidance on selecting appropriate data transformations and machine learning frameworks for biomarker discovery in high-dimensional biological datasets.



posters-wednesday: 21

On microbiome data analysis using Bayesian method under the assumption of a zero-inflated model

Yuki Ando, Asanao Shimokawa

Tokyo University of Science, Japan

Background / Introduction
The data on the abundance of microbial groups are called microbiome data. One purpose of analysing microbiome data is to compare the abundance of microbes between test subjects with different conditions. Microbiome data have two main characteristics: firstly, the abundance is discrete, and secondly, the data contain an excessive number of zeros. To allow comparisons between subjects with different total microbial abundances, the abundance is sometimes converted into a proportion, which is the approach we take in this study. In this case, the abundance takes a continuous value between 0 and 1. The zero-inflated beta model is the most commonly used population distribution for such abundances: with a certain probability the abundance follows a beta distribution, and otherwise it equals 0. Furthermore, Chen and Li (2016) proposed expressing the probability that the abundance follows a beta distribution, as well as the parameters of the beta distribution, by logistic regression models with the subjects' covariates as explanatory variables. We examine methods for estimating the parameters of this model.

Methods
The parameters of the zero-inflated beta model are typically estimated by maximum likelihood. However, since the maximum likelihood estimates cannot be obtained analytically, iterative methods such as the EM algorithm are used in combination. One drawback of this approach is that it does not yield good estimates when the sample size of the microbiome data, i.e., the number of subjects, is small. We therefore consider estimating the parameters using Bayesian methods.
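
The sketch below illustrates the Bayesian alternative on a toy version of the model without covariates (the weakly informative priors and the plain random-walk Metropolis sampler are assumptions made for brevity; the actual model places regression structures on the parameters):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Toy zero-inflated beta data: abundance is 0 with probability 1-p,
    # otherwise drawn from Beta(a, b).
    p_true, a_true, b_true, n = 0.6, 2.0, 8.0, 30
    nonzero = rng.random(n) < p_true
    y = np.where(nonzero, rng.beta(a_true, b_true, size=n), 0.0)

    def log_post(theta):
        """Log posterior on the unconstrained scale (logit p, log a, log b)."""
        p = 1 / (1 + np.exp(-theta[0]))
        a, b = np.exp(theta[1]), np.exp(theta[2])
        ll = np.sum(np.where(y == 0, np.log1p(-p),
                             np.log(p) + stats.beta.logpdf(np.clip(y, 1e-12, None), a, b)))
        lp = stats.norm.logpdf(theta, 0, 2).sum()   # weakly informative priors (assumption)
        return ll + lp

    # Random-walk Metropolis sampler
    theta, draws = np.zeros(3), []
    for _ in range(5000):
        prop = theta + rng.normal(0, 0.2, size=3)
        if np.log(rng.random()) < log_post(prop) - log_post(theta):
            theta = prop
        draws.append(theta)
    draws = np.array(draws[1000:])
    print("posterior mean of p:", (1 / (1 + np.exp(-draws[:, 0]))).mean())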

Results
We applied the maximum likelihood method and Bayesian methods to simulation data and compared the obtained estimates. We found that the Bayesian method worked well in situations with small sample sizes.

Conclusion
We addressed parameter estimation under a zero-inflated beta model for microbiome data.
Our results suggest using the Bayesian method rather than the maximum likelihood method for microbiome data with small sample sizes.

Reference
Chen E.Z. and Li H. 2016. A two-part mixed-effects model for analyzing longitudinal microbiome compositional data. Bioinformatics 32 (17): 2611–2617.



posters-wednesday: 22

Estimating prevalence of cystic fibrosis heterozygosity using Fast Boltzmann Inversion (FBI): An improved Monte Carlo algorithm for Bayesian inference

Jan Brink Valentin

Danish Center for Health Services Research, Department of Clinical Medicine, Aalborg University, Denmark

Background: Monte Carlo sampling of probabilistic potentials is often biased if the density of states is not properly managed. Bias is further induced when applying shrinkage priors to avoid overfitting. Boltzmann inversion (BI) provides a generic sampling scheme to avoid such bias. However, because of the iterative nature of BI, the algorithm is often intractable. In this study, we developed a fast Boltzmann inversion (FBI) algorithm with the same computational complexity as the standard Metropolis-Hastings (MH) algorithm and applied the method to estimating the heterozygous carrier prevalence of cystic fibrosis (CF).

Case: CF is a rare genetic disease which has a substantial impact on patients' health, daily living and overall survival. The disease is inherited from both parents, and in Denmark children have been screened for CF at birth since 2016. While the incidence rate can be estimated using patient registers, the heterozygous carrier prevalence is not easily obtained.

Method: We applied a simple two parameter probabilistic model for the probability of having CF conditioning on being the first-born child in a family. This probability was considered the target distribution, and the model parameters included the proportion of heterozygous carriers. We linked the Danish national patient register with the central person register to estimate the mean and variance of the target distribution. We applied shrinkage priors for the model parameters with low, moderate, and strong shrinkage, to avoid overfitting. The FBI algorithm was then used to estimate the model parameters, and the results were compared to that of the MH algorithm.

Results: Using the register data, the target probability was estimated to be 2.38 per 10,000. The MH algorithm with low, moderate and strong shrinkage was biased, missing the target by 0.16, 0.25 and 0.35 per 10,000, respectively. The FBI algorithm with low and moderate shrinkage was on target in both cases, with bias below 0.001, and estimated the proportion of heterozygous carriers in the Danish population to be 3.05 percent (SE = 0.54). However, the FBI algorithm with strong shrinkage did not converge. Finally, the FBI algorithm required the same computational time as the MH algorithm.
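
As a rough external consistency check (our own back-of-envelope calculation, not part of the study's model), the reported carrier proportion and the target probability are approximately compatible under random mating and fully penetrant autosomal recessive inheritance:

    # Back-of-envelope check (assumes random mating, both parents unaffected carriers,
    # 1/4 transmission probability); numbers are taken from the abstract.
    carrier = 0.0305                    # estimated heterozygous carrier proportion
    p_affected = carrier**2 * 0.25      # c^2 (both parents carriers) x 1/4
    print(p_affected * 10_000)          # ~2.33 per 10,000, close to the 2.38 target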

Conclusion: The FBI algorithm provides an unbiased estimate when applying shrinkage estimators, without increasing computational time compared to other Monte Carlo algorithms. In addition, the FBI algorithm reduces the issue of setting the hyperparameters of the prior distributions in a Bayesian context.



posters-wednesday: 23

A Sparse Graph Representation of Hi-C Data for Colorectal Cancer Prediction

Jiwon Im1, Mingyu Go2, Insu Jang3, Minsu Park1

1Department of Statistics and Data Science, Chungnam National University, Republic of Korea; 2Graduate School of Data Science, KAIST, Republic of Korea; 3Korea Research Institute of Bioscience and Biotechnology, Republic of Korea

Colorectal cancer (CRC) remains a leading cause of cancer-related morbidity and mortality, emphasizing the critical need for early and precise predictive modeling. While advances in genomics and deep learning have enabled computational cancer classification, existing models often face challenges in capturing the complexity of chromatin organization and high-dimensional genomic data.

This study presents a graph-based predictive framework utilizing high-throughput chromosome conformation capture (Hi-C) data from chromosome 18, a region implicated in CRC pathogenesis. The method constructs a sparse weighted graph from chromatin interactions and applies a graph neural network for classification. An optimal bandwidth selection technique removes redundant connections while retaining key genomic relationships to enhance computational efficiency and interpretability.
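
A minimal sketch of the sparsification step (illustrative only: the bin count, contact counts and bandwidth are hypothetical, and the graph neural network classifier itself is not shown):

    import numpy as np
    from scipy import sparse

    rng = np.random.default_rng(0)
    n_bins = 200
    hic = rng.poisson(5.0, size=(n_bins, n_bins)).astype(float)
    hic = np.triu(hic) + np.triu(hic, 1).T                 # toy symmetric contact matrix

    def sparse_band_graph(contact, bandwidth):
        """Keep only interactions within `bandwidth` bins of the diagonal."""
        i, j = np.nonzero(contact)
        keep = np.abs(i - j) <= bandwidth
        return sparse.coo_matrix((contact[i[keep], j[keep]], (i[keep], j[keep])),
                                 shape=contact.shape).tocsr()

    adj = sparse_band_graph(hic, bandwidth=10)   # bandwidth would be tuned, e.g. by validation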

Experimental evaluations on real-world Hi-C datasets indicate that the proposed approach achieves competitive classification accuracy while improving F1-score and precision-recall performance with reduced training complexity. These findings suggest that sparse graph-based Hi-C analysis may be a useful framework for CRC prediction and contribute to graph representation learning in genomic medicine.

Keywords: Hi-C, graph neural network, sparse graph representation, CRC classification



posters-wednesday: 24

A multi-state survival model for recurring adenomas in Lynch syndrome individuals

Vanessa García López-Mingo, Veerle Coupé, Marjolein Greuter, Thomas Klausch

Amsterdam UMC, The Netherlands

Introduction

Lynch syndrome is a genetic condition that predisposes individuals to develop colorectal cancer (CRC). It is characterized by a deficiency in the mismatch repair (MMR) system occurring early in life, leading to an increased risk of accumulating DNA damage. Individuals with Lynch syndrome develop adenomas, precursor lesions to CRC, in the bowel at a higher rate than the general population. This necessitates close surveillance of affected individuals by colonoscopy. Although surveillance intervals are short (one to three years), continued surveillance is needed to manage CRC risk throughout life.

Based on surveillance data from Lynch syndrome patients, this study aims to estimate the time to repeated non-advanced adenoma (nA) formation and progression to advanced adenomas (AA) or CRC. We develop a novel multi-state survival model that, unlike existing models, handles the recurring adenomas that characterize Lynch syndrome.

Methods

The model treats adenoma formation as panel count data, where the occurrence of recurrent adenomas is observed only at a sequence of discrete time points (colonoscopies). Specifically, the development of nAs is modelled as a Poisson process, with a modification to account for the delay associated with the onset of MMR deficiency around the time of the first nA, incorporated through a Weibull model. Immediately afterwards, a Poisson process for later nAs is initialized. Furthermore, every adenoma is assumed to progress to AA or CRC, with the sojourn time also modelled as Weibull distributed. All sojourn times are regressed on covariates such as sex and the affected gene to uncover heterogeneity. A Bayesian Metropolis-within-Gibbs sampler, combined with data augmentation for the latent times, is employed to estimate the parameters.
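
A toy simulation of the assumed data-generating process (parameter values are illustrative only, and the observation scheme at colonoscopies is omitted):

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate_patient(t_max=40.0, shape=1.8, scale=25.0, rate=0.15,
                         prog_shape=1.5, prog_scale=30.0):
        """Toy version of the assumed process: Weibull delay to the first non-advanced
        adenoma (nA), then a homogeneous Poisson process for later nAs; every adenoma
        gets a Weibull-distributed sojourn time until progression to AA/CRC."""
        t_first = scale * rng.weibull(shape)                  # first nA (MMR-deficiency delay)
        onsets = [t_first]
        t = t_first
        while True:                                           # later nAs: Poisson process
            t += rng.exponential(1.0 / rate)
            if t > t_max:
                break
            onsets.append(t)
        progression = [t0 + prog_scale * rng.weibull(prog_shape) for t0 in onsets]
        return np.array(onsets), np.array(progression)

    onsets, progression = simulate_patient()
    # In practice these times are latent; only counts between colonoscopies are observed.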

Results

In initial Monte Carlo simulations, we found good performance of the estimation procedure, with unbiased estimates and good mixing across the chains. Additionally, the coverage of the credible intervals matched the nominal level of 95%. At ISCB we will present details of the application to the Lynch syndrome patient data, which is currently in progress.

Conclusion

This study presents a novel model for analysing adenoma development in Lynch syndrome, accounting for the impact of MMR deficiency. By combining a delay for the first adenoma with a Poisson process for the recurrent ones, we capture the dynamics of adenoma development in Lynch syndrome more accurately than existing multi-state screening models such as “msm” (Jackson, 2011). Developing dedicated models for disorders like Lynch syndrome could help improve prevention of CRC in affected groups.



posters-wednesday: 25

Evaluating Completeness of Data in CPRD’s Breast Cancer Data: Implications for External Controls for Surrogate Endpoints

Dorcas N Kareithi1,3, Jingky Lozano-Kuehne1, David Sinclair2, James Wason1

1Biostatistics Research Group, Newcastle University, United Kingdom; 2Older People and Frailty Policy Research Unit, Newcastle University, United Kingdom; 3Jasiri Cancer Research Foundation Kenya

Background: Registries such as CPRD Aurum and national cancer registries offer valuable sources of observational data that can be used as external or historical controls for cancer clinical trials and other health research. However, an extensive review of evidence has shown that the investigation of alternative measurements from routinely collected data is dependent on access, validity, and completeness of such data. This study evaluates the completeness, patterns and impact of missing data in breast cancer patients using data from CPRD Aurum and Cancer Registration and Treatment datasets.

Methods: We used linked datasets from CPRD Aurum, the Tumour Registration dataset, and the Treatment Characteristics dataset to identify and extract breast cancer cases (ICD-10: C50). Key clinical variables, including demographic characteristics, tumour type and size, comorbidity score, tumour screening, tumour treatment, and cancer stage, from female patients aged 18 and above in 2005 were analysed. Completeness of data in 6-month follow-up periods post-diagnosis from 2005 to 2024 and patterns of missing data were assessed using descriptive statistics and Little's MCAR test to determine missingness mechanisms. No imputation methods were applied, as the focus was on understanding completeness and the extent and impact of missingness.

Results: Preliminary findings from 2.9M records of 68,613 participants meeting our inclusion and exclusion criteria indicate high completeness (>90%) for most demographic characteristics and most observation event dates (except hospitalisation date), high completeness (>90%) for most tumour stage and tumour characteristics variables (except PR and ER scores), high completeness (>90%) for most tumour treatment variables, and moderate completeness (>60%) for quality-of-life variables. Preliminary time-to-event analyses suggest that incomplete data used to compute surrogate outcomes, such as the quality-of-life data, could affect the derivation, computation and estimation of key established surrogate endpoints such as disease-free survival (DFS), time to next treatment (TTNT), event-free survival (EFS) and overall survival.

Conclusion: The preliminary findings highlight the value of registry data for use as external or historical controls in cancer clinical trials and other health research, but they caution against potential biases introduced by incomplete data, which may affect clinical interpretations and policy decisions.



posters-wednesday: 26

Identification of risk factors for the development of de novo malignancies after liver transplantation

Tereza Hakova1, Pavel Taimr1, Tomáš Hucl1, Zdeněk Valenta2

1Dept. of Hepatogastroenterology, Institute of Clinical and Experimental Medicine, Prague, Czechia; 2Dept. of Statistical Modelling, Institute of Computer Science of the Czech Academy of Sciences, Prague, Czechia

Background

De novo malignancies (DNM) are a significant long-term complication in liver transplantation, immunosuppressive therapy being a key contributing factor. While necessary to prevent graft rejection, exposure to immunosuppressants may increase the risk of post-transplant malignancies. Identifying risk factors for DNM is crucial to improving post-transplant management strategies. This study uses the cohort of liver transplant patients aged 18 years and older to study competing risks of the incidence of DNM or death, focusing on the role of cumulative exposure to immunosuppressants. Independent prognostic factors, such as the age of donor/recipient, gender, smoking, diabetes status, etc. were adjusted for in the models. We hypothesised that high cumulative doses of immunosuppressants could correlate with an increased incidence of malignancies, suggesting the need for individualised immunosuppression strategies.

Methods

Retrospective right-censored and left-truncated cohort data on 1,073 liver transplant patients aged 18 years or more were used to study competing risks of the incidence of cancerous disorders and death in the context of immunosuppression following transplantation (TX). We studied the effect of cumulative exposure to several immunosuppressants (Azathioprin, Cyclosporin A, Mycophenolate Mofetil, Prednison, Simulect, Sirolimus and Tacrolimus), adjusted for possible confounders. Cause-specific survival models, including the Cox PH model, Aalen's additive model and its McKeague-Sasieni extension, were employed in analysing the effect of left-truncated, time-dependent immunosuppression doses on the right-censored outcomes. The follow-up period was limited to 10 years.

Results

Results for time-to-malignancy data showed that male gender and higher recipient and donor age were associated with an elevated hazard of malignancy incidence. Immunosuppression with Mycophenolate, Sirolimus and Tacrolimus was associated with a reduction in the hazard, while Simulect had an adverse effect on malignancy incidence.

Higher recipient age, smoking and male gender had an adverse effect on the hazard of death. Mycophenolate, Prednison, Sirolimus and Tacrolimus were all associated with a protective effect on the incidence of death. The latter drug had a time-varying protective effect, which was strongest during the first few months after TX.

Conclusion

Our results bring new insights into immunosuppressive treatment. Inconsistency with published studies may be due to a different methodology and the patient population. This study highlights the importance of monitoring immunosuppressive drug levels and controlling modifiable risk factors, such as smoking, in liver transplant recipients. Understanding the multifactorial nature of post-transplant malignancies can lead to improved patient management.



posters-wednesday: 27

Are ACE inhibitors associated with increased lung cancer risk, or are unmeasured confounders biasing results?

Sean Maguire, Ruth Keogh, Elizabeth Williamson, John Tazare

London School of Hygiene & Tropical Medicine, United Kingdom

Background: Unmeasured confounding is nearly always a concern in observational studies of treatment effects. However, despite methods being available to assess its potential impact, it is often ignored. We illustrate methods for assessing the impact of unmeasured confounding through a study of ACE inhibitors and ARBs, commonly prescribed drugs for the treatment of high blood pressure. Safety concerns for ACE inhibitors, raised by observational studies reporting higher lung cancer risk in ACE inhibitor users relative to ARB users, together with inconsistent findings in subsequent observational studies, may have been caused by unmeasured confounding.

Methods: Using data from the Clinical Practice Research Datalink, we identified a cohort of UK adults who initiated ACE inhibitor or ARB treatment for the first time between 1995 and 2019, and fitted a Cox model for the outcome of lung cancer, adjusting for a number of measured confounders. A conditional hazard ratio was estimated, accounting for competing events.
E-values were used to quantify the potential impact of unmeasured confounding on the effect estimates. E-values, introduced in 2017 by VanderWeele and Ding, quantify the minimum strength of association that an unmeasured confounder (or set of unmeasured confounders) would need to have with both ACE inhibitor use and lung cancer incidence in order to 'tip' our results and change our conclusions.
Covariate e-values, introduced by D'Agostino McGowan and Greevy in 2020, were also calculated to contextualise the potential impact of unmeasured confounding in previous ACE inhibitor and lung cancer studies in the literature.

Results: Our cohort contained 984,000 initiators of an ACE inhibitor or ARB. We found no evidence that ACE inhibitor use is associated with increased lung cancer risk (conditional hazard ratio = 0.997, 95% CI 0.924 – 1.076). Our investigations using e-values show that this result could easily be tipped to a significantly harmful or protective effect. Similar results were found for the previous studies in the literature.
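
For reference, the e-value arithmetic behind this statement is straightforward (treating the hazard ratio as an approximate risk ratio, as is common for rare outcomes; this is our own illustration, not the authors' code):

    import numpy as np

    def e_value(rr):
        """E-value for a risk ratio estimate (VanderWeele & Ding, 2017)."""
        rr = max(rr, 1 / rr)                 # direction of the association does not matter
        return rr + np.sqrt(rr * (rr - 1))

    hr, lo, hi = 0.997, 0.924, 1.076         # estimate and 95% CI from the abstract
    print(round(e_value(hr), 3))             # ~1.06: a very weak confounder could shift the estimate
    print(1.0 if lo <= 1.0 <= hi else None)  # the CI already crosses 1, so its e-value is 1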

Conclusion: Through quantitative bias analysis using e-values, we found that studies reporting either protective or harmful effects of ACE inhibitors are likely biased by unmeasured confounding.



posters-wednesday: 28

DO GENETIC CHANGES IN 15Q13.3 MEAN LOWER IQ SCORE?

Tadas Žvirblis1, Pilar Caro2, Audronė Jakaitienė1, Christian Schaaf2

1Institute of Data Science and Digital Technologies, Vilnius University; 2Institute of Human Genetics, Heidelberg University

Background. Genetic changes affecting the copy number of chromosome 15q13.3 have been associated with a group of rare neurodevelopmental conditions (autism spectrum disorder, epilepsy, schizophrenia, and others) [1]. The critical region contains approximately 10 genes. Treatments are limited and are restricted to targeting the main symptoms rather than the underlying etiology. Not every person harboring a 15q13.3 copy number change will manifest the disease, and the severity and clinical diagnosis are difficult to predict [2]. This represents a significant challenge in modelling and determining health outcomes.

Methods. A multi-center prospective study was conducted to assess alterations in cerebral activity and neural network function in individuals with 15q13.3 microdeletion or microduplication. It was planned to enroll 15 subjects for each aberration, as well as 15 healthy subjects. During the study period, electrophysiological brain network analysis, IQ testing and detailed genetic analysis were performed for each subject. All subjects provided written informed consent. The study protocol was approved by the Ethics Board of the Medical Faculty of Heidelberg University (No. S-212-2023).

Results. Six subjects with copy number changes of chromosome 15q13.3 were identified at the interim statistical analysis. Five (83.3%) had a 15q13.3 deletion and one (16.7%) a duplication. The mean (SD) age was 27.5 (11.36) years; 2 (33.3%) were of adolescent age, and half (50.0%) of the subjects were male. The mean (SD) IQ score was 76.7 (18.14), which was statistically significantly lower than the average population IQ score (p = 0.027). The mean (SD) IQ score was slightly higher in males than in females: 79.7 (22.03) vs. 73.7 (17.62), respectively.
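
The reported p-value can be approximately reproduced from the summary statistics, assuming a one-sample t-test against a fixed population mean of 100 (our own check; the study may have used a slightly different reference value or test):

    import numpy as np
    from scipy import stats

    mean, sd, n, mu0 = 76.7, 18.14, 6, 100     # summary statistics from the abstract; mu0 assumed
    t = (mean - mu0) / (sd / np.sqrt(n))
    p = 2 * stats.t.sf(abs(t), df=n - 1)
    print(round(t, 2), round(p, 3))            # ~ -3.15, p ~ 0.026 (abstract reports 0.027)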

Conclusion. The interim statistical analysis showed that subjects with a 15q13.3 microdeletion or microduplication have a lower IQ score than the average population.

Funding. This work is part of the EJP RD project “Resolving complex outcomes in 15q13.3 copy number variants using emerging diagnostic and biomarker tools (Resolve 15q13)” No. DLR 01GM2307 and has received funding from EJP RD partner the Research Council of Lithuania (LMTLT) under grant agreement No. S-EJPRD-23-1.

Keywords. rare neurodevelopmental conditions, 15q13.3 microdeletion, 15q13.3 microduplication

References

[1] Gillentine MA, Schaaf CP. The human clinical phenotypes of altered CHRNA7 copy number. Biochem Pharmacol. 2015;97(4):352-62.

[2] Yin J, Chen W, Yang H, Xue M, Schaaf CP. Chrna7 deficient mice manifest no consistent neuropsychiatric and behavioral phenotypes. Sci Rep. 2017 Jan 3;7:39941. PMCID: PMC5206704



posters-wednesday: 29

Measuring the performance of survival models to personalize treatment choices

Orestis Efthimiou1, Jeroen Hoogland2, Thomas Debray3, Valerie Aponte Ribero1, Wilma Knol4, Huiberdina Koek4, Matthias Schwenkglenks5, Séverine Henrard6, Matthias Egger7, Nicolas Rodondi1, Ian White8

1University of Bern (Switzerland); 2Amsterdam University Medical Centers (The Netherlands); 3Smart Data Analysis and Statistics B.V. (The Netherlands); 4Utrecht University (The Netherlands); 5University of Basel (Switzerland); 6UCLouvain (Belgium); 7University of Bern (Switzerland), University of Bristol (UK), University of Cape Town (South Africa); 8University College London (UK)

Background: Statistical and machine learning algorithms can be used to predict treatment effects at the participant level using data from randomized clinical trials (RCTs). Such predictions can facilitate individualized treatment decisions. Although various methods have been proposed to assess the accuracy of participant-level treatment effect predictions, it remains unclear how they can be applied to survival data.

Methods: We propose new methods to quantify individualized treatment effects for survival (time-to-event) outcomes. First, we describe alternative definitions of participant-level treatment effects for survival outcomes. Next, we summarize existing and introduce new measures to evaluate the performance of models predicting participant-level treatment effects. We explore metrics for assessing discrimination, calibration, and decision accuracy of such predictions. These generic metrics are applicable to both statistical and machine learning models and can be used during model development (e.g., for model selection or internal validation) or when testing models in new settings (e.g., external validation). We illustrate our methods using both simulated data as well as real data from the OPERAM trial, an RCT involving multimorbid older adults randomized to either standard care or a pharmacotherapy optimization intervention. We fit competing statistical and machine learning models and apply our newly developed methods to compare their performance.

Results: Analyses of simulated data demonstrated the utility of our metrics in evaluating the performance of models predicting participant-level treatment effects. Application in OPERAM revealed that the models we developed performed sub-optimally, with moderate-to-poor performance in calibration and poor performance in discrimination and decision accuracy, when predicting individualized treatment effects.

Conclusion: Our methods are applicable for models aimed at predicting participant-level treatment effects for survival outcomes. They are suitable for both statistical and machine learning models and can guide model development, validation, and potential impact on decision making.



posters-wednesday: 30

A framework for estimating quality adjusted life years using joint models of longitudinal and survival data

Michael Crowther1, Alessandro Gasparini1, Sara Ekberg1, Federico Felizzi2, Elaine Gallagher3, Noman Paracha3

1Red Door Analytics AB, Stockholm, Sweden; 2Department of Computer Science, ETH Zurich, Switzerland; 3Bayer Pharmaceuticals, Basel, Switzerland

Background

Quality of life (QoL) scores are integral to cost-effectiveness analysis, providing a direct quantification of how much time patients spend at different severity levels. There are a variety of statistical challenges in modeling and utilizing QoL data appropriately. QoL data, and other repeatedly measured outcomes such as prostate-specific antigen (PSA), are often treated as time-varying covariates, which only change value when a new measurement is taken; this is biologically implausible. Additionally, such data often exhibit both between- and within-subject correlations, which must be taken into account, and are associated with survival endpoints. The proposed framework utilizes "progression" or similar intermediate endpoints or biomarkers like EQ-5D, and models them jointly with overall survival, allowing us to directly calculate quality-adjusted life years (QALYs).

Methods

Motivated by the prostate cancer trial setting, we simulated data representing repeatedly measured PSA levels, utilities and overall survival. Using numerical integration and the delta method, we then derive analytical estimates of QALYs, differences in QALYs and restricted time horizon QALYs from the estimated multivariate joint model, along with uncertainty.
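
Conceptually, the QALY estimand reduces to integrating the product of the marginal survival function and the expected utility trajectory over the time horizon. The sketch below uses simple placeholder functions in place of the quantities that would come from the fitted joint model (delta-method uncertainty is not shown):

    import numpy as np

    t = np.linspace(0, 20, 2001)                 # years (restricted time horizon)
    surv = np.exp(-(t / 12.0) ** 1.3)            # placeholder marginal survival function
    utility = np.clip(0.80 - 0.01 * t, 0, 1)     # placeholder expected utility trajectory
    qaly = np.trapz(surv * utility, t)           # QALYs = integral of S(t) * u(t) dt
    print(round(qaly, 3))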

Results

PSA and utilities were modeled flexibly using linear mixed effects submodels with restricted cubic splines to capture the nonlinear development over follow-up time. An interaction with treatment was also included to allow different trajectories in those treated and those on placebo. Both PSA and utility were linked to survival through their current value and slopes, with a Weibull survival submodel. Treatment was estimated to provide an additional 1.074 QALYs (95% CI: 0.635, 1.513) across a lifetime horizon.

Conclusion

Deriving QALYs from a joint model of longitudinal and survival data accounts for all of the statistical and biological intricacies of the data, providing a more appropriate, and accurate, estimate for use in cost-effectiveness modeling, and hence reducing uncertainty.



posters-wednesday: 31

Cut Off Determination using Model Derived Estimate in Survival Prediction Model

Jungbok Lee

Asan Medical Center & Univ of Ulsan College of Medicine, South Korea

For practical clinical applications, various prediction models related to disease onset, risk, prognosis, and survival are being developed using EMR or clinical research data. The score generated by these models is used as a measure of risk, often categorized for practical purposes. Determining an appropriate cutoff for score categorization has become a topic of interest. For example, in the case of time-to-event outcomes, the most intuitive method is to identify a cutoff that maximizes the log-rank test statistic. However, the method based on test statistics has the limitation that the cutoff may vary depending on the distribution of the dataset used for model building.
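
For context, the test-based approach described above can be sketched as a search for the cutoff maximising the log-rank statistic (a hypothetical illustration with simulated data, using the lifelines library; the model-derived procedure proposed in this work is different):

    import numpy as np
    from lifelines.statistics import logrank_test

    def maxstat_cutoff(score, time, event, quantile_range=(0.1, 0.9)):
        """Cutoff maximising the log-rank statistic over an inner range of the score."""
        lo, hi = np.quantile(score, quantile_range)
        best_c, best_stat = None, -np.inf
        for c in np.unique(score[(score >= lo) & (score <= hi)]):
            high = score > c
            res = logrank_test(time[high], time[~high],
                               event_observed_A=event[high], event_observed_B=event[~high])
            if res.test_statistic > best_stat:
                best_c, best_stat = c, res.test_statistic
        return best_c

    rng = np.random.default_rng(0)
    score = rng.normal(size=300)
    raw = rng.exponential(5.0 / (1.0 + (score > 0)), size=300)   # higher score, higher hazard
    event = (raw < 6.0).astype(int)
    time = np.minimum(raw, 6.0)
    print(maxstat_cutoff(score, time, event))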

In this study, we present:

1) A phenomenon where the cutoff varies when there are relatively many or few high- or low-risk subjects in the training set.

2) A hazard estimation procedure using a piecewise hazard model and resampling method for survival data.

3) Cutoff criteria for when the hazard rate estimated by the model follows linear, parabolic, cubic, logarithmic, or exponential curves.

4) A proposed resampling procedure to account for variation in the distribution of events, based on the initially estimated cutoff value.

Determining cutoffs based on tests in survival data is dependent on the distribution of scores, censoring rates, and sample size. The method using model-derived estimates can help adjust for these dependencies. The term "optimal" in cutoff determination is limited to the original dataset and the testing method used to identify the cutoff.



posters-wednesday: 32

A two-step testing approach for comparing time-to-event data under non-proportional hazards

Jonas Brugger1,2, Tim Friede2, Florian Klinglmüller3, Martin Posch1, Franz König1

1Medical University of Vienna, Vienna, Austria; 2University Medical Center Göttingen, Germany; 3Austrian Agency for Health and Food Safety, Vienna, Austria

The log-rank test and the Cox proportional hazards model are commonly used to compare time-to-event data in clinical trials, as they are most powerful under proportional hazards. However, there is a loss of power if this assumption is violated, which is the case for some new oncology drugs such as immunotherapies. We consider a two-stage test procedure in which the weighting of the log-rank test statistic depends on a pre-test of the proportional hazards assumption; that is, depending on the pre-test, either the log-rank test or an alternative test is used to compare the survival probabilities. We show that, if naively implemented, this can lead to a substantial inflation of the type-I error rate. To address this, we embed the two-stage test in a permutation test framework to keep the nominal level alpha. We compare the operating characteristics of the two-stage test with the log-rank test and other tests by clinical trial simulations.
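
A minimal sketch of the permutation wrapper (the two-stage statistic itself, i.e. the proportional hazards pre-test followed by the log-rank or a weighted alternative, is left as a user-supplied function and is an assumption of this illustration):

    import numpy as np

    def permutation_two_stage(time, event, group, stat_fn, n_perm=2000, seed=1):
        """Permutation p-value for a two-stage test statistic.
        `stat_fn(time, event, group)` is assumed to run the PH pre-test internally
        and return either the log-rank or an alternative (e.g. weighted log-rank)
        statistic, with larger values indicating stronger evidence."""
        rng = np.random.default_rng(seed)
        obs = stat_fn(time, event, group)
        perm = np.array([stat_fn(time, event, rng.permutation(group))
                         for _ in range(n_perm)])
        return (1 + np.sum(perm >= obs)) / (1 + n_perm)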



posters-wednesday: 33

A systematic review of Bayesian survival modelling for extrapolating survival in cost-effectiveness analysis

Farah Erdogan1, Gwénaël Le Teuff1,2

1Oncostat, INSERM U1018, France; 2Department of Biostatistics and Epidemiology, Gustave Roussy, Paris-Saclay University, France

Background: Cost-effectiveness analysis (CEA) aims to evaluate the clinical and economic impact of health interventions. In some settings, such as oncology, CEA requires estimation of long-term benefits in terms of life years. Survival extrapolation is then necessary when clinical trials have limited follow-up data. As highlighted in the NICE Technical Support Document (TSD) 21 (2020), the Bayesian approach offers a flexible framework for incorporating external information into survival modelling and addressing uncertainty in survival prediction. This work aims to report how Bayesian methods are used to incorporate external information for extrapolating long-term survival in CEA.

Methods: We conducted a systematic review up to October 2024 to identify both methodological and non-methodological studies using different electronic databases (PubMed, Scopus, ISPOR conference database), completed by handsearching references cited in the automatically identified studies.

Results: Of 52 selected studies (77% identified automatically and 23% manually), 52% were published since 2022 and 90% (n=47) focused on oncology. 52% (n=27) were articles and 38% (n=20) were methodological works. 87% (n=45) used external data from different sources (clinical, registry, epidemiology, real-world data, general population mortality) and 17% (n=9) used expert elicitation. We classified the studies into four non-mutually exclusive categories of Bayesian modelling (C1-C4). The first three categories combine, in order of increasing complexity, survival modelling and a Bayesian formulation for incorporating external information. C1 (27%, n=14) includes standard parametric models (SPMs: exponential, Weibull, Gompertz, lognormal, log-logistic, generalized gamma distributions) with priors on parameters informed by historical data. C2 (48%, n=25) includes (i) Bayesian multi-parameter evidence synthesis that allows combining trial and external data, (ii) joint modelling of progression-free survival and overall survival, and (iii) non-SPMs (e.g., mixture and cure models). C3 (27%, n=14) groups complex hazard regression models (e.g., poly-hazard, relative survival) incorporating disease-specific and general population mortality, with predominantly non-informative prior distributions on parameters. The last category (C4, 13%, n=7) represents Bayesian model averaging, which weights the predictions of different survival models by posterior model probabilities to address structural uncertainty.

Conclusion: This review highlights the broad spectrum of Bayesian survival models and the different ways of incorporating external information, resulting in reduced uncertainty in survival extrapolation. Future research should focus on comparing these methods to identify the most suitable approaches given the intervention mechanisms and external data availability. This will help to standardize the use of Bayesian statistics for survival extrapolation and provide guidance, as proposed in the NICE TSD 14 on survival model selection procedures.



posters-wednesday: 34

Power comparison of hazard ratio versus restricted mean survival time in the presence of cure

Ronald Bertus Geskus1,2

1Oxford University Clinical Research Unit, Vietnam; 2University of Oxford, United Kingdom

Background: The log-rank test and the Cox proportional hazards model lose power with non-proportional or crossing hazards. A large simulation study did not show consistently superior performance of restricted mean survival time (RMST) over the log-rank test in such settings [2]. That study considered neither the presence of cure nor the presence of an independent predictor of survival.
In a randomized controlled trial (RCT) investigating the effect of dexamethasone on survival in patients with tuberculous meningitis (TBM), the hazard ratio was the primary effect measure [1]. Baseline MRC grade strongly affected 12-month survival. Testing for a difference in RMST gave a lower p-value than the hazard ratio: 0.14 versus 0.22, and 0.075 versus 0.21 when adjusting for MRC grade. We performed a simulation study to investigate the gain in power of RMST.

Methods: For each scenario we simulated 3000 data sets from Weibull distributions with two treatment arms and a three-level categorical predictor of survival representing MRC grade. Weibull parameters were estimated from the RCT, after exclusion of the 12-month survivors. The sample size including survivors was 700. We also considered approximate scenarios assuming proportional hazards in the non-survivors; note that proportionality is lost once the survivors are included. Models with and without interaction with MRC grade were fitted. In an additional scenario we generated data with survival curves that diverge and then converge at 12 months, with mortality between 50% and 100%.

Results: All numbers refer to power, computed as the percentage of simulation runs that gave p-value below 0.05 for the test of treatment effect. With parameters according to the TBM data set, RMST outperforms the hazard ratio (43% versus 36%). Further improvement is seen with adjustment for MRC grade (51% versus 33%). Similar results are observed with data generated assuming proportional hazards for the non-survivors. The test for non-proportionality has power between 10% and 30%. In the additional scenario with 50% mortality, proportional hazards had much lower power than RMST (36% versus 98%), while power was similar with 100% mortality.

Conclusion: Relative performance of proportional hazards versus RMST strongly depends on the shape of the survival curve and the presence of cure.

References:
[1] Donovan et al., Adjunctive dexamethasone for tuberculous meningitis in HIV-positive adults, New England Journal of Medicine 389 (2023), 1357-1367.

[2] Dormuth et al., A comparative study to alternatives to the log-rank test, Contemporary Clinical Trials 128 (2023).



posters-wednesday: 35

Sample Size Calculation in Prognostic Studies: A Comparative Analysis

Gloria Brigiari, Ester Rosa, Giulia Lorenzoni, Dario Gregori

Unit of Biostatistics, Epidemiology and Public Health, University of Padova, Italy

Introduction
In classical survival analysis, sample size estimation is typically based on risk differences or hazard ratios (HR) between patient groups. While widely used, these methods have limitations, such as assuming group independence and proportional hazards, and neglecting additional covariates. To address these challenges, alternative approaches, such as those proposed by Riley et al. (2019, 2021), focus on model precision and the inclusion of covariates. However, there is no guarantee that these methods will lead to a single conclusion on the required sample size. This study aims to evaluate the performance of traditional HR-based methods and Riley's precision-focused approach through sensitivity analysis and Monte Carlo simulations. The goal is to identify the ideal sample size and assess how the inclusion of covariates impacts model performance.

Methods
We conducted a sensitivity analysis using Monte Carlo simulations to compare classical HR-based sample size estimation methods with Riley's model-precision approach. Simulations were run based on historical data, focusing on proportional hazards and the inclusion of multiple covariates. Once the appropriate sample size was determined, Riley's methodology was applied to evaluate the number of predictors that could be included in the model without overfitting. The analysis used a shrinkage factor of 0.9 to balance model complexity and accuracy. Finally, to assess whether the calculated sample size allows for the generalizability of a previously developed model, a simulation-based method was applied to estimate the precision achieved, in terms of calibration, for the given sample size.
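
For illustration, one of the Riley et al. criteria (the minimum sample size ensuring an expected uniform shrinkage factor of at least 0.9) can be written directly; the anticipated Cox-Snell R-squared and the number of candidate predictor parameters below are hypothetical placeholders:

    import numpy as np

    def riley_min_n(p, r2_cs_adj, shrinkage=0.9):
        """Minimum n so that the expected shrinkage factor is >= `shrinkage`
        (only this single Riley et al. criterion is shown here)."""
        return p / ((shrinkage - 1) * np.log(1 - r2_cs_adj / shrinkage))

    print(int(np.ceil(riley_min_n(p=10, r2_cs_adj=0.15))))   # e.g. 10 predictor parameters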

Results
Traditional methods struggled to capture model complexity and did not consider relevant covariates effectively. In contrast, Riley's method allowed for the inclusion of more covariates while maintaining statistical robustness. The application of Riley's methodology revealed that the number of predictors that could be included without overfitting depended on the desired model accuracy metrics. The external validation approach confirmed the adequacy of the calculated sample size, achieving good calibration and predictive accuracy of the model.

Conclusion
This study highlights the limitations of traditional HR-based methods and demonstrates the advantages of the proposed approach, which prioritizes model precision and avoids overfitting. By allowing the inclusion of additional covariates without sacrificing power, this methodology offers a flexible and reliable framework for sample size estimation and model development in prognostic studies.



posters-wednesday: 36

Prognostic Score Adjustment in a Two-Slope Mixed Effects Model to Estimate Treatment Effects on eGFR Slope in CKD Patients

Silke Janitza1, Maike Ahrens2, Sebastian Voss2, Bohdana Ratitch3, Nicole Rethemeier4, Meike Brinker4, Paula Vesterinen5, Antigoni Elefsinioti1

1Bayer AG, Germany; 2Chrestos GmbH, Essen, Germany; 3Bayer Inc., Mississauga, Ontario, Canada; 4Bayer AG, Wuppertal, Germany; 5Bayer AG, Espoo, Finland

Background: The CHMP recently recognized the estimated glomerular filtration rate (eGFR) slope as a validated surrogate endpoint for clinical trials of treatments for chronic kidney disease (CKD). A common method for analysis of this endpoint is a two-slope linear spline mixed effects model (Vonesh et al., 2019). This model can serve as the primary analysis in future CKD trials with the option to adjust for baseline covariates, e.g., sodium-glucose cotransporter-2 inhibitor (SGLT2i) use and urinary albumin-to-creatinine ratio (UACR). Following a CHMP Qualification Opinion on prognostic covariate adjustment, we explore the potential benefits of integrating a prognostic score in the two-slope model using a historical database from two large CKD phase III studies FIDELIO-DKD and FIGARO-DKD.

Methods: Using the FIGARO-DKD study, we developed prognostic score models via random forest methodology, focusing on patients receiving placebo. These models included approximately 60 baseline covariates. We conducted extensive simulations based on FIDELIO-DKD to assess potential precision gains in treatment effect estimates from including a prognostic score, obtained for each participant as a prediction from the aforementioned prognostic models.
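
A minimal sketch of a two-slope (linear spline) mixed model with a prognostic score covariate, fitted with statsmodels on simulated toy data; the knot placement, visit schedule and covariance structure are assumptions of this illustration and do not reproduce the Vonesh et al. specification:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n, visits = 100, [0, 1, 4, 8, 12, 18, 24]
    df = pd.DataFrame([(i, m) for i in range(n) for m in visits], columns=["id", "month"])
    df["treat"] = (df["id"] % 2).astype(float)
    df["prog_score"] = rng.normal(size=n)[df["id"]]
    df["egfr"] = (55 - 0.15 * df["month"] + 0.05 * df["treat"] * df["month"]
                  + 2 * df["prog_score"] + rng.normal(0, 4, n)[df["id"]]
                  + rng.normal(0, 3, len(df)))

    knot = 4.0                                        # assumed knot: acute vs chronic slope
    df["t1"] = np.minimum(df["month"], knot)
    df["t2"] = np.maximum(df["month"] - knot, 0.0)

    fit = smf.mixedlm("egfr ~ treat*t1 + treat*t2 + prog_score",
                      data=df, groups=df["id"], re_formula="~t1 + t2").fit()
    print(fit.params.filter(like="treat"))            # treatment effects on the two slopes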

Results: Pseudo-simulations based on FIDELIO-DKD indicated that integrating the prognostic score into a two-slope model without other covariates yielded moderate precision gains. When compared to a model that included SGLT2i use and UACR category, the additional precision gains from including the prognostic score were reduced.

Conclusion: While prognostic score adjustment can enhance efficiency in clinical trials, it has primarily been studied within classical linear models. This work explores prognostic score adjustment to a more complex model, illustrating how sponsors can utilize historical data for pseudo simulations to evaluate the utility of prognostic score adjustments in future trials. Based on our historical studies, our findings from pseudo simulations suggest that incorporating a prognostic score in addition to other key baseline covariates (such as SGLT2i use and UACR category) may not yield substantial additional efficiency in estimating treatment effects.

Literature

Vonesh E, et al. Mixed-effects models for slope-based endpoints in clinical trials of chronic kidney disease. Stat Med. 2019;38(22):4218-4239.

European Medicines Agency. Qualification opinion for Prognostic Covariate Adjustment (PROCOVA™). Committee for Medicinal Products for Human Use (CHMP). 2022.

European Medicines Agency. Qualification opinion for GFR Slope as a Validated Surrogate Endpoint for RCT in CKD. Committee for Medicinal Products for Human Use (CHMP). 2023.



posters-wednesday: 37

Comparative effectiveness of ACE inhibitors and angiotensin receptor blockers to prevent or delay dementia: a target trial emulation

Marie-Laure Charpignon1, Max Sunog2, Colin Magdamo2, Bella Vakulenko-Lagun3, Ioanna Tzoulaki4, Sudeshna Das2, Deborah Blacker2, Mark Albers2

1Kaiser Permanente and UC Berkeley, United States of America; 2Mass General Brigham, United States of America; 3Haifa University, Israel; 4Imperial College London, United Kingdom

Alzheimer’s disease, the most common type of dementia, affects 6.7 million Americans and costs $345B annually. Since disease-modifying therapies are limited, repurposing FDA-approved drugs may offer an alternative, expedited path to preventing dementia. Hypertension is a major risk factor for dementia onset. However, prior observational studies contrasting antihypertensive drug classes (angiotensin-converting enzyme inhibitors: ACEI; angiotensin receptor blockers: ARB; and calcium channel blockers: CCB) have provided mixed results.

We hypothesize that ACEI have an off-target pathogenic mechanism. To test this hypothesis, we emulate a target trial comparing patients initiating ACEI vs ARB using electronic health records from the US Research Patient Data Registry. We perform intention-to-treat analyses among 25,507 patients aged 50 and over, applying inverse probability of treatment weighting (IPTW) based on the propensity score to balance the two treatment arms and accounting for the competing risk of death.
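
The weighting-plus-outcome-model step can be sketched as follows (simulated toy data with a single confounder, using scikit-learn and lifelines; the actual analysis adjusts for many more covariates and explicitly handles death as a competing event):

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from lifelines import CoxPHFitter

    rng = np.random.default_rng(0)
    n = 2000
    age = rng.normal(70, 8, n)
    acei = rng.binomial(1, 1 / (1 + np.exp(-(age - 70) * 0.05)), n)   # confounded treatment
    raw = rng.exponential(10 / (1 + 0.01 * (age - 70)), n)
    df = pd.DataFrame({"acei": acei, "age": age,
                       "time": np.minimum(raw, 8.0), "event": (raw < 8.0).astype(int)})

    # 1) propensity score and stabilised inverse probability of treatment weights
    ps = LogisticRegression().fit(df[["age"]], df["acei"]).predict_proba(df[["age"]])[:, 1]
    p_treat = df["acei"].mean()
    df["w"] = np.where(df["acei"] == 1, p_treat / ps, (1 - p_treat) / (1 - ps))

    # 2) weighted Cox model with robust variance (a cause-specific model would censor deaths)
    cph = CoxPHFitter()
    cph.fit(df[["acei", "time", "event", "w"]], duration_col="time",
            event_col="event", weights_col="w", robust=True)
    cph.print_summary()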

In a cause-specific Cox proportional hazards (PH) model, the hazard of dementia onset was higher in ACEI vs ARB initiators (HR=1.10 [95% CI: 1.01-1.21]). Findings were robust to outcome model structures (i.e., Cox PH vs nonparametric) and generalized to patients without a hypertension diagnosis at initiation who were receiving such drugs for another indication (e.g., heart failure).

Ongoing work includes evaluating differential effects by brain penetrance, discovering subgroups of responders, and assessing the mediating role of blood pressure (BP) control with ACEI vs ARB. Future research will incorporate longitudinal markers (e.g., BP, HbA1c, LDL) in time-to-event models and consider stroke incidence or recurrence under ACEI vs ARB initiation as a mediator.



posters-wednesday: 38

Optimal utility-based design of phase II/phase III programmes with different type of endpoints in the setting of multiple myeloma

Haotian Wang1, Peter Kimani1, Michael Grayling2, Josephine Khan2, Nigel Stallard1

1Warwick Clinical Trials Unit, United Kingdom; 2Johnson & Johnson Innovative Medicine, United Kingdom

Background:

High failure rates in phase III oncology trials, often due to overoptimistic assumptions based on limited phase II information, highlight the significant costs and risks associated with drug development. This underscores the importance of approaches that effectively link phase II and phase III trials, balancing resource allocation and decision-making to ensure phase III trials are appropriately powered to optimise success rates.

Method:

We propose a novel method to determine the optimal phase II sample size that maximizes the overall utility of the successful programme. The method evaluates go/no-go decision criteria between phase II and phase III based on phase II outcomes, including a strategy for choosing the optimal go/no-go threshold, calculating the expected phase III sample size, and ensuring the desired power for the entire programme. Existing methods [1] enable optimal designs when the same time-to-event endpoint is used in both phase II and phase III. In practice, however, survival data are often not reliably observed in phase II. Our method allows binary outcome data obtained from phase II to inform the sample size calculation for the phase III trial, which will use a correlated time-to-event endpoint.

Results:

The proposed method is illustrated by an application in multiple myeloma, using achievement of minimal residual disease as the phase II endpoint and progression-free survival (PFS) as the phase III endpoint. With initial parameters set according to the MAIA trial [2], we found the optimal utility and the corresponding optimal phase II sample size. We also conducted sensitivity analyses under different scenarios, varying the response- and treatment-related parameters, the value of the go/no-go decision threshold, the prior distribution of the response rate, and utility-related parameters such as the benefits obtained after approval. Our method provides the optimal design together with the expected utility of the whole phase II/III programme.

Reference:

1. Kirchner, M., Kieser, M., Götte, H. & Schüler, A. Utility-based optimization of phase II/III programs. Stat. Med. 35, 305–316 (2016).

2. Facon, T. et al. Daratumumab plus Lenalidomide and Dexamethasone for Untreated Myeloma. N. Engl. J. Med. 380, 2104–2115 (2019).



posters-wednesday: 39

Beyond first events: Advancing recurrent adverse event estimates in clinical research.

Nicolas Sauvageot, Leen Slaets, Anirban Mitra, Zoe Craig, Jane Gilbert, Lilla Di Scala, Stefan Englert

Johnson & johnson, Switzerland

Safety analyses of adverse events (AEs) are critical for evaluating the benefit-risk profile of therapies; however, these analyses often rely on simplistic estimators that fail to fully capture the complexity present in safety data. The SAVVY consortium, a collaboration between pharmaceutical companies and academic institutions, aims to improve the estimation of the probability of observing the first AE by time t, using survival techniques that appropriately deal with varying follow-up times and competing events (CEs). Through simulation studies [1] and a meta-analysis [2], the project demonstrated that common methods for estimating the probability of first events, such as incidence proportions, Kaplan–Meier (KM) estimators, and incidence densities, often fail to account for important factors like censoring and CEs. It concluded that the Aalen-Johansen estimator is the gold standard when focusing on the first event, providing the most reliable estimates, particularly in the presence of CEs.

Considering only first events does not reflect the real burden that a patient may experience in clinical studies. Nevertheless, usual safety reporting and existing research predominantly focus on the first AE, overlooking the recurrent nature of AEs. Recognizing that both first and subsequent events provide a more accurate representation of safety profiles, there is a clear need to describe both first and recurrent AEs in safety reporting.

The objective of this work is to identify appropriate methods for analyzing recurrent AEs in the presence of varying follow-up times and CEs. To achieve this, we perform a simulation study within a recurrent event framework to compare several estimators quantifying the average number of events per subject over time, including:

  • Event Rate
  • Exposure Adjusted Event Rate (EAER)
  • Mean Cumulative Count (MCC) without accounting for CEs
  • MCC accounting for CEs [3]

Our simulations evaluate the performance of these methods regarding bias and examine the impact of various trial characteristics such as the proportion of censoring, the amount of CEs, the AE rate, and the evaluation time point. We illustrate and further strengthen the simulation-based results using real clinical trial data.
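
For concreteness, the simplest of the cumulative-count estimators above (MCC ignoring competing events) can be sketched as follows; the Dong et al. version additionally weights increments by the probability of remaining free of the competing event (toy data, our own illustration):

    import numpy as np

    def mean_cumulative_count(event_times, followup, eval_times):
        """MCC ignoring competing events: at each AE time add (# events) / (# at risk).
        `event_times` is a list of per-subject arrays of AE times; `followup` gives each
        subject's end of follow-up (censoring) time."""
        followup = np.asarray(followup, dtype=float)
        all_events = np.sort(np.concatenate([np.asarray(t, dtype=float) for t in event_times]))
        out = []
        for t in eval_times:
            out.append(sum(1.0 / np.sum(followup >= s) for s in all_events[all_events <= t]))
        return np.array(out)

    # toy usage: 3 subjects with 2, 1 and 0 AEs
    ev = [np.array([1.0, 4.0]), np.array([2.5]), np.array([])]
    fu = [6.0, 3.0, 5.0]
    print(mean_cumulative_count(ev, fu, eval_times=[2.0, 4.0, 6.0]))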

References:

1: Stegherr R et al. Estimating and comparing adverse event probabilities in the presence of varying follow-up times and competing events. Pharm Stat. 2021 Nov;20(6):1125-1146.

2: Rufibach K et al. Survival analysis for AdVerse events with VarYing follow-up times (SAVVY): summary of findings and assessment of existing guidelines. Trials 25, 353 (2024).

3: Dong H et al. Estimating the burden of recurrent events in the presence of competing risks: the method of mean cumulative count. Am J Epidemiol. 2015 Apr 1;181(7):532-40.



posters-wednesday: 40

Cure models to compare aftercare monitoring schemes in pediatric cancer

Ulrike Pötschger1, Harm van Tinteren2, Evgenia Glogova1, Helga Arnardottir1, Paulina Kurzmann1, Sabine Taschner-Mandl1, Lieve Titgat2, Martina Mittlböck3

1St. Anna Children's Cancer Research Institute, Austria; 2Princess Maxima Center; 3Medical University of Vienna, Center for Medical Data Science

Background / Introduction

Neuroblastoma is a malignant tumor of the peripheral nervous system, and 50% of patients are high-risk with a poor outcome. Monitoring with minimally invasive liquid biopsies may now allow earlier detection of tumor recurrence compared to conventional follow-up evaluations based on imaging and bone marrow biopsies.

In a randomized study, two monitoring strategies for relapse in neuroblastoma are compared: minimally invasive liquid biopsy-based monitoring and conventional follow-up evaluations based on imaging and bone marrow biopsies.

The primary endpoint is disease-free survival (DFS). If liquid biopsy monitoring is beneficial, disease recurrences can be detected earlier. Thus, survival curves are expected to show an early group difference that vanishes in the long term, and consequently non-proportional hazards are expected.

Methods

The primary statistical evaluation of the treatment effect will be done with a Weibull mixture cure model. The crucial assumption underlying a mixture cure model is that DFS results from the survival experience of two subgroups: cured patients and uncured patients. Within this model, the proportion of cured patients and the time to an event for the uncured subpopulation are modelled separately. The time to detection of a recurrence in the subpopulation of uncured patients is of primary interest here.

Monte Carlo simulations were performed to evaluate the power and statistical properties of the Weibull mixture cure model. For the standard monitoring arm, the inversion method is used to simulate survival data following a mixture cure model as observed in historical populations. For the experimental arm, the effects of different data-generating processes and liquid biopsy schedules are explored.
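
A minimal sketch of the inversion step for one arm (parameter values are illustrative only; in the study the experimental arm is additionally modified to reflect the assumed earlier signal detection):

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate_mixture_cure(n, pi_cure, shape, scale, max_follow=10.0):
        """Inversion method for a Weibull mixture cure model with overall survival
        S(t) = pi + (1 - pi) * exp(-(t / scale)**shape); cured subjects never relapse
        and are administratively censored at `max_follow`."""
        u = rng.random(n)
        cured = u < pi_cure
        v = np.clip((u - pi_cure) / (1 - pi_cure), 1e-12, 1.0)   # valid for uncured subjects
        t_uncured = scale * (-np.log(v)) ** (1 / shape)
        time = np.where(cured, max_follow, np.minimum(t_uncured, max_follow))
        event = (~cured) & (t_uncured <= max_follow)
        return time, event.astype(int)

    time, event = simulate_mixture_cure(n=75, pi_cure=0.5, shape=1.3, scale=1.5)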

Results

Simulation studies helped to explore different liquid biopsy schedules and effect sizes (lag time between detectable signals with liquid biopsy and imaging) under various data-generating processes. Accordingly, the simulation studies helped to refine the study design and the schedule of the liquid biopsies. Compared to a conventional analysis with a Cox regression model, substantial gains in statistical power could be achieved. With a two-sided alpha of 5% and n=150 patients, the simulated power to detect recurrences 5 months earlier was 81% for the cure model and 60% for the Cox model.

Conclusion

Comparing aftercare evaluations with different schedules and sensitivities is methodologically challenging. With anticipated non-proportional hazards, it is important to directly address the primary interest in earlier signal detection. Simulation studies helped to assess power and to develop an optimal monitoring schedule. Cure models provide results with a clear interpretation and lead to substantial gains in statistical power.



posters-wednesday: 41

Comparison of treatment sequences in advanced pancreatic cancer

Norbert Marschner1,2, Nina Haug3, Susanna Hegewisch-Becker4, Marcel Reiser5, Steffen Dörfel6, Rüdiger Liersch7, Hartmut Linde8, Thomas Wolf9, Anna Hof10, Anja Kaiser-Osterhues2, Karin Potthoff2, Martina Jänicke10

1Med. Klinik 1, Universitätsklinik Freiburg, Freiburg, Germany; 2Medical Department, iOMEDICO, Freiburg, Germany; 3Biostatistics, iOMEDICO, Freiburg, Germany; 4Hämatologisch-Onkologische Praxis Eppendorf (HOPE), Hamburg, Germany.; 5PIOH-Praxis Internistische Onkologie und Hämatologie, Köln, Germany; 6Onkozentrum Dresden/Freiberg, Dresden, Germany; 7Hämatologisch-onkologische Gemeinschaftspraxis, Münster, Germany; 8MVZ für Blut- und Krebserkrankungen, Potsdam, Germany; 9BAG, Gemeinschaftspraxis Hämatologie-Onkologie, Dresden, Germany; 10Clinical Epidemiology and Health Economics, iOMEDICO, Freiburg, Germany

There are no clear guidelines regarding the optimal treatment sequence for advanced pancreatic cancer, as head-to-head phase III randomised trials are missing. We assessed the real-world effectiveness of three frequently administered sequential treatment strategies: FOLFIRINOX→GEMNAB, GEMNAB→FOLFOX/OFF and GEMNAB→NALIRI+5-FU. To this end, we emulated a hypothetical target trial in which patients were randomised to one of these sequences before the beginning of first-line therapy. As the causal estimand, we quantified the per-protocol effect of treatment on overall survival and time to deterioration of health-related quality of life. Treatment effects were estimated both for the whole population and stratified by risk group according to the Pancreatic Cancer Score [1]. Our analysis included 1551 patients with advanced pancreatic cancer from the prospective clinical cohort study Tumour Registry Pancreatic Cancer receiving FOLFIRINOX (n = 613) or gemcitabine/nab-paclitaxel (GEMNAB; n = 938) as palliative first-line treatment. We used marginal structural modeling to adjust for time-varying confounding affecting the relation between treatment and endpoint, a key challenge in real-world data analysis [2]. The estimated effectiveness of the three treatment sequences evaluated was largely comparable. Patients with a poor prognosis might benefit from intensified treatment with FOLFIRINOX→GEMNAB in terms of survival and quality of life. Future randomised trials on sequential treatments in advanced pancreatic cancer are warranted [3].

1. Marschner N, Hegewisch-Becker S, Reiser M, von der Heyde E, Bertram M, Hollerbach SH, Kreher S, Wolf T, Binninger A, Chiabudini M, Kaiser-Osterhues A, Jänicke M, et al. FOLFIRINOX or gemcitabine/nab-paclitaxel in advanced pancreatic adenocarcinoma: A novel validated prognostic score to facilitate treatment decision-making in real-world. Int J Cancer 2023;152:458–69.

2. Robins JM, Hernán MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology 2000;11:550–60.

3. Marschner N, Haug N, Hegewisch‐Becker S, Reiser M, Dörfel S, Lerchenmüller C, Linde H, Wolf T, Hof A, Kaiser‐Osterhues A, Potthoff K, Jänicke M, et al. Head‐to‐head comparison of treatment sequences in advanced pancreatic cancer—Real‐world data from the prospective German TPK clinical cohort study. Intl Journal of Cancer 2024;155:1629–40.
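
To make the weighting idea concrete, here is a heavily simplified R sketch of inverse probability of treatment weighting with a weighted Cox model, using only baseline covariates; the study's actual marginal structural model for time-varying confounding is not reproduced, and all variable names, covariates and values are invented.

library(survival)
set.seed(2)
# toy per-patient data: baseline covariates, strategy indicator, follow-up, death
n   <- 500
dat <- data.frame(age = rnorm(n, 65, 8), ecog = rbinom(n, 2, 0.4),
                  ca19_9 = rlnorm(n, 5, 1))
lp  <- -0.5 + 0.02 * (dat$age - 65) + 0.3 * dat$ecog
dat$strategy  <- rbinom(n, 1, plogis(lp))                 # 1 = FOLFIRINOX-first (toy label)
dat$os_months <- rexp(n, rate = 0.05 * exp(-0.2 * dat$strategy))
dat$death     <- rbinom(n, 1, 0.8)

# stabilised inverse probability of treatment weights (baseline version only)
ps_den <- glm(strategy ~ age + ecog + ca19_9, family = binomial, data = dat)
ps_num <- glm(strategy ~ 1, family = binomial, data = dat)
p_den  <- ifelse(dat$strategy == 1, fitted(ps_den), 1 - fitted(ps_den))
p_num  <- ifelse(dat$strategy == 1, fitted(ps_num), 1 - fitted(ps_num))
dat$sw <- p_num / p_den

fit <- coxph(Surv(os_months, death) ~ strategy, data = dat,
             weights = sw, robust = TRUE)
summary(fit)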



posters-wednesday: 42

Clinical Trials with Time-to-event-endpoint: Interim Prediction of Number of Events with Confidence Distributions

Edoardo Ratti, Maria Grazia Valsecchi, Stefania Galimberti

Bicocca Bioinformatics Biostatistics and Bioimaging B4 Center, School of Medicine and Surgery, University of Milan-Bicocca, Monza, Italy

Introduction. An important aspect of randomized clinical trial design is planning interim analyses. With time-to-event endpoints, the target sample size is a function of the number of events. It is crucial that studies provide sufficient follow-up to observe the number of events needed to preserve power. Novel approaches have been developed in a Bayesian framework to predict the date at which the target number of events is reached (maximum information trial). However, there is little work on forecasting the number of events expected at a fixed future date, with a corresponding prediction interval, in trials with a fixed follow-up time (maximum duration trial).

Methods. Based on a recent paper on the use of confidence distributions in clinical trials [1], we adapt a prediction method developed in reliability analysis [2] and show its potential in the clinical context. The proposed method obtains prediction intervals from a predictive distribution constructed on a bootstrap-based confidence distribution of the parameters of the fitted survival model. The appropriateness of the framework was assessed by application to a real phase III trial and by evaluating the coverage probability of the intervals with simulations.
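
A minimal R sketch of the general idea, under assumed toy data and a Weibull working model: bootstrap the fitted parameters, simulate the number of additional events among currently censored patients within a fixed horizon, and read off prediction-interval quantiles. This is an illustration of the bootstrap-predictive logic only, not the authors' exact procedure.

library(survival)
set.seed(3)
# toy interim data: follow-up so far and event indicator
n  <- 200
tt <- rexp(n, 0.03); cc <- runif(n, 6, 30)
d  <- data.frame(time = pmin(tt, cc), event = as.numeric(tt <= cc))
h  <- 6   # predict events occurring within the next 6 months

pred_events <- function(dd, h) {
  fit    <- survreg(Surv(time, event) ~ 1, data = dd, dist = "weibull")
  shape  <- 1 / fit$scale; scale <- exp(coef(fit)[1])
  S      <- function(t) exp(-(t / scale)^shape)
  atrisk <- dd$time[dd$event == 0]               # currently censored patients
  p      <- 1 - S(atrisk + h) / S(atrisk)        # conditional event probability
  sum(rbinom(length(p), 1, p))                   # simulated number of new events
}

B    <- 500
pred <- replicate(B, pred_events(d[sample(n, n, replace = TRUE), ], h))
quantile(pred, c(0.025, 0.5, 0.975))             # point prediction and 95% interval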

Results. Using data from a published phase III trial [3], we predicted the number of events occurring at the second interim analysis (after accrual closure) and every 6 months thereafter. All intervals included the observed number of events. Simulations show that the prediction intervals have the desired coverage under the appropriate survival distribution.

Conclusions. For a maximum duration trial, it is crucial to predict the number of events at future times with proper prediction intervals. The presented approach allows valid predictive inference to be constructed based on confidence distributions, accommodating different parametric models and censoring mechanisms, and is an alternative to a Bayesian approach. Its use is proposed here for prediction after accrual closure; further work will address modelling of accrual.

References
[1] Marschner IC., Confidence distributions for treatment effects in clinical trials: Posteriors without priors. Statistics in Medicine. 2024; 43(6): 1271-1289
[2] Tian Q., Meng F., Nordman D. J., Meeker W. Q., Predicting the Number of Future Events. Journal of the American Statistical Association. 2021; 117(539): 1296–1310
[3] Conter V, Valsecchi MG, Cario G, et al. Four Additional Doses of PEG-L-Asparaginase During the Consolidation Phase in the AIEOP-BFM ALL 2009 Protocol Do Not Improve Outcome and Increase Toxicity in High-Risk ALL: Results of a Randomized Study. J Clin Oncol. 2024 Mar 10;42(8):915-926.



posters-wednesday: 43

A Bayesian-Informed Dose-Escalation Design for Multi-Cohort Oncology Trials with Varying Maximum Tolerated Doses

Martin Kappler1, Yuan Ji2

1Cytel Inc., Waltham, USA; 2University of Chicago, USA

In oncology dose-escalation trials, it is common to evaluate a drug across multiple cancer types within the same study. However, different cancer types may also have different maximum tolerated doses (MTDs) due to potentially different underlying patient characteristics. Standard approaches either pool all patients, potentially ignoring important differences between cancer types, or conduct separate dose-escalation processes for each type, which can lead to inefficiencies. We propose a dose-escalation design that leverages the dose-level information from faster-recruiting cohorts to inform dose-escalation and de-escalation rules for slower-recruiting cohorts, thereby balancing safety, efficiency, and cohort-specific MTD estimation.

Our approach is based on a model-assisted dose-escalation design and uses informative priors to transfer dose-toxicity information from the faster-recruiting cohort to the slower-recruiting cohort. This approach enables a more conservative and adaptive dose-escalation process for slower cohorts by updating the prior based on observed dose-limiting toxicities in the faster cohort. The informative prior ensures that the dose-escalation in the slower cohort is both cautious and responsive to emerging data, without requiring separate dose-escalation processes for each cancer type. Uncertainty for slower cohorts is reduced and unnecessary toxicity risks are avoided.
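
The following base-R sketch illustrates the borrowing mechanism in a generic interval-based (model-assisted) way; the DLT counts, prior discount, target and decision thresholds are all assumed for illustration and do not correspond to the proposed design's specific rules.

# discounted beta prior per dose from the faster cohort, updated in the slower cohort
target <- 0.30                                 # assumed target DLT probability
fast_n <- c(3, 6, 9); fast_x <- c(0, 1, 3)     # toy DLT data per dose, fast cohort
w      <- 0.5                                  # assumed prior discount factor

a0 <- 1 + w * fast_x                           # informative Beta prior per dose
b0 <- 1 + w * (fast_n - fast_x)

slow_n <- c(3, 3, 0); slow_x <- c(0, 1, 0)     # toy data observed in the slow cohort
a <- a0 + slow_x; b <- b0 + slow_n - slow_x    # posterior per dose

p_over  <- 1 - pbeta(target + 0.10, a, b)      # Pr(DLT rate clearly above target)
p_under <- pbeta(target - 0.10, a, b)          # Pr(DLT rate clearly below target)
decision <- ifelse(p_over > 0.6, "de-escalate",
                   ifelse(p_under > 0.6, "escalate", "stay"))
data.frame(dose = 1:3, post_mean = round(a / (a + b), 2), decision)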

The operating characteristics of the approach (probability to determine MTD, number of patients exposed to toxic doses, etc.) are assessed via simulations over a variety of scenarios in the two cohorts and are compared to separate or pooled escalation.



posters-wednesday: 44

Comparison of Bayesian Approaches in Single-Agent Dose-Finding Studies

Vibha Srichand

Prasanna School of Public Health, Manipal Academy of Higher Education, India

Single-agent dose-finding studies conducted as part of phase 1 clinical trials aim to obtain sufficient information regarding the safety and tolerability of a drug, with the primary objective of determining the Maximum Tolerated Dose (MTD) – the maximum test dose that can be administered with an acceptable level of toxicity. While the 3+3 design has been the conventional choice for dose-finding studies, innovative Bayesian designs have gained prominence. These designs provide a framework to incorporate prior knowledge with data accumulated during the study to adapt the study design and efficiently estimate the MTD. However, most existing Bayesian designs assume a specific parametric model for the dose-toxicity relationship, which reduces their adaptability to complex data patterns. To address this limitation, recent research has introduced nonparametric Bayesian methods, which are model-free, robust and well suited to small sample sizes. It is therefore important to comprehensively compare the performance of parametric and nonparametric Bayesian methods and provide evidence to guide the implementation of the different methods.

This paper aims to understand the accuracy, safety and adaptability of dose-finding methods by analysing different scenarios of target toxicity probabilities and varying cohort sizes for a predetermined sample size. The methods under review are as follows: traditional method – the 3+3 design; parametric methods – the continual reassessment method (CRM), modified toxicity probability interval (mTPI and mTPI-2), keyboard and Bayesian optimal interval designs (Kurzrock et al., 2021); and nonparametric methods – Bayesian nonparametric continual reassessment (Tang et al., 2018) and the Bayesian stochastic approximation method (Xu et al., 2022). The performance of the designs will be assessed using four key metrics, with conclusions drawn based on extensive simulation studies.
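
For readers unfamiliar with the parametric designs being compared, here is a hedged base-R sketch of a one-parameter power-model CRM update (skeleton, DLT counts, target and prior variance are assumed values, and no skipping or safety rules are applied); it is not the implementation of any particular package.

# CRM power model: p_d(a) = skeleton_d^exp(a), prior a ~ N(0, variance 1.34)
skeleton <- c(0.05, 0.12, 0.25, 0.40)
n_d <- c(3, 3, 3, 0)          # toy patients treated per dose
x_d <- c(0, 0, 1, 0)          # toy DLTs per dose
target <- 0.25

lik <- function(a) {
  p <- skeleton^exp(a)
  prod(p^x_d * (1 - p)^(n_d - x_d))
}
post_num <- function(a) sapply(a, lik) * dnorm(a, 0, sqrt(1.34))
const    <- integrate(post_num, -10, 10)$value
post_p   <- sapply(seq_along(skeleton), function(d)
  integrate(function(a) skeleton[d]^exp(a) * post_num(a) / const, -10, 10)$value)

cbind(dose = seq_along(skeleton), post_tox = round(post_p, 3))
which.min(abs(post_p - target))   # dose with posterior toxicity closest to target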

Keywords: Dose-finding, Maximum Tolerated Dose, Clinical trial design, Bayesian, Parametric, Nonparametric, Continual Reassessment method, Stochastic Approximation

References:
Kurzrock, R., Lin, C.-C., Wu, T.-C., Hobbs, B. P., Pestana, R. C., & Hong, D. S. (2021). Moving beyond 3+3: The future of clinical trial design. American Society of Clinical Oncology Educational Book. American Society of Clinical Oncology. Meeting, 41, e133–e144. https://doi.org/10.1200/EDBK_319783

Tang, N., Wang, S., & Ye, G. (2018). A nonparametric Bayesian continual reassessment method in single-agent dose-finding studies. BMC Medical Research Methodology, 18(1), 172. https://doi.org/10.1186/s12874-018-0604-9

Xu, J., Zhang, D., & Mu, R. (2022). A dose-finding design for phase I clinical trials based on Bayesian stochastic approximation. BMC Medical Research Methodology, 22(1), 258. https://doi.org/10.1186/s12874-022-01741-3



posters-wednesday: 45

Evaluating the effect of different non-informative prior specifications on the Bayesian proportional odds model in randomised controlled trials

Chris J Selman1,2, Katherine J Lee1,2, Michael Dymock3,4, Ian Marschner5, Steven Y.C. Tong6,7, Mark Jones4,8, Tom Snelling3,8, Robert K Mahar1,9,10

1Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Australia; 2Department of Paediatrics, University of Melbourne, Australia; 3Wesfarmers Centre of Vaccines and Infectious Diseases, The Kids Research Institute Australia, Australia; 4School of Population and Global Health, The University of Western Australia, Australia; 5NHMRC Clinical Trials Centre, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW 2050, Australia; 6Victorian Infectious Diseases Services, The Royal Melbourne Hospital, Australia; 7Department of Infectious Diseases, University of Melbourne, Australia; 8Sydney School of Public Health, Faculty of Medicine and Health, University of Sydney, Australia; 9Centre for Epidemiology and Biostatistics, University of Melbourne, Australia; 10Methods and Implementation Support for Clinical and Health Research Hub, University of Melbourne, Australia

Background

Ordinal outcomes can be a powerful way of combining multiple distinct patient outcomes into a single endpoint in randomised controlled trials (RCTs). Such outcomes are commonly analysed using proportional odds (PO) models. When the analysis uses a Bayesian approach, it is not obvious what ‘non-informative’ priors should be used and whether these are truly ‘non-informative’, particularly in adaptive trials where early stopping decisions may be influenced by the choice of prior.

Methods

This study evaluates the effect of different non-informative prior specifications on the Bayesian PO model for a two-arm trial in the context of a design with an early stopping rule and a fixed design scenario. We conducted an extensive simulation study, varying factors such as effect size, sample sizes, number of categories and the distribution of the control arm probabilities. The models are also illustrated using data from the Australian COVID-19 Trial.
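
As a base-R illustration of the simulation setting, the sketch below generates ordinal outcomes under a proportional odds model with a right-skewed control distribution and draws control-arm cell probabilities from a Dirichlet prior with small concentration parameters; the actual Bayesian PO model fit (and the R-square prior) are not shown, and all numbers are assumed.

set.seed(6)
K      <- 6
p_ctrl <- c(0.40, 0.25, 0.15, 0.10, 0.06, 0.04)   # assumed right-skewed control probabilities
logOR  <- 0.6                                      # assumed common treatment log odds ratio

cum_ctrl <- cumsum(p_ctrl)[-K]
cum_trt  <- plogis(qlogis(cum_ctrl) - logOR)       # shift cumulative logits (PO assumption)
p_trt    <- diff(c(0, cum_trt, 1))

n_per_arm <- 300
y_ctrl <- sample(1:K, n_per_arm, replace = TRUE, prob = p_ctrl)
y_trt  <- sample(1:K, n_per_arm, replace = TRUE, prob = p_trt)

# Dirichlet prior draws via normalised gamma variates (concentration close to zero)
rdirich <- function(m, alpha) {
  g <- matrix(rgamma(m * length(alpha), alpha), nrow = m, byrow = TRUE)
  g / rowSums(g)
}
prior_draws <- rdirich(1000, rep(0.1, K))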

Results

Our findings indicate that the prior specification can introduce bias in the estimation of the treatment effect, particularly when control arm probabilities are right-skewed. The R-square prior specification had the smallest bias and increased the likelihood of stopping early in such settings when there was a treatment effect. However, this specification exhibited larger biases for U-shaped control arm probabilities and for trials that incorporated an early stopping rule. Dirichlet priors with concentration parameters close to zero had the smallest bias when control arm probabilities were right-skewed, but were more likely to stop early for superiority in trials with early stopping rules even when there was no treatment effect. Specifying concentration parameters close to zero in the Dirichlet prior may also cause computational issues at interim analyses with small sample sizes and a larger number of outcome categories.

Conclusion

The specification of non-informative priors in Bayesian adaptive trials that use ordinal outcomes has implications for treatment effect estimation and early stopping decisions. Careful selection of priors that considers the likely distribution of control arm probabilities, or informed sensitivity analyses, may be essential to ensure that inference is not unduly influenced by inappropriate priors.



posters-wednesday: 46

Bayesian decision analysis for clinical trial design with binary outcome in the context of Ebola Virus Disease outbreak – Simulation study

Drifa Belhadi1,2, Joonhyuk Cho3,4,5, Pauline Manchon6, Denis Malvy7,8, France Mentré1,6, Andrew W Lo3,5,9,10, Cédric Laouénan1,6

1Université Paris Cité, Inserm, IAME, F-75018 Paris, France; 2Saryga, France; 3MIT Laboratory for Financial Engineering, Cambridge, MA, USA; 4MIT Department of Electrical Engineering and Computer Science, Cambridge, MA, USA; 5MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA; 6AP-HP, Hôpital Bichat, Département d′Epidémiologie Biostatistiques et Recherche Clinique, F-75018 Paris, France; 7UMR 1219 Inserm/EMR 271 IRD, University of Bordeaux, Bordeaux, France; 8Department for Infectious and Tropical Diseases, University Hospital Center Pellegrin, Bordeaux, France; 9MIT Operations Research Center, Cambridge, MA, USA; 10MIT Sloan School of Management, Cambridge, MA, USA

Background

When designing trials for high-mortality diseases with limited available therapies, the conventional 5% type I error rate used for sample size calculation can be questioned. Bayesian Decision Analysis (BDA) for trial design allows for the integration of multiple health consequences of the disease when designing trials. This study adapts BDA for trials with binary outcomes to calculate optimal sample sizes and type I error rates in the context of an Ebola virus disease outbreak.

Methods

We consider a fixed, two-arm randomized trial with a binary outcome and two types of clinical trial loss: post-trial loss, for not approving an effective treatment or approving an ineffective treatment; and in-trial loss, for not administering an effective treatment to patients in the control arm or for administering an ineffective treatment to patients in the experimental arm. The model accounts for side effects of an ineffective treatment and the burden of Ebola disease. A loss function was defined to summarize the multiple consequences into a single measure, and optimal sample sizes (n) and type I error rates (α) were derived by minimizing this loss function.
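
The R sketch below conveys the optimisation idea only: a grid search over (n, α) minimising a weighted sum of in-trial and post-trial losses for a binary mortality endpoint. The loss weights, prior probability of efficacy, mortality rates and population size are all invented and the loss function is a simplification, not the authors' specification.

p0 <- 0.50; p1 <- 0.35            # assumed control / treated mortality
prior_eff <- 0.5                  # assumed prior probability that the treatment works
N_target  <- 3000                 # assumed size of the affected population
L_miss <- 1; L_false <- 0.3; L_intrial <- 1   # assumed relative losses

grid <- expand.grid(n = seq(20, 300, by = 10),
                    alpha = c(0.025, 0.05, 0.10, 0.15, 0.20))
grid$power <- mapply(function(n, a)
  power.prop.test(n = n, p1 = p0, p2 = p1, sig.level = a,
                  alternative = "one.sided")$power,
  grid$n, grid$alpha)

grid$loss <- with(grid,
  prior_eff       * (1 - power) * N_target * L_miss  +   # miss an effective treatment
  (1 - prior_eff) * alpha       * N_target * L_false +   # approve an ineffective treatment
  2 * n * L_intrial)                                     # burden on trial participants
grid[which.min(grid$loss), ]      # BDA-style optimal (n, alpha) under these assumptions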

Results

Using the mortality rate as the outcome, we varied model parameters to represent different Ebola epidemic scenarios, such as target population size, mortality rate, and treatment efficacy. In most cases, BDA-optimal α values exceeded the conventional one-sided 2.5% rate and BDA-optimal sample sizes were smaller. Additionally, we conducted simulations comparing a BDA-optimized two-arm trial (fixed or sequential) to standard designs (two-arm/single-arm, fixed/sequential) across various outbreak scenarios. Overall, statistical power remained comparable across designs, except when sample size assumptions were incorrect, or when the trial started after the outbreak peak; in these situations, BDA-optimized trials were associated with superior power.

Conclusion

This BDA adaptation provides a new framework for designing trials with a binary outcome, enabling more effective evaluation of therapeutic options. It is particularly valuable for diseases with high mortality rates and limited treatment options. In an outbreak context, where case numbers decline after the epidemic peak and there is uncertainty around mortality rate and treatment efficacy, BDA-optimized trials offer an interesting approach for evaluating new experimental treatments.



posters-wednesday: 47

Relevance of Electronic Medical Records for Clinical Trial Eligibility: A Feasibility Assessment in Acute Stroke Studies

Yusuke Sasahara1, Taizo Murata2, Yasufumi Gon3,4, Toshihiro Takeda2,5, Eisuke Hida1

1Department of Biostatistics and Data Science, Osaka University Graduate School of Medicine; 2Department of Medical Informatics, Osaka University Hospital; 3Department of Neurology, Osaka University Graduate School of Medicine; 4Academic Clinical Research Center, Osaka University Hospital; 5Department of Integrated Medicine, Medical Informatics, Osaka University Graduate School of Medicine

Electronic medical records (EMRs) are a key source of real-world data in clinical trials. In hyperacute-phase diseases, where conducting RCTs is challenging, external control arms using EMRs are expected to enhance trial feasibility. In July 2024, the FDA released guidance on evaluating EMRs and claims data to support regulatory decision-making, emphasizing the importance of ensuring data reliability and relevance. However, evidence on how well EMRs meet these criteria remains limited. This study evaluates the feasibility of extracting clinical trial eligibility criteria from EMRs, focusing on data extraction and structuring in acute stroke studies.

Five acute stroke-related clinical trials with detailed eligibility criteria were selected from the jRCT and UMIN-CTR databases. Registration forms were created for each trial, and an expert panel (physician, medical informatician, statistician, and data manager) evaluated the feasibility of extracting these criteria from EMRs at Osaka University Hospital. Data types were categorized into four groups: structured, mosaic (a mix of structured and unstructured), unstructured, and unavailable. The proportion of each type was summarized by trial and item category, and extraction feasibility was scored (structured: 3, mosaic: 2, unstructured: 1, unavailable: 0). Data were visualized using bar charts, box plots, and radar charts.
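
As a toy illustration of the scoring scheme (structured = 3, mosaic = 2, unstructured = 1, unavailable = 0), the following R snippet averages the score by trial; the item counts are invented.

score_map <- c(structured = 3, mosaic = 2, unstructured = 1, unavailable = 0)
items <- data.frame(
  trial = rep(c("A", "B"), each = 4),
  type  = c("structured", "mosaic", "unstructured", "unavailable",
            "structured", "unstructured", "unstructured", "mosaic"))
items$score <- score_map[items$type]
aggregate(score ~ trial, data = items, FUN = mean)   # mean extraction-feasibility score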

Across all five trials, structured data accounted for 37.6%, mosaic for 12.1%, unstructured for 42.3%, and unavailable for 8.1%. The proportion of unstructured data varied among trials, with Trial B having the highest (68.3%) and Trial C the lowest (15.8%). Trial A had the highest proportion of unavailable data (16.7%). Imaging-related variables were entirely unstructured (100%), and medical history/comorbidity (84.6%) and diagnosis (61.1%) also lacked structure. In contrast, structured data were most common for demography (80.0%), treatment applicability (62.5%), and laboratory/vital signs (56.3%).

The study assessed how well EMRs align with clinical trial eligibility criteria to evaluate their relevance. Due to the variability in EMR availability across trials and items, a preliminary assessment is necessary for each protocol. Since 42.3% of all items were unstructured, manual chart review may be unavoidable. Structured data were more prevalent in demography and treatment applicability, whereas imaging and medical history/comorbidity data posed major challenges. FDA guidelines highlight the need for validation and bias assessment in data transformation, requiring standardized processes to enhance EMR relevance for regulatory use.

The feasibility of extracting eligibility criteria and the degree of structuring in EMRs varied across trials and items. While imaging and medical history/comorbidity data were poorly structured, developing standardized data extraction methods may enhance the relevance of EMRs.



posters-wednesday: 48

Navigating complex and computationally demanding clinical trial simulation

Saumil Shah, Mitchell Thomann

Boehringer Ingelheim, Germany

Many diseases lacking treatment options have multiple correlated endpoints as progression biomarkers. Establishing efficacy in many endpoints with randomised dose-finding represents an unmet need.

A seamless Phase IIa-IIb trial design was proposed, featuring staggered recruitment, dropouts, longitudinal and correlated endpoints, and an interim analysis. The design also incorporated historical information through Bayesian meta-analytic priors and Bayesian dose-finding methods to improve trial efficiency. Scenario planning across a wide range of effects, dose-response models and endpoint correlations is a considerable challenge. Thus, a robust trial simulation implementation was required to estimate operating characteristics precisely and optimise the study design.

We used the random slope and intercept method to capture the longitudinal endpoint and patient-level variance. The correlated secondary endpoint was generated from conditional distributions. The informative historical prior was updated with the generated data to obtain a posterior. We used the posterior in the interim analysis to compare the across-arm gains in the change from baseline values. The final analysis used the multiple comparison procedure and Bayesian modelling for randomised dose-finding. We considered six appropriate candidate dose-response models for the Bayesian modelling. Each endpoint was assigned go/no-go boundaries for stop, continue or success decisions. We used the median of the posterior distribution from the fitted Bayesian models to make the decisions.

We used R programming language and available open-source packages to implement the trial simulation. The data generation and analysis steps were implemented as a collection of functions in a pipeline. The pipeline was managed using {targets}, a workflow management package. Such management allowed us to handle many scenarios and replicates, preventing redundant and unnecessary computations. It also helped with parallel execution, bringing execution time to the order of hours on a high-performance cluster.
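
A minimal _targets.R sketch of this kind of pipeline is shown below. The helper run_scenario() is a hypothetical user-defined simulation function (not a package API), and the scenario grid is invented; the sketch only illustrates how scenarios can be mapped to dynamic branches.

library(targets)

run_scenario <- function(scn, n_rep = 100) {
  # placeholder simulation: one row of operating characteristics per scenario
  data.frame(scn, power = mean(runif(n_rep) < 0.8))
}

list(
  tar_target(scenarios,
             expand.grid(effect = c(0, 0.3, 0.6),
                         dose_response = c("emax", "linear"),
                         rho = c(0.3, 0.6))),
  tar_target(oc,
             run_scenario(scenarios),
             pattern = map(scenarios))   # one branch per scenario row
)

Running tar_make() then executes only outdated branches, and tar_read(oc) returns the combined operating characteristics across scenarios.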

Our implementation enabled the rapid exploration of a wide range of trial scenarios and treatment effects, allowing reliable estimation of the operating characteristics of each design aspect. This approach provides a potent tool for optimising clinical trial design across therapeutic areas.



posters-wednesday: 49

Transforming Clinical Trials: The Power of Synthetic Data in Augmenting Control Arms

Emmanuelle Boutmy1, Shane O Meachair2, Julie Zhang5, Sabrina de Souza1, Saheli Das4, Dina Oksen3, Anna Tafuri1, Lucy Mosquera5,6

1Merck KGaA, Darmstadt, Germany; 2Aetion, Barcelona, Spain; 3Merck Biopharma Co., Ltd.; 4Merck Specialities Pvt., Ltd.; 5Aetion, New York, USA; 6CHEO Research Institute, Ottawa, Ontario, Canada

Background: Synthetic data generation (SDG) creates artificial datasets that replicate the characteristics of clinical trial (CT) data, potentially mitigating challenges when real data are scarce. This study aimed to explore methods for Synthetic Data Augmentation (SDA), i.e. adding synthetic data to the original data of a CT control arm. Data from the INTRAPID lung 0037 control arm were used, consisting of advanced NSCLC patients with high PD-L1 expression treated with pembrolizumab (n=152).

Methods: Three generative models were employed to create synthetic data: sequential decision trees (SDT), Bayesian networks (BN), and transformer synthesis (TS), alongside a reference approach using bootstrapping (BS). Descriptive statistics, parameter estimates, and standard errors were calculated with a multiple-imputation-type approach based on 10 synthetic datasets. The quality of synthetic data was assessed through utility and workload-specific assessments. The primary outcome was bias relative to the full CT control arm estimate, with the standard deviation used to assess variability across samples. Bias assessments compared augmented estimates to progression-free survival (PFS) from the full control arm, simulating scenarios with 50% of the control arm data unavailable.
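
The pooling step can be sketched in R as follows, using Rubin-type combining rules across the 10 completed datasets; the data generation here is a toy placeholder, and the combining rules appropriate for synthetic data differ in detail from standard multiple imputation, so this is only the general pooling idea.

library(survival)
set.seed(10)
m <- 10
fits <- lapply(1:m, function(i) {
  # placeholder for one augmented dataset (observed controls + one synthetic release)
  d <- data.frame(arm  = rbinom(152, 1, 0.5),
                  time = rexp(152, 0.08), event = rbinom(152, 1, 0.7))
  coxph(Surv(time, event) ~ arm, data = d)
})
est <- sapply(fits, function(f) coef(f)[1])
se  <- sapply(fits, function(f) sqrt(vcov(f)[1, 1]))
qbar <- mean(est); B <- var(est); U <- mean(se^2)
c(pooled = qbar, total_se = sqrt(U + (1 + 1/m) * B))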

Results: Univariate distances and multivariate relationships were below the pre-specified threshold, indicating close replication of the real data distribution for all models except TS. Synthetic data produced outcomes comparable to real data, with bias in PFS ranging from -2.69 months for TS to +0.2 months for SDT, where values closer to zero indicate better performance (-2.22 months for BN, -0.71 months for BS). SDT synthesis demonstrated the lowest bias among all augmentation methods, including the reduced control sample alone. In the sensitivity analysis, SDT was the only approach whose 95% interval included the ground-truth PFS estimate from the full CT control arm.

Discussion: These results suggest that generative models could yield nearly identical distributions for real and synthetic variables. SDA yielded estimates with low bias compared to using the available CT data alone and may be leveraged for clinical trials where patient enrollment in the control arm is difficult, for example by simulating trial scenarios or completing datasets for underrepresented groups. Further research is needed to confirm these findings and to develop a synthetic validation framework that assesses the limits of SDA for statistical inference, including its impact on other statistical quantities such as power and type I error rates, in order to harness the transformative potential of synthetic data.



posters-wednesday: 50

Integrating stakeholder perspectives in modeling routine data for therapeutic decision-making

Michelle Pfaffenlehner1,2, Andrea Dreßing3,4, Dietrich Knoerzer5, Markus Wagner6, Peter Heuschmann7,8,9, André Scherag10, Harald Binder1,2, Nadine Binder2,11

1Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center – University of Freiburg, Germany; 2Freiburg Center for Data Analysis and Modeling and AI, University of Freiburg, Freiburg, Germany; 3Department of Neurology and Clinical Neuroscience, Medical Center, University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany; 4Freiburg Brain Imaging Center, Faculty of Medicine, Medical Center–University of Freiburg, University of Freiburg, Freiburg, Germany; 5Roche Pharma AG, Grenzach, Germany; 6Stiftung Deutsche Schlaganfall-Hilfe, Gütersloh, Germany; 7Institute for Medical Data Sciences, University Hospital Würzburg, Würzburg, Germany; 8Institute for Clinical Epidemiology and Biometry, University Würzburg, Würzburg, Germany; 9Clinical Trial Centre, University Hospital Würzburg, Würzburg, Germany; 10Institute of Medical Statistics, Computer and Data Sciences, Jena University Hospital – Friedrich Schiller University Jena, Jena, Germany; 11Institute of General Practice/Family Medicine, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg, Germany

Background

Routine medical data offer a valuable resource for generating evidence to improve patient care in therapeutic contexts beyond randomized controlled trials. These data include patient-related parameters, diagnostic information, and treatment data recorded in digital patient records from hospital admission to discharge. With the introduction of the German Health Data Use Act (GDNG) in 2024, the use of such data is becoming more accessible in Germany. However, methodological approaches must account for the diverse needs of stakeholders, including clinicians, the pharmaceutical industry, patient advocacy groups, and statistical modelers. This study explores how different perspectives shape the use and interpretation of routine data in medical decision-making, with each perspective aiming to address specific research questions.

Methods

Building on insights from an interdisciplinary workshop that we recently organized, we examine how various stakeholder perspectives can be incorporated into the modelling of routine data. We discuss key routine data sources, such as electronic health records, and highlight statistical and artificial intelligence (AI)-based techniques that could be used to extract meaningful insights. Moreover, the linkage of patient-reported outcomes will be discussed to address the patient’s perspective. Additionally, we illustrate how different modelling approaches address distinct research questions, reflecting the priorities of the stakeholder groups. A particular focus is placed on multi-state models, which are well-suited for capturing disease and treatment trajectories by structuring diagnoses and treatments as transition events over time.
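
To make the multi-state idea concrete, here is a small R sketch using the multi-state Aalen-Johansen estimator from the survival package on invented toy transitions (admission, complication, discharge); it illustrates how diagnoses and treatments can be structured as transition events over time, not any specific analysis from this work.

library(survival)
# toy counting-process data; the status factor's first level denotes censoring
ms <- data.frame(
  id     = c(1, 1, 2, 3, 3, 4),
  tstart = c(0, 4, 0, 0, 6, 0),
  tstop  = c(4, 12, 9, 6, 15, 10),
  state  = factor(c("complication", "discharge", "discharge",
                    "complication", "censor", "censor"),
                  levels = c("censor", "complication", "discharge")))
fit <- survfit(Surv(tstart, tstop, state) ~ 1, data = ms, id = id)
summary(fit)$pstate    # state-occupation probabilities over time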

Results

Our conceptual analysis identifies multiple approaches for integrating diverse perspectives into routine data modelling. For example, clinicians prioritize clinical relevance and interpretability, the pharmaceutical industry focuses on regulatory compliance and real-world evidence, while patient representatives emphasize transparency and inclusion of patient-reported outcomes. Multi-state models are particularly advantageous because they allow the characterization of dynamic disease processes and patient transitions between states, offering a more accessible and interpretable approach to routine data analysis. Still, challenges remain in data quality.

Conclusion

Effective use of routine data in medical decision-making requires robust analytical methods that meet the needs of diverse stakeholders. Multi-state models provide a dynamic framework for capturing disease progression and treatment pathways, making them particularly suitable for clinical and regulatory applications. To maximize their impact, future research should focus on improving data integration, transparency in methods used, and making the methods practically useful, leading to better integration into healthcare decision-making.



posters-wednesday: 51

Aligning Synthetic Trajectories from Expert-Based Models with Real Patient Data Using Low-Dimensional Representations

Hanning Yang1, Meropi Karakioulaki2, Cristina Has2, Moritz Hess1, Harald Binder1

1Institute of Medical Biometry and Statistics (IMBI), University of Freiburg, Germany; 2Department of Dermatology, University of Freiburg, Germany

Background:

Quantitative models such as ordinary differential equations (ODEs) are widely used to describe dynamic processes such as disease progression, e.g. for subsequently generating synthetic data. However, calibrating them with real patient data, which are typically sparse, noisy, and highly heterogeneous, can be challenging. This is particularly notable in rare diseases like Epidermolysis Bullosa (EB), where observations are limited and data are often missing. To address this, we developed an approach to calibrate expert-informed ODEs with real, observational patient data using low-dimensional representations.

Methods:

We developed an ODE system informed by experts to model EB key biomarker dynamics and employed an autoencoder for dimensionality reduction. Calibration of ODE parameters was informed by a loss computed from the distance between real and ODE-derived synthetic observations in the latent space. Specifically, this loss captures key trajectory features, including temporal alignment and pointwise differences. To handle discrepancies in initial conditions, a centring approach is applied during early iterations, and an imputation layer is trained to address missing data.
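
A deliberately simplified R sketch of ODE calibration is given below: a one-biomarker logistic ODE is fitted to noisy observations by minimising a direct trajectory distance with optim(). The autoencoder latent-space loss, centring and imputation layer of the actual approach are not reproduced, and all dynamics and parameter values are invented.

library(deSolve)
set.seed(12)
ode_fun <- function(t, y, p) list(p["r"] * y * (1 - y / p["K"]))   # logistic biomarker dynamics
times   <- seq(0, 24, by = 3)

truth <- ode(y = c(B = 5), times = times, func = ode_fun,
             parms = c(r = 0.25, K = 60))[, "B"]
obs   <- truth + rnorm(length(times), sd = 3)        # sparse, noisy observations

loss <- function(par) {                              # par = c(r, K), unnamed
  pred <- ode(y = c(B = 5), times = times, func = ode_fun,
              parms = c(r = par[1], K = par[2]))[, "B"]
  sum((pred - obs)^2)                                # direct (non-latent) distance
}
fit <- optim(c(0.1, 40), loss)
fit$par    # calibrated parameters, roughly recovering (0.25, 60)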

Results:

A simulation study demonstrated robustness under high noise and complex missing patterns, with parameters converging to the ground truth. When applied to real EB data, our method consistently improved the alignment between synthetic and real data, despite the challenges of noisy and sparse observations from only 21 highly diverse patients. As a result, relationships in synthetic data became more consistent with real patient data.

Conclusion:

This study presents a novel approach for calibrating an expert-informed synthetic data model using neural networks, supporting realistic synthetic individual patient data (IPD) generation and advancing rare disease research.



posters-wednesday: 52

Integrating semantic information in care pathway studies with medical code embeddings, application to the case of Amyotrophic Lateral Sclerosis

Corentin Faujour1,2, Stéphane Bouee1, Corinne Emery1, Anne-Sophie Jannot2,3

1CEMKA, Bourg-La-Reine, France; 2Université Paris Cité, Inria, Inserm, HeKA, F-75015 Paris, France; 3French National Rare Disease Registry (BNDMR), Greater Paris University Hospitals (AP-HP), Université Paris Cité, Paris, France

Background

Modelling care pathways from claims databases is a challenging task given the tens of thousands of existing medical codes, sometimes associated with the same medical concept. In such modelling, medical codes are usually represented as sets of binary variables (one-hot encoding), which does not allow for the inclusion of semantic information. Embedding medical codes in a continuous space, so that semantically related codes are represented by numerically similar vectors, could improve care pathway modelling.

We aimed to embed codes from the International Classification of Diseases (ICD-10) and the Anatomical Therapeutic Chemical Classification (ATC) into a common latent space. A secondary goal was to use these embeddings in the prediction of amyotrophic lateral sclerosis (ALS).

Methods

A co-occurrence matrix between codes was constructed from care sequences contained in the ESND, a French claims database containing healthcare consumption data for a representative sample of 1.5 million patients over 15 years. Code embeddings for all 5 classification systems available in the ESND, i.e. representative numerical vectors that capture semantic relationships, were then obtained using singular value decomposition on the corresponding pointwise mutual information matrix.

Embeddings’ consistency was assessed using UMAP visualisation and nearest neighbour searches. The resulting embeddings were used to predict the occurrence of ALS in a penalized logistic regression model, taking as input all codes in the care sequence prior to diagnosis. Sequence-level embeddings were obtained by an average-pooling operation at the code level. We compared the performance obtained using embeddings as input with those obtained using one-hot encoding.
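
A scaled-down R sketch of this kind of pipeline is shown below (co-occurrence counts, positive PMI, SVD embeddings, average pooling per sequence, penalised logistic regression); the vocabulary size, embedding dimension and labels are invented, and the real pipeline on 400 million tokens would use sparse matrices and matched case-control labels rather than this toy setup.

library(glmnet)
set.seed(13)
V <- 50; n_seq <- 400; emb_dim <- 10
seqs <- lapply(1:n_seq, function(i) sample(1:V, rpois(1, 8) + 2, replace = TRUE))

# code-by-code co-occurrence within the same care sequence
C <- matrix(0, V, V)
for (s in seqs) { u <- unique(s); C[u, u] <- C[u, u] + 1 }
diag(C) <- 0

P    <- C / sum(C)
pmi  <- log(P / (rowSums(P) %o% colSums(P)))
ppmi <- pmax(pmi, 0); ppmi[!is.finite(ppmi)] <- 0    # positive PMI, zeros where undefined

sv  <- svd(ppmi)
emb <- sv$u[, 1:emb_dim] %*% diag(sqrt(sv$d[1:emb_dim]))   # code embeddings

# sequence-level embedding by average pooling, then penalised logistic regression
X <- t(sapply(seqs, function(s) colMeans(emb[s, , drop = FALSE])))
y <- rbinom(n_seq, 1, 0.5)                                 # toy case/control labels
cvfit <- cv.glmnet(X, y, family = "binomial", alpha = 0)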

Results

We obtained embeddings for 30,000 codes, including 9,900 ICD-10 codes and 1,400 ATC codes from 1.5 million care pathways representing 400 million tokens. Consistency evaluation revealed that semantically related codes form clusters in the latent space, e.g., the diagnosis code for motor neuron disease is surrounded by other muscle disorders (myopathies, muscular dystrophy, etc.) and its specific treatment (riluzole).

Using the resulting embeddings to classify sequences from 22,000 ALS patients and 22,000 matched controls, we were able to significantly improve predictive performance (AUC: 0.78, 95% CI [0.77-0.79] with embeddings vs. 0.74 [0.73-0.75] with one-hot encoding). This suggests that the inclusion of semantic information is relevant for such a prediction task.

Conclusion

This is the first semantic representation of ICD-10 and ATC codes in a common latent space, two classifications commonly used in claims databases. The resulting embeddings can be used to improve the representation of healthcare pathways.



posters-wednesday: 53

Inequalities in impact of respiratory viruses: development and analysis of respiratory virus phenotypes in EHRs from England using OpenSAFELY

Em Prestige1, Jennifer K. Quint2, Charlotte Warren-Gash1, William Hulme3, Edward PK Parker1, Elizabeth Williamson1, Rosalind M. Eggo1

1London School of Hygiene & Tropical Medicine, United Kingdom; 2Imperial College London, United Kingdom; 3Bennett Institute for Applied Data Science, Nuffield Department of Primary Care Health Sciences, University of Oxford, United Kingdom

Background

Respiratory virus burden is large and unequally distributed in England, with disproportionate impact in socioeconomically deprived areas and minority ethnic groups. To explore these disparities using electronic health records (EHRs), computable phenotypes must be designed to identify reported respiratory virus health events. However, many EHR codes are non-specific or uncertain; for example, a patient could have codes for ‘cough’ or ‘suspected influenza’, and neither of these would be a highly specific identifier of flu cases. Therefore, the required sensitivity and specificity of the phenotypes should determine which codes to include. This research explores the design of phenotypes to identify patients with respiratory viruses - respiratory syncytial virus (RSV), influenza (flu), and COVID-19 - and their subsequent application to exploring disparities in the impact of these conditions. We highlight the trade-offs between sensitivity and specificity in phenotype design and their implications for identifying health disparities.

Methods

With the approval of NHS England, we used pseudonymized GP data in OpenSAFELY, linked with Hospital Episode Statistics (HES) and ONS mortality data, to develop phenotypes for mild (primary/emergency care) and severe (secondary care) respiratory outcomes. For each virus, we created maximally sensitive and specific phenotypes to capture cases with more frequency or accuracy respectively. Maximally sensitive phenotypes included non-specific symptoms and suspected diagnosis codes, whereas, maximally specific phenotypes included lab test results. We then identified disparities by socioeconomic status and ethnicity in these outcomes from 2016-2024. We used Poisson regression for rates of mild and severe outcomes per 1000 person-years, adjusting for age group, sex, rurality, and where relevant, vaccination status. We performed analyses on the NHS records of approximately 45% of England’s population, presenting a unique opportunity to explore respiratory outcomes in cohorts where cases are rare or under-ascertained.
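
The rate modelling described above can be sketched in R as a Poisson regression with a person-time offset; the data below are simulated and the covariates and coefficients are invented, so this only illustrates the model form, not the OpenSAFELY analysis itself.

set.seed(14)
d <- data.frame(imd = factor(sample(1:5, 2000, replace = TRUE)),
                age = factor(sample(c("0-4", "5-17", "18-64", "65+"), 2000, TRUE)),
                sex = factor(sample(c("F", "M"), 2000, TRUE)),
                py  = runif(2000, 0.5, 3))                 # person-years of follow-up
d$events <- rpois(2000, lambda = 0.02 * d$py * as.numeric(d$imd))

fit <- glm(events ~ imd + age + sex + offset(log(py / 1000)),
           family = poisson, data = d)
exp(coef(fit))["imd5"]   # rate ratio, most vs least deprived quintile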

Results

We report differences and overlap in cases identified using specific versus sensitive phenotypes across the three pathogens. We describe the extent to which disparities in respiratory outcomes vary by pathogen, age cohort and severity of disease and use adjusted models to explore patterns of risk across ethnicity and socioeconomic status in different phenotypes.

Conclusion

Both highly specific and sensitive computable phenotypes are essential tools in EHR research. Their design should align with research objectives, balancing accuracy with the required number of outcomes. Exploring multiple phenotype definitions supports sensitivity analyses and subgroup evaluations. Furthermore, disparities in respiratory virus outcomes highlight the pathogen-specific risks and age-related vulnerabilities that should be targeted to minimise health inequities.



posters-wednesday: 54

Modeling Longitudinal Clinical Outcomes: Comparison of Generalized Linear Models, Generalized Estimating Equations, and Marginalized Multilevel Models in Pediatric Intensive Care

Luca Vedovelli1, Stefania Lando1, Danila Azzolina2, Corrado Lanera1, Ileana Baldi1, Dario Gregori1

1University of Padova, Italy; 2University of Ferrara, Italy

Introduction Longitudinal data analysis is essential in neonatal and pediatric intensive care, where patient outcomes evolve rapidly, such as in sepsis progression or respiratory distress. Selecting the right statistical model is critical for accurate clinical effect estimation. We compared four modeling approaches—generalized linear models (GLM), GLM with a shrinkage factor, generalized estimating equations (GEE), and marginalized multilevel models (MMM)—in scenarios replicating real-world complexity, including random effects, latent effects, and transition dynamics. Our study evaluated model accuracy, robustness, and interpretability in small and variable cluster settings typical of intensive care units, where patient populations are often limited, heterogeneous, and subject to rapid physiological changes.

Methods We conducted a simulation study reflecting the heterogeneity of clinical trajectories in neonatal and pediatric intensive care. Scenarios included non-fixed patient clusters ranging from 4 to 10 and sample sizes between 20 and 150. Models were evaluated based on Mean Absolute Percentage Error (MAPE), Type I and Type II error rates, and parameter stability. We assessed the impact of incorporating shrinkage factors in GLM to mitigate estimation biases.
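
As a small illustration of the kind of comparison performed, the R sketch below fits a GEE and a random-intercept GLMM to one simulated longitudinal dataset and computes the absolute percentage error of the arm effect; the marginalized multilevel model requires specialised fitting and is not shown, and all simulation settings are invented.

library(geepack)
library(lme4)
set.seed(15)
n_id <- 30; n_t <- 4
d <- data.frame(id = rep(1:n_id, each = n_t), time = rep(0:(n_t - 1), n_id))
d$arm <- rep(rbinom(n_id, 1, 0.5), each = n_t)
d$y   <- 1 + 0.5 * d$time + 0.8 * d$arm +
         rep(rnorm(n_id, 0, 1), each = n_t) + rnorm(nrow(d), 0, 1)

gee_fit  <- geeglm(y ~ time * arm, id = id, data = d, family = gaussian,
                   corstr = "exchangeable")
glmm_fit <- lmer(y ~ time * arm + (1 | id), data = d)

true_arm <- 0.8
mape <- function(est) 100 * abs(est - true_arm) / abs(true_arm)
c(GEE = mape(coef(gee_fit)["arm"]), GLMM = mape(fixef(glmm_fit)["arm"]))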

Results MMM consistently outperformed GEE and GLM in small sample sizes and low cluster counts, yielding lower MAPE and reduced bias. This superior performance is due to its integration of marginal and subject-specific effects while accounting for within-cluster correlation. As sample size and cluster numbers increased, performance differences diminished. GEE and GLM exhibited high variability in small samples, with GEE particularly unstable. GLM tended to overestimate effects, inflating Type I error rates. MMM maintained a controlled Type I error rate, though at the cost of slightly reduced power.

Conclusion In neonatal and pediatric intensive care, where patient populations are small and heterogeneous, MMM is a more reliable alternative to GEE and GLM. It balances interpretability and robustness, making it well suited for longitudinal clinical applications. While GLM is adequate in large datasets, its tendency to overestimate effects warrants caution, as it may misguide clinical decisions. GEE, although widely used, is less stable in small samples. Our findings support the use of MMM for clinical research requiring accurate inference of treatment effects and patient trajectories. Future work should explore Bayesian extensions of MMM for enhanced inferential precision through improved uncertainty modeling, small-sample estimation, and incorporation of prior knowledge.



posters-wednesday: 55

Modelling the cost-effectiveness of Truvada for the Prevention of Mother-to-Child Transmission (PMTCT) of Hepatitis B Virus in Botswana

Graceful Mulenga1,2, Motswedi Anderson1,4,5, Simani Gaseitsiwe1,3

1Botswana Harvard Health Partnership, Botswana; 2Department of Mathematics and Statistical Sciences, Faculty of Science, Botswana International University of Science and Technology, Palapye , Botswana; 3Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Boston, MA, USA; 4The Francis Crick Institute, London, UK; 5Africa Health Research Institute, Durban, South Africa

Hepatitis B virus (HBV) infection remains a major public health challenge globally, with approximately 254 million people living with chronic HBV, including over 6 million children under 5 years. Mother-to-child transmission (MTCT) of HBV is responsible for a significant portion of new infections, particularly in high-prevalence regions. Infants born to HBV-infected mothers are at risk of chronic infection, which can lead to severe liver disease later in life. Truvada (TDF), a nucleoside reverse transcriptase inhibitor, is a recommended antiviral treatment for both HBV and HIV and has shown potential in reducing MTCT of HBV. However, the cost-effectiveness of TDF for preventing MTCT in resource-limited settings like Botswana is not well established. This study aims to evaluate the feasibility and cost-effectiveness of three distinct strategies for screening and managing HBV among pregnant women in Botswana. The study will use a cohort of pregnant women in Botswana, assessing three groups as follows: i) no HBV screening or treatment is provided, and TDF prophylaxis is not administered (control group); ii) screening for Hepatitis B surface antigen (HBsAg) is conducted for all pregnant women, with TDF prophylaxis administered to those who test positive for HBsAg, beginning at 28 weeks gestation and continuing for four weeks postpartum; iii) screening for both HBsAg and HBV e-antigen (HBeAg) is performed, and TDF prophylaxis is administered exclusively to women who test positive for both HBsAg and HBeAg. A cost-utility analysis (CUA) will be conducted to compare the costs and clinical outcomes of each strategy, with effectiveness measured in terms of the number of HBV transmissions prevented. Costs will include screening for HBsAg and HBeAg, TDF treatment, hepatitis B immunoglobulin (HBIG) for infants, all components of the intervention (such as training, administration, and supervision) and maternal healthcare. In addition, a decision-analytic model that allows the generation of cost-effectiveness estimates will be designed. The Incremental Cost-Effectiveness Ratio (ICER) will be calculated to assess the cost per case of HBV transmission prevented for each strategy. Moreover, sensitivity analyses will be performed to test the robustness of results under varying assumptions related to drug costs, screening effectiveness, and intervention costs.
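
The ICER calculation itself can be illustrated with a toy R decision tree; every probability and unit cost below is invented and not a study estimate.

p_hbsag <- 0.05; mtct_no_tx <- 0.30; mtct_tdf <- 0.05     # assumed probabilities
births  <- 10000

strategies <- data.frame(
  strategy = c("no_screening", "screen_HBsAg_TDF"),
  cost     = c(0,
               births * 5 +                               # HBsAg test, assumed unit cost
               births * p_hbsag * 120),                   # TDF course, assumed unit cost
  transmissions = c(births * p_hbsag * mtct_no_tx,
                    births * p_hbsag * mtct_tdf))

icer <- diff(strategies$cost) / -diff(strategies$transmissions)
icer   # incremental cost per mother-to-child transmission averted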



posters-wednesday: 56

Application of machine learning methods for the analysis of randomised controlled trials: A systematic review

Xiao Xuan Tan, Rachel Phillips, Mansour Taghavi Azar Sharabiani

Imperial College London

Background

Randomised controlled trials (RCTs) collect extensive data on adverse events (AEs), yet their analysis and presentation are often overly simplistic, leading to missed opportunities for identifying potential signals of treatment-related harm. A 2024 scoping review identified a variety of machine learning (ML) approaches being employed in RCTs to identify heterogeneous treatment effects (HTEs) across key participant subgroups [1]. This highlights the range of ML methods being explored to derive insights from RCT data. ML methods hold potential to enhance AE analysis, offering tools to better interpret complex AE data and support data-driven, personalised treatment harm profiles. This review aims to identify ML methods and evaluate applications for analysis of RCT data, revealing both established and potentially suitable ML approaches that could be adapted to analyse AE data in RCTs. Additionally, this review will highlight emerging trends in ML applications to RCTs, including shifts in commonly used techniques, evolving best practices, and expanding use cases beyond HTE analysis.

Methods

A systematic search was conducted in November 2024 via the Embase, MEDLINE, Web of Science and Scopus databases, alongside the preprint repositories arXiv, medRxiv and bioRxiv. Articles were eligible if they applied ML methods to analyse or reanalyse RCT datasets, irrespective of the types of outcomes examined, and accounted for the RCT’s treatment assignment in their analyses. Following screening, a pre-piloted data extraction sheet will be used to systematically collect relevant study details.

Results

After deduplication, 11,286 articles were retrieved. Following title and abstract review, 2,015 articles were eligible for full-text review. Data extraction and synthesis are underway. The results presented will describe (i) study characteristics (e.g., purpose of analysis), (ii) RCT characteristics (e.g., medical area, trial design, outcomes examined), and (iii) ML methods used, including model implementation details, use of explainability tools (e.g., SHAP, LIME), results, limitations, and reproducibility considerations (e.g., software, code availability, dataset access).

Conclusion

The findings of this review will provide a comprehensive overview of applications of ML methods in RCTs, guiding trialists in their potential use for future trial design and analysis. Additionally, it will pinpoint ML techniques most relevant to the analysis of AEs, an area where more advanced analytical approaches are needed to facilitate early identification of potential signals of harm and improve the understanding of treatment-related harm.

[1] Inoue K, et al. Machine learning approaches to evaluate heterogeneous treatment effects in randomized controlled trials: a scoping review. Journal of Clinical Epidemiology. 2024; 176: 111538.



posters-wednesday: 57

Joint longitudinal modelling of non-normally distributed outcomes and endogenous covariates

Chiara Degan1, Bart Mertens1, Pietro Spitali2, Erik H. Niks2, Jelle Goeman1, Roula Tsonaka1

1Department of Biomedical Data Sciences, Leiden University Medical Center, The Netherlands; 2Department of Human Genetics, Leiden University Medical Center, The Netherlands

In biomedical research, longitudinal outcomes and endogenous time-dependent covariates are often recorded, creating the need to develop methodological approaches to assess their associations, evaluate how one outcome changes in relation to the covariate, and determine how this relationship evolves over time.
To address these aspects, the endogenous covariate and the outcome are typically modelled jointly by assuming correlated random effects (Verbeke et al., 2014). We refer to this model as the Joint Mixed Model (JMM). This approach allows variables of different types to be combined, while preserving the parameter interpretation of the univariate case and accommodating unbalanced data. However, the association is interpreted through the correlation of random effects rather than directly on the scale of the observed variables. Moreover, by making assumptions on the form of the variance-covariance matrix of the random effects, we impose constraints on the form of the variables' association that may lead to biased estimates if misspecified.

As an alternative, we consider a modification of the joint model proposed by Rizopoulos (2017), adapting it to include only longitudinal outcomes rather than a time-to-event component. We refer to this adapted model as the Joint Scaled Model (JSM). It induces the association by copying and scaling the linear predictor of the endogenous covariate into the linear predictor of the outcome. This approach preserves the advantages of the JMM while improving interpretability.

To compare the two model results and assess the impact of their underlying assumptions on conclusions, we propose to analytically derive an association coefficient that measures the marginal relation between the variables. The purpose of this coefficient is to construct a quantity that has the same meaning in both models and can be interpreted similarly to a regression coefficient. It measures the change in the outcome in response to a unit change in the covariate. Furthermore, it is a quantity that depends on the time of both variables, allowing it to capture the cross-sectional effect of the endogenous covariate on the outcome, as well as their relationship at different time points (lag effect).

The practical application of these models is limited by computational costs, which arise from high-dimensional integrations over random effects. To fill this gap, a flexible Bayesian estimation approach, known as INLA, has been used.

We will present the results of a longitudinal study on Duchenne Muscular Dystrophy patients, with a focus on evaluating the relationship between a bounded outcome and blood biomarkers.



posters-wednesday: 58

Mediation analysis for exploring gender differences in mortality among acute myocardial infarction

Alice Bonomi, Arianna Galotta, Francesco Maria Mattio, Lorenzo Cangiano, Giancarlo Marenzi

IRCCS Centro Cardiologico Monzino, Italy

Background. Women with acute myocardial infarction (AMI) have higher mortality rates than men, influenced by factors such as older age, comorbidities, atypical symptoms, and treatment delays. This study analyzed AMI patients (2003-2018) from the Lombardy Health Database (Italy) to investigate sex differences in in-hospital and one-year mortality, assessing the impact of age, percutaneous coronary intervention (PCI), and post-discharge therapy using mediation analysis.

Methods. Among 263,564 AMI patients (93,363 women, 170,201 men), the primary and secondary endpoints were in-hospital and one-year mortality, respectively. Mediation analysis was performed to evaluate the direct and indirect effects of sex on outcomes, incorporating age, PCI, and post-discharge therapy as mediators. The analysis was conducted using the SAS Proc CALIS procedure (SAS Institute Inc., Cary, NC, USA) based on structural equation modeling, with relationships quantified using standardized β coefficients.
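
An R analogue of this kind of structural equation mediation model is sketched below with the lavaan package (the study itself used SAS PROC CALIS); the simulated data, variable names and effect sizes are invented, and the binary outcome is treated on the linear scale purely for illustration.

library(lavaan)
set.seed(17)
n    <- 2000
sex  <- rbinom(n, 1, 0.35)                       # 1 = female
age  <- 70 + 5 * sex + rnorm(n, 0, 8)
pci  <- rbinom(n, 1, plogis(1 - 0.3 * sex - 0.03 * (age - 70)))
mort <- rbinom(n, 1, plogis(-2 + 0.2 * sex + 0.05 * (age - 70) - 0.5 * pci))
d <- data.frame(sex, age, pci, mort)

model <- '
  age  ~ a1 * sex
  pci  ~ a2 * sex + age
  mort ~ c1 * sex + b1 * age + b2 * pci
  indirect_age := a1 * b1
  indirect_pci := a2 * b2
  total        := c1 + a1 * b1 + a2 * b2
'
fit <- sem(model, data = d)
parameterEstimates(fit)[parameterEstimates(fit)$op == ":=", ]   # mediated components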

Results. Women had significantly higher in-hospital mortality (10% vs. 5%; P<0.0001) and one-year mortality (24% vs. 14%; P<0.0001) compared to men. Mediation analysis revealed that female sex directly contributed 12% to in-hospital mortality and 4% to one-year mortality, whereas age and undertreatment accounted for the majority of the disparity (88% [β=0.09] and 96% [β=0.15], respectively).

Conclusion. Women with AMI experience higher mortality, primarily due to older age and undertreatment, both during hospitalization and after discharge. Addressing these disparities through optimized treatment strategies may improve outcomes in women with AMI.



posters-wednesday: 59

Bivariate random-effects models for the meta-analysis of rare events

Danyu Li, Patrick Taffe

Center for Primary Care and Public Health (unisanté), Division of Biostatistics, University of Lausanne (UNIL), Switzerland

It is well known that standard methods of meta-analysis, such as the inverse variance or DerSimonian and Laird methods, break down with rare binary events. Not only are effect sizes and within-study variances badly estimated, but heterogeneity is generally not identifiable or is strongly underestimated, and the overall summary index is biased. Many alternative estimation methods have been proposed to improve the estimates in sparse data meta-analysis. In addition to the Bivariate Generalized Linear Mixed Model (BGLMM), the Marginal Beta-Binomial and the Sarmanov Beta-Binomial models are competitive alternatives. These models have already been used in the context of meta-analysis of diagnostic accuracy studies, where the correlation between sensitivity and specificity is likely to be strongly negative. To the best of our knowledge, they have not been investigated in the context of rare events and sparse data meta-analysis with a focus on estimating the Risk Difference (RD), Relative Risk (RR), and Odds Ratio (OR). Therefore, the goal of this study was to assess the performance and robustness of these three competing models in this context. More specifically, the robustness of each model is assessed using data-generating processes based on the other two competing models. For example, if the data were simulated based on the Sarmanov distribution, then the BGLMM and Marginal Beta-Binomial models are misspecified, and assessing their robustness is of interest. According to the simulation results, the BGLMM performs worst regardless of the misspecification of the distribution. The Sarmanov Beta-Binomial model and the Marginal Beta-Binomial model perform better and are more stable due to their lower variance.
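
As a base-R illustration of the beta-binomial idea (fitted separately per arm by maximum likelihood; the Sarmanov model additionally links the two arms and is not reproduced here), the sketch below uses invented sparse-event data and should not be read as the authors' estimation procedure.

set.seed(18)
k  <- 20
n1 <- n0 <- sample(50:500, k, replace = TRUE)
x1 <- rbinom(k, n1, 0.004); x0 <- rbinom(k, n0, 0.002)   # toy rare events per arm

bb_negll <- function(par, x, n) {                        # beta-binomial negative log-likelihood
  a <- exp(par[1]); b <- exp(par[2])                     # positivity via log parameters
  -sum(lchoose(n, x) + lbeta(x + a, n - x + b) - lbeta(a, b))
}
fit1 <- optim(c(0, 5), bb_negll, x = x1, n = n1)
fit0 <- optim(c(0, 5), bb_negll, x = x0, n = n0)
p1 <- exp(fit1$par[1]) / sum(exp(fit1$par))              # marginal event probability, arm 1
p0 <- exp(fit0$par[1]) / sum(exp(fit0$par))              # marginal event probability, arm 0
c(RR = p1 / p0, RD = p1 - p0, OR = (p1 / (1 - p1)) / (p0 / (1 - p0)))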



posters-wednesday: 60

Time-varying Decomposition of Direct and Indirect Effects with Multiple Longitudinal Mediators

Yasuyuki Okuda1, Masataka Taguri2

1Daiichi Sankyo Do., Ltd., Japan; 2Tokyo Medical University

Recent advances in mediation analysis using causal inference techniques have led to the development of sophisticated methods for complex scenarios, including those involving multiple time-varying mediators. Although these approaches accommodate time-varying mediators, their estimates are typically restricted to a single timepoint of interest, thus limiting our understanding of the temporal dynamics of mediation processes. In many clinical contexts, it is essential to capture how mediator effects vary over time to elucidate underlying mechanisms and optimize intervention timing. For example, temporal variations in direct and indirect effects can reveal critical windows during which a treatment exerts its primary influence. To address these limitations, we proposed a novel framework that extends existing approaches based on interventional direct and indirect effects with multiple time-varying mediators and treatment-mediator interaction.

Our method not only decomposes the overall effect into direct and indirect effects, but also further decomposes these effects into time-varying components to investigate mediated effects both up to and beyond the timepoint (t), thereby capturing their longitudinal trajectories. We also proposed a practical estimation approach using marginal structural models (MSMs) for both the outcome and the mediators, with inverse probability weighting (IPW) to account for time-varying confounders.
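
The weighting step can be sketched in R as follows for a single binary time-varying mediator: pooled logistic models for the numerator and denominator, cumulated stabilised weights within patient, and a weighted repeated-measures outcome model. The data-generating mechanism, variable names and model forms are invented, and the full decomposition proposed by the authors is not reproduced.

library(geepack)
set.seed(19)
n <- 300; Tt <- 3
d <- do.call(rbind, lapply(1:n, function(i) {
  A <- rbinom(1, 1, 0.5); L <- M <- Y <- numeric(Tt)
  for (t in 1:Tt) {
    L[t] <- rnorm(1, 0.3 * A + if (t > 1) 0.4 * M[t - 1] else 0)
    M[t] <- rbinom(1, 1, plogis(-0.5 + 0.5 * A + 0.6 * L[t]))
    Y[t] <- 1 - 0.4 * A - 0.3 * M[t] + rnorm(1, 0, 1)
  }
  data.frame(id = i, t = 1:Tt, A = A, L = L, M = M, Y = Y)
}))

den <- glm(M ~ A + L + factor(t), family = binomial, data = d)
num <- glm(M ~ A + factor(t),     family = binomial, data = d)
d$w_t <- dbinom(d$M, 1, fitted(num)) / dbinom(d$M, 1, fitted(den))
d$sw  <- ave(d$w_t, d$id, FUN = cumprod)            # cumulative stabilised weight

msm <- geeglm(Y ~ A * factor(t) + M, id = id, data = d,
              weights = sw, corstr = "independence")
summary(msm)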

To illustrate the utility of our method, we applied it to the data from a randomized controlled trial evaluating the effect of a mineralocorticoid receptor (MR) blocker on urinary albumin-to-creatinine ratio (UACR) reduction. Specifically, we investigated how much of the treatment effect is mediated by changes in blood pressure and renal function (measured by eGFR) and explored differences in their mediator-specific effects over time. Our analysis indicated that the mediated effects via both systolic blood pressure and eGFR were relatively small compared with other pathways, with different patterns observed in their longitudinal trajectories.

We believe our approach provides investigators with a valuable tool for understanding an agent's mechanism of action, distinguishing it from other agents, and ultimately informing treatment decisions appropriate for each patient.



posters-wednesday: 61

Causal framework for analyzing mediation effects of clinical biomarkers

Jinesh Shah

CSL Behring, Germany

For a biomarker to be at least a "level 3 surrogate" that is "reasonably likely to predict clinical benefit for a specific disease and class of interventions" [1], it must be either a mediator [1,2] on the causal pathway between treatment and response, or else be causally downstream of such a mediator. We investigate causal mediation analysis as an approach to statistically infer potential mediation effects of biomarkers. The steps involve graphically stating the causal structure using DAGs, formulating estimands of interest, and using statistical methods to derive estimates. However, longitudinal clinical data are commonplace, and causal estimation with such data is notoriously challenging; standard statistical methods might not provide appropriate estimates of the target quantities. Thus, we also explore methods to account for time-varying confounding in mediation analysis. One such method provides a reasonable approximation by "landmarking" the biomarker process at a particular timepoint t [3] and modelling the clinical outcome data after time t. We aim to outline fundamental ideas of causal mediation analysis [4] and delineate a potential framework for its use in clinical development.
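As a sketch of the landmarking idea only (hypothetical column names, not the authors' analysis): subjects still event-free at the landmark time t are retained, the biomarker value at t enters as a fixed covariate, and the clinical outcome is modelled from the landmark onwards.

```python
import pandas as pd
from lifelines import CoxPHFitter

def landmark_analysis(df, landmark_time):
    """Landmark the biomarker process at time t and model outcomes after t.

    df columns (hypothetical): treatment, biomarker_t (biomarker value at t),
    event_time, event (1 = clinical event, 0 = censored).
    """
    # Keep only subjects still event-free and under follow-up at the landmark time
    at_risk = df[df["event_time"] > landmark_time].copy()
    # Reset the clock so that analysis time starts at the landmark
    at_risk["time_from_landmark"] = at_risk["event_time"] - landmark_time
    cph = CoxPHFitter()
    cph.fit(at_risk[["time_from_landmark", "event", "treatment", "biomarker_t"]],
            duration_col="time_from_landmark", event_col="event")
    return cph

# model = landmark_analysis(df, landmark_time=90)
# model.print_summary()
```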

(1) Fleming, T.R. and Powers, J.H. Biomarkers and surrogate endpoints in clinical trials. Stat. Med. 31 (2012):2973–2984.

(2) Joffe, M.M. and Greene, T. Related causal frameworks for surrogate outcomes. Biometrics 65 (2009):530–538.

(3) Putter, H. and van Houwelingen, H.C. Understanding landmarking and its relation with time-dependent Cox regression. Stat. Biosci. 9 (2017):489–503.

(4) Imai, K., Keele, L. and Tingley, D. A general approach to causal mediation analysis. Psychol. Methods 15 (2010):309–334.



posters-wednesday: 62

A Modified Doubly Robust Estimator for Clustered Causal Inference: Integrating GLMM and GBM

Samina Naznin, Dr. Mohaimen Monsur

Institute of Statistical Research and Training, University of Dhaka, People's Republic of Bangladesh

Background: Causal inference uncovers cause-and-effect relationships, but confounding in observational studies complicates estimation. Propensity score methods like IPW are sensitive to model misspecification, while doubly robust estimators are more reliable when either the treatment or outcome model is correctly specified. Recent advancements in machine learning improve propensity score estimation, especially when traditional methods like logistic regression fail. However, most studies focus on single-level data and overlook clustered multilevel structures, which can lead to biased estimates. Recent efforts to combine machine learning with mixed-effects models have improved propensity score estimation in clustered settings but still face vulnerabilities due to treatment model misspecification.

Methods: This study proposes a modified doubly robust estimator for causal effect estimation in clustered data that integrates generalized linear mixed models (GLMM) with generalized boosted models (GBM). This novel approach leverages GBM's ability to manage complex functional forms and incorporates a random intercept to handle clustering effects.
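For orientation, a single-level sketch of the doubly robust (AIPW-type) combination of a boosted propensity model with arm-specific outcome regressions; the proposed estimator's GLMM random intercept for clustering is not reproduced here, and the data below are toy data for illustration only.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LinearRegression

def aipw_ate(X, a, y):
    """Augmented IPW (doubly robust) estimate of the average treatment effect.

    Propensity scores come from a gradient boosted classifier; the outcome
    models are linear regressions fit separately in each treatment arm.
    """
    ps = GradientBoostingClassifier().fit(X, a).predict_proba(X)[:, 1]
    ps = np.clip(ps, 0.01, 0.99)                      # avoid extreme weights
    m1 = LinearRegression().fit(X[a == 1], y[a == 1]).predict(X)
    m0 = LinearRegression().fit(X[a == 0], y[a == 0]).predict(X)
    return np.mean(m1 - m0
                   + a * (y - m1) / ps
                   - (1 - a) * (y - m0) / (1 - ps))

# Toy confounded data: true treatment effect is 2
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 3))
a = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
y = 2.0 * a + X[:, 0] + rng.normal(size=2000)
print(round(aipw_ate(X, a, y), 2))
```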

Results: Through extensive simulation, the proposed doubly robust method with GLMM boosting outperforms existing methods in terms of bias and standard error, particularly when either the treatment or outcome model is correctly specified. The proposed method only requires the selection of appropriate covariates and does not require correct functional form specification, as it incorporates GBM to handle misspecification. Even when both the treatment and outcome models are incorrect, our method still shows superior performance. Applying this method to BDHS 2022 data to estimate the causal effect of ANC on birth weight, we find a modest impact (26.75–33.4g) with large standard errors, suggesting no significant effect. Unlike existing methods that may overestimate effects, our approach provides a more conservative estimate.

Conclusion: Our study highlights the importance of robust causal inference methods, proposing a doubly robust GLMM Boosting approach that reduces bias in clustered data and outperforms IPW, g-formula, and standard methods, especially with misspecified treatment models. This approach offers a more reliable alternative for researchers facing model uncertainty in clustered settings, ensuring more accurate causal effect estimation. We have demonstrated its effectiveness by applying it to BDHS data, making it accessible and practical for practitioners to use in real-world scenarios.



posters-wednesday: 63

DoubleMLDeep: Estimation of Causal Effects with Multimodal Data

Martin Spindler1,3, Victor Chernozhukov2, Philipp Bach1, Jan Teichert-Kluge1, Sven Klaassen1,3, Suhas Vijaykumar2

1Universität Hamburg, Germany; 2MIT, USA; 3Economic AI, Germany

This paper explores the use of unstructured, multimodal data, namely text and images, in causal inference and treatment effect estimation. We propose a neural network architecture that is adapted to the double machine learning (DML) framework, specifically the partially linear model. An additional contribution of our paper is a new method to generate a semi-synthetic dataset which can be used to evaluate the performance of causal effect estimation in the presence of text and images as confounders. The proposed methods and architectures are evaluated on the semi-synthetic dataset and compared to standard approaches, highlighting the potential benefit of using text and images directly in causal studies. Our findings have implications for researchers and practitioners in medicine, biostatistics and data science in general who are interested in estimating causal quantities using non-traditional data.



posters-wednesday: 64

Causal Machine Learning Methods for Estimating Personalised Treatment Effects - Insights on validity from two large trials

Hongruyu Chen, Helena Aebersold, Milo Alan Puhan, Miquel Serra-Burriel

University of Zurich, Switzerland

Causal machine learning (ML) methods hold great promise for advancing precision medicine by estimating personalised treatment effects. However, their reliability remains largely unvalidated in empirical settings. In this study, we assessed the internal and external validity of 17 mainstream causal heterogeneity ML methods, including metalearners, tree-based methods, and deep learning methods, using data from two large randomized controlled trials: the International Stroke Trial (N=19,435) and the Chinese Acute Stroke Trial (N=21,106). Our findings reveal that none of the ML methods validated reliably, either internally or externally, showing significant discrepancies between training and test data on the proposed evaluation metrics. The individualized treatment effects estimated from training data failed to generalize to the test data, even in the absence of distribution shifts. These results raise concerns about the current applicability of causal ML models in precision medicine and highlight the need for more robust validation techniques to ensure generalizability.



posters-wednesday: 65

Challenges with subgroup analyses in individual participant data meta-analysis of randomised trials

Alain Amstutz1,2,3, Dominique Costagliola4, Corina S. Rueegg2,5,6, Erica Ponzi2,5, Johannes M. Schwenke1, France Mentré7,8, Clément R. Massonnaud7,8, Cédric Laouénan7,8, Aliou Baldé4, Lambert Assoumou4, Inge C. Olsen2,5, Matthias Briel1,9, Stefan Schandelmaier9,10,11

1Division of Clinical Epidemiology, Department of Clinical Research, University Hospital Basel and University of Basel, Basel, Switzerland; 2Oslo Centre for Biostatistics and Epidemiology, Oslo University Hospital, Oslo, Norway; 3Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom; 4Sorbonne Université, Inserm, Institut Pierre-Louis d’Épidémiologie et de Santé Publique, Paris, France; 5Department of Research Support for Clinical Trials, Oslo University Hospital, Oslo, Norway; 6Epidemiology, Biostatistics and Prevention Institute, University of Zurich, Zurich, Switzerland; 7Université Paris Cité, Inserm, IAME, Paris, France; 8Département d’Épidémiologie, Biostatistique et Recherche Clinique, Hôpital Bichat, AP-HP, Paris, France; 9Department of Health Research Methods, Evidence, and Impact (HEI), McMaster University, Hamilton, Canada; 10School of Public Health, University College Cork, Cork, Ireland; 11MTA–PTE Lendület "Momentum" Evidence in Medicine Research Group, Medical School, University of Pécs, Pécs, Hungary

Background: Individual participant data meta-analyses (IPDMA) offer the opportunity to conduct credible subgroup analyses of randomized clinical trial data by standardising subgroup definitions across trials, avoiding between-trial information sharing, and enabling effect comparison from trial to trial. These advantages are reflected and judged in items 1 and 2 of the Instrument for the Credibility of Effect Modification ANalyses (ICEMAN), a tool increasingly used by Cochrane meta-analysts and the Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group. However, guidance on optimal approaches to inform these ICEMAN items when conducting an IPDMA is limited and might differ between one-stage and two-stage IPDMA models. We recently conducted two large IPDMAs, analysing 20 COVID-19 trials with over 23,000 randomised participants. Here, we provide a case report on the approaches used to inform ICEMAN items 1 and 2.

Methods: Following a pre-specified protocol, we applied one- and two-stage models for these IPDMAs and documented challenges and mitigation strategies throughout the subgroup analysis process to enhance guidance for future updates of the ICEMAN tool.

Results: We identified several challenges. First, ensuring that the one-stage model separates within-trial from between-trial information (ICEMAN item 1), as the two-stage model does by design, is difficult and requires stratification of certain parameters and correct specification of the random parameters. Second, default estimation methods may differ across the statistical packages used for one- and two-stage models, resulting in different interaction estimates to inform ICEMAN item 1. Third, choosing descriptive thresholds for continuous effect modifiers in meta-analyses of interaction plots can mislead about the direction of effect modification in individual trials (ICEMAN item 2). We developed illustrative modular R code to inform ICEMAN item 1 with one- and two-stage models, and provided plots with meta-analyses of interaction estimates alongside trial-specific subgroup effects to inform ICEMAN item 2.
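As an illustration of the first point (not the authors' modular R code): one common way to keep within-trial interaction information separate in a one-stage model is to center the effect modifier within trials, so that its trial-level mean carries the across-trial information; a sketch in Python with statsmodels, using hypothetical column names.

```python
import statsmodels.formula.api as smf

def one_stage_interaction(df):
    """One-stage IPD model separating within- from across-trial interaction.

    df columns (hypothetical): trial, y (outcome), treat (0/1),
    x (continuous effect modifier).
    """
    df = df.copy()
    # Trial mean of x carries the across-trial (ecological) information;
    # the within-trial centered value carries the within-trial information.
    df["x_mean"] = df.groupby("trial")["x"].transform("mean")
    df["x_cent"] = df["x"] - df["x_mean"]
    model = smf.mixedlm(
        "y ~ treat + x_cent + x_mean + treat:x_cent + treat:x_mean",
        data=df, groups="trial", re_formula="~treat",
    )
    fit = model.fit()
    # treat:x_cent is the within-trial interaction (ICEMAN item 1);
    # treat:x_mean reflects across-trial information and is kept separate.
    return fit

# print(one_stage_interaction(df).summary())
```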

Conclusion: At the conference, we will present these challenges in detail, their mitigation strategies and discuss the need for refining methods guidance to evaluate the effect modification credibility in IPDMAs using the ICEMAN tool.



posters-wednesday: 66

Illustration and evaluation of a causal approach to sensitivity analysis for unmeasured confounding using measured proxies with a simulation study

Nerissa Nance1,2, Romain Neugebauer3

1Novo Nordisk, Denmark; 2University of California, Berkeley CA; 3Kaiser Permanente Northern California Division of Research, Pleasanton CA

Introduction

Sensitivity analysis for unmeasured confounding is a key component of applied causal analyses using observational data [1]. A general method [2] based on a rigorous causal framework has been previously proposed; this approach addresses limitations of existing methods, such as reliance on arbitrary parametric assumptions or on expert opinion that does not take advantage of the available data. We illustrate and evaluate this general method through a simulation study.

Methods
We simulated data using a parametrized nonparametric structural equation model. Our simulated observed data consisted of an unmeasured covariate, a measured covariate, an exposure, and an outcome. We studied the performance of point and interval estimation of an inverse probability weighting estimator that aims to adjust for unmeasured confounding through a measured proxy variable. We assessed this method under a range of scenarios, including interaction terms with the exposure and various strengths and directions of association between the covariates and the exposure/outcome.
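A minimal sketch of this kind of set-up (illustrative structural equations and parameter values, not the study's): the exposure effect is confounded by an unmeasured covariate U, and an IPW estimator adjusts for its measured proxy W.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 50_000

# Structural equations: U unmeasured, W a measured proxy of U, A exposure, Y outcome.
# True causal effect of A on Y is 1.
U = rng.normal(size=n)
W = U + rng.normal(scale=0.3, size=n)          # proxy strongly associated with U
A = rng.binomial(1, 1 / (1 + np.exp(-U)))
Y = 1.0 * A + U + rng.normal(size=n)

def ipw_effect(covariates):
    """IPW difference in weighted outcome means, adjusting for the given covariates."""
    X = sm.add_constant(covariates)
    ps = sm.Logit(A, X).fit(disp=0).predict(X)
    w = A / ps + (1 - A) / (1 - ps)
    return (np.average(Y[A == 1], weights=w[A == 1])
            - np.average(Y[A == 0], weights=w[A == 0]))

print("unadjusted:", round(np.mean(Y[A == 1]) - np.mean(Y[A == 0]), 2))
print("IPW with proxy W:", round(ipw_effect(W.reshape(-1, 1)), 2))  # most confounding removed
print("IPW with U (oracle):", round(ipw_effect(U.reshape(-1, 1)), 2))
```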

Results

We demonstrated potential bias elimination and recovery of confidence interval coverage from unmeasured confounding in the case where the unmeasured covariate has the same magnitude and direction of association with both exposure and outcome as the measured proxy. However, in other scenarios, such as when the measured and unmeasured confounders had antagonistic effects, recovery was low or minimal.

Discussion

We illustrate through simulations that when the unmeasured confounder and the measured proxy have the same magnitude and direction of association with the exposure and the outcome, the true unconfounded effect can be fully recovered. However, we also show how this recovery can break down in other situations that analysts may encounter. Results from this study inform key practical considerations for applying these methods and highlight potential limitations.

References

  1. Dang LE et al. A causal roadmap for generating high-quality real-world evidence. J Clin Transl Sci. 2023 Sep 22;7(1):e212.

  2. Luedtke, A.R., Diaz, I. and van der Laan, M.J., 2015. The statistics of sensitivity analyses.



posters-wednesday: 67

Quantifying causal treatment effect on binary outcome in RCTs with noncompliance: estimating risk difference, risk ratio and odds ratio

Junxian Zhu, Mark Y. Chan, Bee-Choo Tai

National University of Singapore, Singapore

Randomized controlled trials (RCTs) are currently the most reliable method for empirically evaluating the effectiveness of a new drug. However, patients may fail to adhere to the treatment protocol due to side effects. Medical guidelines recommend reporting the risk difference (RD), the risk ratio (RR) and the odds ratio (OR), as they offer distinct perspectives on the effect of the same drug. Unlike for the RD, only a few methods are available to estimate the RR and OR in RCTs in the presence of non-compliance. In this paper, we propose new inverse probability weighting (IPW)-based RD, RR and OR estimators for RCTs in the presence of non-compliance. This IPW-based method creates a new categorical variable by utilizing information on non-compliance with the randomly assigned treatment. For all estimators, we prove identification and asymptotic normality, and derive corresponding asymptotic confidence intervals. We evaluate the performance of these three estimators through an intensive simulation study. Their application is further demonstrated using data from the IMMACULATE trial on remote post-discharge treatment for patients with acute myocardial infarction.



posters-wednesday: 68

Blinded sample size recalculation for randomized controlled trials with analysis of covariance

Takumi Kanata, Yasuhiro Hagiwara, Koji Oba

Department of Biostatistics, School of Public Health, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan

Background / Introduction: In randomized controlled trials, covariate adjustment can improve statistical power and reduce the necessary sample size compared to an unadjusted estimator. Analysis of covariance (ANCOVA) is often used to adjust for baseline covariates when the outcome is continuous. When designing the sample size based on ANCOVA, it is necessary to pre-specify the association between the outcome and the baseline covariates, as well as that among the baseline covariates. However, determining these parameters at the design stage is challenging. While it may be possible to assess them adaptively during the trial, the statistical impact of doing so remains unclear. In this study, we propose a blinded sample size recalculation method for the ANCOVA estimator, which is asymptotically valid under minimal distributional assumptions and thus allows for arbitrary model misspecification.

Methods: We show that the asymptotic variances of the ANCOVA estimator and the unadjusted estimator can be calculated using the pooled outcome and baseline covariates when treatment is randomly assigned in a 1:1 ratio independently of the baseline covariates. This result is valid under arbitrary model misspecification. Our proposal is as follows. First, we calculate the sample size based on a t-test without adjusting for baseline covariates. Then, at a specific time point (e.g. when 50% of outcomes have been observed), we assess the relevant parameters under blinded conditions, without examining between-group differences, and recalculate the final sample size taking into account the asymptotic variance reduction achieved through covariate adjustment. We conducted simulations to evaluate the performance of the proposed method under various scenarios.
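A simplified sketch of this recalculation logic (not the authors' exact procedure): estimate the pooled outcome SD and the pooled R² of the outcome on the baseline covariates from blinded interim data, then deflate the unadjusted sample size by 1 − R². The small inflation of the pooled SD by the treatment effect is ignored here.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

def unadjusted_n_per_arm(delta, sd, alpha=0.05, power=0.9):
    """Normal-approximation sample size per arm for a two-sample comparison."""
    z = stats.norm.ppf(1 - alpha / 2) + stats.norm.ppf(power)
    return int(np.ceil(2 * (z * sd / delta) ** 2))

def blinded_recalculated_n(y_blinded, X_blinded, delta, alpha=0.05, power=0.9):
    """Recalculate the ANCOVA sample size from pooled (blinded) interim data.

    The pooled R^2 of the outcome on baseline covariates approximates the
    asymptotic variance reduction of the ANCOVA estimator under 1:1 randomization.
    """
    sd_pooled = np.std(y_blinded, ddof=1)
    r2 = LinearRegression().fit(X_blinded, y_blinded).score(X_blinded, y_blinded)
    n_unadj = unadjusted_n_per_arm(delta, sd_pooled, alpha, power)
    return int(np.ceil(n_unadj * (1 - r2)))

# Toy interim data: one covariate correlated ~0.5 with the outcome
rng = np.random.default_rng(3)
x = rng.normal(size=(200, 1))
y = 0.5 * x[:, 0] + rng.normal(scale=np.sqrt(0.75), size=200)
print(blinded_recalculated_n(y, x, delta=0.3))
```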

Results: The proposed method achieved the nominal statistical power under various scenarios and reduced the necessary sample size at the final analysis according to the correlations between the outcome and the baseline covariates; for example, when the correlations were 0.5, the sample size reduction ranged from 15% to 36% on average. Although the proposed method is based on asymptotic results, it performed well with relatively small sample sizes. We also found that the type I error rate at the final analysis was not affected by the proposed method.

Conclusion: The proposed sample size recalculation method achieves the nominal statistical power in randomized controlled trials analysed with ANCOVA without inflating the type I error. The proposed method can reduce the necessary sample size, which would contribute to efficient drug development.



posters-wednesday: 69

Variance stabilization transformation for the intraclass correlation coefficient of agreement with an application example to meta-analyses of inter-rater reliability studies

Abderrahmane Bourredjem1,2,3, Isabelle Fournel1, Sophie Vanbelle4, Nadjia El Saadi3

1Inserm CIC1432, Centre d’investigation clinique, Module Epidémiologie Clinique/Essais cliniques, CHU de Dijon, France; 2Institut de Mathématiques de Bourgogne, UMR 5584, CNRS, Université de Bourgogne, F-21000 Dijon, France; 3LAMOPS, École Nationale Supérieure de Statistique et d’Economie Appliquée, Kolea, Algeria; 4Department of Methodology and Statistics, Care and Public Health Research Institute (CAPHRI), Maastricht University, The Netherlands

Introduction:

We consider the problem of variance stabilizing transformation (VST) for the two-way intra-class correlation coefficient of agreement (ICC2a) in inter-rater reliability studies, when both raters and subjects are assumed to be randomly selected from their respective populations. Such transformations aim to make the variance of the ICC2a independent of its estimate, improving the ICC2a confidence interval (CI) and the combination of independent ICC2as in meta-analyses. In this work, we calculate three potential VSTs for the ICC2a, evaluate their properties by simulation for single-study CIs, and demonstrate their use in meta-analyses of inter-rater reliability studies.

Methods:

It was recently shown that the variance of the ICC2a estimate depends on a nuisance parameter, defined as the ratio of the inter-rater to the inter-subject variances. Using this variance expression, three VST approximations (denoted T0, T1 and T2) were obtained, each addressing the nuisance parameter differently. A simulation study with small to moderate sample sizes compared the properties of the obtained VSTs against two reference CI methods: 1) the modified large sample approach (MLS), and 2) a beta-distribution-based method (β). Finally, we illustrate the use of our VSTs on a single inter-rater reliability study in which 10 physiotherapists evaluated the exercise performance of 42 low back pain patients, as well as on a meta-analysis of 11 inter-rater reliability studies of upper extremity muscle tone measurements.

Results:

The analytical expressions of the three VSTs vary in complexity, from T0 (the simplest) to T2 (the most complex, requiring numerical methods to calculate its inverse transformation), with T1 in between. Simulations show that for small samples (up to 30 subjects and fewer than 10 raters), the MLS and β approaches remain preferable. For medium-sized samples (from 40 subjects and 10 raters), T1 provides coverage rates close to 95% while shortening the CI length. In the meta-analysis example, T1 offers advantages, including transformed estimates that are simpler to interpret and a better accounting of study weights in the synthesis of the ICC2a estimates and their CIs.

Conclusion:

We propose a novel VST (denoted T1) for the ICC2a, filling a gap in the literature. We recommend using T1 for ICC2a CIs in medium-sized individual studies and for meta-analyses of inter-rater reliability studies. However, more extensive simulations are required to refine this recommendation, especially for meta-analyses.



posters-wednesday: 70

Bridging Single Arm Studies with Individual Participant Data in Network Meta-Analysis of Randomized Controlled Trials: A Simulation Study

Katerina Maria Kontouli, Stavros Nikolakopoulos, Christos Christogiannis, Dimitrios Mavridis

University of Ioannina, Greece

Background: There is growing interest in including single-arm studies within health technology assessments (HTA). Manufacturers often have access to individual participant data (IPD) from their own studies (e.g., a single-arm study evaluating treatment B), while only aggregate data (AGD) are available from published studies (e.g., comparing treatments C, D, etc. to a reference treatment A). Several methods, such as the Matching-Adjusted Indirect Comparison (MAIC) and the Simulated Treatment Comparison (STC), have been suggested to estimate an indirect effect (e.g., B vs A, B vs C) when the distributions of prognostic factors and effect modifiers differ across studies. The aim is to evaluate MAIC and STC in estimating an indirect effect in this scenario through a simulation study.

Methods: We examined three methods: two widely used adjusted methods for unanchored comparisons, MAIC and STC, and the naïve (unadjusted) method. We applied these methods to incorporate single-arm studies with available interventions within a connected network of randomized controlled trials. To optimize the matching process, we employed two distinct distance metrics: Gower’s and Mahalanobis distance. Our simulation study explored various scenarios, varying (i) the sample size of studies, (ii) the magnitude of the treatment effect, (iii) the correlation between continuous covariates representing study population characteristics, (iv) the baseline probability, and (v) the degree of overlap between the single-arm study and the RCTs.
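For reference, the core MAIC weighting step (following Signorovitch et al.): weights are exponential in a linear combination of the IPD covariates, with coefficients chosen by convex optimization so that the weighted IPD covariate means exactly match the aggregate-data means; a sketch with simulated covariates and illustrative target values.

```python
import numpy as np
from scipy.optimize import minimize

def maic_weights(X_ipd, target_means):
    """Matching-adjusted indirect comparison weights.

    X_ipd        : (n, p) covariate matrix from the single-arm IPD study
    target_means : (p,) covariate means reported by the aggregate-data trial
    """
    Xc = X_ipd - target_means                  # center on the target means
    # Minimizing sum(exp(Xc @ a)) is convex; its gradient is the moment condition
    objective = lambda a: np.sum(np.exp(Xc @ a))
    gradient = lambda a: Xc.T @ np.exp(Xc @ a)
    res = minimize(objective, np.zeros(X_ipd.shape[1]), jac=gradient, method="BFGS")
    w = np.exp(Xc @ res.x)
    ess = w.sum() ** 2 / np.sum(w ** 2)        # effective sample size after weighting
    return w, ess

# Check: the weighted IPD covariate means reproduce the aggregate-data means
rng = np.random.default_rng(5)
X = rng.normal(size=(300, 2)) + [0.3, -0.2]
w, ess = maic_weights(X, target_means=np.array([0.0, 0.0]))
print(np.round(np.average(X, axis=0, weights=w), 3), round(ess, 1))
```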

Results: Our simulation results indicate that when all continuous covariates are drawn from the same distributions with zero correlation, all methods perform similarly in terms of bias, mean squared error (MSE), and coverage across all scenarios. However, when the covariate overlap between the single-arm study and the RCTs is around 80%, the naïve (Bucher) method produces more biased estimates than MAIC and STC. As the overlap decreases to approximately 60%, the differences between MAIC and STC become more pronounced, particularly in terms of coverage and MSE.

Conclusion: STC emerges as the most robust approach for integrating evidence from single-arm studies into a network of RCTs. Additionally, Mahalanobis distance proves to be effective in identifying the optimal match, enhancing the reliability of the synthesis.



posters-wednesday: 71

Comparative Efficacy and Safety of Migraine Treatments: A Network Meta-Analysis of Clinical Outcomes

Shashank Tripathi1, Rachna Agarwal2

1University College of Medical Sciences GTB Hospital, New Delhi, India; 2Institute of Human Behavior and Allied Sciences, New Delhi, India

Introduction

Migraine is a common and debilitating neurological condition, affecting roughly 10% of the global population and placing a significant burden on public health. It occurs in episodes, often characterized by intense headaches accompanied by sensitivity to light (photophobia), sensitivity to sound (phonophobia), and a range of autonomic and sensory disturbances.

Methods

A comprehensive search of three databases was conducted up to April 30, 2023. A frequentist network meta-analysis was used to estimate both direct and indirect effects for three outcomes: mean migraine days, freedom from pain at two hours, and adverse events. Interventions were ranked independently for each outcome using the p-score. The choice of meta-analysis model was based on the I² statistic: a random-effects model was applied when I² exceeded 30%, while a fixed-effect model was used when I² was ≤30%. All statistical analyses were performed using R version 4.3.2.

Results

A total of 80 articles were included in the current investigation. For change in mean migraine days (MMD), the direct estimates were statistically significant for CGRP antagonist [SMD: -0.38 (-0.61, -0.14)], CGRP mAbs [SMD: -0.35 (-0.41, -0.31)] and Triptans [SMD: -0.36 (-0.62, -0.10)]. Similarly, for freedom from pain at two hours, the direct estimates were statistically significant for CGRP antagonist [RR: 5.83 (2.50, 13.59)], Dihydroergotamine [RR: 19.92 (3.41, 116.76)], Nasal agent (NSAID) [RR: 10.27 (1.03, 102.28)], Nasal agent (Triptan) [RR: 8.27 (3.51, 19.56)], NSAID [RR: 19.1 (7.36, 49.01)], and Triptan [RR: 22.82 (16.74, 31.12)]. Additionally, for the outcome adverse events, the direct estimates were statistically significant for CGRP mAbs [RR: 2.77 (1.97, 3.91)], CGRP antagonist [RR: 2.92 (1.95, 4.37)], Dihydroergotamine [RR: 3.99 (1.47, 10.82)], NSAID [RR: 4.21 (2.1, 8.1)], Nasal agent (CGRP antagonist) [RR: 7.61 (2.31, 25.19)], Triptans [RR: 8.40 (6.91, 10.22)], and Nasal agent (triptan) [RR: 23.57 (9.01, 61.71)]. Indirect estimates were also calculated for each outcome of interest, using each of the treatments under investigation as the reference treatment.

Conclusion

A network meta-analysis of migraine treatments found Triptans to be highly effective for pain, though with a higher risk of adverse events. CGRP antagonists excelled at reducing monthly migraine days but also had increased side effects.



posters-wednesday: 72

Optimal standardization as an alternative to matching using propensity scores

Ekkehard Glimm, Lillian Yau

Novartis Pharma, Switzerland

In many development programs in the pharmaceutical industry, there is a need for indirect comparisons of medical treatments that were investigated in separate trials. Usually, trials have slightly different inclusion criteria, hence the influence of confounding factors has to be removed for a “fair” comparison. The most common method applied for this is propensity score matching. This method yields a set of weights used to re-weight patients in such a way that the weighted averages of the confounding variables are rendered comparable across the studies.

Propensity score matching typically achieves "roughly matched" groups, but almost invariably some differences between the averages of the matching variables in the compared trials remain.

We have recently suggested an exact-matching approach that may serve as an alternative to propensity score matching via a logistic regression model. It treats the matching problem as a constrained optimization problem and guarantees that, post-matching, the averages of the matching variables from the two trials are identical. While several objective functions could in principle be chosen to generate a set of weights, in this talk we focus on weights that maximize the effective sample size (ESS).
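A sketch of this constrained-optimization view (one possible implementation, not necessarily the exact algorithm of the referenced work): maximize the effective sample size subject to the weighted covariate means matching the target trial's means exactly.

```python
import numpy as np
from scipy.optimize import minimize

def ess_maximizing_weights(X, target_means):
    """Weights that maximize the effective sample size subject to exact matching.

    For weights summing to 1, maximizing ESS = (sum w)^2 / sum w^2 is
    equivalent to minimizing sum w^2 subject to the moment constraints.
    """
    n = X.shape[0]
    constraints = [
        {"type": "eq", "fun": lambda w: np.sum(w) - 1.0},
        {"type": "eq", "fun": lambda w: X.T @ w - target_means},
    ]
    res = minimize(lambda w: np.sum(w ** 2), np.full(n, 1.0 / n),
                   bounds=[(0.0, 1.0)] * n, constraints=constraints, method="SLSQP")
    w = res.x
    return w, 1.0 / np.sum(w ** 2)            # ESS for weights summing to 1

rng = np.random.default_rng(8)
X = rng.normal(size=(200, 2)) + [0.4, -0.3]
w, ess = ess_maximizing_weights(X, target_means=np.array([0.0, 0.0]))
print(np.round(X.T @ w, 3), round(ess, 1))    # exactly matched means, achieved ESS
```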

While the approach is closely related to matching-adjusted indirect comparison (MAIC, Signorovitch et al, 2010), it goes beyond their suggestion because we do not impose a specific functional form on the matching weights. Furthermore, in the talk we focus on the case where individual patient data (IPD) is available from all trials in the analysis, whereas the original MAIC approach considered only the matching of IPD onto aggregated data.

In the talk, we illustrate the application of the approach to two studies. Furthermore, we present the results from a simulation study showing that the new suggestion leads to weights which are considerably more stable than propensity score weights.

References

Signorovitch JE, Wu EQ, Andrew P, et al. Comparative effectiveness without head-to-head trials: a method for matching-adjusted indirect comparisons applied to psoriasis treatment with adalimumab or etanercept. PharmacoEconomics. 2010;28(10):935-945.

Glimm, E. and Yau, L. (2022): Geometric approaches to assessing the numerical feasibility for conducting matching-adjusted indirect comparisons. Pharm Stat 21, 974-987.



posters-wednesday: 73

Evaluating Diagnostic Tests Against Composite Reference Standards: Quantifying and Adjusting for Bias

Vera Hudak, Nicky J. Welton, Efthymia Derezea, Hayley Jones

University of Bristol, United Kingdom

Background: Composite reference standards (CRSs) are often used in diagnostic accuracy studies in situations where gold standards are unavailable or impractical to carry out on everyone. Here, the test under evaluation is compared with some combination (composite) of results from other tests. We consider a special case of CRS, which we refer to as a ‘check the negatives’ design. Here, all study participants receive an imperfect reference standard, and those who test negative on this are additionally tested with the gold standard. Unless the imperfect reference standard is 100% specific, some bias can be anticipated.

Methods: We derive algebraic expressions for the bias in the estimated accuracy of the test under evaluation in a ‘check the negatives’ study, under the assumption that test errors are independent given the true disease status. We then describe how bias can be adjusted for using a Bayesian model with an informative prior for the specificity of the imperfect reference standard, based on external information. Our approach is evaluated through a simulation study under two scenarios. First, we consider the case where the prior for the specificity of the imperfect reference standard is correctly centred around its true value, and we assess the impact of increasing uncertainty by increasing the prior standard deviation. Second, we examine the case where the prior is incorrectly centred, but the true value remains within the 95% prior credible interval, to explore the consequences of moderate prior misspecification.
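For intuition, here is one way the large-sample behaviour can be written under that conditional-independence assumption (our own sketch of the algebra, not necessarily the authors' exact expressions), with p the prevalence and Se/Sp the sensitivity and specificity of the test under evaluation (T) and of the imperfect reference standard (R):

```latex
% Classified negative = R- and gold-standard negative, i.e. truly non-diseased,
% so the apparent specificity converges to the true Sp_T.
% Classified positive = all truly diseased plus the (1 - Sp_R) fraction of
% non-diseased with a false-positive R result, so the apparent sensitivity is a
% weighted average of Se_T and (1 - Sp_T):
\widehat{Sp}_T \;\longrightarrow\; Sp_T,
\qquad
\widehat{Se}_T \;\longrightarrow\;
\frac{p\,Se_T + (1-p)\,(1-Sp_R)\,(1-Sp_T)}{p + (1-p)\,(1-Sp_R)}
\;\le\; Se_T \quad \text{whenever } 1 - Sp_T \le Se_T .
```

When Sp_R = 1, the second expression reduces to Se_T, recovering the unbiased case.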

Results/Conclusions: In a ‘check the negatives’ study, under the assumption of conditional independence between the errors made by the test under evaluation and the imperfect reference standard, the estimated specificity is unbiased but the sensitivity is underestimated. Preliminary findings suggest that, if the informative prior is correctly centred, the Bayesian model always reduces bias and can eliminate it in some, but not all, scenarios. Full simulation results, including those with an incorrectly centred prior, and their implications will be presented at the conference.



posters-wednesday: 74

Characteristics, Design and Statistical Methods in Platform Trials: A Systematic Review

Clément R. Massonnaud1,2, Christof Manuel Schönenberger3, Malena Chiaborelli3, Selina Ehrenzeller3, Alexandra Griessbach3, André Gillibert4, Matthias Briel3, Cédric Laouénan1,2

1Université Paris Cité, Inserm, IAME, F-75018 Paris, France; 2AP-HP, Hôpital Bichat, Département d’Épidémiologie, Biostatistique et Recherche Clinique, F-75018 Paris, France; 3CLEAR Methods Center, Division of Clinical Epidemiology, Department of Clinical Research, University Hospital Basel, University of Basel, Basel, Switzerland; 4Department of Biostatistics, CHU Rouen, Rouen, France

Background

Platform trials (PTs) are gaining popularity in clinical research due to their innovative and flexible methodologies. However, their complex design underscores the need for a review of how they are currently implemented. The objective of this systematic review was to determine the characteristics and the methodological and statistical practices of PTs.

Methods

We identified PTs from trial registries and bibliographic databases up to August 2024. Eligible PTs were randomized controlled trials studying multiple interventions within a single population, with flexibility to add or drop arms. Data were extracted on trial status, design, statistical methods, and reporting practices. Key variables included sample size determination, interim analyses, and type I error control. Descriptive statistics summarized findings across therapeutic areas and statistical framework (frequentist or Bayesian).

Results

We identified 190 PTs. Most focused on infectious diseases (77 [40.5%], including 57 for COVID-19) and oncology (69 [36.3%]). PT initiation peaked during the COVID-19 pandemic but has since stabilized at 85 active trials, with 25 PTs in planning. Non-industry sponsorship accounted for 78% (142/183) of PTs, with differences between infectious disease (95%, 71/75) and oncology trials (51%, 35/68). A complete master protocol was available for 47% (89/190) of all PTs and for 55% (83/152) of ongoing, completed, or discontinued PTs. Amendments were tracked in 61% (52/85) of protocols with multiple versions. Registry entries were considered up-to-date for 87% (153/175) of registered PTs. Bayesian designs featured in 59/190 PTs versus 56/190 frequentist trials, 20/190 trials utilizing both frameworks (unclear statistical framework in 55/190 PTs). Overall, 25/111 trials (23%) were designed without a pre-determined target sample size, all of which were Bayesian. Among these, 15 were explicitly reported as “perpetual” trials. The number of interim analyses was pre-determined in 19% (11/58) of Bayesian trials versus 58% (28/48) of frequentist trials. Simulations to evaluate operating characteristics were used in 93% (39/42) of Bayesian trials. Simulation reports were available in 67% (26/39) of cases, and the procedure was detailed for 62% (24/39) of trials. Only two trials shared the simulation code.

Conclusions

Platform trials remain popular and increasingly diverse. Efforts to enhance transparency and reporting, especially in complex Bayesian platform trials, are essential to ensure reliability and broader acceptance.



posters-wednesday: 75

WRestimates: An R Package for Win-Ratio Sample Size and Power Calculations

Autumn O Donnell

University of Galway, Ireland

The win-ratio has excellent potential for determining the overall efficacy of treatments and therapies in clinical trials. Its ability to hierarchically account for multiple endpoints provides a holistic metric of the treatment effect. For the win-ratio to become a prominent and reliable statistical method outside of cardiovascular disease, there is a need for a straightforward approach to study design, particularly power and sample size determination. An appropriate method for determining these quantities is vital to ensure the validity of the results obtained in a study. The WRestimates package provides easy-to-use functions for determining the required sample size and power of studies implementing the win-ratio. These allow the calculation of sample size and power based on estimands or pilot data, negating the need for complex simulation-based methods that require many assumptions about the data.



posters-wednesday: 76

Randomizing With Investigator Choice of Treatment: A Powerful Pragmatic Tool in Clinical Trials

Lillian Yau, Betty Molloy

Novartis Pharma, Switzerland

Taking a patient-centric approach, pragmatic clinical trials aim for study designs that are closer to clinical practice. Results of treatment benefits and risks of new medical products from these trials can provide information for patients, health care professionals, and decision-makers that are more easily generalized to the real world.

We present as an example the design of a multi-regional, phase III registration study for a first-line cancer treatment. The study compares an experimental treatment against two generations of standards-of-care (SoC) that are approved and used worldwide for newly diagnosed patients. The first generation (1G) and second generation (2G) treatments differ with respect to efficacy and safety.

To mimic clinical practice, before randomization, trial investigators and patients together selected an SoC option based on patient-related factors such as age, comorbidities, disease characteristics, as well as on regional practice. This choice of SoC was used as a stratification factor in the randomization and subsequent data analysis. This approach facilitates causal inference on the comparison of the experimental treatment with the different SoC options (1G or 2G) separately and combined.

To satisfy the requirements of different health authorities and the reimbursement agencies, joint primary endpoints as well as key secondary endpoints were designed to be tested at different time points. Strong-control of the type I error was guaranteed by combining multiplicity adjustment with group sequential testing.

By allowing investigator choice of 4 currently available SoC in the active control arm, the study optimized patients’ treatment and reduced the risk of exclusion of patients. It was very attractive to both patients and physicians, as reflected in the fast recruitment with close to 30 patients per month, nearly double what was expected in this disease area.

The study had its primary and key secondary read-outs in 2024. The primary results were the basis of the approval of the new treatment in many countries including the US, Canada, and Switzerland. The key secondary results are used to support the submission to EMA.

This study not only advances the therapeutic landscape but also sets a benchmark for future clinical trials, demonstrating that patient-centered strategies and robust designs can address the requirements of multiple decision makers and can lead to significant advancements in clinical research and patient care.



posters-wednesday: 77

Confirming assay sensitivity in 2-arm non-inferiority trial using meta-analytic-predictive approach

Satomi OKAMURA1, Eisuke HIDA2

1Department of Medical Innovation, The University of Osaka Hospital, Japan; 2Graduate School of Medicine, The University of Osaka, Japan

Introduction and Objective: Assay sensitivity is a well-known issue in 2-arm non-inferiority (NI) trials. To assess assay sensitivity, a 3-arm NI trial including placebo, control, and treatment arms is strongly recommended, although this raises concerns about ethics and feasibility. FDA guidance on NI trials states: “In the absence of a placebo arm, knowing whether the trial had assay sensitivity relies heavily on external information (not within-study), giving NI studies some of the characteristics of a historically controlled trial.” Hence, the new NI trial must be similar to the historical trials. Additionally, the historical trials must have consistently shown that the ‘control’ in the NI trial is superior to placebo. Superiority here requires that the effect of the ‘control’ minus the NI margin is greater than that of placebo, not merely that the effect of the ‘control’ itself is greater.

Our objective is to propose a method to ensure the similarity of the NI trial to the historical trials and the superiority of ‘control’ to placebo for confirming assay sensitivity in the 2-arm NI trial. Information from historical trials usually consists of aggregate data. However, it has become clear that when effect modifiers are present, simple summary statistics for the entire population are insufficient. Therefore, it is important that the proposed method take into account the presence of effect modifiers.

Method and Results: To assess assay sensitivity, we use the meta-analytic-predictive approach. This is a Bayesian approach, so the prior distribution, especially that for the between-trial heterogeneity in this study, is crucial. We assume the heterogeneity parameter follows a half-normal distribution (for the standard deviation) or an inverse-gamma distribution (for the variance). The performance of the approach is evaluated from two perspectives. First, we assess the influence of the prior setting on assay sensitivity by varying the amount of prior information about the between-trial heterogeneity. Second, we demonstrate which kinds of trials may reduce assay sensitivity by setting multiple conditions for the historical trials, such as the number of historical trials, sample size, effect size, and the properties of effect modifiers. For each scenario, we compute the posterior distribution of the ‘control’ effect and assess the performance of the method through joint power and type I error rate.
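A minimal sketch of such a meta-analytic-predictive prior with a half-normal heterogeneity prior (the historical effect estimates, standard errors, prior scales and NI margin below are placeholder values, not those of the study):

```python
import numpy as np
import pymc as pm

# Historical control-vs-placebo effect estimates and their standard errors (placeholders)
theta_hat = np.array([0.42, 0.31, 0.55, 0.38])
se = np.array([0.12, 0.15, 0.10, 0.14])

with pm.Model() as map_model:
    mu = pm.Normal("mu", 0.0, 1.0)                 # mean control-vs-placebo effect
    tau = pm.HalfNormal("tau", 0.5)                # between-trial heterogeneity (SD scale)
    theta = pm.Normal("theta", mu, tau, shape=len(theta_hat))
    pm.Normal("obs", theta, se, observed=theta_hat)
    # Predictive distribution of the control effect in the new non-inferiority trial
    theta_new = pm.Normal("theta_new", mu, tau)
    idata = pm.sample(2000, tune=1000, chains=4, random_seed=10)

# Assay sensitivity check: probability that the control effect exceeds the NI margin
margin = 0.2
draws = idata.posterior["theta_new"].values.ravel()
print("P(control effect - margin > 0):", np.mean(draws - margin > 0))
```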

Conclusions: Our simulation study suggests that the meta-analytic-predictive approach is a useful method for evaluating assay sensitivity in a 2-arm NI trial. In particular, the explicit handling of uncertainty offered by the Bayesian approach is of great benefit when only aggregate data from the historical trials are available.



posters-wednesday: 78

Adding baskets to an ongoing basket trial with information borrowing: When do you benefit?

Libby Daniells1, Pavel Mozgunov1, Helen Barnett3, Alun Bedding4, Thomas Jaki1,2

1MRC Biostatistics Unit, Cambridge University, United Kingdom; 2Faculty of Informatics and Data Science, University of Regensburg, Regensburg, Germany; 3Department of Mathematics and Statistics, Lancaster University, Lancaster, United Kingdom; 4Roche Products, Ltd, Welwyn Garden City, United Kingdom

Innovation in trial designs has led to the development of basket trials, in which a single therapeutic treatment is tested in several patient populations, each of which forms a basket. This trial design allows treatments to be tested in rare diseases or in subgroups of patients. However, limited basket sample sizes can cause a lack of statistical power and imprecise treatment effect estimates. This is tackled through the use of Bayesian information borrowing.

To provide flexibility to these studies, adaptive features are desirable, as they allow for pre-specified modifications to an ongoing trial. In this talk we focus on the incorporation of one or more newly identified baskets part-way through a study. We propose and compare several approaches for adding new baskets to an ongoing basket trial under an information borrowing structure, and highlight when it is beneficial to add a new basket to an ongoing trial as opposed to running a separate investigation for it. We also propose a novel calibration of the decision criteria in basket trials that is robust with respect to false decision making. Results show a substantial improvement in power for a new basket when information borrowing is utilized; however, this comes with potential inflation of error rates. This inflation is reduced under the novel calibration procedure.



posters-wednesday: 79

Optimizing Adaptive Trial Design to Ensure Robustness Across Varying Treatment Effect Assumptions

Valeria Mazzanti1, Dirk Klingbiel2

1Cytel Inc.; 2Bristol Myers Squibb

Background:

Strong adaptive clinical trial design relies on several key aspects: experience in a therapeutic area; expertise in statistical methodology; and appropriate technology to assess design robustness. A recent study design assessment for a compound under development in Hematology highlighted each of these aspects in an interesting way. The study’s primary endpoint was Progression-Free Survival (PFS), though there was also strong interest in monitoring observed events for Overall Survival (OS), adding to the design’s complexity. Our aim in this assessment and optimization process was to shorten the expected average study duration, while still ensuring appropriate statistical power to detect a minimally clinically viable treatment effect for this product.

Methods:

The original design targeted 90% power using a 1-sided alpha of 0.025 and a 1:1 randomization approach. The design included one interim analysis after 40% of PFS events, assessing futility only. In our simulation plan, we varied the number of interim analyses (1 or 2 interim looks) and explored the impact of a variety of interim timings and types of assessment (futility and/or efficacy) on the expected number of events. We also employed a multi-state model to simulate PFS and OS events for each patient so that we could report how many OS events would be observed at each analysis of PFS. These variations resulted in over 5,000 parameter combinations that were ranked and scored in line with the stated strategic priority of overall reduction in study duration, using industry-standard advanced statistical software.

Results:

We optimized the design such that the required sample size fell by 85 patients and the expected study duration was, on average, 13 months shorter. The optimized design included two interim analyses, at which both efficacy and futility were assessed. The additional efficacy evaluations led to a high probability of early stopping without compromising the overall power of the study.

Conclusion:

The results of the exploration highlighted the value of adding an efficacy stopping boundary and a second, later interim analysis, both leading to savings in average sample size and average study duration. Additional explorations may include assigning statistical significance to evaluate the treatment’s impact on the OS endpoint as well, and assessing the probability of success of the trial by sampling the events generated in our simulation from prior distributions identified from historical studies.



posters-wednesday: 80

N-of-1 Trials to Estimate Individual Effects of Music on Concentration

Thomas Gärtner1, Fabian Stolp1, Stefan Konigorski1,2

1Hasso Plattner Institute for Digital Engineering, Germany; 2Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, USA

Focus and concentration are influenced by various environmental factors, with music shown to impact cognitive performance. However, recent research highlights the individualized nature of the effect of music, as responses vary largely based on genre and personal preference. Traditional population-level studies often obscure these differences, whereas N-of-1 trials can provide a personalized approach that may be particularly suited for examining how self-selected music genres affect concentration.
This study presents the design of a series of N-of-1 trials investigating the individual effects of music on cognitive processes as the primary outcome, measured using a digitally adjusted Stroop test. In the study, participants will select one music genre, with or without lyrics, as their intervention, which will be compared to silence as a baseline. Each participant will be randomly assigned to a sequence of 3-minute music listening periods (intervention, A) and 3-minute silent periods (control, B) in a two-cycle crossover design (ABAB or BABA). To minimize carryover effects and concentration loss, a 1-minute break is scheduled between blocks. After each block, participants will complete a brief questionnaire to assess self-reported concentration and stress levels. Additionally, physiological proxies for stress and cognitive load, including heart rate, electroencephalography (EEG), and pupil dilation, will be recorded. Intervention effects will be estimated using Bayesian linear mixed models, with a primary focus on individual-level analyses and secondary analyses at the population level.
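To make the analysis concrete, a sketch of how block-level data from such ABAB/BABA sequences could be analysed (a frequentist mixed model shown as a stand-in for the planned Bayesian analysis; column names are hypothetical):

```python
import statsmodels.formula.api as smf

# df (hypothetical) has one row per 3-minute block:
#   participant, block (1..8), music (1 = intervention, 0 = silence), stroop (score)

def individual_effect(df, participant):
    """Per-participant effect of music on the Stroop score, with a linear block trend."""
    sub = df[df["participant"] == participant]
    return smf.ols("stroop ~ music + block", data=sub).fit().params["music"]

def series_effect(df):
    """Population-level estimate across the series of N-of-1 trials:
    random intercept and random music effect per participant."""
    model = smf.mixedlm("stroop ~ music + block", data=df,
                        groups="participant", re_formula="~music")
    return model.fit()

# print(series_effect(df).summary())
```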

This study will provide valuable insights into the personalized effects of music on concentration, helping individuals optimize their cognitive performance. At the population level, it will identify variations in concentration effects across different music genres, contributing to the broader understanding of music as a cognitive intervention.



posters-wednesday: 81

Simulation Study Examining Impact of Study Design Factors on Variability Measures

Laura Quinn1,2, Jon Deeks1,2, Yemisi Takwoingi1,2, Alice Sitch1,2

1Department of Applied Health Sciences, University of Birmingham; 2National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, Birmingham, UK

Introduction

Interobserver variability studies in diagnostic imaging are crucial for assessing the reliability of imaging test interpretations between different observers. The design and conduct of these studies are influenced by various factors that can impact the calculation and interpretation of variability estimates. These factors include participant sample size, condition prevalence, diagnostic test discrimination, and reader error levels.

Methods

Data were simulated for a study design with binary outcomes and two interpretations for each patient. A range of scenarios was simulated, varying participant sample size (25 to 200), condition prevalence (5% to 95%), diagnostic test discrimination (good, reasonable, poor), and reader error levels (low, medium, high). For each combination, 1,000 simulations were performed, and variability measures (percentage agreement, Cohen’s kappa, Prevalence-Adjusted Bias-Adjusted Kappa (PABAK), Krippendorff’s alpha, and Gwet’s AC coefficient) were calculated, along with sensitivity and specificity.
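For the binary, two-reader case, most of these agreement measures can be computed directly from the cross-classification of the two readings; a sketch (Krippendorff's alpha omitted for brevity) illustrating how kappa, unlike PABAK and Gwet's AC1, is pulled down at low prevalence:

```python
import numpy as np

def agreement_measures(r1, r2):
    """Percentage agreement, Cohen's kappa, PABAK and Gwet's AC1 for two binary readings."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    po = np.mean(r1 == r2)                              # observed (percentage) agreement
    p1, p2 = r1.mean(), r2.mean()                       # marginal positive rates
    pe_cohen = p1 * p2 + (1 - p1) * (1 - p2)            # chance agreement (Cohen)
    kappa = (po - pe_cohen) / (1 - pe_cohen)
    pabak = 2 * po - 1                                  # prevalence- and bias-adjusted kappa
    pi = (p1 + p2) / 2
    pe_gwet = 2 * pi * (1 - pi)                         # chance agreement (Gwet AC1, binary)
    ac1 = (po - pe_gwet) / (1 - pe_gwet)
    return {"agreement": po, "kappa": kappa, "PABAK": pabak, "AC1": ac1}

# Example: two readers with 10% error each at 5% prevalence
rng = np.random.default_rng(4)
truth = rng.binomial(1, 0.05, size=200)
read1 = np.where(rng.random(200) < 0.9, truth, 1 - truth)
read2 = np.where(rng.random(200) < 0.9, truth, 1 - truth)
print({k: round(v, 2) for k, v in agreement_measures(read1, read2).items()})
```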

Results

The study showed that increased sample size consistently produced more precise variability estimates across all scenarios. Percentage agreement consistently showed the highest values among the variability measures. PABAK and Gwet’s AC coefficient demonstrated greater stability and less sensitivity to condition prevalence compared to Cohen’s kappa and Krippendorff’s alpha, which showed more variable performance. As diagnostic test discrimination decreased and reader error increased, all variability measures showed a decline.

Conclusion

These findings highlight the importance of considering different factors when assessing interobserver variability in diagnostic imaging tests. Different variability measures are affected in distinct ways by participant sample size, condition prevalence, diagnostic test discrimination, and reader error levels. By providing guidance on the design of interobserver variability studies, this work can help improve future studies, providing more accurate information on the reliability of diagnostic imaging tests and leading to better patient care.



posters-wednesday: 82

Simulation-based optimization of adaptive designs using a generalized version of assurance

Pantelis Vlachos1, Valeria Mazzanti1, Boaz Adler2

1Cytel Inc, Switzerland; 2Cytel Inc, USA

The power of cloud computing is utilized to create a tool that collects information from different parts of the clinical development team (clinical, operations, commercial, etc.) and, with the statistician in the driver's seat, seeks and proposes designs that optimize a clinical study with respect to sample size, cost, duration and power. The optimization is performed using a generalized assurance measure that takes into account all possible trial scenarios with respect to treatment effect, control response, enrollment, dropouts, etc. Furthermore, this tool can be used to communicate and update information to the trial team in real time, accounting for (possibly) changing target objectives. Case studies of actual adaptive trials will be given.
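For context, the quantity being generalized: standard assurance is the probability of trial success averaged over prior distributions on the unknown design inputs. A Monte Carlo sketch for a two-arm trial with a normally distributed outcome (the priors and design values are illustrative assumptions, not the tool described):

```python
import numpy as np
from scipy import stats

def assurance(n_per_arm, n_sims=20_000, alpha=0.025, seed=11):
    """Monte Carlo assurance for a two-arm normal-outcome trial.

    The treatment effect and the common SD are drawn from (illustrative) priors,
    a trial is simulated, and success = one-sided test significant at level alpha.
    """
    rng = np.random.default_rng(seed)
    delta = rng.normal(0.3, 0.15, size=n_sims)          # prior on the treatment effect
    sd = rng.uniform(0.9, 1.1, size=n_sims)             # prior on the outcome SD
    se = sd * np.sqrt(2.0 / n_per_arm)
    z = rng.normal(delta, se) / se                      # observed z-statistic per simulated trial
    return np.mean(z > stats.norm.ppf(1 - alpha))

print(round(assurance(n_per_arm=200), 3))
```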



posters-wednesday: 83

Evaluating the impact of outcome delay on adaptive designs

Aritra Mukherjee1, Michael J. Grayling2, James M. S. Wason1

1Population Health Sciences Institute, Newcastle University; 2Johnson and Johnson

Background: Adaptive designs (ADs) are a broad class of trial designs that allow pre-planned modifications to be made to a trial as patient data accrue, without undermining its validity or integrity. ADs can lead to improved efficiency, patient benefit, and power of a trial. However, these advantages may be affected adversely by a delay in observing the primary outcome variable. In the presence of such delay, a choice must be made between (a) pausing recruitment until the requisite data have accrued for the interim analysis, leading to a longer trial completion period; or (b) continuing to recruit patients, which may result in a large number of participants who do not benefit from the interim analysis. In the latter case, little work has investigated the size of outcome delay at which the realised efficiency gains of ADs become negligible compared to classical fixed-sample alternatives. Our study covers different kinds of ADs and the impact of outcome delay on them.

Methods: We assess the impact of delay on the expected efficiency gains of an AD by estimating the number of pipeline patients recruited into the trial under the assumption that recruitment is not paused while treatment outcomes are awaited. We assume different recruitment models to suitably reflect single- or multi-centre trials. We discuss findings for two-arm group-sequential designs as well as multi-arm multi-stage designs. Further, we focus on sample size re-estimation (SSR), a design in which the variable typically optimized to characterise trial efficiency is not the expected sample size (ESS).
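A simplified sketch of the core bookkeeping (illustrative numbers only): when recruitment continues during the outcome delay, the number recruited by the time a stopping decision takes effect exceeds the number of observed outcomes by the pipeline, which erodes the expected-sample-size advantage of early stopping.

```python
def expected_sample_size_with_delay(stage_n, stop_probs, recruitment_rate, delay, n_max):
    """Expected number recruited when recruitment continues during the outcome delay.

    stage_n    : cumulative number of observed outcomes at each analysis
    stop_probs : probability of stopping at each analysis (must sum to 1)
    """
    pipeline = recruitment_rate * delay
    recruited_at_stop = [min(n + pipeline, n_max) for n in stage_n]
    return sum(p * n for p, n in zip(stop_probs, recruited_at_stop))

# Two-stage design: interim at 50 outcomes, final at 100; 40% chance of early stopping;
# 5 patients recruited per month and a 6-month outcome delay.
print(expected_sample_size_with_delay([50, 100], [0.4, 0.6],
                                      recruitment_rate=5, delay=6, n_max=100))
```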

Results and conclusions: Our results indicate that if outcome delay is not considered at the planning stage of a trial, much of the expected efficiency gain can be lost. The worst affected designs are typically those with early stopping, where the efficiency gains are assessed through a reduced ESS. SSR can also be adversely affected if the initial sample size specification was substantially over-estimated.

Finally, in light of these findings, we discuss the implications of using the ratio of the total recruitment length to the outcome delay as a measure of the utility of different ADs.



 