Conference Agenda

Session
Poster Exhibition: W / Wednesday posters at ETH
Time:
Wednesday, 27/Aug/2025:
3:30pm - 4:00pm

Location: ETH, UG hall

ETH, -1 / UG floor poster area

Presentations
posters-wednesday-ETH: 1

Exploring the Exposome correlated with Body Mass Index in Adolescents: Findings from the 2014-2015 and 2022-2023 KNHANES

Hye Ah Lee1, Hyesook Park2

1Clinical Trial Center, Ewha Womans University Mokdong Hospital, Seoul, Republic of Korea; 2Department of Preventive Medicine, Graduate Program in System Health Science and Engineering, Ewha Womans University, Seoul, Republic of Korea

Background: To identify multifaceted features correlated with body mass index (BMI) in adolescents, we conducted an exposome-wide association study (ExWAS) using data from the Korea National Health and Nutrition Examination Survey (KNHANES), a nationally representative survey.

Methods: To obtain robust findings, we constructed a multi-year dataset covering two study periods (2014-2015 and 2022-2023). Adolescents aged 12 to 18 years with complete BMI data were included, while those dieting for weight loss or health conditions were excluded. This resulted in 941 participants from the 2014–2015 dataset and 637 from the 2022–2023 dataset. Approximately 130 features derived from questionnaires, health examinations, and dietary surveys were analyzed. Standardized BMI (zBMI) was used as the outcome, and ordinal or numeric features were standardized by sex and age using mean and standard deviation. ExWAS was performed through survey-design-based linear regression, adjusting for sociodemographic features. Additionally, pairwise relationships between features were assessed using a mixed graphical model (MGM) network.
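
A minimal R sketch of one survey-design-based ExWAS regression as described above, assuming hypothetical KNHANES design variables (psu, strata, wt) and a standardized exposure column exposure_z; this is illustrative and not the authors' code.

```r
library(survey)

# Hypothetical analytic data set: one row per adolescent, with zBMI, one
# standardized exposure, sociodemographic covariates, and design variables.
des <- svydesign(ids = ~psu, strata = ~strata, weights = ~wt,
                 nest = TRUE, data = dat)

# One ExWAS regression: survey-weighted linear model for zBMI
fit <- svyglm(zBMI ~ exposure_z + sex + age + household_income, design = des)
summary(fit)$coefficients["exposure_z", ]

# In an ExWAS this model is looped over all ~130 exposures and the
# exposure p-values are corrected for multiple testing, e.g.:
# p_adj <- p.adjust(p_values, method = "BH")
```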

Results: In the 2022–2023 dataset, 20.2% of boys and 15.0% of girls were classified as obese. Of the approximately 130 exposome features, 13 in boys and 9 in girls were selected as correlates of BMI. Boys who perceived themselves as unhealthy or considered their body shape fat had higher BMI. zBMI was positively correlated with alanine aminotransferase (ALT), white blood cell (WBC) count, platelets, systolic blood pressure (SBP), total cholesterol, and triglycerides (TG) and negatively correlated with high-density lipoprotein cholesterol (HDL-C). These trends were also observed in the 2014–2015 dataset. Among girls, zBMI was positively correlated with ALT, WBC, SBP, and TG and negatively correlated with HDL-C. Girls who perceived their body shape as fat had higher BMI, consistent with findings from the 2014–2015 dataset. Notably, in the 2022–2023 dataset, girls who reported suicidal thoughts had higher BMI. In the MGM network analysis, ALT, WBC, and HDL-C were directly correlated with zBMI across all datasets, regardless of sex.

Conclusion: In adolescents, metabolic indices showed a clear correlation with BMI; beyond the commonly considered metabolic indices, ALT and WBC were also directly correlated with BMI. Furthermore, subjective body-shape perception, as assessed through questionnaires, was significantly correlated with BMI.



posters-wednesday-ETH: 2

Flexible statistical modeling of undernutrition among under-five children in India

SHAMBHAVI MISHRA

UNIVERSITY OF LUCKNOW, India

Background: Childhood undernutrition has an irreversible impact on the physical as well as mental development of the child. Nutrition-related factors were responsible for about 35% of child deaths and 11% of the total global disease burden. This health condition continues to be a major public health issue across the globe.

Methods: Three standard indices based on anthropometric measurements (weight and height) describe the nutritional status of children: height-for-age (stunting), weight-for-age (underweight), and weight-for-height (wasting). Z-scores were computed from the appropriate anthropometric indicators relative to the WHO international reference population for each age. This paper uses unit-level data on under-five children in India from the NFHS-5 (2019-2021) to identify factors that exert a differential impact on the conditional distribution of the outcome variable. A class of models that allows flexible functional dependence of an outcome on covariates through nonparametric regression was applied to determine possible factors causing undernutrition. This study also fits a Bayesian additive quantile regression model to provide a complete picture of the relationship between the outcome variable and the predictor variables at different quantiles of the response distribution. Different quantile regression models were fitted and compared using the Deviance Information Criterion (DIC) to determine the best model among them.
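
The idea of quantile-specific, possibly non-linear covariate effects can be illustrated with a small frequentist sketch using quantreg with spline terms; the authors fit a Bayesian additive quantile regression model and compare fits by DIC, and the variable names below are hypothetical.

```r
library(quantreg)
library(splines)

# Hypothetical data: haz = height-for-age z-score; mother_bmi, child_age_months
# continuous covariates; mother_edu = years of schooling (numeric).
taus <- c(0.05, 0.10, 0.25, 0.50)   # lower quantiles are of main interest

fits <- lapply(taus, function(tau) {
  rq(haz ~ bs(mother_bmi, df = 4) + bs(child_age_months, df = 4) + mother_edu,
     tau = tau, data = dat)
})

# How the effect of maternal education changes across quantiles
sapply(fits, function(f) coef(f)["mother_edu"])
```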

Results: Maternal characteristics such as nutrition and education showed a significant impact on the child’s nutritional status, consistent with the findings of other studies. Child’s age and mother’s nutrition were among the continuous factors exerting a non-linear effect on stunting, with mother’s BMI showing the largest effect size at the lower end of the distribution. The largest number of covariates was significant for severe undernutrition, indicating a differential effect of the predictors across the conditional distribution of the outcome variables.

Conclusions: Although widely applicable, the logistic regression model offers only a preliminary picture of the determinants of undernutrition. For variables such as the nutritional status of children, where the lower quantiles are of main interest, the focus should be on how factors affect the entire conditional distribution of the outcome rather than summarising the distribution at its mean. This can be achieved with quantile regression modelling. A further extension enables nonparametric estimation of the linear or potentially non-linear effects of continuous covariates on different parts of the outcome distribution using penalized splines.



posters-wednesday-ETH: 3

Comparison of deep learning models with different architectures and training populations for ECG age estimation: Accuracy, agreement, and CVD prediction

Arya Panthalanickal Vijayakumar1, Tom Wilsgaard1, Henrik Schirmer2,3, Ernest Diez Benavente4, René van Es5, Rutger R. van de Leur5, Haakon Lindekleiv6, Zachi I. Attia7, Francisco Lopez-Jimenez7, David A. Leon8, Olena Iakunchykova9

1Department of Community Medicine, UiT The Arctic University of Norway, Norway; 2Akershus University Hospital, Lørenskog, Norway; 3Institute of Clinical Medicine, Campus Ahus, University of Oslo, Norway; 4Department of Experimental Cardiology University Medical Center Utrecht, The Netherlands; 5Department of Cardiology University Medical Center Utrecht, The Netherlands; 6Department of Radiology, University Hospital of North Norway; 7Mayo Clinic College of Medicine, Rochester, MN, USA; 8Department of Noncommunicable Disease Epidemiology, London School of Hygiene & Tropical Medicine, London, United Kingdom; 9Department of Psychology, University of Oslo, Norway

Background: Several convolutional neural networks (CNNs) have been developed to estimate biological age based on 12-lead electrocardiograms (ECG) - ECG age. This new biomarker of cardiac health can be used as a predictor of cardiovascular disease (CVD) and mortality. Before implementation into clinical practice, it is crucial to compare the proposed CNN models used to estimate ECG age to assess their accuracy, agreement, and predictive abilities in an external sample.

Methods: We used 7,108 participants from the Tromsø Study (2015-16) to compare ECG ages estimated with three different previously proposed CNNs. The CNNs differed in model architecture and/or the populations on which they were trained and tested. We calculated the mean absolute error (MAE) for each CNN. Agreement was assessed using Pearson and intraclass correlation coefficients (ICC) and Bland-Altman (BA) plots. The predictive abilities of each ECG age or δ-age (difference between ECG age and chronological age) were assessed by the concordance index (C-index) and hazard ratios (HRs) from Cox proportional hazards models for myocardial infarction (MI), stroke, CVD mortality, and all-cause mortality, with and without adjustment for traditional risk factors.
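
A minimal R sketch of the agreement and prediction metrics described above, assuming a hypothetical data frame dat with columns ecg_age1, ecg_age2, chrono_age, sex, and follow-up variables time_mi and mi; the irr and survival packages are assumed to be available, and this is not the study code.

```r
library(survival)  # Cox models and concordance
library(irr)       # intraclass correlation coefficient

# Accuracy and agreement
mae1  <- mean(abs(dat$ecg_age1 - dat$chrono_age))           # MAE of CNN 1
r12   <- cor(dat$ecg_age1, dat$ecg_age2)                    # Pearson correlation
icc12 <- icc(dat[, c("ecg_age1", "ecg_age2")],
             model = "twoway", type = "agreement")          # ICC

# Bland-Altman limits of agreement
d   <- dat$ecg_age1 - dat$ecg_age2
loa <- mean(d) + c(-1.96, 1.96) * sd(d)

# Predictive ability of delta-age (ECG age minus chronological age)
dat$delta1 <- dat$ecg_age1 - dat$chrono_age
fit <- coxph(Surv(time_mi, mi) ~ scale(delta1) + chrono_age + sex, data = dat)
summary(fit)$conf.int     # hazard ratio per SD of delta-age
summary(fit)$concordance  # C-index
```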

Results: All three CNNs had fairly close MAEs (6.82, 7.82, and 6.42 years) and similar Pearson correlation coefficients with chronological age (0.72, 0.71, and 0.73, respectively). Visual agreement using BA plots was good, and the ICC indicated good agreement (0.86; 95% CI: 0.86, 0.87). The multivariable-adjusted HRs for MI and total mortality were strongest for δ-age1 (HR 1.36 (1.11, 1.67) and 1.27 (1.08, 1.50), respectively), while the HRs for stroke and CVD mortality were strongest for δ-age2 (HR 1.45 (1.17, 1.80) and 1.48 (1.07, 2.05), respectively). The 6-year survival probability predictions showed excellent agreement among all δ-ages for all outcomes in terms of both BA plots and ICC. The C-index values showed no significant difference between pairwise combinations of models with ECG age1, ECG age2, or ECG age3 for any outcome.

Conclusion: The ECG ages estimated by the three CNNs showed good accuracy, agreement, and predictive ability. We found no evidence that any one CNN for ECG age is superior to another for the prediction of CVD outcomes or death in the Tromsø Study.



posters-wednesday-ETH: 4

Systematic review and real life-oriented evaluation of methods for feature selection in longitudinal biomedical data

Alexander Gieswinkel1,2,3, Gregor Buch1,3, Gökhan Gül1,4, Vincent ten Cate1,3,4, Lisa Hartung2, Philipp S. Wild1,3,4,5

1Preventive Cardiology and Preventive Medicine, Department of Cardiology, University Medical Center of the Johannes Gutenberg University Mainz, 55131 Mainz, Germany; 2Institute of Mathematics, Johannes Gutenberg University Mainz, 55128 Mainz, Germany; 3German Center for Cardiovascular Research (DZHK), partner site Rhine Main, 55131 Mainz, Germany; 4Clinical Epidemiology and Systems Medicine, Center for Thrombosis and Hemostasis, University Medical Center of the Johannes Gutenberg University Mainz, 55131 Mainz, Germany; 5Institute of Molecular Biology (IMB), 55131 Mainz, Germany

Background

High-dimensional omics data are increasingly available for longitudinal cohort studies as biochemical technology improves. Supervised feature selection based on biomedical data from multiple time points is often required. However, an overview of existing methods for this setting is lacking, which motivated a systematic review and evaluation of this area.

Methods

A systematic search of statistical software was conducted to identify relevant methods. The Comprehensive R Archive Network (CRAN) was examined via the R package ‘packagefinder’ with a search query containing relevant keywords. Eligible software was characterised by manually screening the package descriptions and through computational testing with a fixed application example. An ADEMP-designed simulation study was conducted to evaluate the identified methods in real-world scenarios, considering varying sample sizes, numbers of predictors, time points, and signal-to-noise ratios. Only frequentist implementations with given default settings were included for a fair comparison. The estimated true positive rate (eTPR) and estimated false discovery rate (eFDR) were chosen as performance measures.
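
For clarity, the two performance measures can be computed from the selected and truly informative variable sets as in this small R sketch (illustrative only, not the simulation code):

```r
# selected: character vector of variables chosen by a method in one replication
# truth:    character vector of truly informative variables in the simulation
eval_selection <- function(selected, truth) {
  tp <- length(intersect(selected, truth))
  fp <- length(setdiff(selected, truth))
  c(TPR = tp / length(truth),
    FDR = if (length(selected) == 0) 0 else fp / length(selected))
}

# eTPR and eFDR are the averages over replications, e.g.:
# rowMeans(sapply(replications, function(r) eval_selection(r$selected, truth)))
```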

Results

Of 21,528 accessible packages on CRAN, 324 packages with matching keywords in their descriptions were extracted by the search query. Screening of the descriptions identified 45 packages that were then tested in R, leading to 14 eligible packages. Six packages were based on mixed effects models (‘buildmer’, ‘rpql’, ‘splmm’, ‘alqrfe’, ‘plsmmLasso’, ‘glmmLasso’), five on generalized estimating equations (‘sgee’, ‘LassoGEE’, ‘geeVerse’, ‘PGEE’, ‘pgee.mixed’), two were built on Bayesian frameworks (‘sparsereg’, ‘spikeSlabGAM’), and one package modelled time series (‘midasml’). All implementations were able to process continuous outcomes, while only four supported binary outcomes. A total of N=8 frequentist methods with sufficient default settings were considered in the simulation study.

The packages ‘buildmer’ and ‘plsmmLasso’ consistently demonstrated an eTPR exceeding 80% while maintaining the eFDR under 20% across various signal-to-noise settings. By comparison, all other methods underperformed when both performance metrics were evaluated jointly. ‘splmm’ achieved a similar eFDR but yielded a lower eTPR, whereas ‘geeVerse’ showed the opposite trend. In contrast, both ‘rpql’ and ‘alqrfe’ failed to select any variables.

Conclusions

Supervised feature selection in longitudinal biomedical data can be performed using a variety of methods. The majority of the available statistical software is based on frequentist techniques, while Bayesian procedures represent a minority. Alternative concepts such as tree-based methods are notably absent. No evidence of superiority was found for modern selection techniques such as penalized regression (‘plsmmLasso’) over traditional approaches such as stepwise regression (‘buildmer’) for feature selection in longitudinal data.



posters-wednesday-ETH: 5

How does acute exposure to environmental factors relate to stroke characteristics such as stroke type, severity, and impairments?

BOHAN ZHANG1, Andy Vail1, Craig Smith2, Amit Kishore2, Matthew Gittins1

1University Of Manchester, United Kingdom; 2Manchester Centre for Clinical Neuroscience, Manchester, United Kingdom

Background and Aims: The overall aim of this project is to better understand the association between acute exposure to environmental factors, such as ambient air pollution and temperature, and stroke characteristics. We focus on short-term acute effects associated with exposure on the same day or up to 30 days before stroke. Specifically, we will examine stroke counts and stroke severity using non-identifiable patient data from the Sentinel Stroke National Audit Programme (SSNAP) for Manchester stroke units.

Methods: We plan to employ a cohort (or case-control) design, in which the cohort comprises all stroke patients within Greater Manchester/Salford, the exposure is the environmental exposure leading up to the stroke, and the outcomes are the post-stroke characteristics. A cohort design is most likely, but a case-control design might help address selection issues arising because the group is defined by being a stroke patient, rather than being identified beforehand and followed up to see whether a stroke occurs.

We will employ methods to model the lagged effects of air pollution, such as the lag-stratified model (where days are grouped and average exposure is modelled) and distributed lag models (where polynomial functions are applied to represent the 30-day lag window).

Results: Analyses are under way and results will be presented at ISCB.

Conclusion: Literature on other diseases suggests that extreme environmental exposures are likely to lead to worse outcomes; results for stroke characteristics are forthcoming.



posters-wednesday-ETH: 6

Improving TBI Prognosis in the Developing World: A Machine Learning-based AutoScore Approach to Predict Six-Month Functional Outcome

Vineet Kumar Kamal1, Deepak Agrawal2

1AIIMS, Kalyani; 2AIIMS, New Delhi

Background

Traumatic brain injury (TBI) presents a significant challenge in predicting long-term functional outcomes due to its complex nature and variability among patients. Accurate prognostic tools are essential for clinicians to guide treatment decisions and set realistic expectations for recovery. To address this, AutoScore employs a machine learning-based approach that automates the generation of clinical scores, facilitating the prediction of outcomes. This study aims to develop and validate a prognostic model to accurately predict six-month functional outcomes in adult patients with moderate-to-severe TBI, and to assess its clinical utility for risk stratification.

Methods

This retrospective cohort study included 1,085 adult patients with TBI from a public, tertiary care, level-1 trauma center in India. We considered a total of 72 demographic, clinical, secondary-insult, CT, and laboratory variables from admission to first discharge. We applied the AutoScore framework, which consists of six distinct modules: variable ranking, variable transformation, score derivation, model selection, score fine-tuning, and model evaluation. The dataset was randomly split into development (70%), parameter-tuning/validation (10%), and test (20%) sets. The predictive performance of the AutoScore framework was evaluated using various metrics, including receiver operating characteristic (ROC) curves, calibration curves, the Brier score, and decision curves for clinical utility analysis. All analyses were performed using R software v.4.3.3.
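
A minimal R sketch of the kind of test-set evaluation reported below (AUC, Brier score, observed-to-expected ratio), assuming a vector of predicted risks p and observed binary outcomes y from the fitted score; this is illustrative and not the study code.

```r
library(pROC)

# p: predicted probability of an unfavourable 6-month outcome on the test set
# y: observed outcome (1 = unfavourable, 0 = favourable)
auc_obj <- roc(y, p, quiet = TRUE)
auc(auc_obj)     # discrimination
ci.auc(auc_obj)  # 95% CI for the AUC

brier <- mean((p - y)^2)   # overall accuracy / calibration
oe    <- sum(y) / sum(p)   # observed-to-expected ratio (1 = perfect)
c(Brier = brier, OE = oe)
```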

Results

The AutoScore model identified only four key risk predictors: motor response at discharge, verbal response at discharge, motor response at admission, and eye-opening response at discharge, with higher scores indicating an increased risk of an unfavorable six-month outcome in TBI patients. The final model achieved an AUC of 0.93 (95% CI: 0.88–0.98) on the validation set and 0.81 (95% CI: 0.76–0.86) on the test set, demonstrating strong predictive performance. The Brier score was 0.14, and the calibration plot and an observed-to-expected ratio of 0.978 suggested that the model was well calibrated in the test data. The model was useful in the 0.0–0.6 threshold range, where it offered better net benefit. The predicted risk increased steadily with the total score, as depicted in the probability plot, with patients scoring above 75 exhibiting a near-certain risk of an unfavorable outcome.

Conclusion

The AutoScore-based prognostic model demonstrated strong predictive performance for six-month functional outcomes in moderate-to-severe TBI patients using only four key predictors. These findings suggest that the model could serve as a valuable tool for clinicians in early risk assessment and decision-making. Further validation in diverse populations with recent data is warranted to confirm its generalizability and clinical applicability.



posters-wednesday-ETH: 7

Predictive Risk Index for Poor Cognitive Development Among Children Using Machine Learning Approaches

Anita Kerubo Ogero1, Patricia Kipkemoi1, Amina Abubakar1,2,3

1Aga Khan University, Nairobi, Kenya, Institute for Human Development, Aga Khan University, P.O. BOX 30270-00100, Nairobi, Kenya; 2Centre for Geographic Medicine Research Coast, Kenya Medical Research (KEMRI), P.O Box 230-80108, Kilifi, Kenya; 3Department of Psychiatry, University of Oxford, Warneford Hospital, Warneford Ln, Oxford OX37JX, United Kingdom

Poor cognitive development in early childhood is a major global concern, with over 200 million children failing to reach their developmental milestones due to factors like malnutrition and poverty - particularly in low- and middle-income countries. Cognitive abilities established during childhood are critical determinants of a child’s future academic and socio-economic outcomes. Despite extensive research on socio-demographic, environmental and nutritional influences on cognitive development, there remains a gap in developing a predictive risk index tailored for resource-constrained settings. Early identification of at-risk children is essential to enable timely interventions and inform policy. In this study, we propose to develop and validate a risk index for poor cognitive development among children using advanced machine-learning techniques. Secondary data from approximately 7,000 children, assessed with the Raven’s Progressive Matrices (RPM), will be analysed; cognitive development is classified into no-risk, low-risk, and high-risk groups based on age-adjusted percentile scores. Predictor variables integrate socio-demographic factors (e.g., parental education, socioeconomic status) and nutritional indicators (e.g., anthropometric measurements such as height, weight, head circumference, and derived indices like weight-for-age and height-for-age z-scores). Our analytic framework integrates several methods including logistic regression, Random Forest, Support Vector Machines, Artificial Neural Networks, and Extreme Gradient Boosting. Data preprocessing involves feature selection via Recursive Feature Elimination (RFE) and dimensionality reduction using Principal Component Analysis (PCA). Decision thresholds will be optimised through the Receiver Operating Characteristic (ROC) curve analysis and Youden’s Index to balance sensitivity and specificity. Key risk factors significantly associated with poor cognitive development will be identified, forming the basis for a validated risk index. The risk index will be assessed for predictive accuracy and generalisability. The developed risk index will represent a significant advancement in the early identification of children at risk for poor cognitive development in low-resource environments. Findings may inform policy decisions and the development of digital tools, such as mobile applications, for real-time cognitive risk assessment. Moreover, this tool holds promise for improving long-term developmental outcomes by optimising resource allocation and enabling targeted interventions.
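
As a sketch of the planned threshold optimisation, the Youden-based cut-off could be obtained with pROC as below; risk_score and high_risk are hypothetical placeholders for the model output and a binarised outcome (e.g. high-risk versus not), not the study variables.

```r
library(pROC)

# high_risk: 0/1 indicator of the risk class; risk_score: predicted probability
roc_obj <- roc(high_risk, risk_score, quiet = TRUE)
coords(roc_obj, x = "best", best.method = "youden",
       ret = c("threshold", "sensitivity", "specificity"))
```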



posters-wednesday-ETH: 8

Development and validation of a model to predict ceiling of care in COVID-19 hospitalised patients

Natàlia Pallarès Fontanet1, Hristo Inouzhe2, Jordi Cortés3, Sam Straw4, Klaus K Witte4, Jordi Carratalà5, Sebastià Videla6, Cristian Tebé1

1Biostatistics Support and Research Unit, Germans Trias i Pujol Research Institute and Hospital (IGTP), Badalona, Spain; 2Basque Center for Applied Mathematics, BCAM, Bilbao, Spain; 3Department of Statistics and Operations Research, Universitat Politècnica de Catalunya/BarcelonaTech, Barcelona, Spain; 4Leeds Institute of Cardiovascular and Metabolic Medicine, University of Leeds, Leeds, UK; 5Department of Infectious Diseases, Bellvitge University Hospital, Barcelona, Spain; 6Clinical Research Support Area, Department of Clinical Pharmacology, Germans Trias i Pujol University Hospital, Badalona, Spain

Background: Therapeutic ceiling of care is the maximum therapeutic effort to be offered to a patient based on age, comorbidities, and the expected clinical benefit in relation to resource availability. COVID-19 patients with and without an assigned ceiling of care at hospital admission have different baseline variables and outcomes. Analysis of hospitalised COVID-19 subjects should be stratified by ceiling of care to avoid bias, but there are currently no models to predict their ceiling of care. We aimed to develop and validate a clinical prediction model to predict ceiling of care at hospital admission.

Methods: The data used to develop the model came from an observational study conducted during four waves of COVID-19 in 5 centres in Catalonia. Data were sampled 1000 times by bootstrapping. For each sample, a logistic regression model with ceiling as outcome was fitted using backward elimination. Variables retained in more than 95% of the models were candidates for the final model. Alternative variable selection methods such as Lasso, CART, and Boruta were also explored. Discrimination was assessed by estimating the area under the ROC curve and the Brier Score, and calibration by comparing observed versus expected probabilities of ceiling of care by deciles of predicted risk. The final model was validated internally, and externally using a cohort from the Leeds Teaching Hospitals NHS Trust.
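
A condensed R sketch of the bootstrap-plus-backward-elimination selection step described above (retaining variables kept in more than 95% of resamples); the formula and data are hypothetical, and stepAIC is used here as a stand-in for the backward-elimination routine.

```r
library(MASS)

B <- 1000
keep <- vector("list", B)
full_formula <- ceiling ~ age + wave + ckd + dementia + dyslipidaemia +
  heart_failure + metastasis + pvd + copd + stroke   # hypothetical predictors

for (b in seq_len(B)) {
  boot_dat <- dat[sample(nrow(dat), replace = TRUE), ]
  fit <- glm(full_formula, family = binomial, data = boot_dat)
  sel <- stepAIC(fit, direction = "backward", trace = FALSE)
  keep[[b]] <- names(coef(sel))[-1]                  # drop the intercept
}

freq <- sort(table(unlist(keep)) / B, decreasing = TRUE)
candidates <- names(freq[freq > 0.95])               # retained in >95% of models
```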

Results: A total of 5813 patients were included in the development cohort, of whom 31.5% were assigned a ceiling of care on admission. A model including age, COVID-19 wave, chronic kidney disease, dementia, dyslipidaemia, heart failure, metastasis, peripheral vascular disease, chronic obstructive pulmonary disease, and stroke had excellent discrimination (AUC 0.898 [0.889; 0.907]; Brier Score 0.113) and calibration (slope of the regression line between observed and predicted β=1.01 [0.94; 1.08]) in the whole cohort and in subgroups of interest. External validation on the Leeds Teaching Hospitals cohort also showed good performance (AUC 0.934 [0.908; 0.959]; Brier Score 0.110; β=0.98 [0.80; 1.17]).

Conclusions: Ceiling of care can be predicted with great accuracy from baseline information available at hospital admission. Cohorts without information on ceiling of care could use our model to estimate the probability of ceiling of care. This model, combined with clinical expertise, may be valuable in future pandemics or emergencies requiring time-sensitive decisions about life-prolonging treatments, but further evaluation outside of COVID-19 is needed.



posters-wednesday-ETH: 9

Transformer Models for Clinical Prediction – Investigation of BEHRT in UK Biobank and prediction assessment under different scenarios

Yusuf Yildiz, Goran Nenadic, Meghna Jani, David A. Jenkins

The University of Manchester, United Kingdom

Background:

Transformer-based large language models (LLMs) such as BEHRT1 have shown potential in modelling electronic health records to predict future clinical events. These models can represent patient histories by including structured (diagnoses) and unstructured data (doctor notes)2. BEHRT showed superior performance over the state-of-the-art models at the time it was developed, using a large primary care dataset. However, it is unclear whether such a model and high accuracy can be achieved in other real-world datasets, e.g. hospital data. Developing LLMs requires various decisions, such as data-split strategies, the choice of medical terminology, and parameter settings. Parameter choices have been shown to impact model performance, stability, and generalisability, but it is unclear to what extent this also holds for LLMs. This study aims to implement the BEHRT architecture in the UK Biobank and identify the challenges of implementing this model in a different dataset. A secondary aim is to assess the impact of parameter choices on prediction performance.

Methods:

This study uses UK Biobank data. To capture key features of patient histories, embeddings were created using diagnoses and age at diagnosis. The BEHRT workflow included pretraining with masked language modelling (MLM) and fine-tuning for next-disease prediction across different time frames. Prediction performance was evaluated using the average precision score and AUROC. Initially, the original study was replicated using UK Biobank to assess the impact of dataset variability. Subsequently, the model’s performance was evaluated to assess the effects of different medical terminologies (ICD-10 and CALIBER phenotyping) and data splits.

Results/Conclusion:

Results showed that the decisions made while developing these models on different datasets affect model performance. Our replicated BEHRT model did not achieve predictive performance as high as the original. Terminologies with larger vocabularies showed worse performance. Complete separation of the MLM and fine-tuning data resulted in a worse-performing model. However, most published models use the complete dataset for pre-training and are therefore likely to exhibit overly optimistic performance.

A more rigorous, definitive framework and assessment workflow is needed for LLM development in clinical prediction; in particular, the clinical usefulness of these models should be examined. Reporting guidelines such as TRIPOD-LLM3 should be used for transparent model development.

Further work is needed on time-to-event analysis, censoring adjustment, transparent decision-making, and computational cost for better integration into clinical prediction.



posters-wednesday-ETH: 10

What is the best way to analyse ventilator-free days?

Laurent Renard Triché1,2,3, Matthieu Jabaudon1,2, Bruno Pereira4, Sylvie Chevret2,5

1Department of Perioperative Medicine, CHU Clermont-Ferrand, Clermont-Ferrand, France; 2iGReD, INSERM, CNRS, Université Clermont Auvergne, Clermont-Ferrand, France; 3ECSTRRA Team, IRSL, INSERM UMR1342, Université Paris Cité, Paris, France; 4Biostatistics Unit, Department of Clinical Research, and Innovation (DRCI), CHU Clermont-Ferrand, Clermont-Ferrand, France; 5Department of Biostatistics, Hôpital Saint-Louis, AP-HP, Paris, France

Introduction

Ventilator-free days (VFDs) are a composite outcome increasingly used in critical care research, reflecting both survival and the duration of mechanical ventilation. However, inconsistencies exist in the models used to analyse VFDs. Some researchers evaluate VFDs as a count, primarily using the Mann-Whitney statistic, while others consider them as a time-to-event outcome, where death acts as a competing risk for extubation. Alternative approaches such as the multi-state model and the win ratio warrant investigation.

This study aimed to evaluate different statistical models to determine the best approach for analysing VFDs.

Methods

First, a clinical trial dataset (LIVE study, NCT02149589) was used to apply the different statistical models to VFDs. Then, 16 dataset configurations of 300 individuals were simulated, each with 3,000 independent replications, comparing a control group with an intervention strategy by varying survival rates and ventilation durations derived from exponential distributions. The simulated data were analysed using the same statistical methods, and statistical power and type I error rates were compared across models.

Eleven statistical methods were evaluated, including the Mann-Whitney test, the zero-inflated negative binomial model, the negative binomial hurdle model, the zero-inflated Poisson model, the Poisson hurdle model, the log-rank test, the Gray test, the cause-specific hazard model, the Fine-Gray model, the multistate Markov model, and the win ratio.

In addition, three sensitivity analyses were performed by adjusting the survival rates and/or ventilation durations in the control group.
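
A small R sketch of one simulated scenario and two of the analyses listed above (Mann-Whitney on the VFD count, and Gray's test treating death as a competing risk for extubation); the parameter values are illustrative, not the study settings.

```r
library(cmprsk)

simulate_arm <- function(n, mort_28, mean_vent) {
  death_time <- rexp(n, rate = -log(1 - mort_28) / 28)  # exponential death time
  vent_time  <- rexp(n, rate = 1 / mean_vent)           # ventilation duration
  died       <- death_time <= pmin(vent_time, 28)
  vfd        <- ifelse(died, 0, pmax(0, 28 - pmin(vent_time, 28)))
  ftime      <- pmin(vent_time, death_time, 28)
  fstatus    <- ifelse(died, 2, ifelse(vent_time <= 28, 1, 0))  # 1 = extubated, 2 = death, 0 = censored
  data.frame(vfd, ftime, fstatus)
}

set.seed(1)
ctrl <- simulate_arm(150, mort_28 = 0.20, mean_vent = 15)
trt  <- simulate_arm(150, mort_28 = 0.15, mean_vent = 12)
d    <- rbind(cbind(ctrl, arm = "control"), cbind(trt, arm = "treatment"))

wilcox.test(vfd ~ arm, data = d)                 # Mann-Whitney on VFD counts
cuminc(d$ftime, d$fstatus, group = d$arm)$Tests  # Gray's test (extubation vs death)
```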

Results

In the LIVE study, almost all methods identified a significant association between VFDs (or related measures) and the patient groups, except for the count submodels, the log-rank test, and the cause-specific hazard model for survival.

For the simulated data, the 28-day mortality rate was set at 20% and the mean duration of ventilation at 15 days for the control group. Most statistical methods effectively controlled the type I error rate, although exceptions included the zero-inflated and hurdle Poisson/negative binomial count sub-models and the cause-specific Cox regression model for survival. Statistical methods had variable power to detect survival benefits and effects on duration of ventilation, with the time-to-event approach and the win ratio generally having the highest power.

The sensitivity analyses found similar results.

Conclusion

The time-to-event approach and the win ratio were more appropriate than the count-based methods to analyse the VFDs and may be extended to other free-days outcomes. Simulation should be recommended for power calculation and sample size estimation rather than a simplified formula.



posters-wednesday-ETH: 11

Comparing the Estimation and Classification Performance of the Statistical Shrinkage Methods Ridge Regression, Lasso Regression, and Elastic Net Regression

Gamze Ozen, Fezan Mutlu

Eskisehir Osmangazi University, Medical Faculty, Department of Biostatistics, Eskisehir, Turkey

Introduction: Advances in data science highlight the need to improve the reliability of regression model estimation when the number of independent variables exceeds the number of observations in high-dimensional datasets. Multicollinearity in such datasets reduces the accuracy of prediction models. This study aims to assess the performance of the Ridge, Lasso, and Elastic Net regression methods in the presence of multicollinearity and high-dimensional data.

Method: The performance of the three regression methods (Ridge, Lasso, and Elastic Net) was examined by data simulation, testing whether the Elastic Net method, which brings all strongly correlated variables into the model, is superior to the Ridge and Lasso methods. The models were then applied to a dataset of serum miRNA from large cohorts to identify miRNAs that can be used to detect breast cancer at an early stage (Shimomura et al., 2016).
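
The three shrinkage fits can be sketched with glmnet as below (alpha = 0 for Ridge, 1 for Lasso, an intermediate value for Elastic Net); x, y, and the test split are hypothetical placeholders.

```r
library(glmnet)

# x: numeric matrix of miRNA expression (rows = samples); y: 0/1 cancer status
cv_ridge <- cv.glmnet(x, y, family = "binomial", alpha = 0)
cv_lasso <- cv.glmnet(x, y, family = "binomial", alpha = 1)
cv_enet  <- cv.glmnet(x, y, family = "binomial", alpha = 0.5)

# Classification accuracy of the elastic net on a held-out test set
pred <- predict(cv_enet, newx = x_test, s = "lambda.min", type = "class")
mean(pred == y_test)
```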

Results: The data simulations verify that Elastic Net regression produces better results, with an accuracy of 0.963, when the data are high-dimensional and strongly multicollinear. In the determination of breast cancer by miRNAs, Elastic Net achieved a classification accuracy of 96%.

Conclusion: The findings suggest that statistical shrinkage methods such as Ridge, Lasso, and Elastic Net regression are reliable and useful for prediction and classification research with linear and logistic models. These methods could be applied more widely in the health sciences to build stronger models.



posters-wednesday-ETH: 12

Defining Harm in Settings with Outcomes that are Not Binary

Amit Sawant, Mats Stensrud

EPFL, Switzerland

The increasing application of automated algorithms in personalised medicine necessitates that algorithm recommendations do not harm patients, in accordance with the Hippocratic maxim of “Do no harm.” A formal mathematical definition of harm is essential to guide these algorithms in adhering to this principle. A counterfactual definition of harm has been previously proposed, which asserts that a treatment is considered harmful if there exists a non-zero probability that the potential outcome under treatment for an individual is worse than the potential outcome without treatment. Existing literature on counterfactual harm has primarily focused on binary treatments and outcomes. This study aims to illustrate that in scenarios involving multiple treatments and multi-level outcomes, the counterfactual definition of harm can result in intransitivity in the ranking of treatments. Specifically, we analyse three treatments—A, B, and C—for a particular disease. We demonstrate that treatment B is less harmful than treatment A, treatment C is less harmful than treatment B, yet treatment C is more harmful than treatment A in direct comparison, if we follow the counterfactual definition. Our example highlights that the intuitive concept of counterfactual harm in binary settings does not extend to scenarios involving more than two treatments and outcomes. On the other hand, an interventionist definition of harm in terms of utility circumvents the issue of intransitivity.



posters-wednesday-ETH: 13

Brier pseudo-observation score for selecting a multiplicative, an additive or an additive-multiplicative hazards regression model

François Lefebvre1, Roch Giorgi2

1Groupe méthode en recherche clinique, service de santé publique, Hôpitaux universitaires de Strasbourg, Strasbourg, France; 2Aix Marseille Univ, APHM, Inserm, IRD, SESSTIM, ISSPAM, Hop Timone, BioSTIC, Marseille, France

Background In survival analysis, data can be modelled in different ways: with the Cox model, with an additive hazards model such as Aalen’s model, or with an additive-multiplicative model such as the Cox-Aalen model. Covariates act on the baseline hazard multiplicatively in the first model, additively in the second, and partly multiplicatively, partly additively in the third. Correct modelling of the covariates requires knowledge of their effect on the baseline hazard, which is rarely known a priori. Pseudo-observations have been used to evaluate the impact of a covariate on survival outcomes and to verify the assumptions inherent in the Cox (proportional hazards, log-linearity) and Aalen (linearity) models [1]. However, they do not currently indicate which of the multiplicative, additive, or additive-multiplicative models is the most appropriate for a particular survival dataset. The aim of this study is to propose a method for selecting a multiplicative, an additive, or an additive-multiplicative hazards regression model adapted to the survival data-generating mechanism.

Methods We propose to use the Brier pseudo-observation score defined by Perperoglou [2] as the mean of the squared differences between the pseudo-observations and the survival estimates obtained from a regression model. For each type of regression model, the Brier pseudo-observation score can therefore be computed and the scores compared. Since the Brier pseudo-observation score is analogous to the mean squared error of prediction, the lower the score, the better the model. To reduce the risk of overfitting, the model parameters were estimated for each individual using the jackknife. The performance of this approach was assessed in simulation studies comparing the Brier pseudo-observation scores obtained with a multiplicative, an additive, and an additive-multiplicative model, in situations in which the survival data-generating mechanism was multiplicative, additive, or a combination of both.
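
A compact R sketch of the score for a Cox fit at a single time point, using jackknife pseudo-observations from the pseudo package; data, covariate names, and the evaluation time are placeholders, and the same score would be computed for the additive and additive-multiplicative fits with the smallest value preferred.

```r
library(survival)
library(pseudo)

t0 <- 5 * 365.25                                   # evaluation time (days), illustrative
po <- pseudosurv(time = dat$time, event = dat$status, tmax = t0)$pseudo

fit  <- coxph(Surv(time, status) ~ nodal + age + size, data = dat)
surv <- summary(survfit(fit, newdata = dat), times = t0)$surv   # S(t0 | X_i)

brier_pseudo <- mean((po - as.numeric(surv))^2)    # Brier pseudo-observation score
```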

Results This measure selected the model used to generate the data in over 80% of cases in most of the scenarios considered. The approach was illustrated with an epidemiological example of female breast cancer, with the objective of ascertaining how nodal status, age, and tumour size act on the baseline hazard.

Conclusion This method has been demonstrated to perform well in selecting the hazards regression model adapted to the data-generating mechanism.

[1] M. Pohar Perme, P. K. Andersen. Statistics in Medicine, 27, 2008, 5309–5328.

[2] A. Perperoglou, A. Keramopoullos, H. C. van Houwelingen. Statistics in Medicine, 26, 2007, 2666–2685.



posters-wednesday-ETH: 14

Bayesian spatio-temporal analysis of the COVID-19 pandemic in Catalonia

Pau Satorra, Cristian Tebé

Biostatistics Support and Research Unit, Germans Trias i Pujol Research Institute and Hospital (IGTP), Spain

Introduction: The COVID-19 pandemic posed an unprecedented challenge to public health systems worldwide. The spread of the pandemic varied in different geographical regions, even at the level of small areas. This study investigates the spatio-temporal evolution of COVID-19 cases and hospitalisations in the different basic health areas (ABS) of Catalonia during the pandemic period (2020-2022). Additionally, it assesses the impact of demographic and socio-economic factors, as well as vaccination coverage, on infection and hospitalisation rates at an ABS level.

Methods: Data were obtained from the official open data catalogue of the Government of Catalonia. Bayesian hierarchical spatio-temporal models were used, estimated with Integrated Nested Laplace Approximation (INLA). Demographic and socio-economic ABS variables were included in the models to assess their role as risk factors for cases and hospitalisations. Full ABS vaccination coverage was also incorporated to assess its effect. All analyses were performed using the R statistical program.
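
A minimal R-INLA sketch of the kind of spatio-temporal model described above, with a BYM2 spatial effect, an RW1 weekly effect, and area-level covariates; the object names (adjacency graph g, expected counts) and covariates are placeholders, not the study code.

```r
library(INLA)

# d: one row per ABS and week, with observed cases, expected counts,
# covariates, and indices area_id (1..n_areas) and week_id (1..n_weeks).
# g: spatial adjacency graph of the ABS areas.
formula <- cases ~ 1 + urban + deprivation + vaccination_coverage +
  f(area_id, model = "bym2", graph = g) +
  f(week_id, model = "rw1")

fit <- inla(formula,
            family = "poisson",
            E = expected,
            data = d,
            control.compute = list(dic = TRUE, waic = TRUE))

summary(fit)
exp(fit$summary.fixed[, c("0.025quant", "mean", "0.975quant")])  # covariate RRs
```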

Results: During the study period, a cumulative total of 2,685,568 COVID-19 cases and 144,550 hospitalisations were reported in Catalonia, representing 35% and 1.89% of the total population, respectively. The estimated spatial, temporal and spatio-temporal relative risks (RR) were visualised through maps and plots, identifying high-risk (hotspot) and low-risk (coldspot) areas and weeks. These results were presented in an interactive R-shiny application: https://brui.shinyapps.io/covidcat_evo/. Urban areas had a higher risk of cases (RR: 5%, CI95%: 2-9%) and hospitalisations (RR: 17%, CI95%: 10-25%). A higher socio-economic deprivation index was associated with an increased hospitalisation risk (RR: 19%, CI95%: 17-22%). Finally, higher full vaccination coverage in the ABS was associated with a reduced risk of cases (RR: 12%, CI95%: 5-18%) and hospitalisations (RR: 17%, CI95%: 2-32%) during the fourth and fifth pandemic waves.

Conclusion: This study provides a comprehensive analysis of the COVID-19 pandemic across the territory of Catalonia at the small-area level, revealing the spatial, temporal and spatio-temporal patterns of the disease. Urban areas had a higher risk of COVID-19 cases and hospitalisations, socio-economic deprivation increased hospitalisations, and full vaccination was protective against cases and hospitalisations during specific pandemic waves. These findings offer valuable insights for public health policymakers to design targeted interventions against future infectious disease threats.



posters-wednesday-ETH: 15

A Simulation Study of Bayesian Approaches to Spatial Modelling Using the Besag-York-Mollie Model

Hollie Hughes, David Hughes

Department of Health Data Science, University of Liverpool, United Kingdom

Background/Introduction:

Spatial modelling can be a useful tool for analysing patterns and relationships in data to indicate how events might be spatially related. When modelling and mapping data, neighbouring areas tend to be more strongly correlated and to share more similar characteristics than distant areas, creating a spatial autocorrelation problem. Spatial models have been successfully developed to account for this autocorrelation in areal data, allowing patterns to be successfully modelled. However, doing this in a Bayesian framework with Markov Chain Monte Carlo (MCMC) methods can be computationally expensive, particularly for larger spatial datasets. Therefore, many researchers opt for the Integrated Nested Laplace Approximation (INLA) approach for computational savings. We suggest an alternative using approximate Mean Field Variational Bayes (MFVB) algorithms to decrease the computational burden, as the INLA approach does, whilst potentially retaining the accuracy promised by the MCMC approach.

Method:

We provide a comparison of the MCMC, INLA and MFVB approaches to the Besag-York-Mollie (BYM) model, which is commonly used in spatial modelling to account for spatial dependencies. We conducted a simulation study to compare the performance of the three approaches in fitting the BYM model to spatially structured data on the incidence of depression. Synthetic datasets were generated under the BYM model specification outlined in Morris (2019), incorporating both spatially structured and unstructured random effects (1).

Each method was implemented using standard Bayesian modelling tools in R: INLA via the R-INLA package, MCMC using Stan, and MFVB using Stan’s variational Bayes options. We assessed computational efficiency and accuracy for each method by comparing posterior estimates against the true simulated values and measuring the time taken to fit each model. Accuracy was assessed both in terms of distributional similarity and accuracy of point estimates.
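
The MCMC versus MFVB comparison can be sketched with rstan as below, assuming a Stan file bym.stan containing a BYM model in the spirit of the Morris (2019) case study and a list stan_data of prepared inputs; the parameter name used in the summary depends on that model code.

```r
library(rstan)

mod <- stan_model("bym.stan")          # assumed BYM model code

t_mcmc <- system.time(
  fit_mcmc <- sampling(mod, data = stan_data, chains = 4, iter = 2000)
)
t_vb <- system.time(
  fit_vb <- vb(mod, data = stan_data, algorithm = "meanfield")  # MFVB
)

# Compare posterior means of the spatial effects and the timings;
# "phi" is assumed to be the spatial effect vector in the .stan file.
cbind(mcmc = summary(fit_mcmc, pars = "phi")$summary[, "mean"],
      vb   = summary(fit_vb,   pars = "phi")$summary[, "mean"])
c(mcmc_seconds = t_mcmc["elapsed"], vb_seconds = t_vb["elapsed"])
```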

Results:

Results will include comparisons of accuracy and performance metrics including measures comparing the ground truth with MCMC results and computation time for each model. The results will be summarised across multiple simulated datasets to evaluate consistency and robustness. Evaluation is ongoing but full results will be presented at the conference.

Conclusion:

This simulation study may indicate the usefulness of the MFVB approach as an alternative to MCMC, with the potential to be nearly as accurate when the true simulated values are known, alongside possible gains in computation speed.

References:

1. Morris M. Spatial Models in Stan: Intrinsic Auto-Regressive Models for Areal Data. 2019. Available from: https://mc-stan.org/users/documentation/case-studies/icar_stan.html



posters-wednesday-ETH: 16

A Bayesian analysis of FINEARTS-HF

Alasdair D Henderson1, Brian L Claggett2, Akshay S Desai2, Mutthiah Vaduganathan2, Carolyn S Lam3, Bertram Pitt4, Michele Senni5, Sanjiv J Shah6, Adriaan A Voors7, Faiez Zannad8, Meike Brinker9, Flaviana Amarante10, Katja Rohwedder11, James Lay-Flurrie12, Scott D Solomon2, John JV McMurray1, Pardeep S Jhund1

1BHF Glasgow Cardiovascular Research Center, School of Cardiovascular and Metabolic Health, University of Glasgow, Glasgow, Scotland, UK; 2Cardiovascular Division, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, USA; 3National Heart Centre Singapore & Duke-National University of Singapore, Singapore; 4University of Michigan, School of Medicine, Ann Arbor, Michigan, USA; 5University Bicocca Milan, Italy, Papa Giovanni XXIII Hospital, Bergamo, Italy; 6Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA; 7University of Groningen, Groningen, Netherlands; 8Université de Lorraine, Inserm Clinical Investigation Centre, CHU, Nancy, France; 9Bayer AG, Research & Development, Pharmaceuticals, Wuppertal, Germany; 10Cardiology and Nephrology Clinical Development, Bayer SA, São Paulo, Brazil; 11Global Medical Affairs, Berlin; 12Bayer plc, Research & Development, Pharmaceuticals, Reading, UK

Background: The FINEARTS-HF trial was a large, double-blind, placebo-controlled, randomised trial of the non-steroidal mineralocorticoid receptor antagonist (MRA) finerenone. Conventional frequentist analysis of FINEARTS-HF found that finerenone reduced the primary composite endpoint of heart failure events and cardiovascular death in patients with heart failure with mildly reduced or preserved ejection fraction (HFmrEF/HFpEF) (rate ratio 0.84; 95% confidence interval, 0.74 to 0.95; P = 0.007). Bayesian methods offer alternative analytical approaches to provide probabilistic estimates of efficacy and safety, and flexibility to allow the inclusion of prior information and hierarchical modelling of subgroup effects. We analysed FINEARTS-HF with Bayesian methods to demonstrate the strengths and limitations compared with the primary frequentist analysis.

Methods: In a pre-specified Bayesian analysis of FINEARTS-HF, we estimated treatment efficacy under a range of scenarios incorporating prior information from two trials of finerenone in participants with chronic kidney disease and type 2 diabetes (FIDELIO-DKD and FIGARO-DKD, pooled in the FIDELITY program) and a steroidal MRA in patients with HFmrEF/HFpEF (TOPCAT). We also used a combination of these trials in a robust meta-analytic prior. All models of the primary recurrent endpoint were analysed with Bayesian Cox proportional hazards models, with stratum-specific baseline hazards and hierarchical structure for subject-specific random effects. Secondary endpoints were analysed with Bayesian stratified Cox proportional hazards models. We used Bayesian hierarchical models to estimate subgroup effects with reduced heterogeneity from small sample sizes in frequentist subgroup analyses.

Results: A total of 6,001 patients were included and the Bayesian analysis with vague priors confirmed the primary frequentist results with a 95% probability that the rate ratio was between 0.74 and 0.94. Including prior information from previous nonsteroidal and steroidal MRA trials supported this finding and strengthened the probability of a beneficial treatment effect. Bayesian subgroup estimates were qualitatively similar to frequentist estimates but more precise and closer to the overall treatment effect. The probability that finerenone improves survival time until cardiovascular death was 79% (HR 0.93, 95% CrI: 0.79-1.09, Pr(HR<1) = 79%), and all-cause mortality was 87% (HR 0.94, 95% CrI: 0.84-1.05, Pr(HR<1) = 87%), although any benefit was likely small on an absolute scale.

Conclusion: The non-steroidal MRA finerenone reduced the rate of heart failure events and cardiovascular death, and there is a strong probability of a small reduction in CV death and all-cause mortality. Bayesian methods offer additional insights into the analysis of a large randomised controlled trial.



posters-wednesday-ETH: 17

Fast Approximation of Joint Models: A Comparative Evaluation of Bayesian Methods

Jinghao Li, David M Hughes

University of Liverpool, United Kingdom

Background

Joint models are widely employed in statistics to simultaneously analyze longitudinal data and time-to-event data, effectively capturing the dynamic relationships between the two processes. This framework has shown significant utility in biostatistics and clinical research. The widespread adoption of joint models enables clinicians to make predictions about patient-specific risk that update over time and aids clinical decision making. However, the increased complexity of joint models compared to separate longitudinal and survival models necessitates more sophisticated parameter estimation methods. Early contributions using Maximum Likelihood Estimation (MLE) laid the foundation for joint model estimation, followed by advancements in Bayesian methods that employed Markov Chain Monte Carlo (MCMC) techniques for inference. While MCMC-based approaches, such as JMbayes and rstanarm, provide accurate parameter estimates, they are computationally expensive and exhibit slow convergence, particularly when handling large datasets and multiple longitudinal variables. More recently, the INLAjoint package has been introduced, applying the Integrated Nested Laplace Approximation (INLA) to joint models, offering faster computation but with potential trade-offs in accuracy.

Method

Variational Bayes (VB) inference, originally popularized in artificial intelligence applications, has gained increasing attention in statistical research due to its computational efficiency and scalability, as highlighted by Ormerod and Wand (2010). This study aims to provide a comprehensive evaluation of existing Variational Bayes methods for joint models, comparing their performance with established MCMC- and INLA-based approaches. The comparison focuses on key evaluation criteria, including computational efficiency, estimation accuracy, error rates, and convergence speed. Implementations from existing R packages, including Stan-based MCMC and Variational Bayes algorithms, are used in the analysis. Performance is assessed through simulation studies generated with the simsurv package (Brilleman) under controlled conditions, as well as through validation on real-world data from the Primary Biliary Cirrhosis (PBC) study.

Results

The results will include a detailed comparison of model fitting times, estimation accuracy, error metrics, and convergence properties across the different approaches. The evaluation is ongoing, and comprehensive results will be presented at the conference. Future analyses will explore potential trade-offs in estimation bias and error, providing insights into the relative advantages of different inference methods for large-scale joint model applications.

Keywords

Joint Model, Variational Bayes, Bayesian Inference, MCMC, INLA, Longitudinal Data, Survival Analysis

References

Ormerod, J.T. and Wand, M.P. (2010). Explaining Variational Approximations. The American Statistician, 64(2), pp.140–153. doi: https://doi.org/10.1198/tast.2010.09058.
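
As an illustration of the Stan-based MCMC fit on the PBC validation data mentioned above, a minimal sketch with rstanarm's stan_jm and its bundled pbcLong/pbcSurv example data (settings shortened for speed); the VB and INLAjoint fits would be run on the same data for comparison.

```r
library(rstanarm)

fit_jm <- stan_jm(
  formulaLong  = logBili ~ year + (year | id),
  dataLong     = pbcLong,
  formulaEvent = survival::Surv(futimeYears, death) ~ sex + trt,
  dataEvent    = pbcSurv,
  time_var     = "year",
  chains = 1, iter = 1000, seed = 123)  # short run for illustration only

print(fit_jm)  # includes the longitudinal, event, and association parameters
```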



posters-wednesday-ETH: 18

Confidence Intervals for Comparing Two Independent Folded Normals

Eleonora Di Carluccio1, Sarah Ogutu2, Ozkan Köse3, Henry G. Mwambi1,2, Andreas Ziegler1,2,4,5

1Cardio-CARE, Medizincampus Davos, Davos, Switzerland; 2School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzbug, South Africa; 3Orthopedics and Traumatology Department, Antalya Training and Research Hospital, Antalya, Turkey; 4Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany; 5Centre for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany

The absolute change in the angle measured immediately after surgery and after bone healing is a clinically relevant endpoint to judge the stability of an osteotomy. Assuming the difference in angles is normally distributed, the absolute difference follows a folded normal distribution. The confidence interval for the angle change of a novel fixation screw compared to a standard fixation screw may be used for evaluating non-inferiority. In this work, we suggest that the simple two-sample t-statistic or Welch statistic may serve as the basis for confidence interval calculations for the difference between two folded normals. The coverage probabilities of the derived confidence intervals are investigated by simulation. We illustrate the approaches with data from a randomized controlled trial and an observational study on hallux valgus (bunion) surgery. In the simulation studies, asymptotic as well as non-parametric and parametric bootstrap confidence intervals based on the t-statistic and the Welch test were close to nominal levels. Methods based on chi-squared distributions were not deemed appropriate for comparing two folded normals. We recommend using confidence intervals based on the t-statistic or the Welch statistic for evaluating non-inferiority in trials where the stability of angles after osteotomy is to be compared.
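
A small R sketch of the recommended interval, assuming vectors of absolute angle changes for the novel and standard screws; it shows the Welch-based CI and a non-parametric bootstrap variant, not the authors' exact implementation.

```r
# abs_new, abs_std: absolute angle changes (degrees) in the two groups,
# each following a folded normal distribution.
welch_ci <- t.test(abs_new, abs_std)$conf.int   # Welch CI for the mean difference

# Non-parametric bootstrap percentile CI for the same difference
B <- 5000
boot_diff <- replicate(B, mean(sample(abs_new, replace = TRUE)) -
                          mean(sample(abs_std, replace = TRUE)))
boot_ci <- quantile(boot_diff, c(0.025, 0.975))

# Non-inferiority: compare the upper confidence limit with the margin delta
welch_ci; boot_ci
```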



posters-wednesday-ETH: 19

Towards Realistic Synthetic Nanopore Protein Signals: A Comparative Study of Stochastic and GAN-Based Methods

Göran Köber1, Jonas Bürgel1, Tobias Ensslen1,2, Oliver Amft1,2

1University of Freiburg, Germany; 2Hahn-Schickard, Germany

Nanopores provide a powerful tool for molecular analysis, enabling direct, single-molecule measurements of nucleotides, peptides, and other biopolymers. However, developing machine learning models for tasks like peptide sequence recognition is challenging due to the scarcity of labeled training data, as experimental data collection is both expensive and time-consuming. Synthetic data generation offers a promising solution by providing high-quality, customizable datasets for algorithm development and benchmarking.

We develop and compare several techniques for generating synthetic nanopore protein data, leveraging both stochastic methods and deep learning approaches, with a particular focus on Generative Adversarial Networks (GANs). The generated signals can be of arbitrary lengths, reaching up to hundreds of thousands of steps—far exceeding commonly reported time series lengths in the literature. The generation process is structured into two phases. First, a flat reference signal is synthesized to mimic the general shape of a blockade. Next, fluctuation generation algorithms introduce the fluctuating patterns of experimental data into the reference signal.

Multiple signal generation algorithms are explored, starting with a simple Gaussian noise model as a baseline. More advanced stochastic approaches, combining cubic interpolation with Gaussian noise, produce signals that closely resemble real blockade events. Additionally, an RNN-WGAN architecture is developed to generate arbitrarily long, high-fidelity signals that are challenging to distinguish from experimentally observed data. To evaluate the quality of generated signals, a discriminative score is computed using an RNN classifier, complemented by dimensionality reduction applied to features from established time-series feature-extraction libraries.

We also provide a comparative analysis of stochastic and data-driven methods, examining both their qualitative and quantitative differences and find that GAN-based methods achieve the best overall results. To the best of our knowledge, this work is the first to introduce high-quality synthetic nanopore protein sensing data generation methods, paving the way for advanced machine learning applications and addressing the critical need for labeled, customizable synthetic datasets in the field.



posters-wednesday-ETH: 20

Data Transformations in Machine Learning Approaches for Studying Microbiota as a Biomarker of Non-Response Risk to CFTR Modulators

Marta Avalos1, Céline Hosteins1, Diego Kauer1, Chloé Renault1, Raphaël Enaud2, Laurence Delhaes2

1University of Bordeaux - Inria - Inserm BPH U1219, France; 2University of Bordeaux, CHU Bordeaux, Inserm U1045, France

Cystic fibrosis (CF) is a genetic disease caused by mutations in the CF transmembrane conductance regulator (CFTR) gene. Impaired mucociliary clearance and the accumulation of respiratory secretions, combined with an altered immune response and chronic treatments, disrupt the airway microbiota and mycobiota. These dysbioses, characterized by reduced microbial diversity and a predominance of opportunistic pathogens, correlate with disease severity and may serve as biomarkers for disease progression.

The introduction of CFTR modulator therapies has transformed CF management, significantly altering the disease’s clinical course by enhancing mucosal hydration and improving patient outcomes. However, response to these therapies remains highly variable among patients, underscoring the need for predictive biomarkers. The airway and digestive microbiota, which play a crucial role in disease progression, represent promising candidates. While bacterial and fungal dysbioses in CF are well documented, their potential as biomarkers for predicting therapeutic response remains poorly explored, posing significant methodological challenges.

Microbiome studies in CF typically involve small cohorts and high-dimensional data, often compositional, zero-inflated, and sometimes longitudinal. Moreover, integrating heterogeneous data sources—including bacterial and fungal communities from different anatomical sites (lung and gut) alongside clinical factors—is essential for building robust predictive models. This requires advanced statistical and machine learning approaches to address challenges in feature selection, model interpretability, and data integration.

In this study, based on CF patients from the French LumIvaBiota cohort, we examine how transformations of relative abundance data affect both the performance and interpretability of various linear (Lasso, PLS, PCA regression) and non-linear (SVM, Random Forest, Neural Networks) machine learning methods. We compare these approaches in their ability to predict non-response to CFTR modulators, balancing the trade-off between model complexity and interpretability—a key consideration for clinical application.
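As a concrete example of the transformations under comparison, the centred log-ratio (CLR) transform is one option commonly applied to compositional relative-abundance data before fitting linear or non-linear learners; the sketch below, with a hypothetical pseudocount to handle zero inflation, is illustrative rather than the exact preprocessing used in the study.

  import numpy as np

  def clr(counts, pseudocount=0.5):
      # Centred log-ratio transform of a (samples x taxa) abundance matrix;
      # the pseudocount avoids log(0) in zero-inflated microbiome data.
      x = np.asarray(counts, dtype=float) + pseudocount
      x = x / x.sum(axis=1, keepdims=True)          # relative abundances
      logx = np.log(x)
      return logx - logx.mean(axis=1, keepdims=True)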

Our findings provide insights into best practices for microbiome-based predictive modeling in CF and offer methodological guidance on selecting appropriate data transformations and machine learning frameworks for biomarker discovery in high-dimensional biological datasets.



posters-wednesday-ETH: 21

On microbiome data analysis using Bayesian method under the assumption of a zero-inflated model

Yuki Ando, Asanao Shimokawa

Tokyo University of Science, Japan

Background / Introduction
Data on the abundance of microbial groups are called microbiome data. One purpose of analysing microbiome data is to compare microbial abundance between subjects under different conditions. Microbiome data have two main characteristics: the abundances are discrete, and they contain an excessive number of zeros. To allow comparison between subjects with different total microbial abundances, the abundances are often converted into proportions, which is the approach we take in this study; the abundance then takes a continuous value between 0 and 1. The zero-inflated beta model is the most commonly used population distribution for such abundances: with a certain probability the abundance follows a beta distribution, and otherwise it is 0. Furthermore, Chen and Li (2016) proposed expressing both the probability that the abundance follows a beta distribution and the parameters of the beta distribution through logistic regression models with the subjects' covariates as explanatory variables. We examine methods for estimating the parameters of this model.
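In this two-part formulation, the zero-inflated beta density can be written as follows (mean-precision parameterisation; the symbols p_i, \mu_i and \phi are ours rather than the authors'):

  f(y_i \mid x_i) =
  \begin{cases}
  1 - p_i, & y_i = 0, \\
  p_i \, \mathrm{Beta}\!\left(y_i;\ \mu_i \phi,\ (1-\mu_i)\phi\right), & 0 < y_i < 1,
  \end{cases}
  \qquad
  \mathrm{logit}(p_i) = x_i^\top \gamma, \quad \mathrm{logit}(\mu_i) = x_i^\top \beta,

where p_i is the probability that the abundance of subject i follows the beta distribution and x_i collects the subject's covariates.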

Methods
The parameters of the zero-inflated beta model are usually estimated by maximum likelihood. Because the maximum likelihood estimate cannot be obtained analytically, iterative methods such as the EM algorithm are used in combination. A drawback of this approach is that it does not yield good estimates when the sample size of the microbiome data, i.e., the number of subjects, is small. We therefore consider estimating the parameters using Bayesian methods.

Results
We applied the maximum likelihood method and Bayesian methods to simulation data and compared the obtained estimates. We found that the Bayesian method worked well in situations with small sample sizes.

Conclusion
We addressed parameter estimation under the assumption of a zero-inflated beta model for microbiome data.
For microbiome data with a small sample size, we recommend using the Bayesian method rather than the maximum likelihood method.

Reference
Chen E.Z. and Li H. 2016. A two-part mixed-effects model for analyzing longitudinal microbiome compositional data. Bioinformatics 32 (17): 2611–2617.



posters-wednesday-ETH: 22

Estimating prevalence of cystic fibrosis heterozygosity using Fast Boltzmann Inversion (FBI): An improved Monte Carlo algorithm for Bayesian inference

Jan Brink Valentin

Danish Center for Health Services Research, Department of Clinical Medicine, Aalborg University, Denmark

Background: Monte Carlo sampling of probabilistic potentials is often biased if the density of states is not properly managed. Bias is further induced when applying shrinkage priors to avoid overfitting. Boltzmann inversion (BI) provides a generic sampling scheme to avoid such bias. However, because of the iterative nature of BI, the algorithm is often intractable. In this study, we developed a fast Boltzmann inversion (FBI) algorithm with the same computational complexity as the standard Metropolis-Hastings (MH) algorithm and applied the method to estimating the heterozygous carrier prevalence of cystic fibrosis (CF).

Case: CF is a rare genetic disease which has a major impact on patients' health, daily living and overall survival. The disease is inherited from both parents, and in Denmark children have been screened for CF at birth since 2016. While the incidence rate can be estimated using patient registers, the heterozygous carrier prevalence is not easily obtained.

Method: We applied a simple two-parameter probabilistic model for the probability of having CF conditional on being the first-born child in a family. This probability was considered the target distribution, and the model parameters included the proportion of heterozygous carriers. We linked the Danish national patient register with the central person register to estimate the mean and variance of the target distribution. We applied shrinkage priors for the model parameters with low, moderate, and strong shrinkage to avoid overfitting. The FBI algorithm was then used to estimate the model parameters, and the results were compared to those of the MH algorithm.
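For reference, the standard Metropolis-Hastings comparator can be sketched as below (a generic random-walk sampler for the two model parameters; the FBI algorithm itself, including its handling of the density of states, is specific to this work and is not reproduced here). The function and argument names are hypothetical.

  import numpy as np

  rng = np.random.default_rng(1)

  def metropolis_hastings(log_post, theta0, n_iter=50_000, step=0.02):
      # Random-walk Metropolis-Hastings over the model parameters,
      # given a log-posterior that already includes the shrinkage prior.
      theta = np.asarray(theta0, dtype=float)
      chain = np.empty((n_iter, theta.size))
      lp = log_post(theta)
      for i in range(n_iter):
          proposal = theta + rng.normal(0.0, step, theta.size)
          lp_prop = log_post(proposal)
          if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject step
              theta, lp = proposal, lp_prop
          chain[i] = theta
      return chain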

Results: Using the register data, the target probability was estimated to be 2.38 per 10,000. The MH algorithm with low, moderate and strong shrinkage was biased, missing the target by 0.16, 0.25 and 0.35 per 10,000, respectively. The FBI algorithm with low and moderate shrinkage was on target, with bias below 0.001, and estimated the proportion of heterozygous carriers in the Danish population to be 3.05 percent (SE = 0.54). However, the FBI algorithm with strong shrinkage did not converge. Finally, the FBI algorithm required the same computational time as the MH algorithm.

Conclusion: The FBI algorithm provides an unbiased estimate when applying shrinkage estimators without increasing computational time compared to other Monte Carlo algorithms. In addition, the FBI algorithm reduces the issue of setting the hyperparameters of the prior distributions in a Bayesian context.



posters-wednesday-ETH: 23

A Sparse Graph Representation of Hi-C Data for Colorectal Cancer Prediction

Jiwon Im1, Mingyu Go2, Insu Jang3, Minsu Park1

1Department of Statistics and Data Science, Chungnam National University, Republic of Korea; 2Graduate School of Data Science, KAIST, Republic of Korea; 3Korea Research Institute of Bioscience and Biotechnology, Republic of Korea

Colorectal cancer (CRC) remains a leading cause of cancer-related morbidity and mortality, emphasizing the critical need for early and precise predictive modeling. While advances in genomics and deep learning have enabled computational cancer classification, existing models often face challenges in capturing the complexity of chromatin organization and high-dimensional genomic data.

This study presents a graph-based predictive framework utilizing high-throughput chromosome conformation capture (Hi-C) data from chromosome 18, a region implicated in CRC pathogenesis. The method constructs a sparse weighted graph from chromatin interactions and applies a graph neural network for classification. An optimal bandwidth selection technique removes redundant connections while retaining key genomic relationships to enhance computational efficiency and interpretability.
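One plausible reading of the sparsification step is a banded retention rule in which contacts farther apart than a chosen genomic bandwidth (in bins) are removed before the weighted adjacency matrix is passed to the graph neural network; the sketch below is illustrative, and the actual bandwidth-selection technique is study-specific.

  import numpy as np
  from scipy.sparse import csr_matrix

  def sparse_hic_graph(contact_matrix, bandwidth):
      # Keep chromatin contacts within 'bandwidth' bins of the diagonal and
      # zero out the rest, yielding a sparse weighted adjacency matrix.
      C = np.asarray(contact_matrix, dtype=float)
      i, j = np.indices(C.shape)
      A = np.where(np.abs(i - j) <= bandwidth, C, 0.0)
      np.fill_diagonal(A, 0.0)                       # drop self-loops
      return csr_matrix(A)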

Experimental evaluations on real-world Hi-C datasets indicate that the proposed approach achieves competitive classification accuracy while improving F1-score and precision-recall performance with reduced training complexity. These findings suggest that sparse graph-based Hi-C analysis may be a useful framework for CRC prediction and contribute to graph representation learning in genomic medicine.

Keywords: Hi-C, graph neural network, sparse graph representation, CRC classification



posters-wednesday-ETH: 24

A multi-state survival model for recurring adenomas in Lynch syndrome individuals

Vanessa García López-Mingo, Veerle Coupé, Marjolein Greuter, Thomas Klausch

Amsterdam UMC, The Netherlands

Introduction

Lynch syndrome is a genetic condition that predisposes individuals to develop colorectal cancer (CRC). It is characterized by a deficiency in the mismatch repair (MMR) system occurring early in life, leading to an increased risk of accumulating DNA damage. Individuals with Lynch syndrome develop adenomas, precursor lesions to CRC, in the bowel at a higher rate than the general population. This necessitates close surveillance of affected individuals by colonoscopy. Although surveillance intervals are short (one to three years), continued surveillance is needed to manage CRC risk throughout life.

Based on surveillance data from Lynch syndrome patients, this study aims to estimate the time to repeated non-advanced adenoma (nA) formation and progression to advanced adenomas (AA) or CRC. We develop a novel multi-state survival model that, unlike available models, handles the recurring adenomas that characterize Lynch syndrome.

Methods

The model treats adenoma formation as panel count data, where the occurrence of recurrent adenomas is observed only at a sequence of discrete examination times (colonoscopies). Specifically, the development of nAs is modelled as a Poisson process, with a modification to account for the delay associated with the occurrence of MMR deficiency around the time of the first nA, incorporated through a Weibull model. Immediately afterwards, a Poisson process for later nAs is initialized. Furthermore, every adenoma is assumed to progress to AA or CRC, with the sojourn time also modelled as Weibull distributed. All sojourn times are regressed on covariates such as sex and the affected gene to uncover heterogeneity. A Bayesian Metropolis-within-Gibbs sampler combined with data augmentation for the latent times is employed to estimate the parameters.
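The assumed data-generating process can be sketched as follows: a Weibull-distributed time to the first nA (reflecting the MMR-deficiency delay), a homogeneous Poisson process for subsequent nAs, and Weibull sojourn times to progression. All parameter values below are hypothetical and only illustrate the structure.

  import numpy as np

  rng = np.random.default_rng(2)

  def simulate_patient(horizon=40.0, delay_shape=2.0, delay_scale=25.0,
                       na_rate=0.3, soj_shape=1.5, soj_scale=15.0):
      # Weibull-delayed first nA, then a Poisson process for later nAs.
      na_times = []
      t = delay_scale * rng.weibull(delay_shape)
      while t <= horizon:
          na_times.append(t)
          t += rng.exponential(1.0 / na_rate)
      # Each adenoma progresses to AA/CRC after a Weibull sojourn time.
      progression = [s + soj_scale * rng.weibull(soj_shape) for s in na_times]
      return np.array(na_times), np.array(progression)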

Results

In initial Monte Carlo simulations, we found good performance of the estimation procedure, with unbiased estimates and good mixing across the chains. Additionally, the coverage of the credible intervals matched the nominal level of 95%. At ISCB we will present details of the application to the Lynch syndrome patient data, which is currently under development.

Conclusion

This study presents a novel model for analysing adenoma development in Lynch syndrome that accounts for the impact of MMR deficiency. By combining a delay for the first adenoma with a Poisson process for the recurrent ones, we capture the dynamics of adenoma development in Lynch syndrome more accurately than existing multi-state screening models such as "msm" (Jackson, 2011). Developing dedicated models for disorders like Lynch syndrome could help improve prevention of CRC in affected groups.



posters-wednesday-ETH: 25

Evaluating Completeness of Data in CPRD’s Breast Cancer Data: Implications for External Controls for Surrogate Endpoints

Dorcas N Kareithi1,3, Jingky Lozano-Kuehne1, David Sinclair2, James Wason1

1Biostatistics Research Group, Newcastle University, United Kingdom; 2Older People and Frailty Policy Research Unit, Newcastle University, United Kingdom; 3Jasiri Cancer Research Foundation Kenya

Background: Registries such as CPRD Aurum and national cancer registries offer valuable sources of observational data that can be used as external or historical controls for cancer clinical trials and other health research. However, an extensive review of evidence has shown that the investigation of alternative measurements from routinely collected data is dependent on access, validity, and completeness of such data. This study evaluates the completeness, patterns and impact of missing data in breast cancer patients using data from CPRD Aurum and Cancer Registration and Treatment datasets.

Methods: We used linked datasets from CPRD Aurum, the Tumor Registration dataset, and the Treatment Characteristics dataset to identify and extract breast cancer cases (ICD-10: C50). Key clinical variables, including demographic characteristics, tumour type and size, comorbidity score, tumour screening, tumour treatment, and cancer stage, from female patients aged 18 and above in 2005, were analysed. Completeness of data in 6-month follow-up periods post-diagnosis from 2005 to 2024 and patterns of missing data were assessed using descriptive statistics and Little's MCAR test to determine missingness mechanisms. No imputation methods were applied, as the focus was on understanding completeness and the extent and impact of missingness.

Results: Preliminary findings from 2.9M records on 68,613 participants who met our inclusion and exclusion criteria indicate high completeness (>90%) for most demographic characteristics and most observation event dates (except hospitalisation date), high completeness (>90%) for most tumour stage and characteristic variables (except PR and ER scores), high completeness (>90%) for most tumour treatment variables, and moderate completeness (>60%) for quality-of-life variables. Preliminary time-to-event analyses suggest that incomplete data used to compute surrogate outcomes, such as the quality-of-life data, could affect the derivation, computation and estimation of key established surrogate endpoints such as disease-free survival (DFS), time to next treatment (TTNT), event-free survival (EFS) and overall survival.

Conclusion: These preliminary findings highlight the value of registry data as external or historical controls for cancer clinical trials and other health research, but caution against potential biases introduced by incomplete data, which may impact clinical interpretations and policy decisions.



posters-wednesday-ETH: 26

Identification of risk factors for the development of de novo malignancies after liver transplantation

Tereza Hakova1, Pavel Taimr1, Tomáš Hucl1, Zdeněk Valenta2

1Dept. of Hepatogastroenterology, Institute of Clinical and Experimental Medicine, Prague, Czechia; 2Dept. of Statistical Modelling, Institute of Computer Science of the Czech Academy of Sciences, Prague, Czechia

Background

De novo malignancies (DNM) are a significant long-term complication in liver transplantation, immunosuppressive therapy being a key contributing factor. While necessary to prevent graft rejection, exposure to immunosuppressants may increase the risk of post-transplant malignancies. Identifying risk factors for DNM is crucial to improving post-transplant management strategies. This study uses the cohort of liver transplant patients aged 18 years and older to study competing risks of the incidence of DNM or death, focusing on the role of cumulative exposure to immunosuppressants. Independent prognostic factors, such as the age of donor/recipient, gender, smoking, diabetes status, etc. were adjusted for in the models. We hypothesised that high cumulative doses of immunosuppressants could correlate with an increased incidence of malignancies, suggesting the need for individualised immunosuppression strategies.

Methods

Retrospective right-censored and left-truncated cohort data on 1,073 liver transplant patients aged 18 years or more were used to study competing risks of the incidence of cancerous disorders and death in the context of immunosuppression following transplantation (TX). We studied the effect of cumulative exposure to several immunosuppressants (Azathioprine, Cyclosporin A, Mycophenolate Mofetil, Prednisone, Simulect, Sirolimus and Tacrolimus), adjusted for possible confounders. Cause-specific survival models, including the Cox PH model, Aalen's additive model and its McKeague-Sasieni extension, were employed in analysing the effect of left-truncated, time-dependent immunosuppression doses on the right-censored outcomes. The follow-up period was limited to 10 years.
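A minimal sketch of such a cause-specific, time-dependent analysis using counting-process (start, stop] data, which also accommodates left truncation, is shown below. The file and column names are hypothetical; the remaining columns of the long-format data (cumulative doses, age, sex, smoking, diabetes status) enter as covariates, and the analogous model for the competing event would replace the DNM indicator with a death indicator.

  import pandas as pd
  from lifelines import CoxTimeVaryingFitter

  # One row per patient-interval, with cumulative immunosuppressant doses
  # updated at the start of each interval.
  long_df = pd.read_csv("tx_long_format.csv")
  ctv = CoxTimeVaryingFitter()
  ctv.fit(long_df, id_col="patient_id", event_col="dnm",
          start_col="start", stop_col="stop")
  ctv.print_summary()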

Results

Results for time-to-malignancy data showed that male gender and higher recipient and donor age were associated with an elevated hazard of malignancy incidence. Immunosuppression with Mycophenolate, Sirolimus and Tacrolimus was associated with a reduction in the hazard, while Simulect had an adverse effect on malignancy incidence.

Higher age, smoking and male gender of the TX recipient had an adverse effect on the hazard of death. Mycophenolate, Prednisone, Sirolimus and Tacrolimus all showed a protective effect on the incidence of death. The latter drug had a time-varying protective effect, strongest in the first few months after TX.

Conclusion

Our results bring new insights into immunosuppressive treatment. Inconsistency with published studies may be due to a different methodology and the patient population. This study highlights the importance of monitoring immunosuppressive drug levels and controlling modifiable risk factors, such as smoking, in liver transplant recipients. Understanding the multifactorial nature of post-transplant malignancies can lead to improved patient management.



posters-wednesday-ETH: 27

Are ACE inhibitors associated with increased lung cancer risk, or are unmeasured confounders biasing results?

Sean Maguire, Ruth Keogh, Elizabeth Williamson, John Tazare

London School of Hygiene & Tropical Medicine, United Kingdom

Background: Unmeasured confounding is nearly always a concern in observational studies of treatment effects. However, despite methods being available to assess its potential impact, it is often ignored. We illustrate methods for assessing the impact of unmeasured confounding through a study of ACE inhibitors and ARBs, drugs commonly prescribed for the treatment of high blood pressure. Safety concerns for ACE inhibitors raised by observational study findings of higher lung cancer risks in ACE inhibitor users relative to ARB users, and the inconsistent findings in subsequent observational studies, may have been caused by unmeasured confounding.

Methods: Using data from the Clinical Practice Research Datalink, we identified a cohort of UK adults who initiated ACE inhibitor or ARB treatment for the first time between 1995 and 2019, and fitted a Cox model for the outcome of lung cancer, with adjustment for a number of measured confounders. A conditional hazard ratio was estimated, accounting for competing events.
E-values were used to quantify the potential impact of unmeasured confounding on the effect estimates. E-values, introduced in 2017 by VanderWeele and Ding, quantify the minimum strength of association that an unmeasured confounder (or set of unmeasured confounders) would need to have with both ACE inhibitor use and lung cancer incidence in order to 'tip' our results and change our conclusions.
Covariate e-values, introduced by D'Agostino McGowan and Greevy in 2020, were also calculated to contextualise the potential impact of unmeasured confounding in previous ACE inhibitor and lung cancer studies in the literature.
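For reference, the E-value for an observed risk ratio RR (applied to the point estimate or to a confidence limit, with estimates below 1 first inverted; for a rare outcome the hazard ratio approximates the risk ratio) takes the well-known form

  \text{E-value} = \mathrm{RR} + \sqrt{\mathrm{RR}\,(\mathrm{RR} - 1)} .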

Results: Our cohort contained 984,000 ACE inhibitor or ARB initiators. We found no evidence that ACE inhibitor use is associated with increased lung cancer risk (conditional hazard ratio = 0.997, 95% CI 0.924–1.076). Our investigations using e-values show that this result could easily be tipped to a significantly harmful or protective effect. Similar results were found for the previous studies in the literature.

Conclusion: Through quantitative bias analysis using e-values, we found that studies reporting either protective or harmful effects of ACE inhibitors are likely biased by unmeasured confounding.



posters-wednesday-ETH: 28

Do genetic changes in 15q13.3 mean lower IQ score?

Tadas Žvirblis1, Pilar Caro2, Audronė Jakaitienė1, Christian Schaaf2

1Institute of Data Science and Digital Technologies, Vilnius University; 2Institute of Human Genetics, Heidelberg University

Background. Genetic changes affecting the copy number of chromosome 15q13.3 have been associated with a group of rare neurodevelopmental conditions (autism spectrum disorder, epilepsy, schizophrenia, and others) [1]. The critical region contains approximately 10 genes. Treatments are limited and are restricted to targeting the main symptoms rather than the underlying etiology. Not every person harboring a 15q13.3 copy number change will manifest the disease, and the severity and clinical diagnosis are difficult to predict [2]. This represents a significant challenge in modelling and determining health outcomes.

Methods. A multi-center prospective study was conducted to assess cerebral activity and neural network function alterations in individuals with 15q13.3 microdeletion or microduplication. It was planned to enroll 15 subjects for each aberration, as well as 15 healthy subjects. During the study period, electrophysiological brain network analysis, IQ testing and detailed genetic analysis were performed for each subject. All subjects provided written informed consent. The study protocol was approved by the Ethics Board of the Medical Faculty of Heidelberg University (No. S-212-2023).

Results. Six subjects with genetic changes affecting the copy number of chromosome 15q13.3 were identified during the interim statistical analysis. Five (83.3%) had a deletion of 15q13.3 and one (16.7%) a duplication. The mean (SD) age was 27.5 (11.36) years; 2 (33.3%) were of adolescent age, and half (50.0%) of the subjects were male. The mean (SD) IQ score was 76.7 (18.14), which was statistically significantly lower than the average population IQ score (p = 0.027). The mean (SD) IQ score for males was slightly higher than that for females: 79.7 (22.03) vs. 73.7 (17.62).
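Assuming the comparison was a one-sample t-test of the observed IQ scores against a population mean of 100 (the abstract does not name the test), the reported summary statistics reproduce a p-value close to the one quoted:

  from math import sqrt
  from scipy.stats import t

  # Reported summary statistics; mu0 = 100 is an assumed reference mean.
  n, mean, sd, mu0 = 6, 76.7, 18.14, 100.0
  t_stat = (mean - mu0) / (sd / sqrt(n))
  p_two_sided = 2 * t.sf(abs(t_stat), df=n - 1)
  print(round(t_stat, 2), round(p_two_sided, 3))   # about -3.15 and 0.026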

Conclusion. The interim statistical analysis showed that subjects with a 15q13.3 microdeletion or microduplication have a lower IQ score than the average population.

Funding. This work is part of the EJP RD project “Resolving complex outcomes in 15q13.3 copy number variants using emerging diagnostic and biomarker tools (Resolve 15q13)” No. DLR 01GM2307 and has received funding from EJP RD partner the Research Council of Lithuania (LMTLT) under grant agreement No. S-EJPRD-23-1.

Keywords. rare neurodevelopmental conditions, 15q13.3 microdeletion, 15q13.3 microduplication

References

[1] Gillentine MA, Schaaf CP. The human clinical phenotypes of altered CHRNA7 copy number. Biochem Pharmacol. 2015;97(4):352-62.

[2] Yin J, Chen W, Yang H, Xue M, Schaaf CP. Chrna7 deficient mice manifest no consistent neuropsychiatric and behavioral phenotypes. Sci Rep. 2017 Jan 3;7:39941. PMCID: PMC5206704



posters-wednesday-ETH: 29

Measuring the performance of survival models to personalize treatment choices

Orestis Efthimiou1, Jeroen Hoogland2, Thomas Debray3, Valerie Aponte Ribero1, Wilma Knol4, Huiberdina Koek4, Matthias Schwenkglenks5, Séverine Henrard6, Matthias Egger7, Nicolas Rodondi1, Ian White8

1University of Bern (Switzerland); 2Amsterdam University Medical Centers (The Netherlands); 3Smart Data Analysis and Statistics B.V. (The Netherlands); 4Utrecht University (The Netherlands); 5University of Basel (Switzerland); 6UCLouvain (Belgium); 7University of Bern (Switzerland), University of Bristol (UK), University of Cape Town (South Africa); 8University College London (UK)

Background: Statistical and machine learning algorithms can be used to predict treatment effects at the participant level using data from randomized clinical trials (RCTs). Such predictions can facilitate individualized treatment decisions. Although various methods have been proposed to assess the accuracy of participant-level treatment effect predictions, it remains unclear how they can be applied to survival data.

Methods: We propose new methods to quantify individualized treatment effects for survival (time-to-event) outcomes. First, we describe alternative definitions of participant-level treatment effects for survival outcomes. Next, we summarize existing and introduce new measures to evaluate the performance of models predicting participant-level treatment effects. We explore metrics for assessing discrimination, calibration, and decision accuracy of such predictions. These generic metrics are applicable to both statistical and machine learning models and can be used during model development (e.g., for model selection or internal validation) or when testing models in new settings (e.g., external validation). We illustrate our methods using both simulated data as well as real data from the OPERAM trial, an RCT involving multimorbid older adults randomized to either standard care or a pharmacotherapy optimization intervention. We fit competing statistical and machine learning models and apply our newly developed methods to compare their performance.
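As one illustration of a participant-level effect definition for survival outcomes, the sketch below takes the difference in model-predicted survival probabilities at a clinical horizon under treatment versus control; the column names (time, event, treat) and the use of a plain Cox model are assumptions made for illustration, not the models compared in the study.

  from lifelines import CoxPHFitter

  def predicted_benefit(model_df, horizon=365.0):
      # Fit a Cox model on the trial data (treatment plus covariates), then
      # predict each participant's survival at the horizon under both arms.
      cph = CoxPHFitter().fit(model_df, duration_col="time", event_col="event")
      s1 = cph.predict_survival_function(model_df.assign(treat=1),
                                         times=[horizon]).T[horizon]
      s0 = cph.predict_survival_function(model_df.assign(treat=0),
                                         times=[horizon]).T[horizon]
      return s1 - s0   # per-participant predicted benefit at the horizon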

Results: Analyses of simulated data demonstrated the utility of our metrics in evaluating the performance of models predicting participant-level treatment effects. Application in OPERAM revealed that the models we developed performed sub-optimally, with moderate-to-poor performance in calibration and poor performance in discrimination and decision accuracy, when predicting individualized treatment effects.

Conclusion: Our methods are applicable for models aimed at predicting participant-level treatment effects for survival outcomes. They are suitable for both statistical and machine learning models and can guide model development, validation, and potential impact on decision making.



posters-wednesday-ETH: 30

A framework for estimating quality adjusted life years using joint models of longitudinal and survival data

Michael Crowther1, Alessandro Gasparini1, Sara Ekberg1, Federico Felizzi2, Elaine Gallagher3, Noman Paracha3

1Red Door Analytics AB, Stockholm, Sweden; 2Department of Computer Science, ETH Zurich, Switzerland; 3Bayer Pharmaceuticals, Basel, Switzerland

Background

Quality of life (QoL) scores are integral to cost-effectiveness analysis, providing a direct quantification of how much time patients spend at different severity levels. There are a variety of statistical challenges in modeling and utilizing QoL data appropriately. QoL data, and other repeatedly measured outcomes such as prostate-specific antigen (PSA), are often treated as time-varying covariates that only change value when a new measurement is taken, which is biologically implausible. Additionally, such data often exhibit both between- and within-subject correlations, which must be taken into account, and are associated with survival endpoints. The proposed framework utilizes "progression" or similar intermediate endpoints or biomarkers like EQ-5D, and models them jointly with overall survival, allowing us to directly calculate quality adjusted life years (QALYs).

Methods

Motivated by the prostate cancer trial setting, we simulated data representing repeatedly measured PSA levels, utilities and overall survival. Using numerical integration and the delta method, we then derive analytical estimates of QALYs, differences in QALYs and restricted time horizon QALYs from the estimated multivariate joint model, along with uncertainty.
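In generic notation (ours, not necessarily the authors'), the target quantity over a time horizon \tau can be written as

  \mathrm{QALY}(\tau) = E\!\left[\int_0^{\tau} u(t)\,\mathbf{1}\{T > t\}\,dt\right] = \int_0^{\tau} E\!\left[u(t)\,\mathbf{1}\{T > t\}\right] dt,

where u(t) is the utility trajectory from the longitudinal submodel and T the survival time; the joint model supplies the dependence between u(t) and T through shared random effects, the integral is evaluated by numerical integration, and delta-method standard errors quantify the uncertainty of QALY differences between arms.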

Results

PSA and utilities were modeled flexibly using linear mixed effects submodels with restricted cubic splines to capture the nonlinear development over follow-up time. An interaction with treatment was also included to allow different trajectories in those treated and those on placebo. Both PSA and utility were linked to survival through their current value and slopes, with a Weibull survival submodel. Treatment was estimated to provide an additional 1.074 QALYs (95% CI: 0.635, 1.513) across a lifetime horizon.

Conclusion

Deriving QALYs from a joint model of longitudinal and survival data accounts for all of the statistical and biological intricacies of the data, providing a more appropriate, and accurate, estimate for use in cost-effectiveness modeling, and hence reducing uncertainty.



posters-wednesday-ETH: 31

Cut Off Determination using Model Derived Estimate in Survival Prediction Model

Jungbok Lee

Asan Medical Center & Univ of Ulsan College of Medicine, South Korea

For practical clinical applications, various prediction models related to disease onset, risk, prognosis, and survival are being developed using EMR or clinical research data. The score generated by these models is used as a measure of risk, often categorized for practical purposes. Determining an appropriate cutoff for score categorization has become a topic of interest. For example, in the case of time-to-event outcomes, the most intuitive method is to identify a cutoff that maximizes the log-rank test statistic. However, the method based on test statistics has the limitation that the cutoff may vary depending on the distribution of the dataset used for model building.
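The test-statistic-based baseline discussed above can be sketched as a maximally selected log-rank search over a grid of candidate cutoffs; this is the data-dependent procedure whose instability motivates the model-derived alternative, and the grid below is an arbitrary choice.

  import numpy as np
  from lifelines.statistics import logrank_test

  def best_cutoff(score, time, event):
      # Pick the score cutoff that maximizes the log-rank test statistic.
      score, time, event = map(np.asarray, (score, time, event))
      grid = np.quantile(score, np.linspace(0.1, 0.9, 81))
      stats = []
      for c in grid:
          hi = score > c
          res = logrank_test(time[hi], time[~hi],
                             event_observed_A=event[hi],
                             event_observed_B=event[~hi])
          stats.append(res.test_statistic)
      return grid[int(np.argmax(stats))]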

In this study, we present:

1) A phenomenon where the cutoff varies when there are relatively many or few high- or low-risk subjects in the training set.

2) A hazard estimation procedure using a piecewise hazard model and resampling method for survival data.

3) Cutoff criteria for when the hazard rate estimated by the model follows linear, parabolic, cubic, logarithmic, or exponential curves.

4) A proposed resampling procedure to account for variation in the distribution of events, based on the initially estimated cutoff value.

Determining cutoffs based on tests in survival data is dependent on the distribution of scores, censoring rates, and sample size. The method using model-derived estimates can help adjust for these dependencies. The term "optimal" in cutoff determination is limited to the original dataset and the testing method used to identify the cutoff.



posters-wednesday-ETH: 32

A two-step testing approach for comparing time-to-event data under non-proportional hazards

Jonas Brugger1,2, Tim Friede2, Florian Klinglmüller3, Martin Posch1, Franz König1

1Medical University of Vienna, Vienna, Austria; 2University Medical Center Göttingen, Germany; 3Austrian Agency for Health and Food Safety, Vienna, Austria

The log-rank test and the Cox proportional hazards model are commonly used to compare time-to-event data in clinical trials, as they are most powerful under proportional hazards. However, power is lost if this assumption is violated, as is the case for some new oncology drugs such as immunotherapies. We consider a two-stage test procedure in which the weighting of the log-rank test statistic depends on a pre-test of the proportional hazards assumption. That is, depending on the pre-test, either the log-rank test or an alternative test is used to compare the survival probabilities. We show that, if naively implemented, this can lead to a substantial inflation of the type-I error rate. To address this, we embed the two-stage test in a permutation test framework to maintain the nominal level alpha. We compare the operating characteristics of the two-stage test with the log-rank test and other tests in clinical trial simulations.
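A generic sketch of the permutation wrapper is given below: the whole two-stage rule (pre-test of proportional hazards, then log-rank or the alternative weighted test) is passed in as a single statistic and recomputed for every permuted treatment assignment, so the selection step is carried along and the nominal level is preserved under exchangeability. This illustrates the principle only and is not the authors' implementation.

  import numpy as np

  def permutation_two_stage(time, event, group, two_stage_stat,
                            n_perm=2000, seed=3):
      # 'two_stage_stat' encapsulates the pre-test and the chosen test.
      rng = np.random.default_rng(seed)
      group = np.asarray(group)
      observed = two_stage_stat(time, event, group)
      perm = np.empty(n_perm)
      for b in range(n_perm):
          perm[b] = two_stage_stat(time, event, rng.permutation(group))
      return (1 + np.sum(perm >= observed)) / (1 + n_perm)   # permutation p-value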



posters-wednesday-ETH: 33

A systematic review of Bayesian survival modelling for extrapolating survival in cost-effectiveness analysis

Farah Erdogan1, Gwénaël Le Teuff1,2

1Oncostat, INSERM U1018, France; 2Department of Biostatistics and Epidemiology, Gustave Roussy, Paris-Saclay University, France

Background: Cost-effectiveness analysis (CEA) aims to evaluate the clinical and economic impact of health interventions. In some settings, such as oncology, CEA requires estimation of long-term benefits in terms of life years. Survival extrapolation is then necessary when clinical trials have limited follow-up data. As highlighted in the NICE Technical Support Document (TSD) 21 (2020), the Bayesian approach offers a flexible framework for incorporating external information in survival modelling and addressing uncertainty in survival prediction. This work aims to report how Bayesian methods are used to incorporate external information for extrapolating long-term survival in CEA.

Methods: We conducted a systematic review up to October 2024 to identify both methodological and non-methodological studies using different electronic databases (PubMed, Scopus, ISPOR conference database), complemented by handsearching references cited in the automatically identified studies.

Results: Of 52 selected studies (77% identified automatically and 23% manually), 52% were published since 2022 and 90% (n=47) focused on oncology. 52% (n=27) were articles and 38% (n=20) were methodological works. 87% (n=45) used external data from different sources (clinical, registry, epidemiology, real-world data, general population mortality) and 17% (n=9) used expert elicitation. We classified the studies into four non-mutually exclusive categories of Bayesian modelling (C1-C4). The first three categories combine, in order of increasing complexity, survival modelling and a Bayesian formulation for incorporating external information. C1 (27%, n=14) includes standard parametric models (SPMs: exponential, Weibull, Gompertz, lognormal, log-logistic, generalized gamma distributions) with priors on parameters informed by historical data. C2 (48%, n=25) includes (i) Bayesian multi-parameter evidence synthesis combining trial and external data, (ii) joint modelling of progression-free survival and overall survival, and (iii) non-SPMs (e.g., mixture and cure models). C3 (27%, n=14) groups complex hazard regression models (e.g., poly-hazard, relative survival) incorporating disease-specific and general population mortality, predominantly with non-informative prior distributions on parameters. The last category (C4, 13%, n=7) represents Bayesian model averaging, which weights predictions of different survival models by posterior model probabilities to address structural uncertainty.

Conclusion: This review highlights the broad spectrum of Bayesian survival models and the different ways to incorporate external information, resulting in reduced uncertainty in survival extrapolation. Future research should focus on comparing these methods to identify the most suitable approaches given the intervention mechanisms and external data availability. This will help to standardize the use of Bayesian statistics for survival extrapolation and provide guidance, as proposed in the NICE TSD 14 on survival model selection procedures.



posters-wednesday-ETH: 34

Power comparison of hazard ratio versus restricted mean survival time in the presence of cure

Ronald Bertus Geskus1,2

1Oxford University Clinical Research Unit, Vietnam; 2University of Oxford, United Kingdom

Background: The log-rank test and the Cox proportional hazards model lose power with non-proportional or crossing hazards. A large simulation study did not show consistently superior performance of restricted mean survival time (RMST) over the log-rank test in such settings [2]. That study considered neither the presence of cure nor that of an independent predictor of survival.
In a randomized controlled trial (RCT) investigating the effect of dexamethasone on survival in patients with tuberculous meningitis (TBM), the hazard ratio was the primary effect measure [1]. Baseline MRC grade strongly affected 12-month survival. Testing for a difference in RMST gave a lower p-value than the hazard ratio: 0.14 versus 0.22, and 0.075 versus 0.21 when correcting for MRC grade. We performed a simulation study to investigate the gain in power of RMST.
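For reference, the restricted mean survival time up to a horizon \tau and the corresponding between-arm difference are

  \mathrm{RMST}(\tau) = \int_0^{\tau} S(t)\,dt, \qquad \Delta(\tau) = \int_0^{\tau} \left\{ S_1(t) - S_0(t) \right\} dt,

i.e., the area under (or between) the survival curves up to \tau.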

Methods: For each scenario we simulated 3000 data sets from Weibull distributions with two treatment arms and a three-level categorical predictor of survival representing MRC grade. Weibull parameters were estimated based on the RCT, after exclusion of the 12-month survivors. Sample size including survivors was 700. We also considered approximate scenarios assuming proportional hazards in the non-survivors; note that proportionality is lost once the survivors are included. Models with and without interaction with MRC grade were fitted. In an additional scenario we generated data with divergent survival curves, then converging at 12 months, and mortality between 50% and 100%.

Results: All numbers refer to power, computed as the percentage of simulation runs that gave p-value below 0.05 for the test of treatment effect. With parameters according to the TBM data set, RMST outperforms the hazard ratio (43% versus 36%). Further improvement is seen with adjustment for MRC grade (51% versus 33%). Similar results are observed with data generated assuming proportional hazards for the non-survivors. The test for non-proportionality has power between 10% and 30%. In the additional scenario with 50% mortality, proportional hazards had much lower power than RMST (36% versus 98%), while power was similar with 100% mortality.

Conclusion: Relative performance of proportional hazards versus RMST strongly depends on the shape of the survival curve and the presence of cure.

References:
[1] Donovan et al., Adjunctive dexamethasone for tuberculous meningitis in HIV-positive adults, New England Journal of Medicine 389 (2023), 1357-1367.

[2] Dormuth et al., A comparative study to alternatives to the log-rank test, Contemporary Clinical Trials 128 (2023).



posters-wednesday-ETH: 35

Sample Size Calculation in Prognostic Studies: A Comparative Analysis

Gloria Brigiari, Ester Rosa, Giulia Lorenzoni, Dario Gregori

Unit of Biostatistics, Epidemiology and Public Health, University of Padova, Italy

Introduction
In classical survival analysis, sample size estimation is typically based on risk differences or hazard ratios (HR) between patient groups. While widely used, these methods have limitations, such as assuming group independence and proportional hazards, and neglecting additional covariates. To address these challenges, alternative approaches, such as those proposed by Riley et al. (2019, 2021), focus on model precision and the inclusion of covariates. However, there is no guarantee that these methods will lead to a single conclusion about the required sample size. This study aims to evaluate the performance of traditional HR-based methods and Riley's precision-focused approach through sensitivity analysis and Monte Carlo simulations. The goal is to identify the ideal sample size and assess how the inclusion of covariates impacts model performance.

Methods
We conducted a sensitivity analysis using Monte Carlo simulations to compare classical HR-based sample size estimation methods with Riley’s model precision approach. Simulations were run based on historical data, focusing on proportional hazards and the inclusion of multiple covariates. Once the appropriate sample size was determined, Riley’s methodology was applied to evaluate the number of predictors that could be included in the model without overfitting. The analysis used a shrinkage factor of 0.9 to balance model complexity and accuracy. Finally, with the aim of assessing whether the calculated sample size allows for the generalizability of a previously developed model, a simulation-based method was applied to estimate the achieved precision, in terms of calibration, based on the given sample size.
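To indicate the form of the precision-focused criteria (our recollection of the shrinkage-based criterion of Riley et al.; the original papers give the full set of conditions), the sample size needed to achieve a target expected shrinkage factor S (0.9 here) with p candidate predictor parameters and an anticipated Cox-Snell R^2_{\mathrm{CS,adj}} is approximately

  n \geq \frac{p}{(S - 1)\,\ln\!\left(1 - R^2_{\mathrm{CS,adj}} / S\right)} .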

Results
Traditional methods struggled to capture model complexity and did not consider relevant covariates effectively. In contrast, Riley’s method allowed for the inclusion of more covariates while maintaining statistical robustness. The application of Riley’s methodology revealed that the number of predictors that could be included without overfitting depended on the desired model accuracy metrics. External validation approach confirmed the adequacy of the calculated sample size, achieving good calibration and predictive accuracy of the model.

Conclusion
This study highlights the limitations of traditional HR-based methods and demonstrates the advantages of the proposed approach, which prioritizes model precision and avoids overfitting. By allowing the inclusion of additional covariates without sacrificing power, this methodology offers a flexible and reliable framework for sample size estimation and model development in prognostic studies.



posters-wednesday-ETH: 36

Prognostic Score Adjustment in a Two-Slope Mixed Effects Model to Estimate Treatment Effects on eGFR Slope in CKD Patients

Silke Janitza1, Maike Ahrens2, Sebastian Voss2, Bohdana Ratitch3, Nicole Rethemeier4, Meike Brinker4, Paula Vesterinen5, Antigoni Elefsinioti1

1Bayer AG, Germany; 2Chrestos GmbH, Essen, Germany; 3Bayer Inc., Mississauga, Ontario, Canada; 4Bayer AG, Wuppertal, Germany; 5Bayer AG, Espoo, Finland

Background: The CHMP recently recognized the estimated glomerular filtration rate (eGFR) slope as a validated surrogate endpoint for clinical trials of treatments for chronic kidney disease (CKD). A common method for analysis of this endpoint is a two-slope linear spline mixed effects model (Vonesh et al., 2019). This model can serve as the primary analysis in future CKD trials with the option to adjust for baseline covariates, e.g., sodium-glucose cotransporter-2 inhibitor (SGLT2i) use and urinary albumin-to-creatinine ratio (UACR). Following a CHMP Qualification Opinion on prognostic covariate adjustment, we explore the potential benefits of integrating a prognostic score in the two-slope model using a historical database from two large CKD phase III studies FIDELIO-DKD and FIGARO-DKD.

Methods: Using the FIGARO-DKD study, we developed prognostic score models via random forest methodology, focusing on patients receiving placebo. These models included approximately 60 baseline covariates. We conducted extensive simulations based on FIDELIO-DKD to assess potential precision gains in treatment effect estimates from including a prognostic score obtained for each participant as a prediction from an aforementioned prognostic model.
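A generic form of such a two-slope linear spline mixed model (notation ours; knot placement and covariance structure as in Vonesh et al., 2019) is

  \mathrm{eGFR}_{ij} = \beta_0 + b_{0i} + (\beta_1 + b_{1i} + \theta_1 Z_i)\,\min(t_{ij}, \tau) + (\beta_2 + b_{2i} + \theta_2 Z_i)\,(t_{ij} - \tau)_{+} + \boldsymbol{\gamma}^{\top}\mathbf{x}_i + \varepsilon_{ij},

where Z_i is the treatment indicator, \tau the knot separating the acute and chronic slopes, \mathbf{x}_i the baseline covariates (e.g., SGLT2i use, UACR category, and optionally the prognostic score), b_{0i}, b_{1i}, b_{2i} random effects, and \varepsilon_{ij} the residual error; the treatment contrast on the chronic (or total) slope is a typical estimand of interest.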

Results: Pseudo simulations from FIDELIO-DKD indicated that integrating the prognostic score into a two-slope model without other covariates yielded moderate precision gains. When compared to a model that included SGLT2i use and UACR category, the additional precision gains from including the prognostic score were reduced.

Conclusion: While prognostic score adjustment can enhance efficiency in clinical trials, it has primarily been studied within classical linear models. This work explores prognostic score adjustment to a more complex model, illustrating how sponsors can utilize historical data for pseudo simulations to evaluate the utility of prognostic score adjustments in future trials. Based on our historical studies, our findings from pseudo simulations suggest that incorporating a prognostic score in addition to other key baseline covariates (such as SGLT2i use and UACR category) may not yield substantial additional efficiency in estimating treatment effects.

Literature

Vonesh E, et al. Mixed-effects models for slope-based endpoints in clinical trials of chronic kidney disease. Stat Med. 2019;38(22):4218-4239.

European Medicines Agency. Qualification opinion for Prognostic Covariate Adjustment (PROCOVA™). Committee for Medicinal Products for Human Use (CHMP). 2022.

European Medicines Agency. Qualification opinion for GFR Slope as a Validated Surrogate Endpoint for RCT in CKD. Committee for Medicinal Products for Human Use (CHMP). 2023.



posters-wednesday-ETH: 37

Comparative effectiveness of ACE inhibitors and angiotensin receptor blockers to prevent or delay dementia: a target trial emulation

Marie-Laure Charpignon1, Max Sunog2, Colin Magdamo2, Bella Vakulenko-Lagun3, Ioanna Tzoulaki4, Sudeshna Das2, Deborah Blacker2, Mark Albers2

1Kaiser Permanente and UC Berkeley, United States of America; 2Mass General Brigham, United States of America; 3Haifa University, Israel; 4Imperial College London, United Kingdom

Alzheimer’s disease, the most common type of dementia, affects 6.7 million Americans and costs $345B annually. Since disease-modifying therapies are limited, repurposing FDA-approved drugs may offer an alternative, expedited path to preventing dementia. Hypertension is a major risk factor for dementia onset. However, prior observational studies contrasting antihypertensive drug classes (Angiotensin Converting Enzyme inhibitors: ACEI, Angiotensin Receptor Blockers: ARB, and Calcium Channel Blockers: CCB), provided mixed results.

We hypothesize that ACEI have an off-target pathogenic mechanism. To test this hypothesis, we emulate a target trial comparing patients initiating ACEI vs ARB using electronic health records from the US Research Patient Data Registry. We perform intention-to-treat analyses among 25,507 patients aged 50 and over, applying inverse probability of treatment weighting to balance the two treatment arms and accounting for the competing risk of death.
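A minimal sketch of this pipeline, assuming hypothetical column names (acei, time, dementia), a logistic propensity model, stabilized weights, and a weighted cause-specific Cox model in which deaths are handled as censoring for the dementia-specific hazard:

  from sklearn.linear_model import LogisticRegression
  from lifelines import CoxPHFitter

  def iptw_cause_specific_cox(df, confounders):
      # Propensity of initiating an ACEI (vs ARB) given measured confounders.
      ps = LogisticRegression(max_iter=1000).fit(
          df[confounders], df["acei"]).predict_proba(df[confounders])[:, 1]
      p_treat = df["acei"].mean()
      # Stabilized inverse-probability-of-treatment weights.
      w = df["acei"] * p_treat / ps + (1 - df["acei"]) * (1 - p_treat) / (1 - ps)
      fit_df = df[["time", "dementia", "acei"]].assign(w=w)
      cph = CoxPHFitter()
      cph.fit(fit_df, duration_col="time", event_col="dementia",
              weights_col="w", robust=True)
      return cph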

In a cause-specific Cox Proportional Hazards (PH) model, the hazard of dementia onset was higher in ACEI vs ARB initiators (HR=1.10 [95% CI: 1.01-1.21]). Findings were robust to outcome model structures (i.e., Cox PH vs nonparametric) and generalized to patients with no hypertension diagnosis at initiation but receiving such drugs for another indication (e.g., heart failure).

Ongoing work includes evaluating differential effects by brain penetrance, discovering subgroups of responders, and assessing the mediating role of blood pressure (BP) control with ACEI vs ARB. Future research will incorporate longitudinal markers (e.g., BP, HbA1c, LDL) in time-to-event models and consider stroke incidence or recurrence under ACEI vs ARB initiation as a mediator.



posters-wednesday-ETH: 38

Optimal utility-based design of phase II/phase III programmes with different type of endpoints in the setting of multiple myeloma

Haotian Wang1, Peter Kimani1, Michael Grayling2, Josephine Khan2, Nigel Stallard1

1Warwick Clinical Trials Unit, United Kingdom; 2Johnson & Johnson Innovative Medicine, United Kingdom

Background:

High failure rates in phase III oncology trials, often due to overoptimistic assumptions based on limited phase II information, highlight the significant costs and risks associated with drug development. This underscores the importance of approaches that effectively link phase II and phase III trials, balancing resource allocation and decision-making to ensure phase III trials are appropriately powered to optimise success rates.

Method:

We propose a novel method to determine the optimal phase II sample size that maximizes the overall utility of a successful programme. The method evaluates go/no-go decision criteria between phase II and phase III based on phase II outcomes, including a strategy for choosing the optimal go/no-go threshold, calculating the expected phase III sample size, and ensuring the desired power for the entire programme. Existing methods [1] enable optimal designs when the same time-to-event endpoint is used in both phase II and phase III. In practice, however, survival data are often not reliably observed in phase II. Our method allows binary outcome data obtained in phase II to inform the sample size calculation for the phase III trial that will use a correlated time-to-event endpoint.

Results:

The proposed method is illustrated by application in multiple myeloma, using achievement of minimal residual disease as the endpoint in phase II and progression-free survival (PFS) as the endpoint in phase III. With initial parameters set according to the MAIA trial [2], we found the optimal utility and the corresponding optimal phase II sample size. We also performed sensitivity analyses under different scenarios, varying response- and treatment-related parameters, the go/no-go decision threshold, the prior distribution of the response rate, and utility-related parameters such as benefits obtained after approval. Our method provides the optimal design together with the expected utility of the whole phase II/III programme.

Reference:

1. Kirchner, M., Kieser, M., Götte, H. & Schüler, A. Utility-based optimization of phase II/III programs. Stat. Med. 35, 305–316 (2016).

2. Facon, T. et al. Daratumumab plus Lenalidomide and Dexamethasone for Untreated Myeloma. N. Engl. J. Med. 380, 2104–2115 (2019).



posters-wednesday-ETH: 39

Beyond first events: Advancing recurrent adverse event estimates in clinical research.

Nicolas Sauvageot, Leen Slaets, Anirban Mitra, Zoe Craig, Jane Gilbert, Lilla Di Scala, Stefan Englert

Johnson & Johnson, Switzerland

Safety analyses of adverse events (AEs) are critical for evaluating the benefit-risk profile of therapies; however, these analyses often rely on simplistic estimators that fail to fully capture the complexity present in safety data. The SAVVY consortium, a collaboration between pharmaceutical companies and academic institutions, aims to improve the estimation of the probability of observing the first AE by time t, using survival techniques that appropriately deal with varying follow-up times and competing events (CEs). Through simulation studies [1] and a meta-analysis [2], the project demonstrated that common methods for estimating the probability of first events, such as incidence proportions, Kaplan–Meier (KM) estimators, and incidence densities, often fail to account for important factors like censoring and CEs. It concluded that the Aalen-Johansen estimator is the gold standard when focusing on the first event, providing the most reliable estimates, particularly in the presence of CEs.

Considering only first events does not reflect the real burden that a patient may experience in clinical studies. Nevertheless, usual safety reporting and existing research predominantly focus on the first AE, overlooking the recurrent nature of AEs. Recognizing that first and subsequent events together provide a more accurate representation of safety profiles, there is a clear need to describe both first and recurrent AEs in safety reporting.

The objective of this work is to identify appropriate methods for analyzing recurrent AEs in the presence of varying follow-up times and CEs. To achieve this, we perform a simulation study within a recurrent event framework to compare several estimators quantifying the average number of events per subject over time, including:

  • Event Rate
  • Exposure Adjusted Event Rate (EAER)
  • Mean Cumulative Count (MCC) without accounting for CEs
  • MCC accounting for CEs [3]

Our simulations evaluate the performance of these methods regarding bias and examine the impact of various trial characteristics such as the proportion of censoring, the amount of CEs, the AE rate, and the evaluation time point. We illustrate and further strengthen the simulation-based results using real clinical trial data.
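For illustration, two of the simpler estimators in the comparison can be sketched as follows: the exposure-adjusted event rate and a mean cumulative count that ignores competing events. The competing-event-adjusted MCC of Dong et al. additionally weights the increments by the probability of remaining free of the competing event and is not reproduced here; variable names are hypothetical.

  import numpy as np

  def exposure_adjusted_event_rate(n_events, followup):
      # EAER: total number of AEs divided by total exposure time.
      return np.sum(n_events) / np.sum(followup)

  def mean_cumulative_count(event_times, followup, eval_times):
      # Nelson-Aalen-type estimate of the expected number of AEs per subject,
      # ignoring competing events: at each AE time, add (events / at risk).
      event_times = [np.asarray(e, dtype=float) for e in event_times]
      followup = np.asarray(followup, dtype=float)
      mcc = []
      for t in eval_times:
          all_events = np.sort(np.concatenate([e[e <= t] for e in event_times]))
          total = 0.0
          for s in np.unique(all_events):
              total += np.sum(all_events == s) / np.sum(followup >= s)
          mcc.append(total)
      return np.array(mcc)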

References:

1: Stegherr R, et al. Estimating and comparing adverse event probabilities in the presence of varying follow-up times and competing events. Pharm Stat. 2021 Nov;20(6):1125-1146.

2: Rufibach K, et al. Survival analysis for AdVerse events with VarYing follow-up times (SAVVY): summary of findings and assessment of existing guidelines. Trials 25, 353 (2024).

3: Dong H, et al. Estimating the burden of recurrent events in the presence of competing risks: the method of mean cumulative count. Am J Epidemiol. 2015 Apr 1;181(7):532-40.