Conference Agenda

Session
Poster Exhibition: M / Monday posters at ETH
Time:
Monday, 25/Aug/2025:
1:00pm - 2:00pm

Location: ETH, UG hall

ETH, -1 / UG floor poster area

Presentations
posters-monday-ETH: 2

Cardio-metabolic traits and their socioeconomic differentials among school children including MONW phenotypes in India: Baseline characteristics of the LEAP-C cohort

Kalaivani Mani1, Chitralok Hemraj1, Varhlunchhungi Varhlunchhungi1, Lakshmy Ramakrishnan1, Sumit Malhotra1, Sanjeev Kumar Gupta1, Raman Kumar Marwaha2, Ransi Ann Abraham1, Monika Arora3, Tina Rawal4, Maroof Ahmad Khan1, Aditi Sinha1, Nikhil Tandon1

1All India Institute of Medical Sciences, Delhi, India; 2International Life sciences Institute, Delhi, India; 3Public Health Foundation of India, Delhi, India; 4HRIDAY, Delhi, India

Background

Cardio-metabolic risks emerge in early life and persist into adult life. Further, these risks may have been aggravated by worsening food security and diet quality during the pandemic. We aimed to assess the prevalence of cardio-metabolic traits, including the metabolically obese normal weight phenotype, and their socioeconomic differentials in children and adolescents aged 6-19 years in India.

Methods

A baseline assessment was conducted between August 17, 2022, and December 20, 2022, as part of a school-based cohort study aimed at longitudinally evaluating anthropometric and metabolic parameters among urban children and adolescents aged 6-19 years from three public schools and two private schools in India. Private and public schools were considered proxies for higher and lower socioeconomic status, respectively. Blood pressure measurements and fasting blood samples were obtained only from adolescents. Prevalence estimates with 95% confidence intervals were calculated using the Clopper-Pearson exact method, and adjusted prevalence ratios were estimated using random-effects logistic regression models.
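
As a concrete illustration of the interval estimation step, the sketch below computes a Clopper-Pearson exact confidence interval for a prevalence; the counts shown are hypothetical and not taken from the study.

```python
# Illustrative sketch (not the authors' code): Clopper-Pearson exact 95% CI for a
# binomial prevalence, via the beta-distribution formulation.
from scipy.stats import beta

def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for k events out of n."""
    lower = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lower, upper

# Hypothetical counts: 193 underweight children among 3,888 students
print(clopper_pearson(193, 3888))
```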

Results

Among the 3,888 students (aged 6-19 years) recruited, 1,985 were from public schools and 1,903 from private schools. The prevalence of underweight was 4.95% (95% CI 1.25-12.72), significantly higher in public schools (p<0.0001), while general obesity (13.41%, 95% CI 2.98-33.87) and central obesity (9.15%, 95% CI 1.40-27.44) were significantly higher in private schools (adjusted PR = 4.42 and 8.31, respectively). Hypertension prevalence (7.37%, 95% CI 6.44-8.38) was similar across schools, but impaired fasting glucose (adjusted PR = 2.37) and metabolic syndrome (adjusted PR = 3.51) were more common in private schools. Among 2,160 adolescents, 67.73% had a normal BMI, with a 42.86% (95% CI 30.79-55.59) prevalence of the metabolically obese normal weight (MONW) phenotype, higher in public (46.39%) than private (35.33%) schools (p=0.0742). Low HDL-C was the most common MONW abnormality (41.74%), significantly more prevalent in public schools (62.12% vs. 52.73%, p=0.0393).

Conclusion

Effective implementation of food security measures and targeted initiatives will be crucial to mitigate the socio-economic and gender disparities associated with the growing burden of cardio-metabolic traits. Metabolic obesity among phenotypically normal or underweight adolescents should not be overlooked; early intervention guided by novel screening criteria is needed to prevent future cardiovascular burden. These findings also have implications for low-income and middle-income countries like India that are undergoing a nutritional transition and where socioeconomic status strongly influences cardio-metabolic traits.



posters-monday-ETH: 3

External validation of SMART2 model for recurrent cardiovascular risk

Jasper Wilhelmus Adrianus van Egeraat, Nan van Geloven, Hendrikus van Os

LUMC, The Netherlands

Background

Assessing performance of prediction models in external data is important before use in medical practice. In real medical data sets, this may be challenged by several data complexities, including censoring, competing events and missing data. For example, when using routine electronic health records, the missing at random (MAR) property required for multiple imputation is often violated, possibly leading to inaccurate performance metrics.

This work illustrates how the combined challenges of censoring, competing events and missing data were addressed when evaluating the predictive performance of the SMART2 prediction model. The SMART2 prediction model can identify individuals at high risk of recurrent atherosclerotic cardiovascular diseases.

Methods

Electronic health records from the Extramural LUMC Academic Network were used to derive routine clinical data from patients registered between January 2010 and December 2021 in the greater Leiden-The Hague region of the Netherlands. Individuals were included if they had been hospitalized for cardiovascular disease. The outcome was the first recurrent occurrence of a composite of non-fatal myocardial infarction, non-fatal stroke, and vascular death within 10 years.

Calibration plots and observed/expected (OE) ratios were determined. Censoring and competing events were incorporated in the observed outcome proportion with the Aalen-Johansen estimator. Discrimination was determined between subjects who developed the primary event before 10 years and those who did not experience any event by 10 years, applying inverse probability of censoring weights.
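
A minimal sketch of this calibration step is given below, using synthetic data in place of the SMART2 validation cohort: event code 1 for the primary event, 2 for a competing event, 0 for censoring, and a column of 10-year predicted risks. The Aalen-Johansen estimator from lifelines supplies the observed cumulative incidence used in the observed/expected ratio.

```python
# Sketch under assumptions (synthetic data, not the SMART2 cohort): OE ratio with
# censoring and competing events handled by the Aalen-Johansen estimator.
import numpy as np
import pandas as pd
from lifelines import AalenJohansenFitter

rng = np.random.default_rng(0)
n = 2000
event_time = rng.exponential(14, n)                  # latent time to first event
cens_time = rng.uniform(1, 12, n)                    # administrative censoring
time = np.minimum(event_time, cens_time)
event = np.where(event_time <= cens_time,
                 rng.choice([1, 2], n, p=[0.55, 0.45]),  # 1 = recurrent CVD, 2 = competing
                 0)                                      # 0 = censored
pred_risk = rng.beta(2, 8, n)                        # stand-in for 10-year predicted risks

ajf = AalenJohansenFitter()
ajf.fit(pd.Series(time), pd.Series(event), event_of_interest=1)

cif = ajf.cumulative_density_                        # cumulative incidence of event 1
observed_10y = float(cif[cif.index <= 10.0].iloc[-1, 0])
print("O/E ratio:", observed_10y / pred_risk.mean())
```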

Missing variables were handled using multiple imputation with chained equations. Longitudinal measurements were used to improve imputation of the measurements used at the prediction moment. To account for possible missingness not at random, a sensitivity analysis was performed by delta-scaling the imputed values after each iteration, mimicking various degrees of missingness not at random.
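
The sensitivity analysis can be sketched as follows, in a simplified variant that shrinks imputed values after imputation rather than within each iteration of the chained equations; the data, the missingness rate, and the delta of 0.9 are assumptions for illustration.

```python
# Simplified delta-scaling sketch (not the authors' implementation): impute with
# chained equations, then shrink only the imputed entries to mimic data missing
# not at random; delta = 0.9 corresponds to a 10% downward shift.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(1000, 4))
X[rng.random(X.shape) < 0.3] = np.nan                # ~30% of values set missing

mask = np.isnan(X)
X_mar = IterativeImputer(random_state=0).fit_transform(X)   # imputation under MAR
X_mnar = X_mar.copy()
X_mnar[mask] *= 0.9                                  # delta-scaled MNAR scenario
```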

Results

Out of the 15,561 included patients, 2,257 patients suffered a recurrent cardiovascular event and 2,098 had a competing event. The median follow-up time was 6.07 years. The AUCt was 0.62 (95%CI: 0.60–0.64) and the OE ratio was 0.97 (95%CI: 0.93–1.02).

Discrimination was robust under various delta-scaling parameters. Assuming unobserved predictors were overestimated by the imputation model, scaling imputed values downward by 10% at every iteration resulted in an AUCt of 0.62 (95%CI: 0.60–0.64), while the OE ratio changed to 1.01 (95%CI: 0.96–1.05).

Conclusions

In this real-world analysis challenged by censoring, competing events and missing data, we demonstrated the feasibility of testing the robustness of predictive performance assessments under varying degrees of missingness not at random.



posters-monday-ETH: 4

Multi-disease risk models to target concomitant diseases and their interactions: Insights on cardio-renal-metabolic syndrome in England

Stelios Boulitsakis Logothetis1, Niels Peek2, Angela Wood1

1British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, United Kingdom; 2THIS Institute (The Healthcare Improvement Studies Institute), University of Cambridge, United Kingdom

Introduction

Clinical risk prediction models are used to identify patients at high risk of disease onset. However, most existing approaches only focus on a single disease, ignoring clusters of conditions with shared pathophysiology and common treatments. Accounting for these relationships could support better disease prevention and health outcomes.

This study develops multi-disease models to jointly predict cardiovascular disease, chronic kidney disease, and metabolic disorders like diabetes. These conditions, collectively termed cardio-renal-metabolic syndrome, share risk factors and intervention effects and are significant contributors to premature mortality. We aim to extract insights about disease progression in the English population and lay the foundations for future individualised multi-disease prediction models.

Methods

We modelled disease progression as a state transition process, fitting a multi-state model to predict 5-year incident cardiovascular disease (CVD) and chronic kidney disease (CKD), with diabetes as a risk factor and death as a competing risk. State transition intensities were jointly estimated using Cox proportional hazards sub-models.
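
A sketch of transition-specific Cox sub-models is shown below, using synthetic long-format (counting-process) data with illustrative column names; it is not the NHS analysis, which jointly estimated the transition intensities at population scale.

```python
# Sketch under assumptions (synthetic data; column and transition names illustrative):
# one Cox proportional hazards sub-model per allowed transition of a multi-state model.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 3000
long = pd.DataFrame({
    "transition": rng.choice(["healthy->CVD", "healthy->CKD", "CVD->death"], n),
    "start": 0.0,                                   # entry time (left truncation)
    "stop": rng.exponential(5, n) + 0.1,            # exit time for this transition
    "event": rng.integers(0, 2, n),                 # 1 if the transition occurred
    "age": rng.normal(60, 10, n),
    "smoking": rng.integers(0, 2, n),
    "diabetes": rng.integers(0, 2, n),
})

models = {}
for trans, d in long.groupby("transition"):
    cph = CoxPHFitter()
    cph.fit(d.drop(columns="transition"), duration_col="stop", event_col="event",
            entry_col="start", formula="age + smoking + diabetes")
    models[trans] = cph
    print(trans, "HR(smoking):", float(np.exp(cph.params_["smoking"])))
```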

We extracted a novel dataset of electronic health records spanning the entire adult population of England from NHS databases, including diagnoses, laboratory measurements, and treatments. Missing data were multiply imputed, and we ensured congeniality with the multi-state model by including non-parametric state probabilities in the imputation. To support computational feasibility, we discretised and coarsened the time scale and restricted to a curated set of well-established risk predictors.

Results

We identified 394,555 cases of concomitant CVD and CKD among the 48.65 million eligible adults. The incidence of CKD following a CVD diagnosis was approximately twice that of CVD following a CKD diagnosis (24.73 vs. 12.85 per 1000 person-years). The Cox models achieved an average concordance index of 0.882 across imputations. Nearly all predictors were significantly associated with every state transition. The strongest predictor was smoking, with hazard ratios ranging from 2.14 to 2.69.

Conclusion

We demonstrated how cardio-renal-metabolic syndrome can be jointly modelled at a national scale. Next, we will experimentally evaluate this model’s individual-level predictions and develop more granular multi-state models that include additional clinically relevant intermediate states. The optimisations required for model fitting suggest that classical approaches are reaching their computational limits. Future work will explore machine learning methods to better leverage whole-population electronic health records and their wide range of risk predictors.



posters-monday-ETH: 5

Machine learning methods for analyzing longitudinal health data streams: A comparative study

Inês Sousa

Universidade do Minho, Portugal

Chronic kidney disease (CKD) is characterized by kidney damage or an estimated glomerular filtration rate (eGFR) of less than 60 ml/min per 1.73 square meters for three months or more. The performance of six tree-based machine learning models - Decision Trees, Random Forests, Bagging, Boosting, Very Fast Decision Tree (VFDT), and Concept-adapting Very Fast Decision Tree (CVFDT) - is evaluated on longitudinal health data. Longitudinal data, where individuals are measured repeatedly over time, provide an opportunity to predict future trajectories using dynamic predictions that incorporate the full measurement history. These predictions are essential for real-time decision-making in healthcare. The dataset comprised 406 kidney transplant patients, spanning January 21, 1983, to August 16, 2000. It captures 120 time points over the first 119 days post-transplant, including baseline glomerular filtration rates (GFR), along with three static variables: weight, age, and gender. Data preprocessing involved robust imputation techniques to handle missing data, ensuring consistency and trend accuracy. The models were trained to predict health outcomes starting from the eighth day post-transplant, progressively incorporating daily values to predict subsequent days up to day 119. Model performance was evaluated using mean squared error (MSE) and mean absolute error (MAE) through data partitioning and cross-validation techniques.
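
A scaled-down comparison in the same spirit is sketched below on synthetic regression data standing in for the transplant GFR series; VFDT and CVFDT are data-stream learners not available in scikit-learn, so only the four batch models are shown.

```python
# Illustrative comparison sketch (synthetic stand-in data, not the transplant cohort).
from sklearn.datasets import make_regression
from sklearn.ensemble import (BaggingRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=406, n_features=4, noise=10.0, random_state=0)

models = {
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "bagging": BaggingRegressor(n_estimators=200, random_state=0),
    "boosting": GradientBoostingRegressor(random_state=0),
}
for name, model in models.items():
    cv = cross_validate(model, X, y, cv=5,
                        scoring=("neg_mean_squared_error", "neg_mean_absolute_error"))
    print(f"{name}: MSE={-cv['test_neg_mean_squared_error'].mean():.1f}, "
          f"MAE={-cv['test_neg_mean_absolute_error'].mean():.1f}")
```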



posters-monday-ETH: 6

Evaluating the fairness of a clinical prediction model for outcomes following psychological treatment in the UK’s National Health Service

Nour Kanso1, Thalia C. Eley1, Ewan Carr2

1Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; 2Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK

Background
Depression and anxiety are common psychiatric conditions that significantly affect individuals’ well-being. The UK NHS Talking Therapies programme delivers evidence-based psychological treatments to over a million patients annually, but outcomes are heterogeneous; only half achieve clinical definitions of recovery. Stratified care involves predicting outcomes using patient characteristics to identify individuals who may need adapted or alternative treatments. However, the fairness, accuracy, and generalisability of such prediction models across sociodemographic subgroups remain underexplored. This study evaluates the stability and performance of an existing clinical prediction model for outcomes following treatment across gender, employment status, ethnicity, age, and sexuality.

Methods
We evaluated an existing clinical prediction model across sociodemographic subgroups to assess prediction stability and performance variations. Outcomes included reliable improvement in depression (PHQ-9) and anxiety (GAD-7), defined as a change from baseline to the end of treatment exceeding the measurement error of the scale (6 points for depression; 4 for anxiety). Predictors included age, gender, ethnicity, religion, language proficiency, employment, sexuality, long-term condition, disability, medication, prior referrals, diagnosis, and symptom severity. Stability was assessed using bootstrapping (200 iterations), where the model was repeatedly trained on resamples of the dataset and tested within sociodemographic subgroups. Sample size calculations suggested a minimum of 1,788 participants per subgroup, assuming 50% prevalence and a c-statistic of 0.7. Performance was evaluated across subgroups based on calibration and prediction instability.
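
The bootstrap stability check can be sketched as follows on synthetic stand-in data; variable names such as `improved`, `baseline_phq9`, and `gender` are illustrative, not the NHS Talking Therapies fields.

```python
# Bootstrap-stability sketch across subgroups (synthetic data; not the NHS dataset).
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "age": rng.normal(35, 12, n),
    "baseline_phq9": rng.normal(15, 5, n),
    "gender": rng.choice(["female", "male"], n, p=[0.73, 0.27]),
})
logit = -2 + 0.08 * df["baseline_phq9"] + 0.01 * df["age"]
df["improved"] = rng.random(n) < 1 / (1 + np.exp(-logit))

predictors = ["age", "baseline_phq9"]
aucs = {g: [] for g in df["gender"].unique()}
for i in range(200):                                  # 200 bootstrap iterations
    boot = df.sample(len(df), replace=True, random_state=i)
    model = LogisticRegression(max_iter=1000).fit(boot[predictors], boot["improved"])
    for g, sub in df.groupby("gender"):               # evaluate within each subgroup
        p = model.predict_proba(sub[predictors])[:, 1]
        aucs[g].append(roc_auc_score(sub["improved"], p))

print({g: (np.round(np.mean(a), 3), np.round(np.std(a), 3)) for g, a in aucs.items()})
```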

Results
The analytical sample (n = 30,999) was predominantly female (73%), with a median age of 34 and an ethnic composition including 57% White and 22% Black, Black British, Caribbean, or African. In the full sample, the model demonstrated good discrimination (depression AUC: 0.76, anxiety: 0.75) and calibration (intercept/slope: -0.00/0.99 for depression, -0.02/1.03 for anxiety). We observed differences in performance and stability across subgroups. Model calibration and stability were higher for women, whereas the model tended to underestimate outcome probabilities for men. The model also underestimated the probability of reliable improvement for unemployed and retired individuals, especially at the extremes of the probability range. Our full results will present differences by ethnicity, age, and sexuality.

Conclusion
No study to date has explored the fairness of clinical prediction models for psychological therapy in the UK NHS. Our study addresses major gaps in understanding predictive performance across sociodemographic subgroups within UK NHS Talking Therapies. By evaluating fairness, accuracy, and stability, findings will inform model refinements, supporting equitable and reliable treatment recommendations.



posters-monday-ETH: 7

Multiple Imputation vs. Machine Learning for Handling Missing Data in Prediction Modelling: Which Best Balances Stability, Performance, and Computational Efficiency?

Pakpoom Wongyikul1, Phichayut Phinyo1, Noraworn Jirattikanwong1, Natthanaphop Isaradech2, Wachiranun Sirikul2, Arintaya Phrommintikul3

1Department of Biomedical Informatics and Clinical Epidemiology (BioCE), Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand; 2Department of Community Medicine, Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand; 3Division of Cardiology, Department of Internal Medicine, Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand

Background: Missing data is a common challenge in clinical prediction modelling. Multiple imputation by chained equations (MICE) remains the main approach but is computationally intensive and adds complexity. Recent evidence suggests that simpler machine learning-based methods may perform just as well. This study compares MICE and machine learning-based approaches for handling missing data in terms of prediction stability, performance, and computational time, to identify the most balanced approach.

Methods: A real-world dataset of 8,245 patients, previously used to develop a clinical prediction model for major adverse cardiovascular events, was utilised. We then generated nine datasets to represent different missing data scenarios, varying by missing variable type (categorical, continuous, or mixed) and missing proportion (20%, 40%, or 60%). All missing data were assumed to be missing at random (MAR). Four methods to handle missing data were evaluated: (1) MICE, (2) random forest (RF), (3) k-nearest neighbor (kNN), and (4) complete case analysis (CCA). Performance and stability were evaluated using the bootstrap internal validation procedure according to Riley and Collins. Model performance was assessed with optimism-corrected area under the curve (AUC) and calibration slopes, while stability was measured using mean absolute prediction error (MAPE). Bootstrapping time was also recorded and compared.
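
The four strategies have close scikit-learn analogues, sketched below on a synthetic numeric matrix with roughly 20% of values set missing; this is an illustration of the general approach, not the authors' exact implementation.

```python
# Illustrative analogues of the four missing-data strategies (synthetic data).
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
X[rng.random(X.shape) < 0.2] = np.nan            # ~20% missing, assumed MAR

mice_like = IterativeImputer(sample_posterior=True, random_state=1).fit_transform(X)
rf_impute = IterativeImputer(estimator=RandomForestRegressor(n_estimators=50),
                             random_state=1).fit_transform(X)
knn_impute = KNNImputer(n_neighbors=5).fit_transform(X)
complete_cases = X[~np.isnan(X).any(axis=1)]     # CCA simply drops incomplete rows
```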

Results: With 20% missing data, RF, MICE, and kNN showed comparable AUC and MAPE, though kNN exhibited poorer calibration. As missing data increased, all methods except CCA maintained similar AUC, but prediction stability declined, particularly for mixed variable types. Across all scenarios, MICE performed best overall, followed by RF. While kNN produced stable predictions with high AUC, significant miscalibration persisted in most cases, except when 20%–40% of continuous data was missing. In terms of computational efficiency, MICE was the most intensive, taking two to three times longer than RF and kNN.

Conclusions: Provided the development sample size is sufficiently large, RF is preferred for its balance of predictive performance, stability, and computational efficiency. If computational time is not a constraint (e.g., with access to high-performance computing), MICE is recommended, followed by RF. Otherwise, kNN may be a suitable alternative when missing data are continuous and below 40%. Finally, CCA should be avoided in all cases.



posters-monday-ETH: 8

Estimating rates of dental treatment and unmet dental need in a spatially explicit model in children in England, 2016-2018

Beatrice Catherine Downing

University of Bristol

Registries and the extensive collection of linked data have led to extraordinary advances in our understanding of disease dynamics and optimal resource allocation. However, this requires accompanying investment, collaboration and continuity at multiple levels over many years. In an imperfect world, unlinked data and aggregate counts are more readily available. With proper communication of the uncertainty, aggregate unlinked data from different sources can be used to estimate and validate the prevalence of disease and the scale of unmet clinical need. Here we use publicly available data on the number of dental procedures in children and a signifier of unmet need - the number of hospitalisations for tooth extraction - to estimate relative rates of dental ill-health and to identify areas in England with relatively high levels of unmet need given the level of background deprivation. We used Bayesian hierarchical spatial models to allow for spatial correlation between neighbouring areas, bringing together dental procedures at fine scales and hospital extractions at coarse scales. We demonstrate the power of modelling spatial relationships in systems where both service provision and wider determinants of health show spatial structuring.



posters-monday-ETH: 9

Statistical Approach to Assess the Impact of Hospital Settings on Optimal Staffing Levels

Diana Trutschel1, Maryam Ahmadi Shad1, Michael Ketzer1, Jack Kuipers2, Giusi Moffa3, Michael Simon1

1Department of Public Health, Institute of Nursing Science, University of Basel, Switzerland; 2Department of Biosystems Science and Engineering, ETH Zürich, Switzerland; 3Department of Mathematics and Computer Science, University of Basel, Switzerland

Background:

Optimal hospital staffing, often measured by the patient-to-nurse ratio (PNR), is critical to healthcare quality and patient outcomes. Variations in PNR are driven by factors originating from both the patient and nursing side of the ratio. Understanding the extent to which these factors influence PNR is essential for designing effective strategies to achieve and sustain optimal staffing levels. Identifying the relative contributions of these influences can guide decision-making by highlighting the potential impact of adjustments within the healthcare setting.

Methods:

The distribution of PNR was derived through theoretical modeling, incorporating the relationship between the number of patients and available nursing staff, to approximate real-world PNRs. Simulations were conducted to explore the impact of key variables such as planned staffing schemes and staff absence rates (80, 85, 90, 95%) across healthcare settings characterised by unit size (20, 30, 40 beds) and occupancy rate (70, 80, 90%). These simulations estimated the proportion of days with overstaffing and understaffing by calculating the area under the PNR distribution curve for values outside a predefined optimal PNR range. This approach enabled the quantification of deviations from optimal staffing levels across diverse scenarios, providing insights into the sensitivity of PNR to changes in system parameters.
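
A minimal simulation in this spirit is sketched below; the binomial data-generating assumptions, the interpretation of the 80% rate as attendance, and the optimal PNR band of 4 to 6 are illustrative choices, not the authors' exact model.

```python
# Simulation sketch under stated assumptions (not the authors' model): daily patient
# counts and nurses present are drawn at random, the patient-to-nurse ratio (PNR) is
# computed, and days falling outside an "optimal" PNR band are counted.
import numpy as np

rng = np.random.default_rng(42)
n_days, beds, occupancy = 10_000, 20, 0.8
planned_nurses, attendance = 5, 0.80              # attendance interpretation assumed

patients = rng.binomial(beds, occupancy, n_days)
nurses = rng.binomial(planned_nurses, attendance, n_days).clip(min=1)
pnr = patients / nurses

optimal_low, optimal_high = 4, 6                  # hypothetical optimal PNR range
print("understaffed days:", np.mean(pnr > optimal_high))
print("overstaffed days:", np.mean(pnr < optimal_low))
```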

Results:

The simulation results indicate that most common staffing configurations exhibit a high risk of understaffing compared to standard PNR schemes. In a 20-bed unit with a nurse absence rate of 80%, more than 50% of hospital days show overstaffing for PNR values of 6 or higher, whereas more than 50% show understaffing for PNR values of 4 or lower. The findings further demonstrate that variations in staffing plans and nurse absence rates affect the proportion of over- and understaffed days. Smaller units (e.g., 20 beds) are more prone to overstaffing than larger units (e.g., 30 beds), with nurse absence rates having a more pronounced influence on overstaffing variability.

Discussion:

This study highlights the importance of understanding the PNR dynamics in hospital staffing. By deriving the theoretical distribution of PNR and simulating different settings, we approximated the proportion of overstaffed and understaffed days. The results emphasize the sensitivity of PNR to fluctuations in patient volume and nursing availability, underscoring the need for adaptive staffing strategies. This approach allows the evaluation of staffing policies, offering insights for optimizing resource allocation.



posters-monday-ETH: 10

Real-time predictions of bed occupancy in hospitals

Ensor Rafael Palacios, Theresa Smith

University of Bath, United Kingdom

Increased demand for hospital resources has led to bed occupancy that often approaches and exceeds maximum capacity. Even relatively short periods (e.g., a few days) of elevated bed occupancy can have an immediate negative impact at all levels of a hospital service chain, including the number of ambulances available, their response times, and the quality and number of discharges. Predicting periods of high demand, with time horizons of up to one or two weeks, is thus of critical operational importance, as it enables hospital managers to proactively initiate adaptive strategies. Here we develop a predictive state-space model of bed occupancy, designed to be deployed within hospitals in real time to support adaptive decision making. We develop and test the model using daily data from two large hospitals in Bristol, United Kingdom. These data include information about bed occupancy itself, admissions, discharges, staffing levels and other hospital-level variables; we additionally include information about seasonal infectious diseases (e.g., flu) and weather (e.g., temperature). We benchmark the model against different alternatives, including naive and ARIMA models (with and without covariates) and random forests. For model comparison, we consider multiple loss functions to ensure accurate predictions of different, expert-derived aspects of the data, such as sudden peaks in occupancy. The next steps involve further validation of the model and testing in an operational setting.
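
For a flavour of the benchmarking setup, the sketch below fits a simple structural (state-space) model and an ARIMA model to a synthetic occupancy series and compares forecast MAE; the real model additionally uses covariates such as admissions, staffing, and weather.

```python
# Benchmark-style sketch (synthetic daily occupancy series; not the Bristol data).
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.structural import UnobservedComponents

rng = np.random.default_rng(0)
occupancy = 500 + np.cumsum(rng.normal(0, 3, 400))   # synthetic occupancy series
train, test = occupancy[:350], occupancy[350:]

ss = UnobservedComponents(train, level="local linear trend").fit(disp=False)
arima = ARIMA(train, order=(2, 1, 1)).fit()

h = len(test)
print("state-space MAE:", np.mean(np.abs(ss.forecast(h) - test)))
print("ARIMA MAE:", np.mean(np.abs(arima.forecast(h) - test)))
```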



posters-monday-ETH: 11

Positive and negative predictive values of diagnostic tests using area under the curve

Kanae Takahashi1, Kouji Yamamoto2

1Osaka Metropolitan University Graduate School of Medicine, Japan; 2Yokohama City University School of Medicine, Japan

In medicine, diagnostic tests are important for the early detection and treatment of disease. The positive predictive value (PPV) and the negative predictive value (NPV) describe how well a test predicts abnormality. The PPV represents the probability of disease when the diagnostic test result is positive, while the NPV represents the probability of no disease when the diagnostic test result is negative. These predictive values inform clinicians and patients about the probability that the diagnostic test gives the correct diagnosis. Compared to sensitivity and specificity, predictive values are more patient-focused and often more relevant to individual patient care.

However, the predictive values observed in one study do not apply universally because these values depend on the prevalence. To overcome this shortcoming, in this study we proposed measures of the positive and negative predictive values based on the area under the curve (PPV-AUC and NPV-AUC). In addition, we provided a method for computing confidence intervals for PPV-AUC and NPV-AUC based on the central limit theorem and the delta method.
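
For context, the prevalence dependence referred to above follows from Bayes' theorem; the snippet below shows the standard PPV and NPV formulas as functions of prevalence. The proposed PPV-AUC and NPV-AUC summaries themselves are not reproduced here, since their exact construction is given in the full work.

```python
# Standard prevalence dependence of PPV and NPV (Bayes' theorem); illustrative only.
import numpy as np

def ppv(sens, spec, prev):
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

def npv(sens, spec, prev):
    return spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)

prevalences = np.linspace(0.01, 0.5, 50)
print(ppv(0.9, 0.8, prevalences)[:3])   # PPV rises steeply with prevalence
print(npv(0.9, 0.8, prevalences)[:3])   # NPV falls as prevalence increases
```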

A simulation study was conducted to investigate the coverage probabilities of the proposed confidence intervals. Simulation results showed that the coverage probabilities of 95% confidence intervals were close to 0.95 when the sample size was large.



posters-monday-ETH: 12

Freely accessible software for recruitment prediction and recruitment monitoring: Is it necessary?

Philip Heesen, Manuela Ott, Katarina Zatkova, Malgorzata Roos

University of Zurich, Switzerland

Background:
Scientific studies require an adequate number of observations for statistical analyses. The ability of a study to successfully collect the required number of observations ultimately depends on a realistic study design based on accurate recruitment predictions. Inaccurate recruitment predictions inevitably lead to inappropriately designed studies, small sample sizes and unreliable statistical inference, increasing the risk of study discontinuation and wasted funding. To realistically predict recruitment, researchers need free access to statistical methods implemented in user-friendly, well-documented software.
Methods:
A recent systematic review assessed the availability of software implementations for predicting and monitoring recruitment.
Results:
This systematic review demonstrated that freely accessible software for recruitment predictions is currently difficult to obtain. Although several software implementations exist, only a small fraction is freely accessible. Ultimately, only one article provided a link to directly applicable, free, open-source software, while the other links were outdated.
Conclusion:
To improve access for researchers worldwide, we propose three measures: First, future authors could increase the findability of their software by explicitly mentioning it in titles, abstracts and keywords. Second, they could make their software available online on open access platforms. Finally, they could provide user-friendly documentation and instructive examples on how to use the statistical methods implemented in their software in applications. In the long term, it could become standard practice to use such software for insightful recruitment predictions and realistic decision making. Such realistic decisions would increase the chance that studies are appropriately designed, adequately powered, and successfully completed, thereby optimising the use of limited funding resources and supporting scientific progress worldwide.



posters-monday-ETH: 13

On moderation in a Bayesian log-contrast compositional model with a total. Interaction between extreme temperatures and pollutants on mortality

Germà Coenders1,2, Javier Palarea-Albaladejo3, Marc Saez1,2, Maria A. Barceló3

1Research Group on Statistics, Econometrics and Health (GRECS), University of Girona, Spain; 2Centro de Investigación Biomédica en Red de Epidemiología y Salud Pública (CIBERESP). Instituto de Salud Carlos III, Madrid, Spain; 3Department of Computer Science, Applied Mathematics and Statistics, University of Girona, Spain

Introduction: Compositional regression models with a dependent real variable can be specified as log-contrast models with a zero-sum constraint on the model coefficients. Moreover, the Bayesian approach to model fitting, through the Integrated Nested Laplace Approximation (INLA) method, is gaining increasing popularity to deal with complex data structures such as spatiotemporal observations.

Methods: In this work, we combine these elements and extend the approach to encompass both total effects, formally defined in a T-space, and moderation or interaction effects in the data modelling. The interpretation of the results is formulated both in the original scale of the dependent variable and in terms of elasticities.

An illustrative case study is presented aimed at relating all-cause mortality with the interaction between extreme temperatures, air pollution composition, and total air pollution in Catalonia, Spain, during the summer of 2022.
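
A toy version of the log-contrast structure is sketched below on synthetic data: the zero-sum constraint on the compositional coefficients is imposed via centred log-ratio coordinates, with the log total entering as a separate term. It is an ordinary least-squares illustration only; the actual analysis is Bayesian, fitted with INLA, and includes spatiotemporal terms and interactions.

```python
# Toy log-contrast regression with a total (synthetic data; not the Catalonia analysis).
import numpy as np

rng = np.random.default_rng(0)
n = 200
parts = rng.dirichlet([2, 3, 4], size=n)            # pollutant composition (3 parts)
total = rng.lognormal(0.0, 0.3, n)                  # total pollution level (T part)
logp = np.log(parts)
clr = logp - logp.mean(axis=1, keepdims=True)       # centred log-ratios, rows sum to 0

beta_true = np.array([0.5, -0.2, -0.3])             # log-contrast coefficients, sum to 0
y = 1.0 + clr @ beta_true + 0.4 * np.log(total) + rng.normal(0, 0.1, n)

X = np.column_stack([np.ones(n), clr[:, :2], np.log(total)])  # drop one clr column
g, *_ = np.linalg.lstsq(X, y, rcond=None)

g_clr = np.array([g[1], g[2], 0.0])                 # dropped column gets coefficient 0
beta_hat = g_clr - g_clr.mean()                     # re-centre so coefficients sum to zero
print("log-contrast coefficients:", beta_hat.round(2), "total effect:", round(g[3], 2))
```

The re-centring in the last step recovers the zero-sum parameterisation from the unconstrained fit, which is the identity that makes the constrained model estimable with standard software.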

Results: The results show that extreme temperature, exposure to total pollution and to some pollutants in particular (ozone and particulate matter), allowing for some delay in their effect, were associated with increased risk of dying. Also, again considering delayed effects, the mortality risk particularly increased on days of extreme temperatures and greater exposure to ozone.

Conclusions: When assessing the effects of extreme temperatures on mortality, the effects of composition and total pollution, and not just individual pollutants, as well as possible interactions, must be taken into account.



posters-monday-ETH: 14

Graphical inference in nonparametric hypothesis testing: A two-dimensional grid framework for analysis and visualization

Lubomír Štěpánek1,2, Ondřej Vít2, Lubomír Seif2

1First Faculty of Medicine of Charles University (Czech Republic); 2Faculty of Informatics and Statistics of Prague University of Economics and Business (Czech Republic)

Background / Introduction: Nonparametric tests often utilize intuitive concepts such as ranking of observations or assessing pre-post changes. While these tests, including the Mann-Whitney test and signed-rank tests, offer numerical precision, they can also be interpreted graphically. Though graphical techniques cannot replace numerical calculations, they enhance comprehension of test logic and may lead to practical heuristic formulations.
Methods: This study revisits graphical inference testing for selected nonparametric tests, including both two-sample and paired tests. The graphical testing approach transforms test statistic construction into orthogonal directional changes on a two-dimensional finite-step grid. The graphical pathway depends on what the test statistics emphasize in the observations. For two-sample tests, both the ranking distribution and sample affiliation changes matter, whereas, for paired tests, the sequence of positive and negative pre-post changes is critical. These changes are represented as unit steps in orthogonal directions on the grid. Under the null hypothesis of no difference, graphical pathways exhibit an almost regular alternation of grid directions, which follow a binomial distribution and can thus be analyzed within a probability framework. As a novel contribution, we apply Popoviciu’s inequality to derive an upper bound on the probability of observing data contradicting the null hypothesis to the same or a greater extent, thereby estimating the p-value and offering insights into the statistical power of the test.
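
For the two-sample case, the grid view can be sketched as follows (an illustration of the general idea, not the authors' R implementation): scanning the pooled sorted data, a sample-A value is a step right and a sample-B value a step up, and the area under the resulting path equals the Mann-Whitney U statistic.

```python
# Mann-Whitney statistic as a monotone path on a two-dimensional grid (illustrative).
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(0, 1, 8), rng.normal(0.5, 1, 10)

labels = np.concatenate([np.zeros(len(a), int), np.ones(len(b), int)])
order = np.argsort(np.concatenate([a, b]))
path = labels[order]                    # 0 = step right (sample A), 1 = step up (sample B)

# Area under the path: each "up" step adds the number of "right" steps taken so far,
# i.e. the number of A values smaller than that B value.
u_stat = int(np.cumsum(path == 0)[path == 1].sum())
print("U statistic from the grid path:", u_stat)
```
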
Results: We developed R functionality for computing and visualizing two-dimensional grids used in graphical inference testing. The grids highlight regions corresponding to typical null hypothesis rejection scenarios. In particular, the grids accommodate asymmetric null hypotheses for the signed-rank test by upper-bounding directional "traffic" maxima. Various simulations were conducted, evaluating different sample pairs and pre-post scenarios to demonstrate the method's applicability.
Conclusion: Graphical inference testing provides an alternative perspective on nonparametric hypothesis testing, fostering better understanding and serving educational purposes. The developed R functionality for graphical testing will soon be integrated into an R package, expanding accessibility and usability for statistical analysis and instruction.



posters-monday-ETH: 15

On regression analysis of interval-valued data based on order statistics

Ryo Mizushima, Asanao Shimokawa

Tokyo University of Science, Japan

Background / Introduction:

Some of today's diverse data may be given as interval values, such as blood pressure, instead of point values. Interval values can also be used to summarise point-valued data by certain characteristics. For example, the temperature at a certain point in time is given as a point value, but the temperature throughout the day can be described by a minimum and a maximum. Most studies on regression analysis of interval-valued data have proposed methods based on the midpoint and width of the interval. However, those methods rarely use the information contained within the interval. In this study, we therefore consider the case where the upper and lower values of the objective variable are known, together with the number of individuals lying between them. An example would be a hospital with 100 patients for whom some numerical information on health status is available but, for privacy reasons, only the maximum and minimum values among them are known.

Methods:

We propose a model that takes into account the information within the interval of the objective variable. The proposed method assumes that the values in the interval are generated from a certain distribution and aims to estimate the distribution of the objective variable given a set of explanatory variables. To this end, the upper and lower bounds of the objective variable are treated as the maximum and minimum order statistics, and the number of observations whose values lie unobserved within the interval is assumed to be known. From the properties of order statistics, the maximum and minimum values together with the number of observations between them yield the conditional probability density function of the objective variable given the explanatory variables. This density is used as the likelihood function to obtain maximum likelihood estimators of the distribution parameters, from which approximate confidence intervals for the parameters can also be derived.
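
For reference, the standard order-statistics result underlying this likelihood is the joint density of the minimum and maximum of n observations drawn from a density f(·|x) with distribution function F(·|x):

```latex
f_{Y_{(1)},\,Y_{(n)}}(u, v \mid x) \;=\; n(n-1)\, f(u \mid x)\, f(v \mid x)\,
  \bigl[F(v \mid x) - F(u \mid x)\bigr]^{\,n-2}, \qquad u < v .
```

Viewed as a function of the distribution parameters, this density provides the likelihood contribution of an interval-valued observation.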

Results:

We examined through simulations the behaviour of the parameter estimators and approximate confidence intervals under finite samples, varying the sample size and the number of objective-variable observations within the interval. The results show that the parameters can be estimated successfully under several conditions.

Conclusion:

We proposed a method of regression analysis using a likelihood function obtained from the information in the interval of the objective variable, considering the maximum and minimum values of the interval as order statistics.



posters-monday-ETH: 16

Two-sided Bayesian simultaneous credible bands in linear regression model

Fei Yang

University of Manchester, United Kingdom

Credible bands, which comprise a series of credible intervals for each component of a parameter vector, are frequently employed to visualize estimation uncertainty in Bayesian statistics. Unlike the often-used pointwise credible interval, simultaneous credible bands (SCBs) can cover the entire parameter vector of interest with an asymptotic probability of at least 1-α.

In this study, in order to assess where the true model x^Tθ, from which the observed data have been generated, lies, we propose two-sided 1-α level Bayesian simultaneous credible bands for the regression line x^Tθ over a finite interval of the covariate x in a simple linear regression model. By incorporating prior information, the proposed method offers advantages over the traditional frequentist approach, yielding more robust and stable estimates, especially in cases with limited data.

Using non-informative priors, we analyze the posterior distribution of the parameters of interest and employ Monte Carlo simulation to obtain the critical constant needed to construct the Bayesian SCBs.
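
One common Monte-Carlo construction of such a critical constant is sketched below with an assumed bivariate normal posterior; the paper's exact scheme, priors, and posterior form may differ.

```python
# Hedged sketch: Monte-Carlo critical constant for a simultaneous band over a finite
# covariate interval (generic construction; not necessarily the authors' scheme).
import numpy as np

rng = np.random.default_rng(0)
theta_hat = np.array([1.0, 0.5])                     # posterior mean (intercept, slope)
Sigma = np.array([[0.04, -0.01], [-0.01, 0.02]])     # posterior covariance (assumed)

draws = rng.multivariate_normal(theta_hat, Sigma, size=20_000)
xs = np.linspace(0.0, 10.0, 101)                     # finite covariate interval
X = np.column_stack([np.ones_like(xs), xs])
sd = np.sqrt(np.einsum("ij,jk,ik->i", X, Sigma, X))  # pointwise posterior sd of x^T theta

# Sup over x of the standardised deviation, per posterior draw; 95% quantile = constant
dev = np.abs(draws @ X.T - theta_hat @ X.T) / sd
c = np.quantile(dev.max(axis=1), 0.95)
band = (theta_hat @ X.T - c * sd, theta_hat @ X.T + c * sd)
print("critical constant:", c)
```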

Simulation results show that the proposed methodology has highly satisfactory frequentist properties. Additionally, it meets the required false-positive rate with a pre-specified level of certainty. A real data analysis in drug stability studies also verifies the effectiveness of the proposed framework.



posters-monday-ETH: 17

Cell composition analysis with unmeasured confounding

Amber Huybrechts1,2, Koen Van den Berge2, Sanne Roels2, Oliver Dukes1

1Ghent University, Belgium; 2Janssen Pharmaceutica, Belgium

Analysis of single-cell sequencing data, in particular cell abundance data where one counts the number of cells detected for each cell type in each sample, involves handling data compositionality. Indeed, cell composition data contain only relative information on a cell type’s abundance. An increase in one cell type might therefore also be reflected as a decrease in other cell types’ abundance. This makes estimating causal disease effects in cell composition data challenging, especially in the presence of confounders. On top of that, not all confounders might be observed.

Existing methods like CATE [1] and RUV-4 [2] attempt to obtain unbiased disease or treatment effects by estimating the unmeasured confounders using factor analysis and making assumptions on sparsity and the existence of negative controls. However, it is uncertain how these methods perform in the context of cell composition analysis, where in addition to the compositionality, the number of features is smaller in comparison to the settings where these methods are generally used (e.g. gene expression analysis with thousands of genes).

In this work, we investigate how we can account for compositionality and unmeasured confounders when assessing differences in cell type abundance between biological conditions. We find that a vanilla factor analysis model, typically used for estimating unmeasured confounders, is unsuitable in the context of compositional data, and we evaluate alternative approaches.

[1] Wang, J., Zhao, Q., Hastie, T., & Owen, A. B. (2017). Confounder adjustment in multiple hypothesis testing. The Annals of Statistics, 45(5), 1863-1894.

[2] Gagnon-Bartsch, J. A., Jacob, L., & Speed, T. P. (2013). Removing unwanted variation from high dimensional data with negative controls. Department of Statistics, University of California, Berkeley.



posters-monday-ETH: 18

Temporal transcriptomic analysis of microexon alternative splicing in mouse neurodevelopmental genes

Jimin Kim, Kwanghoon Cho, Jahyun Yun, Dayeon Kang

Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology, Korea, Republic of (South Korea)

Background: Alternative splicing plays a pivotal role in gene regulation, particularly within neurological processes. Microexons, short exon sequences ranging from 3 to 27 base pairs, are highly neuron-specific and fine-tune protein interactions within synaptic networks. Dysregulation of microexon splicing has been linked to impaired neuronal connectivity and altered synaptic function, which are hallmarks of neurodevelopmental disorders. Understanding the dynamic regulation of microexon splicing across developmental stages is crucial for identifying potential biomarkers and therapeutic targets.

Methods: We investigated the temporal dynamics of microexon splicing by analysing whole-cortex RNA sequencing (RNA-seq) data from mice across eleven developmental stages, spanning embryonic, postnatal, and ageing periods. We focused on microexons under 30 base pairs, using the Percent Spliced In (PSI) metric to assess alternative splicing patterns. Our analysis centred on genes involved in neural function and neurodevelopmental disorders to explore the role of microexons in neuronal maturation and synaptic function.
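
The PSI metric used here is, in its common junction-read form, simply the inclusion fraction; a small illustration follows, with the read counts being hypothetical.

```python
# Percent Spliced In (PSI) as commonly defined from junction reads (illustrative;
# the read counts and the simplified definition are assumptions, not the study's pipeline).
def psi(inclusion_reads: int, exclusion_reads: int) -> float:
    """PSI = inclusion / (inclusion + exclusion), expressed as a percentage."""
    total = inclusion_reads + exclusion_reads
    return 100.0 * inclusion_reads / total if total else float("nan")

print(psi(inclusion_reads=45, exclusion_reads=15))   # 75.0
```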

Results: We identified distinct stage-specific microexon splicing patterns in several genes, highlighting the complexity of microexon regulation during cortical development. During early embryonic stages (E10–E16), low PSI values were observed for genes involved in neurogenesis and axon guidance, such as Nrcam and Robo1. Nrcam showed a gradual increase in PSI during embryogenesis, whereas Robo1 exhibited a decline from embryonic to postnatal stages, reflecting their roles in neuronal connectivity and circuit stabilisation, respectively. In postnatal stages, Shank3 and Dlgap1 showed significant PSI increases, indicating their involvement in synaptic maturation and plasticity. Conversely, Bin1 displayed a decline in PSI during maturation and ageing, suggesting a shift from synaptic plasticity to stability.

Conclusion: This study demonstrates the importance of microexons in neural development and their potential contribution to neurodevelopmental disorders. The stage-specific PSI variations indicate that microexons are crucial for neural circuit formation, synaptic plasticity, and functional specialisation. The observed co-regulation patterns suggest that microexon splicing is tightly regulated, orchestrating key neurodevelopmental events. Future research into the regulatory mechanisms governing microexon splicing will be essential to understanding their broader biological implications and therapeutic potential.



posters-monday-ETH: 19

A Robust Method for Accurate Reconstruction of 3D Genome Conformation from Hi-C Data

Insu Jang1,2, Minsu Park1

1Department of Information and Statistics, Chungnam National University, Korea, Republic of (South Korea); 2Korea Research Institute of Bioscience and Biotechnology, Korea, Republic of (South Korea)

The three-dimensional (3D) organization of the genome within the cell nucleus plays a pivotal role in critical biological processes, including transcriptional regulation, DNA replication, and repair. Disruptions to this spatial organization, such as aberrant chromatin looping or genomic deletions, are linked to various diseases. Despite its significance, resolving the 3D genome architecture has been historically challenging due to the lack of techniques for high-resolution chromatin mapping. The advent of Chromosome Conformation Capture (3C) technologies, particularly Hi-C, revolutionized this field by enabling genome-wide quantification of chromatin interactions. Hi-C produces a contact count map, providing interaction frequencies between genomic loci, which serves as the basis for computational 3D genome reconstruction. However, deriving biologically meaningful 3D structures from Hi-C data remains computationally challenging due to noise and chromatin complexity. To overcome these challenges, we propose a novel, robust methodology combining Thin Plate Spline (TPS) and Non-Metric Multi-Dimensional Scaling (nMDS), specifically designed to infer smooth and biologically plausible 3D genomic structures while being resilient to noise. Our method was rigorously evaluated on simulated datasets encompassing structures of diverse sizes with varying levels of noise, as well as on real Hi-C data from the IMR90 cell line. Comparative assessments on the simulated datasets demonstrated that our approach consistently produced robust and smoother results, outperforming existing models in handling varying levels of noise. Furthermore, its predictive validity was substantiated through comparisons with 111 replicate conformations derived from Multiplexed Fluorescence in situ hybridization (M-FISH) imaging, providing strong empirical support for the method and its applications in 3D genome analysis.
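
A simplified sketch of the distance-then-embedding part of such a pipeline is given below on a synthetic contact matrix: counts are converted to dissimilarities through an assumed power law and embedded in 3D with non-metric MDS from scikit-learn. The thin-plate-spline smoothing step and the exponent are assumptions, not the authors' exact settings.

```python
# Sketch under assumptions (synthetic Hi-C-like contact map; power-law exponent assumed).
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
n = 60
counts = rng.poisson(50, size=(n, n)).astype(float)
counts = (counts + counts.T) / 2                    # symmetric contact map
np.fill_diagonal(counts, counts.max())

dist = (counts + 1.0) ** (-0.5)                     # contacts -> distances (alpha = 0.5 assumed)
np.fill_diagonal(dist, 0.0)

nmds = MDS(n_components=3, metric=False, dissimilarity="precomputed", random_state=0)
coords = nmds.fit_transform(dist)                   # 3D coordinates of genomic loci
print(coords.shape)                                 # (60, 3)
```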



posters-monday-ETH: 20

Balancing Accuracy, Clinical Utility, and Explainability: A Machine Learning Approach to Prostate Cancer Prediction

Luis Mariano Esteban1,2, Rocío Aznar1,3, Angel Borque-Fernando4,5, Alejandro Camón5, Patricia Guerrero5

1Escuela Universitaria Politécnica de la Almunia, Universidad de Zaragoza, Spain; 2Institute for Biocomputation and Physics of Complex Systems (BIFI), Spain; 3Instituto Tecnológico de Aragón, Spain; 4Department of Urology, Miguel Servet University Hospital, Spain; 5Health Research Institute of Aragon Foundation, Spain

Background

Advances in mathematical modelling have significantly improved cancer diagnosis. While these models enhance predictive performance, typically measured by discriminative power, they often overlook their role as classification tools. Recently, greater emphasis has been placed on their clinical utility and explainability, highlighting the need for models that balance accuracy with interpretability. Tools such as clinical utility curves and Shapley values can help achieve this balance.

Methodology and results

We analysed data from 86,359 patients at Miguel Servet University Hospital, Zaragoza, Spain (2017–2022) with at least one PSA measurement, including 2,391 prostate cancer diagnoses, to develop a predictive model for PCa. From their clinical records, we selected approximately 50 demographic and clinical variables as candidate predictors, including PSA, free PSA, PSA history, blood analysis parameters, and comorbidities. Several machine learning models were tested, including logistic regression, ridge regression, LASSO, elastic net, classification trees, random forest, neural networks, and Extreme Gradient Boosting (XGBoost). Model performance was validated using an external dataset of 47,284 patients from the Lozano Blesa University Hospital.

XGBoost demonstrated the best discrimination in the validation cohort, with an AUC of 0.965, sensitivity of 0.904, and specificity of 0.914. More importantly, it also showed the highest clinical utility. For a cutoff that resulted in a 5% diagnostic loss in the training dataset, the validation dataset showed a 7.87% loss while recommending biopsy for 11.1% of patients. In comparison, a screening policy of biopsying all patients with PSA > 3 would result in 15.3%.

To assess variable influence within the XGBoost model, we used SHAP values (SHapley Additive exPlanations), a game theory-based method for evaluating feature importance in predictive models. SHAP values indicate the contribution of each variable for each individual and can be analysed collectively or individually. In our analysis, PSA was the most influential risk factor, producing the highest Shapley values. Protective factors included older age, multiple PSA readings between 3.2 and 8 with negative biopsies, and prolonged use of antihypertensives, statins, or antidiabetics. Conversely, a previous negative biopsy with ASAP or PIN was a notable risk factor.
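
A minimal XGBoost-plus-SHAP sketch is shown below on synthetic stand-in data; the features and hyperparameters are illustrative, not the clinical model.

```python
# Minimal XGBoost + SHAP sketch (synthetic data; not the prostate cancer model).
import numpy as np
import xgboost as xgb
import shap
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, n_features=6, random_state=0)
model = xgb.XGBClassifier(n_estimators=200, max_depth=4)
model.fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)              # per-patient, per-feature contributions
print(np.abs(shap_values).mean(axis=0))             # global importance (mean |SHAP|)
```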

Conclusions

This study developed a predictive tool for prostate cancer with high accuracy while minimising unnecessary biopsies. As screening protocols remain unstandardised in Spain, it is crucial to explore alternative strategies that incorporate models capable of reflecting variable importance and clinical utility before implementation.



posters-monday-ETH: 21

Development of a thyroid cancer recurrence prediction calculator: A regression approach

Jiaxu Zeng

University of Otago, New Zealand

Background

The thyroid cancer staging calculator has been recognised as one of the most efficient tools for assisting clinicians in making clinical treatment decisions. However, the current calculator does not include patients' serum thyroglobulin, which is crucial for staging cancer patients in practice. The primary aim of this study is to update the current calculator to include serum thyroglobulin, based on the tertiary thyroid cancer service database from Australia.

Methods

Records from 3,962 thyroid patients were analysed to train a logistic regression model for predicting recurrence. Twelve predictive variables were chosen under the close guidance of thyroid cancer specialists: age at operation, sex, number of carcinomas present at operation, size of the greatest tumour, histologic type of carcinoma, extrathyroidal extension status of tumours, pathologic staging of the primary tumour, presence of venous invasion of the primary tumour, immunohistochemistry for the primary tumour, presence of extranodal spread, and the number of lymph nodes and serum thyroglobulin level presented in the scans.

Results

The strongest predictors were number of lymph nodes, histologic type of carcinoma and most importantly, the serum thyroglobulin level. The model demonstrated excellent performance with an AUC of 0.874.

Conclusions

This study has addressed an important concern: serum thyroglobulin information has not previously been used to predict thyroid cancer recurrence in practice.



posters-monday-ETH: 22

A comparison of methods for modelling multi-state cancer progression using screening data with censoring after intervention

Eddymurphy U. Akwiwu, Veerle M.H. Coupé, Johannes Berkhof, Thomas Klausch

Amsterdam UMC, Amsterdam, The Netherlands

Background: Optimizing cancer screening and surveillance frequency requires accurate information on parameters such as sojourn time and cancer risk from pre-malignant lesions. These parameters can be estimated using multi-state cancer models applied to screening or surveillance data. Although multi-state model methods exist, their performance has not been thoroughly investigated, specifically not in the common setting where cancer precursors are treated upon detection so that the transition to cancer is prevented. Our main goal is understanding the performance of available multi-state methods in this challenging censoring setting.

Methods: Six methods implemented in R software packages (msm, msm with a phase-type model, cthmm, smms, BayesTSM, and hmm) were compared. In simulation studies, we assumed commonly used time-independent (i.e., exponential) or time-dependent (i.e., Weibull) progression hazards between consecutive health states in a three-state model (healthy, HE; cancer precursor; cancer). Bias, empirical standard error (ESE), and root mean squared error (rMSE) of progression risk estimates were compared across methods. The methods were illustrated using surveillance data from 734 individuals at increased risk of colorectal cancer, classified into three health states: HE, non-advanced adenoma (nAA), and advanced neoplasia (AN). Age was used as the time scale in the analysis, and both the risk estimates of developing nAA from HE and of developing AN after the onset of nAA were compared across methods.

Results: All methods performed well with time-independent progression hazards in the simulation study. With time-dependent hazards, only the packages smms and BayesTSM provided unbiased risk estimates with low ESE and rMSE. In the application (median follow-up: 6 years), 447 (65%), 208 (28.3%) and 49 (6.7%) individuals were classified as HE, nAA and AN, respectively. Only the packages msm, hmm, and BayesTSM yielded converged solutions. The risk estimates of developing nAA from HE were similar between hmm and BayesTSM (e.g., nAA risk estimates at age 30 were approximately zero to two decimal places) but differed for the msm package (e.g., the nAA risk estimate at age 30 was 16%), while the risk estimates of developing AN after the onset of nAA varied across methods (5-year risk range: 3% to 23%).

Conclusion: Methods for multi-state cancer models, specifically those with an unobservable precursor-to-cancer transition, are strongly affected by the time dependency of the hazard. Careful consideration is crucial when selecting a method for multi-state cancer models. With the more realistic (time-dependent hazard) models, the BayesTSM and smms packages performed accurately, with BayesTSM outperforming in situations with weakly identifiable likelihoods.



posters-monday-ETH: 23

ADHD and 10–Year Disease Progression from Initiating Pharmacotherapy for Hypertension to Death: A Multistate Modelling Analysis

Yiling Zhou1, Douwe Postmus1, Anne van Lammeren2, Casper F.M. Franssen1, Harold Snieder1, Catharina A. Hartman1

1University of Groningen, University Medical Center Groningen, The Netherlands; 2Expertisecentrum Fier, Leeuwarden, The Netherlands

Background: Attention-deficit/hyperactivity disorder (ADHD), the most common neurodevelopmental disorder, affects 2.5% of adults globally and is associated with a 1.5-2-fold increased risk of hypertension, which typically manifests a decade earlier than in the general population. However, the cardiovascular health of this population has been largely overlooked in research and clinical practice. This nationwide cohort study aims to investigate 10-year disease trajectories after initiating hypertension pharmacotherapy in adults with ADHD using multistate modelling.

Methods: This nationwide cohort study included adults aged 18–90 years in the Netherlands who initiated hypertension medication between 2013 and 2020, without prior cardiovascular disease (CVD) or chronic kidney disease (CKD). Hypertension was defined as the initial state, critical complications (stroke, heart failure hospitalisation [HHF], acute myocardial infarction, and CKD) as intermediate states, and death from a cardiovascular or renal cause (cardiorenal death) or other causes as final states. Transition rates were estimated using Cox proportional hazards regression, individual-level trajectories were generated via microsimulation, and the effect of ADHD was estimated by comparing outcomes in individuals with ADHD to a counterfactual scenario where individuals were assumed not to have ADHD.

Results: Of 592,362 adults included, 9,728 had ADHD (median age, 45.0 years). Compared to the counterfactual scenario, individuals with ADHD had a higher 10-year risk of cardiorenal death via the HHF pathway (risk difference [95% CI]: 4.8 [2.0–9.2] per 10,000 persons), driven by increased transition risks from hypertension to HHF (14.2 [7.6–26.1] per 10,000 persons), and HHF to cardiorenal death (752.8 [55.7–1517.9] per 10,000 persons). Similarly, individuals with ADHD had an elevated 10-year risk of cardiorenal death via the CKD pathway (2.8 [1.3–7.0] per 10,000 persons), primarily due to an increased transition risk from CKD to cardiorenal death after CKD onset (351.6 [175.7–782.4] per 10,000 persons).

Conclusion: In individuals initiating hypertension medication without pre-existing CVD or CKD, ADHD was associated with a worse 10-year prognosis of hypertension, particularly for the pathways initiated by heart failure and CKD. Our findings indicate the importance of interdisciplinary care and highlight the need for research aimed at preventing heart failure after hypertension onset and optimising heart failure and CKD management in individuals with ADHD.



posters-monday-ETH: 24

Understanding PSA Dynamics: Integrating Longitudinal Trajectories, Testing Patterns, and Disease Progression.

Birzhan Akynkozhayev, Benjamin Christoffersen, Mark Clements

Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Sweden

Background:
The prostate-specific antigen (PSA) test is a widely used, inexpensive test for prostate cancer screening and prognosis. However, its clinical utility remains debated. Examining PSA trajectories over time provides deeper insights into prostate cancer risk and progression. We believe that a critical challenge in such analyses is the observational process: men with higher PSA levels tend to undergo more frequent testing, introducing bias when PSA trajectories are evaluated without adjustment for different follow-up patterns. This study utilises the Stockholm Prostate Cancer Diagnostics Register, which contains PSA measurements from over half a million men living in Stockholm between 2003 and 2023.

Methods:
Longitudinal mixed-effects models were applied to characterise the PSA trajectories. Separate survival models were fitted for time-to-prostate-cancer-diagnosis and the observational process, where time-to-next PSA test was treated as a recurrent event. A full joint model was fitted to incorporate both processes, examining different association structures (current PSA value, rate of change, and cumulative PSA levels) for their predictive impact on prostate cancer diagnosis and testing behaviour. The survival component of the model incorporated recurrent events (time-to-next-PSA test for the observational process) alongside a terminal event (time-to-diagnosis) while also accounting for delayed entry. Model estimation was facilitated by our recently developed VAJointSurv framework, allowing scalable inference through variational approximations for fast integration.

Results:
Our findings highlight the importance of accounting for the observational process in PSA testing. Frequent follow-up testing among men with higher PSA values influenced both the characterisation of PSA trajectories and the hazard estimates for diagnosis. Modelling the observational process and disease progression separately yielded different results from the joint approach, which accounts for both processes simultaneously. These findings indicate that joint modelling may provide a more comprehensive understanding of PSA dynamics and their relationship with disease progression than modelling each process separately.

Conclusions: By jointly modelling PSA trajectories, disease progression, and the observational process, we provide a robust framework for understanding the relationship between these related processes. To our knowledge, this is the largest study of longitudinal and joint PSA modelling. These findings may aid researchers who are exploring PSA trajectories over time. This approach highlights the necessity of adjusting for the observational process to derive accurate assessments of PSA trajectories and prostate cancer risk.



posters-monday-ETH: 25

Joint Modeling for Principal Stratification: Analyzing Stroke Events in CADASIL Patients Across NOTCH3 Variants

Léa Aguilhon, Sophie Tezenas du Montcel, Juliette Ortholand

Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié Salpêtrière, F-75013, Paris, France

Background: Principal Stratification is a statistical framework designed to analyze causal effects in subgroups of individuals defined by their potential outcomes, often in the context of mediators or intermediate variables. This method is especially valuable when treatment effects are influenced by intermediate events, such as competing events, allowing a more accurate estimation of causal effects. While powerful, principal stratification relies on unobservable counterfactual strata, which raises identifiability challenges and often requires strong assumptions. Recent methodological advances have focused on relaxing these assumptions through improved estimation techniques. In parallel, flexible estimators such as joint models have been developed as effective tools for predicting event outcomes from repeated measures.

Objective: This study uses joint modeling to reduce reliance on untestable assumptions in principal stratification. We applied this approach to study Cerebral Autosomal Dominant Arteriopathy with Subcortical Infarcts and Leukoencephalopathy (CADASIL), a genetic disorder that impacts small blood vessels. We analyze stroke occurrence in the presence of death across NOTCH3 variants (1-6 and 7-34).

Method: We analyzed observational follow-up data from 337 CADASIL patients who were followed on average for 5.6 years. We studied the occurrence of the second stroke event (83 observed events) in the context of death truncation (70 observed events) and used functional and cognitive scores to inform both events’ occurrences.

Membership of the principal strata was obtained from the predicted counterfactual death outcomes of a Bayesian multivariate joint model (JMbayes2 package). Inverse Probability of Treatment Weighting was used to adjust for covariates such as sex, cardiovascular risk factors, education level, and baseline scores. The Restricted Mean Survival Time (RMST) was then used to quantify stroke-free survival in the "always-survivor" subpopulation.
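
The RMST step can be illustrated with a short sketch: weighted restricted mean stroke-free survival up to a horizon τ within the predicted always-survivor subset. Column names are hypothetical, and the joint-model and IPTW steps that produce the weights and stratum membership are not reproduced here.

```python
# Sketch: IPTW-weighted restricted mean stroke-free survival at 2 and 5 years
# within the predicted "always-survivor" subset. Hypothetical columns:
# time_to_stroke2, stroke2, carrier_1_6, iptw, always_survivor.
from lifelines import KaplanMeierFitter
from lifelines.utils import restricted_mean_survival_time

def rmst_by_group(df, tau):
    out = {}
    sub = df[df["always_survivor"] == 1]
    for grp, d in sub.groupby("carrier_1_6"):
        km = KaplanMeierFitter()
        km.fit(d["time_to_stroke2"], event_observed=d["stroke2"],
               weights=d["iptw"])
        out[grp] = restricted_mean_survival_time(km, t=tau)
    return out

# print(rmst_by_group(data, tau=2), rmst_by_group(data, tau=5))
```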

Results: Preliminary analysis indicates that carriers of the NOTCH3 1-6 variant have a lower 2-year and 5-year restricted mean stroke-free survival time compared to non-carriers, suggesting an accelerated time to second stroke.

Conclusion: We used joint modeling to estimate each patient's probability of belonging to the principal stratum of always-survivors in the context of death truncation. This alleviates the assumptions required for principal stratification estimation. Finally, this study provides valuable insights into CADASIL progression and paves the way for the design of future clinical trials.



posters-monday-ETH: 26

Challenges in modelling neuropsychiatric symptoms in early Alzheimer's Disease

Rachid Abbas

F. Hoffman -La Roche, France

The Neuropsychiatric Inventory (NPI) is a structured caregiver-based questionnaire designed to assess and quantify neuropsychiatric symptoms in individuals with various neurodegenerative disorders. The NPI is widely used in clinical and research settings to provide a comprehensive evaluation of the behavioral and psychological symptoms associated with cognitive impairment, and has demonstrated good reliability and validity across neurological conditions such as Alzheimer's disease (AD), frontotemporal dementia, and vascular dementia.

Most current AD clinical trials focus on the early stage of the disease, when neuropsychiatric symptoms are rare, but there is growing interest in detecting their incidence, as this may be viewed as a clinically meaningful hallmark of deterioration in quality of life. From an analytical perspective, this requires analysing a continuous variable with an excess of zeros, also described as over-dispersed data.

Over-dispersed data refer to situations where the variance of the observed data exceeds what would be expected under a theoretical distribution, such as the Poisson. This departure from the assumed variance structure poses challenges in statistical modeling, as it may lead to inefficient parameter estimates and inflated Type I error rates. Many analytical solutions have been proposed to tackle these issues and contribute to a more accurate and robust analysis of over-dispersed count data.
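
To make the candidate modelling options concrete, a minimal sketch (illustrative only, with hypothetical variable names, not the analyses compared in this work) of how Poisson, negative binomial, and zero-inflated alternatives for an NPI-like total score could be fitted and compared in Python with statsmodels:

```python
# Sketch: comparing count-data models for an over-dispersed, zero-heavy outcome
# such as a total NPI score. Hypothetical columns: npi_total (non-negative
# integer), treatment, baseline_mmse.
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.discrete.count_model import ZeroInflatedNegativeBinomialP

def compare_count_models(df):
    poisson = smf.glm("npi_total ~ treatment + baseline_mmse", data=df,
                      family=sm.families.Poisson()).fit()
    negbin = smf.glm("npi_total ~ treatment + baseline_mmse", data=df,
                     family=sm.families.NegativeBinomial()).fit()
    # Zero-inflated negative binomial: a separate part for the excess zeros.
    X = sm.add_constant(df[["treatment", "baseline_mmse"]])
    zinb = ZeroInflatedNegativeBinomialP(df["npi_total"], X,
                                         exog_infl=X).fit(maxiter=200)
    for name, res in [("Poisson", poisson), ("NegBin", negbin), ("ZINB", zinb)]:
        print(f"{name}: AIC = {res.aic:.1f}")
    return poisson, negbin, zinb
```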

In this work, we quantified the benefits in terms of predictive performance and type I error control of various analytical approaches to handle over-dispersed NPI data. Our findings allow us to make evidence-based recommendations on analysis strategies. By optimizing the statistical approach to NPI data analysis, we pave the way for more sensitive and reliable detection of treatment effects on neuropsychiatric symptoms in early-stage AD clinical trials. As we continue to push the boundaries of AD drug development, these methodological advancements will be crucial in unlocking new possibilities for upcoming targeted interventions.



posters-monday-ETH: 27

Network models to decipher the human exposome: application to food exposome patterns in the general population

Ima Bernada1, Gregory Nuel2, Cécilia Samieri1

1INSERM U1219, France; 2LPSM, CNRS 8001

Complex chronic diseases are partly due to the exposome. Some exposures co-occur in everyday life, and combinations of exposures, rather than single factors, contribute to disease development. In co-exposure modeling, most studies use risk scores or dimension-reduction approaches. These ignore important features of the dependency structure, such as highly connected variables that may play a central role in disease development. Network approaches can capture the full complexity of the exposome structure. Our objective was to decipher the food exposome, encompassing both intakes and biological fingerprints, in a large cohort of older persons. We aimed to characterize diet intake networks, to understand how they may be reflected internally through networks of diet-related metabolites, and to integrate the two in a bipartite network.

We analyzed a sample of n=311 participants from the 3C-Bordeaux cohort study who answered a dietary survey assessing intakes of 32 food groups (n=1730) and provided a blood draw for the measurement of 143 food-related metabolites (n=375). Using the MIIC algorithm, which is based on conditional mutual information, we constructed three co-exposure networks: (i) a food co-consumption network; (ii) a food-related metabolite network; and (iii) a food-to-metabolite bipartite network. To address estimation uncertainty, networks were analyzed through bootstrap replication, using graph-theory metrics (e.g. degrees, distances). Obtaining collections of networks by bootstrap replication allowed us to quantify the uncertainty of each link and supported a rigorous analysis of the results. A consensus network was also derived and represented. The networks were further studied through clustering, on the one hand using a priori clusters (e.g. metabolite families), and on the other hand using a node-clustering method.
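
The study uses the MIIC algorithm; purely as a simplified illustration of the bootstrap-consensus idea, the sketch below resamples the data, estimates a network in each replicate (here with a naive correlation threshold rather than MIIC), and keeps edges that appear in a given fraction of replicates. All data names are hypothetical.

```python
# Sketch: bootstrap edge stability and a consensus network (illustration only;
# the study estimates each network with MIIC, not correlation thresholds).
import numpy as np
import networkx as nx
import pandas as pd

def estimate_edges(data: pd.DataFrame, threshold: float = 0.2) -> set:
    """Toy network estimator: absolute Pearson correlation above a threshold."""
    corr = data.corr().abs().values
    cols = data.columns
    return {(cols[i], cols[j])
            for i in range(len(cols)) for j in range(i + 1, len(cols))
            if corr[i, j] > threshold}

def consensus_network(data, n_boot=200, keep_fraction=0.8, seed=0):
    rng = np.random.default_rng(seed)
    counts = {}
    for _ in range(n_boot):
        boot = data.sample(n=len(data), replace=True,
                           random_state=int(rng.integers(2**31 - 1)))
        for edge in estimate_edges(boot):
            counts[edge] = counts.get(edge, 0) + 1
    g = nx.Graph()
    g.add_edges_from(e for e, c in counts.items() if c / n_boot >= keep_fraction)
    return g, counts  # counts/n_boot quantifies the uncertainty of each link

# g, edge_counts = consensus_network(food_intakes)          # hypothetical input
# print(sorted(g.degree, key=lambda x: -x[1])[:5])           # most connected foods
```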

The consensus food co-consumption network reflected the south-western French diet of older persons living in Bordeaux in the early 2000s. A subnetwork centered on potatoes indicated that their consumption was central and closely linked to that of many other foods of the traditional south-western diet. The metabolite network showed expected links between metabolites originating from the same food sources or the same biological pathways. Finally, when examining the links between food components and metabolites, we found expected biological links as well as more novel associations that warrant further investigation.

Network approaches applied simultaneously to food intakes and food-derived metabolites make it possible to integrate the external and internal parts of the food exposome in a single statistical framework. This integrated behavioral-biological approach provides novel insights into how environmental exposures such as diet affect biology and health.



posters-monday-ETH: 28

Studying brain connectivity changes in Dementia with Lewy Bodies with Functional Conditional Gaussian Graphical Models

Alessia Mapelli1,2, Laura Carini3, Michela Carlotta Massi2, Dario Arnaldi4,5, Emanuele Di Angelantonio2,6,7, Francesca Ieva1,2, Sara Sommariva3

1MOX – Laboratory for Modeling and Scientific Computing, Department of Mathematics, Politecnico di Milano, Italy; 2HDS– Health Data Science Center, Human Technopole, Italy; 3Università degli studi di Genova, Department of Mathematics, Italy; 4Department of Neuroscience, Rehabilitation, Ophthalmology, Genetics, Maternal and Child Health (DINOGMI), University of Genoa, Genoa, Italy; 5Clinical Neurophysiology Unit, IRCCS Ospedale Policlinico S. Martino, Genoa, Italy; 6Blood and Transplant Research Unit in Donor Health and Behaviour, Cambridge, UK; 7Dept of Public Health & Primary Care, University of Cambridge, Cambridge, UK

Dementia with Lewy Bodies (DLB) represents the second most common cause of neurodegenerative dementia after Alzheimer's Disease. Alterations in functional brain connectivity are possible phenotypic expressions of this disorder. Multivariate time series analysis of electroencephalography (EEG) signals is extensively used to study the associations between simultaneously recorded signals, and thus to quantify functional connectivity at the subject level. Network-Based Statistic is a modern approach for statistical group-level analysis that identifies differential connectivity graphs between groups of patients presenting different clinical features. However, current methods fail to distinguish between direct and indirect associations between brain areas and often neglect the impact of confounding factors. We propose a conditional Gaussian graphical model for multivariate random functions to achieve a population-level representation of the conditional dependence structure of brain function captured by EEG, allowing the graph structure to vary with external variables.

Our method builds on the work of Zhao et al. [1], extending their high-dimensional functional graphical model to account for external variables. In this approach, each node in the graph represents the signal from an EEG electrode. We adopt a neighborhood selection strategy to estimate sparse brain connectivity graphs based on penalized function-on-function regression. Briefly, each node's signal is predicted from the signals of all other nodes using a lasso-based function-on-function regression. External variables (such as phenotype and age) are included in the model as interactions with the signals to capture their influence on brain connectivity. By combining the estimated neighborhoods, we recover the complete graph structure, which can adapt to variations in external factors. The key advantage of this method is its ability to detect differential connectivity changes associated with specific conditions while modeling confounder-linked networks, yielding more accurate estimates.
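
A stripped-down sketch of the neighborhood-selection idea only (scalar covariates, a plain lasso, no functional basis expansion or external-variable interactions, unlike the proposed functional model); the input matrix is hypothetical.

```python
# Sketch: neighborhood selection for a sparse graph via node-wise lasso
# (scalar version; the proposed method uses penalized function-on-function
# regression of EEG signals with external-variable interactions).
import numpy as np
from sklearn.linear_model import LassoCV

def neighborhood_selection(X: np.ndarray, and_rule: bool = True) -> np.ndarray:
    """X: n_samples x n_nodes matrix; returns a boolean adjacency matrix."""
    n, p = X.shape
    selected = np.zeros((p, p), dtype=bool)
    for j in range(p):
        others = np.delete(np.arange(p), j)
        lasso = LassoCV(cv=5).fit(X[:, others], X[:, j])
        selected[j, others] = lasso.coef_ != 0
    # Combine neighborhoods: AND rule keeps an edge only if selected both ways.
    return selected & selected.T if and_rule else selected | selected.T

# adjacency = neighborhood_selection(eeg_features)   # hypothetical input
```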

The method was first validated through simulated data mimicking high-density EEG data recorded using a 64-electrode cap during an eyes-closed resting state task. The method was then tested on experimental data demonstrating the capability of the proposed approach in characterizing differences in functional connectivity in DLB patients with different clinical features, including hallucinations, fluctuations, parkinsonism, and REM sleep behavior disorder.

This study introduces a novel conditional graphical model for multivariate random functions that enables more precise modeling of brain connectivity by accounting for conditional relationships and mitigating confounding bias in differential network analysis.

[1] Zhao, Boxin, et al. "High-dimensional functional graphical model structure learning via neighborhood selection approach." Electronic Journal of Statistics 18.1 (2024): 1042-1129.



posters-monday-ETH: 29

LongiSurvSHAP: Explaining Survival Models with Longitudinal Features

Van Tuan NGUYEN, Lucas Ducrot, Agathe Guilloux

Inria, Université Paris Cité, Inserm, HeKA, F-75015 Paris, France

Background: Recent developments in survival models integrating longitudinal measurements have significantly improved prognostic algorithm performance (Lee, Yoon, and Van Der Schaar 2019; Bleistein et al. 2024). However, their complexity often renders them black boxes, limiting applicability, particularly in critical fields like healthcare. Regulatory frameworks in the EU and the US now require interpretability tools to ensure model predictions align with expert reasoning, thereby enhancing reliability (Geller 2023; Panigutti et al. 2023). Despite this requirement, research on explaining these models remains limited, and existing methods are often constrained to specific architectures (Lee, Yoon, and Van Der Schaar 2019).

Methods: We introduce LongiSurvSHAP, a model-agnostic explanation algorithm designed to interpret any prognostic model based on longitudinal data. While TimeSHAP (Bento et al. 2021) extends the concept of SHapley Additive exPlanations (SHAP) to time series classification, we advance this framework to survival analysis, accommodating the irregular measurements of longitudinal features that are ubiquitous in healthcare.
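
Purely to illustrate the generic SHAP mechanics that such explanations build on (this is not LongiSurvSHAP, which operates directly on irregular longitudinal trajectories), the toy example below attributes a Cox model's predicted log-hazard to per-patient summaries of longitudinal labs using the shap package's model-agnostic KernelExplainer; all feature names and the synthetic data are assumptions.

```python
# Toy illustration of model-agnostic SHAP for a survival risk score
# (NOT LongiSurvSHAP): longitudinal labs are reduced to per-patient summaries,
# a Cox model is fitted, and KernelSHAP attributes the predicted log-hazard.
import numpy as np
import pandas as pd
import shap
from lifelines import CoxPHFitter

features = ["creat_last", "creat_slope", "lactate_max"]   # hypothetical summaries
rng = np.random.default_rng(0)
n = 300
train = pd.DataFrame(rng.normal(size=(n, 3)), columns=features)
train["time"] = rng.exponential(scale=np.exp(-0.5 * train["creat_slope"]))
train["event"] = rng.integers(0, 2, size=n)

cph = CoxPHFitter().fit(train[features + ["time", "event"]],
                        duration_col="time", event_col="event")

def risk(x: np.ndarray) -> np.ndarray:
    """Predicted log-partial-hazard for a matrix of feature values."""
    return cph.predict_log_partial_hazard(
        pd.DataFrame(x, columns=features)).to_numpy()

background = train[features].sample(50, random_state=1).to_numpy()
explainer = shap.KernelExplainer(risk, background)
shap_values = explainer.shap_values(train[features].head(10).to_numpy())
```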

Results: Our algorithm provides both individual and global explanations. Extensive simulations demonstrate LongiSurvSHAP's effectiveness in detecting key features and identifying crucial time intervals influencing prognosis. Applied to data from MIMIC (Johnson et al. 2016), our method aligns with established clinical knowledge, confirming its utility in real-world healthcare scenarios.

Conclusion: We present a novel algorithm that enhances interpretability in survival analysis by revealing the impact of longitudinal features on survival outcomes.

References

Bento, João, Pedro Saleiro, André F. Cruz, Mário A. T. Figueiredo, and Pedro Bizarro (2021). “TimeSHAP: Explaining recurrent models through sequence perturbations”. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 2565–2573.

Bleistein, Linus, Van-Tuan Nguyen, Adeline Fermanian, and Agathe Guilloux (2024). “Dynamic Survival Analysis with Controlled Latent States”. In: Forty-first International Conference on Machine Learning.

Geller, Jay (2023). “Food and Drug Administration Published Final Guidance on Clinical Decision Support Software”. In: Journal of Clinical Engineering 48.1, pp. 3–7.

Johnson, Alistair E. W., et al. (2016). “MIMIC-III, a freely accessible critical care database”. In: Scientific Data 3.1, pp. 1–9.

Lee, Changhee, Jinsung Yoon, and Mihaela Van Der Schaar (2019). “Dynamic-DeepHit: A deep learning approach for dynamic survival analysis with competing risks based on longitudinal data”. In: IEEE Transactions on Biomedical Engineering 67.1, pp. 122–133.

Panigutti, Cecilia, et al. (2023). “The role of explainable AI in the context of the AI Act”. In: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, pp. 1139–1150.



posters-monday-ETH: 30

Prognostic Models for Recurrent Event Data

Victoria Watson1,2, Laura Bonnett2, Catrin Tudur-Smith2

1Phastar, United Kingdom; 2University of Liverpool, Department of Health Data Sciences

Background / Introduction

Prognostic models predict outcome for people with an underlying medical condition. Many conditions are typified by recurrent events such as seizures in epilepsy. Prognostic models for recurrent events can be utilised to predict individual patient risk of disease recurrence or outcome at certain time points.

Methods for analysing recurrent event data are not widely known or applied in research. Most analyses use survival analysis to consider time until the first event, meaning subsequent events are not analysed and key information is lost. An alternative is to analyse the event count using Poisson or Negative Binomial regression. However, this ignores the timing of events. Recurrent event methods analyse both the event count and the timing between events meaning key information is not discarded.
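
As one concrete example of a recurrent-event formulation (one among several the review covers), an Andersen-Gill-style counting-process Cox model can be fitted with lifelines' time-varying interface; the dataset and column names below are hypothetical.

```python
# Sketch: Andersen-Gill counting-process Cox model for recurrent events, where
# each row of the hypothetical DataFrame is an at-risk interval (start, stop]
# and event = 1 marks a recurrence. Columns: id, start, stop, event, age, treatment.
from lifelines import CoxTimeVaryingFitter

def fit_counting_process(recurrences):
    ctv = CoxTimeVaryingFitter()
    ctv.fit(recurrences, id_col="id", event_col="event",
            start_col="start", stop_col="stop")
    ctv.print_summary()
    return ctv

# A Negative Binomial model of the per-patient event count, in contrast, would
# use total counts only and discard the gap times between events.
```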

Methods

A systematic review on methodology for analysing recurrent event data in prognostic models was conducted. Results from this review identified methods commonly used in practice to analyse recurrent event data. A simulation study was then conducted which evaluated the most frequently identified methods in the systematic review with respect to the underlying event rate. The event rates were categorised into low, medium and high based on data collected in the systematic review to best represent a variety of chronic conditions or illnesses where recurrent events are typically seen.

Results

The simulation study provided evidence on whether model choice may be influenced by the underlying event rate in the data. This was assessed using statistics suitable for recurrent event methods that evaluate model fit and predictive performance, and these statistics were used to determine whether certain methods tended to perform better than others under different scenarios.

Conclusion

Results from the systematic review and simulation study will be presented including a summary of each method identified. The results will be the first step towards a toolkit for future analysis of recurrent event data.



posters-monday-ETH: 31

Unlocking diagnosis codes for longitudinal modeling through representations from large language models

Fabian Kabus1, Maren Hackenberg1, Moritz Hess1, Simon Ging3, Maryam Farhadizadeh2, Nadine Binder2, Harald Binder1

1Institute of Medical Biometry and Statistics (IMBI), Medical Center, University of Freiburg; 2Institute of General Practice/Family Medicine, Medical Center, University of Freiburg; 3Department of Computer Science, Faculty of Engineering, University of Freiburg

Background: Longitudinal data, in particular clinical routine data, often contain a multitude of diagnosis codes, such as ICD-10 codes. Incorporating a large number of codes is challenging: treating them as categorical variables in statistical models leads to a large number of parameters, and one-hot encoding, often used in machine learning, offers no solution to this. In addition, the actual meaning of the diagnoses is not captured. Here, large language models may provide a solution, as they capture meaning and can provide alternative numerical representations via their embeddings. We consider such an approach specifically in the context of longitudinal modeling with transformer neural networks.

Methods: We generate embeddings using pre-trained language models and refine them during training in the longitudinal prediction task. Specifically, we compare two embedding strategies, sentence embeddings from SBERT and attention-weighted pooled hidden states from LLaMa, with one-hot encoding as a baseline. Additionally, we investigate different text generation strategies, using either standard ICD-10 descriptions or expanded descriptions generated via prompt-engineered large language models. To evaluate the structure of the learned embeddings, we apply TriMap for dimensionality reduction, assessing whether language-based embeddings capture more coherent relationships between ICD-10 codes.
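
A minimal sketch of the sentence-embedding step; the model name, the example codes, and their descriptions are placeholders, and the study's exact models and prompt-engineered texts are not reproduced.

```python
# Sketch: turning ICD-10 code descriptions into dense vectors with SBERT,
# to initialize the code-embedding layer of a longitudinal transformer.
import numpy as np
from sentence_transformers import SentenceTransformer

descriptions = {
    "E11": "Type 2 diabetes mellitus",
    "I10": "Essential (primary) hypertension",
    "J45": "Asthma",
}  # in practice: all ICD-10 codes, possibly with LLM-expanded descriptions

model = SentenceTransformer("all-MiniLM-L6-v2")          # placeholder model choice
embeddings = model.encode(list(descriptions.values()))   # shape (n_codes, dim)

# The matrix can then initialize an embedding layer that is fine-tuned
# during the longitudinal prediction task.
code_to_vec = dict(zip(descriptions.keys(), np.asarray(embeddings)))
```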

Results: On a clinical routine dataset, models initialized with language-based embeddings derived from sentence-level representations outperform the one-hot encoding baseline in prediction performance, while embeddings extracted from the larger autoregressive model do not show a consistent improvement. Visualization using TriMap suggests that sentence-level embeddings lead to more coherent clustering of ICD-10 codes, capturing their semantic relationships more effectively. Attention analysis indicates that the transformer utilizes these structured embeddings to enhance prediction performance. Additionally, results suggest that incorporating domain-specific prompt engineering further refines embedding quality, leading to more distinct and clinically informative code representations.

Conclusion: Integrating textual descriptions into ICD-10 embeddings enhances prediction modeling by providing a structured initialization that incorporates domain knowledge upfront. As large language models continue to evolve, this approach allows advancements in language understanding to be leveraged for longitudinal medical modeling.



posters-monday-ETH: 32

Machine Learning Perspectives in Survival Prediction Model Selection: Frequentist vs. Bayesian Approach

Emanuele Koumantakis1, Valentina Bonuomo1, Selene Grano2, Fausto Castagnetti3, Carlo Gambacorti-Passerini4, Massimo Breccia5, Maria Cristina Miggiano6, Chiara Elena7, Matteo Pacilli8, Isabella Capodanno9, Tamara Intermesoli10, Monica Bocchia11, Alessandra Iurlo12, Fabio Ciceri13, Fabrizio Pane14, Federica Sorà15, Barbara Scappini16, Angelo Michele Carella17, Elisabetta Abruzzese18, Sara Galimberti19, Sabrina Leonetti Crescenzi20, Marco de Gobbi1, Giuseppe Saglio1, Daniela Cilloni1, Carmen Fava1, Paola Berchialla1

1Department of Clinical and Biological Sciences, University of Torino, Torino, Italy; 2Department of Molecular Biotechnologies and Health Sciences, University of Torino, Torino, Italy; 3Department of Medical and Surgical Sciences, Institute of Hematology "Seragnoli", University of Bologna, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy; 4Department of Medicine and Surgery, University Milano-Bicocca, Monza, Italy; 5Department of Translational and Precision Medicine, Az. Policlinico Umberto I-Sapienza University, Rome, Italy; 6Hematology Department, San Bortolo Hospital, Vicenza U.O.C. di Ematologia, Vicenza, Italy; 7U.O.C. Ematologia 1, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy; 8U.O.C. Ematologia, Grande Ospedale Metropolitano Bianchi-Melacrino-Morelli, Reggio Calabria, Italy; 9Hematology, AUSL Reggio Emilia, Reggio Emilia, Italy; 10Hematology and Bone Marrow Transplant Unit, Azienda Socio-Sanitaria Regionale Papa Giovanni XXIII, Bergamo, Italy; 11Hematology Unit, Azienda Ospedaliera Universitaria Senese, University of Siena, Siena, Italy; 12Hematology Division, Foundation IRCCS Ca' Granda Ospedale Maggiore Policlinico, Milan, Italy; 13Hematology and Bone Marrow Transplantation Unit, IRCCS San Raffaele Hospital, Milan, Italy; 14Hematology and Hematopoietic Stem Cell Transplant Center, Department of Medicine and Surgery, University of Naples Federico II, Naples, Italy; 15Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy; 16Hematology Unit, Azienda Ospedaliero-Universitaria Careggi, Florence, Italy; 17Hematology and Bone Marrow Transplant Unit, IRCCS Fondazione Casa Sollievo della Sofferenza San Giovanni Rotondo, Foggia, Italy; 18Department of Hematology S. Eugenio Hospital, Rome, Italy; 19Department of Clinical and Experimental Medicine, University of Pisa, Pisa, Italy; 20Division of Hematology, Azienda Ospedaliera San Giovanni Addolorata, Rome, Italy

INTRODUCTION

Predictive model selection remains one of the most challenging and critical tasks in medical statistics, particularly in survival analysis and high-dimensional prediction settings. The Cox proportional hazards model is widely used for its simplicity and interpretability but struggles with high-dimensional data, multicollinearity, and overfitting [1]. Stepwise selection methods, while intuitive, suffer from instability, inflated type I error rates, and a tendency to produce overly optimistic models due to their reliance on multiple hypothesis testing. Alternatives such as the adaptive Lasso and Bayesian Model Averaging (BMA) incorporate regularization and probabilistic frameworks to improve model performance [2,3]. This study focuses on identifying predictive factors for treatment restart in patients who discontinued tyrosine kinase inhibitor (TKI) therapy, using data from the Italy-TFR longitudinal study.

METHODS

The Italy-TFR study is a multicenter observational study evaluating the feasibility of treatment-free remission (TFR) in chronic myeloid leukemia (CML). We included patients who achieved deep molecular response, discontinued TKI therapy, and had at least one year of follow-up. Survival analysis considered the time from TKI discontinuation to treatment restart or last follow-up. Since different model selection strategies can yield different results, we compared a Cox proportional hazards model including the whole set of predictors (the baseline model), bidirectional stepwise model selection, Multimodel Inference (MMI), the adaptive Lasso, and BMA as predictive models of the risk of restarting treatment.
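
A bare-bones illustration of one regularized alternative to stepwise selection, a lasso-penalized Cox model (here with lifelines); the adaptive weighting and the Bayesian model-averaging steps used in the study are not shown, and the data and column names are hypothetical.

```python
# Sketch: lasso-penalized Cox regression as one regularized alternative to
# stepwise selection. Hypothetical DataFrame with columns time_to_restart,
# restarted, and the nine candidate predictors coded numerically.
from lifelines import CoxPHFitter

def lasso_cox(tfr):
    cph = CoxPHFitter(penalizer=0.1, l1_ratio=1.0)   # pure L1 penalty
    cph.fit(tfr, duration_col="time_to_restart", event_col="restarted")
    print(cph.summary[["coef", "exp(coef)"]])        # shrunk coefficients
    return cph
```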

RESULTS

Among 542 patients from 38 centers, the predictive value of nine independent variables was analyzed. MMI, adaptive Lasso, and BMA identified TKI treatment duration as the most significant predictor of treatment resumption. Stepwise regression, in contrast, selected three variables: duration of therapy, generation of last TKI discontinued, and Sokal Score. The Bayesian Information Criterion (BIC) was lower for MMI, adaptive Lasso, and BMA (2131.242) compared to stepwise regression (2137.84), suggesting better model performance.

CONCLUSIONS

MMI, adaptive Lasso, and BMA outperformed stepwise regression based on BIC, identifying TKI treatment duration as the most significant predictor. These findings show the advantages of regularization and probabilistic frameworks in improving model stability and interpretability, and highlight their potential for predictive modeling.

REFERENCES

  1. Babyak, M. A. (2004). What you see may not be what you get: A brief, nontechnical introduction to overfitting in regression-type models. Psychosomatic Medicine, 66(3), 411-421.

  2. Zou, H. (2006). The adaptive Lasso and its oracle properties. Journal of the American Statistical Association, 101(476), 1418-1429.

  3. Hoeting et al. (1999). Bayesian model averaging: A tutorial. Statistical Science, 14(4), 382-417.



posters-monday-ETH: 33

Non-parametric methods for comparing survival functions with censored data: Exhaustive simulation of all possible beyond-observed censoring scenarios and computational analysis

Lubomír Štěpánek1,2, Ondřej Vít2, Lubomír Seif2

1First Faculty of Medicine of Charles University (Czech Republic); 2Faculty of Informatics and Statistics of Prague University of Economics and Business (Czech Republic)

Background / Introduction: Comparing survival functions, which describe the probability of not experiencing an event by a given time in two groups, is one of the fundamental tasks in survival analysis. Standard methods, such as the log-rank test, Wilcoxon test, and score-rank test of Cox’s proportional hazards model and its variants, may rely on statistical assumptions, including sufficient sample size for asymptotic validity or even proportional hazards. However, these assumptions may not always hold, limiting their applicability. This study introduces a non-parametric alternative for comparing survival functions that minimizes assumptions and offers a direct computation of the p-value.
Methods: Unlike traditional approaches requiring hazard function estimation, our method models all possible scenarios consistent with the observed data, encompassing cases where the survival functions differ at least as much as observed. This exhaustive scenario-based modeling enables direct p-value calculation without reliance on asymptotic approximations. Given that censoring introduces additional uncertainty, we address its impact by considering a comprehensive (and often large) set of all potential survival function differences. Because enumerating all scenarios arising from the observed censoring is computationally intensive, we compare a fully exhaustive computational approach with a Monte Carlo simulation-based method. The performance of these approaches is evaluated against the log-rank test, particularly in terms of Type I error rate and computational efficiency. Additionally, we analyze the asymptotic time complexity of both proposed approaches.
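
As a toy illustration of the exhaustive-versus-Monte-Carlo trade-off only (not the authors' test statistic), the sketch below assigns each censored subject one of two hypothetical outcomes within a study horizon, evaluates a user-supplied group-difference statistic over all 2^c scenarios or over a random sample of them, and returns the fraction of scenarios at least as extreme as a reference value; all names are assumptions.

```python
# Toy sketch of exhaustive enumeration vs Monte Carlo sampling of censoring
# scenarios (not the authors' statistic). `stat` maps a 0/1 vector of imputed
# outcomes for the censored subjects to a between-group difference; `d_obs`
# is the reference difference to compare against.
import itertools
import numpy as np

def scenario_fractions(stat, n_censored, d_obs, n_mc=10_000, seed=0):
    """Fraction of scenarios with statistic >= d_obs, exhaustively and by MC."""
    # Exhaustive enumeration: feasible only for small numbers of censored cases.
    exhaustive = np.mean([stat(np.array(s)) >= d_obs
                          for s in itertools.product((0, 1), repeat=n_censored)])
    # Monte Carlo approximation: sample scenarios uniformly at random.
    rng = np.random.default_rng(seed)
    samples = rng.integers(0, 2, size=(n_mc, n_censored))
    monte_carlo = np.mean([stat(s) >= d_obs for s in samples])
    return exhaustive, monte_carlo
```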
Results: Based on simulation outputs, our method reduces the Type I error rate compared to the log-rank test, making it particularly useful in settings requiring robustness against false positives. The exhaustive approach ensures an exact p-value calculation but is computationally demanding. The Monte Carlo-based approximation significantly improves computational efficiency while maintaining acceptable accuracy, making it a viable alternative for large datasets. Our complexity analysis highlights the trade-offs between computational cost and statistical precision.
Conclusion: The proposed non-parametric method provides an alternative to traditional survival function comparison techniques. A novel aspect of our approach is the enumeration of all possible scenarios for the censored observations when counting the scenarios in which the survival functions are at least as different as observed. By directly evaluating all plausible scenarios, it reduces reliance on assumptions while improving Type I error rate control. The Monte Carlo approximation offers a computationally feasible alternative, retaining statistical robustness in practical applications. These findings support the use of assumption-minimized approaches in survival analysis, particularly in studies where conventional methods may be restrictive.



posters-monday-ETH: 34

Tree-based methods for length-biased survival data

Jinwoo Lee1, Jiyu Sun1, Donghwan Lee2

1Integrated Biostatistics Branch, National Cancer Center, Republic of Korea; 2Department of Statistics, Ewha Womans University, Republic of Korea

Background: Left truncation in prevalent cohort studies, where only individuals who have experienced an initiating event (such as disease onset) and survived until study enrollment are observed, leads to length-biased data when the onset follows a stationary Poisson process. Although the existing survival trees and survival forests for left-truncated right-censored (LTRC) data can be applied to estimate survival functions, they may be inefficient for analyzing length-biased right-censored (LBRC) data.

Methods: We proposed tree-based methods for LBRC data by adapting the conditional inference tree (CIT) and forest (CIF) frameworks. Unlike LTRC-based approaches, which use log-rank scores from a conditional likelihood, our methods employed log-rank scores derived from the full likelihood, which is valid under LBRC settings. To improve numerical stability and computational efficiency, we adopted a closed-form cumulative hazard function (CHF) estimator for log-rank scores as an alternative to the nonparametric maximum likelihood estimator.

Results: Simulation studies indicated that LBRC-CIT achieves a higher recovery rate of the true tree structure in LBRC data than conventional LTRC-CIT, with particularly notable benefits in small-sample settings. Under proportional hazards and complex nonlinear LBRC scenarios, LBRC-CIF offers more accurate predictions than LTRC-CIF. We illustrated the application of our methods to the estimation of survivorship using a dataset of lung cancer patients with COPD.

Conclusions: By using full-likelihood-based log-rank scores and a closed-form CHF estimator, our proposed LBRC-CIT and LBRC-CIF methods enhance both statistical efficiency and computational stability for length-biased right-censored data.



posters-monday-ETH: 35

Evaluating different pragmatic approaches for selecting the truncation time of the restricted mean survival time in randomized controlled trials

Léa Orsini1,2, Andres Cardona2, Emmanuel Lesaffre3, David Dejardin2, Gwénaël Le Teuff1

1Oncostat U1018, Inserm, University Paris-Saclay, Villejuif, France; 2Product Development, Data Sciences, F. Hoffmann-La Roche AG, Basel, Switzerland; 3I-Biostat, KU-Leuven, Leuven, Belgium

Introduction:

The difference in restricted mean survival time between two arms (dRMST) is a meaningful measure of treatment effect in randomized controlled trials (RCTs) for time-to-event data, especially with non-proportional hazards. Choosing the time window [0,τ] is important to avoid any misinterpretation. Correct RMST estimation can be performed up to τ defined as the last follow-up time under a mild condition on the censoring distribution [1]. However, extensive comparisons between the different ways of selecting τ are still needed to address this important choice in practical settings. The objective is to empirically evaluate them through RCTs.

Methods:

Four techniques for choosing τ are evaluated: (a) 90th or 95th percentile of event times, (b) 90th or 95th percentile of follow-up times, (c) largest time with standard error of survival estimate within 5%, 7.5%, or 10%, and (d) minimum of the maximum follow-up times in each arm. τ-RMST estimations were performed using three frequentist methods (Kaplan-Meier estimator, pseudo-observations-based model, and Cox-based model) and two Bayesian methods (non-parametric model with a mixture of Dirichlet processes prior and pseudo-observations-based model), some of them allowing for covariate adjustments. For evaluation, we used three RCTs (IPSOS n=453, IMpower110 n=554, IMpower133 n=403) comparing immunotherapy with chemotherapy in lung cancer, with delayed treatment effects.
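
As a short sketch of how two of the candidate τ definitions and the corresponding Kaplan-Meier-based RMST difference can be computed (hypothetical column names; approaches (b) and (c), covariate adjustment, and the Bayesian estimators are not shown):

```python
# Sketch: two of the candidate tau definitions and the resulting
# Kaplan-Meier-based dRMST. Hypothetical columns: time, event,
# arm (0 = control, 1 = experimental).
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.utils import restricted_mean_survival_time

def drmst_at(df, tau):
    rmst = {}
    for arm, d in df.groupby("arm"):
        km = KaplanMeierFitter().fit(d["time"], event_observed=d["event"])
        rmst[arm] = restricted_mean_survival_time(km, t=tau)
    return rmst[1] - rmst[0]

def compare_taus(df):
    tau_a = np.percentile(df.loc[df["event"] == 1, "time"], 95)  # (a) 95th pct of event times
    tau_d = df.groupby("arm")["time"].max().min()                # (d) min of per-arm max follow-up
    return {"tau_a": (tau_a, drmst_at(df, tau_a)),
            "tau_d": (tau_d, drmst_at(df, tau_d))}
```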

Results:

The range of τ obtained from the different techniques exceeded two years for IPSOS and IMpower110, and one year for IMpower133, affecting the Kaplan-Meier-based RMST estimate and its variance. With a delayed treatment effect, a higher τ yields larger dRMST estimates with larger variances. Approaches (a) and (b) give smaller values of τ, often leading to premature conclusions, while (d) results in increased variability that can in some cases be mitigated by adjusting for appropriate covariates. Approach (c) emerged as a good candidate, balancing statistical precision with clinical relevance. All RMST estimators (frequentist and Bayesian) provided similar results.

Conclusion:

There is so far no consensus on how to define τ, highlighting the need for clearer guidelines and greater transparency. Ideally, τ should be defined a priori with a clinical rationale. If not, data-driven approaches can be employed. Based on our findings, we recommend approach (c), as it ensures sufficient representation of patients at risk. Establishing standardized, clinically relevant practices for defining τ will enhance the applicability and reproducibility of RMST analyses in future research.

[1] Lu Tian et al. On the Empirical Choice of the Time Window for Restricted Mean Survival Time (2020), Biometrics, 76(4): 1157–1166.



posters-monday-ETH: 36

Identifying risk factors for hospital readmission in home-based care: a study from a monographic paediatric cancer centre

Sara Perez-Jaume1,2, Maria Antònia Colomar-Riutort2, Anna Felip-Badia1, Maria Fabregat1, Laura Andrés-Zallo1

1BiMaU, Sant Joan de Déu Pediatric Cancer Center Barcelona, Spain; 2Department of Basic Clinical Practice, Universitat de Barcelona, Spain

Introduction

Paediatric cancer is a group of rare malignancies that occur in childhood and adolescence. This potentially life-threatening disease often requires aggressive therapies, such as chemotherapy or immunotherapy. The nature of these interventions requires patients to be hospitalised multiple times. In this context, a monographic paediatric cancer centre in the south of Europe initiated a home-based hospitalisation programme for paediatric patients diagnosed with cancer, which potentially offers relevant benefits (enhanced quality of life and reduced economic costs). However, a concern with home-based hospitalisations is the occurrence of adverse events, such as the need for hospital readmission during the hospitalisation at home, which is considered an unfavourable outcome in home-based care. Data from this home hospitalisation programme are available from its foundation in November 2021 until June 2024. The aim of this work is to use these data to identify risk factors for hospital readmission during the home-based hospitalisation.

Methods

The dataset used in this project poses a statistical challenge since patients may be hospitalised at home more than once. Appropriate methods for repeated measures are then required for a proper analysis. Since the outcome of interest is the binary variable "need for hospital readmission during the home hospitalisation", we used Generalized Estimating Equations (GEE) and Generalized Linear Mixed Models (GLMM) with a logit link function (marginal/subject-specific approaches). From these models, we derive the corresponding odds ratios. We applied a variable selection algorithm to identify risk factors for hospital readmission.
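
A compact sketch of the marginal (GEE) analysis under assumed column names; the GLMM counterpart would include a random intercept per patient instead of a working correlation structure, and the variable-selection step is not shown.

```python
# Sketch: marginal (GEE) model with a logit link and exchangeable working
# correlation for the repeated binary outcome, clustered by patient.
# Hypothetical columns: readmission, reason_hydration, neutrophils,
# iv_incidence, patient_id.
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

def fit_gee(home_hosp):
    gee = smf.gee("readmission ~ reason_hydration + neutrophils + iv_incidence",
                  groups="patient_id", data=home_hosp,
                  family=sm.families.Binomial(),
                  cov_struct=sm.cov_struct.Exchangeable()).fit()
    print(np.exp(gee.params))   # odds ratios for readmission
    return gee
```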

Results

Data consist of 380 home-based hospitalisations of 156 paediatric patients previously diagnosed with cancer and included in the home hospitalisation programme. Most patients were male (59%) and the median distance from the hospital to the place of home-based hospitalisation was 8 km. Both the GEE and GLMM approaches led to a final model with four variables, three of which were significantly associated with the outcome. Among the reasons for home-based hospitalisation, hydration-intended hospitalisations reduced the odds of hospital readmission compared with the other reasons considered. Moreover, lower neutrophil counts increased the odds of hospital readmission, as did the occurrence of incidences with the intravenous route.

Conclusion

We identified the reason for hospitalisation, the neutrophil count and the occurrence of incidences with the intravenous route as risk factors for hospital readmission in the context of home-based care in paediatric oncology, which might inform physicians' decisions about the management of these patients at home.



posters-monday-ETH: 37

A web application for predicting Serious Adverse Events with Machine Learning Methods to guide the enrollment procedure in Clinical Trials

Ajsi Kanapari, Corrado Lanera, Dario Gregori

Unit of Biostatistics, Epidemiology and Public Health, University of Padova, Padova, Italy, Italy

Background. Serious Adverse Events (SAEs) are undesired events arising from a drug reaction, with direct consequences for patients' lives and the potential to compromise study validity and safety. There is room for improvement in how they are handled, and Machine Learning (ML) can help by identifying subgroups of patients with meaningful combinations of clinical features linked to SAEs, in order to limit their frequency, using probabilistic methods that rely on clinical features rather than on specific dichotomized variables. However, such methods are often explored through post-hoc analyses and do not directly inform the design of Clinical Trials, because their application in a dynamic context is complex and requires the support of electronic applications.

Objective. The aim of this work is to develop a framework and a web application, in accordance with FDA guidance on enrichment strategies for reducing trial variability [1], that implements ML models to identify patients at high risk of SAEs, enabling early detection and informing inclusion/exclusion decisions. Historical data from early-phase trials are used to train predictive models that estimate SAE probabilities for new participants, with inclusion decisions guided by a predefined decision rule.
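
A simplified sketch of the enrichment logic under hypothetical variable names: a classifier is trained on historical early-phase data and a pre-specified probability threshold drives the inclusion recommendation (calibration, external validation, and the web front end are omitted).

```python
# Sketch: train an SAE risk model on historical early-phase trial data and
# apply a pre-specified probability threshold as the inclusion decision rule.
# Hypothetical columns: clinical features in X_COLS and a binary label "sae".
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X_COLS = ["age", "egfr", "ecog", "prior_lines"]   # hypothetical feature names
THRESHOLD = 0.30                                  # pre-specified decision rule

def build_sae_model(historical):
    X_tr, X_val, y_tr, y_val = train_test_split(
        historical[X_COLS], historical["sae"], test_size=0.3, random_state=42)
    clf = RandomForestClassifier(n_estimators=500, random_state=42)
    clf.fit(X_tr, y_tr)
    print("validation AUC:", roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1]))
    return clf

def recommend_inclusion(clf, candidates):
    """True = predicted SAE risk below the threshold, eligible for enrollment."""
    p_sae = clf.predict_proba(candidates[X_COLS])[:, 1]
    return p_sae < THRESHOLD
```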

Results. Simulations and an application to a case study assess the operational characteristics of the proposed framework, with the aim of balancing the reduction in SAE incidence, algorithm accuracy, and the generalizability of the study. The reduced variability resulting from patient exclusion and, most importantly, the reduction in drop-outs mean that power is not only maintained but can even increase when the model performs well; however, issues arise particularly when specificity is low, which would cause the unnecessary exclusion of subjects at low risk of SAEs. On the positive side, the algorithm provides reduced standard errors and more precise estimates of the treatment effect.



posters-monday-ETH: 38

Assessing the Overall Burden of Adverse Events in Clinical Trials: Approaches and Challenges

Seid Hamzic, Hans-Joachim Helms, Eva Rossman, Robert Walls

F. Hoffmann-La Roche Ltd, Basel, Switzerland

Measuring the total toxicity or adverse event (AE) burden of a therapeutic intervention is a longstanding challenge in clinical research. While trial reports commonly provide the incidence of individual AEs or a summary of the proportion of patients experiencing serious (e.g. grade ≥3) AEs, these metrics do not necessarily capture the global burden that such events may impose on patients. Various approaches to consolidate AEs into a single composite score, such as summing CTCAE grades, have been proposed. However, these efforts face substantial methodological and interpretational hurdles.

This work offers a theoretical exploration of how AE burden could be conceptualized, quantified, and used when comparing two or more therapies. We review the limitations of incidence-based reporting, which fails to capture the interdependence or cumulative effects of multiple, possibly lower-grade AEs. We then discuss existing proposals for composite toxicity scoring, noting the difficulties in weighting different AEs, some of which might be more tolerable to patients despite a higher grade. Additionally, current standard data collection approaches may lack the granularity necessary to distinguish differences in patient experience or quality of life.
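
For concreteness, one naive composite-score construction of the kind discussed above (a weighted sum of CTCAE grades per patient) can be written in a few lines; the weights below are arbitrary placeholders, which is precisely the weighting difficulty highlighted here.

```python
# Sketch: a naive per-patient AE burden score as a weighted sum of CTCAE grades,
# from hypothetical long-format AE data with columns patient_id, ae_term, grade.
# The grade weights are arbitrary placeholders, not a validated scheme.
import pandas as pd

GRADE_WEIGHTS = {1: 1, 2: 2, 3: 5, 4: 10, 5: 50}   # placeholder weights

def ae_burden(ae: pd.DataFrame) -> pd.Series:
    """Return one composite burden score per patient."""
    weighted = ae["grade"].map(GRADE_WEIGHTS)
    return weighted.groupby(ae["patient_id"]).sum()
```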

We argue that while composite scores can offer a more holistic view of the total harm posed by a drug, they risk oversimplification and obscuring the clinical relevance of specific, important toxicities. Ultimately, this highlights a need for more robust data collection and careful methodological development that balances interpretability and accuracy in the comparison of AE burden across treatments.