Conference Agenda

Session
Poster Exhibition: T / Tuesday posters at ETH
Time:
Tuesday, 26/Aug/2025:
9:15am - 10:45am

Location: ETH, UG hall

ETH, -1 / UG floor poster area

Presentations
posters-tuesday-ETH: 1

Leveraging Wearable Data for Probabilistic Imputation in Cardiovascular Risk Calculators

Antoine Faul1, Patric Wyss2, Anja Mühlemann1, Manuela Moraru3, Danielle Bower3, Petra Stute3, Ben Spycher2, David Ginsbourger1

1Institute of Mathematical Statistics and Actuarial Science, University of Bern, Bern, Switzerland; 2Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland; 3Department of Obstetrics and Gynecology, University Women’s Hospital, Bern, Switzerland

Wearable technology for health data collection is expanding rapidly, enabling continuous, non-invasive monitoring of physiological parameters such as heart rate variability and physical activity. This advancement offers promising improvements for cardiovascular disease (CVD) risk prediction, which traditionally depends on clinical measurements that often require time-consuming and costly healthcare visits.

This study analyzes data from 193 female participants, aged 40 to 69, gathered during an observational study at Inselspital, Bern. Participants provided comprehensive medical and personal information, supplemented by wearable data collected with Garmin Vivosmart 3 devices over a week.

In this work, we explore the potential of replacing systematically missing inputs with probabilistic predictions derived from wearable data and self-reported information. By integrating this uncertainty into a risk calculator, we aim to provide probabilistic assessments of cardiovascular risk. Our approach uses an interpretable statistical model based on Gaussian copulas. This method flexibly characterizes the joint distribution, employing distinct marginals and a Gaussian dependence structure to facilitate analytical conditioning.
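
As a rough illustration of the conditioning step in such a Gaussian copula model (a minimal sketch on toy data; the variable roles, the rank-based marginal transform and all names are ours, not the authors' implementation), the latent Gaussian score of a missing clinical input can be conditioned analytically on the observed wearable-derived scores and then mapped back through the empirical marginal:

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(0)
  # Toy training data: two observed (wearable-derived) variables and one clinical
  # variable that will be systematically missing at prediction time.
  n, d = 500, 3
  train = rng.multivariate_normal([0, 0, 0],
                                  [[1.0, 0.5, 0.4],
                                   [0.5, 1.0, 0.3],
                                   [0.4, 0.3, 1.0]], size=n)

  def to_normal_scores(x):
      # Probability integral transform via empirical ranks, then the normal quantile.
      return stats.norm.ppf(stats.rankdata(x) / (len(x) + 1))

  z = np.column_stack([to_normal_scores(train[:, j]) for j in range(d)])
  R = np.corrcoef(z, rowvar=False)        # Gaussian dependence structure

  obs, mis = [0, 1], [2]
  R_oo, R_mo = R[np.ix_(obs, obs)], R[np.ix_(mis, obs)]
  z_obs_new = np.array([0.8, -0.2])       # a new subject's observed variables on the z-scale
  cond_mean = (R_mo @ np.linalg.solve(R_oo, z_obs_new)).item()
  cond_var = (R[np.ix_(mis, mis)] - R_mo @ np.linalg.solve(R_oo, R_mo.T)).item()

  # Probabilistic imputation: draw latent scores, back-transform via the empirical marginal.
  z_draws = rng.normal(cond_mean, np.sqrt(cond_var), size=1000)
  imputed = np.quantile(train[:, 2], stats.norm.cdf(z_draws))
  print(imputed.mean(), imputed.std())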

We extend the approach outlined by Mühlemann et al. [1] by addressing the challenge posed by the high dimensionality of smartwatch data. To this end, we focus on selected features obtained through both supervised and unsupervised dimensionality reduction techniques.

Proper scoring rules, such as the CRPS and the Brier score, are employed to assess the quality of the probabilistic predictions. We also compare the various methods by cross-validation in the context of high- vs. low-CVD-risk classification.
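
For reference, the two scoring rules take their standard forms (textbook definitions in our notation, written for a single observation y with predictive CDF F and for n binary predictions):

  \mathrm{CRPS}(F, y) = \int_{-\infty}^{\infty} \bigl( F(x) - \mathbf{1}\{x \ge y\} \bigr)^2 \, dx,
  \qquad
  \mathrm{BS} = \frac{1}{n} \sum_{i=1}^{n} (p_i - o_i)^2,

where p_i is the predicted probability of the event (here, high CVD risk) and o_i ∈ {0, 1} is the observed outcome; lower values indicate better probabilistic predictions.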

Our results demonstrate that wearable data can help in substituting clinical missing inputs in cardiovascular risk calculators, provided that an efficient dimension reduction step is implemented. However, the gains in predictive performance are moderate, suggesting that further exploration of advanced dimensionality reduction techniques could be beneficial.

[1] Mühlemann A, Stange P, Faul A, et al. Comparing imputation approaches to handle systematically missing inputs in risk calculators. PLOS Digital Health. 2025;4(1):e0000712.



posters-tuesday-ETH: 2

Stepwise Prediction of Tuberculosis Treatment Outcomes Using XGBoost and Feature-Level Analysis: A Multi-Stage Approach to Clinical Decision Support

Linfeng Wang, Jody Phelan, Taane Clark

London School of Hygiene and Tropical Medicine, United Kingdom

Tuberculosis (TB) remains a global health crisis, with multidrug-resistant (MDR-TB) and extensively drug-resistant (XDR-TB) strains posing significant challenges to treatment. Utilizing the extensive TB Portals database, comprising clinical, radiological, demographic, and genomic data from 15,997 patients across high-burden countries, we developed an XGBoost-based machine learning model to predict treatment outcomes. Our approach categorizes features into four categories of diagnostic evidence: demographic, microbiology and disease state, X-ray, and treatment variables. This framework enables the model to progressively incorporate available data while maintaining robust predictive performance, even in the presence of missing values typical of real-world healthcare settings. The model achieved high predictive accuracy (AUC-ROC: 0.96, F1-score: 0.94), with key predictors including age of onset, drug resistance, and treatment adherence. Regional analysis highlighted variability in performance, underscoring the potential for localized model adaptation. By accommodating missing data at various diagnostic stages, our model provides actionable insights for personalized TB treatment strategies and supports clinical decision-making in diverse and resource-constrained contexts.
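
A schematic of the stepwise idea is sketched below (illustrative only: the feature names, staging and simulated data are ours, not drawn from TB Portals); an XGBoost classifier is refitted on progressively richer groups of diagnostic evidence, with missing values handled natively by the library:

  import numpy as np
  import pandas as pd
  from xgboost import XGBClassifier
  from sklearn.model_selection import train_test_split
  from sklearn.metrics import roc_auc_score

  rng = np.random.default_rng(1)
  n = 2000
  # Toy stand-ins for the four evidence categories.
  X = pd.DataFrame({
      "age": rng.normal(45, 15, n),                 # demographic
      "smear_positive": rng.integers(0, 2, n),      # microbiology / disease state
      "cavitation_pct": rng.normal(10, 5, n),       # X-ray
      "regimen_drugs": rng.integers(3, 7, n),       # treatment
  })
  y = (0.03 * X["age"] + 0.8 * X["smear_positive"]
       + 0.05 * X["cavitation_pct"] - 0.2 * X["regimen_drugs"]
       + rng.normal(0, 1, n) > 1.5).astype(int)
  # Mimic real-world missingness in a later-stage variable.
  X.loc[rng.random(n) < 0.3, "cavitation_pct"] = np.nan

  stages = [["age"],
            ["age", "smear_positive"],
            ["age", "smear_positive", "cavitation_pct"],
            ["age", "smear_positive", "cavitation_pct", "regimen_drugs"]]

  X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
  for cols in stages:
      model = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1,
                            eval_metric="logloss")  # NaNs handled natively by XGBoost
      model.fit(X_tr[cols], y_tr)
      auc = roc_auc_score(y_te, model.predict_proba(X_te[cols])[:, 1])
      print(f"stage with {len(cols)} feature group(s): AUC = {auc:.3f}")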



posters-tuesday-ETH: 3

The use of variable selection in clinical prediction modelling for binary outcomes: a systematic review

Xinrui Su, Gareth Ambler, Nathan Green, Menelaos Pavlou

Department of Statistical Science, University College London

Background

Clinical prediction models can serve as important tools, assisting in medical decision-making. Concise, accurate and interpretable models are more likely to be used in practice and hence an appropriate selection of predictor variables is viewed as essential. While many statistical methods for variable selection are available, data-driven selection of predictors has been criticised. For example, the use of variable selection with very low significance levels can lead to the exclusion of variables that may improve predictive ability. Hence, their use has been discouraged in prediction modelling. Instead, selection of predictors based on the literature and expert opinion is often recommended. Recent sample size guidelines also assume that predictors have been pre-specified, and no variable selection is performed. This systematic review aims to investigate current practice with respect to variable selection when developing clinical prediction models using logistic regression.

Methods

We focused on articles published in PubMed between 1 and 21 October 2024 that developed logistic prediction models for binary health outcomes. We extracted information on study characteristics and methodology.

Results

In total, 141 papers were included in the review. We found that almost all papers (140/141) used variable selection. Univariable selection (UVS) was by far the most commonly reported method; it was used alone or sequentially alongside other methods in 78% (110/141) of papers. It was followed by backward elimination (BE) (60/141, 43%), selection ‘with bulk removal’ (BR) from a single model (58/141, 41%), and LASSO (35/141, 25%). UVS and BE were frequently applied together (45/139, 32%), as were UVS and BR (43/139, 31%).

Conclusions

Despite criticisms regarding the uncritical use of data-driven variable selection methods, surprisingly, almost all studies in this review employed at least one such method to reduce the number of predictors, with many studies using multiple methods. Traditional methods such as UVS and BE, as well as more modern techniques such as LASSO, are still commonly used. In the pursuit of parsimonious as well as accurate risk models, model developers must be cautious when using methods based on significance testing, particularly with very low significance levels. However, methods such as LASSO, which directly aim to optimise out-of-sample predictive performance while also removing redundant predictors, may be promising and merit attention.



posters-tuesday-ETH: 4

Comparison of Methods for Incorporating Related Data when Developing Clinical Prediction Models: A Simulation Study

Haya Elayan, Matthew Sperrin, Glen Martin, David Jenkins

University of Manchester, United Kingdom

Background

Clinical Prediction Models (CPMs) are algorithms that compute an individual’s risk of a diagnostic or prognostic outcome, given a set of their predictors. Guidance states CPMs should be constructed using data directly sampled from the target population. However, researchers might also have access to additional and potentially related datasets (ancillary data) originating from different time points, countries, or healthcare settings, which could support model development, especially when the target dataset is small.

A critical consideration in this context is the potential heterogeneity between the target and ancillary datasets due to data distribution shifts. These occur when the distributions of predictors, event rates, or the relationships between predictors and outcome differ. Such shifts can negatively affect CPM performance in the target population. We aim to investigate in which situations, and using which methods, ancillary data should be incorporated when developing CPMs; specifically, whether the effectiveness of utilising the ancillary data is influenced by the heterogeneity between the available datasets and by their relative sample sizes.

Methods

We conducted a simulation study to assess the impact of these factors on CPM performance when ancillary data are available. Target and ancillary populations were generated with varying degrees of heterogeneity. CPMs were developed using naive logistic regression (developed on target data only), logistic and intercept recalibration updating methods (developed on ancillary data and updated to the target), and importance weighting using propensity scores (developed on all available data, weighting the ancillary samples by their similarity to the target). These models were then validated on independent data drawn from the same data-generating mechanism as the target population, using calibration, discrimination, and prediction stability metrics.
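
The importance-weighting idea can be sketched as follows (a schematic on simulated data with assumed variable names, not the study's simulation code): a membership model distinguishes target from ancillary records, and each ancillary record is weighted by its odds of belonging to the target population before the CPM is fitted on the pooled data.

  import numpy as np
  from sklearn.linear_model import LogisticRegression

  rng = np.random.default_rng(2)

  def simulate(n, intercept, beta):
      X = rng.normal(size=(n, 2))
      p = 1 / (1 + np.exp(-(intercept + X @ beta)))
      return X, rng.binomial(1, p)

  # Small target sample; larger ancillary sample with a shifted predictor-outcome association.
  X_t, y_t = simulate(300,  -1.0, np.array([0.8, 0.5]))
  X_a, y_a = simulate(3000, -0.6, np.array([0.5, 0.5]))

  # Membership (propensity) model: P(record comes from the target | X).
  X_all = np.vstack([X_t, X_a])
  member = np.concatenate([np.ones(len(X_t)), np.zeros(len(X_a))])
  prop = LogisticRegression().fit(X_all, member).predict_proba(X_a)[:, 1]

  # Importance weights: odds of target membership for ancillary rows, weight 1 for target rows.
  w = np.concatenate([np.ones(len(X_t)), prop / (1 - prop)])

  cpm = LogisticRegression()
  cpm.fit(X_all, np.concatenate([y_t, y_a]), sample_weight=w)
  print("weighted coefficients:", cpm.coef_.round(3))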

Results and Conclusion

Incorporating ancillary data consistently improved performance compared with using the target data only, especially when the target sample size was small. Both logistic and intercept recalibration improved performance over naive regression in most scenarios. However, the former showed greater variability in calibration slopes and more instability in calibration curves, while the latter performed worse on calibration slope under predictor-outcome association shift.

Importance weighting using propensity scores gave consistent results, with improved performance relative to the other methods in many scenarios, particularly under predictor-outcome association shift.

While this study investigates data distribution shifts that are known a priori, their presence and type in practical settings are often unknown. We therefore recommend the importance weighting method for its robustness and stability across varied scenarios.



posters-tuesday-ETH: 5

A Systematic Review of Methodological Research on Multi-State Prediction Models

Chantelle Cornett, Glen Martin, Alexander Pate, Victoria Palin

University of Manchester, United Kingdom

Background:
Prediction models use information about a person to predict their risk of disease. Across healthcare, patients transition between multiple states over time, such as health states or stages of disease progression. Here, multi-state models are crucial, but these models require additional methodological considerations and their application in prediction modelling remains scarce. The methodological state of play of these methods in a prediction context has not been summarised.

Objectives:
This systematic review aims to summarise and critically evaluate the methodological literature on multi-state models, with a focus on development and validation techniques.

Methods:
A comprehensive search strategy was implemented across PubMed, Scopus, Web of Science, and arXiv to identify methodological papers on multi-state models published up to 7th October 2024. Papers were included if they focused on methodological innovation, such as sample size determination, calibration, or novel computational methods; purely applied papers were excluded. Methodological details were extracted and summarised using thematic analysis.

Results:

The search identified 14,788 papers. After title and abstract screening, 443 papers proceeded to full-text screening, of which 299 were included.
Preliminary findings from these studies reveal that the majority of methodological research falls into the following groups:

  1. Techniques for estimating transition probabilities, state occupation time, and hazards.
  2. Hypothesis testing.
  3. Variable selection techniques.

This presentation will give an overview of the themes of methodological work, discuss the limitations and gaps in the methodological literature in this space, and outline areas for future work.

Conclusions:
Early results highlight progress in the methodological development of multi-state models and emphasise areas requiring further attention, such as more research into sample size and robust validation practices. The final results of this study aim to guide future research and support the adoption of best practices in the use of multi-state models.



posters-tuesday-ETH: 6

Assessing the robustness of prediction models: A case study on in-hospital mortality prediction using MIMIC-III and MIMIC-IV

Alan Balendran1, Raphaël Porcher1,2

1Université Paris Cité, Université Sorbonne Paris Nord, INSERM, INRAE, Centre for Research in Epidemiology and StatisticS (CRESS); 2Centre d’Épidémiologie Clinique, Assistance Publique-Hôpitaux de Paris, Hôtel-Dieu

Clinical prediction models have become increasingly prevalent due to the availability of large healthcare datasets. While these models often achieve strong predictive performance, their robustness, that is, their ability to remain stable under various perturbations, remains underexplored. Models may experience significant performance degradation when tested on perturbed data (e.g., noisy data or datasets collected at different time points). An understanding of a prediction model's robustness is therefore essential for ensuring reliable clinical decision-making.

Building on an existing framework that identified eight key robustness concepts in healthcare (Balendran, A., Beji, C., Bouvier, F. et al. A scoping review of robustness concepts for machine learning in healthcare. npj Digit. Med. 8, 38 (2025)), we evaluate the robustness of different machine learning models using real-world critical care data from intensive care unit (ICU) patients.

We utilise the MIMIC-III and MIMIC-IV critical care databases to predict in-hospital mortality based on patient data from their first 24 hours of ICU admission. The dataset includes vital signs, laboratory test results, and demographic information (Johnson, A., Pollard, T., Shen, L. et al. MIMIC-III, a freely accessible critical care database. Sci Data 3, 160035 (2016)). To develop prediction models, we explore a range of machine learning approaches, from linear models such as logistic regression and LASSO to more complex tree-based methods, including random forest and gradient boosting. Additionally, we assess deep learning models, including a multilayer perceptron (MLP) and the recently introduced transformer-based model TabPFN (Hollmann, N., Müller, S., Purucker, L. et al. Accurate predictions on small data with a tabular foundation model. Nature 637, 319–326 (2025)), which has been reported to outperform traditional gradient boosting techniques.

Each model is evaluated across multiple robustness concepts, including input perturbations, label noise, class imbalance, missing data, temporal validation, and subgroup analysis. To better reflect real-world clinical settings, we introduce varying levels of noise and test different scenarios for some concepts.
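
The input-perturbation check, for instance, can be sketched generically as below (a toy example with a logistic model on simulated data; the study itself spans several further robustness concepts and model classes): increasing Gaussian noise is added to the standardized test inputs and the degradation in discrimination is tracked.

  import numpy as np
  from sklearn.datasets import make_classification
  from sklearn.linear_model import LogisticRegression
  from sklearn.model_selection import train_test_split
  from sklearn.preprocessing import StandardScaler
  from sklearn.metrics import roc_auc_score

  X, y = make_classification(n_samples=4000, n_features=20, n_informative=8, random_state=0)
  X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

  scaler = StandardScaler().fit(X_tr)
  model = LogisticRegression(max_iter=1000).fit(scaler.transform(X_tr), y_tr)

  rng = np.random.default_rng(0)
  for sigma in [0.0, 0.1, 0.25, 0.5, 1.0]:          # noise levels on the standardized scale
      X_noisy = scaler.transform(X_te) + rng.normal(0, sigma, X_te.shape)
      auc = roc_auc_score(y_te, model.predict_proba(X_noisy)[:, 1])
      print(f"noise sd {sigma:.2f}: AUC = {auc:.3f}")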

Our findings demonstrate that no model is consistently robust across all concepts, with some models being particularly sensitive to specific perturbations. These results highlight that relying solely on standard performance metrics within a dataset does not account for potential deviations that can be encountered in real clinical settings. We advocate for robustness assessments as a crucial component of model evaluation and selection in healthcare.



posters-tuesday-ETH: 7

The Influence of Variable Selection Approaches on Prediction Model Stability in Low-Dimensional Data: From Traditional Stepwise Selection to Regularisation Techniques

Noraworn Jirattikanwong1, Phichayut Phinyo1, Pakpoom Wongyikul1, Natthanaphop Isaradech2, Wachiranun Sirikul2, Wuttipat Kiratipaisarl2

1Department of Biomedical Informatics and Clinical Epidemiology (BioCE), Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand; 2Department of Community Medicine, Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand

Introduction: Prediction model instability presents significant challenges to clinical decision-making and may lead to patient harm. While several factors, such as dataset size and algorithm choice, are known to affect stability, evidence on how specific modelling decisions, particularly variable selection methods, influence stability remains limited. This study examines the impact of different variable selection approaches on prediction stability and model performance.

Methods: The German HOPE dataset of 9,924 patients, previously used to develop an anxiety prediction model, was used. We generated three datasets of different sizes (0.5, 1, and 2 times the base size), where the base size was determined using Riley’s minimum sufficient sample size method. We defined 61 candidate parameters and replicated the model to predict anxiety using logistic regression. Six variable selection approaches were examined: (1) UNIVAR – univariate screening followed by backward elimination, (2) FULL – full model including all variables, (3) FORWARD – forward selection, (4) BACKWARD – backward elimination, (5) LASSO – least absolute shrinkage and selection operator, and (6) ELASTIC – elastic net. Model performance was evaluated in terms of discrimination and calibration. Optimism in performance metrics and the mean absolute prediction error (MAPE) were estimated using the bootstrap internal validation procedure proposed by Riley and Collins.
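
The MAPE instability measure referred to here can be understood, roughly, as the mean absolute difference between each individual's risk predicted by the original model and by models refitted on bootstrap resamples; a minimal sketch on simulated data, with an L1-penalised logistic model standing in for LASSO selection (not the study code):

  import numpy as np
  from sklearn.datasets import make_classification
  from sklearn.linear_model import LogisticRegression
  from sklearn.utils import resample

  X, y = make_classification(n_samples=600, n_features=30, n_informative=10, random_state=3)

  def fit(X, y):
      # L1-penalised logistic regression as a stand-in for LASSO variable selection.
      return LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)

  orig = fit(X, y)
  p_orig = orig.predict_proba(X)[:, 1]

  B = 200
  abs_diff = np.zeros((B, len(y)))
  for b in range(B):
      Xb, yb = resample(X, y, random_state=b)      # bootstrap resample of the development data
      p_b = fit(Xb, yb).predict_proba(X)[:, 1]     # bootstrap model applied to the original data
      abs_diff[b] = np.abs(p_b - p_orig)

  mape_per_individual = abs_diff.mean(axis=0)
  print("overall MAPE:", mape_per_individual.mean().round(4))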

Results: All variable selection approaches exhibited a similar level of discrimination at the base size and twice the base size. In contrast, at half the base size, both discrimination and calibration measures varied considerably. FULL achieved the highest discrimination in the smallest dataset but consistently displayed poor calibration across all sample sizes. Regularisation approaches (i.e., LASSO and ELASTIC) were well-calibrated across all dataset sizes, whereas traditional stepwise selection methods (i.e., UNIVAR, FORWARD, and BACKWARD) were only well-calibrated when the sample size was twice the base size. In terms of stability, both regularisation approaches had lower MAPE than others at the base size and twice the base size, while FULL showed lower MAPE at half the base size. All approaches required at least twice the minimum sufficient sample size to achieve a high level of individual stability.

Conclusion: Variable selection using regularisation is recommended, provided the sample size is sufficiently large. When sample sizes are around half the base size, regularisation approaches may still outperform other techniques in terms of stability and calibration. While FULL resulted in a modest improvement in stability, it exhibited significantly poorer calibration compared to UNIVAR, FORWARD, and BACKWARD.



posters-tuesday-ETH: 8

Early detection of high-risk patient profiles admitted to hospital with respiratory infections using a multistate model

João Pedro Carmezim1, Cristian Tebé1, Natàlia Pallarès1, Roger Paredes1, Cavan Reilly2

1Germans Trias i Pujol Research Institute and Hospital (IGTP), Spain; 2University of Minnesota

Background: This study aims to identify clinically relevant prognostic factors associated with oxygen support, death or hospital discharge in a global cohort of adult patients with Influenza or COVID-19 using a multistate model.

Methods: Data was drawn from a cohort of adult patients diagnosed with respiratory infections admitted to a hospital of the Strategies and Treatments for Respiratory Infections and Viral Emergencies (STRIVE) research group. The study evaluates socio-demographic factors, medical history, comorbidities, vaccination status, virus type and clinical symptoms as prognostic factors. The multistate model was defined with the following states: hospital admission, noninvasive ventilation, invasive ventilation, oxygen support discharge, hospital discharge and death. The model estimates cause-specific hazard ratios, cumulative hazards and transition probabilities.

Results: A total of 4,968 patients were included; the median age was 62.1 years and 47.9% were female. Noninvasive ventilation was required by 1,906 patients (38.4%), 277 (5.6%) required invasive ventilation, and 275 (5.5%) died. Demographic and clinical risk profiles revealed distinct progression pathways, and visualization using trajectory plots highlighted how risk factors influenced movement through disease states.

Conclusion: This study highlights the utility of a multistate model in mapping the progression of respiratory infections, providing critical insights into high-risk patient profiles. Transition probability trajectories provide clinicians with data to predict outcomes and, ideally, could help to plan resource allocation for these patients.



posters-tuesday-ETH: 9

Investigating fair data acquisition for risk prediction in resource-constrained settings

Ioanna Thoma1, Matthew Sperrin2, Karla Diaz Ordaz3, Ricardo Silva3, Brieuc Lehmann3

1The Alan Turing Institute, London, United Kingdom; 2Division of Informatics, Imaging & Data Sciences, The University of Manchester, Manchester, United Kingdom; 3Department of Statistical Science, University College London, London, United Kingdom

Introduction: Accurate risk prediction relies on robust clinical prediction models (CPMs), yet their reliability, generalisability, and fairness can be constrained by the available data. While additional covariates may improve risk prediction, collecting them for an entire population might not always be feasible due to resource constraints. For example, genetic testing can provide additional predictive power when combined with a clinical risk model, but a population-wide rollout may not be financially viable. A key question is how to allocate resources, prioritising the individuals who would benefit most from additional (genetic) testing. This framework optimises utility and fairness when choosing between a baseline prediction model and a more costly but potentially more informative augmented model.

Methods: We develop a framework that quantifies the potential benefit to fairness and accuracy of a CPM when assessing policies for acquiring additional information for a subset of individuals. A specific use case is deploying an integrated tool that combines a traditional CPM, based on clinical risk factors, with a polygenic risk score (PRS). The goal is to evaluate the utility gained from such data integration. This involves comparing the outcomes of a conventional CPM with those of an integrated tool to assess how risk categorisation shifts when genetic information is incorporated.

Results: We apply our methodology to cardiovascular disease (CVD) risk prediction in a UK Biobank cohort of 96,884 individuals aged 40-75. Transitions in risk classification help identify populations that benefit most from genetic score integration. Once these population subgroups have been identified, we define sub-sampling policies to determine which individuals should be selected based on their covariates and existing model uncertainty. We investigate deterministic and stochastic policies that also account for varying subgroup proportions, ensuring a representative and fair sample composition. The methodology identifies age and gender groups that experience the most significant shifts in risk classification when transitioning from the baseline to the integrated model.

Conclusion: This framework has the potential to guide future data collection strategies, helping to prioritise the population subgroups that need it most. While our application focuses on the evaluation of an integrated tool for CVD risk prediction, we expect the methodology to be broadly applicable and adaptable to a variety of predictive models across the disease spectrum.



posters-tuesday-ETH: 10

A critical benchmark of Bayesian shrinkage estimation for subgroup analysis

Sebastian Weber1, Björn Bornkamp1, David Ohlssen2

1Novartis Pharma AG, Switzerland; 2Novartis Pharmaceuticals, USA

The estimation of subgroup-specific treatment effects is known to be a statistically difficult problem. We suggest evaluating different estimation approaches using a benchmark based on scoring the predictive distribution of the subgroup treatment effect, using late-phase clinical trial data comprising normal, binary and time-to-event endpoints. Bayesian shrinkage estimation for subgroups is traditionally applied to non-overlapping subgroups using hierarchical models. This implies that several models need to be fitted to the same data set when several subgroup-defining variables are of interest. Recently, Wolbers et al. (2024) proposed using a single global regression model with priors such as the horseshoe prior to induce shrinkage. This method has the benefit that there is no need to create a disjoint space of subgroups; overlapping subgroups can thus be investigated with a single model, avoiding the need to refit a given data set multiple times. We will compare the performance of different shrinkage approaches on a real-data benchmark. The evaluated approaches include no shrinkage and full shrinkage towards the overall treatment effect, Bayesian hierarchical shrinkage, and more novel priors such as the global-model R2D2 prior proposed by Zhang et al. (2020).
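
Schematically, the traditional hierarchical approach for G non-overlapping subgroups can be written as (our notation, not taken from the abstract)

  \hat{\theta}_g \mid \theta_g \sim N(\theta_g, s_g^2), \qquad \theta_g \sim N(\mu, \tau^2), \quad g = 1, \dots, G,

where τ governs the degree of shrinkage of the subgroup effects θ_g towards the overall effect μ (τ → 0 corresponds to full shrinkage, τ → ∞ to none), whereas the global-model alternative instead places sparsity-inducing priors (e.g. horseshoe or R2D2) on subgroup-by-treatment interaction terms in a single regression.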



posters-tuesday-ETH: 11

Mathematical Modelling of Oxygenation Dynamics Using High-Resolution Perfusion Data: An Advanced Statistical Framework for Understanding Oxygen Metabolism

Mansour Taghavi Azar Sharabiani1, Alireza Mahani2, Richard Issitt3, Yadav Srinivasan4, Alex Bottle1, Serban Stoica5

1School of Public Health, Imperial College London, United Kingdom; 2Statman Solution Ltd, United Kingdom; 3Perfusion Department, Great Ormond Street Hospital for Children, London, United Kingdom; 4Cardiac Surgery Department, Great Ormond Street Hospital for Children, London, United Kingdom; 5Cardiac Surgery Department, Bristol Royal Children’s Hospital, Bristol, United Kingdom

Background
Balancing oxygen supply and demand during cardiopulmonary bypass (CPB) is crucial to minimising adverse outcomes. Oxygen supply is determined by the cardiac index (CI), haemoglobin concentration (Hb), and arterial oxygen saturation (SaO₂), whereas oxygen demand is driven by metabolism, which itself depends on body temperature (Temp). Actual oxygen consumption is governed by the oxygen extraction ratio (OER), which dynamically adapts to changes in oxygen supply and demand, yet the mechanisms of this adaptation remain poorly understood. We developed GARIX and eGARIX, which mathematically extend classical time-series models to incorporate nonlinear dependencies, patient-specific variability and minute-by-minute OER dynamics.

Methods
GARIX is a time-series model that integrates exogenous variables (CI, Hb, SaO₂, Temp) with a disequilibrium term representing the imbalance between oxygen consumption and temperature-dependent oxygen demand, initially modelled via a constant Q₁₀ framework (van’t Hoff model). The model was trained on intraoperative data from 343 CPB operations (20,000 minutes) in 334 paediatric patients at a UK centre (2019–2021). eGARIX extends GARIX by relaxing the assumption of constant Q₁₀, introducing nonparametric temperature dependence (splines) and incorporating age, weight, and their interaction. Subgroup analyses explored OER responses across different age groups.
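
The constant-Q₁₀ (van't Hoff) demand term has the standard form (our notation; the reference temperature T_ref is illustrative)

  \dot{V}O_2^{\mathrm{demand}}(T) = \dot{V}O_2(T_{\mathrm{ref}}) \cdot Q_{10}^{(T - T_{\mathrm{ref}})/10},

so that demand changes by a factor of Q₁₀ for every 10 °C; with Q₁₀ ≈ 2.25 it doubles roughly every 10 · ln 2 / ln 2.25 ≈ 8.5 °C, consistent with the equilibrium estimate reported in the Results.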

Results
GARIX identified that OER adapts in a two-phase process: a rapid adjustment phase (<10 minutes) and a slower phase lasting several hours. Equilibrium analysis estimated Q₁₀ ≈ 2.25, indicating that oxygen demand doubles with every 8.5°C temperature increase. eGARIX showed that indexed oxygen demand follows a nonlinear trajectory with age and weight, peaking at 3 years of age. In neonates and infants, oxygen demand correlated positively with weight, whereas in adolescents the correlation was negative. Additionally, temperature dependence deviated from the classical Q₁₀ assumption, showing low sensitivity at mild hypothermia and high sensitivity at deep hypothermia. Younger patients exhibited a diminished OER response to Hb changes compared with older children.

Conclusions
The proposed GARIX and eGARIX models are mathematical extensions of classical time-series modelling, enabling a data-driven approach to studying oxygen metabolism during CPB. By harnessing the vast amounts of recently available high-resolution perfusion data, these models compensate for the ethical limitations of direct human experimentation, providing a powerful framework to refine intraoperative oxygenation strategies. Our findings highlight the importance of advanced mathematical modelling in optimising personalised oxygen delivery strategies, adapting to individual patient characteristics, and enhancing our understanding of oxygen metabolism in paediatric CPB.



posters-tuesday-ETH: 12

Marginal structural Cox model with weighted cumulative exposure modelling for the estimation of counterfactual Population Attributable Fractions

Yue Zhai1, Ana-Maria Vilcu2, Jacques Benichou2,3, Lucas Morin2, Agnès Fournier4, Anne Thiébaut2, Vivian Viallon1

1Nutrition and Metabolism Branch, International Agency for Research on Cancer (IARC), Lyon, France; 2High Dimensional Biostatistics for Drug Safety and Genomics Team, Université Paris-Saclay, UVSQ, Inserm, CESP, Villejuif, France; 3Department of Biostatistics, Rouen University Hospital, Rouen, France; 4Exposome and Heredity Team, CESP U1018, Université Paris-Saclay, UVSQ, Inserm, Gustave Roussy, Villejuif, France

Introduction: Marginal structural Cox models (Cox MSMs) have become popular for estimating the causal effect of time-varying exposures on a time-to-event outcome, accounting for time-varying confounders affected by prior exposure levels. They can be combined with the weighted cumulative exposure (WCE) method to flexibly model the causal effect of past levels of the exposure on the hazard rate. This study evaluated, through extensive simulations, the performance of the corresponding approach (Cox WCE MSM), based on regression B-splines, for the estimation of population attributable fractions (PAF).

Method: Independent samples of 10,000 and 1,000 individuals, each with 100 regular visits of follow-up, were generated. In each sample, approximately 50% of individuals experienced the event of interest before the end of follow-up. For a given hazard ratio comparing “always exposed” to “never exposed”, we considered four scenarios with different standardized weight functions reflecting how past exposure causally influences the current hazard rate as time elapses since exposure: i) monotonically decreasing weight; ii) bell-shaped weight; iii) constant weight; iv) current exposure only. The estimands of interest were the PAF and the causal effect function of past exposure. Various versions of the Cox WCE MSM were implemented to assess the influence of parameters such as the number of knots and the length of the time window. Additionally, we implemented two versions of the Cox MSM accounting for only current exposure and for unweighted cumulative exposure, respectively. The Akaike information criterion (AIC) and Bayesian information criterion (BIC) were used for model selection.
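
Schematically, the WCE term enters the (marginal structural) Cox model as (our notation; the weight function w is expanded on a B-spline basis whose coefficients are estimated from the data, and the MSM is typically fitted with inverse-probability weights)

  \lambda\bigl(t \mid \bar{X}(t)\bigr) = \lambda_0(t)\, \exp\bigl\{\beta\, \mathrm{WCE}(t)\bigr\},
  \qquad
  \mathrm{WCE}(t) = \sum_{u \le t} w(t - u)\, X(u).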

Results: PAF estimates produced by most Cox WCE MSMs were unbiased in scenarios i to iii, but biased in scenario iv. The variance of Cox WCE MSMs was comparable to that of conventional Cox MSMs; notably, increasing the number of knots had little effect on variance. Models selected via either AIC or BIC provided unbiased PAF estimates across all scenarios. As for the causal effect of past exposure, although average estimates provided by Cox WCE MSMs were generally close to the true function, we observed large variation across samples, especially with smaller samples and weaker effects.

Conclusion: Overall, Cox WCE MSMs selected by either AIC or BIC yielded unbiased estimates of the counterfactual PAF. To ensure robust model selection, we recommend also including the conventional Cox MSMs that account for current and unweighted cumulative exposure in the model selection process.



posters-tuesday-ETH: 13

Lost in the Forest of Forest Plots? Practical Guidelines and an All-in-One Tool for Forest Plots

Hongqiu Gu, Yong Jiang, Hao Li

Beijing Tiantan Hospital, Capital Medical University, People's Republic of China

Background: Forest plots are indispensable visualization tools in meta-analyses and other contexts of medical research, yet existing guidelines and implementation tools are often fragmented and lack a cohesive framework. In this study, we aimed to develop comprehensive guidelines and integrated tools to extend the applicability of forest plots across a wider range of research contexts.

Methods: In consultation with a thorough review of existing literature and guidelines, combined with practical experience, we synthesized and developed a comprehensive classification system for forest plots driven by analysis methods. Additionally, we proposed key principles for their construction and created a versatile SAS macro to facilitate more effective application and communication of forest plots across various research scenarios.

Results: We categorized forest plots into four main types, corresponding to regression analysis, subgroup analysis, estimation analysis, and meta-analysis, across 11 scenarios independent of study design. The key principles for creating effective forest plots include providing comprehensive data, arranging items logically, ensuring accurate scaling, and applying aesthetic formatting. Furthermore, we developed versatile, integrated SAS tools that align with the proposed framework and principles.

Conclusion: This guideline provides a versatile, integrated solution for applying forest plots across various research contexts. It is expected to lead to improved use and visualization of forest plots.



posters-tuesday-ETH: 14

Robust Outlier Detection with Skewness-Adjusted Fences: Theoretical Foundations and Applications

Yunchae Jung, Minsu Park

Department of Statistics and Data Science, Chungnam National University, Republic of Korea

Outlier detection plays a crucial role in statistical analysis by ensuring data integrity and improving the reliability of inferences. Traditional methods, such as Tukey’s boxplot, often struggle with skewed distributions, leading to inaccurate detection and potential misinterpretation of results. While approaches like the adjusted boxplot (Hubert and Vandervieren, 2008) provide some improvements, they can be computationally demanding and less effective under extreme skewness.

In this study, we present an outlier detection framework that incorporates a skewness-adjusted fence into an enhanced boxplot design. By utilizing a robust skewness measure based on the median absolute deviation, this method addresses key limitations of existing approaches, offering a computationally efficient and statistically reliable alternative for skewed distributions. Simulation studies and real-world applications demonstrate that the proposed method consistently improves detection accuracy while maintaining efficiency.

Additionally, we extend this approach to time-dependent data, showing its effectiveness in identifying outliers in time series settings. This extension makes the method applicable to a wide range of fields, including finance, healthcare, and environmental monitoring, where detecting anomalies in structured and evolving datasets is essential.

Keywords: Robust outlier detection, Skewness-adjusted boxplot, Influence function, Median absolute deviation



posters-tuesday-ETH: 15

Minimum Area Confidence Set Optimality for Simultaneous Confidence Bands for Percentiles in Linear Regression: An Application to Estimating Shelf Life

Lingjiao Wang1, Yang Han1, Wei Liu2, Frank Bretz3,4

1Department of Mathematics, University of Manchester, UK; 2School of Mathematical Sciences and Southampton Statistical Sciences Research Institute, University of Southampton, UK; 3Novartis Pharma AG, Basel, Switzerland; 4Section for Medical Statistics, Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Austria

Background: The stability of a drug product over time is a critical property in pharmaceutical development. A key objective in drug stability studies is to estimate the shelf-life of a drug, which involves a suitable definition of the true shelf-life and the construction of an appropriate estimate of it. Simultaneous confidence bands (SCBs) for percentiles in linear regression are valuable tools for determining drug shelf-life in drug stability studies.
Methods: In this paper, we propose a novel criterion, the Minimum Area Confidence Set (MACS), for identifying the optimal SCB for percentile regression lines. This criterion focuses on the area of the constrained regions for the newly proposed pivotal quantities, which are generated from the confidence set for the unknown parameters of an SCB. We employ the new pivotal quantities to construct exact SCBs over any finite covariate interval and use the MACS criterion to compare several SCBs of different forms. Additionally, we introduce a computationally efficient method for calculating the critical constants of exact SCBs for percentile regression lines.
Results: The optimal SCB under the MACS criterion is demonstrated to effectively construct interval estimates of the true shelf-life. The proposed method for calculating critical constants significantly improves computational efficiency. A real-world drug stability dataset is used to illustrate the application and advantages of the proposed approach.



posters-tuesday-ETH: 16

One-sided simultaneous tolerance intervals based on kernel density estimates

Gian Louisse Roy

University of the Philippines Diliman

Tolerance intervals are informative tools with wide-ranging applications in various fields, especially in laboratory medicine. They are valuable in medical decision making as they contain a specified proportion of values of the sampled population with a high degree of confidence. When several biochemical analytes are measured from patients, simultaneous inference becomes useful. This study proposes nonparametric methods for constructing simultaneous tolerance intervals (STIs) in the one-sided case. As most medical data show skewness and come from unknown underlying distributions, the proposed STIs are based on kernel density estimates. The methodologies are evaluated by examining performance metrics, such as estimated coverage probabilities and expected lengths, and by comparing them with the usual Bonferroni-correction approach (BCA). The proposed methods give accurate results, as these metrics exhibit desirable patterns, with a few exceptions that are further examined and justified. These methods also address a spurious behaviour that BCA results tend to display. The proposed one-sided nonparametric STIs are generally more favourable than those from the BCA and can be improved through the recommended future work laid out.



posters-tuesday-ETH: 17

Robust large-scale multiple testing for hidden Markov random field model

Donghwan Lee1, Jiyn Sun2

1Department of Statistics, Ewha Womans University, Republic of Korea; 2Integrated Biostatistics Branch, Division of Cancer Data Science, National Cancer Center, Republic of Korea

The hidden Markov random field (HMRF) model, as an effective model for describing the local dependence of two- or three-dimensional image data, has been successfully applied to large-scale multiple testing of correlated data, image segmentation, graph discovery, and so on. Given the unobservable random field, the emission probability (the conditional distribution of the observations) is usually assumed to be known, and the Gaussian distribution is frequently used. To achieve robustness, we introduce a novel framework for large-scale multiple testing when the emission probability distribution of the HMRF is unknown or misspecified. We build an inferential procedure for estimating the parameters and the false discovery rate (FDR) based on a quadratically convergent method for computing non-parametric maximum likelihood estimates of a mixing distribution. Furthermore, we integrate latent variable modeling with the knockoff filter method to improve FDR control in testing. The proposed method is validated by simulation studies, which show that it outperforms existing methods in terms of FDR validity and power. A real-data neuroimaging example is presented to demonstrate the utility of the proposed procedure.



posters-tuesday-ETH: 18

Model informed assurance approach for 3-way PK similarity studies

Rachid El Galta, Roland Baumgartner

Sandoz, Germany

In the absence of actual data, published pharmacokinetic (PK) models can be used to simulate subjects' PK profiles and estimate geometric mean ratios and coefficients of variation for parameters like AUC and Cmax. These estimates can then inform sample size calculations for PK similarity studies. However, their accuracy depends on the quality of the PK model and its input parameters, and ignoring this uncertainty can lead to underpowered studies.

To address this, we use an assurance approach alongside power calculations. This involves simulating PK profiles with a published PK model, considering parameter uncertainty by sampling from a multivariate normal distribution. We generate multiple parameter sets, simulate PK profiles by treatment arm for each, and perform equivalence testing. Assurance is the proportion of successful equivalence tests.
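
A stripped-down sketch of the assurance loop is given below (toy numbers throughout, a two-arm comparison on the log-AUC scale rather than the full three-way PK-profile simulation, and hypothetical parameter estimates standing in for those propagated from a published PK model):

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(4)

  # Uncertainty on model-derived quantities: log geometric mean ratio (GMR) and
  # SD of log(AUC). Toy means and covariance only.
  theta_hat = np.array([np.log(1.02), 0.25])        # [log GMR, sd of log AUC]
  cov_theta = np.diag([0.01**2, 0.03**2])

  n_per_arm, n_sim = 30, 2000
  lo, hi = np.log(0.80), np.log(1.25)               # standard equivalence margins
  success = 0
  for _ in range(n_sim):
      log_gmr, sd = rng.multivariate_normal(theta_hat, cov_theta)
      # Simulate one parallel-group study on the log scale and run a TOST-style check
      # via the 90% CI for the difference in means.
      test = rng.normal(log_gmr, sd, n_per_arm)
      ref = rng.normal(0.0, sd, n_per_arm)
      diff = test.mean() - ref.mean()
      se = np.sqrt(test.var(ddof=1) / n_per_arm + ref.var(ddof=1) / n_per_arm)
      t_crit = stats.t.ppf(0.95, df=2 * n_per_arm - 2)
      if lo < diff - t_crit * se and diff + t_crit * se < hi:
          success += 1

  print("assurance (proportion of successful equivalence tests):", success / n_sim)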

Combining assurance with traditional power calculations provides a more comprehensive assessment of sample size considerations.



posters-tuesday-ETH: 19

Korea Sequence Read Archive (KRA) - A public repository for archiving raw sequence data

JAEHO LEE

KRIBB, Korea, Republic of (South Korea)

The Korea Sequence Read Archive (KRA; https://kbds.re.kr/KRA) is a publicly available repository of high-throughput sequencing data and part of the Korea BioData Station (K-BDS; https://kbds.re.kr/) database. KRA collects and provides key nucleotide sequence data, including files in FASTQ or FASTA format and rich metadata generated by various NGS technologies. The primary objective of the KRA is to support and promote the use of nucleotide sequencing as an experimental research platform. It achieves this by offering comprehensive services for data submission, archiving, searching, and downloading. Recently, the existing collaboration with DDBJ has been further strengthened to establish close cooperation with the INSDC. As a result, KRA now supports data submission to the INSDC via DDBJ DRA, and through enhanced browser functionalities, users can search and download data more efficiently. By ensuring the long-term preservation and accessibility of nucleotide sequence data, and through continuous development and improvement, KRA remains an important resource for researchers utilizing nucleotide sequence analysis data. KRA is available at https://kbds.re.kr/KRA.



posters-tuesday-ETH: 20

Integrative analysis of transcriptomic and epigenomic dynamics of liver organoids using single cell RNA-seq and ATAC-seq

Kwang Hoon Cho, Jong-Hwan Kim, Jimin Kim, Jahyun Yun, Dayeon Kang

Korea Research Institute of Bioscience and Biotechnology, Korea, Republic of (South Korea)

Previously, we developed a novel method to generate functionally mature human hepatic organoids derived from pluripotent stem cells (PSCs), and their maturation was validated through bulk RNA sequencing (RNA-seq). In this study, we aimed to characterize the heterogeneity and dynamic changes in the transcriptome and epigenome at the single-cell level. To achieve this, we employed single-cell RNA sequencing (scRNA-seq) and single-cell ATAC sequencing (scATAC-seq) using the 10x Chromium platform.

Hepatic organoids were cultured under two distinct medium conditions: hepatic medium (HM) and differentiation medium (DM). A total of 39,310 and 36,940 individual cells were analyzed using scRNA-seq and scATAC-seq, respectively. To validate our findings, we compared our data with publicly available RNA-seq datasets from liver organoids and liver tissues at various stages of differentiation, including induced pluripotent stem cells (iPSCs), DM-treated cells, primary human hepatocytes (PHHs), and adult liver tissues.

Our analysis revealed that cells clustered into 10 to 11 distinct subpopulations, representing different developmental stages in both scRNA-seq and scATAC-seq datasets. Furthermore, integrative analysis of scRNA-seq and scATAC-seq data identified coordinated changes in gene expression and chromatin accessibility near key liver differentiation marker genes. These findings indicate that hepatic organoids cultured under HM and DM conditions consist of heterogeneous cell populations spanning multiple stages of hepatic differentiation.

In conclusion, single-cell transcriptomic and epigenomic profiling provided insights into the cellular diversity and developmental trajectory within hepatic organoids. This study highlights the utility of scRNA-seq and scATAC-seq in elucidating the molecular dynamics underlying liver differentiation and maturation.



posters-tuesday-ETH: 21

Leveraging tumor imaging compositional data structure in model feature space for predicting recurrence in colorectal carcinoma

Olivia J Bobek, Nicholas Larson, Rish K Pai, Fang-Shu Ou

Mayo Clinic, United States of America

Background/Introduction:

The quantitative segmentation algorithm QuantCRC extracts morphologic features of digitized H&E slides in colorectal carcinoma (CRC), quantitatively decomposing the tumor bed area into stroma and stromal subtypes, necrosis, and tumor components. These features have previously been incorporated as linear predictors in a LASSO regularized regression model for cancer recurrence in a cancer registry study. However, as compositional data, representing these features as simple proportions may not maximize their informativeness for prediction. Likewise, algorithms based on linear predictors may fail to account for more complex relationships between compositional features and outcome. The objective of this research was to investigate how commonly used log-ratio transformations for compositional data impact QuantCRC-based prognostic modeling performance as well as assess competing machine learning algorithms that may offer benefits for compositional feature spaces.

Methods:

The study cohort consisted of 2,411 CRC patients from the Colon Cancer Family Registry. The outcome of interest was recurrence-free survival, measured as time from surgery to recurrence or last follow-up. The original LASSO model included 15 QuantCRC features, tumor stage (I-IV) and mismatch repair status (deficient vs. proficient). The proposed model feature space included the additive log-ratio transformations of the compositional variables in addition to the clinical variables, yielding 34 features in total. In addition to LASSO, elastic net and gradient boosting machine (GBM) algorithms were also applied using the log-ratio feature set. Training was performed using 10-fold cross-validation on 80% (n=1928) of the data and testing on the remaining 20% (n=483). Harrell’s C-index was used to assess discrimination.
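
The additive log-ratio (alr) transform applied to the compositional features is simple to state; a small sketch with invented component names and a pragmatic offset for zero proportions (not the study code):

  import numpy as np
  import pandas as pd

  # Toy tumor-bed composition: proportions summing to 1 per slide.
  comp = pd.DataFrame({
      "tumor":    [0.52, 0.40, 0.61],
      "stroma":   [0.30, 0.45, 0.24],
      "necrosis": [0.18, 0.15, 0.15],
  })

  def alr(df, reference, eps=1e-6):
      # Additive log-ratio transform: log(x_j / x_ref) for every non-reference part.
      x = df.clip(lower=eps)                        # crude guard against zero proportions
      out = np.log(x.drop(columns=reference).div(x[reference], axis=0))
      return out.add_suffix(f"_alr_vs_{reference}")

  features = alr(comp, reference="tumor")
  print(features.round(3))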

Results:

On the training set, the original LASSO produced a Harrell’s C-index of 0.697 (bootstrapped 95% confidence interval (CI): 0.672, 0.723) and the LASSO with log-ratio features produced a C-index of 0.703 (95% CI: 0.679, 0.729). The C-indices for the elastic net and GBM were 0.704 (95% CI: 0.677, 0.731) and 0.719 (95% CI: 0.692, 0.744), respectively. In the test data, the LASSO with the log-ratio transformation produced a slightly improved C-index of 0.701 (95% CI: 0.650, 0.746) compared with the original features (0.697; 95% CI: 0.646, 0.743). The elastic net resulted in a C-index of 0.703 (95% CI: 0.653, 0.749) and the GBM produced a C-index of 0.702 (95% CI: 0.647, 0.751).

Conclusion:

The additive log-ratio transformation is a compositional data representation to consider for predictive models. In this application, feature engineering based on compositional structure slightly improved model performance. All algorithms with compositional data features demonstrated comparable model discrimination.



posters-tuesday-ETH: 22

BayesPIM: A Bayesian Prevalence-Incidence Mixture Model for Screening Outcomes, with an Application to Colorectal Cancer

Thomas Klausch, Birgit Lissenberg-Witte, Veerle Coupé

Amsterdam University Medical Center

Background

Screening programs for diseases such as colorectal cancer (CRC) involve inviting individuals at regular or irregular intervals for a test, such as the fecal immunochemical test (FIT) or a colonoscopy. The resulting data can be analyzed to obtain the time to (pre-state) disease which, when additionally regressed on covariates such as age and gender, is informative about risk heterogeneity. Such information helps decide whether screening intervals should be personalized to identified risk factors.

We present the R package BayesPIM – a Bayesian prevalence-incidence mixture model – which is particularly suited to settings where individuals are tested periodically (interval censoring), may already have the disease at baseline (prevalence), baseline tests may be missing, and the screening test has imperfect sensitivity. We motivate the model using data from high-risk familial CRC surveillance through colonoscopy, where adenomas, precursors of CRC, are the primary target of screening. Besides demonstrating the functionalities of BayesPIM, we also show how to evaluate model performance using simulations based on the real-world CRC data.

Methods

BayesPIM models the interval-censored time to incidence via an accelerated failure time model while handling latent prevalence, imperfect test sensitivity, and covariate data. Internally, a Metropolis-within-Gibbs sampler and data augmentation are used, implemented through an Rcpp backend, and a user-friendly R interface is available. Model fit can be assessed using information criteria and validated against a non-parametric estimator of cumulative incidence.

Additionally, performance is evaluated by resampling the real-world CRC screening data. Specifically, we set the data-generating model parameters to their estimates and then generate screening times and outcomes that closely resemble those observed in practice via an innovative algorithm. Repeatedly comparing estimates on these resampled datasets to the true values assesses model performance under realistic data conditions.

Results

In the CRC application, baseline prevalence of adenomas was estimated at 27.4% [95% CI: 22.2%, 33.3%], with higher prevalence in males and older individuals. Among those free of adenoma at baseline, incidence reached 20% at five years and 45% at ten years, with older individuals experiencing faster incidence. Resampling simulations based on the CRC data showed that model estimation remained stable if informative priors on test sensitivity were imposed, even at low sensitivity (40%).

Conclusion

BayesPIM offers robust estimation of both prevalence and incidence under complex, real-world screening conditions, including uncertain test sensitivity, latent disease status, and irregular intervals. The model demonstrated stable performance under varying test sensitivities, highlighting its practical value for designing more effective, patient-centered screening programs.



posters-tuesday-ETH: 23

Joint Modelling of Random Heterogeneity in Longitudinal and Multiple time-to-Events in Colon Cancer

DIVYA DENNIS, Jagathnath Krishna KM

Regional Cancer Centre, Thiruvananthapuram, Kerala, India

Background: In cancer survival studies, disease progression can be assessed with longitudinal study designs in which patients are observed over time and covariate information (biomarkers such as carcinoembryonic antigen, CEA) is measured repeatedly during the follow-up period. Apart from repeatedly measured covariates, multiple survival outcomes are observed longitudinally, and there may be unobserved random heterogeneity between the survival outcomes. This motivated the derivation of a joint multi-state frailty model (JMFM) capable of predicting the risk of multiple time-to-events simultaneously, utilizing dynamic predictors and a random heterogeneity factor, the frailty. The frailty variable was assumed to follow a gamma distribution, giving the joint multi-state gamma frailty model (JMGFM).

Methodology: To account for heterogeneity, the longitudinal outcome and multiple time-to-events, we derived the JMGFM. The longitudinal sub-model was specified as a linear mixed model and the survival sub-model as a multi-state gamma frailty model (MGFM); a latent variable links the longitudinal and multiple time-to-event sub-models. Parameters were estimated by maximum likelihood. The existing MGFM and the proposed model were illustrated using colon cancer patient data. The covariates considered for risk prediction were composite stage, lymph node involvement, T4, age, sex, PNI and LVE, with CEA as the longitudinal outcome.
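
Schematically (our notation, a simplified version of the model described above), the two linked sub-models can be written as

  y_i(t) = m_i(t) + \varepsilon_i(t), \qquad m_i(t) = \mathbf{x}_i^{\top}(t)\,\boldsymbol{\beta} + \mathbf{z}_i^{\top}(t)\,\mathbf{b}_i,
  \lambda_{i,kl}(t) = u_i\, \lambda_{0,kl}(t)\, \exp\bigl\{\boldsymbol{\gamma}_{kl}^{\top} \mathbf{w}_i + \eta_{kl}\, m_i(t)\bigr\}, \qquad u_i \sim \mathrm{Gamma}(1/\theta,\, 1/\theta),

where m_i(t) is the error-free CEA trajectory from the linear mixed sub-model, λ_{i,kl}(t) is the hazard of the transition from state k to state l, u_i is the shared gamma frailty capturing unobserved heterogeneity, and η_{kl} quantifies the association between the longitudinal marker and each transition.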

Results: The frailty coefficient had a significant impact on predicting the risk at each transition state, alongside the longitudinally measured covariate, so the JMGFM was found to be more predictive than the MGFM. The JMGFM is also capable of providing simultaneous dynamic risk prediction, which the MGFM cannot. Based on the proposed JMGFM, PNI (transition from diagnosis to death), composite stage (transitions from recurrence to death, from metastasis to death and from recurrence to metastasis), lymph-node involvement and age, along with the longitudinally measured CEA value, were identified as significant prognostic factors for predicting the multiple time-to-events. For each transition state, the longitudinal observation (CEA) showed a strong association with the corresponding survival events (η ranging from 1.3 to 1.5).

Conclusion: We conclude that the joint multi-state frailty model is the better model for simultaneous dynamic risk prediction of multiple events in the presence of random heterogeneity in longitudinal study designs.

Keywords: multi-state model, joint multi-state model, joint multi-state frailty model, longitudinal sub-model, colon cancer



posters-tuesday-ETH: 24

Refining the Association between BMI, Waist Circumference, and Breast Cancer Risk in Postmenopausal Women using G-formula Method

Somin Jeon1,2, Boyoung Park1,3, Junghyun Yoon1,2

1Department of Preventive Medicine, Hanyang University College of Medicine, Seoul, Republic of Korea; 2Institute for Health and Society, Hanyang University, Seoul, South Korea; 3Hanyang Institute of Bioscience and Biotechnology, Hanyang University, Seoul, Republic of Korea

Purpose. Previous studies have shown an increased risk of postmenopausal breast cancer (BC) in obese women. However, these studies did not focus on longitudinal changes in obesity levels and did not account for time-varying covariates. This study applies the g-formula method to assess how changes in BMI and waist circumference are associated with subsequent BC risk.

Methods. Data were obtained from the Korean National Health Insurance Database. We utilized data from the national BC screening program, with baseline data including women who underwent screening in 2009-2010. Screening information from the subsequent biennial cycles (2011-2012 until 2019-2020) was examined, and only women who were postmenopausal at baseline and had at least three screenings were included in the analysis. Incident BC cases were ascertained until 2021. We applied the g-formula method to compare BC risk in women who maintained a certain BMI/waist circumference level versus the natural course. Hazard ratios (HRs) were estimated, and the model was adjusted for age, fixed covariates, and time-varying covariates.
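
For a sustained intervention that fixes the exposure (BMI or waist circumference category) at \bar{a} over the biennial cycles, the discrete-time parametric g-formula expresses the counterfactual risk by time t as (standard notation, not taken from the abstract)

  \Pr\bigl(T \le t \mid \mathrm{do}(\bar{a})\bigr)
  = \sum_{k=1}^{t} \sum_{\bar{l}_k} h_k(\bar{a}, \bar{l}_k)
    \prod_{j=1}^{k-1} \bigl\{1 - h_j(\bar{a}, \bar{l}_j)\bigr\}
    \prod_{j=1}^{k} f\bigl(l_j \mid \bar{l}_{j-1}, \bar{a}, T > j-1\bigr),

where h_k is the discrete-time hazard of BC at cycle k given exposure and covariate history and f is the conditional distribution of the time-varying covariates; the natural course is obtained by replacing the fixed \bar{a} with the observed exposure distribution, and the reported HRs contrast these regimes.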

Results. Of the 91,092 postmenopausal women, the mean (SD) age was 60.7 (7.5) years, and the mean (SD) BMI and waist circumference were 24.3 (3.1) kg/m² and 79.9 (8.1) cm, respectively. Results from the g-formula show that, compared to women with a natural course of BMI, those who maintained a normal BMI (<23) or an overweight BMI (23 to <25) had a decreased BC risk (adjusted hazard ratio [aHR] 0.93, 95% CI 0.90-0.95, and aHR 0.97, 95% CI 0.96-0.98, respectively). In contrast, those who maintained obese status had an increased BC risk (obese 1, BMI 25 to <27.5, aHR 1.07; obese 2, BMI ≥27.5, aHR 1.20). A similar pattern was observed for waist circumference.

Conclusions. Results from the g-formula indicate that maintaining a normal BMI or waist circumference is associated with a lower BC risk, while obese women are at an increased risk of postmenopausal breast cancer.

Acknowledgments: This study was funded by the National Research Foundation of Korea (NRF) (grant no. RS-2023-00241942, RS-2024-00462658, and 2021R1A2C1011958).



posters-tuesday-ETH: 25

Building cancer risk prediction models by synthesizing national registry and prevention trial data

Oksana Chernova1, Donna Ankerst1,2, Ruth M Pfeiffer3

1Technical University of Munich, Germany; 2Department of Urology, University of Texas Health Science Center at San Antonio, USA; 3Biostatistics Branch, National Cancer Institute, NIH, HHS, Bethesda, USA

Current online United States (US) five-year prostate cancer risk calculators are based on screening trials or databases that are not calibrated to the heterogeneous US population. They are underpowered for the rarer outcome of high-grade disease, particularly for the subpopulation of African Americans, who are underrepresented in many national trials. The US National Cancer Institute (NCI) Surveillance, Epidemiology, and End Results (SEER) program has monitored state cancer rates since 1973, more recently adding Gleason grade. SEER rates are stratified by five-year age groups and race, filling statistical power gaps for African Americans. This talk presents the statistical method for integrating SEER incidence and mortality rates with competing-risks time-to-event data from prevention and screening trials, following the approach of the NCI Breast Cancer Risk Assessment Tool. The methodology allows development of a contemporary 5-year high-grade prostate cancer risk prediction model trained by merging individual-participant data from the Selenium and Vitamin E Cancer Prevention Trial (SELECT) with population-aggregated data from SEER. Simulation of a contemporary US validation set is performed by merging individual-level data from the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO) with SEER.
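For intuition only, the core calibration step in absolute-risk tools of this type can be sketched as anchoring trial-based relative risks to registry incidence rates while accounting for competing mortality. All rates, relative risks and the discrete-time approximation below are illustrative assumptions, not the presented methodology.

```python
# Schematic sketch of registry-calibrated absolute risk with competing mortality.
import numpy as np

# one-year age bands (ages 55-59); made-up registry-style rates per person-year
seer_incidence = np.array([1.8, 2.0, 2.2, 2.4, 2.6]) / 1000     # high-grade PCa
other_mortality = np.array([8.0, 8.7, 9.4, 10.2, 11.0]) / 1000  # competing death

def absolute_risk(rr_individual, mean_rr):
    """Approximate 5-year absolute risk for a man with relative risk rr_individual."""
    baseline = seer_incidence / mean_rr           # registry-calibrated baseline hazard
    h1 = baseline * rr_individual                 # cause-specific hazard of interest
    surv = np.cumprod(np.exp(-(h1 + other_mortality)))   # survive both causes
    surv_start = np.concatenate([[1.0], surv[:-1]])      # survival to the start of each year
    # simplified discrete-time approximation of the crude (competing-risk) probability
    return float(np.sum(surv_start * (1.0 - np.exp(-h1))))

print(f"5-year high-grade risk, RR = 2.0: {absolute_risk(2.0, mean_rr=1.3):.3%}")
```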



posters-tuesday-ETH: 26

Modelling Individual-level Uncertainty from Missing Data in Personalised Breast Cancer Risk Prediction

Bethan L. White, Lorenzo Ficorella, Xin Yang, Douglas F. Easton, Antonis C. Antoniou

University of Cambridge, United Kingdom

Breast cancer risk prediction models use a range of predictors to estimate an individual’s chance of developing breast cancer in a given timeframe. These can facilitate risk stratification, to identify individuals who would benefit most from screening or preventive options. The BOADICEA breast cancer risk model, implemented in the CanRisk tool (www.canrisk.org), uses genetic, lifestyle, hormonal, family history and anthropometric data to estimate an individual’s risk. When implementing risk prediction models, risk predictor data are often incomplete. Point-estimates calculated when some risk factor data are missing can therefore hide considerable uncertainty.

We developed a methodological approach for quantifying uncertainty and the probability of risk reclassification in the presence of missing data. We employed Monte Carlo simulation to estimate the distribution of breast cancer risk for individuals with missing data, using multiple imputation by chained equations (MICE), with UK Biobank and KARMA as reference datasets, to sample the missing covariates. We developed a framework for estimating uncertainty that can be applied to any individual with missing risk factor data. We used exemplar cases to assess the probability that collecting all missing data would result in a change in risk categorisation, on the basis of the 10-year predicted risk from age 40, using the UK National Institute for Health and Care Excellence (NICE) guidelines.
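A minimal sketch of this kind of workflow is given below, using scikit-learn's IterativeImputer as a stand-in for MICE and a placeholder risk function; the reference data, the risk model and the category thresholds are illustrative assumptions, not BOADICEA or the CanRisk implementation.

```python
# Monte Carlo over stochastic imputations of one individual's missing risk factors.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(1)

# stand-in for a reference dataset (UK Biobank / KARMA style) with 4 risk factors;
# some values are set missing so chained-equations models are learned for each feature
reference = rng.multivariate_normal(np.zeros(4), np.eye(4) + 0.5, size=1_000)
reference[rng.random(reference.shape) < 0.2] = np.nan

def risk_model(x):
    """Placeholder for a BOADICEA-style 10-year risk calculation."""
    return 1 / (1 + np.exp(-(-3.5 + 0.4 * np.sum(x, axis=1))))

individual = np.array([[1.2, np.nan, np.nan, np.nan]])   # one observed factor, three missing

risks = []
for m in range(200):                                     # Monte Carlo over imputations
    imp = IterativeImputer(sample_posterior=True, max_iter=5, random_state=m)
    imp.fit(reference)
    risks.append(risk_model(imp.transform(individual))[0])
risks = np.array(risks)

lo, hi = np.percentile(risks, [2.5, 97.5])
print(f"95% uncertainty interval for 10-year risk: ({lo:.1%}, {hi:.1%})")
# probabilities of illustrative NICE-style categories if full data were collected
print("P(<3%), P(3-8%), P(>8%):",
      np.mean(risks < 0.03), np.mean((risks >= 0.03) & (risks < 0.08)),
      np.mean(risks >= 0.08))
```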

For example, a woman whose mother and sister have both been previously diagnosed with breast cancer, but with all other personal risk factor information unmeasured, will be categorised as at “moderate risk” by the BOADICEA model, with around a 5% chance of developing breast cancer between the ages of 40 and 50. However, if all remaining risk factor information were measured, our methodology estimates a 52% chance of reclassification to the “population risk” group, and a 5% chance of reclassification to the “high risk” group. Taking into account all missing risk factor information, an estimated 95% uncertainty interval for the risk point estimate would be (0.9%, 9.0%).

These results demonstrate that there can be a considerable likelihood of reclassification into a different risk category after collecting missing data. The methodology presented in this work can identify situations where it would be most beneficial to collect additional patient information before making decisions in clinical settings.



posters-tuesday-ETH: 27

Time-varying covariates in Survival Analysis: a graphical approach to assessing the risk of cardiovascular events and aortic valve sclerosis development

Arianna Galotta1, Francesco Maria Mattio1, Veronika Myasoedova1, Elisabetta Salvioni1, Paolo Poggio1,3, Piergiuseppe Agostoni1,2, Alice Bonomi1

1Centro Cardiologico Monzino, IRCCS, Milan, Italy; 2Department of Clinical and Community Sciences, University of Milan, Milan, Italy; 3Department of Biomedical, Surgical and Dental Sciences, University of Milan, Milan, Italy

Background: Survival analysis is essential for studying the time to the occurrence of an event of interest, such as death or the onset of a disease. When covariates change over time, it is crucial to account for these variations to estimate the relationship between exposure and outcome accurately and robustly. While using the Cox model with time-dependent covariates is methodologically appropriate, its graphical representation remains challenging. This study evaluates the development of aortic valve sclerosis (AVSc) as a time-dependent exposure for the risk of cardiovascular (CV) events, taking its progression over time into account.

Methods: The relative risk of CV events associated with AVSc development was assessed using the Cox proportional hazards model. To generate the survival curves, we applied the method proposed by Simon and Makuch (Schultz et al., 2002). This approach differs from the traditional Kaplan-Meier method, which treats covariates as fixed; the key difference introduced by a time-dependent covariate lies in the interpretation of the risk set. With a time-varying covariate, the risk sets are continuously updated according to the value of the covariate at each time point: the risk set at time t includes all individuals at risk just before t whose covariate value places them in the relevant group at that time.
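A minimal sketch of the underlying data layout is shown below; the column names and toy follow-up times are assumptions made for illustration.

```python
# Counting-process (start, stop] layout underlying both the time-dependent Cox
# model and the Simon-Makuch risk sets: a participant who develops AVSc at time
# t_onset contributes an unexposed interval (0, t_onset] and an exposed interval
# (t_onset, t_end].
import pandas as pd

raw = pd.DataFrame({
    "id":       [1, 2, 3, 4],
    "t_onset":  [None, 4.0, 2.0, None],   # time of AVSc development (None = never)
    "t_end":    [10.0, 9.0, 6.0, 7.5],    # end of follow-up
    "cv_event": [1, 1, 0, 0],             # cardiovascular event at t_end?
})

rows = []
for r in raw.itertuples():
    if pd.isna(r.t_onset):                              # never exposed
        rows.append((r.id, 0.0, r.t_end, 0, r.cv_event))
    else:                                               # split follow-up at onset
        rows.append((r.id, 0.0, r.t_onset, 0, 0))
        rows.append((r.id, r.t_onset, r.t_end, 1, r.cv_event))

long = pd.DataFrame(rows, columns=["id", "start", "stop", "avsc", "event"])
print(long)
# 'long' can be passed to a time-dependent Cox routine (e.g. lifelines'
# CoxTimeVaryingFitter, or R's survival::coxph with (start, stop] intervals);
# at any time t, the Simon-Makuch "exposed" risk set consists of the intervals
# with start < t <= stop and avsc == 1.
```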

Results: Time-dependent analyses were conducted to evaluate AVSc development as a time-sensitive exposure to CV events. Participants who developed AVSc during the follow-up period were considered unexposed from baseline until the onset of development, after which they were classified as exposed. The hazard ratio related to the AVSc development was then evaluated using the Cox proportional-hazards model. The analysis with the time-dependent covariate approach provided a more detailed understanding of the association between the AVSc development and the risk of CV events over time. The survival curves generated using this method demonstrated that accounting for the time-varying nature of AVSc exposure significantly impacted the prognosis of patients.

Conclusion: This study emphasises the importance of considering time-varying covariates in survival analysis for an accurate risk estimate. Although the Cox model with time-dependent covariates is the correct methodological choice, its graphical representation is complex. The method proposed by Simon and Makuch enhances the traditional Kaplan-Meier approach by allowing the integration of covariates that evolve over time. This is particularly relevant in medical research, where dynamic exposures must be considered to avoid misleading conclusions.



posters-tuesday-ETH: 28

Polygenic scores as tools for intervention selection in the setting of finasteride for prostate cancer prevention

Allison Meisner

Fred Hutchinson Cancer Center, United States of America

Background/introduction: Polygenic risk scores (PRS) have been proposed as tools for intervention selection. PRS are weighted combinations of single nucleotide polymorphisms (SNPs) where each SNP is weighted by its association with outcome risk. An alternative approach utilizes predictive polygenic scores (PPS), in which the weight for each SNP corresponds to its association with intervention effect. We compare the utility of PRS and PPS for identifying individuals expected to benefit from finasteride in the prevention of prostate cancer.

Methods: We used data from the Prostate Cancer Prevention Trial (PCPT), a randomized trial of finasteride for prostate cancer prevention. Of the 8,506 men with available genotype data, YY developed prostate cancer. We used the Polygenic Score Catalog to identify a recently developed prostate cancer PRS. We split the data into training (2/3 of the data) and test (1/3 of the data) sets. We constructed three scores, each of which was a combination of 198 SNPs in the PRS published on the Polygenic Score Catalog: (1) a PRS based on the coefficients published in the Polygenic Score Catalog (PRS1), (2) a PRS based on coefficients estimated in the training data via logistic regression (PRS2), and (3) a PPS based on the interaction between each SNP and randomization to finasteride, estimated in the training data via logistic regression. In the test data, we compared the three scores based on the reduction in the rate of prostate cancer when a given score is used for intervention selection.
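The contrast between the two kinds of scores can be sketched on toy data: PRS-type weights come from each SNP's association with the outcome, while PPS-type weights come from each SNP's interaction with randomisation to finasteride. The simulated genotypes, effect sizes and per-SNP models below are illustrative assumptions, not the PCPT analysis.

```python
# Toy construction of PRS-style vs PPS-style SNP weights via logistic regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n, p = 4_000, 198
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)    # genotype dosages
trt = rng.binomial(1, 0.5, n)                           # finasteride vs placebo
true_main, true_inter = rng.normal(0, 0.03, p), rng.normal(0, 0.03, p)
logit = -1.6 + G @ true_main + trt * (-0.3 + G @ true_inter)
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# PRS-style weights: per-SNP log-odds ratios for the outcome
prs_w = np.array([sm.Logit(y, sm.add_constant(G[:, j])).fit(disp=0).params[1]
                  for j in range(p)])

# PPS-style weights: per-SNP treatment-interaction log-odds ratios
pps_w = []
for j in range(p):
    X = sm.add_constant(np.column_stack([G[:, j], trt, G[:, j] * trt]))
    pps_w.append(sm.Logit(y, X).fit(disp=0).params[3])
pps_w = np.array(pps_w)

# scores for individuals: higher PPS suggests larger expected treatment benefit
prs_score, pps_score = G @ prs_w, G @ pps_w
print(prs_score[:3].round(2), pps_score[:3].round(2))
```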

Results: In the test data, 17.0% of men developed prostate cancer and finasteride was significantly associated with a reduction in risk of prostate cancer; thus, the default setting is to treat all men with finasteride. For PRS1, there was no threshold at which treatment with finasteride would not be recommended; thus, use of PRS1 to guide intervention use would not reduce the rate of prostate cancer. For PRS2, 0.2% of men would not be recommended finasteride, leading to a reduction in the rate of prostate cancer of < 0.001%. Finally, for the PPS, 35.3% of men would not be recommended finasteride, leading to a reduction in the rate of prostate cancer of 3.0% if the PPS were used to guide intervention use.

Conclusion: In this analysis of PCPT data, PPS demonstrated substantially greater clinical utility as tools for intervention selection compared to PRS. PPS should be considered as tools for intervention selection more broadly.



posters-tuesday-ETH: 29

Implementation of a Disease Progression Model accounting for covariates

Gabrielle Casimiro, Sofia Kaisaridi, Sophie Tezenas du Montcel

ARAMIS, Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM, CNRS, Inria, Inserm, AP-HP, Groupe Hospitalier Sorbonne Université, Paris, France

Introduction: Disease progression models are promising tools for analysing longitudinal data presenting multiple modalities. Such models can be used to estimate long-term disease progression and reconstruct individual trajectories. Inter-patient variability is often modeled as random perturbations around a fixed reference. However, much of this variability is driven by external factors such as genetic mutations, gender, level of education or socio-economic status.

In this work, we extend a non-linear mixed-effects disease progression model (Disease Course Mapping Model), implementing a multivariate logistic framework to explicitly account for covariates. We illustrate the potential of this approach by modelling the evolution of CADASIL disease, the most frequent small artery brain disease caused by pathogenic variants of the NOTCH3 gene, using the genetic mutation location as a covariate.

Methods: A general formulation involves a non-linear mapping η between timepoints and clinical markers, parametrized by fixed effects α (population level) and random effects β_i (individual level): y_i = η_α(t_i | β_i).

The Disease Course Mapping Model applies time reparameterization to realign all individual trajectories into a common timeline, accounting for spatiotemporal variability. To do so, it estimates a population parameter expressing the average disease onset time, enabling direct comparison of features (such as scores or biomarker values measured longitudinally) at this time and identifying the sequence of symptom onset.

To incorporate baseline covariates in the model, the existing paradigm was extended. Instead of estimating a fixed effect α parametrizing the average disease course, we introduced a link function f_φ that predicts an expected disease trajectory conditioned on a given set of covariates c_i.
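The idea can be sketched with a toy logistic disease course whose population parameters are functions of the covariate; the functional form and parameter values below are illustrative assumptions, not the Disease Course Mapping Model as implemented in Leaspy.

```python
# Covariate-conditioned average disease course: f_phi maps the covariate to the
# population onset time and progression rate of a logistic trajectory.
import numpy as np

def f_phi(c):
    """Link function mapping covariates to population-level curve parameters."""
    t0 = 55.0 - 8.0 * c          # earlier average onset if mutation in EGFr 1-6 (c = 1)
    rate = 0.10 + 0.05 * c       # faster progression
    return t0, rate

def expected_trajectory(t, c, xi=0.0, tau=0.0):
    """Logistic disease course; xi and tau play the role of individual random effects."""
    t0, rate = f_phi(c)
    return 1.0 / (1.0 + np.exp(-rate * np.exp(xi) * (t - t0 - tau)))

ages = np.linspace(30, 80, 6)
print("EGFr 1-6 carrier:", expected_trajectory(ages, c=1).round(2))
print("other variants:  ", expected_trajectory(ages, c=0).round(2))
```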

Results: The proposed model has been implemented in the open-source library Leaspy. Applied to different clinical scores, it reveals significant differences according to the mutation location: patients with the pathogenic variant located in EGFr domains 1-6, previously identified as a determinant of disease severity, showed a faster and more pronounced decline on the Rankin score, which assesses the severity of disability.

Conclusion: This approach allows us to explicitly model how external factors influence disease progression rather than treating variability as purely stochastic. While the current model incorporates a single binary covariate, future work will focus on extending this framework to handle multiple covariates simultaneously and to integrate continuous variables.



posters-tuesday-ETH: 30

Genetics influences LDL-C response to statin therapy: short- and long-term observational study with functional data analysis

Andrea Corbetta1,2,3, Emanuele Di Angelantonio1,4, Andrea Ganna3, Francesca Ieva1,2

1Human Technopole, Milan, Italy; 2Politecnico Di Milano, Milan, Italy; 3Institute for Molecular Medicine Finland, Helsinki, Finland; 4University of Cambridge, Cambridge UK, UK

Introduction: Understanding the genetic basis of lipid-lowering responses to statin therapy may provide critical insights into personalized cardiovascular treatment strategies. This study employs advanced statistical methods to investigate how genetic predisposition, captured through polygenic scores (PGS) for low-density lipoprotein cholesterol (LDL-C), influences short-term and long-term changes in LDL-C levels following statin initiation.

Methods: We utilized data from the FinnGen cohort, focusing on LDL-C measurements in two distinct groups: (1) a short-term group of 11,343 individuals with LDL-C measurements recorded within one year before and after initiating statin therapy and (2) a long-term group of 15,864 individuals who had maintained statin therapy for a minimum of five years. The LDL-C trajectories were modelled as functional objects, allowing us to apply functional principal components analysis (FPCA) to identify independent patterns of LDL-C response.

In the short-term group, we modelled the absolute and relative reduction of LDL-C using linear regression models with PGS as a predictor. In the long-term group, we analyzed the first two FPCA components: the first principal component (PC1) representing the baseline LDL-C level (mean pattern) and the second principal component (PC2) capturing the LDL-C reduction pattern. Genome-wide association studies (GWAS) were conducted to identify genetic variants associated with these phenotypic patterns, applying stringent Bonferroni correction for multiple testing.
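As a simplified illustration of the FPCA step, trajectories observed on a common grid can be decomposed with an ordinary PCA, so that the leading components play the roles of PC1 (level) and PC2 (reduction pattern) described above; the simulated curves below are placeholders, not FinnGen data.

```python
# Simplified FPCA: PCA on mean-centred LDL-C curves sampled on a common time grid.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
n, grid = 1_000, np.linspace(0, 5, 21)                  # 5 years of follow-up
level = rng.normal(140, 25, n)                          # baseline LDL-C (mg/dL)
drop = rng.normal(40, 12, n)                            # long-term reduction
curves = level[:, None] - drop[:, None] * (grid / 5) + rng.normal(0, 5, (n, len(grid)))

centred = curves - curves.mean(axis=0)
fpca = PCA(n_components=2).fit(centred)
scores = fpca.transform(centred)                        # PC1 ~ level, PC2 ~ reduction

print("variance explained:", fpca.explained_variance_ratio_.round(3))
# scores[:, 0] and scores[:, 1] can then serve as GWAS phenotypes,
# or be regressed on the LDL-C polygenic score.
```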

Results: We observed that individuals in the highest PGS tertile experienced a greater absolute LDL-C reduction in the first year after statin initiation, with a mean reduction of 8.12 mg/dL (95% CI: 6.93–9.57) compared to the lowest tertile. However, this group demonstrated a smaller relative reduction of 1.81% (95% CI: 0.06–2.99). In the long-term group, higher PGS was associated with elevated LDL-C levels over five years but no significant association was found between PGS and LDL-C change patterns. The GWAS identified significant genome-wide loci for relative LDL-C reduction and baseline LDL-C levels, with lead variants near genes previously implicated in lipid metabolism.

Conclusion: Our findings suggest that short-term LDL-C response exhibits a genetic basis strongly linked to baseline LDL-C regulation. In contrast, long-term LDL-C changes appear predominantly influenced by non-genetic factors such as adherence. Nonetheless, individuals with higher LDL-C PGS consistently maintain higher LDL-C levels over extended periods. These results underscore the complex genetic architecture of LDL-C response to statins and highlight the utility of FPCA in characterizing dynamic lipid trajectories.



posters-tuesday-ETH: 31

Longitudinal analysis of imprecise disease status using latent Markov models: application to Italian Thyroid Cancer Observatory data

Silvia D'Elia1, Marco Alfò1, Maria Francesca Marino2

1Sapienza University of Rome (Italy); 2University of Florence (Italy)

Background

Longitudinal data are widely used in medicine to monitor patients over time, providing a dynamic view of disease progression and treatment response. Ordinal scales are often used to measure response to treatment or to summarise disease severity.

Methods

Latent Markov (LM) models are an important class of models for ordinal longitudinal data. They are based on a latent process assumed to follow a Markov chain with a given number of states (latent states); the latent state characterises the (conditional) response distribution at each time occasion.

LM models allow longitudinal data to be analysed in the presence of:

  • measurement error
  • unobserved heterogeneity

Such models estimate transition probabilities between latent states and can include individual covariates¹. Of particular interest is the evaluation of patient dynamics over time as a function of individual covariates (both time-constant and time-varying).
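This machinery can be sketched with a toy two-state latent chain emitting the four observed response categories; the probabilities below are illustrative, not estimates from the ITCO data.

```python
# Forward algorithm for a latent (hidden) Markov chain with categorical emissions:
# a hidden "no evidence" vs "evidence of disease" state emits the observed
# four-category response (ER, IND, BIR, SIR) with error.
import numpy as np

pi = np.array([0.8, 0.2])                 # initial latent-state probabilities
P = np.array([[0.9, 0.1],                 # latent transition matrix
              [0.3, 0.7]])
# rows: latent state; columns: P(observed ER, IND, BIR, SIR | state)
B = np.array([[0.70, 0.20, 0.08, 0.02],
              [0.05, 0.25, 0.30, 0.40]])

def likelihood(obs):
    """Forward algorithm for one patient's sequence of visit categories."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ P) * B[:, o]
    return alpha.sum()

visits = [0, 1, 1, 3]                     # ER at 12 months, then IND, IND, SIR
print(f"sequence likelihood: {likelihood(visits):.4f}")
# In a full LM model, pi, P and B (and covariate effects on P) are estimated by EM.
```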

Application

A latent Markov model is used to analyse data from the Italian Thyroid Cancer Observatory (ITCO), a database of over 15,000 patients with a diagnosis of thyroid cancer treated in different clinical centres in Italy. Despite the high survival rate, the risk of recurrence remains significant, and long-term monitoring is needed to detect recurrence early and maintain appropriate therapies². The study aims to monitor and assess the effectiveness of the response to treatment over time, seeking to identify factors that predict the true disease status.

Patients are monitored prospectively from the date of surgery, with follow-up visits at 12, 36, and 60 months. At each follow-up visit, the response to treatment is assessed through clinical, biochemical, and imaging findings and response is classified into 4 categories: excellent (ER, no evidence of disease), indeterminate (IND), biochemical incomplete (BIR) and structural incomplete (SIR, evidence of disease). However, this classification has limitations: categories synthesise multiple measurements prone to error, are influenced by unobserved factors, and the disease status (evidence vs. no evidence of disease) is not directly observable due to the presence of ambiguous categories (IND, BIR).

While around 50% of patients clearly show no evidence of disease at any time point, 30–40% fall into the indeterminate area.

Conclusion

Latent Markov models may lead to a better understanding of patients' clinical trajectories, providing a more accurate picture of their dynamics while accounting for variables that may influence transitions between states.

¹Bartolucci et al., Latent Markov Models for Longitudinal Data, 2012.

²Haugen et al., 2015 American Thyroid Association Management Guidelines for Adult Patients with Thyroid Nodules and Differentiated Thyroid Cancer, 2016.



posters-tuesday-ETH: 32

Long-term risk prediction from short-term data – a microsimulation approach

Moritz Pamminger, Theresa Ullmann, Moritz Madern, Daniela Dunkler, Georg Heinze

Medical University of Vienna, Austria

Background
In medical research, long-term risk prediction is often desirable, e.g. to predict the risk of a cardiovascular or other health event within the next 30 years. Estimating such a prediction model requires data with sufficiently long follow-up. Such data are rarely available and may be outdated. Our aim is to develop and evaluate methods to harness contemporary data for long-term predictions.

Methods
We assume longitudinal data with 2-5 possibly irregular measurements of 5-20 prognostic factors (e.g. cholesterol, blood pressure) per individual over a 5-year period, together with associated survival outcomes. We present a microsimulation-based strategy to obtain predictions of survival and of the trajectories of the prognostic factors over a long-term prediction horizon of 20-30 years.

First, we trained models using the current values of the prognostic factors to predict subsequent measurements and the event status over a short-term prediction horizon of 1-2 years. Starting from individual-specific initial values of the prognostic factors, these short-term models were then applied to generate follow-up values of the prognostic factors and of the survival state as draws from the respective predictive distributions. These values serve as the new baseline for the next prediction-and-generation step. Iteration proceeds until an event is predicted or the long-term prediction horizon is reached. For each individual, multiple (e.g. 1,000) trajectories of the prognostic factors and the survival process are generated, which can then be suitably summarized.
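A condensed sketch of this iterate-and-draw scheme is given below; next_factors() and event_prob() are placeholders standing in for the fitted short-term models.

```python
# Microsimulation of long-term outcomes by iterating short-term prediction models.
import numpy as np

rng = np.random.default_rng(4)

def next_factors(x):                       # placeholder short-term trajectory model
    return x + rng.normal(0, 0.1, size=x.shape)

def event_prob(x):                         # placeholder 1-year event model
    return 1 / (1 + np.exp(-(-6.0 + x.sum())))

def simulate(x0, horizon=30, n_paths=1000):
    """Return simulated event times (np.inf = event-free at the horizon)."""
    times = np.full(n_paths, np.inf)
    for k in range(n_paths):
        x = x0.copy()
        for year in range(1, horizon + 1):
            if rng.random() < event_prob(x):
                times[k] = year
                break
            x = next_factors(x)            # generated values become the new baseline
    return times

t = simulate(np.array([0.5, 1.0, -0.3]))
print("predicted 30-year risk:", np.mean(np.isfinite(t)))
```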

We validated the approach using various synthetic datasets for which long-term follow-up was available. We artificially censored these datasets to mimic data with short-term follow-up, which we used to train our models. Then we applied the microsimulation approach to make long-term predictions and compared the predicted outcomes with the observed ones in the training set. We also validated predictions in an independent test set.

Results
The approach was implemented in an R package for convenient application in various situations. The package provides flexible options for specifying the short-term models. It can perform predictions for individuals, efficiently process entire datasets, and present results with appropriate graphical summaries.

Conclusion
Despite some limitations, the method effectively handles irregular time intervals in the training data and allows capturing nonlinear and interaction effects for prognostic factors and survival. It provides analysts with a flexible tool for long-term prognosis across various fields and in the future may provide a practically useful framework for individual long-term prognosis at routine health screenings. This work was supported through the FWF project P-36727-N.



posters-tuesday-ETH: 33

Deep learning algorithm for dynamic survival prediction with competitive risks

Tristan Margaté1,2,3, Marine Zulian1, Agathe Guilloux2, Sandrine Katsahian2,3,4

1Healthcare and Life Sciences Research, Dassault Systemes, France; 2HeKa team, INRIA, Paris, France; 3Université Paris Cité, France; 4URC HEGP, APHP Paris

Background:

The medical follow-up of a patient with cancer is spread over time, yielding repeated measurements that capture the progression of the disease state. Developing a prognostic solution therefore requires the ability to update predictions of the occurrence of clinical events over time as new measurements arrive, i.e., to make dynamic predictions.
In oncology, patients can face the appearance of metastases, other diseases due to comorbidities, or death. It can be useful to predict which of these events will occur first; this is referred to in the literature as competing risks. We aim to develop new methodologies capable of using longitudinal data to predict competing risks.

Methods:
Various deep learning algorithms have recently been developed to handle longitudinal survival data and competing risks [1][2]. However, because they use the entirety of a patient's available longitudinal data to create a time-independent static embedding, they suffer from bias when used to predict survival over the patient's follow-up interval. Recent methodologies use a progressive approach to integrate patient data [3], yielding an embedding of features that varies over time, and show superior results in the classical survival analysis setting.
We chose to build on this type of methodology, modifying how the embedding of longitudinal data is created and extending it to the competing-risks survival setting.
In addition, we developed a new simulation scheme to obtain synthetic longitudinal survival data with competing risks.
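One simple way to generate such synthetic data is sketched below, with a longitudinal marker driving two cause-specific hazards in discrete time; the coefficients and visit structure are arbitrary illustrations, not the authors' simulation scheme.

```python
# Toy simulation of longitudinal survival data with two competing events
# (e.g. metastasis vs death) driven by an evolving marker.
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)

def simulate_patient(pid, n_visits=20, dt=0.25):
    marker, rows = rng.normal(0, 1), []
    for j in range(n_visits):
        t = j * dt
        rows.append((pid, t, marker, 0))                     # 0 = still at risk
        h1 = 0.02 * np.exp(0.8 * marker)                     # cause-1 hazard
        h2 = 0.01 * np.exp(0.3 * marker)                     # cause-2 hazard
        if rng.random() < 1 - np.exp(-(h1 + h2) * dt):       # any event in (t, t+dt]
            cause = 1 if rng.random() < h1 / (h1 + h2) else 2
            rows[-1] = (pid, t, marker, cause)
            break
        marker += rng.normal(0.05, 0.2)                      # marker progression
    return rows

data = pd.DataFrame(
    [r for pid in range(500) for r in simulate_patient(pid)],
    columns=["id", "time", "marker", "cause"])
print(data.groupby("cause").size())
```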

Results/Conclusion:

We will present results on different approaches for handling both longitudinal and survival data, and their limitations in producing unbiased predictions. We compare our algorithm with existing algorithms on simulated data and on a subset of real-world data from the Framingham Heart Study, whose aim was to study the etiology of cardiovascular disease.

References:
[1] Lee, C. et al. (2019). Dynamic-deephit: A deep learning approach for dynamic survival analysis with competing risks based on longitudinal data. IEEE Transactions on Biomedical Engineering
[2] Moon, I. et al. (2022). SurvLatent ODE: A Neural ODE based time-to-event model with competing risks for longitudinal data improves cancer-associated Venous Thromboembolism (VTE) prediction. In Machine Learning for Healthcare Conference
[3] Bleistein, L et al. (2024). Dynamical Survival Analysis with Controlled Latent States. arXiv preprint arXiv:2401.17077.



posters-tuesday-ETH: 34

Identifying Cutoff in Predictors in Survival Analysis: An Ensemble Strategy for Flexible Knot Selection

Stefania Lando, Giulia Lorenzoni, Dario Gregori

Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, Padova, Italy

Background

Restricted cubic splines (RCS) are widely used in Cox proportional hazards models to capture nonlinear relationships between a continuous biomarker and patient outcomes. Traditional approaches to knot selection often rely on predefined quantiles (e.g., the 5th, 35th, 65th, and 95th percentiles), usually chosen arbitrarily, or on a fixed number of knots placed systematically across the biomarker range. Both strategies have limitations: quantile-based methods provide stability and reproducibility but may oversimplify the underlying nonlinear relationship, whereas fixed knots risk overlooking variation in the data.

Methods

Our study explores an ensemble methodology that combines the robustness of quantile-based knot placement with the flexibility of data-driven strategies, aiming to provide robust knot placement while preserving clinical meaning in cutoff determination. The core idea is to maintain the intuitive simplicity of quantile-based knots while introducing a selective tuning mechanism, guided by cross-validation, to refine their placement. In parallel, our approach incorporates time-dependent ROC analysis to identify clinically relevant cutoffs for risk stratification at a chosen time horizon.
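An exploratory sketch of these two ingredients, an RCS basis with movable knots and a cross-validated comparison of candidate knot placements for a Cox model, is given below. It assumes the lifelines package is available; the simulated biomarker, the candidate knot sets and the concordance-based score are illustrative choices, not the proposed ensemble procedure itself.

```python
# Restricted cubic spline basis plus cross-validated scoring of candidate knot sets.
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

def rcs_basis(x, knots):
    """Harrell-style restricted cubic spline basis: linear term + (K-2) terms."""
    k = np.asarray(knots, float)
    scale = (k[-1] - k[0]) ** 2
    pos3 = lambda v: np.clip(v, 0.0, None) ** 3
    cols = [x]
    for j in range(len(k) - 2):
        cols.append((pos3(x - k[j])
                     - pos3(x - k[-2]) * (k[-1] - k[j]) / (k[-1] - k[-2])
                     + pos3(x - k[-1]) * (k[-2] - k[j]) / (k[-1] - k[-2])) / scale)
    return np.column_stack(cols)

rng = np.random.default_rng(6)
n = 600
biomarker = rng.gamma(4.0, 25.0, n)                        # continuous biomarker
hazard = 0.03 * np.exp(0.3 * ((biomarker - 100) / 50) ** 2)
T = np.minimum(rng.exponential(1 / hazard), 10.0)          # administrative censoring at 10
E = (T < 10.0).astype(int)

candidate_knots = [np.percentile(biomarker, [5, 35, 65, 95]),
                   np.percentile(biomarker, [10, 50, 90]),
                   np.percentile(biomarker, [5, 27.5, 50, 72.5, 95])]

for knots in candidate_knots:
    cv_scores = []
    for tr, te in KFold(n_splits=5, shuffle=True, random_state=0).split(biomarker):
        def frame(idx):
            B = rcs_basis(biomarker[idx], knots)
            return pd.DataFrame(B, columns=[f"s{i}" for i in range(B.shape[1])])
        df_tr = frame(tr)
        df_tr["T"], df_tr["E"] = T[tr], E[tr]
        cph = CoxPHFitter().fit(df_tr, duration_col="T", event_col="E")
        risk = cph.predict_partial_hazard(frame(te))
        cv_scores.append(concordance_index(T[te], -risk, E[te]))
    print(np.round(knots, 1), "cross-validated c-index:", round(float(np.mean(cv_scores)), 3))
```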

Applications

The proposed methodology can be applied in various clinical and epidemiological settings where risk stratification based on continuous biomarkers is essential. Examples include oncology for identifying prognostic thresholds in tumor markers, cardiology for refining cardiovascular risk scores, and infectious disease modeling for determining severity cutoffs. Additionally, this approach can be extended to precision medicine, where patient subgroups with distinct risk profiles can be identified for targeted interventions.

Conclusion

By integrating flexible knot placement with an ensemble-based cutoff strategy, the method enhances the adaptability of spline-based Cox models while preserving clinical relevance.



posters-tuesday-ETH: 35

Estimating Quality Adjusted Life Years (QALYs) from a joint modeling framework: a simulation-based study

Vincent Bonnemains1, Yohann Foucher2, Philippe Tessier1, Etienne Dantan1

11. Nantes Université, Univ Tours, CHU Nantes, INSERM, MethodS in Patients-centered outcomes and HEalth Research, SPHERE, F-44000 Nantes, France; 22. INSERM, CIC-1402, Centre Hospitalier Universitaire de Poitiers, Université de Poitiers, Poitiers, France.

Background. Clinical trial investigators often choose patient survival as the primary outcome.

When health-related quality of life (HRQoL) outcomes are considered, they are usually analyzed secondarily and separately from the survival outcome, precluding consideration of potential trade-offs between them. In contrast, Quality-Adjusted Life Years (QALYs) are a composite outcome that allows both aspects to be considered simultaneously by weighting years of life by HRQoL indexes (utility scores) that reflect individual preferences. Hence, QALYs could be a practical primary outcome for assessing treatment benefit.

However, the estimation of QALYs usually relies on non-parametric approaches that suffer from several methodological pitfalls. This work aims to propose a sounder method for estimating QALYs using the shared random-effects joint modelling framework.

Methods. We developed a shared random-effects joint model, with the longitudinal utility scores modelled through a mixed beta regression and the time-to-death through a proportional hazards Weibull model. We then proposed a method for estimating QALYs from this model and compared its performance with the commonly used non-parametric method in a simulation study.
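A back-of-the-envelope sketch of the target quantity is given below: with an expected utility trajectory u(t) from the longitudinal sub-model and a Weibull survival function S(t) from the survival sub-model, restricted QALYs at a three-year horizon are the integral of u(t)·S(t). All parameter values and the treatment effect in the snippet are arbitrary illustrations, not outputs of the simulation study.

```python
# Restricted QALYs at horizon tau = integral of expected utility times survival.
import numpy as np

def utility(t, u0=0.80, slope=-0.05):
    """Expected utility over time, kept in (0, 1) as in a beta regression."""
    return 1.0 / (1.0 + np.exp(-(np.log(u0 / (1 - u0)) + slope * t)))

def survival(t, shape=1.3, scale=6.0, log_hr=0.0):
    """Weibull proportional-hazards survival function."""
    return np.exp(-((t / scale) ** shape) * np.exp(log_hr))

def qaly(tau=3.0, log_hr=0.0, n_grid=301):
    t = np.linspace(0.0, tau, n_grid)
    y = utility(t) * survival(t, log_hr=log_hr)
    return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(t)))   # trapezoidal rule

delta = qaly(log_hr=-0.3) - qaly(log_hr=0.0)
print(f"illustrative treatment effect at 3 years: {delta:.3f} QALYs")
```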

We simulated a wide range of clinical trials considering the presence and absence of treatment effect, 200 and 500 included patients, one and two utility score measurements per patient per year for three years, and two censoring rates: 0% and 30% at a three-year horizon. We also considered different data generation mechanisms resulting in well-specified or misspecified models. For each scenario, we simulated 1000 data samples. The treatment effect was estimated in terms of QALYs at a three-year horizon.

Results. Our proposed method provided unbiased estimates of QALYs and significant improvements over the non-parametric approach when the joint model was well-specified. This was particularly the case when a low number of repeated utility measurements per patient or a high censoring rate was simulated. The two methods performed poorly when simulating the risk of event with non-proportional hazards.

Conclusions. We proposed a method based on joint modeling for estimating QALYs. We reported accurate estimations for clinical trials with moderate sizes when the model is well specified. However, we found the estimations to be sensitive to model misspecification. We are working to develop additional modelling tools and deliver an R package that will allow users to accurately estimate QALYs in a wide range of situations. We hope this will encourage a larger use of QALYs in clinical trials and better consideration of patients’ preferences in medical decision-making.



posters-tuesday-ETH: 36

An Alternative Estimand for Overall Survival in the Presence of Treatment Discontinuation: Simulation Results and Case Study

Kara-Louise Royle1, David Meads2, Jennifer Visser-Rogers3, David A Cairns1, Ian R White4

1Leeds Cancer Research UK Clinical Trials Unit, Leeds Institute of Clinical Trials Research, University of Leeds, Leeds, UK; 2Academic Unit of Health Economics, Leeds Institute of Health Sciences, University of Leeds, Leeds, UK; 3Coronado Research, Kent, England; 4MRC Clinical Trials Unit at UCL, London, UK

Introduction

Overall survival (OS) is a definitive endpoint for clinical effectiveness in cancer clinical trials. However, intercurrent events, like treatment discontinuation, can affect its interpretation.

A recent literature review concluded that treatment discontinuation and the uptake of subsequent anti-cancer treatment are often considered part of the treatment strategy, i.e. researchers follow the “Treatment Policy” approach.

Our objective was to investigate the novel alternative hypothetical estimand: What is the effect on OS of the experimental trial treatment versus the control treatment, if all participants who discontinued prior to death received the same subsequent treatment?

Methods

Statistical techniques, including simple intention-to-treat (ITT) and per-protocol (PP) methods and more complex two-stage and inverse probability of censoring weighting (IPCW) methods, were applied in a simulation study. The data-generating mechanism simulated a two-arm randomised controlled trial dataset of 700 participants with three stratification factors. Observed and unobserved variables were simulated at baseline and at follow-up timepoints. At each follow-up timepoint, some participants were simulated to discontinue and start one of two (A or B) subsequent treatments. Eleven different scenarios were considered, including varying the true experimental treatment effect and the timing of treatment discontinuation. The estimand of interest was the hazard ratio and 95% confidence interval of the experimental vs control arms if everyone who discontinued had received the same subsequent treatment (A rather than B). The methods were evaluated in terms of bias, coverage, and power, calculated across 1000 repetitions.
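For orientation, the IPCW idea among the compared methods can be sketched as artificially censoring follow-up at the switch to the non-target subsequent treatment and re-weighting by an estimated probability of remaining uncensored; the column names, single-time-point weighting and toy data below are simplifying assumptions, not the study's implementation.

```python
# Toy construction of inverse-probability-of-censoring weights.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 700
df = pd.DataFrame({
    "arm": rng.binomial(1, 0.5, n),
    "x1": rng.normal(size=n),                       # baseline prognostic factor
})
# indicator of being artificially censored (discontinued onto treatment B)
df["cens_B"] = rng.binomial(1, 1 / (1 + np.exp(-(-1.5 + 0.8 * df.x1))))

# model the probability of remaining uncensored given covariates -> IPC weights
p_uncens = LogisticRegression().fit(df[["arm", "x1"]], 1 - df.cens_B)\
                               .predict_proba(df[["arm", "x1"]])[:, 1]
df["ipcw"] = np.where(df.cens_B == 1, 0.0, 1.0 / p_uncens)
df["ipcw"] /= df.loc[df.cens_B == 0, "ipcw"].mean()   # crude stabilisation

# the B-censored, IPCW-weighted data can then be analysed with a weighted Cox
# model to target the hypothetical estimand.
print(df.loc[df.cens_B == 0, "ipcw"].describe())
```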

Results

The ITT method was biased across all scenarios, but mostly had adequate power and coverage. The PP methods were biased with poor coverage in all scenarios. The two-stage methods were unbiased and had adequate power and coverage in almost all scenarios. The IPCW methods’ performance fluctuated the most across the scenarios.

Discussion

The simulation study found that the estimand could be estimated, with varying levels of performance, by all implemented methods. Overall, the two-stage method was the most consistently accurate across the scenarios. The practicality of estimating the hypothetical estimand using the two-stage method will be assessed through a real clinical trial case study, presented at the meeting. The trial was chosen because second-line immunotherapy was introduced during trial follow-up. As more effective treatments are developed, this is likely to be a common scenario. We will discuss the generalisability of the hypothetical estimand, how it improves the interpretation of clinical trial results, and the necessary considerations when analysing OS in such situations.



posters-tuesday-ETH: 37

Corrections of confidence interval for differences in restricted mean survival times in clinical trials with small sample sizes

Hiroya Hashimoto1, Akiko Kada2,1

1NHO Nagoya Medical Center, Japan; 2Fujita Health University, Japan

Background / Introduction

In recent years, restricted mean survival time (RMST) has been used as a measure to demonstrate the difference in efficacy between treatment groups in time-to-event outcomes, especially when the proportional hazards assumption does not hold. However, statistical tests and interval estimations based on asymptotic normality may deviate from the normal distribution when the sample size is small, leading to an inflation of the type I error rate. In this presentation, we discuss the correction of confidence intervals for between-group differences in RMST.

Methods

Under the condition that the survival functions of the two groups follow the same Weibull distribution and the censoring times follow a uniform distribution, we conducted a simulation study of two-group comparisons with fewer than 100 subjects per group under various scenarios. We examined the following methods (a numerical sketch of these interval constructions follows the list):

(1) A method based on asymptotic normality,

(2) A method that applies bias correction to the standard error of Method (1), specifically, multiplying the standard error for each group by √{mi/(mi-1)}, where mi is the number of events in group i,

(3) A method that lies between Methods (1) and (2), specifically, multiplying the standard error for each group by √{mi/(mi-0.5)}.
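The sketch below illustrates the three interval constructions on toy Weibull data with uniform censoring: the RMST in each group is the area under the Kaplan-Meier curve up to τ, its variance follows the standard asymptotic formula, and Methods (2) and (3) inflate each group's standard error by √{mi/(mi-1)} or √{mi/(mi-0.5)}. The specific parameter values are illustrative, not the simulation scenarios themselves.

```python
# RMST difference with uncorrected and event-count-corrected confidence intervals.
import numpy as np

def rmst(time, event, tau):
    """Return (RMST up to tau, asymptotic variance, number of events <= tau)."""
    t_ev = np.unique(time[(event == 1) & (time <= tau)])
    n_risk = np.array([(time >= t).sum() for t in t_ev])
    d = np.array([((time == t) & (event == 1)).sum() for t in t_ev])
    surv = np.cumprod(1.0 - d / n_risk)                    # Kaplan-Meier estimate
    grid = np.concatenate([[0.0], t_ev, [tau]])            # step-function grid
    s_left = np.concatenate([[1.0], surv])                 # S(t) on each step
    areas = s_left * np.diff(grid)
    mu = areas.sum()                                       # area under KM up to tau
    tail = np.cumsum(areas[::-1])[::-1][1:]                # area from each t_j to tau
    ok = n_risk > d                                        # avoid 0/0 at the last death
    var = np.sum(tail[ok] ** 2 * d[ok] / (n_risk[ok] * (n_risk[ok] - d[ok])))
    return mu, var, int(d.sum())

rng = np.random.default_rng(8)
tau, n_per_group = 3.0, 40
t0, t1 = rng.weibull(1.2, n_per_group) * 4.0, rng.weibull(1.2, n_per_group) * 6.0
c0, c1 = rng.uniform(0.0, 8.0, n_per_group), rng.uniform(0.0, 8.0, n_per_group)
time0, ev0 = np.minimum(t0, c0), (t0 <= c0).astype(int)
time1, ev1 = np.minimum(t1, c1), (t1 <= c1).astype(int)

mu0, v0, m0 = rmst(time0, ev0, tau)
mu1, v1, m1 = rmst(time1, ev1, tau)
diff = mu1 - mu0
corrections = {"(1) uncorrected": lambda m: 1.0,
               "(2) m/(m-1)":     lambda m: m / (m - 1.0),
               "(3) m/(m-0.5)":   lambda m: m / (m - 0.5)}
for label, f in corrections.items():
    se = np.sqrt(v0 * f(m0) + v1 * f(m1))
    print(f"{label}: diff {diff:.2f}, 95% CI ({diff - 1.96*se:.2f}, {diff + 1.96*se:.2f})")
```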

Results

As expected, Method (1) had the highest Type I error rate in all scenarios considered, followed by Method (3), and Method (2) had the lowest. In the uncensored situation, Method (2) was generally the most appropriate, and Method (3) was optimal when the event rate was low (S(τ)=0.7). Method (2) also tended to be too conservative as the censoring rate increased, and this was more pronounced for smaller sample sizes.

Conclusions

Method (3) produces better results when events occur less frequently. Method (2) yields conservative results, but caution should be exercised because it is too conservative in situations with small sample sizes and high censoring. When the sample size per group exceeds 100, the difference between methods is negligible.