Session
Prediction / prognostic modelling 3

Presentations
34-1 Prediction 3: 1
Predicting Prediction Performance
Fraunhofer Institute for Digital Medicine MEVIS, Germany

Introduction: Internal-external validation studies aim to quantify the transferability of a clinical prediction model to a new context or population. In recent work, we introduced an estimand framework that allows a precise specification of the difference between the development and inference contexts and, in effect, of the exact type of transferability to be estimated [1]. The chosen estimand has direct implications for the data splitting scheme in the validation study. In this talk, we will focus on comparing different statistical models for the out-of-distribution performance of a newly developed prediction model or learning algorithm. Such models can be used for a “meta” prediction of the unknown performance of the new “primary” prediction model or algorithm once it is implemented in a new context (e.g., in a country that was not part of the development data) and, in particular, provide an adequate uncertainty quantification for this prediction.

Methods: In a case study based on the International Stroke Trial dataset, we compared different statistical modelling approaches to predict (out-of-distribution) prediction performance. Performance was measured as the area under the curve (AUC) of the developed classification models for two-week survival after an acute stroke [2]. We compared a frequentist meta-analysis and a variety of Bayesian hierarchical models. For the latter, different (linear and non-linear) parametric learning curve models were used to model the dependence of the performance on the training sample size.

Results and Discussion: The frequentist meta-analysis approach is relatively simple to apply and provides plausible performance predictions. However, in contrast to the Bayesian hierarchical modelling approach, it cannot be used directly to characterize the dependence on the training sample size adequately.
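One common choice for a parametric learning curve is an inverse power law. As a minimal sketch (not the authors' implementation; the curve form, parameter names, and data below are illustrative assumptions), such a curve can be fitted to observed performance values and extrapolated to a larger training sample size:

```python
import numpy as np
from scipy.optimize import curve_fit

# Inverse power-law learning curve: performance rises with training size n
# and saturates at the asymptote a (parameter names a, b, c are assumptions).
def learning_curve(n, a, b, c):
    return a - b * n ** (-c)

rng = np.random.default_rng(0)
n_train = np.array([100, 200, 500, 1000, 2000, 5000])

# Synthetic AUC measurements around a known curve, with small noise
true_auc = learning_curve(n_train, a=0.85, b=1.2, c=0.6)
observed_auc = true_auc + rng.normal(0, 0.005, size=n_train.size)

# Fit the curve and extrapolate performance to a larger training size
params, _ = curve_fit(learning_curve, n_train, observed_auc, p0=[0.8, 1.0, 0.5])
a_hat, b_hat, c_hat = params
predicted = learning_curve(10_000, *params)
print(f"asymptote ~ {a_hat:.3f}, predicted AUC at n=10000: {predicted:.3f}")
```

In the Bayesian hierarchical setting described above, such a curve would instead enter as a model component with prior distributions on its parameters.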
The frequentist approach should thus only be used if the training sample sizes in the development dataset are representative. Another advantage of the Bayesian approach is that posterior uncertainty estimates offer a more expressive description of the different levels of uncertainty in the data (e.g., unseen patients, clinics or countries). It is, however, also more complex to use, requiring the specification of the hierarchical model structure, prior distributions and a learning curve model.

References
[1] Alpers, Rieke, and Westphal, Max. (2025). An estimand framework to guide model and algorithm validation in predictive modelling. Submitted for publication.
[2] International Stroke Trial Collaborative Group. (1997). The International Stroke Trial (IST): a randomised trial of aspirin, subcutaneous heparin, both, or neither among 19 435 patients with acute ischaemic stroke. The Lancet, 349(9065), 1569–1581.

34-1 Prediction 3: 2
Predictive modelling with block missingness
1The Alan Turing Institute (United Kingdom); 2UCL (United Kingdom), The Alan Turing Institute (United Kingdom); 3Roche Pharmaceutical (United Kingdom); 4F. Hoffmann-La Roche AG (Switzerland)

There is increasing interest in creating and utilising large data sets combined at scale across different modalities. Sophisticated predictive modelling methods can routinely handle large numbers of possible predictors derived from such data sources. However, when some units are not measured on certain modalities, missing data blocks will likely arise in the integrated data and can be viewed as an instance of structured missingness. These blocks complicate the application of predictive models to such data sources.

Multiple imputation is a convenient and theoretically justified tool for handling missing values in many settings, including predictive modelling. However, in highly multivariate settings with large blocks of missing data, the approach may be computationally demanding, and it is increasingly difficult to specify a well-performing imputation model.

In contrast, we leverage the basic principle of the factored likelihood approach and combine the joint model of the missing data given the observed with the marginal statistics related to the observed data. Our method directly adjusts for missingness in statistics that play a role in predicting the outcome variable. In doing so, we can make predictions based on all the observed data, without the need to perform any imputation and without explicit specification of the likelihood function. We compare our factorisation-based approach with multiple imputation (drawing from a predictive distribution) and regression imputation (using the conditional mean) across a range of comprehensive simulations.
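For intuition, the two imputation baselines mentioned above can be sketched on a toy block-missingness pattern. This is illustrative only and is not the factorisation-based method itself; the data-generating model, block sizes, and variable names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p_a, p_b = 2000, 3, 2

# Two modalities: block A observed for everyone, block B missing for half
A = rng.normal(size=(n, p_a))
B = A[:, :p_b] * 0.5 + rng.normal(size=(n, p_b))
y = A.sum(axis=1) + B.sum(axis=1) + rng.normal(size=n)

observed_b = np.zeros(n, dtype=bool)
observed_b[: n // 2] = True  # units measured on modality B

# Regression (conditional-mean) imputation of the missing block from A
coef, *_ = np.linalg.lstsq(A[observed_b], B[observed_b], rcond=None)
B_imp = B.copy()
B_imp[~observed_b] = A[~observed_b] @ coef

# Multiple imputation: add residual draws to the conditional mean,
# refit the prediction model per draw, and pool the coefficients
resid = B[observed_b] - A[observed_b] @ coef
sd = resid.std(axis=0)
fits = []
for _ in range(20):
    B_m = B_imp.copy()
    B_m[~observed_b] += rng.normal(scale=sd, size=(n - observed_b.sum(), p_b))
    X = np.column_stack([A, B_m])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fits.append(beta)
beta_mi = np.mean(fits, axis=0)  # pooled coefficients across imputations
print(beta_mi.round(2))
```

The factored-likelihood approach described in the abstract avoids both of these machineries by working with observed-data statistics directly.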
Our method avoids the bias incurred by a single regression-type imputation, and its achieved predictive accuracy matches or exceeds that of carefully implemented multiple imputation, while the computational burden is substantially reduced. We also illustrate our approach on a set of variables from The Cancer Genome Atlas, a large-scale multimodal pan-cancer data set, and identify some key insights from this application. The benefits of the proposed method include its computational and statistical efficiency, with no need to create multiple imputations to obtain unbiased predictions. It also does not rely on distributional assumptions about the underlying data. In conclusion, our method provides a fast, reliable, and accurate solution for predictive modelling in cases of block missingness.

34-1 Prediction 3: 3
Lifestyle Predictors of All-Cause Mortality: Enhancing Risk Models Beyond Traditional Biomarkers
1Diabetes Research Centre, University of Leicester, Leicester General Hospital, Leicester LE5 4PW, UK; 2NIHR Leicester Biomedical Research Centre, University of Leicester and University Hospitals of Leicester NHS Trust, LE5 4PW, UK

Background: Accurate risk prediction models are crucial for estimating all-cause mortality, particularly among aging populations with a high prevalence of chronic diseases. Traditional models primarily rely on non-modifiable factors and clinical biomarkers such as blood pressure and the cholesterol-to-HDL ratio. However, lifestyle factors including physical activity, strength, and fitness measures may provide additional predictive value or serve as alternatives in the risk prediction model. This study aims to assess whether substituting traditional biomarkers with easily measurable lifestyle factors can improve mortality risk prediction.

Methods: Data were obtained from the UK Biobank and stratified by disease history, sex, and age (using 60 years as the threshold). The base model included six traditional risk factors: age (years), smoking status (Never, Previous, Current), BMI (kg/m²), systolic blood pressure (BP, mmHg), total cholesterol-to-HDL ratio, and Townsend deprivation score. Five lifestyle factors, resting heart rate (RHR), handgrip strength (HGS), leisure-time physical activity (LTPA), walking pace (WP), and sleep duration, were incorporated either as additions to or replacements for traditional risk factors in the base model, individually or combined. The analysis was conducted in three stages: (1) adding lifestyle factors, (2) substituting BP or the cholesterol-to-HDL ratio, and (3) replacing both BP and the cholesterol-to-HDL ratio. Model performance was evaluated using the C-index, comparing the models incorporating lifestyle predictors with the traditional clinical model.
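The C-index comparison can be illustrated with a small sketch on synthetic data. The predictors, effect sizes, and absence of censoring below are simplifying assumptions for illustration, not features of the UK Biobank analysis:

```python
import numpy as np

def c_index(risk, time):
    """Concordance index without censoring: the fraction of comparable
    pairs in which the subject with higher predicted risk dies earlier."""
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(i + 1, n):
            if time[i] == time[j]:
                continue
            comparable += 1
            short, long_ = (i, j) if time[i] < time[j] else (j, i)
            if risk[short] > risk[long_]:
                concordant += 1.0
            elif risk[short] == risk[long_]:
                concordant += 0.5
    return concordant / comparable

rng = np.random.default_rng(2)
n = 300
bp = rng.normal(size=n)    # stand-in for a traditional biomarker
rhr = rng.normal(size=n)   # stand-in for a lifestyle factor (e.g., RHR)
# Hypothetical survival times: higher linear predictor -> earlier death
time = np.exp(-(0.5 * bp + 0.8 * rhr) + rng.normal(0, 0.5, size=n))

c_base = c_index(0.5 * bp, time)             # traditional factor only
c_ext = c_index(0.5 * bp + 0.8 * rhr, time)  # plus the lifestyle factor
print(f"base C-index: {c_base:.3f}, extended: {c_ext:.3f}")
```

A real analysis would use an estimator that handles censoring (e.g., Harrell's C on Cox model linear predictors); the pairwise-comparison logic is the same.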
Results: Adding lifestyle factors and substituting the cholesterol-to-HDL ratio improved risk discrimination across all subgroups, with the greatest improvements observed when incorporating all five lifestyle predictors together. In the unhealthy cohort, the C-index improved by 0.0360 in young women, 0.0161 in older women, 0.0227 in young men, and 0.0237 in older men when replacing the cholesterol-to-HDL ratio with lifestyle factors. A similar pattern was observed in the healthy group, though with smaller differences between the base and cholesterol-substituted models. Overall, RHR provided the greatest predictive improvement, except in healthy women, where HGS showed the highest predictive enhancement.

Conclusion: Lifestyle-based predictors, particularly RHR and the combination of all five lifestyle factors, enhance mortality risk prediction and may serve as viable alternatives to clinical biomarkers. Given their accessibility and non-invasive nature, these factors could be integrated into prognostic models to improve risk estimation, particularly in settings with limited clinical data. Further research is needed to confirm the long-term utility of lifestyle predictors in mortality risk assessment.

34-1 Prediction 3: 4
Development and validation of the Options model, a clinical prediction model predicting risk of emergency caesarean births in nulliparous women
University of Liverpool, United Kingdom

Objective: Globally, the rate of caesarean births (CB), including emergency caesarean births (EmCB), is increasing significantly; it is estimated that nearly one-third of all births will involve caesareans by 2030. Several tools exist to predict EmCB, but they are not yet routinely implemented in clinical practice. While these tools generally demonstrate acceptable performance, external validation of some models and changes to national guidelines highlight the need for a new prediction model applicable to all-risk women. Using routinely collected data from a multi-ethnic pregnancy cohort in Bristol, the Options study aimed to develop and externally validate a clinical prediction model for the risk of EmCB in nulliparous women, and introduced a point scoring system to make the model more accessible and easily understood. This innovation promotes a more personalised and relaxed discussion between expecting mothers and their midwives. The model is validated across three diverse UK populations to ensure broad applicability.

Methods: The model includes the predictors age, height, BMI, estimated fetal weight, and weight gain. It was developed using multivariable fractional polynomials and data funded by the NIHR Bristol BRC, encompassing approximately 26,600 records from pregnant women at NBT in Bristol since 2009. External validation was conducted using datasets from Born in Bradford, Cambridge and Liverpool. Discrimination and calibration were assessed through C-statistics and calibration plots. A trial phase of the study will implement the tool in a clinical setting via the point scoring system.

Results: The Options model demonstrates good internal discriminative ability (C-statistic: 0.66) and strong calibration.
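A point scoring system of the kind mentioned in the Methods is typically derived by rescaling regression coefficients to small integers. A minimal sketch with hypothetical coefficients (illustrative values only, not the Options model's actual estimates or predictor codings):

```python
# Hypothetical logistic-regression coefficients for EmCB risk,
# expressed per convenient unit of each predictor (assumed names/values)
coefs = {"age_per_5y": 0.18, "bmi_per_unit": 0.05, "efw_per_100g": 0.07}

# Framingham-style scoring: divide each coefficient by the smallest one
# and round, so one point corresponds to one "base" unit of log-odds
base = min(abs(b) for b in coefs.values())
points = {name: round(b / base) for name, b in coefs.items()}
print(points)  # -> {'age_per_5y': 4, 'bmi_per_unit': 1, 'efw_per_100g': 1}
```

A woman's total score is then the sum of her points, and a lookup table maps score bands back to predicted risk, which is what makes the tool usable in a consultation without software.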
External validation results are comparable, showing good generalisability across diverse UK populations.

Conclusions: Ensuring the applicability of prediction models across heterogeneous populations is essential. The Options prediction model forecasts EmCB at 36 weeks' gestation, supporting evidence-based, personalised care for pregnant women across varying risk profiles, although underlying differences in prevalence across cohorts highlight the challenges posed by varying regulations and hospital preferences. The model has the potential to be integrated into NHS clinical practice as a point scoring system, facilitating informed discussions between women and their clinicians regarding labour planning at 36 weeks.

34-1 Prediction 3: 5
Optimizing Dynamic Predictions from Joint Models using Super Learning
1Erasmus MC, the Netherlands; 2University of Michigan, USA

Background: The motivation for our research comes from prostate cancer patients who, after diagnosis, underwent surgical removal of the prostate gland. The treating physicians closely monitor the prostate-specific antigen (PSA) levels of these patients to determine the risk of recurrence and metastasis and to decide on reintervention. Joint models for longitudinal and time-to-event data have previously been employed in prostate cancer to calculate dynamic individualized predictions and guide physicians. Two components of joint models that influence the accuracy of these predictions are the shape of the longitudinal trajectories and the functional form linking the longitudinal outcome history to the hazard of the event.

Methods: Finding a single well-specified joint model that produces accurate predictions for all subjects and follow-up times can be challenging, especially when considering multiple longitudinal outcomes. In this work, we use the concept of super learning and avoid selecting a single model. In particular, we specify a weighted combination of the dynamic predictions calculated from a library of joint models with different specifications, focusing on various formulations of the time effect for the longitudinal outcome and different functional forms linking this outcome with the event process. The weights are selected to optimize a predictive accuracy metric using V-fold cross-validation. As predictive accuracy measures, we use the expected quadratic prediction error and the expected predictive cross-entropy.

Results: In our motivating University of Michigan Prostatectomy Dataset, the ensemble super learner performed better than the best single model selected in the cross-validation procedure, especially when using the expected predictive cross-entropy as the accuracy metric.
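The weight-selection step can be sketched as minimizing a cross-validated quadratic loss over the probability simplex. This is a minimal stand-in for the described procedure; the candidate predictions and dimensions below are simulated, not derived from the Michigan data or from actual joint models:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n, k = 500, 3  # subjects and candidate joint-model specifications

# Simulated held-out event indicators and cross-validated risk
# predictions from k hypothetical candidate models (one per column)
truth = rng.binomial(1, 0.3, size=n)
preds = np.clip(
    truth[:, None] * 0.6 + rng.normal(0.15, 0.2, size=(n, k)), 0.01, 0.99
)

def quad_loss(w):
    # Expected quadratic prediction error of the weighted ensemble
    return np.mean((preds @ w - truth) ** 2)

# Optimize weights on the probability simplex: w >= 0, sum(w) == 1
res = minimize(
    quad_loss,
    x0=np.full(k, 1 / k),
    bounds=[(0, 1)] * k,
    constraints={"type": "eq", "fun": lambda w: w.sum() - 1},
)
weights = res.x
print(weights.round(3), round(quad_loss(weights), 4))
```

For the cross-entropy metric, the objective would instead be the negative mean log-likelihood of the weighted predictions; the simplex constraint is unchanged.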
In a simulation study, we found that the super learning approach produces results very similar to those of the oracle model, i.e., the model with the best performance in the test datasets. All proposed methodology is implemented in a freely available R package.