Session: Prediction / prognostic modelling 1

Presentations

05-prediction-prognostic-1: 1
Developing a clinical prediction model with a continuous outcome: sample size calculations to target precise predictions

1Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham, United Kingdom; 2National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, United Kingdom; 3Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, United Kingdom

Background: When developing a clinical prediction model, the precision of predictions is heavily influenced by the sample size used for development. Without adequate sample sizes, models may yield predictions that are too imprecise to usefully guide clinical decisions. Previous sample size research for developing models with a continuous outcome is based on minimising overfitting and targeting precise estimation of the residual standard deviation and model intercept. However, even when meeting these criteria, the uncertainty (instability) in predictions is often considerable. We propose a new approach for calculating the sample size required to target precise individual-level predictions when developing a prediction model for a continuous outcome.

Methods: We outline a four-step approach which can be used either before data collection (based on published aggregate data) or when an existing dataset is available (e.g., from a pilot study or an existing study/database). We derive closed-form solutions that decompose the anticipated variance of individual outcome estimates into Fisher's unit information matrix, the predictor values and the total sample size.

Results: The approach allows researchers to examine anticipated interval widths of individual predictions based on one particular sample size (i.e., of a known existing dataset), or to identify the sample size needed for a new study aiming to target a certain level of precision (e.g., a new cohort study).
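The decomposition described in the Methods can be sketched for a linear model, where the anticipated variance of an individual's estimated outcome is x'(n·I1)⁻¹x with I1 the Fisher unit information matrix; inverting this gives the sample size for a target interval width. The function names and example values below are hypothetical illustrations, not the authors' exact formulae.

```python
import numpy as np

def anticipated_variance(x, unit_info, n):
    # var(x' beta_hat) decomposed as x' (n * I1)^{-1} x,
    # with I1 the Fisher unit information matrix
    return float(x @ np.linalg.solve(n * unit_info, x))

def required_n(x, unit_info, target_width, z=1.96):
    # smallest n such that the 95% uncertainty interval for this
    # individual's prediction is no wider than target_width
    v1 = float(x @ np.linalg.solve(unit_info, x))  # anticipated variance at n = 1
    return int(np.ceil(v1 * (2 * z / target_width) ** 2))

# Hypothetical example: intercept plus one standardised predictor,
# residual SD 1.5, so I1 = E[x x'] / sigma^2 under a linear model
sigma2 = 1.5 ** 2
unit_info = np.eye(2) / sigma2
x_new = np.array([1.0, 1.5])  # an individual 1.5 SDs from the predictor mean
n_req = required_n(x_new, unit_info, target_width=0.5)
width = 2 * 1.96 * np.sqrt(anticipated_variance(x_new, unit_info, n_req))
```

Note that individuals far from the centre of the predictor distribution have larger x'(n·I1)⁻¹x, so targeting precision for them drives the required sample size upwards.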
Additionally, this can be examined in particular subgroups of patients to help improve the fairness of the model. We use a real example predicting Forced Expiratory Volume (FEV) in children to showcase how the approach allows researchers to calculate and examine expected individual-level uncertainty interval widths for particular sample sizes. We also showcase our new software module pmstabilityss.

Conclusions: We derived a new approach to determine the minimum sample size required to develop a clinical prediction model with a continuous outcome that gives precise individual predictions. The approach enables researchers to assess the impact of sample size on individual-level uncertainty; to calculate the required sample size based on a specified acceptable level of uncertainty; and to examine differences in precision across subgroups to inform fairness checks.

05-prediction-prognostic-1: 2
Sequential sample size calculations for developing clinical prediction models: learning curves suggest larger datasets are needed for individual-level stability

1Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham, Birmingham, United Kingdom; 2National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, United Kingdom; 3Leuven Unit for Health Technology Assessment Research (LUHTAR), KU Leuven, Leuven, Belgium; 4Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands; 5German Cancer Research Center (DKFZ) Heidelberg, Division of Intelligent Medical Systems, Heidelberg, Germany; 6Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, United Kingdom

Background: Clinical prediction models (CPMs) estimate an individual's risk of a particular outcome to inform clinical decision-making. Small sample sizes may lead to unreliable predictions. Current model development sample size calculations are mainly conducted before data collection, leading to a fixed minimum sample size target based on sensible assumptions. However, adaptive sample size calculations can be used during data collection to sequentially examine expected model performance and identify when enough data have been collected. This study aims to extend existing sequential sample size calculations when developing a CPM by applying stopping rules based on individual-level uncertainty of estimated risks and the probability of misclassification. This is relevant for situations including prospective cohort studies with a short-term outcome.

Methods: Using a sequential approach, the model development strategy is repeated after every 100 new participants are recruited, beginning when the initial sample size reaches the minimum recommended before analysis.
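A recruit-and-refit loop of this kind can be sketched as follows. This is a simplified illustration, not the authors' implementation: it uses logistic regression fitted by Newton-Raphson, monitors the mean absolute bootstrap difference in predicted risks as the instability statistic, and all names, effect sizes and the stopping threshold are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def fit_logistic(X, y, iters=30):
    # Newton-Raphson fit of a logistic regression model
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ beta)
        H = X.T @ (X * (p * (1 - p))[:, None]) + 1e-8 * np.eye(X.shape[1])
        beta += np.linalg.solve(H, X.T @ (y - p))
    return beta

def instability(X, y, B, rng):
    # mean absolute difference between original and bootstrap predicted risks
    p0 = sigmoid(X @ fit_logistic(X, y))
    diffs = []
    for _ in range(B):
        idx = rng.integers(0, len(y), len(y))
        diffs.append(np.mean(np.abs(sigmoid(X @ fit_logistic(X[idx], y[idx])) - p0)))
    return float(np.mean(diffs))

rng = np.random.default_rng(1)
N = 1000  # hypothetical prospective cohort with two predictors
X_all = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y_all = (rng.random(N) < sigmoid(X_all @ np.array([-1.0, 0.8, -0.5]))).astype(float)

curve = []  # learning curve: instability against sample size
sizes = range(200, N + 1, 100)  # refit after every 100 recruits
for n in sizes:
    curve.append(instability(X_all[:n], y_all[:n], B=30, rng=rng))
# stop once instability falls below a stakeholder-chosen threshold (0.02 here)
stop_n = next((n for n, s in zip(sizes, curve) if s < 0.02), None)
```

Plotting `curve` against `sizes` gives the learning curve; the threshold defining `stop_n` is exactly the context-specific quantity that stakeholders would need to agree on.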
For every iteration of the model, prediction and classification instability statistics and plots are calculated using bootstrapping, alongside measures of calibration, discrimination and clinical utility. For each statistic, learning curves display the trend of estimates against sample size, and stopping rules are based on the perceived value of additional information; crucially, this is context-specific, for example, guided by the level of uncertainty and classification errors that stakeholders (e.g., patients, clinicians) are willing to accept.

Results: Our approach is illustrated using real examples, including (penalised and unpenalised) regression and machine learning approaches. The findings show that the sequential approach often leads to much larger sample sizes than the fixed sample size approach, and that learning curves based on individual-level stability typically require larger sample sizes than those focusing on population-level stability defined by overall calibration, discrimination and clinical utility. Further, what ultimately constitutes an adequate sample size is strongly dependent on the level of prediction and classification instability deemed acceptable by stakeholders.

Conclusions: For model development studies carrying out prospective data collection, an uncertainty-based sequential sample size approach allows users to dynamically monitor and identify when enough participants have been recruited to minimise prediction and classification instability in individuals. Engagement with patients and other stakeholders is crucial.

05-prediction-prognostic-1: 3
Determining the Sample Size for Risk Prediction Models with Clustered Binary Data

1Statistical Science, University College London, United Kingdom; 2Institute of Statistical Research and Training (ISRT), University of Dhaka, Bangladesh

Background: Risk prediction models are increasingly being used in clinical practice to predict health outcomes. These models are often developed using data from multiple centres (clustered data), where patient outcomes within a centre are likely to be correlated. It is important that the dataset used to develop a risk model is of an appropriate size, to avoid model overfitting problems and poor predictions in new data. Wynants et al. (2015) recommended using at least 10 events per variable (including the random parameter) to minimise bias in the regression coefficients and obtain acceptable C-statistic values when applying a random-effects model to clustered data. This approach focused only on 'median predictions', where the random intercept is ignored. More recently, Riley et al. (2020) and Pavlou et al. (2024) have proposed sample size methods for independent data targeting the predictive performance of models; however, these methods may not be appropriate for clustered data.

Methods: We conducted a full-factorial simulation to evaluate whether the Wynants method yields sample sizes sufficient for developing models with good predictive ability. Additionally, we assessed the applicability of the sample size methods proposed by Riley and Pavlou for clustered data. Simulation scenarios were investigated by varying multiple factors (e.g., degree of clustering, model strength, etc.). Model performance was evaluated using the mean absolute prediction error (MAPE), calibration slope (CS) and the c-statistic. Both overall and cluster-specific performance measures were used, and acceptable target values were specified for these measures.
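The events-per-variable starting point and a single simulation scenario can be sketched as follows (hypothetical parameter values; the actual study uses a full factorial design over clustering degree, model strength and other factors, and evaluates MAPE, CS and the c-statistic rather than just generating data):

```python
import numpy as np

def simulate_clustered_binary(n_clusters, m, beta, tau, rng):
    # random-intercept logistic model: logit p_ij = u_i + beta * x_ij,
    # with centre effects u_i ~ N(0, tau^2) inducing within-cluster correlation
    u = rng.normal(0.0, tau, n_clusters)
    x = rng.normal(size=(n_clusters, m))
    p = 1.0 / (1.0 + np.exp(-(u[:, None] + beta * x)))
    return x, (rng.random((n_clusters, m)) < p).astype(int)

def epv_min_n(n_parameters, prevalence, epv=10):
    # events-per-variable rule: epv events per parameter
    # (counting the random intercept as a parameter), converted to a total n
    return int(np.ceil(epv * n_parameters / prevalence))

rng = np.random.default_rng(7)
x, y = simulate_clustered_binary(n_clusters=20, m=50, beta=0.5, tau=1.0, rng=rng)
prevalence = y.mean()  # overall outcome prevalence in the simulated dataset
n_min = epv_min_n(n_parameters=2, prevalence=0.5)  # one slope + random intercept
```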
We developed a new sample size calculation formula for clustered data using a meta-model based on the simulation results.

Results: None of the existing methods achieved our target acceptable MAPE values. The approaches by Wynants and Riley failed to achieve a CS of at least 0.9 when the prevalence was ≥15%. All methods generally produced c-statistic values within 0.02 of their true values. The new meta-model formula generally achieved the target acceptable MAPE values. It produced a CS of at least 0.9 and c-statistic values within 0.02 of their true values when the prevalence was ≤25%.

Conclusions: Current sample size calculation methods for developing binary risk models often fail to ensure adequate predictive performance and therefore may not be suitable for clustered data. A novel sample size calculation formula that achieved good predictive performance across a range of clustered data scenarios is proposed.

05-prediction-prognostic-1: 4
Conformal prediction intervals for the individual treatment effect

1Medical University of Vienna, Austria; 2University of Vienna, Austria

Background: The analysis of randomized clinical trials typically focuses on the estimation of the average treatment effect. However, the effect of a medical treatment in a specific patient may depend on individual patient characteristics. Predictions regarding this individual treatment effect have the potential to allow for personalized treatment decisions and improve overall treatment success. To allow for an informed decision, prediction intervals are required which take into account model uncertainty and individual residual variability, and cover the true individual treatment effect with a certain probability. In the absence of strong model assumptions, the calculation of such intervals from parallel-group data is complicated by the fact that for each patient the outcome under only one treatment option can be observed and the counterfactual outcome remains unknown.

Methods: We consider the setting of a randomized clinical trial comparing an experimental treatment versus control. We propose several procedures to calculate prediction intervals for the individual treatment effect in a new patient, which use multidimensional patient characteristics and a prediction model that is fitted with data from the randomized trial. The proposed methods do not depend on the chosen prediction model; however, for illustration, we consider linear regression models and fully connected neural networks. To construct the prediction intervals for the individual treatment effect, we first use two variations of the conformal inference method [Vovk, Gammerman and Shafer. Algorithmic Learning in a Random World. Springer, New York, 2005] to construct prediction intervals for the outcome under either treatment or control.
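This first, per-arm step can be sketched with split conformal prediction, one standard variant of conformal inference. The sketch below assumes linear regression as the prediction model and uses a single simulated arm with hypothetical parameters; it illustrates the general recipe, not the authors' specific procedures.

```python
import numpy as np

def split_conformal(X_train, y_train, X_cal, y_cal, x_new, alpha=0.1):
    # split conformal prediction: fit on a proper training set, then use the
    # (1 - alpha) quantile of absolute calibration residuals as the interval radius
    beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    scores = np.sort(np.abs(y_cal - X_cal @ beta))
    k = int(np.ceil((len(y_cal) + 1) * (1 - alpha)))
    radius = scores[k - 1] if k <= len(scores) else np.inf
    pred = float(x_new @ beta)
    return pred - radius, pred + radius

# Hypothetical single-arm illustration: outcome y = 2 + x + standard normal noise
rng = np.random.default_rng(3)
X = np.column_stack([np.ones(600), rng.normal(size=600)])
y = X @ np.array([2.0, 1.0]) + rng.normal(size=600)
lo, hi = split_conformal(X[:300], y[:300], X[300:], y[300:], np.array([1.0, 0.5]))
```

Running this separately in each trial arm yields the two outcome intervals that are subsequently combined into an interval for the individual treatment effect.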
In a second step, we combine these intervals to obtain a prediction interval for the difference of the individual outcomes under treatment and control. In a simulation study, we compare the coverage probability and length of the proposed intervals using different regression models and under different assumptions regarding the distribution of individual residuals.

Results: We analytically prove a finite-sample coverage guarantee for two of the prediction interval procedures under mild assumptions on the true data-generating process. We prove asymptotic coverage for a further method that allows for narrower intervals but requires a consistent regression model and a bivariate normal distribution of the individual residuals. We further demonstrate that complex learning algorithms, such as neural networks, can lead to narrower prediction intervals than simple algorithms, such as linear regression, if the sample size is large enough.

Conclusions: The proposed methods provide robust prediction intervals for the individual treatment effect, which have the potential to support personalized treatment decisions.

05-prediction-prognostic-1: 5
Effective sample size for Cox survival models: A measure of individual uncertainty in predictions

1Department of Biomedical Data Sciences, LUMC, The Netherlands; 2Department of Clinical Epidemiology, Leiden University Medical Center, the Netherlands; 3Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, the Netherlands

Background/Introduction: Clinical prediction models are becoming increasingly popular to support shared decision-making. Most prediction models prioritize the accurate estimation and clear communication of point predictions. Uncertainty around the point prediction may be expressed by confidence intervals but is usually left out altogether. To present prediction uncertainty in an intuitive way, the concept of effective sample size may be attractive [1]. Our goal is to provide estimates of the effective sample size for individual survival predictions at a specific timepoint based on a Cox model.

Methods: Effective sample size for a patient's risk prediction is defined as the hypothetical sample size of similar patients (with respect to the model) such that the variance of the survival probability in that sample would be the same as the prediction variance. In many cases, it can be calculated as the ratio of the outcome variance conditional on the predictor values to the prediction variance. We estimate the effective sample size for Cox model-based risk predictions using standard variance formulae. We investigate the behaviour of this estimator in an illustrative clinical data set of colon cancer patients and through simulations.

Results: The variance of a risk prediction based on a Cox model depends on the variance of the estimated coefficients and the variance of the baseline hazard. The latter is impacted by censoring and is calculated from the complete dataset weighted by the linear predictors.
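The defining ratio can be illustrated directly. For a survival probability S at a timepoint t, the outcome variance conditional on the predictor values is S(1 - S), treating "alive at t" as a Bernoulli outcome. The sketch below uses hypothetical numbers and omits the Cox-specific variance formulae for the coefficients and baseline hazard.

```python
def effective_sample_size(surv_prob, pred_var):
    # ESS = outcome variance conditional on the predictors / prediction variance;
    # for a survival probability S the Bernoulli outcome variance is S * (1 - S)
    return surv_prob * (1.0 - surv_prob) / pred_var

# Sanity check: a proportion estimated from n similar patients has variance
# S * (1 - S) / n, so its effective sample size recovers n
S, n = 0.7, 80
ess = effective_sample_size(S, S * (1 - S) / n)
```

A prediction with variance equal to that of a proportion from 80 similar patients thus has an effective sample size of 80, which is the intuitive quantity communicated to patients and clinicians.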
Effective sample size for a prediction is impacted by the distribution of covariates and the censoring pattern in the development data, and by model assumptions. Patients who are more 'typical'/well-represented in the data, with covariate values close to the population mean, have higher effective sample sizes, while patients with more uncommon covariate values have lower effective sample sizes. Model assumptions, such as the proportional hazards assumption and the resulting shared baseline hazard in the Cox model, increase the effective sample size of predictions, sometimes to counterintuitively high values.

Conclusions: Effective sample size can express the statistical/sampling uncertainty of risk predictions from a Cox model for individual patients. This uncertainty measure may be more interpretable for healthcare providers or patients than estimates of variance or confidence intervals. Future studies should clarify its role in communicating uncertainty of predicted survival probabilities.

References:
[1] Thomassen, D., le Cessie, S., van Houwelingen, H. C., & Steyerberg, E. W. (2024). Effective sample size: A measure of individual uncertainty in predictions. Statistics in Medicine. https://doi.org/10.1002/sim.10018