18-biomarker-diagnostic: 1
Assessing diagnostic accuracy for three-class classification problems
Maria C. Pardo, Alba M. Franco-Pereira, Victor M. Sierra
Complutense University of Madrid, Spain
Keywords: diagnostic accuracy, power, volume under the ROC surface (VUS), overlap measure (OVL)
Background / Introduction
Diagnostic testing is an extremely important aspect of medical care. In many situations, diagnostic decisions are not binary: an early or intermediate disease stage usually occurs as individuals transition from the healthy stage to the fully diseased stage. To summarize a diagnostic test’s overall ability to discriminate three diagnostic groups simultaneously, the volume under the ROC surface (VUS) is one of the best-known measures; it generalizes the area under the ROC curve (AUC) used in the two-class problem.
However, VUS is limited in its ability to fully capture the complexities of some scenarios in the three-class problem, just as AUC is in the two-class problem. Pardo and Franco-Pereira (2025) explored the advantages of the overlap measure (OVL) over the AUC for assessing the accuracy of a medical diagnostic test in the binary case. In this work, we extend this measure to three-class classification problems. We study methods for estimating OVL for three groups under both parametric and non-parametric frameworks. Furthermore, we propose a procedure for testing its statistical significance.
We evaluate the size and power of the proposed methods for testing the utility of biomarkers drawn from normal, lognormal, and gamma distributions and mixtures of them, at different sample sizes. In most cases, our proposal is preferred to VUS.
In some situations, VUS performs very poorly, with power values approaching 0.05. Relying on VUS alone would therefore lead to the rejection of informative biomarkers. OVL outperforms VUS in these situations, making it a valuable tool in the biomarker field.
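As a rough illustration of the two summary indices discussed here, the sketch below computes an empirical VUS (the proportion of correctly ordered triples) and a parametric three-group overlap. The abstract does not state how OVL is defined for three groups, so the definition used here, the integral of the pointwise minimum of three fitted normal densities, is an assumption, and the function names `empirical_vus` and `normal_ovl3` are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def empirical_vus(x1, x2, x3):
    """Proportion of triples (a, b, c), one value per group, with a < b < c."""
    x1, x2, x3 = (np.asarray(v, dtype=float) for v in (x1, x2, x3))
    # For every value in the middle group, count group-1 values below it
    # and group-3 values above it.
    below = (x1[:, None] < x2[None, :]).sum(axis=0)
    above = (x3[:, None] > x2[None, :]).sum(axis=0)
    return float((below * above).sum()) / (x1.size * x2.size * x3.size)


def normal_ovl3(x1, x2, x3, grid_size=4001):
    """Assumed three-group overlap: integral of min(f1, f2, f3),
    with each density fitted as a normal distribution."""
    samples = [np.asarray(v, dtype=float) for v in (x1, x2, x3)]
    sds = [s.std(ddof=1) for s in samples]
    lo = min(s.min() for s in samples) - 4 * max(sds)
    hi = max(s.max() for s in samples) + 4 * max(sds)
    grid = np.linspace(lo, hi, grid_size)
    dens = np.vstack([norm.pdf(grid, s.mean(), sd) for s, sd in zip(samples, sds)])
    m = dens.min(axis=0)
    # Trapezoidal rule written out explicitly.
    return float(np.sum(0.5 * (m[1:] + m[:-1]) * np.diff(grid)))
```

For three samples drawn from the same distribution, the empirical VUS is close to the chance value 1/6 and the overlap is close to 1; for three well-separated, correctly ordered groups, the VUS is close to 1 and the overlap is close to 0.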
References: Pardo, M.C. and Franco-Pereira, A.M. (2025). Overlap measures against ROC summary indices. Statistical Science, in press.
18-biomarker-diagnostic: 2
Biological Age Estimation in the Estonian Biobank Based on NMR Metabolomics Data and Phenotype
Mara Delesa-Velina1, Krista Fischer1,2, Estonian Biobank Research Team2
1Institute of Mathematics and Statistics, University of Tartu, Estonia; 2Institute of Genomics, University of Tartu, Estonia
Background
With aging populations on the rise, there is increasing interest in studying biological age measures. NMR metabolites, small molecules involved in metabolic pathways and detected using NMR spectroscopy, have shown promise in estimating biological age. Now that more than 200,000 participants in the Estonian Biobank have NMR blood metabolite data available, we aim to develop a model for predicting all-cause mortality and estimating biological age.
A common approach to biological age estimation is regression modelling with age as the dependent variable. This approach produces biological age estimators (aging clocks) that predict an individual's age as precisely as possible. However, this does not imply that individuals whose biological age estimate exceeds their chronological age have a higher risk of disease or a shorter lifespan. An alternative approach is to define biological age so that it is directly related to the underlying risk level. We propose such an approach based on a parametric survival model.
Methods
We develop the model using the first cohort of biobank participants (n=31,359, recruited between 2002 and 2010, mean follow-up 13.3 years, SD 4.4 years). We validate the model using the second cohort of the biobank (n=118,664, recruited from 2018 onwards, mean follow-up 5.2 years, SD 0.7 years).
We employ a Cox proportional hazards model with age as a timescale and stepwise selection to identify NMR metabolite biomarkers independently associated with 10-year mortality. We model an individual’s survival probability using a parametric Gompertz distribution with NMR score, prevalent disease, and phenotype as covariates. Finally, we define survival-based biological age as the age where the individual's current survival probability, given their covariate profile, equals the survival probability of an average individual in the cohort. We estimate biological age acceleration (BAA) as the difference between biological and chronological age.
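The definition of survival-based biological age can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' model: it assumes a proportional-hazards Gompertz form with hazard a·exp(b·age)·exp(lp), survival measured from age 0, and a linear predictor `lp` centred so that `lp = 0` corresponds to the average individual. The function names and the numeric values in the usage note are hypothetical.

```python
import numpy as np


def gompertz_survival(age, a, b, lp=0.0):
    """Survival to `age` under hazard a * exp(b * age) * exp(lp)."""
    return np.exp(-(a / b) * np.expm1(b * age) * np.exp(lp))


def biological_age(age, b, lp):
    """Age t* at which the reference individual (lp = 0) has the same
    survival probability as this individual has at `age`.

    Solving S_ref(t*) = S(age | lp) gives
        exp(b * t*) - 1 = exp(lp) * (exp(b * age) - 1),
    which does not depend on the baseline rate `a`.
    """
    return np.log1p(np.exp(lp) * np.expm1(b * age)) / b


def age_acceleration(age, b, lp):
    """Biological age acceleration (BAA): biological minus chronological age."""
    return biological_age(age, b, lp) - age
```

With the illustrative value b = 0.09 per year, a 60-year-old with `lp = 0.5` gets a biological age of roughly 65.5 years, so the BAA is about 5.5 years; with `lp = 0` the biological age equals the chronological age.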
Results
The NMR score comprises 17 metabolic biomarkers and is strongly associated with mortality in both the development and validation cohorts, with HR (per SD of NMR score) of 1.78 (95% CI 1.73–1.83) and 1.79 (95% CI 1.74–1.84), respectively. The survival-based biological age estimate is symmetrically distributed around the chronological age. The BAA estimate is a powerful predictor of 5-year survival (C-index 0.762, Cox model with age timescale) in the validation cohort and remains informative for the age group over 70 (C-index 0.673).
Conclusion
The survival-based biological age estimate based on the NMR metabolite score is a more powerful predictor of mortality than chronological age adjusted for common phenotypic predictors.
18-biomarker-diagnostic: 3
Bézier curve parametric method for approximating ROC curves in the context of multiple clinical decision thresholds
Denys Prociuk1, Brendan Delaney1, Francesca Fiorentino1,2
1Imperial College London, United Kingdom; 2University of Leeds, United Kingdom
Background / Introduction
Receiver Operating Characteristic (ROC) curves are fundamental in clinical decision-making and are widely used to assess diagnostic test performance. However, selecting cut-off points to guide a clinical decision can be challenging. Traditional approaches, such as Youden’s J statistic or clinician judgment, often have limitations, especially when multiple thresholds would have better clinical utility. We propose using the Bézier curve parametric method to fit a curve to diagnostic test data and to determine cut-off points from the fitted curve’s shape and its rate of change.
Methods
We use the RECAP-V1[1] study data to demonstrate the application of the Bézier curve method. RECAP-V1 produced a ROC curve to determine which patients were at high risk of hospitalisation for COVID-19.
Using non-linear least squares, we identified the “control” points that optimise the fit of both cubic and quadratic Bézier curves. These control points were then used to identify candidate cut-off points, which were compared to thresholds derived from expert clinician judgment in RECAP-V1 (Green/Amber and Amber/Red thresholds for risk). Additionally, we examined the Bézier curve’s curvature as an alternative strategy for identifying a single optimal cut-off, allowing a comparison between the Bézier approach and Youden’s method. Sensitivity, specificity, and interval likelihood ratios (ILR) were used as performance metrics.
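The fitting step can be illustrated with a small sketch. It is only one plausible reading of the method and makes several assumptions that the abstract does not state: the quadratic curve is anchored at (0, 0) and (1, 1) so that a single control point is estimated, the residual is the vertical distance between the observed sensitivity and the curve at the same false positive rate, and the single cut-off is taken at the point of maximum curvature. The function names are hypothetical, and how the control points map to the Green/Amber and Amber/Red thresholds is not reproduced here.

```python
import numpy as np
from scipy.optimize import least_squares

T = np.linspace(0.0, 1.0, 2001)  # parameter grid along the curve


def bezier2(t, ctrl):
    """Quadratic Bezier curve from (0, 0) to (1, 1) with control point ctrl."""
    px, py = ctrl
    x = 2 * (1 - t) * t * px + t ** 2
    y = 2 * (1 - t) * t * py + t ** 2
    return x, y


def fit_quadratic_bezier(fpr, tpr, start=(0.25, 0.75)):
    """Estimate the control point by non-linear least squares."""
    fpr, tpr = np.asarray(fpr, dtype=float), np.asarray(tpr, dtype=float)

    def residuals(ctrl):
        x, y = bezier2(T, ctrl)
        # x(t) is non-decreasing for a control point inside the unit square,
        # so the curve can be read off as a function of the false positive rate.
        return np.interp(fpr, x, y) - tpr

    fit = least_squares(residuals, x0=start, bounds=([0.0, 0.0], [1.0, 1.0]))
    return fit.x


def max_curvature_point(ctrl):
    """Parameter value and (FPR, TPR) point where curvature is largest."""
    px, py = ctrl
    dx = 2 * (1 - T) * px + 2 * T * (1 - px)
    dy = 2 * (1 - T) * py + 2 * T * (1 - py)
    ddx, ddy = 2 * (1 - 2 * px), 2 * (1 - 2 * py)
    speed = np.maximum(dx ** 2 + dy ** 2, 1e-12)  # guard degenerate points
    kappa = np.abs(dx * ddy - dy * ddx) / speed ** 1.5
    i = int(np.argmax(kappa))
    x, y = bezier2(T[i], ctrl)
    return float(T[i]), float(x), float(y)
```

A call such as `fit_quadratic_bezier(fpr, tpr)` takes the empirical ROC coordinates; `max_curvature_point` then returns a candidate operating point whose sensitivity and specificity can be compared with the Youden-optimal point.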
Results
The quadratic Bézier approach yielded a Green/Amber threshold with 91% sensitivity (ILR of 0.27) and an Amber/Red threshold with 97% specificity (ILR of 6.66). The cubic method produced similar outcomes, demonstrating the robustness of the approach. Compared with expert clinical judgement, the Green/Amber threshold showed a similar sensitivity (91% vs. 90%, ILR 0.27 vs. 0.16), while the Amber/Red threshold demonstrated higher specificity (97% vs. 90%, ILR 6.66 vs. 6.00). Hence, Bézier-derived thresholds were in close agreement with those selected by clinicians. Curvature-based analysis provided an alternative single cut-off point that closely matched Youden’s J statistic.
Conclusion
Bézier curve fitting offers a robust method for selecting ROC curve cut-off points, aligning closely with expert clinical judgment. It can aid non-experts in identifying multiple thresholds for clinical decisions. For RECAP-V1, it improved sensitivity and specificity and hence could have improved clinical decision-making. Future research should explore its applicability across a range of diagnostic models.
[1] Espinosa-Gonzalez et al. Remote COVID-19 Assessment in Primary Care (RECAP) risk prediction tool: derivation and real-world validation studies. Lancet Digital Health. 2022;4(9):e646–e656. https://doi.org/10.1016/S2589-7500(22)00123-6
18-biomarker-diagnostic: 4
Deriving Cost-effective Neyman-Pearson Classifier with Multiple-Modality Detection Tools
Jiaming Qiu, Yingqi Zhao, Yingye Zheng
Fred Hutchinson Cancer Center, United States of America
Background: In binary medical decision-making, such as early disease detection, the goal is to identify patients at risk for malignant outcomes while avoiding unnecessary invasive diagnostic procedures. Neyman-Pearson (NP) classifiers are commonly used to control the false positive rate (FPR) within an acceptable threshold while maximizing the true positive rate (TPR). However, when multiple testing modalities are available, there is a challenge in balancing the benefits of comprehensive disease detection with the costs and complications of extensive testing. In prostate cancer diagnosis, for example, a biopsy is an invasive procedure that may not be necessary for many low-risk patients. Current clinical guidelines suggest the use of various biomarker tests and multiparametric magnetic resonance imaging (mpMRI) for risk stratification, yet clinicians face uncertainty in selecting and sequencing these tests.
Methods: We propose a sequential decision-making framework within the NP classifier paradigm to address these challenges in prostate cancer diagnosis. Specifically, we develop a 2-step diagnostic protocol that utilizes biomarker tests and MRI results. In the first step, patients are categorized based on biomarker test values: those with values below a low threshold are sent home without further testing, those exceeding a high threshold are referred for a biopsy, and those with intermediate values undergo MRI for additional information. In the second step, the biomarker and MRI results are combined to decide whether a biopsy is necessary for patients who underwent MRI. The objective is to minimize unnecessary biopsies, maintain an acceptable false negative rate for aggressive cancers, and reduce procedural costs by limiting the number of patients who undergo further testing.
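The two-step protocol can be sketched as a simple decision function, together with an empirical-quantile threshold that keeps the false positive rate at or below a target level. This is an illustrative sketch under stated assumptions, not the authors' estimator: the plain empirical quantile stands in for whatever NP calibration they use, and the weighted sum that combines the biomarker and MRI values, as well as every name and parameter (`np_threshold`, `two_step_decision`, `weight`, `combined_cut`), is hypothetical.

```python
import numpy as np


def np_threshold(neg_scores, alpha):
    """Smallest observed cut-off whose empirical FPR (score > cut-off)
    among non-diseased subjects is at most alpha."""
    s = np.sort(np.asarray(neg_scores, dtype=float))
    n = s.size
    m = int(np.floor(alpha * n))  # number of false positives we can afford
    if m >= n:
        return -np.inf
    return s[n - m - 1]


def two_step_decision(biomarker, low, high, mri=None, combined_cut=None, weight=1.0):
    """Step 1 triages on the biomarker; step 2 combines biomarker and MRI."""
    if biomarker < low:
        return "no further testing"
    if biomarker > high:
        return "biopsy"
    if mri is None:
        return "MRI"  # intermediate value: request imaging first
    # Hypothetical combination rule: a weighted sum of the two measurements.
    score = biomarker + weight * mri
    return "biopsy" if score > combined_cut else "no biopsy"
```

In this sketch, `combined_cut` could itself be chosen with `np_threshold` applied to the combined scores of non-diseased patients in the intermediate group, while `low` would be set from the aggressive-cancer cases so that the false negative rate stays acceptable.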
Results: The proposed sequential rule effectively minimizes false positives (unnecessary biopsies) while keeping the false negative rate for aggressive cancers within clinically acceptable limits. By selectively using tests, it reduces the number of patients who undergo subsequent procedures, leading to a reduction in overall procedural costs without compromising diagnostic accuracy. The trade-offs between limiting initial testing and controlling the false positive rate are quantified, optimizing the balance between diagnostic performance and cost efficiency.
Conclusion: The sequential decision-making protocol presented in this study offers a more cost-effective and personalized approach to prostate cancer diagnosis. By optimizing the use of biomarker tests and MRI, the method minimizes unnecessary procedures and maximizes diagnostic accuracy, providing a framework that can be adapted for other medical decision-making contexts involving multiple tests.
18-biomarker-diagnostic: 5
Improving Biomarker Diagnostic Accuracy with the Likelihood Ratio Transformation
Ainesh Sewak1, Vanda Inacio2
1University of Bern, Switzerland; 2University of Edinburgh, Scotland
Introduction
Accurate biomarker-based diagnostic and screening tests rely on the receiver operating characteristic (ROC) curve to assess classification performance. However, in some cases, the empirical ROC curve of a biomarker is improper or 'hooked', meaning it crosses below the diagonal and fails to provide a reliable decision rule. It has been established since the invention of ROC curves that mapping biomarkers to the likelihood ratio scale yields a mathematically optimal decision rule and ensures a proper ROC curve. However, despite its theoretical appeal, there is surprisingly little literature on this approach, with only a few parametric developments addressing it.
Methods
We present three models for transforming biomarkers to the likelihood ratio scale, each leading to an optimal decision rule. The parametric binormal model provides a closed-form transformation under Gaussian assumptions. This serves as a foundation for more flexible approaches. Next, the semiparametric approach leverages flexible distributional regression models, allowing for marginal density estimation without strict parametric constraints. Finally, we demonstrate how additive logistic regression can achieve the same transformation using standard binary regression techniques. For each method, we establish theoretical properties that ensure proper ROC curves and optimal classification performance.
Results
Through simulations and analysis of three biomarker datasets, we demonstrate that transforming improper biomarkers to the likelihood ratio scale consistently improves diagnostic accuracy. The improvement is most pronounced when the original ROC curve is highly improper, while for already proper ROC curves, the transformation has minimal effect.
Conclusion
The likelihood ratio transformation offers a simple and powerful solution for correcting improper ROC curves and improving biomarker diagnostic accuracy. Our results indicate that transforming to the likelihood ratio scale should be the default, especially when biomarkers exhibit improperness. This work has broad practical implications for clinical biomarker evaluation and its implementation is readily accessible using standard statistical software.
Keywords: ROC curve, likelihood ratio, biomarkers, regression, generalized additive models
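The closed-form binormal transformation mentioned in the Methods can be sketched as below. This is a minimal illustration that assumes the biomarker is normally distributed within each group and uses plug-in sample means and standard deviations; the function names `fit_binormal_log_lr` and `empirical_auc` are hypothetical, and the semiparametric and additive logistic regression variants are not covered.

```python
import numpy as np


def fit_binormal_log_lr(controls, cases):
    """Return a function mapping biomarker values to the log likelihood ratio
    log f_cases(x) - log f_controls(x) under fitted normal densities."""
    mu0, sd0 = np.mean(controls), np.std(controls, ddof=1)
    mu1, sd1 = np.mean(cases), np.std(cases, ddof=1)

    def log_lr(x):
        x = np.asarray(x, dtype=float)
        z0 = (x - mu0) / sd0
        z1 = (x - mu1) / sd1
        return np.log(sd0 / sd1) - 0.5 * z1 ** 2 + 0.5 * z0 ** 2

    return log_lr


def empirical_auc(cases, controls):
    """Mann-Whitney estimate of P(case > control), counting ties as 1/2."""
    cases = np.asarray(cases, dtype=float)[:, None]
    controls = np.asarray(controls, dtype=float)[None, :]
    return float(np.mean(cases > controls) + 0.5 * np.mean(cases == controls))
```

When the two groups share a mean but differ in spread, the raw biomarker has an AUC near 0.5 and a hooked ROC curve, whereas the log likelihood ratio is a quadratic function of the biomarker and separates the groups by distance from the centre; applying `empirical_auc` to the transformed values then gives a clearly higher AUC.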