Conference Agenda

Overview and details of the sessions of this conference.

Session Overview
Session
Meta-analysis 1
Time:
Monday, 25/Aug/2025:
2:00pm - 3:30pm

Location: ETH E27

D-BSSE, ETH, 84 seats

Presentations
10-meta-analysis-1: 1

Precision Of Treatment Hierarchy: A Metric for Quantifying Certainty in Treatment Hierarchies from Network Meta-Analysis

Augustine Wigle1, Audrey Béliveau1, Georgia Salanti2, Gerta Rücker3, Guido Schwarzer3, Dimitris Mavridis4, Adriani Nikolakopoulou5,3

1Department of Statistics and Actuarial Science, University of Waterloo, Ontario, Canada; 2Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland; 3Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany; 4Department of Primary Education, University of Ioannina, Ioannina, Greece; 5Laboratory of Hygiene, Social and Preventive Medicine and Medical Statistics, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece

Background: Network meta-analysis (NMA) is an extension of pairwise meta-analysis which facilitates the estimation of relative effects for multiple competing treatments. A hierarchy of treatments is a useful output of an NMA. Treatment hierarchies are produced using ranking metrics. Common ranking metrics include the Surface Under the Cumulative RAnking curve (SUCRA) and P-scores, the frequentist analogue of SUCRAs. Both metrics consider the size and uncertainty of the estimated treatment effects, with larger values indicating a more preferred treatment. Although SUCRAs and P-scores themselves consider uncertainty, treatment hierarchies produced by these ranking metrics are typically reported without a measure of certainty, which might be misleading to practitioners.

Methods: We propose a new metric, Precision of Treatment Hierarchy (POTH), which quantifies the certainty in producing a treatment hierarchy from SUCRAs or P-scores. The metric connects three statistical quantities: the variance of the SUCRA values, the variance of the mean rank of each treatment, and the average variance of the distribution of individual ranks for each treatment. We show how the metric can be adapted to apply to subsets of treatments in a network, for example, to quantify the certainty in the hierarchy of the top three treatments.
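The three quantities the method connects can all be read off a ranking-probability matrix. The sketch below (Python; the function name and input layout are illustrative, and the exact POTH combination of these quantities is defined in the paper, not reproduced here) shows the bookkeeping:

```python
import numpy as np

def ranking_summaries(P):
    """Summaries of a ranking-probability matrix P (K x K), where
    P[i, r] is the probability that treatment i has rank r + 1
    (rank 1 = best). Rows must sum to 1."""
    K = P.shape[0]
    ranks = np.arange(1, K + 1)
    mean_rank = P @ ranks                     # expected rank of each treatment
    sucra = (K - mean_rank) / (K - 1)         # SUCRA from the mean rank
    within_var = P @ ranks**2 - mean_rank**2  # variance of each rank distribution
    return {
        "var_sucra": np.var(sucra),           # variance of the SUCRA values
        "var_mean_rank": np.var(mean_rank),   # variance of the mean ranks
        "avg_rank_var": within_var.mean(),    # average within-treatment rank variance
    }
```

With a fully certain hierarchy (an identity matrix P), the average within-treatment rank variance is zero and all spread sits in the mean ranks, which is the limiting case a certainty metric should flag as maximal.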

Results: We calculate POTH for a database of NMAs to investigate its empirical properties, and we demonstrate its use on two published networks: a network of antifungal treatments to prevent mortality for solid organ transplant recipients (POTH=0.326) and a network of pharmacological treatments for persistent depressive disorder (POTH=0.559).

Conclusion: Although POTH was proposed specifically to summarise the certainty in treatment hierarchies derived using SUCRAs or P-scores, it can also be viewed simply as a way to summarise all the ranking probabilities in a given network. It is therefore an indicator of the certainty in any treatment hierarchy derived using a ranking metric related to the ranking probabilities. In summary, POTH provides a single, interpretable value which quantifies the degree of certainty in producing a treatment hierarchy.



10-meta-analysis-1: 2

Extending P-Scores for Ranking Diagnostic Tests in Network Meta-Analysis

Sofia Tsokani1,2, Fani Apostolidou-Kiouti1, Adriani Nikolakopoulou1,3, Areti-Angeliki Veroniki4,5, Anna-Bettina Haidich1, Dimitris Mavridis6

1Laboratory of Hygiene, Social & Preventive Medicine and Medical Statistics, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece; 2Methods Support Unit, Cochrane, London, UK; 3Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany; 4Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Unity Health Toronto, Toronto, ON, Canada; 5Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada; 6Department of Primary Education, School of Education, University of Ioannina, Ioannina, Greece

Background / Introduction

Network meta-analysis (NMA) of diagnostic test accuracy (DTA) studies allows for the comparison of multiple diagnostic tests in a single framework, considering both sensitivity and specificity. Traditional ranking metrics such as the diagnostic odds ratio (DOR) and the superiority index summarize diagnostic performance but fail to account for the correlation between sensitivity and specificity. P-scores, initially suggested for ranking interventions in NMA, provide a probabilistic measure of how likely an intervention is to be superior to any other, averaged over all competing interventions. In NMA of interventions, P-scores have been extended to accommodate multiple outcomes. Building on this advancement, we adapt P-scores to DTA-NMA, incorporating sensitivity and specificity jointly to improve diagnostic test ranking.

Methods

Our approach requires estimates of comparisons of sensitivity and specificity across all diagnostic tests within the network, as derived from a DTA-NMA model. We use logit-transformed estimates of sensitivity and specificity, along with their standard errors and the correlation between sensitivity and specificity. We extend the P-score methodology so that ranking probabilities for each test are computed by assessing the test’s superiority in both sensitivity and specificity simultaneously. To achieve this, we modified the P-score function to accommodate bivariate diagnostic measures. The final P-score rankings were compared against DOR-based and superiority-index-based rankings to assess differences in interpretation. An R package enabling implementation of P-scores in DTA-NMAs is under preparation.
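Under a bivariate-normal assumption for the logit differences, the probability of beating a competitor on both measures is an upper-orthant probability of a bivariate normal. A minimal sketch (Python rather than the authors' R package; the function name and input layout are hypothetical):

```python
import numpy as np
from scipy.stats import multivariate_normal

def bivariate_pscore(mu, cov):
    """P-scores for K diagnostic tests from pairwise logit differences.

    mu[i][j]  : 2-vector of estimated logit differences (sensitivity,
                specificity) of test i versus test j.
    cov[i][j] : 2x2 covariance of that estimate, carrying the
                sensitivity/specificity correlation.
    Each score is the probability of beating a competitor on BOTH
    measures, averaged over all competitors."""
    K = len(mu)
    scores = np.zeros(K)
    for i in range(K):
        probs = []
        for j in range(K):
            if i == j:
                continue
            # P(diff_sens > 0 and diff_spec > 0): the upper-orthant
            # probability, obtained as the CDF of the negated variable.
            p = multivariate_normal.cdf([0.0, 0.0],
                                        mean=-np.asarray(mu[i][j]),
                                        cov=cov[i][j])
            probs.append(p)
        scores[i] = np.mean(probs)
    return scores
```

For two tests with a zero estimated difference and independent errors, each test scores 0.25: it beats the other on both measures only a quarter of the time, which is how the bivariate version penalizes ties more than a univariate P-score would.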

Results

As an example, we applied an ANOVA-based DTA-NMA model to a dataset of six diagnostic tests for breast cancer recurrence (15 studies, 659 individuals, 338 with recurrence; reference standard: histological diagnosis, long-term clinical follow-up, or autopsy findings). P-score rankings identified PET/CT as the best-performing test (P-score = 0.76), followed by MRI (0.58), PET (0.51), BS (0.39), CT (0.11), and CW (0.01). This means that PET/CT has, on average, a 76% probability of outperforming each competing diagnostic test in both sensitivity and specificity. These results aligned with DOR rankings, where PET/CT had the highest DOR (88.99) and CW the lowest (7.00). However, P-scores provided additional insight by also accounting for the correlation between sensitivity and specificity, which traditional ranking methods overlook.

Conclusion

Ranking diagnostic tests in DTA-NMA is challenging due to the bivariate nature of sensitivity and specificity. Traditional ranking methods fail to distinguish between tests with high sensitivity but low specificity (or vice versa). The proposed P-score extension offers a clearer, probabilistic ranking framework for more effective comparisons.



10-meta-analysis-1: 3

Resolving conflicting treatment hierarchies across multiple outcomes in multivariate network meta-analysis

Theodoros Evrenoglou1, Anna Chaimani2, Guido Schwarzer1

1Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center-University of Freiburg, Germany; 2Oslo Center for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway

Background/Introduction: Multivariate network meta-analysis (mvNMA) extends network meta-analysis (NMA) by enabling the simultaneous comparison of multiple treatments across multiple outcomes. In addition to using indirect evidence, mvNMA accounts for the between-outcome correlation, further improving the precision and reliability of the summary treatment effect estimates. Outputs from mvNMA are typically extensive, comprising several treatment effect estimates with varying uncertainty. As in NMA, these outputs can be summarized using ranking metrics. However, in mvNMA, separate treatment hierarchies are generated for each outcome, making it challenging to identify the best treatments overall, particularly when hierarchies conflict. To date, the literature lacks proper methods for addressing conflicting treatment hierarchies. Consequently, decision-making relies solely on ad-hoc approaches based on separate NMAs for each outcome.

Methods: We introduce a novel framework for handling conflicting treatment hierarchies across outcomes in mvNMA. First, we fit a Bayesian mvNMA model to obtain outcome-specific hierarchies in terms of SUCRAs. Then, to resolve conflicts across treatment hierarchies, we adapt the VIKOR method – originally proposed for multi-criteria decision analysis – to the meta-analytic setting. This method aims to identify the best ‘compromise’ solution across the different outcome-specific hierarchies. Specifically, by combining both the weighted L1 and L∞ norms, we develop a novel amalgamated ranking metric that evaluates each treatment’s overall and worst performance across all outcomes. Here, weights represent the importance of each outcome to the decision-maker, obtained either by expert opinion or patient preferences. We further extend our method by establishing concrete mathematical criteria to determine whether a unique treatment or a set of treatments should be recommended as the ‘best’ compromise solution across outcomes.
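The textbook VIKOR construction combines a weighted L1 (overall) and a weighted worst-case regret into one compromise index. A generic sketch of that construction, applied to a SUCRA matrix (the authors' amalgamated metric adapts this idea and may differ in detail):

```python
import numpy as np

def vikor(F, weights, v=0.5):
    """Textbook VIKOR compromise ranking.

    F[i, j]  : performance of treatment i on outcome j
               (e.g. SUCRA; larger = better).
    weights  : importance of each outcome, summing to 1.
    v        : trade-off between overall and worst-case regret.
    Returns Q, where smaller Q = better compromise."""
    f_best = F.max(axis=0)
    f_worst = F.min(axis=0)
    # Normalized weighted regret of each treatment on each outcome
    D = weights * (f_best - F) / (f_best - f_worst)
    S = D.sum(axis=1)   # overall regret (weighted L1 norm)
    R = D.max(axis=1)   # worst-case regret (weighted L-infinity norm)
    S_star, S_minus = S.min(), S.max()
    R_star, R_minus = R.min(), R.max()
    Q = v * (S - S_star) / (S_minus - S_star) \
        + (1 - v) * (R - R_star) / (R_minus - R_star)
    return Q
```

In a toy two-outcome network where one treatment dominates on efficacy, another on safety, and a third does well on both, VIKOR picks the balanced third treatment, which is exactly the 'compromise' behaviour the framework exploits.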

Results: We illustrate the use of our approach through a network comparing seven treatment classes for chronic plaque psoriasis in terms of their efficacy and safety. In this example, the outcome-specific treatment hierarchies are notably conflicting, as the most efficacious treatments demonstrate reduced performance in terms of safety. By applying the proposed method, we resolved these conflicts and obtained both an amalgamated treatment hierarchy and the best compromise solution that offers optimal balance between efficacy and safety.

Conclusions: The proposed framework provides a novel method for generating amalgamated treatment hierarchies and resolving conflicting hierarchies across outcomes. It identifies either a unique solution or a set of treatments, offering clear guidance on the best compromise. By incorporating outcome weights, our method also supports decision-making based on expert opinion or patient preferences.



10-meta-analysis-1: 4

A novel approach for modelling components of complex interventions in network meta-analysis

Tianqi Yu1, Anna Chaimani2

1Center of Research in Epidemiology and Statistics, Université Paris Cité, France; 2Oslo Center for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway

Background

Complex interventions consisting of multiple potentially interacting components are increasingly encountered in many health domains. Synthesis of data on complex interventions within networks of studies through component network meta-analysis (CNMA) is advantageous since the different components are shared across several interventions, which allows the effects of the individual components to be estimated. The majority of NMAs involving complex interventions assume that the effects of the individual components are additive; namely, for an intervention consisting of components A and B, they assume that effect(A+B)=effect(A)+effect(B). This approach often lacks clinical relevance since it ignores the potential interactions leading to synergistic or antagonistic effects between the components. On the other hand, models that incorporate interactions are more flexible but face challenges with limited statistical power.
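The additivity assumption amounts to a linear model over a component design matrix. A toy sketch with hypothetical effect values:

```python
import numpy as np

# Additive CNMA as a linear model: each observed relative effect is the
# sum of its intervention's component effects. Toy, hypothetical data:
# comparisons of A, B, and A+B versus a common control.
components = ["A", "B"]
interventions = [["A"], ["B"], ["A", "B"]]
observed = np.array([0.30, 0.50, 0.80])  # effects vs control

# Design matrix: X[i, c] = 1 if intervention i contains component c
X = np.array([[c in iv for c in components] for iv in interventions],
             dtype=float)

# Under additivity, observed = X @ beta; solve by least squares
beta, *_ = np.linalg.lstsq(X, observed, rcond=None)
```

Here the A+B effect (0.80) is exactly effect(A) + effect(B), so the fit is perfect; any synergy or antagonism would show up as lack of fit, which is the limitation the abstract's interaction and mediation models address.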

Methods

We introduce a novel CNMA approach that uses ideas from mediation analysis and models complex interventions by specifying mediating pathways that capture the cumulative and sequential effects of the different components on the outcome. The primary assumption is that in studies combining multiple components, there exists a pathway of effects through which one component influences the outcome directly and/or indirectly via other components. For example, for an intervention comprising components A and B, B has a direct effect on the outcome, while A impacts the outcome both directly and indirectly through B. In interventions with three or more components, a hierarchical framework ranks components by their relative “strength”. “Stronger” components affect “weaker” ones, which mediate their effects on the outcome. These relationships are mathematically expressed using recursive equations that decompose the total effect into direct and mediated effects. The method can be implemented using both frequentist and Bayesian frameworks. In the frequentist setting, iterative algorithms such as Newton-Raphson or BFGS are used for estimation, while in the Bayesian framework, estimation is achieved through MCMC methods.
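A deliberately simplified linear version of the two-component example (all coefficients hypothetical; the authors' recursive equations are more general):

```python
# Mediation idea for an A+B intervention: B acts on the outcome
# directly, while A acts both directly and indirectly through B.
direct_A, direct_B = 0.20, 0.50  # hypothetical direct effects
alpha_AB = 0.30                  # hypothetical strength of A's influence on B

indirect_A = alpha_AB * direct_B             # A's effect mediated by B
total_AB = direct_A + direct_B + indirect_A  # decomposed total effect
```

The decomposition makes the departure from additivity explicit: the total effect exceeds effect(A) + effect(B) by the mediated term, so a zero mediation coefficient recovers the additive model as a special case.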

Results

We illustrate our method using data from 56 randomized controlled trials of psychological interventions for coronary heart disease. Compared to standard and additive models, the proposed approach yielded more precise estimates while better capturing the interactions between components.

Conclusion

Our approach offers a robust and flexible framework for modeling mechanisms in complex interventions, addressing key limitations of existing methods. However, the identification of plausible pathways requires collaboration with domain experts to ensure clinical relevance.



10-meta-analysis-1: 5

The impact of using different random effects models in meta-analysis

Kanella Panagiotopoulou1, Soodabeh Behboodi2, Jennifer Zeitlin2, Anna Chaimani3

1Université Paris Cité, Center of Research in Epidemiology and Statistics, Inserm, Paris, France; 2Université Paris Cité, Inserm, National Research Institute for Agriculture, Food and the Environment, Centre for Research in Epidemiology and Statistics, Obstetrical Perinatal and Pediatric Epidemiology Research Team, Paris, France; 3Oslo Center for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway

Background: In the majority of published meta-analyses, the random effects model is used to allow for heterogeneity across studies, assuming a normal distribution for the study effects. It has previously been claimed that, under certain conditions, the between-study normality assumption may be suboptimal. Previous simulation studies that compared several alternative random effects meta-analysis models with the normal model found small differences in bias but larger differences in coverage probability and precision. To date, the real impact on the meta-analytic results and the potential benefit from using alternative random-effects models remain unclear.
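One common frequentist implementation of the baseline normal random-effects model that the alternative specifications relax is the DerSimonian–Laird estimator, sketched here for orientation (the abstract's own comparisons are Bayesian):

```python
import numpy as np

def dl_random_effects(y, se):
    """Normal random-effects meta-analysis with the DerSimonian-Laird
    heterogeneity estimator.

    y  : study effect estimates
    se : their standard errors
    Returns (pooled estimate, its standard error, tau^2)."""
    y, se = np.asarray(y, float), np.asarray(se, float)
    w = 1.0 / se**2                  # fixed-effect (inverse-variance) weights
    mu_fe = np.sum(w * y) / np.sum(w)
    Q = np.sum(w * (y - mu_fe)**2)   # Cochran's Q statistic
    df = len(y) - 1
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (Q - df) / c)    # DL between-study variance (truncated at 0)
    w_re = 1.0 / (se**2 + tau2)      # random-effects weights
    mu_re = np.sum(w_re * y) / np.sum(w_re)
    se_re = np.sqrt(1.0 / np.sum(w_re))
    return mu_re, se_re, tau2
```

With perfectly homogeneous studies, Q falls below its degrees of freedom, tau^2 is truncated to zero, and the model collapses to the fixed-effect analysis; the skewed, mixture, and Dirichlet-process alternatives instead change the assumed shape of the between-study distribution rather than just its variance.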

Methods: To investigate the impact of adding flexibility to the between-study distribution, we used a meta-analysis of 65 cohort studies comparing the cognitive functioning between preterm- and term-born children. Despite the presence of heterogeneity among studies, none of the traditional approaches, such as subgroup analysis and meta-regression, succeeded in identifying the factors that may cause it. We compared the results between 18 different random effects models: a) models based on skewed extensions of the normal and the t-distribution, b) models based on mixtures of distributions, and c) models based on Dirichlet process (DP) priors. We also evaluated the potential of non-normal models to give insight into the true distribution of the underlying effects. Sensitivity analyses on prior distributions and on key model parameters were also conducted.

Results: We found small differences in the estimation of the mean treatment effect but larger differences for the between-study variance. Skewed and t-distribution models gave a negatively skewed, heavy-tailed and highly peaked posterior distribution for the random effects. This was in line with the results from models incorporating a test for outliers, which suggested the presence of two outlying studies. Models using DP priors revealed two main clusters of studies, suggesting that the most important effect modifiers are probably the level of birth prematurity, the use of matched or unmatched data, and the type of Intelligence Quotient (IQ) assessment test. The potential joint impact of these characteristics had not been considered in the original meta-analysis. Other mixture models, such as mixtures of two normal or t-distributions, appeared less informative.

Conclusion: Our study highlights that using various random effects models may not materially affect the summary estimates but can help explain the observed heterogeneity, provide better insight into the distribution of the underlying effects, and aid the interpretation of the findings.