42-observational-rwd-1: 1
Handling informative patient monitoring in routinely-collected data used to estimate treatment effects, with application to high-frequency hospital data
Leah Pirondini1, Karla Diaz-Ordaz2, Ruth Keogh1
1Department of Medical Statistics, London School of Hygiene and Tropical Medicine, UK; 2Department of Statistical Science, University College London, UK
Introduction and Objectives:
Routinely-collected hospital data provide opportunities to gain understanding of treatment effects that would not be feasible in randomised trials and that reflect their impact in realistic clinical practice. A challenge presented by hospital data is that measurements of patients’ clinical status are made at high frequency, on differing schedules for each patient depending on their underlying clinical status, so the timing and frequency of measurements are informative. However, many existing causal inference methods assume measurements are made at regular time intervals. The aim of this work is to evaluate methods for estimating causal effects of longitudinal treatments in the presence of informative monitoring. This is motivated by hospital data on patients in the intensive care unit and questions about optimal mechanical ventilation strategies.
Methods and Results:
We compare methods based on (i) marginal structural models fitted by inverse probability of treatment weighting (MSM-IPW), (ii) G-computation, and (iii) longitudinal targeted maximum likelihood estimation (LTMLE). We assume an underlying grid of time, such that time-dependent variables are either monitored or unmonitored at each time-point. Methods are based either on imputation of unmonitored covariate data or on adapting inverse probability weights to account for monitoring variables. We evaluate the methods in a simulation study, comparing them against simpler approaches that use last observation carried forward (LOCF) and ignore the informativeness of monitoring. Data are simulated to represent a range of realistic scenarios with time-varying treatment and covariates, in which monitoring depends on past covariate, treatment and monitoring levels. We also illustrate the methods in a real-world example using routinely-collected intensive care data from UCLH to investigate the effect of use and timing of initiation of invasive mechanical ventilation, versus non-invasive or no ventilation, on mortality.
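To make the weighting-based approach concrete, the sketch below shows a single-time-point version in Python: treatment weights are multiplied by inverse-probability-of-monitoring weights so that monitored records stand in for informatively unmonitored ones. This is a minimal illustration under assumed variable names (covariate L, monitoring indicator M, treatment A, outcome Y), not the implementation used in the study.

```python
# Minimal sketch of monitoring-adapted inverse probability weighting.
# Variable names and the single-time-point simplification are assumptions
# for illustration; the study uses a longitudinal version of this idea.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 5000

# L = underlying covariate, M = monitoring indicator, A = treatment, Y = outcome.
L = rng.normal(size=n)
M = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 0.8 * L))))   # sicker patients monitored more
A = rng.binomial(1, 1 / (1 + np.exp(-(-0.2 + 0.6 * L))))
Y = 0.3 * L - 0.5 * A + rng.normal(size=n)

df = pd.DataFrame({"L": L, "M": M, "A": A, "Y": Y})
df["L_obs"] = np.where(df["M"] == 1, df["L"], np.nan)     # L seen only when monitored
df["L_locf"] = df["L_obs"].fillna(0.0)                    # crude single-step stand-in for LOCF

# Treatment model using the observed (LOCF) covariate history.
trt = sm.Logit(df["A"], sm.add_constant(df[["L_locf"]])).fit(disp=0)
p_A = trt.predict(sm.add_constant(df[["L_locf"]]))
w_trt = np.where(df["A"] == 1, 1 / p_A, 1 / (1 - p_A))

# Monitoring model: weight monitored records by 1 / P(monitored | past).
mon = sm.Logit(df["M"], sm.add_constant(df[["A", "L_locf"]])).fit(disp=0)
p_M = mon.predict(sm.add_constant(df[["A", "L_locf"]]))
df["w"] = w_trt * np.where(df["M"] == 1, 1 / p_M, 0.0)    # analysis uses monitored rows only

monitored = df[df["M"] == 1]
msm = sm.WLS(monitored["Y"], sm.add_constant(monitored[["A"]]),
             weights=monitored["w"]).fit()
print(msm.params)  # weighted estimate of the treatment effect
```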
We show that ignoring monitoring can result in bias, the size of which depends on the informativeness of the monitoring process. All methods reduce bias compared with their naïve LOCF-based equivalents, with the LTMLE- and G-computation-based methods resulting in the smallest bias.
Conclusions:
Data with informative monitoring are common in observational studies, but there is a lack of readily-implementable methods to handle them. We describe three methods and evaluate their performance.
42-observational-rwd-1: 2
Target Trial Emulation to Duplicate Randomized Clinical Trials using Registry Data in Multiple Sclerosis
Antoine Gavoille1,2,3, Mikail Nourredine2,3, Fabien Rollot4, Romain Casey4, Sandra Vukusic1,4, Muriel Rabilloud2,3, Fabien Subtil2,3
1Hospices Civils de Lyon, Service de Neurologie, sclérose en plaques, pathologies de la myéline et neuro-inflammation, F-69677 Bron, France; 2Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, F-69100 Villeurbanne, France; 3Service de Biostatistique-Bioinformatique, Hospices Civils de Lyon, F-69003 Lyon, France; 4Observatoire Français de la Sclérose en Plaques, Centre de Recherche en Neurosciences de Lyon, INSERM 1028 et CNRS UMR 5292, F-69003 Lyon, France
Introduction: Target trial emulation (TTE) offers a rigorous framework to answer causal questions using observational data and could be of major interest to the field of multiple sclerosis (MS) research. Replicating the results of randomized clinical trials (RCTs) is a key approach to validate the TTE methodology and the data source used. In the present study, we aimed to replicate 8 RCTs evaluating the efficacy of an active disease-modifying therapy (DMT) versus a treated control group in MS using observational data from the French MS registry, and to compare different g-methods.
Method: Data were extracted in December 2023 from the Observatoire Français de la Sclérose en Plaques (OFSEP) database. For each emulated trial, we included patients who initiated one of the DMTs evaluated in the trial and met its inclusion criteria, and compared initiation of the active DMT versus the control DMT in an intention-to-treat setting. The primary outcome was the annualized relapse rate (ARR). Secondary outcomes were EDSS progression confirmed at 3 months during the study period, and new/enlarged T2 lesions and new gadolinium-enhancing T1 lesions on brain MRI during the study period. Several g-methods were applied to estimate the treatment effect adjusted for confounding between groups and corrected for censoring and missing outcome assessment: propensity score matching, inverse probability weighting (IPW), g-computation, and targeted maximum likelihood estimation (TMLE). The concordance between the treatment effects estimated in the emulated trials and in the corresponding RCTs was analyzed using predefined agreement metrics.
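As a concrete illustration of the last of these g-methods, the sketch below implements a simplified point-treatment TMLE for a binary outcome: an initial outcome model is updated along a "clever covariate" built from the propensity score before averaging the counterfactual predictions. The simulated data and variable roles are our own assumptions; the study applies longitudinal versions of these estimators.

```python
# Illustrative one-step TMLE for a binary outcome and a point treatment;
# a simplified stand-in for the g-methods described in the abstract.
import numpy as np
import statsmodels.api as sm

def expit(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(1)
n = 2000
L = rng.normal(size=n)                              # baseline confounder (assumed)
A = rng.binomial(1, expit(0.7 * L))                 # DMT initiation (assumed)
Y = rng.binomial(1, expit(-1 + 0.5 * L - 0.8 * A))  # e.g. relapse during follow-up

# Initial outcome model and counterfactual predictions.
X = np.column_stack([np.ones(n), A, L])
Q = sm.Logit(Y, X).fit(disp=0)
Q1 = Q.predict(np.column_stack([np.ones(n), np.ones(n), L]))
Q0 = Q.predict(np.column_stack([np.ones(n), np.zeros(n), L]))
QA = np.where(A == 1, Q1, Q0)

# Propensity model and clever covariate.
g = sm.Logit(A, np.column_stack([np.ones(n), L])).fit(disp=0)
gA = g.predict(np.column_stack([np.ones(n), L]))
H = np.where(A == 1, 1 / gA, -1 / (1 - gA))

# Targeting step: fluctuate the initial fit along the clever covariate.
eps = sm.GLM(Y, H.reshape(-1, 1), family=sm.families.Binomial(),
             offset=np.log(QA / (1 - QA))).fit().params[0]
Q1s = expit(np.log(Q1 / (1 - Q1)) + eps / gA)
Q0s = expit(np.log(Q0 / (1 - Q0)) - eps / (1 - gA))
print("ATE (risk difference):", np.mean(Q1s - Q0s))
```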
Results: A total of 14 111 patients were included in the 8 emulated trials: ASSESS, BEYOND, CONFIRM, OPERA, REGARD, RIFUND-MS, TENERE, and TRANSFORMS. Emulated trial estimates were concordant with the RCT results in 7 of 8 trials for relapse rate, and in all 6 trials that evaluated EDSS progression. Radiological outcomes were more challenging to replicate, with concordance achieved in 3 of 5 trials for the analysis of new T2 lesions and in 1 of 4 trials for new gadolinium-enhancing T1 lesions. Among g-methods, TMLE provided estimates most consistent with the RCTs, while IPW and g-computation yielded comparable results but diverged in trials with fewer patients. Matching-based estimates showed higher variance and greater deviation from TMLE at smaller sample sizes.
Conclusion: The use of a TTE methodology applied to the OFSEP registry data is a valid and powerful tool for evaluating treatment effectiveness in MS. Our results support the use of real-world evidence to explore questions beyond the scope of RCTs.
42-observational-rwd-1: 3
Exploring Synthetic Control Data Quality Between Data Types in Two Case Studies: COVID-19 and Crohn’s Disease
Nicole Ann Cizauskas, Svetlana Cherlin, James Wason
Newcastle University, United Kingdom
Introduction
Synthetic control arms are useful in clinical trials with a restricted number of participants available, such as for rare diseases. The current literature on creating synthetic controls suggests that randomised controlled trial (RCT) data are the best data source compared with observational study data or external data. This paper aims to provide a method for measuring and comparing the quality of synthetic control data between data types, using two metrics: treatment effect maintenance and standardised mean difference (SMD).
Methods
Two case study datasets were selected to illustrate this: COVID-19 and Crohn’s disease. For each case study, RCT data, observational study data, and external real-world data were selected and compared. Datasets were simulated from summary-level data from real studies, and synthetic data were produced from these simulated datasets. Four scenarios with differing sample sizes were simulated to test the effect of sample size on synthesis quality. Three data synthesis methods were compared: classification and regression tree (CART) models, linear/logistic regression, and random sampling. The treatment effect on the disease outcome was measured using a chi-squared test. SMD was calculated between each simulated variable and its corresponding synthetic variable in each dataset, as sketched below. SMD was also calculated between corresponding variables of different data types (RCT, observational, and external) and compared across both simulated and synthetic datasets.
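The SMD metric itself is simple to compute; a minimal sketch follows, with illustrative numbers rather than the case study data.

```python
# Minimal sketch of the standardised mean difference (SMD) between a
# simulated variable and its synthetic counterpart; data are illustrative.
import numpy as np

def smd(x, y):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = np.sqrt((np.var(x, ddof=1) + np.var(y, ddof=1)) / 2)
    return (np.mean(x) - np.mean(y)) / pooled_sd

rng = np.random.default_rng(0)
simulated = rng.normal(50, 10, size=500)   # e.g. age in a simulated dataset
synthetic = rng.normal(51, 10, size=500)   # corresponding synthetic variable
print(f"SMD = {smd(simulated, synthetic):.3f}")  # values near 0 suggest good quality
```

An SMD below roughly 0.1 is conventionally taken to indicate a negligible difference between distributions.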
Results
The metrics showed little difference in quality between RCT data and other data types in the two disease case studies tested. There were no notable differences between sample size scenarios or methods of data synthesis in either treatment effect maintenance or SMD. Quality did fluctuate across synthetic datasets, but not in an identifiable pattern.
Discussion
Future studies looking to use synthetic controls should not disregard observational study or external data when creating synthetic controls, but should check the quality of any synthetic control groups created, regardless of the data source. Testing this method on other disease datasets would provide a better understanding of how data type influences synthetic data quality.
42-observational-rwd-1: 4
The most appropriate method for outlier detection in a clinical audit depends on the data distribution
Anqi Sui, Menelaos Pavlou, Rumana Z. Omar, Gareth Ambler
University College London, United Kingdom
Introduction
Monitoring the clinical performance of healthcare units (e.g. hospitals, surgeons) is essential for national audits, particularly in identifying 'outlier' units whose performance (e.g. probability of in-hospital death) deviates significantly from expected performance. Detecting and managing outliers is crucial for improving healthcare quality.
Common methods for outlier detection include the Common Mean Model (CMM) and Random Effects Logistic Regression (RELR). Our study evaluates their performance through simulation and provides recommendations for their appropriate use.
Methods
CMM assumes that the probability of death is the same in all units, attributing any observed differences to random binomial variation. As the observed variability is often larger than expected (overdispersion), CMM is applied with an overdispersion correction. To detect outliers, test statistics are constructed based on differences between observed and expected unit mortality; these are assumed to follow a normal distribution for 'in-control' units. In contrast, RELR uses test statistics based on the estimated random effects, which are on the logit scale and assumed to follow a normal distribution for 'in-control' units. These two normality assumptions cannot both hold unless the outcome prevalence is close to 0.5.
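A sketch of the CMM side of this construction, under assumed notation (unit i with n_i patients and d_i deaths), is given below; this is our own illustration, not the audit software.

```python
# Sketch of CMM-style outlier detection with a simple multiplicative
# overdispersion correction. The data-generating model here is RELR-like
# (normal random effects on the logit scale), so the two assumptions clash.
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
units = pd.DataFrame({"n": rng.integers(50, 500, size=40)})
true_logit = -2.5 + rng.normal(scale=0.3, size=40)        # RELR-type DGM
units["d"] = rng.binomial(units["n"], 1 / (1 + np.exp(-true_logit)))

# CMM z-statistic: observed vs pooled proportion on the probability scale.
p_bar = units["d"].sum() / units["n"].sum()
units["z"] = ((units["d"] / units["n"] - p_bar)
              / np.sqrt(p_bar * (1 - p_bar) / units["n"]))

# Multiplicative overdispersion correction, then flag at the nominal 5% level.
phi = np.mean(units["z"] ** 2)
units["z_adj"] = units["z"] / np.sqrt(phi)
units["outlier"] = units["z_adj"].abs() > 1.96
print(units[["n", "d", "z_adj", "outlier"]].head())
```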
To assess the performance of these methods when their assumptions are violated, we simulated scenarios with varying numbers of units, unit sizes, outcome prevalences, and levels of variability between units. Two data-generating mechanisms (DGMs) were used, based on CMM and RELR respectively. The performance of each method was assessed focusing on the overall false positive rate (FPR) and the FPR for 'good' and 'bad' (low/high mortality) outliers separately.
Results
Both methods appeared to work well, achieving the nominal overall FPR. However, the FPR for good and bad outliers deviated from the nominal level when the DGM was not aligned with the outlier detection method. When outcome prevalence was low, applying CMM to RELR-DGM data led to over-detection of bad outliers and under-detection of good outliers (and vice versa). These issues were exacerbated by small unit sizes and greater variability between units. Both methods were also applied to real datasets with low prevalence; the differences observed between their results can be attributed to the findings above.
Conclusion
CMM and RELR are widely used in clinical audits for outlier detection. Our findings reveal that violations of their underlying assumptions can have serious implications, potentially leading to unfair scrutiny of healthcare units or failure to flag underperforming units. The most appropriate method should be chosen after checking the distribution of the test statistics, e.g. using appropriate diagnostic tools.
42-observational-rwd-1: 5
Incorporating real-world data to refine the calculation of probability of success
Bergas Fayyad1,2, Laura Rodwell1, Kit Roes1, Giulia Ferrannini2, Christian Basile2, Lars Lund2, Gianluigi Savarese2, Aysun Cetinyurek-Yavuz1
1Radboud University Medical Center, Netherlands; 2Karolinska Institutet, Sweden
Background: In drug development, several trials are required to progress to confirmatory evaluation. A crucial milestone is the decision on whether to proceed to phase III based on phase II results. Several quantitative methods have been developed to inform this decision, one of the more widely used being the probability of success (PoS). In some cases, the endpoints used in the phase II and phase III trials differ. Several approaches have been proposed to address this difference; however, a recent review [1] highlighted the potential of using real-world data (RWD) for this purpose. We focus on the case where the phase II trial uses a continuous biomarker endpoint while the planned phase III trial uses a survival endpoint. We propose a method to construct the “design prior” for the primary survival endpoint which incorporates the association between the biomarker and the survival endpoint estimated from RWD.
Methods: The association between the biomarker and the survival endpoint is first obtained through a Cox proportional hazards model using registry data. This association is then combined with the biomarker treatment effect estimate from the phase II trial to obtain the treatment effect on the survival endpoint. This approach can also incorporate a prior distribution directly on the hazard ratio of the survival endpoint if the information is available (e.g., from phase II). We demonstrated this approach using the Swedish Heart Failure Registry. We compared the impact of important data-related decisions when using registry data, including the timing of follow-up and of the biomarker measurement, the choice of endpoint, and the relevant patient population.
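A hedged sketch of this construction is shown below: the registry log hazard ratio per unit of biomarker is combined with the phase II biomarker effect to give a normal design prior on the phase III log hazard ratio, and the PoS follows as a prior-averaged (assurance-type) probability of a significant phase III result. All inputs and the normal approximations are illustrative assumptions, not estimates from the Swedish Heart Failure Registry.

```python
# Sketch of the design-prior construction and PoS (assurance) calculation.
# All numbers are illustrative assumptions.
import numpy as np
from scipy import stats

# Registry step: log-HR per unit of biomarker from a Cox model (with SE).
beta, se_beta = 0.10, 0.02

# Phase II step: treatment effect on the continuous biomarker (with SE).
delta, se_delta = -2.0, 0.5

# Implied design prior on the phase III log-HR (delta method for the variance).
mu_prior = beta * delta
sd_prior = np.sqrt(delta**2 * se_beta**2 + beta**2 * se_delta**2)

# Phase III design: planned events give SE(log-HR) ~= sqrt(4 / events).
events = 400
se3 = np.sqrt(4 / events)
crit = stats.norm.ppf(0.025)          # one-sided 2.5% threshold favouring treatment

# PoS = prior-averaged probability that the phase III estimate is significant.
pos = stats.norm.cdf((crit * se3 - mu_prior) / np.sqrt(se3**2 + sd_prior**2))
print(f"Design prior: logHR ~ N({mu_prior:.3f}, {sd_prior:.3f}^2); PoS = {pos:.2f}")
```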
Results and conclusion: With a well-established registry, it was possible to derive estimates of the PoS, including for relevant subgroups. The settings used for the registry data had an impact on the association between the biomarker and the survival endpoint, and thus on the PoS. Of the registry-related aspects, the choice of the patient subset had the largest impact. The change in PoS due to the inclusion of a prior distribution directly on the hazard ratio was larger than that due to any specification related to the registry data. This work provides a methodological solution for incorporating registry data in PoS calculations to aid decision-making.
References:
1. Cetinyurek Yavuz, A., et al., On the Concepts, Methods, and Use of “Probability of Success” for Drug Development Decision-Making: A Scoping Review. Clinical Pharmacology & Therapeutics, 2025.