42-observational-rwd-1: 1
Handling informative patient monitoring in routinely-collected data used to estimate treatment effects, with application to high-frequency hospital data
Leah Pirondini1, Karla Diaz-Ordaz2, Ruth Keogh1
1Department of Medical Statistics, London School of Hygiene and Tropical Medicine, UK; 2Department of Statistical Science, University College London, UK
Introduction and Objectives:
Routinely-collected hospital data provide opportunities to gain understanding of treatment effects that would not be feasible in randomised trials and that reflect their impact in realistic clinical practice. A challenge presented by hospital data is that measurements of patients’ clinical status are made at high frequency, on differing schedules for each patient depending on their underlying clinical status, so the timing and frequency of measurements are informative. However, many existing causal inference methods assume measurements are made at regular time intervals. The aim of this work is to evaluate methods for estimating causal effects of longitudinal treatments in the presence of informative monitoring. This is motivated by hospital data on patients in the intensive care unit and questions about optimal mechanical ventilation strategies.
Methods and Results:
We compare methods based on (i) marginal structural models fitted by inverse probability of treatment weighting (MSM-IPW), (ii) G-computation, and (iii) longitudinal targeted maximum likelihood estimation (LTMLE). We assume an underlying grid of time, such that time-dependent variables are either monitored or unmonitored at each time-point. Methods are based either on imputation of unmonitored covariate data or on adapting inverse probability weights to account for monitoring variables. We evaluate the methods in a simulation study, comparing them against simpler approaches that use last observation carried forward (LOCF) and ignore the informativeness of monitoring. Data are simulated to represent a range of realistic scenarios with time-varying treatment and covariates, in which monitoring depends on past covariate, treatment and monitoring levels. We also illustrate the methods in a real-world example using routinely-collected intensive care data from UCLH to investigate the effect of use and timing of initiation of invasive mechanical ventilation, versus non-invasive or no ventilation, on mortality.
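To make the weighting-based approach concrete, the sketch below shows a single-time-point version in Python: treatment weights are multiplied by inverse-probability-of-monitoring weights so that monitored records stand in for informatively unmonitored ones. This is a minimal illustration under assumed variable names (covariate L, monitoring indicator M, treatment A, outcome Y), not the implementation used in the study.

```python
# Minimal sketch of monitoring-adapted inverse probability weighting.
# Variable names and the single-time-point simplification are assumptions
# for illustration; the study uses a longitudinal version of this idea.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 5000

# L = underlying covariate, M = monitoring indicator, A = treatment, Y = outcome.
L = rng.normal(size=n)
M = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 0.8 * L))))   # sicker patients monitored more
A = rng.binomial(1, 1 / (1 + np.exp(-(-0.2 + 0.6 * L))))
Y = 0.3 * L - 0.5 * A + rng.normal(size=n)

df = pd.DataFrame({"L": L, "M": M, "A": A, "Y": Y})
df["L_obs"] = np.where(df["M"] == 1, df["L"], np.nan)     # L seen only when monitored
df["L_locf"] = df["L_obs"].fillna(0.0)                    # crude single-step stand-in for LOCF

# Treatment model using the observed (LOCF) covariate history.
trt = sm.Logit(df["A"], sm.add_constant(df[["L_locf"]])).fit(disp=0)
p_A = trt.predict(sm.add_constant(df[["L_locf"]]))
w_trt = np.where(df["A"] == 1, 1 / p_A, 1 / (1 - p_A))

# Monitoring model: weight monitored records by 1 / P(monitored | past).
mon = sm.Logit(df["M"], sm.add_constant(df[["A", "L_locf"]])).fit(disp=0)
p_M = mon.predict(sm.add_constant(df[["A", "L_locf"]]))
df["w"] = w_trt * np.where(df["M"] == 1, 1 / p_M, 0.0)    # analysis uses monitored rows only

monitored = df[df["M"] == 1]
msm = sm.WLS(monitored["Y"], sm.add_constant(monitored[["A"]]),
             weights=monitored["w"]).fit()
print(msm.params)  # weighted estimate of the treatment effect
```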
We show that ignoring monitoring can result in bias, the size of which depends on the informativeness of the monitoring process. All methods reduce bias compared with their naïve LOCF-based equivalents, with the LTMLE- and G-computation-based methods resulting in the smallest bias.
Conclusions:
Data with informative monitoring are common in observational studies, but there is a lack of readily-implementable methods to handle them. We describe three methods and evaluate their performance.
42-observational-rwd-1: 2
Target Trial Emulation to Duplicate Randomized Clinical Trials using Registry Data in Multiple Sclerosis
Antoine Gavoille1,2,3, Mikail Nourredine2,3, Fabien Rollot4, Romain Casey4, Sandra Vukusic1,4, Muriel Rabilloud2,3, Fabien Subtil2,3
1Hospices Civils de Lyon, Service de Neurologie, sclérose en plaques, pathologies de la myéline et neuro-inflammation, F-69677 Bron, France; 2Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, F-69100 Villeurbanne, France; 3Service de Biostatistique-Bioinformatique, Hospices Civils de Lyon, F-69003 Lyon, France; 4Observatoire Français de la Sclérose en Plaques, Centre de Recherche en Neurosciences de Lyon, INSERM 1028 et CNRS UMR 5292, F-69003 Lyon, France
Introduction: Target trial emulation (TTE) offers a rigorous framework to answer causal questions using observational data and could be of major interest to the field of multiple sclerosis (MS) research. Replicating the results of randomized clinical trials (RCTs) is a key approach to validate the TTE methodology and the data source used. In the present study, we aimed to replicate 8 RCTs evaluating the efficacy of an active disease-modifying therapy (DMT) versus a treated control group in MS using observational data from the French MS registry, and to compare different g-methods.
Method: Data were extracted in December 2023 from the Observatoire Français de la Sclérose en Plaques (OFSEP) database. For each emulated trial, we included patients who initiated one of the DMTs evaluated in the trial and met its inclusion criteria, and compared initiation of the active DMT versus the control DMT in an intention-to-treat setting. The primary outcome was the annualized relapse rate (ARR). Secondary outcomes were EDSS progression confirmed at 3 months during the study period, and new/enlarged T2 lesions and new gadolinium-enhancing T1 lesions on brain MRI during the study period. Several g-methods were applied to estimate the treatment effect adjusted for confounding between groups and corrected for censoring and missing outcome assessment: propensity score matching, inverse probability weighting (IPW), g-computation, and targeted maximum likelihood estimation (TMLE). The concordance between the treatment effects estimated in the emulated trials and in the corresponding RCTs was analyzed using predefined agreement metrics.
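As a concrete illustration of the last of these g-methods, the sketch below implements a simplified point-treatment TMLE for a binary outcome: an initial outcome model is updated along a "clever covariate" built from the propensity score before averaging the counterfactual predictions. The simulated data and variable roles are our own assumptions; the study applies longitudinal versions of these estimators.

```python
# Illustrative one-step TMLE for a binary outcome and a point treatment;
# a simplified stand-in for the g-methods described in the abstract.
import numpy as np
import statsmodels.api as sm

def expit(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(1)
n = 2000
L = rng.normal(size=n)                              # baseline confounder (assumed)
A = rng.binomial(1, expit(0.7 * L))                 # DMT initiation (assumed)
Y = rng.binomial(1, expit(-1 + 0.5 * L - 0.8 * A))  # e.g. relapse during follow-up

# Initial outcome model and counterfactual predictions.
X = np.column_stack([np.ones(n), A, L])
Q = sm.Logit(Y, X).fit(disp=0)
Q1 = Q.predict(np.column_stack([np.ones(n), np.ones(n), L]))
Q0 = Q.predict(np.column_stack([np.ones(n), np.zeros(n), L]))
QA = np.where(A == 1, Q1, Q0)

# Propensity model and clever covariate.
g = sm.Logit(A, np.column_stack([np.ones(n), L])).fit(disp=0)
gA = g.predict(np.column_stack([np.ones(n), L]))
H = np.where(A == 1, 1 / gA, -1 / (1 - gA))

# Targeting step: fluctuate the initial fit along the clever covariate.
eps = sm.GLM(Y, H.reshape(-1, 1), family=sm.families.Binomial(),
             offset=np.log(QA / (1 - QA))).fit().params[0]
Q1s = expit(np.log(Q1 / (1 - Q1)) + eps / gA)
Q0s = expit(np.log(Q0 / (1 - Q0)) - eps / (1 - gA))
print("ATE (risk difference):", np.mean(Q1s - Q0s))
```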
Results: A total of 14 111 patients were included in the 8 emulated trials: ASSESS, BEYOND, CONFIRM, OPERA, REGARD, RIFUND-MS, TENERE, and TRANSFORMS. Emulated trial estimates were concordant with the RCT results in 7 of 8 trials for relapse rate, and in all 6 trials that evaluated EDSS progression. Radiological outcomes were more challenging to replicate, with concordance achieved in 3 of 5 trials for the analysis of new T2 lesions and in 1 of 4 trials for new gadolinium-enhancing T1 lesions. Among g-methods, TMLE provided estimates most consistent with the RCTs, while IPW and g-computation yielded comparable results but diverged in trials with fewer patients. Matching-based estimates showed higher variance and greater deviation from TMLE at smaller sample sizes.
Conclusion: The use of a TTE methodology applied to the OFSEP registry data is a valid and powerful tool for evaluating treatment effectiveness in MS. Our results support the use of real-world evidence to explore questions beyond the scope of RCTs.
42-observational-rwd-1: 3
Exploring Synthetic Control Data Quality Between Data Types in Two Case Studies: COVID-19 and Crohn’s Disease
Nicole Ann Cizauskas, Svetlana Cherlin, James Wason
Newcastle University, United Kingdom
Introduction
Synthetic control arms are useful in clinical trials with a restricted number of participants available, such as for rare diseases. The current literature on creating synthetic controls suggests that randomised controlled trial (RCT) data are the best data source compared with observational study data or external data. This paper aims to provide a method for measuring and comparing the quality of synthetic control data between data types, using two metrics: treatment effect maintenance and standardised mean difference (SMD).
Methods
Two case study datasets were selected to illustrate this: COVID-19 and Crohn’s disease. For each case study, RCT data, observational study data, and external real-world data were selected and compared. Datasets were simulated from summary-level data from real studies, and synthetic data were produced from these simulated datasets. Four scenarios with differing sample sizes were simulated to test the effect of sample size on synthesis quality. Three data synthesis methods were compared: classification and regression tree (CART) models, linear/logistic regression, and random sampling. The treatment effect on the disease outcome was measured using a chi-squared test. SMD was calculated between each simulated variable and its corresponding synthetic variable in each dataset, as sketched below. SMD was also calculated between corresponding variables of different data types (RCT, observational, and external) and compared across both simulated and synthetic datasets.
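The SMD metric itself is simple to compute; a minimal sketch follows, with illustrative numbers rather than the case study data.

```python
# Minimal sketch of the standardised mean difference (SMD) between a
# simulated variable and its synthetic counterpart; data are illustrative.
import numpy as np

def smd(x, y):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = np.sqrt((np.var(x, ddof=1) + np.var(y, ddof=1)) / 2)
    return (np.mean(x) - np.mean(y)) / pooled_sd

rng = np.random.default_rng(0)
simulated = rng.normal(50, 10, size=500)   # e.g. age in a simulated dataset
synthetic = rng.normal(51, 10, size=500)   # corresponding synthetic variable
print(f"SMD = {smd(simulated, synthetic):.3f}")  # values near 0 suggest good quality
```

An SMD below roughly 0.1 is conventionally taken to indicate a negligible difference between distributions.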
Results
The metrics showed little difference in quality between RCT data and other data types in the two disease case studies tested. There were no notable differences between sample size scenarios or methods of data synthesis in either treatment effect maintenance or SMD. Quality did fluctuate across synthetic datasets, but not in an identifiable pattern.
Discussion
Future studies looking to use synthetic controls should not disregard observational study or external data when creating synthetic controls, but should check the quality of any synthetic control groups created, regardless of the data source. Testing this method on other disease datasets would provide a better understanding of how data type influences synthetic data quality.
42-observational-rwd-1: 4
The most appropriate method for outlier detection in a clinical audit depends on the data distribution
Anqi Sui, Menelaos Pavlou, Rumana Z. Omar, Gareth Ambler
University College London, United Kingdom
Introduction
Monitoring the clinical performance of healthcare units (e.g. hospitals, surgeons) is essential for national audits, particularly in identifying 'outlier' units whose performance (e.g. probability of in-hospital death) deviates significantly from expected performance. Detecting and managing outliers is crucial for improving healthcare quality.
Common methods for outlier detection include the Common Mean Model (CMM) and Random Effects Logistic Regression (RELR). Our study evaluates their performance through simulation and provides recommendations for their appropriate use.
Methods
CMM assumes that the probability of death is the same in all units, attributing any observed differences to random binomial variation. As the observed variability is often larger than expected (overdispersion), CMM is applied with an overdispersion correction. To detect outliers, test statistics are constructed based on differences between observed and expected unit mortality; these are assumed to follow a normal distribution for 'in-control' units. In contrast, RELR uses test statistics based on the estimated random effects, which are on the logit scale and assumed to follow a normal distribution for 'in-control' units. These two normality assumptions cannot both hold unless the outcome prevalence is close to 0.5.
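A sketch of the CMM side of this construction, under assumed notation (unit i with n_i patients and d_i deaths), is given below; this is our own illustration, not the audit software.

```python
# Sketch of CMM-style outlier detection with a simple multiplicative
# overdispersion correction. The data-generating model here is RELR-like
# (normal random effects on the logit scale), so the two assumptions clash.
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
units = pd.DataFrame({"n": rng.integers(50, 500, size=40)})
true_logit = -2.5 + rng.normal(scale=0.3, size=40)        # RELR-type DGM
units["d"] = rng.binomial(units["n"], 1 / (1 + np.exp(-true_logit)))

# CMM z-statistic: observed vs pooled proportion on the probability scale.
p_bar = units["d"].sum() / units["n"].sum()
units["z"] = ((units["d"] / units["n"] - p_bar)
              / np.sqrt(p_bar * (1 - p_bar) / units["n"]))

# Multiplicative overdispersion correction, then flag at the nominal 5% level.
phi = np.mean(units["z"] ** 2)
units["z_adj"] = units["z"] / np.sqrt(phi)
units["outlier"] = units["z_adj"].abs() > 1.96
print(units[["n", "d", "z_adj", "outlier"]].head())
```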
To assess the performance of these methods when their assumptions are violated, we simulated scenarios with varying numbers of units, unit sizes, outcome prevalences, and levels of variability between units. Two data-generating mechanisms (DGMs) were used, based on CMM and RELR respectively. The performance of each method was assessed focusing on the overall false positive rate (FPR) and the FPR for 'good' and 'bad' (low/high mortality) outliers separately.
Results
Both methods appeared to work well, achieving the nominal overall FPR. However, the FPR for good and bad outliers deviated from the nominal level when the DGM was not aligned with the outlier detection method. When outcome prevalence was low, applying CMM to RELR-DGM data led to over-detection of bad outliers and under-detection of good outliers (and vice versa). These issues were exacerbated by small unit sizes and greater variability between units. Both methods were also applied to real datasets with low prevalence; the differences observed between their results can be attributed to the findings above.
Conclusion
CMM and RELR are widely used in clinical audits for outlier detection. Our findings reveal that violations of their underlying assumptions can have serious implications, potentially leading to unfair scrutiny of healthcare units or failure to flag underperforming units. The most appropriate method should be chosen after checking the distribution of the test statistics, e.g. using appropriate diagnostic tools.
42-observational-rwd-1: 5
Incorporating real-world data to refine the calculation of probability of success
Bergas Fayyad1,2, Laura Rodwell1, Kit Roes1, Giulia Ferrannini2, Christian Basile2, Lars Lund2, Gianluigi Savarese2, Aysun Cetinyurek-Yavuz1
1Radboud University Medical Center, Netherlands; 2Karolinska Institutet, Sweden
Background: In drug development, several trials are required to progress to confirmatory evaluation. A crucial milestone is the decision on whether to proceed to phase III based on phase II results. Several quantitative methods have been developed to inform this decision, one of the more widely used being the probability of success (PoS). In some cases, the endpoints used in the phase II and phase III trials differ. Several approaches have been proposed to address this difference; however, a recent review [1] highlighted the potential of using real-world data (RWD) for this purpose. We focus on the case where the phase II trial uses a continuous biomarker endpoint while the planned phase III trial uses a survival endpoint. We propose a method to construct the “design prior” for the primary survival endpoint which incorporates the association between the biomarker and the survival endpoint estimated from RWD.
Methods: The association between the biomarker and the survival endpoint is first obtained through a Cox proportional hazards model using registry data. This association is then combined with the biomarker treatment effect estimate from the phase II trial to obtain the treatment effect on the survival endpoint. This approach can also incorporate a prior distribution directly on the hazard ratio of the survival endpoint if the information is available (e.g., from phase II). We demonstrated this approach using the Swedish Heart Failure Registry. We compared the impact of important data-related decisions when using registry data, including the timing of follow-up and of the biomarker measurement, the choice of endpoint, and the relevant patient population.
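A hedged sketch of this construction is shown below: the registry log hazard ratio per unit of biomarker is combined with the phase II biomarker effect to give a normal design prior on the phase III log hazard ratio, and the PoS follows as a prior-averaged (assurance-type) probability of a significant phase III result. All inputs and the normal approximations are illustrative assumptions, not estimates from the Swedish Heart Failure Registry.

```python
# Sketch of the design-prior construction and PoS (assurance) calculation.
# All numbers are illustrative assumptions.
import numpy as np
from scipy import stats

# Registry step: log-HR per unit of biomarker from a Cox model (with SE).
beta, se_beta = 0.10, 0.02

# Phase II step: treatment effect on the continuous biomarker (with SE).
delta, se_delta = -2.0, 0.5

# Implied design prior on the phase III log-HR (delta method for the variance).
mu_prior = beta * delta
sd_prior = np.sqrt(delta**2 * se_beta**2 + beta**2 * se_delta**2)

# Phase III design: planned events give SE(log-HR) ~= sqrt(4 / events).
events = 400
se3 = np.sqrt(4 / events)
crit = stats.norm.ppf(0.025)          # one-sided 2.5% threshold favouring treatment

# PoS = prior-averaged probability that the phase III estimate is significant.
pos = stats.norm.cdf((crit * se3 - mu_prior) / np.sqrt(se3**2 + sd_prior**2))
print(f"Design prior: logHR ~ N({mu_prior:.3f}, {sd_prior:.3f}^2); PoS = {pos:.2f}")
```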
Results and conclusion: With a well-established registry, it was possible to derive estimates of the PoS, including for relevant subgroups. The settings used for the registry data had an impact on the association between the biomarker and the survival endpoint, and thus on the PoS. Of the registry-related aspects, the choice of the patient subset had the largest impact. The change in PoS due to the inclusion of a prior distribution directly on the hazard ratio was larger than that due to any specification related to the registry data. This work provides a methodological solution for incorporating registry data in PoS calculations to aid decision-making.
References:
1. Cetinyurek Yavuz, A., et al., On the Concepts, Methods, and Use of “Probability of Success” for Drug Development Decision-Making: A Scoping Review. Clinical Pharmacology & Therapeutics, 2025.