Conference Agenda

Session

Observational/real-world data 2

Time:

Wednesday, 27/Aug/2025:

4:00pm - 5:30pm

Session Chair: Sabine Hoffmann

Location: ETH E21

D-BSSE, ETH, 54 seats

Presentations

48-observational-rwd-2: 1

Group measurement invariance assessment with item-level latent variable models: A comparative simulation study of two methods

Myriam Blanchin¹, Odile Stahl¹, Yseulys Dubuy¹, Véronique Sébille¹,²

¹Nantes Université, Université de Tours, INSERM, UMR1246 SPHERE « methodS in Patient-centered outcomes and HEalth ResEarch », Nantes, France; ²CHU Nantes, DRCI, Methodology and Biostatistics Department, Nantes, France

Introduction: Assessing measurement invariance is essential when comparing health-related quality of life between groups, as a lack of invariance can bias mean comparisons and obscure the interpretation of intervention effects. Ordinal item responses to quality of life questionnaires can be analyzed with partial-credit models. Among the various methods for invariance assessment, also known as Differential Item Functioning (DIF) analysis, only a few can analyze DIF across multiple groups of patients. The objective was to compare two methods for DIF analysis with latent regression partial-credit models across three groups of patients.

Methods: Ordinal responses of three groups of patients were simulated with a partial-credit model. Sample sizes, number of items and response categories, DIF patterns (DIF in none, two pairs of groups or all groups), and group effect values all varied in the simulation scenarios.
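As an illustration of the simulation step, the following is a minimal sketch of drawing ordinal responses from a partial-credit model. The function names, the standard-normal latent trait, and the representation of DIF as a constant shift of all thresholds of an item (uniform DIF) are assumptions made for this example; the abstract does not specify these details.

```python
import math
import random

def pcm_probabilities(theta, thresholds):
    """Category probabilities for one item under the partial-credit model."""
    # Category k has logit sum_{j<=k} (theta - delta_j); category 0 has logit 0.
    logits = [0.0]
    for delta in thresholds:
        logits.append(logits[-1] + (theta - delta))
    top = max(logits)  # subtract the maximum for numerical stability
    expv = [math.exp(v - top) for v in logits]
    total = sum(expv)
    return [v / total for v in expv]

def simulate_responses(n_patients, item_thresholds, group_mean, rng, dif_shift=None):
    """Simulate one group; dif_shift maps item index -> threshold shift (assumed uniform DIF)."""
    dif_shift = dif_shift or {}
    data = []
    for _ in range(n_patients):
        theta = rng.gauss(group_mean, 1.0)  # latent trait, group effect = group_mean
        row = []
        for j, thresholds in enumerate(item_thresholds):
            shift = dif_shift.get(j, 0.0)
            probs = pcm_probabilities(theta, [d + shift for d in thresholds])
            row.append(rng.choices(range(len(probs)), weights=probs)[0])
        data.append(row)
    return data

# Hypothetical example: 4 items with 3 response categories, DIF on item 0 in one group.
rng = random.Random(2025)
items = [[-0.5, 0.5]] * 4
reference_group = simulate_responses(300, items, group_mean=0.0, rng=rng)
dif_group = simulate_responses(300, items, group_mean=0.2, rng=rng, dif_shift={0: 0.5})
```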

The anchor selection method consists of (i) an iterative Wald test procedure to identify a stable set of anchor items (DIF-free items), starting from an unrestricted model (DIF on all items), and (ii) a DIF refinement of all DIF items at once to determine the groups affected by DIF and the type of DIF. The DIF items identification method first performs a likelihood-ratio test between the fully invariant and the unrestricted model. If this test is significant, DIF is assumed and an iterative Wald test procedure is performed to identify DIF items, starting from a fully invariant model. DIF refinement is performed each time an item is flagged with DIF.

The performances were compared in terms of false detection (no simulated DIF), correct detection (simulated DIF), quality of DIF detection (affected groups and items) and group effect bias.

Results: Rates of false DIF detection were low, ranging from 0.2% to 1.6% for the anchor selection method and from 1.0% to 3.2% for the DIF items identification method. DIF was correctly detected in 5% to 99% of cases for the anchor selection method and in 24% to 98% of cases for the DIF items identification method. Rates of correct detection increased with the number of groups affected by DIF, sample size and number of response categories.

Conclusion: The DIF items identification method generally performed better than the anchor selection method. A sample size of 300 patients per group is required to achieve 80% correct DIF detection, which may limit the applicability of these methods, derived from educational sciences, in health sciences.

48-observational-rwd-2: 2

Linkage of HIV treatment and population-based surveillance records in rural South Africa

Dickman Gareta¹,²,³,⁴, Evelyn Lauren⁵, Khumbo Shumba⁶, Cornelius Nattey⁶, William Macleod⁶,⁷, Matthew P. Fox⁶,⁷,⁸, Koleka Mlisana⁴,¹⁰, Matthias Egger²,¹¹,¹², Dorina Onoya⁶, Kobus Herbst¹,⁹, Jacob Bor⁶,⁷

¹Africa Health Research Institute, South Africa; ²Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland; ³Graduate School for Health Sciences, University of Bern, Bern, Switzerland; ⁴School of Laboratory Medicine and Medical Sciences, University of KwaZulu Natal, Durban, South Africa; ⁵Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA; ⁶Health Economics and Epidemiology Research Office, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa; ⁷Boston University School of Public Health, Department of Global Health, Boston, MA, United States; ⁸Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA; ⁹DSTI-SAMRC South African Population Research Infrastructure Network (SAPRIN), Durban, South Africa; ¹⁰National Institute for Communicable Diseases, Johannesburg, South Africa; ¹¹Centre for Infectious Disease Epidemiology and Research, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa; ¹²Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK

Background

Integrating HIV clinical records and demographic surveillance offers opportunities to study the uptake of health services, healthcare quality and inequality, and to improve patient outcomes. We implemented a graph-based record linkage algorithm to deduplicate and link HIV treatment records with population-based clinical and surveillance records in an HIV-endemic setting in rural South Africa.

Methods

We deduplicated and linked data from four data sources: Africa Health Research Institute (AHRI) Health and Demographic Surveillance System (HDSS), AHRI Clinic and Hospital Information System (AHRILink), National Health Laboratory Service (NHLS), and Three Integrated Electronic Registers (TIER.Net, HIV care and treatment records). Data were collected between January 1, 2000, and July 31, 2024, through repeated HDSS surveys of over 22,000 households residing in AHRI’s surveillance area and from one hospital and 17 clinics in Hlabisa sub-district, KwaZulu-Natal. The databases contained identifying attributes such as first name, surname, date of birth, gender, health facility, and South African national identity (ID) number, although typographical errors were common. We implemented a graph-based record linkage algorithm adapted from the Fellegi-Sunter model. The algorithm was trained and validated using a subset of records that contained valid national ID numbers. We assessed the algorithm performance by computing sensitivity, positive predictive value (PPV), and F-score, and computed descriptive statistics for different cohorts constructed from the linked database.
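To make the linkage idea concrete, here is a minimal sketch of classic Fellegi-Sunter match weights combined with a graph step that groups linked records into individuals through connected components. It is not the authors' algorithm: the m- and u-probabilities, the field names, the exact-match comparison (a real system would tolerate typographical errors with string similarity) and the function names are all assumptions made for illustration.

```python
import math

# Hypothetical m- and u-probabilities per field (illustrative values only):
# m = P(field agrees | same person), u = P(field agrees | different people).
FIELD_PARAMS = {
    "first_name": (0.90, 0.01),
    "surname": (0.92, 0.02),
    "date_of_birth": (0.95, 0.003),
    "gender": (0.98, 0.50),
}

def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum of log2 likelihood ratios over the compared fields."""
    total = 0.0
    for field, (m, u) in params.items():
        if rec_a.get(field) is None or rec_b.get(field) is None:
            continue  # missing values contribute no evidence
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total

def connected_components(n_records, edges):
    """Group record indices linked by any chain of accepted matches (union-find)."""
    parent = list(range(n_records))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for i in range(n_records):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

In such a setup, record pairs whose weight exceeds a chosen threshold would become edges of the graph, and each connected component would be treated as one individual.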

Results

Deduplication and linkage of the four databases yielded a sensitivity of 91.7% and PPV of 94.8% (F-score = 0.932). Of 246,945 unique individuals from the HDSS, 43,325 (17.54%) were HIV positive based on the data from the four data sources. Of these, 31,051 (71.9%) had a record in TIER.Net or NHLS and 25,175 (81.1%) had a record in TIER.Net. Of 64,140 unique individuals from TIER.Net, 25,175 (39.0%) individuals were household members in the HDSS. 16,074 (25.0%) of the individuals in TIER.Net had at least one hospital admission.
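The reported F-score can be reproduced from the sensitivity and PPV as their harmonic mean. The short check below assumes the usual F1 definition; the function name is chosen for this example.

```python
def f_score(sensitivity, ppv):
    """Harmonic mean of sensitivity (recall) and PPV (precision)."""
    return 2 * sensitivity * ppv / (sensitivity + ppv)

print(round(f_score(0.917, 0.948), 3))  # 0.932
```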

Conclusion

Records from multiple data sources can be deduplicated and linked with a high degree of accuracy in a resource-poor HIV-endemic region in rural South Africa. This study paves the way for further advancements in clinical and population data integration, offering the potential to deepen our understanding of HIV epidemiology in a well-described population with a high prevalence of infectious and non-communicable diseases.

48-observational-rwd-2: 3

Investigating Statistical Inference for Consistency, Heterogeneity and Efficacy of Federated Learning Models: Insights from a Mega-Simulation of Real-World Data

Narayan Sharma¹, Gonzalo Durán-Pacheco¹, Jacek Chmiel², Eric Boernert¹, Doug Kelkhoff³, Vittorio P. Illiano¹, Gabriele Zilorri¹, Matthias Antonin¹, Bjoern Tackenberg¹, Dominik Heinzmann¹

¹F. Hoffmann - La Roche Ltd, Basel, Switzerland; ²Avenga, Poland; ³Hoffmann-La Roche Limited, Canada

Background

Federated learning and analytics with medical data have emerged as a key solution for collaboratively training or fitting machine-learning and statistical models on datasets distributed across different hospitals without compromising data privacy. Heterogeneity and site-level differences necessitate a thorough understanding of the operational characteristics of federated algorithms to ensure the accuracy and interpretability of the models once they are applied to distributed real-world medical datasets (RWD).

Different federated algorithms, including GLMs and marginal structural models, have been developed for an upcoming RWD analysis of persistence of treatment and its impact on patient outcome across hospitals on different continents. Their operational characteristics and accuracy have been investigated in a holistic simulation study, and the lessons learned will inform the interpretation of model outcomes once applied to RWD.

Methods

We simulated RWD from three hospitals across 36 scenarios including heterogeneity (high, low), treatment effect (no, moderate and large), data missingness (no, at random, not at random) and independent and identically distributed (IID) data process (IID, non-IID). A registry was used to estimate between-hospital variability and other metrics to ensure that the simulation is as realistic as possible.
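The 36 scenarios correspond to the full factorial crossing of the four factors (2 × 3 × 3 × 2). A minimal sketch of how such a grid could be enumerated follows; the dictionary keys and level labels are names chosen for this example.

```python
from itertools import product

# Factor levels as listed in the abstract; the key names are illustrative.
FACTORS = {
    "heterogeneity": ["high", "low"],
    "treatment_effect": ["none", "moderate", "large"],
    "missingness": ["none", "at_random", "not_at_random"],
    "data_process": ["IID", "non-IID"],
}

# One dict per scenario, e.g. {"heterogeneity": "high", "treatment_effect": "none", ...}
scenarios = [dict(zip(FACTORS, levels)) for levels in product(*FACTORS.values())]
print(len(scenarios))  # 36
```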

The workflow was set up to generate data, fit models and extract summary statistics. We ran local and federated analyses in parallel for conditional and marginal models (linear, logistic and Cox regression) for continuous, binary and time-to-event endpoints. Each scenario was repeated up to 100 times (i.e., 21,600 fitted models in total). Federated models were executed with DataSHIELD, an open-source solution.

In addition, the results from the individual simulated hospitals were combined by meta-analysis for comparison with the federated results.
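As a point of reference for the meta-analysis comparison, the sketch below pools per-hospital coefficient estimates with inverse-variance weights. The abstract does not state which meta-analytic model was used, so the fixed-effect form, the function name and the example numbers are assumptions.

```python
import math

def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooling of per-site estimates."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical log hazard ratios and standard errors from three hospitals.
pooled, pooled_se = fixed_effect_pool([-0.30, -0.22, -0.41], [0.10, 0.15, 0.20])
```

Unlike a federated fit, this approach combines only site-level summaries, which is one reason the two can diverge when sites differ.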

Results

The coefficients and standard errors for federated models and central models agreed to five decimal places, demonstrating that equivalent results can be obtained without centralizing data across hospitals. Under IID conditions, both federated and central models yielded unbiased results, while non-IID scenarios introduced bias relative to the ground truth, particularly with increasing treatment effects. Notably, meta-analysis results strongly diverged from the federated results across different scenarios, showing an average hazard decrease of 1.2% to 6.3% under IID and an increase of 1.3% to 10.4% under non-IID for the Cox model.

Conclusion

Federated models are appropriate for analysis of RWD from hospitals across different countries and continents where privacy laws and other considerations restrict pooling (i.e. centralizing) of the data. For many scenarios, a simple meta-analysis could be misleading, favoring the federated approach.

48-observational-rwd-2: 4

Characterizing Medication Timelines in Huntington’s Disease: A Cluster-Based Analysis of Treatment Patterns

Marc Dibling¹, Alexandra Durr², Sophie Tezenas du Montcel¹

¹Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié Salpêtrière, Paris, France; ²Sorbonne Université, Paris Brain Institute - ICM, Inserm, CNRS, APHP, Hopital de la Pitié-Salpêtrière, Paris, France

Introduction

Huntington disease (HD) is a rare inherited neurodegenerative disorder characterized by motor, cognitive, and psychiatric symptoms managed through diverse medication regimens. However, the heterogeneity of clinical symptoms, adherence and tolerance poses significant challenges in determining the optimal care trajectory. Leveraging a national health claims database, this study aims to investigate HD patients' medication timelines to characterize the distinct strategies and assess heterogeneity factors.

Methods

We identified 1,776 incident HD patients in the French health claims database (SNDS) between 2019 and 2023 and retrieved their pharmacy deliveries of tetrabenazine, antipsychotics and antidepressants between 2009 and 2023. We then applied a three-step data processing pipeline designed to detect patterns of medication use and enable inter-group comparisons. First, we summarized individual patient treatment timelines via meta-features (i.e. number of event occurrences, time gaps between occurrences). Second, we performed k-medoids timeline clustering using a large number of medoids (silhouette score > 0.3) to ensure that only highly similar timelines were grouped together, thereby reducing complexity by selecting representative timelines while preserving key patterns. Third, we performed hierarchical agglomerative clustering of the medoids to group similar timeline sequences and facilitate visualization of patterns of medication use. The number of clusters was selected to maximize the silhouette score. Finally, after the three-step data processing, we conducted an explanatory analysis to assess the relationship between treatment sequence clusters and baseline patient characteristics using ANOVA and chi-squared tests.
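The first pipeline step, summarizing a timeline by meta-features, could look like the following minimal sketch. The abstract names only the number of event occurrences and the time gaps between them; the specific summary statistics and the function name are assumptions for this example.

```python
from datetime import date

def timeline_meta_features(delivery_dates):
    """Summarize one patient's deliveries of one drug class."""
    dates = sorted(delivery_dates)
    # Gaps in days between consecutive deliveries.
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    return {
        "n_events": len(dates),
        "mean_gap_days": sum(gaps) / len(gaps) if gaps else None,
        "max_gap_days": max(gaps) if gaps else None,
    }

# Hypothetical patient with three deliveries.
features = timeline_meta_features([date(2020, 1, 1), date(2020, 1, 31), date(2020, 3, 1)])
print(features)  # {'n_events': 3, 'mean_gap_days': 30.0, 'max_gap_days': 30}
```

Feature vectors of this kind, one per patient and drug class, would then be the input to the clustering steps.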

Results

Mean age at detection of the first HD diagnosis code was 57.5 [56.7-58.3] years, with a male-female ratio of 0.83, and most patients (99%) had at least one delivery of the drugs of interest between 2019 and 2023. We identified 7 clusters of medication timelines: one with timelines indicating no treatment adherence or delivery, one for the individual use of each drug of interest, one for the combined use of antidepressants with antipsychotics and another for antidepressants with tetrabenazine. The last cluster included patients with a delayed treatment initiation. Comparison of baseline characteristics indicated that patients not adhering to any treatment were significantly older, that patients with regular tetrabenazine deliveries were hospitalized significantly less often, and that significantly more men were included in the cluster indicating a delayed treatment initiation.

Conclusion

This study demonstrates the value of leveraging large registry data to analyze complex medication trajectories in Huntington disease. We identified distinct treatment patterns and adherence variations that will help to understand disease management strategies at the national level.

48-observational-rwd-2: 5

Testing the Similarity of Healthcare Pathways based on Transition Probabilities - A New Bootstrap Procedure

Zoe Lange¹, Holger Dette¹, Maryam Farhadizadeh², Nadine Binder²

¹Ruhr-University Bochum, Germany; ²University Freiburg, Germany

Background

Establishing a common standard of care within or across clinics and finding the best treatment strategies for diseases are important goals in the healthcare system. To contribute to achieving these goals, we study the healthcare pathways of patients, consisting of sequences of diagnoses, treatment procedures, or hospital readmissions observed over time. Working with healthcare pathway data is attractive since these data are collected routinely by clinics and are therefore highly available. However, the healthcare pathways of different patients tend to be highly heterogeneous, even for common diseases. With our newly developed similarity testing approach, presented here, we can find patterns, namely typical pathways, in these heterogeneous data.

Methods

We model the healthcare trajectory for a group of patients by a multistate model, where diagnoses, treatments or readmissions are considered as states that patients can transition to. We define the similarity of two multistate models in terms of the probabilities of a patient transitioning between states. This modelling of similarity enables us to formulate a similarity hypothesis test. If the difference between the transition probabilities of two multistate models is large, the models are considered non-similar and fall under the null hypothesis. If the difference is sufficiently small, they are considered similar and fall under the alternative hypothesis. Groups with similar healthcare pathways, according to the test, are pooled into one group, representing a typical pathway. Based on these pooled data sets, one can perform further estimation tasks such as estimating the risks or probabilities of hospital readmission. The increased sample size that results from pooling similar pathways leads to more accurate statistical inference, especially in small-sample settings such as those with heterogeneous pathways.
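One way to write the hypotheses described above is sketched below. The abstract does not specify how the difference between transition probabilities is measured, so the maximum absolute difference used here is an assumption, as are the symbols: $p^{(g)}_{ij}$ denotes the probability of a transition from state $i$ to state $j$ in group $g$, and $\varepsilon > 0$ is the similarity threshold.

```latex
H_0:\; d\bigl(P^{(1)}, P^{(2)}\bigr) \ge \varepsilon
\quad\text{versus}\quad
H_1:\; d\bigl(P^{(1)}, P^{(2)}\bigr) < \varepsilon,
\qquad
d\bigl(P^{(1)}, P^{(2)}\bigr) = \max_{i,j}\,\bigl| p^{(1)}_{ij} - p^{(2)}_{ij} \bigr| .
```

Rejecting $H_0$ then supports similarity, which is the reverse of a classical test for difference and is what allows similar groups to be pooled with controlled error.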

Results

We introduce a special parametric bootstrap test that is tailored to our similarity hypotheses. We prove the validity of this test and investigate its performance in a comprehensive simulation study with different sample sizes, censoring rates, and similarity thresholds. Furthermore, we show how the results can be applied by discussing an example of prostate cancer data.

Conclusion

Testing the similarity of seemingly heterogeneous healthcare pathways to identify typical pathways is a new and promising approach that accounts for small sample sizes and draws on available routine data. Beyond this, the presented bootstrap test can easily be adapted to other settings making it an attractive tool for general similarity testing problems, especially when only limited data is available.