Conference Agenda

Session Overview
Session
Statistical methods in epidemiology
Time:
Tuesday, 26/Aug/2025:
11:30am - 1:00pm

Location: ETH E21

D-BSSE, ETH, 54 seats

Presentations
30-1 Stats Methods Epi: 1

Federated Inference methods for estimation and comparison of Standardized Mortality Ratios

Zoë D. van den Heuvel, Bas de Groot, Marianne A. Jonker

RadboudUMC, The Netherlands

One way to benchmark quality of care in emergency departments of medical centers is to compare quality indicators such as the Standardized Mortality Ratio (SMR), in order to improve patient outcomes where possible. The SMR is defined as the ratio of the observed to the expected mortality rate. Estimating SMRs that account for the different patient populations in the medical centers conventionally requires combining the data from the different centers into a single database. However, sharing data across medical centers is challenging in practice because of regulatory and privacy constraints.
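As a small illustration of the definition above, the SMR of one center can be computed as observed deaths divided by expected deaths, where the expected count is the sum of case-mix-adjusted death probabilities. The function name, the source of the predicted risks and the toy numbers are assumptions made for this sketch, not taken from the study.

```python
import numpy as np

def standardized_mortality_ratio(died, predicted_risk):
    """SMR = observed deaths / expected deaths for one center.

    died: 0/1 outcome per patient.
    predicted_risk: case-mix-adjusted death probability per patient
    (assumed to come from some reference model; hypothetical here).
    """
    died = np.asarray(died, dtype=float)
    predicted_risk = np.asarray(predicted_risk, dtype=float)
    observed = died.sum()            # observed number of deaths
    expected = predicted_risk.sum()  # expected number of deaths given case mix
    return observed / expected

# Toy data (made up): 2 observed deaths, predicted risks summing to 1 expected death.
died = [0, 1, 0, 0, 1]
predicted_risk = [0.10, 0.40, 0.10, 0.10, 0.30]
smr = standardized_mortality_ratio(died, predicted_risk)
```

An SMR above 1 means more deaths were observed than the case mix would predict; below 1 means fewer.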

Recently, Bayesian Federated Inference (BFI) was introduced to construct from local inferences in separate medical centers what would have been inferred had the data sets been merged [1]. With this methodology, the estimates and statistical power of a combined database can be obtained without actually constructing this database. The aim of this research is to apply the BFI methodology to real-world emergency department data from multiple medical centers, in order to estimate and compare SMRs adjusted for case-mix. In the presentation we explain how the BFI methodology can be applied to achieve this without combining data from the different centers.
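The idea of reconstructing the merged-data inference from local inferences can be sketched with Gaussian approximations. Assuming each center reports a local MAP estimate and the local posterior precision matrix, and assuming zero-mean Gaussian priors, the combined precision is the sum of the local precisions with the local priors removed and a single global prior added. This is a generic identity written for illustration; the function name and simulated data are hypothetical, and it is not claimed to be the exact estimator of [1].

```python
import numpy as np

def combine_local_gaussians(betas, precisions, local_prior_precisions, global_prior_precision):
    """Combine per-center Gaussian posterior approximations.

    betas[k]: local MAP estimate from center k.
    precisions[k]: local posterior precision (negative Hessian at the MAP).
    All priors are assumed to be zero-mean Gaussians with the given precisions.
    """
    # Each local posterior contains its own prior once; remove those and add one global prior.
    A = global_prior_precision + sum(precisions) - sum(local_prior_precisions)
    b = sum(P @ m for P, m in zip(precisions, betas))
    return np.linalg.solve(A, b), A

# Simulated linear-regression data with unit noise variance (made up),
# a case where the Gaussian approximation is exact.
rng = np.random.default_rng(1)
prior = 0.5 * np.eye(2)
true_beta = np.array([1.0, -2.0])
centers = []
for n in (40, 60, 80):
    X = rng.normal(size=(n, 2))
    centers.append((X, X @ true_beta + rng.normal(size=n)))

# What each center would share: its MAP estimate and posterior precision only.
local_precisions = [X.T @ X + prior for X, _ in centers]
local_maps = [np.linalg.solve(P, X.T @ y) for P, (X, y) in zip(local_precisions, centers)]
beta_bfi, A_bfi = combine_local_gaussians(local_maps, local_precisions, [prior] * 3, prior)

# Reference: the estimate from the merged data, which the centers never need to build.
X_all = np.vstack([X for X, _ in centers])
y_all = np.concatenate([y for _, y in centers])
beta_merged = np.linalg.solve(X_all.T @ X_all + prior, X_all.T @ y_all)
```

In this linear-Gaussian toy case the combined and merged estimates coincide. For models such as logistic regression for mortality, the local posteriors are only approximately Gaussian, so agreement would be approximate.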

[1] Jonker, M. A., Pazira, H., & Coolen, A. C. (2024). Bayesian federated inference for estimating statistical models based on non‐shared multicenter data sets. Statistics in Medicine, 43(12), 2421-2438.



30-1 Stats Methods Epi: 2

A mixture model for subtype identification: Application to CADASIL

Sofia Kaisaridi1, Juliette Ortholand1, Caglayan Tuna2, Nicolas Gensollen1, Sophie Tezenas du Montcel1

1ARAMIS, Sorbonne Université, Institut du Cerveau-Paris Brain Institute-ICM, CNRS, Inria, Inserm, AP-HP, Groupe Hospitalier Sorbonne Université, Paris, France; 2Inria, Université Paris Cité, Inserm, HeKA, F-75015 Paris, France

Background / Introduction: Disease progression models are promising tools for analysing longitudinal data presenting multiple modalities. Such models can be used to estimate a long-term disease progression and to reconstruct individual trajectories, while accounting for the variability between patients and features. However, these techniques often assume that individuals form a homogeneous cluster, thus ignoring possible subgroups within the population. Taking into account different subtypes of progression, while estimating the average course of the disease, is an important task, particularly for diseases with poorly understood underlying mechanisms. Cerebral Autosomal Dominant Arteriopathy with Subcortical Infarcts and Leukoencephalopathy (CADASIL) is one such example: a post-hoc classification study revealed two different subtypes [1] depending on the spatiotemporal variability. The aim of this study is to extend an existing mixed-effects model to identify different clusters of disease progression during estimation itself, rather than post hoc.

Methods: We integrate a probabilistic mixture framework into an existing non-linear mixed-effects model used for disease course mapping, implemented in the open-source Python library Leaspy. In this framework, inter-individual variability is captured through three spatiotemporal parameters: the disease onset, the pace of progression and the ordering of the symptoms. We add a new layer atop the hierarchical structure of the model, assuming that the random effects come from a mixture of normal distributions, with their respective probabilities of occurrence. The joint estimation of clusters and model parameters is performed using a mixture Markov chain Monte Carlo stochastic approximation Expectation Maximisation (M-MCMC SAEM) algorithm.
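The mixture layer described above can be illustrated with one ingredient: given an individual's random effects, the posterior probability of belonging to each Gaussian component. The sketch below uses plain NumPy and is not the Leaspy API; the function name, the two-dimensional random effects (an onset shift and a log-pace) and all numbers are assumptions made for illustration. The actual M-MCMC SAEM algorithm additionally samples the random effects and updates the model parameters.

```python
import numpy as np

def cluster_responsibilities(z, weights, means, covs):
    """Posterior probability that each row of z comes from each Gaussian component."""
    z = np.atleast_2d(np.asarray(z, dtype=float))
    d = z.shape[1]
    log_p = []
    for w, mu, cov in zip(weights, means, covs):
        diff = z - mu
        _, logdet = np.linalg.slogdet(cov)
        # Squared Mahalanobis distance of every individual to this component.
        maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
        log_p.append(np.log(w) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha))
    log_p = np.column_stack(log_p)
    log_p -= log_p.max(axis=1, keepdims=True)  # stabilise before exponentiating
    resp = np.exp(log_p)
    return resp / resp.sum(axis=1, keepdims=True)

# Hypothetical two-subtype setting: (onset shift in years, log-pace).
weights = [0.5, 0.5]
means = [np.array([-5.0, 0.3]), np.array([5.0, -0.3])]
covs = [np.diag([4.0, 0.04]), np.diag([4.0, 0.04])]
z = np.array([[-5.0, 0.3], [5.0, -0.3], [0.0, 0.0]])
resp = cluster_responsibilities(z, weights, means, covs)
```

The first two individuals sit at a component mean and are assigned to it almost surely, while the third lies exactly between the two components and is split evenly.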

Results: We show that our model successfully recovers the ground truth parameters from synthetic data, with reduced bias compared with the naïve solution of post-hoc clustering of individual parameters from a one-class model. Our application to CADASIL disease data allows the unsupervised identification of disease subtypes, involving all aspects of the modelled spatiotemporal variability.

Conclusion: We proposed a mixture model, as an extension of the disease course mapping model (Leaspy), to properly identify the underlying subgroups of progression based on the individual parameters. This work aims to contribute to the complex challenge of uncovering the heterogeneity arising from different subtypes in chronic diseases.

[1] Kaisaridi S., Herve D., Jabouley A., Reyes S., Machado C., Guey S., Taleb A., Fernandes F., Chabriat H., & Tezenas du Montcel S. (2025). Determining Clinical Disease Progression in Symptomatic Patients With CADASIL. Neurology, 104(1), e210193.



30-1 Stats Methods Epi: 3

Performance of a residual-based algorithm aiming at identifying response shift at the item level using Rasch models: a simulation study

Yseulys Dubuy1, Victor Rechard1, Véronique Sébille1,2

1Nantes Université, U1246 SPHERE "methodS in Patient-centered outcomes & HEalth ResEarch", France; 2Nantes Université, CHU Nantes, Methodology and Biostatistics Unit, Nantes, France

Background: One of the main challenges when analyzing longitudinal Patient-Reported Outcomes (PROs) data is that the interpretation of items in questionnaires designed to measure the PRO of interest (e.g., fatigue, anxiety) can change over time. As a result, the observed change in patient responses reflects both the change in the PRO itself and the changes in item interpretation, referred to as Response Shift (RS). RS can arise from experiencing challenging health events (e.g., salient events or living with a progressive chronic condition). Examining RS at the item level is crucial, as it may provide valuable insights into patients' experiences, notably regarding their potential psychological adjustment to challenging health events. Furthermore, ignoring RS can interfere with the inferences made from PRO data.

Methods: Inspired by Andrich & Hagquist's work on differential item functioning (a phenomenon related to RS), we developed an item-level RS detection procedure based on the residuals of a random-effect Partial Credit Model. Specifically, we aimed to determine whether the distribution of the residuals is associated with time, as an operationalization of RS. The performance of this newly developed procedure was evaluated through a simulation study involving two measurement occasions. Different scenarios were considered, varying: the number of items and response categories of the questionnaire, the sample size, the mean change in the PRO levels over time, the presence/absence of RS, the RS effect size, and the proportion of items affected by RS (i.e., proportion of RS items). The performance of the procedure was assessed based on: (1) the rates of false and correct detection of RS (in scenarios without/with simulated RS, respectively), (2) RS recovery (whether the procedure identified the items on which RS was simulated) and (3) the bias when estimating the change in the PRO levels over time, after accounting for the detected RS effects.
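The building block of such a residual-based procedure can be sketched as follows: category probabilities under a Partial Credit Model, and the standardized residual of an observed response. This is a simplified illustration in which the person parameter is treated as known, whereas the abstract uses a random-effect model; the function names and threshold values are hypothetical.

```python
import numpy as np

def pcm_probs(theta, thresholds):
    """Partial Credit Model category probabilities for one item.

    P(X = k) is proportional to exp(sum over j <= k of (theta - delta_j)),
    for k = 0..m, with an empty sum for k = 0.
    """
    steps = theta - np.asarray(thresholds, dtype=float)
    logits = np.concatenate([[0.0], np.cumsum(steps)])
    logits -= logits.max()  # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def standardized_residual(x, theta, thresholds):
    """(observed - expected) / sd of the item score under the model."""
    p = pcm_probs(theta, thresholds)
    k = np.arange(len(p))
    expected = (k * p).sum()
    variance = ((k - expected) ** 2 * p).sum()
    return (x - expected) / np.sqrt(variance)

# Hypothetical three-category item with thresholds symmetric around the person parameter.
p = pcm_probs(0.0, [-1.0, 1.0])
r_mid = standardized_residual(1, 0.0, [-1.0, 1.0])
r_high = standardized_residual(2, 0.0, [-1.0, 1.0])
```

In a detection procedure of this kind, the residuals of each item would then be compared between measurement occasions, for example with an analysis of variance, with a correction for multiple testing across items.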

Results: The rate of false detection of RS should be lower than 5%, due to correction for multiple testing. The rate of correct detection of RS will likely be influenced by sample size, RS effect size, and proportion of RS items. Specifically, larger sample sizes, larger RS effect sizes, and higher proportions of RS items are expected to increase the correct RS detection rate.

Conclusions: Detecting and accounting for RS is crucial to avoid suboptimal healthcare decision-making. The newly developed procedure might outperform existing item-level RS detection methods. Moreover, this approach can be extended to integrate covariates, helping to identify RS determinants.

References: Andrich & Hagquist, https://doi.org/10.1186/s12955-017-0755-0



30-1 Stats Methods Epi: 4

Lifecourse modelling and time-varying covariates: empirical application and simulation study for novel method

Solomon Beer1, Sherief Eldeeb2, Erin Dunn2, Andrew Simpkin1, Andrew Smith3

1University of Galway, Ireland; 2Purdue University, Indiana; 3University of the West of England, United Kingdom

In prospective cohort studies scientists collect many repeated measures, including exposures and outcomes, that are highly correlated over time. Where an exposure is collected repeatedly, interest often lies in determining whether timing has a differential effect on a later outcome. However, few such studies consider the effect of time-varying covariates (TVC) which may impact associations identified.

One approach for such data is the Structured Life Course Modeling Approach (SLCMA), where users choose between temporal hypotheses of exposure specified a priori, selecting the hypothesis that explains the most variation in the outcome. However, SLCMA has traditionally not accounted for time-varying covariates. We present a modified version of this approach, direct and mediated effects (DME) SLCMA, which corrects for TVC by adjusting, on a per-hypothesis basis, only for covariates measured before the time of each respective temporal hypothesis. A covariate could be a confounder or a mediator depending on the temporal hypothesis tested.
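The per-hypothesis adjustment can be sketched in a simplified form. The example below considers only sensitive-period hypotheses (one per measurement occasion), scores each by the partial R² of the exposure at that occasion after adjusting for covariates measured at strictly earlier occasions, and selects the highest score. The function names, the strict "earlier" rule, the use of partial R² as the selection criterion and the simulated data are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def partial_r2(y, x, Z):
    """Squared partial correlation of y and x after adjusting for Z (plus an intercept)."""
    Z1 = np.column_stack([np.ones(len(y)), Z])
    ry = y - Z1 @ np.linalg.lstsq(Z1, y, rcond=None)[0]  # outcome residuals
    rx = x - Z1 @ np.linalg.lstsq(Z1, x, rcond=None)[0]  # exposure residuals
    return float((rx @ ry) ** 2 / ((rx @ rx) * (ry @ ry)))

def select_hypothesis(y, X, C):
    """Score each sensitive-period hypothesis t, adjusting only for covariates before t."""
    scores = []
    for t in range(X.shape[1]):
        earlier_covariates = C[:, :t]  # empty for the first occasion
        scores.append(partial_r2(y, X[:, t], earlier_covariates))
    return int(np.argmax(scores)), scores

# Simulated data (made up): covariates influence the following exposure,
# and only the exposure at the last occasion affects the outcome.
rng = np.random.default_rng(0)
n = 2000
C = rng.normal(size=(n, 3))
X = rng.normal(size=(n, 3))
X[:, 1:] += 0.5 * C[:, :-1]
y = 1.0 * X[:, 2] + 0.3 * C[:, 2] + rng.normal(size=n)
best, scores = select_hypothesis(y, X, C)
```

Here `best` is the index of the selected occasion. Hypotheses such as accumulation or change would add further candidate predictors, each with its own adjustment set.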

In a simulation study, informed by empirical data from the Drakenstein Child Health Study (DCHS), we compared the existing and modified SLCMA and found several scenarios where TVC had a major effect on selecting an incorrect hypothesis, particularly where the covariates have a greater causal effect on the following exposure than exposures have on the following covariate, and especially when no direct exposure effects are present. Only in scenarios with no indirect effects did SLCMA always select the correct hypothesis, indicating the importance of correcting for relevant TVC when confounding is plausible.

As an application of this method, we use DME SLCMA on DCHS data to investigate the importance that timing of exposure to maternal psychopathology (repeatedly measured at birth, age 4 and 8) has on childhood depression measured at age 8 whilst correcting for time-varying socioeconomic position (parental assets measured at birth, age 4 and 8). Exposure to childhood adversity is a potent risk factor for later negative mental health outcomes, with growing evidence that the developmental timing of adversity exposure is important. Exposure at age 8 is found to have the strongest effect on childhood depression, even after accounting for time-varying covariates.

To our knowledge, this study is the first to account for time-varying covariates in a lifecourse modelling approach. Our results are useful for future studies where it is essential to adjust for time-varying covariates to prevent incorrect and biased estimates, such as when the exposure of interest acts only indirectly on the outcome.



30-1 Stats Methods Epi: 5

Emulating hypothetical interventions of physical activity on obesity: applying target trial emulation in the 1970 Birth Cohort Study

Michail Katsoulis, Jamie Wong

Population Science & Experimental Medicine, Institute of Cardiovascular Science, UCL, London, UK

Background/Introduction: Many studies examining the effect of physical activity on obesity have only used a single baseline measurement of physical activity in their analyses as their exposure. There is a limited number of papers that have evaluated the lifelong effect of physical activity on obesity measured at multiple timepoints, accounting for changes over time. This study aimed to use the target trial emulation framework to estimate the 16-year risk of obesity under hypothetical physical activity interventions in middle-aged individuals (30 to 46 years old) from the 1970 British Cohort Study (BCS70).

Methods: This population-based cohort study utilised data from BCS70, a British birth cohort involving 8479 individuals born in 1970. Participants were allocated to one of two hypothetical interventions: low (0-3 times per month) and high (1-5 times per week) physical activity levels. Obesity was defined as a body mass index ≥30 kg/m². The study aimed to estimate the per-protocol effect, assessing the impact of physical activity if all participants had fully adhered to their assigned intervention. Multiple imputation was used to address missing data, creating 10 imputed datasets. Inverse probability weighting was used to account for adherence, using information from time-fixed and time-dependent confounders. Pooled logistic regression models were utilised to estimate the standardised risk curves. Non-parametric bootstrap with 200 samples for each imputed dataset was used to estimate 95% confidence intervals, defined by the 2.5th and 97.5th percentiles of the pooled sample of 10 × 200 = 2000 risk estimates [1].
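The pooled-percentile interval described above can be sketched as follows: bootstrap within each imputed dataset, pool all estimates, and take the 2.5th and 97.5th percentiles. For brevity the estimator here is a simple proportion rather than the inverse-probability-weighted pooled logistic regression used in the study; the function name and the toy data are hypothetical.

```python
import numpy as np

def mi_boot_pooled_ci(imputed_datasets, estimator, n_boot=200, level=0.95, seed=0):
    """Percentile interval from bootstrap estimates pooled across imputed datasets."""
    rng = np.random.default_rng(seed)
    estimates = []
    for data in imputed_datasets:
        data = np.asarray(data)
        n = len(data)
        for _ in range(n_boot):
            idx = rng.integers(0, n, size=n)  # resample rows with replacement
            estimates.append(estimator(data[idx]))
    estimates = np.asarray(estimates)
    alpha = 1 - level
    lower, upper = np.percentile(estimates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper, estimates

# Toy stand-in for 10 imputed datasets (made up): binary obesity indicator per person.
data_rng = np.random.default_rng(42)
imputed = [data_rng.binomial(1, 0.3, size=500) for _ in range(10)]
lower, upper, pooled = mi_boot_pooled_ci(imputed, np.mean)
```

With 10 imputed datasets and 200 resamples each, `pooled` holds 2000 estimates, matching the count given in the abstract.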

Results: 2371 participants were assigned to the low physical activity (hypothetical) intervention, and 6108 to the high physical activity (hypothetical) intervention. The estimated standardised risk of obesity over 16 years was 30.6% (95% CI = 27.1%, 34.8%) for the low physical activity group, and 29.8% (95% CI = 28.3%, 31.2%) for the high physical activity group. The risk difference between the two hypothetical interventions after 16 years was -0.8% (95% CI = -5.0%, 2.9%).

Conclusion: Individuals who adhered to a high level of physical activity exhibited a moderately lower risk of developing obesity between 30 and 46 years of age. This study is among the very few cohort studies to utilise the target trial emulation framework to evaluate hypothetical interventions.

Reference

[1] Schomaker M, Heumann C. Bootstrap inference when using multiple imputation. Stat Med. 2018 Jun 30;37(14):2252-2266.