Conference Agenda

Overview and details of the sessions of this conference. Please select a date or location to show only sessions on that day or at that location. Please select a single session for a detailed view (with abstracts and downloads, if available).

 
 
 
Session Overview
Session
Poster Exhibition: M / Monday posters at Biozentrum
Time:
Monday, 25/Aug/2025:
10:45am - 11:30am

Location: Biozentrum, 2nd floor

Biozentrum, 2nd floor poster area

Presentations
posters-monday-BioZ: 1

Matching-adjusted indirect comparison of endoscopic and craniofacial resection for the treatment of sinonasal cancer invading the skull base

Florian Chatelet1,2,3, Sylvie Chevret1,2, MUSES collaborative group3,4,5, Philippe Herman1,3, Benjamin Verillaud1,3

1Université Paris Cité, France; 2SBIM Hôpital Saint Louis APHP Paris, ECSTRA team; 3ENT department Hôpital Lariboisière APHP Paris; 4“ASST Spedali Civili di Brescia,” University of Brescia, Brescia, Italy; 5“Ospedale di Circolo e Fondazione Macchi,” University of Insubria, Varese, Italy

Background

In surgical oncology, new techniques often replace established methods without direct comparative studies, making it difficult to assess their actual effectiveness. This is particularly relevant for endoscopic endonasal approaches (EEA), which have progressively supplanted craniofacial resection (CFR) for sinonasal cancers invading the skull base. As a result, contemporary CFR-treated cohorts have become too small for direct comparisons, and randomised trials remain unfeasible due to ethical and logistical constraints. Matching-adjusted indirect comparison (MAIC) offers a statistical method to indirectly compare a contemporary individual-patient dataset (EEA) with a historical aggregate dataset (CFR), adjusting for confounding variables.

Methods

We conducted a MAIC using individual patient data (IPD) from the MUSES cohort (EEA-treated patients) and aggregated data from the historical CFR cohort of Ganly et al., including patients with skull base invasion. Key prognostic variables—including age, tumour histology, orbital and brain invasion, and prior radiotherapy or surgery—were used to weight the MUSES cohort to match the CFR cohort.

Primary and secondary endpoints included overall survival (OS), recurrence-free survival (RFS), perioperative mortality, surgical margins, and complication rates. Survival analyses were conducted using Kaplan-Meier estimations, log-rank tests, and Cox proportional hazards models, with bootstrap resampling for confidence interval estimation.
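
As an illustration of the weighting step, the following minimal R sketch (not the authors' code; covariate names are hypothetical) computes Signorovitch-style method-of-moments MAIC weights, i.e. weights of the form exp(a'x) chosen so that the weighted IPD covariate means equal the aggregate means reported for the comparator cohort:

  # maic_weights: ipd is the individual-patient data (EEA cohort),
  # agg_means a named vector of the same covariates' means in the CFR publication
  maic_weights <- function(ipd, agg_means) {
    X  <- as.matrix(ipd[, names(agg_means)])
    Xc <- sweep(X, 2, agg_means)          # centre IPD covariates at the aggregate means
    Q  <- function(a) sum(exp(Xc %*% a))  # convex objective; its minimiser balances the means
    fit <- optim(rep(0, ncol(Xc)), Q, method = "BFGS")
    as.vector(exp(Xc %*% fit$par))        # unnormalised weights for the weighted analyses
  }
  # w <- maic_weights(muses_ipd, c(age = 62, orbital_invasion = 0.35, brain_invasion = 0.20))
  # then e.g. weighted Kaplan-Meier / Cox analyses of the EEA data, bootstrapped for CIs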

Results

A total of 724 EEA-treated and 334 CFR-treated patients were analysed. Before MAIC, EEA was associated with significantly improved OS (HR = 2.33, 95% CI 1.88–2.87, p < 0.001), and this benefit persisted after adjustment (HR = 1.93, 95% CI 1.60–2.34, p < 0.001). RFS was initially higher in the EEA cohort (HR = 1.39, 95% CI 1.14–1.69, p = 0.001) but was no longer statistically significant after adjustment (HR = 1.06, 95% CI 0.91–1.23, p = 0.63). Perioperative mortality and complications were significantly lower in the EEA cohort compared to CFR. Clear resection margins were achieved in 79% of EEA cases and 71% of CFR cases (OR = 0.67, 95% CI 0.50–0.90, p = 0.008), but this difference was no longer significant after MAIC adjustment (OR = 1.15, 95% CI 0.93–1.40, p = 0.36).

Conclusion

This study highlights the potential utility and limitations of MAIC in addressing selection biases in non-randomised comparisons. OS remained superior in the EEA group after adjustment, while RFS was similar between EEA and CFR. Perioperative mortality and complications were significantly higher with CFR, although both techniques achieved similar resection margin rates after adjustment. These findings support endoscopic surgery as a first-line approach for sinonasal cancers invading the skull base, provided it is technically feasible and performed in expert centres.



posters-monday-BioZ: 2

Information borrowing in phase II randomized dose-ranging clinical trials in oncology

Guillaume Mulier1,2, Vincent Lévy3, Lucie Biard1,2

1Inserm U1342, team ECSTRRA. Saint-Louis Research Institute, Paris, France; 2APHP, Department of Biostatistics and Medical Information, Saint-Louis hospital, Paris, France; 3APHP, Clinical research department, Avicenne hospital, Paris, France

Introduction

Over the past decades, the emergence of therapeutics such as immunotherapies and targeted therapies has challenged conventional trial designs, particularly single-arm studies. Selecting a single dose from phase I trials with limited follow-up, typically based solely on toxicity endpoints, has often resulted in suboptimal drug dosages. As a result, dose optimization in oncology is now encouraged by international initiatives such as the FDA’s Project Optimus, the Optimal Cancer Care Alliance, and the Patient-Centered Dosing Initiative. This study was motivated by the case of Ibrutinib in chronic lymphocytic leukemia, where the initially approved dose of 420 mg/day—determined through conventional phase I designs based on the maximum tolerated dose—was later found to achieve comparable response rates at lower doses. This highlights the potential value of dose-ranging phase II studies in oncology.

Assuming that borrowing information across doses can enhance statistical power, our objective is to compare various strategies for information borrowing in phase II randomized trials involving multiple doses of the same drug.

Methods

The backbone phase II design considered is the Bayesian Optimal Design (BOP2), adapted for multi-arm settings with co-primary binary endpoints and interim analyses. This design employs a multinomial conjugate distribution within a Bayesian framework, with decision rules for stopping due to futility and/or toxicity based on posterior probabilities.

We adapted and compared different information borrowing approaches for estimating efficacy and toxicity:
(i) power prior,
(ii) incorporation of information from stopped arms,
(iii) Bayesian hierarchical modeling,
(iv) Bayesian logistic regression.

These methods were applied alongside BOP2 decision rules. A simulation study was conducted to assess the operating characteristics of each approach in a hypothetical randomized dose-ranging trial, evaluating efficacy and toxicity against reference values.
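
For illustration, a minimal R sketch of a BOP2-type interim decision rule for a single arm, treating the two binary endpoints marginally with Beta priors (all thresholds and tuning values below are hypothetical, not recommendations):

  bop2_stop <- function(n_resp, n_tox, n, N,
                        p_eff_null = 0.20,   # uninteresting response rate
                        p_tox_max  = 0.30,   # unacceptable toxicity rate
                        lambda = 0.85, gamma = 1, a = 0.5, b = 0.5) {
    cn <- 1 - lambda * (n / N)^gamma                              # threshold tightening with n
    pr_futile <- pbeta(p_eff_null, a + n_resp, b + n - n_resp)    # Pr(p_eff <= null | data)
    pr_toxic  <- 1 - pbeta(p_tox_max, a + n_tox, b + n - n_tox)   # Pr(p_tox >= max | data)
    c(stop_futility = pr_futile > cn, stop_toxicity = pr_toxic > cn)
  }
  # e.g. at an interim with 3/15 responders and 4/15 toxicities out of a planned 40 patients:
  # bop2_stop(n_resp = 3, n_tox = 4, n = 15, N = 40)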

Results

Our findings indicate that power prior, when applied without dynamic adaptation, is unsuitable as it increases false positive rates. Bayesian hierarchical modeling shrinks estimates toward a common mean, reducing variance but also inflating false positive rates. In contrast, Bayesian logistic regression provides a balanced trade-off, enhancing power to some extent while maintaining a lower false positive rate.

Conclusion

Bayesian logistic regression, modeling both dose-toxicity and dose-efficacy relationships, combined with BOP2 decision rules, offers a promising approach for borrowing information in dose-ranging studies with a limited number of doses. However, designs without information borrowing provide stricter false positive control and should also be considered.



posters-monday-BioZ: 3

Information borrowing in Bayesian clinical trials: choice of tuning parameters for the robust mixture prior

Vivienn Weru1, Annette Kopp-Schneider1, Manuel Wiesenfarth3, Sebastian Weber2, Silvia Calderazzo1

1German Cancer Research Center (DKFZ), Germany; 2Novartis Pharma AG, 4002 Basel, Switzerland; 3Cogitars GmbH, Heidelberg, Germany

Introduction

Borrowing external data for use in a current study has emerged as an attractive research area with the potential to make current studies more efficient, especially where recruitment of patients is difficult.

Methods

Bayesian methods provide a natural approach to incorporate external data via specification of informative prior distributions. Potential heterogeneity between external and current trial data, however, poses a significant challenge in this context. We focus on the robust mixture prior, a convex combination of an informative prior and a robustifying component, which allows borrowing to be greatest when the current and external data are observed to be similar and smallest otherwise. This prior requires the choice of three additional quantities: the mixture weight, and the mean and dispersion of the robust component. Some choices of these quantities may, however, lead to undesirable operating characteristics. We systematically investigate this impact across combinations of robust component parameters and weight choices in one-arm and hybrid-control trials, where, in the latter, current control data are informed by external control data. An alternative functional form for the robust component is also investigated.
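
As a concrete illustration of the mechanism (a base-R sketch with purely illustrative numbers, not a recommendation), the conjugate posterior update of a two-component beta mixture prior for a response rate shows how the mixture weight shifts towards the robust component under prior-data conflict:

  update_mixture <- function(x, n, w, a1, b1, a2, b2) {
    # marginal (beta-binomial) likelihood of x responders out of n under each component
    m1 <- beta(a1 + x, b1 + n - x) / beta(a1, b1)
    m2 <- beta(a2 + x, b2 + n - x) / beta(a2, b2)
    w_post <- w * m1 / (w * m1 + (1 - w) * m2)    # updated weight of the informative component
    list(weights = c(w_post, 1 - w_post),
         informative = c(a1 + x, b1 + n - x),     # updated informative component
         robust      = c(a2 + x, b2 + n - x))     # updated robust component
  }
  # prior-data conflict (4/30 responders vs. an optimistic Beta(12, 8) informative prior):
  # update_mixture(x = 4, n = 30, w = 0.8, a1 = 12, b1 = 8, a2 = 1, b2 = 1)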

Results

For some parameter choices, losses may still be unbounded despite the use of dynamic borrowing, for both testing and estimation: the Type I error (TIE) rate may approach 1 while the MSE may increase without bound. In the hybrid-control setting, the parameter choices further impact the size and location of the “sweet spot” where control of the TIE rate and a gain in power are observed. For such a sweet spot, its width correlates negatively with the maximum power gain. We further explore the behaviour of the mixture prior when adopting a heavy-tailed distribution for the robust component, which is able to cap TIE rate and MSE inflation.

Conclusion

The choice of the parameters of the robust component of the mixture prior, as well as of the mixture weight, is non-trivial. All three choices are influential and act together, so their impact needs to be assessed jointly. We provide recommendations for these choices, as well as considerations to keep in mind when evaluating operating characteristics.



posters-monday-BioZ: 4

A Bayesian approach to decision making in early development clinical trials: an R solution

Audrey Te-ying Yeo

Independent

Early clinical trials play a critical role in oncology drug development. The main purpose of early trials is to determine whether a novel treatment demonstrates sufficient safety and efficacy signals to warrant further investment (Lee & Liu, 2008). The new open source R package phase1b (Yeo et al, 2024) is a flexible toolkit that calculates many properties to this end, especially in the oncology therapeutic area. The primary focus of this package is on binary endpoints. The benefit of a Bayesian approach is the possibility to account for prior data (Thall & Simon, 1994), in that a new drug may have shown some signals of efficacy owing to its proposed mode of action, or similar activity based on prior data. The concept of the phase1b package is to evaluate the posterior probability that the response rate with a novel drug is better than with the current standard of care treatment in early phase trials such as phase I. The phase1b package provides a facility for early development study teams to decide on further development of a drug, either by designing a phase II or III trial or by expanding current cohorts. The prior distribution can incorporate any previous data via mixtures of beta distributions. Furthermore, for an assumed true response rate of the novel drug in the wider population, the package calculates the frequentist probability that the current trial would be stopped for efficacy or futility conditional on that true response rate, otherwise known as its operating characteristics. The intended user is the early clinical trial statistician at the design and interim stages of a study; the package offers a flexible approach to setting priors and weighting.
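
As a minimal base-R sketch of the core quantity (not the phase1b interface itself), the posterior probability that the novel drug's response rate exceeds a standard-of-care benchmark, given x responders out of n patients and a Beta(a, b) prior, is:

  post_prob_better <- function(x, n, p_soc, a = 1, b = 1) {
    1 - pbeta(p_soc, a + x, b + n - x)   # Pr(p_new > p_soc | data), posterior is Beta(a + x, b + n - x)
  }
  # e.g. 16/23 responders against a 60% benchmark, with a weakly informative Beta(0.6, 0.4) prior:
  # post_prob_better(x = 16, n = 23, p_soc = 0.60, a = 0.6, b = 0.4)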



posters-monday-BioZ: 5

Designing Clinical Trials in R with rpact and crmPack

Daniel Sabanés Bové1, Gernot Wassmer2, Friedrich Pahlke2

1RCONIS, Taiwan; 2rpact GbR, Germany

The focus of this poster will be on clinical trial designs and their implementation in R. We will present rpact, which is a fully validated, open source, free-of-charge R package for the design and analysis of fixed sample size, group-sequential, and adaptive trials. We will summarize and showcase the functionality of rpact.

In addition, we will also briefly present crmPack, which is an open source, free-of-charge R package for the design and analysis of dose escalation trials.

Together, rpact and crmPack enable the implementation of a very wide range of clinical trials. The poster presentation aims to increase the visibility of the two open source packages in the clinical biostatistics community, and allow for discussions about future developments.
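
As an example of the kind of functionality shown on the poster, a group-sequential design can be specified in a few lines (a sketch assuming rpact's getDesignGroupSequential()/getSampleSizeMeans() interface; the numerical values are purely illustrative):

  library(rpact)
  # three-stage design, one-sided alpha of 2.5%, O'Brien-Fleming-type alpha spending
  design <- getDesignGroupSequential(kMax = 3, alpha = 0.025, sided = 1,
                                     typeOfDesign = "asOF")
  # sample size for detecting a mean difference of 10 with standard deviation 20
  sampleSize <- getSampleSizeMeans(design, alternative = 10, stDev = 20)
  summary(sampleSize)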



posters-monday-BioZ: 6

Leveraging historical controls in the design and analysis of phase II clinical trials

Zhaojin Chen1, Ross Andrew Soo2,3, Bee Choo Tai1,4

1Saw Swee Hock School of Public Health, National University of Singapore, Singapore; 2Department of Haematology-Oncology, National University Cancer Institute Singapore, Singapore; 3Cancer Science Institute of Singapore, National University of Singapore, Singapore; 4Yong Loo Lin School of Medicine, National University of Singapore, Singapore

Background

In oncology, phase II trials are commonly used to screen novel agents for solid tumours using a single-arm design. All patients receive a concurrent treatment (CT), and their overall objective response rate is compared with a pre-defined threshold. However, evidence suggests that such a design often results in false claims of efficacy. This not only wastes time and resources but also raises ethical concerns for trial participants. This study thus aims to improve the current design by incorporating a historical control (HC) arm for more appropriate treatment evaluation.

Methods

For treatment evaluation using HCs, major challenges include imbalance in baseline characteristics, unmeasured baseline variables, and temporal drift of disease outcomes. To tackle these problems, we adopted three main statistical approaches, namely regression adjustment (RA), inverse probability of treatment weighting (IPTW_PS) and matching (MC_PS) based on the propensity score, to reduce potential confounding bias when evaluating the effect of treatment. Simulation studies were conducted for null, small, moderate and large treatment effects based on a binary disease outcome, assuming sample sizes of 100 and 200 with equal treatment allocation. Bias, mean squared error (MSE), coverage probability, type I error and power were used to evaluate their performance. These methods were then applied to the PLASMA phase II trial using HCs from the previously completed AURA 3 phase III trial.
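
A schematic R sketch of the IPTW_PS approach (hypothetical variable names; the other two approaches replace the weighting step with covariate adjustment or propensity-score matching):

  dat <- rbind(concurrent_trial, historical_controls)   # pooled CT and HC data
  ps_model <- glm(treated ~ age + sex + stage + ecog, family = binomial, data = dat)
  dat$ps   <- fitted(ps_model)
  # stabilised inverse probability of treatment weights
  p_trt  <- mean(dat$treated)
  dat$wt <- ifelse(dat$treated == 1, p_trt / dat$ps, (1 - p_trt) / (1 - dat$ps))
  # weighted outcome model for the binary response (quasibinomial to allow non-integer weights;
  # robust standard errors would be used in practice)
  fit <- glm(response ~ treated, family = quasibinomial, data = dat, weights = wt)
  summary(fit)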

Results

Simulation results showed that the RA method slightly overestimates, whereas the IPTW_PS method slightly underestimates, the treatment effect as it increases from null to large. The bias of the MC_PS method can be in either direction and decreases in magnitude when more HCs are available. As the level of imbalance in baseline characteristics increases, bias and MSE increase and power decreases. All three methods are sensitive to unmeasured baseline confounders, but the RA method appears more sensitive to model misspecification than the propensity score-based methods.

Conclusion

Consistent with existing literature, our study found that phase II trials incorporating HCs should be recommended for diseases with well-known mechanisms. Moreover, when there are a large number of HCs available, the MC_PS generally performs better than the other two methods with desirable bias, MSE, type I error and power.



posters-monday-BioZ: 7

Design of a research project to evaluate the statistical utility after transformation of a CDISC database into OMOP format

Claire Castagné1, Amélie Lambert1, Jacek Chmiel2, Alberto Labarga3, Eric Boernert3, Lukasz Kaczmarek3, Francois Margraff3, David Pau1, Camille Bachot1, Thomas Stone3, Dimitar Toshev3

1Roche, France; 2Avenga, Germany; 3F. Hoffmann-La Roche AG

Interoperability between databases is an important issue for facilitating analyses from multiple sources. The OMOP (Observational Medical Outcomes Partnership) format is increasingly used in Europe, particularly in France. A targeted bibliographical review of data sources and standard formats found no article that precisely assesses the loss of data and/or information following transformation to the OMOP format. The aim of this work is to assess the statistical and scientific usefulness of the OMOP format.

An observational study in early breast cancer was conducted in 2019. The database is currently in CDISC SDTM format.

The first step of the project involves transforming the SDTM database into OMOP format.

In the second step, a statistical analysis of the data in OMOP format will be carried out.

In the third step, all the results will be compared with the initial results, using quality indicators to assess the loss of information:

  • indicators regarding transformation to OMOP format, such as the number of observations or variables not transformed,

  • indicators regarding the number of statistical tables not generated,

  • indicators regarding the reliability (no loss of information, partial loss, complete loss) of results obtained by comparing SDTM vs OMOP results

A total of 315 patients were included in the study. The database comprises 7 CDISC domains containing 73 variables: 25 continuous and 48 categorical, covering patient, disease, surgery and treatment characteristics.

Age at treatment initiation was 52.2 (11.8) years; the distribution of the SBR grade, which evaluates disease severity, was grade III in 50.7% of patients, grade II in 45.7% and grade I in 1.6%.

40.3% of patients met the primary outcome, evaluated at surgery by the pathological complete response.

The OMOP transformation will start in February 2025, and results will be available at the congress: descriptive analyses (univariate, bivariate), correlation matrices, modelling and survival analysis will first be performed on the raw study data (SDTM format), then the same analyses will be reproduced on the OMOP datasets. The usual statistical indicators (percentage of missing data, data dispersion, etc.) and the preservation of relationships between variables will be used to quantify the differences observed between the databases in the different formats.

This work will make it possible to assess the statistical usefulness remaining after the switch to OMOP format, thanks to a synthesis of indicators, and to ensure the reproducibility of classic statistical analyses.

At the conference, the results/indicators observed on the OMOP format database will be presented and discussed in relation to the initial results.



posters-monday-BioZ: 8

Introducing CAMIS: an open-source, community endeavor for Comparing Analysis Method Implementations in Software

Yannick Vandendijck1,4, Christina Fillmore2,4, Lyn Taylor3,4

1J&J Innovative Medicine, Belgium; 2GSK, UK; 3Parexel, UK; 4on behalf of the CAMIS working group

Try this in R: > round(2.5), and it will give the result of 2.

Try this in SAS: > data rounding; x = round(2.5); run; and it will give the result of 3.

Seriously?
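
Neither result is a bug: base R's round() follows the IEC 60559 "round half to even" convention, whereas SAS's ROUND rounds halves away from zero, so in R:

  round(2.5)   # 2  (half rounded to the nearest even digit)
  round(3.5)   # 4
  round(0.5)   # 0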

Introduction:

Statisticians using multiple statistical software packages (SAS, R, Python) will have found differences in analysis results that warrant further exploration and justification. These discrepancies across statistical software for ostensibly the same analysis can cause unease when submitting results to a regulatory agency, as it is uncertain whether the agency will view the differences as problematic. This is increasingly important as the pharma industry turns more and more to open-source software such as R to handle complex data analysis, drawn by its flexibility, innovation, added value and cost-effectiveness.

Knowing the reasons for differences (different methods, options, algorithms, etc.) and understanding how to mimic analysis results across software is critical to the modern statistician and subsequent regulatory submissions.

CAMIS:

This talk will introduce the PHUSE DVOST CAMIS (Comparing Analysis Method Implementations in Software) project. The aim of CAMIS is to investigate and document differences and similarities between statistical software packages (SAS, R, Python) to help ease transitions to new languages by providing comparisons and comprehensive explanations. CAMIS contributes to confidence in the reliability of open-source software by showing how analysis results can be matched exactly, or by identifying the source of any discrepancies.

In this talk, I will discuss the objectives of the CAMIS project, present key results on differences and similarities between SAS and R, and show how we collaborate on CAMIS across companies, industries and universities in the open-source community.

Conclusion:

In the transition from proprietary to open-source technology in the industry, CAMIS can serve as a guidebook to navigate this process.

https://psiaims.github.io/CAMIS/

https://github.com/PSIAIMS/CAMIS



posters-monday-BioZ: 9

Assessing covariate influence on cure probability in mixture cure models using martingale difference correlation

Blanca E. Monroy-Castillo, M. Amalia Jácome, Ricardo Cao

Universidade da Coruña, Spain

Background: Cure models analyze time-to-event data while accounting for a subgroup of individuals who will never experience the event. A fundamental question in these models is whether the cure probability is influenced by specific covariates. However, formal statistical tests for assessing covariate effects remain limited. Martingale difference correlation (MDC) provides a non-parametric measure of dependence, where MDC(Y|X) = 0 if and only if E(Y|X) = E(Y), meaning X has no impact on the expectation of Y. This makes MDC a promising tool for testing covariate effects on cure probability.

Methods: We propose a non-parametric hypothesis test based on MDC to evaluate the effect of covariates on the cure probability. A key challenge is that the cure indicator (ν) is only partially observed due to censoring. To address this, we estimate the cure status before applying the test. The methodology is validated through extensive simulation studies, assessing its power and robustness under different scenarios. Additionally, we apply the proposed test to data from a randomized clinical trial on rheumatoid arthritis treatment to identify covariates influencing disease remission.

Results: Simulation studies demonstrate the effectiveness of the proposed method in detecting covariate effects on the cure probability. When applied to the clinical trial data, the test identifies specific covariates associated with an increased probability of experiencing a flare-up. These findings provide new insights into factors influencing disease progression and treatment response in rheumatoid arthritis patients.



posters-monday-BioZ: 10

Aligning Estimators to Treatment Effects in the presence of Intercurrent Events in the Analyses of Safety Outcomes

Pedro Lopez-Romero1, Brenda Crowe2, Philip He3, Natalia Kan-Dobrosky4, Andreas Sashegyi2, Jonathan Siegel5

1Novartis, Spain; 2Eli Lilly, USA; 3Daiichi Sankyo Inc, USA; 4AbbVie Inc, USA; 5Bayer, USA

Introduction: The evaluation of safety is a crucial aspect of drug development. The ICH Estimand Framework (EF) defines clinically relevant treatment effects in the presence of intercurrent events (ICE) and can enhance this evaluation. However, its application in safety evaluation is uncommon. Additionally, sometimes it is not evident which specific estimand a given estimator is targeting, leading to the implementation of analytical strategies that may not align with the treatment effect of clinical interest.

Methods: This work reviews the clinical questions or treatment effects (estimands) that are most common in the safety evaluation of drugs and the strategies outlined in the EF that reflect those treatment effects. We examine the most common statistical estimators used to assess the risk of drugs, including incidence proportions, the Aalen-Johansen estimator, exposure-adjusted incidence rates and 1 minus Kaplan-Meier, focusing on the interpretation of the estimates and on the estimand they target, depending on how ICEs are handled in the analysis, e.g. ignored, censored, or treated as competing events. By understanding (a) the treatment effects that can feasibly be defined in the presence of ICEs and (b) the estimand targeted by each estimator, our goal is to define treatment effects that are clinically meaningful for the evaluation of safety and to use the estimator that aligns with the treatment effect of interest, so that the resulting treatment effect estimates are meaningful and interpretable.
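
As an illustration of how the handling of an ICE changes the estimator (a sketch using R's survival package; the data frame and variable names are hypothetical):

  library(survival)
  # status coded as a factor whose first level is censoring:
  dat$status <- factor(dat$status, levels = c("censor", "AE", "ICE"))
  # Aalen-Johansen cumulative incidence: the ICE is treated as a competing event
  aj <- survfit(Surv(time, status) ~ arm, data = dat)
  # 1 - Kaplan-Meier: the ICE is treated as censoring (a different, hypothetical-like estimand)
  km <- survfit(Surv(time, status == "AE") ~ arm, data = dat)
  summary(aj, times = 52)   # cumulative incidence of the adverse event by week 52, per arm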

Results: Our review includes treatment effects or estimands that are relevant to the evaluation of drug safety, such as treatment policy, hypothetical and while-on-treatment, considering ICE such as early treatment discontinuation or use of rescue medication. We explain why the common estimators target different estimands, helping researchers to select the estimator that aligns with the treatment effect of interest. A misalignment between the estimator and the treatment effect of interest can eventually lead to misinterpretations of safety results that potentially can compromise the understanding about the safety profile of a drug.

Conclusions: Applying the EF to safety evaluation can improve the interpretability of treatment effects in clinical development, both in the area of signal detection and in the analysis of selected adverse events of special interest. By clearly defining the estimand and selecting the appropriate statistical method, researchers can ensure that their analyses align with clinically relevant questions. This approach enhances the accuracy and reliability of safety assessments, ultimately contributing to better-informed decision-making in drug development by regulators, physicians, patients and other stakeholders.



posters-monday-BioZ: 11

CUtools: an R package for clinical utility analysis of predictive models

María Escorihuela Sahún1, Luis Marianos Esteban Escaño1, Gerardo Sanz2, Ángel Borque-Fernando3

1Department of Applied Mathematics, Escuela Universitaria Politécnica La Almunia, University of Zaragoza, Spain; 2Department of Statistical Methods, University of Zaragoza, Spain; 3Urology department, Miguel Servet university hospital, Spain

This work presents a new library in R that provides statistical techniques to validate and evaluate a prediction model both analytically and graphically. The library offers the functions CUC_plot, CUC_table, Efficacy, Efficacy_curve, and Efficacy_test to construct the clinical utility curve, a table of clinical utility values, the efficacy of a biomarker, the efficacy curve, and a test to compare the efficacy of biomarkers.

The purpose of predictive models in clinical diagnosis is to define a biomarker that accurately predicts the occurrence of an event related to a disease. To analyse the predictive capability of a biomarker, this library provides, as an initial output, the clinical utility curve via the CUC_plot function. Clinical utility assesses the benefit of a biomarker used as a dichotomous classifier with a cut-off point. The possible cut-off points of the biomarker, treated as a continuous variable, are plotted on the X-axis, and two magnitudes appear on the Y-axis: the percentage of misclassified events and the percentage of individuals below the cut-off point. These represent the false negative rate and the proportion of treatments avoided when applying the model. Additionally, the CUC_table function provides the numerical values represented graphically.

Another way to analyse the clinical utility of a biomarker is by calculating its efficacy. To study efficacy, this library offers an analytical result with the Efficacy function and a graphical result with the Efficacy_curve function. The numerical value of the marker's efficacy is obtained as the difference between the treatments avoided by the model and the misclassified events; the accompanying graph plots the efficacy of the proposed model against the misclassified events.
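
A base-R sketch of the quantities behind these outputs (an illustration of the definitions above, not the CUtools implementation; example variable names are hypothetical):

  clinical_utility <- function(marker, event, cutoffs = sort(unique(marker))) {
    t(sapply(cutoffs, function(thr) {
      below   <- marker < thr
      missed  <- 100 * sum(below & event == 1) / sum(event == 1)  # % of events misclassified
      avoided <- 100 * mean(below)                                # % of individuals below cut-off
      c(cutoff = thr, missed = missed, avoided = avoided, efficacy = avoided - missed)
    }))
  }
  # cu <- clinical_utility(biomarker_value, event_indicator)
  # plot(cu[, "cutoff"], cu[, "avoided"], type = "l"); lines(cu[, "cutoff"], cu[, "missed"])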



posters-monday-BioZ: 12

Impact of Particulate Matter 2.5 Levels on Chronic Obstructive Pulmonary Disease: An Analysis of Nationwide Claims Data in Thailand

Pawin Numthavaj1, Tint Lwin Win1, Chaiyawat Suppasilp1, Wanchana Ponthongmak1, Panu Looareesuwan1, Suparee Boonmanunt1, Oraluck Pattanaprateep1, Prapaporn Pornsuriyasak1, Chathaya Wongrathanandha2, Kriengsak Vareesangthip3, Phunchai Charatcharoenwitthaya4, Atiporn Ingsathit1, Ammarin Thakkinstian1

1Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Thailand; 2Department of Community Medicine, Faculty of Medicine Ramathibodi Hospital, Mahidol University; 3Division of Nephrology, Department of Medicine, Faculty of Medicine Siriraj Hospital, Mahidol University; 4Division of Gastroenterology, Department of Medicine, Faculty of Medicine Siriraj Hospital, Mahidol University

Introduction: Particulate matter 2.5 (PM 2.5) levels have been associated with morbidity and mortality in chronic obstructive pulmonary disease (COPD). We explored the association between PM 2.5 levels and exacerbations documented in the Thai national claims database of the National Health Security Office, which covers about 70% of the Thai population.

Methods: We extracted COPD exacerbations identified from International Classification of Diseases, 10th revision (ICD-10) codes among patients older than 40 years, and verified the information against documented procedures (nebulizer use, intubation, ventilator use, and temporary tracheostomy). PM 2.5 levels were estimated from a satellite-based formula verified against ground-based measurements. Incidences of COPD exacerbation were then calculated for each week in each district of every province across Thailand and modelled against PM 2.5 exposure in the previous seven days, adjusted for age, gender, and baseline rates of diagnosed comorbidities (cancer, asthma, hypertension, heart failure, anxiety, depression, obesity, diabetes, and dyslipidaemia), using mixed-effect Poisson regression with a random intercept in R. We also compared the standard average and the area-weighted PM 2.5 formulas with respect to model fit.
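
A sketch of the model described above in lme4 syntax (variable names are hypothetical; the offset for the population at risk is one plausible specification):

  library(lme4)
  fit <- glmer(exacerbations ~ pm25_lag7 + age + sex + asthma + heart_failure + diabetes +
                 offset(log(copd_population)) + (1 | district),
               family = poisson, data = weekly_district_data)
  exp(fixef(fit)["pm25_lag7"])   # incidence rate ratio per 1 microgram/m3 increase in PM 2.5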

Results: A total of 407,866 verified COPD patients from January 2017 until December 2002 were identified, corresponding to 1,687,517 hospital visits. Exacerbations, or visits that required lower airway interventions, occurred in 9.9% of these visits. Multivariate Poisson regression found an incidence rate ratio (IRR) for COPD exacerbation of 1.00098 per 1 microgram per cubic metre increment of PM 2.5 (95% CI 1.00091–1.00106). The weighted PM 2.5 formula had lower Akaike information criterion and Bayesian information criterion values in the multivariate model than the standard average PM 2.5 calculation used in previous studies (6,278,282 vs. 6,279,384, and 6,279,627 vs. 6,278,539, respectively).

Conclusion: Our analysis shows that the PM 2.5 level is associated with an increase in the occurrence of COPD exacerbations. We also found that the weighted formula used to calculate exposure levels appears to fit the data better than the standard formula traditionally used in the literature.



posters-monday-BioZ: 13

Changes in health services use of a cohort of COPD patients from a pre-pandemic to a COVID-19 pandemic period

Jose M Quintana1,2,4,5, Maria J Legarreta1,2,4,5, Nere Larrea1,2,4,5, Irantzu Barrio2,4,5,6, Amaia Aramburu1,3, Cristóbal Esteban1,3

1Osakidetza/SVS - Galdakao-Usansolo Hospital, Spain; 2Instituto Biosistemak, Bilbao, Spain; 3Instituto BioBizkaia, Barakaldo, Spain; 4REDISSEC; 5RICAPPS; 6UPV/EHU

Background. The COVID-19 pandemic had negative effects on health, especially in people with chronic diseases. We evaluated differences in health services use among patients with chronic obstructive pulmonary disease (COPD) during the period 2017-2019 compared with the COVID-19 pandemic period 2020-2022.

Methods. Cohort of patients recruited from different hospitals who had an admission due to COPD exacerbation. Sociodemographic and clinical data were collected from all participants in 2016. A follow-up was performed in 2022 with those who agreed to participate, focusing on their use of health services. This included the number of hospital admissions for any cause, ICU admissions, Emergency Room visits, and consultations with primary care physicians, nurses, or medical specialists, collected for the periods 2017-2019 and 2020-2022. A paired dataset was generated in which time 1 corresponds to the years 2017-2019 and time 2, for the same patient, to the 2020-2022 period. From these data, multivariate negative binomial regression models with random effects for patients were developed for each count of service use. Models were adjusted for study period, age, Charlson Index, previous admissions, and SARS-CoV-2 infection or hospital admission in period 2.
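
A sketch of one such model in lme4 syntax (variable names are hypothetical; one row per patient per period):

  library(lme4)
  fit <- glmer.nb(hospital_admissions ~ period + age + charlson + prior_admissions +
                    sars_cov2_period2 + (1 | patient_id),
                  data = paired_long_data)
  exp(fixef(fit)["period2020-2022"])   # rate ratio, pandemic vs. pre-pandemic period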

Results. Out of the original cohort of 1,401 patients, 703 (50.2%) died during the follow-up period. Of the remaining, 314 (45%) chose not to participate in the study, while 384 (55%) did participate. The mean age of the participants was 69.2 years (SD: ±9.8), with men constituting 72.1% of the sample. We observed a statistically significant reduction in the number of hospital admissions, ICU admissions, emergency visits, and face-to-face visits with primary care doctors from the first period to the second period. However, there was no significant change in the number of face-to-face consultations with primary care nurses or pneumologists. Having a SARS-CoV-2 infection or being admitted for it during the second period was associated with an increase in hospital admissions, emergency visits, and face-to-face consultations with pneumologists and primary care nurses. Additionally, SARS-CoV-2 infection influenced the face-to-face visits to primary care doctors, but neither factor affected ICU admissions.

Conclusion. The COVID-19 pandemic had an important negative effect on patients with COPD. On the one hand, use of most health services by these patients decreased significantly. On the other hand, having had a SARS-CoV-2 infection, or a hospital admission for it, was related to greater use of these health services.



posters-monday-BioZ: 14

The ISARIC Clinical Epidemiology Platform: Standardized Analytical Pipelines for Rapid Outbreak Response

Esteban Garcia-Gallo1, Tom Edinburgh1, Sara Duque1, Leonardo Bastos2, Igor Tona Peres2, Elise Pesonel1, Laura Merson1

1Pandemic Sciences Institute, University of Oxford (United Kingdom); 2Pontifical Catholic University of Rio de Janeiro (Brazil)

Background:
The International Severe Acute Respiratory and Emerging Infection Consortium (ISARIC) is a global research network facilitating rapid clinical responses to infectious disease outbreaks. Comprising 60 members across 133 countries, ISARIC has generated critical evidence for diseases such as COVID-19, dengue, Ebola, and mpox. Its guiding principles—Prepare, Integrate, Collaborate, and Share—support research readiness, integration with public health systems, strong partnerships, and open-access resource-sharing.

Since 2012, the ISARIC Clinical Characterisation Protocol (CCP) has enabled standardized, adaptable investigations of high-consequence pathogens. During the COVID-19 pandemic, ISARIC’s case report form (CRF) was widely adopted, contributing to a dataset of one million patients. Lessons from past outbreaks underscore the need for both flexibility and standardization in clinical research. A decentralized approach ensures local data ownership while enabling global integration, scalability, and equitable collaboration—key principles driving the development of the ISARIC Clinical Epidemiology Platform (ISARIC-CEP).

Methods:
The ISARIC-CEP consists of three tools—ARC, BRIDGE, and VERTEX—designed to streamline data collection, curation, analysis, and evidence generation. ARC provides a machine-readable library of standardized CRF questions, BRIDGE automates CRF generation for seamless REDCap integration, and VERTEX is an open-source application comprising three packages:

  1. get_REDCap_Data: Harmonizes and transforms ARC-formatted REDCap data into analysis-ready dataframes.
  2. ISARICAnalytics: A set of Reusable Analytical Pipelines (RAPs) standardizing key epidemiological analyses, including descriptive statistics, data imputation, regression models, feature selection, and survival analysis.
  3. ISARICDraw: Generates interactive dashboards with customizable outbreak-specific visualizations using Plotly.

VERTEX supports insight panels, organizing outputs into thematic sections, and its adaptable framework enables secure customized dashboards for multiple projects.

Results:
The ISARIC-CEP has accelerated clinical research responses, including studies on dengue in Southeast Asia and Brazil. By providing openly accessible tools, it has facilitated high-quality analyses for both scientific and public health communities. Key resources include:

  • ARC: https://github.com/ISARICResearch/ARC
  • BRIDGE: http://bridge.isaric.org/
  • VERTEX: https://github.com/ISARICResearch/VERTEX
  • Public Dashboard Example: http://vertex-observationalcohortstudy-mpox-drc.isaric.org

Conclusions:
The ISARIC-CEP accelerates outbreak research by ensuring that during an initial response, most time is spent on data capture, not on harmonization, curation, or preparation for analysis. VERTEX’s RAPs streamline analyses, allowing standardized workflows to be shared and adapted across outbreaks, reducing duplication and improving efficiency. Our goal is to build a collaborative community where researchers contribute RAPs, making validated methodologies easily integrable and reusable, amplifying their real-world impact. This approach strengthens clinical research sites, providing automated tools that enhance local capacity and ensure rapid, reproducible, and scalable outbreak analyses.



posters-monday-BioZ: 15

Topic modelling and time-series analysis to explore methodological trend evolution

Gabrielle Gauthier-Gagné, Tibor Schuster

McGill University, Canada

Background: Statistical methodology used in biomedical research is evolving rapidly, driven by advances in biostatistical approaches and increased integration of machine learning techniques and causal inference frameworks. This convergence is reshaping the methodological foundations that underlie the analysis and interpretation of biomedical data in the literature. Both applied and methodological researchers may wish to explore these trends in their field to better understand the associated implications for evaluating, planning and conducting future studies. However, exploring these trends using conventional literature reviews is both time-consuming and requires periodical updates as the field develops. Therefore, we propose leveraging topic modelling and time-series analysis to explore methodological trend evolution which can easily be replicated and updated.

Methods: We considered two parallel case studies to informally assess the utility of the proposed approach: examination of i) the literature on clinical trials and ii) literature pertaining to medical records. We employed readily available APIs to systematically extract PubMed abstract data related to studies which conducted clinical trials or examined medical records, respectively, in the last 10 years. Abstract text data was tokenized and structured as a document-term matrix (DTM). A large language model software was used to generate an exhaustive dictionary of terms (uni- and bigrams) commonly used in statistics and machine learning. The DTM was reduced to include only the terms corresponding to the entries of the derived term dictionary. Very common terms were additionally excluded. Latent Dirichlet Allocation was used to uncover latent topics across abstracts and to enable mapping of the distribution of topics within abstracts. Time-series analysis was used to characterize and visualize the trends of (average) topical prevalence over time (months), leveraging abstract publication dates and corresponding topic distributions.
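
A sketch of the modelling step with the topicmodels package (illustrative; 'dtm' is assumed to be the dictionary-restricted document-term matrix built as described above, 'abstract_month' the publication month of each abstract, and the number of topics k an analyst choice):

  library(topicmodels)
  lda_fit <- LDA(dtm, k = 20, control = list(seed = 123))   # latent Dirichlet allocation
  topic_dist <- posterior(lda_fit)$topics                   # per-abstract topic distributions
  terms(lda_fit, 10)                                        # top 10 terms per topic
  # average topical prevalence per month, then analysed and plotted as a time series
  monthly <- aggregate(as.data.frame(topic_dist), by = list(month = abstract_month), FUN = mean)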

Results: The search identified 166,932 and 7,999 unique abstracts relating to clinical trials and medical record studies, respectively, for review. The generated statistical and machine-learning term lists contained 1,803 statistical and 200 machine-learning-related terms and bigrams. Both the time series analyses and the visualizations of topic trends over the past decade indicate dynamic and distinct shifts in the landscape of statistical methodology specific to each case study.

Conclusion: We demonstrate that topic modelling paired with time-series analysis is a powerful tool for methodological researchers to explore the evolution of statistical methodologies in their field over time.



posters-monday-BioZ: 16

Post-stroke facial palsy: prevalence on admission, risk factors, and recovery with hyperacute treatments

Zewen Lu1,2, Havva Sumeyye Eroglu3, Halvor Næss4, Matthew Gittins1,2, Amit K Kishore2,5, Craig J Smith2,5, Andy Vail1,2, Claire Mitchell3

1Centre for Biostatistics, University of Manchester, Manchester Academic Health Science Centre, UK; 2Manchester Centre for Clinical Neuroscience, Geoffrey Jefferson Brain Research Centre, Manchester Academic Health Centre, Salford Care Organisation, Northern Care Alliance NHS Foundation Trust, UK; 3Division of Psychology, Communication & Human Neuroscience, Geoffrey Jefferson Brain Research Centre, University of Manchester, Manchester, UK; 4Department of Neurology, University of Bergen, Haukeland University Hospital, Bergen, Norway; 5Division of Cardiovascular Sciences, Faculty of Biology, Medicine and Health, University of Manchester, UK

Background
Facial palsy affects 40 - 50% of stroke survivors, impacting quality of life, communication, and emotional expression. This study estimated its prevalence, identified risk factors, assessed 7-day recovery post-admission and examined associations between hyper-acute treatments (intravenous thrombolysis [IVT] and mechanical thrombectomy [MT]) and recovery in acute ischaemic stroke (AIS) patients.

Methods
This was a retrospective analysis of individual patient data from the Bergen NORSTROKE registry, comprising 5,987 patients (2006–2021). Only the 2,293 patients with facial palsy were included in our recovery analysis. We further investigated the association of hyper-acute treatments with facial palsy recovery in 1,954 patients with AIS. Complete case analysis was used at each stage of the analysis because of minimal missing data. Facial palsy was assessed via the National Institutes of Health Stroke Scale. Prevalence and severity of facial palsy on admission were analysed using descriptive statistics, while multifactorial logistic regression explored associations with demographics, stroke subtypes, and neurological symptom clusters. Kaplan-Meier survival curves estimated recovery rates within seven days of admission, and Cox proportional hazards models identified factors associated with recovery. The association between hyper-acute treatments and recovery was assessed using Cox models with time-dependent covariates, adjusting for baseline characteristics.

Results
Facial palsy was present in 43% of patients on admission, with 40% experiencing minor or partial paralysis and 3% complete paralysis. Significant risk factors included sex, age, admission motor and sensory function, and ischaemic stroke. By day 3, 25% of patients had recovered, but over 60% still had facial palsy by day 7. Better admission motor and sensory function were strongly associated with recovery. Receiving IVT showed a significant association with better recovery in unadjusted analyses, but neither IVT nor MT were significant in adjusted models.

Conclusions
Post-stroke facial palsy is common on admission, and fewer than 40% of patients recover within the first week. This highlights the need for targeted monitoring and rehabilitation. Further research should explore the role of hyper-acute treatments in longer-term recovery.



posters-monday-BioZ: 17

Evaluating Outlier Detection Methods in Real-World Growth Data: A Sensitivity Analysis of Imperfect Data in a Cluster Randomised Controlled Trial

Maryam Shojaei Shahrokhabadi1, Mohadeseh Shojaei Shahrokhabadi2, Bram Burger3, Ashley J. Adamson2, Dawn Teare2

1Hasselt University, Belgium; 2Newcastle University, UK; 3Uppsala University, Sweden

Background: Growth studies with longitudinal measurements need outlier detection methods that can consider diverse, individual growth trajectories. Several methodological approaches have been developed, with distinct underlying assumptions, which can lead to differing results, potentially influencing study conclusions. To assess the reliability and robustness of primary analyses, we conducted a sensitivity analysis exploring the impact of multiple outlier detection methods on findings from the MapMe 2 study [1].

Methods: The MapMe 2 study, a cluster randomised controlled trial (cRCT), evaluated whether incorporating the MapMe 2 intervention into existing National Child Measurement Programme (NCMP) feedback letters improved child weight outcomes after one year. The primary outcome compared the change in BMI Z-score between intervention and control groups, including all children irrespective of baseline weight status, and specifically among children with a BMI Z-score > 1.33 at baseline. While the study initially used static WHO cut-offs to identify extreme or biologically implausible values (BIVs), in this large-scale trial we explored alternative outlier detection methods. Five approaches were compared to the original sBIV method [2]: (1) modified BIV detection (mBIV), (2) single-model outlier measurement detection (SMOM), (3) multi-model outlier measurement detection (MMOM), (4) multi-model outlier trajectory detection (MMOT), and (5) clustering-based outlier trajectory detection (COT). We then evaluated the impact of these methods on the study findings.

Results: Different outlier detection methods resulted in variations in the number of subjects analysed and slight changes in the estimated effect of the MapMe 2 intervention on BMI Z-score change at one year. However, these differences were minimal, and the overall trends remained consistent.

Conclusion: Sensitivity analyses under varying assumptions yielded results consistent with the primary analysis, confirming its robustness and reinforcing confidence in the trial findings.

References:

  1. Adamson AJ, et al. Can embedding the MapMe2 intervention in the National Child Measurement Programme lead to improved child weight outcomes at one year? 2021. Trial registration: [ISRCTN12378125]. Available from: https://www.isrctn.com/ISRCTN12378125.
  2. Massara P, Asrar A, Bourdon C, Ngari M, Keown-Stoneman CD, Maguire JL, Birken CS, Berkley JA, Bandsma RH, Comelli EM. New approaches and technical considerations in detecting outlier measurements and trajectories in longitudinal children growth data. BMC Medical Research Methodology. 2023 Oct 13;23(1):232.


posters-monday-BioZ: 18

Latent class analysis on intersectional social identities and mental wellbeing among ethnic minority youth in Aotearoa New Zealand

Arier Lee1, Shanthi Ameratunga1,2, Rodrigo Ramalho1, Rachel Simon-Kumar1, Vartika Sharma1, Renee Liang1, Kristy Kang1, Terryann Clark3, Terry Fleming4, Roshini Peiris-John1

1School of Population Health, University of Auckland, Auckland, New Zealand; 2Population Health Gain, Population Planning Funding and Outcomes Directorate, Te Whatu Ora – Health New Zealand, Auckland, New Zealand; 3School of Nursing, University of Auckland, Auckland, New Zealand; 4School of Health, Victoria University of Wellington, Wellington, New Zealand

Background / Introduction

Ethnic minority youth in Aotearoa New Zealand who identify as Asian, Middle Eastern, Latin American, or African navigate multiple shifting identities. Conventional approaches in the literature often frame their experiences through a single social dimension, such as ethnicity. However, this limits deeper insights into how overlapping social identities, linked to broader structural inequities, affect emotional wellbeing. Using an intersectional framework, this study explored how multiple social identities and affiliations influence the mental health and wellbeing of ethnic minority young people.

Methods

We analysed cross-sectional data from 2,111 ethnic minority youth (99% aged 13 to 19) who participated in a population-based secondary school survey in New Zealand in 2019. Latent Class Analysis (LCA) was employed to identify unobserved social affiliation groups based on categorical variables, including sex, sexual and gender identities, religion, perceived ethnicity, migrant generational status, disability, and material deprivation. LCA was also applied to nine family connectedness indicators (e.g., trust in sharing feelings with a family member), classifying participants into distinct family support groups. Multiple logistic regression models were used to predict the outcomes of mental health and wellbeing, by LCA-identified social affiliation and family support groups, and experiences of discrimination and bullying.
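
A sketch of the LCA step with the poLCA package (hypothetical variable names; poLCA expects manifest variables coded as positive integers, and the number of classes would be chosen by comparing model fit):

  library(poLCA)
  f <- cbind(sex, gender_sexuality, religion, perceived_ethnicity,
             migrant_generation, disability, deprivation) ~ 1
  lca4 <- poLCA(f, data = youth, nclass = 4, nrep = 10, maxiter = 5000)
  # the most likely class then enters the outcome models as a predictor:
  youth$social_group <- lca4$predclass
  fit <- glm(poor_wellbeing ~ factor(social_group) + discrimination + bullying + family_support,
             family = binomial, data = youth)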

Results

LCA identified four distinct social affiliation groups among ethnic minority youth:

  1. Least marginalised, mixed migration generations
  2. Some marginalised affiliations, mainly overseas-born
  3. Some marginalised affiliations, mainly NZ-born
  4. Multiply marginalised, mixed migration generations

The least marginalised group (Group 1) reported the best mental health and wellbeing outcomes, followed by Groups 2 and 3, while the multiply marginalised group (Group 4) exhibited the highest risks of adverse health outcomes. Independent of social affiliation group, experiences of discrimination and bullying were strongly associated with increased risks of poor mental health. However, higher levels of family support significantly reduced these risks across all social affiliation groups.

Conclusion

Marginalised social identities have cumulative harmful effects on the mental health and wellbeing of ethnic minority youth, but family support can mitigate some, though not all, of this risk. The use of LCA enabled the classification of participants into distinct social affiliation groups based on multiple intersecting social identity variables, without assuming their independence, thus providing a more nuanced picture of the relationship between identity and mental health outcomes. These findings underscore the need to create inclusive and supportive environments for ethnic minority youth and their families.



posters-monday-BioZ: 19

Using multiple imputation in real-world data studies to aid in the identification of predictors of response while addressing missing data

Jozefien Buyze1, Lada Mitchell2, Lorenzo Acciarri3

1Johnson & Johnson, Beerse, Belgium; 2Johnson & Johnson, Allschwil, Switzerland; 3CRO Valos (J&J partner), Genova, Italy

Background: In real-world data (RWD) studies, the inference drawn from estimates can be jeopardized by missingness in key variables. Recent guidance from the FDA (March 2024) and ICH EMA (May 2024) emphasizes the importance of addressing this issue. This research aims to address missing data for covariates in RWD studies. The hypothesis is that multiple imputation helps to reduce bias and improve validity, reliability, and efficiency of the estimation methods.

Methods: Multiple imputation was applied, assuming data are missing at random (MAR). Multiple imputation preserves all cases and accounts for the uncertainty due to missing data. It is crucial to recognize that if the MAR assumption is violated, the results may be biased. Given the non-monotone missing data pattern observed, we applied the fully conditional specification method for imputing missing variables. This method does not rely on a joint distribution but specifies a separate conditional distribution for each variable needing imputation (Van Buuren 2007).
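
A sketch of this step with the mice package (hypothetical variable names; m = 50 imputations, as used in the analysis below, with results pooled by Rubin's rules):

  library(mice)
  imp <- mice(analysis_data, m = 50, maxit = 20, seed = 2024)   # fully conditional specification
  fits <- with(imp, glm(response ~ prior_lines + refractory + thrombocytes + measurable_disease,
                        family = binomial))
  summary(pool(fits))                                           # Rubin's rules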

Understanding the effectiveness of the standard of care and its predictors remains an area of unmet need. The case study utilizes data pooled from two prospective single-arm oncology real-world data (RWD) studies, where missing data are present in several baseline covariates relevant to the statistical model for the effectiveness variable, i.e. the overall response rate (ORR).

The performance of multiple imputation in different scenarios with varying amounts of missingness was investigated via simulations.

Results: The models were applied to the pooled data (N=302) of the two RWD studies. The results of the models applied to the 50 imputed datasets were combined using Rubin’s rules (Rubin 1996). Notably, 59% of patients had no missing data for the selected covariates. Applying multiple imputation allowed for the identification of covariates that affect the effectiveness of the standard of care. Potential predictors of ORR include the number of prior lines of therapy, refractory status, thrombocytes, and type of measurable disease. Simulation outcomes further validated the results.

Conclusions: This research investigated methodologies for handling missing data in RWD studies and established a clear framework for applying multiple imputation for important covariates within the context of multiple myeloma. The results show that multiple imputation helped to reduce bias and improve validity, reliability, and efficiency of the prediction methods.



posters-monday-BioZ: 20

An imputation method for heterogeneous studies in Network Meta-Analysis: A Fully Conditional Specification approach using distance metrics

Christos Christogiannis1,2, Dimitris Mavridis2

1University of Southampton, UK; 2University of Ioannina, Greece

Background: Multiple Imputation (MI) is a popular method for addressing missing data in Individual Patient Data (IPD) meta-analysis. In an IPD meta-analysis with missing data, the complete case analysis (CCA) is considered a reasonable starting point, with MI as a sensitivity analysis, or vice versa. Fully Conditional Specification (FCS) is an MI method that addresses missing data by imputing one variable at a time, cycling through iterations of univariate models. In each iteration, the incomplete variable is imputed based on both the complete and the previously imputed variables.

Methods: Our approach involves estimating the proximity between studies using various distance metrics. By doing so, we identify groups of studies. We then impute each study individually, borrowing information from the studies that are in close proximity in terms of distance. Imputation is therefore informed by neighbouring studies, enhancing its accuracy. We conducted a simulation study to evaluate the properties of the suggested methodology and to explore how the number of studies, the number of patients per study, missing rates, the standard deviation of the covariates, heterogeneity, and the correlation of the covariates affect the results. Accounting for all the aforementioned factors resulted in 216 distinct simulation scenarios. The methods compared were CCA, FCS, our proposed approach, and the full model as it would be if no missing values were induced in the data. The missingness mechanism was set to be MAR.

Results: Simulation results were similar between FCS and the proposed method for small percentages of missingness. In scenarios with 50% missingness, the proposed method outperformed the FCS imputation method in most cases. As missingness percentages decreased, our method yielded results similar to FCS, with differences in the third decimal place. More specifically, it had a coverage rate (CR) closer to 95% and was less biased than the FCS approach, but had a slightly higher root mean square error (RMSE).

Conclusion: The proposed method yielded robust results after evaluation. This means that our method may substantially improve estimation when heterogeneous studies are present in IPD meta-analysis.



posters-monday-BioZ: 21

Impact of lack of measurement invariance on causal inference in randomized controlled trials including patient-reported outcome measures: a simulation study

Corentin Choisy, Yseulys Dubuy, Véronique Sébille

Nantes Université, Université de Tours, INSERM, methodS in Patients-centered outcomes and HEalth ResEarch, SPHERE, 44200 Nantes, France.

Aims: Randomized controlled trials (RCTs) are considered the gold standard for causal inference. RCTs often include patient-reported outcome measures (PROMs), which use questionnaires to give insight into patients’ subjective experience regarding, e.g., quality of life or fatigue. PROMs are often treated like any other outcome, e.g. blood pressure, despite having their own specificities. For instance, when measuring fatigue, patients’ interpretation of PROM items can differ between groups (Differential Item Functioning, DIF) or change over time (Response Shift, RS) despite similar fatigue levels. In RCTs, randomization should ensure the absence of DIF at baseline. However, RS may subsequently occur differentially between treatment groups during the study, possibly leading to treatment-related DIF when assessing outcomes post-randomization. While such instances of lack of measurement invariance (MI) may provide a better understanding of patients’ experiences, they can also induce measurement bias if ignored. Our objectives were to measure the impact of lack of MI on causal inference in RCTs and to determine, using a simulation study, how different statistical approaches can handle lack of MI and restore causal inference.

Methods: Responses to a PROM were simulated to mimic a two-arm RCT with varying sample size, treatment effect (under H0 and H1), and number of items. The number of items affected by DIF and the DIF size also varied. Partial credit models (PCM) were used to estimate the treatment effect with three strategies: S1: ignoring DIF; S2 and S3: accounting for DIF using two PCM-based iterative procedures, either performing tests on PCM parameters (S2) or an analysis of variance of person-item residuals (S3).

Results: When DIF was not simulated, it was not falsely evidenced by S2 and S3. When DIF was simulated and ignored (S1), scenarios under H0 showed high type-I error rates (up to 74 %), and treatment effect estimations were biased under H0 and H1. Overall, bias increased with the size and the proportion of items affected by DIF.

S2 and S3 helped to reduce DIF impact on bias, type-I error, and restore power in scenarios with a sample size of 600 patients. However, they only provided marginal improvements with smaller sample sizes.

Conclusion: This study highlights that causal inference in RCTs can be compromised by lack of MI if it is ignored or inappropriately handled. Methods aiming to detect and account for lack of MI can help reduce the risk of biased estimates of the treatment effect, particularly when the sample size is large.



posters-monday-BioZ: 22

Evaluation of the Psychometric Qualities of Idiographic Patient Reported Outcome Measures (I-PROMs) for Patient Monitoring: PSYCHLOPS example

Salma Ahmed Ayis1, Luís Miguel Madeira Faísca2, Célia Sales3

1School of Life Course and Population Sciences; King's College London, United Kingdom; 2The University of Algarve, Portugal; 3Faculty of Psychology and Education Sciences; University of Porto (FPCEUP)

Introduction/background:

Nomothetic measures are standardised questionnaires that measure patients’ self-reported experiences (Patient Reported Outcome Measures, PROMs). PROMs are brief, acceptable to patients and assessors, and broad enough to capture a breadth of difficulties and experiences, allowing for population-level comparisons. Patients’ scores are interpreted against norms derived from clinical and non-clinical populations. Change in scores is often used in trials to assess therapeutic effect.

However, nomothetic PROMs are unable to capture unique problems and circumstances. Patient-Generated Outcome Measures, known as Idiographic PROMs (I-PROMs), allow people to identify their problems, describe them, and provide scores to indicate their impact, thereby allowing the use of appropriate interventions and the assessment of their efficacy. The Psychological Outcome Profiles (PSYCHLOPS) is an I-PROM with questions on problems, function, and wellbeing, in which patients describe their problems and score their severity. The WHO has been using PSYCHLOPS for many years as part of its ‘Problem Management Plus’ intervention.

Nomothetic measures assume that individual questionnaire items assess one or more underlying constructs that can be summarised using latent class-based methods. I-PROMs, on the other hand, primarily value the uniqueness of individual experiences, perceptions, and constructions; therefore, summarising responses through an underlying construct is considered inappropriate for reflecting persons’ expressions.

In two studies we examined the theory behind I-PROMs and the potential value of latent class methods in providing an insight into these measures. Factor analysis and Item Response Theory (IRT) were used to understand the properties of PSYCHLOPS, an I-PROM.

Methods:

Pre- and post-treatment PSYCHLOPS data derived from six clinical samples (n = 939) were analysed for validity, reliability and responsiveness; caseness cut-offs and reliable change index were calculated. Exploratory and Confirmatory Factor Analyses were used to determine whether items represented a unidimensional construct; IRT examined items’ properties.
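
For illustration, the sketch below shows how two of the quantities named in the Methods, Cronbach’s alpha and a Jacobson-Truax-style reliable change index, can be computed from item-level data; the simulated data and the specific RCI variant are assumptions, not the study’s actual analysis.

    # Sketch of Cronbach's alpha and a reliable change index (RCI) on simulated item data.
    import numpy as np

    def cronbach_alpha(items):
        """items: n_persons x n_items array of scores."""
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1)
        total_var = items.sum(axis=1).var(ddof=1)
        return k / (k - 1) * (1 - item_vars.sum() / total_var)

    def reliable_change_index(pre, post, reliability):
        """RCI = (post - pre) / standard error of the difference (Jacobson-Truax style)."""
        s_pre = np.std(pre, ddof=1)
        se_measurement = s_pre * np.sqrt(1 - reliability)
        s_diff = np.sqrt(2) * se_measurement
        return (post - pre) / s_diff

    rng = np.random.default_rng(0)
    latent = rng.normal(size=200)
    items = latent[:, None] + rng.normal(scale=0.8, size=(200, 6))   # 6 items, one trait
    alpha = cronbach_alpha(items)

    pre = items.sum(axis=1)
    post = pre - rng.normal(loc=3, scale=2, size=200)                # hypothetical improvement
    rci = reliable_change_index(pre, post, alpha)
    print(round(alpha, 2), np.mean(np.abs(rci) > 1.96))              # share with reliable change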

Results:

Estimates for internal consistency, construct validity, and structural validity were satisfactory. Responsiveness was high (Cohen’s d = 1.48). The caseness cut-off and reliable clinical change scores were 6.41 and 4.63, respectively. Factor analysis supported the items’ unidimensionality. IRT analysis confirmed that the items possess strong properties in assessing the underlying trait measured by PSYCHLOPS.

Conclusion:

PSYCHLOPS functioned as a measure of a single latent trait, which we describe as ‘personal distress’.

There are several challenges for I-PROMs including the robustness of the items to be measured, their measurement model, their reliability and validity, and the meaning of an aggregated I-PROM score. I-PROMs may complement nomothetic measures.



posters-monday-BioZ: 23

Bias in the estimation of a psychometric function when using the PSI-method under optimal conditions – a simulation study

Simon Grøntved1,2, Jakob Nebeling Hedegaard1, Ib Thorsgaard Jensen3, Daniel Skak Mazhari-Jensen4

1Danish Center for Health Services Research, Department of Clinical Medicine, Aalborg University, Denmark; 2Psychiatry, Region North Jutland, Denmark; 3Statistics and Mathematical Economics, Department of Mathematical Science, Aalborg University, Denmark; 4Neural Engineering and Neurophysiology, Department of Health Science and Technology, Aalborg University, Denmark

Background

The PSI-method is a Bayesian adaptive method intended to estimate the threshold and slope of a parametrized psychometric function. The method has been used in both research and clinical practice. It was proposed as an improvement over non-adaptive methods due to a potential need for fewer trials before convergence of the estimates is achieved. This has resulted in several studies terminating estimation after only 30-40 stimulation trials. A similar range of trials was deemed sufficient in the original study presented by the developers of the algorithm.

While concerns about the choice of parametrization for the lapse rate have been raised, the number of trials needed and how this number relates to estimation of thresholds have been under less scrutiny.

Aim

We aimed to investigate the potential for bias of the PSI-method's estimates of threshold and slope of the psychometric function, and to investigate whether such bias depended on the number of trials used, the ground-truth threshold, and the slope.

Method

We tested the PSI-method (as implemented in the Palamedes toolbox) in a simulation study with 3874 different person profiles and 175 simulations per profile. We used a uniform prior for the threshold and slope. To isolate potential bias in the threshold and slope, we fixed the lapse and guess rates to their ground-truth values. We calculated the relative bias in the estimation of the threshold along with confidence intervals, and plotted these against the number of trials used, the underlying threshold, and the slope.
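
The toy simulation below conveys how relative bias in the threshold can be assessed as a function of the number of trials; it uses a simplified grid-based adaptive procedure that places each stimulus at the current posterior-mean threshold rather than the full entropy-minimising PSI algorithm of the Palamedes toolbox, so the numbers it produces are purely illustrative.

    # Simplified illustration (not the Palamedes PSI implementation): grid-based Bayesian
    # adaptive estimation with stimuli placed at the current threshold estimate, followed
    # by computation of the relative bias of the posterior-mean threshold.
    import numpy as np

    def psychometric(x, alpha, beta, guess=0.02, lapse=0.02):
        p = 1.0 / (1.0 + np.exp(-beta * (x - alpha)))        # logistic core
        return guess + (1 - guess - lapse) * p

    def run_session(true_alpha, true_beta, n_trials, rng):
        alphas = np.linspace(-3, 3, 61)                      # uniform prior grid for threshold
        betas = np.linspace(0.5, 5, 31)                      # uniform prior grid for slope
        A, B = np.meshgrid(alphas, betas, indexing="ij")
        post = np.ones_like(A) / A.size
        for _ in range(n_trials):
            x = np.sum(post * A)                             # stimulus at current threshold estimate
            resp = rng.random() < psychometric(x, true_alpha, true_beta)
            p = psychometric(x, A, B)
            post *= p if resp else (1 - p)
            post /= post.sum()
        return np.sum(post * A)                              # posterior-mean threshold

    rng = np.random.default_rng(0)
    true_alpha, true_beta = 1.0, 2.0
    for n_trials in (50, 150, 1000):
        est = np.array([run_session(true_alpha, true_beta, n_trials, rng) for _ in range(200)])
        rel_bias = (est.mean() - true_alpha) / true_alpha
        print(n_trials, round(100 * rel_bias, 1), "% relative bias")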

Results

We found bias in the estimation of the threshold (alpha) after 50 trials: the mean relative bias was positive across most person profiles, median 18.1% [IQR: 11.1%, 44.6%], but as high as 147.5% in the most extreme cases. The observed bias depended on the ground-truth threshold and slope. When the number of trials was increased to 150, the relative bias was considerably reduced, to a median of 4.7% [IQR: 2.6%, 8.9%]. At 1000 trials the relative bias was negligible, median 0.7% [IQR: 0.1%, 1.6%], though still mostly positive.

Conclusion

Our results indicate the presence of non-negligible bias in threshold estimation when stopping the PSI-method at the number of trials typically used in real-world settings. This bias was found on simulated data under optimal conditions. We thus conclude that the method requires a substantially greater number of trials than typically used. It should be investigated whether these results can be reproduced in a real-world setting.



posters-monday-BioZ: 24

Psychometric properties confirmation of the Multiple Sclerosis Autonomy Scale (MSAS) questionnaire evaluating patient autonomy in multiple sclerosis (MS)

Cécile Donzé2, Claude Mekies3, Géraud Paillot4, Lucie Brechenmacher1, Alexandre Civet1, David Pau1, Delphine Chomette1, Mikael Cohen5, Catherine Mouzawak6, Patrick Vermersch7

1Roche SAS, France; 2Hôpital saint Philibert, Groupement des Hôpitaux de l'Institut Catholique de Lille Faculté de médecine et de maïeutique de Lille, Lomme, France; 3RAMSAY Clinique des Cèdres, Neurologie, CHU Toulouse, Toulouse, France; 4Association Aventure Hustive, Saint-Malo, France; 5CRC-SEP Neurologie Pasteur 2, CHU de Nice, Université Côte d’Azur, UMR2CA-URRIS, Nice, France; 6Structure régionale neuro SEP SYNAPSE, Hôpital du Vésinet, Le Vésinet, France; 7Univ. Lille, INSERM UMR1172 LilNCog, CHU Lille, FHU Precise, Lille, France

Introduction

The Multiple Sclerosis Autonomy Scale (MSAS) is a new Patient Reported Outcome (PRO) aiming to evaluate patient autonomy in multiple sclerosis. Our current study's primary objective is to validate the psychometric properties of the MSAS questionnaire.

Methods

A longitudinal prospective observational study included MS patients from January 2024 to May 2024 at 33 sites.

The initial MSAS questionnaire contains 10 dimensions in a 36-item short form and is completed by patients at inclusion, D15, D30, and up to one year after inclusion (the study is still ongoing).

Several psychometric properties of the MSAS have been evaluated, including its construct validity (correlation coefficients between items), internal consistency (Cronbach's alpha coefficient), unidimensionality (retrograde Cronbach’s alpha curves), and multidimensionality (multi-trait analysis).

This abstract presents the results for the primary objective of the study evaluated at inclusion, with sensitivity analyses carried out at D15 and D30.

Results

Of the 210 patients included in the study from January 2024 to April 2024, 199 completed the MSAS questionnaire at baseline: 132 (66.3%) with the relapsing-remitting form of MS (RRMS), 23 (11.5%) with primary progressive MS (PPMS), and 44 (22.1%) with secondary progressive MS (SPMS).

Internal consistency: Cronbach's alpha coefficients ranged from 0.59 to 0.96 at inclusion. Removing one item from the dimension with the lowest Cronbach's alpha increased the coefficient for this dimension to 0.67.

Construct validity: Few strong correlation coefficients (>|0.8|) between items were observed, and those that did occur were between items of the same dimension.

Unidimensionality: Overall, removing the impact questions one at a time had no notable effect on Cronbach's alpha. This suggests that the impact questions are highly correlated with each other and are important for the reliability of the scale. The overall Cronbach's alpha coefficient of the questionnaire was 0.845 with 36 items and 0.843 with 35 items.

Multidimensionality: each item was most correlated within its own dimension.

Conclusion

Internal consistency was unsatisfactory in one dimension, and one item had to be removed. The new 35-item MSAS questionnaire is a psychometrically sound measure of autonomy in Multiple Sclerosis.



posters-monday-BioZ: 25

Learning heterogeneous treatment effect from multiple randomized trials to inform healthcare decision-making: implications and estimation methods

Qingyang Shi1, Veerle Coupé2, Sacha la Bastide-van Gemert3, Talitha Feenstra1

1Unit of PharmacoTherapy, -Epidemiology and -Economics, Groningen Research Institute of Pharmacy, University of Groningen, The Netherlands; 2Department of Epidemiology and Data Science, Amsterdam University Medical Centers, Amsterdam, The Netherlands; 3Department of Epidemiology, University Medical Center Groningen, University of Groningen, The Netherlands

Evidence synthesis and meta-analysis are crucial for healthcare decision-making, yet they often assume that treatment effects are shared across populations, neglecting heterogeneity by patient characteristics. This review addresses the critical need to account for heterogeneous treatment effects when synthesizing data from multiple trials to inform decision-making for a specific target population. We present a causal framework for the decision-making process with heterogeneous treatment effects estimated using data from different sources. We provide an overview of existing methods for estimating these effects from randomized trials, discussing their advantages and limitations in the context of decision-making. The review covers methods utilizing individual patient data (IPD), partly IPD with aggregate data, and exclusively aggregate data. We emphasize the importance of transportability assumptions, such as shared conditional average treatment effect functions and common covariate support, when extrapolating findings from trials to a target population. Furthermore, we discuss value estimation of an optimal treatment rule in the target population, highlighting the necessity of observational data for estimating the baseline outcome function. This review aims to guide researchers and practitioners in appropriately applying and interpreting methods for heterogeneous treatment effect estimation to inform healthcare decision-making when using data from multiple trials.



posters-monday-BioZ: 26

Multiple imputation of missing viral load measurements in HIV treatment trials: a comparison of strategies

Tra My Pham1, Deborah Ford1, Anna Turkova1, Man Chan1, Ralph DeMasi2, Yongwei Wang2, Jenny O Huang3, Qiming Liao2, James R Carpenter1, Ian R White1

1MRC Clinical Trials Unit at UCL, London, United Kingdom; 2ViiV Healthcare, North Carolina, US; 3GSK, Ontario, Canada

In randomised trials assessing treatments for HIV, a commonly used primary outcome is the proportion of patients achieving or maintaining viral suppression, often defined based on viral load (VL) measurements below a pre-specified threshold, e.g. <400 copies/mL. However, missing data might occur which can impact the analysis of the primary outcome. In addition, in trials of paediatric populations, further complications can arise from measurements being left-censored (i.e. only known to be below a threshold), and obtained from diluted samples due to insufficient volumes (i.e. the limit of quantification is inflated by the dilution factor). As a result, viral suppression status can become unclear.

Multiple imputation (MI) has been used for handling missing outcome data in trials. However, when a continuous outcome such as VL is dichotomised to define the primary outcome, the specification of the imputation model requires further consideration. Trial statisticians could impute the missing VL measurements before dichotomising them to determine suppression status, or impute a binary indicator of suppression status directly. Alternatively, MI could be performed such that categories of VL measurements, with one category boundary at the threshold defining suppression, are imputed.

We aim to explore the performance of these MI strategies for handling missing VL data in a simulation study, in settings with and without left-censoring and dilution. To motivate our simulation study, we use data from ODYSSEY, a trial comparing dolutegravir-based antiretroviral treatment with standard of care in children with HIV [1]. The primary outcome was defined as the proportion of patients with virological or clinical treatment failure by 96 weeks. Here we focus on the virological failure component; for simplicity, we define the primary outcome for the simulation study as the first of two consecutive VL measurements of ≥400 copies/mL. We simulate VL measurements at baseline and at multiple follow-up time points to reflect real trial data collection schedules. VL measurements are made missing under both Missing Completely At Random and Missing At Random mechanisms, and missing data are imputed using the different MI strategies. Strategies are compared in terms of method failure, bias, standard errors, coverage, power, and type I error. The results of this work will provide the basis for recommendations of practical MI strategies that are relevant to statisticians working in HIV treatment trials.
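
Purely as an illustration of the first two strategies (imputing the continuous VL and then dichotomising, versus imputing the suppression indicator directly), the following toy sketch uses a single follow-up time point and simple single-covariate imputation models; it is not the ODYSSEY analysis and omits left-censoring, dilution and proper draws of the imputation-model parameters.

    # Toy contrast of two MI strategies for a dichotomised outcome (illustrative only).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(42)
    n, threshold = 500, np.log10(400)
    baseline = rng.normal(3.5, 1.0, n)                          # log10 VL at baseline
    follow = 0.2 + 0.6 * baseline + rng.normal(0, 0.8, n)       # log10 VL at follow-up
    obs = rng.random(n) >= 1 / (1 + np.exp(-(baseline - 3.5)))  # MAR: missingness depends on baseline
    follow_obs = np.where(obs, follow, np.nan)

    # Imputation models fitted on observed rows (proper MI would also redraw parameters)
    X = np.column_stack([np.ones(obs.sum()), baseline[obs]])
    coef, *_ = np.linalg.lstsq(X, follow_obs[obs], rcond=None)  # (a) normal model for log VL
    resid_sd = np.std(follow_obs[obs] - X @ coef, ddof=2)
    logit = LogisticRegression().fit(baseline[obs, None],       # (b) logistic model for the indicator
                                     (follow_obs[obs] >= threshold).astype(int))

    m, rate_a, rate_b = 20, [], []
    for _ in range(m):
        cont = follow_obs.copy()                                # (a) impute continuous VL, then dichotomise
        mu = coef[0] + coef[1] * baseline[~obs]
        cont[~obs] = mu + rng.normal(0, resid_sd, (~obs).sum())
        rate_a.append(np.mean(cont >= threshold))

        ind = np.zeros(n)                                       # (b) impute the binary indicator directly
        ind[obs] = follow_obs[obs] >= threshold
        p_fail = logit.predict_proba(baseline[~obs, None])[:, 1]
        ind[~obs] = rng.random((~obs).sum()) < p_fail
        rate_b.append(ind.mean())

    print(round(np.mean(rate_a), 3), round(np.mean(rate_b), 3),
          round(np.mean(follow >= threshold), 3))               # both close to the full-data rate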

[1] Turkova A, White E, Mujuru HA, et al. Dolutegravir as first- or second-line treatment for HIV-1 infection in children. New England Journal of Medicine 2021; 385: 2531-2543.



posters-monday-BioZ: 27

A novel approach for assessing inconsistency in network meta-analysis: Application to comparative effectiveness analysis of antihypertensive treatments

Kotaro Sasaki1,2, Hisashi Noma3

1The Graduate University for Advanced Studies, Japan; 2Eisai Co., Ltd., Japan; 3The Institute of Statistical Mathematics, Japan

Introduction: Network meta-analysis (NMA) is a pivotal methodology for synthesising evidence and comparing the effectiveness of multiple treatments. A key assumption in NMA is consistency, which ensures that direct and indirect evidence are in agreement. When this assumption is violated, inconsistency arises, conceptualized by Higgins et al. [1] as design-by-treatment interactions, where “design” refers to the combination of treatments compared within individual studies. To evaluate inconsistency, various statistical tools have been developed. However, the existing methods based on statistical testing have limitations, including low statistical power and challenges in handling multi-arm studies. Moreover, the testing approaches might not be optimal for inconsistency evaluation, as the primary goal is not to draw definitive conclusions about design-by-treatment interaction but to identify and prioritise specific designs for further investigations into potential sources of bias within the network. To address these challenges, this study proposes a novel approach for evaluating inconsistency using influence diagnostics, focusing on quantifying the impact of individual study designs on the results.

Methods: We developed a "leave-one-design-out" (LODO) analysis framework to systematically quantify the influence of individual designs on the overall NMA results. New influence measures were proposed to evaluate these effects comprehensively. To facilitate interpretation, we also introduced the O-value, a summary metric that prioritises designs based on their potential contribution to inconsistency using a parametric bootstrap method. Additionally, a new testing approach was formulated within the LODO framework to identify critical designs requiring further investigation. These methods were applied to an NMA of antihypertensive drugs comprising various study designs.

Results: The application of the proposed methods identified key designs contributing to inconsistency in the antihypertensive drug NMA. The influence measures effectively quantified the impact of individual designs. Moreover, the novel testing approach highlighted specific designs warranting further investigation to uncover potential biases. In a sensitivity analysis, excluding trials suspected of causing inconsistency, the rankings of certain treatment effects were reversed.

Conclusion: Our proposed method offers an effective framework for evaluating inconsistency in NMA. By enabling the quantitative assessment and prioritisation of individual study designs, it provides deeper insights into the sources of inconsistency and improves the reliability of NMA findings.

References: [1] Higgins JP, Jackson D, Barrett JK, Lu G, Ades AE, White IR. Consistency and inconsistency in network meta-analysis: concepts and models for multi-arm studies. Res. Synth. Methods. 2012;3(2):98-110.



posters-monday-BioZ: 28

Investigating (bio)statistical literacy among health researchers in a Belgian university context: A framework and study protocol

Nadia Dardenne, Anh Diep, Anne-Françoise Donneau

Université De Liège Uliege, Belgium

Introduction Although the literature highlights the importance of developing (bio)statistical literacy (BSL) through curricula and lifelong training, few links have been made between BSL and researchers' practices regarding statistics. However, the causes of statistical misconduct such as p-hacking or HARKing are manifold [1] and need to be investigated as a whole with an appropriate BSL framework.

Framework development A BSL framework will be developed and validated based on current (B)SL definitions [2] and the theory of planned behaviour [3], in order to understand intentional and behavioural demonstrations, i.e. (intentions) to read or perform statistical reports or analyses - when, why, how often and how - through perceived self-efficacy as a consumer and producer of statistics, attitudes towards statistics, subjective norms such as pressure and practices from colleagues, and basic knowledge of statistics. The objectives of the study will be to assess BSL by investigating associations among the dimensions of the proposed BSL framework. External factors, notably researchers' educational background in statistics, their professional experience and socio-demographic characteristics, will also be studied in relation to the BSL dimensions.

Methods A cross-sectional study will be conducted in the population of interest, comprising health scientific and academic staff at Belgian universities. The study has been approved by the University Hospital of Liège Ethics Committee. The Delphi method will be used to validate some parts of the BSL dimensions, while Cronbach's α will be computed to assess internal consistency. Further, exploratory and confirmatory factor analysis will be used to validate the factor structure. Structural equation modelling will be employed to analyse the associations between the BSL dimensions, some of which will be treated as latent variables, and to test the effect of the external factors on these dimensions. Statistical analyses will be performed using SAS and R with appropriate packages such as lavaan.

Conclusion The data collected will enable us to establish the links between the BSL dimensions among health researchers at Belgian universities, and to suggest ways forward, particularly in terms of adapting or reinforcing existing BSL curricula and instructional practices.

1. Hardwicke TE et al. Calibrating the scientific ecosystem through meta-research. Annu Rev Stat Appl. 2020;7:11–37.

2. Gal I. Adults’ Statistical Literacy: Meanings, Components, Responsibilities. Int Stat Rev. 2002;70:1–25.

3. de Vries H, Dijkstra M, Kuhlman P. Self-efficacy: the third factor besides attitude and subjective norm as a predictor of behavioural intentions. Health Educ Res. 1988;3:273–82.



posters-monday-BioZ: 29

Balneotherapy for Peripheral Vascular Diseases: A Systematic Review with a Focus on Peripheral Arterial Disease and Chronic Venous Insufficiency

Mi Mi Ko

Korea Institute of Oriental Medicine, Korea, Republic of (South Korea)

Background: Peripheral vascular diseases (PVDs), including peripheral arterial disease (PAD), chronic venous insufficiency (CVI) and coronary artery disease (CAD), significantly impair vascular function and quality of life. Balneotherapy, a non-invasive intervention involving thermal and mineral water therapies, has shown potential benefits in managing these conditions. However, a systematic evaluation of its efficacy remains limited. This systematic review aims to assess the effects of balneotherapy on vascular outcomes, symptom alleviation, and quality of life in patients with PVDs.

Methods: A systematic search was conducted in PubMed (Medline), Embase, and the Cochrane Central Register of Controlled Trials (CENTRAL) to identify randomized controlled trials (RCTs) published up to November 2, 2024. The search terms included "Balneotherapy" and "Peripheral vascular diseases," and studies meeting predefined inclusion criteria were selected. Eligible studies focused on patients with PAD and CVI, assessed the effects of balneotherapy, and applied the same adjunct interventions to both treatment and control groups. Data extraction was performed independently by two researchers, and the risk of bias was assessed using the Cochrane Risk of Bias (RoB) tool.

Results: A total of 12 RCTs were included in the analysis. For PAD, balneotherapy improved vascular function (e.g., ankle-brachial pressure index, flow-mediated dilation), increased walking capacity, and reduced symptoms such as leg pain and swelling. In patients with CVI, balneotherapy reduced lower-limb edema, provided pain relief, and improved mobility and quality of life. For CAD, the therapy enhanced endothelial function, reduced vascular inflammation, and improved peripheral perfusion. Adverse events were rare and generally mild, with no severe safety concerns identified. Despite methodological variability, most studies reported favorable effects, particularly in vascular function and symptom management.

Conclusion: Balneotherapy appears to be a safe and effective complementary treatment for improving vascular function, walking capacity, and quality of life in patients with PVDs, particularly PAD and CVI. Further large-scale, high-quality trials with long-term follow-up are needed to confirm these findings and optimize treatment protocols.



posters-monday-BioZ: 30

Binomial Sum Variance Inequality correction of 95% CIs of percentages in multicentre studies ensures approximately 95% coverage with minimal width

Paul Talsma1, Francesco Innocenti2

1Phastar, United Kingdom; 2Maastricht University, The Netherlands

Percentages with corresponding 95% confidence intervals (CIs) are often reported for clinical and epidemiologic multicentre studies. Many approaches to correcting for centre effects exist. These are often complex and/or provide inadequate coverage. A method is presented for constructing a CI which ensures approximately 95% coverage, has minimal width, and is not overly complex: the Binomial Sum Variance Inequality (BSVI) correction.

Two studies were conducted to investigate the coverage and width of intervals using this correction.

In study 1, the coverage and width of CIs using the BSVI correction were compared with no correction and with mainstream correction approaches. A simulation study was conducted in which data were generated using binomial distributions with population percentages per centre being the same or differing by pre-specified amounts. CIs were constructed using the Wilson, Agresti-Coull, and exact methods, for varying numbers of centres (2-32) and participants per centre (10-160). The ratio of the number of participants between centres was systematically varied. Eleven traditional ways of correcting the CI were compared with each other, with no correction, and with the BSVI correction. The traditional approaches included the ANOVA, Fleiss-Cuzick, Pearson, Hedeker and GEE methods for estimating the intra-cluster correlation, with or without correction for differences in centre size, as well as direct estimation of variances using SAS® (v.9.4) PROC SURVEYFREQ. Intervals constructed with no correction or with traditional methods had coverage that was too high, a finding that could be explained using the BSVI. The BSVI correction was shown to be effective in correcting coverage downwards, close to the desired 95% level, and in reducing interval width.
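
One plausible reading of the correction, offered here only as an illustration and possibly differing from the authors’ exact implementation, is to replace the pooled binomial variance n·p·(1−p) by the never-larger sum of per-centre binomial variances when forming the standard error, as in the sketch below with hypothetical centre counts.

    # Illustrative sketch only: a BSVI-type variance correction for the overall percentage,
    # comparing a Wald-style interval based on the pooled binomial variance with one based
    # on the sum of per-centre binomial variances. The authors' exact correction may differ.
    import numpy as np
    from scipy import stats

    def wald_ci(successes, n, var, alpha=0.05):
        p = successes / n
        z = stats.norm.ppf(1 - alpha / 2)
        half = z * np.sqrt(var) / n                  # sqrt(Var(count)) / n = SE of the proportion
        return p, (max(0.0, p - half), min(1.0, p + half))

    centre_n = np.array([40, 60, 25, 80])            # hypothetical centre sizes
    centre_x = np.array([10, 30, 5, 48])             # hypothetical event counts per centre
    n, x = centre_n.sum(), centre_x.sum()
    p_hat, p_i = x / n, centre_x / centre_n

    var_pooled = n * p_hat * (1 - p_hat)             # usual binomial variance of the total count
    var_bsvi = np.sum(centre_n * p_i * (1 - p_i))    # BSVI: sum of per-centre variances (<= var_pooled)

    print(wald_ci(x, n, var_pooled))                 # uncorrected (wider) interval
    print(wald_ci(x, n, var_bsvi))                   # corrected (narrower) interval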

In study 2, the properties of the BSVI correction for small samples and scarce events were investigated. Data were generated from 2-4 centres with average event percentages of 2, 4, 8, 16, and 32%; total N of 6, 12, 24, and 48; mean ratio of centre sizes of 1, 2, or 3; and differences between centre percentages of none, small, medium, and large using Cohen's effect size. Results show that for N≥24 the BSVI correction leads to 95% CIs with adequate coverage (≥95%) and reduced width compared to no correction. These findings were corroborated with further simulations using the same parameters but with N ranging from 14 to 30 in steps of 2. The BSVI correction is recommended for use when N≥24.

Both studies demonstrate that the BSVI correction leads to CIs with adequate coverage and reduced width when compared to other approaches.



posters-monday-BioZ: 31

Sample size calculation methods for clinical trials using co-primary count endpoints

Takuma Ishihara1, Kouji Yamamoto2

1Innovative and Clinical Research Promotion Center, Gifu University Hospital; 2Department of Biostatistics, School of Medicine, Yokohama City University

Introduction: Clinical trials often employ co-primary endpoints to comprehensively evaluate treatment efficacy. In trials where efficacy is established only if all endpoints show significant effects, the Intersection-Union test is commonly applied. While this approach avoids inflation of Type I error rate due to multiple testing, it increases the Type II error rate, necessitating larger sample sizes to maintain adequate statistical power.

However, most trial designs assume independence among endpoints, which may lead to an overestimation of the required sample size. Considering correlations between endpoints can reduce the sample size while maintaining statistical power.

Various sample size determination methods have been developed for co-primary endpoints with different variable types, including continuous, binary, mixed continuous-binary, and time-to-event. Notably, Homma and Yoshida (2023) introduced a method for mixed continuous and count endpoints, but their approach did not address cases where all primary endpoints are count-based.

Objective: This study aims to develop a sample size calculation method for clinical trials with co-primary count endpoints.

Methods: Co-primary count endpoints often follow different probability distributions, such as Poisson, zero-inflated Poisson (ZIP), and negative binomial distributions. This study derives analytical expressions to determine the minimum sample size required to achieve statistical significance at a pre-specified nominal significance level while considering endpoint correlations.
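
The abstract derives analytical expressions; purely as an illustration of the setting, the sketch below approximates the power of an intersection-union test for two correlated Poisson co-primary endpoints by simulation, generating the correlation through a Gaussian copula. All rates, the correlation value and the Wald-type test are assumed values, not those of the paper.

    # Simulation sketch: intersection-union power for two correlated Poisson endpoints.
    import numpy as np
    from scipy import stats

    def corr_poisson(n, lam1, lam2, rho, rng):
        z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
        u = stats.norm.cdf(z)
        return stats.poisson.ppf(u[:, 0], lam1), stats.poisson.ppf(u[:, 1], lam2)

    def iu_power(n_per_arm, control=(2.0, 3.0), ratio=0.7, rho=0.6, alpha=0.025, nsim=2000, seed=1):
        rng = np.random.default_rng(seed)
        hits = 0
        for _ in range(nsim):
            c1, c2 = corr_poisson(n_per_arm, *control, rho, rng)
            t1, t2 = corr_poisson(n_per_arm, control[0] * ratio, control[1] * ratio, rho, rng)
            reject = True
            for c, t in ((c1, t1), (c2, t2)):        # Wald test on the log rate ratio per endpoint
                log_rr = np.log(t.mean() / c.mean())
                se = np.sqrt(1 / t.sum() + 1 / c.sum())
                reject &= (log_rr / se) < stats.norm.ppf(alpha)
            hits += reject                           # success only if both endpoints are significant
        return hits / nsim

    for n in (80, 120, 160):
        print(n, iu_power(n))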

Results: Simulation studies were conducted under various scenarios to evaluate the impact of endpoint correlation on sample size requirements. The results show that by considering the correlation between endpoints, the required sample size can be greatly reduced, especially when the correlation between endpoints is high.

Conclusion: Our proposed methodology provides a practical approach for optimizing sample size determination in clinical trials with co-primary count endpoints. By leveraging endpoint correlations, researchers can design more efficient trials without compromising statistical power. These findings have significant implications for resource allocation and trial feasibility in studies involving co-primary count endpoints.



posters-monday-BioZ: 32

Analysis of Composite Endpoint in Cardiovascular Device Clinical Trials

Hao Jiang, Yonghong Gao

Johnson and Johnson, United States of America

Composite endpoints are often used to assess the safety and effectiveness of cardiovascular devices and to increase study power. For example, MACCE (major adverse cardiac and cerebrovascular events) is commonly used in cardiovascular clinical trials. Time-to-first-event analysis, the composite event process, and the Finkelstein-Schoenfeld (FS) method are the most commonly used approaches for analyzing the composite endpoint to detect the treatment effect of the investigational device.

We investigate the potential power gain or loss from using a composite endpoint compared with using only one of the individual component endpoints, under the three analysis methods mentioned above. In addition, we examine the pros and cons of the three methods under different scenarios, including endpoint correlation and censoring mechanisms. Simulation studies are conducted to assess the performance of the three methods under different settings. Simulation results are provided, including some thought-provoking observations.



posters-monday-BioZ: 33

Bayesian predictive monitoring using two-dimensional index for single-arm trial with bivariate binary outcomes

Takuya Yoshimoto1,2, Satoru Shinoda2, Kouji Yamamoto2, Kouji Tahata3

1Chugai Pharmaceutical Co., Ltd., Japan; 2Yokohama City University; 3Tokyo University of Science

Bayesian predictive probabilities are commonly used in phase II clinical trials and can describe the stability of the data at an interim analysis by considering all possible future data. They thus help researchers make an informed decision about whether a trial should be terminated prematurely or move to a phase III trial. Typically, phase II oncology studies use a single-arm design, with the primary endpoint being short-term treatment efficacy. Specifically, objective response based on the RECIST guidelines is commonly used as the primary efficacy endpoint.

Although the primary endpoint is commonly an efficacy outcome, situations may arise in which the safety outcome is as important as the efficacy outcome. Brutti et al. (2011) presented a Bayesian posterior probability-based approach that imposed a restrictive definition of the overall goodness of the therapy by controlling the number of responders who simultaneously do not experience adverse toxicity. Similarly, Sambucini (2019) proposed a Bayesian decision-making method based on predictive probability, involving both efficacy and safety with binary outcomes.

These strategies are attractive; however, the approaches of Brutti et al. (2011) and Sambucini (2019) cannot capture situations in which the joint probability of simultaneously being a non-responder to the therapy and experiencing toxicity differs substantially between the historical control and the study treatment. Therefore, we propose a novel approach involving a bivariate index vector for summarizing results that takes this joint probability into account. Through a simulation study evaluating the operating characteristics of the design, we show that the proposed method makes appropriate interim go/no-go decisions and can make a valuable contribution to clinical development. For details, see Yoshimoto et al. (2024).
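
For readers unfamiliar with predictive monitoring, the sketch below shows the standard univariate Beta-binomial predictive probability for a single-arm binary endpoint; the proposed method is bivariate (efficacy and toxicity) and more elaborate, so this is background illustration only, with hypothetical numbers.

    # Univariate illustration of Bayesian predictive monitoring: Beta-binomial predictive
    # probability that a single-arm trial ends in "success" (posterior probability of the
    # response rate exceeding p0 being above a threshold at the maximum sample size).
    import numpy as np
    from scipy import stats

    def predictive_probability(x, n, n_max, p0=0.2, a=1, b=1, post_threshold=0.9):
        m = n_max - n                                        # remaining patients
        a_post, b_post = a + x, b + n - x                    # posterior after interim data
        pp = 0.0
        for y in range(m + 1):                               # possible future responder counts
            prob_y = stats.betabinom.pmf(y, m, a_post, b_post)
            post_success = 1 - stats.beta.cdf(p0, a_post + y, b_post + m - y)
            pp += prob_y * (post_success > post_threshold)
        return pp

    # Example: 8 responders in 20 patients at interim, planned maximum of 40 patients
    print(round(predictive_probability(x=8, n=20, n_max=40), 3))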

Reference 1: Brutti, P., Gubbiotti, S. and Sambucini, V. (2011). An extension of the single threshold design for monitoring efficacy and safety in phase II clinical trials. Statistics in Medicine, 30(14), 1648-1664.

Reference 2: Sambucini, V. (2019). Bayesian predictive monitoring with bivariate binary outcomes in phase II clinical trials. Computational Statistics & Data Analysis, 132, 18-30.

Reference 3: Yoshimoto, T., Shinoda, S., Yamamoto, K. and Tahata, K. (2024). Bayesian predictive probability based on a bivariate index vector for single-arm phase II study with binary efficacy and safety endpoints. Pharmaceutical Statistics. http://doi.org/10.1002/pst.2431



posters-monday-BioZ: 34

Optimising covariate allocation at design stage using Fisher Information Matrix for Non-Linear Mixed Effects Models in pharmacometrics

Lucie Fayette1,2, Karl Brendel2, France Mentré1

1Université Paris Cité, INSERM, IAME, UMR 1137, Paris, France; 2Pharmacometrics, Ipsen Innovation, Les Ulis, France

Introduction

This work focuses on designing experiments for pharmacometrics studies using Non-Linear Mixed Effects Models (NLMEM) that include covariates to describe between-subject variability. Before collecting and modelling new clinical trial data, choosing an appropriate design is crucial. Clinical trial simulations are recommended [1] for power assessment and sample size computation, although they are computationally expensive and non-exhaustive. Alternative methods using the Fisher Information Matrix (FIM) [2] have been shown to efficiently optimize sampling times. However, few studies have explored which covariate values provide the most information.

Objectives

Assuming a known model with covariate effects and a joint distribution for the covariates in the target population from previous clinical studies, we propose to optimise the allocation of covariates among the subjects to be included in the new trial. The aim is to achieve better overall parameter estimation and therefore to increase the power of statistical tests on covariate effects to detect significance and clinical relevance (or non-relevance) of relationships.

Methods

We suggested dividing the domain of continuous covariates into clinically meaningful intervals and optimised their proportions, along with the proportion of each category for discrete covariates. We developed a fast and deterministic FIM computation method, leveraging Gauss-Legendre quadrature and copula modelling [3]. The optimisation problem was formulated as a convex problem subject to linear constraints, allowing resolution using Projected Gradient Descent algorithm.

We applied this approach for a one-compartment population pharmacokinetic model with IV bolus, linear elimination, random effects on volume (V) and clearance (Cl), and a combined error (as in [4]). Additive effects of sex and body mass index were included on log(V), and creatinine clearance on log(Cl). Initial distribution of covariates was imported from NHANES as in [3].

Results

Methods were implemented in R using the package PFIM6.1 (https://cran.r-project.org/web/packages/PFIM/index.html).

We found that the optimal covariate distribution reduces the number of subjects needed (NSN) to reach 80% power for the relevance or non-relevance of the three covariate effects. Without constraints, the results were intuitive: allocation to the extreme intervals only. In a more constrained and realistic setting, optimisation reduced the NSN by over 60%.

Conclusion

We introduced a novel method to integrate the FIM for NLMEM with covariates in order to efficiently optimise covariate allocation among patients in future studies. We showed a substantial reduction in the NSN needed to achieve the desired power in covariate tests.

References

[1]FDA Guidance for Industry Population Pharmacokinetics. https://www.fda.gov/media/128793/download, 2022

[2]Mentré et al. CPT Pharmacometrics Syst Pharmacol, 2013

[3]Guo et al. J Pharmacokinet Pharmacodyn, 2024

[4]Fayette et al. PAGE, 2024



posters-monday-BioZ: 35

Unbiased Estimation for Hierarchical Models in Clinical Trials

Raiann Joanna Hamshaw, Nanxuan Lin

Biostatistics Research Group, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK

Background

In clinical trials, hierarchical models are applied to data in which there is dependence between observations within groups, which would violate the independence assumption of some non-hierarchical estimation methods. These models allow for group-level as well as individual-level analysis. Unbiasedness here refers to the identification of an estimator, within a class of unbiased estimators, that has uniformly minimum risk. Researchers often obtain this by minimising the risk for some parameter and observing whether the result is independent of that parameter.

Methods

The modified covariate method proposed by Tian et al. (2014) is a parametric approach to estimating the conditional average treatment effect (CATE) and to identifying significant subgroups. The method is applicable to continuous, binary, and survival outcomes. We apply this method to a hierarchical structure, using the benefit of the eliminated nuisance parameter to obtain an unbiased estimate of the overall treatment effect, provided an unbiased estimate exists. Simulation studies were undertaken to compare the variance of our treatment effect estimates against that of traditional methods. Sample size calculations for the method were also undertaken.
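
A minimal sketch of the underlying modified covariate idea for a continuous outcome in a non-hierarchical, 1:1 randomised setting is given below; the hierarchical extension described in this abstract is not reproduced, and all data and coefficients are simulated for illustration.

    # Sketch of the modified covariate idea of Tian et al. (2014): treatment is coded +/-1
    # and covariates enter only through W = X * T / 2, so the nuisance main effects of X
    # need not be modelled to recover the effect-modification coefficients.
    import numpy as np

    rng = np.random.default_rng(7)
    n, p = 2000, 4
    X = rng.normal(size=(n, p))
    T = rng.choice([-1.0, 1.0], size=n)                    # 1:1 randomisation
    gamma_true = np.array([1.0, -0.5, 0.0, 0.0])           # true effect-modification coefficients
    main = 2.0 + X @ np.array([0.8, 0.3, -0.4, 0.2])       # nuisance main effects
    y = main + (T / 2) * (X @ gamma_true) + rng.normal(size=n)

    W = X * (T[:, None] / 2)                               # modified covariates
    gamma_hat, *_ = np.linalg.lstsq(W, y - y.mean(), rcond=None)
    print(np.round(gamma_hat, 2))                          # approximates gamma_true despite ignoring 'main'

    score = X @ gamma_hat                                  # estimated individual treatment-effect score
    print("mean score in top quartile:", round(score[score > np.quantile(score, 0.75)].mean(), 2))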

Results

Our results show that the modified covariate method consistently yielded treatment effect estimates with smaller variances than current methods, even when subgroup sizes were unequal and when the model included small subgroups. The sample sizes needed for this method are lower than those of other frequentist estimation methods, which typically obtain accurate estimates only as the sample size and the subgroup sizes increase.

Conclusion

This new approach offers advantages over current frequentist and Bayesian methods. The parametric approach involves less uncertainty around choosing appropriate parameters than Bayesian methods, and it requires a lower sample size than current frequentist methods. The method yielded smaller variances around the overall treatment effect estimate than both types of methods.

Reference

Tian, L., Alizadeh, A. A., Gentles, A. J., Tibshirani, R. A Simple Method for Estimating Interactions Between a Treatment and a Large Number of Covariates. Journal of the American Statistical Association, 109:508, 1517-1532. 2014. https://doi.org/10.1080/01621459.2014.951443



posters-monday-BioZ: 36

Sample size re-estimation for McNemar's test in a prospective randomized clinical trial on childhood glaucoma

Markus Schepers1, Esther Hoffmann2, Julia Stingl2, Anne Ehrlich3, Claudia Wolf3, Thomas Dietlein4, Ingeborg Stalmans5, Irene Schmidtmann1

1IMBEI, University of Mainz, Germany; 2Department of Ophthalmology, University Medical Centre, University of Mainz, Germany; 3IZKS Mainz, Germany; 4Department of Ophthalmology, University Hospital of Cologne, Germany; 5Department of Ophthalmology, UZ Leuven, Belgium

In clinical trials involving paired binary data, such as those analyzed with McNemar's test, crucial parameters like the correlation between the paired outcomes and the proportion of discordant pairs significantly impact the test's power. However, these parameters are often unknown at the design stage, complicating sample size planning.
We develop sample size re-estimation strategies for McNemar's test, motivated by the PIRATE study - a prospective, multi-center, observer-blinded clinical trial comparing standard trabeculotomy with micro-catheter assisted trabeculotomy for treating childhood glaucoma. The trial involves centers in Mainz and Cologne (Germany) and Leuven (Belgium).
For a fixed effect size, the power of McNemar's test decreases with a higher proportion of discordant pairs and increases with greater correlation between paired observations. However, knowledge about the correlation between paired observations is often limited before the start of a clinical trial. Adaptive sample size adjustment when interim analyses reveal a certain fraction of discordant pairs is therefore desirable.
We propose practical, generalized recommendations for adaptive designs in studies involving McNemar's test with uncertainty about parameters relevant for power, for re-estimating sample size based on interim estimates of these key parameters. Our recommendations aim at maximizing the conditional power while maintaining the type I error.
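
To illustrate why the discordant-pair proportion matters, the sketch below uses a standard normal-approximation sample size formula for McNemar's test (in the style of Connor, 1987) and shows how a hypothetical interim estimate of the discordant rate would change the required number of pairs; it is not the adaptive rule developed for the PIRATE study.

    # Sketch: sample size (number of pairs) for McNemar's test from the discordant-pair
    # proportion psi = p10 + p01 and the difference delta = p10 - p01 (normal approximation).
    import numpy as np
    from scipy import stats

    def mcnemar_sample_size(psi, delta, alpha=0.05, power=0.8):
        z_a = stats.norm.ppf(1 - alpha / 2)
        z_b = stats.norm.ppf(power)
        n = (z_a * np.sqrt(psi) + z_b * np.sqrt(psi - delta ** 2)) ** 2 / delta ** 2
        return int(np.ceil(n))

    # Planning assumption vs. a hypothetical interim estimate of the discordant rate
    print(mcnemar_sample_size(psi=0.30, delta=0.15))   # design-stage assumption
    print(mcnemar_sample_size(psi=0.45, delta=0.15))   # more discordant pairs -> larger n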



posters-monday-BioZ: 37

Bayesian bivariate analysis of phase II basket trials enabling borrowing of information

Zhi Cao1, Pavel Mozgunov1, Haiyan Zheng2

1University of Cambridge; 2University of Bath

Introduction:
Phase II clinical trials focus primarily on establishing the early efficacy of a new treatment, while the importance of continued toxicity monitoring cannot be ignored. In the era of precision medicine, basket trials, which use biomarker-driven technology to study various patient sub-populations sharing a common disease feature (e.g., a genomic aberration), have gained increasing attention. Borrowing information across similar patient (sub-)groups is therefore essential to expedite drug development.

Method:
We propose a robust Bayesian hierarchical model that can integrate and analyse clinically relevant differences in toxicity and efficacy, while accounting for possible patient heterogeneity and the correlation between the efficacy and toxicity effects. For practical reasons, toxicity responses are treated as binary observations, and the efficacy outcomes are assumed to be normally distributed. Our model can be viewed as a two-dimensional extension of the exchangeable-nonexchangeable (EXNEX [1]) method: flexible weights are assigned to mixture distributions that imply different borrowing structures for toxicity and efficacy, namely bivariate EX, bivariate NEX, or EX in one of toxicity or efficacy and NEX in the other.

Results & Conclusion:
Compared with standard Bayesian hierarchical modelling and stand-alone analysis, simulation results on operating characteristics show that our models perform robustly in terms of (the Bayesian analogues of) type I error and power, especially when only the toxicity effects are exchangeable (or vice versa). The proposed method also has higher power than applying the EXNEX method independently to the toxicity and efficacy treatment effects when these effects are clearly correlated and dissimilar.

Discussion:
We give specific model recommendations for various clinical scenarios based on our simulation study of the joint evaluation of treatment effects. Possible future directions for our proposal include sample size re-estimation and a time-to-event extension.

[1] Neuenschwander, Beat et al. “Robust exchangeability designs for early phase clinical trials with multiple strata.” Pharmaceutical statistics vol. 15,2 (2016): 123-34. doi:10.1002/pst.1730



posters-monday-BioZ: 38

Usefulness of the blinded sample size re-estimation for dose-response trials with MCP-Mod

Yuki Fukuyama1,2, Gosuke Homma3, Masahiko Gosho4

1Biostatistics and Data Sciences, Nippon Boehringer-Ingelheim Co., Ltd, Tokyo, Japan; 2Graduate School of Comprehensive Human Sciences, University of Tsukuba, Tsukuba, Japan; 3Quantitative Sciences, Astellas Pharma Inc, Chuo-ku, Tokyo, Japan; 4Department of Biostatistics, Institute of Medicine, University of Tsukuba, Tsukuba, Japan

Background / Introduction

Sample size calculation requires assumptions about the mean difference and common variance for continuous outcomes. It is often difficult to specify an appropriate value for the variance at the planning stage of a clinical trial, and its misspecification results in an unnecessarily large or small sample size. To mitigate such misspecification, blinded sample size re-estimation (BSSR) has been proposed. BSSR uses only accumulated data in a blinded manner for variance estimation and is thus easy to implement. Several variance estimators have been proposed for BSSR in two-arm trials. Recently, the multiple comparison procedure with modelling techniques (MCP-Mod) has become more common in dose-response trials, as it addresses model uncertainty by specifying a set of candidate dose-response models. Nonetheless, no BSSR method for dose-response trials with MCP-Mod has been proposed. We extend variance estimators originally developed for BSSR in two-arm trials and investigate their usefulness in dose-response trials with MCP-Mod.

Methods

For BSSR in dose-response trials with MCP-Mod, we investigate four variance estimators: the unblinded pooled variance estimator, blinded one-sample variance estimator (OS), bias adjusted blinded one-sample variance estimator (bias adjusted OS), and blinded variance estimator using information on randomization block size. We conduct a simulation study to evaluate operating characteristics including the type I error rate and power. Furthermore, to clarify the discrepancy between the actual and nominal power under the final sample size after BSSR, we investigate biases in the point estimates at the end of the trial.
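
As background, the sketch below illustrates the blinded one-sample (OS) variance estimator and its bias-adjusted version in the simple two-arm case with 1:1 allocation, together with the resulting re-estimated sample size; the extension to MCP-Mod dose-response designs studied in this abstract is not shown, and all numbers are hypothetical.

    # Two-arm illustration of blinded sample size re-estimation: the one-sample (OS)
    # variance estimator is the pooled variance of the blinded data; the bias-adjusted OS
    # subtracts delta^2/4 (the approximate inflation due to the assumed group difference).
    import numpy as np
    from scipy import stats

    def n_per_group(sigma2, delta, alpha=0.025, power=0.9):
        z = stats.norm.ppf(1 - alpha) + stats.norm.ppf(power)
        return int(np.ceil(2 * sigma2 * z ** 2 / delta ** 2))

    rng = np.random.default_rng(3)
    delta_assumed, sigma_true = 0.5, 1.3                   # planned difference; true SD larger than planned
    interim = np.concatenate([rng.normal(0, sigma_true, 60),
                              rng.normal(delta_assumed, sigma_true, 60)])  # blinded: labels unknown

    os_var = interim.var(ddof=1)                           # blinded one-sample variance estimator
    adj_var = max(os_var - delta_assumed ** 2 / 4, 1e-8)   # bias-adjusted OS

    print("planned n/group:", n_per_group(1.0, delta_assumed))        # based on assumed sigma = 1
    print("re-estimated (OS):", n_per_group(os_var, delta_assumed))
    print("re-estimated (adjusted OS):", n_per_group(adj_var, delta_assumed))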

Results

BSSR based on the OS can control the type I error rate and ensure the target power even if the true variance differs from the assumed one. On the other hand, BSSR based on the bias-adjusted OS and the blinded variance estimator using information on the randomization block size can control the type I error rate but cannot always ensure the target power. Furthermore, we found that the point estimates are biased after BSSR based on the OS and the bias-adjusted OS.

Conclusion

Although the point estimate is biased after BSSR based on the OS, it is the only method that both controls the type I error rate and ensures the target power. Therefore, we recommend using OS-based BSSR to mitigate the misspecification of the variance at the design stage of dose-response trials. Further investigation of other endpoint types (e.g., binary, count, and time-to-event) is an avenue for future research.



posters-monday-BioZ: 39

Quantification of allocation bias in clinical trials under a response-adaptive randomization procedure for binary response variables

Vanessa Ihl, Ralf-Dieter Hilgers

RWTH Aachen University / Uniklinik Aachen, Germany

Background:

One of the main goals of randomized clinical trials is to mitigate bias. The (un-)predictability of assigning a patient to a treatment arm is influenced by several aspects, one of which is allocation bias. Allocation bias describes the selective allocation of patients by the recruiter, influenced by their opinion on, e.g., which arm is best or which arm has a higher probability of being allocated. It is based on patient characteristics that influence the expected response, causing bias in the response and therefore affecting the results of a trial.

Response-adaptive randomization (RAR) promises to treat more patients with the more effective treatment compared to classical approaches. Recently, it has been the subject of increased interest and initial use in trials, as it is said to have higher success rates. It is expected that more patients receive the better treatment without compromising the results or requiring more patients for the trial. Specifically for rare diseases, where one expects to include a large proportion of all diseased persons, it is desirable to treat more patients with the better treatment during the phase II/III trials. So far, allocation bias has not been studied in this setting.

Methods:

We consider a single-centre, two-arm parallel-group design with a binomially distributed binary primary endpoint, in which the doubly adaptive biased coin design is applied to allocate patients to treatment arms. Further, we apply a testing strategy that takes the adaptive nature of the randomization procedure into account. We implement the procedure in simulations and quantify the allocation bias, with a special focus on rare diseases. Different assumptions for the allocation bias are investigated, including strict biasing policies and larger values for the effect of the bias.

Results:

Our first results indicate that even when the allocation bias is very strong, some RAR procedures are hardly affected. Generally, the responses in the simulation study seem to be only weakly affected by allocation bias. For specific strategies, however, it is important to model possible bias effects. Further simulations are still in progress, and the upcoming results are expected to strengthen this hypothesis.

Conclusion:

RAR trials can successfully reduce concerns about allocation bias for certain procedures. In some cases, it is useful to model the bias in the trial analysis if it cannot be addressed in the design of the trial.



posters-monday-BioZ: 40

Assessment of Assay Sensitivity in Non-Inferiority Trials Using Aggregate Data from a Historical Trial: A Population Adjustment Approach

Eisuke Hida1, Satomi Okamura2, Tomoharu Sato3

1The University of Osaka, Japan; 2The University of Osaka Hospital; 3Hiroshima City University

Background: In non-inferiority (NI) trials lacking assay sensitivity, an ineffective treatment may be found non-inferior, potentially leading to an erroneous efficacy assessment. Therefore, a 3-arm NI trial with placebo, test treatment and control treatment is considered the gold standard and is recommended by several guidelines, such as ICH-E10. However, due to ethical and feasibility concerns regarding the inclusion of a placebo, the practical implementation of 3-arm NI trials remains limited. As a result, a useful method for evaluating assay sensitivity in 2-arm NI trials is needed.

Objective: We propose a practical method for confirming assay sensitivity in 2-arm NI trials. This method evaluates the assay sensitivity of the NI trial after adjusting for the distribution of covariates using a population adjustment method, applied to the summary statistics of historical trial data and the individual patient data (IPD) of the NI trial.

Method: To assess assay sensitivity, it is necessary to demonstrate that the acceptable minimum effective value of the test treatment in a 2-arm NI trial is superior to the placebo (Hida & Tango, 2018). Since a placebo is not included in 2-arm NI trials, historical trial results must be used as external information. To evaluate assay sensitivity in NI, an adjustment method is required to align the patient characteristics from a historical trial to the distribution of IPD in the NI trial. In other words, this approach is the reverse of Matching Adjusted Indirect Comparison (MAIC) or Simulated Treatment Comparison (STC). This proposed method evaluates assay sensitivity of the NI trial by estimating the average treatment effect of a historical trial in the population of the NI trial (IPD) through a combination of MAIC and inverse probability weighting (IPTW). We investigated the performance and practicality of the proposed methods through simulations based on several scenarios using clinical trial data.

Results and Conclusions: Although the proposed method relies on external information, which may result in a lower level of evidence compared with the gold standard design, the results suggest that it is useful for evaluating assay sensitivity in NI trials and for supporting decision-making.



posters-monday-BioZ: 41

Exploring methods for borrowing evidence across baskets or subgroups in a clinical trial: a simulation study

Wenyue Li, Becky Turner, Duncan Gilbert, Ian White

University College London (UCL), United Kingdom

Introduction:

Basket trials are designed to study a single targeted therapy in the context of multiple diseases or disease types sharing common molecular alterations. To draw adequate inference about small baskets, approaches for borrowing evidence become crucial. We aimed to quantify the benefits of information borrowing and to compare the performance of various methods for a proposed phase III basket trial studying a novel immunotherapy for patients with mucosal squamous cell cancers in two common and four rare cancer sites.

Methods:

We simulated six scenarios with different patterns of variation in true treatment effects of a time-to-event outcome across sites. Scenarios 1, 2 and 3 assumed high, low and moderate variation, while Scenarios 4 and 5 assumed contradictory data for the common sites with high or low variation among the remaining sites, respectively. Scenario 6 assumed similar estimates for the common sites only. We estimated a two-stage random-effects meta-analysis model using restricted maximum likelihood or Bayesian estimation while incorporating different priors for the between-site variance. We also implemented an empirical Bayes method and one-stage Bayesian hierarchical approaches using a Bayesian hierarchical model (BHM) and an exchangeability-nonexchangeability (ExNex) model. We conducted 1000 simulations to compare the performance of all methods to a standalone analysis.

Results:

The standalone method performed the worst in precision, mean squared error and power despite its robustness for bias. On the other hand, the Bayesian meta-analysis method with a strongly informative prior was the most precise while producing very large biases under most scenarios, except Scenario 2 where the empirical Bayes method appeared to be the most precise. However, a substantial undercoverage was found for the empirical Bayes method under Scenario 2 and for the Bayesian meta-analysis method with a strongly informative prior under Scenarios 1, 4, 5 and 6. The ExNex model resulted in fairly low biases under most scenarios, whereas the BHM achieved considerably higher precision and power than the former for the rare sites under Scenario 2.

Conclusions:

Our work demonstrated precision and power gains from using the proposed information-borrowing methods rather than a standalone analysis. We also demonstrated the sensitivity of the results to the choice of prior for the between-site heterogeneity. To provide further guidance for practice, we recommend using a vague prior for the Bayesian meta-analysis method when treatment effect heterogeneity is likely to be limited, and using the ExNex model when contradictory true treatment effects are likely to exist.



posters-monday-BioZ: 42

Comparing the ED50 between treatment groups using sequential allocation trials

Teresa Engelbrecht, Alexandra Graf

Medical University Vienna, Austria

Determining the median effective dose (ED50) is a fundamental objective in anaesthesia research, in both human and animal studies. Sequential allocation methods, such as the Up-and-Down Method (UDM) and the Continual Reassessment Method (CRM), offer efficient dose allocation based on the responses of previous subjects, thus reducing the required sample size compared with traditional fixed-sample-size designs [1].

Motivated by previous studies [2,3], we aim to evaluate methods for comparing the ED50 across different treatment groups. While sequential allocation methods such as the UDM and the CRM are well described for estimating the ED50, only limited literature is available on comparing the ED50 between several treatment groups. To evaluate the advantages and limitations of sequential allocation methods relative to traditional fixed-sample designs, we conducted simulation studies across various scenarios. Our analysis assessed the power and type 1 error of the UDM and the CRM, as well as of logistic regression with a fixed sample size, to determine their respective strengths and weaknesses in estimating and comparing ED50 values across treatment groups.
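To make the dose-allocation rule concrete, the sketch below simulates an Up-and-Down (Dixon-Mood type) sequence for two hypothetical treatment groups and summarises the ED50 from the reversal points; the dose grid, true ED50 values and the reversal-based estimator are illustrative assumptions, not the simulation settings used in this study.

```python
# Minimal sketch of an Up-and-Down sequence: the dose is lowered after a
# positive response and raised after a negative one; the ED50 is summarised
# from the dose midpoints at reversals. Toy settings only.
import numpy as np

rng = np.random.default_rng(1)

def simulate_udm(true_ed50, slope=8.0, start_dose=2.0, step=0.25, n_subjects=40):
    doses, responses = [], []
    dose = start_dose
    for _ in range(n_subjects):
        # Probability of a positive response from an assumed logistic dose-response
        p = 1.0 / (1.0 + np.exp(-slope * (dose - true_ed50)))
        resp = rng.random() < p
        doses.append(dose)
        responses.append(resp)
        dose = dose - step if resp else dose + step   # up-and-down rule
    return np.array(doses), np.array(responses)

def ed50_from_reversals(doses, responses):
    # Average the dose midpoints at reversal points (changes in response)
    reversals = np.flatnonzero(np.diff(responses.astype(int)) != 0)
    midpoints = (doses[reversals] + doses[reversals + 1]) / 2.0
    return midpoints.mean()

for group, true_ed50 in [("A", 1.8), ("B", 2.1)]:
    d, r = simulate_udm(true_ed50)
    print(f"group {group}: estimated ED50 = {ed50_from_reversals(d, r):.2f}")
```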

[1] Görges M, Zhou G, Brant R, Ansermino JM. Sequential allocation trial design in anesthesia: an introduction to methods, modeling, and clinical applications. Paediatr Anaesth. 2017;27(3):240-247. doi:10.1111/pan.13088

[2] Müller J, Plöchl W, Mühlbacher P, Graf A, Stimpfl T, Hamp T. The Effect of Pregabalin on the Minimum Alveolar Concentration of Sevoflurane: A Randomized, Placebo-Controlled, Double-Blind Clinical Trial. Front Med (Lausanne). 2022;9:883181. Published 2022 May 3. doi:10.3389/fmed.2022.883181

[3] Müller J, Plöchl W, Mühlbacher P, Graf A, Kramer AM, Podesser BK, Stimpfl T, Hamp T. Ethanol reduces the minimum alveolar concentration of sevoflurane in rats. Sci Rep. 2022;12(1):280. Published 2022 Jan 7. doi:10.1038/s41598-021-04364-8



posters-monday-BioZ: 43

A pre-study look into post-study knowledge: communicating the use(fulness) of pre-posteriors in early development design discussions

Monika Jelizarow

UCB Biosciences GmbH, Germany

When designing a clinical study we make assumptions about our drug's true treatment effect for the endpoint of interest. These assumptions are based on existing data and/or expert belief; that is, they are based on some form of evidence synthesis. In the Bayesian framework, this evidence synthesis results in a design prior distribution representing our current knowledge about the true treatment effect. A pre-posterior can be interpreted as a conditional posterior distribution representing the updated knowledge about the true treatment effect at the end of our future study, given only that a certain study outcome (i.e. success or failure) has been observed (Walley et al., 2015; Grieve, 2024). Thus, pre-posteriors enable a look into future updated evidence (the 'post' part) before running the future study (the 'pre' part). This opens the door to helping answer many questions statisticians are often asked by their clinical colleagues in proof-of-concept settings, e.g. 'If the study is successful, what will this make us learn about the true treatment effect?' or 'Does running the study de-risk our program? How would it compare to running the study with more (or fewer) patients?'. Shaped by experience gained in our organisation, the goal of this contribution is to propose a question-led and visualisation-informed workflow for effectively communicating the use(fulness) of this quantitative tool in discussions with stakeholders. The importance of early contextualisation will be emphasised, and supported by illustrating the relationship between, for example, the pre-posterior of success and the unconditional probability of success (PoS), also known as assurance.
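As a rough illustration of the idea, the sketch below approximates a pre-posterior by Monte Carlo: true effects are drawn from a design prior, the future study is simulated, and the draws are split by study success; the normal design prior, sample size and success rule are illustrative assumptions rather than the setting discussed here.

```python
# Minimal Monte Carlo sketch of a pre-posterior: condition the design-prior
# draws of the true effect on the simulated study being "successful".
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_sims, n_per_arm, sigma = 100_000, 40, 1.0

# Design prior for the true treatment effect (evidence synthesis would go here)
theta = rng.normal(loc=0.3, scale=0.2, size=n_sims)

# Simulate the future two-arm study and apply a simple one-sided success rule
se = sigma * np.sqrt(2.0 / n_per_arm)
effect_hat = rng.normal(loc=theta, scale=se)
success = effect_hat / se > stats.norm.ppf(0.975)

# Assurance (unconditional probability of success) and the pre-posteriors
print(f"assurance / PoS: {success.mean():.2f}")
print(f"design prior mean effect: {theta.mean():.2f}")
print(f"pre-posterior mean effect given success: {theta[success].mean():.2f}")
print(f"pre-posterior mean effect given failure: {theta[~success].mean():.2f}")
```

Varying n_per_arm in such a sketch directly answers the 'more (or fewer) patients' question: a larger study both raises the assurance and makes the pre-posterior given success concentrate closer to the design prior.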



posters-monday-BioZ: 44

Estimation and testing methods for delayed-start design as an alternative to single-arm trials in small clinical trials

Tomoharu Sato1,2, Eisuke Hida2

1Hiroshima City University, Japan; 2The University of Osaka, Japan

Introduction and Objective(s):

Traditional randomised controlled trial designs are difficult to implement in small populations, such as in rare disease and paediatric disease areas. Various methodological and statistical considerations have been reported for such small clinical trials [1, 2]. For feasibility reasons, many single-arm trials of the test drug alone are still conducted, allowing evaluation of within-patient comparisons. In single-arm trials, the efficacy of a test drug is assessed against a pre-specified threshold. However, it is well known that even if the treatment effect is better than the threshold in a well-controlled single-arm trial, the estimate of the treatment effect is subject to bias. Simple estimates from single-arm trials may therefore make it difficult to draw valid conclusions about efficacy. In such situations, it is desirable to be able to estimate the true effect size of the test drug free from this bias. In this study, we propose a delayed-start design as an alternative to single-arm trials, together with methods for estimating and testing treatment effects.

Method(s) and Results:

We propose a method for estimating and testing treatment effects using a delayed-start design. In a delayed-start design, a randomised controlled trial is conducted in the first period and a single-arm trial in the second period, allowing estimation of the treatment effect of the trial and of the difference between the two treatment effects. Various factors, such as disease and treatment characteristics, determine the 'estimand' and alter the modelling; we therefore provide model-specific estimation and testing methods together with their interpretations. We show that, with appropriate use of delayed-start designs, it is possible to estimate the difference between two treatment effects, in addition to assessing efficacy by comparison with a pre-specified threshold, as in single-arm trials. A numerical study is used to assess the performance of the proposed methods and to provide model-specific interpretations.
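As a rough illustration (not the authors' model), the sketch below simulates one possible delayed-start layout with a continuous endpoint: a randomised comparison in period 1, followed by a within-patient, single-arm-style comparison against a threshold once the delayed group receives the test drug in period 2. All effect sizes and the threshold are assumptions for illustration.

```python
# Minimal simulation sketch of one possible delayed-start layout with a
# continuous endpoint. Toy effect sizes and threshold only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_per_arm, true_effect, threshold = 15, 0.8, 0.3

# Period 1: randomised comparison (change from baseline on a continuous endpoint)
early = rng.normal(true_effect, 1.0, n_per_arm)      # test drug from the start
delayed = rng.normal(0.0, 1.0, n_per_arm)            # control in period 1

# Period 2: the delayed group now receives the test drug (single-arm phase)
delayed_on_drug = delayed + rng.normal(true_effect, 1.0, n_per_arm)

# Randomised between-group estimate from period 1
t1 = stats.ttest_ind(early, delayed)
print(f"period-1 randomised effect: {early.mean() - delayed.mean():.2f} (p = {t1.pvalue:.3f})")

# Single-arm-style comparison of the period-2 within-patient change with a threshold
change = delayed_on_drug - delayed
t2 = stats.ttest_1samp(change, popmean=threshold)
print(f"period-2 change: {change.mean():.2f} vs threshold {threshold} (p = {t2.pvalue:.3f})")
```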

Conclusions:

Delayed-start designs with appropriate modelling of the primary endpoints may be more effective than single-arm trials for pragmatic small clinical trials in rare and paediatric disease areas.

Keywords: small clinical trials, delayed-start design

References:

[1] IOM. Small clinical trials: issues and challenges (2001).
[2] CHMP. Guideline on clinical trials in small populations (2006).



posters-monday-BioZ: 45

Dealing with missing values in adaptive N-of-1 trials

Juliana Schneider1, Maliha Raihan Pranti2, Stefan Konigorski1,3,4

1Hasso-Plattner-Institute, Germany; 2University of Potsdam, Germany; 3Hasso Plattner Institute for Digital Health at Mount Sinai; 4Icahn School of Medicine at Mount Sinai

N-of-1 trials are multi-crossover trials in single participants, designed to estimate individual treatment effects. Participants alternate between phases of intervention and one or more alternatives in trials that often have only a few data points. In response-adaptive designs of N-of-1 trials, trial length and burden due to ineffective treatment can be reduced by allocating treatments adaptively based on interim analyses. Bayesian approaches are directly applicable by updating posterior beliefs about effectiveness probabilities; furthermore, they allow inference for both individual and aggregated effects. Missing values occur, for instance, due to commonly reported wavering adherence to the treatment schedule and other personal or external factors. This may happen randomly throughout the trial (Missing Completely At Random) or dependent on other factors such as the severity of symptoms addressed in the trial or time. Missing values require adjusting the adaptive allocation mechanism appropriately, but the best approaches for short adaptive N-of-1 trials are not known. In fact, careful imputation of such missing values is crucial, since sequential treatment allocation depends on past outcome values.

Here, we investigate the performance of different imputation methods for missing values in simulated adaptive N-of-1 trials. The imputation approaches use information either from only the respective individual or from all participants, and the adaptive N-of-1 trials are set up in a Bayesian-bandit design using Thompson Sampling. We evaluate the different imputation approaches in a simulation study of 1000 synthetic adaptive N-of-1 trials, comparing two alternate treatments and their association with a normally distributed outcome. We compare posterior descriptive and inference metrics for adaptive trajectories with and without missing values. More precisely, we juxtapose the posterior means and variances of the fully observed and partly observed trial sequences against each other and against the underlying true distribution, and study the Kullback-Leibler divergences among them. This serves to investigate the impact of data missingness and different imputation methods on bias and efficiency in the estimation of the treatment effect difference.

Preliminary results indicate that the optimal imputation method in a given situation depends on whether analysis is intended on the aggregated or the individual level. Moreover, the amount of missingness within and between trial participants impacts imputation results. Lastly, time-dependent associations between measurements and missingness may alter the success of various imputation methods. Future research may include such time-dependencies both in the simulated data and in suitable imputation methods.
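As a rough illustration of the setting, the sketch below simulates a single adaptive N-of-1 trial with Thompson sampling under a normal model and a simple per-treatment posterior-mean imputation for MCAR missing outcomes; priors, missingness rate and effect sizes are illustrative assumptions, not the study's simulation settings.

```python
# Minimal sketch of one adaptive N-of-1 trial: Thompson sampling between two
# treatments under a normal model, with posterior-mean imputation of missing
# outcomes before each posterior update. Toy settings only.
import numpy as np

rng = np.random.default_rng(4)
true_means, sigma = np.array([1.0, 2.0]), 1.0   # two alternating treatments
n_periods, p_missing = 30, 0.2

# Conjugate normal priors (known outcome variance) for each treatment's mean
prior_mean, prior_var = np.zeros(2), np.full(2, 10.0)
post_mean, post_var = prior_mean.copy(), prior_var.copy()
observed = {0: [], 1: []}

for t in range(n_periods):
    # Thompson sampling: draw from each posterior and allocate the larger draw
    arm = int(np.argmax(rng.normal(post_mean, np.sqrt(post_var))))
    outcome = rng.normal(true_means[arm], sigma)
    if rng.random() < p_missing:
        # Missing outcome (MCAR): impute with the current posterior mean
        outcome = post_mean[arm]
    observed[arm].append(outcome)

    # Conjugate update of the selected arm's posterior
    y = np.array(observed[arm])
    post_var[arm] = 1.0 / (1.0 / prior_var[arm] + len(y) / sigma**2)
    post_mean[arm] = post_var[arm] * (prior_mean[arm] / prior_var[arm] + y.sum() / sigma**2)

print("posterior means:", np.round(post_mean, 2), "true means:", true_means)
print("allocations per arm:", [len(observed[a]) for a in (0, 1)])
```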



posters-monday-BioZ: 46

Adaptive clinical trial design with delayed treatment effects using elicited prior distributions

James Salsbury1, Jeremy Oakley1, Steven Julious1, Lisa Hampson2

1University of Sheffield, United Kingdom; 2Advanced Methodology and Data Science, Novartis Pharma AG, Switzerland

Randomized clinical trials (RCTs) are essential for evaluating new treatments, but modern therapies such as immunotherapies present challenges, as delayed treatment effects often occur. These delayed effects complicate trial design by leading to premature futility decisions or inefficient trials with excessive sample sizes and extended durations. Additionally, the proportional hazards assumption, commonly used in survival analysis, may be violated in the presence of time-varying treatment effects.

Adaptive trial designs provide a flexible alternative, allowing modifications based on accumulating data. However, in the context of delayed treatment effects, incorporating prior knowledge about uncertain parameters, such as delay duration and effect magnitude, can significantly enhance trial efficiency. Eliciting prior distributions for these parameters provides a structured approach to account for uncertainty, helping guide trial decisions and improve design robustness.

We present a framework for adaptive clinical trials that explicitly incorporates elicited priors to account for delayed treatment effects. We propose adaptive strategies, such as dynamic interim analyses and efficacy/futility stopping rules, which can be informed by prior distributions. Simulations compare the performance of adaptive designs with traditional fixed designs, demonstrating the benefits of using priors to improve trial efficiency and decision-making.
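As a rough illustration of how elicited priors could enter such simulations, the sketch below draws the delay duration and the post-delay hazard ratio from assumed priors, generates piecewise-exponential survival times and applies a log-rank test at a fixed follow-up time (using the lifelines package); the prior choices, hazards and sample sizes are illustrative assumptions, not the elicited priors or designs evaluated in this work.

```python
# Minimal sketch: propagate assumed elicited priors on the delay and the
# post-delay hazard ratio through trial simulations with a delayed effect.
import numpy as np
from lifelines.statistics import logrank_test

rng = np.random.default_rng(5)
n_per_arm, base_hazard, followup = 200, np.log(2) / 12.0, 36.0   # median 12 months

def simulate_arm(hazard_before, hazard_after, delay, n):
    # Piecewise-exponential times via the memoryless property
    t = rng.exponential(1.0 / hazard_before, n)
    late = t > delay
    t[late] = delay + rng.exponential(1.0 / hazard_after, late.sum())
    return t

success = []
for _ in range(500):
    # Assumed elicited priors: delay ~ Gamma, post-delay log-HR ~ Normal
    delay = rng.gamma(shape=4.0, scale=1.0)                 # ~4 months on average
    hr = np.exp(rng.normal(np.log(0.6), 0.15))              # post-delay hazard ratio

    control = simulate_arm(base_hazard, base_hazard, delay, n_per_arm)
    treated = simulate_arm(base_hazard, base_hazard * hr, delay, n_per_arm)
    obs_c, obs_t = np.minimum(control, followup), np.minimum(treated, followup)
    ev_c, ev_t = control <= followup, treated <= followup

    res = logrank_test(obs_c, obs_t, event_observed_A=ev_c, event_observed_B=ev_t)
    success.append(res.p_value < 0.05)

print(f"probability of success under the assumed priors: {np.mean(success):.2f}")
```

Extending such a loop with interim looks and stopping rules is one way to compare adaptive and fixed designs under the same elicited uncertainty about the delay and effect magnitude.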

Our methods aim to reduce inefficiencies and support real-time decision-making, ultimately advancing the evaluation of new therapies.