Conference Agenda

Overview and details of the sessions of this conference. Please select a date or location to show only sessions on that day or at that location. Please select a single session for a detailed view (with abstracts and downloads, if available).

Session Overview
Session
Adaptive and multi-arm multi-stage trials
Time:
Monday, 25/Aug/2025:
2:00pm - 3:30pm

Location: Biozentrum U1.141

Biozentrum, 124 seats

Presentations
08-adaptive-multi: 1

Confidence intervals in two-stage adaptive enrichment designs

Enyu Li1, Nigel Stallard1, Ekkehard Glimm2, Dominic Magirr2, Peter Kimani1

1University of Warwick, Coventry, United Kingdom; 2Novartis Pharma AG, Basel, Switzerland

Background: With the deepening understanding of the biological pathways of disease progression, the analysis of subpopulations has gained importance in clinical trials. Two-stage adaptive enrichment designs have been proposed as an efficient approach to subgroup analysis when treatment-effect heterogeneity is suspected. In such a design, patients are recruited from the full population in stage 1. The subpopulation that appears to benefit from the experimental treatment is then selected at an interim analysis based on stage 1 outcome data, and in stage 2 patients are recruited only from the selected subpopulation. Data from both stages are used in the final confirmatory analysis. The selective nature of adaptive enrichment designs poses statistical challenges for inference on treatment effects. In this work, we develop selection-adjusted confidence intervals for adaptive enrichment designs.
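To make the design flow concrete, here is a minimal simulation sketch of such a two-stage enrichment trial (our illustration, not the authors' code), assuming normal outcomes with known variance, a 50% subgroup prevalence, and a simple "pick the better-looking population" interim rule; all parameter names and numbers are ours.

```python
# Toy two-stage adaptive enrichment trial: stage 1 recruits from the
# full population, an interim analysis selects the population that
# appears to benefit, and stage 2 recruits only from that population.
import numpy as np

rng = np.random.default_rng(1)

def simulate_trial(theta_sub, theta_comp, n1=200, n2=200, sigma=1.0):
    """theta_sub / theta_comp: true treatment effects in the biomarker
    subgroup and its complement; n1, n2: total patients per stage."""
    se1 = sigma * np.sqrt(8 / n1)   # n1/4 patients per arm per subgroup
    est_sub = rng.normal(theta_sub, se1)
    est_comp = rng.normal(theta_comp, se1)
    if est_sub > est_comp:          # enrich: continue in the subgroup only
        selected = "subgroup"
        est2 = rng.normal(theta_sub, sigma * np.sqrt(4 / n2))
    else:                           # continue in the full population
        selected = "full"
        theta_full = 0.5 * (theta_sub + theta_comp)  # 50% prevalence
        est2 = rng.normal(theta_full, sigma * np.sqrt(4 / n2))
    return selected, est_sub, est_comp, est2

print(simulate_trial(theta_sub=0.4, theta_comp=0.0))
```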

Method: We consider a two-stage adaptive enrichment design based on Jenkins et al. [1]. Using the statistical theory in Lehmann and Romano [2], we derive, conditional on each possible decision at the interim analysis, the uniformly most accurate unbiased confidence intervals by inverting the two-sided uniformly most powerful unbiased test, and we provide numerical methods for calculating the intervals.
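A minimal numerical sketch of this kind of test inversion, under simplifying assumptions of ours rather than the paper's: a single stage-1 statistic Z ~ N(theta, 1) with the subgroup selected when Z > c, so that conditionally Z is truncated normal; we show the equal-tailed inversion, whereas the UMAU interval replaces the equal-tail condition with the unbiasedness condition.

```python
# Confidence interval conditional on selection (Z > c), obtained by
# numerically inverting the conditional test with root-finding.
from scipy.optimize import brentq
from scipy.stats import norm

def cond_cdf(z, theta, c):
    """P(Z <= z | Z > c) for Z ~ N(theta, 1), computed via survival
    functions for numerical stability in the tails."""
    return 1.0 - norm.sf(z - theta) / norm.sf(c - theta)

def conditional_ci(z_obs, c, alpha=0.05, width=10.0):
    """Equal-tailed CI for theta: theta is retained iff the conditional
    CDF of z_obs lies in (alpha/2, 1 - alpha/2)."""
    lo = brentq(lambda t: cond_cdf(z_obs, t, c) - (1 - alpha / 2),
                z_obs - width, z_obs + width)
    hi = brentq(lambda t: cond_cdf(z_obs, t, c) - alpha / 2,
                z_obs - width, z_obs + width)
    return lo, hi

print(conditional_ci(z_obs=2.5, c=1.0))  # selected (Z > 1), observed Z = 2.5
```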

Results: As implied by the theoretical derivation, our method attains the advertised coverage probability conditional on each possible interim decision. Through an extensive simulation study, we verified that confidence intervals constructed with our method have coverage probabilities approximately equal to the nominal level. In contrast, previously proposed approaches, including double bootstrap confidence intervals [3] and duality confidence intervals [4], show substantial under-coverage in some cases.

References

[1] Jenkins, M., Stone, A. and Jennison, C. (2011), An adaptive seamless phase II/III design for oncology trials with subpopulation selection using correlated survival endpoints. Pharmaceut. Statist., 10: 347-356. https://doi.org/10.1002/pst.472

[2] Lehmann, E.L. and Romano, J.P. (2022). Testing statistical hypotheses. New York: Springer. https://doi.org/10.1007/978-3-030-70578-7

[3] Magnusson, B.P. and Turnbull, B.W. (2013), Group sequential enrichment design incorporating subgroup selection. Statist. Med., 32: 2695-2714. https://doi.org/10.1002/sim.5738

[4] Kimani, P.K., Todd, S., Renfro, L.A., et al. (2020), Point and interval estimation in two-stage adaptive designs with time to event data and biomarker-driven subpopulation selection. Statist. Med., 39: 2568-2586. https://doi.org/10.1002/sim.8557



08-adaptive-multi: 2

Dynamic Bayesian Sample Size Re-Estimation: Balancing Historical and Real-Time Data in Adaptive Clinical Trials

Niamh Anne Fitzgerald1, James Wason1, Adrian Mander2

1Newcastle University, United Kingdom; 2GSK

Background

Sample size re-estimation (SSRE) is a key component of adaptive clinical trials, allowing mid-trial adjustments that improve efficiency and statistical power. Traditional frequentist SSRE methods rely on interim variance estimates, which may be imprecise, particularly with small sample sizes. Bayesian methods offer an alternative by incorporating external historical data, which can improve variance estimation and lead to better-informed sample size adjustments. However, excessive reliance on historical data may introduce errors if past studies do not align well with the current trial. This study explores a Bayesian SSRE method that dynamically balances historical and real-time data to improve sample size estimation.

Methods

We conducted Monte Carlo simulations to evaluate Bayesian and frequentist SSRE strategies in a two-arm clinical trial setting. The frequentist approach re-estimates the sample size based on the pooled variance from the interim analysis, while the Bayesian approach integrates historical data using a mixture prior that shifts weight from external information to the accumulating trial data. Each method was assessed by how closely the re-estimated sample size aligned with the sample size that would have been chosen had the true parameters been known from the outset.
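As a flavour of the Bayesian strategy, here is a minimal sketch of ours: the variance used for re-estimation blends a historical estimate with the interim pooled estimate, with the historical weight decaying as interim information accumulates. The specific weighting scheme and all numbers are our assumptions, not the authors' method.

```python
# Bayesian-style SSRE sketch: blend historical and interim variance
# estimates, then recompute the per-arm sample size.
import numpy as np
from scipy.stats import norm

def n_per_arm(sigma2, delta, alpha=0.05, power=0.9):
    """Standard two-arm sample size for detecting a mean difference delta."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 2 * sigma2 * z**2 / delta**2

def bayesian_ssre(interim_var, n_interim, hist_var, n0=50, delta=0.5):
    """n0 is an assumed 'prior weight' in patient-equivalents; the
    historical weight w shrinks towards 0 as interim data accrue."""
    w = n0 / (n0 + n_interim)
    sigma2 = w * hist_var + (1 - w) * interim_var
    return int(np.ceil(n_per_arm(sigma2, delta)))

# e.g. historical sd 1.0, interim sd 1.3 after 40 patients:
print(bayesian_ssre(interim_var=1.3**2, n_interim=40, hist_var=1.0))
```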

Results

Our findings show that Bayesian SSRE methods provide more accurate sample size adjustments than frequentist methods. The dynamic mixture-prior approach effectively transitions reliance from external data to real-time trial information, leading to more precise sample size determinations. The Bayesian method consistently produced sample sizes closer to the ideal, particularly when the assumed variance differed from the true variance. Additionally, we examined the effect of different historical-data weights, demonstrating that moderate incorporation of external data improves estimation accuracy, while excessive reliance on historical information can lead to deviations from the ideal sample size.

Conclusion

Bayesian SSRE methods offer notable advantages for adaptive clinical trials by improving sample size estimation and reducing the risk of under- or over-enrolment. By dynamically adjusting the contribution of historical data, these methods enhance trial efficiency and reliability while maintaining statistical rigour. Our findings support the adoption of Bayesian SSRE approaches in modern clinical trial designs, particularly in settings where variance assumptions are uncertain.



08-adaptive-multi: 3

MAMS with patient-reported outcomes and treatments with different modalities: how should we handle a differential placebo effect?

Isobel Grace Landray1, Jennifer Nicholas1, James Carpenter1,2,3

1London School of Hygiene and Tropical Medicine, United Kingdom; 2MRC CTU at UCL, United Kingdom; 3On behalf of the Edmond J Safra Accelerating Clinical Trials in Parkinson's collaboration

Background

MAMS-platform trials have demonstrated critical efficiency improvements in identifying effective new treatments. The design is now well accepted, leading to continuing increases in the diversity and complexity of applications.

A key example of this is the individually randomised EJS ACT-PD(23) MAMS-platform trial, which will open in Summer 2025 with two active arms and a placebo arm, all delivered through a daily blinded capsule. The primary outcome is a subjective, Parkinson’s-specific patient-reported functional score. A fourth arm, to be added in Summer 2026, will comprise five daily capsules.

The resulting methodological challenge is how to identify efficient and resilient design solutions that maintain the integrity of all treatment comparisons while minimising the control group size when treatment regimens differ.

Methods

When the new arm is added, there are three main options:

1. keep the size of the placebo arm the same, and further randomise the placebo patients 1:1 to one or five capsules daily;

2. add a second placebo arm which takes five capsules daily, and maintain this until the end of the trial; or

3. add a second placebo arm which takes five capsules daily, but after one year, test for a difference between placebo groups; if there is no difference, move to option 1.

We will use theoretical and simulation-based methods to evaluate the consequences of increasing levels of a differential placebo effect on the power of the study (a toy version of such a simulation is sketched below).
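The sketch below is our own illustration, with invented numbers, of how a differential placebo effect d distorts the option 1 comparison of the five-capsule arm against the pooled one-capsule placebo: a positive d inflates the type I error, a negative d erodes power.

```python
# Toy simulation of option 1: new five-capsule arm vs one-capsule placebo.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def rejection_rate(delta, d, n=250, sigma=1.0, reps=2000, alpha=0.05):
    """delta: true effect of the new arm over its own (five-capsule)
    placebo; d: differential placebo effect (five-capsule placebo
    response minus one-capsule placebo response)."""
    hits = 0
    for _ in range(reps):
        control = rng.normal(0.0, sigma, n)        # one-capsule placebo
        new_arm = rng.normal(delta + d, sigma, n)  # five-capsule active
        p = stats.ttest_ind(new_arm, control, alternative="greater").pvalue
        hits += p < alpha
    return hits / reps

for d in (-0.2, 0.0, 0.2):
    print(d,
          rejection_rate(delta=0.0, d=d),   # type I error under option 1
          rejection_rate(delta=0.3, d=d))   # power under option 1
```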

Results

Option 2 is the gold standard, preserving the integrity of all treatment comparisons. However, because it increases the probability of randomisation to placebo, it is less acceptable to trial participants.

We will present graphs of power versus effect size for different levels of differential placebo effect, and show how these can be used to identify the bound on the differential placebo effect that is sufficient to justify option 1 and/or to inform the test in option 3.

Conclusion

A key attraction of MAMS studies is the option to add new arms as new treatments emerge. Our results show how to frame the discussion on adapting the design in settings where the outcome is potentially subjective, so that participants must be blinded, and where the new treatments have a range of regimens.



08-adaptive-multi: 4

Multi-arm multi-stage (MAMS) randomised selection designs: With an application to miscarriage and surgical platform trials

Babak Choodari-Oskooei1, Alexandra Blenkinsop2, Lee Middleton3, Kelly Handley3, Versha Cheed3, Lee Priest3, Emily Fox3, Leah Fitzsimmons3, Rima Smith3, Adam Devall3, Thomas Pinkney3, Arri Coomarasamy3, Mahesh KB Parmar1

1UCL's Institute of Clinical Trials and Methodology, United Kingdom; 2Department of Mathematics, Imperial College London, United Kingdom; 3Birmingham University, United Kingdom

Background:

Multi-arm multi-stage (MAMS) randomised trial designs have been proposed to evaluate multiple research questions in confirmatory settings. In designs with several interventions, there are likely to be strict limits on the number of individuals that can be recruited or the funds available to support the protocol. These limitations may mean that not all research treatments can continue to accrue the required sample size for the definitive analysis of the primary outcome measure at the final stage. In these cases, an additional treatment selection rule can be applied at the early stages of the trial to restrict the maximum number of research arms that can progress to the subsequent stage(s).

This talk provides guidance on how to implement treatment selection within the MAMS framework. It explores how treatment selection rules, interim lack-of-benefit stopping boundaries, the timing of treatment selection, and the use of an intermediate outcome to select interventions affect the operating characteristics of the design. It uses real trials such as Tommy’s PREMIS (miscarriage) and ROSSINI-2 (surgery; 8 arms, 3 stages) to compare MAMS selection designs with alternative designs.

Methods:

We outline the steps in designing a MAMS selection trial. Using extensive simulation studies, we explore the maximum and expected sample sizes, the familywise type I error rate (FWER), and the overall power of the design under binding and non-binding interim stopping boundaries for lack of benefit.
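To give a flavour of such a simulation, here is a toy two-stage, three-arm sketch of ours (not the authors' software): a "select the single best surviving arm" rule, an assumed lack-of-benefit boundary, an inverse-normal combination of stages, and shared-control correlation between arms ignored for brevity.

```python
# Toy 2-stage MAMS selection design: stage 1 applies a lack-of-benefit
# boundary, selects the best surviving arm, and stage 2 tests that arm.
import numpy as np

rng = np.random.default_rng(3)

def prob_reject(mu, lob=0.0, crit=2.0, reps=20000):
    """mu: true means of the stage-wise z-statistics for each research
    arm vs control. Returns P(reject at least one null hypothesis)."""
    mu = np.asarray(mu, dtype=float)
    rejections = 0
    for _ in range(reps):
        z1 = rng.normal(mu, 1.0)
        alive = z1 > lob                   # lack-of-benefit stopping
        if not alive.any():
            continue
        best = np.argmax(np.where(alive, z1, -np.inf))
        z2 = rng.normal(mu[best], 1.0)
        z = (z1[best] + z2) / np.sqrt(2)   # inverse-normal combination
        rejections += z > crit
    return rejections / reps

print("FWER under global null:", prob_reject([0.0, 0.0, 0.0]))
print("Power, one effective arm:", prob_reject([1.5, 0.0, 0.0]))
```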

Results:

Pre-specification of a treatment selection rule reduces the maximum sample size by at least 25%. The familywise type I error rate of a MAMS selection design is smaller than that of a standard MAMS design with similar specifications but without the additional treatment selection rule. In designs with strict selection rules, the final-stage significance levels can be relaxed for the primary analyses to ensure that the overall type I error for the trial is not underspent. When conducting treatment selection from several treatment arms, it is important to select a large enough subset of research arms (that is, more than one) at the early stages to maintain the overall power at the pre-specified level.

Conclusions:

MAMS selection designs gain efficiency over the standard MAMS design by reducing the overall sample size. Diligent pre-specification of the treatment selection rules, the final-stage significance level and the interim stopping boundaries for lack of benefit is key to controlling the operating characteristics of a MAMS selection design. We provide guidance on these design features to ensure control of the operating characteristics.



08-adaptive-multi: 5

A test for treatment differences using allocation probabilities in response-adaptive clinical trials

Stina Zetterstrom, David S. Robertson, Sofía S. Villar

MRC Biostatistics Unit, University of Cambridge, United Kingdom

Background/Introduction:

Response-adaptive clinical trials update the allocation probabilities to treatment arms sequentially, based on the data collected so far in the trial. One main objective of response-adaptive clinical trials is to give patients in the trial a higher probability of receiving the best treatment than they would have under equal randomisation. However, this imbalance in treatment allocation can lead to low power when testing for a treatment difference. Recent work (Barnett et al., 2023; Deliu et al., 2021) proposes a new testing approach based on the allocation probability (AP) rather than directly on the outcome, which can increase power. Here, we further investigate the AP test and propose alternative versions of it.

Methods:

The original AP test statistic is the number of blocks in the trial in which the allocation probability to the active arm exceeds 0.5 (in a two-arm trial). The null and alternative distributions of the test statistic are obtained by simulation. We propose alternative versions of the AP test in which the magnitude of the allocation probabilities and/or the block number is used in the test statistic. The different versions of the AP test are evaluated in simulation studies using the Thompson sampling algorithm for binary outcomes in a two-arm setting.
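A minimal sketch of this setup, in our own implementation of the idea as described above; the block size, priors, trial length, Monte Carlo sizes and the assumed common null response rate are all illustrative.

```python
# AP test under block-wise Thompson sampling for two-arm binary outcomes.
import numpy as np

rng = np.random.default_rng(11)

def run_trial(p0, p1, n_blocks=20, block=10):
    """Thompson sampling in blocks; returns the per-block allocation
    probabilities to the active arm (arm 1)."""
    s = np.ones(2)  # Beta(1, 1) prior successes per arm
    f = np.ones(2)  # and failures
    probs = []
    for _ in range(n_blocks):
        # allocation probability = posterior P(p1 > p0), by Monte Carlo
        draws = rng.beta(s, f, size=(500, 2))
        pi = (draws[:, 1] > draws[:, 0]).mean()
        probs.append(pi)
        arms = rng.random(block) < pi                      # True = active arm
        outcomes = rng.random(block) < np.where(arms, p1, p0)
        for a in (0, 1):
            s[a] += outcomes[arms == a].sum()
            f[a] += (~outcomes[arms == a]).sum()
    return np.array(probs)

def ap_stat(probs):           # original AP statistic
    return (probs > 0.5).sum()

def ap_stat_weighted(probs):  # variant using the magnitude of the APs
    return probs.sum()

# Null distribution simulated at an assumed common response rate,
# then a one-sided Monte Carlo p-value for an observed trial.
null = np.array([ap_stat(run_trial(0.3, 0.3)) for _ in range(500)])
obs = ap_stat(run_trial(0.3, 0.5))
print("p =", (null >= obs).mean())
```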

Results:

The simulation studies show that the AP test can outperform traditional tests in terms of power, especially in certain regions of the parameter space, while controlling the type I error. Furthermore, the AP tests that use the magnitude of the allocation probabilities and/or the block number can have higher power than the original AP test, especially when the trial is larger.

Conclusions:

The AP test is an alternative hypothesis test that can be used in response-adaptive clinical trials with imbalanced treatment allocation to increase power compared with traditional methods. Different versions of the test can further increase the power gain in specific settings. While we use Thompson sampling here, the AP test can be applied to any response-adaptive randomisation rule and its performance studied in that setting.

References:

Barnett, H.Y., Villar, S.S., Geys, H. and Jaki, T. (2023). A novel statistical test for treatment differences in clinical trials using a response-adaptive forward-looking Gittins index rule. Biometrics, 79(1): 86-97.

Deliu, N., Williams, J.J. and Villar, S.S. (2021). Efficient inference without trading-off regret in bandits: An allocation probability test for Thompson sampling. arXiv preprint arXiv:2111.00137.