Conference Agenda

Session: Survival analysis 2
Time: Wednesday, 27/Aug/2025, 2:00pm - 3:30pm

Location: Biozentrum U1.141 (124 seats)

Presentations
38-survival-analysis-2: 1

A framework for estimating, investigating and using the correlation between multiple time-to-event endpoints in a group sequential trial

Anne Lyngholm Soerensen1,2, Paul Blanche1, Henrik Ravn2, Christian Pipper2

1University of Copenhagen (Denmark); 2Novo Nordisk (Denmark)

Introduction:

A correlation estimate is used to calculate the expected power of rejecting hypotheses related to multiple endpoints during and at the end of a group sequential trial (GST). Determining an appropriate correlation between endpoints is therefore key when planning the trial, and this typically entails using data from earlier comparable trials. However, when the endpoints are time-to-event, several challenges arise. In particular, the censoring scheme of a given trial will influence the correlation between commonly used log-rank test statistics, and in general the correlation is not expected to adhere to any simple canonical form. We will present a method to estimate the time-dependent correlation of test statistics for time-to-event endpoints. The method further allows investigation of what drives the correlation and of how its inputs can be modified to aid in the design of future trials or GSTs with time-to-event endpoints.

Because the correlation can be exploited for more efficient testing, the method can supply a correlation estimate that supports more powerful confirmatory testing strategies.

Methods:

Using the independent and identically distributed (iid) decomposition of the log-rank test statistics for the endpoints, we can calculate the time-dependent correlation in previous trials. The decomposition also lets us understand what drives the correlation: several operational characteristics, such as the timing of censoring and the timing of inclusion, can be isolated explicitly in the decomposition's expression. By altering these characteristics, we can estimate how they affect the correlation between the endpoints during a GST and identify the main drivers of the correlation. This creates a plug-and-play tool for adapting information from earlier trials to the design decisions and expectations of new trials. It is then possible to plan future trials via the method and simulation without enforcing assumptions about the correlation structure.
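For illustration, the following is a minimal sketch, not the authors' implementation, of how such a correlation estimate can be computed once per-subject iid-decomposition terms are available. It assumes each standardised log-rank statistic is approximately a normalised sum of influence terms, so that the correlation of the test statistics reduces to the empirical correlation of the terms; the influence terms below are simulated placeholders.

```python
# Minimal sketch (not the authors' implementation): estimating the
# correlation between two log-rank test statistics from per-subject
# iid-decomposition terms. Assumption: each standardised statistic is
# approximately a normalised sum of influence terms, so the correlation
# of the statistics equals the empirical correlation of the terms.
import numpy as np

def logrank_correlation(eps1: np.ndarray, eps2: np.ndarray) -> float:
    """Empirical correlation of two vectors of iid-decomposition terms,
    each of length n (one term per subject, per analysis time)."""
    return float(np.corrcoef(eps1, eps2)[0, 1])

# Simulated placeholder influence terms for n subjects:
rng = np.random.default_rng(0)
n = 500
shared = rng.normal(size=n)                # latent factor shared by both endpoints
eps1 = shared + rng.normal(size=n)         # endpoint 1 terms at analysis time t1
eps2 = 0.8 * shared + rng.normal(size=n)   # endpoint 2 terms at analysis time t2
print(f"estimated correlation: {logrank_correlation(eps1, eps2):.3f}")
```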

The implementation of the method, and how it can be used for estimating, investigating and planning future trials, will be demonstrated using data from a previous cardiovascular outcomes trial.

Results:

The iid decomposition of the log-rank test statistics in a time-to-event trial provides a powerful tool for estimating, investigating and using the correlation between multiple time-to-event endpoints in a GST.



38-survival-analysis-2: 2

Likelihood adaptively incorporated external aggregate information with uncertainty for survival data

Jing Ning

The University of Texas M.D. Anderson Cancer Center, United States of America

Introduction

Population-based cancer registry databases are invaluable for bridging the gap created by the limited statistical power of primary cohort studies with small to moderate sample sizes. While these databases often lack detailed tumor biomarker data or report it inconsistently, they provide comprehensive and publicly accessible aggregate survival statistics. Integrating such data with primary cohorts holds promise for enhancing treatment evaluation and survival prediction across tumor subtypes. However, in rare cancers, even registry sample sizes may be modest, and the variability associated with aggregated statistics can be substantial relative to the primary cohort’s sample variation. Neglecting this variability risks misleading conclusions.

Methods

We propose a likelihood-based method that adaptively incorporates external aggregate information while accounting for its variability. To ensure computational efficiency and stability, we introduce a nuisance parameter to circumvent the infinite-dimensional baseline hazard function in aggregate data. We derive the asymptotic properties of the estimators and assess their finite-sample performance through simulations.
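As a simplified illustration of this idea, the sketch below, which is not the authors' method, fits an exponential model to the primary cohort by penalised likelihood, matching the model-implied survival probability at a landmark time to a hypothetical external aggregate estimate whose standard error controls how strongly it is borrowed.

```python
# Minimal sketch (not the authors' method): penalised likelihood that
# borrows an external aggregate survival probability while accounting
# for its uncertainty. Assumptions: primary-cohort survival is
# exponential with rate lam; the external source reports an estimated
# survival probability s_ext at landmark time t_star with standard
# error se_ext. A larger se_ext automatically down-weights borrowing.
import numpy as np
from scipy.optimize import minimize_scalar

def neg_penalised_loglik(log_lam, times, events, t_star, s_ext, se_ext):
    lam = np.exp(log_lam)
    # Exponential log-likelihood for right-censored data:
    # events contribute log(lam) - lam*t, censored observations -lam*t.
    loglik = np.sum(events * np.log(lam) - lam * times)
    # Gaussian penalty matching the model-implied S(t_star) to the
    # external estimate, scaled by the external variance.
    s_model = np.exp(-lam * t_star)
    penalty = (s_model - s_ext) ** 2 / (2 * se_ext ** 2)
    return -(loglik - penalty)

# Hypothetical usage with simulated primary-cohort data:
rng = np.random.default_rng(1)
n = 80
t_true = rng.exponential(scale=5.0, size=n)
c = rng.exponential(scale=8.0, size=n)
times, events = np.minimum(t_true, c), (t_true <= c).astype(float)
fit = minimize_scalar(
    neg_penalised_loglik, bounds=(-5.0, 2.0), method="bounded",
    args=(times, events, 4.0, 0.45, 0.03),  # external: S(4) = 0.45 (SE 0.03)
)
print(f"penalised MLE of the event rate: {np.exp(fit.x):.3f}")
```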

Results

Simulation studies demonstrate that the proposed method performs robustly across varying levels of external information variability, outperforming existing approaches. We applied the method to integrate inflammatory breast cancer (IBC) patient data from the University of Texas MD Anderson Cancer Center with aggregate survival data from the National Cancer Data Base. This enabled an assessment of the Ki-67 biomarker (negative vs. positive) in predicting the survival benefits of trimodality treatment. Results indicated poorer survival outcomes for Ki-67 positive patients, characterized by higher cancer cell proliferation, compared to their negative counterparts with lower proliferation. Trimodality treatment significantly benefited Ki-67 positive patients, while Ki-67 negative patients derived limited survival benefit. These findings highlight Ki-67’s potential as a predictive biomarker for tailoring therapy in IBC.

Conclusion

In real-world data integration, accounting for the variability in aggregate information is critical to avoid bias and enhance statistical efficiency. Our method appropriately incorporates external data variability, safeguarding against the integration of unsuitable external information due to population heterogeneity. By ensuring data-driven borrowing of information, the approach enhances inference accuracy and supports informed decision-making in precision oncology.



38-survival-analysis-2: 3

A new statistical test to compare probability of being in response (PBR) with application to a study in oncology

Norbert Hollaender1, Ekkehard Glimm1,2

1Novartis Pharma AG, Basel, Switzerland; 2Otto-von-Guericke University, Institute of Biometry and Medical Informatics, Magdeburg, Germany

INTRODUCTION

The probability-of-being-in-response (PBR) function provides easily interpretable curves for the time from treatment start to first response and the time from first response to subsequent failure. Comparison between treatment arms is usually based on visual inspection of the PBR curves; formal inference is rarely applied in practice. Here, we describe a statistical test for the comparison of PBR curves.

METHODS

The PBR function can be derived from a multistate model. At study start, all patients are in an initial state 0 (not in response). Patients responding to treatment enter state 1 (in response) at the time of first documented response. Later, they may enter the absorbing state 2 (failure). For the comparison of two PBR curves we consider three test statistics, which are extensions of the logrank test for right-censored survival curves. The derivation of the test statistics' distribution is based on the conditional probabilities of entering the response state given the risk sets in the treatment arms at each event time. In addition, the risk sets for leaving the response state at the event times are also considered. We describe the statistical methodology, investigate type I error and power in a simulation study, and illustrate the application using data from the clinical phase 3 study REACH3.
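To make the estimand concrete, the following minimal sketch, ignoring censoring for simplicity, computes the empirical PBR curve PBR(t) = P(patient is in state 1 at time t) from simulated response and failure times; all variable names and data are hypothetical.

```python
# Minimal sketch (illustration only, ignoring censoring): the empirical
# probability-of-being-in-response curve PBR(t) = P(in state 1 at t).
# Hypothetical inputs: per patient, a response time t_resp (np.inf if
# never in response) and a failure time t_fail (entry into state 2).
import numpy as np

def pbr_curve(t_resp: np.ndarray, t_fail: np.ndarray, grid: np.ndarray):
    """Fraction of patients in response (state 1) at each grid time."""
    # A patient is in state 1 at time t if t_resp <= t < t_fail.
    in_response = (t_resp[:, None] <= grid[None, :]) & (grid[None, :] < t_fail[:, None])
    return in_response.mean(axis=0)

# Usage with simulated multistate data:
rng = np.random.default_rng(2)
n = 200
t_resp = rng.exponential(3.0, size=n)
t_resp[rng.random(n) < 0.3] = np.inf                    # 30% never respond
t_fail = t_resp + rng.exponential(6.0, size=n)          # failure after response
never = np.isinf(t_resp)
t_fail[never] = rng.exponential(4.0, size=never.sum())  # direct 0 -> 2 transitions
grid = np.linspace(0.0, 15.0, 4)
print(dict(zip(grid.round(1), pbr_curve(t_resp, t_fail, grid).round(3))))
```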

RESULTS

The suggested tests control the type I error under a Markov assumption. Simulations show that i) high power is achieved when the event rate ratios (for the event types 'entering state 1' and 'leaving state 1') between the treatments are constant in time and the test treatment is the better one, ii) the type I error is preserved if the test treatment is consistently no better than the control treatment over the entire time axis, and iii) if the event rate ratios are above 1 for some time periods and below 1 for others, there are no statistical guarantees on the tests' properties. For REACH3, the tests confirm the statistical significance of the observed differences between PBR curves.

CONCLUSION

The proposed tests are straightforward extensions of the logrank test. Simulations and the application to clinical trial data show that these tests are useful additions to the visual comparison of PBR curves.



38-survival-analysis-2: 4

One-sample survival tests for non-proportional hazards in oncology clinical trials

Chloé Szurewsky1, Guosheng Yin2, Gwénaël Le Teuff1

1CESP, INSERM U1018, University of Paris-Saclay, France; 2Department of Statistics and Actuarial Science, University of Hong Kong, Pokfulam Road, Hong Kong

In oncology, well-powered time-to-event randomised clinical trials are challenging for rare diseases (e.g., paediatric cancers or personalised medicine) due to limited patient numbers. One- or two-stage designs for single-arm trials (SATs) with time-to-event outcomes have emerged in recent years as compelling alternatives. These designs rely on the one-sample log-rank test (OSLRT) and its modified version (mOSLRT) to compare the survival curve of an experimental arm to that of an external (or historical) control group under the proportional hazards (PH) assumption, which may be violated, particularly when evaluating immunotherapies.

We develop score tests and investigate alternatives for situations where PH does not hold. We extend Finkelstein's score test (OSLRT), developed under PH, by using a piecewise-exponential (PE) model with change-points (CPs) for early, middle and delayed treatment effects, and an accelerated hazards model for crossing hazards. The restricted mean survival time (RMST)-based test is adapted to the case of SATs. We also construct a maximum combination (max-Combo) test combining the mOSLRT and the early and delayed score tests.

The performance (type I error and power) of the developed tests is evaluated through a simulation study. Survival times are generated from an exponential distribution, assuming no sampling variability for the reference group, and from a PE model for the experimental group. The simulation parameters are the sample size of the experimental group (from 20 to 200 patients), the exponential censoring rate (from 0 to 35%) and the relative treatment effect (hazard ratio from 0.5 to 1). A SAT in paediatric patients with neuroblastoma evaluating an inhibitor is used for illustration.

The simulation study shows that the score tests are more conservative than the mOSLRT and as conservative as the OSLRT. The score test has the highest power when the data generation matches the model, even when the CPs are misspecified. The RMST-based test is less powerful than the mOSLRT except for an early effect with a censoring rate below 15%. The max-Combo test is conservative and more powerful than the mOSLRT for sufficient sample sizes (n > 50), but less powerful than the appropriate score test under non-PH. To conclude, the score tests are efficient under non-PH when approximate values of the CPs are known a priori, and the max-Combo test is an alternative when the time-dependent treatment effect and the values of the CPs are unknown. Further research is needed to study the impact of sampling variability in the external control group.
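For context, the sketch below implements the classical (unmodified) OSLRT in its textbook form, not the extensions developed here: the observed event count O is compared with the expected count E obtained by evaluating the reference cumulative hazard at each patient's follow-up time, with the reference assumed exponential for simplicity; all data are simulated placeholders.

```python
# Minimal sketch (textbook form, not the extensions studied here): the
# classical one-sample log-rank test. O is the observed event count and
# E the expected count, obtained by evaluating the reference cumulative
# hazard Lambda0 at each patient's follow-up time; under H0,
# Z = (O - E) / sqrt(E) is approximately standard normal. Assumption:
# the historical control is exponential with known rate lam0, so
# Lambda0(t) = lam0 * t.
import numpy as np
from scipy.stats import norm

def oslrt(times: np.ndarray, events: np.ndarray, lam0: float):
    """One-sided OSLRT; a small z (fewer events than expected under the
    reference) favours the experimental treatment."""
    observed = events.sum()
    expected = (lam0 * times).sum()
    z = (observed - expected) / np.sqrt(expected)
    return z, norm.cdf(z)

# Hypothetical usage: 60 patients with a true hazard ratio of 0.6
# versus the historical control.
rng = np.random.default_rng(3)
lam0, n = 0.20, 60
t = rng.exponential(1 / (0.6 * lam0), size=n)
c = rng.uniform(0.0, 10.0, size=n)                 # administrative censoring
times, events = np.minimum(t, c), (t <= c).astype(float)
z, p = oslrt(times, events, lam0)
print(f"Z = {z:.2f}, one-sided p = {p:.4f}")
```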



38-survival-analysis-2: 5

Efficiency of Nonparametric Superiority Tests Based on Restricted Mean Survival Time Versus the Logrank Test Under Proportional Hazards

Dominic Magirr

Novartis, Switzerland

For randomized clinical trials with time-to-event endpoints, proportional hazards models are typically used to estimate treatment effects and log-rank tests are commonly used for hypothesis testing. The summary measure of the primary estimand is frequently a hazard ratio. However, there is growing support for replacing this approach with a model-free summary measure and an assumption-lean analysis method, a trend already observed for continuous and binary endpoints. One alternative is to base the analysis on the difference in restricted mean survival time (RMST) at a specific timepoint, a single-number summary measure that can be defined without any restrictive assumptions on the outcome model. In a simple setting without covariates, an assumption-lean analysis can be achieved using nonparametric methods such as Kaplan-Meier estimation. The main advantage of moving to a model-free summary measure and an assumption-lean analysis is that the validity and interpretation of the conclusions do not depend on the proportional hazards (PH) assumption. The potential disadvantage is that the nonparametric analysis may lose efficiency under PH. There is disagreement in the recent literature on this issue, with some studies indicating similar efficiency between the two approaches and others highlighting significant advantages for PH models. We present asymptotic results and a simulation study to clarify the conflicting results from earlier research. We characterize the scenarios where the relative efficiency is close to one, and those where it is not. Several illustrative examples are provided.
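As a minimal sketch of the nonparametric approach discussed here (assuming the simple setting without covariates), the code below estimates the RMST difference at a horizon tau as the difference in areas under the two arms' Kaplan-Meier curves; all data and parameter values are simulated placeholders.

```python
# Minimal sketch (assumption-lean, no covariates): the RMST difference
# at horizon tau estimated as the difference in areas under the two
# arms' Kaplan-Meier curves.
import numpy as np

def km_rmst(times: np.ndarray, events: np.ndarray, tau: float) -> float:
    """Area under the Kaplan-Meier curve on [0, tau]."""
    order = np.argsort(times)
    t, d = times[order], events[order]
    at_risk = len(t) - np.arange(len(t))
    surv = np.cumprod(1.0 - d / at_risk)          # S(t) just after each time
    # Integrate the step function S(t) from 0 to tau.
    knots = np.concatenate(([0.0], np.minimum(t, tau), [tau]))
    steps = np.concatenate(([1.0], surv))
    return float(np.sum(np.diff(knots) * steps))

# Hypothetical usage: two simulated arms under proportional hazards.
rng = np.random.default_rng(4)
def simulate(lam, n=300):
    t = rng.exponential(1 / lam, size=n)
    c = rng.uniform(0.0, 12.0, size=n)             # uniform censoring
    return np.minimum(t, c), (t <= c).astype(float)

t0, d0 = simulate(0.20)   # control arm
t1, d1 = simulate(0.14)   # experimental arm (hazard ratio 0.7)
tau = 10.0
diff = km_rmst(t1, d1, tau) - km_rmst(t0, d0, tau)
print(f"RMST difference at tau = {tau}: {diff:.3f}")
```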