posters-monday: 1
Longitudinal Psychological Status and Risk of Major Cardiovascular Events (MCE) in Individuals with Pre-Existing Myocardial Infarction: A Joint Modeling Approach
Nasrin Salimian2,1, Marjan Mansourian2, Masoumeh Sadeghi3, Hamidreza Roohafza4, Hamid Reza Marateb5
1Pardis Specialized Wellness Institute, Iran, Islamic Republic of; 2Department of Epidemiology and Biostatistics, School of Health, Isfahan University of Medical Sciences, Isfahan, Iran; 3Cardiac Rehabilitation Research Center, Cardiovascular Research Institute, Isfahan University of Medical Sciences, Isfahan, Iran; 4Isfahan Cardiovascular Research Center, Cardiovascular Research Institute, Isfahan University of Medical Sciences, Isfahan, Iran; 5Biomedical Engineering Department, Engineering Faculty, University of Isfahan, Isfahan, Iran
Background:
Patients with a history of myocardial infarction (MI) remain at high risk for major cardiovascular events (MCE), yet the role of longitudinal psychological status in this risk remains understudied. Previous research has largely relied on cross-sectional analyses, overlooking individual variability over time. This study examines how longitudinal changes in psychological status affect the risk of MCE in individuals with pre-existing MI.
Methods:
Patients were recruited from new hospital admissions for acute ST-elevation myocardial infarction (STEMI) and were followed for up to 3 years. The data for this study were collected from five cities in Iran. We applied a joint modeling approach, integrating a generalized linear mixed model (GLMM) for ordinal outcomes and a survival submodel, to assess the relationship between psychological status and MCE risk. Psychological status (no depression, mild to moderate, severe) was measured yearly at three time points. The GLMM estimated individual trajectories of psychological change, while the joint survival model captured their effect on MCE. The model controlled for baseline covariates, including quality of life, gender, age categories, sexual health status, sleep quality, dietary index (GBDI), sleep duration, ejection fraction (EF; normal vs. abnormal), personality type, physical activity, stress level, smoking status, and socioeconomic status.
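As an illustration of the type of joint model described above, the following R sketch uses hypothetical variable names and, for simplicity, treats the psychological score as approximately continuous rather than ordinal; it is a sketch of the general approach, not the authors' exact implementation.

```r
library(nlme)
library(survival)
library(JMbayes2)

# Longitudinal submodel: psychological score measured yearly (hypothetical data 'long_dat')
lmeFit <- lme(psych_score ~ year, random = ~ year | id, data = long_dat)

# Survival submodel: one row per patient with baseline covariates (hypothetical data 'surv_dat')
coxFit <- coxph(Surv(time_to_mce, mce) ~ age_cat + gender + ef_abnormal, data = surv_dat)

# Joint model: by default the current value of the trajectory is linked to the MCE hazard
jointFit <- jm(coxFit, lmeFit, time_var = "year")
summary(jointFit)
```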
Results:
The joint model indicated that progressive deterioration in psychological status significantly increased MCE risk (HR = 1.45, 95% CI: 1.20–1.75, p < 0.001). The longitudinal component revealed substantial inter-individual variability, with a subset of patients experiencing significant psychological decline. Among covariates, age, gender, and sleep quality had the strongest impact on MCE risk.
Conclusion:
Dynamic changes in psychological status are a critical factor in post-MI cardiovascular risk. The joint modeling approach provides a robust framework for capturing these effects and highlights the need for early psychological interventions to improve long-term outcomes.
posters-monday: 2
Cardio-metabolic traits and their socioeconomic differentials among school children, including MONW phenotypes, in India: Baseline characteristics of the LEAP-C cohort
Kalaivani Mani1, Chitralok Hemraj1, Varhlunchhungi Varhlunchhungi1, Lakshmy Ramakrishnan1, Sumit Malhotra1, Sanjeev Kumar Gupta1, Raman Kumar Marwaha2, Ransi Ann Abraham1, Monika Arora3, Tina Rawal4, Maroof Ahmad Khan1, Aditi Sinha1, Nikhil Tandon1
1All India Institute of Medical Sciences, Delhi, India; 2International Life sciences Institute, Delhi, India; 3Public Health Foundation of India, Delhi, India; 4HRIDAY, Delhi, India
Background
Cardio-metabolic risks emerge in early life and are carried into adult life. Further, these risks may have been aggravated by worsening food security and diet quality during the pandemic. We aimed to assess the prevalence of cardiometabolic traits, including the metabolically obese normal weight (MONW) phenotype, and their socioeconomic differentials in children and adolescents aged 6-19 years in India.
Methods
A baseline assessment was conducted between August 17, 2022, and December 20, 2022, as part of a school-based cohort study aimed at longitudinally evaluating anthropometric and metabolic parameters among urban children and adolescents aged 6-19 years from three public schools and two private schools in India. Private and public schools were considered proxies for higher and lower socioeconomic status, respectively. Blood pressure measurements and fasting blood samples were obtained only from adolescents. Prevalence estimates and their 95% confidence intervals were obtained using the Clopper-Pearson exact method, and adjusted prevalence ratios were calculated using random-effects logistic regression models.
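A minimal R sketch of the two estimation steps mentioned above, with hypothetical data and variable names; a modified Poisson (log-link) formulation is shown as one common way to obtain prevalence ratios, whereas the abstract's own models are random-effects logistic regressions.

```r
library(lme4)

# Exact (Clopper-Pearson) 95% CI for an overall prevalence, e.g. general obesity
binom.test(x = n_obese, n = n_total)$conf.int

# Adjusted prevalence ratio for private vs. public schools, with a random intercept
# per school to account for clustering; the log link returns ratio-scale estimates
fit <- glmer(obese ~ school_type + age + sex + (1 | school_id),
             data = students, family = poisson(link = "log"))
exp(fixef(fit)["school_typeprivate"])   # adjusted prevalence ratio (illustrative name)
```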
Results
Among the 3,888 students (aged 6–19 years) recruited, 1,985 were from public schools and 1,903 from private schools. The prevalence of underweight was 4.95% (95% CI 1.25-12.72), significantly higher in public schools (p<0.0001), while general obesity (13.41% (95% CI 2.98-33.87)) and central obesity (9.15% (95% CI 1.40-27.44)) were significantly higher in private schools (adjusted PR = 4.42 and 8.31, respectively). Hypertension prevalence (7.37% (95% CI 6.44-8.38)) was similar across schools, but impaired fasting glucose (adjusted PR = 2.37) and metabolic syndrome (adjusted PR = 3.51) were more common in private schools. Among 2,160 adolescents, 67.73% had a normal BMI, with a 42.86% (95% CI 30.79-55.59) prevalence of the metabolically obese normal weight (MONW) phenotype, higher in public (46.39%) than private (35.33%) schools (p=0.0742). Low HDL-C was the most common MONW abnormality (41.74%), significantly more prevalent in public schools (62.12% vs. 52.73%, p=0.0393).
Conclusion
Effective implementation of food security measures and targeted initiatives will be crucial to mitigate the socio-economic and gender disparities associated with the growing burden of cardiometabolic traits. Metabolic obesity among phenotypically normal or underweight adolescents should not be overlooked; early intervention through novel screening criteria is needed to prevent future cardiovascular burden. These findings also have implications for low-income and middle-income countries like India that are undergoing a nutritional transition and where socioeconomic status strongly influences cardio-metabolic traits.
posters-monday: 3
External validation of SMART2 model for recurrent cardiovascular risk
Jasper Wilhelmus Adrianus van Egeraat, Nan van Geloven, Hendrikus van Os
LUMC, The Netherlands
Background
Assessing performance of prediction models in external data is important before use in medical practice. In real medical data sets, this may be challenged by several data complexities, including censoring, competing events and missing data. For example, when using routine electronic health records, the missing at random (MAR) property required for multiple imputation is often violated, possibly leading to inaccurate performance metrics.
This work illustrates how the combined challenges of censoring, competing events and missing data were addressed when evaluating the predictive performance of the SMART2 prediction model. The SMART2 prediction model can identify individuals at high risk of recurrent atherosclerotic cardiovascular diseases.
Methods
Electronic health records from the Extramural LUMC Academic Network were used to derive routine clinical data from patients registered between January 2010 and December 2021 in the greater Leiden-The Hague region of the Netherlands. Individuals were included if they had been hospitalized for cardiovascular disease. The outcome was the first recurrent occurrence of a composite of non-fatal myocardial infarction, non-fatal stroke, and vascular death within 10 years.
Calibration plots and observed/expected (OE) ratios were determined. Censoring and competing events were incorporated in the observed outcome proportion with the Aalen-Johansen estimator. Discrimination was determined between subjects who developed the primary event before 10 years and those who did not experience any event by 10 years, applying inverse probability of censoring weights.
Missing variables were handled using multiple imputation with chained equations. Longitudinal measurements were used to improve imputation of the measurements used at the prediction moment. To account for possible missingness not at random, a sensitivity analysis was performed by delta-scaling the imputed values after each iteration, mimicking various degrees of missingness not at random.
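Two of the ingredients described above can be sketched in R as follows (hypothetical data and variable names): the Aalen-Johansen estimate of the observed 10-year risk under censoring and competing events, and a delta-adjusted multiple imputation as a sensitivity analysis for missingness not at random.

```r
library(survival)
library(mice)

# Aalen-Johansen observed risk: event coded as a factor (censored / recurrent CVD / competing)
aj <- survfit(Surv(time, factor(event, 0:2, c("censor", "cvd", "competing"))) ~ 1, data = dat)
summary(aj, times = 10)

# Delta adjustment: scale imputed values of a predictor (hypothetical 'ldl') down by 10%
ini  <- mice(dat, maxit = 0)
post <- ini$post
post["ldl"] <- "imp[[j]][, i] <- 0.9 * imp[[j]][, i]"
imps <- mice(dat, m = 10, post = post, printFlag = FALSE, seed = 2024)
```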
Results
Out of the 15,561 included patients, 2,257 patients suffered a recurrent cardiovascular event and 2,098 had a competing event. The median follow-up time was 6.07 years. The AUCt was 0.62 (95%CI: 0.60–0.64) and the OE ratio was 0.97 (95%CI: 0.93–1.02).
Discrimination was robust under various delta-scaling parameters. Assuming unobserved predictors were overestimated by the imputation model, scaling imputed values downward by 10% at every iteration resulted in an AUCt of 0.62 (95%CI: 0.60–0.64). The OE ratio changed to 1.01 (95%CI: 0.96–1.05).
Conclusions
In this real-world analysis challenged by censoring, competing events and missing data, we showed the feasibility of testing the robustness of predictive performance assessment under varying degrees of missingness not at random.
posters-monday: 4
Multi-disease risk models to target concomitant diseases and their interactions: Insights on cardio-renal-metabolic syndrome in England
Stelios Boulitsakis Logothetis1, Niels Peek2, Angela Wood1
1British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, United Kingdom; 2THIS Institute (The Healthcare Improvement Studies Institute), University of Cambridge, United Kingdom
Introduction
Clinical risk prediction models are used to identify patients at high risk of disease onset. However, most existing approaches only focus on a single disease, ignoring clusters of conditions with shared pathophysiology and common treatments. Accounting for these relationships could support better disease prevention and health outcomes.
This study develops multi-disease models to jointly predict cardiovascular disease, chronic kidney disease, and metabolic disorders like diabetes. These conditions, collectively termed cardio-renal-metabolic syndrome, share risk factors and intervention effects and are significant contributors to premature mortality. We aim to extract insights about disease progression in the English population and lay the foundations for future individualised multi-disease prediction models.
Methods
We modelled disease progression as a state transition process, fitting a multi-state model to predict 5-year incident cardiovascular disease (CVD) and chronic kidney disease (CKD), with diabetes as a risk factor and death as a competing risk. State transition intensities were jointly estimated using Cox proportional hazards sub-models.
We extracted a novel dataset of electronic health records spanning the entire adult population of England from NHS databases, including diagnoses, laboratory measurements, and treatments. Missing data were multiply imputed, and we ensured congeniality with the multi-state model by including non-parametric state probabilities in the imputation. To support computational feasibility, we discretised and coarsened the time scale and restricted to a curated set of well-established risk predictors.
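A simplified sketch of such a transition-specific Cox model in R, using the mstate package and hypothetical column names; the authors' imputation, time coarsening, and national-scale optimisations are not shown.

```r
library(mstate)
library(survival)

# States: 1 = healthy, 2 = CVD, 3 = CKD, 4 = death; allowed transitions per state
tmat <- transMat(x = list(c(2, 3, 4), c(3, 4), c(2, 4), c()),
                 names = c("healthy", "CVD", "CKD", "death"))

msd <- msprep(time   = c(NA, "t_cvd", "t_ckd", "t_death"),
              status = c(NA, "s_cvd", "s_ckd", "s_death"),
              data = cohort, trans = tmat,
              keep = c("age", "sex", "diabetes", "smoking"))

# One Cox model stratified by transition; covariate effects are assumed common across
# transitions here, but transition-specific effects can be obtained via expand.covs()
fit <- coxph(Surv(Tstart, Tstop, status) ~ age + sex + diabetes + smoking + strata(trans),
             data = msd)
summary(fit)
```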
Results
We identified 394,555 cases of concomitant CVD and CKD among the 48.65 million eligible adults. The incidence of CKD following a CVD diagnosis was approximately twice that of CVD following a CKD diagnosis (24.73 vs. 12.85 per 1000 person-years). The Cox models achieved an average concordance index of 0.882 across imputations. Nearly all predictors were significantly associated with every state transition. The strongest predictor was smoking, with hazard ratios ranging from 2.14 to 2.69.
Conclusion
We demonstrated how cardio-renal-metabolic syndrome can be jointly modelled at a national scale. Next, we will experimentally evaluate this model’s individual-level predictions and develop more granular multi-state models that include additional clinically relevant intermediate states. The optimisations required for model fitting suggest that classical approaches are reaching their computational limits. Future work will explore machine learning methods to better leverage whole-population electronic health records and their wide range of risk predictors.
posters-monday: 5
Machine learning methods for analyzing longitudinal health data streams: A comparative study
Inês Sousa
Universidade do Minho, Portugal
Chronic kidney disease (CKD) is characterized by kidney damage or an estimated glomerular filtration rate (eGFR) of less than 60 ml/min per 1.73 square meters for three months or more. The performance of six tree-based machine learning models - Decision Trees, Random Forests, Bagging, Boosting, Very Fast Decision Tree (VFDT), and Concept-adapting Very Fast Decision Tree (CVFDT) - is evaluated on longitudinal health data. Longitudinal data, where individuals are measured repeatedly over time, provide an opportunity to predict future trajectories using dynamic predictions that incorporate the entire historical dataset. These predictions are essential for real-time decision-making processes in healthcare. The dataset comprised 406 kidney transplant patients, spanning from January 21, 1983, to August 16, 2000. It captures 120 time points over the first 119 days post-transplant, including baseline glomerular filtration rates (GFR), along with three static variables: weight, age, and gender. Data preprocessing involved robust imputation techniques to handle missing data, ensuring consistency and trend accuracy. The models were trained to predict health outcomes starting from the eighth day post-transplant, progressively incorporating daily values to predict subsequent days up to day 119. Model performance was evaluated using mean squared error (MSE) and mean absolute error (MAE) through data partitioning and cross-validation techniques.
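A minimal sketch (assuming a hypothetical wide data layout) of the rolling one-step-ahead prediction scheme described above, using a random forest as one of the compared learners:

```r
library(randomForest)

# 'gfr_wide' is assumed to hold one row per patient with columns gfr_1 ... gfr_120
# plus the static covariates weight, age and gender (hypothetical layout)
mse <- sapply(8:119, function(d) {
  train <- data.frame(y         = gfr_wide[[paste0("gfr_", d + 1)]],
                      gfr_today = gfr_wide[[paste0("gfr_", d)]],
                      weight    = gfr_wide$weight,
                      age       = gfr_wide$age,
                      gender    = gfr_wide$gender)
  rf <- randomForest(y ~ ., data = train, ntree = 500)
  mean((predict(rf) - train$y)^2)   # out-of-bag mean squared error for day d + 1
})
```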
posters-monday: 6
Evaluating the fairness of a clinical prediction model for outcomes following psychological treatment in the UK’s National Health Service
Nour Kanso1, Thalia C. Eley1, Ewan Carr2
1Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; 2Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
Background Depression and anxiety are common psychiatric conditions that significantly affect individuals’ well-being. The UK NHS Talking Therapies programme delivers evidence-based psychological treatments to over a million patients annually, but outcomes are heterogeneous; only half achieve clinical definitions of recovery. Stratified care involves predicting outcomes using patient characteristics to identify individuals who may need adapted or alternative treatments. However, the fairness, accuracy, and generalisability of such prediction models across sociodemographic subgroups remain underexplored. This study evaluates the stability and performance of an existing clinical prediction model for outcomes following treatment across gender, employment status, ethnicity, age, and sexuality.
Methods We evaluated an existing clinical prediction model across sociodemographic subgroups to assess prediction stability and performance variations. Outcomes included reliable improvement in depression (PHQ-9) and anxiety (GAD-7), defined as a change from baseline to the end of treatment exceeding the measurement error of the scale (6 points for depression; 4 for anxiety). Predictors included age, gender, ethnicity, religion, language proficiency, employment, sexuality, long-term condition, disability, medication, prior referrals, diagnosis, and symptom severity. Stability was assessed using bootstrapping (200 iterations), where the model was repeatedly trained on resamples of the dataset and tested within sociodemographic subgroups. Sample size calculations suggested a minimum of 1,788 participants per subgroup, assuming 50% prevalence and a c-statistic of 0.7. Performance was evaluated across subgroups based on calibration and prediction instability.
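The subgroup stability assessment can be sketched as follows in R, with hypothetical variable names standing in for the existing model's predictors and outcomes:

```r
library(pROC)

set.seed(1)
B <- 200
subgroup_auc <- replicate(B, {
  boot <- dat[sample(nrow(dat), replace = TRUE), ]                 # bootstrap resample
  fit  <- glm(reliable_improvement ~ age + gender + employment + severity,
              data = boot, family = binomial)                      # refit the model
  sapply(split(dat, dat$gender), function(g)                       # test within subgroups
    as.numeric(auc(roc(g$reliable_improvement,
                       predict(fit, newdata = g, type = "response"), quiet = TRUE))))
})
apply(subgroup_auc, 1, quantile, probs = c(0.025, 0.5, 0.975))     # stability per subgroup
```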
Results The analytical sample (n = 30,999) was predominantly female (73%), with a median age of 34, and had an ethnic composition including 57% White and 22% Black, Black British, Caribbean, or African. In the full sample, the model demonstrated good discrimination (depression AUC: 0.76; anxiety: 0.75) and calibration (intercept/slope: -0.00/0.99 for depression, -0.02/1.03 for anxiety). We observed differences in performance and stability across subgroups. Model calibration and stability were higher for women, whereas the model tended to underestimate outcome probabilities for men. The model also underestimated the probability of reliable improvement for unemployed and retired individuals, especially at the extremes of the probability range. Our full results will present differences by ethnicity, age, and sexuality.
Conclusion No study to date has explored the fairness of clinical prediction models for psychological therapy in the UK NHS. Our study addresses major gaps in understanding predictive performance across sociodemographic subgroups within UK NHS Talking Therapies. By evaluating fairness, accuracy, and stability, findings will inform model refinements, supporting equitable and reliable treatment recommendations.
posters-monday: 7
Multiple Imputation vs. Machine Learning for Handling Missing Data in Prediction Modelling: Which Best Balances Stability, Performance, and Computational Efficiency?
Pakpoom Wongyikul1, Phichayut Phinyo1, Noraworn Jirattikanwong1, Natthanaphop Isaradech2, Wachiranun Sirikul2, Arintaya Phrommintikul3
1Department of Biomedical Informatics and Clinical Epidemiology (BioCE), Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand; 2Department of Community Medicine, Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand; 3Division of Cardiology, Department of Internal Medicine, Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand
Background: Missing data is a common challenge in clinical prediction modelling. Multiple imputation with chained equations (MICE) remains the main approach but is computationally intensive and adds complexity. Recent evidence suggests that simpler machine learning-based methods may perform just as well. This study compares MICE and machine learning-based approaches for handling missing data in terms of prediction stability, performance, and computational time, to identify the most balanced approach.
Methods: A real-world dataset of 8,245 patients, previously used to develop a clinical prediction model for major adverse cardiovascular events, was utilised. We then generated nine datasets to represent different missing data scenarios, varying by missing variable type (categorical, continuous, or mixed) and missing proportion (20%, 40%, or 60%). All missing data were assumed to be missing at random (MAR). Four methods to handle missing data were evaluated: (1) MICE, (2) random forest (RF), (3) k-nearest neighbor (kNN), and (4) complete case analysis (CCA). Performance and stability were evaluated using the bootstrap internal validation procedure according to Riley and Collins. Model performance was assessed with optimism-corrected area under the curve (AUC) and calibration slopes, while stability was measured using mean absolute prediction error (MAPE). Bootstrapping time was also recorded and compared.
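The four strategies compared above can be set up in R roughly as follows (hypothetical dataset 'dat_mis'; tuning, bootstrap validation and model refitting are omitted):

```r
library(mice)
library(missForest)
library(VIM)

imp_mice <- mice(dat_mis, m = 5, printFlag = FALSE)        # (1) chained equations
imp_rf   <- missForest(dat_mis)$ximp                        # (2) random-forest imputation
imp_knn  <- VIM::kNN(dat_mis, k = 5, imp_var = FALSE)       # (3) k-nearest-neighbour imputation
dat_cc   <- na.omit(dat_mis)                                # (4) complete-case analysis
```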
Results: With 20% missing data, RF, MICE, and kNN showed comparable AUC and MAPE, though kNN exhibited poorer calibration. As missing data increased, all methods except CCA maintained similar AUC, but prediction stability declined, particularly for mixed variable types. Across all scenarios, MICE performed best overall, followed by RF. While kNN produced stable predictions with high AUC, significant miscalibration persisted in most cases, except when 20%–40% of continuous data was missing. In terms of computational efficiency, MICE was the most intensive, taking two to three times longer than RF and kNN.
Conclusions: Provided the development sample size is sufficiently large, RF is preferred for its balance of predictive performance, stability, and computational efficiency. If computational time is not a constraint (e.g., with access to high-performance computing), MICE is recommended, followed by RF. Otherwise, kNN may be a suitable alternative when missing data are continuous and below 40%. Finally, CCA should be avoided in all cases.
posters-monday: 8
Estimating rates of dental treatment and unmet dental need in a spatially explicit model in children in England, 2016-2018
Beatrice Catherine Downing
University of Bristol
Registries and the extensive collection of linked data have led to extraordinary advances in our understanding of disease dynamics and optimal resource allocation. However, this requires accompanying investment, collaboration and continuity at multiple levels over many years. In an imperfect world, unlinked data and aggregate counts are more readily available. With proper communication of the uncertainty, aggregate unlinked data from different sources can be used to estimate and validate the prevalence of disease and the scale of unmet clinical need. Here we use publicly available data on the number of dental procedures in children and a signifier of unmet need - the number of hospitalisations for tooth extraction - to estimate relative rates of dental ill-health and to identify areas in England with relatively high levels of unmet need given the level of background deprivation. We used Bayesian hierarchical spatial models to allow for spatial correlation between neighbouring areas, bringing together dental procedures at fine scales and hospital extractions at coarse scales. We demonstrate the power of modelling spatial relationships in systems where both service provision and wider determinants of health show spatial structuring.
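One way to specify such a Bayesian hierarchical spatial model in R is via INLA with a BYM2 area effect; the sketch below uses hypothetical object names (adjacency graph 'g', expected counts 'expected') and is not the authors' exact specification.

```r
library(INLA)   # installed from the INLA repository, not CRAN

# Poisson model for tooth-extraction hospitalisations with an expected-count offset,
# a deprivation covariate, and a BYM2 spatially structured + unstructured area effect
fit <- inla(extractions ~ deprivation + f(area_id, model = "bym2", graph = g),
            family = "poisson", E = expected, data = areas,
            control.predictor = list(compute = TRUE))
summary(fit)
```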
posters-monday: 9
Statistical Approach to Assess the Impact of Hospital Settings on Optimal Staffing Levels
Diana Trutschel1, Maryam Ahmadi Shad1, Michael Ketzer1, Jack Kuipers2, Giusi Moffa3, Michael Simon1
1Department of Public Health, Institute of Nursing Science, University Basel, Switzerland; 2Department of Biosystems Science and Engineering, ETH Zürich, Switzerland; 3Department of Mathematics and Computer Science, University of Basel, Switzerland
Background:
Optimal hospital staffing, often measured by the patient-to-nurse ratio (PNR), is critical to healthcare quality and patient outcomes. Variations in PNR are driven by factors originating from both the patient and nursing side of the ratio. Understanding the extent to which these factors influence PNR is essential for designing effective strategies to achieve and sustain optimal staffing levels. Identifying the relative contributions of these influences can guide decision-making by highlighting the potential impact of adjustments within the healthcare setting.
Methods:
The distribution of PNR was derived through theoretical modeling, incorporating the relationship between the number of patients and available nursing staff, and approximating real-world PNRs. Simulations were conducted to explore the impact of key variables such as planned staffing schemes and staff absence rates (80, 85, 90, 95%), representing various healthcare settings defined by unit size (20, 30, 40 beds) and occupancy rate (70, 80, 90%). These simulations estimated the proportion of days with overstaffing and understaffing by calculating the area under the PNR distribution curve for values outside a predefined optimal PNR range. This approach enabled the quantification of deviations from optimal staffing levels across diverse scenarios, providing insights into the sensitivity of PNR to changes in system parameters.
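The simulation logic can be illustrated with a few lines of R using illustrative parameter values; note that the study derives the PNR distribution theoretically rather than purely by brute-force simulation.

```r
set.seed(1)
n_days  <- 1e5
beds    <- 20; occupancy    <- 0.8   # unit size and occupancy rate
planned <- 4;  availability <- 0.8   # planned nurses per shift and availability

patients <- rbinom(n_days, size = beds, prob = occupancy)
nurses   <- rbinom(n_days, size = planned, prob = availability)
pnr      <- patients / pmax(nurses, 1)

optimal <- c(4, 6)            # hypothetical optimal PNR range
mean(pnr < optimal[1])        # proportion of days below the optimal range
mean(pnr > optimal[2])        # proportion of days above the optimal range
```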
Results:
The simulation results indicate that most common staffing configurations exhibit a high risk of understaffing compared to standard PNR schemes. In a 20-bed unit with a nurse absence rate of 80%, more than 50% of hospital days show overstaffing for PNR values of 6 or higher, whereas more than 50% show understaffing for PNR values of 4 or lower. The findings further demonstrate that variations in staffing plans and nurse absence rates affect the proportion of over- and understaffed days. Smaller units (e.g., 20 beds) are more prone to overstaffing, with nurse absence rates having a more significant influence on overstaffing variability than in larger units (e.g., 30 beds).
Discussion:
This study highlights the importance of understanding the PNR dynamics in hospital staffing. By deriving the theoretical distribution of PNR and simulating different settings, we approximated the proportion of overstaffed and understaffed days. The results emphasize the sensitivity of PNR to fluctuations in patient volume and nursing availability, underscoring the need for adaptive staffing strategies. This approach allows the evaluation of staffing policies, offering insights for optimizing resource allocation.
posters-monday: 10
Real-time predictions of bed occupancy in hospitals.
Ensor Rafael Palacios, Theresa Smith
University of Bath, United Kingdom
Increased demand for hospital resources has led to bed occupancy that often approaches and exceeds maximum capacity. Even relatively short periods (e.g., a few days) of elevated bed occupancy can have an immediate negative impact at all levels of a hospital service chain, including the number of ambulances available, their response times, and the quality and number of discharges. Predicting periods of high demand, with time horizons of up to one or two weeks, is thus of critical operational importance, as it enables hospital managers to proactively initiate adaptive strategies. Here we develop a predictive state-space model of bed occupancy, designed to be deployed within hospitals in real time to support adaptive decision making. We develop and test the model using daily data from two large hospitals in Bristol, United Kingdom. These data include information about bed occupancy itself, admissions, discharges, staffing levels and other hospital-level variables; we additionally include information about seasonal infectious diseases (e.g., flu) and weather (e.g., temperature). We benchmark the model against several alternatives, including naive and ARIMA models (with and without covariates) and random forests. For model comparison, we consider multiple loss functions to ensure accurate predictions of different, expert-derived aspects of the data, such as sudden peaks in occupancy. The next steps involve further validation of the model and testing in an operational setting.
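As a sketch of one of the benchmarks mentioned above, an ARIMA model with covariates for 14-day-ahead occupancy forecasts might be fitted as follows (hypothetical column names; in deployment the future covariates would themselves need to be forecast or lagged):

```r
library(forecast)

xreg_tr <- as.matrix(train[, c("admissions", "flu_cases", "temperature")])
xreg_te <- as.matrix(test[,  c("admissions", "flu_cases", "temperature")])

fit <- auto.arima(train$occupancy, xreg = xreg_tr)   # ARIMA-with-covariates benchmark
fc  <- forecast(fit, xreg = xreg_te, h = 14)         # 14-day-ahead forecasts
accuracy(fc, test$occupancy)                          # compare against observed occupancy
```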
posters-monday: 11
Positive and negative predictive values of diagnostic tests using area under the curve
Kanae Takahashi1, Kouji Yamamoto2
1Osaka Metropolitan University Graduate School of Medicine, Japan; 2Yokohama City University School of Medicine, Japan
In medicine, diagnostic tests are important for the early detection and treatment of disease. The positive predictive value (PPV) and the negative predictive value (NPV) describe how well a test predicts abnormality. The PPV represents the probability of disease when the diagnostic test result is positive, while the NPV represents the probability of no disease when the diagnostic test result is negative. These predictive values inform clinicians and patients about the probability that the diagnostic test will give the correct diagnosis. Compared to sensitivity and specificity, the predictive values are more patient focused and often more relevant in patient cases.
However, the predictive values observed in one study do not apply universally because these values depend on the prevalence. To overcome this shortcoming, in this study we proposed measures of the positive and negative predictive values based on the area under the curve (PPV-AUC and NPV-AUC). In addition, we provided a method for computing confidence intervals for PPV-AUC and NPV-AUC based on the central limit theorem and the delta method.
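The prevalence dependence referred to above follows from Bayes' theorem: for sensitivity Se, specificity Sp and prevalence p,

```latex
\mathrm{PPV}(p) = \frac{Se \, p}{Se \, p + (1 - Sp)(1 - p)},
\qquad
\mathrm{NPV}(p) = \frac{Sp \, (1 - p)}{Sp \, (1 - p) + (1 - Se)\, p},
```

which makes explicit why a single-study PPV or NPV does not transfer across settings with different prevalence.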
A simulation study was conducted to investigate the coverage probabilities of the proposed confidence intervals. Simulation results showed that the coverage probabilities of 95% confidence intervals were close to 0.95 when the sample size was large.
posters-monday: 12
Freely accessible software for recruitment prediction and recruitment monitoring: Is it necessary?
Philip Heesen, Manuela Ott, Katarina Zatkova, Malgorzata Roos
University of Zurich, Switzerland
Background: Scientific studies require an adequate number of observations for statistical analyses. The ability of a study to successfully collect the required number of observations ultimately depends on a realistic study design based on accurate recruitment predictions. Inaccurate recruitment predictions inevitably lead to inappropriately designed studies, small sample sizes and unreliable statistical inference, increasing the risk of study discontinuation and wasted funding. To realistically predict recruitment, researchers need free access to statistical methods implemented in user-friendly, well-documented software.
Methods: A recent systematic review assessed the availability of software implementations for predicting and monitoring recruitment.
Results: This systematic review demonstrated that freely accessible software for recruitment predictions is currently difficult to obtain. Although several software implementations exist, only a small fraction is freely accessible. Ultimately, only one article provided a link to directly applicable free open-source software, but other links were outdated.
Conclusion: To improve access for researchers worldwide, we propose three measures: First, future authors could increase the findability of their software by explicitly mentioning it in titles, abstracts and keywords. Second, they could make their software available online on open access platforms. Finally, they could provide user-friendly documentation and instructive examples on how to use the statistical methods implemented in their software in applications. In the long term, it could become standard practice to use such software for insightful recruitment predictions and realistic decision making. Such realistic decisions would increase the chance that studies are appropriately designed, adequately powered, and successfully completed, thereby optimising the use of limited funding resources and supporting scientific progress worldwide.
posters-monday: 13
On moderation in a Bayesian log-contrast compositional model with a total. Interaction between extreme temperatures and pollutants on mortality
Germà Coenders1,2, Javier Palarea-Albaladejo3, Marc Saez1,2, Maria A. Barceló3
1Research Group on Statistics, Econometrics and Health (GRECS), University of Girona, Spain; 2Centro de Investigación Biomédica en Red de Epidemiología y Salud Pública (CIBERESP). Instituto de Salud Carlos III, Madrid, Spain; 3Department of Computer Science, Applied Mathematics and Statistics, University of Girona, Spain
Introduction: Compositional regression models with a dependent real variable can be specified as log-contrast models with a zero-sum constraint on the model coefficients. Moreover, the Bayesian approach to model fitting, through the Integrated Nested Laplace Approximation (INLA) method, is gaining increasing popularity to deal with complex data structures such as spatiotemporal observations.
Methods: In this work, we combine these elements and extend the approach to encompass both total effects, formally defined in a T-space, and moderation or interaction effects in the data modelling. The interpretation of the results is formulated both in the original scale of the dependent variable and in terms of elasticities.
An illustrative case study is presented aimed at relating all-cause mortality with the interaction between extreme temperatures, air pollution composition, and total air pollution in Catalonia, Spain, during the summer of 2022.
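For orientation, a generic log-contrast model with a total (a standard form, not necessarily the authors' exact specification) for a real-valued response y, a D-part pollution composition x and total pollution t can be written as

```latex
y_i = \beta_0 + \sum_{j=1}^{D} \beta_j \log x_{ij} + \gamma \log t_i + \varepsilon_i,
\qquad \sum_{j=1}^{D} \beta_j = 0,
```

with moderation introduced by interacting an extreme-temperature indicator with the log-parts, the zero-sum constraint then applying within each set of interaction coefficients.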
Results: The results show that extreme temperature, exposure to total pollution and to some pollutants in particular (ozone and particulate matter), allowing for some delay in their effect, were associated with increased risk of dying. Also, again considering delayed effects, the mortality risk particularly increased on days of extreme temperatures and greater exposure to ozone.
Conclusions: When assessing the effects of extreme temperatures on mortality, the effects of composition and total pollution, and not just individual pollutants, as well as possible interactions, must be taken into account.
posters-monday: 14
Graphical inference in nonparametric hypothesis testing: A two-dimensional grid framework for analysis and visualization
Lubomír Štěpánek1,2, Ondřej Vít2, Lubomír Seif2
1First Faculty of Medicine of Charles University (Czech Republic); 2Faculty of Informatics and Statistics of Prague University of Economics and Business (Czech Republic)
Background / Introduction: Nonparametric tests often utilize intuitive concepts such as ranking of observations or assessing pre-post changes. While these tests, including the Mann-Whitney test and signed-rank tests, offer numerical precision, they can also be interpreted graphically. Though graphical techniques cannot replace numerical calculations, they enhance comprehension of test logic and may lead to practical heuristic formulations.
Methods: This study revisits graphical inference testing for selected nonparametric tests, including both two-sample and paired tests. The graphical testing approach transforms test statistic construction into orthogonal directional changes on a two-dimensional finite-step grid. The graphical pathway depends on what the test statistics emphasize in the observations. For two-sample tests, both the ranking distribution and sample affiliation changes matter, whereas, for paired tests, the sequence of positive and negative pre-post changes is critical. These changes are represented as unit steps in orthogonal directions on the grid. Under the null hypothesis of no difference, graphical pathways exhibit an almost regular alternation of grid directions, which follow a binomial distribution and can thus be analyzed within a probability framework. As a novel contribution, we apply Popoviciu's inequality to derive an upper bound on the probability of observing data contradicting the null hypothesis to the same or a greater extent, thereby estimating the p-value and offering insights into the statistical power of the test.
Results: We developed R functionality for computing and visualizing two-dimensional grids used in graphical inference testing. The grids highlight regions corresponding to typical null hypothesis rejection scenarios. In particular, the grids accommodate asymmetric null hypotheses for the signed-rank test by upper-bounding directional "traffic" maxima. Various simulations were conducted, evaluating different sample pairs and pre-post scenarios to demonstrate the method's applicability.
Conclusion: Graphical inference testing provides an alternative perspective on nonparametric hypothesis testing, fostering better understanding and serving educational purposes. The developed R functionality for graphical testing will soon be integrated into an R package, expanding accessibility and usability for statistical analysis and instruction.
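The grid construction described in the Methods above can be illustrated in R for the two-sample case: sort the pooled sample and take a unit step in one direction for each observation from the first group and in the orthogonal direction for each observation from the second group (a simplified sketch assuming no ties, not the authors' package):

```r
grid_path <- function(a, b) {
  pooled <- sort(c(a, b))
  from_a <- pooled %in% a                         # TRUE if the next ranked value is from group A
  path   <- cbind(x = cumsum(from_a), y = cumsum(!from_a))
  rbind(c(0, 0), path)
}

set.seed(1)
p <- grid_path(rnorm(10), rnorm(10, mean = 1))
plot(p, type = "s", xlab = "steps from group A", ylab = "steps from group B")
abline(0, 1, lty = 2)   # under the null, the path stays close to the diagonal
```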
posters-monday: 15
On regression analysis of interval-valued data based on order statistics
Ryo Mizushima, Asanao Shimokawa
Tokyo University of Science, Japan
Background / Introduction:
Some of today's diverse data may be given as interval values, such as blood pressure, instead of point values. Interval values can also be used to summarise point-valued data by certain characteristics. For example, the temperature at a certain point in time is given as a point value, but the temperature throughout the day can be described by a minimum and maximum temperature. Most studies on regression analysis of interval-valued data have proposed methods based on the midpoint and width of the interval. However, those methods rarely use the information within the interval. In this study, we therefore consider the case where the upper and lower values of the response variable are known, together with the number of individuals lying between them. An example would be a hospital with 100 patients, where some numerical information on their health status is available, but for privacy reasons only the maximum and minimum values among them are known.
Methods:
We propose a model that takes into account the information within the interval of the response variable. The proposed method assumes that the values in the interval are generated from a certain distribution and aims to estimate the distribution of the response variable given a set of explanatory variables. To this end, the upper and lower values of the response are treated as the maximum and minimum of the order statistics, and the number of observations whose values are unknown within the interval is assumed to be known. From the properties of order statistics, the maximum and minimum values and the number of observations between them yield the conditional probability density function of the response variable given the explanatory variables. This is used as a likelihood function to obtain maximum likelihood estimators of the parameters of the distribution, from which approximate confidence intervals for the parameters can be derived.
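The likelihood contribution described above rests on the standard joint density of the minimum and maximum of n independent observations with conditional density f(·|x) and distribution function F(·|x):

```latex
f_{(1),(n)}(u, v \mid x) \;=\; n(n-1)\, f(u \mid x)\, f(v \mid x)\,
\bigl[F(v \mid x) - F(u \mid x)\bigr]^{\,n-2}, \qquad u < v,
```

so that, with n known, observing only the interval endpoints still carries information about the parameters of the assumed within-interval distribution.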
Results:
We examined through simulations the behaviour of the parameter estimators and approximate confidence intervals in finite samples, varying the sample size and the number of observations within the interval. The results show that the parameters can be estimated successfully under several conditions.
Conclusion:
We proposed a regression analysis method that uses a likelihood function derived from the information within the interval of the response variable, treating the maximum and minimum values of the interval as order statistics.
posters-monday: 16
Two-sided Bayesian simultaneous credible bands in the linear regression model
Fei Yang
University of Manchester, United Kingdom
Credible bands, which comprise a series of credible intervals for each component of a parameter vector, are frequently employed to visualize estimation uncertainty in Bayesian statistics. Unlike the often-used pointwise credible interval, simultaneous credible bands (SCBs) can cover the entire parameter vector of interest with an asymptotic probability of at least 1-α.
In this study, to assess where the true regression line x^T θ, from which the observed data have been generated, lies, we propose two-sided 1-α level Bayesian simultaneous credible bands for the regression line x^T θ over a finite interval of the covariate x in a simple linear regression model. By incorporating prior information, the proposed method offers advantages over the traditional frequentist approach, yielding more robust and stable estimates, especially in cases with limited data.
Using non-informative priors, we analyze the posterior distribution of the parameters of interest and employ Monte Carlo simulations to obtain the critical constant required for the construction of the Bayesian SCBs.
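The Monte Carlo step can be sketched as follows in R, where theta_draws, theta_hat, Sigma_hat, x_low and x_high are placeholders for the posterior draws, posterior mean, posterior covariance of the coefficients, and the covariate interval of interest:

```r
xgrid <- seq(x_low, x_high, length.out = 100)    # finite covariate interval of interest
X     <- cbind(1, xgrid)

# maximal standardised deviation of the fitted line over the grid, per posterior draw
dev <- apply(theta_draws, 1, function(th)
  max(abs(X %*% (th - theta_hat)) / sqrt(rowSums((X %*% Sigma_hat) * X))))

c_alpha <- quantile(dev, 0.95)                   # critical constant for a 95% band
band <- cbind(lower = X %*% theta_hat - c_alpha * sqrt(rowSums((X %*% Sigma_hat) * X)),
              upper = X %*% theta_hat + c_alpha * sqrt(rowSums((X %*% Sigma_hat) * X)))
```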
Simulation results show that the proposed methodology has highly satisfactory frequentist properties. Additionally, it meets the required false-positive rate with a pre-specified level of certainty. A real-data analysis in drug stability studies also verifies the effectiveness of the proposed framework.
posters-monday: 17
Cell composition analysis with unmeasured confounding
Amber Huybrechts1,2, Koen Van den Berge2, Sanne Roels2, Oliver Dukes1
1Ghent University, Belgium; 2Janssen Pharmaceutica, Belgium
Analysis of single-cell sequencing data, in particular cell abundance data where one counts the number of cells detected for each cell type in each sample, involves handling data compositionality. Indeed, cell composition data contain only relative information on a cell type’s abundance. An increase in one cell type might therefore also be reflected as a decrease in other cell types’ abundance. This makes estimating causal disease effects in cell composition data challenging, especially in the presence of confounders. On top of that, not all confounders might be observed.
Existing methods like CATE [1] and RUV-4 [2] attempt to obtain unbiased disease or treatment effects by estimating the unmeasured confounders using factor analysis and making assumptions on sparsity and the existence of negative controls. However, it is uncertain how these methods perform in the context of cell composition analysis, where in addition to the compositionality, the number of features is smaller in comparison to the settings where these methods are generally used (e.g. gene expression analysis with thousands of genes).
In this work, we investigate how we can account for compositionality and unmeasured confounders when assessing differences in cell type abundance between biological conditions. We find that a vanilla factor analysis model, typically used for estimating unmeasured confounders, is unsuitable in the context of compositional data, and we evaluate alternative approaches.
[1] Jingshu Wang, Qingyuan Zhao, Trevor Hastie, Art B. Owen, "Confounder adjustment in multiple hypothesis testing", The Annals of Statistics, 45(5), 1863-1894 (October 2017).
[2] Johann A. Gagnon-Bartsch, Laurent Jacob, Terence P. Speed, "Removing Unwanted Variation from High Dimensional Data with Negative Controls", University of California, Berkeley (December 2013).
posters-monday: 18
Temporal transcriptomic analysis of microexon alternative splicing in mouse neurodevelopmental genes
Jimin Kim, Kwanghoon Cho, Jahyun Yun, Dayeon Kang
Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology, Korea, Republic of (South Korea)
Background: Alternative splicing plays a pivotal role in gene regulation, particularly within neurological processes. Microexons, short exon sequences ranging from 3 to 27 base pairs, are highly neuron-specific and fine-tune protein interactions within synaptic networks. Dysregulation of microexon splicing has been linked to impaired neuronal connectivity and altered synaptic function, which are hallmarks of neurodevelopmental disorders. Understanding the dynamic regulation of microexon splicing across developmental stages is crucial for identifying potential biomarkers and therapeutic targets.
Methods: We investigated the temporal dynamics of microexon splicing by analysing whole-cortex RNA sequencing (RNA-seq) data from mice across eleven developmental stages, spanning embryonic, postnatal, and ageing periods. We focused on microexons under 30 base pairs, using the Percent Spliced In (PSI) metric to assess alternative splicing patterns. Our analysis centred on genes involved in neural function and neurodevelopmental disorders to explore the role of microexons in neuronal maturation and synaptic function.
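The inclusion metric used above is the standard percent-spliced-in ratio in its read-count form (before any length normalisation):

```latex
\mathrm{PSI} \;=\; \frac{\text{reads supporting microexon inclusion}}
{\text{reads supporting inclusion} + \text{reads supporting exclusion}} \times 100\%.
```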
Results: We identified distinct stage-specific microexon splicing patterns in several genes, highlighting the complexity of microexon regulation during cortical development. During early embryonic stages (E10–E16), low PSI values were observed for genes involved in neurogenesis and axon guidance, such as Nrcam and Robo1. Nrcam showed a gradual increase in PSI during embryogenesis, whereas Robo1 exhibited a decline from embryonic to postnatal stages, reflecting their roles in neuronal connectivity and circuit stabilisation, respectively. In postnatal stages, Shank3 and Dlgap1 showed significant PSI increases, indicating their involvement in synaptic maturation and plasticity. Conversely, Bin1 displayed a decline in PSI during maturation and ageing, suggesting a shift from synaptic plasticity to stability.
Conclusion: This study demonstrates the importance of microexons in neural development and their potential contribution to neurodevelopmental disorders. The stage-specific PSI variations indicate that microexons are crucial for neural circuit formation, synaptic plasticity, and functional specialisation. The observed co-regulation patterns suggest that microexon splicing is tightly regulated, orchestrating key neurodevelopmental events. Future research into the regulatory mechanisms governing microexon splicing will be essential to understanding their broader biological implications and therapeutic potential.
posters-monday: 19
A Robust Method for Accurate Reconstruction of 3D Genome Conformation from Hi-C Data
Insu Jang1,2, Minsu Park1
1Department of Information and Statistics, Chungnam National University, Korea, Republic of (South Korea); 2Korea Research Institute of Bioscience and Biotechnology, Korea, Republic of (South Korea)
The three-dimensional (3D) organization of the genome within the cell nucleus plays a pivotal role in critical biological processes, including transcriptional regulation, DNA replication, and repair. Disruptions to this spatial organization, such as aberrant chromatin looping or genomic deletions, are linked to various diseases. Despite its significance, resolving the 3D genome architecture has been historically challenging due to the lack of techniques for high-resolution chromatin mapping. The advent of Chromosome Conformation Capture (3C) technologies, particularly Hi-C, revolutionized this field by enabling genome-wide quantification of chromatin interactions. Hi-C produces a contact count map, providing interaction frequencies between genomic loci, which serves as the basis for computational 3D genome reconstruction. However, deriving biologically meaningful 3D structures from Hi-C data remains computationally challenging due to noise and chromatin complexity. To overcome these challenges, we propose a novel, robust methodology combining Thin Plate Spline (TPS) and Non-Metric Multi-Dimensional Scaling (nMDS), specifically designed to infer smooth and biologically plausible 3D genomic structures while being resilient to noise. Our method was rigorously evaluated on simulated datasets encompassing structures of diverse sizes with varying levels of noise, as well as on real Hi-C data from the IMR90 cell line. Comparative assessments using simulation datasets demonstrated that our approach consistently produced robust and smoother results under varying noise conditions, outperforming existing models in handling noise. Furthermore, its predictive validity was substantiated through comparisons with 111 replicate conformations derived from Multiplexed Fluorescence in situ hybridization (M-FISH) imaging, providing strong empirical support for the method and its applications in 3D genome analysis.
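A highly simplified R sketch of the nMDS-plus-smoothing idea follows (hypothetical contact matrix; the power-law distance transform and coordinate-wise smoothing are generic stand-ins, not the authors' algorithm):

```r
library(MASS)    # isoMDS: non-metric multidimensional scaling
library(fields)  # Tps: thin plate spline smoothing

alpha <- 1
d   <- as.dist(1 / (contact_matrix + 1)^alpha)   # higher contact count -> smaller distance
xyz <- isoMDS(d, k = 3)$points                   # 3D embedding of genomic loci

idx        <- seq_len(nrow(xyz))                 # genomic bin index along the chromosome
smooth_xyz <- sapply(1:3, function(j) predict(Tps(idx, xyz[, j])))
```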
posters-monday: 20
Balancing Accuracy, Clinical Utility, and Explainability: A Machine Learning Approach to Prostate Cancer Prediction
Luis Mariano Esteban1,2, Rocío Aznar1,3, Angel Borque-Fernando4,5, Alejandro Camón5, Patricia Guerrero5
1Escuela Universitaria Politécnica de la Almunia, Universidad de Zaragoza, Spain; 2Institute for Biocomputation and Physics of Complex Systems (BIFI), Spain; 3Instituto Tecnológico de Aragón, Spain; 4Department of Urology, Miguel Servet University Hospital, Spain; 5Health Research Institute of Aragon Foundation, Spain
Background
Advances in mathematical modelling have significantly improved cancer diagnosis. While these models enhance predictive performance, typically measured by discriminative power, they often overlook their role as classification tools. Recently, greater emphasis has been placed on their clinical utility and explainability, highlighting the need for models that balance accuracy with interpretability. Tools such as clinical utility curves and Shapley values can help achieve this balance.
Methodology and results
We analysed data from 86,359 patients at Miguel Servet University Hospital, Zaragoza, Spain (2017–2022) with at least one PSA measurement, including 2,391 prostate cancer (PCa) diagnoses, to develop a predictive model for PCa. From their clinical records, we selected approximately 50 demographic and clinical variables as candidate predictors, including PSA, free PSA, PSA history, blood analysis parameters, and comorbidities. Several machine learning models were tested, including logistic regression, ridge regression, LASSO, elastic net, classification trees, random forest, neural networks, and Extreme Gradient Boosting (XGBoost). Model performance was validated using an external dataset of 47,284 patients from the Lozano Blesa University Hospital.
XGBoost demonstrated the best discrimination in the validation cohort, with an AUC of 0.965, sensitivity of 0.904, and specificity of 0.914. More importantly, it also showed the highest clinical utility. For a cutoff that resulted in a 5% diagnostic loss in the training dataset, the validation dataset showed a 7.87% loss while recommending biopsy for 11.1% of patients. In comparison, a screening policy of biopsying all patients with PSA > 3 would recommend biopsy for 15.3% of patients.
To assess variable influence within the XGBoost model, we used SHAP values (SHapley Additive exPlanations), a game theory-based method for evaluating feature importance in predictive models. SHAP values indicate the contribution of each variable for each individual and can be analysed collectively or individually. In our analysis, PSA was the most influential risk factor, producing the highest Shapley values. Protective factors included older age, multiple PSA readings between 3.2 and 8 with negative biopsies, and prolonged use of antihypertensives, statins, or antidiabetics. Conversely, a previous negative biopsy with ASAP or PIN was a notable risk factor.
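A condensed sketch of the XGBoost-plus-SHAP workflow in R (hypothetical design matrices; hyperparameters are illustrative, not those of the fitted model):

```r
library(xgboost)

dtrain <- xgb.DMatrix(data = as.matrix(X_train), label = y_train)
bst <- xgb.train(params = list(objective = "binary:logistic", eta = 0.1, max_depth = 4),
                 data = dtrain, nrounds = 200)

# per-patient SHAP contributions: one column per predictor plus a bias column
shap <- predict(bst, as.matrix(X_valid), predcontrib = TRUE)
sort(colMeans(abs(shap)), decreasing = TRUE)   # mean |SHAP| as a global importance ranking
```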
Conclusions
This study developed a predictive tool for prostate cancer with high accuracy while minimising unnecessary biopsies. As screening protocols remain unstandardised in Spain, it is crucial to explore alternative strategies that incorporate models capable of reflecting variable importance and clinical utility before implementation.
posters-monday: 21
Development of a thyroid cancer recurrence prediction calculator: A regression approach
Jiaxu Zeng
University of Otago, New Zealand
Background
The thyroid cancer staging calculator has been recognised as one of the most efficient tools for assisting clinicians in making clinical treatment decisions. However, the current calculator does not include patients' serum thyroglobulin, which is crucial for staging cancer patients in practice. The primary aim of this study is to update the current calculator to include serum thyroglobulin, based on a tertiary thyroid cancer service database from Australia.
Methods
Records from 3,962 thyroid cancer patients were analysed to train a logistic regression model for predicting recurrence. Twelve predictive variables were chosen under close guidance from thyroid cancer specialists: age at operation, sex, number of carcinomas present at operation, size of the largest tumour, histologic type of carcinoma, extrathyroidal extension status of tumours, pathologic staging of the primary tumour, presence of venous invasion of the primary tumour, immunohistochemistry for the primary tumour, presence of extranodal spread, number of lymph nodes, and serum thyroglobulin level presented in the scans.
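The core of the updated calculator can be sketched in R as a logistic regression followed by a discrimination check (hypothetical variable names standing in for the twelve predictors):

```r
library(pROC)

fit <- glm(recurrence ~ age + sex + n_carcinomas + tumour_size + histology +
             extrathyroidal_ext + pT_stage + venous_invasion + ihc +
             extranodal_spread + n_lymph_nodes + serum_tg,
           data = thyroid, family = binomial)

roc_obj <- roc(thyroid$recurrence, predict(fit, type = "response"), quiet = TRUE)
auc(roc_obj)   # apparent AUC; internal validation would be needed to correct for optimism
```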
Results
The strongest predictors were the number of lymph nodes, the histologic type of carcinoma and, most importantly, the serum thyroglobulin level. The model demonstrated excellent performance with an AUC of 0.874.
Conclusions
This study has addressed an important concern: serum thyroglobulin information has not previously been used to predict thyroid cancer recurrence in practice.
posters-monday: 22
A comparison of methods for modelling multi-state cancer progression using screening data with censoring after intervention
Eddymurphy U. Akwiwu, Veerle M.H. Coupé, Johannes Berkhof, Thomas Klausch
Amsterdam UMC, Amsterdam, The Netherlands
Background: Optimizing cancer screening and surveillance frequency requires accurate information on parameters such as sojourn time and cancer risk from pre-malignant lesions. These parameters can be estimated using multi-state cancer models applied to screening or surveillance data. Although multi-state model methods exist, their performance has not been thoroughly investigated, specifically not in the common setting where cancer precursors are treated upon detection so that the transition to cancer is prevented. Our main goal is understanding the performance of available multi-state methods in this challenging censoring setting.
Methods: Six methods implemented in R software packages (msm, msm with a phase-type model, cthmm, smms, BayesTSM, and hmm) were compared. We assumed commonly used time-independent (i.e., exponential) or time-dependent (i.e., Weibull) progression hazards between consecutive health states in a three-state model (healthy, HE; cancer precursor; cancer) in simulation studies. Bias, empirical standard error (ESE), and root mean squared error (rMSE) of progression risk estimates were compared across methods. The methods were illustrated using surveillance data from 734 individuals at increased risk of colorectal cancer, classified into three health states: HE, non-advanced adenoma (nAA), and advanced neoplasia (AN). Age was used as the time scale in the analysis, with both the risk estimates of developing nAA from HE and of AN after the onset of nAA compared across the methods.
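For reference, the simplest of the compared approaches, a time-homogeneous three-state model in the msm package, can be specified as follows (hypothetical data layout with age as the time scale; the other packages and the censoring-after-intervention handling are not shown):

```r
library(msm)

# allowed instantaneous transitions: HE -> nAA -> AN (values are crude initial rates)
Q <- rbind(c(0, 0.1, 0),
           c(0, 0,   0.1),
           c(0, 0,   0))

fit <- msm(state ~ age, subject = id, data = surveillance, qmatrix = Q)
pmatrix.msm(fit, t = 5)   # 5-year transition probability matrix
```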
Results: All methods performed well with time-independent progression hazards in the simulation study. With time-dependent hazards, only the packages smms and BayesTSM provided unbiased risk estimates with low ESE and rMSE. In the application (median follow-up: 6 years), 447 (65%), 208 (28.3%) and 49 (6.7%) individuals were classified as HE, nAA and AN, respectively. Only the packages msm, hmm, and BayesTSM yielded converged solutions. The risk estimates of developing nAA from HE were similar between hmm and BayesTSM (e.g., nAA risk estimates at age 30 were approximately zero to 2 decimal places) but differed for the msm package (e.g., nAA risk estimate at age 30 was 16%), while the risk estimates of developing AN after the onset of nAA varied across methods (5-year risk range: 3% to 23%).
Conclusion: Methods for multi-state cancer models, specifically with an unobservable precursor-to-cancer transition, are strongly affected by the time dependency of the hazard. Careful consideration is therefore crucial when selecting a method for multi-state cancer models. With more realistic (time-dependent hazard) models, the BayesTSM and smms packages performed accurately; however, BayesTSM performed better in situations with weakly identifiable likelihoods.
posters-monday: 23
ADHD and 10–Year Disease Progression from Initiating Pharmacotherapy for Hypertension to Death: A Multistate Modelling Analysis
Yiling Zhou1, Douwe Postmus1, Anne van Lammeren2, Casper F.M. Franssen1, Harold Snieder1, Catharina A. Hartman1
1University of Groningen, University Medical Center Groningen, Groningen, the Netherlands; 2Expertisecentrum Fier, Leeuwarden, the Netherlands
Background: Attention-deficit/hyperactivity disorder (ADHD)—the most common neurodevelopmental disorder—affects 2.5% of adults globally and is associated with a 1.5–2-fold increased risk of hypertension, which typically manifests a decade earlier than in the general population. However, the cardiovascular health of this population has been largely overlooked in research and clinical practice. This nationwide cohort study aims to investigate 10-year disease trajectories after initiating hypertension pharmacotherapy in adults with ADHD using multistate modelling.
Methods: This nationwide cohort study included adults aged 18–90 years in the Netherlands who initiated hypertension medication between 2013 and 2020, without prior cardiovascular disease (CVD) or chronic kidney disease (CKD). Hypertension was defined as the initial state, critical complications (stroke, heart failure hospitalisation [HHF], acute myocardial infarction, and CKD) as intermediate states, and death from a cardiovascular or renal cause (cardiorenal death) or other causes as final states. Transition rates were estimated using Cox proportional hazards regression, individual-level trajectories were generated via microsimulation, and the effect of ADHD was estimated by comparing outcomes in individuals with ADHD to a counterfactual scenario where individuals were assumed not to have ADHD.
Results: Of 592,362 adults included, 9,728 had ADHD (median age, 45.0 years). Compared to the counterfactual scenario, individuals with ADHD had a higher 10-year risk of cardiorenal death via the HHF pathway (risk difference [95% CI]: 4.8 [2.0–9.2] per 10,000 persons), driven by increased transition risks from hypertension to HHF (14.2 [7.6–26.1] per 10,000 persons), and HHF to cardiorenal death (752.8 [55.7–1517.9] per 10,000 persons). Similarly, individuals with ADHD had an elevated 10-year risk of cardiorenal death via the CKD pathway (2.8 [1.3–7.0] per 10,000 persons), primarily due to an increased transition risk from CKD to cardiorenal death after CKD onset (351.6 [175.7–782.4] per 10,000 persons).
Conclusion: In individuals initiating hypertension medication without pre-existing CVD or CKD, ADHD was associated with a worse 10-year prognosis of hypertension, particularly for the pathways initiated by heart failure and CKD. Our findings indicate the importance of interdisciplinary care and highlight the need for research aimed at preventing heart failure after hypertension onset and optimising heart failure and CKD management in individuals with ADHD.
posters-monday: 24
Understanding PSA Dynamics: Integrating Longitudinal Trajectories, Testing Patterns, and Disease Progression.
Birzhan Akynkozhayev, Benjamin Christoffersen, Mark Clements
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Sweden
Background: The prostate-specific antigen (PSA) test is a widely used, inexpensive test for prostate cancer screening and prognosis. However, its clinical utility remains debated. Examining PSA trajectories over time provides deeper insights into prostate cancer risk and progression. We believe that a critical challenge in such analyses is the observational process: men with higher PSA levels tend to undergo more frequent testing, introducing bias when PSA trajectories are evaluated without adjustment for different follow-up patterns. This study utilises the Stockholm Prostate Cancer Diagnostics Register, which contains PSA measurements from over half a million men living in Stockholm between 2003 and 2023.
Methods: Longitudinal mixed-effects models were applied to characterise the PSA trajectories. Separate survival models were fitted for time-to-prostate-cancer-diagnosis and for the observational process, where time-to-next PSA test was treated as a recurrent event. A full joint model was then fitted to incorporate both processes, examining different association structures (including current PSA value, rate of change, and cumulative PSA level) for their predictive impact on prostate cancer diagnosis and testing behaviour. The survival component of the model incorporated recurrent events (time-to-next PSA test for the observational process) alongside a terminal event (time-to-diagnosis), while also accounting for delayed entry. Model estimation was carried out with our recently developed VAJointSurv framework, which allows scalable inference through variational approximations for fast integration.
Results: Our findings highlight the importance of accounting for the observational process in PSA testing. Frequent follow-up testing among men with higher PSA values influenced PSA trajectory characterisation and hazard estimates for diagnosis. We found that modelling the observational process and disease progression separately yielded different results compared to a joint approach, which combines and accounts for both processes. These findings indicate that joint modelling may provide a more comprehensive understanding of PSA dynamics and its relationship with disease progression, rather than modelling each process separately.
Conclusions: By jointly modelling PSA trajectories, disease progression, and the observational process, we provide a robust framework for understanding the relationship between these related processes. To our knowledge, this is the largest study of longitudinal and joint PSA modelling. These findings may aid researchers who are exploring PSA trajectories over time. This approach highlights the necessity of adjusting for the observational process to derive accurate assessments of PSA trajectories and prostate cancer risk.
posters-monday: 25
Joint Modeling for Principal Stratification: Analyzing Stroke Events in CADASIL Patients Across NOTCH3 Variants
Léa Aguilhon, Sophie Tezenas du Montcel, Juliette Ortholand
Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié Salpêtrière, F-75013, Paris, France, France
Background: Principal Stratification is a statistical framework designed to analyze causal effects by considering subgroups of individuals defined by their potential outcomes, often in the context of mediators or intermediate variables. This method is especially valuable where treatment effects are influenced by intermediate events, such as competing events, allowing for a more accurate estimation of causal effects. While powerful, Principal Stratification relies on unobservable counterfactual strata, which raises identifiability challenges and often requires strong assumptions. Recent methodological advancements have focused on reducing these assumptions through improved estimation techniques. In parallel, powerful estimators such as joint models have been developed as effective tools for predicting event outcomes using repeated measures.
Objective: This study uses joint modeling to reduce reliance on untestable assumptions in principal stratification. We applied this approach to study Cerebral Autosomal Dominant Arteriopathy with Subcortical Infarcts and Leukoencephalopathy (CADASIL), a genetic disorder that impacts small blood vessels. We analyze stroke occurrence in the presence of death across NOTCH3 variants (1-6 and 7-34).
Method: We analyzed observational follow-up data from 337 CADASIL patients who were followed on average for 5.6 years. We studied the occurrence of the second stroke event (83 observed events) in the context of death truncation (70 observed events) and used functional and cognitive scores to inform both events’ occurrences.
Membership of the principal strata was derived from the predicted counterfactual death outcomes of a Bayesian multivariate joint model (JMbayes2 package). Inverse Probability of Treatment Weighting was used to adjust for covariates such as sex, cardiovascular risk factors, education level, and baseline scores. Restricted Mean Survival Time (RMST) was then used to quantify stroke-free survival in the "always-survivor" subpopulation.
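A minimal JMbayes2 sketch of the kind of joint model used here, with one longitudinal marker and death as the event (hypothetical data and variable names; the authors' multivariate specification, counterfactual predictions and weighting are not reproduced):

    library(survival)
    library(nlme)
    library(JMbayes2)

    # Longitudinal submodel for a repeatedly measured functional/cognitive score
    lme_fit <- lme(score ~ years + variant_1_6, random = ~ years | id, data = long_dat)

    # Survival submodel for death (one row per patient)
    cox_fit <- coxph(Surv(time_death, death) ~ variant_1_6, data = surv_dat)

    # Bayesian joint model linking the trajectory to the hazard of death
    jm_fit <- jm(cox_fit, lme_fit, time_var = "years")
    summary(jm_fit)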
Results: Preliminary analysis indicates that carriers of the NOTCH3 1-6 variant have a lower 2-year and 5-year restricted mean stroke-free survival time compared to non-carriers, suggesting an accelerated time to second stroke.
Conclusion: We used joint modeling to estimate each patient's probability of belonging to the principal stratum of interest (always-survivors) in the context of death truncation. This alleviates some of the assumptions required for Principal Stratification estimation. Finally, this study provides valuable insights into CADASIL progression and paves the way for the design of future clinical trials.
posters-monday: 26
Challenges in modelling neuropsychiatric symptoms in early Alzheimer's Disease
Rachid Abbas
F. Hoffman -La Roche, France
The Neuropsychiatric Inventory (NPI) is a structured caregiver-based questionnaire designed to assess and quantify neuropsychiatric symptoms in individuals with various neurodegenerative disorders. NPI is a widely utilized instrument in clinical and research settings to provide a comprehensive evaluation of behavioral and psychological symptoms associated with cognitive impairment. The NPI has demonstrated good reliability and validity across various neurological conditions, such as Alzheimer's disease (AD), frontotemporal dementia, and vascular dementia.
Most current AD clinical trials focus on the early stage of the disease, when neuropsychiatric symptoms are rare, but there is growing interest in detecting their incidence, as this may be viewed as a clinically meaningful hallmark of deterioration in quality of life. From an analytical perspective, this requires analysing a continuous variable with an excess of zeros, also described as overdispersed data.
Over-dispersed data refers to a situation where the variance of the observed data exceeds what would be expected under a theoretical distribution, such as the Poisson. This departure from the assumption of homogeneous variance poses challenges in statistical modeling, as it may lead to inefficient parameter estimates and inflated Type I error rates. Many analytical solutions have been proposed to tackle these issues and contribute to a more accurate and robust statistical analysis of over-dispersed count data.
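For illustration, a hedged R sketch of candidate models for an over-dispersed, zero-heavy NPI-type outcome (hypothetical data and variable names; not the analyses reported in this work):

    library(MASS)   # glm.nb for negative binomial regression
    library(pscl)   # zeroinfl for zero-inflated counts

    # dat (hypothetical): npi_total per visit, treatment arm, visit time
    m_pois <- glm(npi_total ~ arm + visit, family = poisson, data = dat)
    m_nb   <- glm.nb(npi_total ~ arm + visit, data = dat)
    m_zinb <- zeroinfl(npi_total ~ arm + visit | arm, dist = "negbin", data = dat)

    # Simple overdispersion check for the Poisson fit (values >> 1 suggest overdispersion)
    sum(residuals(m_pois, type = "pearson")^2) / df.residual(m_pois)

    AIC(m_pois, m_nb, m_zinb)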
In this work, we quantified the benefits in terms of predictive performance and type I error control of various analytical approaches to handle over-dispersed NPI data. Our findings allow us to make evidence-based recommendations on analysis strategies. By optimizing the statistical approach to NPI data analysis, we pave the way for more sensitive and reliable detection of treatment effects on neuropsychiatric symptoms in early-stage AD clinical trials. As we continue to push the boundaries of AD drug development, these methodological advancements will be crucial in unlocking new possibilities for upcoming targeted interventions.
posters-monday: 27
Network models to decipher the human exposome: application to food exposome patterns in the general population
Ima Bernada1, Gregory Nuel2, Cécilia Samieri1
1INSERM U1219, France; 2LPSM, CNRS 8001
Complex chronic diseases are partly due to the exposome. Some exposures co-occur in everyday life, and combinations of exposures, rather than single factors, contribute to disease development. In co-exposure modeling, most studies use risk scores or dimension-reduction approaches. These ignore important features of the dependency structure, such as highly connected variables that may play a central role in disease development. Network approaches can capture the full complexity of the exposome structure. Our objective was to decipher the food exposome, encompassing both intakes and biological fingerprints, in a large cohort of older persons. We aimed to characterize diet intake networks, understand how they may be reflected internally through diet-related metabolite networks, and integrate the two in a bipartite network.
We analyzed a sample of n=311 participants from the 3C-Bordeaux cohort study who answered a dietary survey assessing intakes of 32 food groups (n=1730) and provided a blood draw for measurement of 143 food-related metabolites (n=375). Using the MIIC algorithm, based on conditional mutual information, we constructed three co-exposure networks: (i) a food co-consumption network; (ii) a food-related metabolite network; and (iii) a food-to-metabolite bipartite network. To address estimation uncertainty, networks were analyzed through bootstrap replication, using graph-theory metrics (e.g. degrees, distances). Obtaining collections of networks by bootstrap replication allowed us to quantify the uncertainty of each link and enabled a rigorous analysis of the results. A consensus network was also derived and represented. The networks were further studied through clustering, on the one hand using a priori clusters (e.g. metabolite families) and on the other hand using a node-clustering method.
The consensus food co-consumption network reflected the southwestern French diet of older persons living in Bordeaux in the early 2000s. A subnetwork centered on potatoes indicated that potato consumption was central and closely linked to that of many other foods in the traditional south-western diet. The metabolite network showed expected links between metabolites originating from the same food sources or the same biological pathways. Finally, when examining the links between food components and metabolites, we found expected biological links as well as more novel associations that warrant further investigation.
Network approaches applied simultaneously to food intakes and food-derived metabolites make it possible to integrate both the external and internal parts of the food exposome in a single statistical framework. This integrated behavioral-biological approach provides novel insights into how environmental exposures such as diet impact biology and health.
posters-monday: 28
Studying brain connectivity changes in Dementia with Lewy Bodies with Functional Conditional Gaussian Graphical Models
Alessia Mapelli1,2, Laura Carini3, Michela Carlotta Massi2, Dario Arnaldi4,5, Emanuele Di Angelantonio2,6,7, Francesca Ieva1,2, Sara Sommariva3
1MOX – Laboratory for Modeling and Scientific Computing, Department of Mathematics, Politecnico di Milano, Italy; 2HDS– Health Data Science Center, Human Technopole, Italy; 3Università degli studi di Genova, Department of Mathematics, Italy; 4Department of Neuroscience, Rehabilitation, Ophthalmology, Genetics, Maternal and Child Health (DINOGMI), University of Genoa, Genoa, Italy; 5Clinical Neurophysiology Unit, IRCCS Ospedale Policlinico S. Martino, Genoa, Italy; 6Blood and Transplant Research Unit in Donor Health and Behaviour, Cambridge, UK; 7Dept of Public Health & Primary Care, University of Cambridge, Cambridge, UK
Dementia with Lewy Bodies (DLB) is the second most common cause of neurodegenerative dementia after Alzheimer's Disease. Alterations in functional connectivity in the brain are possible phenotypic expressions of this disorder. Multivariate time series analysis of signals from electroencephalography (EEG) is extensively used to study the associations between simultaneously recorded signals, and thus to quantify functional connectivity at the subject level. Network-Based Statistic is a modern approach used to perform statistical group-level analysis and identify differential connectivity graphs between groups of patients presenting different clinical features. However, current methods fail to distinguish between direct and indirect associations between brain areas and often neglect the impact of confounding factors. We propose a conditional Gaussian graphical model for multivariate random functions to achieve a population-level representation of the conditional dependence of brain functionality captured by EEG, allowing the graph structure to vary with external variables.
Our method builds on the work of Zhao et al. [1], extending their high-dimensional functional graphical model to account for external variables. In this approach, each node in the graph represents the signal from an EEG electrode. We adopt a neighborhood selection strategy to estimate sparse brain connectivity graphs based on penalized function-on-function regression. Briefly, each node's signal is predicted from the signals of all other nodes using a lasso-based function-on-function regression. External variables (such as phenotype and age) are included in the model as interactions with the signals to capture their influence on brain connectivity. By combining the estimated neighborhoods, we recover the complete graph structure, which can adapt to variations in external factors. The key advantage of this method is its ability to detect differential connectivity changes associated with specific conditions while modeling confounder-linked networks for more accurate estimates.
The method was first validated through simulated data mimicking high-density EEG data recorded using a 64-electrode cap during an eyes-closed resting state task. The method was then tested on experimental data demonstrating the capability of the proposed approach in characterizing differences in functional connectivity in DLB patients with different clinical features, including hallucinations, fluctuations, parkinsonism, and REM sleep behavior disorder.
This study introduces a novel conditional graphical model for multivariate random functions that enables more precise modeling of brain connectivity by accounting for conditional relationships and mitigating confounding bias in differential network analysis.
[1] Zhao, Boxin, et al. "High-dimensional functional graphical model structure learning via neighborhood selection approach." Electronic Journal of Statistics 18.1 (2024): 1042-1129.
posters-monday: 29
LongiSurvSHAP: Explaining Survival Models with Longitudinal Features
Van Tuan NGUYEN, Lucas Ducrot, Agathe Guilloux
Inria, Université Paris Cité, Inserm, HeKA, F-75015 Paris, France
Background Recent developments in survival models integrating longitudinal measurements have significantly improved prognostic algorithm performance (Lee, Yoon, and Van Der Schaar 2019; Bleistein et al. 2024). However, their complexity often renders them black boxes, limiting applicability, particularly in critical fields like healthcare. Regulatory frameworks in the EU and the US now require interpretability tools to ensure model predictions align with expert reasoning, thereby enhancing reliability (Geller 2023; Panigutti et al. 2023). Despite this requirement, research on explaining these models remains limited, and existing methods are often constrained to specific architectures (Lee, Yoon, and Van Der Schaar 2019).
Methods We introduce LongiSurvSHAP, a model-agnostic explanation algorithm designed to interpret any prognostic model based on longitudinal data. While TimeSHAP (Bento et al. 2021) extends the concept of SHapley Additive exPlanations (SHAP) to time series classification, we advance this framework to survival analysis, accommodating the irregular measurement of longitudinal features that is ubiquitous in healthcare.
Results Our algorithm provides both individual and global explanations. Extensive simulations demonstrate LongiSurvSHAP's effectiveness in detecting key features and identifying crucial time intervals influencing prognosis. Applied to data from MIMIC (Johnson et al. 2016), our method aligns with established clinical knowledge, confirming its utility in real-world healthcare scenarios.
Conclusion We present a novel algorithm that enhances interpretability in survival analysis by revealing the impact of longitudinal features on survival outcomes.
References
Bento, João, Pedro Saleiro, André F Cruz, Mário AT Figueiredo, and Pedro Bizarro (2021). “Timeshap: Explaining recurrent models through sequence perturbations”. In: Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining, pp. 2565–2573.
Bleistein, Linus, Van-Tuan Nguyen, Adeline Fermanian, and Agathe Guilloux (2024). “Dynamic Survival Analysis with Controlled Latent States”. In: Forty-first International Conference on Machine Learning.
Geller, Jay (2023). “Food and Drug Administration Published Final Guidance on Clinical Decision Support Software”. In: Journal of Clinical Engineering 48.1, pp. 3–7.
Johnson, Alistair EW et al. (2016). “MIMIC-III, a freely accessible critical care database”. In: Scientific data 3.1, pp. 1–9.
Lee, Changhee, Jinsung Yoon, and Mihaela Van Der Schaar (2019). “Dynamic-deephit: A deep learning approach for dynamic survival analysis with competing risks based on longitudinal data”. In: IEEE Transactions on Biomedical Engineering 67.1, pp. 122–133.
Panigutti, Cecilia et al. (2023). “The role of explainable AI in the context of the AI Act”. In: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, pp. 1139–1150.
posters-monday: 30
Prognostic Models for Recurrent Event Data
Victoria Watson1,2, Laura Bonnett2, Catrin Tudur-Smith2
1Phastar, United Kingdom; 2University of Liverpool, Department of Health Data Sciences
Background / Introduction
Prognostic models predict outcomes for people with an underlying medical condition. Many conditions are typified by recurrent events, such as seizures in epilepsy. Prognostic models for recurrent events can be used to predict an individual patient's risk of disease recurrence or outcome at certain time points.
Methods for analysing recurrent event data are not widely known or applied in research. Most analyses use survival analysis to consider time until the first event, meaning subsequent events are not analysed and key information is lost. An alternative is to analyse the event count using Poisson or Negative Binomial regression. However, this ignores the timing of events. Recurrent event methods analyse both the event count and the timing between events meaning key information is not discarded.
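To make the contrast concrete, a minimal R sketch comparing a count-based model with a recurrent-event (Andersen-Gill type) model follows (hypothetical data and variable names):

    library(survival)
    library(MASS)

    # events_long (hypothetical): counting-process format, one row per at-risk
    # interval (tstart, tstop] per patient, status = 1 if an event ends the interval.
    ag_fit <- coxph(Surv(tstart, tstop, status) ~ treatment + age + cluster(id),
                    data = events_long)

    # events_wide (hypothetical): one row per patient with total event count and
    # follow-up time; the Negative Binomial model ignores event timing.
    nb_fit <- glm.nb(n_events ~ treatment + age + offset(log(followup_years)),
                     data = events_wide)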
Methods
A systematic review on methodology for analysing recurrent event data in prognostic models was conducted. Results from this review identified methods commonly used in practice to analyse recurrent event data. A simulation study was then conducted which evaluated the most frequently identified methods in the systematic review with respect to the underlying event rate. The event rates were categorised into low, medium and high based on data collected in the systematic review to best represent a variety of chronic conditions or illnesses where recurrent events are typically seen.
Results
The simulation study provided evidence on whether model choice may be influenced by the underlying event rate in the data. This was assessed by deriving statistics suitable for recurrent event methods to evaluate model fit and predictive performance, and using them to determine whether certain methods tended to perform better than others under different scenarios.
Conclusion
Results from the systematic review and simulation study will be presented including a summary of each method identified. The results will be the first step towards a toolkit for future analysis of recurrent event data.
posters-monday: 31
Unlocking diagnosis codes for longitudinal modeling through representations from large language models
Fabian Kabus1, Maren Hackenberg1, Moritz Hess1, Simon Ging3, Maryam Farhadizadeh2, Nadine Binder2, Harald Binder1
1Institute of Medical Biometry and Statistics (IMBI), Medical Center, University of Freiburg; 2Institute of General Practice/Family Medicine, Medical Center, University of Freiburg; 3Department of Computer Science, Faculty of Engineering, University of Freiburg
Background: Longitudinal data, in particular clinical routine data, often contain a multitude of diagnosis codes, such as ICD-10 codes. Incorporating a large number of codes is challenging: treating them as categorical variables in statistical models leads to a large number of parameters, and one-hot encoding, often used in machine learning, provides no solution to this. In addition, the actual meaning of the diagnoses is not captured. Here, large language models might provide a solution, as they capture meaning and can provide alternative numerical representations via their embeddings. We consider such an approach specifically in the context of longitudinal modeling with transformer neural networks.
Methods: We generate embeddings using pre-trained language models and refine them during training in the longitudinal prediction task. Specifically, we compare two embedding strategies, sentence embeddings from SBERT and attention-weighted pooled hidden states from LLaMa, with one-hot encoding as a baseline. Additionally, we investigate different text generation strategies, using either standard ICD-10 descriptions or expanded descriptions generated via prompt-engineered large language models. To evaluate the structure of the learned embeddings, we apply TriMap for dimensionality reduction, assessing whether language-based embeddings capture more coherent relationships between ICD-10 codes.
Results: On a clinical routine dataset, models initialized with language-based embeddings derived from sentence-level representations outperform the one-hot encoding baseline in prediction performance, while embeddings extracted from the larger autoregressive model do not show a consistent improvement. Visualization using TriMap suggests that sentence-level embeddings lead to more coherent clustering of ICD-10 codes, capturing their semantic relationships more effectively. Attention analysis indicates that the transformer utilizes these structured embeddings to enhance prediction performance. Additionally, results suggest that incorporating domain-specific prompt engineering further refines embedding quality, leading to more distinct and clinically informative code representations.
Conclusion: Integrating textual descriptions into ICD-10 embeddings enhances prediction modeling by providing a structured initialization that incorporates domain knowledge upfront. As large language models continue to evolve, this approach allows advancements in language understanding to be leveraged for longitudinal medical modeling.
posters-monday: 32
Machine Learning Perspectives in Survival Prediction Model Selection: Frequentist vs. Bayesian Approach
Emanuele Koumantakis1, Valentina Bonuomo1, Selene Grano2, Fausto Castagnetti3, Carlo Gambacorti-Passerini4, Massimo Breccia5, Maria Cristina Miggiano6, Chiara Elena7, Matteo Pacilli8, Isabella Capodanno9, Tamara Intermesoli10, Monica Bocchia11, Alessandra Iurlo12, Fabio Ciceri13, Fabrizio Pane14, Federica Sorà15, Barbara Scappini16, Angelo Michele Carella17, Elisabetta Abruzzese18, Sara Galimberti19, Sabrina Leonetti Crescenzi20, Marco de Gobbi1, Giuseppe Saglio1, Daniela Cilloni1, Carmen Fava1, Paola Berchialla1
1Department of Clinical and Biological Sciences, University of Torino, Torino, Italy; 2Department of Molecular Biotechnologies and Health Sciences, University of Torino, Torino, Italy; 3Department of Medical and Surgical Sciences, Institute of Hematology "Seragnoli", University of Bologna, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy; 4Department of Medicine and Surgery, University Milano-Bicocca, Monza, Italy; 5Department of Translational and Precision Medicine, Az. Policlinico Umberto I-Sapienza University, Rome, Italy; 6Hematology Department, San Bortolo Hospital, Vicenza U.O.C. di Ematologia, Vicenza, Italy; 7U.O.C. Ematologia 1, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy; 8U.O.C. Ematologia, Grande Ospedale Metropolitano Bianchi-Melacrino-Morelli, Reggio Calabria, Italy; 9Hematology, AUSL Reggio Emilia, Reggio Emilia, Italy; 10Hematology and Bone Marrow Transplant Unit, Azienda Socio-Sanitaria Regionale Papa Giovanni XXIII, Bergamo, Italy; 11Hematology Unit, Azienda Ospedaliera Universitaria Senese, University of Siena, Siena, Italy; 12Hematology Division, Foundation IRCCS Ca' Granda Ospedale Maggiore Policlinico, Milan, Italy; 13Hematology and Bone Marrow Transplantation Unit, IRCCS San Raffaele Hospital, Milan, Italy; 14Hematology and Hematopoietic Stem Cell Transplant Center, Department of Medicine and Surgery, University of Naples Federico II, Naples, Italy; 15Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy; 16Hematology Unit, Azienda Ospedaliero-Universitaria Careggi, Florence, Italy; 17Hematology and Bone Marrow Transplant Unit, IRCCS Fondazione Casa Sollievo della Sofferenza San Giovanni Rotondo, Foggia, Italy; 18Department of Hematology S. Eugenio Hospital, Rome, Italy; 19Department of Clinical and Experimental Medicine, University of Pisa, Pisa, Italy; 20Division of Hematology, Azienda Ospedaliera San Giovanni Addolorata, Rome, Italy
INTRODUCTION
Predictive model selection remains one of the most challenging and critical tasks in medical statistics, particularly in survival analysis or high–dimensional prediction settings. The Cox proportional hazards model is widely used for its simplicity and interpretability but struggles with high-dimensional data, multicollinearity, and overfitting [1]. Stepwise selection methods, while intuitive, suffer from instability, inflated type I error rates, and a tendency to produce overly optimistic models due to their reliance on multiple hypothesis testing. Alternatives like adaptive Lasso and Bayesian Model Averaging (BMA) incorporate regularization and probabilistic frameworks to improve model performance [2,3]. This study focuses on identifying predictive factors for treatment restart in patients who discontinued tyrosine kinase inhibitor (TKI) therapy, using data from the Italy-TFR longitudinal study.
METHODS
The Italy-TFR study is a multicenter observational study evaluating the feasibility of treatment-free remission (TFR) in chronic myeloid leukemia (CML). We included patients who achieved deep molecular response, discontinued TKI, and had at least one year of follow-up. Survival analysis considered time from TKI discontinuation to treatment restart or last follow-up. Since different model selection strategies can yield different results, we compared a Cox proportional hazards model including the full set of predictors (the baseline model), bidirectional stepwise model selection, Multimodel Inference (MMI), adaptive Lasso, and BMA as predictive models of the risk of restarting treatment.
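A minimal sketch of one of the compared strategies, an adaptive Lasso Cox model implemented via penalty factors in glmnet (hypothetical objects; initial weights here come from a ridge fit, one of several possible choices):

    library(survival)
    library(glmnet)

    x <- model.matrix(~ . - 1, data = predictors)      # candidate predictors
    y <- Surv(time_to_restart, restarted)

    # Step 1: ridge fit to obtain initial coefficient estimates
    ridge <- cv.glmnet(x, y, family = "cox", alpha = 0)
    beta0 <- as.numeric(coef(ridge, s = "lambda.min"))

    # Step 2: adaptive lasso via coefficient-specific penalty factors
    alasso <- cv.glmnet(x, y, family = "cox", alpha = 1,
                        penalty.factor = 1 / pmax(abs(beta0), 1e-6))
    coef(alasso, s = "lambda.min")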
RESULTS
Among 542 patients from 38 centers, the predictive value of nine independent variables was analyzed. MMI, adaptive Lasso, and BMA identified TKI treatment duration as the most significant predictor of treatment resumption. Stepwise regression, in contrast, selected three variables: duration of therapy, generation of last TKI discontinued, and Sokal Score. The Bayesian Information Criterion (BIC) was lower for MMI, adaptive Lasso, and BMA (2131.242) compared to stepwise regression (2137.84), suggesting better model performance.
CONCLUSIONS
MMI, Adaptive Lasso, and BMA outperformed stepwise regression based on BIC, identifying TKI treatment duration as the most significant predictor. These findings show the advantages of regularization and probabilistic frameworks in improving model stability and interpretability, highlighting their great potential for predictive modeling.
REFERENCES
1. Babyak, M. A. (2004). What you see may not be what you get: A brief, nontechnical introduction to overfitting in regression-type models. Psychosomatic Medicine, 66(3), 411-421.
2. Zou, H. (2006). The adaptive Lasso and its oracle properties. Journal of the American Statistical Association, 101(476), 1418-1429.
3. Hoeting, J. A., et al. (1999). Bayesian model averaging: A tutorial. Statistical Science, 14(4), 382-417.
posters-monday: 33
Non-parametric methods for comparing survival functions with censored data: Exhaustive simulation of all possible beyond-observed censoring scenarios and computational analysis
Lubomír Štěpánek1,2, Ondřej Vít2, Lubomír Seif2
1First Faculty of Medicine of Charles University (Czech Republic); 2Faculty of Informatics and Statistics of Prague University of Economics and Business (Czech Republic)
Background / Introduction: Comparing survival functions, which describe the probability of not experiencing an event by a given time in two groups, is one of the fundamental tasks in survival analysis. Standard methods, such as the log-rank test, the Wilcoxon test, and the score-rank test of Cox's proportional hazards model and its variants, may rely on statistical assumptions, including sufficient sample size for asymptotic validity or even proportional hazards. However, these assumptions may not always hold, limiting their applicability. This study introduces a non-parametric alternative for comparing survival functions that minimizes assumptions and offers direct computation of the p-value.
Methods: Unlike traditional approaches requiring hazard function estimation, our method models all possible scenarios based on the observed data, encompassing cases where survival functions differ at least as much as observed. This exhaustive scenario-based modeling enables direct p-value calculation without reliance on asymptotic approximations. Given that censoring introduces additional uncertainty, we address its impact by considering a comprehensive (and often large) set of all potential survival function differences. Due to the computational intensity of enumerating all scenarios (arising from observed censoring), we compare a fully exhaustive computational approach with a Monte Carlo simulation-based method. The performance of these approaches is evaluated against the log-rank test, particularly in terms of Type I error rate and computational efficiency. Additionally, we analyze the asymptotic time complexity of both proposed approaches.
Results: Based on simulation outputs, our method reduces the Type I error rate compared to the log-rank test, making it particularly useful in settings requiring robustness against false positives. The exhaustive approach ensures an exact p-value calculation but is computationally demanding. The Monte Carlo-based approximation significantly improves computational efficiency while maintaining acceptable accuracy, making it a viable alternative for large datasets. Our complexity analysis highlights the trade-offs between computational cost and statistical precision.
Conclusion: The proposed non-parametric method provides an alternative to traditional survival function comparison techniques. A novel aspect of our approach is the enumeration of all possible scenarios for censored observations when estimating the counts of survival functions that are at least as different as observed. By directly evaluating all plausible scenarios, it reduces reliance on assumptions while improving Type I error rate control. The Monte Carlo approximation offers a computationally feasible alternative, retaining statistical robustness in practical applications. These findings support the use of assumption-minimized approaches in survival analysis, particularly in studies where conventional methods may be restrictive.
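As a simplified, hedged illustration of a Monte Carlo approach to comparing two survival curves, the sketch below builds a null distribution by permuting group labels (a generic permutation scheme, not the authors' enumeration of beyond-observed censoring scenarios; hypothetical data and variable names):

    library(survival)

    # dat (hypothetical): time, status (1 = event), group (two arms)
    obs_stat <- survdiff(Surv(time, status) ~ group, data = dat)$chisq

    set.seed(1)
    null_stat <- replicate(2000, {
      d <- dat
      d$group <- sample(d$group)               # permute group labels under the null
      survdiff(Surv(time, status) ~ group, data = d)$chisq
    })
    mean(null_stat >= obs_stat)                # Monte Carlo p-value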
posters-monday: 34
Tree-based methods for length-biased survival data
Jinwoo Lee1, Jiyu Sun1, Donghwan Lee2
1Integrated Biostatistics Branch, National Cancer Center, Republic of Korea; 2Department of Statistics, Ewha Womans University, Republic of Korea
Background: Left truncation in prevalent cohort studies, where only individuals who have experienced an initiating event (such as disease onset) and survived until study enrollment are observed, leads to length-biased data when the onset follows a stationary Poisson process. Although the existing survival trees and survival forests for left-truncated right-censored (LTRC) data can be applied to estimate survival functions, they may be inefficient for analyzing length-biased right-censored (LBRC) data.
Methods: We proposed tree-based methods for LBRC data by adapting the conditional inference tree (CIT) and forest (CIF) frameworks. Unlike LTRC-based approaches, which use log-rank scores from a conditional likelihood, our methods employed log-rank scores derived from the full likelihood, which is valid under LBRC settings. To improve numerical stability and computational efficiency, we adopted a closed-form cumulative hazard function (CHF) estimator for log-rank scores as an alternative to the nonparametric maximum likelihood estimator.
Results: Simulation studies indicated that LBRC-CIT achieves a higher recovery rate of the true tree structure in LBRC data than conventional LTRC-CIT, with particularly notable benefits in small-sample settings. Under proportional hazards and complex nonlinear LBRC scenarios, LBRC-CIF offers more accurate predictions than LTRC-CIF. We illustrated the application of our methods to the estimation of survivorship using a dataset of lung cancer patients with COPD.
Conclusions: By using full-likelihood-based log-rank scores and a closed-form CHF estimator, our proposed LBRC-CIT and LBRC-CIF methods enhance both statistical efficiency and computational stability for length-biased right-censored data.
posters-monday: 35
Evaluating different pragmatic approaches for selecting the truncation time of the restricted mean survival time in randomized controlled trials
Léa Orsini1,2, Andres Cardona2, Emmanuel Lesaffre3, David Dejardin2, Gwénaël Le Teuff1
1Oncostat U1018, Inserm, University Paris-Saclay, Villejuif, France; 2Product Development, Data Sciences, F. Hoffmann-La Roche AG, Basel, Switzerland; 3I-Biostat, KU-Leuven, Leuven, Belgium
Introduction:
The difference in restricted mean survival time between two arms (dRMST) is a meaningful measure of treatment effect in randomized controlled trials (RCTs) for time-to-event data, especially with non-proportional hazards. Choosing the time window [0,τ] is important to avoid misinterpretation. Correct RMST estimation can be performed up to τ defined as the last follow-up time, under a mild condition on the censoring distribution [1]. However, extensive comparisons between the different ways of selecting τ are still needed to address this important choice in practical settings. The objective is to empirically evaluate these approaches using data from RCTs.
Methods:
Four techniques for choosing τ are evaluated: (a) 90th or 95th percentile of event times, (b) 90th or 95th percentile of follow-up times, (c) largest time with standard error of survival estimate within 5%, 7.5%, or 10%, and (d) minimum of the maximum follow-up times in each arm. τ-RMST estimations were performed using three frequentist methods (Kaplan-Meier estimator, pseudo-observations-based model, and Cox-based model) and two Bayesian methods (non-parametric model with a mixture of Dirichlet processes prior and pseudo-observations-based model), some of them allowing for covariate adjustments. For evaluation, we used three RCTs (IPSOS n=453, IMpower110 n=554, IMpower133 n=403) comparing immunotherapy with chemotherapy in lung cancer, with delayed treatment effects.
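For concreteness, a minimal R sketch of two of the candidate truncation times and an unadjusted dRMST estimate (hypothetical data and variable names):

    library(survRM2)

    # dat (hypothetical): time, status (1 = event), arm (1 = experimental, 0 = control)
    tau_d <- min(tapply(dat$time, dat$arm, max))            # (d) min of per-arm maximum follow-up
    tau_a <- quantile(dat$time[dat$status == 1], 0.90)      # (a) 90th percentile of event times

    # Unadjusted dRMST (Kaplan-Meier based) up to the chosen truncation time
    rmst2(time = dat$time, status = dat$status, arm = dat$arm, tau = tau_d)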
Results:
The range of τ calculated from the different techniques exceeded two years for IPSOS and IMpower110, and one year for IMpower133, impacting the Kaplan-Meier-based RMST estimation and its variance. With a delayed treatment effect, a higher τ provides higher dRMST estimates with larger variances. Approaches (a) and (b) provide smaller τ, often leading to immature conclusions, while (d) results in increased variability that can be mitigated in some cases by adjusting for appropriate covariates. Approach (c) emerged as a good candidate, balancing statistical precision with clinical relevance. All RMST estimators (frequentist and Bayesian) provided similar results.
Conclusion:
There is so far no consensus on defining τ, highlighting the need for clearer guidelines and greater transparency. Ideally, τ should be defined a priori with a clinical rationale. If not, data-driven approaches can be employed. Based on our findings, we recommend the (c) proposal as it ensures sufficient representation of patients at risk. Establishing standardized, clinically relevant practices for defining τ will enhance the applicability and reproducibility of RMST analyses in future research.
[1] Lu Tian et al. On the Empirical Choice of the Time Window for Restricted Mean Survival Time (2020), Biometrics, 76(4): 1157–1166.
posters-monday: 36
Identifying risk factors for hospital readmission in home-based care: a study from a monographic paediatric cancer centre
Sara Perez-Jaume1,2, Maria Antònia Colomar-Riutort2, Anna Felip-Badia1, Maria Fabregat1, Laura Andrés-Zallo1
1BiMaU, Sant Joan de Déu Pediatric Cancer Center Barcelona, Spain; 2Department of Basic Clinical Practice, Universitat de Barcelona, Spain
Introduction
Paediatric cancer is a group of rare malignancies that occur in childhood and adolescence. This potentially life-threatening disease often requires aggressive therapies, such as chemotherapy or immunotherapy. The nature of these interventions requires patients to be hospitalised multiple times. In this context, a monographic paediatric cancer centre in the south of Europe initiated a home-based hospitalisation programme for paediatric patients diagnosed with cancer, which potentially offers relevant benefits (enhanced quality of life and reduced economic costs). However, a concern with home-based hospitalisations is the occurrence of adverse events, such as the need for hospital readmission during the hospitalisation at home, which is considered an unfavourable outcome in home-based care. Data from this home hospitalisation programme are available from its foundation in November 2021 until June 2024. The aim of this work is to use these data to identify risk factors for hospital readmission during the home-based hospitalisation.
Methods
The dataset used in this project poses a statistical challenge since patients may be hospitalised at home more than once. Appropriate methods for repeated measures are then required for a proper analysis. Since the outcome of interest is the binary variable "need for hospital readmission during the home hospitalisation", we used Generalized Estimating Equations (GEE) and Generalized Linear Mixed Models (GLMM) with a logit link function (marginal/subject-specific approaches). From these models, we derive the corresponding odds ratios. We applied a variable selection algorithm to identify risk factors for hospital readmission.
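A minimal sketch of the two modelling approaches with a logit link (hypothetical data and variable names):

    library(geepack)
    library(lme4)

    # episodes (hypothetical): one row per home-based hospitalisation, with the
    # binary outcome readmit, the patient identifier, and candidate risk factors.
    gee_fit <- geeglm(readmit ~ reason + neutrophils + iv_incident,
                      id = patient_id, family = binomial("logit"),
                      corstr = "exchangeable", data = episodes)

    glmm_fit <- glmer(readmit ~ reason + neutrophils + iv_incident + (1 | patient_id),
                      family = binomial("logit"), data = episodes)

    exp(coef(gee_fit))      # population-averaged (marginal) odds ratios
    exp(fixef(glmm_fit))    # subject-specific odds ratios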
Results
Data consist of 380 home-based hospitalisations from 156 paediatric patients previously diagnosed with cancer and included in the home hospitalisation programme. Most patients were male (59%) and the median distance from the hospital to the place of home-based hospitalisation was 8 km. Both GEE and GLMM approaches led to a final model with four variables, three of them significantly associated with the outcome. Among the reasons for home-based hospitalisation, hydration-intended hospitalisations reduced the odds of hospital readmission compared with the other reasons considered. Moreover, lower neutrophil counts increased the odds of hospital readmission. The occurrence of incidences with the intravenous route also increased the odds of hospital readmission.
Conclusion
We identified reason of hospitalisation, neutrophil count and the occurrence of incidences with the intravenous route as risk factors for hospital readmission in the context of home-based care in paediatric oncology, which might influence physicians' decisions about the management of these patients at home.
posters-monday: 37
A web application for predicting Serious Adverse Events to guide the enrollment procedure in Clinical Trials with Machine Learning Methods
Ajsi Kanapari, Corrado Lanera, Dario Gregori
Unit of Biostatistics, Epidemiology and Public Health, University of Padova, Padova, Italy
Background. Serious adverse events (SAEs) are undesired occurrences arising from drug reactions, with direct consequences for patients' lives and the potential to compromise study validity and safety. There is room for improvement here, guided by machine learning (ML) used to identify subgroups of patients with meaningful combinations of clinical features linked to SAEs, in order to limit their frequency, using probabilistic methods that rely on clinical features rather than on specific dichotomized variables. However, such models are often explored through post-hoc analyses and do not directly inform the design of clinical trials, because their application in a dynamic context is complex and requires the support of electronic applications.
Objective. The aim of this work is the development of a framework and web application, in accordance with FDA guidance on enrichment strategies [1] for reducing trial variability, that implements ML models for early detection. The framework employs ML models to identify patients at high risk of SAEs, informing inclusion/exclusion decisions. Historical data from early-phase trials are used to train predictive models that estimate SAE probabilities for new participants, with inclusion decisions guided by a predefined decision rule.
Results. Simulations and a case-study application assess the operating characteristics of the proposed framework, with the aim of balancing the reduction in SAE incidence, algorithm accuracy, and the generalizability of the study. Owing to the reduced variability that follows patient exclusion and, most importantly, the reduction in drop-outs, power is not only maintained but increased when the model performs well; however, issues arise particularly when specificity is low, causing the unnecessary exclusion of subjects at low risk of SAEs. On the positive side, the algorithm provides reduced standard errors and more precise estimates of the treatment effect.
posters-monday: 38
Assessing the Overall Burden of Adverse Events in Clinical Trials: Approaches and Challenges
Seid Hamzic, Hans-Joachim Helms, Eva Rossman, Robert Walls
F. Hoffmann-La Roche Ltd, Basel, Switzerland
Measuring the total toxicity or adverse event (AE) burden of a therapeutic intervention is a longstanding challenge in clinical research. While trial reports commonly provide the incidence of individual AEs or a summary of the proportion of patients experiencing serious, e.g. grade ≥3 AEs, these metrics do not necessarily capture the global burden that they may impose on patients. Various approaches to consolidate AEs into a single composite score, such as summing CTCAE grades, have been proposed. However, these efforts face substantial methodological and interpretational hurdles.
This work offers a theoretical exploration of how AE burden could be conceptualized, quantified, and used when comparing two or more therapies. We review the limitations of incidence-based reporting that fails to capture interdependence or cumulative effects of multiple, possibly lower-grade AEs. We then discuss the existing proposals for composite toxicity scoring, noting the difficulties in weighting different AEs, some of which might be more tolerable to patients despite a higher grade. Additionally, current standard data collection approaches might lack the granularity necessary to distinguish differences in patient experience or quality of life.
We argue that while composite scores can offer a more holistic view of the total harm posed by a drug, they risk oversimplification and obscuring the clinical relevance of specific, important toxicities. Ultimately, this highlights a need for more robust data collection and careful methodological development that balances interpretability and accuracy in the comparison of AE burden across treatments.
posters-monday: 39
Matching-adjusted indirect comparison of endoscopic and craniofacial resection for the treatment of sinonasal cancer invading the skull base
Florian Chatelet1,2,3, Sylvie Chevret1,2, MUSES collaborative group3,4,5, Philippe Herman1,3, Benjamin Verillaud1,3
1Université Paris Cité, France; 2SBIM Hôpital Saint Louis APHP Paris, ECSTRA team; 3ENT department Hôpital Lariboisière APHP Paris; 4“ASST Spedali Civili di Brescia,” University of Brescia, Brescia, Italy;; 5“Ospedale di Circolo e Fondazione Macchi,” University of Insubria, Varese, Italy
Background
In surgical oncology, new techniques often replace established methods without direct comparative studies, making it difficult to assess their actual effectiveness. This is particularly relevant for endoscopic endonasal approaches (EEA), which have progressively supplanted craniofacial resection (CFR) for sinonasal cancers invading the skull base. As a result, contemporary CFR-treated cohorts have become too small for direct comparisons, and randomised trials remain unfeasible due to ethical and logistical constraints. Matching-adjusted indirect comparison (MAIC) offers a statistical method to indirectly compare a contemporary individual-patient dataset (EEA) with a historical aggregate dataset (CFR), adjusting for confounding variables.
Methods
We conducted a MAIC using individual patient data (IPD) from the MUSES cohort (EEA-treated patients) and aggregated data from the historical CFR cohort of Ganly et al., including patients with skull base invasion. Key prognostic variables—including age, tumour histology, orbital and brain invasion, and prior radiotherapy or surgery—were used to weight the MUSES cohort to match the CFR cohort.
Primary and secondary endpoints included overall survival (OS), recurrence-free survival (RFS), perioperative mortality, surgical margins, and complication rates. Survival analyses were conducted using Kaplan-Meier estimations, log-rank tests, and Cox proportional hazards models, with bootstrap resampling for confidence interval estimation.
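A hedged base-R sketch of method-of-moments MAIC weighting, matching IPD covariate means to published aggregate means (hypothetical variable names and placeholder aggregate values):

    # ipd (hypothetical): individual patient data from the EEA cohort
    X <- scale(as.matrix(ipd[, c("age", "orbital_invasion", "brain_invasion", "prior_rt")]),
               center = c(58, 0.40, 0.25, 0.30),   # aggregate CFR means (placeholders)
               scale = FALSE)

    # Method of moments: choose alpha so that weighted IPD means equal the aggregate means
    objective <- function(alpha) sum(exp(X %*% alpha))
    alpha_hat <- optim(rep(0, ncol(X)), objective, method = "BFGS")$par
    w <- as.vector(exp(X %*% alpha_hat))

    # Balance check: weighted means on the centred scale should be close to zero
    colSums(X * w) / sum(w)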
Results
A total of 724 EEA-treated and 334 CFR-treated patients were analysed. Before MAIC, EEA was associated with significantly improved OS (HR= 2.33, 95% CI= 1.88–2.87, p< 0.001), and this benefit persisted after adjustment (HR= 1.93, 95%CI= 1.60–2.34, p< 0.001). RFS was initially higher in the EEA cohort (HR= 1.39, 95%CI= 1.14–1.69, p= 0.001) but was no longer statistically significant after adjustment (HR= 1.06, 95%CI= 0.91–1.23, p= 0.63). Perioperative mortality and complications were significantly lower in the EEA cohort compared to CFR. Clear resection margins were achieved in 79% of EEA cases and 71% of CFR cases (OR= 0.67, 95%CI= 0.50–0.90, p= 0.008), but this difference was no longer significant after MAIC adjustment (OR= 1.15, 95%CI= 0.93–1.40, p= 0.36).
Conclusion
This study highlights the potential utility and limitations of MAIC in addressing selection biases in non-randomised comparisons. OS remained superior in the EEA group after adjustment, while RFS was similar between EEA and CFR. Perioperative mortality and complications were significantly higher with CFR, although both techniques achieved similar resection margin rates after adjustment. These findings support endoscopic surgery as a first-line approach for sinonasal cancers invading the skull base, provided it is technically feasible and performed in expert centres.
posters-monday: 40
Information borrowing in phase II randomized dose-ranging clinical trials in oncology
Guillaume Mulier1,2, Vincent Lévy3, Lucie Biard1,2
1Inserm U1342, team ECSTRRA. Saint-Louis Research Institute, Paris, France; 2APHP, Department of Biostatistics and Medical Information, Saint-Louis hospital, Paris, France; 3APHP, Clinical research department, Avicenne hospital, Paris, France
Introduction
Over the past decades, the emergence of therapeutics such as immunotherapies and targeted therapies has challenged conventional trial designs, particularly single-arm studies. Selecting a single dose from phase I trials with limited follow-up, typically based solely on toxicity endpoints, has often resulted in suboptimal drug dosages. As a result, dose optimization in oncology is now encouraged by international initiatives such as the FDA’s Project Optimus, the Optimal Cancer Care Alliance, and the Patient-Centered Dosing Initiative. This study was motivated by the case of Ibrutinib in chronic lymphocytic leukemia, where the initially approved dose of 420 mg/day—determined through conventional phase I designs based on the maximum tolerated dose—was later found to achieve comparable response rates at lower doses. This highlights the potential value of dose-ranging phase II studies in oncology.
Assuming that borrowing information across doses can enhance statistical power, our objective is to compare various strategies for information borrowing in phase II randomized trials involving multiple doses of the same drug.
Methods
The backbone phase II design considered is the Bayesian Optimal Design (BOP2), adapted for multi-arm settings with co-primary binary endpoints and interim analyses. This design employs a multinomial conjugate distribution within a Bayesian framework, with decision rules for stopping due to futility and/or toxicity based on posterior probabilities.
We adapted and compared different information borrowing approaches for estimating efficacy and toxicity: (i) power prior, (ii) incorporation of information from stopped arms, (iii) Bayesian hierarchical modeling, (iv) Bayesian logistic regression.
These methods were applied alongside BOP2 decision rules. A simulation study was conducted to assess the operating characteristics of each approach in a hypothetical randomized dose-ranging trial, evaluating efficacy and toxicity against reference values.
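For intuition, a hedged sketch of a BOP2-type interim decision for a single arm, using simple Beta-Binomial posteriors (illustrative priors, data and cut-offs; the actual design uses a Dirichlet-multinomial model with calibrated thresholds and the borrowing methods listed above):

    n <- 15; n_resp <- 4; n_tox <- 6       # interim data for one dose arm (illustrative)
    p_resp_null <- 0.20                    # uninteresting response rate
    p_tox_max   <- 0.35                    # unacceptable toxicity rate

    # Posterior probabilities under Beta(0.5, 0.5) priors
    pr_futile <- pbeta(p_resp_null, 0.5 + n_resp, 0.5 + n - n_resp)   # P(p_resp <= null | data)
    pr_toxic  <- 1 - pbeta(p_tox_max, 0.5 + n_tox, 0.5 + n - n_tox)   # P(p_tox > max | data)

    # Stop the arm if either posterior probability exceeds its calibrated threshold
    stop_arm <- (pr_futile > 0.95) | (pr_toxic > 0.90)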
Results
Our findings indicate that power prior, when applied without dynamic adaptation, is unsuitable as it increases false positive rates. Bayesian hierarchical modeling shrinks estimates toward a common mean, reducing variance but also inflating false positive rates. In contrast, Bayesian logistic regression provides a balanced trade-off, enhancing power to some extent while maintaining a lower false positive rate.
Conclusion
Bayesian logistic regression, modeling both dose-toxicity and dose-efficacy relationships, combined with BOP2 decision rules, offers a promising approach for borrowing information in dose-ranging studies with a limited number of doses. However, designs without information borrowing provide stricter false positive control and should also be considered.
posters-monday: 41
Information borrowing in Bayesian clinical trials: choice of tuning parameters for the robust mixture prior
Vivienn Weru1, Annette Kopp-Schneider1, Manuel Wiesenfarth3, Sebastian Weber2, Silvia Calderazzo1
1German Cancer Research Center (DKFZ), Germany; 2Novartis Pharma AG, 4002 Basel, Switzerland; 3Cogitars GmbH, Heidelberg, Germany
Introduction
Borrowing external data for use in a current study has emerged as an attractive research area, with the potential to make current studies more efficient, especially where recruitment of patients is difficult.
Methods
Bayesian methods provide a natural approach to incorporating external data via the specification of informative prior distributions. Potential heterogeneity between external and current trial data, however, poses a significant challenge in this context. We focus on the robust mixture prior, a convex combination of an informative prior with a robustifying component, which allows borrowing most when the current and external data are observed to be similar and least otherwise. This prior requires the choice of three additional quantities: the mixture weight, and the mean and dispersion of the robust component. Some choices of these quantities may, however, lead to undesirable operating characteristics. We systematically investigate this impact across combinations of robust component parameters and weight choices in one-arm and hybrid-control trials, where in the latter, current control data are informed by external control data. An alternative functional form for the robust component is also investigated.
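A base-R sketch of the dynamic-borrowing mechanism behind a two-component robust mixture prior for a binomial response rate (all numbers illustrative; packages such as RBesT provide full implementations):

    w  <- 0.8                         # prior weight on the informative component
    a1 <- 12; b1 <- 28                # informative component, e.g. derived from external data
    a2 <- 1;  b2 <- 1                 # robust (vague) component

    r <- 10; n <- 25                  # current trial data

    # Marginal (beta-binomial) likelihood of the data under each component
    m1 <- choose(n, r) * beta(a1 + r, b1 + n - r) / beta(a1, b1)
    m2 <- choose(n, r) * beta(a2 + r, b2 + n - r) / beta(a2, b2)

    # Posterior mixture weight on the informative component: it shrinks towards the
    # robust component when the current data conflict with the external data.
    w_post <- w * m1 / (w * m1 + (1 - w) * m2)
    # Posterior components: Beta(a1 + r, b1 + n - r) and Beta(a2 + r, b2 + n - r)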
Results
For some parameter choices, losses may still be unbounded despite the use of dynamic borrowing, for both testing and estimation, i.e. the Type I error (TIE) rate may approach 1 while the MSE may increase without bound. In the hybrid-control setting, the parameter choices further impact the size and shift of the “sweet spot” where control of the TIE rate and a gain in power are observed. We observe that the width of such a sweet spot correlates negatively with the maximum power gain. We further explore the behavior of the mixture prior when adopting a heavy-tailed distribution for the robust component, which is able to cap TIE rate and MSE inflation.
Conclusion
The choice of the parameters of the robust component of the mixture prior as well as the mixture weights is non-trivial. All three parameter choices are influential, acting together and therefore their impact needs to be assessed jointly. We provide recommendations for these choices as well as considerations to keep in mind when evaluating operating characteristics.
posters-monday: 42
A Bayesian approach to decision making in early development clinical trials: an R solution
Audrey Te-ying Yeo
Independent
Early clinical trials play a critical role in oncology drug development. The main purpose of early trials is to determine whether a novel treatment demonstrates sufficient safety and efficacy signals to warrant further investment (Lee & Liu, 2008). The new open-source R package phase1b (Yeo et al., 2024) is a flexible toolkit that calculates many properties to this end, especially in the oncology therapeutic area. The primary focus of this package is on binary endpoints. The benefit of a Bayesian approach is the possibility of accounting for prior data (Thall & Simon, 1994), for instance when a new drug has shown some signals of efficacy owing to its proposed mode of action, or similar activity based on prior data. The concept of the phase1b package is to evaluate the posterior probability that the response rate with a novel drug is better than with the current standard-of-care treatment in early-phase trials such as phase I. The phase1b package provides a facility for early development study teams to decide on further development of a drug, either by designing for phase 2 or 3 or by expanding current cohorts. The prior distribution can incorporate any previous data via mixtures of beta distributions. Furthermore, based on an assumed true response rate if the novel drug were administered in the wider population, the package calculates the frequentist probability that a current clinical trial would be stopped for efficacy or futility conditional on true values of the response, otherwise known as operating characteristics. The intended user is the early clinical trial statistician at the design and interim stages of their study, and the package offers a flexible approach to setting priors and weighting.
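A base-R sketch of the central quantity described above, the posterior probability that the novel drug's response rate exceeds the standard of care (illustrative prior and data; this does not use the phase1b API):

    a0 <- 2; b0 <- 8        # Beta prior for the novel drug, e.g. informed by earlier data
    r  <- 10; n <- 25       # observed responses in the current cohort
    p_soc <- 0.25           # assumed standard-of-care response rate

    # P(p_new > p_soc | data) under the Beta posterior
    1 - pbeta(p_soc, a0 + r, b0 + n - r)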
posters-monday: 43
Designing Clinical Trials in R with rpact and crmPack
Daniel Sabanés Bové1, Gernot Wassmer2, Friedrich Pahlke2
1RCONIS, Taiwan; 2rpact GbR, Germany
The focus of this poster will be on clinical trial designs and their implementation in R. We will present rpact, which is a fully validated, open source, free-of-charge R package for the design and analysis of fixed sample size, group-sequential, and adaptive trials. We will summarize and showcase the functionality of rpact.
We will also briefly present crmPack, an open source, free-of-charge R package for the design and analysis of dose escalation trials.
Together, rpact and crmPack enable the implementation of a very wide range of clinical trials. The poster presentation aims to increase the visibility of the two open source packages in the clinical biostatistics community, and allow for discussions about future developments.
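A minimal sketch of the kind of design rpact supports is shown below, based on our reading of the package documentation (argument names and values are illustrative and should be checked against the current rpact release): a three-stage group-sequential design with an O'Brien-Fleming-type alpha-spending function, followed by a sample size calculation for comparing two response rates.

```r
library(rpact)

# Three-stage one-sided group-sequential design, O'Brien-Fleming-type spending
design <- getDesignGroupSequential(kMax = 3, alpha = 0.025, sided = 1,
                                   typeOfDesign = "asOF")

# Sample size for a two-arm comparison of response rates (illustrative rates)
sampleSize <- getSampleSizeRates(design, pi1 = 0.45, pi2 = 0.30)
summary(sampleSize)
```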
posters-monday: 44
Leveraging on historical controls in the design and analysis of phase II clinical trials
Zhaojin Chen1, Ross Andrew Soo2,3, Bee Choo Tai1,4
1Saw Swee Hock School of Public Health, National University of Singapore, Singapore; 2Department of Haematology-Oncology, National University Cancer Institute Singapore, Singapore; 3Cancer Science Institute of Singapore, National University of Singapore, Singapore; 4Yong Loo Lin School of Medicine, National University of Singapore, Singapore
Background
In oncology, phase II trials are commonly used to screen novel agents for solid tumours using a single-arm design. All patients receive a concurrent treatment (CT) and their overall objective response rate is compared with a pre-defined threshold. However, evidence suggests that such a design often results in false claims of efficacy. This not only wastes time and resources but also raises ethical concerns for trial participants. This study therefore aims to improve the current design by incorporating a historical control (HC) arm for more appropriate treatment evaluation.
Methods
For treatment evaluation using HCs, the major challenges involve imbalance in baseline characteristics, unmeasured baseline variables and temporal drift of disease outcomes. To tackle these problems, we adopted three main statistical approaches, namely regression adjustment (RA), inverse probability of treatment weighting (IPTW_PS) and matching (MC_PS) based on the propensity score, to reduce potential confounding bias when evaluating the effect of treatment. Simulation studies were conducted for null, small, moderate and large treatment effects based on a binary disease outcome, assuming sample sizes of 100 and 200 with equal treatment allocation. Bias, mean squared error (MSE), coverage probability, type I error and power were used to evaluate their performance. These methods were then applied to the PLASMA phase II trial using HCs from the previously completed AURA 3 phase III trial.
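The following sketch (hypothetical dataset and variable names, not the authors' code) illustrates the IPTW_PS idea for a binary outcome: estimate the propensity of receiving the concurrent treatment rather than being a historical control, weight patients by the inverse probability of their observed arm, and fit a weighted outcome model.

```r
# Propensity score for receiving the concurrent treatment (treat = 1)
ps_model <- glm(treat ~ age + sex + stage, family = binomial, data = dat)
ps <- fitted(ps_model)
dat$w <- ifelse(dat$treat == 1, 1 / ps, 1 / (1 - ps))

# Weighted logistic model for the binary response; robust (sandwich)
# standard errors would typically be used to account for the weighting.
out_model <- glm(response ~ treat, family = binomial, data = dat, weights = w)
summary(out_model)
```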
Results
Simulation results showed that the RA method slightly overestimates, whereas the IPTW_PS method slightly underestimates, the treatment effect as it increases from null to large. The bias of the MC_PS method can be in either direction and reduces in magnitude when more HCs are available. As the level of imbalance in baseline characteristics increases, bias and MSE increase and power decreases. All three methods are sensitive to unmeasured baseline confounders, but the RA method appears to be more sensitive to model misspecification than the propensity score based methods.
Conclusion
Consistent with existing literature, our study found that phase II trials incorporating HCs should be recommended for diseases with well-known mechanisms. Moreover, when there are a large number of HCs available, the MC_PS generally performs better than the other two methods with desirable bias, MSE, type I error and power.
posters-monday: 45
Design of a research project to evaluate the statistical utility after transformation of a CDISC database into OMOP format
Claire Castagné1, Amélie Lambert1, Jacek Chmiel2, Alberto Labarga3, Eric Boernert3, Lukasz Kaczmarek3, Francois Margraff3, David Pau1, Camille Bachot1, Thomas Stone3, Dimitar Toshev3
1Roche, France; 2Avenga, Germany; 3F. Hoffmann-La Roche AG
Interoperability between databases is an important issue for facilitating analyses from multiple sources. The OMOP (Observational Medical Outcomes Partnership) format is increasingly used in Europe, particularly in France. A targeted bibliographic review of the data sources and standard formats used found no article that precisely assesses the loss of data and/or information following transformation to the OMOP format. The aim of this work is to assess the statistical and scientific usefulness of the OMOP format.
An observational study in early breast cancer was conducted in 2019. The database is currently in CDISC SDTM format.
The first step of the project involves transforming the SDTM database into OMOP format.
In the second step, a statistical analysis of the data in OMOP format will be carried out.
In the third step, all the results will be compared with the initial results, using quality indicators to assess the loss of information:
- indicators regarding transformation to OMOP format, such as the number of observations or variables not transformed;
- indicators regarding the number of statistical tables not generated;
- indicators regarding the reliability (no loss of information, partial loss, complete loss) of results obtained by comparing SDTM vs OMOP results.
A total of 315 patients were included in the study; the database comprises 7 CDISC domains containing 73 variables, 25 continuous and 48 categorical, describing patient, disease, surgery and treatment characteristics.
Age at treatment initiation was 52.2 (11.8) years; the distribution of the SBR grade evaluating disease severity was grade III in 50.7% of patients, grade II in 45.7% and grade I in 1.6%.
40.3% of patients met the primary outcome, evaluated at surgery by the pathological complete response.
The transformation to the OMOP database will start in February 2025 and results will be available at the congress: descriptive analyses (univariate, bivariate), correlation matrices, modelling and survival analysis will first be performed on the raw study data (SDTM format), and these same analyses will then be reproduced on the OMOP datasets. The usual statistical indicators (percentage of missing data, data dispersion, etc.) and the preservation of relationships between variables will be used to quantify the differences observed between the databases in the different formats.
This work will make it possible to assess the statistical usefulness remaining after the switch to OMOP format, thanks to a synthesis of indicators, and to ensure the reproducibility of classic statistical analyses.
At the conference, the results/indicators observed on the OMOP format database will be presented and discussed in relation to the initial results.
posters-monday: 46
Introducing CAMIS: an open-source, community endeavor for Comparing Analysis Method Implementations in Software
Yannick Vandendijck1,4, Christina Fillmore2,4, Lyn Taylor3,4
1J&J Innovative Medicine, Belgium; 2GSK, UK; 3Parexel, UK; 4on behalf of the CAMIS working group
Try this in R: > round(2.5), and it will give the result of 2.
Try this in SAS: > data rounding; x = round(2.5); run; and it will give the result of 3.
Seriously?
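For readers who want to reproduce the discrepancy, the sketch below shows R's documented round-half-to-even behaviour together with a small helper (our illustration, not part of CAMIS) that mimics the SAS convention of rounding halves away from zero.

```r
round(2.5)  # 2: R rounds halves to the nearest even digit
round(3.5)  # 4

# Round half away from zero, as SAS's round() does
round_half_up <- function(x, digits = 0) {
  scale <- 10^digits
  sign(x) * floor(abs(x) * scale + 0.5) / scale
}
round_half_up(2.5)  # 3, matching the SAS result above
```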
Introduction:
Statisticians using multiple statistical software packages (SAS, R, Python) will have found differences in analysis results that warrant further exploration and justification. These discrepancies across statistical software for a similar analysis can cause unease when submitting results to a regulatory agency, as it is uncertain whether the agency will view the differences as problematic. This becomes increasingly important as the pharma industry turns more and more to open-source software like R to handle complex data analysis, drawn by its flexibility, innovation, added value and cost-effectiveness.
Knowing the reasons for differences (different methods, options, algorithms, etc.) and understanding how to mimic analysis results across software is critical to the modern statistician and subsequent regulatory submissions.
CAMIS:
This talk will introduce the PHUSE DVOST CAMIS (Comparing Analysis Method Implementations in Software) project. The aim of CAMIS is to investigate and document differences and similarities between statistical software (SAS, R, Python), helping to ease transitions to new languages by providing comparisons and comprehensive explanations. CAMIS contributes to confidence in the reliability of open-source software by showing how analysis results can be matched exactly, or by identifying the source of any discrepancies.
In this talk, I will discuss the objectives of the CAMIS project, highlight some key findings on differences and similarities between SAS and R, and show how we collaborate on CAMIS across companies, industries and universities in the open-source community.
Conclusion:
In the transition from proprietary to open-source technology in the industry, CAMIS can serve as a guidebook to navigate this process.
https://psiaims.github.io/CAMIS/
https://github.com/PSIAIMS/CAMIS
posters-monday: 47
Assessing covariates influence on cure probability in mixture cure models using martingale difference correlation
Blanca E. Monroy-Castillo, M. Amalia Jácome, Ricardo Cao
Universidade da Coruña, Spain
Background: Cure models analyze time-to-event data while accounting for a subgroup of individuals who will never experience the event. A fundamental question in these models is whether the cure probability is influenced by specific covariates. However, formal statistical tests for assessing covariate effects remain limited. Martingale difference correlation (MDC) provides a non-parametric measure of dependence, where MDC(Y|X) = 0 if and only if E(Y|X) = E(Y), meaning X has no impact on the expectation of Y. This makes MDC a promising tool for testing covariate effects on cure probability.
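For reference, one standard population-level representation of the martingale difference divergence (MDD) underlying MDC, following Shao and Zhang (2014), is, for an i.i.d. copy $(X', Y')$ of $(X, Y)$,

$$\mathrm{MDD}^2(Y \mid X) \;=\; -\,E\big[(Y - EY)(Y' - EY')\,\lVert X - X' \rVert\big],$$

with MDC obtained by suitably normalising MDD, so that $\mathrm{MDC}(Y \mid X) = 0$ if and only if $E(Y \mid X) = E(Y)$ almost surely (under moment conditions).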
Methods: We propose a non-parametric hypothesis test based on MDC to evaluate the effect of covariates on the cure probability. A key challenge is that the cure indicator (ν) is only partially observed due to censoring. To address this, we estimate the cure status before applying the test. The methodology is validated through extensive simulation studies, assessing its power and robustness under different scenarios. Additionally, we apply the proposed test to data from a randomized clinical trial on rheumatoid arthritis treatment to identify covariates influencing disease remission.
Results: Simulation studies demonstrate the effectiveness of the proposed method in detecting covariate effects on the cure probability. When applied to the clinical trial data, the test identifies specific covariates associated with an increased probability of experiencing a flare-up. These findings provide new insights into factors influencing disease progression and treatment response in rheumatoid arthritis patients.
posters-monday: 48
Aligning Estimators to Treatment Effects in the presence of Intercurrent Events in the Analyses of Safety Outcomes
Pedro Lopez-Romero1, Brenda Crowe2, Philip He3, Natalia Kan-Dobrosky4, Andreas Sashegyi2, Jonathan Siegel5
1Novartis, Spain; 2Eli Lilly, USA; 3Daiichi Sankyo Inc, USA; 4AbbVie Inc, USA; 5Bayer, USA
Introduction: The evaluation of safety is a crucial aspect of drug development. The ICH Estimand Framework (EF) defines clinically relevant treatment effects in the presence of intercurrent events (ICE) and can enhance this evaluation. However, its application in safety evaluation is uncommon. Additionally, sometimes it is not evident which specific estimand a given estimator is targeting, leading to the implementation of analytical strategies that may not align with the treatment effect of clinical interest.
Methods: This work reviews the clinical questions or treatment effects (estimands) that are most common in the safety evaluation of drugs and the strategies outlined in the EF that reflect those treatment effects. We examine the most common statistical estimators used to assess the risk of drugs, including incidence proportions, Aalen-Johansen estimator, expected adjusted incidence rates and 1 minus Kaplan-Meier, focusing on the interpretation of the estimates and on the estimand they target, depending on how ICEs are defined for analysis, e.g. ignored, censored, or as competing events. By understanding a) the treatment effects that we can feasibly define in the presence of ICEs and b) the estimand that is targeted by different estimators, our goal is to define treatment effects that are clinically meaningful for the evaluation of safety, and to use the estimator that aligns with the treatment effect of interest, so that the treatment effect estimates are meaningful and interpretable.
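To make the alignment issue concrete, the sketch below (survival package, with hypothetical variable names and event codes) contrasts two estimators for the cumulative incidence of an adverse event when an intercurrent event such as death can also occur: 1 minus Kaplan-Meier, which censors the competing event, and the Aalen-Johansen estimator, which treats it as a competing event; the two target different estimands.

```r
library(survival)

# status is a factor with first level = censored, e.g. c("censor","AE","death")
fit_aj <- survfit(Surv(time, status) ~ arm, data = dat)         # Aalen-Johansen
fit_km <- survfit(Surv(time, status == "AE") ~ arm, data = dat) # 1 - KM for AE,
                                                                # censoring death
```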
Results: Our review includes treatment effects or estimands that are relevant to the evaluation of drug safety, such as treatment policy, hypothetical and while-on-treatment, considering ICE such as early treatment discontinuation or use of rescue medication. We explain why the common estimators target different estimands, helping researchers to select the estimator that aligns with the treatment effect of interest. A misalignment between the estimator and the treatment effect of interest can eventually lead to misinterpretations of safety results that potentially can compromise the understanding about the safety profile of a drug.
Conclusions: Applying the EF to safety evaluation can improve the interpretability of treatment effects in clinical development, both in the area of signal detection and in the analysis of selected adverse events of special interest. By clearly defining the estimand and selecting the appropriate statistical method, researchers can ensure that their analyses align with clinically relevant questions. This approach enhances the accuracy and reliability of safety assessments, ultimately contributing to better-informed decision-making in drug development by regulators, physicians, patients and other stakeholders.
posters-monday: 49
CUtools: an R package for clinical utility analysis of predictive models
María Escorihuela Sahún1, Luis Marianos Esteban Escaño1, Gerardo Sanz2, Ángel Borque-Fernando3
1Department of Applied Mathematics, Escuela Universitaria Politécnica La Almunia, University of Zaragoza, Spain; 2Department of Statistical Methods, University of Zaragoza, Spain; 3Urology department, Miguel Servet university hospital, Spain
This work presents a new library in R that provides statistical techniques to validate and evaluate a prediction model both analytically and graphically. The library offers the functions CUC_plot, CUC_table, Efficacy, Efficacy_curve, and Efficacy_test to construct the clinical utility curve, a table of clinical utility values, the efficacy of a biomarker, the efficacy curve, and a test to compare the efficacy of biomarkers.
The purpose of predictive models in clinical diagnosis is to define a biomarker that accurately predicts the occurrence of an event related to a disease. To analyse the predictive capability of a biomarker, the package provides, as an initial output, the clinical utility curve via the CUC_plot function. Clinical utility assesses the benefit of a biomarker used as a dichotomous classifier with a cut-off point. The X-axis plots the possible cut-off points of the biomarker as a continuous variable, and the Y-axis shows two quantities: the percentage of misclassified events and the percentage of individuals below the cut-off point. These values represent the false-negative rate and the proportion of treatments avoided when applying the model. Additionally, the CUC_table function provides the numerical values represented graphically.
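As an illustration of the two quantities just described (our sketch of the underlying calculation on simulated data, not the CUtools implementation), the percentage of misclassified events and the percentage of individuals below each cut-off can be computed as follows.

```r
cuc_points <- function(x, y, cutoffs = sort(unique(x))) {
  t(sapply(cutoffs, function(c) c(
    cutoff            = c,
    missed_events_pct = 100 * mean(x[y == 1] <= c),  # false negatives among events
    below_cutoff_pct  = 100 * mean(x <= c)           # treatments avoided
  )))
}

set.seed(1)
x <- rnorm(200)                         # hypothetical biomarker
y <- rbinom(200, 1, plogis(2 * x - 1))  # hypothetical event indicator
head(cuc_points(x, y))
```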
Another way to analyse the clinical utility of a biomarker is by calculating its efficacy. To study efficacy, the package offers an analytical result with the Efficacy function and a graphical result with the Efficacy_curve function. The numerical value of the marker’s efficacy is obtained as the difference between the treatments avoided by the model and the misclassified events; in addition, a graph is produced in which the efficacy of the proposed model is plotted against the percentage of misclassified events.
posters-monday: 50
Impact of Particulate Matter 2.5 Levels on Chronic Obstructive Pulmonary Disease: An Analysis of Nationwide Claims Data in Thailand
Pawin Numthavaj1, Tint Lwin Win1, Chaiyawat Suppasilp1, Wanchana Ponthongmak1, Panu Looareesuwan1, Suparee Boonmanunt1, Oraluck Pattanaprateep1, Prapaporn Pornsuriyasak1, Chathaya Wongrathanandha2, Kriengsak Vareesangthip3, Phunchai Charatcharoenwitthaya4, Atiporn Ingsathit1, Ammarin Thakkinstian1
1Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Thailand; 2Department of Community Medicine, Faculty of Medicine Ramathibodi Hospital, Mahidol University; 3Division of Nephrology, Department of Medicine, Faculty of Medicine Siriraj Hospital, Mahidol University; 4Division of Gastroenterology, Department of Medicine, Faculty of Medicine Siriraj Hospital, Mahidol University
Introduction: Particulate matter 2.5 (PM 2.5) levels have been associated with morbidity and mortality in chronic obstructive pulmonary disease (COPD). We explored the association between PM 2.5 levels and exacerbations documented in a Thai national claims database maintained by the National Health Security Office, which covers about 70% of the Thai population.
Methods: We extracted COPD exacerbations identified from International Classification of Diseases, 10th revision (ICD-10) codes among patients older than 40 years, and verified them against documented procedures (nebulizer use, intubation, ventilator use, and temporary tracheostomy). PM 2.5 levels were estimated from satellite data using a formula validated against ground-level measurements. Weekly incidences of COPD exacerbation were then calculated for each district of every province across Thailand and modelled against PM 2.5 exposure in the previous seven days, adjusted for age, gender, and baseline rates of diagnosed comorbidities (cancer, asthma, hypertension, heart failure, anxiety, depression, obesity, diabetes, and dyslipidaemia), using mixed-effect Poisson regression with a random intercept in R. We also compared two ways of averaging PM 2.5 over an area, the standard average and the area-weighted PM 2.5 level, with respect to model fit.
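A minimal sketch of the kind of model described (lme4 in R, with hypothetical variable names for the weekly district-level data; the authors' exact specification may differ) is:

```r
library(lme4)

fit <- glmer(n_exacerbations ~ pm25_lag7 + age_mean + pct_male + pct_asthma +
               pct_heart_failure + offset(log(n_copd_at_risk)) +
               (1 | district),
             family = poisson, data = weekly_data)

exp(fixef(fit)["pm25_lag7"])  # IRR per 1 microgram/m3 increase in lagged PM2.5
```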
Results: A total of 407,866 verified COPD patients were identified from January 2017 until December 2022, corresponding to 1,687,517 hospital visits. Exacerbations, or visits requiring lower airway interventions, occurred in 9.9% of these visits. Multivariable Poisson regression found an incidence rate ratio (IRR) for COPD exacerbation of 1.00098 per 1 microgram per cubic metre increment in PM 2.5 (95% CI 1.00091 – 1.00106). The weighted PM 2.5 formula yielded lower Akaike and Bayesian information criterion values in the multivariable model than the standard average PM 2.5 calculation used in previous studies (6,278,282 vs. 6,279,384, and 6,279,627 vs. 6,278,539, respectively).
Conclusion: Our analysis indicates that the PM 2.5 level is associated with an increased occurrence of COPD exacerbations. We also found that the weighted formula for calculating exposure levels appears to fit the data better than the simple averaging formula traditionally used in the literature.
posters-monday: 51
Changes in health services use of a cohort of COPD patients from a pre-pandemic to a COVID-19 pandemic period
Jose M Quintana1,2,4,5, Maria J Legarreta1,2,4,5, Nere Larrea1,2,4,5, Irantzu Barrio2,4,5,6, Amaia Aramburu1,3, Cristóbal Esteban1,3
1Osakidetza/SVS - Galdakao-Usansolo Hospital, Spain; 2Instituto Biosistemak, Bilbao, Spain; 3Instituto BioBizkaia, Barakaldo, Spain; 4REDISSEC; 5RICAPPS; 6UPV/EHU
Background. The COVID-19 pandemic had negative effects on health, especially in people with chronic diseases. We evaluated differences in health service use among patients with chronic obstructive pulmonary disease (COPD) between the period 2017-2019 and the COVID-19 pandemic period 2020-2022.
Methods. The cohort comprised patients recruited from several hospitals who had an admission due to a COPD exacerbation. Sociodemographic and clinical data were collected from all participants in 2016. A follow-up was performed in 2022 with those who agreed to participate, focusing on their use of health services: the number of hospital admissions for any cause, ICU admissions, emergency room visits, and consultations with primary care physicians, nurses, or medical specialists. Data were collected for the periods 2017-2019 and 2020-2022, yielding paired data in which time 1 corresponds to 2017-2019 and time 2 to 2020-2022 for the same patient. Multivariable negative binomial regression models with random effects for patients were fitted to each count of service use. Models were adjusted for study period, age, Charlson index, previous admissions, and SARS-CoV-2 infection or hospital admission in period 2.
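A sketch of such a model (hypothetical dataset and variable names; the authors' exact specification may differ) using lme4's negative binomial mixed model is:

```r
library(lme4)

fit_adm <- glmer.nb(n_admissions ~ period + age + charlson + prev_admissions +
                      sars_cov2 + (1 | patient_id),
                    data = copd_long)

exp(fixef(fit_adm))  # rate ratios; the period coefficient compares 2020-2022
                     # with 2017-2019 within patients
```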
Results. Out of the original cohort of 1,401 patients, 703 (50.2%) died during the follow-up period. Of the remaining, 314 (45%) chose not to participate in the study, while 384 (55%) did participate. The mean age of the participants was 69.2 years (SD: ±9.8), with men constituting 72.1% of the sample. We observed a statistically significant reduction in the number of hospital admissions, ICU admissions, emergency visits, and face-to-face visits with primary care doctors from the first period to the second period. However, there was no significant change in the number of face-to-face consultations with primary care nurses or pneumologists. Having a SARS-CoV-2 infection or being admitted for it during the second period was associated with an increase in hospital admissions, emergency visits, and face-to-face consultations with pneumologists and primary care nurses. Additionally, SARS-CoV-2 infection influenced the face-to-face visits to primary care doctors, but neither factor affected ICU admissions.
Conclusion. The COVID-19 pandemic had an important negative effect on patients with COPD. On the one hand, use of most health services by these patients decreased significantly. On the other hand, having had a SARS-CoV-2 infection or a hospital admission for it was related to greater use of these health services.
posters-monday: 52
The ISARIC Clinical Epidemiology Platform: Standardized Analytical Pipelines for Rapid Outbreak Response
Esteban Garcia-Gallo1, Tom Edinburgh1, Sara Duque1, Leonardo Bastos2, Igor Tona Peres2, Elise Pesonel1, Laura Merson1
1Pandemic Sciences Institute, University of Oxford (United Kingdom); 2Pontifical Catholic University of Rio de Janeiro (Brazil)
Background: The International Severe Acute Respiratory and Emerging Infection Consortium (ISARIC) is a global research network facilitating rapid clinical responses to infectious disease outbreaks. Comprising 60 members across 133 countries, ISARIC has generated critical evidence for diseases such as COVID-19, dengue, Ebola, and mpox. Its guiding principles—Prepare, Integrate, Collaborate, and Share—support research readiness, integration with public health systems, strong partnerships, and open-access resource-sharing.
Since 2012, the ISARIC Clinical Characterisation Protocol (CCP) has enabled standardized, adaptable investigations of high-consequence pathogens. During the COVID-19 pandemic, ISARIC’s CRF was widely adopted, contributing to a dataset of one million patients. Lessons from past outbreaks underscore the need for both flexibility and standardization in clinical research. A decentralized approach ensures local data ownership while enabling global integration, scalability, and equitable collaboration—key principles driving the development of the ISARIC Clinical Epidemiology Platform (ISARIC-CEP).
Methods: The ISARIC-CEP consists of three tools—ARC, BRIDGE, and VERTEX—designed to streamline data collection, curation, analysis, and evidence generation. ARC provides a machine-readable library of standardized CRF questions, BRIDGE automates CRF generation for seamless REDCap integration, and VERTEX is an open-source application comprising three packages:
- get_REDCap_Data: Harmonizes and transforms ARC-formatted REDCap data into analysis-ready dataframes.
- ISARICAnalytics: A set of Reusable Analytical Pipelines (RAPs) standardizing key epidemiological analyses, including descriptive statistics, data imputation, regression models, feature selection, and survival analysis.
- ISARICDraw: Generates interactive dashboards with customizable outbreak-specific visualizations using Plotly.
VERTEX supports insight panels, organizing outputs into thematic sections, and its adaptable framework enables secure customized dashboards for multiple projects.
Results: The ISARIC-CEP has accelerated clinical research responses, including studies on dengue in Southeast Asia and Brazil. By providing openly accessible tools, it has facilitated high-quality analyses for both scientific and public health communities. Key resources include:
- ARC: https://github.com/ISARICResearch/ARC
- BRIDGE: http://bridge.isaric.org/
- VERTEX: https://github.com/ISARICResearch/VERTEX
- Public Dashboard Example: http://vertex-observationalcohortstudy-mpox-drc.isaric.org
Conclusions: The ISARIC-CEP accelerates outbreak research by ensuring that during an initial response, most time is spent on data capture, not on harmonization, curation, or preparation for analysis. VERTEX’s RAPs streamline analyses, allowing standardized workflows to be shared and adapted across outbreaks, reducing duplication and improving efficiency. Our goal is to build a collaborative community where researchers contribute RAPs, making validated methodologies easily integrable and reusable, amplifying their real-world impact. This approach strengthens clinical research sites, providing automated tools that enhance local capacity and ensure rapid, reproducible, and scalable outbreak analyses.
posters-monday: 53
Topic modelling and time-series analysis to explore methodological trend evolution
Gabrielle Gauthier-Gagné, Tibor Schuster
McGill University, Canada
Background: Statistical methodology used in biomedical research is evolving rapidly, driven by advances in biostatistical approaches and increased integration of machine learning techniques and causal inference frameworks. This convergence is reshaping the methodological foundations that underlie the analysis and interpretation of biomedical data in the literature. Both applied and methodological researchers may wish to explore these trends in their field to better understand the associated implications for evaluating, planning and conducting future studies. However, exploring these trends using conventional literature reviews is both time-consuming and requires periodical updates as the field develops. Therefore, we propose leveraging topic modelling and time-series analysis to explore methodological trend evolution which can easily be replicated and updated.
Methods: We considered two parallel case studies to informally assess the utility of the proposed approach: examination of i) the literature on clinical trials and ii) literature pertaining to medical records. We employed readily available APIs to systematically extract PubMed abstract data related to studies which conducted clinical trials or examined medical records, respectively, in the last 10 years. Abstract text data was tokenized and structured as a document-term matrix (DTM). A large language model software was used to generate an exhaustive dictionary of terms (uni- and bigrams) commonly used in statistics and machine learning. The DTM was reduced to include only the terms corresponding to the entries of the derived term dictionary. Very common terms were additionally excluded. Latent Dirichlet Allocation was used to uncover latent topics across abstracts and to enable mapping of the distribution of topics within abstracts. Time-series analysis was used to characterize and visualize the trends of (average) topical prevalence over time (months), leveraging abstract publication dates and corresponding topic distributions.
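A sketch of the topic-modelling step (topicmodels package, with hypothetical object names and an illustrative number of topics) is:

```r
library(topicmodels)

# dtm_restricted: document-term matrix restricted to the dictionary terms
lda_fit    <- LDA(dtm_restricted, k = 20, control = list(seed = 123))
topic_dist <- posterior(lda_fit)$topics   # abstracts x topics distribution matrix
top_terms  <- terms(lda_fit, 10)          # 10 highest-probability terms per topic
```

The per-abstract topic distributions, combined with publication dates, then feed the time-series analysis of topical prevalence.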
Results: The search identified 166,932 and 7,999 unique abstracts relating to clinical trials and medical record studies, respectively, for review. The generated term lists contained 1,803 statistical and 200 machine-learning-related terms and bigrams. Both the time-series analyses and the visualizations of topic trends over the past decade indicate dynamic and distinct shifts in the landscape of statistical methodology specific to each case study.
Conclusion: We demonstrate that topic modelling paired with time-series analysis are powerful tools for methodological researchers to explore the evolution of statistical methodologies in their field over time.
posters-monday: 54
Post-stroke facial palsy: prevalence on admission, risk factors, and recovery with hyperacute treatments
Zewen Lu1,2, Havva Sumeyye Eroglu3, Halvor Næss4, Matthew Gittins1,2, Amit K Kishore2,5, Craig J Smith2,5, Andy Vail1,2, Claire Mitchell3
1Centre for Biostatistics, University of Manchester, Manchester Academic Health Science Centre, UK; 2Manchester Centre for Clinical Neuroscience, Geoffrey Jefferson Brain Research Centre, Manchester Academic Health Centre, Salford Care Organisation, Northern Care Alliance NHS Foundation Trust, UK; 3Division of Psychology, Communication & Human Neuroscience, Geoffrey Jefferson Brain Research Centre, University of Manchester, Manchester, UK; 4Department of Neurology, University of Bergen, Haukeland University Hospital, Bergen, Norway; 5Division of Cardiovascular Sciences, Faculty of Biology, Medicine and Health, University of Manchester, UK
Background Facial palsy affects 40 - 50% of stroke survivors, impacting quality of life, communication, and emotional expression. This study estimated its prevalence, identified risk factors, assessed 7-day recovery post-admission and examined associations between hyper-acute treatments (intravenous thrombolysis [IVT] and mechanical thrombectomy [MT]) and recovery in acute ischaemic stroke (AIS) patients.
Methods This was a retrospective analysis of individual patient data from the Bergen NORSTROKE registry, comprising 5,987 patients (2006–2021). Only the 2,293 patients with facial palsy were included in the recovery analysis. We further investigated the association of hyper-acute treatments with facial palsy recovery in 1,954 patients with AIS. Complete case analysis was used at each stage owing to minimal missing data. Facial palsy was assessed via the National Institutes of Health Stroke Scale. The prevalence and severity of facial palsy on admission were analysed using descriptive statistics, while multifactorial logistic regression explored associations with demographics, stroke subtypes, and neurological symptom clusters. Kaplan-Meier curves estimated recovery rates within seven days of admission, and Cox proportional hazards models identified factors associated with recovery. The association between hyper-acute treatments and recovery was assessed using Cox models with time-dependent covariates, adjusting for baseline characteristics.
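A sketch of the time-dependent Cox analysis (survival package, hypothetical variable names, counting-process data layout; the authors' exact specification may differ) is:

```r
library(survival)

# Follow-up split into intervals so IVT/MT exposure can switch on at treatment
fit <- coxph(Surv(tstart, tstop, recovered) ~ ivt + mt + age + sex +
               motor_score + sensory_score,
             data = palsy_long)
summary(fit)  # adjusted hazard ratios for facial palsy recovery within 7 days
```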
Results Facial palsy was present in 43% of patients on admission, with 40% experiencing minor or partial paralysis and 3% complete paralysis. Significant risk factors included sex, age, admission motor and sensory function, and ischaemic stroke. By day 3, 25% of patients had recovered, but over 60% still had facial palsy by day 7. Better admission motor and sensory function were strongly associated with recovery. Receiving IVT showed a significant association with better recovery in unadjusted analyses, but neither IVT nor MT were significant in adjusted models.
Conclusions Post-stroke facial palsy is common on admission, and fewer than 40% of patients recover within the first week. This highlights the need for targeted monitoring and rehabilitation. Further research should explore the role of hyper-acute treatments in longer-term recovery.
posters-monday: 55
Evaluating Outlier Detection Methods in Real-World Growth Data: A Sensitivity Analysis of Imperfect Data in a Cluster Randomised Controlled Trial
Maryam Shojaei Shahrokhabadi1, Mohadeseh Shojaei Shahrokhabadi2, Bram Burger3, Ashley J. Adamson2, Dawn Teare2
1Hasselt University, Belgium; 2Newcastle University, UK; 3Uppsala University, Sweden
Background: Growth studies with longitudinal measurements need outlier detection methods that can consider diverse, individual growth trajectories. Several methodological approaches have been developed, with distinct underlying assumptions, which can lead to differing results, potentially influencing study conclusions. To assess the reliability and robustness of primary analyses, we conducted a sensitivity analysis exploring the impact of multiple outlier detection methods on findings from the MapMe 2 study [1].
Methods: The MapMe 2 study, a cluster randomised controlled trial (cRCT), evaluated whether incorporating the MapMe 2 intervention into existing National Child Measurement Programme (NCMP) feedback letters improved child weight outcomes after one year. The primary outcome compared the change in BMI Z-score between intervention and control groups, including all children irrespective of baseline weight status, and specifically among children with a BMI Z-score > 1.33 at baseline. While the study initially used static WHO cut-offs to identify extreme or biologically implausible values (BIVs), in this large-scale trial we explored alternative outlier detection methods. Five approaches were compared to the original sBIV method [2]: (1) modified BIV detection (mBIV), (2) single-model outlier measurement detection (SMOM), (3) multi-model outlier measurement detection (MMOM), (4) multi-model outlier trajectory detection (MMOT), and (5) clustering-based outlier trajectory detection (COT). We then evaluated the impact of these methods on the study findings.
Results: Different outlier detection methods resulted in variations in the number of subjects analysed and slight changes in the estimated effect of the MapMe 2 intervention on BMI Z-score change at one year. However, these differences were minimal, and the overall trends remained consistent.
Conclusion: Sensitivity analyses under varying assumptions yielded results consistent with the primary analysis, confirming its robustness and reinforcing confidence in the trial findings.
References:
- Adamson AJ, et al. Can embedding the MapMe2 intervention in the National Child Measurement Programme lead to improved child weight outcomes at one year? 2021. Trial registration: [ISRCTN12378125]. Available from: https://www.isrctn.com/ISRCTN12378125.
- Massara P, Asrar A, Bourdon C, Ngari M, Keown-Stoneman CD, Maguire JL, Birken CS, Berkley JA, Bandsma RH, Comelli EM. New approaches and technical considerations in detecting outlier measurements and trajectories in longitudinal children growth data. BMC Medical Research Methodology. 2023 Oct 13;23(1):232.
posters-monday: 56
Latent class analysis on intersectional social identities and mental wellbeing among ethnic minority youth in Aotearoa New Zealand
Arier Lee1, Shanthi Ameratunga1,2, Rodrigo Ramalho1, Rachel Simon-Kumar1, Vartika Sharma1, Renee Liang1, Kristy Kang1, Terryann Clark3, Terry Fleming4, Roshini Peiris-John1
1School of Population Health, University of Auckland, Auckland, New Zealand; 2Population Health Gain, Population Planning Funding and Outcomes Directorate, Te Whatu Ora – Health New Zealand, Auckland, New Zealand; 3School of Nursing, University of Auckland, Auckland, New Zealand; 4School of Health, Victoria University of Wellington, Wellington, New Zealand
Background / Introduction
Ethnic minority youth in Aotearoa New Zealand who identify as Asian, Middle Eastern, Latin American, or African navigate multiple shifting identities. Conventional approaches in the literature often frame their experiences through a single social dimension, such as ethnicity. However, this limits deeper insights into how overlapping social identities, linked to broader structural inequities, affect emotional wellbeing. Using an intersectional framework, this study explored how multiple social identities and affiliations influence the mental health and wellbeing of ethnic minority young people.
Methods
We analysed cross-sectional data from 2,111 ethnic minority youth (99% aged 13 to 19) who participated in a population-based secondary school survey in New Zealand in 2019. Latent Class Analysis (LCA) was employed to identify unobserved social affiliation groups based on categorical variables, including sex, sexual and gender identities, religion, perceived ethnicity, migrant generational status, disability, and material deprivation. LCA was also applied to nine family connectedness indicators (e.g., trust in sharing feelings with a family member), classifying participants into distinct family support groups. Multiple logistic regression models were used to predict the outcomes of mental health and wellbeing, by LCA-identified social affiliation and family support groups, and experiences of discrimination and bullying.
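A sketch of the LCA step (poLCA package, hypothetical variable names; the number of classes would be selected by comparing information criteria across fits) is:

```r
library(poLCA)

# Manifest variables must be coded as positive integers (1, 2, ...)
f <- cbind(sex, gender_sexuality, religion, perceived_ethnicity,
           migrant_generation, disability, deprivation) ~ 1

lca4 <- poLCA(f, data = youth, nclass = 4, maxiter = 5000, nrep = 10)
lca4$bic  # compare with nclass = 2, 3, 5, ... to choose the number of classes
```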
Results
LCA identified four distinct social affiliation groups among ethnic minority youth:
- Least marginalised, mixed migration generations
- Some marginalised affiliations, mainly overseas-born
- Some marginalised affiliations, mainly NZ-born
- Multiply marginalised, mixed migration generations
The least marginalised group (Group 1) reported the best mental health and wellbeing outcomes, followed by Groups 2 and 3, while the multiply marginalised group (Group 4) exhibited the highest risks of adverse health outcomes. Independent of social affiliation group, experiences of discrimination and bullying were strongly associated with increased risks of poor mental health. However, higher levels of family support significantly reduced these risks across all social affiliation groups.
Conclusion
Marginalised social identities have cumulative harmful effects on the mental health and wellbeing of ethnic minority youth, but family support can mitigate some, though not all, of this risk. The use of LCA enabled the classification of participants into distinct social affiliation groups based on multiple intersecting social identity variables, without assuming their independence, thus providing a more nuanced picture of the relationship between identity and mental health outcomes. These findings underscore the need to create inclusive and supportive environments for ethnic minority youth and their families.
posters-monday: 57
Using multiple imputation in real-word data studies to aid in the identification of predictors of response while addressing missing data
Jozefien Buyze1, Lada Mitchell2, Lorenzo Acciarri3
1Johnson & Johnson, Beerse, Belgium; 2Johnson & Johnson, Allschwil, Switzerland; 3CRO Valos (J&J partner), Genova, Italy
Background: In real-world data (RWD) studies, the inference drawn from estimates can be jeopardized by missingness in key variables. Recent guidance from the FDA (March 2024) and ICH EMA (May 2024) emphasizes the importance of addressing this issue. This research aims to address missing data for covariates in RWD studies. The hypothesis is that multiple imputation helps to reduce bias and improve validity, reliability, and efficiency of the estimation methods.
Methods: Multiple imputation was applied, assuming data are missing at random (MAR). Multiple imputation preserves all cases and accounts for uncertainty due to missing data, but it is crucial to recognize that if the MAR assumption is violated, the results may be biased. Given the non-monotone missing data pattern observed, we applied the fully conditional specification method for imputing missing variables. This method does not rely on a joint distribution but specifies a separate conditional distribution for each variable needing imputation (van Buuren 2007).
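A sketch of this workflow with the mice package (hypothetical dataset and variable names, not the study's actual code) is:

```r
library(mice)

imp  <- mice(rwd, m = 50, seed = 2024)   # FCS imputation with default methods
fits <- with(imp, glm(orr ~ prior_lines + refractory + thrombocytes +
                        measurable_disease, family = binomial))
summary(pool(fits))                      # pooled estimates via Rubin's rules
```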
Understanding the effectiveness of the standard of care and the predictors of it remains an area of unmet need. The case study utilizes data pooled from two prospective single-arm oncology real-world data (RWD) studies, in which missing data are present in several baseline covariates relevant to the statistical model for the effectiveness variable, i.e. the overall response rate (ORR).
The performance of multiple imputation in different scenarios with varying amounts of missingness was investigated via simulations.
Results: The models were applied on pooled data (N=302) of two RWD studies. The results of the models applied to the 50 imputed datasets were combined using Rubin’s rules (Rubin 1996). Notably, 59% of patients did not have missing data for the selected covariates. Applying multiple imputation allowed for the identification of covariates that affect standard of care effectiveness. Potential predictors for ORR include number of prior lines of therapy, refractory status, thrombocytes, and type of measurable disease. Simulation outcomes further validated the results.
Conclusions: This research investigated methodologies for handling missing data in RWD studies and established a clear framework for applying multiple imputation for important covariates within the context of multiple myeloma. The results show that multiple imputation helped to reduce bias and improve validity, reliability, and efficiency of the prediction methods.
posters-monday: 58
An imputation method for heterogeneous studies in Network Meta-Analysis: A Fully Conditional Specification approach using distance metrics
Christos Christogiannis1,2, Dimitris Mavridis2
1University of Southampton, UK; 2University of Ioannina, Greece
Background: Multiple Imputation (MI) is a popular method for addressing missing data in Individual Patient Data (IPD) meta-analysis. In an IPD meta-analysis with missing data, complete case analysis (CCA) is considered a reasonable starting point with MI as a sensitivity analysis, or vice versa. Fully Conditional Specification (FCS) is an MI method that addresses missing data by imputing one variable at a time, cycling through iterations of univariate models. In each iteration, the incomplete variable is imputed based on both the complete and previously imputed variables.
Methods: Our approach involves estimating the proximity between studies using various distance metrics. By doing so, we identify groups of similar studies. Imputation is then performed for each study individually, borrowing information from the studies in close proximity in terms of distance, so that imputation is informed by neighbouring studies, enhancing its accuracy. We conducted a simulation study to evaluate the properties of the suggested methodology and to explore how the number of studies, the number of patients per study, the missingness rate, the standard deviation of the covariates, heterogeneity, and the correlation of covariates affect the results. Accounting for all these factors resulted in 216 distinct simulation scenarios. The methods compared were CCA, FCS, our proposed approach, and the full model as it would be if no missing values had been induced in the data. The missingness mechanism was set to MAR.
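One way to operationalise this idea (a sketch under our own simplifying assumptions, not the authors' implementation) is to summarise each study by its covariate means, group studies by their pairwise distances, and run FCS imputation within each group of neighbouring studies:

```r
library(mice)

study_means <- aggregate(cbind(x1, x2, x3) ~ study, data = ipd, FUN = mean)
d      <- dist(scale(study_means[, -1]))     # Euclidean distance between studies
groups <- cutree(hclust(d), k = 3)           # k = 3 chosen only for illustration
ipd$group <- groups[match(ipd$study, study_means$study)]

# FCS imputation within each group of similar studies
imputed <- lapply(split(ipd, ipd$group),
                  function(g) complete(mice(g, m = 5, printFlag = FALSE)))
```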
Results: Simulation results were similar for FCS and the proposed method when the percentage of missingness was small. In scenarios with 50% missingness, the proposed method outperformed FCS imputation in most cases. As the missingness percentage decreased, our method yielded results similar to FCS, with differences only in the third decimal place. More specifically, its coverage rate (CR) was closer to 95% and it was less biased than FCS, but it had a slightly higher root mean square error (RMSE).
Conclusion: The proposed method yielded robust results after evaluation. This means that our method may substantially improve estimation when heterogeneous studies are present in IPD meta-analysis.
posters-monday: 59
Impact of lack of measurement invariance on causal inference in randomized controlled-trials including patient-reported outcome measures: a simulation study
Corentin Choisy, Yseulys Dubuy, Véronique Sébille
Nantes Université, Université de Tours, INSERM, methodS in Patients-centered outcomes and HEalth ResEarch, SPHERE, 44200 Nantes, France.
Aims: Randomized controlled trials (RCTs) are considered the gold standard for causal inference. RCTs often include patient-reported outcome measures (PROMs), which use questionnaires to give insight into patients’ subjective experience regarding, e.g., quality of life or fatigue. PROMs are often treated like any other outcome, e.g. blood pressure, despite having their own specificities. For instance, when measuring fatigue, patients’ interpretation of PROM items can differ between groups (Differential Item Functioning, DIF) or change over time (Response Shift, RS) despite similar fatigue levels. In RCTs, randomization should ensure the absence of DIF at baseline. However, RS may subsequently occur differentially between treatment groups during the study, possibly leading to treatment-related DIF when assessing outcomes post-randomization. While such instances of lack of measurement invariance (MI) may provide a better understanding of patients’ experiences, they can also induce measurement bias if ignored. Our objectives were to measure the impact of lack of MI on causal inference in RCTs and to determine, using a simulation study, how different statistical approaches can handle lack of MI and restore causal inference.
Methods: Responses to a PROM were simulated to mimic a two-arm RCT with varying sample size, treatment effect (under H0 and H1) and number of items. The number of items affected by DIF and the size of DIF also varied. Partial credit models (PCM) were used to estimate the treatment effect with three strategies: S1, ignoring DIF; S2 and S3, accounting for DIF using two PCM-based iterative procedures, either performing tests on PCM parameters (S2) or an analysis of variance of person-item residuals (S3).
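For reference, the partial credit model underlying all three strategies can be written (standard parameterisation; notation ours) as

$$P(X_{ij} = x \mid \theta_i) \;=\; \frac{\exp\!\left(\sum_{k=0}^{x} (\theta_i - \delta_{jk})\right)}{\sum_{h=0}^{m_j} \exp\!\left(\sum_{k=0}^{h} (\theta_i - \delta_{jk})\right)}, \qquad x = 0, \dots, m_j,$$

where $\theta_i$ is the latent trait of patient $i$, $\delta_{jk}$ the $k$-th threshold of item $j$, and $\sum_{k=0}^{0}(\theta_i - \delta_{jk}) \equiv 0$; DIF corresponds to item thresholds that differ between treatment groups at equal $\theta$.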
Results: When DIF was not simulated, it was not falsely evidenced by S2 and S3. When DIF was simulated and ignored (S1), scenarios under H0 showed high type-I error rates (up to 74 %), and treatment effect estimations were biased under H0 and H1. Overall, bias increased with the size and the proportion of items affected by DIF.
S2 and S3 helped to reduce DIF impact on bias, type-I error, and restore power in scenarios with a sample size of 600 patients. However, they only provided marginal improvements with smaller sample sizes.
Conclusion: This study highlights that causal inference in RCTs can be compromised when lack of MI is ignored or inappropriately handled. Methods aiming at detecting and accounting for lack of MI can help reduce the risk of biased treatment effect estimates, particularly when the sample size is large.
posters-monday: 60
Evaluation of the Psychometric Qualities of Idiographic Patient Reported Outcome Measures (I-PROMs) for Patients Monitoring: PSYCHLOPS example
Salma Ahmed Ayis1, Luís Miguel Madeira Faísca2, Célia Sales3
1School of Life Course and Population Sciences; King's College London, United Kingdom; 2The University of Algarve, Portugal; 3Faculty of Psychology and Education Sciences; University of Porto (FPCEUP)
Introduction/background:
Nomothetic measures are standardised questionnaires that measure patients’ self-reported experiences (Patient Reported Outcome Measures (PROMs)). PROMs are brief, acceptable to patients and assessors, and broad enough to capture a breadth of difficulties and experiences, allowing for population level comparisons. Patients assign scores against norms derived from clinical and non-clinical populations. Change in scores is often used in trials to assess therapeutic effect.
However, nomothetic PROMs are unable to capture unique problems, and circumstances. Patient-Generated Outcome Measures, known as Idiographic PROMs (I-PROMs), allow people to identify their problems, describe these and provide scores to indicate their impact; therefore, allowing the use of appropriate interventions, and the assessment of the efficacy of interventions. The Psychological Outcome Profiles (PSYCHLOPS) is an I-PROM with questions on problems, function, and wellbeing, where patients can describe their problems and their severity scores. WHO have been using PSYCHLOPS for many years as part of their ‘Problem Management Plus’ intervention.
Nomothetic measures assume that individual questionnaire items assess one or more underlying constructs that can be summarised using latent class-based methods. I-PROMs, on the other hand, primarily value the uniqueness of individual experiences, perceptions, and constructions; therefore, assuming an underlying construct is often considered inappropriate for reflecting persons’ expressions.
In two studies we examined the theory behind I-PROMs and the potential value of latent class methods in providing an insight into these measures. Factor analysis and Item Response Theory (IRT) were used to understand the properties of PSYCHLOPS, an I-PROM.
Methods:
Pre- and post-treatment PSYCHLOPS data derived from six clinical samples (n = 939) were analysed for validity, reliability and responsiveness; caseness cut-offs and reliable change index were calculated. Exploratory and Confirmatory Factor Analyses were used to determine whether items represented a unidimensional construct; IRT examined items’ properties.
Results:
Estimates for internal consistency, construct validity, and structural validity were satisfactory. Responsiveness was high (Cohen’s d, 1.48). Caseness cut-off and reliable clinical change scores were 6.41 and 4.63, respectively. Factor analysis supported the items’ unidimensionality. IRT analysis confirmed that the items possess strong properties for assessing the underlying trait measured by PSYCHLOPS.
Conclusion:
PSYCHLOPS functioned as a measure of a single latent trait, which we describe as ‘personal distress’.
There are several challenges for I-PROMs including the robustness of the items to be measured, their measurement model, their reliability and validity, and the meaning of an aggregated I-PROM score. I-PROMs may complement nomothetic measures.
posters-monday: 61
Bias in the estimation of a psychometric function when using the PSI-method under optimal conditions – a simulation study
Simon Grøntved1,2, Jakob Nebeling Hedegaard1, Ib Thorsgaard Jensen3, Daniel Skak Mazhari-Jensen4
1Danish Center for Health Services Research, Department of Clinical Medicine, Aalborg University, Denmark; 2Psychiatry, Region North Jutland, Denmark; 3Statistics and Mathematical Economics, Department of Mathematical Science, Aalborg University, Denmark; 4Neural Engineering and Neurophysiology, Department of Health Science and Technology, Aalborg University, Denmark
Background
The PSI-method is a Bayesian adaptive method intended to estimate the threshold and slope of a parametrized psychometric function. The method has been used in both research and clinical practice. It was proposed as an improvement over non-adaptive methods because it potentially needs fewer trials before the estimates converge. As a result, several studies have run only 30-40 stimulation trials before terminating estimation, a range similar to that deemed sufficient in the original study by the developers of the algorithm.
While concerns about the choice of parametrization for the lapse rate have been raised, the number of trials needed and how this number relates to estimation of thresholds have been under less scrutiny.
Aim
We aimed to investigate the potential for bias of the PSI-method's estimates of threshold and slope of the psychometric function, and to investigate whether such bias depended on the number of trials used, the ground-truth threshold, and the slope.
Method
We tested the PSI-method (as implemented in the Palamedes toolbox) in a simulation study with 3,874 different person profiles and 175 simulations per profile. We used a uniform prior for threshold and slope. To restrict potential bias to threshold and slope, we fixed the lapse and guess rates to their ground-truth values. We calculated the relative bias in the estimated threshold, along with confidence intervals, and plotted it against the number of trials used and the underlying threshold and slope.
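For clarity, the relative bias summarised below is the standard quantity (notation ours)

$$\text{relative bias}(\hat{\alpha}) \;=\; \frac{\tfrac{1}{S}\sum_{s=1}^{S} \hat{\alpha}_s - \alpha}{\alpha} \times 100\%,$$

where $\alpha$ is the ground-truth threshold, $\hat{\alpha}_s$ its estimate in simulation $s$, and $S = 175$ the number of simulations per profile.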
Results
We found bias in the estimation of the threshold (alpha) after 50 trials: the mean relative bias was positive across most person profiles, with a median of 18.1% [IQR: 11.1%, 44.6%], and in the most extreme cases as high as 147.5%. The observed bias depended on the ground-truth threshold and slope. Increasing the number of trials to 150 reduced the relative bias considerably, to a median of 4.7% [IQR: 2.6%, 8.9%]. At 1,000 trials the relative bias was negligible, with a median of 0.7% [IQR: 0.1%, 1.6%], though still mostly positive.
Conclusion
Our results indicate the presence of non-negligible bias in threshold estimation when the PSI-method is stopped at the number of trials typically used in real-world settings. This bias was found on simulated data under optimal conditions. We therefore conclude that the method requires a considerably greater number of trials than typically used. Whether these results can be reproduced in a real-world setting should be investigated.
posters-monday: 62
Psychometric properties confirmation of the Multiple Sclerosis Autonomy Scale (MSAS) questionnaire evaluating patient autonomy in multiple sclerosis (MS)
Cécile Donzé2, Claude Mekies3, Géraud Paillot4, Lucie Brechenmacher1, Alexandre Civet1, David Pau1, Delphine Chomette1, Mikael Cohen5, Catherine Mouzawak6, Patrick Vermersch7
1Roche SAS, France; 2Hôpital saint Philibert, Groupement des Hôpitaux de l'Institut Catholique de Lille Faculté de médecine et de maïeutique de Lille, Lomme, France; 3RAMSAY Clinique des Cèdres, Neurologie, CHU Toulouse, Toulouse, France; 4Association Aventure Hustive, Saint-Malo, France; 5CRC-SEP Neurologie Pasteur 2, CHU de Nice, Université Côte d’Azur, UMR2CA-URRIS, Nice, France; 6Structure régionale neuro SEP SYNAPSE, Hôpital du Vésinet, Le Vésinet, France; 7Univ. Lille, INSERM UMR1172 LilNCog, CHU Lille, FHU Precise, Lille, France
Introduction
The Multiple Sclerosis Autonomy Scale (MSAS) is a new Patient Reported Outcome (PRO) aiming to evaluate patient autonomy in multiple sclerosis. Our current study's primary objective is to validate the psychometric properties of the MSAS questionnaire.
Methods
A longitudinal prospective observational study included MS patients from January 2024 to May 2024 in 33 sites.
The initial MSAS questionnaire contains 10 dimensions in a 36-item short form and is completed by patients at inclusion, D15, D30 and up to one year after inclusion (the study is still ongoing).
Several psychometric properties of the MSAS were evaluated, including construct validity (correlation coefficients between items), internal consistency (Cronbach's alpha coefficient), unidimensionality (retrograde Cronbach's alpha curves) and multidimensionality (multi-trait analysis).
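For reference, the internal-consistency coefficient reported below is the standard Cronbach's alpha,

$$\alpha \;=\; \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^2_{i}}{\sigma^2_{T}}\right),$$

where $k$ is the number of items in a dimension, $\sigma^2_i$ the variance of item $i$, and $\sigma^2_T$ the variance of the dimension total score.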
This abstract displays the results of the primary objective of the study evaluated at inclusion, with sensitivity analysis carried out at D15 and D30.
Results
From the 210 patients included in the study from January 2024 to April 2024, 199 completed the MSAS questionnaire at baseline: 132 (66.3%) with relapsing remitting form of MS (RRMS), 23 (11.5%) with primary progressive (PPMS) and 44 (22.1%) with secondary progressive (SPMS).
Internal consistency: Cronbach's alpha coefficients ranged from 0.59 to 0.96 at inclusion. Removing one item from the dimension with the lowest Cronbach's alpha increased that dimension's coefficient to 0.67.
Construct validity: Few strong correlation coefficients (>|0.8|) between items were observed, and those that occurred were between items of the same dimension.
Unidimensionality: Overall, removing questions one at a time had no significant impact on Cronbach's alpha, suggesting that the questions are highly correlated with each other and important for the reliability of the scale. The overall Cronbach's alpha coefficient of the questionnaire was 0.845 with 36 items and 0.843 with 35 items.
Multidimensionality: each item was most correlated within its own dimension.
Conclusion
Internal consistency was challenged in one dimension, and one item had to be removed. The new MSAS-35 questionnaire is a psychometrically sound measure of autonomy in multiple sclerosis.
posters-monday: 63
Learning heterogeneous treatment effect from multiple randomized trials to inform healthcare decision-making: implications and estimation methods
Qingyang Shi1, Veerle Coupé2, Sacha la Bastide-van Gemert3, Talitha Feenstra1
1Unit of PharmacoTherapy, -Epidemiology and -Economics, Groningen Research Institute of Pharmacy, University of Groningen, The Netherlands; 2Department of Epidemiology and Data Science, Amsterdam University Medical Centers, Amsterdam, The Netherlands; 3Department of Epidemiology, University Medical Center Groningen, University of Groningen, The Netherlands
Evidence synthesis and meta-analysis are crucial for healthcare decision-making, yet they often assume that treatment effects are shared across populations, neglecting heterogeneity by patient characteristics. This review addresses the critical need to account for heterogeneous treatment effects when synthesizing data from multiple trials to inform decision-making for a specific target population. We present a causal framework for the decision-making process with heterogeneous treatment effects estimated using data from different sources. We provide an overview of existing methods for estimating these effects from randomized trials, discussing their advantages and limitations in the context of decision-making. The review covers methods utilizing individual patient data (IPD), partly IPD with aggregate data, and exclusively aggregate data. We emphasize the importance of transportability assumptions, such as shared conditional average treatment effect functions and common covariate support, when extrapolating findings from trials to a target population. Furthermore, we discuss value estimation of an optimal treatment rule in the target population, highlighting the necessity of observational data for estimating the baseline outcome function. This review aims to guide researchers and practitioners in appropriately applying and interpreting methods for heterogeneous treatment effect estimation to inform healthcare decision-making when using data from multiple trials.
posters-monday: 64
Multiple imputation of missing viral load measurements in HIV treatment trials: a comparison of strategies
Tra My Pham1, Deborah Ford1, Anna Turkova1, Man Chan1, Ralph DeMasi2, Yongwei Wang2, Jenny O Huang3, Qiming Liao2, James R Carpenter1, Ian R White1
1MRC Clinical Trials Unit at UCL, London, United Kingdom; 2ViiV Healthcare, North Carolina, US; 3GSK, Ontario, Canada
In randomised trials assessing treatments for HIV, a commonly used primary outcome is the proportion of patients achieving or maintaining viral suppression, often defined based on viral load (VL) measurements below a pre-specified threshold, e.g. <400 copies/mL. However, missing data can occur, which affects the analysis of the primary outcome. In addition, in trials in paediatric populations, further complications can arise from measurements being left-censored (i.e. only known to be below a threshold) or obtained from diluted samples due to insufficient volumes (i.e. the limit of quantification is inflated by the dilution factor). As a result, viral suppression status can become unclear.
Multiple imputation (MI) has been used for handling missing outcome data in trials. However, when a continuous outcome such as VL is dichotomised to define the primary outcome, the imputation model specification requires further consideration. Trial statisticians could impute the missing VL measurements before dichotomising them to determine suppression status, or impute a binary indicator of suppression status directly. Alternatively, MI could be performed such that categories of VL measurements, one of which is the threshold for defining suppression, are imputed.
We aim to explore the performance of these MI strategies for handling missing VL data in a simulation study, in settings with and without left-censoring and dilution. To motivate our simulation study, we use data from ODYSSEY, a trial comparing dolutegravir-based antiretroviral treatment with standard of care in children with HIV [1]. The primary outcome was defined as the proportion of patients with virological or clinical treatment failure by 96 weeks. Here we focus on the virological failure component; for simplicity we define the primary outcome for the simulation study as the first of two consecutive VL measurements of ≥400 copies/mL. We simulate VL measurements at baseline and multiple follow-up time points to reflect real trial data collection schedules. VL measurements are made missing under both Missing Completely At Random and Missing At Random mechanisms, and missing data are imputed using the different MI strategies. Strategies are compared in terms of method failure, bias, standard errors, coverage, power, and type 1 error. The results of this work will provide the basis for recommendations of practical MI strategies that are relevant to statisticians working in HIV treatment trials.
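To make the contrast between the first two MI strategies concrete, the sketch below performs a single stochastic imputation (full MI would repeat this with parameter draws and combine results via Rubin's rules) under a simple, assumed data-generating model; the variable names and numbers are hypothetical and do not come from ODYSSEY:

```python
import numpy as np
import statsmodels.api as sm

# Illustrative sketch, not the ODYSSEY analysis: one stochastic imputation of a missing
# week-96 log10 viral load (VL) under two of the strategies described above.
rng = np.random.default_rng(42)
n = 400
base = rng.normal(4.5, 0.8, n)                      # hypothetical baseline log10 VL
week96 = 0.3 * base + rng.normal(1.5, 0.9, n)       # hypothetical week-96 log10 VL
miss = rng.random(n) < 0.2                          # ~20% missing completely at random
obs = ~miss
X = sm.add_constant(base)

# Strategy A: impute the continuous VL, then dichotomise at 400 copies/mL (log10 400 ~ 2.6)
ols = sm.OLS(week96[obs], X[obs]).fit()
draw = ols.predict(X[miss]) + rng.normal(0, np.sqrt(ols.scale), miss.sum())
vlA = week96.copy()
vlA[miss] = draw
failA = (vlA >= np.log10(400)).astype(int)

# Strategy B: dichotomise first, then impute the binary failure indicator directly
fail_obs = (week96 >= np.log10(400)).astype(int)
logit = sm.Logit(fail_obs[obs], X[obs]).fit(disp=0)
failB = fail_obs.copy()
failB[miss] = rng.binomial(1, logit.predict(X[miss]))

print("failure rate, strategy A:", failA.mean(), " strategy B:", failB.mean())
```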
[1] Turkova A, White E, Mujuru HA, et al. Dolutegravir as first- or second-line treatment for HIV-1 infection in children. New England Journal of Medicine 2021; 385: 2531-2543.
posters-monday: 65
A novel approach for assessing inconsistency in network meta-analysis: Application to comparative effectiveness analysis of antihypertensive treatments
Kotaro Sasaki1,2, Hisashi Noma3
1The Graduate University for Advanced Studies, Japan; 2Eisai Co., Ltd., Japan; 3The Institute of Statistical Mathematics, Japan
Introduction: Network meta-analysis (NMA) is a pivotal methodology for synthesising evidence and comparing the effectiveness of multiple treatments. A key assumption in NMA is consistency, which ensures that direct and indirect evidence are in agreement. When this assumption is violated, inconsistency arises, conceptualized by Higgins et al. [1] as design-by-treatment interactions, where “design” refers to the combination of treatments compared within individual studies. To evaluate inconsistency, various statistical tools have been developed. However, the existing methods based on statistical testing have limitations, including low statistical power and challenges in handling multi-arm studies. Moreover, the testing approaches might not be optimal for inconsistency evaluation, as the primary goal is not to draw definitive conclusions about design-by-treatment interaction but to identify and prioritise specific designs for further investigations into potential sources of bias within the network. To address these challenges, this study proposes a novel approach for evaluating inconsistency using influence diagnostics, focusing on quantifying the impact of individual study designs on the results.
Methods: We developed a "leave-one-design-out" (LODO) analysis framework to systematically quantify the influence of individual designs on the overall NMA results. New influence measures were proposed to evaluate these effects comprehensively. To facilitate interpretation, we also introduced the O-value, a summary metric that prioritises designs based on their potential contribution to inconsistency using a parametric bootstrap method. Additionally, a new testing approach was formulated within the LODO framework to identify critical designs requiring further investigation. These methods were applied to an NMA of antihypertensive drugs comprising various study designs.
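A conceptual sketch of the LODO loop is shown below; the fixed-effect, two-arm contrast-based NMA used here is only a stand-in for the actual NMA model, and the data are hypothetical:

```python
import numpy as np

# Minimal illustration (not the authors' implementation): a fixed-effect, contrast-based
# NMA of two-arm studies fitted by weighted least squares, wrapped in a
# leave-one-design-out (LODO) loop recording how much each design shifts the estimates.
def fit_nma(y, se, t_ref, t_alt, n_trt):
    """Basic parameters d_1..d_{n_trt-1} (effect of treatment k versus treatment 0)."""
    X = np.zeros((len(y), n_trt - 1))
    for i, (a, b) in enumerate(zip(t_ref, t_alt)):
        if a > 0:
            X[i, a - 1] -= 1.0
        if b > 0:
            X[i, b - 1] += 1.0
    W = np.diag(1.0 / np.asarray(se) ** 2)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ np.asarray(y))

# Hypothetical data: y = log relative effect of arm t_alt versus arm t_ref per study
y      = np.array([-0.30, -0.25, -0.10, -0.45, 0.05])
se     = np.array([ 0.10,  0.12,  0.15,  0.20, 0.18])
t_ref  = np.array([0, 0, 1, 0, 1])          # comparator treatment per study
t_alt  = np.array([1, 1, 2, 2, 2])          # treatment of interest per study
design = np.array(["0-1", "0-1", "1-2", "0-2", "1-2"])

full = fit_nma(y, se, t_ref, t_alt, n_trt=3)
for d in np.unique(design):
    keep = design != d
    fit_d = fit_nma(y[keep], se[keep], t_ref[keep], t_alt[keep], n_trt=3)
    # one simple influence measure: largest absolute shift in any basic parameter
    print(d, "max shift:", np.abs(full - fit_d).max().round(3))
```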
Results: The application of the proposed methods identified key designs contributing to inconsistency in the antihypertensive drug NMA. The influence measures effectively quantified the impact of individual designs. Moreover, the novel testing approach highlighted specific designs warranting further investigation to uncover potential biases. In a sensitivity analysis, excluding trials suspected of causing inconsistency, the rankings of certain treatment effects were reversed.
Conclusion: Our proposed method offers an effective framework for evaluating inconsistency in NMA. By enabling the quantitative assessment and prioritisation of individual study designs, it provides deeper insights into the sources of inconsistency and improves the reliability of NMA findings.
References: [1] Higgins JP, Jackson D, Barrett JK, Lu G, Ades AE, White IR. Consistency and inconsistency in network meta-analysis: concepts and models for multi-arm studies. Res. Synth. Methods. 2012;3(2):98-110.
posters-monday: 66
Investigating (bio)statistical literacy among health researchers in a Belgian university context: A framework and study protocol
Nadia Dardenne, Anh Diep, Anne-Françoise Donneau
Université De Liège Uliege, Belgium
Introduction Although the literature highlights the importance of developing (bio)statistical literacy (BSL) through curricula and lifelong training, few links have been made between BSL and researchers' statistical practices. However, the causes of statistical misconduct such as p-hacking or HARKing are manifold [1] and need to be investigated as a whole within an appropriate BSL framework.
Framework development A BSL framework will be developed and validated based on current (B)SL definitions [2] and the theory of planned behaviour [3] in order to understand intentional and behavioural demonstrations, i.e. (intentions) to read or perform statistical reports or analyses - when, why, how often and how - through perceived self-efficacy as a consumer and producer of statistics, attitudes towards statistics, subjective norms such as pressure and practices from colleagues, and basic knowledge of statistics. The objectives of the study will be to assess BSL by investigating associations among the dimensions of the proposed BSL framework. In addition, external factors, notably researchers' educational background in statistics, their professional experience and socio-demographic characteristics, will be studied in relation to the BSL dimensions.
Methods A cross-sectional study will be conducted in the population of interest, comprising health scientific and academic staff at Belgian universities. The study has been approved by the University Hospital of Liège Ethics Committee. The Delphi method will be used to validate some parts of the BSL dimensions, while Cronbach's α will be computed to assess internal consistency. Further, exploratory and confirmatory factor analysis will be used to validate the factor structure. Structural equation modelling will be employed to analyse the associations between the BSL dimensions, some of which will be treated as latent variables, and to test the effect of the external factors on these dimensions. Statistical analyses will be performed using SAS and R with appropriate packages such as lavaan.
Conclusion The data collected will make it possible to establish the links between the BSL dimensions among health researchers at Belgian universities, and to suggest ways forward, particularly in terms of adapting or reinforcing existing BSL curricula and instructional practices.
1. Hardwicke TE et al. Calibrating the scientific ecosystem through meta-research. Annu Rev Stat Its Appl. 2020;7:11–37.
2. Gal I. Adults’ Statistical Literacy: Meanings, Components, Responsibilities. Int Stat Rev. 2002;70:1–25.
3. de Vries H, Dijkstra M, Kuhlman P. Self-efficacy: the third factor besides attitude and subjective norm as a predictor of behavioural intentions. Health Educ Res. 1988;3:273–82.
posters-monday: 67
Balneotherapy for Peripheral Vascular Diseases: A Systematic Review with a Focus on Peripheral Arterial Disease and Chronic Venous Insufficiency
Mi Mi Ko
Korea Institute of Oriental Medicine, Korea, Republic of (South Korea)
Background: Peripheral vascular diseases (PVDs), including peripheral arterial disease (PAD), chronic venous insufficiency (CVI) and coronary artery disease (CAD), significantly impair vascular function and quality of life. Balneotherapy, a non-invasive intervention involving thermal and mineral water therapies, has shown potential benefits in managing these conditions. However, a systematic evaluation of its efficacy remains limited. This systematic review aims to assess the effects of balneotherapy on vascular outcomes, symptom alleviation, and quality of life in patients with PVDs.
Methods: A systematic search was conducted in PubMed (Medline), Embase, and the Cochrane Central Register of Controlled Trials (CENTRAL) to identify randomized controlled trials (RCTs) published up to November 2, 2024. The search terms included "Balneotherapy" and "Peripheral vascular diseases," and studies meeting predefined inclusion criteria were selected. Eligible studies focused on patients with PAD and CVI, assessed the effects of balneotherapy, and applied the same adjunct interventions to both treatment and control groups. Data extraction was performed independently by two researchers, and the risk of bias was assessed using the Cochrane Risk of Bias (RoB) tool.
Results: A total of 12 RCTs were included in the analysis. For PAD, balneotherapy improved vascular function (e.g., ankle-brachial pressure index, flow-mediated dilation), increased walking capacity, enhanced functional capacity, and alleviated symptoms such as leg pain and swelling. In patients with CVI, balneotherapy reduced lower-limb edema, provided pain relief, and improved mobility and quality of life. For CAD, the therapy enhanced endothelial function, reduced vascular inflammation, and improved peripheral perfusion. Adverse events were rare and generally mild, with no severe safety concerns identified. Despite methodological variability, most studies reported favorable effects, particularly in vascular function and symptom management.
Conclusion: Balneotherapy appears to be a safe and effective complementary treatment for improving vascular function, walking capacity, and quality of life in patients with PVDs, particularly PAD and CVI. Further large-scale, high-quality trials with long-term follow-up are needed to confirm these findings and optimize treatment protocols.
posters-monday: 68
Binomial Sum Variance Inequality correction of 95% CIs of percentages in multicentre studies ensures approximately 95% coverage with minimal width
Paul Talsma1, Francesco Innocenti2
1Phastar, United Kingdom; 2Maastricht University, The Netherlands
Percentages with corresponding 95% confidence intervals (CI) are often reported for clinical and epidemiologic multicentre studies. Many approaches to correcting for centre effects exist, but these are often complex and/or provide inadequate coverage. A method is presented for constructing the CI which ensures approximately 95% coverage, has minimal width and is not overly complex: the Binomial Sum Variance Inequality (BSVI) correction.
Two studies have been done to investigate coverage and width of intervals using the correction.
In study 1, coverage and width of CIs using the BSVI correction were compared with no correction and with mainstream correction approaches. A simulation study was conducted in which data were generated using binomial distributions, with population percentages per centre being the same or differing by pre-specified amounts. CIs were constructed using the Wilson, Agresti-Coull and exact methods, for varying numbers of centres (2-32) and participants per centre (10-160). The ratio of the number of participants between centres was systematically varied. Eleven traditional ways of correcting the CI were compared with each other, with no correction and with the BSVI correction. The traditional approaches included using the ANOVA, Fleiss-Cuzick, Pearson, Hedeker and GEE methods for estimating the intra-cluster correlation, with or without correction for differences in centre size, as well as direct estimation of variances using SAS® (v.9.4) PROC SURVEYFREQ. It was found that intervals constructed with no correction or with traditional methods had coverage that was too high, a finding which could be explained using the BSVI. The BSVI correction was shown to be effective in correcting coverage downwards, close to the desired 95% level, and in reducing interval width.
In study 2, the properties of the BSVI correction for small samples and scarce events were investigated. Data were generated from 2-4 centres with average event percentages: 2, 4, 8, 16, and 32; total N: 6, 12, 24, and 48; mean ratio of centre size: 1, 2, or 3; and differences between centre percentages being none, small, medium, and large using Cohen’s effect size. Results show that for N≥24 the BSVI correction leads to 95% CIs with adequate coverage (≥95%) and reduced width compared to no correction. These findings were corroborated with further simulations using the same parameters but N ranging from 14-30 in steps of 2. The BSVI correction is recommended for use for N≥24.
Both studies demonstrate that the BSVI correction leads to CIs with adequate coverage and reduced width when compared to other approaches.
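The sketch below shows one plausible form of a BSVI-type correction (an assumed implementation: the pooled-variance bound of the BSVI is used to define an effective sample size for a Wilson interval); it is not necessarily the exact formula evaluated in the two studies:

```python
import numpy as np
from scipy.stats import norm

# Assumed BSVI-type correction: the variance of the pooled count across centres is at
# most that of a single binomial with the pooled proportion, so the interval is deflated
# via an effective sample size before applying the Wilson formula.
def wilson_ci(p_hat, n, alpha=0.05):
    z = norm.ppf(1 - alpha / 2)
    centre = (p_hat + z**2 / (2 * n)) / (1 + z**2 / n)
    half = z * np.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / (1 + z**2 / n)
    return centre - half, centre + half

events = np.array([12, 7, 20, 3])        # hypothetical events per centre
n_i = np.array([40, 30, 80, 20])         # hypothetical centre sizes
N, p_bar = n_i.sum(), events.sum() / n_i.sum()
p_i = events / n_i

lam = (n_i * p_i * (1 - p_i)).sum() / (N * p_bar * (1 - p_bar))  # BSVI deflation factor <= 1
n_eff = N / lam                                                   # effective sample size >= N

print("uncorrected Wilson CI:", wilson_ci(p_bar, N))
print("BSVI-corrected Wilson CI:", wilson_ci(p_bar, n_eff))
```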
posters-monday: 69
Sample size calculation methods for clinical trials using co-primary count endpoints
Takuma Ishihara1, Kouji Yamamoto2
1Innovative and Clinical Research Promotion Center, Gifu University Hospital; 2Department of Biostatistics, School of Medicine, Yokohama City University
Introduction: Clinical trials often employ co-primary endpoints to comprehensively evaluate treatment efficacy. In trials where efficacy is established only if all endpoints show significant effects, the Intersection-Union test is commonly applied. While this approach avoids inflation of Type I error rate due to multiple testing, it increases the Type II error rate, necessitating larger sample sizes to maintain adequate statistical power.
However, most trial designs assume independence among endpoints, which may lead to an overestimation of the required sample size. Considering correlations between endpoints can reduce the sample size while maintaining statistical power.
Various sample size determination methods have been developed for co-primary endpoints with different variable types, including continuous, binary, mixed continuous-binary, and time-to-event. Notably, Homma and Yoshida (2023) introduced a method for mixed continuous and count endpoints, but their approach did not address cases where all primary endpoints are count-based.
Objective: This study aims to develop a sample size calculation method for clinical trials with co-primary count endpoints.
Methods: Co-primary count endpoints often follow different probability distributions, such as Poisson, zero-inflated Poisson (ZIP), and negative binomial distributions. This study derives analytical expressions to determine the minimum sample size required to achieve statistical significance at a pre-specified nominal significance level while considering endpoint correlations.
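A simulation sketch of the idea, under assumed Poisson marginals linked by a Gaussian copula (not the distributions or derivations of the study): the intersection-union test rejects only if both endpoints are significant, and its power varies with the endpoint correlation:

```python
import numpy as np
from scipy.stats import norm, poisson

# Assumed models for illustration: two co-primary Poisson endpoints generated from a
# Gaussian copula; the intersection-union test requires both one-sided Wald tests to reject.
def simulate_power(n, mu_ctrl, mu_trt, rho, alpha=0.025, n_sim=2000, seed=1):
    rng = np.random.default_rng(seed)
    cov = np.array([[1, rho], [rho, 1]])
    reject = 0
    for _ in range(n_sim):
        def draw(mu):
            z = rng.multivariate_normal([0, 0], cov, size=n)
            return poisson.ppf(norm.cdf(z), mu)          # correlated counts per subject
        y_c, y_t = draw(mu_ctrl), draw(mu_trt)
        diff = y_c.mean(axis=0) - y_t.mean(axis=0)       # fewer events = better on treatment
        se = np.sqrt(y_c.mean(axis=0) / n + y_t.mean(axis=0) / n)
        reject += np.all(diff / se > norm.ppf(1 - alpha))  # both endpoints significant
    return reject / n_sim

for rho in (0.0, 0.5, 0.8):
    print(rho, simulate_power(n=150, mu_ctrl=[2.0, 1.5], mu_trt=[1.5, 1.1], rho=rho))
```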
Results: Simulation studies were conducted under various scenarios to evaluate the impact of endpoint correlation on sample size requirements. The results show that accounting for the correlation between endpoints can greatly reduce the required sample size, especially when the correlation is high.
Conclusion: Our proposed methodology provides a practical approach for optimizing sample size determination in clinical trials with co-primary count endpoints. By leveraging endpoint correlations, researchers can design more efficient trials without compromising statistical power. These findings have significant implications for resource allocation and trial feasibility in studies involving co-primary count endpoints.
posters-monday: 70
Analysis of Composite Endpoint in Cardiovascular Device Clinical Trials
Hao Jiang, Yonghong Gao
Johnson and Johnson, United States of America
Composite endpoints are often used to assess the safety and effectiveness of cardiovascular devices and to increase study power. For example, MACCE (major adverse cardiac and cerebrovascular events) is commonly used in cardiovascular clinical trials. Time-to-first-event analysis, the composite event process approach and the Finkelstein-Schoenfeld (FS) method are the most commonly used approaches for analysing a composite endpoint to detect the treatment effect of the investigational device.
We investigate the potential power gain or loss of using a composite endpoint compared to using only one of the individual component endpoints, under the three analysis methods above. In addition, we examine the pros and cons of these three methods under different scenarios, including endpoint correlation and censoring mechanisms. Simulation studies are conducted to assess the performance of the three methods under different settings. Simulation results are provided, including some thought-provoking observations.
posters-monday: 71
Bayesian predictive monitoring using two-dimensional index for single-arm trial with bivariate binary outcomes
Takuya Yoshimoto1,2, Satoru Shinoda2, Kouji Yamamoto2, Kouji Tahata3
1Chugai Pharmaceutical Co., Ltd., Japan; 2Yokohama City University; 3Tokyo University of Science
Bayesian predictive probabilities are commonly used in phase II clinical trials and can schematically describe the stability of the data at an interim analysis by considering all possible future data. They thus help researchers make an informed decision about whether a trial should be terminated prematurely or move to a phase III trial. Typically, phase II oncology studies are conducted as single-arm trials, with the primary endpoint being short-term treatment efficacy. Specifically, objective response based on the RECIST guidelines is commonly used as the primary measure of treatment efficacy.
Although the primary endpoint is commonly set as an efficacy outcome, situations may arise in which the safety outcome is equally important as the efficacy outcome. Brutti et al. (2011) presented a Bayesian posterior probability-based approach that imposed a restrictive definition of the overall goodness of the therapy by controlling the number of responders who simultaneously do not experience adverse toxicity. Similarly, Sambucini (2019) proposed a Bayesian decision-making method based on predictive probability, involving both efficacy and safety with binary outcomes.
These strategies are attractive; however, the methods of Brutti et al. (2011) and Sambucini (2019) cannot capture situations in which the joint probability of being a non-responder to the therapy while simultaneously experiencing toxicity differs substantially between the historical control and the study treatment. Therefore, we propose a novel approach involving a bivariate index vector for summarizing results that takes this joint probability into account. Through a simulation study evaluating the operating characteristics of the design, we show that the proposed method makes appropriate interim go/no-go decisions and can make a valuable contribution to clinical development. For details, see Yoshimoto et al. (2024).
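For orientation, the sketch below implements standard single-endpoint predictive probability monitoring, the building block that the cited bivariate approaches extend; it is not the proposed bivariate index method, and all numbers are hypothetical:

```python
from scipy.stats import beta, betabinom

# Standard single-endpoint Bayesian predictive probability at an interim analysis:
# the probability of eventually declaring efficacy at n_max, given x responders in n patients.
def predictive_probability(x, n, n_max, p0=0.3, a=1, b=1, post_threshold=0.9):
    m = n_max - n                               # remaining patients
    pp = 0.0
    for y in range(m + 1):                      # possible future responder counts
        prob_y = betabinom.pmf(y, m, a + x, b + n - x)              # posterior predictive of y
        post_eff = 1 - beta.cdf(p0, a + x + y, b + n - x + m - y)   # P(p > p0 | all data)
        pp += prob_y * (post_eff > post_threshold)
    return pp

print(predictive_probability(x=10, n=25, n_max=50))
```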
Reference 1: Brutti, P., Gubbiotti, S. and Sambucini, V. (2011). An extension of the single threshold design for monitoring efficacy and safety in phase II clinical trials. Statistics in Medicine, 30(14), 1648-1664.
Reference 2: Sambucini, V. (2019). Bayesian predictive monitoring with bivariate binary outcomes in phase II clinical trials. Computational Statistics & Data Analysis, 132, 18-30.
Reference 3: Yoshimoto, T., Shinoda, S., Yamamoto, K. and Tahata, K. (2024). Bayesian predictive probability based on a bivariate index vector for single-arm phase II study with binary efficacy and safety endpoints. Pharmaceutical Statistics. http://doi.org/10.1002/pst.2431
posters-monday: 72
Optimising covariate allocation at design stage using Fisher Information Matrix for Non-Linear Mixed Effects Models in pharmacometrics
Lucie Fayette1,2, Karl Brendel2, France Mentré1
1Université Paris Cité, INSERM, IAME, UMR 1137, Paris, France; 2Pharmacometrics, Ipsen Innovation, Les Ulis, France
Introduction
This work focuses on designing experiments for pharmacometrics studies using Non-Linear Mixed Effects Models (NLMEM) including covariates to describe between-subject variability. Before collecting and modelling new clinical trial data, choosing an appropriate design is crucial. Clinical trial simulations are recommended [1] for power assessment and sample size computation, although they are computationally expensive and non-exhaustive. Alternative methods using the Fisher Information Matrix (FIM) [2] have been shown to efficiently optimise sampling times. However, few studies have explored which covariate values provide the most information.
Objectives
Assuming a known model with covariate effects and a joint distribution for covariates in the target population from previous clinical studies, we propose to optimise the allocation of covariates among the subjects to be included in the new trial. The aim is to achieve better overall parameter estimation and thereby increase the power of statistical tests on covariate effects to detect significance, and clinical relevance or non-relevance, of relationships.
Methods
We suggested dividing the domain of continuous covariates into clinically meaningful intervals and optimised their proportions, along with the proportion of each category for discrete covariates. We developed a fast and deterministic FIM computation method, leveraging Gauss-Legendre quadrature and copula modelling [3]. The optimisation problem was formulated as a convex problem subject to linear constraints, allowing resolution using a projected gradient descent algorithm.
We applied this approach for a one-compartment population pharmacokinetic model with IV bolus, linear elimination, random effects on volume (V) and clearance (Cl), and a combined error (as in [4]). Additive effects of sex and body mass index were included on log(V), and creatinine clearance on log(Cl). Initial distribution of covariates was imported from NHANES as in [3].
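A generic sketch of the optimisation step is given below; the elementary per-group FIMs are hypothetical placeholder matrices (in the study they are computed from the NLMEM via Gauss-Legendre quadrature), and only the projected gradient ascent of log det over the simplex of proportions is illustrated:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto {w >= 0, sum w = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0)

def optimise_proportions(fims, n_iter=2000, step=0.01):
    """Maximise log det(sum_g w_g * M_g) over proportions w on the simplex."""
    w = np.full(len(fims), 1 / len(fims))
    for _ in range(n_iter):
        total = sum(wg * Mg for wg, Mg in zip(w, fims))
        inv = np.linalg.inv(total)
        grad = np.array([np.trace(inv @ Mg) for Mg in fims])   # d/dw_g of log det
        w = project_simplex(w + step * grad)
    return w

rng = np.random.default_rng(0)
A = [rng.normal(size=(3, 3)) for _ in range(4)]
fims = [a @ a.T + 0.1 * np.eye(3) for a in A]   # hypothetical per-group elementary FIMs
print(optimise_proportions(fims).round(3))
```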
Results
Methods were implemented in R using the package PFIM6.1 (https://cran.r-project.org/web/packages/PFIM/index.html).
We found that the optimal distribution reduces the number of subjects needed (NSN) to achieve 80% power on relevance or non-relevance of the three covariates. Without constraints, the results were intuitive: the mass was distributed between the extreme intervals only. In a more constrained and realistic setting, optimisation reduced the NSN by over 60%.
Conclusion
We introduced a novel method to integrate the FIM for NLMEM with covariates to efficiently optimise covariate allocation among patients for future studies. We showed an important reduction in the NSN to achieve desired power in covariate tests.
References
[1]FDA Guidance for Industry Population Pharmacokinetics. https://www.fda.gov/media/128793/download, 2022
[2]Mentré et al. CPT Pharmacometrics Syst Pharmacol, 2013
[3]Guo et al. J Pharmacokinet Pharmacodyn, 2024
[4]Fayette et al. PAGE, 2024
posters-monday: 73
Unbiased Estimation for Hierarchical Models in Clinical Trials
Raiann Joanna Hamshaw, Nanxuan Lin
Biostatistics Research Group, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
Background
In clinical trials, hierarchical models are applied to data in which there is dependence between observations within groups, which would violate the independence assumption of non-hierarchical estimation methods. These models allow for group-level as well as individual-level analysis. Unbiasedness here refers to identifying, within a class of unbiased estimators, the estimator with uniformly minimum risk. Researchers often obtain this by minimising the risk for some parameter and checking whether the result is independent of that parameter.
Methods
The modified covariate method proposed by Tian et al. (2014) is a parametric approach to estimating the conditional average treatment effect (CATE) as well as identifying significant subgroups. This method has been shown to be applicable to continuous, binary and survival outcomes. We intend to apply this method to a hierarchical structure, using the benefit of the eliminated nuisance parameter to obtain an unbiased estimate of the overall treatment effect, provided an unbiased estimate exists. Simulation studies were undertaken to assess the variance of our treatment effect estimates against that of traditional methods. Sample size estimation calculations for the method were also undertaken.
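A minimal sketch of the modified covariate idea for a continuous outcome in a 1:1 randomised trial is shown below (simulated, non-hierarchical data; the hierarchical extension described above is not implemented):

```python
import numpy as np

# Tian et al. (2014)-style modified covariates: with T coded -1/+1 and 1:1 randomisation,
# regressing y on Z*T/2 targets the treatment-covariate interaction without modelling
# the nuisance main effects.
rng = np.random.default_rng(7)
n, p = 2000, 3
Z = rng.normal(size=(n, p))                       # baseline covariates
T = rng.choice([-1.0, 1.0], size=n)               # randomised treatment coded -1/+1
main = 1.0 + Z @ np.array([0.8, -0.5, 0.3])       # nuisance main effects
inter = Z @ np.array([0.6, 0.0, -0.4])            # true treatment-covariate interaction
y = main + 0.5 * T * inter + rng.normal(size=n)

W = np.column_stack([T / 2, Z * T[:, None] / 2])  # modified intercept and covariates
coef, *_ = np.linalg.lstsq(W, y, rcond=None)
print("estimated interaction coefficients:", coef[1:].round(2))   # approx (0.6, 0.0, -0.4)
```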
Results
Our results show that the modified covariate method consistently allowed a treatment effect estimate with smaller variance than that of current methods, even when subgroup sizes were unequal and when the model included small subgroups. The sample sizes needed for this method are lower than those of other frequentist estimation methods, which often obtain more accurate estimates as the sample size and subgroup sizes increase.
Conclusion
This new approach offers advantages over current frequentist and Bayesian methods. The parametric approach allows for less uncertainty around choosing appropriate parameters than the Bayesian methods, and requires a smaller sample size than current frequentist methods. The method obtained smaller variances around the overall treatment effect estimate than both types of methods.
Reference
Tian, L., Alizadeh, A. A., Gentles, A. J., Tibshirani, R. A Simple Method for Estimating Interactions Between a Treatment and a Large Number of Covariates. Journal of the American Statistical Association, 109:508, 1517-1532. 2014. https://doi.org/10.1080/01621459.2014.951443
posters-monday: 74
Sample size re-estimation for McNemar's test in a prospective randomized clinical trial on childhood glaucoma
Markus Schepers1, Esther Hoffmann2, Julia Stingl2, Anne Ehrlich3, Claudia Wolf3, Thomas Dietlein4, Ingeborg Stalmans5, Irene Schmidtmann1
1IMBEI, University of Mainz, Germany; 2Department of Ophthalmology, University Medical Centre, University of Mainz, Germany; 3IZKS Mainz, Germany; 4Department of Ophthalmology, University Hospital of Cologne, Germany; 5Department of Ophthalmology, UZ Leuven, Belgium
In clinical trials involving paired binary data, such as those analyzed with McNemar's test, crucial parameters such as the correlation between the paired outcomes and the proportion of discordant pairs significantly impact the test's power. However, these parameters are often unknown at the design stage, complicating sample size planning. We develop sample size re-estimation strategies for McNemar's test, motivated by the PIRATE study - a prospective, multi-center, observer-blinded clinical trial comparing standard trabeculotomy with micro-catheter-assisted trabeculotomy for treating childhood glaucoma. The trial involves centers in Mainz and Cologne (Germany) and Leuven (Belgium). For a fixed effect size, the power of McNemar's test decreases with a higher proportion of discordant pairs and increases with greater correlation between paired observations. However, knowledge about the correlation between paired observations is often limited before the start of a clinical trial. Therefore, adaptive sample size adjustments when interim analyses reveal a certain fraction of discordant pairs are desirable. We propose practical, generalized recommendations for adaptive designs in studies involving McNemar's test with uncertainty about the parameters relevant for power, re-estimating the sample size based on interim estimates of these key parameters. Our recommendations aim at maximizing the conditional power while maintaining the type I error.
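For illustration, the sketch below uses a Connor-type normal approximation for the power and sample size of McNemar's test, which makes explicit how the required number of pairs changes when the interim discordant fraction differs from the planning assumption; all numbers are hypothetical:

```python
import math
from scipy.stats import norm

# Approximate power and sample size for McNemar's test (normal approximation):
# psi = proportion of discordant pairs, delta = p10 - p01 (treatment difference).
def mcnemar_power(n, psi, delta, alpha=0.05):
    z_a = norm.ppf(1 - alpha / 2)
    return norm.cdf((abs(delta) * math.sqrt(n) - z_a * math.sqrt(psi)) /
                    math.sqrt(psi - delta**2))

def mcnemar_n(psi, delta, alpha=0.05, power=0.8):
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    return math.ceil((z_a * math.sqrt(psi) + z_b * math.sqrt(psi - delta**2))**2 / delta**2)

# Hypothetical planning values: if the interim data suggest more discordance than
# assumed (psi 0.5 instead of 0.3), the required number of pairs increases.
print(mcnemar_n(psi=0.3, delta=0.15), mcnemar_n(psi=0.5, delta=0.15))
print(mcnemar_power(n=120, psi=0.3, delta=0.15))
```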
posters-monday: 75
Bayesian bivariate analysis of phase II basket trials enabling borrowing of information
Zhi Cao1, Pavel Mozgunov1, Haiyan Zheng2
1University of Cambridge; 2University of Bath
Introduction: Phase II clinical trials focus primarily on establishing the early efficacy of a new treatment, while the importance of continued toxicity monitoring cannot be ignored. In the era of precision medicine, basket trials, in which a biomarker-driven treatment is evaluated in various patient sub-populations sharing a common disease feature (e.g., a genomic aberration), have gained increasing attention. Thus, borrowing of information across similar patient (sub-)groups is essential to expedite drug development.
Method: We propose a robust Bayesian hierarchical model that can integrate and analyse clinically relevant differences in toxicity and efficacy, while accounting for possible patient heterogeneity and the correlation between the efficacy and toxicity effects. For practical reasons, toxicity responses are treated as binary observations, and the efficacy outcomes are assumed to be normally distributed. Our model can be viewed as a two-dimensional extension of the exchangeable-nonexchangeable (EXNEX [1]) method: flexible weights are assigned to mixture distributions that imply different borrowing structures for toxicity and efficacy, namely bivariate EX, bivariate NEX, and EX in either toxicity or efficacy with NEX in the other.
Results & Conclusion: Compared with standard Bayesian hierarchical modelling and stand-alone analysis, simulation results of operating characteristics show that our models perform robustly in terms of (the Bayesian analogues of) type I error and power, especially when only toxicity effects are exchangeable (vice versa). The proposed method also has higher power than independently applying the EXNEX method to toxicity and efficacy treatment effects when they are obviously correlated and dissimilar.
Discussion: We give specific model recommendations for various clinical scenarios based on our simulation study of the joint evaluation of treatment effects. Possible future directions to our proposal are the sample size re-estimation and time-to-event extension.
[1] Neuenschwander, Beat et al. “Robust exchangeability designs for early phase clinical trials with multiple strata.” Pharmaceutical statistics vol. 15,2 (2016): 123-34. doi:10.1002/pst.1730
posters-monday: 76
Usefulness of the blinded sample size re-estimation for dose-response trials with MCP-Mod
Yuki Fukuyama1,2, Gosuke Homma3, Masahiko Gosho4
1Biostatistics and Data Sciences, Nippon Boehringer-Ingelheim Co., Ltd, Tokyo, Japan; 2Graduate School of Comprehensive Human Sciences, University of Tsukuba, Tsukuba, Japan; 3Quantitative Sciences, Astellas Pharma Inc, Chuo-ku, Tokyo, Japan; 4Department of Biostatistics, Institute of Medicine, University of Tsukuba, Tsukuba, Japan
Background / Introduction
Sample size calculation requires the assumption of a mean difference and common variance for continuous outcomes. It is often difficult to specify an appropriate value of the variance at the planning stage of a clinical trial, and its misspecification results in an unnecessarily large or small sample size. To mitigate such misspecification, blinded sample size re-estimation (BSSR) has been proposed. BSSR uses only accumulated data in a blinded manner for variance estimation and is thus easy to implement. Several variance estimators have been proposed for BSSR in two-arm trials. Recently, a multiple comparison procedure with modelling techniques (MCP-Mod) has become more common in dose-response trials, as it addresses model uncertainty by specifying a set of candidate dose-response models. Nonetheless, no BSSR method for dose-response trials with MCP-Mod has been proposed. We extend variance estimators originally developed for BSSR in two-arm trials and investigate their usefulness in dose-response trials with MCP-Mod.
Methods
For BSSR in dose-response trials with MCP-Mod, we investigate four variance estimators: the unblinded pooled variance estimator, blinded one-sample variance estimator (OS), bias adjusted blinded one-sample variance estimator (bias adjusted OS), and blinded variance estimator using information on randomization block size. We conduct a simulation study to evaluate operating characteristics including the type I error rate and power. Furthermore, to clarify the discrepancy between the actual and nominal power under the final sample size after BSSR, we investigate biases in the point estimates at the end of the trial.
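The sketch below illustrates the OS and bias-adjusted OS variance estimators in their original two-arm form, together with a generic sample size update; the MCP-Mod-specific power calculation that would use these estimators is not reproduced:

```python
import numpy as np
from scipy.stats import norm

# Blinded variance estimators (two-arm form, hypothetical interim data): the OS estimator
# is the pooled variance of the blinded observations; the bias-adjusted OS subtracts the
# inflation delta^2/4 implied by the assumed treatment difference under 1:1 allocation.
rng = np.random.default_rng(3)
sigma_true, delta_assumed = 8.0, 4.0
blinded = np.concatenate([rng.normal(0, sigma_true, 60),
                          rng.normal(delta_assumed, sigma_true, 60)])  # group labels unknown

os_var = blinded.var(ddof=1)                         # blinded one-sample (OS) estimator
adj_var = os_var - delta_assumed**2 / 4              # bias-adjusted OS

def n_per_arm(sigma2, delta, alpha=0.05, power=0.8):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return int(np.ceil(2 * sigma2 * z**2 / delta**2))

print("OS:", round(os_var, 1), "-> n/arm", n_per_arm(os_var, delta_assumed))
print("adjusted OS:", round(adj_var, 1), "-> n/arm", n_per_arm(adj_var, delta_assumed))
```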
Results
BSSR based on the OS can control the type I error rate and ensure the target power even if the true variance differs from the assumed one. On the other hand, BSSR based on the bias adjusted OS and blinded variance estimator using information on randomization block size can control the type I error rate, but cannot always ensure the target power. Furthermore, it is found that the point estimates are biased after the BSSR based on the OS and bias adjusted OS.
Conclusion
Although the point estimate is biased after BSSR based on the OS, it is the only method that both controls the type I error rate and ensures the target power. Therefore, we recommend using OS-based BSSR to mitigate the misspecification of the variance at the trial design stage for dose-response trials. Further investigation of other endpoints (e.g., binary, count, and time-to-event) may be an avenue for future research.
posters-monday: 77
Quantification of allocation bias in clinical trials under a response-adaptive randomization procedure for binary response variables
Vanessa Ihl, Ralf-Dieter Hilgers
RWTH Aachen University / Uniklinik Aachen, Germany
Background:
Randomized clinical trials have the mitigation of bias as one of their main goals. The (un-)predictability of assigning a patient to a treatment arm is influenced by different aspects, one of which is allocation bias. Allocation bias describes the selective allocation of patients by the recruiter, influenced by their opinion on, e.g., which arm is best or which arm has a higher probability of being allocated next. This selection is based on patient characteristics that influence the expected response, biasing the response and therefore affecting the results of a trial.
Response-adaptive randomization (RAR) promises to treat more patients with the more effective treatment compared to classical approaches. Recently, it has been the subject of increased interest and initial use in trials, as it is said to have higher success rates: it is expected that more patients receive the better treatment without compromising the results or requiring more patients for the trial. Specifically for rare diseases, where one expects to include a large proportion of all diseased persons, it is desirable to treat more patients with the better treatment during phase II/III trials. So far, allocation bias has not been investigated in this setting.
Methods:
We consider a single-center, two-arm parallel-group design with a binary, binomially distributed primary endpoint, in which the doubly adaptive biased coin design is applied for allocating patients to a treatment arm. Further, we apply a testing strategy that takes the adaptive nature of the randomization procedure into account. We implement the procedure in simulations and quantify the allocation bias, with a special focus on rare diseases. Different assumptions for the allocation bias are investigated, including strict biasing policies and higher values for the effect of the bias.
Results:
Our first results indicate that even when the allocation bias is very strong, some RAR procedures are hardly influenced. Generally, the responses in the simulation study seem to be only weakly affected by allocation bias. For specific strategies, however, it is important to model possible bias effects. Further simulations are still in progress; the upcoming results are expected to strengthen this hypothesis.
Conclusion:
RAR trials can successfully reduce concerns about allocation bias for certain procedures. In some cases, it is useful to model the bias in the trial analysis if it cannot be addressed initially in the design of the trial.
posters-monday: 78
Assessment of Assay Sensitivity in Non-Inferiority Trials Using Aggregate Data from a Historical Trial: A Population Adjustment Approach
Eisuke Hida1, Satomi Okamura2, Tomoharu Sato3
1The University of Osaka, Japan; 2The University of Osaka Hospital; 3Hiroshima City University
Background: In non-inferiority (NI) trials lacking assay sensitivity, an ineffective treatment may be found non-inferior, potentially leading to an erroneous efficacy assessment. Therefore, a 3-arm NI trial with placebo, test treatment and control treatment is considered the gold standard and is recommended by several guidelines, such as ICH-E10. However, due to ethical and feasibility concerns regarding the inclusion of a placebo, the practical implementation of 3-arm NI trials remains limited. As a result, a useful method for evaluating assay sensitivity in 2-arm NI trials is needed.
Objective: We propose a practical method for confirming assay sensitivity in 2-arm NI trials. This method evaluates the assay sensitivity of the NI trial after adjusting for the distribution of covariates using a population adjustment method, applied to the summary statistics of historical trial data and the individual patient data (IPD) of the NI trial.
Method: To assess assay sensitivity, it is necessary to demonstrate that the acceptable minimum effective value of the test treatment in a 2-arm NI trial is superior to the placebo (Hida & Tango, 2018). Since a placebo is not included in 2-arm NI trials, historical trial results must be used as external information. To evaluate assay sensitivity in NI, an adjustment method is required to align the patient characteristics from a historical trial to the distribution of IPD in the NI trial. In other words, this approach is the reverse of Matching Adjusted Indirect Comparison (MAIC) or Simulated Treatment Comparison (STC). This proposed method evaluates assay sensitivity of the NI trial by estimating the average treatment effect of a historical trial in the population of the NI trial (IPD) through a combination of MAIC and inverse probability weighting (IPTW). We investigated the performance and practicality of the proposed methods through simulations based on several scenarios using clinical trial data.
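As an illustration of the MAIC building block in its standard direction (weighting IPD so that covariate means match reported aggregate means), a minimal sketch with hypothetical data is given below; the proposed reversal and its combination with IPTW are not reproduced:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
ipd_x = np.column_stack([rng.normal(62, 10, 300),      # age (years)
                         rng.binomial(1, 0.45, 300)])  # female (0/1)
target_means = np.array([58.0, 0.55])                  # aggregate means from the other trial

xs = (ipd_x - target_means) / ipd_x.std(axis=0)        # centre at target means, scale for stability

def q(a):
    # Convex MAIC objective: its minimiser makes the weighted covariate means equal the target
    return np.exp(xs @ a).sum()

a_hat = minimize(q, np.zeros(xs.shape[1]), method="Nelder-Mead",
                 options={"xatol": 1e-6, "fatol": 1e-9, "maxiter": 2000}).x
w = np.exp(xs @ a_hat)

print("weighted means:", (w[:, None] * ipd_x).sum(axis=0) / w.sum())   # ~ (58.0, 0.55)
print("effective sample size:", round(w.sum() ** 2 / (w @ w), 1))
```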
Results and Conclusions: Although the proposed method relies on external information, which may result in a lower level of evidence compared to the gold standard design, this method suggests that it is useful for evaluating assay sensitivity in NI trials and supporting decision-making.
posters-monday: 79
Exploring methods for borrowing evidence across baskets or subgroups in a clinical trial: a simulation study
Wenyue Li, Becky Turner, Duncan Gilbert, Ian White
University College London (UCL), United Kingdom
Introduction:
Basket trials are designed to study a single targeted therapy in the context of multiple diseases or disease types sharing common molecular alterations. To draw adequate inference about small baskets, approaches for borrowing evidence become crucial. We aimed to quantify the benefits of information borrowing and to compare the performance of various methods for a proposed phase III basket trial studying a novel immunotherapy for patients with mucosal squamous cell cancers in two common and four rare cancer sites.
Methods:
We simulated six scenarios with different patterns of variation in true treatment effects of a time-to-event outcome across sites. Scenarios 1, 2 and 3 assumed high, low and moderate variation, while Scenarios 4 and 5 assumed contradictory data for the common sites with high or low variation among the remaining sites, respectively. Scenario 6 assumed similar estimates for the common sites only. We estimated a two-stage random-effects meta-analysis model using restricted maximum likelihood or Bayesian estimation while incorporating different priors for the between-site variance. We also implemented an empirical Bayes method and one-stage Bayesian hierarchical approaches using a Bayesian hierarchical model (BHM) and an exchangeability-nonexchangeability (ExNex) model. We conducted 1000 simulations to compare the performance of all methods to a standalone analysis.
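For the two-stage approach, a minimal DerSimonian-Laird sketch with empirical-Bayes shrinkage of site-specific estimates is shown below (hypothetical log hazard ratios and standard errors, not the simulated trial data):

```python
import numpy as np

# Two-stage random-effects sketch: DerSimonian-Laird between-site variance, pooled effect,
# and empirical-Bayes shrinkage pulling imprecise (rare-site) estimates towards the pooled value.
log_hr = np.array([-0.45, -0.38, -0.10, -0.60, 0.05, -0.30])   # 2 common + 4 rare sites
se     = np.array([ 0.15,  0.18,  0.40,  0.45, 0.50,  0.42])

w_fixed = 1 / se**2
mu_fixed = np.sum(w_fixed * log_hr) / w_fixed.sum()
Q = np.sum(w_fixed * (log_hr - mu_fixed)**2)
tau2 = max(0.0, (Q - (len(log_hr) - 1)) /
           (w_fixed.sum() - np.sum(w_fixed**2) / w_fixed.sum()))

w_re = 1 / (se**2 + tau2)
mu_re = np.sum(w_re * log_hr) / w_re.sum()

shrink = tau2 / (tau2 + se**2)                   # shrinkage factor per site
site_est = shrink * log_hr + (1 - shrink) * mu_re
print("tau2:", round(tau2, 3), "pooled log HR:", round(mu_re, 3))
print("shrunken site estimates:", site_est.round(3))
```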
Results:
The standalone method performed the worst in precision, mean squared error and power despite its robustness for bias. On the other hand, the Bayesian meta-analysis method with a strongly informative prior was the most precise while producing very large biases under most scenarios, except Scenario 2 where the empirical Bayes method appeared to be the most precise. However, a substantial undercoverage was found for the empirical Bayes method under Scenario 2 and for the Bayesian meta-analysis method with a strongly informative prior under Scenarios 1, 4, 5 and 6. The ExNex model resulted in fairly low biases under most scenarios, whereas the BHM achieved considerably higher precision and power than the former for the rare sites under Scenario 2.
Conclusions:
Our work demonstrated precision and power gains from using proposed information-borrowing methods rather than a standalone analysis. We also demonstrated sensitivity of the results to the choice of prior for the between-site heterogeneity. To provide further guidance for practice, we recommended using a vague prior for the Bayesian meta-analysis method when treatment effect heterogeneity is likely to be limited. Moreover, we recommended using the ExNex model when contradictory true treatment effects are likely to exist.
posters-monday: 80
Comparing The ED50 Between Treatment Groups Using Sequential Allocation Trials.
Teresa Engelbrecht, Alexandra Graf
Medical University Vienna, Austria
Determining the median effective dose (ED50) is a fundamental objective in the field of anaesthesia research, both in human and animal studies. Sequential allocation methods, such as the Up-and-Down Method (UDM) and the Continual Reassessment Method (CRM), offer an efficient method of dose allocation based on the responses of previous subjects, thus reducing the required sample size compared to traditional study designs with fixed sample sizes [1].
Motivated by previous studies [2,3], we aim to evaluate methods for comparing the ED50 across different treatment groups. While sequential allocation methods such as the UDM and the CRM are well described for estimating the ED50, only limited literature is available on comparing the ED50 between several treatment groups. To evaluate the advantages and limitations of sequential allocation methods in comparison to traditional fixed-sample designs, we conducted simulation studies across various scenarios. Our analysis assessed the power and type I error of the UDM and the CRM, as well as of logistic regression with a fixed sample size, to determine their respective strengths and weaknesses in estimating and comparing ED50 values across treatment groups.
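A minimal Up-and-Down Method sketch is given below (assumed logistic dose-response and a simple ED50 summary based on doses allocated after the first reversal; this is not the estimator or the scenarios of our simulation study):

```python
import numpy as np

# Up-and-Down Method sketch: the dose goes down one step after a response and up one
# step after a non-response; the ED50 is summarised by the mean allocated dose after
# the first reversal (one of several common estimators).
def simulate_udm(ed50_true, slope=4.0, doses=np.arange(0.5, 4.01, 0.25),
                 n_subjects=30, start=2.0, seed=0):
    rng = np.random.default_rng(seed)
    idx = int(np.argmin(np.abs(doses - start)))
    allocated, responses = [], []
    for _ in range(n_subjects):
        d = doses[idx]
        p = 1 / (1 + np.exp(-slope * (d - ed50_true)))   # probability of response at dose d
        y = rng.random() < p
        allocated.append(d)
        responses.append(y)
        idx = max(idx - 1, 0) if y else min(idx + 1, len(doses) - 1)
    first_rev = next((i for i in range(1, n_subjects)
                      if responses[i] != responses[i - 1]), 1)
    return np.mean(allocated[first_rev:])

print("estimated ED50:", round(simulate_udm(ed50_true=1.8), 2))
```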
[1] Görges M, Zhou G, Brant R, Ansermino JM. Sequential allocation trial design in anesthesia: an introduction to methods, modeling, and clinical applications. Paediatr Anaesth. 2017;27(3):240-247. doi:10.1111/pan.13088
[2] Müller J, Plöchl W, Mühlbacher P, Graf A, Stimpfl T, Hamp T. The Effect of Pregabalin on the Minimum Alveolar Concentration of Sevoflurane: A Randomized, Placebo-Controlled, Double-Blind Clinical Trial. Front Med (Lausanne). 2022;9:883181. Published 2022 May 3. doi:10.3389/fmed.2022.883181
[3] Müller J, Plöchl W, Mühlbacher P, Graf A, Kramer AM, Podesser BK, Stimpfl T, Hamp T. Ethanol reduces the minimum alveolar concentration of sevoflurane in rats. Sci Rep. 2022;12(1):280. Published 2022 Jan 7. doi:10.1038/s41598-021-04364-8
posters-monday: 81
A pre-study look into post-study knowledge: communicating the use(fulness) of pre-posteriors in early development design discussions
Monika Jelizarow
UCB Biosciences GmbH, Germany
When designing a clinical study we make assumptions on our drug's true treatment effect, for the endpoint of interest. These assumptions are based on existing data and/or expert belief, that is, they are based on some form of evidence synthesis. In the Bayesian framework, this evidence synthesis will result in a design prior distribution representing our current knowledge about the true treatment effect. A pre-posterior can be interpreted as a conditional posterior distribution representing the updated knowledge about the true treatment effect at the end of our future study given only that we know that a certain study outcome (i.e. success or failure) has been met (Walley et al., 2015; Grieve, 2024). Thus, pre-posteriors enable a look into future updated evidence (this is the 'post' part) before running the future study (this is the 'pre' part). This opens the door to help answer many questions statisticians are often asked by their clinical colleagues in proof-of-concept settings, e.g. 'If the study will be successful, what will this make us learn about the true treatment effect?' or 'Does running the study de-risk our program? How would it compare to running the study with more (or fewer) patients?'. Shaped by experiences gained in our organisation, the goal of this contribution is to propose a question-led and visualisation-informed workflow for how to effectively communicate the (use)fulness of this quantitative tool in discussions with stakeholders. The importance of early contextualisation will be emphasised, and supported by illustrating the relationship between, for example, the pre-posterior of success and the unconditional probability of success (PoS), also known as assurance.
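The sketch below illustrates how a pre-posterior can be obtained by simulation under an assumed normal design prior and a simple two-arm trial model (all numbers are hypothetical): draw the true effect from the design prior, simulate the future trial, and retain the draws for which the study outcome "success" is met.

```python
import numpy as np
from scipy.stats import norm

# Simulation sketch of a pre-posterior and of the probability of success (assurance).
rng = np.random.default_rng(11)
n_sim, n_per_arm, sigma = 100_000, 60, 10.0
theta = rng.normal(3.0, 2.0, n_sim)                      # design prior for the true effect
se = sigma * np.sqrt(2 / n_per_arm)
theta_hat = rng.normal(theta, se)                        # future trial estimates
success = theta_hat / se > norm.ppf(0.975)               # one-sided significance at 2.5%

pos = success.mean()                                     # unconditional probability of success
pre_post_success = theta[success]                        # pre-posterior of theta given success
print(f"PoS = {pos:.2f}")
print("pre-posterior given success: mean %.2f, 95%% interval (%.2f, %.2f)"
      % (pre_post_success.mean(), *np.percentile(pre_post_success, [2.5, 97.5])))
```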
posters-monday: 82
Estimation and testing methods for delayed-start design as an alternative to single-arm trials in small clinical trials
Tomoharu Sato1,2, Eisuke Hida2
1Hiroshima City University, Japan; 2The University of Osaka, Japan
Introduction and Objective(s):
Traditional randomised controlled trial designs are difficult to implement in small populations, such as in rare disease and paediatric disease areas. Various methodological and statistical considerations have been reported for such small clinical trials [1, 2]. For feasibility reasons, many single-arm trials of the test drug alone are still being conducted, allowing the evaluation of within-patient comparisons. In single-arm trials, the efficacy of a test drug is assessed against a pre-specified threshold. However, it is well known that even if the treatment effect is better than the threshold in a well-controlled single-arm trial, the estimate of the treatment effect is subject to bias. Therefore, simple estimates from single-arm trials may make it difficult to draw valid conclusions about efficacy. In such situations, it is also desirable to be able to estimate the true effect size of the test drug without the influence of bias. In this study, we propose a delayed-start design as an alternative to single-arm trials, together with a method for estimating and testing treatment effects.
Method(s) and Results:
We propose a method for estimating and testing treatment effects using a delayed-start design. In a delayed-start design, a randomised controlled trial is conducted in the first period and a single-arm trial in the second period, allowing estimation of the treatment effect and of the difference between the two treatment effects. Various factors, such as disease and treatment characteristics, determine the 'estimand' and alter the modelling; we provide model-specific estimation and testing methods and interpretations. We show that, with appropriate use of delayed-start designs, it is possible to estimate the difference between two treatment effects, in addition to assessing efficacy by comparison with a pre-specified threshold, as in single-arm trials. A numerical study is also used to assess their performance and to give model-specific interpretations.
Conclusions:
Delayed start designs with appropriate modelling for the primary endpoints may be more effective than single-arm trials for pragmatic small clinical trials in rare and paediatric disease areas.
Keywords: small clinical trials, delayed-start design
References:
[1] IOM. Small clinical trials. issues and challenges (2001). [2] CHMP. Guideline on clinical trials in small populations (2006).
posters-monday: 83
Dealing with missing values in adaptive N-of-1 trials
Juliana Schneider1, Maliha Raihan Pranti2, Stefan Konigorski1,3,4
1Hasso-Plattner-Institute, Germany; 2University of Potsdam, Germany; 3Hasso Plattner Institute for Digital Health at Mount Sinai; 4Icahn School of Medicine at Mount Sinai
N-of-1 trials are multi-crossover trials in single participants, designed to estimate individual treatment effects. Participants alternate between phases of intervention and one or more alternatives in trials that often have only few data points. In response-adaptive designs of N-of-1 trials, trial length and burden due to ineffective treatment can be reduced by allocating treatments adaptively based on interim analyses. Bayesian approaches are directly applicable by updating posterior beliefs about effectiveness probabilities. Furthermore, they allow inference for both individual and aggregated effects. Missing values occur, for instance, due to commonly reported wavering adherence to the treatment schedule and other personal or external factors. This may happen randomly throughout the trial (Missing Completely At Random) or dependent on other factors such as severity of symptoms addressed in the trial or time. Missing values require adjusting the adaptive allocation mechanism appropriately, but the best approaches for short adaptive N-of-1 trials are not known. In fact, careful imputation of such missing values is crucial, since sequential treatment allocation depends on past outcome values.
Here, we investigate the performance of different imputation methods for missing values in simulated adaptive N-of-1 trials. The imputation approaches use information either from only the respective individual or from all participants, and the adaptive N-of-1 trials are set up in a Bayesian-bandit design using Thompson Sampling. We evaluate the different imputation approaches in a simulation study of 1000 synthetic adaptive N-of-1 trials, comparing two alternate treatments and their association with a normally distributed outcome. We compare posterior descriptive and inference metrics for adaptive trajectories with and without missing values. More precisely, we juxtapose the posterior means and variances of the fully observed and partly observed trial sequences against each other and the underlying true distribution, as well as study the Kullback-Leibler divergences among them. This serves to investigate the impact of data missingness and different imputation methods on bias and efficiency in treatment effect difference estimation.
Preliminary results indicate that the optimal imputation method in a given situation depends on whether analysis is intended on the aggregated or individual level. Moreover, the amount of missingness within and between trial participants impacts imputation results. Lastly, time-dependent associations between measurements and missingness may alter the success of various imputation methods. Future research may include such time-dependencies both in the simulated data as well as in suitable imputation methods.
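A minimal sketch of one such simulated trial is given below (assumed normal outcomes with known variance, conjugate normal priors, Thompson Sampling allocation, and a simple posterior-mean imputation of missing outcomes at the individual level; this is an illustration, not the exact simulation design described above):

```python
import numpy as np

# One simulated adaptive N-of-1 trial with Thompson Sampling and simple imputation.
rng = np.random.default_rng(2024)
true_mean = {"A": 5.0, "B": 3.5}                 # hypothetical outcome scores (higher is better)
obs_sd, prior_mean, prior_var = 2.0, 4.0, 4.0
post = {t: [prior_mean, prior_var] for t in true_mean}   # posterior mean and variance per treatment

def update(m, v, y, s2=obs_sd**2):
    """Conjugate normal update with known observation variance."""
    v_new = 1 / (1 / v + 1 / s2)
    return v_new * (m / v + y / s2), v_new

for period in range(20):
    draws = {t: rng.normal(m, np.sqrt(v)) for t, (m, v) in post.items()}
    chosen = max(draws, key=draws.get)           # Thompson Sampling allocation
    y = rng.normal(true_mean[chosen], obs_sd)
    if rng.random() < 0.2:                       # ~20% of outcomes go missing
        y = post[chosen][0]                      # impute with the current posterior mean
    post[chosen] = list(update(*post[chosen], y))

print({t: round(m, 2) for t, (m, v) in post.items()})
```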
posters-monday: 84
Adaptive clinical trial design with delayed treatment effects using elicited prior distributions
James Salsbury1, Jeremy Oakley1, Steven Julious1, Lisa Hampson2
1University of Sheffield, United Kingdom; 2Advanced Methodology and Data Science, Novartis Pharma AG, Switzerland
Randomized clinical trials (RCTs) are essential for evaluating new treatments, but modern therapies such as immunotherapies present challenges, as delayed treatment effects often occur. These delayed effects complicate trial design by leading to premature futility decisions or inefficient trials with excessive sample sizes and extended durations. Additionally, the proportional hazards assumption, commonly used in survival analysis, may be violated in the presence of time-varying treatment effects.
Adaptive trial designs provide a flexible alternative, allowing modifications based on accumulating data. However, in the context of delayed treatment effects, incorporating prior knowledge about uncertain parameters, such as delay duration and effect magnitude, can significantly enhance trial efficiency. Eliciting prior distributions for these parameters provides a structured approach to account for uncertainty, helping guide trial decisions and improve design robustness.
We present a framework for adaptive clinical trials that explicitly incorporates elicited priors to account for delayed treatment effects. We propose adaptive strategies such as dynamic interim analysis, and efficacy/futility stopping rules, which can be informed by prior distributions. Simulations compare the performance of adaptive designs to traditional fixed designs, demonstrating the benefits of using priors to improve trial efficiency and decision-making.
Our methods aim to reduce inefficiencies and support real-time decision-making, ultimately advancing the evaluation of new therapies.
|