Conference Agenda

Session Overview
Session: Multi-omics data integration
Time: Monday, 25/Aug/2025, 4:00pm - 5:30pm

Location: ETH E23 (D-BSSE, ETH, 84 seats)

Presentations
17-multi-omics: 1

Unsupervised Factor-Based Methods for Multi-Omics Data Integration

Bernard Isekah Osang’ir1,2, Jürgen Claesen2,3, Ziv Shkedy2, Surya Gupta1

1Belgian Nuclear Research Centre (SCK•CEN), Mol, Belgium; 2I-Biostat, Hasselt University, Diepenbeek, Hasselt, Belgium; 3Department of Epidemiology and Data Science, Amsterdam UMC, Amsterdam, The Netherlands

Background: Multi-omics data integration is essential for advancing precision medicine and systems biology, allowing for a holistic understanding of complex biological processes. Unsupervised factor-based methods offer a powerful approach for identifying latent patterns within high-dimensional and heterogeneous biomolecular datasets. However, their performance in integrating diverse omics modalities remains underexplored. This study systematically benchmarks three widely used factor-based methods, Multi-Omics Factor Analysis (MOFA), Multiple Factor Analysis (MFA) and Group Factor Analysis (GFA), and proposes Factor Analysis for Bicluster Acquisition (FABIA) as a new method in the context of multi-omics data integration.

Methods: We evaluated these methods on real-world and simulated multi-omics datasets. The real datasets comprised (1) a Chronic Lymphocytic Leukemia (CLL) study integrating DNA methylation and drug response profiles and (2) an experimental radiation dataset capturing transcriptomic and proteomic data from irradiated mouse brain tissue. Using the R package SUMO, we simulated datasets with predefined latent factors, varying noise levels, and different data distributions. Each method was assessed on its ability to (i) recover the predefined latent factors, (ii) capture shared and unique variance components, and (iii) remain robust across noise conditions. Performance was measured with the Jaccard index, the Pearson correlation of factor scores and feature weights, and classification metrics such as sensitivity, specificity, and accuracy. Additionally, we explored semi-supervised integration, in which FABIA is used for guided feature selection to enhance signal discovery.
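To make the evaluation criteria concrete, the sketch below scores how well one estimated factor model recovers simulated ground truth. It assumes that the true and estimated factor-score matrices (samples x factors) and the sets of truly active and selected features are available; the function names and the matching rule are illustrative assumptions, not the benchmark's actual code. Because factor models are identifiable only up to sign and order, each true factor is matched to the estimated factor with the largest absolute Pearson correlation.

```python
import numpy as np


def jaccard(selected, truth):
    """Jaccard index |A ∩ B| / |A ∪ B| between two feature sets."""
    selected, truth = set(selected), set(truth)
    union = selected | truth
    return len(selected & truth) / len(union) if union else 1.0


def match_factors(z_true, z_est):
    """Match each true factor to the estimated factor with the largest |r|.

    Factor models are identifiable only up to sign and order, so the
    absolute Pearson correlation is used. This per-factor argmax can map two
    true factors to the same estimate; the Hungarian algorithm would enforce
    a one-to-one match instead."""
    k_true = z_true.shape[1]
    # np.corrcoef treats rows as variables, hence the transposes.
    corr = np.abs(np.corrcoef(z_true.T, z_est.T)[:k_true, k_true:])
    return corr.argmax(axis=1), corr.max(axis=1)


def selection_metrics(selected, truth, n_features):
    """Sensitivity, specificity and accuracy of a selected feature set."""
    selected, truth = set(selected), set(truth)
    tp = len(selected & truth)
    fp = len(selected - truth)
    fn = len(truth - selected)
    tn = n_features - tp - fp - fn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / n_features,
    }
```

For example, selecting features {0, 1, 2, 5} when the truly active set is {0, 1, 2, 3} (10 features in total) gives a Jaccard index of 0.6 and a sensitivity of 0.75.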

Results: All four methods identified latent biological patterns; however, MOFA and FABIA showed the strongest performance, capturing similar biological signals with high agreement. FABIA excelled at low-to-moderate noise but declined under high noise, whereas MOFA remained stable. MFA appeared robust but struggled under extreme noise, and GFA was the most affected. Semi-supervised learning increased the variance explained and improved feature selection, refining latent factor detection and strengthening integrative multi-omics analysis, particularly in complex and noisy datasets.

Conclusions: Factor-based methods are crucial for multi-omics integration, providing scalable and interpretable solutions for extracting biological insights. Our benchmark highlights their strengths and limitations in handling noise and dataset complexity. Multi-omics data play a key role in understanding disease mechanisms, identifying biomarkers, and advancing precision medicine. As the availability of multi-omics data grows, robust statistical frameworks for their integration are essential for uncovering complex biological relationships and supporting biomedical applications.



17-multi-omics: 2

Multi-modal integration reveals the joint role of electrocardiogram, imaging and genetics in cardiovascular risk

Andrea Mario Vergani1,2, Francesca Ieva1,2, Marco Masseroli1, Emanuele Di Angelantonio2,3

1Politecnico di Milano, Milano, Italy; 2Human Technopole, Milano, Italy; 3University of Cambridge, Cambridge, United Kingdom

Background / Introduction

The recent availability of biobank-scale multi-modal healthcare data offers invaluable opportunities for personalised risk profiling and for studying the biological impact of omics on disease. In the cardiovascular field, multi-modal data, including genetics, electrocardiograms (ECG) and imaging, are widely collected; however, the biological interactions of these complex modalities and their impact across disease subtypes remain unclear. This study therefore leverages omics fusion and survival analysis to explore the value that imaging, ECG and genetics provide for cardiovascular risk prediction, as well as their interplay when integrated.

Methods

Analysing UK Biobank data, including a panel of Polygenic Risk Scores (PRSs) available for 485,000 participants, ECG-derived measures (e.g., QRS duration) from 40,000 individuals, and Cardiac Magnetic Resonance (CMR) measures (e.g., left ventricular ejection fraction) from 30,000 subjects, we used a state-of-the-art omics fusion method, Multi-Omics Factor Analysis, to integrate these modalities into a joint low-dimensional feature space. The embeddings were then used as predictors in survival analyses of about 20,000 subjects who were healthy at baseline and had PRS, ECG and CMR data available, targeting various cardiovascular disease subtypes (e.g., coronary artery disease). Moreover, we analysed the integrated feature space to evaluate the different contributions of PRSs, ECG and CMR to our embeddings and, in turn, to disease risk.
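One common way to quantify how much each modality contributes to each joint factor is the per-view variance decomposition used by Multi-Omics Factor Analysis: for view m, the R^2 of factor k is 1 - ||Y_m - z_k w_mk^T||^2 / ||Y_m||^2 on centred data. The numpy sketch below assumes the shared factors Z and view-specific loadings W_m have already been fitted; the function name and data layout are hypothetical, and the test data are simulated, not UK Biobank data.

```python
import numpy as np


def variance_explained(views, z, loadings):
    """MOFA-style R^2 per view (modality) and per factor.

    views:    dict name -> centred data matrix Y_m (samples x features)
    z:        shared factor scores Z (samples x K)
    loadings: dict name -> view-specific loadings W_m (features x K)
    Returns (per_factor, total): per_factor[name][k] is the R^2 of view m
    reconstructed from factor k alone; total[name] uses all K factors."""
    per_factor, total = {}, {}
    for name, y in views.items():
        w = loadings[name]
        ss = np.sum(y ** 2)
        per_factor[name] = np.array([
            1.0 - np.sum((y - np.outer(z[:, k], w[:, k])) ** 2) / ss
            for k in range(z.shape[1])
        ])
        total[name] = 1.0 - np.sum((y - z @ w.T) ** 2) / ss
    return per_factor, total
```

A factor that loads on several views (for instance, one that is mostly ECG-driven but also explains some CMR variance) shows non-zero R^2 in each of them. When factors are correlated, the per-factor values need not sum to the total.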

Results

When integrating the modalities with Multi-Omics Factor Analysis, we observed a clear interplay between genetic, ECG and CMR information, especially in the leading factors. For example, the first integrated factor was mostly ECG-driven but also explained part of the variance of the PRS and CMR datasets, whereas the second mostly captured CMR measures but also retained cardiovascular-related genetic information. Overall, our latent representation captured the interactions between modalities in a 10-dimensional space, explaining nearly 50% of the variance of the CMR dataset and about 30% each of the PRS and ECG datasets. Together, the three modalities predicted time to cardiovascular event with a mean cross-validated concordance index of 0.71; specifically, the first and second joint factors had statistically significant effects on cardiovascular risk, together with the fourth, which captured exclusively genetic variability.
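The concordance index summarises how well predicted risk orders subjects by their time to event under censoring. A minimal pure-Python sketch of Harrell's C is shown below; the abstract does not state which estimator was used, so this is an assumption. It ignores tied event times and runs in O(n^2), so established survival-analysis packages are preferable for 20,000 subjects.

```python
def concordance_index(times, events, risk):
    """Harrell's C: fraction of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when subject i has an observed event
    (events[i] is truthy) and times[i] < times[j]; it is concordant when
    risk[i] > risk[j]. Tied risk scores count as half-concordant."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the reported 0.71 lies between the two.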

Conclusion

We propose a novel approach employing population-level multi-omics fusion to integrate ECG, CMR and genetic measures for better-informed cardiovascular risk prediction. The integrated complex modalities improved prognostic disease prediction and demonstrated statistical significance and value beyond traditional clinical covariates, thus contributing to enhanced personalisation for cardiovascular risk stratification.



17-multi-omics: 3

Analyzing Protein Folding Dynamics Using Multi-Dimensional Varying Coefficient Models

Jürgen Claesen

Amsterdam UMC, The Netherlands

Proteins are initially synthesized as unstructured polymers on ribosomes and then fold into functional forms. The folding process is influenced by both intrinsic protein properties and interactions with external factors, leading to behaviors that range from rapid folding to remaining stably unfolded. Protein folding mechanisms can be investigated using pulsed hydrogen-deuterium exchange mass spectrometry (HDX-MS), which tracks both local and global exchange. In a pulsed HDX experiment, proteins are exposed to deuterium for a set period, during which hydrogen atoms exchange with deuterium, resulting in a measurable mass increase. This mass change is measured by a mass spectrometer coupled with liquid chromatography (LC). By monitoring shifts in retention time and mass at various stages of the folding process, the mechanism and rate of folding can be determined.
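The measurable mass increase is often summarised, per peptide, as the shift of the intensity-weighted centroid of its isotopic envelope relative to an undeuterated reference; each exchanged hydrogen adds about 1.00628 Da (the 2H minus 1H mass difference). The sketch below assumes both spectra are sampled on a shared m/z grid for a single charge state and omits back-exchange correction; the function names are hypothetical and this is not the varying coefficient model described in the next paragraph.

```python
import numpy as np

# Mass difference between deuterium (2H) and protium (1H), in Da (~1.00628).
D_MINUS_H = 2.014101778 - 1.007825032


def centroid(mz, intensity):
    """Intensity-weighted mean m/z of an isotopic envelope."""
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    return float(np.sum(mz * intensity) / np.sum(intensity))


def deuterium_uptake(mz, int_undeut, int_deut, charge):
    """Mass increase (Da) and approximate number of exchanged hydrogens."""
    shift_mz = centroid(mz, int_deut) - centroid(mz, int_undeut)
    mass_shift = shift_mz * charge  # convert an m/z shift to a mass shift
    return mass_shift, mass_shift / D_MINUS_H
```

With charge 2, an envelope shifted by two grid points spaced D_MINUS_H / 2 apart corresponds to about two exchanged hydrogens.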

To examine folding differences across multiple proteins, we developed a multi-dimensional varying coefficient model. In this model, protein mass and retention time enter as covariates in one-dimensional smooth functions, and the tensor product of these one-dimensional smooths yields a smooth surface. We also incorporated interactions between categorical variables (protein identity and folding time) and the two smooths (mass and retention time), yielding multiple smooth surfaces. The model, structured with main effects and interactions, allows the estimation and testing of smooth differences between the reference surface and the other surfaces, with results evaluated using simultaneous confidence bands.
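The main-effect-plus-interaction structure can be sketched as a design matrix with one tensor-product surface block for the reference group and one "difference surface" block per other group, each non-zero only for that group's rows. The sketch below is a simplified stand-in under stated assumptions: Gaussian radial basis functions with a ridge penalty replace the penalised splines one would typically use, covariates are assumed rescaled to [0, 1], and simultaneous confidence bands are omitted. All function names are hypothetical.

```python
import numpy as np


def marginal_basis(x, centres, width):
    """Intercept plus Gaussian radial basis functions in one covariate."""
    bumps = np.exp(-0.5 * ((x[:, None] - centres[None, :]) / width) ** 2)
    return np.column_stack([np.ones_like(x), bumps])


def tensor_basis(b1, b2):
    """Row-wise tensor product of two marginal bases: a smooth surface basis."""
    n = b1.shape[0]
    return np.einsum("ij,ik->ijk", b1, b2).reshape(n, -1)


def surface_basis(mass, rt, centres, width):
    return tensor_basis(marginal_basis(mass, centres, width),
                        marginal_basis(rt, centres, width))


def fit_surfaces(mass, rt, group, y, n_groups, lam=1e-3, n_centres=5):
    """Ridge-penalised fit of y = f_0(mass, rt) + sum_g 1[group = g] d_g(mass, rt).

    Covariates are assumed rescaled to [0, 1]; group 0 is the reference.
    Returns surface(g, mass, rt): g = 0 gives the reference surface f_0,
    g >= 1 gives the difference surface d_g relative to the reference."""
    centres = np.linspace(0.0, 1.0, n_centres)
    width = 1.0 / (n_centres - 1)
    te = surface_basis(mass, rt, centres, width)
    p = te.shape[1]
    # Reference block applies to all rows; each interaction block is zero
    # outside its own group, so its coefficients describe d_g.
    blocks = [te] + [te * (group == g)[:, None] for g in range(1, n_groups)]
    X = np.hstack(blocks)
    beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

    def surface(g, mass_new, rt_new):
        basis = surface_basis(mass_new, rt_new, centres, width)
        return basis @ beta[g * p:(g + 1) * p]

    return surface
```

Testing whether a difference surface d_g is zero everywhere is what the simultaneous confidence bands address; this sketch only provides the point estimates.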



17-multi-omics: 4

X-Med: Cross-modal integration for sequential intelligence in aftercare of kidney transplant recipients

Aditya Kumar1, Simon Rauch2, Mario Cypko1, Oliver Amft1,2

1Hahn-Schickard, Freiburg, Germany; 2Intelligent Embedded Systems Lab, University of Freiburg, Germany

Introduction: Early prediction of outcomes for kidney transplant recipients (KTRs), including graft loss and rejection, is crucial for improving post-transplant care. Multimodal data from Electronic Health Records (EHRs) offer complementary insights, combining structured (e.g., demographics, laboratory results, vitals) and unstructured (e.g., clinical notes) information. Structured longitudinal data are often affected by missing values, irregular sampling and asynchronous measurements, while unstructured text demands context-aware processing to extract relevant clinical information. To address these data challenges, we propose a unified patient representation learning approach that models each data modality individually and integrates them into a shared embedding space. The learned patient-level embeddings are evaluated on downstream prediction tasks and provide an interpretable, disentangled feature representation.

Methods: Structured data, including both time-varying and static features, are modelled using a Time-Aware LSTM with self-attention. Static features are reintroduced at each time step to account for their influence over time. The Time-Aware LSTM architecture addresses temporal dependencies, irregular sampling, asynchronous features, and missing data. Unstructured clinical notes are embedded using a pretrained sentence transformer (gte-large) fine-tuned on clinical texts. Representations from both modalities are fused via cross-attention into a shared patient embedding space, and the training process enforces disentanglement of this space. The embeddings are evaluated on outcome prediction tasks (graft loss, rejection, and mortality) using the NephroCAGE dataset [1]. Interpretability is assessed with SHAP values by checking whether the most influential features align with medical domain knowledge. Additionally, we use the mutual information gap (MIG) and separate attribute predictability (SAP) to quantify disentanglement.
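The cross-attention fusion step can be illustrated with single-head scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, in which each time step of the structured sequence queries the set of note embeddings. The abstract does not specify the attention direction, number of heads, projection sizes or pooling, so the configuration below (structured states as queries, concatenation and mean-pooling over time) is a hypothetical sketch; in the actual model the projection matrices would be learned rather than random.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(h_struct, h_notes, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    h_struct: (T, d_s) hidden states of the structured-data sequence model
    h_notes:  (M, d_n) embeddings of M clinical notes
    Returns the attended note values (T, d_v) and attention weights (T, M)."""
    q = h_struct @ w_q  # (T, d_k) queries from structured data
    k = h_notes @ w_k   # (M, d_k) keys from notes
    v = h_notes @ w_v   # (M, d_v) values from notes
    weights = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)
    return weights @ v, weights


def patient_embedding(h_struct, h_notes, w_q, w_k, w_v):
    """Concatenate each time step with its attended notes, then mean-pool."""
    attended, _ = cross_attention(h_struct, h_notes, w_q, w_k, w_v)
    fused = np.concatenate([h_struct, attended], axis=1)
    return fused.mean(axis=0)
```

Each row of the attention weights sums to one, so it can be read as how strongly a given time step draws on each note.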

Results: Preliminary results demonstrate that our model achieves state-of-the-art performance in predicting graft loss and rejection (ROC-AUC of 0.95 and 0.81, respectively). SHAP analyses indicate that the model captures established risk factors for kidney patients. Ablation studies confirm that complementary information from the different modalities is effectively integrated. Ongoing work focuses on evaluating disentanglement using MIG and SAP to optimise training settings without compromising predictive performance.

Conclusion: Our results highlight the potential of multimodal patient representations for outcome prediction in kidney transplantation. Integrating structured and unstructured data improves performance, yet balancing predictive power and disentanglement remains challenging. Future work will explore training strategies to enhance latent space interpretability while preserving clinical relevance and accuracy.

Reference:

[1]. Schapranow, Matthieu-P., et al. "NephroCAGE—German-Canadian Consortium on AI for Improved Kidney Transplantation Outcome: Protocol for an Algorithm Development and Validation Study." JMIR Research Protocols 12.1 (2023): e48892.



17-multi-omics: 5

Modeling Interdependencies in Multiomic Spatial Analysis

Giulia Capitoli1, Vanna Denti1, Veronica Vinciotti2, Ernst Wit3

1University of Milano-Bicocca, Italy; 2University of Trento, Trento, Italy; 3Università della Svizzera italiana, Switzerland

Introduction

Understanding the dependency structure among a large number of molecules is a central goal in biology, particularly in the context of disease research and biomarker discovery. However, real-world data often present significant challenges due to their heterogeneous nature. Samples are frequently collected under varying spatial and temporal conditions, leading to differences in network structures across groups. In such cases, the assumption of independent and identically distributed (i.i.d.) data becomes unrealistic. Applying a single graphical model to the entire dataset risks overlooking meaningful group-specific variations, while fitting separate models for each group fails to leverage shared patterns between groups and often requires pre-labeled group information.

Methods

To address these challenges, Gaussian Graphical Mixture Models (GGMMs) have emerged as a promising solution. GGMMs assume that data arise from a mixture of Gaussian distributions, where each component represents a subgroup with its unique network structure. This framework enables the simultaneous identification of cluster memberships and the modeling of intra-cluster dependencies, offering a principled approach for analyzing heterogeneous and high-dimensional data.
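The mixture structure can be sketched with one EM iteration: the E-step computes each sample's posterior membership from component-specific means and precision matrices, whose zero entries encode missing network edges, and the M-step updates weights, means and covariances. This is a minimal numpy sketch under stated assumptions. A full GGMM would replace the inverse of each weighted covariance with a sparse precision estimate (for example, a graphical lasso), a step omitted here. The spatial and copula extensions are not included, and the function names are hypothetical.

```python
import numpy as np


def log_gaussian_precision(x, mean, precision):
    """Log-density of N(mean, precision^{-1}) at each row of x."""
    p = x.shape[1]
    diff = x - mean
    _, logdet = np.linalg.slogdet(precision)
    quad = np.einsum("ij,jk,ik->i", diff, precision, diff)
    return 0.5 * logdet - 0.5 * quad - 0.5 * p * np.log(2 * np.pi)


def e_step(x, weights, means, precisions):
    """Posterior cluster memberships (responsibilities), shape (n, K)."""
    log_r = np.column_stack([
        np.log(w) + log_gaussian_precision(x, m, P)
        for w, m, P in zip(weights, means, precisions)
    ])
    log_r -= log_r.max(axis=1, keepdims=True)  # stabilise before exp
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)


def m_step(x, r):
    """Update mixing weights, means and (unpenalised) covariances."""
    nk = r.sum(axis=0)
    weights = nk / x.shape[0]
    means = (r.T @ x) / nk[:, None]
    covs = []
    for k in range(r.shape[1]):
        diff = x - means[k]
        covs.append((r[:, k, None] * diff).T @ diff / nk[k])
    return weights, means, covs
```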

Central to the proposal is the extension of GGMMs to incorporate spatial dependencies and multimodal data, resulting in a Spatial Gaussian Copula Graphical Mixture Model (SGCGMM).

Results

By leveraging techniques such as Markov random fields, sparse precision matrices and copula models, the model achieves interpretable and scalable results: it accounts for spatial correlations, integrates diverse molecular profiles, handles mixed data types and overcomes current limitations in handling high-dimensional, nested and noisy data.
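The copula component is what lets mixed data types share one Gaussian graphical structure. Each variable is mapped to a latent Gaussian scale through its ranks, z_i = Phi^{-1}(rank_i / (n + 1)), and dependencies are then modelled on the transformed scale. The pure-Python sketch below shows only this marginal transformation. Averaging ranks for ties is a simplification: copula models for discrete data often treat ties as interval-censored instead. The function name is hypothetical.

```python
from statistics import NormalDist


def normal_scores(values):
    """Map one variable to a latent Gaussian scale via its ranks.

    z_i = Phi^{-1}(rank_i / (n + 1)); tied values receive their average
    rank, so ordinal or count variables map to shared latent values."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to cover a block of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    phi_inv = NormalDist().inv_cdf
    return [phi_inv(r / (n + 1)) for r in ranks]
```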

The methodology will be applied to mass spectrometry imaging (MALDI-MSI) datasets, which provide spatially resolved molecular profiles on the same biopsy tissue section. These efforts will characterize interdependencies between molecular families, identify spatial biomarkers, and provide insights into tumor microenvironments and disease mechanisms.

Conclusion

Beyond the motivating application, the tools and theoretical advancements developed in this work will have broader applicability, driving progress in multiomic data integration and precision medicine. A computational pipeline and user-friendly tools are under development to ensure the accessibility and scalability of the proposed approaches.