Conference Agenda

Session Overview
Session
T6: Digitalization and AI - Session 2
Time:
Monday, 07/July/2025:
4:00pm - 6:00pm

Chair: Fatima Rani
Co-chair: Sergio Lucia
Location: Zone 3 - Room D049

KU Leuven Ghent Technology Campus, Gebroeders De Smetstraat 1, 9000 Gent

Presentations
4:00pm - 4:20pm

Computational Assessment of Molecular Synthetic Accessibility using Economic Indicators

Friedrich Hastedt¹, Klaus Hellgardt¹, Sophia Yaliraki¹, Antonio del Rio Chanona¹, Dongda Zhang²

¹Imperial College London, United Kingdom; ²University of Manchester, United Kingdom

The field of molecular and drug discovery has made significant progress in generating promising compounds with desired physical properties. However, a significant gap remains in addressing the practical feasibility of synthesizing these compounds. While current research focuses primarily on the prediction of molecular activity and properties, the economic viability of synthesis, including the cost and complexity of producing these compounds at scale, is often overlooked. Without incorporating market-driven constraints such as price into early-stage predictions, many molecules that appear promising in silico are ultimately inviable from a process engineering or market perspective. This results in wasted time and resources.

In recent years, several machine-learning (ML) approaches have been developed to guide virtual screening and de novo molecule generation toward synthesizable compounds [1]. A promising strategy involves computationally efficient scoring functions that classify molecules as “easy-to-synthesize (ES)” or “hard-to-synthesize (HS)”. These functions use either i) complexity-based indicators [2] or ii) retrosynthetic analysis [3] to assess synthetic accessibility. Although both methods have their merits, they face a significant limitation: the inability to generalize to out-of-distribution molecules, which are in fact the molecules of interest. Additionally, these scoring systems are typically based on binary classifications (1 for ES, 0 for HS) or predefined continuous ranges (e.g., 1 to 10, where 10 represents HS), which lack a clear physical interpretation.

To overcome these limitations, we propose a novel molecular synthetic accessibility score based on the market price of a molecule. Our model (MolPrice) is trained on a database of 5 million molecules with associated catalogued prices. Leveraging self-supervised learning, MolPrice differentiates between the prices of synthetically accessible molecules and those of more complex, out-of-distribution molecules, such as inaccessible natural products or HS compounds. By grounding the score in the market value of molecules, MolPrice provides a physically interpretable metric. Compared to existing models for price prediction, MolPrice demonstrates superior accuracy, speed, and reliability, as well as enhanced generalizability.

We validate MolPrice through multiple case studies, including virtual screening and retrosynthetic planning. In virtual screening, MolPrice steers the search toward synthetically accessible molecules, while preserving molecules with desirable properties. In retrosynthetic planning, MolPrice efficiently prioritizes promising synthetic routes, potentially reducing synthetic complexity and cost. In summary, MolPrice is a versatile tool, offering both accurate molecular price predictions and reliable synthetic accessibility assessments.

1. Stanley, M., and Segler, M. 2023. Fake it until you make it? Generative de novo design and virtual screening of synthesizable molecules. Current Opinion in Structural Biology, 82, p.102658.

2. Ertl, P., and Schuffenhauer, A. 2009. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. Journal of Cheminformatics, 1(1), p.8.

3. Huang, Q., Li, L.L., and Yang, S.Y. 2011. RASA: A Rapid Retrosynthesis-Based Scoring Method for the Assessment of Synthetic Accessibility of Drug-like Molecules. Journal of Chemical Information and Modeling, 51(10), p.2768–2777.

4. Sanchez-Garcia, R., Havasi, D., Takács, G., Robinson, M., Lee, A., Delft, F., and Deane, C. 2023. CoPriNet: graph neural networks provide accurate and rapid compound price prediction for molecule prioritisation. Digital Discovery, 2(1), p.103–111.



4:20pm - 4:40pm

ML-based adsorption isotherm prediction of metal-organic frameworks for carbon dioxide and methane separation adsorbent screening

Dongin Jung, Donggeun Kang, Donghyeon Kim, Siuk Roh, Jiyong Kim

School of Chemical Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea

Biogas, composed of carbon dioxide and methane, is primarily upgraded by pressure swing adsorption (PSA) into a high-energy-content gas with high methane purity. Recently, metal-organic frameworks (MOFs) have emerged as promising carbon dioxide adsorbents for PSA in biogas treatment, owing to their high porosity and tunable structures. Estimating the adsorption capacity of MOFs is essential for screening high-performing adsorbents. While molecular simulations are commonly used to estimate adsorption capacities, their computational cost is a bottleneck in screening MOF adsorbents. This study proposes a new AI-leveraged high-throughput screening (HTS) methodology to rapidly identify high-performance MOF adsorbents for biogas treatment. A graph neural network (GNN) model was developed to predict the adsorption capacities of all MOF candidates, replacing the time-consuming molecular simulations. The GNN model processes the structural graphs of MOFs, capturing their spatial configurations, such as surface structure and pore characteristics, which are closely related to adsorption performance. Based on the adsorption capacities predicted by the GNN model, an isotherm model was derived to characterize the adsorption behavior of the MOFs. In the first stage of screening, we eliminated unsuitable MOFs, namely those with disordered structures or containing precious metals. To ensure the viability of our screening method, we performed PSA process simulations on the remaining MOF candidates under various conditions (i.e., pressures and biogas compositions). We then identified the optimal MOFs, which exhibit high and stable methane recovery. Finally, the identified optimal MOFs outperformed conventional PSA adsorbents, demonstrating the effectiveness of our methodology.
The proposed screening methodology not only contributes to rapid screening of MOFs at the process scale for biogas treatment but also paves the way for broader applications of MOFs in carbon dioxide separation technologies.
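The step from predicted adsorption capacities to an isotherm model can be illustrated with a short sketch. The abstract does not say which isotherm form was used, so the Langmuir form below is an assumption, and the loadings are synthetic stand-ins for GNN predictions rather than results from the study. The function names `langmuir` and `fit_langmuir` are hypothetical.

```python
def langmuir(p, q_max, b):
    """Langmuir loading q at pressure p (assumed isotherm form, not from the abstract)."""
    return q_max * b * p / (1.0 + b * p)


def fit_langmuir(pressures, loadings):
    """Fit q_max and b by least squares on the linearised form
    p/q = 1/(q_max*b) + p/q_max. Pressures and loadings must be positive."""
    xs = list(pressures)
    ys = [p / q for p, q in zip(pressures, loadings)]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals 1/q_max
    intercept = y_mean - slope * x_mean    # equals 1/(q_max*b)
    return 1.0 / slope, slope / intercept  # (q_max, b)


# Synthetic "predicted" capacities at a few pressures (illustrative values only).
pressures = [0.1, 0.5, 1.0, 2.0, 5.0]
predicted = [langmuir(p, q_max=4.0, b=1.5) for p in pressures]
q_max_fit, b_fit = fit_langmuir(pressures, predicted)
print(q_max_fit, b_fit)
```

With noise-free synthetic data the fit recovers the generating parameters; with real predictions a nonlinear fit or a different isotherm form may be more appropriate.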




4:40pm - 5:00pm

Predicting Surface Tension of Organic Molecules Using COSMO-RS Theory and Machine Learning

Flora Esposito¹, Ulderico Di Caprio¹, Bruno Rodrigues³, Florence Vermeire², Idelfonso Bessa dos Reis Nogueira³, Mumin Enis Leblebici¹

¹Center for Industrial Process Technology, Department of Chemical Engineering, KU Leuven, Agoralaan Building B, 3590 Diepenbeek, Belgium; ²KU Leuven, Department of Chemical Engineering, Celestijnenlaan 200F-bus 2424, Leuven 3001, Belgium; ³Chemical Engineering Department, Norwegian University of Science and Technology, Sem Sælandsvei 4, Kjemiblokk 5, Trondheim 793101, Norway

Surface tension is a key property of the liquid/gas interface and plays an important role in various chemical engineering processes, including liquid flow and transport through porous media. Measuring this property experimentally requires specialized equipment and is time-consuming; therefore, predictive models for surface tension are essential. Available modeling methods rely on parametric fitting of experimental data. Recently, however, Gaudin proposed a COSMO-RS-based model to predict the surface tension of pure liquids [1]. The model is described by $\gamma_X = -\Delta G_X^{G \to X} / s_X$, where $\gamma_X$ (mN/m) is the surface tension of a compound X, $\Delta G_X^{G \to X}$ (kcal/mol) is the Gibbs free energy of self-solvation of compound X, and $s_X$ is the molecular surface area of compound X. The implicit assumption of this model is that any surface segment has an equal probability of being exposed at the surface. Because this approach assumes uniform molecular orientation at the surface and was originally tested on a limited set of molecules at 20 °C, this work aims to: 1. test the prediction capabilities of the current approach across a wider range of temperatures and compounds; and 2. develop a corrective machine learning (ML) model correlating the deviations from the theoretical prediction with molecular descriptors. The deviation is expressed as the ratio between experimental and simulated surface tensions: $\varepsilon = \gamma_X^{\mathrm{exp}} / \gamma_X^{\mathrm{sim}}$. Surface tension data for 93 common organic molecules are obtained from the surface-tension.de website. They include alcohols, amines, amides, nitro compounds, ketones, hydrocarbons, ethers, diols, etc. Their molecular descriptors are computed using the Mordred library in Python. COSMOTherm software is employed to predict $\Delta G_X^{G \to X}$ and $s_X$.
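As a rough illustration of the two formulas above, the following sketch evaluates the surface tension from a self-solvation free energy and a molecular surface area, then applies a correction factor. It rests on several assumptions that are not stated in the abstract: the surface area is taken to be in Å², the input values are invented for illustration, and the correction is applied by multiplying the simulated value by a predicted ratio, which follows from the definition of the ratio but is not confirmed as the authors' procedure.

```python
AVOGADRO = 6.02214076e23   # 1/mol
J_PER_KCAL = 4184.0        # thermochemical kilocalorie
M2_PER_ANGSTROM2 = 1e-20   # assumes the surface area is given in square angstroms


def gaudin_surface_tension(dg_self_solvation_kcal_mol, area_angstrom2):
    """Surface tension in mN/m from gamma = -dG / s, with unit conversion."""
    energy_j_per_molecule = dg_self_solvation_kcal_mol * J_PER_KCAL / AVOGADRO
    gamma_n_per_m = -energy_j_per_molecule / (area_angstrom2 * M2_PER_ANGSTROM2)
    return gamma_n_per_m * 1e3  # N/m -> mN/m


def corrected_surface_tension(gamma_sim, epsilon_pred):
    """Apply a predicted ratio epsilon = gamma_exp / gamma_sim to the simulated value."""
    return epsilon_pred * gamma_sim


# Illustrative inputs only (not taken from the study).
gamma_sim = gaudin_surface_tension(-5.0, 150.0)
gamma_hybrid = corrected_surface_tension(gamma_sim, epsilon_pred=0.9)
print(gamma_sim, gamma_hybrid)
```

A free energy of a few kcal/mol spread over roughly a hundred Å² gives a value in the tens of mN/m, which is the order of magnitude expected for organic liquids.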

A neural network with four hidden layers is optimized using Bayesian techniques to minimize the mean squared error (MSE) on the validation set, achieving an MSE of 0.0056 and an $R^2$ of 0.8452 on the test set. Gaudin's model for predicting surface tension exhibited a mean absolute percentage error (MAPE) of approximately 16–17%, a root mean square error (RMSE) of 9–7 mN/m, and a mean absolute error (MAE) of 6–5 mN/m when evaluated across a temperature range of 5–50 °C. However, integrating Gaudin’s model with a machine learning-based corrective approach significantly improved predictive accuracy. The hybrid model reduced the MAPE to 6–7%, the RMSE to 3.7–3.3 mN/m, and the MAE to 2–3 mN/m over the same temperature range. This enhancement represents a reduction in prediction errors of approximately 60.6% in MAPE, 56.3% in RMSE, and 54.5% in MAE compared to the standalone Gaudin model. As a next step, the ML model will be retrained on a larger and broader set of molecules, retrieved from the Jasper dataset [2], with the aim of improving its prediction capabilities.
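The quoted reductions can be checked with a few lines of arithmetic. The sketch below assumes that each reduction was computed from the midpoints of the reported ranges; the abstract does not state this, but the midpoints reproduce the quoted percentages to within rounding.

```python
def midpoint(a, b):
    """Midpoint of a reported range."""
    return (a + b) / 2.0


def relative_reduction(before, after):
    """Percentage reduction from 'before' to 'after'."""
    return 100.0 * (before - after) / before


# (Gaudin model range, hybrid model range), as reported in the abstract.
reported = {
    "MAPE": ((16.0, 17.0), (6.0, 7.0)),   # percent
    "RMSE": ((9.0, 7.0), (3.7, 3.3)),     # mN/m
    "MAE": ((6.0, 5.0), (2.0, 3.0)),      # mN/m
}

reductions = {
    name: relative_reduction(midpoint(*before), midpoint(*after))
    for name, (before, after) in reported.items()
}

for name, value in reductions.items():
    print(f"{name}: {value:.2f}% reduction")
```

The midpoints are 16.5 and 6.5 for MAPE, 8 and 3.5 for RMSE, and 5.5 and 2.5 for MAE, which give reductions of about 60.6%, 56.3% and 54.5%.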

Bibliography

[1] Gaudin, T. Chem Phys Lett 706, 308–310 (2018)

[2] Jasper, J. J. J Phys Chem Ref Data 1, 841–1010 (1972)



5:00pm - 5:20pm

Hybrid model development for Succinic Acid fermentation: relevance of ensemble learning for enhancing model prediction

Juan Federico Herrera Ruiz, Javier Fontalvo, Oscar Andres Prado-Rubio

Universidad Nacional de Colombia sede Manizales, Colombia

The increasing focus on sustainable development goals has spurred significant research into bioprocess optimization, particularly through technological advancements in process monitoring, data storage, and computational capabilities. These developments, combined with modelling techniques and simulation tools, are driving substantial advances in the digitalization of biomanufacturing. In this context, hybrid modelling has emerged as a powerful approach, combining parametric and non-parametric methods to mitigate their individual drawbacks [1].

This study focuses on developing a hybrid model that exploits limited experimental data to improve state predictions for succinic acid fermentation by Escherichia coli [2]. Succinic acid is considered one of the key molecules for paving the way to sustainable bio-based production of chemicals. However, parametric kinetic models for succinic acid fermentation tend to be overparameterized and uncertain, leading to poor predictive performance [3]. The present research was conducted in two stages. First, the experimental data were pretreated using established methodologies for removing outliers and noise [4]. In the second stage, different hybrid models were proposed for the system, with varying degrees of hybridization (covering from one to all reaction rates). For each hybrid model, the predictive power of different machine learning (ML) algorithms, such as artificial neural networks (ANN), support vector machines (SVM), and Gaussian processes, was investigated. In addition, two tuning strategies were tested: a) using the original kinetic parameters and b) recalibrating the remaining kinetic parameters after the ML training. During model validation, high variability in the quality of the predictions was observed. Therefore, an ensemble learning approach was implemented to mitigate this issue.

The data for training and validation were the same as in the original research. The results showed that the hybrid models perform better than the parametric model, with a validation RMSE of 3.0456 for models (a) and (b) with the highest degree of hybridization (darkest models), compared to an RMSE of 7.0268 for the parametric model. These results illustrate the advantages of hybrid modeling in accurately describing succinic acid fermentations even with limited data, which supports bioprocess scale-up, digitalization and biorefinery development.
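The abstract does not specify which ensemble learning scheme was used. As a minimal sketch of the idea, assuming simple averaging of member predictions, the example below shows how an ensemble mean can damp the variability between individually trained models. All numbers are invented for illustration and are not data from the study.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    )


def ensemble_mean(member_predictions):
    """Average the predictions of several models point by point."""
    return [sum(values) / len(values) for values in zip(*member_predictions)]


# Hypothetical observations and predictions from three hybrid-model variants.
observed = [1.0, 2.0, 3.0, 4.0]
members = [
    [1.2, 2.1, 2.7, 4.3],
    [0.8, 1.8, 3.4, 3.9],
    [1.1, 2.3, 3.0, 3.6],
]

member_errors = [rmse(m, observed) for m in members]
ensemble_error = rmse(ensemble_mean(members), observed)
print(member_errors, ensemble_error)
```

By the triangle inequality, the RMSE of the averaged prediction can never exceed the largest member RMSE, which is why averaging is a common first choice when individual models vary strongly in quality.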

References

[1] de Azevedo CR, Díaz VG, Prado‐Rubio OA, Willis MJ, Préat V, Oliveira R, et al. Hybrid Semiparametric Modeling: A Modular Process Systems Engineering Approach for the Integration of Available Knowledge Sources. Systems Engineering in the Fourth Industrial Revolution, Wiley; 2019, p. 345–73. https://doi.org/10.1002/9781119513957.ch14.

[2] Chaleewong T, Khunnonkwao P, Puchongkawarin C, Jantama K. Kinetic modeling of succinate production from glucose and xylose by metabolically engineered Escherichia coli KJ12201. Biochem Eng J 2022;185:108487. https://doi.org/10.1016/j.bej.2022.108487.

[3] Leonov P. Bio-succinic acid production from alternative feedstock. Technical University of Denmark, 2022. PhD Thesis.

[4] Sánchez-Rendón JC, Morales-Rodriguez R, Matallana-Pérez LG, Prado-Rubio OA. Assessing Parameter Relative Importance in Bioprocesses Mathematical Models through Dynamic Sensitivity Analysis, 2020, p. 1711–6. https://doi.org/10.1016/B978-0-12-823377-1.50286-X



5:20pm - 5:40pm

Addressing the bottlenecks in implementing artificial intelligence for decarbonisation of thermal power plants

Waqar Muhammad Ashraf, Vivek Dua

University College London, United Kingdom

Artificial intelligence (AI) has had a transformative impact on many industrial sectors, including healthcare and banking; however, AI adoption in industrial thermal power systems remains relatively slow. This is attributed to the data-centric nature of AI modelling algorithms, which lack interpretability [1, 2], and to the ineffective introduction of system-based constraints into the modelling algorithm(s) and the optimisation problem. As a result, the trained AI models and the solutions estimated from the optimisation problem, though mathematically feasible, may not be tested in the real-time operation of industrial systems, particularly thermal power plants. To address these challenges impeding the adoption of AI in thermal power plants, we present a comprehensive AI-based analysis toolkit that incorporates interpretable AI model(s) [3], uncertainty quantification for model-based point predictions [4], and an improved formulation of the optimisation problem for efficient solution estimation, with the aim of enhancing the performance of thermal power plants.

The developed AI-based analysis toolkit will be implemented on a 660 MW supercritical thermal power plant to maximise thermal efficiency and minimise heat rate during ramp-up and ramp-down of the power plant. We will also demonstrate how AI model-based optimisation without system-specific constraints may yield solutions that are practically ineffective to implement in plant operation, even though they are feasible for the formulated optimisation problem. The improved solution estimation strategy provided by the toolkit can have a transformative impact on AI adoption in industrial systems and promotes the responsible use of AI for enhancing their performance. It is further anticipated that smart operation of thermal power plants can significantly reduce fossil fuel consumption, thereby cutting large volumes of emissions to the environment and supporting the decarbonisation of thermal power plants.

References

[1] Saleem, R., Yuan, B., Kurugollu, F., Anjum, A., and Liu, L., 2022, "Explaining deep neural networks: A survey on the global interpretation methods," Neurocomputing, 513, pp. 165-180.

[2] Decardi-Nelson, B., Alshehri, A. S., Ajagekar, A., and You, F., 2024, "Generative AI and process systems engineering: The next frontier," Computers & Chemical Engineering, 187, p. 108723.

[3] Ashraf, W. M., and Dua, V., 2024, "Data Information integrated Neural Network (DINN) algorithm for modelling and interpretation performance analysis for energy systems," Energy and AI, p. 100363.

[4] Ashraf, W. M., and Dua, V., 2024, "Storage of weights and retrieval method (SWARM) approach for neural networks hybridized with conformal prediction to construct the prediction intervals for energy system applications," International Journal of Data Science and Analytics, pp. 1-15.



5:40pm - 6:00pm

Thermodynamics-informed graph neural networks for transition enthalpies

Roel Leenhouts¹, Sebastien Jankelevitch¹, Roel Raike¹, Simon Müller², Florence Vermeire¹

¹Department of Chemical Engineering, KU Leuven, Leuven, Belgium; ²Institute of Thermal Separation Processes, Hamburg University of Technology, Hamburg, Germany

Phase transition enthalpies represent the amount of heat absorbed or released during phase transitions such as melting, vaporization, and sublimation. The prediction of these enthalpies is essential for early-stage process screening and for modeling the temperature dependence of a range of thermodynamic properties. Despite their importance, measuring phase transition enthalpies can be time-consuming and costly, leading to a growing interest in computational methods that can provide reliable predictions. Graph neural networks (GNNs), known for their ability to learn complex molecular representations, have emerged as state-of-the-art tools for predicting various thermophysical properties. Despite their success, GNNs do not inherently obey thermodynamic laws in their predictions.

In this study, we present a multi-task GNN designed to predict vaporization, fusion, and sublimation enthalpies of organic compounds. The GNN employed in this work utilizes a directed message passing architecture. To train the model, we digitized the extensive Chickos and Acree compendium, which encompasses 32,023 experimentally measured transition enthalpy values collected over 135 years [1]. This dataset serves as a comprehensive resource for developing machine learning models capable of accurately predicting phase transition enthalpies across diverse molecular families. In addition, we modified the loss function of the GNN, based on the thermodynamic cycle shown in Equation (1), to impose thermodynamic consistency between the enthalpies of different phase changes. For the thermodynamics-informed constraints, we explored two approaches: soft constraints, which guide the model toward thermodynamically consistent solutions while maintaining flexibility, and hard constraints, which strictly enforce thermodynamic consistency.

(1) $\Delta H_{\mathrm{sub}} = \Delta H_{\mathrm{fus}} + \Delta H_{\mathrm{vap}}$
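The difference between the two kinds of constraint can be sketched in a few lines. This is one possible reading, not the authors' implementation: the soft constraint is written as a squared penalty on the cycle residual with a hypothetical weight, and the hard constraint is written by predicting fusion and vaporization enthalpies and computing sublimation as their sum. The dictionary layout and all numbers are illustrative assumptions.

```python
KEYS = ("sub", "fus", "vap")


def data_loss(pred, target):
    """Mean squared error over the enthalpies that were actually measured.
    Missing measurements are given as None, as is common in multi-task data."""
    errors = [(pred[k] - target[k]) ** 2 for k in KEYS if target.get(k) is not None]
    return sum(errors) / len(errors) if errors else 0.0


def cycle_residual(pred):
    """Violation of Equation (1): sublimation minus (fusion + vaporization)."""
    return pred["sub"] - (pred["fus"] + pred["vap"])


def soft_constrained_loss(pred, target, weight=0.1):
    """Data loss plus a penalty on the cycle residual (weight is an assumed hyperparameter)."""
    return data_loss(pred, target) + weight * cycle_residual(pred) ** 2


def hard_constrained_prediction(fus, vap):
    """Enforce Equation (1) exactly by deriving sublimation from the other two outputs."""
    return {"fus": fus, "vap": vap, "sub": fus + vap}


# Illustrative values in kJ/mol (not from the study).
pred = {"sub": 100.0, "fus": 20.0, "vap": 70.0}
target = {"sub": 95.0, "fus": None, "vap": 72.0}
print(soft_constrained_loss(pred, target))
print(cycle_residual(hard_constrained_prediction(20.0, 70.0)))
```

In the soft variant the network keeps three free outputs and is only nudged toward consistency, whereas in the hard variant one output is no longer free, which is consistent with the trade-off between consistency and accuracy reported for the two approaches.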

The results demonstrated that the multi-task GNN achieved mean absolute errors (MAEs) of 11.0 kJ/mol for sublimation, 6.1 kJ/mol for fusion, and 4.6 kJ/mol for vaporization on the test set. Importantly, incorporating a soft constraint improved the thermodynamic consistency without compromising accuracy, while a hard constraint ensured fully consistent predictions but reduced accuracy. Thus, soft-constrained physics-informed neural networks (PINNs) offer an optimal balance between consistency and accuracy for this application, whereas hard-constrained PINNs prioritize thermodynamic fidelity at the cost of predictive accuracy.

In addition, we compared our GNN against SoluteML, a state-of-the-art method for predicting Abraham solute parameters, which can be combined with the linear solvation energy relationships (LSERs) published by Chickos and Acree to calculate transition enthalpies [2]. A modest improvement in prediction accuracy over this baseline was observed. Analysis of the model's performance across molecular subgroups revealed that prediction accuracy increases as the size of the training data for each subgroup grows, highlighting the importance of expanding experimental datasets. Overall, this work demonstrates the potential of thermodynamics-informed GNNs for accurate and physically consistent prediction of phase transition enthalpies.

[1] W. Acree and J. S. Chickos, “Phase transition enthalpy measurements of organic and organometallic compounds and ionic liquids. sublimation, vaporization, and fusion enthalpies from 1880 to 2015,” Journal of Physical and Chemical Reference Data, 2017.
[2] Y. Chung, F. H. Vermeire, H. Wu, P. J. Walker, M. H. Abraham, and W. H. Green, “Group contribution and machine learning approaches to predict Abraham solute parameters, solvation free energy, and solvation enthalpy,” Journal of Chemical Information and Modeling, 2021.



 
Conference: ESCAPE | 35