10:30am - 10:50am The Smart HPLC Robot: Fully autonomous method development empowered by mechanistic model framework
Dian Ning Chia, Fanyi Duanmu, Luca Mazzei, Eva Sorensen, Maximilian Otto Besenhard
University College London, United Kingdom
Developing ultra- or high-performance liquid chromatography (HPLC) methods for analysis or purification can require significant amounts of material and manpower, and typically involves time-consuming, iterative lab-based workflows. Autonomous HPLC is a powerful tool for speeding up method development, as demonstrated recently via Bayesian optimisation to identify suitable HPLC settings without any operator interference [1,2]. To allow for autonomy and knowledge-driven decision-making, we incorporate mechanistic models, and automated training of these models, in the workflow. This digital HPLC twin, aka “Smart HPLC Robot”, is an intelligent platform enabling the development of an optimised HPLC method with minimal computer-controlled experiments, while simultaneously delivering a calibrated mechanistic model that provides valuable insights into method robustness. The Smart HPLC Robot is programmed in Python and integrates seamlessly with Agilent OpenLab software, which controls the Agilent 1260 Infinity II system via a web application interface built in C#. It also interfaces with process simulators to run mechanistic models, ensuring smooth communication between the experimental setup and the simulation environment. The operation of the Smart HPLC Robot starts with the user configuring the robot, i.e., choosing the mass transfer and isotherm models to start with and the number of components expected (if this information is available), and providing the bounds for the operating variables (e.g., gradient program and flow rate) and the objective function for optimisation (e.g., maximise the number of peaks or minimise the method time); otherwise, default conditions are used. The initial conditions are then sent to the Agilent system. After each experiment, the robot immediately analyses the chromatogram to estimate and update the model parameters, e.g., the adsorption isotherm parameters.
Once the parameters are estimated, the now fully-equipped mechanistic model allows for in-silico optimisation of the HPLC method, e.g., gradient program and flow rate. The new, optimal experimental settings are then sent to the Agilent, and the chromatograms from the optimal simulation and the experiment are compared for validation. If unsatisfactory, these steps are automatically repeated as many times as required. The results show that the Smart HPLC Robot is a fully automated and efficient framework for optimal HPLC design and operation, as well as for model development based on a mechanistic model and a limited number of experiments, all without any manual interference, thus saving material, time and manpower.
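The experiment, estimation, in-silico optimisation and validation cycle described above can be sketched as a small loop. This is only an illustration under strong assumptions, not the authors' implementation: a one-analyte isocratic retention relation (ln k = ln_kw - s·phi) stands in for the mechanistic column model, `run_experiment` is a hypothetical stand-in for the instrument, and all numbers are invented.

```python
import math

T0 = 1.0  # column dead time in minutes (assumed value)


def retention_time(phi, ln_kw, s):
    """Toy retention model: t_R = t0 * (1 + k), with ln k = ln_kw - s * phi."""
    return T0 * (1.0 + math.exp(ln_kw - s * phi))


def run_experiment(phi):
    """Hypothetical stand-in for the instrument: hidden 'true' parameters, no noise."""
    return retention_time(phi, ln_kw=4.0, s=8.0)


def fit_parameters(history):
    """Estimate (ln_kw, s) by a least-squares line through the (phi, ln k) points."""
    xs = [phi for phi, _ in history]
    ys = [math.log(t / T0 - 1.0) for _, t in history]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, -slope


def optimise_in_silico(ln_kw, s, target, lo=0.05, hi=0.95):
    """Bisection on the model (assumes s > 0) for the phi that gives the target time."""
    if retention_time(hi, ln_kw, s) > target:
        return hi  # target not reachable within the bounds
    if retention_time(lo, ln_kw, s) < target:
        return lo
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if retention_time(mid, ln_kw, s) > target:
            lo = mid  # still too slow: increase the organic fraction
        else:
            hi = mid
    return 0.5 * (lo + hi)


def closed_loop(target=3.0, tol=0.05, max_runs=10):
    """Repeat fit -> optimise -> run -> compare until model and experiment agree."""
    history = [(phi, run_experiment(phi)) for phi in (0.2, 0.3)]  # scouting runs
    for _ in range(max_runs):
        ln_kw, s = fit_parameters(history)
        phi = optimise_in_silico(ln_kw, s, target)
        predicted = retention_time(phi, ln_kw, s)
        measured = run_experiment(phi)
        history.append((phi, measured))
        if abs(measured - predicted) <= tol:
            return phi, measured, len(history)
    raise RuntimeError("model and experiment did not agree within max_runs")


phi, measured, n_runs = closed_loop()
print(f"phi = {phi:.4f}, retention time = {measured:.3f} min, runs = {n_runs}")
```

Because the toy instrument follows the same equation as the model, the loop stops after the first validation run; with a real column, model mismatch and noise would typically trigger further iterations.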
References [1] Dixon, T.M., Williams, J., Besenhard, M., Howard, R.M., MacGregor, J., Peach, P., Clayton, A.D., Warren, N.J., Bourne, R.A., 2024. Operator-free HPLC automated method development guided by Bayesian optimization, Digital Discovery, 3, 1591–1601
[2] Boelrijk J., Ensing B., Forre P., Pirok B.W.J., Closed-loop automatic gradient design for liquid chromatography using Bayesian optimization, Analytica Chimica Acta, 1242, 340789
10:50am - 11:10am Hybrid Models Automatic Identification and Training through Evolutionary Algorithms
Ulderico Di Caprio, M. Enis Leblebici
Center for Industrial Process Technology, Department of Chemical Engineering, KU Leuven, Agoralaan Building B, 3590 Diepenbeek, Belgium
Hybrid modeling (HM) techniques have become popular for predicting the behavior of complex chemical systems, especially when purely mechanistic models are insufficient. These models combine mechanistic knowledge, such as conservation laws, with data-driven methods like machine learning to enhance accuracy. However, their development often relies on experts to identify model deviations and apply the appropriate data-driven corrections. This study proposes a new methodology to automatically identify discrepancies between a mechanistic model and real-world data and to select the optimal data-driven function for deviation prediction using minimal data. The approach is designed for dynamic systems and does not require extensive knowledge of data-driven modeling. The identification process is formulated as a mixed-integer programming (MIP) optimization problem, which simultaneously identifies the mechanistic model component that requires adjustment and the best statistical function to describe the deviation. This non-linear optimization is challenging because the dynamic nature of the system creates complex relationships between the parameters and the prediction error. The optimization is solved using a mixed-integer differential evolution (DE) algorithm, with the Bayesian information criterion (BIC) as the loss function to balance model accuracy and complexity.
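To make the role of the BIC concrete, the sketch below scores a handful of candidate correction terms for a deliberately simple static toy problem. It is not the authors' formulation: the mechanistic model, the candidate terms and the data are invented, an exhaustive loop over one integer decision variable replaces the mixed-integer DE search, and the BIC is written in its common Gaussian-error form n·ln(RSS/n) + k·ln(n).

```python
import math


def bic(residuals, n_params):
    """BIC for Gaussian errors (up to a constant): n*ln(RSS/n) + k*ln(n)."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + n_params * math.log(n)


# Integer decision variable -> candidate correction term (illustrative choices).
BASIS = {
    1: ("constant", lambda x: 1.0),
    2: ("linear", lambda x: x),
    3: ("quadratic", lambda x: x * x),
}


def mechanistic(x):
    """Toy mechanistic model that misses a quadratic effect."""
    return 2.0 * x


def score(choice, xs, ys):
    """Fit the single coefficient of the chosen correction; return (BIC, coefficient)."""
    gaps = [y - mechanistic(x) for x, y in zip(xs, ys)]
    if choice == 0:  # no correction, no extra parameter
        return bic(gaps, 0), 0.0
    basis = BASIS[choice][1]
    phi = [basis(x) for x in xs]
    coeff = sum(p * g for p, g in zip(phi, gaps)) / sum(p * p for p in phi)
    residuals = [g - coeff * p for p, g in zip(phi, gaps)]
    return bic(residuals, 1), coeff


# Synthetic data: true process = mechanistic part + 0.5*x^2 + small deterministic ripple.
xs = [0.5 * i for i in range(1, 21)]
ys = [2.0 * x + 0.5 * x * x + 0.01 * math.sin(7.0 * i) for i, x in enumerate(xs)]

results = {choice: score(choice, xs, ys) for choice in range(4)}
best = min(results, key=lambda choice: results[choice][0])
print("selected correction:", BASIS[best][0], "coefficient:", round(results[best][1], 4))
```

In the dynamic setting of the abstract, each evaluation of the loss would require integrating the hybrid model, and the integer variables would also select which mechanistic term receives the correction.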
Several case studies are used to validate the methodology, including chemical reactions, biochemical reactions, and a Lotka-Volterra oscillator. Below, the results for one example, an equilibrium reaction, are reported. Consider the reaction
A⇋R⇋S,
where each reaction is first-order in the reactant. The employed kinetic constants are k1D = 3·10⁻¹ min⁻¹, k1I = 1·10⁻¹ min⁻¹, k2D = 2·10⁻² min⁻¹ and k2I = 1·10⁻² min⁻¹, and the initial conditions are CA0 = 10 mol/L, CR0 = 0 mol/L and CS0 = 0 mol/L. A deviation function was artificially introduced into the mass balance of the first component (r1D), using a multi-layer perceptron to generate deviations from the model. The data were divided into a training set (first 10 minutes) and a test set (next 10 minutes), and model performance was assessed using the coefficient of determination (R²), mean absolute error (MAE), and mean absolute percentage error (MAPE).
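For orientation, the undisturbed mechanistic model of this example can be integrated in a few lines. The sketch assumes that D and I denote the forward and reverse directions, reads the fourth listed constant as the reverse constant of the second step (k2I), and uses a fixed-step fourth-order Runge-Kutta scheme chosen here for self-containment. The multi-layer perceptron deviation is not reproduced.

```python
# Rate constants in 1/min (D = forward, I = reverse; k2I is an assumed reading).
K1D, K1I, K2D, K2I = 3e-1, 1e-1, 2e-2, 1e-2


def rates(c):
    """Mass balances for A <-> R <-> S with first-order kinetics."""
    ca, cr, cs = c
    r1 = K1D * ca - K1I * cr  # net rate of A -> R
    r2 = K2D * cr - K2I * cs  # net rate of R -> S
    return (-r1, r1 - r2, r2)


def rk4_step(c, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rates(c)
    k2 = rates(tuple(ci + 0.5 * dt * ki for ci, ki in zip(c, k1)))
    k3 = rates(tuple(ci + 0.5 * dt * ki for ci, ki in zip(c, k2)))
    k4 = rates(tuple(ci + dt * ki for ci, ki in zip(c, k3)))
    return tuple(
        ci + dt / 6.0 * (a + 2.0 * b + 2.0 * d + e)
        for ci, a, b, d, e in zip(c, k1, k2, k3, k4)
    )


def simulate(t_end=20.0, dt=0.01, c0=(10.0, 0.0, 0.0)):
    """Return the concentration trajectory [(CA, CR, CS), ...] from 0 to t_end minutes."""
    traj = [c0]
    for _ in range(round(t_end / dt)):
        traj.append(rk4_step(traj[-1], dt))
    return traj


traj = simulate()
split = len(traj) // 2          # index of t = 10 min
train, test = traj[: split + 1], traj[split + 1 :]
print("state at 10 min:", tuple(round(x, 3) for x in traj[split]))
```

The split mirrors the first-10-minutes / next-10-minutes division used for training and testing.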
The model achieved an R² of 0.917, an MAE of 0.0518, and a MAPE of 0.842% on the test set. These metrics show that the methodology accurately identified both the correct mechanistic equation and its associated parameters. Furthermore, the algorithm correctly identified the deviation in the reaction rate for r1D. When noise (±10%) was added to the data, the algorithm still identified the correct equation and performed well on the training set, but prediction accuracy on the test set decreased, highlighting the methodology's sensitivity to noise. This suggests that noise-reduction techniques should be applied before using the proposed methodology.
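The three reported metrics have standard definitions, shown here as a minimal sketch on made-up numbers (not the study's data). MAPE is expressed in percent and assumes that no true value is zero.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the data."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (true values must be non-zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1.0, 2.0, 3.0, 4.0]   # illustrative values only
y_pred = [1.1, 1.9, 3.2, 3.8]
print(r2_score(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
```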
In conclusion, this approach offers a novel automated solution for hybrid modeling, improving accuracy with minimal data. While it performs well in dynamic systems, future work will focus on enhancing its robustness in the presence of noise to extend its applicability in real-world scenarios.
11:10am - 11:30am Hybrid machine-learning for dynamic plant-wide biomanufacturing
Shabnam Shahhoseyni1, Arijit Chakraborty2, Mohammad Reza Boskabadi1, Venkat Venkatasubramanian2, Seyed Soheil Mansouri1
1Department of Chemical and Biochemical Engineering, Technical University of Denmark, DK-2800 Kgs Lyngby, Denmark; 2Department of Chemical Engineering, Columbia University, New York, NY 10027, United States of America
Data-driven modeling has shown great promise in capturing the behavior of complex systems (Venkatasubramanian, 2009). Bioprocesses, as a prime example of such systems, are ideal candidates for data-driven models, which help bridge gaps in both theoretical knowledge and practical modeling. A major challenge in AI and machine learning (ML) modeling for biomanufacturing lies in the lack of high-quality data and the complexity of the processes involved. To overcome this, incorporating domain expertise into the modeling framework is crucial. By combining numeric AI (machine learning) with symbolic AI, a hybrid AI approach can be developed, resulting in robust, interpretable models suitable for complex systems (Chakraborty, Serneels, Claussen, & Venkatasubramanian, 2022).
In this work, we develop a hybrid model that combines a first-principles model with a data-driven approach, using Lovastatin biomanufacturing as the target case study. Using a model discovery engine, we aim to create a detailed, explainable model of the manufacturing process that accounts for its various complexities. Our approach starts with generating a specialized time-series dataset from the KT-Biologics I (KTB1) model of the production unit (Boskabadi, Ramin, Kager, Sin, & Mansouri, 2024). KTB1 is a dynamic mechanistic simulation model of continuous biomanufacturing that encompasses the entire plant. The upstream section includes a continuous stirred-tank reactor (CSTR) and a hydrocyclone, while the downstream section incorporates a centrifuge and nanofiltration. The dataset includes concentrations of various nutrients (such as lactose and adenine), biomass levels in different streams, and the Lovastatin API produced in the production plant. We then apply the AI-DARWIN framework (Chakraborty, Sivaram, & Venkatasubramanian, 2021) to build explainable ML models. This framework allows us to limit the types of functions used during model discovery, ensuring that the resulting models are both accurate and easy to interpret. The models are presented in polynomial form, making it clear how each factor influences the overall system output. The primary objective is to develop a hybrid AI model to predict the API production in the plant under varying nutrient conditions.
References
Chakraborty, A., Sivaram, A., & Venkatasubramanian, V. (2021). AI-DARWIN: A first principles-based model discovery engine using machine learning. Computers and Chemical Engineering, 154, 107470.
Boskabadi, M., Ramin, P., Kager, J., Sin, G., & Mansouri, S. S. (2024). KT-Biologics I (KTB1): A Dynamic Simulation Model for Continuous Biologics Manufacturing. Computers and Chemical Engineering, 108770.
Chakraborty, A., Serneels, S., Claussen, H., & Venkatasubramanian, V. (2022). Hybrid AI Models in Chemical Engineering–A Purpose-driven Perspective. Computer Aided Chemical Engineering. 51, pp. 1507-1512. Elsevier.
Venkatasubramanian, V. (2009). Drowning in data: informatics and modeling challenges in a data‐rich networked world. AIChE Journal, 55(1), 2-8.
11:30am - 11:50am Physics-Informed Automated Discovery of Kinetic Models
Miguel Ángel de Carvalho Servia1, Ilya Orson Sandoval1, Klaus Hellgardt1, King Kuok (Mimi) Hii2, Dongda Zhang3, Ehecatl Antonio del Rio Chanona1
1Department of Chemical Engineering, Imperial College London, South Kensington, London, SW7 2AZ, United Kingdom; 2Department of Chemistry, Imperial College London, White City, London, W12 0BZ, United Kingdom; 3Department of Chemical Engineering, The University of Manchester, Oxford Road, Manchester, M13 9P, United Kingdom
The industrialization of catalytic processes requires reliable kinetic models for their design, optimization, and control. Despite being the most sought-after due to their interpretability, white box models are difficult to construct, requiring extensive time and expert knowledge. To alleviate this, automated knowledge discovery techniques, such as SINDy, have gained popularity in dynamic modelling [1]. This study aims to advance previous frameworks, ADoK-S and ADoK-W, by incorporating prior expert knowledge through mathematical constraints and integrating uncertainty quantification techniques, addressing previously identified shortcomings [2].
The research utilizes improved versions of the ADoK-S and ADoK-W frameworks [2], comprising four main steps: (I) a genetic programming algorithm with encoded constraints that foster the generation of physically reasonable candidate models, (II) a sequential optimization algorithm for parameter estimation of promising models, (III) a model selection process using the Akaike Information Criterion (AIC), and (IV) uncertainty quantification for the output of the selected model. The revised methodology ensures the proposal of physically coherent models and showcases the uncertainty in the final model's output.
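Steps (I) to (III) can be illustrated with a toy example. The sketch below is not ADoK-S or ADoK-W: a hand-written list of candidate rate laws stands in for the genetic programming output, the two physical constraints (zero rate at zero concentration, non-negative rate) are assumptions chosen for illustration, parameters are estimated by linear least squares, and the data are synthetic. Step (IV) is omitted.

```python
import math


def aic(residuals, n_params):
    """AIC for Gaussian errors (up to a constant): n*ln(RSS/n) + 2k."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + 2 * n_params


# Stand-ins for generated candidates: each rate law is a sum of terms k_j * f_j(c).
CANDIDATES = {
    "k*c": [lambda c: c],
    "k*c^2": [lambda c: c * c],
    "k1*c + k2*c^2": [lambda c: c, lambda c: c * c],
    "k1 + k2*c": [lambda c: 1.0, lambda c: c],
}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit(terms, cs, rs):
    """Linear least squares for one or two terms via the normal equations."""
    cols = [[f(c) for c in cs] for f in terms]
    if len(cols) == 1:
        return [dot(cols[0], rs) / dot(cols[0], cols[0])]
    a11, a12, a22 = dot(cols[0], cols[0]), dot(cols[0], cols[1]), dot(cols[1], cols[1])
    b1, b2 = dot(cols[0], rs), dot(cols[1], rs)
    det = a11 * a22 - a12 * a12
    return [(b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det]


def predict(terms, ks, c):
    return sum(k * f(c) for k, f in zip(ks, terms))


def physically_reasonable(terms, ks):
    """Assumed constraints: no reaction without reactant, and no negative rate."""
    zero_at_zero = abs(predict(terms, ks, 0.0)) < 1e-9
    non_negative = all(predict(terms, ks, c) >= 0.0 for c in (0.1, 1.0, 10.0))
    return zero_at_zero and non_negative


# Synthetic rate data: r = 0.3*c + 0.05*c^2 plus a small deterministic ripple.
cs = [0.5 * i for i in range(1, 21)]
rs = [0.3 * c + 0.05 * c * c + 0.01 * math.sin(7.0 * i) for i, c in enumerate(cs)]

fits = {name: fit(terms, cs, rs) for name, terms in CANDIDATES.items()}
accepted = [n for n, terms in CANDIDATES.items() if physically_reasonable(terms, fits[n])]
scores = {
    n: aic([r - predict(CANDIDATES[n], fits[n], c) for c, r in zip(cs, rs)], len(fits[n]))
    for n in accepted
}
best = min(scores, key=scores.get)
print("accepted:", accepted, "| selected:", best, "| parameters:", fits[best])
```

Here the candidate with a constant term is discarded by the constraint check before any ranking takes place, which is the sense in which prior knowledge shrinks the search space.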
The refined methodology successfully embeds prior knowledge, facilitating the discovery of kinetic models with less data than the original frameworks required and guaranteeing physically sound proposals. Furthermore, uncertainty quantification enhances the reliability of predictions, aiding in the identification of sensitive parameters and promoting safer and more efficient system developments, vital for decision-making and risk management. These enhancements not only allow for a physics-informed reduction in the search space but also improve data efficiency and model reliability, all critical to making automated knowledge discovery methods a serious competitor to classical kinetic modelling approaches.
References
[1] S. L. Brunton, J. L. Proctor, and J. N. Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl. Acad. Sci. U.S.A., 113(15):3932–3937, March 2016. doi:10.1073/pnas.1517384113.
[2] Miguel Ángel de Carvalho Servia, Ilya Orson Sandoval, King Kuok (Mimi) Hii, Klaus Hellgardt, Dongda Zhang, and Ehecatl Antonio del Rio Chanona. The automated discovery of kinetic rate models – methodological frameworks. Digit Discov, 3(5):954–968, 2024. ISSN 2635-098X. doi:10.1039/d3dd00212h.