4:00pm - 4:20pm Applying Time Series Extrinsic Regression to Parameter Estimation Problems for Dynamic Models – an Alternative to Gradient-Free Approaches?
Torben Talis, John Paul Gerakis, Christian Hoffmann, Jens-Uwe Repke
Technische Universität Berlin, Germany
Time series analysis is a well-established field within the machine learning community. Two prominent applications are time-series forecasting, i.e., surrogate models predicting the next time step of the system's outputs, and time-series classification, where complete time series are mapped to discrete labels, e.g., a sensor is either working or defective. Time-Series Extrinsic Regression (TSER), in contrast, is a method for predicting continuous, time-invariant variables from a time series by learning the relation between these underlying parameters and the complete dynamic time series of the outputs, without focusing on the most recent states. For example, it can be used to predict the heart rate from an ECG signal. TSER was only established as a research field in 2021, but it has been gaining traction ever since and is used, e.g., in manufacturing technology to predict steel surface roughness from laser reflection measurements. It is typically applied when no model is available.
Parameter Estimation (PE) is a common task in chemical engineering. It is used to adjust model parameters so that existing dynamic models better fit experimental time series data. The task becomes more challenging in higher dimensions and for dynamic systems, where sensitivity and identifiability may change over time. A multitude of algorithms already exists to solve the problem, including second-order methods that leverage information from Jacobian and Hessian matrices, as well as gradient-free optimization techniques such as particle swarm optimization (PSO) or simulated annealing. However, with the growing establishment of machine learning (ML) in an increasing number of domains, the question arises whether, and if so how, ML in general and TSER in particular can be employed to solve PE problems.
This study marks the first application of TSER to PE problems. A comparative analysis is conducted between TSER and PSO in terms of prediction accuracy, computational cost, and data efficiency. We investigate whether it is viable to use TSER when a model is available.
Our methodology to regress model parameters via ML builds on the typical assumption that a structurally correct and rigorous model, which can be simulated at low cost, is available. First, the boundaries of the parameter space are defined. This space is then sampled using Sobol sequences, and the model is simulated for each sample. The resulting trajectories, along with their corresponding parameters, constitute the training data set. These trajectories are transformed with the "RandOm Convolutional KErnel Transform" (ROCKET) method into new features, which are subsequently used to train the regressor model. This regressor returns predictions for the parameters.
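A minimal sketch of this workflow is given below, assuming a placeholder simulator simulate_model, Sobol sampling via SciPy, the sktime implementation of ROCKET, and a ridge regressor as the final model; the actual dynamic model, sample sizes, and regressor used in the study may differ.

```python
# Sketch of the TSER-based parameter-estimation workflow described above.
# Assumptions: a cheap simulator `simulate_model(theta)` returning an output
# trajectory of fixed length, and the ROCKET transform as provided by sktime.
import numpy as np
from scipy.stats import qmc
from sklearn.linear_model import RidgeCV
from sktime.transformations.panel.rocket import Rocket

def simulate_model(theta):
    # Placeholder for the rigorous dynamic model (e.g., a batch-reactor ODE).
    t = np.linspace(0.0, 1.0, 200)
    return np.vstack([theta[0] * np.exp(-theta[1] * t)])  # shape: (channels, time)

# Step 1: define parameter bounds and sample the space with a Sobol sequence.
lower, upper = np.array([0.5, 0.1]), np.array([2.0, 5.0])
sampler = qmc.Sobol(d=2, scramble=True)
thetas = qmc.scale(sampler.random_base2(m=9), lower, upper)   # 512 parameter sets

# Step 2: simulate to obtain training trajectories (n_samples, channels, time).
X = np.stack([simulate_model(th) for th in thetas])

# Step 3: transform trajectories into ROCKET features, then fit a regressor
# mapping the features to the underlying (time-invariant) parameters.
rocket = Rocket(num_kernels=1000, random_state=0)
features = rocket.fit_transform(X)
regressor = RidgeCV(alphas=np.logspace(-3, 3, 10)).fit(features, thetas)

# Prediction: apply the same transform to a measured trajectory.
theta_hat = regressor.predict(rocket.transform(X[:1]))
```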
In a case study, the method is applied to predict the heat transfer and kinetic parameters of a batch reactor based on simulated data. In practice, however, measurements are often not continuously available but are taken only at sparse, discrete points in time, and different variables are measured at different, asynchronous intervals. This heterogeneity is mimicked in the synthetic training data so that its influence on the results can be shown, and over- and undersampling strategies are applied to counteract the effect.
4:20pm - 4:40pm GRAPSE: Graph-Based Retrieval Augmentation for Process Systems Engineering
Daniel Ovalle1, Arpan Seth2, John Kitchin1, Carl Laird1, Ignacio Grossmann1
1Carnegie Mellon University, United States of America; 2Evonik Corporation, United States of America
Large Language Models (LLMs) have demonstrated impressive capabilities, performing well across a wide range of tasks, including accelerating scientific discovery in different fields. However, a significant limitation of these models is that they are restricted to the information found in their training data, which can be outdated and not tailored to specific domains. This dependence on pre-training data means that the accuracy and quality of their responses may vary, particularly for topics that are less common in the training corpus [1]. As a result, general-purpose LLMs often lack the specialized knowledge required for fields like Process Systems Engineering (PSE), a rapidly advancing area of research.
Retrieval-Augmented Generation (RAG) enhances LLMs by integrating domain-specific knowledge through the indexing of extensive text in a separate information retrieval system. This approach allows retrieved information to be presented to the LLM alongside the user question, facilitating access to up-to-date knowledge relevant to specific domains while improving interpretability and provenance tracking [2]. However, creating an indexed database for RAG from scientific documents presents several challenges. Scientific knowledge, especially in PSE, often involves complex mathematical expressions that standard LLM document parsers struggle to interpret, leading to a loss of essential semantic information [3]. Additionally, traditional indexed databases often overlook the relationships that exist across different papers, which can hinder the comprehensive retrieval of interconnected knowledge [4].
In this work, we propose a reflective-active LLM agent [5] that possesses domain-specific knowledge in PSE. The agent accesses a graph-based index database where PSE papers are processed using a computer vision model to accurately capture concepts, relationships, and mathematical expressions contained within them. Within this database, documents are grouped semantically and processed recursively, employing techniques such as embedding, clustering, and summarization to construct a hierarchical tree structure with varying levels of summarization from the bottom up [6]. This approach enables the agent to integrate multiple levels of abstraction effectively. The ultimate goal is to develop a tool that facilitates research and education in PSE, while also exploring the creation of benchmarks to evaluate the performance of LLM models specifically tailored to this field.
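The recursive embed-cluster-summarize construction [6] can be illustrated with the following sketch; embed and summarize are placeholders for an embedding model and an LLM summarizer, and the clustering choice is only an assumption for illustration, not the actual GRAPSE pipeline.

```python
# Minimal sketch of a RAPTOR-style hierarchical index as described above.
# Assumptions: `embed(texts)` and `summarize(texts)` stand in for a real
# embedding model and an LLM summarizer; GRAPSE's components may differ.
import numpy as np
from sklearn.mixture import GaussianMixture

def embed(texts):
    rng = np.random.default_rng(0)            # placeholder embedding model
    return rng.normal(size=(len(texts), 64))

def summarize(texts):
    return " / ".join(t[:40] for t in texts)  # placeholder LLM summary

def build_tree(chunks, max_levels=3, n_clusters=4):
    """Recursively cluster chunks and summarize each cluster, bottom up."""
    levels = [chunks]
    for _ in range(max_levels):
        current = levels[-1]
        if len(current) <= n_clusters:
            break
        labels = GaussianMixture(n_components=n_clusters, random_state=0) \
            .fit_predict(embed(current))
        summaries = [summarize([c for c, l in zip(current, labels) if l == k])
                     for k in range(n_clusters)]
        levels.append(summaries)              # next, more abstract level
    return levels  # index all levels for multi-resolution retrieval

tree = build_tree([f"chunk {i}" for i in range(32)])
```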
References
[1] F. Petroni, T. Rocktäschel, P. Lewis, A. Bakhtin, Y. Wu, A. H. Miller, and S. Riedel, "Language models as knowledge bases?" arXiv preprint arXiv:1909.01066, 2019.
[2] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel et al., "Retrieval-augmented generation for knowledge-intensive NLP tasks," Advances in Neural Information Processing Systems, vol. 33, pp. 9459–9474, 2020.
[3] L. Blecher, G. Cucurull, T. Scialom, and R. Stojnic, "Nougat: Neural optical understanding for academic documents," arXiv preprint arXiv:2308.13418, 2023.
[4] Y. Hu et al., "GRAG: Graph Retrieval-Augmented Generation," arXiv preprint arXiv:2405.16506, 2024.
[5] S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y. Cao, "ReAct: Synergizing reasoning and acting in language models," arXiv preprint arXiv:2210.03629, 2022.
[6] P. Sarthi, S. Abdullah, A. Tuli, S. Khanna, A. Goldie, and C. D. Manning, "RAPTOR: Recursive abstractive processing for tree-organized retrieval," arXiv preprint arXiv:2401.18059, 2024.
4:40pm - 5:00pm Flexibility assessment via affine bound evaluation
Diogo Narciso1, Steven Sachio2,3, Maria M. Papathanasiou2,3
1CERENA, Department of Chemical Engineering, Instituto Superior Técnico, University of Lisbon, Lisbon, Portugal; 2The Sargent Centre for Process Systems Engineering, Imperial College London, London, United Kingdom; 3Department of Chemical Engineering, Imperial College London, London, United Kingdom
System design is guided primarily by system performance with respect to economic viability [1]. This process entails the selection of a set of variables within a feasible design space to achieve optimal performance under nominal operating conditions [1]. In real-life systems, however, operation is often subject to multiple disturbances, which may cause the product output(s) to be off-spec. In such cases, flexibility becomes an important consideration in system design, since it promotes more robust designs and minimises the risk of operating under problematic regimes [2].
Several developments relating to flexibility in system design have been proposed since the 1980s. Swaney and Grossmann (1985) proposed the flexibility index, a metric that quantitatively describes the flexibility of a nominal process design based on inscribing hyperrectangles within the feasible space [3]. Since then, extended flexibility index formulations have been proposed to solve stochastic, dynamic, and non-convex problems [2]. More recent works, however, have moved away from investigating nominal designs towards approximating the full feasible space (design space identification) [4].
This work proposes a novel approach for flexibility assessment. In design problems where the design space (DSp) is constrained by a set of affine bounds, flexibility may be expressed either as the minimum or the maximum distance of a point to the feasible (design) space bounds. For any point in the DSp, the minimum distance is a good indicator of minimum flexibility, as it identifies the direction with the highest risk of violating the constraints. An analogous conclusion can be drawn between the maximum distance and maximum flexibility. These distances (or flexibilities) can be computed exactly via geometrical considerations, enabling the calculation of minimum-based and maximum-based flexibility metrics for all points in the DSp. This is in fact a class of multiparametric programming problems, since the goal is to obtain comprehensive flexibility maps rather than to investigate individual points in the DSp. In the case of minimum flexibility, the solutions comprise: (i) a set of critical regions defining a convex hull within the DSp (each associated with a unique nearest bound of the feasible space), (ii) the corresponding optimizer functions (projection onto the nearest bound), and (iii) objective functions (minimum distance).
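To make the distance-based metrics concrete, the sketch below evaluates the minimum- and maximum-distance flexibility of a point with respect to affine bounds A x ≤ b; the unit-box example is purely illustrative and not taken from the paper.

```python
# Illustrative sketch of the distance-based flexibility metrics described
# above, assuming the design space is given by affine bounds A x <= b.
import numpy as np

def flexibility_metrics(A, b, x):
    """Distances from a feasible point x to each bound a_i^T x <= b_i;
    the minimum flags the riskiest direction, the maximum the most slack."""
    norms = np.linalg.norm(A, axis=1)
    distances = (b - A @ x) / norms        # >= 0 for feasible x
    return distances.min(), distances.max()

# Example: unit box 0 <= x1, x2 <= 1 written as A x <= b.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([1.0, 0.0, 1.0, 0.0])
print(flexibility_metrics(A, b, np.array([0.3, 0.5])))  # (0.3, 0.7)
```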
A detailed mathematical framework was developed for this class of problems. It enables a new paradigm for flexibility assessment, which can be applied to design problems of any dimension. Problem complexity is significantly reduced in comparison with the classic multiparametric programming approaches, since only a limited number of active sets need to be considered during solution calculation.
References:
[1] Smith, R. 2005. Chemical Process Design and Integration, John Wiley & Sons.
[2] Pistikopoulos, E. N. & Tian, Y. 2024. Advanced Modeling and Optimization Strategies for Process Synthesis. Annual Review of Chemical and Biomolecular Engineering, 15, 81-103.
[3] Swaney, R. E. & Grossmann, I. E. 1985. An index for operational flexibility in chemical process design. Part I: Formulation and theory. AIChE Journal, 31, 621-630.
[4] Geremia, M., Bezzo, F. & Ierapetritou, M. G. 2023. A novel framework for the identification of complex feasible space. Computers & Chemical Engineering, 179, 108427.
5:00pm - 5:20pm Interval Hessian-based Algorithm for Nonconvex Optimization
Ashutosh Sharma1, Gauransh Dingwani2, Ishan Bajaj1
1Department of Chemical Engineering, Indian Institute of Technology Kanpur, Kanpur 208016, India; 2Department of Chemical Engineering, Indian Institute of Technology Roorkee, Roorkee 247667, India
Second-order optimization algorithms based on the Hessian of the objective function have been proven to achieve a faster convergence rate than the first-order methods. However, their application to large-scale nonconvex problems is hindered due to the computational cost associated with: (1) Hessian evaluation, (2) Hessian matrix inversion, and (3) modifying the Hessian matrix to ensure its positive-definiteness.
Accordingly, we propose a new search direction based on the interval Hessian and incorporate it into a line-search framework to find a local minimum of unconstrained nonconvex optimization problems. The algorithm works as follows; an illustrative sketch is given after the steps.
Step 0: Define p as the index corresponding to the Hessian update and k as the index corresponding to the iterate update.
Step 1: Define a region of size Δ around the current iterate x_k = x_p, obtain the variable bounds (x_k, x_p ∈ [x_p^L, x_p^U]), and estimate the interval Hessian [∇²f_p].
Step 2: Estimate a lower bound λ_p on the minimum eigenvalue of [∇²f_p]. We use the Gerschgorin, E-Matrix, and Mori-Kokame methods for this step.
Step 3: Obtain the matrix H_p = ∇²f_p + max(0, −λ_p) I, which approximates the Hessian of the objective function and is guaranteed to be positive definite in the interval [x_p^L, x_p^U]. The matrix H_p can be viewed as the Hessian of the αBB convex underestimator [1] developed for deterministic global optimization of twice-differentiable nonconvex problems.
Step 4: Obtain the search direction as the product of (H_p)⁻¹ and the gradient ∇f_k.
Step 5: Obtain the new iterate as x_{k+1} = x_k − θ_k (H_p)⁻¹ ∇f_k, where θ_k is a step length satisfying the Armijo conditions.
Step 6: If ||∇f_k|| < ε, terminate the algorithm; otherwise, update the index k ← k+1. If x_{k+1} ∈ [x_p^L, x_p^U], go to Step 4; otherwise, set p ← p+1 and go to Step 1.
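The following sketch illustrates Steps 0-6 on a small nonconvex test function, assuming a simple Gerschgorin bound evaluated at a few sample points as a stand-in for the interval-Hessian eigenvalue bound; the authors' interval arithmetic and eigenvalue bounding methods are not reproduced here.

```python
# Minimal sketch of the line-search scheme outlined in Steps 0-6, with a
# Gerschgorin bound over sampled points used instead of the interval Hessian.
import numpy as np

def gerschgorin_lower_bound(H):
    """Lower bound on the minimum eigenvalue of a symmetric matrix H."""
    diag = np.diag(H)
    off = np.sum(np.abs(H), axis=1) - np.abs(diag)
    return np.min(diag - off)

def interval_hessian_descent(f, grad, hess, x, delta=0.1, eps=1e-6, max_iter=500):
    H_mod_inv, lo, hi = None, None, None
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < eps:                        # Step 6: convergence
            break
        if H_mod_inv is None or np.any(x < lo) or np.any(x > hi):
            lo, hi = x - delta, x + delta                  # Step 1: region [x^L, x^U]
            # Step 2 (simplified): bound the min eigenvalue at sample points.
            lam = min(gerschgorin_lower_bound(hess(p)) for p in (lo, x, hi))
            H = hess(x) + max(0.0, -lam) * np.eye(x.size)  # Step 3: PD shift
            H_mod_inv = np.linalg.inv(H)
        d = -H_mod_inv @ g                                 # Step 4: search direction
        theta = 1.0                                        # Step 5: Armijo backtracking
        while f(x + theta * d) > f(x) + 1e-4 * theta * g @ d:
            theta *= 0.5
        x = x + theta * d
    return x

# Example: nonconvex 2-D test function with minima at x1 = ±sqrt(1.5), x2 = 0.
f = lambda x: x[0]**4 - 3*x[0]**2 + x[1]**2
grad = lambda x: np.array([4*x[0]**3 - 6*x[0], 2*x[1]])
hess = lambda x: np.array([[12*x[0]**2 - 6, 0.0], [0.0, 2.0]])
print(interval_hessian_descent(f, grad, hess, np.array([0.5, 1.0])))
```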
The novelty of the algorithm is that, unlike traditional second-order methods such as Newton's method, we avoid performing expensive operations at every iteration. Specifically, for iterations with x_k ∈ [x_p^L, x_p^U], we compute the Hessian only once, and the search direction is obtained by a matrix-vector multiplication. In contrast, every iteration of Newton's method requires computing the Hessian, and finding the search direction requires O(n³) operations. Moreover, for Newton's method, the Hessian may become indefinite for nonconvex problems and must be modified to guarantee a descent direction. In our algorithm, however, the matrix H_p (defined in Step 3 above) is a positive-definite approximation of the Hessian of the original function and need not be modified.
We apply our algorithm to a set of 210 problems and show that it converges to a local minimum for 70% of the problems using Δ = 0.1 with fewer than 1000 Hessian evaluations. We also compared the proposed algorithm with the traditional trial-and-error approach to Hessian modification and showed that, within fewer than 21 O(n³) operations, our algorithm solved 44% of the problems while the traditional method solved 33%.
References
[1] Adjiman et al. (1998). Computers & Chemical Engineering, 22(9):1137–1158
5:20pm - 5:40pm Efficient exploration of near-optimal designs by exploiting convexity
Evren Mert Turan, Stefano Moret, André Bardow
ETH Zurich, Switzerland
The transition away from fossil resources necessitates a massive shift in the energy, chemicals, and fuels industries. To facilitate this shift, researchers have developed greenfield models of future supply chains, technology mixes, and infrastructure requirements. These predictions are derived from systems models that are typically formulated as large-scale linear programs minimizing the capital and operating cost of the greenfield system while meeting specified sustainability goals. However, Trutnevyte (2016) showed that providing a single cost-optimal solution is both limiting and naïve: Firstly, due to the deep uncertainty of the future and simplifying assumptions, we can be certain that the model's cost-optimal solution is not the "real-life" cost-optimal solution. Secondly, a model usually neglects crucial factors that decision-makers will consider in planning the transition. Some of these factors, e.g., public acceptance of technologies, are challenging to state mathematically, so including them directly as optimization constraints or additional objectives is not feasible.
This realisation has led to the proposal of various algorithms for generating many near-cost-optimal designs (e.g., 10% suboptimal). The designs can then be analysed at a later stage taking all decision factors into account. These algorithms are unified under the name Modelling to Generate Alternatives (MGA, Brill et al. 1982). MGA algorithms solve a sequence of optimization problems to explore the near-optimal space. MGA has two goals: (i) to identify maximally different solutions and (ii) to ensure a diversity of solutions to prevent a bias towards certain combinations. An ideal MGA algorithm will meet these goals while avoiding excessive computational effort and not requiring major changes of existing models. Surprisingly, MGA algorithms are not compared based on their effectiveness at meeting these goals but instead with respect to the computational cost per iteration (Lau et al., 2024).
In this work, we propose tractable metrics to measure MGA goals and propose a prototype MGA algorithm that exploits the convexity of the design problem to efficiently generate near-optimal designs. The algorithm is based on the classic hit-and-run Markov chain Monte Carlo algorithm and involves solving a sequence of feasibility problems, instead of optimization problems, by generating new designs along line segments within the near-optimal region. This greatly reduces the computational cost per new design.
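As an illustration of the underlying sampling idea, the sketch below applies hit-and-run sampling to a near-optimal region written as a bounded polytope {x : A x ≤ b} (with the cost constraint c^T x ≤ (1+ε) f* folded into A and b); it is a generic sketch under these assumptions, not the authors' exact algorithm.

```python
# Hit-and-run sampling over a bounded polytope {x : A x <= b}; each step picks
# a random direction and a random point on the feasible line segment.
import numpy as np

def hit_and_run(A, b, x0, n_samples=1000, seed=0):
    rng = np.random.default_rng(seed)
    x, samples = np.asarray(x0, dtype=float), []
    for _ in range(n_samples):
        d = rng.normal(size=x.size)
        d /= np.linalg.norm(d)                  # random search direction
        denom, slack = A @ d, b - A @ x         # step limits from A(x + t d) <= b
        t_max = np.min(slack[denom > 1e-12] / denom[denom > 1e-12], initial=np.inf)
        t_min = np.max(slack[denom < -1e-12] / denom[denom < -1e-12], initial=-np.inf)
        x = x + rng.uniform(t_min, t_max) * d   # new design on the feasible segment
        samples.append(x.copy())
    return np.array(samples)

# Example: sample the box 0 <= x <= 1 in 3 dimensions.
A = np.vstack([np.eye(3), -np.eye(3)])
b = np.hstack([np.ones(3), np.zeros(3)])
print(hit_and_run(A, b, x0=np.full(3, 0.5), n_samples=5))
```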
We benchmark the proposed MGA metrics and algorithm against alternative methods from the literature. Our results show the benefits of exploiting the convexity of the solution space. The resulting methodology can be directly applied to any design problem stated as a linear program and is applicable, with minor modifications, to any convex design problem.
References
Brill Jr, E. D., Chang, S. Y., & Hopkins, L. D. (1982). Modeling to generate alternatives: The HSJ approach and an illustration using a problem in land use planning. Management Science, 28(3), 221-235.
Trutnevyte, E. (2016). Does cost optimization approximate the real-world energy transition?. Energy, 106, 182-193.
Lau, M., Patankar, N., & Jenkins, J. D. (2024). Measuring Exploration: Review and Systematic Evaluation of Modelling to Generate Alternatives Methods in Macro-Energy Systems Planning Models. arXiv preprint arXiv:2405.17342.
5:40pm - 6:00pm SNoGloDe: A Structured Nonlinear Global Decomposition Solver
Georgia Stinchfield1, Arsh Bhatia1, Michael Bynum2, Yankai Cao3, Carl Laird1
1Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, PA; 2Discrete Math and Optimization, Sandia National Laboratories, Albuquerque, NM; 3University of British Columbia, Chemical and Biological Engineering, Vancouver, British Columbia, Canada
Many industrial-scale optimization problems cannot be solved directly with off-the-shelf solvers. These large-scale optimization problems typically require decomposition strategies and customized optimization algorithms to reach a feasible solution within a realistic computational time window. Many decomposition techniques rely on reformulating the problem such that it has a block-angular structure. In this work, we focus on implementing and extending a decomposition algorithm originally posed for nonlinear two-stage stochastic programs (which exhibit this block-angular structure), as first proposed by Cao and Zavala [1]. While many decomposition methods target linear or convex models, the approach proposed in [1] has global optimality guarantees for a class of nonlinear, non-convex problems.
The general algorithmic framework proposed in [1] involves traversing a spatial branch-and-bound tree in the space of the first-stage variables of the stochastic program. The lower-bounding problem is formulated by removing the linking equality constraints, and the upper-bounding problem is typically solved by fixing the first-stage variables, making the major algorithmic steps of this approach trivially parallelizable.
We develop a general framework, utilizing the algebraic modeling language Pyomo [2], to decompose and solve large-scale optimization problems with variants of a block-angular structure; in other words, extending beyond two-stage nonlinear stochastic programs to other cases involving complicating variables. For example, consider problems that are decomposed temporally. Instead of enforcing equality constraints across a subset of first-stage variables, neighboring time periods from the original time horizon must have equality constraints imposed on certain variables linking the end of time period N-1 and the start of time period N. These problems can have combinations of discrete and continuous decisions, along with linear and nonlinear constraints. Our tool aims to provide a unified framework that applies and extends the general algorithm proposed in [1], allows for customizations to the upper- and lower-bounding problems (e.g., problem-specific generation of candidate solutions, custom relaxations at the lower bound), provides a parallelized implementation, and supports a variety of problem structures that exhibit complicating variables.
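To illustrate the targeted structure, the following Pyomo sketch builds a toy two-stage model in which scenario blocks are coupled only through linking equality constraints on a shared first-stage variable; the names and data are hypothetical and do not reflect the SNoGloDe interface.

```python
# Minimal Pyomo sketch of the block-angular structure the solver targets:
# scenario blocks coupled only through copies of a shared first-stage variable.
import pyomo.environ as pyo

scenarios = {"s1": 1.0, "s2": 2.5, "s3": 4.0}
m = pyo.ConcreteModel()
m.S = pyo.Set(initialize=list(scenarios))

def scenario_block(b, s):
    b.x = pyo.Var(bounds=(0, 10))   # local copy of the first-stage variable
    b.y = pyo.Var(bounds=(0, 10))   # second-stage (recourse) variable
    b.con = pyo.Constraint(expr=b.y >= (b.x - scenarios[s]) ** 2)  # scenario constraint
    b.cost = pyo.Expression(expr=b.y + 0.1 * b.x)
m.block = pyo.Block(m.S, rule=scenario_block)

m.z = pyo.Var(bounds=(0, 10))       # shared first-stage variable
# Linking (non-anticipativity) constraints: relaxing these yields independent
# lower-bounding subproblems; fixing z gives an upper-bounding problem.
m.link = pyo.Constraint(m.S, rule=lambda m, s: m.block[s].x == m.z)
m.obj = pyo.Objective(expr=sum(m.block[s].cost for s in m.S) / len(scenarios))
```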
References
[1] Cao, Yankai, and Victor M. Zavala. "A scalable global optimization algorithm for stochastic nonlinear programs." Journal of Global Optimization 75.2 (2019): 393-416.
[2] Bynum, Michael L., et al. Pyomo - Optimization Modeling in Python. Vol. 67. Berlin/Heidelberg, Germany: Springer, 2021.
Acknowledgements & Disclaimers
Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.