Conference Agenda

Session
T4: Model-Based Optimisation and Advanced Control - Including Keynote
Time:
Tuesday, 08/July/2025:
8:30am - 10:30am

Chair: Srinivas Palanki
Co-chair: Radoslav Paulen
Location: Zone 3 - Aula D002

KU Leuven Ghent Technology Campus, Gebroeders De Smetstraat 1, 9000 Gent

Presentations
8:30am - 9:10am

Keynote: Principles and Applications of Model-free Extremum Seeking – A Tutorial Review

Laurent Dewasme, Alain Vande Wouwer

University of Mons, Belgium

Extremum seeking has become a large family of perturbation-based methods whose origins can be traced back to the work of the French engineer Leblanc in 1922, who sought to transmit electrical power to a train car contactlessly. Since then, extremum seeking has gained significant attention, especially in the past three decades. It is a practical approach aimed at achieving optimal performance of a system by continuously seeking the extremum of an online-measured cost function. The method finds various applications in different fields, provided that the required measurement information is available and an optimality principle can be formulated. While many simulation studies have confirmed the effectiveness of extremum seeking, relatively few experimental studies have been conducted to validate its real-world applications. This gap between theory and practice is a common challenge in real-time optimization and control. This presentation will provide an in-depth introduction to the foundational principles and applications of extremum seeking, including variations and extensions of the method. Practical applications will be presented in various fields, such as energy, biotechnology, and robotics, with a particular focus on the authors’ experience gathered in several research studies over the last 15 years.
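As a concrete illustration, the sketch below shows the classical sinusoidal-dither extremum seeking loop, one of the basic schemes covered in such a tutorial: a dither signal perturbs the input, the measured cost is demodulated to estimate the local gradient, and the input estimate descends that gradient. The quadratic cost and the tuning values (dither amplitude, frequency, adaptation gain) are illustrative assumptions, not material from the talk.

    import numpy as np

    # Minimal sketch of classical (sinusoidal-dither) extremum seeking.
    # The quadratic cost stands in for an online-measured performance index
    # that is unknown to the controller; its minimum lies at u = 2.
    def measured_cost(u):
        return (u - 2.0) ** 2 + 1.0

    dt, a, omega, k = 0.01, 0.1, 5.0, 0.5   # step size, dither amplitude/frequency, gain
    u_hat, t = 0.0, 0.0                     # current estimate of the optimal input
    for _ in range(200000):
        u = u_hat + a * np.sin(omega * t)          # perturb the input with the dither
        J = measured_cost(u)                       # online cost measurement
        u_hat -= k * J * np.sin(omega * t) * dt    # demodulate and descend the gradient estimate
        t += dt
    print(f"converged input estimate: {u_hat:.3f}")   # approaches the optimum at 2.0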



9:10am - 9:30am

Extremum seeking control applied to operation of dividing wall column

Ivar J. Halvorsen1,2, Mark Haring1, Sigurd Skogestad2

1SINTEF Digital, Norway; 2Norwegian University of Science and Technology

The dividing wall column (DWC) is an attractive arrangement, since it offers a significant energy-saving potential compared to conventional column sequences. Realising this saving potential requires control structures that can track the optimal operating point despite inevitable changes in feed properties, performance characteristics and other uncertainties. The characteristic of the optimum is known, given a good model and key measurements that provide precise information about the internal states. In practice, however, there are always uncertainties in the model, in the measurements and in the realisation of the manipulative variables, and the most informative measurements, key composition data inside the arrangement, are usually not available; some of this information may, however, be inferred from temperature profiles. Extremum seeking control (ESC) is a model-free optimisation technique that, by actively perturbing selected manipulative variables, infers gradient properties of a measured cost function and thereby enables tracking of a moving optimum. ESC can also be used in combination with other approaches, e.g. self-optimising control (SOC): for any model-based control and optimisation technique there remain residual uncertainties, so further adjustment by model-free ESC on top will be beneficial. Several ESC structures for the DWC will be analysed and used to plan experiments on a pilot plant.
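To illustrate the layered idea, the sketch below lets model-free ESC adjust the setpoint handed to a lower control layer while the optimum drifts slowly, as it would under changing feed conditions. The quadratic energy-cost surrogate, the drift of the optimum and the tuning values are illustrative assumptions, not the pilot-plant configuration.

    import numpy as np

    # Illustrative cascade: ESC adapts the setpoint of a lower control layer
    # and tracks a slowly moving optimum of a measured energy cost.
    def energy_cost(setpoint, t):
        optimum = 0.5 + 0.1 * np.sin(0.002 * t)   # slowly drifting optimum (e.g. feed changes)
        return (setpoint - optimum) ** 2

    dt, a, omega, k = 0.1, 0.02, 1.0, 2.0
    r_hat, t = 0.0, 0.0                           # setpoint passed to the lower control layer
    for _ in range(50000):
        r = r_hat + a * np.sin(omega * t)         # perturbed setpoint for the inner loop
        J = energy_cost(r, t)                     # measured cost, e.g. specific reboiler duty
        r_hat -= k * J * np.sin(omega * t) * dt   # demodulate and follow the gradient estimate
        t += dt
    print(f"tracked setpoint: {r_hat:.3f}, current optimum: {0.5 + 0.1 * np.sin(0.002 * t):.3f}")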



9:30am - 9:50am

MORL4PC: An Adaptive Multi-Objective Reinforcement Learning Approach for Process Control

Niki Kotecha, Max Bloor, Calvin Tsay, Antonio del Rio Chanona

Sargent Centre for Process Systems Engineering, Imperial College London, United Kingdom

Industrial process control presents a complex challenge of balancing multiple objectives, such as optimizing productivity, reducing energy consumption, minimizing environmental impact, and maintaining safe operation [1]. Traditional model-based control methods often struggle to handle these competing goals, particularly when confronted with unexpected disruptions that deviate from their underlying assumptions. These disturbances can render pre-defined models inaccurate, leading to suboptimal control decisions, emergency shutdowns, and costly downtime [2]. Multi-objective reinforcement learning (MORL) offers a powerful solution to this problem by learning a set of Pareto-optimal policies that provide trade-offs between various objectives. With MORL, operators can seamlessly switch between policies in response to system disruptions, preventing shutdowns and ensuring smoother operation under changing conditions [3].

This paper explores the application of MORL in process control systems, where uncertain and time-varying environments present significant challenges for traditional control methods. By utilizing MORL, we enable controllers to optimize multiple competing objectives simultaneously, generating a Pareto front of policies that allow for flexible decision-making. This Pareto front provides operators with a set of pre-learned policies that offer different trade-offs, enabling quick adaptation to disruptions like equipment failure, raw material fluctuations, or unanticipated system variations.

In this work, we integrate multi-objective evolutionary algorithms (MOEA) within a reinforcement learning framework to find a set of adaptable policies that effectively balance two conflicting objectives. We employ the MOEA to adapt the neural network (policy) parameters directly in parameter space, resulting in a Pareto front in policy space. The Non-dominated Sorting Genetic Algorithm II (NSGA-II) is used to evaluate and sort the policies, yielding a final population that represents a Pareto front of non-dominated solutions. Our methodology employs an adaptive strategy: when a disruption hits the system, the controller switches dynamically to another policy from the Pareto front obtained during training.
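To make the selection step concrete, the sketch below applies a plain non-dominated filter to a population of policies that has already been evaluated on two objectives, and switches to a different Pareto policy when a disruption changes the objective priorities. The objective values and weights are illustrative, and full NSGA-II additionally uses front ranking and crowding distance for survivor selection.

    import numpy as np

    # Sketch of the policy-selection step: keep the non-dominated policies from an
    # evolved population and switch between them when a disruption changes the
    # operator's objective priorities.  Objective values are "higher is better".
    def pareto_front(objectives):
        """Return indices of non-dominated rows of an (n_policies, n_objectives) array."""
        keep = []
        for i in range(len(objectives)):
            dominated = any(
                np.all(objectives[j] >= objectives[i]) and np.any(objectives[j] > objectives[i])
                for j in range(len(objectives)) if j != i
            )
            if not dominated:
                keep.append(i)
        return keep

    def select_policy(front, objectives, weights):
        """Pick the Pareto policy that best matches the current objective weights."""
        scores = objectives[front] @ np.asarray(weights)
        return front[int(np.argmax(scores))]

    # columns: (productivity, negative energy use); one row per evolved policy (illustrative numbers)
    obj = np.array([[0.9, -1.2], [0.7, -0.8], [0.5, -0.5], [0.6, -1.0], [0.8, -0.9]])
    front = pareto_front(obj)
    nominal   = select_policy(front, obj, weights=[0.7, 0.3])   # normal operation: favour productivity
    disrupted = select_policy(front, obj, weights=[0.2, 0.8])   # disruption: favour energy saving
    print(front, nominal, disrupted)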

One of the key advantages of MOEA-RL for process control is its ability to enhance system resilience. When a disruption hits, instead of relying on a single static policy, operators can choose the most appropriate policy from the Pareto front to prioritize certain objectives, such as minimizing waste or reducing energy consumption, while maintaining system stability. This adaptability greatly reduces the need for emergency shutdowns, as the system can continue to operate under new conditions with an optimized, situation-specific policy. This results in improved operational efficiency, fewer downtime incidents, and increased overall process stability. The effectiveness of our method is demonstrated through a series of case studies simulating various disruptions, throughout which our approach consistently shows adaptability and robustness.

In conclusion, MOEA-RL enables the system to seamlessly switch between policies from the trained Pareto set in response to disruptions, allowing operators to respond more effectively while minimizing downtime and improving overall system performance. Future work will focus on incorporating curiosity-driven exploration and on methods to further enhance policy switching in highly dynamic environments.

[1] Simkoff, J. M., et al. "Process control and energy efficiency." Annu. Rev. Chem. Biomol. Eng. 11 (2020): 423–445.

[2] Bloor, Maximilian, et al. "Control-Informed Reinforcement Learning for Chemical Processes." arXiv preprint arXiv:2408.13566 (2024).

[3] Hayes, Conor F., et al. "A practical guide to multi-objective reinforcement learning and planning." Autonomous Agents and Multi-Agent Systems 36.1 (2022): 26.



9:50am - 10:10am

Perturbation Methods for Modifier Adaptation with Quadratic Approximation

Mohamed Tarek Aboelnour1,2, Sebastian Engell2

1BASF SE, Germany; 2Technical University Dortmund

In recent years, the field of real-time optimization (RTO) has gained significant attention due to the increasing pressure to operate processing plants optimally from both an economic and emissions point of view. A key problem in RTO is that the plant models must be accurate in order to obtain an admissible and optimal operating point of the plant. While the re-estimation of model parameters online can improve performance if the plant model is structurally correct, Modifier Adaptation (MA) can handle both parametric and structural plant-model mismatches. It is an iterative method that adapts the gradients of the cost function and the gradients of the constraints based on measurement information. Upon convergence, the first-order optimality conditions are satisfied.
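As an illustration of this mechanism, the sketch below runs first-order modifier adaptation on an unconstrained one-dimensional example: the gradient mismatch between plant and model is filtered into a modifier that shifts the model optimum until it coincides with the plant optimum, at which point the first-order optimality conditions of the plant are satisfied. The cost functions, filter gain and grid-based solution of the modified model problem are illustrative assumptions, not the authors' case study.

    import numpy as np

    # First-order modifier adaptation on a 1-D unconstrained example with
    # structural plant-model mismatch; the plant optimum lies at u = 2.75.
    phi_model = lambda u: (u - 1.0) ** 2                 # mismatched process model
    phi_plant = lambda u: (u - 3.0) ** 2 + 0.5 * u       # "true" plant, only measurable

    def num_grad(f, u, h=1e-4):
        return (f(u + h) - f(u - h)) / (2 * h)

    u_grid = np.linspace(-5.0, 10.0, 20001)              # crude stand-in for the model optimizer
    u_k, lam, K = 0.0, 0.0, 0.6                          # operating point, gradient modifier, filter gain
    for _ in range(30):
        mismatch = num_grad(phi_plant, u_k) - num_grad(phi_model, u_k)
        lam = (1 - K) * lam + K * mismatch               # filtered gradient modifier
        modified = phi_model(u_grid) + lam * u_grid      # modifier-adapted model cost
        u_k = u_grid[int(np.argmin(modified))]           # next operating point
    print(f"converged operating point: {u_k:.3f}")       # matches the plant optimum despite model mismatch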

In this contribution, we employ Modifier Adaptation with Quadratic Approximation (MAWQA) [1]. MAWQA combines modifier adaptation with a quadratic approximation approach adapted from derivative-free optimization, and thereby alleviates the problem of estimating the gradients from noisy measurements. The quadratic approximation is computed using information from past operating points. However, the distribution of the points from which the approximation is computed has a significant influence on its quality. It is possible for the optimization to become stuck in a region away from the true optimum because the information from the previous operating points is not sufficiently rich. In such cases, additional trials (or perturbations) are necessary.
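The quadratic-approximation step itself can be sketched as a least-squares fit of a quadratic surrogate to past, noisy operating points, from which the gradient is then read off instead of being estimated by finite differences. The two-dimensional cost function and the noise level below are illustrative assumptions.

    import numpy as np

    # Fit a quadratic surrogate of the measured cost from past operating points
    # and evaluate its gradient at the current point (the origin here).
    rng = np.random.default_rng(0)
    U = rng.uniform(-1.0, 1.0, size=(12, 2))                    # past operating points
    J = (U[:, 0] - 0.3) ** 2 + 2.0 * (U[:, 1] + 0.2) ** 2       # measured cost ...
    J += 0.01 * rng.standard_normal(len(J))                     # ... corrupted by measurement noise

    def quadratic_basis(U):
        u1, u2 = U[:, 0], U[:, 1]
        return np.column_stack([np.ones(len(U)), u1, u2, u1**2, u2**2, u1 * u2])

    coef, *_ = np.linalg.lstsq(quadratic_basis(U), J, rcond=None)
    c0, g1, g2, h11, h22, h12 = coef

    def surrogate_gradient(u):
        return np.array([g1 + 2.0 * h11 * u[0] + h12 * u[1],
                         g2 + 2.0 * h22 * u[1] + h12 * u[0]])

    print(surrogate_gradient(np.zeros(2)))   # close to the true gradient [-0.6, 0.8] at the origin

If the past points are poorly distributed, for instance nearly collinear, the regression matrix becomes ill-conditioned and the surrogate unreliable, which is exactly the geometry problem the perturbation methods proposed in this paper address.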

For gradients computed by finite differences, it was proposed in [2] to solve an additional optimization problem aimed at maximizing the inverse of the condition number of the inputs to compute a trial point that would best improve the geometry. While this strategy proves effective for simplex gradients, it is less well-suited for quadratic approximations.

In this paper, we propose two different methods from derivative-free optimization for measuring the poisedness of the distribution of the observations. In addition, we introduce perturbation methods, based on these poisedness measures, that are better suited for improving the quality of quadratic surrogate functions. The first method leverages Lagrange polynomials to compute a set of inputs with improved geometric distribution. The second method utilizes pivot polynomials in combination with Gaussian elimination to plan the plant trials. We compare these methods with others proposed in the literature and validate their efficiency using the Williams-Otto reactor benchmark, a well-established test case for RTO methodologies.

[1] Gao, W., Wenzel, S., and Engell, S. (2016). A reliable modifier-adaptation strategy for real-time optimization. Computers & Chemical Engineering, 91, 318–328.

[2] Gao, W. and Engell, S. (2005). Iterative set-point optimization of batch chromatography. Computers & Chemical Engineering, 29, 1401–1409.



10:10am - 10:30am

Optimal Energy Scheduling for Battery and Hydrogen Storage Systems Using Reinforcement Learning

Moritz Zebenholzer1, Lukas Kasper1, Alexander Schirrer2, René Hofmann1

1TU Wien, Institute of Energy Systems and Thermodynamics, Austria; 2TU Wien, Institute of Mechanics and Mechatronics, Austria

Due to the energy transition, the share of renewable energy sources, such as wind and photovoltaics, is steadily increasing. Their generation is highly volatile and difficult to predict, resulting in a gap between generation and demand that must be balanced by storage at all times. To accomplish this efficiently and reliably with respect to time and energy, sector-coupled multi-energy systems (MES) combined with battery and hydrogen storage systems are deployed. The optimal and safe operation of such MES requires operational planning, which in industry is typically done by rule-based controllers (RBC), whereas more elaborate model predictive control (MPC) strategies are the subject of current research. This form of optimal control is generally seen as delivering the best possible performance, usually in terms of minimum operating costs while complying with the system-relevant constraints.
However, the main obstacle to realizing MPC is that the optimization depends heavily on an adequate model of the system dynamics, which requires extensive effort to build. In addition, the uncertain prediction of stochastically fluctuating quantities such as renewable energy generation, demand and electricity prices strongly affects the control performance. Moreover, in use cases that require long prediction horizons and detailed models, the arising mixed-integer MPC problems may require excessive computational effort.
This work aims to use Reinforcement Learning (RL) to overcome these difficulties without applying elaborate mixed-integer linear programming (MILP). A hybrid neural network (NN) combines a classification module for binary variables with a regression layer for continuous values to efficiently model discontinuous system behaviour. The self-learning algorithm, which requires no prior knowledge of the system dynamics, can inherently capture the effect of the uncertain input variables on the system purely through interaction with the environment model. In a case study, it is demonstrated that RL can learn complex system behaviour with a quality comparable to the MPC while outperforming the RBC. The trained policy of the RL agent can then be deployed with significantly lower computational effort.
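A minimal sketch of such a hybrid network is given below, assuming PyTorch and illustrative state and action dimensions; the abstract does not specify the actual architecture or the surrounding RL algorithm.

    import torch
    import torch.nn as nn

    # Hybrid policy network: a shared trunk, a classification head for binary
    # (on/off) decisions and a regression head for continuous set-points.
    class HybridPolicy(nn.Module):
        def __init__(self, n_state=8, n_binary=2, n_continuous=3, hidden=64):
            super().__init__()
            self.trunk = nn.Sequential(nn.Linear(n_state, hidden), nn.ReLU(),
                                       nn.Linear(hidden, hidden), nn.ReLU())
            self.binary_head = nn.Linear(hidden, n_binary)           # logits for unit on/off decisions
            self.continuous_head = nn.Linear(hidden, n_continuous)   # e.g. charge/discharge set-points

        def forward(self, state):
            h = self.trunk(state)
            return torch.sigmoid(self.binary_head(h)), self.continuous_head(h)

    policy = HybridPolicy()
    state = torch.randn(1, 8)                  # e.g. storage levels, forecasts, electricity price
    on_off_prob, setpoints = policy(state)
    print(on_off_prob.shape, setpoints.shape)  # torch.Size([1, 2]) torch.Size([1, 3])
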
In future work, methods will be developed to enable the RL agent to enhance or outperform the MPC operational strategy. Statistical analyses will be used to derive additional information about the predictions of energy generation, demand and electricity prices in order to obtain an enriched RL agent. In addition, feature and reward engineering will incorporate relevant information into the training process.