An overview of all sessions of this event. Select a location or a date to display only the corresponding sessions. Select a session to go to the detail view.
 
 
SES 1-3: Machine Learning for Industrial Applications
Time: Thursday, 13.03.2025, 11:45 - 12:30
Session Chair: Thomas Weickert, Hochschule Mannheim
Location: Lecture Hall Z211, 2nd floor, Zentralgebäude
 
 
 
Presentations
Beyond Monte Carlo: Leveraging Temporal Difference Learning for Superior Performance in Dynamic Resource Allocation
David Heik, Fouad Bahrpeyma, Dirk Reichelt
Hochschule für Technik und Wirtschaft, Germany
The application of reinforcement learning to dynamic industrial scheduling has gained increasing attention due to its potential to optimize complex manufacturing processes. Industry 4.0 and the rise of smart manufacturing present new challenges that require innovative approaches, particularly in environments with high variability and uncertainty. Previous work demonstrated that reinforcement learning, especially through Monte Carlo methods, significantly improves performance in job-shop scheduling scenarios by optimizing resource allocation. However, while Monte Carlo methods excel when the reward function is clear and retrospective, real-world manufacturing systems often require more dynamic, real-time decision-making capabilities, for which temporal difference methods are more appropriate. Research in this area has shown the effectiveness of reinforcement learning, but a gap remains in understanding how different reward functions impact the learning process in temporal difference systems. In this study, we systematically analyzed multiple reward functions within a temporal difference system, applying a sensitivity analysis to assess their impact during the training and evaluation phases. Despite the inherent complexities and challenges posed by temporal difference methods, our results show further improvements in the overall performance of the production line. Finally, this paper shows how a goal-oriented reward function can be systematically developed.
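The methodological contrast the abstract draws, Monte Carlo updates that wait for a retrospective episode return versus temporal-difference updates that bootstrap online, can be illustrated in a few lines. The sketch below is not the authors' system: it is a minimal, hypothetical two-machine job-assignment toy task with tabular Q-learning, where the hand-chosen reward (the negative load of the selected machine) merely stands in for the reward functions whose sensitivity the paper analyzes.

```python
# Minimal sketch (not the paper's implementation): tabular Q-learning with a
# TD(0) update versus a Monte Carlo update on a hypothetical two-machine
# job-assignment toy task. Higher return corresponds to a shorter makespan.
import random
from collections import defaultdict

N_MACHINES, N_JOBS = 2, 8
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
q = defaultdict(float)                       # (state, action) -> value estimate

def policy(state):
    if random.random() < EPS:                # epsilon-greedy exploration
        return random.randrange(N_MACHINES)
    return max(range(N_MACHINES), key=lambda a: q[(state, a)])

def run_episode(td=True):
    loads = [0] * N_MACHINES
    durations = [random.randint(1, 5) for _ in range(N_JOBS)]
    trajectory = []
    for i, dur in enumerate(durations):
        state = (tuple(loads), dur)          # machine loads + pending job
        action = policy(state)
        loads[action] += dur
        reward = -loads[action]              # penalize piling onto a busy machine
        if td:
            if i + 1 < N_JOBS:               # TD(0): bootstrap from next state
                nxt = (tuple(loads), durations[i + 1])
                target = reward + GAMMA * max(q[(nxt, a)] for a in range(N_MACHINES))
            else:
                target = reward              # terminal step: no bootstrap
            q[(state, action)] += ALPHA * (target - q[(state, action)])
        else:
            trajectory.append((state, action, reward))
    if not td:
        g = 0.0                              # Monte Carlo: wait for episode end,
        for state, action, reward in reversed(trajectory):
            g = reward + GAMMA * g           # then propagate the return backwards
            q[(state, action)] += ALPHA * (g - q[(state, action)])
    return max(loads)                        # makespan of this episode

for _ in range(5000):
    run_episode(td=True)
print("sample makespan after TD training:", run_episode(td=True))
```

Training the same table with run_episode(td=False) uses the retrospective Monte Carlo return instead, which makes the difference in update timing, online per step versus once per finished episode, directly comparable.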
 
 
YoloRL: simplifying dynamic scheduling through efficient action selection based on multi-agent reinforcement learning
David Heik, Fouad Bahrpeyma, Dirk Reichelt
Hochschule für Technik und Wirtschaft, Germany
The ability to react autonomously and dynamically to unpredictable events is crucial for cost-effective production scheduling in modern manufacturing environments. The progressive integration of cyber-physical systems into industrial sectors is one of the prerequisites for this development, and the industrial data generated there provides the foundation for operational and strategic decision-making. A central challenge is to collect this data in real time, transform it where necessary, and analyze it to support time-critical decisions. In this paper, we present a novel approach that simplifies dynamic scheduling through efficient action selection. Our method, YoloRL, is based on (multi-agent) reinforcement learning but is characterized by a highly simplified training process. Instead of considering all state information of an episode, our YoloRL (You Only Look Once Reinforcement Learning) approach focuses only on the initial state to identify promising action sequences. This leads to a significant reduction in training complexity while still enabling robust and adaptive control. The performance of the manufacturing system in this work is measured by the overall completion time, with the objective of minimizing this metric. Our experimental results indicate that the proposed method leads to faster generalization of the learned domain knowledge and yields a powerful policy that performs efficiently and reliably in dynamic environments.
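To make the "initial state only" idea concrete: the following is a hedged sketch, not the authors' YoloRL implementation and not necessarily its learning rule. Assuming a hypothetical fixed job list as the initial state, a simple REINFORCE-style learner keeps one action preference per job position, samples a complete machine assignment up front, and scores the whole sequence by its negative overall completion time, so no intermediate state information is ever consumed.

```python
# Hedged sketch (not the authors' code) of selecting a whole action sequence
# from the initial state only, trained with REINFORCE against the makespan.
import math, random

N_MACHINES = 3
JOBS = [4, 2, 7, 1, 5, 3, 6, 2]              # hypothetical fixed initial state
LR = 0.05

# One preference vector per job position: the policy is conditioned only on
# the initial state, never on intermediate states of the episode.
logits = [[0.0] * N_MACHINES for _ in JOBS]

def softmax(v):
    z = max(v)
    e = [math.exp(x - z) for x in v]
    s = sum(e)
    return [x / s for x in e]

def makespan(actions):
    loads = [0] * N_MACHINES
    for dur, a in zip(JOBS, actions):
        loads[a] += dur
    return max(loads)                        # overall completion time

baseline = None
for _ in range(3000):
    # Sample a complete action sequence up front from the per-position policies
    probs = [softmax(pl) for pl in logits]
    actions = [random.choices(range(N_MACHINES), weights=p)[0] for p in probs]
    reward = -makespan(actions)              # minimize the completion time
    baseline = reward if baseline is None else 0.95 * baseline + 0.05 * reward
    adv = reward - baseline
    # REINFORCE: gradient of log softmax is (one-hot(action) - probs)
    for pl, p, a in zip(logits, probs, actions):
        for i in range(N_MACHINES):
            pl[i] += LR * adv * ((1.0 if i == a else 0.0) - p[i])

greedy = [max(range(N_MACHINES), key=lambda i: pl[i]) for pl in logits]
print("greedy assignment:", greedy, "makespan:", makespan(greedy))
```

A multi-agent variant, as the title suggests, would give each machine or agent its own such policy; that detail is omitted here for brevity.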