An overview of all sessions of this event. Select a location or a date to display only the corresponding sessions. Select a session to go to the detail view.
 
 
SES 1-3: Machine Learning for Industrial Applications
Time: Thursday, 13.03.2025, 11:45 - 12:30
Session Chair: Thomas Weickert, Hochschule Mannheim
Location: Lecture Hall Z211, 2nd floor, Zentralgebäude
 
 
 
Presentations
Beyond Monte Carlo: Leveraging Temporal Difference Learning for Superior Performance in Dynamic Resource Allocation
David Heik, Fouad Bahrpeyma, Dirk Reichelt
Hochschule für Technik und Wirtschaft, Germany
The application of reinforcement learning to dynamic industrial scheduling has gained increasing attention due to its potential to optimize complex manufacturing processes. Industry 4.0 and the rise of smart manufacturing present new challenges that require innovative approaches, particularly in environments with high variability and uncertainty. Previous work demonstrated that reinforcement learning, especially through Monte Carlo methods, significantly improves performance in job-shop scheduling scenarios by optimizing resource allocation. However, while Monte Carlo methods excel when the reward function is clear and retrospective, real-world manufacturing systems often require more dynamic, real-time decision-making capabilities, for which temporal difference methods are more appropriate. Research in this area has shown the effectiveness of reinforcement learning, but a gap remains in understanding how different reward functions impact the learning process in temporal difference systems. In this study, we systematically analyzed multiple reward functions within a temporal difference system, applying a sensitivity analysis to assess their impact during the training and evaluation phases. Despite the inherent complexities and challenges posed by temporal difference methods, our results show further improvements in the overall performance of the production line. Finally, this paper shows how a goal-oriented reward function can be systematically developed.
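The methodological contrast the abstract draws, Monte Carlo updates that wait for a retrospective episode return versus temporal-difference updates that bootstrap online, can be illustrated in a few lines. The sketch below is not the authors' system: it is a minimal, hypothetical two-machine job-assignment toy task with tabular Q-learning, where the hand-chosen reward (the negative load of the selected machine) merely stands in for the reward functions whose sensitivity the paper analyzes.

```python
# Minimal sketch (not the paper's implementation): tabular Q-learning with a
# TD(0) update versus a Monte Carlo update on a hypothetical two-machine
# job-assignment toy task. Higher return corresponds to a shorter makespan.
import random
from collections import defaultdict

N_MACHINES, N_JOBS = 2, 8
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
q = defaultdict(float)                       # (state, action) -> value estimate

def policy(state):
    if random.random() < EPS:                # epsilon-greedy exploration
        return random.randrange(N_MACHINES)
    return max(range(N_MACHINES), key=lambda a: q[(state, a)])

def run_episode(td=True):
    loads = [0] * N_MACHINES
    durations = [random.randint(1, 5) for _ in range(N_JOBS)]
    trajectory = []
    for i, dur in enumerate(durations):
        state = (tuple(loads), dur)          # machine loads + pending job
        action = policy(state)
        loads[action] += dur
        reward = -loads[action]              # penalize piling onto a busy machine
        if td:
            if i + 1 < N_JOBS:               # TD(0): bootstrap from next state
                nxt = (tuple(loads), durations[i + 1])
                target = reward + GAMMA * max(q[(nxt, a)] for a in range(N_MACHINES))
            else:
                target = reward              # terminal step: no bootstrap
            q[(state, action)] += ALPHA * (target - q[(state, action)])
        else:
            trajectory.append((state, action, reward))
    if not td:
        g = 0.0                              # Monte Carlo: wait for episode end,
        for state, action, reward in reversed(trajectory):
            g = reward + GAMMA * g           # then propagate the return backwards
            q[(state, action)] += ALPHA * (g - q[(state, action)])
    return max(loads)                        # makespan of this episode

for _ in range(5000):
    run_episode(td=True)
print("sample makespan after TD training:", run_episode(td=True))
```

Training the same table with run_episode(td=False) uses the retrospective Monte Carlo return instead, which makes the difference in update timing, online per step versus once per finished episode, directly comparable.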
 
 
YoloRL: simplifying dynamic scheduling through efficient action selection based on multi-agent reinforcement learning
David Heik, Fouad Bahrpeyma, Dirk Reichelt
Hochschule für Technik und Wirtschaft, Germany
The ability to react autonomously and dynamically to unpredictable events is crucial for cost-effective production scheduling in modern manufacturing environments. The progressive integration of cyber-physical systems into industrial sectors is one of the prerequisites for this development, and the industrial data generated there provides the foundation for operational and strategic decision-making. A central challenge is to collect this data in real time, transform it where necessary, and analyze it to support time-critical decisions. In this paper, we present a novel approach that simplifies dynamic scheduling through efficient action selection. Our method, YoloRL, is based on (multi-agent) reinforcement learning but is characterized by a highly simplified training process. Instead of considering all state information of an episode, our YoloRL (You Only Look Once Reinforcement Learning) approach focuses only on the initial state to identify promising action sequences. This leads to a significant reduction in training complexity while still enabling robust and adaptive control. The performance of the manufacturing system in this work is measured by the overall completion time, with the objective of minimizing this metric. Our experimental results indicate that the proposed method leads to faster generalization of the learned domain knowledge and yields a powerful policy that performs efficiently and reliably in dynamic environments.
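To make the "initial state only" idea concrete: the following is a hedged sketch, not the authors' YoloRL implementation and not necessarily its learning rule. Assuming a hypothetical fixed job list as the initial state, a simple REINFORCE-style learner keeps one action preference per job position, samples a complete machine assignment up front, and scores the whole sequence by its negative overall completion time, so no intermediate state information is ever consumed.

```python
# Hedged sketch (not the authors' code) of selecting a whole action sequence
# from the initial state only, trained with REINFORCE against the makespan.
import math, random

N_MACHINES = 3
JOBS = [4, 2, 7, 1, 5, 3, 6, 2]              # hypothetical fixed initial state
LR = 0.05

# One preference vector per job position: the policy is conditioned only on
# the initial state, never on intermediate states of the episode.
logits = [[0.0] * N_MACHINES for _ in JOBS]

def softmax(v):
    z = max(v)
    e = [math.exp(x - z) for x in v]
    s = sum(e)
    return [x / s for x in e]

def makespan(actions):
    loads = [0] * N_MACHINES
    for dur, a in zip(JOBS, actions):
        loads[a] += dur
    return max(loads)                        # overall completion time

baseline = None
for _ in range(3000):
    # Sample a complete action sequence up front from the per-position policies
    probs = [softmax(pl) for pl in logits]
    actions = [random.choices(range(N_MACHINES), weights=p)[0] for p in probs]
    reward = -makespan(actions)              # minimize the completion time
    baseline = reward if baseline is None else 0.95 * baseline + 0.05 * reward
    adv = reward - baseline
    # REINFORCE: gradient of log softmax is (one-hot(action) - probs)
    for pl, p, a in zip(logits, probs, actions):
        for i in range(N_MACHINES):
            pl[i] += LR * adv * ((1.0 if i == a else 0.0) - p[i])

greedy = [max(range(N_MACHINES), key=lambda i: pl[i]) for pl in logits]
print("greedy assignment:", greedy, "makespan:", makespan(greedy))
```

A multi-agent variant, as the title suggests, would give each machine or agent its own such policy; that detail is omitted here for brevity.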