Preference learning based deep reinforcement learning for flexible job shop scheduling problem
Abstract: The flexible job shop scheduling problem (FJSP) holds significant importance in both theoretical research and practical applications. Given the complexity and diversity of FJSP, improving the generalization and quality of scheduling methods has become a topic of keen interest in both industry and academia. To address these challenges, this paper proposes a Preference-Based Mask-PPO (PBMP) algorithm, which leverages the strengths of preference learning and invalid action masking to optimize FJSP solutions. First, a reward predictor based on preference learning is designed to predict rewards from pairwise comparisons of randomly sampled trajectory fragments, eliminating the need for complex reward function design. Second, a novel intelligent switching mechanism is introduced, in which proximal policy optimization (PPO) enhances exploration during sampling and masked proximal policy optimization (Mask-PPO) refines the action space during training, significantly improving efficiency and solution quality. Furthermore, the Pearson correlation coefficient (PCC) is used to evaluate the performance of the preference model. Finally, comparative experiments on FJSP benchmark instances of varying sizes demonstrate that PBMP outperforms traditional scheduling strategies such as dispatching rules, OR-Tools, and other deep reinforcement learning (DRL) algorithms, achieving superior scheduling policies and faster convergence. Even as instance sizes grow, preference learning remains an effective reward mechanism in reinforcement learning for FJSP. An ablation study further highlights the contribution of each key component of the PBMP algorithm across performance metrics.
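The abstract's first component, a reward predictor trained from pairwise comparisons of trajectory fragments, follows the general preference-learning recipe (a Bradley-Terry model over fragment returns). Below is a minimal PyTorch sketch of that recipe; the network shape, feature dimension, and the FJSP-specific labeling rule mentioned in the comments are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class RewardPredictor(nn.Module):
    """Maps a per-step state-action feature vector to a scalar reward estimate."""

    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, T, obs_dim) -> (batch, T) per-step reward estimates
        return self.net(x).squeeze(-1)

def preference_loss(model, frag_a, frag_b, pref):
    """Bradley-Terry loss: P(A preferred) = sigmoid(sum r_A - sum r_B)."""
    ra = model(frag_a).sum(dim=1)   # total predicted reward of fragment A
    rb = model(frag_b).sum(dim=1)   # total predicted reward of fragment B
    return nn.functional.binary_cross_entropy_with_logits(ra - rb, pref)

# Toy usage: in an FJSP setting the labels would come from comparing fragments
# on a scheduling criterion (e.g., which fragment yields a lower partial makespan).
predictor = RewardPredictor(obs_dim=16)
opt = torch.optim.Adam(predictor.parameters(), lr=3e-4)
frag_a, frag_b = torch.randn(8, 10, 16), torch.randn(8, 10, 16)
pref = torch.randint(0, 2, (8,)).float()    # 1.0 if fragment A is preferred
loss = preference_loss(predictor, frag_a, frag_b, pref)
opt.zero_grad(); loss.backward(); opt.step()
```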
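The second component, Mask-PPO-style invalid action masking, restricts the policy to feasible actions by removing probability mass from infeasible ones. A hedged sketch of the standard masking trick follows: invalid logits are set to -inf before the softmax, which is the common approach (as in MaskablePPO-style implementations), not necessarily the paper's exact formulation; how the feasibility mask is built for FJSP operation-machine pairs is assumed given.

```python
import torch
from torch.distributions import Categorical

def masked_policy(logits: torch.Tensor, valid: torch.Tensor) -> Categorical:
    """logits: (B, n_actions) raw scores; valid: (B, n_actions) bool mask."""
    # Invalid actions get -inf logits, so the softmax assigns them zero probability.
    return Categorical(logits=logits.masked_fill(~valid, float("-inf")))

# Toy usage: in FJSP the mask would flag feasible operation-machine assignments.
logits = torch.randn(2, 5)
valid = torch.tensor([[True, False, True, True, False],
                      [False, True, True, False, True]])
dist = masked_policy(logits, valid)
action = dist.sample()            # only feasible actions can be drawn
log_prob = dist.log_prob(action)  # enters the PPO ratio as usual
```

As for the PCC check mentioned in the abstract, one plausible realization is scipy.stats.pearsonr between the predictor's fragment returns and a ground-truth signal such as negative makespan, though the paper's exact setup is not specified here.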
Main Authors: Xinning Liu, Li Han, Ling Kang, Jiannan Liu, Huadong Miao
Format: Article
Language: English
Published: Springer, 2025-01-01
Series: Complex & Intelligent Systems
Subjects: Flexible job shop scheduling problem; Preference learning; Proximal policy optimization; Deep reinforcement learning
Online Access: https://doi.org/10.1007/s40747-024-01772-x
author | Xinning Liu; Li Han; Ling Kang; Jiannan Liu; Huadong Miao
collection | DOAJ |
description | Abstract: The flexible job shop scheduling problem (FJSP) holds significant importance in both theoretical research and practical applications. Given the complexity and diversity of FJSP, improving the generalization and quality of scheduling methods has become a topic of keen interest in both industry and academia. To address these challenges, this paper proposes a Preference-Based Mask-PPO (PBMP) algorithm, which leverages the strengths of preference learning and invalid action masking to optimize FJSP solutions. First, a reward predictor based on preference learning is designed to predict rewards from pairwise comparisons of randomly sampled trajectory fragments, eliminating the need for complex reward function design. Second, a novel intelligent switching mechanism is introduced, in which proximal policy optimization (PPO) enhances exploration during sampling and masked proximal policy optimization (Mask-PPO) refines the action space during training, significantly improving efficiency and solution quality. Furthermore, the Pearson correlation coefficient (PCC) is used to evaluate the performance of the preference model. Finally, comparative experiments on FJSP benchmark instances of varying sizes demonstrate that PBMP outperforms traditional scheduling strategies such as dispatching rules, OR-Tools, and other deep reinforcement learning (DRL) algorithms, achieving superior scheduling policies and faster convergence. Even as instance sizes grow, preference learning remains an effective reward mechanism in reinforcement learning for FJSP. An ablation study further highlights the contribution of each key component of the PBMP algorithm across performance metrics.
format | Article |
id | doaj-art-4d4b96dc2f8646fa95020a2fad6cda40 |
institution | Kabale University |
issn | 2199-4536; 2198-6053
language | English |
publishDate | 2025-01-01 |
publisher | Springer |
record_format | Article |
series | Complex & Intelligent Systems |
spelling | doaj-art-4d4b96dc2f8646fa95020a2fad6cda40 (2025-02-09T13:01:07Z); eng; Springer; Complex & Intelligent Systems; ISSN 2199-4536, 2198-6053; 2025-01-01; vol. 11, iss. 2, pp. 1-23; 10.1007/s40747-024-01772-x; Preference learning based deep reinforcement learning for flexible job shop scheduling problem; Xinning Liu (School of Computer and Software, Dalian Neusoft University of Information); Li Han (School of Computer and Software, Dalian Neusoft University of Information); Ling Kang (Neusoft Research Institute, Dalian Neusoft University of Information); Jiannan Liu (School of Computer and Software, Dalian Neusoft University of Information); Huadong Miao (SNOW China (Beijing) Co. Ltd., Dalian Branch); abstract as in the description field; https://doi.org/10.1007/s40747-024-01772-x; subjects: Flexible job shop scheduling problem; Preference learning; Proximal policy optimization; Deep reinforcement learning
title | Preference learning based deep reinforcement learning for flexible job shop scheduling problem |
topic | Flexible job shop scheduling problem; Preference learning; Proximal policy optimization; Deep reinforcement learning
url | https://doi.org/10.1007/s40747-024-01772-x |