Preference learning based deep reinforcement learning for flexible job shop scheduling problem

Abstract: The flexible job shop scheduling problem (FJSP) holds significant importance in both theoretical research and practical applications. Given the complexity and diversity of FJSP, improving the generalization and quality of scheduling methods has become a prominent topic in both industry and academia. To this end, this paper proposes a Preference-Based Mask-PPO (PBMP) algorithm, which leverages the strengths of preference learning and invalid action masking to optimize FJSP solutions. First, a reward predictor based on preference learning is designed to predict rewards by comparing randomly sampled trajectory fragments, eliminating the need for complex hand-designed reward functions. Second, a novel intelligent switching mechanism is introduced, in which proximal policy optimization (PPO) enhances exploration during sampling and masked proximal policy optimization (Mask-PPO) refines the action space during training, significantly improving efficiency and solution quality. Furthermore, the Pearson correlation coefficient (PCC) is used to evaluate the performance of the preference model. Finally, comparative experiments on FJSP benchmark instances of varying sizes demonstrate that PBMP outperforms traditional scheduling strategies such as dispatching rules, OR-Tools, and other deep reinforcement learning (DRL) algorithms, achieving superior scheduling policies and faster convergence. Even as instance sizes increase, preference learning proves to be an effective reward mechanism in reinforcement learning for FJSP. An ablation study further highlights the contribution of each key component of the PBMP algorithm across performance metrics.
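The abstract names two mechanisms that are standard techniques in preference-based deep reinforcement learning, and a short sketch can make them concrete: a Bradley-Terry-style reward predictor trained from pairwise preferences over trajectory fragments, and invalid action masking, which assigns zero probability to infeasible actions before sampling. The sketch below is a minimal illustration of those general techniques under assumed shapes and names; it is not the authors' implementation, and every class, dimension, and helper in it is hypothetical.

```python
# Minimal sketch of (1) a preference-based reward predictor and
# (2) invalid action masking, as generic techniques; NOT the paper's code.
# All names, dimensions, and tensor shapes are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RewardPredictor(nn.Module):
    """Maps a (state, action) pair to a scalar predicted reward."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # obs: (T, obs_dim), act: (T, act_dim) -> per-step rewards, shape (T,)
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


def preference_loss(model, frag_a, frag_b, pref: float) -> torch.Tensor:
    """Bradley-Terry loss for one labelled comparison of two fragments.

    Each fragment is an (obs, act) pair of per-step tensors; `pref` is 1.0
    if fragment A is preferred and 0.0 if fragment B is preferred.
    """
    r_a = model(*frag_a).sum()  # predicted return of fragment A
    r_b = model(*frag_b).sum()  # predicted return of fragment B
    # sigmoid(r_a - r_b) is P(A preferred) under the Bradley-Terry model;
    # the BCE-with-logits form computes the same loss numerically stably.
    return F.binary_cross_entropy_with_logits(r_a - r_b, torch.tensor(pref))


def mask_logits(logits: torch.Tensor, valid: torch.Tensor) -> torch.Tensor:
    """Invalid action masking: drive logits of infeasible actions to -inf so
    the softmax assigns them exactly zero probability."""
    return logits.masked_fill(~valid, float("-inf"))


if __name__ == "__main__":
    torch.manual_seed(0)
    model = RewardPredictor(obs_dim=8, act_dim=4)
    frag = lambda: (torch.randn(10, 8), torch.randn(10, 4))
    print(float(preference_loss(model, frag(), frag(), pref=1.0)))
    valid = torch.tensor([True, False, True, True, False])
    probs = torch.softmax(mask_logits(torch.randn(5), valid), dim=-1)
    print(probs)  # masked entries come out exactly 0
```

In a full pipeline, the predictor's per-step outputs would stand in for a hand-designed reward inside the PPO update, and the mask would be rebuilt at each step from the currently feasible operation-machine assignments; the paper's intelligent switching between unmasked and masked PPO is outside the scope of this sketch.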

Bibliographic Details
Main Authors: Xinning Liu, Li Han, Ling Kang, Jiannan Liu, Huadong Miao
Author Affiliations: Xinning Liu, Li Han, Jiannan Liu (School of Computer and Software, Dalian Neusoft University of Information); Ling Kang (Neusoft Research Institute, Dalian Neusoft University of Information); Huadong Miao (SNOW China (Beijing) Co. Ltd., Dalian Branch)
Format: Article
Language: English
Published: Springer, 2025-01-01
Series: Complex & Intelligent Systems, Vol. 11, Issue 2 (2025)
ISSN: 2199-4536, 2198-6053
Subjects: Flexible job shop scheduling problem; Preference learning; Proximal policy optimization; Deep reinforcement learning
Online Access: https://doi.org/10.1007/s40747-024-01772-x
Record ID: doaj-art-4d4b96dc2f8646fa95020a2fad6cda40
Institution: Kabale University
Collection: DOAJ