Preference learning based deep reinforcement learning for flexible job shop scheduling problem
Abstract: The flexible job shop scheduling problem (FJSP) holds significant importance in both theoretical research and practical applications. Given the complexity and diversity of FJSP, improving the generalization and quality of scheduling methods has become a topic of keen interest in both industry and academia. To address these challenges, this paper proposes a Preference-Based Mask-PPO (PBMP) algorithm, which leverages the strengths of preference learning and invalid action masking to optimize FJSP solutions. First, a reward predictor based on preference learning is designed to predict rewards from pairwise comparisons of randomly sampled trajectory fragments, eliminating the need for complex reward function design. Second, a novel intelligent switching mechanism is introduced, in which proximal policy optimization (PPO) enhances exploration during sampling and masked proximal policy optimization (Mask-PPO) refines the action space during training, significantly improving efficiency and solution quality. Furthermore, the Pearson correlation coefficient (PCC) is used to evaluate the performance of the preference model. Finally, comparative experiments on FJSP benchmark instances of varying sizes demonstrate that PBMP outperforms traditional scheduling strategies such as dispatching rules, OR-Tools, and other deep reinforcement learning (DRL) algorithms, achieving superior scheduling policies and faster convergence. Even as instance sizes grow, preference learning remains an effective reward mechanism in reinforcement learning for FJSP. An ablation study further highlights the contribution of each key component of the PBMP algorithm across performance metrics.
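The abstract's first component, a reward predictor trained from pairwise comparisons of trajectory fragments, follows the general preference-learning recipe (a Bradley-Terry model over fragment returns). Below is a minimal PyTorch sketch of that recipe; the network shape, feature dimension, and the FJSP-specific labeling rule mentioned in the comments are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class RewardPredictor(nn.Module):
    """Maps a per-step state-action feature vector to a scalar reward estimate."""

    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, T, obs_dim) -> (batch, T) per-step reward estimates
        return self.net(x).squeeze(-1)

def preference_loss(model, frag_a, frag_b, pref):
    """Bradley-Terry loss: P(A preferred) = sigmoid(sum r_A - sum r_B)."""
    ra = model(frag_a).sum(dim=1)   # total predicted reward of fragment A
    rb = model(frag_b).sum(dim=1)   # total predicted reward of fragment B
    return nn.functional.binary_cross_entropy_with_logits(ra - rb, pref)

# Toy usage: in an FJSP setting the labels would come from comparing fragments
# on a scheduling criterion (e.g., which fragment yields a lower partial makespan).
predictor = RewardPredictor(obs_dim=16)
opt = torch.optim.Adam(predictor.parameters(), lr=3e-4)
frag_a, frag_b = torch.randn(8, 10, 16), torch.randn(8, 10, 16)
pref = torch.randint(0, 2, (8,)).float()    # 1.0 if fragment A is preferred
loss = preference_loss(predictor, frag_a, frag_b, pref)
opt.zero_grad(); loss.backward(); opt.step()
```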
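The second component, Mask-PPO-style invalid action masking, restricts the policy to feasible actions by removing probability mass from infeasible ones. A hedged sketch of the standard masking trick follows: invalid logits are set to -inf before the softmax, which is the common approach (as in MaskablePPO-style implementations), not necessarily the paper's exact formulation; how the feasibility mask is built for FJSP operation-machine pairs is assumed given.

```python
import torch
from torch.distributions import Categorical

def masked_policy(logits: torch.Tensor, valid: torch.Tensor) -> Categorical:
    """logits: (B, n_actions) raw scores; valid: (B, n_actions) bool mask."""
    # Invalid actions get -inf logits, so the softmax assigns them zero probability.
    return Categorical(logits=logits.masked_fill(~valid, float("-inf")))

# Toy usage: in FJSP the mask would flag feasible operation-machine assignments.
logits = torch.randn(2, 5)
valid = torch.tensor([[True, False, True, True, False],
                      [False, True, True, False, True]])
dist = masked_policy(logits, valid)
action = dist.sample()            # only feasible actions can be drawn
log_prob = dist.log_prob(action)  # enters the PPO ratio as usual
```

As for the PCC check mentioned in the abstract, one plausible realization is scipy.stats.pearsonr between the predictor's fragment returns and a ground-truth signal such as negative makespan, though the paper's exact setup is not specified here.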
Main Authors: Xinning Liu, Li Han, Ling Kang, Jiannan Liu, Huadong Miao
Format: Article
Language: English
Published: Springer, 2025-01-01
Series: Complex & Intelligent Systems
Subjects: Flexible job shop scheduling problem; Preference learning; Proximal policy optimization; Deep reinforcement learning
Online Access: https://doi.org/10.1007/s40747-024-01772-x
author | Xinning Liu; Li Han; Ling Kang; Jiannan Liu; Huadong Miao
collection | DOAJ |
description | Abstract: The flexible job shop scheduling problem (FJSP) holds significant importance in both theoretical research and practical applications. Given the complexity and diversity of FJSP, improving the generalization and quality of scheduling methods has become a topic of keen interest in both industry and academia. To address these challenges, this paper proposes a Preference-Based Mask-PPO (PBMP) algorithm, which leverages the strengths of preference learning and invalid action masking to optimize FJSP solutions. First, a reward predictor based on preference learning is designed to predict rewards from pairwise comparisons of randomly sampled trajectory fragments, eliminating the need for complex reward function design. Second, a novel intelligent switching mechanism is introduced, in which proximal policy optimization (PPO) enhances exploration during sampling and masked proximal policy optimization (Mask-PPO) refines the action space during training, significantly improving efficiency and solution quality. Furthermore, the Pearson correlation coefficient (PCC) is used to evaluate the performance of the preference model. Finally, comparative experiments on FJSP benchmark instances of varying sizes demonstrate that PBMP outperforms traditional scheduling strategies such as dispatching rules, OR-Tools, and other deep reinforcement learning (DRL) algorithms, achieving superior scheduling policies and faster convergence. Even as instance sizes grow, preference learning remains an effective reward mechanism in reinforcement learning for FJSP. An ablation study further highlights the contribution of each key component of the PBMP algorithm across performance metrics.
format | Article |
id | doaj-art-4d4b96dc2f8646fa95020a2fad6cda40 |
institution | Kabale University |
issn | 2199-4536; 2198-6053
language | English |
publishDate | 2025-01-01 |
publisher | Springer |
record_format | Article |
series | Complex & Intelligent Systems |
spelling | doaj-art-4d4b96dc2f8646fa95020a2fad6cda40 (2025-02-09T13:01:07Z); eng; Springer; Complex & Intelligent Systems; ISSN 2199-4536, 2198-6053; 2025-01-01; vol. 11, iss. 2, pp. 1-23; 10.1007/s40747-024-01772-x; Preference learning based deep reinforcement learning for flexible job shop scheduling problem; Xinning Liu (School of Computer and Software, Dalian Neusoft University of Information); Li Han (School of Computer and Software, Dalian Neusoft University of Information); Ling Kang (Neusoft Research Institute, Dalian Neusoft University of Information); Jiannan Liu (School of Computer and Software, Dalian Neusoft University of Information); Huadong Miao (SNOW China (Beijing) Co. Ltd., Dalian Branch); abstract as in the description field; https://doi.org/10.1007/s40747-024-01772-x; subjects: Flexible job shop scheduling problem; Preference learning; Proximal policy optimization; Deep reinforcement learning
title | Preference learning based deep reinforcement learning for flexible job shop scheduling problem |
topic | Flexible job shop scheduling problem; Preference learning; Proximal policy optimization; Deep reinforcement learning
url | https://doi.org/10.1007/s40747-024-01772-x |