Adaptive temporal-difference learning via deep neural network function approximation: a non-asymptotic analysis
Abstract: Although deep reinforcement learning has achieved notable practical success, its theoretical foundations were scarcely explored until recently. Moreover, the convergence rate of existing neural temporal-difference (TD) learning algorithms is limited, largely because of their high sensitivity to the choice of stepsize. To mitigate this issue, we propose an adaptive neural TD algorithm (AdaBNTD), inspired by the strong performance of adaptive gradient methods in training deep neural networks. We also derive non-asymptotic bounds for AdaBNTD in the Markovian observation setting. In particular, AdaBNTD converges to the global optimum of the mean squared projected Bellman error (MSPBE) at a rate of $\mathcal{O}(1/\sqrt{K})$, where $K$ denotes the iteration count. The effectiveness of AdaBNTD is also verified on several reinforcement learning benchmark domains.
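For reference (the record itself does not define it), the mean squared projected Bellman error that the abstract refers to is conventionally written as

$$\mathrm{MSPBE}(\theta) = \big\| \Pi_{\mathcal{F}}\, T^{\pi} V_{\theta} - V_{\theta} \big\|_{\mu}^{2},$$

where $V_{\theta}$ is the (neural) value-function approximator, $T^{\pi}$ is the Bellman operator for policy $\pi$, $\Pi_{\mathcal{F}}$ is the projection onto the function class, and $\mu$ is the stationary distribution of the underlying Markov chain. These symbols follow the standard TD-learning literature, not necessarily the paper's own notation.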
Main Authors: Guoyong Wang, Tiange Fu, Ruijuan Zheng, Xuhui Zhao, Junlong Zhu, Mingchuan Zhang
Format: Article
Language: English
Published: Springer, 2025-01-01
Series: Complex & Intelligent Systems
Subjects: Adaptive methods; Non-asymptotic convergence; Nonlinear function approximation; Reinforcement learning; Temporal-difference learning
Online Access: https://doi.org/10.1007/s40747-024-01757-w
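The abstract describes TD learning with a neural-network value function whose semi-gradient updates are scaled by an adaptive, gradient-statistics-based stepsize. The paper's exact AdaBNTD update rule is not given in this record, so the following is only a minimal sketch of the general idea: a TD(0) semi-gradient paired with an AdaGrad-style per-coordinate stepsize, trained on a toy random-walk chain MDP. The network architecture, the chain environment, and all constants here are illustrative assumptions, not the paper's method.

```python
# Minimal sketch of adaptive-stepsize neural TD(0); NOT the paper's AdaBNTD,
# whose exact update rule is not given in this record. A one-hidden-layer
# value network is trained on a toy random-walk chain MDP (an assumption).
import numpy as np

rng = np.random.default_rng(0)
N_STATES, GAMMA, HIDDEN = 10, 0.9, 32

# Value network V(s; theta) with one hidden tanh layer.
W1 = rng.normal(0, 0.5, (HIDDEN, N_STATES))
w2 = rng.normal(0, 0.5, HIDDEN)

def features(s):
    x = np.zeros(N_STATES)
    x[s] = 1.0  # one-hot state encoding
    return x

def value_and_grads(s):
    x = features(s)
    h = np.tanh(W1 @ x)
    v = w2 @ h
    # Gradients of V(s; theta) with respect to W1 and w2.
    dW1 = np.outer(w2 * (1 - h**2), x)
    dw2 = h
    return v, dW1, dw2

# AdaGrad-style squared-gradient accumulators, one per parameter block.
G1 = np.zeros_like(W1)
g2 = np.zeros_like(w2)
ETA, EPS = 0.5, 1e-8

s = 0
for k in range(20_000):
    # Toy chain: move left/right uniformly at random; reward 1 at the right end.
    s_next = min(max(s + rng.choice([-1, 1]), 0), N_STATES - 1)
    r = 1.0 if s_next == N_STATES - 1 else 0.0

    v, dW1, dw2 = value_and_grads(s)
    v_next, _, _ = value_and_grads(s_next)
    delta = r + GAMMA * v_next - v  # TD error

    # TD(0) semi-gradient: the bootstrapped target is treated as fixed,
    # so the update direction is -delta * grad V(s).
    gW1, gw2 = -delta * dW1, -delta * dw2

    # Adaptive per-coordinate stepsize (AdaGrad-style), which removes the
    # need to hand-tune a decaying stepsize schedule.
    G1 += gW1**2
    g2 += gw2**2
    W1 -= ETA * gW1 / (np.sqrt(G1) + EPS)
    w2 -= ETA * gw2 / (np.sqrt(g2) + EPS)
    s = s_next

print("V(s) for s = 0..9:", np.round([value_and_grads(s)[0] for s in range(N_STATES)], 2))
```

Note that the loop above feeds the learner consecutive states of a single trajectory rather than i.i.d. samples; this is exactly the Markovian observation setting under which the paper's non-asymptotic bounds are stated. An Adam-style variant (exponential moving averages instead of cumulative sums) would follow the same pattern.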
Author Affiliations: Guoyong Wang (School of Information Engineering, Luoyang Institute of Science and Technology); Tiange Fu (Longmen Laboratory); Ruijuan Zheng, Xuhui Zhao, Junlong Zhu (School of Information Engineering, Henan University of Science and Technology); Mingchuan Zhang (Longmen Laboratory)
Citation: Complex & Intelligent Systems, vol. 11, no. 2 (2025), pp. 1-19. doi: 10.1007/s40747-024-01757-w
ISSN: 2199-4536, 2198-6053
Collection: DOAJ
Institution: Kabale University
Record ID: doaj-art-fee16efbbe114378aa3e25415c641530