Adaptive temporal-difference learning via deep neural network function approximation: a non-asymptotic analysis

Abstract: Although deep reinforcement learning has achieved notable practical success, its theoretical foundations were scarcely explored until recently. Moreover, the convergence rate of existing neural temporal-difference (TD) learning algorithms is limited, largely due to their high sensitivity to the choice of stepsize. To mitigate this issue, we propose an adaptive neural TD algorithm (AdaBNTD), inspired by the strong performance of adaptive gradient methods in training deep neural networks. We also derive non-asymptotic bounds for AdaBNTD under the Markovian observation model. In particular, AdaBNTD converges to the global optimum of the mean squared projected Bellman error (MSPBE) at a rate of $\mathcal{O}(1/\sqrt{K})$, where $K$ denotes the iteration count. The effectiveness of AdaBNTD is also verified on several reinforcement learning benchmark domains.
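The abstract describes the idea only at a high level. For intuition, below is a minimal, self-contained sketch of semi-gradient TD(0) with an AdaGrad-style adaptive stepsize on a small value network. This is an illustrative stand-in, not the authors' AdaBNTD update (whose exact rule is specified in the paper); the random-walk MDP, network width, and all hyperparameters are assumptions made for the example.

```python
# Illustrative sketch: semi-gradient TD(0) with AdaGrad-style per-parameter
# stepsizes on a one-hidden-layer value network. NOT the paper's AdaBNTD
# algorithm; the MDP, architecture, and hyperparameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# --- toy 5-state random-walk MDP under a fixed policy (assumption) ---
N_STATES, GAMMA = 5, 0.9

def step(s):
    """One transition of the fixed policy: move left/right uniformly."""
    s2 = min(max(s + rng.choice([-1, 1]), 0), N_STATES - 1)
    done = s2 == N_STATES - 1          # right edge is terminal
    return s2, (1.0 if done else 0.0), done

def features(s):
    """One-hot state features."""
    x = np.zeros(N_STATES)
    x[s] = 1.0
    return x

# --- value network V(s) = w2 . relu(W1 x + b1) ---
H = 16
W1 = rng.normal(0, 0.5, (H, N_STATES)); b1 = np.zeros(H)
w2 = rng.normal(0, 0.5, H)

def value_and_grads(x):
    h = np.maximum(W1 @ x + b1, 0.0)   # ReLU hidden layer
    v = w2 @ h
    dh = (h > 0).astype(float)         # ReLU derivative
    return v, (np.outer(w2 * dh, x),   # dV/dW1
               w2 * dh,                # dV/db1
               h)                      # dV/dw2

# AdaGrad accumulators, one per parameter tensor
acc = [np.zeros_like(p) for p in (W1, b1, w2)]
ETA, EPS = 0.5, 1e-8

s = 0
for k in range(20000):
    s2, r, done = step(s)
    v, grads = value_and_grads(features(s))
    v2 = 0.0 if done else value_and_grads(features(s2))[0]
    delta = r + GAMMA * v2 - v         # TD error
    for p, g, a in zip((W1, b1, w2), grads, acc):
        sg = -delta * g                # semi-gradient (target held fixed)
        a += sg ** 2                   # accumulate squared gradients
        p -= ETA / (np.sqrt(a) + EPS) * sg   # adaptive per-parameter stepsize
    s = 0 if done else s2              # restart after reaching the terminal

print("learned values:",
      [round(float(value_and_grads(features(i))[0]), 3) for i in range(N_STATES)])
```

The part the sketch isolates is the per-parameter stepsize `ETA / (sqrt(acc) + EPS)`: the effective stepsize shrinks automatically as gradients accumulate, which is the mechanism adaptive methods use to avoid the sensitivity to a single hand-tuned global stepsize that the abstract attributes to prior neural TD algorithms.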

Bibliographic Details
Main Authors: Guoyong Wang, Tiange Fu, Ruijuan Zheng, Xuhui Zhao, Junlong Zhu, Mingchuan Zhang
Affiliations: Guoyong Wang (School of Information Engineering, Luoyang Institute of Science and Technology); Tiange Fu (Longmen Laboratory); Ruijuan Zheng, Xuhui Zhao, Junlong Zhu (School of Information Engineering, Henan University of Science and Technology); Mingchuan Zhang (Longmen Laboratory)
Format: Article
Language: English
Published: Springer, 2025-01-01
Series: Complex & Intelligent Systems
ISSN: 2199-4536, 2198-6053
Subjects: Adaptive methods; Non-asymptotic convergence; Nonlinear function approximation; Reinforcement learning; Temporal-difference learning
Online Access: https://doi.org/10.1007/s40747-024-01757-w