Abstract:
The actor-critic (AC) learning structure is an effective framework that combines policy-based reinforcement learning (RL) with value-based RL. However, the cost function in the AC framework exhibits large variance, which makes it difficult to achieve the optimization objective. Based on the discounted generalized value iteration method with ℓ1-regularization, a regularized AC (RAC) framework is developed to address optimal regulation problems and accelerate the convergence of the cost function. Two neural networks are constructed to update the cost function and the policy gradient, respectively, and ℓ1-regularization is applied to both the policy gradient and the cost function during value iteration. The cost function is proved to converge to the optimal cost function in a monotonically decreasing manner. Finally, the effectiveness of RAC is demonstrated through two experiments. © 2023 IEEE.
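As an informal illustration of the update pattern the abstract describes (two networks, with an ℓ1 penalty on both the cost-function and policy updates), the following minimal Python/PyTorch sketch shows one possible actor-critic step with ℓ1 weight regularization. It is not taken from the paper: the network sizes, learning rates, deterministic-policy objective, and all variable names are assumptions for illustration only.

import torch
import torch.nn as nn

state_dim, action_dim, gamma, l1_coef = 4, 1, 0.95, 1e-3

# Critic approximates the discounted cost of a state-action pair; actor outputs the control.
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 32), nn.Tanh(), nn.Linear(32, 1))
actor = nn.Sequential(nn.Linear(state_dim, 32), nn.Tanh(), nn.Linear(32, action_dim))
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

def l1_penalty(net):
    # Sum of absolute parameter values: the l1-regularization term (illustrative choice).
    return sum(p.abs().sum() for p in net.parameters())

def update(state, action, utility, next_state):
    # Value-iteration-style critic target: U(x, u) + gamma * Q(x', actor(x')).
    with torch.no_grad():
        target = utility + gamma * critic(torch.cat([next_state, actor(next_state)], dim=-1))
    q = critic(torch.cat([state, action], dim=-1))
    critic_loss = (q - target).pow(2).mean() + l1_coef * l1_penalty(critic)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor update: minimize the critic's predicted cost of the actor's own action,
    # again with an l1 penalty on the actor weights (a stand-in policy objective,
    # not the paper's exact policy-gradient form).
    actor_loss = critic(torch.cat([state, actor(state)], dim=-1)).mean() + l1_coef * l1_penalty(actor)
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

# Example call with a random batch of 8 transitions.
s, a = torch.randn(8, state_dim), torch.randn(8, action_dim)
u, s_next = torch.randn(8, 1), torch.randn(8, state_dim)
update(s, a, u, s_next)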
Year: 2023
Page: 105-110
Language: English
Cited Count:
WoS CC Cited Count: 0
ESI Highly Cited Papers on the List: 0