Abstract:
In this paper, a value-iteration-based off-policy Q-learning algorithm is developed to solve the optimal regulation problem for nonlinear systems with unknown dynamics. Under the off-policy mechanism, the algorithm uses a behavioral policy for full exploration, which helps prevent the target policy from converging to a local optimum. In addition, a relaxation factor is introduced to adjust the convergence rate of the cost-function sequence. To implement the algorithm, a critic network and an action network are used to approximate the optimal Q-function and the optimal control policy, respectively. Finally, a simulation example demonstrates the effectiveness of the proposed algorithm. © 2024 IEEE.
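The abstract describes the update scheme only at a high level. Below is a minimal sketch of a relaxed value-iteration Q-learning update of this general kind, assuming a scalar plant f, a quadratic stage cost r, a polynomial critic feature map phi, a discretized control set, and a uniform behavioral policy; all of these choices are illustrative assumptions, not the paper's actual formulation or its neural-network implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, u):
    # Assumed plant dynamics; treated as unknown by the learner and used
    # here only to generate off-policy transition data.
    return 0.8 * np.sin(x) + u

def r(x, u):
    # Assumed quadratic stage cost for the regulation problem.
    return x**2 + u**2

def phi(x, u):
    # Quadratic feature map so the critic is Q(x, u) = w . phi(x, u).
    x, u = np.broadcast_arrays(np.asarray(x, float), np.asarray(u, float))
    return np.stack([x**2, x * u, u**2], axis=-1)

U = np.linspace(-2.0, 2.0, 41)   # discretized control set for the greedy min

# Behavioral policy: uniform exploration over states and controls, so the
# data cover the region of interest independently of the target policy.
X = rng.uniform(-3.0, 3.0, 2000)
Ub = rng.uniform(-2.0, 2.0, 2000)
Xn = f(X, Ub)

w = np.zeros(3)     # critic weights, Q_0 = 0
lam = 0.5           # relaxation factor: adjusts the convergence rate

for k in range(200):
    # Greedy target: r(x, u) + min_{u'} Q_k(x', u') at the sampled next states.
    Qn = np.stack([phi(Xn, u) @ w for u in U], axis=1)
    target = r(X, Ub) + Qn.min(axis=1)
    # Relaxed value-iteration update:
    # Q_{k+1}(x, u) = (1 - lam) Q_k(x, u) + lam [r(x, u) + min_{u'} Q_k(x', u')]
    relaxed = (1.0 - lam) * (phi(X, Ub) @ w) + lam * target
    w_new, *_ = np.linalg.lstsq(phi(X, Ub), relaxed, rcond=None)
    if np.linalg.norm(w_new - w) < 1e-8:
        break
    w = w_new

def policy(x):
    # Target policy recovered greedily from the converged critic; the paper
    # trains an action network instead, which this lookup merely stands in for.
    return U[int(np.argmin([phi(x, u) @ w for u in U]))]

print("critic weights:", w, " u(1.0) =", policy(1.0))
```

With lam = 1 the loop reduces to the standard value-iteration update; smaller values blend the new Bellman target with the current critic, which is the convergence-rate role the abstract attributes to the relaxation factor.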
Year: 2024
Page: 2717-2722
Language: English