Abstract:
Learning-based control methods have been widely enhanced by reinforcement learning, but analyzing the effects of incorporating extra system information remains challenging. This paper presents a novel multi-step framework that exploits extra multi-step system information to solve optimal control problems. Within this framework, we establish and classify general multi-step value iteration (MsVI) algorithms according to the uniformity between the policy evaluation and policy improvement stages. Based on this uniformity concept, we analyze the convergence conditions and acceleration properties of the different kinds of MsVI algorithms. In addition, we introduce a swarm policy optimizer to mitigate the limitations of the traditional gradient-based optimizer. Specifically, we implement general MsVI in an actor–critic scheme, in which the swarm optimizer and neural networks are employed for policy improvement and policy evaluation, respectively. Furthermore, the approximation error introduced by the function approximator is also taken into account to verify the advantage of using multi-step system information. Finally, we apply the proposed method to a nonlinear benchmark system, demonstrating superior learning ability and control performance compared with traditional methods. © 2025 The Authors
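The multi-step value iteration idea summarized in the abstract can be illustrated with a minimal sketch (not the paper's algorithm): on a toy deterministic chain MDP, each outer iteration performs an h-step greedy lookahead backup instead of the usual one-step Bellman backup. The environment, the depth `h`, and all function names here are assumptions for illustration only.

```python
import numpy as np

# Hypothetical toy example: multi-step value iteration on a 5-state
# deterministic chain MDP. Each outer iteration applies an h-step
# Bellman optimality (greedy lookahead) backup rather than a 1-step one.

n_states, gamma, h = 5, 0.9, 3
goal = n_states - 1  # absorbing goal state

def env_step(s, a):
    """Deterministic transition: a in {-1, +1}; reward 1 on reaching the goal."""
    if s == goal:
        return s, 0.0  # goal is absorbing with zero reward
    s2 = min(max(s + a, 0), n_states - 1)
    return s2, (1.0 if s2 == goal else 0.0)

def multistep_backup(V, s, depth):
    """h-step Bellman optimality backup, computed by recursion over depth."""
    if depth == 0:
        return V[s]
    return max(r + gamma * multistep_backup(V, s2, depth - 1)
               for s2, r in (env_step(s, a) for a in (-1, 1)))

V = np.zeros(n_states)
for _ in range(50):
    V_new = np.array([multistep_backup(V, s, h) for s in range(n_states)])
    converged = np.max(np.abs(V_new - V)) < 1e-10
    V = V_new
    if converged:
        break

print(np.round(V, 3))  # non-goal states converge to gamma**(steps_to_goal - 1)
```

Because an h-step backup contracts the value error by gamma**h per outer iteration instead of gamma, deeper lookahead needs fewer outer iterations, which mirrors the acceleration claim made for MsVI in the abstract.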
Source:
Automatica
ISSN: 0005-1098
Year: 2025
Volume: 175
Impact Factor: 6.400 (JCR@2022)
Cited Count:
SCOPUS Cited Count: 2
ESI Highly Cited Papers on the List: 0