
Author:

Zhang, Siyuan | Zhu, Xiaoqing | Chen, Jiangtao | Liu, Xinyuan | Wang, Tao

Indexed by:

EI; Scopus

Abstract:

[Objective] Inspired by the skill learning of quadruped animals in nature, deep reinforcement learning has been widely applied to learning quadruped robot locomotion skills. Through interaction with the environment, robots can autonomously learn complete motion strategies. However, traditional reinforcement learning has several drawbacks, such as large computational requirements, slow algorithm convergence, and rigid learning strategies, which substantially reduce training efficiency and incur unnecessary time costs. To address these shortcomings, this paper introduces evolutionary strategies into the soft actor-critic (SAC) algorithm, proposing an optimized parallel SAC (OP-SAC) algorithm that trains quadruped robots with evolutionary strategies and reinforcement learning in parallel.

[Methods] The algorithm first uses a SAC variant with a variable temperature coefficient to reduce the impact of the temperature hyperparameter on the training process, and then introduces evolutionary strategies, using the reference trajectory trained by the evolutionary strategy as a sample input to guide the training direction of the SAC algorithm. In turn, the state information and reward values obtained from SAC training serve as inputs and offspring-selection thresholds for the evolutionary strategy, decoupling the training data. The algorithm adopts an alternating training approach with a knowledge-sharing strategy, in which the training results of the evolutionary strategy and reinforcement learning are stored in a common experience pool. A knowledge inheritance mechanism further allows the training results of both strategies to be passed on to the next stage of the algorithm. With these two training strategies, the evolutionary strategy and reinforcement learning guide each other's training direction and pass useful information between generations, thereby accelerating the learning process and enhancing the robustness of the algorithm.

[Results] The simulation results were as follows: 1) Training quadruped robots with the OP-SAC algorithm achieves a converged reward value of approximately 3 000, with stable posture and high speed after training; the algorithm can effectively complete bionic gait learning for quadruped robots. 2) Compared with other algorithms combining SAC and evolutionary strategies, OP-SAC converges faster and reaches a higher reward value after convergence, and its learned strategies are more robust. 3) Although OP-SAC converges more slowly than reinforcement learning algorithms combined with a central pattern generator, it ultimately achieves a higher reward value and more stable training results. 4) Ablation experiments confirm the importance of the knowledge inheritance and knowledge sharing strategies for improving training effectiveness.

[Conclusions] The above analysis shows that the proposed OP-SAC algorithm accomplishes the learning of quadruped robot locomotion skills, improves the convergence speed of reinforcement learning to a certain extent, optimizes learning strategies, and significantly enhances training efficiency. © 2024 Tsinghua University. All rights reserved.
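The [Methods] paragraph above describes an alternating loop: an evolutionary strategy (ES) supplies reference trajectories that steer SAC, SAC's rewards feed back as the ES offspring-selection threshold, and both stages write to a common experience pool. The sketch below illustrates only that loop structure under stated assumptions: the trajectory evaluator and the SAC update are toy stand-ins, and every class and function name (SharedReplayBuffer, EvolutionStrategy, train_op_sac, and so on) is hypothetical rather than taken from the paper.

# Minimal structural sketch of the alternating ES/SAC loop described in the
# abstract. All names and update rules below are illustrative assumptions,
# not the authors' implementation.
import random
from collections import deque

class SharedReplayBuffer:
    """Experience pool shared by the ES and SAC stages (knowledge sharing)."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)

class EvolutionStrategy:
    """Toy ES over a reference-trajectory parameter vector."""
    def __init__(self, dim=8, pop_size=16, sigma=0.1):
        self.mean = [0.0] * dim
        self.pop_size, self.sigma = pop_size, sigma

    def propose(self):
        # Sample a population of perturbed trajectory parameters.
        return [[m + random.gauss(0.0, self.sigma) for m in self.mean]
                for _ in range(self.pop_size)]

    def update(self, population, scores, threshold):
        # Offspring below the SAC-derived reward threshold are discarded.
        elite = [p for p, s in zip(population, scores) if s >= threshold]
        if elite:
            n = len(self.mean)
            self.mean = [sum(p[i] for p in elite) / len(elite) for i in range(n)]

def train_op_sac(evaluate_trajectory, sac_update, generations=50):
    """Alternate ES and SAC stages; each stage feeds the other."""
    es, pool = EvolutionStrategy(), SharedReplayBuffer()
    sac_state, threshold = None, float("-inf")
    for _ in range(generations):
        # ES stage: reference trajectories guide SAC's training direction.
        population = es.propose()
        scores = [evaluate_trajectory(p) for p in population]
        es.update(population, scores, threshold)
        for p, s in zip(population, scores):
            pool.add(("es_reference", p, s))         # knowledge sharing
        # SAC stage: trains on the shared pool; its mean reward becomes the
        # ES offspring-selection threshold for the next generation.
        sac_state, mean_reward = sac_update(sac_state, pool.sample(32))
        threshold = mean_reward                      # knowledge inheritance
        pool.add(("sac_rollout", sac_state, mean_reward))
    return sac_state

if __name__ == "__main__":
    # Stub hooks so the loop runs end to end; a real setup would plug in a
    # quadruped simulator and a variable-temperature SAC learner here.
    demo_eval = lambda params: -sum(x * x for x in params)
    demo_sac = lambda state, batch: (state, sum(s for *_, s in batch) / len(batch))
    train_op_sac(demo_eval, demo_sac, generations=10)

Note how the two coupling mechanisms from the abstract appear as single lines here: the ES population is written into the shared pool (knowledge sharing), and the mean SAC reward becomes the next generation's selection threshold (data decoupling with knowledge inheritance).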

Keyword:

Multipurpose robots; Biped locomotion; Reinforcement learning; Deep learning; Knowledge acquisition; Adversarial machine learning; Deep reinforcement learning

Author Community:

  • [1] [Zhang, Siyuan] Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • [2] [Zhang, Siyuan] Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China
  • [3] [Zhu, Xiaoqing] Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • [4] [Zhu, Xiaoqing] Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China
  • [5] [Chen, Jiangtao] Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • [6] [Chen, Jiangtao] Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China
  • [7] [Liu, Xinyuan] Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • [8] [Liu, Xinyuan] Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China
  • [9] [Wang, Tao] Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • [10] [Wang, Tao] Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China

Source:

Journal of Tsinghua University

ISSN: 1000-0054

Year: 2024

Issue: 10

Volume: 64

Page: 1696-1705
