Abstract:
[Objective] Inspired by the way quadruped animals acquire motor skills in nature, deep reinforcement learning has been widely applied to learning quadruped robot locomotion skills. Through interaction with the environment, robots can autonomously learn complete motion strategies. However, traditional reinforcement learning has several drawbacks, such as large computational requirements, slow algorithm convergence, and rigid learning strategies, which substantially reduce training efficiency and incur unnecessary time costs. To address these shortcomings, this paper introduces evolutionary strategies into the soft actor-critic (SAC) algorithm and proposes an optimized parallel SAC (OP-SAC) algorithm that trains quadruped robots with evolutionary strategies and reinforcement learning in parallel.

[Methods] The algorithm first uses a SAC variant with a varying temperature coefficient to reduce the influence of this hyperparameter on the training process. It then introduces evolutionary strategies, using the reference trajectory trained by the evolutionary strategy as a sample input to guide the training direction of the SAC algorithm, while the state information and reward values obtained from SAC training serve as inputs and offspring selection thresholds for the evolutionary strategy, thereby decoupling the training data. In addition, the algorithm adopts an alternating training scheme with two mechanisms: a knowledge-sharing strategy, in which the training results of the evolutionary strategy and reinforcement learning are stored in a common experience pool, and a knowledge inheritance mechanism, which passes the training results of both strategies on to the next stage of the algorithm. With these two training strategies, the evolutionary strategy and reinforcement learning guide each other's training direction and exchange useful information between generations, thereby accelerating the learning process and enhancing the robustness of the algorithm.

[Results] The simulation experiments show the following: 1) Training quadruped robots with the OP-SAC algorithm achieves a converged reward value of approximately 3 000, with stable posture and high speed after training; the algorithm can effectively complete bionic gait learning for quadruped robots. 2) Compared with other algorithms that combine SAC and evolutionary strategies, the OP-SAC algorithm converges faster and reaches a higher reward value after convergence, and the learned strategies are more robust. 3) Although the OP-SAC algorithm converges more slowly than other reinforcement learning algorithms combined with a central pattern generator, it ultimately achieves a higher reward value and more stable training results. 4) Ablation experiments confirm the importance of the knowledge inheritance and knowledge-sharing strategies for improving training effectiveness.

[Conclusions] The above analysis shows that the proposed OP-SAC algorithm accomplishes the learning of quadruped robot locomotion skills, improves the convergence speed of reinforcement learning to a certain extent, optimizes the learning strategies, and significantly enhances training efficiency. © 2024 Tsinghua University. All rights reserved.
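The [Methods] description above outlines a data flow between the evolutionary strategy and the SAC learner (shared experience pool, SAC-derived selection threshold, knowledge inheritance across generations) without giving code. The Python sketch below only illustrates that alternating structure under assumed interfaces; the names SharedExperiencePool, rollout, train_op_sac, and the es, sac, and env objects are hypothetical placeholders, not the authors' published implementation.

```python
import numpy as np


class SharedExperiencePool:
    """Common experience pool: ES reference trajectories and SAC rollouts
    are both stored here so each method can learn from the other's data."""

    def __init__(self):
        self.transitions = []

    def add_trajectory(self, trajectory):
        self.transitions.extend(trajectory)

    def sample(self, batch_size):
        idx = np.random.randint(0, len(self.transitions), size=batch_size)
        return [self.transitions[i] for i in idx]


def rollout(env, policy, episodes=1):
    """Collect (s, a, r, s', done) transitions and the mean episode return.
    `env` is assumed to expose reset() -> state and step(a) -> (s', r, done)."""
    trajectory, returns = [], []
    for _ in range(episodes):
        state, total, done = env.reset(), 0.0, False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            trajectory.append((state, action, reward, next_state, done))
            total += reward
            state = next_state
        returns.append(total)
    return trajectory, float(np.mean(returns))


def train_op_sac(es, sac, env, generations=100, sac_updates=10_000):
    """Alternate ES and SAC training phases around a shared experience pool."""
    pool = SharedExperiencePool()
    inherited = None  # knowledge inheritance: best ES parameters carried forward

    for _ in range(generations):
        # ES phase: sample offspring and keep those whose return exceeds a
        # threshold derived from SAC's recent performance.
        threshold = sac.recent_mean_return()
        elites = []
        for params in es.sample_population(seed=inherited):
            traj, ret = rollout(env, es.policy(params))
            pool.add_trajectory(traj)          # reference trajectory guides SAC
            if ret >= threshold:
                elites.append((params, ret))
        inherited = es.update(elites)          # inherited by the next generation

        # SAC phase: gradient updates on the shared pool (which now contains
        # ES reference data); the temperature coefficient is tuned automatically.
        for _ in range(sac_updates):
            sac.update(pool.sample(256))
        traj, _ = rollout(env, sac.policy())
        pool.add_trajectory(traj)              # SAC data feeds back into the pool
```

The sketch shows only the coupling points named in the abstract: the shared pool implements knowledge sharing, the SAC-derived threshold implements offspring selection, and the parameters carried between generations implement knowledge inheritance.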
Source:
Journal of Tsinghua University
ISSN: 1000-0054
Year: 2024
Issue: 10
Volume: 64
Page: 1696-1705