Abstract:
Learning ability is a typical characteristic of the intelligence of higher animals. To explore how quadrupeds learn motor skills, this paper studies the gait learning task for quadruped robots and reproduces, from scratch, the rhythmic gait learning process of quadruped animals. In recent years, the proximal policy optimization (PPO) algorithm, a representative deep reinforcement learning algorithm, has been widely used in gait learning tasks for quadruped robots, giving good experimental results while requiring fewer hyperparameters. However, with multidimensional inputs and outputs it tends to converge to a local optimum. In the experimental environment of this study, the gait rhythm signals of the trained quadruped robot were irregular and its center of gravity oscillated. To address these problems, and drawing on the ability of meta-learning to capture high-dimensional abstract representations of the learning process, this paper proposes a meta proximal policy optimization (MPPO) algorithm that combines meta-learning with PPO and enables quadruped robots to learn a better gait. Simulation results on the PyBullet platform show that the proposed algorithm enables quadruped robots to learn walking skills. Compared with the soft actor-critic (SAC) and PPO algorithms, MPPO produces more regular gait rhythm signals and a faster walking speed. © 2024 South China University of Technology. All rights reserved.
Source:
Control Theory and Applications
ISSN: 1000-8152
Year: 2024
Issue: 1
Volume: 41
Page: 155-162
Cited Count:
WoS CC Cited Count: 0
ESI Highly Cited Papers on the List: 0