Abstract:
Reproducing the learning process of higher organisms is an important direction in robotics research. Several commonly used reinforcement learning algorithms based on actor-critic (AC) networks have been explored for this task. Because these algorithms still have shortcomings, various improvements have also been made. In the deep deterministic policy gradient (DDPG) algorithm, overestimation of the Q value degrades learning performance. Inspired by the arbitration mechanism in the prefrontal cortex of the brain, a deep arbitration actor-critic (DAAC) algorithm with two sets of evaluation networks was proposed. Through the arbitration mechanism, the better evaluation network is selected to update the policy parameters, which effectively mitigates the Q-value overestimation problem. The algorithm enables a quadruped robot to reproduce the bionic gait learning process. In simulation experiments, DAAC was compared with three algorithms: DDPG, soft actor-critic (SAC), and proximal policy optimization (PPO). The experimental results show that the quadruped gait trained by DAAC performs better in three respects (reward value, body stability, and speed), verifying the superiority of the algorithm. © 2023 Beijing Institute of Technology. All rights reserved.
Source:
Transactions of Beijing Institute of Technology
ISSN: 1001-0645
Year: 2023
Issue: 11
Volume: 43
Page: 1197-1204
Cited Count:
WoS CC Cited Count: 0
SCOPUS Cited Count: 1
ESI Highly Cited Papers on the List: 0