Abstract:
Reinforcement learning algorithms such as soft actor-critic (SAC) have been successful in reproducing the motor skills of higher animals. The SAC framework combines policy search with a state-action value function. However, the agent explores greedily with its policy, and the critic network tends to underestimate the Q-value function. This paper proposes a policy distillation soft actor-critic (PDSAC) algorithm that integrates policy distillation (PD) with SAC so that the agent can adopt better policies. The algorithm lets the agent explore with hybrid policies and speeds up the convergence of the reinforcement learning reward. A theoretical proof shows that PDSAC improves the efficiency of policy exploration, and the algorithm is validated on a quadruped robot gait learning task. In simulation, PDSAC outperforms SAC on the gait learning task, converging 40% faster and improving the reward value by 26.7%. © 2025 Beijing University of Aeronautics and Astronautics (BUAA). All rights reserved.
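As a rough illustration of the two ideas named in the abstract, the sketch below computes a distillation loss between a teacher and a student policy and samples an action from a hybrid of the two. It is not the paper's formulation: the diagonal-Gaussian policies, the KL(teacher || student) loss, the mixing probability `beta`, and the function names `gaussian_kl` and `hybrid_action` are all assumptions made for this example.

```python
import math
import random


def gaussian_kl(mu_t, std_t, mu_s, std_s):
    """Closed-form KL(teacher || student) for diagonal Gaussians, summed over action dims."""
    total = 0.0
    for mt, st, ms, ss in zip(mu_t, std_t, mu_s, std_s):
        total += math.log(ss / st) + (st ** 2 + (mt - ms) ** 2) / (2.0 * ss ** 2) - 0.5
    return total


def hybrid_action(teacher, student, beta, rng):
    """Sample from the teacher with probability beta, otherwise from the student.

    Each policy is a (means, stds) pair; beta is a hypothetical mixing weight.
    """
    mu, std = teacher if rng.random() < beta else student
    return [rng.gauss(m, s) for m, s in zip(mu, std)]


# Hypothetical two-dimensional action distributions for a single state.
teacher = ([0.5, -0.2], [0.3, 0.3])
student = ([0.0, 0.0], [0.5, 0.5])

rng = random.Random(0)
kl_same = gaussian_kl(*teacher, *teacher)  # identical policies: loss is zero
kl_diff = gaussian_kl(*teacher, *student)  # differing policies: loss is positive
action = hybrid_action(teacher, student, beta=0.5, rng=rng)
```

In a full training loop the distillation term would typically be added to the student's policy loss, and `beta` could be annealed so that exploration shifts from the teacher to the student. Both choices are design assumptions here, not details taken from the paper.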
Source:
Journal of Beijing University of Aeronautics and Astronautics
ISSN: 1001-5965
Year: 2025
Issue: 2
Volume: 51
Page: 428-439
ESI Highly Cited Papers on the List: 0