Author:

Zhang, Pin | Dong, Wenhan | Cai, Ming | Jia, Shengde | Wang, Zi-Peng

Indexed by:

EI Scopus SCIE

Abstract:

Options, temporally extended courses of action that can be taken at varying time scales, provide a concrete, key framework for learning levels of temporal abstraction in hierarchical tasks. While methods for learning options end-to-end are well researched, exploring good options and actions simultaneously remains challenging. We address this issue by maximizing reward augmented with the entropies of both the option-selection and action-selection policies in options learning. To this end, we derive a novel optimization objective by reformulating options learning from the perspective of probabilistic inference, and we propose a soft options iteration method that guarantees convergence to the optimum. For implementation, we propose an off-policy algorithm called the maximum-entropy options critic (MEOC) and evaluate it on a series of continuous control benchmarks. Comparative results demonstrate that our method outperforms the baselines in efficiency and final performance on most benchmarks, and that it is especially strong and robust on complex tasks. Ablation studies further show that entropy maximization for hierarchical exploration improves learning performance through efficient option specialization and multimodality at the action level.
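
The abstract's idea of "reward augmented with entropies of both option and action selection policy" can be illustrated with a minimal sketch. This is not the paper's objective or the MEOC algorithm, neither of which is given in this record. It assumes discrete option and action distributions (the paper evaluates on continuous control), and the names `entropy`, `soft_return`, `alpha_option`, `alpha_action`, and `gamma` are hypothetical, chosen only for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def soft_return(rewards, option_probs, action_probs,
                alpha_option=0.1, alpha_action=0.1, gamma=0.99):
    """Discounted return with per-step entropy bonuses.

    Illustrative assumption: at each step t the reward r_t is augmented by
    alpha_option * H(option policy) + alpha_action * H(action policy).
    The temperatures and the discrete distributions are hypothetical.
    """
    total = 0.0
    for t, (r, p_opt, p_act) in enumerate(
            zip(rewards, option_probs, action_probs)):
        bonus = alpha_option * entropy(p_opt) + alpha_action * entropy(p_act)
        total += (gamma ** t) * (r + bonus)
    return total


if __name__ == "__main__":
    rewards = [1.0, 1.0]
    deterministic = [[1.0, 0.0]] * 2   # zero-entropy policies
    uniform = [[0.5, 0.5]] * 2         # maximum-entropy policies over 2 choices
    print(soft_return(rewards, deterministic, deterministic, gamma=0.5))
    print(soft_return(rewards, uniform, uniform, gamma=0.5))
```

With identical rewards, the uniform policies score higher than the deterministic ones under this augmented return, which is the sense in which such an objective rewards exploration at both the option level and the action level.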

Keyword:

Hidden Markov models; hierarchical reinforcement learning; maximum-entropy options learning framework (MEOL); options learning; Trajectory; Reinforcement learning; Entropy; Probabilistic logic; maximum-entropy options critic (MEOC); temporal abstraction; exploration; Optimization; Deep reinforcement learning; Inference algorithms

Author Affiliations:

  • [ 1 ] [Zhang, Pin]Air Force Engn Univ, Coll Aeronaut Engn, Xian 710038, Peoples R China
  • [ 2 ] [Dong, Wenhan]Air Force Engn Univ, Coll Aeronaut Engn, Xian 710038, Peoples R China
  • [ 3 ] [Cai, Ming]Air Force Engn Univ, Coll Aeronaut Engn, Xian 710038, Peoples R China
  • [ 4 ] [Jia, Shengde]Natl Univ Def Technol, Coll Mechatron Engn & Automat, Changsha 410073, Peoples R China
  • [ 5 ] [Wang, Zi-Peng]Beijing Univ Technol, Fac Informat Technol, Beijing Lab Smart Environm Protect, Beijing Key Lab Computat Intelligence & Intelligen, Beijing 100124, Peoples R China
  • [ 6 ] [Wang, Zi-Peng]Beijing Univ Technol, Beijing Inst Artificial Intelligence, Beijing 100124, Peoples R China

Reprint Author's Address:

  • [Cai, Ming]Air Force Engn Univ, Coll Aeronaut Engn, Xian 710038, Peoples R China

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

ISSN: 2162-237X

Year: 2024

Issue: 3

Volume: 36

Page: 4834-4848

10.400 (JCR@2022)
