MEOL: A Maximum-Entropy Framework for Options Learning

Author：

Zhang, Pin | Dong, Wenhan | Cai, Ming | Jia, Shengde | Wang, Zi-Peng

Indexed by：

EI Scopus SCIE

Abstract：

Options, the temporally extended courses of action that can be taken at varying time scales, have provided a concrete, key framework for learning levels of temporal abstraction in hierarchical tasks. While methods for learning options end-to-end are well researched, how to explore good options and actions simultaneously remains challenging. We address this issue by maximizing reward augmented with the entropies of both the option-selection and action-selection policies in options learning. To this end, we derive a novel optimization objective by reformulating options learning from the perspective of probabilistic inference and propose a soft options iteration method that guarantees convergence to the optimum. For implementation, we propose an off-policy algorithm called the maximum-entropy options critic (MEOC) and evaluate it on a series of continuous control benchmarks. Comparative results demonstrate that our method outperforms baselines in efficiency and final performance on most benchmarks, and that it is particularly strong and robust on complex tasks. Ablation studies further show that entropy maximization in hierarchical exploration improves learning performance through efficient option specialization and multimodality at the action level.

Keyword：

Hidden Markov models | hierarchical reinforcement learning | maximum-entropy options learning framework (MEOL) | options learning | Trajectory | Reinforcement learning | Entropy | Probabilistic logic | maximum-entropy options critic (MEOC) | temporal abstraction | exploration | Optimization | Deep reinforcement learning | Inference algorithms

Author Community：

[ 1 ] [Zhang, Pin]Air Force Engn Univ, Coll Aeronaut Engn, Xian 710038, Peoples R China
[ 2 ] [Dong, Wenhan]Air Force Engn Univ, Coll Aeronaut Engn, Xian 710038, Peoples R China
[ 3 ] [Cai, Ming]Air Force Engn Univ, Coll Aeronaut Engn, Xian 710038, Peoples R China
[ 4 ] [Jia, Shengde]Natl Univ Def Technol, Coll Mechatron Engn & Automat, Changsha 410073, Peoples R China
[ 5 ] [Wang, Zi-Peng]Beijing Univ Technol, Fac Informat Technol, Beijing Lab Smart Environm Protect, Beijing Key Lab Computat Intelligence & Intelligen, Beijing 100124, Peoples R China
[ 6 ] [Wang, Zi-Peng]Beijing Univ Technol, Beijing Inst Artificial Intelligence, Beijing 100124, Peoples R China

Reprint Author's Address：

[Cai, Ming]Air Force Engn Univ, Coll Aeronaut Engn, Xian 710038, Peoples R China

Email：

pinpinmiao@sina.com |
dongwenhan@sina.com |
caiming1124@sina.com |
jiashengde08@nudt.edu.cn |
wzp182475@163.com

Source ：

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

ISSN： 2162-237X

Year： 2024

Issue： 3

Volume： 36

Page： 4834-4848

Impact Factor： 10.400 (JCR@2022)

Cited Count：

WoS CC Cited Count： 0

ESI Highly Cited Papers on the List： 0
