A stable data-augmented reinforcement learning method with ensemble exploration and exploitation

Author：

Zuo, Guoyu (左国玉) | Tian, Zhipeng | Huang, Gao

Indexed by：

EI Scopus SCIE

Abstract：

Learning from visual observations is a significant yet challenging problem in Reinforcement Learning (RL). Two distinct problems, representation learning and task learning, need to be solved to infer an optimal policy. Some methods have been proposed that use data augmentation in reinforcement learning to learn directly from images. Although these methods can improve generalization in RL, they are often found to make task learning unstable and can even lead to divergence. We investigate the causes of this instability and find that it is usually rooted in the high variance of Q-functions. In this paper, we propose an easy-to-implement and unified method to solve the above-mentioned problems: Data-augmented Reinforcement Learning with Ensemble Exploration and Exploitation (DAR-EEE). Bootstrap ensembles are incorporated into data-augmented reinforcement learning and provide uncertainty estimates for both original and augmented states, which can be used to stabilize and accelerate task learning. Specifically, a novel strategy called uncertainty-weighted exploitation is designed for the rational utilization of transition tuples. Moreover, an efficient exploration method using the highest upper confidence bound is used to balance exploration and exploitation. Our experimental evaluation demonstrates the improved sample efficiency and final performance of our method on a range of difficult image-based control tasks. In particular, our method achieves new state-of-the-art performance on the Reacher-easy and Cheetah-run tasks.
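
The two ensemble ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the finite candidate-action set, the `lam` and `temperature` parameters, and the specific weight formula are all assumptions chosen for illustration. It shows how disagreement across an ensemble of Q-estimates might drive both an upper-confidence action choice and a per-transition weight.

```python
import math
from statistics import mean, pstdev


def ucb_action(q_ensemble, lam=1.0):
    """Return the index of the candidate action with the highest
    mean + lam * std across the ensemble (an upper confidence score).

    q_ensemble[k][a] is the k-th ensemble member's Q-estimate for
    candidate action a (hypothetical layout, assumed for this sketch).
    """
    n_actions = len(q_ensemble[0])
    scores = []
    for a in range(n_actions):
        qs = [member[a] for member in q_ensemble]
        scores.append(mean(qs) + lam * pstdev(qs))
    # Ties resolve to the lowest index.
    return max(range(n_actions), key=scores.__getitem__)


def uncertainty_weight(q_values, temperature=1.0):
    """Map ensemble disagreement on one transition to a weight
    between 0.5 and 1.0: full weight when members agree, less
    weight as their standard deviation grows.

    Computes sigmoid(-std * temperature) + 0.5, written with tanh
    so that large deviations do not overflow. The formula is an
    assumed example, not taken from the paper.
    """
    std = pstdev(q_values)
    return 1.0 - 0.5 * math.tanh(0.5 * std * temperature)


# Two ensemble members, two candidate actions.
q = [[1.0, 0.0],
     [1.0, 2.0]]
chosen = ucb_action(q, lam=1.0)          # prefers the action the members disagree on
weight = uncertainty_weight([0.0, 2.0])  # down-weights a high-variance target
```

With `lam=0.0` the choice reduces to greedy selection on the ensemble mean, so `lam` controls how strongly disagreement is treated as a reason to explore.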

Keyword：

Bootstrap ensembles; Reinforcement learning from images; Robot learning; Data augmentation

Author Affiliations：

[ 1 ] [Zuo, Guoyu]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[ 2 ] [Tian, Zhipeng]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[ 3 ] [Huang, Gao]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[ 4 ] [Zuo, Guoyu]Beijing Key Lab Comp Intelligence & Intelligent Sy, Beijing 100124, Peoples R China
[ 5 ] [Huang, Gao]Beijing Key Lab Comp Intelligence & Intelligent Sy, Beijing 100124, Peoples R China
[ 6 ] [Huang, Gao]Beijing Inst Technol, Beijing Adv Innovat Ctr Intelligent Robots & Syst, Beijing 100081, Peoples R China

Reprint Author's Address：

Email：

zuoguoyu@bjut.edu.cn |
tiantiant@emails.bjut.edu.cn |
huanggao@bjut.edu.cn

Source ：

APPLIED INTELLIGENCE

ISSN： 0924-669X

Year： 2023

Issue： 21

Volume： 53

Page： 24792-24803

JCR@2022： 5.300

ESI Discipline： ENGINEERING

ESI HC Threshold：19

Cited Count：

WoS CC Cited Count： 2

SCOPUS Cited Count： 2

ESI Highly Cited Papers on the List： 0

WanFang Cited Count：

Chinese Cited Count：

30 Days PV： 4

Affiliated Colleges：
