Application of Q-learning based on adaptive greedy considering negative rewards in football match system

Author：

Xue, Fei | Li, Juntao | Yuan, Ruiping | Liu, Tao | Dong, Tingting

Indexed by：

EI Scopus

Abstract：

To address the tendency of multi-robot task allocation methods in robot soccer systems to fall into local optima and their poor real-time performance, a new multi-robot task allocation method is proposed. First, to improve the speed and efficiency of finding optimal actions and to overcome the drawback that traditional Q-learning often cannot propagate negative values, we propose a new way to propagate them: a Q-learning method based on negative rewards. Next, to adapt to a dynamic external environment, an adaptive ε-greedy method is proposed, whose mode of operation is judged by the value. This method is based on the classical ε-greedy strategy. During problem solving, ε can be changed adaptively as needed to better balance exploration and exploitation in reinforcement learning. Finally, we apply this method to a robot football game system. Experiments show that the Q-learning method that propagates negative rewards effectively avoids dangerous actions, and that the adaptive ε-greedy strategy adapts to the external environment better and faster, improving the speed of convergence. Copyright © 2019 Inderscience Enterprises Ltd.
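
The abstract does not give the update rule or the ε adaptation rule, so the following is only a minimal sketch of the baseline it builds on: standard tabular Q-learning with ε-greedy action selection on a hypothetical five-state line world, where state 0 is a hazard (reward -1) and state 4 is a goal (reward +1). The function `adapt_epsilon`, its thresholds, and the environment are assumptions made for illustration, not the authors' method. The sketch also shows the limitation the abstract refers to: with the usual max-based target, a penalty learned for one action does not pass back to predecessor states as long as another action in that state has a higher value.

```python
import random

N_STATES, HAZARD, GOAL = 5, 0, 4   # hypothetical line world: states 0..4
LEFT, RIGHT = 0, 1

def step(state, action):
    """Move one cell; the hazard ends the episode with -1, the goal with +1."""
    nxt = state - 1 if action == LEFT else state + 1
    if nxt == HAZARD:
        return nxt, -1.0, True
    if nxt == GOAL:
        return nxt, 1.0, True
    return nxt, 0.0, False

def choose_action(q_row, epsilon, rng):
    """Classical epsilon-greedy: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])

def q_update(q, s, a, r, s_next, done, alpha=0.5, gamma=0.9):
    """Standard Q-learning update; returns the TD error."""
    target = r if done else r + gamma * max(q[s_next])
    td_error = target - q[s][a]
    q[s][a] += alpha * td_error
    return td_error

def adapt_epsilon(epsilon, td_error, threshold=0.1, delta=0.05,
                  eps_min=0.05, eps_max=0.9):
    """Assumed rule (not from the paper): explore more while TD errors are
    large, and less once they become small."""
    if abs(td_error) > threshold:
        return min(eps_max, epsilon + delta)
    return max(eps_min, epsilon - delta)

def train(episodes=200, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    epsilon = 0.5
    for _ in range(episodes):
        s = 2                                  # start in the middle
        for _ in range(max_steps):             # cap episode length
            a = choose_action(q[s], epsilon, rng)
            s_next, r, done = step(s, a)
            td = q_update(q, s, a, r, s_next, done)
            epsilon = adapt_epsilon(epsilon, td)
            if done:
                break
            s = s_next
    return q, epsilon

if __name__ == "__main__":
    q, eps = train()
    for s in range(1, N_STATES - 1):
        print(s, [round(v, 3) for v in q[s]])
    print("final epsilon:", round(eps, 2))
```

In this sketch the entry `q[1][LEFT]` can only move toward -1, yet `q[2][LEFT]` is computed from `max(q[1])`, so it never becomes negative while `q[1][RIGHT]` is non-negative. A method that propagates negative rewards, as the abstract describes, would have to change this target in some way that the abstract does not specify.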

Keyword：

Algorithms | Learning systems | Football | Robot applications | Reinforcement learning | Industrial robots | Multipurpose robots

Author Community：

[ 1 ] [Xue, Fei] School of Information, Beijing Wuzi University, Beijing; 101149, China
[ 2 ] [Li, Juntao] School of Information, Beijing Wuzi University, Beijing; 101149, China
[ 3 ] [Yuan, Ruiping] School of Information, Beijing Wuzi University, Beijing; 101149, China
[ 4 ] [Liu, Tao] School of Information, Beijing Wuzi University, Beijing; 101149, China
[ 5 ] [Dong, Tingting] College of Computer Science and Technology, Beijing University of Technology, Beijing; 100124, China

Reprint Author's Address：

[Xue, Fei] School of Information, Beijing Wuzi University, Beijing; 101149, China

Email：

xuefei2004@126.com

Related Keywords：

Multi-robot formation control using reinforcement learning method
2010, 1st International Conference on Advances in Swarm Intelligence, ICSI 2010
An improved FCM algorithm with application in path optimization
2013, Journal of Computational Information Systems
Algorithms for acceptor sites recognition in DNA sequences
2004, WCICA 2004 - Fifth World Congress on Intelligent Control and Automation, Conference Proceedings
The constructing of multi-agent intelligent control system for behaviors coevolution
2006, 6th World Congress on Intelligent Control and Automation, WCICA 2006

Source ：

International Journal of Wireless and Mobile Computing

ISSN： 1741-1084

Year： 2019

Issue： 3

Volume： 16

Page： 233-240

Cited Count：

WoS CC Cited Count： 0

ESI Highly Cited Papers on the List： 0

Affiliated Colleges：

College of Information Science and Technology (data not explicitly attributed to a unit within this college/department)
