Neural-network-based accelerated safe Q-learning for optimal control of discrete-time nonlinear systems with state constraints

Author：

Zhao, M. | Wang, D. | Qiao, J.

Indexed by：

EI Scopus SCIE

Abstract：

For unknown nonlinear systems with state constraints, it is difficult to achieve the safe optimal control by using Q-learning methods based on traditional quadratic utility functions. To solve this problem, this article proposes an accelerated safe Q-learning (SQL) technique that addresses the concurrent requirements of safety and optimality for discrete-time nonlinear systems within an integrated framework. First, an adjustable control barrier function is designed and integrated into the cost function, aiming to facilitate the transformation of constrained optimal control problems into unconstrained cases. The augmented cost function is closely linked to the next state, enabling quicker deviation of the state from constraint boundaries. Second, leveraging offline data that adheres to safety constraints, we introduce an off-policy value iteration SQL approach for searching a safe optimal policy, thus mitigating the risk of unsafe interactions that may result from suboptimal iterative policies. Third, the vast amounts of offline data and the complex augmented cost function can hinder the learning speed of the algorithm. To address this issue, we integrate historical iteration information into the current iteration step to accelerate policy evaluation, and introduce the Nesterov Momentum technique to expedite policy improvement. Additionally, the theoretical analysis demonstrates the convergence, optimality, and safety of the SQL algorithm. Finally, under the influence of different parameters, simulation outcomes of two nonlinear systems with state constraints reveal the efficacy and advantages of the accelerated SQL approach. The proposed method requires fewer iterations while enabling the system state to converge to the equilibrium point more rapidly. © 2025 Elsevier Ltd
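
The two ingredients named in the abstract, a barrier term added to the cost and a Nesterov momentum update, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the scalar state, the symmetric constraint |x| < x_max, the log-type barrier, the weight gamma, and the function names barrier, augmented_cost, and nesterov_minimize are all hypothetical. The article's own barrier function and update rules may differ.

```python
import math


def barrier(x, x_max):
    """Hypothetical log-type barrier: 0 at x = 0, unbounded as |x| -> x_max."""
    if abs(x) >= x_max:
        return math.inf
    return math.log(x_max**2 / (x_max**2 - x**2))


def augmented_cost(x, u, x_next, x_max=1.0, gamma=0.5):
    """Quadratic utility plus a barrier penalty evaluated at the NEXT state.

    gamma is an assumed adjustable weight on the barrier term.
    """
    return x**2 + u**2 + gamma * barrier(x_next, x_max)


def nesterov_minimize(grad, w0, lr=0.1, mu=0.9, steps=200):
    """Generic Nesterov momentum descent on a scalar parameter.

    The gradient is evaluated at the look-ahead point w + mu * v,
    which is what distinguishes it from classical momentum.
    """
    w, v = w0, 0.0
    for _ in range(steps):
        lookahead = w + mu * v
        v = mu * v - lr * grad(lookahead)
        w = w + v
    return w


if __name__ == "__main__":
    # A transition that lands near the boundary costs more than one that does not.
    print(augmented_cost(0.1, 0.0, 0.9), augmented_cost(0.1, 0.0, 0.2))
    # Minimize (w - 3)^2, whose gradient is 2 * (w - 3).
    print(nesterov_minimize(lambda w: 2.0 * (w - 3.0), 0.0))
```

Because the penalty depends on the next state, an action that drives the state toward the constraint boundary is penalized before the boundary is reached, which is the intuition behind transforming the constrained problem into an unconstrained one.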

Keyword：

Adaptive dynamic programming; Accelerated value iteration; Control barrier functions; State constraints; Neural networks; Safe Q-learning

Author Community：

[ 1 ] [Zhao M.]School of Information Science and Technology, Beijing University of Technology, Beijing, 100124, China
[ 2 ] [Zhao M.]Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing, 100124, China
[ 3 ] [Zhao M.]Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing, 100124, China
[ 4 ] [Zhao M.]Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing, 100124, China
[ 5 ] [Wang D.]School of Information Science and Technology, Beijing University of Technology, Beijing, 100124, China
[ 6 ] [Wang D.]Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing, 100124, China
[ 7 ] [Wang D.]Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing, 100124, China
[ 8 ] [Wang D.]Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing, 100124, China
[ 9 ] [Qiao J.]School of Information Science and Technology, Beijing University of Technology, Beijing, 100124, China
[ 10 ] [Qiao J.]Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing, 100124, China
[ 11 ] [Qiao J.]Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing, 100124, China
[ 12 ] [Qiao J.]Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing, 100124, China

Related Articles：

Accelerated Value Iteration for Nonlinear Zero-Sum Games with Convergence Guarantee
2024，Guidance, Navigation and Control
Improved Adaptive Critic for Neural Optimal Control of Constrained Nonlinear Discrete-Time Systems
2020，39th Chinese Control Conference (CCC)
Model-free intelligent critic design with error analysis for neural tracking control
2024，Neurocomputing
Neural Q-learning for discrete-time nonlinear zero-sum games with adjustable convergence rate
2024，Neural Networks

Source ：

Neural Networks

ISSN： 0893-6080

Year： 2025

Volume： 186

Impact Factor： 7.800 (JCR@2022)
