Abstract:
The efficiency and economy of the nonlinear optimal control process in wastewater treatment plants are two crucial indicators, corresponding respectively to achieving the control objective faster and to reducing the preset cost function. To accomplish this, an integrated online Q-learning (IOQL) algorithm, driven by a prior policy and an exploration policy, is proposed for nonlinear discrete-time systems with nonaffine features and unknown structures. First, a prior policy based on historical or artificial experience is designed to reduce the training time of the controller and provide a more stable learning environment; a weighting factor adjusts the influence of the prior policy on the overall learning process. Second, an exploration policy is trained online from new experiences collected in the real environment. By leveraging the two policies, the critic network approximating the cost function and the action network approximating the exploration policy can be adjusted swiftly and smoothly, gradually improving the control outcomes. Third, a stability condition with reasonable bounds is presented for the IOQL design. Finally, experimental and comparative results on a wastewater treatment plant, evaluating learning speed and cost consumption, demonstrate the advantages of the IOQL algorithm.
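The record carries no equations, so the following is a minimal sketch of the two-policy idea described in the abstract: a behavior input that blends a fixed prior policy with an online exploration policy via a weighting factor, a critic updated by a Q-learning temporal-difference rule, and an action update that descends the critic's gradient. The scalar plant, quadratic features, linear exploration gain, and all learning rates are illustrative assumptions, not the paper's actual networks or update laws.

```python
import numpy as np

rng = np.random.default_rng(0)

def plant(x, u):
    # Toy scalar nonaffine discrete-time dynamics (stand-in for the unknown plant).
    return 0.8 * x + 0.5 * np.tanh(u) + 0.1 * x * u

def prior_policy(x):
    # Prior policy from "historical or artificial experience" (hand-tuned here).
    return -0.5 * x

def phi(x, u):
    # Quadratic features: the critic approximates Q(x, u) = w . phi(x, u).
    return np.array([x * x, x * u, u * u, x, u, 1.0])

def cost(x, u):
    return x * x + 0.1 * u * u           # preset per-step cost (utility)

w = np.zeros(6)                          # critic weights
theta = 0.0                              # exploration policy gain: u_explore = theta * x
lam = 0.3                                # weighting factor blending the two policies
alpha_c, alpha_a, gamma = 0.05, 0.01, 0.95

x = 1.0
for k in range(2000):
    u_explore = theta * x + 0.1 * rng.standard_normal()   # online exploration
    u = (1.0 - lam) * prior_policy(x) + lam * u_explore   # blended behavior input
    x_next = plant(x, u)
    u_next = (1.0 - lam) * prior_policy(x_next) + lam * theta * x_next
    # Q-learning (temporal-difference) update of the critic weights.
    td = cost(x, u) + gamma * w @ phi(x_next, u_next) - w @ phi(x, u)
    w += alpha_c * td * phi(x, u)
    # Action update: step theta against dQ/du (chain rule: du/dtheta = lam * x).
    dQ_du = w @ np.array([0.0, x, 2.0 * u, 0.0, 1.0, 0.0])
    theta -= alpha_a * dQ_du * lam * x
    x = x_next
    if not np.isfinite(x) or abs(x) > 1e3:                # crude divergence guard
        x, theta = 1.0, 0.0

print(f"critic weights: {np.round(w, 3)}, exploration gain: {theta:.3f}")
```

In the paper itself the critic and action maps are neural networks trained from collected experience, and a stability condition bounds the design; neither is reproduced in this sketch.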
Source:
IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS
ISSN: 1551-3203
Year: 2024
Issue: 2
Volume: 21
Page: 1833-1842
Impact Factor: 12.300 (JCR@2022)
ESI Highly Cited Papers on the List: 0