Author:

Qiao, J. | Yang, R. | Wang, D.

Indexed by:

EI Scopus SCIE

Abstract:

Wastewater treatment is indispensable to the functioning of urban society, and its optimal control has enormous social benefits. However, precise modelling of the unstable and complex treatment process is challenging yet crucial to the adaptive dynamic programming method. In this article, an adaptive critic algorithm with variational inference is designed to address the optimal control problem of nonlinear discrete-time systems, along with the convergence analysis. Based on the recorded system trajectory, the variational autoencoder is utilized to approximate the behavior policy of the offline dataset without system modelling and online interaction. Through policy iteration learning, the actor-critic structure can amend the policy generated by the variational autoencoder to achieve the optimal control objective. Simulations on a nonlinear system and the wastewater treatment process have verified that the proposed approach outperformed the behavior policy. Driven by the wastewater treatment process data derived from the incremental proportional-integral-derivative controller, the proposed approach can produce an optimal control policy of less tracking error and cost.

Note to Practitioners: When dealing with an unknown system with complex dynamics, it is more feasible to improve the acceptable performance of the existing control policy based on the system’s trajectory than to obtain an excelling policy. Motivated by batch reinforcement learning, learning from offline data can avoid the online interaction between the system and the adaptive dynamic programming algorithm, which could lead to exploratory errors during online learning. Specifically, using a model-free adaptive dynamic programming algorithm, the parameters of the controller are instantly updated based on the experience replay buffer sampled from the online trajectory data. However, online exploration determines the update, and there is no guarantee that the system will converge every time. As a specific type of adaptive dynamic programming algorithm, adaptive critic design uses a critic network to approximate the expected future cost and an actor network to generate a control input that minimizes the expected future cost. In this article, using the converged trajectory as the offline dataset, a revised variational autoencoder is used to approximate the behavior policy of the offline dataset. As a generative model, the variational autoencoder considers a random variable that adheres to a prior distribution while producing outputs. Through offline learning, the actor network can amend the approximated policy based on the evaluation from the critic network while being constrained within the limited variation of the generative model. Finally, the objective of the optimal control task can be achieved by following the designated cost design. However, a dataset containing disturbances could impede offline learning, which needs to be addressed.
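
The offline loop described in the abstract (approximate the behavior policy from recorded data, evaluate it with a critic, then amend it within a limited variation) can be illustrated with a much simpler stand-in. The sketch below is not the article's algorithm: it assumes a hypothetical scalar linear plant and quadratic cost, replaces the variational autoencoder with a least-squares fit of a linear feedback gain, replaces the critic network with a quadratic Q-function fitted by least squares, and replaces the generative-model constraint with a clip around the fitted gain. All names and parameter values (`A_TRUE`, `B_TRUE`, `K_BEHAVIOR`, `delta`, and so on) are illustrative assumptions.

```python
import random

# Hypothetical plant x' = A_TRUE*x + B_TRUE*u, used ONLY to record the offline data.
A_TRUE, B_TRUE, K_BEHAVIOR = 0.9, 0.5, 0.2


def collect_offline_data(steps=500, seed=0):
    """Record one trajectory (x, u, x') under a noisy behavior policy u = -K_BEHAVIOR*x."""
    rng = random.Random(seed)
    x, data = 1.0, []
    for _ in range(steps):
        u = -K_BEHAVIOR * x + rng.gauss(0.0, 0.3)
        x_next = A_TRUE * x + B_TRUE * u
        data.append((x, u, x_next))
        x = x_next
    return data


def fit_behavior_gain(data):
    """Stand-in for the VAE: least-squares fit of k in u = -k*x."""
    sxx = sum(x * x for x, _, _ in data)
    sxu = sum(x * u for x, u, _ in data)
    return -sxu / sxx


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, 3):
            f = M[r][c] / M[c][c]
            for j in range(c, 4):
                M[r][j] -= f * M[c][j]
    sol = [0.0] * 3
    for r in reversed(range(3)):
        sol[r] = (M[r][3] - sum(M[r][j] * sol[j] for j in range(r + 1, 3))) / M[r][r]
    return sol


def evaluate(data, k, q=1.0, r=1.0):
    """Critic step: fit Q(x,u) = h_xx*x^2 + 2*h_xu*x*u + h_uu*u^2 for the policy u = -k*x
    from the Bellman equation Q(x,u) = q*x^2 + r*u^2 + Q(x', -k*x'), using data only."""
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for x, u, xn in data:
        un = -k * xn
        d = [x * x - xn * xn, 2.0 * (x * u - xn * un), u * u - un * un]
        cost = q * x * x + r * u * u
        for i in range(3):
            b[i] += d[i] * cost
            for j in range(3):
                A[i][j] += d[i] * d[j]
    return solve3(A, b)  # [h_xx, h_xu, h_uu]


def improve_policy(data, k_init, delta, iters=10):
    """Actor step: move toward the gain minimizing Q, clipped to k_init +/- delta."""
    k = k_init
    for _ in range(iters):
        _, h_xu, h_uu = evaluate(data, k)
        k_greedy = h_xu / h_uu  # argmin over u of Q(x, u) is u = -(h_xu/h_uu)*x
        k = min(max(k_greedy, k_init - delta), k_init + delta)
    return k


if __name__ == "__main__":
    data = collect_offline_data()
    k_hat = fit_behavior_gain(data)
    print("fitted behavior gain:", k_hat)
    print("amended gain (delta=0.1):", improve_policy(data, k_hat, delta=0.1))
```

A small `delta` keeps the amended policy close to the actions seen in the dataset, which is the role the abstract assigns to the limited variation of the generative model; a large `delta` lets the iteration approach the gain that is optimal for the assumed quadratic cost.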

Keyword:

Wastewater treatment; Optimal control; Dynamic programming; Adaptive dynamic programming; Task analysis; Adaptation models; Biological system modeling; offline reinforcement learning; variational autoencoder; data-driven control; Trajectory

Author Community:

  • [ 1 ] [Qiao J.] the Beijing Laboratory of Smart Environmental Protection, and the Beijing Institute of Artificial Intelligence, Faculty of Information Technology, the Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing, China
  • [ 2 ] [Yang R.] the Beijing Laboratory of Smart Environmental Protection, and the Beijing Institute of Artificial Intelligence, Faculty of Information Technology, the Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing, China
  • [ 3 ] [Wang D.] the Beijing Laboratory of Smart Environmental Protection, and the Beijing Institute of Artificial Intelligence, Faculty of Information Technology, the Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing, China

Source:

IEEE Transactions on Automation Science and Engineering

ISSN: 1545-5955

Year: 2023

Issue: 4

Volume: 21

Page: 1-12

Impact Factor: 5.600 (JCR@2022)

ESI Discipline: ENGINEERING

ESI HC Threshold: 19

Cited Count:

WoS CC Cited Count: 0

SCOPUS Cited Count: 10

ESI Highly Cited Papers on the List: 0
