Offline Data-Driven Adaptive Critic Design With Variational Inference for Wastewater Treatment Process Control

Author：

Qiao, J. | Yang, R. | Wang, D.

Indexed by：

EI Scopus SCIE

Abstract：

Wastewater treatment is indispensable to the functioning of urban society, and its optimal control has enormous social benefits. However, precise modelling of the unstable and complex treatment process is challenging yet crucial to the adaptive dynamic programming method. In this article, an adaptive critic algorithm with variational inference is designed to address the optimal control problem of nonlinear discrete-time systems, along with the convergence analysis. Based on the recorded system trajectory, the variational autoencoder is utilized to approximate the behavior policy of the offline dataset without system modelling and online interaction. Through policy iteration learning, the actor-critic structure can amend the policy generated by the variational autoencoder to achieve the optimal control objective. Simulations on a nonlinear system and the wastewater treatment process have verified that the proposed approach outperformed the behavior policy. Driven by the wastewater treatment process data derived from the incremental proportional-integral-derivative controller, the proposed approach can produce an optimal control policy of less tracking error and cost.

Note to Practitioners: When dealing with an unknown system with complex dynamics, it is more feasible to improve the acceptable performance of the existing control policy based on the system’s trajectory than to obtain an excelling policy. Motivated by batch reinforcement learning, learning from offline data can avoid the online interaction between the system and the adaptive dynamic programming algorithm, which could lead to exploratory errors during online learning. Specifically, using a model-free adaptive dynamic programming algorithm, the parameters of the controller are instantly updated based on the experience replay buffer sampled from the online trajectory data. However, online exploration determines the update, and there is no guarantee that the system will converge every time. As a specific type of adaptive dynamic programming algorithm, adaptive critic design uses a critic network to approximate the expected future cost and an actor network to generate a control input that minimizes the expected future cost. In this article, using the converged trajectory as the offline dataset, a revised variational autoencoder is used to approximate the behavior policy of the offline dataset. As a generative model, the variational autoencoder considers a random variable that adheres to a prior distribution while producing outputs. Through offline learning, the actor network can amend the approximated policy based on the evaluation from the critic network while being constrained within the limited variation of the generative model. Finally, the objective of the optimal control task can be achieved by following the designated cost design. However, a dataset containing disturbances could impede offline learning, which needs to be addressed.
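
The three roles named in the abstract (a variational autoencoder that imitates the behavior policy, a critic that estimates expected future cost, and an actor that amends the imitated action within a limited variation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the squared-error reconstruction term, the standard normal prior, the discount factor, and the tanh-bounded correction are all assumptions chosen for illustration.

```python
import numpy as np

def vae_loss(action, recon_action, mu, log_var, kl_weight=0.5):
    """Assumed VAE objective: squared reconstruction error of the recorded
    action plus KL(N(mu, sigma^2) || N(0, I)), both averaged over the batch."""
    recon = np.mean(np.sum((action - recon_action) ** 2, axis=1))
    kl = np.mean(-0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1))
    return recon + kl_weight * kl

def critic_target(cost, next_value, discount=0.95):
    """Assumed one-step target for the critic: the stage cost plus the
    discounted estimate of the future cost at the next state."""
    return cost + discount * next_value

def amended_action(vae_action, correction, max_shift=0.1, low=-1.0, high=1.0):
    """Assumed actor output: the VAE-proposed action plus a correction bounded
    by max_shift, then clipped to the admissible control range."""
    shift = max_shift * np.tanh(correction)
    return np.clip(vae_action + shift, low, high)
```

Bounding the correction keeps the amended action close to what the offline data supports, which is one common way to limit errors on state-action pairs the dataset never visited; the record does not state how the paper enforces its constraint.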

Keyword：

Wastewater treatment; Optimal control; Dynamic programming; Adaptive dynamic programming; Task analysis; Adaptation models; Biological system modeling; offline reinforcement learning; variational autoencoder; data-driven control; Trajectory

Author Community：

[ 1 ] [Qiao J.] the Beijing Laboratory of Smart Environmental Protection, and the Beijing Institute of Artificial Intelligence, Faculty of Information Technology, the Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing, China
[ 2 ] [Yang R.] the Beijing Laboratory of Smart Environmental Protection, and the Beijing Institute of Artificial Intelligence, Faculty of Information Technology, the Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing, China
[ 3 ] [Wang D.] the Beijing Laboratory of Smart Environmental Protection, and the Beijing Institute of Artificial Intelligence, Faculty of Information Technology, the Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing, China

Related Articles：

Policy Gradient Adaptive Critic Design With Dynamic Prioritized Experience Replay for Wastewater Treatment Process Control
2022, IEEE Transactions on Industrial Informatics
Heuristic Dynamic Programming Using Echo State Network For Multivariable Tracking Control Of Wastewater Treatment Process
2015, Asian Journal of Control
Enhancing offline reinforcement learning for wastewater treatment via transition filter and prioritized approximation loss
2025, Neurocomputing
Data-Driven Iterative Adaptive Critic Control Toward an Urban Wastewater Treatment Plant
2021, IEEE Transactions on Industrial Electronics

Source ：

IEEE Transactions on Automation Science and Engineering

ISSN： 1545-5955

Year： 2023

Issue： 4

Volume： 21

Page： 1-12

Impact Factor： 5.600 (JCR@2022)

ESI Discipline： ENGINEERING;

ESI HC Threshold：19

Cited Count：

WoS CC Cited Count： 0

SCOPUS Cited Count： 10

ESI Highly Cited Papers on the List： 0

30 Days PV： 8
