A data cleaning method for water quality based on improved hierarchical clustering algorithm

Author：

Meng, Qingxuan | Yan, Jianzhuo (Scholars: 闫健卓)

Indexed by：

EI Scopus

Abstract：

Identifying and rectifying incomplete water quality data is of vital importance. A data cleaning method based on an improved balanced iterative reducing and clustering using hierarchies (BIRCH) clustering algorithm is proposed. The clustering feature tree of the water quality data is constructed, and the cluster vector of the clustering feature tree is obtained by the agglomerative method. The optimal number of clusters is determined according to the Bayesian Information Criterion and the nearest clustering ratio. The Pauta criterion is used to detect global outliers, and an artificial neural network (ANN) is used to fill in outliers and missing values. Finally, the improved data cleaning method is applied to water quality monitoring data from a Beijing wastewater treatment plant. The experimental results show that the method can not only detect abnormal and missing values accurately, but also normalise and complete missing data. Copyright © 2019 Inderscience Enterprises Ltd.
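
The abstract does not give implementation details for the outlier step. As a rough illustration only, the Pauta criterion is commonly stated as the 3σ rule: a value is flagged when it lies more than three standard deviations from the mean. The sketch below assumes that reading; the function name `pauta_outliers`, the threshold parameter `k`, the use of `None` for missing readings, and the sample values are all hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def pauta_outliers(values, k=3.0):
    """Return indices of readings farther than k sample standard
    deviations from the mean (3-sigma rule when k == 3).

    Missing readings are assumed to be encoded as None and are skipped.
    """
    present = [v for v in values if v is not None]
    if len(present) < 2:
        return []  # a standard deviation needs at least two readings
    mu = mean(present)
    sigma = stdev(present)
    if sigma == 0:
        return []  # all readings identical, so nothing can be flagged
    return [
        i
        for i, v in enumerate(values)
        if v is not None and abs(v - mu) > k * sigma
    ]


# Hypothetical readings: twenty plausible values and one gross error.
readings = [7.0] * 10 + [7.2] * 10 + [50.0]
print(pauta_outliers(readings))  # -> [20]
```

Because the mean and standard deviation are themselves pulled towards a gross error, the rule only fires on reasonably long series; with very few readings no value can reach three standard deviations. The clustering and ANN-based filling steps described in the abstract are not covered by this sketch.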

Keyword：

Wastewater treatment; Water quality; Neural networks; Trees (mathematics); Clustering algorithms; Iterative methods; Cleaning; Statistics; Hierarchical clustering; Sewage treatment plants

Author Community：

[ 1 ] [Meng, Qingxuan]College of Electronic Information and Control Engineering, Beijing University of Technology, Beijing, China
[ 2 ] [Yan, Jianzhuo]College of Electronic Information and Control Engineering, Beijing University of Technology, Beijing, China

Reprint Author's Address：