Differential Privacy High-Dimensional Data Publishing Based on Feature Selection and Clustering

Author：

Chu, Z. | He, J. | Zhang, X. | Zhu, N.

Indexed by：

Scopus; SCIE

Abstract：

High-dimensional data are a product of social information, and balancing their privacy against their usability is a core issue in privacy protection. Feature selection is a commonly used dimensionality-reduction technique for high-dimensional data. Some feature selection methods process only the features the algorithm selects and ignore the information associated with those features, which reduces the usability of the final results. This paper proposes a hybrid method based on feature selection and cluster analysis to address the data utility and privacy problems that arise when high-dimensional data are published. The proposed method has three stages: (1) screening features; (2) clustering the features; and (3) adding adaptive noise. The experiments use the Wisconsin Breast Cancer Diagnostic (WDBC) database from the UCI Machine Learning Repository. With classification accuracy as the performance measure, the experiments show that data processed by the proposed algorithm protect sensitive information while retaining the data's contribution to the diagnostic results. © 2023 by the authors.
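
The abstract does not say how the noise in stage (3) is adapted, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes that each feature already has an importance score (for example from a random forest, which the keywords mention), that each feature is clipped to a known range whose width serves as its sensitivity, and that a total privacy budget is split across features in proportion to importance, so that more important features receive less Laplace noise. The function names `allocate_budget` and `perturb` and all numeric values are hypothetical. Stages (1) and (2) are not shown.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) by inverse-CDF sampling."""
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    # The max() guard avoids log(0) at the boundary u == -0.5.
    magnitude = -scale * math.log(max(1.0 - 2.0 * abs(u), 1e-300))
    return math.copysign(magnitude, u)


def allocate_budget(importances, total_epsilon):
    """Split total_epsilon across features in proportion to importance.

    Assumption: proportional allocation is one possible reading of
    "adaptive noise"; the source does not specify the rule.
    """
    total = sum(importances)
    if total <= 0 or total_epsilon <= 0:
        raise ValueError("importances and total_epsilon must be positive")
    return [total_epsilon * w / total for w in importances]


def perturb(rows, importances, sensitivities, total_epsilon, seed=None):
    """Add Laplace noise to every value, one noise scale per feature.

    The noise scale for a feature is sensitivity / epsilon, so a feature
    with a larger share of the budget is perturbed less.
    """
    rng = random.Random(seed)
    epsilons = allocate_budget(importances, total_epsilon)
    scales = [s / e for s, e in zip(sensitivities, epsilons)]
    noisy = [
        [x + laplace_noise(b, rng) for x, b in zip(row, scales)]
        for row in rows
    ]
    return noisy, epsilons


if __name__ == "__main__":
    # Illustrative values only: two features scaled to [0, 1].
    data = [[0.2, 0.9], [0.7, 0.1]]
    noisy, eps = perturb(data, importances=[3.0, 1.0],
                         sensitivities=[1.0, 1.0], total_epsilon=1.0, seed=0)
    print(eps)
    print(noisy)
```

In this sketch the per-feature budgets sum to the total budget, which matches the usual sequential-composition accounting. Whether the paper allocates the budget per feature, per cluster, or by some other rule is not stated in the abstract.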

Keyword：

feature selection; high-dimensional data; random forest; clustering; differential privacy

Author Affiliations：

[1] [Chu Z.] School of Software Engineering, Beijing University of Technology, Beijing, 100124, China
[2] [Chu Z.] Key Laboratory of Security for Network and Data in Industrial Internet of Liaoning Province, Jinzhou, 121000, China
[3] [He J.] School of Software Engineering, Beijing University of Technology, Beijing, 100124, China
[4] [Zhang X.] Key Laboratory of Security for Network and Data in Industrial Internet of Liaoning Province, Jinzhou, 121000, China
[5] [Zhang X.] Key Laboratory of Security for Network and Data in Industrial Internet of Liaoning Province, Jinzhou, 121000, China
[6] [Zhu N.] School of Software Engineering, Beijing University of Technology, Beijing, 100124, China

Source ：

Electronics (Switzerland)

ISSN： 2079-9292

Year： 2023

Issue： 9

Volume： 12

Impact Factor (JCR@2022)： 2.900

ESI Discipline： ENGINEERING;

ESI HC Threshold：19

Cited Count：

SCOPUS Cited Count： 4

ESI Highly Cited Papers on the List： 0