Performance of using LDA for Chinese news text classification - Details

Author:

Wu, Xiaojun | Fang, Liying | Wang, Pu | Yu, Nan

Indexed by:

EI | Scopus

Abstract:

Chinese text classification is challenging, especially when the data are high-dimensional and sparse. This paper examines text representation and dimension reduction for Chinese text classification. First, we introduce a topic model, Latent Dirichlet Allocation (LDA), and use it as a dimension reduction method. Second, we choose the Support Vector Machine (SVM) as the classification algorithm. Next, we describe a text classification method based on LDA and SVM. Finally, we run experiments on a large collection of Chinese text documents. In a comparison between the LDA method and the traditional TF-IDF method, the experimental results show that the LDA method achieves better results in both classification accuracy and running time. © 2015 IEEE.
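
The dimension reduction step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say which LDA inference method, hyperparameters, or toolkit the paper used. The sketch assumes collapsed Gibbs sampling, a toy corpus that has already been word-segmented, and hypothetical names and defaults (`lda_gibbs`, `alpha`, `beta`, `iters`). It maps each document from a vocabulary-sized bag of words to a short vector of topic proportions.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs is a list of token lists. Returns one topic-proportion vector
    (length num_topics) per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                 # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # word v assigned to topic k
    topic_total = [0] * num_topics                               # tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the current token from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token (unnormalized).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        theta.append([(doc_topic[d][t] + alpha) / denom for t in range(num_topics)])
    return theta


# Toy corpus (an assumption for illustration): two finance-like and two sports-like documents.
docs = [
    ["股票", "市场", "上涨", "投资"],
    ["投资", "基金", "股票", "市场"],
    ["球队", "比赛", "进球", "联赛"],
    ["联赛", "球队", "教练", "比赛"],
]
theta = lda_gibbs(docs, num_topics=2)
for row in theta:
    print([round(p, 3) for p in row])
```

Each row of `theta` is a low-dimensional feature vector whose length equals the number of topics rather than the vocabulary size. In a pipeline like the one the abstract describes, vectors of this kind would be passed to an SVM classifier in place of TF-IDF features.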

Keywords:

Text processing | Support vector machines | Statistics | Classification (of information)

Author Affiliations:

[ 1 ] [Wu, Xiaojun] Department of Electronic Information and Control Engineering, Beijing University of Technology, Beijing, China
[ 2 ] [Fang, Liying] Department of Electronic Information and Control Engineering, Beijing University of Technology, Beijing, China
[ 3 ] [Wang, Pu] Department of Electronic Information and Control Engineering, Beijing University of Technology, Beijing, China
[ 4 ] [Yu, Nan] Department of Electronic Information and Control Engineering, Beijing University of Technology, Beijing, China