
Author:

Gao, MingXia | Li, JiaYi

Indexed by:

EI; Scopus

Abstract:

The value of short texts on the Internet is increasingly prominent, yet traditional classification methods cannot be applied to short texts with weak feature expression. To address this, this paper proposes a Chinese Short Text Classification method based on Word Embedding and LSTM with Feature Enhancement (hereinafter CSTCFE-WE-LSTM). The method uses word embeddings learned from a Wikipedia corpus as the model's initial features, and then uses category factors and TF-IDF to generate weights that enhance those features. Finally, it classifies with a six-layer neural network consisting of a word embedding layer, two LSTM layers, a Dropout layer, and two fully connected layers. To verify CSTCFE-WE-LSTM, we collected short text sets on three topics and reached the following conclusions: 1. In the best model, precision (P), F-measure (F), and other indicators are better than those of a classifier using Wikipedia word embeddings and KNN. 2. For texts whose average sentence length is less than 10 words, a two-layer LSTM outperforms a single-layer LSTM, and performance is best when each layer has 50 nodes. 3. Feature enhancement is more effective for the health category than for the commercial category. © 2021 IEEE.
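The feature-enhancement step described in the abstract (scaling word-embedding features by TF-IDF-derived weights) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `tfidf_weights` and `enhance` functions, the standard tf × log(N/df) formulation, and the toy vocabulary are all assumptions, and the paper's category-factor term is omitted because the abstract does not specify it.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Per-document TF-IDF weights for tokenized docs.
    Uses the common tf * log(N/df) form as an assumption;
    the paper's exact weighting scheme is not given in the abstract."""
    n = len(docs)
    df = Counter()                      # document frequency of each token
    for doc in docs:
        for w in set(doc):
            df[w] += 1
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weights.append({w: (c / total) * math.log(n / df[w])
                        for w, c in tf.items()})
    return weights

def enhance(doc, embeddings, weights, dim=2):
    """Scale each token's embedding vector by its TF-IDF weight,
    producing the 'feature-enhanced' sequence fed to the LSTM layers.
    Tokens without a known embedding fall back to a zero vector."""
    return [[weights.get(w, 0.0) * x
             for x in embeddings.get(w, [0.0] * dim)]
            for w in doc]
```

Usage on a toy corpus: a token that appears in every document gets weight log(N/N) = 0 and contributes nothing after enhancement, while rarer, more category-discriminative tokens keep scaled embedding features.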

Keyword:

Classification (of information); Multilayer neural networks; Text processing; Embeddings; Long short-term memory

Author Community:

  • [ 1 ] [Gao, MingXia]Beijing University of Technology, Faculty of Information Technology, Beijing, China
  • [ 2 ] [Li, JiaYi]Beijing University of Technology, Faculty of Information Technology, Beijing, China

Reprint Author's Address:

Email:


Source:

Year: 2021

Page: 91-95

Language: English

Cited Count:

WoS CC Cited Count: 0

SCOPUS Cited Count: 4

ESI Highly Cited Papers on the List: 0

WanFang Cited Count:

Chinese Cited Count:


Affiliated Colleges:
