
Author:

Gao, MingXia | Li, JiaYi

Indexed by:

EI; Scopus

Abstract:

The value of short texts on the Internet is increasingly prominent, yet traditional classification methods cannot be applied to short texts with weak feature expression. To address this, this paper proposes a Chinese Short Text Classification method based on Word Embedding and LSTM with Feature Enhancement (hereinafter CSTCFE-WE-LSTM). The method uses word embeddings learned from a Wikipedia corpus as the model's initial features, and then uses category factors and TF-IDF to generate weights that enhance those features. Finally, it classifies with a six-layer neural network consisting of a word embedding layer, two LSTM layers, a Dropout layer, and two fully connected layers. To verify CSTCFE-WE-LSTM, we collected short text sets on three topics and reached the following conclusions: 1. In the best model, precision (P), F-measure (F), and other indicators are better than those of a classifier using Wikipedia word embeddings and KNN. 2. For texts whose average sentence length is less than 10 words, a two-layer LSTM outperforms a single-layer LSTM, and performance is best when each layer has 50 nodes. 3. Feature enhancement is more effective for the health category than for the commercial category. © 2021 IEEE.
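The feature-enhancement step described in the abstract (scaling word-embedding features by TF-IDF-derived weights) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `tfidf_weights` and `enhance` functions, the standard tf × log(N/df) formulation, and the toy vocabulary are all assumptions, and the paper's category-factor term is omitted because the abstract does not specify it.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Per-document TF-IDF weights for tokenized docs.
    Uses the common tf * log(N/df) form as an assumption;
    the paper's exact weighting scheme is not given in the abstract."""
    n = len(docs)
    df = Counter()                      # document frequency of each token
    for doc in docs:
        for w in set(doc):
            df[w] += 1
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weights.append({w: (c / total) * math.log(n / df[w])
                        for w, c in tf.items()})
    return weights

def enhance(doc, embeddings, weights, dim=2):
    """Scale each token's embedding vector by its TF-IDF weight,
    producing the 'feature-enhanced' sequence fed to the LSTM layers.
    Tokens without a known embedding fall back to a zero vector."""
    return [[weights.get(w, 0.0) * x
             for x in embeddings.get(w, [0.0] * dim)]
            for w in doc]
```

Usage on a toy corpus: a token that appears in every document gets weight log(N/N) = 0 and contributes nothing after enhancement, while rarer, more category-discriminative tokens keep scaled embedding features.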

Keyword:

Classification (of information); Multilayer neural networks; Text processing; Embeddings; Long short-term memory

Author Community:

  • [ 1 ] [Gao, MingXia]Beijing University of Technology, Faculty of Information Technology, Beijing, China
  • [ 2 ] [Li, JiaYi]Beijing University of Technology, Faculty of Information Technology, Beijing, China

Reprint Author's Address:

Email:


Source:

Year: 2021

Page: 91-95

Language: English

Cited Count:

WoS CC Cited Count: 0

SCOPUS Cited Count: 4

ESI Highly Cited Papers on the List: 0

WanFang Cited Count:

Chinese Cited Count:


Affiliated Colleges:
