Abstract:
Short texts on the Internet are increasingly valuable, but traditional classification methods are ill-suited to short texts, whose features are weakly expressed. To address this, this paper proposes a Chinese short text classification method based on word embedding and LSTM with feature enhancement (hereinafter CSTCFE-WE-LSTM). The method uses word embeddings learned from a Wikipedia corpus as the model's initial features, then enhances those features with weights generated from category factors and TF-IDF. Finally, it classifies with a 6-layer neural network consisting of a word embedding layer, two LSTM layers, a Dropout layer, and two fully connected layers. To verify CSTCFE-WE-LSTM, we collected short text sets on 3 topics and reached the following conclusions: 1. The best model outperforms a classifier that uses Wikipedia word embeddings with KNN on P, F, and other indicators. 2. For texts whose sentences average fewer than 10 words, a two-layer LSTM performs better than a single-layer LSTM, and the effect is better when the number of nodes in a single layer is 50. 3. Feature enhancement is more effective for the health category than for the commercial category. © 2021 IEEE.
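The feature-enhancement step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the category factor, so `category_factor` here is a hypothetical per-token multiplier that defaults to 1.0, the smoothed IDF formula is one common variant chosen for illustration, and the function names `tfidf_weights` and `enhance` are invented for this example. Documents are assumed to be already segmented into tokens.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {token: tf-idf weight} dict per tokenized document.

    Uses a smoothed IDF, log((1 + N) / (1 + df)) + 1, which is an
    assumption of this sketch, not a formula taken from the paper.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            tok: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + df[tok])) + 1.0)
            for tok, count in tf.items()
        })
    return weights


def enhance(doc, embeddings, weights, category_factor=None):
    """Scale each token's embedding vector by its weight.

    category_factor is a hypothetical {token: multiplier} mapping;
    tokens missing from it are left at a multiplier of 1.0.
    """
    category_factor = category_factor or {}
    enhanced = []
    for tok in doc:
        w = weights[tok] * category_factor.get(tok, 1.0)
        enhanced.append([w * x for x in embeddings[tok]])
    return enhanced
```

In this sketch a token that appears in every document receives a lower weight than a token specific to one document, so its embedding contributes less to the sequence passed on to the LSTM layers.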
Year: 2021
Page: 91-95
Language: English
Cited Count:
WoS CC Cited Count: 0
SCOPUS Cited Count: 4
ESI Highly Cited Papers on the List: 0
30 Days PV: 3