Author:

Gao, Mingxia | Li, Jiayi

Indexed by:

EI, Scopus

Abstract:

Chinese short text classification suffers from feature sparsity. In this paper, we propose a Chinese short text classification method based on BERT sentence embeddings integrating external statistical features (CSTC-BERTSE-ESF). [Methods] CSTC-BERTSE-ESF uses BERT sentence embeddings as the base features, concatenates them with domain features obtained from domain corpora by statistical methods to form the complete feature set for the classifier, and finally performs classification with Random Forest and Softmax. [Results] To verify the effectiveness of the method, we conducted a series of experiments on the THUNews and Book datasets. The method achieved accuracies of 93.8% and 93% on the two datasets, respectively, and on the Book dataset CSTC-BERTSE-ESF improved the F1 score by 8.8% over CSTC-BERTSE. [Limitation] Owing to limited computing power, the domain corpora used in this paper are small; the domain features would be more accurate with larger corpora. [Conclusion] Fusing external domain features can effectively improve Chinese short text classification, especially for data that depend heavily on domain knowledge. © 2022 IEEE.
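The abstract describes the pipeline only at a high level, so the following Python sketch illustrates the general idea of concatenating a sentence embedding with statistically derived domain features and feeding the result to a softmax classifier. It is not the authors' implementation, and several pieces are assumptions: `stand_in_embedding` is a hypothetical hashed bag-of-bigrams placeholder for real BERT sentence embeddings; the smoothed log-ratio in `domain_term_weights` is an assumed choice of statistic (the abstract does not name one); the toy corpora are invented; and the Random Forest branch is omitted (the same concatenated vectors could be passed to a Random Forest instead, for example scikit-learn's `RandomForestClassifier`).

```python
import math
import zlib
from collections import Counter


def char_bigrams(text):
    """Character bigrams: a simple unit for Chinese text that needs no word segmenter."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def stand_in_embedding(text, dim=16):
    """HYPOTHETICAL placeholder for a BERT sentence embedding: an L2-normalised,
    hashed bag of character bigrams. Use real BERT vectors in practice."""
    vec = [0.0] * dim
    for bg in char_bigrams(text):
        vec[zlib.crc32(bg.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def domain_term_weights(domain_docs, general_counts, alpha=1.0):
    """ASSUMED statistic: add-alpha smoothed log-ratio of each bigram's relative
    frequency in a domain corpus versus a general corpus (> 0 means domain-specific)."""
    counts = Counter(bg for doc in domain_docs for bg in char_bigrams(doc))
    total, general_total = sum(counts.values()), sum(general_counts.values())
    v = len(set(counts) | set(general_counts))
    return {
        bg: math.log((c + alpha) / (total + alpha * v))
        - math.log((general_counts[bg] + alpha) / (general_total + alpha * v))
        for bg, c in counts.items()
    }


def domain_features(text, weights_by_domain):
    """One score per domain (domains sorted by name): mean lexicon weight of the text's bigrams."""
    bgs = char_bigrams(text)
    return [
        sum(weights_by_domain[d].get(bg, 0.0) for bg in bgs) / max(len(bgs), 1)
        for d in sorted(weights_by_domain)
    ]


def features(text, weights_by_domain):
    """Concatenate the sentence embedding with the external domain features."""
    return stand_in_embedding(text) + domain_features(text, weights_by_domain)


def class_probs(W, x):
    """Softmax over per-class linear scores; a constant 1.0 input carries the bias."""
    xb = x + [1.0]
    z = [sum(wi * xi for wi, xi in zip(w, xb)) for w in W]
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(score - m) for score in z]
    total = sum(e)
    return [v / total for v in e]


def train_softmax(X, y, n_classes, lr=0.5, epochs=300):
    """Multinomial logistic regression fitted by full-batch gradient descent on cross-entropy."""
    d = len(X[0]) + 1
    W = [[0.0] * d for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * d for _ in range(n_classes)]
        for x, t in zip(X, y):
            p, xb = class_probs(W, x), x + [1.0]
            for k in range(n_classes):
                g = p[k] - (1.0 if k == t else 0.0)  # d(loss)/d(score_k)
                for j in range(d):
                    grad[k][j] += g * xb[j]
        W = [[w - lr * g / len(X) for w, g in zip(wr, gr)] for wr, gr in zip(W, grad)]
    return W


def mean_log_loss(W, X, y):
    return -sum(math.log(class_probs(W, x)[t]) for x, t in zip(X, y)) / len(X)


def predict(W, x):
    p = class_probs(W, x)
    return max(range(len(p)), key=p.__getitem__)


# Toy, made-up corpora for illustration only (not the paper's data).
DOMAIN_DOCS = {
    "finance": ["股票市场上涨", "银行利率下调", "基金投资收益"],
    "sports": ["足球比赛进球", "篮球联赛冠军", "运动员训练比赛"],
}
GENERAL_DOCS = ["今天天气很好", "我们去公园散步"] + [d for ds in DOMAIN_DOCS.values() for d in ds]
general_counts = Counter(bg for doc in GENERAL_DOCS for bg in char_bigrams(doc))
weights = {name: domain_term_weights(ds, general_counts) for name, ds in DOMAIN_DOCS.items()}

train = [("股票上涨", 0), ("银行利率", 0), ("足球比赛", 1), ("篮球冠军", 1)]  # 0 = finance, 1 = sports
X = [features(text, weights) for text, _ in train]
y = [label for _, label in train]
W = train_softmax(X, y, n_classes=2)
print(["finance", "sports"][predict(W, features("基金收益", weights))])
```

The domain scores are computed from corpora outside the training set, which is how they can supply information that a sparse short text lacks on its own; each domain adds only one dimension to the feature vector.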

Keyword:

Embeddings; Classification (of information); Forestry; Text processing; Computing power

Author Affiliations:

  • [ 1 ] [Gao, Mingxia]Beijing University of Technology, Faculty of Information Technology, Beijing; 100124, China
  • [ 2 ] [Li, Jiayi]Beijing University of Technology, Faculty of Information Technology, Beijing; 100124, China

Year: 2022

Pages: 78-83

Language: English

Cited Count:

WoS CC Cited Count: 0

SCOPUS Cited Count: 1

ESI Highly Cited Papers on the List: 0
