Abstract:
Unknown word recognition is of great significance for improving the precision of text segmentation and syntactic analysis. Social networks have become an important platform for sharing, disseminating, and acquiring information, and unknown word recognition in micro-blog short texts has become a research hot spot. However, micro-blog text contains a large number of nonstandard terms and network buzzwords, which makes unknown word recognition more difficult. This paper proposes a Chinese unknown word recognition method for micro-blog short texts based on an improved FP-growth algorithm (POS-FP). First, the POS-FP algorithm is used to mine frequent itemsets from micro-blog text, and an N-gram model is used to extract candidate unknown words from these frequent itemsets. Second, an improved mutual information measure and left-right information entropy are used to verify the internal features of the candidate unknown words. Finally, context-dependent and open-source methods are used for external verification of the candidates. Compared with traditional methods, the proposed algorithm improves the recognition rate of unknown words in micro-blog short texts.
Source:
PATTERN ANALYSIS AND APPLICATIONS
ISSN: 1433-7541
Year: 2020
Issue: 2
Volume: 23
Page: 1011-1020
JCR@2022: 3.900
ESI Discipline: ENGINEERING
ESI HC Threshold: 115
Cited Count:
WoS CC Cited Count: 7
SCOPUS Cited Count: 10
ESI Highly Cited Papers on the List: 0