Abstract:
Unknown word recognition is an important research topic in natural language processing. However, recognizing unknown words in micro-blog short texts still suffers from sparse data, corpus noise, and varied forms of expression. This paper proposes POS-FP (Frequent Pattern growth with part-of-speech), an unknown word recognition method for micro-blog short text. First, candidate unknown words are obtained by combining the N-gram model with frequent item sets. Then the candidates are filtered and verified using improved mutual information, information entropy, and context dependence. Finally, an open verification method is used to obtain the final unknown words. Experiments show that the algorithm improves unknown word recognition for micro-blog short texts. © 2018 IEEE.
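The candidate-then-filter pipeline described in the abstract can be illustrated with a minimal sketch. It is not the POS-FP method itself. It assumes character-level n-grams with a plain frequency threshold in place of FP-growth, and standard pointwise mutual information and neighbor entropy in place of the paper's improved measures. It omits part-of-speech tagging, context dependence, and open verification. All function names and thresholds are hypothetical.

```python
import math
from collections import Counter


def count_ngrams(texts, max_n):
    """Count every character n-gram of length 1..max_n in the corpus."""
    counts = Counter()
    for text in texts:
        for n in range(1, max_n + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    return counts


def min_pmi(gram, counts, total):
    """Lowest pointwise mutual information over all two-way splits of gram."""
    p_gram = counts[gram] / total
    scores = []
    for k in range(1, len(gram)):
        p_left = counts[gram[:k]] / total
        p_right = counts[gram[k:]] / total
        scores.append(math.log(p_gram / (p_left * p_right)))
    return min(scores)


def neighbor_entropy(gram, texts, side):
    """Entropy of the characters adjacent to gram on the given side.

    Occurrences at a text boundary contribute no neighbor (an assumption
    of this sketch); with no neighbors at all the entropy is 0.0.
    """
    neighbors = Counter()
    for text in texts:
        start = text.find(gram)
        while start != -1:
            idx = start - 1 if side == "left" else start + len(gram)
            if 0 <= idx < len(text):
                neighbors[text[idx]] += 1
            start = text.find(gram, start + 1)
    n = sum(neighbors.values())
    if n == 0:
        return 0.0
    return -sum(c / n * math.log(c / n) for c in neighbors.values())


def candidate_words(texts, max_n=3, min_count=2, pmi_threshold=0.0,
                    entropy_threshold=0.5):
    """Keep frequent n-grams that are cohesive and have varied contexts."""
    counts = count_ngrams(texts, max_n)
    total = sum(len(t) for t in texts)  # hypothetical normaliser: corpus size
    kept = []
    for gram, c in counts.items():
        if len(gram) < 2 or c < min_count:
            continue
        if min_pmi(gram, counts, total) < pmi_threshold:
            continue
        left = neighbor_entropy(gram, texts, "left")
        right = neighbor_entropy(gram, texts, "right")
        if min(left, right) < entropy_threshold:
            continue
        kept.append(gram)
    return sorted(kept)
```

For example, `candidate_words(["axyb", "cxyd", "exyf"])` keeps only the string `xy`, which recurs in three different contexts. High mutual information suggests the characters belong together, and high neighbor entropy suggests the string is used freely as a unit.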
Year: 2018
Page: 1-7
Language: English
SCOPUS Cited Count: 1
ESI Highly Cited Papers on the List: 0