Abstract:
Word segmentation of electronic medical record (EMR) text is the foundation of natural language processing in medicine. Because EMR text is highly specialized, costly to annotate, written in a distinctive style, and subject to steadily growing terminology, current Chinese word segmentation (CWS) methods cannot fully meet the requirements of EMR applications. To address this problem, this paper designs an EMR word segmentation model based on a graph neural network (GNN), a bidirectional long short-term memory network (Bi-LSTM), and a conditional random field (CRF), with the aim of improving segmentation performance and reducing dependence on the data set. In the model, a GNN built on the domain lexicon learns local composition features, the Bi-LSTM captures long-term dependencies and contextual sequence information, and the CRF obtains the optimal label sequence from sentence-level label information. Through the interaction of these features, ambiguity resolution and new word recognition in EMR word segmentation are carried out effectively. Compared with CWS tools such as Jieba and Pkuseg, as well as baseline models and state-of-the-art methods, the proposed model significantly improves precision and recall.
Source:
2020 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE
ISSN: 2156-1125
Year: 2020
Page: 985-989
Language: English
Cited Count:
WoS CC Cited Count: 6
SCOPUS Cited Count: 8
ESI Highly Cited Papers on the List: 0