Abstract:
As electronic medical record systems are deployed, medical record data is accumulating, and extracting essential information from these resources has become a pressing concern. Named entity recognition (NER) is the first step. With the help of doctors, we built a small annotated corpus of Chinese electronic medical records. However, supervised NER requires a large amount of manually labeled data. To reduce annotation cost and make better use of unlabeled data, this paper proposes CEMRNER, a semi-supervised NER model for Chinese electronic medical records based on ALBERT-BiLSTM-CRF. The model combines a Bidirectional Long Short-Term Memory network (BiLSTM) with a Conditional Random Field (CRF) for sequence labeling and introduces the pre-trained language model ALBERT to address the representation of Chinese text. We also propose a dual selection strategy that picks high-confidence samples to expand the training set. The dual strategy helps ensure the accuracy of the automatically labeled data and reduces error accumulation across semi-supervised iterations. Experiments and analysis show that, compared with other models, this method is more accurate and comprehensive: precision, recall, and F1 score are 85.45%, 87.81%, and 86.61%, respectively. The results show that combining a semi-supervised method with pre-trained ALBERT can improve recognition accuracy when labeled data is scarce. © 2020 ACM.
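The three reported scores can be checked against each other, since the F1 score is the harmonic mean of precision and recall. The short sketch below is only an illustrative consistency check on the figures quoted in the abstract. The function and variable names are hypothetical and are not taken from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract, in percent (variable names are illustrative).
reported_precision = 85.45
reported_recall = 87.81
reported_f1 = 86.61

computed_f1 = f1_score(reported_precision, reported_recall)
print(f"computed F1 = {computed_f1:.2f}%")
```

To two decimal places, the computed value agrees with the F1 score given in the abstract.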
Year: 2020
Language: English