Incorporating Lexicon for Named Entity Recognition of Traditional Chinese Medicine Books

Author：

Song, Bingyan | Bao, Zhenshan | Wang, YueZhang | Zhang, Wenbo | Sun, Chao

Indexed by：

EI Scopus

Abstract：

Little research has been done on Named Entity Recognition (NER) for Traditional Chinese Medicine (TCM) books, and most existing work uses statistical models such as Conditional Random Fields (CRFs). However, these methods do not fully exploit lexicon information or large-scale unlabeled corpus data. To improve the performance of NER for TCM books, we propose a method that is based on the biLSTM-CRF model and incorporates lexicon information into the representation layer to enrich its semantic information. We compared our approach with several previous character-based and word-based methods. Experiments on the 'Shanghan Lun' dataset show that our method outperforms previous models. In addition, we collected 376 TCM books to construct a large-scale corpus for obtaining pre-trained vectors, since no large corpus was previously available in this field. We have released the corpus and pre-trained vectors to the public. © 2020, Springer Nature Switzerland AG.

Keyword：

Medicine | Semantics | Natural language processing systems | Random processes

Author Affiliations：

[ 1 ] [Song, Bingyan] College of Computer Science, Beijing University of Technology, Beijing; 100124, China
[ 2 ] [Bao, Zhenshan] College of Computer Science, Beijing University of Technology, Beijing; 100124, China
[ 3 ] [Wang, YueZhang] College of Computer Science, Beijing University of Technology, Beijing; 100124, China
[ 4 ] [Zhang, Wenbo] College of Computer Science, Beijing University of Technology, Beijing; 100124, China
[ 5 ] [Sun, Chao] College of Chinese Medicine, Capital Medical University, Beijing; 100069, China