Abstract:
Long document classification (LDC) has recently become a focus of natural language processing (NLP) research, driven by the exponential growth in publications. Many LDC methods built on pretrained language models have been proposed and have made considerable progress. However, most existing methods model long documents as sequences of text and ignore the document structure, which limits their ability to represent long texts that carry structural information. To mitigate this limitation, this article proposes a novel hierarchical graph convolutional network (HGCN) for structured LDC, in which a section graph network models the macrostructure of a document and a word graph network with a decoupled graph convolutional block extracts its fine-grained features. In addition, an interaction strategy integrates the two networks into a whole by propagating features between them. To verify the effectiveness of the proposed model, four structured long document datasets are constructed, and extensive experiments on these datasets and an additional unstructured dataset show that the proposed method outperforms state-of-the-art classification methods.
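The abstract describes the architecture only at a high level. The plain-Python sketch below illustrates the general two-level idea under stated assumptions and is not the authors' HGCN. It uses the standard GCN propagation rule of Kipf and Welling, assumes that a "decoupled" block means parameter-free propagation followed by a single learned transform, and substitutes mean pooling (word to section) and additive broadcasting (section to word) for the interaction strategy, whose actual design is not given in this record. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def relu(m: Matrix) -> Matrix:
    return [[max(0.0, x) for x in row] for row in m]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2 (Kipf & Welling)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """Standard GCN layer: ReLU(A_norm @ H @ W)."""
    return relu(matmul(matmul(normalize_adjacency(adj), h), w))


def decoupled_gcn_block(adj: Matrix, h: Matrix, w: Matrix, hops: int = 2) -> Matrix:
    """ASSUMED 'decoupled' block: `hops` parameter-free propagation steps,
    then one learned transform (propagation separated from transformation)."""
    a_norm = normalize_adjacency(adj)
    for _ in range(hops):
        h = matmul(a_norm, h)
    return relu(matmul(h, w))


def mean_rows(m: Matrix) -> List[float]:
    """Column-wise mean over the rows (nodes) of m."""
    return [sum(col) / len(m) for col in zip(*m)]


def hierarchical_forward(sections: List[Tuple[Matrix, Matrix]], section_adj: Matrix,
                         w_word: Matrix, w_sec: Matrix) -> Tuple[List[float], List[Matrix]]:
    """sections: one (word_adj, word_feats) pair per section of the document."""
    # Word level: fine-grained features inside each section's word graph.
    word_h = [decoupled_gcn_block(adj, x, w_word) for adj, x in sections]
    # Word -> section (assumed interaction): pool each section's words into one node.
    sec_h = [mean_rows(h) for h in word_h]
    # Section level: propagate over the document's macrostructure.
    sec_h = gcn_layer(section_adj, sec_h, w_sec)
    # Section -> word (assumed interaction): add section context back onto its words.
    word_h = [[[x + s for x, s in zip(row, sec_h[i])] for row in h]
              for i, h in enumerate(word_h)]
    # Document vector: mean over section nodes, to be fed to a classifier.
    return mean_rows(sec_h), word_h


# Toy document: two sections with 2 and 3 words, 3-dim word features, 2 hidden units.
w_word = [[0.5, -0.2], [0.1, 0.4], [0.3, 0.3]]
w_sec = [[1.0, 0.0], [0.0, 1.0]]
sec1 = ([[0, 1], [1, 0]], [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])
sec2 = ([[0, 1, 0], [1, 0, 1], [0, 1, 0]],
        [[1.0, 1.0, 0.0], [0.0, 2.0, 1.0], [1.0, 0.0, 0.0]])
doc_vec, words = hierarchical_forward([sec1, sec2], [[0, 1], [1, 0]], w_word, w_sec)
```

Plain nested lists keep the sketch dependency-free; a practical model would use a deep learning framework with sparse adjacency matrices and learned weights rather than the fixed toy values above.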
Source:
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
ISSN: 2162-237X
Year: 2022
Issue: 10
Volume: 34
Page: 8071-8085
Impact Factor (JCR@2022): 10.4
ESI Discipline: COMPUTER SCIENCE;
ESI HC Threshold:46
JCR Journal Grade:1
CAS Journal Grade:1
Cited Count:
WoS CC Cited Count: 9
SCOPUS Cited Count: 16
ESI Highly Cited Papers on the List: 0