Abstract:
Given that citations are not equally important, various techniques based on supervised machine learning models have been proposed to identify important citations. However, only a small volume of data has been manually annotated with labels. To make full use of unlabeled data and improve learning performance, this work applies the semi-supervised self-training technique to identify important citations. After six groups of features are engineered, the semi-supervised versions of the SVM and RF models significantly improve on the performance of their conventional supervised versions when un-annotated samples at the 75% and 95% confidence levels, respectively, are added back to the training set. The AUC-PR and AUC-ROC of the SVM model are 0.8102 and 0.9622, and those of the RF model reach 0.9248 and 0.9841, outperforming their supervised counterparts. This demonstrates the effectiveness of our semi-supervised self-training strategy for important citation identification. © 2021 CEUR-WS. All rights reserved.
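The sketch below illustrates the general self-training setup the abstract describes: a base classifier is trained on the labeled subset, and unlabeled samples whose predicted class probability clears a confidence threshold (0.75 for SVM, 0.95 for RF, as reported above) are pseudo-labeled and added back to the training set. It uses scikit-learn's SelfTrainingClassifier on synthetic data; the paper's six engineered feature groups, dataset, and exact hyperparameters are not reproduced here, so all data and parameter choices are illustrative assumptions.

```
# Minimal self-training sketch (assumptions: scikit-learn, synthetic data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the engineered citation features (hypothetical data).
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.85, 0.15], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Hide most training labels (-1 marks "unlabeled") to mimic a small manually
# annotated set plus a large unlabeled pool.
rng = np.random.RandomState(42)
y_semi = y_train.copy()
y_semi[rng.rand(len(y_semi)) < 0.8] = -1

# Self-training wrappers: pseudo-labels are added to the training set only
# when the base model's confidence exceeds the threshold.
models = {
    "SVM": SelfTrainingClassifier(SVC(probability=True), threshold=0.75),
    "RF": SelfTrainingClassifier(
        RandomForestClassifier(n_estimators=200, random_state=42),
        threshold=0.95),
}

for name, model in models.items():
    model.fit(X_train, y_semi)
    scores = model.predict_proba(X_test)[:, 1]
    print(name,
          "AUC-PR:", round(average_precision_score(y_test, scores), 4),
          "AUC-ROC:", round(roc_auc_score(y_test, scores), 4))
```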
Source: CEUR Workshop Proceedings
ISSN: 1613-0073
Year: 2021
Volume: 2871
Page: 164-170
Language: English
ESI Highly Cited Papers on the List: 0