Author:

An, Xin (An, Xin.) | Sun, Xin (Sun, Xin.) | Xu, Shuo (Xu, Shuo.) (Scholars:徐硕)

Indexed by:

SSCI Scopus SCIE

Abstract:

Given that citations are not equally important, various techniques have been presented to identify important citations on the basis of supervised machine learning models. However, only a small number of instances have been manually annotated with labels. To make full use of unlabeled instances and improve identification performance, this work uses the semi-supervised self-training technique to identify important citations. After six groups of features are engineered, the SVM and RF models are chosen as the base classifiers for the self-training strategy. Two experiments are then conducted on two different types of datasets. The experiment on the expert-labeled dataset from a single discipline shows that the semi-supervised versions of the SVM and RF models significantly improve on the performance of the conventional supervised versions when unannotated samples at the 75% and 95% confidence levels, respectively, are rejoined to the training set. The AUC-PR and AUC-ROC of the SVM model are 0.8102 and 0.9622, and those of the RF model reach 0.9248 and 0.9841, outperforming their counterparts and the benchmark methods in the literature. This demonstrates the effectiveness of our semi-supervised self-training strategy for important citation identification. In the other experiment, on the author-labeled dataset from multiple disciplines, semi-supervised learning models perform better than their supervised learning counterparts in terms of AUC-PR when the ratio of labeled instances is less than 20%. Compared with our first experiment, the insufficient number of instances from each discipline in our second experiment leaves the performance of the models unsatisfactory.
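The self-training loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, which this record does not show. To stay dependency-free it swaps in a toy nearest-centroid classifier for the SVM and RF base models, and it treats the confidence level as a minimum predicted probability, which is an assumption. All names here (`fit`, `predict_proba`, `self_train`, `threshold`, `max_rounds`) are hypothetical.

```python
import math


def fit(X, y):
    """Toy stand-in for SVM/RF: one centroid (feature-wise mean) per class."""
    model = {}
    for label in set(y):
        points = [x for x, t in zip(X, y) if t == label]
        dims = len(points[0])
        model[label] = [sum(p[i] for p in points) / len(points) for i in range(dims)]
    return model


def predict_proba(x, model):
    """Toy confidence: softmax over negative squared distances to each centroid."""
    scores = {
        label: -sum((a - b) ** 2 for a, b in zip(x, c))
        for label, c in model.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def self_train(X_lab, y_lab, X_unlab, threshold=0.75, max_rounds=10):
    """Repeatedly pseudo-label confident unlabeled samples and retrain.

    Assumption: a sample joins the training set when its top predicted
    probability is at least `threshold`.
    """
    X_train, y_train = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    model = fit(X_train, y_train)
    for _ in range(max_rounds):
        confident, remaining = [], []
        for x in pool:
            proba = predict_proba(x, model)
            label = max(proba, key=proba.get)
            if proba[label] >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new to add, so the loop has converged
        for x, label in confident:
            X_train.append(x)
            y_train.append(label)
        pool = remaining
        model = fit(X_train, y_train)  # retrain on labeled + pseudo-labeled data
    return model, pool


# Made-up toy data: two labeled points, three unlabeled points.
X_lab = [[0.0, 0.0], [4.0, 4.0]]
y_lab = [0, 1]
X_unlab = [[0.5, 0.5], [3.5, 3.5], [2.0, 2.0]]
model, rejected = self_train(X_lab, y_lab, X_unlab, threshold=0.75)
```

In this toy run the two unlabeled points near the labeled ones are pseudo-labeled and shift the centroids, while the equidistant point stays below the threshold and is left in the pool. A stricter threshold, such as the 95% level the abstract reports for RF, admits fewer but more reliable pseudo-labels.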

Keyword:

Semi-supervised learning; Self-training; Important citation; Author-labeled dataset; Expert-labeled dataset

Author Community:

  • [ 1 ] [An, Xin]Beijing Forestry Univ, Sch Econ & Management, Beijing 100083, Peoples R China
  • [ 2 ] [Sun, Xin]Inst Sci & Tech Informat China, Beijing 100038, Peoples R China
  • [ 3 ] [Xu, Shuo]Beijing Univ Technol, Coll Econ & Management, Beijing 100124, Peoples R China

Source:

SCIENTOMETRICS

ISSN: 0138-9130

Year: 2022

Issue: 11

Volume: 127

Page: 6533-6555

3.9 (JCR@2022)

ESI Discipline: SOCIAL SCIENCES, GENERAL;

ESI HC Threshold:27

JCR Journal Grade:2

CAS Journal Grade:3

Cited Count:

WoS CC Cited Count: 8

SCOPUS Cited Count: 10

ESI Highly Cited Papers on the List: 0
