Author:

Chen, Jie | Chen, Cai | Liang, Yi

Indexed by:

CPCI-S

Abstract:

The classical TF-IDF algorithm weights a term only by its term frequency and inverse document frequency and ignores other features of the word. Based on an analysis of Chinese expression habits, this paper proposes an adaptive word-position weighting algorithm built on TF-IDF, called the TF-IDF-AP algorithm. TF-IDF-AP dynamically determines the position weight of a word according to where the word appears. The paper adopts the vector space model (VSM) and designs a comparative experiment on Chinese document clustering. The results show that the F-measure of the TF-IDF-AP algorithm improves by 12.9% compared with the classical TF-IDF algorithm.
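
The abstract does not give the TF-IDF-AP formula, so the following is only a minimal sketch of the general idea: classical TF-IDF scores multiplied by a position-dependent factor. The function names, the `boost` and `lead_fraction` parameters, and the rule "terms that first appear near the start of a document count more" are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classical TF-IDF: tf(t, d) * log(N / df(t)) for tokenized documents."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def position_weight(index, length, boost=2.0, lead_fraction=0.2):
    """HYPOTHETICAL position factor (not from the paper): tokens in the
    leading fraction of the document get `boost`, all others get 1.0."""
    return boost if index < lead_fraction * length else 1.0


def tf_idf_with_position(docs, boost=2.0, lead_fraction=0.2):
    """Scale each term's TF-IDF score by its largest position factor."""
    weighted = []
    for doc, scores in zip(docs, tf_idf(docs)):
        best = {}
        for i, term in enumerate(doc):
            w = position_weight(i, len(doc), boost, lead_fraction)
            best[term] = max(best.get(term, 0.0), w)
        weighted.append({t: s * best[t] for t, s in scores.items()})
    return weighted


# Toy, already-tokenized documents (real Chinese text would need word segmentation first).
docs = [
    ["cluster", "text", "text", "model", "data"],
    ["text", "model", "vector", "space", "model"],
]
base = tf_idf(docs)
adjusted = tf_idf_with_position(docs)
```

In this toy setup only the first token of each five-token document falls in the boosted region, so its score doubles while the others keep their classical TF-IDF value. The resulting per-document dictionaries can serve as sparse VSM vectors for a clustering step.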

Keyword:

Term Frequency-Inverse Document Frequency (TF-IDF); weight of position; text feature extraction; adaptive weight

Author Community:

  • [ 1 ] [Chen, Jie]Beijing Univ Technol Beijing, Fac Informat Technol, Beijing, Peoples R China
  • [ 2 ] [Chen, Cai]Beijing Univ Technol Beijing, Fac Informat Technol, Beijing, Peoples R China
  • [ 3 ] [Liang, Yi]Beijing Univ Technol Beijing, Fac Informat Technol, Beijing, Peoples R China

Reprint Author's Address:

  • [Chen, Jie]Beijing Univ Technol Beijing, Fac Informat Technol, Beijing, Peoples R China

Source:

PROCEEDINGS OF THE 2016 2ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INDUSTRIAL ENGINEERING (AIIE 2016)

ISSN: 1951-6851

Year: 2016

Volume: 133

Page: 114-117

Language: English

Cited Count:

WoS CC Cited Count: 7

ESI Highly Cited Papers on the List: 0
