
Author:

Yang, Z. (Scholars: 杨震) | Lei, J. | Wang, J. | Zhang, X. | Guo, J. (Scholars: 郭瑾)

Indexed by:

Scopus

Abstract:

As a simple classification method, the vector space model (VSM) has been widely applied in text information processing. Traditional VSM, however, has difficulty selecting a refined vector representation that strikes a good tradeoff between complexity and performance, especially for incremental text mining. To address these problems, this paper discusses several improvements, such as VSM variants based on improved TF, TF-IDF and BM25 weighting. Maximum mutual information feature selection is then introduced to obtain a low-dimensional VSM with lower complexity while keeping acceptable precision. Experimental results on spam filtering and short message classification show that the algorithm achieves higher precision than existing algorithms under the same conditions. © 2008 American Institute of Physics.
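The weighting schemes and the feature selection step named in the abstract can be illustrated with a small sketch. It is not the authors' implementation, and several parts are assumptions. The abstract does not define the "improved TF", so sublinear TF (1 + ln tf) stands in for it. "Maximum mutual information" is taken here to mean the expected mutual information between term presence and class membership, maximised over classes. BM25 uses the plain ln(N/df) IDF with the common defaults k1 = 1.2 and b = 0.75. The nearest-centroid cosine classifier, every function name and the four-message corpus are hypothetical.

```python
import math
from collections import Counter

def tf_log(count):
    """Sublinear TF 1 + ln(count); an assumed stand-in for the paper's 'improved TF'."""
    return 1.0 + math.log(count) if count > 0 else 0.0

def idf(term, docs):
    """Plain inverse document frequency ln(N / df)."""
    df = sum(1 for toks in docs if term in toks)
    return math.log(len(docs) / df) if df else 0.0

def bm25(count, doc_len, avg_len, idf_value, k1=1.2, b=0.75):
    """Okapi BM25 term weight: saturating TF with document-length normalisation."""
    return idf_value * count * (k1 + 1) / (count + k1 * (1 - b + b * doc_len / avg_len))

def mutual_information(term, docs, labels, cls):
    """Expected MI (bits) between presence of `term` and membership in class `cls`."""
    n = len(docs)
    cells = Counter((term in toks, y == cls) for toks, y in zip(docs, labels))
    mi = 0.0
    for (t, c), n_tc in cells.items():
        n_t = cells[(t, True)] + cells[(t, False)]  # docs with this term-presence value
        n_c = cells[(True, c)] + cells[(False, c)]  # docs with this class-membership value
        mi += n_tc / n * math.log2(n * n_tc / (n_t * n_c))
    return mi

def select_features(docs, labels, k):
    """Keep the k terms whose MI, maximised over classes, is largest (ties broken alphabetically)."""
    vocab = sorted({t for toks in docs for t in toks})
    classes = sorted(set(labels))
    score = {t: max(mutual_information(t, docs, labels, c) for c in classes) for t in vocab}
    return sorted(vocab, key=lambda t: (-score[t], t))[:k]

def vectorize(toks, features, docs, scheme="tfidf"):
    """Weight one tokenised document over the reduced feature space."""
    counts = Counter(toks)
    avg_len = sum(len(d) for d in docs) / len(docs)
    vec = []
    for f in features:
        if scheme == "tf":
            vec.append(tf_log(counts[f]))
        elif scheme == "tfidf":
            vec.append(tf_log(counts[f]) * idf(f, docs))
        else:  # "bm25"
            vec.append(bm25(counts[f], len(toks), avg_len, idf(f, docs)))
    return vec

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def train_centroids(docs, labels, features, scheme="tfidf"):
    """Average the document vectors of each class (a simple VSM classifier)."""
    centroids = {}
    for c in sorted(set(labels)):
        vecs = [vectorize(d, features, docs, scheme) for d, y in zip(docs, labels) if y == c]
        centroids[c] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return centroids

def classify(text, centroids, features, docs, scheme="tfidf"):
    """Assign the class whose centroid has the highest cosine similarity."""
    vec = vectorize(text.split(), features, docs, scheme)
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))

# Hypothetical toy corpus, for illustration only.
TEXTS = ["win cash prize now", "claim your free prize",
         "meeting moved to friday", "lunch on friday with the team"]
LABELS = ["spam", "spam", "ham", "ham"]
docs = [t.split() for t in TEXTS]

features = select_features(docs, LABELS, k=2)
centroids = train_centroids(docs, LABELS, features)
print(features)                                              # ['friday', 'prize']
print(classify("free prize inside", centroids, features, docs))  # spam
```

With k = 2, the two terms that perfectly separate the toy classes ("friday" and "prize") score 1 bit each and are kept. Terms seen in only one message score about 0.31 bits and are dropped. This is the kind of dimensionality reduction the abstract trades against precision.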

Keyword:

Incremental Text Classification; Short Messages Classification; Spam Filtering; VSM

Author Community:

  • [ 1 ] [Yang, Z.]School of Computer, Beijing University of Technology, Beijing, 100022, China
  • [ 2 ] [Lei, J.]School of Computer, Beijing University of Technology, Beijing, 100022, China
  • [ 3 ] [Wang, J.]School of Information, Central University of Finance and Economics, Beijing, 100081, China
  • [ 4 ] [Zhang, X.]School of Computer, Beijing University of Technology, Beijing, 100022, China
  • [ 5 ] [Guo, J.]School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing, 100876, China

Source:

AIP Conference Proceedings

ISSN: 0094-243X

Year: 2008

Volume: 1060

Page: 369-373

Language: English
