Author:

Yang Zhen (杨震) | Fan Kefeng | Lai Yingxu (赖英旭) | Gao Kaiming | Wang Yong

Indexed by:

EI; Scopus; SCIE

Abstract:

With the rapid development of information technology, short texts arising from social interaction increasingly dominate network information streams. Growing demand requires the industry to provide more effective classification of these brief texts. However, traditional document classification models struggle with short text documents, each of which contains only a few words. Aggressive document expansion works remarkably well in many cases but suffers from the assumption of independent, identically distributed observations. We formalize classification using Bayesian decision theory, treat each short text as a set of observations from a probabilistic model, called a statistical language model, and encode classification preferences with a loss function defined by the language models and the external reference document. Following Vapnik's principle of structural risk minimization (SRM), the optimal classification action is the one that minimizes the structural risk, which allows one to trade off errors on the training sample against improved generalization performance. We conduct experiments on several corpora of microblog-like data and analyze the results. Relative to established baselines, the experiments show that applying our proposed document expansion method gives a better chance of achieving improved classification performance.
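
To make the abstract's general idea concrete, the following is a minimal sketch of classifying a short text with per-class unigram language models that are smoothed against an external reference corpus. It is not the authors' method: the record does not give their loss function or their structural-risk formulation, so this sketch assumes a plain zero-one loss (under which the Bayes-optimal action is the class with the highest posterior) and swaps in Jelinek-Mercer interpolation as the smoothing step. All function names, the `lam` parameter, and the toy data are hypothetical.

```python
import math
from collections import Counter


def count_words(docs):
    """Count lowercase whitespace-separated tokens over a list of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def train(labeled_docs, reference_docs):
    """Build per-class counts, reference counts, vocabulary size and log priors."""
    class_counts = {label: count_words(docs) for label, docs in labeled_docs.items()}
    ref_counts = count_words(reference_docs)
    vocab = set(ref_counts)
    for counts in class_counts.values():
        vocab.update(counts)
    n_docs = sum(len(docs) for docs in labeled_docs.values())
    log_priors = {
        label: math.log(len(docs) / n_docs) for label, docs in labeled_docs.items()
    }
    # One extra vocabulary slot is reserved for words never seen anywhere.
    return class_counts, ref_counts, len(vocab) + 1, log_priors


def log_joint(text, counts, ref_counts, vocab_size, log_prior, lam):
    """log P(class) + sum of log P(word | class), smoothed with the reference model.

    Assumption: Jelinek-Mercer interpolation with weight `lam` on an
    add-one-smoothed reference model; this is an illustrative choice.
    """
    class_total = sum(counts.values())
    ref_total = sum(ref_counts.values())
    score = log_prior
    for word in text.lower().split():
        p_class = counts[word] / class_total if class_total else 0.0
        p_ref = (ref_counts[word] + 1) / (ref_total + vocab_size)
        score += math.log((1 - lam) * p_class + lam * p_ref)
    return score


def classify(text, model, lam=0.5):
    """Pick the class with the highest smoothed log joint (zero-one loss)."""
    class_counts, ref_counts, vocab_size, log_priors = model
    return max(
        class_counts,
        key=lambda label: log_joint(
            text, class_counts[label], ref_counts, vocab_size, log_priors[label], lam
        ),
    )


# Toy data, invented for illustration only.
labeled = {
    "sports": ["great goal in the match", "team wins the match"],
    "tech": ["new phone released", "software update for the phone"],
}
reference = ["the football team scored a goal", "the phone runs new software"]
model = train(labeled, reference)
print(classify("goal match", model))
print(classify("phone update", model))
```

The reference corpus matters most for words that never occur in a class's few training texts: without it, a single unseen word would drive that class's probability to zero.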

Keyword:

Document expansion; Text classification; External reference; Short texts; Language model

Author Affiliations:

  • [ 1 ] [Yang Zhen]Beijing Univ Technol, Coll Comp Sci, Beijing 100124, Peoples R China
  • [ 2 ] [Lai Yingxu]Beijing Univ Technol, Coll Comp Sci, Beijing 100124, Peoples R China
  • [ 3 ] [Gao Kaiming]Beijing Univ Technol, Coll Comp Sci, Beijing 100124, Peoples R China
  • [ 4 ] [Fan Kefeng]China Elect Standardizat Inst, Beijing 100007, Peoples R China
  • [ 5 ] [Wang Yong]Guilin Univ Elect Technol, CSIP Guangxi Sect, Guilin 541004, Peoples R China

Reprint Author's Address:

  • 杨震

    [Yang Zhen]Beijing Univ Technol, Coll Comp Sci, Beijing 100124, Peoples R China

Source:

CHINESE JOURNAL OF ELECTRONICS

ISSN: 1022-4653

Year: 2014

Issue: 2

Volume: 23

Page: 315-321

1.200 (JCR@2022)

ESI Discipline: ENGINEERING;

ESI HC Threshold:176

JCR Journal Grade:4

CAS Journal Grade:4

Cited Count:

WoS CC Cited Count: 8

ESI Highly Cited Papers on the List: 0
