Author:

Yuan, Dingrong | Mo, Zhuoying | Xie, Bing | Xie, Yangcai

Indexed by:

EI Scopus

Abstract:

Web pages contain huge amounts of information, comprising both content information and extraneous material such as navigation, advertisements, and Flash animations. To reduce the burden on Web users, we established a technique for extracting the content information from a Web page. First, we analyzed the semantics of Web documents with Google's V8 engine and parsed each document into a DOM tree. We then traversed the DOM tree and pruned it according to the characteristics of the language in which the Web page is written. Finally, we extracted the content information from the Web page. Theoretical analysis and experiments showed that the technique can simplify a Web page, present the content information to Web users, and supply clean data for application areas such as Web retrieval, knowledge discovery in databases (KDD), and data mining (DM). © 2011 Springer-Verlag Berlin Heidelberg.
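
The parse, traverse, prune, and extract steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give their pruning rules, and they report using Google's V8 engine, whereas this sketch uses Python's standard `html.parser` module instead. The names `NOISE_TAGS`, `Node`, `TreeBuilder`, `prune`, and `extract_content` are hypothetical, and the set of tags treated as non-content is an assumption chosen only for illustration.

```python
from html.parser import HTMLParser

# Assumption: tags treated as non-content. The paper's actual pruning
# criteria are not stated in the abstract.
NOISE_TAGS = {"script", "style", "nav", "header", "footer", "aside",
              "iframe", "object", "embed", "form"}
# HTML void elements never have children or a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """A simplified DOM node: a tag name plus child nodes or text strings."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM tree from HTML markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: add a leaf, do not descend.
        self.current.children.append(Node(tag, self.current))

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.current.children.append(text)


def prune(node):
    """Traverse the tree and drop every subtree rooted at a noise tag."""
    kept = []
    for child in node.children:
        if isinstance(child, str):
            kept.append(child)
        elif child.tag not in NOISE_TAGS:
            prune(child)
            kept.append(child)
    node.children = kept


def text_of(node):
    """Collect the remaining text in document order."""
    parts = [c if isinstance(c, str) else text_of(c) for c in node.children]
    return " ".join(p for p in parts if p)


def extract_content(html):
    """Parse, prune, and return the text left in the pruned tree."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    prune(builder.root)
    return text_of(builder.root)
```

Calling `extract_content` on a page's HTML string returns the text that survives pruning. A tag blacklist like this is the simplest possible pruning rule; a real system would likely also weigh features such as link density or text length per node.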

Keyword:

Semantics; Information retrieval; XML; Websites

Author Community:

  • [ 1 ] [Yuan, Dingrong] College of Computer Science and Information Technology, Guangxi Normal University, Guilin 541004, China
  • [ 2 ] [Yuan, Dingrong] International WIC Institute, Beijing University of Technology, Beijing 100022, China
  • [ 3 ] [Mo, Zhuoying] College of Computer Science and Information Technology, Guangxi Normal University, Guilin 541004, China
  • [ 4 ] [Xie, Bing] College of Computer Science and Information Technology, Guangxi Normal University, Guilin 541004, China
  • [ 5 ] [Xie, Yangcai] College of Computer Science and Information Technology, Guangxi Normal University, Guilin 541004, China

Source :

ISSN: 1865-0929

Year: 2011

Issue: PART 2

Volume: 144 CCIS

Page: 271-278

Language: English

Cited Count:

WoS CC Cited Count: 0

SCOPUS Cited Count: 1

ESI Highly Cited Papers on the List: 0