MatchLink: A focused crawling method

Author：

Jiang, Zong-Li (Scholars：蒋宗礼) | Lu, Guo-Xiang

Indexed by：

EI Scopus PKU CSCD

Abstract：

Finding what a user wants in the tremendous amount of information on the Web is a great challenge for web search engines. By restricting downloads to web pages in a given domain, focused crawlers can save a great deal of work and improve the quality of the information they provide. We put forward a focused crawling method, MatchLink. It uses the document vector model to evaluate the topic relevance of an anchor, and uses the Naive Bayes algorithm and a multilayer classification method to compute the topic relevance of the web page containing the anchor. Based on these two relevance scores, topic-relevant web pages are given priority for downloading. Experiments show that the results are better than those of BestFirst and BreadthFirst.

Keyword：

Algorithms Search engines Websites Classification (of information)

Author Community：

[ 1 ] [Jiang, Zong-Li]College of Computer Science, Beijing University of Technology, Beijing 100022, China
[ 2 ] [Lu, Guo-Xiang]College of Computer Science, Beijing University of Technology, Beijing 100022, China

Related Keywords：

WebSifter: An assistant system for personal web search
2005，Journal of Tsinghua University
A novel two-step SVM classifier for voiced/unvoiced/silence classification of speech
2004，2004 International Symposium on Chinese Spoken Language Processing
An ontology search engine based on semantic analysis
2005，3rd International Conference on Information Technology and Applications, ICITA 2005
A semantic matching algorithm of web services based on Google distance
2011，Journal of Computational Information Systems

Source ：

Journal of Beijing University of Technology

ISSN： 0254-0037

Year： 2007

Issue： 11

Volume： 33

Page： 1227-1232

ESI Highly Cited Papers on the List： 0

Affiliated Colleges：

School of Information Science and Technology (data not explicitly attributed to a department within this school)
