Abstract:
Finding what a user wants in the tremendous amount of information on the Web is a great challenge for web search engines. By restricting the pages they download to a given domain, focused crawlers can save a great deal of work and improve the quality of the information they provide. We put forward a focused crawling method, MatchLink. It uses a document vector model to evaluate the topic relevance of an anchor, and the Naive Bayes algorithm with a multilayer classification method to compute the topic relevance of the web page containing the anchor. Based on these two relevance scores, topic-relevant web pages are given priority for download. Experiments show that MatchLink performs better than BestFirst and BreadthFirst.
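The prioritization idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how the two relevance scores are combined, so the weighted sum, the `weight` parameter, and the names `cosine`, `link_priority`, and `Frontier` are all assumptions made for illustration. The page relevance, which the paper computes with Naive Bayes and multilayer classification, is simply passed in here as a number between 0 and 1.

```python
import heapq
import math
from collections import Counter


def cosine(text_a: str, text_b: str) -> float:
    """Cosine similarity between the term-frequency vectors of two texts."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def link_priority(anchor_text: str, topic: str,
                  page_relevance: float, weight: float = 0.5) -> float:
    """Combine anchor relevance and parent-page relevance.

    Assumption: a simple weighted sum; the paper may combine them differently.
    page_relevance stands in for the classifier output described in the abstract.
    """
    return weight * cosine(anchor_text, topic) + (1 - weight) * page_relevance


class Frontier:
    """Crawl frontier that returns the highest-scoring unseen URL first."""

    def __init__(self) -> None:
        self._heap = []      # entries are (-score, insertion order, url)
        self._seen = set()   # URLs that were already queued once
        self._count = 0      # tie-breaker so equal scores pop in FIFO order

    def push(self, url: str, score: float) -> None:
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the maximum.
        heapq.heappush(self._heap, (-score, self._count, url))
        self._count += 1

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

In this sketch a crawler would call `link_priority` for every anchor on a downloaded page, push the target URL with that score, and fetch whatever `pop` returns next. A breadth-first crawler, by contrast, would ignore the score and process URLs in insertion order.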
Source:
Journal of Beijing University of Technology
ISSN: 0254-0037
Year: 2007
Issue: 11
Volume: 33
Page: 1227-1232