Title: A Framework of a Hybrid Focused Web Crawler
Abstract: Because of the Web's complex structure, most approaches to focused crawling employ a local search algorithm, which searches only pages in a sub-graph of the Web. In addition, the multi-topic nature of Web pages makes it difficult to determine the relevance of a page to a given topic. To address these two issues, we present a new hybrid approach to focused crawling based on meta-search and the VIPS (VIsion-based Page Segmentation) algorithm. We use meta-search to achieve a wider crawling range than traditional local search algorithms. To obtain better recall and precision, we use a VIPS-based algorithm to compute the relevance of a Web page; it first partitions the page into a set of blocks that reflect the page's semantic structure. After a short review of related work, we discuss the system architecture of the hybrid focused crawler and then present the framework of the hybrid focused crawling approach.
Publication Year: 2008
Publication Date: 2008-12-01
Language: en
Type: article
Indexed In: Crossref
Access and Citation
Cited By Count: 10