Title: Improved focused crawling using Bayesian object based approach
Abstract: The rapid growth of the World Wide Web has made it difficult for general-purpose search engines, e.g. Google and Yahoo, to retrieve most of the relevant results in response to user queries. Vertical search engines specialized in a specific topic have therefore become vital. A vertical search engine is built with the help of a focused crawler, which traverses the Web, selecting pages relevant to a predefined topic and ignoring those that are off-topic. The focused crawler is guided toward relevant pages by a crawling strategy. In this paper, a new crawling strategy is presented that helps in building a vertical search engine. With this strategy, the crawler stays focused on the user's interests within the topic. We build a model that describes the features of Web pages that distinguish relevant Web documents from irrelevant ones. This is accomplished through a supervised learning process: the Web page is treated as an object with a set of features, and the values of those features determine the relevancy of the page through a Bayesian model. Results from practical experiments demonstrated the efficiency of the proposed crawling strategy.
Publication Year: 2008
Publication Date: 2008-03-01
Language: en
Type: article
Indexed In: ['crossref']
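Illustrative sketch: the abstract describes treating each Web page as an object whose feature values determine relevancy through a Bayesian model learned from labeled examples. The record gives neither the paper's feature set nor its exact model, so the following is only a minimal sketch of that general idea, assuming a Bernoulli naive Bayes classifier with Laplace smoothing. The feature names (`topic_in_title`, `topic_in_anchor`, `parent_relevant`), the class name, and the toy training data are all hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


class NaiveBayesRelevance:
    """Bernoulli naive Bayes over binary page features (assumed model, not the paper's)."""

    def __init__(self, feature_names):
        self.feature_names = list(feature_names)
        self.class_counts = {True: 0, False: 0}
        # feature_counts[label][name] = labeled pages of that class with the feature set
        self.feature_counts = {True: defaultdict(int), False: defaultdict(int)}

    def train(self, labeled_pages):
        """labeled_pages: iterable of (features_dict, is_relevant) pairs."""
        for features, relevant in labeled_pages:
            self.class_counts[relevant] += 1
            for name in self.feature_names:
                if features.get(name, False):
                    self.feature_counts[relevant][name] += 1

    def _log_joint(self, features, label):
        total = sum(self.class_counts.values())
        n = self.class_counts[label]
        # Laplace-smoothed class prior
        log_p = math.log((n + 1) / (total + 2))
        for name in self.feature_names:
            # Laplace-smoothed probability that the feature is set given the class
            p_on = (self.feature_counts[label][name] + 1) / (n + 2)
            log_p += math.log(p_on if features.get(name, False) else 1.0 - p_on)
        return log_p

    def posterior_relevant(self, features):
        """Return P(relevant | features) via Bayes' rule, normalized in log space."""
        log_rel = self._log_joint(features, True)
        log_irr = self._log_joint(features, False)
        m = max(log_rel, log_irr)
        rel = math.exp(log_rel - m)
        irr = math.exp(log_irr - m)
        return rel / (rel + irr)


# Hypothetical features and toy labels, for illustration only.
FEATURES = ["topic_in_title", "topic_in_anchor", "parent_relevant"]
TRAINING = [
    ({"topic_in_title": True, "topic_in_anchor": True, "parent_relevant": True}, True),
    ({"topic_in_title": True, "topic_in_anchor": False, "parent_relevant": True}, True),
    ({"topic_in_title": False, "topic_in_anchor": True, "parent_relevant": True}, True),
    ({"topic_in_title": False, "topic_in_anchor": False, "parent_relevant": False}, False),
    ({"topic_in_title": False, "topic_in_anchor": True, "parent_relevant": False}, False),
    ({"topic_in_title": False, "topic_in_anchor": False, "parent_relevant": True}, False),
]

model = NaiveBayesRelevance(FEATURES)
model.train(TRAINING)

# Score two candidate pages; a crawler could visit higher-scoring pages first.
on_topic = model.posterior_relevant({name: True for name in FEATURES})
off_topic = model.posterior_relevant({name: False for name in FEATURES})
```

In a focused crawler, a score like this could be used to order the frontier so that links judged more likely to be relevant are fetched first; whether the paper uses the posterior in exactly this way is not stated in the abstract.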
Access and Citation
Cited By Count: 5