Title: Topic-specific Web Crawler using Probability Method
Abstract: The web has become an integral part of our lives, and search engines play an important role in helping users find online content on a specific topic. The web is a huge and highly dynamic environment that is growing exponentially in content and evolving quickly in structure. No search engine can cover the whole web, so each must focus its crawling on the most valuable pages. Many methods based on link and text content analysis have been developed for retrieving pages. A topic-specific web crawler collects the web pages relevant to the user's topics of interest. In this paper, we present an algorithm that analyzes links and text content using Levenshtein distance and a probability method to fetch a larger number of relevant pages for the topic specified by the user. Evaluation shows that the proposed web crawler collects the best pages for the user's interests during the early stages of crawling.
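The abstract does not spell out how Levenshtein distance is applied to links, so the following is only a minimal sketch of one plausible use: scoring each link's anchor text against the topic keyword and visiting the best-scoring links first. The function names (`similarity`, `link_score`, `rank_links`), the word-level matching, and the normalization by the longer string's length are assumptions made for illustration, not the paper's method, and the probability component is not modeled here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 for fully different."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def link_score(topic: str, anchor_text: str) -> float:
    """Hypothetical score: best similarity between the topic and any anchor word."""
    words = anchor_text.split()
    if not words:
        return 0.0
    return max(similarity(topic, word) for word in words)


def rank_links(topic: str, links: list[tuple[str, str]]) -> list[str]:
    """Return URLs ordered best-first by their anchor-text score."""
    ranked = sorted(links, key=lambda link: link_score(topic, link[1]), reverse=True)
    return [url for url, _ in ranked]


if __name__ == "__main__":
    # Example URLs and anchor texts are made up for illustration.
    candidates = [
        ("http://example.com/a", "cooking recipes"),
        ("http://example.com/b", "web crawlers"),
    ]
    for url in rank_links("crawler", candidates):
        print(url)
```

Ranking by a normalized similarity rather than the raw distance keeps scores comparable across words of different lengths. A real crawler would also need to combine this score with page-content evidence before choosing which link to fetch next.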