Title: The Evolution of Link-Attributes for Pages and Its Implications on Web Crawling
Abstract:It is important for an incremental crawler to know how web pages evolve and the relation between their changing frequencies and the link-attributes such as indegrees. This paper proposes a model for i...It is important for an incremental crawler to know how web pages evolve and the relation between their changing frequencies and the link-attributes such as indegrees. This paper proposes a model for incremental crawling and performs an experiment to verify the correlation between them, by monitoring the evolution of all the link-attributes of the web pages within one website. Particularly, we look deeply into one special kind of page named Index-pages. From the experiment, we can make four conclusions: (1) Pages which have bigger indegrees, outdegrees or PageRank values change more often, and these link-attributes all approximately obey a power-law distribution. (2) The link-attributes of pages seldom change though the pages change themselves. (3) A small proportion of the pages link to most of the vertexes in the web graph. (4) The Index-pages link to sizeable new pages in a website. These conclusions can be used to greatly enhance the performance of an incremental crawler, which is the foremost component for general search engines and web information stores.Read More
Publication Year: 2005
Publication Date: 2005-03-31
Language: en
Type: article
Indexed In: ['crossref']
Access and Citation
Cited By Count: 4
AI Researcher Chatbot
Get quick answers to your questions about the article from our AI researcher chatbot