Title: RoadRunner: Towards Automatic Data Extraction from Large Web Sites
Abstract: The paper investigates techniques for extracting data from HTML sites through the use of automatically generated wrappers. To automate the wrapper generation and the data extraction process, the paper develops a novel technique to compare HTML pages and generate a wrapper based on their similarities and differences. Experimental results on real-life data-intensive Web sites confirm the feasibility of the approach.
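The abstract's core idea is that a wrapper can be derived from what pages of the same site share and where they differ. A deliberately simplified Python sketch of that intuition follows. It assumes two pages with identical tag structure. Text that is the same in both is kept as fixed template, and text that differs becomes a data field, marked `#PCDATA`. The tokenizer, the function names (`tokenize`, `infer_wrapper`, `extract`) and the sample pages are all hypothetical. This is not the paper's algorithm, which must also generalize structural differences such as optional or repeated sections. The sketch simply rejects those cases.

```python
import re
from typing import List

# Hypothetical tokenizer: a tag is "<...>", anything else up to the next "<" is text.
TOKEN_RE = re.compile(r"<[^>]+>|[^<]+")


def tokenize(html: str) -> List[str]:
    """Split HTML into tag tokens and whitespace-stripped, non-empty text tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(html):
        token = match.group().strip()
        if token:
            tokens.append(token)
    return tokens


def is_tag(token: str) -> bool:
    return token.startswith("<")


def infer_wrapper(page_a: str, page_b: str) -> List[str]:
    """Compare two pages token by token: shared tokens become template,
    differing text tokens become #PCDATA fields. Only string mismatches
    are generalized; any structural (tag) mismatch is rejected."""
    a, b = tokenize(page_a), tokenize(page_b)
    if len(a) != len(b):
        raise ValueError("structural mismatch: pages have different token counts")
    wrapper = []
    for ta, tb in zip(a, b):
        if ta == tb:
            wrapper.append(ta)           # similarity: part of the template
        elif not is_tag(ta) and not is_tag(tb):
            wrapper.append("#PCDATA")    # difference in text: a data field
        else:
            raise ValueError(f"tag mismatch: {ta!r} vs {tb!r}")
    return wrapper


def extract(wrapper: List[str], page: str) -> List[str]:
    """Apply a wrapper to a page and return the values of its #PCDATA fields."""
    tokens = tokenize(page)
    if len(tokens) != len(wrapper):
        raise ValueError("page does not match wrapper")
    values = []
    for expected, token in zip(wrapper, tokens):
        if expected == "#PCDATA":
            if is_tag(token):
                raise ValueError(f"expected text, found tag {token!r}")
            values.append(token)
        elif expected != token:
            raise ValueError(f"expected {expected!r}, found {token!r}")
    return values


if __name__ == "__main__":
    p1 = "<ul><li><b>Title:</b> Database Primer</li><li><b>Author:</b> J. Smith</li></ul>"
    p2 = "<ul><li><b>Title:</b> Web Data</li><li><b>Author:</b> A. Jones</li></ul>"
    p3 = "<ul><li><b>Title:</b> Data on the Web</li><li><b>Author:</b> K. Lee</li></ul>"
    wrapper = infer_wrapper(p1, p2)
    print(extract(wrapper, p3))  # -> ['Data on the Web', 'K. Lee']
```

Labels such as "Title:" appear in both sample pages and so stay in the template, while the titles and authors differ and become fields. A realistic wrapper generator would need to align pages whose structure differs rather than reject them.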
Publication Year: 2001
Publication Date: 2001-09-11
Language: en
Type: article
Cited By Count: 947