Title: RoadRunner: Towards Automatic Data Extraction from Large Web Sites
Abstract: The paper investigates techniques for extracting data from HTML sites through the use of automatically generated wrappers. To automate the wrapper generation and the data extraction process, the paper develops a novel technique to compare HTML pages and generate a wrapper based on their similarities and differences. Experimental results on real-life data-intensive Web sites confirm the feasibility of the approach.
Publication Year: 2001
Publication Date: 2001-09-11
Language: en
Type: article
Access and Citation
Cited By Count: 951