Title: Comparative analysis of approximate blocking techniques for entity resolution
Abstract: Entity Resolution is a core task for merging data collections. Due to its quadratic complexity, it typically scales to large volumes of data through blocking: similar entities are clustered into blocks, and pairwise comparisons are executed only between entities that co-occur in a block, at the cost of some missed matches. Numerous blocking methods exist, and the aim of this work is to offer a comprehensive empirical survey that extends the dimensions of comparison beyond what is commonly available in the literature. We consider 17 state-of-the-art blocking methods and use 6 popular real datasets to examine the robustness of their internal configurations and their relative balance between effectiveness and time efficiency. We also investigate their scalability over a corpus of 7 established synthetic datasets that range from 10,000 to 2 million entities.
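A minimal sketch of the blocking idea described in the abstract, assuming simple token blocking (every distinct token in an entity's text defines a block) as the illustrative scheme. It is not presented as any of the 17 methods' reference implementations; the function names `token_blocks` and `candidate_pairs` and the toy records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical illustration: token blocking as one simple example scheme,
# not the paper's code.


def token_blocks(entities: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase token to the set of entity ids whose text contains it."""
    blocks: dict[str, set[str]] = defaultdict(set)
    for eid, text in entities.items():
        for token in text.lower().split():
            blocks[token].add(eid)
    # Blocks holding a single entity yield no comparisons, so drop them.
    return {tok: ids for tok, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks: dict[str, set[str]]) -> set[tuple[str, str]]:
    """Collect distinct pairs of entities that co-occur in at least one block."""
    pairs: set[tuple[str, str]] = set()
    for ids in blocks.values():
        # Sorting gives each pair a canonical order, so duplicates across blocks collapse.
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs


if __name__ == "__main__":
    entities = {
        "e1": "John Smith London",
        "e2": "J. Smith London UK",
        "e3": "Maria Garcia Madrid",
        "e4": "Garcia Maria Madrid Spain",
        "e5": "Paolo Rossi Rome",
    }
    pairs = candidate_pairs(token_blocks(entities))
    n = len(entities)
    print(f"brute force: {n * (n - 1) // 2} comparisons")
    print(f"after blocking: {len(pairs)} comparisons -> {sorted(pairs)}")
```

On these toy records, blocking reduces the 10 brute-force comparisons to 2, namely `('e1', 'e2')` and `('e3', 'e4')`. A true duplicate that shares no token with its match would land in no common block and be missed, which is the trade-off the abstract refers to.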
Publication Year: 2016
Publication Date: 2016-05-01
Language: en
Type: article
Indexed In: ['crossref']
Cited By Count: 165