Title: An Index Structure of Semi-Structure Data Set for Similarity Search
Abstract: A new index, called the CSS-tree, is proposed to organize and search large, dynamic, high-dimensional semi-structured data sets. The CSS-tree is a balanced multi-way tree that combines the strengths of the R-tree and SS-tree for large high-dimensional data sets with the strengths of the M-tree for metric data sets. This paper details the structure of the CSS-tree: each inner node is composed of a group of index elements, each holding the cover center and cover radius of a child subtree; all leaves are at the same level; and all data indices are stored in the leaves. The paper gives algorithms for CSS-tree-based similarity search, covering both range search and k-NN search, as well as dynamic update algorithms for the CSS-tree. It describes a simple split policy modeled on the CF-tree split policy of BIRCH, and reorganization algorithms that use clustering to keep similar index elements adjacent in the index tree and that avoid requiring feature values to be independent. It also describes how to keep the cover space and the overlap space minimal. Experiments on simulated data sets and on part of the Chinese Encyclopedia Database, an XML document set, show that the CSS-tree is close to the SS+-tree and the M-tree in tree construction, but outperforms both in similarity search on semi-structured data sets.
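To make the role of the cover center and cover radius concrete, the following is a minimal sketch of range search over a generic covering-sphere tree. It is not the paper's algorithm, which the abstract does not spell out. The names `Node` and `range_search`, the use of Euclidean distance, and the tiny hand-built tree are all assumptions made for illustration; any metric satisfying the triangle inequality could be substituted.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: a covering sphere over a subtree."""
    center: tuple                                   # cover center
    radius: float                                   # cover radius
    children: list = field(default_factory=list)    # inner node: child Nodes
    points: list = field(default_factory=list)      # leaf: data points


def range_search(node, query, r):
    """Return all points within distance r of query (Euclidean, assumed)."""
    # Triangle-inequality pruning: if the query ball cannot intersect the
    # node's covering sphere, nothing in this subtree can qualify.
    if math.dist(node.center, query) > node.radius + r:
        return []
    if not node.children:
        # Leaf: check each stored point directly.
        return [p for p in node.points if math.dist(p, query) <= r]
    hits = []
    for child in node.children:
        hits.extend(range_search(child, query, r))
    return hits


# Two leaves at the same level under one root, built by hand.
leaf_a = Node(center=(0.0, 0.0), radius=1.5, points=[(0.0, 0.0), (1.0, 1.0)])
leaf_b = Node(center=(10.0, 10.0), radius=1.0, points=[(10.0, 10.0), (10.5, 10.5)])
root = Node(center=(5.0, 5.0), radius=9.0, children=[leaf_a, leaf_b])

print(range_search(root, (0.5, 0.5), 1.0))  # [(0.0, 0.0), (1.0, 1.0)]
```

In this example `leaf_b` is skipped without examining its points, because its covering sphere lies farther from the query than its radius plus the search radius. This is the kind of pruning that cover centers and cover radii make possible.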
Publication Year: 2002
Publication Date: 2002-01-01
Language: en
Type: article