Title: A Comparative Study on Unsupervised Feature Selection Methods for Text Clustering
Abstract: Text clustering is one of the central problems in text mining and information retrieval. Because of the high dimensionality of the feature space and the inherent data sparsity, the performance of clustering algorithms declines dramatically. Two techniques are used to deal with this problem: feature extraction and feature selection. Feature selection methods have been successfully applied to text categorization but seldom to text clustering, owing to the unavailability of class label information. In this paper, four unsupervised feature selection methods are introduced: DF, TC, TVQ, and a newly proposed method, TV. Experiments show that feature selection methods can improve the efficiency as well as the accuracy of text clustering. Three clustering validity criteria are studied and used to evaluate the clustering results.
Publication Year: 2006
Publication Date: 2006-03-21
Language: en
Type: article
Indexed In: ['crossref']
Access and Citation
Cited By Count: 115