Title: Study on Feature Selection and Weighting Based on Synonym Merge in Text Categorization
Abstract: Feature selection and weighting is one of the key problems in text categorization. The chief obstacles to feature selection are noise and sparseness. This paper presents an approach to Chinese text feature selection and weighting based on semantic statistics. First, we use synonymous concepts to extract feature values from text, based on the thesaurus TongYiCi CiLin. Then, we introduce a new weight function based on term frequency and entropy, which adjusts the effect of a feature term in the classifier according to the term's strength. Experiments show that our method performs much better than several traditional feature selection methods and improves the performance of text categorization systems.
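The abstract names two steps, synonym merging and a weight built from term frequency and entropy, but does not give the formulas. The following is a minimal sketch of the general idea, not the authors' method. Everything in it is an assumption made for illustration: the toy `THESAURUS` mapping (a stand-in for TongYiCi CiLin concept codes), the function names, and the choice of `1 - H / log(K)` as the term "strength", where `H` is the entropy of a feature's distribution over `K` categories.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy thesaurus: word -> concept code.
# A real system would look words up in TongYiCi CiLin instead.
THESAURUS = {
    "car": "C01", "automobile": "C01",
    "doctor": "C02", "physician": "C02",
}

def merge_synonyms(tokens):
    """Map each token to its concept code; unknown words are kept as-is."""
    return [THESAURUS.get(t, t) for t in tokens]

def feature_strength(docs):
    """docs: list of (category, tokens) pairs.

    Assumed strength: 1 - H / log(K), where H is the entropy of the
    feature's frequency distribution over the K categories. A feature
    concentrated in one category scores 1; a uniform one scores 0.
    """
    per_feature = defaultdict(Counter)
    categories = set()
    for category, tokens in docs:
        categories.add(category)
        for feature in merge_synonyms(tokens):
            per_feature[feature][category] += 1
    max_entropy = math.log(len(categories)) if len(categories) > 1 else 1.0
    strength = {}
    for feature, counts in per_feature.items():
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
        strength[feature] = 1.0 - entropy / max_entropy
    return strength

def weight_document(tokens, strength):
    """Weight = term frequency (after merging) * assumed entropy-based strength."""
    tf = Counter(merge_synonyms(tokens))
    return {f: n * strength.get(f, 0.0) for f, n in tf.items()}
```

In this sketch, merging synonyms before counting reduces sparseness because "car" and "automobile" contribute to one feature, and the entropy factor suppresses noisy terms that occur evenly across categories.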
Publication Year: 2010
Publication Date: 2010-01-01
Language: en
Type: article
Indexed In: crossref
Access and Citation
Cited By Count: 8