Title: Research and Improvement of TFIDF Text Feature Weighting Method
Abstract: Keyword extraction plays an important role in text classification and information retrieval. This paper first analyses a shortcoming of the original TFIDF algorithm: the IDF (Inverse Document Frequency) component does not consider how a feature term is distributed across categories. As a result, some low-frequency terms receive high IDF weights and some high-frequency terms receive low IDF weights, which reduces the precision of keyword extraction. To address these problems, we add a new weight, DI (Distribution Information), and obtain a new DI-TFIDF algorithm. The experimental corpus was downloaded from the Sogou corpus; we selected 1000 articles drawn from sports, education and military documents and ran both the traditional TFIDF method and the DI-TFIDF method on them. Experimental results show that the proposed DI-TFIDF method extracts keywords with higher accuracy than the traditional TFIDF algorithm.
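The shortcoming named in the abstract can be illustrated with a minimal sketch of classic TF-IDF. The abstract does not give the DI formula, so this example covers only the traditional weighting; the toy corpora, the category labels, and the helper names `idf` and `tfidf` are assumptions made for illustration. The term "goal" appears in two of four documents in both corpora, once concentrated in a single category and once spread across two, and the classic IDF cannot tell the two cases apart.

```python
import math
from collections import Counter

def idf(term, docs):
    """Classic IDF: log(N / df), where df is the number of documents containing the term.

    The category label of each document is ignored on purpose: that is the
    limitation the abstract describes.
    """
    df = sum(1 for tokens, _label in docs if term in tokens)
    return math.log(len(docs) / df) if df else 0.0

def tfidf(term, tokens, docs):
    """Relative term frequency in one document times the corpus-level IDF."""
    tf = Counter(tokens)[term] / len(tokens)
    return tf * idf(term, docs)

# Hypothetical toy corpora: each entry is (tokens, category).
# "goal" occurs only in sports documents here.
concentrated = [
    (["goal", "match"], "sports"),
    (["goal", "team"], "sports"),
    (["exam", "school"], "education"),
    (["army", "tank"], "military"),
]
# "goal" occurs in one sports and one education document here.
spread = [
    (["goal", "match"], "sports"),
    (["team", "coach"], "sports"),
    (["goal", "school"], "education"),
    (["army", "tank"], "military"),
]

idf_concentrated = idf("goal", concentrated)
idf_spread = idf("goal", spread)
weight = tfidf("goal", ["goal", "match"], concentrated)

# Same document frequency, so the same IDF, regardless of category distribution.
print(idf_concentrated == idf_spread)
```

A category-aware factor such as the paper's DI would have to separate these two cases; how it does so is not stated in the abstract and is not reproduced here.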
Publication Year: 2014
Publication Date: 2014-01-01
Language: en
Type: article
Access and Citation
Cited By Count: 1