Title: Novel term weighting schemes for document representation based on ranking of terms and Fuzzy logic with semantic relationship of terms
Abstract: Weighting and normalization are the most important factors that can significantly affect text representation. This paper presents two novel term weighting schemes for representing text documents: (i) a term weighting scheme based on Term Frequency - Ranking of Term Frequency (TF-RTF), and (ii) a term weighting scheme based on Term Frequency - Ranking of fuzzy logic with semantic relationship of terms (TF-RFST). In TF-RTF, the rank of each term within a document gives that term's priority in the document, and these priorities are used for document representation. In TF-RFST, each term is represented by its own frequency together with the frequency of its semantically related terms. Hence, the ranking of each term is based on the combined frequencies of the term and its semantically related terms under a specific weighting scheme. With the TF-RTF and TF-RFST weighting schemes, the proposed methods provide better clustering performance in terms of accuracy, entropy, recall and F-Measure than previously suggested methods such as word count, Term Frequency-Inverse Document Frequency (TF-IDF), Term Frequency-Inverse Corpus Frequency (TF-ICF), Multi Aspect TF (MATF), BM25 and BM25F. Experiments carried out on the Reuters-8, Reuters-52 and WebKB data sets with the K-means and K-means++ clustering algorithms demonstrate the effectiveness of the proposed term weighting schemes.
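The abstract describes TF-RTF only at a high level (term frequency combined with a rank-derived priority) and does not give the formula. The sketch below is one possible reading, not the authors' definition. The function name `rank_weighted_tf`, the dense ranking with shared ranks for ties, and the linear priority `(R - rank + 1) / R` are all assumptions made for illustration.

```python
from collections import Counter


def rank_weighted_tf(tokens):
    """Illustrative rank-weighted term frequency for one document.

    ASSUMPTION: the paper's actual TF-RTF formula is not stated in the
    abstract. Here each term's weight is tf * priority, where priority
    falls linearly from 1 (most frequent terms) to 1/R (least frequent),
    and R is the number of distinct frequency values in the document.
    """
    tf = Counter(tokens)
    if not tf:
        return {}

    # Dense ranking: terms with equal frequency share the same rank,
    # and rank 1 goes to the highest frequency.
    distinct_freqs = sorted(set(tf.values()), reverse=True)
    rank_of = {freq: i + 1 for i, freq in enumerate(distinct_freqs)}
    n_ranks = len(distinct_freqs)

    return {
        term: freq * (n_ranks - rank_of[freq] + 1) / n_ranks
        for term, freq in tf.items()
    }


if __name__ == "__main__":
    doc = ["market", "market", "market", "price", "price", "oil"]
    for term, weight in sorted(rank_weighted_tf(doc).items()):
        print(term, round(weight, 3))
```

Under these assumptions, a more frequent term is boosted twice, once by its raw count and once by its higher priority, which is the general effect the abstract attributes to ranking. A TF-RFST-style variant would first add the counts of semantically related terms to each term's frequency before ranking; the abstract does not specify how relatedness or the fuzzy-logic step is computed, so that part is left out here.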
Publication Year: 2019
Publication Date: 2019-07-11
Language: en
Type: article
Indexed In: crossref
Access and Citation
Cited By Count: 19