Title: The Keyword Extraction of Chinese Medical Web Page Based on WF-TF-IDF Algorithm
Abstract: Web page keyword extraction is widely used in web text classification, text clustering, and information retrieval. However, keyword extraction for Chinese web pages still needs to be improved and applied more widely, especially in the medical field. This paper proposes an improved TF-IDF algorithm, WF-TF-IDF, to extract keywords from Chinese medical web pages. The WF-TF-IDF algorithm considers three factors: word frequency in the title, word frequency in the description, and the distribution of words across categories in the corpus. We first preprocess the data, which includes web page denoising, regular expression processing, Chinese word segmentation, synonym replacement, and stop word filtering. We then extract keywords from the preprocessed text and filter out meaningless words among the extracted keywords according to their part of speech. The experimental results show that the WF-TF-IDF algorithm improves precision and recall by about 7% compared to the traditional TF-IDF algorithm.
Publication Year: 2017
Publication Date: 2017-10-01
Language: en
Type: article
Indexed In: crossref
Access and Citation
Cited By Count: 29
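
Illustrative sketch: the abstract names three factors (word frequency in the title, word frequency in the description, and word distribution across categories) but this record does not give the scoring formula. The Python sketch below shows one plausible way such factors could be folded into TF-IDF. Everything specific in it is an assumption made for illustration and is not taken from the paper: the function name `wf_tf_idf`, the boost parameters `title_w` and `desc_w`, the smoothed IDF, and the category factor. Inputs are assumed to be already segmented, synonym-normalized, and stop-word filtered token lists.

```python
import math
from collections import Counter


def wf_tf_idf(body, title, description, corpus, title_w=3.0, desc_w=2.0, top_k=5):
    """Score candidate keywords for one page (hypothetical formula).

    body, title, description: token lists for the page being scored.
    corpus: list of (category, tokens) pairs covering all pages.
    title_w, desc_w: assumed boosts for title and description hits.
    """
    n_docs = len(corpus)
    n_cats = len({cat for cat, _ in corpus})

    # Document frequency and the set of categories each word appears in.
    df = Counter()
    word_cats = {}
    for cat, tokens in corpus:
        for word in set(tokens):
            df[word] += 1
            word_cats.setdefault(word, set()).add(cat)

    # Weighted frequency: body count plus boosted title/description counts.
    weighted = Counter(body)
    for word, count in Counter(title).items():
        weighted[word] += title_w * count
    for word, count in Counter(description).items():
        weighted[word] += desc_w * count
    total = sum(weighted.values()) or 1.0

    scores = {}
    for word, wf in weighted.items():
        tf = wf / total
        # Smoothed IDF so that unseen words do not divide by zero.
        idf = math.log((n_docs + 1) / (df[word] + 1)) + 1.0
        # Assumed category factor: words concentrated in fewer
        # categories are treated as more discriminative.
        spread = max(1, len(word_cats.get(word, ())))
        cat_factor = math.log(1.0 + n_cats / spread)
        scores[word] = tf * idf * cat_factor

    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]
```

In this sketch a word that appears in the title or description gains weight even if it is rare in the body, and a word spread evenly over every category is scored lower than one confined to a single category. English placeholder tokens work as well as Chinese ones, since segmentation is assumed to have happened upstream.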