Title: An Extractive Malayalam Document Summarization Based on Graph Theoretic Approach
Abstract: Text summarization condenses a large amount of information into a concise form by selecting important information and discarding unimportant and redundant information. The need for text summarization has grown with the abundance of documents on the internet. Although many text summarization systems have been developed for documents in various languages, no well-performing system exists for Malayalam. In this paper, we propose a graph-theoretic approach to summarizing Malayalam documents, motivated by the method of theme identification. After the common preprocessing steps of stop word removal and stemming, each sentence in the documents is represented as a node in an undirected graph. Two sentences are connected by an edge if they share common words, that is, if their similarity (cosine or a comparable measure) is above some threshold. This representation yields two results. First, the partitions of the graph (the sub-graphs that are not connected to any other sub-graph) form the distinct topics covered in the documents. Second, the graph identifies the important sentences in the document. We apply the graph-theoretic approach to the Malayalam text summarization task and achieve results comparable to the state of the art.
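The graph construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the sentences are already tokenized, stop-word filtered, and stemmed. It uses raw term-count cosine similarity and a hypothetical threshold value. It also ranks sentences by node degree, which is one common way to read "important sentences" from such a graph and is not stated in the abstract. All function names are illustrative.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_graph(sentences, threshold=0.5):
    """Undirected sentence graph: one node per sentence, and an edge when
    similarity exceeds the threshold (the threshold value is an assumption)."""
    bags = [Counter(tokens) for tokens in sentences]
    adj = {i: set() for i in range(len(bags))}
    for i in range(len(bags)):
        for j in range(i + 1, len(bags)):
            if cosine(bags[i], bags[j]) > threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def components(adj):
    """Connected components of the graph, read here as distinct topics."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, part = [start], []
        while stack:
            node = stack.pop()
            part.append(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        parts.append(sorted(part))
    return parts


def summarize(adj, k):
    """Pick the k highest-degree sentences (assumed importance measure),
    returned in original document order."""
    ranked = sorted(adj, key=lambda i: (-len(adj[i]), i))
    return sorted(ranked[:k])
```

Calling `build_graph` on a list of token lists, then `components` and `summarize` on the result, gives the topic partitions and a degree-ranked selection of sentence indices. A real system would substitute Malayalam-specific stop word removal and stemming before this step.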
Publication Year: 2015
Publication Date: 2015-10-01
Language: en
Type: article
Indexed In: ['crossref']
Access and Citation
Cited By Count: 4