Title: Word sense disambiguation in Bengali: An unsupervised approach
Abstract: This work addresses Word Sense Disambiguation (WSD) in Bengali using an unsupervised methodology. It consists of two sequential sub-tasks. The first groups Bengali sentences into a certain number of clusters, where each cluster contains sentences of similar meaning. The second labels each cluster with its underlying sense, with the help of a linguistic expert, so that the sense-tagged clusters can be used as a knowledge reference for the WSD task. Clustering was performed using the weka-3-6-13 tool. The test sentences were collected from the Bengali text corpus developed in the TDIL (Technology Development for Indian Languages) project of the Govt. of India. Type-based and Token-based distributional approaches were developed for Bengali sentence clustering. In the Type-based method, a feature vector of the words co-occurring with a target word in a sentence is considered; in the Token-based method, the synsets of the collocating words are also considered. These synsets are retrieved from the Bengali WordNet, developed at ISI, Kolkata. The baseline result, the achieved result, and the pitfalls of the procedure are discussed in detail in the report.
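The difference between the two feature sets described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract states that clustering was done in Weka, while the code below swaps in a simple greedy "leader" clustering over cosine similarity so that it stays self-contained. The English toy sentences, the `STOP` word list, the `SYNSETS` dictionary (a stand-in for Bengali WordNet lookups), the threshold value, and all function names are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt

# Hypothetical stop-word list; the abstract does not say how function words were handled.
STOP = {"the", "in", "of", "with", "was", "from", "he", "she", "over"}


def type_features(sentence, target):
    """Type-based view: bag of words co-occurring with the target word."""
    words = sentence.lower().split()
    return Counter(w for w in words if w != target and w not in STOP)


def token_features(sentence, target, synsets):
    """Token-based view: co-occurring words plus the members of their synsets."""
    feats = type_features(sentence, target)
    for word in list(feats):  # snapshot the keys, since feats grows in the loop
        for syn in synsets.get(word, ()):
            feats[syn] += 1
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def leader_cluster(vectors, threshold=0.2):
    """Greedy clustering (a stand-in for Weka): join the most similar cluster
    whose summed vector is at least `threshold` similar, else start a new one."""
    clusters = []  # each entry: (summed feature vector, list of sentence indices)
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster[0])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append((Counter(vec), [i]))
        else:
            best[0].update(vec)
            best[1].append(i)
    return [members for _, members in clusters]


# Toy English data for the ambiguous target word "bank" (illustrative only).
SENTENCES = [
    "he deposited money in the bank",
    "the river bank was flooded with water",
    "she withdrew cash from the bank",
    "water rose over the bank of the stream",
]
# Hypothetical synset table standing in for a WordNet lookup.
SYNSETS = {"money": ["cash"], "cash": ["money"]}

type_vecs = [type_features(s, "bank") for s in SENTENCES]
token_vecs = [token_features(s, "bank", SYNSETS) for s in SENTENCES]

print(leader_cluster(type_vecs))   # [[0], [1, 3], [2]]
print(leader_cluster(token_vecs))  # [[0, 2], [1, 3]]
```

In this toy data the two financial sentences share no surface words, so the Type-based vectors keep them apart, while the synset expansion gives them common features and lets them fall into one cluster. This shows only the mechanism and says nothing about the results reported in the article.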
Publication Year: 2017
Publication Date: 2017-02-01
Language: en
Type: article
Indexed In: crossref
Access and Citation
Cited By Count: 7