Title: Predicting Categorial Sememe for English-Chinese Word Pairs via Representations in Explainable Sememe Space
Abstract: A sememe is the minimum unambiguous semantic unit in human language. Sememe knowledge bases (SKBs) have proven effective in many NLP tasks. The categorial sememe, which indicates the basic category of a word sense and bridges the lexicon and semantics, is indispensable in an SKB. However, manual categorial sememe annotation is costly. This paper proposes a new task for automatically building SKBs: English-Chinese Word Pair Categorial Sememe Prediction. Bilingual information is used to resolve ambiguity. Our method introduces a sememe space, in which sememes, words, and word senses are represented as vectors with interpretable semantics, to bridge the semantic gap between sememes and words. Extensive experiments and analyses validate the effectiveness of the proposed method. Using this method, we predict categorial sememes for 113,014 new word senses with a prediction MAP of 85.8%. We further conduct expert annotation based on the prediction results, increasing the size of HowNet by nearly 50%. We will publish all the data and code.
Publication Year: 2021
Publication Date: 2021-01-01
Language: en
Type: book-chapter
Indexed In: ['crossref']