Title: Modeling of term-distance and term-occurrence information for improving n-gram language model performance
Abstract: In this paper, we explore the use of distance and co-occurrence information of word-pairs for language modeling. We attempt to extract this information from history-contexts of up to ten words in size, and found it complements well the n-gram model, which inherently suffers from data scarcity in learning long history-contexts. Evaluated on the WSJ corpus, bigram and trigram model perplexity were reduced up to 23.5% and 14.0%, respectively. Compared to the distant bigram, we show that word-pairs can be more effectively modeled in terms of both distance and occurrence.
Publication Year: 2013
Publication Date: 2013-08-01
Language: en
Type: article
Access and Citation
Cited By Count: 5
AI Researcher Chatbot
Get quick answers to your questions about the article from our AI researcher chatbot