Title: Improving n-gram models by incorporating enhanced distributions
Abstract: Two methods of improving conventional n-gram statistical language models are examined. The first involves using a new set of n-gram statistics that attempt to improve the ability of a system to identify phrases correctly. The second involves replacing the maximum likelihood unigram component with an optimised distribution. We test these approaches by incorporating them into weighted average [1] and deleted estimate [2] language models trained on a large newspaper corpus. The improvements lead to a reduction in perplexity of 4.5% and 4.9% respectively for these models.
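The abstract describes improving n-gram models that mix a higher-order component with a unigram distribution. The paper's own statistics and optimised unigram are not reproduced here; as a minimal sketch of the kind of baseline being improved, the following shows a linearly interpolated bigram model and a perplexity computation. The interpolation weight `lam` and the toy corpus are illustrative assumptions, and plain linear interpolation is a simplification of the weighted-average and deleted-estimate schemes the abstract cites.

```python
import math
from collections import Counter

def train_counts(tokens):
    """Collect maximum-likelihood unigram and bigram counts."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def interp_prob(w_prev, w, unigrams, bigrams, total, lam=0.7):
    """P(w | w_prev) as a weighted mix of the ML bigram and unigram estimates.

    lam is an assumed, fixed interpolation weight; the cited models
    choose their weights far more carefully.
    """
    p_uni = unigrams[w] / total
    p_bi = bigrams[(w_prev, w)] / unigrams[w_prev] if unigrams[w_prev] else 0.0
    return lam * p_bi + (1.0 - lam) * p_uni

def perplexity(tokens, unigrams, bigrams, total, lam=0.7):
    """Per-word perplexity of the interpolated model over a token sequence."""
    log_prob = 0.0
    n = 0
    for w_prev, w in zip(tokens, tokens[1:]):
        log_prob += math.log(interp_prob(w_prev, w, unigrams, bigrams, total, lam))
        n += 1
    return math.exp(-log_prob / n)

# Toy demonstration (evaluating on the training data, purely for illustration)
tokens = "the cat sat on the mat the cat sat".split()
unigrams, bigrams = train_counts(tokens)
total = sum(unigrams.values())
pp = perplexity(tokens, unigrams, bigrams, total)
```

A percentage reduction in perplexity, as reported in the abstract, would be computed by comparing `pp` for a baseline model against the same quantity for the modified model on held-out text.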
Publication Year: 2002
Publication Date: 2002-12-24
Language: en
Type: article
Indexed In: ['crossref']
Access and Citation
Cited By Count: 2