Title: Sentence segmentation using IBM word alignment model 1
Abstract: In statistical machine translation, word alignment models are trained on bilingual corpora. Long sentences pose severe problems: 1. the high computational requirements; 2. the poor quality of the resulting word alignment. We present a sentence-segmentation method that solves these problems by splitting long sentence pairs. Our approach uses the lexicon information to locate the optimal split point. This method is evaluated on two Chinese-English translation tasks in the news domain. We show that the segmentation of long sentences before training significantly improves the final translation quality of a state-of-the-art machine translation system. In one of the tasks, we achieve an improvement of the BLEU score of more than 20% relative.
Publication Year: 2005
Publication Date: 2005-01-01
Language: en
Type: article
Access and Citation
Cited By Count: 32
AI Researcher Chatbot
Get quick answers to your questions about the article from our AI researcher chatbot