Title: Splitting Long Input Sentences for Phrase-based Statistical Machine Translation
Abstract: Translation results suffer when a standard phrase-based statistical machine translation system is used for translating long sentences. The translation output will not have the same word order as the source. When a sentence is long, it should be partitioned into several clauses, and the word reordering during the translation done within these clauses, not between the clauses. In this paper, we propose splitting the long sentences using linguistic information, and translating the sentence piece by piece. In other words, we constrain the word reordering so that it can only be done within the pieces but not between the pieces. We then apply a language model to join the pieces back together in the original sequence in order to reduce disfluencies in the connection. By doing so, word order can be preserved and translation quality improved. Our experiments on the patent translation from Japanese to English are able to achieve better translations measured by both BLEU score and word error rate (WER).
Publication Year: 2011
Publication Date: 2011-01-01
Language: en
Type: article
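
Illustrative sketch (not from the paper): the split-translate-join idea in the abstract can be outlined roughly as below. Everything here is an assumption made for illustration. Punctuation-based splitting stands in for the linguistic information the authors use, `translate_piece` is a hypothetical placeholder for a phrase-based decoder, and the language-model step that smooths the joins is not modeled; pieces are simply concatenated in their original order.

```python
from typing import Callable, List, Sequence


def split_into_pieces(sentence: str, delimiters: Sequence[str] = (",", ";")) -> List[str]:
    """Split a whitespace-tokenized sentence after each delimiter token.

    Assumption: punctuation is a crude stand-in for the linguistic
    information the paper uses to find split points.
    """
    pieces: List[str] = []
    current: List[str] = []
    for token in sentence.split():
        current.append(token)
        if token[-1] in delimiters:
            # Close the current piece at the delimiter.
            pieces.append(" ".join(current))
            current = []
    if current:
        pieces.append(" ".join(current))
    return pieces


def translate_piecewise(sentence: str, translate_piece: Callable[[str], str]) -> str:
    """Translate each piece independently and join them in source order.

    Reordering can only happen inside `translate_piece`, so no word
    crosses a piece boundary. `translate_piece` is a hypothetical hook,
    not a real decoder API.
    """
    pieces = split_into_pieces(sentence)
    return " ".join(translate_piece(piece) for piece in pieces)


def toy_translate(piece: str) -> str:
    """Toy 'decoder' that reverses word order to mimic local reordering."""
    return " ".join(reversed(piece.split()))
```

Calling `translate_piecewise("a b, c d e", toy_translate)` reverses the words inside each of the two pieces while keeping the pieces themselves in source order, which is the constraint the abstract describes.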