Title: Refining Word Alignment with Discriminative Training
Abstract: The quality of statistical machine translation systems depends on the quality of the word alignments that are computed during the translation model training phase. IBM alignment models, as implemented in the GIZA++ toolkit, constitute the de facto standard for performing these computations. The resulting alignments and translation models are however very noisy, and several authors have tried to improve them. In this work, we propose a simple and effective approach, which considers alignment as a series of independent binary classification problems in the alignment matrix. Through extensive feature engineering and the use of stacking techniques, we were able to obtain alignments much closer to manually defined references than those obtained by the IBM models. These alignments also yield better translation models, delivering improved performance in a large scale Arabic to English translation task.
Publication Year: 2010
Publication Date: 2010-01-01
Language: en
Type: article
Access and Citation
Cited By Count: 6
AI Researcher Chatbot
Get quick answers to your questions about the article from our AI researcher chatbot