Summary: Inverting an Estonian-English statistical machine translation model

Name
Indrek Klanberg
Abstract
This thesis covers statistical machine translation from both a theoretical and a practical perspective. Statistical machine translation is an approach in which the machine learns to translate without being given any knowledge of the grammar of the languages involved. It receives only a parallel corpus of millions of sentence pairs, where the only certain knowledge is that the two sentences in each pair are translations of each other. In the practical part of the work, we used the statistical machine translation framework Moses to create a new language model for the English-Estonian direction. In addition, an existing model for the opposite (Estonian-English) direction, which had been built from several corpora and combined with weights, was inverted. In total, one language model, two phrase models and two reordering models were created. The theoretical part describes the algorithms used inside the framework and its components in the practical part: specifically, a bidirectional phrase model with lexical weights, a trigram language model with recursive interpolation and Witten-Bell smoothing, and a bidirectional MSD (monotone, swap, discontinuous) reordering model. Finally, a test corpus of more than 1,000 sentences was translated using the created models, and the result was measured with the automatic evaluation metric BLEU. The output was also examined manually: simpler sentences often received good to almost perfect translations, but errors appeared in more complex sentences. The most common error was misunderstanding the context; others worth mentioning were wrong inflection and poor sentence structure. As a result of the work, an English-Estonian machine translation model was built, and we concluded that statistical machine translation is a promising approach for English-Estonian translation. The thesis is also intended as educational material for anyone wishing to enter the field of statistical machine translation for the English-Estonian direction.
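The abstract names a trigram language model with recursive interpolation and Witten-Bell smoothing. As an illustration only, the Python sketch below shows that technique under assumptions of our own: the class name `WittenBellLM`, the `<s>`/`</s>` padding and the uniform distribution at the bottom of the recursion are choices made here, not taken from the thesis or from the toolkit it used. Each order mixes its maximum-likelihood estimate with the next-lower order, giving the lower order more weight when many distinct words T(h) have been seen after the history h: P(w | h) = (c(h, w) + T(h) · P(w | h')) / (c(h) + T(h)), where h' is h with its oldest word dropped.

```python
from collections import Counter, defaultdict


class WittenBellLM:
    """Interpolated Witten-Bell n-gram model (illustrative sketch, hypothetical class)."""

    def __init__(self, sentences, order=3):
        self.order = order
        self.counts = Counter()            # counts of n-grams of every length 1..order
        self.context_counts = Counter()    # c(h): how often h occurs as a context
        self.followers = defaultdict(set)  # distinct words seen after each context h
        self.vocab = set()                 # predictable words (sentence words + </s>)
        for sent in sentences:
            # Assumption: pad with order-1 start symbols and one end symbol.
            tokens = ["<s>"] * (order - 1) + sent + ["</s>"]
            self.vocab.update(sent + ["</s>"])
            for i in range(order - 1, len(tokens)):
                w = tokens[i]
                for n in range(order):  # n = context length 0..order-1
                    h = tuple(tokens[i - n:i])
                    self.counts[h + (w,)] += 1
                    self.context_counts[h] += 1
                    self.followers[h].add(w)

    def prob(self, word, history):
        """P_WB(word | history), using at most order-1 words of history."""
        h = tuple(history[-(self.order - 1):]) if self.order > 1 else ()
        return self._prob(word, h)

    def _prob(self, word, h):
        if len(h) == 0:
            lower = 1.0 / len(self.vocab)  # assumed uniform base distribution
        else:
            lower = self._prob(word, h[1:])  # recurse on the shorter history
        c_h = self.context_counts[h]
        if c_h == 0:
            return lower  # unseen context: rely entirely on the lower order
        t_h = len(self.followers[h])  # T(h): number of distinct followers of h
        return (self.counts[h + (word,)] + t_h * lower) / (c_h + t_h)


if __name__ == "__main__":
    lm = WittenBellLM([["the", "cat", "sat"], ["the", "dog", "sat"]])
    print(lm.prob("cat", ["<s>", "the"]))
```

Because every order redistributes exactly the mass T(h) / (c(h) + T(h)) to the lower order, the probabilities over the vocabulary sum to one for any history, including histories never seen in training.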
Graduation Thesis language
Estonian
Graduation Thesis type
Bachelor - Computer Science
Supervisor(s)
Mark Fishel, Mare Koit
Defence year
2012