Improving Neural Machine Translation Models with Back-translation and Quality Estimation

Name
Raul Erik Kattai
Abstract
The best machine translation models are now on par with human translators, and it is becoming increasingly difficult to tell their translations apart. To produce high-quality results, a translation model requires a large amount of training data. However, the number of useful bilingual text corpora is limited. A synthetic bilingual corpus can be created by translating a monolingual corpus with a model. Because of its lower quality, however, the synthetic corpus can degrade the model and worsen its output. This bachelor’s thesis applies a quality estimation model to a synthetic parallel corpus to filter out unsuitable sentence pairs. The resulting dataset is used to fine-tune a machine translation model. The objective is to improve the model using monolingual data.
Graduation Thesis language
English
Graduation Thesis type
Bachelor - Computer Science
Supervisor(s)
Andre Tättar
Defence year
2021
 