Institute of Computer Science - Graduation Theses Registry

Improving Neural Machine Translation Models with Back-translation and Quality Estimation

Name

Raul Erik Kattai

Abstract

The best machine translation models are on par with human translators, as it is becoming increasingly difficult to tell their translations apart. To produce high-quality results, a translation model requires a large amount of training data, but the number of useful bilingual text corpora is limited. A synthetic bilingual corpus can be created by translating a monolingual corpus with a model. Because of its lower quality, however, the synthetic corpus can degrade the model and worsen its output. This bachelor’s thesis applies a quality estimation model to a synthetic parallel corpus to filter out unsuitable sentence pairs. The resulting dataset is used to fine-tune a machine translation model. The objective is to improve the model using monolingual data.
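
The filtering step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the threshold value, and the example sentences are all made up for illustration, and the length-ratio scorer is only a hypothetical stand-in for a real quality estimation model, which would be plugged in through the same `score` argument.

```python
from typing import Callable, Iterable, List, Tuple

# (original monolingual sentence, machine translation of it)
SentencePair = Tuple[str, str]


def filter_synthetic_pairs(
    pairs: Iterable[SentencePair],
    score: Callable[[str, str], float],
    threshold: float,
) -> List[SentencePair]:
    """Keep only the pairs whose estimated quality reaches the threshold."""
    return [(src, tgt) for src, tgt in pairs if score(src, tgt) >= threshold]


def toy_length_ratio_score(source: str, translation: str) -> float:
    """Hypothetical stand-in for a quality estimation model.

    Returns the ratio of the shorter to the longer sentence, counted in
    whitespace-separated tokens, so the score lies in [0, 1]. A real
    quality estimation model would replace this function.
    """
    src_len = len(source.split())
    tgt_len = len(translation.split())
    if src_len == 0 or tgt_len == 0:
        return 0.0
    return min(src_len, tgt_len) / max(src_len, tgt_len)


# Made-up synthetic corpus: one plausible pair and two clearly bad ones.
synthetic_corpus = [
    ("Tere hommikust kõigile", "Good morning to everyone"),
    ("See on hea raamat", "Book"),
    ("", "Empty source"),
]

# The threshold of 0.5 is an arbitrary illustrative value.
kept = filter_synthetic_pairs(synthetic_corpus, toy_length_ratio_score, threshold=0.5)
print(kept)
```

With this toy scorer only the first pair survives; the retained pairs would then form the dataset used for fine-tuning. Passing the scorer as a callable keeps the filtering logic independent of whichever quality estimation model is actually used.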

Graduation Thesis language

English

Graduation Thesis type

Bachelor - Computer Science

Supervisor(s)

Andre Tättar

Defence year

2021

PDF