Institute of Computer Science - Graduation Theses Registry

Training the Best Neural Machine Translation Model for the Estonian-English Language Pair

Name

Kristiina Kuningas

Abstract

Many neural machine translation models have been developed that produce high-quality translations in many language directions, including Estonian-English. However, the models trained on this language pair are mostly multilingual or already outdated and need improvement. This bachelor’s thesis presents a bilingual approach that uses recent, effective technologies and the most current data available to improve on the previous best result for the Estonian-English language pair. It introduces a state-of-the-art bilingual neural machine translation system that outperforms the previous best result achieved for Estonian-English. The system combines several methods to achieve this goal: it trains baseline models on parallel data, generates additional data from available monolingual data using back-translation, combines the synthetic data with the initial parallel corpus, trains a new model on the augmented corpus, and, in the final step, uses ensembles of the already trained models.
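The data flow described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: every function name here is hypothetical, the reverse-direction model is replaced by a toy stand-in, and the ensemble step is reduced to averaging next-token probabilities, which is one common way to ensemble translation models and only an assumption about what the thesis does. The sketch also assumes monolingual text in the target language, with a reverse-direction model producing the synthetic source side.

```python
from typing import Callable, Dict, List, Tuple

SentencePair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(
    target_monolingual: List[str],
    reverse_translate: Callable[[str], str],
) -> List[SentencePair]:
    """Pair each genuine target sentence with a machine-generated source sentence."""
    return [(reverse_translate(sentence), sentence) for sentence in target_monolingual]


def augment(
    parallel: List[SentencePair],
    synthetic: List[SentencePair],
) -> List[SentencePair]:
    """Concatenate the genuine parallel corpus with the synthetic pairs."""
    return list(parallel) + list(synthetic)


def ensemble_next_token(distributions: List[Dict[str, float]]) -> str:
    """Average next-token probabilities from several models and return the most likely token."""
    if not distributions:
        raise ValueError("at least one model distribution is required")
    vocabulary = set().union(*distributions)
    averaged = {
        token: sum(d.get(token, 0.0) for d in distributions) / len(distributions)
        for token in vocabulary
    }
    return max(averaged, key=averaged.get)


def toy_reverse_translate(sentence: str) -> str:
    """Hypothetical stand-in for a trained reverse-direction model."""
    return "<synthetic> " + sentence


# Toy data, for illustration only.
parallel_corpus = [("Tere", "Hello")]
synthetic_pairs = back_translate(["Good morning"], toy_reverse_translate)
augmented_corpus = augment(parallel_corpus, synthetic_pairs)

# Two hypothetical models disagree; the averaged distribution decides.
chosen = ensemble_next_token([{"a": 0.6, "b": 0.4}, {"a": 0.1, "b": 0.9}])
```

In a real system the stand-in function would be a trained model, and the augmented corpus would be used to train the new model whose outputs are then ensembled.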

Graduation Thesis language

English

Graduation Thesis type

Bachelor - Computer Science

Supervisor(s)

Andre Tättar

Defence year

2021
