UT Institute of Computer Science - Graduation Theses Registry

Adapting the BERT Model to Estonian Language

Name

Raul Niit

Abstract

The rapid development of natural language models has made computers skilled users of human language, able to solve a range of language tasks such as translation, classification and text generation. The BERT language model, created by Google researchers in 2018, remains one of the most popular natural language models today thanks to its powerful architecture and open-source framework. Language-specific BERT models have also been created, such as EstBERT (2020), which is adapted specifically for tasks in the Estonian language. The aim of this master’s thesis is to change the architecture of the BERT model so that additional morphological information about the input, such as lemmas and forms, can be used by the model. In this work, the modifications to the model are implemented and the model’s performance is analyzed on four natural language tasks.

Graduation Thesis language

Estonian

Graduation Thesis type

Master - Data Science

Supervisor(s)

Sven Laur, Hendrik Šuvalov

Defence year

2023