Adapting the BERT Model to Estonian Language

Name
Raul Niit
Abstract
Rapid progress in natural language models has made computers proficient users of human language, able to solve a range of language tasks such as translation, text classification, and text generation. The BERT language model, created by Google researchers in 2018, remains one of the most popular natural language models thanks to its powerful architecture and open-source framework. Language-specific BERT models have also been created, such as ESTBERT (2020), which is adapted specifically for tasks in Estonian. The aim of this master’s thesis is to modify the architecture of the BERT model so that it can use additional morphological information about the input, such as lemmas and forms. The thesis implements these modifications and analyzes the model's performance on four natural language tasks.
Graduation Thesis language
Estonian
Graduation Thesis type
Master - Data Science
Supervisor(s)
Sven Laur, Hendrik Šuvalov
Defence year
2023
 