Institute of Computer Science - Graduation Theses Topics Registry

Resource collection automation for Finno-Ugric languages and Estonian dialects

Organization

TartuNLP

Abstract

NLP applications such as ChatGPT and machine translation depend on text data. The aim of this thesis is to create scripts that retrieve texts from the most popular Facebook pages, and possibly other sources, in order to collect texts in Estonian dialects and languages (Võro, Seto, Mulgi, Kihnu, and others) as well as other Finno-Ugric languages (Livonian and others). The results will allow us to add these languages to Neurotõlge (translate.ut.ee) or improve their translation quality there, and will enable other NLP solutions.

Graduation thesis defence year

2023-2024

Supervisor

Mark Fishel

Spoken language(s)

Estonian, English

Requirements for candidates

Level

Bachelor's, Master's

Keywords

#smugri #estonianlanguages

Contact for applications

Name

Mark Fishel

Phone

E-mail

fishel@ut.ee
