Resource collection automation for Finno-Ugric languages and Estonian dialects

Organization
TartuNLP
Abstract
NLP applications such as ChatGPT and machine translation depend on text data. The goal of this thesis is to create scripts that retrieve texts from the most popular Facebook pages, and possibly from other sources, in order to collect texts in Estonian dialects and languages (Võro, Seto, Mulgi, Kihnu, and others) as well as in other Finno-Ugric languages (Livonian and others). The collected texts will make it possible to add these languages to Neurotõlge (translate.ut.ee), or to improve its performance on them, and will enable other NLP solutions.
Graduation thesis defence year
2023-2024
Supervisor
Mark Fishel
Spoken language(s)
Estonian, English
Requirements for candidates
Level
Bachelor's, Master's
Keywords
#smugri #estonianlanguages

Contact for applications
Name
Mark Fishel
Phone
E-mail
fishel@ut.ee