Resource collection automation for Finno-Ugric languages and Estonian dialects
NLP applications such as ChatGPT and machine translation depend on text data. The aim of this thesis is to create scripts that retrieve texts from the most popular Facebook pages, and possibly other sources, in order to collect texts in Estonian dialects and languages (Võro, Seto, Mulgi, Kihnu, and others) as well as in other Finno-Ugric languages (Livonian and others). The collected texts will allow us to add these languages to Neurotõlge (translate.ut.ee) or improve its performance on them, and will enable other NLP solutions.
Graduation Theses defence year
Spoken language(s)
Requirements for candidates
Application of contact