Resource collection automation for Finno-Ugric languages and Estonian dialects

Organisation name
TartuNLP
Summary
NLP applications such as ChatGPT and machine translation depend on text data. The aim of this thesis is to create scripts that retrieve texts from the most popular Facebook pages, and possibly other sources, in order to collect texts in Estonian dialects and regional languages (Võro, Seto, Mulgi, Kihnu and others) as well as in other Finno-Ugric languages (Livonian and others). The collected texts will allow us to add these languages to Neurotõlge (translate.ut.ee) or improve its performance on them, and will enable other NLP solutions.
Thesis defence year
2023-2024
Supervisor
Mark Fishel
Communication language(s)
Estonian, English
Requirements for applicants
Level
Bachelor's, Master's
Keywords
#smugri #estonianlanguages

Application contact

 
Name
Mark Fishel
Phone
E-mail
fishel@ut.ee