Learning the vector space of morphological transformations

Organization name
Natural Language Processing
Summary
The ability to generate morphologically inflected (for nouns) or conjugated (for verbs) word forms is important for many natural language processing systems, especially in morphologically complex languages such as Estonian.

The goal of this project is to learn the vector space of morphological transformations with TransE (https://papers.nips.cc/paper/5071-translating-embeddings-for-modeling-multi-relational-data.pdf), a well-known model for the relation prediction task. Relation prediction systems learn from fact triples (head entity, relation, tail entity) by embedding all entities and relations in a low-dimensional dense vector space. In this project, the same method will be applied to morphological triples (base form, morphological transformation, inflected/conjugated form).
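As a rough illustration of the idea, TransE treats a relation as a translation in the vector space, so that for a true triple the head vector plus the relation vector should land close to the tail vector. The minimal sketch below scores a morphological triple this way and computes a margin-based ranking loss against a corrupted triple. The function names, the choice of L2 distance, the margin value, and the hand-picked 2-dimensional vectors are assumptions made for illustration; this is not the C++ implementation referred to in the project description.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def transe_distance(head, relation, tail):
    """Distance between (head + relation) and tail; lower means more plausible."""
    translated = [h + r for h, r in zip(head, relation)]
    return l2_distance(translated, tail)


def margin_loss(positive, negative, margin=1.0):
    """Margin ranking loss for one (true triple, corrupted triple) pair."""
    return max(0.0, margin + transe_distance(*positive) - transe_distance(*negative))


# Toy vectors chosen by hand (hypothetical, not learned embeddings).
base_form = [0.0, 0.0]
transformation = [1.0, 0.0]
correct_form = [1.0, 0.0]
wrong_form = [0.0, 1.0]

good = transe_distance(base_form, transformation, correct_form)
bad = transe_distance(base_form, transformation, wrong_form)
loss = margin_loss(
    (base_form, transformation, correct_form),
    (base_form, transformation, wrong_form),
)
print(good, bad, loss)
```

In training, the embeddings would be updated to reduce this loss over many such pairs. In the morphological setting, the tail of a triple could then be predicted by looking for the word form whose vector lies nearest to base form plus transformation.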

The project involves two steps:
1) Prepare the training and test data from the morphologically annotated MULTEXT-East corpora.
2) Conduct experiments with the TransE model (C++ code available).
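
The first step could look roughly like the sketch below, which turns annotated tokens into unique (base form, morphological transformation, inflected form) triples and splits them into training and test sets. It assumes the corpus has already been parsed into (word form, lemma, tag) tuples; the helper names, the 25% test fraction, and the tag strings are hypothetical placeholders rather than actual MULTEXT-East morphosyntactic descriptions.

```python
import random


def build_triples(annotated_tokens):
    """Collect unique (lemma, tag, word form) triples from annotated tokens."""
    triples = set()
    for word_form, lemma, tag in annotated_tokens:
        triples.add((lemma, tag, word_form))
    return sorted(triples)


def split_triples(triples, test_fraction=0.2, seed=0):
    """Shuffle deterministically and split into (train, test) lists."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Illustrative tokens; the tags are made-up placeholders.
tokens = [
    ("majas", "maja", "N.sg.ine"),
    ("majad", "maja", "N.pl.nom"),
    ("majas", "maja", "N.sg.ine"),  # repeated token, kept only once
    ("raamatud", "raamat", "N.pl.nom"),
    ("raamat", "raamat", "N.sg.nom"),
]

triples = build_triples(tokens)
train, test = split_triples(triples, test_fraction=0.25)
print(len(triples), len(train), len(test))
```

Deduplicating before the split keeps the same triple from appearing in both sets. Whether the test set should also hold out entire lemmas is a design decision this sketch leaves open.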
Thesis defense year
2016-2017
Supervisor
Kairit Sirts
Communication language(s)
Estonian, English
Requirements for the candidate
Familiarity with C++, interest in working with natural language data
Level
Bachelor's, Master's
Keywords

Application contact

 
Name
Kairit Sirts
Tel
E-mail
kairit.sirts@ut.ee