Learning the vector space of morphological transformations

Organization
Natural Language Processing
Abstract
The ability to generate inflected word forms (declined nouns, conjugated verbs) is important for many natural language processing systems, especially for morphologically complex languages such as Estonian.

The goal of this project is to learn the vector space of morphological transformations using TransE (https://papers.nips.cc/paper/5071-translating-embeddings-for-modeling-multi-relational-data.pdf), a well-known model for the relation prediction task. Relation prediction systems learn from fact triples (head entity, relation, tail entity) by embedding all entities and relations in a low-dimensional dense vector space. TransE treats each relation as a translation in that space, so that the head embedding plus the relation embedding lies close to the tail embedding. In this project, the same method will be applied to morphological triples (base form, morphological transformation, inflected/conjugated form).
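To make the triple idea concrete, the following is a minimal C++ sketch of how TransE scores a triple: the distance between head + relation and tail, combined with a margin-based ranking loss against a corrupted triple. The names transe_distance and margin_loss are illustrative and are not taken from the available C++ code. The L2 norm is assumed here; the TransE paper allows either L1 or L2.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// L2 distance ||h + r - t||. A lower value means the triple
// (base form h, transformation r, inflected form t) is more plausible.
// Assumption: all three embeddings share the same dimensionality.
double transe_distance(const Vec& h, const Vec& r, const Vec& t) {
    assert(h.size() == r.size() && r.size() == t.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < h.size(); ++i) {
        const double d = h[i] + r[i] - t[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Margin-based ranking loss for one observed triple and one corrupted
// triple: max(0, margin + d(observed) - d(corrupted)).
double margin_loss(double pos_dist, double neg_dist, double margin) {
    const double v = margin + pos_dist - neg_dist;
    return v > 0.0 ? v : 0.0;
}
```

During training, the loss is zero once the observed triple is closer than the corrupted one by at least the margin, so only violating pairs would contribute gradient updates.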

The project involves two steps:
1) Prepare training and test data from the morphologically annotated MULTEXT-East corpora.
2) Conduct experiments with the TransE model (C++ code available).
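The data preparation step could be sketched as below. This is only an illustration under stated assumptions: the input is assumed to be a tab-separated lexicon line of the form "word form, lemma, morphosyntactic tag", which should be checked against the actual MULTEXT-East files. MorphTriple and parse_lexicon_line are hypothetical names, and the sample line in the usage note is made up.

```cpp
#include <sstream>
#include <string>

// One training example: (base form, morphological transformation, inflected form).
struct MorphTriple {
    std::string base_form;
    std::string transformation;
    std::string inflected_form;
};

// Parses one line assumed to look like "wordform<TAB>lemma<TAB>tag".
// Returns false and leaves `out` untouched if any of the three fields
// is missing or empty.
bool parse_lexicon_line(const std::string& line, MorphTriple& out) {
    std::istringstream in(line);
    std::string form, lemma, tag;
    if (!std::getline(in, form, '\t') ||
        !std::getline(in, lemma, '\t') ||
        !std::getline(in, tag, '\t')) {
        return false;
    }
    if (form.empty() || lemma.empty() || tag.empty()) {
        return false;
    }
    // The lemma plays the role of the head entity, the tag the relation,
    // and the word form the tail entity.
    out = MorphTriple{lemma, tag, form};
    return true;
}
```

For example, calling parse_lexicon_line("majad\tmaja\tNcpn", triple) would fill triple with the base form "maja", the transformation "Ncpn" and the inflected form "majad". Splitting the resulting triples into training and test sets would be a separate step.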
Graduation Theses defence year
2016-2017
Supervisor
Kairit Sirts
Spoken language(s)
Estonian, English
Requirements for candidates
Familiarity with C++, interest in working with natural language data
Level
Bachelor's, Master's
Keywords

Application of contact

 
Name
Kairit Sirts
Phone
E-mail
kairit.sirts@ut.ee