Automatic Detection of Morphological Inflection Types

Name
Sander Saska
Abstract
The Estonian language is constantly evolving as new words are coined in various ways. Speakers often know intuitively how to inflect new words; in linguistics, this intuition is formalized as inflection types. This thesis investigates how to automate the identification of inflection types. To this end, two LSTM-based models were built to detect and predict inflection types. The input data for the models come from Vabamorf's morphological lexicon, which contains almost 74,000 lemmas. All possible word forms are synthesized for each lemma, and the result is converted into a form suitable for the LSTM-based models. One model, trained on words alone, reaches an accuracy of 95.8%; the other, trained on words and their parts of speech, reaches 97.8%.
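The abstract does not say how the synthesized word forms are converted into model input. As a rough illustration only, one common way to prepare words for an LSTM is to map each character to an integer ID and pad every sequence to a fixed length. The function names, the padding and unknown-character IDs, and the sample words below are assumptions made for this sketch, not details taken from the thesis.

```python
# Illustrative sketch: character-level encoding of word forms for a
# sequence model. This is NOT the thesis's documented preprocessing;
# all names and constants here are hypothetical.

PAD = 0  # assumed ID used to pad short words
UNK = 1  # assumed ID for characters not seen when building the vocabulary


def build_vocab(words):
    """Assign an integer ID (starting at 2) to every character in `words`."""
    chars = sorted({ch for word in words for ch in word})
    return {ch: idx + 2 for idx, ch in enumerate(chars)}


def encode(word, vocab, max_len):
    """Turn a word into a fixed-length list of character IDs."""
    ids = [vocab.get(ch, UNK) for ch in word[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# Sample Estonian word forms chosen only for demonstration.
words = ["maja", "majad", "tuba"]
vocab = build_vocab(words)
encoded = [encode(word, vocab, max_len=6) for word in words]
```

For a model that also uses parts of speech, a part-of-speech tag could be supplied as an additional input alongside the character sequence; how the thesis combines the two is not stated in the abstract.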
Graduation Thesis language
Estonian
Graduation Thesis type
Bachelor - Computer Science
Supervisor(s)
Siim Orasmaa
Defence year
2024
 