Mapping Between Old and New Estonian Orthography Using Finite State Transducers

Name
Ida Maria Orula
Abstract
Automatic analysis of written sources of all kinds is now common. However, the necessary technologies are applicable only to words that follow the morphological rules of the modern language, so historical texts written in the old Estonian orthography must first be normalized. This problem may be approached from two angles. The first approach is to convert all old-orthography forms to their modern counterparts. This would make the texts easy to process for automated analysis tools and easy to read for people who are unfamiliar with the old Estonian orthography. However, valuable information about how the language has changed would be lost. The second approach is to adapt the current technologies so that they recognize the old word forms. In this thesis, both solutions are used. The author creates a new orthographic transducer that maps old word forms from the 1739 Bible translation to their modern forms. In addition, an existing morphological analyser of the Estonian language is modified so that it recognizes word forms written in the old Estonian orthography. The author also gives suggestions for future development of the created system.
Graduation Thesis language
Estonian
Graduation Thesis type
Bachelor - Computer Science
Supervisor(s)
Heiki-Jaan Kaalep
Defence year
2019
 