The Development of Estonian Texts' Summarizer EstSum
Name
Janar Saks
Abstract
With today's vast quantity of information, there is often a need for a quick overview of what is important. A summary, as a shortened overview of the source material, can therefore be an important means of gathering information. Like any other language technology application, however, summarizers depend on the peculiarities of the language they are designed for. A summarizer created for English cannot be applied directly to Estonian, because the multitude of word forms typical of Estonian requires a completely different approach.
Although the Estonian text summarizer EstSum uses a keyword-based score to calculate a sentence's weight, the score is calculated solely from word forms, not from word stems. Adding a linguistic module that can analyze word stems increased the evaluation score compared to the regular EstSum.
Furthermore, the newly developed version of the summarizer is capable of extracting more of the important information from the source than the old version of EstSum.
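The difference between scoring by word form and scoring by stem can be illustrated with a minimal sketch. It is not EstSum's actual implementation: the frequency-based weighting, the function names, and the tiny `TOY_LEMMAS` lookup table (a stand-in for a real morphological analyzer) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical lookup table standing in for a real Estonian morphological analyzer.
TOY_LEMMAS = {"raamatud": "raamat", "raamatuid": "raamat", "raamatus": "raamat"}


def toy_lemma(word):
    """Map an inflected form to its base form, if the toy table knows it."""
    return TOY_LEMMAS.get(word, word)


def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"\w+", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentences(text, normalize=lambda w: w):
    """Weight each sentence by the mean document frequency of its words.

    `normalize` is applied to every token before counting, so passing a
    stemmer or lemmatizer merges different forms of the same word.
    """
    sentences = split_sentences(text)
    freq = Counter(normalize(w) for w in tokenize(text))
    scores = []
    for sentence in sentences:
        words = [normalize(w) for w in tokenize(sentence)]
        scores.append(sum(freq[w] for w in words) / len(words) if words else 0.0)
    return sentences, scores


if __name__ == "__main__":
    text = "Raamat on laual. Raamatud on huvitavad. Poes müüakse raamatuid."
    for label, fn in (("word forms", lambda w: w), ("stems", toy_lemma)):
        _, scores = score_sentences(text, fn)
        print(label, [round(s, 2) for s in scores])
```

When counting raw word forms, "raamat", "raamatud" and "raamatuid" are treated as three unrelated keywords, so none of them stands out. Once they are mapped to a common base form, every sentence mentioning the keyword receives a higher weight, which is the kind of effect a stem-aware linguistic module is meant to capture.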
Graduation Thesis language
Estonian
Graduation Thesis type
Bachelor - Computer Science
Supervisor(s)
Kaili Müürisep
Defence year
2018