Recognition of Dialogue Acts in the Estonian Dialogue Corpus: Overview of Resources and Software Development

Name
Sven Aller
Abstract
The aim of the thesis was to describe the current state of the resources of the Estonian Dialogue Corpus and of the markup tools for dialogue acts, and to develop the semi-automatic dialogue act markup tool DAREC further. The thesis describes the structure of dialogue acts, the EdiT markup typology used in the Estonian Dialogue Corpus, and the strengths and weaknesses of manual and automatic markup tools. DAREC, a semi-automatic markup tool created by Mark Fishel in 2007, is based on a statistical method. Linguists' initial feedback on its markup results was quite positive. Testers were critical, however, of several aspects of the user interface: it was not user-friendly, the manual was poor, and some important functions were missing. Most of these weaknesses were eliminated on the basis of the users' feedback and the principles of good user interface design. Heuristic tests showed that the usability of DAREC had improved remarkably. The most highly rated features were its user-friendliness, design and contextual help. The tests also produced various ideas for making the system more effective. The thesis suggests several directions for developing DAREC further, for example increasing the precision and recall of recognition by improving the algorithm and enlarging the dialogue corpus, and adding more expert features.
Graduation Thesis language
Estonian
Graduation Thesis type
Master - Computer Science
Supervisor(s)
Mare Koit
Defence year
2012
 