Phrase Similarity Measures Based on Word Mover's Distance

Name
Hele-Andra Kuulmets
Abstract
Measuring semantic similarity between texts is necessary for successfully solving natural language document classification tasks. However, problems that could be solved using semantic similarity are not always based on texts spanning multiple sentences. Sometimes a decision has to be made from a single sentence, or from only a phrase within that sentence.
In this thesis, Word Mover's Distance (WMD), which is essentially a document similarity measure, is applied to three different problems where only short phrases are given. The first problem, predicting an omitted word from its given context, is an artificial problem whose goal is to assess the quality of the measure and its suitability for such tasks. The results are good and show that it is possible to achieve some semantic separation of phrases using WMD.
The other two problems are examples of practical cases. First, the method is used to detect adverse drug reactions from patients' epicrises. Second, the method is applied to the analysis of syntax parser errors, where the goal is to predict phrases that the parser fails to tag correctly. For different reasons, which are also analyzed in this thesis, the results were not good for either problem.
Graduation Thesis language
English
Graduation Thesis type
Master - Computer Science
Supervisor(s)
Sven Laur
Defence year
2019
 