Using Embeddings to Improve Text Segmentation

Kaur Karus
Textual data is often an unstructured collection of sentences and is therefore difficult to use for many purposes. Structuring a text according to its topics or concepts can aid text summarization, neural machine translation and other fields where a single sentence provides too little context. Existing methods of text segmentation are either unsupervised and based on word occurrences, or supervised and based on vector representations of words and sentences. The purpose of this Master's Thesis is to develop a general unsupervised method of text segmentation using word vectors. The approach is implemented and compared against a naïve baseline to assess its viability. The implemented model is then used as a component of extractive text summarization to assess the practical benefit of the proposed approach. The results show that while the approach outperforms the baseline, further research could greatly improve its efficacy.
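The abstract does not describe the algorithm itself. As a minimal sketch of the general idea only, and not the method developed in the thesis, one common unsupervised way to segment text with word vectors is to average the word vectors of each sentence and start a new segment wherever the cosine similarity between consecutive sentence vectors falls below a threshold. The toy embeddings, the `threshold` value and all function names below are hypothetical assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def sentence_vector(tokens: Sequence[str], embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of in-vocabulary tokens; return a zero vector if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; defined as 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def segment(
    sentences: Sequence[Sequence[str]],
    embeddings: Dict[str, Vector],
    dim: int,
    threshold: float = 0.5,  # hypothetical boundary threshold
) -> List[List[int]]:
    """Group sentence indices into segments, opening a new segment whenever
    the similarity of two consecutive sentence vectors drops below threshold."""
    if not sentences:
        return []
    vectors = [sentence_vector(s, embeddings, dim) for s in sentences]
    segments = [[0]]
    for i in range(1, len(sentences)):
        if cosine(vectors[i - 1], vectors[i]) < threshold:
            segments.append([i])  # topic shift detected: start a new segment
        else:
            segments[-1].append(i)  # similar enough: continue the current segment
    return segments


if __name__ == "__main__":
    # Hypothetical 2-dimensional toy embeddings: one "animal" axis, one "finance" axis.
    toy_embeddings = {
        "cat": [1.0, 0.0],
        "dog": [0.9, 0.1],
        "stock": [0.0, 1.0],
        "market": [0.1, 0.9],
    }
    toy_sentences = [["the", "cat"], ["a", "dog"], ["stock", "market"], ["market"]]
    print(segment(toy_sentences, toy_embeddings, dim=2))
```

On the toy input the two animal sentences and the two finance sentences end up in separate segments. A higher threshold yields more, shorter segments; a lower one merges more sentences together.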
Graduation Thesis type: Master - Computer Science
Mark Fišel