Institute of Computer Science - Graduation Theses Registry

Using Embeddings to Improve Text Segmentation

Name

Kaur Karus

Abstract

Textual data is often an unstructured collection of sentences and is thus difficult to use for many purposes. Structuring the text according to topics or concepts can aid text summarization, neural machine translation and other fields where a single sentence can provide too little context. Existing methods of text segmentation are either unsupervised and based on word occurrences, or supervised and based on vector representations of words and sentences. The purpose of this Master’s Thesis is to develop a general unsupervised method of text segmentation using word vectors. The resulting approach is implemented and compared to a naïve baseline to assess its viability. The implemented model is also used as part of extractive text summarization to assess the benefit of the proposed approach. The results show that while the approach outperforms the baseline, further research could greatly improve its efficacy.

Graduation Thesis language

English

Graduation Thesis type

Master - Computer Science

Supervisor(s)

Mark Fišel

Defence year

2019
