Institute of Computer Science - Graduation Theses Registry


Multi-speaker Text-to-speech Synthesis in Estonian

Name

Oleh Matsuk

Abstract

Text-to-speech synthesis is a challenging problem, but in recent years neural network models have provided convincing solutions to it. Specialized model architectures have been proposed to control the speaker identity features of the synthesized speech without training separate models, thus reducing the requirements for data volume and training time.
In this work, we implement and train a recently proposed neural architecture with a limited amount of Estonian speech data to obtain a model capable of multi-speaker text-to-speech synthesis. We then evaluate the overall quality of the synthesized speech and the model's ability to assume speaker identity features for speakers both seen and unseen in training. We compare the results across multiple models trained on different sets of training data.

Graduation Thesis language

English

Graduation Thesis type

Master - Computer Science

Supervisor(s)

Mark Fišel

Defence year

2021
