Mapping voices to their descriptions
Organization
TartuNLP
Abstract
DALL-E 2 and Stable Diffusion generate images from text using joint vector representations of text and images. The aim of this thesis is to extend this approach to voices that can be used in text-to-speech synthesis (such as Neurokõne, https://neurokone.ee) and to train neural network models that generate voices from their textual descriptions.
Graduation Theses defence year
2023-2024
Supervisor
Mark Fishel
Spoken language(s)
Estonian, English
Requirements for candidates
Level
Bachelor, Masters
Contact for applications
Name
Mark Fishel
Phone
E-mail