Mapping voices to their descriptions

Organization
TartuNLP
Abstract
DALL-E 2 and Stable Diffusion generate images from text using joint vector representations of text and images. The aim of this thesis is to extend this approach to voices that can be used in text-to-speech synthesis (such as Neurokõne, https://neurokone.ee) and to train neural network models that generate voices from their textual descriptions.
Graduation thesis defence year
2023-2024
Supervisor
Mark Fishel
Spoken language(s)
Estonian, English
Requirements for candidates
Level
Bachelor's, Master's
Keywords
#neurokõne #texttospeech #voicegeneration #transformers

Contact for applications
Name
Mark Fishel
Phone
E-mail
fishel@ut.ee