Mapping voices to their descriptions
DALL-E 2 and Stable Diffusion generate images from text by relying on joint vector representations of text and images. The aim of this thesis is to extend this approach to voices that can be used in text-to-speech synthesis (such as Neurokõne, https://neurokone.ee) and to train neural network models that generate voices from their textual descriptions.
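The topic description does not prescribe a method. As an illustration only, one common way to obtain a joint representation of two modalities is a CLIP-style symmetric contrastive objective, sketched below for pairs of voice descriptions and voices. Everything here is an assumption made for the sketch: the function names are hypothetical, the temperature value is arbitrary, and random vectors stand in for the outputs of a text encoder and a speaker (voice) encoder.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity_matrix(text_embs, voice_embs, temperature):
    """Entry [i][j] is the scaled cosine similarity of description i and voice j."""
    texts = [normalize(t) for t in text_embs]
    voices = [normalize(v) for v in voice_embs]
    return [
        [sum(a * b for a, b in zip(t, v)) / temperature for v in voices]
        for t in texts
    ]


def cross_entropy_on_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def contrastive_loss(text_embs, voice_embs, temperature=0.07):
    """Symmetric loss: description-to-voice plus voice-to-description matching."""
    logits = similarity_matrix(text_embs, voice_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (
        cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(transposed)
    )


# Placeholder data: random vectors instead of real encoder outputs (assumption).
rng = random.Random(0)
dim, batch = 16, 4
voice_embs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(batch)]
# "Aligned" descriptions lie close to their paired voice in the shared space.
aligned_text = [[x + 0.01 * rng.gauss(0, 1) for x in v] for v in voice_embs]
# "Unaligned" descriptions are unrelated to any voice.
random_text = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(batch)]

loss_aligned = contrastive_loss(aligned_text, voice_embs)
loss_random = contrastive_loss(random_text, voice_embs)
print(f"aligned pairs: {loss_aligned:.4f}, unrelated pairs: {loss_random:.4f}")
```

The loss is low when each description is most similar to its own voice and high otherwise, so minimizing it would pull matching descriptions and voices together in the shared space. Generating a voice from a description would additionally need a model that maps from that space to speaker embeddings usable by the synthesizer, which this sketch does not cover.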
Graduation Theses defence year
Spoken language(s)
Requirements for candidates
Application of contact