Automatic Preprocessing of Speech Corpora’s Audio Files for the Purpose of Speech Synthesis

Name
Andreas Teder
Abstract
Speech corpora may contain excessive silence, which reduces the quality of speech synthesis models trained on them. Removing the silence manually is time-consuming, while overly simple automatic methods may introduce other defects. The purpose of this bachelor’s thesis is to create a program that uses different methods to remove silence from speech corpora automatically without introducing defects. The program uses one method based on energy and zero-crossing rate and another based on acoustic models. It is used to preprocess three Estonian speech corpora, two of which are then used to train speech synthesis models. The quality of both the preprocessing and the speech synthesis models is graded and analyzed. It was concluded that both methods improve the quality of speech corpora, but the method based on acoustic models gives better results overall.
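To illustrate the kind of energy and zero-crossing-rate method the abstract mentions, here is a minimal sketch in Python. It is not the thesis program: the function names, the frame length, both thresholds, and the rule for combining the two features are assumptions made for this example only.

```python
import math


def frame_features(samples, frame_len):
    """Return (energy, zero_crossing_rate) for each full, non-overlapping frame."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Short-time energy: mean of squared amplitudes.
        energy = sum(x * x for x in frame) / frame_len
        # Zero-crossing rate: fraction of adjacent sample pairs that change sign.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        features.append((energy, crossings / (frame_len - 1)))
    return features


def trim_silence(samples, frame_len=160, energy_thresh=1e-4, zcr_thresh=0.25):
    """Drop leading and trailing frames that look like silence.

    Hypothetical rule: a frame counts as speech if its energy exceeds
    energy_thresh, or if its zero-crossing rate exceeds zcr_thresh while
    its energy is at least a tenth of energy_thresh (a crude way to keep
    quiet, noisy consonants). Samples after the last full frame are ignored.
    """
    speech = [
        i
        for i, (energy, zcr) in enumerate(frame_features(samples, frame_len))
        if energy > energy_thresh
        or (zcr > zcr_thresh and energy > energy_thresh / 10)
    ]
    if not speech:
        return []
    return samples[speech[0] * frame_len:(speech[-1] + 1) * frame_len]
```

Called on a list of floating-point samples in the range -1 to 1, trim_silence returns the span from the first to the last frame classified as speech. Silence inside the utterance is left untouched in this sketch, and the thresholds would need tuning for any real corpus.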
Graduation Thesis language
Estonian
Graduation Thesis type
Bachelor - Computer Science
Supervisor(s)
Liisa Rätsep
Defence year
2021
 