Automatic Preprocessing of Speech Corpora’s Audio Files for the Purpose of Speech Synthesis

Andreas Teder
Speech corpora may contain excessive silence, which reduces the quality of speech synthesis models trained on them. Removing the silence manually is time-consuming, while overly simple automatic methods can introduce other defects. The aim of this bachelor’s thesis is to create a program that automatically removes silence from speech corpora without introducing defects. The program uses a method based on energy and zero-crossing rate alongside a method based on acoustic models. It is used to preprocess three Estonian speech corpora, two of which are then used to train speech synthesis models. The quality of both the preprocessing and the resulting speech synthesis models is evaluated and analyzed. The thesis concludes that both methods improve the quality of the speech corpora, but the method based on acoustic models gives better results overall.
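As a rough illustration of the energy and zero-crossing-rate idea mentioned in the abstract, the following minimal Python sketch trims leading and trailing silence from a list of samples. It is not the program developed in the thesis: the function names, the frame length, the thresholds, and the rule for combining the two features are all assumptions chosen for illustration.

```python
import math


def frame_features(samples, frame_len):
    """Return a list of (energy, zcr) pairs, one per non-overlapping frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Short-time energy: mean of squared amplitudes in the frame.
        energy = sum(x * x for x in frame) / frame_len
        # Zero-crossing rate: fraction of adjacent sample pairs that change sign.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        zcr = crossings / (frame_len - 1)
        feats.append((energy, zcr))
    return feats


def trim_silence(samples, frame_len=160, energy_thresh=1e-4, zcr_thresh=0.25):
    """Drop leading and trailing frames classified as silence.

    Assumed rule (hypothetical, for illustration only): a frame is speech
    if its energy reaches energy_thresh, or if it has moderate energy and
    a high zero-crossing rate, which may indicate an unvoiced consonant.
    """
    def is_speech(energy, zcr):
        if energy >= energy_thresh:
            return True
        return energy >= 0.1 * energy_thresh and zcr >= zcr_thresh

    flags = [is_speech(e, z) for e, z in frame_features(samples, frame_len)]
    if True not in flags:
        return []  # no frame was classified as speech
    first = flags.index(True)
    last = len(flags) - 1 - flags[::-1].index(True)
    # Keep everything from the first speech frame to the last one, inclusive.
    return samples[first * frame_len:(last + 1) * frame_len]
```

Silence inside the utterance is left untouched in this sketch; only the ends are trimmed. Fixed thresholds like these are exactly the kind of overly simple setting that can clip quiet speech, so in practice they would need tuning per corpus.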
Bachelor - Computer Science
Liisa Rätsep
