Boosting subjective quality of Arabic text-to-speech (TTS) using end-to-end deep architecture
Fahmy, FK; Abbas, HM; Khalil, Mahmoud
Abstract
End-to-end speech synthesis methods have achieved nearly natural, human-like speech. However, they remain prone to synthesis errors such as missing or repeated words and incomplete synthesis. We argue that this is mainly due to the local information preference between the text input and the learned acoustic features of a conditional autoregressive (CAR) model. The local information preference prevents the model from depending on the text input when predicting acoustic features, which contributes to synthesis errors at inference time. In this work, we compare two modified architectures based on Tacotron2 for generating Arabic speech. The first architecture replaces the WaveNet vocoder with WaveGlow, a flow-based vocoder. The second architecture, influenced by InfoGAN, maximizes the mutual information between the text input and the predicted acoustic features (mel-spectrogram) to eliminate the local information preference. The training objective is also changed by adding a CTC loss term. The training objective can be regarded as a metric of local information preference between the text input and the predicted acoustic features. We carried out the experiments on Nawar Halabi’s dataset (http://en.arabicspeechcorpus.com/), which contains about 2.41 h of Arabic speech. Our experiments show that maximizing the mutual information between the predicted acoustic features and the conditional text input, together with changing the training objective, can enhance the subjective quality of the generated speech and reduce the utterance error rate.
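As a rough illustration of the added CTC loss term, the sketch below computes a CTC negative log-likelihood for one utterance with the standard forward algorithm and adds it to a reconstruction loss. It is a minimal sketch and not the authors' implementation. It assumes the CTC term scores how well the text can be recovered from per-frame symbol distributions derived from the predicted mel-spectrogram. The names `ctc_negative_log_likelihood`, `combined_objective`, and `ctc_weight` are hypothetical, and the abstract does not state how the two terms are weighted.

```python
import math

NEG_INF = float("-inf")


def log_sum_exp(*values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    peak = max(values)
    if peak == NEG_INF:
        return NEG_INF
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def ctc_negative_log_likelihood(log_probs, target, blank=0):
    """CTC loss for one utterance via the forward algorithm.

    log_probs: T frames, each a list of per-symbol log-probabilities.
    target: list of symbol ids (without blanks).
    """
    # Interleave blanks: [blank, t1, blank, t2, ..., blank].
    extended = [blank]
    for symbol in target:
        extended += [symbol, blank]
    num_states = len(extended)

    # A path may start on the leading blank or on the first symbol.
    alpha = [NEG_INF] * num_states
    alpha[0] = log_probs[0][blank]
    if num_states > 1:
        alpha[1] = log_probs[0][extended[1]]

    for frame in log_probs[1:]:
        previous = alpha
        alpha = [NEG_INF] * num_states
        for s, symbol in enumerate(extended):
            candidates = [previous[s]]
            if s >= 1:
                candidates.append(previous[s - 1])
            # Skipping a blank is allowed only between two different symbols.
            if s >= 2 and symbol != blank and symbol != extended[s - 2]:
                candidates.append(previous[s - 2])
            alpha[s] = log_sum_exp(*candidates) + frame[symbol]

    # Valid paths end on the trailing blank or on the last symbol.
    final = [alpha[-1]]
    if num_states > 1:
        final.append(alpha[-2])
    return -log_sum_exp(*final)


def combined_objective(reconstruction_loss, ctc_loss, ctc_weight=1.0):
    """Hypothetical weighting: reconstruction loss plus a scaled CTC term."""
    return reconstruction_loss + ctc_weight * ctc_loss


# Toy check: 2 frames, vocabulary {0: blank, 1: "a"}, uniform probabilities.
uniform = [[math.log(0.5), math.log(0.5)] for _ in range(2)]
ctc = ctc_negative_log_likelihood(uniform, target=[1])
print(round(ctc, 4))  # 0.2877, i.e. -log(0.75): three of four alignments are valid
print(round(combined_objective(1.0, ctc, ctc_weight=0.5), 4))
```

A low CTC term means the text is easy to recover from the predicted features, which is one way to read the abstract's remark that the objective can serve as a metric of local information preference. In practice a framework-provided CTC loss would replace this pure-Python version.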
Other data
Field | Value |
---|---|
Title | Boosting subjective quality of Arabic text-to-speech (TTS) using end-to-end deep architecture |
Authors | Fahmy, FK; Abbas, HM; Khalil, Mahmoud |
Keywords | Tacotron 2; WaveGlow; InfoGan; Arabic text-to-speech; Speech synthesis; Deep learning; Neural networks |
Issue Date | 8-Feb-2022 |
Publisher | SPRINGER |
Journal | INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY |
Volume | 25 |
Start page | 79 |
End page | 88 |
ISSN | 1381-2416 |
DOI | 10.1007/s10772-022-09961-0 |
Scopus ID | 2-s2.0-85124412012 |
Web of science ID | WOS:000752749100004 |
Attached Files
File | Size | Format |
---|---|---|
s10772-022-09961-0.pdf | 1.69 MB | Adobe PDF |