Title: Comparing the naturalness of several approaches for generating F0 contours in German text-to-speech systems
Abstract: Generating near-natural F0 contours is an important issue in text-to-speech (TTS) synthesis and contributes substantially to the quality of synthetic speech. In earlier studies by the authors, a model of German intonation was developed that is based on the quantitative Fujisaki model. A typical F0 contour is described as a sequence of major rises and falls, which are modeled by onsets and offsets of accent commands connected to accented syllables. The current paper addresses perception experiments comparing the intonational naturalness of a Fujisaki-model-based TTS system and four other German TTS systems with comparably high segmental quality. Natural speech samples were used as a reference. Three of the TTS systems used PSOLA segmentals and one used LPC segmentals. Two types of experiments were conducted with 20 subjects: (1) a pair comparison of 15 isolated sentences, and (2) a ranking test based on a news passage of about 15 s produced with each of the systems. Preliminary results from experiment (1) show that, on a naturalness scale from 0 to 5, the natural speech samples reach a maximum score of 4.5, compared with 2.8 for the best synthesis system, the LPC-based one. The system with Fujisaki-model-based intonation leads the group of PSOLA systems, which cluster closely around a mean of 2.4.
Publication Year: 1999
Publication Date: 1999-02-01
Language: en
Type: article
Indexed In: ['crossref']
Access and Citation
Cited By Count: 1