Abstract: Voice assistant technology has expanded the design space for voice-activated consumer products and audio-centric user experience. To navigate this emerging design space, Speech Synthesis Markup Language (SSML) provides a standard to characterize synthetic speech based on parametric control of the prosody elements, i.e. pitch, rate, volume, contour, range, and duration. However, the existing voice assistants utilizing Text-to-Speech (TTS) lack expressiveness. The need of a new production workflow for more efficient and emotional audio content using TTS is discussed. A prototype that allows a user to produce TTS-based content in any emotional tone using voice input is presented. To evaluate the new workflow enabled by the prototype, an initial comparative study is conducted against the parametric approach. Preliminary quantitative and qualitative results suggest the new workflow is more efficient based on time to complete tasks and number of design iterations, while maintaining the same level of user preferred production quality.
Publication Year: 2017
Publication Date: 2017-10-20
Language: en
Type: article
Indexed In: ['crossref']
Access and Citation
Cited By Count: 3
AI Researcher Chatbot
Get quick answers to your questions about the article from our AI researcher chatbot