
VivoText's Innovation

Analogy between Music and Speech

A core concept of the VivoText TTS system derives from an analogy between two conversions: turning music scores (i.e., musical texts) into human-like expressive performances, and turning written text into natural-sounding speech.

Just as variations in tempo, articulation, and dynamics contribute to the effectiveness of a musical performance, the speech attributes of pitch, duration, and amplitude (known collectively as prosody) are at the core of effective TTS. They are critical to conveying the phonemic, syntactic, and pragmatic content of words and sentences.

VivoText applies methods originally developed for music performance, called MOR (Music Objects Recognition), to speech synthesis.

The result is highly intelligible enunciation and natural flow in a variety of speaking styles.

Voice Sample Database

VivoText has developed an innovative approach to creating the database that supplies the audio units concatenated to form the speech output. Our proprietary, patent-pending sample library contains a comprehensive set of phoneme samples with different prosodic attributes, carefully designed to provide a broad range of prosodic variants of each phoneme in all of its contexts.
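The page does not say how a variant is chosen from the library at synthesis time. As a minimal sketch, assuming each sample is tagged with its phoneme and its measured pitch, duration, and amplitude, a concatenative engine could pick the variant whose prosody lies closest to a target. The `Sample` and `Target` types, the weights, and the nearest-match rule are all hypothetical, and phonetic context is left out for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One recorded variant of a phoneme (hypothetical schema)."""
    phoneme: str
    pitch_hz: float      # fundamental frequency of the recording
    duration_ms: float
    amplitude_db: float
    audio_id: str        # hypothetical key into an audio store


@dataclass(frozen=True)
class Target:
    """Prosody the synthesizer wants for one phoneme slot."""
    pitch_hz: float
    duration_ms: float
    amplitude_db: float


# Hypothetical per-attribute weights: a 20 Hz, 10 ms, or 3 dB mismatch
# each adds 1.0 to the cost.
WEIGHTS = {"pitch_hz": 1 / 20, "duration_ms": 1 / 10, "amplitude_db": 1 / 3}


def prosodic_distance(sample: Sample, target: Target) -> float:
    """Weighted sum of absolute differences in pitch, duration, and amplitude."""
    return sum(w * abs(getattr(sample, k) - getattr(target, k)) for k, w in WEIGHTS.items())


def select_sample(library: dict[str, list[Sample]], phoneme: str, target: Target) -> Sample:
    """Return the variant of `phoneme` whose prosody is closest to `target`."""
    candidates = library.get(phoneme)
    if not candidates:
        raise KeyError(f"no samples for phoneme {phoneme!r}")
    return min(candidates, key=lambda s: prosodic_distance(s, target))


# Usage: two hypothetical variants of the vowel /a/.
library = {
    "a": [
        Sample("a", 120.0, 80.0, -12.0, "a_low_short"),
        Sample("a", 180.0, 140.0, -8.0, "a_high_long"),
    ]
}
best = select_sample(library, "a", Target(170.0, 130.0, -9.0))
print(best.audio_id)  # a_high_long
```

A production library would also key samples by their neighboring phonemes, since the text stresses coverage of each phoneme in all of its contexts. The weighted-distance cost is only one simple way to rank candidates.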

Expressivity

Expression is generated in a two-step process:

  • Basic expression is derived automatically from the phonetic, semantic, and syntactic analysis of the text, which determines, for example, whether a sentence is a statement or a question, and whether it is simple or complex. The analysis also takes into account any additional expressive instructions provided in the text, such as italics, underlining, and CAPS.
  • Expression is further shaped by a speaking-style preference that the user chooses from a menu. For example, the user can choose a "deliberate" style for news or an "enthusiastic" style for announcing the launch of a new product.
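The two steps above can be illustrated with a toy rule-based sketch. It assumes, hypothetically, that expression is represented as per-word scale factors for pitch, speaking rate, and amplitude. Step 1 applies a final pitch rise for questions, a fall for statements, and a boost for words in CAPS; step 2 multiplies everything by a style preset. The numeric values and the `basic_expression` and `apply_style` names are illustrative only, not VivoText's actual rules.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WordProsody:
    """Per-word prosody as scale factors relative to a neutral reading."""
    word: str
    pitch_scale: float = 1.0
    rate_scale: float = 1.0       # > 1.0 means spoken faster
    amplitude_scale: float = 1.0


# Hypothetical presets for the two styles mentioned in the text.
STYLES = {
    "deliberate": {"pitch": 0.9, "rate": 0.85, "amplitude": 1.0},
    "enthusiastic": {"pitch": 1.2, "rate": 1.1, "amplitude": 1.15},
}


def basic_expression(sentence: str) -> list[WordProsody]:
    """Step 1: derive prosody from sentence type and CAPS emphasis."""
    text = sentence.strip()
    is_question = text.endswith("?")
    words = text.rstrip("?.!").split()
    result = []
    for i, word in enumerate(words):
        p = WordProsody(word)
        if len(word) > 1 and word.isupper():      # CAPS marks emphasis
            p = replace(p, pitch_scale=1.15, amplitude_scale=1.2)
        if i == len(words) - 1:                   # final word: rise or fall
            final = 1.2 if is_question else 0.85
            p = replace(p, pitch_scale=p.pitch_scale * final)
        result.append(p)
    return result


def apply_style(words: list[WordProsody], style: str) -> list[WordProsody]:
    """Step 2: scale the basic expression by the user's chosen style."""
    s = STYLES[style]
    return [
        replace(
            w,
            pitch_scale=w.pitch_scale * s["pitch"],
            rate_scale=w.rate_scale * s["rate"],
            amplitude_scale=w.amplitude_scale * s["amplitude"],
        )
        for w in words
    ]


question = basic_expression("Is the launch TODAY?")
news = apply_style(basic_expression("The launch is today."), "deliberate")
for w in news:
    print(w)
```

Treating every question mark as a final rise is a simplification: in English, wh-questions usually end with a fall, which is one reason a real system relies on syntactic analysis rather than punctuation alone.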

The VivoText TTS engine is designed as a universal, language-independent platform. Because all languages share a unified architectural infrastructure, a single software package covers the entire portfolio of voices and dialects within each language.

Copyright © 2010-2011 VivoText
Website by Sigalon