The ProfiVox-diad text-to-speech system

The ProfiVox-diad text-to-speech system (1999) concatenates short waveform elements called diads, each made up of the waveforms of two half speech sounds. This approach is logical from a speech-building point of view, because a finite number of waveform elements (1600 kinds of diads) can cover the entire language. The average duration of a diad is 75 ms. The example shows the 5 diads used to compose the waveform of the Hungarian word ‘alma’ (apple).

image
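
The diad count for a word can be illustrated with a minimal sketch. It assumes that each diad joins the halves of two neighbouring sounds and that silence at the word edges counts as a neighbour, which is one plausible reading of why the four sounds of ‘alma’ need 5 diads. The function name split_into_diads, the "_" silence marker, and the use of letters as sound labels are hypothetical and chosen only for this illustration.

```python
AVERAGE_DIAD_MS = 75  # average diad duration quoted in the text


def split_into_diads(sounds, boundary="_"):
    """Return the list of diads (pairs of neighbouring sounds) for a word.

    Assumption: the word is padded with a silence marker on both sides,
    so a word of N sounds yields N + 1 diads.
    """
    padded = [boundary] + list(sounds) + [boundary]
    return list(zip(padded, padded[1:]))


# Letters stand in for speech sounds here (an illustrative simplification).
diads = split_into_diads(["a", "l", "m", "a"])
print(diads)
print(len(diads), "diads, roughly", len(diads) * AVERAGE_DIAD_MS, "ms")
```

Under these assumptions the word yields 5 diads, and the 75 ms average gives a rough duration of 375 ms before melody and rhythm are applied.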

The melody and rhythm of the sentence are superimposed on this waveform by rule-based signal processing. A special in-house signal processing method is used to change the melody. Creating a diad database requires precise phonetic knowledge and a lot of manual work. The waveform of each diad must be labeled with the period boundaries of voiced sounds (v), the sound boundaries (red marker), and the diad boundaries. The diads must join with zero transition at their junctions. The total duration of the ProfiVox system’s diad database is only 2 minutes of speech. ProfiVox’s voice sounds human, and it can ask questions. It is used in many industrial applications and is also available free of charge through robobraille.org.

image
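
The quoted database size can be cross-checked against the figures given earlier. The sketch below assumes that the database stores exactly one recording per diad kind and that the 75 ms average applies to all of them; both are assumptions made for the arithmetic, not statements about how the database is actually organised.

```python
DIAD_KINDS = 1600        # number of diad kinds quoted in the text
AVERAGE_DIAD_MS = 75     # average diad duration quoted in the text

# Assumption: one stored waveform per diad kind.
total_seconds = DIAD_KINDS * AVERAGE_DIAD_MS / 1000
total_minutes = total_seconds / 60

print(total_seconds, "seconds =", total_minutes, "minutes")
```

With these numbers the total comes to 120 seconds, which agrees with the 2 minutes of speech stated for the database.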

In Hungary, most blind people use ProfiVox with a screen reader (JAWS for Windows). It is available for purchase in the Google Play Store. Voice samples can be listened to under ‘Applications’.