PROFIVOX text-to-speech family

Human-voice-based text-to-speech conversion

This text-to-speech technology is the result of linguistic-phonetic research and engineering work carried out at BME TMIT since 1995. The system family combines several technologies, components and models, and it is under continuous development.

The ProfiVox-diad system (1999-) concatenates short waveform units cut from human speech: diphones (two adjacent half speech sounds) and triphones (consonant-vowel-consonant). The final prosody of the synthesized sentence is produced by rule-based signal processing. Its voice is human, and it can also pronounce questions. It is available for purchase in the Google Play Store. It is, or has been, used in many industrial applications, and it can also be used free of charge (robobraille.org). Most blind people use it as a computer screen reader.
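To make the idea of diphone units concrete, here is a minimal Python sketch of how a phone sequence could be split into diphone names and joined from a unit inventory. It is an illustration only, not the ProfiVox implementation: the function names, the "_" silence symbol, the "a-b" naming scheme and the toy inventory are all assumptions, and the rule-based prosody step is left out.

```python
def to_diphones(phones, silence="_"):
    """Return the diphone names covering a phone sequence.

    Each diphone spans the second half of one sound and the first half
    of the next; the utterance is padded with silence on both sides.
    The "a-b" naming scheme is hypothetical.
    """
    padded = [silence, *phones, silence]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


def concatenate(diphones, inventory):
    """Join the stored sample lists of the given diphones, in order.

    `inventory` maps a diphone name to a list of samples (assumed layout).
    No smoothing or prosody modification is applied here.
    """
    samples = []
    for name in diphones:
        samples.extend(inventory[name])
    return samples


# Toy inventory with made-up sample values, for illustration only.
toy_inventory = {
    "_-h": [0.0, 0.1],
    "h-a": [0.2, 0.3],
    "a-z": [0.4],
    "z-_": [0.1, 0.0],
}

units = to_diphones(["h", "a", "z"])
waveform = concatenate(units, toy_inventory)
```

A word of three sounds yields four diphones, because the silence at each edge forms a unit with the first and the last sound.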

The Profivox Corpus method (2005-) uses mathematical functions to select real waveform segments (words, word strings, sentences) from large databases (corpora) of live speech. It cannot ask questions. No post-processing is applied to the final voice. It produces very good quality synthesized speech when it is prepared for a limited topic area. Applications in Hungary: reading names and addresses (directory inquiry), price lists, numbers, dates, times, weather reports and railway passenger information.
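The selection of words, word strings and sentences from a corpus can be sketched as follows. This sketch swaps in a plain greedy longest-match rule for the mathematical selection functions mentioned above, whose details are not given here; the function name, the data layout and the example sentence are assumptions made for illustration.

```python
def select_units(words, recorded):
    """Cover `words` with the longest recorded segments, left to right.

    `recorded` is a set of word tuples available in the speech corpus
    (assumed layout). Greedy longest-match is a simplification: a real
    corpus-based system would score competing candidates.
    """
    units = []
    i = 0
    while i < len(words):
        # Try the longest remaining span first, then shorter ones.
        for j in range(len(words), i, -1):
            candidate = tuple(words[i:j])
            if candidate in recorded:
                units.append(candidate)
                i = j
                break
        else:
            raise KeyError(f"no recorded unit covers {words[i]!r}")
    return units


# Hypothetical corpus for a railway announcement domain.
recorded_units = {
    ("the", "train", "to"),
    ("the",),
    ("train",),
    ("Budapest",),
    ("departs", "at"),
    ("ten",),
}

selected = select_units(
    "the train to Budapest departs at ten".split(), recorded_units
)
```

Longer matches mean fewer joins between recordings, which is one reason such a system sounds best in a limited topic area where whole phrases recur.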

Profivox-HMM (2010-) is a statistical solution. It is trained statistically on a large speech database. The result of training is a general parameter database containing voice and prosodic information. During synthesis the input text is analysed and the best-matching speech parameters are selected from the parameter database. The resulting parameter set is converted to a waveform by a speech coder (vocoder). It can also ask questions. It is easy to train on other people's voices. It is used in an automated administrative chatbot system.
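The step from a trained parameter database to a parameter sequence can be sketched in a heavily simplified form. The model layout below (each phone mapped to a list of states, each state holding one mean pitch value and a duration in frames) and all numbers are assumptions for illustration; a real statistical synthesizer models many more parameters, smooths the trajectories, and passes the result to a vocoder.

```python
def generate_frames(phones, model):
    """Expand a phone sequence into a frame-by-frame parameter track.

    `model` maps a phone to a list of (mean_value, duration_in_frames)
    states. This layout is hypothetical; each state simply repeats its
    mean value for its duration.
    """
    frames = []
    for phone in phones:
        for mean_value, duration in model[phone]:
            frames.extend([mean_value] * duration)
    return frames


# Toy "trained" model with made-up pitch values in Hz.
toy_model = {
    "m": [(110.0, 1)],
    "a": [(120.0, 2), (125.0, 1)],
}

pitch_track = generate_frames(["m", "a"], toy_model)
```

Because the voice lives in the learned parameters rather than in stored recordings, retraining on another speaker's data changes the voice without changing the synthesis procedure.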

All three systems provide good quality speech. They are used in industrial and IT systems; see ‘Applications’.