Speech databases

Speech databases for continuous speech recognition and speech synthesis

Speech databases are the foundation of modern automatic speech recognition (ASR) and speech synthesis (text-to-speech, TTS) systems. Both rely on large speech databases containing many sentences that are annotated and labeled, either manually or automatically. Several Hungarian speech databases have been created, mainly in the speech research laboratories of BME TMIT.

The first real Hungarian speech database, BABEL (1998), was created for ASR research. It was collected by BME TMIT researchers under the guidance of Klára Vicsi, based on an international standard, and consists of texts read by sixty speakers. The MTBA Hungarian Telephone Speech Database (2002) was a collection of read speech items recorded from five hundred informants, 297 over landlines and 203 over mobile phones. In the Phonetics Laboratory of the Research Institute of Linguistics of the Hungarian Academy of Sciences, the BEA spoken language database was created under the leadership of Mária Gósy (2007). The work lasted 5 years. This database contains spontaneous speech and texts read aloud by hundreds of speakers. The BEA database was also used by BME TMIT for ASR experiments and development. SpeechTex Kft (since 2013) has developed several speech databases for its state-of-the-art continuous speech recognition and speech transcription solutions.

The PPBA Parallel Precision Speech Database (2010) was created in the speech technology laboratory of BME TMIT for high-quality, natural-sounding speech synthesis. In it, 12 speakers read the same set of 2,000 sentences. It was used to train the Profivox-HMM speech synthesizer. The WEATHER speech database (2013) consisted of 5,000 sentences and served the ProfiVox corpus-based speech synthesizer. The TRAIN speech database (as of 2014) contained 12,000 sentences and is used in MÁV’s automated spoken passenger information systems. These speech databases, originally developed for speech synthesis, were later also used to train the BME TMIT Neural ProfiVox speech synthesizers.