Mini dictionary

REMARK. The Hungarian speech sounds are written with their appropriate letter(s).

Acoustic point shape

During articulation, the place of articulation (bilabial, labio-dental, denti-alveolar, alveolar, palatal, velar, laryngeal) determines the acoustic content (spectrum) of the speech sound produced. Thus, every place of articulation has its own spectral shape, regardless of the manner of speech sound production (stops, spirants, affricates, etc.). This plays an important role in speech synthesis and in voice surgery.

Aperiodic vibration

Irregular, non-periodic vibration. Aperiodic vibration is produced, for example, in spirants (e.g. s, f).

Articulation

Continuous automatic movement of speech organs to produce speech.

Articulation channel

The sound tube from the vocal cords to the mouth opening. Its average length is 17 cm for men and 14 cm for women.

Articulation place

Location of the articulation point in the oral cavity when producing consonants. In Hungarian the following places are used: the two lips (bilabial: b, p, m), lower lip and upper teeth (labio-dental: v, f), bed of the upper teeth and tongue tip (denti-alveolar: d, t, sz, z, c, dz, n, l, r), anterior palate (alveolar: s, zs, cs, dzs), hard palate (palatal: ny, gy, ty, j), soft palate (velar: g, k), larynx (laryngeal: h).

Articulation rate

The number of speech sounds uttered within one second, excluding pauses. Unit: sound/s. The Hungarian average is 14 sound/s; it is 12 sound/s for calm speech and 18 sound/s for fast speech.

Burst

Stops have two phases in articulation: the closure (lock), followed by the burst. The burst is thus the final part of a stop, when the speaker releases the closure of the articulation channel. The burst is a fast process.

C

Abbreviation for consonants in written materials in speech science. The abbreviation for vowels is V.

Clear phase

Speech sounds have three parts: two transition phases (left and right) and the clear phase (the middle part of the sound). Acoustically, the clear phase is the part that characterizes the given speech sound.

Co-articulation

Every speech sound has its characteristic articulation configuration in the oral cavity. Since articulation is continuous during speech production, the speech organs move smoothly from one configuration to the next. This is called co-articulation. In fluent speech, speech sounds are connected to each other by co-articulation phases. As a result, the spectrum of fluent speech always shows some formant movement between two voiced speech sounds. These parts of the sound are called transition phases; they are thus a consequence of co-articulation. Co-articulation can also shift the characteristic articulation place of an adjacent sound. For example, palatal consonants shift the F2 formant of certain vowels towards their own F2.

Diphone

A waveform building element of speech that contains two halves of adjacent speech sounds, cut from a real speech waveform. Hungarian speech can be covered with at most 1600 types of diphones. Diphones are typically used in speech synthesis.
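As a minimal sketch of how a sound sequence maps to diphone units, the function below pairs each sound with its neighbour. The function name `to_diphones` and the use of `_` as a silence marker at the edges of the utterance are assumptions made for illustration; they are not part of this dictionary.

```python
def to_diphones(sounds, boundary="_"):
    """Return the diphone units covering a sequence of speech sounds.

    Each unit spans the second half of one sound and the first half of
    the next. The boundary symbol (an assumption) stands for silence
    before and after the utterance.
    """
    padded = [boundary] + list(sounds) + [boundary]
    # Pair every element with its right-hand neighbour.
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


# The Hungarian word "sok" (many) as a sequence of three sounds.
units = to_diphones(["s", "o", "k"])
print(units)
```

A sequence of n sounds yields n + 1 units under this padding convention, because every sound boundary, including the two edges, gets its own unit.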

Formant

The vocal cord vibration in the larynx is rich in harmonics. The resonant frequencies of the oral cavity above the larynx amplify the groups of harmonics close to them. These amplified groups of harmonics therefore have higher energy than their surroundings. Such a concentration of energy is called a formant. The resonant frequencies differ for each speech sound, depending on the articulation. Formants are numbered in ascending order of frequency: F1, F2, F3, F4. Formant frequencies in speech sounds occur between 200 and 4000 Hz. Each formant has its own frequency band. Formants can be detected in high-energy voiced speech sounds, such as vowels.

Forming speech sounds

Ways to form speech sounds with the speech organs. In Hungarian speech the following ways are used: open oral cavity (vowels, vowel-like consonants), open nasal cavity (nasal sounds), closure in the oral cavity (stop consonants), narrowing of the oral cavity (fricatives), and a rolled speech sound produced with the tip of the tongue.

Fundamental frequency

The frequency of vibration of the vocal cords when a voiced speech sound is produced. It is also called the basic tone. Phonetic notation: F0 (Hz). Average values: 100 Hz for male speakers, 180 Hz for female speakers, and 400 Hz for children. Important note: F0 is unrelated to the formants denoted F1, F2, F3, F4.

Frequency

The number of vibration periods per unit time. Its unit is Hz. 1 Hz is the frequency of a vibration that completes one period in 1 second. The basic tone frequency in speech is about 100 Hz for a male speaker, about 180 Hz for a woman, and about 400 Hz for a child.

Harmonic

The frequency components of voiced speech sounds are harmonics, which are integer multiples of the fundamental frequency. The harmonics of speech typically occur in the frequency range of 100 Hz to 4000 Hz.
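The relation between the fundamental frequency and its harmonics can be sketched as follows. The function name `harmonics` and the default upper limit of 4000 Hz (taken from the range given above) are illustrative choices, not a standard API.

```python
def harmonics(f0_hz, f_max_hz=4000.0):
    """List the harmonic frequencies n * F0 that do not exceed f_max_hz."""
    if f0_hz <= 0:
        raise ValueError("F0 must be positive")
    count = int(f_max_hz // f0_hz)  # number of multiples that fit below the limit
    return [n * f0_hz for n in range(1, count + 1)]


male = harmonics(100)    # typical male F0 from this dictionary
female = harmonics(180)  # typical female F0 from this dictionary
print(male[:3], len(male))
print(female[:3], len(female))
```

A higher F0 spaces the harmonics further apart, so fewer of them fall inside the same frequency range.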

Hearing band

The frequency range within which the human ear perceives acoustic signals (speech, music, noise). The human hearing range is from 20 Hz to 20 kHz.

IPA

International Phonetic Alphabet, developed for the unambiguous definition and notation of speech sounds, especially in written texts. The ASCII transliteration of IPA symbols is the SAMPA system, which was developed for computer use.

Labeling

The set of time-synchronized tags (markers) placed parallel to the speech signal to indicate language units, for example the sound boundaries or syllable boundaries of a word.
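One possible way to hold such time-synchronized tags is a sorted list of (start, end, tag) entries. The sketch below is only an illustration: the boundary times for the word "sok" are invented, and the names `LABELS` and `label_at` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical sound-level labels for the word "sok": (start_s, end_s, tag).
# The times are made up for illustration, not measured values.
LABELS = [
    (0.00, 0.12, "s"),
    (0.12, 0.25, "o"),
    (0.25, 0.40, "k"),
]


def label_at(labels, t):
    """Return the tag whose interval [start, end) contains time t, or None."""
    starts = [start for start, _, _ in labels]
    i = bisect_right(starts, t) - 1  # last interval starting at or before t
    if i >= 0 and t < labels[i][1]:
        return labels[i][2]
    return None


def sound_boundaries(labels):
    """Return the sorted boundary times shared by neighbouring labels."""
    return sorted({time for start, end, _ in labels for time in (start, end)})


print(label_at(LABELS, 0.20))
print(sound_boundaries(LABELS))
```

Keeping the intervals half-open means a boundary time belongs to the sound that starts there, so every instant inside the utterance maps to exactly one tag.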

Phoneme

Theoretical linguistic concept. It refers to the linguistic unit that distinguishes word meanings. The realization of the phoneme is the speech sound. There are 65 phonemes in the Hungarian language: 15 vowels and 50 consonants. Phonemes fall into two groups by duration class: short and long. For example, the difference in meaning between the words sok (many) and sokk (shock) is created by the long version of the speech sound k.

Period duration

The duration of one period of a periodic sound vibration. If the frequency is 100 Hz (such as the male fundamental frequency in speech), the period duration is 10 ms. T0 (s) = 1 / F0 (Hz)
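The formula can be checked numerically with a few lines of code. This is a small sketch; the function name `period_ms` is an illustrative choice, and the F0 values are the averages quoted in this dictionary.

```python
def period_ms(f0_hz):
    """Return the period duration T0 in milliseconds for a frequency in Hz."""
    if f0_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1000.0 / f0_hz  # T0 (s) = 1 / F0 (Hz), converted to ms


for speaker, f0 in [("male", 100), ("female", 180), ("child", 400)]:
    print(f"{speaker}: F0 = {f0} Hz, T0 = {period_ms(f0):.2f} ms")
```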

Prosody

A single term that covers speech melody, emphasis, and rhythm together. Prosody gives human speech its expressive nature.

Relative loudness

The volume of a speech sound in comparison to the other speech sounds within an uttered speech sequence. The relative intensities of speech sounds are expressed on a relative volume scale.

Resonance frequency

The frequency at which a body (such as a column of air) starts to vibrate under the effect of the vibration generated by another body. The articulation channel has several resonant frequencies. The harmonics of the vocal cord vibration that coincide with these resonant frequencies are amplified. These higher-energy parts are the formants. The formants shape the different speech sounds.
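To give a feel for where such resonances can lie, the sketch below uses a common textbook idealization that this dictionary does not itself state: the articulation channel is treated as a uniform tube closed at the vocal cords and open at the lips, whose resonances are odd multiples of c / (4L). The speed of sound (340 m/s) and the function name `tube_resonances` are assumptions; the tube lengths are the averages given under "Articulation channel". Real formants depend on the articulation and deviate from these values.

```python
SPEED_OF_SOUND_M_S = 340.0  # assumed value for air, not from the dictionary


def tube_resonances(length_m, count=3, speed=SPEED_OF_SOUND_M_S):
    """Resonances of an idealized uniform tube, closed at one end.

    F_n = (2n - 1) * c / (4 * L) for n = 1, 2, 3, ...
    """
    if length_m <= 0:
        raise ValueError("tube length must be positive")
    return [(2 * n - 1) * speed / (4.0 * length_m) for n in range(1, count + 1)]


for speaker, length in [("male, 17 cm", 0.17), ("female, 14 cm", 0.14)]:
    values = ", ".join(f"{f:.0f} Hz" for f in tube_resonances(length))
    print(f"{speaker}: {values}")
```

Under these assumptions a shorter tube gives higher resonances, which is consistent with the shorter average channel length quoted for women.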

SAMPA

SAMPA is a speech sound notation system for computers that uses ASCII codes. It is derived from IPA.

Segmental structure

This is a theoretical concept. The term covers the basic elements of speech that result from the articulation, timing, and intensity structure of speech. In other words, the segmental structure expresses the basic elements of speech without prosody, i.e., speech sounds, sound transitions, specific sound durations, and specific sound intensities. In practice, speech having only the segmental structure can only be created with speech technology tools.

Silent phase

The first part of unvoiced stops and affricates. For a short time no sound is generated: the vocal folds are inactive, and the air flowing out of the lungs is blocked by an obstacle inside the articulation channel. In the silent phase the amplitude of the speech signal is zero. The silent phase is followed by a burst.

Sound boundary

Start and end point of a speech sound. It is usually marked in the waveform or spectrum for speech technology purposes.

Sound symbols

Speech sounds are generally represented by letters in texts. The letter form of a given speech sound is language-dependent. That is why the International Phonetic Alphabet (IPA) was introduced for sound symbols in 1889. With IPA, speech sounds can be clearly defined regardless of their written form. The SAMPA sound symbol system was established in 1989, especially for use in computer texts.

Sound spectrogram

Visual representation of the frequency and intensity components of speech as a function of time in a three-dimensional image. This visual representation is also called “visible speech”, or “voice print”.

Speech melody

Change in the frequency of the vocal cord vibration (F0), used to produce different sentence forms and to express emotions.

Speech rate

The number of speech sounds per unit time during speech, including pauses. Unit: sound/s. Its value is lower than the articulation rate.
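The difference between speech rate and articulation rate can be illustrated with a short calculation. This is a sketch under stated assumptions: the function names are illustrative, and the utterance (70 sounds in 6 s, of which 1 s is pause) is a made-up example rather than measured data.

```python
def articulation_rate(sound_count, total_s, pause_s):
    """Sounds per second, with pause time excluded from the duration."""
    speaking_s = total_s - pause_s
    if speaking_s <= 0:
        raise ValueError("speaking time must be positive")
    return sound_count / speaking_s


def speech_rate(sound_count, total_s):
    """Sounds per second over the whole utterance, pauses included."""
    if total_s <= 0:
        raise ValueError("total duration must be positive")
    return sound_count / total_s


# Hypothetical utterance: 70 sounds, 6.0 s in total, 1.0 s of pauses.
print(f"articulation rate: {articulation_rate(70, 6.0, 1.0):.1f} sound/s")
print(f"speech rate:       {speech_rate(70, 6.0):.1f} sound/s")
```

Whenever the utterance contains any pause, the divisor for the speech rate is larger, which is why the speech rate comes out lower.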

Speech frequency

The set of frequency values that occur in speech. Range: 50 Hz to 12000 Hz.

Speech synthesis

Artificial production of speech by machine. In general, written text is converted into synthesized speech (machine reading, text-to-speech, TTS).

Speech recognition by machine

Automatic recognition of speech by machine (ASR). In general, the recognized speech is converted into text. ASR is also used in security systems and in forensics.

Suppressed voice

The sound of the vocal cord vibration during the first part of a voiced stop. In this phase no sound flows out until the burst. The amplitude of the suppressed voice is low, and it contains no formants.

Suprasegmental structure

This is a theoretical concept. It is the part of the complex speech signal that includes the melody, emphasis, tempo, rhythm, volume, and timbre of speech, i.e., the complex prosodic structure. In human-pronounced speech, the segmental and suprasegmental structures are present simultaneously.

Specific duration

The theoretical duration characteristic of a speech sound, as a function of the sounds that precede and follow it in continuous speech.

Spectrum

The set of data that determines the frequencies, amplitudes, and phases of the components of a speech waveform. In the case of voiced speech sounds, the spectrum shows a pattern of lines at the frequency values of the harmonics. For noisy sounds, the spectrum is continuous because these sounds contain no harmonics. The change in the spectrum over time is shown by the sound spectrogram (visible speech or voice print).
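The line pattern of a voiced sound can be reproduced with a small experiment. The sketch below builds a synthetic periodic signal from three harmonics of 100 Hz and computes its amplitude spectrum with a plain discrete Fourier transform. The sample rate, the duration, the harmonic amplitudes, and the function name `magnitude_spectrum` are all assumptions chosen for illustration; real speech is far more complex.

```python
import cmath
import math


def magnitude_spectrum(samples, sample_rate, f_max):
    """Return (frequency_hz, amplitude) pairs from a direct DFT up to f_max."""
    n = len(samples)
    step = sample_rate / n  # frequency resolution in Hz
    spectrum = []
    k = 0
    while k * step <= f_max:
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        # 2 * |X[k]| / n recovers the sine amplitude for k > 0.
        spectrum.append((k * step, 2 * abs(acc) / n))
        k += 1
    return spectrum


SAMPLE_RATE = 8000             # Hz (assumed)
F0 = 100                       # Hz, typical male fundamental frequency
N = 800                        # 0.1 s of signal, giving 10 Hz resolution
AMPLITUDES = [1.0, 0.5, 0.25]  # made-up amplitudes of harmonics 1 to 3

signal = [
    sum(a * math.sin(2 * math.pi * F0 * (h + 1) * i / SAMPLE_RATE)
        for h, a in enumerate(AMPLITUDES))
    for i in range(N)
]

spectrum = magnitude_spectrum(signal, SAMPLE_RATE, f_max=500)
lines = [round(freq) for freq, amp in spectrum if amp > 0.1]
print(lines)
```

The signal length was chosen to hold a whole number of periods, so the energy falls exactly on the harmonic frequencies and the bins in between stay near zero. A direct DFT is slow for long signals; it is used here only to keep the example free of external libraries.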

Transition phase

Speech sounds mainly have three parts: two transition phases and the clear phase (the center of the sound). Speech sounds are connected to the adjacent speech sounds through their transition phases. This is closely related to the continuous movement of the articulatory organs: the transition phase is a consequence of co-articulation.

Triphone

A three-sound sequence in which the middle speech sound is present in its whole length and the two neighbouring ones only in half. It is used in speech synthesis as a waveform building element. CVC triphones are the type usually made.

V

Abbreviation for vowels in speech science. For consonants the notation C is used.

Voice surgery

Direct intervention in the time function of speech sounds (with sound wave editor programs), which can result in better voice quality or can change the meaning of the speech.