9th Speech in Noise Workshop, 5-6 January 2017, Oldenburg

Rhythm in plain and Lombard speech

Hans Rutger Bosker(a)
Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands

Martin Cooke
Ikerbasque (Basque Science Foundation), Bilbao, Spain

(a) Presenting
(b) Attending

Speech is an inherently rhythmic signal. Even though speech is by no means strictly periodic, energy patterns in speech are constrained by the physiological dynamics of the lips, jaw, and tongue. As such, energy fluctuations in speech typically occur within the 2-20 Hz range.

Recent research suggests that this rhythmicity in the speech signal plays a central role in comprehension, facilitating the processing of the signal. For instance, when the slow amplitude modulations present in typical speech are destroyed or filtered out, intelligibility drops considerably [1]. Neural mechanisms involving endogenous oscillations phase-locking to the energy fluctuations in speech have been suggested to account for these findings [2].

Given the beneficial effects of rhythmicity in comprehension, this study investigated whether speakers actually make use of increased rhythmicity in their speech (i.e., more regular alternations between high and low amplitude intervals) to improve their intelligibility in acoustically challenging listening conditions (e.g., background noise). Rhythmicity was operationalized by analyzing the modulation spectrum of speech, which represents the spectral content of the signal’s amplitude envelope.

Four different corpora were analyzed (varying sample sizes; varying numbers of talkers), each including plain speech (sentences produced in quiet) and matched Lombard speech (same sentences produced in noise). Each sentence was first normalized in amplitude by RMS scaling, thus avoiding intensity confounds. The envelope of the normalized signal was then submitted to a Fast Fourier Transform (FFT), resulting in the modulation spectrum of that sentence. Comparing the average modulation spectra of plain and Lombard speech revealed greater power in Lombard speech in the delta band (1-3 Hz) across all four corpora. Comparison with previous analyses of the speech rate in plain and Lombard speech revealed that this power difference in the delta band could not be attributed to overall slower speaking rates in Lombard speech.
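The analysis steps described above can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not say how the envelope was extracted, so rectification followed by block averaging is assumed here. All function names (rms_normalize, amplitude_envelope, modulation_spectrum, band_power, am_tone) are hypothetical. Synthetic amplitude-modulated tones stand in for recorded sentences, and a direct DFT replaces the FFT so that the example needs only the Python standard library.

```python
import cmath
import math


def rms_normalize(signal, target_rms=0.1):
    """Scale the waveform so that its RMS amplitude equals target_rms."""
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    return [x * target_rms / rms for x in signal]


def amplitude_envelope(signal, fs, env_fs=100):
    """Assumed envelope extraction: full-wave rectify, then average over
    consecutive blocks, which smooths and downsamples to env_fs Hz."""
    block = fs // env_fs
    n_blocks = len(signal) // block
    return [
        sum(abs(x) for x in signal[i * block:(i + 1) * block]) / block
        for i in range(n_blocks)
    ]


def modulation_spectrum(envelope, env_fs):
    """Power of the mean-removed envelope at each DFT bin up to Nyquist.
    A direct DFT is used for clarity; an FFT yields the same values."""
    n = len(envelope)
    mean = sum(envelope) / n
    centered = [e - mean for e in envelope]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            c * cmath.exp(-2j * math.pi * k * t / n)
            for t, c in enumerate(centered)
        )
        freqs.append(k * env_fs / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def band_power(freqs, power, lo=1.0, hi=3.0):
    """Summed modulation power between lo and hi Hz (delta band by default)."""
    return sum(p for f, p in zip(freqs, power) if lo <= f <= hi)


def am_tone(mod_hz, fs=8000, dur=2.0, carrier_hz=500.0, depth=0.8, gain=1.0):
    """Hypothetical test signal: a carrier tone modulated at mod_hz."""
    n = int(fs * dur)
    return [
        gain
        * (1 + depth * math.sin(2 * math.pi * mod_hz * i / fs))
        * math.sin(2 * math.pi * carrier_hz * i / fs)
        for i in range(n)
    ]


fs = 8000
signals = {
    "2 Hz modulation": am_tone(2.0, fs),
    "6 Hz modulation (louder)": am_tone(6.0, fs, gain=3.0),
}
for name, raw in signals.items():
    # RMS scaling removes the level difference before the envelope is taken.
    envelope = amplitude_envelope(rms_normalize(raw), fs)
    freqs, power = modulation_spectrum(envelope, env_fs=100)
    print(name, "delta-band power:", band_power(freqs, power))
```

Because both signals are scaled to the same RMS level first, the difference in delta-band power reflects only the rate of the amplitude modulation, not the overall intensity. This is the confound the RMS step is meant to remove.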

These findings suggest that speakers produce more rhythmic speech when talking in noise than when talking in quiet, particularly in the 1-3 Hz range. Results are discussed in terms of the functional role of rhythmicity in dialogue and potential underlying neurocognitive mechanisms (e.g., neural oscillatory dynamics).

[1] Ghitza, O. (2012). On the role of theta-driven syllabic parsing in decoding speech: intelligibility of speech with a manipulated modulation spectrum. Frontiers in Psychology, 3, 238.
[2] Doelling, K. B., Arnal, L. H., Ghitza, O., & Poeppel, D. (2014). Acoustic landmarks drive delta–theta oscillations to enable speech comprehension by facilitating perceptual parsing. NeuroImage, 85, 761-768.
