Computational auditory scene analysis in multi-talker environments
In many everyday situations, listeners are confronted with complex acoustic scenes, yet they are still able to follow and understand one particular talker. This contribution presents auditory models that aim to solve different speech-related tasks in multi-talker settings. The main characteristics of the models are: (1) restriction to salient auditory features (“glimpses”); (2) use of periodicity, periodic energy, and binaural features; and (3) template-based classification using clean speech models. Model performance is evaluated on the basis of existing human psychoacoustic data [e.g., Brungart and Simpson, Perception & Psychophysics, 2007, 69(1), 79-91]. The model results were found to be similar to the listeners’ results. This suggests that sparse glimpses of periodicity-related monaural and binaural auditory features provide sufficient information about a complex auditory scene involving multiple talkers. Furthermore, it can be concluded that clean speech models suffice to decode speech information from the glimpses derived from a complex scene; i.e., computationally complex models of sound source superposition are not required, even in complex scenes.
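The abstract does not specify how the models are implemented. As a rough, illustrative sketch only, the following Python/NumPy code shows one plausible way to select periodicity-based “glimpses” (frames whose normalized autocorrelation peak in a pitch-lag range exceeds a threshold) and to classify a glimpsed feature vector against clean templates by nearest-neighbor matching. All names, parameter values (frame length, hop, lag range, threshold), and the feature itself are hypothetical and are not taken from the paper.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames (hypothetical sizes)."""
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def periodicity(frame, lag_lo=20, lag_hi=200):
    """Peak of the normalized autocorrelation in a plausible pitch-lag range."""
    f = frame - frame.mean()
    ac = np.correlate(f, f, mode="full")[len(f) - 1 :]  # lags 0..N-1
    if ac[0] <= 0:          # silent frame: no periodicity
        return 0.0
    ac = ac / ac[0]         # normalize so lag 0 equals 1
    return float(ac[lag_lo:lag_hi].max())

def glimpse_mask(frames, thresh=0.5):
    """Mark frames whose periodicity exceeds a (hypothetical) threshold."""
    p = np.array([periodicity(fr) for fr in frames])
    return p >= thresh

def nearest_template(feature, templates):
    """Nearest clean-speech template by Euclidean distance (toy classifier)."""
    return min(templates, key=lambda label: np.linalg.norm(feature - templates[label]))

# Toy demonstration: a 100-Hz tone yields many glimpses, white noise few.
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 100 * t)
noise = np.random.default_rng(0).standard_normal(fs)
tone_rate = glimpse_mask(frame_signal(tone)).mean()
noise_rate = glimpse_mask(frame_signal(noise)).mean()
```

In this sketch, only the glimpsed frames would be passed on to the template-based classifier; everything between glimpses is simply ignored, which is the sparsity the abstract argues is sufficient.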