Predicting effects of additive noise and hearing-instrument signal processing on consonant recognition and confusions
The perception of consonants has been investigated in various studies and shown to depend critically on fine details in the stimuli. In the present study, a microscopic speech perception model is proposed, which combines an auditory processing front end with a correlation-based template matching back end. The model represents an extension of the auditory signal processing model by Dau et al. [(1997). J. Acoust. Soc. Am. 102, 2892–2905] towards predicting microscopic speech perception data. It was evaluated based on the extensive consonant perception data set provided by Zaar and Dau [(2015). J. Acoust. Soc. Am. 138, 1253–1267], which was obtained with normal-hearing (NH) listeners using 15 consonant-vowel combinations (CVs) mixed with white noise. Accurate predictions of the consonant recognition scores were obtained across a large range of signal-to-noise ratios. Furthermore, the model yielded convincing predictions of the consonant confusion scores, with the predicted errors clustering in perceptually plausible confusion groups.
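To illustrate the kind of decision stage described above, the following is a minimal, hypothetical sketch of a correlation-based template matching back end: a test stimulus's internal representation is compared against stored templates via normalized cross-correlation, and the template with the highest peak correlation determines the predicted response. The function name, the one-dimensional representation, and all parameters are illustrative assumptions, not the model's actual implementation.

```python
import numpy as np

def match_template(internal_rep, templates):
    """Illustrative sketch (not the published model): classify a test
    representation by normalized cross-correlation with stored templates.

    internal_rep : 1-D array, internal representation of the test stimulus
    templates    : dict mapping response labels to 1-D template arrays
    Returns the best-matching label and the per-template correlation scores.
    """
    scores = {}
    # Zero-mean the test representation before correlating
    a = internal_rep - internal_rep.mean()
    for label, tmpl in templates.items():
        b = tmpl - tmpl.mean()
        # Full cross-correlation tolerates small temporal misalignment
        xcorr = np.correlate(a, b, mode="full")
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        # Peak of the normalized cross-correlation as the similarity metric
        scores[label] = xcorr.max() / denom if denom > 0 else 0.0
    return max(scores, key=scores.get), scores
```

In the actual model, the representations compared in this way are the outputs of the auditory preprocessing front end rather than raw waveforms, which is what makes the comparison sensitive to perceptually relevant signal detail.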
The model was further evaluated with respect to perceptual artifacts induced by hearing-aid (HA) and simulated cochlear-implant (CI) processing in NH listeners. In terms of HA processing, effects of strong nonlinear frequency compression and impulse-noise suppression were measured in 10 NH listeners using CV stimuli. Regarding the simulated CI processing, the consonant perception data from DiNino et al. [(2016). J. Acoust. Soc. Am., under review] were considered, which were obtained with noise-vocoded vowel-consonant-vowel (VCV) stimuli in 12 NH listeners. Both the HA and the simulated CI processing induced strong perceptual confusions of specific consonants, whereas other consonants remained perceptually unaffected. The model predictions obtained for the two data sets showed close agreement with the perceptual data both in terms of consonant recognition and confusions, demonstrating the model's sensitivity to supra-threshold effects of hearing-instrument signal processing on consonant perception.
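For readers unfamiliar with the simulated CI processing mentioned above, a channel noise vocoder band-pass filters the signal into a few analysis channels, extracts each channel's temporal envelope, and uses it to modulate band-limited noise. The sketch below is a simplified, numpy-only illustration of this general principle (FFT-based filtering, assumed band edges and envelope cutoff); the actual vocoder parameters used by DiNino et al. are not reproduced here.

```python
import numpy as np

def noise_vocode(signal, fs, band_edges, env_cutoff=50.0, seed=0):
    """Simplified channel noise vocoder sketch (illustrative parameters).

    signal     : 1-D input waveform
    fs         : sampling rate in Hz
    band_edges : channel edge frequencies in Hz, e.g. [100, 500, 1500, 4000]
    env_cutoff : low-pass cutoff for envelope extraction in Hz
    """
    rng = np.random.default_rng(seed)
    n = len(signal)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    spec = np.fft.rfft(signal)
    out = np.zeros(n)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band_mask = (freqs >= lo) & (freqs < hi)
        # Band-pass the input channel via FFT masking
        band = np.fft.irfft(spec * band_mask, n)
        # Envelope: rectify, then low-pass below env_cutoff
        env = np.fft.irfft(np.fft.rfft(np.abs(band)) * (freqs < env_cutoff), n)
        env = np.clip(env, 0.0, None)
        # Carrier: white noise restricted to the same band
        carrier = np.fft.irfft(np.fft.rfft(rng.standard_normal(n)) * band_mask, n)
        out += env * carrier
    return out
```

Because only the slowly varying channel envelopes are retained while the fine structure is replaced by noise, such processing degrades some consonant cues (e.g., spectral detail) much more than others, consistent with the pattern of confusions reported above.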
Overall, the high predictive power of the proposed model suggests that adaptive processes in the auditory preprocessing, in combination with a cross-correlation based template-matching back end, can account for some of the processes underlying consonant perception in normal-hearing listeners. The proposed model may provide a valuable framework for the evaluation of hearing-instrument processing strategies, particularly when combined with simulations of individual hearing impairment.