A correlation metric in the envelope power spectrum domain for speech intelligibility prediction
Speech intelligibility prediction models are a powerful tool for investigating speech perception. Recently, the correlation-based speech-based envelope power spectrum model (sEPSMcorr) was presented, which applies a correlation-based back end to the output of an auditory preprocessing stage that is selective in both audio frequency and modulation frequency (Relaño-Iborra et al., 2016). The correlation back end extended the predictive power of earlier versions of the sEPSM framework (e.g., Jørgensen et al., 2013) to conditions of non-linear signal processing, such as phase jitter and ideal binary mask processing. Moreover, unlike other correlation-based models, the sEPSMcorr was shown to account for conditions with fluctuating interferers.
Here, the back end of the sEPSMcorr was combined with a more realistic auditory preprocessing front end adopted from the computational auditory signal processing and perception model (CASP; Jepsen et al., 2008). The preprocessing consists of outer- and middle-ear filtering and a non-linear auditory filterbank (the dual-resonance non-linear filterbank, DRNL; López-Poveda and Meddis, 2001), followed by inner hair-cell transduction, adaptation and a modulation filterbank.
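The general idea of a correlation back end operating on modulation-filtered envelopes can be illustrated with a deliberately simplified sketch. It is not the sEPSMcorr or CASP implementation: the outer- and middle-ear filters, the DRNL filterbank and the adaptation stage are omitted, inner hair-cell transduction is approximated by half-wave rectification and a moving-average low-pass, and the modulation filterbank is replaced by brick-wall FFT bands. All function names, the 150 Hz cutoff and the band edges are assumptions chosen for illustration only.

```python
import numpy as np


def envelope(signal, fs, cutoff_hz=150.0):
    """Crude stand-in for inner hair-cell transduction (assumed, not CASP):
    half-wave rectification followed by a moving-average low-pass."""
    rectified = np.maximum(signal, 0.0)
    # Window length of roughly 1/cutoff_hz seconds.
    win = max(1, int(round(fs / cutoff_hz)))
    kernel = np.ones(win) / win
    return np.convolve(rectified, kernel, mode="same")


def modulation_band(env, fs, f_lo, f_hi):
    """Hypothetical brick-wall modulation filter implemented in the FFT domain."""
    spectrum = np.fft.rfft(env)
    freqs = np.fft.rfftfreq(env.size, d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs >= f_hi)] = 0.0
    return np.fft.irfft(spectrum, n=env.size)


def correlation_metric(clean, degraded, fs,
                       bands=((1, 2), (2, 4), (4, 8), (8, 16))):
    """Mean normalized correlation between clean and degraded envelopes,
    computed per modulation band (band edges in Hz are illustrative)."""
    env_clean = envelope(clean, fs)
    env_degraded = envelope(degraded, fs)
    scores = []
    for f_lo, f_hi in bands:
        a = modulation_band(env_clean, fs, f_lo, f_hi)
        b = modulation_band(env_degraded, fs, f_lo, f_hi)
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        # The bands exclude DC, so this equals the Pearson correlation.
        scores.append(float(a @ b / denom) if denom > 0 else 0.0)
    return float(np.mean(scores))


if __name__ == "__main__":
    fs = 8000
    t = np.arange(2 * fs) / fs
    rng = np.random.default_rng(0)
    # Noise carrier with a slow amplitude modulation as a toy "speech" signal.
    clean = rng.standard_normal(t.size) * (1.0 + 0.5 * np.sin(2 * np.pi * 3 * t))
    noisy = clean + 5.0 * rng.standard_normal(t.size)
    print("clean vs clean:", correlation_metric(clean, clean, fs))
    print("clean vs noisy:", correlation_metric(clean, noisy, fs))
```

In this toy setting the metric is close to 1 when the degraded signal equals the clean reference and drops as masking noise is added. Mapping such a metric to intelligibility scores would require a further stage that is not shown here.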
The predictions were compared to measured data in conditions of additive masking noise, phase jitter distortions, reverberation and noise-reduction algorithms. The effects of the back end and of the individual preprocessing stages on the predicted results were analyzed. The modeling framework could be useful, for example, in the design and evaluation of speech transmission or hearing-instrument algorithms.