Modeling speech intelligibility based on envelopes derived from auditory spike trains
Speech intelligibility models aim to predict the human ability to understand speech in adverse listening conditions. The present work combines the back-end processing of the multi-resolution speech-based envelope power spectrum model (mr-sEPSM; Jørgensen et al., 2013) with an auditory nerve (AN) model (Zilany and Bruce, 2014). Signal-to-noise ratios in the envelope domain (SNRenv) were calculated for normal-hearing listeners from three envelope representations derived from the AN model: (i) instantaneous firing rates, (ii) peristimulus time histograms (PSTHs) of auditory nerve spike trains, and (iii) SUMCOR neural metrics (Heinz and Swaminathan, 2009). The resulting SNRenv patterns agreed well with the SNRenv patterns calculated from the acoustic (i.e., Hilbert) envelope (Heinz, 2016). Furthermore, speech intelligibility for normal-hearing listeners, based on envelopes derived from PSTHs, was predicted accurately for CLUE sentences (Nielsen and Dau, 2009) in speech-shaped noise (SSN), sinusoidally amplitude-modulated noise (SAM), and speech-like noise (ISTS; Holube et al., 2010). Introducing hearing loss in the model resulted in poorer predicted speech intelligibility. The work provides a foundation for quantitatively modeling individual effects of inner and outer hair cell loss on speech intelligibility.
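As a rough illustration of the quantities named above, the following Python sketch builds a PSTH from repeated spike trains and computes an SNRenv value in a single modulation band. It is not the implementation used in this work. It assumes the common sEPSM-style definition, SNRenv = (P_mix - P_noise) / P_noise, with the envelope power P normalized by the squared mean of the envelope, and it omits the multi-resolution segmentation and modulation filterbank of the mr-sEPSM. The function names (`psth`, `envelope_power`, `snr_env`), the rectangular band selection, and the `floor` value are hypothetical choices made for the example.

```python
import numpy as np


def psth(spike_trains, duration, bin_width):
    """Mean firing rate (spikes/s) per time bin, pooled over repeated spike trains."""
    n_bins = int(round(duration / bin_width))
    edges = np.linspace(0.0, duration, n_bins + 1)
    counts = np.zeros(n_bins)
    for spikes in spike_trains:
        counts += np.histogram(spikes, bins=edges)[0]
    return counts / (len(spike_trains) * bin_width)


def envelope_power(env, fs, f_lo, f_hi):
    """AC envelope power in the modulation band [f_lo, f_hi] Hz, divided by DC power."""
    env = np.asarray(env, dtype=float)
    n = env.size
    dc = env.mean()
    spectrum = np.fft.rfft(env - dc)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    # One-sided power via Parseval (valid for bands excluding 0 Hz and Nyquist).
    ac_power = 2.0 * np.sum(np.abs(spectrum[band]) ** 2) / n**2
    return ac_power / dc**2


def snr_env(env_mix, env_noise, fs, f_lo, f_hi, floor=1e-3):
    """SNRenv for one band; `floor` is an illustrative lower limit, not a published value."""
    p_mix = envelope_power(env_mix, fs, f_lo, f_hi)
    p_noise = max(envelope_power(env_noise, fs, f_lo, f_hi), floor)
    p_speech = max(p_mix - p_noise, floor)
    return p_speech / p_noise


# Usage with synthetic envelopes: a 4 Hz modulation with depth 0.5 (mixture)
# and depth 0.1 (noise alone), sampled at 1 kHz for 1 s.
fs = 1000
t = np.arange(fs) / fs
env_mix = 1.0 + 0.5 * np.cos(2 * np.pi * 4 * t)
env_noise = 1.0 + 0.1 * np.cos(2 * np.pi * 4 * t)
value = snr_env(env_mix, env_noise, fs, 2.0, 8.0)  # close to (0.125 - 0.005) / 0.005 = 24
```

In this sketch, the output of `psth` could be passed to `snr_env` in place of an acoustic envelope, with `fs` set to `1 / bin_width`.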