The role of short-time power and envelope power SNRs in psychoacoustic masking and speech intelligibility
Effects of spectral masking in psychoacoustics can be explained with the power spectrum model (PSM) of masking. Similarly, the envelope power spectrum model (EPSM) has been suggested to account for masking in the envelope domain. Recently, Biberger and Ewert [(2016). J. Acoust. Soc. Am. 140, 1023-1038] proposed the generalized power spectrum model (GPSM), which combines the concepts of the PSM and EPSM. The GPSM was shown to account for a broad variety of data from psychoacoustic and speech intelligibility (SI) experiments. In the GPSM, the PSM path of the model uses long-time power signal-to-noise ratios (SNRs), while the EPSM path uses short-time envelope power SNRs. A recent study by Schubotz et al. [(2016). J. Acoust. Soc. Am. 140, 524-540] showed that short-time power features are important to account for SI for several spectro-temporal manipulations of speech maskers and gender combinations of target and masker speakers. Here, an extension of the GPSM is suggested in which both the envelope power SNRs and the power SNRs are calculated on short-time scales. In contrast to Biberger and Ewert (2016), who combined envelope power SNRs and power SNRs additively, the two are combined here by a maximum operation, so that only the domain contributing most is considered. The proposed model is shown to account for a critical set of psychoacoustic experiments and for SI in a variety of noise- and speech-like maskers, reverberation, and spectral subtraction. Model predictions are compared to those of the extended speech intelligibility index (ESII) and the multi-resolution speech-based EPSM, demonstrating that the current approach has the highest predictive power of the three. The contributions of amplitude modulation masking and energetic masking in the different psychoacoustic and SI experiments are analyzed using the suggested model.
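The difference between the additive and the maximum combination can be illustrated with a minimal sketch. It is not the published model: the function names, the non-overlapping rectangular frames, the use of target power over masker power as the SNR, and the plain elementwise sum standing in for the "additive combination" are all assumptions made here for illustration. Auditory filtering, envelope extraction, and the decision stage are omitted.

```python
# Illustrative sketch only; all names and the framing scheme are hypothetical
# and are not taken from Biberger and Ewert (2016) or the present study.

def frame_power(x, frame_len):
    """Mean power of x in consecutive, non-overlapping frames."""
    return [
        sum(v * v for v in x[i:i + frame_len]) / frame_len
        for i in range(0, len(x) - frame_len + 1, frame_len)
    ]


def short_time_power_snr(target, masker, frame_len, eps=1e-12):
    """Per-frame ratio of target power to masker power (assumed SNR definition)."""
    p_target = frame_power(target, frame_len)
    p_masker = frame_power(masker, frame_len)
    # eps guards against division by zero in silent masker frames
    return [pt / (pm + eps) for pt, pm in zip(p_target, p_masker)]


def combine_additive(power_snr, envelope_snr):
    """Additive combination: both domains contribute in every frame."""
    return [p + e for p, e in zip(power_snr, envelope_snr)]


def combine_max(power_snr, envelope_snr):
    """Maximum combination: only the larger of the two SNRs is kept per frame."""
    return [max(p, e) for p, e in zip(power_snr, envelope_snr)]


# Toy per-frame SNRs (linear units), chosen arbitrarily.
power_snr = [1.0, 4.0]
envelope_snr = [2.0, 0.5]

additive = combine_additive(power_snr, envelope_snr)
maximum = combine_max(power_snr, envelope_snr)
```

With these toy values, the first frame is dominated by the envelope power SNR and the second by the power SNR. The maximum operation keeps only the dominant value in each frame, whereas the additive rule sums both.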