Performance evaluation of the short-time objective intelligibility measure with different band importance functions
Methods for speech intelligibility prediction are quickly becoming popular tools within the speech processing community. Such methods can easily and objectively estimate the effectiveness of different speech enhancement schemes. The short-time objective intelligibility (STOI) measure has enjoyed particular popularity due to its simplicity and its proven ability to provide accurate predictions across a wide range of conditions. The STOI measure has a simple structure, similar to that of many other intelligibility measures: 1) clean and degraded speech signals are split into one-third octave bands with a filter bank, 2) envelopes are extracted from each band, 3) the temporal correlation between clean and degraded envelopes is computed in short time segments, and 4) the correlation is averaged across time and frequency bands to obtain the final output. An unusual choice in the design of the STOI measure is that all frequency bands are weighted equally in the final measure. This is in contrast to classical methods such as the speech intelligibility index (SII), which employs empirically determined band importance functions (BIFs) that specify the relative contribution of each frequency band to intelligibility.
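Steps 3 and 4 of this structure, and the place where a BIF would enter, can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the published STOI implementation: it takes band envelopes as already computed (steps 1 and 2 are skipped), uses plain sliding segments, and leaves out any further processing the actual measure applies. The function names, the `seg_len` parameter, and the optional `bif` argument are hypothetical and introduced only for illustration.

```python
import math


def correlation(x, y):
    """Sample correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def band_scores(clean_env, degraded_env, seg_len):
    """Step 3: per band, average the short-time correlation over sliding segments.

    clean_env and degraded_env are lists of per-band envelopes (one list of
    samples per band); computing them is assumed to have happened already.
    """
    scores = []
    for clean, degraded in zip(clean_env, degraded_env):
        n_segments = len(clean) - seg_len + 1
        segment_corrs = [
            correlation(clean[i:i + seg_len], degraded[i:i + seg_len])
            for i in range(n_segments)
        ]
        scores.append(sum(segment_corrs) / len(segment_corrs))
    return scores


def weighted_score(scores, bif=None):
    """Step 4: combine band scores; bif=None gives the uniform weighting."""
    if bif is None:
        bif = [1.0] * len(scores)
    return sum(w * s for w, s in zip(bif, scores)) / sum(bif)
```

With `bif=None` every band contributes equally, which corresponds to the uniform weighting described above; passing a list of non-negative weights replaces the plain average with a BIF-weighted one.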
In this study we investigated the use of BIFs in the STOI measure. BIFs were fitted to several datasets of measured intelligibility so as to minimize the root-mean-squared prediction error. We then performed a cross-evaluation of the obtained BIFs on all datasets, using three different performance measures: root-mean-squared error, Pearson correlation, and Kendall rank correlation. The results show substantially improved performance when fitting and evaluating on the same dataset. However, this advantage does not necessarily persist when fitting and evaluating on different datasets. When the datasets used for fitting and evaluation differ greatly, poor performance may result. In contrast, the uniform BIF used in the original STOI measure leads to reasonable performance across all datasets. We therefore conclude that, while the prediction performance of the STOI measure can be improved considerably under some conditions by the use of fitted BIFs, this should be done with caution.
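For reference, the three performance measures named above can be sketched in plain Python as follows. This is an illustrative sketch rather than the evaluation code used in the study: the function names are hypothetical, predictions and measured intelligibility are assumed to be on the same scale, and the Kendall rank correlation is the simple variant without tie correction, which is an assumption since the variant is not specified here.

```python
import math
from itertools import combinations


def rmse(predicted, measured):
    """Root-mean-squared error between predicted and measured intelligibility."""
    n = len(predicted)
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n)


def pearson(predicted, measured):
    """Pearson correlation: strength of the linear relation."""
    n = len(predicted)
    mp = sum(predicted) / n
    mm = sum(measured) / n
    cov = sum((p - mp) * (m - mm) for p, m in zip(predicted, measured))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_m = sum((m - mm) ** 2 for m in measured)
    return cov / math.sqrt(var_p * var_m)


def kendall_tau(predicted, measured):
    """Kendall rank correlation (no tie correction): agreement in ordering."""
    concordant = 0
    discordant = 0
    for (p1, m1), (p2, m2) in combinations(zip(predicted, measured), 2):
        sign = (p1 - p2) * (m1 - m2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(predicted)
    return (concordant - discordant) / (n * (n - 1) / 2)
```

The three measures capture different aspects of performance: the error is sensitive to the absolute scale of the predictions, the Pearson correlation only to their linear relation with the measurements, and the rank correlation only to whether conditions are ordered correctly.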