Using models to evaluate whether spectral centroid could play a role in F0 and VTL perception in acoustic and electric hearing
Normal-hearing (NH) listeners rely on two principal voice characteristics, voice pitch (F0) and vocal-tract length (VTL), to segregate voices in cocktail-party situations. Cochlear implant (CI) listeners have been shown to have much larger F0 and VTL discrimination thresholds than NH listeners. However, the mechanisms underlying the perception of these cues, in CI users and in NH listeners alike, remain largely unknown.
In implants, while some studies argue that F0 can be coded temporally, other recent studies have suggested that spectral centroid (SC) could be used instead. When F0 changes, the lower-frequency channels of the implant are excited to a greater or lesser degree, thus shifting the SC. Similarly, while some researchers argue that VTL is perceived through its effect on individual formants, others have argued that, like musical timbre, VTL perception might also rely on SC.
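To make the SC cue concrete, the following is a minimal sketch of an amplitude-weighted spectral centroid computed for a harmonic complex. It is not the auditory model used in this study: the single-formant Gaussian envelope, its 1000 Hz centre, its one-octave width, the 8000 Hz upper limit, and the function names `envelope` and `spectral_centroid` are all hypothetical choices made for illustration only.

```python
import math


def envelope(freq_hz, vtl_scale=1.0):
    """Hypothetical one-formant spectral envelope (illustrative assumption).

    A Gaussian on a log-frequency axis, centred at 1000 Hz * vtl_scale.
    A vtl_scale above 1 mimics a shorter vocal tract by moving the
    envelope up in frequency.
    """
    centre_hz = 1000.0 * vtl_scale
    width_octaves = 1.0
    distance_octaves = math.log2(freq_hz / centre_hz)
    return math.exp(-0.5 * (distance_octaves / width_octaves) ** 2)


def spectral_centroid(f0_hz, vtl_scale=1.0, max_hz=8000.0):
    """Amplitude-weighted mean frequency of the harmonics of f0_hz."""
    n_harmonics = int(max_hz // f0_hz)
    freqs = [k * f0_hz for k in range(1, n_harmonics + 1)]
    amps = [envelope(f, vtl_scale) for f in freqs]
    return sum(f * a for f, a in zip(freqs, amps)) / sum(amps)


# Compare a reference voice with an F0-shifted and a VTL-shifted version.
reference = spectral_centroid(f0_hz=100.0)
f0_shifted = spectral_centroid(f0_hz=120.0)
vtl_shifted = spectral_centroid(f0_hz=100.0, vtl_scale=1.2)
print(f"reference SC:   {reference:.1f} Hz")
print(f"F0-shifted SC:  {f0_shifted:.1f} Hz")
print(f"VTL-shifted SC: {vtl_shifted:.1f} Hz")
```

In this toy setting, scaling the envelope moves the centroid with it, while changing F0 only changes where the fixed envelope is sampled. A CI-oriented version would replace the harmonic amplitudes with per-channel excitation levels, which is where an F0 change could alter the balance of the low-frequency channels.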
However, both of these assumptions stem from observations made with steady-state stimuli. In natural speech, the formant trajectories create a tremendous SC variability, which may blur small F0 and/or VTL differences. Using basic auditory models, the variability of perceptual SC in natural speech was evaluated and compared to the effects of VTL and F0 variations.