Feature statistics normalization in the cepstral domain is one of the best-performing approaches to robust Automatic Speech Recognition (ASR) in noisy acoustic scenarios. In this approach, feature coefficients are normalized through suitable linear or nonlinear transformations so that the statistics of noisy speech match those of clean speech. Histogram Equalization (HEQ) is an effective algorithm in this category. Some of the authors recently proposed an extension of the original HEQ algorithm that exploits the multichannel audio information provided by multiple microphones in far-field acoustic scenarios. In this paper, the feature normalization capabilities of the multichannel HEQ technique are further enhanced by introducing kernel estimation and by employing multi-condition training to parametrize the ASR system. Computer simulations based on the Aurora 2 database show that a significant recognition improvement can be achieved with respect to the single-channel counterpart and to other multichannel techniques, confirming the effectiveness of the idea.
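As an illustration of the general idea behind histogram equalization of features, the following is a minimal single-channel sketch and not the multichannel, kernel-based method of the paper. It assumes a standard Gaussian as the reference (clean) distribution and estimates the cumulative distribution of each coefficient from ranks within the utterance; the function name `heq` and the array layout (frames by coefficients) are hypothetical choices made for this example.

```python
import numpy as np
from statistics import NormalDist


def heq(features: np.ndarray) -> np.ndarray:
    """Map each coefficient (column) of a (frames x coeffs) array onto N(0, 1).

    Assumption: the reference distribution is a standard Gaussian; a real
    system might instead use a histogram estimated from clean training data.
    """
    n_frames = features.shape[0]
    # Rank of every frame within its own column (0 .. n_frames - 1).
    ranks = features.argsort(axis=0).argsort(axis=0)
    # Empirical CDF value of each frame, kept strictly inside (0, 1).
    cdf = (ranks + 0.5) / n_frames
    # Inverse CDF of the reference distribution, applied element-wise.
    inverse_reference_cdf = np.vectorize(NormalDist().inv_cdf)
    return inverse_reference_cdf(cdf)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic "noisy" features: shifted and scaled relative to the reference.
    noisy = 3.0 + 2.0 * rng.standard_normal((500, 2))
    equalized = heq(noisy)
    print("mean before/after:", noisy.mean(), equalized.mean())
    print("std  before/after:", noisy.std(), equalized.std())
```

Because the mapping is monotonic in each coefficient, the ordering of frames is preserved while the overall statistics are pulled toward the reference distribution.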
Enhanced Multichannel Histogram Equalization for Speech Recognition in noisy acoustic conditions / Principi, Emanuele; Rotili, R.; Squartini, Stefano. - Volume 234 (2011), pp. 149-161. [10.3233/978-1-60750-972-1-149]
Enhanced Multichannel Histogram Equalization for Speech Recognition in noisy acoustic conditions
PRINCIPI, EMANUELE;SQUARTINI, Stefano
2011-01-01