Enhanced Multichannel Histogram Equalization for Speech Recognition in noisy acoustic conditions / Principi, Emanuele; Rotili, R.; Squartini, Stefano. - Vol. 234 (2011), pp. 149-161. [10.3233/978-1-60750-972-1-149]

Enhanced Multichannel Histogram Equalization for Speech Recognition in noisy acoustic conditions

Principi, Emanuele; Squartini, Stefano
2011-01-01

Abstract

Feature statistics normalization in the cepstral domain is one of the best-performing approaches to robust Automatic Speech Recognition (ASR) in noisy acoustic scenarios. In this approach, feature coefficients are normalized through suitable linear or nonlinear transformations so that the statistics of noisy speech match those of clean speech. Histogram Equalization (HEQ) is an effective algorithm in this category. Some of the authors recently proposed an extension of the original HEQ algorithm that handles the multichannel audio captured by multiple microphones in far-field acoustic scenarios. In this paper, the feature normalization capabilities of the multichannel HEQ technique are further enhanced by introducing the kernel estimation technique and by employing multi-condition training to parametrize the ASR system. Computer simulations on the Aurora 2 database show a significant improvement in recognition performance over the single-channel counterpart and over other multichannel techniques, confirming the effectiveness of the idea.
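As a rough illustration of the general idea behind HEQ with a kernel-based estimate, the sketch below equalizes one stream of feature values: it estimates their cumulative distribution with Gaussian kernels and maps each value through the inverse CDF of a reference distribution. It is a minimal single-channel sketch, not the authors' multichannel algorithm. The function names `kernel_cdf` and `equalize`, the standard Gaussian reference, the rule-of-thumb bandwidth, and the clipping constant `eps` are all assumptions made for illustration.

```python
from statistics import NormalDist, stdev


def kernel_cdf(samples, bandwidth=None):
    """Return a smooth CDF estimate built from Gaussian kernels centred on the samples."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb; an assumption, not a value from the paper.
        # Needs at least two samples that are not all identical.
        bandwidth = 1.06 * stdev(samples) * n ** (-1 / 5)
    kernel = NormalDist(0.0, bandwidth)

    def cdf(x):
        # Average of the kernel CDFs evaluated at the distance to each sample.
        return sum(kernel.cdf(x - s) for s in samples) / n

    return cdf


def equalize(values, reference=NormalDist(0.0, 1.0), eps=1e-6):
    """Map values so that their distribution approximates the reference one."""
    cdf = kernel_cdf(values)
    equalized = []
    for v in values:
        # Keep the probability strictly inside (0, 1) so inv_cdf is defined.
        p = min(max(cdf(v), eps), 1.0 - eps)
        equalized.append(reference.inv_cdf(p))
    return equalized


if __name__ == "__main__":
    # Hypothetical values of one cepstral coefficient over a few frames.
    noisy = [5.0, 1.0, 3.0, 9.0, 7.0]
    print(equalize(noisy))
```

Because the estimated CDF and the inverse reference CDF are both increasing, the mapping preserves the ordering of the input values and changes only their distribution. In a recognizer, such a transformation would typically be applied to each feature dimension separately.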
2011
Proceedings of the 21st Italian Workshop on Neural Nets - Frontiers in Artificial Intelligence and Applications
9781607509714

Use this identifier to cite or link to this document: https://hdl.handle.net/11566/65522