Efficient SNR driven SPLICE implementation for robust speech recognition / Squartini, Stefano; Principi, Emanuele; Cifani, S.; Rotili, R.; Piazza, Francesco. - LNCS volume n. 6800:(2011), pp. 70-80. [10.1007/978-3-642-25775-9_6]
Efficient SNR driven SPLICE implementation for robust speech recognition
SQUARTINI, Stefano;PRINCIPI, EMANUELE;PIAZZA, Francesco
2011-01-01
Abstract
The SPLICE algorithm was recently proposed in the literature to address the robustness issue in Automatic Speech Recognition (ASR), and several variants have since been proposed to overcome some drawbacks of the original technique. In this presentation an innovative and efficient solution is discussed: it is based on SNR estimation in the frequency or mel domain and investigates the use of different noise types for GMM training, in order to maximize the generalization capabilities of the tool and therefore the recognition performance in the presence of unknown noise sources. Computer simulations conducted on the AURORA2 database seem to confirm the effectiveness of the idea: the proposed approach yields accuracy similar to that of the reference approach, even though it employs a simpler mismatch compensation paradigm that does not need any a priori knowledge of the noises used in the training phase.