Extreme Learning Machine (ELM) is a popular paradigm for training feedforward neural networks because of its short training time. This paper applies the technique to the automatic classification of speech utterances. Power Normalized Cepstral Coefficients (PNCC) are used as feature vectors, and an ELM performs the final classification. Both the baseline ELM algorithm and the kernel-based ELM have been tested. Because the ELM has a fixed number of input neurons, a length normalization algorithm transforms each PNCC sequence into a vector of fixed length. Length normalization has been performed using two techniques: the first is based on Dynamic Time Warping (DTW) distances, the second on the vectorized outer product of the trajectory matrix. Experiments have been conducted on the TIDIGITS corpus, to assess performance on an isolated speech recognition task, and on ITAAL, to validate the system on an emergency detection task in realistic acoustic conditions. The ELM approach has been compared with template matching based on DTW and with a Support Vector Machine based speech recognizer. The results demonstrate the effectiveness of the approach in terms of both recognition performance and execution time. In particular, classification based on PNCCs, DTW distances and kernel ELM was the best-performing algorithm in terms of both recognition accuracy and execution time.
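
The DTW-based length normalization and the baseline ELM classifier described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the template set, the activation function, the number of hidden neurons or the DTW step pattern, so the choices below (one template per class, `tanh` activation, 50 hidden neurons, a symmetric step pattern with Euclidean frame distance) are assumptions. PNCC extraction is omitted and random arrays stand in for the feature sequences. The names `dtw_distance`, `dtw_features` and `BasicELM` are hypothetical.

```python
import numpy as np


def dtw_distance(a, b):
    """DTW distance between two sequences of frames, shapes (Ta, d) and (Tb, d)."""
    ta, tb = len(a), len(b)
    # Euclidean distance between every pair of frames.
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    acc = np.full((ta + 1, tb + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, ta + 1):
        for j in range(1, tb + 1):
            # Symmetric step pattern (an assumption, see the lead-in).
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[ta, tb]


def dtw_features(seq, templates):
    """Map a variable-length sequence to one DTW distance per template."""
    return np.array([dtw_distance(seq, t) for t in templates])


class BasicELM:
    """Single hidden layer with random fixed weights and least-squares output weights."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y, n_classes):
        # Input weights and biases are drawn once and never trained.
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        targets = np.eye(n_classes)[y]  # one-hot class targets
        # Output weights via the Moore-Penrose pseudo-inverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ targets
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)


# Synthetic stand-ins for PNCC sequences: 13 coefficients per frame (assumed),
# variable length, with a class-dependent offset so the classes differ.
rng = np.random.default_rng(0)


def fake_sequence(label):
    length = int(rng.integers(20, 40))
    return rng.standard_normal((length, 13)) + label


templates = [fake_sequence(c) for c in range(3)]  # one template per class
y = np.repeat(np.arange(3), 10)
X = np.stack([dtw_features(fake_sequence(c), templates) for c in y])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # scale so tanh does not saturate

elm = BasicELM(n_hidden=50).fit(X, y, n_classes=3)
pred = elm.predict(X)
print("training accuracy:", np.mean(pred == y))
```

Every utterance is mapped to a vector with one entry per template, whatever its duration, which gives the fixed-size input the ELM requires. The kernel ELM variant and the outer-product normalization mentioned in the abstract are not covered by this sketch.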

Acoustic Template-Matching for Automatic Emergency State Detection: an ELM based algorithm / Principi, Emanuele; Squartini, Stefano; Cambria, E.; Piazza, Francesco. - In: NEUROCOMPUTING. - ISSN 0925-2312. - ELECTRONIC. - Volume 149, Part A (2015), pp. 426-434. [10.1016/j.neucom.2014.01.067]

Acoustic Template-Matching for Automatic Emergency State Detection: an ELM based algorithm

PRINCIPI, Emanuele; SQUARTINI, Stefano; PIAZZA, Francesco
2015-01-01

Use this identifier to cite or link to this document: https://hdl.handle.net/11566/153902