Acoustic novelty detection aims at identifying abnormal/novel acoustic signals which differ from the reference/normal data that the system was trained with. In this paper, we present a novel approach based on non-linear predictive denoising autoencoders. In our approach, auditory spectral features of the next short-term frame are predicted from the previous frames by means of Long Short-Term Memory (LSTM) recurrent denoising autoencoders. We show that this yields an effective generative model for audio. The reconstruction error between the input and the output of the autoencoder is used as an activation signal to detect novel events. The autoencoder is trained on a public database which contains recordings of typical in-home situations such as talking, watching television, playing and eating. The evaluation was performed on more than 260 different abnormal events. We compare results with state-of-the-art methods and conclude that our novel approach significantly outperforms existing methods, achieving up to 94.4% F-measure.
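The detection stage described in the abstract (predict the next feature frame, measure the prediction error, flag frames whose error exceeds a threshold, then score the detections with the F-measure) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the trained LSTM denoising autoencoder is replaced by a hypothetical persistence predictor (`persistence_predict`) that simply copies the previous frame, the error is assumed to be the Euclidean distance, and the threshold rule (`beta` times the mean error on normal data) as well as the toy frames are invented for illustration.

```python
import math
from statistics import mean
from typing import List, Sequence

Frame = Sequence[float]


def persistence_predict(frames: Sequence[Frame]) -> List[Frame]:
    """Hypothetical stand-in for the trained LSTM predictor: predict frame t
    by copying frame t-1. The paper uses an LSTM denoising autoencoder here."""
    return [frames[t - 1] for t in range(1, len(frames))]


def prediction_errors(frames: Sequence[Frame], predictions: Sequence[Frame]) -> List[float]:
    """Assumed metric: Euclidean error between each actual frame t >= 1 and its prediction."""
    return [math.dist(actual, pred) for actual, pred in zip(frames[1:], predictions)]


def threshold_from_normal(normal_errors: Sequence[float], beta: float) -> float:
    """Assumed (hypothetical) rule: beta times the mean error observed on normal data."""
    return beta * mean(normal_errors)


def detect(errors: Sequence[float], threshold: float) -> List[bool]:
    """Flag a frame as novel when its prediction error exceeds the threshold."""
    return [e > threshold for e in errors]


def f_measure(detected: Sequence[bool], truth: Sequence[bool]) -> float:
    """Frame-level F-measure: harmonic mean of precision and recall."""
    tp = sum(d and t for d, t in zip(detected, truth))
    fp = sum(d and not t for d, t in zip(detected, truth))
    fn = sum(t and not d for d, t in zip(detected, truth))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy 2-dimensional "spectral" frames (invented); frames 3 and 4 form a novel event.
frames = [[1.0, 1.0], [1.0, 1.1], [1.1, 1.0], [5.0, 4.0], [5.0, 4.1], [1.0, 1.0]]
truth = [False, False, True, True, False]  # ground-truth labels for frames 1..5

errors = prediction_errors(frames, persistence_predict(frames))
theta = threshold_from_normal([0.1, 0.2, 0.15], beta=2.0)
detected = detect(errors, theta)
print(detected)                    # [False, False, True, False, True]
print(f_measure(detected, truth))  # 0.5
```

With this naive stand-in predictor, the error spikes at both the onset (frame 3) and the offset (frame 5) of the toy event and stays low inside it (frame 4), giving an F-measure of 0.5. Since the detector only sees how badly each frame was predicted, the quality of the predictor largely determines detection performance.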

Non-Linear Prediction with LSTM Recurrent Neural Networks for Acoustic Novelty Detection / Marchi, E.; Vesperini, Fabio; Weninger, F.; Eyben, F.; Squartini, Stefano; Schuller, B. (2015). Paper presented at the International Joint Conference on Neural Networks (IJCNN 2015), held in Killarney, Ireland, 12-17 July 2015. [10.1109/IJCNN.2015.7280757].

Non-Linear Prediction with LSTM Recurrent Neural Networks for Acoustic Novelty Detection

Vesperini, Fabio; Squartini, Stefano
2015-01-01

Year: 2015
ISBN: 978-147991960-4
Files in this record:
There are no files associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11566/230587
Warning! The data displayed have not been validated by the university.

Citations
  • PubMed Central: not available
  • Scopus: 59
  • Web of Science (ISI): 40