This paper concerns the exploitation of multi-resolution time-frequency features via Wavelet Packet Transform to improve audio onset detection. In our approach, Wavelet Packet Energy Coefficients (WPEC) and Auditory Spectral Features (ASF) are processed by Bidirectional Long Short-Term Memory (BLSTM) recurrent neural network that yields the onsets location. The combination of the two feature sets, together with the BLSTM based detector, form an advanced energy-based approach that takes advantage from the multi-resolution analysis given by the wavelet decomposition of the audio input signal. The neural network is trained with a large database of onset data covering various genres and onset types. Due to its data-driven nature, our approach does not require the onset detection method and its parameters to be tuned to a particular type of music. We show a comparison with other types and sizes of recurrent neural networks and we compare results with state-of-the-art methods on the whole onset dataset. We conclude that our approach significantly increase performance in terms of F-measure without any music genres or onset type constraints.

Audio Onset Detection: A Wavelet Packet Based Approach with Recurrent Neural Networks / E., Marchi; Ferroni, Giacomo; F., Eyben; Squartini, Stefano; B., Schuller. - (2014). (Intervento presentato al convegno IJCNN 2014 tenutosi a Beijing, China nel July 6-11 2014) [10.1109/IJCNN.2014.6889669].

Audio Onset Detection: A Wavelet Packet Based Approach with Recurrent Neural Networks

FERRONI, GIACOMO;SQUARTINI, Stefano;
2014-01-01

Abstract

This paper concerns the exploitation of multi-resolution time-frequency features via Wavelet Packet Transform to improve audio onset detection. In our approach, Wavelet Packet Energy Coefficients (WPEC) and Auditory Spectral Features (ASF) are processed by Bidirectional Long Short-Term Memory (BLSTM) recurrent neural network that yields the onsets location. The combination of the two feature sets, together with the BLSTM based detector, form an advanced energy-based approach that takes advantage from the multi-resolution analysis given by the wavelet decomposition of the audio input signal. The neural network is trained with a large database of onset data covering various genres and onset types. Due to its data-driven nature, our approach does not require the onset detection method and its parameters to be tuned to a particular type of music. We show a comparison with other types and sizes of recurrent neural networks and we compare results with state-of-the-art methods on the whole onset dataset. We conclude that our approach significantly increase performance in terms of F-measure without any music genres or onset type constraints.
2014
978-1-4799-1484-5
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11566/153905
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 8
  • ???jsp.display-item.citation.isi??? 5
social impact