Graph Node Embeddings for ontology-aware Sound Event Classification: an evaluation study / Aironi, C.; Cornell, S.; Principi, E.; Squartini, S. - 2022-:(2022), pp. 414-418. (Paper presented at the 30th European Signal Processing Conference, EUSIPCO 2022, held in Serbia in 2022).

Graph Node Embeddings for ontology-aware Sound Event Classification: an evaluation study

Aironi C.; Cornell S.; Principi E.; Squartini S.
2022-01-01

Abstract

Multi-label Sound Event Classification (SEC) is a challenging task that requires handling multiple co-occurring sound event classes. Recent works proposed an ontology-aware framework for SEC in which a Graph Neural Network (GNN) is trained to exploit label co-occurrence information and improve the performance of a standard audio-feature-based classifier via late fusion. This GNN is fed a graph-based representation of the training set labels. In this paper we adopt this framework and perform an in-depth study of how the label embeddings used to construct the graph representation affect performance. We perform our experiments on the FSD50K dataset and compare different embedding strategies: two from previous works and two that have not yet been considered for SEC applications. Our results show that node2vec embeddings lead to substantial performance improvements over the other embedding strategies used in previous ontology-aware SEC works. Our best node2vec model yields an absolute improvement of 3.39% in mean average precision over the best competing embedding strategy, with fewer trainable parameters.
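
As a rough illustration of what a "graph-based representation of the training set labels" could look like, the sketch below builds a directed label co-occurrence graph from per-clip label sets. The abstract does not say how the graph is constructed, so the conditional-probability edge weighting, the function name `cooccurrence_graph`, the `threshold` parameter, and the toy labels are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(clip_labels, threshold=0.0):
    """Return directed edges {(a, b): P(b | a)} estimated from label sets.

    Hypothetical helper: the edge weight is the fraction of clips tagged
    with label `a` that are also tagged with label `b`.
    """
    label_counts = Counter()
    pair_counts = Counter()
    for labels in clip_labels:
        unique = sorted(set(labels))  # ignore repeated labels within a clip
        label_counts.update(unique)
        for a, b in combinations(unique, 2):
            # Count the pair in both directions; the weights differ later
            # because each direction is normalised by a different label.
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1

    edges = {}
    for (a, b), n in pair_counts.items():
        weight = n / label_counts[a]
        if weight > threshold:  # optionally prune weak edges
            edges[(a, b)] = weight
    return edges


# Toy multi-label annotations (invented, not taken from FSD50K).
toy_clips = [
    ["Dog", "Bark"],
    ["Dog", "Bark", "Growling"],
    ["Dog"],
    ["Rain", "Thunder"],
]
graph = cooccurrence_graph(toy_clips)
```

In a pipeline like the one the abstract describes, each node of such a graph would additionally carry a label embedding (for example one produced by node2vec) before being passed to the GNN; that step is omitted here.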

Use this identifier to cite or link to this document: https://hdl.handle.net/11566/309930