A neural network based algorithm for speaker localization in a multi-room environment / Vesperini, Fabio; Vecchiotti, Paolo; Principi, Emanuele; Squartini, Stefano; Piazza, Francesco. - ELECTRONIC. - (2016), pp. 1-6. (Paper presented at the MLSP 2016 conference, held in Vietri sul Mare, Salerno, Italy, 13-16 Sept. 2016) [10.1109/MLSP.2016.7738817].

A neural network based algorithm for speaker localization in a multi-room environment

Vesperini, Fabio; Vecchiotti, Paolo; Principi, Emanuele; Squartini, Stefano; Piazza, Francesco
2016-01-01

Abstract

This paper proposes a neural-network-based speaker localization algorithm for multi-room domestic scenarios. The approach is fully data-driven: a neural network fed with GCC-PHAT (Generalized Cross-Correlation with Phase Transform) patterns, computed from the microphone signals, estimates the speaker position in the room under analysis. We address a multi-room case study in which the acoustic scene of each room is influenced by sounds emitted in the other rooms. The algorithm is evaluated on the DIRHA dataset, recorded in a real home and comprising multiple wall and ceiling microphone signals for each room, and we focus on speaker localization in two distinct neighbouring rooms. The experiments assume the presence of an oracle multi-room Voice Activity Detector (VAD). A three-stage optimization procedure is adopted to find the best network configuration and combination of GCC-PHAT patterns. An algorithm based on Time Difference of Arrival (TDOA), recently proposed in the literature for the same application context, is used as the term of comparison. The proposed algorithm outperforms this reference, achieving an average localization error (RMSE) of 525 mm against 1465 mm. Finally, we assess the algorithm's performance when a real VAD, recently proposed by some of the authors, is used. Although localization accuracy degrades (average RMSE of 770 mm), the result remains a clear improvement over the state of the art.
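
As a rough illustration of the two quantities named in the abstract, the sketch below computes a GCC-PHAT pattern for one microphone pair and an RMSE over estimated positions. It is not the authors' implementation: the function names gcc_phat and localization_rmse, the lag window, the synthetic test signal and the choice of RMSE as the root of the mean squared Euclidean distance are all assumptions made here for illustration, since the abstract does not specify them.

```python
import numpy as np


def gcc_phat(x, y, max_lag):
    """GCC-PHAT of two signals for lags -max_lag..max_lag (in samples).

    Hypothetical helper: the peak lag is the delay of x relative to y.
    """
    n = len(x) + len(y)                    # zero-pad to limit circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)                 # cross-power spectrum
    cross /= np.abs(cross) + 1e-12         # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that index 0 corresponds to lag -max_lag.
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))


def localization_rmse(estimated, reference):
    """RMSE between estimated and reference positions (assumed definition)."""
    estimated = np.asarray(estimated, dtype=float)
    reference = np.asarray(reference, dtype=float)
    squared_dist = np.sum((estimated - reference) ** 2, axis=1)
    return float(np.sqrt(np.mean(squared_dist)))


# Synthetic check: the second "microphone" receives the same noise 5 samples later.
rng = np.random.default_rng(0)
source = rng.standard_normal(1024)
delay = 5
mic_a = source
mic_b = np.concatenate((np.zeros(delay), source[:-delay]))

max_lag = 16
pattern = gcc_phat(mic_b, mic_a, max_lag)
estimated_delay = int(np.argmax(pattern)) - max_lag

# Toy positions in millimetres, purely illustrative.
error_mm = localization_rmse([[0.0, 0.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]])
```

In a setup like the one described, one such pattern per microphone pair would presumably be stacked into the network input, but the exact pairing and pattern length used in the paper are not stated in this record.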
2016
978-1-5090-0746-2

Use this identifier to cite or link to this document: https://hdl.handle.net/11566/239800
Citations
  • PMC: ND
  • Scopus: 59
  • ISI: 3