Dominance Detection in A Reverberated Acoustic Scenario / Principi, Emanuele; Rotili, R.; Woellmer, M.; Squartini, Stefano; Schuller, B. - Volume 7367 LNCS, Issue PART 1 (2012), pp. 394-402. [10.1007/978-3-642-31346-2_45]

Dominance Detection in A Reverberated Acoustic Scenario

Principi, Emanuele; Squartini, Stefano
2012-01-01

Abstract

This work proposes a dominance detection framework that operates in reverberant environments. The framework consists of a speech enhancement front-end, which automatically reduces the distortions that room reverberation introduces into the speech signals, and a dominance detector, which processes the enhanced signals and estimates the most and the least dominant person in a segment. The front-end comprises three cooperating blocks: speaker diarization, room impulse response identification, and speech dereverberation. The dominance estimation algorithm is based on bidirectional Long Short-Term Memory networks, which allow for context-sensitive activity classification from audio feature functionals extracted with openSMILE, a real-time speech feature extraction toolkit. Experiments were performed on a suitably reverberated version of the DOME dataset. Averaged over the reverberated conditions addressed, the absolute accuracy improvement is 32.68% in the most dominant person estimation task and 36.56% in the least dominant person estimation task, both in the case of full agreement among annotators.
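As a reading aid for the reported figures, the sketch below shows one plausible way to compute an absolute accuracy improvement averaged over reverberated conditions: the per-condition difference in percentage points between the enhanced and the unprocessed system, followed by an arithmetic mean. The function name, the condition labels, and all accuracy values are hypothetical placeholders chosen for illustration. They are not taken from the paper, and the paper's exact averaging procedure is assumed rather than confirmed.

```python
from statistics import mean


def mean_absolute_improvement(baseline, enhanced):
    """Mean over conditions of (enhanced - baseline) accuracy, in percentage points.

    Both arguments map a condition label to an accuracy expressed in percent.
    """
    if baseline.keys() != enhanced.keys():
        raise ValueError("both systems must be scored on the same conditions")
    return mean(enhanced[c] - baseline[c] for c in baseline)


# Placeholder accuracies (%) for three hypothetical reverberated conditions.
# These numbers are illustrative only and are NOT results from the paper.
baseline = {"room_A": 40.0, "room_B": 50.0, "room_C": 45.0}
enhanced = {"room_A": 70.0, "room_B": 80.0, "room_C": 81.0}

improvement = mean_absolute_improvement(baseline, enhanced)
print(f"{improvement:.2f} percentage points")
```

An absolute improvement is a difference of accuracies, so a figure such as 32.68% is read here as 32.68 percentage points rather than as a relative gain over the baseline.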
2012
Advances in Neural Networks - ISNN2012
9783642313455

Use this identifier to cite or link to this document: https://hdl.handle.net/11566/74300