A Novel Adversarial Training Scheme for Deep Neural Network based Speech Enhancement / Cornell, S.; Principi, E.; Squartini, S. - ELECTRONIC. - (2020), pp. 1-8. (Paper presented at the 2020 International Joint Conference on Neural Networks, IJCNN 2020, held in gbr in 2020) [10.1109/IJCNN48605.2020.9206734].

A Novel Adversarial Training Scheme for Deep Neural Network based Speech Enhancement

Cornell S.; Principi E.; Squartini S.
2020-01-01

Abstract

In this work, we propose a novel representation learning technique for Deep Learning-based Speech Enhancement algorithms, inspired by Domain-Adversarial training. A gradient reversal layer and an additional network are employed, only at training time, to explicitly enforce a representation that is orthogonal to the additive noise in the input signal. We show that this learning scheme, which can be applied easily to most mask-based Deep Neural Network Speech Enhancement approaches, improves denoising performance when used in conjunction with a scale-invariant signal-to-distortion ratio loss and reaches state-of-the-art performance with no computational overhead at run-time. In particular, on the commonly used VoiceBank-DEMAND benchmark dataset, we improve signal-to-distortion ratio and signal-to-noise ratio over the non-adversarial model, and CSIG, COVL and CBAK over other state-of-the-art adversarial training techniques.
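
The two building blocks named in the abstract can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation: the names `si_sdr`, `GradientReversal`, `lam` and `eps` are hypothetical, the mean removal follows the usual zero-mean convention for scale-invariant signal-to-distortion ratio, and the backward pass is written by hand where a real training setup would rely on an automatic differentiation library.

```python
import math


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio, in dB.

    Both inputs are equal-length sequences of floats. `eps` is an
    assumed small constant that guards against division by zero.
    """
    n = len(target)
    # Remove the mean of each signal (zero-mean convention, an assumption here).
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [e - est_mean for e in estimate]
    tgt = [t - tgt_mean for t in target]

    # Project the estimate onto the target to find the optimal scaling.
    dot = sum(e * t for e, t in zip(est, tgt))
    target_energy = sum(t * t for t in tgt) + eps
    alpha = dot / target_energy

    # Split the estimate into a scaled-target part and a residual part.
    projection = [alpha * t for t in tgt]
    residual = [e - p for e, p in zip(est, projection)]

    signal_energy = sum(p * p for p in projection) + eps
    residual_energy = sum(r * r for r in residual) + eps
    return 10.0 * math.log10(signal_energy / residual_energy)


class GradientReversal:
    """Identity in the forward pass, sign-flipped gradient in the backward pass.

    `lam` is a hypothetical weight controlling how strongly the
    reversed gradient is scaled.
    """

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        # The representation passes through unchanged.
        return list(x)

    def backward(self, grad_output):
        # The gradient coming from the auxiliary network is negated and scaled,
        # so the layers before this one are pushed to make that network's task harder.
        return [-self.lam * g for g in grad_output]
```

Because the reversal layer and the auxiliary network only affect gradients during training, both can be dropped at inference, which is consistent with the abstract's claim of no run-time overhead.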
2020
978-1-7281-6926-2
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11566/289775