A Novel Adversarial Training Scheme for Deep Neural Network based Speech Enhancement / Cornell, S.; Principi, E.; Squartini, S. - Electronic. - (2020), pp. 1-8. (Paper presented at the 2020 International Joint Conference on Neural Networks, IJCNN 2020, held in the United Kingdom in 2020) [10.1109/IJCNN48605.2020.9206734].
A Novel Adversarial Training Scheme for Deep Neural Network based Speech Enhancement
Cornell S.; Principi E.; Squartini S.
2020-01-01
Abstract
In this work, we propose a novel representation-learning technique for Deep Learning-based Speech Enhancement algorithms, inspired by Domain-Adversarial training. A gradient reversal layer and an additional network are employed, only at training time, to explicitly enforce a representation that is orthogonal to the additive noise in the input signal. We show that this learning scheme, which can easily be applied to most mask-based Deep Neural Network Speech Enhancement approaches, improves denoising performance when used in conjunction with a scale-invariant signal-to-distortion ratio (SI-SDR) loss, and reaches state-of-the-art performance with no computational overhead at run-time. In particular, on the commonly used VoiceBank-DEMAND benchmark dataset, we improve signal-to-distortion ratio and signal-to-noise ratio over the non-adversarial model, and CSIG, COVL, and CBAK over other state-of-the-art adversarial training techniques.
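The sketch below illustrates, under stated assumptions, the two ingredients named in the abstract; it is not the authors' implementation. It computes SI-SDR (whose negative is commonly used as a training loss) and shows how a gradient reversal layer lets an auxiliary network push an encoder away from noise-related features. It assumes NumPy, a linear encoder `W`, and a linear adversary `V` that regresses the additive noise from the encoder features; `GradientReversal`, `adversarial_step`, and `lam` are hypothetical names chosen for illustration. In a full mask-based system the encoder would also receive the gradient of the enhancement loss, which is omitted here for brevity.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio (dB) for 1-D signals."""
    estimate = estimate - np.mean(estimate)
    reference = reference - np.mean(reference)
    # Project the estimate onto the reference to get the scaled target.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    residual = estimate - target
    return 10.0 * np.log10(
        (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    )


class GradientReversal:
    """Hypothetical manual-backprop helper: identity in the forward pass,
    multiplies the incoming gradient by -lam in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x

    def backward(self, grad_output):
        return -self.lam * grad_output


def adversarial_step(W, V, x, noise, grl, lr=1e-2):
    """One toy update of a linear encoder W and a linear noise regressor V.

    Assumption: the adversary predicts the additive noise from h = W @ x.
    V descends the adversary loss; through the reversal layer, W receives
    the negated gradient and therefore ascends it, so h is pushed to carry
    less information about the noise.
    """
    h = W @ x                          # encoder features
    h_rev = grl.forward(h)             # unchanged in the forward pass
    noise_hat = V @ h_rev              # adversary's noise estimate
    err = noise_hat - noise
    loss_adv = 0.5 * float(err @ err)

    grad_V = np.outer(err, h_rev)      # dL/dV
    grad_h_rev = V.T @ err             # dL/dh at the reversal layer output
    grad_h = grl.backward(grad_h_rev)  # reversed: -lam * dL/dh
    grad_W = np.outer(grad_h, x)       # gradient that reaches the encoder

    return W - lr * grad_W, V - lr * grad_V, loss_adv
```

Because the reversal layer is the identity in the forward pass and the auxiliary network is discarded after training, neither adds any cost at inference time, which is consistent with the abstract's claim of no run-time overhead.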