Packet loss is a major cause of voice quality degradation in VoIP transmissions, with a serious impact on intelligibility and user experience. This paper describes a system based on a generative adversarial approach that aims to repair the fragments lost during the transmission of audio streams. Inspired by the powerful image-to-image translation capability of Generative Adversarial Networks (GANs), we propose bin2bin, an improved pix2pix framework that translates magnitude spectrograms of audio frames with lost packets into non-corrupted speech spectrograms. To better preserve structural information after spectrogram translation, we combine two STFT-based loss functions with the traditional GAN objective. Furthermore, we employ a modified PatchGAN structure as the discriminator, and we lower the concealment time through a proper initialization of the phase reconstruction algorithm. Experimental results show that the proposed method has clear advantages over current state-of-the-art methods, as it better handles both high packet loss rates and large gaps. We make our code publicly available at: github.com/aircarlo/bin2bin-GAN-PLC.
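As a rough illustration of the setting described in the abstract, the following sketch simulates packet loss on a waveform, computes magnitude spectrograms, and evaluates two STFT-based distances between the clean and corrupted signals. The abstract does not name the two loss functions, so spectral convergence and a log-magnitude L1 distance are used here purely as assumed stand-ins. The packet length (320 samples, i.e. 20 ms at 16 kHz), the loss rate, the STFT parameters, and all function names are hypothetical choices for this example and are not taken from the authors' code.

```python
import numpy as np

def stft_magnitude(x, n_fft=512, hop=128):
    """Magnitude spectrogram from a Hann-windowed STFT, shaped (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))

def drop_packets(x, packet_len=320, loss_rate=0.2, seed=0):
    """Zero out randomly chosen packets (assumed loss model, for illustration).

    Returns the corrupted copy and a boolean array marking lost packets.
    """
    rng = np.random.default_rng(seed)
    n_packets = len(x) // packet_len
    lost = rng.random(n_packets) < loss_rate
    y = x.copy()
    for p in np.flatnonzero(lost):
        y[p * packet_len : (p + 1) * packet_len] = 0.0
    return y, lost

def spectral_convergence(ref_mag, est_mag):
    """Frobenius norm of the magnitude error, relative to the reference."""
    return np.linalg.norm(ref_mag - est_mag) / (np.linalg.norm(ref_mag) + 1e-8)

def log_magnitude_l1(ref_mag, est_mag, eps=1e-7):
    """Mean absolute difference between log-magnitude spectrograms."""
    return np.mean(np.abs(np.log(ref_mag + eps) - np.log(est_mag + eps)))

# Synthetic one-second tone standing in for a speech signal.
sr = 16000
t = np.arange(sr) / sr
clean = 0.5 * np.sin(2 * np.pi * 220 * t)

corrupted, lost = drop_packets(clean)
ref = stft_magnitude(clean)
est = stft_magnitude(corrupted)

print("lost packets:", int(lost.sum()), "of", len(lost))
print("spectral convergence:", spectral_convergence(ref, est))
print("log-magnitude L1:", log_magnitude_l1(ref, est))
```

In a GAN-based concealment system, distances of this kind would be computed between the generator output and the clean target and added to the adversarial objective; here they only quantify how far the corrupted spectrogram is from the clean one.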

A Time-Frequency Generative Adversarial based method for Audio Packet Loss Concealment / Aironi, C.; Cornell, S.; Serafini, L.; Squartini, S. - (2023), pp. 121-125. (Paper presented at the 31st European Signal Processing Conference, EUSIPCO 2023, held in Helsinki, Finland, 4-8 September 2023) [10.23919/EUSIPCO58844.2023.10290027].

A Time-Frequency Generative Adversarial based method for Audio Packet Loss Concealment

Aironi C.; Cornell S.; Serafini L.; Squartini S.
2023-01-01

2023
978-9-4645-9360-0
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 482.02 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11566/325456