Despite significant advances in networking technologies, real-time transmission of data packets, particularly in speech communications, still faces challenges due to the possibility of data loss. This loss not only compromises sound quality but also diminishes overall intelligibility. In such cases, Packet Loss Concealment (PLC) techniques can help by reconstructing the missing content and restoring the audio quality. This work proposes a novel method that improves on previous time-frequency generative inpainting approaches. Compared to other state-of-the-art methods, the proposed approach has the flexibility to restore lost packets either in real time at low latency or in offline mode, without the need to retrain the network. Evaluations conducted against a recent state-of-the-art method, ranked at the top of the 2022 Microsoft PLC competition, and against four DNN-based PLC solutions from the literature, show superior scores in terms of task-specific metrics. The method has also been tested in more challenging scenarios than the aforementioned ones, with packet loss rates of up to 50%, showing that it can help automatic speech recognition (ASR) systems reduce word error rate (WER) by almost 50% in relative terms. Additionally, a comparative subjective evaluation has been conducted, confirming the effectiveness of the proposed method in relation to the state of the art. The code is made available in the project repository: https://github.com/aircarlo/cplx-bin2bin.
Complex-Bin2bin: A Latency-Flexible Generative Neural Model for Audio Packet Loss Concealment / Aironi, C.; Gabrielli, L.; Cornell, S.; Squartini, S. - In: IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING. - ISSN 2329-9290. - (2024), pp. 1-12. [10.1109/TASLP.2024.3515794]
Complex-Bin2bin: A Latency-Flexible Generative Neural Model for Audio Packet Loss Concealment
Aironi C.; Gabrielli L.; Squartini S.
2024-01-01