A Score-aware Generative Approach for Music Signals Inpainting / Aironi, Carlo; Cornell, Samuele; Gabrielli, Leonardo; Squartini, Stefano. - (2023), pp. 1-7. (Paper presented at the 2023 4th International Symposium on the Internet of Sounds, held in Pisa, Italy, October 26th-27th, 2023) [10.1109/IEEECONF59510.2023.10335274].
A Score-aware Generative Approach for Music Signals Inpainting
Aironi, Carlo; Cornell, Samuele; Gabrielli, Leonardo; Squartini, Stefano
2023-01-01
Abstract
Several issues can seriously degrade the quality of digital audio, such as packet loss on IP-based networks or damaged storage media, impacting intelligibility and user experience. This paper presents a generative approach to repairing lost fragments in audio streams. Inspired by the well-established image-to-image translation ability of generative adversarial networks (GANs), and building on the bin2bin framework previously introduced for speech inpainting, we propose an enhanced framework that translates CQT magnitude spectrograms of music signal frames containing lost regions into reliable spectrograms. The goal is to effectively reconstruct missing audio segments, enabling a seamless listening experience for the audience. The proposed pipeline combines the traditional GAN discriminative loss function with two additional objectives: a loss function related to perceptual audio quality, and a second one based on the L2 norm between the true and predicted piano-rolls, the latter estimated from the CQT reconstruction. Through comprehensive evaluations on gaps of 375 ms and 750 ms, which are considered in the literature to be of "small" and "medium" duration respectively, we demonstrate the robustness and effectiveness of our framework in producing coherent reconstructions with reduced artifacts. The proposed approach outperforms a baseline cGAN-based method, GACELA. In terms of ODG score, a metric inspired by a human-based scoring system, we achieve a performance gain of up to 13.3%, while the improvement in Structural Similarity (SSIM) between the clean and restored spectrograms reaches 13.6%.
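The three-term objective described in the abstract can be sketched as follows. This is a minimal reconstruction from the abstract's description only; the symbols and the balancing weights λ₁, λ₂ are assumptions, not taken from the paper itself:

```latex
% Sketch of the combined training objective (symbols assumed):
%   L_adv  -- standard cGAN adversarial (discriminative) loss
%   L_perc -- perceptual audio-quality loss
%   P, \hat{P} -- true and predicted piano-rolls, the latter
%                 estimated from the CQT reconstruction
\mathcal{L}_{\mathrm{total}}
  = \mathcal{L}_{\mathrm{adv}}
  + \lambda_{1}\,\mathcal{L}_{\mathrm{perc}}
  + \lambda_{2}\,\bigl\lVert P - \hat{P} \bigr\rVert_{2}^{2}
```

Here the L2 term encourages score-level (note-level) agreement between the reconstruction and the ground truth, which is what makes the approach "score-aware".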