Multicomponent AM-FM representations: An asymptotically exact approach / Gianfelici, Francesco; Biagetti, Giorgio; Crippa, Paolo; Turchetti, Claudio. - In: IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING. - ISSN 1558-7916. - 15:3(2007), pp. 823-837. [10.1109/TASL.2006.889744]
Multicomponent AM-FM representations: An asymptotically exact approach
GIANFELICI, Francesco;BIAGETTI, Giorgio;CRIPPA, Paolo;TURCHETTI, Claudio
2007-01-01
Abstract
This paper presents, on the basis of a rigorous mathematical formulation, a multicomponent sinusoidal model that allows an asymptotically exact reconstruction of nonstationary speech signals, regardless of their duration and without any limitation in the modeling of voiced, unvoiced, and transitional segments. The proposed approach is based on the application of the Hilbert transform to obtain an amplitude signal from which an AM component is extracted by filtering, so that the residue can then be iteratively processed in the same way. This technique permits a multicomponent AM-FM model to be derived in which the number of components (iterations) may be arbitrarily chosen. Additionally, the instantaneous frequencies of these components can be calculated with a given accuracy by segmentation of the phase signals. The validity of the proposed approach has been proven by some applications to both synthetic signals and natural speech. Several comparisons show how this approach almost always has a higher performance than that obtained by current best practices, and does not need the complex filter optimizations required by other techniques.
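The iteration outlined in the abstract (Hilbert transform, amplitude filtering, residue reprocessing) can be illustrated with a minimal sketch. It is not the authors' implementation: the moving-average low-pass filter, its `width` parameter, the naive DFT used to build the analytic signal, and all function names (`analytic_signal`, `moving_average`, `decompose`) are assumptions chosen only to keep the example self-contained.

```python
import cmath
import math


def analytic_signal(x):
    """Return the discrete analytic signal of a real sequence x.

    Assumption: a naive O(N^2) DFT is used so that only the standard
    library is needed; this is suitable for short demo signals only.
    """
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist when n is even), double the positive-frequency
    # bins, and zero the negative-frequency bins.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
    for k in range(1, (n + 1) // 2):
        weights[k] = 2.0
    return [
        sum(spectrum[k] * weights[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]


def moving_average(a, width):
    """Hypothetical low-pass filter: centered moving average, shortened at the edges."""
    half = width // 2
    out = []
    for i in range(len(a)):
        lo, hi = max(0, i - half), min(len(a), i + half + 1)
        out.append(sum(a[lo:hi]) / (hi - lo))
    return out


def decompose(signal, n_components, width=9):
    """Split a real signal into n_components AM-FM waveforms plus a residue."""
    residue = list(signal)
    waveforms = []
    for _ in range(n_components):
        z = analytic_signal(residue)
        amplitude = [abs(v) for v in z]          # amplitude signal
        phase = [cmath.phase(v) for v in z]      # phase signal
        smooth = moving_average(amplitude, width)  # filtered AM part
        wave = [a * math.cos(p) for a, p in zip(smooth, phase)]
        waveforms.append(wave)
        # What the filtered amplitude did not capture is processed again.
        residue = [r - w for r, w in zip(residue, wave)]
    return waveforms, residue


# Demo: a single AM tone sampled over 64 points.
n = 64
signal = [
    (1.0 + 0.5 * math.cos(2 * math.pi * t / n)) * math.cos(2 * math.pi * 8 * t / n)
    for t in range(n)
]
waveforms, residue = decompose(signal, n_components=3)
```

By construction, the extracted waveforms plus the final residue sum back to the input, so the number of iterations only controls how much of the signal is left in the residue. How quickly that residue shrinks depends on the filter, which here is only a placeholder.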