Speaker Identification with Short Sequences of Speech Frames / Biagetti, Giorgio; Crippa, Paolo; Curzi, Alessandro; Orcioni, Simone; Turchetti, Claudio. - 2:(2015), pp. 178-185. (Paper presented at the 4th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2015), held in Lisbon, Portugal, 10-12 January 2015) [10.5220/0005191701780185].

Speaker Identification with Short Sequences of Speech Frames

BIAGETTI, Giorgio; CRIPPA, Paolo; CURZI, Alessandro; ORCIONI, Simone; TURCHETTI, Claudio
2015-01-01

Abstract

In biometric person identification systems, speaker identification plays a crucial role, as the voice is the most natural signal to produce and the simplest to acquire. Mel frequency cepstral coefficients (MFCCs) have been widely adopted for decades in speech processing to capture speech-specific characteristics with reduced dimensionality. However, although their ability to decorrelate the vocal source and the vocal tract filter makes them suitable for speech recognition, they have some drawbacks in speaker recognition. This paper presents an experimental evaluation showing that reducing the feature dimension with the discrete Karhunen-Loève transform (DKLT) guarantees better performance than conventional MFCC features. In particular, with short sequences of speech frames, that is, with utterances shorter than 1 s, the truncated DKLT representation always performs better than MFCC.
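The central idea of the abstract, replacing the fixed MFCC basis with a truncated DKLT, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the DKLT is estimated from the sample covariance of per-frame feature vectors and that only the k eigenvectors with the largest eigenvalues are retained. The input representation, the value of k, the synthetic data, and the helper names `fit_dklt` and `project_dklt` are all hypothetical.

```python
import numpy as np


def fit_dklt(frames: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Estimate a truncated DKLT basis from training frames.

    frames: shape (n_frames, d), one feature vector per row
            (hypothetical input; the actual front end is an assumption).
    k: number of retained components, 1 <= k <= d.
    Returns (mean, basis), where basis has shape (d, k) and its columns are
    covariance eigenvectors sorted by decreasing eigenvalue.
    """
    n, d = frames.shape
    if not 1 <= k <= d:
        raise ValueError("k must satisfy 1 <= k <= d")
    mean = frames.mean(axis=0)
    centered = frames - mean
    # Unbiased sample covariance (d x d), same normalization as np.cov.
    cov = centered.T @ centered / (n - 1)
    # eigh is for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    basis = eigvecs[:, order[:k]]
    return mean, basis


def project_dklt(frames: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Map frames (n_frames, d) to truncated DKLT coefficients (n_frames, k)."""
    return (frames - mean) @ basis


# Usage on synthetic data (a stand-in for real per-frame speech features).
rng = np.random.default_rng(0)
train = rng.normal(size=(400, 12)) * np.linspace(3.0, 0.1, 12)
mean, basis = fit_dklt(train, k=4)
coeffs = project_dklt(train, mean, basis)
print(coeffs.shape)  # (400, 4)
```

Because the retained eigenvectors are orthonormal and diagonalize the sample covariance, the projected coefficients are mutually uncorrelated and ordered by decreasing variance, so truncating to k components keeps the directions that carry the most energy in the mean-square sense. Unlike the fixed DCT used in MFCC extraction, this basis is estimated from the data itself.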
Year: 2015
ISBN: 978-989-758-077-2

Use this identifier to cite or link to this document: https://hdl.handle.net/11566/227759
Warning! The data displayed have not been validated by the university.

Citations
  • PubMed Central: N/A
  • Scopus: 20
  • ISI Web of Science: N/A