Speaker identification in noisy conditions using short sequences of speech frames / Biagetti, Giorgio; Crippa, Paolo; Falaschetti, Laura; Orcioni, Simone; Turchetti, Claudio. - Electronic. - 73 (2018), pp. 43-52. [10.1007/978-3-319-59424-8_5]

Speaker identification in noisy conditions using short sequences of speech frames

Biagetti, Giorgio; Crippa, Paolo; Falaschetti, Laura; Orcioni, Simone; Turchetti, Claudio
2018-01-01

Abstract

The application of speaker recognition technologies in domotic systems, cars, or mobile devices such as tablets, smartphones and smartwatches faces the problem of ambient noise. This paper studies the robustness of a speaker identification system when the speech signal is corrupted by environmental noise. In everyday scenarios the noise sources are highly time-varying and potentially unknown; therefore, noise robustness must be investigated in the absence of information about the noise. To this end, the performance of speaker identification using short sequences of speech frames was evaluated on a database of simulated noisy speech data. This database is derived from the TIMIT database by re-recording the data in the presence of various noise types, and is used to test the speaker identification model with a focus on the variety of noise. Additionally, to optimize recognition performance, white noise was added to the training data as a first step towards generating multicondition training data that model speech corrupted by noise with unknown temporal-spectral characteristics. The experimental results demonstrate the validity of the proposed algorithm for speaker identification using short portions of speech, also in realistic conditions where the ambient noise is not negligible.
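The abstract states that white noise was added to the training data as a first step towards multicondition training, but does not say how. The sketch below shows one common way to do this, assuming additive white Gaussian noise scaled to a target signal-to-noise ratio, SNR_dB = 10 log10(P_signal / P_noise). The function names `add_white_noise` and `multicondition_set` and the SNR grid (20, 10 and 0 dB) are hypothetical choices for illustration, not the authors' settings.

```python
import math
import random


def add_white_noise(signal, snr_db, rng=None):
    """Return a copy of `signal` with additive white Gaussian noise at `snr_db` dB SNR."""
    if rng is None:
        rng = random.Random()
    # Average power of the clean signal
    signal_power = sum(x * x for x in signal) / len(signal)
    if signal_power == 0.0:
        return list(signal)  # silent input: SNR is undefined, return it unchanged
    # SNR_dB = 10 log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def multicondition_set(utterances, snrs_db=(20.0, 10.0, 0.0), seed=0):
    """Pair each clean utterance with noisy copies at several SNRs (hypothetical SNR grid)."""
    rng = random.Random(seed)
    data = []
    for utt in utterances:
        data.append(list(utt))  # keep the clean version as well
        for snr in snrs_db:
            data.append(add_white_noise(utt, snr, rng))
    return data


# Usage: one second of a 440 Hz tone sampled at 16 kHz yields 1 clean + 3 noisy copies
clean = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
train = multicondition_set([clean])
print(len(train))  # 4
```

Keeping the clean copy alongside the noisy ones lets a model see both conditions; a real multicondition set would also mix in recorded noise types, which white noise only approximates.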
Year: 2018
Published in: Smart Innovation, Systems and Technologies - 9th KES International Conference on Intelligent Decision Technologies, KES-IDT 2017
ISBN: 9783319594231, 978-3-319-59424-8

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11566/249348
Warning! The data displayed have not been validated by the university.

Citations
  • PubMed Central: ND
  • Scopus: 12
  • Web of Science (ISI): 6