Speaker identification in noisy conditions using short sequences of speech frames / Biagetti, Giorgio; Crippa, Paolo; Falaschetti, Laura; Orcioni, Simone; Turchetti, Claudio. - ELECTRONIC. - 73:(2018), pp. 43-52. [10.1007/978-3-319-59424-8_5]
Speaker identification in noisy conditions using short sequences of speech frames
BIAGETTI, Giorgio; CRIPPA, Paolo; FALASCHETTI, Laura; ORCIONI, Simone; TURCHETTI, Claudio
2018-01-01
Abstract
Applying speaker recognition technologies in domotic systems, cars, or mobile devices such as tablets, smartphones, and smartwatches raises the problem of ambient noise. This paper studies the robustness of a speaker identification system when the speech signal is corrupted by environmental noise. In everyday scenarios, noise sources are highly time-varying and potentially unknown, so noise robustness must be investigated without any prior information about the noise. To this end, the performance of speaker identification using short sequences of speech frames was evaluated on a database of simulated noisy speech. This database is derived from the TIMIT database by rerecording the data in the presence of various noise types, and it is used to test the speaker identification model with a focus on the variety of noise. Additionally, to optimize recognition performance, white noise was added in the training stage as a first step towards generating multicondition training data that model speech corrupted by noise with unknown temporal-spectral characteristics. The experimental results demonstrate the validity of the proposed algorithm for speaker identification using short portions of speech, even in realistic conditions where the ambient noise is not negligible.
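The white-noise step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the noise levels, the noise distribution, or how the noisy copies are combined, so the Gaussian noise model, the SNR-based scaling, and the names `add_white_noise` and `multicondition_set` are all assumptions made for illustration.

```python
import numpy as np


def add_white_noise(signal, snr_db, rng=None):
    """Return `signal` plus white Gaussian noise at the requested SNR in dB.

    Assumption: the noise is zero-mean Gaussian and is scaled from the
    average power of the whole utterance. The paper may do this differently.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=np.float64)
    signal_power = np.mean(signal ** 2)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def multicondition_set(signal, snr_levels_db, rng=None):
    """Build one noisy copy of `signal` per SNR level (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    return {snr: add_white_noise(signal, snr, rng) for snr in snr_levels_db}


if __name__ == "__main__":
    # A synthetic tone stands in for a speech utterance; real use would load TIMIT audio.
    sample_rate = 16000
    t = np.arange(sample_rate) / sample_rate
    clean = 0.5 * np.sin(2.0 * np.pi * 220.0 * t)

    # The SNR levels below are illustrative values, not taken from the paper.
    noisy_copies = multicondition_set(clean, [20, 10, 0], np.random.default_rng(0))
    for snr, noisy in noisy_copies.items():
        measured = 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
        print(f"target {snr} dB, measured {measured:.2f} dB")
```

Training on the clean utterance together with several such noisy copies is one common way to build multicondition data. Which SNR levels to use, and whether to train one model per level or a single pooled model, are design choices the abstract leaves open.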