This paper addresses the concurrent detection of multiple infant cries using microphones located in the cribs of a Neonatal Intensive Care Unit (NICU). We call this task infant cry diarization, by analogy with speaker diarization in speech processing: instead of determining 'who spoke when', the problem is to determine 'who cried when'. The proposed algorithm is a fully-convolutional neural network (Conv-DetNet) that jointly processes the audio signals acquired by the microphones in all cribs and detects whether each infant is crying. The network takes Log-Mel coefficients as input and is composed of stacked dilated convolutional blocks with increasing dilation factors. Each block uses depthwise and pointwise convolutional layers, which replace standard convolutions with a factorized and more efficient operation. The architecture is compared with its single-channel equivalent and with the single- and multi-channel architectures presented in a previous work, which are composed of standard convolutional layers and fully-connected layers. The experiments were conducted on a synthetic dataset that simulates the acoustic environment of the NICU of the Salesi Hospital in Ancona (Italy). Results are evaluated in terms of the Area Under the Precision-Recall Curve (PRC-AUC) and show that the proposed multi-channel Conv-DetNet achieves the highest performance, with a PRC-AUC of 87.58%, outperforming all the compared methods.
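To make the two architectural ideas in the abstract concrete, the short sketch below computes (a) the receptive field of a stack of dilated 1-D convolutions and (b) the weight count of a standard convolution versus a depthwise plus pointwise pair. The kernel size, channel width, and dilation schedule are illustrative assumptions and are not taken from the paper; the function names are hypothetical helpers, not part of Conv-DetNet.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field, in frames, of stacked stride-1 dilated 1-D convolutions."""
    rf = 1
    for d in dilations:
        # Each layer widens the field by (kernel_size - 1) * dilation frames.
        rf += (kernel_size - 1) * d
    return rf


def standard_conv_params(c_in, c_out, kernel_size):
    """Weights of a standard 1-D convolution (bias ignored)."""
    return c_in * c_out * kernel_size


def separable_conv_params(c_in, c_out, kernel_size):
    """Weights of a depthwise convolution followed by a pointwise (1x1) one."""
    depthwise = c_in * kernel_size  # one kernel per input channel
    pointwise = c_in * c_out        # 1x1 convolution mixing the channels
    return depthwise + pointwise


# Assumed values, chosen only for illustration.
dilations = [2 ** i for i in range(4)]  # 1, 2, 4, 8: increasing dilation factors
rf = receptive_field(3, dilations)
std = standard_conv_params(64, 64, 3)
sep = separable_conv_params(64, 64, 3)

print(f"receptive field: {rf} frames")
print(f"standard conv weights: {std}, depthwise+pointwise weights: {sep}")
```

Under these assumed values, four layers already cover 31 input frames, and the depthwise plus pointwise pair needs roughly a third of the weights of the standard layer, which is the kind of saving the abstract alludes to.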

Who Cried When: Infant Cry Diarization with Dilated Fully-Convolutional Neural Networks / Severini, M.; Principi, E.; Cornell, S.; Gabrielli, L.; Squartini, S. - (2020), pp. 1-8. (Paper presented at the 2020 International Joint Conference on Neural Networks, IJCNN 2020, held in gbr in 2020) [10.1109/IJCNN48605.2020.9207234].

Who Cried When: Infant Cry Diarization with Dilated Fully-Convolutional Neural Networks

Severini M.; Principi E.; Cornell S.; Gabrielli L.; Squartini S.
2020-01-01

2020
978-1-7281-6926-2

Use this identifier to cite or link to this document: https://hdl.handle.net/11566/286031