Who Cried When: Infant Cry Diarization with Dilated Fully-Convolutional Neural Networks

Severini, M.; Principi, E.; Cornell, S.; Gabrielli, L.; Squartini, S.

doi:10.1109/IJCNN48605.2020.9207234

In this paper, we address the problem of concurrently detecting multiple infant cries using microphones located in the cribs of a Neonatal Intensive Care Unit (NICU). We term this task infant cry diarization, by analogy with the 'speaker diarization' task for speech signals: instead of determining 'who spoke when', the problem here is determining 'who cried when'. The proposed algorithm consists of a fully-convolutional neural network (Conv-DetNet) that simultaneously processes the audio signals acquired from the microphones in all the cribs and detects whether or not each infant cried. The network takes Log-Mel coefficients as input and is composed of stacked dilated convolutional blocks with increasing dilation factors. Each block is composed of pointwise and depthwise convolutional layers that replace standard convolutions with a mathematically equivalent but more efficient operation. The architecture has been compared to its single-channel equivalent and to the single- and multi-channel architectures presented in previous work, which are composed of standard convolutional layers and fully-connected layers. The experiments have been conducted on a synthetic dataset that simulates the acoustic environment of the Salesi Hospital NICU in Ancona (Italy). The results have been evaluated in terms of Area Under the Precision-Recall Curve (PRC-AUC), and they show that the proposed multi-channel Conv-DetNet achieves the highest performance, with a PRC-AUC of 87.58%, outperforming all the comparative methods.
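As a rough illustration of why the abstract describes depthwise plus pointwise layers as more efficient, and of what increasing dilation factors buy, the following sketch counts weights and computes the receptive field of a stack of dilated convolutions. It is not the authors' implementation: the 1-D layout, the channel count, the kernel size, the number of blocks, and the doubling dilation schedule are all assumptions chosen for the example, since the abstract does not state them.

```python
def standard_conv1d_params(c_in, c_out, kernel_size):
    """Weight count of a standard 1-D convolution (bias ignored)."""
    return c_in * c_out * kernel_size


def separable_conv1d_params(c_in, c_out, kernel_size):
    """Weight count of a depthwise convolution followed by a pointwise one."""
    depthwise = c_in * kernel_size  # one kernel per input channel
    pointwise = c_in * c_out        # 1x1 convolution that mixes channels
    return depthwise + pointwise


def receptive_field(kernel_size, dilations):
    """Receptive field, in frames, of stacked stride-1 dilated convolutions."""
    frames = 1
    for dilation in dilations:
        frames += (kernel_size - 1) * dilation
    return frames


# Hypothetical sizes, not taken from the paper.
channels = 128
kernel_size = 3
dilations = [2 ** i for i in range(8)]  # assumed schedule: 1, 2, 4, ..., 128

standard = standard_conv1d_params(channels, channels, kernel_size)
separable = separable_conv1d_params(channels, channels, kernel_size)
context = receptive_field(kernel_size, dilations)

print(standard)   # 49152
print(separable)  # 16768
print(context)    # 511
```

Under these assumed sizes, the depthwise plus pointwise pair uses roughly a third of the weights of a standard convolution, and eight blocks with doubling dilation cover 511 input frames, compared with 17 frames for the same stack without dilation.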

Who Cried When: Infant Cry Diarization with Dilated Fully-Convolutional Neural Networks / Severini, M.; Principi, E.; Cornell, S.; Gabrielli, L.; Squartini, S. - (2020), pp. 1-8. (Paper presented at the 2020 International Joint Conference on Neural Networks, IJCNN 2020, held in GBR in 2020) [10.1109/IJCNN48605.2020.9207234].