AIRA-D FusionNet: A Multi-Modal Deep Learning Framework for Violence Recognition through Audio-Visual Cues / Halilaj, M.; Bekteshi, E.; Myrto, E.; Dragoni, A. F. - ELECTRONIC. - (2026), pp. 1-6. (3rd International Conference on Artificial Intelligence, Computer, Data Sciences, and Applications, ACDSA 2026 phl 2026) [10.1109/ACDSA67686.2026.11468260].
AIRA-D FusionNet: A Multi-Modal Deep Learning Framework for Violence Recognition through Audio-Visual Cues
Halilaj M.; Dragoni A. F.
2026-01-01
Abstract
This paper presents AIRA-D FusionNet, a multi-modal violence recognition system that combines visual analysis from a MoViNet-A0 backbone with audio analysis of MFCC features by a BiLSTM network. Trained on a combined dataset, our fused model achieves a recall of 0.91 and an AUC of 0.856, a 12% improvement in recall over unimodal baselines, obtained by leveraging complementary audio-visual cues. For practical deployment, the model was converted to TensorFlow Lite and optimized for mobile inference. This confirms the system's viability for real-time violence detection on resource-constrained devices, offering a sensitive and efficient solution for automated security monitoring.
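
The two metrics reported in the abstract can be made concrete with a small worked example. The sketch below is not the authors' evaluation code: the labels, scores, 0.5 decision threshold, and the helper names `recall` and `roc_auc` are all assumptions chosen for illustration. It computes recall as TP / (TP + FN) and AUC as the probability that a randomly chosen violent clip receives a higher score than a randomly chosen non-violent one.

```python
def recall(y_true, y_pred):
    """Fraction of actual positives predicted positive: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def roc_auc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = violent clip) and made-up model scores, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

toy_recall = recall(y_true, y_pred)
toy_auc = roc_auc(y_true, scores)
print(f"recall={toy_recall:.2f} auc={toy_auc:.4f}")  # recall=0.75 auc=0.8125
```

Recall depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why a detector can be tuned for high recall (few missed violent clips) independently of its AUC.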


