Nowadays Human Pose Estimation (HPE) represents one of the main research themes in the field of computer vision. Despite innovative methods and solutions introduced for frame processing algorithms, the use of standard frame-based cameras still has several drawbacks such as data redundancy and fixed frame-rate. The use of event-based cameras guarantees higher temporal resolution with lower memory and computational cost while preserving the significant information to be processed and thus it represents a new solution for real-time applications. In this paper, the DHP19 dataset was employed, the first and, to date, the only one with HPE data recorded from Dynamic Vision Sensor (DVS) event-based cameras. Starting from the baseline single-input single-output (SISO) Convolutional Neural Network (CNN) model proposed in the literature, a novel multi-input multi-output (MIMO) CNN-based architecture was proposed in order to model simultaneously two different single camera views. Experimental results show that the proposed MIMO approach outperforms the standard SISO model in terms of accuracy and training time.
3D Human Pose Estimation Based on Multi-Input Multi-Output Convolutional Neural Network and Event Cameras: A Proof of Concept on the DHP19 Dataset / Manilii, A.; Lucarelli, L.; Rosati, R.; Romeo, L.; Mancini, A.; Frontoni, E.. - 12661:(2021), pp. 14-25. (Intervento presentato al convegno 25th International Conference on Pattern Recognition Workshops, ICPR 2020 tenutosi a ita nel 2021) [10.1007/978-3-030-68763-2_2].
3D Human Pose Estimation Based on Multi-Input Multi-Output Convolutional Neural Network and Event Cameras: A Proof of Concept on the DHP19 Dataset
Rosati R.;Romeo L.;Mancini A.;Frontoni E.
2021-01-01
Abstract
Nowadays Human Pose Estimation (HPE) represents one of the main research themes in the field of computer vision. Despite innovative methods and solutions introduced for frame processing algorithms, the use of standard frame-based cameras still has several drawbacks such as data redundancy and fixed frame-rate. The use of event-based cameras guarantees higher temporal resolution with lower memory and computational cost while preserving the significant information to be processed and thus it represents a new solution for real-time applications. In this paper, the DHP19 dataset was employed, the first and, to date, the only one with HPE data recorded from Dynamic Vision Sensor (DVS) event-based cameras. Starting from the baseline single-input single-output (SISO) Convolutional Neural Network (CNN) model proposed in the literature, a novel multi-input multi-output (MIMO) CNN-based architecture was proposed in order to model simultaneously two different single camera views. Experimental results show that the proposed MIMO approach outperforms the standard SISO model in terms of accuracy and training time.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.