Investigation on the Encoder-Decoder Application for Mesh Generation / Mameli, M.; Balloni, E.; Mancini, A.; Frontoni, E.; Zingaretti, P.. - 14496 LNCS:(2024), pp. 387-400. [10.1007/978-3-031-50072-5_31]
Investigation on the Encoder-Decoder Application for Mesh Generation
Mameli M.; Balloni E.; Mancini A.; Frontoni E.; Zingaretti P.
2024-01-01
Abstract
In computer graphics, 3D modeling is a fundamental task: the process of creating three-dimensional objects or scenes with specialized software that lets users create, manipulate, and modify geometric shapes to build complex models. The process is time-consuming and requires specialized knowledge; obtaining a basic mesh from a blueprint typically takes three to five hours of modeling. Several approaches have tried to automate this process to reduce modeling time. The most interesting of these are based on deep learning, and Pixel2Mesh is a notable example. However, training this network requires at least 150 epochs to obtain usable results. Starting from these premises, this work investigates whether a modified version of Pixel2Mesh can be trained in fewer epochs while obtaining comparable or better results. To this end, the convolutional block was modified, replacing the classification-based approach with an image-reconstruction-based one. The modification builds an encoder-decoder architecture on state-of-the-art networks such as VGG, DenseNet, ResNet, and Inception. With this approach, the convolutional block learns to reconstruct the source image and, in doing so, learns the position of the object of interest within the image. This made it possible to train the complete network in 50 epochs, with results that outperform the state of the art. Tests on the networks show an increase of 0.5 percentage points over the state-of-the-art average.
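The encoder-decoder idea described in the abstract can be illustrated with a minimal PyTorch sketch. It is not the authors' network: the class name `ToyEncoderDecoder`, the layer sizes, the 224x224 input resolution, and the pixel-wise MSE loss are all assumptions chosen for illustration, and the two-layer encoder is only a stand-in for the VGG, DenseNet, ResNet, or Inception backbones named above. The sketch shows the general pattern: an encoder produces feature maps, a decoder reconstructs the input image from them, and the reconstruction error is used as the training signal in place of a classification loss.

```python
import torch
from torch import nn


class ToyEncoderDecoder(nn.Module):
    """Hypothetical stand-in for the image branch; not the authors' network."""

    def __init__(self) -> None:
        super().__init__()
        # Encoder: two strided convolutions, 224 -> 112 -> 56 pixels per side.
        self.enc1 = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        # Decoder: transposed convolutions mirror the encoder, 56 -> 112 -> 224.
        self.dec1 = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU()
        )
        self.dec2 = nn.Sequential(
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid()
        )

    def forward(self, image):
        f1 = self.enc1(image)
        f2 = self.enc2(f1)
        recon = self.dec2(self.dec1(f2))
        # Return the reconstruction plus the intermediate feature maps,
        # which a mesh-generation branch could consume.
        return recon, [f1, f2]


def reconstruction_step(model, optimizer, images):
    """One training step with a pixel-wise MSE loss (an assumed choice)."""
    optimizer.zero_grad()
    recon, _ = model(images)
    loss = nn.functional.mse_loss(recon, images)
    loss.backward()
    optimizer.step()
    return loss.item()


torch.manual_seed(0)
model = ToyEncoderDecoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = torch.rand(2, 3, 224, 224)  # random stand-in for RGB images in [0, 1]
loss_value = reconstruction_step(model, optimizer, batch)
recon, features = model(batch)
```

Because the target of the loss is the input image itself, this objective needs no class labels. How the reconstruction objective is combined with the mesh losses in the actual work is not stated in the abstract and is left out of this sketch.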