Orchestrating Generative AI Paradigms with Human-In-The-Loop for 3D Generation / Balloni, Emanuele; Stacchio, Lorenzo; Paolanti, Marina; Zingaretti, Primo; Pierdicca, Roberto. - In: IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS. - ISSN 1077-2626. - PP:(2026), pp. 1-12. [10.1109/tvcg.2026.3695993]

Orchestrating Generative AI Paradigms with Human-In-The-Loop for 3D Generation

Balloni, Emanuele; Paolanti, Marina; Zingaretti, Primo; Pierdicca, Roberto
2026-01-01

Abstract

Generative AI techniques are revolutionizing the creation of 3D and immersive content, yet challenges remain, such as achieving precise user control in 3D generation. Current text-to-3D and image-to-3D pipelines often produce outputs that deviate from user expectations and offer no effective way to refine or correct the generated models. To address these limitations, we propose Imagin3D, a novel human-in-the-loop (HITL) system that integrates Multimodal Large Language Models to enhance the controllability and adaptability of 3D content generation. Imagin3D leverages a Multi-View Question Answering module to evaluate the consistency of generated views with user-provided textual descriptions, enabling iterative refinement through guided inpainting while preserving multi-view consistency. This allows users to co-create 3D models, which are then synthesized into a final 3D asset using Neural Rendering. We validate Imagin3D through extensive quantitative evaluations and a comprehensive user study, demonstrating its effectiveness in improving usability, accuracy, and user satisfaction in interactive 3D generation tasks. Our results highlight the potential of HITL approaches to bridge the gap between AI-generated outputs and user intent, paving the way for more accessible and user-centered 3D generation workflows.
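
The refinement loop described in the abstract (check each generated view against the text prompt, let the user guide a fix for the views that fail, repeat) can be illustrated with a minimal Python sketch. This is not the Imagin3D implementation: the function `refine_views`, the `ViewCheck` record, and the three callables `check_view`, `ask_user_fix`, and `inpaint` are hypothetical names introduced here for illustration, views are stood in for by plain strings, and the multi-view consistency constraint and the final neural rendering step are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Placeholder type (assumption): a real system would hold image data here.
View = str


@dataclass
class ViewCheck:
    """Result of checking one view against the prompt (hypothetical record)."""
    view_index: int
    consistent: bool


def refine_views(
    prompt: str,
    views: Sequence[View],
    check_view: Callable[[str, View], bool],     # stand-in for a question-answering check
    ask_user_fix: Callable[[int, View], str],    # stand-in for the human in the loop
    inpaint: Callable[[View, str], View],        # stand-in for guided inpainting
    max_rounds: int = 3,
) -> List[View]:
    """Repeat check -> user guidance -> inpaint until all views pass or rounds run out."""
    current = list(views)
    for _ in range(max_rounds):
        checks = [ViewCheck(i, check_view(prompt, v)) for i, v in enumerate(current)]
        failing = [c for c in checks if not c.consistent]
        if not failing:
            break  # every view is judged consistent with the prompt
        for c in failing:
            # The user supplies an instruction for the failing view only.
            instruction = ask_user_fix(c.view_index, current[c.view_index])
            current[c.view_index] = inpaint(current[c.view_index], instruction)
    return current
```

Passing the checker, the user prompt, and the inpainting step as callables keeps the loop independent of any particular model; in a toy run they can be simple string functions, while a real system would plug in a multimodal model and an image inpainting backend.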

Use this identifier to cite or link to this document: https://hdl.handle.net/11566/358177