A Modular and Efficient Framework for the Development of Large Language Model-Based Virtual Humans: An Educational Scenario / Giordano, Michele; Berardini, Daniele; Frontoni, Emanuele; Zingaretti, Primo; Stacchio, Lorenzo. - 16169 (2026), pp. 649-660. (Workshops and Competitions hosted by the 23rd International Conference on Image Analysis and Processing, ICIAP 2025, Italy, 2025) [10.1007/978-3-032-11317-7_52].
A Modular and Efficient Framework for the Development of Large Language Model-Based Virtual Humans: An Educational Scenario
Giordano, Michele; Berardini, Daniele; Frontoni, Emanuele; Zingaretti, Primo; Stacchio, Lorenzo
2026-01-01
Abstract
The integration of Large Language Models (LLMs) into virtual human systems opens new avenues for creating interactive, intelligent agents capable of natural and personalized human-computer communication. However, generating and deploying such avatars in real time remains computationally demanding, and existing systems often lack modularity or adaptability. In this paper, we propose an efficient and scalable framework for creating LLM-driven virtual humans that balances performance, responsiveness, and expressiveness. Our architecture combines lightweight dialogue management with multimodal synchronization pipelines that support speech and facial animation. The framework includes an optimization layer that enables on-device deployment without compromising interactivity. We demonstrate the effectiveness of our approach by deploying the system on several lightweight devices, showing improvements in latency and in adaptability to user input. This work lays the groundwork for broader use of intelligent avatars in domains such as education, entertainment, and customer support.
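The abstract describes a modular architecture in which a dialogue component produces a reply, speech synthesis turns it into timed audio units, and a synchronization step drives facial animation from those timings. The following is a minimal sketch of that kind of decoupling under stated assumptions; it is not the authors' implementation. All names (`VirtualHumanPipeline`, `DummySynth`, `VISEME_MAP`, `to_visemes`) are hypothetical, the LLM is replaced by a plain Python function, and the letter-to-viseme mapping is a deliberately crude stand-in for a real phoneme aligner.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


@dataclass
class PhonemeEvent:
    phoneme: str
    start: float  # seconds from the start of the utterance
    end: float


@dataclass
class VisemeKeyframe:
    viseme: str
    time: float  # when the face should switch to this mouth shape


class SpeechSynth(Protocol):
    """Any TTS stage that returns timed phonemes can be plugged in."""
    def synthesize(self, text: str) -> List[PhonemeEvent]: ...


class DummySynth:
    """Hypothetical TTS stand-in: one 'phoneme' per letter, fixed duration."""
    def __init__(self, phoneme_duration: float = 0.08):
        self.phoneme_duration = phoneme_duration

    def synthesize(self, text: str) -> List[PhonemeEvent]:
        events, t = [], 0.0
        for ch in text.lower():
            if ch.isalpha():  # skip spaces and punctuation
                events.append(PhonemeEvent(ch, t, t + self.phoneme_duration))
                t += self.phoneme_duration
        return events


# Hypothetical coarse grouping of letters into mouth shapes.
VISEME_MAP = {
    **{c: "open" for c in "aeiou"},
    **{c: "closed" for c in "bmp"},
    **{c: "teeth" for c in "fv"},
}


def to_visemes(events: List[PhonemeEvent]) -> List[VisemeKeyframe]:
    """Map timed phonemes to keyframes, emitting one only when the shape changes."""
    frames: List[VisemeKeyframe] = []
    for e in events:
        v = VISEME_MAP.get(e.phoneme, "neutral")
        if not frames or frames[-1].viseme != v:
            frames.append(VisemeKeyframe(v, round(e.start, 3)))
    return frames


@dataclass
class AvatarResponse:
    text: str
    phonemes: List[PhonemeEvent]
    keyframes: List[VisemeKeyframe]


class VirtualHumanPipeline:
    """Dialogue -> speech -> facial animation, with each stage swappable."""
    def __init__(self, respond: Callable[[str], str], tts: SpeechSynth):
        self.respond = respond  # e.g. a call into an on-device LLM
        self.tts = tts

    def step(self, user_input: str) -> AvatarResponse:
        reply = self.respond(user_input)
        phonemes = self.tts.synthesize(reply)
        return AvatarResponse(reply, phonemes, to_visemes(phonemes))


if __name__ == "__main__":
    pipe = VirtualHumanPipeline(lambda name: "Hi, " + name, DummySynth())
    resp = pipe.step("Bob")
    for kf in resp.keyframes:
        print(kf.time, kf.viseme)
```

Because each stage only depends on a small interface (a text-in/text-out function and a `synthesize` method returning timed events), a heavier LLM or a real TTS engine could be substituted without touching the animation step. Collapsing consecutive identical visemes is one simple way to keep the animation stream small on constrained devices.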


