A Modular and Efficient Framework for the Development of Large Language Model-Based Virtual Humans: An Educational Scenario / Giordano, Michele; Berardini, Daniele; Frontoni, Emanuele; Zingaretti, Primo; Stacchio, Lorenzo. - 16169:(2026), pp. 649-660. (Workshops and competitions hosted by the 23rd International Conference on Image Analysis and Processing, ICIAP 2025, ita, 2025) [10.1007/978-3-032-11317-7_52].

A Modular and Efficient Framework for the Development of Large Language Model-Based Virtual Humans: An Educational Scenario

Berardini, Daniele; Frontoni, Emanuele; Zingaretti, Primo
2026-01-01

Abstract

The integration of Large Language Models (LLMs) into virtual human systems opens new avenues for creating interactive, intelligent agents capable of natural and personalized human-computer communication. However, generating and deploying such avatars in real time remains computationally demanding, and existing systems often lack modularity or adaptability. In this paper, we propose an efficient and scalable framework for creating LLM-driven virtual humans that balances performance, responsiveness, and expressiveness. Our architecture combines lightweight dialogue management with multimodal synchronization pipelines to support speech and facial animation. The framework includes an optimization layer that enables on-device deployment without compromising interactivity. We demonstrate the effectiveness of our approach by deploying our system on several lightweight devices, showing improvements in latency and adaptability to user input. This work sets the stage for broader use of intelligent avatars in domains such as education, entertainment, and customer support.
Year: 2026
ISBN: 9783032113160
ISBN: 9783032113177

Use this identifier to cite or link to this document: https://hdl.handle.net/11566/358175