Toward Embodied Intelligence: An Architecture for Natural Dialogue and Action Execution in Assistive Robots


This work presents a new architecture for an assistive mobile robot designed to support the elderly and individuals with disabilities in performing daily indoor tasks. The proposed framework integrates multimodal perception, language-based reasoning, and safety-aware action planning to enable natural and effective two-way communication between humans and robots. At its core, the system utilizes large language models (LLMs) for dialogue management, contextual understanding, and reasoning over fused sensory inputs, including vision, speech, and proprioceptive data. By combining speech recognition, object detection, and local memory modules, the robot not only interprets explicit user commands but also infers implicit intentions, predicts missing information, and requests clarifications when necessary. A dedicated safety layer filters and validates action sequences before execution, ensuring reliability and user safety. The architecture further incorporates short- and long-term memory structures, enabling the robot to maintain a dialogue history and semantic knowledge of the environment. This bidirectional interaction model allows the robot to generate both natural conversational responses and executable action plans in a context-aware manner. Preliminary implementation and testing demonstrate promising performance, bridging the gap between conversational AI and embodied robotic action in real-life assistive scenarios.
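The abstract describes a safety layer that filters and validates action sequences before execution, together with clarification requests and memory structures. As an illustration only, the following minimal Python sketch shows one way such a validation step could look. Every name in it (`Action`, `Memory`, `validate_plan`, `ALLOWED_ACTIONS`, `MAX_SPEED_M_S`) and every rule it applies is an assumption made for this example; the record does not describe the authors' actual implementation.

```python
from dataclasses import dataclass, field

# Hypothetical whitelist and speed cap; neither value comes from the paper.
ALLOWED_ACTIONS = {"navigate_to", "pick", "place", "speak"}
MAX_SPEED_M_S = 0.5


@dataclass
class Action:
    name: str
    params: dict = field(default_factory=dict)


@dataclass
class Memory:
    dialogue: list = field(default_factory=list)       # short-term: recent turns
    known_objects: dict = field(default_factory=dict)  # long-term: object -> location


def validate_plan(plan, memory):
    """Filter a proposed plan; return (approved_actions, clarification_questions)."""
    approved, questions = [], []
    for action in plan:
        # Drop any action that is not on the whitelist.
        if action.name not in ALLOWED_ACTIONS:
            continue
        # Clamp a requested speed that exceeds the assumed cap.
        if action.params.get("speed", 0.0) > MAX_SPEED_M_S:
            action = Action(action.name, {**action.params, "speed": MAX_SPEED_M_S})
        # Ask for clarification when the object to pick is unknown,
        # and stop, since later steps may depend on this one.
        if action.name == "pick":
            target = action.params.get("object")
            if target is None:
                questions.append("Which object should I pick up?")
                break
            if target not in memory.known_objects:
                questions.append(f"Where can I find '{target}'?")
                break
        approved.append(action)
    return approved, questions
```

In this sketch, a plan proposed by the language model would pass through `validate_plan` before any motion command is issued: approved actions go to the executor, and any returned question goes back to the user as a dialogue turn.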

Toward Embodied Intelligence: An Architecture for Natural Dialogue and Action Execution in Assistive Robots / Omer, K.; Monteriu', A. - (2026), pp. 42-47. (12th International Conference on Automation, Robotics and Applications, ICARA 2026, Istanbul, 5-7 February 2026) [10.1109/ICARA69401.2026.11480310].


Omer K.; Monteriu' A.

2026-01-01



Publication year: 2026
ISBN: 9798331563530
DOI: https://dx.doi.org/10.1109/ICARA69401.2026.11480310
Publication type: 4.1 Contribution in conference proceedings

Files in this record:

File	Size	Format
Omer_Toward-Embodied-Intelligence-Architecture_2026.pdf (Access: archive administrators only. Type: publisher's version, i.e. the published version with the publisher's layout. License: all rights reserved.)	493.89 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11566/357536
