A Graph RAG Approach to Enhance Explainability in Dataset Discovery / Diamantini, Claudia; Mele, Alessandro; Mircoli, Alex; Potena, Domenico; Rossetti, Cristina; Storti, Emanuele. - In: DATA SCIENCE AND ENGINEERING. - ISSN 2364-1185. - (2025). [10.1007/s41019-025-00313-x]

A Graph RAG Approach to Enhance Explainability in Dataset Discovery

Diamantini, Claudia; Mele, Alessandro; Mircoli, Alex; Potena, Domenico; Storti, Emanuele
2025-01-01

Abstract

Discovering relevant datasets in large, heterogeneous data ecosystems, such as data lakes or data spaces, is a complex task, often hindered by a lack of transparency and of user-centric explanations in the discovery process. Explainability is critical for enabling users to understand why specific datasets are recommended, what information they contain, and how they align with user-defined criteria and preferences. To address these challenges, this work proposes a novel Graph Retrieval-Augmented Generation (Graph RAG) framework that enhances explainability in a platform for the discovery of summary data sources. The approach uses a Knowledge Graph (KG) to interpret user requests and extract relevant contextual information. A Large Language Model (LLM) then transforms the enriched requests into actionable queries for the dataset discovery platform. Candidate solutions are evaluated and enriched with statistical insights on value distributions and with contextual knowledge from the KG. Finally, the LLM ranks the solutions according to user preferences and produces a final report. This dual strategy of query enrichment and contextual explanation fosters transparency and improves user understanding of the discovery process. We demonstrate the effectiveness of the approach through experimental validation, highlighting its potential to improve both the accuracy and the interpretability of dataset discovery.

Use this identifier to cite or link to this document: https://hdl.handle.net/11566/350054