Modern solutions for Big Data management and analytics, most notably Data Lakes and Data Lakehouses, face new challenges that stem from the versatility of these technologies and from the continuously evolving variety and volume of data sources, which makes it necessary to track data quality concerns. In this scenario, this paper proposes a metadata management framework for summary data sources that can generate data profiles at various levels of detail. The approach leverages a Knowledge Graph, which defines dimensions and measures according to the multidimensional model. Profiles are then exploited to efficiently assess a set of quality properties of sources in a Big Data framework, including completeness, coverage and consistency, which are formally defined and evaluated.

Assessment of Data Quality Through Multi-granularity Data Profiling / Diamantini, Claudia; Mele, Alessandro; Potena, Domenico; Storti, Emanuele. - 13985:(2023), pp. 195-209. [10.1007/978-3-031-42914-9_14]

Assessment of Data Quality Through Multi-granularity Data Profiling

Diamantini, Claudia; Mele, Alessandro; Potena, Domenico; Storti, Emanuele
2023-01-01

2023
Advances in Databases and Information Systems. ADBIS 2023
978-3-031-42913-2
978-3-031-42914-9

Use this identifier to cite or link to this document: https://hdl.handle.net/11566/321172