A metadata model for profiling multidimensional sources in data ecosystems / Diamantini, Claudia; Mele, Alessandro; Potena, Domenico; Rossetti, Cristina; Storti, Emanuele. - (2024). (The 3rd Italian Conference on Big Data and Data Science, ITADATA 2024, Pisa, Sept. 17-19, 2024).

A metadata model for profiling multidimensional sources in data ecosystems

Diamantini, Claudia; Mele, Alessandro; Potena, Domenico; Storti, Emanuele
2024-01-01

Abstract

The Big Data landscape poses challenges in managing diverse data formats, requiring efficient storage and processing for high-quality analysis. Effective metadata management is crucial for organizing, accessing, and reusing data within these data ecosystems. Existing metadata vocabularies and standards, however, do not adequately accommodate aggregated or summary data. This paper introduces a metadata model to support semantic annotation and profiling of multidimensional data. Defined as an RDF vocabulary, the model provides a flexible and extensible graph representation for metadata at source and attribute levels, aligning dimensions and measures to a reference Knowledge Graph and summarizing value distributions in profiles. An evaluation of the execution time for profile generation is also presented, across data sources with different cardinalities.
Files in this item:
Potena_A metadata model_Preprint_2024.pdf

Open access

Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 765.03 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11566/347837