A metadata model for profiling multidimensional sources in data ecosystems / Diamantini, Claudia; Mele, Alessandro; Potena, Domenico; Rossetti, Cristina; Storti, Emanuele. - (2024). (The 3rd Italian Conference on Big Data and Data Science, ITADATA 2024, Pisa, Sept. 17-19, 2024).
A metadata model for profiling multidimensional sources in data ecosystems
Diamantini, Claudia; Mele, Alessandro; Potena, Domenico; Storti, Emanuele
2024-01-01
Abstract
The Big Data landscape poses challenges in managing diverse data formats, requiring efficient storage and processing for high-quality analysis. Effective metadata management is crucial for organizing, accessing, and reusing data within these data ecosystems. Existing metadata vocabularies and standards, however, do not adequately accommodate aggregated or summary data. This paper introduces a metadata model to support semantic annotation and profiling of multidimensional data. Defined as an RDF vocabulary, the model provides a flexible and extensible graph representation for metadata at source and attribute levels, aligning dimensions and measures to a reference Knowledge Graph and summarizing value distributions in profiles. An evaluation of the execution time for profile generation is also presented, across data sources with different cardinalities.

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| Potena_A metadata model_Preprint_2024.pdf | Open access | Publisher's version (published version with the publisher's layout) | Creative Commons | 765.03 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


