Modern solutions for Big Data management and analytics, most notably Data Lakes and Data Lakehouses, pose new management challenges stemming from the versatility of these technologies and from the continuously evolving variety and volume of data sources, which make it necessary to track data quality concerns. In this scenario, this paper proposes a metadata management framework for summary data sources that can generate data profiles at various levels of detail. The approach leverages a Knowledge Graph, which defines dimensions and measures according to the multidimensional model. Profiles are then exploited to efficiently assess a set of quality properties of sources in a Big Data framework, including completeness, coverage, and consistency, which are formally defined and evaluated.
Assessment of Data Quality Through Multi-granularity Data Profiling / Diamantini, Claudia; Mele, Alessandro; Potena, Domenico; Storti, Emanuele. - 13985:(2023), pp. 195-209. [10.1007/978-3-031-42914-9_14]