Metadata have always played a key role in supporting the cooperation of heterogeneous data sources. This role has become even more important with the advent of data lakes, where metadata are the only means of guaranteeing effective and efficient management of data source interoperability. For this reason, defining new models and paradigms for metadata representation and management is crucial in the data lake scenario. In this paper, we address this issue by proposing a new metadata model well suited to data lakes. Furthermore, to illustrate its capabilities, we present an approach that leverages it to “structure” unstructured sources and to extract thematic views from heterogeneous data lake sources.
A new metadata model to uniformly handle heterogeneous data lake sources / Diamantini, C.; Lo Giudice, P.; Musarella, L.; Potena, D.; Storti, E.; Ursino, D. - Print. - 909:(2018), pp. 165-177. [10.1007/978-3-030-00063-9_17]
A new metadata model to uniformly handle heterogeneous data lake sources
C. Diamantini; D. Potena; E. Storti; D. Ursino
2018-01-01