An approach to extracting thematic views from highly heterogeneous sources of a data lake / Diamantini, C.; Lo Giudice, P.; Musarella, L.; Potena, D.; Storti, E.; Ursino, D. - 2161:(2018). (Paper presented at the 26th Italian Symposium on Advanced Database Systems (SEBD 2018), held in Castellaneta Marina (TA), June 2018).

An approach to extracting thematic views from highly heterogeneous sources of a data lake

C. Diamantini;D. Potena;E. Storti;D. Ursino
2018-01-01

Abstract

In recent years, data lakes have emerged as an effective and efficient means of supporting information and knowledge extraction from huge numbers of highly heterogeneous and rapidly changing data sources. Data lake management requires new techniques, very different from those adopted for data warehouses in the past. One of the main issues in this scenario is the extraction of thematic views from the (very heterogeneous and generally unstructured) data sources of a data lake. In this paper, we propose a new network-based model to uniformly represent the structured, semi-structured and unstructured sources of a data lake. Then, we present a new approach to “structure” unstructured data, at least partially. Finally, we define a technique to extract thematic views from the sources of a data lake, based on similarity and other semantic relations among the metadata of the data sources.
2018

Use this identifier to cite or link to this document: https://hdl.handle.net/11566/258751

Citations
  • Scopus: 4