In the last years, data lakes are emerging as an effective and an efficient support for information and knowledge extraction from a huge amount of highly heterogeneous and quickly changing data sources. Data lake management requires the definition of new techniques, very different from the ones adopted for data warehouses in the past. In this scenario, one of the most challenging issues to address consists in the extraction of topic-guided (i.e., thematic) views from the (very heterogeneous and often unstructured) sources of a data lake. In this paper, we propose a new network-based model to uniformly represent structured, semi-structured and unstructured sources of a data lake. Then, we present a new approach to, at least partially, “structuring” unstructured data. Finally, we define a technique to extract topic-guided views from the sources of a data lake, based on similarity and other semantic relationships among source metadata.

An approach to extracting topic-guided views from the sources of a data lake / Diamantini, C.; Lo Giudice, P.; Potena, D.; Storti, E.; Ursino, D.. - In: INFORMATION SYSTEMS FRONTIERS. - ISSN 1387-3326. - 23:1(2021), pp. 243-262. [10.1007/s10796-020-10010-x]

An approach to extracting topic-guided views from the sources of a data lake

C. Diamantini
;
D. Potena;E. Storti;D. Ursino
2021-01-01

Abstract

In the last years, data lakes are emerging as an effective and an efficient support for information and knowledge extraction from a huge amount of highly heterogeneous and quickly changing data sources. Data lake management requires the definition of new techniques, very different from the ones adopted for data warehouses in the past. In this scenario, one of the most challenging issues to address consists in the extraction of topic-guided (i.e., thematic) views from the (very heterogeneous and often unstructured) sources of a data lake. In this paper, we propose a new network-based model to uniformly represent structured, semi-structured and unstructured sources of a data lake. Then, we present a new approach to, at least partially, “structuring” unstructured data. Finally, we define a technique to extract topic-guided views from the sources of a data lake, based on similarity and other semantic relationships among source metadata.
2021
File in questo prodotto:
File Dimensione Formato  
s10796-020-10010-x.pdf

Solo gestori archivio

Tipologia: Versione editoriale (versione pubblicata con il layout dell'editore)
Licenza d'uso: Tutti i diritti riservati
Dimensione 2.35 MB
Formato Adobe PDF
2.35 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
ISF2019_ver2.pdf

Open Access dal 25/05/2021

Descrizione: This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: http://dx.doi.org/10.1007/s10796-020-10010-x
Tipologia: Documento in post-print (versione successiva alla peer review e accettata per la pubblicazione)
Licenza d'uso: Licenza specifica dell’editore
Dimensione 2.27 MB
Formato Adobe PDF
2.27 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11566/275540
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 18
  • ???jsp.display-item.citation.isi??? 11
social impact