A lightweight approach to extract interschema properties from structured, semi-structured and unstructured sources in a big data scenario


Knowledge of interschema properties (e.g., synonymies, homonymies, hyponymies and subschema similarities) plays a key role in enabling decision-making across sources characterized by disparate formats. In the past, a wide variety of approaches to derive interschema properties from structured and semi-structured data have been proposed. However, it is currently estimated that more than 80% of data sources are unstructured. Furthermore, the number of sources generally involved in an interaction is much higher than in the past. As a consequence, new approaches are needed to address the interschema property derivation issue in this new scenario. In this paper, we aim to contribute to this setting by proposing an approach capable of uniformly extracting interschema properties from a huge number of structured, semi-structured and unstructured sources. © 2020 World Scientific Publishing Company.

A lightweight approach to extract interschema properties from structured, semi-structured and unstructured sources in a big data scenario / Cauteruccio, F., Lo Giudice, P., Musarella, L., Terracina, G., Ursino, D., Virgili, L. - In: INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING. - ISSN 0219-6220. - 19:3(2020), pp. 849-889. [10.1142/S0219622020500182]


F. Cauteruccio; P. Lo Giudice; L. Musarella; G. Terracina; D. Ursino; L. Virgili

2020-01-01



Publication year: 2020
Journal: INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING
DOI: https://dx.doi.org/10.1142/S0219622020500182
Keywords: big data; interschema property derivation; structuring unstructured data; unstructured sources
Publication type: 1.1 Journal article

Files in this record:

File	Size	Format
Cauteruccio_A-Lightweight-approach_2020.pdf	2.25 MB	Adobe PDF
	Access: archive managers only. Type: publisher's version (published version with the publisher's layout). License: all rights reserved.
IJITDM19.pdf	2.54 MB	Adobe PDF
	Access: open access since 14 June 2021. Description: Electronic version of an article published as A lightweight approach to extract interschema properties from structured, semi-structured and unstructured sources in a big data scenario / Cauteruccio, F.; Lo Giudice, P.; Musarella, L.; Terracina, G.; Ursino, D.; Virgili, L. - In: INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING. - ISSN 0219-6220. - 19:3(2020), pp. 849-889. 10.1142/S0219622020500182 © 2020 World Scientific Publishing Company, https://www.worldscientific.com/worldscinet/ijitdm. Only personal use of this material is permitted. Permission from publisher must be obtained for all other uses, in any current or future media. Type: post-print (version following peer review and accepted for publication). License: publisher-specific license.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11566/276193
