Mining Usage Patterns from a Repository of Scientific Workflows / Diamantini, Claudia; Potena, Domenico; Storti, Emanuele. - (2012), pp. 152-157. (Paper presented at The 27th ACM Symposium on Applied Computing, held in Riva del Garda, Trento, Italy, March 26-30, 2012).
Mining Usage Patterns from a Repository of Scientific Workflows
Diamantini, Claudia; Potena, Domenico; Storti, Emanuele
2012-01-01
Abstract
In many experimental domains, especially e-Science, workflow management systems are gaining increasing attention as a means to design and execute in-silico experiments involving data analysis tools. As a by-product, a repository of workflows is generated that formally describes experimental protocols and the way different tools are combined inside experiments. In this paper we describe the use of the SUBDUE graph clustering algorithm to discover sub-workflows from such a repository. Since sub-workflows represent significant usage patterns of tools, the discovered knowledge can be exploited by scientists to learn design practices by example, or to retrieve and reuse workflows. Ultimately, such knowledge helps scientific workflow repositories realize their potential as a knowledge asset. A set of experiments conducted on the myExperiment repository assesses the effectiveness of the approach.
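To give a feel for how a compression-based method such as SUBDUE can surface a tool usage pattern, here is a minimal sketch. It is not the authors' pipeline. It assumes each workflow is a directed acyclic edge list of labeled tool nodes (the tool names are hypothetical). It only scores single-edge patterns ("tool A feeds tool B"), picks non-overlapping instances greedily, and uses the size-based evaluation Size(G) / (Size(S) + Size(G|S)), where Size counts vertices plus edges, in place of full MDL and SUBDUE's beam search over growing substructures.

```python
from collections import defaultdict

# Assumption: a workflow is a list of edges
# (source_id, source_tool_label, target_id, target_tool_label),
# with no self-loops (scientific workflows are typically DAGs).


def graph_size(workflows):
    """Size of the repository graph G: number of vertices plus number of edges."""
    vertices = set()
    edges = 0
    for wf_index, wf in enumerate(workflows):
        for src, _, dst, _ in wf:
            # Node ids are only unique within a workflow, so qualify them.
            vertices.add((wf_index, src))
            vertices.add((wf_index, dst))
            edges += 1
    return len(vertices) + edges


def best_edge_pattern(workflows):
    """Return the single-edge pattern (label pair) with the highest compression value."""
    total = graph_size(workflows)
    # Group edge instances by their label pattern (source tool, target tool).
    instances = defaultdict(list)
    for wf_index, wf in enumerate(workflows):
        for src, src_label, dst, dst_label in wf:
            instances[(src_label, dst_label)].append((wf_index, src, dst))

    scores = {}
    for pattern, occurrences in instances.items():
        used = set()
        count = 0
        for wf_index, src, dst in occurrences:
            a, b = (wf_index, src), (wf_index, dst)
            if a in used or b in used:
                continue  # greedily skip instances that overlap an earlier one
            used.update((a, b))
            count += 1
        pattern_size = 3  # S = 2 vertices + 1 edge
        # Replacing one instance by a single vertex removes 2 vertices and 1 edge
        # and adds 1 vertex, so Size(G|S) shrinks by 2 per instance.
        compressed_size = total - 2 * count
        scores[pattern] = total / (pattern_size + compressed_size)
    return max(scores.items(), key=lambda item: item[1])


# Hypothetical toy repository of three workflows.
workflows = [
    [("a", "fetch_sequence", "b", "blast"), ("b", "blast", "c", "parse_results")],
    [("x", "fetch_sequence", "y", "blast"), ("y", "blast", "z", "render")],
    [("p", "fetch_sequence", "q", "blast")],
]

if __name__ == "__main__":
    print(best_edge_pattern(workflows))
```

In this toy repository the pair `fetch_sequence -> blast` occurs in all three workflows, so replacing it compresses the graph the most. That recurring fragment is the kind of sub-workflow the approach treats as a usage pattern. A real miner would then extend the winning pattern by further edges and iterate, which this sketch omits.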