Static Analysis of Taverna Workflows To Predict Provenance Patterns

Citation formats

Standard

Static Analysis of Taverna Workflows To Predict Provenance Patterns. / Alper, Pinar; Belhajjame, Khalid; Goble, Carole.

In: Future Generation Computer Systems, Vol. 75, 10.2017, p. 310-329.

Research output: Contribution to journal › Article

Harvard

Alper, P, Belhajjame, K & Goble, C 2017, 'Static Analysis of Taverna Workflows To Predict Provenance Patterns', Future Generation Computer Systems, vol. 75, pp. 310-329. https://doi.org/10.1016/j.future.2017.01.004

APA

Alper, P., Belhajjame, K., & Goble, C. (2017). Static Analysis of Taverna Workflows To Predict Provenance Patterns. Future Generation Computer Systems, 75, 310-329. https://doi.org/10.1016/j.future.2017.01.004

Vancouver

Alper P, Belhajjame K, Goble C. Static Analysis of Taverna Workflows To Predict Provenance Patterns. Future Generation Computer Systems. 2017 Oct;75:310-329. https://doi.org/10.1016/j.future.2017.01.004

Author

Alper, Pinar ; Belhajjame, Khalid ; Goble, Carole. / Static Analysis of Taverna Workflows To Predict Provenance Patterns. In: Future Generation Computer Systems. 2017 ; Vol. 75. pp. 310-329.

Bibtex

@article{fd2833ab287a43ba9c06a89416d4e0ee,
title = "Static Analysis of {Taverna} Workflows To Predict Provenance Patterns",
abstract = "Workflows have found adoption in scientific domains particularly due to their automation and provenance features. Using workflows scientists can repeat analyses with different input parameters to later use provenance to access and compare results based on respective parameters. An assumption that is often made is that by designing an analysis as a workflow, we get parameter-to-result traceability for free with workflow provenance. This assumption holds for cases of coarse-grained traceability, where an entire workflow is subjected to repetition and all workflow parameters contribute to all results. On the other hand for cases requiring finer grained traceability, where a workflow is configured with collections of parameters and analyses within a workflow are repeated with combinations of parameters from collections, this assumption is not guaranteed to hold. In this paper we identify two dimensions that affect fine-grained traceability as: 1) the level of granularity supported by a workflow system in modelling parameters/data in workflows and in provenance, which we name as the level of support for Factorial Design, and 2) the practice of scientists in successfully encoding Factorial Design into workflows. Taverna is a workflow system that provides extensive features for factorial design, meanwhile it provides an uncontrolled approach to workflow design; meaning scientists may create workflows, which, when run, could break traceability in provenance. Using a real-world Taverna workflow we show how broken traceability manifests in provenance and how it can render provenance practically useless for accessing workflow outputs derived from particular input parameters. In order to prevent broken traceability from occurring, we describe a rule-based static analysis technique, which operates over workflow descriptions and anticipates patterns in provenance. Our rules exploit the well-defined execution behaviour in the Taverna system.
In order to understand Factorial Design support in workflow systems in general, we provide a comparative survey. We conclude that other workflow systems also provide constructs for Factorial Design, and, similar to Taverna, they too are prone to broken traceability.",
author = "Pinar Alper and Khalid Belhajjame and Carole Goble",
year = "2017",
month = oct,
doi = "10.1016/j.future.2017.01.004",
language = "English",
volume = "75",
pages = "310--329",
journal = "Future Generation Computer Systems",
issn = "0167-739X",
publisher = "Elsevier BV",

}

RIS

TY  - JOUR

T1  - Static Analysis of Taverna Workflows To Predict Provenance Patterns

AU  - Alper, Pinar

AU  - Belhajjame, Khalid

AU  - Goble, Carole

PY  - 2017/10

Y1  - 2017/10

N2  - Workflows have found adoption in scientific domains particularly due to their automation and provenance features. Using workflows scientists can repeat analyses with different input parameters to later use provenance to access and compare results based on respective parameters. An assumption that is often made is that by designing an analysis as a workflow, we get parameter-to-result traceability for free with workflow provenance. This assumption holds for cases of coarse-grained traceability, where an entire workflow is subjected to repetition and all workflow parameters contribute to all results. On the other hand for cases requiring finer grained traceability, where a workflow is configured with collections of parameters and analyses within a workflow are repeated with combinations of parameters from collections, this assumption is not guaranteed to hold. In this paper we identify two dimensions that affect fine-grained traceability as: 1) the level of granularity supported by a workflow system in modelling parameters/data in workflows and in provenance, which we name as the level of support for Factorial Design, and 2) the practice of scientists in successfully encoding Factorial Design into workflows. Taverna is a workflow system that provides extensive features for factorial design, meanwhile it provides an uncontrolled approach to workflow design; meaning scientists may create workflows, which, when run, could break traceability in provenance. Using a real-world Taverna workflow we show how broken traceability manifests in provenance and how it can render provenance practically useless for accessing workflow outputs derived from particular input parameters. In order to prevent broken traceability from occurring, we describe a rule-based static analysis technique, which operates over workflow descriptions and anticipates patterns in provenance. Our rules exploit the well-defined execution behaviour in the Taverna system.
In order to understand Factorial Design support in workflow systems in general, we provide a comparative survey. We conclude that other workflow systems also provide constructs for Factorial Design, and, similar to Taverna, they too are prone to broken traceability.

AB  - Workflows have found adoption in scientific domains particularly due to their automation and provenance features. Using workflows scientists can repeat analyses with different input parameters to later use provenance to access and compare results based on respective parameters. An assumption that is often made is that by designing an analysis as a workflow, we get parameter-to-result traceability for free with workflow provenance. This assumption holds for cases of coarse-grained traceability, where an entire workflow is subjected to repetition and all workflow parameters contribute to all results. On the other hand for cases requiring finer grained traceability, where a workflow is configured with collections of parameters and analyses within a workflow are repeated with combinations of parameters from collections, this assumption is not guaranteed to hold. In this paper we identify two dimensions that affect fine-grained traceability as: 1) the level of granularity supported by a workflow system in modelling parameters/data in workflows and in provenance, which we name as the level of support for Factorial Design, and 2) the practice of scientists in successfully encoding Factorial Design into workflows. Taverna is a workflow system that provides extensive features for factorial design, meanwhile it provides an uncontrolled approach to workflow design; meaning scientists may create workflows, which, when run, could break traceability in provenance. Using a real-world Taverna workflow we show how broken traceability manifests in provenance and how it can render provenance practically useless for accessing workflow outputs derived from particular input parameters. In order to prevent broken traceability from occurring, we describe a rule-based static analysis technique, which operates over workflow descriptions and anticipates patterns in provenance. Our rules exploit the well-defined execution behaviour in the Taverna system.
In order to understand Factorial Design support in workflow systems in general, we provide a comparative survey. We conclude that other workflow systems also provide constructs for Factorial Design, and, similar to Taverna, they too are prone to broken traceability.

U2  - 10.1016/j.future.2017.01.004

DO  - 10.1016/j.future.2017.01.004

M3  - Article

VL  - 75

SP  - 310

EP  - 329

JO  - Future Generation Computer Systems

JF  - Future Generation Computer Systems

SN  - 0167-739X

ER  - 