CWL+Research Object == Complete Provenance
Research output: Contribution to conference › Poster › peer-review
Abstract
Provenance is defined as ‘the beginning of something’s existence; something’s origin’ or ‘a record of ownership of a work of art or an antique, used as a guide to authenticity or quality’. Provenance tracking is crucial in scientific studies, where workflows have emerged as an exemplar approach to mechanizing data-intensive analyses. Gil et al. analyzed the challenges of scientific workflows and concluded that a formally specified workflow helps ‘accelerate the rate of scientific process’ and makes it easier for others to reproduce a given experiment, provided that the provenance of the end-to-end process is captured at every level.
We implemented an exemplar GATK variant calling workflow using three approaches to workflow definition, namely Galaxy, CWL and Cpipe, to identify the assumptions implicit in each approach. Combined with a lack of documentation and comprehensive provenance tracking, these assumptions lead to limited or no understanding of what is required to reproduce an analysis. The comparison also allowed us to identify the provenance information that is crucial for genomic workflows.
CWL provides a declarative approach to workflow definition that makes minimal assumptions about the precise software environment, base software dependencies, configuration settings, parameter changes and software versions. It aims to provide an open-source, extensible standard for building flexible and customized workflows that include the intricate details of every process. It facilitates the capture of this information by supporting the declaration of requirements, `cwl:tool`, checksums and so on. Currently, there is no mechanism to gather the information produced by a workflow run into a single bundle for future use. We propose to demonstrate the implementation of such a module for CWL.
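To make the idea of gathering a run's outputs into one bundle concrete, the following is a minimal Python sketch, not the module proposed in the abstract. It copies a workflow description and its output files into a single directory and records a checksum for each file in the `sha1$<hex>` form that CWL uses for `File` objects. The function names (`sha1_checksum`, `bundle_run`), the directory layout (`workflow/`, `data/`) and the `manifest.json` structure are all assumptions made for illustration; they do not follow any particular Research Object specification.

```python
import hashlib
import json
import shutil
from pathlib import Path
from typing import Iterable


def sha1_checksum(path: Path) -> str:
    """Return a checksum string in the 'sha1$<hex>' form used by CWL File objects."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        # Read in chunks so large genomic files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return "sha1$" + digest.hexdigest()


def bundle_run(workflow_file: Path, outputs: Iterable[Path], bundle_dir: Path) -> Path:
    """Copy a workflow file and its outputs into bundle_dir and write a manifest.

    The layout (workflow/, data/, manifest.json) is hypothetical and chosen
    only for this sketch.
    """
    entries = []
    for subdir, files in (("workflow", [workflow_file]), ("data", list(outputs))):
        target_dir = bundle_dir / subdir
        target_dir.mkdir(parents=True, exist_ok=True)
        for source in files:
            target = target_dir / source.name
            shutil.copy2(source, target)
            entries.append({
                "path": target.relative_to(bundle_dir).as_posix(),
                "checksum": sha1_checksum(target),
                "size": target.stat().st_size,
            })
    manifest = bundle_dir / "manifest.json"
    manifest.write_text(json.dumps({"aggregates": entries}, indent=2))
    return manifest
```

A caller would pass the path of the executed workflow description and the paths of the files it produced. The resulting manifest lets a later reader verify that each bundled file is unchanged by recomputing its checksum.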
Bibliographical metadata
Original language | English |
---|---|
Number of pages | 1 |
Publication status | Accepted/In press - 14 Jun 2017 |
Event | Bioinformatics Open Source Conference (BOSC) 2017 - ISMB/ECCB 2017, Prague, Czech Republic. Event duration: 22 Jul 2017 → 23 Jul 2017. http://www.open-bio.org/wiki/BOSC_2017 |
Conference
Conference | Bioinformatics Open Source Conference (BOSC) 2017 |
---|---|
Abbreviated title | BOSC |
Country | Czech Republic |
City | Prague |
Period | 22/07/17 → 23/07/17 |