Portable workflow and tool descriptions with the CWL

Research output: Contribution to conference (Abstract)

  External authors:
  • Peter Amstutz
  • Nebojša Tijanić
  • John Kern
  • Luka Stojanovic
  • Tim Pierce
  • John Chilton
  • Maxim Mikheev
  • Samuel Lampa
  • Hervé Ménager
  • Scott Frazer
  • Venkat Sai Malladi
  • Michael R. Crusoe

Abstract

Bioinformatics workflow platforms provide provenance tracking, execution and data management, repeatability, and an environment for data exploration and visualization. Example F/OSS bioinformatics workflow platforms include Arvados, Galaxy, Mobyle, iPlant Discovery Environment, Apache Taverna and Yabi. Each one presently represents workflows using its own vocabulary and format, and adding a new tool requires a different procedure on each system.

Neither the descriptions of the workflows nor the descriptions of the tools that power them are usable outside of the platforms they were written for. This results in duplicated effort and reduced reusability, and it impedes collaboration.

Three engineers (Peter Amstutz, John Chilton, and Nebojša Tijanić) from leading bioinformatics platform teams (Curoverse, Galaxy Team, and Seven Bridges Genomics) and a tool author (Michael R. Crusoe / khmer project) started working together at the BOSC 2014 Codefest. Their initial focus was on developing a portable means of representing, sharing and invoking command line tools, which then became the basis for portable workflow descriptions. The group placed high value on re-using existing formats and ontologies; they governed themselves with a lazy consensus / do-ocracy approach.

On March 31st, 2015, the group released their second draft of the Common Workflow Language specification. The serialized form is a YAML document that is validated by an Apache Avro schema and can be interpreted as an RDF graph using JSON-LD. The documents are also valid Wf4Ever 'wfdesc' descriptions after a simple transformation. Future drafts will include the use of the EDAM ontology to describe the tools, enabling discovery via the ELIXIR tool registry.
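
To make the idea of a portable tool description concrete, here is a minimal Python sketch. It is not taken from the draft specification: the field names (`baseCommand`, `inputs`, `inputBinding`, `position`, `prefix`) are assumptions loosely modeled on CWL, and `build_command` and `roundtrip` are hypothetical helpers. It only illustrates that a tool interface can be plain, platform-independent data from which any engine could assemble the same command line. JSON stands in for YAML here to keep the example standard-library only.

```python
import json

# Hypothetical tool description. Field names are illustrative assumptions
# loosely modeled on CWL, not quoted from the draft specification.
TOOL_DESCRIPTION = {
    "class": "CommandLineTool",
    "baseCommand": ["grep"],
    "inputs": [
        {"id": "pattern", "type": "string", "inputBinding": {"position": 1}},
        {"id": "ignore_case", "type": "boolean",
         "inputBinding": {"position": 0, "prefix": "-i"}},
        {"id": "infile", "type": "File", "inputBinding": {"position": 2}},
    ],
}


def build_command(tool, job):
    """Combine a tool description with concrete input values into an argv list."""
    bound = []
    for param in tool["inputs"]:
        binding = param.get("inputBinding")
        value = job.get(param["id"])
        # Skip unbound parameters, missing values, and boolean flags set to False.
        if binding is None or value is None or value is False:
            continue
        args = []
        if binding.get("prefix"):
            args.append(binding["prefix"])
        if value is not True:  # a True flag contributes only its prefix
            args.append(str(value))
        bound.append((binding.get("position", 0), args))
    bound.sort(key=lambda item: item[0])  # order arguments by declared position
    argv = list(tool["baseCommand"])
    for _, args in bound:
        argv.extend(args)
    return argv


def roundtrip(tool):
    """Serialize and reload the description to show that it is plain data."""
    return json.loads(json.dumps(tool))


if __name__ == "__main__":
    job = {"pattern": "ATG", "ignore_case": True, "infile": "reads.fa"}
    print(build_command(roundtrip(TOOL_DESCRIPTION), job))
    # prints ['grep', '-i', 'ATG', 'reads.fa']
```

Because the description survives serialization unchanged, two different platforms reading the same document would, under these assumptions, construct the same invocation.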

Seven Bridges Genomics, the Galaxy Project, and the organization behind Arvados (Curoverse) have started to implement support for the Common Workflow Language, with interest from other projects and organizations such as Apache Taverna, BioDatomics and the Broad Institute. Developers on the Galaxy Team are exploring support for CWL tool descriptions, with plans to add support for CWL workflow descriptions. Tool authors and other community members will benefit because they will only have to describe their tool and workflow interfaces once. This will enable scientists, researchers and other analysts to share their workflows and pipelines in an interoperable yet human-readable manner.

Bibliographical metadata

Original language: English
Number of pages: 1
Publication status: Published - 10 Jul 2015
Event: Bioinformatics Open Source Conference - ISMB 2015, Dublin, Ireland
Event duration: 10 Jun 2015 - 11 Jun 2015
Conference number: 2015
http://www.open-bio.org/wiki/BOSC_2015

Conference

Conference: Bioinformatics Open Source Conference
Abbreviated title: BOSC
Country: Ireland
City: Dublin
Period: 10/06/15 - 11/06/15