Exploiting Execution Provenance to Explain Difference Between Two Data-Intensive Computations

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Authors:
  • Priyaa Thavasimani


Successful e-science requires control over variations of an experiment, typically encoded as a script or workflow, as well as the ability to transfer existing experiments to other environments in a reproducible way. Variations may be introduced either deliberately or through inaccurate porting, for instance when the target environment does not satisfy the original dependencies on data and software libraries. Although various provenance-capture systems record these variations, they have not been exploited effectively to explain the difference between two computations. In this paper we address the problem of explaining the observed differences in the outcomes of two such experiment variations in terms of differences in their execution traces. While experiments may differ widely in structure and implementation, our hypothesis is that a general method for producing such explanations need only rely on the provenance of the experiment execution, for which a standard data model, W3C PROV, is available. To test this hypothesis in a concrete workflow setting, we have developed why-diff, an algorithm that matches two provenance traces derived from the execution of workflows that are variations of one another. We present the algorithm, show how it derives a delta graph that encodes the differences between the traces and thus provides the basis for generating human-readable explanations, and evaluate its performance in terms of the number of comparisons required during the matching process.
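The abstract does not give the details of why-diff, so the following is only a rough illustration of the general idea of matching two execution traces and collecting their differences into a delta graph. It walks two simplified traces backwards from their final steps in lock-step, pairs upstream steps by activity name, records every differing pair as a delta node with an edge to the downstream pair it feeds, and counts the step comparisons made. The `Step` structure, the name-based matching rule, the `diff_traces` function and all example values are hypothetical assumptions; this is neither a W3C PROV serialization nor the paper's algorithm.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Step:
    """One activity in a simplified, PROV-like trace (hypothetical structure)."""
    activity: str            # workflow step name, assumed stable across variants
    config: Tuple[str, ...]  # e.g. tool version or parameters
    output: str              # digest of the entity this activity generated
    inputs: Tuple[str, ...]  # ids of upstream steps whose outputs it used


Trace = Dict[str, Step]  # step id -> Step


@dataclass
class Delta:
    """A delta-graph node: a matched (or unmatched) step pair that differs."""
    left: str                                # step id in trace A, "-" if absent
    right: str                               # step id in trace B, "-" if absent
    reasons: List[str]
    feeds: Optional[Tuple[str, str]] = None  # downstream pair it flows into (edge)


def diff_traces(a: Trace, b: Trace, root_a: str, root_b: str) -> Tuple[List[Delta], int]:
    """Walk both traces backwards from their final steps in lock-step.

    Returns the delta nodes and the number of step-pair comparisons made.
    """
    deltas: List[Delta] = []
    comparisons = 0
    seen = set()
    stack = [(root_a, root_b, None)]
    while stack:
        ia, ib, parent = stack.pop()
        if (ia, ib) in seen:  # a shared upstream pair is compared only once
            continue
        seen.add((ia, ib))
        sa, sb = a[ia], b[ib]
        comparisons += 1
        reasons = []
        if sa.activity != sb.activity:
            reasons.append(f"activity {sa.activity} vs {sb.activity}")
        if sa.config != sb.config:
            reasons.append(f"config {sa.config} vs {sb.config}")
        if sa.output != sb.output:
            reasons.append("output differs")
        if reasons:
            deltas.append(Delta(ia, ib, reasons, parent))
        # Pair inputs by producing activity name (assumed unique per step);
        # inputs left unpaired are structural differences between the traces.
        unmatched_b = {b[j].activity: j for j in sb.inputs}
        for i in sa.inputs:
            j = unmatched_b.pop(a[i].activity, None)
            if j is None:
                deltas.append(Delta(i, "-", ["only in trace A"], (ia, ib)))
            else:
                stack.append((i, j, (ia, ib)))
        for j in unmatched_b.values():
            deltas.append(Delta("-", j, ["only in trace B"], (ia, ib)))
    return deltas, comparisons


# Two hypothetical runs of a three-step workflow; run B uses a newer aligner.
A = {
    "a1": Step("load", ("v1",), "h1", ()),
    "a2": Step("align", ("aligner 1.0",), "h2", ("a1",)),
    "a3": Step("call", ("caller 2.0",), "h3", ("a2",)),
}
B = {
    "b1": Step("load", ("v1",), "h1", ()),
    "b2": Step("align", ("aligner 1.1",), "h2x", ("b1",)),
    "b3": Step("call", ("caller 2.0",), "h3x", ("b2",)),
}
deltas, n = diff_traces(A, B, "a3", "b3")
for d in deltas:
    print(d.left, d.right, d.reasons, "->", d.feeds)
print("comparisons:", n)
```

In this toy run the sketch reports the differing final output and links it back to the changed aligner configuration, the kind of chain from which a human-readable explanation could be generated. It makes three comparisons, one per matched step pair. Real traces would also need matching rules for renamed or restructured steps, which this sketch does not attempt.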

Bibliographical metadata

Original language: English
Title of host publication: Exploiting execution provenance to explain difference between two data-intensive computations
Number of pages: 2
Publication status: Published - 2018