Extracting Format Transformation Examples from Manual Data Corrections

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

One of the challenges in data analysis is the substantial cost of human involvement. Before any analysis can take place, data from heterogeneous sources needs to be cleaned, integrated and transformed into a uniform format. This task, also known as "data wrangling", often requires both technical skills and knowledge from domain experts. Because the effort expended during data wrangling, including format transformation, is usually task-dependent and often tailored to specific sources, it gives rise to a repetitive, time-consuming and labour-intensive process. Current tools support data scientists in conducting wrangling steps, such as the creation of format transformation rules, but the problem of iterative manual work to inform the creation of such rules remains. We propose an approach that observes the actions of data scientists at work correcting errors in a query result. Specifically, we aim to extract format transformation examples from manual corrections carried out by data scientists that can be used to synthesize format transformation programs. In so doing, the objective is to re-use information about recurring manual corrections to automate subsequent transformations. In this paper, we propose example generation and filtering techniques for extracting format transformation examples from manual corrections, and evaluate the techniques empirically on a variety of format transformation tasks.

Bibliographical metadata

Original language: English
Title of host publication: New Trends in Databases and Information Systems
Subtitle of host publication: ADBIS 2018 Short Papers and Workshops, AI*QA, BIGPMED, CSACDB, M2U, BigDataMAPS, ISTREND, DC, Budapest, Hungary, September, 2-5, 2018, Proceedings
Publisher: Springer International Publishing AG
Chapter: 1
Pages: 3-11
Number of pages: 8
Volume: 909
ISBN (Electronic): 978-3-030-00063-9
ISBN (Print): 978-3-030-00062-2
State: Published - 2018