Data-driven research requires many people from different domains to collaborate efficiently. The domain scientist collects and analyzes scientific data, the data scientist develops new techniques, and the tool developer implements, optimizes and maintains existing techniques for use throughout science and industry. Today, however, this data science expertise is fragmented across loosely connected communities and scattered over many people, making it very hard to find the right expertise, data and tools at the right time. Collaborations are typically small, and cross-domain knowledge transfer through the literature is slow. Although progress has been made, it remains far from easy for researchers to build on each other's latest results and to collaborate effortlessly across domains. This slows down data-driven research and innovation, drives up costs and exacerbates the risks associated with the inappropriate use of data science techniques.
We propose to create an open, online collaboration platform, a ‘collaboratory’ for data-driven research, that brings together data scientists, domain scientists and tool developers. It will enable data scientists to evaluate their latest techniques on many current scientific datasets, allow domain scientists to discover which techniques work best on their data, and engage tool developers to share in the latest developments. It will change the scale of collaborations from small to potentially massive, and their pace from periodic to real-time. This will be an inclusive movement operating across academia, healthcare, and industry, and it will empower more students to engage in data science.