Source selection is the problem of identifying a subset of available data sources that best meets a user’s needs. In this paper we propose a user-driven approach to source selection that seeks to identify the sources that are most fit for purpose. The approach employs a decision support methodology to take account of a user’s context: end users tune their preferences by specifying the relative importance of different criteria, and the approach seeks a trade-off solution aligned with those preferences. The approach is extensible to diverse criteria, not drawn from a fixed set, and solutions can use a subset of the data from each selected source, rather than requiring that sources be used in their entirety or not at all.
The paper describes and motivates the approach, presents a methodology for modelling a user’s context together with a collection of optimisation algorithms for exploring the space of solutions, and compares and evaluates these algorithms using multiple real-world data sets. The experiments show that the source selection results are attuned to each user’s preferences, both with respect to overall weighted utility and through faithful representation of a user’s preferences within a result, while scaling to potentially thousands of sources.
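To make the notion of weighted utility and partial use of sources concrete, the following is a minimal sketch and not one of the paper's optimisation algorithms. It assumes that each source reports a row count and per-criterion scores already normalised to [0, 1], that user preferences are given as non-negative relative weights, and that a simple greedy fill towards a target number of rows is an acceptable stand-in for the search. All names (`weighted_utility`, `select_sources`, the criteria `accuracy` and `freshness`) are hypothetical.

```python
from typing import Dict, Tuple

Scores = Dict[str, float]            # criterion name -> score in [0, 1] (assumed)
Source = Tuple[int, Scores]          # (available rows, per-criterion scores)


def normalise_weights(weights: Dict[str, float]) -> Dict[str, float]:
    """Scale relative importance values so that they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one criterion needs a positive weight")
    return {criterion: w / total for criterion, w in weights.items()}


def weighted_utility(scores: Scores, weights: Dict[str, float]) -> float:
    """Weighted sum of criterion scores; missing criteria count as 0."""
    norm = normalise_weights(weights)
    return sum(norm[c] * scores.get(c, 0.0) for c in norm)


def select_sources(
    sources: Dict[str, Source],
    weights: Dict[str, float],
    target_rows: int,
) -> Dict[str, int]:
    """Greedy illustration: take rows from the highest-utility sources first,
    allowing only part of the last source to be used."""
    ranked = sorted(
        sources.items(),
        key=lambda item: weighted_utility(item[1][1], weights),
        reverse=True,
    )
    plan: Dict[str, int] = {}
    remaining = target_rows
    for name, (rows, _scores) in ranked:
        if remaining <= 0:
            break
        take = min(rows, remaining)
        plan[name] = take
        remaining -= take
    return plan


if __name__ == "__main__":
    # Hypothetical sources and scores, for illustration only.
    sources = {
        "a": (100, {"accuracy": 0.9, "freshness": 0.2}),
        "b": (50, {"accuracy": 0.4, "freshness": 0.9}),
        "c": (80, {"accuracy": 0.5, "freshness": 0.5}),
    }
    print(select_sources(sources, {"accuracy": 3, "freshness": 1}, 120))
    print(select_sources(sources, {"accuracy": 1, "freshness": 3}, 120))
```

In this sketch, changing only the relative weights changes which sources are chosen and how many rows are taken from each, which is the behaviour the abstract refers to as results attuned to a user’s preferences.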