Classifying social science concepts with machine learning and text mining is challenging, largely because such concepts are often vaguely defined. In this paper, we take a first conceptual step towards overcoming this challenge. Using the case of social innovation, for which 252 distinct definitions exist, we demonstrate qualitatively that these definitions cluster around four themes, and that individual definitions draw on one or more of these themes, in different combinations, as criteria for defining social innovation. We designed an experiment in which a database of social innovation projects was annotated (i) based on an overall understanding of social innovation and (ii) based on a decomposed definition consisting of the four criteria. As a next step, we will test the performance of various model specifications on these two approaches.