This thesis studies a family of unsupervised learning algorithms called feature distribution learning and their extension to perform covariate shift adaptation. Unsupervised learning is one of the most active areas of research in machine learning, and a central challenge in this field is to develop simple and robust algorithms able to work in real-world scenarios. A traditional assumption of machine learning is the independence and identical distribution of data. Unfortunately, in realistic conditions this assumption is often unmet and the performances of traditional algorithms may be severely compromised. Covariate shift adaptation has then developed as a lively sub-field concerned with designing algorithms that can account for covariate shift, that is for a difference in the distribution of training and test samples. The first part of this dissertation focuses on the study of a family of unsupervised learning algorithms that has been recently proposed and has shown promise: feature distribution learning; in particular, sparse filtering, the most representative feature distribution learning algorithm, has commanded interest because of its simplicity and state-of-the-art performance. Despite its success and its frequent adoption, sparse filtering lacks any strong theoretical justification. This research questions how feature distribution learning can be rigorously formalized and how the dynamics of sparse filtering can be explained. These questions are answered by first putting forward a new definition of feature distribution learning based on concepts from information theory and optimization theory; relying on this, a theoretical analysis of sparse filtering is carried out, which is validated on both synthetic and real-world data sets. In the second part, the use of feature distribution learning algorithms to perform covariate shift adaptation is considered. Indeed, because of their definition and apparent insensitivity to the problem of modelling data distributions, feature distribution learning algorithms seems particularly fit to deal with covariate shift. This research questions whether and how feature distribution learning may be fruitfully employed to perform covariate shift adaptation. After making explicit the conditions of success for performing covariate shift adaptation, a theoretical analysis of sparse filtering and another novel algorithm, periodic sparse filtering, is carried out; this allows for the determination of the specific conditions under which these algorithms successfully work. Finally, a comparison of these sparse filtering-based algorithms against other traditional algorithms aimed at covariate shift adaptation is offered, showing that the novel algorithm is able to achieve competitive performance. In conclusion, this thesis provides a new rigorous framework to analyse and design feature distribution learning algorithms; it sheds light on the hidden assumptions behind sparse filtering, offering a clear understanding of its conditions of success; it uncovers the potential and the limitations of sparse filtering-based algorithm in performing covariate shift adaptation. These results are relevant both for researchers interested in furthering the understanding of unsupervised learning algorithms and for practitioners interested in deploying feature distribution learning in an informed way.