We study how to perform effective Bayesian inference in high-dimensional sparse Factor Analysis models with a zero-norm, sparsity-inducing prior on the model parameters. Such priors represent a methodological ideal, but Bayesian inference in such models is usually regarded as impractical. We test this view. After empirically characterising the properties of existing algorithmic approaches, we use techniques from statistical mechanics to derive a theory of optimal learning in the restricted setting of sparse PCA with a single factor. Finally, we describe a novel 'Dense Message Passing' (DMP) algorithm which achieves near-optimal performance on synthetic data generated from this model.

DMP exploits properties of high-dimensional problems to operate successfully on a densely connected graphical model. Similar algorithms have been developed in the statistical physics community and previously applied to inference problems in coding and sparse classification. We demonstrate that DMP outperforms both a newly proposed variational hybrid algorithm and two other recently published algorithms (SPCA and emPCA) on synthetic data. On two gene expression datasets used in previous studies of sparse PCA, it explains at least as much variance as these algorithms for a given level of sparsity.

A significant potential advantage of DMP is that it provides an estimate of the marginal likelihood which can be used for hyperparameter optimisation. We show that, for the single-factor case, this estimate exhibits good qualitative agreement both with theoretical predictions and with the hyperparameter posterior inferred by a collapsed Gibbs sampler. Preliminary work on an extension to inference of multiple factors indicates its potential for selecting an optimal model from amongst candidates which differ both in the number of factors and in their levels of sparsity.
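
To make the single-factor setting concrete, the following is a minimal sketch of how synthetic data of this kind might be generated. It is an illustration under stated assumptions, not the procedure used in this work: we assume a spike-and-slab form for the zero-norm prior (each loading is nonzero with a fixed probability and Gaussian otherwise), a standard normal factor, and isotropic Gaussian noise. The function name `sample_sparse_pca` and its parameters `sparsity` and `noise_std` are hypothetical.

```python
import numpy as np

def sample_sparse_pca(n_samples, n_dims, sparsity, noise_std=1.0, seed=0):
    """Draw data from an assumed single-factor sparse PCA model.

    Assumptions (not taken from the text): spike-and-slab loadings,
    a standard normal factor, and isotropic Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    # Each loading is nonzero with probability `sparsity` (the "slab"),
    # and exactly zero otherwise (the "spike").
    support = rng.random(n_dims) < sparsity
    w = np.where(support, rng.standard_normal(n_dims), 0.0)
    # One latent factor value per sample.
    z = rng.standard_normal(n_samples)
    # Observations: rank-one signal plus noise.
    x = np.outer(z, w) + noise_std * rng.standard_normal((n_samples, n_dims))
    return x, w, z

x, w, z = sample_sparse_pca(n_samples=200, n_dims=1000, sparsity=0.1)
print(x.shape, np.count_nonzero(w))
```

In this sketch the number of nonzero entries of `w` is its zero-norm, which is the quantity the prior penalises; inference would then aim to recover both the support and the values of `w` from `x` alone.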