Capturing a human performance with enough accuracy to retain important information such as emotion and individuality is usually reserved for large, invasive, and expensive motion-capture studios. We present a comprehensive method that captures the underlying physical performance of a human performer, preserving its emotional content, using common off-the-shelf hardware. We employ hybrid bottom-up pose proposals based on a mixture of discriminative and generative methods. We demonstrate how human dynamics can be learned from consumer-level depth-camera hardware, and show for the first time that, while the tracking output of such systems may not be accurate enough to learn monocular feature mappings directly, it is sufficient to learn the underlying human dynamics. We show how pose proposals and learned human dynamics can be integrated in an inference engine to recover a complex 3D human pose from a common RGB monocular image or video. We then describe how the captured performance can be generalised to drive a range of synthesis engines, preserving performer anonymity while retaining the important emotional aspects of the original performance. In addition, we describe a novel peak-finding post-process applied to noisy confidence maps, and present a novel modification to the Gibbs sampler that outperforms traditional Gibbs sampling in most cases.
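For context on the baseline against which the proposed modification is compared, the sketch below shows a standard Gibbs sampler: each variable is resampled in turn from its exact conditional distribution given the others. The target here, a zero-mean bivariate Gaussian with unit variances and correlation `rho`, is purely illustrative and is not taken from the paper; the function name and parameters are our own.

```python
import random
import math

def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Standard (unmodified) Gibbs sampler for an illustrative target:
    a zero-mean bivariate Gaussian with unit variances and correlation rho.
    For this target the conditionals are exact:
        x | y ~ N(rho * y, 1 - rho^2)
        y | x ~ N(rho * x, 1 - rho^2)
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    cond_sd = math.sqrt(1.0 - rho * rho)  # std dev of each conditional
    samples = []
    for i in range(burn_in + n_samples):
        x = rng.gauss(rho * y, cond_sd)  # resample x from p(x | y)
        y = rng.gauss(rho * x, cond_sd)  # resample y from p(y | x)
        if i >= burn_in:                 # discard burn-in iterations
            samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
mean_x = sum(s[0] for s in samples) / len(samples)
corr = sum(s[0] * s[1] for s in samples) / len(samples)
```

With enough samples, `mean_x` approaches 0 and `corr` approaches the target correlation of 0.8, confirming the chain mixes to the target distribution. One known weakness of this plain scheme, which motivates modified samplers in general, is that its one-variable-at-a-time updates mix slowly when variables are strongly correlated.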