Machine learning (ML) techniques, in particular supervised regression algorithms, are a promising new way to use multiple observables to predict a cluster's mass or other key features. To investigate this approach we use the MACSIS sample of simulated hydrodynamical galaxy clusters to train a variety of ML models, mimicking different datasets. We find that, compared to predicting the cluster mass from the σ–M relation, the scatter in the predicted-to-true mass ratio can be reduced by a factor of 4, from 0.130 ± 0.004 dex (≈35 per cent) to 0.031 ± 0.001 dex (≈7 per cent), when using the same interloper-contaminated (out to 5r200c) spectroscopic galaxy sample. Interestingly, omitting line-of-sight galaxy velocities from the training set has no effect on the scatter when the galaxies are taken from within r200c. We also train ML models to reproduce estimated masses derived from mock X-ray and weak lensing analyses. While the weak lensing masses can be recovered with a scatter similar to that obtained when training on the true mass, the hydrostatic mass suffers from a significantly higher scatter of ≈0.13 dex (≈35 per cent). Training models using dark-matter-only simulations does not significantly increase the scatter in predicted cluster mass compared to training on simulated clusters with hydrodynamics. In summary, we find ML techniques to offer a powerful method for predicting masses for large samples of clusters, a vital requirement for cosmological analysis with future surveys.
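The core idea above, regressing cluster mass on several observables and quantifying the scatter of the predicted-to-true mass ratio in dex, can be illustrated with a minimal sketch. This is a toy example on synthetic data, not the MACSIS sample or the paper's actual pipeline: the mock observables (a velocity dispersion and a richness, each a noisy power law in mass), the slopes, and the scatters are all illustrative assumptions, and a simple least-squares fit stands in for the more flexible ML regressors used in the paper.

```python
# Toy sketch (hypothetical data): predict log10 cluster mass from two mock
# observables with a least-squares regression, then measure the scatter of
# log10(M_pred / M_true) in dex, analogous to the abstract's comparison.
import random
import statistics

random.seed(42)

def make_cluster():
    log_m = random.uniform(14.0, 15.5)             # true log10 mass (assumed range)
    # mock observables as noisy power laws in mass (illustrative slopes/scatters)
    log_sigma = (1.0 / 3.0) * log_m + random.gauss(0, 0.04)   # velocity dispersion
    log_rich = 0.8 * log_m + random.gauss(0, 0.10)            # richness
    return log_m, [1.0, log_sigma, log_rich]       # leading 1.0 = intercept term

train = [make_cluster() for _ in range(2000)]
test = [make_cluster() for _ in range(500)]

def fit_least_squares(data):
    # Solve the normal equations (A^T A) w = A^T y by Gaussian elimination.
    n = len(data[0][1])
    ata = [[sum(x[i] * x[j] for _, x in data) for j in range(n)] for i in range(n)]
    aty = [sum(x[i] * y for y, x in data) for i in range(n)]
    for col in range(n):                           # forward elimination
        for r in range(col + 1, n):
            f = ata[r][col] / ata[col][col]
            ata[r] = [a - f * b for a, b in zip(ata[r], ata[col])]
            aty[r] -= f * aty[col]
    w = [0.0] * n
    for r in reversed(range(n)):                   # back substitution
        w[r] = (aty[r] - sum(ata[r][c] * w[c] for c in range(r + 1, n))) / ata[r][r]
    return w

w = fit_least_squares(train)
residuals = [sum(wi * xi for wi, xi in zip(w, x)) - y for y, x in test]
scatter_dex = statistics.stdev(residuals)
print(f"scatter in log10(M_pred/M_true): {scatter_dex:.3f} dex")
```

Combining the two observables yields a smaller scatter than either alone would give, which is the basic mechanism by which multi-observable ML models beat a single scaling relation such as σ–M.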