Accurate measurement of classification performance is important not only for correctly assigning samples to pre-designated classes, such as disease versus control, but also for ensuring that the underlying biological question can be answered competently. We recognised that the metabolomics literature contains minimal investigation of pre-treatment methods and their influence on classification accuracy. The standard approach to pre-treatment prior to classification modelling often relies on methods such as autoscaling, which places all variables on a comparable scale and thereby helps to separate two or more groups (target classes). This is often undertaken without any prior investigation into the influence of the pre-treatment method on the data and on the supervised learning techniques employed. Whilst the standard approach is useful in many cases for deriving essential information such as predictive ability or visual interpretation, this study shows that it is not always the most suitable option available. Here, we investigated the influence of six pre-treatment methods (autoscaling, range, level, Pareto and vast scaling, as well as no scaling) on four classification models, namely principal components-discriminant function analysis (PC-DFA), support vector machines (SVM), random forests (RF) and k-nearest neighbours (kNN), using three publicly available metabolomics data sets. We demonstrate that the choice of pre-treatment method can greatly affect the interpretation of the statistical modelling outputs. The results show that data pre-treatment is context dependent and that no single method was superior for all the data sets used. 
Whilst vast scaling produced the most robust models in terms of classification rate for PC-DFA of both NMR spectroscopy data sets, in general we conclude that vast scaling and autoscaling produced similar results that were superior to those of the other four pre-treatment methods on both NMR and GC–MS data sets. We therefore recommend vast scaling as the primary pre-treatment method, as it appears to be more stable and robust across all the classifiers evaluated in this study.
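
For readers unfamiliar with the six pre-treatment options named above, the following is a minimal sketch of how they are commonly defined for a samples-by-variables data matrix. The formulas are the definitions generally used in the metabolomics literature and are not stated in this text; the function name `pretreat`, the method labels and the use of the sample standard deviation (`ddof=1`) are illustrative assumptions rather than the implementation used in this study.

```python
import numpy as np


def pretreat(X, method="auto"):
    """Apply a column-wise pre-treatment to a samples x variables matrix.

    Hypothetical helper: the method labels and ddof=1 are assumptions.
    Variables with zero mean or zero spread are not handled here.
    """
    X = np.asarray(X, dtype=float)
    if method == "none":
        return X.copy()  # no scaling: data left as measured

    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)  # sample standard deviation per variable
    centred = X - mean           # every scaling below starts from mean-centring

    if method == "auto":    # unit variance for every variable
        return centred / std
    if method == "range":   # divide by the observed range of each variable
        return centred / (X.max(axis=0) - X.min(axis=0))
    if method == "level":   # divide by the mean: relative change from average
        return centred / mean
    if method == "pareto":  # divide by the square root of the standard deviation
        return centred / np.sqrt(std)
    if method == "vast":    # autoscaling weighted by mean/std, i.e. 1/CV
        return (centred / std) * (mean / std)
    raise ValueError(f"unknown pre-treatment method: {method!r}")
```

In a classification workflow the column statistics would normally be estimated on the training samples only and then applied to the test samples, so that the reported classification rate is not influenced by the held-out data.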