Metabolomics, like other omics methods, produces huge datasets of biological variables, often accompanied by the necessary metadata. However, regardless of the form in which these are produced they are merely the ground substance for assisting us in answering biological questions. In this short tutorial review and position paper we seek to set out some of the elements of “best practice” in the optimal acquisition of such data, and in the means by which they may be turned into reliable knowledge. Many of these steps involve the solution of what amount to combinatorial optimization problems, and methods developed for these, especially those based on evolutionary computing, are proving valuable. This is done in terms of a “pipeline” that goes from the design of good experiments, through instrumental optimization, data storage and manipulation, the chemometric data processing methods in common use, and the necessary means of validation and cross-validation for giving conclusions that are credible and likely to be robust when applied in comparable circumstances to samples not used in their generation.