This paper describes techniques for implementing gradient-descent-based machine learning algorithms on crossbar arrays made of memristors or other analog memory devices. We introduce the Unregulated Step Descent (USD) algorithm, an approximation of the steepest descent algorithm, and discuss how it addresses various hardware implementation issues. Using artificially generated and real-world datasets, we examine the effect of device parameters and their variability on the performance of the algorithm. In addition to providing insights into the effect of device parameters on learning, we illustrate how the USD algorithm partially offsets the effect of device variability. Finally, we describe how the USD algorithm can be implemented in crossbar arrays using a simple 4-phase training scheme. The method allows parallel update of crossbar memory elements and significantly reduces the hardware cost and complexity of the training architecture.