Background: There is growing interest in using digital photographs in dental epidemiology. However, the reporting of training procedures and metric-based performance outcomes intended to promote data quality prior to the actual scoring of digital images has not been optimal.

Methods: A training study was undertaken to assess training methodology and to select a group of scorers to assess images for dental fluorosis captured during the 2013–2014 National Health and Nutrition Examination Survey (NHANES). Ten examiners and 2 reference examiners assessed dental fluorosis using the Dean's Index (DI) and the Thylstrup-Fejerskov (TF) Index. Trainees were evaluated using 128 digital images of upper anterior central incisors at three different periods and using images from approximately 40 participants during two other periods. Scoring of all digital images was done using a secure, web-based system.

Results: When assessing fluorosis on a nominal scale (apparent vs. non-apparent), the unweighted kappa for DI ranged from 0.68 to 0.77, and on an ordinal scale, the linear-weighted kappa for DI ranged from 0.43 to 0.69 during the final evaluation. For TF, the unweighted kappa on the nominal scale ranged from 0.67 to 0.89, and the linear-weighted kappa on the ordinal scale ranged from 0.61 to 0.77 during the final evaluation. No examiner improvement was observed when a clinical assessment feature was added during training to assess dental fluorosis using TF; results for DI were less clear.

Conclusion: Providing examiners with theoretical material and scoring criteria prior to training may be minimally sufficient to calibrate examiners to score digital photographs. There may be some benefit in providing in-person training to discuss criteria and review previously scored images.
Previous experience as a clinical examiner seems to provide a slight advantage in scoring photographs for DI, but minimizing the number of scorers does improve inter-examiner concordance for both DI and TF.
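The unweighted and linear-weighted kappa statistics reported above can be computed with a short routine. The sketch below is illustrative only: the two rating vectors are made-up toy data on a 3-point scale, not the NHANES fluorosis scores, and the function is a generic implementation of Cohen's kappa rather than the study's actual analysis code.

```python
def cohen_kappa(r1, r2, n_categories, weighted=False):
    """Cohen's kappa for two raters on an ordinal 0..k-1 scale.

    weighted=False gives the unweighted kappa (all disagreements
    count equally); weighted=True applies linear weights, where a
    disagreement of d categories is penalised d / (k - 1).
    """
    n, k = len(r1), n_categories
    # Observed confusion matrix (rows: rater 1, cols: rater 2).
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[a][b] += 1
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    # Disagreement weight: zero on the diagonal under both schemes.
    def w(i, j):
        if weighted:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    # Observed vs. chance-expected weighted disagreement.
    d_obs = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    d_exp = sum(w(i, j) * row[i] * col[j] / n
                for i in range(k) for j in range(k))
    return 1.0 - d_obs / d_exp

# Toy ratings on a 3-point scale (hypothetical, for illustration).
rater1 = [0, 0, 1, 1, 2, 2]
rater2 = [0, 1, 1, 1, 2, 0]
print(cohen_kappa(rater1, rater2, 3))                 # → 0.5 (unweighted)
print(cohen_kappa(rater1, rater2, 3, weighted=True))  # → 0.4 (linear-weighted)
```

Note how linear weighting changes the result: the (2, 0) disagreement is two categories apart, so it is penalised fully under both schemes, while the (0, 1) disagreement is only half-penalised under linear weighting.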