Presents a method for evaluating educational software. The evaluation is designed as a field study and comprises a test of remedial skills, an essay test of conceptual understanding, and a system that records how students use a given program. The instruments were used to evaluate ConStatS, a program for teaching conceptual understanding of probability and statistics. Ss were 327 undergraduates who used ConStatS and 63 control Ss who used tool-based statistics software but not ConStatS. Experimental Ss outperformed control Ss on 92 of the 103 questions; the 10 questions on which the experimental group showed the greatest advantage over the control group involved transformations, probability, and the concepts of deviation and sensitivity of summary measures. (PsycLIT Database Copyright 1995 American Psychological Assn, all rights reserved)