Coaching and Test Validity

October 25, 2007

A continuing concern on the part of testing specialists, admissions officers, policy analysts and others is that commercial “coaching” schools for college and graduate school admissions tests, if they are effective in substantially raising students’ scores, could adversely affect the predictive value of such tests in college admissions. It is instructive to examine this concern in some detail.

The coaching debate goes to the heart of several fundamental psychometric questions. What is the nature of the abilities measured by scholastic aptitude tests? Are failures to substantially increase scores through coaching the result of failure in pedagogy or of the inherent difficulty of teaching thinking and reasoning skills? To what extent do score improvements that result from coaching contribute to or detract from test validity?

With respect to the possible adverse effects on predictive validity, three outcomes of coaching are possible. The 2 x 2 table below, a deliberately oversimplified depiction of how college and professional school admissions decisions are actually made, will serve to illustrate these three outcomes. The horizontal axis, representing admission test scores, has been dichotomized into scores below and above the “cut score” for admission. The vertical axis has been dichotomized into successful and unsuccessful performance in school. Applicants in the lower left quadrant and the upper right quadrant represent “correct” admissions decisions. Those in the lower left quadrant (valid rejections) did not achieve scores high enough to be accepted, and, had they been accepted anyway, they would have been unsuccessful. Students in the upper right quadrant (valid acceptances) exceeded the cut score on the test and successfully graduated. Applicants in the upper left and lower right quadrants represent incorrect admissions decisions. Those in the upper left quadrant (false rejections) did not achieve scores high enough to be accepted, but had they been accepted, they would have succeeded in college. Those in the lower right quadrant (false acceptances) were accepted in part on the basis of their test scores, but were unsuccessful in college.
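The four-quadrant logic described above can be sketched as a small classification rule. The cut score and the binary success indicator below are hypothetical, chosen only to illustrate the scheme; they do not come from the original discussion:

```python
# Illustrative sketch of the 2 x 2 admissions-outcome table described above.
# CUT_SCORE and the applicant values are hypothetical, for demonstration only.

CUT_SCORE = 600  # hypothetical admission cut score

def classify(score, would_succeed):
    """Place an applicant in one of the four quadrants of the 2 x 2 table.

    score         -- admission test score (horizontal axis)
    would_succeed -- whether the applicant would succeed in school (vertical axis)
    """
    if score >= CUT_SCORE:
        return "valid acceptance" if would_succeed else "false acceptance"
    else:
        return "false rejection" if would_succeed else "valid rejection"

# One example per quadrant:
print(classify(650, True))    # valid acceptance (upper right)
print(classify(650, False))   # false acceptance (lower right)
print(classify(550, True))    # false rejection (upper left)
print(classify(550, False))   # valid rejection (lower left)
```

In these terms, the three coaching outcomes discussed below are movements between quadrants: a change in both `score` and `would_succeed` (arrow 1), a change in `score` that brings it in line with an unchanged `would_succeed` (arrow 2), or a change in `score` alone (arrow 3).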

One possible effect of coaching is that it might improve both the abilities measured by the tests and the scholastic abilities involved in doing well in college. For the borderline students, coaching in this case (arrow 1) would have the wholly laudatory effect of moving the student from the “valid rejection” category to the “valid acceptance” category. No one could reasonably argue against such an outcome.

A second possible effect of coaching concerns the student who, because of extreme test anxiety or grossly inefficient test-taking strategies, obtains a score that is not indicative of his or her true academic ability. Coaching in the fundamentals of test taking, such as efficient time allocation and appropriate guessing, might cause the student to be more relaxed and thus improve his or her performance. The test will then be a more veridical reflection of ability. This second case might result in the student moving from the false rejection category to the valid acceptance category (arrow 2) and again this is an unarguably positive outcome.

The third possible outcome of coaching is not so clearly salutary. The coached student moves from the valid rejection category to the false acceptance category (arrow 3). The coached student increases his or her performance on the test, but there is no corresponding increase in the student’s ability to get good grades. Case three is an example of what the late David McClelland derisively called “faking high aptitude.”

Actual research on the extent to which these outcomes occur is conspicuous by its absence. If the first two results dominate, that simply adds to the validity of the test. If the third turns out to be widespread, then it implies not so much deficiencies in our understanding of scholastic aptitude as serious deficiencies in tests designed to measure that aptitude. In any event, more research is needed on precisely such issues. One way to better understand a phenomenon is to attempt to change it. In so doing, we may come to better understand the nature of expert performance, the optimal conditions under which it progresses, and the instructional environments that foster its development.

In 1989, in a more in-depth treatment of the coaching debate, I concluded with the following statement. I believe it applies with equal force today:

The coaching debate will probably continue unabated for some time to come. One reason for this, of course, is that so long as tests are used in college admissions decisions, students will continue to seek a competitive advantage in gaining admission to the college of their choice. A second, more scientifically relevant, reason is that recent advances in cognitive psychology have provided some hope of explicating the precise nature of aptitude, how it develops, and how it can be enhanced. This line of research was inspired in part by the controversy surrounding the concepts of aptitude and intelligence and the felt inadequacy of our understanding of both. Green (1981) noted that social and political challenges to a discipline have a way of invigorating it, so that the discipline is likely to prosper. So it is with the coaching debate. Our understanding of human intellectual abilities, as well as our attempts to measure them, is likely to profit from what is both a scientific and a social debate.

Green, B. F. (1981). A primer of testing. American Psychologist, 10, 1001–1011.
McClelland, D. (1973). Testing for competence rather than intelligence. American Psychologist.