
The Assessment of “Understanding”

May 29, 2008

Study to remember and you will forget.
Study to understand and you will remember.
—Anonymous

I once sat on the dissertation committee of a graduate student in mathematics education who had examined whether advanced graduate students in math and science education could explain the logic underlying a popular procedure for extracting square roots by hand. Few could explain why the procedure worked. Intrigued by the results, she decided to investigate whether they could explain the logic underlying long division. To her surprise, most in her sample could not. All of the students were adept at division, but few understood why the procedure worked.

In a series of studies at Johns Hopkins University, researchers found that first-year physics students could unerringly solve fairly sophisticated problems in classical physics involving moving bodies, but many did not understand the implications of their answers for the behavior of objects in the real world. For example, many could not draw the trajectories, implied by their own equations, of objects cut loose from a swinging pendulum.

What then does it mean to “understand” something: a concept, a scientific principle, an extended rhetorical argument, a procedure or algorithm? What questions might classroom teachers ask of their students, the answers to which would allow a strong inference that the students “understood”? Every educator from kindergarten through graduate and professional school must grapple almost daily with this fundamental question. Do my students really “get it”? Do they genuinely understand the principle I was trying to get across at a level deeper than mere regurgitation? Rather than confront the problem head on, some teachers, perhaps in frustration, sidestep it. Rather than assign projects or construct examinations that probe students’ deep understanding, they require only that students apply the learned procedures to problems highly similar to those discussed in class. Other teachers with the inclination, time and wherewithal often resort to essay tests that invite their students to probe more deeply, but as often as not their students decline the invitation and stay on the surface.

I have thought about issues surrounding the measurement of understanding on and off for years, but have not systematically followed the literature on the topic. On a lark, I conducted three separate Google searches and obtained the following results:

“nature of understanding” 41,600 hits

“measurement of understanding” 66,000 hits

“assessment of understanding” 34,000 hits

Even with the addition of “classroom” to the search, the number of hits exceeded 9,000 for each search. The listings covered the spectrum—from suggestions to elementary school teachers on how to detect “bugs” in children’s understanding of addition and subtraction, to discussions of laboratory studies of brain activity during problem solving, to abstruse philosophical discussions in hermeneutics and epistemology. Clearly, this approach was taking me everywhere, which is to say, nowhere.

Fully aware that I am ignoring much that has been learned, I decided instead to draw upon personal experience, some 30 years in the classroom, to come up with a list of criteria that classroom teachers might use to assess understanding. The list is undoubtedly incomplete, but it is my hope that it will encourage teachers not only to think more carefully about how understanding might be assessed but also, and perhaps more importantly, to think more creatively about the kinds of activities they assign their classes. These activities should stimulate students to study for understanding, rather than for mere regurgitation at test time.

The student who understands a principle, rule, procedure or concept should be able to do the following tasks (these are presented in no particular order and their actual difficulties are an empirical question):

Construct problems that illustrate the concept, principle, rule or procedure in question.
As the two anecdotes above illustrate, students may know how to use a procedure or solve specific textbook problems in a domain, but may still not fully understand the principle involved. A more stringent test of understanding would be that they can construct problems themselves that illustrate the principle. In addition to revealing much to instructors about the nature of students’ understanding, problem construction by students can be a powerful learning experience in its own right, for it requires the student to think carefully about such things as problem constraints and data sufficiency.

Identify and, if possible, correct a flawed application of a principle or procedure.
This is basically a check on conceptual and procedural knowledge. If a student truly understands a concept, principle or procedure, she should be able to recognize when it is faithfully and properly applied and when it is not. In the latter case, she should be able to explain and correct the misapplication.

Distinguish between instances and non-instances of a principle; or stated somewhat differently, recognize and explain “problem isomorphs,” that is, problems that differ in their context or surface features, but are illustrations of the same underlying principle.
In a famous and highly cited study by Michelene Chi and her colleagues at the Learning Research and Development Center, novice physics students and professors of physics were each presented with problems typically found in college physics texts and asked to sort or categorize them into groups that “go together” in some sense. They were then asked to explain the basis for their categorization. The basic finding (since replicated in many different disciplines) was that the novice physics students tended to sort problems on the basis of their surface features (e.g., pulley problems, work problems), whereas the experts tended to sort problems on the basis of their “deep structure,” the underlying physical laws that they illustrated (e.g., Newton’s third law of motion, the second law of thermodynamics). This profoundly revealing finding is usually discussed in the context of expert-novice comparisons and in studies of how proficiency develops, but it is also a powerful illustration of deep understanding.

Explain a principle or concept to a naïve audience.
One of the most difficult questions on an examination I took in graduate school was the following: “How would you explain factor analysis to your mother?” That I remember this question over 30 years later is strong testimony to the effect it had on me. I struggled mightily with it. But the question forced me to think about the underlying meaning of factor analysis in ways that had not occurred to me before.

Mathematics educator and researcher Liping Ma, in her classic exposition Knowing and Teaching Elementary Mathematics (Lawrence Erlbaum, 1999), describes the difficulty some fifth and sixth grade teachers in the United States encounter in explaining fundamental mathematical concepts to their charges. Many of the teachers in her sample, for example, confused division by 1/2 with division by two. The teachers could see on a verbal level that the two were different, but they could explain neither the difference nor its numerical implications. It follows that they could not devise simple story problems and other exercises for fifth and sixth graders that would demonstrate the difference.
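The distinction the teachers struggled with can be made concrete with a small worked example. The sketch below is only an illustration: the numbers, the "cakes" story, and the helper name divide are my own assumptions and are not taken from Ma's book. Dividing by 1/2 asks how many half-sized servings fit into a quantity, whereas dividing by two asks how much each of two people receives.

```python
from fractions import Fraction

def divide(dividend, divisor):
    """Exact division using rational arithmetic (hypothetical helper)."""
    return Fraction(dividend) / Fraction(divisor)

cakes = 3  # illustrative quantity, not from the source

# Story 1: how many half-cake servings can be cut from 3 cakes?
by_half = divide(cakes, Fraction(1, 2))

# Story 2: how much cake does each of 2 people get if 3 cakes are shared?
by_two = divide(cakes, 2)

print(by_half)  # 6
print(by_two)   # 3/2
```

Dividing by 1/2 doubles the quantity, while dividing by two halves it. A teacher who can tell both stories, and say why they give different answers, is demonstrating the kind of understanding at issue here.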

To be sure, students may well understand a principle, procedure or concept without being able to do all of the above. But a student who can do none of the above almost certainly does not understand, and students who can perform all of the above tasks flawlessly almost certainly do understand.

One point appears certain: relying solely on the problems at the end of each chapter in textbooks, many of which have been written by harried and stressed-out graduate students, will not ensure that our students understand the concepts we wish to teach them. The extended essay has been the solution of choice for many instructors whose teaching load and class size permit such a luxury. But less labor-intensive ways of assessing understanding are sorely needed.

Some Thoughts on Effective Schooling, No Child Left Behind and the Achievement Gap

May 5, 2008

A perpetually vexing problem in American education is the substantially lower mean level of achievement in virtually all academic subjects among African American, Hispanic, and poor students. The problem is evident from varied but consistent indices: lower grades, lower performance on state-mandated standardized tests, substantially higher dropout rates, and lower average performance on college admissions tests.

Historically, two schools of thought have dominated the debate over how best to gauge whether individual schools are doing a good job of educating these students. One might be called the “value-added” school and the other the “final status” school. Advocates of the value-added criterion maintain that the only reasonable and fair standard for assessing school effectiveness is how effectively schools educate students, given their entering level of achievement. The argument is that it is simply unreasonable to expect schools in the nation’s large urban areas to produce the same levels of achievement as well-funded suburban schools. In their paper in ERS Spectrum (Spring, 2005), entitled “The Perfect Storm in Urban Schools: Student, Teacher, and Principal Transience,” researchers Hampton and Purcell of Cleveland State University describe in painful detail the dimensions of the problems faced by the vast majority of the nation’s urban schools. The picture they describe is not pretty. Against a community backdrop of linguistic diversity, broken homes, poverty, joblessness, and despair is a confluence of transiencies: a transience of students, a transience of teachers, a transience of principals, and, they might well have added, a transience of superintendents. All combine to form a “perfect storm” that could not have been purposefully scripted better to produce lasting and pervasive failure. No wonder the modest “value-added” approach to assessing school quality has such widespread appeal.

The alternative view is that a goal of modest year-to-year growth for students who are seriously behind their peers is both defeatist and demeaning. Clinical mental retardation excepted, all students can learn and can achieve at high levels, and to accept anything less than excellence is to admit defeat. Moreover, the value-added approach to assessing school effectiveness carries for many the odious implication that such limited achievement is all that these students are capable of.

The argument of the “final status” advocates gains considerable credibility when they point to “existence proofs,” inner-city schools whose students’ performance on any number of achievement measures is comparable to that of the best schools in the metropolitan area. The R. L. Vann School in the poverty-stricken “Hill District” of Pittsburgh, Pennsylvania, with a 99% African American student body, is a case in point. Although I have not followed its progress in recent years, throughout the 1970s and 80s, the school consistently performed on a par with the best schools in the area on any number of standardized achievement tests in math and English Language Arts. For readers with a statistical bent, the situation is dramatically illustrated when the Pittsburgh school medians on standardized tests are plotted against school SES (as indexed by “percent free lunch”). At first blush, the scatter plot of points appears to be a misprint, with the Vann School appearing as an outlier in the extreme upper left-hand corner of the swarm of points. The school is in the top quarter in achievement and the bottom quarter in SES.

The controversial and politically explosive No Child Left Behind Act (NCLB) has placed both the Achievement Gap and the “value-added vs. final status” controversy in stark relief. NCLB requires among other things that states specify for their schools “adequate yearly progress” toward reducing the achievement gap. The legislation has reawakened a host of old and difficult questions: What will we accept as “adequate yearly progress”? What role should standardized tests play in monitoring student achievement and in evaluating teacher and principal effectiveness? What is the best way to gauge “school effectiveness”? Put more starkly, what do we mean by a “successful” or “effective” school, and what do we mean by a “failing” or “unsuccessful” one? These questions take on enormous political, social and even moral overtones when they are applied equally to an under-funded urban school populated primarily by poor and minority children, on the one hand, and to a well-funded suburban school populated by middle and upper-class majority students, on the other.

The most contentious provisions of the bill are the series of sanctions for continued failure to meet the specified adequate yearly progress. These cover the spectrum from developing and implementing a plan for improvement, to allowing the affected students to change schools, to turning the school over to the state or a private, for-profit agency with a proven record of success. Several states have sued in federal court arguing that such sanctions without federally appropriated money to finance needed improvements are unconstitutional.

In such a climate, where the very motives of each side in the debate are often impugned, it is easy to lose sight of what should be our common goal. We may disagree about means and methods, but we should be united in our commitment as educators and citizens to the ultimate end in view, exemplified in the words of no less a thinker than John Dewey. A century ago he wrote, “What the best and wisest parent wants for his own child, that must the community want for all its children. Any other ideal for our schools is narrow and unlovely; acted upon it destroys our democracy.”

References

Dewey, J. (1907). The School and Society. Chicago: University of Chicago Press.

Hampton, F., & Purcell, T. (2005). “The Perfect Storm in Urban Schools: Student, Teacher, and Principal Transience.” ERS Spectrum, 23(2), 12-22.