John Jensen: Evaluating Teachers Objectively

by John Jensen, PhD

In evaluating teachers, we want to know how much a teacher contributes to student learning. Is his or her contribution high, medium, low, or a threat?  If we could determine this, presumably we could hire and retain those on the optimal end of the scale.

One challenge is to separate the teacher’s influence from influences originating in the student, the student’s parents, or other conditions.  A separate concern is whether we can even find out how much students learn.

Perhaps solving the last issue first might suggest how to gauge the teacher’s contribution.  So first off, how do we tell what children have learned?

I submit that a practical criterion available to any teacher has been almost universally ignored.  Only occasionally do I hear of (or recall) a teacher who made it an aim. The criterion is retained (instead of temporary) knowledge.

In a sense, all tested knowledge is “retained” in order to be tested. Some, however, has been engraved in a child’s mind for a lifetime, and some will disappear in a few days.  A high school health teacher showed me a test he had just given without realizing he had administered it two weeks earlier.

“Not one student remembered that they had had the same test two weeks ago!” he told me in amazement.

However they studied in that class, the outcome was temporary rather than permanent knowledge.  So how can we separate deep, retained knowledge from temporary, surface, disappearing knowledge?  How do we find this out?  We do so with three conditions.

1.  Entire course mastery.  We make students continually responsible for the entire course back to the beginning of the year.

“But I do this already,” you may object.  You may or may not. The crux is whether your manner of instruction and testing backs up your wish.  Two other conditions put your intent into practice.

2.  Make all tests impromptu.  Test any part of the entire course at random moments with no prior announcement. So that you do not unwittingly tip off students, make up a bag of cards, each containing the title of a section of the course, some brief and others more comprehensive. Randomly draw a day of the week for a test (which you do not tell students), and randomly draw the name of the section to be tested.  You don’t tell students “Tomorrow is your test on…” as you always have.  They find out instead as the period begins: “Put away your books.  Today we’re testing Chapter three, section ten, about…” You make the entire curriculum subject to retest at any time.

3.  Keep the last grade. Whatever last grade a student receives for a given section goes into his or her transcript as that section’s grade of record. Sections can be retested again at random as they are drawn from the bag.
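For a teacher who keeps records in software, the three conditions might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a prescribed tool: the function names `draw_test` and `record_grade`, the section titles, the student label, and the scores are all hypothetical, and the fixed seed is there only so the sketch is repeatable.

```python
import random

SCHOOL_DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]


def draw_test(sections, rng):
    """Draw an unannounced test: a random day and a random section card.

    The card is not removed from the bag, so any section of the
    course can come up again later.
    """
    return rng.choice(SCHOOL_DAYS), rng.choice(sections)


def record_grade(gradebook, student, section, grade):
    """Keep only the most recent grade for each (student, section) pair."""
    gradebook[(student, section)] = grade


# Hypothetical course outline and scores, for illustration only.
sections = ["Ch. 1, Sec. 1", "Ch. 2, Sec. 4", "Ch. 3, Sec. 10"]
rng = random.Random(0)  # seeded only so the sketch is repeatable
gradebook = {}

day, section = draw_test(sections, rng)
record_grade(gradebook, "student_a", section, 72)
# If the same section is drawn again weeks later,
# the newer score replaces the older one as the grade of record.
record_grade(gradebook, "student_a", section, 88)
```

Because the gradebook stores one entry per student and section, an early crammed score cannot survive a later retest; only what the student still knows at the most recent draw is recorded.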

These three conditions substantially alter instructional focus. By replacing the grades granted for knowledge obtained by cramming, review questions, scaffolding, test construction, and teacher hints, the three conditions extract the learning practiced sufficiently to persist on its own—retained learning.

The difference is illustrated by an experience from my sophomore year in high school.  After we had worked our way through a 400-page world history text and with the end of the school year looming, a brave student asked the teacher one day, “What will be on our final exam?”

“Don’t worry,” he said, smiling benevolently. “Before the test we’ll go over some review questions.”

We all leaned back and grinned. Review questions!  I’d never heard of them, but their promise was that we could dismiss everything else we had studied! Because of such conditions that make testing easier, most scores can be regarded only as approximations of what students continue to know.

The conditions I suggest instead declare forcefully that the goal is permanent retention of as much knowledge as possible. Both teacher and students are recognized for the scores revealing it; scores both valid and reliable, and reflecting accurately the teacher’s ability to generate long-term learning.

A possible objection to this approach is that students already are tested too much, that testing is time taken away from actual learning and presents a distraction. Many would like to turn back the trend (cf. “Texas Considers Reversing Tough Testing and Graduation Requirements,” New York Times, April 11, 2013).

The answer is to use tests to stimulate the practice that deepens knowledge.  Not much class time is needed to achieve this. A ten-minute test twice a week may be enough by the means I suggest. Had such conditions been observed for the last couple of decades, by now the issue of evaluating teachers would be moot. We would not be concerned about variances among them because all students would be learning simply by the standard focus on retained instead of temporary knowledge.

As the U.S. system gropes today for how to validate the substance of knowledge that might subsist behind a cloud of test scores, interest ranges in search of teacher qualities that make a difference. Anyone seeking this information should retrieve a landmark study by Arthur Combs and associates from the 1960s (“The Perceptual Organization of Effective Teachers,” Florida Studies in the Helping Professions, No. 37, in Arthur W. Combs et al., “Social Sciences”, Gainesville, University of Florida, 1969).

To summarize briefly, researchers sought to discover the difference between the best and worst teachers.  They obtained valid groups of each by asking freshmen entering Florida colleges to name their best and worst teachers, compiled those named consistently, and obtained two pools–those unanimously viewed as the best versus worst. They visited these teachers, inviting them to participate in a study, and administered one test after another to them but turned up no differences.

Resorting finally to classroom observation, they found that clear differences existed not in their behavior but in their belief system. The good ones held positive beliefs about students, about learning, and about the world on twenty independent scales, while the worst held down the negative end of those scales.  That these differences registered so powerfully with students year by year reveals that it really matters what teachers believe about what they do.

To separate other conditions from the teacher’s contribution, as we inquired at the start, ceases to be an issue when children learn well.  Whatever the teacher’s influence is, it hasn’t held students back. But when students aren’t learning, adults parse details mainly to find out whom to blame. We solve all issues, in other words, if we simply design standard classroom activity so that the practice of learning produces long-term results for all.

In sum, the two angles outlined above suggest a design for evaluating teachers. First, measure retained learning by the three steps noted above. Whatever the results are, the teachers arranged for them.  Second, resurrect the tool that Combs and his team used to create their groups. Ask students past and present to name their best and worst teachers, and steadily winnow out the worst.

As the saying goes, this is not rocket science.  Just be brave enough to insist on long-term learning and measure it objectively, and brave enough to invite comprehensive feedback. You’ll have no doubt which teachers are high and low and the conclusions will be solid, reliable, and politically defensible.

John Jensen is a licensed clinical psychologist and author of the three-volume Practice Makes Permanent series (Rowman and Littlefield). He will send a proof copy of the volumes to anyone on request:

John Jensen, Ph.D.
John Jensen is a licensed clinical psychologist and education consultant. His three-volume Practice Makes Permanent series is in publication with Rowman and Littlefield, education publishers. The first of the series, due in January, is Teaching So Students Work Harder and Enjoy It: Practice Makes Permanent. He welcomes comments sent to him directly at