By Sue Brookhart, NCME Newsletter Editor

These two articles present two views of formative assessment. Whether you have already formed a strong opinion about this topic (as I confess I have), or whether you are new to exploring the issue, you will learn from these two pieces. All readers will find well-written arguments for their respective points of view.

By Lee Jones, Johnna Gueorguieva, and Scott Bishop, Riverside Publishing Company

Over the past two decades, as the national and local emphasis on improving education for all students increased, many states established standards for student achievement along with criterion-referenced testing programs. In 2002, the enactment of the federal No Child Left Behind (NCLB) legislation firmly established state assessment programs as tools for accountability. Under NCLB, states, districts, and schools currently face repercussions (the withholding of federal funds, the transfer of students to higher-performing schools, and state control of low-performing schools) if the achievement of all student subgroups does not exhibit adequate yearly progress. With test stakes greater than ever, school administrators face intense pressure to ensure that all students perform well and that their schools show significant improvement each year. In the face of these pressures, school leaders are seeking quick and effective interventions for improving student achievement. They are turning to "formative assessments" as part of this solution.

In this article, we describe the expanding use of formative assessments, examine the growth of one specific type of assessment used for formative purposes ("interim benchmark assessments"), and reflect on the potential challenges these pose for the educational measurement community.

Formative assessments traditionally have been characterized as assessments that: 1) are conducted on a frequent or ongoing basis, 2) are tightly integrated with daily instruction, 3) enable nearly immediate adjustment of instructional actions by teachers and learning behaviors by students, and 4) have as an ultimate goal the improvement of student learning.

Over recent years, however, the range of assessments referred to as "formative" has expanded; it now covers a continuum of practice, from informal teacher observations and classroom-administered mini-assessments designed to provide "just-in-time" feedback, to middle-stakes district and (perhaps eventually) statewide tests designed to inform instruction before end-of-year state accountability testing. While assessments across this continuum all serve formative purposes, i.e., the goal of improving instruction and student learning, the further they move from teacher observation and tight integration with daily classroom instruction, the further they depart from the traditional definition of formative assessments described above.

That a range of assessments is now used for formative purposes is probably good for education. For example, many districts that administer norm-referenced tests to obtain profiles of their students' knowledge and skills relative to a nationally representative sample can also use those tests for the value-added formative purpose of identifying student strengths and weaknesses relative to the state standards covered on the NRT, focusing instruction on areas where improvement is needed. Indeed, an unintended benefit of a broad interpretation of formative assessment may be that teachers become more inclined to regard tests and test results as actionable, rather than passively receiving test scores as top-down and disassociated from the curriculum.

Increasing numbers of districts are implementing a type of formative assessment often referred to as "interim benchmark assessments." These are assessments administered periodically during the year, usually district- or statewide, for the purpose of assessing expected learning to date in specific areas covered by district or state standards. Also, several recent Requests for Proposals for statewide testing programs have asked that test publishers include formative assessment components similar to interim benchmark assessments in their offers for assessment solutions. More typically, large school districts solicit bids for these programs independently of the state.

What are the typical characteristics of the interim benchmark assessment programs that are emerging from school districts? The typical program models the NCLB testing requirements: content areas usually focus on reading, mathematics, and science and span Grades 3 through 8, although assessments frequently extend from Grades 2 through 10. Assessments are administered approximately quarterly, and in some instances more frequently. This allows each periodic assessment to focus only on the learning goals for a defined period of instruction, so more test items can be allocated to specific domains than is usually possible on the state's NCLB test. Near-immediate analysis and reporting of results is desired. Scores are invariably reported in relation to specific state standards or grade-level expectations, providing diagnostic information about strengths and weaknesses relative to the performance benchmarks that students are expected to reach at particular points in the school year. There is usually an expectation that performance data can be aggregated at the class, school, and district level for each participating grade, so that teachers can use results to implement desired instructional interventions at the individual-student or class level, and so that results can inform short- and longer-term institutional programmatic decisions that also will improve student learning. The explicit assumption is that improving learning in the gap areas identified by interim benchmark assessments will improve subsequent performance in those areas.
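
The multi-level rollup described above can be sketched in a few lines. This is a minimal illustration, not any vendor's actual reporting system; the records, standard codes, and the `aggregate` helper are all invented for the example.

```python
from collections import defaultdict

# Hypothetical interim-assessment records, invented for illustration:
# (district, school, class, standard, percent correct)
records = [
    ("D1", "S1", "C1", "3.NF.1", 62),
    ("D1", "S1", "C1", "3.OA.8", 81),
    ("D1", "S1", "C2", "3.NF.1", 70),
    ("D1", "S2", "C3", "3.NF.1", 55),
]

def aggregate(records, level):
    """Average percent correct per standard at a rollup level:
    0 = district, 1 = school, 2 = class."""
    groups = defaultdict(list)
    for rec in records:
        # Key on the organizational units up to the chosen level, plus the standard.
        key = rec[:level + 1] + (rec[3],)
        groups[key].append(rec[4])
    return {k: sum(v) / len(v) for k, v in groups.items()}

print(aggregate(records, 0))  # district-level averages by standard
print(aggregate(records, 2))  # class-level averages by standard
```

The same records thus serve the teacher (class-level view) and the administrator (district-level view), which is the aggregation expectation districts typically state.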

The goals are noble, but this is a heavy burden to place on an assessment program that also requires quick scoring turnaround and rapid reporting of results. Some states and districts are requesting services for their programs that pose significant technical challenges, including scaling, equating, and other types of score linking. Requests for validity evidence are frequent, including studies of the relationships between interim benchmark assessment results and performance on the state's NCLB assessment. Costly and complex data-collection designs will be required to support the use of interim benchmark assessments for inferences about individual student strengths and weaknesses, growth in student learning, prediction of future performance, and efficacy of instructional intervention.
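
To make the equating challenge concrete, here is a minimal sketch of one standard linear approach, mean-sigma equating, which places scores from one test form onto the scale of another by matching means and standard deviations. The score distributions are invented for illustration; operational equating of quarterly benchmark forms would demand far richer data-collection designs (common items, common examinees, or randomly equivalent groups) than this sketch assumes.

```python
from statistics import mean, stdev

def mean_sigma_equate(scores_x, scores_y):
    """Return a function mapping Form X scores onto the Form Y scale:
    y = mu_y + (sd_y / sd_x) * (x - mu_x)."""
    mu_x, sd_x = mean(scores_x), stdev(scores_x)
    mu_y, sd_y = mean(scores_y), stdev(scores_y)
    slope = sd_y / sd_x
    return lambda x: mu_y + slope * (x - mu_x)

# Invented score distributions from two hypothetical quarterly forms
form_x = [10, 12, 14, 16, 18]   # fall form
form_y = [8, 10, 12, 14, 16]    # winter form (slightly harder)
to_y = mean_sigma_equate(form_x, form_y)
print(to_y(14))  # the Form X mean (14) maps to the Form Y mean, 12.0
```

Even this simplest linear method presumes that the two forms measure the same construct in comparable groups, which is exactly the kind of assumption that requires the validity evidence discussed above.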

These are not insurmountable challenges, but the demand from policy-makers for accelerated improvement in learning creates a tension that could work against the development and implementation of successful interim benchmark programs. Certainly this will require an ongoing research base and data-collection designs that are more robust and complex than might have been envisioned for more traditionally defined formative assessment programs.

In closing, we restate the observation that regardless of one's sociopolitical view of NCLB, the measurement challenges that continue to emerge from its implementation still justify its NCME-anointed nickname of "No Psychometrician Left Behind." There is still much work to do.

By Judy Arter and Rick Stiggins, Assessment Training Institute


Formative assessment is the use of assessment processes, materials, and results to help maximize student learning during the course of instruction. This contrasts with summative assessment, which seeks to judge the sufficiency of student achievement at a particular point in time.

Effective formative assessment requires more than assessing frequently and, as important as that is, more than teachers using the findings from frequent assessments to plan the next steps in instruction. Effective formative assessment includes teachers providing descriptive feedback to students. It also involves students in understanding the learning targets they are to hit, becoming accurate assessors of those targets, assessing themselves and their peers in relation to them, tracking their own progress toward them, and describing what they know, how they've grown over time, and their next steps in learning. In short, effective formative assessment involves action on the part of both teachers and students.

In this we have followed the lead of the Assessment Reform Group (ARG) in the UK. For us, formative assessment is what the ARG calls "assessment for learning." Assessment for learning is intended to inform instructional decisions made both by teachers and their students, not just more frequently, but in a continuous manner as learning unfolds. Done well, it reveals not only which students are meeting which state standard, but how each student is progressing up the scaffolding leading to mastery of each standard. It relies on a range of assessment methods to generate evidence that may be unique to an individual student or classroom. Assessment for learning goes beyond identifying who needs more help to literally being that instructional help.

Reasons This Is an Important Topic

This view of formative assessment is important because it is assessment for learning practices that have yielded the remarkable gains in student achievement reported in the research literature over the past 30 years. For example, in their original mastery learning research, Bloom and his students (1984) made extensive use of classroom assessment in support of learning and reported subsequent effect sizes on student test performance of one to two standard deviations. Black and Wiliam's (1998) research review synthesized some 40 studies from around the world on the impact of effective classroom assessment and reported effect sizes of .4 to .7 standard deviation, with the largest gains coming for the lowest achievers. Meisels and his colleagues (2003) involved students in performance assessments and reported effect sizes of over one and a half standard deviations on subsequent tests. Finally, Rodriguez (2004) reported effects of similar size in U.S. TIMSS math performance arising from the effective management of classroom assessment. Black and Wiliam (1998) report the actions that teachers should take to create these gains; their list included providing descriptive feedback to students and involving students in their own assessment.
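
The effect sizes cited above are standardized mean differences (e.g., Cohen's d): the difference between group means divided by a pooled standard deviation, so that a value of 1.0 means the treatment group outperformed the comparison group by one standard deviation. A minimal sketch with invented score data (not data from any of the cited studies):

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(treatment, control):
    """Standardized mean difference with a pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled

# Invented test scores for two hypothetical classrooms
with_feedback = [78, 82, 85, 88, 92]     # taught with descriptive feedback
without_feedback = [70, 74, 77, 80, 84]  # comparison classroom
print(cohens_d(with_feedback, without_feedback))
```

An effect of .4 to .7, as in Black and Wiliam's synthesis, is large by the standards of educational interventions, which is what makes these findings so striking.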


The biggest issues for us right now are (1) the extent to which current formative assessments being implemented by districts have the features that enable them to be used by teachers and students as assessments for learning, and (2) educator preparation to implement assessment for learning.

Districts and teachers need to think through the design of the short-cycle, benchmark, or common assessments being selected or developed as formative. To be truly useful, instructional assessments need to have the following features (e.g., Popham, 2003; Herman et al., 2004):

• Everyone shares a common understanding of the skills and knowledge students must master, and each assessment adequately covers them.

• Each assessment covers only a few learning targets, in enough detail both to draw a sound conclusion about level of mastery and to provide diagnostic information about student misconceptions.

• Each assessment is tied closely to instruction so that it is available while instruction on the relevant learning targets progresses.

• Each assessment uses the assessment method (selected-response through performance assessment) most appropriate for the type of learning being developed (knowledge through skills and products).

• Assessments are tailored to where in the continuum of learning each student currently fits.

• Assessment materials, such as rubrics, are available in student-friendly versions.

These features enable teachers to understand at the point of need exactly where students are having successes, understand on which subparts of complex standards students are having trouble (and exactly what trouble they're having), and provide descriptive feedback to students (or, better yet, have students practice giving themselves descriptive feedback).

Dylan Wiliam (2004, p. 4) points out, "In the United States, the term 'formative assessment' is often used to describe assessments that are used to provide information on the likely performance of students on state-mandated tests – a usage that might better be described as 'early-warning summative.'"

Lorrie Shepard (2005, slides 28 and 29) talks about the idea of formative assessment being hijacked. She says, "Data-driven instruction and commercial test publishers have produced systems that, if used frequently, will produce the next round of inauthentic, test-driven curricula."

In other words, schools and districts can't make assessment for learning operational merely by purchasing tests, scoring services, and information management systems. Such systems are not bad; they serve the needs of certain decision-makers. But they don't automatically provide the assessment for learning characteristics that lead to the improved achievement gains shown in the literature. This is especially true if the "formative" assessments:

• Occur quarterly (or even monthly); teachers and students make decisions multiple times each day.

• Use only multiple-choice items; some student learning outcomes require other assessment methods.

• Cover multiple standards, none of them in detail; teachers and students need specific information about what students are doing well and their next steps in learning.

• Are lock-step: everyone gives exactly the same test at the same time.

• Are "early-warning summative."

Even if formative assessments are effectively designed, there aren't enough measurement experts in the world to devise for teachers all the assessments needed to provide the daily diagnostic information teachers and students need. Only teachers can do this.

The issue is that educators don't automatically understand how to use assessment information to plan instruction (e.g., Ayala, 2005; Kim, 2005), or how to use assessment materials and procedures (such as test specifications, rubrics, and exemplars of student work) to make learning targets clear to students, provide descriptive feedback to students, and meaningfully involve students in their own assessment and goal setting. Educators have typically not had the opportunity to learn about, experience, or see concrete examples of assessment for learning.

We in the measurement community know all too well that these have rarely been part of the teacher or administrator preparation curriculum (e.g., Crooks, 1988; Stiggins, 1999). So, a major question in making assessment for learning operational remains, how do we get teachers and school leaders the opportunity to learn how to use sound classroom assessment practices?


References

Ayala, C.C. (2005). Assessment pedagogies model. Paper presented at AERA, Montreal.

Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5(1), 7-74. Also summarized in an article entitled "Inside the black box: Raising standards through classroom assessment," Phi Delta Kappan, 80(2), 139-148.

Bloom, B. (1984). The search for methods of group instruction as effective as one-to-one tutoring. Educational Leadership, 41(8), 4-17.

Crooks, T.J. (1988). The impact of classroom evaluation on students. Review of Educational Research, 58(4), 438-481.

Herman, J.L., Baker, E.L., & Linn, R.L. (2004). Accountability systems in support of student learning: Moving to the next generation. CRESST LINE, Spring 2004, pp. 1-7.

Kim, K.K. (2005). Student progress monitoring: Implementing a scientifically valid practice in the real world. Paper presented at AERA, Montreal.

Meisels, S., Atkins-Burnett, S., Xue, Y., & Bickel, D.D. (2003). Creating a system of accountability: The impact of instructional assessment on elementary children's achievement scores. Education Policy Analysis Archives, 11(9).

Popham, W.J. (2003). Are your state's NCLB tests instructionally insensitive? Here's how to tell! Paper prepared for the National School Boards Association, February 2003. Similar points are made in "Building tests that support instruction and accountability: A guide for policymakers" and "Crafting curricular aims for instructionally supportive assessment."

Rodriguez, M.C. (2004). The role of classroom assessment in student performance on TIMSS. Applied Measurement in Education, 17(1), 1-24.

Sadler, D.R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18, 119-144.

Shepard, L.A. (2005). Competing paradigms for classroom assessment: Echoes of the tests-and-measurement model. Lecture and PowerPoint slides presented at the annual meeting of the American Educational Research Association, Montreal, April 2005.

Stiggins, R.J. (1999). Evaluating classroom assessment training in teacher education. Educational Measurement: Issues and Practice, 18(1), 23-27.

Wiliam, D. (2004). Keeping learning on track: Integrating assessment with instruction. Invited address to the 30th annual conference of the International Association for Educational Assessment, June 2004, Philadelphia.



June 9th, 2005
