High Quality Assessment Essay

Panelists
Linda Darling-Hammond, EdD, Charles E. Ducommun Professor of Education,
Stanford University; Director, Stanford Center for Opportunity Policy in Education
James W. Pellegrino, PhD, Co-Director, Learning Sciences Research Institute,
University of Illinois at Chicago
Bob Wise, President, Alliance for Excellent Education

What does a high-quality assessment system look like? States will have to make important decisions about assessments in the coming years, as the products developed by the two state consortia come on line. To help guide those decisions, nearly two dozen of the nation’s leading assessment experts developed a set of criteria. These criteria address what the assessments should measure, how they should measure these abilities, and how the assessments should be used.

In this webinar, panelists explored the statement of criteria and its implications for states. Linda Darling-Hammond and James Pellegrino described the criteria and what they mean, and Bob Wise moderated the discussion. Panelists also addressed questions submitted by webinar viewers from across the country.

Additional Resources:

Linda Darling-Hammond’s PowerPoint presentation.

Support for this webinar is provided in part by
the William and Flora Hewlett Foundation.

Assessment literacy involves understanding how assessments are made, which types of assessments answer which questions, and how the data from assessments can be used to help teachers, students, parents, and other stakeholders make decisions about teaching and learning. Assessment designers strive to create assessments that show a high degree of fidelity to the following five traits:

1.  Content Validity
2.  Reliability
3.  Fairness
4.  Student Engagement and Motivation
5.  Consequential Relevance

In this blog post, we’ll cover the first characteristic of quality educational assessments:  content validity.

One of the most important characteristics of any quality assessment is content validity. Simply put, content validity means that the assessment measures what it is intended to measure for its intended purpose, and nothing more.  For example, if an assessment is designed to measure Algebra I performance, then reading comprehension issues should not interfere with a student’s ability to demonstrate what he or she knows, understands, and can do in Algebra I.  Content validity is evidenced at three main levels:  the assessment design level, the assessment experience level, and the assessment question, or item, level.

The assessment design is guided by a content blueprint, a document that clearly articulates the content that will be included in the assessment and the cognitive rigor of that content. The content standards that the test is designed to assess determine which content makes it into the test’s item pool.
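To make the role of the blueprint concrete, here is a minimal sketch of checking an item pool against one. The standard codes, cognitive levels, item IDs, and the `check_pool` helper are all hypothetical, invented for illustration; real blueprints are far more detailed.

```python
from collections import Counter

# Hypothetical blueprint: (standard, cognitive level) -> minimum items required.
blueprint = {
    ("A1.linear-equations", "apply"): 2,
    ("A1.linear-equations", "analyze"): 1,
    ("A1.quadratics", "apply"): 2,
}

# Hypothetical item pool: each item is tagged with the standard and level it targets.
item_pool = [
    {"id": "item-01", "standard": "A1.linear-equations", "level": "apply"},
    {"id": "item-02", "standard": "A1.linear-equations", "level": "apply"},
    {"id": "item-03", "standard": "A1.linear-equations", "level": "analyze"},
    {"id": "item-04", "standard": "A1.quadratics", "level": "apply"},
    {"id": "item-05", "standard": "G.circles", "level": "apply"},
]


def check_pool(blueprint, item_pool):
    """Return blueprint cells that are short of items and items outside the blueprint."""
    counts = Counter((item["standard"], item["level"]) for item in item_pool)
    # Cells where the pool holds fewer items than the blueprint calls for.
    shortfalls = {
        cell: needed - counts[cell]
        for cell, needed in blueprint.items()
        if counts[cell] < needed
    }
    # Items whose tags match no blueprint cell (content the test should not measure).
    off_blueprint = [
        item["id"]
        for item in item_pool
        if (item["standard"], item["level"]) not in blueprint
    ]
    return shortfalls, off_blueprint


shortfalls, off_blueprint = check_pool(blueprint, item_pool)
print("Missing items per cell:", shortfalls)
print("Items outside the blueprint:", off_blueprint)
```

In this toy pool, the quadratics cell is one item short and the geometry item falls outside the blueprint. Both are the kind of mismatch that weakens content validity at the design level.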

The next level where content validity matters is the assessment experience itself: when a student sits down to take the assessment, which items does the student see? In a fixed-form, grade-level test, most or all students at a given grade level see the same item set, namely the items assessing the standards for the grade level to which the student is assigned. In a cross-grade, computer adaptive test, an item selection algorithm presents each student with items sampled from a broad range of standards and adapts to the in-the-moment performance of the test taker. Each student sees items at a difficulty level that is appropriate for that student, based on previous responses. This adaptivity enables test developers to provide very precise information about a student’s learning and performance in a domain area.
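The idea of adapting to in-the-moment performance can be illustrated with a deliberately simplified sketch. This is not the item selection algorithm used by any particular test; it assumes a simple rule (pick the unused item closest in difficulty to the current ability estimate, then move the estimate up or down by a shrinking step), and the item pool, difficulty values, and simulated student are hypothetical.

```python
# Hypothetical item pool with difficulties on an arbitrary scale.
pool = [
    {"id": "q1", "difficulty": -1.5},
    {"id": "q2", "difficulty": -0.5},
    {"id": "q3", "difficulty": 0.0},
    {"id": "q4", "difficulty": 0.6},
    {"id": "q5", "difficulty": 1.2},
    {"id": "q6", "difficulty": 2.0},
]


def next_item(ability, pool, administered):
    """Pick the unused item whose difficulty is closest to the ability estimate."""
    candidates = [item for item in pool if item["id"] not in administered]
    return min(candidates, key=lambda item: abs(item["difficulty"] - ability))


def run_adaptive_test(pool, answers_correctly, n_items=3, start=0.0, step=1.0):
    """Administer n_items, raising the estimate after a correct answer and
    lowering it after an incorrect one, with a step that halves each time."""
    ability = start
    administered = []
    for _ in range(n_items):
        item = next_item(ability, pool, administered)
        administered.append(item["id"])
        if answers_correctly(item):
            ability += step
        else:
            ability -= step
        step /= 2  # smaller adjustments as evidence accumulates
    return ability, administered


# Simulated student (an assumption): answers correctly when difficulty <= 0.8.
ability, administered = run_adaptive_test(
    pool, answers_correctly=lambda item: item["difficulty"] <= 0.8
)
print(administered, ability)
```

Operational adaptive tests typically rely on item response theory to estimate ability and also apply content constraints during selection, so this sketch shows only the basic feedback loop between a response and the next item.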

Content validity also matters at the building-block level of NWEA’s MAP assessment: the questions, or items, themselves. Experts in both content and assessment write items to measure the concepts and skills in the standards at the indicated levels of cognitive complexity. Every item in a high-quality assessment goes through a rigorous development process with several levels of review, which helps ensure that item content is clear, accurate, and relevant. The result is a robust, aligned item pool that provides the most accurate information possible about a student.

Content validity is supported in a number of ways in educational assessments, such as:

+ General assessment design principles that control for readability

+ Content expert review cycles

+ Evidence-centered design methodology

+ Statistical analysis of student performance on test items
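As a rough illustration of the last point in the list above, the sketch below computes two classical item statistics from scored responses: difficulty (the proportion of students answering correctly) and discrimination (the correlation between an item score and the score on the remaining items). The response matrix is made up, and the choice of these two statistics is an assumption about what such an analysis might include.

```python
from statistics import mean, pstdev

# Hypothetical scored responses: one row per student, one column per item (1 = correct).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable has no variance."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    covariance = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return covariance / (sx * sy)


def item_statistics(responses):
    """Compute difficulty and rest-score discrimination for each item."""
    stats = []
    for j in range(len(responses[0])):
        item_scores = [row[j] for row in responses]
        # Score on all other items, so the item is not correlated with itself.
        rest_scores = [sum(row) - row[j] for row in responses]
        stats.append({
            "difficulty": mean(item_scores),
            "discrimination": pearson(item_scores, rest_scores),
        })
    return stats


stats = item_statistics(responses)
for number, item in enumerate(stats, start=1):
    print(f"item {number}: difficulty={item['difficulty']:.2f}, "
          f"discrimination={item['discrimination']:.2f}")
```

In this toy data the fourth item has a negative discrimination: students who do well on the other items tend to miss it. An item like that would normally be flagged for content review, since it may be measuring something other than the intended content.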

One way to check content validity is to ask these guiding questions:

+ How closely does what the assessment measures match the intended (instructed) content?

+ What knowledge or skills does the student most need to perform successfully on this assessment?

+ If the student performs successfully on this assessment, what does that mean?

Content validity is foundational to making accurate inferences. If it is unclear what an assessment is measuring, then the inferences drawn from it will be uninformative. In other words, the assessment has failed in its prime directive: to provide valuable information about what the test taker knows and can do. An assessment can have all sorts of bells and whistles, incorporate cutting-edge technology and functionality, and offer a great suite of reports that tell a compelling assessment narrative, but if the test lacks content validity, it is not worth much. What’s more, when data from an assessment that lacks content validity are used to inform instruction, the result could include wasted time and inappropriate growth expectations for students. For these reasons, content validity is central to a high-quality educational assessment.

In our next post on characteristics of quality educational assessments, we’ll explore the next trait – reliability. In the meantime, please feel free to share your thoughts on what qualities you think a good educational assessment should have by dropping a comment below.

Sarah Godlove Evans is NWEA’s Assessment Product Manager. With more than twelve years’ experience in education and assessment, she has served as a middle-school math and science teacher, a content design manager, a director of assessment and instructional design, a content specialist, an item and test development consultant, and a pre-service teacher trainer.