Written by Wisewire
On an average day, a teacher asks upwards of 400 questions, and that doesn’t even count the written assessments. Have you ever wondered if you’re asking the right questions? Questions that are so perfectly crafted that they truly tell you if students have mastered the content? Let’s test that idea. The Wisewire team of learning experts is excited to contribute the first in a series of blog posts focused on helping you create effective classroom assessments and interpret the results. We’ve spent decades designing, writing, and editing high-stakes assessments alongside hundreds of digital and print products, and we look forward to sharing our insights with you to enrich your classroom. Let’s get to it…
Which of these assessment items is stronger, and why?
How about these?
On the surface, all four items may seem valid. Each asks a straightforward question or presents a clear, specific task, and each has one indisputably correct answer (D). Yet items 2 and 4 are significantly stronger. Items 1 and 3 have such fundamental flaws that neither can be a valid assessment of knowledge.
What Parts Make Up A Valid Item?
A valid assessment item meets two main criteria:
- Students who have mastered the relevant content will answer correctly.
- Students who have not mastered the relevant content will answer incorrectly.
Like a vehicle whose parts must all function smoothly to get you to your destination, an assessment item has parts that work together to achieve its goals. An item is made of five distinct parts: (1) the standard; (2) the stimulus; (3) the stem; (4) the correct answer, or key; and (5) the distractor options.
1. The Standard
Your item must assess the relevant standard. In the example shown, students analyze a food chain for a forest ecosystem. This task aligns with the stated standard. The item would no longer be valid if it asked students this question instead: Which organism is a predator?
2. The Stimulus
A stimulus is additional information for students to consider before selecting an answer. If your item includes a stimulus, it should be integral to the solution; that is, students must use the stimulus to identify the correct answer. A stimulus should not be superfluous or distracting; it should serve a purpose. It may be hard to resist adding cutesy clip art to an otherwise “boring” test, but for your students’ sake and your own, resist!
3. The Stem
An item’s stem poses the problem for students to solve. A stem may be a question to answer, a task to perform, or a statement to complete. Regardless, the stem should target only one piece of information. To answer the example question, students must know where a fox belongs in a food chain.
Suppose the stimulus shown also replaced “grass” with a question mark, and the stem asked students to identify both missing organisms. The answer choices might consist of pairs of organisms: e.g., “grass; fox”. How would you know how to address a student’s knowledge gap? An incorrect answer in this case would not reveal the student’s error: Is it grass, fox, or both? Better instead to write two distinct items, each targeting only one part of the food chain.
Also regarding stems: Generally, it’s best to avoid negative words such as “not” and “except”; they risk confusing students and obscuring the source of error.
Consider this item:
The word “not” adds a layer of complexity that increases difficulty, but the added challenge is unrelated to the content being assessed: it reveals nothing more about whether students know what a predator is. What’s more, knowing that a deer is not a predator does not prove that students can identify actual predators. A positive stem, such as “Which of these animals is a predator?”, works best. To preserve the original four answer choices, you could use a multiple-select format; a correct answer would consist of B, C, and D but not A. Direct questions give you much clearer evidence of what students actually understand.
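If you score quizzes digitally, the multiple-select version can be graded along these lines. This is only a minimal sketch: the function name and the sample responses are invented for illustration, and the key (B, C, and D) is the one described above.

```python
def score_multiple_select(selected, key):
    """Score one multiple-select response.

    Returns (is_correct, missed, wrongly_selected), where `missed` lists
    keyed options the student left out and `wrongly_selected` lists
    options the student chose that are not part of the key.
    """
    selected, key = set(selected), set(key)
    missed = key - selected
    wrongly_selected = selected - key
    is_correct = not missed and not wrongly_selected
    return is_correct, sorted(missed), sorted(wrongly_selected)


# The key from the predator item: B, C, and D are predators; A (the deer) is not.
KEY = {"B", "C", "D"}

print(score_multiple_select({"B", "C", "D"}, KEY))  # (True, [], [])
print(score_multiple_select({"A", "B"}, KEY))       # (False, ['C', 'D'], ['A'])
```

Reporting the missed and wrongly selected options separately shows which predators a student failed to recognize, something the negative stem could not reveal.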
4. The Key
An item’s key must be unambiguously correct. In the example, only “fox” can complete the food chain; flies, mushrooms, and rabbits do not eat frogs.
5. The Distractors
In our experience, issues with distractors are the most common reason an item is invalidated. Just as the key must be unambiguously correct, distractors must be unambiguously incorrect. However, they must also be plausible options based on common misconceptions, and they must fit within the context of your stimulus and stem. That’s a pretty heavy load for one part to carry.
Suppose distractor C in our example were changed to “shark.” A shark is not part of a forest ecosystem, but it could potentially compete with the correct answer, as a shark would likely eat a frog if given the opportunity. Furthermore, in a typical ocean ecosystem a shark plays the same role that the fox plays in this forest: both are third-level consumers (they eat animals that eat other animals). For these reasons, “shark” is an ambiguous distractor. Did you ever think that an innocent-looking option, added late at night after hours of grading papers, could undermine your assessment? Ambiguous distractors force students to make unreasonably fine distinctions and thus exceed the scope of the relevant standard.
Each pair of items presented earlier illustrates how distractors may unintentionally clue students to the key. In the first pair, students must identify the prepositional phrase. However, in Item 1 only the correct answer is a phrase; the three distractors are complete sentences, or clauses. Even if students can’t distinguish between phrases and clauses, they can see an obvious difference that all but gives away the answer: Choices A, B, and C begin with capital letters and end in periods. Choice D—the key—is therefore an outlier, an option that stands out from the set in some way; people are drawn to outliers when guessing. In other words, students who have not mastered the relevant content—who cannot identify prepositional phrases—are still likely to guess the correct answer. Item 2 corrects this problem—all four answer choices are similarly punctuated. There are no outliers, so the item is valid.
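If your items live in a spreadsheet or item bank, a quick automated check can catch the kind of formatting outlier described above. The sketch below is a rough heuristic, not a standard tool: the function names are made up, and the answer choices are invented stand-ins for Item 1, not the original wording.

```python
from collections import Counter


def format_signature(text):
    """Describe how a choice is formatted: (starts with a capital, ends with punctuation)."""
    text = text.strip()
    return (text[:1].isupper(), text.endswith((".", "!", "?")))


def format_outliers(choices):
    """Return labels of choices whose formatting is shared by no other choice."""
    signatures = {label: format_signature(text) for label, text in choices.items()}
    counts = Counter(signatures.values())
    return sorted(label for label, sig in signatures.items() if counts[sig] == 1)


# Hypothetical choices in the spirit of Item 1: three sentences and one phrase.
flawed = {
    "A": "The dog barked.",
    "B": "She ran home.",
    "C": "We ate lunch.",
    "D": "under the bridge",
}
# Hypothetical revision in the spirit of Item 2: all choices formatted alike.
revised = {
    "A": "the dog barked",
    "B": "she ran home",
    "C": "we ate lunch",
    "D": "under the bridge",
}

print(format_outliers(flawed))   # ['D']
print(format_outliers(revised))  # []
```

A flagged choice is only a prompt to take a second look; the check says nothing about whether the content of each option is sound.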
In the second pair, students must identify a specialized molecule. However, in Item 3 the distractors are everyday terms for familiar substances: sugar, water, ozone. In contrast, the key is significantly longer and more technical. (Deoxyribonucleic acid looks and sounds like a science term.) What do you think students will likely guess if they lack the necessary knowledge? Again, the presence of an outlier invalidates the item. Item 4 corrects the problem—all four answer choices are similarly long and “science-y.” Only students who understand how the body stores genetic information have reason to choose the key over the three distractors. (Of course, a fraction of guessers will guess correctly, but this outcome is inherent to the multiple-choice format. A valid item contains no clues that unintentionally abet guessing.)
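Length outliers like the one in Item 3 can be screened for in a similar way. The sketch below compares each choice's length with the median length of the other choices; the factor of 2 is an arbitrary threshold chosen for illustration, and the "balanced" choices are hypothetical stand-ins for Item 4, not the original options.

```python
from statistics import median


def length_outliers(choices, ratio=2.0):
    """Return labels of choices much longer or shorter than the others.

    A choice is flagged when its length is more than `ratio` times, or less
    than 1/`ratio` times, the median length of the remaining choices.
    """
    flagged = []
    for label, text in choices.items():
        others = [len(t) for other, t in choices.items() if other != label]
        typical = median(others)
        if len(text) > ratio * typical or len(text) < typical / ratio:
            flagged.append(label)
    return sorted(flagged)


# Item 3 as described above: three short everyday terms and one long technical key.
item_3 = {"A": "sugar", "B": "water", "C": "ozone", "D": "deoxyribonucleic acid"}

# Hypothetical balanced set: every option is similarly long and technical.
balanced = {
    "A": "sodium bicarbonate",
    "B": "adenosine triphosphate",
    "C": "carbon dioxide",
    "D": "deoxyribonucleic acid",
}

print(length_outliers(item_3))    # ['D']
print(length_outliers(balanced))  # []
```

Length is only one signal; an option can also stand out through vocabulary or tone, so a clean result here does not replace reading each choice critically.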
In our next post, we’ll dive deeper into the world of outliers to add to your assessment writing toolkit. For now, keep in mind that the only way to guard against outliers is to think critically about each answer choice you give your students. Does it contain a clue that might help students guess correctly regardless of their content knowledge? If so, replace the choice or revise it to eliminate the clue. This critical mindset, applied to the entire writing process, will help ensure that your items assess precisely—and only—what you intend them to assess.