Addressing common concerns about concept inventories

posted July 8, 2016 and revised March 30, 2017
by Adrian Madsen, Sam McKagan and Eleanor Sayre

Concept inventories are useful for assessing the effectiveness of your teaching, but as you use them, concerns and questions often come up. Here we discuss some common concerns about using concept inventories and related research that addresses these concerns. 

This concept inventory doesn’t assess all the content I cover in my course; is it still useful?

Yes. No concept inventory covers everything in a course, so it should be used as one measure among several of student understanding and of the effectiveness of your instruction. It is a particularly good measure, however, because the kind of conceptual understanding probed by concept inventories is one of the most difficult aspects of a physics course to master. Further, because concept inventories are standardized, they uniquely allow comparisons across instructors and institutions, which makes them a powerful way to judge the effectiveness of instructional methods. That said, other forms of assessment, such as exams, homework problems, labs, in-class questions and discussions, or measures of student attitudes and beliefs, should be used in tandem with concept inventories to understand what your students are learning and experiencing in a class.

Will giving this concept inventory as a pre-test influence the post-test results?

Henderson 2002 compared concept inventory post-test scores between students who took the pre-test and those who did not and found no significant difference.
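If you want to run a similar comparison on your own classes, one possible approach is a two-sample test on the post-test scores of the two groups. The sketch below computes Welch's t statistic and its approximate degrees of freedom using only the Python standard library. The choice of Welch's test, the function name welch_t and the score lists are assumptions made for illustration; they are not taken from Henderson 2002, and the scores are invented.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    na, nb = len(a), len(b)
    # Squared standard error of each group mean (sample variance / n)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical post-test scores (percent correct), for illustration only
took_pretest = [55, 62, 70, 48, 66, 59, 73, 61]
no_pretest = [57, 60, 68, 50, 64, 58, 75, 63]

t, df = welch_t(took_pretest, no_pretest)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A t value close to zero relative to the critical value for the computed degrees of freedom would be consistent with no detectable pre-test effect. With real class data you would also want to check sample sizes and report an effect size alongside the test.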

Does question order matter?

If it does matter, it doesn't matter very much. Gray, Rebello and Zollman 2002 manipulated the order of questions on a concept inventory (the FCI). In one ordered set of pre-test questions, there was a small but statistically significant difference in the percentage of students who chose a certain incorrect answer. The same effect did not occur on the post-test.

Does the content of the distractors (multiple-choice answers) matter?

Yes, to some extent. When the distractors were changed to have more everyday or “female” contexts, males and females had the same overall score as on the original, but the patterns of answers varied on individual questions (McCullough and Meltzer 2001). In another study, new distractors were created based on students' open-ended responses. Here, again, the same percentage of students answered correctly overall, but the incorrect answers they chose differed from those chosen on the original (Rebello and Zollman 2004).

Would students answer differently if the concept inventory consisted of open-ended questions instead of multiple-choice questions?

Yes, but mainly in their incorrect answers. Rebello and Zollman 2004 compared the responses of students who answered open-ended versions and multiple-choice versions of the same concept inventory questions. They did not find differences in the correct answers given by students, but they did find differences in the incorrect answers. Their analysis of the open-ended responses revealed new distractors that did not appear on the original test, as well as distractors from the original test that did not appear in the open-ended responses.

Should concept inventories be used as a placement test?

No. A placement test is most effective if it can distinguish students who will succeed in the course from students who will not. Henderson 2002 found that some students had very low pre-test scores but still earned an A in the course. It would be inappropriate to use a concept inventory to recommend that these students not take the physics course, because there is still a chance they will be successful.
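To see how this plays out in your own course, you could count the students a pre-test cutoff would have turned away who nonetheless went on to earn an A. The following is a minimal sketch; the record layout, the function name low_pretest_high_grade, the cutoff value and the data are all hypothetical and are not drawn from Henderson 2002.

```python
def low_pretest_high_grade(records, cutoff):
    """Return students whose pre-test score is below cutoff but who earned an A."""
    return [r for r in records if r["pre"] < cutoff and r["grade"] == "A"]


# Hypothetical class records: pre-test percent correct and final course grade
records = [
    {"student": "s1", "pre": 20, "grade": "A"},
    {"student": "s2", "pre": 25, "grade": "C"},
    {"student": "s3", "pre": 60, "grade": "A"},
    {"student": "s4", "pre": 15, "grade": "B"},
]

# A placement cutoff of 30% (an arbitrary illustrative value) would have
# excluded every student returned here, despite their final grade of A.
excluded_but_successful = low_pretest_high_grade(records, cutoff=30)
print([r["student"] for r in excluded_but_successful])
```

Any student who appears in this list is a case where the placement recommendation would have been wrong.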

Are concept inventories gender-biased?

Many studies have found that males consistently score about 10% higher than females on the mechanics concept inventory pre-tests. Males also usually score about 10% higher on the post-test, but this effect shows more variability across studies. Numerous factors that might contribute to this score difference, such as math preparation or standardized test scores, have been investigated. No single factor has been able to account for the gender gap in scores. It is likely that the gap is caused by a combination of many small factors (Madsen, McKagan and Sayre 2013).

What about teaching to the test?

A common concern about unusually high scores on concept inventories is that an instructor is teaching to the test. This usually involves going over the exact test questions in class. Since concept inventories are meant to gauge the effectiveness of teaching, scores should reflect normal classroom practices. Conventional wisdom is that instructors can assign and go over questions that cover concepts similar to those on the concept inventories, but should not show or discuss the exact test questions before the test. Because each concept inventory covers only a few common physics topics, an ordinary physics class that covers many topics will necessarily spend little time on each of them. If your class spends a lot of time on a few of these topics, your post-test scores might be higher than those of an instructor who spends only a little time on them.

Image ©Alberto G via Flickr CCBY