An external anchor is a common measure, separate from the test itself, that we can use to compare the group of test takers taking the new form with the group taking the reference form. Ideally, the external anchor should measure the same knowledge and skills as the test to be equated, using questions or problems in the same format, administered under the same conditions. In reality, we often cannot come close to this ideal. However, there is one well-known test on which scores are equated through an external anchor design that meets these ideal conditions: the SAT Reasoning Test. Each form of the test includes a section that is not the same for all test takers. There are several versions of this section, spiraled among the test takers, so that the group of test takers taking each version is a representative sample of the full group of test takers for that administration. Some test takers get an additional Critical Reading section; some get an additional Math section; some get an additional Writing section. For some of the test takers, this section is an anchor that links the current form to a previous form. For others, it is an anchor that will link the current form to a future form. Because the anchor is not taken by all the test takers, the scores on the anchor are not included in computing the individual scores on the test. The anchor scores are used only for equating. An equating plan of this complexity would be impractical for most other tests.
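To make the idea concrete, here is a minimal sketch of one standard way to equate through an anchor, chained linear equating, in Python. It assumes group P took the new form plus the anchor, and group Q took the reference form plus the same anchor. The function names and the toy scores are hypothetical, and the passage above does not say which equating method ETS actually applies to the SAT.

```python
from statistics import mean, pstdev

def linear_link(from_scores, to_scores):
    """Map one score scale onto another by matching means and standard
    deviations. Both lists must come from the same group of test takers."""
    mu_f, sd_f = mean(from_scores), pstdev(from_scores)
    mu_t, sd_t = mean(to_scores), pstdev(to_scores)
    return lambda x: mu_t + (sd_t / sd_f) * (x - mu_f)

def chained_linear_equate(new_form_p, anchor_p, anchor_q, ref_form_q):
    """Chained linear equating: link new form -> anchor in group P,
    then anchor -> reference form in group Q, and compose the two links."""
    new_to_anchor = linear_link(new_form_p, anchor_p)
    anchor_to_ref = linear_link(anchor_q, ref_form_q)
    return lambda x: anchor_to_ref(new_to_anchor(x))

# Toy data: both groups score the same on the anchor, but the new form's
# raw scores run 5 points higher, i.e. the new form is easier.
equate = chained_linear_equate(
    new_form_p=[30, 40, 50], anchor_p=[10, 15, 20],
    anchor_q=[10, 15, 20], ref_form_q=[25, 35, 45],
)
print(round(equate(40), 2), round(equate(50), 2))  # → 35.0 45.0
```

In the toy data the two groups do equally well on the anchor, yet the new form's raw scores run 5 points higher, so a raw 40 on the new form is worth only a 35 on the reference form. That is how an easier form ends up with a harsher curve.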
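In practice, ETS handles this cleverly. Having every student sit two complete forms in the same session just to produce a score conversion table (the raw-to-scaled table, or "curve") would be unrealistic, because it would double the testing time.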
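So every student is evenly assigned one additional section, which may be Math, Writing, or Critical Reading. In effect, the more than 100,000 test takers at each administration are split into three groups: one group produces this form's Math conversion table, one the Writing table, and one the Critical Reading table. And because each administration is so large, the additional section can also be used to pretest new questions (a separate function; we will explain how new questions are pretested in a later post).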
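The "spiraling" mentioned in the quotation can be sketched as handing out section versions in a fixed rotation down the roster, so each version lands on a representative slice of the test takers. This sketch assumes only the three versions named above; the real administration spirals more versions (anchor and pretest sections), and `spiral_assign` is a hypothetical name.

```python
from collections import Counter
from itertools import cycle

# Hypothetical version labels; the real SAT spiraled more versions than these.
VERSIONS = ["Math", "Writing", "Critical Reading"]

def spiral_assign(test_taker_ids, versions=VERSIONS):
    """Assign versions in a fixed rotation (1, 2, 3, 1, 2, 3, ...) down the roster,
    so every block of consecutive seats receives each version equally often."""
    return {tid: version for tid, version in zip(test_taker_ids, cycle(versions))}

assignment = spiral_assign(range(9))
print(Counter(assignment.values()))  # each version is assigned to 3 test takers
```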
Through testing on this scale, the College Board (CB) and ETS can keep every new form relatively stable.
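Some readers may object: if the questions on this form have never been used before, then even though the live administration yields a conversion table, how does CB control the form's overall difficulty? If the form simply turns out easy overall, won't the final conversion table still be very harsh?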
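Remember that the additional section is also used to pretest new questions. CB collects extensive data on how each new question performs and can estimate each question's difficulty fairly reliably. When assembling a form, the test developers can then copy the mix of the reference form, for example 10 medium-difficulty questions and 5 hard ones, much like following a recipe.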
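A simplified sketch of the "recipe" idea: estimate each pretested question's difficulty as its p-value (the proportion of pretest takers who answered it correctly), sort questions into difficulty bands, then draw the required count from each band. The cut-offs, band names, and question IDs are hypothetical, and the post does not say which statistics CB actually uses; operational test assembly typically works from more detailed statistical specifications.

```python
import random

def p_value(responses):
    """Classical item difficulty: share of pretest takers who answered correctly
    (1 = right, 0 = wrong). A higher p-value means an easier question."""
    return sum(responses) / len(responses)

def difficulty_band(p):
    # Hypothetical cut-offs chosen for illustration only.
    if p >= 0.7:
        return "easy"
    if p >= 0.4:
        return "medium"
    return "hard"

def assemble_form(item_bank, blueprint, seed=0):
    """Draw questions from a pretested bank to match a difficulty recipe such as
    {"medium": 10, "hard": 5}. item_bank maps question id -> 0/1 responses."""
    bands = {}
    for item_id, responses in item_bank.items():
        bands.setdefault(difficulty_band(p_value(responses)), []).append(item_id)
    rng = random.Random(seed)
    form = []
    for band, count in blueprint.items():
        pool = sorted(bands.get(band, []))
        if len(pool) < count:
            raise ValueError(f"need {count} {band} questions, bank has {len(pool)}")
        form.extend(rng.sample(pool, count))
    return form

bank = {
    "Q1": [1, 1, 1, 0],  # p = 0.75 -> easy
    "Q2": [1, 0, 1, 0],  # p = 0.50 -> medium
    "Q3": [1, 1, 0, 0],  # p = 0.50 -> medium
    "Q4": [0, 0, 0, 1],  # p = 0.25 -> hard
}
print(assemble_form(bank, {"medium": 2, "hard": 1}))
```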
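Even so, in the old-SAT era there were a few occasions when the conversion table was extremely harsh. For example, the Math curve for the November 2014 SAT was very strict; interested readers can see coverage on an English-language site here: https://www.applerouth.com/blog/2014/12/17/the-trouble-with-the-curve/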
In short, under this additional-section system, CB and ETS keep their conversion tables valid for the following two reasons:
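1> Every new question put into use has been pretested on a large sample, so each question's difficulty is backed by detailed data, which helps the people assembling forms get the balance right.
2> Every brand-new form can receive a credible conversion table at its very first administration, thanks to the external anchor test design.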
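Reason 2 ends in a conversion table. As a sketch, assume the equating step has already produced a function from new-form raw scores to reference-form raw scores, and assume a reference-form raw-to-scaled table is available; each new-form raw score is then pushed through both and rounded onto the 200 to 800 scale. The function names and the numbers are a hypothetical toy example, not real SAT data.

```python
import math

def interpolate(table, x):
    """Read a raw->scaled table (consecutive integer keys) at a possibly
    fractional raw score x, clamping to the table's ends."""
    lo, hi = min(table), max(table)
    if x <= lo:
        return table[lo]
    if x >= hi:
        return table[hi]
    base = math.floor(x)
    return table[base] + (x - base) * (table[base + 1] - table[base])

def build_conversion_table(equate, ref_conversion, new_raw_scores):
    """Equate each new-form raw score onto the reference form's raw scale,
    look up the reference scaled score, round to the nearest 10, keep 200-800."""
    table = {}
    for raw in new_raw_scores:
        scaled = interpolate(ref_conversion, equate(raw))
        table[raw] = min(800, max(200, round(scaled / 10) * 10))
    return table

# Toy reference-form table and a new form that is half a raw point easier.
ref_conversion = {raw: 200 + 100 * raw for raw in range(7)}
print(build_conversion_table(lambda raw: raw - 0.5, ref_conversion, range(7)))
# → {0: 200, 1: 250, 2: 350, 3: 450, 4: 550, 5: 650, 6: 750}
```

Because the toy new form is half a raw point easier, a perfect raw score of 6 maps to 750 instead of 800. Note that Python's `round()` rounds exact halves to the nearest even number.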