Showing 2 results for Multifaceted Rasch Measurement (MFRM)
Wander Lowie, Houman Bijani, Mohammad Reza Oroji, Zeinab Khalafi, Pouya Abbasi, Volume 26, Issue 2 (9-2023)
Abstract
Performance testing involving rating scales has become widespread in the assessment of second/foreign language oral proficiency. However, few studies have used a pre- and post-training design to investigate the impact of a training program on reducing raters' biases toward the rating scale categories and thereby increasing their consistency measures. Moreover, no study has used MFRM with the facets of test takers' ability, raters' severity, task difficulty, group expertise, scale category, and test version all in a single design. Twenty EFL teachers rated the oral performances of 200 test takers before and after a training program, using an analytic rating scale comprising fluency, grammar, vocabulary, intelligibility, cohesion, and comprehension categories. The outcomes indicated that MFRM can be used to investigate raters' scoring behavior, to enhance rater training, and to validate the functionality of the rating scale descriptors. Training can also yield higher levels of interrater consistency and lower levels of severity/leniency; it cannot turn raters into duplicates of one another, but it can make them more self-consistent. Training also helped raters use the scale's band descriptors more efficiently, reducing the halo effect. Finally, the raters showed improved consistency and reduced rater-scale category biases after the training program. The remaining differences in bias measures can probably be attributed to raters interpreting the scoring rubrics in different ways, reflecting confusion about the accurate application of the scale.
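For reference, the many-facet Rasch model underlying an analysis of this kind (in Linacre's standard formulation; the four-facet form below is a minimal sketch, and the study above adds group expertise and test version as further facets) expresses the log-odds of a rating in category k rather than k-1 as a sum of facet measures:

\log \frac{P_{nijk}}{P_{nij(k-1)}} = B_n - D_i - C_j - F_k

where B_n is the ability of test taker n, D_i the difficulty of task i, C_j the severity of rater j, and F_k the threshold of scale step k. Each additional facet enters the model as one more subtracted term on the right-hand side.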
Zahra Orouji, Houman Bijani, Mohammadreza Oroji, Volume 28, Issue 1 (4-2025)
Abstract
As oral language proficiency assessment relies on human judgment, raters play a crucial role in performance-based testing. Among rater-related variables, rating experience has received considerable attention. Previous research on rater training has shown that extremely severe or lenient raters often benefit most from training, leading to changes in rating behavior. However, many of these studies have applied FACETS to only one or two facets and have rarely employed pre- and post-training designs. In addition, empirical findings have been inconsistent, providing no clear evidence as to whether experienced or inexperienced raters demonstrate greater rating reliability. The present study investigated the impact of rater training on experienced and inexperienced raters. Twenty raters evaluated the oral performances of 200 test takers before and after participating in a training program. The results indicated that training increased interrater consistency and reduced bias in the use of rating scale categories. The findings further suggested that, given the difficulty of fully eliminating rater variability, rater training should prioritize improving intrarater reliability rather than focusing exclusively on agreement among raters. Both experienced and inexperienced raters showed improved rating quality following training; however, inexperienced raters demonstrated greater gains. These results suggest that inexperienced raters should not be excluded from rating solely due to limited experience. As inexperienced raters are also more cost-effective, the findings imply that testing authorities may benefit more from investing in effective rater-training programs than from allocating substantial resources to recruiting highly experienced raters.
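The rater-by-category bias referred to in both abstracts is typically estimated in FACETS-style analyses as an interaction term appended to the basic model; the following is a sketch of that formulation, with S_m and \phi_{jm} as illustrative notation rather than the authors' own:

\log \frac{P_{njmk}}{P_{njm(k-1)}} = B_n - C_j - S_m - F_k - \phi_{jm}

where S_m is the difficulty of analytic category m (fluency, grammar, and so on) and \phi_{jm} is the bias of rater j when scoring category m. A \phi_{jm} that differs significantly from zero flags a rater-category pairing for attention, which is how a pre- versus post-training reduction in bias can be quantified.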