Showing 2 results for Inter-Rater Reliability
Masoumeh Ahmadi Shirazi, Volume 16, Issue 1 (3-2013)
Abstract
The present study reports the development and use of an Analytic Dichotomous Evaluation Checklist (ADEC), which aims to enhance both inter- and intra-rater reliability in writing evaluation. The ADEC consists of 68 items across five subscales: content, organization, grammar, vocabulary, and mechanics. Eight raters assessed the writing performance of 20 Iranian EFL students using the ADEC. The raters also rated the same sample of essays holistically, using the Test of Written English (TWE) scale. To examine the inter-rater and intra-rater reliability of the ADEC, several approaches were used: correlation coefficients, the dichotomous Rasch model, and many-faceted Rasch measurement (MFRM). The findings confirmed that the ADEC yields higher scoring reliability than holistic scoring. Future research with a larger number of raters and examinees may provide more robust evidence for using an analytic scale rather than a holistic one.
Mahnaz Saeidi, Mandana Yousefi, Purya Baghayei, Volume 16, Issue 1 (3-2013)
Abstract
Evidence suggests that variability in the ratings of students' essays results not only from differences in their writing ability but also from certain extraneous sources. In other words, essay ratings can be biased by factors related to the rater, the task, and the situation, or by an interaction among any of these factors, which makes the inferences and decisions about students' writing ability undependable. The purpose of this study, therefore, was to examine variability in rater judgments as a source of measurement error in the assessment of EFL learners' essay writing. Thirty-two Iranian sophomore students majoring in English language participated in this study. The learners' narrative essays were rated by six different raters, and the results were analyzed using many-facet Rasch measurement as implemented in the computer program FACETS. The findings suggest significant differences in harshness among raters, as well as several cases of bias arising from rater-examinee interaction. This study offers insight into how effective and reliable rating can be achieved and how the fairness and accuracy of subjective performance assessment can be examined.
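For reference, one commonly cited rating-scale form of the many-facet Rasch model (the family of models FACETS estimates) is shown below. The abstract does not state which facets or model variant the authors specified, so the facets here (examinee ability, task difficulty, rater severity, category threshold) are illustrative assumptions. Rater harshness corresponds to the severity term $C_j$:

$$
\log\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
$$

Here $P_{nijk}$ is the probability that examinee $n$ receives category $k$ on task $i$ from rater $j$, $B_n$ is the examinee's ability, $D_i$ is the task's difficulty, $C_j$ is the rater's severity, and $F_k$ is the difficulty of moving from category $k-1$ to category $k$. A larger $C_j$ lowers the expected score for every examinee, which is how the model separates a harsh rater from a weak essay.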