Writing Assessment Scales: Making the Right Choice
Christine Coombe & John Evans
Dubai Men's College, Higher Colleges of Technology
Framing the Issue
The field of writing assessment has always been at the forefront of measuring second language performance, but there is now a much greater need for accountability in terms of validity and reliability. The range of stakeholders has expanded considerably to include industry and international institutions (McKay, 1991). Because skills are now internationally transferable, there is a greater need to ensure that all aspects of assessment are as consistent as possible.
Fortunately, the pace of change in the field of measurement has enabled an expansion of empirical research into performance assessment (McNamara, 2000). Performance assessment usually involves markers, or ‘raters’, passing subjective judgement on writing tasks against a pre-agreed scale of criteria, or rating bands. These scales are designed to distinguish between varying skill levels rather than yield a single impressionistic score. It is therefore necessary to examine how factors such as consistency in marking and the relevance of the banding criteria can be monitored and adjusted to ensure reliability and validity in language assessment.
Validity and reliability are central to effective testing practice. In its purest form, validity is defined as “the extent to which a test or examination does what it is designed to do” (Alderson, Clapham & Wall, 1996). Reliability is another cornerstone of good testing practice, referring to the overall extent to which a test measures consistently (Bailey, 1998). If results from a given test are to be taken as valid, they must also prove reliable: the same sample taking the same test should obtain similar results. One would also expect ‘inter-marker reliability’, where different markers...
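The idea of inter-marker reliability can be illustrated with a small calculation. One commonly reported statistic is Cohen's kappa, which corrects the raw agreement rate between two raters for the agreement expected by chance. The sketch below, with invented band scores purely for illustration, shows how kappa might be computed for two markers rating the same ten scripts on a 1-4 band scale.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(ratings_a)
    # Observed proportion of scripts on which the two raters agree exactly
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal band frequencies
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    p_expected = sum(counts_a[band] * counts_b[band] for band in counts_a) / n**2
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical band scores (1-4) awarded by two markers to ten scripts
rater_a = [3, 2, 4, 3, 1, 2, 3, 4, 2, 3]
rater_b = [3, 2, 3, 3, 1, 2, 4, 4, 2, 3]
print(round(cohens_kappa(rater_a, rater_b), 2))
```

Values of kappa near 1 indicate near-perfect agreement, while values near 0 indicate agreement no better than chance; here the two markers agree on eight of ten scripts, giving a kappa of about 0.71.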