
Assessment of Directed Writing by a Group of TESL Students at UPSI

Normah binti Othman

Abstract

Twenty-eight TESL students from UPSI and one experienced English SPM Examination examiner (Expert Rater 1) assessed twenty-five samples of Form Four ESL secondary school students’ writing. Twelve of the TESL students used the analytic scoring method; seven used the primary trait scoring method; and eight used the holistic scoring method. The expert rater and one of the TESL students (Teacher A) used the English SPM Examination scoring method. Teacher A and Expert Rater 1 assessed the writing samples individually, while the other twenty-seven TESL students were gathered during three separate seminars and workshops. The scores from the twenty-eight TESL students (including Teacher A) and Expert Rater 1 were analyzed using descriptive statistics and correlated using non-parametric (Spearman rho) calculations. The analysis of these scores showed a significant correlation coefficient between the subjects, even though they used different scoring methods to assess the writing samples. The TESL students who took part in the assessment during the seminars and workshops agreed that the three scoring methods were suitable for classroom assessment, whereas the English SPM Examination scoring method was more suitable for standardized assessment. The strengths and weaknesses of each scoring method were identified, and solutions for use in classroom assessment were recorded in the salient features of assessment.

Introduction

TESL teachers’ assessment of ESL students’ writing plays an important role in the process of teaching. It is also important that their assessment provides confidence and motivation for students to excel in their school-based, district-based, state-based and national-based ESL examinations in Malaysian secondary schools.


TESL teachers in schools normally handle the school-based ESL examinations, and a selected group of TESL teachers working together in a committee handle the district-based and state-based ESL examinations. The Malaysian Examination Board (Lembaga Peperiksaan Malaysia) handles the national-based ESL examination at the Sijil Pelajaran Malaysia (SPM) level.

The function of the school-based, district-based and state-based examinations in Malaysian secondary schools is to give insight into the progress of students’ learning and achievement while still in school, whereas the national-based examinations give final grades that determine the students’ future undertakings in their studies. Nevertheless, all levels of examinations are important to the students’ learning process. As Rabinowitz (2001) says, statewide or nationwide tests cannot yield the detailed information necessary to target instruction for individual students. This leaves a clear and essential role for local or school-based assessments: to develop diagnostic information about what students do well, where they are having difficulty and how the instructional program might be adjusted to address their specific needs.

For school-based, district-based and state-based examinations, teachers normally adopt certain sets of scoring methods to assess their students’ writing, depending on the objectives of the assessment. There are many types of scoring methods available for teachers to refer to when assessing their students’ writing tasks. Each scoring method differs from the others in that each uses different criteria for examining students’ writing products. For example, one scoring method looks at a student’s writing product generally and does not analyze the student’s grammar performance in detail, whereas another scoring method examines grammar performance in detail. No matter what criteria each scoring method uses, the ultimate aim is the same: to grade students’ writing.

It is important to make sure that the interpretation of students’ writing performance, regardless of the scoring method used, helps the students in their process of learning. It has been found that the scoring schemes of several countries and school systems cover similar aspects. This means that these school systems are interested in evaluating the same elements in students’ writing. That is why the kinds of writing offered to students in almost all schools take more or less the same form and serve the same function (Takala, 1988).

Some of the kinds of writing that teachers normally give to their students in schools all over the world are essays, summaries, note taking, letter writing, paraphrasing, report writing and directed writing. Teachers normally use these kinds of writing to evaluate their students’ language performance.

“Written language has always played a dominant role in formal education. Typically the acquisition of literacy is considered to be one of the most important tasks of the school, not only as a vehicle of learning, but as a means of achieving


other goals as well” (Takala, 1988). The written examination in the SPM is a very important type of assessment that determines the students’ performance in language use. The result of the assessment is important because a good grade in the English examination is a prerequisite for students to further their studies in certain fields such as medicine, pure science and mathematics, especially at a foreign institution.

The International Association for the Evaluation of Educational Achievement (IEA) has carried out several studies on writing tasks and scoring scales. The IEA, which was founded in 1959, has conducted much research to compare the educational performance of school students in various countries and systems of education around the world (Gorman, 1988: vii). The IEA’s study of written composition began in 1980, and the findings were published in several volumes. The writing tasks studied were pragmatic writing, letter writing, summary writing, descriptive writing, narrative writing, open writing, argumentative/persuasive writing and reflective writing. Studies have also been conducted to investigate the effectiveness of some of the scoring methods used to assess students’ writing tasks.

Statement of the Problem

TESL teachers who conduct the school-based, district-based and state-based assessments of writing provide immediate feedback for the students, and thus enable the students to progress in their learning process and in their preparation for the national-based examination. The national-based assessment of writing provides grades that determine the students’ future undertakings in their further studies. Expert raters conduct the national-based assessment, and the Malaysian Examination Board (Lembaga Peperiksaan Malaysia) trains the expert raters to assess the national examination. It is important, then, for TESL teachers to balance their school-based, district-based and state-based assessments with the national-based assessment of writing for the sake of students’ learning progress and school improvement.

The issue of balancing school-based assessment and national-based assessment does not apply only to Malaysian schools. In America, for instance, the issue of balancing state and local assessment has been raised by school administrators. State assessment in America refers to their national-level assessment, and local assessment is their school-based or district-based assessment. Rabinowitz (2001), in his article about balancing state and local assessments in American schools, finds that local assessment programs are still relevant because effective local assessment is essential to improve student learning, and that locally developed and administered assessment programs have a unique capacity to provide diagnostic information that, when understood and used effectively, has an immediate impact on classroom practices.


Another researcher who shares Rabinowitz’s view about balancing state and local assessment in America is Stiggins (2002), who believes that the current assessment systems in American education are harming huge numbers of students. The harm arises directly from the failure to balance the use of standardized tests and classroom assessments in the service of school improvement. He also believes that student achievement suffers because once-a-year tests are incapable of providing teachers with the moment-to-moment and day-to-day information about student achievement that they need to make crucial instructional decisions. Stiggins (2002) suggests that teachers rely on classroom assessment to make crucial instructional decisions. However, the problem is that teachers are unable to gather or effectively use dependable information on student achievement each day because of the drain of resources for excessive standardized testing. There are no resources left to train teachers to create and conduct appropriate classroom assessments. For the same reason, Stiggins (2002) states that district and building administrators have not been trained to build assessment systems that balance standardized tests and classroom assessments. As a result of these chronic and long-standing problems, classroom, building, district, state and national assessment systems remain in constant crisis, and students suffer the consequences.

The issue of balancing local and national assessment involves students’ performance in writing, because locally and nationally standardized examinations normally require students to write continuously in a given period. For example, the PMR, SPM, STPM and MUET nationally standardized examinations in Malaysia involve writing components. Writing is thus commonly used to assess students’ language skills and their learning in many academic content areas. The need to provide students with fair and supportable assessment approaches is therefore very important, because many decisions rest on writing assessment. It is imperative that decision makers, national examiners, national raters and schoolteachers who provide language performance reports based on assessment of students’ writing give fair reports that really depict the students’ actual performance, because the reports given determine the students’ future undertakings and even their future careers. So it is necessary to study the need for validated school-based assessment methods that TESL teachers can make use of to balance school-based and national-based assessment.

Objectives

Teachers assign writing tasks for different instructional purposes: to have learners imitate some model of writing; to train learners in the use and manipulation of linguistic and rhetorical forms; to reinforce material that students have already learned; to improve learners’ writing fluency; to encourage authentic communication whereby the writer really wants to impart the information and the


reader is genuinely interested in receiving it; and to learn how to integrate all the purposes mentioned above, with the emphasis on improving the whole performance, not just one of its aspects (Raimes, 1987, as quoted by Cohen, 1994). Taking into consideration all these purposes for which teachers assign writing tasks to their students, the present study developed two main objectives that cover TESL teachers’ role in assessing their students’ writing tasks and the effectiveness of the holistic scoring method, the analytic scoring method and the primary trait scoring method for assessing directed writing.

The two main objectives are:
1. To investigate TESL teachers’ assessment of the directed writing product written by Form Four ESL secondary students. This writing product was given in Paper Two of the English SPM Examination in Malaysia.
2. To establish the construct validity of the holistic scoring method, analytic scoring method and primary trait scoring method to assess directed writing in ESL classrooms in Malaysia.

The specific objectives of this study are:
1. To record and analyze the salient features of assessment verbalized by the TESL teachers as they assessed directed writing.
2. To design and establish the holistic, analytic and primary trait scoring methods to assess directed writing.
3. To test the validity and reliability of the holistic, analytic and primary trait scoring methods designed to assess directed writing.
4. To identify the strengths and weaknesses of the holistic scoring method, analytic scoring method and primary trait scoring method used in this study to assess directed writing.
5. To establish the concurrent validity of the scores obtained from the TESL teachers after assessing directed writing using the holistic scoring method, analytic scoring method and primary trait scoring method, as compared to the scores given by the expert raters who were trained by the Malaysian Examination Board (Lembaga Peperiksaan Malaysia).

Research Questions

1. How did TESL teachers assess directed writing, and what were the salient features of assessment that they verbalized as they reacted to the writing products using the holistic scoring method, the analytic scoring method and the primary trait scoring method?

2. To what extent were the holistic scoring method, the analytic scoring method and the primary trait scoring method valid and reliable for the assessment of directed writing?


3. What was the relationship between the scores given by TESL teachers using the holistic scoring method, the analytic scoring method and the primary trait scoring method and the scores given by the expert rater using the SPM examination scoring method?

Literature Review

English as a Second Language (ESL) is a compulsory subject in the SPM examination. It is crucial for ESL students to do well in the subject because of the importance of the language. There are three major components tested in the SPM English Examination: Oral English, Paper One and Paper Two. In Paper Two of the SPM examination, the students are tested on three types of writing: directed writing, summary writing and essay writing. The three types of writing tasks given in Paper Two of the SPM examination are very important for secondary school ESL students in Malaysia. As much as it is important for the students to do well in the writing tasks, it is equally important for TESL teachers to assess their students’ writing well enough to ensure that the grades given really depict the students’ actual performance in writing.

Teachers’ assessment of students’ writing can greatly influence students’ attitudes toward future learning, because students can be easily confused by unclear, vague or ambiguous responses and can become frustrated with their writing progress and their hopes for the results in their examination. Alternatively, students can be positively motivated if the assessment given to their written work reflects their actual performance in the national-level examination. Unfortunately, there is no clear set of universal guidelines that will guarantee such a supportive and positive experience for all students. In a given context for writing instruction, students will differ, and tasks, topics, and responses will differ (Grabe and Kaplan, 1996: 377).

Schoolteachers might use different ways and methods to assess their students’ writing tasks, depending on the school authorities’ instructions to them. Students will not be able to predict their actual performance in the national examination if the performance assessment method adopted by their ESL teachers is not the same as the national raters’ performance assessment method. Normally, TESL teachers in secondary schools invite the English SPM national raters, who are considered expert raters, to come to their schools to conduct seminars and workshops for their students before they sit for the national examination. This is to ensure that their students get some exposure to the national raters’ expectations when assessing their writing products in the national-level examination. The national raters are trained by the Malaysian Examination Board (Lembaga Peperiksaan Malaysia) to assess the English SPM examination according to its standards. The Malaysian Examination Board has its own scoring method that national raters or expert raters have to refer to when assessing the English SPM examination papers.


There are also similarities and differences in the scoring methods available for examiners and raters to refer to when assessing students’ writing. The similarities will not cause problems for examiners or raters giving grades when they refer to different scoring methods. The problem lies in the differences, because they might cause differences in the scores given to the students’ writing. If we examine two scoring methods and compare the elements that each looks at, we will see some differences in their focus. For example, the holistic scoring method produces one single integrated score of writing behavior. It is interested in responding to the writing as a whole, and respondents are unlikely to be penalized for poor performance on one lesser aspect, for example grammatical ability (Cohen, 1994:314). On the other hand, the primary trait scoring method narrows its focus to a specific aspect of the writing piece. So if two TESL teachers assess the same writing piece, one with the holistic scoring method and the other with the primary trait scoring method, their focus while assessing the writing piece is definitely not the same. The question is whether the scores that they give to the same writing piece differ or not.
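To make the contrast concrete, the sketch below expresses the three scoring approaches as simple Python functions. The aspect names follow the list used elsewhere in this paper (content, organization, vocabulary, grammar and mechanics), but the functions and score values are illustrative placeholders, not the rubrics designed for the study.

    # Analytic: one sub-score per aspect of writing, combined into a total.
    def analytic_score(sub_scores):
        # e.g. {"content": 9, "organization": 6, "vocabulary": 5,
        #       "grammar": 5, "mechanics": 2} for a total out of 30
        return sum(sub_scores.values())

    # Holistic: a single integrated judgment of the writing as a whole,
    # with no per-aspect breakdown.
    def holistic_score(overall_impression):
        return overall_impression

    # Primary trait: only the targeted trait counts (in this study, the
    # clarity of a described process).
    def primary_trait_score(trait_rating):
        return trait_rating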

Cohen (1994:312) states that writers and teachers or raters differ in so many aspects related to the assessment of writing. He quotes Ruth and Murphy as saying that:
1. Writers will differ in their notions about the significance of particular features of the topic.
2. Students and their teachers (raters) differ in their recognition and interpretation of salient points in a writing topic (with teachers having a wealth of professional experience in the evaluation of writing while students have only their own experience as test takers).
3. Student writers may construct different writing tasks for themselves at different stages in their development.

Ruth and Murphy’s findings, as quoted by Cohen (1994) above, tell us that it is universally accepted that writers and their raters differ in some way or other. Even if two raters are given the same scoring method to assess the same writing piece, there are bound to be differences in their judgment.

There are a few studies that look into the assessment of writing performance and score relationships. For example, Johnson, Penny and Gordon (2001) studied score resolution and the inter-rater reliability of holistic scores in rating essays; Hayes (2000) studied the consistency of student performance on holistically scored writing assignments; and Swartz, Hooper, Montgomery, Wakely, et al. (1999) studied the use of generalizability theory to estimate the reliability of writing scores derived from holistic and analytical scoring methods. Despite the many studies related to writing assessment and scoring relationships, Crehan and Hudson (2001) stated that unresolved concerns remain


for the more basic issues of objective and reliable scoring of performance assessments, especially for writing products. They conducted a study comparing two scoring strategies for performance assessments.

Research Design

This research involved a case study with twenty-eight TESL students from Universiti Pendidikan Sultan Idris and one experienced TESL teacher. One of the TESL students taught a class of Form Four ESL students, from which the directed writing samples were taken as the instruments for this research. Three separate seminars and workshops were conducted to gather the other twenty-seven TESL students, during which they were required to assess the directed writing samples. While they assessed the directed writing samples, their salient features of assessment were recorded. The experienced TESL teacher was an expert rater who had twelve years’ experience assessing the English SPM Examination papers.

During the seminars and workshops the TESL students were trained to assess the directed writing samples using the three scoring methods devised for this research. McNamara (2000:44) believed that initial and ongoing rater training is an important way to improve the quality of rater-mediated assessment schemes. The training normally took the form of a moderation meeting, which had the function of bringing about broad agreement on the relevant interpretation of level descriptors and rating categories. During the moderation meetings, discrepancies were noted and discussed in detail, with particular attention paid to the way in which the level descriptors were being interpreted by individual raters. The scoring methods, which were the independent variables, were the holistic scoring method, the primary trait scoring method and the analytic scoring method.

Instrumentation

Twenty-five samples of directed writing taken from one class of Form Four ESL students were given to the TESL students and the expert rater to be assessed. The expert rater and the TESL student involved in teaching the class used the English SPM Examination scoring method to assess the directed writing samples. The other twenty-seven TESL students assessed the directed writing samples using one of the three scoring methods given to them. Three scoring methods were designed for the TESL students to use in assessing the directed writing samples: the holistic scoring method, the analytic scoring method and the primary trait scoring method.


Data Collection

A naturalistic observation was conducted while a TESL student taught a class of Form Four ESL students, during which twenty-five directed writing samples were taken as the instruments for this research. This was to ensure that the writing samples were valid and reliable for research. Twenty-seven TESL students were gathered in three separate seminars and workshops to assess the directed writing samples. During these seminars and workshops, the scores that these TESL students gave to the writing samples were collected after they had assessed them, and their salient features of assessment were recorded. The directed writing samples were also given to the expert rater to be assessed individually.

Data Analysis

The scores obtained from the TESL students and the expert rater were correlated using the SPSS program to examine the relationship between the scores. The scores were correlated using the Spearman rho. Apart from that, descriptive statistics of the scores were also calculated using the SPSS program.
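The study ran these calculations in SPSS. As a rough equivalent, the sketch below shows how the same two steps, descriptive statistics and a Spearman rho correlation, could be reproduced in Python with pandas and SciPy; the score values are hypothetical placeholders, not the study’s data.

    import pandas as pd
    from scipy.stats import spearmanr

    # Each column holds one rater's marks (out of 30) for the same writing samples.
    scores = pd.DataFrame({
        "ExpertRater1": [26, 22, 18, 24, 28],  # hypothetical marks
        "Rater1":       [25, 20, 15, 23, 27],  # hypothetical marks
    })

    # Descriptive statistics (count, mean, standard deviation, min, max, quartiles).
    print(scores.describe())

    # Spearman rank correlation between one rater and the expert rater.
    rho, p_value = spearmanr(scores["Rater1"], scores["ExpertRater1"])
    print(f"Spearman rho = {rho:.3f}, p = {p_value:.3f}")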

Results

The Scores Obtained from the Assessment of Directed Writing

The TESL student who was involved in teaching the class of Form Four ESL students is referred to as Teacher A (TA); the twenty-seven TESL students involved in the seminars and workshops are referred to as Rater 1 to Rater 27; and the expert rater is referred to as Expert Rater 1 in the report of this research. Of the twenty-seven TESL students gathered during the seminars and workshops, twelve (R1 to R12) used the analytic scoring method; seven (R13 to R19) used the primary trait scoring method; and eight (R20 to R27) used the holistic scoring method. Teacher A (TA) and the expert rater (ER1) assessed the writing samples using the English SPM Examination scoring method. Three TESL students did not complete assessing the writing samples given to them: Rater 5 managed to assess only eleven samples; Rater 6, only twenty-four samples; and Rater 13, only twenty samples.

To see the relationship between the scores given by the TESL students and the expert rater who assessed the twenty-five samples of directed writing, the scores were grouped into five categories: excellent (25 - 30 marks); good (20 - 24 marks); average (10 - 19 marks); poor (5 - 9 marks); and very poor (0 - 4 marks). This answered research question number three (what was the relationship between the scores given by TESL students using the holistic scoring method, the analytic scoring method and the primary trait scoring method and the scores given by the expert rater and Teacher A, using the SPM examination scoring method?).
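As an illustration of this banding, the helper below maps a mark out of 30 to its category; the band boundaries are taken directly from the text, and the sample calls are hypothetical.

    # Score bands used in this study (marks out of 30).
    def band(score):
        if 25 <= score <= 30:
            return "excellent"
        elif 20 <= score <= 24:
            return "good"
        elif 10 <= score <= 19:
            return "average"
        elif 5 <= score <= 9:
            return "poor"
        else:
            return "very poor"  # 0 - 4 marks

    print(band(26))  # excellent
    print(band(13))  # average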

Teacher A was a teacher trainee who had no experience assessing English SPM Examination papers, whereas Expert Rater 1 had seven years of experience assessing the English SPM Examination papers. Even though their experience varied, there were some similarities in the categories of scores they gave to some of the writing samples they assessed. For example, both Teacher A and Expert Rater 1 gave excellent scores to DW60, DW61, DW62 and DW71; good scores to DW51, DW52, DW56, DW58, DW59, DW68 and DW70; and average scores to DW64 and DW73. The number of writing samples that received similar categories of scores from both raters was thirteen out of twenty-five, that is, 52%. Neither of them gave poor or very poor scores to any of the writing samples. 46% of the total scores given by Teacher A and Expert Rater 1 fell into the good band; 32% into the excellent band; and 22% into the average band.

Seven (58%) out of twelve raters using the analytic scoring method gave excellent scores to DW60; five raters (42%) to DW54, DW62; four raters (33%) to DW63, DW71; two raters (17%) to DW57, DW61, DW75; and one rater (8%) to DW51, DW58. Seven out of twelve raters gave good scores to DW75; six raters to DW54, DW61; five raters to DW58, DW60, DW73, DW70; four raters to DW52, DW59, DW62, DW68, DW71; three raters to DW57; two raters to DW51, DW55, DW53; and one rater to DW56, DW65, DW67, DW72, and DW74. The majority of scores for raters who used the analytic scoring method fell into the average band. Ten out of twelve raters using the analytic scoring method gave average scores to DW69, DW72; nine raters to DW51, DW53, DW56, DW65, DW66, DW67; eight raters to DW55, DW74; seven raters to DW57, DW52, DW68, DW73; five raters to DW59, DW70; four raters to DW58, DW61; two raters to DW62, DW63, DW64, DW71, DW75; and one rater to DW54. Eight out of twelve raters using the analytic scoring method gave poor scores to DW64; three raters to DW56, DW73; two raters to DW55, DW59, DW66, DW69; and one rater to DW52, DW53, DW58, DW65, DW67, DW68, DW70, DW71, and DW74. 52% of the total scores given by all twelve raters using the analytic scoring method fell into the average band; 26% fell into the good band; 12% into the excellent band; and 10% into the poor band. None of the raters using the analytic scoring method gave very poor scores to any of the writing samples.

Five (71%) out of seven raters who assessed the directed writing samples using the primary trait scoring method gave excellent scores to DW60; four raters (57%) to DW61; three raters (43%) to DW54; two raters (29%) to DW51, DW52; and one rater (14%) to DW57, DW59, DW63, DW69, DW71, and DW74. For the good scores, six out of seven raters (86%) gave good scores to DW59; five raters (71%) to DW62; four raters (57%) to DW51, DW59, DW75; three raters (43%) to DW54, DW71, DW72; two raters (29%) to DW52, DW57, DW58, DW66, DW67, DW68, DW70, and DW74; and one rater (14%) to DW53, DW55, DW56, DW60, DW61, DW63, DW64, DW65, DW69, and DW73. The majority of scores given by the seven raters using the primary trait scoring method were in the average band. Six raters (86%) gave average scores to DW56; five raters (71%) to DW55, DW58, DW63, DW64, DW65, DW66, DW67, DW68, DW69, DW70, DW73; four raters (57%) to DW53, DW57; three raters (43%) to DW71, DW72, DW74, DW75; two raters (29%) to DW52, DW61, DW62; and one rater (14%) to DW60. Only three raters who used the primary trait scoring method gave poor scores, covering six writing samples: one rater (14%) gave poor scores to DW53, DW64, DW65, DW72, DW73, and DW74. For the primary trait scoring method, 52% of the total scores given by all the raters fell into the average band; 32% into the good band; 13% into the excellent band; and 3% into the poor band. None of the raters gave very poor scores to any of the writing samples.

Six (75%) out of eight raters who used the holistic scoring method gave excellent scores to DW60; three raters (38%) to DW54, DW62, DW63; two raters (25%) to DW75; and one rater (13%) to DW57, DW61, DW62, and DW71. Six (75%) out of eight raters gave good scores to DW51; five raters (63%) to DW61, DW62, DW75; four raters (50%) to DW54; three raters (38%) to DW52, DW63, DW68, DW69; two raters (25%) to DW57, DW60, DW71; and one rater (13%) to DW55, DW58, DW65, DW67, DW72, and DW74. The majority of scores given by the eight raters who used the holistic scoring method fell into the average band. All eight raters (100%) gave average scores to DW56; seven (88%) to DW53, DW55, DW58, DW65, DW70; six raters (75%) to DW72, DW74; five raters (63%) to DW52, DW57, DW68, DW71; four raters (50%) to DW66, DW67, DW69, DW73; three raters (38%) to DW59; two raters (25%) to DW51, DW61, DW63, DW64; and one rater (13%) to DW54, DW75. Six (75%) out of eight raters gave poor scores to DW64; five raters (63%) to DW59; four raters (50%) to DW66, DW73; three raters (38%) to DW67; and one rater (13%) to DW53, DW69, DW70, DW72, and DW74. 52% of the total scores given by the eight raters who used the holistic scoring method fell into the average band; 25% were in the good band; 13% in the poor band; and 10% in the excellent band.

The categories of scores given by the twelve raters using the analytic scoring method, the seven raters using the primary trait scoring method and the eight raters using the holistic scoring method show that 52% of all their scores fell into the average band, regardless of the scoring methods they used. The good band was the second most chosen category of scores by all raters in the three groups (26% for raters using the analytic scoring method; 32% for raters using the primary trait scoring method; and 25% for raters using the holistic scoring method). When compared to the categories of scores given by Teacher A and Expert Rater 1, the difference was that most of the scores given by these two raters were in the good band (46%). However, there were some similarities in the category of scores given to some writing samples. For example, Teacher A and Expert Rater 1 gave excellent scores to DW60; 58% of raters using the analytic scoring method gave excellent scores to DW60; 71% of raters using the primary trait scoring method gave excellent scores to DW60; and 75% of raters using the holistic scoring method gave excellent scores to DW60. Thus, regardless of the differences in the scoring methods, raters who used different scoring methods to assess the directed writing samples were able to identify the excellent pieces of students’ writing.

To compare the scores given by all twenty-eight TESL students and the expert rater to the directed writing samples, the mean and standard deviation of the scores were calculated using the SPSS program. The descriptive statistics obtained from these calculations also answered research question number three (what was the relationship between the scores given by TESL teachers using the holistic scoring method, the analytic scoring method and the primary trait scoring method and the scores given by the expert rater and Teacher A using the SPM examination scoring method?).

The descriptive statistics show that, out of the total of 30 (thirty) marks allocated to each directed writing sample, the maximum mean score was obtained from Teacher A, at 23.6800, and the minimum from Rater 8, at 11.8000. The mean score obtained from Expert Rater 1 was 20.8000. The raters whose mean scores were closest to Expert Rater 1’s were Rater 5 (20.4545); Rater 11 (20.1600); Rater 16 (21.1600); and Rater 23 (20.3200). Rater 5 and Rater 11 were using the analytic scoring method; Rater 16 was using the primary trait scoring method; and Rater 23 was using the holistic scoring method. This shows that raters using the three different scoring methods could achieve similar mean scores after assessing the writing samples.

To establish the concurrent validity of the scores obtained from the TESL students and the expert rater after assessing the directed writing samples using the holistic scoring method, the analytic scoring method and the primary trait scoring method (specific objective number five), and to answer research question number two (to what extent were the holistic scoring method, the analytic scoring method and the primary trait scoring method valid and reliable for the assessment of directed writing?) and research question number three (what was the relationship between the scores given by TESL students using the holistic scoring method, the analytic scoring method and the primary trait scoring method and the scores given by the expert rater and Teacher A, using the SPM examination scoring method?), the Spearman rho was calculated using the SPSS program.

The non-parametric calculations by Spearman rho show that there was a significant correlation coefficient at the 99% confidence level between the scores given by Expert Rater 1 (ER1) and the scores given by Teacher A, Rater 1, Rater 3, Rater 4, Rater 7, Rater 8, Rater 9, Rater 10, Rater 11, Rater 15, Rater 16, Rater 17, Rater 18, Rater 19, Rater 20, Rater 21, Rater 22, Rater 23, Rater 24, Rater 25, Rater 26, and Rater 27 in grading the twenty-five directed writing samples written by Form Four ESL secondary school students. Apart from that, there was also a significant correlation coefficient at the 95% confidence level between the scores given by Expert Rater 1 and the scores given by Rater 2, Rater 5, Rater 6, and Rater 12. Only two raters did not show any correlation of scores with Expert Rater 1: Rater 13 and Rater 14, whose correlation coefficients were -0.011 and 0.021 respectively. Both raters used the primary trait scoring method to assess the directed writing samples.

Thus 79% of the raters (22 out of 28, including Teacher A) gave scores that correlated with Expert Rater 1’s at the 99% confidence level. Among these raters, eight used the analytic scoring method; five used the primary trait scoring method; eight used the holistic scoring method; and Teacher A used the English SPM Examination scoring method, like Expert Rater 1. 14% of the raters (4 out of 28) gave scores that correlated with Expert Rater 1’s at the 95% confidence level; all four of these raters were using the analytic scoring method. Only 7% of the raters (2 out of 28) gave scores that did not correlate with Expert Rater 1’s.
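The grouping above amounts to a simple rule on the p-values of the Spearman correlations: p < .01 corresponds to the 99% confidence level and p < .05 to the 95% level. A minimal sketch, with hypothetical p-values standing in for the study’s results:

    # Hypothetical p-values for each rater's correlation with Expert Rater 1.
    p_values = {"Teacher A": 0.001, "Rater 2": 0.030, "Rater 13": 0.640}

    for rater, p in p_values.items():
        if p < 0.01:
            level = "significant at the 99% confidence level"
        elif p < 0.05:
            level = "significant at the 95% confidence level"
        else:
            level = "no significant correlation"
        print(f"{rater}: {level}")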

Salient Features of Directed Writing Assessment

While the TESL students assessed the directed writing samples during the seminars and workshops, their salient features of assessment were recorded and analyzed, as stated in specific objective number one of this research (to record and analyze the salient features of assessment verbalized by the TESL students as they assessed directed writing). The purpose of this was to identify the strengths and weaknesses of the holistic scoring method, analytic scoring method and primary trait scoring method used to assess directed writing (specific objective number four). The analysis also answered research question number one (how did TESL students assess directed writing, and what were the salient features of assessment that they verbalized as they reacted to the writing products using the holistic scoring method, the analytic scoring method and the primary trait scoring method?) and research question number two (to what extent were the holistic scoring method, the analytic scoring method and the primary trait scoring method valid and reliable for the assessment of directed writing?).

Teacher A (TA), who assessed the twenty-five samples of Form Four ESL secondary school students’ directed writing (DW51 to DW75) on her own, was told to record her salient features of assessment, which she wrote down. She admitted that she had no experience assessing students’ writing, and it took her three days to complete assessing the writing samples. While assessing the writing samples, Teacher A focused on the students’ performance in grammar and content and gave good marks to students who she thought performed well in these two aspects. Expert Rater 1 (ER1) did not record her salient features of assessment, but she wrote comments about the students’ performance on the writing samples. The comments that she wrote on the students’ writing samples showed that she took into consideration two important aspects, grammar and content, when she assessed the writing samples.

Twelve TESL students or raters (R1 to R12), gathered in the first one-day seminar and workshop, assessed directed writing samples DW51 to DW75 using the analytic scoring method; seven raters (R13 to R19), gathered in the second one-day seminar and workshop, assessed the same samples using the primary trait scoring method; and eight raters (R20 to R27), gathered in the third one-day seminar and workshop, assessed the same samples using the holistic scoring method.

While assessing the writing samples, these students were told to write down their salient features of assessment by answering a set of questions given to them as a guideline. Some of them answered these questions in the midst of assessment, while others answered after completing the assessment. A discussion was held after they answered the questions. The questions given to the raters as a guideline were:
1. List the aspects of writing (for example: content, organization, vocabulary, grammar and mechanics) that you considered most important while you were assessing the directed writing samples.
2. While assessing the directed writing samples, did you look at one writing sample and give its score, or did you look at all the writing samples first before deciding the scores for each one?
3. While assessing, did you divert your attention from the scoring method you were supposed to use? Why?
4. Please give reasons for the scores you gave to the writing samples.
5. What is your opinion about using the analytic scoring method/primary trait scoring method/holistic scoring method to assess directed writing?

In response to question number one, five raters (Rater 2, Rater 3, Rater 4, Rater 9, and Rater 10) who used the analytic scoring method (see appendix) listed these aspects in this order of importance: content, organization, vocabulary, grammar and mechanics. Another five raters who used the same scoring method (Rater 1, Rater 5, Rater 6, Rater 8, and Rater 12) listed the aspects in this order of importance: content, organization, grammar, vocabulary, and mechanics. Only two raters who used the analytic scoring method (Rater 7 and Rater 11) listed the aspects in this order of importance: content, grammar, organization, vocabulary, and mechanics. They referred to the aspects given in the analytic scoring method when answering this question.

The primary trait scoring method (see appendix) required the raters to look only at the process of making nasi lemak, that is, whether the description was clear or not.


Six out of seven raters (Rater 13, Rater 14, Rater 15, Rater 17, Rater 18 and Rater 19) who used this scoring method looked at the general performance of the writing and took into consideration aspects like grammar, flow of ideas, format of writing, vocabulary, the use of sequence connectors, and sentence structures. They gave good marks if the writing samples showed good performance in these aspects, apart from a clear description of the process of making nasi lemak. Rater 16 admitted that she tended to take into consideration her preconceived ideas about the process of making nasi lemak. She knew very well how to make nasi lemak, so she made use of her own knowledge about the process rather than referring to the writing instruction given. She gave good scores to writing samples that included extra information about making nasi lemak.

The holistic scoring method required the raters to consider the students’ overall performance in directed writing. The scoring method did not require raters to concentrate on any particular aspect of writing. The eight TESL students or raters (R20 to R27) who assessed the twenty-five samples of directed writing (DW51 to DW75) admitted that they did not concentrate on any particular aspect of language while assessing, but focused on the students’ general performance. However, they did not deny that they were sometimes tempted to deduct marks for grammatical errors. For example, Rater 27 looked at whether the students were able to answer the question as required, but would deduct marks for wrong format, incorrect sequence connectors and grammatical errors. The other raters took into consideration language aspects similar to those stated by the raters who used the analytic scoring method.

For question number two, all twelve raters who used the analytic scoring method admitted that they concentrated on one writing sample at a time while assessing. None of them looked at one aspect of writing (content, organization, vocabulary, grammar, and mechanics) at a time while assessing. In a discussion held with all the raters, they agreed that they would be able to give fairer scores if they looked at one aspect at a time while assessing. However, most of them agreed that looking at one aspect at a time was time consuming; it would be practicable only if they were given a smaller number of writing samples. Nonetheless, all of them agreed that the analytic scoring method had enabled them to look at all the important aspects of writing, so they were confident that they had given fair scores to all the writing samples, even though they had looked at one writing sample at a time.

Unlike the raters who used the analytic scoring method, the raters who used the primary trait scoring method and the holistic scoring method had no choice but to concentrate on each writing sample in turn. The seven raters (R13 to R19) who used the primary trait scoring method and the eight raters (R20 to R27) who used the holistic scoring method admitted that they concentrated on one piece of writing at a time and gave a score to each writing sample before continuing with the next one, because the scoring method required them to do so.


Rater 3, Rater 10, and Rater 11, who used the analytic scoring method, did not divert their attention (in response to question number three) from the aspects stated in the scoring method. They adhered strictly to the guideline given in the scoring method when assessing the writing samples. Rater 3 believed that following the scoring method enabled her to give fair judgment to all the writing samples, because the scoring method took into consideration the important aspects of writing equally and fairly. She suggested that the descriptors under the aspect of organization should include format of writing. Rater 11 said that the guideline given in the scoring method was very detailed and useful for teachers. The other nine raters (Rater 1, Rater 2, Rater 4, Rater 5, Rater 6, Rater 7, Rater 8, Rater 9, and Rater 12) who used the analytic scoring method said that they diverted their attention from the aspects given in the scoring method. The reason they gave was that other aspects, not stated in the writing instruction or the scoring method, attracted their attention while they were assessing the students’ writing samples. These raters found that some school students who wrote the directed writing added extra information showing that they were very good at describing the process of making nasi lemak, had some sense of humour, and portrayed friendliness in their writing, so they gave credit to such writing samples. Some of the raters deducted marks from writing samples that were translated directly from the Malay language.

Six out of seven raters who used the primary trait scoring method admitted that they diverted their attention from the scoring method while assessing. They said that it was difficult to concentrate only on the process of making nasi lemak because they were strongly influenced by the students’ performance in language. Three raters (Rater 17, Rater 18, and Rater 19) stated that grammatical errors, wrong sentence structures and the wrong format of writing influenced them when giving marks to the writing samples, because these aspects hampered the description of making nasi lemak. Rater 13, the only rater using the primary trait scoring method who did not divert her attention from the scoring method while assessing, said that she felt comfortable concentrating on only one aspect.

Five raters (Rater 20, Rater 21, Rater 22, Rater 23, and Rater 26) who used the holistic scoring method diverted their attention from the scoring method. The reasons they gave were that the holistic scoring method was too general, errors could not be ignored, and the marking system was too lenient. So they tended to be stricter in marking when considering the students’ errors. Three raters (Rater 24, Rater 25, and Rater 27) did not divert their attention from the holistic scoring method.

Most of the raters who used the analytic scoring method described the general performance of the students as the reason for the scores they gave to the students’ writing samples. For example, they described the students’ writing samples as full of content, with no grammatical errors, with perfect sentences, with good sentence structures, and well organized when giving good scores to the writing samples. The reasons the raters gave for the writing samples that they graded as poor were that the students were poor in language, could not construct good sentences, and could not produce good points for content.

All twelve raters who used the analytic scoring method agreed that the scoring method was very suitable for classroom assessment. They believed that by using the scoring method teachers would be able to assess the overall performance of the students’ writing, because the analytic scoring method covered every language aspect. They agreed that teachers would be able to monitor students’ learning if they used the scoring method for classroom assessment. However, all of them also agreed that the scoring method was so detailed that they could not assess a large number of writing samples in the limited time given. They believed that teachers should be given fewer students to teach in a classroom so that they could use the scoring method to assess their students’ writing.

The raters who used the primary trait scoring method said that the scoring method was very suitable for classroom use because teachers were given the chance to construct the rubrics depending on what trait they wanted to test. The only problem with the scoring method was that it would be very troublesome for teachers to standardize their marking with the other teachers in the school; they stated that school administrators would normally require them to use the same scoring method as other teachers. A few of them suggested that the scoring method would therefore be suitable for assessing daily exercises, but not monthly tests or school-based examinations.

All the raters who used the holistic scoring method argued that the scoring method was not suitable for weak students because it required them to look only at the good points in the writing. They also believed that the scoring method was suitable only for large-scale assessment. They agreed that this scoring method should be used for school-based examinations, and not for classroom-based assessment.

Discussion and Recommendation

The subjects involved in this study were limited to UPSI TESL students who had no experience using the three scoring methods introduced in the research. They had no experience teaching and assessing secondary school students’ directed writing, so they could not give much opinion about the strengths and weaknesses of the scoring methods introduced to them. However, the scores given to the directed writing samples were valid and reliable because they were obtained from these students after they were trained to use the scoring methods during the seminars and workshops. It is recommended that this research be repeated with TESL teachers who have experience teaching and assessing secondary school students’ directed writing.


References

Best, J. W. and Kahn, J. V., 1993. Research in Education. 7th Edition. Needham Heights: Allyn and Bacon.

Brown, H. D., 2001. Teaching by principles: An interactive approach to language pedagogy. 2nd Edition. New York: Addison Wesley Longman, Inc.

Cohen, A. D., 1994. Assessing Language Ability in the Classroom. 2nd Edition. Boston: Heinle and Heinle Publishers.

Cooper, C. R. and Odell, L., 1977. Evaluating Writing: Describing, Measuring, Judging. Buffalo: National Council of Teachers of English.

Crehan, K. D. and Hudson, R., 2001. A comparison of two scoring strategies for performance assessments in Educational Research Quarterly. West Monroe: Dec. 2001.

Gay, L. R., 1992. Educational Research: Competencies for analysis and application. 4th Edition. New York: Macmillan Publishing Company.

Gorman, T. P., Purves, A. C. and Degenhart, R. E., 1988. The IEA Study of Written Composition 1: The International Writing Tasks and Scoring Scales. Oxford: Pergamon Press (Vol. 5).

Hamp-Lyons, L., 1990. Second language writing: assessment issues in Kroll, B. (Ed.), Second language writing: Research insights for the classroom. Cambridge: Cambridge University Press.

Hayes, Hatch and Silk, 2000. Does holistic assessment predict writing performance? Estimating the consistency of student performance on holistically scored writing assignments in Written Communication; Beverly Hills; Jan 2000; Vol. 17, Issue 1.

Henning, G., 1987. A Guide to Language Testing: Development, Evaluation, Research. Boston: Heinle and Heinle Publishers.

Henning, T. B., 2002. Beyond standardized testing: A case study in assessment’s transformative power in English Leadership Quarterly; Urbana; Feb 2002; Vol. 24, Iss. 3.

Heck, R. H. and Crislip, M., 2001. Direct and indirect writing assessment: Examining issues of equity and utility in Educational Evaluation and Policy Analysis. Fall 2001. Washington: American Educational Research Association.

Johnson, Penny and Gordon, 2001. Score resolution and the interrater reliability of holistic scores in rating essays in Written Communication; Beverly Hills; April 2001; Vol. 18, Iss. 2.


McNamara, T., 2000. Language Testing. Oxford: Oxford University Press.

Miller, M. D. and Linn, R. L., 2000. Validation of performance-based assessments in Applied Psychological Measurement, Dec. 2000. Thousand Oaks: Sage Publications.

Rabinowitz, S., 2001. Balancing state and local assessments in School Administrator. Arlington: American Association of School Administrators; Dec. 2001; Vol. 58, Issue 11.

Rogers, P. S. & Rymer, J., 2001. Analytical tools to facilitate transitions into new writing contexts: A communicative perspective in The Journal of Business Communication; Urbana; April 2001.

Stiggins, R. J., 2002. Assessment crisis: The absence of assessment for learning in Phi Delta Kappan; Bloomington; June 2002; Vol. 83, Issue 10.

Stobart, G., 2001. The validity of National Curriculum assessment in British Journal of Educational Studies, ISSN 0007-1005, Vol. 49, No. 1, March 2001, pp. 26-39.

Wolfe, E. W., Kao, Chi-Wen and Ranney, M., 1998. Cognitive differences in proficient and nonproficient essay scorers in Written Communication; Oct. 1998; Beverly Hills: Sage Publications, Inc.