AEU examination answers


Upload: shahnizat-sakiran

Post on 13-Dec-2015


DESCRIPTION

Some answers to past examination questions

TRANSCRIPT


Chapter 3

Five major limitations identified and discussed by Bachman (1990) are as follows:
(a) Subjectivity
(b) Under-specification of domain
(c) Incompleteness
(d) Indirectness
(e) Imprecision

1.1 SUBJECTIVITY

a) The most obvious form of subjectivity in tests is seen in grading tests that are in the supply type or subjective format, such as essays and even interviews. This issue has been addressed at some length in previous chapters. However, subjectivity does not refer only to grading but also to other elements of the test. Even when the test is an “objective”, select or multiple choice type test, there is still some amount of subjectivity. This subjectivity is found in the selection of passages and item formats as well as the content that is to be tested. In a test that contains a reading comprehension passage, for example, why was one passage selected over another? Was it because of the content? If so, then there must surely be many passages on the same content. The question then is why one passage was used and not another on the same content. The answer is that decisions that affect the test and its ability to measure precisely are made by individuals. There is some degree of subjectivity involved in these decisions.

b) A test is a measurement of some content. This content can be referred to as the domain or the construct of the test. However, while it may be quite easy to specify the domain to be listening comprehension, for example, it is not as easy to test or measure the domain. When any kind of theoretical domain or construct is operationalised, there is bound to be some aspect of the domain that cannot be translated into a test. The test therefore under-specifies the domain. It is this under-specification of domain that limits a test as a measure of ability, knowledge or sample behaviour.

c) Incompleteness refers to the student’s inability to demonstrate the entire repertoire of the construct being measured. As a test is constrained by time and physical setting, a student will never be able to show all of what he or she is able to do. Because only a few questions can be asked in a test due to time constraints, these questions may not be able to elicit the student’s true or complete ability. Similarly, the constraints placed by the physical setting of the test may also restrain the student from demonstrating specific kinds of abilities. As such, we should take note that even when a student scores zero points in a test, this does not mean that he or she is completely ignorant of the subject or ability being tested. It is just that the test has not elicited the knowledge or abilities that the student is able to convey or perform.

d) While we are aware of the importance of having direct tests, it is unlikely that a test will be completely free of being an indirect measure of ability. This limitation is inherent in the testing situation itself. Many of us have gone through test anxiety. Once the word test or assessment is mentioned, the entire situation changes. While some students will be able to speak well in situations outside the classroom, they lose this ability once they become aware that they are being tested. In addition to this, every test situation has elements that are not related to the construct being tested. This is referred to as construct irrelevant variance by Messick (1989), and examples may include the test rubrics or instructions, time constraints, and other rules and regulations of the test. None of these are present in the actual real-world situation, and they must be considered aspects of indirectness. As such, we can only conclude that the test situation is indirect because it is inauthentic. And by being indirect, it fails to capture the true ability the students would show if they were to perform in the real world.


e) Finally, we need to acknowledge that there is a degree of imprecision in all tests. While we may be able to justify some of the weightage in marks or points given to some items, we will never be able to be completely accurate and just. Even in a situation where there are twenty multiple choice items, each assigned one point, it is almost impossible to claim that each one of the twenty items is of equal difficulty. As such, we will not be able to justify equal weightage of one point for each item. It is this imprecision that must be acknowledged as another constraint of tests.

In addition to the above, Herman et al. (1992) also point out other limitations such as the mismatch between test content and curriculum and instruction; the overemphasis on routine and discrete skills to the neglect of complex thinking and problem solving skills; and the limited relevance of major test formats such as the multiple choice format to either classroom or real world learning (pp. 5-6).

Advantages of using the portfolio as an assessment

(a) enhances student and teacher involvement in assessment;
(b) provides opportunities for teachers to observe students using meaningful language;
(c) allows students to accomplish various authentic tasks in a variety of contexts and situations;
(d) permits the assessment of the multiple dimensions of language learning;
(e) provides opportunities for both students and teachers to work together and reflect on what it means to assess students’ language growth;
(f) increases the variety of information collected on students;
(g) makes teachers’ ways of assessing student work more systematic.

Alternative assessment


Chapter 6

4 types of test

(a) Achievement test. (b) Aptitude test. (c) Proficiency test. (d) Diagnostic test


Additional testing terminology

Several other terms are also important in testing. In language testing, these include the terms direct test and indirect test; authentic tests, performance tests, integrative and discrete point tests, as well as speeded and power tests.

AUTHENTIC TESTS AND PERFORMANCE TESTS

Closely related to the distinction made between direct and indirect tests are authentic tests.

An authentic test is one in which the activity that is performed closely resembles what is done outside of the testing situation. For example, if a test requires that you take down notes based on a lecture, this task can be considered authentic as it reflects what you may be required to do in real life. Tests in which the task does not reflect a real life task are inauthentic.

Performance based examinations are tests that assess your ability to perform a specific task. The performance test, therefore, is like a direct test. It is also authentic and can be considered as being similar to a simulation. It can also be likened to the kinds of badges that the scout movement used to give for being able to perform an action or demonstrate an ability or skill.

Applied to language teaching, we can perhaps imagine the different communicative scenarios possible and determine one’s ability based on a performance test according to that scenario.

SPEEDED AND POWER TESTS

It is also important to consider the terms speeded and power tests and how they differ. A speeded test is a test that is timed and emphasises the students’ ability to complete tasks quickly. A power test, on the other hand, focuses on the students’ knowledge and provides them with enough time to demonstrate this knowledge or ability.

Why do we need to discuss speeded and power tests? It relates to tests being measures of behaviour, knowledge, or ability. Because a test is such a measure, we must be aware of whether we have been fair in providing our students with the right conditions to demonstrate their knowledge. A timed or speeded test may not allow them to do so. In such a situation, a power test could be more relevant. Test takers may be provided with more than enough time to complete the test as the demonstration of their knowledge is more important than limiting the time of the test. In a different situation, however, one may be interested to know how well students perform in an actual social or natural situation. Such a situation may be constrained by time demands. As such, a speeded test would be more relevant and appropriate.

Chapter 7

7.1 THE CLOZE TEST

The cloze test is a test that is often associated with language proficiency testing. It is more than simply filling in blanks in a passage as it has a theoretical basis. The term cloze comes from the word closure and reflects a psychological human tendency to “close” any incomplete object. As such, the cloze test is thought to elicit a respondent’s language competency by requiring the respondent to complete a passage which has been “mutilated” with blanks. Although it was initially intended to be a measure of reading ability, the cloze test has often been considered a measure of overall general language proficiency.

There are many different types of cloze tests. Two of the more common are determined by how the words in the passage are deleted in order to form blanks. The fixed deletion cloze is a cloze passage where every nth word in the passage is deleted. For example, a cloze test where n = 5 means that every fifth word after the first sentence is deleted. This method is said to help assess overall language proficiency as the types of words deleted are thought to be representative of language in general, given that they have been deleted on a more or less random basis.
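The fixed deletion procedure is mechanical enough to sketch in a few lines of Python. The following is only an illustration under stated assumptions: the function name `make_fixed_deletion_cloze`, the blank marker and the naive way of finding the end of the first sentence are all hypothetical choices, not part of any published procedure.

```python
import re


def make_fixed_deletion_cloze(passage, n=5, blank="_____"):
    """Return (cloze_text, answer_key) for a fixed deletion cloze.

    The first sentence is kept intact; after it, every nth word is
    replaced by a blank and recorded in the answer key.
    """
    # Assumption: the first sentence ends at the first '.', '!' or '?'
    # followed by whitespace. Abbreviations such as "Dr." would break this.
    parts = re.split(r"(?<=[.!?])\s+", passage.strip(), maxsplit=1)
    first_sentence = parts[0]
    rest = parts[1] if len(parts) > 1 else ""

    answers = []
    out_words = []
    for position, token in enumerate(rest.split(), start=1):
        if position % n == 0:
            # Separate the word from surrounding punctuation so that the
            # punctuation stays visible around the blank.
            match = re.match(r"^(\W*)([\w'-]+)(\W*)$", token)
            if match:
                lead, word, trail = match.groups()
                answers.append(word)
                out_words.append(f"{lead}{blank}{trail}")
                continue
        out_words.append(token)

    cloze_text = " ".join([first_sentence] + out_words).strip()
    return cloze_text, answers


if __name__ == "__main__":
    text, key = make_fixed_deletion_cloze(
        "The cat sat. One two three four five six seven eight nine ten eleven.",
        n=5,
    )
    print(text)
    print(key)
```

Counting starts from the first word after the opening sentence, so with n = 5 the fifth and tenth words of the remaining text become blanks.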

If the test maker intentionally deletes a certain kind of word, then the cloze test is referred to as a rational deletion cloze test. A rational deletion cloze test could involve the deletion of only verbs, for example. The number of words between blanks in a rational deletion cloze test may not be consistently the same. However, you may also find some cloze tests in which the passage has been altered so that only certain types of words are deleted at consistent intervals. These cloze passages, even if they consist of blanks that are spaced out equally, are still rational cloze passages as the deleted words were selected by the test maker.

7.1.1 THE STRUCTURE OF THE CLOZE TEST

The cloze test consists of a passage with blanks. The first sentence is left intact without any blanks. This is to ensure that the test takers have some context to work with. It also provides other information, such as the tense of the passage. Normally the cloze passage is long enough to allow for about 20 blank spaces as a longer text would make it extremely difficult.

Factors that affect the difficulty level of the cloze procedure include the following:
(a) Length of the text: The longer the text, the more difficult the cloze passage.
(b) Familiarity of vocabulary and structures: This includes the word that is needed to fill in the blank. For example, in a sentence such as The situation was _____ with danger, it is highly unlikely that non-native speakers would be able to provide the correct word fraught to fill in the blank.
(c) Length and complexity of the sentences: The longer and more complex the sentence, the more difficult it becomes for the student to complete the cloze.
(d) Familiarity with topic and discourse genre: Familiarity with each of these would make the cloze easier.
(e) Frequency with which blanks are spaced: The closer together the blanks are, the more difficult the cloze passage becomes. Normally, the number of words between blanks, or the n in a cloze passage, is between 5 and 7 and seldom less than 5.

Grading the cloze test
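The two grading methods covered in this section, the exact word method and the acceptable word method, differ only in which answers count as correct for each blank. A minimal Python sketch of that difference is given here. The function name `score_cloze`, the sample responses and the list of acceptable alternatives are hypothetical and serve only as an illustration; in practice the acceptable answers would come from pretesting.

```python
def score_cloze(responses, exact_key, acceptable_key=None):
    """Count how many blanks are answered correctly.

    exact_key: the single original word for each blank.
    acceptable_key: optional list holding, for each blank, a set of
        additional words that are also accepted. When it is omitted,
        grading follows the exact word method.
    """
    score = 0
    for i, response in enumerate(responses):
        answer = response.strip().lower()
        allowed = {exact_key[i].lower()}
        if acceptable_key is not None:
            allowed |= {word.lower() for word in acceptable_key[i]}
        if answer in allowed:
            score += 1
    return score


if __name__ == "__main__":
    # Hypothetical data: two blanks, one student.
    exact = ["fraught", "quickly"]
    acceptable = [{"filled"}, {"fast", "rapidly"}]
    student = ["filled", "quickly"]
    print(score_cloze(student, exact))              # exact word method
    print(score_cloze(student, exact, acceptable))  # acceptable word method
```

With this sample data the same student receives a lower score under the exact word method than under the acceptable word method, which is the trade-off between rigidity and subjectivity that the two methods represent.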

Exact Word Method

In the exact word method, only one answer is accepted for each blank. This method of grading is seen to be more objective and therefore more reliable. However, the exact word method stifles creativity. Take the following text for example:

Acceptable Word Method

The second method of scoring is the acceptable word method, in which we accept any suitable answer. This method is more subjective and therefore less reliable. However, it definitely does not suffer from being unnecessarily rigid and stringent. A potential source of problems that should be noted is deciding which words would be acceptable as answers. Usually, in this method, the cloze is pretested with native or near native speakers and the responses given are used as the acceptable answers.

7.2 THE DICTATION TEST

The dictation is a common form of assessment that many of us have experienced. The dictation is seen to have some commonalities with the cloze test, especially in that both are considered to be able to predict overall language ability. The dictation is also thought to provide results that are similar to those obtained in cloze tests but with the added ability of assessing listening as well (Hughes, 1972). In a standard dictation test, the teacher begins by selecting an appropriate passage. This passage is usually a short passage no longer than one paragraph. This stage of the dictation is an important one as the paragraph that has been selected must be appropriate to the students’ language ability as well as cultural background. After having selected the passage, the teacher can proceed with the dictation.

7.2.1 THE STRUCTURE OF THE DICTATION

The dictation passage is usually read out three times. The first time it is read out, it is done so at a normal rate of reading. Students are expected to listen and get the gist of the passage. The second reading is a little slower and the students are expected to take down what is read. During the second reading, the teacher usually pauses to break the passage into meaningful chunks referred to as bursts. Finally, the passage is read a third time and students are expected to check their work, editing it for errors.


What makes dictation difficult

There are many factors that can contribute to the difficulty of taking down a dictation. Some of these factors are listed as follows:
(a) The length of the phrase or burst.
(b) The length of the pauses between bursts.
(c) The content of the dictation passage.
(d) The syntactic and structural properties of the sentences in the passage.
(e) Clarity of voice, expression and pace or tempo.

It is quite obvious that the longer the burst, the more difficult it becomes for the student as he or she will have to retain more in short term memory while taking down the burst. However, the longer the pause, the easier it becomes for the students.

How long should a pause be? It is recommended that the pause last as long as it takes us to silently read, twice and at normal speed, the burst we have just dictated.
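This rule of thumb can be turned into a rough estimate of pause length. The sketch below is only an illustration: the function name `pause_seconds` is hypothetical, and the default silent reading rate of 150 words per minute is an assumed figure that should be replaced with the dictating teacher's own pace.

```python
def pause_seconds(burst, words_per_minute=150, repetitions=2):
    """Estimate the pause after a burst, in seconds.

    The pause is taken to be the time needed to read the burst silently
    `repetitions` times at `words_per_minute`. Both defaults are
    assumptions; only the "twice" comes from the recommendation above.
    """
    word_count = len(burst.split())
    seconds_per_word = 60.0 / words_per_minute
    return repetitions * word_count * seconds_per_word


if __name__ == "__main__":
    print(pause_seconds("the woman that Ali caught"))
```

Under these assumptions a five-word burst calls for a pause of about four seconds, and the pause grows in proportion to the length of the burst.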

When it comes to content, familiar content makes the dictation easier. Jargon and highly technical words are difficult for someone who is not trained in the particular field and is therefore unfamiliar with such words. It is also not surprising that complex structures make dictation more difficult.

For example, compare the following two sentences:

(1) Ali caught the woman who stole the money. (2) The woman that Ali caught stole the money.

While the two sentences convey more or less the same meaning, the second sentence is a little more complicated as its structure is unfamiliar. Furthermore, the woman, which is the subject of the verb stole, does not appear immediately before the verb. The student is therefore required to make a “long distance” connection between the subject and the verb. Finally, it should also be mentioned that clarity of voice is important in dictations. Together with this, you may also include facial expressions as the more animated the person dictating, the more cues are provided to the students.

Variants of dictation test

In addition to the standard dictation that most of us are familiar with, there are several variants of the dictation procedure. Most notable among these are:
(a) the graded or graduated dictation;
(b) the partial dictation;
(c) the dictocomp.

Graded or Graduated Dictation

The graded dictation is simply a technique where the dictation passage becomes progressively more difficult. This is done by gradually increasing the number of words in a burst. A burst is the number of words the tester dictates between pauses and repeats. The dictation may begin with a burst consisting of two words and the number of words slowly increases until there can be up to thirteen or fourteen words in a burst. Normally the processing load becomes too high when a burst exceeds seven words. However, better and more proficient students will be able to handle seven words and perhaps even more by chunking words that collocate naturally.
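The gradual lengthening of bursts can be sketched as follows. This is a mechanical illustration only: the function name `graded_bursts` and the one-word increase per burst are assumptions, and the sketch cuts purely by word count, whereas a teacher would adjust the boundaries so that each burst remains a meaningful chunk.

```python
def graded_bursts(passage, start=2, step=1, max_len=14):
    """Split a passage into bursts of increasing length.

    The first burst has `start` words; each later burst is `step` words
    longer than the previous one, up to `max_len` words. The final burst
    may be shorter if the passage runs out.
    """
    if start < 1 or step < 0 or max_len < start:
        raise ValueError("need start >= 1, step >= 0 and max_len >= start")

    words = passage.split()
    bursts = []
    index = 0
    length = start
    while index < len(words):
        bursts.append(" ".join(words[index:index + length]))
        index += length
        length = min(length + step, max_len)
    return bursts


if __name__ == "__main__":
    for burst in graded_bursts("a b c d e f g h i"):
        print(burst)
```

For a nine-word passage the sketch produces bursts of two, three and four words. Lowering `max_len` to seven would keep every burst within the processing load mentioned above.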


Partial Dictation

The partial dictation is essentially like a listening cloze activity. Students are provided with the passage with some words or phrases deleted. They are expected to listen to the passage and fill in the missing words or phrases. It is commonplace to have partial dictations in which single words or even short phrases are deleted.

Dictocomp

Finally, in the dictocomp, the students are expected to use the information they hear to construct a coherent piece of composition instead of taking down the passage exactly as it was dictated. The teacher will determine the key elements of the original passage which the student is expected to include in the composition. Therefore, the dictocomp can be said to test listening comprehension in a very specific way in that the student has to decide what pieces of information are important and should be included. This is reminiscent of summaries. Additionally, the dictocomp also tests writing ability because the students are expected to write a cohesive piece based on the passage that was dictated to them.

Chapter 9

DISCRETE POINT TESTS AND INTEGRATIVE TESTS

Language tests may also be categorised as either discrete point or integrative. Discrete point tests examine one element at a time. An integrative test, on the other hand, “requires the candidate to combine many language elements in the completion of a task” (Hughes, 1989: 16). It is a simultaneous measure of knowledge and ability of a variety of language features, modes, or skills.

A multiple choice type test is usually cited as an example of a discrete point test while essays are commonly regarded as the epitome of integrative tests. However, both the discrete point test and the integrative test are a matter of degree. A test may be more discrete point than another and similarly a test may be more integrative than another. Perhaps the more important aspect is to be aware of the discrete point or integrative nature of a test as we must be careful of what we believe the test measures.

This brings us to the question of how discrete point a multiple choice question type item is. While it is definitely more discrete point than an essay, it may still require more than just one skill or ability in order to complete. Let’s say you are interested in testing a student’s knowledge of the relative pronoun and decide to do so by using a multiple choice test item. If the student fails to answer this test item correctly, would you conclude that the student has problems with the relative pronoun? The answer may not be as straightforward as it seems. The test is presented in textual form and therefore requires the student to read. As such, even the multiple choice test item involves some integration of language skills, as this example shows: in addition to the grammatical knowledge of relative pronouns, the student must also be able to read and understand the question.

Perhaps a clearer way of viewing the distinction between the discrete point and the integrative test is to examine the perspective each takes toward language. In the discrete point test, language is seen to be made up of smaller units and it may be possible to test language by testing each unit one at a time. Testing knowledge of the relative pronoun, for example, is certainly assessing the students on a particular unit of language and not on the language as a whole. In an integrative test, on the other hand, the perspective of language is that of an integrated whole which cannot be broken up into smaller units or elements. Hence, the testing of language should maintain the integrity or wholeness of the language.

Multiple choice


The multiple choice format is perhaps the most familiar test format to many of us. It is also commonly referred to as an objective test as there is seen to be “objectivity” in grading the test. In this section, we will examine the multiple choice format with respect to its structure, use, and construction.

There are a number of situations in which a multiple choice format test may be useful and appropriate. Ory outlines some of these situations as follows:

When there is a large number of students taking the test.
When you wish to reuse the questions in the test.
When you have to provide the grades quickly.
When highly reliable test scores must be obtained as efficiently as possible.
When impartiality of evaluation, fairness and freedom from possible test scoring influences such as fatigue are essential.
When you are more confident of your ability to construct valid objective test items clearly than of your ability to judge essay test answers fairly.
When you want to sample a wide range of content.
When you are especially interested in measuring particular learning objectives such as comprehension, recognition, and recall.
When you want specific information especially for diagnostic feedback.

It should be noted that these situations reflect the advantages of using the multiple choice question format. These advantages include:

the ability to create a test item bank;
quick grading;
high reliability;
objective grading;
wide coverage of content;
precision in providing information regarding specific skills and abilities.

Negative effects of MCQ

The technique tests only recognition knowledge and recall of facts.
Guessing may have a considerable but unknowable effect on test scores.
The technique severely restricts what can be tested to only lower order skills.
It is very difficult to write successful items, due especially to the difficulty of finding good distractors.
Backwash can be harmful for both teaching and learning, as preparing for a multiple choice format test is not reflective of good language teaching and learning practice.
Cheating may be facilitated.
It places a high degree of dependence on the student’s reading ability and instructor’s writing ability.
It is time consuming to construct.

Chapter 8

Essay

Unlike the directed writing task, the continuous writing test item provides little structure other than the question itself. Students are expected to draw upon their experience and past knowledge as well as knowledge of writing conventions and organisation in order to complete the task. The essay test format provides several advantages compared to the multiple choice test format. Some of these advantages, as mentioned by Kubiszyn and Borich (2000: 18), are:
(a) It can assess higher order skills. Unlike the multiple choice test format, which is often limited to assessing lower order skills, the essay places a premium on the ability to analyse, synthesise and evaluate through topics that require students to express their opinions or argue a point.
(b) It emphasises communication skills. This is especially important when we consider that communication skills are an important aspect of social relations.
(c) It eliminates guessing. The multiple choice question format is notorious for allowing students to guess. In the essay format, however, guessing is unlikely to occur.
(d) It is relatively easy to construct. An essay question can be constructed within minutes compared to other test formats, which can even take days to construct.

Scoring essays

As we have seen earlier, scoring an essay is not easy as graders can be easily swayed by many factors. Scoring remains one of the major issues in grading essays. There are generally three major approaches to scoring essays: the holistic scoring method, the analytical scoring method, and the objective scoring method.

Holistic Scoring

In holistic scoring, the reader reacts to the student’s composition as a whole and a single score is awarded to the writing. Normally this score is on a scale of 1 to 4, or 1 to 6, or even 1 to 10 (Bailey, 1998: 187). Each score on the scale will be accompanied by general descriptors of ability. The following is an example of a holistic scoring scheme based on a 6 point scale.

The 6 point scale above includes broad descriptors of what a student’s essay reflects for each band. It is quite apparent that graders using this scale are expected to pay attention to vocabulary, meaning, organisation, topic development and communication. Mechanics such as punctuation are secondary to communication.

Bailey also describes another type of scoring related to the holistic approach which she refers to as primary trait scoring. In primary trait scoring, a particular functional focus is selected which is based on the purpose of the writing, and grading is based on how well the student is able to express that function. For example, if the function is to persuade, scoring would be on how well the author has been able to persuade the grader rather than how well organised the ideas were, or how grammatical the structures in the essay were. This approach to grading emphasises functional and communicative ability rather than discrete linguistic ability and accuracy.

Analytical Scoring

Analytical scoring is a familiar approach to many teachers. In analytical scoring, raters assess students’ performance on a variety of categories which are hypothesised to make up the skill of writing. Content, for example, is often seen as an important aspect of writing, i.e. is there substance to what is written? Is the essay meaningful? Similarly, we may also want to consider the organisation of the essay. Does the writer begin the essay with an appropriate topic sentence? Are there good transitions between paragraphs? Other categories that we may also want to consider include vocabulary, language use and mechanics. The following are some possible components used in assessing writing ability using an analytical scoring approach and the suggested weightage assigned to each:

The points assigned to each component reflect the importance of each of the components.

Objective Scoring

A third type of scoring approach is the objective scoring approach. This scoring approach relies on quantified methods of evaluating students’ writing. A sample of how objective scoring is conducted is given by Bailey (1999) as follows:

Establish standardization by limiting the length of the assessment: Count the first 250 words of the essay.

Identify the elements to be assessed: Go through the essay up to the 250th word, underlining every mistake, from spelling and mechanics through verb tenses, morphology, vocabulary, etc. Include every error that a literate reader might note.

Operationalise the assessment: Assign a weight score to each error, from 3 to 1. A score of 3 is a severe distortion of readability or flow of ideas; 2 is a moderate distortion; and 1 is a minor error that does not affect readability in any significant way.

Quantify the assessment: Calculate the essay Correctness Score by using 250 words as the numerator of a fraction, and the sum of all the error scores as the denominator:

Correctness Score = 250 / (sum of all error scores)

The steps described above help to provide a clear and systematic method for assessing essays. Objective scoring does not necessarily need to use the same values as in this example. The most important element in this approach is that scoring is objective, determined through unbiased and fixed values assigned according to some concrete aspect of the essay such as the number of mistakes made.
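The arithmetic of the Correctness Score can be sketched in Python as follows. The function name `correctness_score` is hypothetical, and so is the choice to return None for a sample with no errors, since the steps above do not say what to report in that case.

```python
def correctness_score(error_weights, word_limit=250):
    """Return word_limit divided by the sum of the error weights.

    error_weights: one weight per error found within the first
        `word_limit` words of the essay, where 3 is a severe distortion,
        2 a moderate distortion and 1 a minor error.
    """
    for weight in error_weights:
        if weight not in (1, 2, 3):
            raise ValueError(f"error weight must be 1, 2 or 3, got {weight!r}")

    total = sum(error_weights)
    if total == 0:
        # Assumption of this sketch: an error-free sample has no defined
        # score, because the fraction would have a zero denominator.
        return None
    return word_limit / total


if __name__ == "__main__":
    # Hypothetical essay: two severe, one moderate and two minor errors.
    print(correctness_score([3, 2, 1, 1, 3]))
```

Because the error total is the denominator, a higher Correctness Score indicates fewer or less serious errors. In the hypothetical case shown, the error weights sum to 10 and the score is 250 / 10 = 25.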


Familiarisation with a grading scale is an important step in achieving valid and accurate scores. If bands are used, there is an obvious need to fully understand what each band signifies. This stage of the grading process should therefore be given due consideration and not ignored. There are enough incidents of graders “jumping the gun” and assessing essays without first becoming familiar with the scoring criteria. This may only result in having to grade the paper again.

The purpose of identifying benchmark papers or anchor papers is to provide a clear and representative example of students’ work according to the grading criteria. Bands can only give a general description of what is expected. Anchor or benchmark papers provide concrete examples and help ensure fairness in grading.

When it comes to the actual grading, some recommend that we first quickly scan through all the essays and place them in stacks according to the bands on the scale. All papers which we consider A papers will be stacked together, the B papers will be together and so on. We can then read each paper more closely in order to confirm our initial impression. If we need to assign more precise numerical scores, we can do so at this time. Another pointer in grading essays, especially when there are several essays, is to grade all the students on one essay first before moving on to the next essay. This is expected to help ensure more consistent grading.