
Assessing the language proficiency of teachers: are there any border controls?

Catherine Elder, University of Auckland

This article takes up some of the issues identified by Douglas (2000) as problematic for Language for Specific Purposes (LSP) testing, making reference to a number of performance-based instruments designed to assess the language proficiency of teachers or intending teachers. The instruments referred to include proficiency tests for teachers of Italian as a foreign language in Australia (Elder, 1994) and for trainee teachers using a foreign language (in this case English) as a medium for teaching school subjects such as mathematics and science in Australian secondary schools (Elder, 1993b; Viete, 1998).

The first problem addressed in the article has to do with specificity: how does one define the domain of teacher proficiency, and is it distinguishable from other areas of professional competence or, indeed, from what is often referred to as 'general' language proficiency? The second problem has to do with the vexed issue of authenticity: what constitutes appropriate task design on a teacher-specific instrument, and to what extent can 'teacher-like' language be elicited from candidates in the very artificial environment of a test? The third issue pertains to the role of nonlanguage factors (such as strategic competence or teaching skills) which may affect a candidate's response to any appropriately contextualized test-task, and whether these factors can or should be assessed independently of the purely linguistic qualities of the test performance.

All of these problems are about blurred boundaries: between and within real world domains of language use, between the test and the nontest situation, and between the components of ability or knowledge measured by the test. It is argued that these blurred boundaries are an indication of the indeterminacy of LSP, as currently conceptualized, as an approach to test development.

    I Background

    Douglas (2000: 46) defines a specific purpose language test as:

[A test] in which the test content and methods are derived from an analysis of the characteristics of the specific target language use situation, so that test tasks and content are authentically representative of the target situation, allowing for an interaction between the test-taker's language ability and specific purpose content knowledge, on the one hand, and the test task, on the other.

Address for correspondence: Department of Applied Language Studies and Linguistics, University of Auckland, Auckland, New Zealand; e-mail: c.elder@auckland.ac.nz

Language Testing 2001 18(2), 149–170  0265-5322(01)LT199OA  © 2001 Arnold


In theory at least, such tests allow us to make more accurate inferences about a test-taker's future performance in the relevant target language use (TLU) domain than is the case for a test of general language proficiency, where the criterion domain of reference is an abstraction rather than a reality.

Douglas (in press), however, acknowledges a number of problems which dog the enterprise of testing Language for Specific Purposes (LSP). The first has to do with specificity: how one defines the domain of reference and distinguishes it from other real world domains. The second has to do with authenticity: how one ensures that a test reflects the demands of the real world situation and elicits an appropriate sample of language from the test-taker. The third pertains to the role of nonlanguage factors (which inevitably influence performance on any appropriately contextualized test-task) and whether these factors can or should be assessed separately from the purely linguistic aspects of test-taker behaviour. The three problems identified by Douglas all have to do with boundaries; they are about the boundaries:

• between and within real world domains of language;
• between the test and the nontest situation; and
• between the components of ability or knowledge measured by the test.

This article explores these three problem areas of LSP testing and the challenges they pose for assessing the language proficiency of teachers. The problems will be illustrated with reference to three specific purpose test instruments:

1) A test of English for the selection of overseas-qualified immigrants applying for entry to teacher education programs in the Australian state of Victoria. This test is known as the Diploma of Education Oral Interview Test of English (Viete, 1998) (hereafter referred to as DOITE).

2) A classroom language assessment schedule designed to identify the English language problems faced by nonnative speaker teachers in training during their school-based teaching practice rounds (Elder, 1993b) (hereafter CLAsS).

3) A language proficiency test for teachers of Italian and Japanese as a foreign language in Australia (Elder, 1994; Elder et al., 1995) (hereafter LPTT).


    1 DOITE and CLAsS tests

Both DOITE and CLAsS are intended for overseas-trained subject teachers in specialist areas, such as mathematics or science, without a locally recognized qualification in teaching methodology (Viete, 1998). The DOITE is used for selection purposes, to determine whether the English language skills of overseas applicants for secondary teacher education are sufficient to allow them to cope with the demands of their training course. The CLAsS follows on from the DOITE and is designed for use on the school-based teaching-practice round, during which teacher trainees are required to teach their subject specialism to secondary students under the supervision of a registered teacher. The purpose of CLAsS is to:

1) assist supervisors of successful applicants, who are generally subject specialists rather than language experts, in determining whether any problems they experienced in the teaching situation are related to language; and

2) provide information about aspects of their performance which could be improved through supplementary English language support.

    2 LPTT 

The LPTT tests, on the other hand, are for language teachers. They were designed for the purposes of professional accreditation, to ensure a minimum standard of language ability amongst graduates with a foreign language major (or equivalent qualification) in Italian or Japanese seeking registration as foreign language teachers. The decision to be made about test-takers is whether their skills in the target language are sufficient to allow them to teach this language to prospective students and to use it as the medium of classroom instruction where appropriate.

    3 Type of test 

For all the above tests a performance-based LSP model of testing was chosen as the best option. In the case of CLAsS, which is an observation schedule rather than a test instrument proper, the performance is a particular teaching event which takes place in the classroom under the supervision of a qualified teacher. In all other cases the tests attempt to elicit key features of teacher language proficiency via tasks which simulate various aspects of the target teaching situation. A specific purpose assessment model was chosen for the following reasons:


different subject areas (such as mathematics and chemistry) and their different manifestations in textbooks and lectures. However, as Stevenson (1985: 43) pointed out, we still have insufficient information about the categorical or elemental interrelationships and weights within and among these particular genres to be able to model them accurately and sample from them in a systematic fashion. Moreover, when different genres are combined in different configurations within a larger domain like teaching or classroom communication, this problem is compounded.

Faced with problems such as these in LSP testing we have no other option than to settle for an expedient solution to domain definition (rather than a theoretically defensible one) and to compromise the absolute validity of an individualized test which mirrors precisely a particular target language event but has no generalizability beyond that event, in favour of some degree of abstraction (while at the same time retaining, albeit in simplified or idealized form, something of the communicative 'flavour' of the relevant TLU domain). This trade-off between the real and the ideal is part of the design challenge which all LSP tests must rationally address. Table 1 shows us where this compromise has led us as far as the skills/language functions targeted in the listening/speaking components of the tests nominated above are concerned.


Table 1   Language functions assessed in the three tests of teacher proficiency (DOITE, CLAsS, LPTT)

• Present information and explain subject-specific metalinguistic concepts
• Extract meaning from multi-way discussion (with two or more speakers)
• Discuss a problem / express an opinion
• Summarize, paraphrase, simplify or disambiguate information
• Formulate questions
• Issue directives; set up a classroom activity


These common elements suggest that, in spite of what seemed an impossibly broad TLU domain, and in spite of differences in what is taught, where and how, there is a high level of consensus about what language functions are critical for teachers across languages and subject areas. Moreover, a similar gamut of skills features in the specifications of teacher tests developed in other countries and based on independent needs analyses (e.g., Hinofotis et al., 1981; Plakans and Abraham, 1990; Grant, 1997).

But do these tests, for all their 'teacherliness', really elicit a sample of language that is different in essence from what would be elicited by a general proficiency test? Within-test comparisons of the performance of specialist vs. nonspecialist test-takers can be revealing. Elder et al. (1995), for example, when trialling the LPTT Japanese (referred to above), found that expert teachers of Japanese as a foreign language (JFL) (who should have been equipped with precisely the skills the test was designed to measure) performed more poorly overall than did recent graduates from generalist Japanese language courses. While this outcome was doubtless due to the superior language skills of the recent graduates, it nevertheless casts some doubt on the validity of the LPTT for its intended specific purpose.

Between-test comparisons (between general and specific measures) tend also to produce equivocal findings. Smith (1992), comparing the SPEAK test designed to test the classroom competence of International Teaching Assistants in the USA with a subject-specific version of it known as MATHSPEAK, found that those with specialist training in mathematics did not necessarily perform better on the version which had been tailored to their needs. Douglas and Selinker (1993) attribute Smith's findings to the lack of contextualization cues required to promote differential 'domain engagement' (p. 235). Contextualization cues mentioned by Douglas and Selinker were changes in tempo, pitch, stress, intonation, volume, overt rhetorical markers, gesture and eye contact, 'which signal shifts in the dynamics of the communicative event, promoting the planning, execution and assessment of communicative moves and consequently the marshalling of interlanguage resources brought to bear' (p. 236). The tasks on this test are, in other words, not specific enough to elicit the language behaviour characteristic of the particular criterion domain. But how specific should we be? In domains like teacher proficiency, where the range of 'allowable contributions' (Swales, 1990: 52) is probably infinite, we have no principled basis for deciding which of the many features of the target context we must sample to be sure that 'test tasks and content are authentically representative of the target situation' (Douglas, 2000: 46).

    III Simulating the teacher role: the problem of authenticity

Bachman (1991) has made a useful distinction between two different aspects of authenticity:

1) situational authenticity, which refers to the level of correspondence between the test and the TLU situation; and


it may be necessary to apply the schedule repeatedly in order to elicit an adequate sample of the trainee's language in a variety of teaching modes. The test's interactional authenticity also depends crucially on the relevance of the criteria used for assessment (i.e., they must be clearly related to the construct of ability which the test is designed to elicit) and on the capacity of the subject specialist teacher charged with making judgements about trainees' language performance to apply these criteria appropriately. (The issue of assessment criteria and the way in which these are applied is revisited below.)

    2 LPTT 

The LPTT (Italian), while situationally less authentic than the CLAsS because it is not administered in a school classroom, is nevertheless a 'live' communicative performance. As can be seen from the brief description of the test offered in Appendix 2, the candidate is required to assume the role of a primary school language teacher from Phase 2 of the interview until the end. During the course of the test he or she is required to perform various classroom-like tasks, such as reading a story aloud as if to a group of young school-age second language learners, and issuing a set of classroom-like instructions explaining how to carry out a particular learning activity. The quality of candidates' performance on these tasks is assessed using both linguistic and task fulfilment criteria. The task fulfilment criteria draw the assessors' attention to features of communicative behaviour, such as style of delivery, which are deemed to be of particular relevance to classroom performance.

There are, nevertheless, clear limitations to the test's situational authenticity, the most obvious being that many of the tasks in the oral interview are delivered as a monologue by the candidates. The decision to limit the interactiveness of test-tasks was due to the fact that the candidates could not be expected to communicate with the native or near-native speaker examiners in the same way as they would with a classroom of second language learners with limited control of the target language (Elder, 1993c). It therefore seemed more appropriate for candidates to address an imagined classroom audience and to try to forget the interviewer's presence. The monologic tasks have the added advantage of allowing the candidates, rather than the interviewers, to take control of the talk (as teachers tend to do) and to produce extended stretches of discourse.

Interestingly, the feedback gathered from test trials indicated that it was the monologic tasks, rather than the interactive role plays, which were preferred by both candidates and assessors as measures of teacher proficiency (Elder, 1994). This suggests that it is easier to


sustain the illusion of the candidate as teacher when there is minimal input from the interviewer. Whether the sample of language elicited could be regarded as interactionally authentic is, of course, another matter.

    3 DOITE 

Because the DOITE test is used to select amongst applicants for teacher education, it is designed to assess candidates' ability to cope with the linguistic demands of the Diploma of Education course as well as their ability to cope with their future role as teachers in English-medium classrooms. Test-tasks are therefore designed to fit these two bills simultaneously. However, this dual function of the test places limits on the test's situational authenticity. Presentations on the concepts and processes of mathematics might, for example, be required in both the teacher education tutorial and in the secondary school classroom, but the audience for the presentation would in each case be very different, as would the power relationship between presenter and listener. The following task description (see Figure 1) is fraught with ambiguity.

Figure 1   Sample task description from DOITE (Viete, 1998: 184)

Although it is fairly clear that this task has been set up to test the candidate's ability to talk about his or her subject specialism in a classroom-like or tutorial-like context, the status and function of the interviewer input is unclear. One interviewer (possibly in the capacity of supervisor) provides an answer to the candidate's question


about a problem and the other (possibly in the role of student) asks the candidate for an explanation or reformulation of his or her input at one or more points in the presentation.

However, even if the role relationships were made more explicit, and the relationships between the task characteristics and the criterion domain of reference were more clearly delineated, there is no guarantee that the resultant performance by the candidate would be interactionally authentic. Interactional authenticity would be dependent on both candidate and interviewer conforming to the requirements of their respective roles (for example, the nonexpert interviewer's request for a reformulation would need to be made in a 'natural' and nonintimidatory manner). Lumley and Brown (1996), in their analysis of nurse–patient role play in a test of medical English, have pointed to the possibility that interruptions from the interlocutor (even when these are perfectly in keeping with the patient role that they are assuming) may inhibit the candidate to a point where he or she is unable or unwilling to continue the interaction and may therefore fail to produce an assessable sample of speech.

In sum, while the three tests cover similar ground in terms of the skills they purport to measure, the framing of test-tasks and the environment in which they are administered varies considerably from test to test. And while some tests appear more situationally authentic than others, it is not clear how important this is for the interactional authenticity of the test. It goes without saying that any simulation of the teacher role in the test situation requires considerable suspension of disbelief on the part of all the participants, i.e., the candidate, the interlocutor and the rater. Attempts to give authority to the candidate in the role of teacher or knower, or to wrest it away from the interviewer, may be counterproductive, since it will be quite clear to all concerned that the 'teacher' has no real authority to draw on nor a classroom reality to refer to (except of course in the case of the CLAsS, where the performance is, in fact, a teaching event). Likewise, an interviewer who assumes the role of student or naive listener, judge and facilitator of communication will simultaneously be unconvincing, at least for some test-takers, and the more sceptical among them may end up being penalized for their inability or unwillingness to conform to the script.

The general point to be made is that, as Bachman and Palmer (1996: 174–75) and Douglas (2000: 128) also point out, the requirement of test usefulness and practicality often necessitates modification of task characteristics and conditions at the expense of situational authenticity. The effect of such modifications on the quality of language performance and on the interpretations we place on the meaning


of these performances in relation to the real world domain of reference is far from being fully understood. Further research is needed: (1) to map the relationship between the discourse produced in real world domains and the language behaviour elicited on test-tasks expressly designed to mirror these domains; and, where possible, (2) to determine which features of the testing context are producing observable disparities between test and real-world discourse. Ideally, this kind of research should be part of the normal iterative process of test validation, such that the design blueprints for LSP tests are seen as dynamic rather than static and features of the testing situation can be modified on an ongoing basis to bring them more closely into line with the characteristics of the relevant TLU domain.

IV Language ability or classroom competence? The problem of inseparability

To the extent that we succeed in producing tests which are sufficiently context sensitive to elicit a valid sample of 'domain specific' behaviour from the candidate, we face a third boundary problem: that of inseparability. Should we, and indeed can we, assess the contextual features engaged in language performance independently of the language sample itself?

The influence of factors other than language in performance assessment has long been acknowledged by language testers (e.g. Jones, 1979; Wesche, 1992; McNamara, 1996; Jacoby and McNamara, 1999), and while some writers take the view that nonlinguistic factors such as sensitivity to audience and personal style are part and parcel of communicative competence, others see them as being beyond the scope of language testing or a source of what Messick (1993) describes as construct-irrelevant variance. In discussing this issue McNamara (1996) makes a distinction between:

• strong performance tests, in which test-tasks are the target of the assessment, with language being treated as a necessary but insufficient condition of their successful execution; and
• weak performance tests, in which language proficiency is assessed independently of other factors involved in the performance, and tasks serve merely as vehicles for eliciting a relevant language sample.

He advises against the strong approach, which focuses on the single test performance, because this limits the generalizability of test-scores. The weak approach, because it is more concerned with underlying language skills engaged in performance than with the qualities of the performance in its own right, allows us to make


inferences about a wider range of future performances in the real world.

In practice, however, it is doubtful whether McNamara's weak/strong distinction can be maintained in a performance-based LSP test, since what ends up being assessed may depend on the way candidates and their interlocutors manage the particular requirements of the testing situation and/or upon the particular orientation of raters involved in the assessment process. On the LPTT tests, both linguistic and task fulfilment criteria are used in making judgements about performance. While the linguistic criteria focus on the components of language proficiency as traditionally conceived, such as accuracy, fluency and pronunciation, the task fulfilment criteria require raters to address such questions as:

• Was the style and tone of delivery appropriate for the classroom?
• Did the candidate tailor his/her language in such a way as to make it intelligible to second language learners?
• Were the classroom instructions issued in a clear and convincing manner?

(Elder, 1995)

Similarly, on the CLAsS (see Appendix 1) there are items on the schedule which pertain to aspects of 'teacherly' behaviour such as:

• clearly marks transitions from one idea/lesson stage to the next using words such as so, now, right, we're going to;
• deals effectively with wrong answers or nonresponse, e.g. by rephrasing questions/reviewing steps in a process.

Raters are required to consider these as well as purely linguistic features of performance in making overall judgements about communicative effectiveness. While all these features are undoubtedly language-based, they may draw on other aspects of 'teacherly' behaviour.

Combining these different dimensions of communicative competence can, however, be problematic. The analysis of rating patterns on the LPTT (Italian) showed that assessments made against the linguistic criteria were sometimes at odds with the ratings assigned for task fulfilment. A Rasch analysis of test-scores yielded misfitting ability estimates for 10 of the 75 candidates who sat for the speaking test, suggesting that the test, as it stands, may be unworkable as a measurement procedure. Scrutiny of individual score profiles showed that the 'misfitting' candidates were those who achieved either a high score on the linguistic criteria and a low score for task fulfilment or vice versa (Elder, 1995).
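The person-fit diagnostic referred to here can be illustrated in outline. The following is a minimal sketch only, assuming a simplified dichotomous Rasch model with person abilities, item difficulties and item scores already estimated (all values hypothetical; the LPTT analysis used rated performance data, not the pass/fail scores shown here). A candidate whose infit mean-square falls outside the conventional 0.7–1.3 band would be flagged as 'misfitting' the model's expectations.

import numpy as np

def person_infit(scores, abilities, difficulties):
    # Infit mean-square per candidate for a dichotomous Rasch model.
    # scores: (n_persons, n_items) matrix of 0/1 item scores
    # abilities: (n_persons,) logit ability estimates
    # difficulties: (n_items,) logit item difficulty estimates
    logits = abilities[:, None] - difficulties[None, :]
    p = 1.0 / (1.0 + np.exp(-logits))      # model-expected success probability
    variance = p * (1.0 - p)               # response variance under the model
    sq_residuals = (scores - p) ** 2
    # Information-weighted mean-square residual; expected value is about 1.0
    return sq_residuals.sum(axis=1) / variance.sum(axis=1)

# Hypothetical data: 5 candidates scored pass/fail on 6 criteria
rng = np.random.default_rng(0)
scores = rng.integers(0, 2, size=(5, 6)).astype(float)
abilities = rng.normal(0.0, 1.0, size=5)
difficulties = rng.normal(0.0, 1.0, size=6)

for i, ms in enumerate(person_infit(scores, abilities, difficulties)):
    flag = "misfit" if ms < 0.7 or ms > 1.3 else "ok"
    print(f"candidate {i}: infit mean-square = {ms:.2f} ({flag})")

A candidate who scores high on one cluster of criteria (say, task fulfilment) and low on another (linguistic accuracy) produces exactly the kind of unexpected response pattern that inflates this statistic.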

In an attempt to identify the possible source of these 'disorderly' measurements, transcriptions of test discourse (recorded on videotape) were undertaken for each of the misfitting candidates, i.e., those who


performed at consistently high levels on task fulfilment criteria and low for linguistic competence, on the one hand, and those who scored high on the linguistic criteria and low for task fulfilment, on the other.

A short segment of performance from a representative of each of these two groups of candidates is set out below. The segments are taken from performance on an instruction-giving task in which candidates are given a set of picture prompts and are asked to explain, as they might to a group of young second language learners, how to perform a simple classroom construction activity: in this case, how to make a paper model of a sheep.

Figure 2   Sample task from LPTT (Italian) (Elder, 1995)

In describing one step of the construction activity (making a paper animal), Candidate A (who scored 'high' on task fulfilment and 'low' on linguistic competence) shows what has to be done by demonstrating with her hands the action of curling a strip of paper with a pair of scissors and saying:

dovete fare . . . così e . . . ecco . . . avete il piede della pecora
'you have to do . . . like this and . . . here . . . you have the sheep's foot'



Her speech is delivered at a slower pace and she repeats certain words.

Candidate B (scoring 'high' on linguistic competence and 'low' on task fulfilment), on the other hand, describes the action with words rather than gestures, using more sophisticated syntax and more precise lexis:

per formare le zampe del pecorello si prendono le striscie di carta e con le forbici le si . . . arrotolano
'to make the sheep's hooves you take the strips of paper and with the scissors you . . . roll them up'

Note also that Candidate B (who is, in fact, a native speaker of Italian) chooses to use the more formal, impersonal si form of the imperative, whereas Candidate A opts for the simpler and more inclusive second person plural form.

Even from this small segment of the transcript it is easy to see why Candidate B was awarded a high rating for linguistic competence but a relatively low mark for task fulfilment, and why the opposite was true for the first candidate. Although Candidate A's use of così ('like this') accompanied by a gesture may be a direct result of her lack of linguistic competence (i.e., she may not know the appropriate vocabulary to describe this step in the activity), she has resorted to a simplification strategy which is arguably more appropriate for young second language learners with limited linguistic proficiency than is the more complex and linguistically precise utterance produced by Candidate B.

The above example (and there are others) draws attention to what may be a fundamental incompatibility between the traditional notion of general proficiency, which assumes a developmental continuum involving an increase in range and complexity of language use across all contexts of language use, and the nature of performance in specific situations, where features of strategic competence such as simplicity, clarity and sensitivity to audience may be valued over and above elaborateness. On a test such as this one it seems that on certain tasks we have a clash of 'frames', and that native speakers and other linguistically proficient learners, understandably anxious to 'show off' their level of linguistic sophistication, are sometimes outperformed on certain dimensions of assessment by less proficient speakers who nevertheless respond (whether consciously or unconsciously) more appropriately to the specific demands of the criterion domain. Widdowson (in press) makes a similar point:

There is no way either of knowing how representative the learner's performance is of a more general ability to communicate. The learner might be successful by the ingenious use of avoidance strategies, and these may not be distinguishable from an adherence to the least effort principle that characterises


normal pragmatic uses of language. In this case, in effect, all you get is evidence of the so-called 'strategic competence' without knowing whether it is compensatory or not, or if it is, what it is compensating for.

On the CLAsS this incompatibility between linguistic and nonlinguistic components of the test construct showed up in a lack of agreement between two groups of raters charged with the assessment of candidate ability: the subject specialists, naive raters in so far as they had had no language training, and the ESL teachers, who were accustomed to dealing with language difficulties but were not necessarily familiar with the content of the mathematics or science curriculum. For example, there was an unacceptably low correlation between the ratings assigned by the two groups in their evaluation of Item 2 on the observation schedule (Using subject-specific language; see Appendix 1). Feedback from the raters suggests that the reason for this disagreement is that the ESL teachers were focusing on the lexis, grammar and internal cohesion of the candidate's classroom presentation and the pronunciation of specialist terminology, while the subject specialists were more concerned about the way in which subject content was conceptualized. These different orientations also affected each rater group's final estimates of candidates' overall communicative effectiveness, with disagreement as to whether the performance was satisfactory or unsatisfactory occurring in 20% of cases (Elder, 1993b).
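The rater-group comparison described above is straightforward to reproduce in outline. The sketch below uses invented ratings (the article reports only the low correlation and the 20% disagreement figure, not the underlying data): it computes the Pearson correlation between the two groups' ratings of an item, and the proportion of candidates on whom their overall satisfactory/unsatisfactory verdicts diverge.

import numpy as np

# Hypothetical ratings of Item 2 (Using subject-specific language) by the two
# rater groups, on a 1-4 scale (1 = unsatisfactory ... 4 = highly satisfactory)
esl_raters = np.array([4, 3, 2, 4, 1, 3, 2, 2, 4, 3])
subject_specialists = np.array([3, 2, 4, 4, 2, 1, 3, 2, 3, 4])

# Pearson correlation between the two groups' ratings of the same candidates
r = np.corrcoef(esl_raters, subject_specialists)[0, 1]

# Rate of disagreement on the overall verdict, treating ratings of 3 or
# above as 'satisfactory' (an assumed, illustrative threshold)
esl_verdict = esl_raters >= 3
specialist_verdict = subject_specialists >= 3
disagreement = np.mean(esl_verdict != specialist_verdict)

print(f"correlation between rater groups: r = {r:.2f}")
print(f"satisfactory/unsatisfactory disagreement: {disagreement:.0%} of cases")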

The construct of teacher proficiency, as operationalized in these performance-based measures, is clearly multidimensional, and this poses problems for the interpretation and reporting of performance. One solution to this problem would be to separate the purely linguistic and the more classroom-specific aspects of performance in our reporting. In the case of LPTT we could set cut-offs on the basis of linguistic scores alone, but use information about classroom competence, as reflected in ratings assigned for task fulfilment, to assist in decisions about borderline cases, where evidence of context sensitivity may serve to compensate for linguistic shortcomings. Moreover, in the CLAsS we should perhaps entrust the task of rating to an ESL teacher rather than a subject specialist. However, in theoretical terms this amounts to a weakening of the tests' claim to specificity. If information about general proficiency is enough, and if the opinion of subject specialists doesn't count, then there seems to be little point, other than to satisfy the need for face validity, in trying to capture the context-specific nature of language performance.
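The two-stage reporting solution floated above, with cut-offs set on linguistic scores alone and task fulfilment consulted only for borderline cases, can be expressed as a simple decision rule. The following sketch uses entirely hypothetical score bands and thresholds; the article proposes the principle, not these numbers.

def lptt_decision(linguistic, task_fulfilment,
                  pass_cut=4.0, borderline_cut=3.5, compensate_cut=4.5):
    # Two-stage rule: linguistic score decides first; task fulfilment is
    # consulted only in the borderline band. All cut-offs are hypothetical.
    if linguistic >= pass_cut:
        return "pass"
    if linguistic >= borderline_cut and task_fulfilment >= compensate_cut:
        # context sensitivity compensates for linguistic shortcomings
        return "pass (borderline, compensated by task fulfilment)"
    return "fail"

print(lptt_decision(4.2, 3.0))  # clear pass on linguistic criteria alone
print(lptt_decision(3.7, 4.8))  # borderline, rescued by task fulfilment
print(lptt_decision(3.7, 3.0))  # borderline, not compensated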


    V Conclusion

This article has illustrated the problems associated with characterizing teacher discourse as a specific purpose domain (the problem of specificity), in attempting to simulate classroom-like behaviours in test situations (the problem of authenticity) and/or to assess classroom-related aspects of performance alongside other purely linguistic features (the problem of inseparability). Douglas (in press) sees these thorny issues as the basis for theory-building in LSP, suggesting that further research is needed in all three areas and implying that our understanding of what it takes to predict performance in specific domains of language use is still quite limited. At the same time, while the profession now takes for granted the value of assessing language in meaningful 'life-like' contexts, recent studies by scholars working in performance testing seem to give pause to that development. For example, Iwashita et al. (forthcoming), Norris et al. (1998), Papajohn (1999) and Freedle and Kostin (1999) all appear to be less concerned with the relationship between test-tasks and their real world counterparts than with intra-task effects on test-taker performance. In other words, they raise the question of whether changing test-task characteristics (prompts, topics, texts) or conditions (planning time, speededness), which are believed to increase or decrease their level of cognitive demand, will be reflected in test-scores assigned to candidates. This research is, I believe, indicative of both a growing concern about the indeterminacy of performance-based tasks as a means of measurement and a realization that the LSP testing enterprise of the 1980s and 1990s, in spite of its laudable attempt to capture the particularities of real world communication, raises more questions than it answers. As Skehan (1998) reminds us:

There is still no way of relating underlying abilities to performance and processing conditions, nor is there any systematic basis for examining the language demands of a range of different contexts. As a result, it is not clear how different patterns of underlying abilities may be more effective in some circumstances than others, nor how these underlying abilities are mobilized into actual performance (Skehan, 1998: 159).

    VI References

Bachman, L.F. 1991: What does language testing have to offer? TESOL Quarterly 25(4), 671–704.

Bachman, L.F. and Palmer, A.S. 1996: Language testing in practice. Oxford: Oxford University Press.

Douglas, D. 2000: Assessing languages for specific purposes: theory and practice. Cambridge: Cambridge University Press.

—— in press: Three problems in testing language for specific purposes: authenticity, specificity, and inseparability. In Elder, C., Brown, A., Hill, K.N., Iwashita, N., Lumley, T., McNamara, T.F. and O'Loughlin, K., editors, Experimenting with uncertainty: essays in honour of Alan Davies. Cambridge: Cambridge University Press.



Douglas, D. and Selinker, L. 1993: Performance on a general versus a field-specific test. In Douglas, D. and Chapelle, C., editors, A new decade of language testing research: selected papers from the 1990 Language Testing Research Colloquium. Alexandria, VA: TESOL.

Elder, C. 1993a: Language proficiency as predictor of performance in teacher education. Melbourne Papers in Language Testing 2(1), 1–17.

—— 1993b: How do subject specialists construe classroom language proficiency? Language Testing 10(3), 235–54.

—— 1993c: The proficiency test for language teachers: Italian, Volume 1: Final report on the test development process. Melbourne: NLLIA Language Testing Centre, University of Melbourne.

—— 1994: Performance testing as benchmark for foreign language teacher education. Babel: Journal of the Federation of Modern Language Teachers Associations 29(2), 9–19.

—— 1995: Are raters' judgements of language teacher effectiveness wholly language-based? Melbourne Papers in Language Testing 3(2), 40–59.

Elder, C., Iwashita, N. and Brown, A. 1995: The proficiency test for language teachers: Japanese, Volume 1: Final report on the test development process. Melbourne: NLLIA Language Testing Research Centre, University of Melbourne.

Freedle, R. and Kostin, I. 1999: Does the text matter in a multiple-choice test of comprehension? The case for the construct validity of TOEFL's minitalks. Language Testing 16(1), 2–32.

Grant, L. 1997: Testing the language proficiency of bilingual teachers: Arizona's Spanish proficiency test. Language Testing 14(1), 23–46.

Hinofotis, F., Bailey, K. and Stern, S. 1981: Assessing the oral proficiency of prospective FTAs: instrument development. In Palmer, A.S., Groot, P.J.M. and Trosper, G., editors, The construct validation of tests of communicative competence. Washington, DC: Teachers of English to Speakers of Other Languages, 106–26.

Iwashita, N., McNamara, T. and Elder, C. forthcoming: Can we predict task difficulty in an oral proficiency test? Exploring the potential of an information processing approach to task design. Language Learning.

Jacoby, S. and McNamara, T. 1999: Locating competence. English for Specific Purposes 18(3), 213–41.

Jones, R. 1979: Performance testing of second language proficiency. In Brière, E. and Hinofotis, F., editors, Concepts in language testing: some recent studies. Washington, DC: TESOL, 49–57.

Lumley, T. and Brown, A. 1996: Specific purpose language performance tests: task and interaction. In Wigglesworth, G. and Elder, C., editors, The language testing cycle: from inception to washback. Australian Review of Applied Linguistics, Series S, Number 13, 105–36.

McNamara, T.F. 1996: Measuring second language performance. London and New York: Addison Wesley Longman.


Messick, S. 1993: Validity. In Linn, R.L., editor, Educational measurement (3rd edn). National Council on Measurement in Education and American Council on Education; Oryx Press, 13–304.

Nicholas, H. 1993: Languages at the crossroads: the report of the national enquiry into the employment and supply of teachers of languages other than English. Victoria: National Languages and Literacy Institute of Australia.

Norris, J., Brown, J.D., Hudson, T. and Yoshioka, J. 1998: Designing second language performance assessments. Second Language Teaching & Curriculum Center, University of Hawai'i at Manoa.

Papajohn, D. 1999: The effect of topic variation in performance testing: the case of the chemistry TEACH test for international teaching assistants. Language Testing 16(1), 52–81.

Plakans, B. and Abraham, R. 1990: The testing and evaluation of international teaching assistants. In Douglas, D., editor, English language testing in U.S. colleges and universities. Washington, DC: NAFSA.

Skehan, P. 1998: A cognitive approach to language learning. Oxford: Oxford University Press.

Smith, J.A. 1992: Topic and variation in oral proficiency in international teaching assistants. PhD dissertation, University of Minnesota. DA 9217683.

Stevenson, D.K. 1985: Authenticity, validity and a tea party. Language Testing 2(1), 41–47.

Swales, J. 1990: Genre analysis: English in academic and research settings. Cambridge: Cambridge University Press.

Teasdale, A. 1994: Authenticity, validity and task design for tests of well-defined ESP domains. In Khoo, R., editor, The practice of LSP: perspectives, programmes and projects. Singapore: SEAMEO Regional Language Centre, 230–24.

Viete, R. 1998: Culturally sensitive and equitable assessment of oral English for overseas-qualified teacher trainees. Journal of Intercultural Studies 19(2), 171–84.

Wesche, M. 1992: Second language performance testing: the Ontario Test of ESL as an example. Language Testing 4, 28–84.

Widdowson, H. in press: Communicative language testing: the art of the possible. In Elder, C., Brown, A., Hill, K.N., Iwashita, N., Lumley, T., McNamara, T.F. and O'Loughlin, K., editors, Experimenting with uncertainty: essays in honour of Alan Davies. Cambridge: Cambridge University Press.


Appendix 1 Classroom language assessment schedule (Elder, 1993b: 251–53)

    1. General language proficiency

Rating for Intelligibility of expression:
highly satisfactory / acceptable / at risk / unsatisfactory
Comment (strengths & weaknesses) / Needs work

1.1. projects and pitches voice appropriately
1.2. pronounces words/sounds clearly

1.3. utters sentences clearly (i.e., with suitable rhythm and intonation)

1.4. clearly distinguishes questions, statements and instructions

1.5. stresses important words/ideas (says them louder, more slowly, with pauses)

1.6. clearly marks transitions from one idea/lesson stage to the next using words such as so, now, right, we're going to

1.7. uses appropriate facial expression, gesture, body movement

Rating for Fluency & flexibility of expression:
highly satisfactory / acceptable / at risk / unsatisfactory

    1.8. speaks at appropriate speed  

1.9. speaks fluently (i.e., not too much stumbling, hesitation, groping for words)

1.10. can express ideas in different ways (e.g., by rephrasing, elaborating, summarizing)

Rating for Accuracy of expression:
highly satisfactory / acceptable / at risk / unsatisfactory
Comment (strengths & weaknesses) / Needs work

1.11. grammar of spoken and written English is generally accurate
1.12. formulates questions clearly

1.13. uses correct spelling and punctuation in boardwork and handouts


Rating for Comprehension:
highly satisfactory / acceptable / at risk / unsatisfactory
Comment (strengths & weaknesses) / Needs work

1.14. demonstrates understanding of student language

1.15. seeks clarification of student language when necessary (e.g., asks them to repeat/rephrase)

    2. Using subject-specific language

Rating for Subject-specific language:
highly satisfactory / acceptable / at risk / unsatisfactory
Comment (strengths & weaknesses) / Needs work

2.1. demonstrates knowledge of scientific and mathematical terms

    2.2. pronounces specialist terms clearly  

2.3. uses specialist terms judiciously (grading them and writing them on the board when appropriate)

2.4. makes clear the connections between ideas (stresses link words if, since, in order)

2.5. explains scientific and mathematical processes/concepts in ways appropriate to the audience (using simple language, familiar/concrete examples)

2.6. explains diagrams/models/use of equipment clearly

2.7. description/definition of terms/processes is a usable model for students' written assignments


    3. Using the language of classroom interaction

Rating for Language of classroom interaction:
highly satisfactory / acceptable / at risk / unsatisfactory
Comment (strengths & weaknesses) / Needs work

    Involvement of students in class and lesson content 

3.1. uses variety of forms of address (we, you, us/student names)

3.2. poses questions to check understanding of previously learned material/new information

3.3. grades questions appropriately for students and learning task: simpler to more complex; closed/open

3.4. offers questions to individuals and whole class

3.5. clearly signals acceptance/rejection of student response

3.6. responds appropriately to students' questions, requests for assistance

3.7. deals effectively with wrong answers, nonresponse (e.g., by rephrasing questions/reviewing steps in a process)

    3.8. adopts appropriate level of formality/firmness  

    3.9. gives clear instructions  

3.10. maintains contact with class while dealing with individual demands/using blackboard, etc.

    Overall communicative effectiveness

Rating for Overall effectiveness:
highly satisfactory / acceptable / at risk / unsatisfactory


Appendix 2 Language proficiency test for teachers (Italian): brief test description

Phase 1: Warm-up and discussion. Task: brief getting-to-know-you conversation and discussion of aspects of the teacher role. Audience: interviewer. Mode: dialogue.

Phase 2: Read aloud. Task: read aloud a short story and explain the meaning of selected words from the passage. Audience: whole class (imagined). Mode: monologue.

Phase 3: Story retell. Task: retell the same story using picture book prompts. Audience: whole class (imagined). Mode: monologue.

Phase 4: Instructions. Task: give directions as to how to carry out a simple construction activity using picture prompts. Audience: whole class (imagined). Mode: monologue.

Phase 5: Assign and model a roleplay. Task: explain participant roles using cue cards and perform roleplay. Audience: interviewer (as student). Mode: dialogue.

Phase 6: Explain learner error. Task: explain specified mistakes in student writing. Audience: interviewer (as student). Mode: dialogue.