TRANSCRIPT
A national exam‘s washback on
reading assessment
in the secondary classroom
Doris Froetscher
Lancaster University / BIFIE Vienna
1
Context: Austrian school-leaving exam
• "Matura"
• high-stakes – entry into higher education
• standardized national exam – obligatory by 2015
• pilot phase – started 2008
– schools participate voluntarily
– increasing number (2013: 90% of schools / >12,000 test takers)
• before reform: teacher-made Matura
2
The new Matura
• English: B2 level
– Reading, Listening, Language in Use, Writing
• Reading paper
– 4 independent tasks
– 6-10 items each
– developed by teachers trained as item writers
– field tested and standard set
3
Research Question
What is the washback of the standardized Matura for English in Austria on how reading is assessed in class tests?
• important to research the ways in which external tests affect assessment in the classroom – Wall and Alderson, 1993; Wall and Horak, 2006; Watanabe, 2000
4
Class tests
• 2-3 times per school year
• usually 50% of final grade
• make-up not regulated until October 2012
5
Research into
washback of standardized tests
on classroom-based assessment
• few studies
• general education – Abrams et al., 2003; Mabry et al., 2003; McMillan et al., 1999; Mertler, 2010; Stecher et al., 1998; Tierney, 2006
• language testing – Tsagari, 2009; Wesdorp, 1982; Wall and Alderson, 1993
• main methods used
– surveys
– interviews
6
This study
addresses a research gap in terms of
- focus
- washback of an external exam on classroom assessment
- method
- analysis of actual class test tasks
- tools
- specially developed instrument IART
- Instrument for analysis of reading tasks
- statistical analyses (see the illustrative sketch after this slide)
- crosstabulations
- Chi-square statistic
- Mann-Whitney U test
7
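The slide names the statistical procedures but not how they were run. Purely as an illustration, and not the study's actual data or code, the sketch below shows how a crosstabulation of task type by period with a chi-square test could be set up in Python with pandas and SciPy; all counts are invented.

```python
# Illustrative sketch only: hypothetical task records, not the study's data.
import pandas as pd
from scipy.stats import chi2_contingency

# Each row is one class-test reading task, labelled by period and task type.
tasks = pd.DataFrame({
    "period": ["pre"] * 6 + ["post"] * 6,
    "task_type": [
        "reading", "reading & writing", "reading & writing",
        "reading into writing", "reading & writing", "reading",
        "reading", "reading", "reading",
        "reading", "reading", "reading & writing",
    ],
})

# Crosstabulation of task type by period (pre/post reform)
crosstab = pd.crosstab(tasks["period"], tasks["task_type"])
print(crosstab)

# Chi-square test of independence on the crosstabulation
chi2, p, dof, expected = chi2_contingency(crosstab)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")
```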
Participants and data
• 22 teachers
• 126 class tests from year 12 (final year, 18-year-olds)
– pre 2008: n=61
– post 2008: n=65
• containing 181 "reading" tasks
– pre 2008: n=95
– post 2008: n=86
8
Two-step approach
A. global analysis
– all tasks
B. detailed analysis
– specially designed instrument
– smaller number of tasks
9
A. Presence of reading tasks in tests
[Chart] Share of class tests that contain a reading task:
includes a reading task: pre 77%, post 58.5%
no reading task: pre 23%, post 41.5%
11
A. Types of reading tasks
Reading
You are going to read a magazine article. For questions 13-19, choose
the answer (A, B, C or D) which you think fits best according to the
text. Put the correct letter to the number in the box below. There is an
example at the beginning.
Reading & writing
Read through the article after having read through the comprehension
questions below. Then answer these questions in your own words and
meaningful sentences.
Reading into writing
Write a reader’s letter to TIME (or an e-mail) telling the editors what
you think and feel about this issue dealt with in the report. (Ca. 100
words)
12
A. Types of reading tasks
[Chart] Task types as a share of all reading tasks:
reading (only): pre 18.9%, post 98%
reading & writing: pre 51.6%, post 1%
reading into writing: pre 29.5%, post 1%
13
p < .001, df = 2
A. Possible explanations
PRE
• Old Matura regulations
– integrated reading/writing task
– no separate reading part
• Class tests
– integrated tasks
– few reading-only tasks
POST
• New Matura / pilot phase
– Reading and Writing separate
– but also Listening, Language in Use
• Class tests
– 2-4 skills tested separately: Listening, Reading, Language in Use, Writing
14
A. Test methods used
15
[Chart] Test methods as a share of reading tasks:
comprehension questions: pre 61%, post 2%
matching: pre 18%, post 45%
summary: pre 15%, post 0%
multiple choice: pre 6%, post 25%
short answer (4 words): pre 0%, post 20%
B. Detailed analysis: IART
Instrument for the analysis of reading tasks
• 32 judgement questions
– text, task, item characteristics
• sources
– new Matura test specifications for Reading
– CEFR / Dutch CEFR Grid
– ALTE task analysis checklists
17
B. Detailed analysis: IART
• pilot studies
– Inter-judge agreement
• 7 raters, 6 tasks (37 items)
• AC1 measure calculated with AgreeStat (Gwet, 2011) – see the sketch after this slide
• coefficients between .24 and .86 (percent agreement: 45%–96%)
– Intra-judge agreement
• 1 rater, 2 tasks (11 items), 2 rounds
• nominal variables: 84% agreement
• ordinal variables: Pearson correlation r = .816, p < .001
18
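AgreeStat was used for the AC1 coefficients reported above. Purely to illustrate what the AC1 statistic is, here is a minimal Python sketch of Gwet's first-order agreement coefficient for nominal ratings by several raters; the rating matrix is invented, and AgreeStat features such as handling of missing ratings, weighted (ordinal) agreement and variance estimation are omitted.

```python
# Minimal sketch of Gwet's AC1 for nominal ratings; example data are invented.
from collections import Counter

# Rows = items, columns = raters, values = category assigned by each rater.
ratings = [
    ["a", "a", "a", "b", "a", "a", "a"],
    ["b", "b", "b", "b", "a", "b", "b"],
    ["a", "a", "b", "a", "a", "a", "a"],
    ["c", "c", "c", "c", "c", "b", "c"],
]

categories = sorted({cat for item in ratings for cat in item})
q = len(categories)
n = len(ratings)

# Observed agreement: average pairwise agreement across raters, per item
pa_terms = []
prevalence = Counter()
for item in ratings:
    r = len(item)                # raters for this item
    counts = Counter(item)       # how many raters chose each category
    pa_terms.append(sum(c * (c - 1) for c in counts.values()) / (r * (r - 1)))
    for cat in categories:
        prevalence[cat] += counts.get(cat, 0) / r

pa = sum(pa_terms) / n

# Chance agreement as defined for AC1, based on average category prevalences
pi = {cat: prevalence[cat] / n for cat in categories}
pe = sum(p * (1 - p) for p in pi.values()) / (q - 1)

ac1 = (pa - pe) / (1 - pe)
print(f"Pa = {pa:.3f}, Pe = {pe:.3f}, AC1 = {ac1:.3f}")
```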
B. Detailed analysis: IART
• example of a question
How clearly is the item worded?
a. clear
b. rather clear
c. rather unclear
d. very unclear
• tasks analysed
– 31 pre
– 43 post
• 26 from past papers of the new Matura
19
B. Topic distress
Could the topic of the text cause emotional distress?
[Chart]
yes: pre 39%, post 9%
no: pre 61%, post 91%
20
p = .002, df = 1
B. Clarity of instructions
Are the instructions clear?
[Chart]
fully: pre 58%, post 100%
partly: pre 42%, post 0%
21
p < .001, df = 1
B. Item clarity
How clearly is the item worded?
[Chart]
clear: pre 76.3%, post 87%
rather clear: pre 10.7%, post 6%
rather unclear: pre 9.0%, post 3%
unclear: pre 4.0%, post 4%
24
U = 24645, p = .002 (an illustrative sketch of this kind of comparison follows this slide)
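The U statistics on this and the following slides report Mann-Whitney U comparisons of ordinal ratings across the pre and post task sets (cf. the methods listed on slide 7). As an illustration only, with invented clarity ratings rather than the study's data, such a comparison could look like this in Python with SciPy:

```python
# Illustrative Mann-Whitney U comparison of ordinal item-clarity ratings.
# Ratings are invented: 1 = very unclear, 2 = rather unclear, 3 = rather clear, 4 = clear.
from scipy.stats import mannwhitneyu

pre_items = [4, 4, 4, 3, 2, 4, 1, 4, 3, 4]
post_items = [4, 4, 4, 4, 3, 4, 4, 2, 4, 4]

u_stat, p_value = mannwhitneyu(pre_items, post_items, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.3f}")
```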
B. Item clarity (2)
How clearly is the item worded? (excluding past papers)
[Chart]
clear: pre 76.3%, post 82.1%
rather clear: pre 10.7%, post 7.5%
rather unclear: pre 9.0%, post 3.0%
unclear: pre 4.0%, post 7.5%
25
B. Language of the item
How difficult is the language of the item in relation to the text?
[Chart]
easier: pre 26.6%, post 40.7%
same: pre 70.6%, post 58.7%
more difficult: pre 2.8%, post 0.6%
26
U = 23372, p = .001
B. Distracters
How would you classify the distracters for this item?
[Chart]
good: pre 64.4%, post 89.2%
weak: pre 36.6%, post 6.6%
too strong: pre 0.0%, post 4.2%
27
U = 7304, p < .001
B. Teacher variable
teachers in pilot phase
→ higher proportion of tests WITH reading tasks
→ less integration with writing
→ higher use of past papers
→ fewer texts which might cause distress
28
No washback
features where no change was found
- CEFR domains of texts
- authenticity of texts
- reading behaviours targeted (R. Green, 2000)
- difficulty of the items
29
Limitations
• restricted number of tasks
• limited sample
• data based on judgement
– possible researcher bias
30
Tentative conclusions
• positive washback
– testing of reading, not (also) writing
– practice builds students' familiarity with the test methods
– increased "usefulness" of class tests (Bachman & Palmer, 1996)
• unproblematic topics, clear instructions, example item,
better distracters
• use of past papers = professionally developed tasks
31
Tentative conclusions
• negative washback
– narrowing down of task types
– teaching to the test?
• strength of washback
– stronger effect for teachers in the pilot phase
• methodological conclusions
– analysis of classroom tests useful for washback research
– applying an instrument like IART seems suitable
32
Next steps
Interviews with teachers
- views on the new Matura's influence on their
assessment practices
- approaches towards class tests
- reasons for selecting tasks, past papers
- link between washback on class tests and other
classroom assessment
- achievement dimension
- formative assessment
33
References
Abrams, L. M., Pedulla, J. J., & Madaus, G. F. (2003). Views from the classroom: Teachers' opinions of statewide testing programs. Theory Into Practice, 42(1), 18-29.
Alderson, J. C., & Wall, D. (1993). Does washback exist? Applied Linguistics, 14(2), 115-129.
ALTE (n.d.). Individual Component Checklists (Reading). Retrieved from http://events.alte.org/projects/analysis.php
Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice. Oxford: Oxford University Press.
Council of Europe (2001). Common European Framework of Reference for Languages:
learning, teaching, assessment. Cambridge: Cambridge University Press.
Green, R. (2000). An empirical investigation of the componentiality of E.A.P. reading and
E.A.P. listening through language test data. PhD thesis, University of Reading.
Gwet, K. (2011). AgreeStat (Version 2011.2): Advanced Analytics, LLC.
Mabry, L., Poole, J., Redmond, L., & Schultz, A. (2003). Local impact of state testing in southwest Washington. Education Policy Analysis Archives, 11(22).
McMillan, J., Myran, S., & Workman, D. (1999). The impact of mandated statewide testing
on teachers’ classroom assessment and instructional practices.
Mertler, C. A. (2010). Teachers' perceptions of the influence of No Child Left Behind on
classroom practices. Current Issues in Education, 13(3).
35
References
Stecher, B., Barron, S., Kaganoff, T., & Goodwin, J. (1998). The effects of standards-based
assessment on classroom practices: Results of the 1996-1997 RAND survey of Kentucky
teachers of mathematics and writing. University of California, National Center for
Research on Evaluation, Standards, and Student Testing (CRESST), Los Angeles.
Tierney, R. (2006). Changing practices: influences on classroom assessment. Assessment in
Education: Principles, Policy & Practice, 13(3), 239-264. doi:
10.1080/09695940601035387
Tsagari, D. (Ed.). (2009). The complexity of test washback (Vol. 15). Frankfurt am Main:
Peter Lang.
Wall, D., & Alderson, J. C. (1993). Examining washback: The Sri Lankan impact study. Language Testing, 10(1), 41-69.
Wall, D., & Horák, T. (2006). The impact of changes in the TOEFL examination on teaching and learning in Central and Eastern Europe: Phase 1, the baseline study. ETS Research Report. Princeton, NJ: Educational Testing Service.
Watanabe, Y. (2000). Washback effects of the English section of the Japanese university
entrance examinations on instruction in pre-college level EFL. Language Testing Update,
27, 42-47.
36