TRANSCRIPT
A national exam‘s washback on
reading assessment
in the secondary classroom
Doris Froetscher
Lancaster University / BIFIE Vienna
1
Context: Austrian school-leaving exam
• "Matura"
• high-stakes – entry into higher education
• standardized national exam – obligatory by 2015
• pilot phase – started 2008
– schools participate voluntarily
– increasing number (2013: 90% of schools / >12,000 test takers)
• before reform: teacher-made Matura
2
The new Matura
• English: B2 level
– Reading, Listening, Language in Use, Writing
• Reading paper
– 4 independent tasks
– 6-10 items each
– developed by teachers trained as item writers
– field tested and standard set
3
Research Question
What is the washback of the standardized Matura for English in Austria on how reading is assessed in class tests?
• important to research the ways in which external tests affect assessment in the classroom – Wall and Alderson, 1993; Wall and Horak, 2006; Watanabe, 2000
4
Class tests
• 2-3 times per school year
• usually 50% of final grade
• make-up not regulated until October 2012
5
Research into
washback of standardized tests
on classroom-based assessment
• few studies
• general education – Abrams et al., 2003; Mabry et al., 2003; McMillan et al., 1999; Mertler, 2010; Stecher et al., 1998; Tierney, 2006
• language testing – Tsagari, 2009; Wesdorp, 1982; Wall and Alderson, 1993
• main methods used
– surveys
– interviews
6
This study
addresses a research gap in terms of
- focus
- washback of an external exam on classroom assessment
- method
- analysis of actual class test tasks
- tools
- specially developed instrument IART
- Instrument for analysis of reading tasks
- statistical analyses (see the illustrative sketch after this slide)
- crosstabulations
- Chi-square statistic
- Mann-Whitney U test
7
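The slide names the statistical procedures but not how they were run. Purely as an illustration, and not the study's actual data or code, the sketch below shows how a crosstabulation of task type by period with a chi-square test could be set up in Python with pandas and SciPy; all counts are invented.

```python
# Illustrative sketch only: hypothetical task records, not the study's data.
import pandas as pd
from scipy.stats import chi2_contingency

# Each row is one class-test reading task, labelled by period and task type.
tasks = pd.DataFrame({
    "period": ["pre"] * 6 + ["post"] * 6,
    "task_type": [
        "reading", "reading & writing", "reading & writing",
        "reading into writing", "reading & writing", "reading",
        "reading", "reading", "reading",
        "reading", "reading", "reading & writing",
    ],
})

# Crosstabulation of task type by period (pre/post reform)
crosstab = pd.crosstab(tasks["period"], tasks["task_type"])
print(crosstab)

# Chi-square test of independence on the crosstabulation
chi2, p, dof, expected = chi2_contingency(crosstab)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")
```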
Participants and data
• 22 teachers
• 126 class tests from year 12 (final year, 18-year-olds)
– pre 2008: n=61
– post 2008: n=65
• containing 181 "reading" tasks
– pre 2008: n=95
– post 2008: n=86
8
Two-step approach
A. global analysis
– all tasks
B. detailed analysis
– specially designed instrument
– smaller number of tasks
9
A. Presence of reading tasks in tests
[Chart] Share of class tests that contain a reading task:
includes a reading task: pre 77%, post 58.5%
no reading task: pre 23%, post 41.5%
11
A. Types of reading tasks
Reading
You are going to read a magazine article. For questions 13-19, choose
the answer (A, B, C or D) which you think fits best according to the
text. Put the correct letter to the number in the box below. There is an
example at the beginning.
Reading & writing
Read through the article after having read through the comprehension
questions below. Then answer these questions in your own words and
meaningful sentences.
Reading into writing
Write a reader’s letter to TIME (or an e-mail) telling the editors what
you think and feel about this issue dealt with in the report. (Ca. 100
words)
12
A. Types of reading tasks
[Chart] Task types as a share of all reading tasks:
reading (only): pre 18.9%, post 98%
reading & writing: pre 51.6%, post 1%
reading into writing: pre 29.5%, post 1%
13
p < .001, df = 2
A. Possible explanations
PRE
• Old Matura regulations
– integrated reading/writing task
– no separate reading part
• Class tests
– integrated tasks
– few reading-only tasks
POST
• New Matura / pilot phase
– Reading and Writing separate
– but also Listening, Language in Use
• Class tests
– 2-4 skills tested separately: Listening, Reading, Language in Use, Writing
14
A. Test methods used
15
[Chart] Test methods as a share of reading tasks:
comprehension questions: pre 61%, post 2%
matching: pre 18%, post 45%
summary: pre 15%, post 0%
multiple choice: pre 6%, post 25%
short answer (4 words): pre 0%, post 20%
B. Detailed analysis: IART
Instrument for the analysis of reading tasks
• 32 judgement questions
– text, task, item characteristics
• sources
– new Matura test specifications for Reading
– CEFR / Dutch CEFR Grid
– ALTE task analysis checklists
17
B. Detailed analysis: IART
• pilot studies
– Inter-judge agreement
• 7 raters, 6 tasks (37 items)
• AC1 measure calculated with AgreeStat (Gwet, 2011) – see the sketch after this slide
• coefficients between .24 and .86 (percent agreement: 45%–96%)
– Intra-judge agreement
• 1 rater, 2 tasks (11 items), 2 rounds
• nominal variables: 84% agreement
• ordinal variables: Pearson correlation r = .816, p < .001
18
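AgreeStat was used for the AC1 coefficients reported above. Purely to illustrate what the AC1 statistic is, here is a minimal Python sketch of Gwet's first-order agreement coefficient for nominal ratings by several raters; the rating matrix is invented, and AgreeStat features such as handling of missing ratings, weighted (ordinal) agreement and variance estimation are omitted.

```python
# Minimal sketch of Gwet's AC1 for nominal ratings; example data are invented.
from collections import Counter

# Rows = items, columns = raters, values = category assigned by each rater.
ratings = [
    ["a", "a", "a", "b", "a", "a", "a"],
    ["b", "b", "b", "b", "a", "b", "b"],
    ["a", "a", "b", "a", "a", "a", "a"],
    ["c", "c", "c", "c", "c", "b", "c"],
]

categories = sorted({cat for item in ratings for cat in item})
q = len(categories)
n = len(ratings)

# Observed agreement: average pairwise agreement across raters, per item
pa_terms = []
prevalence = Counter()
for item in ratings:
    r = len(item)                # raters for this item
    counts = Counter(item)       # how many raters chose each category
    pa_terms.append(sum(c * (c - 1) for c in counts.values()) / (r * (r - 1)))
    for cat in categories:
        prevalence[cat] += counts.get(cat, 0) / r

pa = sum(pa_terms) / n

# Chance agreement as defined for AC1, based on average category prevalences
pi = {cat: prevalence[cat] / n for cat in categories}
pe = sum(p * (1 - p) for p in pi.values()) / (q - 1)

ac1 = (pa - pe) / (1 - pe)
print(f"Pa = {pa:.3f}, Pe = {pe:.3f}, AC1 = {ac1:.3f}")
```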
B. Detailed analysis: IART
• example of a question
How clearly is the item worded?
a. clear
b. rather clear
c. rather unclear
d. very unclear
• tasks analysed
– 31 pre
– 43 post
• 26 from past papers of the new Matura
19
B. Topic distress
Could the topic of the text cause emotional distress?
[Chart]
yes: pre 39%, post 9%
no: pre 61%, post 91%
20
p = .002, df = 1
B. Clarity of instructions
Are the instructions clear?
[Chart]
fully: pre 58%, post 100%
partly: pre 42%, post 0%
21
p < .001, df = 1
B. Item clarity
How clearly is the item worded?
[Chart]
clear: pre 76.3%, post 87%
rather clear: pre 10.7%, post 6%
rather unclear: pre 9.0%, post 3%
unclear: pre 4.0%, post 4%
24
U = 24645, p = .002 (an illustrative sketch of this kind of comparison follows this slide)
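The U statistics on this and the following slides report Mann-Whitney U comparisons of ordinal ratings across the pre and post task sets (cf. the methods listed on slide 7). As an illustration only, with invented clarity ratings rather than the study's data, such a comparison could look like this in Python with SciPy:

```python
# Illustrative Mann-Whitney U comparison of ordinal item-clarity ratings.
# Ratings are invented: 1 = very unclear, 2 = rather unclear, 3 = rather clear, 4 = clear.
from scipy.stats import mannwhitneyu

pre_items = [4, 4, 4, 3, 2, 4, 1, 4, 3, 4]
post_items = [4, 4, 4, 4, 3, 4, 4, 2, 4, 4]

u_stat, p_value = mannwhitneyu(pre_items, post_items, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.3f}")
```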
B. Item clarity (2)
How clearly is the item worded? (excluding past papers)
[Chart]
clear: pre 76.3%, post 82.1%
rather clear: pre 10.7%, post 7.5%
rather unclear: pre 9.0%, post 3.0%
unclear: pre 4.0%, post 7.5%
25
B. Language of the item
How difficult is the language of the item in relation to the text?
[Chart]
easier: pre 26.6%, post 40.7%
same: pre 70.6%, post 58.7%
more difficult: pre 2.8%, post 0.6%
26
U = 23372, p = .001
B. Distracters
How would you classify the distracters for this item?
[Chart]
good: pre 64.4%, post 89.2%
weak: pre 36.6%, post 6.6%
too strong: pre 0.0%, post 4.2%
27
U = 7304, p < .001
B. Teacher variable
teachers in pilot phase
→ higher proportion of tests WITH reading tasks
→ less integration with writing
→ higher use of past papers
→ fewer texts which might cause distress
28
No washback
features where no change was found
- CEFR domains of texts
- authenticity of texts
- reading behaviours targeted (R. Green, 2000)
- difficulty of the items
29
Limitations
• restricted number of tasks
• limited sample
• data based on judgement
– possible researcher bias
30
Tentative conclusions
• positive washback
– testing of reading, not (also) writing
– practice builds students' familiarity with the test methods
– increased "usefulness" of class tests (Bachman & Palmer, 1996)
• unproblematic topics, clear instructions, example item,
better distracters
• use of past papers = professionally developed tasks
31
Tentative conclusions
• negative washback
– narrowing down of task types
– teaching to the test?
• strength of washback
– stronger effect for teachers in the pilot phase
• methodological conclusions
– analysis of classroom tests useful for washback research
– applying an instrument like IART seems suitable
32
Next steps
Interviews with teachers
- views on the new Matura's influence on their
assessment practices
- approaches towards class tests
- reasons for selecting tasks, past papers
- link between washback on class tests and other
classroom assessment
- achievement dimension
- formative assessment
33
References
Abrams, L. M., Pedulla, J. J., & Madaus, G. F. (2003). Views from the classroom: Teachers' opinions of statewide testing programs. Theory Into Practice, 42(1), 18-29.
Alderson, J. C., & Wall, D. (1993). Does washback exist? Applied Linguistics, 14(2), 115-129.
ALTE (n.d.). Individual Component Checklists (Reading). Retrieved from http://events.alte.org/projects/analysis.php
Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice. Oxford: Oxford University Press.
Council of Europe (2001). Common European Framework of Reference for Languages:
learning, teaching, assessment. Cambridge: Cambridge University Press.
Green, R. (2000). An empirical investigation of the componentiality of E.A.P. reading and
E.A.P. listening through language test data. PhD thesis, University of Reading.
Gwet, K. (2011). AgreeStat (Version 2011.2): Advanced Analytics, LLC.
Mabry, L., Poole, J., Redmond, L., & Schultz, A. (2003). Local impact of state testing in southwest Washington. Education Policy Analysis Archives, 11(22).
McMillan, J., Myran, S., & Workman, D. (1999). The impact of mandated statewide testing
on teachers’ classroom assessment and instructional practices.
Mertler, C. A. (2010). Teachers' perceptions of the influence of No Child Left Behind on
classroom practices. Current Issues in Education, 13(3).
35
References
Stecher, B., Barron, S., Kaganoff, T., & Goodwin, J. (1998). The effects of standards-based
assessment on classroom practices: Results of the 1996-1997 RAND survey of Kentucky
teachers of mathematics and writing. University of California, National Center for
Research on Evaluation, Standards, and Student Testing (CRESST), Los Angeles.
Tierney, R. (2006). Changing practices: influences on classroom assessment. Assessment in
Education: Principles, Policy & Practice, 13(3), 239-264. doi:
10.1080/09695940601035387
Tsagari, D. (Ed.). (2009). The complexity of test washback (Vol. 15). Frankfurt am Main:
Peter Lang.
Wall, D., & Alderson, J. C. (1993). Examining washback: The Sri Lankan impact study. Language Testing, 10(1), 41-69.
Wall, D., & Horák, T. (2006). The impact of changes in the TOEFL examination on teaching and learning in Central and Eastern Europe: Phase 1, the baseline study. ETS Research Report. Princeton, NJ: Educational Testing Service.
Watanabe, Y. (2000). Washback effects of the English section of the Japanese university
entrance examinations on instruction in pre-college level EFL. Language Testing Update,
27, 42-47.
36