www.cemcentre.org
Teacher Assessment versus Exams
Peter Tymms
CEM, Durham University
Overview
• The Issue
• The importance of LAs, Schools and teachers
• Fairness and bias
• Coverage and sampling
• Teacher assessment
• Exams and tests
• Predictive validity
• Conclusions
The Issue
• Teacher assessment is unfair because it is unreliable and biased.
• Exams are simply snapshots and are unrepresentative of the work that has really been done.
Which matters most?
1. LA
2. School
3. Teacher
4. Pupil
Newcastle Commission: Data Sources
• Several national datasets including
– ASPECTS, PIPS, MidYIS & YELLIS
– KS1, KS2, KS3 & GCSE
• Looked at value-added using three-level multilevel models
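A three-level model (pupils within schools within LAs) splits the total variance in outcomes into a share for each level. The sketch below shows that arithmetic only. The variance components are hypothetical placeholders, not results from the datasets listed above, and `variance_shares` is a name introduced here for illustration.

```python
def variance_shares(components):
    """Return each level's share of the total variance.

    `components` maps a level name to its variance component, as a
    multilevel model would estimate it.
    """
    total = sum(components.values())
    if total <= 0:
        raise ValueError("variance components must sum to a positive value")
    return {level: value / total for level, value in components.items()}


# Hypothetical variance components, for illustration only (not from the slides).
example = {"LA": 0.01, "school": 0.09, "pupil": 0.90}
shares = variance_shares(example)
for level, share in shares.items():
    print(f"{level}: {share:.0%} of total variance")
```

Comparing the shares before and after adjusting for prior attainment is one way to read a raw versus value-added contrast at each level.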
Example using KS2 English
[Figure, repeated across six slides: Pupil raw, Pupil value-added, School raw, School value added, LEA raw, LEA value added; scale from -3.00 to 2.00]
Willms’ Diagram
The Teacher Effect
Repeated Boosts: Vocabulary
[Figure: Levels (0 to 5) against Year (ER, Y1 to Y6)]
Which matters most?
1. LA
2. School
3. Teacher
4. Pupil
Conclusion
• Pupils vary enormously
• Teachers have the greatest impact
• Schools are relevant
• Authorities hardly vary at all
Hypothesis
• The best teachers will be best at judging their students
What is bias?
• Bias appears in a test when part of an assessment is harder for a particular group.
• Or when an assessor systematically downgrades a group or an individual for construct-irrelevant reasons.
Example of item bias
Pigeon
Turtle
Examples of teacher bias
• Anecdote
• By sex (e.g. baseline & page 17 Harlen)
• By ability – judgement anchored by experience
• By ethnicity – assault experiments
• By social class
• By behaviour (origin of ability testing: Binet)
• By age – (EPICure study)
• By incident – e.g. spilling a glass of water
• The halo (or horns) effect (e.g. P scales)
P Scales in 2004

| | speak. | listen. | read. | write. | using | number | shapes | sci. enq | life proc. | mat. prop. |
|---|---|---|---|---|---|---|---|---|---|---|
| speaking | | | | | | | | | | |
| listening | 0.98 | | | | | | | | | |
| reading | 0.86 | 0.86 | | | | | | | | |
| writing | 0.87 | 0.86 | 0.93 | | | | | | | |
| using | 0.80 | 0.80 | 0.79 | 0.81 | | | | | | |
| number | 0.84 | 0.84 | 0.85 | 0.87 | 0.89 | | | | | |
| shape | 0.86 | 0.86 | 0.85 | 0.87 | 0.89 | 0.93 | | | | |
| sci. enq | 0.78 | 0.78 | 0.75 | 0.78 | 0.82 | 0.82 | 0.83 | | | |
| life proc. | 0.80 | 0.80 | 0.76 | 0.79 | 0.81 | 0.82 | 0.83 | 0.95 | | |
| mat. prop | 0.79 | 0.79 | 0.75 | 0.78 | 0.82 | 0.83 | 0.84 | 0.96 | 0.97 | |
| phys. proc | 0.78 | 0.78 | 0.75 | 0.78 | 0.82 | 0.82 | 0.84 | 0.95 | 0.96 | 0.97 |
Teacher reliability
• How should reliability be assessed?
– By looking at the internal consistency of judgements?
– By looking at the link to external assessments?
– By comparing over time?
– By comparing one teacher with others?
• Facets model within Rasch measurement
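For the last question, comparing one teacher with others, a simple starting point is a chance-corrected agreement statistic. The sketch below uses Cohen's kappa, which is a swapped-in illustration and much simpler than the Facets model named on the slide. The two lists of levels are hypothetical, and `cohens_kappa` is a name introduced here.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters judging the same pupils."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    # Proportion of pupils given the same level by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical levels awarded by two teachers to the same eight pupils.
teacher_1 = [3, 4, 4, 5, 3, 4, 5, 3]
teacher_2 = [3, 4, 5, 5, 3, 3, 5, 4]
kappa = cohens_kappa(teacher_1, teacher_2)
print(round(kappa, 2))
```

Kappa treats levels as unordered categories, so a one-level disagreement counts the same as a two-level one; a weighted variant would be needed to reflect the ordering.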
Trusting teachers’ judgement Harlen 2005
“The findings of the review by no means constitute a ringing endorsement of teachers’ assessment; there was evidence of low reliability and bias in teachers’ judgements”
5-14, Portfolios & single level tests
• 5-14 assessments
• What about portfolios?
– inter-rater reliability very low for maths and writing
• English teacher levels in SATs
– early 1990s: “considerable error”
– later, quite common to find teacher = test results
– single level tests compromised by teacher judgement
Is it OK for teachers to assess their own pupils for High Stakes exams?
• How does the power to grade affect relationships?
• Would you give McEnroe a B?
Exam/test reliability
• Typically around 0.9 but …
• Distinguish the assessment of
– Convergent questions
– Divergent questions
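To see what a reliability of around 0.9 means for an individual score, classical test theory gives the standard error of measurement as SD × sqrt(1 − reliability). The sketch below applies that formula; the score scale with a standard deviation of 15 is a hypothetical choice, not a figure from the slides.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical score scale with a standard deviation of 15 points.
sem = standard_error_of_measurement(sd=15.0, reliability=0.9)
print(f"SEM = {sem:.1f} points")
```

Even at 0.9, the standard error is about a third of a standard deviation, which matters most for pupils close to a grade boundary.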
Exam/test bias
• Pre-tests are often used to address issues of bias
• But we put much reliance on judgment.
• England’s major exams are largely not pre-tested.
Are Exams inappropriate snapshots?
• Issue 1: Questions must be representative samples of the course under exam conditions.
• Issue 2: Constraint on the nature of the assessment
– Multi-method Multi-trait challenge
• Issue 3: Impact of stress on performance
– Positive & Negative (links to introversion)
Introvert and Extrovert
[Figure: axes labelled Stimulus and Effort]
We need to match format to content
• Some things must be assessed by judgement:
– Social interactions
– Quality of research
– Poetry
– Art
• Some things are best left to tests:
– Mental arithmetic
– Spelling
– Phonological awareness
– Diagnostic assessments (e.g. INCAS)
• Even so, perhaps there is a final arbiter
Predictive validity
[Diagram boxes: Developed ability test (MidYIS/IQ/etc.); Attainment test (Std Grade/Highers); Teacher Grade; Later success – degree, salary etc.]
We need the evidence but ..
• Prediction is often poor
– Two major reasons
Prediction of Educational Achievement
[Figure sequence over five slides: scatter plots of Later Achievement against Prior Achievement, captioned in turn:]
• Correlation = 0.7
• Select top 15%
• Correlation = 0.39
• Cream top 3%; r = 0.19
So, poor prediction because of
• Prior selection
• Variable outcome measures
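The effect of prior selection can be checked without drawing any samples. The sketch below assumes prior and later achievement are bivariate normal, so the correlation that remains after selecting on prior achievement follows the standard range-restriction formula r' = ρu / sqrt(1 − ρ² + ρ²u²), where u is the ratio of the restricted to the unrestricted standard deviation. Reading "cream top 3%" as removing the top 3% from the selected 15% is an assumption, and the function names are introduced here for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def band_variance(lower_pct, upper_pct):
    """Variance of a standard normal variable restricted to a percentile band."""
    if not (0.0 <= lower_pct < upper_pct <= 1.0):
        raise ValueError("need 0 <= lower_pct < upper_pct <= 1")
    mass = upper_pct - lower_pct

    def z_and_density(pct):
        # At an open end (0 or 1) the density terms vanish, so return zeros.
        if pct in (0.0, 1.0):
            return 0.0, 0.0
        z = STD_NORMAL.inv_cdf(pct)
        return z, STD_NORMAL.pdf(z)

    a, pdf_a = z_and_density(lower_pct)
    b, pdf_b = z_and_density(upper_pct)
    mean = (pdf_a - pdf_b) / mass
    return 1.0 + (a * pdf_a - b * pdf_b) / mass - mean ** 2


def restricted_correlation(rho, lower_pct, upper_pct=1.0):
    """Correlation left after selecting on the predictor (bivariate normal assumed)."""
    u_squared = band_variance(lower_pct, upper_pct)
    return rho * u_squared ** 0.5 / (1.0 - rho ** 2 + rho ** 2 * u_squared) ** 0.5


r_top15 = restricted_correlation(0.7, 0.85)          # keep only the top 15%
r_creamed = restricted_correlation(0.7, 0.85, 0.97)  # top 15% with the top 3% removed
print(round(r_top15, 2), round(r_creamed, 2))
```

Under these assumptions the two values come out at roughly 0.4 and 0.2, close to the slides' figures though not identical to them, which is what one would expect if the slides' numbers came from particular samples.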
Conclusion: Judgements or tests?
• Should we do both? (Profiles)
– But how do we ensure that judgements and tests are independent?
– How can judgements be kept free from bias?
• Virtually impossible in high stakes tests
• Essential for formative work
References
• Campbell, D. T., & Fiske, D. W. (1959). Convergent and Discriminant Validation by the Multitrait-Multimethod Matrix. Psychological Bulletin, 56, 81-105.
• Cooper, B. (1998). Using Bernstein and Bourdieu to understand children's difficulties with "realistic" mathematics testing: an exploratory study. Qualitative Studies in Education, 11(4), 511-532.
• Eysenck, H. J. (2006). The Biological Basis of Personality. Transaction Publishers.
• Harlen, W. (2005). Trusting teachers' judgement: research evidence of reliability and validity of teachers' assessment used for summative purposes. Research Papers in Education, 20(3), 245-270.
• Johnson, S., Hennessy, E., Smith, R., Trikic, R., Wolke, D., & Marlow, N. (2009). The EPICure Study: Academic attainment and special educational needs in extremely preterm children at 11 years. London: Nottingham/London/Warwick.
• Koretz, D., Stecher, B. M., Klein, S. P., & McCaffrey, D. (1994). The Vermont Portfolio Assessment Program: findings and implications. Educational Measurement: Issues & Practice, 13, 5-16.
• Tymms, P. (1997). Value-added Key Stage 1 to Key Stage 2. London: School Curriculum and Assessment Authority.
• Tymms, P., Jones, P., Albone, S., & Henderson, B. (2009). The first seven years at school. Educational Assessment, Evaluation and Accountability, 21, 67-80.
• Tymms, P., Merrell, C., Heron, T., Jones, P., Albone, S., & Henderson, B. (2008). The importance of districts. School Effectiveness and School Improvement, 19(3), 261-274.
• Tymms, P., Merrell, C., & Jones, P. (2004). Using baseline assessment data to make international comparisons. British Educational Research Journal, 30(5), 673-689.
• Willms, J. D. (1987). Differences Between Scottish Educational Authorities in their Examinations Attainment. Oxford Review of Education, 13(2), 211-232.