The Future of Measurement: Making learning visible
John Hattie Melbourne Education Research Institute (MERI)
University of Melbourne
The beginnings ...
Chinese Han Dynasty (206 BCE to 220 CE)
• Entrance examination for the civil service
• From observation and testimonials to written examinations
• To avoid nepotism and corruption
Classical model 1904+
It is simple, elegant, and surprisingly durable.
Presence of errors in measurement: the observed score is the true score plus error.
Lord and Novick (1968): $X_{jg} = \tau_{jg} + E_{jg}$
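A minimal sketch of this model in Python (the true scores and error variance below are invented for illustration): simulate observed scores as true score plus independent error, then estimate reliability as the ratio of true-score variance to observed-score variance.

```python
import numpy as np

rng = np.random.default_rng(0)

n_students, n_forms = 1000, 2
true_scores = rng.normal(50, 10, n_students)   # tau: latent true scores (assumed)
error_sd = 5.0                                 # assumed measurement-error SD

# Observed score = true score + error (X = tau + E), independently per form
observed = true_scores[:, None] + rng.normal(0, error_sd, (n_students, n_forms))

# Reliability = var(tau) / var(X); a parallel-forms correlation estimates it
theoretical = true_scores.var() / observed[:, 0].var()
parallel_forms = np.corrcoef(observed[:, 0], observed[:, 1])[0, 1]
print(f"reliability ~ {theoretical:.2f}, parallel-forms estimate ~ {parallel_forms:.2f}")
```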
Item Response Theories 1950s+
b = difficulty, a = discrimination, c = guessing
$P(u = 1 \mid \theta) = c + (1 - c)\,\dfrac{e^{a(\theta - b)}}{1 + e^{a(\theta - b)}}$
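The three-parameter logistic model translates directly into code; a small sketch with illustrative parameter values:

```python
import numpy as np

def p_correct_3pl(theta, a, b, c):
    """3PL probability of a correct response: c + (1-c) * logistic(a*(theta-b))."""
    logit = a * (np.asarray(theta) - b)
    return c + (1.0 - c) / (1.0 + np.exp(-logit))

# Example: moderately discriminating item (a=1.2), average difficulty (b=0),
# 20% guessing floor (c=0.2), evaluated at low, average, and high ability
for theta in (-2, 0, 2):
    print(theta, round(float(p_correct_3pl(theta, a=1.2, b=0.0, c=0.2)), 3))
```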
Major advances
• test-score equating
• detection of item bias
• adaptive testing
• extensions to polytomous items,
to attitudes and personality
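Of the advances above, adaptive testing follows most directly from the model: administer next whichever item is most informative at the current ability estimate. A minimal 2PL sketch (the item bank and ability estimate are invented):

```python
import numpy as np

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1-p)."""
    p = p_2pl(theta, a, b)
    return a**2 * p * (1.0 - p)

# Hypothetical item bank: (discrimination a, difficulty b)
bank = [(1.5, -1.0), (1.0, 0.0), (2.0, 0.5), (0.8, 1.5)]

theta_hat = 0.0  # current ability estimate
next_item = max(range(len(bank)), key=lambda i: item_information(theta_hat, *bank[i]))
print("administer item", next_item, "with info",
      round(item_information(theta_hat, *bank[next_item]), 3))
```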
Moving beyond classical IRT 2000s+
Newer models based on:
• time completing items
• multiple attempts on items
• multiple abilities or cognitive processes
• multidimensionality
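Multidimensionality is the clearest break from classical IRT: one item can draw on several abilities at once. A sketch of a compensatory multidimensional 2PL item (the loadings are invented for illustration):

```python
import numpy as np

def p_mirt(theta, a, d):
    """Compensatory multidimensional 2PL: logistic(a . theta + d).
    theta: ability vector; a: discrimination per dimension; d: intercept."""
    return 1.0 / (1.0 + np.exp(-(np.dot(a, theta) + d)))

# Hypothetical item loading on two abilities (e.g. comprehension and numeracy)
a = np.array([1.2, 0.4])
d = -0.3
print(round(float(p_mirt(np.array([0.5, -1.0]), a, d)), 3))  # strong dim 1, weak dim 2
```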
Advances
• computer-based testing of many forms
• development of reporting engines
• computerized essay scoring
• beginnings of integrating cognitive processes
• merging testing advances with instruction
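Computerized essay scoring, mentioned above, is typically a supervised model over text features; a toy sketch with scikit-learn (the essays and scores are fabricated placeholders, not any particular engine):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Hypothetical training data: essays with human-assigned scores
essays = ["The experiment shows ...", "I think it was good ...",
          "A detailed analysis of the causes ...", "stuff happened"]
scores = [4.0, 2.0, 5.0, 1.0]

# TF-IDF features feeding a ridge-regression scorer
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), Ridge(alpha=1.0))
model.fit(essays, scores)
print(model.predict(["A detailed experiment shows the causes ..."]))
```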
Assessment in classrooms
Backward Design
• % of students talked to
• student questions
• peer feedback
• collaboration among students
• early years of reading
Software Applications
• Virtual realities
• Hypermedia testing
• Learner-controlled interactive solutions
• Interactive video
Information Function
Preference: 72% Web, 28% Paper
[Figure: Test Information Functions for Paper-and-Pencil and Web-based Versions; information plotted against theta (-3 to 3) for Form A TIF and Form B TIF, Paper vs. Web]
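A test information function is the sum of the item information functions, so curves like those in the figure can be computed directly; a sketch with invented item parameters for two hypothetical forms:

```python
import numpy as np

def item_info_2pl(theta, a, b):
    """Fisher information of a 2PL item: a^2 * p * (1 - p)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

def test_info(theta, items):
    """Test information = sum of item informations at each theta."""
    return sum(item_info_2pl(theta, a, b) for a, b in items)

theta = np.linspace(-3, 3, 13)
form_a = [(1.2, -0.5), (1.0, 0.0), (1.4, 0.8)]   # hypothetical paper form
form_b = [(1.1, -0.3), (1.3, 0.2), (0.9, 1.0)]   # hypothetical web form
print(np.round(test_info(theta, form_a), 2))
print(np.round(test_info(theta, form_b), 2))
```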
Concept of Validity
Validity is an integrated evaluative judgment of the degree to which
empirical evidence and theoretical rationales support the “adequacy”
and “appropriateness” of “inferences” and “actions” based on test
scores or other modes of assessment.
What is the typical effect across all these influences?
• 900+ meta-analyses
• 50,000 studies
• 240+ million students
The Typical Effect
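The effect sizes (ES) in the tables below are standardized mean differences; a worked sketch of how one study's ES, and a simple average across studies, would be computed (all numbers are invented):

```python
import math

def cohens_d(mean_treat, mean_ctrl, sd_treat, sd_ctrl, n_treat, n_ctrl):
    """Standardized mean difference with a pooled standard deviation."""
    pooled_var = ((n_treat - 1) * sd_treat**2 + (n_ctrl - 1) * sd_ctrl**2) \
                 / (n_treat + n_ctrl - 2)
    return (mean_treat - mean_ctrl) / math.sqrt(pooled_var)

# Hypothetical studies of one influence
studies = [cohens_d(78, 72, 10, 11, 120, 118),
           cohens_d(65, 62, 9, 9, 80, 85)]
print([round(d, 2) for d in studies],
      "average ES:", round(sum(studies) / len(studies), 2))
```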
The disasters ...

Rank | BOTTOM influences | Studies | Effects | ES
113 | Class size | 113 | 802 | .21
114 | Charter schools | 18 | 18 | .20
125 | Matching learning styles | 244 | 1234 | .17
130 | Mentoring | 74 | 74 | .15
131 | Ability grouping | 500 | 1369 | .12
133 | Gender | 3168 | 6293 | .12
134 | Teacher education | 106 | 412 | .12
136 | Teacher subject matter knowledge | 92 | 424 | .09
145 | Class architecture | 315 | 333 | .01
148 | Retention | 229 | 2882 | -.13
Rank | AVERAGE influences | Studies | Effects | ES
41 | Study skills | 668 | 2217 | .59
43 | Outdoor/adventure programs | 187 | 429 | .52
46 | Play programs | 70 | 70 | .50
47 | Second/third chance programs | 52 | 1395 | .50
52 | Early intervention | 1704 | 9369 | .47
54 | Preschool programs | 358 | 1822 | .45
57 | Teacher expectations | 674 | 784 | .43
83 | Effects of testing | 569 | 1749 | .34
92 | Test-based accountability systems | 92 | 92 | .31
Rank | TOP influences | Studies | Effects | ES
1 | Student expectations | 209 | 305 | 1.44
3 | Response to intervention | 13 | 107 | 1.07
4 | Teacher credibility | 51 | 51 | .90
5 | Providing formative evaluation | 30 | 78 | .90
7 | Structured classroom discussion | 42 | 42 | .82
10 | Feedback | 1310 | 2086 | .75
12 | Teacher-student relationships | 229 | 1450 | .72
25 | Not labeling students | 79 | 79 | .61
32 | Worked examples | 62 | 151 | .57
48 | Goals (difficult vs. do your best) | 895 | 1111 | .50
Visible Teaching – Visible Learning
When teachers SEE learning through the eyes of the student
and when students SEE themselves as their own teachers.
[Figure: bar chart of effect sizes (-0.6 to 1.6) for influences, ordered from lowest to highest: Mobility, Welfare policies, Multi-grade/age classes, Whole language, Distance education, Ability grouping, Problem based learning, Teacher immediacy, Family structure, Aptitude/treatment interactions, Co-/team teaching, Teaching test taking, Summer school, Illness, Programmed instruction, Use of calculators, Mainstreaming, Ability grouping for gifted students, Teacher effects, Inductive teaching, Drama/arts programs, Attitude to mathematics/science, Adjunct aids, Time on task, Science, Behavioral organizers/adjunct questions, Expectations, Quality of teaching, Preschool programs, Concentration/persistence/engagement, Small group learning, Parental involvement, Interactive video methods, Peer influences, Visual-perception programs, Worked examples, Concept mapping, Mastery learning, Direct instruction, Problem solving teaching, Self-verbalization/self-questioning, Vocabulary programs, Spaced vs. mass practice, Reciprocal teaching, Classroom behavioral, Providing formative evaluation]
Feedback
Average effect = .40
The Power of Feedback
What do Victorian TEACHERS say ...
• Constructive, commentary, correction, conversation, compliance to task
• Give lots of feedback
RCT and experimental studies
[Diagram: FEEDBACK vs. NO FEEDBACK conditions (including effort; with praise), alongside clarity and challenge of success criteria and prior achievement; feedback types: competence/rubric, comparative/social, individual]
• Must emphasize interpretation and Where to next?
• Must have consequences
• Must tell teachers where and for whom they have had an impact on success
• Must relate to what has been recently taught relative to success criteria
• Must relate to curriculum fidelity: concepts NOT items
• Must be individual and group rich
• Must include Progress as well as Proficiency
• Must be immediate & tailored to the “here and now”
• Must include Diagnostic as well as Success reporting
• Must be based on modern test theory
• Must have multiple modes of delivery
• Must balance surface and deep processing
• Must emphasize test information – particularly formative interpretations
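A reporting engine that satisfies these requirements has to carry more than a single score. One possible shape for a report record, sketched in Python (all field names are invented for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class StudentReport:
    """Pairs proficiency with progress and diagnostic detail (illustrative only)."""
    student_id: str
    concept: str                 # curriculum concept, not an individual item
    proficiency: float           # current level against success criteria
    progress: float              # gain since the last report
    surface_score: float         # surface-level processing
    deep_score: float            # deep-level processing
    diagnostics: list[str] = field(default_factory=list)  # gaps / misconceptions
    next_steps: list[str] = field(default_factory=list)   # "where to next?"

report = StudentReport("s-001", "fractions", proficiency=0.62, progress=0.15,
                       surface_score=0.8, deep_score=0.45,
                       diagnostics=["confuses numerator/denominator"],
                       next_steps=["equivalent fractions"])
print(report.concept, report.progress)
```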
Best test design for Visible Learning
ASSESSMENT REPORTING ENGINE
Additions to reporting
• Video assessment
• Essay scoring
• Complexity/cognitive process scoring, e.g. surface vs. deep items:
  SURFACE: One idea: Who painted Guernica? Many ideas: Outline at least two compositional principles that Picasso used in Guernica.
  DEEP: Relate ideas: Relate the theme of Guernica to a current event. Extend ideas: What do you consider Picasso was saying via his painting of Guernica?
• Reporting progress
• Value added (sketched below)
• Other domains: Year 0-4; adult literacy; linked to LMS, to parents, to curricula change; other tests (spelling, commercial, health)
• Worldwide testing
• Cognitive processes:
  - Pattern recognition
  - Developing production rules
  - Different levels of integration of knowledge
  - Different degrees of procedural skills
  - Speed of processing
  - Misconceptions
  - Reports about cognitive strengths and gaps
  - Activate instructional components
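Reporting progress and value added both presuppose a model of expected growth. One common simplification (not necessarily the approach used here) regresses current scores on prior scores and treats the residual as value added; a sketch with invented data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented reference population: prior-year and current-year scores
prior_pop = rng.normal(50, 10, 500)
curr_pop = 0.8 * prior_pop + 12 + rng.normal(0, 5, 500)

# Expected-growth model fit on the population (simple linear regression)
slope, intercept = np.polyfit(prior_pop, curr_pop, 1)

# One class of 25 students who grew a little more than expected
prior_cls = rng.normal(48, 9, 25)
curr_cls = 0.8 * prior_cls + 14 + rng.normal(0, 5, 25)

# Value added = observed minus expected, averaged over the class
value_added = curr_cls - (slope * prior_cls + intercept)
print("class mean value-added:", round(value_added.mean(), 2))  # roughly +2
```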
Timeline of measurement models:
1788: Gaussian distribution of errors
1905: Spearman, start of CTT
1968: Lord & Novick, CTT to IRT
1998: Hambleton & van der Linden, IRT to MIRT
2012??: need for a new model
The Future of Assessment
TEST ITEMS REPORTING ENGINE
• Visible learning
• Adolescent reputations & risk
• Unlocking formative assessment
• Assessing teachers for professional certification
• Intelligence & intelligence testing
Forthcoming:
• Hattie, J., & Anderman, E. (Eds.). Handbook of Student Achievement.
• Yates, G., & Hattie, J. Making Learning Visible.
jhattie@unimelb.edu.au
The Future of Assessment:
Making learning visible