Where Do We Go from Here?

Next Generation Assessment Systems for Achievement and Accountability

Presentation at the National Conference on Large Scale Assessment

Gregory J. Cizek

University of North Carolina at Chapel Hill

June 2015

Purposes of Assessment: Four Principles

1) All assessments are designed, on purpose, to best fulfill one purpose

2) Some assessments can be made to fulfill multiple purposes, but they are not likely to serve those multiple purposes equally well

3) To demand that they do so is unrealistic

4) Diverse purposes likely require multiple, targeted assessments

What types of test(s) would be most important to have in an assessment system?

Design: Annual large-scale, state-mandated student achievement test (Summative)
Purpose(s): Public and system accountability; fiduciary monitoring of overall achievement in a state; policy evaluation, funding decisions, program and instructional planning at aggregate levels
Options: Matrix, cluster, random, or other sampling strategy of students; random, systematic, purposeful, or census sampling of content standards

Design: Annual large-scale, state-mandated student achievement test (Summative)
Purpose(s): All of the above purposes, plus reporting to students and parents of overall mastery of grade/subject content standards; local and teacher instructional planning; student accountability (e.g., graduation, promotion)
Options: Census sampling of students; random, systematic, purposeful, or census sampling of content standards

Design: Fine-grained, narrowly-focused assessment of student learning (Diagnostic)
Purpose(s): Individually diagnostic, educator-actionable, tailoring of instruction
Options: On demand, individual or group administration, state or locally developed, aggregated (or not), secure (or not)

Design: Instructionally-embedded assessment for learning (Formative)
Purpose(s): Instructional planning for educator and student; student responsibility for goals, monitoring progress, self-evaluation, meta-cognition
Options: As needed, individual or classroom administration, state or locally developed, not aggregated, highly insecure

Why all the controversy about accountability systems?

1) “It’s the accountability, stupid.”

2) The personnel evaluation system in education is broken.

3) Federal shenanigans

4) Teacher preparation requirements: An Epic Fail

Can accountability results improve achievement?


That may or may not be the purpose.


Point of diminishing returns on large-scale, every-pupil, state (or federal) mandated student achievement tests.


Best candidates for improving achievement:

* REAL formative assessment

* ANY content standards that change what happens in classrooms

What are the Psychometric Issues?

1) Clarity of construct

2) Clarity of purpose

Construct Clarity

“A consensus definition of what constitutes college and career readiness does not exist.” (Hao & Wyse, 2015, p. 3)

Construct Clarity (cont’d)

The score at which a student has a 75% probability of getting a score of X or better on a different test [or a grade of Y or better in a different course].

“Intelligence as a measurable capacity must at the start be defined as the capacity to do well on an intelligence test. Intelligence is what the tests test.” (Boring, 1923, p. 35)
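
The 75%-probability definition above can be made concrete with a small sketch. It assumes, purely for illustration, that the chance of reaching score X (or grade Y) on the second measure follows a logistic model in the first test's score. The model form, the function names, and the coefficients are hypothetical and do not come from the presentation. The sketch shows that a benchmark defined this way is simply an inversion of a prediction equation, which is why it says nothing about what readiness is.

```python
import math


def success_probability(score, intercept, slope):
    """P(success on the second measure) under logit(p) = intercept + slope * score."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


def cut_score(intercept, slope, target_prob=0.75):
    """Score at which the modeled success probability equals target_prob."""
    if slope == 0:
        raise ValueError("slope must be non-zero")
    if not 0.0 < target_prob < 1.0:
        raise ValueError("target_prob must be strictly between 0 and 1")
    target_logit = math.log(target_prob / (1.0 - target_prob))
    return (target_logit - intercept) / slope


# Hypothetical coefficients, chosen only to make the example run.
INTERCEPT = -10.0
SLOPE = 0.5

benchmark = cut_score(INTERCEPT, SLOPE)
print(round(benchmark, 2), round(success_probability(benchmark, INTERCEPT, SLOPE), 2))
```

Changing the second test, the value of X, or the 75% target moves the benchmark without changing anything about the students, which is the circularity the Boring quotation points to.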

Clarity of Purpose: Two Examples

Clarity of Purpose: Example 1

“How can we extend the content standards and develop a modified assessment for the most severely cognitively disabled students so that their scores have equivalent meaning as those from the general assessment?”

Clarity of Purpose: Example 2

“How can we shorten this test and not adversely affect the precision of the student ability estimates?”
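
One way to see the tension in this question is the Spearman-Brown formula from classical test theory, sketched below. This is a swapped-in illustration, not the presenter's method: it assumes the removed items are parallel to the retained ones and that scores are reported on a scale with a fixed standard deviation. The ability estimates in the question suggest an IRT setting, where the same trade-off shows up as lost test information. The numeric inputs are hypothetical.

```python
import math


def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return length_factor * reliability / (1.0 + (length_factor - 1.0) * reliability)


def standard_error_of_measurement(sd, reliability):
    """Classical SEM: score standard deviation times sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical inputs: a test with reliability 0.90 on a scale with SD 10.
full_reliability = 0.90
scale_sd = 10.0

# Cutting the test to half its length (length_factor = 0.5).
short_reliability = spearman_brown(full_reliability, 0.5)

sem_full = standard_error_of_measurement(scale_sd, full_reliability)
sem_short = standard_error_of_measurement(scale_sd, short_reliability)
print(round(short_reliability, 3), round(sem_full, 2), round(sem_short, 2))
```

Under these assumptions a shorter test always yields a larger standard error, so the practical question is how much precision a given purpose can afford to lose.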
