
Page 1: Laos Session 3: Principles of Reliability and Validity (EN)

Session 3: Principles of Assessment: Validity and Reliability

Professor Jim Tognolini

Page 2: Laos Session 3: Principles of Reliability and Validity (EN)

Introduction to Modern Assessment Theory: A basis for all assessments

During this session we will:

• define reliability
• define measurement error
• examine the sources of measurement error
• define validity and identify threats to validity
• build assessment frameworks
• operationalise frameworks with Tables of Specification

Capacity Development Workshop: Test and Item Development and Design, Laos, September 2016

Page 3: Laos Session 3: Principles of Reliability and Validity (EN)

Reliability

The reliability of results refers to the extent to which the results are consistent, or free from error. The concept of reliability is closely associated with the idea of consistency.

Reliability is not an all-or-nothing concept; there are degrees of reliability.

• How similar are results if students are assessed at different times?
• How similar are results if students are assessed with a different sample of equivalent tasks?
• How similar are results if essays have been marked by different markers?
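Each of these questions points to a different way of estimating reliability: consistency over time (test-retest), over equivalent tasks (alternate forms) and over markers (inter-rater). The sketch below is illustrative only, using invented scores rather than anything from the workshop, and approximates each with a simple Pearson correlation.

```python
import numpy as np

# Invented scores for six students (illustration only).
time_1 = np.array([12, 15, 9, 20, 14, 17])          # same test, first sitting
time_2 = np.array([13, 14, 10, 19, 15, 16])         # same test, later sitting
alternate_form = np.array([11, 16, 8, 21, 13, 18])  # different sample of equivalent tasks
marker_a = np.array([6, 8, 5, 9, 7, 8])             # essay marks from marker A
marker_b = np.array([5, 8, 6, 9, 6, 7])             # same essays, marker B

def consistency(x, y):
    """Pearson correlation as a simple index of consistency between two sets of results."""
    return float(np.corrcoef(x, y)[0, 1])

print("Test-retest:     ", round(consistency(time_1, time_2), 2))
print("Alternate forms: ", round(consistency(time_1, alternate_form), 2))
print("Inter-rater:     ", round(consistency(marker_a, marker_b), 2))
```

The closer each correlation is to 1, the less the results depend on when the students were tested, which tasks they were given, or who marked their work.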


Page 4: Laos Session 3: Principles of Reliability and Validity (EN)

Measurement Error


Page 5: Laos Session 3: Principles of Reliability and Validity (EN)

Sources of Measurement Error

The following are some of the sources of measurement error:

1. Test-taking skills
2. Comprehension of instructions
3. Sampling variance of items
4. Temporary factors such as health, fatigue, motivation and testing conditions
5. Memory fluctuations
6. Marking bias (especially in essays)
7. Guessing
8. Item types

The aim for test developers is to identify sources of measurement error and minimise their impact.
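A standard way of formalising this idea, not given on the slide but consistent with it, is the classical test theory sketch in which each observed result is a true score plus an error component (assumed uncorrelated with the true score):

```latex
X = T + E, \qquad
\operatorname{Var}(X) = \operatorname{Var}(T) + \operatorname{Var}(E), \qquad
\text{reliability} = \frac{\operatorname{Var}(T)}{\operatorname{Var}(X)} = 1 - \frac{\operatorname{Var}(E)}{\operatorname{Var}(X)}
```

On this view, minimising the sources listed above amounts to shrinking the error variance relative to the observed-score variance.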


Page 6: Laos Session 3: Principles of Reliability and Validity (EN)

Validity


Page 7: Laos Session 3: Principles of Reliability and Validity (EN)

Validity

The validity of the results of a test can best be defined as the extent to which the results measure what they purport to measure.

It is the interpretation (including inferences and decisions) that is validated, not the test or the test score.

Messick (1989) also argued that validation can include evaluating the consequences of test use: are the specific benefits likely to be realised?

In 1999 the Standards for Educational and Psychological Testing (AERA, APA and NCME) suggested that validation can be viewed as developing scientifically sound validity arguments to support the intended interpretation of test scores and their relevance to the proposed use.


Page 8: Laos Session 3: Principles of Reliability and Validity (EN)

Threats to Validity

1. Factors in the test itself

I. Unclear directions (e.g. on whether to guess, how to record answers)
II. Reading vocabulary and sentence structure that are too difficult
III. Inappropriate level of difficulty of test items (e.g. guessing)
IV. Poorly constructed test items
V. Ambiguity
VI. Test items (tasks) inappropriate for the content being assessed
VII. Test too short
VIII. Improper arrangement of items
IX. Identifiable pattern of answers

2. Factors in test administration and scoring

I. Insufficient time
II. Cheating
III. Unreliable scoring


Page 9: Laos Session 3: Principles of Reliability and Validity (EN)

Relationship between Validity and Reliability

Reliability is a necessary but insufficient condition for validity.


Page 10: Laos Session 3: Principles of Reliability and Validity (EN)

Some basic assessment theory

• Validity and reliability are not deterministic – the aim is to maximise both.

• Validity is paramount.

• Ways to minimise threats to validity and reliability:
  - Breadth of material sampled – increases validity
  - Guessing (one common correction is sketched below)
  - Quality of items
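One common, purely illustrative way of limiting the effect of guessing on multiple-choice scores (not taken from the workshop materials) is formula scoring, where R is the number of right answers, W the number of wrong answers and k the number of options per item:

```latex
S_{\text{corrected}} = R - \frac{W}{k - 1}
```

For example, 40 right and 12 wrong on four-option items gives a corrected score of 40 - 12/3 = 36. Reducing guessing through clear instructions and better distractors serves the same purpose.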


Page 11: Laos Session 3: Principles of Reliability and Validity (EN)

Assessment frameworks


Page 12: Laos Session 3: Principles of Reliability and Validity (EN)

Preliminary Questions

• Why are we assessing?

• What are we assessing?

• What is the most appropriate way to assess these outcomes?


Page 13: Laos Session 3: Principles of Reliability and Validity (EN)

Assessment framework

Definition of construct → Domains/strands → Sub-domains/sub-strands → Outcomes/content standards → …


Page 14: Laos Session 3: Principles of Reliability and Validity (EN)

Example 1 - Mathematics

Construct: Mathematics

Domains/strands → Sub-domains/sub-strands → Outcomes/content standards …

• Number
  - Addition & subtraction
  - Multiplication and Division
  - Fractions and Decimals
• Measurement
• Space
• Chance


Page 15: Laos Session 3: Principles of Reliability and Validity (EN)

Example 1 - Mathematics

Construct: Mathematics
Domain/strand: Number

Sub-domain/sub-strand: Addition & subtraction
  Outcome/content standard: Students develop facility with number facts and computation with larger numbers in addition and subtraction and an appreciation of the relationship between those facts.
  Progress levels:
    Early Stage 1: Combines, separates and compares collections of objects, describes using everyday language and records using informal methods.
    Stage 1: Uses concrete materials and mental strategies for addition and subtraction involving one- and two-digit numbers.

Sub-domain/sub-strand: Multiplication and Division
  Outcome/content standard: Students develop facility with number facts and computation with larger numbers in multiplication and division and an appreciation of the relationship between those facts.
  Progress levels:
    Early Stage 1: Groups and shares collections of objects, describes using everyday language and records using informal methods.
    Stage 1: Models and uses strategies for multiplication and division.


Page 16: Laos Session 3: Principles of Reliability and Validity (EN)

Example 1 - Mathematics

Construct: Mathematics
Domain/strand: Number

Sub-domain/sub-strand: Addition & subtraction
  Outcome/content standard: Students develop facility with number facts and computation with larger numbers in addition and subtraction and an appreciation of the relationship between those facts.
  Progress levels:
    Stage 2: Uses mental and written strategies for addition and subtraction involving two-, three- and four-digit numbers.
    Stage 3: Selects and applies appropriate strategies for addition and subtraction with numbers of any size.

Sub-domain/sub-strand: Multiplication and Division
  Outcome/content standard: Students develop facility with number facts and computation with larger numbers in multiplication and division and an appreciation of the relationship between those facts.
  Progress levels:
    Stage 2: Uses mental and written strategies for multiplication and division.
    Stage 3: Selects and applies appropriate strategies for multiplication and division.
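Taken together, Example 1 is a nested structure running from the construct down to staged outcomes. The sketch below is illustrative only (Python, with the outcome statements abbreviated and some branches left empty); it simply shows one way such a framework can be represented and traversed.

```python
# Illustrative only: the Example 1 framework as a nested structure,
# construct -> domains/strands -> sub-domains -> outcomes by stage.
framework = {
    "construct": "Mathematics",
    "domains": {
        "Number": {
            "Addition & subtraction": {
                "Early Stage 1": "Combines, separates and compares collections of objects",
                "Stage 1": "Uses concrete materials and mental strategies for addition and subtraction",
                "Stage 2": "Uses mental and written strategies for two-, three- and four-digit numbers",
                "Stage 3": "Selects and applies appropriate strategies for numbers of any size",
            },
            "Multiplication and Division": {
                "Early Stage 1": "Groups and shares collections of objects",
                "Stage 1": "Models and uses strategies for multiplication and division",
            },
            "Fractions and Decimals": {},   # outcomes not shown on these slides
        },
        "Measurement": {},
        "Space": {},
        "Chance": {},
    },
}

# Walk the hierarchy and report how many staged outcomes each sub-domain has.
for domain, sub_domains in framework["domains"].items():
    for sub_domain, stages in sub_domains.items():
        print(f"{domain} / {sub_domain}: {len(stages)} staged outcome(s)")
```

Writing the framework down in this explicit form makes it easy to see where outcomes or progress levels are still missing before items are written.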


Page 17: Laos Session 3: Principles of Reliability and Validity (EN)

Example developmental continuum for mathematics


Page 18: Laos Session 3: Principles of Reliability and Validity (EN)

Assessment framework

Definition of construct → Domains/strands → Sub-domains/sub-strands → Outcomes/content standards → …


Page 19: Laos Session 3: Principles of Reliability and Validity (EN)

Example 2 – Scientific literacy

Construct: Scientific literacy

Domain/strand: Formulating
  Outcome/content standard: Formulating or identifying investigable questions and hypotheses, planning investigations and collecting evidence.

Domain/strand: Interpreting
  Outcome/content standard: Interpreting evidence and drawing conclusions, critiquing the trustworthiness of evidence and claims made by others, and communicating findings.

Domain/strand: Using
  Outcome/content standard: Using understandings for describing and explaining natural phenomena, making sense of reports, and for decision-making.


Page 20: Laos Session 3: Principles of Reliability and Validity (EN)

Example 2 – Scientific literacy

Construct: Scientific literacy

Domain/strand: Formulating
  Outcome/content standard: Formulating or identifying investigable questions and hypotheses, planning investigations and collecting evidence.
  Progress levels:
    Level 1 - Year 2: Responds to the teacher’s questions, observes and describes.
    Level 2 - Year 4: Given a question in a familiar context, identifies a variable to be considered, observes and describes or makes non-standard measurements and limited records of data.

Domain/strand: Interpreting
  Outcome/content standard: Interpreting evidence and drawing conclusions, critiquing the trustworthiness of evidence and claims made by others, and communicating findings.
  Progress levels:
    Level 1 - Year 2: Describes what happened.
    Level 2 - Year 4: Makes comparisons between objects or events observed.

Domain/strand: Using
  Outcome/content standard: Using understandings for describing and explaining natural phenomena, making sense of reports, and for decision-making.
  Progress levels:
    Level 1 - Year 2: Describes an aspect or property of an individual object or event that has been experienced or reported.
    Level 2 - Year 4: Describes changes to, differences between or properties of objects or events that have been experienced or reported.


Page 21: Laos Session 3: Principles of Reliability and Validity (EN)

Example developmental continuum for scientific literacy

[Figure: tasks T1–T12 plotted along a developmental continuum spanning Level 1 (Year 2), Level 2 (Year 4) and Level 3 (Year 6) for the domains Formulating (Domain A), Interpreting (Domain B) and Using (Domain C).]


Page 22: Laos Session 3: Principles of Reliability and Validity (EN)

Building a table of specifications

• Preparing a list of learning outcomes – these describe the types of performance the students are expected to demonstrate (e.g. Knows basic terms: “Writes a definition of each term”; “Identifies the term that represents each weather element”; etc.)

• Outlining the course content – the content describes the area in which each type of performance is to be demonstrated (e.g. “air pressure”; “wind”; “temperature”; etc.)

• Preparing a chart that relates the relative emphasis of the learning objectives to the content through the number, type and percentage of items.


Page 23: Laos Session 3: Principles of Reliability and Validity (EN)

Table of specifications

                                     Learning Outcomes
  Content Area                       Basic Skills   Application   Problem Solving   Total Percentage
  Fractions                               5              5               5                 15
  Mixed numbers                           5              5              10                 20
  Decimals                                5             15              10                 30
  Decimal to Fraction conversions         5             15              15                 35
  Total Percentage Points                20             40              40                100
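As a rough illustration of the third step above (an assumed Python sketch, not part of the workshop materials), the chart can be held as a simple mapping from content areas to outcome percentages, with the row and column totals checked automatically:

```python
# Illustrative table of specifications: content areas x learning outcomes,
# with each cell giving the intended percentage of the test.
spec = {
    "Fractions":                       {"Basic Skills": 5, "Application": 5,  "Problem Solving": 5},
    "Mixed numbers":                   {"Basic Skills": 5, "Application": 5,  "Problem Solving": 10},
    "Decimals":                        {"Basic Skills": 5, "Application": 15, "Problem Solving": 10},
    "Decimal to Fraction conversions": {"Basic Skills": 5, "Application": 15, "Problem Solving": 15},
}

# Row totals: relative emphasis of each content area.
for content_area, cells in spec.items():
    print(f"{content_area}: {sum(cells.values())}%")

# Column totals: relative emphasis of each learning outcome.
outcomes = ["Basic Skills", "Application", "Problem Solving"]
column_totals = {outcome: sum(cells[outcome] for cells in spec.values()) for outcome in outcomes}
print(column_totals)

# The whole chart should account for 100% of the test.
assert sum(column_totals.values()) == 100
```

Keeping the chart in this form makes it straightforward to adjust the emphasis of a content area and confirm that the overall weightings still add to 100 per cent.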


Page 24: Laos Session 3: Principles of Reliability and Validity (EN)

Table of specifications - English
(Columns Identifies, Interprets and Infers are the cognitive levels.)

CONTENT      Identifies       Interprets          Infers                          WEIGHTAGE   MARKS
PLOT         1 SA (2 marks)   1 Essay (3 marks)   1 SA (2 marks)                  28%         7 marks
CHARACTER    1 SA (2 marks)   1 Essay (4 marks)   -                               24%         6 marks
CRISIS       -                -                   1 Performance task (8 marks)    32%         8 marks
LANGUAGE     1 SA (1 mark)    1 SA (1 mark)       1 SA (2 marks)                  16%         4 marks
Weightage    20%              32%                 48%                             100%
Marks        5 marks          8 marks             12 marks                                    25 marks


Page 25: Laos Session 3: Principles of Reliability and Validity (EN)

Table of specifications - Geography
(Columns are the cognitive levels: Basic map skills & understanding, Application, Extended understanding.)

CONTENT              Basic map skills & understanding   Application                     Extended understanding   WEIGHTAGE   MARKS
Physical Landforms   1 SA (2 marks)                     1 Essay (6 marks)               2 SAs (4 marks)          24%         12 marks
Location             4 SAs (8 marks)                    -                               -                        16%         8 marks
Climate              1 SA (2 marks)                     1 Performance task (16 marks)   1 SA (2 marks)           40%         20 marks
Vegetation           2 SAs (4 marks)                    -                               1 Essay (6 marks)        20%         10 marks
Weightage            32%                                44%                             24%                      100%
Marks                16 marks                           22 marks                        12 marks                             50 marks


Page 26: Laos Session 3: Principles of Reliability and Validity (EN)

Constructing a test that operationally defines the scale

Test constructors are challenged by the need to:

1. define items that enable students at different stages along the scale to demonstrate that they have enough of the subject (construct) to answer the item correctly;

2. ensure that the items are assessing the outcomes for the particular location on the scale;

3. ensure that, as the items are being written, the ones intended to be located further towards the top of the scale are, in fact, more demanding than those located towards the bottom of the scale; and

4. ensure that the reason that the items are more demanding is a function of the property/variable that is being measured and not a function of some other extraneous feature (validity).
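A simple empirical check on points 3 and 4, illustrated with an invented Python sketch (the pilot data below are not from the workshop), is to compare each item's intended position on the scale with its observed facility, the proportion of students answering it correctly: items placed higher on the scale should generally show lower facility.

```python
# Illustrative check: do items intended to sit higher on the scale turn out
# to be more demanding (i.e. have a lower proportion of correct answers)?
# Intended positions and facility values below are invented for illustration.
items = [
    # (item id, intended position on the scale: 1 = bottom, facility = proportion correct)
    ("T1", 1, 0.85),
    ("T2", 2, 0.78),
    ("T3", 3, 0.66),
    ("T4", 4, 0.70),   # out of order: easier than the item placed below it
    ("T5", 5, 0.41),
]

# Sort by intended position and flag any item that proved *easier*
# than the item intended to sit below it on the scale.
items_by_position = sorted(items, key=lambda item: item[1])
for lower, higher in zip(items_by_position, items_by_position[1:]):
    if higher[2] > lower[2]:
        print(f"Review {higher[0]}: intended above {lower[0]} on the scale, but more "
              f"students answered it correctly ({higher[2]:.2f} vs {lower[2]:.2f}).")
```

Items flagged in this way may be out of order for reasons unrelated to the construct (unclear wording, an implausible distractor), which is exactly the threat to validity described in point 4.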


Page 27: Laos Session 3: Principles of Reliability and Validity (EN)

Assessment Literacy: Question 1

What is the most important thing to consider when selecting a method for assessing performance against learning objectives?

• how easy the assessment is to score
• how easy the assessment is to prepare
• how useful the assessment is at assessing the learning objective
• how well the assessment is accepted by the school administration


Page 28: Laos Session 3: Principles of Reliability and Validity (EN)

Assessment Literacy: Question 2


What does it mean when you are told that the test is “reliable”?

A. student scores from the assessment can be used for a large number of decisions
B. students who take the same test are likely to get similar scores next time
C. the test score accurately assesses the content
D. the test score is more valid than teacher-based assessments

Page 29: Laos Session 3: Principles of Reliability and Validity (EN)


Assessment Literacy: Question 3

Page 30: Laos Session 3: Principles of Reliability and Validity (EN)


Class teachers in a school want to assess their students’ understanding of the method for solving problems that they have been teaching. Which one of the following would be the most appropriate method for seeing whether the teaching had been effective? Justify your answer.

• select a problem solving book with a problem solving test already in it
• develop an assessment method consistent with what has actually been taught in class
• select a problem solving test (like the PSA) that will give a problem solving mark
• select an assessment that measures students’ attitudes to problem solving strategies


Page 31: Laos Session 3: Principles of Reliability and Validity (EN)

The following Table of Specifications for a Mathematics assessment was prepared by the classroom teacher. Use this Table to answer items 4 and 5. Note: The numbers in the cells refer to the number of items.


                                  Bloom’s Taxonomy
Content Area                      Knowledge   Comprehension   Application   Synthesis   Analysis   Total
Place values and number sense         1             2              2            1           1         7
Space                                 2             3              3            2           0        10
Addition and subtraction              2             4              4            5           1        16
Multiplication & Division             1             3              2            2           2        10
Measurement                           2             2              3            3           3        13
Total                                 8            14             14           13           7        56

Page 32: Laos Session 3: Principles of Reliability and Validity (EN)

Assessment Literacy: Question 4

How many items did the teacher aim to use to assess higher order thinking skills, where higher order thinking skills are those assessed at or above Application in Bloom’s Taxonomy?

A. 14
B. 34
C. 7
D. None of the above


Page 33: Laos Session 3: Principles of Reliability and Validity (EN)

Assessment Literacy: Question 5

Which one of the following statements BEST DEFINES a Table of Specifications?

A. It ensures that the total number of marks for the assessment will equal 100.
B. It classifies educational goals, learning objectives and standards.
C. It relates the content to the cognitive level of the learning objectives for the purpose of improving the validity of the instrument.
D. It is a table that is used by teachers to reliably assess students.
