TRANSCRIPT
Session 3: Principles of Assessment: Validity and Reliability
Professor Jim Tognolini
Introduction to Modern Assessment Theory: A basis for all assessments
During this session we will:
• define reliability
• define measurement error
• examine the sources of measurement error
• define validity and identify threats to validity
• build assessment frameworks
• operationalise frameworks with Tables of Specification
Capacity Development Workshop: Test and Item Development and Design, Laos,
September 2016
Reliability
The reliability of results indicates the extent to which the results are consistent, or free of error. The concept of reliability is closely associated with the idea of consistency.
Reliability is not an all-or-nothing concept; there can be degrees of reliability.
• How similar are the results if students are assessed at different times?
• How similar are the results if students are assessed with a different sample of equivalent tasks?
• How similar are the results if essays have been marked by different markers?
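The three questions above correspond to test-retest, parallel-forms and inter-marker reliability, each of which is typically estimated as a correlation between two sets of results for the same students. A minimal sketch, with entirely invented scores:

```python
# Estimating test-retest reliability as the Pearson correlation between
# two sets of results for the same students. All scores are hypothetical.
from math import sqrt

time_1 = [12, 15, 9, 20, 17, 11, 14]   # scores on occasion 1
time_2 = [13, 14, 10, 19, 18, 10, 15]  # the same students, occasion 2

def pearson(x, y):
    """Pearson correlation: +1.0 means perfectly consistent results."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

reliability = pearson(time_1, time_2)
print(round(reliability, 2))  # close to 1.0: the two occasions agree well
```

The same computation applies to the other two questions: substitute scores from two equivalent task samples, or from two markers of the same essays.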
Measurement Error
Sources of Measurement Error
The following are some of the sources of measurement error:
1. Test-taking skills
2. Comprehension of instructions
3. Sampling variance of items
4. Temporary factors such as health, fatigue, motivation and testing conditions
5. Memory fluctuations
6. Marking bias (especially in essays)
7. Guessing
8. Item types
The aim for test developers is to identify sources of measurement error and minimise their impact.
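In classical test theory, an observed score X is modelled as a true score T plus random error E, and reliability is the proportion of observed-score variance that is true-score variance. A small simulation sketch (the distribution parameters are invented for illustration):

```python
# Classical test theory in miniature: X = T + E, and
# reliability = var(T) / var(X). All parameters are made up.
import random
from statistics import variance

random.seed(0)

# Simulate 5000 students: true ability plus a single noise term standing in
# for fatigue, guessing, marking bias and the other error sources above.
true_scores = [random.gauss(50, 10) for _ in range(5000)]
errors = [random.gauss(0, 5) for _ in range(5000)]
observed = [t + e for t, e in zip(true_scores, errors)]

# With sd(T) = 10 and sd(E) = 5, the theoretical reliability is
# 100 / (100 + 25) = 0.80; the simulated estimate should be close to that.
reliability = variance(true_scores) / variance(observed)
print(round(reliability, 2))
```

Shrinking the error variance (better instructions, better items, consistent marking) raises the ratio, which is exactly what "minimising the impact of measurement error" means quantitatively.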
Validity
The validity of the results of a test can best be defined as the extent to which the results measure what they purport to measure.
It is the interpretation (including inferences and decisions) that is validated, not the test or the test score.
Messick (1989) also argued that validation can include the evaluation of the consequences of the test: are the specific benefits likely to be realised?
In 1999 the Standards (AERA, APA and NCME) suggested that validation can be viewed as developing scientifically sound validity arguments to support the intended interpretation of test scores and their relevance to the proposed use.
Threats to Validity
1. Factors in the test itself
I. Unclear directions (e.g. how to respond to guessing; how to record answers)
II. Reading vocabulary and sentence structure is too difficult
III. Inappropriate level of difficulty of test items (e.g. guessing)
IV. Poorly constructed test items
V. Ambiguity
VI. Test items (tasks) inappropriate for the content being assessed
VII. Test too short
VIII. Improper arrangement of items
IX. Identifiable pattern of answers
2. Factors in test administration and scoring
I. Insufficient time
II. Cheating
III. Unreliable scoring
Relationship between Validity and Reliability
Reliability is a necessary but insufficient condition for validity.
Some basic assessment theory
• Validity and reliability are not deterministic – maximise validity and reliability
• Validity is paramount
• Ways to minimise threats to validity and reliability:
  – breadth of material sampled (increases validity)
  – guessing
  – quality of items
Assessment frameworks
Preliminary Questions
• Why are we assessing?
• What are we assessing?
• What is the most appropriate way to assess these outcomes?
Assessment framework
Definition of construct
Domains/strands Sub-domains/sub-strands
Outcomes/content standards …
Example 1 - Mathematics
Mathematics
  Domains/strands: Number, Measurement, Space, Chance
  Sub-domains/sub-strands of Number: Addition & subtraction; Multiplication and Division; Fractions and Decimals
  Outcomes/content standards: …
Example 1 - Mathematics
Mathematics: Domains/strands, Sub-domains/sub-strands, Outcomes/content standards, Progress levels

Number – Addition & subtraction
Students develop facility with number facts and computation with larger numbers in addition and subtraction and an appreciation of the relationship between those facts
  Early Stage 1: Combines, separates and compares collections of objects, describes using everyday language and records using informal methods
  Stage 1: Uses concrete materials and mental strategies for addition and subtraction involving one- and two-digit numbers

Number – Multiplication and Division
Students develop facility with number facts and computation with larger numbers in multiplication and division and an appreciation of the relationship between those facts
  Early Stage 1: Groups and shares collections of objects, describes using everyday language and records using informal methods
  Stage 1: Models and uses strategies for multiplication and division
Example 1 - Mathematics
Number – Addition & subtraction
Students develop facility with number facts and computation with larger numbers in addition and subtraction and an appreciation of the relationship between those facts
  Stage 2: Uses mental and written strategies for addition and subtraction involving two-, three- and four-digit numbers
  Stage 3: Selects and applies appropriate strategies for addition and subtraction with numbers of any size

Number – Multiplication and Division
Students develop facility with number facts and computation with larger numbers in multiplication and division and an appreciation of the relationship between those facts
  Stage 2: Uses mental and written strategies for multiplication and division
  Stage 3: Selects and applies appropriate strategies for multiplication and division
Example developmental continuum for mathematics
Example 2 – Scientific literacy
Scientific literacy
Domains/strands Outcomes/content standards …
Formulating
Formulating or identifying investigable questions and hypotheses, planning investigations and collecting evidence
Interpreting
Interpreting evidence and drawing conclusions, critiquing the trustworthiness of evidence and claims made by others, and communicating findings
Using
Using understandings for describing and explaining natural phenomena, making sense of reports, and for decision-making
Example 2 – Scientific literacy
Scientific literacy
Domains/strands Outcomes/content standards …
Formulating
Formulating or identifying investigable questions and hypotheses, planning investigations and collecting evidence
Level 1 - Year 2: Responds to the teacher’s questions, observes and describes
Level 2 - Year 4: Given a question in a familiar context, identifies a variable to be considered, observes and describes or makes non-standard measurements and limited records of data
Interpreting
Interpreting evidence and drawing conclusions, critiquing the trustworthiness of evidence and claims made by others, and communicating findings
Level 1 - Year 2: Describes what happened
Level 2 - Year 4: Makes comparisons between objects or events observed
Using
Using understandings for describing and explaining natural phenomena, making sense of reports, and for decision-making
Level 1 - Year 2: Describes an aspect or property of an individual object or event that has been experienced or reported
Level 2 - Year 4: Describes changes to, differences between or properties of objects or events that have been experienced or reported
Example developmental continuum for scientific literacy
[Figure: tasks T1–T12 mapped onto a developmental continuum, by domain (Formulating – Domain A; Interpreting – Domain B; Using – Domain C) and by level (Level 1 - Year 2; Level 2 - Year 4; Level 3 – Year 6).]
Building a table of specifications
• Preparing a list of learning outcomes – these describe the types of performances the students are expected to demonstrate (e.g. Knows basic terms – “Writes a definition of each term”; “Identifies the term that represents each weather element”; etc.)
• Outlining the course content – the content describes the area in which each type of performance is to be demonstrated (e.g. “air pressure”; “wind”; “temperature”; etc.)
• Preparing a chart that relates the relative emphasis of the learning objectives to the content through the number, type and percentage of items.
Table of specifications

Content Area                    | Basic Skills | Application | Problem Solving | Total Percentage
Fractions                       |      5       |      5      |        5        |        15
Mixed numbers                   |      5       |      5      |       10        |        20
Decimals                        |      5       |     15      |       10        |        30
Decimal to Fraction conversions |      5       |     15      |       15        |        35
Total Percentage Points         |     20       |     40      |       40        |       100
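The percentages in the sample table above can be checked mechanically. A minimal sketch, with the table hard-coded as a Python dictionary, that recomputes the row and column totals:

```python
# The sample mathematics Table of Specifications, transcribed as data.
# Cell values are the percentage weights from the table above.
tos = {
    "Fractions": {"Basic Skills": 5, "Application": 5, "Problem Solving": 5},
    "Mixed numbers": {"Basic Skills": 5, "Application": 5, "Problem Solving": 10},
    "Decimals": {"Basic Skills": 5, "Application": 15, "Problem Solving": 10},
    "Decimal to Fraction conversions": {"Basic Skills": 5, "Application": 15, "Problem Solving": 15},
}

# Row totals: the relative emphasis given to each content area.
row_totals = {area: sum(cells.values()) for area, cells in tos.items()}

# Column totals: the relative emphasis given to each learning outcome.
col_totals = {}
for cells in tos.values():
    for outcome, pct in cells.items():
        col_totals[outcome] = col_totals.get(outcome, 0) + pct

# The grand total must be 100% or the blueprint is mis-specified.
assert sum(row_totals.values()) == 100

print(col_totals)  # {'Basic Skills': 20, 'Application': 40, 'Problem Solving': 40}
```

A check like this is useful whenever a blueprint is revised: adding or reweighting items in one cell silently changes the relative emphasis of a whole row and column.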
Table of specifications - English

CONTENT   | Identifies     | Interprets        | Infers                       | Weightage | Marks
PLOT      | 1 SA (2 marks) | 1 Essay (3 marks) | 1 SA (2 marks)               | 28%       | 7 marks
CHARACTER | 1 SA (2 marks) | 1 Essay (4 marks) |                              | 24%       | 6 marks
CRISIS    |                |                   | 1 Performance task (8 marks) | 32%       | 8 marks
LANGUAGE  | 1 SA (1 mark)  | 1 SA (1 mark)     | 1 SA (2 marks)               | 16%       | 4 marks
Weightage | 20%            | 32%               | 48%                          | 100%      |
Marks     | 5 marks        | 8 marks           | 12 marks                     |           | 25 marks
Table of specifications - Geography

CONTENT            | Basic map skills & Understanding | Application                | Extended Understanding | Weightage | Marks
Physical Landforms | 1 SA (2 marks)                   | 1 Essay (6 marks)          | 2 SAs (4 marks)        | 24%       | 12 marks
Location           | 4 SAs (8 marks)                  |                            |                        | 16%       | 8 marks
Climate            | 1 SA (2 marks)                   | 1 Perform. task (16 marks) | 1 SA (2 marks)         | 40%       | 20 marks
Vegetation         | 2 SAs (4 marks)                  |                            | 1 Essay (6 marks)      | 20%       | 10 marks
Weightage          | 32%                              | 44%                        | 24%                    | 100%      |
Marks              | 16 marks                         | 22 marks                   | 12 marks               |           | 50 marks
Constructing a test that operationally defines the scale.
Test constructors are challenged by the need to
1. define items that enable students at different stages along the scale to demonstrate that they have enough of the subject (construct) to correctly answer the item;
2. ensure that the items are assessing the outcomes for the particular location on the scale;
3. ensure that, as the items are being written, the ones intended to be located further towards the top of the scale (the line) are, in fact, more demanding than those located towards the bottom; and
4. ensure that the reason that the items are more demanding is a function of the property/variable that is being measured and not a function of some other extraneous feature (validity).
Assessment Literacy: Question 1
What is the most important thing to consider when selecting a method for assessing performance against learning objectives?
• how easy the assessment is to score
• how easy the assessment is to prepare
• how useful the assessment is at assessing the learning objective
• how well the assessment is accepted by the school administration
Standards, Standard Setting and Maintenance, March 2015 27
Assessment Literacy: Question 2
What does it mean when you are told that the test is “reliable”?
A. student scores from the assessment can be used for a large number of decisions
B. students who take the same test are likely to get similar scores next time
C. the test score accurately assesses the content
D. the test score is more valid than teacher-based assessments
Assessment Literacy: Question 3
Class teachers in a school want to assess their students’ understanding of the method for solving problems that they have been teaching. Which one of the following would be the most appropriate method for seeing whether the teaching had been effective? Justify your answer.
• select a problem solving book with a problem solving test already in it
• develop an assessment method consistent with what has actually been taught in class
• select a problem solving test (like the PSA) that will give a problem solving mark
• select an assessment that measures students’ attitudes to problem solving strategies
The following Table of Specifications for a Mathematics assessment was prepared by the classroom teacher. Use this Table to answer items 4 and 5. Note: The numbers in the cells refer to the number of items.
Content Area                  | Knowledge | Comprehension | Application | Synthesis | Analysis | Total
Place values and number sense |     1     |       2       |      2      |     1     |    1     |   7
Space                         |     2     |       3       |      3      |     2     |    0     |  10
Addition and subtraction      |     2     |       4       |      4      |     5     |    1     |  16
Multiplication & Division     |     1     |       3       |      2      |     2     |    2     |  10
Measurement                   |     2     |       2       |      3      |     3     |    3     |  13
Total                         |     8     |      14       |     14      |    13     |    7     |  56
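The totals in the table above can be recomputed programmatically. A small sketch with the counts transcribed as a Python dict:

```python
# Items per content area and Bloom's Taxonomy level, transcribed from the
# teacher's Table of Specifications above (one list entry per level).
levels = ["Knowledge", "Comprehension", "Application", "Synthesis", "Analysis"]
items = {
    "Place values and number sense": [1, 2, 2, 1, 1],
    "Space": [2, 3, 3, 2, 0],
    "Addition and subtraction": [2, 4, 4, 5, 1],
    "Multiplication & Division": [1, 3, 2, 2, 2],
    "Measurement": [2, 2, 3, 3, 3],
}

# Recompute the column totals (items per cognitive level) and the grand total.
col_totals = [sum(row[i] for row in items.values()) for i in range(len(levels))]
print(dict(zip(levels, col_totals)))
print(sum(col_totals))  # 56 items in total
```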
Assessment Literacy: Question 4
How many items did the teacher aim to use to assess higher order thinking skills, where higher order thinking skills are those at or above Application in Bloom’s Taxonomy?
A. 14
B. 34
C. 7
D. None of the above
Assessment Literacy: Question 5
Which one of the following statements BEST DEFINES a Table of Specifications?
A. It ensures that the total number of marks for the assessment will equal 100.
B. It classifies educational goals, learning objectives and standards.
C. It relates the content to the cognitive level of the learning objectives for the purpose of improving the validity of the instrument.
D. It is a table that is used by teachers to reliably assess students.