Malaysia, Rigour, Benchmarking and ICAS


13-09-2017

Malaysia, Rigour, Benchmarking and ICAS
Context, purpose, validity, rigour and psychometrics

Professor Kelvin Gregory, UNSW Global

It has been almost 20 years since I was here in Kuala Lumpur.


Outline
– Malaysian educational context
– Assessment defined and principles
– ICAS, validity, uses and interpretations
– Rigour and higher order thinking
– Bloom's cognitive taxonomy: detail; exercise
– Webb's depth of knowledge: detail; exercise
– Hess's cognitive rigour matrix: detail; ICAS examples
– Psychometrics

A school year (204 student days)
– Term 1 (Jan-Jun): 99 days
– Term 2 (Jun-Nov): 105 days
– Mid-term breaks: 9 days in each term. Mid-year break: 16 days.
Primary education: 6 years. Secondary education: 5 years.
What progress should a student make? How will you know a student is making progress? ICAS is in Term 1? Why?


Education Minister Datuk Seri Mahdzir Khalid
As the current Minister of Education, he oversees policies at pre-school, school and pre-university levels. He is leading the large-scale transformation efforts outlined in the Malaysia Education Blueprint (MEB) 2013-2025, which aims to place Malaysia among the world's best in education provision. As a leader in education, he has instituted key transformational initiatives that incorporate best practices across the ecosystem.

https://asia.bettshow.com/speakers/yb-dato%E2%80%99-seri-mahdzir-bin-khalid

External assessments and government/school policy
Policies, procedures and guidelines for external assessment use.
How should the external assessment integrate with the school?
Most teachers already know the most proficient learners. What is really needed? What is a sound purpose for the external assessment? Then how will the assessment fulfil that purpose?


Malaysia Education Blueprint 2013-2025
"Revamp national examinations and school-based assessments to gradually increase percentage of questions that test higher-order thinking. By 2016, higher-order thinking questions will comprise at least 40% of questions in UPSR and 50% in SPM. This change in examination design means that teachers will focus less on predicting what topics and questions will come out and drilling for content recall. Instead, students will be trained to think critically and to apply their knowledge in different settings. Similarly, school-based assessments will also shift their focus to testing for higher-order thinking skills." (E-11)

National examinations:
– Malaysian Certificate of Education or Sijil Pelajaran Malaysia (SPM)
– Year 6 Primary School Evaluation Test or Ujian Pencapaian Sekolah Rendah (UPSR)

Malaysia Education Blueprint 2013-2025
"As the TIMSS and PISA international assessments have demonstrated, our students struggle with higher-order thinking skills." (E-11)
"The aspiration is for Malaysia to be in the top third of countries in terms of performance in international assessments, as measured by outcomes in TIMSS and PISA, within 15 years." (E-9)


Teachers make a difference
"Our research indicates that there is a 15% variability difference in student achievement between teachers within the same schools."
– Deborah Loewenberg Ball, Dean of Education, University of Michigan

"What Matters Very Much is Which Classroom?"
"If a student is in one of the most effective classrooms he or she will learn in 6 months what those in an average classroom will take a year to learn. And if a student is in one of the least effective classrooms in that school, the same amount of learning takes 2 years."


“Excellent examples exist” (E-6)

Education Ministry moves to decentralise some matters
"Previously when it comes to assessment, everything is given to the examination board, so like school-based assessment...schools can do the assessment."
Education Minister Datuk Seri Mahdzir Khalid, 10 December 2016
http://www.themalaymailonline.com/malaysia/article/education-ministry-moves-to-decentralise-some-matters#OoElB22H61i8SEwh.99


Assessment Defined
"Assessment is the systematic collection, interpretation and use of information to give a deeper appreciation of what learners know and understand, their skills and personal capabilities, and what their learning experiences enable them to do."
– Note the parts of the definition
– How does this fit with you? With your school experience?

Each school and school system should have an operationalised definition of assessment

Northern Ireland Curriculum (2013) Guidance on Assessment in the Primary School

Educators know that more assessment is not the key to learning
– "Children don't grow by weighing them"
Research has shown that educators experience difficulties in designing appropriate assessments.
There is a reasonable argument to integrate ICAS into the school's assessment system
– Not more assessment
– Rather better assessment
– But you need a system to use the assessment and the assessment data
Enable educators to learn from the ICAS assessments:
– Use it as an external benchmark
– And an external validation of their work


Principles of Assessment
The following five principles underpin assessments:
– complementary to and supportive of learning;
– valid and reliable;
– fit for purpose and manageable;
– supports educators' professional judgement; and
– supports accountability.
These principles all require careful thought and support before any assessment usage. Some of these principles are contentious. Can you identify which ones?

An example assessment design
Make a unit test in three parts:
– Part 1: Definitions (tell learners this)
– Part 2: Problems practised in class, ranging from simple to more challenging
– Part 3: Unseen problems
  Routine ones like those seen in class
  Non-routine, novel, unseen
– Construct the test so that learners can pass if they do well on Parts 1 and 2, but they only achieve the highest achievement standard by answering all problems correctly
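One way to picture this marking rule is as a tiny grading function. This is only an illustrative sketch: the `grade` function, the 0.5 pass threshold and the band names are assumptions, not part of the original design.

```python
def grade(part1: float, part2: float, part3: float) -> str:
    """Achievement band for the three-part unit test.

    Each argument is the fraction correct (0.0-1.0) on that part.
    The 0.5 pass threshold is an assumed, illustrative value.
    """
    if part1 >= 0.5 and part2 >= 0.5:
        # Doing well on definitions and practised problems earns a pass...
        if part1 == 1.0 and part2 == 1.0 and part3 == 1.0:
            # ...but the top band requires every problem correct,
            # including the unseen, non-routine ones in Part 3.
            return "highest"
        return "pass"
    return "not yet"
```

The point of the design survives the sketch: unseen problems never block a pass, but the highest standard is unreachable without them.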


Vygotsky: Zone of Proximal Development
The Zone of Proximal Development is what a child can do with assistance today
– It's the learning zone
Some core uses of assessment
– Use assessments for learning to locate students and then teach
– Use assessments of learning for certification
– Use external low-stakes assessments to guide teachers, as an external reference to develop teaching and learning
[Diagram: three nested zones – the child's current achievement, the ZPD (the learning zone), and the potential development area]


A word or two about validity
Validity is a central concept in assessment. Validity refers to the interpretations and uses of the assessment scores.
– Desired/intended interpretations and uses must be specified ahead of usage
– And then evidence must be gathered that the assessment scores/findings are well supported
  Evidence from the commissioning of the assessment through to item writing, reports and report usage
  Support from theory and research
– Each ICAS subject assumes that there is one dominant cognitive ability or proficiency underpinning learners' responses
  Except for ICAS English, English reading proficiency should be a minor skill requirement


Some Draft Statements about ICAS
ICAS affirms educational policies and practices through assessments with demonstrable links to curricula, and best teaching and assessment practices.
ICAS is an external high-quality suite of engaging and challenging assessments designed to assist and recognise learning.
ICAS provides an independent benchmark of learner achievement and progress.

Reflection on the Draft Statements about ICAS
How do you interpret these statements within the Malaysian context?
– "demonstrable links to curricula, standards and best teaching and assessment practices"
– "engaging and challenging assessments designed to assist and recognise learning"
– "an independent benchmark of learner achievement and progress"
What would you change? Why? What is missing?


Some uses of ICAS
As a benchmarking test
– ICAS should be demonstrably aligned to the curriculum and assess what should be learnt
  This is being done for Australia, New Zealand and England
– It needs to be done with the revised Malaysian syllabi
– It can be used as a high-quality external assessment
  An external reference point
  This quality would be reinforced if a relationship between ICAS and the Malaysian internal and external tests could be established
ICAS has design features which have implications for school assessment

Interpreting ICAS scores
Atomistic level
– Learner responses to individual items
Response patterns
– Make sense of learner attributes by looking for patterns
Summary scores at either domain or sub-domain levels, gathered over time
– Interpreted in comparison with other learners or groups of learners
– Interpreted in comparison to an achievement scale or curriculum
– Monitor learning progress at individual, class and school levels
What are your intended interpretations of learner responses and scores?


ICAS Processes
ICAS is developed to a plan
– Systematic guidance for all test development activities: construct; desired test interpretations; major sources of validity evidence; clear purpose; desired inferences; psychometric model; timelines; security; quality control
Content definition and test specifications
Item and test development (more about this soon)
Test production and administration
Scaling and reporting
– Scaling uses the Rasch model, as in the TIMSS 1995 and PISA studies
All done to increasingly meet the AERA/APA/NCME Standards for Educational and Psychological Testing
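For readers unfamiliar with it, the dichotomous Rasch model expresses the probability of a correct response as a function of learner ability θ and item difficulty b on the same logit scale. A minimal sketch (the function name is ours; the actual ICAS scaling pipeline is of course more involved):

```python
import math

def rasch_probability(theta: float, b: float) -> float:
    """P(correct) under the dichotomous Rasch model:
    P = exp(theta - b) / (1 + exp(theta - b)).
    theta: learner ability; b: item difficulty (same logit scale)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals difficulty the probability is exactly 0.5;
# ability above difficulty pushes it towards 1, below towards 0.
print(rasch_probability(0.0, 0.0))  # → 0.5
```

One design consequence: because only the difference theta - b matters, learners and items can be placed on a single common scale, which is what makes the cross-year benchmarking described later possible.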

ICAS Assessment Frameworks
Internal documents developed and used by subject-assessment experts
– Guide and shape item and test development so that the assessment satisfies the purpose
– Enable/facilitate the establishment of a validity framework
  A network of evidence, theory and argument that supports the intended interpretations and uses of the scores
Frameworks contain:
– Construct definitions, main and sub-domains
– Content/curriculum maps
  Australia links have been done
  England and New Zealand curriculum links are being identified
  We need to explore the links with the revised Malaysian syllabi documents
– Test blueprint
  Hess's cognitive rigour matrix item allocation
  – Combining Bloom's taxonomy and Webb's depth of knowledge



The need for more higher order thinking and rigour

Malaysia Education Blueprint 2013-2025
"Thinking skills: Every child will learn how to continue acquiring knowledge throughout their lives (instilling a love for inquiry and lifelong learning), to be able to connect different pieces of knowledge, and to create new knowledge. Every child will master a range of important cognitive skills, including critical thinking, reasoning, creative thinking, and innovation. This is an area where the system has historically fallen short, with students being less able than they should be in applying knowledge and thinking critically outside familiar academic contexts." (E-10)

Malaysia Education Blueprint 2013-2025
"Revamp national examinations and school-based assessments to gradually increase percentage of questions that test higher-order thinking. By 2016, higher-order thinking questions will comprise at least 40% of questions in UPSR and 50% in SPM. This change in examination design means that teachers will focus less on predicting what topics and questions will come out and drilling for content recall. Instead, students will be trained to think critically and to apply their knowledge in different settings. Similarly, school-based assessments will also shift their focus to testing for higher-order thinking skills." (E-11)
– This is a call for increased rigour


Higher-order thinking and Cognitive Rigour
Activity: Take a minute and write your definition of each term as it relates to Malaysian teaching, learning and assessment
– Higher order thinking
– Cognitive rigour

Cognitive Rigour
• The kind and level of thinking required of learners to successfully engage with and solve a task
• Cognitive rigour is marked and measured by the depth and extent to which students are challenged and engaged to demonstrate and communicate their knowledge and thinking.
• It also marks and measures the depth and complexity of learners' learning experiences.
• The ways in which learners interact with content
• And this is where ICAS excels


Cognitive Rigour
Imagine a primary class has just read some version of a short story.
– What is a basic comprehension question you might ask?
– What is a more rigorous question you might ask?
What system might you use to guide your questioning?
– Questions seek to elicit evidence of specific cognitive (latent) functioning
– How will you know that your questions are suitable for eliciting this information?

The Ant and the Grasshopper
In a field one summer's day a Grasshopper was hopping about, chirping and singing to its heart's content. An Ant passed by, bearing along with great toil an ear of corn he was taking to the nest.
"Why not come and chat with me," said the Grasshopper, "instead of toiling and moiling in that way?"
"I am helping to lay up food for the winter," said the Ant, "and recommend you to do the same."
"Why bother about winter?" said the Grasshopper; "We have got plenty of food at present." But the Ant went on its way and continued its toil.
When the winter came the Grasshopper had no food and found itself dying of hunger - while it saw the ants distributing every day corn and grain from the stores they had collected in the summer. Then the Grasshopper knew: It is best to prepare for days of need.

What is a basic comprehension question you might ask?
What is a more rigorous question you might ask?


Cognitive Rigour
Refers to the kind and level of thinking required of learners to successfully engage with and solve a task.
ICAS uses Karin Hess's Cognitive Rigor Matrix
– Kind of thinking (the verbs): for this we use Bloom's cognitive taxonomy
– Level of thinking (the depth): how deeply do you have to understand the content to successfully interact with it? How complex is the content? We leverage Webb's Depth of Knowledge

Revised Bloom's Taxonomy
Revised by his doctoral students
– Including Professor Peter Airasian
Defines the kind of knowledge and type of thinking students are expected to demonstrate in order to answer questions, address problems, accomplish tasks, and analyse texts and topics.


Revised Bloom's Taxonomy has Two Dimensions
The Knowledge Dimension (Content and Concepts)
– The subject matter content (knowledge): factual, conceptual, procedural, metacognitive
The Cognitive Process Dimension (Cognition)
– What students must do (thinking) with what they are learning
  Lower order thinking: remember, understand and apply
  Higher order thinking: analyse, evaluate and create

Revised Bloom's knowledge dimension
Dimension: components
– Factual knowledge: terminology, elements and components
– Conceptual knowledge: categories, principles, and theories
– Procedural knowledge: specific skills and techniques
– Metacognitive knowledge: general knowledge and self-knowledge


Bloom's cognitive processes
– Remembering: exhibit memory of previously learned material by recalling facts, terms, basic concepts, and answers.
– Understanding: demonstrate understanding of facts and ideas by organizing, comparing, translating, interpreting, giving descriptions, and stating main ideas.
– Applying: solve problems in new situations by applying acquired knowledge, facts, techniques and rules in a different way.
– Analysing: examine and break information into parts by identifying motives or causes; make inferences and find evidence to support generalizations.
– Evaluating: present and defend opinions by making judgments.
– Creating: compile information together in a different way by combining elements in a new pattern or proposing alternative solutions.
Analysing, evaluating and creating are higher order thinking (HOT).

Knowledge dimension × cognitive process dimension (Remember, Understand, Apply, Analyse, Evaluate, Create)

Factual knowledge (terminology; elements & components):
– Remember: label a map; list names
– Understand: interpret a paragraph; summarise a book
– Apply: use mathematics algorithms
– Analyse: categorise words
– Evaluate: critique an article
– Create: create a short story

Conceptual knowledge (categories; principles; theories):
– Remember/Understand: describe the taxonomy in own words
– Apply: write objectives using the taxonomy
– Analyse: differentiate levels of the cognitive taxonomy
– Evaluate: critique written objectives
– Create: create a new classification system

Procedural knowledge (specific skills & techniques; criteria for use):
– Remember/Understand: paraphrase the problem-solving process in own words
– Apply: use the problem-solving process for an assigned task
– Analyse: compare convergent and divergent techniques
– Evaluate: critique the appropriateness of techniques used in a case analysis
– Create: develop an original approach to problem solving

Metacognitive knowledge (general knowledge; self-knowledge):
– Remember/Understand: describe implications of learning styles
– Apply: develop study skills appropriate to learning style
– Analyse: compare elements of dimensions in learning styles
– Evaluate: critique the appropriateness of a particular learning-style theory to own learning
– Create: create an original learning-style theory



Remember (I Know)
APPROPRIATE VERBS: Recognize, Observe, List, Acquire, Remember, Tell, Underline, State, Label, Record, Write, Relate, Match, Memorize, Show, Describe, Repeat, Identify, Name, Know
PRODUCTS: chart; model; worksheet; draw a map; picture; demonstrate

Understand (I Comprehend)
APPROPRIATE VERBS: Report, Communicate, Discuss, Review, Debate, Generalize, Interpret, Draw, Relate, Change, Prepare, Express, Describe, Explain, Paraphrase, Give Main Idea, Translate, Infer, Restate, Transform, Locate, Report, Summarize
PRODUCTS: diagram; time line; teach a lesson; diorama; make a filmstrip; make a recording; game; report


Apply (I Can Use It)
APPROPRIATE VERBS: Apply, Show, Role play, Practice, Solve, Experiment, Manipulate, Restructure, Construct Models, Illustrate, Employ, Investigate, Operate, Sketch, Use, Interpret, Demonstrate, Dramatize, Transfer, Report, Conduct, Schedule, Classify, Solve
PRODUCTS: survey; diary; scrapbook; photographs; cartoon; learning center; construction; illustration; stitchery; sculpture; model; mobile

Analyze (I Can Be Logical)
APPROPRIATE VERBS: Analyze, Inventory, Experiment, Investigate, Diagram, Deduce, Inspect, Differentiate, Contrast, Categorize, Question, Criticize, Separate, Examine, Discriminate, Dissect, Calculate, Survey, Detect, Relate, Distinguish, Compare, Develop, Debate
PRODUCTS: graph; survey; family tree; time line; questionnaire; commercial; diagram; chart; report; fact file


Evaluate (I Can Judge)
APPROPRIATE VERBS: Judge, Measure, Rate, Verify, Decide, Standardize, Estimate, Justify, Select, Validate, Revise, Argue, Evaluate, Critique, Appraise, Debate, Choose, Consider, Score, Recommend, Assess
PRODUCTS: survey; self-evaluation; editorial; experiment; panel evaluation; recommendation; conclusion; court trial; essay; letter

Create (I Plan)
APPROPRIATE VERBS: Create, Assemble, Improve, Modify, Predict, Derive, Plan, What if…, Construct, Invent, Manage, Produce, Suppose, Organize, Set Up, Imagine, Design, Compose, Prepare, Propose, Arrange, Formulate
PRODUCTS: story; poem; play; radio show; puppet show; news article; invention; dance; mural; comic strip; recipe; pantomime; travelogue



Exercise
Look at the handout
– Note the lower order / higher order thinking divide is different
  Lower order: knowledge, understanding
  Higher order: application, analysis, synthesis, evaluation (original taxonomy) or application, analysis, evaluation, creation (revised taxonomy)
Now look at the first 10 ICAS Mathematics items for Standard 3
– What Bloom's cognitive levels are applicable?

2013 ICAS Mathematics (Standard 3)
Classifications from two expert mathematics assessors (assessor 1 / assessor 2):
Q1: Remember / Understanding
Q2: Understanding / Understanding
Q3: Analysis / Application
Q4: Application / Analysis
Q5: Understanding / Application
Q6: Remember / Understand
Q7: Application / Understand
Q8: Understanding / Application
Q9: Application / Application
Q10: Understanding / Application
80 percent agreement at LOT/HOT level
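The agreement figure can be reproduced in a few lines, assuming the revised-Bloom divide from the earlier slide (remember, understand and apply count as lower order; analyse, evaluate and create as higher order). The `lot_hot_agreement` helper is ours, not part of any ICAS tooling:

```python
# Labels counted as higher-order thinking (HOT) under the revised-Bloom
# divide; everything else is treated as lower-order thinking (LOT).
HOT = {"Analysis", "Analyse", "Evaluation", "Evaluate", "Creation", "Create"}

def lot_hot_agreement(rater_a, rater_b):
    """Percentage of items on which two raters agree at the LOT/HOT level."""
    matches = sum((a in HOT) == (b in HOT) for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)

# Q1-Q10 classifications from the table above (assessor 1, assessor 2):
assessor_1 = ["Remember", "Understanding", "Analysis", "Application",
              "Understanding", "Remember", "Application", "Understanding",
              "Application", "Understanding"]
assessor_2 = ["Understanding", "Understanding", "Application", "Analysis",
              "Application", "Understand", "Understand", "Application",
              "Application", "Application"]

print(lot_hot_agreement(assessor_1, assessor_2))  # → 80.0
```

Only Q3 and Q4 cross the LOT/HOT boundary between the two assessors, giving 8 of 10 matches; collapsing to the coarser LOT/HOT level is why agreement is higher than it would be on exact Bloom labels.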


2013 ICAS Mathematics (Standard 3)
Classifications from two expert mathematics assessors (assessor 1 / assessor 2):
Q11: Analysis / Analysis
Q12: Application / Application
Q13: Remember / Analysis
Q14: Application / Application
Q15: Application / Application
Q16: Application / Understanding
Q17: Remember / Understanding
Q18: Understanding / Application
Q19: Remember / Understanding
Q20: Application / Application
90 percent agreement at LOT/HOT level

2013 ICAS Mathematics (Standard 3)
Classifications from two expert mathematics assessors (assessor 1 / assessor 2):
Q21: Analysis / Application
Q22: Application / Application
Q23: Application / Application
Q24: Understanding / Understanding
Q25: Understanding / Application
Q26: Understanding / Understanding
Q27: Analysis / Analysis
Q28: Application / Application
Q29: Application / Application
Q30: Analysis / Understanding
80 percent agreement at LOT/HOT level


2013 ICAS Mathematics (Standard 3)
Classifications from two expert mathematics assessors (assessor 1 / assessor 2):
Q31: Understanding / Application
Q32: Application / Analysis
Q33: Application / Application
Q34: Analysis / Analysis
Q35: Application / Understanding
Q36: Application / Analysis
Q37: Application / Application
Q38: Application / Analysis
Q39: Application / Analysis
Q40: Application / Analysis
50 percent agreement at LOT/HOT level


Depth of Knowledge


Depth of knowledge can vary on a number of dimensions:
– the level of cognitive complexity of information students should be expected to know;
– how well they should be able to transfer this knowledge to different contexts;
– how well they should be able to form generalizations; and
– how much prerequisite knowledge they must have in order to grasp ideas.

The depth of knowledge required by a learning activity or within an assessment is related to:
– the number of connections with regard to concepts and ideas a learner needs to make in order to produce a response;
– the level of reasoning; and
– the use of other self-monitoring processes
  Are they aware of their own learning? Can they self-reflect, self-assess, self-direct?


How does Depth of Knowledge work?
DOK is broken into 4 levels. As the levels increase, students must demonstrate increasingly complex mental strategies.
Level One is the most basic level, essentially the "definition" stage.
Higher levels of DOK require that students solve problems in new and creative ways, and allow for multiple solutions to those problems.

Norman Webb's Depth of Knowledge (DOK)
Level 1: Recall and Reproduction – requires recall of information, such as a fact, definition, term, or performance of a simple process or procedure
Level 2: Skills and Concepts – requires more than one cognitive process or step beyond recall
Level 3: Strategic Thinking – requires deep understanding exhibited through planning, using evidence, and more demanding cognitive reasoning
Level 4: Extended Thinking – requires high cognitive demand; consists of complex tasks done over an extended period of time
The depth of knowledge levels in the model developed by Webb establish how deeply or extensively students are expected to transfer and use what they are learning.


Depth of knowledge focuses on complexity
DOK is a reference to the complexity of mental processing that must occur to answer a question, perform a task, or generate a product.
Adding is a mental process. Knowing the rule for adding is the intended outcome that influences the DOK. Once someone learns the "rule" of how to add, 4 + 4 is DOK 1 and is also easy. Adding 4,678,895 + 9,578,885 is still DOK 1 but may be more difficult.

Depth of knowledge, not difficulty of question
• Difficulty is a reference to how many students answer a question correctly.
• "How many of you know the definition of acclimatize?" DOK 1 – recall. If all of you know the definition, this is an easy question.
• "How many of you know the definition of quark?" DOK 1 – recall. If most of you do not know the definition, this is a difficult question.


Depth of Knowledge and Bloom's Taxonomy
Bloom's cognitive taxonomy focuses upon the type of thinking: the verb used to describe the cognitive processes expected to be used in solving the task. The Depth of Knowledge is NOT determined by the verb, but by the context in which the verb is used and the depth of thinking required.
DOK 3: Describe a model that you might use to represent the relationships that exist within the rock cycle. (Requires deep understanding of the rock cycle and a determination of how best to represent it.)
DOK 2: Describe the difference between metamorphic and igneous rocks. (Requires cognitive processing to determine the differences in the two rock types.)
DOK 1: Describe three characteristics of metamorphic rocks. (Simple recall.)

DOK is about what follows the verb...
What comes after the verb is more important than the verb itself. "Analyze this sentence to decide if the commas have been used correctly" does not meet the criteria for high cognitive processing.

  Suddenly, there came a clap of thunder.

The learner who has been taught the rule for using commas is merely using the rule.


Depth of Knowledge Level 1: Recall and Reproduction
DOK 1 requires recall of information, such as a fact, definition, term, or performance of a simple process or procedure, as well as performing a simple algorithm or applying a formula. Answering a Level 1 item can involve following a simple, well-known procedure or formula. Simple skills and abilities or recall characterize DOK 1.

Examples:
1. Identify a diagonal in a geometric figure.
2. Multiply two numbers.
3. Find the area of a rectangle.
4. Convert scientific notation to decimal form.
5. Measure an angle.

Source: Kentucky Department of Education (2007). Support Materials for Core Content for Assessment.


Depth of Knowledge Level 2: Skills and Concepts
DOK 2 includes the engagement of some mental processing beyond recalling or reproducing a response. Items require students to make some decisions as to how to approach the question or problem. Keywords distinguishing Level 2 may include classify, organize, estimate, make observations, collect and display data, and compare data. These actions imply more than one mental or cognitive process/step.

Examples:
1. Classify quadrilaterals.
2. Compare two sets of data using the mean, median, and mode of each set.
3. Determine a strategy to estimate the number of jelly beans in a jar.
4. Extend a geometric pattern.
5. Organize a set of data and construct an appropriate display.

Source: Kentucky Department of Education (2007). Support Materials for Core Content for Assessment.


Depth of Knowledge Level 3: Strategic Thinking
DOK 3 requires reasoning, planning, using evidence, and more demanding cognitive reasoning. The cognitive demands at Level 3 are complex and abstract. An assessment item that has more than one possible answer and requires students to justify the response they give would most likely be a Level 3.

Examples:
1. Solve a multiple-step problem and provide support with a mathematical explanation that justifies the answer.
2. Write a mathematical rule for a non-routine pattern.
3. Explain how changes in the dimensions affect the area and perimeter/circumference of geometric figures.
4. Provide a mathematical justification when a situation has more than one outcome.
5. Interpret information from a series of data displays.

Source: Kentucky Department of Education (2007). Support Materials for Core Content for Assessment.


Depth of Knowledge Level 4: Extended Thinking
DOK 4 requires high cognitive demand and is very complex. It requires complex reasoning, planning, developing, and thinking. Students are expected to make connections (relate ideas within the content or among content areas) and select or devise one approach among many alternatives on how the situation can be solved. Due to the complexity of cognitive demand, DOK 4 often requires an extended period of time.

Examples (specify a problem, identify solution paths, solve the problem, and report the results):
1. Collect data over time taking into consideration a number of variables and analyze the results.
2. Model a social studies situation with many alternatives and select one approach to solve with a mathematical model.
3. Develop a rule for a complex pattern and find a phenomenon that exhibits that behavior.
4. Complete a unit on formal geometric constructions, such as nine-point circles or the Euler line.
5. Construct a non-Euclidean geometry.

Source: Kentucky Department of Education (2007). Support Materials for Core Content for Assessment.



Examples of DOK 1 in Music (item – why is this DOK 1?)
1. Name the notes of the C Major scale – simple recall of pre-learned knowledge
2. Name 4 periods of classical music – simple recall, but must be taught
3. Know that a sharp raises a note ½ step – identify a #, recognize that it raises a pitch

Examples of DOK 2 in Music (item – why is this DOK 2?)
1. Read and perform a simple rhythm – if the student interprets the rhythm (as opposed to repeating it), it is DOK 2
2. Play a simple melody or accompaniment – the student must make sense of written notation and perform it


Examples of DOK 3 in Music (item – why is this DOK 3?)
1. Improvise a simple melody – new application of complex processes
2. Perform as a member of a conducted ensemble – students make individual choices about performance
3. Compose a single-line melody – new application of complex processes

Examples of DOK 4 in Music (item – why is this DOK 4?)
1. Compose using 2 or more parts – requires application of harmony, voice leading, cadence
2. Improvise over a given chord progression – requires the student to apply all previous learning in a new and novel situation
3. Perform in a student-led ensemble or solo with accompaniment – the student makes all choices


DOK Levels for Mathematics

Level 1: Includes the recall of information such as a fact, definition, term, or simple procedure, as well as performing a simple algorithm or applying a formula. In mathematics a one-step, well-defined, and straight algorithmic procedure is included at this lowest level. Other key words that signify a Level 1 include "identify," "recall," "recognize," "use," and "measure." Verbs such as "describe" and "explain" could be classified at different levels depending on what is to be described and explained.

Level 2: Keywords that generally distinguish a Level 2 item include "classify," "organize," "estimate," "make observations," "collect and display data," and "compare data." These actions imply more than one step. For example, to compare data requires first identifying characteristics of the objects or phenomenon and then grouping or ordering the objects. Some action verbs, such as "explain," "describe," or "interpret" could be classified at different levels depending on the object of the action.

Level 3: Requires reasoning, planning, using evidence, and a higher level of thinking than the previous two levels. In most instances, requiring students to explain their thinking is a Level 3. Activities that require students to make conjectures are also at this level. The cognitive demands at Level 3 are complex and abstract. An activity that has more than one possible answer and requires students to justify the response they give would most likely be a Level 3. Other Level 3 activities include drawing conclusions from observations; citing evidence and developing a logical argument for concepts; explaining phenomena in terms of concepts; and using concepts to solve problems.

Level 4: At Level 4, the cognitive demands of the task should be high and the work should be very complex. Students should be required to make several connections (relate ideas within the content area or among content areas) and have to select one approach among many alternatives on how the situation should be solved, in order to be at this highest level. Level 4 activities include designing and conducting experiments; making connections between a finding and related concepts and phenomena; combining and synthesizing ideas into new concepts; and critiquing experimental designs.

DOK Levels for Social Studies

Level 1 "Recall of Information"
– Generally requires students to identify, list, or define
– Recall who, what, when and where
– Identify specific information contained in maps, charts, tables, and drawings

Level 2 "Basic Reasoning"
– Convert information from one form to another: contrast and compare; cause and effect; categorize into groups; distinguish between fact and opinion

Level 3 "Complex Reasoning"
– Apply a concept in other contexts
– Draw conclusions or form alternative conclusions
– Analyze how changes have affected people or places
– Analyze similarities and differences in issues or problems

Level 4 "Extended Reasoning"
– Analyze and explain multiple perspectives or issues
– Make predictions with evidence as support
– Plan and develop solutions to problems
– Describe, define, and illustrate common social, historical, economic, or geographical themes and how they relate

Exercise: Now look at the first 10 ICAS Mathematics items for Standard 3.
– What Depth of Knowledge levels are applicable?

2013 ICAS Mathematics (Standard 3)

DOK classifications from two expert mathematics assessors (assessor 1, assessor 2):
Q1: 1, 1    Q2: 1, 1    Q3: 1, 1    Q4: 1, 1    Q5: 1, 1
Q6: 1, 2    Q7: 1, 2    Q8: 2, 1    Q9: 1, 2    Q10: 1, 1

70 percent DOK agreement; assessor 1 is more conservative.
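Agreement figures like these can be reproduced with a few lines; a minimal sketch in Python (the two rating vectors are invented for illustration, not read from the table):

```python
# Exact-agreement rate between two raters' DOK codes.
# Hypothetical ratings, chosen only to illustrate the calculation.
assessor_1 = [1, 1, 1, 1, 1, 1, 1, 2, 1, 1]
assessor_2 = [1, 1, 1, 1, 1, 2, 2, 2, 2, 1]

matches = sum(a == b for a, b in zip(assessor_1, assessor_2))
agreement = matches / len(assessor_1)
print(f"{agreement:.0%}")  # 70%
```

Exact agreement ignores how far apart disagreeing codes are; a weighted statistic such as Cohen's kappa would also correct for chance agreement.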

2013 ICAS Mathematics (Standard 3)

DOK classifications from two expert mathematics assessors (assessor 1, assessor 2):
Q11: 2, 1   Q12: 2, 2   Q13: 1, 1   Q14: 1, 2   Q15: 1, 1
Q16: 1, 2   Q17: 1, 2   Q18: 1, 2   Q19: 1, 2   Q20: 2, 2

40 percent agreement; the first assessor is more conservative (stricter).

2013 ICAS Mathematics (Standard 3)

DOK classifications from two expert mathematics assessors (assessor 1, assessor 2):
Q21: 1, 2   Q22: 2, 2   Q23: 2, 2   Q24: 2, 2   Q25: 1, 1
Q26: 2, 2   Q27: 1, 2   Q28: 1, 2   Q29: 2, 2   Q30: 2, 2

70 percent DOK agreement.

2013 ICAS Mathematics (Standard 3)

DOK classifications from two expert mathematics assessors (assessor 1, assessor 2):
Q31: 1, 2   Q32: 2, 2   Q33: 2, 3   Q34: 2, 2   Q35: 2, 2
Q36: 2, 3   Q37: 2, 3   Q38: 2, 3   Q39: 2, 2   Q40: 2, 2

50 percent DOK agreement; it is very hard to write a multiple-choice item so that it is DOK 3.

Karen Hess's Cognitive Rigor Matrix

The matrix crosses Bloom's taxonomy (Remember, Understand, Apply, Analyse, Evaluate, Create) with Webb's Depth of Knowledge (Recall & Reproduction, Skills & Concepts, Strategic Thinking, Extended Thinking). Some Bloom-by-DOK combinations, such as Remember beyond Recall & Reproduction, are not in the matrix.

Sample English descriptors from the Hess Cognitive Rigor Matrix, grouped by Depth of Knowledge level:

DOK Level 1, Recall and Reproduction
– Recall, locate basic facts, definitions, details and events
– Select appropriate word when intended meaning is clear
– Use language structure or word relationships (synonyms/antonyms)
– Identify information in a graphic, table, visual, etc.
– Brainstorm ideas, concepts, problems, or perspectives related to a topic

DOK Level 2, Skills and Concepts
– Use context to find meaning; obtain and use information in text features
– Explain relationships; summarize central ideas
– Compare literary elements, facts, terms and events; analyze format, organization and text structures
– Generate conjectures or hypotheses based on observations or prior knowledge

DOK Level 3, Reasoning
– Explain, generalize or connect ideas using supporting evidence (quote, text, evidence, data, etc.)
– Use concepts to solve non-routine problems and justify solutions with evidence
– Cite evidence and develop a logical argument for conjectures based on one text or problem
– Analyze or interpret author's craft (e.g., literary devices, viewpoint, or potential bias) to critique a text
– Evaluate relevancy, accuracy and completeness of information

DOK Level 4, Extended Thinking
– Explain how concepts relate to other content domains
– Devise an approach among alternatives to research a novel problem
– Analyze multiple sources or texts; analyze complex abstract themes
– Synthesize information across multiple sources; articulate a new voice, theme, perspective
– Develop a complex model or approach for a given situation; develop an alternative solution

Bloom's Taxonomy + Webb's DOK = the Hess CRM

Little Red Riding Hood

Imagine your class has just read a version of Little Red Riding Hood (or another short story, in the language of the class).
– What is a basic comprehension question you might ask?
– What is a more rigorous question you might ask?
– What must you consider when developing each type of question?

Depth + Thinking: sample questions across the matrix
(Levels: 1 Recall & Reproduction; 2 Skills & Concepts; 3 Strategic Thinking/Reasoning; 4 Extended Thinking)

Remember
– Level 1: What color was Red's cape? Who is this story about?
Understand
– Level 1: Who are the main characters? What was the story's setting?
– Level 2: Retell or summarize the story in your own words.
– Level 3: What is the author's message or theme? Justify your interpretation using text evidence.
Apply
– Level 2: Identify words/phrases that helped you to know the sequence of events in the story.
Analyze
– Level 1: Is this a realistic or fantasy story?
– Level 2: Compare the wolf character to the character of Red. How are they alike/different?
– Level 3: Is this a realistic or fantasy story? Justify your interpretation using text evidence.
– Level 4: Are all wolves (in literature) like the wolf in this story? Support your response using evidence from this and other texts.
Evaluate
– Level 3: What is your opinion about the cleverness of the wolf? Justify your opinion using text evidence.
– Level 4: Which version has the most satisfying ending? (establish criteria first, then locate evidence)
Create
– Level 2: Write text messages between Red and her mother explaining the wolf incident.

No longer "higher order thinking" focused; deep thinking focused
– What we have thought of as "higher order" (analysis, evaluation, creative thinking) might only be engaging or fun, and not always deeper
– Many critical thinking examples do not go deep or get to DOK 3 or 4 (e.g., interpret/solve and justify)
– Shift our thinking from "higher order" to deeper learning, and that can mean deeper understanding, deeper application, deeper analysis, etc.

USA study of 8428 Year 3 assessments
(Levels: 1 Recall & Reproduction; 2 Skills & Concepts; 3 Strategic Thinking, supported with data, equations, models, etc.; 4 Extended Thinking, cross domains; figures in parentheses are from the study)

Remember
– Level 1: Know math facts, terms (34)
Understand
– Level 1: Attend to precision; evaluate expressions, plot a point (18)
– Level 2: Model with mathematics; estimate, predict, observe, explain relationships (2)
– Level 3: Construct viable arguments; geometry proof
– Level 4: Integrate concepts across domains
Apply
– Level 1: Calculate, measure, make conversions (28)
– Level 2: Make sense of routine problems (8)
– Level 3: Make sense of non-routine problems
– Level 4: Design and conduct a project
Analyze
– Level 1: Identify a pattern; locate information in a table (2)
– Level 2: Use tools strategically; classify, organize data, extend a pattern (6)
– Level 3: Reason abstractly; generalize a pattern
– Level 4: Analyze multiple sources of evidence
Evaluate
– Level 3: Critique the reasoning of others
Create
– Level 4: Design a complex model (1)

ICAS and Cognitive Rigor Matrix
(Levels: 1 Recall & Reproduction; 2 Skills & Concepts; 3 Strategic Thinking, supported with data, equations, models, etc.; 4 Extended Thinking, cross domains)

Remember
– Level 1: Know math facts, terms
Understand
– Level 1: Attend to precision; evaluate expressions, plot a point
– Level 2: Model with mathematics; estimate, predict, observe, explain relationships
– Level 3: Construct viable arguments; geometry proof
– Level 4: Integrate concepts across domains
Apply
– Level 1: Calculate, measure, make conversions
– Level 2: Make sense of routine problems
– Level 3: Make sense of non-routine problems
– Level 4: Design and conduct a project
Analyze
– Level 1: Identify a pattern; locate information in a table
– Level 2: Use tools strategically; classify, organize data, extend a pattern
– Level 3: Reason abstractly; generalize a pattern
– Level 4: Analyze multiple sources of evidence
Evaluate
– Level 3: Critique the reasoning of others
Create
– Level 4: Design a complex model

2013 Year 3 ICAS Mathematics: items per Cognitive Rigor Matrix cell
(columns: Level 1 Recall & Reproduction; Level 2 Skills & Concepts; Level 3 Strategic Thinking; Level 4 Extended Thinking)

Remember: 50
Understand: 62 | 39
Apply: 85 | 1211 2
Analyze: 34 | 35 2
Evaluate: –
Create: 1

ICAS, Depth of Knowledge 1 and Bloom's Recall

DOK 1: Recall and Reproduction
Recognizes, responds, remembers, memorizes, restates, absorbs, describes, demonstrates, follows directions, applies routine processes, definitions, and procedures

How many candles are there on Anna's cake?
(A) 6
(B) 7
(C) 10
(D) 14

ICAS, Depth of Knowledge 1 and Bloom's Understanding

DOK 1: Recall and Reproduction
Recognizes, responds, remembers, memorizes, restates, absorbs, describes, demonstrates, follows directions, applies routine processes, definitions, and procedures

Ann has some pictures.
Which picture is in the third row from the top and the second column from the left?

ICAS, Depth of Knowledge 1 and Bloom's Application

DOK 1: Recall and Reproduction
Recognizes, responds, remembers, memorizes, restates, absorbs, describes, demonstrates, follows directions, applies routine processes, definitions, and procedures

Sam has tiles like this:
He wants to cover the hexagon with tiles without gaps or overlapping.
How many tiles does Sam need?
(A) 14
(B) 12
(C) 10
(D) 8

Malaysian mathematics for primary grades

By the end of Grade 4, students should be able to do the following:
– Numbers …
– Measurement: understand time, including the 12-hour system; perform mathematical operations and solve problems involving units of time and the calendar; measure length, mass, and volume of liquid in metric units; calculate unit conversions; and perform mathematical operations and solve problems involving length, mass, and volume of liquid
– Shapes and Space: identify two- and three-dimensional shapes; calculate perimeter, area, and volume; and solve problems involving perimeter, area, and volume of squares, rectangles, cubes, and cuboids
– Statistics: extract and interpret information from pictographs and bar graphs

Interpret and construct simple pictograms, tally charts, block diagrams and simple tables

How many more people play tennis than cricket?
(A) 1 (compares cricket and basketball)
(B) 3 (key)
(C) 4 (compares tennis and basketball)
(D) 9 (number playing tennis)

Bloom's Taxonomy: Analyse
Webb's DOK Level 2: Skills and Concepts

How many more people play tennis than cricket?
(A) 1 (compares cricket and basketball)
(B) 3 (key)
(C) 4 (compares tennis and basketball)
(D) 9 (number playing tennis)

CRM, DOK 2, Bloom Analyse: Categorize, classify materials, data, figures based on characteristics; organize or order data; compare/contrast figures or data; select appropriate graph and display data; interpret data from a simple graph; extend a pattern.

Interpret and construct simple pictograms, tally charts, block diagrams and simple tables

There are 14 people in this group. Each person plays at least one sport. Only one person plays all three sports. How many people play exactly two sports?
(A) 4
(B) 5
(C) 6
(D) 7

ICAS 2013 Year 3 Q38
Bloom's Taxonomy: Analyse; Webb's DOK Level 3: Strategic Thinking/Reasoning

(A) 4   (B) 5   (C) 6   (D) 7

CRM, DOK 3, Bloom Analyse: Compare information within or across data sets or texts; analyse and draw conclusions from data, citing evidence; generalize a pattern; interpret data from a complex graph; analyse similarities/differences between procedures or solutions.

Bloom's Taxonomy: Analyse; Webb's DOK Level 1: Recall and Reproduction

Which of these is a tetromino?
(A) (3 squares)   (B) (no common sides)   (C) (5 squares)   (D) (key)

CRM, DOK 1, Bloom Analyse: Retrieve information from a table or graph to answer a question; identify whether specific information is contained in graphic representations (e.g., table, graph, T-chart, diagram).

Bloom's Taxonomy: Analyse; Webb's DOK Level 2: Skills and Concepts

Sam made this tetromino.
Which of these is a tetromino different to Sam's?
(A) (a rotation)   (B) (missing a common side)   (C) (5 squares)   (D) (key)

CRM, DOK 2, Bloom Analyse: Categorize, classify materials, data, figures based on characteristics; organize or order data; compare/contrast figures or data; select appropriate graph and display data; interpret data from a simple graph; extend a pattern.

ICAS 2012 Year 6 Q39
Bloom's Taxonomy: Analyse; Webb's DOK Level 3: Strategic Thinking/Reasoning

Solution: there are 5 different tetrominoes.

CRM, DOK 3, Bloom Analyse: Compare information within or across data sets or texts; analyze and draw conclusions from data, citing evidence; generalize a pattern; interpret data from a complex graph; analyze similarities/differences between procedures or solutions.

Malaysian mathematics

By the end of Grades 7 to 9, students should be able to do the following:
1. Numbers …
2. Shapes and Space …
3. Relationships: understand and solve problems involving algebraic expressions; write, formulate, and solve problems involving linear equations, including simultaneous equations; solve linear inequalities, including simultaneous linear inequalities with one unknown; draw graphs of functions; understand and solve problems involving ratio and proportion; collect and organize data systematically; understand measures of central tendency (mean, mode, and median); and represent and interpret data in pictograms, bar graphs, line graphs, and pie charts, and solve related problems

http://timssandpirls.bc.edu/timss2015/encyclopedia/countries/malaysia/the-mathematics-curriculum-in-primary-and-lower-secondary-grades/

Bloom's Taxonomy: Analyse; Webb's DOK Level 1: Recall and Reproduction

Jim is making a pattern. Each shape in the pattern uses orange and white tiles.
How many white tiles does Jim add to make each new shape?
(A) 2 (number of orange tiles added)
(B) 4 (key)
(C) 6 (number of tiles added for each shape)
(D) 8 (number of white tiles in Shape 2)

CRM, DOK 1, Bloom Analyse: Retrieve information from a table or graph to answer a question; identify whether specific information is contained in graphic representations (e.g., table, graph, T-chart, diagram).

Bloom's Taxonomy: Analyze; Webb's DOK Level 2: Skills and Concepts

Jim is making a pattern. Each shape in the pattern uses orange and white tiles.
How many white tiles should Jim use in Shape 6?
(A) 13 (number of orange tiles in Shape 6)
(B) 20 (number of white tiles in Shape 5)
(C) 24 (key)
(D) 37 (number of tiles in Shape 6)

CRM, DOK 2, Bloom Analyze: Categorize, classify materials, data, figures based on characteristics; organize or order data; compare/contrast figures or data; select appropriate graph and display data; interpret data from a simple graph; extend a pattern.

ICAS 2011 Year 7 Q27
Bloom's Taxonomy: Apply; Webb's DOK Level 3: Strategic Thinking/Reasoning

Key: (A)

CRM, DOK 3, Bloom Apply: Design investigation for a specific purpose or research question; conduct a designed investigation; use concepts to solve non-routine problems; use reasoning, planning, and evidence; translate between problem and symbolic notation when not a direct translation.

Bloom's Taxonomy: Understand; Webb's DOK Level 1: Recall and Reproduction

Anish used this number line to show the distance between 1 and 5 is 4 units.
Which of these numbers is also 4 units from 1?
(A) -5 (uses opposite sign to 5)
(B) -4 (uses opposite sign to 4)
(C) -3 (key)
(D) -2 (moves to the left counting markers, including 1)

CRM, DOK 1, Bloom Understand: Evaluate an expression; locate points on a grid or a number on a number line; solve a one-step problem; represent math relationships in words, pictures, or symbols; read, write, compare decimals in scientific notation.

Bloom's Taxonomy: Understand; Webb's DOK Level 2: Skills and Concepts

Anish used this number line to show that the distance between 1 and 5 is 4 units.
Which of these numbers is greater than 4 units from 1?
(A) -4 (key)
(B) -3 (-3 is 4 units from 1)
(C) -2 (-2 is less than 4 units from 1)
(D) 3 (3 is less than 4 units from 1; 3 is greater than 1)

CRM, DOK 2, Bloom Understand: Specify and explain relationships (e.g., non-examples/examples); make and record observations; explain reasoning; summarize results or concepts; make basic inferences or logical predictions from data/observations; use models to represent or explain a mathematical concept; make and explain estimates; provide justification for steps taken.

ICAS 2011 Year 7 Q20
Bloom's Taxonomy: Understand; Webb's DOK Level 3: Strategic Thinking/Reasoning

Key: (C)

CRM, DOK 3, Bloom Understand: Explain, generalize, or connect ideas using supporting evidence; make and justify conjectures; explain thinking when more than one response is possible; explain phenomena (observed, in data) in terms of concepts; provide a mathematical or scientific justification.

Psychometrics

Literally the application of mathematical (measurement) models to psychological traits.
Can be applied naively:
– Most people can construct a test, add up scores on a test, summarise achievement using letter grades, and make judgements
– Most of these assessments would fail to meet assessment-industry standards
– Most people have experienced assessments, so most people are "experts" in assessment, and this belief can be challenging to shake
ICAS leverages the best psychometric practices.

Psychometrics is often focused on:
– Constructing tools (tests, assessments, surveys, scales) to collect data
– Developing, using and evaluating procedures to convert that data into measurements
Often what is being measured is latent or hidden, and so much attention is paid to describing a construct.

UNSW Global Psychometrics

We use many of the same processes used in international large-scale assessments:
– TIMSS, PIRLS, PISA

ICAS test design

Mostly multiple-choice questions
– Some constructed-response questions
– Writing is assessed using a prompt and marked with a rubric
Curriculum focused (criterion assessment)
– Items designed with the primary purpose of assessing learning
– Easy-to-hard difficulty
Medal focused (normative assessment)
– Items designed with the primary purpose of identifying a very small group of medal winners
Progression

Can you identify the medal questions? These items can be answered correctly by a very small proportion of any country.
We place all items within the same subject onto a common scale: ICAS monitors growth.

Main analysis software

We have our own Java-coded Rasch software for analysis and reporting purposes.
We also use Conquest, RUMM2030, SPSS, SAS
– Used to develop prototypes for reporting, e.g., a senate-weighted international percentile using "super-populations"
The Key Stage SATs seem to be analysed using WinSteps
– We have this software but choose to use the above software
And we use R
– This open-source software is being used to parallel-process all aspects of our work
– Used to develop novel internal and external reporting systems

We use two theories for ICAS

Classical test theory
– The theory behind most classroom tests
– All score points are equal (some items don't contribute as they should, though)
– Uses total score, averages and correlations
– Most development occurred from the late 1800s to the 1950s
Item response theory (also called modern test theory)
– The theory behind most large-scale tests (which are often comprised of many small items); developed from the 1950s onwards, with peak development from the 1970s to 2000
– Focuses upon what each item is telling us about the person

ICAS scores

Item level
– Correct or incorrect (and the choice made) provided to student and school
– Informative, especially if the item is related to the curriculum
– Not summarised; the amount of data may be daunting
Sub-domain and domain level
– Either as number correct or as ICAS scale scores
– Number correct is the commonly used method
– ICAS scale scores are derived from item response theory and are used for trend purposes
– We could generate sub-scale scaled scores later

Classical test theory and ICAS

Classical test theory is primarily used to check the quality of the assessment
– An internal quality-assurance step
– Item difficulty, item discrimination, reliability
But components of this theory are used in reports
– The number correct

Item difficulty

From classical test theory
– The average score on an item; for polytomous items, the average score on the item divided by the maximum possible score on the item
– Valid range is from zero to one, inclusive
– This statistic is sample dependent: the item difficulty depends in part on the item, but also on the group of learners answering the item

An ICAS test is typically arranged so items appear in order of difficulty
– Easiest item first, hardest items last
– But order is affected by item layout and paper constraints
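A minimal sketch of the p-value calculation for dichotomous items (the response matrix is invented; rows are learners, columns are items):

```python
import numpy as np

# Invented 0/1 response matrix: rows = learners, columns = items.
responses = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
])

# Item difficulty (p-value) is the proportion correct per item;
# note it is sample dependent: a different group gives different p-values.
difficulty = responses.mean(axis=0)
print(difficulty)  # easier items have higher p-values
```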

Item discrimination

From classical test theory
– A statistical measure of how well an item discriminates between those who have mastered the subject and those who have not
– One definition: divide the class into three groups based upon total (subscale) score and compute the difference in item difficulty between the upper and lower groups
– Another: a correlation between scored responses on the item and the total test score (or total subscale score), or the total score with the item removed
– There are many other definitions, evolving as technology evolved
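The correlational definition (item against total-with-item-removed) can be sketched as follows; the response matrix is invented:

```python
import numpy as np

def corrected_discrimination(responses, item):
    """Correlation between one item and the total score with that item removed."""
    item_scores = responses[:, item]
    rest_total = responses.sum(axis=1) - item_scores
    return float(np.corrcoef(item_scores, rest_total)[0, 1])

# Invented 0/1 response matrix: rows = learners, columns = items.
responses = np.array([
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
])
print(round(corrected_discrimination(responses, 1), 2))  # 0.43
```

Removing the item from the total avoids the item correlating with itself, which otherwise inflates the statistic on short tests.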

Item discrimination guide

Most guides for item discrimination are arbitrary
– But all treat negative item-discrimination values as indicative of a highly problematic item
There can be an interaction between item discrimination and item difficulty
– Very easy items don't discriminate well, because people in the lower group are also getting the item correct
The guide:
– 0.2 or higher is desirable
– Between 0 and 0.2 is problematic (maybe)

Point-biserial graph – Year 5 English

Reliability

Reliability is a statistical measure of how well each item hits, or is focused on, the same construct
– It is closely related to the average correlation across all possible item pairs
There are a range of possible reliability statistics:
– Cronbach's alpha, Kuder-Richardson 20 (don't use KR21)
Reliabilities range from 0 to 1
– Below 0.7 the test scores are pretty useless
– Above 0.95 there is a lot of redundancy in the items
– ICAS aims for between 0.80 and 0.85

Reliability and validity

The familiar target diagram is common (search the internet for images of reliability and validity and you will see what I mean), but it is wrong.

Reliability and true score

Under classical test theory we assume that the observed test score X is a reflection of the person's hypothetical real or true score T plus error: X = T + ε.
If that person did the test many times, without learning, then the average of the observed test scores would equal their true score: T = Σ(X)/n.
We can estimate how closely a particular observed test score captures the true score, where s is the test-score standard deviation and α is the test reliability:

se = s · √(1 − α)

Reliability and true score, English NSW 2016

Year | Std Dev | Alpha | SE   | 95% Range
02   | 5.17    | 0.77  | 2.49 | 4.87
03   | 7.76    | 0.86  | 2.92 | 5.73
04   | 8.15    | 0.87  | 2.88 | 5.65
05   | 8.98    | 0.89  | 2.99 | 5.86
06   | 8.95    | 0.89  | 3.02 | 5.91
07   | 8.43    | 0.86  | 3.15 | 6.18
08   | 8.60    | 0.86  | 3.17 | 6.22
09   | 9.33    | 0.87  | 3.43 | 6.72
10   | 9.66    | 0.88  | 3.38 | 6.63
11   | 10.02   | 0.88  | 3.43 | 6.72
12   | 9.85    | 0.89  | 3.34 | 6.54

The 2009 and 2010 Key Stage 2 tests had reliabilities of 0.9 and higher (they were double length, though).

George and Mallery (2003) provide the following rules of thumb: "> .9 Excellent, > .8 Good, > .7 Acceptable, > .6 Questionable, > .5 Poor, and < .5 Unacceptable" (p. 231).

Item response theory

Essentially assumes that each response tells us something about the person responding and the item itself
– So the analysis is at the item level and the person level
– Individual estimates of ability (and uncertainty)
– Analysis requires advanced mathematics and specialist software
IRT models are "strong models"
– They make strong assumptions: if these are met, they work well; if they are not met, then ….
IRT models are all probabilistic models
– They are mathematical models saying how likely something is to be the case
– Their predecessors were deterministic models (like Guttman's)

Rasch Model

There are many IRT models; ICAS uses the Rasch model
– The same model that has been used in PISA, TIMSS, Key Stage SATs, and other studies
The Rasch model is a measurement model
– It has mathematical features that allow claims of measurement (in a philosophical way, like measurement of temperature and weight)
– Some other IRT models are not measurement models; they summarise the data

For a person of ability θ and an item of difficulty δ (both in logits):

P(X = 1 | θ, δ) = e^(θ - δ) / (1 + e^(θ - δ))
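The dichotomous Rasch model above is a one-liner in code; a minimal sketch:

```python
import math

def rasch_p(theta, delta):
    """P(X = 1 | theta, delta) = exp(theta - delta) / (1 + exp(theta - delta))."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

print(rasch_p(0.0, 0.0))  # 0.5: learner and item at the same scale location
print(rasch_p(1.0, 0.0))  # ~0.73: learner one logit above the item's difficulty
```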

Wright Map

A Wright map places items and learners on the same logit scale (the slide shows roughly -2 to +2, extending to ±∞), with items on one side and learners on the other: a very easy item sits opposite the weakest learner, and the hardest item opposite the strongest learner. Where an item and a learner are at the same location, P = 0.5. A region dense with items carries a lot of information, so the ability estimates of learners located there are most accurate; a learner in a region with little information will have a large error. Learners at the same location have the same ability estimate but may have different ability errors (or uncertainty).

ICAS scale

The ICAS scale extends from -150 to 3000 (the slide shows the band from 500 to 2000, with year levels 2 to 7 marked along it). We convert IRT logit scores to scale scores. The ICAS scale score is a transformation of logit scores so that negative numbers are highly unlikely, and the numbers are such that people are unlikely to confuse them with raw scores or percentages.

Common item equating using vertical links

All ICAS tests within a domain and calendar year: Paper Intro (2), Paper A (3), Paper B (4), Paper C (5), Paper D (6), Paper E (7), Paper F (8), Paper G (9), Paper H (10), Paper I (11), Paper J (12)
– Each year level's paper has common items with the adjacent year level's paper
– All items and tests within a calendar year are placed onto the same scale
– Vertical equating items are checked

Checking measurement qualities of common items from Years 2 and 3

Common item equating using horizontal links

The same ladder of papers, from Paper Intro (2) up to Paper J (12), appears twice: once for past years and once for the current year. Common items link each current-year paper horizontally to its counterpart in past years.

Checking horizontal equating
– Year 6-to-base check

Model fit – are we measuring achievement?

Our Rasch analyses produce item and ability parameter estimates
– Each item parameter has one or more fit statistics
– And an estimate of how well the Rasch model is capturing the data, or, more strictly, how well the data fit the measurement model (Rasch theoreticians follow this line of argument)
We generally focus on specific item fit statistics
– Infit: does the item appropriately measure its target? This is the most important item fit statistic
– Outfit: does the item appropriately measure learners who are not targeted by the item?
– We want infit and outfit statistics to be between 0.8 and 1.2
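Infit and outfit are mean-square statistics over standardized residuals; a sketch for one dichotomous item (observed responses and model probabilities are invented):

```python
import numpy as np

def fit_statistics(x, p):
    """Rasch mean-square fit for one item.
    x: observed 0/1 responses; p: model-expected probabilities.
    Outfit is the unweighted mean of squared standardized residuals;
    infit weights each residual by its information w = p(1 - p)."""
    x, p = np.asarray(x, float), np.asarray(p, float)
    w = p * (1 - p)                          # information per response
    z2 = (x - p) ** 2 / w                    # squared standardized residuals
    outfit = float(z2.mean())
    infit = float((w * z2).sum() / w.sum())  # = sum((x - p)^2) / sum(w)
    return infit, outfit

infit, outfit = fit_statistics([1, 0, 1, 1], [0.9, 0.2, 0.6, 0.5])
print(round(infit, 2), round(outfit, 2))  # 0.62 0.51
```

Because outfit is unweighted, it is more sensitive to surprising responses far from an item's target, which is why it flags off-target behaviour.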

An easy, good-fitting item

A poor-fitting, difficult item

Reporting

At student and school levels
– Schools can slice-and-dice data as they want
– The reporting system is largely normative
– But schools can use patterns to inform learning

Looking at response patterns

When the ICAS questions are arranged from easy to hard, and students are arranged from highest to lowest performance, we expect to see a triangle pattern, unless the student is guessing or another ability (e.g., language proficiency) is very important.

Looking at specific response patterns

The following patterns are typically found:
– Guttman
– Rasch
– Rasch with careless response
– Rasch plus guessing
– Guessing (no pattern)
– Special knowledge (pattern based upon specific curriculum knowledge)

The ideal Rasch pattern has 3 zones, arranged in order of item difficulty:
– A zone of items all answered correctly
– A zone of items with some answered correctly, and some incorrectly
– A zone of items all answered incorrectly
The Guttman pattern has two zones only.
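The two- versus three-zone distinction can be sketched with a small helper (response strings invented; items ordered easy to hard):

```python
def zone_split(pattern):
    """Split an easy-to-hard response string into the three Rasch zones:
    leading all-correct, mixed middle, trailing all-incorrect."""
    first_miss = pattern.find("0")
    last_hit = pattern.rfind("1")
    if first_miss == -1 or last_hit < first_miss:
        # Pure Guttman pattern: only two zones, no mixed middle.
        return pattern[:last_hit + 1], "", pattern[last_hit + 1:]
    return pattern[:first_miss], pattern[first_miss:last_hit + 1], pattern[last_hit + 1:]

print(zone_split("1111100000"))  # ('11111', '', '00000')  Guttman: two zones
print(zone_split("1111010100"))  # ('1111', '0101', '00')  Rasch: three zones
```

A long trailing zone with isolated correct answers among the hardest items would instead suggest guessing or special knowledge.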

ICAS is focused on skills, concepts and strategic thinking

As part of our validity framework we investigate ICAS's relationship with other measures. We need to know how ICAS relates to the Malaysian assessments.

You will need a plan for ICAS

Wayman (2005): "few would argue that creating more information for educators to use is a negative" (p. 236).
– But it is too easy to be swamped with data
– Or to be too busy to use the data
– Or to not have a constructive, detailed plan for the use of ICAS

Optimum conditions for data usage

Raths, Kotch, & Gorowara (2009):
– School climate
– Sensitive measures: must relate ICAS to the syllabi; evidence should be "curriculum sensitive" and should align with the teacher's educational objectives
– Timely access to evidence
– Buy-in by teachers
– Teacher skills
– Conceptual interpretation for the audience
– Time for teachers in the school day
– Team work