Reading Comprehension Assessment: One VERY Resilient Phenomenon
P. David Pearson, UC Berkeley
Diane Hamm, MSU



Key references
Pearson, P.D., & Hamm, D. (in press). The assessment of reading comprehension: A review of practices—past, present, and future. In S. G. Paris and S. A. Stahl (Eds.), Current issues in reading comprehension and assessment. Mahwah, NJ: Lawrence Erlbaum Associates.
Johnston, P. H. (1984a). Assessment in reading. In P. D. Pearson, R. Barr, M. Kamil, & P. Mosenthal (Eds.), Handbook of reading research (pp. 147-182). New York: Longman.
Johnston, P. H. (1984b). Reading comprehension assessment: A cognitive basis. Newark, DE: International Reading Association.

Purpose
Build an argument for a fresh line of inquiry into the assessment of reading comprehension
By providing a rich and detailed historical account of reading comprehension, both as a theoretical phenomenon and an operational construct
Bonus: Try to offer my explanation of why the short passage, multiple-choice exam format is so resilient

Why now?
Renewed interest among scholars
Rand report
Uneasiness among practitioners that the code, as important as it is, may not be the point of reading
The most important outcome of reform
National thirst for accountability requires impeccable measures (both conceptually and psychometrically)
Pleas of teachers desperate for useful tools (need a tool that does for RC what running records do for word id)

Reading comprehension assessment has always vexed researchers
We want to access the thing itself, the “click”
But
We only ever see its residue, its wake, its artifacts
We are stuck with artifacts
Require them to tell us whether they understood
Require them to tell us what they understood
Quiz them on the details
Request the big ideas

Most of the measures interpose some other skill or capacity between the act and the evidence
Writing
Talking
The conventions of multiple-choice assessments
These interposed processes inevitably compromise our capacity to draw inferences about comprehension both as a generic and a passage-specific enterprise

History nurtures modesty
Just about any approach to assessing reading comprehension that has arisen in the last 30 years has a precedent that is at least 70 years old.

Caveats
I apologize, in advance, for all the studies, especially those done by people in this room, that I will fail to cite.
I also want to make it clear that I am limiting myself to READING COMPREHENSION assessment, not prerequisites to or correlates of RC.
Just because I chronicle a practice does not mean that I advocate it!

The Century-Old Roots of Reading Comprehension Assessment
Fit the context of the period just after the last turn of the century
Curricular shift from oral to silent reading
From Accuracy and Expressiveness (oratory and declamation) of oral reading
To Indicators of understanding

The Roots…
Emerging Scientism in education
Growing numbers of students
Changing demographics: widespread illiteracy
Efficient means of assessing (e.g. Army Alpha Exams)
Move from high to a low inference index

1895 Binet
Used RC tasks as a measure of IQ not reading
Not inconsistent with Binet’s notion that IQ should index school-based reasoning capacity

“Every one of us, whatever our speculative opinion, knows better than he practices, and recognizes a better law than he obeys.”

Check two of the following statements with the same meaning as the quotation above.

[ ] To know right is to do the right.
[ ] Our speculative opinions determine our actions.
[ ] Our deeds often fall short of the actions we approve.
[ ] Our ideas are in advance of our everyday behavior.

From a Thurstone IQ test (undated), cited in Johnston, 1984

Note the multiple correct answers.

1916 Kansas Silent Reading Test*

Kelly

“fill in the blanks”

some verbal logic problems

some procedural tasks

Complete as many of 16 tasks as possible in a limited time

*The first published standardized comprehension test.

1917: Thorndike
Reading as Reasoning
Basically an error analysis leading to a set of categories and a theory
Understanding a paragraph is like solving a problem in mathematics. It consists in selecting the right elements in the situation and putting them together in the right relations, and also with the right amount of weight or influence or force of each

Touton and Berry (1931) Error analyses
(a) failure to understand the question
(b) failure to isolate elements of “an involved statement” read in context
(c) failure to associate related elements in a context
(d) failure to grasp and retain ideas essential to understanding concepts
(e) failure to see setting of the context as a whole
(f) other irrelevant answers

A panoply of measures
Starch: relevant words recalled as a function of total words recalled
Courtis (1914): words remembered/words in text
Chapman (1924): Find the words in part 2 that do not fit the words in part 1 of the paragraph.
Note similarity to later free recall and error detection models of assessment
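
The Starch and Courtis indices on this slide both reduce to simple ratios. A minimal sketch follows, assuming whitespace tokenization and assuming that a "relevant" recalled word is one that also occurs in the passage; the slides specify neither detail, and the function names are hypothetical.

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [w for w in words if w]

def starch_index(passage, recall):
    """Relevant words recalled as a proportion of total words recalled.
    Assumption: 'relevant' means the recalled word appears in the passage."""
    passage_words = set(tokenize(passage))
    recalled = tokenize(recall)
    if not recalled:
        return 0.0
    relevant = [w for w in recalled if w in passage_words]
    return len(relevant) / len(recalled)

def courtis_index(passage, recall):
    """Words remembered as a proportion of words in the text."""
    passage_tokens = tokenize(passage)
    if not passage_tokens:
        return 0.0
    recalled = set(tokenize(recall))
    remembered = [w for w in passage_tokens if w in recalled]
    return len(remembered) / len(passage_tokens)

passage = "The children wanted to make a book for their teacher."
recall = "The children made a book for the teacher at school."
print(starch_index(passage, recall))   # share of the recall that is on-text
print(courtis_index(passage, recall))  # share of the text that was recalled
```

The two ratios answer different questions: the first penalizes padding in the recall, the second penalizes omissions from the text.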

Enter Psychometrics in the late 1930s

1935: IBM introduced the IBM 805 scanner

1935: Kelley: Factor Analysis

1944: Davis: Fundamental Factors

Davis 1944
Answer specific text-based questions
Main thought
Follow passage organization
Word meanings in context
Word meanings
Author’s purpose
Literary devices
Draw inferences about content
Text-based questions with paraphrase

Word factor and a reasoning factor

Other Factor Analyses
Harris (1948): found a single factor
Derrick (1953): found 3
Hunt (1957): Vocabulary was everything
Schreiner, Hieronymus, and Forsyth (1971): No differentiation among paragraph meaning, cause and effect, reading for inferences, and selecting main ideas BUT separate LC and lower level processing
Davis (1968, 1972)

Davis 1972
1. Remembering word meaning
2. Word meanings in context
3. Understanding content stated explicitly
4. Weaving together ideas in the content
5. Drawing inferences from the content
6. Recognizing the author’s tone and mood and purpose
7. Recognizing literary techniques
8. Following the structure of the content

Davis 1972
Remembering word meanings

drawing inferences from content

structure of the passage

writerly techniques

explicit comprehension

Put an end to factor analytic studies

Cloze Procedure
Wilson Taylor (1953): every 5th word
Bormuth (1966): the basis of readability research
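
The every-5th-word deletion rule can be sketched in a few lines. This is an illustration under stated assumptions, not Taylor's own procedure: words are split on whitespace, counting starts at the first word, and the names `make_cloze` and `score_exact` are hypothetical.

```python
def make_cloze(text, n=5, blank="_____"):
    """Replace every n-th word with a blank.
    Returns the cloze text and the list of deleted words (the answer key)."""
    words = text.split()
    key = []
    for i in range(n - 1, len(words), n):
        key.append(words[i])
        words[i] = blank
    return " ".join(words), key

def score_exact(key, responses):
    """Proportion of blanks filled with exactly the deleted word
    (case-insensitive, ignoring trailing punctuation on the key word)."""
    if not key:
        return 0.0
    hits = sum(k.strip(".,").lower() == r.strip().lower()
               for k, r in zip(key, responses))
    return hits / len(key)

text = ("One girl brought a camera to school. "
        "She took a picture of each person in the class.")
cloze, key = make_cloze(text)
print(cloze)
print(key)
print(score_exact(key, ["camera", "the", "in"]))
```

Because the sketch splits on whitespace, punctuation attached to a deleted word disappears with it; a fuller version would tokenize punctuation separately.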

Modifications to Cloze
Allow synonyms to serve as correct answers
Delete only every 5th content word (leaving function words intact)
Use an alternative to every 5th word deletion
MAZE: MC for the blanks
Macro cloze: phrases
Delete words at the end of sentences and provide a set of choices from which examinees are to pick the best answer
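
Two of these modifications, content-word-only deletion and synonym-tolerant scoring, can be sketched as follows. The function-word list and the synonym table are tiny illustrative assumptions, and the function names are hypothetical; a real instrument would need a principled word list and a vetted set of acceptable answers.

```python
# Assumption: a very small, illustrative function-word list.
FUNCTION_WORDS = {"a", "an", "the", "of", "to", "in", "and", "for",
                  "then", "she", "he", "they", "one", "each"}

def make_content_cloze(text, n=5, blank="_____"):
    """Blank every n-th content word, leaving function words intact.
    Trailing punctuation on a deleted word is kept in the cloze text."""
    words = text.split()
    key = []
    seen = 0
    for i, w in enumerate(words):
        core = w.strip(".,")
        if core.lower() in FUNCTION_WORDS:
            continue
        seen += 1
        if seen % n == 0:
            key.append(core)
            words[i] = blank + w[len(w.rstrip(".,")):]
    return " ".join(words), key

def score_with_synonyms(key, responses, synonyms):
    """Proportion of blanks filled with the deleted word or a listed synonym."""
    if not key:
        return 0.0
    hits = 0
    for target, response in zip(key, responses):
        accepted = {target.lower()}
        accepted |= {s.lower() for s in synonyms.get(target.lower(), ())}
        if response.strip().lower() in accepted:
            hits += 1
    return hits / len(key)

text = ("One girl brought a camera to school. "
        "She took a picture of each person in the class.")
content_cloze, content_key = make_content_cloze(text, n=3)
print(content_cloze)
print(score_with_synonyms(content_key, ["camera", "photo"], {"picture": {"photo"}}))
```

The example uses every 3rd content word only because the sample text is short.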

The conceptual death of cloze
Shanahan, Kamil, & Tobin (1983): not sensitive to “intersentential” comprehension
No differences when sentences were scrambled within or across passages or presented in isolation

Still survives
DRP

Stanford Diagnostic

ESL

Passage Dependency
P(passage) - P(isolation)
A quiet stir in the late 60s and early 70s (Tuinman)
Died in the wake of Schema Theory’s embrace of prior knowledge
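
The passage dependency index can be illustrated with a short calculation. The sketch assumes the two terms are proportions correct on the same items, answered once with the passage available and once with the questions in isolation; the response data are invented for illustration and the function names are hypothetical.

```python
def proportion_correct(responses):
    """Share of items answered correctly; responses are 1 (right) or 0 (wrong)."""
    return sum(responses) / len(responses)

def passage_dependency(with_passage, in_isolation):
    """P(passage) - P(isolation): how much of the score depends on reading the text."""
    return proportion_correct(with_passage) - proportion_correct(in_isolation)

# Illustrative data only: eight items answered under each condition.
with_passage = [1, 1, 1, 0, 1, 1, 1, 1]
in_isolation = [1, 0, 0, 0, 1, 0, 0, 0]
print(passage_dependency(with_passage, in_isolation))
```

A value near zero would suggest the items can be answered from prior knowledge alone, which is the concern this line of work raised.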

Criterion-referenced assessment
Make a virtue out of sub-skills
Took the notions of mastery learning coming out of Carroll, Gagné and Bloom
Define sets of subskills
Set a level of mastery
Test-teach-test
Assumes a componential skill view of reading
Data: Bloom’s experiments with Ed Psych courses
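
The mastery decision at the heart of test-teach-test can be sketched as a simple threshold check. The 80% criterion, the subskill names, and the scores below are assumptions chosen for illustration; the slides do not give a specific mastery level.

```python
def subskills_to_reteach(scores, mastery_level=0.8):
    """Return the subskills whose proportion correct falls below the mastery level.
    scores maps a subskill name to (items correct, items attempted)."""
    return sorted(skill
                  for skill, (correct, total) in scores.items()
                  if correct / total < mastery_level)

# Illustrative pretest results for one student.
pretest = {
    "sequence of events": (2, 3),
    "main idea": (5, 5),
    "cause and effect": (4, 5),
}
print(subskills_to_reteach(pretest))
```

In a test-teach-test cycle, the returned subskills would be taught and then retested with the same function.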

CRT takes over
Wisconsin Design for Reading Skill Development
Fountain Valley
Virtually every basal program by the mid 1970s

The children wanted to make a book for their teacher. One girl brought a camera to school. She took a picture of each person in the class. Then they wrote their names under the pictures. One boy tied all the pages together. Then the children gave the book to their teacher.

1. What happened first?
a. The children wrote their names
b. Someone brought a camera to school
c. The children gave a book to their teacher

2. What happened after the children wrote their names?
a. A boy put the pages together.
b. The children taped their pictures.
c. A girl took pictures of each person

3. What happened last?
a. The children wrote their names under the pictures.
b. A girl took pictures of everyone.
c. The children gave the book to their teacher.

(adapted from the Ginn Reading Program, 1982)

Reactions to this movement
Provided fuel for the constructivist reforms that were gathering momentum
Died out of the basals in the early 90s for about 6 years
Only to be revived recently
Johnson, D.D., & Pearson, P.D. (1975). Skills management systems: A critique. The Reading Teacher, 28, 757-764.

Domain referenced assessment
John Bormuth, Toward a Theory of Achievement Test Items
Identify the domain as texts
Map all of the logical relations among sentences.
Using linguistic transformations, develop all possible Wh questions --> items
Randomly sample from the domain
Survives in Math, not reading

The Cognitive Revolution
The powerful impact of schema
The evolution of text analytic systems
Story grammars à la Stein & Glenn
Propositional analysis of texts à la Kintsch & van Dijk
Inference taxonomies à la Trabasso

The Impact of Cognitive Science on Assessment
more attention to the role of prior knowledge
attention to text structure (in the form of story maps and visual displays to capture the organizational structure of text)
the introduction of metacognitive monitoring
Used to critique the existing assessment traditions on the way to new assessments

Contrasts between what we know and what we do
From Valencia, S., & Pearson, P.D. (1987). Reading assessment: Time for a change. The Reading Teacher, 40, 726-733.

New views of the reading process tell us that . . . | Yet when we assess reading comprehension, we . . .
A complete story or text has structural and topical integrity. | Use short texts that seldom approximate the structural and topical integrity of an authentic text.
Prior knowledge is an important determinant of reading comprehension. | Mask any relationship between prior knowledge and reading comprehension by using lots of short passages on lots of topics.
The diversity in prior knowledge across individuals as well as the varied causal relations in human experiences invites many possible inferences to fit a text or question. | Use multiple-choice items with only one correct answer, even when many of the responses might, under certain conditions, be plausible.
Inference is an essential part of the process of comprehending units as small as sentences. | Rely on literal comprehension test items.
The ability to vary reading strategies to fit the text and the situation is one hallmark of an expert reader. | Seldom assess how and when students vary the strategies they use during normal reading, studying, or when the going gets tough.

A sense that we had
Paid too much attention to measurement theory and
Not enough to reading theory

Structural representations
Used in test development
Determine hierarchical and sequential relations
A theory of importance
Determines which nodes should be assessed

Authentic Texts
Select, not construct, texts for understanding
(started a cottage industry for magazine publishers)
Can’t tinker with the text to rationalize items and distractors

(drove professional item writers crazy)

More than one right answer
How does Ronnie reveal his interest in Anne?
Ronnie cannot decide whether to join in the conversation.
Ronnie gives Anne his treasure, the green ribbon.
Ronnie gives Anne his soda.
Ronnie invites Anne to play baseball.
During the game, he catches a glimpse of the green ribbon in her hand.

Rate all of the responses on some scale of relevance
How does Ronnie reveal his interest in Anne?
(2)(1)(0) Ronnie cannot decide whether to join in the conversation.
(2)(1)(0) Ronnie gives Anne his treasure, the green ribbon.
(2)(1)(0) Ronnie gives Anne his soda.
(2)(1)(0) Ronnie invites Anne to play baseball.
(2)(1)(0) During the game, he catches a glimpse of the green ribbon in her hand.

Best predictor of retelling scores
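
One way to turn 2/1/0 relevance ratings into a score is to credit how closely a student's ratings match a keyed rating for each response. This is only a sketch of one plausible scoring rule, not the rule used in the assessments described here; the key values and labels A through E below are invented placeholders and do not correspond to the Ronnie item.

```python
def rating_agreement(student, key, max_rating=2):
    """Score ratings by closeness to the key.
    Each response earns max_rating minus the absolute distance from the keyed
    rating; the total is returned as a proportion of the maximum possible."""
    if student.keys() != key.keys():
        raise ValueError("student and key must rate the same responses")
    earned = sum(max_rating - abs(student[r] - key[r]) for r in key)
    return earned / (max_rating * len(key))

# Hypothetical key and student ratings for five responses.
key_ratings = {"A": 0, "B": 2, "C": 1, "D": 1, "E": 2}
student_ratings = {"A": 1, "B": 2, "C": 1, "D": 0, "E": 2}
print(rating_agreement(student_ratings, key_ratings))
```

Partial credit for near misses is the design choice here; an all-or-nothing rule would count only exact matches.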

Include
Complex indicators of comprehension

Prior knowledge

Metacognition

Habits, attitudes, and dispositions

Most notable examples
MEAP (Wixson, Peters, Paris)
IGAP (Valencia, Shanahan, Pearson, Reeve)

Some findings from IGAP
When we plugged in Comprehension, Prior Knowledge, Metacognition, Habits/Attitude
We emerged with these factors
metacognitive
habits/attitudes items
a combination of the comprehension and prior knowledge items

Fate
Went the way of all tests that challenge the conventional wisdom
Not good to teach to (e.g. metacognitive items)
Went down in the mid 1990s when they tried to add on an individual score reporting component

Sentence verification task
Original: Verbatim repetition of a sentence in the passage
Paraphrase: The same meaning as an original but with lots of semantic substitutes for words in the original sentence.
Meaning change: Uses some of the words in the passage but in a way that changes the meaning of the original sentence.
Distractor: A sentence that differs in both meaning and wording from the original.

Judge each as old or new
Most people seem content with polyester fillings and such. (Original)
You don't know what comfort is until you've sunk your head into 3,000 bits of polyester. (Meaning change)
It is always fun visiting grandparents because they take you someplace exciting, like the zoo or the circus. (Distractor)
Being able to hear stories of when his mom and dad were kids was one of the great things about having grandparents around, Tim concluded. (Paraphrase)
His favorite grandparent was his mother's mother. (Distractor)
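
Scoring the old/new judgments can be sketched as below. The sketch assumes the usual SVT convention that originals and paraphrases should be judged "old" while meaning changes and distractors should be judged "new"; the slide does not spell this out, and the student judgments are invented for illustration.

```python
# Assumed mapping from sentence type to the expected judgment.
EXPECTED = {
    "original": "old",
    "paraphrase": "old",
    "meaning change": "new",
    "distractor": "new",
}

def score_svt(item_types, judgments):
    """Proportion of test sentences judged correctly as old or new."""
    if len(item_types) != len(judgments):
        raise ValueError("one judgment is needed per test sentence")
    correct = sum(EXPECTED[kind] == judged
                  for kind, judged in zip(item_types, judgments))
    return correct / len(item_types)

# Sentence types in the order shown on the slide.
item_types = ["original", "meaning change", "distractor", "paraphrase", "distractor"]
# Hypothetical student who is fooled by the meaning-change sentence.
judgments = ["old", "old", "new", "old", "new"]
print(score_svt(item_types, judgments))
```

Errors on paraphrase and meaning-change sentences are the informative ones, since those cannot be judged from surface wording alone.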

Why isn’t SVT used more often?

Does not pass the prima facie test

Time consuming to prepare

Sociocultural and Literary Perspectives
Learning and understanding are inherently social
Assessment should be responsive, interactive, and dynamic
Texts are inherently political documents with points of view and agendas and authors
Rosenblatt: Reader, text, and poem
Langer: Into, through, and beyond

CLAS
If you were explaining what this essay is about to a person who had not read it, what would you say?
What do you think is important or significant about it?
What questions do you have about it?
This is your chance to write any other observations, questions, appreciations, and criticisms of the story

Another CLAS
Now you will be working in a group. You will be preparing yourself to do some writing later. Your group will be talking about the story you read earlier. A copy of the story is provided before the group questions if you need to refer to it. Some of the activities in this section may direct you to work alone and then share with your group, and other activities may have all of you working together. It is important to take notes of your discussion because you will be able to use these notes when you do your writing.
Read the directions and do the activities described. Members of the group should take turns reading the directions. The group leader should keep the activities moving along so that you finish all activities.
You’ll have 15 minutes for these prewriting activities.

The demise of performance assessment in wide-scale

The social aspect: Whose work is it anyway?

Generalizability: Too passage specific

Expense: Scoring and rubric development

Invasion of privacy

The legacy:
Mixed models

Classroom assessment

Sharing thinking
Think alouds

Olshavsky

Hartman

NVT

Write alongs
Farr & Greene

CLAS

NAEP
1960s: Goal free evaluation

What you see is what you get

NAEP 1970s
Demonstrate the ability to show comprehension of what was read

analyze what is read, use what is read

reason logically

make judgments

have attitude/interest in reading.

NAEP 1980s
value reading and literature
comprehend written works
respond to written works in interpretive and evaluative ways

apply study skills

NAEP 1990s
FORMING INITIAL UNDERSTANDING
Which of the following is the best statement of the theme of the story
DEVELOPING INTERPRETATIONS
What caused this event
PERSONAL REACTION AND RESPONSE
How did this character change your ideas of _____
DEMONSTRATE CRITICAL STANCE
What could be added to improve the author’s argument

NAEP concerns
The framework does not pass psychometric muster
Not much information at the lower end of the performance scale (no floor)
Item format: Do CR items add any value to the information gained?

Not if they are MC in disguise?

Mapping test scores and text difficulty on the same scale

Stenner et al: Lexile

DRP

Carver

New Initiatives
Lots of psychometric work

Lots of conceptual work

Share a few examples

Reading for Understanding
The standards for good assessment, especially those dealing with instructional sensitivity, are critical
Notice that in most of our work, we assume the validity of our measures and test the validity of the interventions.

What if we turned that around?

What does it mean to achieve a given comprehension score?
Find a population of kids with a narrow band of overall comprehension scores
Administer lots of subskill tests, decoding, vocabulary, and comprehension
Evaluate prerequisiteness and compensatory hypotheses
Which types of knowledge/skill are essential
How many ways are there to get to 6.5?
Valencia study (later this morning)
Note that New Standards Reference Exam privileges compensatory concept.

More questions to answer?
For accommodations, how do we weigh increased participation against potential sources of invalidity?

Time

Glossary

Mode of presentation

Starting over
Go back to a set of theoretical conceptualizations of comprehension
Component Skills
Knowledge Driven models (Schema Theory and Construction-Integration)
Contextually Driven models (Socio-cultural or critical)
Executive Control models (metacognition and Cognitive Flexibility Theory)
Mine each for assessment implications
Apply each set of implications to a common set of passages to create a set of alternative theory-based assessments

More steps

Develop a “gold standard” for comprehension: how do we get as close as possible to that ineffable phenomenon?
My candidate: Some on-line assessment of both the content (ideas in text) and the affect (phenomenological sense) of comprehension (akin to the write alongs)
Examine the predictive validity of the assessment models generated from each theoretical perspective in relation to the gold standard
Be open to the possibility of a mixed model

Conclusion
We have traveled far, sometimes on new roads and sometimes on old.
Virtually all the old forms of assessment survive, even flourish because of their

Psychometric properties

Efficiencies

And because challengers often fail to meet either psychometric or efficiency standards

Conclusion
We seem poised to re-energize ourselves in this important enterprise
To build assessments that can meet the most rigorous of both measurement and conceptual standards
To serve the needs of both classroom teachers and policy makers

A welcome challenge