Reading Assessments in Kindergarten through Third Grade: Findings from the Center for the Improvement of Early Reading Achievement

Author(s): Scott G. Paris and James V. Hoffman
Source: The Elementary School Journal, Vol. 105, No. 2, Lessons from Research at the Center for the Improvement of Early Reading Achievement (Joanne F. Carlisle, Steven A. Stahl, and Deanna Birdyshaw, Guest Editors), November 2004, pp. 199-217
Published by: The University of Chicago Press
Stable URL: http://www.jstor.org/stable/10.1086/428865


Reading Assessments in Kindergarten through Third Grade: Findings from the Center for the Improvement of Early Reading Achievement

Scott G. Paris, University of Michigan

James V. Hoffman, University of Texas at Austin

Abstract

Assessment of early reading development is important for all stakeholders. It can identify children who need special instruction and provide useful information to parents as well as summative accounts of early achievement in schools. Researchers at the Center for the Improvement of Early Reading Achievement (CIERA) investigated early reading assessment in a variety of studies that employed diverse methods. One group of studies used survey methods to determine the kinds of assessments available to teachers and the teachers' reactions to the assessments. A second group of studies focused on teachers' use of informal reading inventories for formative and summative purposes. In a third group of studies, researchers designed innovative assessments of children's early reading, including narrative comprehension, adult-child interactive reading, the classroom environment, and instructional texts. The CIERA studies provide useful information about current reading assessments and identify promising new directions.

Achievement testing in the United States has increased dramatically in frequency and importance during the past 25 years and is now a cornerstone of educational practice and policy making. The No Child Left Behind (NCLB) (2001) legislation mandates annual testing of reading in grades 3–8 and increased assessment for students in grades K–3 with clear intentions of increased accountability and achievement. The rationales for early assessment lie in (a) research on reading development that indicates the importance of basic skills for future success and (b) classroom evidence that early diagnosis and remediation of reading difficulties can improve children's reading achievement (Snow, Burns, & Griffin, 1998). The unprecedented federal resolve and resources at the beginning of the twenty-first century that are focused on the improvement of children's reading achievement require researchers and educators to identify useful assessment tools and procedures.

The Center for the Improvement of Early Reading Achievement (CIERA), a consortium of researchers from many universities, was funded and became operational in 1997. Assessment of reading achievement and the corresponding practices and policies were major foci of the research agenda. It is important to note that the CIERA research was proposed, and often conducted, before the report of the National Reading Panel (2000) and before the NCLB legislation. These important events did not frame CIERA research at the time, but they certainly influence the interpretation of assessment tools today. For example, both the NRP and NCLB emphasized five essential skills for beginning reading success: the alphabetic principle, phonemic awareness, oral reading fluency, vocabulary, and comprehension. Consequently, many of the early reading assessments developed recently have focused on those skills, especially the first three. CIERA researchers acknowledge the importance of assessing these skills, but they chose to investigate a broader array of assessment issues and practices partly because there were already many assessments of the alphabetic principle, phonemic awareness, and oral reading fluency, such as the Dynamic Indicators of Basic Early Literacy Skills (DIBELS), a popular and quick battery of reading assessments (Good & Kaminski, 2002). Moreover, CIERA researchers in 1997 wanted to survey teachers to find out what assessments they used and why, as well as to identify new kinds of assessments for nonreaders and beginning readers.

CIERA research on early reading assessment was proposed and conducted in an era of increased testing and evidence-based policy making. An initial cluster of studies examined the kinds of reading assessments available and used by teachers in order to describe current classroom practices. The studies were intended to be surveys of best practices in schools. A second group of studies examined the use of oral reading assessments to determine students' adequate yearly progress (AYP) because many informal reading inventories were being transformed into formal summative assessments of reading achievement. Several other CIERA studies examined innovative tools and new directions for assessing early reading achievement. The research was exploratory, eclectic, and conducted by multiple investigators, but collectively, the studies help to identify promising assessments of reading along with some practical obstacles to implementation. We present the findings of these three groups of studies and conclude with a discussion of future directions in K–3 reading assessment research.

Surveys of Early Reading Assessments

Many teachers are overwhelmed by the numerous reading assessments mandated by policy makers, advocated by publishers, required by administrators, or simply recommended for classrooms. We begin with an examination of two CIERA studies on the variety of assessment instruments available for K–3 teachers. We then examine two CIERA studies of teachers' attitudes toward and use of assessments. The first two studies differ in their focus on commercial and noncommercial measures. Both studies followed up on the pioneering research by Stallman and Pearson (1990), who conducted one of the first comprehensive surveys of early reading measures.

The Commercial Marketplace

Ten years after Stallman and Pearson's (1990) study, Pearson, Sensale, Vyas, and Kim (1999) conducted a similar study of commercial reading tests. They identified 148 tests with 468 subtests in their CIERA survey. More than half of the tests had been developed in the 1990s, and more than half were designed for individual administration, clearly a response to the preponderance of group tests in the previous decade. Multiple-choice responses and marking answer sheets still predominated over reading, writing, or open-ended responses. Nearly all tests were administered with a mixture of visual and auditory presentations. In contrast to the previous decade, about 40% of tests required production as a response mode. Recognition was required in about 40% of the tests and identification in only 10%. Scoring ease may have driven the response mode because more than 60% of the tests could be scored simply as right or wrong, less than 20% contained multiple-choice items, and only 10% of tests used rubrics to score answers.

Pearson et al. (1999) analyzed the skills assessed in the 148 tests and found that word knowledge, such as concept identification, was assessed in 50%; sound and symbol concepts were assessed in 65%; literacy and language concepts were assessed in 90%; and comprehension was assessed in only 24% of the tests. When they analyzed only the K–1 tests to compare with the Stallman and Pearson (1990) findings, they found that 52% compared to a previous 18% of tests were administered to individual children. Only 36% of the tests compared to the previous 63% were multiple-choice tests, and the heavy emphasis on sound-symbol correspondence was reduced in half and replaced by a much stronger emphasis on language and literacy concepts. These changes may be due to the growing influence of whole language, Clay's Observation Survey (1993), and assessment methods used in Reading Recovery throughout the 1990s. Although the type of processing required was still largely recognition, it had decreased from 72% of tests in the first survey to 51%. Likewise, filling in bubbles decreased from 63% to 28% of the tests, and oral responding increased from 12% to 39%.

The authors also noted the variety of new reading assessments that emerged in the 1990s. Kits, such as the Dial-R, that included assessment batteries became more prominent. Some elaborate systems were developed for using classroom assessment for formative and summative functions. For example, the Work Sampling System (Meisels, Liaw, Dorfman, & Nelson, 1995) includes developmental benchmarks from ages 3 to 12 in behavioral, academic, and affective domains that can be used with teachers' checklists and students' portfolios to monitor individual growth and achievement. The kits and elaborate systems usually include teachers' guides, curriculum materials, and developmental rubrics. Leveled books became a popular tool for determining children's reading levels for the assessment tasks, again reflecting the influence of Reading Recovery, Guided Reading, and similar derivations for instruction.

Pearson et al. (1999) concluded that commercial reading tests in the late 1990s were much more numerous and varied than the tests available 10 years earlier. More skills were tested, particularly language and literacy concepts. More choices, judgments, and interpretations were required from the examiner, usually the teacher, to use the new tests. However, there was still a preponderance of recognition responses and filling in bubbles on answer sheets. The researchers suggested that the changes in early reading assessments during the 1990s reflected the influences of three thematic changes to early literacy and language arts in classrooms: emergent literacy, process writing approaches, and performance assessments throughout the curriculum.

Noncommercial Assessments

The second CIERA survey of early reading assessments was conducted by Meisels and Piker (2000). Their study had three objectives: "1) to gain an understanding of classroom-based literacy measures that are available to teachers; 2) to characterize the instructional assessments teachers use in their classrooms to evaluate their students' literacy performance; and 3) to learn more about how teachers assess reading and writing skills" (p. 5). In contrast to the previous studies of commercial assessments, Meisels and Piker (2000) examined noncommercial literacy assessments that were nominated by teachers and educators or used by school districts. They excluded assessments used for research or accountability and focused on K–3 instruments. Some assessments of motivation and attitudes were also included. The researchers collected information from educational list serves, personal contacts, literature searches, published reviews of the measures, Web sites, and newsletter postings, so their survey was directed at assessments in use rather than on sale in the marketplace.

Their search identified 89 measures, 60 of which were developed in the 1990s. The coding categories Meisels and Piker (2000) used were adapted from the Stallman and Pearson (1990) survey and categorized measures on 13 literacy skills: print awareness, phonics, reading, reading strategies, comprehension, writing process, writing conventions, motivation, self-perception, metacognition, attitude, oral language listening and speaking, and oral language other. The first six of these skills are most directly related to reading assessment. However, the CIERA researchers identified 203 subskills among these 13 categories. This is again an indication of the conceptual and practical decomposition of literacy skills in complex assessment batteries.

Meisels and Piker (2000) found that 70% of the measures were designed for individual administration and nearly half were intended for all grades, K–3. Only five were available in languages other than English (four in Spanish, one in Danish). Of the 13 skills, phonics, comprehension, and reading were assessed most frequently, and motivation, self-perception, and attitudes were measured least often. Of the 89 measures, 47 were based on observation or on-demand methods for evaluating students' literacy. Constructed responses were used mostly with writing. Checklists were used in 36% of the measures, and running records in 15%. The most frequent kind of response was oral response on 64% of the measures, followed by writing on 46% of the measures. Recognition, identification, and recall were used to assess about one-third of the skills. Meisels and Piker (2000) then examined the skills assessed in each test and found that 70% were assessed with observations and that the data were recorded in checklists (69%) or anecdotal observations (45%) most often. Both the limited response formats for students and the informal records of teachers are worth noting.

Meisels and Piker (2000) examined the measures for evidence of psychometric reliability and validity and expressed disappointment with the results. Only 14% of the measures had evidence of good reliability that ranged from high to moderate. Even less information was available about validity. No consistent tests or benchmarks were used to establish concurrent or predictive validity. The researchers noted that the noncommercial measures were less likely to include psychometric evidence than commercial tests. In comparing their results to the Stallman and Pearson (1990) study, Meisels and Piker (2000) also noted that the noncommercial measures were usually designed for individuals, not groups, and had more opportunities for students to identify or produce answers rather than just recognize correct choices. Noncommercial measures usually had fewer guidelines for administering, recording, and interpreting the assessment information.

How Teachers Use and Regard Reading Assessments

The next set of studies went beyond a consideration of the instruments to examine how teachers use and evaluate them. Paris, Paris, and Carpenter (2002) reported findings from a survey of teachers' perceptions of assessment in early elementary grades. They asked successful teachers what kinds of reading assessments they used for what purposes so that a collection of "best practices" might be available as models for other teachers. The assessment survey was a part of a large CIERA survey of elementary teachers who taught in "beat the odds" schools to determine their practices and views. These schools across the United States had a majority of students who qualified for Title I programs and had a mean school test score on some standardized measure of reading achievement that was higher than the average score of other Title I schools in the state. Most of the selected schools also scored above the state average for all schools. Candidate schools were selected from a network of CIERA partner schools as well as from annual reports of outstanding schools in 1996, 1997, and 1998 as reported by the National Association of Title I Directors.

The sample included 504 K–3 classroom teachers in "beat the odds" schools, but the anonymous and voluntary survey made it impossible to determine if these were the most effective teachers in the schools. In the first part of the survey, teachers were asked to record the types of reading assessments used in their classrooms and the frequency with which they used each one. Most teachers reported that they used all of the assessment types; 86% used performance assessments, 82% used teacher-designed assessments, 78% used word attack/word meaning, 74% used measures of fluency and understanding, 67% used commercial assessments, and 59% used standardized reading tests.

The survey showed that K–3 teachers used a variety of assessments in their classrooms daily. Assessments designed by teachers, including the instructional assessments Meisels and Piker (2000) examined, were used most frequently, and standardized tests were used least often. This contrast was most evident for K–1 teachers, who rarely used standardized tests. The survey showed that K–3 teachers used observations, anecdotal evidence, informal inventories, and work samples as their main sources of evidence about children's reading achievement and progress. The survey also showed the variety of tools available to teachers and the large variation among teachers in what they used. The daunting variety of assessments requires a highly skilled teacher to select and use appropriate tools.

Another part of the survey posed questions about the effects of assessments on various stakeholders. In general, teachers reported that teacher-designed, informal assessments had more positive effects on students, teachers, and parents. Conversely, teachers believed standardized and commercial assessments had a higher positive effect on administrators. These patterns suggest that teachers differentiate between assessments over which they have control and assessments generated externally in terms of their effects on stakeholders. It is ironic that teachers believed that the most useful assessments for students, teachers, and parents were valued less by administrators than standardized and commercial assessments.

Responses to High-Stakes Assessment

A fourth survey conducted by CIERA researchers gathered the views of teachers regarding high-stakes testing (Hoffman, Assaf, & Paris, 2001). This study, which surveyed reading teachers in Texas, was designed as a modified replication of earlier investigations of teachers' views of high-stakes testing in Arizona (Haladyna, Nolen, & Haas, 1991) and Michigan (Urdan & Paris, 1994). Texas is recognized nationally as one of the leaders in the testing and accountability movement. The Texas Assessment of Academic Skills (TAAS) was the centerpiece of the state's accountability system throughout the 1990s. The TAAS was a criterion-referenced assessment of reading and mathematics given to all Texas students in grades 3–8 near the end of the year. It has recently been replaced by the Texas Assessment of Knowledge and Skills (TAKS), but the design and use are essentially the same. The study, conducted in 1998–1999, included responses from 200 experienced reading specialists who returned a mail survey. For the most part, respondents were older (61% between the ages of 40 and 60) and more experienced (63% with over 10 years experience and 45% with over 20 years experience) than Texas classroom teachers in general. Most respondents were working in elementary grades (78%) and in minority school settings (81%) serving low-income communities (72%) where the need for reading specialists was greatest and funds for them were most available.

To examine general attitudes, we created a composite scale for the following four items from this section:

• Better TAAS tests will make teachers do a better job.
• TAAS tests motivate students to learn.
• TAAS scores are good measures of teachers' effectiveness.
• TAAS test scores provide good comparisons of the quality of schools from different districts.

Each of these items represents some of the political motivations and intentions that underlie the TAAS. Respondents rated each item on a scale ranging from 1 (strongly disagree) to 4 (strongly agree). The average rating on this composite variable was 1.7 (SD = .58), suggesting that reading specialists strongly disagreed with some of the underlying assumptions of and intentions for the TAAS.

Another composite variable was created with items related to the validity of the TAAS as a measure of student learning. The four items included in this analysis were:

• TAAS tests accurately measure achievement for minority students.
• TAAS tests accurately measure achievement for limited English-speaking students.
• Students' TAAS scores reflect what students have learned in school during the past year.
• Students' TAAS scores reflect the cumulative knowledge students have learned during their years in school.

The average rating on this composite variable was also 1.7 (SD = .58), suggesting that reading specialists challenge the validity of the test, especially for minority students and ESL speakers, who are the majority of students in Texas public schools.

Contrast these general attitudes and beliefs regarding TAAS with the perception of the respondents that administrators believe TAAS performance is an accurate indicator of student achievement (M = 3.1) and the quality of teaching (M = 3.3). Also, contrast this with the perception of the reading specialists that parents believe the TAAS reflects the quality of schooling (M = 2.8). The gaping disparity between the perceptions of those responding and their views of administrators' and parents' attitudes suggests an uncomfortable dissonance. Other parts of the TAAS survey revealed that reading specialists reported more pressure to cheat on the tests among low-performing schools, inappropriate uses of the TAAS data, adverse effects on the curriculum, too much time spent on test preparation, and negative effects on teachers' morale and motivation. In sum, the survey revealed unintended and negative consequences of high-stakes testing that are similar to results of other studies of the consequences of high-stakes testing (e.g., Paris, 2000; Paris, Lawton, Turner, & Roth, 1991; Urdan & Paris, 1994).

Summary of the CIERA Surveys

The four CIERA surveys support several conclusions. First, a vast assortment of commercial and informal reading assessments is available for K–3 classroom teachers. Stallman and Pearson (1990) identified 20 commercial reading tests, yet 10 years later Pearson et al. (1999) found 148, and the number is certainly higher today. However, commercial tests are not the only source of reading assessments. Meisels and Piker (2000) solicited information about noncommercial assessments from teachers and educators and identified 89 types of literacy assessments measuring 203 skills. Teachers face a formidable task of finding appropriate tools, obtaining them, and then adapting the assessments to their own purposes and students.

Second, reading assessments varied by grade level. Teachers in K–1, compared to teachers in grades 2–3, were more likely to use assessments of print awareness, phonics, and similar enabling skills than assessments of reading, writing, or motivation. Teachers in grades K–1 were also less likely than teachers in grades 2–3 to use standardized tests and commercial assessments. Observations were reported as the most common type of assessment and may be slightly more frequent at grades K–1. Recognition as a response option was also used most frequently among younger children, whereas identification and production were more frequent at grades 2–3. Teachers in grades 2–3 use more sophisticated tests of reading and writing and fewer measures of enabling skills as their assessment methods match the developing abilities of their students.

Third, teachers regarded informal measures that they design, select, and embed in the curriculum as more useful for teachers, students, and parents than commercial assessments. Teachers regarded standardized tests and commercial tests that allow little teacher control and adaptation as less useful and used them less often. Paradoxically, the standardized tests were regarded as having the most effect on administrators' knowledge and reporting practices. We think that teachers' frustration with assessments is partly tied to this paradox.

Fourth, the most frequently used and highly valued reading assessments are least visible to parents and administrators because they are not reported publicly. Observations, anecdotes, and daily work samples are certainly low-stakes evidence of achievement for accountability purposes, but they may be the most useful for teachers, parents, and students. It is also ironic that the assessments on which teachers feel least trained and regard as least useful (i.e., standardized tests) are used most often for evaluations and public reports. Together these findings suggest that teachers need support in establishing the value of instructional assessments in their classrooms for administrators and parents while also demarcating the limits and interpretations of externally mandated tests (see Hoffman, Paris, Patterson, Salas, & Assaf, 2003). The current slogan about the benefits of a balanced approach to reading instruction might also be applied to a balanced approach to reading assessment. The skills that are assessed need to be balanced among various components of reading, and the purposes/benefits of assessment need to be balanced among the stakeholders.

The critical question that many policy makers ask is, Which reading assessments provide the best evidence about children's accomplishments and progress? The answer may not be one test or even one type of assessment. A single assessment cannot adequately represent the complexity of a child's reading development. Likewise, the same assessments may not represent the curriculum and instructional diversity among teachers. A single assessment cannot capture the variety of skills and developmental levels of children in most K–3 classes. That is why teachers use multiple assessments and choose those that fit their purposes. These assessments are the ones that can reveal the most information about their students. We believe that the most robust evidence about children's reading reveals developing skills that can be compared to individual standards of progress as well as to normative standards of achievement. A developmental approach balances the types of assessments across a range of reading factors and allows all stakeholders to understand the strengths and weaknesses of the child's reading profile. Many teachers use this approach implicitly, and we think it is a useful model for early reading assessment rather than a one-test-fits-all approach.


Assessment of Students' Oral Reading

Oral reading has been a focus for the assessment of early reading development throughout the twentieth century (Rasinski & Hoffman, 2003). Teachers in the aforementioned surveys reported using children's oral reading as an indicator of growth and achievement. The informal reading inventory (IRI) changed over time to focus on the accuracy of oral reading with less attention to reading rate until recently. Now researchers have focused attention on three facets of oral reading fluency—rate, accuracy, and prosody—as indicators of automatic decoding and successful reading (Kuhn & Stahl, 2003).
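As a concrete illustration of the rate and accuracy facets (our sketch, not a procedure from the CIERA studies), both can be computed from a single timed oral reading record; the function name and the numbers below are illustrative. Prosody, by contrast, requires a human rating.

    # Minimal sketch: accuracy and rate (words correct per minute, WCPM)
    # from one timed oral reading of a passage.
    def fluency_measures(total_words, errors, seconds):
        correct = total_words - errors
        accuracy = correct / total_words      # proportion of words read correctly
        wcpm = correct / (seconds / 60.0)     # words correct per minute
        return accuracy, wcpm

    # Example: a 180-word passage with 9 errors, read in 150 seconds.
    acc, rate = fluency_measures(180, 9, 150)
    print(f"accuracy = {acc:.2f}, rate = {rate:.1f} WCPM")   # 0.95, 68.4

Cutoffs for independent, instructional, and frustration levels vary across inventories, so the sketch reports raw proportions rather than assigning a level.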

During the first year of CIERA, Scott Paris and David Pearson were asked by the Michigan Department of Education (MDE) to help evaluate the new Michigan Literacy Progress Profile (MLPP) while also evaluating summer reading programs throughout the state. These research projects dovetailed with CIERA research on assessment, so we spent 5 years working with the Ingham Intermediate School District and MDE evaluating summer reading programs and testing components of the MLPP. The program evaluations led to several insights about early reading assessments and evaluation research that are worth noting here (Paris, Pearson, et al., in press).

One insight from the research was the realization that informal reading inventories (IRIs) were legitimate tools for assessing student growth in reading and for program evaluation. In the past 5–7 years, several state assessment programs and commercial reading assessments have used leveled texts with running records or miscue analyses as formative and summative assessments of early reading. There has been widespread enthusiasm for such IRI assessments that serve both purposes because the assessments are authentic, aligned with classroom instructional practices, and integrated into the curriculum. In fact, IRIs are similar to the daily performance assessments and observations teachers reported in the CIERA survey of classroom assessments. However, the use of IRIs for summative assessment must be viewed with caution until the reliability and validity of IRI assessments administered by teachers can be established. Extensive training and professional development that integrate reading assessment with instruction seem necessary in our experience.

A second insight has involved the difficulties in analyzing students' growth when students are reading different leveled texts. The main problem in using IRIs for measuring reading growth is that running records and miscue analyses are gathered on variable levels of text that are appropriate for each child. Thus, comparing a child's reading proficiency at two times (or comparing various children to each other over time) usually involves comparisons of different passages and text levels, so changes in children's performance are confounded by differences between passages and difficulty levels. Paris (2002) identified several methods for analyzing IRI data from leveled texts and concluded that the most sophisticated statistical procedure was based on Item Response Theory (IRT). In the evaluation of summer reading programs, Paris, Pearson, et al. (2004) used IRT analyses to scale all the reading data from more than 1,000 children on different passages and different levels of an IRI so the scores could be compared on single scales of accuracy, comprehension, retelling, and so forth. Those analyses revealed significant effects on children who participated in summer reading programs compared to control groups of children who did not participate in the summer programs (see Paris, Pearson, et al., 2004).

A brief description of IRT analyses will reveal the benefits of this approach. IRT is a psychometric method of analyzing data that allows estimates of individual scores that are independent of the actual test items. This is important for reading assessment that compares students' growth over time on different levels, items, and tests, which is the usual problem using IRI data. The IRT scaling procedures in a two-parameter Rasch model estimate the individual scores and item difficulties simultaneously (Embretson & Reise, 2000). The crux of an IRT analysis is to find optimal estimates for the item parameters that depend on the students' IRT scores that, in turn, depend on the item parameters. This catch-22 is solved statistically by an iterative procedure that converges toward a final solution with optimal estimates for all parameters. However, the calculation is different from other statistical procedures, such as regression analysis, because "likelihood" is the underlying concept and not regression weights.

The item difficulty is calculated according to a logistic function that identifies the point on an item parameter scale where the probability of a correct response is exactly .50. The distribution of correct answers across items of varying difficulty from students in the sample permits estimates of individual IRT scores that are based on the actual as well as possible patterns of correct responses. The numerical IRT scale is then established with a zero point and a range of scores, for example, 0–100 or 200–800. Fortunately, there are software programs available to calculate IRT scores, but they have rarely been used with children's reading data derived from IRIs and leveled texts. We think IRT analyses are scientifically rigorous and potentially useful ways to examine children's reading data and progress over time.
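To make the logistic function and the likelihood idea concrete, here is a minimal Python sketch (ours, not the software used in the CIERA evaluations); the item parameters and response pattern are invented for illustration. In a two-parameter model, each item has a difficulty b (the score at which the probability of a correct response is exactly .50) and a discrimination a.

    import math

    def p_correct(theta, a, b):
        # Probability that a reader with IRT score theta answers the item
        # correctly; at theta == b the probability is exactly .50.
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))

    def log_likelihood(theta, items, responses):
        # Log-likelihood of one response pattern (1 = correct, 0 = incorrect).
        ll = 0.0
        for (a, b), x in zip(items, responses):
            p = p_correct(theta, a, b)
            ll += math.log(p if x else 1.0 - p)
        return ll

    # Three leveled passages of increasing difficulty (invented parameters)
    # and one child's pattern of correct/incorrect responses.
    items = [(1.2, -1.0), (1.0, 0.0), (0.8, 1.0)]
    responses = [1, 1, 0]

    # Crude grid search for the score that maximizes the likelihood, holding
    # the item parameters fixed; full estimation alternates between person
    # and item parameters until the iterations converge.
    theta_hat = max((t / 100.0 for t in range(-300, 301)),
                    key=lambda t: log_likelihood(t, items, responses))

Because the scale is arbitrary, the estimated scores can then be linearly rescaled to any convenient range, such as the 0–100 or 200–800 ranges mentioned above.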

A third set of insights about reading assessments involved practical decisions about how to use IRIs effectively. Paris and Carpenter (2003) found that teachers require sustained professional development and schoolwide implementation of reading assessments to use them uniformly, consistently, and wisely. The real benefit of IRIs is the knowledge teachers gain while assessing individual children because the assessment framework provides insights about needed instruction. Teachers need guidance in selecting IRIs, administering them, interpreting them, and using the results with students and parents, and that guidance needs to be shared knowledge among the school staff so it creates a culture of understanding about reading assessment. Paris and Carpenter (2003) found that implementing a schoolwide system of recording and reporting the data as part of the verification of students' adequate yearly progress (AYP) made the assessments worth the time and energy of all the participants. Thus, teachers gained diagnostic information about students and also provided accountability through measures of AYP by comparing fall and spring scores.

A fourth insight that researchers gained is that IRIs can provide multiple indicators of children's oral reading, including rate, accuracy, prosody, retelling, and comprehension, and that teachers can choose which measures to collect. CIERA research identified some problems with the various measures derived from IRIs (Paris, Carpenter, Paris, & Hamilton, in press). For example, there are restricted ranges and ceiling effects in some measures, such as prosody and accuracy. It also appears that comprehension is more highly related to oral reading accuracy and rate in beginning readers and that the relation decreases by the time children are reading texts at a third- or fourth-grade level. This means that some children become adept "word callers" with little evidence of comprehension, so reading rate and accuracy measures in IRIs may yield incomplete information for older readers.

IRI data on oral reading fluency and comprehension are most informative about children's reading during initial skill development, approximately grades K–3, and when the information is used in combination with other assessments. Assessments of prerequisite skills for fluent oral reading, such as children's vocabulary, letter-sound knowledge, phonological awareness, beginning writing, understanding of text conventions, and book-handling skills, may augment IRIs with valuable information. Thus, IRIs provide developmentally sensitive assessments for beginning and struggling readers when fluency and understanding are growing quickly and when teaching focuses on specific reading skills. IRIs are excellent tools for combining diagnostic and summative assessments in an authentic format for teachers and students.

New Directions in Early Reading Assessment

In this part of our review, we summarize four examples of innovative assessments by CIERA researchers that chart new directions in literacy assessment with young children.

Narrative Comprehension

During the past 10 years of renewed emphases on beginning reading, there has been less attention given to children's comprehension skills compared to decoding skills (National Reading Panel, 2000). More research on young children's comprehension skills and strategies is needed to diagnose and address children's early reading difficulties that extend beyond decoding. A major CIERA assessment project focused on children's comprehension of narrative stories, and more specifically, on narratives illustrated in wordless picture books. Paris and Paris (2003) created and tested comprehension assessment materials and procedures that can be used with young children, whether or not they can decode print. Such early assessments of comprehension skills can complement existing assessments of enabling skills, provide diagnostic measures of comprehension problems, and link comprehension assessment with classroom instruction.

Narrative comprehension is a complex meaning-making process that depends on the simultaneous development of many skills, including, for example, understanding of story structure and relations among elements and psychological understanding about characters' thoughts and feelings. It is important to assess narrative comprehension for several reasons. First, narrative competence is among the fundamental cognitive skills that influence early reading development. Whitehurst and Lonigan (1998) refer to these skills as "outside-in" skills because children use the semantic, conceptual, and narrative relations that they already know to comprehend the text. In this view, narrative competence is a fundamental aspect of children's comprehension of experiences before they begin to read, and it helps children map their understanding onto texts. The importance and early development of narrative thinking may be one reason that elementary classrooms are dominated by texts in the narrative genre (Duke, 2000). Second, because of the extensive research on narrative comprehension, there is ample documentation of its importance among older children and adults as well as extensive research on its development (e.g., Berman & Slobin, 1994). Third, the clear structure of narrative stories with specific elements and relations provides a structure for assessment of understanding. Fourth, narrative is closely connected to many concurrent developmental accomplishments of young children in areas such as language, play, storytelling, and television viewing. It is an authentic experience in young children's lives, and it reveals important cognitive accomplishments.

In a procedure similar to one van Kraayenoord and Paris (1996) used, Paris and Paris (2003) modified trade books with clear narrative story lines—a strategy that can be used easily for both assessment and instructional purposes—to create the narrative comprehension (NC) assessment task. They located commercially published wordless picture books, adapted them by deleting some irrelevant pages to shorten the task, and assembled the pages of photocopied black and white pictures into spiral-bound little books. It was important that the story line revealed by the pictures was clear with an obvious sequence of events and that the pictures contained the main elements of stories (i.e., settings, characters, problems, resolutions). The first study in Paris and Paris (2003) established the NC task procedures for observing how K–2 children interacted with wordless picture books under three conditions: spontaneous examination during a "picture walk," elicited retelling, and prompted comprehension during questioning. The results were striking. The retelling and prompted comprehension scores increased in regular steps for each grade from K to 2, and readers were significantly better than nonreaders on both measures, thus showing developmental sensitivity of the assessment task. There were no developmental differences on the picture walk behaviors, however.

In study 2, Paris and Paris (2003) extended the procedures to additional wordless picture books and examined the reliability of the assessment procedures. The similarity of developmental trends across books indicated that the NC task is sensitive to progressive increases in children's abilities to make inferences and connections among pictures and to construct coherent narrative relations from picture books. Similarity of performance across books showed that examiners can administer the NC task with different materials and score children's performance in a reliable manner. Thus, the generalizability and robustness of the NC task across picture books were supported.

In study 3, Paris and Paris (2003) examined the predictive and concurrent validity of the NC task with standardized and informal measures of reading. The similarity in correlations, overall means, and developmental progressions of NC measures confirmed the patterns revealed by studies 1 and 2 with new materials and additional children. In addition, the NC task was sensitive to individual growth over 1 year that was not due to practice effects. The NC retelling and comprehension measures were correlated significantly with concurrent assessments with an IRI and the Gates-MacGinitie Reading Test, a standardized, group-administered test. Furthermore, the NC comprehension scores among first graders significantly predicted their scores on the Iowa Tests of Basic Skills (ITBS) a year later in second grade (r = .52).

The three studies provided consistent and positive evidence about the NC task as a developmentally appropriate measure of 5–8-year-old children's narrative understanding of picture books. Retelling and prompted comprehension scores improved significantly with age, indicating that the NC task differentiates children who can recall main narrative elements, identify critical explicit information, make inferences, and connect information across pages from children who have weaknesses with these narrative comprehension skills. The NC task requires brief training and can be given to children in less than 15 minutes, which is critical for individual assessment of young children. The high percentage agreement between raters among the three books showed that the scoring rubrics are reliable across story content and raters. The similar patterns of cross-sectional and longitudinal performance further confirmed the generalizability of the task. The strong concurrent and predictive relations provided encouraging evidence of the validity of the NC task as a measure of comprehension for emergent readers.

In a related series of CIERA studies, van den Broek et al. (in press) examined preschool children's comprehension of televised narratives. They showed 20-minute episodes of children's television programs and presented 13-minute audiotaped stories to children to compare their viewing and listening comprehension. Children recalled causally related events in the narratives better than other kinds of text relations, and their recall scores in viewing and listening conditions were highly correlated. Furthermore, preschoolers' comprehension of TV episodes predicted their standardized reading comprehension test scores in second grade. The predictive strength remained even when vocabulary and word identification skills were controlled in a regression analysis. Thus, narrative comprehension skills of preschoolers can be assessed with TV and picture books, and the measures have significant predictive validity for later reading comprehension. We think that narrative comprehension viewing and listening tasks can help teachers to focus on comprehension skills of young children even if the children have restricted decoding skills, few experiences with books, or limited skills in speaking English.

Parent-Child Interactive Reading

DeBruin-Parecki (1999) created an assessment procedure for family literacy programs that records interactive book reading behaviors. One purpose of the assessment was to help parents with limited literacy skills understand the kinds of social, cognitive, and literate behaviors that facilitate preschool children's engagement with books. A second purpose was to provide family literacy programs with visible evidence of the quantity and quality of parent-child book interactions. The research was based on the premise that children learn to read early and with success when parents provide stimulating, print-rich environments at home (e.g., Bus, van Ijzendoorn, & Pellegrini, 1995). Moreover, parents must provide appropriate support during joint book reading. Morrow (1990) identified effective interactive reading behaviors, such as questioning, scaffolding dialogue and responses, offering praise or positive reinforcement, giving or extending information, clarifying information, restating information, directing discussion, sharing personal reactions, and relating concepts to life experiences. Thus, DeBruin-Parecki (1999) created the Adult/Child Interactive Reading Inventory (ACIRI) to assess these kinds of behaviors.

The ACIRI lists 12 literacy behaviors of adults and the corresponding 12 behaviors by children. For example, one adult behavior is "poses and solicits questions about the book's content," and the corresponding child behavior is "responds to questions about the book." There were four behaviors in each of the following three categories: enhancing attention to text, promoting interactive reading and supporting comprehension, and using literate strategies. The observer using the ACIRI recorded the frequencies of the 12 behaviors for both parent and child along with notes about the joint book reading. The assessment was designed to be brief (15–30 minutes), flexible, nonthreatening, appropriate for any texts, shared with parents, and informative about effective instruction. Following the assessment, the observer discusses the results with the parent to emphasize the positive features of the interaction and to provide guidance for future book interactions. The transparency of the assessment and the immediate sharing of information minimize the discomfort of being observed. After leaving the home, the observer can record additional notes and calculate quantitative scores for the frequencies of observed behaviors.
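The paired structure of the inventory lends itself to a simple tally scheme. The sketch below is our illustration of how such paired frequencies might be recorded and summarized; the category labels come from the description above, but the data structure and scoring are hypothetical, not DeBruin-Parecki's published instrument.

    from collections import defaultdict

    # Three ACIRI categories with four behaviors each, as described above.
    CATEGORIES = {
        "enhancing attention to text": 4,
        "promoting interactive reading and supporting comprehension": 4,
        "using literate strategies": 4,
    }

    def new_record():
        # counts[(category, behavior_index)] -> {"adult": n, "child": n}
        return defaultdict(lambda: {"adult": 0, "child": 0})

    def tally(record, category, behavior_index, role):
        record[(category, behavior_index)][role] += 1

    def category_means(record):
        # Average frequency per behavior within each category, by role.
        means = {}
        for cat, n in CATEGORIES.items():
            for role in ("adult", "child"):
                total = sum(record[(cat, i)][role] for i in range(n))
                means[(cat, role)] = total / n
        return means

    rec = new_record()
    tally(rec, "enhancing attention to text", 0, "adult")   # adult points to a picture
    tally(rec, "enhancing attention to text", 0, "child")   # child responds
    print(category_means(rec)[("enhancing attention to text", "adult")])   # 0.25

Keeping parallel adult and child tallies also makes it straightforward to compute the adult-child correlations that, as reported below, were used as evidence of the instrument's sensitivity.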

DeBruin-Parecki tested the ACIRI with 29 mother-child pairs enrolled in an Even Start family literacy program in Michigan. The children were 3 to 5 years old, and the mothers were characterized as lower socioeconomic status. The regular staff collected ACIRI assessments at the beginning and end of the year as part of their program evaluation and field testing of the assessment. Familiarity also minimized anxiety about being observed. The results supported the usefulness of the instrument. The ACIRI was shown to be sensitive to parent-child interactions because the four behaviors in each of the categories showed significant correlations between the frequencies of adult and child behaviors. Reliability was evaluated by having observers rate videotaped parent-child book interactions. Interrater reliability was 97% among eight raters. Consequential validity was established through staff interviews that showed favorable evaluations of the ACIRI. The comparison of fall to spring scores showed that parents and children increased the frequencies of many of the 12 behaviors during the year. Thus, the ACIRI provided both formative and summative assessment functions for Even Start staff.

Book Reading in Early Childhood Classrooms

In addition to measuring reading skills of children and adults, it is also important to assess the literate environment. Longitudinal and cross-sectional studies of children's literacy development reveal that more frequent book reading at home, supported by interactive conversations and scaffolded instruction, leads to growth in language and literacy during early childhood (Scarborough & Dobrich, 1994; Tabors, Snow, & Dickinson, 2001). Similar studies in schools have shown that the quality of teacher-child interaction, the frequency of book reading, and the availability of books all enhance children's early reading and language development (Neuman, 1999; Whitehurst & Lonigan, 1998). Thus, assessments of environments that support reading can help improve conditions at home and school.

Dickinson, McCabe, and Anastasopoulos (2001) reported a framework for assessing book reading in early childhood classrooms along with data from their observations of many classrooms. They derived the following five important dimensions to evaluate in classrooms.

• Book area. Issues to consider include whether there is a book area, the quality of the area, and the quantity and quality of books provided.
• Time for adult-child book reading. Time is a critical ingredient, and consideration should be given to the frequency and duration of adult-mediated reading experiences, including one-to-one, small-group, and whole-class readings, as well as the number of books read during these sessions.
• Curricular integration. Integration refers to the nature of the connection between the ongoing curriculum and the use of books during whole-class times and throughout the day.
• Nature of the book reading event. When considering a book reading event, one should examine the teacher's reading and discussion styles and children's engagement.
• Connections between the home and classroom. The most effective teachers and programs strive to support reading at home through parent education, lending libraries, circulation of books made by the class, and encouragement of better use of community libraries.

Dickinson et al. (2001) examined data from four studies in order to evaluate the importance of these dimensions. They noted that many of the preschool classrooms they observed were rated high in quality using historical definitions of developmentally appropriate practices, but that the same classrooms were rated as having low-quality literacy instruction. For example, only half the classrooms they observed had separate areas for children to read books, and there were few informational books and few books about varied racial and cultural groups. They found no book reading at all in 66 classrooms. In the other classrooms, adults read to children less than 10 minutes per day, and only 35% of the classes allowed time for children to look at books on their own. Other observations led the researchers to conclude that book reading was not coordinated with the curriculum or learning goals. Only 19% of classrooms had three or more books related to a curricular theme, and only 35% of classrooms had listening centers. Dickinson et al. (2001) noted that group reading is a filler activity used in classroom transitions rather than an instructional and curricular priority.

The researchers also examined book reading in classrooms by assessing teachers' style, animation, and prosody as they read. Most teachers read with little expressiveness. Many used an explicit management style, such as asking questions of children who raised their hands, as "crowd control" rather than using thought-provoking questions about text. More than 70% of teachers' talk during book reading made few cognitive demands on children. The researchers suggested that teachers must devote more attention to engaging children in discussions that link their experiences to text, teach new vocabulary words, probe characters' motivations, and promote comprehension of text relations. Analyses of home-school connections revealed that more could be done to encourage families to reinforce school practices and to seek community literacy resources. Teachers rarely connected language and cultural experiences at home with literacy instruction at school.

Dickinson et al. (2001) concluded that the framework can be useful for assessing early childhood classrooms and for studying the effects of specific environmental features on children's literacy development. They noted, for example, that their research revealed little correlation between the amount of book reading in classrooms and the degree to which reading was integrated into the curriculum. They interpreted this as evidence that book reading in many early childhood classrooms is an incidental activity rather than a planned instructional goal. The framework is also useful for reading educators to use with preservice and in-service teachers who want to assess their own classrooms and teaching styles because it identifies critical elements of successful classrooms.

Texts and the Text Environment in Beginning Reading Instruction

The research by Dickinson et al. (2001) provides a conceptual bridge to several other CIERA investigations of texts and the text environment. Two important strands of research at CIERA have examined the assessment of text characteristics for beginning reading instruction. In the first strand of research, Hiebert and her colleagues developed a text assessment framework that can be used to analyze important features of texts used for beginning reading instruction. This framework is grounded in Hiebert's theoretical claims that certain text features scaffold readers' success in early reading. The framework, called the Text Elements by Task (TExT) model, identifies two critical factors in determining beginning readers' success with texts: linguistic content and cognitive load.

Linguistic content refers to the knowledge about oral and written language that is necessary for readers to recognize the words in particular texts. Phoneme-grapheme knowledge that is required to read a text is described in terms of several measures that provide different but complementary information on the phoneme-grapheme knowledge related to vowels. The first measure of phoneme-grapheme knowledge summarizes the complexity of the vowel patterns in a text. The second measure is the degree to which highly common vowel and consonant patterns are repeated. To use this measure, the TExT model examines the number of different onsets that appear with a particular rime. The number of syllables in words is a third measure of linguistic content that influences beginning readers' recognition of words. The model claims that texts with fewer multisyllabic words help children acquire fluent word recognition.
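To make the onset-rime measure concrete, the sketch below counts how many different onsets appear with each rime among a text's words. It is a minimal illustration in Python using our own simplified orthographic rules and naming; it is not Hiebert's published procedure.

    from collections import defaultdict

    VOWELS = set("aeiou")

    def onsets_by_rime(words):
        """Group the onsets that appear with each rime (split at first vowel)."""
        rimes = defaultdict(set)
        for word in words:
            word = word.lower()
            for i, ch in enumerate(word):
                if ch in VOWELS:
                    # The onset is everything before the first vowel;
                    # it may be empty, as in "at".
                    rimes[word[i:]].add(word[:i])
                    break
        return rimes

    for rime, onsets in onsets_by_rime(["cat", "hat", "sat", "cap", "at"]).items():
        print(rime, sorted(onsets))
    # The rime "at" appears with onsets "", "c", "h", and "s"; "ap" only with "c".

On this logic, a text whose rimes recur with many different onsets gives beginning readers more opportunities to practice a common pattern.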

The cognitive load factor within the TExT model measures the amount of new linguistic information to which beginning readers can attend while continuing to understand the text's message. Repetition of at least some core linguistic content has traditionally been used to reduce the cognitive load in texts used for teaching children to read. Within the TExT model, word repetition and the number of unique words relative to total words are used to inspect the cognitive load a particular text places on beginning readers. Two additional features of texts that are commonly used in classrooms are also considered in the model: the support provided through illustrations, and patterns of sentence and text structure. In a recent study applying these TExT principles, Menon and Hiebert (2003) compared "little books" developed within the framework to traditional beginning reading texts. They demonstrated the effectiveness of students' practice reading little books developed in consideration of the TExT model over traditional basal texts. The assessment tools developed in this research can be used to evaluate the complexity of texts as well as to guide the construction of new texts for beginning readers.
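The cognitive load indicators are equally straightforward to compute. The following minimal sketch counts word repetition and the ratio of unique to total words for a sample text; the function name and tokenization are illustrative assumptions, not the published TExT computations.

    from collections import Counter

    def cognitive_load_counts(text):
        """Count simple cognitive-load indicators for a short text."""
        words = [w.strip(".,!?;:'\"").lower() for w in text.split()]
        words = [w for w in words if w]
        counts = Counter(words)
        total, unique = len(words), len(counts)
        return {
            "total_words": total,
            "unique_words": unique,
            # More unique words per running word implies a heavier load.
            "unique_to_total": unique / total if total else 0.0,
            # More repetitions per word type implies a lighter load.
            "mean_repetitions": total / unique if unique else 0.0,
        }

    print(cognitive_load_counts("The cat sat. The cat ran. The dog ran to the cat."))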

In the second strand of research focused on texts, Hoffman and his colleagues investigated the qualities of texts used in beginning reading instruction and the leveling systems of these texts. The research conducted through CIERA was grounded in earlier studies of changes in basal reading texts associated with the literature-based movement (Hoffman et al., 1998) and later with the decodable text movement (Hoffman, Sailors, & Patterson, 2002). Hoffman (2002) has proposed a model for the assessment of text quality for beginning reading instruction that considers three factors: accessibility, instructional design, and engaging qualities. Accessibility of a text for the reader is a function of the decoding demands of the text (a word-level focus) and the support provided through the predictable features of the text (ranging from picture support, to rhyme, to repeated phrases, to a cumulative structure). The instructional design factor involves the ways that a text fits into the larger scheme for texts read (a leveling issue) as well as the instruction and curriculum that surround the reading of the text. Finally, the engaging-qualities factor considers the content, language, and design features of the text.

In addition to evaluating the changes in texts for beginning readers, Hoffman, Roser, Salas, Patterson, and Pennington (2001) used this framework to study the validity of some of the popular text-leveling schemes. In a study involving over 100 first-grade learners, these researchers examined the ways in which estimates of text difficulty using different text-leveling systems predicted the performance of first-grade readers. The research identified high correlations between the factors in the theoretical model and the leveling from both the Pinnell and Fountas and Reading Recovery systems. The analysis also confirmed the validity of these systems in predicting student performance with first-grade texts. Finally, the study documented the effects of varying levels of support provided to the reader (shared reading, guided reading, no support) on performance on such measures as oral reading accuracy, rate, and fluency.
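The predictive-validity logic of such a study can be illustrated with a short computation: a valid leveling system should assign levels that correlate strongly and negatively with readers' oral reading accuracy. The numbers below are invented for illustration and are not data from Hoffman et al. (2001).

    import numpy as np

    # Hypothetical summary data: mean oral reading accuracy at each text level.
    text_level = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    accuracy = np.array([0.99, 0.97, 0.95, 0.93, 0.90, 0.87, 0.84, 0.80])

    # A valid leveling scheme should show a strong negative relation:
    # harder texts yield lower accuracy.
    r = np.corrcoef(text_level, accuracy)[0, 1]
    print(f"Pearson r between text level and accuracy: {r:.2f}")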

Hoffman and Sailors (2002) created a method for assessing the classroom literacy environment called the TEX-IN3 that includes three components: a text inventory, a text in-use observation, and a series of text interviews. The TEX-IN3 draws on several research literatures, including research on texts conducted through CIERA. In addition, the instrument was developed based on the literature exploring the effects of the text environment on teaching and learning. The assessment yields a series of scores as well as qualitative data on the classroom literacy environment.

The TEX-IN3 was validated in a study of over 30 classrooms (Hoffman, Sailors, Duffy, & Beretvas, 2003). In this study, students were tested with pre- and posttests on a standardized reading test. Observers were trained in the use of the TEX-IN3, and high levels of reliability were established. Data were collected in classrooms at three times (fall, winter, and spring). Data analyses focused on the relations between features of the text environment from the TEX-IN3 and students' reading comprehension scores. The analyses supported all three components of the TEX-IN3. For example, the correlations between students' gain scores and the ratings of the overall text environment were significant. Correlations between students' gain scores and the in-use scores derived from observation of teaching, as well as the correlations between the rating of teachers' understanding and valuing of the text environment with students' gain scores, were significant. The findings from the research with the TEX-IN3 suggest the importance of expanding assessment from a narrow focus on the texts in classrooms to consideration of texts in the full literacy environment.

Summary and Future Research

From 1997 to 2002, CIERA researchers conducted many studies of early reading assessment that focused on readers and text, home and school, and policy and profession. The CIERA surveys of early reading assessments identified the expanding array of assessment instruments, commercial and noncommercial, available to K–3 teachers. Researchers also identified how effective teachers in schools that "beat the odds" use assessments and how they view the utility and effects of various types of assessments. The instruments used most frequently contributed to ongoing CIERA research on the development of the MLPP battery and the use of IRIs for formative and summative purposes. Those studies remain in progress as researchers collect longitudinal evidence about the reliability and validity of early reading assessments (e.g., Paris, Carpenter, et al., in press).

The most important insight from this research is that some skills, such as alphabet knowledge, concepts of print, and phonemic awareness, are universally mastered in relatively brief developmental periods. As a consequence, the distributions of data from these variables are skewed by floor and ceiling effects that, in turn, influence the correlations used to establish reliability and validity of the assessments. Assessments of oral reading accuracy, and perhaps rate, are also skewed, so that measures of some basic reading skills are difficult to analyze with parametric statistics in traditional ways. The mastery of some reading skills poses challenges to conventional theories of reading development and traditional statistical analyses.
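A brief simulation illustrates the statistical point: when many children score at a test's ceiling, the observed correlation with a second measure is attenuated relative to the correlation computed from uncapped scores. All parameters below are arbitrary and illustrative rather than CIERA data.

    import numpy as np

    rng = np.random.default_rng(42)
    n = 500
    ability = rng.normal(0.0, 1.0, n)              # latent reading ability

    # A 26-item skill test on which most children approach mastery.
    raw = 24.0 + 4.0 * ability + rng.normal(0.0, 2.0, n)
    capped = np.clip(raw, 0.0, 26.0)               # scores cannot exceed 26

    # A second, related measure without a ceiling.
    other = ability + rng.normal(0.0, 1.0, n)

    print(f"r without ceiling: {np.corrcoef(raw, other)[0, 1]:.2f}")
    print(f"r with ceiling:    {np.corrcoef(capped, other)[0, 1]:.2f}")

Because the capped scores compress the upper end of the distribution, the second correlation is systematically smaller, which is why skills mastered early yield misleadingly weak reliability and validity coefficients.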

CIERA researchers also developed innovative assessments of comprehension with wordless picture books that offer teachers new ways to instruct and assess comprehension with children who cannot yet decode print. These cross-sectional and longitudinal studies substantiate the reliability and validity of early assessments. In addition, CIERA researchers designed and tested new methods for assessing narrative comprehension, interactive parent-child reading, literate environments in early childhood classrooms, text features, and the text environment. All of these tools have immediate practical applications and benefits for educators. Indeed, the hallmark of CIERA research on reading assessment is the use of rigorous methods to identify, create, test, and refine instruments and practices that can help parents and teachers promote the reading achievement of all children.

This research, as well as studies outside the immediate CIERA network, points to the need for continuing study of assessment in early literacy. We believe that at least four areas deserve special attention. First, the policy context for instructional programs, teaching, and teacher education that places a premium on "scientifically proven" approaches and methods has immediate implications for assessment. Tools to be used in reading assessment (e.g., for diagnosis, program evaluation, or research) are subject to high standards of validity and reliability. We applaud this attention to rigor in assessment, but we believe that decision making about the use of instruments should be professional, public, and comprehensive. Such deliberations must extend beyond the traditional psychometric constructs of reliability and validity to include consideration of the consequences of testing and the social contexts of assessment.

Second, researchers must continue to investigate the ways in which assessment tools can be broadened to focus on multiple factors and the interaction of these factors in ways that reflect authentic learning and teaching environments. For example, informal reading inventories have become popular tools for assessment, partly because reading rate and accuracy can be assessed quickly and reliably, but educators need to consider how text-leveling factors might interact with students' developmental levels to influence evaluations of reading performance. Good assessments should lead to an understanding of the complexity of learning to read and not impose a false sense of simplicity on early reading development.

Third, the gulf between what teachers value as informal assessments and what is imposed on them in the form of standardized testing appears to be broadening. Although performance assessments and portfolios were popular in the 1980s and 1990s, the trends today are to increase high-stakes testing for young children, to remove teacher judgment from assessment, and to streamline assessments so they can be conducted quickly and repeatedly. More research is needed on how highly effective teachers assess developing reading skills in their classrooms. Before educators and policy makers abandon performance assessment, careful consideration must be given to the ways that ongoing assessment can promote differentiated instruction.

Fourth, researchers cannot lose sight of the fact that good assessment rests on good theory, not just a theory of reading but of effective teaching and development. Just because motivation, self-concept, and critical thinking are difficult to measure using large-scale standardized tests does not mean they should be ignored. The scientific method is not just about comparing one program or one approach to another to prove which is best. The scientific investigation of assessment in early literacy should contribute to theory building that ultimately informs effective teaching and learning.

References

Berman, R. A., & Slobin, D. I. (1994). Relating events in narrative: A crosslinguistic developmental study. Hillsdale, NJ: Erlbaum.

Bus, A. G., van IJzendoorn, M. H., & Pellegrini, A. D. (1995). Joint book-reading makes for success in learning to read: A meta-analysis on intergenerational transmission of literacy. Review of Educational Research, 65(1), 1–21.

Clay, M. M. (1993). An observation survey of early literacy achievement. Portsmouth, NH: Heinemann.

DeBruin-Parecki, A. (1999). Assessing adult/child storybook reading practices (Tech. Rep. No. 2-004). Ann Arbor: University of Michigan, Center for the Improvement of Early Reading Achievement.

Dickinson, D. K., McCabe, A., & Anastasopoulos, L. (2001). A framework for examining book reading in early childhood classrooms (Tech. Rep. No. 1-014). Ann Arbor: University of Michigan, Center for the Improvement of Early Reading Achievement.

Duke, N. K. (2000). 3.6 minutes per day: The scarcity of informational texts in first grade. Reading Research Quarterly, 35(2), 202–224.

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.

Good, R. H., & Kaminski, R. A. (Eds.). (2002). Dynamic indicators of basic early literacy skills (6th ed.). Eugene, OR: Institute for the Development of Educational Achievement.

Haladyna, T., Nolen, S. B., & Haas, N. S. (1991). Raising standardized achievement test scores and the origins of test score pollution. Educational Researcher, 20(5), 2–7.

Hoffman, J. V. (2002). Words on words in leveled texts for beginning readers. In D. Schallert, C. Fairbanks, J. Worthy, B. Maloch, & J. V. Hoffman (Eds.), Fifty-first yearbook of the National Reading Conference (pp. 59–81). Oak Creek, WI: National Reading Conference.

Hoffman, J. V., Assaf, L. C., & Paris, S. G. (2001). High-stakes testing in reading: Today in Texas, tomorrow? Reading Teacher, 54(5), 482–492.

Hoffman, J. V., McCarthey, S. J., Abbott, J., Christian, C., Corman, L., Curry, C., Dressman, M., Elliot, B., Mathern, D., & Stahle, E. (1998). The literature-based basals in first-grade classrooms: Savior, satan, or same-old, same-old? Reading Research Quarterly, 33, 168–197.

Hoffman, J. V., Paris, S. G., Patterson, E., Salas, R., & Assaf, L. (2003). High-stakes assessment in the language arts: The piper plays, the players dance, but who pays the price? In J. Flood & D. Lapp (Eds.), Handbook of research on teaching the English language arts (2nd ed., pp. 619–630). Mahwah, NJ: Erlbaum.

Hoffman, J. V., Roser, N. L., Salas, R., Patterson, E., & Pennington, J. (2001). Text leveling and "little books" in first-grade reading. Journal of Literacy Research, 33(3), 507–528.

Hoffman, J. V., & Sailors, M. (2002). The TEX-IN3: Text inventory, text in-use and text interviews. Bastrop, TX: Jeaser.

Hoffman, J. V., Sailors, M., Duffy, G., & Beretvas, C. (2003, April). Assessing the literacy environment using the TEX-IN3: A validity study. Paper presented at the annual meeting of the American Educational Research Association, Chicago.

Hoffman, J. V., Sailors, M., & Patterson, E. (2002). Decodable texts for beginning reading instruction: The year 2000 basals. Journal of Literacy Research, 34(3), 269–298.

Kuhn, M. R., & Stahl, S. A. (2003). Fluency: A review of developmental and remedial practices. Journal of Educational Psychology, 95(1), 3–21.

Meisels, S. J., Liaw, F. R., Dorfman, A. B., & Nelson, R. (1995). The Work Sampling System: Reliability and validity of a performance assessment for young children. Early Childhood Research Quarterly, 10(3), 277–296.

Meisels, S. J., & Piker, R. A. (2000). An analysis of early literacy assessments used for instruction (Tech. Rep. No. 3-002). Ann Arbor: University of Michigan, Center for the Improvement of Early Reading Achievement.

Menon, S., & Hiebert, E. H. (2003). A comparison of first graders' reading acquisition with little books and literature anthologies (Tech. Rep. No. 1-009). Ann Arbor: University of Michigan, Center for the Improvement of Early Reading Achievement.

Morrow, L. M. (1990). Assessing children's understanding of story through their construction and reconstruction of narrative. In L. M. Morrow & J. K. Smith (Eds.), Assessment for instruction in early literacy (pp. 110–133). Englewood Cliffs, NJ: Prentice-Hall.

National Reading Panel. (2000). Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction: Reports of the subgroups. Bethesda, MD: National Institute of Child Health and Human Development.

Neuman, S. B. (1999). Books make a difference: A study of access to literacy. Reading Research Quarterly, 34(3), 286–311.

No Child Left Behind Act of 2001, Pub. L. No. 107–110, 115 Stat. 1425 (2002).

Paris, A. H., & Paris, S. G. (2003). Assessing narrative comprehension in young children. Reading Research Quarterly, 38(1), 37–76.

Paris, S. G. (2000). Trojan horse in the schoolyard: The hidden threats in high-stakes testing. Issues in Education, 6(1–2), 1–16.

Paris, S. G. (2002). Measuring children's reading development using leveled texts. Reading Teacher, 56(2), 168–170.

Paris, S. G., & Carpenter, R. D. (2003). FAQs about IRIs. Reading Teacher, 56(6), 578–580.

Paris, S. G., Carpenter, R. D., Paris, A. H., & Hamilton, E. E. (in press). Spurious and genuine correlates of children's reading comprehension. In S. G. Paris & S. A. Stahl (Eds.), Children's reading comprehension and assessment. Mahwah, NJ: Erlbaum.

Paris, S. G., Lawton, T. A., Turner, J. C., & Roth, J. L. (1991). A developmental perspective on standardized achievement testing. Educational Researcher, 20, 12–20.

Paris, S. G., Paris, A. H., & Carpenter, R. D. (2002). Effective practices for assessing young readers. In B. Taylor & P. D. Pearson (Eds.), Teaching reading: Effective schools, accomplished teachers (pp. 141–160). Mahwah, NJ: Erlbaum.

Paris, S. G., Pearson, P. D., Cervetti, G., Carpenter, R., Paris, A. H., DeGroot, J., Mercer, M., Schnabel, K., Martineau, J., Papanastasiou, E., Flukes, J., Humphrey, K., & Bashore-Berg, T. (2004). Assessing the effectiveness of summer reading programs. In G. Borman & M. Boulay (Eds.), Summer learning: Research, policies, and programs (pp. 121–161). Mahwah, NJ: Erlbaum.

Pearson, P. D., Sensale, L., Vyas, S., & Kim, Y. (1999, June). Early literacy assessment: A marketplace analysis. Paper presented at the National Conference on Large-Scale Assessment, Snowbird, UT.

Rasinski, T. V., & Hoffman, J. V. (2003). Oral reading in the school literacy curriculum. Reading Research Quarterly, 38(4), 510–522.

Scarborough, H. S., & Dobrich, W. (1994). On the efficacy of reading to preschoolers. Developmental Review, 14, 245–302.

Snow, C. E., Burns, M. S., & Griffin, P. (1998). Preventing reading difficulties in young children. Washington, DC: National Academy Press.

Stallman, A. C., & Pearson, P. D. (1990). Formal measures of early literacy. In L. M. Morrow & J. K. Smith (Eds.), Assessment for instruction in early literacy (pp. 7–44). Englewood Cliffs, NJ: Prentice-Hall.

Tabors, P. O., Snow, C. E., & Dickinson, D. K. (2001). Homes and schools together: Supporting language and literacy development. In D. K. Dickinson & P. O. Tabors (Eds.), Beginning literacy with language: Young children learning at home and in school (pp. 313–334). Baltimore: Brookes.

Urdan, T. C., & Paris, S. G. (1994). Teachers' perceptions of standardized achievement tests. Educational Policy, 8(2), 137–156.

van den Broek, P., Kendeou, P., Kremer, K., Lynch, J., Butler, J., White, M. J., & Lorch, E. P. (in press). Assessment of comprehension abilities in young children. In S. G. Paris & S. A. Stahl (Eds.), Children's reading comprehension and assessment. Mahwah, NJ: Erlbaum.

van Kraayenoord, C. E., & Paris, S. G. (1996). Story construction from a picture book: An assessment activity for young learners. Early Childhood Research Quarterly, 11, 41–61.

Whitehurst, G. J., & Lonigan, C. J. (1998). Child development and emergent literacy. Child Development, 69(3), 848–872.
