The NAEP Guide

1999 Edition

U.S. Department of Education
Office of Educational Research and Improvement
NCES 2000-456

NATIONAL CENTER FOR EDUCATION STATISTICS


What Is The Nation's Report Card?

THE NATION'S REPORT CARD, the National Assessment of Educational Progress (NAEP), is the only nationally representative and continuing assessment of what America's students know and can do in various subject areas. Since 1969, assessments have been conducted periodically in reading, mathematics, science, writing, history, geography, and other fields. By making objective information on student performance available to policymakers at the national, state, and local levels, NAEP is an integral part of our nation's evaluation of the condition and progress of education. Only information related to academic achievement is collected under this program. NAEP guarantees the privacy of individual students and their families.

NAEP is a congressionally mandated project of the National Center for Education Statistics, U.S. Department of Education. The Commissioner of Education Statistics is responsible, by law, for carrying out the NAEP project through competitive awards to qualified organizations. NAEP reports directly to the Commissioner, who is also responsible for providing continuing reviews, including validation studies and solicitation of public comment, on NAEP's conduct and usefulness.

In 1988, Congress established the National Assessment Governing Board (NAGB) to formulate policy guidelines for NAEP. The Board is responsible for selecting the subject areas to be assessed from among those included in the National Education Goals; for setting appropriate student performance levels; for developing assessment objectives and test specifications through a national consensus approach; for designing the assessment methodology; for developing guidelines for reporting and disseminating NAEP results; for developing standards and procedures for interstate, regional, and national comparisons; for determining the appropriateness of test items and ensuring they are free from bias; and for taking actions to improve the form and use of the National Assessment.

The National Assessment Governing Board


Mark D. Musick, Chair
President, Southern Regional Education Board, Atlanta, Georgia

Michael T. Nettles, Vice Chair
Professor of Education and Public Policy, University of Michigan, Ann Arbor, Michigan

Moses Barnes
Secondary School Principal, Fort Lauderdale, Florida

Melanie A. Campbell
Fourth-Grade Teacher, Topeka, Kansas

Honorable Wilmer S. Cody
Commissioner of Education, State of Kentucky, Frankfort, Kentucky

Edward Donley
Former Chairman, Air Products & Chemicals, Inc., Allentown, Pennsylvania

Honorable John M. Engler
Governor of Michigan, Lansing, Michigan

Thomas H. Fisher
Director, Student Assessment Services, Florida Department of Education, Tallahassee, Florida

Michael J. Guerra
Executive Director, National Catholic Education Association, Secondary School Department, Washington, DC

Edward H. Haertel
Professor, School of Education, Stanford University, Stanford, California

Juanita Haugen
Local School Board President, Pleasanton, California

Honorable Nancy Kopp
Maryland House of Delegates, Bethesda, Maryland

Honorable William J. Moloney
Commissioner of Education, State of Colorado, Denver, Colorado

Mitsugi Nakashima
President, Hawaii State Board of Education, Honolulu, Hawaii

Debra Paulson
Eighth-Grade Mathematics Teacher, El Paso, Texas

Honorable Norma Paulus
Former Superintendent of Public Instruction, Oregon State Department of Education, Salem, Oregon

Honorable Jo Ann Pottorff
Kansas House of Representatives, Wichita, Kansas

Diane Ravitch
Senior Research Scholar, New York University, New York, New York

Honorable Roy Romer
Former Governor of Colorado, Denver, Colorado

John H. Stevens
Executive Director, Texas Business and Education Coalition, Austin, Texas

Adam Urbanski
President, Rochester Teachers Association, Rochester, New York

Deborah Voltz
Assistant Professor, Department of Special Education, University of Louisville, Louisville, Kentucky

Marilyn A. Whirry
Twelfth-Grade English Teacher, Manhattan Beach, California

Dennie Palmer Wolf
Senior Research Associate, Harvard Graduate School of Education, Cambridge, Massachusetts

C. Kent McGuire (Ex-Officio)
Assistant Secretary of Education, Office of Educational Research and Improvement, U.S. Department of Education, Washington, DC

Roy Truby
Executive Director, NAGB, Washington, DC


The NAEP Guide

A Description of the Content and Methods of the 1999 and 2000 Assessments

Revised Edition

November 1999

THE NATIONAL CENTER FOR EDUCATION STATISTICS

Office of Educational Research and Improvement

U.S. Department of Education


U.S. Department of Education
Richard W. Riley, Secretary

Office of Educational Research and Improvement
C. Kent McGuire, Assistant Secretary

National Center for Education Statistics
Gary W. Phillips, Acting Commissioner

Education Assessment Group
Peggy G. Carr, Associate Commissioner

November 1999

SUGGESTED CITATION:

U.S. Department of Education. National Center for Education Statistics. The NAEP Guide, NCES 2000–456, by Horkay, N., editor. Washington, DC: 1999.

FOR MORE INFORMATION:
To obtain single copies of this report, while supplies last, or ordering information on other U.S. Department of Education products, call toll free 1-877-4ED PUBS (877-433-7827), or write:

Education Publications Center (ED Pubs)
U.S. Department of Education
P.O. Box 1398
Jessup, MD 20794-1398

TTY/TDD: 1-877-576-7734
FAX: 301-470-1244

Online ordering via the Internet: http://www.ed.gov/pubs/edpubs.html
Copies also are available in alternate formats upon request.
This report is also available on the World Wide Web: http://nces.ed.gov/nationsreportcard

Cover photo copyright 1999, PhotoDisc, Inc.

The work upon which this publication is based was performed for the National Center for Education Statistics, Office of Educational Research and Improvement, by Educational Testing Service.


ACKNOWLEDGMENTS

This guide was produced with the assistance of professional staff at the National Center for Education Statistics (NCES), Educational Testing Service (ETS), Aspen Systems Corporation, National Computer Systems (NCS), and Westat.

The NCES staff whose invaluable assistance provided text and reviews for this guide include: Janis Brown, Pat Dabbs, and Andrew Kolstad.

Many thanks are due to Nancy Horkay, who edited and coordinated production of the guide, and to the numerous reviewers at ETS. The comments and critical feedback of the following reviewers are reflected in this guide: Nancy Allen, Jay Campbell, Patricia Donahue, Jeff Haberstroh, Debra Kline, John Mazzeo, and Christine O'Sullivan. Thanks also to Connie Smith at NCS and Dianne Walsh at Westat for coordinating the external reviews.

The guide design and production were skillfully completed by Aspen staff members Wendy Caron, Robert Lee, John Libby, Laura Mitchell, Munira Mwalimu, Maggie Pallas, Amy Salsbury, and Donna Troisi.

NCES and ETS are grateful to Nancy Horkay, coordinator of the previous guide, upon which the current edition is based.



TABLE OF CONTENTS

Introduction

Background and Purpose

Question 1: What is NAEP?

Question 2: What subjects does NAEP assess? How are the subjects chosen, and how are the assessment questions determined? What subjects were assessed in 1999? What subjects will be assessed in 2000?

Question 3: Is participation in NAEP voluntary? Are the data confidential? Are students' names or other identifiers available?

Question 4: Can parents examine the questions NAEP uses to assess student achievement? Can parents find out how well their children performed in the NAEP assessment? Why are NAEP questions kept confidential?

Question 5: Who evaluates and validates NAEP?

Assessment Development

Question 6: What process is used to develop the assessments?

Question 7: How does NAEP accommodate students with disabilities and students with limited English proficiency?

Question 8: What assessment innovations has NAEP developed?


Scoring and Reporting

Question 9: What results does NAEP provide?

Question 10: How does NAEP reliably score and process millions of student-composed responses?

Question 11: How does NAEP analyze the assessment results?

Question 12: How does NAEP ensure the comparability of results among the state assessments and between the state and national assessments?

Question 13: What types of reports does NAEP produce? What reports are planned for the 1999 and 2000 assessments?

Using NAEP Data

Question 14: What contextual background data does NAEP provide?

Question 15: How can educators use NAEP resources such as frameworks, released questions, and reports in their work?

Question 16: How are NAEP data and assessment results used to further explore education and policy issues? What technical assistance does NAEP provide?

Question 17: Can NAEP results be linked to other assessment data?


Sampling and Data Collection

Question 18: Who are the students assessed by NAEP?

Question 19: How many schools and students participate in NAEP? When are the data collected during the school year?

Question 20: How does NAEP use matrix sampling? What is focused BIB spiraling, and what are its advantages for NAEP?

Question 21: What are NAEP's procedures for collecting data?

Bibliography

Further Reading

Glossary

Subject Index



INTRODUCTION

As mandated by Congress, the National Assessment of Educational Progress (NAEP) surveys the educational accomplishments of U.S. students and monitors changes in those accomplishments. NAEP tracks the educational achievements of fourth-, eighth-, and twelfth-grade students over time in selected content areas. For 30 years, NAEP has been collecting data to provide educators and policymakers with accurate and useful information.

About NAEP

Each year, NAEP employs the full-time equivalent of more than 125 people, and as many as 5,000 people work on NAEP in some capacity. These people work for many different organizations that must coordinate their efforts to conduct NAEP. Amendments to the statute that authorized NAEP established the structure for this cooperation in 1988.

Under the current structure, the Commissioner of Education Statistics, who heads the National Center for Education Statistics (NCES) in the U.S. Department of Education, is responsible, by law, for carrying out the NAEP project through competitive awards to qualified organizations. The Associate Commissioner for Assessment executes the program operations and technical quality control.

The National Assessment Governing Board (NAGB), appointed by the Secretary of Education but independent of the department, governs the program. Authorized to set policy for NAEP, the Governing Board is broadly representative of NAEP's varied audiences. NAGB selects the subject areas to be assessed, develops guidelines for reporting, and gives direction to NCES. While overseeing NAEP, NAGB often works with several other organizations. In the past, NAGB has contracted with the Council of Chief State School Officers (CCSSO) to ensure that content is planned through a national consensus process, and it contracts with ACT Inc. to identify achievement standards for each subject and grade tested.

NCES also relies on the cooperation of private companies for test development and administration services. Since 1983, NCES has conducted the assessment through a series of contracts, grants, and cooperative agreements with Educational Testing Service (ETS) and other contractors. Under these agreements, ETS is directly responsible for developing the assessment instruments, scoring student responses, analyzing the data, and reporting the results. NCES also has a cooperative agreement with Westat. Under this agreement, Westat selects the school and student samples, trains assessment administrators, and manages field operations (including assessment administration and data collection activities). National Computer Systems (NCS), which serves as a subcontractor to ETS, is responsible for printing and distributing the assessment materials and for scanning and scoring students' responses. American Institutes for Research (AIR), which serves as a subcontractor to ETS, is responsible for development of the background questionnaires.



NCES publishes the results of the NAEP assessments and releases them to the media and public. NCES strives to present this information in the most accurate and useful manner possible, publishing reports designed for the general public and specific audiences and making the data available to researchers for secondary analyses.

About the Guide

The goals of The NAEP Guide are to provide readers with an overview of the project and to help them better understand the philosophical approach, procedures, analyses, and psychometric underpinnings of NAEP. This guide acquaints readers with NAEP's informational resources, demonstrates how NAEP's design matches its role as an indicator of national educational achievement, and describes some of the methods used in the 1999 and 2000 assessments.

The guide follows a question-and-answer format, presenting the most commonly asked questions and following them with succinct answers. Each answer also includes additional background information. The guide is designed for the general public, including state and national policymakers; state, district, and school education officials who participate in NAEP; and researchers who rely on the guide for their introduction to NAEP.


Background and Purpose

Question 1: What is NAEP?

Answer

Often called the "Nation's Report Card," the National Assessment of Educational Progress (NAEP) is the only nationally representative, continuing assessment of what America's students know and can do in various subject areas. NAEP provides a comprehensive measure of students' learning at critical junctures in their school experience.

The assessment has been conducted regularly since 1969. Because it makes objective information about student performance available to policymakers at national and state levels, NAEP plays an integral role in evaluating the conditions and progress of the nation's education. Under this program, only information related to academic achievement is collected, and NAEP guarantees that all data related to individual students and their families remain confidential.

FURTHER DETAILS

Overview of NAEP

Over the years, NAEP has evolved to address questions asked by policymakers, and NAEP now refers to a collection of national and state-level assessments.

Between 1969 and 1979, NAEP was an annual assessment. From 1980 through 1996, it was administered every two years. In 1997, NAEP returned to annual assessments. Initiated in 1990, state-level NAEP enables participating states to compare their results with those of the nation and other participating states.

NAEP has two major goals: to reflect current educational and assessment practices and to measure change reliably over time. To meet these dual goals, NAEP selects nationally representative samples of students who participate in either the main NAEP assessments or the long-term trend NAEP assessments.

National NAEP

National NAEP reports information for the nation and for specific geographic regions of the country (Northeast, Southeast, Central, and West). It includes students drawn from public and nonpublic schools. At the national level, NAEP is divided into two assessments: the main NAEP and the long-term trend NAEP. These assessments use distinct data collection procedures, separate samples of students, and test instruments based on different frameworks. Student and teacher background questionnaires also vary between the main and long-term trend assessments, as do many of the analyses employed to produce results. The results from these two assessments are also reported separately.



Main NAEP

The main assessments report results for grade samples (grades 4, 8, and 12). They periodically measure students' achievement in reading, mathematics, science, writing, U.S. history, civics, geography, and other subjects. (See the inside back cover.) In 2000, main NAEP will assess mathematics and science at grades 4, 8, and 12 and reading at grade 4.

The main assessments follow the curriculum frameworks developed by the National Assessment Governing Board (NAGB) and use the latest advances in assessment methodology. Indeed, NAEP has pioneered many of these innovations. The assessment instruments are flexible so they can adapt to changes in curricular and educational approaches. For example, NAEP assessments include large percentages of constructed-response questions (questions that ask students to write responses ranging from two or three sentences to a few paragraphs) and items that require the use of calculators and other materials.

As the content and nature of the NAEP instruments evolve to match instructional practices, however, the ability of the assessment to measure change over time is greatly reduced. Recent main NAEP assessment instruments have typically been kept stable for relatively short periods of time, allowing short-term trend results to be reported. For example, the 1998 reading assessment followed a short-term trend line that began in 1992 and continued in 1994. Because of the flexibility of the main assessment instruments, the long-term trend NAEP must be used to reliably measure change over longer periods of time.

Long-Term Trend NAEP

The long-term trend assessments report results for age/grade samples (9-year-olds/fourth grade; 13-year-olds/eighth grade; and 17-year-olds/eleventh grade). They measure students' achievements in mathematics, science, reading, and writing. Measuring trends of student achievement, or change over time, requires the precise replication of past procedures. Therefore, the long-term trend instrument does not evolve based on changes in curricula or in educational practices.

The long-term trend assessment uses instruments that were developed in the 1970s and 1980s and are administered every two years in a form identical to the original one. In fact, the assessments allow NAEP to measure trends from 1969 to the present. In 1999, the long-term trend assessment began to be administered on a four-year schedule and in different years from the main and state assessments in mathematics, science, reading, and writing.

State NAEP

Until 1990, NAEP was a national assessment. Because the national NAEP samples were not, and are not currently, designed to support the reporting of accurate and representative state-level results, in 1988 Congress passed legislation authorizing a voluntary Trial State Assessment (TSA). Separate representative samples of students are selected for each jurisdiction that agrees to participate in TSA, to provide these jurisdictions with reliable state-level data concerning the achievement of their students. Although the first two NAEP TSAs in 1990 and 1992 assessed only public school students, the 1994 TSA included public and nonpublic schools. Certain nonstate jurisdictions, such as U.S. territories, the District of Columbia, and Department of Defense Education Activity Schools, may also participate in state NAEP.

In 1996, "Trial" was dropped from the title of the assessment based on numerous evaluations of the TSA program. The legislation, however, still emphasizes that the state assessments are developmental.

In 1998, state NAEP assessed reading at grades 4 and 8 and writing at grade 8. Forty-four jurisdictions participated in the grade 4 reading assessment, 41 in the grade 8 reading assessment, and 40 in the grade 8 writing assessment. In 2000, state NAEP will assess mathematics and science at grades 4 and 8.

Background Questionnaires

What factors are related to higher scores? Who is teaching students? How do schools vary in terms of courses offered? NAEP attempts to answer these questions and others through data collected on background questionnaires.

Students, teachers, and principals complete these questionnaires to provide NAEP with data about students' school backgrounds and educational activities. Students answer questions about the courses they take, homework, and home factors related to instruction. Teachers answer questions about their professional qualifications and teaching activities, while principals answer questions about school-level practices and policies. Relating student performance on the cognitive portions of the assessments to the information gathered on the background questionnaires increases the usefulness of NAEP findings and provides the context for a better understanding of student achievement.

Related Questions:

Question 14: What contextual background data does NAEP provide?

Question 18: Who are the students assessed by NAEP?



Question 2: What subjects does NAEP assess? How are the subjects chosen, and how are the assessment questions determined? What subjects were assessed in 1999? What subjects will be assessed in 2000?

Answer

Since its inception in 1969, NAEP has assessed numerous academic subjects, including mathematics, science, reading, writing, world geography, U.S. history, civics, social studies, and the arts. (A chronological list of the assessments from 1969 to 2000 is on the inside back cover.)

Since 1988, the National Assessment Governing Board (NAGB) has selected the subjects assessed by NAEP. Furthermore, NAGB oversees creation of the frameworks that underlie the assessments and the specifications that guide the development of the assessment instruments. The framework for each subject area is determined through a consensus process that involves teachers, curriculum specialists, subject-matter specialists, school administrators, parents, and members of the general public.

In 1999, the long-term trend assessments in mathematics, science, reading, and writing were conducted using the age/grade samples described earlier (see Question 1). At the national level, the 2000 assessment will include mathematics and science at grades 4, 8, and 12 and reading at grade 4. At the state level, NAEP will include mathematics and science at grades 4 and 8.

FURTHER DETAILS

Selection of Subjects

The legislation authorizing NAEP charges NAGB with determining the subjects that will be assessed. The table later in this section identifies the subjects and grades assessed in the 1999 assessment and those in the assessment planned for 2000.

Development of Frameworks

NAGB uses an organizing framework for each subject to specify the content that will be assessed. The framework is the blueprint that guides the development of the assessment instrument.

Developing a framework can involve the following elements:

• widespread participation and reviews by educators and state education officials in the particular field of interest;

• reviews by steering committees whose members represent policymakers, practitioners, and the general public;

• involvement of subject supervisors from the education agencies of prospective participants;

• public hearings; and

• reviews by scholars in that field, by National Center for Education Statistics (NCES) staff, and by a policy advisory panel.

The Framework publications for the NAEP 1999 and 2000 assessments provide more details about the consensus process, which is unique for each subject.

Although they guide the development of assessment instruments, frameworks cannot encompass everything that is taught in all the classrooms of the nation, much less everything that should be taught. Nevertheless, frameworks capture a range of subject-specific content and thinking skills that students need to learn and the complex issues they encounter inside and outside their classrooms. Furthermore, the consensual process used to develop the frameworks ensures that they are appropriate for current educational requirements.

Because the assessments must remain flexible to mirror changes in educational objectives and curricula, the frameworks must be forward looking and responsive, balancing current teaching practices with research findings. This flexibility is evident in the evolution of NAEP assessment instruments. The present instruments allocate a majority of testing time to constructed-response questions that require students to compose written answers. Because they require students to describe, interpret, and explain, these questions elicit a broader range of students' cognitive and analytical abilities than do simple multiple-choice questions. Furthermore, the information obtained through constructed-response questions enhances NAEP's ability to track progress toward national education goals.

Specification of Assessment Questions

Under the direction of Educational Testing Service (ETS), teachers, subject-matter specialists, and measurement experts develop the questions and tasks based on the subject-specific frameworks.

For each subject-area assessment, a national committee of experts provides guidance and reviews the questions to ensure that they meet the framework specifications. For each state-level assessment, the state curriculum and testing directors who comprise the NAEP NETWORK review the questions that will be included in the NAEP state component.


NAEP 1999 and 2000 Assessments

1999 NAEP (Long-Term Trend)
  Reading: Age 9, Age 13, Age 17
  Mathematics: Age 9, Age 13, Age 17
  Science: Age 9, Age 13, Age 17
  Writing: Grade 4, Grade 8, Grade 11

2000 NAEP (Main NAEP)
  Reading: Grade 4
  Mathematics: Grade 4, Grade 8, Grade 12
  Science: Grade 4, Grade 8, Grade 12

2000 NAEP (State NAEP)
  Mathematics: Grade 4, Grade 8
  Science: Grade 4, Grade 8


Framework for the 1999 Long-Term Trend Assessments

Because the long-term trend assessments measure trends in student achievement, or change over time, they must precisely replicate past procedures and frameworks. The long-term trend instruments do not evolve based on changes in curricula or in educational practices. Main NAEP assessments follow the curriculum frameworks developed by NAGB and use the latest advances in assessment methodology. As a result, the subjects of mathematics, science, reading, and writing are assessed both in long-term trend and in main NAEP, using different instruments and for different purposes.

In 1999, the long-term trend assessments were identical to the assessments begun in 1984 for reading and writing and in 1986 for mathematics and science. (Note that trend was measured several years before that because of statistical links with previous years' assessments; the chart on the inside back cover shows specific subjects and years in trend measurement.)

The reading framework for long-term trend is described in Reading Objectives: 1983–84 Assessment. There are four objectives for student achievement:

• Comprehending What Is Read;

• Extending Comprehension;

• Managing the Reading Experience; and

• Valuing Reading.

The writing framework for long-term trend is described in Writing Objectives: 1984 Assessment and Writing Objectives: 1988 Assessment. There are four objectives for student achievement:

• Writing to Accomplish a Variety of Purposes;

• Managing the Writing Process;

• Controlling the Forms of Written Language; and

• Valuing Writing and Written Works.

The mathematics framework for long-term trend is described in Math Objectives: 1985–86 Assessment. There are seven content areas:

• Fundamental Methods of Mathematics;

• Discrete Mathematics;

• Data Organization and Interpretation;

• Measurement;

• Geometry;

• Relations, Functions, and Algebraic Expressions; and

• Numbers and Operations.

The science framework for long-term trend is described in Science Objectives: 1985–86 Assessment and Science Objectives: 1990 Assessment. There are six content areas:

• Life Science;

• Physics;

• Chemistry;

• Earth and Space Science;

• History of Science; and

• Nature of Science.

The design of the assessments, sampling, and data collection are described in the Procedural Appendix of NAEP 1996 Trends in Academic Progress.

Framework for the 2000 NAEP Mathematics Assessment

The framework for the 2000 NAEP mathematics assessment covers five content strands:

• Number Sense, Properties, and Operations;

• Measurement;

• Geometry and Spatial Sense;

• Data Analysis, Statistics, and Probability; and

• Algebra and Functions.

The distribution of questions among these strands is a critical feature of the assessment design, as it reflects the relative importance and value given to each of the curricular content strands within mathematics. Over the past six NAEP assessments in mathematics, the content strands have received differential emphasis. There has been continuing movement toward a more even balance among the strands and away from the earlier model, in which questions that were classified as number facts and operations accounted for more than 50 percent of the assessment item bank. Another significant difference in the newer NAEP mathematics assessments is that questions may be classified into more than one strand, underscoring the connections that exist between different mathematical topics.

A central feature of student performance that is assessed by NAEP mathematics is "mathematical power." Mathematical power is characterized as a student's overall ability to gather and use mathematical knowledge through:

• exploring, conjecturing, and reasoning logically;

• solving nonroutine problems;

• communicating about and through mathematics; and

• connecting mathematical ideas in one context with mathematical ideas in another context or with ideas from another discipline in the same or related contexts.

To assist in the collection of information about students' mathematical power, assessment questions are classified not only by content, but also by mathematical ability. The mathematical abilities of problem solving, conceptual understanding, and procedural knowledge are not separate and distinct factors of a student's ways of thinking about a mathematical situation. They are, rather, descriptions of the ways in which information is structured for instruction and the ways in which students manipulate, reason with, or communicate their mathematical ideas. As such, some questions in the assessment may be classified into more than one of these mathematical ability categories. Overall, the distribution of all questions in the mathematics assessment is approximately equal across the three categories.

Framework for the 2000 NAEP Science Assessment

The 2000 NAEP science assessment framework is organized along two major dimensions:

• Fields of science: Earth, Physical, and Life Sciences; and

• Knowing and doing science: Conceptual understanding, Scientific investigation, and Practical reasoning.


Fields of science. The assessment emphasizes knowledge in the content areas. The content assessed in earth science centers on objects and events that are relatively accessible or visible, such as Earth (lithosphere), water (hydrosphere), air (atmosphere), and the Earth in space. The physical science component relates to basic knowledge and understanding concerning the structure of the universe, as well as the physical principles that operate within it. The assessment probes matter and its transformations, energy and its transformations, and the motion of things. Major concepts assessed in life sciences include change and evolution, cells and their functions, organisms, and ecology.

Knowing and doing science. This dimension stresses the connections and organization of factual knowledge in science. Conceptual understanding is the measure of students' abilities to perceive and grasp the meaning of ideas. Students should acquire a rich collection of scientific information that will enable them to move from simply being able to provide reasonable interpretations of observations to providing explanations and predictions. Assessment exercises in grade 4 deal with students' ability to elaborate principles from personal experiences. In grades 8 and 12, the emphasis shifts from richness of experience to reasonable scientific interpretation of observations. This database of organized information also allows students to apply information efficiently in the design and execution of scientific investigations and in practical reasoning.

Scientific investigation represents the activities of science that distinguish it from other ways of knowing about the world. Science involves designing fair tests and considering all the variables and the means to control the variables. As students are asked to demonstrate their ability to do scientific investigations, it is important to keep in mind their development in understanding and performance, not just with respect to the control of variables but also regarding the other elements of science. Appropriate to their age and grade level, students will be assessed on their ability to acquire new information, plan appropriate investigations, use a variety of scientific tools, and communicate the results of their investigations.

Practical reasoning is a complex skill that develops throughout life. As children mature they also learn to take a depersonalized view of a situation and to consider someone else's point of view. Practical reasoning should become a major factor in science assessment at grades 8 and 12 rather than at grade 4. By grade 12, students should be able to discuss larger science- and technology-linked problems not directly related to their immediate experience. Examples include waste disposal, energy uses, air quality, water pollution, noise abatement, and the tradeoffs between the benefits and adverse consequences of various technologies. Practical reasoning includes competence in analyzing a problem, planning appropriate approaches, evaluating them, carrying out the required procedures for the approach selected, and evaluating the results.

In addition to the two major dimensions, the framework includes two other categories that pertain to a limited subset of items:

• Nature of science; and

• Themes.

Nature of science includes the historical development of science and technology, the habits of mind that characterize these fields, and methods of inquiry and problem solving. Themes represent big ideas or key organizing concepts that pervade science. Themes include the ideas of systems and their application in the disciplines, models and their function in the development of scientific understanding and its application to practical problems, and patterns of change as exemplified in natural phenomena.

Framework for the 2000 NAEP Reading Assessment

The NAEP reading assessment framework, used from 1992 to 2000 and grounded in current theory, views reading as a dynamic, complex interaction that involves the reader, the text, and the context of the reading experience. As specified in the framework, the assessment addresses three purposes for reading:

• reading for literary experience;

• reading for information; and

• reading to perform a task.

Reading for literary experience involves reading novels, short stories, poems, plays, and essays to learn how authors present experiences and interaction among events, emotions, and possibilities. Reading to be informed involves reading newspapers, magazine articles, textbooks, encyclopedias, and catalogues to acquire information. Reading to perform a task involves reading documents such as bus schedules, directions for a game, laboratory procedures, recipes, or maps to find specific information, understand the information, and apply it. (Reading to perform a task is not assessed at grade 4.)

Within these purposes for reading, the framework recognizes four ways that readers interact with text to construct meaning from it. These four modes of interaction, called "reading stances," are as follows:

• forming an initial understanding;

• developing an interpretation;

• engaging in personal reflection and response; and

• demonstrating a critical stance.

All reading assessment questions are developed to reflect one of the purposes for reading and one of the reading stances.

The following questions from a previous grade 4 reading assessment indicate the reading purposes and stances tested by the questions and illustrate a sample student response.

Grade 4

Story: Hungry Spider and the Turtle

"Hungry Spider and the Turtle" is a West African folktale that humorously depicts hunger and hospitality through the actions and conversations of two very distinct characters. The ravenous and generous Turtle, who is tricked out of a meal by the gluttonous and greedy Spider, finds a way to turn the tables and teach the Spider a lesson.

Questions:

Why did Spider invite Turtle to share his food?

A. To amuse himself

B. To be kind and helpful

C. To have company at dinner

D. To appear generous

Reading Purpose: Literary Experience

Reading Stance: Developing an Interpretation

Who do you think would make a better friend, Spider or Turtle? Explain why.

Reading Purpose: Literary Experience

Reading Stance: Personal Response

Sample Response:

I think Turtle because insted of get angry with Spider he just pad him a lesson.


For further discussion of the framework, see the Reading Framework for the National Assessment of Educational Progress: 1992–2000. For additional explanations and the results, see NAEP 1998 Reading Report Card for the Nation and the States.

Related Questions:

Question 1: What is NAEP?

Question 6: What process is used to develop the assessments?


The NAEP 2000 Reading Assessment: Aspects of Reading Literacy
Constructing, Extending, and Examining Meaning

The four reading stances:

Initial Understanding: Requires the reader to provide an initial impression or unreflected understanding of what was read.

Developing Interpretation: Requires the reader to go beyond the initial impression to develop a more complete understanding of what was read.

Personal Reflection and Response: Requires the reader to connect knowledge from the text to his or her own personal background. The focus is on how the text relates to personal knowledge.

Demonstrating a Critical Stance: Requires the reader to stand apart from the text and consider it.

Sample questions by reading purpose and stance:

Reading for Literary Experience
  Initial Understanding: What is the story/plot about? How would you describe the main character?
  Developing Interpretation: How did the plot develop? How did this character change from the beginning to the end of the story?
  Personal Reflection and Response: How did this character change your idea of ____? Is this story similar to or different from your own experiences?
  Demonstrating a Critical Stance: Rewrite this story with ____ as a setting or ____ as a character. How does this author's use of ____ (irony, personification, humor) contribute to ____?

Reading for Information
  Initial Understanding: What does this article tell you about ____? What does the author think about this topic?
  Developing Interpretation: What caused this event? In what ways are these ideas important to the topic or theme?
  Personal Reflection and Response: What current event does this remind you of? Does this description fit what you know about ____? Why?
  Demonstrating a Critical Stance: How useful would this article be for ____? Explain. What could be added to improve the author's argument?

Reading to Perform a Task
  Initial Understanding: What is this supposed to help you do? What time can you get a nonstop flight to X? (Search)
  Developing Interpretation: What will be the result of this step in the directions? What must you do before this step?
  Personal Reflection and Response: In order to ____, what information would you need to find that you don't know right now? Describe a situation where you could leave out step X.
  Demonstrating a Critical Stance: Why is the information needed? What would happen if you omitted this?


Question 3: Is participation in NAEP voluntary? Are the data confidential? Are students' names or other identifiers available?

Answer

Federal law specifies that NAEP is voluntary for every pupil, school, school district, and state. Even if selected, school districts, schools, and students can refuse to participate without facing any adverse consequences from the federal government. Some state legislatures mandate participation in NAEP, others leave the option to participate to their superintendents and other education officials at the local level, and still other states choose not to participate.

Federal law also dictates that NAEP data remain confidential. The legislation authorizing NAEP (the National Education Statistics Act of 1994, Title IV of the Improving America's Schools Act of 1994, U.S.C. 9010) stipulates in Section 411(c)(2)(A):

The Commissioner shall ensure that all personally identifiable information about students, their education performance, and their families, and that information with respect to individual schools, remains confidential, in accordance with Section 552a of Title 5, United States Code.

After publishing NAEP reports, the National Center for Education Statistics (NCES) makes the data available to researchers but withholds students' names and other identifying information. Although it might be possible for researchers to deduce the identities of some NAEP schools, they must swear to keep these identities confidential, under penalty of fines and jail terms, before gaining access to NAEP data.

FURTHER DETAILS

A Voluntary Assessment

Participation in NAEP is voluntary for states, school districts, schools, teachers, and students. Participation involves responding to test questions that focus on a particular subject and to background questions that concern the subject area, classroom practices, school characteristics, and student demographics. Answering any of these questions is voluntary.

Before any student selected to participate in NAEP actually takes the test, the student's parents decide whether or not their child will do so. Local schools determine the procedures for obtaining parental consent.

NAEP background questions provide educators and policymakers with useful information about the educational environment. Nonparticipation and nonresponse, by students as well as teachers, greatly reduce the amount of potentially helpful information that can be reported.

A Confidential Assessment

All government and contractor employees who work with NAEP data swear to uphold a confidentiality law. If any employee violates the confidentiality law by disclosing the identities of NAEP respondents, that person is subject to criminal penalties, including fines and prison terms.


During test administration, the names of students are used to assign specific test booklets to students selected for a particular assessment. Each booklet has a unique identification number so that it can be linked to teacher and school data. After the booklets have been completed and absent students have taken makeup tests, NAEP no longer needs students' names, and the links between students' names and their test booklets are destroyed.

NAEP administrators use tear-off forms to break the link between the names and identification numbers. When the booklets are sent to NAEP for scoring, the portion of the form containing students' names remains in the school. School officials keep these forms in a secure storage envelope for a few weeks after the assessment in case the link to the identification numbers needs to be checked. When the information is no longer needed, schools are notified and officials destroy the storage envelope, confirming their actions by returning a Destruction Notice to NAEP.

Released Data

Most people make use of published reports and other NAEP summary data such as almanacs. However, educational researchers may have an interest in additional analyses that require access to raw NAEP data. Because public funds are used for NAEP, these raw data are made available after their collection to researchers through restricted-use data tapes, subject to approval by NCES. The released data do not include names, addresses, or other personally identifiable information about the students who were assessed. In those extremely rare cases in which schools or teachers might be identified, because the data they reported were unusual or unique, NCES suppresses sufficient information to eliminate that possibility.

To receive these restricted-use data files, NCES requires researchers to submit sworn statements vowing not to disclose any identifiable information. These statements must be provided before researchers access the data. Researchers who violate the confidentiality law are subject to the same criminal penalties (fines and prison terms) as government and contractor employees.

Related Question:

Question 4: Can parents examine the questions NAEP uses to assess student achievement? Can parents find out how well their children performed in the NAEP assessment? Why are NAEP questions kept confidential?


Question 4: Can parents examine the questions NAEP uses to assess student achievement? Can parents find out how well their children performed in the NAEP assessment? Why are NAEP questions kept confidential?

Answer

Every parent has the right of access to the educational and measurement materials that their children encounter. NAEP provides a demonstration booklet so that interested parents may review questions similar to those in the assessment. Under certain prearranged conditions, small groups of parents can review the booklets being used in the actual assessment. This review must be arranged with the school principal, NAEP field supervisor, or school coordinator, who will ensure that test security is maintained.

NAEP is not designed, however, to report scores for individual students. So, although parents may examine the NAEP test questions, the assessment yields no scores for their individual children.

As with other school tests or assessments, most of the questions used in NAEP assessments remain secure or confidential to protect the integrity of the assessment. NAEP's integrity must be protected because certain questions measure student achievement over a period of time and must be administered to students who have never seen them before.

Despite these concerns, NAEP releases nearly one-third of the questions used in each assessment, making them available for public use. Furthermore, the demonstration booklets provided by NAEP make all student background questions readily available for review.

FURTHER DETAILS

Parent Access to NAEP Booklets

Because parents are interested in their children's experiences in school, NAEP provides the school with a demonstration booklet before the assessment is scheduled. This demonstration booklet, which may be reproduced, contains all student background questions and sample cognitive questions. Parents can obtain copies of the demonstration booklet from the school.

Within the limits of staff and resources, school administrators and parents can review the NAEP booklets being used for the current assessment. Arrangements for this review must be made prior to the local administration dates so that sufficient materials can be prepared and interested persons can be notified of its time and location. Upon request, NAEP staff will also review the booklets with small groups of parents, with the understanding that no assessment questions will be duplicated, copied, or removed.

Requests for these reviews can be made to the NAEP data collection staff or by contacting the National Center for Education Statistics (NCES) at 202-219-1831. Individuals whose children are not participating in the assessment but who wish to examine secure assessment questions can contact the U.S. Department of Education's Freedom of Information Act officer at 202-708-4753.

The Importance of Security

Measuring student achievement and comparing students' scores from previous years requires reusing some questions for continuity and statistical purposes. These questions must remain secure to assess trends in academic performance accurately and to report student performance on existing NAEP score scales.

Furthermore, for NAEP to regularly assess what the nation's students know and can do, it must keep the assessment from being compromised. If students have prior knowledge of test questions, then schools and parents will not know whether their performance is based on classroom learning or coaching on specific assessment questions. After every assessment, nearly one-third of the questions are released to the public. These questions can be used for teaching or research. NAEP reports often contain samples of actual questions used in the assessments. Sample questions can also be obtained from NCES, NAEP Released Exercises, 555 New Jersey Avenue NW, Washington, DC 20208-5653, or on the Web site at http://nces.ed.gov/nationsreportcard.

Related Questions:

Question 3: Is participation in NAEP voluntary? Are the data confidential? Are students' names or other identifiers available?

Question 15: How can educators use NAEP resources such as frameworks, released questions, and reports in their work?


Question 5: Who evaluates and validates NAEP?

Answer

NAEP and its findings have a considerable impact on the public's understanding of student academic achievement. Because NAEP plays a unique and prominent role, precautions must be taken to ensure the validity and reliability of its findings. Therefore, Congress consistently passes legislation that establishes panels to evaluate the assessment as a whole. In response to these mandates, the National Center for Education Statistics (NCES) has established various expert panels to study NAEP. These panels have produced a series of reports that address numerous important NAEP issues.

FURTHER DETAILS

The Technical Review Panel

By law, the Commissioner of NCES must provide "continuing reviews of the National Assessment, including validation studies" (P.L. 100-297, Sec. 3403 [I] [9] [A]). In fulfillment of this mandate, the Center for Research on Evaluation, Standards, and Student Testing (CRESST) at the University of California, Los Angeles, in conjunction with the University of Colorado at Boulder and RAND, was awarded a contract in 1989 to establish the Technical Review Panel (TRP).

Beginning in 1989, TRP produced a series of studies on specific questions about the validity of interpreting NAEP results, including:

• the quality of NAEP data;

• the number and character of NAEP scales;

• the robustness of NAEP trend lines;

• the trustworthiness of one interpretation of group comparisons;

• the validity of interpretations of NAEP anchor points and achievement levels;

• the linking of other test results to NAEP;

• the effects of student motivation on performance;

• the adequacy of NAEP data about student background and instructional experiences; and

• the understanding educators and policymakers obtain from NAEP reports.

At the end of the project period, TRP produced a final report summarizing the results of its studies (Linn, Koretz, & Baker, 1996).

The Trial State Assessment

Another panel was commissioned to study the validity of the NAEP state component. In spring 1988, Congress enacted Public Law 100-297, which authorized the NAEP Trial State Assessment (TSA). In authorizing the TSA, Congress called for an independent evaluation of "the feasibility and validity of state assessments and the fairness and accuracy of the data they produce." In response to the legislation, NCES awarded a grant in 1990 to the National Academy of Education (NAE) Panel on the Evaluation of the NAEP Trial State Assessment Project.

Between 1992 and 1996, the NAE Panel produced numerous reports that evaluated the validity of the TSA. The panel released a capstone report in April 1997. This final report concluded that in less than 10 years NAEP had increased by a factor of four the number of students assessed; undergone important changes in test content, design, and administration; and engaged the intense interest of many stakeholders and observers. The rapid pace of these changes created extreme conditions of conflicting demands, strained resources, and technical complexities that were a potential threat to the existence of the entire program.

Six major developments have become points of continuous discussion in NAEP: expansion of the assessment to include state-level NAEP; inclusion of more challenging performance tasks; testing broader and more representative samples of students, including students with disabilities or limited English proficiency; pressure to make NAEP standards-based in the absence of nationally agreed-upon content and performance standards; desire for international comparisons; and desire to link NAEP with state assessments. With the importance of assessment in education reform likely to continue and increase, NAEP’s future course depends on critical decisions that take all these factors into consideration. The NAE Panel also evaluated the National Assessment Governing Board performance standards in a separate report.

Evaluation

In 1996, the National Academy of Sciences (NAS) was awarded a contract to further evaluate the national and state NAEP. In response, NAS formed a committee of distinguished educators and other experts to conduct the evaluation activities described in the Congressional mandate of 1994. Public Law 103–382 mandates that “the Secretary shall provide for continuing review of the National Assessment, State Assessments, and student performance levels by one or more nationally recognized organizations.” In the evaluation process, the committee directed workshops; commissioned papers; solicited testimony and interviews; observed NAEP activities; and studied program documents, extant research, and prior evaluation reports. NAS released its NAEP evaluation report, Grading the nation’s report card: Evaluating NAEP and transforming the assessment of educational progress, in early 1999. The report presented observations and recommendations in a number of key areas, including streamlining the design of NAEP, enhancing the participation and meaningful assessment of English-language learners and students with disabilities, framework design and the assessment development process, and the setting of reasonable and useful performance standards. NCES has requested that NAS continue to evaluate specific aspects of the NAEP program in the coming years.

A new committee is looking at issues surrounding district-level reporting and market basket reporting, two important issues raised in the NAS evaluation report.


The NAEP Validity Studies Panel

NCES funded a contract that established the NAEP Validity Studies (NVS) Panel. This panel was formed to provide technical review of NAEP plans and products and to identify technical concerns and promising techniques worthy of further study and research.

Since its inception in October 1995, the NVS Panel has worked on numerous validity studies, and that work will continue in the coming years. To date, the panel has released reports on: (1) optimizing state NAEP, (2) the information value of constructed-response items, (3) evaluation of NAEP equating procedures, (4) a feasibility study of two-stage testing in large-scale educational assessments, (5) the effects of finite sample corrections on state assessment sample size requirements, (6) a proposed research program on NAEP reporting, and (7) an investigation of why some students do not respond to NAEP questions.


Question 6: What process is used to develop the assessments?

To meet the nation’s growing need for information about what students know and can do, the NAEP assessment instruments must measure change over time and must reflect changes in curricula and instruction in diverse subject areas. Meeting these goals can be especially challenging because instructional design and objectives may change at any time in the nation’s 100,000 schools.

Developing the assessment instruments—from writing questions to analyzing field-test results to constructing the final instruments—is a complex process that consumes most of the time during the interval between assessments. In addition to conducting a field test, developers subject the assessment instruments to numerous reviews to identify areas that require revision or augmentation so they comply with the specifications of the framework and the achievement levels.

The Development Process

The following section summarizes the instrument development process that NAEP uses for the main and state assessments. Newly developed assessment questions and exercises go through all the steps in this process. The many reviews help identify areas that must be revised or augmented to ensure compliance with the framework and the achievement levels. Thus, many experts offer input into the development process, ensuring that the tests adhere as closely as possible to the goals established by the National Assessment Governing Board (NAGB).

Summary of the NAEP Instrument Development Process

• Educational Testing Service (ETS) test development specialists and various subject-matter experts write the questions and exercises and then classify them according to framework specifications.

• Test development staff experienced in the subject area review the questions and exercises for content concerns and revise them accordingly.

• Questions and exercises are banked in the test development system, as is all classification information.

• A test developer assembles blocks of questions and exercises for field tests according to specifications.

• Specialists review the blocks, which undergo mandatory sensitivity and editorial reviews.

• Assessment questions are administered to individual students in one-on-one question tryout sessions to determine both how well students understand the questions and what further refinements should be made to the wording or formatting of questions.

• Instrument Development Committees convene to review the questions and blocks and to independently confirm that the questions fit the framework specifications and are correctly classified.

• Outside groups of content and assessment experts verify the question classifications independently.

• For the state assessment program, the NAEP NETWORK reviews all questions, exercises, blocks, and questionnaires that will be included in the assessment.

• A test developer updates the test development version of the questions and exercises based on committee, NAEP NETWORK, and content and assessment expert reviews.

• The field test questionnaires and assessment exercises are reviewed by the National Center for Education Statistics (NCES), the National Assessment Governing Board (NAGB), the Office of Management and Budget (OMB), and the Information Management Compliance Division (IMCD) for compliance with government policies on data collection. Revisions to the field test versions are made as needed to obtain government clearance.

• A clearance number is obtained for the field test.

• Field test booklets and questionnaires are printed, and other assessment materials (such as audiotapes, photographs, and hands-on science materials) are produced.

• Exercises from the blocks are banked in the test development system.

• The field tests are administered.

• The field tests are scored and analyzed.

• Suitable questions for the assessment are selected.

• Subject-matter specialists review the blocks selected for the assessment.

• The blocks undergo sensitivity and editorial reviews.

• Instrument Development Committees convene to review the questions and blocks and to independently confirm classification codes.

• The assessment questionnaires and assessment exercises are reviewed by NCES, NAGB, OMB, and IMCD for compliance with government policies on data collection. Revisions to the field test versions are made as needed to obtain government clearance.

• The camera-ready blocks are proofed and approved for printing.

• Final versions of other assessment materials (such as audiotape and photographs) are approved for production.

• The assessment booklets and questionnaires are prepared, proofed, and printed.

• Exercises from the blocks are banked in the test development system.

• The assessments are administered.

The blocks used in NAEP are collections of questions administered to students as a timed unit. NAGB is responsible for ensuring that all questions selected for NAEP are free from racial, cultural, gender, or regional bias. Thus, the blocks undergo a mandatory sensitivity review to ensure that the assessment reflects a thoughtful, balanced consideration of all groups of people. External reviewers, including state education agency personnel, review the questions for appropriateness for students from a variety of backgrounds and across regions. NCES also reviews all NAEP questions, and OMB and IMCD further review the background questions.

As a final quality control measure to monitor against bias, the results for each question are checked empirically after the field test. This empirical check for fairness employs differential item functioning (DIF) analyses. DIF analyses identify questions that are differentially difficult for particular subgroups of students (who are categorized by racial or ethnic group membership or by gender) for reasons that seem unrelated to the overall ability of the students. For further discussion of DIF procedures, see the NAEP 1996 Technical Report.
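To make the idea concrete, the sketch below applies a Mantel-Haenszel odds-ratio check, one common DIF technique. It is a minimal illustration with invented data, not NAEP’s operational procedure, which is documented in the NAEP 1996 Technical Report.

```python
# Minimal sketch of a Mantel-Haenszel DIF check (illustrative only).
# Students are stratified by overall performance; within each stratum we
# compare the odds of answering correctly for the reference and focal groups.
from collections import defaultdict

def mantel_haenszel_odds_ratio(records):
    """records: iterable of (stratum, group, correct), where group is
    "reference" or "focal" and correct is 0 or 1."""
    tables = defaultdict(lambda: [[0, 0], [0, 0]])  # stratum -> 2x2 counts
    for stratum, group, correct in records:
        row = 0 if group == "reference" else 1
        col = 0 if correct else 1
        tables[stratum][row][col] += 1

    num = den = 0.0
    for (a, b), (c, d) in (tuple(map(tuple, t)) for t in tables.values()):
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n   # reference-correct x focal-incorrect
        den += b * c / n   # reference-incorrect x focal-correct
    return num / den if den else float("nan")

# Hypothetical data: a common odds ratio far from 1.0 flags the item for review.
data = [(0, "reference", 1), (0, "focal", 0), (1, "reference", 1),
        (1, "focal", 1), (1, "reference", 0), (1, "focal", 0)]
print(mantel_haenszel_odds_ratio(data))
```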

The Instrument Development Committee reviews the questions and blocks and independently confirms the classification codes. This committee meets several times during the development cycle to consider question format and appropriateness and the cognitive processes being measured, to refine scoring rubrics after the field test, and to review field test results.

The NAEP NETWORK includes representatives from nonpublic schools and assessment directors from all 50 states, Guam, Puerto Rico, the Virgin Islands, the District of Columbia, and the Department of Defense Education Activity schools. The NAEP NETWORK convenes to review all exercises, blocks, and questionnaires that will be included in the state assessment program.

Related Question:
Question 11: How does NAEP analyze the assessment results?


Question 7: How does NAEP accommodate students with disabilities and students with limited English proficiency?

Throughout its history, NAEP has encouraged the inclusion of all students who could meaningfully participate in the assessment, including those with disabilities and those with limited English proficiency. An estimated 10 percent of the school population is classified as having a disability or limited English proficiency. Nearly half of these students have been included in previous assessments, although the percentages vary by grade and subject being assessed.

Previously, because of concerns about standardized administration, accommodations such as bilingual booklets and extended testing time were not permitted, excluding some students who could have participated if accommodations had been made. Because it is committed to increased inclusion, the National Center for Education Statistics (NCES) formally tested new policies with the 1996 NAEP assessment. Under these guidelines, school administrators were encouraged even more than in the past to include students with disabilities or limited English proficiency (SD/LEP) if any doubt about excluding the student existed. Although NAEP establishes the criteria for inclusion, differences remain among states in how SD/LEP students are treated. Because of the 1997 amendments to the Individuals with Disabilities Education Act, some states are changing their procedures for students with disabilities.

Increased Inclusion

NAEP intends to assess all students selected to participate. However, some students may have difficulty with the assessment as it is normally administered because of a disability or limited English proficiency.

Beginning with the 1996 national assessment, NAEP implemented a two-part modification of procedures to increase inclusion in NAEP assessments. First, revised criteria were developed to define how decisions about inclusion should be made. Second, NAEP provided certain accommodations that were either specified in a student’s Individualized Education Plan (IEP) or frequently used to test the student. The accommodations vary depending on the subjects being assessed.

When a school identifies a student as having a disability or limited English proficiency, the teacher or staff member who is most familiar with the student is asked to complete a questionnaire about the services received by the student and to determine whether the student should take part in the assessment. The questionnaire provides useful information about differential exclusion rates across disability conditions and across the states. Students who cannot take part, even with an accommodation allowed by NAEP, are excluded from the assessment.


Summary

NAEP has traditionally included more than 90 percent of the students selected for the sample. Even though the percentage of exclusion is now relatively small, NAEP continually explores ways to further reduce exclusion rates while ensuring that NAEP results are representative and can be generalized.

Related Question:
Question 14: What contextual background data does NAEP provide?


Question 8: What assessment innovations has NAEP developed?

NAEP frameworks are updated to reflect changes both in curricula and in the way subjects are taught. The 1997 NAEP arts assessment emphasized performance and integrated performance tasks into the assessment instruments for visual arts, music, dance, and theatre. The 1998 assessments in reading, writing, and civics emphasized activities that used a variety of stimulus materials such as cartoons, color photographs, and letters. Both the mathematics and science assessments for 2000 take into account recent developments in education. The mathematics assessment incorporates the use of calculators, rulers, protractors, and manipulatives into meaningful assessment exercises. Many questions in the science assessment are performance exercises, and some hands-on tasks require students to engage in scientific investigations. Since 1996, all NAEP assessments have emphasized the inclusion of more students who require special accommodations.

Innovations in Recent Assessments

Each new framework contains assessment objectives that have been updated to reflect changes in curricula and instruction, often requiring innovations in assessment instrumentation, scoring procedures, and analysis methodology. Even when the same framework guides assessments for several years, shifts in curricular or instructional practice may necessitate the field testing or use of new blocks of questions or performance tasks for subsequent assessments.

NAEP in 2000

The 2000 NAEP assessment in mathematics is the fourth in a trend line that began in 1990 and continued in 1992 and 1996. The 2000 assessment maintains the design of the earlier mathematics assessments. It also continues the trend of incorporating the use of calculators, rulers, protractors, and manipulatives into meaningful assessment exercises that are presented in both multiple-choice and constructed-response formats. Some constructed-response questions require students to generate extended responses that support conjectures, justify conclusions, and substantiate numerical results.

The 2000 NAEP assessment in science is the second in a trend line that began in 1996. Many questions in the assessment are performance exercises. These enhance the assessment’s measurement of students’ abilities to reason, explain, and apply scientific knowledge and to plan and evaluate scientific investigations. In addition, many students complete a hands-on task in which they manipulate equipment, observe, measure, and reach conclusions about their investigation.


The Impact of NAEP Innovations

The NAEP assessments have had an impact throughout the nation as NAEP data, the frameworks, and their objectives have served as models for several state testing programs. For example, Maryland used the 1990 NAEP reading and writing frameworks to develop the Maryland Learning Outcomes in those areas. As a result, the Maryland outcomes resemble the NAEP objectives in name and construct definition. West Virginia, like Maryland, has used NAEP as a model for its curriculum specifications.

In 1994, North Carolina conducted a special study of eighth-grade students to link their performance on the North Carolina End-of-Grade Tests in Mathematics to the NAEP mathematics assessment performance at grade 8. The state is also examining ways to better align the achievement levels from the end-of-grade tests for grades 3 through 8 with those from the NAEP assessment.

Finally, Ohio used the NAEP frameworks as models to develop its content standards. Ohio has also modeled questions on the NAEP item structure when developing state assessments at grades 4 and 6 in reading, writing, mathematics, citizenship, and science. For additional information about NAEP activities in a given state, contact the NAEP coordinator for that state. A list of coordinators appears on the NCES World Wide Web site (http://nces.ed.gov/nationsreportcard).


Question 9: What results does NAEP provide?

NAEP provides results about subject-matter achievement, instructional experiences, and school environment and reports these results by populations of students (e.g., fourth graders) and subgroups of those populations (e.g., male students or Hispanic students). NAEP does not provide individual scores for the students or schools assessed.

Subject-matter achievement is reported in two ways—scale scores and achievement levels—so that student performance can be more easily understood. NAEP scale score results provide information about the distribution of student achievement by groups and subgroups. Achievement levels categorize student achievement as Basic, Proficient, and Advanced, using ranges of performance established for each grade. (A fourth level, below Basic, is also reported for this scale.) Achievement levels are used to report results by a set of standards for what students should know and be able to do.

Because NAEP scales are developed independently for each subject, scale score and achievement level results cannot be compared across subjects. However, these reporting metrics greatly facilitate performance comparisons within a subject from year to year and from one group of students to another in the same grade.
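To illustrate the two reporting metrics, the sketch below computes an average scale score and the percentages of students at or above each achievement level. The scores and cut points are invented for the example and are not actual NAEP values.

```python
# Sketch of the two reporting metrics: an average scale score and the
# percentage of students at or above each achievement level.
# All scores and cut points below are hypothetical, not NAEP values.
scale_scores = [187, 203, 215, 231, 248, 259, 270, 224, 241, 198]  # hypothetical
cuts = {"Basic": 208, "Proficient": 238, "Advanced": 268}           # hypothetical

average = sum(scale_scores) / len(scale_scores)
print("average scale score:", round(average, 1))
for level, cut in cuts.items():
    pct = 100 * sum(1 for s in scale_scores if s >= cut) / len(scale_scores)
    print(f"at or above {level}: {pct:.0f}%")
```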

NAEP Contextual Variables

As the Nation’s Report Card, national NAEP examines the collective performance of U.S. students. State NAEP provides similar information for participating jurisdictions. Although it does not report on the performance of individual students, NAEP reports on the overall performance of aggregates of students (e.g., the average reading scale score for eighth-grade students or the percentage of eighth-grade students performing at or above the Proficient level in reading). NAEP also reports on major subgroups of the student population categorized by demographic factors such as race or ethnicity, gender, highest level of parental education, location of the school (central city, urban fringe or large town, or rural or small town), and type of school (public or nonpublic).

Information provided through background questionnaires completed by students, teachers, and school administrators enables NAEP to examine student performance in the context of various education-related factors. For instance, the NAEP 1998 assessments reported results gathered from these questionnaires for the following contextual variables: course taking, homework, use of textbooks or other instructional materials, home discussions of school work, and television-viewing habits.


NAEP Subject-Matter Achievement

NAEP assessments provide a great deal of information about students’ knowledge and abilities. The usefulness of the information obtained through the main and state NAEP is maximized by presenting average scores and percentiles on NAEP subject scales and by presenting the percentages of students attaining specific NAEP achievement levels. (Results from the long-term trend component of the national assessment are reported only in terms of NAEP subject-area scale scores, not achievement levels.)

Starting with the 1990 assessment, the National Assessment Governing Board (NAGB) developed achievement levels for each subject at each grade level to measure how well students’ actual achievement matches the achievement desired of them. Thus, NAEP results provide information about what students know and can do (mapped on the NAEP subject scale), and they indicate the extent to which student achievement meets expectations of what students should know and should be able to do (mapped as an achievement level).

NAEP Subject Scales

For each subject area assessed, student responses are analyzed to determine the percentage of students responding correctly to each multiple-choice question and the percentage of students performing in each of the score categories for constructed-response questions. Item response theory (IRT) methods are used to generate scales that summarize performance on the primary dimensions of the curricular frameworks used to develop the assessment. For example, three scales were developed for the NAEP 1998 reading assessments: reading for literary experience, reading to gain information, and reading to perform a task. A reading composite scale based on the weighted average of the three scales was also developed and used as the principal measure for reporting NAEP results. Because the scales for each NAEP subject are developed independently, results cannot be compared across subjects.

Achievement Level Policy Definitions

Basic: Partial mastery of prerequisite knowledge and skills that are fundamental for proficient work at each grade.

Proficient: Solid academic performance for each grade assessed. Students reaching this level have demonstrated competency over challenging subject matter, including subject-matter knowledge, application of such knowledge to real-world situations, and analytical skills appropriate to the subject matter.

Advanced: Superior performance.


Achievement Levels

For the three subjects to be assessed in 2000, results will be reported using the achievement levels authorized by the NAEP legislation. A broadly representative panel of teachers, education specialists, and members of the general public defined these achievement levels, which were adopted by NAGB. The achievement levels are based on collective judgments about what students should know and should be able to do relative to the content reflected in the NAEP assessment framework.

For each grade assessed, three achievement levels are defined: Basic, Proficient, and Advanced. The table above presents the policy definitions of the three achievement levels, and the Report Card for each subject contains detailed descriptions of the subject-specific achievement levels.

The NAEP legislation requires that the achievement levels be used on a developmental basis until the Commissioner of Education Statistics determines, as the result of a congressionally mandated evaluation by one or more nationally recognized evaluation organizations, that the achievement levels are “reasonable, valid, and informative to the public.” Upon review of the available information, the Commissioner of Education Statistics agrees with the National Academy’s recommendation that caution needs to be exercised in the use of the current achievement levels, since in the opinion of the Academy, “... appropriate validity evidence for the cut scores is lacking; and the process has produced unreasonable results.” Therefore, the Commissioner concludes that these achievement levels should continue to be considered developmental and should continue to be interpreted and used with caution. The Commissioner and the Governing Board believe that the achievement levels are useful for reporting on trends in the educational achievement of students in the United States.

Limitations of NAEP

Simple or causal inferences related to subgroup membership, the effectiveness of public and nonpublic schools, and state- or district-level educational systems cannot be drawn using NAEP results. For example, performance differences observed among racial or ethnic subgroups are almost certainly associated with a broad range of socioeconomic and educational factors that are not addressed by NAEP assessments. In addition, the student participation rates and motivation of students, particularly twelfth graders, should be considered when interpreting NAEP results.

NAEP does not report scores for individuals, nor is it designed to do so. Therefore, student-level inferences should not be drawn from the NAEP data.

The NAEP assessment results are most useful when they are considered in light of other knowledge about the education system, such as trends in educational reform, changes in the school-age population, and societal demands and expectations.

Related Questions:
Question 2: What subjects does NAEP assess? How are the subjects chosen, and how are the assessment questions determined? What subjects were assessed in 1999? What subjects will be assessed in 2000?
Question 14: What contextual background data does NAEP provide?
Question 18: Who are the students assessed by NAEP?


Question 10: How does NAEP reliably score and process millions of student-composed responses?

Scoring a large number of constructed responses with a high level of reliability and within a limited time frame is essential to NAEP’s success. (In 1998, approximately 3.8 million constructed responses were scored.) To ensure reliable, quick scoring, Educational Testing Service (ETS) and National Computer Systems (NCS) take the following steps:

• develop focused, explicit scoring guides that match the criteria emphasized in the assessment frameworks;

• recruit qualified and experienced scorers, train them, and verify their abilities through qualifying tests;

• employ an image-processing and scoring system that routes student responses directly to the scorers so they can focus on scoring rather than paper routing;

• monitor scorer consistency through ongoing reliability checks and assess the quality of scorer decision making through frequent backreading; and

• document all training, scoring, and quality control procedures in the technical reports.

The 2000 NAEP assessments will contain both constructed-response and multiple-choice questions. The constructed responses are scored using the image-processing system, whereas the responses to the multiple-choice questions are scored by scanning the test booklets. The following table summarizes the scoring from the 1998 assessments.


1998 NAEP Assessment: Main and State
Constructed Responses Scored in Reading, Writing, and Civics

                                             Reading      Writing      Civics         Total
Number of Questions                              191           66          78           335
Responses Scored                           3,201,855      373,280     195,817     3,770,952
Elapsed Scoring Time (total)                                                    14 1/2 weeks
Number of Scorers and Leaders Required (total)                                           637


Developing Scoring Guides

Scoring guides for the assessments are developed using a multistage process.

First, scoring criteria are articulated. While the constructed-response tasks are being developed, an initial version of the scoring guides is drafted. Subject area and measurement specialists, the Instrument Development Committees, the National Center for Education Statistics (NCES), and the National Assessment Governing Board (NAGB) review the scoring guides to ensure that they include criteria consistent with the wording of the questions; are concise, explicit, and clear; and reflect the assessment framework criteria.

Next, the guides are used to score student responses from the field test. The committees and ETS staff use the results from this field test to further refine the guides. Finally, training materials are prepared. Assessment specialists from ETS select examples of student responses from the actual assessment for each performance level specified in the guides. Selecting the examples provides a final opportunity to refine the wording in the scoring guides, develop additional training materials, and make certain that the guides accurately represent the assessment framework criteria.

The examples clearly express the committees’ interpretations of each performance level described in the scoring guides and help illustrate the full range of achievement under consideration. During the actual scoring process, the examples help scorers interpret the scoring guides consistently, thereby ensuring the accurate and reliable scoring of diverse responses.

Recruiting and Training Scorers

Recruiting highly qualified scorers to evaluate students’ responses is crucial to the success of the assessment. A five-stage model is used for selecting and training scorers.

The first stage involves selecting scorers who meet qualifications specific to the subject areas being scored. Prospective scorers participate in a simulated scoring exercise and a series of interviews before being hired. (Some applicants take an additional exam for writing mechanics.)

Next, scorers are oriented to the project and trained to use the image scoring system. This orientation includes an in-depth presentation of the goals of NAEP and the frameworks for the assessments.

At the third stage, training materials, including sample papers, are prepared for the scorers. ETS trainers and NCS scoring supervisors read hundreds of student responses to select papers that represent the range of scores in the scoring guides while ensuring that a range of participating schools; racial, ethnic, and gender groups; geographic regions; and communities is represented in the training papers.

In the fourth stage, ETS and NCS subject-area specialists train scorers using the following procedures:

• presenting and discussing the task to be scored and the task rationale;

• presenting the scoring guide and the anchor responses;

• discussing the rationale behind the scoring guide, with a focus on the criteria that distinguish the various levels of the guide;

• practicing the scoring of a common set of sample student responses;

• discussing in groups each response contained in the practice scoring; and

• continuing the practice steps until scorers reach a common understanding of how to apply the scoring guide to student responses.

In the final stage, scorers assigned to questions that require long constructed responses work through a qualification round to ensure that they can reliably score student responses for extended constructed-response exercises. At every stage, ETS and NCS closely monitor scorer selection, training, and quality.

Using the Image-Based System

The image scoring system was designed to accommodate NAEP’s special needs while eliminating many of the complexities in paper-based training and scoring. First used in the 1994 assessment, the image scoring system allows scorers to assess and score student responses online. To do this, student response booklets are scanned, constructed responses are digitized, and the images are stored for presentation on a large computer monitor. The range of possible scores for an item also appears on the display, and scorers click on the appropriate button for quick and accurate scoring.

Developed by NCS, the system facilitates the training and scoring process by electronically distributing responses to the appropriate scorers and by allowing ETS and NCS staff to monitor scorer activities consistently, identifying problems as they occur and implementing solutions expeditiously.

The system enhances scoring reliability by providing tools to monitor the accuracy of each scorer and allows scoring supervisors to create calibration sets that can be used to prevent drift in the scores assigned to questions. This tool is especially useful when scoring large numbers of responses to a question, as occurs in state NAEP, which often has more than 30,000 responses per question. The ability to prevent drift and monitor potential problems while scorers evaluate the same question for a long period is crucial to maintaining the high quality of scoring.

The image scoring system allows all responses to a particular exercise to be scored continuously until the item is finished. In an assessment such as NAEP, which utilizes a balanced incomplete block (BIB) design (see Question 20 for more detail), grouping all student responses to a single question and working through the entire set of responses improves the validity and reliability of scorer judgments.

Ensuring Rater Reliability

Rater reliability refers to the consistency with which individual scorers assign a score to a question. This consistency is critical to the success of NAEP, and ETS and NCS employ three methods for monitoring reliability.

In the first method, called backreading, scoring supervisors review each scorer’s work to confirm that the scorer applies the scoring criteria consistently across a large number of responses and that the individual does so consistently across time. Scoring supervisors evaluate approximately 10 percent of each scorer’s work in this process.

In the second method, each group of scorers performs daily calibration scoring so scoring supervisors can make sure that drift does not occur. Any time scorers have taken a break of more than 15 minutes (e.g., after lunch, at the start of the workday), they score a set of calibration papers that reinforces the scoring criteria.

Last, interrater reliability statistics confirm the degree of consistency and reliability of overall scoring, which is measured by scoring a defined percentage of the responses a second time and comparing the first and second scores.
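A minimal sketch of one such statistic, assuming simple percent exact agreement on double-scored responses (NAEP’s technical reports describe the full set of reliability indices), might look like this:

```python
# Minimal sketch: percent exact agreement between first and second scorings
# of the same responses. Illustrative only; all scores below are invented.
def exact_agreement(first_scores, second_scores):
    pairs = list(zip(first_scores, second_scores))
    if not pairs:
        raise ValueError("no double-scored responses")
    agree = sum(1 for a, b in pairs if a == b)
    return 100.0 * agree / len(pairs)

# Hypothetical scores on a 4-point rubric for ten double-scored responses.
first = [1, 2, 2, 3, 4, 1, 2, 3, 3, 4]
second = [1, 2, 3, 3, 4, 1, 2, 3, 2, 4]
print(f"{exact_agreement(first, second):.1f}% exact agreement")  # 80.0%
```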

Consistent performance among scorers is paramount for the assessment to produce meaningful results. Therefore, ETS and NCS have designed the image scoring system to allow for easy monitoring of the scoring process, early identification of problems, and flexibility in training and retraining scorers.

Measuring trends in student achievement, whether short or long term, involves special scoring concerns. To maintain a trend, scorers must train using the same materials and procedures from previous assessment years. Furthermore, reliability rates must be monitored within the current assessment year, as well as across years.

Despite consistent scoring standards and extensive training, experience shows that some discrepancies in scoring may occur between different assessment years. Thus, a random sample of 20 to 25 percent of the responses from the prior assessment is systematically interspersed among current responses for rescoring. The results are used to determine the degree of scoring agreement between the current and previous assessments, and, if necessary, current assessment results are adjusted to account for any differences.

Documenting the Process

All aspects of scoring students’ constructed responses are fully documented. In addition to warehousing the actual student booklets, NCS keeps files of all training materials and reliability reports. NCS records in its scoring reports all the procedures used to assemble training packets, train scorers, and conduct scoring. These scoring reports also include all methods used to ensure reader consistency, all reliability data, and all quality control measures. ETS also summarizes the basic scoring procedures and outcomes in its technical report.


Question 11: How does NAEP analyze the assessment results?

Before the data are analyzed, responses from the groups of students assessed are assigned sampling weights to ensure that their representation in NAEP results matches their actual percentage of the school population in the grades assessed.

Based on these sampling weights, the analyses of national and state NAEP data are conducted in two major phases for most subjects—scaling and estimation. During the scaling phase, item response theory (IRT) procedures are used to estimate the measurement characteristics of each assessment question. During the estimation phase, the results of the scaling are used to produce estimates of student achievement. Subsequent analyses relate these achievement results to the background variables collected by NAEP. Because IRT scaling is inappropriate for some groups of NAEP items, results are sometimes reported separately for each task or for each group of highly related tasks in the assessment.

NAEP data are extremely important in terms of the cost to obtain them and the reliance placed on the reports that use them. Therefore, the scaling and analysis of these data are carefully conducted and include extensive quality control checks.

Weighting

Responses from the groups of students are assigned sampling weights to adjust for oversampling or undersampling from a particular group. For instance, census data on the percentage of Hispanic students in the entire student population are used to assign a weight that adjusts the NAEP sample so it is representative of the nation. The weight assigned to a student’s responses is the inverse of the probability that the student would be selected for the sample.

When responses are weighted, none are discarded, and each contributes to the results for the total number of students represented by the individual student assessed. Weighting also adjusts for various situations such as school and student nonresponse because data cannot be assumed to be randomly missing. All NAEP analyses described below are conducted using these sampling weights.
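A minimal sketch of this inverse-probability weighting, with invented selection probabilities and scores (the operational NAEP weights also carry the nonresponse and other adjustments described above), is shown below.

```python
# Minimal sketch of inverse-probability weighting, with invented data.
# Each student's base weight is the inverse of his or her selection
# probability; a weighted mean then represents the full population.
students = [
    {"score": 210.0, "selection_prob": 1 / 200},   # hypothetical
    {"score": 245.0, "selection_prob": 1 / 50},    # hypothetical
    {"score": 230.0, "selection_prob": 1 / 120},   # hypothetical
]
for s in students:
    s["weight"] = 1.0 / s["selection_prob"]

weighted_mean = (sum(s["weight"] * s["score"] for s in students)
                 / sum(s["weight"] for s in students))
print(round(weighted_mean, 1))
```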

Scaling and Estimation

NAEP uses IRT methods to produce score scales that summarize the results for each content area. Group-level statistics such as average scores or the percentages of students exceeding specific score points are the principal types of results reported by NAEP. However, NAEP also reports the results of various analyses, many of which examine the relationship among these group-level statistics and important demographic, experimental, and instructional variables.


Because of the reporting requirements for the national and state assessments and because of the large number of background variables associated with each assessment, thousands of analyses must be conducted. The procedures NAEP uses for the analyses were developed to produce accurate results while limiting the testing burden on students. Furthermore, these procedures provide data that can be readily used in secondary analyses.

The following procedures are used to generate scale-score data files suitable for analysis:

• After the computer files containing the student responses have been received, all cognitive and noncognitive questions are subjected to an extensive quality control analysis. An item analysis is then conducted on all of the questions. Project staff review the item analysis results, searching for anomalies that may signal unusual results or errors in creating the database. Simultaneously, each cognitive question is examined for differential item functioning (DIF). DIF analyses identify questions on which the scores of different subgroups of students who are at the same ability level differ significantly. Items showing DIF are examined by experts for potential bias.

• After the item and DIF analyses have been completed, the IRT scaling phase begins. IRT scaling provides estimates of item parameters (e.g., difficulty, discrimination) that define the relationship between the item and the underlying variable measured by the test. Parameters of the IRT model are estimated for each question, with separate scales being established for each predefined content area specified in the assessment framework. For example, in 2000 the reading assessment for grade 4 will have two scales describing reading purposes. Mathematics will have five content strands, and science will have scales for the three fields of science. Because the item parameters determine how each question is represented in the content-area scales, the psychometric staff carefully verify that the IRT scaling model provides an acceptable representation of the responses to the questions. In particular, they examine the fit of the model, by question, for the national and state assessments. Item-parameter estimations for state and national data are performed separately because the data collection processes for these assessments differ.

• During the estimation phase, plausible values (see below) are used to characterize content-area scale scores for students participating in the assessment. To keep student burden to a minimum, NAEP administers few assessment items to each student—too few to produce accurate content-related scale scores for each student. To account for this, for each student NAEP generates five possible content-related scale scores that represent selections from the distribution of content-related scale scores of students with similar backgrounds who answered the assessment items the same way. The plausible-values technology is one way to ensure that the estimates of the average performance of student populations and the estimates of variability in those estimates are more accurate than those determined through traditional procedures, which estimate a single score for each student. During the construction of plausible values, careful quality control steps ensure that the subpopulation estimates based on these plausible values are accurate. Plausible values are constructed separately for each national sample and for each jurisdiction participating in the state assessment.

• As a final step in the analysis process, the results of assessments involving a year-to-year trend or a state component are linked to the scales for the related assessments. For national NAEP, results are linked to the scales used in previous assessments of the same subject. For state NAEP, results for the current year are linked to the scales for the nation. Linking scales in this way enables state and national trends to be studied. Comparing the scale distributions for the scales being linked determines the adequacy of the linking function, which is assumed to be linear.

Plausible Values

NAEP’s assessment frameworks call for comprehensive coverage of each of the various subject areas—mathematics, science, reading, writing, civics, the arts, and others. In theory, given a sufficient number of items in a content area (a single scale within a subject-matter area), performance distributions for any population could be determined for that content area. However, NAEP must minimize its burden on students and schools by keeping assessment time brief. To do so, NAEP breaks up any particular assessment into a dozen or more blocks, consisting of multiple items, and administers only two or three blocks of items to any particular student.

This limitation results in any given student responding to only a small number of assessment items for each content area. As a result, the performance of any particular student cannot be measured accurately. This student-level imprecision has two important consequences: First, NAEP cannot report the proficiency of any particular student in any given content area (see Question 4); and second, traditional statistical methods that rely on point estimates of student proficiency become inaccurate and ineffective.

Unlike traditional standardized testing programs, NAEP must often change its test length, test difficulty, and balance of content to provide policymakers with current, relevant information. To accommodate this flexibility, NAEP uses methods that permit substantial updates between assessments but that remain sensitive enough to measure small, real changes in student performance. The use of IRT provides the technique needed to keep the underlying content-area scales the same, while allowing for variations in test properties such as changes in test length, minor differences in item content, and variations in item difficulty. NAEP estimates IRT parameters using the technique of marginal maximum likelihood, a statistical methodology. Estimations of NAEP scale score distributions are based on an estimated distribution of possible scale scores, rather than point estimates of a single scale score. This approach allows NAEP to produce accurate and statistically unbiased estimates of population characteristics that properly account for the imprecision in student-level measurement.
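As one illustration of the kind of IRT model involved, the sketch below evaluates a three-parameter logistic (3PL) item response function of the sort used for multiple-choice items; the discrimination, difficulty, and guessing values are invented.

```python
# Sketch of a three-parameter logistic (3PL) item response function, one of
# the IRT models used for multiple-choice items. Parameter values below are
# invented for illustration.
import math

def prob_correct(theta, a, b, c):
    """Probability of a correct response at ability theta, given
    discrimination a, difficulty b, and lower asymptote (guessing) c."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

# A moderately discriminating item of average difficulty (hypothetical).
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(theta, round(prob_correct(theta, a=1.0, b=0.0, c=0.2), 2))
```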


Marginal maximum likelihood methods are not well known or easily available to secondary analysts of NAEP data. Since most standard statistical packages provide only statistical methods that rely on point estimates of student proficiencies, rather than estimates of distributions, as the basis of their calculations, secondary analysts need an analog of point estimates that can function well with standard statistical software. For this reason, NAEP uses the plausible-values methodology as a workable alternative for secondary analysts.

Essentially, plausible-values methodology represents what the true performance of an individual might have been, had it been observed, using a small number of random draws from an empirically derived distribution of score values based on the student’s observed responses to assessment items and on background variables. Each random draw from the distribution is considered a representative value from the distribution of potential scale scores for all students in the sample who have similar characteristics and identical patterns of item responses. The draws from the distribution are different from one another to quantify the degree of precision (the width of the spread) in the underlying distribution of possible scale scores that could have caused the observed performances.

The NAEP plausible values function like point estimates of scale scores for many purposes, but they are unlike true point estimates in several respects. First, they differ from one another for any particular student, and the amount of difference quantifies the spread in the underlying distribution of possible scale scores for that student. Secondary analysts must analyze the spread among the plausible values and must not analyze only one of them as if it were a true student scale score. Second, the plausible-values methodology can recover any of the potential interrelationships among score scales and subpopulations defined by background variables that have been built into the plausible values when they were generated. Although NAEP builds a great many background variables into the plausible value estimation, the relationships of any new variables (those not incorporated into the generation of the plausible values) to student scale scores may not be accurately estimated. Because of the plausible-values approach, secondary researchers can use the NAEP data to carry out a wide range of analyses.

Summary

The NAEP scaling and estimation procedures yield unbiased estimates whose quality is ensured through numerous quality control steps. NAEP uses IRT so that NAEP staff and secondary analysts can efficiently complete extensive, detailed analyses of the data. Plausible-values scaling technology enables NAEP to conduct second-phase analyses and report these results in various publications such as the NAEP 1998 Reading Report Card for the Nation and the States.


Question 12: How does NAEP ensure the comparability of results among the state assessments and between the state and national assessments?

NAEP data are collected using a closely monitored and standardized process. The tight controls that guide the data collection process help ensure the comparability of the results generated for the main and the state assessments.

Main and state NAEP use the same assessment booklets, and they are administered during overlapping times. Although the administration processes for the assessments differ somewhat, statistical equating procedures that link the results from main and state NAEP to a common scale further ensure comparability. Comparing the distributions of student ability in both samples confirms the accuracy of this process and justifies reporting the results from the national and state components on the same scale.

Equating Main and State Assessments

State NAEP enables each participating jurisdiction to compare its results with those for the nation and with those for the region of the country where it is located. However, before these comparisons can be made, data from the state and main assessments must be scaled separately for the following reasons:

• The assessments use different administration procedures (Westat staff collect data for main NAEP, whereas individual jurisdictions collect data for state NAEP).

• Motivational differences may exist between the samples of students participating in the main and state assessments.

For meaningful comparisons, results from the main and state assessments must be equated so they can be reported on a common scale. Equating the results depends on those parts of the main and state samples that represent a common population. Because different individuals participate in the national and state assessments of the same subject, two independent samples from the entire population are drawn from each grade assessed. These samples consist of the following:

• students tested in the national assessment who come from the jurisdictions participating in the state NAEP (called the state comparison sample, or SCS); and

• the aggregation of all students tested in the state NAEP (called the state aggregate sample, or SAS).

For the NAEP 2000 science and mathematics assessments, equating and scaling national results with state results will occur through the common populations of fourth- and eighth-grade public school students. Separate scales will be developed for the two components and subsequently linked by equating the scale-score means and standard deviations for the SCS and SAS samples.
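A minimal sketch of that linear linking step, using invented numbers, is shown below; NAEP’s operational equating is documented in its technical reports.

```python
# Minimal sketch of linear linking: place state-assessment (SAS) results on
# the national scale by matching the SAS mean and standard deviation to
# those of the state comparison sample (SCS). All numbers are invented.
import statistics

scs_scores = [221.0, 235.0, 228.0, 242.0, 219.0]   # hypothetical national-sample scores
sas_scores = [0.12, 0.48, 0.31, 0.67, 0.05]        # hypothetical provisional state scale

scs_mean, scs_sd = statistics.mean(scs_scores), statistics.pstdev(scs_scores)
sas_mean, sas_sd = statistics.mean(sas_scores), statistics.pstdev(sas_scores)

slope = scs_sd / sas_sd
intercept = scs_mean - slope * sas_mean

def link(score):
    """Transform a provisional state-scale score onto the national scale."""
    return intercept + slope * score

print([round(link(s), 1) for s in sas_scores])
```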

Verifying Comparability

After the scales have been linked, additional analyses verify the comparability of the scales for the state and national results. The verification process includes comparing the shapes of the distributions for the SCS and SAS samples. For this comparison, each scale is divided into even intervals, and the percentages of students whose scores fall into each interval are estimated. If these distributions are similar in shape, the percentages in any given interval will be similar on both scales, and the linking will have produced comparable scales. For the assessments that have occurred since this procedure was introduced, the distributions for the two components have proved very similar.

Other checks are conducted to verify the comparability. For a more detailed explanation of the linking process, see the Technical Report of the 1996 State Assessment Program in Science.

Related Questions:
Question 19: How many schools and students participate in NAEP? When are the data collected during the school year?
Question 21: What are NAEP’s procedures for collecting data?


Question 13: What types of reports does NAEP produce? What reports are planned for the 1999 and 2000 assessments?

NAEP has developed an information system that provides various national and local audiences with the results needed to help them monitor and improve the educational system. To have maximum utility, NAEP reports must be clear and concise, and they must be delivered in a timely fashion.

NAEP has produced a comprehensive set of reports for the 1998 assess-ments in reading, writing, and civics, which are targeted to specific audi-ences. The audiences interested in NAEP results include parents, teachers,school administrators, legislators, and researchers. Targeting each report to a segment of these interested audiences increases its impact and appeal.Selected NAEP reports are available electronically on the World Wide Web(http://nces.ed.gov/nationsreportcard), which makes them more accessible.The 2000 reports in mathematics and science and grade 4 reading willresemble those for the 1998 assessments.

Reports for Different Audiences

NAEP reports are technically sound and address the needs of various audiences. For the 2000 assessments, NAEP plans to produce the following reports, most of which will be placed on the National Center for Education Statistics (NCES) Web site (http://nces.ed.gov/nationsreportcard).

NAEP Report Cards address the needs of national and state policymakers and present the results for selected demographic subgroups defined by variables such as gender, race or ethnicity, and parents' highest level of education.

Highlights Reports are nontechnical reports that directly answer questions frequently asked by parents, local school board members, and members of the concerned public.

Instructional Reports, which include many of the educational and instructional materials available from NAEP assessments, are designed for educators, school administrators, and subject-matter experts.

State Reports, one for each participating state, are intended for state policymakers, state departments of education, and chief state school officers. Customized reports will be produced for each jurisdiction that participates in the NAEP 2000 state mathematics and science assessments, highlighting the results for that jurisdiction. Mathematics results will be reported at the state level for the third time since 1992, and science results will be reported at the state level for the second time. The NAEP 2000 State Reports will build on the computer-generated reporting system that has been used successfully since 1990. As with past state assessments, state testing directors and state NAEP coordinators will help produce the NAEP 2000 State Reports.

Cross-State Data Compendia, first produced for the state reading assessment in 1994, are designed for researchers and state testing directors. They serve as reference documents that accompany other reports. The Compendia present state-by-state results for the variables discussed in the State Reports.

Trend Reports describe patterns and changes in student achievement as measured through the long-term trend assessments in mathematics, science, reading, and writing. These reports present trends for the nation and for selected demographic subgroups defined by variables such as race or ethnicity, gender, region, parents' highest level of education, and type of school.

Focused Reports explore in-depth questions with broad educational implications. They provide information to educators, policymakers, psychometricians, and interested citizens.

Summary Data Tables present extensive tabular summaries based on background data from the student, teacher, and school questionnaires. A new Web tool for the presentation of these data was introduced in conjunction with the data from the 1998 assessment. The NAEP Summary Data Tables Tool is designed to permit easy access to NAEP results. The tool enables users to customize tables to more easily examine desired results. Users can also print tables and extract them to spreadsheet and word processing programs. The tool is available from the NAEP Web site (http://nces.ed.gov/nationsreportcard) and will also be available on CD-ROM.

Technical Reports document all details of a national or state assessment, including the sample design, instrument development, data collection process, and analysis procedures. Technical Reports only provide information about how the results of the assessment were derived; they do not present the actual results. One technical report will describe the entire 1998 NAEP, including the national assessments, the state reading assessment, and the state writing assessment. Technical Reports are also planned for the 1999 assessment and the 2000 assessment.

In addition to producing these reports, NAEP provides states and local school districts with continued service to help them better understand and utilize the results from the assessments. The process of disseminating and using NAEP results is continually examined to improve the usefulness of these reports.


Question 14: What contextual background data does NAEP provide?

In addition to testing cognitive abilities, NAEP collects information from participating students, teachers, and principals about hundreds of contextual background variables regarding student, teacher, and school characteristics; instructional practices; and curricula. When developing the questionnaires used to gather this information, NAEP ensures that the questions do not infringe on respondents' privacy, that they are grounded in research, and that the answers can provide information relevant to the debate about educational reform.

The questionnaires appear in separately timed blocks of questions in the assessment booklets, such as the student questionnaires, or, like the teacher, school, and SD/LEP (students with disabilities or limited English proficiency) questionnaires, they are printed separately. Four general sources provide context for NAEP results:

• student questions, which examine background characteristics and subject area experience;

• teacher questionnaires, which gather data on background and training and classroom-by-classroom information;

• school questionnaires; and

• SD/LEP questionnaires.

These questionnaires were developed using a framework and process similar to that used for the cognitive questions. This process included reviews by external advisory groups, field testing, and additional reviews.

For the main and state NAEP, the student questions appear in noncognitive blocks. The background characteristic questions vary somewhat by grade level within a subject, and the subject area experience questions differ slightly by grade level within a subject. Unlike the cognitive blocks, these noncognitive blocks do not differ among the assessment booklets for a given grade and subject.

The teacher questionnaires vary based on subject area and may differ by grade level. The school questionnaires are completed by a school official for each grade of students participating in the assessment.


Student Questionnaires

Student answers to background questions are used to gather information about factors such as race or ethnicity, school attendance, and academic expectations. Answers on those questionnaires also provide information about factors believed to influence academic performance, including homework habits, the language spoken in the home, and the quantity of reading materials in the home. Because many of these questions document changes that occur over time, they remain unchanged over assessment years.

Student subject area questions gather three categories of information: time spent studying the subject, instructional experiences in the subject, and attitudes toward and perceptions about the subject and the test. Because these questions are specific to each subject area, they can probe in some detail the use of specialized resources such as calculators in mathematics classes.

Teacher Questionnaires

To provide supplemental information about the instructional experiences reported by students, the teacher for the subject in which students are being assessed completes a questionnaire about instructional practices, teaching background, and related information.

Part I of the teacher questionnaire, which covers background and general training, includes questions concerning race or ethnicity, years of teaching experience, certifications, degrees, major and minor fields of study, course work in education, course work in specific subject areas, the amount of in-service training, the extent of control over instructional issues, and the availability of resources for the classroom.

Part II of the teacher questionnaire, which covers training in the subject area and classroom instructional information, contains questions concerning the teacher's exposure to issues related to the subject and the teaching of the subject. It also asks about pre- and in-service training, the ability level of the students in the class, the length of homework assignments, use of particular resources, and how students are assigned to particular classes.

School Questionnaires

The school questionnaire is completed by the principal or another official of the school. This questionnaire asks about the background and characteristics of the school, including the length of the school day and year, school enrollment, absenteeism, dropout rates, size and composition of the teaching staff, tracking policies, curricula, testing practices, special priorities, and schoolwide programs and problems. This questionnaire also collects information about the availability of resources, policies for parental involvement, special services, and community services.

SD/LEP Questionnaire

The SD/LEP questionnaire is completed by teachers of those students who were selected to participate in NAEP and who were classified as SD or LEP, or who had Individual Education Plans (IEPs) or equivalent classification. The SD/LEP questionnaire gathers information about the background and characteristics of each student and the reason for the SD/LEP classification. For a student classified as SD, the questionnaire requests information about the student's functional grade level, mainstreaming, and special education programs. For a student classified as LEP, questions ask about the student's native language, time spent in special education and language programs, and the level of English language proficiency.

NAEP policy states that if any doubt exists about a student's ability to participate, the student should be included in the assessment. Beginning with the 1996 assessments, NAEP has allowed more accommodations for both categories of students.

Related Question:
Question 7: How does NAEP accommodate students with disabilities and students with limited English proficiency?


Question 15: How can educators use NAEP resources such as frameworks, released questions, and reports in their work?

NAEP materials such as frameworks, released questions, and reports have many uses in the educational community. For instance, states have considered NAEP frameworks when revising their curricula. Also, released constructed-response questions and their corresponding scoring guides have served as models of innovative assessment practices.

NAEP findings are reported in many publications specifically targeted to educators. Furthermore, NAEP staff host seminars to discuss NAEP results and their implications.

NAEP Frameworks

NAEP frameworks present and explain what experts in a particular subject area consider important. Each framework outlines the subject, often providing examples, in ways that may give teachers and curriculum planners new perspectives about their fields.

Frameworks frequently provide theoretical information about problem solving through their descriptive classifications of cognitive levels (e.g., in the mathematics framework, conceptual understanding, procedural knowledge, and problem solving). Educators can relate these cognitive levels to various subject content areas and evaluate how classroom instruction and assessment focus on each cognitive level. For example, an instructor may study the mathematics framework and see that most of his or her instruction addresses only procedural knowledge. The instructor can then include more problems at a higher cognitive level, perhaps following examples suggested in the framework.

Several states have used the framework publications to review their NAEP results and develop recommendations for teachers. Furthermore, NAEP staff members who are familiar with particular frameworks have lent their expertise to state curriculum committees.

Released NAEP Questions

After each assessment, the National Center for Education Statistics (NCES) releases nearly one-third of the questions, making copies of them available to the interested public. The packages containing the released questions include answer keys, content and process descriptions, and information about the percentages of students who answered the questions correctly.

Released questions are also available on the NAEP Web site (http://nces.ed.gov/nationsreportcard). The sample questions tool displays test questions, along with sample student responses and scoring guides from the assessment. The test questions can be downloaded and printed directly from the Web site.

Released questions often serve as models for teachers who wish to develop their own classroom assessments. One school district used released NAEP reading questions to design a districtwide test, and another school district used scoring guides for released reading questions to train its teachers in how to construct scoring guides.

NAEP Reports

NAEP reports such as the focus report on mathematical problem solving provide teachers with useful information. NAEP staff have also conducted seminars for school districts across the country to discuss NAEP results and their implications at the local level. In 1996, NCES began placing NAEP reports and almanacs on its World Wide Web site (http://nces.ed.gov/nationsreportcard) for viewing, printing, and downloading. Web access should increase the utility of NAEP results.

Related Question:
Question 4: Can parents examine the questions NAEP uses to assess student achievement? Can parents find out how well their children performed in the NAEP assessment? Why are NAEP questions kept confidential?


Question 16: How are NAEP data and assessment results used to further explore education and policy issues? What technical assistance does NAEP provide?

The National Center for Education Statistics (NCES) grants members of the educational research community permission to use NAEP data. Educational Testing Service (ETS) provides technical assistance, either as a public service or under contract, in using these data.

NAEP results are provided in formats that the general public can easily access. Tailored to specific audiences, NAEP reports are widely disseminated. Since the 1994 assessment, reports and almanacs have been placed on the World Wide Web to provide even easier access.

NAEP Data

Because of its large scale, the regularity of its administration, and its rigorous quality control process for data collection and analysis, NAEP provides numerous opportunities for secondary data analysis. NAEP data are used by researchers who have many interests, including educators who have policy questions and cognitive scientists who study the development of abilities across the three grades assessed by NAEP.

World Wide Web Presence

Beginning with the 1994 assessment, NCES began placing NAEP reports and almanacs on its World Wide Web site (http://nces.ed.gov/nationsreportcard) for viewing, printing, and downloading.

Software and Data Products

NAEP has developed products that support the complete dissemination of NAEP results and data to many analysis audiences. ETS began developing these data products for the 1990 NAEP, adding new capabilities and refinements in subsequent assessments.

In addition to the user guides and a version of the NAEP database for secondary users on CD-ROM, these other products are available:

• the NAEP Summary Data Tables Tool for searching, displaying, and customizing cross-tabulated variable tables (available on the NAEP Web site and on CD-ROM); and

• the NAEP Data Tool Kit, including NAEPEX, a data extraction program for choosing variables, extracting data, and generating SAS and SPSS control statements, and analysis modules for cross-tabulation and regression that work with SPSS and Excel (available on disk).

ETS and NCES conduct workshops on how to use these products to promote secondary analyses of NAEP data.


NAEP Technical Assistance

Seminars

Frequently, NCES offers a four-day seminar, the NCES Advanced Studies Seminar on the Use of the NAEP Database for Research and Policy Discussion. This seminar stimulates interest in using NAEP data to address educational research questions, enhances participants' understanding of the methodological and technological issues relevant to NAEP, and demonstrates the steps necessary for conducting accurate statistical analyses of NAEP data. In addition to offering formal and hands-on instruction, the seminar helps participants learn about and work with currently available software packages specifically designed for NAEP analyses.

Special Analyses

Under the cooperative agreement with NCES, ETS develops software for secondary users of NAEP data and regularly provides them with technical assistance, either by supplying information about data characteristics or by contracting with them to run analyses.

Special Studies

NAEP meets requests for the empirical investigation of educational issues. Nearly every NAEP assessment contains a special study that gathers specific information expressly requested by the public and NCES.

State-Level Forum for Discussing Educational Issues and Policy

The NAEP NETWORK provides a state-level forum for educational concerns. Participants in these meetings include testing directors; NAEP coordinators from individual states, territories, and other jurisdictions; and representatives from nonpublic school organizations and associations. The NAEP NETWORK also offers information about upcoming assessments and enables those involved in state NAEP to offer their input.

Related Question:
Question 13: What types of reports does NAEP produce? What reports are planned for the 1999 and 2000 assessments?


Question 17: Can NAEP results be linked to other assessment data?

In recent years there has been considerable interest among education policymakers and researchers in linking NAEP results to other assessment data. Much of this interest has been centered on linking NAEP to international assessments. The 1992 NAEP mathematics assessment results were successfully linked to those from the International Assessment of Educational Progress (IAEP) of 1991, and the 1996 grade 8 NAEP assessments in mathematics and science have been linked to the results of the Third International Mathematics and Science Study (TIMSS) of 1995. Also, a number of activities have focused on linking NAEP to state assessment results. Promoting linking studies with international assessments and assisting states and school districts in linking their assessments to NAEP are key aspects of the National Assessment Governing Board's (NAGB's) policy for redesigning NAEP.

Linking NAEP to International Assessments

The International Assessment of Educational Progress (IAEP)

Pashley and Phillips (1993) investigated linking mathematics performance on the 1991 IAEP to performance on the 1992 NAEP. In 1992, they collected sample data from U.S. students who were administered both instruments. (Colorado drew a large enough sample to compare itself with all 20 countries that participated in IAEP.)

The relation between mathematics proficiency in the two assessments was modeled using regression analysis. This model was then used for projecting IAEP scores from non-U.S. countries onto the NAEP scale.
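As a rough sketch of this kind of regression-based projection, consider hypothetical paired scores for students who took both instruments (illustrative values only, not the Pashley and Phillips data):

```python
import numpy as np

# Hypothetical paired scores for a sample of students administered both assessments.
rng = np.random.default_rng(2)
iaep = rng.normal(500, 100, size=1500)
naep = 0.35 * iaep + 100 + rng.normal(0, 20, size=1500)  # assumed linear relation plus noise

# Fit NAEP = slope * IAEP + intercept by least squares, then project other IAEP scores.
slope, intercept = np.polyfit(iaep, naep, deg=1)
project = lambda x: slope * np.asarray(x) + intercept
print(np.round(project([400, 500, 600]), 1))  # projected NAEP scale scores for three IAEP scores
```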

The authors of the study considered their results very encouraging. The relation between the IAEP and NAEP assessments was relatively strong and could be modeled well. However, as the authors pointed out, the results should be considered only in the context of the similar construction and scoring of the two assessments. Thus, they advised that other studies should be initiated cautiously, even though the path to linking assessments was better understood.

The Third International Mathematics and Science Study (TIMSS)

In 1989, the United States expressed an interest in international comparisons, especially in mathematics and science. That year, the National Education Summit adopted goals for education. Goal 4 states that American students shall be first in the world in mathematics and science achievement by the year 2000. Since that pronouncement, various approaches have been suggested for collecting the data that could help monitor progress toward that goal.


The 1995 TIMSS presented one of the best opportunities for comparison. The data from this study became available at approximately the same time as the NAEP data for the 1996 mathematics and science assessments. Because the two assessments were conducted in different years and no students responded to both assessments, the regression procedure that linked the NAEP and IAEP assessments could not be used. Therefore, the results from the NAEP and TIMSS assessments were linked by matching their score distributions (Johnson & Owen, 1998). A comparison of linked grade 8 results with actual grade 8 results from states that participated in both assessments suggested that the link was working acceptably.
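One common way to match two score distributions is an equipercentile-style link, which maps each score to the score with the same percentile rank in the other distribution. The sketch below uses that approach with hypothetical score samples; the moderation procedure actually used for NAEP and TIMSS is documented in Johnson & Owen (1998).

```python
import numpy as np

def distribution_matching_link(source_scores, target_scores):
    """Map source-scale scores to the target scale by matching percentile ranks
    (a simple equipercentile-style link between two score distributions)."""
    probs = np.linspace(0.001, 0.999, 999)
    src_q = np.quantile(source_scores, probs)
    tgt_q = np.quantile(target_scores, probs)
    return lambda x: np.interp(x, src_q, tgt_q)

# Hypothetical score samples from the two assessments (illustrative values only).
rng = np.random.default_rng(3)
timss = rng.normal(500, 90, size=7000)
naep = rng.normal(270, 35, size=7000)
to_naep = distribution_matching_link(timss, naep)
print(np.round(to_naep([400, 500, 600]), 1))  # TIMSS scores expressed on the NAEP scale
```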

A research report based on this linking (Johnson, Siegendorf, & Phillips, 1998) provides comparisons of the mathematics and science achievement of each U.S. jurisdiction that participated in the state NAEP with that of each country that participated in TIMSS. However, as was the case with the IAEP link, these comparisons need to be interpreted cautiously.

The same linking approach did not produce satisfactory results at grade 4, and no comparisons at this grade have been reported. Studies to date have yielded no information as to why the distribution matching method produced acceptable results at one grade but unacceptable results at the other. The National Center for Education Statistics (NCES) plans to repeat the linking of NAEP and TIMSS as part of the NAEP 2000 assessment. However, in this linking effort, a sample of students will be administered both the NAEP and TIMSS assessments. As a result, regression-based procedures like those used in the NAEP-to-IAEP linking can be employed. It is hoped that the use of these procedures will provide useful linkages at all grades.

Linking NAEP to State Assessments

One way in which NAEP can be made most useful to state education agencies is by providing a benchmark against which they can compare the results of the census assessments they carry out in their schools. If the state assessment scores are mapped onto the NAEP scale, then states will be in a position to make stronger statements about the implications of performance on their state assessment than would otherwise be possible. If the projections are valid, schools and districts can compare their scores not only to the state as a whole but also to the entire nation and, based on linkage of NAEP to international assessments, to other major countries in the world.

Because this capability is valuable, states have examined alternative methods for developing linkages between their assessments and NAEP assessments. Many methods may not produce valid results because they do not link the operational state assessment to the operational NAEP on which the scales are defined. Varying the context in which an item or a test is administered varies the performance of students on that item, so that a valid linkage must be based on linking scores on actual administrations of NAEP and state assessments.

Building on the earlier work of Linn (1993), Bloxom et al. (1995), and Williams et al. (1995), McLaughlin (1998a) explored the feasibility and validity of regression-based linkings based on matching state assessment scores of students to NAEP performance records. Using the 1996 state NAEP grade 4 and 8 mathematics assessments in four states, he found that: (a) it is feasible to develop the linkage of student records without violating either NAEP or state assessment confidentiality assurances, and (b) in three of the four states, acceptably accurate projections of group-level NAEP scores and percentages at achievement levels could be obtained.

McLaughlin (1998b) found that in order to be neutral (i.e., so that comparisons based on projected NAEP scores lead to the same conclusions as comparisons based on actual NAEP scores), it was necessary that the regression models include explicit terms for school mean scores, as well as individual student scores, and explicit terms for demographic measures. Like others (Linn & Kiplinger, 1993; Shepard, 1997), he also found that projection functions did not necessarily generalize across years.
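As an illustration of the general shape of such a projection model, the sketch below fits an ordinary least-squares regression with explicit terms for the individual state score, the school's mean state score, and one demographic indicator. All variable names and values are hypothetical; this is not McLaughlin's specification.

```python
import numpy as np

# Hypothetical linkage data set (illustrative values only).
rng = np.random.default_rng(4)
n = 4000
state_score = rng.normal(150, 30, size=n)                   # individual state assessment score
school_mean = state_score + rng.normal(0, 10, size=n)       # stand-in for the school's mean state score
demo = rng.integers(0, 2, size=n)                           # hypothetical demographic indicator
naep_score = (0.8 * state_score + 0.3 * school_mean
              - 6 * demo + 60 + rng.normal(0, 15, size=n))  # simulated NAEP scores

# Design matrix with an intercept plus student-, school-, and demographic-level terms.
X = np.column_stack([np.ones(n), state_score, school_mean, demo])
beta, *_ = np.linalg.lstsq(X, naep_score, rcond=None)
projected = X @ beta                                        # projected NAEP scores for these students
print(np.round(beta, 2))
```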

As part of the exploration of linkage, McLaughlin developed detailed guidelines for validating score projections and for conveying the appropriate level of uncertainty in statements based on projections. McLaughlin's methodology for linking and for the development and validation of projections is being made available to states, including both the development and evaluation of the linkage database and the execution of a specified series of analyses to derive the appropriate linking function and evaluate its precision, neutrality, and stability.


Question 18: Who are the students assessed by NAEP?

NAEP does not, and is not designed to, report on the performance of individual students. Rather, it assesses specific populations of in-school students or subgroups of these populations, reporting on their performance in selected academic areas. NAEP results are based on samples of these student populations of interest.

NAEP assesses representative samples of students in certain grades or at certain ages in public and nonpublic schools in the United States. For the main NAEP assessment, students are selected from grades 4, 8, and 12 in public and nonpublic schools. For the state assessments, the number of subjects and grades selected varies, depending on available funding. (State-level samples have included public and nonpublic school students since 1994. In 2000, however, nonpublic schools will not be included in the state-level samples.) For the long-term trend assessments, students at grades 4, 8, and 11 are sampled for writing, and students at ages 9, 13, and 17 are sampled for science, mathematics, and reading. Since 1984, NAEP has been administered every two years. Beginning in 2000, NAEP will be administered annually.

Sampling Students

NAEP does not report on the performance of individual students. Instead, the main and state assessments are administered every two years to representative samples of students at certain ages or in certain grades in public and nonpublic schools. Assessment results are based on the performance of students in these samples. For this reason, NAEP has developed complex sampling designs that produce precise estimates of student performance and maximize the information available given scarce resources such as students' and teachers' time.

The number of children in any particular grade in the United States is approximately 3.5 million, and the number of students selected for NAEP's national samples for any particular grade and subject is 7,000 or more. Therefore, NAEP's sample consists of approximately 0.2 percent of the student population for each grade and subject.

Although only a very small percentage of the student population in each grade is assessed, NAEP estimates are accurate because they depend on the absolute number of students participating, not on the relative proportion of students. Thus, all or nearly all of the schools and students selected must participate in the assessment to ensure that the NAEP sample is truly representative of the nation's student population.
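A quick numerical illustration of why precision is driven by the absolute sample size rather than by the sampling fraction, using a simple random sampling approximation with an assumed score standard deviation (NAEP's actual multistage design is more complex):

```python
import math

N = 3_500_000   # approximate number of students in a grade
n = 7_000       # approximate NAEP sample size for one grade and subject
sigma = 35      # assumed standard deviation of scale scores (illustrative)

se_ignoring_population = sigma / math.sqrt(n)   # depends only on n
fpc = math.sqrt(1 - n / N)                      # finite population correction for n/N = 0.2 percent
print(round(se_ignoring_population, 4), round(se_ignoring_population * fpc, 4))
# The two standard errors differ by about 0.1 percent, so the sampling fraction barely matters.
```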


Ensuring Representative Samples

As the Nation's Report Card, NAEP must report accurate results for populations of students and subgroups of these populations (e.g., minority students or students attending nonpublic schools). To ensure accurate results, the relatively small samples of students selected for the NAEP assessments must be truly representative of the entire student population.

Every school has some known chance of being selected for the sample. Within a selected school, all students within a participating grade have an equal chance of being selected. The probability of students and schools being selected into the sample varies based on factors such as grade, subject, public and nonpublic status, and so on.

Uncertainty about the validity of the results can arise, however, if selected schools decide not to participate, if selected students are absent or refuse to participate, or if schools or students volunteer to join the NAEP sample. For this reason, NAEP encourages the participation of all those selected, but is not able to accept volunteers.

Stratification

A multistage design that relies on stratification (i.e., classification into groups having similar characteristics) is used to choose samples of student populations. To ensure an accurate representation, the samples are randomly selected from groups of schools that have been stratified by variables such as region of the country, extent of urbanization, percentage of minority enrollment, and median household income.

The samples drawn for state assessments are separate from those drawn for national assessments. However, because state NAEP assessments are voluntary, some states may choose not to participate. Therefore, the aggregate of the state samples may not be a representative sample of the nation.

For main and long-term trend NAEP assessments, the sampling design has three steps:

• selection of primary sampling units (PSUs), the geographic areas defined as counties or groups of counties;

• selection of schools (public and nonpublic) within the selected areas; and

• selection of students within those schools.

Stratification begins when PSUs are identified based on region of the country (Northeast, Southeast, Central, and West) and when PSUs within each region are designated as urban or rural. In some regions, PSUs are classified by the percentage of Black or Hispanic students enrolled in their school populations. Further strata may be defined using other variables such as median household income, education level of residents over 25 years of age, or other demographic characteristics. Once the selection of the PSUs is completed, the schools within each PSU are assigned a probability of selection that is proportional to the number of students per grade in each school.
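A minimal sketch of drawing schools with probability proportional to grade enrollment within one PSU. The enrollment figures are hypothetical, and the draw shown here only approximates strict probability-proportional-to-size selection without replacement:

```python
import numpy as np

def pps_sample(enrollments, n_schools, seed=0):
    """Draw school indices with probability proportional to grade enrollment."""
    rng = np.random.default_rng(seed)
    p = np.asarray(enrollments, dtype=float)
    p = p / p.sum()                      # selection probabilities proportional to size
    return rng.choice(len(p), size=n_schools, replace=False, p=p)

# Hypothetical grade enrollments for eight schools in one PSU (illustrative only).
enrollments = [30, 45, 120, 80, 200, 60, 150, 90]
print(pps_sample(enrollments, n_schools=3))   # indices of the selected schools
```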

For state NAEP 2000 assessments, the sampling design has two steps:

• selection of public schools within the selected areas; and

• selection of students within those schools.


To ensure that the state samples provide an accurate representation, public schools are stratified by urbanization, minority enrollment, and the results of state achievement tests or median household income.

Oversampling

To ensure that the results reported for major subgroups of populations are accurate, oversampling (i.e., sampling particular types of schools at a higher rate than they appear in the population) is necessary. For example, main NAEP oversamples nonpublic schools and schools with large minority populations, which ensures that the sample contains adequate numbers of students attending nonpublic schools and students from particular racial or ethnic subgroups to accurately estimate the performance of those subgroups.

Weighting

If these samples are to be representative of the population as a whole, however, the data from the students in the oversampled schools must be properly weighted during analysis. Weighting accounts for the disproportionate representation of certain subgroups that occurs because of oversampling. Similarly, weighting also accounts for low sampling rates that can occur for very small schools. Thus, when properly weighted, NAEP data provide results that reflect the representative performances of the entire nation and of the subpopulations of interest.
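The basic idea behind such weighting can be sketched with inverse selection probabilities used as base weights. The scores and probabilities below are hypothetical, and operational NAEP weights involve further adjustments beyond this simple base weight:

```python
import numpy as np

# Hypothetical student scores and the selection probabilities under which they were sampled.
scores = np.array([210.0, 240.0, 195.0, 260.0, 230.0])
selection_prob = np.array([0.02, 0.02, 0.10, 0.01, 0.05])   # oversampled students have larger probabilities

weights = 1.0 / selection_prob                               # base weight: inverse probability of selection
weighted_mean = np.sum(weights * scores) / np.sum(weights)
print(round(float(np.mean(scores)), 1), round(float(weighted_mean), 1))  # unweighted vs. weighted estimate
```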

Related Question:
Question 1: What is NAEP?


Question 19: How many schools and students participate in NAEP? When are the data collected during the school year?

National assessment samples typically include nearly 100,000 students from 2,000 schools. The state assessment samples usually include approximately 100 schools and 2,500 students per subject per grade from each participating state.

Data for the main and state NAEP are collected at overlapping times during winter. Data for the national long-term trend assessments are collected during fall for age 13 or grade 8, during winter for age 9 or grade 4, and during spring for age 17 or grade 11.

Sample Selection

Each assessment year, NAEP selects national samples that represent the entire population of U.S. students in grades 4, 8, and 12 for the main assessment and the entire population of students at ages 9, 13, and 17 for the long-term trend assessment. The selection process differs slightly based on whether the sample of students is needed for the main assessment, the long-term trend assessment, or the state-level assessment.

Main and State NAEP

During any given assessment year, subject areas may not be assessed at all grades. For instance, main NAEP in 2000 will assess students in mathematics and science at all three grades, but only at grade 4 in reading. State NAEP in 2000 will assess students in mathematics and science at grades 4 and 8, but it will not test twelfth-grade students in either subject.

The 1998 main assessment, which tested reading, writing, and civics at all three grade levels, required about 137,000 students, whereas the state-level assessment, which tested only two grade levels, required a total sample of about 350,000 students because of the number of states that participated. Approximately 2,000 schools took part in the 1998 main assessment, while nearly 10,000 schools contributed their students' time and other resources to the state NAEP.

Long-Term Trend NAEP

The long-term trend assessment tests the same four subjects across years, using relatively small national samples. For the long-term trend assessment, samples of students are selected by age (9, 13, and 17 years) for mathematics, science, and reading and by grade (4, 8, and 11) for writing. Based on precedent, long-term trend writing results are reported by grade, and mathematics, science, and reading long-term trend results are reported by age. In 1999, the long-term trend NAEP began to be administered every four years, but not in the same years as the main and state assessments in reading, writing, mathematics, and science.


Schools and Students Assessed

The table below shows the target sample size of schools and students needed for main and state NAEP in 2000. These components use separate, nonoverlapping samples whenever possible.

Total Schools and Students:* Target Sample Sizes for the 2000 NAEP Assessments

NAEP COMPONENTS             Total Schools    Total Students    Grade 4     Grade 8     Grade 12
NATIONAL NAEP               2,500            106,500
  Reading                                                      8,000
  Mathematics                                                  21,750      15,750      13,750
  Science                                                      15,750      15,750      15,750
STATE NAEP                  average 225      500,000
                            per state
  Mathematics                                                  125,000     125,000
  Science                                                      125,000     125,000
TOTALS for all components                    606,500           295,500     281,500     29,500

* Numbers are for the total samples rather than for the reporting samples; therefore, these sample sizes will be larger than the sample sizes published in reports of NAEP results.

Assessment Schedules

The Typical NAEP Assessment Schedule table below presents the typical assessment periods for the three NAEP components. The assessment schedules remain relatively constant across assessment years to permit an accurate measurement of change over time and to help ensure that the results are comparable. The long-term trend assessment is administered three times during the school year (one administration per grade), whereas the main and state NAEP assessments are administered during winter for all three grades.

State NAEP assessments are usually administered during February. Because the month-long state assessment period occurs during the middle of the two-and-one-half-month national assessment period, the effects that a time difference could produce are eliminated, making the results of the state and national assessments more comparable than if they were administered at different times during the year.

Typical NAEP Assessment Schedule

Grade or Age                FALL               WINTER             SPRING
Grade 4 (main, state)                          Main, State
Age 9/Grade 4 (LTT*)                           Long-Term Trend
Grade 8 (main, state)                          Main, State
Age 13/Grade 8 (LTT*)       Long-Term Trend
Grade 12 (main)                                Main
Age 17/Grade 11 (LTT*)                                            Long-Term Trend

* LTT refers to ages/grades sampled for long-term trend assessments.

Related Questions:
Question 2: What subjects does NAEP assess? How are the subjects chosen, and how are the assessment questions determined? What subjects were assessed in 1999? What subjects will be assessed in 2000?
Question 20: How does NAEP use matrix sampling? What is focused BIB spiraling, and what are its advantages for NAEP?


Question 20: How does NAEP use matrix sampling? What is focused BIB spiraling, and what are its advantages for NAEP?

Typically, several hundred questions are needed to reliably test the many specifications of the complex frameworks that guide the NAEP assessments. Administering the entire collection of cognitive questions to each student would be far too time consuming to be practical.

Therefore, a number of test booklets are printed for a particular subject, with each booklet containing a different selection of cognitive blocks. This design, called matrix sampling, allows NAEP to assess the entire subject area within a reasonable amount of testing time. In matrix sampling, different portions from the entire pool of cognitive questions are printed in separate booklets and administered to different but equivalent samples of students. This design minimizes the assessment time required per student while allowing complete coverage of the subject being assessed.

In addition to the background questions, the NAEP test booklets contain cognitive questions that are arranged in collections of separately timed blocks. For the cognitive questions, the blocks and their sequence vary from one version of the test booklet to another. This planned variation of test booklets allows NAEP to generate precise estimates of student performance and maximize the information available given scarce resources such as students' and teachers' time.

The type of matrix sampling used by NAEP is called focused, balanced incomplete block (BIB) spiraling. Because of BIB spiraling, NAEP can sample enough students to obtain precise results for each question while generally consuming an average of about an hour and a half of each student's time.

The "focused" part of NAEP's matrix sampling method requires each student to answer questions from only one subject area. The "BIB" part of the method ensures that students receive different interlocking sections of the assessment forms, enabling NAEP to check for any unusual interactions that may occur between different samples of students and different sets of assessment questions. "Spiraling" refers to the method by which test booklets are assigned to pupils, which ensures that any group of students will be assessed using approximately equal numbers of the different versions of the booklet.


BIB Spiraling

Matrix sampling is used to construct tests when the objectives being assessed require more time than is available to test them. Although BIB spiraling is only one form of matrix sampling, the other forms have drawbacks that make them unsuitable for NAEP.

Occasionally, NAEP may use simple matrix sampling when it matches the needs of certain assessment questions. With simple matrix sampling, separate sets of questions are confined to particular booklets. However, because each booklet contains a set of specific questions, context and order effects must be considered. In a simple matrix design, the same subject area questions would always appear in the same place in every assessment booklet. Thus, student mastery of the questions appearing at the end of the booklet may be underestimated because of fatigue or overestimated due to practice effects.

By contrast, the more sophisticated BIB method produces data that are relatively free from these placement effects. In the NAEP BIB design, the cognitive blocks are balanced. Each cognitive block appears an equal number of times in every possible position. Each cognitive block is also paired with every other cognitive block in at least one test booklet. (The NAEP BIB design varies according to subject area.)

The following table presents a simplified example of BIB spiraling based on the NAEP 1990 mathematics design. The full sample of students is divided into seven equivalent groups, and each group of students is assigned one of the seven test booklets. In this design, each cognitive block appears only once in each of three possible positions, and each block is paired once with every other block. (This example shows only the cognitive blocks, even though the test booklets also contain background questionnaire blocks.)


A Model of BIB Matrix Sampling

Booklet    Position 1         Position 2         Position 3
version    cognitive block    cognitive block    cognitive block
1          A                  B                  D
2          B                  C                  E
3          C                  D                  F
4          D                  E                  G
5          E                  F                  A
6          F                  G                  B
7          G                  A                  C
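A short sketch that reproduces the seven booklets in this table and checks the two balance properties described above (each block appears once in each position, and every pair of blocks shares exactly one booklet). The cyclic construction used here is simply one way to generate the same assignments:

```python
from itertools import combinations

# Reproduce the seven booklets from the table above (cyclic pattern: blocks i, i+1, i+3 mod 7).
blocks = "ABCDEFG"
booklets = [[blocks[i % 7], blocks[(i + 1) % 7], blocks[(i + 3) % 7]] for i in range(7)]
print(booklets)   # [['A', 'B', 'D'], ['B', 'C', 'E'], ..., ['G', 'A', 'C']]

# Each cognitive block appears exactly once in each of the three positions.
for position in range(3):
    assert sorted(booklet[position] for booklet in booklets) == list(blocks)

# Every pair of blocks appears together in exactly one booklet.
pair_counts = {pair: 0 for pair in combinations(blocks, 2)}
for booklet in booklets:
    for pair in combinations(sorted(booklet), 2):
        pair_counts[pair] += 1
assert all(count == 1 for count in pair_counts.values())
```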


NAEP uses BIB spiraling even though this design requires a greater variety of test booklets to be printed. Furthermore, each version of the assessment booklet must appear in the sample approximately the same number of times and must be administered to equivalent subgroups within the full sample. To ensure proper distribution at assessment time, the booklets are packed in spiral order (in the above example, one each of booklets 1 through 7, then 1 through 7 again, and so on). The test coordinator randomly assigns these booklets to the students in each test administration session. Spiraled distribution of the booklets promotes comparable sample sizes for each version of the booklet, ensures that these samples are randomly equivalent, and reduces the likelihood that students will be seated within viewing distance of an identical booklet.
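A minimal sketch of spiraled assignment under these assumptions: seven booklet versions packed in repeating order and dealt out along a randomized list of students in a session.

```python
import random
from collections import Counter

def spiral_assign(student_ids, n_versions=7, seed=0):
    """Deal booklet versions 1..n_versions in repeating (spiral) order to a shuffled student list."""
    students = list(student_ids)
    random.Random(seed).shuffle(students)        # randomize the order in which booklets are handed out
    return {student: (i % n_versions) + 1 for i, student in enumerate(students)}

assignment = spiral_assign(range(100))           # 100 hypothetical students in a session
print(Counter(assignment.values()))              # approximately equal counts for each booklet version
```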


Question 21: What are NAEP's procedures for collecting data?

A cooperative agreement between the National Center for Education Statistics (NCES) and Westat specifies the sampling and data collection operations for national and state NAEP.

Westat field staff, who receive extensive training, administer the national assessment. Although each participating state is responsible for data collection for the state NAEP, Westat ensures uniformity of procedures across states through detailed procedural manuals, training, supervision, and quality control monitoring.

The complex process by which NAEP collects data is monitored closely. The tight control on this process contributes to the quality—and thus to the comparability—of the main and state assessments and their results.

Organization and Supervision of Data Collection

NAEP relies heavily on the support of school administrators and staff, and obtaining the cooperation of the selected schools requires substantial time and energy because participation in NAEP is voluntary. A series of mailings that includes letters to the chief state school officers and district superintendents notifies the sampled schools of their selection. Additional informational materials are sent and procedures are explained at introductory meetings.

Westat is responsible for the following field administration duties:

• selecting the sample of schools and students;

• developing the administration procedures, manuals, and materials;

• hiring and training staff to conduct the assessments (for main NAEP);

• training state personnel to conduct assessments (for state NAEP); and

• conducting an extensive quality assurance program.

For the main and long-term trend assessments, Westat hires and trains approximately 85 field supervisory staff to collect the data. These field supervisors and their assistants complete all associated paperwork, reducing the burden on participating schools.

For the state assessments, NAEP legislation requires each participating jurisdiction to handle data collection activities such as obtaining the cooperation of sampled schools and assigning personnel to conduct the assessment. Westat employs and trains state supervisors to work with the state-appointed coordinators who carry out the necessary organizational tasks. The individual schools and the designated assessment administrators are responsible for preparing lists of enrolled students for the sampled grade; distributing the teacher, school, and SD/LEP (students with disabilities or limited English proficiency) questionnaires; and administering the assessment.

In addition to training local administrators, Westat ensures quality control across states by monitoring 25 percent of the sessions in states that have previously conducted a NAEP assessment and 50 percent of the sessions in states that are participating in NAEP for the first time. Security of assessment materials and uniformity of administration are high priorities for NAEP. Quality control monitors have never reported any instances of serious breaches in procedures or major problems that could jeopardize the validity of the assessment.

After each session, Westat staff interview the assessment administrators to receive their comments and recommendations. As a final quality control step, a debriefing meeting is held with the state supervisors to receive feedback that will help improve procedures, documentation, and training for future assessments.

Management of Assessment Materials

Under the direction of Educational Testing Service (ETS) staff, National Computer Systems (NCS) produces the materials needed for the NAEP assessments. NCS prints identifying bar codes and numbers for the booklets and questionnaires, preassigns the booklets to testing sessions, and prints the booklet numbers on the administration schedule.

These activities improve the accuracy of data collection and assist with the spiraled distribution process.

Preassigning numbered test booklets to assessment sessions and printing the booklet numbers on the administration schedule improve the systematic protection of student confidentiality. Student names appear only on stickers that are temporarily attached to the booklets; these stickers are removed and destroyed immediately after the session. Furthermore, the administration forms are perforated so all student and teacher names can be easily removed. The forms containing the student names are never removed from the school, even after the assessments are completed, and names are never linked to student or teacher data.

NCS handles all receipt control, data preparation and processing, scanning, and scoring activities for the NAEP assessments. Using an image-processing and scoring system specially designed for NAEP, NCS scans the multiple-choice selections, the handwritten student responses, and other data provided by students, teachers, and administrators. When this image-based scoring system was introduced during the 1994 assessment, it virtually eliminated paper handling during the scoring process. The system also permits scoring reliability to be monitored on line and recalibration methods to be introduced.

Related Questions:
Question 3: Is participation in NAEP voluntary? Are the data confidential? Are students' names or other identifiers available?
Question 20: How does NAEP use matrix sampling? What is focused BIB spiraling, and what are its advantages for NAEP?


BIBLIOGRAPHY

Allen, N. L., Carlson, J. E., & Zelenak, C. A. (1999). The NAEP 1996 technical report. Washington, DC: National Center for Education Statistics.

Allen, N. L., Jenkins, F., & Zelenak, C. A. (1997). Technical report of the NAEP 1996 state assessment program in mathematics. Washington, DC: National Center for Education Statistics.

Allen, N. L., Mazzeo, J., Ip, E. H. S., Swinton, S., Isham, S. P., & Worthington, L. (1995). Data analysis and scaling for the 1994 Trial State Assessment in reading. In J. Mazzeo, N. L. Allen, & D. L. Kline, Technical report of the NAEP 1994 trial state assessment in reading (pp. 169–219). Washington, DC: National Center for Education Statistics.

Allen, N. L., Swinton, S. S., Isham, S. P., & Zelenak, C. A. (1998). Technical report of the NAEP 1996 state assessment program in science. Washington, DC: National Center for Education Statistics.

Bloxom, B. P. J., Nicewander, W. A., & Yan, D. (1995). Linking to a large-scale assessment: An empirical evaluation. Journal of Educational and Behavioral Statistics, 20, 1–26.

Calderone, J., King, L. M., & Horkay, N. (eds). (1997). The NAEP Guide. Washington, DC: National Center for Education Statistics.

Campbell, J. R., Reese, C. M., O'Sullivan, C., & Dossey, J. A. (1996). NAEP 1994 trends in academic progress. Washington, DC: National Center for Education Statistics.

Campbell, J. R., Voelkl, K. E., & Donahue, P. L. (1997). NAEP 1996 trends in academic progress. Washington, DC: National Center for Education Statistics.

Campbell, J. R., Voelkl, K. E., & Donahue, P. L. (1998). NAEP 1996 trends in academic progress: Addendum. Washington, DC: National Center for Education Statistics.

College Board. (1999). Mathematics framework for the 1996 and 2000 National Assessment of Educational Progress. Washington, DC: National Assessment Governing Board.

Council of Chief State School Officers. (1999a). Science framework for the 1996 and 2000 National Assessment of Educational Progress. Washington, DC: National Assessment Governing Board.

Council of Chief State School Officers. (1999b). Reading framework for the National Assessment of Educational Progress: 1992–2000. Washington, DC: National Assessment Governing Board.

Donahue, P. L., Voelkl, K. E., Campbell, J. R., & Mazzeo, J. (1999). NAEP 1998 Reading Report Card for the Nation and the States. Washington, DC: National Center for Education Statistics.


Green, J. L., Burke, J., & Rust, K. L. (1995). Sample design and selection. In J. Mazzeo, N. Allen, & D. Kline, Technical report of the NAEP 1994 trial state assessment in reading (pp. 35–69). Washington, DC: National Center for Education Statistics.

Johnson, E. G., & Owen, E. (1998). Linking the National Assessment of Educational Progress (NAEP) and the Third International Mathematics and Science Study (TIMSS): A technical report. Washington, DC: National Center for Education Statistics.

Johnson, E. G., Siegendorf, A., & Phillips, G. W. (1998). Linking the National Assessment of Educational Progress (NAEP) and the Third International Mathematics and Science Study (TIMSS): Eighth-grade results. Washington, DC: National Center for Education Statistics.

Linn, R. L. (1993). Linking results of distinct assessments. Applied Measurement in Education, 6, 83–102.

Linn, R. L., & Kiplinger, V. L. (1993). Linking statewide tests to the National Assessment of Educational Progress: Stability of results. Boulder, CO: Center for Research on Evaluation, Standards, and Student Testing.

Linn, R. L., Koretz, D., & Baker, E. L. (1996). Assessing the validity of the National Assessment of Educational Progress: NAEP Technical Review Panel white paper. Washington, DC: U.S. Department of Education.

McLaughlin, D. H. (1998a). Study of the linkages of 1996 NAEP and state mathematics assessments in four states. Washington, DC: National Center for Education Statistics.

McLaughlin, D. H. (1998b). Linking state assessments of NAEP: A study of the 1996 mathematics assessment. Paper presented to the American Educational Research Association, San Diego, CA.

National Assessment of Educational Progress. (1982). Writing objectives: 1984 assessment. Washington, DC: U.S. Department of Education.

National Assessment of Educational Progress. (1984). Reading objectives: 1983–84 assessment. Washington, DC: U.S. Department of Education.

National Assessment of Educational Progress. (1985a). Math objectives: 1985–86 assessment. Washington, DC: U.S. Department of Education.

National Assessment of Educational Progress. (1985b). Science objectives: 1985–86 assessment. Washington, DC: U.S. Department of Education.

National Assessment of Educational Progress. (1987). Writing objectives: 1988 assessment. Washington, DC: U.S. Department of Education.


National Assessment of Educational Progress. (1989). Science objectives: 1990 assessment. Washington, DC: U.S. Department of Education.

Pashley, P. J., & Phillips, G. W. (1993). Toward world-class standards: A research study linking international and national assessments. Princeton, NJ: Educational Testing Service.

Pellegrino, J. W., Jones, L. R., & Mitchell, K. J. (Eds.). (1999). Grading the nation's report card: Evaluating NAEP and transforming the National Assessment of Educational Progress. Washington, DC: National Academy Press.

Shepard, L. A. (1997). Measuring achievement: What does it mean to test for robust understanding? William H. Angoff Memorial Lecture Series. Princeton, NJ: Educational Testing Service.

Wallace, L., & Rust, K. F. (1996). Sample design. In N. Allen, D. Kline, & C. Zelenak, NAEP 1994 technical report (Chapter 3). Washington, DC: National Center for Education Statistics.

Williams, V. S. L., Billeaud, K., Davis, L. A., Thissen, D., & Sanford, E. (1995). Projecting to the NAEP scale: Results from the North Carolina End-of-Grade Testing Program. Journal of Educational Measurement.


F U R T H E R   R E A D I N G

Bock, D. B., & Zimowski, M. F. (1998). Feasibility studies of two-stage testing in large-scale educational assessment: Implications for NAEP. Commissioned by the NAEP Validity Studies Panel. Palo Alto, CA: American Institutes for Research.

Bourque, M. L., Champagne, A. B., & Crissman, S. (1997). 1996 science performance standards: Achievement results for the nation and the states. Washington, DC: National Assessment Governing Board.

Campbell, J. R., & Donahue, P. L. (1997). Students selecting stories: The effects of choice in reading assessment. Washington, DC: National Center for Education Statistics.

Chromy, J. R. (1998). The effects of finite sampling on state assessment sample requirements. Commissioned by the NAEP Validity Studies Panel. Palo Alto, CA: American Institutes for Research.

Dossey, J. A., Mullis, I., & Jones, C. O. (1993). Can students do mathematical problem solving? Washington, DC: U.S. Department of Education.

Hawkins, E. F., Stancavage, F., & Dossey, J. A. (1998). School policies affecting instruction in mathematics. Washington, DC: National Center for Education Statistics.

Hedges, L. V., & Vevea, J. L. (1997). A study of equating in NAEP. Commissioned by the NAEP Validity Studies Panel. Palo Alto, CA: American Institutes for Research.

Jaeger, R. M. (1998). Reporting the results of the National Assessment of Educational Progress. Commissioned by the NAEP Validity Studies Panel. Palo Alto, CA: American Institutes for Research.

Jakwerth, P. R., Stancavage, F. B., & Reed, E. D. (1999). An investigation of why students do not respond to questions. Commissioned by the NAEP Validity Studies Panel. Palo Alto, CA: American Institutes for Research.

Lindquist, M. M., Dossey, J. A., & Mullis, I. (1995). Reaching standards: A progress report on mathematics. A policy information perspective. Princeton, NJ: Educational Testing Service.

Mislevy, R. J. (1992). Linking educational assessments. Princeton, NJ: Educational Testing Service.

Mitchell, J. H., Hawkins, E. F., Jakwerth, P., Stancavage, F. B., & Dossey, J. A. (1999). Student work and teacher practices in mathematics. Washington, DC: U.S. Department of Education.



Mullis, I. V. S. (1997). Optimizing state NAEP: Issues and possible improvements. Commissioned by the NAEP Validity Studies Panel. Palo Alto, CA: American Institutes for Research.

Mullis, I., Jenkins, F. L., & Johnson, G. (1994). Effective schools in mathematics. Washington, DC: National Center for Education Statistics.

O'Sullivan, C. Y., Weiss, A. R., & Askew, J. M. (1998). Students learning science: A report on policies and practices in U.S. schools. Washington, DC: National Center for Education Statistics.

Pearson, D. P., & Garavaglia, D. R. (1997). Improving the information value of performance items in large-scale assessments. Commissioned by the NAEP Validity Studies Panel. Palo Alto, CA: American Institutes for Research.

G L O S S A R Y

almanac. A comprehensive collection of tables of NAEP results.

assessment session. The period of time during which a test booklet is administered to students.

background questionnaires. The instruments used to collect information about student demographics and educational experiences.

bias. In statistics, the difference between the expected value of an estimator and the population parameter being estimated. If the average value of the estimator across all possible samples (the estimator's expected value) equals the parameter being estimated, the estimator is said to be unbiased; otherwise, the estimator is biased.
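In generic notation (an illustrative sketch added here, not drawn from the NAEP technical documentation), for an estimator written as theta-hat of a parameter theta:

```latex
\mathrm{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta ,
\qquad
\hat{\theta}\ \text{is unbiased} \iff E[\hat{\theta}] = \theta .
```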

BIB (Balanced Incomplete Block) spiraling. A complex variant of multiple matrix sampling in which items are administered so that each pair of items is dispensed to a nationally representative sample of respondents.

block. A group of assessment items created by dividing the item pool for an age or grade into subsets. Blocks are used in the implementation of the BIB spiral sample design.

booklet. The assessment instrument created by combining blocks of assessment items.

calibrate. To estimate the parameters of a set of items using responses of a sample of examinees.

clustering. The process of forming sampling units as groups of other units.

codebook. A formatted printout of NAEP data for a particular sample of respondents.

coefficient of variation. The ratio of the standard deviation of an estimate to the value of the estimate.
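Expressed as a formula (an illustrative sketch in generic notation, not taken from the NAEP technical reports):

```latex
\mathrm{CV}(\hat{\theta}) = \frac{\mathrm{SD}(\hat{\theta})}{\hat{\theta}},
\qquad \text{often reported as a percentage, } 100 \times \mathrm{CV}(\hat{\theta}).
```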

common block. A group of background items included at the beginning of every assessment booklet.

conditional probability. Probability of an event happening, given the occurrence of another event.

conditioning variables. Demographic and other background variables characterizing a respondent. These variables are used to construct plausible values.

constructed-response item. A non-multiple-choice item that requires some type of written or oral response.

degrees of freedom [of a variance estimator]. The number of independent pieces of information used to generate a variance estimate.

derived variables. Subgroup data that were obtained through interpretation, classification, or calculation procedures rather than from assessment responses.

design effects. The ratio of the variance for the sample design to the variance for a simple random sample of the same size.
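As a formula (illustrative notation only), for an estimate based on a sample of size n:

```latex
\mathrm{DEFF}(\hat{\theta}) =
\frac{\mathrm{Var}_{\text{complex design}}(\hat{\theta})}
     {\mathrm{Var}_{\text{simple random sample of size } n}(\hat{\theta})} .
```

In clustered designs such as NAEP's, the design effect is typically greater than 1, meaning the complex sample yields less precise estimates than a simple random sample of the same size would.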

distractor. An incorrect response choice included in a multiple-choice item.

excluded student questionnaire. An instrument completed for every student who was selected to participate but ultimately excluded from the assessment.

excluded students. Sampled students determined by the school to be unable to participate because they have limited English language proficiency or a disability.



expected value. The average of the sample estimates given by an estimator across all possible samples. If the estimator is unbiased, then its expected value will equal the population value being estimated.

field test. A pretest of items to obtain information regarding clarity, difficulty levels, timing, feasibility, and special administrative situations. The field test is performed before revising and selecting the items to be used in the assessment.

focused BIB spiraling. A variation of BIB spiraling in which items are administered so that each pair of items within a subject area is dispensed to a nationally representative sample of respondents.

foils. The correct and incorrect response choices included in a multiple-choice item.

group effect. The difference between the mean for a specific group and the mean for the nation.

imputation. Prediction of a missing value based on some procedure, using a mathematical model in combination with available information. See plausible values.

imputed race/ethnicity. The race or ethnicity of an assessed student as derived from his or her responses to particular common background items. A NAEP reporting subgroup.

item response theory (IRT). Test analysis procedures that assume a mathematical model for the probability that a given examinee will respond correctly to a given exercise.
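One widely used model of this kind is the three-parameter logistic model, sketched below for illustration; the notation is generic and is not a statement of NAEP's exact scaling models. Here theta_i denotes the proficiency of examinee i, and a_j, b_j, and c_j denote the discrimination, difficulty, and lower-asymptote (guessing) parameters of item j:

```latex
P\bigl(x_{ij} = 1 \mid \theta_i\bigr)
  = c_j + \frac{1 - c_j}{1 + \exp\bigl[-a_j(\theta_i - b_j)\bigr]} .
```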

jackknife. A procedure that estimates standard errors of percentages and other statistics. It is particularly suited to complex sample designs.
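The idea can be illustrated with a simplified delete-one sketch. The function jackknife_se and the score values below are invented for illustration; NAEP's operational procedure applies the same leave-one-out logic to paired first-stage sampling units through replicate weights rather than to individual observations.

```python
import numpy as np

def jackknife_se(values, estimator=np.mean):
    """Delete-one jackknife standard error of a statistic (illustrative only)."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    # Recompute the statistic with each observation removed in turn.
    replicates = np.array([estimator(np.delete(values, i)) for i in range(n)])
    full_sample_estimate = estimator(values)
    # Jackknife variance: (n - 1)/n times the summed squared deviations of the
    # replicate estimates from the full-sample estimate.
    variance = (n - 1) / n * np.sum((replicates - full_sample_estimate) ** 2)
    return np.sqrt(variance)

# Example: standard error of a mean for a small set of hypothetical scores.
scores = [281, 292, 268, 305, 277, 288, 299, 272]
print(round(jackknife_se(scores), 2))
```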

machine-readable catalog. Computer-processing control information, IRT parameters, foil codes, and labels in a computer-readable format.

metropolitan statistical area (MSA). An area defined by the federal government for the purposes of presenting general-purpose statistics for metropolitan areas. Typically, an MSA contains a city with a population of at least 50,000 and includes its adjacent areas.

multiple matrix sampling. Sampling plan in which different samples of respondents take different samples of items.

multistage sample design. Indicates more than one stage of sampling. The following is an example of three-stage sampling: (1) sample of counties (primary sampling units or PSUs), (2) sample of schools within each sample county, and (3) sample of students within each sample school.

NAEP scales. The scales common across age or grade levels and assessment years used to report NAEP results.

nonresponse. The failure to obtain responses or measurements for all sample elements.

nonsampling error. A general term applying to all sources of error, with the exception of sampling error. Includes errors from defects in the sampling frame, response or measurement errors, and mistakes in processing the data.

objective. A desirable education goal accepted by scholars in the field, educators, and concerned laypersons and established through a consensus approach.

observed race/ethnicity. Race or ethnicity of an assessed student as perceived by the exercise administrator.


oversampling. Deliberately sampling a portion of the population at a higher rate than the remainder of the population.

parental education. The level of education of the mother and father of an assessed student as derived from the student's response to two assessment items. It is a NAEP reporting subgroup.

percent correct. The percentage of a target population that would answer a particular exercise correctly.

plausible values. Proficiency values drawn at random from a conditional distribution of a NAEP respondent, given his or her response to cognitive exercises and a specified subset of background variables (conditioning variables). The selection of a plausible value is a form of imputation.

poststratification. Classification and weighting to correspond to external values of selected sampling units by a set of strata definitions after the sample has been selected.

primary sampling unit (PSU). The basic geographic sampling unit for NAEP. Can be either a single county or a set of contiguous counties.

probability sample. A sample in which every element of the population has a known, nonzero probability of being selected.

pseudoreplicate. The value of a statistic based on an altered sample. Used by the jackknife variance estimator.

random variable. A variable that takes on any value of a specified set with a particular probability.

region. One of four geographic areas used in gathering and reporting data: Northeast, Southeast, Central, and West (as defined by the Office of Business Economics, U.S. Department of Commerce). A NAEP reporting subgroup.

reporting subgroup. Groups within the national population for which NAEP data are reported; for example, gender, race/ethnicity, grade, age, level of parental education, region, and type of location.

respondent. A person who is eligible for NAEP, is in the sample, and responds by completing one or more items in an assessment booklet.

response options. In a multiple-choice question, alternatives that can be selected by a respondent.

sample. A portion of a population, or a subset from a set of units, that is selected by some probability mechanism for the purpose of investigating the properties of the population. NAEP does not assess an entire population but rather selects a representative sample from the group to answer assessment items.

sampling error. The error in survey estimates that occurs because only a sample of the population is observed. Measured by sampling standard error.

sampling frame. The list of sampling units from which the sample is selected.

sampling weight. A multiplicative factor equal to the reciprocal of the probability of a respondent being selected for assessment, with adjustment for nonresponse and, perhaps, poststratification. The sum of the weights provides an estimate of the number of persons in the population represented by the respondents in the sample.
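In generic notation (an illustrative sketch, not the operational NAEP weighting formulas), if pi_i is the selection probability of respondent i and f_i is a nonresponse (and possibly poststratification) adjustment factor:

```latex
w_i = \frac{f_i}{\pi_i},
\qquad
\sum_{i \in \text{sample}} w_i \approx N
\quad \text{(the size of the population represented).}
```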


school questionnaire. A questionnaire completed for each school by the principal or other official. It is used to gather information concerning school administration, staffing patterns, curriculum, and student services.

secondary-use data files. Computer files containing respondent-level cognitive, demographic, and background data. They are available for use by researchers wishing to perform analyses of NAEP data.

selection probability. The chance that a particular sampling unit has of being selected in the sample.

session. A group of students reporting for the administration of an assessment. Most schools conduct only one session, but some large schools conduct as many as 10 or more.

simple random sample. The process for selecting n sampling units from a population of N sampling units so that each sampling unit has an equal chance of being in the sample and every combination of n sampling units has the same chance of being in the sample chosen.

standard error. A measure of sampling variability and measurement error for a statistic. Because of NAEP's complex sample design, sampling standard errors are estimated by jackknifing the samples from first-stage sample estimates. Standard errors may also include a component due to the error of measurement of individual scores estimated using plausible values.

stratification. The division of a population into parts, or strata.

stratified sample. A sample selected from a population that has been stratified, with a sample selected independently in each stratum. The strata are defined for the purpose of reducing sampling error.

student ID number. A unique identification number assigned to each respondent to preserve his or her anonymity. NAEP does not record the names of any respondents.

subject area. One of the areas assessed by NAEP; for example, art, civics, computer competence, geography, literature, mathematics, music, reading, science, U.S. history, or writing.

systematic sample (systematic random sample). A sample selected by a systematic method; for example, units selected from a list at equally spaced intervals.

teacher questionnaire. A questionnaire completed by selected teachers of sample students. It is used to gather information concerning years of teaching experience, frequency of assignments, use of teaching materials, and availability and use of computers.

Trial State Assessment Program. A NAEP program authorized by Congress in 1988 and established to provide for a program of voluntary state-by-state assessments on a trial basis.

trimming. A process by which extreme weights are reduced (trimmed) to diminish the effect of extreme values on estimates and estimated variances.

type of location (TOL). One of the NAEP reporting subgroups, dividing the communities in the nation into groups based on the proportion of the students living in each of three sizes and types of communities.

variance. The average of the squared deviations of a random variable from the expected value of the variable. The variance of an estimate is the squared standard error of the estimate.
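In generic notation (an illustrative sketch), for a random variable X and an estimate theta-hat:

```latex
\mathrm{Var}(X) = E\bigl[(X - E[X])^2\bigr],
\qquad
\mathrm{Var}(\hat{\theta}) = \bigl[\mathrm{SE}(\hat{\theta})\bigr]^2 .
```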


S U B J E C T I N D E X

A
Academic researchers, and confidentiality issues, 13–14
Achievement levels, 27–29
Advanced Studies Seminar on Use of NAEP Database for Research and Policy Discussion, 48
American College Testing (ACT), 1
American Institutes for Research (AIR), 1
Arts assessment (1997), 25–26, 40
   framework for, 8–9
Assessment clearance package, 21
Assessment development process, 20–22
Assessment innovations, 25–26
   impact of, 26
Assessment schedules, 54–56

B
Background questionnaires, 5, 13, 27, 34, 42–44
   development of, 20–22
   parent access to, 15–17
   results provided by, 27
Balanced incomplete block (BIB) spiraling, 32, 56–59
Bilingual booklets, 23
Blocks, 21–22, 42, 57–58

C
Calibration scoring, 33, 61
CCSSO. See Council of Chief State School Officers
Center for Research on Evaluation, Standards, and Student Testing (CRESST), 17
Civics assessment (1998), 30
Commissioner of Education Statistics, 1, 17
Comparability of state and national results, 38–39
Computer scanning of response booklets, 30, 32, 61
Confidentiality of assessment data, 13–14, 61
Congressional mandate for assessment evaluation, 17–19
Constructed-response questions, 4
   and assessment scales, 28
   number of, 7
   in 1998 reading assessment, 26
   scoring of, 30–31
   validity and evaluation of, 18
Contextual background variables, 27–28, 42–44. See also Background questionnaires
Council of Chief State School Officers (CCSSO), 1
CRESST. See Center for Research on Evaluation, Standards, and Student Testing
Cross-State Data Compendia, 41

D
Data
   collection of, 54, 60–61
   confidentiality of, 13–14, 61
   international linking of, 49–50
   secondary analysis of, 47–48
Data collection procedures, 54, 60–61
Data products, 47
Demonstration booklet, 15–16
Department of Education
   Freedom of Information Act officer, 16
   Information Management Compliance Division (IMCD), 21–22
Destruction Notice, 14
Differential item functioning (DIF), 22, 35
Disabilities, students with
   accommodation for, 23–24
   background questionnaires for, 42, 44
Documentation of scoring process, 33

E
Educational research, assistance for, 47–48
Educational Testing Service (ETS), 1, 8, 20
   scoring and processing by, 30–33
   software for secondary data analysis, 48
   technical assistance provided by, 47
   test materials produced by, 61
Educators, resources for, 45–46
Electronic access to reports, 40, 47
Estimation, 34–36
ETS. See Educational Testing Service
Evaluation of assessments, 17–19
Excluded students, 23–24
Expert panels, 17–19

F
Field-test clearance package, 21–22
Field tests, 22
Focused Reports, 41
Frameworks
   development of, 3, 6–8
   educational uses of, 45–46
   for 1999 long-term trend assessment, 8
   for 2000 mathematics assessment, 8–9
   for 2000 reading assessment, 11–12
   for 2000 science assessment, 9–10
Freedom of Information Act officer (Department of Education), 16

G
Grade levels included in 1999–2000 assessments, 6–7, 54
Group-level statistics, 34

I
IAEP. See International Assessment of Educational Progress
Image-based scoring system, 30, 32, 61
Improving America's Schools Act of 1994, Title IV, 13
Inclusion, increased, 23, 44
Individualized Education Plans (IEPs), 23, 44
Information Management Compliance Division (IMCD), Department of Education, 21–22
Instructional experiences, assessment of, 27
Instructional reports, 40
Instrument Development Committees, 20–22
Instruments
   development of, 20–22
   flexibility of, 4
International assessment data, link to, 49–50
International Assessment of Educational Progress (IAEP), 49
Interpreting, 8
Interrater reliability, 32
Item analysis, 34–35
Item response theory (IRT), 28, 34–36

L
Limited English Proficiency (LEP)
   accommodation for, 34
   background questionnaires for, 42, 44, 61
Linking, 6, 17, 36, 39, 49–50
Long-term trend assessments, 4, 7–8
   data collection procedures, 60
   framework for, 8
   frequency and timing of, 54–56
   student selection for, 51–52

M
Main assessments, 4. See also National assessments
Maryland Learning Outcomes, 26
Mathematics assessments, 25, 35, 38, 40, 54
   framework for, 8–9
Matrix sampling, 57–59
Multiple-choice questions
   and assessment scales, 28, 30
   scoring of, 28, 30
   vs. constructed-response questions, 7–8

N
NAEP. See National Assessment of Educational Progress
NAEPEX, 48
NAEP NETWORK, 8, 21–22, 48
National Academy of Education (NAE), Panel on the Evaluation of NAEP Trial State Assessment Project, 18
National Academy of Sciences (NAS), 19
National Assessment Governing Board (NAGB), 1
   achievement levels, 28–29
   framework development by, 4, 6–8
   instrument development by, 20–22
   scoring guide review by, 31
   subject selection by, 6
National Assessment of Educational Progress (NAEP)
   1996, 40
   1997, 6–8, 25–26
   1998, 6–7, 9–12, 26, 38–41
   comparability of results, 38–39
   frequency and timing of, 3, 54
   goals of, 3
   legislation authorizing, 13, 18
   long-term trend. See Long-term trend assessments
   national. See National assessments
   overview of, 1, 3–6
   state. See State assessments
National assessments, 3–4
   1998, sample characteristics of, 54
   1999, subject areas and grades included in, 6–8
   2000, subject areas and grades included in, 6–12
   analysis of, 34
   background questionnaires for, 42–44
   comparability to state assessments, 38–39
   development of, 20–22
   frequency and timing of, 54
   number of schools and students included in, 54–56
   results of, 27
   scaling of, 34–36
   student selection for, 51–52
   validity and evaluation of, 17–19
National Center for Education Statistics (NCES), 1–2
   data collection operations, 60–61
   expert panels established by, 17
   framework development by, 6–8
   increased inclusion policies, 23
   instrument development by, 20–22
   reports published by, 13
   scoring guide review by, 30–31
   technical assistance provided by, 47–48
   test booklets released for review by, 15
   Web site, 40, 46, 47
National Computer Systems (NCS), 1
   scoring and processing by, 30–32
   test materials produced by, 61
National Education Statistics Act of 1994, 13
National Education Summit, 49
Nonbias policy, 22
Noncognitive blocks, 42
North Carolina End-of-Grade Tests in Mathematics, 26
NVS Panel. See Validation Studies Panel

O
Office of Management and Budget (OMB), 21–22
Ohio state assessments, 26
Oversampling, 53

P
Parent access to test booklets, 15–16
Parental consent, 13
Participation in NAEP, 3–5, 13, 23, 54
Plausible values methodology, 36–37
Primary sampling units (PSUs), 52
Public Law 100–297, 18
Public Law 103–382, 18
Public release of questions, 15–16

Q
Quality control measures
   for data collection operations, 60–61
   to monitor against bias, 22
   for scaling and analysis of data, 34–36
   for scoring process, 28–29
Questionnaires, background. See Background questionnaires
Questions
   constructed-response. See Constructed-response questions
   multiple-choice. See Multiple-choice questions
   parent access to, 15–16
   released, 15–16, 45–46
   sample, 16
   security of, 15–16

R
RAND, 17
Reading
   for information, 9–11, 28
   for literary experience, 9–11, 28
   to perform tasks, 9–11, 28
Reading assessment
   1994, scales developed for, 28–29
   1998, 9–11, 26, 38–39, 54
   framework for, 9–10
   sample questions from, 9–11
Reading composite scale, 28
Reading literacy, aspects of, 9–11
Reading situations, types of, 9–10, 28
Reading stances, 9–10
Report Cards (NAEP), 40
Reports, 40–41, 46
   educational uses of, 45–46
   focused, 41
   instructional, 40
   state, 40
   technical, 41
   trend, 41
   update, 40
Results, 27–29
   analysis of, 34–37
   processing and scoring of, 30–33

S
Samples, student, 51–53
Sample selection, 54–55
Sampling, 54, 57–59
SAS. See State aggregate sample
Scales, linking of, 36
Scale-score data files, generation of, 35–36
Scaling, 34–36, 37
   of national and state assessments, 38–39
School environment, assessment of, 27
School questionnaires, 42–44. See also Background questionnaires
Schools
   number of, 54–56
   selection of, 52
Science assessments, 25, 38, 54
   framework for, 9–10
Scorers
   consistency/monitoring of, 30, 32–33
   recruiting and training of, 30–32
Scoring
   of constructed-response questions, 45
   documentation of, 33, 61
   reliability of, 30–33, 61
   of state assessments, 31
   for trend questions, 33
Scoring guides, 30–31
Scoring system, image-based, 30, 32, 61
SCS. See State comparison sample
SD. See Students, with disabilities
Security, importance of, 15–16
Seminars, 48
Sensitivity review, mandatory, 22
Software products, 41, 47
Special studies/analyses, 48
Spiraling, focused BIB, 57–59
State aggregate sample (SAS), 38–39
State assessments, 3–5
   1998, sample characteristics of, 54
   2000, subject areas and grades included in, 6–8, 54–55
   analysis of, 34
   background questionnaires for, 42
   comparability among, 38–39
   comparability with national assessments, 38–39
   development of, 20–22
   frequency and timing of, 54
   mandated, 13
   number of schools and students included in, 54
   questions included in, 8
   results provided by, 27
   scaling of, 38–39
   scoring of, 31
   student selection for, 51–53
   validity and evaluation of, 18–19
State comparison sample (SCS), 38–39
State-level forum, for educational issues and policy discussions, 48
State Reports, 40–41
Stratification, 52–53
Student achievement
   estimation of, 34–36
   trends in, measurement of, 32–33, 35–36
Student motivation questions, 43
Student questionnaires, 42–43. See also Background questionnaires
Students
   with disabilities
      accommodation for, 23–24
      background questionnaires for, 42, 44
   excluded, 23–24
   with limited English proficiency
      accommodation for, 23–24
      background questionnaires for, 42, 44
   number of, 54–56
   populations of, assessment of, 27
   sampling, 51–53
   selection of, 51–53
Subject area questions, on student questionnaires, 43
Subject-matter achievement, assessment of, 27–29
Subjects
   included in 2000 assessments, 6–7, 9–12
   selection of, 6
Subject scales, 28
Summary Data Tables Tool, 41, 48

T
Teacher questionnaires, 43. See also Background questionnaires
Technical assistance, 47–48
Technical Reports, 41
Technical Review Panel (TRP), 17
Test booklets
   bilingual, 23
   computer scanning of, 30–31, 61
   numbering of, 60
   parent access to, 15–16
   planned variation of, 57–59
   warehousing of, 33
Test materials, production and management of, 60–61
Testing time, extended, 23
Third International Mathematics and Science Study (TIMSS), 49–50
Training
   of data collection administrators, 60
   of scorers, 30–32
Trend Reports, 41
Trial State Assessment (TSA), 4–5, 18
TRP. See Technical Review Panel

U
University of Colorado at Boulder, 17
Update reports, 40
Using NAEP data, 42–50

V
Validation Studies Panel, 18
Validity, of assessments, 17–19
Voluntary nature, of assessments,

W
Weighting, 34, 53
Westat, 60–61
World Wide Web site (NCES), 40, 45–47
Writing assessment (1998), 30


Subject Areas Assessed by NAEP

The chart below lists, for each year, the subject areas covered by the national NAEP (main assessment and long-term trend assessment) and by the state NAEP.

2000: Main assessment: Mathematics, Science, Reading (Grade 4). State NAEP: Mathematics (Grades 4, 8), Science (Grades 4, 8).

1999: Long-term trend: Mathematics, Science, Reading, Writing.

1998: Main assessment: Civics, Reading, Writing. State NAEP: Reading (Grades 4, 8), Writing (Grade 8).

1997: Main assessment: Arts (Grade 8).

1996: Main assessment: Mathematics, Science. Long-term trend: Mathematics, Science, Reading, Writing. State NAEP: Mathematics (Grades 4, 8), Science (Grade 8)*.

1994: Main assessment: Geography, U.S. History, Reading. Long-term trend: Mathematics, Science, Reading, Writing. State NAEP: Reading[5] (Grade 4).

1992: Main assessment: Mathematics, Reading, Writing. Long-term trend: Mathematics, Science, Reading, Writing. State NAEP: Mathematics[5] (Grades 4, 8), Reading[5] (Grade 4).

1990: Main assessment: Mathematics, Science, Reading. Long-term trend: Mathematics, Science, Reading, Writing. State NAEP: Mathematics[5] (Grade 8).

1988: Main assessment: Civics, Document Literacy,[2] Geography,[2] U.S. History, Reading, Writing. Long-term trend: Mathematics, Science, Reading, Writing, Civics.[3]

1986: Main assessment: Computer Competence, U.S. History,[2] Literature,[2] Mathematics, Science, Reading. Long-term trend: Mathematics, Science, Reading.[4]

1984: Main assessment: Reading, Writing. Long-term trend: Reading, Writing.

1981–82[1]: Main assessment: Mathematics, Science, Citizenship/Social Studies.[3] Long-term trend: Mathematics,[3] Science.[3]

1979–80: Main assessment: Reading/Literature, Art. Long-term trend: Reading.[3]

1978–79: Main assessment: Art, Music, Writing.

1977–78: Main assessment: Consumer Skills,[2] Mathematics. Long-term trend: Mathematics.[3]

1976–77: Main assessment: Basic Life Skills,[2] Science. Long-term trend: Science.[3]

1975–76: Main assessment: Citizenship/Social Studies, Mathematics.[2] Long-term trend: Citizenship/Social Studies.[3]

1974–75: Main assessment: Art, Index of Basic Skills, Reading. Long-term trend: Reading.[3]

1973–74: Main assessment: Career and Occupational Development, Writing.

1972–73: Main assessment: Mathematics, Science. Long-term trend: Mathematics,[3] Science.[3]

1971–72: Main assessment: Music, Social Studies.

1970–71: Main assessment: Literature, Reading. Long-term trend: Reading.[3]

1969–70: Main assessment: Citizenship, Science, Writing. Long-term trend: Science.[3]

* Department of Defense schools were assessed at Grades 4 and 8.

Since 1994, states have assessed nonpublic schools as well as public schools.

[1] Explanation of format for Year column: Before 1984, the main NAEP assessments were administered in the fall of one year through the spring of the next. Beginning with 1984, the main NAEP was administered after the new year in the winter, although the assessments to measure long-term trend continued with their traditional administration in fall, winter, and spring. Because the main assessment is the largest component of NAEP, beginning with 1984 we have listed its administration year rather than the two years over which trend continued to be administered. Note also that the state component is administered at essentially the same time as the main NAEP.

[2] This was a small, special-interest assessment administered to limited national samples at specific grades or ages and was not part of a main assessment. Note that this chart includes only assessments administered to in-school samples; not shown are several special NAEP assessments of adults.

[3] This assessment appears in reports as part of long-term trend. Note that the Civics assessment in 1988 is the third point in trend, with Citizenship/Social Studies in 1981–82 and in 1975–76. There are no points on the trend line for Writing before 1984.

[4] The 1986 long-term trend Reading assessment is not included on the trend line in reports because the results for this assessment were unusual. Further information on this reading anomaly is available in Beaton and Zwick (1990).

[5] State assessments in 1990–94 were referred to as Trial State Assessments (TSAs).
