Overview of Assessment, Ch. 3

Have you ever had the feeling, after taking a test, that the test was not a valid indicator of your knowledge, that you really knew more but the test did not allow you to show it, or that the test seemed unfair because it asked questions about topics not covered in class? This chapter looks at some basic principles of assessment. It also looks at a number of different kinds of tests and the ways in which tests and other assessment devices can be chosen and used so as to provide the most accurate and useful information.

Using what you know

Read each of the following statements. Put a check under “Agree” or “Disagree” to show how you feel about each one. If you can, discuss your responses with classmates.

In assessment, how a student gets an answer is more important than whether the answer is right or wrong.
Agree _____ Disagree _____

Most tests do not yield useful information because they distort the reading/writing process.
Agree _____ Disagree _____

In general, informal tests are better than formal ones.
Agree _____ Disagree _____

One of the best ways to assess a student is to teach him or her in a weak area and see how much and how well she or he learns.
Agree _____ Disagree _____

Time spent assessing low-achieving readers would be better spent instructing them.
Agree _____ Disagree _____

Although assessment and instruction are placed in separate sections in this text, in practice, the two are blended. Assessment should be an integral part of all instruction.

Standards for assessment endorsed by the International Reading Association and National Council of Teachers of English (Joint Task Force on Assessment, 1994) stress that the primary purpose of assessment is to improve teaching and learning.

Assessment must reflect changing academic demands as students move up through the grades and encounter higher-level comprehension and study tasks. Dynamic assessment fits in with the concept of response to intervention discussed in Chapter 1.

Under No Child Left Behind (NCLB), all students, except for the 1 percent who have been excluded because of serious cognitive deficits, must be assessed in terms of grade-level standards.

Zone of proximal development: Difference between what students can do on their own and what they can do under the guidance of an adult or more knowledgeable peer.

Assisted testing: an easy-to-apply form of dynamic assessment in which students are given cues or prompts to see how much help they need in order to respond correctly (Johnson, 1993).

In assisted testing you ask: How much help and what kind of help do I have to provide in order for a student to perform successfully? You start off by giving a little assistance and then increase it until the student can respond correctly.

Finding a student’s knowledge level provides a realistic starting point. Often we assume that problem learners have no knowledge in a particular area, and we waste time reteaching what they already know.

Assessment should emphasize the students’ strengths so that these provide a foundation for instruction.

See Chapter 6 for more information on trial teaching.

The Diagnostic Assessments of Reading (Riverside) is accompanied by online Trial Teaching Strategies, which are a series of brief lessons that can be matched to students’ diagnostic profiles and link instruction to assessment.

Diagnosis does not stop with dynamic or assisted testing or even trial teaching. It is an ongoing process. As Doris Johnson (1993) notes, “It goes on forever.” In assessment, you create a hypothesis and evaluate it through testing, including dynamic testing, observation, trial teaching, and carefully monitoring the students’ performance. In a sense, every lesson that you teach should be a trial or diagnostic one.

Step 1: Establish an estimate of the levels on which the students are operating.

Step 2: Gather and evaluate information about students’ reading/writing strengths and weaknesses.

Step 3: Assess and evaluate students’ teaching-learning situation. Through dynamic testing and trial teaching, determine under what circumstances students learn best. Also assess the home situation.

Step 4: Evaluate materials used in the students’ program.

Step 5: Integrate information and design a long-term program.

Step 6: Continually assess and evaluate the program and make modifications as necessary.

In general, you will be asking the following questions:

• On what levels are students functioning?

• What is the students’ potential for growth?

• What are the students’ strengths and weaknesses in reading and writing?

• What are the students’ most immediate or most essential needs in reading and writing?

• Under what circumstances and in what setting would these students learn best?

• What would be the most effective materials for these students?

• Are there any physical, psychological, social, or other factors that need to be considered?

• How might the home, larger community, and school work together to help students?

Every child can learn, given the right kind of instruction, materials, tasks, and situation. The purpose of assessment is to determine the optimal learning circumstances for a particular student.

Group tests of reading and writing generally fall into one of two categories: norm-referenced or criterion-referenced. In norm-referenced tests, which are often referred to as standardized tests, students are compared with a norm group, which is a sample of others who are in the same grade or are the same age. The score indicates whether students’ performance is average, above average, or below average compared to the norm group. Scores are commonly reported in one or more of the following ways:

Norm-referenced test: the performance of students is compared to that of a norming or sample group.

Percentile rank: the most-used score for norm-referenced tests of reading and writing.

Percentile rank is a measure of comparative standing. If students progress at the same rate, their percentile ranks stay the same. To move up to a higher percentile, they must make a better-than-average gain.

Raw score is the total number correct. It has no meaning until transformed into a percentile rank, grade equivalent, or other score.

Percentile rank indicates where a student’s score falls on a ranking of percentages from 1 to 99.
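As a rough illustration of how a raw score becomes a percentile rank, the sketch below compares one student's raw score with the raw scores of a norm group. The function name, the sample data, and the convention of counting half of the tied scores are assumptions made for this example; test publishers use their own norming tables and procedures.

```python
def percentile_rank(raw_score, norm_group_scores):
    """Return a percentile rank from 1 to 99 for raw_score.

    Assumed convention: percentage of the norm group scoring lower,
    plus half of those with the same score, clamped to the 1-99 range.
    """
    if not norm_group_scores:
        raise ValueError("norm group must contain at least one score")
    below = sum(1 for s in norm_group_scores if s < raw_score)
    equal = sum(1 for s in norm_group_scores if s == raw_score)
    rank = 100 * (below + 0.5 * equal) / len(norm_group_scores)
    # Percentile ranks are reported on a 1-99 scale.
    return max(1, min(99, round(rank)))


# Hypothetical norm group of ten raw scores.
norm_group = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
student_rank = percentile_rank(20, norm_group)
```

A student whose raw score is below everyone in the norm group still receives a rank of 1, and one who outscores everyone receives 99, which matches the 1 to 99 range described above.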

Grade-equivalent scores characterize performance as being equivalent to that of other students in a particular grade.

The International Reading Association opposes the use of grade-equivalent scores because they are open to misinterpretation.

Normal-curve equivalents (NCEs) place students on a scale of 1 through 99.

The term stanine is a combination of the words standard and nine and describes a nine-point scale.

Scaled scores are a continuous ranking of scores from the lowest levels of a series of norm-referenced tests through the highest, from kindergarten or the first grade through high school.
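To show how these scales relate to one another, the sketch below converts a percentile rank to an NCE and to a stanine. It assumes scores are normally distributed and uses the commonly cited conventions that an NCE equals 50 plus 21.06 times the z-score and that stanine boundaries fall at percentiles 4, 11, 23, 40, 60, 77, 89, and 96. Neither convention is stated in this chapter, and the function names are hypothetical.

```python
from statistics import NormalDist


def percentile_to_nce(percentile):
    """Convert a percentile rank (1-99) to a normal-curve equivalent.

    Assumes a normal distribution; 21.06 is the scaling constant that
    makes NCEs of 1, 50, and 99 line up with those percentile ranks.
    """
    z = NormalDist().inv_cdf(percentile / 100)
    return 50 + 21.06 * z


def percentile_to_stanine(percentile):
    """Convert a percentile rank (1-99) to a stanine (1-9).

    Assumed cutoffs: each value is the lowest percentile rank that
    falls in the next stanine up.
    """
    cutoffs = [4, 11, 23, 40, 60, 77, 89, 96]
    return 1 + sum(1 for c in cutoffs if percentile >= c)


middle_nce = percentile_to_nce(50)
middle_stanine = percentile_to_stanine(50)
```

Unlike percentile ranks, NCEs are equal-interval units, which is why they are often preferred when scores need to be averaged.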

DRP units are used to report performance on the Degrees of Reading Power Tests. DRP units range from 15 for the easiest materials to 85 for the most difficult reading material. The advantage of DRP units is that the same type of measurement is used to indicate students’ reading levels and also the difficulty level of reading material.

Lexile scores are also used for reporting test scores and the difficulty level of texts.

The Lexile framework is a scale from 200 to 1700, with 200 being very easy reading material (about mid-first-grade level) and 1700 being very difficult reading material of the type found in scientific journals.

Criterion-referenced test: student performance is measured against a standard. A typical standard of performance on a criterion-referenced comprehension test is 75 percent.

One form of criterion-referenced assessment is the benchmark, a description of a key task that students are expected to perform. For instance, in one intervention program for struggling readers, the benchmark is that they be able to read a children’s book entitled A Kiss for Little Bear (Minarik, 1959) (Hiebert, 1994). Benchmarks need not be tied to a specific book but might be stated in more general terms:

• Uses context and phonics cues to decode difficult words.

• Can read fourth-grade material and retell the main events or details in the section.

Reading tests can also be categorized as being survey or diagnostic tools. Survey tests typically provide an overview of general comprehension and word knowledge. Diagnostic tests assess a number of areas or assess key areas in greater depth. The Stanford Diagnostic Reading Test, one of the best known of the group diagnostic tests, assesses comprehension, reading or listening vocabulary, word analysis skills, and, at higher levels, the ability to scan. A list of survey and diagnostic tests is presented in Tables 3.1 and 3.2.

Standardized test: assessment tasks and administration are carefully specified so that anyone taking the test does so under similar conditions. The term standardized test is also used to mean a norm-referenced test.

Tests can also be categorized as being formal or informal. Formal tests may be standardized. They are designed to be given according to a standard set of circumstances. These tests have sets of directions, which are to be followed exactly. They may also have time limits. All norm-referenced tests are standardized. The advantage of formal standardized tests is that typically they have been constructed with care and tried out on hundreds of thousands of students.

Informal tests generally do not have a set of standard directions, so there is a degree of flexibility in how they are given. In fact, the main advantage of informal tests is their flexibility. They may be designed to assess almost any skill or area, and may be tailored for any population. Informal tests are typically constructed by teachers. Their disadvantage is that they may not be constructed with sufficient care, and their reliability and validity may be unknown. One of the most widely used assessment devices in the field of literacy is the informal reading inventory, which is explored in the next chapter.

Summative assessment summarizes students’ progress at the end of a unit or semester and is administered after learning has taken place.

Formative assessment is used to inform instruction. It takes place during learning.

Assessment is summative or formative. As noted earlier, summative assessment summarizes students’ progress at the end of a unit or a semester or at some other time and is administered after learning has taken place. Norm-referenced and high-stakes tests are generally summative. Formative assessment is ongoing and is used to inform instruction.

Assessing for learning: using formative assessments. The purpose of assessing for learning is to obtain information about students and then provide the instruction they need.

Assessing to learn begins with a clear explanation of the standards students are expected to meet. The classroom teacher or tutor might need to break down the state standards into a curriculum map or series of steps that, if followed, will lead to achieving the standard. The standard is expressed in terms that the students can understand.

Formative assessment can be powerful. Black and Wiliam (1998) found that its use increased average student performance by as much as 24 percentile points. Formative assessment was especially helpful for struggling learners: “While formative assessment can help all pupils, it yields particularly good results with low achievers by concentrating on specific problems with their work and giving them a clear understanding of what is wrong and how to put it right.”

Reliability: the consistency of an assessment device. It is the degree to which the device would yield similar results if given again to the same person or group.

Validity: the degree to which an assessment device measures what it is intended to measure; also, the degree to which the results can be used to make an educational decision.

Evaluating Assessment Devices

Reliability

For a test, reliability means that if students retook the test they would get approximately the same score. For an observation guide, it means that if two or three observers rated the same student at the same time, their ratings would be similar.

Validity

Validity means that a device measures what it says it measures, such as vocabulary, comprehension, rate of reading, attitude toward reading, and so forth. It also means that the device will provide information that will be useful in making an instructional decision.

Correlation coefficient: statistical measure that expresses in mathematical terms the degree to which two variables are related.
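One way to make the link between reliability and the correlation coefficient concrete is to correlate scores from two administrations of the same test. The sketch below computes a Pearson correlation by hand. The function name and the scores are hypothetical, and published reliability figures may rest on other methods (for example, split-half or alternate-form estimates).

```python
import math


def pearson_r(first_scores, second_scores):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(first_scores)
    if n != len(second_scores) or n < 2:
        raise ValueError("need two lists of equal length with 2+ scores")
    mean_x = sum(first_scores) / n
    mean_y = sum(second_scores) / n
    # Sum of cross-products and sums of squared deviations.
    cross = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(first_scores, second_scores))
    ss_x = sum((x - mean_x) ** 2 for x in first_scores)
    ss_y = sum((y - mean_y) ** 2 for y in second_scores)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to exist")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical scores for five students on a test and a retest.
first_testing = [32, 41, 27, 45, 38]
second_testing = [30, 43, 29, 44, 36]
test_retest_r = pearson_r(first_testing, second_testing)
```

A coefficient near 1.0 would suggest that students kept roughly the same relative standing across the two testings; a coefficient near 0 would suggest the scores are not consistent.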

Construct validity is the degree to which a test measures a theoretical trait or construct such as critical reading, learning ability, or phonological awareness.

Content or curricular validity is the degree to which the content of a test reflects reading or tasks as they are taught in the schools. Many of the national standardized tests are based on the content standards adopted by major professional organizations, such as the International Reading Association, state standards, and the content of basal readers and other materials used to teach literacy.

In judging the quality of a test, it is also important to know the standard error of measurement (SEM). The SEM is a statistical estimate of the amount that a test score might vary if the test were given again and again. Although tests yield a particular score, that score is only an estimate.

Standard error of measurement (SEM): estimate of the difference between the obtained score and what the score would be if the test were perfect.
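The SEM is usually reported in a test's technical manual, but a rough sketch can show where it comes from. The example below uses the classical test theory formula, SEM = SD × √(1 − reliability), which is not given in this chapter and is included here as an assumption. The function names and the sample values are hypothetical.

```python
import math


def standard_error_of_measurement(standard_deviation, reliability):
    """Estimate SEM from the score SD and a reliability coefficient.

    Assumes the classical test theory formula SD * sqrt(1 - reliability).
    """
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return standard_deviation * math.sqrt(1 - reliability)


def score_band(obtained_score, sem):
    """Return the range of one SEM below to one SEM above the score."""
    return (obtained_score - sem, obtained_score + sem)


# Hypothetical test: standard deviation of 15, reliability of .91.
sem = standard_error_of_measurement(15, 0.91)
low, high = score_band(100, sem)
```

With these hypothetical values the SEM is about 4.5, so an obtained score of 100 is better read as a band from roughly 95.5 to 104.5 than as a single exact point.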

Usefulness & Fairness

Ultimately, assessment measures should provide information that can be used to foster students’ learning. Results from assessments should not be used to convey a sense of failure and discourage effort. Instead, results should be used to promote students’ growth and sense of self-efficacy.

Tests, of course, should also be fair. As the Joint Task Force on Assessment (1994) notes, “Because traditional test makers have all too frequently designed assessment tools reflecting narrow cultural values, students and schools with different backgrounds and concerns often have not been fairly assessed.” Test bias can take many forms. A test can be biased on the basis of geography, gender, socioeconomic status, ethnicity, or race.

Under the No Child Left Behind Act of 2001, English language learners (ELLs) who have been in U.S. schools for at least 10 months are required to be assessed in English reading. ELLs are also tested annually to measure their proficiency in English.

• Language(s) spoken at home

• Educational background

• Reading and writing activities engaged in

• Favorite books in first language

• Proficiency in speaking first language

• Proficiency in speaking English

• Proficiency in reading and writing in first language

• Proficiency in reading and writing in English

When group tests are used, struggling readers are often assessed unfairly. It is a widespread practice to administer the same norm-referenced or criterion-referenced test to an entire class, even though there may be a wide range of reading ability in that class. For instance, a seventh grader reading on a second-grade level would find a typical seventh-grade test to be extremely frustrating.

How can you tell if a test is too easy or too hard? A test is probably too hard if a student fails to obtain a score that is better than he would have gotten if he had merely guessed.
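The guessing rule above can be turned into a quick check for multiple-choice tests. The sketch below estimates the score a student would be expected to get by guessing on every item and flags a raw score that is no better than that. The function names and the example test (40 items, four answer choices each) are hypothetical, and the check assumes every item has the same number of choices.

```python
def expected_chance_score(num_items, choices_per_item):
    """Expected number correct if a student guesses on every item."""
    if num_items <= 0 or choices_per_item <= 1:
        raise ValueError("need at least 1 item and 2 choices per item")
    return num_items / choices_per_item


def is_probably_too_hard(raw_score, num_items, choices_per_item):
    """True if the raw score is no better than the expected guessing score."""
    return raw_score <= expected_chance_score(num_items, choices_per_item)


# Hypothetical 40-item test with four answer choices per item.
chance = expected_chance_score(40, 4)
too_hard_for_student = is_probably_too_hard(9, 40, 4)
```

Here the expected guessing score is 10 items correct, so a raw score of 9 suggests the test was too hard to give useful information about that student.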

The solution is to assess students on the level on which they are functioning (Gunning, 1982). This might mean giving a student an out-of-level test. For a seventh grader reading on a second-grade level, this means giving the student a test that actually has second-grade material on it.

Another possible solution is to select a test that assesses a wide range of reading levels. Administering the right-level test means that you need to know students’ approximate reading levels.

High-stakes tests, such as college entrance exams, have long been part of education. However, the number and uses of high-stakes tests have increased dramatically. High-stakes tests are now used in many school districts to determine whether young children pass or fail.

Parents are understandably concerned about their children’s performance. When discussing assessment results, focus on the child’s strengths. Also avoid comparisons with other children, if possible. Discuss the student’s performance in the light of what he might reasonably be expected to do. If the student has below-average scores, stress signs of progress.

Assessment is an interactive process and should consider the reader, the text, the techniques being used, the reading or writing task involved, and the context in which the reading or writing is performed. Assessment should also be dynamic. Through instruction provided after initial testing or through trial or assisted teaching, it should attempt to discover what the student’s true learning potential is and how the student learns best.
