
INTERPRETING TEST SCORES AND NORMS

Presented To:

Dr. Muhammad Ramzan

Presented By:

Abdul Majid (RNo.02) MPhil Education

Interpreting Test Scores and Norms

• Criterion-referenced or standards-based test

• True zero point (the point at which there is no height at all or no weight at all)

• Equal units provide uniform meaning

Methods of expressing test scores

METHODS OF INTERPRETING TEST SCORES

• Raw scores

• Criterion-referenced and standards-based interpretations

• Norm-referenced interpretation

• Grade norms

• Percentile rank

• Standard scores

• Profiles

• Judging the adequacy of norms

• Using local norms

Raw Scores

Numerical summary of student’s test performance

Criterion Referenced and Standard Based interpretations

o Describe individual’s test performance

o Specify levels of performance

o Percentage score used

o Measures clearly stated learning tasks

o Criterion-referenced interpretation of standardized tests

o Analyzing each student response

o Expectancy tables
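The percentage score mentioned above can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the function names and the 80% mastery cutoff are assumptions chosen for the example, since the slides do not fix a cutoff.

```python
def percent_correct(raw_score: int, total_items: int) -> float:
    """Percentage of items answered correctly."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    return 100.0 * raw_score / total_items


def has_mastered(raw_score: int, total_items: int,
                 cutoff_percent: float = 80.0) -> bool:
    """Compare the percentage score with a cutoff (80% is an assumed value)."""
    return percent_correct(raw_score, total_items) >= cutoff_percent


print(percent_correct(18, 20))   # 90.0
print(has_mastered(15, 20))      # False, because 75% is below the assumed cutoff
```

Because the comparison is against a fixed criterion, every student in a class could reach mastery at the same time.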

Standardized Tests – What’s the Difference?

Criterion-Referenced Test

Criterion-referenced tests, also called mastery tests, compare a person's performance to a set of objectives. Anyone who meets the criterion can get a high score.

Everyone knows what the benchmarks / objectives are and can attain mastery to meet them.

It is possible for ALL the test takers to achieve 100% mastery.

“Adjusting” the Raw Score

We have already noted that the most immediate result from an assessment or test is the raw score. Sometimes, before we discuss the meaning of the score (i.e., interpret it from either a norm or criterion perspective), the raw score is adjusted. Usually this is done by researchers, not classroom teachers. Two special considerations:

◦ Correction for Guessing

Only for selected-response items

Use has faded

◦ Factoring in Item Difficulty

Students get a higher “Theta” score by doing well on the more difficult items of a test.

In fact, you may be looking at a test score report given in percentiles or standard scores and not realize you are looking at a transformation of a Theta score rather than the traditional raw score.
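The slides do not give a formula for the correction for guessing. A commonly cited version subtracts a fraction of the wrong answers from the number right: corrected score = R − W / (k − 1), where k is the number of options per item. The sketch below assumes that formula; the function name and the example numbers are illustrative only.

```python
def corrected_for_guessing(right: int, wrong: int, options_per_item: int) -> float:
    """Classic correction: R - W / (k - 1). Omitted items are not counted as wrong."""
    if options_per_item < 2:
        raise ValueError("selected-response items need at least two options")
    return right - wrong / (options_per_item - 1)


# Hypothetical student: 30 right, 8 wrong, some items omitted, 5 options per item.
print(corrected_for_guessing(30, 8, 5))   # 28.0
```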


Interpreting Student Performance: Norms or Criteria

Intelligent interpretation of student performance is crucial for the use of educational assessment information. We have been building toward this with previous discussions of

◦ Building / choosing good tests

◦ Determining reliability

◦ Determining validity

So now we are set to explore some methods of interpretation.

These methods fall into two basic categories or approaches:

◦ Norm-referenced

Compare this student with others.

◦ Criterion-referenced

Compare this student with some judgment regarding expected performance level, irrespective of others.


Standardized Tests – What’s the Difference?

Norm-Referenced Test

Norm-referenced tests compare an individual's performance with the performance of others. They are designed to yield a normal curve, with 50% of test takers scoring above the 50th percentile and 50% scoring below it. If passing means scoring above that midpoint, half the test takers MUST pass and half MUST fail, however well the group as a whole performs.

The test makers choose questions that separate stronger from weaker test takers, so few people answer every question correctly.

If too many people get a question correct, or too many score well, questions are revised or “thrown out” until the scores spread along a normal curve again.

Norm-Referenced Interpretation

How an individual compares with other persons

Rank the scores from highest to lowest

Derived scores

o Obtain a general framework for norm-referenced interpretation

o Raw scores converted to derived scores

o Numerical report of test performance

Interpreting Test Scores (some definitions)

Raw score. This is the number of items the student answered correctly. It is used to calculate the other, more useful scores.

Stanine. One of nine equal sections of the normal curve. Stanines can be easily averaged and compared from test to test, but are less precise than other scores.

Normal curve equivalent (NCE). For these scores, the normal curve is divided into equal units ranging from 1 to 99, with an average of 50. These can be averaged and compared from test to test or year to year.

Norms and Their Types

o Based on students' performance

a) Grade equivalents or norms

b) Percentile ranks

c) Standard scores

• The first two indicate an individual's relative standing within a group

• Standard scores include T-scores, normal curve equivalents and standard age scores

• These types of standard scores differ only in their numerical values

Six Misinterpretations (Assumptions about Grade Equivalents)

1) Assuming norms are standards

2) Assuming grade equivalents indicate the appropriate grade placement

3) Assuming all students are expected to grow one grade equivalent each year

4) Assuming units are equal

5) Assuming grade equivalents are comparable across tests

6) Assuming extreme scores, which are based on extrapolation, are dependable

Six Steps for Avoiding Misinterpretations

1) Don't confuse norms with standards of what should be.

2) Don't interpret a grade equivalent as an estimate of the grade where a student should be placed.

3) Don't expect that all students gain 1.0 grade equivalent each year.

4) Don't assume that the units are equal at different parts of the scale.

5) Don't assume that scores on different tests are comparable.

6) Don't interpret extreme scores as dependable estimates of a student's performance level.

Percentile Rank: at or below . . .

Percentiles and Percentile Rank

◦ Definition: % of cases “at or below”

◦ As we noted earlier, these two terms differ conceptually; in practice, however, they are often used interchangeably.

◦ Strengths:

Easy to describe

Easy to compute

◦ Weaknesses

Confusion with a “percentage-right score”

Inequality of units [see next slide]
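The inequality of percentile units can be made concrete under one assumption: that scores follow a normal curve. The sketch below converts two ten-point percentile gaps into standard deviation units with Python's standard-library `statistics.NormalDist`; the helper name `z_gap` is made up for this example.

```python
from statistics import NormalDist


def z_gap(lower_pr: float, upper_pr: float) -> float:
    """Distance in SD units between two percentile ranks on a normal curve."""
    curve = NormalDist()  # standard normal: mean 0, SD 1
    return curve.inv_cdf(upper_pr / 100) - curve.inv_cdf(lower_pr / 100)


middle_gap = z_gap(50, 60)   # ten percentile points near the middle
upper_gap = z_gap(89, 99)    # ten percentile points near the top

# Roughly 0.25 SD versus 1.10 SD: the same percentile gap, very different distances.
print(round(middle_gap, 2), round(upper_gap, 2))
```

A ten-point difference near the 50th percentile therefore reflects a much smaller performance difference than ten points near the extremes.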


Percentile Rank

Widely used and easily understood.

Indicates a student's relative position in a group.

Interpreted as the percentage of individuals scoring at or below a given raw score.
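A percentile rank can be computed directly from the “at or below” definition. The sketch below follows that definition literally; some textbooks instead count half of the tied scores, so treat the convention here as an assumption. The class scores are invented for illustration.

```python
def percentile_rank(score: float, all_scores: list[float]) -> float:
    """Percentage of scores in the group that are at or below `score`."""
    if not all_scores:
        raise ValueError("all_scores must not be empty")
    at_or_below = sum(1 for s in all_scores if s <= score)
    return 100.0 * at_or_below / len(all_scores)


# Hypothetical raw scores for a class of ten students.
class_scores = [12, 15, 15, 18, 20, 22, 25, 27, 30, 34]
print(percentile_rank(22, class_scores))   # 60.0
```

Note that 60.0 here means 60% of the group scored at or below 22; it says nothing about the percentage of items answered correctly.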

Standard scores

Show how far a raw score is above or below average.

Expressed in standard deviation units from the mean.

SD is a measure of the spread of scores.

Normal Curve & Standard Deviation Unit

Symmetrical and bell shaped.

The curve contains fixed percentages of cases within each standard deviation interval.

Types of standard scores

Z-score.

Z-score = (X – M) / SD

X = any raw score.

M = Arithmetic Mean.

SD = Standard deviation of raw scores.

The Z-score is always negative when the raw score is smaller than the mean.
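As a small worked example of the Z-score definition, the sketch below applies (X – M) / SD to made-up values (a mean of 50 and a standard deviation of 10 are assumptions for illustration).

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Distance of a raw score from the mean, in standard deviation units."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (x - mean) / sd


print(z_score(65, 50, 10))   # 1.5  (above the mean)
print(z_score(42, 50, 10))   # -0.8 (below the mean, so negative)
```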

Pros & Cons of Standard Scores

Strengths

◦ Wide applicability

◦ Nice statistical properties

◦ Teachers often build their narrative reports on these standard scores using the “accepted descriptive words” rather than the numbers.

Weaknesses

◦ May be hard to explain to laypersons

◦ Need to know M and SD of original test



More Standard Scores of Interest . . . .

◦ T-scores, SATs, GREs

◦ NCEs (Normal Curve Equivalent)

Recall that the percentile rank scale is not an equal-interval scale; NCEs solve this problem by converting percentile ranks to an equal-interval scale. NCEs range from 1 to 99 with a mean of 50. The major advantage of NCEs over percentile ranks is that NCEs can be averaged.

Used almost exclusively for federal reporting requirements in achievement testing.

◦ Stanines

Widely used in schools so we will look at them in more detail in the next slide.
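The conversion from percentile ranks to NCEs described above can be sketched as follows. The slides do not state the formula; the version commonly given is NCE = 50 + 21.06 z, where z is the normal-curve value for the percentile rank, and the constant 21.06 is what makes NCEs of 1 and 99 line up with percentile ranks of 1 and 99. The function name is illustrative.

```python
from statistics import NormalDist

NCE_SD = 21.06  # commonly cited NCE standard deviation (assumed, not from the slides)


def percentile_rank_to_nce(percentile_rank: float) -> float:
    """Convert a percentile rank (strictly between 0 and 100) to an NCE."""
    if not 0 < percentile_rank < 100:
        raise ValueError("percentile rank must be strictly between 0 and 100")
    z = NormalDist().inv_cdf(percentile_rank / 100)
    return 50 + NCE_SD * z


for pr in (1, 25, 50, 75, 99):
    print(pr, round(percentile_rank_to_nce(pr), 1))
```

Percentile ranks of 1, 50 and 99 map to nearly the same NCE values, while ranks in between are stretched or compressed so that the resulting units are equal and can be averaged.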


T-Score

A type of standard score.

T-score = 50 + 10 (z)

Normalized standard scores can be calculated by

1. Converting the distribution of raw scores into percentile ranks.

2. Looking up the Z-score that corresponds to each percentile rank in a normal curve table.

3. Converting the Z-scores to T-scores.
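The three steps above can be sketched in code. Two choices here are assumptions rather than part of the slides: the percentile rank counts half of the tied scores (which keeps the highest score away from 100%, where no Z-score exists), and `statistics.NormalDist.inv_cdf` stands in for the printed normal curve table.

```python
from statistics import NormalDist


def normalized_t_scores(raw_scores: list[float]) -> dict[float, float]:
    """Map each distinct raw score to a normalized T-score."""
    n = len(raw_scores)
    curve = NormalDist()  # standard normal curve
    result = {}
    for x in sorted(set(raw_scores)):
        below = sum(1 for s in raw_scores if s < x)
        tied = sum(1 for s in raw_scores if s == x)
        proportion = (below + 0.5 * tied) / n   # step 1: percentile rank as a proportion
        z = curve.inv_cdf(proportion)           # step 2: Z-score for that rank
        result[x] = 50 + 10 * z                 # step 3: T-score = 50 + 10 (z)
    return result


# Hypothetical raw scores.
t_scores = normalized_t_scores([10, 20, 30])
print({raw: round(t, 1) for raw, t in t_scores.items()})
```

With this tiny symmetric set, the middle score lands on a T-score of 50 and the other two fall equally far on either side.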

Stanines

Pronounced (stay-nines)

Single digit scores.

Strengths of stanine scores.

1. Nine point scale 9 is high, 1 is low and 5 is average.

2. Possible to compare a student’s performance on different tests.

3. Easy to combine diverse types of data (test scores, rating, ranked data)

4. Uses single digit score, easily recorded and takes less space.

Limitation

1. Growth can’t be shown from 1 year to the next.
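A stanine can be looked up from a percentile rank. The sketch below assumes the usual percentages of cases per stanine (4, 7, 12, 17, 20, 17, 12, 7, 4), which give cumulative cut points of 4, 11, 23, 40, 60, 77, 89 and 96. How a rank falling exactly on a cut point is assigned is a convention chosen here for illustration.

```python
from bisect import bisect_right

# Cumulative percentages that close stanines 1 through 8 (assumed standard values).
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanine_from_percentile_rank(percentile_rank: float) -> int:
    """Return the stanine (1 to 9) for a percentile rank between 0 and 100."""
    if not 0 <= percentile_rank <= 100:
        raise ValueError("percentile rank must be between 0 and 100")
    # Count how many cut points the rank has reached, then shift to the 1-9 scale.
    return bisect_right(STANINE_CUTS, percentile_rank) + 1


print(stanine_from_percentile_rank(50))   # 5
print(stanine_from_percentile_rank(3))    # 1
print(stanine_from_percentile_rank(97))   # 9
```

Because each stanine covers a band of percentile ranks, small year-to-year changes often leave the stanine unchanged, which is one reason growth is hard to show on this scale.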

Judging the adequacy of norms

The main purpose is to be able to interpret students' test performance.

Qualities of test norms.

1. Test norms should be relevant.

2. Test norms should be representative.

3. Test norms should be up to date.

4. Test norms should be comparable.

5. Test norms should be adequately described.

Using local norms

Compare students with local norms.

Useful when local students differ from the published norm group in aptitude, educational experience, or cultural background.

Prepared using percentile ranks or stanines.

Cautions in interpreting test scores

A test score should be interpreted in terms of the specific test from which it was derived.

A test score should be interpreted in light of all of the student's relevant characteristics.

A test score should be interpreted according to the type of decision to be made.

A test score should be interpreted as a band of scores rather than as a specific value.

A test score should be verified by supplementary evidence.
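One common way to report a score as a band is to add and subtract the standard error of measurement (SEM). The slides do not give a formula; the sketch below assumes the usual one, SEM = SD × √(1 − reliability), and the standard deviation of 10 and reliability of 0.91 are made-up values.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability), with reliability between 0 and 1."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(score: float, sd: float, reliability: float,
               n_sem: float = 1.0) -> tuple[float, float]:
    """Band of n_sem standard errors on each side of the obtained score."""
    sem = standard_error_of_measurement(sd, reliability)
    return (score - n_sem * sem, score + n_sem * sem)


low, high = score_band(70, sd=10, reliability=0.91)
print(f"{low:.1f} to {high:.1f}")   # 67.0 to 73.0
```

Reporting “67 to 73” instead of “70” reminds the reader that a small difference between two students' scores may not reflect a real difference in performance.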