characteristics of a good test


Upload: cyrilcoscos

Post on 16-Dec-2014


Page 1: Characteristics of a good test

A. RELIABILITY

CHARACTERISTICS OF A GOOD TEST

Page 2: Characteristics of a good test

Reliability

• Reliability is synonymous with consistency. It is the degree to which test scores for an individual test taker or group of test takers are consistent over repeated applications.

• No psychological test is completely consistent; however, a measurement that is unreliable is worthless.

Page 3: Characteristics of a good test

Would you keep using these measurement tools?

The consistency of test scores is critically important in determining whether a test can provide good measurement.

Page 4: Characteristics of a good test

When someone says you are a ‘reliable’ person, what do they really mean?

Are you a reliable person?

Page 5: Characteristics of a good test

Reliability (cont.)

* Because no unit of measurement is exact, any time you measure something (observed score), you are really measuring two things.

1. True Score - the amount of observed score that truly represents what you are intending to measure.

2. Error Component - the amount of other variables that can impact the observed score

Observed Test Score = True Score + Errors of Measurement

Page 6: Characteristics of a good test

Measurement Error

• Any fluctuation in test scores that results from factors related to the measurement process that are irrelevant to what is being measured.

• The difference between the observed score and the true score is called the error score.

S_true = S_observed - S_error

Page 7: Characteristics of a good test

Measurement Error is Reduced By:

- Writing items clearly

- Making instructions easily understood

- Adhering to proper test administration

- Providing consistent scoring

Page 8: Characteristics of a good test

Determining Reliability

• There are several ways to determine reliability, depending on the type of measurement and the supporting data required. They include:

- Internal Consistency

- Test-retest Reliability

- Inter rater Reliability

- Split-half Methods

- Odd-even Reliability

- Alternate Forms Methods

Page 9: Characteristics of a good test

Internal Consistency

• Estimates the reliability of a test based solely on the number of items on the test and the intercorrelation among the items. In effect, it compares each item to every other item.

Cronbach’s Alpha:

- .80 to .95 (Excellent)

- .70 to .80 (Very Good)

- .60 to .70 (Satisfactory)

- <.60 (Suspect)
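As a rough illustration of how Cronbach's alpha is obtained from item variances and the variance of total scores, here is a minimal Python sketch. The function name `cronbach_alpha` and the response data are hypothetical and not taken from the slides; the formula is the standard one, alpha = k/(k-1) × (1 - sum of item variances / variance of total scores).

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """item_scores: one row per test taker, one column per item."""
    k = len(item_scores[0])                          # number of items
    items = list(zip(*item_scores))                  # transpose: one tuple per item
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical data: 5 test takers x 4 items, each item scored 1 to 5.
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 3],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 3, 4],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")  # about 0.97 for this made-up data
```

The made-up items rise and fall together across test takers, which is why alpha comes out high; items that did not correlate with one another would pull it down.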

Page 10: Characteristics of a good test

Split Half & Odd-Even Reliability

Split Half - refers to determining a correlation between the first half of the measurement and the second half of the measurement (i.e., we would expect answers to the first half to be similar to the second half).

Odd-Even - refers to the correlation between even items and odd items of a measurement tool.

• In this sense, we are using a single test to create two tests, eliminating the need for additional items and multiple administrations.

• Since in both of these types only 1 administration is needed and the groups are determined by the internal components of the test, it is referred to as an internal consistency measure.

Page 11: Characteristics of a good test

Split-half reliability [error due to differences in item content between the halves of the test]

• Typically, responses on odd versus even items are employed

• Correlate total scores on odd items with the scores obtained on even items

Person | Odd | Even
1      | 36  | 43
2      | 44  | 40
3      | 42  | 37
4      | 33  | 40

(Figure labels: 1, 100, 50 pairs)
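The odd-even procedure described above can be sketched in a few lines of Python. This is only an illustration: the helper names `odd_even_totals` and `pearson_r` and the 0/1 answer matrix are hypothetical, and the correlation is the ordinary Pearson coefficient.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def odd_even_totals(item_scores):
    """Split each test taker's items into odd-numbered and even-numbered halves."""
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return odd, even

# Hypothetical data: 5 test takers x 6 items, scored 1 (correct) or 0 (wrong).
answers = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]

odd, even = odd_even_totals(answers)
r = pearson_r(odd, even)
print(odd, even, round(r, 3))
```

Only one administration is needed: the two "tests" being correlated are just the two halves of the same answer sheet.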

Page 12: Characteristics of a good test

Test-retest Reliability

• Test-retest reliability is usually measured by computing the correlation coefficient between scores of two administrations.

Page 13: Characteristics of a good test

Test-retest Reliability (cont.)

• The amount of time allowed between measures is critical.

• The shorter the time gap, the higher the correlation; the longer the time gap, the lower the correlation. This is because the two observations are related over time.

• Optimum time between administrations is 2 to 4 weeks.

• The rationale behind this method is that any difference between the scores of the test and the retest should be due solely to measurement error.

Page 14: Characteristics of a good test

Inter rater Reliability

• Whenever you use humans as a part of your measurement procedure, you have to worry about whether the results you get are reliable or consistent. People are notorious for their inconsistency. We are easily distractible. We get tired of doing repetitive tasks. We daydream. We misinterpret.

Page 15: Characteristics of a good test

Inter rater Reliability (cont.)

• For some scales it is important to assess interrater reliability.

• Interrater reliability means that if two different raters scored the scale using the scoring rules, they should attain the same result.

• Interrater reliability is usually measured by computing the correlation coefficient between the scores of two raters for the set of respondents.

• Here the criterion of acceptability is pretty high (e.g., a correlation of at least .9), but what is considered acceptable will vary from situation to situation.

Page 16: Characteristics of a good test

Parallel/Alternate Forms Method

Parallel/Alternate Forms Method - refers to the administration of two alternate forms of the same measurement device and then comparing the scores.

• Both forms are administered to the same person and the scores are correlated. If the two produce the same results, then the instrument is considered reliable.

Page 17: Characteristics of a good test

Parallel/Alternate Forms Method (cont.)

• A correlation between these two forms is computed just as in the test-retest method.

Advantages

• Eliminates the problem of memory effect.

• Reactivity effects (i.e., experience of taking the test) are also partially controlled.

Page 18: Characteristics of a good test

Factors Affecting Reliability

• Administrator Factors

• Number of Items on the instrument

• The Instrument Taker

• Heterogeneity of the Items

• Heterogeneity of the Group Members

• Length of Time between Test and Retest

Page 19: Characteristics of a good test

How High Should Reliability Be?

• A highly reliable test is always preferable to a test with lower reliability.

- .80 or greater (Excellent)

- .70 to .80 (Very Good)

- .60 to .70 (Satisfactory)

- <.60 (Suspect)

• A reliability coefficient of .80 indicates that 20% of the variability in test scores is due to measurement error.
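The link between a reliability coefficient of .80 and 20% error variance can be checked with a small Python sketch. It assumes the standard classical test theory definition, which the slides do not spell out: reliability is the variance of true scores divided by the variance of observed scores. The scores below are invented so that the errors are uncorrelated with the true scores.

```python
from statistics import pvariance

# Hypothetical true scores and measurement errors (uncorrelated by construction).
true_scores = [46, 54, 46, 54]
errors = [-2, -2, 2, 2]

# Observed Test Score = True Score + Errors of Measurement
observed = [t + e for t, e in zip(true_scores, errors)]

reliability = pvariance(true_scores) / pvariance(observed)  # share of variance that is "true"
error_share = pvariance(errors) / pvariance(observed)       # share of variance due to error

print(observed)                  # [44, 52, 48, 56]
print(reliability, error_share)  # 0.8 0.2
```

With a true-score variance of 16 and an error variance of 4, the observed variance is 20, so 80% of the variability reflects true differences and 20% reflects measurement error.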

Page 20: Characteristics of a good test

Reliability deals with consistency.

Reliability is the quality that guarantees us that we will get similar results when conducting the same test on the same population every time.

Consider this ruler…

Page 21: Characteristics of a good test

Now compare this ruler…

With this one…

Page 22: Characteristics of a good test

Each ruler will give the same answer each time…

But this one will be wrong each time…

Page 23: Characteristics of a good test

Each ruler is reliable…

But reliability doesn’t mean much when it is wrong…

Page 24: Characteristics of a good test

So, not only do we require reliability…

We also need…

Page 25: Characteristics of a good test

VALIDITY

Good Ruler

Bad Ruler

Page 26: Characteristics of a good test

VALIDITY

Validity deals with the accuracy of the measurement

Page 27: Characteristics of a good test

Validity

• Depends on the PURPOSE

- E.g. a ruler may be a valid measuring device for length, but isn’t very valid for measuring volume

• Measuring what ‘it’ is supposed to

• Matter of degree (how valid?)

• Specific to a particular purpose!

• Learning outcomes

1. Content coverage (relevance?)

2. Level & type of student engagement (cognitive, affective, psychomotor) – appropriate?

Page 28: Characteristics of a good test

Types of validity measures

- Face validity

- Construct validity

- Content validity

- Criterion validity

Page 29: Characteristics of a good test

Face Validity

Does it appear to measure what it is supposed to measure?

Example: Let’s say you are interested in measuring ‘propensity towards violence and aggression’. By simply looking at the following items, state which ones qualify to measure the variable of interest:

- Have you been arrested?

- Have you been involved in physical fighting?

- Do you get angry easily?

- Do you sleep with your socks on?

- Is it hard to control your anger?

- Do you enjoy playing sports?

Page 30: Characteristics of a good test

Construct Validity

• Does the test measure the ‘human’ theoretical construct or trait?

• Examples:

- Mathematical reasoning

- Verbal reasoning or fluency

- Musical ability

- Spatial ability

- Motivation

• Applicable to authentic assessment

• Each construct is broken down into its component parts

• E.g. ‘motivation’ can be broken down to:

- Interest

- Attention span

- Hours spent

- Assignments undertaken and submitted, etc.

• All of these sub-constructs put together measure ‘motivation’

Page 31: Characteristics of a good test

Content Validity

How well do the elements of the test relate to the content domain?

How closely does the content of the questions in the test relate to the content of the curriculum?

Directly relates to instructional objectives and the fulfillment of the same!

Major concern for achievement tests (where content is emphasized)

Can you test students on things they have not been taught?

Page 32: Characteristics of a good test

How to establish Content Validity?

• Instructional objectives (looking at your list)

• Table of Specification

• E.g. At the end of the chapter, the student will be able to do the following:

1. Explain what ‘stars’ are

2. Discuss the types of stars and galaxies in our universe

3. Categorize different constellations by looking at the stars

4. Differentiate between our star, the Sun, and all other stars

Page 33: Characteristics of a good test

Table of Specification (An Example)

Columns are the Categories of Performance (Mental Skills); rows are the content areas.

Content areas        | Knowledge | Comprehension | Analysis | Total
1. What are ‘stars’? |           |               |          |
2. Our star, the Sun |           |               |          |
3. Constellations    |           |               |          |
4. Galaxies          |           |               |          |
Total                |           |               |          | Grand Total

Page 34: Characteristics of a good test

Criterion Validity

The degree to which content on a test (predictor) correlates with performance on relevant criterion measures (concrete criterion in the "real" world?)

If they do correlate highly, it means that the test (predictor) is a valid one!

E.g. if you taught skills relating to ‘public speaking’ and had students do a test on it, the test can be validated by looking at how it relates to actual performance (public speaking) of students inside or outside of the classroom

Page 35: Characteristics of a good test

Factors that can lower Validity

- Unclear directions

- Difficult reading vocabulary and sentence structure

- Ambiguity in statements

- Inadequate time limits

- Inappropriate level of difficulty

- Poorly constructed test items

- Test items inappropriate for the outcomes being measured

- Tests that are too short

- Improper arrangement of items (complex to easy?)

- Identifiable patterns of answers

- Teaching

- Administration and scoring

- Students

- Nature of criterion

Page 36: Characteristics of a good test

Validity and Reliability

Neither Valid nor Reliable

Reliable but not Valid

Valid & Reliable

Fairly Valid but not very Reliable

Think in terms of ‘the purpose of tests’ and the ‘consistency’ with which the purpose is fulfilled/met

Page 37: Characteristics of a good test

Objectivity

The state of being fair, without bias or external influence.

If the test is marked by different people, the score will be the same. In other words, the marking process should not be affected by the marker's personality.

Not influenced by emotion or personal prejudice. Based on observable phenomena; presented factually: an objective appraisal.

The questions and answers should be clear

Page 38: Characteristics of a good test

measures an individual's characteristics in a way that is independent of rater’s bias or the examiner's own beliefs

gauges the test taker's conscious thoughts and feelings without regard to the test administrator's beliefs or biases.

help greatly in determining the test taker's personality.

Page 39: Characteristics of a good test

Understanding Norms

a list of scores and corresponding percentile ranks, standard scores, or other transformed scores of a group of examinees on whom a test was standardized.

“In a psychometric context, norms are the test performance data of a particular group of test takers that are designed for use as a reference for evaluating or interpreting individual test scores” (Cohen & Swerdlik, 2002, p. 100).

Page 40: Characteristics of a good test

TYPES OF NORMS

• Percentiles

- refer to a distribution divided into 100 equal parts.

- refer to the score at or below which a specific percentage of scores fall.

Ex. A student got a percentile rank of 90 on the NAT exam. What does this mean?

Page 41: Characteristics of a good test

It means that 90% of his classmates scored lower than he did, and 10% of his classmates scored higher.
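A percentile rank in this sense can be computed directly. The sketch below is a minimal Python illustration; the function name `percentile_rank` and the class scores are hypothetical, and it follows the "scored lower" reading used in the example above rather than the "at or below" definition.

```python
def percentile_rank(score, all_scores):
    """Percentage of scores in all_scores that are strictly lower than score."""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)

# Hypothetical class of 10 examinees, including the student who scored 90.
class_scores = [55, 60, 62, 65, 70, 72, 75, 80, 85, 90]

rank = percentile_rank(90, class_scores)
print(rank)  # 90.0: nine of the ten scores are lower than 90
```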

Page 42: Characteristics of a good test

Age Norms (age-equivalent scores)

–“indicate the average performance of different samples of test takers who were at various ages at the time the test was administered” (Cohen & Swerdlik, 2002, p. 105).

Grade Norms

– Used to indicate the average test performance of testtakers in a specific grade.

– Based on a ten-month scale; a score refers to grade and month (e.g., 7.3 is equivalent to seventh grade, third month).

Page 43: Characteristics of a good test

• National Norms

– Derived from a standardization sample nationally representative of the population of interest.

• Subgroup Norms

– Are created when narrowly defined groups are sampled. Ex.

• Socioeconomic status

• Handedness

• Education level

Page 44: Characteristics of a good test

Local Norms

– Are derived from the local population’s performance on a measure.

– Typically created locally (i.e., by a guidance counselor, personnel director, etc.)

Fixed Reference Group Scoring Systems

• Calculation of test scores is based on a fixed reference group that was tested in the past.

Page 45: Characteristics of a good test

•Norm referenced tests consider the individual’s score relative to the scores of testtakers in the normative sample.

•Criterion Referenced tests consider the individual’s score relative to a specified standard or criterion (cut score).

– Licensure exams

– Proficiency tests

Page 46: Characteristics of a good test

Item Analysis

A name given to a variety of statistical techniques designed to analyze individual items on a test

It involves examining class-wide performance on individual test items.

It sometimes suggests why an item has not functioned effectively and how it might be improved

A test composed of items revised and selected on the basis of item analysis is almost certain to be more reliable than one composed of an equal number of untested items.

Page 47: Characteristics of a good test

Difficulty index

The proportion of students in the class who got an item correct. The larger the proportion, the more students have learned the content measured by the item.

Page 48: Characteristics of a good test

Discrimination index

A basic measure of the validity of an item.

A measure of an item’s ability to discriminate between those who scored high on the total test and those who scored low.

It can be interpreted as an indication of the extent to which overall knowledge of the content area or mastery of the skill is related to the response on an item.

Page 49: Characteristics of a good test

Analysis of response options/distracter analysis

In addition to examining the performance of a test item, teachers are often interested in examining the performance of individual distracters (incorrect answer options) on multiple-choice items.

By calculating the proportion of students who chose each answer option, teachers can identify which distracters are working and appear to be attractive to students who do not know the correct answer, and which distracters are simply taking up space and not being chosen by many students

Page 50: Characteristics of a good test

To eliminate blind guessing which results in a correct answer purely by chance (which hurts the validity of a test item), teachers want as many plausible distracters as is feasible.

Page 51: Characteristics of a good test

The process of item analysis

1. Arrange the test scores from highest to lowest.

2. Select the criterion groups.

Identify a High group and a Low group. The High group is the highest-scoring 27% of the examinees and the Low group is the lowest-scoring 27%.

Each 27% of the examinees is called a criterion group. This figure provides the best compromise between two desirable but inconsistent aims: to make the extreme groups as large as possible and as different as possible.

With groups chosen this way, we can say with confidence that those in the High group are superior to those in the Low group in the ability measured by the test.

Page 52: Characteristics of a good test

3. For each item, count the number of examinees in the High group who responded correctly. Do the same, separately, for the Low group.

4. Solve for the difficulty index of each item. The larger the value of the index, the easier the item; the smaller the value, the more difficult the item.

Scale for interpreting the difficulty index of an item:

- Below 0.25: item is very difficult

- 0.25 – 0.75: item is of average difficulty (rightly difficult)

- Above 0.75: item is very easy

Page 53: Characteristics of a good test

Example: Item analysis

1. Count and arrange the scores from highest to lowest. Ex. n=43 scores

2. Calculate the size of the criterion group (N), which is 27% of the total number of scores. Ex. N = 27% of 43 = (0.27)(43) = 11.61, rounded to 12

3. Take 12 scores from the highest down and 12 scores from the lowest up; call these the High group and the Low group respectively.

4. Tabulate the number of responses for each option from the High and Low groups for the particular item under analysis.

Page 54: Characteristics of a good test

5. Solve for the difficulty index of each item. The larger the value of the index, the easier the item; the smaller the value, the more difficult the item.

Scale for interpreting the difficulty index of an item:

- Below 0.25: item is very difficult

- 0.25 – 0.75: item is of average difficulty (rightly difficult)

- Above 0.75: item is very easy
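Steps 1 to 3 of the example (ranking the scores and taking the top and bottom 27%) might look like the following in Python. This is a sketch under assumptions: the function name `criterion_groups` and the 43 scores are invented, and rounding 27% of n to the nearest whole number is one reasonable reading of the "(0.27)(43) = 12" step.

```python
def criterion_groups(scores, fraction=0.27):
    """Return (high_group, low_group): the top and bottom `fraction` of ranked scores."""
    ranked = sorted(scores, reverse=True)   # step 1: highest to lowest
    n = round(fraction * len(ranked))       # step 2: size of each criterion group
    if n == 0:                              # too few scores to form groups
        return [], []
    return ranked[:n], ranked[-n:]          # step 3: top n and bottom n

# Hypothetical set of 43 scores: 50, 51, ..., 92.
scores = list(range(50, 93))

high, low = criterion_groups(scores)
print(len(high), len(low))   # 12 12
print(high[0], low[-1])      # 92 50
```

The responses of these two groups to a given item are then tabulated option by option, as in step 4.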

Page 55: Characteristics of a good test

Ex: Item # 5 of the Multiple Choice test; D is the correct option (marked *).

            | A | B | C | D* | E | Total
Upper Group | 1 | 1 | 0 | 9  | 1 | 12
Lower Group | 3 | 1 | 4 | 4  | 0 | 12

Page 56: Characteristics of a good test

The following can be used to interpret the index of discrimination.

Idis        | Index Description | Interpretation
0.40 – 1.00 | High              | The item is very good
0.30 – 0.39 | Moderate          | Reasonably good, can be improved
0.20 – 0.29 | Moderate          | In need of improvement
< 0.20      | Low               | Poor, to be discarded

Page 57: Characteristics of a good test

• Interpreting the results by giving value judgment

Idis          | Idif           | Item category
High          | Easy           | Good
High          | Easy/difficult | Fair
Moderate      | Easy/difficult | Fair
High/moderate | Easy/difficult | Fair
Low           | At any level   | Poor (Discard the item)

Page 58: Characteristics of a good test

Index of difficulty = (Hc + Lc) / 2N = (9 + 4) / (2 × 12) = .54, so the item is rightly difficult.

Index of discrimination = (Hc - Lc) / N = (9 - 4) / 12 = .42, a high index of discrimination, so the item has the power to discriminate.

Here Hc and Lc are the numbers of correct responses in the High and Low groups, and N is the size of each group.

Hence, item number 5 has to be retained.

Distracter analysis: A and C are good distracters.
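The calculations for item 5 can be reproduced with a short Python sketch using the Upper Group and Lower Group counts from the example. The function names are hypothetical, and the rule used to flag a "good" distracter (chosen by more Low-group than High-group examinees) is an assumption; the slides state only the conclusion that A and C are good distracters.

```python
# Response counts for item 5, from the example table (D is the correct option).
upper = {"A": 1, "B": 1, "C": 0, "D": 9, "E": 1}
lower = {"A": 3, "B": 1, "C": 4, "D": 4, "E": 0}
key = "D"
n = 12  # size of each criterion group

def difficulty_index(hc, lc, n):
    return (hc + lc) / (2 * n)

def discrimination_index(hc, lc, n):
    return (hc - lc) / n

def difficulty_label(idif):
    """Scale from the slides: below 0.25, 0.25 to 0.75, above 0.75."""
    if idif < 0.25:
        return "very difficult"
    if idif <= 0.75:
        return "average difficulty"
    return "very easy"

idif = difficulty_index(upper[key], lower[key], n)      # (9 + 4) / 24
idis = discrimination_index(upper[key], lower[key], n)  # (9 - 4) / 12

# Assumed rule: a distracter works if it attracts more Low-group than High-group examinees.
good_distracters = [opt for opt in upper if opt != key and lower[opt] > upper[opt]]

print(round(idif, 2), difficulty_label(idif))  # 0.54 average difficulty
print(round(idis, 2))                          # 0.42
print(good_distracters)                        # ['A', 'C']
```

Under this rule, B attracts both groups equally and E attracts only a High-group examinee, so neither is flagged.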

Page 59: Characteristics of a good test

Thank you and God bless us all!