MEASUREMENT DISTURBANCE EFFECTS ON RASCH FIT

STATISTICS AND THE LOGIT RESIDUAL INDEX

DISSERTATION

Presented to the Graduate Council of the

University of North Texas in Partial

Fulfillment of the Requirements

For the Degree of

DOCTOR OF PHILOSOPHY

By

Robert E. Mount, A.A., B.S., M.A., C.R.C.

Denton, Texas

August, 1997

Mount, Robert E., Measurement disturbance effects on Rasch fit statistics and the

Logit Residual Index. Doctor of Philosophy (Educational Research), August, 1997, 194

pp., 15 tables, 2 illustrations, references, 32 titles.

The effects of random guessing as a measurement disturbance on Rasch fit

statistics (unweighted total, weighted total, and unweighted ability between) and the

Logit Residual Index (LRI) were examined through simulated data sets of varying sample

sizes, test lengths, and distribution types. Three test lengths (25, 50, and 100), three

sample sizes (25, 50, and 100), two item difficulty distributions (normal and uniform),

and three levels of guessing (no guessing [0%], 25%, and 50%) were used in the

simulations, resulting in 54 experimental conditions. The mean logit person ability for

each experiment was +1. Each experimental condition was simulated once in an effort to

approximate what could happen on the single administration of a four option per item

multiple choice test to a group of relatively high ability persons. Previous research has

shown that varying item and person parameters have no effect on Rasch fit statistics.

Consequently, these parameters were used in the present study to establish realistic test

conditions, but were not interpreted as effect factors in determining the results of this

study.

Rasch fit statistics were found to be robust to varying levels of guessing and to

distribution types. The unweighted total fit statistic was more sensitive to fit problems

far away from the average ability of the group in which the fit problems occurred (fit

problems away from the item difficulties). The weighted total fit statistic was more

sensitive to fit problems centered on the item difficulties. It was also found that, as the

probability of guessing the correct answer increased, low-ability persons tended

consistently to guess the correct answer inducing item bias (item familiarity) into the

tests. These items were detected by the unweighted between fit statistic. In conditions

involving minor fit problems and misfitting items, the LRI was able to identify group

membership of the persons in which the fit problem occurred. Therefore, it is necessary

to use the unweighted total, weighted total, and between fit statistics in combination with

the LRI to diagnose fit problems for a more accurate assessment of individual differences.

TABLE OF CONTENTS

Page

LIST OF TABLES v

LIST OF ILLUSTRATIONS vii

Chapter

1. INTRODUCTION 1

Overview Properties of Estimators Rasch Estimation Methods Rationale for the Study Research Question Definition of Terms Delimitations

2. REVIEW OF THE LITERATURE 15

Historical Perspective Measurement Disturbances Rasch Fit Statistics Logit Residual Index Purpose of Study

3. METHODS AND PROCEDURES 33

Data Set Construction Simulated Data Sets Rasch Analysis Statistical Analysis

4. RESULTS 39

Effect of Guessing on Rasch Fit Statistics Simulation Design Effects

Detection of Guessing by Rasch Fit Statistics Guessing and the Logit Residual Index

5. FINDINGS AND CONCLUSIONS 59

Effect of Guessing on Rasch Fit Statistics Detection of Guessing by Rasch Fit Statistics Guessing and the Logit Residual Index Summary Conclusions Further Study Recommendations

APPENDIX

A IPARM Control File Parameters 72

B A Summary of Item Fit Information by Experiment 76

C A Summary of Misfitting Item Statistics by Experiment 83

REFERENCES 192

LIST OF TABLES

Table Page

1. Definition of Experiments 35

2. Mean Summary of Item Fit Information for Experiments 1-9 (Normally Distributed Item Difficulty Distributions and No Guessing) 40

3. Mean Summary of Item Fit Information for Experiments 10-18 (Normally Distributed Item Difficulty Distributions and a 25% Chance of Guessing Correctly) 41

4. Summary of Mean Item Fit Information for Experiments 19-27 (Normally Distributed Item Difficulty Distributions and a 50% [...]

5. Mean Differences for Experiments With Normally Distributed Item Difficulties With No Guessing, 25%, and a 50% Chance of Guessing Correctly 43

6. [...] (Uniformly Distributed Item Difficulty Distributions and No Guessing) 44

7. [...] (Uniformly Distributed Item Difficulty Distributions and a 25% [...]

8. Summary of Mean Item Fit Information for Experiments 46-54 (Uniformly Distributed Item Difficulty Distributions and a 50% [...]

9. Mean Differences for Experiments With Uniformly Distributed Item Difficulties With No Guessing, 25%, and a 50% Chance of Guessing Correctly 47

10. Mean Differences for Experiments With Normal and Uniformly Distributed Item Difficulties and Varying Levels of Guessing (No Guessing, 25%, and 50%) 48

11. A Comparison of the Frequency of Misfitting Items Detected by Rasch Fit Statistics in a Normal Distribution of Item Difficulties at Varying Test Lengths and Levels of Guessing Using $\chi^2$ 50

12. A Comparison of the Frequency of Misfitting Items Detected by Rasch Fit Statistics in a Uniform Distribution of Item Difficulties at Varying Test Lengths and Levels of Guessing Using $\chi^2$ 52

13. A Comparison of the Frequency of Misfitting Items Detected by Rasch Fit Statistics in Normal and Uniformly Distributed Item Difficulties Using $\chi^2$ 54

14. Number and Percent of Misfitting Items Detected by Rasch Fit Statistics Across Experiments Involving Normal and Uniformly Distributed Item Difficulties 55

15. Number of Misfitting Items and Number and Percent of LRI Values by Experimental Conditions 56

LIST OF ILLUSTRATIONS

Figure Page

1. Rasch Model Notations 7

2. Table Format for the Display of Experimental Data 40

CHAPTER 1

INTRODUCTION

Overview

Stevens (1946) defined measurement as the assignment of numerals to objects or

events according to rules. However, the measurement of individual differences is not as

straightforward as the definition implies; it is fraught with several measurement

problems: (a) No one approach is universally acceptable; (b) measurements are based on

limited samples of behaviors; (c) measurements are subject to error; and (d) the units of

measurement are not well defined (Crocker & Algina, 1986). Of the problems associated

with the measurement of individual differences, the accuracy of measurement is probably

the most important. A measure can only be as accurate as the ruler (scale) used to obtain

the measurement.

Traditionally, the measurement of individual differences was based on the

classical true score theory and its associated statistics and scales of measurement

(nominal, ordinal, interval, and ratio). The irony is that measurements obtained using this

approach were found to be sample specific and that generalizations beyond the reference

sample should be made with caution. In addition, the scales of measurement were found

to have arbitrary zero points and unequal measurement intervals. Therefore,

measurements obtained using these scales may be as arbitrary and unequal as the ruler

used to make them. Given these characteristics, estimates (statistics) obtained in

identifying individual differences tended to be biased, inconsistent, insufficient, and

inefficient when used with nonnormal distributions.

What was needed was a true linear scale that had an absolute zero point and equal

interval units of measurement. In 1960, one such scale, with associated statistics, was

developed by Rasch (Rasch, 1980). Known as Rasch Analysis, this approach allowed for

the independent estimation of person ability and item difficulty parameters. In addition,

the statistics used in this approach were found to be consistent, efficient, sufficient, and

unbiased.

In the assessment of item function in the Rasch model, one observes the residual

trends among ability groups. The Logit Residual Index (LRI), a statistic introduced by

Smith (1991b), is a measure of how far an item deviates from the common slope that is

fitted for all items (model curve) and the residual trend among ability groups. Thus,

items with LRI values greater than zero will have an item characteristic curve (ICC) that

is steeper than the modeled curve, and items with values less than zero will have an ICC

that is flatter than the modeled curve. The residual trend is an indication of how well, or

less well, different ability groups performed on an individual item. The purpose of this

investigation is to test the effects of varying test parameters and levels of measurement

disturbance on Rasch fit statistics and the Logit Residual Index.

Properties of Estimators

Rarely, if ever, are characteristics about a complete population known; therefore,

researchers make inferences about a population based on a representative random sample

taken from the population. A population refers to all members in the entire group having

some common characteristic. For example, a population may include all members in a

classroom, school, city, community, county, state, nation, or the world. As group

membership increases, it becomes increasingly more difficult to obtain measures on all

characteristics due to restrictions in time and costs, or the population size increases too

rapidly. A population may be finite, a known or countable number of members, or

infinite, a population that is so large that group membership is not known.

Characteristics about a population are called parameters and characteristics about a

sample are called statistics. The most commonly used estimates about populations are

measures of central tendency (mean, median, and mode). They identify the most typical

measures in a normal distribution.

The arithmetic mean is the most commonly used measure of central tendency. It

is a simple arithmetic average determined by summing all scores in a distribution of

scores and dividing by the total number of scores. The mean is an appropriate measure of

central tendency when the score distribution is normal and the level of measurement is on

the interval or ratio scale. The median is the midpoint in a distribution of scores when

arranged in order of magnitude. Stated differently, it is the point on the score scale below

which 50 % of the scores fall. It is also equivalent to the 50th percentile. The median is

an appropriate measure of central tendency when the level of measurement is on the

ordinal scale and the score distributions are other than normal. The mode is the most

frequently occurring score in a distribution of scores. However, it is not a dependable

measure of central location. Depending on the shape of the distribution, it is possible to

have two (bimodal) or more modes (multimodal). The mode is an appropriate measure of

central tendency when the level of measurement is on the nominal scale.
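The three measures of central tendency described above can be computed as follows. This is a minimal sketch using Python's standard `statistics` module; the score list and the helper name `central_tendency` are hypothetical and are not taken from the study.

```python
from statistics import mean, median, multimode


def central_tendency(scores):
    """Return the mean, median, and list of modes for a list of scores."""
    return {
        "mean": mean(scores),
        "median": median(scores),
        # multimode returns every most-frequent value, so a bimodal
        # distribution yields two entries.
        "modes": multimode(scores),
    }


# Hypothetical, positively skewed set of test scores (not from the study).
scores = [2, 3, 3, 4, 5, 5, 9]
summary = central_tendency(scores)
print(summary)
```

In this illustrative set the distribution is bimodal (3 and 5), and the single high score of 9 pulls the mean above the median, which is the kind of disagreement among the three measures that arises in nonnormal distributions.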

In a normally distributed population of scores, the mean, median, and mode will

coincide or be the same value. However, in nonnormal distributions, these values differ,

and, therefore, certain estimators have more desirable properties than others. The

desirable properties of estimators are consistency, efficiency, sufficiency, and

unbiasedness. These serve as criteria for determining preferences for one method of

estimation over another.

An estimator is considered to be unbiased when the mean of a sampling

distribution of means approaches that of the population parameter as the number of

samples of a given size increases. That is, a statistic is unbiased when it shows no

systematic tendency to be either greater than or less than the population parameter. For

example, it can be shown that the variance

$$S^2 = \sum (X - \bar{X})^2 / n$$

is a biased estimate of the population variance ($\sigma^2$) (Ferguson, 1981).
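The bias of the variance computed with n in the denominator can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings (normal population, sample size 5, population standard deviation 2, 20,000 replications); the function names are hypothetical.

```python
import random


def variance_n(xs):
    """Variance with n in the denominator (the biased form)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def variance_n_minus_1(xs):
    """Variance with n - 1 in the denominator (the usual unbiased form)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def average_estimates(n=5, reps=20000, sigma=2.0, seed=1):
    """Average both variance estimates over many samples of size n."""
    rng = random.Random(seed)
    total_biased = total_unbiased = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        total_biased += variance_n(xs)
        total_unbiased += variance_n_minus_1(xs)
    return total_biased / reps, total_unbiased / reps


biased, unbiased = average_estimates()
# The population variance is sigma**2 = 4.0; the n-denominator average
# should sit near 4.0 * (n - 1) / n = 3.2.
print(biased, unbiased)
```

Averaged over many samples, the n-denominator estimate falls systematically below the population variance, whereas the n - 1 form centers on it.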

Consistency implies that an estimator more closely approximates a population

parameter as sample size increases. The arithmetic mean is a prime example of a

consistent estimator. It more closely approximates the population parameter as the

sample size increases.
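Consistency of the sample mean can be sketched in the same way. The population values below (mean 50, standard deviation 10) and the sample sizes are assumptions chosen only for illustration.

```python
import random


def mean_abs_error(n, reps=1000, mu=50.0, sigma=10.0, seed=7):
    """Average absolute distance between the sample mean and mu for samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        total += abs(sample_mean - mu)
    return total / reps


for n in (25, 100, 400):
    # The typical error should shrink as the sample size grows.
    print(n, round(mean_abs_error(n), 3))
```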

Efficiency is implied by sampling variance. It refers to the variability of estimates

from sample to sample, or the degree of sampling error associated with the estimator.

That is, if the sampling error is less than the sampling error associated with any other

method of estimation, the estimate is considered to be efficient (Ferguson, 1981). More

explicitly, the estimator that has the smallest standard error is more efficient (Walker &

Lev, 1953).
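Relative efficiency can be illustrated by comparing the sampling variance of two estimators of the same parameter. The sketch below, under the assumption of a standard normal population and samples of size 25, compares the sample mean with the sample median; the function name is hypothetical.

```python
import random
from statistics import fmean, median, pvariance


def sampling_variances(n=25, reps=5000, seed=3):
    """Return the sampling variance of the mean and of the median."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(fmean(xs))
        medians.append(median(xs))
    # Variance of each estimator across the replicated samples.
    return pvariance(means), pvariance(medians)


var_mean, var_median = sampling_variances()
print(var_mean, var_median)
```

For a normal population the mean varies less from sample to sample than the median does, so it is the more efficient of the two estimators.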

An estimator is sufficient for estimating a population parameter if it exhausts all

the information about the population parameter from sample data. For example, the mean

is a sufficient estimator of the population mean ($\mu$), because, once the sample mean is

known, any other statistic computed from the sample data (such as the median or mode)

would provide no further information about the population mean (Neter, Wasserman, &

Whitmore, 1978). Given a normal population, the mean provides an estimate of $\mu$ that

satisfies all the desirable properties of a good estimator (consistent, efficient, sufficient,

and unbiased) (Walker & Lev, 1953).

Rasch Estimation Methods

Measurement implies the determination of the quantity, quality, or some other

characteristic of an object or attribute. It answers the questions concerning how many,

how often, and how much of a particular object or attribute exist. The process involves

the assignment of units or numbers in a logical fashion along a dimension or scale. When

an object or attribute is measured, it is assigned a specific position along a dimension or

numerical scale. Traditionally, we have used four scales or levels of measurement: (a)

nominal scale, in which numbers, names, or words are used to identify or label individuals

or objects; (b) ordinal scale, in which numbers or words reflect the order of things; (c)

interval scale, which has equal units of measurement and an arbitrary zero point; and (d) ratio

scale, which has equal units of measurement and a true or absolute zero point. Each scale of

measurement has its own rules and makes different assumptions about the measurement

process. These scales are prevalent in the traditional classical true score approach to the

measurement of human traits.

The Rasch measurement model, unlike the classical true score model, attempts to

explain the effect of a person's ability on item performance. The Rasch model frees the

estimation of a person's ability from the item difficulty, and the estimation of the item

difficulty is freed from the person's ability. In short, the more able the person, the better

the chances for success on an item, and the easier the item, the more likely a person is to

solve it (Wright & Stone, 1979). It has been shown that no other mathematical model

allows the estimation of person ability measures ($\beta_v$) and item difficulty calibrations ($\delta_i$)

independent of each other (Anderson, 1973; Barndorff-Nielsen, 1978; Rasch, 1961;

Wright & Stone, 1979). The logistic function (probability of a correct response) in the

Rasch model,

$$P\{x_{vi} = 1 \mid \beta_v, \delta_i\} = \exp(\beta_v - \delta_i) / [1 + \exp(\beta_v - \delta_i)],$$

provides both linearity of scale and generality of measure (Wright & Stone, 1979).

Rasch called this particular characteristic "specific objectivity." The symbols and

associated definitions used in the Rasch model are presented in Figure 1.
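The logistic function of the Rasch model can be written as a short function. This is a minimal sketch; the function name `rasch_probability` and the example values are illustrative assumptions, with `beta` standing for person ability and `delta` for item difficulty, both in logits.

```python
import math


def rasch_probability(beta, delta):
    """Probability of a correct response for ability beta and difficulty delta (logits)."""
    # Algebraically the same as exp(beta - delta) / (1 + exp(beta - delta)).
    return 1.0 / (1.0 + math.exp(-(beta - delta)))


# A person whose ability equals the item difficulty has an even chance.
print(rasch_probability(0.0, 0.0))
# A person one logit above the item difficulty succeeds more often.
print(rasch_probability(1.0, 0.0))
```

Only the difference between ability and difficulty enters the function, so adding the same constant to both leaves the probability unchanged.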

$\beta$: ability level
$\delta$: difficulty level
$r_v$: test score of person $v$
$L$: the number of items in the test
$H$: the average difficulty level of the test
$M$: mean person ability
$\omega^2$: the variance in difficulties of the test items
$i$: an item on the test
$p_i$: sample p-value of item $i$
$v$: person
$\beta_v$: ability level of person $v$
$r$: score on the test
[symbol illegible]: individual item difficulty
$\delta_i$: item difficulty level
$x_{vi}$: person response

Figure 1. Rasch model notations.

Several Rasch measurement models have been identified: (a) rating scale, (b)

Poisson, (c) binomial, (d) dichotomous, (e) partial credit, and (f) many-faceted.

However, for a measurement model to work, there must be some method of estimating its

parameters. Rasch identified six estimation methods: (a) the LOG method, (b) the PAIR

method, (c) the FCON method, (d) the UCON method, (e) the PROX method, and (f) the

UFORM method. Of the six estimation methods, PROX is the only estimation procedure

in which item and person parameters can be easily calculated by hand.

PROX is a normal approximation estimation procedure that expresses item

difficulty calibrations and person ability measures on a common linear scale. This

measurement unit is called a logit. The procedure assumes that: (a) person abilities ($\beta_v$)

are more or less normally distributed [with mean ($\mu$) and standard deviation ($\sigma$)], and (b)

item difficulties ($\delta_i$) are assumed to be more or less normally distributed with average

difficulty ($H$) and standard deviation ($\omega$). Consequently, the effects of the sample on

item difficulty calibrations and that of test length on person ability measures can be

summarized by means and standard deviations on the variable being measured (Wright &

Masters, 1982). The PROX estimation procedure frees the scores from the effects of

sample size and test length, then transforms them into a logit measure (Wright & Stone,

1979). The PROX estimation of a person's ability can be found without iteration as

$$b_v = H + (1 + \omega^2/2.89)^{1/2} \ln[r_v/(L - r_v)],$$

with a standard error of

$$SE(b_v) = (1 + \omega^2/2.89)^{1/2} [L / (r_v (L - r_v))]^{1/2},$$

a test height of

$$H = \sum_{i=1}^{L} d_i / L,$$

and a variance estimate of

$$\omega^2 = \left(\sum_{i=1}^{L} d_i^2 - L H^2\right) / (L - 1).$$

Item difficulty $d_i$ can be found as

$$d_i = M + (1 + \sigma^2/2.89)^{1/2} \ln[(1 - p_i)/p_i],$$

with a standard error of

$$SE(d_i) = (1 + \sigma^2/2.89)^{1/2} [1 / (N p_i (1 - p_i))]^{1/2}.$$

The Rasch model uses the logit function,

$$\ln[(1 - p_i)/p_i],$$

to transform the item p-value into a linear equal interval scale. In the Rasch model, $p_i$ is

calculated as

$$p_i = s_i / N,$$

where $s_i$ is the number of satisfactory responses (correct answers) and $N$ the number of

persons. The PROX estimation method is most appropriately used for calibrating new

items, because item difficulties among a sample of new items tend to approximate a

normal distribution, and a sample of persons tends to be normally distributed (Wright &

Stone, 1979).
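The PROX formulas above can be sketched in a few lines of code. This is an illustrative, non-iterative sketch only: the function names, the four-item test, and the person distribution values (mean ability and ability variance) are assumptions supplied for the example, and the code is not the estimation program used in the study.

```python
import math


def prox_person_measures(item_difficulties, raw_scores):
    """PROX ability estimates and standard errors for non-extreme raw scores."""
    L = len(item_difficulties)
    H = sum(item_difficulties) / L  # test height (mean item difficulty)
    omega2 = (sum(d * d for d in item_difficulties) - L * H * H) / (L - 1)
    expansion = math.sqrt(1.0 + omega2 / 2.89)  # adjusts for the spread of item difficulties
    measures = []
    for r in raw_scores:
        if not 0 < r < L:
            raise ValueError("PROX is undefined for zero or perfect scores")
        b = H + expansion * math.log(r / (L - r))
        se = expansion * math.sqrt(L / (r * (L - r)))
        measures.append((b, se))
    return measures


def prox_item_calibrations(p_values, n_persons, mean_ability, ability_variance):
    """PROX item difficulties and standard errors from item p-values."""
    expansion = math.sqrt(1.0 + ability_variance / 2.89)  # adjusts for the spread of abilities
    calibrations = []
    for p in p_values:
        if not 0.0 < p < 1.0:
            raise ValueError("PROX is undefined for p-values of 0 or 1")
        d = mean_ability + expansion * math.log((1.0 - p) / p)
        se = expansion * math.sqrt(1.0 / (n_persons * p * (1.0 - p)))
        calibrations.append((d, se))
    return calibrations


# Hypothetical four-item test and raw scores (illustrative values only).
difficulties = [-1.0, 0.0, 0.0, 1.0]
print(prox_person_measures(difficulties, [1, 2, 3]))
print(prox_item_calibrations([0.25, 0.5, 0.75], n_persons=100,
                             mean_ability=1.0, ability_variance=1.0))
```

Zero and perfect scores are rejected because the log-odds term is undefined for them; how such cases are handled in practice depends on the estimation program.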

It has been found that Rasch estimation methods are unbiased, consistent,

efficient, and sufficient (Anderson, 1973; Andrich, 1988; Haberman, 1977; Wright, 1977;

Wright & Stone, 1979). Therefore, the Rasch estimation methods are preferred over

those of the traditional classical true score model. In the traditional classical true score

approach, a person's ability is based on a total test score, usually expressed as total

correct, a percentage of 100, or a percentile rank. The total test score has been shown to

reflect ordinal and curvilinear characteristics which are not conducive to meaningful

interpretation.

Rationale for Study

The measurement of individual differences based on the classical true score

approach and its associated statistics has been found to be biased, inefficient, inconsistent,

and insufficient, especially with nonnormal distributions. The most often used scales of

measurement (nominal, ordinal, and interval) have arbitrary zero points and unequal

measurement intervals, and the results are sample specific. In addition, a single error term

(standard error of measurement) is used for all examinees (Allen & Yen, 1979).

Consequently, there is no way to identify specific objectivity among items and persons

(independence of person ability and item difficulties) using the classical true score

approach.

Test data are useful only if there is some correspondence between the items on the

test and the latent trait being measured. In addition, the data should fit the measurement

model used in constructing the test. In the classical true score approach, chi-square ($\chi^2$) and

the point biserial correlation are used as indexes of goodness of fit. Chi-square is used to

test the difference between observed and expected events, and the point biserial correlation

is sometimes used as an index of fit for items on a test. The problem with $\chi^2$ as a fit index

is that there are different sampling distributions as the degrees of freedom change. With the

point biserial correlation, it is unclear what magnitude of value is needed to establish an

acceptable item, and it is sensitive to the score distribution of the sample. The Rasch model

has overcome many of the problems associated with the classical true score approach:

1. A standard error of measurement is provided for each examinee and item.

2. The standard error of measurement can be tested for significance.

3. The measurement scale has an absolute zero point and equal interval units.

4. The item and person parameters are independent.

5. The parameter estimators are unbiased, consistent, efficient, and sufficient.

Given these advantages, the Rasch model provides a more accurate estimate of a person's

ability than does the classical true score approach (Allen & Yen, 1979; Wright & Stone,

1979), and it allows for the independent diagnosis of measurement problems associated

with items, persons, and item by person interactions.

Although several research studies have shown the effects of guessing on Rasch fit

statistics, no studies have used the Logit Residual Index (LRI) in conjunction with item fit

statistics in helping to identify the effects of measurement disturbances. The purpose of this

Monte Carlo simulation is to investigate the effects of varying test parameters and levels of

measurement disturbance on Rasch fit statistics and the Logit Residual Index in the

detection of misfitting items in the single administration of a four option per item multiple

choice test as experienced in a classroom situation. Rasch fit statistics are applied to

simulated data sets of varying sample sizes, test lengths, item difficulty distributions, and

levels of measurement disturbance.
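One way such a data set might be generated is sketched below. The sketch is not the procedure used in this study: the guessing rule (the chance of success never falls below the guessing level), the ability standard deviation of 1, the item difficulty distributions (normal with mean 0 and standard deviation 1, or uniform from -2 to +2), and the function name are all assumptions made for illustration. Only the mean ability of +1 and the guessing levels are taken from the study description.

```python
import math
import random


def simulate_rasch_with_guessing(n_persons, n_items, guess=0.0,
                                 distribution="normal", mean_ability=1.0, seed=0):
    """Simulate dichotomous responses under the Rasch model with an assumed guessing floor."""
    rng = random.Random(seed)
    # Person abilities: mean from the study description, SD of 1 assumed.
    abilities = [rng.gauss(mean_ability, 1.0) for _ in range(n_persons)]
    if distribution == "normal":
        difficulties = [rng.gauss(0.0, 1.0) for _ in range(n_items)]     # assumed N(0, 1)
    elif distribution == "uniform":
        difficulties = [rng.uniform(-2.0, 2.0) for _ in range(n_items)]  # assumed range
    else:
        raise ValueError("distribution must be 'normal' or 'uniform'")
    responses = []
    for beta in abilities:
        row = []
        for delta in difficulties:
            p = 1.0 / (1.0 + math.exp(-(beta - delta)))  # Rasch probability
            p = max(p, guess)  # assumed rule: success chance never drops below `guess`
            row.append(1 if rng.random() < p else 0)
        responses.append(row)
    return abilities, difficulties, responses


# Example: 25 persons, 50 items, 25% guessing, uniform item difficulties.
abilities, difficulties, responses = simulate_rasch_with_guessing(
    25, 50, guess=0.25, distribution="uniform")
print(sum(map(sum, responses)) / (25 * 50))  # overall proportion correct
```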

Research Questions

Although Rasch estimation models possess the desirable characteristics or

properties of estimators, what effects do measurement disturbances have on Rasch item

fit statistics and the LRI when varying test lengths, sample sizes, item difficulty

distributions, and levels of guessing as a measurement disturbance? To test the effects of

guessing and varying test parameters on Rasch item fit statistics and the LRI, the following

research question is addressed:

1. What effect do medium and high levels of guessing have on Rasch item fit

statistics and the LRI when varying sample sizes, test lengths, and item difficulty

distributions?

Definition of Terms

Chi square ($\chi^2$): a descriptive measure of the magnitude of the discrepancies between the observed and expected frequencies.

Consistency: an estimate more closely approximates the population parameter as the sample size increases.

Estimator: a statistic used to determine some characteristic about a population or sample.

Efficiency: the sampling error associated with a given estimator is less than the sampling error associated with any other method of estimation.

Fit: the degree to which measurement data approximate the assumptions or characteristics of a particular measurement model.

Latent Trait: an ability or characteristic possessed by an individual that cannot be directly observed.

Infit: the weighted total fit statistic.

Logit: a unit of measurement used in the Rasch model.

Logit Residual Index: the sum of the chi-squares for an individual item. Provides an indication of variations (steepness or flatness) in the item characteristic curve.

Measurement: the assignment of numerals to objects or events according to rules.

Measurement Disturbances: conditions that interfere with the measurement of some underlying psychological construct (aptitude, ability, or attitude).

Outfit: the unweighted total fit statistic.

Overfit: items with negative total fit statistics and steeper observed item characteristic curves than predicted.

Parameters: characteristics about a population.

Person ability: the amount of a specific trait possessed by an individual that enables that person to answer a test question correctly.

Plodding: to work slowly on a test and run out of time before attempting all items.

Point biserial correlation: the correlation between a continuous variable and a dichotomous variable (correlation between an item score and the total test score).

Population: all members in an entire group having some common characteristic.

PROX: a normal approximation estimation procedure that expresses item difficulty calibrations and person ability measures on a common linear scale.

Start-up: reduced performances at the beginning of a test due to unfamiliarity, anxiety, and so on.

Statistics: characteristics about a sample.

Unbiased: a statistic shows no systematic tendency to be either greater than or less than the population parameter.

Underfit: items with positive total fit statistics and flatter observed item characteristic curves than predicted.

Delimitations

For this study, specific parameters related to test lengths, sample sizes

(examinees), and item difficulty distributions were selected based on a review of previous

research. The study is limited to three sample sizes (25, 50, and 100), three test lengths

(25, 50, and 100), three levels of guessing (no guessing [0%], 25%, and 50%), and two

item difficulty distributions (normal and uniform). In addition, the results are based on

experimental conditions that simulate the single administration of four option per item

multiple choice tests as experienced in classroom situations.
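The factorial design described above can be enumerated directly. In the sketch below, the grouping of experiments by item difficulty distribution and then by guessing level follows the grouping shown in the List of Tables; the ordering of sample sizes and test lengths within each block of nine, and all names in the code, are assumptions.

```python
from itertools import product

DISTRIBUTIONS = ("normal", "uniform")
GUESS_LEVELS = (0.0, 0.25, 0.50)
SAMPLE_SIZES = (25, 50, 100)
TEST_LENGTHS = (25, 50, 100)


def experimental_conditions():
    """List every combination of distribution, guessing level, sample size, and test length."""
    conditions = []
    design = product(DISTRIBUTIONS, GUESS_LEVELS, SAMPLE_SIZES, TEST_LENGTHS)
    for number, (dist, guess, n_persons, n_items) in enumerate(design, start=1):
        conditions.append({
            "experiment": number,  # numbering within each block of nine is assumed
            "distribution": dist,
            "guess": guess,
            "sample_size": n_persons,
            "test_length": n_items,
        })
    return conditions


conditions = experimental_conditions()
print(len(conditions))  # 2 x 3 x 3 x 3 combinations
print(conditions[0])
```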

CHAPTER 2

REVIEW OF THE LITERATURE

Historical Perspective

"Thorndike (1918) said, whatever exists at all exists in some amount. To know it

thoroughly involves knowing its quantity as well as its quality" (Crocker & Algina, 1986,

p. 4). Psychological constructs, however, are hypothetical abstractions that can be

observed only indirectly and the existence of which can never be fully confirmed.

Stevens (1946) defined measurement as the assignment of numerals to objects or events

according to rules—hence, the use of nominal, ordinal, interval, and ratio scales of

measurement. Lord and Novick (1968) and Torgerson (1958) noted that measurement

applied to the properties of objects rather than the objects themselves. Accordingly,

when we measure an individual or object, we are measuring not the object or person, but

rather the properties that define the construct or variable possessed by the person whose

performance is being measured. Thus, the measurement of such abstractions presents

several problems (Crocker & Algina, 1986).

To empirically investigate the existence of a trait or property, it is necessary to

develop a test theory to guide the investigation. Based upon test theory, we develop tests,

the primary tools by which we collect information about individual differences.

However, before any measurement can be made, an operational definition of the variable

of interest must be established. In other words, we must establish some correspondence

between the test items and the construct being measured. This correspondence is known

as establishing an operational definition. In the literature, this is sometimes referred to as

item-objective congruence (Crocker & Algina, 1986). The operational definition or

common line of inquiry allows the test to define the variable being measured and provide

a means for estimating the location of the person taking the test along an ability

continuum based on his/her test score (Wright & Stone, 1979). Test scores are

meaningful only if (a) they relate to some scale of measurement; (b) they are

generalizable beyond the test; and (c) the response pattern is consistent with expectations.

Binet, Thurstone, Thorndike, Stevens and others were among the first to develop

scales of measurement (Crocker & Algina, 1986). These scales provided rules and

meaningful units of measurement for the comparison of individual items and persons in

the assessment of individual differences. These scales are predominately used in the

assessment of individual differences in the classical true score approach to measurement.

In the classical true score approach, a person's observed score (X) is the sum of two

unobservable scores, a true score (T) and an error score (E). The observed score is

defined as

X = T + E.

The true score is defined as the average score resulting from an infinite number of

repeated testings with the same instrument. The error score is the difference between the

observed score and the true score. Measurements obtained using this approach are

usually expressed as correlations, percentile ranks, z-scores, t-scores, or scaled scores. It

has also been shown that these statistics are sample specific. The disadvantages of the

classical true score approach are that a single error term (standard error of measurement)

is used for all examinees and that the item difficulties are related to the number of persons

who answered the items correctly. It has also been shown that, as the reference group

changes, so does the measured performance of the person taking the test. To what

degree of certainty then are these measurements valid for generalizations beyond the

reference sample?

What was needed was a test theory or model that allowed for the separate and

independent estimation of both item and person parameters, something the classical true

score approach did not take into account. One such model was introduced by Rasch in

1960 along with a true linear scale of measurement (Rasch, 1980). When using the Rasch

model, generalizations beyond the test are based on several assumptions:

1. The test theory used in the development of the test is appropriate.

2. The items on the test define the variable being measured.

3. The test score gives us some indication of the properties that define the

variable that is possessed by the person taking the test.

4. The scale of measurement is linear.

5. The item and person parameters are independent.

Thus, when the Rasch model is fitted properly, the criteria of independence of sample and

items are satisfied, and generalizations beyond the test can be made.

Measurement Disturbances

Measurement disturbances are conditions that interfere with the measurement of

some underlying psychological construct, such as aptitude, ability, or attitude (Smith,

1991b). With respect to the Rasch model, only two conditions determine the outcome of

the interaction between the person and any item on the test: (a) the amount of the trait

possessed by the person and (b) the amount of the trait necessary to provide a certain

response to a given stimulus (Smith, 1991b). These conditions are commonly referred to

as person ability and item difficulty. Any other condition that influences measurement is

considered noise in the measurement process.

Measurement disturbances that are characteristics of the person and independent

of the items include, but are not limited to, (a) start-up, (b) plodding, (c) cheating,

(d) illness, (e) external distractions, (f) guessing, (g) boredom, and (h) fatigue (Smith,

1991b). Measurement disturbances associated with the interaction of the person and the

properties of the items are (a) guessing, (b) sloppiness, (c) item content, (d) item type,

and (e) item bias. Examples of the third type of measurement disturbance may include

such things as typographical errors, unrelated answer choices, and items unrelated to

content. The most common types of measurement disturbances are cheating and guessing

(Smith, 1991b).

Thorndike (1949) developed a list of possible disturbances to the measurement

process. Smith (1985) later classified measurement disturbances into three general

categories: (a) disturbances that are the results of characteristics of the person that are

independent of the items, (b) disturbances that are the interaction between the

characteristics of the person and the properties of the items, and (c) disturbances that are

the results of the properties of the items that are independent of the characteristics of the

person. The classification of measurement disturbances is important in that the source of

measurement disturbances dictates the techniques necessary to detect its presence

(Smith, 1991b).

Glaser (1949, 1952) and Mosier (1941) felt that a person would exhibit

consistently correct answers to relatively easy items, consistently incorrect responses to

difficult items, and inconsistent responses to items centered on their ability level. Since

inconsistent responses could be associated with measurement disturbances, Thurstone and

Chave (1929) believed that some criterion should be established such that inconsistent

records should be eliminated from the tabulation. Thus, persons with perfect scores (all

correct) and persons with no items correct (score of zero) are eliminated from Rasch item

and person analysis.

The detection of measurement disturbances can be divided into two general

categories: an investigation of the structure of the entire response matrix (an investigation

of the fit of the responses to individual items [item fit]), and an investigation of the fit of

the responses for an individual person (person fit) (Smith, 1991b). Once a measurement

disturbance has been detected, there are four possible responses: (a) ignore the problem,

(b) assume everyone has the problem and make a correction, (c) use some method of

robust estimation, or (d) use the available information about the items and the people to

make a systematic analysis of each individual's response patterns. If a measurement

disturbance is noticed with person analysis, there are four possible alternative actions:

(a) accept the original estimate, (b) modify the response pattern and re-estimate ability,

(c) report only subset ability estimates and no total ability, or (d) decide that there is not

enough information in the responses to report any ability estimate (Smith, 1991b).

An analysis of fit for the entire response matrix does not require additional

information about the items or persons. It can be based solely on the observed responses,

but it is more useful when based on some characteristic of the persons (age, gender,

native language, or ethnic origin). These characteristics can be used to create subgroups

of persons that can be used to test the invariance of the item difficulty parameters. Person

fit analysis is more useful when based on groups of items that have the potential to evoke

measurement noise in certain groups of persons. However, there are some measurement

disturbances that cannot be easily identified in either items or persons (Smith, 1991b).

In 1982 Smith compared the weighted and unweighted between fit statistics as

applied to persons and found that (a) the mean and the standard deviation of the two

statistics were almost identical; (b) the correlation between the two fit statistics was very

high (r = .99); (c) the Type I error rates were almost identical; and (d) there was high

correspondence between items and persons identified as misfitting by the two fit statistics

(Smith, 1991b).

Anderson (1973), Gustafsson (1980), and Wollenberg (1982) suggested the use of

the likelihood ratio chi-square test as an alternative to the between fit statistic because the

distributional properties of the Pearson chi-square are not known. Smith and Hedges

(1982) demonstrated that the distributions of the Pearson chi-square and the likelihood

ratio chi-square were almost identical in data simulated to fit the Rasch model and data

designed to simulate various forms of measurement disturbances.

Smith (1991a) examined the effects of test length, number of persons, item

difficulty distributions, person abilities, and the number of steps in each item on the mean

squares. The results suggested that (a) item responses are discrete rather than continuous

variables and have little influence on the distribution of the fit statistics; and (b) estimated

item and person parameters appear to have little effect on the mean of the fit statistics, but

seem to reduce the standard deviation. Further simulations by Smith (1991a) studied the

effects of test length, number of persons, range of item difficulties, and offset between

person ability and item difficulty distributions. The results suggested correction factors

for the restrictions imposed by the use of both estimated item difficulties and person

abilities in the fit analysis. Because there was a magnitude of difference between the

weighted and unweighted versions of the fit statistics, two correction factors were

proposed.

Smith (1988b) performed several simulations to assess the distributional

properties of the weighted and unweighted item between fit statistics. These simulations

involved 10 replications of 1,000 persons taking a 20-item test, with the item difficulties

uniformly distributed from -1 to +1 logits. The results showed that, as the number of

ability groups increased, the mean and standard deviation of the transformed fit values

approached the hypothesized values of 0 and 1, respectively. Additional simulations studied the effect of

increasing the number of persons and number of items, varying the dispersion of item

difficulties, and varying the offset between the mean of the item and the person

distributions. The results indicated that, within the ranges studied, varying these factors

had little effect on the distribution of the transformed fit values. Thus, there appears to be

no reason to develop correction factors such as those developed for the weighted and

unweighted total fit statistic to correct for the influence of these factors on the distribution

and Type I error rate of the item between fit statistics (Smith, 1991a).

Smith (1988a, 1991a) also studied the power of the total and between item fit

statistics to detect two types of measurement disturbances, item bias and guessing when

unknown. These studies found that the total weighted, unweighted, and between fit

statistics were capable of detecting different types of measurement disturbances. The

between fit statistic was more efficient at detecting item bias than either the unweighted

or weighted total fit statistic. The unweighted and weighted total fit statistics were more

sensitive to disturbances such as guessing and start-up. The primary difference between

the two statistics is that the unweighted version is based on the sum of the squared standardized

residuals, whereas the weighted version is based on the sum of the squared standardized residuals

that have been weighted by the information function. For items far away from the

person's estimated ability, the weighting process makes the weighted total fit statistic less

sensitive to residuals from those items. Systematic identification and evaluation of

measurement disturbances were also demonstrated by Wright (1977), Wright and Stone

(1979), and Wright and Masters (1982). Unless one is looking for a specific type of

measurement disturbance, it seems necessary to use both the total and between fit

statistics in the analysis of item fit information.

Smith (1986, 1988a, 1991b) and Smith and Hedges (1982) studied the power of

the total and between item fit statistics to detect two types of measurement disturbances,

item bias and guessing. These studies found that the total and between item fit statistics

were capable of detecting different types of measurement disturbances. The between fit

statistic was more efficient at detecting item bias, and the total fit statistics were more

sensitive in detecting disturbances such as guessing and start-up.

Rasch Fit Statistics

Prior to the development of computers, the calculation of Rasch fit indexes was

not practical. In fact, the first fit statistic developed for use with Rasch item calibration

computer programs was the overall chi-square statistic (Smith, 1991b).

This statistic was based on the Pearson chi-square typically used in the fit statistics

developed by Wright (1977). The overall chi-square fit statistic was developed to be used

with dichotomously scored test items to assess the fit of the entire data matrix to the

Rasch measurement model rather than assessing the fit of individual items or persons

(Smith, 1991b). The overall chi-square

$$\chi^2 = \sum_{i=1}^{L} \sum_{j=1}^{m} y_{ij}^2$$

is formed by summing a version of the squared standardized residual for the entire matrix, where $m$ is the number of raw score groups ($L - 1$) and $L$ is the number of items on the test, with $(L - 1)(m - 1)$ degrees of freedom (Smith, 1991b; Wright & Panchapakesan, 1969). The standardized residual is defined as

$$y_{ij} = \frac{a_{ij} - r_j P_{ij}}{\sqrt{r_j P_{ij}(1 - P_{ij})}}$$

where $a_{ij}$ is the observed number of correct responses to item $i$ for persons with a raw score $j$, $r_j$ is the number of persons with raw score $j$, and $P_{ij}$ is the probability of a correct response on that item for group $j$ (Smith, 1991b, p. 165).
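The overall chi-square can be illustrated with a short Python sketch. It is not taken from any of the calibration programs discussed here; the function names, the argument layout, and the example values are assumptions made for illustration. Each standardized residual is the observed count minus the expected count, divided by the binomial standard deviation of that count, and the squared residuals are summed over all items and score groups.

```python
import math


def rasch_probability(ability, difficulty):
    """Rasch probability of a correct response (both arguments in logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def overall_chi_square(correct_counts, group_sizes, group_abilities, item_difficulties):
    """Sum squared standardized residuals over items and score groups.

    correct_counts[i][j] : observed number correct on item i in score group j
    group_sizes[j]       : number of persons in score group j
    group_abilities[j]   : logit ability assigned to score group j
    item_difficulties[i] : logit difficulty of item i
    Returns (chi_square, degrees_of_freedom).
    """
    chi_square = 0.0
    for i, difficulty in enumerate(item_difficulties):
        for j, ability in enumerate(group_abilities):
            p = rasch_probability(ability, difficulty)
            expected = group_sizes[j] * p
            variance = group_sizes[j] * p * (1.0 - p)
            residual = (correct_counts[i][j] - expected) / math.sqrt(variance)
            chi_square += residual ** 2
    dof = (len(item_difficulties) - 1) * (len(group_abilities) - 1)
    return chi_square, dof


# Hypothetical example: 2 items, 2 score groups of 10 persons each.
chi_square, dof = overall_chi_square(
    correct_counts=[[7, 5], [5, 5]],
    group_sizes=[10, 10],
    group_abilities=[0.0, 0.0],
    item_difficulties=[0.0, 0.0],
)
```

In this made-up example only one cell departs from expectation (7 observed against 5 expected), so that cell alone contributes to the total.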

Anderson (1973) developed an additional index of overall fit based on the

likelihood-ratio chi-square. Wright and Panchapakesan (1969) also developed a fit

statistic known as the item chi-square, which can be used to test the fit of responses to

individual items. These statistics are referred to as between ability group fit statistics.

Traditionally, the point biserial correlation offered a rough estimate of item fit; however,

this statistic is sample specific. That is, it is dependent upon the score distribution of the

sample. Anderson (1973) and Barndorff-Nielsen (1978) have shown that only item

difficulty is necessary for consistent and sufficient estimates.

Rasch suggested several fit statistics to assess the fit of data to his measurement

models, the weighted total fit and the unweighted total fit statistics. The weighted total

fit statistic is referred to as infit and the unweighted total fit statistic is referred to as

outfit. In the weighted version, a greater weight is given to unexpected responses to

items near the person's logit measure (ability), and in the unweighted version, a greater

weight is given to unexpected responses that are farther away from the person's logit

measure (Wright & Stone, 1979). The total fit statistics evaluate the general agreement

between the variable defined by the item and the variable defined by all other items over

the whole sample. The weighted item total fit statistic was developed to diminish the

effect of anomalous outliers, which results in unusually large mean squares (Smith,

1991b). This is evident when an unexpected number of low-ability persons answer

difficult items correctly and an unexpected number of high-ability persons answer easy

items incorrectly at the beginning of the test. These fit statistics are sensitive to

measurement disturbances, such as guessing, start-up, highly discriminating items, and

very easy items, but are relatively insensitive to systematic disturbances such as bias

(Smith, 1991b).

BICAL, a computer program used to test item fit, was developed by Wright and

Mead (1978). This program uses the unweighted versions of two item fit statistics, the

total and between fit statistics. The between fit statistic is based on the number of ability

groups, and the total fit statistic is based on the item/person residual rather than the

item/ability level residual (Smith, 1991b). Later versions of BICAL introduced a log

transformation in an attempt to standardize the fit statistics to an approximate unit normal

distribution. These transformations were introduced because the mean squares that

indicated possible misfit varied from item to item and analysis to analysis, depending on

the number of persons, item difficulty distributions, and the distribution of person

abilities (Smith, 1991b).

The latest version of BICAL uses a cube root transformation to convert the mean

squares of the unweighted total and between fit statistics into approximate unit normals

(Smith, 1991b). However, these statistics are sensitive to start-up, guessing, large ranges

of item difficulties, person abilities, and easy items, thereby producing large mean

squares (misfit). This latest version also introduces the weighted version of the total item

fit statistic, which replaced the unweighted version. Wright and Masters (1982) expanded

the notion of item fit to two polychotomous Rasch models, the rating scale and partial

credit models. With this addition, Rasch fit statistics are now available for models with

other than dichotomously scored items.

MSCALE, CREDIT, BIGSCALE, and BIGSTEPS are among the most recent

Rasch calibration programs. The primary purpose of these programs is to estimate item

and person parameters from a collection of responses to the items (Smith, 1991b). These

programs contain both the unweighted and weighted item total fit statistics. These two

statistics accentuate different parts of the item-person relationship. Although there is a

high correlation between the two fit statistics, the difference between the two can help

diagnose different types of measurement disturbances. The total fit statistics are more

sensitive to measurement disturbances such as guessing, where unusual numbers of

low-ability examinees give correct answers to difficult items, and start-up, where an

unusually high number of high-ability examinees give incorrect responses to easy items

at the beginning of the test. The total fit statistics are also sensitive to variation in the

item characteristic curve.

IPARM (item and person analysis with the Rasch model) is an item analysis and

person analysis computer program for dichotomous and rating scale data. The major

advantage of IPARM is that it constructs between fit statistics based on characteristics of

the person for item analysis or properties of the items for person analysis. It is the only

software program that provides between fit statistics (unweighted version) for

biographical subpopulations (Smith, 1991b). When biographical data are used in the

analysis, it is a direct test of the invariance of the estimation of the item difficulty

parameter over ability groups (Smith, 1991b). When demographic data are used in the

analysis to create subgroups (sex, race, and age), the resulting statistic will give an

indication of the presence of bias, or differential item familiarity, in response patterns for

the items (Smith, 1991b).
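A between fit statistic for an arbitrary grouping of persons can be sketched as follows. This is an illustration only, not IPARM's implementation: the function name and the group labels are hypothetical. For each subgroup, the squared difference between the observed and expected number of correct responses is divided by the summed binomial variance, and the total is divided by the number of groups minus one.

```python
def between_mean_square(responses, probabilities, groups):
    """Unweighted between-group mean square for a single item.

    responses[n]     : observed 0/1 response of person n
    probabilities[n] : Rasch probability of success for person n
    groups[n]        : subgroup label of person n (hypothetical labels)
    """
    sums = {}
    for x, p, g in zip(responses, probabilities, groups):
        observed, expected, variance = sums.get(g, (0.0, 0.0, 0.0))
        sums[g] = (observed + x, expected + p, variance + p * (1.0 - p))
    if len(sums) < 2:
        raise ValueError("at least two groups are required")
    total = sum((obs - exp) ** 2 / var for obs, exp, var in sums.values())
    return total / (len(sums) - 1)


# Hypothetical example: group "a" answers better than expected, group "b" worse.
ms = between_mean_square(
    responses=[1, 1, 0, 0],
    probabilities=[0.5, 0.5, 0.5, 0.5],
    groups=["a", "a", "b", "b"],
)
```

A mean square well above 1 for a demographic grouping suggests that the item difficulty is not invariant across the subgroups, which is the pattern the text associates with bias.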

In item analysis with IPARM, the software first calculates the item mean squares

associated with the Rasch fit statistics then converts them to their associated fit statistic

with a cube root transformation. The unweighted mean square for item $i$ ($UMS_i$) is defined as

$$UMS_i = \frac{1}{N} \sum_{n=1}^{N} \frac{(X_{ni} - P_{ni})^2}{P_{ni}(1 - P_{ni})}$$

where $N$ is the number of people, $X_{ni}$ is the observed response of person $n$ to item $i$, and $P_{ni}$ is the response predicted from the logit difficulty of the item and the logit ability of the person (Smith,

1991b). In the Rasch model, the probability of a correct response $X_{vi} = 1$ by person $v$ with ability $\beta_v$ to item $i$ with difficulty $\delta_i$ can be found as

$$P\{X_{vi} = 1 \mid \beta_v, \delta_i\} = \frac{\exp(\beta_v - \delta_i)}{1 + \exp(\beta_v - \delta_i)}$$

(Wright & Stone, 1979), or

$$P_{ij} = \frac{\exp(b_j - d_i)}{1 + \exp(b_j - d_i)},$$

where $b_j$ is the ability measure for persons in score group $j$ and $d_i$ is the difficulty of item $i$ (Smith, 1991b, p. 153).

Although these formulas appear to be considerably different, they yield the same results

provided the person's ability measure is the same. The standard deviation of the

unweighted total mean square for an item can be found as

$$S[MS(UT)_i] = \frac{\left[\sum_{n=1}^{N} \frac{1}{W_{ni}} - 4N\right]^{1/2}}{N}$$

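The weighted mean square for item $i$ ($WMS_i$) can be calculated as

$$WMS_i = \frac{\sum_{n=1}^{N} (X_{ni} - P_{ni})^2}{\sum_{n=1}^{N} W_{ni}}$$

where $W_{ni}$, the weighting function, can be calculated as

$$W_{ni} = P_{ni}(1 - P_{ni})$$

with a standard deviation of

$$S[MS(WT)_i] = \frac{\left[\sum_{n=1}^{N} W_{ni} - 4\sum_{n=1}^{N} W_{ni}^2\right]^{1/2}}{\sum_{n=1}^{N} W_{ni}}$$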

(Smith, 1991b). The unweighted between mean square for item $i$ ($UBMS_i$) is defined as

$$UBMS_i = \frac{1}{J - 1} \sum_{j=1}^{J} \frac{\left(\sum_{n \in j} X_{ni} - \sum_{n \in j} P_{ni}\right)^2}{\sum_{n \in j} P_{ni}(1 - P_{ni})}$$

where $J$ is the number of score groups, the inner sums run over the $N_j$ persons in score group $j$, $X_{ni}$ is the observed response for person $n$, and $P_{ni}$ is the predicted response for person $n$ (Smith, 1991b, p. 32). The unweighted between standard deviation can be approximated by

$$\left[\frac{2}{J - 1}\right]^{1/2}$$

Once calculated, the mean squares are converted to unit normal fit statistics by the following cube root transformation formula:

$$t = \left(V^{1/3} - 1\right)\frac{3}{S} + \frac{S}{3}$$

where $V$, the mean square, and $S$, the standard deviation, are the values associated with the mean square under consideration (Smith, 1991b; Wright & Masters, 1982). The resulting fit statistics have an expected mean of 0 and a standard deviation of 1.
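The total fit computations described in this section can be sketched in Python for a single dichotomous item. The sketch is not IPARM's code: the function names are hypothetical, the person abilities and item difficulty are treated as known, and returning NaN when a standard deviation is zero is a choice made here for illustration. It computes the unweighted and weighted mean squares, their standard deviations, and the cube root transformation $t = (V^{1/3} - 1)(3/S) + S/3$.

```python
import math


def rasch_probability(ability, difficulty):
    """Rasch probability of a correct response (both arguments in logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def cube_root_t(mean_square, sd):
    """Cube root transformation of a mean square to an approximate unit normal."""
    if sd <= 0.0:
        return float("nan")  # assumption: undefined when the SD is zero
    return (mean_square ** (1.0 / 3.0) - 1.0) * (3.0 / sd) + sd / 3.0


def total_fit(responses, abilities, difficulty):
    """Return (unweighted_ms, weighted_ms, unweighted_t, weighted_t) for one item."""
    n = len(responses)
    p = [rasch_probability(b, difficulty) for b in abilities]
    w = [pi * (1.0 - pi) for pi in p]                       # information weights
    sq = [(x - pi) ** 2 for x, pi in zip(responses, p)]     # squared residuals

    unweighted_ms = sum(s / wi for s, wi in zip(sq, w)) / n
    weighted_ms = sum(sq) / sum(w)

    # max(..., 0.0) guards against tiny negative values from rounding.
    sd_u = math.sqrt(max(sum(1.0 / wi for wi in w) - 4.0 * n, 0.0)) / n
    sd_w = math.sqrt(max(sum(wi - 4.0 * wi * wi for wi in w), 0.0)) / sum(w)

    return (unweighted_ms, weighted_ms,
            cube_root_t(unweighted_ms, sd_u), cube_root_t(weighted_ms, sd_w))


# Hypothetical example: four persons, responses consistent with their abilities.
ums, wms, t_unweighted, t_weighted = total_fit(
    responses=[0, 0, 1, 1], abilities=[-1.0, 0.0, 1.0, 2.0], difficulty=0.0
)
```

Because every response in the example agrees with the more likely outcome, both mean squares fall below 1 and both standardized values are negative, the direction the text associates with overfit.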

Logit Residual Index

The Logit Residual Index (LRI) provides a reference as to the flatness or steepness

of the item characteristic curve (ICC). It indicates the linear trend of the residuals summed over persons for each item. The LRI can be calculated as

$$LRI_i = \frac{\sum_{n=1}^{N} (Y_{ni} - Y_{.i})(b_n - d_i)}{\sum_{n=1}^{N} (b_n - d_i)^2}$$

where $d_i$ is the difficulty of the item, $b_n$ is the ability of the person, $N$ is the number of persons, $X_{ni}$ is the observed response, and $P_{ni}$ is the predicted response, with the standardized residual $Y_{ni}$ and its mean over persons $Y_{.i}$ given by

$$Y_{ni} = \frac{X_{ni} - P_{ni}}{\sqrt{P_{ni}(1 - P_{ni})}}, \qquad Y_{.i} = \frac{1}{N}\sum_{n=1}^{N} Y_{ni}$$

(Smith, 1991b, p. 30).

As one of the output variables from IPARM, the LRI is a measure of how far an

item deviates from the common slope that is fitted for all items. The index has an

expected value of zero (Smith, 1991b). That is, an item with an ICC that fits the modeled

common curve will have an LRI value of zero. Therefore, an item with an LRI value

greater than zero will have an ICC that is steeper than the modeled curve, and items with

an LRI value less than zero will have an ICC that is flatter than the modeled curve.

(Note: In the traditional classical true score approach, the point biserial correlation is used

to provide an estimate of the slope of the observed item characteristic curve. However,

the point biserial correlation has been found to be sample specific, and no discrete values

have been established to provide an accurate interpretation of the correlation coefficient

as related to the slope of the item characteristic curve).
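The direction of the LRI can be checked with a small Python sketch. This is an illustration under stated assumptions rather than IPARM's implementation: the function name is hypothetical, abilities and the item difficulty are treated as known, and the index is taken to be the slope obtained by regressing the standardized residuals on the offset between person ability and item difficulty.

```python
import math


def logit_residual_index(responses, abilities, difficulty):
    """Linear trend of standardized residuals across (ability - difficulty)."""
    offsets = [b - difficulty for b in abilities]
    residuals = []
    for x, offset in zip(responses, offsets):
        p = 1.0 / (1.0 + math.exp(-offset))
        residuals.append((x - p) / math.sqrt(p * (1.0 - p)))
    mean_residual = sum(residuals) / len(residuals)
    numerator = sum((y - mean_residual) * o for y, o in zip(residuals, offsets))
    denominator = sum(o * o for o in offsets)  # zero only if every offset is zero
    return numerator / denominator


abilities = [-2.0, -1.0, 1.0, 2.0]

# Low-ability persons fail and high-ability persons succeed: steeper than modeled.
steep = logit_residual_index([0, 0, 1, 1], abilities, difficulty=0.0)

# Low-ability persons succeed and high-ability persons fail: flatter than modeled.
flat = logit_residual_index([1, 1, 0, 0], abilities, difficulty=0.0)
```

The first pattern yields a positive index and the second a negative one, matching the interpretation given above.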

IPARM automatically assigns group membership (ability groups) based on the

performance of each person on each item. The program attempts to place an equal

number of persons in each ability group. Negative LRI values indicate that low-ability

groups should have positive residuals and high-ability groups should have negative

residuals. This indicates that low-ability persons performed better than expected and

high-ability persons performed less well than expected. Positive LRI values indicate that

low-ability groups should have negative residuals and high-ability groups should have

positive residuals. This indicates that high-ability persons performed better than expected

and low-ability persons performed less well than expected. Items with negative total fit

statistics tend to have steeper observed ICCs than predicted, indicating an overfit to the

model, and items with positive total fit statistics tend to have flatter observed ICCs than

predicted, indicating an underfit to the model.

Purpose of Study

Smith (1991b) proposed that item fit statistics in Rasch calibration programs

provide a frame of reference for judging item performance and that one way of

establishing this frame of reference is to simulate data that fit the Rasch model over a

variety of test conditions. The present Monte Carlo study investigated the effects of

varying item difficulty distributions, number of persons, number of items, and levels of

guessing on Rasch fit statistics and the LRI. SIMTEST, a data simulation program, and

IPARM, a Rasch item and person analysis program, were used to simulate and test the

effects of varying test conditions on Rasch fit statistics and the LRI.

CHAPTER 3

METHODS AND PROCEDURES

Data Set Construction

For the purpose of this investigation, synthetic data sets were generated using

SIMTEST version 2.1, a software program developed by Stuart Luppescu (1992) for

simulating dichotomous test data. The program allows the user to adjust person abilities,

number of persons, number of items, item difficulty, bias, start-up, guessing, and slope

(discrimination) parameters. Once the parameters have been set, the program randomly

generates dichotomous data sets based on the input parameters. If bias or start-up is used

in the analyses, bias is added, or start-up (i.e., reduced performance at the beginning of

the test due to unfamiliarity, anxiety, etc.) is subtracted from some of the interactions.

When guessing is introduced into the analyses, expected values are calculated according

to the Rasch model (with weighting if there are slopes not equal to 1.0), and dichotomous

test item responses are produced.

Simulated Data Sets

Dichotomous test data were simulated in varying test length, number of persons,

and item difficulty distributions. A 3 × 3 × 2 design was used: three test lengths (25, 50, and 100 items), three sample sizes (25, 50, and 100 persons), and two item difficulty distributions (normal and uniform). To test the effect of guessing on item fit

statistics, the design was subjected to three experimental conditions: (a) no guessing, (b) a

25% chance of guessing correctly, and (c) a 50% chance of guessing correctly.

A SIMTEST batch file was written for each of 54 experiments. The first

experimental condition involved 25 items, 25 persons, normally distributed item

difficulties, and no guessing. The control file for this experiment was as follows:

SIMTEST -H -OEXP1.DAT -S2.0 -N0 -DN -I25 -P25 -F0 -M0.0 -A0.0 -U0.0 -T-1E200 -C0 -L1.0

The following is a description of the parameters used in the SIMTEST control file

to simulate the data: (a) the batch mode (-H or hands-off switch); (b) output file name (-OEXP1.DAT); (c) a standard deviation of 2 (-S2.0); (d) no biased items (-N0); (e) normal distribution of item difficulties (-DN); (f) 25 items (-I25); (g) 25 persons (-P25); (h) no persons with biased scores (-F0); (i) a mean person ability measure of zero (-M1.0); (j) no startup reduction (-A0); (k) no persons with startup (-U0); (l) threshold for guessing (-T-1E200); (m) the chance of guessing correctly (-C0); and (n) a slope of 1 [the Rasch model] (-L1.0).

The item difficulty distributions for the simulation were set at normal and

uniform. The parameters used to simulate the normally distributed data sets were (a) a

mean person ability of 1; (b) a standard deviation of 2; (c) a slope of 1; (d) three test

lengths (25, 50, and 100); and (e) three levels of guessing (no guessing [0%], 25%, and

50%). For the uniformly distributed data sets, the parameters were (a) a mean person

ability of 1; (b) a standard deviation of 1; (c) a slope of 1; (d) three test lengths (25, 50,

and 100); and (e) three levels of guessing (no guessing [0%], 25%, and 50%). Given

these parameters, 95% of the item difficulties for both the normal and uniform

distributions will fall between ±2 standard deviations, with the greatest concentration

centered on the mean of zero (0). The parameters used to simulate the chances of

guessing the correct response to an item were 25% and 50%. This is equivalent to having

a 1-in-4 and a 2-in-4 chance of guessing the correct answer to an item on a four-option

per item multiple choice test. The discrimination or slope parameter was set at 1,

invoking the program to simulate data appropriate for the Rasch model. Bias as a

measurement disturbance was not used as an experimental condition in the data

simulation; therefore, the control variable used to invoke bias was not entered as a control

variable.
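The kind of data generation described above can be sketched in Python. This sketch does not reproduce SIMTEST: the internal algorithm of that program is not documented here, so the function name, the person ability standard deviation of 1, the uniform range chosen to match the stated standard deviation, and the way guessing is applied (a person who would otherwise miss an item answers correctly with probability equal to the guessing chance) are all assumptions made for illustration.

```python
import math
import random


def simulate_responses(n_persons, n_items, distribution="normal",
                       mean_ability=1.0, difficulty_sd=2.0,
                       guess_chance=0.0, seed=0):
    """Return (abilities, difficulties, data) for one simulated administration."""
    rng = random.Random(seed)

    if distribution == "normal":
        difficulties = [rng.gauss(0.0, difficulty_sd) for _ in range(n_items)]
    elif distribution == "uniform":
        # Assumption: range chosen so that the uniform SD equals difficulty_sd.
        half_width = math.sqrt(3.0) * difficulty_sd
        difficulties = [rng.uniform(-half_width, half_width) for _ in range(n_items)]
    else:
        raise ValueError("distribution must be 'normal' or 'uniform'")

    # Assumption: person abilities are normal with an SD of 1 logit.
    abilities = [rng.gauss(mean_ability, 1.0) for _ in range(n_persons)]

    data = []
    for b in abilities:
        row = []
        for d in difficulties:
            p = 1.0 / (1.0 + math.exp(-(b - d)))
            # Assumed guessing model: success by ability, or else by a lucky guess.
            p_with_guessing = guess_chance + (1.0 - guess_chance) * p
            row.append(1 if rng.random() < p_with_guessing else 0)
        data.append(row)
    return abilities, difficulties, data


# Hypothetical run resembling a 25-item, 25-person condition with 25% guessing.
abilities, difficulties, data = simulate_responses(25, 25, guess_chance=0.25, seed=1)
```

Fixing the seed makes a single simulated administration reproducible, which matters in a design where each condition is simulated only once.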

The experiments are referred to by number, as shown in Table 1. For example,

Experiment 1 consists of a 25-item test with normally distributed item difficulties,

administered to 25 persons with a slope of 1 and no guessing.

Table 1

Definition of Experiments

Experiment  Measurement disturbance  Difficulty distribution  Items  Persons

1           No guessing              Normal                   25     25
2           No guessing              Normal                   50     25
3           No guessing              Normal                   100    25
4           No guessing              Normal                   25     50
5           No guessing              Normal                   50     50
6           No guessing              Normal                   100    50
7           No guessing              Normal                   25     100
8           No guessing              Normal                   50     100
9           No guessing              Normal                   100    100
10          Guessing (.25)           Normal                   25     25
11          Guessing (.25)           Normal                   50     25
12          Guessing (.25)           Normal                   100    25
13          Guessing (.25)           Normal                   25     50
14          Guessing (.25)           Normal                   50     50
15          Guessing (.25)           Normal                   100    50
16          Guessing (.25)           Normal                   25     100
17          Guessing (.25)           Normal                   50     100
18          Guessing (.25)           Normal                   100    100
19          Guessing (.50)           Normal                   25     25
20          Guessing (.50)           Normal                   50     25
21          Guessing (.50)           Normal                   100    25
22          Guessing (.50)           Normal                   25     50
23          Guessing (.50)           Normal                   50     50
24          Guessing (.50)           Normal                   100    50
25          Guessing (.50)           Normal                   25     100
26          Guessing (.50)           Normal                   50     100
27          Guessing (.50)           Normal                   100    100
28          No guessing              Uniform                  25     25
29          No guessing              Uniform                  50     25
30          No guessing              Uniform                  100    25
31          No guessing              Uniform                  25     50
32          No guessing              Uniform                  50     50
33          No guessing              Uniform                  100    50
34          No guessing              Uniform                  25     100
35          No guessing              Uniform                  50     100
36          No guessing              Uniform                  100    100
37          Guessing (.25)           Uniform                  25     25
38          Guessing (.25)           Uniform                  50     25
39          Guessing (.25)           Uniform                  100    25
40          Guessing (.25)           Uniform                  25     50
41          Guessing (.25)           Uniform                  50     50
42          Guessing (.25)           Uniform                  100    50
43          Guessing (.25)           Uniform                  25     100
44          Guessing (.25)           Uniform                  50     100
45          Guessing (.25)           Uniform                  100    100
46          Guessing (.50)           Uniform                  25     25
47          Guessing (.50)           Uniform                  50     25
48          Guessing (.50)           Uniform                  100    25
49          Guessing (.50)           Uniform                  25     50
50          Guessing (.50)           Uniform                  50     50
51          Guessing (.50)           Uniform                  100    50
52          Guessing (.50)           Uniform                  25     100
53          Guessing (.50)           Uniform                  50     100
54          Guessing (.50)           Uniform                  100    100
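The numbering in Table 1 follows a regular nesting: test length varies fastest, then sample size, then level of guessing, and finally the difficulty distribution. A brief Python sketch can regenerate the same numbering; the function and dictionary key names are hypothetical and serve only to illustrate the design.

```python
from itertools import product


def build_experiments():
    """Map experiment number (1-54) to its condition, in Table 1 order."""
    distributions = ["Normal", "Uniform"]   # slowest-varying factor
    guessing_levels = [0.0, 0.25, 0.50]
    persons = [25, 50, 100]
    items = [25, 50, 100]                   # fastest-varying factor

    experiments = {}
    conditions = product(distributions, guessing_levels, persons, items)
    for number, (dist, guess, n_persons, n_items) in enumerate(conditions, start=1):
        experiments[number] = {
            "distribution": dist,
            "guessing": guess,
            "persons": n_persons,
            "items": n_items,
        }
    return experiments


experiments = build_experiments()
```

For instance, `experiments[12]` describes the 100-item, 25-person condition with normally distributed difficulties and a 25% chance of guessing correctly.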

Rasch Analysis

SIMTEST version 2.1, a software program developed by Stuart Luppescu (1992)

for simulating dichotomous test data, was used to generate the 54 simulated experimental

data sets. Once the data were simulated, a BIGSTEPS program was written for each

experimental condition to output a data file containing item difficulty parameters to be

read by IPARM. The BIGSTEPS control program for Experiment 1 (25 items, 25

persons, no guessing, and normally distributed item difficulties) was as follows:

TITLE='EXP1 - ND, NG, 25 ITEMS, 25 PERSONS'
NI=25
NAME1=1
ITEM1=13
DATA=C:\EXP1\EXP1.DAT
CODES=01
TABLES=1010001000100100000000
STBIAS=N
INUM=Y
IFILE=C:\EXP1\EXP1BIG.DAT
&END

The control parameters consisted of (a) the title; (b) the number of items (NI=25);

(c) the beginning column for identification information (NAME1=1); (d) the beginning

column for the data (ITEM1=13); (e) the name of the data file (C:\EXP1\EXP1.DAT); (f)

the possible item response values (CODES=01); (g) the tables included in the output

(TABLES=1010001000100100000000); (h) no statistical bias correction factor

(STBIAS=N); (i) automatic generation of item names (INUM=Y); (j) name of the item difficulty output file (IFILE=C:\EXP1\EXP1BIG.DAT); and (k) the end statement (&END).

IPARM was used to produce Rasch fit statistics and the logit residual indices for the experimental conditions. (See Appendix A for a description of the input parameters required to construct the data analysis control file and initiate the IPARM program.)

Statistical Analysis

In order to determine whether there were significant differences between mean fit

statistics across and within experimental conditions involving varying levels of guessing

and distribution types, two-tailed independent t-tests were conducted using the formula

for independent samples at the .05 level of significance (Ferguson, 1981). The

experimental conditions were no guessing (0%), 25%, and a 50% chance of guessing the

correct answer in normal and uniformly distributed item difficulty distributions. The

sample n used in the t-test analyses was the number of experimental conditions (9 total)

involved at each level of guessing.
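The comparison of mean fit statistics can be illustrated with a short Python sketch of a two-sample t statistic. The pooled-variance form used here is the standard independent-samples formula and is assumed, not confirmed, to match the one taken from Ferguson (1981); the function name and the example values are hypothetical.

```python
import math


def independent_t(sample_a, sample_b):
    """Pooled-variance t statistic and degrees of freedom for two samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / n_a
    mean_b = sum(sample_b) / n_b
    ss_a = sum((x - mean_a) ** 2 for x in sample_a)
    ss_b = sum((x - mean_b) ** 2 for x in sample_b)
    dof = n_a + n_b - 2
    pooled_variance = (ss_a + ss_b) / dof
    standard_error = math.sqrt(pooled_variance * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / standard_error, dof


# Hypothetical values standing in for mean fit statistics at two guessing levels.
t_value, dof = independent_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

With nine experimental conditions per level of guessing, each comparison has 16 degrees of freedom, for which the two-tailed critical value at the .05 level is approximately 2.12.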

Chi-square was used to determine whether an increased probability of guessing

the correct answer increased the frequency of misfitting items detected by Rasch fit

statistics (Ferguson, 1981). The sample n used in the chi-square (χ²) analyses was the levels of

guessing (3 total).
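A chi-square test on the frequency of misfitting items can be sketched as a Pearson test of independence on a table of counts. The layout below (rows for outcome, columns for condition) and the counts themselves are hypothetical and are not results from this study; the function name is likewise an assumption.

```python
def pearson_chi_square(table):
    """Pearson chi-square and degrees of freedom for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / grand_total
            chi_square += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, dof


# Hypothetical counts: rows are misfitting and fitting items, columns are conditions.
chi_square, dof = pearson_chi_square([[10, 20], [20, 10]])
```

For a table with two outcome rows and three levels of guessing, the test has 2 degrees of freedom, and the critical value at the .05 level is approximately 5.99.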

CHAPTER 4

RESULTS

Effects of Guessing on Rasch Fit Statistics

To test the effects of guessing on Rasch fit statistics and the Logit Residual Index

(LRI), experimental conditions were simulated, with varying test lengths (25, 50, and

100), persons (25, 50, and 100), levels of guessing (no guessing [0%], 25%, and 50%), and distribution types (normal and uniform), resulting in 54 experiments. Item and person parameters were not considered as effect factors but were used to simulate test-taking conditions. Experiments 1-27 involved normally distributed item difficulty distributions with varying levels of guessing (1-9, no guessing [0%]; 10-18, 25%; and 19-27, 50%), and Experiments 28-54 involved uniformly distributed item difficulty distributions with varying levels of guessing (28-36, no guessing [0%]; 37-45, 25%; and 46-54, 50%). The LRI has an expected value of 0; therefore, the mean and S.D. for this

index are not reported in the summary tables. A summary of item fit information by

experiment is presented in Appendix C. Summaries of mean item fit information by

experiment and level of guessing are presented in Tables 2 through 8. A brief description

of the table format used to display the data is presented in Figure 2.

Column 1 - Experiment number
Column 2 - Number of items
Column 3 - Logit item difficulty
Column 4 - Point biserial correlation
Column 5 - Unweighted total fit statistic (outfit statistic)
Column 6 - Weighted total fit statistic (infit statistic)
Column 7 - Ability between fit statistic (ability groups)
Column 8 - Mean item score (proportion correct)
Column 9 - Logit Residual Index (indicates variations in the ICC)
Mean - Average value for each column, located at the base of the table
S.D. - Standard deviation

Figure 2. Table format for the display of experimental data.

Table 2

Mean Summary of Item Fit Information for Experiments 1-9 (Normally Distributed Item Difficulty Distributions and No Guessing)

Exp. #   # of     Logit    Point    Unwt.    Wt.      Ability  Mean     Logit
         items    item     bis.     total    total    between  item     residual
                  diff.    corr.    fit      fit      fit      score    index

25 Persons
1        25       -0.16    0.41     0.20     -0.05    0.20     0.61     -0.07
2        50       -0.10    0.41     0.10     0.01     -0.03    0.66     0.00
3        100      0.29     0.39     0.15     0.00     0.03     0.67     -0.01

50 Persons
4        25       -0.37    0.41     0.12     -0.05    0.01     0.64     0.00
5        50       -0.09    0.39     0.05     0.08     0.00     0.65     0.03
6        100      -0.13    0.38     0.00     0.06     0.04     0.62     0.01

100 Persons
7        25       -0.21    0.42     0.11     -0.03    0.04     0.65     0.00
8        50       -0.09    0.31     0.13     0.06     0.08     0.67     -0.03
9        100      0.00     0.40     0.07     0.05     0.01     0.63     0.00

Mean              -0.10    0.39     0.10     0.01     0.04     0.64
S.D.              0.18     0.03     0.06     0.05     0.07     0.02

A 25% chance of guessing the correct answer was randomly introduced into

Experiments 10-18 as a measurement disturbance, while the item and person parameters

(test lengths and group membership) remained constant. A summary of mean item fit

information is shown in Table 3.

Table 3

Mean Summary of Item Fit Information for Experiments 10-18 (Normally Distributed Item Difficulty Distributions and a 25% Chance of Guessing Correctly)



Exp.   # of    Logit    Point    Unwt.      Wt.  Ability     Mean    Logit
   #  items     item     bis.    total    total  between     item residual
               diff.    corr.      fit      fit      fit    score    index

25 Persons
  10     25    -0.30     0.40     0.19    -0.08     0.15     0.65    -0.06
  11     50    -0.21     0.33     0.03    -0.00     0.22     0.61     0.03
  12    100    -0.32     0.29     0.01    -0.05     0.18     0.62     0.02

50 Persons
  13     25    -0.16     0.37     0.16     0.06     0.34     0.68    -0.04
  14     50    -0.17     0.36     0.05     0.01     0.13     0.64     0.01
  15    100    -0.22     0.35     0.04     0.03    -0.06     0.65     0.02

100 Persons
  16     25    -0.19     0.34    -0.04    -0.05     0.21     0.64     0.04
  17     50    -0.10     0.33     0.07     0.01     0.19     0.64     0.00
  18    100    -0.15     0.34     0.08    -0.02     0.22     0.64    -0.01

Mean           -0.20     0.35     0.07    -0.01     0.18     0.64
S.D.            0.07     0.03     0.07     0.04     0.11     0.02


A 50% chance of guessing the correct answer was randomly introduced into

Experiments 19-27 while holding all other conditions constant. A summary of mean fit

information is presented in Table 4.

Table 4

Summary of Mean Item Fit Information for Experiments 19-27 (Normally Distributed Item Difficulty Distributions and a 50% Chance of Guessing Correctly)

Exp.   # of    Logit    Point    Unwt.      Wt.  Ability     Mean    Logit
   #  items     item     bis.    total    total  between     item residual
               diff.    corr.      fit      fit      fit    score    index

25 Persons
  19     25    -0.31     0.40     0.08    -0.04     0.29     0.63     0.01
  20     50    -0.28     0.37     0.07     0.06     0.12     0.67     0.00
  21    100    -0.14     0.27     0.06     0.01     0.24     0.62     0.02

50 Persons
  22     25    -0.36     0.38     0.12    -0.04     0.13     0.64     0.02
  23     50    -0.18     0.38     0.18    -0.01     0.06     0.61    -0.05
  24    100    -0.09     0.38     0.10     0.01     0.14     0.64     0.00

100 Persons
  25     25     0.00     0.36     0.12    -0.02     0.19     0.64     0.01
  26     50    -0.09     0.33     0.06    -0.01     0.13     0.66     0.00
  27    100    -0.10     0.32     0.08     0.01     0.00     0.63     0.00

Mean           -0.17     0.35     0.10     0.00     0.14     0.64
S.D.            0.12     0.04     0.04     0.03     0.09     0.02

To determine whether levels of guessing had an effect on mean Rasch fit statistics

within normally distributed experimental conditions, comparisons were made between

the mean fit values obtained at each level of guessing using an independent t-test at the

.05 level of significance. Item and person parameters remained constant across

experimental conditions. Mean fit values for Rasch fit statistics by experimental

conditions (experiments and levels of guessing) are presented in Table 5.

Table 5

Mean Differences for Experiments With Normally Distributed Item Difficulties With No Guessing, 25%, and a 50% Chance of Guessing Correctly

                            Unwt. total       Wt. total         Between
Experiments   % Guessing    Mean    S.D.    Mean    S.D.    Mean    S.D.

1-9                    0     .10     .06     .01     .05    .04*     .07
10-18                 25     .07     .07    -.01     .04    .18*     .11
19-27                 50     .10     .04     .00     .03    .14*     .09

Note. Independent t-tests (two-tailed) were calculated based upon the number of experimental conditions (n = 9, df = 16), with significance set at the .05 level (t > 2.120). An asterisk (*) indicates significance at the .05 level.

No significant differences were observed across levels of guessing between mean

fit values associated with the unweighted and weighted total fit statistics at the .05 level.

For the between fit statistics, the mean values showed significant differences at the .05

level between experimental conditions involving no guessing and those involving a 25%

and 50% chance of guessing correctly.
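The comparison described above can be illustrated with a pooled-variance t statistic computed from the summary values in Table 5. This is a sketch under the assumption that the standard equal-n independent t-test was used; the function name is illustrative, and because the tabled means and standard deviations are rounded, results close to the critical value may differ from those obtained from the unrounded data.

```python
import math


def independent_t(mean1, sd1, mean2, sd2, n=9):
    """Pooled-variance t statistic and df for two groups of equal size n."""
    pooled_var = (sd1 ** 2 + sd2 ** 2) / 2
    standard_error = math.sqrt(pooled_var * 2 / n)
    return (mean1 - mean2) / standard_error, 2 * n - 2


CRITICAL_T = 2.120  # two-tailed, .05 level, df = 16 (from the table note)

# Between fit statistic, Table 5: 25% guessing vs. no guessing.
t_between, df = independent_t(0.18, 0.11, 0.04, 0.07)
# Unweighted total fit statistic, Table 5: 25% guessing vs. no guessing.
t_unwt, _ = independent_t(0.07, 0.07, 0.10, 0.06)

print(df, abs(t_between) > CRITICAL_T, abs(t_unwt) > CRITICAL_T)
# prints: 16 True False
```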

Experiments 28-54 involved uniformly distributed item difficulty distributions

across the same conditions that were used in the normally distributed conditions. A

summary of mean fit information for experimental conditions involving no guessing in

uniformly distributed conditions (Experiments 28-36) is presented in Table 6.

Table 6

Summary of Mean Item Fit Information for Experiments 28-36 (Uniformly Distributed Item Difficulty Distributions and No Guessing)

Exp.   # of    Logit    Point    Unwt.      Wt.  Ability     Mean    Logit
   #  items     item     bis.    total    total  between     item residual
               diff.    corr.      fit      fit      fit    score    index

25 Persons
  28     25     0.00     0.44     0.09     0.02    -0.20     0.68     0.02
  29     50    -0.00     0.34     0.11     0.03    -0.16     0.71    -0.01
  30    100     0.00     0.36     0.05     0.03     0.10     0.62    -0.02

50 Persons
  31     25     0.00     0.33    -0.05     0.05     0.01     0.71     0.05
  32     50    -0.00     0.41     0.03    -0.02     0.17     0.68     0.03
  33    100    -0.00     0.40     0.02     0.01    -0.07     0.67     0.00

100 Persons
  34     25     0.00     0.39     0.01     0.03     0.02     0.70     0.00
  35     50    -0.00     0.42     0.07    -0.01    -0.07     0.66    -0.01
  36    100     0.00     0.43     0.07    -0.02    -0.04     0.68    -0.01

Mean            0.00     0.39     0.04     0.01    -0.03     0.68
S.D.            0.00     0.04     0.05     0.03     0.12     0.03


A 25% chance of guessing the correct answer was randomly introduced into

Experiments 37-45 as a measurement disturbance, while the item (test lengths) and

person (group membership) parameters remained constant. A summary of mean fit

information is shown in Table 7.

Table 7

Summary of Mean Item Fit Information for Experiments 37-45 (Uniformly Distributed Item Difficulty Distributions and a 25% Chance of Guessing Correctly)

Exp.   # of    Logit    Point    Unwt.      Wt.  Ability     Mean    Logit
   #  items     item     bis.    total    total  between     item residual
               diff.    corr.      fit      fit      fit    score    index

25 Persons
  37     25    -0.00     0.48     0.07     0.02     0.26     0.70     0.06
  38     50    -0.00     0.37     0.08     0.06     0.21     0.71    -0.02
  39    100     0.00     0.36     0.05     0.06     0.28     0.72     0.00

50 Persons
  40     25    -0.00     0.41     0.07    -0.00     0.03     0.71    -0.01
  41     50     0.00     0.34     0.04     0.02    -0.04     0.66     0.04
  42    100     0.00     0.42     0.06     0.04     0.14     0.72    -0.01

100 Persons
  43     25    -0.00     0.40     0.11    -0.03     0.13     0.71    -0.02
  44     50     0.00     0.37    -0.00     0.03    -0.16     0.68     0.01
  45    100    -0.00     0.40     0.00     0.02     0.08     0.67     0.01

Mean            0.00     0.39     0.05     0.02     0.10     0.70
S.D.            0.00     0.04     0.04     0.03     0.14     0.02


A 50% chance of guessing the correct answer was randomly introduced into

Experiments 46-54 while holding all other experimental conditions constant. A summary

of mean fit information is presented in Table 8.

Table 8

Summary of Mean Item Fit Information for Experiments 46-54 (Uniformly Distributed Item Difficulty Distributions and a 50% Chance of Guessing Correctly)

Exp.   # of    Logit    Point    Unwt.      Wt.  Ability     Mean    Logit
   #  items     item     bis.    total    total  between     item residual
               diff.    corr.      fit      fit      fit    score    index

25 Persons
  46     25     0.00     0.52     0.18    -0.00     0.08     0.70    -0.05
  47     50    -0.00     0.44     0.04     0.02     0.30     0.72    -0.01
  48    100    -0.06     0.34     0.05     0.07     0.15     0.71    -0.03

50 Persons
  49     25     0.00     0.45     0.00     0.05     0.23     0.68     0.00
  50     50     0.00     0.40     0.04    -0.03     0.23     0.69     0.03
  51    100    -0.00     0.41     0.02     0.03     0.26     0.72     0.00

100 Persons
  52     25     0.00     0.39     0.04    -0.04     0.60     0.72     0.00
  53     50     0.00     0.37    -0.01     0.05     0.17     0.68    -0.01
  54    100     0.00     0.37     0.02     0.03     0.04     0.73     0.00

Mean           -0.01     0.41     0.04     0.02     0.23     0.71
S.D.            0.02     0.05     0.06     0.04     0.16     0.02

To determine whether levels of guessing had an effect on Rasch fit statistics in

uniformly distributed experimental conditions, comparisons were made between the mean

fit values obtained at each level of guessing with an independent t-test at the .05 level of

significance. The item and person parameters remained constant across experimental

conditions. Mean fit values for each experimental condition are presented in Table 9.

Table 9

Mean Differences for Experiments With Uniformly Distributed Item Difficulties With No Guessing, 25%, and a 50% Chance of Guessing Correctly

                            Unwt. total       Wt. total         Between
Experiments   % Guessing    Mean    S.D.    Mean    S.D.    Mean    S.D.

28-36                  0     .04     .05     .01     .03   -.03*     .12
37-45                 25     .05     .04     .02     .03    .10*     .14
46-54                 50     .04     .06     .02     .04    .23*     .16

Note. Independent t-tests (two-tailed) were calculated based upon the number of experimental conditions (n = 9, df = 16), with significance set at the .05 level (t > 2.120). An asterisk (*) indicates significance at the .05 level.

No significant differences were found between the mean fit values associated with

the unweighted and weighted total fit statistics across guessing levels at the .05 level. For

the between fit statistic, a significant difference was found between the mean values for

experimental conditions involving no guessing and those involving a 25% and 50%

chance of guessing correctly.

To determine whether distribution types had a significant effect on Rasch fit

statistics, comparisons were made between the mean fit values obtained at the same levels

of guessing in each distribution type using an independent t-test (two-tailed) at the .05

level of significance. A summary of mean fit values by experimental conditions

(experiments and levels of guessing) and distribution type is presented in Table 10.

Table 10

Mean Differences for Experiments With Normal and Uniformly Distributed Item Difficulties at the Same Levels of Guessing

                            Unwt. total       Wt. total         Between
Experiments   % Guessing    Mean    S.D.    Mean    S.D.    Mean    S.D.

Normal item difficulty distributions
1-9                    0     .10     .06    .10*     .05     .04     .07
10-18                 25     .07     .07    -.01     .04     .18     .11
19-27                 50    .10*     .04     .00     .03     .14     .09

Uniform item difficulty distributions
28-36                  0     .04     .05    .01*     .03    -.03     .12
37-45                 25     .05     .04     .02     .03     .10     .14
46-54                 50    .04*     .06     .02     .04     .23     .16

Note. Independent t-tests (two-tailed) were calculated based upon the number of experimental conditions (n = 9, df = 16), with significance set at the .05 level (t > 2.120). An asterisk (*) indicates significance at the .05 level.

The mean fit values for the unweighted total fit statistic showed a significant

difference at the .05 level between experimental conditions involving a 50% chance of

guessing the correct answer in normal and uniformly distributed item difficulty

distributions. No significant differences were found between mean fit values for the

unweighted total fit statistic at the 0% and 25% levels of guessing. The mean weighted

total fit values showed a significant difference at the .05 level between experimental

conditions involving no guessing (0%) in normal and uniform item difficulty

distributions. No significant differences were observed between the mean fit values for

the weighted total fit statistic at the 25% and 50% levels of guessing correctly. No

significant differences were found between the mean fit values for the various levels of

guessing (no guessing, 25%, and 50%) for the between fit statistic at the .05 level.

Detection of Guessing by Rasch Fit Statistics

In order to determine whether the frequency of misfitting items detected by Rasch

fit statistics differed by distribution types and levels of guessing, an analysis of observed

and expected frequencies was conducted using chi-square (χ²). The sum of misfitting

items detected in experimental conditions involving no guessing on tests of 25, 50 and

100 items in each distribution type was used as the baseline or theoretical frequencies

(expected) against which those observed on equivalent tests at the 25% and 50% levels of

guessing were compared. Because these values occurred in the absence of measurement

disturbance, they were purely a matter of chance and provided a realistic frame of

reference against which conditions involving different levels of guessing could be

compared. (See Appendix B for a summary of misfitting items.) Comparisons were

made within and across distribution types. Comparisons of observed and expected

frequencies within normally distributed conditions are presented in Table 11.

Table 11

A Comparison of the Frequency of Misfitting Items Detected by Rasch Fit Statistics in a Normal Distribution of Item Difficulties at Varying Test Lengths (Items) and Levels of Guessing Using χ²

Statistic     %   Items    O    E    (O-E)   (O-E)²   (O-E)²/E   Sig.

Unwt.        25      25    3    3     0.00     0.00       0.00
Unwt.        50      25    2    3    -1.00     1.00       0.33
Unwt.        25      50    2    3    -1.00     1.00       0.33
Unwt.        50      50    3    3     0.00     0.00       0.00
Unwt.        25     100    6    5     1.00     1.00       0.20
Unwt.        50     100    5    5     0.00     0.00       0.00
Wt.          25      25    1    0     1.00     1.00         --
Wt.          50      25    3    0     3.00     9.00         --
Wt.          25      50    3    3     0.00     0.00       0.00
Wt.          50      50    4    3     1.00     1.00       0.33
Wt.          25     100    3    0     3.00     9.00         --
Wt.          50     100    2    0     2.00     4.00         --
Bet.         25      25    3    1     2.00     4.00       4.00
Bet.         50      25    3    1     2.00     4.00       4.00
Bet.         25      50    7    1     6.00    36.00      36.00    *
Bet.         50      50    6    1     5.00    25.00      25.00    *
Bet.         25     100    7    7     0.00     0.00       0.00
Bet.         50     100    6    7    -1.00     1.00       0.17

Note. Percent (%) indicates the chance of guessing correctly. Statistic indicates Rasch fit statistics (Unwt. = unweighted total, Wt. = weighted total, and Bet. = unweighted ability between). "Items" indicates the number of items on each test. The symbol O indicates observed frequencies and E expected frequencies. Significance (Sig.) level for χ² at .05 is > 5.991 with df = 2. An asterisk (*) indicates significance at the .05 level. Chi-square values indicated by -- are undefined values (division by zero).
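The last numeric column of the table is the chi-square component (O - E)²/E, which is undefined when the expected frequency is zero. A minimal sketch of that calculation follows; the function name is illustrative, and the comparison of each component against the tabled critical value simply mirrors the table note rather than asserting how the original analysis was programmed.

```python
def chi_square_component(observed, expected):
    """Return (O - E)^2 / E, or None when E is zero (undefined)."""
    if expected == 0:
        return None
    return (observed - expected) ** 2 / expected


CRITICAL_CHI2 = 5.991  # .05 level with df = 2, as given in the table note

# (statistic, % guessing, items, O, E) rows copied from Table 11.
rows = [
    ("Unwt.", 50, 25, 2, 3),
    ("Wt.", 25, 25, 1, 0),
    ("Bet.", 25, 50, 7, 1),
]
for statistic, pct, items, o, e in rows:
    value = chi_square_component(o, e)
    flagged = value is not None and value > CRITICAL_CHI2
    print(statistic, pct, items, value, flagged)
```

Only the third row exceeds the critical value; the second row has no defined value because no misfitting items were expected.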

In the normally distributed experimental conditions, significant

differences were found at the .05 level between the frequency of observed and expected

misfitting items detected by the between fit statistic on tests simulated with 50 items and

a 25% chance of guessing the correct answer and on tests with 50 items and a 50%

chance of guessing the correct answer. No significant differences were found between

the frequency of observed and expected misfitting items detected by the weighted and

unweighted total fit statistics in normally distributed experimental conditions.

Comparisons of expected and observed frequencies in uniformly distributed conditions

are presented in Table 12.

On tests with uniformly distributed item difficulties, significant differences were

observed at the .05 level between the frequency of expected and observed misfitting items

detected by the weighted total and between fit statistics on tests simulated with 100 items

and a 25% chance of guessing the correct answer and on tests with 25 items and a 50%

chance of guessing the correct answer, respectively. No significant differences were

observed between the expected and observed frequency of misfitting items detected by

the unweighted total fit statistic in the uniformly distributed experimental

conditions.

Table 12

A Comparison of the Frequency of Misfitting Items Detected by Rasch Fit Statistics in a Uniform Distribution of Item Difficulties at Varying Test Lengths and Levels of Guessing Using χ²

Statistic     %   Items    O    E    (O-E)   (O-E)²   (O-E)²/E   Sig.

Unwt.        25      25    2    2     0.00     0.00       0.00
Unwt.        50      25    2    2     0.00     0.00       0.00
Unwt.        25      50    3    3     0.00     0.00       0.00
Unwt.        50      50    5    3     2.00     4.00       1.33
Unwt.        25     100    4    9    -5.00    25.00       2.78
Unwt.        50     100    7    9    -2.00     4.00       0.44
Wt.          25      25    1    3    -2.00     4.00       1.33
Wt.          50      25    0    3    -3.00     9.00       3.00
Wt.          25      50    3    3     0.00     0.00       0.00
Wt.          50      50    6    3     3.00     9.00       3.00
Wt.          25     100    7    2     5.00    25.00      12.50    *
Wt.          50     100    2    2     0.00     0.00       0.00
Bet.         25      25    2    1     1.00     1.00       1.00
Bet.         50      25    4    1     3.00     9.00       9.00    *
Bet.         25      50    3    3     0.00     0.00       0.00
Bet.         50      50    6    3     3.00     9.00       3.00
Bet.         25     100    9    6     3.00     9.00       1.50
Bet.         50     100    5    6    -1.00     1.00       0.17

Note. Percent (%) indicates the chance of guessing the correct answer. Statistic indicates Rasch fit statistics (Unwt. = unweighted total, Wt. = weighted total, and Bet. = unweighted ability between). "Items" indicates the number of items on each test. The symbol O indicates observed frequencies and E expected frequencies. Significance (Sig.) level for χ² at .05 is > 5.991 with df = 2. An asterisk (*) indicates significance at the .05 level. Chi-square values indicated by -- are undefined values (division by zero).

In order to determine whether normal or uniform distribution types had an effect

on the frequency of misfitting items detected by Rasch fit statistics, a chi-square analysis

of the frequency of misfitting items across distribution types was conducted at the .05

level of significance. Assuming that no significant differences would occur (null

hypothesis), the frequencies of misfitting items detected in experiments involving normally

distributed conditions were used as the expected frequencies, and those detected in

experiments involving uniformly distributed conditions were used as observed

frequencies. The results of the χ² analyses are presented in Table 13. No significant

differences were found between the frequency of misfitting items detected by Rasch fit

statistics in normal and uniformly distributed conditions when varying the levels of

guessing.

Table 13

A Comparison of the Frequency of Misfitting Items Detected by Rasch Fit Statistics in Normal and Uniformly Distributed Item Difficulties Using χ²

Statistic     %   Items    O    E    (O-E)   (O-E)²   (O-E)²/E   Sig.

Unwt.         0      25    2    3    -1.00     1.00       0.33
Unwt.         0      50    3    3     0.00     0.00       0.00
Unwt.         0     100    9    5     4.00    16.00       3.20
Unwt.        25      25    2    3    -1.00     1.00       0.33
Unwt.        25      50    3    2     1.00     1.00       0.50
Unwt.        25     100    4    6    -2.00     4.00       0.67
Unwt.        50      25    2    2     0.00     0.00       0.00
Unwt.        50      50    5    3     2.00     4.00       1.33
Unwt.        50     100    7    5     2.00     4.00       0.80
Wt.           0      25    3    0     3.00     9.00         --
Wt.           0      50    3    3     0.00     0.00       0.00
Wt.           0     100    2    0     2.00     4.00         --
Wt.          25      25    1    1     0.00     0.00       0.00
Wt.          25      50    3    3     0.00     0.00       0.00
Wt.          25     100    7    3     4.00    16.00       5.33
Wt.          50      25    0    3    -3.00     9.00       3.00
Wt.          50      50    6    4     2.00     4.00       2.00
Wt.          50     100    2    2     0.00     0.00       0.00
Bet.          0      25    1    1     0.00     0.00       0.00
Bet.          0      50    3    1     2.00     4.00       4.00
Bet.          0     100    6    7    -1.00     1.00       0.14
Bet.         25      25    2    3    -1.00     1.00       0.33
Bet.         25      50    3    7    -4.00    16.00       2.29
Bet.         25     100    9    7     2.00     4.00       0.57
Bet.         50      25    4    3     1.00     1.00       0.33
Bet.         50      50    6    6     0.00     0.00       0.00
Bet.         50     100    5    6    -1.00     1.00       0.17

Note. Percent (%) indicates the levels of guessing. Statistic indicates Rasch fit statistics (Unwt. = unweighted total, Wt. = weighted total, and Bet. = unweighted ability between). "Items" indicates the number of items on each test. The symbols O and E represent observed and expected frequencies, respectively. Significance (Sig.) level for χ² at .05 is > 5.991 with df = 2. An asterisk (*) indicates significance at the .05 level. Chi-square values indicated by -- are undefined values (division by zero).

Guessing and the Logit Residual Index

Misfitting item data were investigated to determine the effects of guessing on the

Logit Residual Index (LRI). (See Appendix B for a summary of misfitting item

information by experimental conditions.) Presented in Table 14 are the total number of

misfitting items detected by experimental condition and the number and percentage of

those items detected by each fit statistic.

Table 14

Number and Percentage of Misfitting Items Detected by Rasch Fit Statistics Across Experiments Involving Normal and Uniformly Distributed Item Difficulty Distributions

                                        Number (%) detected
Experiments   % Guessing    N    Unweighted    Weighted    Between

Normal item difficulty distributions
1-9                    0   17    10 (.59)      2 (.12)     7 (.41)
10-18                 25   23    11 (.48)      7 (.30)     17 (.74)
19-27                 50   26    10 (.38)      8 (.31)     15 (.58)

Uniform item difficulty distributions
28-36                  0   24    14 (.58)      7 (.29)     9 (.38)
37-45                 25   29    9 (.31)       9 (.31)     16 (.55)
46-54                 50   27    13 (.48)      8 (.30)     15 (.56)

Note. Percentages may not sum to 100% because one or more fit statistics may have detected the same items as misfitting. For the same reason, the number of items detected by the individual fit statistics may not sum to the total number of misfitting items (N) at each level of guessing. Percent (%) indicates the levels of guessing.
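The proportions in the table, and the reason they can exceed 1.00 in total, can be checked with a few lines of arithmetic. The sketch below uses the first row of Table 14; the function and variable names are illustrative only.

```python
def detection_proportions(total_misfitting, detected_by_statistic):
    """Proportion of misfitting items flagged by each fit statistic."""
    return {
        name: round(count / total_misfitting, 2)
        for name, count in detected_by_statistic.items()
    }


# Experiments 1-9 (normal distribution, no guessing) from Table 14.
counts = {"unweighted": 10, "weighted": 2, "between": 7}
proportions = detection_proportions(17, counts)

# Detections in excess of the 17 unique misfitting items; a positive value
# means some items were flagged by more than one statistic.
excess = sum(counts.values()) - 17

print(proportions, excess)
# prints: {'unweighted': 0.59, 'weighted': 0.12, 'between': 0.41} 2
```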

The number of items detected by the individual fit statistics may not sum to the total

number of items detected at each level of guessing because one or more statistics may

have detected the same items as misfitting. Presented in Table 15 are the total number of

misfitting items by experimental condition (N) and the number and percentage of those

items producing positive and negative LRI values.

Table 15

Number of Misfitting Items and Number and Percentage of LRI Values by Experimental Conditions in Normal and Uniform Item Difficulty Distributions

                                                Positive       Negative
Experiments   % Guessing   Misfitting items     N     %        N     %

Normal item difficulty distributions
1-9                    0                 17     1     6       16    94
10-18                 25                 23     4    22       18    78
19-27                 50                 26     5    19       21    81

Uniform item difficulty distributions
28-36                  0                 24     4    17       20    83
37-45                 25                 29     7    24       22    76
46-54                 50                 27     2     7       25    93

In experiments involving no guessing with normally distributed conditions,

approximately 60% of misfitting items were detected by the unweighted total fit statistic.

Of the 17 items detected in the normally distributed conditions, 94% (16 of 17) produced

negative LRI values. The logit item difficulties for these items ranged from -4.37 to 3.14.

The greatest magnitude of change (negative) in LRI values was observed when the misfit

values of the unweighted total statistic were > 2.69 and the associated item difficulties

were either very easy or very difficult. The negative LRI values for these items indicated

a negative trend in the residuals.

In the uniformly distributed conditions with no guessing, 58% (14 of 24) of the

misfitting items were detected by the unweighted total fit statistic. Of the total detected,

83% (20 of 24) had negative LRI values. The logit item difficulties for these items

ranged from -1.32 to .99. The greatest magnitude of change in LRI values (negative) was

observed when the item difficulties of the misfitting items detected by the unweighted

total fit statistic were either very easy or very difficult. The negative LRI values

indicated a negative trend in the residuals. That is, persons in the low-ability groups

performed better than expected on these items.

As the levels of guessing increased (25% and 50%), the between and unweighted

total fit statistics detected the greatest number of misfitting items, respectively. At the

25% level of guessing in the normally distributed conditions, 74% (17 of 23) of

misfitting items were detected by the between fit statistic and 48% (11 of 23) by the

unweighted total fit statistic. Of the items detected, 78% (18 of 23) produced negative

LRI values. The logit item difficulties associated with these items ranged from -4.16 to

5.11. Again, the greatest change (negative) in LRI values was observed when the

misfit values of the unweighted total fit statistic were > 2.69 and associated

with either easy or difficult items.

In the uniformly distributed conditions at the 25% level, 55% (16 of 29) of the

misfitting items were detected by the between fit statistic, and 31% (9 of 29) were

detected by the unweighted total fit statistic. The logit item difficulties associated with

these items ranged from -1.55 to 1.80. Of the misfitting items detected, 76% (22 of 29)

produced negative LRI values. Again, the greatest change (negative) in LRI values was

observed when the unweighted total fit statistic was > 2.69 and the misfitting items were either

very easy or very difficult.

At the 50% level in the normally distributed conditions, more than half (58%) of

the misfitting items were detected by the between fit statistic, 38% by the unweighted

total fit statistic, and 31% by the weighted total fit statistic. Of the total number of

misfitting items detected in the normally distributed conditions (26), 81% (21 of 26) had

negative LRI values. The logit item difficulties for these items ranged from -3.29 to 4.74.

As in previous experiments, the greatest change in LRI values was in the negative

direction when associated with large positive unweighted total misfit values and very

easy or difficult items.

In uniformly distributed conditions at the 50% level of guessing, 56% of the

misfitting items were detected by the between fit statistic, 48% by the unweighted total,

and 30% by the weighted total fit statistic. Of the total misfitting items detected in the

uniformly distributed conditions, 93% (25 of 27) had negative LRI values. The logit item

difficulties associated with these items ranged from -1.77 to 1.21. Large negative

changes in the LRI resulted from large unweighted total misfit values associated with

very easy or very difficult items.

CHAPTER 5

FINDINGS AND CONCLUSIONS

Effects of Guessing on Rasch Fit Statistics

Independent t-tests were used to determine whether item distribution types and

levels of guessing affected the Rasch fit statistics. Item and person parameters were not

interpreted as cause and effect factors, but were used to establish the experimental

conditions. This was justified based upon previous research that has shown that these

parameters have minimal to no effect on Rasch fit statistics.

In experimental conditions involving normally distributed item difficulty

distributions, the mean unweighted and weighted total fit statistics were robust (no

significant differences found) to varying levels of guessing at the .05 level of significance.

As the level of guessing increased (25% and 50%

levels), the mean between fit statistics showed significant differences between mean fit

values observed in conditions involving no guessing and conditions involving a 25% and

a 50% chance of guessing correctly. These findings indicate that the mean unweighted

and weighted total fit statistics were robust to the effects of guessing in normally

distributed item distributions. The significant differences observed with the between fit

statistic indicate that, as the level of guessing increased, the statistic detected a bias in

item function (item familiarity) between ability groups. That is, low-ability groups

tended consistently to guess the correct answer as the probability of guessing increased.

An inspection of the data indicated that low-ability persons performed better than

expected on the majority of misfitting items detected in these experimental conditions.

In experiments involving uniformly distributed conditions, no significant

differences were noted between the mean unweighted and weighted total fit statistic at the

.05 level when varying the levels of guessing. This finding indicates that the unweighted

and weighted total fit statistics are robust to the effects of guessing in uniformly

distributed item difficulty distributions. The mean between fit statistic showed a significant

difference at the .05 level between experimental conditions involving no guessing and

those involving a 50% chance of guessing correctly. The difference observed with the

between fit statistic is believed to be related to a detection of item bias that favored

low-ability groups, a condition that was also observed in the normally distributed

experimental conditions.

To determine whether distribution types had an effect on Rasch fit statistics,

comparisons were made between mean fit statistics obtained at the same level of guessing

in each distribution type (normal and uniform). A significant difference was observed at

the .05 level between the mean unweighted total fit statistic in normal and uniform

experimental conditions involving no guessing. Significant differences were also found

at the .05 level between mean weighted fit values for experimental conditions involving

no guessing by distribution type and conditions involving a 25% chance of guessing

correctly by distribution type. No significant differences were found between mean fit

values associated with the between fit statistic in normal and uniform experimental

conditions.

These results indicate that, within distribution types, the unweighted and weighted

total fit statistics were robust (no significant differences) to varying levels of guessing,

and the between fit statistic tended to detect item bias as the level of guessing increased.

Thus, the between fit statistic was sensitive to item bias as the probability of guessing

correctly increased within distribution types. When comparisons were made between

distribution types, significant differences were found between the mean fit values

associated with the weighted and unweighted total fit values, but no differences were

observed in the between fit mean values.

Simulation Design Effects

The differences observed with the weighted and unweighted total fit values may

have been influenced by the parameters used when simulating the data. The normally

distributed conditions had an S.D. of 2, whereas the data simulated for the uniformly

distributed conditions had an S.D. of 1. This restriction significantly reduced the range of

item difficulties in the uniformly distributed data to > -2.0 to < 2.0, while the item

difficulties in the normally distributed conditions ranged from < -4.0 to > 5.0. Another

factor to be considered is the mean person ability used in the simulation. A mean person

ability of +1 was used in the simulation of all experimental conditions (normal and

uniform). Therefore, the item difficulties in the uniformly distributed conditions

centered on the average person ability. Thus, the difference observed with the weighted

fit statistic may have been influenced by its sensitivity to fit problems centered on the

item difficulty, which was, in the uniformly distributed conditions, centered on the

average persons' ability.
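To make the scale of these design effects concrete, the sketch below evaluates the dichotomous Rasch model for a person at the mean ability of +1 against items near the ends of each reported difficulty range. The difficulty values are rounded from the ranges quoted above and are used only for illustration; the function name is hypothetical and the code is not taken from the simulation software.

```python
import math


def rasch_probability(ability, difficulty):
    """Dichotomous Rasch model: P(correct) for a person-item pair in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


MEAN_ABILITY = 1.0  # mean person ability used in every simulated condition

# Approximate ends and center of the item difficulty range for each
# distribution type (illustrative values rounded from the text).
for label, difficulties in [("uniform", (-2.0, 0.0, 2.0)),
                            ("normal", (-4.0, 0.0, 5.0))]:
    probs = [round(rasch_probability(MEAN_ABILITY, d), 3) for d in difficulties]
    print(label, probs)
```

With the wider normal range, the hardest items leave an average person a success probability of only a few percent, so unexpected correct answers on those items produce large residuals.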

The differences observed with the between fit statistic within distribution types as

the level of guessing increased may have been influenced by differences in item difficulty

ranges between the two distribution types, and the number of ability groups used in the

analysis. The between fit statistic is sensitive to group membership (number of ability

groups). The greater the number of ability groups (maximum allowed by IPARM = 5),

the less likely low- and medium-ability persons will be forced into the same group,

therefore reducing the influence of group membership on the between fit statistic. This is

especially important for persons using IPARM for data analysis; the software attempts to

place an equal number of persons in each ability group.
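The grouping behavior described above can be pictured with a simple equal-size split of persons ordered by ability. This is only a sketch of the general idea: the actual grouping algorithm used by IPARM is not documented here, and the function name and the ability values below are hypothetical.

```python
def equal_ability_groups(abilities, n_groups=5):
    """Split persons, ordered by ability, into groups of nearly equal size."""
    ordered = sorted(abilities)
    base, extra = divmod(len(ordered), n_groups)
    groups, start = [], 0
    for index in range(n_groups):
        # The first `extra` groups receive one additional person.
        size = base + (1 if index < extra else 0)
        groups.append(ordered[start:start + size])
        start += size
    return groups


# Hypothetical ability estimates (logits) for 27 persons.
abilities = [0.1 * k - 0.3 for k in range(27)]
print([len(group) for group in equal_ability_groups(abilities)])
# prints: [6, 6, 5, 5, 5]
```

With fewer groups, each group spans a wider band of ability, which is how low- and medium-ability persons can end up being treated as one group.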

Multiple choice tests constructed with five options per item and normally

distributed item difficulties will substantially reduce the probability of low-ability

persons consistently guessing the correct answer. In addition, the total test variance will

increase, thereby increasing the test's ability to differentiate between ability groups.
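The effect of adding a fifth option on blind guessing can be illustrated with a binomial tail probability. The sketch assumes purely random, independent guessing on every item, which is a simplification and not a claim about how guessing was simulated in this study; the test length and score threshold are arbitrary examples.

```python
from math import comb


def prob_at_least(n_items, k_correct, n_options):
    """P(at least k_correct right out of n_items) under pure random guessing."""
    p = 1.0 / n_options
    return sum(
        comb(n_items, k) * p ** k * (1 - p) ** (n_items - k)
        for k in range(k_correct, n_items + 1)
    )


# Chance of guessing at least 10 of 25 items correctly.
four_option = prob_at_least(25, 10, 4)
five_option = prob_at_least(25, 10, 5)
print(four_option > five_option)  # True
```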

Detection of Guessing by Rasch Fit Statistics

In order to determine whether the frequency of misfitting items detected by Rasch

fit statistics differed by distribution type as a result of varying the levels of guessing, an

analysis of observed and expected misfitting frequencies was conducted using chi-square

at the .05 level of significance. Assuming no significant difference, the frequency of

misfitting items obtained in experimental conditions involving no guessing in normal and

uniform conditions was used as the baseline or theoretical frequency (expected) against

which those observed at the 25% and 50% levels were compared. By inducing no

measurement noise, the results were purely a matter of chance and were assumed to

closely approximate what could be expected on the single administration of four-option

per item multiple choice tests of various lengths administered to relatively high ability

groups of various sizes. Comparisons were made within and across distribution types.

In normally distributed experiments, significant differences were observed at the

.05 level between the theoretical and observed frequencies of misfitting items detected by

the between fit statistic. Significant differences were observed on a test simulated with

50 items and a 25% chance of guessing correctly, and a test with 50 items and a 50% chance of

guessing the correct answer. No significant differences were found between the observed

and expected frequencies detected by the unweighted and weighted total fit statistics.

In tests with uniformly distributed item difficulties, significant differences were

observed at the .05 level between expected and observed frequencies detected by the

between fit statistic and the weighted total fit statistics. On a test simulated with 25 items

and a 50% chance of guessing correctly, a significant difference was found at the .05

level. A significant difference was also found at the .05 level between observed and

expected frequencies detected by the weighted total fit statistic on a test with 100 items

and a 25% chance of guessing correctly. No significant differences were observed

between the expected and observed frequency of misfitting items detected by the


unweighted total fit statistic. Indications are that the unweighted between fit statistic was

more sensitive at detecting random guessing, because no significant differences were

observed between the expected and observed frequencies. However, as the probability of

guessing correctly increased, the between fit statistic was able to detect bias in the

function of the items. In order to determine whether distribution types had an effect on

the frequency of misfitting items detected by Rasch fit statistics, a chi-square analysis of

the frequency of misfitting items was conducted at the .05 level of significance.

Assuming no significance (null hypothesis), comparisons were made across distribution

types. The frequency of misfitting items detected in experiments involving normally

distributed conditions was used as the expected frequencies, and those detected in

experiments involving uniformly distributed conditions were used as the observed

frequencies. No significant differences were found between the frequency of misfitting

items detected by Rasch fit statistics in normal and uniformly distributed experimental

conditions. Thus, the unweighted total, weighted total, and between fit statistics were

robust to changes in item and person parameters, levels of guessing, and distribution

types (normal and uniform).

Guessing and the Logit Residual Index

Misfitting item data were investigated to determine the effects of guessing on the

Logit Residual Index (LRI). In experiments involving no guessing with normally

distributed conditions, approximately 60% of misfitting items were detected by the

unweighted total fit statistic compared to 12% by the weighted total and 41% by the


between fit statistic. In experiments involving uniformly distributed conditions, again,

about 60% of the misfitting items were detected by the unweighted total fit statistic

compared to 29% by the weighted total and 38% by the between fit statistic. Thus, the

unweighted total fit statistic was more sensitive to measurement disturbances of a random

nature than were the weighted total or between fit statistics. The sum of the percentages

may not total 100% due to one or more fit statistics identifying the same items as

misfitting. Likewise, the sum of the items detected by each fit statistic will not add up to

the total number of misfitting items at each level of guessing due to one or more fit

statistics detecting the same items.
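The overlap can be illustrated with a short Python sketch. It uses the three misfitting items reported for Experiment 8 in Appendix B and assumes, for illustration only, that an item is flagged when the absolute value of a fit statistic is at least 2.0. The cutoff and the variable names are assumptions and are not part of the original analysis.

```python
# Rows from Appendix B, Experiment 8:
# (item, unweighted total fit, weighted total fit, ability between fit)
rows = [
    (36, 3.13, 0.88, -0.31),
    (41, -2.23, -2.41, 1.48),
    (44, 2.27, 2.10, 0.68),
]
CUTOFF = 2.0  # assumed misfit criterion: |t| >= 2.0

names = ("unweighted total", "weighted total", "between")
flags = {name: set() for name in names}
for item, *stats in rows:
    for name, t in zip(names, stats):
        if abs(t) >= CUTOFF:
            flags[name].add(item)

# Percentage of the misfitting items flagged by each statistic.
percent = {name: round(100 * len(items) / len(rows)) for name, items in flags.items()}
print(percent)
print("sum of percentages:", sum(percent.values()))
```

Because items 41 and 44 are flagged by both total fit statistics, the percentages sum to more than 100%.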

Of the 17 items detected in the normally distributed conditions, 94% (16 of 17)

produced negative LRI values. The logit item difficulties for these items ranged from

-4.37 to 3.14. The greatest magnitude of change (negative) in LRI values was observed

when item misfit values for the unweighted total statistic were > 2.69 and the associated

item difficulty values indicated very easy or very difficult items. In the uniformly

distributed conditions with no guessing, 58% (14 of 24) of the misfitting items were

detected by the unweighted total fit statistic. Of the total detected (24), 83% (20 of 24)

had negative LRI values. The logit item difficulties for these items ranged from -1.32 to

0.99. The greatest magnitude of change observed in the LRI values was when the

misfitting item difficulties were either very difficult or very easy, producing negative LRI

values. The negative LRI values indicated a negative trend in the residuals in both


distribution types. That is, persons in the low-ability groups performed better than

expected, but not well enough to cause a bias in the item function.
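The 94% figure reported for the normally distributed conditions can be checked with a few lines of Python. The values below are the LRI entries listed in Appendix B for the misfitting items of the nine normally distributed, no-guessing experiments. Treating those entries as the 17 items discussed here is an assumption, and the variable names are illustrative.

```python
# LRI values in order of appearance in Appendix B (normal, no guessing).
lri_values = [
    -1.49, -0.16, -1.17,
    -0.21, -1.96, -2.06, -1.89,
    -1.13,
    -0.03,
    -1.00, -0.40, -0.14,
    -1.53, 0.48, -0.59,
    -0.26, -0.10,
]

negative = sum(1 for value in lri_values if value < 0)
share = round(100 * negative / len(lri_values))
print(f"{negative} of {len(lri_values)} items ({share}%) have negative LRI values")
```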

As the levels of guessing increased (25% and 50%), the between and unweighted

total fit statistics detected the greatest number of misfitting items, respectively. At the

25% level of guessing in normally distributed conditions, 74% (17 of 23) of misfitting

items were detected by the between fit statistic and 48% (11 of 23) by the unweighted

total fit statistic. Of these items, 78% (18 of 23) produced negative LRI values. The logit

item difficulties associated with these items ranged from -4.16 to 5.11. Again, the

greatest change (negative) in LRI values occurred when the unweighted total fit statistic

was > 2.69. When the between fit statistic was > 2.69, a negative LRI value was

observed, but the magnitude of change was not as great as the change observed with

unweighted total fit statistic. In the uniformly distributed conditions at the 25% level of

guessing, 55% (16 of 29) of the misfitting items were detected by the between fit statistic

and 31% (9 of 29) were detected by each of the weighted and unweighted total fit

statistics. The logit item difficulties associated with these items ranged from -1.55 to

1.80. Of the misfitting items detected, 76% (22 of 29) produced negative LRI values.

Again, the greatest change (negative) in LRI values occurred when the unweighted total

fit statistic was > 2.69.

At the 50% level of guessing, more than half of the misfitting items were detected

by the between fit statistic in the normal and uniform experimental conditions. In the

normally distributed conditions, the unweighted total fit statistic detected 38% of the


misfitting items, and the weighted total detected 31%. Of the total number of misfitting

items detected in the normally distributed conditions, 81% (21 of 26) had negative LRI

values. The logit item difficulties for these items ranged from -3.29 to 4.74. As in

previous experiments, the greatest change in LRI values was observed when the

misfitting items detected by the unweighted total fit statistic were either very easy or very

difficult.

In the uniform conditions, 48% of the misfitting items were detected by the

unweighted total fit statistic, 30% by the weighted total, and 56% by the between fit

statistic. Of the total number of misfitting items detected, 93% (25 of 27) had negative

LRI values. The logit item difficulties associated with these items ranged from -1.77 to

1.21. The LRI showed large negative changes when misfitting items detected by the

unweighted total fit statistic were either very easy or very difficult. The magnitude of

change was greater when the misfit values of the unweighted total fit statistic increased

above 2.69 than with similar values associated with the weighted total and between fit

statistics. As the level of guessing increased, the percentage of misfitting items

producing negative LRI values also increased. However, in all situations the LRI was

able to identify group membership for the misfit problem. This is of particular interest

for persons using IPARM for data analysis, because the software allows the user to

identify group membership based on demographic characteristics.


Summary

No significant differences were noted between mean fit values for the unweighted

and weighted total fit statistics within distribution types. Thus, these statistics were robust

to varying levels of guessing within distribution types (normal and uniform). The

differences observed across distribution types reflect the sensitivity of these statistics to the

magnitude of the differences between the ranges of item logit difficulties in each

distribution type as related to fit problems far away (unweighted total) and centered on

(weighted total) the logit difficulty of the items relative to the ability of the groups in which

the fit problems occurred. In the normally distributed conditions, the item difficulties

ranged from about -5 to +5 logits and from about -2 to +2 logits in the uniformly distributed

conditions.
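The effect of these ranges can be illustrated with the dichotomous Rasch model, in which the probability of a correct response depends only on the difference between person ability and item difficulty. The Python sketch below evaluates that probability for a person at the study's mean ability of +1 logit against items at the extremes of the two ranges. The function name and the chosen difficulty values are illustrative assumptions.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


ability = 1.0  # mean logit person ability used in the simulations

# Approximate extremes of the normal (-5, +5) and uniform (-2, +2) ranges.
for difficulty in (-5, -2, 0, 2, 5):
    p = rasch_probability(ability, difficulty)
    print(f"difficulty {difficulty:+d}: P(correct) = {p:.3f}")
```

Items near -5 or +5 logits lie far from the ability of the group, so expected responses there are almost certain to be correct or incorrect, and an unexpected response produces a large residual.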

The between fit statistics showed significant differences between mean fit values

within distribution types, but no significant differences were found across distribution

types. The differences observed within distribution types were due to low-ability persons

consistently guessing the correct answer as the probability of guessing the correct answer

increased. Consequently, the between fit statistic detected bias (item familiarity) in the item

functions. Because the experimental conditions (items, persons, and levels of guessing)

were the same in both distribution types, the detection of biased items by the between fit

statistic was relatively the same in both distribution types, resulting in no significant

differences across distribution types. Therefore, Rasch fit statistics appear to be robust to

varying levels of guessing within and across distribution types.


Of the misfitting items detected, the majority produced negative LRI values,

indicating a negative trend in the residuals. That is, low-ability persons performed better

than expected. The LRI was more sensitive to large positive unweighted total misfit

values than to the same or similar values observed for the weighted total or between fit

statistics. This was especially evident when the misfitting items were very easy or very

difficult. In all situations involving fit problems, the LRI was able to identify group

membership of persons in which the fit problems occurred. Since IPARM allows the user

to identify group membership based on demographic characteristics, the implications for

the use of this statistic are phenomenal. Therefore, in the assessment of individual

differences, it is necessary to use Rasch fit statistics (unweighted total, weighted total,

and unweighted between fit statistics) in conjunction with the LRI in order to make more

accurate decisions about the classification, selection, and placement of the most able

persons.

Conclusions

The results of this study indicate that Rasch fit statistics yield valid and reliable

results on the single administration of tests of varying lengths and levels of guessing

within and across distribution types.

1. The Rasch fit statistics were robust to varying levels of guessing within and

across distribution types (uniform and normal), making the Rasch model an ideal

measurement tool for the assessment of nonnormal populations.


2. The unweighted total fit statistic was more sensitive to measurement

disturbances (random guessing) in items far away from the average ability of the groups in

which the misfit problem occurred.

3. The weighted total fit statistic was more sensitive to fit problems centered on

the logit difficulty of the misfitting items. Stated differently, the statistic was sensitive to

fit problems in persons with a logit ability centered on the logit difficulty of the misfitting

items.

4. As the probability of guessing correctly increased, low-ability persons tended

consistently to guess the correct answer, thereby inducing systematic item familiarity bias

in the items that was detected by the between fit statistic.

5. The Logit Residual Index was more sensitive to large positive misfit values

associated with the unweighted total statistic than to similar values associated with the

weighted total or between fit statistics.

6. The Logit Residual Index simplifies the interpretation and identification of

group membership in which misfit problems occurred.

Further Study

The Rasch measurement model should be applied to known published results of

tests constructed under other measurement models to determine whether the results

obtained by Rasch analysis differ significantly from the published results. In addition,

the information included in Appendix C of this study will provide an excellent data

source for establishing acceptable limits for the point biserial correlation as a fit index.


Recommendations

The Rasch measurement model has been found to be sufficient, efficient,

unbiased, and consistent when applied to the measurement of individual differences.

Given these characteristics, it is strongly recommended that these statistics be applied to

the analysis of test data obtained under other measurement models. This will reduce the

one-size-fits-all approach to measurement and ensure that decisions made about the

selection, placement, and classification of individuals are based on "true scores" (test

results) that are sufficient, efficient, unbiased, and consistent.

APPENDIX A

IPARM DATA CONTROL FILE PARAMETERS


IPARM was used to analyze each of the 54 data sets used in the 54 experimental

conditions. To perform an analysis, you must construct a control file. All information

entered into the control program will be used as a scratch file. The scratch file is then

used by all subprograms to perform the selected analyses. In order to use the control

information in subsequent analyses, a permanent control file must be created. To create a

permanent control file, the following information must be entered and saved on a

computer diskette or the hard disk of your computer:

Test information

1. Test name (60 characters maximum).

2. Model selection (dichotomous or rating scale).

3. Analysis Type (item fit, person fit, both item and person analysis).

4. Test information (number of items, number of people, starting location of items in record).

Item information

1. Item difficulty information (output files from MSCALE, BIGSCALE, BIGSTEPS, or hand entered).

2. Item name information (eight characters maximum).

Item fit information

1. Item fit information (omit misfitting persons; Y or N).

2. Enter maximum person t value for inclusion (99 for all positive misfit persons and -99 for all negative misfit persons). By default, item between fit information is provided.

Item between fit information

1. How many additional between fit statistics do you want? (Item analysis always provides between fit statistics based on ability groups by default. A maximum of four additional between fit statistics can be requested.)


If additional between fit statistics were requested, you will be prompted to enter

the title of the report, number of subgroups (maximum of 5), and the location of the

variable in the data vector. You will then be asked to enter a value for each group, the

title for each group, and if you want a listing of the group information and whether you

want to make changes to any item fit values.

Person fit information

1. Enter the number of person between fit statistics wanted (5 maximum).

2. Enter the title for each group.

3. Enter the number of items in each group.

4. Identify group membership for each item.

5. Do you want a listing of the information (Y or N)?

6. Do you want to add an analysis (Y or N)?

7. Do you want to modify any item assignment?

Scoring information

1. Enter the name of the response file (list disk drive and file name)

2. Enter drive name for scratch files (temporary files used by program during analysis).

3. Do you want a listing of the logit ability scale (Y or N)?

4. Enter choice for person plot (Score resid. =1, Std. resid. =2).

5. Enter valid responses.

6. Enter the omit character.

7. Should omits be scored as incorrect (Y or N)?

8. Do the responses need to be scored (Y or N)?

9. Do you want to make changes to the score information values (Y or N)?


Saving control information

All information entered into the program up to this point is used to create the

control file. It is used as a scratch file by the subprograms to perform all analyses. In

order to use this information in future analyses, you must create a permanent file. You

will be prompted to answer the following questions:

1. Do you want to save all the control information (Y or N)?

2. Enter the drive specification and the name of the file to be created.

Depending on the number of item between fit analyses you request, IPARM will

produce one to two pages of item information. However, the greater the number of items

and groups used in the analysis, the greater the amount of output.
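The limits stated in this appendix can be summarized in a small validation sketch. The Python class below is a hypothetical container for some of the entries listed above. It does not reproduce the program's actual control file format, and every field and method name is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ControlParameters:
    """Hypothetical container for Appendix A entries (not the program's file format)."""

    test_name: str
    model: str  # "dichotomous" or "rating scale"
    n_items: int
    n_persons: int
    item_names: List[str] = field(default_factory=list)
    extra_between_fit: int = 0  # additional between fit statistics requested

    def problems(self) -> List[str]:
        """Return a list of violations of the limits stated in Appendix A."""
        issues = []
        if len(self.test_name) > 60:
            issues.append("test name exceeds 60 characters")
        if self.model not in ("dichotomous", "rating scale"):
            issues.append("model must be dichotomous or rating scale")
        if any(len(name) > 8 for name in self.item_names):
            issues.append("item names are limited to eight characters")
        if not 0 <= self.extra_between_fit <= 4:
            issues.append("at most four additional between fit statistics")
        return issues


params = ControlParameters(
    test_name="Experiment 1",
    model="dichotomous",
    n_items=25,
    n_persons=25,
    item_names=["ITEM01", "ITEM02"],
)
print(params.problems())
```

An empty list indicates that the entries respect the stated limits.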

APPENDIX B

A SUMMARY OF MISFITTING ITEM STATISTICS BY EXPERIMENT


A Summary of Misfitting Item Statistics by Experiment

Item      Logit    Point    Unwt.      Wt.  Ability     Mean    Logit
#          item     bis.    total    total  between     item Residual
          diff.    corr.      fit      fit      fit    score    Index

Experiment 1
17         0.88     0.30     2.02     0.98    -1.00     0.48    -1.49
22         3.14     0.05     1.07     0.46     2.44     0.12    -0.16
24         3.14     0.05     2.21     0.23     0.43     0.12    -1.17

Experiment 2 (none)

Experiment 3
69         1.49     0.23     0.75     1.76     2.05     0.40    -0.21
87         2.17     0.13     2.83     1.02     1.28     0.28    -1.96
92         3.05     0.07     3.13     0.59    -0.40     0.16    -2.06
94         2.17     0.18     2.69     0.91    -0.28     0.28    -1.89

Experiment 4
18         0.88     0.28     2.41     1.81    -1.12     0.48    -1.13

Experiment 5
15        -1.47     0.26     0.71     0.58     2.07     0.88    -0.03

Experiment 6
11        -3.68    -0.23     2.75     0.46     2.34     0.98    -1.00
62         0.31     0.24     1.48     1.77     2.11     0.62    -0.40
74         1.02     0.30     0.68     1.62     2.39     0.48    -0.14

Experiment 7 (none)

Experiment 8
36         1.04     0.32     3.13     0.88    -0.31     0.53    -1.53
41         1.72     0.60    -2.23    -2.41     1.48     0.39     0.48
44         2.04     0.20     2.27     2.10     0.68     0.33    -0.59

Experiment 9
3         -4.37    -0.09     1.98     0.38     2.70     0.99    -0.26
36        -0.80     0.36     0.94     0.32     2.14     0.80    -0.10

Experiment 10
10        -0.42     0.15     3.11     1.16     0.57     0.72    -2.27
23         2.59     0.04     1.55     1.40     2.01     0.20    -0.46

Experiment 11
22        -1.15    -0.04     1.40     1.07     2.26     0.80    -0.36
24        -0.65     0.19     0.55     0.87     2.12     0.72    -0.07

Experiment 12
18        -1.72    -0.28     2.29     0.61     1.43     0.88    -0.68
51         0.03     0.15     0.88     1.19     2.11     0.60    -0.37

Experiment 13
2         -3.33    -0.03     1.21     0.41     2.80     0.98    -0.03
19         1.12     0.24     2.01     2.43     2.56     0.52    -0.70
25         5.11     0.00     2.41     0.33     1.18     0.04    -0.77

Experiment 14
37         0.94     0.65    -2.02    -2.58     2.29     0.48     0.67

Experiment 15
10        -2.83    -0.01     0.92     0.50     2.89     0.96    -0.01
32        -1.31     0.18     0.88     0.54     2.24     0.86    -0.09
98         4.13     0.09     1.39     0.20     2.16     0.06    -0.16

Experiment 16 (none)

Experiment 17
3         -4.16     0.01     0.86     0.35     2.02     0.99     0.00
17        -1.08     0.22     0.52     0.44     2.28     0.84    -0.02
35         0.73     0.57    -1.62    -2.44     2.33     0.53     0.39
36         1.33     0.26     2.51     2.22     2.72     0.41    -0.79

Experiment 18
14        -2.32     0.22     2.05    -0.11    -1.52     0.94    -0.45
51         0.15     0.27     2.47     1.33     1.36     0.64    -0.67
66         0.41     0.23     2.64     2.20     1.28     0.59    -0.76
72         1.36     0.38     0.40     0.84     2.09     0.40    -0.06
75         1.36     0.60    -2.03    -2.01     2.01     0.40     0.43
84         1.46     0.66    -2.72    -2.96     2.60     0.38     0.50

Experiment 19
23         1.54    -0.07     2.87     2.40     1.42     0.32    -1.79

Experiment 20
28         0.27     0.06     1.67     2.27     2.86     0.64    -0.64

Experiment 21
40        -0.66    -0.22     1.77     1.53     2.01     0.76    -0.54
48        -1.22    -0.08     1.22     0.75     2.07     0.84    -0.22
49        -0.21     0.43    -0.57    -0.17     2.15     0.68     0.22
54         0.74    -0.03     2.00     2.08     0.28     0.48    -1.30
72        -0.01     0.72    -1.83    -1.87     2.47     0.64     0.72
92         2.49    -0.12     1.26     0.70     2.23     0.16    -0.25

Experiment 22
21         1.63     0.22     1.80     2.06     2.14     0.34    -0.52

Experiment 23
9         -2.61     0.13     3.96    -0.12     0.66     0.94    -2.81
31         0.19     0.36     0.35     1.10     2.03     0.60    -0.01
32         0.60     0.22     1.74     2.35     1.50     0.52    -0.64
33         1.01     0.61    -1.20    -1.69     2.37     0.44     0.40
46         2.16     0.28     0.89     0.80     2.21     0.24    -0.10

Experiment 24
25        -1.35     0.42     2.09    -0.75    -1.21     0.86    -0.72
34        -1.35     0.31     2.24    -0.11    -1.21     0.86    -0.74
39        -1.00     0.12     2.08     0.96     2.15     0.82    -0.42
45        -0.06     0.30     2.01     1.05    -0.51     0.68    -0.73

Experiment 25
3         -2.70     0.10     1.30     0.23     2.02     0.96    -0.12
5         -1.50     0.25     2.69     0.07    -0.64     0.89    -0.74
7         -1.62     0.19     1.18     0.35     2.13     0.90    -0.10
19         1.60     0.31     1.25     2.07     1.06     0.40    -0.26

Experiment 26
3         -3.29    -0.01     0.94     0.32     3.13     0.98    -0.01
34         0.89     0.58    -2.13    -2.28     1.10     0.54     0.52
40         1.53     0.59    -2.12    -2.15     0.77     0.41     0.48
49         4.74    -0.07     1.57     0.47     2.81     0.04    -0.09

Experiment 27 (none)

Experiment 28 (none)

Experiment 29
2         -0.74    -0.01     2.63     0.60    -1.38     0.84    -1.33

Experiment 30
40        -0.45    -0.17     1.82     1.99     2.24     0.72    -0.71
58        -0.23     0.05     2.73     0.86    -1.13     0.68    -1.65
66         0.74    -0.00     2.51     2.16     0.93     0.48    -1.77
92        -0.03    -0.15     2.09     2.58     2.14     0.64    -1.06
98         0.93     0.03     2.20     1.95     1.40     0.44    -1.38

Experiment 31
19         0.25     0.07     1.38     2.15     1.64     0.68    -0.38

Experiment 32
30        -0.19     0.12     1.09     2.23     2.59     0.72    -0.23
37         0.16     0.63    -1.34    -2.05     1.98     0.66     0.41
45         0.99     0.25     1.21     2.02     1.77     0.50    -0.42

Experiment 33
23        -0.73     0.17     0.75     1.07     2.93     0.80    -0.05
57         0.36     0.24     2.09     1.34     1.30     0.62    -0.77
61         0.46     0.60    -1.51    -1.39     2.30     0.60     0.45
73         0.56     0.33     0.68     1.04     2.06     0.58    -0.14
84         0.86     0.22     2.00     1.91     0.95     0.52    -0.79

Experiment 34
8         -0.06     0.23     2.44     1.29     1.16     0.73    -0.58
17         0.00     0.20     1.26     2.06     2.66     0.72    -0.17
21         0.97     0.61    -2.07    -2.57     1.79     0.54     0.49

Experiment 35
8         -0.09     0.34     2.57     0.76     0.37     0.69    -0.88
11        -1.32     0.12     2.74     0.89     3.22     0.86    -0.53
36        -0.20     0.54    -1.42    -1.18     2.10     0.71     0.23

Experiment 36
11        -1.15     0.26     2.22     0.53     0.66     0.85    -0.41
67         0.34     0.29     3.21     1.50     0.99     0.63    -1.06
76         0.23     0.39     2.20     0.43    -0.04     0.65    -0.77

Experiment 37
21         1.41     0.30     0.98     2.19     2.36     0.48    -0.28

Experiment 38
5         -0.77    -0.01     2.42     0.68    -1.05     0.84    -1.12
9         -0.19     0.04     1.28     1.24     2.50     0.76    -0.34
31         0.49    -0.11     2.54     2.60     2.03     0.64    -1.33

Experiment 39
13        -0.67    -0.07     1.72     0.80     2.16     0.84    -0.38
36        -1.55    -0.17     1.44     0.65     2.44     0.92    -0.19
84         0.80     0.70    -1.79    -1.72     2.07     0.60     0.75
89         1.75     0.71    -1.86    -2.47     1.00     0.40     0.80
90         0.80     0.73    -1.97    -1.95     2.07     0.60     0.82

Experiment 40
10        -0.62     0.31     2.12    -0.12    -1.04     0.82    -0.62

Experiment 41
28         0.46     0.18     0.89     1.88     2.46     0.58    -0.33
42         0.65     0.54    -1.10    -2.00     0.45     0.54     0.53

Experiment 42
38         0.20     0.34     3.99     0.26    -0.47     0.70    -2.71
42        -0.47     0.15     1.48     1.31     2.53     0.80    -0.31
47         0.08     0.21     2.16     1.22     1.05     0.72    -0.75
56         0.55     0.21     1.62     1.96     2.09     0.64    -0.53
64        -0.05     0.26     0.75     1.20     2.28     0.74    -0.08
88         1.28     0.63    -1.64    -2.20     1.40     0.50     0.57

Experiment 43
23         1.22     0.34     1.38     1.92     2.08     0.51    -0.33
24         1.80     0.33     3.16     1.32    -0.11     0.40    -1.10

Experiment 44
42         0.93     0.22     2.36     2.23     1.11     0.50    -0.73

Experiment 45
32        -0.24     0.25     0.71     1.61     2.08     0.72    -0.07
33        -0.24     0.29     0.54     1.21     2.08     0.72    -0.04
49         0.25     0.29     1.54     1.66     2.21     0.63    -0.34
76         0.76     0.26     1.62     2.49     1.92     0.53    -0.36
86         0.91     0.30     2.09     1.49    -0.08     0.50    -0.64
87         0.81     0.57    -1.58    -2.02     0.04     0.52     0.38
90         0.86     0.29     2.13     1.70     0.26     0.51    -0.67
99         0.96     0.26     1.73     2.40     2.40     0.49    -0.42

Experiment 46 (none)

Experiment 47
7         -0.50     0.14     2.35     0.79     1.00     0.80    -1.04
10        -1.77    -0.20     1.80     0.90     3.41     0.92    -0.31
23        -0.50     0.20     0.87     0.93     2.44     0.80    -0.13
28         0.96    -0.01     2.75     2.71     1.49     0.56    -1.74
39         0.53     0.77    -1.83    -2.13     1.86     0.64     0.61

Experiment 48
33        -0.21    -0.20     2.32     1.51     0.83     0.76    -0.95
36        -0.78    -0.07     1.28     0.85     2.11     0.84    -0.26
67         1.21    -0.06     2.35     2.47     2.40     0.48    -1.52
76         0.65    -0.09     2.30     2.36     1.99     0.60    -1.40
96         0.45    -0.03     2.20     1.66     1.50     0.64    -1.14

Experiment 49
23         0.57     0.25     2.39     1.83     0.69     0.58    -0.98

Experiment 50
14        -0.26     0.10     1.39     2.04     3.16     0.74    -0.31
17        -0.26     0.15     0.86     1.96     2.39     0.74    -0.13
28        -0.02     0.32     2.11     0.48    -0.95     0.70    -1.15
38         0.54     0.64    -1.37    -2.37     1.76     0.60     0.45
45         0.54     0.26     0.93     2.06     2.39     0.60    -0.27

Experiment 51
55         0.30     0.20     1.41     1.72     2.14     0.68    -0.39
57        -0.06     0.18     1.41     1.38     2.35     0.74    -0.29

Experiment 52
3         -0.61     0.17     2.19     1.04     2.02     0.82    -0.36
6         -0.32     0.31     0.95     0.42     2.37     0.78    -0.10
11         0.19     0.23     1.41     1.58     2.78     0.70    -0.26
23         0.67     0.46    -0.09    -0.53     2.24     0.61    -0.04

Experiment 53
35         0.42     0.14     3.23     2.68     3.82     0.61    -0.99

Experiment 54
16        -0.63     0.20     2.16     0.56     0.06     0.83    -0.36
45         0.34     0.26     2.46     0.88    -1.42     0.68    -0.73
86         0.76     0.26     1.05     1.74     2.78     0.60    -0.20
89         0.76     0.36     2.10    -0.01    -0.19     0.60    -0.75

APPENDIX C

SUMMARY OF ITEM FIT INFORMATION FOR EXPERIMENT 1-54


Summary of Item Fit Information for Experiment 1-54

Experiment 1: Summary of Item Fit Information for a Normally Distributed Item Difficulty Distribution With 25 Items, 25 Persons, and No Guessing

Item      Logit    Point    Unwt.      Wt.  Ability     Mean    Logit
#          item     bis.    total    total  between     item Residual
          diff.    corr.      fit      fit      fit    score    Index
1         -3.31     0.36     0.11     0.27    -0.54     0.96     0.04
2         -3.31     0.07     0.88     0.57    -0.54     0.96     0.03
3         -4.09    -9.99     0.20    -0.92     0.07     1.00    -0.00
4         -2.46    -0.01     1.09     1.02     1.18     0.92    -0.03
5         -2.46     0.20     1.55     0.25     1.18     0.92    -0.30
6         -2.46     0.64    -0.62    -0.70    -0.16     0.92     0.06
7         -1.90     0.72    -0.93    -1.02     0.18     0.88     0.10
8         -0.77     0.68    -0.62    -0.87    -0.50     0.76     0.20
9         -0.22     0.72    -1.21    -1.17     0.69     0.68     0.39
10        -0.77     0.61    -0.64    -0.23     1.18     0.76     0.16
11        -1.46     0.65    -0.38    -0.75    -0.50     0.84     0.11
12         0.02     0.54    -0.24    -0.05    -0.15     0.64     0.06
13         0.02     0.31     1.36     1.08     0.94     0.64    -0.63
14         0.02     0.48     0.25     0.26    -0.55     0.64    -0.01
15        -0.22     0.38     0.83     0.78     0.26     0.68    -0.24
16         0.25     0.72    -1.50    -1.48     0.55     0.60     0.57
17         0.88     0.30     2.02     0.98    -1.00     0.48    -1.49
18         1.09     0.35     0.74     0.78    -0.44     0.44    -0.32
19         1.51     0.51    -0.54    -0.66     0.19     0.36     0.28
20         1.51     0.51    -0.54    -0.66     0.19     0.36     0.28
21         2.20     0.43    -0.30    -0.36    -1.46     0.24     0.15
22         3.14     0.05     1.07     0.46     2.44     0.12    -0.16
23         2.46     0.11     0.57     1.04     1.22     0.20    -0.02
24         3.14     0.05     2.21     0.23     0.43     0.12    -1.17
25         3.14     0.38    -0.25    -0.19     0.09     0.12     0.09
Mean      -0.16     0.41     0.20    -0.05     0.20     0.61
S.D.       2.14     0.25     0.99     0.78     0.85
Groups                                            2

Note: Raw score mean = 15.24 with a S.D. of 4.12. Mean person ability = 0.70 with a S.D. of 1.28. Test reliability (K.R. 20) = 0.80. Reliability of person separation = -0.80.



Experiment 2: Summary of Item Fit Information for a Normally Distributed Item Difficulty Distribution With 50 Items, 25 Persons, and No Guessing

Item      Logit    Point    Unwt.      Wt.  Ability     Mean    Logit
#          item     bis.    total    total  between     item Residual
          diff.    corr.      fit      fit      fit    score    Index
1         -3.54    -9.99     0.15    -0.96     0.05     1.00    -0.00
2         -2.78     0.30     0.17     0.28    -0.49     0.96     0.04
3         -2.78     0.39    -0.00     0.08    -0.49     0.96     0.03
4         -3.54    -9.99     0.15    -0.96     0.05     1.00    -0.00
5         -1.97     0.50    -0.40    -0.24    -0.11     0.92     0.06
6         -3.54    -9.99     0.15    -0.96     0.05     1.00    -0.00
7         -1.97     0.40    -0.10     0.04    -0.11     0.92     0.06
8         -1.97     0.47    -0.31    -0.13    -0.11     0.92     0.06
9         -1.45     0.49    -0.40    -0.12     0.21     0.88     0.09
10        -2.78     0.30     0.17     0.28    -0.49     0.96     0.04
11        -2.78     0.14     0.57     0.45    -0.49     0.96     0.04
12        -2.78     0.39    -0.00     0.08    -0.49     0.96     0.03
13        -2.78     0.16     0.51     0.43    -0.49     0.96     0.04
14        -0.40     0.38     0.49     0.48    -0.60     0.76    -0.08
15        -2.78     0.07     0.80     0.48    -0.49     0.96     0.03
16        -0.70     0.45    -0.09     0.07     0.81     0.80     0.09
17        -1.04     0.53    -0.53    -0.23     0.51     0.84     0.12
18        -0.40     0.41     0.02     0.48     0.17     0.76     0.05
19        -1.04     0.13     1.28     1.03    -0.54     0.84    -0.27
20        -1.04     0.46     0.14    -0.22    -0.54     0.84     0.04
21         0.12     0.69    -1.18    -1.21     0.53     0.68     0.38
22         0.36     0.67    -1.08    -1.03     1.03     0.64     0.41
23        -1.04     0.42     0.07     0.07    -0.54     0.84     0.05
24         0.12     0.57    -0.53    -0.31    -1.40     0.68     0.22
25        -0.40     0.75    -1.35    -1.71     1.11     0.76     0.29
26         0.12     0.77    -1.65    -1.93     1.77     0.68     0.46
27         0.36     0.39     0.49     0.79    -0.41     0.64    -0.11
28         0.81     0.58    -0.56    -0.38    -0.47     0.56     0.27
29        -0.70     0.56    -0.21    -0.59    -1.52     0.80     0.10
30        -0.13     0.10     1.72     1.84     1.05     0.72    -0.63
31         0.12     0.47     0.22     0.17    -1.40     0.68    -0.04
32        -0.40     0.39     0.05     0.61    -0.60     0.76     0.05
33         0.59     0.31     0.93     1.31     0.49     0.60    -0.42
34         0.36     0.54     0.67    -0.45     1.03     0.64    -0.31
35         0.81     0.34     0.96     1.09     1.08     0.56    -0.36
36         1.02     0.45     0.31     0.39    -1.48     0.52    -0.07
37         0.81     0.20     1.86     1.87     1.08     0.56    -0.91
38         0.59     0.42     0.49     0.63     0.49     0.60    -0.10
39         1.89     0.34     0.25     1.10    -0.15     0.36    -0.06
40         2.36     0.53    -0.50    -0.51    -1.31     0.28     0.21
41         2.12     0.38     0.46     0.43     0.15     0.32    -0.07
42         2.12     0.19     0.83     1.70     1.59     0.32    -0.22
43         1.89     0.69    -1.21    -1.74    -0.15     0.36     0.45
44         2.61     0.38     0.72    -0.08    -0.91     0.24    -0.20
45         2.36     0.50    -0.04    -0.56     0.64     0.28     0.10
46         2.61     0.45     0.02    -0.31    -0.91     0.24     0.07
47         3.60     0.13     0.96     0.51     0.83     0.12    -0.13
48         3.21     0.46    -0.40    -0.32     0.14     0.16     0.12
49         3.21     0.47    -0.37    -0.40     0.14     0.16     0.12
50         5.61    -9.99     0.15    -1.03     0.06     0.00    -0.00
Mean      -0.10     0.41     0.10     0.01    -0.03     0.66
S.D.       2.12     0.22     0.71     0.86     0.79
Groups                                            2

Note: Raw score mean = 32.96 with a S.D. of 7.92. Mean person ability = 1.11 with a S.D. of 1.27. Test reliability (K.R. 20) = 0.90. Reliability of person separation = 0.89.

87

Appendix C (continued^


Item #

Logit item diff.

Point bis. corr.

Unwt. total fit

Wt. total fit

Ability between

fit

Mean item score

Logit Residual

Index 1 -3.68 -9.99 0.20 -0.92 0.06 1.00 -0.00 2 -3.68 -9.99 0.20 -0.92 0.06 1.00 -0.00 3 -3.68 -9.99 0.20 -0.92 0.06 1.00 -0.00 4 -2.92 0.46 -0.05 -0.02 -0.47 0.96 0.03 5 -2.08 0.32 0.47 0.11 1.02 0.92 0.02 6 -3.68 -9.99 0.20 -0.92 0.06 1.00 -0.00 7 -3.68 -9.99 0.20 -0.92 0.06 1.00 -0.00 8 -2.92 0.42 0.01 0.10 -0.47 0.96 0.03 9 -3.68 -9.99 0.20 -0.92 0.06 1.00 -0.00

10 -3.68 -9.99 0.20 -0.92 0.06 1.00 -0.00 11 -2.92 0.42 0.01 0.10 -0.47 0.96 0.03 12 -2.92 0.15 0.57 0.50 -0.47 0.96 0.04 13 -2.08 0.35 0.31 0.09 -0.08 0.92 0.04 14 -2.92 0.14 0.61 0.51 -0.47 0.96 0.04 15 -2.08 0.63 -0.61 -0.78 -0.08 0.92 0.07 16 -1.54 0.60 -0.52 -0.71 0.25 0.88 0.10 17 -2.92 0.42 0.01 0.10 -0.47 0.96 0.03 18 -2.92 0.05 0.88 0.55 -0.47 0.96 0.02 19 -2.92 0.46 -0.05 -0.02 -0.47 0.96 0.03 20 -2.92 0.46 -0.05 -0.02 -0.47 0.96 0.03 21 -1.13 0.36 0.19 0.34 -0.63 0.84 0.05 22 -1.54 0.56 -0.29 -0.59 0.25 0.88 0.09 23 -2.08 0.39 0.09 0.03 -0.08 0.92 0.05 24 -2.92 0.14 0.61 0.51 -0.47 0.96 0.04 25 -3.68 -9.99 0.20 -0.92 0.06 1.00 -0.00 26 -1.54 0.31 0.60 0.29 0.16 0.88 -0.04 27 -2.08 0.45 -0.21 0.01 -0.08 0.92 0.06 28 -0.79 0.29 0.22 0.83 0.86 0.80 0.03 29 -1.13 0.28 0.06 0.86 0.56 0.84 0.05 30 -1.54 0.03 0.99 1.11 0.16 0.88 -0.09 31 -0.23 0.60 -0.73 -0.73 0.06 0.72 0.27 32 -2.92 -0.02 1.11 0.57 2.35 0.96 -0.02 33 -2.08 0.45 -0.21 0.01 -0.08 0.92 0.06 34 -2.92 0.21 0.42 0.46 -0.47 0.96 0.04 35 -1.54 0.36 0.30 0.21 0.16 0.88 0.02 36 -1.13 0.40 0.04 0.22 -0.63 0.84 0.05 37 -1.54 0.31 0.45 0.28 0.16 0.88 0.01 38 -0.49 0.39 -0.01 0.46 -0.51 0.76 0.07 39 -1.13 0.34 0.70 0.25 -0.63 0.84 -0.11


Appendix C (continued'!

88

Item #

40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82

Logit Point Unwt. Wt. Ability Mean item bis. total total between item diff. corr. fit fit fit score

-1.54 0.24 1.03 0.34 0.16 0.88 -1.54 0.54 -0.19 -0.55 0.25 0.88 -0.79 0.62 -0.54 -0.92 -1.34 0.80 -1.54 0.61 -0.56 -0.74 0.25 0.88 -0.23 0.63 -0.81 -1.04 0.06 0.72 0.02 0.44 1.46 -0.30 -1.31 0.68

-0.49 0.22 0.60 1.14 1.56 0.76 -0.49 0.15 1.13 1.22 1.56 0.76 -0.23 0.70 -1.22 -1.43 1.46 0.72 0.02 0.42 0.07 0.35 0.38 0.68

-0.23 0.66 -0.98 -1.17 0.06 0.72 -0.49 0.52 -0.22 -0.34 0.08 0.76 -0.49 0.30 0.47 0.73 0.08 0.76 0.02 0.53 -0.44 -0.34 0.55 0.68 0.46 0.23 1.27 1.49 0.50 0.60

-0.49 0.44 0.08 0.08 -0.51 0.76 0.46 0.43 -0.08 0.40 -1.21 0.60

-0.79 0.55 -0.32 -0.48 -1.34 0.80 0.88 0.52 -0.57 -0.22 0.51 0.52 0.67 0.60 -0.90 -1.05 -0.62 0.56 0.46 0.37 0.44 0.74 0.50 0.60 0.24 0.63 -1.03 -1.17 1.01 0.64 0.67 0.38 0.42 0.71 1.11 0.56 0.46 0.17 1.51 1.93 1.66 0.60

-0.23 0.41 0.00 0.42 0.06 0.72 0.67 0.65 -1.27 -1.50 0.81 0.56 0.46 0.24 1.73 1.37 0.50 0.60 0.24 0.53 -0.29 -0.46 -0.41 0.64

-0.79 0.34 -0.04 0.73 0.86 0.80 1.49 0.23 0.75 1.76 2.05 0.40 0.02 0.38 0.38 0.50 0.38 0.68 1.08 0.34 0.54 1.07 1.07 0.48 1.08 0.57 -0.84 -0.80 -0.58 0.48 0.46 0.66 -1.24 -1.49 1.45 0.60 1.29 0.44 -0.07 0.24 0.41 0.44 2.17 0.49 -0.46 -0.16 1.24 0.28 0.88 0.67 -1.44 -1.82 1.36 0.52 1.08 0.44 -0.15 0.35 -0.23 0.48 1.29 0.55 -0.67 -0.58 -1.38 0.44 0.88 0.38 0.42 0.69 -1.13 0.52 1.93 0.49 -0.32 -0.20 0.20 0.32 1.93 0.50 0.08 -0.54 -0.93 0.32 0.46 0.44 0.04 0.27 -1.21 0.60

Logit Residual

Index

-0.16 0.08 0.16 0.10 0.28

-0.79 -0.07 -0.28 0.35 0.10 0.31 0.14

-0.07 0.15

-0.59 0.08 0.16 0.14 0.36 0.46

-0.05 0.41

-0.08 -0.75 0.06 0.57

-0.94 0.20 0.04

-0.21 -0.09 -0.15 0.46 0.51 0.15 0.21 0.64 0.20 0.39

-0.10 0.14 0.04 0.10

(appendix continues')


89

Item  Logit       Point bis.  Unwt.       Wt.         Ability     Mean item   Logit Residual
#     item diff.  corr.       total fit   total fit   between fit score       Index
83      1.93        0.52       -0.39       -0.50        0.20        0.32        0.22
84      2.43        0.22        0.48        1.05       -0.88        0.24       -0.08
85      3.05        0.40       -0.06        0.06       -0.40        0.16        0.08
86      1.71        0.61       -0.74       -1.27       -1.03        0.36        0.35
87      2.17        0.13        2.83        1.02        1.28        0.28       -1.96
88      2.72        0.13        0.47        1.35       -1.19        0.20       -0.03
89      2.17        0.59       -0.17       -1.22       -0.28        0.28        0.10
90      1.71        0.54       -0.44       -0.61       -1.03        0.36        0.26
91      2.72        0.47       -0.35       -0.15        0.70        0.20        0.13
92      3.05        0.07        3.13        0.59       -0.40        0.16       -2.06
93      2.17        0.30        0.58        0.64       -0.22        0.28       -0.12
94      2.17        0.18        2.69        0.91       -0.28        0.28       -1.89
95      2.72        0.51       -0.39       -0.40        0.70        0.20        0.15
96      3.95        0.29        0.32        0.09       -0.15        0.08        0.03
97      2.43        0.29        0.67        0.56       -0.88        0.24       -0.15
98      4.76        0.39        0.02        0.03       -0.51        0.04        0.03
99      4.76        0.27        0.23        0.27       -0.51        0.04        0.04
100     4.76        0.07        0.74        0.45       -0.51        0.04        0.03
Mean    0.29        0.39        0.15        0.00        0.03        0.67
S.D.    2.13        0.21        0.78        0.78        0.78
Groups                                                     2

Note: Raw score mean = 66.52 with a S.D. of 14.91. Mean person ability = 0.97 with a S.D. of 1.24. Test reliability (K.R. 20) = 0.94. Reliability of person separation = 0.94.




Logit Point Item item bis.

# diff. corr.

Unwt. total fit

Wt. total fit

Ability between

fit

Mean item score

Logit Residual

Index 0.45 -1.06 0.04 1.00 -0.00 0.45 -1.06 0.04 1.00 -0.00 0.98 0.31 0.33 0.92 -0.11 0.78 0.54 0.87 0.94 -0.01

-0.05 0.13 -0.34 0.96 0.04 -0.49 -0.39 0.26 0.90 0.08 -0.51 -0.38 0.90 0.82 0.12 0.33 0.53 -0.63 0.88 0.02 1.23 0.69 1.83 0.84 -0.21

-0.12 -0.35 -1.20 0.84 0.07 0.48 -0.28 0.47 0.84 -0.05 0.03 0.24 1.35 0.64 0.09 0.42 0.60 -0.89 0.70 -0.05

-0.03 0.13 0.09 0.68 0.08 0.83 1.05 -1.34 0.52 -0.26

-0.05 -0.46 0.83 0.52 0.02 -0.84 -0.73 -0.63 0.54 0.33 2.41 1.81 -1.12 0.48 -1.13

-1.10 -1.06 0.70 0.44 0.38 -0.33 0.02 -0.85 0.30 0.11 0.54 0.77 -1.10 0.40 -0.11

-0.81 -0.63 -0.63 0.34 0.26 -1.53 -1.90 1.15 0.26 0.29 0.26 0.25 -0.21 0.16 0.00

-0.27 -0.13 0.45 0.10 0.06 0.12 -0.05 0.01 0.64

1 2 3 4 5 6 7 8 9

10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

-4.57 -4.57 -2.28 -2.62 -3.09 -2.00 -1.18 -1.76 -1.35 -1.35 -1.35 0.02

-0.33 -0.21 0.67 0.67 0.57 0.88 1.10 1.90 1.31 1.65 2.16 2.96 3.64

Mean S.D.

Groups

-0.37 2.16

-9.99 -9.99 0.21 0.11 0.30 0.45 0.48 0.27 0.21 0.44 0.40 0.45 0.38 0.45 0.39 0.52 0.56 0.28 0.59 0.49 0.42 0.56 0.68 0.41 0.45 0.41 0.19 0.82 0.80 0.88

2

Note: Raw score mean = 16.02 with a S.D. of 4.05. Mean person ability = 0.81 with a S.D. of 1.35. Test reliability (K.R. 20) = 0.79. Reliability of person separation = 0.81.



Experiment 5: Summary of Item Fit Information for a Normally Distributed Item Difficulty Distribution With 50 Items, 50 Persons, and No Guessing

Item  Logit       Point bis.  Unwt.       Wt.         Ability     Mean item   Logit Residual
#     item diff.  corr.       total fit   total fit   between fit score       Index
1      -3.72        0.44       -0.23       -0.15       -0.59        0.98        0.01
2      -4.48       -9.99        0.15       -0.94        0.07        1.00       -0.00
3      -3.72        0.17        0.40        0.47       -0.59        0.98        0.03
4      -3.72        0.24        0.19        0.41       -0.59        0.98        0.03
5      -2.03        0.39       -0.23        0.17        0.27        0.92        0.06
6      -3.72        0.10        0.65        0.50       -0.59        0.98        0.03
7      -2.91        0.29        0.31        0.24       -0.25        0.96        0.02
8      -2.41        0.47       -0.64        0.02        0.03        0.94        0.05
9      -1.47        0.43       -0.22        0.13       -1.56        0.88        0.06
10     -1.47        0.58       -0.91       -0.66        0.70        0.88        0.10
11     -1.25        0.46       -0.50        0.06        0.90        0.86        0.06
12     -2.03        0.31        0.66        0.26       -0.12        0.92       -0.04
13     -0.55        0.40        0.19        0.39       -0.72        0.78       -0.02
14     -2.41        0.42       -0.28       -0.01        0.03        0.94        0.05
15     -1.47        0.26        0.71        0.58        2.07        0.88       -0.03
16     -1.25        0.48       -0.61        0.02        0.90        0.86        0.09
17     -1.47        0.58       -0.98       -0.61        0.70        0.88        0.10
18     -1.25        0.46        0.44       -0.34        0.19        0.86       -0.05
19     -1.05        0.41        0.59       -0.06        1.12        0.84       -0.04
20     -0.55        0.41       -0.11        0.44        0.64        0.78       -0.03
21     -1.25        0.33        0.59        0.56       -0.80        0.86       -0.09
22      0.32        0.45       -0.07        0.08       -0.12        0.64        0.07
23     -0.28        0.34        1.14        0.49        1.17        0.74       -0.20
24     -0.28        0.41        0.18        0.40       -1.57        0.74        0.01
25     -0.55        0.48       -0.07       -0.24       -0.24        0.78        0.06
26     -0.02        0.49       -0.59       -0.06       -0.91        0.70        0.19
27     -0.41        0.39        0.26        0.51        0.56        0.76        0.01
28      0.21        0.48       -0.52       -0.06       -0.62        0.66        0.19
29     -0.02        0.47        0.86       -0.33       -0.29        0.70       -0.27
30      0.21        0.49       -0.38       -0.20       -0.62        0.66        0.15
31      0.84        0.45        0.09       -0.06        0.14        0.54        0.01
32      0.53        0.57       -0.63       -1.23        0.70        0.60        0.20
33      0.74        0.47       -0.26       -0.10       -0.98        0.56        0.17
34      1.04        0.51       -0.65       -0.69        0.17        0.50        0.29
35      1.04        0.37        0.38        0.87       -0.36        0.50       -0.06
36      1.04        0.31        0.82        1.38        0.62        0.50       -0.24
37      0.94        0.35        0.64        1.09        1.06        0.52       -0.18
38      1.24        0.46       -0.03       -0.32        0.28        0.46        0.04
39      1.34        0.55       -1.02       -1.20       -0.15        0.44        0.41
40      1.45        0.32        0.97        0.94        0.42        0.42       -0.34
41      1.55        0.49       -0.70       -0.60       -0.17        0.40        0.31
42      2.20        0.19        0.99        1.33        0.05        0.28       -0.22
43      2.32        0.47       -0.55       -0.70       -0.18        0.26        0.17
44      1.97        0.41        0.11       -0.20       -0.85        0.32        0.01
45      2.45        0.43       -0.18       -0.53       -0.56        0.24        0.09
46      2.87        0.28        0.24        0.29        0.76        0.18        0.02
47      3.21        0.15        0.57        0.68        0.19        0.14       -0.02
48      5.42        0.15        0.39        0.31       -0.48        0.02        0.03
49      3.63        0.24       -0.00        0.25        0.51        0.10        0.06
50      4.68        0.15        0.26        0.28       -0.14        0.04        0.04
Mean   -0.09        0.39        0.05        0.08        0.00        0.65
S.D.    2.20        0.14        0.55        0.58        0.72
Groups                                                     2

Note: Raw score mean = 32.36 with a S.D. of 7.55. Mean person ability = 1.00 with a S.D. of 1.20. Test reliability (K.R. 20) = 0.89. Reliability of person separation = 0.88
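The table notes report a K.R. 20 test reliability alongside the raw score mean and S.D. As a reminder of what that coefficient summarizes, the following is a minimal Python sketch of the Kuder-Richardson Formula 20. It assumes a persons-by-items matrix of 0/1 scores and the population (divide-by-N) variance of raw scores. The function name `kr20` and the small demonstration matrix are illustrative only; they are not taken from the study or from the software used to produce these tables.

```python
from typing import Sequence


def kr20(responses: Sequence[Sequence[int]]) -> float:
    """Kuder-Richardson Formula 20 for a persons x items matrix of 0/1 scores.

    Assumption: the raw-score variance is the population variance
    (divide by N); some programs divide by N - 1 instead.
    """
    n_persons = len(responses)
    n_items = len(responses[0])

    # Proportion of persons answering each item correctly (mean item score).
    p = [sum(row[j] for row in responses) / n_persons for j in range(n_items)]
    sum_pq = sum(pj * (1.0 - pj) for pj in p)

    # Raw (total) score for each person and its variance.
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_persons
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_persons
    if var_total == 0:
        raise ValueError("KR-20 is undefined when all raw scores are equal")

    return (n_items / (n_items - 1)) * (1.0 - sum_pq / var_total)


# Hypothetical 4-person, 3-item data set (not from the study).
demo = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(demo))  # 0.75
```

The per-item proportions computed inside the function correspond to the "Mean item score" column of the tables, so the same matrix yields both quantities.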




Item  Logit       Point bis.  Unwt.       Wt.         Ability     Mean item   Logit Residual
#     item diff.  corr.       total fit   total fit   between fit score       Index
1      -4.41       -9.99       -0.02       -1.04        0.05        1.00       -0.00
2      -4.41       -9.99       -0.02       -1.04        0.05        1.00       -0.00
3      -4.41       -9.99       -0.02       -1.04        0.05        1.00       -0.00
4      -3.68        0.06        0.73        0.43       -0.51        0.98        0.02
5      -3.68        0.35       -0.20        0.12       -0.51        0.98        0.02
6      -2.08        0.22        0.74        0.31        1.64        0.92       -0.03
7      -3.68        0.35       -0.20        0.12       -0.51        0.98        0.02
8      -1.55        0.50       -0.35       -0.43       -1.28        0.88        0.05
9      -2.91        0.39       -0.41       -0.02       -0.17        0.96        0.04
10     -2.91        0.19        0.15        0.41       -0.17        0.96        0.04
11     -3.68       -0.23        2.75        0.46        2.34        0.98       -1.00
12     -1.79        0.19        0.36        0.74       -0.93        0.90        0.02
13     -2.43        0.30       -0.10        0.26        0.10        0.94        0.04
14     -1.79        0.56       -0.88       -0.81        0.55        0.90        0.08
15     -1.79        0.26        0.25        0.46       -0.93        0.90        0.03
16     -1.55        0.28        0.13        0.56       -1.28        0.88        0.04
17     -2.43        0.19        0.66        0.46        0.34        0.94       -0.03
18     -2.91        0.46       -0.65       -0.20       -0.17        0.96        0.03
19     -1.34        0.34        0.16        0.30       -0.67        0.86        0.04
20     -1.79        0.42       -0.17       -0.21       -0.93        0.90        0.05
21     -1.79        0.40        0.11       -0.23       -0.93        0.90        0.02
22     -1.55        0.48       -0.76       -0.13        0.75        0.88        0.08
23     -1.55        0.50       -0.71       -0.41        0.75        0.88        0.09
24     -2.43        0.26        0.22        0.22        0.10        0.94        0.03
25     -1.14        0.51       -0.63       -0.51        1.14        0.84        0.11
26     -1.55        0.41       -0.43        0.15        0.75        0.88        0.07
27     -1.55        0.61       -1.15       -1.00        0.75        0.88        0.11
28     -1.55        0.41       -0.43        0.06        0.75        0.88        0.08
29     -0.81        0.40       -0.18        0.35        0.41        0.80        0.06
30     -1.79        0.19        0.64        0.59       -0.93        0.90       -0.01
31     -1.34        0.36        0.09        0.20       -0.67        0.86        0.04
32     -0.81        0.28        0.79        0.77        1.29        0.80       -0.10
33     -1.14        0.36       -0.20        0.46       -0.25        0.84        0.05
34     -0.26        0.25        0.98        1.35        0.69        0.72       -0.23
35     -0.66        0.23        1.22        1.11       -0.35        0.78       -0.28
36     -1.55        0.33        0.58        0.21        0.56        0.88       -0.06
37     -1.14        0.54       -0.62       -0.75       -0.44        0.84        0.10
38     -1.55        0.30       -0.01        0.50        0.75        0.88        0.05
39     -0.14        0.33        0.42        1.02        1.23        0.70       -0.08





# diff. corr. fit fit

Ability Mean between item

fit score

Logit Residual

Index 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82

-0.66 -0.14 -0.26 -0.52 -0.52 0.20

-0.52 -0.39 -0.81 0.42 0.09

-0.02 -0.39 -0.02 -0.66 -0.66 -0.02 -0.26 0.31

-0.02 0.42 0.52 0.31 0.52 0.20 0.72 0.52 1.02 0.72 0.62 0.42 0.20 0.92 1.22 1.02 1.02 1.53 1.32 1.96 1.22 1.32 1.32 1.32

0.37 0.34 0.41 -0.35 0.78 -0.03 0.44 -0.22 0.20 -0.20 0.70 0.03 0.63 -1.34 -1.43 0.53 0.72 0.29 0.46 -0.29 -0.11 -0.90 0.76 0.11 0.45 -0.23 0.00 -0.18 0.76 0.05 0.51 -0.10 -0.65 -0.01 0.64 0.07 0.44 -0.37 0.20 0.96 0.76 0.11 0.34 0.58 0.67 0.03 0.74 -0.11 0.48 -0.39 -0.20 -1.20 0.80 0.08 0.58 -1.33 -1.14 -0.21 0.60 0.43 0.32 0.94 1.11 -0.71 0.66 -0.26 0.47 -0.43 -0.02 -1.14 0.68 0.15 0.43 -0.20 0.25 -1.35 0.74 0.08 0.36 0.85 0.61 -1.14 0.68 -0.22 0.44 -0.22 -0.01 -0.35 0.78 0.06 0.55 -0.73 -0.68 -0.61 -0.78 0.15 0.51 -0.78 -0.33 1.14 0.68 0.20 0.42 0.80 -0.09 -0.45 0.72 -0.20 0.37 0.83 0.61 0.54 -0.62 -0.19 0.47 0.30 -0.33 -1.14 0.68 -0.06 0.42 0.29 0.32 -1.22 0.60 -0.07 0.31 1.38 1.23 -0.43 0.58 -0.49 0.24 1.48 1.77 2.11 0.62 -0.40 0.35 0.78 0.94 0.59 0.58 -0.21 0.53 -0.85 -0.60 -1.51 0.64 0.28 0.36 0.61 0.88 -0.40 0.54 -0.16 0.48 -0.22 -0.33 -0.91 0.58 0.12 0.36 0.94 0.65 -1.22 0.48 -0.39 0.52 -0.88 -0.64 0.21 0.54 0.35 0.50 -0.61 -0.34 -1.10 0.56 0.25 0.54 -1.01 -0.72 0.09 0.60 0.36 0.46 -0.12 -0.05 0.91 0.64 -0.03 0.49 -0.65 -0.32 -0.92 0.50 0.29 0.51 -0.63 -0.86 -0.09 0.44 0.21 0.30 0.68 1.62 2.39 0.48 -0.14 0.41 0.08 0.42 -1.22 0.48 0.03 0.47 -0.59 -0.34 -0.28 0.38 0.23 0.54 -1.09 -1.11 0.49 0.42 0.39 0.44 -0.20 -0.41 -0.75 0.30 0.01 0.47 -0.02 -0.54 0.84 0.44 0.03 0.25 2.49 1.36 0.37 0.42 -1.26 0.40 -0.08 0.45 -0.74 0.42 0.08 0.38 0.05 0.62 1.22 0.42 0.06

(appendix continues)



Item  Logit       Point bis.  Unwt.       Wt.         Ability     Mean item   Logit Residual
#     item diff.  corr.       total fit   total fit   between fit score       Index
83      2.08        0.31        0.05        0.81        0.17        0.28        0.04
84      2.20        0.46       -0.53       -0.55       -0.16        0.26        0.15
85      1.74        0.53       -0.95       -1.08        0.06        0.34        0.29
86      1.96        0.35       -0.06        0.56       -0.41        0.30        0.07
87      2.46        0.34       -0.14        0.29       -0.99        0.22        0.08
88      2.20        0.39       -0.21        0.01       -0.16        0.26        0.09
89      1.63        0.56       -1.12       -1.40        0.40        0.36        0.35
90      2.60        0.36       -0.10       -0.01        0.36        0.20        0.07
91      2.33        0.39       -0.28       -0.00       -0.53        0.24        0.10
92      3.09        0.25        0.24        0.25        0.17        0.14        0.02
93      2.46        0.28        0.79        0.31        1.13        0.22       -0.13
94      3.09        0.36       -0.25       -0.13       -0.81        0.14        0.07
95      3.29        0.48       -0.82       -0.68        0.70        0.12        0.10
96      3.09        0.37       -0.39       -0.15        0.87        0.14        0.08
97      2.46        0.34       -0.15        0.27       -0.99        0.22        0.08
98      4.12        0.26        0.38       -0.04        0.29        0.06       -0.01
99      4.56        0.03        0.71        0.37        1.01        0.04        0.01
100     5.30        0.07        0.59        0.35       -0.48        0.02        0.03
Mean   -0.13        0.38        0.00        0.06        0.04        0.62
S.D.    1.49        0.15        0.72        0.66        0.85
Groups                                                     2

Note: Raw score mean = 64.40 with a S.D. of 15.30. Mean person ability = 0.89 with a S.D. of 1.14. Test reliability (K.R. 20) = 0.94. Reliability of person separation = 0.94.


Appendix C (continued)

Experiment 7: Summary of Item Fit Information for a Normally Distributed Item Difficulty Distribution With 25 Items, 100 Persons, and No Guessing

Logit Point Item item bis.

# diff. corr.

Unwt. total fit

Wt. total fit

Ability between

fit

Mean item score

Logit Residual

Index 1 2 3 4 5 6 7 8 9

10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

-5.21 -3.31 -2.98 -1.29 -1.85 -2.14 -1.49 -1.01 -1.72 -0.29 -0.49 -0.42 -0.70 -0.04 0.25 0.58 0.90 0.79 1.33 1.22 1.38 1.90 2.64 2.88 3.85

-9.99 0.21 0.12 0.47 0.43 0.26 0.30 0.44 0.40 0.47 0.42 0.34 0.49 0.51 0.51 0.42 0.51 0.45 0.44 0.53 0.47 0.45 0.38 0.49 0.46

0.67 0.07 0.67

-0.59 -0.09 0.13 0.50

-0.16 0.05

-0.56 -0.13 0.90

-0.72 0.33 0.21 0.88 0.59 0.62 0.11

-0.51 -0.04 -0.03 0.68

-0.45 -0.45

-1.11 0.35 0.52

-0.70 -0.81 0.49 0.51

-0.40 -0.58 -0.02 0.32 1.04

-0.49 -0.99 -0.74 0.78

-0.70 0.55 1.00

-0.64 0.34 0.56 0.75

-0.25 -0.45

0.05 -0.14 0.35 0.43

-0.67 -1.60 0.40 1.07

-0.38 0.26

-0.96 0.69 0.70 1.09 0.18

-0.43 0.89

-1.32 1.01

-0.45 -0.75 0.47 1.06

-0.01 -0.89

1.00 0.97 0.96 0.85 0.90 0.92 0.87 0.82 0.89 0.72 0.75 0.74 0.78 0.68 0.63 0.57 0.51 0.53 0.43 0.45 0.42 0.33 0.22 0.19 0.10

-0.00 0.04 0.02 0.10 0.04 0.04

-0.02 0.07 0.03 0.14 0.08

-0.15 0.14

-0.10 -0.06 -0.31 -0.28 -0.14 0.05 0.15 0.07 0.06

-0.06 0.08 0.06

Mean -0.21 0.42 0.11 S.D. 2.08 0.14 0.49 Groups

-0.03 0.67

0.04 0.78 2

0.65


S.D. of 4.25. .of 1.39.




Item  Logit       Point bis.  Unwt.       Wt.         Ability     Mean item   Logit Residual
#     item diff.  corr.       total fit   total fit   between fit score       Index
1      -3.86        0.18       -0.01        0.29       -0.31        0.99        0.02
2      -4.56       -9.99       -0.13       -1.21        0.04        1.00       -0.00
3      -3.14       -0.01        0.87        0.31        0.53        0.98       -0.01
4      -3.86        0.06        0.52        0.35       -0.31        0.99        0.02
5      -2.71        0.15        0.26        0.18       -0.24        0.97        0.01
6      -3.14        0.10        0.26        0.26        0.06        0.98        0.02
7      -3.86        0.26       -0.27        0.18       -0.31        0.99        0.01
8      -1.96        0.18       -0.02        0.27       -0.66        0.94        0.03
9      -1.78        0.20       -0.05        0.27       -0.27        0.93        0.04
10     -2.40        0.11        0.30        0.29       -0.95        0.96        0.02
11     -1.49        0.22       -0.18        0.33        0.30        0.91        0.05
12     -0.85        0.33       -0.10       -0.21        0.20        0.85        0.04
13     -1.24        0.38       -0.64       -0.49       -0.69        0.89        0.07
14     -1.49        0.15        0.70        0.37        0.14        0.91       -0.04
15     -2.16        0.15        0.92        0.20       -1.24        0.95       -0.10
16     -0.85        0.28        0.16        0.09       -1.01        0.85        0.02
17     -0.85        0.28        0.06        0.11       -0.35        0.85        0.02
18     -0.94        0.25        0.29        0.16       -0.56        0.86        0.10
19     -0.60        0.42       -0.41       -0.75        1.23        0.82        0.05
20     -0.53        0.43       -0.79       -0.81       -0.18        0.81        0.11
21     -0.53        0.38       -0.51       -0.35        0.69        0.81        0.09
22     -0.68        0.37       -0.62       -0.28        0.22        0.83        0.09
23     -0.13        0.42       -0.42       -0.70        1.24        0.75        0.08
24     -0.53        0.35       -0.28       -0.11       -1.59        0.81        0.06
25      0.11        0.37       -0.29        0.10       -0.01        0.71        0.10
26      0.05        0.40       -0.20       -0.40       -1.17        0.72        0.04
27      0.05        0.41       -0.42       -0.43       -0.14        0.72        0.11
28      0.05        0.46       -0.73       -0.91        1.24        0.72        0.14
29      0.22        0.43       -0.56       -0.56        0.68        0.69        0.14
30      0.38        0.43        0.14       -0.71        0.14        0.66       -0.05
31      0.44        0.29        0.71        1.19        1.48        0.65       -0.14
32      0.84        0.33        0.89        0.86        0.42        0.57       -0.27
33      0.59        0.30        0.78        1.12       -0.81        0.62       -0.20
34      0.89        0.44       -0.43       -0.58        0.51        0.56        0.13
35      0.74        0.37        0.26        0.50        0.41        0.59       -0.02
36      1.04        0.32        3.13        0.88       -0.31        0.53       -1.53
37      0.94        0.35        0.26        0.94        0.41        0.55       -0.01
38      1.47        0.40        0.05        0.38       -0.27        0.44        0.03
39      1.57        0.27        1.34        2.00        1.50        0.42       -0.31
40      1.78        0.34        1.18        0.77        0.73        0.38       -0.27
41      1.72        0.60       -2.23       -2.41        1.48        0.39        0.48
42      1.72        0.49       -0.85       -1.00        0.27        0.39        0.23
43      2.33        0.41        0.30       -0.00       -0.31        0.28       -0.03
44      2.04        0.20        2.27        2.10        0.68        0.33       -0.59
45      2.27        0.37        0.17        0.51        0.04        0.29       -0.04
46      2.51        0.33        0.28        0.71       -0.37        0.25       -0.00
47      3.18        0.57       -1.37       -1.25       -0.18        0.16        0.12
48      2.94        0.28        1.70        0.44        0.39        0.19       -0.25
49      3.96        0.34        0.49       -0.13        1.24        0.09       -0.03
50      5.72        0.34       -0.48       -0.08       -0.26        0.02        0.02
Mean   -0.09        0.31        0.13        0.06        0.08        0.67
S.D.    2.15        0.13        0.87        0.79        0.73
Groups                                                     2

Note: Raw score mean = 33.55 with a S.D. of 6.32. Mean person ability = 1.21 with a S.D. of 1.05. Test reliability (K.R. 20) = 0.83. Reliability of person separation = 0.85.


Appendix C (continued)


Logit Item item

# diff.

1 2 3 4 5 6 7 8 9

10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39

Point bis. corr.

Unwt. total fit

-3.64 -3.21 -4.37 -4.37 -3.64 -3.64 -2.63 -3.21 -2.63 -2.63 -2.63 -2.63 -1.78 -1.43 -2.07 -1.78 -1.78 -2.23 -2.07 -2.07 -0.97 -1.43 -1.33 -1.43 -2.07 -1.54 -1.54 -1.78 -0.80 -0.51 -1.23 -1.23 -1.33 -1.43 -0.88 -0.80 -0.65 -0.58 -0.65

0.24 0.19

-0.09 0.12 0.13 0.01 0.23 0.15 0.31 0.14 0.29 0.27 0.18 0.39 0.22 0.32 0.34 0.33 0.40 0.42 0.34 0.44 0.37 0.23 0.33 0.36 0.40 0.39 0.38 0.35 0.41 0.42 0.40 0.46 0.47 0.36 0.50 0.49 0.51

-0.24 -0.01 1.98 0.41 0.26 1.04

-0.00 1.07

-0.44 0.49 0.17 1.08 0.34

-0.60 0.58 0.51

-0.45 -0.39 -0.63 -0.58 0.36

-0.49 -0.42 1.36

-0.44 -0.15 -0.30 -0.82 0.82 0.91 0.33

-0.51 0.25

-0.66 -0.62 0.94

-0.99 0.32

-0.17

Wt. total fit

Ability between

fit

Mean item score

Logit Residual

Index 0.16 -0.33 0.98 0.02 0.28 -0.12 0.97 0.04 0.38 2.70 0.99 -0.26 0.33 -0.62 0.99 0.02 0.30 -0.33 0.98 0.03 0.37 1.49 0.98 -0.00 0.32 0.22 0.95 0.04 0.15 0.80 0.97 -0.07 0.01 0.22 0.95 0.04 0.52 -0.16 0.95 0.02

-0.14 -0.16 0.95 0.02 -0.21 -0.16 0.95 -0.13 1.11 -0.65 0.90 0.03 0.07 1.19 0.87 0.08 0.38 1.91 0.92 0.00 0.14 -0.65 0.90 -0.05 0.20 0.86 0.90 0.06

-0.04 0.49 0.93 0.05 -0.35 0.62 0.92 0.06 -0.65 -1.50 0.92 0.05 0.67 1.09 0.82 0.00

-0.53 0.05 0.87 0.07 0.27 -1.37 0.86 0.07 1.01 -1.15 0.87 -0.20 0.09 0.62 0.92 0.05 0.03 -0.74 0.88 0.05

-0.35 -0.74 0.88 0.05 -0.04 0.86 0.90 0.07 0.27 -0.45 0.80 -0.12 0.87 1.21 0.76 -0.11

-0.20 -0.93 0.85 -0.04 -0.17 0.42 0.85 0.07 -0.31 0.11 0.86 -0.02 -0.67 -1.15 0.87 0.08 -0.49 0.11 0.81 0.10 0.32 2.14 0.80 -0.10

-0.43 1.46 0.78 0.13 -0.93 -0.02 0.77 -0.04 -1.09 -0.27 0.78 0.03





# diff. corr. fit

Wt. total fit

Ability between

fit

Mean item score

Logit Residual

Index

40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82

-0.88 -0.44 -0.31 -0.18 -0.44 -0.73 -0.58 -0.37 -0.58 -0.24 -0.24 0.30

-0.12 0.52

-0.18 0.24 0.30 0.24 0.63 0.63 0.46 0.52 0.35 0.06 0.63 0.79 0.52 1.00 0.84 1.11 0.95 0.95 1.38 1.44 1.27 1.00 1.66 1.66 1.66 1.55 1.38 1.66 1.71

0.36 0.54 0.39 0.47 0.44 0.50 0.45 0.48 0.45 0.46 0.47 0.42 0.45 0.47 0.37 0.50 0.50 0.48 0.57 0.52 0.38 0.40 0.47 0.42 0.35 0.54 0.52 0.38 0.53 0.51 0.57 0.46 0.51 0.55 0.51 0.49 0.49 0.55 0.46 0.56 0.52 0.45 0.45

0.17 -1.22 1.39 0.72 0.07

-0.87 -0.42 0.20

-0.31 -0.62 -0.24 0.46 0.58 0.22 0.85

-0.55 -0.45 0.46

-1.22 -0.52 1.33 0.79 0.96 0.77 1.99 0.00

-0.56 1.82

-0.32 0.16

-1.17 0.32

-0.51 -0.87 -0.32 0.74

-0.26 -1.00 -0.16 -0.87 -0.06 0.07 1.34

0.59 -0.83 0.48

-0.45 0.15

-0.68 -0.01 -0.50 -0.02 0.44

-0.01 1.08 0.22 0.42 1.31

-0.03 0.01

-0.03 -1.03 -0.17 1.48 1.42 0.19 0.71 1.95

-0.66 -0.33 1.53

-0.34 -0.18 -0.98 0.75 0.01

-0.68 -0.03 0.00 0.18

-0.59 0.69

-0.89 -0.34 0.75 0.06

-1.24 0.41

-0.42 -0.07 -0.90 -0.54 -1.31 -1.36 -1.31 -0.76 -0.76 0.03

-0.37 -1.29 -0.07 -0.16 -0.89 -0.49 0.38

-0.37 -1.11 0.96 0.47

-0.40 1.59

-1.04 -0.23 -1.15 0.51 1.14 1.42 1.14 1.05 0.74

-1.26 -1.15 -1.32 0.35 1.16

-0.65 0.35 0.52 0.09

0.81 0.75 0.73 0.71 0.75 0.79 0.77 0.74 0.77 0.72 0.72 0.63 0.70 0.59 0.71 0.64 0.63 0.64 0.57 0.57 0.60 0.59 0.62 0.67 0.57 0.54 0.59 0.50 0.53 0.48 0.51 0.51 0.43 0.42 0.45 0.50 0.38 0.38 0.38 0.40 0.43 0.38 0.37

0.02 0.18

-0.24 -0.15 -0.01 0.13 0.08

-0.04 0.08 0.14 0.06

-0.04 -0.11 -0.07 -0.14 0.13 0.11

-0.09 0.27 0.15

-0.31 -0.15 -0.30 -0.14 -0.48 -0.04 0.17

-0.47 0.09

-0.05 0.29

-0.03 0.16 0.20 0.10

-0.25 0.10 0.22 0.07 0.22 0.01 0.01

-0.33




Logit Point Unwt. Wt. Ability Mean Item item bis. total total between item

# diff. corr. fit fit fit score

83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99

100

2.01 2.01 2.01 1.95 1.95 2.25 2.97 2.81 2.81 3.44 3.06 2.74 3.44 3.44 3.66 4.05 4.40 5.60

Mean S.D. Groups

0.00 2.03

0.46 0.53 0.54 0.57 0.50 0.39 0.44 0.41 0.42 0.40 0.33 0.37 0.33 0.44 0.40 0.38 0.27 0.09 0.40 0.13

-0.11 -0.86 -0.33 -1.19 -0.42 0.63 0.52

-0.44 0.50

-0.26 1.89 0.17 0.12

-0.91 -0.38 -0.55 -0.05 1.02 0.07 0.75

0.28 -0.53 -0.90 -1.18 -0.29 0.81

-0.39 0.21

-0.15 -0.41 0.23 0.56 0.07

-0.38 -0.49 -0.40 0.04 0.22

-0.02 1.63 1.01 1.20

-0.25 -0.66 0.85

-0.92 -0.92 -1.10 -0.40 0.38 0.42 1.17

-0.38 -1.38 0.36 1.44

0.05 0.60

0.01 0.94 2

0.32 0.32 0.32 0.33 0.33 0.28 0.18 0.20 0.20 0.13 0.17 0.21 0.13 0.13 0.11 0.08 0.06 0.02 0.63

Logit Residual

Index

0.05 0.17 0.05 0.21 0.11

-0.10 -0.13 0.08

-0.09 0.06

-0.40 0.01 0.03 0.09 0.05 0.05 0.04

-0.04

Note: Raw score mean = 63.49 with a S.D. of 16.62. Mean person ability = 1.00 with a S.D. of 1.30. Test reliability (K.R. 20) = 0.95. Reliability of person separation = 0.95.



Experiment 10: Summary of Item Fit Information for a Normally Distributed Item Difficulty Distribution With 25 Items, 25 Persons, and a 25% Chance of Guessing Correctly

Logit Point Unwt. Wt. Ability Mean Item item bis. total total between item

# diff. corr. fit fit fit score

Logit Residual

Index

1 2 3 4 5 6 7 8 9

10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

-3.75 -3.00 -3.00 -2.21 -3.75 -3.00 -1.70 -0.97 -0.97 -0.42 -0.68 -0.97 -0.42 0.06

-0.68 0.49 0.28 1.13 0.71 1.13 1.57 3.32 2.59 2.92 3.85

Mean -0.30 S.D. 2.16 Groups

-9.99 0.41 0.26 0.11

-9.99 0.26 0.39 0.48 0.09 0.15 0.24 0.60 0.17 0.62 0.72 0.57 0.59 0.41 0.62 0.50 0.57 0.14 0.04 0.66 0.54 0.40 0.25

0.21 0.01 0.27 0.68 0.21 0.27 0.15

-0.28 1.62 3.11 0.44

-0.77 1.44

-0.78 -1.22 -0.60 -0.63 0.34

-0.83 -0.17 -0.48 1.68 1.55

-0.94 -0.45 0.19 1.03

-0.97 0.00 0.30 0.57

-0.97 0.30

-0.12 -0.22 1.11 1.16 1.15

-0.82 1.36

-0.90 -1.83 -0.45 -0.70 0.64

-0.96 -0.02 -0.66 0.28 1.40

-1.19 -0.52

0.05 -0.36 -0.36 0.70 0.05

-0.36 -0.18 -0.81 0.37 0.57

-0.36 1.04 0.57 0.22 1.37

-1.33 0.89

-1.09 1.10

-1.09 0.11 1.18 2.01

-0.01 -0.46

-0.08 0.89

0.15 0.83 2

Note: Raw score mean = 16.16 with a S.D. of 3.83. Mean person ability = 0.83 with a S.D. of 1.27. Test reliability (K.R. 20) = 0.78. Reliability of person separation = 0.78.

1.00 0.96 0.96 0.92 1.00 0.96 0.88 0.80 0.80 0.72 0.76 0.80 0.72 0.64 0.76 0.56 0.60 0.44 0.52 0.44 0.36 0.12 0.20 0.16 0.08

0.65

-0.00 0.03 0.04 0.01

-0.00 0.04 0.05 0.13

-0.52 -2.27 -0.04 0.18

-0.53 0.33 0.28 0.34 0.31

-0.07 0.42 0.15 0.25

-0.47 -0.46 0.16 0.07


Appendix C (continued)

Experiment 11: Summary of Item Fit Information for a Normally Distributed Item Difficulty Distribution With 50 Items, 25 Persons, and a 25% Chance of Guessing Correctly

Item #

Logit item diff.

Point bis. corr.

Unwt. Wt. Ability Mean Logit total total between item Residual fit fit fit score Index

1 2 3 4 5 6 7 8 9

10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39

-3.79 -3.79 -3.06 -3.06 -3.79 -3.06 -3.79 -3.06 -1.82 -1.82 -1.45 -3.06 -1.15 -0.65 -1.82 -1.15 -1.45 -1.15 -0.65 -0.65 -0.65 -1.15 -0.22 -0.65 -0.43 -0.22 -0.22 -0.43 -0.43 0.18 0.38 0.18 0.18 0.18 0.38 0.98 0.77 1.40 1.40

-9.99 -9.99 0.22 0.03

-9.99 0.22

-9.99 0.22 0.33 0.48 0.55

-0.16 0.05 0.28 0.40 0.25 0.43 0.35 0.37 0.31 0.38

-0.04 0.37 0.19 0.51 0.40 0.26 0.48 0.46 0.30 0.45 0.38 0.32 0.55 0.51 0.39 0.65 0.24 0.69

-0.21 -0.21 0.14 0.69

-0.21 0.14

-0.21 0.14

-0.19 -0.59 -0.80 1.45 0.84 0.67

-0.37 0.25

-0.42 -0.34 -0.31 -0.06 -0.19 1.40

-0.08 0.55

-0.74 -0.21 0.42

-0.45 -0.47 0.66

-0.04 0.87 0.24

-0.95 -0.68 0.36

-1.42 0.83

-1.42

-1.05 -1.05 0.29 0.39

-1.05 0.29

-1.05 0.29 0.02

-0.39 -0.72 0.43 0.94 0.26

-0.14 0.36

-0.30 0.14 0.18 0.47 0.01 1.07 0.26 0.87

-0.65 0.12 0.89

-0.57 -0.37 0.73

-0.19 0.00 0.82

-0.88 -0.53 0.26

-1.39 0.92

-1.34

0.05 0.05

-0.06 1.28 0.05

-0.06 0.05

-0.06 0.82 0.82 1.17 1.28 1.03

-0.27 0.82

-0.61 1.17 1.51

-0.27 -0.27 1.01 2.26 0.94 2.12 0.39 0.94 0.94

-1.45 0.35

-0.27 0.54

-0.27 1.89

-0.27 -1.05 -1.31 -0.54 -0.19 1.19

1.00 1.00 0.96 0.96 1.00 0.96 1.00 0.96 0.88 0.88 0.84 0.96 0.80 0.72 0.88 0.80 0.84 0.80 0.72 0.72 0.72 0.80 0.64 0.72 0.68 0.64 0.64 0.68 0.68 0.56 0.52 0.56 0.56 0.56 0.52 0.40 0.44 0.32 0.32

-0.00 -0.00 0.04 0.01

-0.00 0.04

-0.00 0.04 0.08 0.11 0.15

-0.20 -0.12 -0.20 0.10 0.10 0.12 0.13 0.15 0.10 0.14

-0.36 0.13

-0.07 0.30 0.17

-0.09 0.23 0.24

-0.29 0.06

-0.54 -0.02 0.50 0.41

-0.12 0.66

-0.27 0.46





# diff. corr. fit

40 41 42 43 44 45 46 47 48 49 50

1.89 1.18 1.89 1.89 2.89 1.64 3.40 4.19 4.19 4.94 4.19

Mean S.D. Groups

-0.21 2.22

0.38 0.39 0.27 0.47 0.46 0.27 0.29 0.43 0.13

-9.99 0.06 0.33 0.21

0.03 0.63

Wt. total fit

0.35 0.68 0.13

-0.14 -0.46 1.11 0.31

-0.26 0.42

-0.23 0.62

0.16 0.11 0.86

-0.16 -0.04 0.66 0.14 0.07 0.40

-0.99 0.43

Ability between

fit

-1.34 0.11 0.71

-1.34 0.00

-0.87 -0.27 -0.60 -0.60 0.06

-0.60 -0.00 0.64

0.22 0.90 2

Mean item score

0.24 0.36 0.24 0.24 0.12 0.28 0.08 0.04 0.04 0.00 0.04 0.61

Logit Residual

Index

-0.08 -0.35 -0.01 0.09 0.09

-0.42 0.02 0.04 0.04

-0.00 0.02

Note: Raw score mean = 30.32 with a S.D. of 6.46. Mean person ability = 0.52 with a S.D. of 1.03. Test reliability (K.R. 20) = 0.84. Reliability of person separation = 0.85.



Experiment 12: Summary of Item Fit Information for a Normally Distributed Item Difficulty Distribution With 100 Items, 25 Persons, and a 25% Chance of Guessing Correctly

Logit Item item

# diff.

Point bis. corr.

Unwt. total fit

Wt. total fit


fit score Index

1 2 3 4 5 6 7 8 9

10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39

-2.95 -3.67 -3.67 -3.67 -3.67 -3.67 -3.67 -3.67 -2.19 -2.19 -3.67 -2.95 -1.36 -2.19 -1.72 -2.95 -2.95 -1.72 -3.67 -1.07 -2.95 -3.67 -1.36 -2.19 -2.19 -1.07 -1.36 -1.72 -2.19 -1.72 -1.72 -1.36 -0.36 -0.81 -1.07 -0.81 -1.72 -2.19 -1.07

-0.02 -9.99 -9.99 -9.99 -9.99 -9.99 -9.99 -9.99 -0.18 0.30

-9.99 0.20 0.43

-0.03 0.12 0.17 0.08

-0.28 -9.99 0.46 0.32

-9.99 0.15 0.13 0.10 0.43 0.40 0.15 0.09

-0.18 0.03 0.52 0.29 0.19 0.26 0.46 0.25 0.41 0.39

0.75 -0.50 -0.50 -0.50 -0.50 -0.50 -0.50 -0.50 1.69

-0.23 -0.50 0.10

-0.58 1.55 0.29 0.19 0.42 2.29

-0.50 -0.68 -0.16 -0.50 0.30 0.48 0.77

-0.41 -0.30 0.20 0.42 1.63 0.63

-0.81 0.14 0.35

-0.09 -0.44 -0.10 -0.48 -0.33

0.39 -1.06 -1.06 -1.06 -1.06 -1.06 -1.06 -1.06 0.45 0.14

-1.06 0.31

-0.16 0.29 0.38 0.33 0.36 0.61

-1.06 -0.29 0.22

-1.06 0.41 0.24 0.23

-0.31 -0.17 0.34 0.33 0.62 0.48

-0.43 0.31 0.51 0.31

-0.51 0.22

-0.02 -0.15

1.55 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.30 0.23 0.04

-0.20 0.88 0.30

-0.62 -0.20 -0.20 1.43 0.04 1.17

-0.20 0.04

-1.28 0.30 0.30

-0.46 -1.28 -0.62 0.30 1.43 1.43 0.88

-0.32 -0.70 -0.46 -0.70 0.58 0.23

-0.46

0.96 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.92 0.92 1.00 0.96 0.84 0.92 0.88 0.96 0.96 0.88 1.00 0.80 0.96 1.00 0.84 0.92 0.92 0.80 0.84 0.88 0.92 0.88 0.88 0.84 0.68 0.76 0.80 0.76 0.88 0.92 0.80

-0.01 -0.00 -0.00 -0.00 -0.00 -0.00 -0.00 -0.00 -0.38 0.07

-0.00 0.04 0.13

-0.36 0.02 0.04 0.02

-0.68 -0.00 0.17 0.04

-0.00 0.01

-0.02 -0.08 0.13 0.09 0.03 0.01

-0.36 -0.04 0.15 0.02

-0.03 0.08 0.16 0.07 0.07 0.12





# diff. corr. fit fit

Ability Mean between item

fit score

Logit Residual

Index

40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82

-0.81 -1.36 -0.16 -1.07 -0.16 -0.58 -0.81 -0.58 0.03

-2.19 -0.16 0.03

-0.36 0.22 0.41

-0.36 0.78 0.03 0.03 0.03 0.41 0.41

-0.36 0.41

-0.16 0.60

-0.58 1.17

-0.16 0.60 0.60 1.17 0.41

-0.16 0.60 1.59 2.39 1.59 1.37 1.17 1.37 2.39 0.60

0.52 0.07 0.15 0.17 0.41 0.57 0.50 0.37 0.40 0.11 0.04 0.15 0.11

-0.01 0.66 0.44 0.45 0.46 0.25 0.60 0.09 0.57 0.17 0.49 0.50 0.31 0.56 0.09 0.39 0.23 0.42 0.63 0.30 0.29 0.37 0.55 0.15 0.46 0.27 0.49 0.53 0.35 0.47

-0.67 0.75 1.33 0.32

-0.38 -0.98 -0.74 -0.37 -0.33 0.46 0.97 0.88 1.02 1.90

-1.87 -0.37 -0.32 -0.46 0.49

-1.32 1.58

-1.16 0.73

-0.74 -0.59 0.36

-0.94 1.20

-0.41 0.73

-0.11 -1.24 0.28 0.19 0.05

-0.81 0.37

-0.17 0.57

-0.57 -0.46 -0.00 -0.37

-0.70 0.46 0.89 0.48

-0.15 -0.95 -0.53 0.05

-0.06 0.29 1.67 1.19 1.00 1.99

-1.89 -0.41 -0.44 -0.55 0.62

-1.38 1.58

-1.25 0.78

-0.64 -0.82 0.36

-0.91 1.28

-0.04 0.92

-0.30 -1.42 0.50 0.39 0.08

-0.76 0.45

-0.49 0.32

-0.60 -0.87 -0.10 -0.67

0.08 0.70 0.48 0.03 0.17 0.54 0.08 0.54

-0.73 0.30 1.61 2.11 1.04 0.54 0.61

-0.43 -0.69 -0.73 -0.14 0.70 1.11 0.61 1.04 0.61 1.37 0.01 0.54 0.96

-1.16 0.52 0.01 0.99

-0.11 0.48

-0.99 0.16

-1.04 0.16 0.21 0.99 0.59 1.06 0.01

0.76 0.84 0.64 0.80 0.64 0.72 0.76 0.72 0.60 0.92 0.64 0.60 0.68 0.56 0.52 0.68 0.44 0.60 0.60 0.60 0.52 0.52 0.68 0.52 0.64 0.48 0.72 0.36 0.64 0.48 0.48 0.36 0.52 0.64 0.48 0.28 0.16 0.28 0.32 0.36 0.32 0.16 0.48

0.20 -0.10 -0.70 0.00 0.21 0.31 0.22 0.16 0.21

-0.01 -0.37 -0.37 -0.38 -1.11 0.99 0.18 0.15 0.27

-0.17 0.63

-0.95 0.66

-0.22 0.45 0.27

-0.19 0.31

-0.50 0.23

-0.35 0.08 0.51

-0.08 -0.01 0.02 0.27

-0.02 0.08

-0.19 0.27 0.17 0.04 0.16

(appendix continues)




# diff. corr. fit

Wt. Ability Mean Logit total between item Residual fit fit score Index

83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99

100

1.17 1.59 1.83 2.39 1.37 1.83 3.24 2.76 1.83 1.59 4.01 2.76 3.24 3.24 2.39 2.39 4.01 4.74

0.30 0.28 0.20 0.31 0.45 0.45 0.17

-0.04 0.54 0.48 0.01 0.21 0.37 0.17 0.52 0.22 0.24

-9.99 Mean -0.32 S.D. 2.03 Groups

0.29 0.23

0.36 -0.00 0.25

-0.15 -0.52 -0.46 0.12 1.16

-0.86 -0.23 0.65 0.12

-0.37 0.12

-0.77 0.28

-0.00 -0.52 0.01 0.74

0.32 0.45 0.66 0.15

-0.32 -0.30 0.30 0.54

-0.62 -0.60 0.38 0.29 0.07 0.30

-0.39 0.27 0.28

-1.05

-0.36 -0.89 -0.32 -1.04 -1.09 -0.32 0.06 1.76 1.21

-0.89 -0.34 -0.22 0.06 0.06 0.66 1.06

-0.34 0.06

-0.05 0.72

0.18 0.72 2


S.D.of 11 of 0.84.

.22.

0.36 0.28 0.24 0.16 0.32 0.24 0.08 0.12 0.24 0.28 0.04 0.12 0.08 0.08 0.16 0.16 0.04 0.00 0.62

-0.11 0.03

-0.03 0.07 0.20 0.15 0.04

-0.19 0.23 0.10 0.00 0.03 0.07 0.04 0.14

-0.01 0.04

-0.00



Experiment 13: Summary of Item Fit Information for a Normally Distributed Item Difficulty Distribution With 25 Items, 50 Persons, and a 25% Chance of Guessing Correctly

Logit Item item

# diff.

Point bis. corr.

Unwt. total fit

Wt. total fit

Ability between

fit

Mean item score

Logit Residual

Index

1 2 3 4 5 6 7 8 9

10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

-4.04 -3.33 -2.58 -3.33 -1.79 -1.79 -1.79 -1.79 -2.12 -0.71 -0.56 -0.88 -0.71 -0.13 0.80 0.69 1.02 0.91 1.12 1.44 1.88 2.74 2.74 3.04 5.11

-9.99 -0.03 0.31 0.22 0.35 0.24 0.29 0.38 0.33 0.47 0.50 0.43 0.21 0.56 0.45 0.32 0.56 0.58 0.24 0.49 0.53 0.46 0.37 0.51 0.00

0.18 1.21

-0.22 0.20

-0.30 0.08

-0.08 -0.32 -0.22 -0.62 -0.31 -0.49 1.71

-0.97 1.14 0.88

-0.95 -0.87 2.01 0.01 0.25

-0.46 0.22

-0.58 2.41

-1.10 0.41 0.05 0.27

-0.00 0.32 0.16

-0.19 -0.02 -0.33 -0.75 -0.17 0.79

-0.86 0.17 1.64

-0.58 -0.99 2.43 0.09

-0.49 0.14 0.64

-0.47 0.33

0.05 2.80

-0.34 -0.63 0.08 0.08 0.08 0.08

-0.12 0.87

-0.26 0.72 0.13 1.48

-0.72 0.79

-0.44 0.90 2.56

-0.19 0.18

-0.43 -0.55 0.24 1.18

1.00 0.98 0.96 0.98 0.92 0.92 0.92 0.92 0.94 0.82 0.80 0.84 0.82 0.74 0.58 0.60 0.54 0.56 0.52 0.46 0.38 0.24 0.24 0.20 0.04

-0.00 -0.03 0.04 0.03 0.06 0.05 0.06 0.06 0.05 0.12 0.10 0.10

-0.41 0.20

-0.50 -0.24 0.35 0.27

-0.70 0.06

-0.11 0.13 0.00 0.12

-0.77

Mean -0.16 0.37 0.16 0.06 S.D. 2.27 0.18 0.91 0.78 Groups

0.34 0.91 2


57.

0.68



Experiment 14: Summary of Item Fit Information for a Normally Distributed Item Difficulty Distribution With 50 Items, 50 Persons, and a 25% Chance of Guessing Correctly

Item  Logit       Point bis.  Unwt.       Wt.         Ability     Mean item   Logit Residual
#     item diff.  corr.       total fit   total fit   between fit score       Index
1      -4.34       -9.99       -0.05       -1.06        0.05        1.00       -0.00
2      -4.34       -9.99       -0.05       -1.06        0.05        1.00       -0.00
3      -2.39        0.32       -0.21        0.05        0.13        0.94        0.05
4      -3.61        0.06        0.64        0.40       -0.48        0.98        0.02
5      -3.61        0.36       -0.24        0.08       -0.48        0.98        0.02
6      -2.86        0.14        0.27        0.41       -0.13        0.96        0.04
7      -1.77        0.30       -0.16        0.20        0.56        0.90        0.06
8      -1.77        0.04        0.73        0.93        1.02        0.90       -0.01
9      -2.86        0.17        1.06        0.08        1.02        0.96       -0.10
10     -2.39        0.18        0.63        0.20        0.27        0.94       -0.02
11     -2.39        0.40       -0.07       -0.38        0.27        0.94        0.03
12     -1.33        0.35        0.01       -0.04        0.07        0.86        0.05
13     -0.99        0.37        0.09       -0.07        0.62        0.82        0.04
14     -0.99        0.32       -0.01        0.39        0.06        0.82        0.04
15     -0.99        0.44       -0.12       -0.52       -0.94        0.82        0.06
16     -0.83        0.40       -0.52        0.13        1.46        0.80        0.11
17     -0.83        0.40       -0.21       -0.11       -1.38        0.80        0.08
18     -0.83        0.18        0.62        1.11        0.21        0.80       -0.08
19     -1.33        0.27        0.16        0.45       -0.68        0.86        0.01
20     -0.43        0.42        0.02       -0.20        1.09        0.74       -0.01
21     -0.56        0.28        1.17        0.40        1.59        0.76       -0.27
22     -0.69        0.45       -0.33       -0.46       -0.76        0.78        0.10
23     -0.69        0.39        0.43       -0.32       -0.22        0.78       -0.05
24     -1.33        0.28       -0.05        0.42       -0.68        0.86        0.06
25     -0.43        0.42       -0.33       -0.07       -1.32        0.74        0.10
26      0.14        0.50       -0.95       -0.65       -0.47        0.64        0.31
27      0.03        0.35        0.91        0.29       -0.30        0.66       -0.32
28      0.24        0.44       -0.50       -0.04        0.90        0.62        0.23
29      0.94        0.49        0.16       -0.86        0.85        0.48       -0.15
30      0.03        0.30        0.37        1.03       -0.30        0.66       -0.02
31      0.14        0.51       -0.81       -0.93        1.44        0.64        0.28
32     -0.08        0.57       -1.07       -1.47        0.92        0.68        0.30
33      0.84        0.38        0.99        0.28       -0.59        0.50       -0.51
34      0.34        0.43       -0.43        0.10       -0.51        0.60        0.21
35      1.03        0.41        0.23        0.18       -0.70        0.46       -0.06
36      1.03        0.40        0.67        0.12       -0.63        0.46       -0.35
37      0.94        0.65       -2.02       -2.58        2.29        0.48        0.67
38      1.54        0.40        0.05        0.25       -0.26        0.36        0.01
39      1.13        0.24        1.55        1.69        1.58        0.44       -0.65
40      1.76        0.36        0.05        0.63       -0.44        0.32        0.01
41      1.34        0.26        1.12        1.46       -0.16        0.40       -0.38
42      1.76        0.51       -0.46       -0.85       -0.79        0.32        0.17
43      1.88        0.52       -0.85       -0.69        0.99        0.30        0.24
44      2.12        0.38       -0.16        0.31       -0.96        0.26        0.07
45      2.38        0.38        0.10        0.04        0.67        0.22        0.04
46      2.00        0.29        0.43        0.92        1.40        0.28       -0.04
47      3.04        0.27        0.50        0.29       -0.16        0.14       -0.02
48      3.48        0.31       -0.28        0.34        0.64        0.10        0.06
49      3.76        0.26        0.36        0.18       -0.52        0.08       -0.00
50      4.10        0.54       -0.98       -0.65        0.18        0.06        0.05
Mean   -0.17        0.36        0.05        0.01        0.13        0.64
S.D.    2.02        0.15        0.65        0.74        0.84
Groups                                                     2

Note: Raw score mean = 31.90 with a S.D. of 7.10. Mean person ability = 0.85 with a S.D. of 1.10. Test reliability (K.R. 20) = 0.86. Reliability of person separation = 0.87.



Experiment 15: Summary of Item Fit Information for a Normally Distributed Item Difficulty Distribution With 100 Items, 50 Persons, and a 25% Chance of Guessing Correctly

Item #  Logit item diff.  Point bis. corr.  Unwt. total fit  Wt. total fit  Ability between fit  Mean item score  Logit Residual Index

1 -4.32 -9.99 -0.05 -1.04 0.05 1.00 -0.00
2 -4.32 -9.99 -0.05 -1.04 0.05 1.00 -0.00
3 -4.32 -9.99 -0.05 -1.04 0.05 1.00 -0.00
4 -3.59 0.13 0.37 0.38 -0.40 0.98 0.03
5 -3.59 0.24 0.05 0.30 -0.40 0.98 0.03
6 -4.32 -9.99 -0.05 -1.04 0.05 1.00 -0.00
7 -2.37 0.23 0.19 0.29 0.01 0.94 0.02
8 -4.32 -9.99 -0.05 -1.04 0.05 1.00 -0.00
9 -3.59 0.11 0.47 0.40 -0.40 0.98 0.03
10 -2.83 -0.01 0.92 0.50 2.89 0.96 -0.01
11 -2.83 0.46 -0.65 -0.31 -0.03 0.96 0.04
12 -2.83 0.07 0.54 0.46 -0.03 0.96 0.02
13 -2.02 0.23 0.32 0.15 -0.69 0.92 0.03
14 -2.02 0.25 -0.05 0.34 0.49 0.92 0.05
15 -3.59 0.02 0.86 0.42 -0.40 0.98 0.01
16 -3.59 0.25 -0.00 0.28 -0.40 0.98 0.03
17 -1.75 0.33 0.02 -0.05 0.72 0.90 0.04
18 -2.37 0.34 -0.19 -0.08 0.26 0.94 0.05
19 -1.51 0.30 0.61 -0.07 0.19 0.88 -0.06
20 -1.51 0.39 -0.50 -0.08 0.92 0.88 0.09
21 -2.02 0.25 0.26 0.09 -0.69 0.92 0.03
22 -1.13 0.44 -0.50 -0.33 0.05 0.84 0.10
23 -1.51 0.20 0.24 0.61 1.58 0.88 0.02
24 -2.37 0.22 0.19 0.31 0.01 0.94 0.03
25 -1.51 0.21 0.02 0.68 -0.78 0.88 0.05
26 -1.31 0.26 0.09 0.44 -0.32 0.86 0.03
27 -1.75 0.31 0.09 -0.03 0.72 0.90 0.04
28 -1.13 0.41 -0.36 -0.26 0.05 0.84 0.09
29 -1.51 0.24 0.05 0.52 -0.78 0.88 0.04
30 -0.81 0.25 0.19 0.79 0.97 0.80 0.04
31 -1.13 0.38 -0.29 -0.05 -0.90 0.84 0.09
32 -1.31 0.18 0.88 0.54 2.24 0.86 -0.09
33 -0.96 0.29 0.41 0.30 -1.36 0.82 -0.03
34 -0.96 0.46 -0.60 -0.51 -1.36 0.82 0.12
35 -0.67 0.39 -0.35 0.07 -0.26 0.78 0.11
36 -1.13 0.25 0.17 0.55 0.67 0.84 0.04
37 -0.67 0.50 -0.90 -0.64 1.86 0.78 0.14
38 -1.31 0.57 -1.07 -1.07 1.12 0.86 0.13
39 -1.13 0.46 -0.53 -0.54 -0.90 0.84 0.10


40 -0.41 0.17 1.52 1.19 1.71 0.74 -0.34
41 -0.81 0.47 -0.16 -0.70 0.65 0.80 0.04
42 -0.41 0.38 -0.01 0.10 -0.84 0.74 0.00
43 -0.54 0.37 0.12 0.08 -1.48 0.76 0.02
44 -0.41 0.57 -1.19 -1.29 1.40 0.74 0.26
45 -1.13 0.43 -0.20 -0.44 0.05 0.84 0.06
46 0.36 0.36 0.56 0.45 -1.54 0.60 -0.15
47 -0.29 0.46 -0.65 -0.33 -0.36 0.72 0.19
48 -0.67 0.25 0.46 0.80 -0.76 0.78 -0.08
49 -0.29 0.37 0.55 -0.03 -0.82 0.72 -0.11
50 -0.54 0.31 0.63 0.39 0.11 0.76 -0.10
51 -0.06 0.39 -0.11 0.22 -0.77 0.68 -0.00
52 -0.17 0.34 1.94 0.10 0.01 0.70 -0.85
53 0.36 0.42 0.32 -0.17 0.85 0.60 -0.21
54 -0.54 0.31 0.45 0.42 0.15 0.76 -0.04
55 0.56 0.33 0.73 0.97 0.05 0.56 -0.22
56 -0.17 0.33 0.29 0.55 -1.54 0.70 -0.02
57 -0.06 0.46 -0.59 -0.34 0.40 0.68 0.20
58 0.05 0.41 -0.27 0.07 -1.03 0.66 0.13
59 0.15 0.41 -0.15 0.13 -0.22 0.64 0.12
60 0.56 0.36 0.86 0.46 -1.23 0.56 -0.34
61 0.36 0.49 -0.56 -0.77 -1.54 0.60 0.24
62 0.26 0.43 -0.49 0.05 -0.56 0.62 0.23
63 0.36 0.44 -0.47 0.02 0.87 0.60 0.23
64 0.26 0.41 -0.13 0.16 -0.74 0.62 0.08
65 0.26 0.30 0.40 1.29 -0.74 0.62 -0.08
66 0.85 0.50 -0.80 -0.69 -0.39 0.50 0.34
67 0.76 0.30 1.14 1.31 0.99 0.52 -0.42
68 0.95 0.48 -0.58 -0.38 -0.35 0.48 0.27
69 1.05 0.39 0.33 0.42 0.18 0.46 -0.10
70 0.76 0.36 0.67 0.73 0.12 0.52 -0.23
71 0.46 0.54 -1.10 -1.22 -0.77 0.58 0.37
72 1.15 0.33 0.58 1.08 0.12 0.44 -0.15
73 0.66 0.54 -1.18 -1.21 0.22 0.54 0.45
74 1.05 0.49 -0.37 -0.60 -0.99 0.46 0.17
75 0.95 0.41 0.88 -0.05 -0.35 0.48 -0.49
76 0.66 0.41 0.28 0.11 -0.44 0.54 -0.08
77 1.78 0.48 -0.53 -0.32 -1.03 0.32 0.19
78 1.67 0.48 -0.56 -0.40 -0.79 0.34 0.19
79 1.78 0.39 0.03 0.42 0.25 0.32 -0.03
80 1.90 0.39 0.37 0.12 0.20 0.30 -0.07
81 0.95 0.34 0.81 0.86 -1.05 0.48 -0.28
82 1.67 0.29 0.54 1.17 0.38 0.34 -0.11



83 1.90 0.43 -0.27 0.05 -0.14 0.30 0.11
84 1.56 0.60 -1.34 -1.44 1.73 0.36 0.40
85 1.90 0.34 0.52 0.54 0.20 0.30 -0.15
86 1.90 0.47 -0.56 -0.21 -0.14 0.30 0.16
87 1.90 0.43 -0.17 0.04 -1.16 0.30 0.09
88 1.78 0.33 0.44 0.82 -1.03 0.32 -0.05
89 1.35 0.43 0.15 -0.09 -0.19 0.40 -0.10
90 1.90 0.48 -0.52 -0.31 -0.14 0.30 0.17
91 2.56 0.28 0.66 0.55 0.02 0.20 -0.08
92 2.56 0.27 0.36 0.80 0.02 0.20 -0.01
93 2.56 0.54 -0.53 -0.96 0.48 0.20 0.11
94 2.88 0.45 -0.16 -0.42 -0.16 0.16 0.05
95 2.56 0.63 -1.13 -1.49 0.48 0.20 0.17
96 3.51 0.29 0.26 0.04 -1.12 0.10 0.02
97 3.79 0.27 0.01 0.30 0.39 0.08 0.04
98 4.13 0.09 1.39 0.20 2.16 0.06 -0.16
99 4.13 0.43 -0.70 -0.14 0.16 0.06 0.05
100 5.34 0.16 0.26 0.34 -0.46 0.02 0.03

Mean -0.22 0.35 0.04 0.03 -0.06 0.65
S.D. 2.05 0.15 0.60 0.62 0.86
Groups 2

Note: Raw score mean = 64.62 with a S.D. of 14.13. Mean person ability = 0.87 with a S.D. of 1.11. Test reliability (K.R. 20) = 0.93. Reliability of person separation = 0.93.


Appendix C (continued)

Experiment 16: Summary of Item Fit Information for a Normally Distributed Item Difficulty Distribution With 25 Items, 100 Persons, and a 25% Chance of Guessing Correctly



Item #  Logit item diff.  Point bis. corr.  Unwt. total fit  Wt. total fit  Ability between fit  Mean item score  Logit Residual Index

1 -4.84 -9.99 -0.05 -1.21 0.04 1.00 -0.00
2 -2.99 0.27 -0.39 -0.05 0.29 0.97 0.03
3 -3.42 0.10 0.54 0.24 0.61 0.98 0.00
4 -3.42 0.16 0.04 0.21 0.03 0.98 0.03
5 -2.23 0.07 0.74 0.48 0.14 0.94 -0.01
6 -2.23 0.12 0.43 0.38 0.14 0.94 0.02
7 -1.89 0.19 0.71 0.16 -0.82 0.92 -0.05
8 -1.30 0.12 1.32 0.88 1.82 0.87 -0.13
9 -0.50 0.22 0.90 1.16 0.47 0.77 -0.10
10 -0.71 0.39 -0.53 -0.24 0.79 0.80 0.09
11 -0.50 0.37 -0.22 -0.10 -0.10 0.77 0.07
12 -0.02 0.39 0.32 0.04 0.32 0.69 -0.06
13 -0.08 0.52 -1.34 -1.50 -0.56 0.70 0.25
14 0.30 0.39 0.10 0.43 1.16 0.63 0.04
15 0.51 0.45 -0.75 -0.05 -0.03 0.59 0.23
16 0.51 0.55 -1.75 -1.80 1.24 0.59 0.41
17 0.66 0.52 -0.99 -1.46 1.47 0.56 0.22
18 1.06 0.35 1.17 1.26 0.23 0.48 -0.30
19 0.96 0.45 0.15 -0.08 -0.35 0.50 -0.06
20 1.74 0.51 -0.75 -0.51 -0.39 0.35 0.19
21 2.03 0.47 -0.47 -0.11 -1.11 0.30 0.10
22 1.85 0.47 -0.35 -0.18 -1.14 0.33 0.05
23 2.63 0.25 1.60 1.35 0.24 0.21 -0.24
24 3.57 0.49 -0.92 -0.67 -0.35 0.11 0.07
25 3.45 0.39 -0.41 0.11 1.10 0.12 0.06

Mean -0.19 0.34 -0.04 -0.05 0.21 0.64
S.D. 2.20 0.17 0.84 0.82 0.76
Groups 2

Note: Raw score mean = 16.10 with a S.D. of 3.49. Mean person ability = 1.00 with a S.D. of 1.14. Test reliability (K.R. 20) = 0.72. Reliability of person separation = 0.74.

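Experiment 17: Summary of Item Fit Information for a Normally Distributed Item Difficulty Distribution With 50 Items, 100 Persons, and a 25% Chance of Guessing Correctly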





Item #  Logit item diff.  Point bis. corr.  Unwt. total fit  Wt. total fit  Ability between fit  Mean item score  Logit Residual Index

1 -4.16 0.13 0.20 0.31 -0.42 0.99 0.02
2 -4.87 -9.99 -0.01 -1.22 0.04 1.00 -0.00
3 -4.16 0.01 0.86 0.35 2.02 0.99 0.00
4 -2.72 0.22 -0.27 0.10 0.36 0.96 0.04
5 -2.72 0.20 -0.20 0.13 0.36 0.96 0.04
6 -4.16 0.06 0.55 0.34 -0.42 0.99 0.02
7 -2.27 0.18 0.71 -0.03 0.53 0.94 -0.05
8 -2.27 0.28 -0.59 -0.03 0.69 0.94 0.05
9 -1.80 0.27 -0.29 -0.02 -0.17 0.91 0.05
10 -2.09 0.21 -0.13 0.10 0.83 0.93 0.04
11 -1.45 0.26 1.91 -0.18 -0.24 0.88 -0.49
12 -1.45 0.36 -0.76 -0.34 1.46 0.88 0.08
13 -1.56 0.34 -0.55 -0.25 0.29 0.89 0.06
14 -1.08 0.16 0.89 0.85 1.55 0.84 -0.07
15 -1.35 0.23 -0.01 0.43 -0.52 0.87 0.04
16 -1.26 0.15 0.64 0.71 1.26 0.86 -0.01
17 -1.08 0.22 0.52 0.44 2.28 0.84 -0.02
18 -1.26 0.23 0.21 0.33 -0.99 0.86 0.02
19 -0.64 0.42 -0.81 -0.61 -1.17 0.78 0.14
20 -0.77 0.36 -0.50 -0.19 -1.02 0.80 0.10
21 -0.70 0.38 0.23 -0.65 -0.66 0.79 -0.04
22 -0.57 0.42 -0.81 -0.60 -1.34 0.77 0.14
23 -0.77 0.20 0.52 1.01 -0.42 0.80 -0.02
24 0.02 0.37 -0.13 0.30 -1.12 0.67 0.06
25 -0.44 0.35 -0.36 0.22 0.05 0.75 0.10
26 -0.04 0.39 -0.35 0.10 -0.40 0.68 0.11
27 0.23 0.30 0.95 1.34 -1.06 0.63 -0.21
28 0.02 0.48 -1.07 -1.19 0.61 0.67 0.24
29 0.28 0.45 -0.38 -0.81 0.42 0.62 0.12
30 0.48 0.45 0.21 -0.80 0.09 0.58 -0.10
31 0.48 0.36 0.42 0.79 -0.10 0.58 -0.06
32 0.48 0.38 0.26 0.61 0.58 0.58 -0.03
33 0.48 0.29 1.99 1.57 -0.10 0.58 -0.71
34 0.73 0.52 -1.22 -1.40 -0.64 0.53 0.34
35 0.73 0.57 -1.62 -2.44 2.33 0.53 0.39
36 1.33 0.26 2.51 2.22 2.72 0.41 -0.79
37 1.23 0.43 0.44 0.24 -0.89 0.43 -0.15
38 1.12 0.47 -0.43 -0.40 -0.89 0.45 0.15
39 1.33 0.37 1.00 0.99 0.66 0.41 -0.22



40 1.28 0.50 -0.92 -0.68 1.50 0.42 0.24
41 2.47 0.57 -1.01 -1.13 1.36 0.22 0.14
42 1.76 0.55 -1.42 -0.95 -0.37 0.33 0.28
43 2.00 0.45 -0.27 0.11 -0.66 0.29 0.08
44 2.32 0.36 0.92 0.66 0.16 0.24 -0.13
45 2.12 0.50 -0.46 -0.47 0.27 0.27 0.07
46 3.05 0.45 -0.10 -0.34 0.18 0.15 0.03
47 2.87 0.37 0.30 0.39 0.54 0.17 0.01
48 3.37 0.39 0.27 0.01 -1.18 0.12 -0.02
49 5.09 0.05 0.77 0.55 -0.01 0.03 0.01
50 5.54 0.16 0.90 0.19 1.25 0.02 -0.03

Mean -0.10 0.33 0.07 0.01 0.19 0.64
S.D. 2.23 0.15 0.84 0.81 1.00
Groups

Note: Raw score mean = 31.83 with a S.D. of 6.75. Mean person ability = 0.93 with a S.D. of 1.13. Test reliability (K.R. 20) = 0.85. Reliability of person separation = 0.87.


Experiment 18: Summary of Item Fit Information for a Normally Distributed Item Difficulty Distribution With 100 Items, 100 Persons, and a 25% Chance of Guessing Correctly

Item #  Logit item diff.  Point bis. corr.  Unwt. total fit  Wt. total fit  Ability between fit  Mean item score  Logit Residual Index

1 -4.93 -9.99 -0.18 -1.21 0.04 1.00 -0.00
2 -4.93 -9.99 -0.18 -1.21 0.04 1.00 -0.00
3 -4.93 -9.99 -0.18 -1.21 0.04 1.00 -0.00
4 -3.51 0.10 0.20 0.27 -0.12 0.98 0.03
5 -3.08 0.03 1.22 0.30 0.22 0.97 -0.09
6 -4.22 0.06 0.52 0.34 -0.45 0.99 0.02
7 -4.22 0.06 0.52 0.34 -0.45 0.99 0.02
8 -3.51 0.11 0.47 0.22 0.93 0.98 0.00
9 -2.77 0.01 1.11 0.37 1.55 0.96 -0.05
10 -2.77 0.22 -0.28 0.10 0.32 0.96 0.04
11 -3.08 0.26 -0.37 0.01 0.12 0.97 0.03
12 -3.08 0.17 0.02 0.18 0.12 0.97 0.03
13 -2.32 0.10 0.80 0.41 -1.52 0.94 -0.04
14 -2.32 0.22 2.05 -0.11 -1.52 0.94 -0.45
15 -2.77 0.15 0.44 0.14 1.55 0.96 0.00
16 -1.50 0.23 0.63 0.40 -0.98 0.88 -0.07
17 -2.15 0.28 -0.50 0.02 0.80 0.93 0.05
18 -1.30 0.31 -0.27 0.14 -0.33 0.86 0.06
19 -2.53 0.13 0.58 0.22 -0.90 0.95 -0.01
20 -1.85 0.22 1.81 -0.00 -0.56 0.91 -0.36
21 -1.85 0.17 0.86 0.34 -0.56 0.91 -0.07
22 -1.04 0.36 0.38 -0.36 0.36 0.83 -0.06
23 -1.30 0.25 0.59 0.26 1.34 0.86 -0.04
24 -1.40 0.40 -0.76 -0.49 -0.62 0.87 0.08
25 -1.61 0.40 -1.12 -0.31 1.30 0.89 0.09
26 -1.12 0.26 0.19 0.47 -0.20 0.84 0.02
27 -1.30 0.38 -0.77 -0.32 0.76 0.86 0.09
28 -1.85 0.13 0.63 0.54 -0.56 0.91 -0.01
29 -1.21 0.28 0.54 0.12 0.11 0.85 -0.04
30 -1.04 0.23 1.90 0.41 -0.53 0.83 -0.38
31 -0.96 0.35 0.55 -0.12 -0.45 0.82 -0.11
32 -1.72 0.31 -0.27 -0.18 -0.99 0.90 0.04
33 -1.30 0.35 -0.40 -0.22 -0.86 0.86 0.06
34 -0.67 0.54 -1.57 -1.56 1.90 0.78 0.19
35 -0.96 0.27 0.13 0.57 1.11 0.82 0.02
36 -0.41 0.32 0.83 0.45 -0.20 0.74 -0.13
37 -0.54 0.46 -0.48 -0.99 -0.96 0.76 0.09
38 -0.89 0.33 -0.08 0.16 -0.18 0.81 0.04
39 -0.81 0.25 0.89 0.58 -0.34 0.80 -0.10


40 -0.61 0.50 -1.11 -1.21 1.40 0.77 0.15
41 -0.61 0.32 0.00 0.52 -1.45 0.77 0.03
42 -0.29 0.43 -0.75 -0.19 0.93 0.72 0.14
43 -0.61 0.38 0.14 -0.16 -0.16 0.77 -0.02
44 -0.67 0.42 -0.51 -0.46 -1.07 0.78 0.09
45 -0.29 0.29 1.13 0.77 1.45 0.72 -0.19
46 -0.54 0.25 0.91 0.97 1.10 0.76 -0.09
47 -0.67 0.52 -1.51 -1.32 1.24 0.78 0.18
48 -0.07 0.53 -1.54 -1.40 1.65 0.68 0.28
49 -0.07 0.44 -0.76 -0.16 -0.37 0.68 0.15
50 -0.01 0.48 -0.89 -0.82 0.64 0.67 0.19
51 0.15 0.27 2.47 1.33 1.36 0.64 -0.67
52 -0.07 0.27 0.91 1.60 1.71 0.68 -0.15
53 -0.18 0.46 -0.82 -0.60 1.30 0.70 0.16
54 -0.35 0.22 0.52 1.85 1.06 0.73 -0.05
55 0.15 0.34 1.17 0.90 1.36 0.64 -0.29
56 -0.01 0.45 -0.55 -0.44 -0.62 0.67 0.13
57 0.20 0.47 -0.94 -0.51 -0.22 0.63 0.24
58 0.20 0.39 1.13 -0.02 -0.53 0.63 -0.30
59 0.41 0.36 1.39 0.61 -0.80 0.59 -0.42
60 0.31 0.37 0.44 0.67 -0.73 0.61 -0.06
61 0.25 0.54 -1.37 -1.65 1.12 0.62 0.30
62 0.46 0.35 0.69 1.11 0.42 0.58 -0.12
63 0.71 0.52 -1.08 -1.06 0.40 0.53 0.27
64 1.06 0.43 -0.07 0.22 -0.10 0.46 0.05
65 0.31 0.29 1.26 1.66 1.74 0.61 -0.28
66 0.41 0.23 2.64 2.20 1.28 0.59 -0.76
67 0.61 0.40 0.93 0.18 -0.24 0.55 -0.28
68 0.66 0.45 -0.34 -0.10 -1.04 0.54 0.12
69 0.81 0.34 1.33 1.33 1.43 0.51 -0.37
70 0.56 0.57 -1.91 -1.80 0.80 0.56 0.46
71 1.16 0.57 -1.68 -1.79 0.76 0.44 0.40
72 1.36 0.38 0.40 0.84 2.09 0.40 -0.06
73 1.06 0.53 -1.37 -1.11 0.06 0.46 0.36
74 1.26 0.43 0.18 0.08 0.49 0.42 -0.01
75 1.36 0.60 -2.03 -2.01 2.01 0.40 0.43
76 1.06 0.41 0.50 0.31 0.57 0.46 -0.12
77 1.26 0.46 -0.43 -0.18 -0.59 0.42 0.11
78 1.52 0.46 0.19 0.27 -0.66 0.37 0.02
79 1.62 0.34 1.17 0.97 -1.30 0.35 -0.26
80 1.73 0.40 0.14 0.47 0.35 0.33 0.01
81 1.46 0.37 0.97 0.73 0.32 0.38 -0.22
82 1.68 0.52 -1.02 -0.98 0.07 0.34 0.21



83 1.52 0.30 1.06 1.66 -0.10 0.37 -0.19
84 1.46 0.66 -2.72 -2.96 2.60 0.38 0.50
85 2.09 0.48 -0.50 -0.56 -1.44 0.27 0.10
86 2.28 0.42 -0.39 -0.06 -0.09 0.24 0.07
87 1.91 0.38 1.10 0.27 1.13 0.30 -0.20
88 2.28 0.39 0.25 0.12 -1.29 0.24 -0.04
89 2.35 0.55 -1.38 -1.16 1.93 0.23 0.18
90 2.15 0.36 0.11 0.68 -0.50 0.26 -0.01
91 2.28 0.46 -0.48 -0.35 0.73 0.24 0.08
92 2.49 0.44 -0.35 -0.41 -0.41 0.21 0.07
93 2.42 0.40 0.82 -0.26 -0.76 0.22 -0.13
94 2.80 0.36 -0.28 0.24 0.18 0.17 0.06
95 3.51 0.38 -0.28 -0.26 -0.65 0.10 0.03
96 3.94 0.17 0.37 0.50 0.46 0.07 0.01
97 3.38 0.31 -0.44 0.30 1.19 0.11 0.06
98 3.94 0.31 -0.34 -0.01 -1.25 0.07 0.04
99 4.58 -0.02 1.73 0.44 1.78 0.04 -0.15
100 4.90 0.34 -0.66 -0.14 0.03 0.03 0.03

Mean -0.15 0.34 0.08 -0.02 0.22 0.64
S.D. 2.08 0.15 0.99 0.86 0.97
Groups 2

Note: Raw score mean = 63.57 with a S.D. of 14.08. Mean person ability = 0.88 with a S.D. of 1.08. Test reliability (K.R. 20) = 0.93. Reliability of person separation = 0.93.



Experiment 19: Summary of Item Fit Information for a Normally Distributed Item Difficulty Distribution With 25 Items, 25 Persons, and a 50% Chance of Guessing Correctly

Item #  Logit item diff.  Point bis. corr.  Unwt. total fit  Wt. total fit  Ability between fit  Mean item score  Logit Residual Index

1 -3.92 -9.99 0.20 -0.96 0.05 1.00 -0.00
2 -3.92 -9.99 0.20 -0.96 0.05 1.00 -0.00
3 -1.46 0.47 -0.33 -0.18 0.97 0.84 0.11
4 -3.17 0.06 0.80 0.47 -0.24 0.96 0.03
5 -1.13 0.32 0.40 0.35 -0.14 0.80 -0.03
6 -0.85 0.41 0.14 0.04 0.36 0.76 0.05
7 -1.46 0.37 0.09 0.07 -1.12 0.84 0.06
8 -1.13 0.46 -0.11 -0.25 -0.14 0.80 0.11
9 -0.59 0.62 -0.79 -1.21 0.89 0.72 0.29
10 -0.85 0.48 -0.33 -0.17 0.36 0.76 0.17
11 -1.13 0.49 -0.11 -0.52 -0.26 0.80 0.10
12 -0.85 0.53 -0.50 -0.50 0.36 0.76 0.18
13 -1.13 0.26 0.52 0.60 -0.14 0.80 -0.04
14 -0.59 0.37 0.24 0.28 1.34 0.72 0.05
15 -1.13 0.06 0.95 1.34 -0.14 0.80 -0.17
16 0.49 0.57 -0.85 -0.91 0.34 0.52 0.47
17 1.31 0.54 -0.46 -0.69 0.01 0.36 0.26
18 0.49 0.43 -0.18 0.37 1.52 0.52 0.23
19 0.69 0.26 1.29 1.36 0.83 0.48 -0.73
20 1.54 0.59 -0.88 -0.78 1.15 0.32 0.34
21 1.10 0.64 -1.22 -1.52 0.41 0.40 0.53
22 1.54 0.50 -0.02 -0.37 -0.44 0.32 0.07
23 1.54 -0.07 2.87 2.40 1.42 0.32 -1.79
24 2.33 0.30 0.21 0.71 0.44 0.20 0.02
25 4.47 0.46 -0.10 -0.06 -0.67 0.04 0.03

Mean -0.31 0.40 0.08 -0.04 0.29 0.63
S.D. 1.92 0.23 0.82 0.89 0.67
Groups 2

Note: Raw score mean = 15.84 and a S.D. of 4.03. Mean person ability = -0.31 with a S.D. of 1.92. Test reliability (K.R. 20) = 0.78. Reliability of person separation = 0.78.

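Experiment 20: Summary of Item Fit Information for a Normally Distributed Item Difficulty Distribution With 50 Items, 25 Persons, and a 50% Chance of Guessing Correctly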



Item #  Logit item diff.  Point bis. corr.  Unwt. total fit  Wt. total fit  Ability between fit  Mean item score  Logit Residual Index

1 -3.47 -9.99 0.05 -1.00 0.05 1.00 -0.00
2 -2.73 0.09 0.65 0.43 -0.34 0.96 0.03
3 -2.73 0.32 0.07 0.21 -0.34 0.96 0.04
4 -3.47 -9.99 0.05 -1.00 0.05 1.00 -0.00
5 -3.47 -9.99 0.05 -1.00 0.05 1.00 -0.00
6 -1.94 0.05 1.09 0.49 0.67 0.92 -0.12
7 -1.94 0.31 0.02 0.17 0.06 0.92 0.06
8 -2.73 0.37 -0.02 0.11 -0.34 0.96 0.04
9 -1.94 0.42 -0.27 -0.09 0.06 0.92 0.07
10 -2.73 0.27 0.18 0.28 -0.34 0.96 0.04
11 -3.47 -9.99 0.05 -1.00 0.05 1.00 -0.00
12 -1.94 0.24 0.22 0.32 0.06 0.92 0.05
13 -2.73 0.11 0.57 0.42 -0.34 0.96 0.04
14 -1.94 0.03 0.83 0.60 0.67 0.92 -0.01
15 -1.44 0.51 -0.56 -0.40 0.40 0.88 0.10
16 -1.44 0.25 0.22 0.42 -0.19 0.88 0.05
17 -1.44 0.53 -0.62 -0.44 0.40 0.88 0.10
18 -1.94 0.49 -0.45 -0.31 0.06 0.92 0.07
19 -1.94 -0.03 1.11 0.63 0.67 0.92 -0.08
20 -1.05 0.37 -0.16 0.22 0.71 0.84 0.10
21 -0.45 0.50 -0.49 -0.24 -0.19 0.76 0.17
22 -0.45 0.54 -0.59 -0.57 -0.19 0.76 0.20
23 -1.05 0.31 0.14 0.38 -1.07 0.84 0.05
24 -0.19 0.60 -0.83 -0.91 0.34 0.72 0.27
25 -1.05 0.45 -0.35 -0.13 0.71 0.84 0.11
26 -0.73 0.67 -1.02 -1.36 1.01 0.80 0.21
27 -0.45 0.51 -0.50 -0.32 -0.19 0.76 0.18
28 0.27 0.06 1.67 2.27 2.86 0.64 -0.64
29 -0.45 0.54 -0.37 -0.70 -0.28 0.76 0.17
30 0.27 0.42 0.00 0.44 -0.78 0.64 0.05
31 1.12 0.35 0.62 1.00 0.61 0.48 -0.20
32 1.76 0.63 -0.86 -1.05 -0.74 0.36 0.36
33 0.70 0.32 0.72 1.11 0.74 0.56 -0.29
34 0.27 0.26 0.54 1.43 -0.78 0.64 -0.20
35 0.91 0.43 0.83 0.16 -0.88 0.52 -0.44
36 1.12 0.32 1.72 0.74 0.10 0.48 -1.08
37 1.98 0.48 0.27 -0.21 -0.39 0.32 -0.06
38 0.70 0.29 0.69 1.44 1.84 0.56 -0.21
39 1.76 0.51 -0.33 -0.18 -0.74 0.36 0.20


40 1.32 0.65 -1.23 -1.19 -0.51 0.44 0.55
41 1.76 0.44 -0.10 0.36 0.08 0.36 0.10
42 1.76 0.63 -0.97 -1.02 0.08 0.36 0.39
43 1.98 0.61 -0.89 -0.81 1.17 0.32 0.30
44 1.32 0.37 0.65 0.74 1.13 0.44 -0.24
45 1.98 0.54 -0.48 -0.35 -0.39 0.32 0.22
46 1.32 0.23 1.42 1.59 1.13 0.44 -0.73
47 2.78 0.26 0.27 0.80 0.44 0.20 0.03
48 4.80 0.25 0.19 0.29 -0.63 0.04 0.04
49 3.50 0.32 0.00 0.30 -0.04 0.12 0.07
50 4.01 0.37 -0.17 0.09 -0.30 0.08 0.06

Mean -0.28 0.37 0.07 0.06 0.12 0.67
S.D. 2.08 0.21 0.68 0.79 0.73
Groups 2

Mean person ability = 1.04 with a S.D. of 1.20. Test reliability (K.R. 20) = 0.88. Reliability of person separation = 0.87.

08.



Experiment 21: Summary of Item Fit Information for a Normally Distributed Item Difficulty Distribution With 100 Items, 25 Persons, and a 50% Chance of Guessing Correctly

Item #  Logit item diff.  Point bis. corr.  Unwt. total fit  Wt. total fit  Ability between fit  Mean item score  Logit Residual Index

1 -2.83 -0.19 1.36 0.43 1.52 0.96 -0.16
2 -2.83 0.35 -0.23 0.21 -0.19 0.96 0.04
3 -3.56 -9.99 -0.55 -1.05 0.05 1.00 -0.00
4 -3.56 -9.99 -0.55 -1.05 0.05 1.00 -0.00
5 -3.56 -9.99 -0.55 -1.05 0.05 1.00 -0.00
6 -2.83 0.37 -0.27 0.19 -0.19 0.96 0.04
7 -2.83 0.37 -0.27 0.19 -0.19 0.96 0.04
8 -2.83 0.02 0.60 0.39 -0.19 0.96 0.01
9 -2.83 -0.13 1.12 0.42 1.52 0.96 -0.09
10 -1.22 0.53 -0.71 -0.44 0.93 0.84 0.13
11 -2.83 0.37 -0.27 0.19 -0.19 0.96 0.04
12 -2.83 0.13 0.29 0.35 -0.19 0.96 0.03
13 -0.92 0.20 0.98 0.26 -0.35 0.80 -0.30
14 -2.83 0.35 -0.23 0.21 -0.19 0.96 0.04
15 -2.83 0.07 0.44 0.37 -0.19 0.96 0.02
16 -1.22 0.43 -0.45 -0.16 0.93 0.84 0.11
17 -1.22 0.18 0.48 0.30 -1.14 0.84 -0.05
18 -2.06 0.13 0.20 0.35 0.25 0.92 0.04
19 -3.56 -9.99 -0.55 -1.05 0.05 1.00 -0.00
20 -1.58 0.29 -0.10 0.15 -0.70 0.88 0.06
21 -2.83 0.06 0.49 0.38 -0.19 0.96 0.02
22 -2.06 0.31 -0.14 0.13 0.25 0.92 0.05
23 -2.06 -0.12 1.12 0.48 0.25 0.92 -0.12
24 -1.22 0.36 0.13 -0.12 -1.14 0.84 0.00
25 -2.06 0.04 0.46 0.41 0.25 0.92 0.01
26 -1.22 0.24 0.46 0.15 -1.14 0.84 -0.07
27 -1.22 0.46 -0.21 -0.36 -1.14 0.84 0.06
28 -0.92 0.37 0.45 -0.25 -0.35 0.80 -0.13
29 -1.58 0.11 0.71 0.31 1.36 0.88 -0.08
30 -0.92 0.46 -0.21 -0.42 -0.35 0.80 0.07
31 -0.43 0.13 1.13 0.63 0.17 0.72 -0.40
32 -0.43 0.61 -1.12 -0.94 1.84 0.72 0.34
33 -0.92 0.22 0.41 0.32 -0.35 0.80 -0.07
34 -1.22 0.42 -0.38 -0.13 -1.14 0.84 0.09
35 -2.83 -0.15 1.19 0.42 1.52 0.96 -0.11
36 -1.22 0.39 -0.32 -0.05 -1.14 0.84 0.09
37 -2.06 0.30 -0.03 0.09 0.25 0.92 0.04
38 -1.22 0.32 -0.25 0.15 0.93 0.84 0.09
39 -1.58 0.18 0.07 0.36 0.61 0.88 0.04




Logit Point Unwt. Wt. Ability Item item bis. total total between

# diff. corr. fit fit fit

Mean Logit item Residual score Index

40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82

-0.66 -0.43 -0.92 0.37

-0.21 0.37

-0.66 -0.43 -1.22 -0.21 -0.21 0.18

-0.01 -0.66 0.74 0.37 0.37

-0.01 0.37 0.55

-0.01 -0.21 0.37 0.37 0.74 0.37 0.37 0.18 1.50 0.55 1.30 0.18

-0.01 1.50 1.11 1.30 1.30 1.71 0.18 1.30 1.11 0.74 1.94

-0.22 0.24 0.24 0.47 0.35 0.65 0.14 0.50

-0.08 0.43 0.35 0.43 0.71 0.45

-0.03 0.46 0.52 0.20 0.51 0.42 0.26 0.37 0.64 0.27 0.42 0.42 0.63 0.11 0.55 0.66 0.23 0.07 0.72 0.26 0.35 0.16 0.30 0.15 0.38 0.15 0.48 0.30 0.10

1.77 0.27 0.17

-0.70 -0.04 -1.69 0.62

-0.61 1.22

-0.57 0.42

-0.37 -1.80 -0.54 2.00

-0.62 -0.91 0.60

-0.58 -0.31 0.83

-0.31 -1.63 0.33

-0.29 -0.46 -1.55 1.09

-0.97 -1.89 0.80 1.52

-1.83 0.69 0.16 0.89 0.12 0.51

-0.19 0.85

-0.78 0.58 1.27

1.53 0.48 0.33

-0.53 0.04

-1.75 0.63

-0.54 0.75

-0.17 -0.14 -0.34 -1.83 -0.23 2.08

-0.49 -0.91 0.75

-0.99 -0.36 0.28 0.08

-1.70 0.58

-0.43 -0.27 -1.60 1.24

-1.06 -1.92 0.39 1.36

-1.87 0.13

-0.10 0.81 0.22 0.67

-0.03 0.90

-0.73 0.26 0.42

2.01 0.17

-0.35 -1.40 -0.53 1.36

-0.89 0.67 2.07 2.15

-0.53 -0.39 1.50 1.53 0.28

-1.40 -1.40 -1.49 0.22

-0.50 0.30

-0.53 0.22 1.44 0.25 0.22 0.22 1.95

-0.68 0.81

-0.78 0.92 2.47

-0.09 -1.19 1.79

-0.78 0.49

-0.47 -0.78 0.45 0.25

-0.52

0.76 0.72 0.80 0.56 0.68 0.56 0.76 0.72 0.84 0.68 0.68 0.60 0.64 0.76 0.48 0.56 0.56 0.64 0.56 0.52 0.64 0.68 0.56 0.56 0.48 0.56 0.56 0.60 0.32 0.52 0.36 0.60 0.64 0.32 0.40 0.36 0.36 0.28 0.60 0.36 0.40 0.48 0.24

-0.54 -0.04 0.01 0.44 0.07 0.88

-0.13 0.21

-0.22 0.22

-0.14 0.25 0.71 0.17

-1.30 0.39 0.50

-0.22 0.31 0.21

-0.34 0.16 0.86

-0.14 0.19 0.32 0.83

-0.47 0.38 1.03

-0.39 -0.84 0.72

-0.27 -0.05 -0.39 -0.01 -0.12 0.09

-0.38 0.40

-0.39 -0.43





# diff. corr. fit


83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99

100 Mean S.D. Groups

2.19 1.30 2.19 2.49 2.19 2.84 3.31 1.50 3.31 2.49 2.84 2.84 3.31 2.19 3.31 4.07 3.31 3.31

-0.14 1.91

0.22 0.28 0.19 0.13 0.48 0.44 0.38 0.14 0.09

-0.12 0.24

-0.11 0.05 0.20 0.25

-0.02 0.25 0.37 0.27 0.22

-0.01 0.03 0.28 0.42

-0.67 -0.59 -0.38 0.57 0.32 1.26

-0.06 1.49 0.60 0.18

-0.14 0.69

-0.14 -0.42

0.31 0.43 0.31 0.31

-0.49 -0.22 -0.03 0.86 0.30 0.70 0.15 0.47 0.30 0.28 0.17 0.36 0.17 0.00

-0.61 1.79 0.14 0.79 1.09 0.52 0.19 1.21 0.35 2.23

-0.52 1.49 0.35 1.60 0.19 1.58 0.19 0.19

0.06 0.80

0.01 0.71

0.24 0.96 2

0.20 0.36 0.20 0.16 0.20 0.12 0.08 0.32 0.08 0.16 0.12 0.12 0.08 0.20 0.08 0.04 0.08 0.08 0.62

0.05 0.07

-0.02 -0.02 0.17 0.10 0.07

-0.17 0.02

-0.25 0.05

-0.35 -0.04 0.02 0.06

-0.01 0.06 0.07

Note: Raw score mean = 61.60 with a S.D. of 11.30. Mean person ability = 0.64 with a S.D. of 0.80. Test reliability (K.R. 20) = 0.89. Reliability of person separation = 0.89.


Experiment 22: Summary of Item Fit Information for a Normally Distributed Item Difficulty Distribution With 25 Items, 50 Persons, and a 50% Chance of Guessing Correctly



Item #  Logit item diff.  Point bis. corr.  Unwt. total fit  Wt. total fit  Ability between fit  Mean item score  Logit Residual Index

1 -4.50 -9.99 0.41 -1.08 0.04 1.00 -0.00
2 -4.50 -9.99 0.41 -1.08 0.04 1.00 -0.00
3 -3.78 0.05 0.88 0.40 -0.57 0.98 0.02
4 -3.03 0.16 0.40 0.33 -0.26 0.96 0.04
5 -1.96 0.42 -0.28 -0.41 0.36 0.90 0.07
6 -1.96 0.22 0.96 0.12 1.47 0.90 -0.12
7 -1.52 0.36 -0.29 0.22 0.70 0.86 0.08
8 -1.33 0.41 -0.35 -0.14 0.86 0.84 0.10
9 -1.01 0.54 -0.76 -1.08 -0.11 0.80 0.16
10 -1.96 0.37 -0.18 -0.21 0.36 0.90 0.07
11 -0.86 0.31 0.95 0.24 1.53 0.78 -0.17
12 -0.10 0.45 0.01 0.13 -0.85 0.66 0.06
13 -0.34 0.44 0.06 -0.01 -1.48 0.70 0.05
14 0.01 0.43 0.08 0.37 0.05 0.64 -0.04
15 0.23 0.31 0.88 1.57 0.20 0.60 -0.24
16 -0.10 0.44 0.10 0.10 0.39 0.66 0.05
17 0.34 0.48 0.15 -0.16 -1.09 0.58 -0.09
18 0.12 0.51 -0.37 -0.40 -0.75 0.62 0.17
19 1.63 0.64 -1.52 -1.68 0.71 0.34 0.38
20 1.63 0.54 -0.33 -0.75 -0.38 0.34 0.12
21 1.63 0.22 1.80 2.06 2.14 0.34 -0.52
22 2.41 0.61 -1.26 -1.16 1.31 0.22 0.21
23 1.63 0.42 0.04 0.74 -0.80 0.34 0.04
24 3.56 0.27 0.80 0.32 -0.27 0.10 -0.05
25 4.75 0.19 0.52 0.56 -0.36 0.04 0.02

Mean -0.36 0.38 0.12 -0.04 0.13 0.64
S.D. 2.33 0.19 0.73 0.84 0.88
Groups 2

Note: Raw score mean = 16.10 and a S.D. of 3.77. Mean person ability = 0.79 with a S.D. of 1.30. Test reliability (K.R. 20) = 0.77. Reliability of person separation = 0.78.


Experiment 23: Summary of Item Fit Information for a Normally Distributed Item Difficulty Distribution With 50 Items, 50 Persons, and a 50% Chance of Guessing Correctly

Item #  Logit item diff.  Point bis. corr.  Unwt. total fit  Wt. total fit  Ability between fit  Mean item score  Logit Residual Index

1 -3.84 0.23 0.17 0.31 -0.59 0.98 0.03
2 -3.84 0.07 0.69 0.41 -0.59 0.98 0.03
3 -4.57 -9.99 0.14 -1.04 0.05 1.00 -0.00
4 -4.57 -9.99 0.14 -1.04 0.05 1.00 -0.00
5 -3.08 0.30 -0.14 0.15 -0.27 0.96 0.04
6 -2.61 0.26 -0.08 0.33 -0.04 0.94 0.05
7 -3.84 0.16 0.38 0.37 -0.59 0.98 0.03
8 -1.54 0.21 0.71 0.67 0.60 0.86 -0.07
9 -2.61 0.13 3.96 -0.12 0.66 0.94 -2.81
10 -1.98 0.26 0.39 0.31 -0.42 0.90 -0.01
11 -1.54 0.57 -1.08 -1.03 0.70 0.86 0.13
12 -1.54 0.38 -0.16 -0.06 0.70 0.86 0.07
13 -1.18 0.34 0.31 0.22 -0.22 0.82 -0.01
14 -1.75 0.16 0.51 0.86 -0.95 0.88 0.00
15 -0.88 0.45 -0.46 -0.23 -1.21 0.78 0.13
16 -1.18 0.44 -0.42 -0.18 -0.44 0.82 0.10
17 -1.18 0.54 -0.79 -0.95 1.02 0.82 0.14
18 -0.25 0.43 -0.15 0.17 -0.24 0.68 0.11
19 -0.74 0.37 1.06 0.18 -1.14 0.76 -0.33
20 -1.18 0.38 0.54 -0.12 -0.22 0.82 -0.08
21 -1.35 0.33 0.33 0.19 0.19 0.84 -0.00
22 -0.49 0.40 0.30 0.21 -0.79 0.72 -0.03
23 -1.03 0.56 -1.03 -0.95 1.18 0.80 0.17
24 -0.36 0.51 -0.69 -0.55 0.06 0.70 0.20
25 -0.02 0.54 -0.55 -1.00 -0.09 0.64 0.07
26 -0.25 0.60 -1.14 -1.52 0.36 0.68 0.31
27 -0.61 0.32 0.55 0.63 -0.34 0.74 -0.05
28 -0.61 0.46 -0.39 -0.20 -0.34 0.74 0.13
29 -0.02 0.49 -0.43 -0.36 -0.09 0.64 0.19
30 0.60 0.48 -0.19 -0.14 0.14 0.52 0.12
31 0.19 0.36 0.35 1.10 2.03 0.60 -0.01
32 0.60 0.22 1.74 2.35 1.50 0.52 -0.64
33 1.01 0.61 -1.20 -1.69 2.37 0.44 0.40
34 1.01 0.48 -0.23 -0.21 -1.40 0.44 0.15
35 0.81 0.51 -0.46 -0.43 0.03 0.48 0.21
36 1.44 0.39 0.33 0.68 -1.15 0.36 -0.04
37 0.70 0.47 -0.07 -0.12 -0.85 0.50 0.06
38 1.33 0.42 1.00 0.15 -0.48 0.38 -0.37
39 1.90 0.25 0.83 1.46 -0.65 0.28 -0.23



40 1.66 0.58 -1.07 -1.05 0.36 0.32 0.29
41 1.90 0.43 0.15 0.04 -0.48 0.28 0.00
42 2.16 0.29 1.12 0.73 0.32 0.24 -0.24
43 2.03 0.51 -0.27 -0.67 0.33 0.26 0.10
44 2.97 0.29 0.38 0.44 -0.07 0.14 -0.00
45 2.45 0.39 0.95 -0.09 -0.02 0.20 -0.23
46 2.16 0.28 0.89 0.80 2.21 0.24 -0.10
47 2.97 0.19 1.71 0.60 1.33 0.14 -0.39
48 2.97 0.36 0.55 -0.03 -0.54 0.14 -0.07
49 4.06 0.52 -0.81 -0.63 0.15 0.06 0.05
50 4.53 0.16 0.53 0.35 0.99 0.04 0.01

Mean -0.18 0.38 0.18 -0.01 0.06 0.61
S.D. 2.18 0.16 0.89 0.75 0.85
Groups 2



Experiment 24: Summary of Item Fit Information for a Normally Distributed Item Difficulty Distribution With 100 Items, 50 Persons, and a 50% Chance of Guessing Correctly

Item #  Logit item diff.  Point bis. corr.  Unwt. total fit  Wt. total fit  Ability between fit  Mean item score  Logit Residual Index

1 -4.33 -9.99 -0.02 -1.09 0.05 1.00 -0.00
2 -4.33 -9.99 -0.02 -1.09 0.05 1.00 -0.00
3 -2.86 0.34 -0.30 -0.06 -0.24 0.96 0.04
4 -2.40 0.26 0.12 0.07 0.00 0.94 0.04
5 -2.86 0.24 -0.12 0.24 -0.24 0.96 0.04
6 -2.86 0.10 1.30 0.23 1.28 0.96 -0.15
7 -2.40 0.27 -0.08 0.18 0.00 0.94 0.05
8 -3.61 0.27 -0.03 0.22 -0.56 0.98 0.02
9 -2.86 0.19 0.20 0.27 -0.24 0.96 0.03
10 -3.61 0.27 -0.03 0.22 -0.56 0.98 0.02
11 -2.40 0.11 1.14 0.31 0.55 0.94 -0.11
12 -2.06 0.23 0.04 0.40 -0.02 0.92 0.04
13 -1.00 0.28 0.62 0.64 -0.36 0.82 -0.06
14 -2.86 0.07 1.23 0.32 1.28 0.96 -0.12
15 -2.06 0.25 -0.12 0.40 0.21 0.92 0.06
16 -1.55 0.24 0.43 0.41 0.91 0.88 -0.00
17 -2.06 0.34 0.07 -0.19 -0.02 0.92 0.03
18 -2.40 0.34 -0.42 0.06 0.00 0.94 0.05
19 -1.35 0.46 -0.51 -0.46 -1.21 0.86 0.09
20 -1.55 0.41 -0.56 -0.15 0.57 0.88 0.09
21 -1.79 0.41 -0.42 -0.30 0.40 0.90 0.07
22 -2.40 0.26 -0.14 0.26 0.00 0.94 0.05
23 -2.06 0.30 -0.05 0.06 0.21 0.92 0.05
24 -1.00 0.29 1.09 0.40 -0.36 0.82 -0.21
25 -1.35 0.42 2.09 -0.75 -1.21 0.86 -0.72
26 -1.35 0.29 1.26 0.09 0.48 0.86 -0.25
27 -2.06 0.04 1.38 0.67 1.91 0.92 -0.15
28 -2.06 0.24 0.28 0.25 -0.02 0.92 0.02
29 -1.35 0.37 -0.49 0.30 0.74 0.86 0.09
30 -1.35 0.29 0.33 0.35 0.48 0.86 0.01
31 -2.06 0.26 0.20 0.12 -0.02 0.92 0.03
32 -1.16 0.16 1.14 1.04 0.06 0.84 -0.16
33 -0.30 0.37 0.45 0.58 0.34 0.72 -0.08
34 -1.35 0.31 2.24 -0.11 -1.21 0.86 -0.74
35 -1.16 0.29 -0.11 0.72 -0.70 0.84 0.06
36 -1.00 0.60 -1.32 -1.20 1.07 0.82 0.16
37 -0.56 0.36 0.44 0.46 -0.10 0.76 -0.05
38 -0.42 0.58 -1.23 -0.99 0.77 0.74 0.24
39 -1.00 0.12 2.08 0.96 2.15 0.82 -0.42


40 -1.16 0.18 1.21 0.82 1.43 0.84 -0.18
41 -0.84 0.46 -0.43 -0.30 0.67 0.80 0.11
42 -0.69 0.23 1.76 0.83 0.29 0.78 -0.41
43 -0.30 0.38 0.68 0.35 0.34 0.72 -0.11
44 -1.16 0.53 -0.68 -0.92 -0.70 0.84 0.10
45 -0.06 0.30 2.01 1.05 -0.51 0.68 -0.73
46 0.06 0.57 -0.43 -1.20 0.84 0.66 0.13
47 -0.69 0.36 0.96 0.16 -1.52 0.78 -0.20
48 -0.18 0.55 -0.56 -0.86 1.24 0.70 0.14
49 -0.06 0.44 0.07 0.15 -0.64 0.68 -0.03
50 0.06 0.54 -0.98 -0.51 0.84 0.66 0.26
51 0.39 0.41 0.54 0.52 0.05 0.60 -0.11
52 -0.56 0.48 -0.39 -0.40 -0.90 0.76 0.11
53 -0.42 0.42 0.19 0.03 -0.55 0.74 -0.04
54 0.49 0.45 0.13 0.26 -0.43 0.58 -0.02
55 -0.42 0.53 -1.02 -0.51 0.77 0.74 0.19
56 -0.18 0.38 0.01 0.83 -1.20 0.70 0.04
57 0.17 0.50 -0.28 -0.29 -1.17 0.64 0.12
58 0.39 0.45 0.43 0.21 0.05 0.60 -0.14
59 0.60 0.68 -1.85 -2.12 1.50 0.56 0.54
60 0.60 0.56 -0.81 -0.70 0.69 0.56 0.29
61 0.70 0.53 -0.65 -0.40 0.17 0.54 0.25
62 0.06 0.49 -0.39 -0.10 0.25 0.66 0.14
63 0.49 0.31 1.07 1.59 1.44 0.58 -0.27
64 0.70 0.50 -0.47 0.01 0.68 0.54 0.22
65 0.17 0.50 -0.41 -0.25 0.18 0.64 0.13
66 0.49 0.54 -0.49 -0.68 0.31 0.58 0.19
67 1.02 0.51 0.07 -0.41 -0.41 0.48 -0.05
68 0.39 0.58 -1.00 -0.98 -0.11 0.60 0.29
69 0.81 0.51 -0.18 -0.27 0.60 0.52 0.09
70 0.81 0.65 -1.67 -1.73 1.42 0.52 0.51
71 1.12 0.50 0.20 -0.31 -1.07 0.46 -0.08
72 1.23 0.60 -1.20 -1.11 -0.31 0.44 0.39
73 0.81 0.50 -0.34 -0.02 -0.41 0.52 0.17
74 1.33 0.34 1.41 1.07 -0.88 0.42 -0.50
75 1.44 0.31 1.06 1.39 0.99 0.40 -0.29
76 1.23 0.54 -0.63 -0.54 0.68 0.44 0.24
77 0.91 0.35 0.65 1.40 1.51 0.50 -0.12
78 1.65 0.51 -0.07 -0.54 0.16 0.36 0.01
79 1.65 0.51 -0.57 -0.31 0.16 0.36 0.19
80 1.54 0.61 -1.23 -1.44 1.38 0.38 0.35
81 1.44 0.56 -0.92 -0.84 -0.12 0.40 0.30
82 1.65 0.31 0.77 1.38 -0.16 0.36 -0.18



83 2.24 0.15 1.80 1.61 1.68 0.26 -0.45
84 1.23 0.34 0.99 1.21 1.08 0.44 -0.29
85 1.77 0.38 0.44 0.67 0.26 0.34 -0.07
86 2.00 0.51 -0.54 -0.58 -1.23 0.30 0.16
87 2.12 0.45 -0.18 -0.17 0.34 0.28 0.10
88 2.37 0.35 -0.12 0.64 -0.10 0.24 0.07
89 2.24 0.48 -0.32 -0.41 -0.48 0.26 0.09
90 2.51 0.31 0.31 0.60 0.29 0.22 -0.01
91 2.66 0.54 -0.07 -1.27 -0.02 0.20 -0.00
92 2.51 0.41 -0.21 0.05 -1.48 0.22 0.08
93 3.16 0.41 -0.49 -0.21 0.74 0.14 0.09
94 3.16 0.39 -0.21 -0.24 -1.19 0.14 0.06
95 2.51 0.42 0.33 -0.31 -1.48 0.22 -0.05
96 3.36 0.13 1.29 0.66 0.89 0.12 -0.20
97 4.64 0.21 0.00 0.24 -0.23 0.04 0.04
98 3.59 0.45 -0.78 -0.43 0.40 0.10 0.08
99 4.19 0.14 0.49 0.33 0.51 0.06 0.01
100 4.64 0.29 -0.26 0.10 -0.23 0.04 0.04

Mean -0.09 0.38 0.10 0.01 0.14 0.64
S.D. 1.96 0.15 0.84 0.72 0.80
Groups 2

Note: Raw score mean = 63.84 with a S.D. of 15.57. Mean person ability = 0.91 with a S.D. of 1.18. Test reliability (K.R. 20) = 0.94. Reliability of person separation = 0.94.

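Experiment 25: Summary of Item Fit Information for a Normally Distributed Item Difficulty Distribution With 25 Items, 100 Persons, and a 50% Chance of Guessing Correctly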



Item #  Logit item diff.  Point bis. corr.  Unwt. total fit  Wt. total fit  Ability between fit  Mean item score  Logit Residual Index

1 -4.17 0.10 0.43 0.33 -0.57 0.99 0.02
2 -3.45 0.11 0.88 0.20 1.32 0.98 -0.03
3 -2.70 0.10 1.30 0.23 2.02 0.96 -0.12
4 -2.24 0.32 -0.39 -0.11 0.43 0.94 0.04
5 -1.50 0.25 2.69 0.07 -0.64 0.89 -0.74
6 -2.06 0.14 0.55 0.59 0.73 0.93 0.01
7 -1.62 0.19 1.18 0.35 2.13 0.90 -0.10
8 -1.28 0.36 -0.64 0.26 1.28 0.87 0.08
9 -1.08 0.41 -0.10 -0.46 -0.67 0.85 0.03
10 -0.66 0.35 0.12 0.49 -0.75 0.80 0.01
11 0.01 0.48 -0.28 -0.47 -0.71 0.70 0.09
12 -0.24 0.39 0.02 0.49 -0.03 0.74 0.03
13 0.01 0.47 -0.18 -0.42 -0.86 0.70 0.07
14 0.12 0.53 -1.07 -0.94 -1.35 0.68 0.24
15 0.29 0.47 -0.33 -0.10 1.20 0.65 0.08
16 0.51 0.51 -0.55 -0.66 0.18 0.61 0.14
17 0.73 0.57 -1.36 -1.63 1.19 0.57 0.31
18 1.19 0.49 -0.32 -0.37 -0.78 0.48 0.09
19 1.60 0.31 1.25 2.07 1.06 0.40 -0.26
20 1.50 0.54 -0.20 -1.38 0.21 0.42 0.01
21 2.04 0.40 0.62 0.51 1.85 0.32 -0.09
22 2.54 0.45 -0.49 -0.29 -0.29 0.24 0.10
23 2.99 0.31 0.16 0.82 -0.54 0.18 0.03
24 3.25 0.44 -0.66 -0.40 -0.32 0.15 0.09
25 4.25 0.22 0.30 0.43 -1.40 0.07 0.02

Mean 0.00 0.36 0.12 -0.02 0.19 0.64
S.D. 2.13 0.15 0.87 0.75 1.06
Groups 2

Note: Raw score mean = 16.02 and a S.D. of 3.74. Mean person ability = 1.09 with a S.D. of 1.22. Test reliability (K.R. 20) = 0.76. Reliability of person separation = 0.77.

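Experiment 26: Summary of Item Fit Information for a Normally Distributed Item Difficulty Distribution With 50 Items, 100 Persons, and a 50% Chance of Guessing Correctly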





Item #  Logit item diff.  Point bis. corr.  Unwt. total fit  Wt. total fit  Ability between fit  Mean item score  Logit Residual Index

1 -4.01 0.17 -0.03 0.30 -0.47 0.99 0.02
2 -4.72 -9.99 -0.25 -1.20 0.04 1.00 -0.00
3 -3.29 -0.01 0.94 0.32 3.13 0.98 -0.01
4 -4.01 0.17 -0.03 0.30 -0.47 0.99 0.02
5 -2.86 0.04 0.65 0.33 0.29 0.97 0.01
6 -2.86 0.32 -0.79 -0.09 0.09 0.97 0.03
7 -2.31 0.14 0.62 0.24 1.14 0.95 -0.02
8 -2.86 0.13 0.31 0.20 0.29 0.97 0.02
9 -2.31 0.26 -0.43 0.08 0.46 0.95 0.04
10 -1.50 0.28 -0.30 0.20 -0.07 0.90 0.05
11 -1.92 0.20 0.18 0.18 -1.02 0.93 0.02
12 -1.92 0.21 0.04 0.22 -1.02 0.93 0.03
13 -1.27 0.28 0.23 0.12 -0.04 0.88 -0.00
14 -1.07 0.27 0.46 0.09 1.43 0.86 -0.01
15 -1.17 0.32 0.52 -0.18 -0.37 0.87 -0.05
16 -1.50 0.28 0.32 -0.09 0.60 0.90 -0.00
17 -0.89 0.46 -0.69 -0.89 1.02 0.84 0.07
18 -1.27 0.34 -0.62 0.03 1.38 0.88 0.07
19 -0.30 0.39 -0.46 -0.03 -1.13 0.76 0.09
20 -0.18 0.41 1.29 -0.49 -1.19 0.74 -0.37
21 -0.44 0.40 -0.56 -0.16 -0.55 0.78 0.09
22 -0.44 0.55 -1.59 -1.45 1.85 0.78 0.19
23 -0.30 0.36 0.54 -0.02 -1.13 0.76 -0.09
24 -0.06 0.27 1.45 0.92 0.20 0.72 -0.27
25 -0.12 0.38 0.16 0.14 -0.15 0.73 -0.02
26 0.06 0.33 1.17 0.43 0.42 0.70 -0.25
27 0.17 0.29 1.22 1.03 1.24 0.68 -0.23
28 -0.18 0.48 -1.05 -0.85 0.44 0.74 0.17
29 -0.18 0.42 -0.53 -0.24 0.44 0.74 0.10
30 0.49 0.37 -0.03 0.70 -1.54 0.62 0.02
31 0.44 0.32 0.90 1.11 1.25 0.63 -0.23
32 0.54 0.47 -0.67 -0.64 -0.72 0.61 0.18
33 0.89 0.41 -0.03 0.28 -1.09 0.54 0.04
34 0.89 0.58 -2.13 -2.28 1.10 0.54 0.52
35 0.89 0.43 -0.27 0.00 -1.09 0.54 0.09
36 1.08 0.37 0.86 0.69 -0.53 0.50 -0.24
37 1.48 0.31 1.02 1.43 -0.55 0.42 -0.25
38 1.13 0.55 -1.66 -1.74 1.77 0.49 0.42
39 1.73 0.33 1.48 0.78 1.03 0.37 -0.34






40 1.53 0.59 -2.12 -2.15 0.77 0.41 0.48 41 1.58 0.40 0.29 0.17 -1.16 0.40 -0.04 42 1.78 0.48 -0.84 -0.62 -0.85 0.36 0.20 43 2.23 0.31 1.49 0.53 -0.99 0.28 -0.29 44 2.23 0.33 0.82 0.54 -0.63 0.28 -0.14 45 2.68 0.36 -0.15 0.17 -1.45 0.21 0.04 46 2.17 0.38 0.37 0.19 -0.07 0.29 -0.07 47 2.91 0.35 0.05 0.02 -0.18 0.18 0.02 48 3.45 0.28 -0.39 0.44 1.55 0.12 0.06 49 4.74 -0.07 1.57 0.47 2.81 0.04 -0.09 50 4.11 0.28 -0.17 -0.07 0.01 0.07 0.03

Mean    -0.09     0.33     0.06    -0.01     0.13     0.66
S.D.     2.12     0.15     0.89     0.75     1.09
Groups                                          2
Note: Raw score mean = 32.79 with a S.D. of 6.64. Mean person ability = 1.09 with a S.D. of 1.04. Test reliability (K.R. 20) = 0.85. Reliability of person separation = 0.85.






1 -4.18 -0.04 1.12 0.35 1.84 0.99 -0.03 2 -4.18 0.14 0.07 0.31 -0.36 0.99 0.02 3 -4.18 0.02 0.74 0.35 -0.36 0.99 0.01 4 -4.89 -9.99 -0.26 -1.20 0.04 1.00 -0.00 5 -4.89 -9.99 -0.26 -1.20 0.04 1.00 -0.00 6 -3.04 0.05 0.73 0.30 1.86 0.97 -0.01 7 -4.18 0.09 0.30 0.33 -0.36 0.99 0.02 8 -3.46 0.16 -0.13 0.23 -0.01 0.98 0.03 9 -2.27 0.19 1.40 0.08 -0.89 0.94 -0.22

10 -2.48 0.26 -0.33 0.01 -1.51 0.95 0.04 11 -2.73 0.03 0.72 0.39 1.22 0.96 -0.00 12 -2.10 0.24 0.42 -0.07 -0.22 0.93 -0.02 13 -3.04 0.14 0.94 0.09 -0.08 0.97 -0.07 14 -1.94 0.13 1.01 0.29 0.76 0.92 -0.07 15 -2.27 0.17 0.87 0.11 -0.89 0.94 -0.08 16 -1.80 0.44 -1.11 -0.68 0.13 0.91 0.07 17 -2.10 0.25 -0.24 0.11 -0.47 0.93 0.04 18 -2.10 0.38 -0.29 -0.54 -0.22 0.93 0.02 19 -1.80 0.14 0.44 0.57 0.38 0.91 0.00 20 -1.68 0.38 -0.75 -0.37 0.37 0.90 0.06 21 -1.68 0.19 1.16 0.10 1.13 0.90 -0.12 22 -2.27 0.26 -0.10 -0.04 0.22 0.94 0.02 23 -1.16 0.41 -0.64 -0.56 0.43 0.85 0.08 24 -1.80 0.19 1.40 0.06 -1.23 0.91 -0.20 25 -1.35 0.11 1.18 0.79 1.09 0.87 -0.10 26 -1.16 0.33 -0.45 0.02 -0.67 0.85 0.07 27 -1.80 0.21 0.77 0.03 1.47 0.91 -0.06 28 -1.16 0.20 0.48 0.62 1.33 0.85 -0.01 29 -1.16 0.26 0.74 0.07 0.48 0.85 -0.06 30 -1.16 0.43 -0.68 -0.69 0.43 0.85 0.07 31 -0.99 0.27 0.01 0.45 -0.16 0.83 0.03 32 -0.99 0.33 0.17 -0.13 -1.41 0.83 -0.01 33 -1.68 0.28 1.12 -0.27 -1.14 0.90 -0.17 34 -0.77 0.34 -0.21 0.04 -0.15 0.80 0.06 35 -1.08 0.32 -0.02 0.02 0.17 0.84 0.02 36 -0.99 0.09 0.74 1.53 0.76 0.83 -0.04 37 -0.91 0.33 0.05 -0.02 -0.89 0.82 0.01 38 -0.56 0.20 1.40 1.08 0.63 0.77 -0.24 39 -1.08 0.37 -0.27 -0.19 0.65 0.84 0.04






40 -0.77 0.32 -0.10 0.28 -1.50 0.80 0.04 41 -0.84 0.35 -0.01 -0.18 0.17 0.81 0.04 42 -0.63 0.41 -0.53 -0.47 0.13 0.78 0.10 43 -0.25 0.42 -0.20 -0.60 -0.59 0.72 0.06 44 -0.70 0.29 0.25 0.43 0.42 0.79 -0.01 45 -0.20 0.34 0.44 0.19 1.05 0.71 -0.05 46 -0.84 0.35 -0.05 -0.12 0.44 0.81 0.01 47 -0.03 0.31 0.09 0.96 0.23 0.68 0.01 48 -0.31 0.38 -0.31 -0.02 -0.63 0.73 0.09 49 -0.09 0.48 -1.14 -0.95 0.32 0.69 0.23 50 -0.20 0.38 -0.10 0.01 -0.40 0.71 0.04 51 0.18 0.42 -0.45 -0.34 -0.48 0.64 0.13 52 -0.09 0.37 -0.15 0.24 -1.28 0.69 0.07 53 -0.03 0.42 -0.67 -0.26 1.18 0.68 0.15 54 -0.43 0.38 -0.20 -0.16 -0.94 0.75 0.05 55 0.23 0.34 0.85 0.45 -0.68 0.63 -0.21 56 -0.31 0.29 0.34 0.86 -0.99 0.73 -0.06 57 0.63 0.37 0.16 0.41 -0.42 0.55 -0.00 58 0.13 0.39 0.15 -0.07 -0.88 0.65 0.00 59 0.53 0.44 0.25 -0.74 -0.39 0.57 -0.12 60 0.13 0.48 -0.78 -1.12 1.26 0.65 0.17 61 0.43 0.39 0.06 0.05 -0.33 0.59 -0.01 62 0.53 0.40 -0.29 0.13 0.28 0.57 0.13 63 0.13 0.38 0.16 0.10 0.67 0.65 -0.04 64 0.43 0.41 -0.21 -0.17 -1.37 0.59 0.08 65 0.87 0.36 0.33 0.63 -0.06 0.50 -0.05 66 0.72 0.37 0.46 0.49 -0.42 0.53 -0.14 67 0.82 0.50 -1.37 -1.34 -0.42 0.51 0.39 68 1.01 0.48 -0.54 -1.10 1.51 0.47 0.13 69 0.63 0.32 1.21 0.97 -0.42 0.55 -0.40 70 0.72 0.47 -0.79 -0.87 0.29 0.53 0.23 71 1.25 0.36 0.36 0.56 -1.39 0.42 -0.10 72 1.06 0.34 1.02 0.86 -1.06 0.46 -0.33 73 1.11 0.38 0.46 0.26 -1.19 0.45 -0.16 74 1.20 0.30 1.69 1.13 0.08 0.43 -0.57 75 1.30 0.41 0.01 -0.07 -0.01 0.41 -0.01 76 0.91 0.44 -0.36 -0.54 -0.38 0.49 0.11 77 1.66 0.39 -0.16 0.21 0.90 0.34 0.01 78 1.20 0.45 -0.72 -0.47 -0.13 0.43 0.23 79 1.66 0.45 -0.48 -0.58 0.90 0.34 0.12 80 1.94 0.39 -0.17 0.14 -0.30 0.29 0.05 81 1.82 0.56 -1.50 -1.86 1.51 0.31 0.24 82 2.24 0.40 -0.35 -0.10 0.30 0.24 0.08






83 1.66 0.39 -0.08 0.17 -1.15 0.34 0.06 84 2.18 0.42 -0.29 -0.26 0.37 0.25 0.04 85 2.05 0.34 0.99 0.24 0.28 0.27 -0.20 86 2.18 0.39 0.16 -0.11 -0.53 0.25 -0.04 87 2.05 0.30 0.91 0.59 1.62 0.27 -0.11 88 2.51 0.39 -0.33 -0.14 0.56 0.20 0.06 89 2.37 0.17 1.39 1.31 0.83 0.22 -0.18 90 2.75 0.34 -0.09 0.10 -0.73 0.17 0.04 91 2.67 0.33 0.14 0.20 -0.42 0.18 0.00 92 2.31 0.42 -0.31 -0.33 -1.34 0.23 0.06 93 2.67 0.28 -0.06 0.80 -0.42 0.18 0.02 94 3.45 0.41 -0.91 -0.30 1.17 0.10 0.07 95 3.21 0.44 -0.80 -0.52 -0.99 0.12 0.07 96 3.33 0.51 -1.37 -0.91 1.30 0.11 0.09 97 4.52 0.19 1.06 0.00 -0.27 0.04 -0.10 98 4.27 0.27 -0.41 0.13 0.46 0.05 0.04 99 4.84 0.16 0.03 0.29 0.08 0.03 0.03

 100     4.52     0.39    -0.92    -0.26     0.28     0.04     0.04

Mean    -0.10     0.32     0.08     0.01     0.00     0.63
S.D.     2.11     0.13     0.69     0.58     0.83
Groups                                          2
Note: Raw score mean = 63.26 with a S.D. of 12.86. Mean person ability = 0.88 with a S.D. of 1.01. Test reliability (K.R. 20) = 0.92. Reliability of person separation = 0.92.



Experiment 28: Summary of Item Fit Information for a Uniformly Distributed Item Difficulty Distribution With 25 Items, 25 Persons, and No Guessing

Item    Logit    Point    Unwt.    Wt.      Ability  Mean     Logit
#       item     bis.     total    total    between  item     Residual
        diff.    corr.    fit      fit      fit      score    Index

   1    -1.07     0.42    -0.24    -0.05     0.48     0.84     0.11
   2    -0.46     0.43    -0.15     0.15    -0.68     0.76     0.12
   3    -1.95     0.33     0.01     0.11    -0.10     0.92     0.07
   4    -0.46     0.26     0.67     0.88     0.21     0.76    -0.13
   5    -0.75     0.40     0.21    -0.02    -1.49     0.80     0.02
   6    -0.20     0.60    -0.65    -1.03    -0.09     0.72     0.25
   7    -1.45     0.22     0.72     0.27     0.17     0.88    -0.06
   8     0.93     0.33     0.84     1.10     0.52     0.52    -0.35
   9    -0.20     0.44     0.22    -0.13    -0.46     0.72     0.02
  10    -0.75     0.33     0.46     0.15     0.82     0.80    -0.02
  11    -0.46     0.44     0.25    -0.22    -0.68     0.76    -0.02
  12    -0.20     0.64    -0.77    -1.38    -0.09     0.72     0.28
  13     0.04     0.52    -0.02    -0.44    -1.45     0.68     0.09
  14     0.27     0.71    -1.11    -1.79     0.92     0.64     0.44
  15     0.50     0.55    -0.36    -0.36     0.16     0.60     0.23
  16     0.04     0.61    -0.68    -0.91     0.43     0.68     0.27
  17     0.93     0.43     0.29     0.56    -1.18     0.52    -0.04
  18     0.72     0.38     0.73     0.65    -0.66     0.56    -0.31
  19     1.15     0.34     0.84     1.07    -0.32     0.48    -0.34
  20     0.27     0.32     0.57     1.05    -0.15     0.64    -0.16
  21     0.93     0.47     0.01     0.34     0.52     0.52     0.10
  22     0.72     0.32     0.89     1.07    -0.12     0.56    -0.39
  23     0.04     0.64    -0.76    -1.26     0.43     0.68     0.31
  24     0.93     0.54    -0.28    -0.26    -1.18     0.52     0.20
  25     0.50     0.36     0.53     0.89    -1.05     0.60    -0.17

Mean     0.00     0.44     0.09     0.02    -0.20     0.68
S.D.     0.80     0.13     0.58     0.82     0.70
Groups                                          2
Note: Raw score mean = 16.88 and a S.D. of 5.30. Mean person ability = 1.06 with a S.D. of 1.30. Test reliability (K.R. 20) = 0.85. Reliability of person separation = 0.85.



Experiment 29: Summary of Item Fit Information for a Uniformly Distributed Item Difficulty Distribution With 50 Items, 25 Persons, and No Guessing

Item    Logit    Point    Unwt.    Wt.      Ability  Mean     Logit
#       item     bis.     total    total    between  item     Residual
        diff.    corr.    fit      fit      fit      score    Index

1 -2.37 0.07 0.54 0.39 -0.28 0.96 0.02 2 -0.74 -0.01 2.63 0.60 -1.38 0.84 -1.33 3 -1.59 -0.03 0.88 0.50 0.48 0.92 -0.06 4 -1.11 0.27 0.43 0.03 -0.39 0.88 -0.03 5 -0.74 0.21 0.55 0.26 -1.38 0.84 -0.06 6 -0.74 0.35 0.00 -0.07 -1.38 0.84 0.06 7 -0.74 0.18 0.26 0.54 -1.38 0.84 0.02 8 -0.43 0.12 0.49 0.86 -0.69 0.80 -0.05 9 -0.43 0.57 -0.86 -0.82 1.07 0.80 0.20

10 -0.74 0.10 0.72 0.61 0.90 0.84 -0.09 11 -0.17 0.43 -0.10 -0.31 -0.09 0.76 0.07 12 0.07 0.30 0.08 0.47 0.54 0.72 0.05 13 -1.11 0.31 0.14 0.05 -0.39 0.88 0.03 14 -1.11 0.42 -0.28 -0.21 -0.39 0.88 0.08 15 -0.74 0.36 -0.17 -0.03 -1.38 0.84 0.09 16 0.07 0.31 -0.14 0.58 0.39 0.72 0.08 17 -0.43 0.36 0.19 -0.14 -0.69 0.80 0.01 18 0.07 -0.04 1.73 1.56 1.77 0.72 -0.64 19 -0.17 0.32 0.17 0.14 1.12 0.76 0.03 20 0.90 0.16 1.31 1.46 0.76 0.56 -0.72 21 -0.17 0.45 -0.41 -0.29 -0.44 0.76 0.17 22 0.71 0.27 0.55 0.86 -1.05 0.60 -0.24 23 -0.43 0.35 -0.15 0.09 -0.69 0.80 0.08 24 -0.43 0.27 0.03 0.40 -0.69 0.80 0.05 25 -0.43 0.52 -0.67 -0.60 -0.69 0.80 0.18 26 -0.43 0.32 0.29 -0.05 0.24 0.80 -0.00 27 0.71 0.44 -0.14 -0.25 0.09 0.60 0.15 28 0.07 0.39 -0.20 0.05 -1.35 0.72 0.11 29 0.51 0.44 1.32 -0.81 -0.02 0.64 -0.89 30 0.51 0.52 -0.37 -0.84 1.24 0.64 0.19 31 -0.43 0.29 0.09 0.26 0.24 0.80 0.05 32 -0.17 0.25 1.06 0.17 -0.44 0.76 -0.33 33 0.51 0.54 -0.61 -0.96 -0.02 0.64 0.30 34 -0.74 0.11 1.05 0.49 -1.38 0.84 -0.23 35 1.29 0.40 0.14 0.13 -0.66 0.48 -0.02 36 0.07 0.32 -0.15 0.53 0.39 0.72 0.08 37 0.90 0.26 0.86 0.88 -0.64 0.56 -0.42 38 0.90 0.32 0.37 0.59 -0.64 0.56 -0.10 39 0.71 0.42 -0.33 0.01 0.09 0.60 0.23 40 0.51 0.60 -1.17 -1.20 -0.02 0.64 0.49




41 0.90 0.31 0.42 0.72 0.76 0.56 -0.10 42 0.07 0.39 -0.26 0.05 0.39 0.72 0.12 43 0.30 0.34 -0.01 0.35 -0.08 0.68 0.07 44 0.07 0.63 -1.14 -1.31 0.39 0.72 0.35 45 2.10 0.62 -1.00 -1.18 0.43 0.32 0.37 46 1.48 0.49 -0.63 -0.39 -1.02 0.44 0.38 47 0.30 0.25 1.06 0.49 -0.67 0.68 -0.43 48 -0.17 0.56 -0.82 -0.80 1.36 0.76 0.24 49 1.68 0.57 -0.99 -0.91 0.00 0.40 0.46 50 1.29 0.61 -1.27 -1.36 -0.21 0.48 0.65

Mean    -0.00     0.34     0.11     0.03    -0.16     0.71
S.D.     0.86     0.17     0.78     0.68     0.80
Groups
Note: Raw score mean = 35.72 with a S.D. of 7.83. Mean person ability = 1.21 with a S.D. of 0.99. Test reliability (K.R. 20) = 0.87. Reliability of person separation = 0.85.



Experiment 30: Summary of Item Fit Information for a Uniformly Distributed Item Difficulty Distribution With 100 Items, 25 Persons, and No Guessing



1 -0.95 0.16 0.28 0.69 -0.07 0.80 0.00 2 -1.26 0.34 -0.09 0.02 -0.78 0.84 0.07 3 -0.95 0.50 -0.44 -0.59 -0.07 0.80 0.13 4 -0.03 0.46 -0.54 -0.45 1.11 0.64 0.31 5 -1.26 0.55 -0.70 -0.65 1.08 0.84 0.14 6 -1.26 -0.16 1.80 0.99 1.86 0.84 -0.51 7 -0.69 0.24 -0.01 0.64 0.48 0.76 0.05 8 -2.11 0.45 -0.47 -0.13 0.35 0.92 0.07 9 -0.45 0.46 -0.57 -0.36 -0.49 0.72 0.21

10 -0.95 -0.00 0.82 1.08 -0.42 0.80 -0.13 11 -1.63 0.07 0.47 0.56 -0.99 0.88 0.00 12 -0.45 0.33 -0.01 0.18 -0.21 0.72 0.06 13 -0.69 0.41 -0.28 -0.19 -1.57 0.76 0.14 14 -0.03 0.36 0.03 0.11 -0.16 0.64 0.09 15 -0.23 0.55 -0.92 -0.90 0.18 0.68 0.36 16 -0.45 0.46 -0.32 -0.50 -0.21 0.72 0.17 17 -0.45 0.38 -0.27 0.03 0.96 0.72 0.11 18 -0.45 0.40 -0.32 -0.10 0.96 0.72 0.17 19 -0.45 0.46 -0.30 -0.54 -0.21 0.72 0.16 20 0.36 0.40 -0.29 0.04 -0.22 0.56 0.24 21 -0.45 0.30 -0.05 0.38 -0.21 0.72 0.08 22 -0.69 0.54 -0.78 -0.73 1.74 0.76 0.23 23 -1.26 0.51 -0.37 -0.61 -0.78 0.84 0.10 24 -0.95 0.31 -0.11 0.21 -0.42 0.80 0.08 25 -0.03 0.37 -0.12 0.11 0.76 0.64 0.09 26 -1.63 0.36 -0.17 -0.04 -0.99 0.88 0.07 27 -0.03 0.61 -1.18 -1.47 0.76 0.64 0.51 28 -0.45 0.47 -0.52 -0.45 -0.21 0.72 0.22 29 -0.95 0.21 0.22 0.46 -0.42 0.80 0.02 30 -0.69 0.06 1.35 0.88 -1.57 0.76 -0.46 31 0.74 0.62 -1.56 -1.69 0.87 0.48 0.82 32 -0.23 0.57 -0.91 -1.14 -1.13 0.68 0.37 33 -0.23 0.52 -0.57 -0.87 0.18 0.68 0.26 34 -0.95 0.22 0.33 0.40 -0.42 0.80 -0.03 35 -0.69 0.62 -1.00 -1.15 0.48 0.76 0.27 36 -0.23 0.51 -0.75 -0.70 -1.13 0.68 0.32 37 -0.95 0.34 -0.04 0.04 -0.42 0.80 0.07 38 -0.45 0.36 -0.17 0.09 -0.49 0.72 0.11 39 -0.69 0.55 -0.66 -0.91 0.48 0.76 0.21 40 -0.45 -0.17 1.82 1.99 2.24 0.72 -0.71






41 -0.23 0.35 -0.20 0.22 -1.13 0.68 0.12 42 0.55 0.36 0.59 0.06 0.17 0.52 -0.39 43 0.17 0.29 0.65 0.44 -1.08 0.60 -0.27 44 -0.45 0.53 -0.71 -0.76 0.96 0.72 0.26 45 0.17 0.32 0.59 0.22 -1.08 0.60 -0.24 46 -0.23 0.06 0.86 1.57 1.68 0.68 -0.29 47 -0.23 0.52 -0.63 -0.83 0.18 0.68 0.29 48 0.36 0.33 0.20 0.45 -0.22 0.56 -0.04 49 0.74 0.41 0.12 -0.26 0.87 0.48 -0.12 50 0.55 0.28 0.57 0.78 -1.26 0.52 -0.25 51 -0.03 0.48 -0.69 -0.51 -0.68 0.64 0.34 52 -0.23 0.21 1.64 0.40 0.53 0.68 -0.85 53 0.36 0.28 0.42 0.79 -0.22 0.56 -0.18 54 -0.45 0.54 -0.78 -0.84 -0.21 0.72 0.28 55 -0.23 0.59 -0.86 -1.37 0.18 0.68 0.35 56 0.17 0.60 -1.25 -1.50 0.09 0.60 0.61 57 0.55 0.55 -1.14 -1.12 -1.26 0.52 0.65 58 -0.23 0.05 2.73 0.86 -1.13 0.68 -1.65 59 -0.45 0.41 -0.32 -0.18 -0.49 0.72 0.15 60 0.93 0.52 -0.94 -0.71 0.40 0.44 0.53 61 -0.23 0.20 0.46 0.88 1.68 0.68 -0.09 62 -0.03 0.60 -1.09 -1.39 0.76 0.64 0.49 63 0.74 0.22 1.35 0.92 -0.39 0.48 -0.89 64 1.12 0.28 1.18 0.40 -0.71 0.40 -0.62 65 0.55 0.32 0.29 0.52 0.40 0.52 -0.10 66 0.74 -0.00 2.51 2.16 0.93 0.48 -1.77 67 0.17 0.33 -0.02 0.43 0.09 0.60 0.05 68 0.17 0.21 0.54 1.13 -1.08 0.60 -0.15 69 -0.45 0.57 -0.75 -1.05 0.96 0.72 0.27 70 -0.03 0.26 0.38 0.66 1.11 0.64 -0.10 71 1.12 0.44 -0.25 -0.25 -0.71 0.40 0.20 72 0.74 0.44 0.02 -0.45 -0.46 0.48 -0.07 73 -0.69 0.48 -0.37 -0.58 -1.57 0.76 0.15 74 0.74 0.34 -0.03 0.50 1.96 0.48 0.14 75 0.17 0.38 -0.26 0.15 0.50 0.60 0.22 76 -0.69 0.34 -0.18 0.19 0.48 0.76 0.10 77 0.36 0.49 -0.77 -0.67 -0.66 0.56 0.45 78 0.74 0.46 -0.25 -0.57 -0.46 0.48 0.15 79 1.12 0.17 0.98 1.32 0.76 0.40 -0.44 80 -0.03 0.20 0.97 0.85 -0.16 0.64 -0.44 81 0.74 0.55 -1.07 -1.15 0.87 0.48 0.60 82 0.93 0.56 -1.16 -1.11 0.40 0.44 0.62 83 1.53 0.52 -0.71 -0.58 0.40 0.32 0.29






84 1.32 0.51 -0.68 -0.61 0.79 0.36 0.35 85 0.55 0.52 -0.71 -1.05 0.17 0.52 0.38 86 0.55 0.39 -0.24 0.10 -1.26 0.52 0.22 87 1.75 0.22 1.62 0.33 -0.55 0.28 -0.72 88 1.12 0.29 0.52 0.61 0.76 0.40 -0.15 89 0.93 0.19 1.54 1.07 0.23 0.44 -0.98 90 0.74 0.16 1.50 1.41 1.96 0.48 -0.93 91 0.55 0.29 0.58 0.66 0.17 0.52 -0.31 92 -0.03 -0.15 2.09 2.58 2.14 0.64 -1.06 93 0.36 0.20 0.98 1.13 -0.22 0.56 -0.47 94 1.53 0.36 -0.05 0.26 0.50 0.32 0.10 95 0.55 0.37 0.22 0.11 0.17 0.52 -0.11 96 1.12 0.33 0.41 0.34 -0.11 0.40 -0.21 97 0.74 0.31 0.77 0.36 0.87 0.48 -0.50 98 0.93 0.03 2.20 1.95 1.40 0.44 -1.38 99 0.17 0.46 -0.58 -0.46 0.09 0.60 0.36

 100     1.53     0.44    -0.43    -0.14     0.40     0.32     0.18

Mean     0.00     0.36     0.05     0.03     0.10     0.62
S.D.     0.78     0.18     0.89     0.85     0.88
Groups                                          2
Note: Raw score mean = 62.36 with a S.D. of 17.01. Mean person ability = 0.66 with a S.D. of 0.95. Test reliability (K.R. 20) = 0.94. Reliability of person separation = 0.93.






   1    -0.94     0.27    -0.20     0.14    -0.35     0.86     0.07
   2    -0.77     0.23     0.34     0.13    -0.83     0.84    -0.00
   3    -1.36     0.31    -0.37     0.01     0.71     0.90     0.07
   4    -0.21     0.41    -0.39    -0.43     1.04     0.76     0.11
   5    -0.77     0.28    -0.18     0.13    -0.01     0.84     0.07
   6    -0.09     0.41    -0.26    -0.55    -1.11     0.74     0.11
   7    -0.94     0.20     0.43     0.24    -0.35     0.86    -0.02
   8    -0.47     0.30     0.05     0.07    -0.85     0.80     0.02
   9    -0.33     0.29     0.67    -0.08    -0.60     0.78    -0.13
  10     0.25     0.33    -0.15     0.39     0.81     0.68     0.10
  11    -0.09     0.26     0.11     0.65    -0.12     0.74     0.04
  12     0.46     0.38    -0.20    -0.01     0.07     0.64     0.13
  13    -0.47     0.35    -0.40    -0.06    -0.85     0.80     0.11
  14    -0.09     0.38    -0.54    -0.08     1.27     0.74     0.15
  15     0.25     0.39    -0.30    -0.23    -0.19     0.68     0.15
  16     0.36     0.41    -0.34    -0.40     0.45     0.66     0.16
  17     0.25     0.35    -0.07     0.09    -1.15     0.68     0.06
  18    -0.21     0.36    -0.37    -0.06     1.04     0.76     0.12
  19     0.25     0.07     1.38     2.15     1.64     0.68    -0.38
  20     0.76     0.49    -0.85    -0.98    -0.01     0.58     0.35
  21     0.56     0.36    -0.06     0.29    -0.37     0.62     0.09
  22     0.66     0.36     0.16     0.21    -0.44     0.60    -0.01
  23     0.85     0.22     1.54     1.46     0.39     0.56    -0.68
  24     1.24     0.48    -0.42    -0.80    -0.43     0.48     0.14
  25     0.85     0.49    -0.77    -0.99     0.38     0.56     0.34

Mean     0.00     0.33    -0.05     0.05     0.01     0.71
S.D.     0.66     0.10     0.57     0.67     0.76
Groups                                          2
Note: Raw score mean = 17.84 and a S.D. of 3.90. Mean person ability = 1.18 with a S.D. of 1.02. Test reliability (K.R. 20) = 0.71. Reliability of person separation = 0.73.






1 -0.45 0.24 0.81 0.97 1.66 0.76 -0.12 2 -0.73 0.51 -0.49 -0.80 -1.55 0.80 0.14 3 -0.89 0.41 -0.17 -0.10 -0.03 0.82 0.09 4 -1.69 0.37 0.15 -0.26 -0.83 0.90 0.03 5 -0.89 0.49 -0.41 -0.69 -0.03 0.82 0.11 6 -0.58 0.31 -0.00 0.80 0.55 0.78 0.01 7 -0.31 0.19 0.74 1.63 0.28 0.74 -0.18 8 -0.73 0.26 0.59 0.74 0.30 0.80 -0.06 9 -0.19 0.44 -0.11 -0.15 -1.06 0.72 0.10

10 -0.07 0.45 -0.36 -0.11 -0.53 0.70 0.16 11 -0.89 0.35 0.45 0.09 -0.80 0.82 -0.06 12 -0.45 0.41 -0.06 0.05 -0.40 0.76 0.08 13 -0.89 0.51 -0.60 -0.74 -0.03 0.82 0.14 14 -0.45 0.28 1.00 0.53 0.68 0.76 -0.21 15 -0.19 0.43 -0.28 0.00 -1.06 0.72 0.12 16 -0.73 0.24 0.91 0.73 0.30 0.80 -0.16 17 -0.31 0.36 0.02 0.56 -1.19 0.74 0.06 18 -0.73 0.40 0.29 -0.26 0.30 0.80 -0.01 19 -0.45 0.18 1.24 1.26 1.66 0.76 -0.27 20 -0.07 0.36 0.38 0.41 -0.62 0.70 -0.03 21 0.05 0.43 -0.05 -0.05 -0.10 0.68 0.07 22 -0.07 0.48 0.00 -0.63 -0.53 0.70 0.04 23 -0.45 0.34 0.39 0.34 1.66 0.76 -0.01 24 -0.89 0.44 -0.45 -0.09 1.24 0.82 0.09 25 -0.73 0.48 0.08 -0.74 0.28 0.80 0.01 26 0.38 0.57 -0.98 -1.47 0.01 0.62 0.37 27 0.58 0.50 -0.48 -0.64 -0.18 0.58 0.23 28 0.27 0.59 -1.07 -1.67 1.47 0.64 0.37 29 0.27 0.50 -0.05 -0.92 0.62 0.64 -0.07 30 -0.19 0.12 1.09 2.23 2.59 0.72 -0.23 31 0.48 0.48 0.04 -0.64 0.40 0.60 0.02 32 -0.31 0.46 -0.43 -0.21 -0.03 0.74 0.14 33 0.27 0.44 0.29 -0.30 0.62 0.64 -0.19 34 0.58 0.37 0.36 0.77 0.96 0.58 -0.04 35 0.48 0.43 0.07 0.09 -0.70 0.60 0.06 36 0.16 0.46 -0.40 -0.20 -0.98 0.66 0.19 37 0.16 0.63 -1.34 -2.05 1.98 0.66 0.41 38 0.79 0.36 1.28 0.48 -0.30 0.54 -0.76 39 1.09 0.27 1.13 1.77 1.40 0.48 -0.39 40 0.79 0.57 -1.09 -1.46 -0.30 0.54 0.45




  41     0.69     0.55    -0.80    -1.19     0.26     0.56     0.34
  42     0.16     0.44    -0.06    -0.07    -0.29     0.66     0.10
  43     0.79     0.39     0.63     0.40    -0.30     0.54    -0.25
  44     0.79     0.48    -0.52    -0.32    -1.11     0.54     0.27
  45     0.99     0.25     1.21     2.02     1.77     0.50    -0.42
  46     0.79     0.39     0.27     0.61    -1.11     0.54    -0.03
  47     1.09     0.40     0.02     0.57     0.60     0.48     0.09
  48     1.19     0.59    -1.22    -1.81     0.68     0.46     0.47
  49     0.89     0.53    -0.69    -1.10     0.19     0.52     0.31
  50     0.58     0.37     0.38     0.77     0.05     0.58    -0.06

Mean    -0.00     0.41     0.03    -0.02     0.17     0.68
S.D.     0.67     0.11     0.66     0.95     0.94
Groups                                          2




Item    Logit    Point    Unwt.    Wt.      Ability  Mean     Logit
#       item     bis.     total    total    between  item     Residual
        diff.    corr.    fit      fit      fit      score    Index

   1    -1.66     0.14     1.96     0.30    -1.29     0.90    -0.48
   2    -0.73     0.46    -0.66    -0.31     0.52     0.80     0.13
   3    -0.88     0.34    -0.17     0.30     0.23     0.82     0.06
   4    -0.73     0.19     1.73     0.73    -0.06     0.80    -0.42
   5    -1.43     0.24     0.01     0.52    -0.96     0.88     0.05
   6    -1.05     0.38    -0.47     0.10     1.22     0.84     0.09
   7    -0.45     0.43    -0.26    -0.12    -1.14     0.76     0.08
   8    -0.45     0.37     0.04     0.25    -1.14     0.76     0.04
   9    -0.45     0.34     0.19     0.39    -1.14     0.76    -0.01
  10    -0.32     0.46    -0.34    -0.39    -1.09     0.74     0.09
  11    -0.88     0.38    -0.01    -0.05     0.23     0.82     0.04
  12    -0.88     0.17     0.66     1.05     0.38     0.82    -0.06
  13    -1.43     0.28     0.29     0.14     0.33     0.88     0.01
  14    -0.20     0.39     0.30     0.07     0.57     0.72    -0.02
  15    -1.05     0.13     1.30     0.84     1.93     0.84    -0.19
  16    -0.73     0.27     0.44     0.59    -0.94     0.80    -0.03
  17    -1.23     0.22     0.49     0.57    -0.47     0.86    -0.02
  18    -1.66     0.24     0.33     0.26    -1.29     0.90     0.00
  19    -0.88     0.50    -0.84    -0.62     0.23     0.82     0.13
  20    -0.88     0.30     0.65     0.18     0.38     0.82    -0.08
  21    -1.05     0.41    -0.40    -0.13    -0.09     0.84     0.08
  22    -0.20     0.43    -0.34     0.04    -0.53     0.72     0.13
  23    -0.73     0.17     0.75     1.07     2.93     0.80    -0.05
  24    -0.20     0.45    -0.39    -0.20     0.57     0.72     0.13
  25    -0.32     0.52    -0.75    -0.83    -1.09     0.74     0.18
  26    -1.05     0.38    -0.23    -0.05    -0.68     0.84     0.07
  27    -0.88     0.22     0.46     0.79    -1.36     0.82    -0.03
  28    -0.59     0.40     0.15    -0.11    -0.45     0.78    -0.00
  29    -0.73     0.40    -0.10    -0.04    -0.94     0.80     0.04
  30    -0.32     0.37    -0.10     0.39    -0.12     0.74     0.06
  31    -0.20     0.42    -0.22     0.01    -0.53     0.72     0.08
  32    -0.08     0.59    -1.31    -1.25     0.92     0.70     0.31
  33    -0.45     0.28     0.47     0.76     0.32     0.76    -0.05
  34     0.03     0.30     0.75     0.97    -0.30     0.68    -0.17
  35    -0.45     0.39    -0.06     0.10    -1.14     0.76     0.06
  36    -0.45     0.50    -0.89    -0.48     1.94     0.76     0.18
  37    -0.08     0.56    -0.84    -1.16    -1.26     0.70     0.23
  38    -0.45     0.28     1.07     0.48     0.32     0.76    -0.20
  39    -0.73     0.41    -0.31    -0.12    -0.94     0.80     0.09
  40    -0.08     0.37    -0.02     0.44     0.15     0.70     0.05


Appendix C (continued)


  41    -0.59     0.45    -0.48    -0.22     0.79     0.78     0.10
  42    -0.32     0.47    -0.47    -0.33     1.29     0.74     0.12
  43     0.03     0.60    -1.14    -1.49     0.29     0.68     0.30
  44     0.14     0.59    -1.33    -1.36     0.64     0.66     0.33
  45     0.14     0.27     1.46     1.06     1.23     0.66    -0.44
  46     0.36     0.54    -0.67    -0.96     1.29     0.62     0.19
  47     0.36     0.55    -1.03    -0.97    -0.65     0.62     0.29
  48     0.25     0.58    -1.18    -1.27     0.97     0.64     0.35
  49     0.66     0.51    -0.53    -0.71    -0.23     0.56     0.18
  50    -0.08     0.35     0.44     0.45     1.12     0.70    -0.04
  51    -0.32     0.47    -0.32    -0.42    -1.09     0.74     0.10
  52     0.46     0.63    -1.63    -1.82     0.81     0.60     0.50
  53     0.03     0.44    -0.34    -0.03    -0.30     0.68     0.11
  54    -0.08     0.40    -0.05     0.26    -1.26     0.70     0.03
  55    -0.08     0.44     0.66    -0.43    -0.10     0.70    -0.21
  56    -0.32     0.26     0.92     0.84    -0.12     0.74    -0.19
  57     0.36     0.24     2.09     1.34     1.30     0.62    -0.77
  58    -0.32     0.46    -0.57    -0.22    -1.09     0.74     0.13
  59    -0.08     0.49    -0.53    -0.46    -1.26     0.70     0.17
  60     0.03     0.38     0.32     0.36    -0.96     0.68    -0.07
  61     0.46     0.60    -1.51    -1.39     2.30     0.60     0.45
  62    -0.20     0.53    -0.68    -0.90    -0.53     0.72     0.18
  63     0.25     0.52    -0.80    -0.65    -1.38     0.64     0.27
  64     0.36     0.40     0.78     0.05    -0.65     0.62    -0.29
  65    -0.08     0.33     0.45     0.66     0.15     0.70    -0.07
  66     0.96     0.57    -1.33    -1.11     0.28     0.50     0.48
  67     0.56     0.54    -1.04    -0.75    -0.80     0.58     0.37
  68     0.66     0.38     1.13     0.38    -0.23     0.56    -0.44
  69     0.66     0.50    -0.50    -0.60    -0.23     0.56     0.18
  70     0.77     0.49    -0.21    -0.55    -0.87     0.54     0.08
  71    -0.08     0.49    -0.78    -0.34     0.92     0.70     0.19
  72     0.46     0.28     1.34     1.37     0.92     0.60    -0.47
  73     0.56     0.33     0.68     1.04     2.06     0.58    -0.14
  74     0.86     0.59    -1.40    -1.39    -0.24     0.52     0.49
  75     0.03     0.63    -1.59    -1.65     1.99     0.68     0.37
  76     0.46     0.39     0.13     0.61     0.00     0.60     0.02
  77     0.25     0.46     0.32    -0.34    -1.38     0.64    -0.12
  78     0.77     0.35     1.20     0.75     0.53     0.54    -0.47
  79     1.36     0.47    -0.56    -0.22     0.33     0.42     0.24
  80     0.36     0.27     0.88     1.52     0.45     0.62    -0.24
  81     0.77     0.35     1.05     0.76    -0.48     0.54    -0.42
  82     1.57     0.33     0.59     0.94     0.16     0.38    -0.13
  83     0.66     0.37     0.66     0.63    -1.23     0.56    -0.23






  84     0.86     0.22     2.00     1.91     0.95     0.52    -0.79
  85     0.25     0.65    -1.87    -1.91     0.97     0.64     0.49
  86     0.66     0.40     0.53     0.29    -1.23     0.56    -0.17
  87     0.86     0.34     0.95     0.93    -1.22     0.52    -0.35
  88     0.96     0.38     0.72     0.59     0.51     0.50    -0.26
  89     0.46     0.32     0.78     1.00    -1.40     0.60    -0.24
  90     1.26     0.53    -1.02    -0.79    -0.04     0.44     0.37
  91     1.06     0.42     0.46     0.10     0.01     0.48    -0.18
  92     1.06     0.46    -0.27    -0.11    -1.33     0.48     0.12
  93     0.56     0.38     0.68     0.57    -0.52     0.58    -0.23
  94     0.77     0.47    -0.28    -0.31    -0.87     0.54     0.12
  95     0.96     0.46     0.22    -0.24    -0.84     0.50    -0.11
  96     0.77     0.28     1.60     1.29    -0.48     0.54    -0.63
  97     1.06     0.61    -1.64    -1.64     0.76     0.48     0.56
  98     0.36     0.45    -0.48     0.11    -0.64     0.62     0.18
  99     0.46     0.29     1.10     1.25    -1.40     0.60    -0.33
 100     1.47     0.38     0.97     0.37     0.72     0.40    -0.34

Mean    -0.00     0.40     0.02     0.01    -0.07     0.67
S.D.     0.72     0.12     0.88     0.81     0.98
Groups                                          2
Note: Raw score mean = 67.44 with a S.D. of 18.91. Mean person ability = 0.96 with a S.D. of 1.08. Test reliability (K.R. 20) = 0.95. Reliability of person separation = 0.94.






   1    -0.52     0.37    -0.29     0.06    -0.04     0.80     0.06
   2    -1.02     0.30    -0.17     0.25    -0.70     0.86     0.05
   3    -1.11     0.38    -0.69    -0.26     0.52     0.87     0.08
   4    -1.33     0.32     0.48    -0.29    -1.36     0.89    -0.06
   5    -0.68     0.47    -0.63    -1.01    -0.58     0.82     0.08
   6    -0.84     0.50    -1.47    -0.94     1.82     0.84     0.13
   7    -1.44     0.28    -0.13     0.05    -0.85     0.90     0.04
   8    -0.06     0.23     2.44     1.29     1.16     0.73    -0.58
   9    -0.60     0.35     0.30    -0.01     0.66     0.81    -0.03
  10     0.00     0.44    -0.19    -0.48     0.12     0.72     0.05
  11    -0.52     0.30     0.19     0.55    -1.35     0.80     0.01
  12     0.18     0.44    -0.16    -0.44     0.13     0.69     0.06
  13    -0.06     0.47    -1.02    -0.59     1.33     0.73     0.17
  14    -0.38     0.38     0.37    -0.27    -0.89     0.78    -0.04
  15     0.40     0.44    -0.47    -0.12    -0.29     0.65     0.13
  16     0.40     0.37     0.93     0.48    -1.42     0.65    -0.22
  17     0.00     0.20     1.26     2.06     2.66     0.72    -0.17
  18     0.71     0.41     0.33     0.28    -1.02     0.59    -0.06
  19     0.66     0.48    -0.69    -0.43    -0.32     0.60     0.18
  20     0.66     0.41    -0.06     0.58    -0.45     0.60     0.03
  21     0.97     0.61    -2.07    -2.57     1.79     0.54     0.49
  22     1.12     0.38     0.82     0.78    -0.43     0.51    -0.22
  23     1.07     0.38     0.56     1.04     1.19     0.52    -0.09
  24     1.27     0.43     0.34     0.16    -0.83     0.48    -0.10
  25     1.12     0.40     0.28     0.67    -0.43     0.51    -0.03

Mean     0.00     0.39     0.01     0.03     0.02     0.70
S.D.     0.82     0.09     0.90     0.88     1.11
Groups                                          2
Note: Raw score mean = 17.66 and a S.D. of 4.42. Mean person ability = 1.18 with a S.D. of 1.11. Test reliability (K.R. 20) = 0.79. Reliability of person separation = 0.76.



Experiment 35: Summary of Item Fit Information for a Uniformly Distributed Item Difficulty Distribution With 50 Items, 100 Persons, and No Guessing

Item    Logit    Point    Unwt.    Wt.      Ability  Mean     Logit
#       item     bis.     total    total    between  item     Residual
        diff.    corr.    fit      fit      fit      score    Index

1 -0.73 0.42 -0.22 -0.26 -1.29 0.79 0.06 2 -0.88 0.36 0.43 0.07 0.34 0.81 -0.03 3 -1.04 0.32 0.63 0.43 -1.30 0.83 -0.08 4 -0.45 0.31 1.11 0.77 1.17 0.75 -0.17 5 -1.22 0.33 -0.12 0.32 -0.75 0.85 0.04 6 -0.66 0.31 0.09 0.99 0.41 0.78 0.02 7 -0.96 0.40 -0.03 -0.16 -0.87 0.82 0.00 8 -0.09 0.34 2.57 0.76 0.37 0.69 -0.88 9 -0.26 0.40 0.17 0.17 -0.43 0.72 0.01

10 -0.88 0.38 0.25 -0.13 -0.78 0.81 -0.01 11 -1.32 0.12 2.74 0.89 3.22 0.86 -0.53 12 -0.20 0.37 0.14 0.60 -0.79 0.71 -0.00 13 -0.88 0.44 -0.64 -0.37 0.50 0.81 0.10 14 -0.52 0.37 -0.17 0.57 -0.15 0.76 0.06 15 -0.66 0.48 0.33 -1.03 -0.85 0.78 -0.07 16 -0.80 0.41 -0.16 -0.19 -1.19 0.80 0.04 17 -0.88 0.29 0.49 0.72 -0.78 0.81 -0.02 18 -0.66 0.43 0.38 -0.41 -0.85 0.78 -0.07 19 -0.33 0.43 -0.16 -0.01 -0.38 0.73 0.05 20 -0.33 0.49 0.43 -0.92 1.18 0.73 -0.15 21 -0.09 0.34 0.93 0.92 0.37 0.69 -0.17 22 0.14 0.45 -0.45 -0.12 -0.95 0.65 0.15 23 0.30 0.54 -0.68 -1.50 0.88 0.62 0.15 24 -0.20 0.48 -0.80 -0.61 -0.78 0.71 0.18 25 -0.03 0.50 -0.90 -0.69 -0.78 0.68 0.20 26 -0.39 0.47 -0.46 -0.64 0.26 0.74 0.09 27 -0.52 0.45 -0.49 -0.30 0.63 0.76 0.09 28 0.25 0.49 -0.74 -0.60 -0.06 0.63 0.16 29 0.51 0.47 -0.47 -0.32 -0.75 0.58 0.15 30 0.08 0.37 0.47 0.87 0.95 0.66 -0.08 31 0.35 0.48 -0.50 -0.54 -0.24 0.61 0.13 32 0.46 0.38 1.91 0.69 0.38 0.59 -0.69 33 0.30 0.46 -0.21 -0.30 0.88 0.62 0.07 34 0.56 0.36 1.12 1.11 -1.26 0.57 -0.28 35 0.41 0.56 -1.66 -1.63 -0.78 0.60 0.38 36 -0.20 0.54 -1.42 -1.18 2.10 0.71 0.23 37 0.76 0.45 0.02 0.08 -0.11 0.53 0.00 38 0.61 0.36 0.72 1.24 0.80 0.56 -0.12 39 0.35 0.43 -0.33 0.32 0.27 0.61 0.14 40 0.66 0.55 -1.48 -1.39 0.14 0.55 0.39





41 0.91 0.41 0.36 0.56 -1.41 0.50 -0.06 42 0.76 0.49 -0.89 -0.38 -0.11 0.53 0.28 43 0.76 0.36 1.11 1.28 -0.11 0.53 -0.28 44 0.91 0.43 0.04 0.48 -0.45 0.50 0.05 45 0.86 0.38 0.65 1.07 -0.94 0.51 -0.13 46 1.01 0.42 0.98 0.11 0.97 0.48 -0.31 47 0.81 0.46 0.06 -0.24 -0.32 0.52 -0.02 48 0.96 0.51 -0.95 -0.92 -0.91 0.49 0.26 49 1.27 0.44 0.55 0.01 0.11 0.43 -0.14 50 1.17 0.51 -1.08 -0.85 0.71 0.45 0.30

Mean    -0.00     0.42     0.07    -0.01    -0.07     0.66
S.D.     0.70     0.08     0.90     0.75     0.93
Groups                                          2
Note: Raw score mean = 33.23 with a S.D. of 9.92. Mean person ability = 0.92 with a S.D. of 1.18. Test reliability (K.R. 20) = 0.91. Reliability of person separation = 0.89.



Experiment 36: Summary of Item Fit Information for a Uniformly Distributed Item Difficulty Distribution With 100 Items, 100 Persons, and No Guessing



1 -0.50 0.45 -0.36 -0.13 0.28 0.77 0.09 2 -1.36 0.42 -0.57 -0.03 0.24 0.87 0.07 3 -0.97 0.42 0.70 -0.45 0.09 0.83 -0.10 4 -1.25 0.42 0.24 -0.41 -0.94 0.86 -0.01 5 -1.71 0.46 -0.74 -0.53 -0.46 0.90 0.06 6 -1.36 0.41 -0.50 -0.05 0.24 0.87 0.08 7 -1.15 0.22 0.29 1.42 -0.60 0.85 0.00 8 -0.88 0.30 0.47 0.92 0.79 0.82 -0.05 9 -1.06 0.40 0.66 -0.25 0.38 0.84 -0.06

10 -1.36 0.40 -0.13 -0.21 0.14 0.87 0.05 11 -1.15 0.26 2.22 0.53 0.66 0.85 -0.41 12 -0.65 0.43 0.09 -0.20 -0.04 0.79 0.02 13 -0.65 0.47 -0.38 -0.53 -1.39 0.79 0.08 14 -0.43 0.35 0.70 0.66 -1.07 0.76 -0.09 15 -0.57 0.24 1.36 1.43 0.60 0.78 -0.18 16 -0.65 0.41 0.36 -0.09 0.86 0.79 -0.01 17 -0.72 0.35 0.50 0.57 0.25 0.80 -0.05 18 -0.37 0.34 0.85 0.86 0.62 0.75 -0.13 19 -0.30 0.37 1.80 0.26 -0.56 0.74 -0.41 20 -0.57 0.40 0.25 0.12 0.60 0.78 -0.01 21 -0.43 0.45 -0.20 -0.22 0.49 0.76 0.05 22 -0.72 0.47 -0.38 -0.46 -0.45 0.80 0.07 23 -0.17 0.30 1.10 1.32 1.29 0.72 -0.18 24 -0.80 0.43 -0.28 -0.19 0.52 0.81 0.07 25 -0.50 0.45 -0.18 -0.25 -0.67 0.77 0.06 26 -0.43 0.37 0.27 0.60 0.05 0.76 -0.01 27 -0.43 0.43 -0.19 0.06 0.05 0.76 0.06 28 -0.43 0.40 -0.06 0.39 0.05 0.76 0.02 29 -0.30 0.44 0.02 -0.16 -0.56 0.74 0.04 30 -0.05 0.50 -0.32 -0.80 1.57 0.70 0.02 31 -0.72 0.47 -0.70 -0.42 -0.45 0.80 0.12 32 -0.50 0.36 0.54 0.58 -0.67 0.77 -0.07 33 -0.57 0.45 -0.31 -0.26 -1.13 0.78 0.05 34 0.01 0.46 -0.57 -0.06 -1.42 0.69 0.15 35 -0.43 0.45 -0.56 -0.06 0.49 0.76 0.11 36 -0.30 0.51 -0.59 -0.85 0.09 0.74 0.12 37 -0.24 0.45 -0.27 -0.01 -0.59 0.73 0.08 38 -0.50 0.36 0.66 0.57 -0.67 0.77 -0.07 39 -0.37 0.47 -0.29 -0.41 -0.15 0.75 0.08 40 -0.05 0.55 -1.10 -1.29 0.23 0.70 0.21



41 0.06 0.48 -0.57 -0.25 -0.06 0.68 0.15 42 -0.24 0.37 0.64 0.53 0.86 0.73 -0.06 43 -0.05 0.45 0.35 -0.35 -0.67 0.70 -0.08 44 -0.50 0.47 -0.44 -0.38 -0.74 0.77 0.06 45 0.06 0.50 -0.80 -0.60 0.29 0.68 0.20 46 -0.05 0.39 1.23 0.34 -0.91 0.70 -0.25 47 -0.05 0.49 -0.93 -0.39 -0.91 0.70 0.19 48 -0.11 0.44 -0.13 -0.07 -0.53 0.71 0.03 49 0.06 0.35 0.52 1.32 0.98 0.68 -0.03 50 -0.24 0.53 -1.01 -1.04 -0.59 0.73 0.19 51 -0.11 0.40 0.94 0.31 0.34 0.71 -0.19 52 0.01 0.26 1.40 1.97 2.36 0.69 -0.27 53 0.06 0.55 -1.13 -1.34 0.68 0.68 0.23 54 -0.11 0.52 -0.80 -0.85 -0.02 0.71 0.16 55 0.18 0.40 0.10 0.73 1.12 0.66 0.04 56 -0.24 0.35 0.54 0.87 0.09 0.73 -0.04 57 0.06 0.44 -0.32 0.30 -0.56 0.68 0.10 58 0.56 0.40 0.80 0.67 -0.13 0.59 -0.19 59 0.34 0.39 0.75 0.67 -0.27 0.63 -0.15 60 -0.05 0.49 -0.45 -0.65 -0.67 0.70 0.11 61 0.61 0.41 0.54 0.61 -0.48 0.58 -0.20 62 0.34 0.48 0.06 -0.52 -1.36 0.63 -0.07 63 0.56 0.49 -0.35 -0.58 0.16 0.59 0.11 64 -0.17 0.49 -0.79 -0.45 -1.45 0.72 0.12 65 -0.05 0.42 0.12 0.24 0.06 0.70 0.03 66 0.61 0.35 1.73 1.11 -0.48 0.58 -0.55 67 0.34 0.29 3.21 1.50 0.99 0.63 -1.06 68 0.23 0.45 -0.04 0.13 1.30 0.65 0.03 69 0.45 0.46 -0.52 0.13 0.31 0.61 0.14 70 0.29 0.44 0.20 -0.01 -0.60 0.64 -0.02 71 0.91 0.55 -1.37 -1.54 0.20 0.52 0.36 72 0.23 0.48 -0.13 -0.46 -1.00 0.65 0.05 73 0.34 0.38 1.37 0.75 0.34 0.63 -0.38 74 0.50 0.40 1.13 0.52 -0.63 0.60 -0.34 75 0.91 0.50 -0.84 -0.57 -1.20 0.52 0.26 76 0.23 0.39 2.20 0.43 -0.04 0.65 -0.77 77 0.61 0.43 0.31 0.32 -1.35 0.58 -0.06 78 0.66 0.47 -0.20 -0.22 0.03 0.57 0.05 79 0.61 0.40 0.32 0.87 -0.48 0.58 -0.03 80 0.56 0.44 -0.12 0.38 0.57 0.59 0.09 81 0.71 0.40 0.83 0.71 -1.51 0.56 -0.21 82 0.56 0.52 -1.17 -0.85 0.16 0.59 0.30 83 1.01 0.53 -1.21 -1.06 0.16 0.50 0.34






84 0.66 0.53 -0.97 -1.19 1.28 0.57 0.25 85 0.29 0.51 -0.76 -0.82 0.90 0.64 0.19 86 0.34 0.37 1.03 0.85 -1.36 0.63 -0.29 87 1.32 0.52 -0.83 -1.05 0.86 0.44 0.23 88 0.96 0.54 -1.26 -1.27 -0.18 0.51 0.34 89 1.01 0.39 0.43 0.99 -1.14 0.50 -0.05 90 0.50 0.37 0.61 1.25 1.94 0.60 -0.05 91 0.91 0.53 -1.07 -1.23 0.20 0.52 0.29 92 1.17 0.47 -0.40 -0.18 -0.62 0.47 0.15 93 1.32 0.48 -0.45 -0.54 0.22 0.44 0.13 94 0.81 0.45 0.38 -0.10 -0.51 0.54 -0.12 95 0.40 0.43 0.21 0.35 -0.85 0.62 -0.01 96 1.22 0.57 -1.69 -1.89 0.81 0.46 0.43 97 1.22 0.55 -0.83 -1.78 1.38 0.46 0.18 98 1.07 0.49 0.31 -0.66 -0.62 0.49 -0.20 99 1.07 0.46 0.03 -0.23 -0.21 0.49 -0.02

 100     0.71     0.44     1.39    -0.19     0.95     0.56    -0.63

Mean     0.00     0.43     0.07    -0.02    -0.04     0.68
S.D.     0.68     0.07     0.86     0.76     0.80
Groups                                          2
Note: Raw score mean = 68.05 with a S.D. of 20.13. Mean person ability = 1.00 with a S.D. of 1.20. Test reliability (K.R. 20) = 0.96. Reliability of person separation = 0.95.



Experiment 37: Summary of Item Fit Information for a Uniformly Distributed Item Difficulty Distribution With 25 Items, 25 Persons, and a 25% Chance of Guessing Correctly



   1    -0.64     0.60    -0.41    -0.77     0.70     0.80     0.16
   2    -1.43     0.21     0.46     1.07     0.10     0.88     0.05
   3    -1.43     0.59    -0.26    -0.95     0.10     0.88     0.09
   4    -1.43     0.24     1.54     0.33     0.46     0.88    -0.47
   5    -1.00     0.38     0.23     0.46    -0.28     0.84     0.06
   6    -0.04     0.56    -0.12    -0.48    -0.27     0.72     0.16
   7    -0.64     0.42     0.03     0.55     0.70     0.80     0.08
   8     0.23     0.31     0.78     1.57     1.93     0.68    -0.24
   9    -0.04     0.63    -0.56    -1.02    -0.19     0.72     0.24
  10    -0.64     0.41     0.63     0.16    -1.11     0.80    -0.05
  11    -0.32     0.43     0.47     0.28     0.41     0.76    -0.01
  12    -0.32     0.57    -0.34    -0.47    -0.87     0.76     0.18
  13    -0.64     0.52     0.04    -0.40    -1.11     0.80     0.10
  14    -1.98     0.14     0.70     0.87    -0.22     0.92     0.05
  15    -0.04     0.63    -0.56    -1.02    -0.19     0.72     0.24
  16     1.18     0.60    -0.30    -0.49     0.10     0.52     0.21
  17    -0.04     0.58    -0.41    -0.52    -0.19     0.72     0.21
  18     0.23     0.39     0.36     1.14     0.68     0.68    -0.09
  19     0.23     0.69    -0.93    -1.59     1.67     0.68     0.34
  20     1.89     0.62    -0.61    -0.44    -0.04     0.40     0.32
  21     1.41     0.30     0.98     2.19     2.36     0.48    -0.28
  22     0.23     0.51    -0.03     0.12     0.36     0.68     0.02
  23     2.14     0.47     0.47     0.58     1.57     0.36    -0.07
  24     1.65     0.66    -0.85    -0.82     0.44     0.44     0.40
  25     1.41     0.51     0.46     0.25    -0.61     0.48    -0.17

Mean    -0.00     0.48     0.07     0.02     0.26     0.70
S.D.     1.11     0.15     0.61     0.91     0.89
Groups                                          2
Note: Raw score mean = 17.40 and a S.D. of 5.52. Mean person ability = 1.32 with a S.D. of 1.60. Test reliability (K.R. 20) = 0.88. Reliability of person separation = 0.86.



Experiment 38: Summary of Item Fit Information for a Uniformly Distributed Item Difficulty Distribution With 50 Items, 25 Persons, and a 25% Chance of Guessing Correctly

Item    Logit    Point    Unwt.    Wt.      Ability  Mean     Logit
#       item     bis.     total    total    between  item     Residual
        diff.    corr.    fit      fit      fit      score    Index

1 -0.46 0.44 -0.40 -0.26 -0.94 0.80 0.14 2 -1.14 0.21 0.17 0.37 -0.21 0.88 0.04 3 -1.62 0.35 -0.28 0.06 0.06 0.92 0.07 4 -1.14 0.51 -0.62 -0.45 0.39 0.88 0.11 5 -0.77 -0.01 2.42 0.68 -1.05 0.84 -1.12 6 -0.77 0.49 -0.58 -0.44 0.68 0.84 0.14 7 -2.40 0.05 0.64 0.40 -0.34 0.96 0.02 8 0.05 0.16 1.94 0.82 -0.93 0.72 -0.98 9 -0.19 0.04 1.28 1.24 2.50 0.76 -0.34

10 -0.77 0.37 -0.23 -0.02 0.68 0.84 0.10 11 -1.14 0.43 -0.40 -0.22 0.39 0.88 0.10 12 0.28 0.27 0.44 0.84 1.46 0.68 -0.06 13 -0.46 0.45 -0.41 -0.32 -0.94 0.80 0.14 14 -0.46 0.45 -0.45 -0.30 -0.94 0.80 0.15 15 -0.46 0.37 -0.35 0.14 0.96 0.80 0.12 16 0.05 0.46 -0.46 -0.24 -0.93 0.72 0.20 17 -0.77 0.42 -0.41 -0.17 0.68 0.84 0.12 18 -0.77 0.30 0.04 0.13 -1.05 0.84 0.07 19 0.05 0.34 0.58 0.11 -0.93 0.72 -0.16 20 0.05 0.52 -0.45 -0.78 0.20 0.72 0.19 21 0.28 0.33 -0.02 0.64 1.46 0.68 0.09 22 0.28 0.27 0.71 0.69 0.19 0.68 -0.22 23 0.49 0.57 -0.96 -0.94 -0.28 0.64 0.41 24 -0.19 0.35 1.04 -0.24 -0.30 0.76 -0.39 25 -0.46 0.50 -0.54 -0.58 -0.94 0.80 0.16 26 0.49 0.20 1.01 1.24 0.94 0.64 -0.36 27 0.28 0.14 1.40 1.20 1.46 0.68 -0.50 28 -0.19 0.52 -0.35 -0.86 -0.30 0.76 0.13 29 -0.46 0.33 -0.02 0.17 0.44 0.80 0.08 30 0.05 0.68 -1.36 -1.74 1.53 0.72 0.38 31 0.49 -0.11 2.54 2.60 2.03 0.64 -1.33 32 0.70 0.37 0.16 0.40 0.31 0.60 -0.07 33 0.49 0.31 0.15 0.85 -0.47 0.64 -0.03 34 0.49 0.40 -0.02 0.15 -0.47 0.64 0.04 35 0.05 0.53 -0.82 -0.63 1.53 0.72 0.24 36 0.05 0.38 -0.26 0.26 0.20 0.72 0.12 37 0.28 0.47 -0.25 -0.39 -1.02 0.68 0.15 38 0.05 0.28 0.54 0.53 0.77 0.72 -0.10 39 -0.19 0.21 0.43 0.76 -0.19 0.76 -0.05



158

Item #

Logit item diff.

Point, bis. corr.

Unwt. total fit

Wt. total fit

Ability between

fit

Mean item score

Logit Residual

Index

40 0.28 0.51 -0.65 -0.56 0.65 0.68 0.23 41 1.10 0.61 -1.13 -1.13 0.19 0.52 0.56 42 1.10 0.37 0.54 0.41 0.43 0.52 -0.30 43 0.70 0.60 -0.84 -1.33 0.31 0.60 0.39 44 1.10 0.60 -1.07 -1.07 0.19 0.52 0.54 45 1.10 0.39 0.15 0.47 0.43 0.52 0.02 46 0.90 0.31 1.16 0.56 -0.53 0.56 -0.66 47 0.28 0.33 0.29 0.40 0.19 0.68 -0.01 48 1.10 0.58 -0.98 -0.94 1.37 0.52 0.51 49 1.50 0.28 0.81 1.06 0.38 0.44 -0.31 50 0.70 0.49 -0.34 -0.42 0.31 0.60 0.20

Mean S.D.

-0.00 0.77

0.37 0.17

0.08 0.87

0.06 0.79

0.21 0.87

0.71

Groups Note: Raw score mean = 35.68 with a S.D. of 8. Mean person ability = 1.22 with a S.D. of 1.06. Test reliability (K.R. 20) = 0.88. Reliability of person separation = 0.86.

43.

159


Experiment 39: Summary of Item Fit Information for a Uniformly Distributed Item Difficulty Distribution With 100 Items, 25 Persons, and a 25% Chance of Guessing Correctly

Item    Logit    Point    Unwt.      Wt.  Ability     Mean    Logit
   #     item     bis.    total    total  between     item Residual
        diff.    corr.      fit      fit      fit    score    Index

   1    -2.33     0.26     0.01     0.30    -0.25     0.96     0.04
   2    -0.67     0.59    -0.91    -0.47     0.90     0.84     0.15
   3    -0.67     0.23     0.87     0.27     0.70     0.84    -0.17
   4    -1.05     0.30     0.63     0.02    -0.58     0.88    -0.10
   5    -0.36     0.27     0.69     0.20     1.47     0.80    -0.10
   6     0.16     0.57    -0.56    -0.78    -0.94     0.72     0.20
   7    -0.09     0.51    -0.61    -0.26     0.21     0.76     0.17
   8    -1.55     0.52    -0.62    -0.17     0.20     0.92     0.07
   9    -1.05     0.15     0.34     0.57    -0.58     0.88     0.01
  10    -0.09     0.22     0.60     0.67     0.83     0.76    -0.11
  11    -0.36     0.39     0.27    -0.09    -0.03     0.80    -0.02
  12    -0.09     0.47     0.04    -0.35    -0.87     0.76     0.02
  13    -0.67    -0.07     1.72     0.80     2.16     0.84    -0.38
  14    -0.36     0.18     0.65     0.63    -0.03     0.80    -0.09
  15    -1.05     0.42    -0.32    -0.02     0.56     0.88     0.08
  16    -0.67     0.73    -1.31    -1.10     0.90     0.84     0.18
  17    -1.05     0.16     0.87     0.31     1.47     0.88    -0.11
  18    -2.33     0.26     0.01     0.30    -0.25     0.96     0.04
  19    -1.55     0.12     0.62     0.42     0.37     0.92    -0.03
  20    -0.09     0.37     0.25     0.09     0.21     0.76    -0.04
  21    -0.36     0.26     0.41     0.45    -0.39     0.80    -0.04
  22    -2.33     0.08     0.50     0.40    -0.25     0.96     0.03
  23    -2.33     0.26     0.01     0.30    -0.25     0.96     0.04
  24    -0.67     0.43    -0.37    -0.02    -1.27     0.84     0.09
  25    -0.36     0.05     1.37     0.85     1.47     0.80    -0.33
  26    -0.67     0.64    -1.09    -0.65     0.90     0.84     0.16
  27    -0.67     0.74    -1.33    -1.12     0.90     0.84     0.18
  28    -1.05     0.24     0.55     0.22    -0.58     0.88    -0.06
  29    -1.05     0.17     0.63     0.34     1.47     0.88    -0.04
  30    -0.36     0.75    -1.43    -1.30     1.23     0.80     0.25
  31    -0.67     0.35    -0.26     0.31     0.90     0.84     0.08
  32     0.16     0.77    -1.67    -1.72     0.71     0.72     0.44
  33     0.39     0.28     0.72     0.49    -0.62     0.68    -0.26
  34    -0.67     0.45    -0.39    -0.07    -1.27     0.84     0.10
  35    -1.05     0.38    -0.19     0.06     0.56     0.88     0.07
  37    -0.67     0.52    -0.61    -0.28     0.90     0.84     0.12
  38    -0.09     0.40    -0.22     0.16     0.21     0.76     0.09
  39     0.39     0.40     0.21    -0.03    -0.62     0.68    -0.01
  40    -0.67     0.44    -0.41    -0.03     0.90     0.84     0.10
  41    -0.67     0.34     0.30     0.08    -1.27     0.84    -0.03
  42     0.39     0.33     0.28     0.43    -0.14     0.68    -0.10
  43    -0.09     0.31     0.05     0.49    -0.87     0.76     0.04
  44     0.16     0.08     1.37     1.13     1.45     0.72    -0.45
  45     0.39     0.19     0.94     0.94     0.85     0.68    -0.34
  46    -1.05     0.41    -0.32     0.01     0.56     0.88     0.08
  47    -0.36     0.29     0.43     0.38    -0.39     0.80    -0.09
  48    -0.36     0.24     1.20     0.17    -0.03     0.80    -0.30
  49    -0.67     0.38    -0.27     0.14     0.90     0.84     0.09
  50    -0.36     0.69    -1.21    -0.98     1.23     0.80     0.23
  51     0.39     0.31     0.27     0.53    -0.14     0.68    -0.08
  52    -0.09     0.17     1.33     0.60     0.83     0.76    -0.41
  53     0.16     0.25     0.74     0.56     0.16     0.72    -0.20
  54    -0.09     0.32     0.10     0.42     0.83     0.76     0.03
  55    -1.05     0.54    -0.44    -0.45     0.56     0.88     0.08
  56     0.60     0.38     0.19     0.12    -1.19     0.64    -0.09
  57    -0.09     0.11     1.07     0.93     0.83     0.76    -0.25
  58    -0.09     0.61    -0.84    -0.82     0.21     0.76     0.23
  59     0.16     0.48    -0.44    -0.21    -0.94     0.72     0.19
  60     0.60     0.59    -1.03    -0.87    -1.19     0.64     0.44
  61     0.60     0.21     1.13     0.87     0.19     0.64    -0.53
  62     1.19     0.24     1.13     0.87     0.67     0.52    -0.71
  63     0.16     0.38    -0.12     0.21    -0.94     0.72     0.07
  64     1.19     0.49    -0.22    -0.69    -0.12     0.52     0.06
  65     0.99     0.24     0.78     0.96     1.30     0.56    -0.33
  66     0.16     0.21     0.53     0.88     0.16     0.72    -0.13
  67     0.39     0.62    -1.13    -0.98    -0.14     0.68     0.40
  68     0.80     0.34     0.13     0.53     0.80     0.60     0.03
  69    -0.09     0.13     0.79     0.97    -0.87     0.76    -0.16
  70     0.60     0.45    -0.03    -0.24    -1.19     0.64     0.07
  71     0.16     0.44    -0.10    -0.16     0.16     0.72     0.09
  72     0.39     0.21     0.82     0.88     1.96     0.68    -0.23
  73     1.37     0.48    -0.57    -0.57    -0.81     0.48     0.36
  74     0.39     0.55    -0.84    -0.56     1.17     0.68     0.28
  75     0.99     0.42     0.07    -0.09    -1.12     0.56     0.02
  76     0.99     0.59    -1.04    -1.20     0.48     0.56     0.54
  77    -0.09     0.45    -0.36    -0.08     0.21     0.76     0.12
  78     0.80     0.38     0.40     0.07     0.80     0.60    -0.13
  79     0.39     0.06     1.09     1.56     0.85     0.68    -0.43
  80    -0.09     0.51    -0.50    -0.35     0.21     0.76     0.15
  81     1.37     0.50    -0.66    -0.72     0.66     0.48     0.38
  82     0.99     0.21     1.12     1.08     1.30     0.56    -0.54
  83     0.80     0.35     0.28     0.38    -0.60     0.60    -0.06
  84     0.80     0.70    -1.79    -1.72     2.07     0.60     0.75
  85     0.39     0.21     0.70     0.95    -0.62     0.68    -0.25
  86     0.60     0.09     1.12     1.62     1.39     0.64    -0.46
  87     0.39     0.49    -0.59    -0.25     1.17     0.68     0.21
  88     1.56     0.44    -0.45    -0.33     0.18     0.44     0.28
  89     1.75     0.71    -1.86    -2.47     1.00     0.40     0.80
  90     0.80     0.73    -1.97    -1.95     2.07     0.60     0.82
  91     1.75     0.30     0.09     0.55    -0.43     0.40     0.02
  92     0.80     0.21     0.84     1.08    -0.60     0.60    -0.42
  93     0.99     0.28     0.88     0.66    -1.12     0.56    -0.41
  94     1.19     0.48    -0.58    -0.50    -0.12     0.52     0.36
  95     1.56     0.24     1.35     0.67     0.18     0.44    -0.88
  96     2.15     0.13     0.85     1.07     0.64     0.32    -0.27
  97     0.39     0.63    -1.02    -1.13     1.17     0.68     0.36
  98     0.80     0.31     0.45     0.53     0.80     0.60    -0.13
  99     1.19     0.49    -0.61    -0.56    -0.12     0.52     0.37
 100     0.80     0.41    -0.21     0.10    -0.60     0.60     0.16

Mean     0.00     0.36     0.05     0.06     0.28     0.72
S.D.     0.93     0.19     0.82     0.75     0.88
Groups                                          2

Note: Raw score mean = 72.28 with a S.D. of 15.98. Mean person ability = 1.25 with a S.D. of 0.92. Test reliability (K.R. 20) = 0.94. Reliability of person separation = 0.92.


Experiment 40: Summary of Item Fit Information for a Uniformly Distributed Item Difficulty Distribution With 25 Items, 50 Persons, and a 25% Chance of Guessing Correctly

Item    Logit    Point    Unwt.      Wt.  Ability     Mean    Logit
   #     item     bis.    total    total  between     item Residual
        diff.    corr.      fit      fit      fit    score    Index

   1    -2.04     0.26     0.03     0.22     0.13     0.94     0.04
   2    -0.62     0.48    -0.41    -0.61    -1.04     0.82     0.10
   3    -1.42     0.32    -0.38     0.36     0.57     0.90     0.07
   4    -0.32     0.37     0.12     0.30    -0.33     0.78     0.03
   5    -0.18     0.18     1.39     1.34     1.50     0.76    -0.28
   6    -0.79     0.09     1.76     1.17    -0.46     0.84    -0.33
   7    -0.32     0.23     1.93     0.85     1.87     0.78    -0.53
   8    -1.42     0.43    -0.53    -0.32     0.57     0.90     0.07
   9    -0.62     0.61    -1.23    -1.33     1.32     0.82     0.17
  10    -0.62     0.31     2.12    -0.12    -1.04     0.82    -0.62
  11    -0.05     0.56    -1.04    -0.84     1.20     0.74     0.21
  12     0.08     0.55    -0.95    -0.87    -0.74     0.72     0.23
  13     0.43     0.26     1.10     1.63     0.52     0.66    -0.28
  14     0.08     0.32     0.62     0.92    -0.39     0.72    -0.13
  15     0.32     0.52    -0.75    -0.55     0.14     0.68     0.18
  16     0.20     0.44    -0.17     0.05    -0.95     0.70     0.10
  17    -0.32     0.42    -0.47     0.27     0.69     0.78     0.11
  18     1.07     0.62    -1.64    -1.57     0.99     0.54     0.53
  19     0.97     0.44     0.05     0.28    -0.90     0.56     0.04
  20     0.76     0.42     0.02     0.53    -1.03     0.60     0.08
  21     0.43     0.50     0.43    -0.63    -0.61     0.66    -0.15
  22     0.43     0.49     0.59    -0.63    -0.61     0.66    -0.20
  23     1.48     0.48    -0.04    -0.20    -1.01     0.46     0.05
  24     1.07     0.41     0.52     0.69     0.72     0.54    -0.19
  25     1.38     0.57    -1.18    -1.01    -0.40     0.48     0.42

Mean    -0.00     0.41     0.07    -0.00     0.03     0.71
S.D.     0.89     0.14     1.01     0.84     0.91
Groups                                          2

Note: Raw score mean = 17.86 and a S.D. of 4.66. Mean person ability = 1.28 with a S.D. of 1.19. Test reliability (K.R. 20) = 0.82. Reliability of person separation = 0.76.



Experiment 41: Summary of Item Fit Information for a Uniformly Distributed Item Difficulty Distribution With 50 Items, 50 Persons, and a 25% Chance of Guessing Correctly

Item    Logit    Point    Unwt.      Wt.  Ability     Mean    Logit
   #     item     bis.    total    total  between     item Residual
        diff.    corr.      fit      fit      fit    score    Index

   1    -2.27     0.27     0.32    -0.11    -0.20     0.94     0.00
   2    -1.07     0.21     0.25     0.38     0.45     0.84     0.03
   3    -1.25     0.21     0.04     0.51    -0.14     0.86     0.05
   4    -1.07     0.29     0.08    -0.00     0.45     0.84     0.05
   5    -0.77     0.28     0.23     0.16    -0.50     0.80     0.02
   6    -0.50     0.38    -0.04    -0.46     0.28     0.76     0.07
   7    -1.25     0.32     0.10    -0.08    -0.14     0.86     0.00
   8    -1.45     0.41    -0.35    -0.47    -0.56     0.88     0.08
   9    -0.77     0.13     0.58     0.94     1.74     0.80    -0.07
  10    -0.91     0.29     0.08     0.11    -0.99     0.82     0.02
  11    -0.63     0.37    -0.19    -0.28    -0.07     0.78     0.10
  12    -0.63     0.31     0.18    -0.05     1.36     0.78     0.03
  13    -0.38     0.34    -0.12     0.04    -0.57     0.74     0.06
  14    -0.15     0.29     0.28     0.31    -1.04     0.70    -0.08
  15    -0.15     0.25     0.36     0.69    -0.26     0.70    -0.03
  16    -0.38     0.38    -0.02    -0.42    -0.57     0.74     0.07
  17     0.16     0.30     0.38     0.48    -0.52     0.64    -0.08
  18    -0.63     0.29     0.18     0.19    -1.10     0.78     0.02
  19     0.06     0.47    -0.66    -0.98     0.89     0.66     0.28
  20     0.16     0.26     0.34     0.93     0.52     0.64    -0.03
  21    -0.15     0.39    -0.14    -0.38    -0.26     0.70     0.12
  22    -0.04     0.37    -0.27    -0.01     0.57     0.68     0.16
  23    -0.50     0.31     0.01     0.16    -1.14     0.76     0.06
  24    -0.50     0.42    -0.33    -0.62    -1.14     0.76     0.14
  25     0.46     0.46    -0.30    -1.29     0.54     0.58     0.17
  26     0.16     0.30     0.55     0.29    -0.52     0.64    -0.17
  27     0.36     0.47    -0.66    -1.09     0.15     0.60     0.34
  28     0.46     0.18     0.89     1.88     2.46     0.58    -0.33
  29     0.65     0.37    -0.11     0.14    -0.81     0.54     0.14
  30     0.06     0.31     0.07     0.41     0.90     0.66     0.07
  31    -0.50     0.23     0.43     0.53    -1.14     0.76    -0.08
  32    -0.04     0.31     0.07     0.38     1.26     0.68     0.07
  33     0.84     0.24     0.84     1.40    -0.76     0.50    -0.33
  34     0.16     0.42    -0.39    -0.52    -0.79     0.64     0.23
  35     0.06     0.44    -0.51    -0.71    -0.05     0.66     0.24
  36     0.84     0.36     0.05     0.25    -0.61     0.50     0.06
  37     0.65     0.37     0.05    -0.06     0.45     0.54     0.06
  38     0.93     0.36     0.18     0.20    -1.41     0.48    -0.01
  39     0.84     0.20     1.14     1.81     0.30     0.50    -0.50
  40     0.06     0.31     0.08     0.42    -1.58     0.66     0.06
  41     1.22     0.42    -0.15    -0.40     0.51     0.42     0.11
  42     0.65     0.54    -1.10    -2.00     0.45     0.54     0.53
  43     1.12     0.40    -0.04    -0.29     0.00     0.44     0.07
  44     0.84     0.48    -0.56    -1.26     1.22     0.50     0.29
  45     0.84     0.29     0.34     1.12     0.30     0.50    -0.04
  46     0.93     0.43    -0.30    -0.59    -0.05     0.48     0.19
  47     0.84     0.32     0.40     0.60    -0.61     0.50    -0.12
  48     0.65     0.41    -0.37    -0.31    -0.81     0.54     0.27
  49     0.84     0.27     1.28     0.88     0.30     0.50    -0.78
  50     1.22     0.54    -1.25    -1.62     1.32     0.42     0.50

Mean     0.00     0.34     0.04     0.02    -0.04     0.66
S.D.     0.79     0.09     0.48     0.78     0.86


28.



Experiment 42: Summary of Item Fit Information for a Uniformly Distributed Item Difficulty Distribution With 100 Items, 50 Persons, and a 25% Chance of Guessing Correctly

Item    Logit    Point    Unwt.      Wt.  Ability     Mean    Logit
   #     item     bis.    total    total  between     item Residual
        diff.    corr.      fit      fit      fit    score    Index

   1    -1.46     0.34     0.02     0.15     0.44     0.90     0.05
   2    -0.81     0.48    -0.59    -0.18     1.00     0.84     0.09
   3    -1.00     0.52    -0.76    -0.46     0.81     0.86     0.11
   4    -0.81     0.44     0.43    -0.47    -0.11     0.84    -0.04
   5    -1.46     0.26     0.71     0.32     1.30     0.90    -0.05
   6    -1.00     0.29     0.28     0.62    -1.00     0.86     0.00
   7    -0.81     0.51    -0.76    -0.33     1.00     0.84     0.12
   8    -0.63     0.42     0.48    -0.33     0.86     0.82    -0.05
   9    -0.81     0.51    -0.72    -0.37     1.00     0.84     0.09
  10    -0.18     0.47    -0.29    -0.19    -0.55     0.76     0.12
  11    -0.32     0.50    -0.47    -0.42     0.45     0.78     0.12
  12    -0.32     0.45    -0.19    -0.15    -1.07     0.78     0.09
  13    -0.63     0.34     0.15     0.59    -0.15     0.82    -0.02
  14    -1.00     0.46    -0.50    -0.22     0.81     0.86     0.10
  15    -0.05     0.58    -0.95    -1.02     1.89     0.74     0.23
  16    -0.47     0.62    -1.14    -1.32     1.35     0.80     0.19
  17    -0.32     0.51    -0.57    -0.44     0.45     0.78     0.14
  18    -0.81     0.44     0.07    -0.33    -0.11     0.84     0.04
  19    -0.81     0.36     0.26     0.24    -0.11     0.84     0.01
  20    -0.32     0.56    -0.81    -0.95     0.45     0.78     0.18
  21    -1.00     0.55    -0.31    -1.05    -1.00     0.86     0.06
  22     0.20     0.41     0.28     0.19    -0.47     0.70    -0.12
  23    -0.18     0.33     1.00     0.45     0.81     0.76    -0.19
  24    -0.63     0.38     0.09     0.19    -0.60     0.82     0.05
  25    -1.22     0.47    -0.26    -0.43    -1.30     0.88     0.07
  26    -1.00     0.19     0.73     1.00     1.71     0.86    -0.06
  27    -0.81     0.40    -0.09     0.09    -0.52     0.84     0.06
  28    -0.63     0.42    -0.06    -0.04    -0.15     0.82     0.06
  29     0.20     0.34     0.85     0.67    -0.47     0.70    -0.29
  30    -0.47     0.49    -0.37    -0.47     0.17     0.80     0.11
  31    -1.00     0.40    -0.22     0.12     0.81     0.86     0.07
  32    -0.63     0.44    -0.20    -0.09    -0.15     0.82     0.07
  33    -0.05     0.56    -0.64    -0.99    -0.15     0.74     0.18
  34    -0.63     0.60    -0.76    -1.34    -0.15     0.82     0.13
  35    -0.32     0.36     0.35     0.37     1.19     0.78     0.00
  36     0.32     0.43     0.47     0.09    -0.21     0.68    -0.11
  37    -1.00     0.41     0.19    -0.25     0.35     0.86     0.03
  38     0.20     0.34     3.99     0.26    -0.47     0.70    -2.71
  39    -0.05     0.40     0.07     0.36    -0.97     0.74     0.03
  40     0.44     0.21     1.57     1.96     1.68     0.66    -0.50
  41    -0.05     0.38     0.27     0.42     0.42     0.74    -0.01
  42    -0.47     0.15     1.48     1.31     2.53     0.80    -0.31
  43    -0.81     0.44    -0.16    -0.18    -0.52     0.84     0.07
  44    -0.32     0.50    -0.11    -0.60    -1.07     0.78     0.06
  45    -0.05     0.45    -0.18     0.02    -0.97     0.74     0.10
  46    -0.32     0.47    -0.44    -0.12     0.45     0.78     0.13
  47     0.08     0.21     2.16     1.22     1.05     0.72    -0.75
  48    -0.47     0.34     0.52     0.35     0.45     0.80    -0.03
  49    -0.47     0.41     0.13    -0.01     0.45     0.80     0.04
  50     0.55     0.38     0.32     0.65     0.45     0.64    -0.08
  51    -0.05     0.33     0.61     0.82    -0.97     0.74    -0.12
  52     0.32     0.41     0.30     0.32    -1.04     0.68    -0.14
  53     0.08     0.46    -0.35     0.08     0.21     0.72     0.13
  54    -0.05     0.51    -0.49    -0.49    -0.15     0.74     0.17
  55    -0.18     0.30     1.27     0.80    -0.55     0.76    -0.38
  56     0.55     0.21     1.62     1.96     2.09     0.64    -0.53
  57     0.20     0.46     0.03    -0.16    -0.47     0.70     0.06
  58    -0.63     0.39    -0.27     0.42     1.17     0.82     0.09
  59     0.32     0.37     0.20     0.80     1.22     0.68     0.04
  60    -0.32     0.50    -0.57    -0.39     0.45     0.78     0.13
  61     0.20     0.51    -0.60    -0.42     0.53     0.70     0.19
  62     0.20     0.32     1.13     0.75    -0.47     0.70    -0.30
  63    -0.32     0.31     0.77     0.67    -1.07     0.78    -0.19
  64    -0.05     0.26     0.75     1.20     2.28     0.74    -0.08
  65     0.97     0.57    -1.11    -1.22     0.20     0.56     0.42
  66    -0.18     0.39     0.30     0.24    -0.43     0.76    -0.02
  67     0.08     0.46    -0.29    -0.09     0.00     0.72     0.13
  68    -0.32     0.25     0.62     1.18     1.19     0.78    -0.05
  69     0.44     0.43    -0.14     0.28    -1.16     0.66     0.12
  70     0.66     0.45     0.49    -0.21    -0.08     0.62    -0.31
  71     0.66     0.38     0.21     0.87    -1.43     0.62     0.00
  72     0.76     0.51    -0.50    -0.59    -0.82     0.60     0.24
  73     0.20     0.55    -0.73    -0.88     0.53     0.70     0.22
  74    -0.18     0.54    -0.86    -0.59     1.71     0.76     0.19
  75     0.44     0.39     0.04     0.67    -0.17     0.66     0.07
  76     0.32     0.37     1.47     0.20     0.27     0.68    -0.50
  77     0.32     0.44    -0.07     0.16     0.84     0.68     0.04
  78     0.76     0.42    -0.07     0.44    -0.47     0.60     0.11
  79     0.44     0.48    -0.31    -0.25    -1.16     0.66     0.17
  80     0.20     0.37     0.40     0.58     0.67     0.70    -0.03
  81     0.66     0.44     0.44    -0.02    -0.08     0.62    -0.14
  82     0.97     0.49    -0.14    -0.49     0.20     0.56     0.03
  83     1.18     0.41     0.43     0.35    -1.09     0.52    -0.13
  84     1.28     0.46    -0.20    -0.02    -0.97     0.50     0.15
  85     0.66     0.36     0.28     1.00     0.96     0.62     0.00
  86     1.18     0.58    -1.08    -1.50     1.02     0.52     0.40
  87     0.44     0.37     0.22     0.86    -0.17     0.66     0.03
  88     1.28     0.63    -1.64    -2.20     1.40     0.50     0.57
  89     0.76     0.37     0.52     0.79     0.33     0.60    -0.18
  90     1.08     0.38     0.24     0.90     0.20     0.54     0.02
  91     0.32     0.59    -0.93    -1.41    -0.21     0.68     0.29
  92     1.08     0.41     0.23     0.53    -0.99     0.54    -0.00
  93     1.28     0.49    -0.53    -0.41    -0.40     0.50     0.26
  94     1.18     0.49    -0.47    -0.38    -1.09     0.52     0.23
  95     0.76     0.48    -0.40    -0.20    -0.47     0.60     0.20
  96     0.20     0.35     0.55     0.72     0.67     0.70    -0.18
  97     0.66     0.48    -0.46    -0.19    -0.08     0.62     0.20
  98     0.55     0.45    -0.13     0.11    -0.68     0.64     0.13
  99     1.79     0.54    -0.94    -1.09     0.40     0.40     0.34
 100     1.48     0.34     0.94     1.05     0.16     0.46    -0.32

Mean     0.00     0.42     0.06     0.04     0.14     0.72
S.D.     0.70     0.10     0.78     0.73     0.88
Groups                                          2


42.



Experiment 43: Summary of Item Fit Information for a Uniformly Distributed Item Difficulty Distribution With 25 Items, 100 Persons, and a 25% Chance of Guessing Correctly

Item    Logit    Point    Unwt.      Wt.  Ability     Mean    Logit
   #     item     bis.    total    total  between     item Residual
        diff.    corr.      fit      fit      fit    score    Index

   1    -2.05     0.33    -0.40    -0.17     0.43     0.94     0.05
   2    -2.26     0.12     1.22     0.38     1.55     0.95    -0.10
   3    -1.19     0.34     0.57    -0.20    -1.01     0.88    -0.06
   4    -1.56     0.21     1.35     0.22     0.06     0.91    -0.16
   5    -0.89     0.42    -0.55    -0.38     0.54     0.85     0.08
   6    -0.80     0.37    -0.42     0.19    -0.74     0.84     0.08
   7    -0.71     0.27     0.70     0.75     0.20     0.83    -0.04
   8    -0.55     0.38     0.15    -0.00     0.65     0.81     0.01
   9    -0.06     0.48    -0.37    -0.54    -0.09     0.74     0.08
  10    -0.26     0.29     0.95     1.08     1.27     0.77    -0.11
  11    -0.40     0.41    -0.12    -0.01    -0.34     0.79     0.05
  12    -0.06     0.46    -0.42    -0.42    -1.29     0.74     0.10
  13    -0.06     0.44     0.27    -0.50    -0.33     0.74    -0.05
  14     0.48     0.52    -0.91    -0.89     1.12     0.65     0.20
  15     0.19     0.39     0.24     0.63     0.02     0.70    -0.01
  16     0.19     0.51    -1.00    -0.60     0.02     0.70     0.19
  17     0.25     0.51    -0.94    -0.70     0.27     0.69     0.20
  18     0.97     0.60    -1.78    -1.93     1.85     0.56     0.41
  19     0.70     0.36     1.39     1.13     0.02     0.61    -0.35
  20     0.65     0.48     0.02    -0.38    -0.48     0.62    -0.07
  21     1.02     0.49     0.05    -0.47    -1.47     0.55    -0.02
  22     1.33     0.50    -0.73    -0.18    -1.31     0.49     0.23
  23     1.22     0.34     1.38     1.92     2.08     0.51    -0.33
  24     1.80     0.33     3.16     1.32    -0.11     0.40    -1.10
  25     2.02     0.54    -0.97    -0.97     0.45     0.36     0.21

Mean    -0.00     0.40     0.11    -0.03     0.13     0.71
S.D.     1.11     0.11     1.06     0.83     0.94
Groups                                          2

Note: Raw score mean = 17.63 and a S.D. of 4.53. Mean person ability = 1.28 with a S.D. of 1.22. Test reliability (K.R. 20) = 0.81. Reliability of person separation = 0.78.


Experiment 44: Summary of Item Fit Information for a Uniformly Distributed Item Difficulty Distribution With 50 Items, 100 Persons, and a 25% Chance of Guessing Correctly

Item    Logit    Point    Unwt.      Wt.  Ability     Mean    Logit
   #     item     bis.    total    total  between     item Residual
        diff.    corr.      fit      fit      fit    score    Index

   1    -0.69     0.34    -0.24     0.06     0.40     0.80     0.05
   2    -0.77     0.30     0.40     0.06     0.67     0.81    -0.02
   3    -1.27     0.29     0.78    -0.21    -0.71     0.87    -0.09
   4    -1.00     0.25     1.57     0.38     0.31     0.84    -0.34
   5    -1.09     0.27     0.28     0.24    -1.33     0.85    -0.02
   6    -0.62     0.42    -0.83    -0.47     0.44     0.79     0.12
   7    -1.00     0.39    -0.72    -0.33     0.31     0.84     0.08
   8    -0.69     0.22     0.91     0.68     0.40     0.80    -0.11
   9    -0.77     0.45    -0.76    -0.81     0.01     0.81     0.09
  10    -0.69     0.31     0.36     0.11    -0.83     0.80    -0.03
  11    -0.30     0.29     0.14     0.74    -1.19     0.74     0.01
  12    -0.56     0.29     0.62     0.36    -0.17     0.78    -0.06
  13    -0.62     0.28     0.19     0.46     0.12     0.79    -0.01
  14    -0.84     0.26    -0.02     0.54     0.93     0.82     0.03
  15    -0.13     0.42    -0.48    -0.41    -0.16     0.71     0.10
  16    -0.77     0.36    -0.51    -0.03     0.01     0.81     0.08
  17    -0.24     0.43    -0.51    -0.64    -0.78     0.73     0.09
  18    -0.36     0.38     1.09    -0.41    -0.37     0.75    -0.29
  19    -0.18     0.37    -0.01    -0.02    -0.14     0.72     0.03
  20    -0.77     0.23     0.60     0.66    -0.26     0.81    -0.06
  21    -0.13     0.32     0.11     0.66    -0.16     0.71    -0.02
  22    -0.36     0.34     0.42     0.06    -0.37     0.75    -0.06
  23    -0.24     0.31    -0.00     0.61    -0.79     0.73     0.03
  24    -0.07     0.44    -0.71    -0.64    -0.81     0.70     0.15
  25     0.20     0.24     1.18     1.52     0.19     0.65    -0.25
  26     0.25     0.44    -0.55    -0.64     0.80     0.64     0.14
  27     0.25     0.41    -0.61    -0.13    -0.69     0.64     0.16
  28    -0.02     0.45    -0.96    -0.64    -0.47     0.69     0.20
  29     0.09     0.42    -0.62    -0.32    -0.78     0.67     0.15
  30     0.25     0.40    -0.46     0.09    -1.02     0.64     0.13
  31     0.09     0.37    -0.27     0.36    -0.78     0.67     0.09
  32     0.45     0.35     0.28     0.62     0.82     0.60    -0.02
  33     0.25     0.34     0.72     0.41    -0.69     0.64    -0.20
  34     0.69     0.38     0.20     0.34    -1.05     0.55    -0.01
  35     0.30     0.41    -0.12    -0.31    -1.50     0.63     0.04
  36     0.79     0.43    -0.36    -0.40    -0.62     0.53     0.12
  37     0.69     0.42    -0.42    -0.13     0.72     0.55     0.16
  38     0.45     0.47    -0.93    -0.95    -0.22     0.60     0.25
  39     0.50     0.41    -0.37    -0.04    -0.11     0.59     0.15
  40     0.64     0.27     1.88     1.43     0.98     0.56    -0.59
  41     0.35     0.49    -0.83    -1.27     0.63     0.62     0.20
  42     0.93     0.22     2.36     2.23     1.11     0.50    -0.73
  43     0.45     0.42    -0.63    -0.14    -1.19     0.60     0.19
  44     1.08     0.51    -1.35    -1.51     0.33     0.47     0.37
  45     0.69     0.42    -0.50    -0.16    -1.05     0.55     0.18
  46     0.79     0.53    -1.71    -1.68     1.03     0.53     0.46
  47     0.84     0.31     1.43     1.06    -0.18     0.52    -0.44
  48     0.84     0.36     0.21     0.60    -1.08     0.52    -0.01
  49     1.22     0.33     0.42     1.05    -0.09     0.44    -0.05
  50     1.12     0.50    -0.78    -1.45     1.21     0.46     0.16

Mean     0.00     0.37    -0.00     0.03    -0.16     0.68
S.D.     0.66     0.08     0.82     0.77     0.72
Groups                                          2

Note: Raw score mean = 33.82 with a S.D. of 8.52. Mean person ability = 0.94 with a S.D. of 0.97. Test reliability (K.R. 20) = 0.88. Reliability of person separation = 0.86.


Experiment 45: Summary of Item Fit Information for a Uniformly Distributed Item Difficulty Distribution With 100 Items, 100 Persons, and a 25% Chance of Guessing Correctly

Item    Logit    Point    Unwt.      Wt.  Ability     Mean    Logit
   #     item     bis.    total    total  between     item Residual
        diff.    corr.      fit      fit      fit    score    Index

   1    -0.84     0.45    -0.43    -0.74    -0.25     0.81     0.06
   2    -1.17     0.33     0.12     0.03    -1.17     0.85     0.01
   3    -1.17     0.35    -0.03    -0.05    -1.17     0.85     0.02
   4    -1.36     0.34    -0.48     0.03     0.54     0.87     0.07
   5    -1.09     0.40    -0.48    -0.32    -1.32     0.84     0.07
   6    -1.47     0.26     0.05     0.43    -0.04     0.88     0.03
   7    -0.70     0.26     0.95     0.89    -0.58     0.79    -0.13
   8    -1.09     0.37    -0.60     0.06     1.04     0.84     0.08
   9    -0.84     0.44    -0.41    -0.57    -0.25     0.81     0.05
  10    -0.84     0.39     0.15    -0.17     0.69     0.81    -0.05
  11    -0.24     0.29     0.75     1.14     1.50     0.72    -0.10
  12    -1.09     0.33    -0.30     0.32     0.09     0.84     0.05
  13    -0.63     0.44    -0.58    -0.42     0.44     0.78     0.09
  14    -0.49     0.39     0.44    -0.17     0.42     0.76    -0.07
  15    -0.43     0.36     0.44     0.23     0.15     0.75    -0.04
  16    -0.49     0.42    -0.36    -0.21     0.03     0.76     0.05
  17    -0.37     0.48    -0.63    -0.79     0.49     0.74     0.10
  18    -0.63     0.33     0.33     0.45    -0.51     0.78    -0.03
  19    -0.24     0.31     1.04     0.71     0.13     0.72    -0.17
  20    -0.77     0.46    -0.58    -0.80     0.00     0.80     0.08
  21    -0.37     0.38    -0.02     0.23    -0.36     0.74     0.05
  22    -0.63     0.31     1.18     0.45     0.94     0.78    -0.21
  23    -0.63     0.34     0.12     0.34    -0.96     0.78    -0.00
  24    -0.63     0.37     0.02     0.06     0.13     0.78     0.03
  25    -0.49     0.32     1.21     0.28    -1.05     0.76    -0.24
  26    -0.70     0.34     0.38     0.31    -0.84     0.79    -0.05
  27    -0.24     0.35    -0.04     0.69    -0.81     0.72     0.04
  28    -0.84     0.32     0.20     0.37     0.04     0.81    -0.00
  29    -0.07     0.41    -0.11    -0.04    -0.77     0.69     0.05
  30    -0.56     0.26     1.18     0.86     0.68     0.77    -0.16
  31    -0.43     0.29     0.79     0.81     0.91     0.75    -0.12
  32    -0.24     0.25     0.71     1.61     2.08     0.72    -0.07
  33    -0.24     0.29     0.54     1.21     2.08     0.72    -0.04
  34    -0.63     0.37    -0.44     0.31     0.44     0.78     0.06
  35    -0.37     0.37     0.15     0.20    -0.36     0.74    -0.00
  36    -0.49     0.38    -0.29     0.24     0.03     0.76     0.07
  37    -0.24     0.39     0.28     0.04    -0.81     0.72    -0.04
  38    -0.01     0.49    -0.72    -0.93    -0.42     0.68     0.16
  39    -0.19     0.54    -1.29    -1.43     1.69     0.71     0.17
  40    -0.37     0.35     0.70     0.18     1.35     0.74    -0.07
  41    -0.24     0.36     0.28     0.40     0.13     0.72    -0.02
  42    -0.63     0.55    -1.71    -1.36     1.20     0.78     0.20
  43    -0.19     0.50    -0.70    -1.03     1.09     0.71     0.12
  44    -0.19     0.36     0.35     0.43    -1.27     0.71    -0.07
  45    -0.37     0.43     1.29    -0.57     0.49     0.74    -0.39
  46    -0.19     0.43    -0.43    -0.23    -1.27     0.71     0.09
  47     0.15     0.36     0.89     0.53    -0.35     0.65    -0.17
  48    -0.13     0.48    -1.03    -0.58     0.62     0.70     0.19
  49     0.25     0.29     1.54     1.66     2.21     0.63    -0.34
  50    -0.19     0.32     0.71     0.92     0.61     0.71    -0.08
  51    -0.31     0.30     0.91     0.94     1.73     0.73    -0.12
  52    -0.24     0.28     0.79     1.25     0.13     0.72    -0.13
  53    -0.19     0.37    -0.21     0.57    -1.27     0.71     0.08
  54    -0.24     0.43    -0.27    -0.25     0.16     0.72     0.03
  55    -0.01     0.54    -1.14    -1.52     1.04     0.68     0.22
  56    -0.13     0.47    -0.38    -0.71     0.62     0.70     0.08
  57     0.36     0.43    -0.18     0.09    -0.94     0.61     0.06
  58     0.20     0.41     0.24     0.17    -0.71     0.64    -0.02
  59    -0.24     0.44    -0.03    -0.51    -0.76     0.72     0.02
  60     0.46     0.52    -0.82    -1.23    -0.06     0.59     0.19
  61     0.56     0.56    -1.59    -1.85     1.65     0.57     0.38
  62     0.41     0.38     0.28     0.80    -0.38     0.60    -0.04
  63     0.51     0.48    -0.86    -0.51    -1.29     0.58     0.25
  64     0.31     0.54    -1.40    -1.40     1.10     0.62     0.30
  65     0.20     0.46    -0.57    -0.30    -0.05     0.64     0.14
  66     0.25     0.38     0.43     0.61     0.54     0.63    -0.06
  67     0.31     0.32     0.73     1.40     0.25     0.62    -0.14
  68     0.31     0.55    -1.61    -1.61     1.65     0.62     0.35
  69     0.36     0.39     0.36     0.50     0.64     0.61    -0.06
  70     0.46     0.44    -0.59     0.12    -0.77     0.59     0.18
  71     0.25     0.51    -0.29    -1.25    -0.57     0.63     0.05
  72     0.76     0.39     0.38     0.78    -1.34     0.53    -0.05
  73     0.25     0.49     0.50    -0.98     0.23     0.63    -0.24
  74     0.61     0.55    -1.33    -1.70     0.15     0.56     0.33
  75     0.25     0.51    -1.11    -0.93     0.23     0.63     0.24
  76     0.76     0.26     1.62     2.49     1.92     0.53    -0.36
  77     0.61     0.46    -0.26    -0.36     0.15     0.56     0.09
  78     0.86     0.56    -1.65    -1.69    -0.34     0.51     0.42
  79     0.56     0.46    -0.54    -0.26     0.52     0.57     0.17
  80     0.71     0.42     0.02     0.33    -0.13     0.54     0.04
  81     0.91     0.37     1.65     0.74    -0.79     0.50    -0.59
  82     0.76     0.43     0.09     0.03    -1.34     0.53    -0.01
  83     0.81     0.45     0.58    -0.34     0.04     0.52    -0.28
  84     0.91     0.35     0.93     1.26    -0.08     0.50    -0.21
  85     1.11     0.50    -0.78    -0.92    -0.73     0.46     0.19
  86     0.91     0.30     2.09     1.49    -0.08     0.50    -0.64
  87     0.81     0.57    -1.58    -2.02     0.04     0.52     0.38
  88     1.06     0.48    -0.40    -0.57    -0.32     0.47     0.10
  89     0.81     0.48    -0.91    -0.33    -0.10     0.52     0.28
  90     0.86     0.29     2.13     1.70     0.26     0.51    -0.67
  91     0.56     0.50    -0.92    -0.83    -0.17     0.57     0.25
  92     0.76     0.47    -0.62    -0.44    -1.34     0.53     0.19
  93     0.86     0.47    -0.56    -0.46     0.97     0.51     0.17
  94     1.01     0.43     0.32     0.03    -0.78     0.48    -0.10
  95     0.86     0.36     0.95     0.99    -0.48     0.51    -0.24
  96     0.51     0.46    -0.31    -0.33    -0.53     0.58     0.11
  97     0.86     0.40     0.21     0.63     0.26     0.51    -0.00
  98     1.01     0.45    -0.38     0.04    -0.09     0.48     0.14
  99     0.96     0.26     1.73     2.40     2.40     0.49    -0.42
 100     0.71     0.48    -0.69    -0.43    -0.72     0.54     0.21

Mean    -0.00     0.40     0.00     0.02     0.08     0.67
S.D.     0.65     0.08     0.84     0.90     0.90
Groups                                          2

Note: Raw score mean = 66.82 with a S.D. of 18.92. Mean person ability = 0.91 with a S.D. of 1.08. Test reliability (K.R. 20) = 0.95. Reliability of person separation = 0.94.


Experiment 46: Summary of Item Fit Information for a Uniformly Distributed Item Difficulty Distribution With 25 Items, 25 Persons, and a 50% Chance of Guessing Correctly

Item    Logit    Point    Unwt.      Wt.  Ability     Mean    Logit
   #     item     bis.    total    total  between     item Residual
        diff.    corr.      fit      fit      fit    score    Index

   1    -0.69     0.52    -0.16     0.29     0.75     0.80     0.05
   2    -1.54     0.57    -0.03    -0.45     0.07     0.88     0.08
   3    -0.35     0.38     1.84     0.62    -0.74     0.76    -0.81
   4    -1.07     0.64    -0.20    -0.79    -0.25     0.84     0.11
   5    -0.69     0.54     1.89    -0.70    -1.17     0.80    -0.87
   6    -1.07     0.34     0.99     0.62     1.75     0.84    -0.14
   7    -0.35     0.78    -1.24    -1.66     1.09     0.76     0.28
   8    -1.54     0.44     0.48     0.13     0.55     0.88     0.01
   9     0.22     0.32     1.33     1.29     0.50     0.68    -0.62
  10     0.22     0.24     1.41     1.80     1.78     0.68    -0.57
  11     0.47     0.48     0.42     0.44     1.07     0.64    -0.23
  12    -0.05     0.50     0.03     0.34     1.16     0.72     0.09
  13     0.22     0.56     0.17    -0.22    -1.53     0.68     0.04
  14     0.71     0.48     0.01     0.64     0.50     0.60     0.14
  15     0.71     0.34     0.78     1.59     0.50     0.60    -0.23
  16    -1.07     0.53    -0.16     0.16     0.40     0.84     0.06
  17     0.71     0.61    -0.71    -0.47    -1.35     0.60     0.35
  18     0.22     0.52     1.18    -0.14    -1.53     0.68    -0.50
  19    -0.35     0.72    -0.93    -1.10     1.09     0.76     0.25
  20     0.22     0.50    -0.02     0.50    -1.53     0.68     0.09
  21     1.38     0.58    -0.52    -0.50    -0.15     0.48     0.32
  22     1.38     0.52     0.36    -0.16    -0.15     0.48    -0.12
  23     0.47     0.64    -0.80    -0.67    -0.39     0.64     0.33
  24     0.47     0.60    -0.64    -0.33    -0.39     0.64     0.30
  25     1.38     0.65    -0.93    -1.24    -0.15     0.48     0.45

Mean     0.00     0.52     0.18    -0.00     0.08     0.70
S.D.     0.86     0.13     0.88     0.85     1.00
Groups                                          2

Note: Raw score mean = 17.44 and a S.D. of 5.97. Mean person ability = 1.20 with a S.D. of 1.52. Test reliability (K.R. 20) = 0.91. Reliability of person separation = 0.90.



Experiment 47: Summary of Item Fit Information for a Uniformly Distributed Item Difficulty Distribution With 50 Items, 25 Persons, and a 50% Chance of Guessing Correctly

Item    Logit    Point    Unwt.      Wt.  Ability     Mean    Logit
   #     item     bis.    total    total  between     item Residual
        diff.    corr.      fit      fit      fit    score    Index

   1    -1.24     0.49    -0.35    -0.07     0.13     0.88     0.08
   2    -0.50     0.53    -0.56    -0.13     0.70     0.80     0.13
   3    -1.24     0.72    -1.02    -1.17     0.13     0.88     0.12
   4    -0.83     0.54    -0.23    -0.48    -0.34     0.84     0.09
   5    -0.50     0.63    -0.58    -0.84    -1.14     0.80     0.16
   6    -0.20     0.67    -0.96    -1.03     0.97     0.76     0.25
   7    -0.50     0.14     2.35     0.79     1.00     0.80    -1.04
   8     0.06     0.09     1.73     1.63    -0.23     0.72    -0.73
   9    -0.83     0.52    -0.25    -0.31     0.41     0.84     0.10
  10    -1.77    -0.20     1.80     0.90     3.41     0.92    -0.31
  11    -0.83     0.12     0.72     1.13    -0.34     0.84    -0.06
  12    -0.50     0.59    -0.19    -0.74    -1.14     0.80     0.09
  13    -0.83     0.26     0.70     0.55     1.63     0.84    -0.07
  14    -0.50     0.64    -0.68    -0.87    -1.14     0.80     0.17
  15    -0.50     0.45     0.36    -0.11    -1.14     0.80    -0.02
  16    -0.20     0.66    -0.93    -0.98     0.97     0.76     0.25
  17    -1.77     0.40     0.01     0.01    -0.18     0.92     0.05
  18     0.31     0.48    -0.24     0.02     0.24     0.68     0.09
  19     0.31     0.70    -1.13    -1.54     0.24     0.68     0.38
  20    -0.20     0.16     0.85     1.33     1.84     0.76    -0.21
  21     0.31     0.50    -0.07    -0.18    -0.99     0.68     0.12
  22     0.31     0.48    -0.23     0.01    -0.99     0.68     0.17
  23    -0.50     0.20     0.87     0.93     2.44     0.80    -0.13
  24    -0.20     0.40     0.83     0.02     0.39     0.76    -0.21
  25     0.06     0.33     0.47     0.79    -0.26     0.72    -0.14
  26     0.31     0.63    -0.88    -0.96     0.24     0.68     0.33
  27    -0.83     0.48    -0.43     0.10     0.41     0.84     0.10
  28     0.96    -0.01     2.75     2.71     1.49     0.56    -1.74
  29     0.53     0.51    -0.46    -0.08    -0.94     0.64     0.24
  30    -0.50     0.42    -0.17     0.38     0.70     0.80     0.07
  31     0.31     0.62    -0.97    -0.82     1.55     0.68     0.34
  32     0.31     0.45    -0.11     0.26    -0.99     0.68     0.12
  33     0.75     0.19     1.27     1.67     0.89     0.60    -0.61
  34     0.75     0.45    -0.04     0.18    -0.54     0.60     0.13
  35     0.06     0.51    -0.22    -0.14    -0.26     0.72     0.13
  36     0.31     0.28     0.83     0.97     0.73     0.68    -0.26
  37     0.06     0.35     0.91     0.41     1.28     0.72    -0.24
  38    -0.50     0.55    -0.60    -0.20     0.70     0.80     0.15
  39     0.53     0.77    -1.83    -2.13     1.86     0.64     0.61
  40     0.75     0.41     0.06     0.48    -0.54     0.60     0.07
  41     0.31     0.30     0.93     0.83     0.24     0.68    -0.38
  42     0.96     0.36     0.50     0.77    -1.47     0.56    -0.15
  43     0.75     0.53    -0.59    -0.27    -0.21     0.60     0.32
  44     0.31     0.69    -1.24    -1.30     0.24     0.68     0.41
  45     0.75     0.34     0.91     0.76     0.89     0.60    -0.34
  46     1.16     0.61    -1.10    -1.21    -0.43     0.52     0.55
  47     1.56     0.26     1.61     1.05    -0.39     0.44    -1.03
  48     0.96     0.66    -1.31    -1.48     1.55     0.56     0.59
  49     0.75     0.40     0.20     0.57     0.89     0.60     0.04
  50     0.96     0.63    -1.17    -1.24     0.38     0.56     0.54

Mean    -0.00     0.44     0.04     0.02     0.30     0.72
S.D.     0.75     0.20     0.99     0.96     1.03


97.



E x p e r i m e n t dR- Summary of Ttem Fit Information for a Uniformly Distributed Item Difficulty Distribution With 100 Items. 25 Persons- and a 50% Chance of Guessing Correctly

Item #

Logit item diff.

Point, bis. corr.

Unwt. total fit

Wt. total fit

Ability between

fit

Mean item score

Logit Residual

Index

1 -2.39 0.26 -0.01 0.27 -0.20 0.96 0.04 2 -1.14 0.19 0.63 0.20 -0.67 0.88 -0.08 3 -0.21 0.33 -0.17 0.19 0.18 0.76 0.09 4 -3.13 -9.99 -0.41 -1.04 0.05 1.00 -0.00 5 -0.78 0.58 -0.85 -0.64 0.92 0.84 0.15 6 -1.14 0.41 -0.34 -0.13 0.59 0.88 0.08 7 -0.78 0.23 0.29 0.25 0.65 0.84 -0.00 8 -3.13 -9.99 -0.41 -1.04 0.05 1.00 -0.00 9 -1.63 0.54 -0.72 -0.27 0.24 0.92 0.08

  10    -0.48     0.54    -0.66    -0.65    -0.38     0.80     0.17
  11    -2.39     0.40    -0.30     0.12    -0.20     0.96     0.04
  12    -0.48     0.27    -0.04     0.37    -0.38     0.80     0.06
  13    -1.14     0.59    -0.92    -0.54     0.59     0.88     0.12
  14    -0.21     0.21     0.15     0.66    -0.85     0.76     0.01
  15     0.02     0.47    -0.51    -0.41     0.21     0.72     0.22
  16    -0.48     0.55    -0.55    -0.76    -0.38     0.80     0.14
  17    -0.78     0.30    -0.10     0.21    -1.18     0.84     0.07
  18    -0.21     0.05     0.90     1.06     0.18     0.76    -0.24
  19    -1.63     0.16     0.32     0.30     0.28     0.92     0.02
  20    -0.48     0.57    -0.84    -0.69     1.23     0.80     0.19
  21     0.02     0.36    -0.15     0.10     0.21     0.72     0.11
  22    -0.48     0.23     0.06     0.48    -0.38     0.80     0.05
  23    -0.21     0.38    -0.27    -0.01    -0.85     0.76     0.12
  24    -1.14     0.15     0.97     0.22    -0.67     0.88    -0.19
  25    -0.48     0.50    -0.67    -0.41     1.23     0.80     0.17
  26    -0.21     0.49    -0.57    -0.48     0.18     0.76     0.19
  27    -0.48     0.09     0.82     0.68    -0.06     0.80    -0.15
  28    -1.14     0.27    -0.02     0.19     0.59     0.88     0.06
  29    -0.48     0.33    -0.09     0.09    -0.06     0.80     0.08
  30    -0.48     0.49    -0.61    -0.37     1.23     0.80     0.16
  31     0.02     0.52    -0.78    -0.56     0.65     0.72     0.26
  32    -0.78     0.31     0.25    -0.01     0.65     0.84    -0.00
  33    -0.21    -0.20     2.32     1.51     0.83     0.76    -0.95
  34    -0.48     0.18     0.67     0.38     1.45     0.80    -0.11
  35    -0.21     0.22     0.70     0.34     0.83     0.76    -0.16
  36    -0.78    -0.07     1.28     0.85     2.11     0.84    -0.26
  37     0.45     0.11     1.01     1.33     1.50     0.64    -0.37
  38    -1.14     0.36    -0.26     0.02     0.59     0.88     0.07
  39    -0.78     0.52    -0.54    -0.51    -1.18     0.84     0.12
  40    -0.21     0.61    -0.91    -1.04     0.18     0.76     0.25
  41    -0.21     0.31    -0.04     0.23     0.18     0.76     0.06
  42     0.24     0.31     0.87     0.06    -0.48     0.68    -0.38
  43     0.02     0.41    -0.17    -0.21    -1.04     0.72     0.12
  44    -0.21     0.49    -0.66    -0.43     1.53     0.76     0.20
  45    -0.21     0.59    -0.84    -1.00     0.18     0.76     0.24
  46    -0.78     0.44    -0.36    -0.25    -1.18     0.84     0.10
  47     0.45     0.61    -1.26    -1.34    -1.58     0.64     0.54
  48    -0.21     0.14     0.34     0.89    -0.85     0.76    -0.03
  49     0.24     0.36     0.93    -0.28    -0.27     0.68    -0.48
  50     0.45     0.46    -0.60    -0.37     1.49     0.64     0.26
  51    -0.21     0.24     0.27     0.43     0.83     0.76    -0.01
  52     0.45     0.33     0.00     0.35     0.34     0.64     0.06
  53     1.02     0.35    -0.05     0.37     0.95     0.52     0.14
  54     0.24     0.21     0.81     0.64    -0.27     0.68    -0.27
  55     0.02     0.18     0.62     0.70     0.21     0.72    -0.14
  56     0.45     0.52    -0.78    -0.82     1.49     0.64     0.35
  57     0.02     0.34    -0.20     0.23    -1.04     0.72     0.11
  58     0.84     0.35    -0.01     0.33     0.38     0.56     0.10
  59    -1.14     0.46    -0.41    -0.28     0.59     0.88     0.08
  60    -0.21     0.28     0.60     0.02     0.83     0.76    -0.12
  61     0.45     0.30     0.21     0.46    -1.58     0.64    -0.05
  62     0.65     0.64    -1.52    -1.63     1.89     0.60     0.70
  63     0.45     0.23     0.41     0.82     0.34     0.64    -0.10
  64     0.84     0.52    -0.94    -0.89     0.19     0.56     0.53
  65    -0.21     0.35    -0.00    -0.02    -0.85     0.76     0.06
  66     0.24     0.45    -0.56    -0.25     1.08     0.68     0.22
  67     1.21    -0.06     2.35     2.47     2.40     0.48    -1.52
  68    -0.48     0.45    -0.19    -0.39    -0.38     0.80     0.08
  69    -0.21     0.50    -0.65    -0.52     1.53     0.76     0.20
  70     0.65     0.50    -0.74    -0.73    -0.51     0.60     0.39
  71     0.45     0.35    -0.14     0.24    -1.58     0.64     0.10
  72     0.65     0.04     1.28     1.94     0.96     0.60    -0.67
  73     0.84     0.45    -0.34    -0.46     0.19     0.56     0.21
  74     0.65     0.26     0.95     0.49    -0.34     0.60    -0.53
  75     0.24     0.55    -0.85    -0.90    -0.48     0.68     0.35
  76     0.65    -0.09     2.30     2.36     1.99     0.60    -1.40
  77     0.45     0.27     0.33     0.57     0.34     0.64    -0.09
  78     0.84     0.21     0.83     1.06    -1.31     0.56    -0.45
  79     0.24     0.53    -0.81    -0.77    -0.48     0.68     0.33
  80     0.24    -0.05     1.91     1.62     0.93     0.68    -0.88
  81     0.65     0.62    -1.43    -1.48     0.85     0.60     0.67
  82     1.21     0.31     0.66     0.32     0.34     0.48    -0.42
  83     0.45     0.28     1.21     0.21    -1.58     0.64    -0.64
  84     0.02     0.44    -0.43    -0.29    -1.04     0.72     0.19
  85     0.65     0.31     0.89     0.16    -0.34     0.60    -0.50
  86     0.84     0.54    -1.04    -0.96    -1.31     0.56     0.58
  87     0.24     0.53    -0.72    -0.80     1.08     0.68     0.29
  88     0.84     0.32     0.37     0.38     0.19     0.56    -0.21
  89     1.21     0.50    -0.68    -0.82    -1.39     0.48     0.41
  90     1.40     0.30     0.80     0.32    -0.43     0.44    -0.51
  91     1.02     0.18     1.07     1.25     0.95     0.52    -0.59
  92     0.24     0.38    -0.25     0.05    -0.27     0.68     0.15
  93     0.84     0.18     0.97     1.19     0.38     0.56    -0.48
  94     1.02     0.65    -1.75    -1.88     1.83     0.52     0.91
  95     1.02     0.17     1.38     1.19    -0.56     0.52    -0.86
  96     0.45    -0.03     2.20     1.66     1.50     0.64    -1.14
  97     0.24     0.27     0.20     0.55     0.93     0.68    -0.01
  98     1.40     0.42    -0.17    -0.29    -0.43     0.44     0.14
  99     0.84     0.32     0.36     0.33    -1.31     0.56    -0.14
 100     0.84     0.32     0.20     0.45    -1.31     0.56    -0.03

Mean    -0.06     0.34     0.05     0.07     0.15     0.71
S.D.     0.88     0.19     0.84     0.80     0.94
Groups                                          2

Note: Raw score mean = 71.28 with a S.D. of 15.00. Mean person ability = 1.12 with a S.D. of 0.89. Test reliability (K.R. 20) = 0.93. Reliability of person separation = 0.91.






Item    Logit    Point    Unwt.      Wt.  Ability     Mean    Logit
   #     item     bis.    total    total  between     item Residual
        diff.    corr.      fit      fit      fit    score    Index

   1    -1.51     0.39    -0.25     0.05     0.56     0.88     0.07
   2    -0.48     0.45    -0.23    -0.05    -0.93     0.76     0.10
   3    -0.35     0.55    -0.53    -0.97    -0.47     0.74     0.14
   4    -0.48     0.55    -0.94    -0.78     0.52     0.76     0.19
   5    -0.62     0.40     1.44    -0.05     0.26     0.78    -0.49
   6    -0.10     0.59    -1.20    -1.07     1.23     0.70     0.27
   7    -0.77     0.26     0.34     1.14    -0.03     0.80    -0.00
   8    -0.35     0.25     0.73     1.50     0.74     0.74    -0.16
   9     0.14     0.61    -1.42    -1.45     0.82     0.66     0.37
  10    -0.10     0.33     1.01     0.77     1.03     0.70    -0.19
  11     0.02     0.33     0.48     1.18     1.58     0.68    -0.03
  12    -0.35     0.56    -1.01    -0.82     1.74     0.74     0.22
  13     0.02     0.38     0.42     0.72    -0.47     0.68    -0.05
  14     0.47     0.55    -0.88    -0.73     0.81     0.60     0.31
  15     0.02     0.46    -0.04    -0.11    -0.47     0.68    -0.02
  16     0.36     0.42     0.91     0.35    -0.58     0.62    -0.31
  17     0.25     0.57    -1.01    -0.97    -1.26     0.64     0.30
  18    -0.10     0.49    -0.32    -0.32    -0.03     0.70     0.13
  19     0.02     0.51     0.02    -0.65    -0.67     0.68     0.03
  20     0.02     0.43    -0.26     0.43    -0.47     0.68     0.12
  21     0.89     0.43     0.13     0.66     0.40     0.52     0.03
  22     0.57     0.43     0.67     0.45    -0.32     0.58    -0.23
  23     0.57     0.25     2.39     1.83     0.69     0.58    -0.98
  24     0.57     0.45     0.07     0.33     0.69     0.58     0.06
  25     1.30     0.51    -0.47    -0.32     0.41     0.44     0.21

Mean     0.00     0.45     0.00     0.05     0.23     0.68
S.D.     0.58     0.11     0.89     0.87     0.79
Groups                                          2

Note: Raw score mean = 16.92 with a S.D. of 5.27. Mean person ability = 0.99 with a S.D. of 1.25. Test reliability (K.R. 20) = 0.84. Reliability of person separation = 0.80.
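The K.R. 20 reliability reported in the table notes is conventionally obtained from the scored response matrix as (k / (k - 1)) * (1 - sum of p*q over items / variance of raw scores), where k is the number of items, p is the proportion of persons answering an item correctly, and q = 1 - p. The following is a minimal sketch of that calculation, not the program used in this study. The function name `kr20` and the toy response matrix are hypothetical, and the use of the population (divide by N) variance of raw scores is an assumption; software that uses the sample variance will give slightly different values.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson formula 20 for a persons-by-items matrix of 0/1 scores."""
    n_persons = len(responses)
    n_items = len(responses[0])
    # Variance of the persons' raw (total) scores; the population form is an assumption.
    total_variance = pvariance([sum(person) for person in responses])
    # Sum over items of p * (1 - p), where p is the proportion answering correctly.
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(person[j] for person in responses) / n_persons
        pq_sum += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - pq_sum / total_variance)


# Hypothetical toy data: 4 persons by 3 items (not taken from the study).
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(kr20(toy), 2))  # prints 0.75
```

For the toy matrix the raw scores are 3, 2, 1, and 0 (variance 1.25) and the item p*q values sum to 0.625, so the coefficient is 1.5 * (1 - 0.5) = 0.75.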






Item    Logit    Point    Unwt.      Wt.  Ability     Mean    Logit
   #     item     bis.    total    total  between     item Residual
        diff.    corr.      fit      fit      fit    score    Index

   1    -1.37     0.24     0.22     0.28     0.75     0.88     0.05
   2    -0.82     0.13     0.64     1.39     0.89     0.82    -0.03
   3    -1.17     0.46    -0.54    -0.75     0.82     0.86     0.10
   4    -1.17     0.08     1.49     0.89     1.68     0.86    -0.33
   5    -0.82     0.24     0.79     0.30     1.99     0.82    -0.08
   6    -0.53     0.37     0.29    -0.08    -1.24     0.78    -0.02
   7    -0.99     0.30     0.24     0.24    -0.53     0.84     0.02
   8    -1.87     0.22     0.22     0.26    -0.18     0.92     0.04
   9    -0.67     0.33     0.08     0.27     0.11     0.80     0.04
  10    -0.82     0.49    -0.67    -0.98     1.14     0.82     0.14
  11    -0.53     0.50    -0.67    -0.93     0.38     0.78     0.17
  12    -0.53     0.28     0.23     0.75     0.13     0.78     0.04
  13    -0.02     0.53    -0.59    -1.33    -0.95     0.70     0.22
  14    -0.26     0.10     1.39     2.04     3.16     0.74    -0.31
  15    -0.39     0.37     0.18     0.02    -0.70     0.76     0.03
  16    -0.26     0.55    -0.68    -1.61     0.87     0.74     0.19
  17    -0.26     0.15     0.86     1.96     2.39     0.74    -0.13
  18    -0.26     0.30     0.11     1.03    -0.75     0.74     0.05
  19    -0.39     0.44     0.15    -0.72    -0.70     0.76     0.00
  20    -0.14     0.48    -0.35    -0.63     1.10     0.72     0.13
  21    -0.02     0.39    -0.01     0.29     0.86     0.70     0.08
  22    -0.02     0.30     0.83     0.85    -0.23     0.70    -0.20
  23    -0.26     0.53    -0.76    -1.24     0.87     0.74     0.21
  24    -0.53     0.48    -0.38    -0.96    -1.24     0.78     0.14
  25     0.21     0.48    -0.52    -0.31     0.94     0.66     0.19
  26    -0.26     0.42    -0.29    -0.06    -0.75     0.74     0.13
  27    -0.26     0.29     0.67     0.74     0.55     0.74    -0.10
  28    -0.02     0.32     2.11     0.48    -0.95     0.70    -1.15
  29     0.54     0.52    -0.62    -0.63     0.03     0.60     0.26
  30     0.75     0.45     0.01     0.15     0.06     0.56     0.07
  31     0.32     0.59    -1.03    -1.66     0.31     0.64     0.35
  32     0.21     0.44    -0.21     0.06    -1.36     0.66     0.14
  33     1.27     0.38     0.64     1.17    -0.39     0.46    -0.15
  34     0.32     0.45    -0.10    -0.15    -0.93     0.64     0.09
  35    -0.14     0.52    -0.61    -1.10     0.05     0.72     0.21
  36     0.96     0.59    -1.09    -1.34     0.70     0.52     0.42
  37     0.43     0.31     0.59     1.45     0.33     0.62    -0.15
  38     0.54     0.64    -1.37    -2.37     1.76     0.60     0.45
  39     1.06     0.58    -0.98    -1.21     0.23     0.50     0.38
  40     0.54     0.51    -0.50    -0.70     0.03     0.60     0.21
  41     0.54     0.39     0.19     0.76    -0.09     0.60     0.01
  42     0.64     0.50     1.10    -0.87     0.42     0.58    -0.65
  43     0.64     0.45    -0.04     0.04     0.42     0.58     0.09
  44     0.64     0.50    -0.35    -0.56    -0.68     0.58     0.16
  45     0.54     0.26     0.93     2.06     2.39     0.60    -0.27
  46     1.06     0.46     0.07     0.17     0.23     0.50     0.04
  47     0.96     0.46     0.08     0.16     0.14     0.52     0.02
  48     0.96     0.36     0.85     1.20    -1.11     0.52    -0.28
  49     0.64     0.46    -0.23     0.08    -0.68     0.58     0.17
  50     1.06     0.53    -0.51    -0.53    -0.96     0.50     0.25

Mean     0.00     0.40     0.04    -0.03     0.23     0.69
S.D.     0.72     0.13     0.72     1.00     1.02
Groups

Note: Raw score mean = 34.30 with a S.D. of 9.80. Mean person ability = 1.13 with a S.D. of 1.33. Test reliability (K.R. 20) = 0.91. Reliability of person separation = 0.91.



Experiment 51. Summary of Item Fit Information for a Uniformly Distributed Item Difficulty Distribution With 100 Items, 50 Persons, and a 50% Chance of Guessing Correctly

Item    Logit    Point    Unwt.      Wt.  Ability     Mean    Logit
   #     item     bis.    total    total  between     item Residual
        diff.    corr.      fit      fit      fit    score    Index

   1    -0.47     0.37     0.14     0.22     0.13     0.80     0.01
   2    -1.42     0.44    -0.64    -0.33     0.46     0.90     0.08
   3    -0.98     0.46    -0.46    -0.40    -0.97     0.86     0.07
   4    -0.47     0.31     0.19     0.58    -1.14     0.80     0.03
   5    -1.42     0.31     0.04     0.12    -0.71     0.90     0.04
   6    -0.98     0.33    -0.14     0.28     0.82     0.86     0.06
   7    -0.47     0.19     1.45     0.75     1.60     0.80    -0.24
   8    -1.18     0.43    -0.53    -0.26     0.64     0.88     0.08
   9    -1.69     0.24     0.33     0.20    -0.16     0.92     0.01
  10    -0.47     0.20     1.21     0.92     1.60     0.80    -0.21
  11    -0.98     0.54    -1.15    -0.62     0.82     0.86     0.12
  12    -0.63     0.18     0.99     0.90     0.87     0.82    -0.11
  13    -0.63     0.25     1.21     0.45     0.87     0.82    -0.20
  14    -0.98     0.36    -0.24     0.16     0.82     0.86     0.07
  15    -1.18     0.45    -0.83    -0.17     0.64     0.88     0.09
  16    -0.79     0.37    -0.29     0.16    -0.52     0.84     0.07
  17    -0.19     0.44    -0.26    -0.07     0.66     0.76     0.06
  18    -0.47     0.27     1.31     0.52    -1.14     0.80    -0.31
  19     0.07     0.53    -0.58    -0.70     1.14     0.72     0.13
  20    -0.47     0.64    -1.47    -1.49     1.32     0.80     0.20
  21    -0.47     0.56    -1.08    -0.82     1.32     0.80     0.17
  22    -0.98     0.42    -0.18    -0.35     0.32     0.86     0.06
  23    -0.47     0.38    -0.29     0.32    -1.14     0.80     0.07
  24    -0.79     0.44    -0.34    -0.34    -0.52     0.84     0.08
  25    -0.79     0.43    -0.42    -0.16    -0.52     0.84     0.08
  26    -2.03     0.21     0.14     0.33     0.05     0.94     0.03
  27    -0.47     0.51    -0.88    -0.53     0.13     0.80     0.15
  28    -0.47     0.20     1.51     0.87    -1.14     0.80    -0.32
  29    -0.19     0.44    -0.03    -0.16    -0.64     0.76     0.03
  30    -0.79     0.48    -0.86    -0.32     0.99     0.84     0.12
  31    -0.19     0.42     0.23    -0.16    -0.34     0.76    -0.00
  32    -0.06     0.56    -1.08    -0.82     1.83     0.74     0.23
  33    -0.19     0.43    -0.10    -0.08    -0.34     0.76     0.05
  34    -0.33     0.42     0.90    -0.37    -1.17     0.78    -0.23
  35    -0.33     0.40     0.87    -0.13    -1.17     0.78    -0.24
  36     0.30     0.35     0.97     0.59     0.42     0.68    -0.29
  37     0.19     0.37     0.65     0.38     0.79     0.70    -0.11
  38    -0.19     0.41     0.87    -0.19     0.87     0.76    -0.22
  39    -0.79     0.27     0.23     0.58    -0.12     0.84     0.02
  40     0.07     0.27     1.55     0.81     1.14     0.72    -0.39
  41     0.30     0.61    -1.27    -1.35     1.59     0.68     0.28
  42    -0.79     0.27     0.15     0.66    -0.52     0.84     0.02
  43    -0.33     0.31     0.31     0.68     0.09     0.78    -0.02
  44    -0.47     0.35     0.15     0.30     0.48     0.80     0.01
  45     0.19     0.52    -0.61    -0.63    -0.85     0.70     0.17
  46    -0.06     0.19     1.59     1.36     0.50     0.74    -0.40
  47    -0.19     0.40     0.17     0.10    -0.34     0.76     0.01
  48     0.41     0.60    -1.43    -1.17     1.82     0.66     0.35
  49     0.30     0.32     1.17     0.68     1.34     0.68    -0.26
  50     0.19     0.49    -0.17    -0.47     0.42     0.70     0.01
  51    -0.06     0.41    -0.01     0.14    -0.83     0.74     0.05
  52     0.63     0.46     0.22    -0.21    -0.32     0.62    -0.03
  53     0.41     0.39     0.87     0.24     0.02     0.66    -0.24
  54     0.07     0.51    -0.87    -0.37     0.10     0.72     0.20
  55     0.30     0.20     1.41     1.72     2.14     0.68    -0.39
  56     0.07     0.32     0.23     0.96    -1.52     0.72    -0.03
  57    -0.06     0.18     1.41     1.38     2.35     0.74    -0.29
  58     0.41     0.44     0.57    -0.10    -1.54     0.66    -0.20
  59    -0.06     0.33     0.48     0.60     0.50     0.74    -0.05
  60     0.07     0.64    -1.49    -1.63     1.14     0.72     0.29
  61     0.52     0.60    -1.04    -1.40     0.37     0.64     0.31
  62     0.07     0.43     0.16    -0.06     0.10     0.72    -0.05
  63    -0.33     0.31     0.24     0.71    -1.17     0.78    -0.01
  64     0.52     0.51    -0.59    -0.47    -0.43     0.64     0.21
  65     0.63     0.30     0.95     1.27     1.14     0.62    -0.30
  66     0.30     0.37     0.21     0.62     0.42     0.68    -0.03
  67     0.74     0.45    -0.26     0.09    -1.22     0.60     0.13
  68     0.07     0.35     0.21     0.65     0.11     0.72    -0.04
  69    -0.19     0.10     1.67     1.71     0.87     0.76    -0.33
  70     0.63     0.63    -1.69    -1.59     0.70     0.62     0.48
  71     0.52     0.50    -0.47    -0.40     1.27     0.64     0.13
  72     0.63     0.52    -0.80    -0.51     0.24     0.62     0.27
  73     0.07     0.46     0.11    -0.39     0.11     0.72     0.02
  74     0.52     0.55    -0.97    -0.85    -0.81     0.64     0.30
  75     0.30     0.39     0.42     0.40    -0.38     0.68    -0.11
  76     0.41     0.40     0.41     0.37     0.02     0.66    -0.07
  77     0.30     0.63    -1.55    -1.44     0.71     0.68     0.36
  78     0.74     0.58    -0.71    -1.36     1.02     0.60     0.19
  79     0.07     0.62    -1.36    -1.37     0.10     0.72     0.28
  80     0.63     0.48    -0.34    -0.20     0.70     0.62     0.14
  81     0.52     0.54    -0.78    -0.79     0.37     0.64     0.24
  82     0.94     0.43     0.35     0.12    -1.51     0.56    -0.17
  83     0.52     0.56    -1.10    -0.90     1.27     0.64     0.32
  84     1.04     0.44    -0.15     0.20    -0.55     0.54     0.11
  85     1.14     0.62    -1.63    -1.73    -0.21     0.52     0.53
  86     0.52     0.49    -0.36    -0.39     0.37     0.64     0.10
  87     0.30     0.56    -1.01    -0.90     1.59     0.68     0.26
  88     0.41     0.39     0.25     0.41     1.00     0.66     0.01
  89     1.24     0.50    -0.51    -0.49    -0.87     0.50     0.21
  90     0.07     0.20     1.66     1.32     0.11     0.72    -0.45
  91     0.52     0.42    -0.09     0.40    -0.43     0.64     0.08
  92     0.84     0.43     0.04     0.25     0.39     0.58     0.02
  93     1.14     0.31     1.27     1.27     1.70     0.52    -0.46
  94     0.94     0.29     1.23     1.44     1.66     0.56    -0.44
  95     0.94     0.40     0.34     0.55    -1.51     0.56    -0.08
  96     1.85     0.50    -0.57    -0.61     1.31     0.38     0.19
  97     0.52     0.54    -0.92    -0.72     0.37     0.64     0.28
  98     1.14     0.30     1.30     1.37     0.04     0.52    -0.46
  99     0.84     0.39     0.33     0.59     0.39     0.58    -0.06
 100     0.84     0.23     1.72     1.82     1.24     0.58    -0.58

Mean    -0.00     0.41     0.02     0.03     0.26     0.72
S.D.     0.70     0.12     0.88     0.80     0.92
Groups                                          2


51.




Item    Logit    Point    Unwt.      Wt.  Ability     Mean    Logit
   #     item     bis.    total    total  between     item Residual
        diff.    corr.      fit      fit      fit    score    Index

   1    -0.69     0.36     0.21    -0.01     0.72     0.83    -0.00
   2    -0.78     0.31     0.33     0.30     0.02     0.84    -0.00
   3    -0.61     0.17     2.19     1.04     2.02     0.82    -0.36
   4    -0.61     0.38    -0.40     0.12    -0.67     0.82     0.06
   5    -1.28     0.29     0.64     0.07    -1.15     0.89    -0.05
   6    -0.32     0.31     0.95     0.42     2.37     0.78    -0.10
   7    -0.61     0.36    -0.33     0.22    -0.62     0.82     0.05
   8    -0.46     0.40     0.01    -0.22    -1.59     0.80     0.02
   9    -0.18     0.43    -0.71    -0.07     0.74     0.76     0.12
  10    -0.32     0.31     1.18     0.54    -0.65     0.78    -0.21
  11     0.19     0.23     1.41     1.58     2.78     0.70    -0.26
  12    -0.25     0.50    -0.58    -0.94    -1.22     0.77     0.08
  13    -0.32     0.33     0.45     0.41     1.05     0.78    -0.04
  14     0.24     0.43     0.09    -0.25     0.19     0.69     0.02
  15    -0.05     0.51    -1.25    -0.77     1.74     0.74     0.18
  16     0.01     0.58    -1.79    -1.52     1.89     0.73     0.25
  17     0.41     0.28     1.16     1.40     0.79     0.66    -0.22
  18     0.52     0.37     0.93     0.41     0.25     0.64    -0.23
  19     0.77     0.36     0.59     0.85     0.84     0.59    -0.08
  20     0.57     0.41     0.32     0.08     0.09     0.63    -0.06
  21     0.41     0.56    -1.63    -1.47     0.73     0.66     0.32
  22     0.52     0.54    -1.45    -1.42     1.15     0.64     0.31
  23     0.67     0.46    -0.09    -0.53     2.24     0.61    -0.04
  24     0.97     0.44    -0.27    -0.30     0.89     0.55     0.07
  25     1.21     0.48    -0.89    -0.88     0.47     0.50     0.25

Mean     0.00     0.39     0.04    -0.04     0.60     0.72
S.D.     0.62     0.10     1.00     0.83     1.17
Groups                                          2




Experiment 53. Summary of Item Fit Information for a Uniformly Distributed Item Difficulty Distribution With 50 Items, 100 Persons, and a 50% Chance of Guessing Correctly

Item    Logit    Point    Unwt.      Wt.  Ability     Mean    Logit
   #     item     bis.    total    total  between     item Residual
        diff.    corr.      fit      fit      fit    score    Index

   1    -0.90     0.29    -0.02     0.17    -0.82     0.83     0.04
   2    -1.25     0.24    -0.19     0.39    -0.40     0.87     0.05
   3    -1.07     0.24     0.30     0.34    -0.13     0.85    -0.00
   4    -0.99     0.26     0.68     0.11    -0.83     0.84    -0.07
   5    -0.90     0.32    -0.18    -0.04     0.30     0.83     0.04
   6    -0.90     0.40    -0.54    -0.65    -0.50     0.83     0.07
   7    -0.90     0.25     0.41     0.38    -0.82     0.83    -0.03
   8    -0.90     0.39    -0.67    -0.52    -0.50     0.83     0.09
   9    -1.16     0.33    -0.33    -0.16    -0.13     0.86     0.05
  10    -0.61     0.51    -1.49    -1.30     1.89     0.79     0.17
  11    -0.29     0.44    -0.92    -0.57    -0.07     0.74     0.16
  12    -0.54     0.39    -0.37    -0.38    -0.19     0.78     0.07
  13    -0.11     0.34     0.04     0.42    -1.18     0.71     0.02
  14    -0.99     0.33    -0.31    -0.03    -0.83     0.84     0.05
  15    -0.61     0.28     0.24     0.53    -1.01     0.79    -0.01
  16    -0.35     0.48    -1.23    -0.98     1.20     0.75     0.18
  17    -0.06     0.42    -0.45    -0.43    -0.86     0.70     0.11
  18    -0.41     0.37    -0.24    -0.08     0.29     0.76     0.05
  19    -0.06     0.51    -1.33    -1.30     0.13     0.70     0.24
  20    -0.23     0.31     0.19     0.65     0.18     0.73     0.00
  21     0.11     0.28     1.15     0.91     1.29     0.67    -0.20
  22    -0.29     0.52    -1.51    -1.40     1.97     0.74     0.22
  23     0.21     0.39    -0.15     0.09    -0.08     0.65     0.06
  24    -0.17     0.27     0.78     0.86     1.25     0.72    -0.10
  25     0.05     0.24     0.86     1.54     0.26     0.68    -0.15
  26     0.26     0.47    -1.14    -0.79     0.19     0.64     0.24
  27     0.21     0.45     0.04    -0.83     1.23     0.65    -0.08
  28    -0.11     0.28     0.54     0.91     0.33     0.71    -0.06
  29     0.26     0.44    -0.54    -0.46    -0.61     0.64     0.14
  30     0.21     0.45    -0.60    -0.67    -0.08     0.65     0.15
  31     0.32     0.42     0.25    -0.56     0.28     0.63    -0.06
  32     0.21     0.38    -0.26     0.29    -0.08     0.65     0.07
  33     0.42     0.39     0.05     0.06    -0.46     0.61     0.01
  34     0.26     0.30     0.91     0.97    -1.12     0.64    -0.16
  35     0.42     0.14     3.23     2.68     3.82     0.61    -0.99
  36     0.52     0.44    -0.37    -0.42    -0.62     0.59     0.11
  37     0.42     0.50    -1.41    -1.18     0.92     0.61     0.33
  38     0.52     0.41     0.46    -0.20     0.80     0.59    -0.19
  39     0.52     0.48    -0.08    -1.15     0.16     0.59    -0.04
  40     0.52     0.37     0.16     0.49    -0.62     0.59    -0.03
  41     0.81     0.27     1.39     1.85     1.76     0.53    -0.36
  42     0.66     0.45    -0.59    -0.39    -0.38     0.56     0.19
  43     0.91     0.38     0.11     0.58     1.27     0.51     0.02
  44     0.76     0.27     1.78     1.78     1.99     0.54    -0.53
  45     0.81     0.33     1.72     0.81     0.04     0.53    -0.57
  46     1.10     0.48    -0.81    -0.79    -1.11     0.47     0.23
  47     0.66     0.36     0.63     0.62    -0.38     0.56    -0.18
  48     0.91     0.27     1.55     1.91     1.78     0.51    -0.39
  49     1.05     0.47    -0.94    -0.64    -1.21     0.48     0.28
  50     0.71     0.49    -1.05    -1.01    -0.05     0.55     0.29

Mean     0.00     0.37    -0.01     0.05     0.17     0.68
S.D.     0.65     0.09     0.94     0.92     1.04


63.



Experiment 54. Summary of Item Fit Information for a Uniformly Distributed Item Difficulty Distribution With 100 Items, 100 Persons, and a 50% Chance of Guessing Correctly

Item    Logit    Point    Unwt.      Wt.  Ability     Mean    Logit
   #     item     bis.    total    total  between     item Residual
        diff.    corr.      fit      fit      fit    score    Index

   1    -1.21     0.20     0.51     0.59    -0.93     0.89    -0.02
   2    -0.55     0.31     0.63     0.14     0.67     0.82    -0.06
   3    -1.10     0.34     0.27    -0.13    -0.54     0.88    -0.02
   4    -1.46     0.20     0.58     0.48    -0.92     0.91    -0.04
   5    -0.90     0.48    -0.96    -0.74     0.04     0.86     0.09
   6    -0.72     0.45    -1.04    -0.45     1.34     0.84     0.11
   7    -0.99     0.42    -0.62    -0.46    -0.23     0.87     0.07
   8    -0.63     0.30     0.45     0.20     0.95     0.83    -0.02
   9    -0.72     0.48    -1.28    -0.57     1.34     0.84     0.12
  10    -0.63     0.37    -0.27    -0.07     0.06     0.83     0.06
  11    -0.48     0.49    -1.05    -0.79     0.27     0.81     0.12
  12    -0.80     0.42    -0.77    -0.38     1.18     0.85     0.09
  13    -0.55     0.42    -0.66    -0.31    -1.23     0.82     0.09
  14    -0.72     0.15     1.05     1.09     0.38     0.84    -0.10
  15    -1.33     0.27     0.49     0.34     0.23     0.90    -0.07
  16    -0.63     0.20     2.16     0.56     0.06     0.83    -0.36
  17    -0.40     0.28     1.08     0.33     0.90     0.80    -0.13
  18    -0.99     0.35    -0.63     0.11     0.84     0.87     0.08
  19    -1.33     0.07     1.20     0.95     1.32     0.90    -0.08
  20    -0.40     0.33     0.03     0.30    -0.43     0.80     0.03
  21    -0.55     0.43    -0.53    -0.46     0.02     0.82     0.07
  22    -0.40     0.45    -0.80    -0.58    -0.43     0.80     0.11
  23    -0.80     0.36    -0.11    -0.11    -0.94     0.85     0.03
  24    -0.90     0.44    -0.52    -0.56     0.04     0.86     0.06
  25    -0.33     0.42    -0.44    -0.34    -1.44     0.79     0.08
  26    -0.40     0.32     0.25     0.24     0.90     0.80     0.01
  27    -0.48     0.27     0.63     0.53     1.17     0.81    -0.08
  28    -0.40     0.29     0.33     0.57     0.90     0.80    -0.01
  29    -0.06     0.45    -0.50    -0.64    -0.94     0.75     0.09
  30    -0.33     0.36    -0.13     0.08    -1.44     0.79     0.05
  31     0.17     0.48    -0.92    -0.86    -0.51     0.71     0.18
  32    -0.26     0.33     0.01     0.39    -0.60     0.78     0.04
  33    -0.55     0.39    -0.11    -0.18     0.02     0.82     0.02
  34    -0.40     0.30     0.47     0.36     0.90     0.80    -0.03
  35    -0.72     0.26     0.27     0.60    -0.74     0.84    -0.02
  36    -0.63     0.30     0.78     0.21    -1.21     0.83    -0.10
  37    -0.33     0.37     0.15    -0.08    -0.24     0.79     0.01
  38    -0.19     0.56    -1.63    -1.46     1.74     0.77     0.20
  39     0.17     0.39    -0.26     0.11    -1.17     0.71     0.08
  40    -0.33     0.30     1.38     0.12     0.63     0.79    -0.21
  41    -0.40     0.29     0.81     0.41     0.08     0.80    -0.09
  42     0.06     0.38    -0.09    -0.02     1.73     0.73     0.06
  43    -0.13     0.42    -0.14    -0.40    -0.22     0.76     0.04
  44     0.12     0.46    -0.91    -0.60     0.77     0.72     0.15
  45     0.34     0.26     2.46     0.88    -1.42     0.68    -0.73
  46     0.29     0.48    -0.74    -0.89     0.79     0.69     0.14
  47     0.06     0.38    -0.27     0.13     1.21     0.73    -0.03
  48     0.12     0.40    -0.42    -0.01    -0.89     0.72     0.11
  49     0.00     0.39    -0.35    -0.01    -0.03     0.74     0.09
  50     0.12     0.36    -0.15     0.43    -0.89     0.72     0.05
  51     0.40     0.42    -0.37    -0.27    -1.04     0.67     0.09
  52    -0.19     0.39     0.07    -0.19     0.37     0.77     0.00
  53     0.29     0.34     0.80     0.34     0.10     0.69    -0.15
  54    -0.90     0.34    -0.28     0.12     0.04     0.86     0.05
  55     0.12     0.28     0.63     0.98     1.49     0.72    -0.07
  56    -0.06     0.43    -0.63    -0.43    -0.94     0.75     0.11
  57    -0.48     0.22     0.42     1.04     0.38     0.81    -0.04
  58     0.06     0.34     0.15     0.47     0.46     0.73     0.02
  59     0.45     0.34     0.15     0.79    -1.11     0.66     0.01
  60     0.17     0.41    -0.23    -0.15     0.99     0.71     0.05
  61     0.34     0.42     0.14    -0.49    -0.29     0.68     0.01
  62     0.12     0.40    -0.23    -0.11    -0.72     0.72     0.06
  63     0.23     0.38    -0.21     0.29     0.56     0.70     0.06
  64     0.23     0.39    -0.01     0.04    -0.48     0.70     0.03
  65     0.61     0.41     0.13    -0.32     0.95     0.63    -0.03
  66     0.50     0.28     0.94     1.26     0.92     0.65    -0.16
  67     0.29     0.42    -0.56    -0.19    -0.78     0.69     0.13
  68     0.40     0.38     0.82    -0.03    -0.66     0.67    -0.19
  69    -0.26     0.39    -0.41    -0.02    -0.60     0.78     0.08
  70     0.45     0.49    -1.21    -0.85     0.19     0.66     0.26
  71     0.45     0.43    -0.49    -0.24     0.19     0.66     0.12
  72     0.45     0.40    -0.40     0.07    -0.14     0.66     0.14
  73     0.55     0.41     0.27    -0.38     0.05     0.64    -0.05
  74     0.76     0.35     0.61     0.60    -0.65     0.60    -0.13
  75     0.66     0.35     0.37     0.53    -0.97     0.62    -0.05
  76     0.45     0.42    -0.08    -0.32    -0.62     0.66     0.03
  77     1.00     0.46    -0.36    -0.88     0.03     0.55     0.08
  78     0.34     0.39    -0.18     0.13     0.46     0.68     0.08
  79     0.55     0.34     0.42     0.69    -0.02     0.64    -0.04
  80     0.86     0.38    -0.00     0.38     0.19     0.58     0.04
  81     0.71     0.46    -0.76    -0.75    -0.54     0.61     0.21
  82     0.40     0.40    -0.12    -0.08    -0.66     0.67     0.06
  83     0.76     0.41    -0.17    -0.17    -0.19     0.60     0.06
  84     0.61     0.43    -0.73    -0.25    -0.42     0.63     0.18
  85     0.91     0.45    -0.76    -0.62    -0.73     0.57     0.20
  86     0.76     0.26     1.05     1.74     2.78     0.60    -0.20
  87     0.95     0.41    -0.25    -0.00    -1.33     0.56     0.07
  88     0.76     0.37     0.37     0.34    -0.65     0.60    -0.05
  89     0.76     0.36     2.10    -0.01    -0.19     0.60    -0.75
  90     0.40     0.33     0.51     0.66     0.17     0.67    -0.08
  91     0.61     0.35     0.43     0.61     0.37     0.63    -0.07
  92     0.55     0.45    -0.98    -0.44     0.05     0.64     0.23
  93     0.66     0.30     0.79     1.16     1.30     0.62    -0.16
  94     1.10     0.48    -1.13    -1.02     0.65     0.53     0.34
  95     0.76     0.36     1.67     0.27     0.14     0.60    -0.60
  96     0.61     0.45    -0.36    -0.68     1.51     0.63     0.09
  97     0.71     0.40    -0.01    -0.06    -0.54     0.61     0.05
  98     0.81     0.29     1.02     1.42     0.49     0.59    -0.21
  99     0.86     0.37     0.32     0.39    -1.25     0.58    -0.07
 100     1.19     0.49    -1.05    -1.24     1.21     0.51     0.30

Mean     0.00     0.37     0.02     0.03     0.04     0.73
S.D.     0.63     0.08     0.75     0.59     0.86
Groups                                          2


REFERENCES

Allen, M. J., & Yen, W. W. (1979). Introduction to measurement theory. Belmont, CA: Wadsworth.

Andersen, E. B. (1973). Goodness of fit test for the Rasch model. Psvchometrika, 38,123-140.

Andrich, D. (1988). Rasch models for measurement. Sage University paper series on Quantitative Applications in the Social Sciences (Series No. 07-068). Beverly Hills: Sage.

Bamdorff-Nielsen, O. (1978). Information and exponential families in statistical theory. New York: Wiley.

Crocker, L.,&Algina, J. (1986). Introduction to classical and mordern test theory. New York: Holt, Rinehart, and Winston.

Ferguson, G. A. (1981). Statistical analysis in psychology and education. New York: McGraw-Hill.

Glaser, R. (1949). A methodological analysis of the inconsistency of responses to test items. Educational and Psychological Measurement 9, 721-739.

Glaser, R. (1952). The reliability of inconsistency. Educational and Psychological Measurement. 11. 60-64.

Gustafsson, J-E. (1980). Testing and obtaining fit of data to the Rasch model. The British Journal of Mathematical and Statistical Psychology. 33. 205-233.

Habermann, S. (1977). Maximum likelihood estimates in exponential response models. The Annals of Statistics. 5. 815-841.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA.: Addison-Wesley.

Luppescu, S. (1992). SIMTEST 2.1 [Computer program for simulating test data]. Chicago: University of Chicago Press.

192

193

Mosier, C.I. (1941). Psychophysics and mental test theory: II. The constant process. Psychological Review. 47. 235-249.

Neter, J., Wasserman, W., & Whitmore, G. A. (1978). Applied statistics. Boston: Allyn and Bacon.

Rasch, G. (1961). On general laws and the meaning of measurement in psychology. Proceedings of the Fourth Berkeley Symposium on Mathematical statistics and probability. Berkeley: University of California Press, 4, 321-333.

Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests (Rev. ed.). Chicago: University of Chicago Press.

Smith, R. M. (1985). A comparison of Rasch person analysis and robust estimators. Educational and Psychological Measurement. 45. 433-444.

Smith, R. M. (1986). Person fit in the Rasch model. Educational and Psychological Measurement. 46. 359-372.

Smith, R. M. (1988a). A comparison of the power of Rasch total and between item fit statistics to detect measurement disturbances. A paper presented at the annual meeting of the American Educational Research Association, New Orleans.

Smith, R. M. (1988b). The distributional properties of Rasch standardized residuals. Educational and Psychological Measurement 48. 657-667.

Smith, R. M. (1991a). The distributional properties of Rasch item fit statistics. Educational and Psychological Measurement. 51. 541-565.

Smith, R.M. (1991b). TP ARM: Item and person analysis with the Rasch model. Chicago: MESA Press.

Smith, R. M., & Hedges, L.V. (1982). A comparison of the likelihood ratio % and Pearsonian %2 tests of fit in the Rasch model. Educational Research and Perspectives. 9. 44-54.

Stevens, S. S. (1946). On the theory of scales of measurement. Science. 103. 677-680.

Thorndike, R. L. (1949). Personnel selection: Test and measurement techniques. New York: Teachers College Press.

194

Thurstone, L. L., & Chave, E. J. (1929). Measurement of attitudes. Chicago:

University of Chicago Press.

Torgerson, W. S. (1958). Theory and methods of scaling. New York: Wiley.

Walker, H. M.; & Lev, J. (1953). Statistical inference. New York: Holt, Rinehart,

and Winston.

Wollenberg, A. L. van den. (1982). Two new test statistics for the Rasch model.

Psvchometrika. 47.123-140.

Wright, B. D. (1977). Solving measurement problems with the Rasch model. Tnnrnal of Educational Measurement. 14, 97-116.

Wright, B. D., & Douglas, G. A. (1977). Best procedures for sample-free item analysis. Annlied Psychological Measurement. 1. 281-294.

Wright, B. D., & Linacre, J. M. (1991). mnSTFPS: Rasch analysis for all two-facet

models. Chicago: MESA Press.

Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. Chicago: MESA

Press.

Wright, B. W., & Mead, R. J. (1978). RTC ATCalibrating items with the Rasch model (Research Memorandum 23A). Chicago: University of Chicago, MESA Statistical Laboratory, Department of Education.

Wright, B. D., & Panchapakesan, N. (1969). A procedure for sample-free item analysis. Educational and Psychological Measurement. 29. 23-48.

Wright, B. D., & Stone, M. (1979). Best test design. Chicago: MESA Press.
