RESEARCH REPORT April 2003 RR-03-12
Applying the Online Scoring Network (OSN) to Advanced Placement Program® (AP®) Tests
Research & Development Division Princeton, NJ 08541
Yuli (Lilly) Zhang Donald E. Powers Wendi Wright Rick Morgan
Research Reports provide preliminary and limited dissemination of ETS research prior to publication. They are available without charge from:
Research Publications Office Mail Stop 10-R Educational Testing Service Princeton, NJ 08541
Abstract
This project explored the feasibility of using the ETS Online Scoring Network (OSN) to score
selected Advanced Placement Program® (AP®) tests. In particular, the quality of scores obtained
from traditional large-scale readings was compared with the quality of scores obtained from
OSN scoring. (For the former, readers convene in a central location to evaluate student responses
in a paper-based format; for the latter, they work alone from remote locations to evaluate
responses that are transmitted via the Internet.) The study also obtained readers’ reactions to
OSN training and to OSN capabilities for monitoring reader performance online.
The results, based on more than 11,000 test takers who had taken either the AP English
Language and Composition exam or the AP Calculus AB exam, revealed little if any difference
between scores assigned at traditional central readings and those assigned when readers used
OSN. The lack of differences applied to (a) score level and variability, (b) interreader agreement,
and (c) passing rates on the two AP exams that were investigated. Study participants did,
however, suggest several areas in which improvements could be made to OSN procedures.
Key words: Advanced Placement Program, remote scoring, Online Scoring Network,
constructed responses, essay scoring
The College Board’s Advanced Placement Program® (AP®) provides high school
students with an opportunity to earn college credit for, or advanced placement into, college-level
courses while they are still in high school. Currently, more than 100,000 teachers lead AP
courses in 19 subject areas (for which there are 35 exams). In 2001, more than 1,400,000 AP
exams were administered worldwide (The Advanced Placement Program, n.d.).
In addition to their scope, the AP tests are also noteworthy with respect to their format.
Besides multiple-choice questions, all of the AP exams contain a free-response section (essays,
problems, or speaking tasks). The free-response questions are designed to assess examinees’
ability to organize their knowledge and to produce clear, coherent answers that demonstrate their
understanding of the discipline and of specific concepts. The free responses take the form of
essays or solutions to problems and programs and are currently scored in a paper-and-pencil
format by trained readers at central locations. Needless to say, considerable effort is involved in
evaluating examinee responses to AP questions.
The ETS Online Scoring Network (OSN) is a computer-based system that was designed
to accommodate the growing needs of large-scale testing programs to score substantial numbers
of free- or constructed-response test items. By capturing and routing examinee responses, the
system enables readers to evaluate test takers’ performances from remote locations via the
Internet instead of at centralized locations, thus eliminating the transportation and housing costs
associated with a centralized reading. Reader training and certification are also accomplished
remotely as is the continuous monitoring of readers’ performance.
The objective of this study was to evaluate the feasibility of using OSN to score AP
exams. Of particular interest were (a) readers’ reactions to using OSN and (b) the extent to which
the level, reliability, and meaning of scores were comparable under OSN and under traditional
methods of scoring.
Previous Research
In perhaps the earliest research concerned with the efficacy of remote scoring, Breland
and Jones (1988) compared the reliability and validity of first-year college students’ essay scores
when they were based on (a) trained readers working in a conference setting versus (b)
unmonitored readers working in their own homes or offices. The results revealed slightly lower
reliability and validity when essays were scored remotely. The researchers concluded, however,
that remote scoring was potentially feasible, especially if more sophisticated calibration
procedures could be developed and better reader monitoring implemented.
Early research on the first version of OSN (Powers, Farnum, Grant, & Kubota, 1997) was
conducted to determine readers' reactions to the prospect of evaluating essay responses on
computer screens (instead of paper). To make this assessment, experienced readers evaluated
samples of essays both on screen and on paper. The results revealed that readers were relatively
positive about online scoring. Moreover, there were no differences between the average scores
awarded to on-screen essays versus hard-copy essays, and interreader agreement was comparable
for both kinds of presentation. Similar results were obtained by Powers and Farnum (1997), who
found no differences between displaying and scoring essays on a computer screen versus
presenting them in a paper format.
Powers, Kubota, Bentley, Farnum, Swartz, and Willard (1998a) investigated the extent to
which inexperienced readers could be effectively trained to use OSN. Traditionally, personnel
who evaluate constructed responses for ETS-administered testing programs have been required
to possess certain academic credentials. This study was designed to determine the extent to
which these prerequisites could be relaxed without sacrificing the accuracy of scoring. Both
experienced and inexperienced readers evaluated essays (involving the discussion of an issue)
both before and after they had undergone standard training for scoring essays. The results
showed that training did affect scoring accuracy, especially for readers who were previously
inexperienced. Moreover, after training, a significant proportion of inexperienced readers
exhibited a level of accuracy that was commensurate with that shown by experienced readers. A
related effort extended these findings to a second kind of essay prompt involving the analysis of
an argument (Powers, Kubota, Bentley, Farnum, Swartz, & Willard, 1998b).
Thus, several research studies have provided strong evidence of the feasibility of online
scoring. Most of this research, however, has focused on the evaluation of relatively lengthy essay
responses. To date, no large-scale study has compared evaluations conducted online with those
conducted when readers convene in a central location to evaluate student responses in a
paper-based format. Also, no study has investigated the evaluation of responses to tests like those
offered by the Advanced Placement Program.
The following sections of this paper discuss the research design for, and the results of,
one such study. The final section gives a summary of the results and suggestions for future OSN
training, scoring, and monitoring for AP tests.
Method
The Exams
The study focused on two AP exams, English Language and Composition and Calculus
AB, which employ two very different kinds of constructed-response questions. The AP English
Language and Composition test contains three free-response questions that are designed to test
skills in analyzing the rhetoric of prose passages by requiring students to write narrative,
expository, analytical, and argumentative essays in a clear and cogent manner. The AP Calculus
AB exam contains six free-response questions that require students to demonstrate, by showing
their work, their ability to solve calculus problems involving an extended chain of reasoning.
For this study, readers scored a randomly selected subset of exams from the May 2002
AP test administrations—approximately 6,000 exam books for AP English Language and
Composition and another 6,000 for AP Calculus AB were scanned and available for scoring
using OSN. All of these exams had been scored previously at the operational AP reading for
these exams in June 2002. In addition, approximately 500 exams were scored twice at the operational
reading and twice again during the OSN reading for our study. OSN scoring took place between July
22, 2002, and August 12, 2002. This allowed readers approximately 20 days to complete their
scoring, with each reader devoting a minimum of 4 hours per day.
Reader Recruitment
The study recruited readers from the pool of previously qualified AP readers. The aim
was to select a sample of readers who would approximate the pool of paper-and-pencil readers
for 2002 in terms of the proportions of experienced and new readers, males and females, college
and high school teachers, and minority group members. People who served as readers in June
2002 were not eligible to read for this study.
Invitations were sent via e-mail to the following categories of readers: those who had
declined invitations to serve as AP readers during 2002; those involved in OSN scoring for other
ETS programs; those who had retired from the reader pool within the past 2 years; and potential
readers who had been placed on an AP reader waiting list. The invitations included a link to
ONYX, an electronic relationship management tool on the ETS Web site, where invitees were
directed to complete a form to indicate their interest, availability, and access to the Internet and
other needed hardware. Readers and scoring leaders were compensated at the same hourly rate as
current 2002 AP readers and scoring leaders.
Invitations yielded a total of 73 study readers (33 for the AP English Language and
Composition exam and 40 for the AP Calculus AB exam). A plurality (41%) of the readers was
new to the AP scoring process. The others had read previously for the AP program for either 1
year (21%), 2 to 5 years (22%), or more than 5 years (16%). Most were college faculty members
(62%). A majority (82%) was White, 11% were minority group members (Asian American,
Black American, or other), and 7% did not reveal their ethnicity. Virtually equal numbers of men
and women participated.
The study involved 10 scoring leaders for the AP English Language and Composition test
and 17 for the AP Calculus AB exam. The scoring leaders’ role was to answer questions that
readers might have. Most had served previously as AP table leaders, with 77% having served at
least two years. Only three of the study scoring leaders indicated having no previous experience
as table leaders. Most were either secondary school teachers (n = 10) or college faculty members
(n = 12), with roughly equal numbers of males (n = 13) and females (n = 12). (The gender of two
scoring leaders was not reported.) All scoring leaders were White.
All in all, the readers and scoring leaders who volunteered to participate in the study
appeared to be reasonably representative of the total pool of potential AP readers. The possibility
exists, however, that these volunteers may have differed from other AP readers with regard to the
extent to which they were comfortable with reading papers on a computer screen.
Training Materials
The selection of training samples was completed as part of the regular June 2002 AP
reading sessions. These samples were used to develop OSN calibration items, benchmarks, and
rangefinders for the study. As required for OSN, the samples were also annotated during the
reading session and loaded into OSN in early July. A training Web site, which included
additional samples and annotations, was also developed. Because the calculus exam was scored
analytically, there was no need for traditional rangefinders and benchmarks. Instead, as per
current practice for paper-and-pencil reading sessions, samples of test performances were
selected only to represent a range of typical responses.
Scanning
All AP exam books that were scored for this study were scanned into OSN, the same system
used for scoring other ETS tests. The OSN capability to store any scanned
annotations/scoring guides for each question was also used in this study. For both the AP English
Language and Composition and the AP Calculus exams, a total of 240 folders (25 exam books per
folder) were selected to represent the AP populations for each exam. This was accomplished by
creating folders throughout the entire length of the period during which exams were returned, so that
any sampling bias due to early returns or to geographic region would be minimized.
Training
The study conducted two levels of training:
• General OSN training for experienced AP readers
• General OSN training and subject specific training for new readers
Each reader used the OSN AP tutorial Web site to receive initial training. At the end of
this training, which was expected to require several hours for new readers, each reader took a
certification exam, which required trainees to match the evaluations awarded to previously vetted
responses. We administered a second certification set to trainees who failed to qualify as readers
on the initial attempt. Once certified, a trainee was allowed to access the OSN Web site and
begin scoring examinee responses.
In order to evaluate the effectiveness of reader training conducted through OSN and to
gather reader reaction to participating in OSN-based training and scoring, we developed a brief
questionnaire and administered it to each reader upon completion of training. The questionnaire
also asked readers about their likely availability to participate in future OSN readings.
Scoring
Each reader scored for a total of approximately 20 hours, with each scoring session being
a minimum of 4 hours. At the beginning of each scoring session, readers took a calibration exam
consisting of prescored papers to ensure that they remained on scale. Readers were permitted to
resume scoring only if they passed the calibration exam.
Results
This report presents results in two main parts. The first part contains the statistical results,
that is, the comparison of the free-response scoring under AP operational paper-and-pencil
scoring and under OSN scoring. The second part contains the main results from surveys of AP OSN readers
and scoring leaders.
Statistical Results
The following results involve comparing the responses of a large sample of AP
examinees as evaluated by two comparable sets of trained readers under each of two different
scoring systems—OSN and traditional centralized paper/pencil scoring sessions. For each of the
two AP exams investigated here, summary statistics are shown (in Table 1 for the AP English
Language and Composition and in Table 2 for the AP Calculus AB exam) for each free-response
question, both for the total AP test taker population and for the OSN study. For the OSN study
sample, we present the results of both the operational scoring and OSN scoring. The total
samples include all examinees who took the exams during 2002. Paired t-tests were conducted
(and effect sizes calculated) to evaluate the difference between means for operational and OSN
scoring for each of the two OSN study samples.
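The paired comparison described above can be sketched in a few lines. The scores below are synthetic and the variable names are illustrative; only the analysis (a paired t-test plus a mean difference expressed in pooled-SD units) mirrors the report.

```python
"""Sketch of the paired t-test and effect-size computation; data are
synthetic stand-ins for the study's operational and OSN scores."""
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical 9-point free-response scores for 500 examinees,
# each response scored once operationally and once via OSN
operational = rng.integers(1, 10, size=500).astype(float)
osn = np.clip(operational + rng.choice([-1.0, 0.0, 1.0], size=500,
                                       p=[0.2, 0.6, 0.2]), 1, 9)

# Paired t-test: the same responses scored under the two systems
t_stat, p_value = stats.ttest_rel(operational, osn)

# Effect size d: mean score difference in pooled-SD units
pooled_sd = np.sqrt((operational.var(ddof=1) + osn.var(ddof=1)) / 2)
d = (osn.mean() - operational.mean()) / pooled_sd
```

With large samples such as those in Tables 1 and 2, even trivially small mean differences can reach statistical significance, which is why the report pairs each p value with an effect size.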
For the AP English Language and Composition test, the study found statistically
significant differences (p < .01) between groups for two of the three free-response questions, as
shown in Table 1. These results are consistent with a 1988 reader reliability study that was
conducted for the exam. Table 1 also shows, however, that the effect sizes were very small
(d < .1), corresponding to mean differences of less than half a score point between the two groups.
Table 1
Comparison of Operational and OSN Summary Statistics for Free-Response Questions for the AP English Language and Composition Test

                Total               OSN sample:         OSN sample:
                (N = 152,889)       Operational         OSN
                Mean     SD         Mean     SD         Mean     SD         N        p        Effect size
Question 1      4.82     1.76       4.97     1.67       5.09     1.71       5,388    < .01    .07
Question 2      4.74     1.66       4.77     1.63       4.67     1.72       4,225    < .01    .06
Question 3      5.02     1.66       5.00     1.61       5.05     1.60       3,414    .11      .03
Comparable summary statistics are shown in Table 2 for the six free-response questions on the AP
Calculus AB exam. Paired t-tests revealed statistically significant differences (p < .01) between the
operational and OSN scoring for all six questions, but the average score difference was less than .05
score points and the effect sizes were minuscule (d ≤ .02). This result is also consistent with a reader
reliability study for the AP Calculus AB test conducted in 1996.
Table 2
Comparison of Operational and OSN Summary Statistics for Free-Response Questions for the AP Calculus AB Test

                Total               OSN sample:         OSN sample:
                (N = 152,696)       Operational         OSN
                Mean     SD         Mean     SD         Mean     SD         N        p        Effect size
Question 1      4.08     2.57       4.12     2.55       4.16     2.57       2,928    < .01    .02
Question 2      3.13     2.82       3.28     2.86       3.31     2.86       5,320    < .01    .01
Question 3      3.12     2.24       3.24     2.32       3.20     2.32       2,660    < .01    .02
Question 4      3.51     2.87       3.62     2.86       3.58     2.84       3,181    < .01    .01
Question 5      2.29     2.66       2.44     2.71       2.47     2.69       5,609    < .01    .01
Question 6      2.33     2.11       2.47     2.17       2.44     2.15       4,142    < .01    .01
Comparison of Interreader Correlations for Operational and OSN Scoring
In addition to scoring each AP question once for a large sample, we also conducted a
reader reliability study in which a sample of responses to each free-response question was scored
by two readers. Product-moment correlations between readers were calculated for each question
as an index of reader consistency. The results were compared with the same statistics derived
from a similar study conducted as part of the 2002 operational AP readings. Tables 3 and 4
compare interreader correlations for AP readers using OSN with those for readers who scored the
same responses in the AP operational reading. The correlations between OSN readers were
nearly as high as, or higher than, those for operational readers. However, z tests of the difference
between two correlation coefficients (Marascuilo & Serlin, 1988) revealed no statistically
significant differences (p > .05) between interreader correlation coefficients obtained in the
operational and OSN settings.
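In its textbook form, the z test for the difference between two independent correlations (Marascuilo & Serlin, 1988) is a Fisher r-to-z comparison. A minimal sketch follows; the inputs are hypothetical, and the study's reported z values presumably reflect its exact sample configuration, so this sketch is not expected to reproduce Tables 3 and 4.

```python
"""Textbook z test for H0: rho1 == rho2 (independent samples), via
Fisher's r-to-z transformation. Inputs are illustrative."""
import math
from statistics import NormalDist

def fisher_z_test(r1, n1, r2, n2):
    """Return (z, two-sided p) for two independent Pearson r's."""
    z1, z2 = math.atanh(r1), math.atanh(r2)      # Fisher transform
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # SE of z1 - z2
    z = (z1 - z2) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# e.g., comparing two interreader correlations of .65 and .60, each
# based on 300 double-scored responses (hypothetical numbers)
z, p = fisher_z_test(0.65, 300, 0.60, 300)
```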
Table 3
Interreader Correlations for Free-Response Questions in Operational and OSN Settings for the AP English Language and Composition Test

                N       OSN correlation     Operational correlation     z        p
Question 1      365     .69                 .55                         1.91     > .05
Question 2      354     .61                 .62                         0.07     > .05
Question 3      306     .67                 .52                         1.84     > .05
Table 4
Interreader Correlations for Free-Response Questions in Operational and OSN Settings for the AP Calculus AB Test

                N       OSN correlation     Operational correlation     z        p
Question 1      228     .98                 .97                         0.03     > .05
Question 2      381     .97                 .96                         0.10     > .05
Question 3      210     .94                 .93                         0.11     > .05
Question 4      263     .97                 .96                         0.10     > .05
Question 5      462     .97                 .94                         0.47     > .05
Question 6      362     .97                 .96                         0.20     > .05
Comparison of Reliability of Operational and OSN Scoring
The Cronbach coefficient alpha was used to estimate the internal consistency of the
free-response section. Because each free-response question was read by a different reader,
coefficient alpha reflects both interreader consistency and the overall consistency of
measurement provided by the free-response questions. Table 5 presents the reliability comparison between
operational and OSN groups for the AP English Language and Composition test. There were no
significant between-group differences in the reliabilities of free-response items between the
operational and OSN scoring results.
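Coefficient alpha can be computed directly from an examinee-by-question score matrix; the sketch below uses synthetic scores and the standard formula, not the study's data.

```python
"""Cronbach's coefficient alpha from an examinee-by-question score
matrix. Synthetic data; the three columns play the role of the three
free-response questions."""
import numpy as np

def cronbach_alpha(scores):
    """scores: 2-D array, rows = examinees, columns = items."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)       # per-item variance
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of totals
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(1)
ability = rng.normal(5, 1.5, size=1000)          # shared trait
scores = np.column_stack(
    [ability + rng.normal(0, 1.2, size=1000) for _ in range(3)])
alpha = cronbach_alpha(scores)
```

Because each question is scored by a different reader, the alpha computed this way bundles reader inconsistency together with question-to-question inconsistency, exactly as noted above.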
Table 5
Comparison of Reliability Estimates for Free-Response Items in the AP English Language and Composition Test

                Total        OSN matched sample
                             Operational      OSN
N               152,889      3,241            3,241
Reliability     .673         .670             .655
Table 6 presents the reliability comparison between operational and OSN scoring for the
AP Calculus AB test. The reliability estimates are virtually identical for the two modes of
scoring.
Table 6
Comparison of Reliability Estimates for Free-Response Items in the AP Calculus AB Test

                Total        OSN matched sample
                             Operational      OSN
N               153,696      514              514
Reliability     .851         .861             .864
Comparison of Item-level Reader Agreement Rates for Operational and OSN Scoring
Besides computing interreader correlations for the smaller reliability sample, we also
estimated score consistency in the larger study sample by computing the simple percentage
agreement rates between readers performing in the two different scoring environments. For each
of the free-response questions, which are scored on a 9-point scale, three agreement rates were
computed: exact, exact or within 1 point, and exact or within 2 points. Both the observed
percentages of agreement and Cohen’s kappa agreement statistics are shown in Tables 7 and 8
for each question on the two AP exams. (Cohen’s kappa is a more appropriate way of expressing
reader agreement than is simple percentage agreement because Cohen’s kappa corrects for
chance agreement.)
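The agreement statistics can be sketched as follows. The within-k generalization of kappa shown here corrects the observed within-k agreement by the within-k agreement expected by chance, a straightforward reading of the approach described above; function and variable names are illustrative.

```python
"""Observed agreement (Po) and Cohen's kappa, generalized so that
scores within `tol` points count as agreeing (tol = 0 gives exact
agreement). Names are illustrative, not the study's code."""
import numpy as np

def agreement_stats(scores_a, scores_b, tol=0, n_cats=9):
    """Two readers' integer scores on a 1..n_cats scale."""
    a, b = np.asarray(scores_a), np.asarray(scores_b)
    po = np.mean(np.abs(a - b) <= tol)            # observed agreement
    # chance agreement from the two marginal score distributions
    pa = np.bincount(a, minlength=n_cats + 1)[1:] / len(a)
    pb = np.bincount(b, minlength=n_cats + 1)[1:] / len(b)
    pe = sum(pa[i] * pb[j]
             for i in range(n_cats) for j in range(n_cats)
             if abs(i - j) <= tol)
    return po, (po - pe) / (1 - pe)               # (Po, kappa)
```

On a 9-point scale, within-1 and within-2 chance agreement is substantial, which is why kappa runs well below the raw percentages in Tables 7 and 8.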
As shown in Table 7 for the three questions on the AP English Language and
Composition test, kappa statistics for exact agreement, agreement within 1 point, and agreement
within 2 points varied from .16 to .20, from .49 to .58, and from .77 to .83, respectively. The
exact agreement rates are relatively low, based on criteria that have been suggested elsewhere.
(See Powers, 2000, for a review of these criteria.) However, the rates for less-than-exact
agreement are reasonably good, especially when the standard is agreement within two points. The
relatively low rate of exact agreement on the AP English Language and Composition test questions
may be a function of either the nature of the task or the different environments in which the
questions were scored. The comparisons in the following section provide a better sense of the
reasons for these low agreement rates.
Table 7
Observed Percentage Agreements (and Kappa Statistics) Between Operational and OSN Scoring for the AP English Language and Composition Test

                Exact             Within 1 point      Within 2 points
                Po(a)    K(b)     Po       K          Po       K
Question 1      .24      .16      .64      .49        .87      .77
Question 2      .25      .17      .66      .53        .88      .78
Question 3      .28      .20      .70      .58        .91      .83

(a) Po: observed percentage of agreement. (b) K: Cohen’s kappa.
Table 8 presents agreement statistics for the six free-response questions for the AP
Calculus AB test. The kappa-based exact, within-1-point, and within-2-point agreement rates
varied from .57 to .78, .89 to .96, and .95 to 1.00, respectively. All are quite respectable
according to the published criteria.
Table 8
Observed Percentage Agreements (and Kappa Statistics) Between Operational and OSN Scoring for the AP Calculus AB Test

                Exact             Within 1 point      Within 2 points
                Po(a)    K(b)     Po       K          Po       K
Question 1      .74      .71      .96      .94        .99      .99
Question 2      .72      .69      .95      .93        .99      .98
Question 3      .62      .57      .93      .90        .99      .98
Question 4      .71      .68      .94      .92        .99      .98
Question 5      .71      .68      .92      .89        .97      .95
Question 6      .80      .78      .97      .96        1.00     1.00

(a) Po: observed percentage of agreement. (b) K: Cohen’s kappa.
Appendix A presents detailed agreement tables for each free-response item for both the
AP English and Composition and the AP Calculus AB tests.
Comparison of Interreader Agreement
As shown above, the rate of exact agreement between OSN and operational scoring was
relatively low for the AP English Language and Composition test. In order to ascertain the nature
of the score discrepancies—whether from disagreement between readers or from disagreement
due to different scoring environments—we compared interreader agreement in the reliability
study sample (in which both readers evaluated responses in an operational setting) with the
agreement exhibited in the OSN study sample (in which one reader evaluated responses in an
operational setting and the other evaluated the same responses in an OSN setting). The
comparisons, shown in Table 9, reveal that reader agreement was at least as good (perhaps
better) between two readers in different scoring environments as for readers in the same
operational scoring environment (i.e., in the reliability study sample). This result suggests that
the major source of disagreement is readers themselves, not differences between OSN and
operational scoring environments.
Table 9
Kappa Statistics for the Study Sample (Operational vs. OSN Scoring) and for the Reader Reliability Sample for the AP English Language and Composition Test

                Exact              Within 1 point      Within 2 points
                OSN(a)   RRS(b)    OSN      RRS        OSN      RRS
Question 1      .22      .21       .62      .53        .88      .79
Question 2      .19      .22       .59      .56        .84      .89
Question 3      .24      .17       .62      .54        .91      .82

(a) OSN: OSN study sample, n = 5,388; 4,225; and 3,414, respectively, for each question. (b) RRS: reader reliability study sample, n = 500.
The results in Table 10 provide similar information (and results) for the AP Calculus AB test.
Table 10
Kappa Statistics for the Study Sample (Operational vs. OSN Scoring) and for the Reader Reliability Sample for the AP Calculus AB Test

                Exact              Within 1 point      Within 2 points
                OSN(a)   RRS(b)    OSN      RRS        OSN      RRS
Question 1      .73      .74       .96      .94        1.00     .99
Question 2      .70      .67       .93      .92        1.00     .97
Question 3      .63      .58       .93      .88        .97      .98
Question 4      .72      .66       .92      .91        .99      .99
Question 5      .73      .64       .94      .90        .99      .95
Question 6      .76      .76       .98      .95        1.00     .98

(a) OSN: OSN study sample, n = 2,928; 5,320; 2,660; 3,181; 5,609; and 4,142, respectively, for each question. (b) RRS: reader reliability study sample, n = 500.
Comparison of AP Grades Based on Operational and OSN Scoring
The effect on AP grades was evaluated by comparing grades based on an operational
reading to those based on OSN scoring. The multiple-choice section scores were, of course,
identical in these comparisons. For the AP Calculus AB test, the free-response section
contributes 50% to the composite score; for the AP English Language and Composition test, it
contributes 55%. Table 11 provides the distribution of operationally reported grades versus
OSN-based grades for the AP English Language and Composition test, with the diagonal cells
representing agreement and the off-diagonal cells indicating discrepancies.
In this study, OSN grades were identical to operational grades in 2,253 of 3,241 cases
(69.6%) for the AP English Language and Composition test. Had reported grades been based on
OSN scores, 30.4% of the grades would have differed, with 16.0% being higher under OSN
grading and 14.4% being lower; fewer than 0.5% of the differences were greater than one AP
grade. This result is consistent with reader reliability studies conducted with
operational data in 2002 and 1998, which yielded an overall agreement rate of 66% (Educational
Testing Service, 1998).
Table 11
Cross-tabulation of AP Grades Based on Operational and OSN Scoring for the AP English Language and Composition Test

Operational                          OSN grade
grade        1           2           3            4           5           Total
1            183 (78%)   60          0            0           0           243
2            53          787 (79%)   163          1           0           1,004
3            0           145         680 (67%)    162         2           989
4            0           3           173          404 (62%)   130         710
5            0           0           6            86          203 (61%)   295
Total        236         995         1,022        653         335         3,241
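As a check on the rates quoted in the text, the counts in Table 11 can be reduced to agreement and discrepancy percentages directly; the computed rates match the published figures to rounding.

```python
"""Reducing the Table 11 cross-tabulation to agreement and discrepancy
rates (rows: operational grade 1-5; columns: OSN grade 1-5)."""
import numpy as np

table11 = np.array([
    [183,  60,   0,   0,   0],
    [ 53, 787, 163,   1,   0],
    [  0, 145, 680, 162,   2],
    [  0,   3, 173, 404, 130],
    [  0,   0,   6,  86, 203],
])
n = table11.sum()                                  # 3,241 examinees
exact = np.trace(table11) / n                      # same grade both ways
higher = np.triu(table11, k=1).sum() / n           # OSN grade higher
lower = np.tril(table11, k=-1).sum() / n           # OSN grade lower
off_more_than_one = sum(table11[i, j]              # |difference| > 1
                        for i in range(5) for j in range(5)
                        if abs(i - j) > 1) / n
```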
Table 12 presents the distribution of reported grades versus OSN-based grades for the AP
Calculus AB test. In this study, OSN grades were identical to operational grades in 425 of 446
cases (95.3%). Had reported grades been based on OSN scores, 4.7% of the grades would have
differed by one AP grade. This result is consistent with the reader reliability study conducted in
1996, for which an exact rate of agreement of 94% was calculated (see Bleistein, Morgan, &
Battleman, 1996).
Table 12
Cross-tabulation of AP Grades Based on Operational and OSN Scoring for the AP Calculus AB Test

Operational                         OSN grade
grade        1          2           3            4           5           Total
1            38 (97%)   1           0            0           0           39
2            0          56 (97%)    2            0           0           58
3            0          2           119 (95%)    4           0           125
4            0          0           0            107 (93%)   8           115
5            0          0           0            4           105 (96%)   109
Total        38         59          121          115         113         446
Performance of OSN Readers Over Time
The OSN reading for the AP English Language and Composition exam required more
than 20 days, with the number of free-response questions read varying from day to day. The
consistency of OSN readers over time was estimated by computing the correlation between
scores on free-response and multiple-choice sections for both operational and OSN scoring.
Using a z test for the equality of correlations, these correlations were compared for every
question for each day of scoring that yielded at least 50 evaluations per question.
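The day-by-day analysis can be sketched as follows, using synthetic data; the column names ('day', 'fr_score', 'mc_score') and the helper name are illustrative assumptions, not the study's actual variables.

```python
"""Per-day Pearson correlations between free-response and
multiple-choice scores, keeping only days with at least 50
evaluations. Data and names are illustrative."""
import numpy as np
import pandas as pd

def daily_validity_correlations(df, min_n=50):
    """Return a Series of per-day free-response/MC correlations."""
    out = {}
    for day, grp in df.groupby("day"):
        if len(grp) >= min_n:      # study required >= 50 evaluations
            out[day] = grp["fr_score"].corr(grp["mc_score"])
    return pd.Series(out)

rng = np.random.default_rng(2)
mc = rng.normal(30, 8, size=2000)  # multiple-choice section scores
df = pd.DataFrame({
    "day": rng.integers(1, 20, size=2000),   # 19 scoring days
    "mc_score": mc,
    # free-response score sharing a moderate common factor with MC
    "fr_score": 0.10625 * (mc - 30) + rng.normal(5, 1.5, size=2000),
})
daily_r = daily_validity_correlations(df)
median_r = daily_r.median()
```

Pairs of per-day correlations (operational vs. OSN) would then be compared with the same z test for the equality of correlations used elsewhere in the report.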
For the AP English Language and Composition exam, the median correlations (over days)
with performance on multiple-choice questions for the three free-response questions were .46, .42,
and .42 when free-responses were scored operationally. The corresponding correlations when
free-response questions were scored with OSN were .48, .41, and .46. For the AP Calculus AB
exam, the median correlations over time between multiple-choice scores and each of six free-
response question scores were .67, .64, .63, .69, .68, and .71 for operational scoring. For OSN
scoring, the corresponding correlations were virtually identical: .66, .64, .65, .69, .69, and .71.
The comparisons over time for the three free-response questions in the AP English
Language and Composition test are displayed in Figure 1, which shows that, for each of the three
questions, the differences between correlations (OSN correlation minus operational correlation)
fluctuate randomly around 0.0 and between –0.2 and +0.2. There were no statistically significant
differences between the two correlations at the .05 level, and the differences between correlations
in the two scoring environments were as likely to be positive as negative. (See Tables B1 to B3
in Appendix B.) These results indicate that, when the multiple-choice section of the test is treated
as a validity criterion, OSN and traditional operational readings of free-response questions for
the AP English Language and Composition test are equally valid.
[Figure omitted. Line graph: x-axis “Days of scoring” (1–19); y-axis “Difference” (–0.2 to +0.2); one line each for Q1, Q2, and Q3.]

Figure 1. Differences over time between OSN and OP in correlations for AP English
Language and Composition Test.
The OSN reading for the AP Calculus AB test required about 16 days. The consistency of
OSN readers was estimated in the same manner as for the AP English Language and
Composition test. The results, shown in Figure 2 (see Tables B4 to B9 in Appendix B), are
similar to those shown in Figure 1 for the AP English Language and Composition test. The six
lines in the graph fluctuate in a seemingly random manner around 0.0 and in a very limited range
(-0.03 to 0.04). Thus, there were no apparent differences in the validity of operational and OSN-
based readings over time.
[Figure omitted. Line graph: x-axis “Days of scoring” (1–14); y-axis “Difference” (–0.03 to +0.05); one line each for Q1 through Q6.]

Figure 2. Differences over time between OSN and OP in correlations for AP Calculus AB.
AP OSN Survey Results
Readers’ Opinions About OSN Training
Upon completion of their training, readers who participated in the study were surveyed to
obtain their impressions of OSN. All of the 73 study readers (33 for the AP English Language
and Composition exam and 40 for the AP Calculus AB exam) completed the survey. A majority
(89%) of readers rated general OSN training as being effective (29% rated it as very effective) in
helping them to properly utilize OSN without seeking technical support, and 94% regarded the
subject-specific scoring training as being effective (28% rated it as very effective) in helping
them to score responses accurately. However, 76% of the 41 readers who had read previously for
the AP program did not think that OSN training was as effective as their previous AP training,
and six of these readers thought that OSN training was much less effective.
Participants who expressed dissatisfaction with OSN training most often mentioned the
lack of opportunity to discuss standards with other readers, the inability to print the commentary
that accompanied training essays, the inability to display a larger portion of an essay without
scrolling, and the lack of correspondence between the prompts used for training and those that
examinees had actually answered.
During the course of the OSN reading, a majority of readers (63%) felt that they needed
to seek OSN technical support, and 88% found this help to be satisfactory for resolving technical
problems. Readers were also asked to rate each of several aspects of OSN as being either
excellent, good, satisfactory, or unsatisfactory. Their responses are shown in Table 13.
Table 13
Readers’ Opinion About OSN
% Satisfactory or higher   % Excellent or good
OSN login process 97 86
User-friendliness (ease-of-use, navigation) 96 65
Online practice tutorial 90 69
Visual display (screen headings, etc.) 87 69
System response time 80 55
Handwriting image display 69 37
Most aspects of OSN were viewed as at least satisfactory and, with one exception, all
were rated as good or excellent by a majority of readers. The lowest-rated feature, the
handwriting display, was found to be satisfactory by 69% of readers. Even though a significant
minority of readers found the handwriting display to be less than satisfactory, it apparently did
not hinder their scoring.
Readers were also asked to evaluate their interactions with and support from scoring
leaders. The vast majority of readers (92%) consulted their scoring leader at least once, and a
slight majority (53%) did so more than twice. Of those responding, 92% felt that the telephone
was at least satisfactory for discussing scoring issues with their scoring leader. All respondents
felt that the scoring leader was helpful (78% found him or her very helpful), and 86% said that
the scoring leader was available when needed.
Finally, readers were asked if they had encountered technical difficulties when using the
OSN Web site. Of the 58 readers who responded to the question, nearly half (45%) indicated that
they had experienced difficulty connecting with the Web site, and 38% said that they had trouble
with download speed.
Scoring Leaders’ Opinions
When asked for their opinion about the overall effectiveness of the general OSN training
(in helping them to properly utilize the system without seeking technical support), fully 83% of
respondents indicated that the training was at least slightly effective; only four scoring leaders
felt that it was either not very effective or not effective at all. Eighty-one percent regarded the
use of the phone as being at least satisfactory for discussing scoring issues with readers. The
most often mentioned positive aspect of OSN training was that it enabled readers to train at their
own pace; the most often mentioned negative aspect was the inability of readers to interact with
each other (to discuss troublesome essays or scoring rubrics, for example).
Scoring leaders were also asked to rate each of several aspects of OSN. Their ratings are
shown in Table 14.
Table 14
Scoring Leaders’ Opinion About OSN
% Satisfactory or higher   % Excellent or good
OSN login process 96 77
User-friendliness (ease-of-use, navigation) 81 54
Online practice tutorial 80 36
Visual display (screen headings, etc.) 81 31
System response time 77 57
Handwriting image display 69 19
Three quarters of the scoring leaders reported that they required technical support during
the readings, and all but one of 19 respondents rated the help they received as being
satisfactory. However, 15 of 20 respondents said they encountered trouble at least once when
connecting to the OSN Web site, and nearly half of them reported trouble with download speed.
Summary and Discussion
The ETS Online Scoring Network (OSN) is a computer-based system designed to score
free- or constructed-response test items. By capturing and routing examinee responses, the
system enables readers to evaluate test takers’ performances from remote locations via the
Internet rather than at centralized locations, as has been the tradition for many large-scale testing
programs that employ constructed-response testing.
The effort described here entailed the development and application of OSN procedures to
score a large sample of constructed responses for two Advanced Placement tests—English Language
and Composition and Calculus AB. The study objectives were to obtain participants’ reactions to
OSN and to assess their performance when using the system. Study participants were 67 readers and
26 scoring leaders, all recruited from the same population as traditional AP readers. All told, these
readers evaluated nearly 6,000 exams for each of the two tests that were studied.
With respect to participant reactions to using OSN, the following results were obtained.
The vast majority of readers rated both general OSN training and subject-specific training as
being effective in helping them to utilize OSN and to score responses accurately. In general,
however, readers did not think that OSN training was as effective as their previous AP training.
During the course of the OSN reading, a slight majority of readers sought technical
support, and, of those readers, the vast majority found this support to be satisfactory for resolving
technical problems. Readers expressed variable opinions about specific aspects of OSN, though
most aspects were rated quite positively. The least positive rating pertained to the handwriting
image display, which was still rated as at least satisfactory by a majority of readers. Many
readers also indicated that they had experienced difficulty connecting with the Web site or with
download speed.
The opinions of scoring leaders regarding the effectiveness of various aspects of the
system were generally consistent with those of readers. For example, scoring leaders also rated
the handwriting image display as the least effective component of the system, although a
majority rated it as satisfactory. Like readers, some scoring leaders also reported difficulty when
connecting to the OSN Web site or with slow download speed.
Readers’ performance in the OSN environment was assessed by comparing the
evaluations made by two comparable sets of trained readers each working in a different scoring
environment—either in OSN or in traditional centralized paper/pencil scoring sessions. The same
samples of examinee test responses were evaluated in each scoring environment.
For each test question, a comparison of mean scores awarded in each scoring
environment revealed statistically significant differences between scoring environments.
(Statistical significance was likely a result of the very large samples.) However, these differences
were very small (d < .1) for the AP English Language and Composition test and minuscule (d <
.02) for the AP Calculus AB exam. Moreover, the direction of any difference was as likely to
favor one scoring environment as the other.
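The report gives only the resulting d values and does not state which formula was used; the pooled-standard-deviation form of the standardized mean difference is one common definition. A minimal sketch, with made-up means and SDs (not study data) chosen to show how small d < .1 is in practice:

```python
import math

def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference using a pooled standard deviation.
    This pooled-SD form is an assumption here, not the authors' stated formula."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

# Hypothetical illustration: a 0.07-point difference in essay-score means
# with SDs of about 1.7 on roughly 5,400 responses per environment
# yields d of about .04, i.e., comfortably under .1.
d = cohens_d(4.62, 1.7, 5390, 4.55, 1.7, 5390)
```

With large samples such a difference is statistically significant, yet negligible in standardized units, which is the point of the paragraph above.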
In addition to scoring each AP question once for a large sample, a reader reliability study
was conducted in which a sample of responses to each free-response question was scored by two
readers. The agreement between OSN readers was at least as good as that for those who read in a
traditional operational setting.
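Agreement of the kind summarized here (and tabulated item by item in Appendix A) can be read off a crosstab of the two sets of scores. A minimal sketch with a made-up 3-point-scale table, not study data:

```python
def agreement_rates(table):
    """Percent exact and within-one-point agreement from a square
    crosstab of first-reading scores (rows) by second-reading scores (columns)."""
    size = len(table)
    total = sum(sum(row) for row in table)
    exact = sum(table[i][i] for i in range(size))
    adjacent = sum(table[i][j] for i in range(size)
                   for j in range(size) if abs(i - j) <= 1)
    return 100 * exact / total, 100 * adjacent / total

# Hypothetical 3-point example (not study data):
tab = [[40, 8, 2],
       [10, 50, 9],
       [3, 7, 45]]
exact, within_one = agreement_rates(tab)
```

Exact agreement is the diagonal of the crosstab; adjacent agreement adds the two off-diagonals, which is how discrepancies of one point are usually tolerated in essay scoring.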
The internal consistency of the free-response test sections was also compared across the
two scoring environments. There were no significant between-environment differences in the
reliabilities of free-response sections for either test.
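The report does not name the reliability coefficient it computed for the free-response sections; coefficient alpha is the standard internal-consistency estimate for a section made up of several question scores, and a generic sketch (with tiny illustrative data, not study data) is:

```python
from statistics import pvariance

def coefficient_alpha(scores):
    """Coefficient alpha for a list of examinees' per-question score lists.
    Illustrative only: the report does not state which coefficient was used."""
    k = len(scores[0])  # number of free-response questions in the section
    item_vars = [pvariance([s[i] for s in scores]) for i in range(k)]
    total_var = pvariance([sum(s) for s in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Tiny made-up example: two questions, four examinees.
alpha = coefficient_alpha([[1, 2], [2, 3], [3, 4], [4, 6]])
```

Comparing such coefficients across the two scoring environments is what the paragraph above reports.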
In addition, we compared interreader agreement in operational and OSN settings. The
comparisons revealed that reader agreement was very similar in the two different scoring
environments.
The effect on AP grades was evaluated by comparing grades based on an operational
reading to those obtained with OSN scoring. For each exam, the rates of agreement between
OSN-based grades and those based on the traditional AP scoring environment were at least as
high as the rates noted in previous AP reliability studies, in which exams were double scored in
the same, traditional scoring environment.
The consistency of OSN readers over time was also estimated by computing the
correlations between scores on free-response and multiple-choice sections for operational and
OSN scoring and comparing these correlations for each day of scoring. There were no
statistically significant differences between the two sets of correlations for either test. These
results indicate that, when the multiple-choice section of the test is treated as a validity criterion,
OSN and traditional operational readings of free-response questions are equally valid.
In conclusion, it appears that, according to study participants, some aspects of OSN need
to be improved. However, by each of the several widely accepted performance standards
(interreader agreement, internal consistency, and relationship to other appropriate variables), the
results obtained with OSN are extremely similar to those obtained with traditional AP scoring
methods.
References
The Advanced Placement Program. (n.d.). Retrieved August 29, 2002, from
http://apcentral.collegeboard.com/program
Bleistein, C., Morgan, R., & Battleman, M. (1996). College Board Advanced Placement
Examination reader reliability study: Calculus AB and Calculus BC, Form 3RBP.
Unpublished manuscript.
Breland, H. M., & Jones, R. J. (1988). Remote scoring of essays (ETS RR-88-04). Princeton, NJ:
Educational Testing Service.
Educational Testing Service. (1998). College Board Advanced Placement English Language and
Composition form 3TRP reader reliability study. Unpublished manuscript.
Marascuilo, L. A., & Serlin, R. C. (1988). Statistical methods for the social and behavioral
sciences. New York: W. H. Freeman and Company.
Powers, D. E. (2000). Computing reader agreement for the GRE Writing Assessment (ETS RM-
00-08). Princeton, NJ: Educational Testing Service.
Powers, D. E., & Farnum, M. (1997). Effects of mode of presentation on essay scores (ETS RM-
97-08). Princeton, NJ: Educational Testing Service.
Powers, D. E., Farnum, M., Grant, M., & Kubota, M. (1997). A pilot test of online essay scoring
(ETS RM-97-07). Princeton, NJ: Educational Testing Service.
Powers, D. E., Kubota, M., Bentley, J., Farnum, M., Swartz, R., & Willard, A. (1998a).
Qualifying readers for the Online Scoring Network (ETS RR-98-20). Princeton, NJ:
Educational Testing Service.
Powers, D. E., Kubota, M., Bentley, J., Farnum, M., Swartz, R., & Willard, A. (1998b).
Qualifying readers for the Online Scoring Network: Scoring argument essays (ETS RR-
98-28). Princeton, NJ: Educational Testing Service.
Appendix A
Item Agreement Between OP and OSN Scoring
Tables A1 through A9 show the item agreement between operational (OP) and OSN scoring on
free-response (FR) questions for AP English Language and Composition and AP Calculus AB tests.
Table A1
AP English Language and Composition: Question 1
OP FR1 (rows) by OSN FR1 (columns). Cell entries are frequencies; the row percentages
reported in the original equal each frequency divided by its row total.

OP\OSN      0     1     2     3     4     5     6     7     8     9   Total
0           1     3     1     0     0     0     1     0     0     0       6
1           3    27    31    12     7     1     0     0     0     0      81
2           1    14    65   109    82    44    18     9     4     1     347
3           2     7    80   113   157    91    69    14     5     2     540
4           1     3    56   166   337   245   244   109    24     7   1,192
5           1     4    23   103   249   200   242   140    59    10   1,031
6           0     6    20    63   231   275   350   182   105    22   1,254
7           0     2     4    18    75    87   184   133    92    19     614
8           0     0     3     6    13    36    69    60    55    18     260
9           0     0     0     0     3     4    12     9    24    13      65
Total       9    66   283   590 1,154   983 1,189   656   368    92   5,390
Table A2
AP English Language and Composition: Question 2
OP FR2 (rows) by OSN FR2 (columns). Cell entries are frequencies; the row percentages
reported in the original equal each frequency divided by its row total.

OP\OSN      0     1     2     3     4     5     6     7     8     9   Total
0           1     4     1     0     0     0     1     0     0     0       7
1           1    39    19    11     5     0     0     1     0     0      76
2           1    28    85    59    43    31    10     4     2     0     263
3           0    24    86   143   130    92    41    16     6     1     539
4           2    14    84   189   298   272   126    56    21     4   1,066
5           6     8    33   102   221   197   165    79    25     2     838
6           4     3    22    69   152   232   192   114    44    12     844
7           4     2     6    17    49    97   107    69    43    10     404
8           0     0     2     4    10    28    41    40    33    12     170
9           0     0     0     1     2     3     7     9     6     7      35
Total      19   122   338   595   910   952   690   388   180    48   4,242
Table A3
AP English Language and Composition: Question 3
OP FR3 (rows) by OSN FR3 (columns). Cell entries are frequencies; the row percentages
reported in the original equal each frequency divided by its row total.

OP\OSN      0     1     2     3     4     5     6     7     8     9   Total
0           0     1     0     0     0     0     0     0     0     0       1
1           1     9    14     9     2     2     0     1     0     0      38
2           1    10    35    49    48    27     7     1     1     0     179
3           0     8    41   103   114    74    39     8     3     0     390
4           0     4    36   116   218   173   101    39    12     2     701
5           1     3    11    52   175   240   162    80    14     5     743
6           0     2     6    31   119   197   215   147    52     7     776
7           0     2     2    10    33    78   114    96    45    13     393
8           0     0     0     2     9    20    45    41    35     5     157
9           0     0     0     0     2     1     4    10    17     4      38
Total       3    39   145   372   720   812   687   423   179    36   3,416
Table A4
AP Calculus AB: Question 1
OP FR1 (rows) by OSN FR1 (columns). Cell entries are frequencies; the row percentages
reported in the original equal each frequency divided by its row total.

OP\OSN      0     1     2     3     4     5     6     7     8     9   Total
0         208    17     3     1     0     0     0     0     0     0     229
1          14   184    32    10     2     1     0     2     0     0     245
2           0    31   298    37     9     1     1     0     0     0     377
3           1     4    43   392    49    11     8     0     0     0     508
4           0     0     8    57   265    46    10     0     0     0     386
5           0     0     0    12    39   168    43     5     1     0     268
6           0     0     0     1     6    43   214    33     9     2     308
7           0     0     0     0     1     6    31   152    39     2     231
8           0     0     0     0     0     0     3    30   128    32     193
9           0     0     0     0     0     0     2     2    16   163     183
Total     223   236   384   510   371   276   312   224   193   199   2,928
Note. Frequency missing = 150,768.
Table A5
AP Calculus AB: Question 2
OP FR2 (rows) by OSN FR2 (columns). Cell entries are frequencies; the row percentages
reported in the original equal each frequency divided by its row total.

OP\OSN      0     1     2     3     4     5     6     7     8     9   Total
0       1,430    93    28     6     3     0     0     0     0     0   1,560
1          76   289    77    13     6     1     0     0     0     1     463
2          14    64   217    50     8     5     2     0     0     0     360
3           4     8    41   197    54    12     2     1     0     0     319
4           4     1    17    54   501    82    18     4     0     1     682
5           2     1     1     9    66   370    98    16     3     0     566
6           1     0     0     0     9    88   307    87    10     2     504
7           0     0     0     0     4    10    81   219    60    10     384
8           0     0     0     0     0     2    14    68   172    38     294
9           0     0     0     0     0     0     0     4    52   132     188
Total   1,531   456   381   329   651   570   522   399   297   184   5,320
Table A6
AP Calculus AB: Question 3
OP FR3 (rows) by OSN FR3 (columns). Cell entries are frequencies; the row percentages
reported in the original equal each frequency divided by its row total.

OP\OSN      0     1     2     3     4     5     6     7     8     9   Total
0         331    31     2     1     0     0     0     0     0     0     365
1          28   253    69     8     1     0     0     0     0     0     359
2           8    65   239    68    11     2     0     0     0     0     393
3           1    20    83   217    66    15     5     1     0     0     408
4           1     2    19    83   170    55    11     1     0     0     342
5           1     0     1    16    60   170    43     7     3     0     301
6           0     1     0     6    14    45   119    32     4     1     222
7           0     0     0     0     2    12    38    71    23     5     151
8           0     0     0     0     0     3     6    21    40    10      80
9           0     0     0     0     0     0     0     3    10    26      39
Total     370   372   413   399   324   302   222   136    80    42   2,660
Table A7
AP Calculus AB: Question 4
OP FR4 (rows) by OSN FR4 (columns). Cell entries are frequencies; the row percentages
reported in the original equal each frequency divided by its row total.

OP\OSN      0     1     2     3     4     5     6     7     8     9   Total
0         625    46     5     1     1     1     0     0     0     0     679
1          36   292    33     7     2     0     0     0     0     0     370
2           3    37   176    45     8     0     0     0     0     0     269
3           2     4    51   171    32    13     1     0     0     0     274
4           1     0     7    61   146    49    15     1     1     0     281
5           0     1     1    12    51   180    40    10     1     0     296
6           1     1     2     2    21    67   213    50    10     0     367
7           0     0     0     1     5    17    43   221    31     4     322
8           0     0     0     0     1     1    10    33   140    10     195
9           0     0     0     0     0     0     2     5    19   102     128
Total     668   381   275   300   267   328   324   320   202   116   3,181
Table A8
AP Calculus AB: Question 5
OP FR5 (rows) by OSN FR5 (columns). Cell entries are frequencies; the row percentages
reported in the original equal each frequency divided by its row total.

OP\OSN      0     1     2     3     4     5     6     7     8     9   Total
0       1,907   157    45    14    10     7     1     1     0     0   2,142
1          94   533    98    33    12    11     3     3     0     0     787
2          42   101   279    67    20     6     3     4     0     0     522
3           5    21    63   163    67    16     6     4     1     0     346
4          11    14    24    66   155    50    18     3     0     0     341
5           0     4    11    27    66   224    47     6     1     0     386
6           0     0     4     5    20    56   273    56     2     1     417
7           0     2     1     3     5     8    54   299    42     4     418
8           0     0     0     0     0     0     9    31    70    17     127
9           0     0     0     1     0     0     2     3    16   101     123
Total   2,059   832   525   379   355   378   416   410   132   123   5,609
Table A9
AP Calculus AB: Question 6
OP FR6 (rows) by OSN FR6 (columns). Cell entries are frequencies; the row percentages
reported in the original equal each frequency divided by its row total.

OP\OSN      0     1     2     3     4     5     6     7     8     9   Total
0       1,117    55     4     1     1     0     0     0     0     0   1,178
1          49   299    42     2     1     0     0     0     0     0     393
2          10    50   559    59     5     1     0     0     0     0     684
3           0    10    74   387    65     8     2     0     0     0     546
4           0     1    10    69   416    56     5     0     0     0     557
5           0     0     0    14    63   263    37     5     1     0     383
6           0     1     0     0    18    43   143    15     2     0     222
7           0     0     0     1     0     6    23    70     9     0     109
8           0     0     0     0     0     0     2    11    39     3      55
9           0     0     0     0     0     0     0     0     3    12      15
Total   1,176   416   689   533   569   377   212   101    54    15   4,142
Appendix B
Correlations Between Operational and OSN Scoring
Tables B1 through B9 show the correlations between free-response (FR) and multiple-
choice (MC) scores in operational (OP) and OSN scoring for the AP English Language and
Composition and the Calculus AB tests, by day of reading.
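The Z column in these tables is consistent with dividing each correlation difference by sqrt(2/(N − 3)), the large-sample standard error of a difference between two Fisher-transformed correlations. A minimal sketch of that computation (our reconstruction from the tabled values, not a formula stated in the report):

```python
import math

def z_for_r_difference(r_op, r_osn, n):
    """Z statistic for the difference between the operational and OSN
    FR-with-MC correlations for the same n responses. Uses
    SE = sqrt(2 / (n - 3)); this reconstruction reproduces the tabled
    Z values to rounding error."""
    se = math.sqrt(2.0 / (n - 3))
    return abs(r_osn - r_op) / se

# Second row of Table B1 (7/20/02, N = 280): difference 0.090, tabled Z = 1.059.
print(round(z_for_r_difference(0.513, 0.603, 280), 3))  # 1.059
```

None of the tabled Z values approaches the conventional 1.96 criterion, which is the basis for the no-difference conclusion in the text.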
Table B1
AP English Language and Composition: Question 1
Date of OSN reading   N read in OSN   Oper FR & MC   OSN FR & MC   Difference between correlations   Z
7/19/02 99 0.458 0.521 0.063 0.439
7/20/02 280 0.513 0.603 0.090 1.059
7/21/02 155 0.478 0.412 –0.065 0.570
7/22/02 191 0.541 0.499 –0.042 0.411
7/23/02 363 0.505 0.444 –0.061 0.820
7/24/02 179 0.442 0.409 –0.033 0.309
7/25/02 289 0.487 0.511 0.023 0.280
7/26/02 297 0.455 0.498 0.043 0.523
7/27/02 427 0.424 0.412 –0.012 0.178
7/28/02 129 0.401 0.430 0.029 0.231
7/29/02 252 0.519 0.507 –0.012 0.132
7/30/02 47 0.173 0.449 0.276 1.294
7/31/02 210 0.402 0.569 0.167 1.702
8/1/02 267 0.511 0.477 –0.034 0.387
8/2/02 140 0.430 0.532 0.102 0.846
8/3/02 577 0.461 0.385 –0.075 1.277
8/4/02 625 0.385 0.401 0.015 0.271
8/5/02 324 0.479 0.451 –0.028 0.353
8/6/02 495 0.447 0.492 0.044 0.693
Table B2
AP English Language and Composition: Question 2
Date of OSN reading   N read in OSN   Oper FR & MC   OSN FR & MC   Difference between correlations   Z
7/19/02 41 0.181 0.150 –0.032 0.139
7/21/02 106 0.321 0.445 0.124 0.892
7/22/02 121 0.338 0.406 0.069 0.527
7/23/02 103 0.473 0.322 –0.151 1.067
7/24/02 426 0.474 0.498 0.024 0.349
7/25/02 124 0.474 0.499 0.026 0.200
7/26/02 279 0.392 0.471 0.080 0.937
7/27/02 322 0.483 0.433 –0.050 0.637
7/28/02 150 0.390 0.416 0.025 0.215
7/29/02 91 0.478 0.481 0.003 0.020
7/31/02 274 0.452 0.411 –0.041 0.472
8/1/02 354 0.497 0.521 0.024 0.318
8/2/02 169 0.401 0.412 0.010 0.095
8/3/02 242 0.412 0.318 –0.094 1.028
8/4/02 208 0.417 0.383 –0.033 0.339
8/5/02 87 0.421 0.342 –0.079 0.511
8/6/02 108 0.517 0.357 –0.160 1.162
8/7/02 69 0.654 0.540 –0.114 0.654
8/8/02 198 0.367 0.298 –0.069 0.684
8/9/02 120 0.444 0.504 0.059 0.454
8/10/02 218 0.417 0.399 –0.018 0.186
8/11/02 51 0.480 0.379 –0.102 0.497
8/12/02 369 0.325 0.405 0.080 1.079
Table B3
AP English Language and Composition: Question 3
Date of OSN reading   N read in OSN   Oper FR & MC   OSN FR & MC   Difference between correlations   Z
7/21/02 72 0.486 0.419 –0.066 0.390
7/22/02 54 0.483 0.503 0.020 0.101
7/23/02 53 0.414 0.438 0.025 0.123
7/24/02 61 0.458 0.493 0.036 0.193
7/25/02 162 0.505 0.502 –0.003 0.029
7/27/02 359 0.474 0.473 –0.001 0.017
7/28/02 222 0.440 0.457 0.017 0.175
7/29/02 138 0.384 0.505 0.121 0.993
7/30/02 217 0.417 0.320 –0.097 1.008
7/31/02 149 0.318 0.309 –0.009 0.075
8/1/02 239 0.393 0.393 –0.001 0.010
8/2/02 191 0.367 0.299 –0.068 0.658
8/3/02 183 0.386 0.371 –0.015 0.144
8/4/02 277 0.405 0.459 0.054 0.627
8/5/02 170 0.479 0.554 0.075 0.687
8/6/02 68 0.416 0.448 0.032 0.183
8/8/02 203 0.521 0.554 0.033 0.326
8/9/02 208 0.360 0.387 0.028 0.279
8/10/02 209 0.388 0.485 0.097 0.985
8/11/02 181 0.419 0.530 0.111 1.050
Table B4
AP Calculus AB: Question 1
Date of OSN reading   N read in OSN   Oper FR & MC   OSN FR & MC   Difference between correlations   Z
7/22/02 69 0.681 0.698 0.017 0.096
7/23/02 405 0.665 0.659 –0.005 0.073
7/25/02 473 0.673 0.679 0.006 0.091
7/26/02 146 0.632 0.633 0.001 0.009
7/27/02 105 0.701 0.691 –0.010 0.071
7/28/02 249 0.683 0.675 –0.007 0.078
7/29/02 491 0.644 0.661 0.017 0.265
7/30/02 179 0.648 0.644 –0.004 0.036
7/31/02 195 0.618 0.610 –0.008 0.078
8/9/02 191 0.626 0.648 0.022 0.216
8/10/02 133 0.690 0.679 –0.011 0.089
8/11/02 66 0.706 0.720 0.014 0.077
8/12/02 225 0.620 0.649 0.029 0.309
Table B5
AP Calculus AB: Question 2
Date of OSN reading   N read in OSN   Oper FR & MC   OSN FR & MC   Difference between correlations   Z
7/22/02 143 0.679 0.681 0.002 0.018
7/23/02 566 0.653 0.652 –0.001 0.009
7/24/02 506 0.616 0.624 0.009 0.138
7/25/02 862 0.635 0.637 0.001 0.031
7/26/02 653 0.651 0.652 0.001 0.027
7/27/02 205 0.633 0.620 –0.013 0.128
7/29/02 1,080 0.626 0.625 –0.001 0.031
7/30/02 210 0.661 0.665 0.004 0.040
7/31/02 734 0.630 0.621 –0.009 0.179
8/8/02 360 0.626 0.637 0.011 0.147
Table B6
AP Calculus AB: Question 3
Date of OSN reading   N read in OSN   Oper FR & MC   OSN FR & MC   Difference between correlations   Z
7/23/02 71 0.682 0.682 0.001 0.004
7/24/02 243 0.551 0.569 0.018 0.197
7/25/02 391 0.635 0.637 0.002 0.031
7/26/02 256 0.625 0.648 0.022 0.252
7/27/02 499 0.598 0.601 0.003 0.040
7/29/02 242 0.648 0.647 –0.001 0.010
7/30/02 142 0.621 0.631 0.010 0.087
8/8/02 275 0.631 0.641 0.010 0.117
8/9/02 464 0.641 0.648 0.007 0.109
8/12/02 77 0.612 0.651 0.039 0.236
Table B7
AP Calculus AB: Question 4
Date of OSN reading   N read in OSN   Oper FR & MC   OSN FR & MC   Difference between correlations   Z
7/25/02 53 0.821 0.818 –0.003 0.017
7/26/02 143 0.666 0.671 0.005 0.040
7/27/02 101 0.651 0.631 –0.019 0.133
7/30/02 55 0.729 0.754 0.026 0.131
7/31/02 196 0.679 0.658 –0.020 0.199
8/1/02 294 0.645 0.632 –0.013 0.160
8/2/02 382 0.696 0.694 –0.002 0.034
8/3/02 162 0.715 0.722 0.007 0.064
8/4/02 116 0.755 0.759 0.004 0.031
8/5/02 552 0.711 0.713 0.002 0.038
8/6/02 355 0.672 0.676 0.004 0.052
8/7/02 463 0.677 0.681 0.005 0.075
8/8/02 148 0.675 0.674 –0.001 0.008
8/10/02 148 0.751 0.763 0.012 0.106
Table B8
AP Calculus AB: Question 5
Date of OSN reading   N read in OSN   Oper FR & MC   OSN FR & MC   Difference between correlations   Z
7/26/02 27 0.504 0.524 0.020 0.069
7/29/02 6 0.430 0.731 0.301 0.368
7/30/02 64 0.649 0.670 0.021 0.117
7/31/02 132 0.671 0.653 –0.018 0.147
8/1/02 308 0.599 0.597 –0.002 0.019
8/2/02 729 0.602 0.604 0.002 0.041
8/5/02 949 0.627 0.651 0.024 0.531
8/6/02 1,017 0.615 0.620 0.005 0.120
8/7/02 674 0.606 0.628 0.022 0.397
8/8/02 515 0.672 0.687 0.016 0.253
8/9/02 13 0.585 0.542 –0.043 0.096
8/10/02 471 0.590 0.546 –0.044 0.678
8/11/02 291 0.556 0.565 0.009 0.107
8/12/02 413 0.593 0.615 0.022 0.313
Table B9
AP Calculus AB: Question 6
Date of OSN reading   N read in OSN   Oper FR & MC   OSN FR & MC   Difference between correlations   Z
7/30/02 189 0.766 0.754 –0.012 0.113
7/31/02 562 0.710 0.713 0.003 0.053
8/1/02 462 0.734 0.728 –0.005 0.080
8/2/02 627 0.697 0.687 –0.010 0.181
8/3/02 607 0.722 0.729 0.007 0.125
8/4/02 116 0.692 0.708 0.016 0.120
8/5/02 180 0.698 0.708 0.010 0.098
8/6/02 704 0.694 0.688 –0.005 0.097
8/7/02 593 0.718 0.722 0.004 0.072
8/8/02 52 0.703 0.683 –0.020 0.099
8/9/02 50 0.836 0.821 –0.015 0.073