
[CANCER RESEARCH 26, 2122-2144, October 1966]

A Study of Variability in the Interpretation of Sputum Cytology Slides

P. G. ARCHER [1], I. KOPROWSKA [2], J. R. McDONALD [3], B. NAYLOR [4], G. N. PAPANICOLAOU [5],¹ AND W. O. UMIKER [6]

[1] Department of Chronic Diseases, Johns Hopkins University School of Hygiene and Public Health, Baltimore, Maryland, [2] Department of Pathology, Hahnemann Medical College, Philadelphia, Pennsylvania, [3] Department of Pathology, Harper Hospital, Detroit, Michigan, [4] Department of Pathology, University of Michigan, Ann Arbor, Michigan, [5] Papanicolaou Cancer Research, Miami, Florida, [6] St. Joseph Hospital, Lancaster, Pennsylvania

Summary

The study reported here was designed to investigate the nature and extent of variability in the malignancy classification of sputum cytologic specimens. The study was conducted in 2 phases. The 1st phase was specifically intended to measure the variability of screeners and cytology centers from multiple reports on the same material. It was shown that screeners differ in their degree of concordance with one another and that the magnitude of this difference is, on the average, somewhat greater for screeners in different centers than for screeners in the same center. The data suggest, however, that a screener will tend to have a greater degree of concordance with his own prior report than with that of another screener.

Cytology center results were assessed by assigning a panel opinion value to each slide based on the data collected and comparing the performance of each cytology center to this standard value. The results indicated generally good agreement on standard negative and standard positive slides, with somewhat less consistent agreement on the intermediate classifications of non-negative slides.

Evidence was adduced to demonstrate different "levels of suspicion" (in terms of the probabilities of classifying slides) among the 4 participating centers, the most marked instance being a pronounced tendency to underread on the part of the screeners from 1 center, relative to the consensus.

The 2nd phase of the study was an attempt to assess the concordance of the opinions of cytologists without the disturbing influence of screener variability. Because of practical considerations, the data are limited to 5 readings of the slides by 3 cytologists. Overall consistency for non-negative slides was, as expected, better than in Phase I.

The results of this phase suggest another approach to the analysis of ordered categoric classification data when a degree of subjective judgment is involved.

A simple mathematical model of the screening process, together with some numeric illustrations, is given.

Although the data presented here are limited and must be interpreted in the light of a number of reservations, it is hoped that the detailed reporting of the results of this study will stimulate other investigators to examine some of the questions raised.

¹Deceased; formerly Director, Papanicolaou Cancer Research, Miami, Florida.

The American Cancer Society-Veterans Administration Cooperative Pilot Study was initiated in 1958 in order to compare the relative efficacy of semiannual sputum cytologic and radiologic screening in the early detection of lung cancer and to determine whether these methods would be practical for mass screening of a nonpatient population group. The general plan of the study was for 6 VA domiciliary establishments to take chest X-ray films and to collect sputa semiannually from all male members of the domiciles. All X-rays were subjected to 3 readings; after first being read in the domicile, they were read independently by 2 radiologists in the radiology center. Slides were prepared from sputum specimens and sent for screening to 1 of 4 participating cytology centers. Screening was done by technicians, and all suspicious cells were examined by a cytologist. When a potentially significant abnormality was discovered by either of the screening methods, the necessary diagnostic examinations were performed; these included further and more detailed radiologic studies, bronchograms, bronchoscopy, and exploratory thoracotomies. For those individuals who were diagnosed as having lung cancer, the method of treatment was determined by and carried out at each VA hospital. The screening program continued for a 3-year period, during which 14,607 persons were screened at least once. A detailed report of the methods used and the results obtained is contained in an accompanying paper.

Early in the study, a preliminary analysis of the screening findings indicated that there was a significant difference in frequencies of abnormal findings for sputum cytologic examination reported from the 4 cytology centers, even though the allocation of slides to the centers from the domiciliaries was random. This suggested the existence of variability in the interpretation of the sputum smears by the screeners, the cytologists, or both. This was not unexpected, since such variability has been observed with the use of many diagnostic technics. However, it suggested the desirability of a study to determine the extent of such variability in the interpretation of sputum cytologic smears. The cooperating cytologists, together with statistics center personnel (P.G.A.), planned and conducted such a study, of which this is a report. It should be emphasized that this study can be considered only a limited evaluation of the true situation, since the complexity of the process of cytologic screening would indicate that a fairly large and elaborate study would be necessary for a thorough evaluation.

2122 CANCER RESEARCH VOL. 26

Downloaded from cancerres.aacrjournals.org on March 29, 2020. © 1966 American Association for Cancer Research.

Process of Cytologic Screening

Cytologic screening is a 2-stage process. Prepared sputum slides are first carefully examined by a cytotechnician (screener) in order to identify cells (or groups of cells) that are sufficiently abnormal to require interpretation by a cytopathologist (cytologist). These cells are identified by an ink dot placed in some systematic manner with respect to the cell. The cytologist then examines the dotted cells and classifies the slide. Sometimes the cytologist will partially or wholly rescreen a slide if an opinion cannot be reached on the basis of the dotted cells. The degree to which this is done varies considerably from person to person and often depends on the type of material under consideration.

It is important to stress that only those slides on which the screener observes apparent abnormalities are routinely sent to the cytologist for interpretation. In other words, if the screener considers a slide to be negative, or for some reason inadequate for classification, that slide is not ordinarily seen by the cytologist. Cytologists in most laboratories make it a practice to review samples of such slides in order to reduce the probability of false negatives. However, in this study of variability, any slide called negative or unsatisfactory by a screener was not seen by the cytologist.
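Because only referred slides reach the cytologist, the screener's referral decision caps what the cytologist can report. A minimal sketch, assuming (for illustration only) that each stage can be summarized for a given slide by a single probability; the function name and the numbers are hypothetical and are not estimates from this study:

```python
def prob_final_non_negative(p_refer: float, p_cyto_non_negative: float) -> float:
    """Probability that a slide is finally reported non-negative.

    Hypothetical two-stage model: the cytologist sees the slide only if the
    screener refers it (probability p_refer); given referral, the cytologist
    calls it non-negative with probability p_cyto_non_negative.
    Slides the screener calls negative or unsatisfactory are never reviewed.
    """
    for p in (p_refer, p_cyto_non_negative):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_refer * p_cyto_non_negative


# Two screeners with different referral rates, same cytologist (illustrative values).
for p_refer in (0.8, 0.5):
    print(p_refer, prob_final_non_negative(p_refer, 0.5))
```

Under this assumption, halving a screener's referral probability halves the chance of a final non-negative report, however the cytologist reads the referred slides.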

In general, one is concerned with 2 types of variability in interpretation, the interindividual and the intraindividual. The former results from 1 individual interpreting a given slide differently from another, and the latter occurs when the same individual interprets the same slide differently at different times. From the description of the process of cytologic screening, the final interpretation of a slide clearly is affected by a complex combination of these 2 types of variability, since both the cytotechnician and the cytologist may show such variability. Moreover, these effects may not be independent, since 1 screener, while calling a slide non-negative, may dot different cells than another screener calling the same slide non-negative, or he may dot different cells on the same slide at different times.

It also seems probable that a more subtle and less easily identified type of interaction occurs in most, if not all, cytology laboratories, namely, an interpersonal correlation or interaction between screener and cytologist, which depends primarily on the amount and type of mutual experience the pair have had as a "team." This might be characterized, for example, by the frequency with which a cytologist felt the necessity of partially or wholly rescreening a slide referred by a given screener. This frequency might well vary from screener to screener within a laboratory and would be very likely to affect differentially the final classification given to slides from the different screeners. This factor is virtually impossible to assess with limited data, and for that reason alone, it will be largely ignored. However, the probable effect of such a factor should be borne in mind in any interpretation of the results.

Superimposed on each of the foregoing considerations is the relationship of the amount of classification variability to the degree of true "non-negativeness" of the material in question. In almost any study of a reasonable size, there will be specimens that so obviously fall into a given class (e.g., negative) that virtually no one with adequate training could fail to recognize this fact. On the other hand, although we tend to think of the degree of "non-negativeness" as a 1-dimensional continuum ranging from "true" negative to "true" positive, the situation is more reasonably thought of as a multidimensional one, dependent upon many considerations, some of which cannot even be explicitly defined, since they result from the intangible quality of experience and undoubtedly change over time in an unpredictable manner.

Nevertheless, for the practical purpose of assessing study results, it is always necessary to force a scheme of classification into a relatively small number of presumably mutually exclusive categories. When this is done, the performance of a person or a laboratory is systematically affected by the "level of suspicion" maintained, which in turn is strongly influenced by the nature of the study and the possible consequences of misclassification. Obviously, in most (probably all) screening studies, false-positive and false-negative reports are viewed quite differently than they would be in a study designed to determine diagnosis, rather than just suspicion.

Method of study

Selection of Slides for Study

As has been indicated in an accompanying paper, a system of classification for sputum cytologic material was devised for this study from existing systems and agreed upon by the participating cytologists. This is given below, together with the corresponding Papanicolaou classification numbers, where relevant:

(a) Negative (Papanicolaou numbers 1, 2)

(b) Ambiguous cells (Sufficiently ambiguous cells to warrant further workup)

(c) Suspect (Potentially significant abnormality; Papanicolaou numbers 3a, 3b)

(d) Positive (Cells consistent with accepted cytologic criteria of malignancy; Papanicolaou numbers 4, 5)

In addition, any slide that was considered unsatisfactory from a technical standpoint could be so classified.

For this variability study, a group of slides was selected in October 1960 from those that had been used in the screening study and stored at 1 of the cytology centers. Two hundred slides were selected, with an effort made to obtain representative samples of each of the 4 classification categories. To do this, a list was prepared of all non-negative screening results that had been received through that date from the chosen center. This list comprised the following number of cases:

Screen report          No. of cases
Positive                    16
Suspect                     44
Ambiguous cells            112
Total                      172

Due to the design of the screening study, each of the above cases consisted of 4 slides. To select individual slides, an experienced screener was asked to rescreen all the positive cases and a systematic sample of the suspect and ambiguous cell cases, and to pick out what was in her judgement the "best" 1 of the 4 in each case. (In 4 cases, 2 of the 4 slides were selected so that 20 positive slides could be obtained.) This group was then sampled to obtain a total of 140 non-negative slides, with the following breakdown: 20 positive; 40 suspect; 80 ambiguous cells.

To obtain negative slides, a list was made of all negative reports on initial screening for those participants whose study numbers ended with the digit 1. This list was systematically sampled in order to obtain 60 such cases. The screener was again asked to select what she considered the "best" 60 negative slides.
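The systematic sampling mentioned above (and for the suspect and ambiguous cell cases) can be sketched as taking every k-th entry of an ordered list after a random start. A minimal sketch, assuming a fixed interval k = N // n; the paper gives neither the list length nor the interval, so the list of study numbers below is hypothetical:

```python
import random


def systematic_sample(items, n, rng=random):
    """Return n items by taking every k-th item after a random start.

    k = len(items) // n, so the sample spans the whole list in its
    existing order (e.g., ordered by participant study number).
    """
    if not 0 < n <= len(items):
        raise ValueError("need 0 < n <= len(items)")
    k = len(items) // n
    start = rng.randrange(k)
    return items[start::k][:n]


# Hypothetical list of negative reports identified by study number
# (the actual list length is not given in the paper).
negatives = [f"study-{i:05d}" for i in range(1, 1201)]
picked = systematic_sample(negatives, 60, random.Random(1960))
print(len(picked), picked[:3])
```

With a random start below k, the slice always contains at least n items, so the sample size is exact.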

From this total of 200 slides, a group of 100 was randomly selected for use in the present study. This was done in such a manner as to result in the following distribution:

Original classification    No. of slides
Positive                        20
Suspect                         30
Ambiguous cells                 30
Negative                        20
Total                          100

It was thought that there would be sufficient disagreement in interpretation so that the use of a round number of slides of each classification would not be obvious to the screeners. This apparently turned out to be the case.
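Selecting 100 of the 200 slides so as to obtain a fixed number in each class amounts to stratified random sampling. A minimal sketch, assuming a simple random draw without replacement within each class (the paper does not say exactly how the draw was made); the class sizes and targets come from the text, while the slide identifiers and function name are hypothetical:

```python
import random

# Pool of 200 selected slides by original classification (counts from the text).
pool_sizes = {"positive": 20, "suspect": 40, "ambiguous cells": 80, "negative": 60}
# Target distribution for the 100 study slides (counts from the text).
targets = {"positive": 20, "suspect": 30, "ambiguous cells": 30, "negative": 20}


def stratified_sample(pool_sizes, targets, rng):
    """Draw a simple random sample within each stratum, without replacement."""
    chosen = {}
    for stratum, size in pool_sizes.items():
        slides = [f"{stratum}-{i}" for i in range(1, size + 1)]  # hypothetical IDs
        chosen[stratum] = rng.sample(slides, targets[stratum])
    return chosen


study_set = stratified_sample(pool_sizes, targets, random.Random(1960))
print({k: len(v) for k, v in study_set.items()}, sum(map(len, study_set.values())))
```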

There are several considerations that should be mentioned at this juncture. First, the above classification is not thought of as representing the "true" situation with respect to the slides under consideration, but is rather an estimate of the "true" situation, based on available information. Furthermore, the above classification was originally based on the cytologist's consideration of all 4 slides, rather than the particular one selected in each case. Also, although these slides were selected from the routine material of the screening study, and can be thought of as representative of the material from the early part of that study, interpretation of this material is undoubtedly affected by the relative frequencies of slides in the various categories. In the present study, the ratios of the numbers of slides from most to least severe are 2:3:3:2, whereas the ratios of cases observed in the screening study up until that time were 1:3:7:500. These points represent fundamental but virtually unavoidable differences between the variability study and the screening study.
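To make the contrast in case mix concrete, the two ratio sets quoted above can be converted to proportions. This is only arithmetic on the quoted ratios; the helper name is illustrative:

```python
from fractions import Fraction


def to_proportions(ratio):
    """Convert a ratio such as (2, 3, 3, 2) to exact proportions summing to 1."""
    total = sum(ratio)
    return [Fraction(r, total) for r in ratio]


categories = ("positive", "suspect", "ambiguous cells", "negative")
study = to_proportions((2, 3, 3, 2))          # variability study
screening = to_proportions((1, 3, 7, 500))    # screening study to date

for name, s, p in zip(categories, study, screening):
    print(f"{name:16s} study {float(s):6.1%}   screening {float(p):6.2%}")
```

In the study set 8 of every 10 slides are non-negative, against 11 in 511 (about 2%) in the screening material.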

Design of Study

Because of the considerations mentioned in the introduction, the study was conducted in 2 phases. Phase I was intended to provide estimates of both intra- and interscreener variability, as well as an estimate of the variability between cytology centers, as represented by the screener-cytologist teams. Phase II was an attempt to bypass the screener in order to estimate the variability between cytologists in their classification of a set of slides that had been permanently dotted.

The selected slides had their identifying marks covered and were randomly numbered at the statistics center and sent to each cytology center in turn. Before being sent back to the same center, the slides were randomly renumbered. It should be noted that the participants at the cytology centers were unaware of the design and of the fact that the same 100 slides were being sent each time. The standard covering letter sent with each shipment of slides mentioned the possibility that some of the slides might be the same from batch to batch. This was done because it was strongly suspected that the participants would have a tenacious memory for some of the more bizarre cell patterns observed on some of these slides, and to convey the erroneous impression that random samples of some larger group of slides were being sent. By treating the material in this manner, and because of the time lapse between any 2 consecutive readings by a given laboratory (about 5 months on the average, with a minimum of 3 months), it is felt that the individual readings are as independent as they can be made.
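The blinding step (covering identifying marks and issuing new random numbers before each shipment) can be sketched as drawing a fresh random permutation for every mailing. A minimal sketch; the identifiers and the function are hypothetical, and the paper does not describe how its random numbers were generated:

```python
import random


def renumber(slide_ids, rng):
    """Assign fresh random code numbers 1..N to the slides for one shipment.

    Returns a dict code -> slide_id; the key is kept only at the
    statistics center, so readers see nothing but the code numbers.
    """
    codes = list(range(1, len(slide_ids) + 1))
    rng.shuffle(codes)
    return {code: slide for code, slide in zip(codes, slide_ids)}


slides = [f"slide-{i:03d}" for i in range(1, 101)]  # hypothetical internal IDs
rng = random.Random(1960)
shipments = [renumber(slides, rng) for _ in range(3)]  # e.g., one key per mailing
print(shipments[0][1], shipments[1][1])
```

Keeping one key per shipment lets readings be re-linked to the same physical slide later, while the codes a reader sees change each time.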

Phase I

Phase I of the study was conducted in 2 rounds; that is, the slides were sent to each laboratory twice. A modification of a balanced incomplete block design was developed so that the 2 screeners in each center could be compared with each other, as well as with any screener from any other center. In addition, a group of 10 slides was reserved for rereading by the same screener on the 2 rounds, in order to compare screeners with themselves.

On each round, the slides were divided into 2 groups of 50 in accordance with the design. Detailed instructions on the classification and recording of results were provided, together with standard recording sheets. Screeners were instructed to examine slides in a manner conforming as closely as possible to the technics used in the screening study, and to refer only those slides considered atypical (and dotted appropriately) to the cytologist for interpretation. The cytologist then examined the referred slides, recorded his findings, and returned all the study material to the statistics center. Before the slides were sent to another center, they were carefully cleaned to remove all dots. Before the start of the 2nd round of Phase I, the slides were randomly renumbered to prevent any possibility of recognition from this source.

On the 2nd round, slides were rearranged so as to be read by the 2nd screener in each center (except for the 10 slides previously mentioned, which were read by the same person twice).

Originally, we had hoped to have only 2 participating screeners from each cytology center. However, because of the long period of time during which this study took place, personnel changes required a 3rd screener to participate in each of 2 of the 4 laboratories. Also, circumstances required that on Round 2 the cytologist's reading for 1 center be done by 1 of the other participating cytologists.

Another peculiarity in the Phase I data results from the fact that one of the cytologists left the screening study prior to the initiation of the variability study. His place was taken in that center by another cytologist, although the screeners remained the same. This provided an opportunity to compare these 2 cytologists with each other on slides with identical screening by screeners with whom both cytologists had worked. Consequently, on each round of screening by that center, 2 independent cytologists' classifications are available.

Phase II

Phase II of the study was designed to compare cytologists directly on a previously dotted set of slides. This phase was also conducted in 2 rounds. However, because of the heavy outside demands on the cytologists, data are only available on 2 rounds for 2 cytologists and 1 round for a 3rd.

Before the beginning of Phase II, the slides were again randomly renumbered and sent to an entirely different cytology center for screening and permanent dotting. Then the slides were sent to each of the participating cytologists for interpretation.

In assessing the results of this phase, there are several points that should be mentioned; some will be elaborated upon later. First, screening and dotting technics may differ considerably between centers. The system of dotting of Phase II slides was sufficiently different from those to which the participating cytologists were accustomed that the data must be considered to be strongly affected by this factor.

In addition, this phase was begun about 1.5 years after the beginning of Phase I (in fact, some of the slides were then over 3.5 years old). Consequently, many of the slides had faded or had been somewhat affected by the many cleanings and the great deal of handling to which they had been subjected during the course of this study.

Results—Phase I

Screeners

Table 1 presents a comparison of the readings by the screeners on the 1st and 2nd rounds for all cytology centers combined. This table was prepared by cross-tabulating the 1st and 2nd round screening readings within each cytology center and adding the 4 tables, cell by cell. A χ² test of independence for this table rejects the null hypothesis with P < 0.001; that is, the results of screening on the 2 rounds are not independent of each other. On the other hand, there is perfect agreement only (6 + 178 + 95)/400 = 279/400 = 70% of the time for the total distribution, or (211 + 95)/400 = 306/400 = 76% if the unsatisfactory and negative categories are grouped. (Grouping amounts to viewing the data as composed of the dichotomous categories: slides referred, and slides not referred to the cytologist.) In other words, for the data as a whole, a screener's decision on a slide would not be upheld by a 2nd independent screening about 25% of the time.
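The agreement figures and the test quoted for Table 1 can be checked directly from its cell counts. A minimal sketch that recomputes the 3-way and 2-way agreement and a Pearson χ² test of independence. The paper does not state which χ² variant was used, so the standard Pearson statistic is an assumption here; note also that one expected count (unsatisfactory on both rounds) falls below 5. The counts are read from Table 1 with rows taken as Round 1 (both statistics are unchanged if the table is transposed), and the function names are illustrative:

```python
import math

# Table 1: rows = Round 1, columns = Round 2,
# categories in order: unsatisfactory, negative, non-negative.
TABLE_1 = [
    [6, 8, 3],
    [19, 178, 24],
    [10, 57, 95],
]


def percent_agreement(table, groups):
    """Count slides whose two readings fall in the same group of categories."""
    group_of = {cat: g for g, cats in enumerate(groups) for cat in cats}
    n = sum(map(sum, table))
    agree = sum(
        table[i][j]
        for i in range(len(table))
        for j in range(len(table))
        if group_of[i] == group_of[j]
    )
    return agree, n


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(len(rows))
        for j in range(len(cols))
    )
    return stat, (len(rows) - 1) * (len(cols) - 1)


def chi_square_sf_even_df(x, df):
    """Upper-tail P for a chi-square variable with an even number of df."""
    if df % 2:
        raise ValueError("closed form used here needs even df")
    half = x / 2
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))


three_way = percent_agreement(TABLE_1, [[0], [1], [2]])
two_way = percent_agreement(TABLE_1, [[0, 1], [2]])
stat, df = chi_square_independence(TABLE_1)
print(three_way, two_way, round(stat, 1), df, chi_square_sf_even_df(stat, df))
```

The agreements come out as 279 and 306 of 400, as in the text, and the Pearson statistic is about 118 on 4 degrees of freedom, far beyond the P < 0.001 level.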

Table 2 summarizes the % agreements for both 3-way and 2-way breakdowns for each center. When the data are viewed in this way, it should be borne in mind that a random choice of category on each screening would produce an expected agreement of 33% in the 3 × 3 tables and 50% in the 2 × 2 tables. The contrasts of intra- and interscreener comparisons for all cytology centers combined are summarized in Tables 1 and 2 of Appendix B.²

TABLE 1
COMPARISON OF SCREENERS' READINGS ON 1ST AND 2ND ROUNDS IN ALL CYTOLOGY CENTERS COMBINED

                                       Round 2
Round 1           Unsatisfactory   Negative   Non-negative   Total
Unsatisfactory            6             8            3          17
Negative                 19           178           24         221
Non-negative             10            57           95         162
Total                    35           243          122         400

TABLE 2
PERCENT AGREEMENT OF SCREENERS' READINGS FOR BOTH 3- AND 2-WAY CLASSIFICATION ON ROUNDS 1 AND 2 IN EACH CYTOLOGY CENTER

                     % agreement of slide classifications
Cytology center         3-Way(a)        2-Way(b)
A                          73              85
B                     [illegible]     [illegible]
C                          69              75
D                          70              77
All centers                70              76

(a) 3-Way classification: unsatisfactory; negative; non-negative.
(b) 2-Way classification: unsatisfactory or negative; non-negative.

TABLE 3
PERCENT AGREEMENT FOR INTRA- AND INTERSCREENER READING COMPARISONS FOR BOTH 3- AND 2-WAY CLASSIFICATION OF SCREENERS' READINGS, BY CYTOLOGY CENTER

                                    3-Way(a)                   2-Way(b)
Cytology center             A    B    C    D   All      A    B    C    D   All
Intrascreener comparison   60   75   90   75    72     90   80  100   80    85
Interscreener comparison   74   65   67   71    69     84   66   72   76    75

(a) 3-Way classification: unsatisfactory; negative; non-negative.
(b) 2-Way classification: unsatisfactory or negative; non-negative.
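The 33% and 50% baselines quoted above are the agreement expected if every reading were a uniform random choice among 3 or 2 categories. A common alternative baseline, not used in the paper, keeps each round's observed category frequencies; it underlies Cohen's kappa. The sketch below computes both for the Table 1 counts; the kappa value is our illustration, not a result reported by the authors, and the function names are illustrative:

```python
from fractions import Fraction

# Table 1 cell counts (rows = Round 1, columns = Round 2):
# unsatisfactory, negative, non-negative.
TABLE_1 = [
    [6, 8, 3],
    [19, 178, 24],
    [10, 57, 95],
]


def uniform_chance_agreement(k):
    """Expected agreement if each reading picks one of k categories at random."""
    return Fraction(1, k)


def marginal_chance_agreement(table):
    """Expected agreement if the two readings were independent but kept
    their observed marginal distributions (the quantity used in Cohen's kappa)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return Fraction(sum(r * c for r, c in zip(rows, cols)), n * n)


def kappa(table):
    """Cohen's kappa: observed agreement corrected for marginal chance agreement."""
    n = sum(map(sum, table))
    observed = Fraction(sum(table[i][i] for i in range(len(table))), n)
    expected = marginal_chance_agreement(table)
    return (observed - expected) / (1 - expected)


print(float(uniform_chance_agreement(3)), float(uniform_chance_agreement(2)))
print(float(marginal_chance_agreement(TABLE_1)), round(float(kappa(TABLE_1)), 2))
```

For Table 1 the marginal baseline is about 46%, well above the 33% uniform figure, because most readings on both rounds are negative.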

The 3-way and 2-way % agreements for Appendix B, Tables 1 and 2, are as follows:

                      3-way % agreement   2-way % agreement
Within screeners             71.7                85.0
Between screeners            69.4                75.0
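The comparison of within- and between-reader proportions as 2 independent samples can be illustrated with a pooled two-proportion z-test. The paper does not say which test was used, so this is an assumed method. The counts are also our reconstruction, not figures stated in the paper: 43/60 and 51/60 reproduce 71.7% and 85.0%, 236/340 and 255/340 reproduce 69.4% and 75.0%, and they add up to the 279 and 306 agreements of Table 1.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for the difference of two independent proportions,
    using the pooled proportion for the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Counts back-calculated (our assumption) from the reported percentages and the
# Table 1 totals: 60 within-screener and 340 between-screener slide pairs.
within_3, between_3 = (43, 60), (236, 340)   # 71.7% and 69.4%
within_2, between_2 = (51, 60), (255, 340)   # 85.0% and 75.0%

z3, p3 = two_proportion_z_test(*within_3, *between_3)
z2, p2 = two_proportion_z_test(*within_2, *between_2)
print(f"3-way: z = {z3:.2f}, P = {p3:.2f}")
print(f"2-way: z = {z2:.2f}, P = {p2:.2f}")
```

Under these assumptions neither difference reaches the 5% level (z of about 0.35 for 3-way and 1.68 for 2-way), in line with the conclusion stated in the paper.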

If the within- and between-reader proportions are tested as 2 independent samples, in neither the 3-way nor the 2-way case are the differences found to be statistically significant. Intra- and interscreener comparisons for each of the cytology centers are summarized in terms of percentage agreements in Table 3. In each case, the within-reader percentages are based upon either 10 or 20 slides/center and are therefore less reliable than the between-reader percentages, which are based on either 80 or 90 slides/center. The within-reader % agreement is, with only 1 exception, consistently higher than the between-reader % agreement. Furthermore, this statistic is not consistent between centers in either the 3-way or 2-way classifications.

²Appendix B, Table 2, was formed by keeping a given screener on the same table axis on which he first appeared, regardless of which slides he saw (i.e., the Round 1 readings on a given slide are not all on the same axis). A table of Round 1 vs. Round 2 readings on all slides, disregarding screeners, could be obtained by subtracting Appendix B, Table 1, from Table 1, cell by cell.

A way of assessing these results is to perform a sign test on disagreements between the 2 readings (4). A total of 14 tests are possible. Since 3 screeners participated in Centers A and C, only 6 screeners can be compared on 2 readings of the same slides. The remaining 8 comparisons are between different screeners within the same center. These detailed tabulations are not presented in this report, but performance of these 14 tests reveals only 1 significant difference: that between the 2 screeners in Center B (P < 0.01). This leads to the conclusion that the screeners in Center B differ in their probabilities of calling slides non-negative.

Although the number of slides for within-screener comparisons is small, it appears that the individual screener's average probability of calling a slide non-negative is constant from round to round. With the exception of Center B, this average probability is the same for screeners within the same laboratory.
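The sign test mentioned above uses only the slides on which the 2 readings disagree about referral. A minimal sketch of an exact two-sided version (equivalent to the exact form of McNemar's test), assuming readings are reduced to the 2-way classification (non-negative vs. not); the counts in the example are hypothetical, since the detailed tabulations are not presented:

```python
from math import comb


def sign_test(b, c):
    """Exact two-sided sign test on discordant pairs.

    b: slides called non-negative on reading 1 but not on reading 2
    c: slides called non-negative on reading 2 but not on reading 1
    Under H0 (equal probability of a non-negative call), b ~ Binomial(b + c, 1/2).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# Hypothetical counts for one pair of readings (the paper does not report them).
print(sign_test(12, 2))
```

Concordant slides (both readings non-negative, or both not) carry no information about a difference in referral probability, which is why they drop out.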

The comparisons of screeners' readings between each pair of cytology centers are summarized in terms of % agreement in Table 4. It appears from these data that the cytology centers are not consistent with each other, the agreement ranging from 58 to 75.5% for the 3-way classification and from 69 to 80% for the 2-way classification.

Another way of viewing the same data is presented in Appendix B, Tables 3 and 4. These tables show, for each round separately, the patterns of slide screening classifications for the 4 cytology centers and the number of slides on which each pattern was observed. For example, from Appendix B, Table 3, we see that on Round 1, 11 slides were screened non-negative by all 4 cytology centers; 7 slides were screened non-negative by Centers B, C, and D, but either negative or unsatisfactory by Center A; etc.

Data presented in this way can be analyzed statistically by a method described by Cochran (4). The rationale behind the test can be described as follows. Having been examined by all 4 cytology centers, each slide will have received some specific number of between 0 and 4 non-negative readings on each round. If each cytology center is equally likely to have given any non-negative reading, then those slides with, say, 1 non-negative reading should have received, on the average, one-fourth of their non-negative readings from each center. Similarly, for those slides with 2 non-negative readings, each pair of cytology centers is considered equally likely to have produced the pair of readings, etc. The method of testing is essentially an analysis-of-variance extension of the χ² test for 2 × 2 tables of matched samples. Analysis of Appendix B, Tables 3 and 4 gives χ² values of 63.5 and 51.8, respectively, each with 3 degrees of freedom. In both cases, the null hypothesis of equal probability for each cytology center is rejected with P < 0.001.

TABLE 4
PERCENT AGREEMENT BETWEEN SCREENERS IN DIFFERENT CYTOLOGY CENTERS FOR BOTH 3- AND 2-WAY CLASSIFICATIONS

                     3-Way classification(a)      2-Way classification(b)
Cytology center        A       B       C            A       B       C
B                    66.0                          73.0
C                    68.5    75.5                  79.0    80.0
D                    58.0    63.5    65.5          69.0    71.0     …

(a) 3-Way classification: unsatisfactory; negative; non-negative.
(b) 2-Way classification: unsatisfactory or negative; non-negative.

TABLE 5
DISTRIBUTION OF ALL NON-NEGATIVE SCREENERS' READINGS BY CYTOLOGY CENTER

No. of non-negative readings/slide     0    1    2    3    4    5    6    7    8   Total   % of total non-negative readings
Cytology center A                      0    0    0    0    0    4    4    9   18     35     13
Cytology center B                      0   10    9    5    8    9    8   16   18     83     29
Cytology center C                      0    3    6    4    6    9    8   15   18     69     24
Cytology center D                      0   11   13    6   10   13   10   16   18     97     34
No. of slides                         22   24   14    5    6    7    5    8    9    100
Total no. of non-negative readings     0   24   28   15   24   35   30   56   72    284    100
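A minimal sketch of Cochran's Q test as described above, assuming the readings for one round are coded as a 0/1 matrix with one row per slide and one column per cytology center (1 = non-negative). Q is referred to the χ² distribution with k − 1 degrees of freedom (3 here, for 4 centers). The χ² tail probability is computed with a standard closed form so that only the Python standard library is needed. Appendix B, Tables 3 and 4 are not reproduced in this section, so the example rows are hypothetical and will not reproduce the values 63.5 and 51.8.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi-square with df degrees of freedom > x), df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        sf, k = exp(-x / 2), 2        # df = 2: simple exponential tail
    else:
        sf, k = erfc(sqrt(x / 2)), 1  # df = 1: complementary error function
    while k < df:                      # recurrence raising df by 2 at each step
        sf += (x / 2) ** (k / 2) * exp(-x / 2) / gamma(k / 2 + 1)
        k += 2
    return sf


def cochran_q(rows):
    """Cochran's Q for a slides-by-centers table of 0/1 readings.

    Returns (Q, degrees of freedom, P). Slides read alike by every center
    (all 0 or all 1) contribute nothing to Q.
    """
    k = len(rows[0])
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    denom = k * n - sum(r * r for r in row_totals)
    if denom == 0:
        raise ValueError("no slide has discordant readings; Q is undefined")
    q = (k - 1) * (k * sum(c * c for c in col_totals) - n * n) / denom
    return q, k - 1, chi2_sf(q, k - 1)


if __name__ == "__main__":
    # Hypothetical round: columns are Centers A, B, C, D; 1 = non-negative.
    slides = [
        [0, 1, 1, 1],
        [0, 0, 1, 1],
        [0, 1, 0, 1],
        [0, 0, 0, 1],
        [1, 1, 1, 1],
        [0, 0, 0, 0],
    ]
    print(cochran_q(slides))
```

The row totals play the role of the "number of non-negative readings per slide" in the rationale above: slides with the same total are compared only with one another, which is why unanimous slides drop out.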

Table 5 demonstrates more clearly the reason for rejection of the hypothesis. During the course of the 2 rounds of screening, each slide received a total of 8 readings (2 from each cytology center), and therefore had from 0 to 8 independent non-negative readings. Table 5 shows how these readings were distributed among the 4 centers for slides with each total number of non-negative readings. For example, on the 7 slides which had 5 non-negative readings each (giving a total of 7 × 5 = 35 non-negative reports), 4 of the readings came from Center A, 9 each from Centers B and C, and 13 from Center D. It is apparent from this table that there is a definite gradation among cytology centers in the propensity of a screener to refer a given slide to a cytologist, with over twice as many non-negative screening opinions from Center D as from Center A. Of course, a high or low level of suspicion on the part of a screener has both its advantages and drawbacks. If it is high, the amount of work referred to the cytologist for an opinion is correspondingly high, while if it is low, the cytologist may not have the opportunity to view slides which he might call non-negative if they were referred to him. This point will be discussed further in a later section.

In summary, the data show significant differences between the screeners in the 4 cytology centers, with somewhat less variability, on the average, between screeners within centers than between centers, and it appears that the same screener will tend to agree with his own prior opinion of a slide more often than with another screener's opinion.
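The arithmetic behind the comparison of centers in Table 5 can be checked in a few lines. The sketch below uses the Table 5 counts for slides with 1 to 8 non-negative readings; the Center C entry for slides with 4 non-negative readings is read as 6, the value forced by the row and column totals. It verifies that each column adds up to the number of slides times the number of non-negative readings per slide, then compares each center's share of all non-negative readings with the 25% expected if every center were equally likely to call a slide non-negative. The dictionary layout is an illustrative choice.

```python
# Table 5 counts: non-negative screeners' readings contributed by each center,
# for slides with 1, 2, ..., 8 non-negative readings (2 rounds x 4 centers).
TABLE_5 = {
    "A": [0, 0, 0, 0, 4, 4, 9, 18],
    "B": [10, 9, 5, 8, 9, 8, 16, 18],
    "C": [3, 6, 4, 6, 9, 8, 15, 18],
    "D": [11, 13, 6, 10, 13, 10, 16, 18],
}
SLIDES = [24, 14, 5, 6, 7, 5, 8, 9]  # slides with 1..8 non-negative readings


def center_shares(table):
    """Return each center's % share of all non-negative readings."""
    grand_total = sum(sum(row) for row in table.values())
    return {c: 100.0 * sum(row) / grand_total for c, row in table.items()}


if __name__ == "__main__":
    # Each column must account for (number of slides) x (readings per slide).
    for per_slide, n_slides in enumerate(SLIDES, start=1):
        column = sum(row[per_slide - 1] for row in TABLE_5.values())
        assert column == per_slide * n_slides
    for center, share in center_shares(TABLE_5).items():
        print(f"Center {center}: {share:.1f}% (25.0% expected under equal probability)")
    print(f"D/A ratio: {sum(TABLE_5['D']) / sum(TABLE_5['A']):.2f}")
```

This prints shares of 12.3, 29.2, 24.3, and 34.2% for Centers A through D and a D/A ratio of 2.77, in line with "over twice as many" non-negative opinions from Center D as from Center A.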

Cytology Centers

Analysis of these data could be carried out by formulating a mathematical model which includes all parameters to be estimated and using some standard technic to obtain the estimates desired. However, a model adequate to describe the data under discussion would be extremely complicated, and it could probably be solved only by elaborate numeric technics, if at all. Furthermore, the total amount of data available would probably not suffice to furnish stable estimates in any model of appropriate complexity.

An alternative method of handling these data, and one which should provide adequate insight into the questions of interest, is one similar to that used by Yerushalmy (14) in the analysis of multiple readings of X-ray films. The basic idea is to assign a "probable" or "standard" value to each slide in the study, based on a consideration of the results obtained on all readings, and to assess the performance of participants relative to this standard. This procedure is justifiable from several points of view. First, the idea of a screening study is predicated on the concept of an underlying continuum of possible screening results, ranging from negative to positive. As was pointed out earlier, this is an oversimplification of the true situation, but presumably if a single slide were classified independently a large number of times by one or several competent cytologists, there would be some category into which it would be placed more often than any other. It should be stressed that the standard category for each slide arrived at in this manner cannot be interpreted as representing the unknown "true" value of the slide in question.

However, it can be considered an estimate of that value. On the other hand, when a standard category is arrived at by pooling the opinions of a number of different cytologists, this fact should be borne in mind whenever an individual cytologist is compared with this "standard," for if criteria or "levels of suspicion" differ greatly between individuals, we will in many cases be comparing a person's evaluation of a slide with a "standard" with which he may be in complete disagreement. The present analysis nevertheless utilizes a combination of screeners' and cytologists' results to arrive at such a standard classification.

Each slide has been screened twice at each of 4 cytology centers, giving a total of 8 readings per slide. Each reading may be in any 1 of the categories: unsatisfactory, negative, non-negative. A standard classification based on these readings was defined as follows:

1. The standard category is negative if 5 or more cytology center readings were negative and there were fewer than 3 non-negative readings.

2. The standard category is non-negative if there were 4 or more non-negative cytology center readings (regardless of the categories of the other 4 readings).

3. Other slides were categorized individually by a method described below.

The 1st 2 of the above criteria were sufficient to classify 89 of the 100 slides, as shown in Appendix B, Table 5.
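Rules 1 and 2 translate directly into code. In this sketch, assumed for illustration, each slide's 8 cytology center readings are given as strings, and slides that neither rule classifies are returned as None, to be handled by the individual method of rule 3. Because rule 1 requires fewer than 3 non-negative readings and rule 2 requires at least 4, the 2 rules can never both apply to the same slide.

```python
from collections import Counter
from typing import Optional, Sequence


def standard_gross_category(readings: Sequence[str]) -> Optional[str]:
    """Apply the 2 automatic rules to one slide's cytology center readings.

    readings: 8 values, each "unsatisfactory", "negative" or "non-negative".
    Returns "negative", "non-negative", or None if the slide must be
    categorized individually (rule 3).
    """
    counts = Counter(readings)  # missing categories count as 0
    if counts["non-negative"] >= 4:                              # rule 2
        return "non-negative"
    if counts["negative"] >= 5 and counts["non-negative"] < 3:   # rule 1
        return "negative"
    return None                                                  # rule 3


if __name__ == "__main__":
    # 5 negative and 3 non-negative readings: neither rule applies.
    slide = ["negative"] * 5 + ["non-negative"] * 3
    print(standard_gross_category(slide))  # None
```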

In addition to this gross classification, it was desired to classify each of the non-negative slides into 1 of the 3 non-negative categories: ambiguous cells, suspect, positive. To do this, a linear scale was assigned and an average was taken of the non-negative readings by the cytologists who examined the slides. In those cases in which a slide was called negative by a cytologist, even though it was referred to him as non-negative, the negative reading was included in the average. Otherwise, negative readings were not included, since they were solely screeners' opinions.

This procedure can easily be shown to approximate the least-squares solution to the problem of assignment, provided equal weight is given to each cytologist's reading and to each category of classification in an additive linear model. That is to say, the procedure assumes that the categories negative, ambiguous cells, suspect, and positive are equally spaced along a scale of suspicion for carcinoma. This is an assumption with which one might disagree; however, equal spacing was used in the absence of a reasonable alternative.
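The equal-spacing assumption can be made concrete as follows. The sketch codes negative, ambiguous cells, suspect, and positive as 0, 1, 2, and 3, averages a slide's cytologist readings (including any negative reading a cytologist gave to a referred slide), and reports the category nearest the mean. The mean is the value that minimizes the sum of squared deviations from the readings, which is the least-squares property mentioned above. The text does not say how averages falling between categories were assigned, so rounding to the nearest category with halves rounded up is an assumption of this sketch.

```python
import math

# Equally spaced codes for the ordered categories (the assumption in the text).
SCALE = ["negative", "ambiguous cells", "suspect", "positive"]
CODE = {name: i for i, name in enumerate(SCALE)}


def standard_non_negative_category(cytologist_readings):
    """Average the cytologists' readings on a 0-3 scale and map to a category.

    Only readings actually made by cytologists should be passed in; negative
    screener readings are excluded, as described in the text. Halfway values
    are rounded up (an assumption; the rounding rule is not specified).
    """
    if not cytologist_readings:
        raise ValueError("slide has no cytologist readings")
    mean = sum(CODE[r] for r in cytologist_readings) / len(cytologist_readings)
    return SCALE[math.floor(mean + 0.5)], mean


if __name__ == "__main__":
    print(standard_non_negative_category(["positive", "positive", "suspect"]))
    print(standard_non_negative_category(["ambiguous cells", "suspect"]))
```

Here the first slide averages about 2.67 and is assigned to positive; the second averages exactly 1.5 and, under the halves-up assumption, is assigned to suspect.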

The method just described was also used to classify the 11 slides which were not automatically classified (Appendix B, Table 5). Each of the 8 slides with 3 non-negative readings was ultimately classified into 1 of the non-negative categories using the above system, while the other 3 were classified as unsatisfactory.

The distribution of slides among the 5 screening categories, after the standard values had been assigned, was as follows:

Standard category      No. of slides
Unsatisfactory               3
Negative                    61
Ambiguous cells              6
Suspect                     16
Positive                    14
Total                      100

Actually, another pair of readings was available on each slide. As mentioned earlier, 1 of the original cytologists had left the study and had been replaced at his center by a new cytologist, although the screeners remained throughout. Consequently, during the 1st phase of this variability study, whenever slides were screened at that center, the non-negative slides were referred to each of the 2 cytologists for independent evaluation. However, because the screeners remained the same for these evaluations, it was felt that the "cytology center" results for this 5th "center" could not be considered independent, and they were therefore not used for classification purposes. This 5th cytologist is later referred to as Cytologist D1.

There is another point with respect to the standard classification which might be thought of as having introduced some systematic bias into it. Circumstances required that the cytologist's readings of Round 2 for Center C be given by the cytologist for Center D. Cytologist D had worked with the screeners at Cytology Center C. Later, we will examine the possible effect of this substitution.

Finally, it should again be emphasized that no special effort was made to obtain slides of the best quality for this study. In fact, the relatively small number of non-negative slides that had been obtained by the time this study was initiated would have militated against such selection, even if it had been attempted. As a consequence, some of the opinions rendered by both screeners and cytologists in this study were accompanied by remarks on the quality of the slides submitted. These remarks indicated that the readers would have liked to have had better slides on which to base an opinion, but were willing in some instances to assign a screening category other than unsatisfactory in the interest of completing the variability study. This adds a quality of uncertainty to those slides which received one or more unsatisfactory readings, and possibly to some others.

TABLE 6
CLINICAL CONDITION OF PATIENTS AS OF JULY 1963 BY ORIGINAL AND STANDARD CLASSIFICATION OF SLIDES

Clinical condition of patients as of July 1963: (1) proven lung cancer; (2) suspect lung cancer; (3) cancer, upper respiratory tract; (4) other pulmonary disease; (5) cancer of other sites; (6) negative examination; (7) total with known clinical condition; (8) clinical condition unknown; (9) total.

Classification(a)      (1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)   (9)
Original
  Unsat                  0     0     0     0     0     0     0     0     0
  Neg                    0     0     0     1     1     1     3    17    20
  Amb                    1     0     1     2     0     7    11    19    30
  Susp                   4     0     3     4     1     4    16    14    30
  Pos                   11     4     1     2     0     2    20     0    20
  Total                 16     4     5     9     2    14    50    50   100
Standard
  Unsat                  1     0     0     0     0     1     2     1     3
  Neg                    3     0     1     5     1     8    18    43    61
  Amb                    0     0     0     0     1     2     3     3     6
  Susp                   6     1     3     2     0     2    14     2    16
  Pos                    6     3     1     2     0     1    13     1    14
  Total                 16     4     5     9     2    14    50    50   100

(a) Original and standard classification: Unsat, unsatisfactory; Neg, negative; Amb, ambiguous cells; Susp, suspect; Pos, positive.

TABLE 7
PERCENT DISTRIBUTION OF ALL CYTOLOGY CENTER READINGS FOR EACH STANDARD CLASSIFICATION CATEGORY

                                          Standard classification
Cytology center reading    Unsatisfactory   Negative   Ambiguous cells   Suspect   Positive   All categories
Unsatisfactory                   54             6             8              5         4             7
Negative                         29            87            50             30         7            63
Ambiguous cells                   8             4            29             15         2             7
Suspect                           8             2            12             29        18             9
Positive                          0             0             0             22        70            13
All readings (No.)               24           488            48            128       112           800

Appendix B, Table 6, shows the comparison between the standard classification and the original classification of the slides in the larger screening study. It is interesting to note that, with only 4 exceptions, the slides were originally in at least as high a category as the standard. This situation is undoubtedly due, in large part, to the fact that the original classification was based on a total of 4 slides, rather than on the single slide chosen for this study.

Another question of interest is the degree of concordance between the slide classification and the clinical condition of the person from whom the material was collected. This is shown in Table 6 for both the original classification and the standard classification. It should be noted that 4 people appear twice in this table, since 2 slides were taken from each of 4 people in order to increase the number of slides that were originally classified as positive. In this table, the effect of the systematic difference between the original classification and the standard can be seen in the fact that all 25 cases of either proved or suspect lung cancer or upper respiratory tract cancer are from the original non-negative group, while only 20 of these cases are among the standard non-negative groups.

Of the 96 people represented by these 100 slides, 32 were known to be dead as of July, 1963. This includes all 4 people from whom 2 slides were taken. Therefore, of the 100 slides, 36 represent people who were known to be dead. Of these, autopsies are known to have been performed on 15 persons, but a number of the cases of cancer had been confirmed by biopsy before death.

Table 7 shows the % distribution of all cytology center reports for each category of the standard classification. These can be thought of as approximations to the probability that a cytology center would classify a slide into a given category, given the value of the slide which would be assigned if a group opinion were elicited from a panel of experts.³

The patterns displayed are more easily seen graphically and are presented in Chart 1. The cross-hatched bars in this histogram show the proportion of readings which agree exactly with the standard classification, whereas the dashed bars show, for the non-negative standard categories, the total proportion of non-negative cytology center readings, even though these may not have agreed exactly with the given standard category.

From these data, one would conclude that about 87% of negative slides would be so classified, while 6% would be called non-negative. If the unsatisfactory readings are discounted, these figures would be 93% and 7%, respectively. On the standard ambiguous cell slides, 50% of the readings were negative, while 29% were ambiguous and 12% suspect, giving a total non-negative proportion of 41%. Again, if the unsatisfactory readings are disregarded, these proportions become 55% negative, 32% ambiguous, and 14% suspect, for a total of 46% non-negative. Evidently, opinions are divided about equally for this category of slide. The situation improves with the suspect category to the extent that 30% of the readings are negative. While the exact agreement category is again 29%, 15 and 22% of these standard suspect slides are called ambiguous cells and positive, respectively, giving a total of 67% non-negative readings. With the unsatisfactory readings omitted, the proportions for the suspect category become: negative, 31%; ambiguous cells, 16%; suspect, 19%; and positive, 23%, for a total of 69% non-negative. For standard positive slides, the proportion of negative readings drops to 7%, while the non-negative readings total 90%, as follows: ambiguous, 2%; suspect, 18%; positive, 70%. These slides produced only 4 unsatisfactory readings, so even with these readings omitted the proportions are relatively unaffected, with suspects increasing to 19% and positives to 73%, giving a total non-negative proportion of 93%.

³ This interpretation is subject to the qualification that the cytologist in a given laboratory sees only slides which have been returned by his screeners. He may either overrule their opinion of non-negative, or accept it and further classify the slide.

[Chart 1: one bar-histogram panel per standard classification category (Unsatisfactory, Negative, Ambiguous cells, Suspect, Positive); horizontal axis, cytologists' readings (U, N, A, S, P, and N-N); legend: exact agreement with standard; agreement on non-negatives.]

CHART 1. Percent distribution of all cytology center readings for each standard classification category. U, unsatisfactory; N, negative; A, ambiguous cells; S, suspect; P, positive; N-N, non-negative.
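Discounting the unsatisfactory readings amounts to renormalizing each column of Table 7 over the remaining categories. The sketch below does this for the standard negative and standard positive columns. Because it starts from the published percentages, which are themselves rounded, the results agree with the figures quoted above only to within about a percentage point. The function names are illustrative.

```python
def drop_unsatisfactory(column):
    """Renormalize a % distribution after removing unsatisfactory readings."""
    kept = {k: v for k, v in column.items() if k != "unsatisfactory"}
    total = sum(kept.values())
    return {k: 100.0 * v / total for k, v in kept.items()}


def non_negative_total(column):
    """Sum of the ambiguous cells, suspect and positive entries."""
    return sum(v for k, v in column.items() if k not in ("unsatisfactory", "negative"))


# Columns of Table 7 (percent of cytology center readings).
STANDARD_NEGATIVE = {"unsatisfactory": 6, "negative": 87, "ambiguous cells": 4,
                     "suspect": 2, "positive": 0}
STANDARD_POSITIVE = {"unsatisfactory": 4, "negative": 7, "ambiguous cells": 2,
                     "suspect": 18, "positive": 70}

if __name__ == "__main__":
    for name, col in [("negative", STANDARD_NEGATIVE), ("positive", STANDARD_POSITIVE)]:
        r = drop_unsatisfactory(col)
        print(f"standard {name}: negative {r['negative']:.1f}%, "
              f"non-negative {non_negative_total(r):.1f}%")
```

For the standard negative column this gives 93.5% negative and 6.5% non-negative (quoted above as 93% and 7%); for the standard positive column, 7.2% negative and 92.8% non-negative (quoted as 93%).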

These results are about what one would have expected. The clearly negative and clearly positive slides tend to be so classified (at least on a plus-minus scale) about 90% of the time, while the intermediate categories produce less consistent results, which tend to improve with the degree of "non-negativeness." The overall results appear encouraging. However, a similar presentation for the individual centers reveals a fair amount of inconsistency within each category of slide classification.

The data for each center are summarized in Chart 2. It can be seen from this figure that the same general pattern as that presented in Chart 1 is maintained by 3 of the 4 cytology centers, whereas Center B has a more uniform distribution over all categories, whether the exact agreement or the total non-negative agreement bars are observed. Apparently this center, on the average, more nearly represents the consensus than any other. If the standard classification is interpreted as representing the "true" situation, it is seen from Table 8 that Center B, while having the highest level of suspicion on the average, also produced the greatest number of "false-positive" but the smallest number of "false-negative" readings.
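Treating the standard classification as the "true" value, the counts in Table 8 can be tallied from (standard category, reading) pairs. The definitions below are an assumption, though one consistent with Table 7: a "false positive" is a non-negative reading (ambiguous cells, suspect, or positive) on a standard negative slide, and a "false negative" is a negative reading on a standard non-negative slide. Unsatisfactory readings count as neither, and standard unsatisfactory slides are skipped, as in the note to Table 8. As a check, 50% of 48, 30% of 128, and 7% of 112 readings (Table 7) give about 70 false negatives, the total shown in Table 8. The example pairs are hypothetical.

```python
NON_NEGATIVE = {"ambiguous cells", "suspect", "positive"}


def count_false_readings(pairs):
    """Count 'false-positive' and 'false-negative' readings.

    pairs: iterable of (standard_category, reading) tuples, where both values
    are one of "unsatisfactory", "negative", "ambiguous cells", "suspect",
    "positive". Slides whose standard category is unsatisfactory are skipped.
    """
    false_pos = false_neg = 0
    for standard, reading in pairs:
        if standard == "unsatisfactory":
            continue
        if standard == "negative" and reading in NON_NEGATIVE:
            false_pos += 1
        elif standard in NON_NEGATIVE and reading == "negative":
            false_neg += 1
    return false_pos, false_neg


if __name__ == "__main__":
    example = [  # hypothetical readings
        ("negative", "negative"),
        ("negative", "suspect"),                # false positive
        ("suspect", "negative"),                # false negative
        ("positive", "positive"),
        ("unsatisfactory", "positive"),         # ignored
        ("ambiguous cells", "unsatisfactory"),  # neither
    ]
    print(count_false_readings(example))  # (1, 1)
```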

Inasmuch as the standard classifications are based on the consensus of the total group, Chart 2 depicts the relative levels of suspicion among the 4 cytology centers. All centers appear to have about the same opinion on the standard negative slides, the cytology center negative percentages ranging from 83 to 93, with percentages of "over-calling" ranging from 1 to 15%.

The standard positive category is also comparatively stable, relative to the other non-negative categories. Centers A, B, and C are all about equal in percentage of exact agreement with the standard, though Center A produces fewer total non-negative readings in this category than either Center B or C. Center D is substantially higher in the exact agreement category than any of the other centers, but about the same as Center C in total percentage of non-negative readings.

The standard ambiguous cell and suspect categories show more divergence of opinion. Center A appears to "under-read" in both of these categories, relative to the other centers, as shown by the large "negative" bars for that center. As will be shown later, there is some evidence to suggest that most, if not all, of this "under-reading" can be attributed to the screeners for that center. As was noted earlier, Center B appears to conform more nearly than any other center to the group opinion, showing the highest % agreement with respect to both the individual category and the total percentage of non-negative readings for the standard ambiguous cell and suspect categories. Center C shows the highest percentage of "under-reading" in the standard ambiguous cell group. Data to be presented later suggest that this represents a mixture of both screener and cytologist "under-reading," relative to the consensus. Center D is comparable to Center C for the exact agreement bars in the standard ambiguous cell and suspect categories, as well as for the total non-negative percentage in the suspect group, but shows a generally higher level of suspicion for the ambiguous cell group, as shown by the higher "total non-negative" bar for that category.

[Chart 2: bar-histogram panels for each cytology center (A to D), one per standard classification category (Negative, Ambiguous cells, Suspect, Positive); horizontal axis, cytology center readings (U, N, A, S, P, and N-N); legend: exact agreement; non-negative agreement.]

CHART 2. Percent distribution of readings from each cytology center for each standard classification category. For abbreviations, see legend to Chart 1.

TABLE 8
NO. OF "FALSE-POSITIVE" AND "FALSE-NEGATIVE" READINGS USING STANDARD CLASSIFICATION AS REPRESENTING THE "TRUE" CLASSIFICATION

                                       Cytology center
                                    A      B      C      D     Total
No. of non-negative readings       35     81     60     64      240
No. of "false positives"            1     18      6      7       32
No. of "false negatives"           31     10     18     11       70

Note: The standard classification "unsatisfactory" was omitted in computing these numbers.

For some reason, the percentage of slides called non-negative was consistently higher on Round 2 than on Round 1, except for Centers D and D1, as shown in Appendix B, Table 7.

Table 9 contains a comparison of Round 1 readings with Round 2 readings for all the cytology centers combined (A through D). The percentages in this table represent the % distribution of the readings in Round 2, given the Round 1 readings; they are estimates of the probabilities of various classifications on a 2nd reading, given the value of a first reading.

TABLE 9
PERCENT DISTRIBUTION OF THE READINGS ON ROUND 2 WITHIN EACH CATEGORY OF ROUND 1 READINGS FOR ALL CYTOLOGY CENTERS COMBINED

                                           Round 1 reading
Round 2 reading       Unsatisfactory   Negative   Ambiguous cells   Suspect   Positive   All Round 1 readings
Unsatisfactory              22             3             7              0         2              4
Negative                    51            78            35             26         0             60
Ambiguous cells              8             7            10              7         4              7
Suspect                     14             7            27             15        27             12
Positive                     5             4            21             52        67             16
All readings (No.)          37           262            29             27        45            400
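The percentages in Table 9 are column-wise conditional distributions: for each Round 1 category, the share of those readings that fell into each Round 2 category. A sketch of the computation from paired readings is given below; the function name is illustrative and the example pairs are hypothetical.

```python
from collections import Counter, defaultdict


def round2_given_round1(pairs):
    """Return {round1_category: {round2_category: percent}} from paired readings.

    pairs: iterable of (round1_reading, round2_reading) for the same slide and
    center. Each inner dictionary sums to 100 (up to floating-point error).
    """
    by_first = defaultdict(Counter)
    for first, second in pairs:
        by_first[first][second] += 1
    return {
        first: {second: 100.0 * n / sum(counts.values()) for second, n in counts.items()}
        for first, counts in by_first.items()
    }


if __name__ == "__main__":
    pairs = ([("negative", "negative")] * 3 + [("negative", "suspect")]
             + [("suspect", "positive")] * 2)
    print(round2_given_round1(pairs))
    # {'negative': {'negative': 75.0, 'suspect': 25.0}, 'suspect': {'positive': 100.0}}
```

Conditioning on the Round 1 reading, rather than dividing by the grand total, is what lets each column be read as an estimated probability for a 2nd reading.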

It should be noted that Center D1 was deliberately omitted from these comparisons, because readings from that center were not included when the standard classifications were derived, owing to the identity of screening between Centers D and D1.

Appendix B, Table 8, presents a comparison of the readings of Cytology Centers D and D1, for both rounds, on 100 slides which had had identical screening. The proportion of agreement is very good. The 2 readings agree exactly on 74% of the readings, and are within 1 category of each other on a total of 92.5% of the readings, though there is a 9% difference in the total proportion of non-negatives. A substantial part of this agreement is, of course, due to the large proportion of consistently negative readings.

Appendix B, Table 9, shows the readings of the cytologists from these centers on the 37 slides which were screened non-negative (that is, these slides were referred to the cytologists for opinions) on both rounds. The percentage of exact agreement in this table is only 53%, but 88% of the readings are within 1 category of each other.
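The 2 agreement measures used for Cytologists D and D1, exact agreement and agreement within 1 category, can be computed as below on the ordered scale negative < ambiguous cells < suspect < positive. How unsatisfactory readings were handled is not stated; this sketch assumes they agree only with another unsatisfactory reading and are never "within 1 category" of anything else. Names and example data are hypothetical.

```python
ORDER = {"negative": 0, "ambiguous cells": 1, "suspect": 2, "positive": 3}


def agreement_rates(readings_1, readings_2):
    """Return (% exact agreement, % within 1 category) for paired readings."""
    if len(readings_1) != len(readings_2) or not readings_1:
        raise ValueError("need two equally long, non-empty lists")
    exact = within_one = 0
    for a, b in zip(readings_1, readings_2):
        if a == b:
            exact += 1
            within_one += 1
        elif a in ORDER and b in ORDER and abs(ORDER[a] - ORDER[b]) == 1:
            within_one += 1
    n = len(readings_1)
    return 100.0 * exact / n, 100.0 * within_one / n


if __name__ == "__main__":
    d = ["positive", "suspect", "negative", "positive"]
    d1 = ["positive", "positive", "negative", "ambiguous cells"]
    print(agreement_rates(d, d1))  # (50.0, 75.0)
```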

An even more interesting comparison is shown in Appendix B, Table 10. This gives the 4 readings (1 per cytologist per round of screening) of these 2 cytologists for each of the 37 slides represented in Appendix B, Table 9. This table does not separate the readings specifically by round number or screener, but that could be done to investigate interactions between screeners and cytologists. Appendix B, Table 10 is arranged in approximately decreasing order of non-negativeness of readings.

The 1st 6 slides show perfect agreement on positive readings. The next 8 have each received 3 positive readings and 1 reading of either suspect or ambiguous cells; in 6 of these 8, Cytologist D1 gave the lower valuation. The next 7 slides have received 2 positive reports and 2 other non-negative reports. Only the 1st 2 and the last of these 7 show uniform agreement within both rounds of screening, indicating that the differences between the 2 rounds may be attributable to screening differences. The other 4 show various patterns of agreement which are virtually impossible to assess without further information. They may reflect differences of opinion on the dotted cells, or be the result of a partial or total rescreening of the slide by the cytologist, or they may result from other considerations.

Table 10 was prepared to demonstrate the frequency with which a cytologist agreed or disagreed with the opinions of his screeners on those slides which were called non-negative by the screeners, and consequently referred to the cytologist. It is seen that Cytologist A never disagreed with his screeners, but classified every slide referred to him into a non-negative category. These are the data which were referred to in the discussion of Chart 2, and they seem to indicate that the reason Center A had consistently fewer non-negative readings than the other centers may be due solely to the low level of suspicion of the screeners for that center.

TABLE 10
DISTRIBUTION OF CYTOLOGIST READINGS ON SLIDES REFERRED AS NON-NEGATIVE BY SCREENERS, BY ROUND AND EACH CYTOLOGY CENTER

                           Center A         Center B         Center C         Center D         Center D1
Cytologist reading      R1  R2  Total    R1  R2  Total    R1  R2  Total    R1  R2  Total    R1  R2  Total
Unsatisfactory           0   0     0      0   0     0      0   0     0      2   3     5      1   1     2
Negative                 0   0     0      0   2     2      0   9     9     19   9    28      6   7    13
Non-negative            15  20    35     27  54    81     26  34    60     33  31    64     47  35    82
Total                   15  20    35     27  56    83     26  43    69     54  43    97     54  43    97

R1, Round 1; R2, Round 2.

Center B shows a similar pattern, with the cytologist reversing a screener's non-negative reading on only 2 occasions out of 83.

Inasmuch as 56 slides were referred to the cytologist on Round 2, it appears that the screeners may have missed as many as 27 slides on Round 1 (in addition to the 27 referred on that round) which the cytologist might have considered non-negative. Cytologist C demonstrates the same pattern, the Round 2 cytologist reading for Center C having been performed by Cytologist D.

Cytologists D and D1 are the only 2 of the 5 who overruled their screeners to any extent in this study. This may, of course, be due entirely to an unusually high level of suspicion on the part of the screeners in Center D, but the Round 2 results for Center C, which are due to Cytologist D, suggest that this may not be the entire explanation. As was previously mentioned, Cytologist D performed the Round 2 reading for Cytology Center C, which was used in the determination of the standard classification. There is no exact way of assessing the possible effect of this substitution on the standard classification. However, some idea can be gained from the available data. Appendix B, Table 11, shows the cytology center readings on 100 slides for Centers C and D, cross-tabulated for each round of screening. Although this table includes differences between screeners as well as cytologists, it can be seen that a total of 33 non-negative readings were made by Center D on Round 1, in contrast to 26 non-negative readings by Center C. For the Round 2 readings, the total numbers of non-negatives for the 2 centers are more similar to each other (and to Center D, Round 1), with 31 for Center D and 34 for Center C (in which the cytologist portion was contributed by Cytologist D). This suggests that Cytologist D may have a tendency to classify slightly higher than Cytologist C. The degree of difference, however, is not likely to have affected the standard classification substantially.
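Table 10 also yields a simple measure of how often each cytologist overruled his screeners: the proportion of referred slides that the cytologist did not call non-negative. The sketch below computes it from the Table 10 totals (both rounds combined), counting both negative and unsatisfactory cytologist readings as reversals, which is an assumption about what "overruled" should include. Note that the Center C figure mixes 2 cytologists, since its Round 2 readings were made by Cytologist D.

```python
# Table 10 totals over both rounds: (unsatisfactory, negative, non-negative).
TABLE_10_TOTALS = {
    "A": (0, 0, 35),
    "B": (0, 2, 81),
    "C": (0, 9, 60),
    "D": (5, 28, 64),
    "D1": (2, 13, 82),
}


def reversal_rate(unsat, neg, non_neg):
    """% of referred slides that the cytologist did not call non-negative."""
    referred = unsat + neg + non_neg
    return 100.0 * (unsat + neg) / referred


if __name__ == "__main__":
    for center, counts in TABLE_10_TOTALS.items():
        print(f"Center {center}: {sum(counts)} referred, "
              f"{reversal_rate(*counts):.1f}% not called non-negative")
```

The output (0.0% for A, 2.4% for B, 13.0% for C, 34.0% for D, and 15.5% for D1) reflects the pattern described above: only Cytologists D and D1 reversed their screeners to any appreciable extent.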

It seems appropriate at this point to return to the screeners' results for a comparison with the standard classifications.

Table 11 shows a comparison of the standard classification with all readings of all screeners. Again, the patterns are more easily seen graphically, and are shown in Chart 3. The general configuration is about the same as that shown in Chart 1. The exact agreement bars for standard unsatisfactory and negative slides are, of necessity, higher in Chart 1 than in Chart 3, since the cytologists did not see those slides which were classified into either of these categories by the screeners. Consequently, the differences between these bars for the 2 charts reflect the extent to which the cytologists reversed screeners' opinions. Conversely, the differences between the non-negative bars for the ambiguous cell, suspect, and positive categories indicate the extent to which screeners' opinions were overruled for each of these categories of slides. It can be seen that this is a monotonic function of standard category, with virtually no reversal on standard positive slides.

TABLE 11
PERCENT DISTRIBUTION OF READINGS FOR ALL SCREENERS FOR BOTH ROUNDS IN ALL CYTOLOGY CENTERS FOR EACH STANDARD CLASSIFICATION CATEGORY

                                               Standard classification
Screener readings in all
cytology centers               Unsatisfactory   Negative   Ambiguous cells   Suspect   Positive   All categories
Unsatisfactory                       46             6             4              4         4             7
Negative                             29            82            38             27         8            58
Non-negative                         25            12            58             70        90            35
All screeners' readings (No.)        24           488            48            128       112           800

[Chart 3: one bar-histogram panel per standard classification category (Unsatisfactory, Negative, Ambiguous cells, Suspect, Positive); horizontal axis, screeners' readings (Unsat, Neg, Non-N); vertical axis, percent.]

CHART 3. Percent distribution of readings for all screeners in all cytology centers for each standard classification category. Unsat, unsatisfactory; Neg, negative; Non-N, non-negative.

Comparable information for each cytology center is summarized in Chart 4. These screening data are analogous to the cytology center data shown in Chart 2. Care must be taken in comparing the same classes in the 2 charts, since the standard unsatisfactory class is omitted from Chart 2.

The only marked differences between Charts 2 and 4 are the standard ambiguous group for Center C and the standard non-positive groups for Center D. This is to be expected, since Cytologist D is the only 1 of the 4 who showed any substantial tendency to reverse his screeners' slide classifications.

A comparison of screeners and cytology centers is summarized in Table 12 in the form of average % agreement with the standard classification. Though the magnitudes of the given figures would be different for a different distribution of slides among standard categories, their relative magnitudes again reflect past conclusions. The only radically aberrant numbers are the 52% figures given for Center A on non-negative slides. This is entirely due to screener under-reading, relative to the other centers. The 75% agreement shown for screeners in Center D results from over-reading of standard negative slides. This, however, is entirely compensated for by the cytologist's reversal of the screeners' opinions.

Results—Phase II

Phase II of this study was an attempt to assess the degree of concordance between cytologists without the confounding influence of variation in screening. It is not possible to do this directly, since practicing cytologists rely heavily on the screeners they have trained and worked with. As a practical compromise, it was decided to compare the cytologists' interpretations of a set of slides which had been permanently dotted by screeners who were unknown to all the participating cytologists. The Phase II results consist of 2 independent readings by 2 cytologists (A and D), and a single reading by a 3rd cytologist (D1), using the same set of 100 slides which had been used in Phase I.

It should be reiterated that the appearance of the slides after screening and dotting for Phase II was apparently quite different from what the participating cytologists were accustomed to seeing in their own laboratories, the major difference being the number of dots per slide. In general, the permanently dotted slides used for Phase II often contained considerably more dots than the participating cytologists thought advisable, and as a consequence, they did not always examine all dotted cells. This consideration, together with the fact that the screening had been done by technicians totally unknown to the cytologists, must be borne in mind in assessing the following results.

[Chart 4: one panel per cytology center; bars show screeners' readings within each standard classification (unsatisfactory, negative, ambiguous cells, suspect, positive).]
CHART 4. Percent distribution of readings for all screeners in each cytology center for each standard classification category. For abbreviations, see legend to Chart 1.

TABLE 12
AVERAGE % AGREEMENT OF SCREENERS' AND CYTOLOGY CENTER READINGS WITH THE STANDARD CLASSIFICATION, BY CYTOLOGY CENTER

                                                   CYTOLOGY CENTER
                                     A     B     C     D    All centers (A,B,C,D)    D1
AGREEMENT ON NON-NEGATIVE READINGS
  Screener                          52    86    80    94             79
  Cytology center                   52    80    75    84             74             89
AGREEMENT ON NEGATIVE READINGS
  Screener                          99    83    90    75             87
  Cytology center                   99    85    95    94             93             89

Table 13 shows the results of all Phase II readings, tabulated against the standard classification. Chart 5 shows these proportions depicted in a histogram. The uniformity of these results tends to bear out the appropriateness of the standard classifications. The standard unsatisfactory slides present a confused picture, as would be expected of slides from this category. In each other standard classification, however, the modal class agrees with the standard. In the negative, ambiguous, and positive standard classifications, majorities of 73%, 57%, and 56% of the readings agree with the respective standard classes, while for the standard suspect slides, a plurality of 39% of the readings agree with the standard. For the 3 non-negative standard classes, the total percentages of non-negative readings are as follows: ambiguous cell, 67%; suspect, 87%; and positive, 91%.

TABLE 13
PERCENT DISTRIBUTION OF READINGS OF ALL CYTOLOGISTS FOR EACH STANDARD CLASSIFICATION CATEGORY

                                            STANDARD CLASSIFICATION
ALL CYTOLOGISTS'
READINGS              Unsatisfactory  Negative  Ambiguous cells  Suspect  Positive  All categories
Unsatisfactory              27            5            3            0         4            5
Negative                    27           73           30           14         6           50
Ambiguous cells             13           19           57           25         9           21
Suspect                     27            2            7           39        26           12
Positive                     7            0            3           23        56           12
All readings (No.)          15          305           30           80        70          500
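As a cross-check on the figures quoted in the text, the exact agreement for each standard class is the diagonal entry of its column in Table 13, and the non-negative agreement is the sum of the ambiguous, suspect, and positive entries in that column. The minimal Python sketch below transcribes the table's percentages and recomputes both; the function and variable names are illustrative only. One assumption is flagged: the suspect reading of standard positive slides is taken as 26%, the value that makes that column total about 100% and yields the 91% non-negative figure given in the text.

```python
# Percent distribution of Phase II readings, transcribed from Table 13.
# Rows are the cytologists' readings; columns follow CATEGORIES (standard class).
# Assumption: the suspect reading of standard positive slides is taken as 26.
CATEGORIES = ["unsatisfactory", "negative", "ambiguous", "suspect", "positive"]

TABLE_13 = {
    #                   U   N   A   S   P
    "unsatisfactory": [27,  5,  3,  0,  4],
    "negative":       [27, 73, 30, 14,  6],
    "ambiguous":      [13, 19, 57, 25,  9],
    "suspect":        [27,  2,  7, 39, 26],
    "positive":       [ 7,  0,  3, 23, 56],
}

NON_NEGATIVE = ("ambiguous", "suspect", "positive")


def column(standard):
    """Percent of readings in each reading category for one standard class."""
    j = CATEGORIES.index(standard)
    return {reading: row[j] for reading, row in TABLE_13.items()}


def exact_agreement(standard):
    """Percent of readings equal to the standard class (the diagonal cell)."""
    return column(standard)[standard]


def non_negative_agreement(standard):
    """Percent of readings in any non-negative category (A, S, or P)."""
    col = column(standard)
    return sum(col[reading] for reading in NON_NEGATIVE)


if __name__ == "__main__":
    for std in NON_NEGATIVE:
        print(std, exact_agreement(std), non_negative_agreement(std))
```

Because the table gives rounded percentages rather than counts, sums recomputed this way can differ from count-based figures by a point or so.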

These results represent an improvement over the comparable Phase I results shown in Chart 1 in nearly every category, the difference being most marked for the standard ambiguous slides. A priori, this favorable comparison might have been expected, since 1 source of variability, screener variability, has been virtually eliminated. On the other hand, the degree of improvement might have been expected to be mitigated by the fact that the cytologists had access to all 100 slides in the Phase II portion (in contrast to Phase I, where they received only those slides referred by their screeners) and examined nearly every one, except those few slides which had no dots. This probably accounts for the decrease in the % agreement with the standard negative class slides from 87% in Phase I to 73% in Phase II. This decrease, together with the generally higher agreement in the standard non-negative categories, is probably another reflection of a tendency to under-read on the part of the screeners who participated in the Phase I study.

On the whole, the overall results of Phase II are gratifying, since they tend to substantiate the assignment of standard classifications based on Phase I results and, grossly, show more consistency than might have been expected from the Phase I results. Furthermore, if the standard classifications are considered as the "true" situation, these data indicate an agreement of 73% on negative slides and an agreement of between 67 and 91% on non-negative slides, with an average agreement of 84.5% for all standard non-negative slides.

[Chart 5: one panel per standard classification (unsatisfactory, negative, ambiguous cells, suspect, positive); bars show exact agreement with the standard and agreement on non-negatives for each reading category.]
CHART 5. Percent distribution of readings of all cytologists for each standard classification category. For abbreviations, see legend to Chart 1.

A comparison of readings of the 3 individual cytologists with the standard classification is summarized in Chart 6. It is again noted that, with a single exception (that of Cytologist A reading the standard suspect slides), the modal class in each distribution agrees with the standard classification. If this histogram is compared with Chart 2, which shows comparable data from the Phase I portion of the study, many differences are noted, but these tend to be substantial only in the case of Cytologist A, whose laboratory showed the most deviation from the consensus in Phase I.

Although the % agreement with the standard drops between Phase I and Phase II in the negative group from 93% to 62% for Cytologist A, all non-negative categories show a gain in overall agreement. This gain is most marked in the ambiguous and suspect groups, where the non-negative agreement rises from 17% to 75% and from 31% to 96%, respectively. The standard positive group also shows a gain in total non-negative agreement, from 78% to 90%, although the exact agreement category drops from 63% to 50%. These differences can again be largely explained by screener under-reading during Phase I, although screener-cytologist interaction also appears to be playing a role. Cytologist D shows generally similar differences between Phases I and II, but the differences are small enough that they might reasonably be explained on a chance basis.

Cytologist D1 presents a slightly different picture, having dropped in non-negative agreement in both the suspect and positive groups, from 82% to 76% and from 100% to 92%, respectively. These, however, are not large differences, considering the fact that Cytologist D1 was not represented in the assignment of standard categories and participated in only 1 round of Phase II. The negative and ambiguous groups show a pattern which is generally similar to that of the other 2 cytologists, namely, a reduction in exact agreement with the negative standard from 74% to 51%, but an increase in both exact agreement and non-negative agreement for the ambiguous group, the increase being similar to that of Cytologist D.

Differences between cytologists for the Phase II data remain marked, but tend to decrease with increasing non-negativeness of the slides examined, as might be expected. Exact agreement with the standard negative slides ranges from 51% for Cytologist D1 to 96% for Cytologist D, with Cytologist A occupying an intermediate position of 62%. Exact agreement with the standard ambiguous group for Cytologists A, D, and D1 comprises 67%, 50%, and 50% of the readings, respectively, while total non-negative agreements are 75%, 50%, and 84%. Evidently, slides reported as ambiguous are a group within which there is considerable room for personal interpretation.

The standard suspect group shows less variability, but uniformly smaller percentages of exact agreement, these being 38%, 41%, and 38% for Cytologists A, D, and D1, respectively, with corresponding non-negative agreement percentages of 95, 79, and 76, respectively. Reports on the standard positive slides present a very uniform pattern for all 3 cytologists, with exact agreement on 50%, 61%, and 57%, and total non-negative agreement on 90%, 90%, and 92%, of the readings for Cytologists A, D, and D1, respectively.

[Chart 6: one panel per cytologist, including Cytologist D (200 slides) and Cytologist D1 (100 slides); bars show exact agreement and non-negative agreement for each reading category within the negative, ambiguous cells, suspect, and positive standard classifications.]
CHART 6. Percent distribution of readings of individual cytologists for each standard classification category. For abbreviations, see legend to Chart 1.

A direct estimate of the cytologists' consistency can be obtained from Appendix B, Table 12, which shows the screening results for the individual rounds tabulated against one another. Only Cytologists A and D participated in both rounds of Phase II. Reports falling on the main diagonal indicate complete consistency. The percentages of such reports are 49% for Cytologist A and 74% for Cytologist D. Since the report categories bear a contiguous relationship to one another, it is pertinent to examine the reports which fall within 1 category of each other on the 2 rounds of screening. (This is not strictly true for the unsatisfactory category.) If this is done, the percentage consistency between the 2 rounds becomes 90% for Cytologist A and 91% for Cytologist D. If unsatisfactory reports are disregarded and all non-negative categories grouped, Table 14 results. This shows that Cytologist A agreed with himself on a negative/non-negative basis on 73.5% of his reports, and Cytologist D on 88.5%.

TABLE 14
COMPARISON OF NEGATIVE AND NON-NEGATIVE READINGS ON 1ST AND 2ND ROUNDS FOR CYTOLOGISTS A AND D

                          CYTOLOGIST A (1st round)              CYTOLOGIST D (1st round)
ROUND 2 READING       Negative  Non-negative  Total        Negative  Non-negative  Total
Negative                 26          2          28            60          2          62
Non-negative             20         35          55             9         23          32
Total                    46         37          83            69         25          94

Differences between cytologists may be partially explained by random errors. These would consist of instances when the cytologists did not examine the same cells, for example. However, a large part of the lack of agreement is probably due to honest differences of opinion, as Appendix B, Tables 13-15, imply. These show cross-tabulations between all pairs of the 3 cytologists for those slides on which Cytologists A and D were both consistent with themselves (but not necessarily with each other) on the 2 rounds of Phase II. The percentage agreement for each pair is as follows: A and D, 78.5%; A and D1, 46%; D and D1, 46%. The relatively low agreement of Cytologist D1 with the other 2 is to be expected, inasmuch as he has contributed only 1 reading; this set of slides undoubtedly contains some for which he would give a different value on a 2nd round of reading, and no doubt also omits some for which he would be consistent with himself. An examination of these tables shows that the major reason for disagreement between Cytologist D1 and the other 2 is his generally greater propensity to give readings of ambiguous cells for the slides represented here.

The only 2 cytologists who can be contrasted on a comparable basis from these tables are A and D. The high % agreement in this contrast results largely from the high proportion of slides which both consistently agreed were negative: 25 of the 37 total. Aside from these, only 4 of the remaining 12 fall on the main diagonal, but pairs of readings on all but 1 slide fall within 1 category of each other.

Cytologists A and D1 both tend to give a range of readings on those slides which Cytologist D called negative, while they are fairly consistent with one another, calling 34 of the 37 slides within 1 category of each other. However, because of the small number of slides involved and Cytologist D1's limited participation in this phase, any conclusions from these data remain tentative.

Appendix B, Tables 16 and 17, give further indication that the slides on which a cytologist is consistent with himself may not be the same from person to person. These tables show the individual rounds of reading for Cytologists A and D, respectively, for those slides on which the other cytologist in each case was consistent on both rounds of Phase II. For the total data, Cytologist A was in exact agreement with himself on 49% of readings, and within 1 category 90% of the time, while the corresponding percentages are 50 and 92, respectively, for the slides on which Cytologist D was perfectly consistent on both rounds of Phase II. Similarly, Cytologist D showed consistency 74% of the time and was within 1 category 91% of the time on the complete tabulation, while the comparable percentages are 76 and 96 on those slides for which Cytologist A showed perfect consistency. In other words, it appears that each cytologist shows the same amount of variability, whether the total distribution is examined or only those slides on which another cytologist has a fixed opinion. Again, the numbers involved are small, and only 2 similar readings may not indicate a fixed opinion, but the agreement is quite good.
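The consistency measures used in this section (exact agreement on the main diagonal, agreement within 1 category, and agreement after grouping all non-negative categories) are simple functions of a square cross-tabulation of 2 readings of the same slides. The Python sketch below is one way to compute them, assuming the categories are listed in their natural order with negative first and unsatisfactory reports already removed (the within-1 idea does not apply to the unsatisfactory category). The function names are hypothetical; the only study data used are Cytologist A's counts from Table 14.

```python
from typing import List, Sequence

Table = Sequence[Sequence[int]]  # rows = one reading, columns = the other


def _total(table: Table) -> int:
    return sum(sum(row) for row in table)


def exact_agreement(table: Table) -> float:
    """Fraction of slides on the main diagonal (same category both times)."""
    return sum(table[i][i] for i in range(len(table))) / _total(table)


def within_one_agreement(table: Table) -> float:
    """Fraction of slides whose 2 readings are at most 1 category apart.

    Only meaningful for ordered categories, so unsatisfactory reports
    should be removed first.
    """
    near = sum(
        count
        for i, row in enumerate(table)
        for j, count in enumerate(row)
        if abs(i - j) <= 1
    )
    return near / _total(table)


def collapse_to_negative_non_negative(table: Table) -> List[List[int]]:
    """Group every category after the first (negative) into 'non-negative'."""
    rest = range(1, len(table))
    return [
        [table[0][0], sum(table[0][j] for j in rest)],
        [sum(table[i][0] for i in rest),
         sum(table[i][j] for i in rest for j in rest)],
    ]


if __name__ == "__main__":
    # Cytologist A, Table 14: rows = round 2 (negative, non-negative),
    # columns = round 1 (negative, non-negative).
    cytologist_a = [[26, 2], [20, 35]]
    print(f"{100 * exact_agreement(cytologist_a):.1f}%")  # prints 73.5%
```

Applied to Cytologist A's 2 x 2 table, the sketch counts 61 agreements out of 83 slides, the 73.5% quoted earlier; for a full 4-category table the same functions give the diagonal and within-1 percentages.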

Traditionally, the viewpoint taken in analyzing data of the type presented here has been to regard the material studied as having some "true" value with respect to the disease in question. However, the observations in the last paragraph suggest that a more realistic approach might be to define a "true" value specific for a given cytologist. That is, each cytologist could be expected to perform more consistently relative to his own concept of the "true" situation than relative to some other measure artificially induced by averaging over several cytologists' opinions. These classification schemes would not be expected to coincide from cytologist to cytologist, but the amount of overlap would be expected to be great, as is apparently the case with the data presented here. This point of view could go a long way toward explaining the consistently high degree of success most laboratories achieve in the diagnosis of cancer, in spite of the apparent lack of uniform agreement on individual classifications.

Unfortunately, this concept had not been formulated at the time of the design of the present study, and the data are consequently too limited to allow analysis from this point of view. However, some of the theoretic aspects have been investigated (1) and are expected to be published elsewhere.

Discussion

A. Variability in Medical Diagnosis

Although the cytologic examination of sputum is now widely used as an aid in the detection of lung cancer, and many reports have been devoted to the accuracy of the technic, apparently little has been published about its precision. (The word accuracy is used here to denote the ability of the technic to detect correctly the underlying "true" situation, whereas precision is intended to mean the reproducibility of results on a given specimen.)

With notable exceptions, the medical literature contains relatively few data on the repeatability of results in medical testing. This seems to be particularly true in those situations which require some degree of subjective judgment relative to a presumed "true" scale of measurement. The papers of Yerushalmy and co-workers (2, 13) report on exhaustive studies in the field of chest roentgenography and contain data which demonstrate considerably more variability in the X-ray diagnosis of pulmonary tuberculosis than was previously thought to exist. Basic papers on methods of investigation of these data have been given by Yerushalmy (14) and Neyman (8); Chiang (3) has published an excellent discussion of some of the theoretic consequences of misclassification for this type of study. Garland (6) discusses some of these investigations, as well as some of the technical, medical, semantic, and philosophic questions that arise in attempts to assess the meaning of the results. He also presents a brief review of some of the other medical areas where similar inconsistency has been observed in the past.

A paper more closely related to the present discussion, published by Siegler (10), contains limited data which suggest that a good deal of inconsistency exists among pathologists in the classification of cervical biopsy tissue with respect to the diagnosis of carcinoma in situ. Wilson and Burke (11-13) present analogous data on multiple readings of lung tissue specimens and discuss the use of standard analysis of variance technics as a possible method of analysis.

In the field of cytology, data of this sort are quite scarce. It has not been possible to find any data on the repeatability of classification in vaginal cytology, notwithstanding that this is now a routine test throughout the country.

A few papers report results pertaining to the precision of pulmonary cytologic technics. In a paper largely concerned with accuracy, Harris (7) presents data on 2 readings by each of several screeners on a set of 100 slides. Adapting the data from Tables 7 and 9 of that paper to a presentation similar to that of the present study gives Appendix B, Table 18. While these results are not directly comparable to those obtained in the present study, they show variability of roughly the same order of magnitude as our observations.

Foot (5) gives a summary of the rereading of 1000 cases from the files of Dr. Papanicolaou's laboratory. His data are unlike those of the present study in that they are based on a variable number of slides per case and, unfortunately, are not given in sufficient detail to allow a direct comparison between the 2 independent readings for all categories of classification.

More recently, Russell et al. (9) have given a very thorough discussion of the technics employed in pulmonary cytology, as well as an indication of some of the complexities of the structure of such a multistage process. They report on the rescreening of 241 specimens; Table 15, taken from their paper, shows the distribution of the most extreme disagreements they observed in 97 of these specimens, which were obtained from patients with either primary or secondary lung cancer. (The numbers given indicate Papanicolaou classification.)

B. Interpretation

The foregoing studies have been cited in order to reemphasize the universality of the problem of inconsistency in any situation that presumes a high degree of abstraction and subjective interpretation. This phenomenon is usually even more pronounced in cases where an underlying "true" scale of measurement is postulated, as opposed to a pure ranking situation (where, for example, the task is merely to say whether 1 of 2 objects is "worse" than the other, or "more suspicious," or whether they are "equal," rather than to assign a specific value, as is the case here).

TABLE 15
MAJOR DISCREPANCIES IN 2 INDEPENDENT SCREENINGS OF SPUTUM SPECIMENS OBTAINED FROM PRIMARY AND METASTATIC LUNG CANCER PATIENTS (N = 97)(a)

HIGHEST RESULT
CLASSIFICATION(b)      I    II    III    Unsatisfactory    Total
IV                     4     1     0           1             6
V                      3     1     0           1             5
Total                  7     2     0           2            11

(a) Table 16 from Reference 9.
(b) Roman numerals refer to Papanicolaou classification numbers.

A good deal of care must be taken in interpreting such studies, and the results of several apparently similar studies sometimes cannot be directly compared at all. For example, the natural tendency in viewing repeated independent classifications of categoric material is to express variability (or lack of it) in terms of percentage agreement among all readings, and we have done that in this paper, to some extent. Such figures, however, can only be meaningfully used within a given body of material, because the degree of concordance between classifications is intrinsically tied up with the underlying "true" distribution of the material involved. For example, if dual readings were taken on a group of material that is uniformly "negative" (or uniformly "positive"), one would expect the % agreement between the 2 readings to be of the order of 95% or so, while a lower percentage would naturally be expected if the basic material were of a more controversial or heterogeneous nature. Hence, this measure can be used to compare 2 or more such studies only when the material upon which they are based is known to be of about the same difficulty of classification (and when the classification schemes used are known to be identical).
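The dependence of percentage agreement on the material can be made concrete with a small calculation. Assume (a simplification not made in the text) that the 2 readings of a slide are independent draws from the same distribution $q_j$ over reading categories; then they agree with probability $\sum_j q_j^2$. The Python sketch below uses hypothetical distributions only: a clear negative slide read as negative 97.5% of the time gives about 95% agreement, while a borderline slide spread over several categories gives only 30%.

```python
def expected_agreement(reading_probs):
    """Chance that 2 independent readings of 1 slide land in the same category.

    reading_probs[j] is the (hypothetical) probability that a reader calls
    this slide category j; with independent readings, P(agree) = sum_j q_j**2.
    """
    return sum(q * q for q in reading_probs)


def mean_agreement(slides):
    """Average expected agreement over a collection of slides."""
    return sum(expected_agreement(q) for q in slides) / len(slides)


# Hypothetical reading distributions over (negative, ambiguous, suspect, positive).
CLEAR_NEGATIVE = [0.975, 0.025, 0.0, 0.0]
BORDERLINE = [0.3, 0.4, 0.2, 0.1]

if __name__ == "__main__":
    uniform_set = [CLEAR_NEGATIVE] * 100
    mixed_set = [CLEAR_NEGATIVE] * 50 + [BORDERLINE] * 50
    print(round(mean_agreement(uniform_set), 3))  # about 0.951
    print(round(mean_agreement(mixed_set), 3))    # about 0.626
```

The same readers, with the same per-slide behavior, thus produce very different overall agreement depending only on how many borderline slides the material contains.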

Another factor to which the results are particularly sensitive is the amount of material upon which each opinion is based. In the practice of pulmonary cytology, results are rarely, if ever, based upon a single slide, and many of the studies cited here and in the accompanying papers are based upon different (sometimes variable) numbers of slides per case. For example, the study of Foot (5) presents diagnoses based on sets of smears, the sets varying from 1 to 20 or more smears. In the present screening study, on the other hand, every screen consisted of 4 slides, while "secondary" screens comprised 6 slides, in contrast to the variability study discussed in this paper, where all results are given in terms of single slides.

In order to give some idea of the possible effect of multiple slides, a simple model of the screening process was derived. This model assumes:

1. $n$ technically satisfactory slides are obtained from a patient's sputum. All are screened, and those thought by the screener to be non-negative are dotted and referred to a cytologist for classification.

2. The cytologist examines referred slides and gives a report on the case which is equal to the highest classification of slide he observes. (Naturally, if the screener classifies all slides for a case as negative, then the report for that case is negative.)

3. Each slide obtained has a "true" value which corresponds to 1 of the 4 categories: Category 1, negative; Category 2, ambiguous; Category 3, suspect; Category 4, positive.

4. The screener has a probability $p_i$, for each "true" Category $i$, of referring a slide from that category to the cytologist as non-negative. That is, among all negative slides (Category 1), the screener has a constant probability $p_1$ of referring a slide to the cytologist as non-negative; among ambiguous slides (Category 2), he has probability $p_2$, and so on.

5. A cytologist has a distribution of probabilities, $Q_{ij}$, which represent his chances of rendering an opinion of Category $j$ on a slide whose "true" category is $i$, after it has been referred to him as non-negative by his screener. For example, $Q_{23}$ represents the probability that the cytologist will give a classification of suspect (Category 3) to a slide whose "true" category is ambiguous (Category 2), after that slide has been screened and dotted appropriately.

6. A given patient can be characterized by a multinomial distribution $r_i$, which represents his probabilities of producing slides (at a given point in time) from Category $i$. For example, $r_2$ represents his probability of producing ambiguous cell (Category 2) slides.

Note that this model assumes that the cytologist cannot classify a slide unless it is referred to him by his screener, which is in fact the usual case in practice, but it also assumes no screener-cytologist interaction, and that the ultimate classification for a set of $n$ slides is based only upon dotted slides and not on slides which have been screened as negative, which may or may not be the case in a given laboratory. There are other shortcomings in the model, as is always the case when an attempt is made to analyze a complex situation in terms of relatively simple assumptions about the underlying process, but the analogy should be close enough to give some idea of the effect on the outcome of varying one of the parameters (in this case, the number of slides, $n$). Utilizing the given assumptions, it is not too difficult to derive expressions for the probabilities of the 4 outcomes (negative, ambiguous, suspect, or positive) in terms of the number of slides taken, the probabilities of obtaining slides from each of the "true" categories, and the postulated screener's and cytologist's probabilities of classification. This derivation is given in Appendix A.
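A compact way to evaluate the model numerically, under assumptions 1 to 6 and treating the $n$ slides as independent draws, is to compute first the probability $s_j$ that a single slide ends up in Category $j$: $s_1 = \sum_i r_i (1 - p_i) + \sum_i r_i p_i Q_{i1}$ and $s_j = \sum_i r_i p_i Q_{ij}$ for $j = 2, 3, 4$. Because the report is the highest slide result, $P(\text{report} \le j) = (s_1 + \cdots + s_j)^n$. This is our own restatement in Python, not the authors' Appendix A; names are illustrative and categories are indexed from 0 (0 = negative). With the parameters of the examples that follow, it reproduces the tabulated values to within about 1 unit in the last printed digit.

```python
def per_slide_distribution(r, p, Q):
    """Probability that a single slide ends up in each category.

    r[i]    : patient's probability of producing a slide of true category i
    p[i]    : screener's probability of referring a category-i slide
    Q[i][j] : cytologist's probability of calling a referred category-i slide j
    A slide that is not referred counts as negative (category 0).
    """
    k = len(r)
    s = [0.0] * k
    for i in range(k):
        s[0] += r[i] * (1.0 - p[i])            # screened out -> negative
        for j in range(k):
            s[j] += r[i] * p[i] * Q[i][j]      # referred, then classified j
    return s


def final_report_distribution(n, r, p, Q):
    """Distribution of the case report, taken as the highest of n slide results.

    With independent slides, P(max <= j) = F_j ** n, where F_j is the
    cumulative per-slide probability, so P(max == j) = F_j**n - F_(j-1)**n.
    """
    probs, prev, cum = [], 0.0, 0.0
    for s_j in per_slide_distribution(r, p, Q):
        cum += s_j
        cur = cum ** n
        probs.append(cur - prev)
        prev = cur
    return probs


Z = [0.0, 0.0, 0.0, 0.0]  # Q rows for true categories the patient never produces

# Parameters of the 3 illustrative patients discussed in the text.
EX1 = dict(r=[1.0, 0.0, 0.0, 0.0], p=[0.1, 0.0, 0.0, 0.0],
           Q=[[0.85, 0.10, 0.05, 0.0], Z, Z, Z])
EX2 = dict(r=[0.0, 0.0, 0.0, 1.0], p=[0.0, 0.0, 0.0, 0.9],
           Q=[Z, Z, Z, [0.05, 0.05, 0.20, 0.70]])
EX3 = dict(r=[0.0, 0.5, 0.5, 0.0], p=[0.0, 0.6, 0.7, 0.0],
           Q=[Z, [0.5, 0.3, 0.2, 0.0], [0.15, 0.25, 0.4, 0.2], Z])

if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5, 10):
        print(n, [round(x, 4) for x in final_report_distribution(n, **EX1)])
```

Changing `n` while holding `r`, `p`, and `Q` fixed shows the drift toward higher reports that the examples describe.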

For example, suppose $n$ slides are obtained from a patient who produces only negative slides (i.e., $r_1 = 1.0$). Let us further postulate the following probabilities for illustrative purposes:

$p_1 = 0.1$, $Q_{11} = 0.85$, $Q_{12} = 0.10$, $Q_{13} = 0.05$, $Q_{14} = 0.0$

These values are equivalent to assuming that, on the average, the screener will regard 1/10 of the negative slides as non-negative, while the cytologist will classify 85% of such referred slides as negative, 10% of them as ambiguous, 5% as suspect, and none as positive.

Assuming the patient's report is based on the highest category observed, we can calculate the following probabilities of the final reading being in each of the 4 categories, for a variety of total numbers of slides.

                          PROBABILITY OF CLASSIFICATION
No. OF SLIDES READ   Negative   Ambiguous cells   Suspect   Positive
        1             0.985         0.010          0.005      0.0
        2             0.970         0.020          0.010      0.0
        3             0.956         0.029          0.015      0.0
        4             0.941         0.039          0.020      0.0
        5             0.927         0.048          0.025      0.0
       10             0.860         0.091          0.049      0.0
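The 10-slide row can be checked by hand. Treating the slides as independent and taking the report as the highest slide result (our reading of the model's assumptions, not the authors' Appendix A derivation), a single slide from this patient is negative with probability $s_1$ and ambiguous or suspect with probabilities $s_2$ and $s_3$:

$$
\begin{aligned}
s_1 &= (1 - p_1) + p_1 Q_{11} = 0.9 + 0.1 \times 0.85 = 0.985,\\
s_2 &= p_1 Q_{12} = 0.010, \qquad s_3 = p_1 Q_{13} = 0.005, \qquad s_4 = p_1 Q_{14} = 0,\\
P(\text{negative}) &= s_1^{\,n} = 0.985^{10} \approx 0.860,\\
P(\text{ambiguous}) &= (s_1 + s_2)^{n} - s_1^{\,n} = 0.995^{10} - 0.985^{10} \approx 0.951 - 0.860 = 0.091,\\
P(\text{suspect}) &= 1 - (s_1 + s_2)^{n} = 1 - 0.995^{10} \approx 0.049.
\end{aligned}
$$

Setting $n = 1$ in the same expressions gives the first row, $0.985$, $0.010$, and $0.005$.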


We see in this case that the specificity of the screen is uniformly higher than the postulated specificity of the screener (90%) or the cytologist (85%) for moderately small numbers of slides, and produces less than 15% overcalling, even with as many as 10 slides.

At the other end of the scale, suppose we are dealing with a patient who produces only positive slides (i.e., $r_4 = 1.0$). Assume further the following probabilities:

$p_4 = 0.9$, $Q_{41} = 0.05$, $Q_{42} = 0.05$, $Q_{43} = 0.20$, $Q_{44} = 0.70$

where $p_4$ is the screener's probability of referring positive slides to the cytologist, and the $Q$'s are the cytologist's probabilities of classifying the referred slides into the categories negative, ambiguous, suspect, or positive, respectively.

Then the probabilities for the final report are:

                          PROBABILITY OF CLASSIFICATION
No. OF SLIDES READ   Negative    Ambiguous cells   Suspect     Positive
        1            0.145       0.045             0.180       0.630
        2            0.021       0.015             0.101       0.863
        3            0.00305     0.00381           0.04379     0.94935
        4            0.000442    0.000862          0.017436    0.981260
        5            0.000064    0.000184          0.006687    0.993065
       10            0.0         0.0               0.0000481   0.9999519

Here, even with only 1 slide, the patient stands only a 14.5% chance of being called a flat negative, while if as few as 3 slides are taken, he is virtually certain to be classified into some non-negative category, and 95% of the time will be called correctly (in this case) positive.

To illustrate the middle ground, suppose a patient produces neither positive nor negative slides, but only ambiguous and suspect slides, with equal probabilities (i.e., $r_2 = r_3 = 0.5$). Suppose further that $p_2 = 0.6$, $Q_{21} = 0.5$, $Q_{22} = 0.3$, $Q_{23} = 0.2$, $Q_{24} = 0.0$; and that $p_3 = 0.7$, $Q_{31} = 0.15$, $Q_{32} = 0.25$, $Q_{33} = 0.4$, $Q_{34} = 0.2$. Then the probabilities of the patient's classification would be as follows:

                          PROBABILITY OF CLASSIFICATION
No. OF SLIDES READ   Negative   Ambiguous cells   Suspect   Positive
        1            0.5525     0.1775            0.2000    0.0700
        2            0.3053     0.2276            0.3320    0.1350
        3            0.1687     0.2204            0.4153    0.1956
        4            0.0932     0.1908            0.4641    0.2519
        5            0.0515     0.1558            0.4884    0.3043
       10            0.0027     0.0403            0.4410    0.5160

In this case, the patient has a very high chance of being called negative if only 1 slide is taken (Prob = 0.55), which decreases rapidly with increasing number of slides to a level of less than 1% when the final report is based on 10 slides. On the other hand, there is a pronounced drift in the tendency to call the patient positive. This is to be expected in this model, since the final diagnosis is based on the highest observed report, and when there is a nonzero probability of overcalling, this will be accentuated as the number of slides increases.

Obviously, only a few illustrations can be presented here, but those given should suffice to convince the reader that the process described is indeed quite sensitive to the number of slides examined, as well as to changes in the postulated probabilities of classification. Furthermore, if this model is approximately representative of the true situation, the results given suggest that there may be some optimum number of slides on which to base diagnoses in a given situation, where fewer slides than the optimum would produce an excessive number of false negatives, while more would tend to increase unduly the chance of false positives. The problem is too complex to deal with in detail here, but one of us (P.A.) has pursued the theoretic consideration of these and other questions. The derivation of several models which extend and generalize the results given in Appendix A, together with discussions of a number of other theoretic and practical considerations, is given in Reference 1. These results will be published elsewhere.

Another important consideration in the interpretation of inter-individual disagreements is the question of congruence of classification categories from cytologist to cytologist. The possible importance of this aspect has not been stressed heretofore, but it is entirely possible that a not inconsiderable amount of the variability observed has a semantic basis. For example, one cytologist may regard cells suggestive of a certain level of bronchiectasis, say, as negative in his concept of the classification scheme, while another may think of the same cells as ambiguous, even though they may be in complete agreement as to the nature of the cells themselves. At present, there are several classification schemes in general use, some of which are based upon a modification of the original Papanicolaou numbering system, while others have evolved in terms of descriptive language. Whatever direction this tendency takes in the future, it is imperative that there be uniform agreement on the meaning of any given diagnosis, both in terms of the consequences with respect to the patient and the range of cellular findings included in each category. In practice, in any institution, this understanding exists between the cytologist and clinician.

It is clear that the last word has not been said on this subject, and still further work needs to be done in order to attempt to answer some of the questions raised by this small study.

2138 CANCER RESEARCH VOL. 26

on March 29, 2020. © 1966 American Association for Cancer Research.cancerres.aacrjournals.org Downloaded from


Acknowledgments

The study was part of a cooperative undertaking of the American Cancer Society and the Veterans Administration. The American Cancer Society provided the major source of support by a series of grants (Grant No. L-31) to the Veterans Administration domiciliaries and to the cytology, radiology, and statistics centers. The Veterans Administration provided grants to the domiciliaries for certain aspects of the study. It should also be noted that several participants in this study received support from other sources. The Statistics Center was supported in part by Grant No. CT-5085 from the National Cancer Institute, Grant No. HT-5082 from the National Heart Institute, a grant from the Mary Reynolds Babcock Foundation, Winston-Salem, North Carolina, and by a USPHS Career Program Award GM-KG-13,901 from the National Institute of General Medical Sciences.

The authors would also like to express their appreciation to Miss Cosgrain of the Strang Clinic and to Mr. Arthur Fuchs of the Eastman Kodak Company for their assistance in solving some of the technical problems encountered in the study.

Appendix A

Derivation of a Model Describing Cytologic Screening

Suppose we have a classification scheme containing $k$ categories of classification and can postulate the probability distributions defined in the "Discussion" section of the paper.

Then we assume:

1. A group of $n$ slides is obtained from a patient, each of which is examined once by the screener. For a particular patient, these slides will consist of $n_i$ slides from the $i$th standard category ($i = 1, \ldots, k$), the distribution of $n_i$ being determined by the patient's characteristic $r_i$ distribution, previously defined.

2. The screener classifies each slide as either negative, in which case the slide is not examined further, or non-negative, as a result of which he dots the slide appropriately and refers it to the cytologist for further examination. His probabilities of taking the foregoing actions are characterized by the probabilities $p_i$, previously defined ($p_i$ being the probability that a slide from the $i$th standard category is called non-negative and referred). If the screener refers none of the $n$ slides for a given patient, that patient is classified as negative.

3. The cytologist examines only that subset of the $n$ slides which the screener has called non-negative and dotted accordingly. He classifies each slide and issues a final opinion on the case corresponding to the highest classification category he has observed on any of the slides examined. His probabilities of giving various opinions on individual slides are characterized by the $Q_{ij}$ distributions, previously defined.

For a particular patient, suppose that the cytologist receives $x_i$ out of the $n_i$ slides from the $i$th standard category ($i = 1, \ldots, k$) as a result of the screener's having examined, dotted, and referred them. Upon evaluation, the cytologist classifies $x_{ij}$ slides from the $i$th standard category into Category $j$. His probability of doing so is

$$\Pr(x_{i1}, x_{i2}, \ldots, x_{ik}) = \frac{x_i!}{\prod_{j=1}^{k} x_{ij}!} \prod_{j=1}^{k} Q_{ij}^{\,x_{ij}}$$

Variability in Interpretation of Sputum Cytology Slides

where

$$\sum_{j=1}^{k} x_{ij} = x_i, \qquad \sum_{j=1}^{k} Q_{ij} = 1, \qquad \text{and} \qquad \sum_{i=1}^{k} x_i \le n.$$

The report $y_i$ for this group of slides will be the value of the highest classification category which is not empty. Consequently,

$$\Pr(y_i = k) = \Pr(x_{ik} \neq 0) = 1 - \Pr(x_{ik} = 0) = 1 - (1 - Q_{ik})^{x_i}$$

OCTOBER 1966

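$$\Pr(y_i = k - 1) = \Pr(x_{ik} = 0 \text{ and } x_{i,k-1} \neq 0) = (1 - Q_{ik})^{x_i} - (1 - Q_{ik} - Q_{i,k-1})^{x_i}.$$

In general,

$$\Pr(y_i = s) = \Pr(x_{i,s+1} = \cdots = x_{ik} = 0 \text{ and } x_{is} \neq 0) = \Big[\sum_{j=1}^{s} Q_{ij}\Big]^{x_i} - \Big[\sum_{j=1}^{s-1} Q_{ij}\Big]^{x_i},$$

and

$$\Pr(y_i = 1) = \Pr(x_{i1} = x_i) = Q_{i1}^{\,x_i}.$$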
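The closed form for the report on the referred slides from one standard category can be checked by enumerating every way the cytologist could classify those slides. The sketch below assumes, as the multinomial model does, that each referred slide from standard category $i$ is classified independently with probabilities $Q_{i1}, \ldots, Q_{ik}$. The function names and the example row of $Q$ are hypothetical, and categories are indexed from 0 inside the code.

```python
from itertools import product


def group_report_closed_form(q_row, x):
    """P(y_i = s) for s = 1..k given x >= 1 referred slides:
    [sum_{j<=s} Q_ij]^x - [sum_{j<=s-1} Q_ij]^x."""
    probs = []
    lower = 0.0  # running sum of Q_ij for j <= s - 1
    for q in q_row:
        upper = lower + q  # running sum of Q_ij for j <= s
        probs.append(upper ** x - lower ** x)
        lower = upper
    return probs


def group_report_enumerated(q_row, x):
    """Same distribution by brute force: the report is the highest
    category assigned to any of the x referred slides."""
    k = len(q_row)
    probs = [0.0] * k
    for categories in product(range(k), repeat=x):
        weight = 1.0
        for j in categories:
            weight *= q_row[j]
        probs[max(categories)] += weight
    return probs


# Hypothetical classification probabilities for one standard category.
Q_ROW = [0.1, 0.3, 0.4, 0.2]
for x in range(1, 5):
    closed = group_report_closed_form(Q_ROW, x)
    brute = group_report_enumerated(Q_ROW, x)
    print(x, [round(p, 6) for p in closed], [round(p, 6) for p in brute])
```

Enumeration grows as $k^{x_i}$, so it is only a check for small groups; the closed form is what the derivation carries forward.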

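Working with the cumulative distribution, these probabilities are expressed as

$$\Pr(y_i \le s) = \Big[\sum_{j=1}^{s} Q_{ij}\Big]^{x_i}, \qquad s = 1, \ldots, k.$$

At the technician level, the probability that the screener refers $x_i$ slides out of the $n_i$ from the $i$th standard category is

$$\Pr(x_i) = \binom{n_i}{x_i} p_i^{\,x_i} (1 - p_i)^{n_i - x_i},$$

where

$$\sum_{i=1}^{k} n_i = n \qquad \text{and} \qquad x_i \le n_i.$$

Combining this result with the foregoing, we arrive at $\theta_i(j)$, the probability that the maximum classification given to any of the slides from Category $i$ is less than or equal to $j$:

$$\theta_i(1) = (1 - p_i)^{n_i} + \sum_{x_i=1}^{n_i} \binom{n_i}{x_i} p_i^{\,x_i} (1 - p_i)^{n_i - x_i} Q_{i1}^{\,x_i} = (p_i Q_{i1} + 1 - p_i)^{n_i} = [1 - p_i(1 - Q_{i1})]^{n_i},$$

$$\theta_i(2) = (1 - p_i)^{n_i} + \sum_{x_i=1}^{n_i} \binom{n_i}{x_i} p_i^{\,x_i} (1 - p_i)^{n_i - x_i} \Big[\sum_{j=1}^{2} Q_{ij}\Big]^{x_i} = \Big[1 - p_i\Big(1 - \sum_{j=1}^{2} Q_{ij}\Big)\Big]^{n_i}.$$

In general,

$$\theta_i(s) = (1 - p_i)^{n_i} + \sum_{x_i=1}^{n_i} \binom{n_i}{x_i} p_i^{\,x_i} (1 - p_i)^{n_i - x_i} \Big[\sum_{j=1}^{s} Q_{ij}\Big]^{x_i} = \Big[1 - p_i\Big(1 - \sum_{j=1}^{s} Q_{ij}\Big)\Big]^{n_i}, \qquad s = 1, \ldots, k.$$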
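The step that collapses the sum over $x_i$ (the number of slides the screener refers) into a single power is the binomial theorem applied to $(p_i c + 1 - p_i)^{n_i}$, with $c = \sum_{j \le s} Q_{ij}$; the $x_i = 0$ term is the case in which nothing is referred. A short numerical check, with hypothetical function names and parameter values:

```python
from math import comb


def theta_by_summation(n_i, p_i, c):
    """(1 - p)^n + sum_{x=1}^{n} C(n, x) p^x (1 - p)^(n - x) c^x,
    where c = Q_i1 + ... + Q_is."""
    total = (1 - p_i) ** n_i  # the screener refers none of the n_i slides
    for x in range(1, n_i + 1):
        total += comb(n_i, x) * p_i ** x * (1 - p_i) ** (n_i - x) * c ** x
    return total


def theta_closed_form(n_i, p_i, c):
    """[1 - p_i (1 - c)]^n_i."""
    return (1 - p_i * (1 - c)) ** n_i


for n_i, p_i, c in [(1, 0.3, 0.9), (3, 0.6, 0.5), (6, 0.95, 0.2)]:
    print(n_i, p_i, c, theta_by_summation(n_i, p_i, c), theta_closed_form(n_i, p_i, c))
```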

2139


P. G. Archer, I. Koprowska, J. R. McDonald, B. Naylor, G. N. Papanicolaou, and W. O. Umiker

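Now, if $y$ is the final report given on all $n$ slides,

$$\Pr(y = 1) = \Pr(y_i = 1 \text{ for all } i) = \prod_{i=1}^{k} \theta_i(1) = \prod_{i=1}^{k} [1 - p_i(1 - Q_{i1})]^{n_i},$$

$$\Pr(y \le 2) = \Pr(y_i \le 2 \text{ for all } i) = \prod_{i=1}^{k} \theta_i(2) = \prod_{i=1}^{k} \Big[1 - p_i\Big(1 - \sum_{j=1}^{2} Q_{ij}\Big)\Big]^{n_i}.$$

In general,

$$\Pr(y \le s) = \Pr(y_i \le s \text{ for all } i) = \prod_{i=1}^{k} \theta_i(s) = \prod_{i=1}^{k} \Big[1 - p_i\Big(1 - \sum_{j=1}^{s} Q_{ij}\Big)\Big]^{n_i}, \qquad s = 1, \ldots, k.$$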

This distribution represents the probability of the final classification $y$, conditional on there being $n_i$ slides in Category $i$. Since we have postulated a total of $n$ slides from a patient and probabilities $r_i$ of the patient producing slides in the $i$th category, the probabilities can be compounded further, in terms of the individual patient, as follows:

$$\Pr(y \le s) = \sum_{\{n_i\}} \Pr(y \le s \mid n_1, \ldots, n_k)\, \Pr(n_1, \ldots, n_k) = \sum_{\{n_i\}} \prod_{i=1}^{k} \Big[1 - p_i\Big(1 - \sum_{j=1}^{s} Q_{ij}\Big)\Big]^{n_i} \frac{n!}{\prod_{i=1}^{k} n_i!} \prod_{i=1}^{k} r_i^{\,n_i} = \Big[\sum_{i=1}^{k} r_i \Big(1 - p_i\Big(1 - \sum_{j=1}^{s} Q_{ij}\Big)\Big)\Big]^{n}$$

The above summation over $\{n_i\}$ indicates a sum over all partitions of $n$ (all $(n_1, \ldots, n_k)$ with $\sum_i n_i = n$), and the result follows from the multinomial theorem.
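The multinomial step can be verified the same way: summing the conditional product over every allocation $(n_1, \ldots, n_k)$ of the $n$ slides, weighted by its multinomial probability, reproduces $[\sum_i r_i b_i]^n$, where $b_i = 1 - p_i(1 - \sum_{j \le s} Q_{ij})$ is the per-slide factor for category $i$ at a fixed $s$. The function names and the numbers below are hypothetical.

```python
from itertools import product
from math import factorial, prod


def compounded_by_allocation(r, b, n):
    """Sum over all (n_1, ..., n_k) with n_1 + ... + n_k = n of
    n! / (n_1! ... n_k!) * prod_i r_i^{n_i} * prod_i b_i^{n_i}."""
    total = 0.0
    for counts in product(range(n + 1), repeat=len(r)):
        if sum(counts) != n:
            continue  # not an allocation of exactly n slides
        coef = factorial(n)
        for c in counts:
            coef //= factorial(c)  # exact at every step
        total += coef * prod((ri * bi) ** c for ri, bi, c in zip(r, b, counts))
    return total


def compounded_closed_form(r, b, n):
    """[sum_i r_i b_i]^n, by the multinomial theorem."""
    return sum(ri * bi for ri, bi in zip(r, b)) ** n


R = [0.7, 0.1, 0.1, 0.1]   # hypothetical r_i
B = [0.95, 0.6, 0.4, 0.2]  # hypothetical per-slide factors b_i for a fixed s
for n in range(1, 6):
    print(n, compounded_by_allocation(R, B, n), compounded_closed_form(R, B, n))
```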

Note that when $s = k$, $\Pr(y \le k) = 1$, as it should. To obtain the probability for an individual category, we use:

$$\Pr(y = 1) = \Pr(y \le 1),$$

$$\Pr(y = s) = \Pr(y \le s) - \Pr(y \le s - 1), \qquad s = 2, \ldots, k.$$
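Putting the pieces together, the sketch below evaluates the patient-level distribution of the final report from assumed values of $r_i$, $p_i$, and $Q_{ij}$, and checks it against a brute-force enumeration of the screening process described in assumptions 1 to 3 (each slide's standard category, whether the screener refers it, and the cytologist's classification). The parameter values are hypothetical illustrations, not estimates from the study, and the function names are assumptions.

```python
from itertools import product


def report_probabilities(r, p, Q, n):
    """P(y = s), s = 1..k, from
    P(y <= s) = [sum_i r_i (1 - p_i (1 - sum_{j<=s} Q_ij))]^n."""
    k = len(r)
    cumulative = []
    for s in range(1, k + 1):
        per_slide = sum(r[i] * (1 - p[i] * (1 - sum(Q[i][:s]))) for i in range(k))
        cumulative.append(per_slide ** n)
    previous = [0.0] + cumulative[:-1]
    return [cur - prev for cur, prev in zip(cumulative, previous)]


def report_probabilities_enumerated(r, p, Q, n):
    """Brute force over every slide outcome; an unreferred slide counts
    as negative (category 1), so a patient with no referrals is negative."""
    k = len(r)
    outcomes = []  # (probability, reported category) for a single slide
    for i in range(k):
        outcomes.append((r[i] * (1 - p[i]), 1))
        for j in range(k):
            outcomes.append((r[i] * p[i] * Q[i][j], j + 1))
    probs = [0.0] * k
    for slides in product(outcomes, repeat=n):
        weight = 1.0
        for prob, _ in slides:
            weight *= prob
        probs[max(cat for _, cat in slides) - 1] += weight
    return probs


# Hypothetical parameters: categories negative, ambiguous cells, suspect, positive.
R = [0.7, 0.1, 0.1, 0.1]
P = [0.2, 0.6, 0.8, 0.95]
Q = [[0.90, 0.08, 0.02, 0.00],
     [0.30, 0.50, 0.15, 0.05],
     [0.10, 0.30, 0.40, 0.20],
     [0.02, 0.08, 0.30, 0.60]]
for n in (1, 2, 3):
    print(n, [round(x, 4) for x in report_probabilities(R, P, Q, n)])
```

The closed form costs $O(k^2)$ per patient regardless of $n$, whereas the enumeration grows as $(k(k+1))^n$, which is why the derivation is worth having; the enumeration is only a correctness check for small $n$.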

Appendix B

APPENDIX B—TABLE 1

INTRASCREENER COMPARISONS OF READINGS ON 1ST AND 2ND ROUNDS IN ALL CYTOLOGY CENTERS COMBINED

ROUND 1            ROUND 2 READING
READING            Unsatisfactory  Negative  Non-negative  Total
Unsatisfactory           1             3           0          4
Negative                 5            28           2         35
Non-negative             4             3          14         21
Total                   10            34          16         60

APPENDIX B—TABLE 2

SUMMARY OF INTERSCREENER COMPARISONS OF READINGS ON 1ST AND 2ND ROUNDS IN ALL CYTOLOGY CENTERS COMBINED

ROUND 2            ROUND 1 READING
READING            Unsatisfactory  Negative  Non-negative  Total
Unsatisfactory           5            14           5         24
Negative                 5           150          43        198
Non-negative             4            33          81        118
Total                   14           197         129        340

APPENDIX B-TABLE 3
DISTRIBUTION OF SCREENERS' READINGS ON ROUND 1 BY CYTOLOGY CENTER

CYTOLOGY CENTER

A+°++-TotalTotal

No. ofnon-negativereadings15B++++——+—+-27C++—_+—++—-26D++++++—+——-5411724422242141100122

+, non-negative; −, negative or unsatisfactory.


APPENDIX B-TABLE 4
DISTRIBUTION OF SCREENERS' READINGS ON ROUND 2 BY CYTOLOGY CENTER

CYTOLOGYCENTERA+°+--TotalTotal

No. ofnon-negativereadings20B++—++—+——-5<>C+++_++_+—-43D++++—+_-+-4319111762134334100162


APPENDIX B-TABLE 6

COMPARISON OF STANDARD CLASSIFICATION WITH THE ORIGINAL CLASSIFICATION OF STUDY SLIDES

STANDARD           ORIGINAL CLASSIFICATION
CLASSIFICATION     Unsatisfactory  Negative  Ambiguous cells  Suspect  Positive  Total
Unsatisfactory           0             0            2             1        0        3
Negative                 0            20           26            14        1       61
Ambiguous cells          0             0            1             5        0        6
Suspect                  0             0            1             7        8       16
Positive                 0             0            0             3       11       14
Total                    0            20           30            30       20      100

(a) +, non-negative; −, negative or unsatisfactory.

APPENDIX B—TABLE 5

DISTRIBUTION OF NEGATIVE AND NON-NEGATIVE READINGS FOR 100 SLIDES CLASSIFIED ACCORDING TO SPECIFIC CRITERIA FOR STANDARD CLASSIFICATION

SLIDES CLASSIFIED BY SPECIFIC CRITERIA

Negative (a)
No. of readings                                      No. of slides
Negative  Non-negative  Unsatisfactory  Total
   8            0              0          8               21
   7            1              0          8               17
   7            0              1          8                6
   6            1              1          8                3
   6            2              0          8                4
   6            0              2          8                5
   5            1              2          8                2
   5            0              3          8                2
   5            2              1          8                1
Total                                                      61

Non-negative (b)
No. of readings                                      No. of slides
Negative  Non-negative  Unsatisfactory  Total
   0            8              0          8                9
   1            7              0          8                4
   0            7              1          8                3
   2            6              0          8                4
   2            5              1          8                1
   3            5              0          8                2
   4            4              0          8                2
   3            4              1          8                2
   2            4              2          8                1
Total                                                      28

SLIDES NOT CLASSIFIABLE BY SPECIFIC CRITERIA BUT CLASSIFIED ON INDIVIDUAL BASIS (c)
No. of readings                                      No. of slides
Negative  Non-negative  Unsatisfactory  Total
   5            3              0          8                4
   3            3              2          8                2
   4            3              1          8                2
   4            1              3          8                1
   2            2              4          8                1
   1            1              6          8                1
Total                                                      11

(a) Negative: 5 or more cytology center readings were negative and there were fewer than 3 non-negative readings.
(b) Non-negative: 4 or more non-negative cytology center readings (regardless of the categories of the other 4 readings).
(c) See text for explanation of method.


APPENDIX B-TABLE 7

PERCENT OF ALL SLIDES READ NON-NEGATIVE BY SCREENERS IN EACH CYTOLOGY CENTER BY ROUND

                 % NON-NEGATIVE READINGS, BY CYTOLOGY CENTER
                 A     B     C     D     D1    All centers
Round 1          15    27    26    33    47    25
Round 2          20    54    34    31    35    35
Both rounds      18    40    30    32    41    30

APPENDIX B—TABLE 8

COMPARISON OF CYTOLOGY CENTER D1 READINGS WITH CYTOLOGY CENTER D ON BOTH ROUNDS

CYTOLOGY CENTER    CYTOLOGY CENTER D READINGS
D1 READINGS        Unsatisfactory  Negative  Ambiguous cells  Suspect  Positive  Total
Unsatisfactory          18             0            1             0        0       19
Negative                 2            94            1             2        0       99
Ambiguous cells          1            15            4             5        4       29
Suspect                  1             4            3             6        8       22
Positive                 0             1            1             3       26       31
Total                   22           114           10            16       38      200

APPENDIX B—TABLE 9

COMPARISON OF READINGS OF CYTOLOGIST D WITH CYTOLOGIST D1 ON THOSE SLIDES SCREENED NON-NEGATIVE ON BOTH ROUNDS

CYTOLOGIST         CYTOLOGIST D READINGS
D1 READINGS        Unsatisfactory  Negative  Ambiguous cells  Suspect  Positive  Total
Unsatisfactory           0             0            0             0        0        0
Negative                 1             6            1             1        0        9
Ambiguous cells          1             8            2             4        4       19
Suspect                  1             1            2             6        8       18
Positive                 0             0            1             2       25       28
Total                    3            15            6            13       37       74

APPENDIX B—TABLE 10

READINGS OF CYTOLOGISTS D AND D1 ON EACH SLIDE SCREENED NON-NEGATIVE ON BOTH ROUNDS (a)

ONE ROUND    OTHER ROUND   |   ONE ROUND    OTHER ROUND
D    D1      D    D1       |   D    D1      D    D1
4    4       4    4        |   4    4       3    2
4    4       4    4        |   4    4       2    2
4    4       4    4        |   2    3       4    3
4    4       4    4        |   2    4       3    1
4    4       4    4        |   4    2       1    2
4    4       4    4        |   0    2       3    3
4    4       4    3        |   3    3       1    1
3    4       4    4        |   1    2       3    3
4    4       4    3        |   3    2       0    3
4    4       4    3        |   3    3       1    1
4    4       4    3        |   2    3       0    1
4    4       4    3        |   2    1       3    2
4    4       4    2        |   1    2       1    3
4    2       4    4        |   1    2       2    2
3    3       4    4        |   1    2       1    2
3    3       4    4        |   1    1       1    2
4    3       4    3        |   1    2       1    1
4    4       3    2        |   1    1       1    1
3    4       4    2        |

(a) 1, negative; 2, ambiguous cells; 3, suspect; 4, positive.


APPENDIX B—TABLE 11

COMPARISON OF READINGS OF CYTOLOGY CENTER C WITH CYTOLOGY CENTER D ON 1ST AND 2ND ROUNDS

CYTOLOGY CENTER D    CYTOLOGY CENTER C READINGS
READINGS             Unsatisfactory  Negative  Ambiguous cells  Suspect  Positive  Total
Round 1
  Unsatisfactory           4             8            1             0        0       13
  Negative                 2            47            5             0        0       54
  Ambiguous cells          1             6            1             0        0        8
  Suspect                  1             4            0             0        1        6
  Positive                 0             1            4            10        4       19
  Total                    8            66           11            10        5      100
Round 2
  Unsatisfactory           1             6            1             1        0        9
  Negative                 1            52            2             4        1       60
  Ambiguous cells          0             1            0             1        0        2
  Suspect                  1             4            0             2        3       10
  Positive                 0             0            0             2       17       19
  Total                    3            63            3            10       21      100

APPENDIX B-TABLE 12

COMPARISON OF READINGS ON 1ST AND 2ND ROUNDS FOR CYTOLOGISTS A AND D

ROUND 2            ROUND 1 READING
READING            Cytologist A                             Cytologist D
                   Unsat.  Neg.  Amb.  Susp.  Pos.  Total   Unsat.  Neg.  Amb.  Susp.  Pos.  Total
Unsatisfactory       2      7     4     0      1     14       0      1     0     1      0      2
Negative             1     26     2     0      0     29       ?      ?     ?     ?      ?     65
Ambiguous cells      1     19    11     1      0     32       ?      ?     ?     ?      ?      9
Suspect              0      1     5     6      0     12       1      2     0     5      4     12
Positive             0      0     3     6      4     13       ?      ?     ?     ?      ?     12
Total                4     53    25    13      5    100       ?      ?     ?     ?      ?    100

Unsat., unsatisfactory; Neg., negative; Amb., ambiguous cells; Susp., suspect; Pos., positive; ?, illegible in the source.

APPENDIX B-TABLE 13

COMPARISON OF READINGS FOR CYTOLOGISTS A AND D ON THOSE SLIDES ON WHICH CYTOLOGISTS A AND D WERE CONSISTENT WITH THEMSELVES

CYTOLOGIST D       CYTOLOGIST A READINGS
READINGS           Unsatisfactory  Negative  Ambiguous cells  Suspect  Positive  Total
Unsatisfactory           0             0            0             0        0        0
Negative                 2            14           12             1        1       30
Ambiguous cells          0             0            0             0        0        0
Suspect                  0             1            1             2        1        5
Positive                 0             0            1             0        1        2
Total                    2            15           14             3        3       37

APPENDIX B-TABLE 14

COMPARISON OF READINGS OF CYTOLOGISTS A AND D1 ON THOSE SLIDES ON WHICH CYTOLOGISTS A AND D WERE BOTH CONSISTENT WITH THEMSELVES

CYTOLOGIST D1      CYTOLOGIST A READINGS
READINGS           Unsatisfactory  Negative  Ambiguous cells  Suspect  Positive  Total
Unsatisfactory           0             2            0             0        0        2
Negative                 1            12            2             0        0       15
Ambiguous cells          0            10            2             0        2       14
Suspect                  0             1            0             2        0        3
Positive                 0             0            1             1        1        3
Total                    1            25            5             3        3       37

APPENDIX B-TABLE 15

COMPARISON OF READINGS OF CYTOLOGISTS D AND D1 ON THOSE SLIDES ON WHICH CYTOLOGISTS A AND D WERE CONSISTENT WITH EACH OTHER

[Table body not recoverable from the source.]


APPENDIX B-TABLE 16

COMPARISON OF READINGS ON 1ST AND 2ND ROUNDS BY CYTOLOGIST A ON SLIDES CYTOLOGIST D AGREED WITH HIMSELF ON BOTH ROUNDS

ROUND 2            ROUND 1 READINGS
READINGS           Unsatisfactory  Negative  Ambiguous cells  Suspect  Positive  Total
Unsatisfactory           1             7            3             0        0       11
Negative                 1            25            2             0        0       28
Ambiguous cells          1            15            5             0        0       21
Suspect                  0             1            1             3        0        5
Positive                 0             0            1             5        3        9
Total                    3            48           12             8        3       74

APPENDIX B-TABLE 17

COMPARISON OF READINGS ON 1ST AND 2ND ROUNDS BY CYTOLOGIST D ON SLIDES CYTOLOGIST A AGREED WITH HIMSELF ON BOTH ROUNDS

ROUND 2            ROUND 1 READINGS
READINGS           Unsatisfactory  Negative  Ambiguous cells  Suspect  Positive  Total
Unsatisfactory           0             1            0             0        0        1
Negative                 0            30            0             0        0       30
Ambiguous cells          0             4            0             0        0        4
Suspect                  1             1            0             5        2        9
Positive                 0             0            0             3        2        5
Total                    1            36            0             8        4       49

APPENDIX B-TABLE 18

COMPARISON OF 2 INDEPENDENT READINGS OF 3 TECHNICIANS (a)

                 2ND READING (b)
                 Technician A     Technician B     Technician C     All (A, B, and C combined)
1ST READING      −    +   Total   −    +   Total   −    +   Total   −     +     Total
−               29    5    34    24    8    32    21   12    33    74    25     99
+                9   57    66    19   49    68     9   58    67    37   164    201
Total           38   62   100    43   57   100    30   70   100   111   189    300

References

1. Archer, P. G. A Mathematical Framework for the Investigation of Precision in Cytological Screening Studies. Unpublished D.Sc. thesis, Johns Hopkins University, March, 1965.

2. Birkelo, C. C., Chamberlain, W. E., Phelps, P. S., Schools, P. E., Zachs, D., and Yerushalmy, J. Tuberculosis Case Finding. J. Am. Med. Assoc., 133: 359-66, 1947.

3. Chiang, C. L. On the Design of Mass Medical Surveys. Human Biol., 23: 242-71, 1951.

4. Cochran, W. G. The Comparison of Percentages in Matched Samples. Biometrika, 37: 250-66, 1950.

5. Foot, N. C. Cytologic Diagnosis in Suspected Pulmonary Cancer. Am. J. Clin. Pathol., 25: 223-40, 1955.

6. Garland, L. H. On the Scientific Evaluation of Diagnostic Procedures. Radiology, 52: 309-28, 1949.

7. Harris, E. K. An Application of Statistical Method to the Examination of Sputum for Neoplastic Cells. Human Biol., 23: 180-204, 1951.

8. Neyman, J. Outline of Statistical Treatment of the Problem of Diagnosis. Public Health Rept., U. S., 62: 1449-56, 1947.

9. Russell, W. O., Neidhardt, H., Mountain, C. F., Griffith, K. M., and Chang, J. P. Cytodiagnosis of Lung Cancer. Acta Cytol., 7: 1-44, 1963.

10. Siegler, E. E. Microdiagnosis of Carcinoma in Situ of the Uterine Cervix. Cancer, 9: 463-69, 1956.

11. Wilson, E. B., and Burke, M. H. Some Statistical Observations on a Cooperative Study of Human Pulmonary Pathology. Proc. Natl. Acad. Sci. U. S., 43: 1073-78, 1957.

12. Wilson, E. B., and Burke, M. H. Some Statistical Observations on a Cooperative Study of Human Pulmonary Pathology. II. Ibid., 45: 389-93, 1959.

13. Wilson, E. B., and Burke, M. H. Some Statistical Observations on a Cooperative Study of Human Pathology. III. Ibid., 46: 561-66, 1960.

14. Yerushalmy, J. Statistical Problems in Assessing Methods of Medical Diagnosis with Special Reference to X-ray Techniques. Public Health Rept., U. S., 62: 1432-49, 1947.

(a) Data adapted from Tables 7 and 9 of Reference 7. (b) +, positive; −, negative.


Cancer Res 1966;26:2122-2144. P. G. Archer, I. Koprowska, J. R. McDonald, et al. A Study of Variability in the Interpretation of Sputum Cytology Slides.
