flynn1984b[1]

Upload: kelliotu

Post on 10-Apr-2018

215 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/8/2019 flynn1984b[1]

    1/23

  • 8/8/2019 flynn1984b[1]

    2/23

  • 8/8/2019 flynn1984b[1]

    3/23

  • 8/8/2019 flynn1984b[1]

    4/23

  • 8/8/2019 flynn1984b[1]

    5/23

    Table 2WhiteAmericans: Evidencefor IQ Gains 1932 to 1978

    Dates

    Test combination

    1. SB-L&WISC2. SB-M & WISC3. SB-LM & WISC4. SB-L &WAIS5. SB-LM &WAIS6. SB-LM & WPPSI7. SB-LM & SB-728. WB-I & WISC9, WB-I & WAIS

    10. WISC & WAIS11. WISC & WPPSI12. WISC & SB-7213. WISC & WISC-R14. WAIS & WISC-R15. WAIS & WAIS-R16. WPPSI & SB-7217. WPPSI & WISC-R18. WISC-R & WAIS-R

    Testl

    19321932193219321932193219321936%1936%1947%1947%1947%1947%1953%1953%1964%1964%1972

    Test 2

    1947%1947%1947%1953%1953%1964%1971%1947%1953%1953%1964%1971%1972

    - 197219781971%19721978

    Studies

    1716328123421

    1711121

    N

    1,56346

    460271

    79416

    2,35111015243610830

    1,042407235

    14080

    Means

    Testl

    107.13- 125.13

    1 14.64113.02109.08101.74107.08103.51122.94101.7693.5696.4097.19

    102.94109.6993.06

    112.8499.61

    Test 2

    101.64107.56109.67105.48101.7592.7897.19

    105.54118.2599.1290.8684.4288.789629 -

    101.6588.65

    108.5898.65

    Gain

    5.4917.574.977.547.338.969.89

    -2.034.692.642.70

    11.988.416.658.044.414.260.96

    Years

    15%15%15%21%21%32%39%11176

    172424%18%24%

    77%6

    Rate

    .3541.134

    .321

    .351

    .341

    .276

    .250-.185

    .276

    .440:159.499.343.359.328.630.568.161

    Ages(years)

    5-155

    5-1516-3216-484-62-18

    11-1416-3914-175-66-106-15

    16-17354

    4-55-616

    _

    OO!->05

    iU>

    3s

    Note. See Table 1 for full test names. Totals: 73 studies and 7,431 subjects. Age range in years: 2 to 48 (Mdn = 10.6). All means are weighted in terms of the number ofsubjects with the exception of Combinations 1, 3, and 7. These gave age-specific results and therefore were weighted so that each age counted equally. Sources: 1. Arnold &Wagner (1955); Barratt & Baumgarten (1957); Clarke (cited in Pastovic & Guthrie, 1951); Cohen & Collier (1952); Estes, Curtin, De Burger, & Denny (1961); Frandsen &Higginson (1951); French (cited in Zimmerman & Woo-Sam, 1972, p. 242); Holland (1953); Jones (1962, p. 121); Krugman, Justman, Wrightstone, & Krugman (1951, p.476); Kureth, Muhr, & Weisgerber (1952, p. 282); Levinson (1959); McCoy (cited in Zimmerman & Woo-Sam, 1972, p. 242); Mussen, Dean, & Rosenberg (1952); Pastovic& Guthrie (1951); Rappaport (cited in Pastovic & Guthrie, 1951); Weider, NoUer, & Schramm (1951). 2. Triggs & Cartee (1953). 3. Barclay & Carolan (cited in Zimmerman

    & Woo-Sam, 1972, p. 242); Brittain (1968); Estes (1965); Estes et al. (1961); Oakland, King, White, & Eckman (1971); Sonneman (cited in Zimmerman and Wpo-Sam, 1972,p. 242). 4. Bradway & Thompson (1962, pp. 2-3, 13); Giannell & Freeburne (1963, p. 565); Wechsler (1955, p. 21). 5. Kangas & Bradway (1971); McKerracher & Scott(1966). 6. Barclay & Yater (1969); Pagan, Broughton, Allen, Clark, & Emerson (1969); Flynn (1980, pp. 184-185 plus Garber & Heber, 1977, pp. 122 & 125); Oakland et al.(1971); Pasewark,Rardin, & Grice (1971, p. 46); Prosser & Crawford (1971); Rellas (1969); Wechsler (1967, p. 34). 7. Thorndike (1973, p. 359). 8. Knopf, Murfett,& Milstein(1954); Price & Thome (1955); 9. Karson et al. (1957); Neuringer (1963, p. 758); Rabourn (cited in Wechsler, 1958, p. 116). 10. Hannon & Kicklighter (1970); Quereshi(1968a, p. 77); Quereshi & Miller (1970, p. 108); Simpson (1970). 11. Oakland et al. (1971); Yater, Boyd, & Barclay (1975); 12. Brooks (1977). 13. Appelbaum & Tuma(1977, p. 142); Brooks (1977); Covin (1977); Davis (1977, p. 164); Hartlage & Boone (1977, p. 1285); Klingeet al. (1976, p. 74); Larrabee & Holroyd (1976); Reynolds &Hartlage (1979); Schwarting (1976); Solly (1977); Solway et al. (1976); Stokes et aL (1978); Swerdlik (1978, p. 119); Thomas (1980, p. 440); Tuma et al. (1978, p. 342); Weiner& Kaufman (1979); Wheaton et al. (1980). 14. Wechsler (1974, p. 50). 15. Wechsler (1981, p. 47). 16. Sewell (1977). 17. Rasbury, McCoy, & Perry (1977); Wechsler (1974,p. 49). 18. Wechsler (1981, p. 48).

    OJ

  • 8/8/2019 flynn1984b[1]

    6/23

    34 JAMES R. FLYNN

    Table 3White Americans: Estimates oflQ Gains1932 to 1978

    Samples

    SB, 1932 & W, 1947%SB , 1932 & W, 1953%SB , 1932 & W, 1964%SB , 1932 & S B , 1 9 7 1 V 2W, 1936'/2&,W, 1947%W, 1936% & W, 1953%W, 1947%& W, 1953%W, 1947% & W , 1964%W, 1 94 7 % & S B ,1971%W, 1947% & W, 1972W, 1953%& W, 1972W, 1953%& W, 1978W , 1 96 4 % & S B ,1971> /2W, 1964%& W, 1972W, 1972 & W, 1978Total

    All samplesSamples withN > 14 0Samples withN ;> 400

    N

    2,069350416

    2,35111015243610830

    1,042407235

    14080

    IQgain

    5.767.498.969.89

    -2.034.692.642.70

    11.988.416.658.044.414.260.96

    84.8152.1035.66

    Intervalin years

    15%21%32%39%1 1176

    1 72424%18%24 %

    77%6

    272164118

    Note. SB = Stanford-Binet;W = Wechsler. Rates in IQpoints per year(IQ gain divided by intervals) were,for allsamples, .312; for sampleswith N a; 140, .318;for sampleswith N 2; 400, .302. The samples amalgamate all testcombinations normed on the same combination of stan-dard ization samples; see Table 2 for the test combinations,those having identical dates were the ones amalgamated.

    him to suggest that IQ gains "are experiencedprimarily, perhaps even exclusively, in thepreschool y ears"(p. 6).However,the full bodyof data in Table 2 presents a very differentpicture. A nalysis of those test combinationsthat measure gainsfrom the Stanford-Binet

    standardization sampleof 1932 to the wisesample of 1947-1948 shows that the dispro-portionate gains of young children were totallypresent by 1947. After that date they simplydisappear, for example, the test combinationof wise to WISC-R, measuring gains for allages of children from 1947-1948 to 1972,showsa pattern of almost complete uniformity.Indeed, all of thepost-1947 data suggest uni-formity; the reader need only turnto Table 2and compare test combinations that includeyoung children, older children,and adults tosee that IQ gains have not been age-specific.

    There are two possibilities: either age-spe-cific gains occurred before 1947and termi-nated abruptlyat that date; or even those earlygains were unifo rmand age-specific resultsare

    artifacts of irregularities in the Stanford-B inetsample of 1932, the earliest and most primitiveof our samples. The latter conclusion is sug-gested by the fact that the only test combi-

    nations providing age-specific results have thatsample as their starting point.A t anyrate, thedominant trend for our whole period 1932 to1978 is one of IQ gains for all ages from 2 to48 years.

    Thus far we have used Table 2 to derive avery rough estimate of the rate of IQ gainsprevalent in America since 1932, an estimatethat suggested an overall rate of at least .300points per year. In order to get amore accurateestimation, I have merged thedata in a varietyof ways, weighting for number of subjects,length of period covered, ages covered, andcombinations of these. However,for the sakeof economy, I wish to suggest a simple methodbased on three simplifying assumptions. Thefirst assumption is that the rate of gain hasbeen fairly uniform for allages, which requiresno further comment. The second is that rateshave been fairly uniform year-by-year all theway from 1932 to 1978. SinceI will examinethis question in detail a few pages hence,fornow the reader need merely divide Table 2into test com binations before andafter 1947,or before and after 1953, to be struck by thesimilarity of the rates. The last assumption isthat rates have beenfairly uniformfor subjectswhose IQs (on themore recent test) are aboveand below the population mean of 100, andonce again, I invite the reader to divide thedata along those lines.

    This puts us in a position to understand

    Table 3, which generates estimates of the IQgains of A mericans in general since1932. The18 test combinations have been collapsed into15 combinations of standardization samplesbecause a number of tests have norms basedon a common standardization sample, namely,the SB-L, SB-M, and SB-LM. Given our as-sumptions, each sample combination givesequally valid inform ation about p art o f the 46years we want to measure (1932 to 1978), pe-riods ranging from only 6 years to as muchas 39'/2 years. Therefore, the proper methodof deriving an estimate for the whole 46 yearsis to weight the rates of various combinationsin terms of the length of their periods. Themathematical method for doing this is to totalall gains and divide by the total of all periods:

  • 8/8/2019 flynn1984b[1]

    7/23

    IQ GAINS 1932 TO 1978 35

    Table 3 does this and gives a rate of gain of.312 IQ points per year, very close to the roughestimate.

    The number of subjects needed to measure

    the gap between the norms set by a given com-bination of standardization samples is im-portant up to a point. Ideally, we wpuld haveat least 400 subjects who had taken each com-bination of tests so as to reduce measurementerror to a minimum; beyond that numbersare of diminishing importance. At present,however, the only way to reduce the influenceof combinations with small numbers, on theassumption that their measurement errormight be great, is by selective elimination. That

    is why Table 3 offers estimates based on com-binations with at least 140 subjects and com-binations with at least 400. The fact that ourvarious estimates differ so little from one an-other is gratifying, and because the estimatebased on 400 subjects or more seems best, Iwill use it throughout this article, roundingoff .302 points per year to .300 for ease ofcomputation.

    Our data suggests not only a rate of gainfor the whole of 1932 to 1978 but also dif-ferential rates for shorter periods within thoseyears. This has relevance for testing one variantof the "threshold" hypothesis, the notion thatincreased environmental quality beyond acertain point will yield diminishing IQ gains:It may be important to be raised in a homewith some rather than no books, but less im-portant that there are a thousand rather thana hundred books. Perhaps the United Statesduring a period of prosperity has been ap-proaching such a threshold, and if so, we wouldexpect not a constant rate of IQ gain over ourwhole period but a falling off.

    Table 4 provides estimates of rates of gainfor both 1932-1948 and 1948-1972, and asthe reader can see, the rates are very similarindeed. When the data are divided up amongthese shorter periods, each period possessestoo few sample combinations to allow for ourusual method of calculating rates of gain.Therefore, I adopted the expedient of takingall the combinations whose dates put themmainly into a certain period, merging the gainsof their subjects, and thus treating them as ifthey were one large sample combination. Thismethod is crude but it has the advantage ofreducing the influence of combinations with

    Table 4White Am ericans: IQ Gains SelectedPeriods

    Period

    1932-19481948-1972

    1932-19481948-19601960-1972

    -

    Rate'

    .368

    .353

    .368

    .347

    .359

    No. ofstudies

    2929

    296

    "N.

    2,4191,903

    2,419544

    Data used b

    1-3, 4", 510-14, 15C,

    T6, 17

    1-3, 4V510, 11'Derived from

    1948-1960and1948-1972

    " IQ points per year.b From Table 2. "Prorated.

    small numbers of subjects. The usual methodproduces exactly the same pattern of results,but it has the disadvantage of generating lessplausible values for the shorter periods. In cal-culating these rates, I tried to givethe thresholdhypothesis every possible chance by derivinga maximum rate of gain for the earlier yearsof 1932 to 1948. This meant excluding theWechsler-Bellevue I data, which would havepulled the rate down and which are of dubiousvalue, as we have seen.

    Table 4 also attempts to divide our dataamong three short periods of approximatelyequal length. The rate of gain for 1932-1948of course remains the same, and there wereenough subjects whose test combinationsmatched the period 1948-1960 to get an es-timate for those years. A s for 1960-1972, therewere plenty of subjects whose terminal dateextended to 1972, but their initial date tendedto be well before 1960. Therefore, the rate forthat period was calculated from those for both1948-1960 and 1948-1972 under the as-sumption that they were accurate. The endresult offers differential rates for the early,middle, and late segments of the 40 years be-tween 1932 and 1972, and once again the con-stancy of the rate of IQ gains over all thoseyears is most striking.

    A s for the situation since 1972, we mustlook right back to Table 2, where we find twostudies that extend to 1978, both of whichindicate gains of some sort: Test Combination15 shows a rate of .328 points per year, whichis close to our overall rate, and Test Combi-nation 18 gives .161 points, which is somewhat

  • 8/8/2019 flynn1984b[1]

    8/23

    36 JAMES R. FLYNN

    lower. However, the numbers of subjects, 72and 80, respectively, are too low to allow fora reliable estimate of IQ gains over the lastdecade. Even if the rate has begun to diminish,

    interpretation of the significance of such aphenomenon would be complicated by the factthat there is a serious case for a deterioratingenvironment during the 1970s, for example,those years witnessed the decline of the nuclearfamily and a growth in the number of childrenliving in single-parent homes. The safest con-clusion about the threshold hypothesis is anegative one: As yet, there is no evidence thatfurther environmental enrichment fo r Amer-icans in general will yield diminishing returnsin regard to IQ.

    Implications

    IQ Gains and Levels of Reality

    The magnitude of our estimate, that Amer-icans gained 13.8 IQ points from 1932 to 1978(.300 X 46 years), is sure to provoke a varietyof opinions about the reality of such gains. Ihope to convince others that the gains are real,but I wish to place equal emphasis on some-thing else: convincing skeptics that they mustnot dismiss the data as unimportant becausesurprisingly enough, the implications are al-most as great whether gains are real or simplyan artifact of sampling error. As to levels ofreality, thus far wehave noted two possibilities:that Wechsler and Stanford-Binet standard-ization samples have been reasonably repre-sentative of Americans in general and there-fore, the fact that each sample has set a higher

    standard than its predecessor signals real IQgains; or that a series of mistakes producedstandardization samples that just happened toerr on the side of being more and more eliteover time and therefore, these mistakes havemimicked IQ gains. Eventually, I will add athird possibility, that IQ gains are semireal,and the discussion of the implications of thedata will attempt to cover all three of thesepossibilities.

    IQ Gains and the SAT Score Decline

    Between 1963 and 1981 (there was a slightupturn in 1982), American high school stu-dents who took the Scholastic Aptitude Test(SAT) showed a sharp decline in their averageperformance,particularly on the SAT-Verbal,

    the test most significant as a predictor of col-lege grades. If American IQ gains are real, andif they extend into this period, then the SAT-V score decline becomes almost inexplicable

    and, insofar as we attempt an explanation,suggests societal trends of the most alarmingsort. Let us explore the following propositions:that the years 1963 to 1981 saw either no netloss in IQ or steady gains of up to .300 pointsper year and that IQ accounts for about 64%of variance on the SAT-Verbal.

    We can divide the years of the SAT scoredecline into two equal periods, 1963 to 1972and 1972 to 1981. Our Stanford-Binet andWechsler data show that IQ gains persisteduntil about 1972, but after that the evidenceis fragmentary.As wehave seen, Table 2 revealsone study of 72 adults with a high rate of gainand another study of eighty 16-year-olds witha lesser rate. Table 4 is more helpful in thatit shows that the rate of gain was virtuallyconstant for decades before 1972,with no ten-dency to diminish as that year was approached.N ow it is logically possible that some envi-ronmental deterioration cuts through the year1972 and we suddenly go from gains of .300points per year to losses at that rate, but sucha turnabout seems unlikely. For example, SATscores did not suddenly go from gains to lossesbut rather after a long period of stability, begana gradual decline that then gained momentum.It seems much more likely that the IQ gainsthat persisted for so many years before 1972have continued either at the same rate or ata diminishing rate. Nevertheless, I will offertwo possibilities as limiting estimates of IQ

    trends from 1963 to 1981. One is the possi-bility of 9 years of gain being balanced by 9years of loss, which would result in nil gainoverall and will be called our safe estimate.The other is the possibility of gains at .300points per year throughout, which would resultin an overall gain of 5.4 IQ points (.300 X 18years), or .360 standard deviation units andwill be called our speculative estimate.

    That IQ accounts for 64% ofSAT-V vari-ance assumes that the Stanford-Binet and the

    various Wechsler tests correlate with the SAT-V at about .80 (correlation squared equalspercentage of variance explained). Incredibleas it seems, there appear to be no correlationalstudies in the literature, but a good case canbe made for a value of .80 (assuming correctionfor restriction of range) on the following

  • 8/8/2019 flynn1984b[1]

    9/23

  • 8/8/2019 flynn1984b[1]

    10/23

    38 J A M E S R. FLYNN

    The advisory panel went beyond the factsof the SAT-V score decline to discuss possiblecauses, which meant listing the personal traitsthat contribute to scholastic aptitude: intel-

    lectual ability, motivation, study habits, self-discipline, and acquired verbal and writing,skills. They recognized that these personaltraits could only be the proximate cause ofthe test performance decline, with the ultimatecauses being such things as less demandingtextbooks, less demanding school standards ingeneral, rates of student absenteeism com-monly running above 15%,the erosion of thenuclear family, and the advent of television. Iam going to take the liberty of identifying themain ability component that goes into scho-lastic aptitude as IQ and will describe as non-IQ factors the rest of the personal traits thepanel lists.

    We now have everything we need to analyzethe combination of IQ trends 1963 to 1981and SAT-V trends during the same period. Torecapitulate, there were IQ gains somewherebetween nil (safe estimate) and .360 SOU(speculative estimate); SAT-V losses were about

    .288 SOU; and SAT-V variance was 64% dueto IQ and 36% due to non-IQ personal traits.To anticipate: these values entail a decline innon-IQ personal traits, motivation, self-dis-cipline, and so forth, from 1963 to 1981 ofsuch magnitude as to constitute a national di-saster. The first step is to calculate how mucha one-standard-deviation drop in non-IQ traitswould lower SAT-V scores. Assuming thesetraits have a roughly normal distribution, thatmeans taking the square root of .36, the non-

    IQ portion of SAT-V variance, which gives.600 SDU. The next step is to calculate whatdrop in non-IQ traits has occurred assumingnil IQ gain during this period, which is simply.288 (the SAT loss) + .600 (the SAT loss perSD of non-IQ drop) and gives .480 standarddeviation units as the non-IQ decline. The finalstep is to take the possibility of steady IQ gainsduring this period into account, for naturallysuch gains would tend to boost SAT-V testperformanceand thus have to be overwhelmedby an even greater deterioration in non-IQpersonal traits. Just to make this clear: an IQgain of .360 SDU would tend to boost SATperformanceby , coincidentally, .288 SDU (thesquare root of .64 X .360 = .288); yet whatwe have is an S A Tloss of .288 SDU; thus the

    total SAT-V loss to be explained by non-IQfactors amounts to .576 SDU. When we dividethis by .600, we get .960 standard deviationunits as the non-IQ decline.

    Our analysis dictates the conclusion thatAmerican 18-year-olds have deteriorated .480to .960 SDU in terms of a total package con-sisting of motivation, study habits, self-disci-pline, and acquired verbal and writing skills.Which is to say that only the upper, 17% to32% of today's 18-year-olds can match the up-per half of young people as recently as 1963!The calculations above should not be takenliterally of course: They are merely meant toshow that if both IQ gains and SAT losses aretaken to be real, rather than artifacts of sam-pling error, then the deterioration of non-IQpersonal traits among young Americans musthave been very great.

    I say this because the calculations above canbe fairly described as oversimplified; for ex-ample, they assume that IQ gains and non-IQlosses affect scholastic aptitude in a simple ad-ditive fashion. When we go beyond the per-sonal traitsintelligence, motivation, and so

    onthat are the proximate causes of SAT testperformance and look at the ultimate causes,surely these will interact in a complex andnonadditive fashion. But it is precisely at thispoint that one's head begins to spin: do lessdemanding textbooks and low-level TV pro-grams raise intelligence while lowering verbalskills; do declining standards in schoolssharpen the mind while undermining studyhabits; does student absenteeism mean stu-dents are engaged in mentally demanding tasks

    while missing out on knowledge; does a de-moralized family environment boost IQ whilelowering motivation? Going beyond simplemodels to speculate about ultimate causesmakes no sense whatsoever of the trends inquestion.

    In sum, the combination of IQ gains andSAT-V losses carries an unpalatable impli-cation and problems of causal explanation ofa baffling nature. Given this, we must take thepossibility that IQ gains are not real more se-riously: after all, if IQ gains are a mere artifactof sampling error, they tell us nothing aboutintelligence; and if the intelligence of youngAmericans began to decline in 1963, there isno mystery as to why SAT-V performance de-clined. Let us then grant at least the possibility

  • 8/8/2019 flynn1984b[1]

    11/23

    IQ GAINS 1932 TO 1978 39

    that IQ gains are not real and see whether ornot this possibility robs our mass of Stanford-Binet and Wechsler data of all of its signifi-cance.

    IQ Gains and Confounding Variables

    IQ gains produce obsolete norms and ob-solete norms have acted as unrecognized con-founding variables in literally hundreds ofstudies, misleading researchers about the na-ture and significance of their results. This istrue no matter whether IQ gains are real ornot. After all, standardization samples haveset higher and higher norms over time: This

    fact remains whether the cause is a series ofrepresentative samples reflecting genuine IQgains or a series of unrepresentative samplesgrowing steadily more elite because of sam-pling errors. Thus, for whatever reason, if twoStanford-Binet or Wechsler tests were normedat different times, the later test can easily be5 or 10 points more difficult than the earlier,and any researcher who has assumed the testswere of equivalent difficulty will have goneastray.

    In analyzing the results of such research,begging the question of the reality of IQ gainsmerely means using certain terms with care:Obsolete norms are simply onesjthatare earlierand easier than later norms; current normsare simply those attached to a Stanford^Binetor Wechsler test that happened to'be stan-dardized at about the time the researcher inquestion actually tested subjects. In adjustingthe scores of subjects to compensate for theirhaving taken tests of unequal difficulty, I willuse current norms as the point of reference.Assuming total unreliability of standardizationsamples, they will do as well as any other, andif we assume some sort of progress, the more

    recent norms would have at least some ad-vantage in terms of reliability. Table 6 detailshow many points we should subtract from anearlier (and easier) test to make its scoresequivalent with those from a later (and moredifficult) test.

    This table requires several words of caution.First the allowances to be subtracted have beencalculated on the basis of our uniform scoringconvention (white M = 100, SD - 15), whichmeans that researchers must first use Table 1to translate all test scores into that convention

    Table 6A llowing forObsoleteNorms

    Tests Allowance'

    SB-LM &WPPSIWISC & WPPSIWISC & WISC-RWPPSI & WISC-RWPPSI & SB-72SB-72 & WISC-R

    Subtract 9 points from SB-LMSubtract 3 points from WISCSubtract 8 points from WISCSubtract 4 points from WPPSISubtract 4 points from WPPSINone

    Note. See Table 1 for full test names. Use this table onlyafter using Table 1 to translate all test scores into ouruniform scoring convention. For SB-LM results, ignorethe value 98 for white mean in Table 1: rather take themas scored against a white mean and SD of 100 and 16and translate into our convention of 100 and 15. See text

    for other cautionary notes. Sources: Table 2 and Wechsler(1974, pp. 51-52).

    and only then use Table 6. Second, simpleallowances of this sort are sometimes reliableonly for scores in the normal range of 90 to110: I know of no pair of tests in this tablewhere high and low scores require special ad-justments, but such pairs do exist (Hannoh &Kicklighter, 1970). The wise and WISC-Rmayappear to be such a pair but once again, ifscores are first translated into our uniformscoring convention, the 8-point deductionfrom the wise seems to hold throughout thescale except perhaps at the very extremes.Third, if a researcher's subjects are all of acertain age* say 10 years old, the allowancerequired may bedifferent than the average forall ages. Finally, the number of subjects whotook both tests and thus provided the basisfor the allowance ranges from about 100 to400 for most combinations, a situation thatmust be remedied by further studies.

    Despite all of these qualifications, using bothTable 1 (for translation into uniform scoringconvention) and Table 6 (for its allowances)to compare performances on different tests isfar better than present practice: acceptingscores at face value as if the tests were com-parable and equivalent in difficulty. To dem-onstrate this, from many available candidates,I have selected a few studies for analysis: theMilwaukee Project because of its great noto-riety and because of growingfrustration aboutreplicating its results; and three other studiesthat show how scholars who thought they weremeasuring the predictive value of tests, or theeffects of modes of administering tests, or cul-

  • 8/8/2019 flynn1984b[1]

    12/23

    40 J A M E S R. FLYNN

    Table 7Heber's Experimental Group: Mean IQPerformances with Increasing Age

    Age (years)

    Test 2-3 4-6 7-9 10-11 12-14

    Results as presented

    SB-LM 122WPPSIwise

    121111

    103 104 100

    Results adjusted

    SB-LM 108WPPSIwise

    107105

    95 96 92

    Note. See Table 1 for full test names. Adjusted resultsscored againstWISC-R norm s, sample tested 1972, trans-lated into M = 100 and SD = 15 for whites. Sources:Garber (1982, May, pp . 12-13, 18, 33a; personal com-munication, December 14, 1982), Heber & Garber (1975,p. 40).

    tural bias, were really measuringthe rate ofobsolescence.

    Heber and the MilwaukeeProject. Heber

    and Garber selected 20 children whose risk ofmental retardation, based on all known in-dices, was 16 times greater than the average,indeed, children who promised to have aneventualmean IQ ofabout 70. Who can forgetthe g reat days of the early 1970s when the firstreports emerged: that from ages 2 to 4, theexperimentalchildren were not merely normalbut superior, that their meanIQ on the Stan-ford-Binet was above 120. The news spreadbeyondAmericato the whole English-speakingworld: When A . D. C larke delivered the thirdFred Esher lecture, he gave 25% of his timeto Heber and, despite words of caution, en-thused over "the remarkable accelerationofdevelopment" of the children of the experi-mental group (Clarke, 1973,p. 15). Heber'sresults quicklyfound their way into the text-books: Mussen,Conger,and Kagan(1974)wastypical with its references to this "exciting"study, its "most encouraging" results, its "im-pressive" findingsnot the usual language ofa text.

    Table 7 showswhy theearly excitement soonbecame mixed with bewilderment and someconsternation. If readers look at the Resultsas presented data, which represent the scores

    actually released, they willsee the early Stan-ford-Binet scores that ranged 120 and above;but soon there cameWPPSIscores that seemedto show the children had fallen to a mean IQ

    of about 111 at ages 4 to 6. Or hadthey fallen?Their Stanford-Binet performance remainedhigh; that is, the children maintained a meanof 121 on the SB-LM whileat the same timeperforming 10points lower on theWPPSI.Thenthe wise results began to arrive and at age 7,the experimental groupfell from 111 to 103,a drop of 8 points in a single year. Oddlyenough,this new loss caused less concern, per-haps because it seemed less spectacular thanthe ground lost going from the early Stanford-

    Binet scores to the WPPSI.However,it occurredjust when the children left the enrichmentprogram and entered public schools,all butone entering innercity Milwaukeeschools thathave low levels of academic performance.

    We will never make sense of Heber's ex -perimental groupuntil we have an accuratepicture of their IQ test performance over thelast 14 years. I ask readers to return to Table7, assume for the moment that the Results aspresented are deceptive and the Results ad-

    justed are reliable, and consider the differencebetween the two. Heber's reported results havedominated the minds of scholars: a total lossof over 20 points (Jensen, 1981, p. 187), halflost while the children were still in the en-richment program, abouta third when theyleft, a few points during their seven years ofpublic school.The adjusted results tella verydifferent story: These children neverdid havea mean IQ above the normal rangeand theirperformancewas essentially stable justso longas they remained in H eber's prog ram; indeed,the only loss we can be certain about occurredwhen they left the program, and it was evenlarger than we thought, amountingto a full10 IQ points! In other words, we have no goodreason to believe that the experimental groupever had a mean IQ above 105,we have ex -cellent reason to believe theysuffered a massive10-point loss downto 95 the year theyenteredpublic schools, and since then they have lostlittle ground if any. With the illusion of thelost high IQs dispelled, with the conflict be-tween the SB-LM andWPPSIresults reconciled,we can focus on the only real problem:H owcould the transition from the program to pub-lic schools have been so devastating?

  • 8/8/2019 flynn1984b[1]

    13/23

    IQ GAINS 1932 TO 1978 41

    Heber and Garber made certain observa-tions at the time and now that the childrenare 14, it may be too late to add much. Theyemphasize both problems of adjustment to

    innercity schools, one little girl of high intel-lectual ability refused to speak at all duringthe first 2 months, and factors signaling a de-teriorating home environment, increased fa-ther absence, increased economic distress, andso forth (Garber & Heber, 1977, pp. 125-126;Heber, Garber, Hoffman, and Harrington,1977). The effects of deterioration at homemust have been gradual, which leaves the shockof entry into innercity schools as the obviousdramatic event. Thanks to the adjusted scores,we now know that the 10-point IQ loss wasreal, rather than an artifact of tests of unequaldifficulty, and can hope that those conductingsimilar experiments will prepare their subjectsfor such a transition.

    A s to why Heber and Garber's results asreported were deceptive, we already know theanswer: The high scores on the SB-LM wereinflated by about 30 years of obsolescence be-cause although the 1932 norms were updateda bit for the SB-LM, this did not amount tomuch. The WPPSI scores were inflated by 8yearsof obsolescence, what with norms datingfrom 1963 to 1966; and the wise scores wereinflated by some 25 years of obsolescence be-cause the test was normed in 1947-1948. Theadjusted scores, on the other hand, are theresult of using Table 1 to translate Heber'sscores into our uniform scoring conventionand then using Table 6 to allow for obsoletenorms. As for a test that would serve as a

    measure of current norms at the time the test-ing was done, the wisc-R was the obviouschoice: It was normed in 1972 and the subjectswere tested from 1969 to 1982.

    The most important reduction of Heber'sscores was from the SB-LM results: It dispelledthe myth of superior IQs lost. Since it amountsto a full 14 points, some detail is in order,particularly since we have no studies that yielda direct comparison between the SB-LM andthe WISC-R. First, translating SB-LM scores

    into our uniform scoring convention reducesthem by just over 1 point, which leaves almost13 points to be explained. Second, I used Table6 to equate the SB-LM with WPPSI scores andthen equated these in turn with the WISC-R:This gave 9 plus 4 equals 13 points. As a check

    on the second step, I used data that comparesthe SB-LM with the wise and that in turnwith the WISC-R, which gave 12 points, veryclose to our estimate of 13. The high scores

    of the experimental group were no real indexof what their performance would have beenin terms of current norms.

    The adjusted scores for the mean IQ of He-ber's children, 105 at ages 4 to 6 years, 95 atag e 7 and thereafter, must be taken into ac-count in assessing attempts to replicate hisresults. Naturally these studies will use testswith relatively current norms, and if the mythof IQs of 120 plus persists, they will all appearunsuccessful. Ann Clarke (1981) refers to thefailure "of a quite close attempt to replicateHeber's Milwaukee study" (p. 324); we canonly hope that the definitions of success andfailure were realistic. Jensen (1981, p. 188)tells us of a North Carolina program similarto Heber's in which the experimental childrenhave attained a Stanford-Binet mean (pre-sumably SB-72) of 95 with a control group at81; given my values for the Milwaukee Projectthis comes close to successful replication. Asa help to the scholars in question, rememberthat my values represent scores translated intoa uniform scoring convention. The corre-sponding scores before translation would beSB-72, 97.8-107.8; WISC-R, 97.6-106.9;WPPSI, 101.3-110.7, these last being a bithigher due to the need to allow for obsoletenorms as well.

    Crockett, Rardin, and Pasewark andpre-dicting IQ. Crockett, Rardin, and Pasewark(1975) tested 42 Headstart children on both

    the WPPSI and Stanford-Binet LM and thenin 1972, 4 years later, retested them on thewise. Taking scores at face value, the subjectsgained 7 points over 4 years, going from WPPSIto wise, whereas there was no statistically sig-nificant gain going from SB-LM to wise.These scholars concluded that "the WPPSIIQstend to underestimate subsequent IQs of lower-class children as measured by the wise" andthat "in comparison to the WPPSI, the Stan-ford-Binet seems better able to predict wise

    scores" (p. 922). As we have seen, the rec-ommendation of the SB-LM as a reliable pre-dictor of eventual IQ could hardly have beenmore mischievous: It was the inflated scoresof Heber's experimental children on this testthat led to unrealistic expectations about their

  • 8/8/2019 flynn1984b[1]

    14/23

    42 J A M E S R. FLYNN

    Table 8Crockett, Rardin, and P asewark's HeadstartGroup: IQ Stability orIQ Gains?

    Results

    A s presentedTranslated"Adjusted"

    WPPSI

    (age 51/2)

    87.4684.1480.14

    SB-LM

    (age 5'/2)

    91.9192.4279.42

    wise(age 9'/2)

    94.4394.4386.43

    Crockett elal.'s (1975) conclusion:SB-LM at age 5'/2 pre-dicted WISC mean IQ at age 9'/2, that is, stability over4 years.

    Author's conclusion:all results at age 5'/2 lower than meanIQ at age 9'/2, that is, 6 to 7 points gain over 4 years.

    Note. WPPSI = Wechsler Preschool and Primary Scale of

    Intelligence; SB-LM= Stanford-Binet Form L-M;

    WISC=

    Wechsler Intelligence Scale for Children.' Results translated into uniform scoring convention, M =100, SD = 15 for whites. "Results further adjusted to allowforpbsolete norms, that is, scored against WISC-R norms,sample tested 1972.

    eventual IQ. The only reason the unreliabilityof the SB-LM escaped noticewas that thewise, with norms dating from 1947-1948,went a long way toward matching its radicalobsolescence.

    Table 8 under Results as presented showshow things appearedto Crockettet al.: lookingat the test results from age 5'/2 years, the SB-LM scoresdo look a good matchfor the meanIQ of these children 4 years later, whereastheWPPSIscores seem rather low. Thisgave themthe followingoptions: They could choose toemphasize thegap betweenthe WPPSIand wiseresults, which implied thatthe children hadmade IQ gains with increasing age,or theycould chooseto emphasize the apparent sim-ilarity of the SB-LM andwise results, whichimplied that the children had maintainedmuch the same IQ with increasing age.Onthe face of the results, there was no way tochoose between these two options, althoughfor reasons not stated theygave preference tothe latter and compared the SB-LM favorablywith the WPPSI.

    Table 8 also shows the results translatedinto our uniform scoring conventionand then

    adjusted forobsolete norms, thatis, adjustedto compensate for the fact that the tests usedwere simply not comparable in terms of dif-ficulty. Asa consequence, the conflict betweenthe WPPSI and SB-LM scoresat age 5' /2 hasdisappeared in favor of a remarkably consistent

    performancesignaling a m ean IQ of about 80;and at age9 > / 2 ,the mean IQ stands at almost86.5. There is no doubt that thesedisadvan-tagedchildren made gains at school ra ther than

    exhibiting the usual pattern of decline, a de-cline that leaves such child ren actually labeledas retardates as they enter high school. H ow-ever, since the phenomenon was overlooked,it could not be investigated.

    Pagan and colleagues and administeringtests. W hen lower-classor black childrendobetter on a particular test because scoresareinflated by obsolescence, that test is likely tobe hailed as a fairer or more objective measureof their intelligence. For example, Pagan,Broughton,A llen, Clark,and Emerson (1969)administered both the SB-LM and the WPPSIto thirty-two 5-year-old lower-class children,half black and half w hite, and found that theStanford-Binetmean was 8points higher thanthe WPPSI mean. They explain these resultsby arguing thatthe content and administrationof the WPPSI do not suit the temperament ofthe lower-class childas well as the SB-LM.Thelower-class childis "frequently shy, nonverbal,activity-oriented, and sensitive to failure." TheWPPSI requires frequent changes ininstruc-tions and materials in midtest, asks childrento give additional reasonsfor their answers,and takes too long to ad minister. The Stanford -Binet "enabled the examinersto achieve betterrapp ort and led to their willingness to acceptthe Binet -IQ score as the more accurate"(p . 609).

    If readers turn back to Table 2, they willsee that whether childrenfind the WPPSIharderthan another testhas nothing to do with con-tent or administration but is essentially afunction of obsolescence. It is harder thanev-ery test normed at an earlier time and easierthan every test normedat a later time. For-tunately, the question in hand is subject to adirect empirical test: The SB-LM with normsfrom about 30 yearsbefore the WPPSI,and theSB-72, normed 7 yearsafter the WPPSI, arevirtually identical in content and administra-tive procedure. If content or administration

    are operative factors, both shouldbe easierthan the WPPSI. If obsolescence is at work, theearlier SBshouldbe about 9 points easier thanthe WPPSI,and it is, whereas the later SB shouldbe 2 or 3points harder.Sewell(1977)hasgivenus the first study of the WPPSI and the SB-72

  • 8/8/2019 flynn1984b[1]

    15/23

    IQ GAINS 1932 TOJ1978 43

    and, almost exactly as I would predict, thelatter was4.41 IQ points harder (using uniformscoring convention). As to the moral Sewellhimself drew from his study, he sees it as auseful supplement to Paganet al: Paganet al.showed that the SB-LM was better than theWPPSI as a test for lower-class blacks becausethe Stanford-Binet gave higher scores; nowSewell has shown that the WPPSIis better thanthe SB-72 because the WPPSI gives higherscores!

    Table 9 manipulates Pagan et al.'s resultsin the usual way. Note that when translatedinto our uniform scoring convention, theselower-class children had a greater advantageon the SB-LM over the WPPSI than Pagan etal. thought: almost 12 points rather than 8points.-When we allow for obsolescence, how-ever, the advantage isreduced to less than 3points. Still, that was enough to make the au-thor wonder if there was at least something tothe hypothesisthat lower-class children foundthe SB-LM more to their taste than did middle-class subjects. Therefore, the total data on theSB-LM and WPPSI was divided in terms of

    class: A s Table 9 shows, the score advantagethe SB-LM conferred on lower-class childrenwas actually less than that enjoyed by middle-class children; both groups were of course prof-itting from obsolescence, and their "unearnedincrements" are so close that no real distinc-tion should be made.

    Yater and standardizing tests. This bringsus to one of themost peculiar hypotheses everput forward in the history of psychology; thenotion that a test normed on a standardizationsample including all races will somehow befairer to blacks, or a better measure of theirIQ, than a test normed on whites only. Assumeyou have a test on which the average blackgets the same number of correct answers asthe 16th percentile of the white population.So long as that is the case, nothing you doabout norming.thetest will alter the perfor-mance gap between the races. If you norm on a standardization sample of whites only ariddefine the white mean as 100, blacks will get85. On the other hand, if you norm on a stan-dardization sample of all races, mainly whitesplus blacks, and put the mean at 100, blackswill get about 88. Naturally^blacks are closerto whites plus blacks than to whites only, butthis is simply playing with numbers. If you

    Table 9Social Class and Differential Performance on theStanford-Binet Form L-M (SB-LM) and theWechsler Preschool and Primary Scale

    of Intelligence (WPPSI)Results SB-LM WPPSI Difference

    Pagan et al.'s lower-class children (N = 32)

    As presented 95.20 ' 87.10 8.10Translated" 95.50 83.76 11.74Adjusted" 86.50 83.76 2.74

    Lower-class vs. middle-class children

    Middle class(N= 100)

    Lower class119.18

    94.12

    109.76

    85.89

    9.42

    8.23

    Note.Sources: Barclay& Yater (1969); Pagan et al.(1969);Flynn (1980, pp. 184-185); Garber & Heber (1977, pp.122, 125); Oakland, King, White, & E ckman (1971); Pa-sewark et al. (1971); Prosser & Crawford (1971); Rellas(1969); Wechsler (1967, p. 34)." Results translated into uniform scoring convention, M =100, SD > 15 for whites.

    b Results further adjusted to allow for obsolete norms, thatis, scored against WPPSI norms, sample tested 1963-1966.0 Results translated into uniform scoring convention butno allowance made for obsolete norms;

    want to give blacks a higher score while in noway affecting the real performance gapbetweenthe races in terms of correct answers, it wouldbe simpler to just norm blacks on themselvesand assign them a score of 100.

    Despite this, Yater, Boyd, and Barclay(1975)believed that the WPPSImight be a moreappropriate test of disadvantaged black chil-dren than the wise. The wise was normedon whites only, whereas the WPPSI includedblack children in its standardization sampleand therefore "cultural bias effects with theWPPSIwould not be expected to be operative"(p. 80). They administered both tests to 60disadvantaged black children and found, notsurprisingly,no statistically significant differ-ence. The reader may wonder why scores werenot a few points higher on the WPPSI, thanksto a scoring convention that should elevate awise score of 85 into a WPPSIscore of 88. Theanswer of course is that the wise was stan-dardized some years earlier than the WPPSI.As usual, the earlier test is the easier test andthe inflation of wise scores by obsolescencejust about matches the inflation of WPPSIscores

  • 8/8/2019 flynn1984b[1]

    16/23

    44 JAMES R. FLYNN

    Table 10Race, Class, and Differential Performance on theWechsler Intelligence Scalefor Children (WISC)and the Wechsler Preschool and PrimaryScaleof Intelligence (WPPSI)

    Results WISC WPPSI Difference"

    Lower-class black (N = 84)

    A s presentedTranslated 15

    89.9289.92

    89.3686.18

    0.563.74

    Middle-class white (N = 24)

    As presentedTranslated"

    106.30106.30

    109.00107.22

    -2.70-0.92

    Difference equals WISC score minus WPPSI.b Results translated into uniform scoring convention, M =100, SD = 15 for whites.

    by way of the latter's scoring convention. Table10 makes this clear by combining Yater et al.'sresults with those of Oakland et al. (1971), theonly other study in which children were givenboth the wise and WPPSI.The Results as pre-sented show black children getting virtuallyidentical scores on both tests. The Resultstranslated show what really happened: Whenall scores are translated into our uniform scor-ing convention, the WPPSI mean drops about4 points below the wise, and the advantagethe wise gains from obsolescence is highlyvisible. The scores of the 24 white children donot show the effect of obsolescence, but thesmall sample size is probably the cause of theiratypical results.

    It might be thought that because all current

    tests have been normed on all races, this issuewould die a natural death. But Evans andRichmond (1976) have reservations about theSB-72 because of the racial mix of its stan-dardization sample: The sample included mi-norities, but in the absence of detailed data,they wonder if minorities were properly rep-resented. When Sewell (1977) found that blackchildren scored lower on the SB-72 than onthe WPPSI,he brought the doubts of Evans andRichmond to the readers' attention. Actually,

    minorities were slightly overrepresented in theCA T sample on which the SB-72 norms werebased, but the real point is the irrelevance ofwhether minorities are included in standard-ization samples at all. Indeed, obsolescencehas done us at least one good turn: Those

    comparing a test with mixed-race normsagainst a test with whites-only norms havebeen comparing a later test against an earlierone; obsolescence has worked to guarantee that

    blacks would get lower scores on the all-racestest precisely because it was later, thus falsifyingthe hypothesis of its greater fairness. However,we have had a close call: Imagine that the all-races tests had been the earlier ones; then wewould have been told there was overwhelming"evidence" in favor of the hypothesis in ques-tion.

    Obsolescencearid the future. Earlier Stan-ford-Binet andWechsler IQ tests suffer fromobsolescence in the sense that their norms are

    easier to meet than those of later tests. Nomatter whether this reflects real IQ gains orwhether it is an artifact of cumulative errorsin standardization samples, obsolescence is afact that must be taken into account in rein-terpreting dozens of studies done in recent de-cades. There is also much in the general lit-erature that must be called into question, forexample, the claim that regression from parentto child is less for high-IQ parents than forothers. In fact, children may have been testedon an earlier and easier test (say the SB-LMor wise), whereas parents were tested on alater and harder test (say the WAIS). The effectwould be to deflate regression from high-IQparents down to the mean and inflate regres-sion from low-IQ parents up toward the mean.

    The number of studies that must be rein-terpreted rises into the hundreds if one takesinto account the likelihood of norms of un-equal difficulty for IQ tests other than the

    Stanfbrd-Binet and Wechsler: if IQ gains arereal, these other tests will have been affected;if the Stanford-Binet and Wechsler organiza-tions have made persistent and large samplingerrors, it is unlikely that all others have escapedunscathed. The tables presented here for theSB and Wechsler must be improved; but thensomeone must do a similar job for the Peabody,first comparing earlier and later versions, thencomparing all versions with Stanford-Binetand Wechsler tests, and then do a similar job

    for Ravens, and so on. We have no choice:Allowingfor obsolescence in intelligence test-ing is just as essential as allowing for inflationin economic analysis.

    Still, when dealing with tests normed from1932 to 1972, at least we have a body of studies

  • 8/8/2019 flynn1984b[1]

    17/23

    IQ GAINS 1932 TO 1978 45

    which show that obsolescence was at work andsuggest a rough estimate of rates. What ofscholars using the WAIS-R normed in 1978 orthose who will use the WPPSI-R when it is

    released later this decade? They will have nonotion of where they stand: Even if IQ gainsup to 1972 were real, the trend may havestopped or even reversed thereafter; even ifsampling errors made every test harder thanits predecessor up through 1972, they maybegin to make for easier tests at any time. Atpresent, we are in the intolerable situation ofknowing the extent and direction of obsoles-cence only years after the event, after lots oflittle studies have accumulated and a tell-tale

    pattern has begun to show through their re-sults. Test publishers should assume respon-sibility for giving advance warning of the ex-istence of obsolescence as a matter of profes-sional ethics.

    When the Psychological Corporation pub-lishes the WPPSI-R,the manual should containnot only data on a standardization sample of1,200, there must also be results from amatching sample of 400 given both the WPPSI-R and the old WPPSI, another sample givenboth the WPPSI-Rand the WISC-R,and anothergiven both the WPPSl-rRand the SB-72. Oth-erwise, it may not be long before the SB-72begins to play the destructive role that the SB-LM played with Heber and others, which isto say that the 1990s will yield the same con-fusion as the 1960s and 1970s.

    The Reality oflQ Gains.Revisited

    Having explored the possibility that IQ gainsare not real, I now wish to reverse directionand address the hypothesis that a series ofsampling errors have mimicked IQ gains. As-sume that sampling errors for Stanford-Binetand Wechsler tests from 1932 to 1978 werelikely to run as high as plus or minus 7- IQpoints (a generous assumption). Assume thatit is unlikely that the errors for any two stan-dardization samples would be identical. Thensampling errors could mimic IQ gains and

    produce what we find in Table 11: a perfectcorrespondence between the chronological or-der of our seven standardization samples andtheir rank order when listed in terms of as-cending levels of performance. However, theodds against having this occur by chance are

    Table 11Stanford-Binet and Wechsler StandardizationSamples: Comparison of ChronologicalOrder and Order of Performance

    IQ differentials

    Samples Actual" Predicted" Period 0

    SB, 1932W, 1947V2W, 1953'/2W, 1964'/2SB, 1971'/2W, 1972W, 1978

    .005.767.948.719.89

    13.3714.33

    .004.656.45

    ' 9.7511.8512.0013.80

    015.521.532.539.540.046.0

    Note. SB = Stanford-Binet; W = Wechsler. Source: Table3 with differentialscalculated by averaging all sample se-quences that terminate in a given sample and contain nomore than four samples. Sample combinations with num-ber less than 100 omitted except for WAIS-R (1978) forhere, the sole study offering comparison with WISC-R(1972).has number = 80.' Standardization samples ranked in terms of number ofIQ points by which their performance bettered that of1932.

    b Standardization samples ranked in terms of number ofIQ points by which their performance wouldhave betteredthat of 1932, assuming a rate of .300 points per year." Standardization samples ranked in terms of number of

    years elapsed since 1932.

    7 factorial, or 5,040 to 1. Only seven stan-dardization samples are listed because theeighth, the Wechsler-Bellevue I, was omittedfor reasons already given.

    The exact differentials in Table 11 that rankour standardization samples for quality of per-formance are approximations based on Table3. Nonetheless, the magnitudes of these dif-

    ferentialsare an extraordinary match for what

    we would predict, that is, the enhanced per-formance differentials predicated under theassumption of real IQ gains by Americans ata constant rate of .300 from 1932 to 1978.The discrepancies cluster at about 1 IQ point,and the largest is less than 2 points: The oddsagainst sampling errors producing this kindof match between actual and predicted valuescannot be calculated exactly but would be littleshort of astronomical.

    This naturally suggests the possibility thatsystematic rather than random bias has beenat work to mimic IQ gains, Test manuals doreveal one persistent trend: Early standardiza-tion samples tended to have a geographicalbias in favor of more progressive locales. For

  • 8/8/2019 flynn1984b[1]

    18/23

    46 J A M E S R . FLYNN

    example, both Stanford-Binet 1932 (Terman& Merrill, 1937, p. 12) and Wechsler 1947-1948 (Wechsler, 1949, p. 16) virtually omitthe Deep South with its lowIQs and represent

    the South by states such as Texas, Kentucky,North Carolina, an d Virginia. A sTerman andMerrill(1973, p. 36) have pointed out, effortsto counteract such a bias by using other strat-ification variables are probably never entirelysuccessful. Bycontrast, later samples, the CA Tsample on which the SB-72 was based and theWechsler 1972 sample, have excellent geo-graphical coverage of the nation as a whole(Anastasi, 1976, pp. 258, 310). Now theeffectof this trend would be to give earlier samplesan elite bias lacking in later ones, that is, itwould actually work against enhanced per-formance by later standardization samples.Another possible source of systematic biaswould be the fact that subjects taking both anearly and later test would encounter obsoleteitems on the early test, items whose outmodedcontent lends them artificial difficulty. How-ever, this would make it more difficult for sub-jects to meet early norms than later and onceagain, thebias would operate against estimatesof enhanced performance over time.

    Recall that nothing in the IQ data itselfev-idences gross sampling error, rather we weredriven to this hypothesis by the contrast be-tween IQ trends and SAT-Verbal trends. Itseemed impossible that tests correlated at the.80 level and measuring much in commonwould permit the following: that over a periodof 18 years, performance on the two kinds oftests had diverged by something between .288

    3D [/(safe estimate) and .648 SO U (speculativeestimate). However, acting on a hunch, theauthor undertook a more detailed analysis ofthe IQ data, studies in which subjects had takenboth the wise (1947-1948)and the WISC-R(1972), and these revealed a fact of great in-terest. Eight studies involving a total of 623subjects provide a measure of not only full-scale IQ gains during this period but also gainson each of the Wechsler subtests (Brooks, 1977;Catron & Catron, 1977; Schwarting, 1976;Solly, 1977; Solway et al, 1976; Stokes et al,1978;Swerdlik,1978, pp. 120-121; Weiner&Kaufman, 1979).

    These studies show a full-scale IQ gain of9.27 points, about 1point higher than the totalbody of wise to WISC-R data, and a gain ofonly 3.09 points on the Vocabulary subtest.

    The Vocabulary subtest correlates with full-scale IQ at .80 (Wechsler, 1974, p. 47), yetthese two trends diverge by fully 6.18 pointsor .412 SOU. Indeed, if we compare the Vo-

    cabulary subtest with the rest of the wise (thewhole test minus Vocabulary^ thedivergenceis .458 SDU. These values fall virtually in themiddle of the gap posited between IQ trendsand SAT-V trends, that is, about halfway be-tween .288 and .648 SDU. So what seemedimpossible is possible: IQ and at least one cen-tral verbal skill can diverge radically despitea high correlation coefficient; the full-scale IQversus Vocabulary contrast shows that the IQversus SAT-Verbal contrast really may haveoccurred. On the other hand, these new datado nothing to resolve the problem of explain-ing the IQ versus SAT-V contrast. It merelyposes that problem with renewed force: howcan school children gain so much in overallintelligence andmake so little progress in termsof enhanced vocabulary? What causal factorscould boost intelligence and yet somehowwithhold their potency from the world ofwords?

    IQ Gains and Causal Explanation

    Setting aside the combination of trends dis-cussed above, can we even explain the phe-nomenon of IQ gains taken in isolation? Inmy opinion, the massive gains Americans ap -pear to have made from one generation toanother, about 14 IQ points over 46 years,pose a serious problem of causal explanation,given the present state of our knowledge. I will

    argue for three propositions: that the familiarcausal factors we use to explain IQ variancewithin each generation, such as socioeconomicstatus (SES), have very limited promise; thatthe causal factors that are plausible candidatesfor operating primarily between generationshave greater explanatory potential; but thateven these may fall short and force us to con-clude that our knowledge of environmentaldeterminants of IQ is more limited than wesuspected.

    Recent research indicates that IQ gains overthe last two generations must be due to en-vironmentalprogress rather than to improvedgenes (Loehlin, Lindzey, & Spuhler, 1975, pp.306-307). Among environmental factors po-tent within each generation, Jensen (1973, p.357) designates SES as the most important

  • 8/8/2019 flynn1984b[1]

    19/23

    IQ GAINS, 1932 TO 1978 47

    variable, with additional variables adding verysmall increments; he also argues that an SESgap of a full standard deviation makes a dif-ference of about 2 IQ points in the American

    setting, assuming that confounding geneticfactors have been eliminated (Jensen, 1981,p. 216). Clearly, measured in terms of present-day differences, the SES gap between 1932 and1978 would have to be too enormous for cre-dence in order to play much of a role in ex-plaining a 14-point IQ gain. As for the totalof environmental factors active at a particulartime, the latest estimates attribute a maximumof .25 of IQ variance to between-families fac-tors (I say maximum because the value declinessharply after adolescence; Plomin & DeFries,1980, p. 19). Assuming that systematic en-vironmental effects have a roughly normaldistribution, one standard deviation of be-tween-families difference in environmentalquality makes a difference of 7.5 IQ points(the square root of .25 X 15 as value for SDfor whites). Therefore, if we tried to explainour between-generations IQ gap in terms ofwithin-generations environmental factors, wewould have to say that the average environmentin 1932 was 1.84 standard deviations (13.8 +7.5) below the average in 1978. This wouldput the Americans of 1932 at the 3rd percentileof environmental quality as measured today,which again taxes our credence.

    This suggests turning to environmental fac-tors that differ greatly between two generationswhile perhaps explaining a small amount ofIQ variance within a generation. Scholarlycorrespondents of high competence (H. J.Eysenck, personal communication, December14, 1982; J. C. Loehlin, personal communi-cation, January 3, 1983; D. Zeaman, personal

    * communication, January 13, 1983) have of-fered two possible causes of IQ gains over time,namely, increased test sophistication and a ris-ing level of educational achievement. In pass-ing, it is worth noting that even if these canexplain IQ gains in isolation, neither .does any-thing to solve the puzzle posed by the com-bination of IQ gains (or even IQ stability) andSAT-V losses. Young Americans who enjoyincreased test sophistication or better educa?tion ought to show gains on both of these tests,indeed, it seems inconceivable that one couldbe affected and not the other.

    Test sophistication is unique among envi-ronmental explanations of IQ gains in affecting

    the status of those gains. Assume that a massivegain from one generation to another merelyshows that Americans were getting more prac-tice in taking standardized tests: the gains

    would be real in the sense that enhanced per-formance on IQ tests really exists rather thanbeing an artifact of sampling error; but theywould not be real in the sense that they wouldsignal no increase in intelligence, the trait IQtests attempt to measure. Given this assump-tion, we should call IQ gains semi-real: theyare like the improved times of athletes ben-efitting from a lighter running shoe; althoughperformances on the clock really do improve,no one would claim that the athletes are betteras such.

    However, test sophistication is going to re-quire a great deal of evidence to render it plau-sible as a dominant cause. Jensen (1980, pp.590-591) emphasizes that even working withentirely naive subjects, repeated testing withparallel forms of IQ tests gives gains that totalonly 5 or 6 points. Further, he argues that bythe 1950s, we reached a point of virtual sat-uration in exposure to standardized tests, andif that is so, it should have paid diminishingreturns after that date. Test sophisticationabove all factors exhibits a threshold effect andno such effect is revealed in our data. It isinteresting to note that many years ago whenR. D, Tuddenham (1948) provided evidencefrom Army mental tests of massive IQ gainsbeginning in 1918, the immediate reaction wasto suggest test sophistication as a major factor(Fulk, 1949, p. 17). If it is true that Americanshave gained 6 IQ points every 20 years from1918 to 1978, we cannot keep explaining thisaway by calling upon test sophistication timeafter time.

    Enhanced educational achievement is amuch more impressive candidate for explain-ing IQ gains over time. For example, Tudden-ham (1948) analyzed his army data and foundthat weighting the 1918 mental test perfor-mance in terms of years of school completed,so as to match the 1943 educational distri-bution, caused about 55% of the mental testgains to disappear. On the other hand, selectingan elite in terms of schooling from 1918 tomatch 1943 inflates the influence of educationas an environmental variable: Such an elitewill be to some degree a genetic elite as well,and the influence of superior genes for IQ willbe confounded with better education. Also

  • 8/8/2019 flynn1984b[1]

    20/23

    48 JAMES R. FLYNN

    note that an educational elite will be superiorin terms of a whole range of other environ-mental variables, such as SES, nutrition, childcare, and test sophistication in particular. My

    own guess is that our current notions of thenature and potency of environmental variablesput us about halfway, maybe a bit further, to-ward a plausible explanation of massive IQgains.

    The difficulty of our task, given current no-tions of the potency of environmental factors,can be illustrated by some comparative data.In order to explain a Japanese IQ gain onWechsler tests of 7 points over 23 years (notethat the rate of .304 points per year is familiar),

    A . M. Anderson (1982) had to call attentionto environmental changes of the most radicalsort. As he pointed out, since 1930 Japan hasexperienced massive urbanization, a culturalrevolution from feudal to western attitudes,the decline of inbreeding and consanguineousmarriages, and huge advances in nutrition, lifeexpectancy, and education. The magnitude ofgain Anderson attempted to explain in theJapanese context, we have to explain in theAmerican context for the period from 1948to 1972. And yet, to find analogous environ-mental changes in the United States, we wouldhave to go back to the turn of the century.

    Summary of Implications

    Assuming that American IQ gains 1932 to1978 are not real, but an artifact of samplingerror, they have acted as a confounding vari-able in hundreds of studies and require alteredpractices from testing organizations. Assuming

    that these gains are semi-real, due primarilyto test sophistication, they imply all of theabove and also reveal the inexplicable com-bination of IQ gains and SAT-V losses. As-suming that these gains are real, they implyall of the above and pose a serious problemof causal explanation. Moreover, all of theseimplications hold even if real gains ceased in1978 or even 1972: The period in questionshows the radical malleability of IQ during atime of normal environmental change; other

    times and other trends cannot erase that fact.

    ReferencesAnastasi, A . (1961). Psychological testing (2nd ed.). N ew

    York: Macmillan.

    Anastasi,A . (1976). Psychological testing (4th ed.). N ewYork: Macmillan.

    Anderson,A. M. (1982). The great Japanese IQ increase.Nature, 297, 180-181.

    Anderson, E. E., Anderson, S. F,, Ferguson, C, Gray, J.,

    Hittinger, J., McKinstry,E., Motter, M . E., & Vick, G.(1942).Wilson College studies in psychology: I. A com-parison of the Wechsler-Bellevue, Revised Stanford-Bi-net, and American Council of Education tests at thecollege level. Journal of Psychology, 14 , 317-326.

    Appelbaum, A. S., & Tuma, J. M. (1977). Social class andtest performance: Comparative validity of the Peabodywith the WISC and WISC-R for two socioeconomicgroups. Psychological Reports,40, 139-145.

    Arnold, F. C., & Wagner, W. K. (1955). A comparison ofWechsler children's scale and Stanford-Binet scores foreight- an d nine-year-olds. Journal of Experimental Ed-ucation, 24, 91-94.

    Austin,J . J ., & Carpenter, P. (1970).The use of theWPPSIin early identification of mental retardation and pre-school special education. Proceedingsof the 78thAnnualConventionof the AmericanPsychologicalAssociation,5 , 913.

    Barclay, A ., & Yater, A . (1969). Comparative study of theWechsler Preschool and Primary Scale of Intelligenceand the Stanford-Binet Intelligence Scale, Form L-Mamong culturally deprived children. Journal of Con-sulting an d Clinical Psychology, 33, 257.

    Barratt, E. S., & Baumgarten, D. L. (1957). The rela-tionship of the WISC and Stanford-Binet to schoolachievement. Journal of ConsultingPsychology,21 , 144.

    Bradway,K. P., & Thompson, C. W. (1962). Intelligenceat adulthood: A twenty-fiveyear follow-up.EducationalPsychology, 53, 1-14.

    Brittain, M . (1968). A comparative study of theWechslerIntelligence Scale forChildren and theStanford-BinetIntelligence Scale (Form L-M) with eight-year-old chil-dren. British Journal of Educational Psychology, 38,103-104.

    Brooks, C. R. (1977). WISC, WISC-R, S-B L & M, WRAT:Relationships and trends among children ages six to tenreferred forpsychological evaluation. Psychologyin theSchools, 14 , 30-33.

    Catron, D. W, & Catron, S. S. (1977). WISC vs WISC-R: A comparison with educable mentally retarded chil-

    dren. Journal of School Psychology, 15 , 264-266.Clarke, A. (1981).Sir Cyril Burt and Rick Heber. Bulletinof the British PsychologicalSociety, 34, 324.

    Clarke, A . D.(1973). The prevention of subcultural sub-normality: Problems and prospects. British Journal ofMental Subnormality, 19, 7-20.

    Cohen, B. D., & Collier, J. N. (1952). Anote on theWISCand other tests of children six to eight years old. Journalof ConsultingPsychology, 16 , 226-227.

    Cole, P., & Weleba, L. (1956). Comparison data on theWechsler-Bellevueand the WAIS. Journal of ClinicalPsychology, 12 , 198-200.

    Covin, T. M. (1977). Comparability of WISC and WISC-

    R scores for 30 8- and9-year-old institutionalized Cau-casian children. Psychological Reports,40 , 382.Crockett, B. K., Rardin, M. W., & Pasewark, R. A. (1975).

    Relationship between WPPSI and Stanford-Binet IQsand subsequent WISC IQs in headstart children. Journalof Consulting and ClinicalPsychology, 43 , 922.

  • 8/8/2019 flynn1984b[1]

    21/23

    IQ GAINS 1932 TO 1978 49

    Davis, E. E. (1977). Matched pair comparison of WISCand WISC-R scores. Psychologyin the Schools,14 , 161-166.

    Delattre, L., & Cole, D. (1952). A comparison of theWISC and the Wechsler-Bellevue. Journal of ConsultingPsychology, 16 , 228-230.

    Educational Testing Service. (1977). National report oncollegeb ound seniors,1977. Princeton, NJ: College En-trance Examination Board.

    EducationalTesting Service. (1981). National report oncollegebound seniors,1981. Princeton, NJ: College En-trance Examination Board.

    Estes, B. W. (1965). Relationship between the Otis, 1960Stanford-Binet, and WISC. Journal of Clinical Psy-chology, 21, 296-297.

    Estes, B. W., Curtin, M. E., De Burger, R. A., & Denny,C. (1961). Relationship between the 1960 Stanford-Bi-net, 1937 Stanford-Binet, WISC, Raven, and Draw-a-

    Man. Journal of ConsultingPsychology, 25 , 388-391.Evans, P. L., & Richmond, B. O. (1976). A practitioner'scomparison:'The 1972 Stanford-Binet and the WISC-R. Psychology in the Schools, 13,9-14.

    Pagan, J., Broughton, E., Allen, M., Clark, B., & Emerson,P. (1969). Comparison of the Binet and WPPSI withlower class 5-year-olds. Journal of Consulting and Clin-ical Psychology, 33 , 607-609.

    Flynn, J. R. (1980). Race, IQ and Jensen. London: Rout-ledge & Kegan Paul.

    Frandsen, A. N., & Higginson, J. B. (1951). The Stanford-Binet and the Wechsler Intelligence Scale for Children.Journal of ConsultingPsychology, 15 , 236-238.

    Fulk, B. E.(1949). A comparisonof Negro an d whiteArmyGeneral Classification Test scores. Unpublished master'sthesis, University of Illinois, Urbana.

    Garber, H. L. (1982, May). The Milwaukee Project:Pre-venting mental retardation in children of families atrisk. Paper presented at the CIBA-GEIGy Conferenceon Mental Retardation from a Neurobiological and So-ciocultural Point of View, Lund, Sweden.

    Garber, H., & Heber, F. R. (initials incorrectcorrect:R. F.). (1977). The Milwaukee Project: Indications ofthe effectiveness of early intervention in preventingmental retardation. In P. Mittle (Ed.), Research to prac-tice in mental retardation: Vo l .1. Care an d intervention(pp. 119-127). Baltimore, MD: University Park Press.

    Gehman, I. H., & Matyas, R. P. (1956). Stability of theWISC and Binet tests. Journal of Consulting Psychology,20, 150-152,

    Gerboth, R. (1950). A study of two forms of the Wechsler-Bellevue Intelligence Scale. Journal of Consulting Psy-chology, 14, 365-370.

    Giannell, A. S., & Freeburne, C. M. (1963). The com-parative validity of the WAIS and the Stanford-Binetwith college freshmen. Educational and PsychologicalMeasurement,23, 557-567.

    Gibby, R. G. (1949). A preliminary survey of certain as-pects of Form II of the Wechsler-Bellevue Scale as com-pared to Form I. Journal of ClinicalPsychology,5, 165-169.

    Goldfarb, W. (1944). Adolescent performance on theWechsler-Bellevue Intelligence Scales and the RevisedStanford-Binet Examination, Form L. Journal of Ed -ucational Psychology, 35 , 503-507.

    Goolishian, H. A ., & Ramsay, R. (1956). The Wechsler-

    Bellevue Form I and the WAIS: A comparison. Journalof Clinical Psychology, 12 , 147-151.

    Halpern, F. (1942). A comparison of the Revised StanfordL and the Bellevue Adult Intelligence Test as clinicalinstruments. PsychiatricQuarterly Supplement,16, 206-211.

    Hannon, J. E., & Kicklighter, R. (1970). WAIS vs WISCin adolescents. Journal of Consulting andClinical Psy-chology, 35 , 179-182.

    Harlow, J. E., Price, A. C., Tatham, L. J., & Davidson,J. R. (1957). Preliminary study of comparison betweenWechsler Intelligence Scale for Children and Form Lof the Revised Stanford-Binet Scale at three age levels.Journal of Clinical Psychology, 13 , 72-73.

    Hartlage, L. C., & Boone, K. E. (1977). Achievement testcorrelates of Wechsler Intelligence Scale for Childrenand Wechsler Intelligence Scale for ChildrenRevised.Perceptual an d Motor Skills, 45 , 1283-1286.

    Heber, R., & Garber, H. (1975). Progress report II: Anexperiment in the prevention of cultural-familial re-tardation. In D. A.Primrose (Ed,), Proceedings of theThird Congressof th e International Associationfor theScientific Study of Mental Deficiency, Vol. 1, (pp. 34 -43). Warsaw: Polish Medical Publishers.

    Heber, R. F., Garber, H. L., Hoffman, C., & Harrington,S. (circ. 1977). Establishment of the High Risk Popu-lation Laboratory (Research project report). Unpub-lished manuscript, University of Wisconsin, Madison.

    Holland, G. A. (1953). A comparison of the WISC andStanford-Binet IQs of normal children. Journal of Con-sulting Psychology, 17, 147-152.

    Jackson, R. (1976). A summary of SA T score statisticsfo rCollegeBoard candidates. Princeton, NJ: College En-trance Examination Board.

    Jensen, A. R. (1973). Educability andgroup differences.New York: Harper & Row.

    Jensen, A. R, (1980). Bias in mental testing. London:Methuen.

    Jensen, A. R, (1981). Straight talk about mental tests.New York: The Free Press.

    Jones, S. (1962). The Wechsler Intelligence Scale for Chil-dren applied to a sample of London primary schoolchildren. British Journal of Educational Psychology,32,119-132.

    Kangas, J., & Bradway, K. (1971). Intelligence at middle

    age: A thirty-eight-year follow-up. DevelopmentalPsy-chology, 5 , 333-337.

    Karson, S., Pool, K. B., & Freud, S. L (1957). The effectsof scale and practice on WM S and W-B I test scores.Journal of ConsultingPsychology, 21 , 241-245.

    Kaufman,A .S., & Doppelt, J. E . (1976).Analysisof WISC-R standardization data in terms of the stratificationvariables. Child Development,47, 165-171.

    Klinge,V., Rodziewicz, T, & Schwartz, L. (1976). Com-parison of the WISC and WISC-R on a psychiatric ad-olescent inpatient sample. Journal of Abnormal ChildPsychology, 4, 73-81.

    Knopf, I. J., Murfett, B. J., & Milstein, V. (1954). Rela-tionships between the Wechsler-Bellevue Form I andthe WISC. Journal of ClinicalPsychology,10 , 261-263.

    Krugman, J. I., Justman, J., Wrightstone, J. W., & Krug-man, M. (1951), Pupil functioning on the Stanford-Binet and the Wechsler Intelligence Scale for Children.Journal of ConsultingPsychology, 15 , 475-483.

  • 8/8/2019 flynn1984b[1]

    22/23

    50 JAMES R. FLYNN

    Kureth, G.,Muhr, J . P., & Weisgerber,C. A.(1952). Somedata on the validity of the Wechsler Intelligence Scalefor Children. Child Development,23, 281-287.

    Larrabee, G. J ., & Holroyd, R. G. (1976). Comparisonof WISC and WISC-R using a sample of highly intelligent

    students. PsychologicalReports, 38 , 1071-1074.Levinson,B. M. (1959). A comparison of the performanceof bilingual and monolingual native born Jewish pre-school children of traditional parents on four intelligencetests. Journal of Clinical Psychology, 15 , 74-76.

    Loehlin, J. C., Lindzey, G., & Spuhler, J. N. (1975). Racedifferences in intelligence.San Francisco: Freeman.

    McKerracher, D. W., & Scott, J. (1966). IQ scores andthe problem of classification: A comparison of the WAISand S-BForm L-M in agroup of subnormal and psy-chopathic patients. British Journal of Psychiatry, 112,537-541.

    McNemar, Q. (1942). The revision of the Stanford-Binet

    Scale. Boston: Houghton Mifflin.Memorandumfo r Mrs. Sharp: Scholastic Aptitude Testcandidate volumes. (Undated). (Available from EleanorV. Home, Educational Testing Service, Princeton, N J)

    Modu, C. C., & Stern, J . (1977). The stability of the SAT-Verbal score scale. Princeton, N J: College Entrance Ex-amination Board.

    Mussen, P. H., Conger, J. J., & Kagan, J. (1974). Childdevelopmentand personality (4th ed.). N ew \brk: Harper& Row.

    Mussen, P. H., Dean, S., & Rosenberg, M . (1952). Somefurther evidence of the validity of the WISC. Journalof ConsultingPsychology, 16 , 410-411.

    Neuringer,C. (1963). The form equivalence between theWechsler-Bellevue Intelligence Scale, Form I and theWechslerAdult Intelligence Scale. Educational an d Psy-chological Measurement,23, 755-763.

    Oakland, T.D., King, J . D., White, L. A ., & Eckman, R.(1971). A comparisonof performance on the WPPSI,WISC, and SB with preschool children: Companionstudies. Journal of School Psychology,9, 144-149.

    Pasewark, R. A ., Rardin, M . W., & Grice, J. E. (1971).Relationship of the Wechsler Pre-school and PrimaryScale of Intelligence and the Stanford-Binet (L-M) inlower class children. Journal of School Psychology, 9,43-50.

    Pastovic, J. J., & Guthrie, G. M. (1951). Some evidenceon the validity of the WISC. Journal of ConsultingPsy-chology, 15 , 385-386.

    Plomin, R., & DeFries, J. C. (1980). Genetics and intel-ligence: Recent data. Intelligence, 4, 15-24.

    Price, J. R., & Thome, G. D. (1955). A statistical com-parison of the WISC and Wechsler-Bellevue, Form I.Journal of ConsultingPsychology,19, 479-482.

    Prosser, N. S., &Crawford, V. B. (1971). Relationship ofscores on the Wechsler Preschool and Primary Scale ofIntelligence and the Stanford-Binet Intelligence ScaleForm LM . Journal of School Psychology, 9, 278-283.

    Quereshi, M . Y. (1968a). Thecomparability of WAIS andWISC subtest scores and IQ estimates. Journal of Psy-

    chology, 68, 73-82.Quereshi, M. Y. (1968b). Practice effects on the WISC

    subtest scores and IQ estimates. Journal of Clinical Psy-chology, 24, 79-85.

    Quereshi, M . Y., & Miller, J. M . (1970). Thecomparabilityof the WAIS, WISC, and WB H.Journal of EducationalMeasurement,7, 105-111.

    Rasbury, W, McCoy, J . G., & Perry, N. W. (1977). Re-

    lations of scores on WPPSI and WISC-R at a one-yearinterval. Perceptual an d Motor Skills, 44, 695-698.

    Rellas, A. J. (1969). The use of the Wechsler Preschooland Primary Scale (WPPSI) in the early identificationof gifted students. Journal of EducationalResearch, 20,

    117-119.Reynolds, C. R., & Hartlage, L. (1979). Comparison ofWISC and WISC-R regression lines for academic pre-diction with black and with white referred children.Journal of Con sulting and ClinicalPsychology,47 , 589-591.

    Ross, R. T, & Morledge, J. (1967). A comparison of theWISC and WAIS at chronological age 16.Journal ofConsultingPsychology, 31 , 331-332.

    Sartain, A. Q. (1946). Comparison of the new RevisedStanford-Binet, the Bellevue Scale, and certain grouptests of intelligence. Journal of Social Psychology, 23 ,237-239.

    Schachter, F. E, & Apgar, V. (1958). Comparison of pre-school Stanford-Binet and school-age WISC IQs. Journalof Educational Psychology,49 , 320-323.

    Schwarting,F. G. (1976). A comparison of the WISC andWISC-R. Psychology in the Schools, 13 , 139-141.

    Seashore, H., Wesman, A., & Doppelt, J. (1950). The stan-dardization of the Wechsler Intelligence Scale for Chil-dren. Journal of ConsultingPsychology, 14 , 99-110.

    Sewell, T. E. (1977). A comparison of the WPPSI andStanford-Binet Intelligence Scale (1972) among lower-SES black children. Psychologyin the Schools, 14 , 158-161.

    Simpson, R. L.(1970). Study of the comparability of WISCand the WAIS. Journal of Consulting andClinical Psy-chology, 34 , 156-58.

    Solly, D. C. (1977). Comparison of WISC and WISC-Rscores of mentally retarded and gifted children. Journalof School Psychology, 15 , 255-258.

    Solway, K . S., Fruge, E., Hays, J. R., Cody, J., & Gryll,S. (1976). A comparison of the WISC and WISC-R ina juveniledelinquent population. Journal of Psychology,94 , 101-106.

    Stokes, E. H., Brent, D., Huddleston, N. J., Rozier, J. S.,& Marrero, B. (1978). A comparison of WISC andWISC-R scores of sixth grade students: Implications forvalidity. Educational an d PsychologicalMeasurement,38 , 469-473.

    Stroud, J. B., Blommers, P., & Lauber, M. (1957). Cor-relation analysis of WISC and achievement tests. Journalof Educational Psychology, 48 , 18-26.

    Swerdlik,M . E. (1978).Comparison of WISC and WISC-R scores of referred black, white and Latino children.Journal of School Psychology, 16 , 110-125.

    Terman, L. M. (1942). The revision procedures. In Q.McNemar,The revisionof the Stanford-Binet Scale (pp.1-14). Boston: Houghton Mifflin.

    Terman, L. M ., & Merrill, M. A. (1937). Measuring in -telligence.London: Harrap.

    Terman, L. M., & Merrill, M. A. (1973). Stanford-BinetIntelligenceScale: 1973 norms edition. Boston: Hough-

    ton Mifflin. 'Thomas, P. J. (1980). A longitudinal comparison of the

    WISC and WISC-R with special education pupils. Psy-chology in the Schools, 17, 437-441.

    Thorndike,R. L. (1973).S t a n f o r d - B i n e tIntelligenceScale1972 norms tables. Boston: Houghton M i ffl i n .

    Thorndike, R. L. (1975). M r. Binet's test 70 years later.Educational Researcher, 4, 3-7.

  • 8/8/2019 flynn1984b[1]

    23/23

    IQ GAINS 1932 TO 1978 51

    Triggs, F. Q, & Cartee, J. K. (1953).Pre-schoolperfor-mance on the Stanford-Binetand the WechslerIntel-ligenceScalefo rChildren.Journal o fClinicalP s y c h o l o g y,9, 27-29,

    Tuddenham,R. D. (1948). Soldier Intelligencein World

    Wars I and II.AmericanP s y c h o l o g i s t ,3, 54-56,Tuma, J. M., Appelbaum,A. S., & Bee, D, E. (1978).Comparabilityof the WISCand theWISC-Rin normalchildrenof divergentsocioeconomic backgrounds.Psy-c h o l o g y,i n the Schools, 15, 339-346.

    Wechsler, D.(1939).Themeasuremento fadult i n t e l l i g e n c e .Baltimore, MD: Williams &Wilkins.

    Wechsler, D. (1949).WISC manual.New York: The Psy-chologicalCorporation.

    Wechsler, D. (1955). WA I Smanual.New York: The Psy-chologicalCorporation.

    Wechsler, D. (1958). The measurementand a p p r a i s a lofadult intelligence(4th ed.). Baltimore, MD: Williams& Wilkins,

    Wechsler,D. (1967).W P P S Imanual.NewYo r k :The Psy-chological Corporation.

    Wechsler, D. (1974). WISC-R manual.New York: ThePsychological Corporation.

    Wechsler, D. ^1981). WA I S - Rmanual.New York: ThePsychological Corporation.

    Weider, A,, Levi, J., & Risen, F. (1943). Performanceof

    problem childrenon the Wechsler-BellevueIntelligenceScales and the Revised Stanford-Binet.PsychiatricQ u a r t e r l y,17, 695-701.

    Weider, A., Noller, P.A., & Schramm,T. A. (1951). TheWechsler Intelligence Scalefor Childrenand theRevised

    Stanford-Binet.Journal of ConsultingP s y c h o l o g y,15,330-333,Weiner, S. G., &Kaufman,A. S.(1979). WISC-R versus

    WISC forblack children suspectedof learningor be-havioural disorders. Journal of LearningDisabilities,12, 100-107.

    Wheaton,P. J., Va n d e rg r i f f ,A.F., & Nelson,W. H. (1980).Comparability of the WISC and WISC-R with brightelementary school students. Journal of School P s y -c h o l o g y,1 8 ,271-275. .

    Wirtz,W. (Chairperson). (1977).Onfurtherexamination:R e p o r to f the A d v i s o r yPanelon the ScholasticAptitudeTe s tScore Decline.Princeton, NJ: College Entrance

    ExaminationBoard.Yater,A. C, Boyd, M., &Barclay,A.(1975).Acomparativestudy ofWPPSIand WISC performanceof disadvan-taged children.Journal o fClinicalP s y c h o l o g y,31, 78-80.

    Zimmerman,I. L., &Woo-Sam,J, (1972). Research withthe Wechsler Intelligence ScaleforChildren:1960-1970.P s y c h o l o g yin the Schools,9, 232-271.

    Appendix

    Studies Rejected for Computing IQ Gains Over Time

    Reason Study

    Same subjects,asanother study

    I n s u f f i c i e n tdata about means

    Tests not given to same subjects

    Two or more years betweentests

    Altered environment between tests

    Practice effects

    No date for norms of Wechsler-BellevueII

    Crockett, Rardin, & Pasewark,1975Gehman &Matyas,1956

    Cole & Weleb'a, 1956Hariow, Price, Tatham, & Davidson, 1957

    Goolishian & Ramsay, 1956

    Austin & Carpenter, 1970Schachter& Apgar,1958

    Stroud, Blommers,& Lauber, 1957Garber &Heber, 1977'

    Delattre& Cole, 1952Ross & Morledge, (967

    Gerboth, 1950Gibby, 1949Quereshi & Miller, 1970"

    Note.Exceptions:Asexplainedin the text, I rejected all studies whose subjectshad a mean IQ above 130 on the testwith the more recent norms and all studies of the mentally retarded, unless the subjects had a mean IQ above 75. Afe wstudiesof the combinationSB-LM and SB-72 were omitted as superfluous given the large numbers(N = 2,351)

    provided by the Stanford-Binet 1972 standardization sample. Two studies of the WISC andWISC-R (Thomas, 1980;Solwayet al., 1976) were included despite not meeting the usual criteria because their research design was calculatedto overcome methodological problems peculiarto that test combination,'Rejected data for the Wechsler Intelligence Scalefor Children only.""Rejected Wechsler-BellevueII data only.

    Received March 9, 1983Revision received May 20, 1983