eugene garfieldgarfield.library.upenn.edu/essays/v9p324y1986.pdf · 2001-03-01 · eugene garfield...

12
EUGENE GARFIELD INSTITUTE FOR SCientifiC IN F0fIMA710N” 3501 MA RKETST, PHILADELPHIA PA 19104 Towards Scientography Number 43 October 27, 1986 Maps serve as tools of navigation by accurately depicting spatial relation- ships. In doing so, maps help us to move from place to place. We can also use maps, however, merely to contemplate the way in which things are distributed and the meaning of their relationships. ISF’s multidimensional-scaling maps of the sciences serve both purposes. Our maps represent where research- ers have been. As historical records, then, these maps are surveys of the geog- raphy of scientific ideas and discov- eries-intellectual gazetteers, if you will—for a given year. They show rela- tionships among fields and how closely linked one discipline is to another. They reveal which realms of science are being most actively investigated and the indi- viduals, publications, institutions, and nations that are currently preeminent in these areas. By examining and by con- templating chronologically sequential maps, we can observe the manner in which scientific knowledge advances. While our maps cannot predict where researchers will go, they can serve as in- dicators. Changes from year to year reveal trends, and the maps, therefore, can serve as forecasting tools. On a certain subject, a novice can at a glance identify those articles and books that experts use most frequently. Even the expert may profit from perusal of the maps; for example, the unsuspected proximity of another area of research to hk or her own area might suggest aher- nate routes of analysis that can yield new understandings, The term “scientography,” coined by EN’s manager of basic research, George Vladutz, reflects both the derivation of mapping from the field of scientometncs and the broad, “geographic” focus of the maps themselves. Henry Small and I published the paper that follows in the Journal of Informa- tion Science. 1 Itdescribes the current state of mapping techniques, reports some findings for the year 1983, and sug- gests the potential contribution of map- ping to the history of science. (Maps for 1984 have recently appeared;z the read- er may wish to compare these with the 1983 maps presented here. ) Since multi- dimensional-scaling maps for research fronts appear regularly in Current Con- tents@, it seemed appropriate to reprint the article in toto. As director of corp~ rate research, Henry Small is working closely with Alexander Gnmwade and me in utilizing these maps to provide the basis for the forthcoming 1S1 Atlas of Sciencem . 0198.51s1 REFERENCES 1. SmaH H & Garfkld E. The geography of science: disciplinary and national mappings. J. Inform. .ki. 11:147-59, 1985. 2. Garffekt E. Mapping the world of biomedical engineering. Ann. I?iomed Eng. 14(2) :97-10S, 1986. 324

Upload: others

Post on 20-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: EUGENE GARFIELDgarfield.library.upenn.edu/essays/v9p324y1986.pdf · 2001-03-01 · EUGENE GARFIELD INSTITUTE FOR SCientifiC IN F0fIMA710N” 3501 MA RKETST, PHILADELPHIA PA 19104

EUGENE GARFIELDINSTITUTE FOR SCientifiC IN F0fIMA710N”3501 MA RKETST, PHILADELPHIA PA 19104

Towards Scientography

Number 43 October 27, 1986

Maps serve as tools of navigation byaccurately depicting spatial relation-ships. In doing so, maps help us to movefrom place to place. We can also usemaps, however, merely to contemplatethe way in which things are distributedand the meaning of their relationships.ISF’s multidimensional-scaling maps ofthe sciences serve both purposes.

Our maps represent where research-ers have been. As historical records,then, these maps are surveys of the geog-raphy of scientific ideas and discov-eries-intellectual gazetteers, if youwill—for a given year. They show rela-tionships among fields and how closelylinked one discipline is to another. Theyreveal which realms of science are beingmost actively investigated and the indi-viduals, publications, institutions, andnations that are currently preeminent inthese areas. By examining and by con-templating chronologically sequentialmaps, we can observe the manner inwhich scientific knowledge advances.

While our maps cannot predict whereresearchers will go, they can serve as in-dicators. Changes from year to yearreveal trends, and the maps, therefore,can serve as forecasting tools.

On a certain subject, a novice can at aglance identify those articles and books

that experts use most frequently. Eventhe expert may profit from perusal of themaps; for example, the unsuspectedproximity of another area of research tohk or her own area might suggest aher-nate routes of analysis that can yield newunderstandings,

The term “scientography,” coined byEN’s manager of basic research, GeorgeVladutz, reflects both the derivation ofmapping from the field of scientometncsand the broad, “geographic” focus of themaps themselves.

Henry Small and I published the paperthat follows in the Journal of Informa-

tion Science. 1 Itdescribes the currentstate of mapping techniques, reportssome findings for the year 1983, and sug-gests the potential contribution of map-ping to the history of science. (Maps for1984 have recently appeared;z the read-er may wish to compare these with the1983 maps presented here. ) Since multi-dimensional-scaling maps for researchfronts appear regularly in Current Con-

tents@, itseemed appropriate to reprintthe article in toto. As director of corp~rate research, Henry Small is workingclosely with Alexander Gnmwade andme in utilizing these maps to provide thebasis for the forthcoming 1S1 Atlas of

Sciencem . 0198.51s1

REFERENCES

1. SmaH H & Garfkld E. The geography of science: disciplinary and national mappings.J. Inform. .ki. 11:147-59, 1985.

2. Garffekt E. Mapping the world of biomedical engineering. Ann. I?iomed Eng. 14(2) :97-10S, 1986.

324

Page 2: EUGENE GARFIELDgarfield.library.upenn.edu/essays/v9p324y1986.pdf · 2001-03-01 · EUGENE GARFIELD INSTITUTE FOR SCientifiC IN F0fIMA710N” 3501 MA RKETST, PHILADELPHIA PA 19104

Reprinted with permission from: J, Inform. Sci. 11:147-59, 1985, Copyright 1986, Elswier Science Publishers B.V,(North-HoUand): Amsterdam, The Netherlands.

The Geography of Science:Dfscff)lfrltary and National Mappings

Henry Small and Eugene Garfield

Maps of research in the sciences and social sciences for the year 19S3 were generated from the Science Cita-tion Indexm /Social Sciences Citation Indexa database using a single-link clustering algorithm. The com-bined use of fractional citation counting and recitation frequency thresholds ensured a fair representationof smaller areas. The resulting clusters of co-cited publications were aggregated through iterative computerruns at five levels, Cl (Imal) to C5 (global). Multidimensional scaling at each level allowed the matrix oflinked publications to be represented geometrically as maps. Each research front or node appears on themaps in proportion to the population (size) of its citing publications. Analyses of the distribution of researchfronts by disciplines and nations are demonstrated, ss well ss measures of expectation and immediacy forresearch in genetic engineering and, in particular, on studies of the beta-thalassemia gene. A walking tourthrough the maps at various levels coupled with content analyses of linkages reveal that, in general, largecentral arcaa represent basic research whereas smaffer peripheral specialties represent appfied research.Although in its infancy, the construction of such maps of the sciences (and comparisons of maps over time)holds the potential for a new kind of hktory of science.

Each year ISF undertakes an analysiz of a

database which is derived from a combina-tion of the Science Citation Index” (SCF )and the Social Sciences Citation IndeP(SSCP ). The purpose of this analysis is tocreate what we call maps of science whichshow the topography of science at variouslevels of aggregation. These range in scalefrom a global perspective down to the level ofthe individual investigator’s key papers. It isfair to say that the science and technology forcreating such maps is in its infancy. Earlierwork by Garfield, Sher, and Torpie I on his to-riographs and Price on networks of scientificpapersz showed the longitudinal atnrcture ofscience by exploiting citation links acroastime. More recent c~citation methods em-phasize the cross-sectional structure of sci-ence by using c~occurrence of references ata specific point in time, and hence suggest theanalogy to geographic maps.

The notion that science can be mapped wasfirst clearly stated by Derek Price4 during the1960s, though we are sure that a thoroughsearch of the literature of informationscience, hktory, sociology and philosophy ofscience would reveal significant precuraora.For example, in 1948 Samuel Bradford,5 thefamous British bibliographer, wrote:

“.. .if x represents any class, as men, its indkviduak will have many different qualities. Wemay separate those irdviduak, which are d~tin-guishable from one another ss having differentquafities, in subclasses. The symbols of aU subclasses tiff have only the possible numericalvalues O and 1. So let us draw straight ~mes, ofunit length, from a convenient point, to repre-

sent these zysnbols. The lines wifl be dmtin-guished by dkection, and, if we lie to use aspace of three dmensions, the unit lines will terminate in points upon the surface of a unitsphere. The aggregate of points wifl representthe class of beings, men, .%rdlarly, let more linesbe drawn to represent all things we wish to tslkabout. The aggregate of points, where all theselines end on the surface of the sphere, repre-sents the unwerse of discourse . . . . And so we geta picture of the universe of d~course as a globe,on which sre scattered, in promiscuous confu-sion, the mutually dated, separate Udngs wesee or tfdik about. ”

Bradford is certainly one of the precursors

of our present effort, but the idea that knowl-edge could be represented spatiaffy is implicitin many early statements. The notion of ‘dis-ciplines’ or ‘fields’ carries with it spatial con-notations, which were made more explicit byVannevar Bush’s metaphor of the scientific‘frontier.’b In social science, the concept ofmental maps, which are subjective versionsof geographic maps, has been important tosome fields such as human geography.7 In thesociology of science the concept of the invisi-ble college auggesta that communication be-tween scientists can give rise to socialgroups.a But it is in the specialized branch of

information science called bibliometncs thatthe idea of mapping science was finally real-ized.

Citation indexes, showing millions of inter-connections among hundreds of thousands ofscientific articles and books annually, seemideally suited for deriving natural maps of thescientific Iandacape. No other database is ascomprehensive either in disciplinary or longi-

325

Page 3: EUGENE GARFIELDgarfield.library.upenn.edu/essays/v9p324y1986.pdf · 2001-03-01 · EUGENE GARFIELD INSTITUTE FOR SCientifiC IN F0fIMA710N” 3501 MA RKETST, PHILADELPHIA PA 19104

Fiourc= I I‘98.7 w7P /.f.$CP C5 man; Global mao. .. . .. . . ----- -. —.

yhldography “1food Sc)ence

quantum

10

. “w’’%harmacolow n

13G crwtied(cme

chronobiology

tudinal scope as the Science Citation Indexand the more recent Social Sciences CitationIndex. In this brief paper it will be possible togive only a glimpse of 1S1’s annual mappingexercise, its results, and some practical appli-cations.

In utilizing a combination of the SCI withthe SSCI we attempted for the first time in1983 to observe connections between the nat-ural and social sciences. This mixing of disci-plines also suggests the first problem we hadto overcome, namely, normalizing for vary-ing citation rates across different fields. Thefirst step in the mapping process is to set a ci-tation frequency threshold and select onlythe most cited documents for processingthrough a clustering algorithm. Since citationrates differ across fields of science, applying asimple integer threshold biases the selectionto high citing fields—biomedical researcheliminates less citing fields such as mathemat-ics, The method we came up with, while per-haps not ideal, gives quite satisfactory re-sults. It is cafled fractional citation countingand amounts to assigning to each pubfishedpaper or citing item one unit of strength to bedivided equally among all its references.g Wecan measure the effectiveness of this proce-dure by matching the rough disciplinary dis-tribution of highly cited documents selected

by it to the distribution of source papers appearing in the index. If the percentages arecomparable, then we are sampling the field inproportion to its representation in the data-base. For example, we know that about 14%of the source items in the database are fromthe social sciences. The number of cited doc-uments in the social sciences selected by thefractional threshold was about 12%, indicat-ing proportional coverage of that field.

Other strategies are used to ensure that weare obtaining a fair representation of largeand small research areas. The cluster analyskis based on the c~citation frequency, whichis the number of times two documents arecited together by current papers. Thk associ-ation measure is normalized by dividing bythe square root of the product of the citationfrequencies of the co-cited documents, a for-mula widely used in information sciencestudies. 10 The clustering algorithm is calledthe single-link method and has the advantageof simplicity in applications involving largefiles but the disadvantage that its clusters aresometimes hlgfdy chained. To fimit theamount of chaining we use a cutoff in clustersize of 60 cited documents and a variable co-gitation level to increase the size of smallclusters. Since the limit of 60 in cluster sizemeans that some areas will be arbitrarily cut

326

Page 4: EUGENE GARFIELDgarfield.library.upenn.edu/essays/v9p324y1986.pdf · 2001-03-01 · EUGENE GARFIELD INSTITUTE FOR SCientifiC IN F0fIMA710N” 3501 MA RKETST, PHILADELPHIA PA 19104

Table 1Statistics on iterative clustering of clusters: 1983 SCP /SSCI@

Iteration:

cl C2 C3 C4 C5

1,

2,3.4,5,6,7,8,9.

10.Il.12.13.14.15.16.

fractional citation threshold(drop items cited S 4 times)

cited items selectedcitations to cited itemsmean citations per itemdistinct cc-cited pairspercent connectedco-citation thresholdlevel incrementmax cluster sizeclusterscited items in clustersc~cited pairsmean items per clusteritems in largest cluster% clusters with two cited items% cited items clustered

1.5

72077115525716.017890360.069%0,17+0,0360942050994631115.46041.6%70.7%

o

942051735454.9196 3tM0.443%0.017+0,01601386601852794.345449.3%63.9%

into pieces, we put the pieces back togetherby successive iterations of clustering. i I Thatis, in the first step we cluster cited documentsby co-citation. Then, in a second iteration wecluster clusters derived from the first step.We continue clustering the clusters of theprevious step until we obtain a single mega-cluster which includes as many of the previ-ous groupings as is possible. The structure ofthis final grouping corresponds to our globalmap of science (Figure 1). Later on we willdiscuss the structure of this map in detail.

In order to geometrically display what isreally only a matrix of objects linked togetherby varying degrees we use another techniquecalled multidimensional scaling. ‘z Imaginetaking a map of the United States and con-structing a table showing the distances be-tween all major cities. Our problem is the re-verse, We have the table of distances (or ac-tually degrees of closeness) but lack the mapembodying those distances. This is what thescaling technique provides in, of course, an

approximation depending on the number ofdimensions required. In our case, we haveused only twodimensional representations,but there is no reason why the maps could notbe three-dimensional.

The statistics on iterative clustering to fivelevels can be seen in Table 1 for the 1983combined SCI and SSCI. We initially drop allitems cited fewer than 5 times which providesa low cutoff for citedness and prevents the in-troduction of random noise. The fractionalthreshold of 1.5 translates into a variable inte-

0

1386329540237.8325993.40%o+0.035m1717296064.2650~,g~o

52,6%

o

1712194781283.5275518.9%o+0.031w141111167,935242,8%64,9%

o

1417911712794.16268.1%o+oMl1146214140%ltW%

ger citation cutoff. For example, only aboutone-half of all items cited 20 or more timesare included with a fractional cutoff of 1.5,while about 107o of items cited 5 times are in-cluded. Hence we retrieve varying percentsfor each integer citation level up to about 40times cited where 100’70 of items are re-trieved. These vasying samples help compen-sate for differences in citation rates acrossfields.

Of the many statistics generated from clus-tering, perhaps the essential ones are thenumber of clusters generated at the first itera-tion (called Cl), namely, 9,420 for 1983, andnumber of cited documents contained there-in, namely, 50,994. Thk calculates to a meanof 5.4 cited items per cluster, with a maxi-mum cluster size of 60 items. As we movethrough each iteration, Cl through C5, thefile is increasingly aggregated, and fewer andfewer clusters form until only one is formed atC5. When processing is completed, a nestedhierarchy of clusters is generated which isfive levels deep. This means that a point onthe highest level map corresponds to a map ofpoints on the next lower level, and so ondown for five levels. The number of iterationsor levels required to reduce the file to a singlegrouping of 60 or fewer macro-clusters de-pends on the number of cited documents se-lected by the fractional citation thresholdwhich are input to the clustering process.

One of the deficiencies of our presentmethod which we are working to improve isthe inability of the iterative clustering scheme

327

Page 5: EUGENE GARFIELDgarfield.library.upenn.edu/essays/v9p324y1986.pdf · 2001-03-01 · EUGENE GARFIELD INSTITUTE FOR SCientifiC IN F0fIMA710N” 3501 MA RKETST, PHILADELPHIA PA 19104

Table 21983 SCP /SSCP clusters: disciplinary dutribution

cl C2.-. .—

number percent number percentof ofclusters clusters

biomedicine andbiochemistry

agriculture,agronomy,botsny,entomology,ecology,animal andfood science

physics andmateriafsscience andengineering

chemistrysocial and

behavioralsciences andpsychiatry

mathematics andcomputerscience

geosciences

3625

69il

16%

12661099

576

468

9420

38.5

7,3

18.0

13.411.7

6.1

5.0

100.0%

506

126

295

166152

87

55

13%

36.5

9.0

21.3

12.011.0

6.2

4.0

Im.o%

to include all clusters obtained at a given levelor iteration in the macro-clusters obtained atthe next higher level. For example, as Table 1shows, about 3670 of the clusters at Cl arenot included in C2 macro-clusters and henceremain isolates. The global map at C5 in-

cludes only about one-third of the C 1 chss-ters. The creation of isolates is due to a com-bination of factors incfuding the use of a mini-mum cluster size of two and a maximum clus-ter size limit of 60. While it may be impossibleto entirely avoid creating isolates with thepresent methodx, it should be possible to ad-just the parameters to ensure that at least allof the largest clusters are included on themaps.

One of the most interesting results of ouranalysis is the distribution of clusters by disci-plines, defined more or less in the traditionalsense. We find, for example, 38. 5?’o of chss-terx at C 1 are on biomedical or biochemicaltopics, ‘7.3V0 on ecology and systematic b]ok

ogy, 18.0~0 in physics, 13.47’o in chemistw)11 .7~o in social and behavioral sciences,6.1 ‘Yoin mathematics and computer science,and 5.OVO in geosciences (Table 2), As wemove up the hierarchy from Cl, changes inthe percentages reflect the varying tenden-

C3

number percentofclusters

59

20

38

1616

13

9

34.5

11.7

22.2

9.49.4

7.6

5,2

C4

numberofClustelx

7

2

2

11

0

1

14

percent

50.0

14.3

14.3

7,17.1

0

7. I

ties of fields to aggregate or to remain sepa-rate. For example, biomedical areas declinein their percent of clusters at C3 while ecolog-ical clusters increase, showing a greatertendency of the biomedical areas to aggre-gate.

The dktribution of national effort acrossclusters is also easily determined. There arevery few which are totally dominated by a sin-

gle country. Cases of this type are small clus-ters usually on topics of particular nationalconcern. Howe\,er, patterns of national em-phasis are easily discerned. Country partici-pation in a cluster is gauged by comparing thenumber of papers from a country with theoverall participation of that country in the file

as a whole. Thus for any given cluster, it ispossible to determine whether the country isparticipating less or greater than expected onthe basis of a random distribution of a coun-try’s research effort.

The disciplinary and national data for clus-ters can be combined in the following way. Ifwe are interested in how a country is concen-trating its research effort, we can examine allclusters where that country’s participation isat, or above, the expected level and look attheir disciplinary distribution compared to

328

Page 6: EUGENE GARFIELDgarfield.library.upenn.edu/essays/v9p324y1986.pdf · 2001-03-01 · EUGENE GARFIELD INSTITUTE FOR SCientifiC IN F0fIMA710N” 3501 MA RKETST, PHILADELPHIA PA 19104

Figure 2. 1983 C2 cluster map, cluster no, 0145: genetic engineering.

92

/

239514684692693837

1010142a

1501174018912270242424@Q25382783

percmt UK > S.% = ~

immediacy > 50% = ‘@

both = 4$?

/8577

5606 32195673

?u

A \Y39 \‘%2783w

810 0900

b!!)514

6758

‘Y

4;1 ,dq \5430

h~tone variantsDNA cloningDNA bindhg proteinscDNA clonesprotein electrophorcsisbeta-thalassemia genecDNA cloning from

mRNAE-coli cloningoncogenes & cancerchromatin DNAyeast genesglob in genestransposable genesgene transcriptionhybrid dysgenesis

28082825307231423219

3810@3240374111484054385fM565356735910tLl17

~3072

KEY

DNA methylationcDNA cloning in E-colitRNA gene transcriptionprotein detectiongenetics of

immunoglobulinDNA promotersT-ceU regrdstion of B-ceUsyeast gene polymorphismgene fusionsmRNA synthesisDNA sequencesfristocompatibdity genescDNA for beta globinT-ceU idiotypesmutagenesissimian virus-40

6485675867647023733279187997W’tO813884188546855285778784

888989CQ

4oa

.—.

human gene cloningoncogene productsmammacy tumor virusgene cloninghuman genomic familiesnew gene elementstumor antigen genegenetics of antibodeshuman globm genesRNA transcriptionprotein pufi]cationnucleotide sequencesribosomal RNAprotein synthesis &

expressionherpes virus genegene expression

world science, as represented by the file as awhole. This was done recently for the UnitedKingdom and it was found consistently for alllevels that the U.K. was below world aver-ages in physics (-4’70) and chemistry (-37’o),while it was above world averages in biomedi-cal research ( + 5’7’.) and ecology ( +27’0) (seeTable 3). Whether this view is shared by U .K.scientists or whether it reflects a consciousscience policy in the U.K. remains to be seen.

Country concentration can be seen in indi-vidual cases by coding the cluster maps. Thecircle around each node on the map is prwportional in area to the size of the cluster as

measured by number of citing papers. Wethen indicate by shading those areas whereparticipation from a given country (in thiscase the U.K. ) is greater than expected. Forexample, on the above map showing geneticengineering areas (Figure 2), research on thebeta-thalassemia gene (#1010) is one ofeleven areas in wh]ch the U.K. has a repre-sentation greater than expected (18’% actualversus 8% expected). The expected rate is

simply the percentage of papers contributedby that country to the whole file.

Another kind of coding is by immediacy ofthe cluster. Since each cluster or macr&clus-

329

Page 7: EUGENE GARFIELDgarfield.library.upenn.edu/essays/v9p324y1986.pdf · 2001-03-01 · EUGENE GARFIELD INSTITUTE FOR SCientifiC IN F0fIMA710N” 3501 MA RKETST, PHILADELPHIA PA 19104

Table 31983SC1’r/SSCP clusters: United Kingdom disciplinary di.stributiona

cl C2

number percent number percent

of of

clusters clusters

biomedicine and 1520 43,0b 256 43.9bbiochemistry

agriculture, agronomy, 348 9,8b 60 lo.3bbotany, entomology,ecology, animal andfood science

physics and materials 5C6 14.3C 96 16.5Cscience andengineering

chemistry 366 10.3C 54 9.3Csocial and behavioral 377 10.7C 59 Io,lc

wiences and psychkttrymathematics and 213 6.tY 32 5.5C

computer sciencegeosciences 208 5.9b 26 4,5b

3538 103,O% G3 Itm,l%.

a counting clustem having greater than expected (8%) U.K. participation.b greater than expected compared to world.

c less than expected compared to world.

ter consixts of a set of cited documents ofvarying age, an immediacy measure can bedevised to reflect how recent this core litera-ture is. The percentage of core documentspublished within the last three years is onesuch measure. On Figure 2 each area havingan immediacy greater than or equal to 50% isindicated by a different shading of the circle.There are twelve such areas on the map, oneof which is beta-thalassemia, with greaterthan expected U.K. participation. (Five areashave both high immediacy and high U.K. par-ticipation. ) To show how immediacy is deter-mined, the C I map for beta-thalassemia isshown in Figure 3, with circles of varying sizerepresenting cited documents, scaled to re-flect relative citation frequency and labeledwith first author and publication year. Thedocuments published in the last three yearsare shaded, and the number of such shadedcircles divided by the total gives an immedi-acy of 737. (19/26).

Returning to the global map, we will nowtake an excursion starting with the C5 level ofscience to illustrate how the system works.From any node on the global map we canzoom down to the next lower level and ob-serve its structure. Positioning at the largecentral circle representing biomedical andphysical sciences, the map for thk macro-cluster is displayed (Figure 4). We see three

C3

number percentofclusters

40 50.6b

12 15.2b

8 10. IC

5 6,3c6 7,6~

3 3.8C

5 6.3b

79 99,97’”

major regions on this map: biomedical re-search on the left, chemistry in the middle,and physics and mathematics on the right.Thk ‘in between’ position for chemistry hasbeen a consistent finding over the years wehave performed this analysis. Focusing on thelarge circle labeled ‘molecular genetics andimmunology,’ we move to the next lower lev-el and see its map (Figure 5). Shown here isthe genetic engineering area (#145) and alarge immunotherapy area (#31 ) to its right,the site of work on AIDS and monoclinal an-tibodies. Other branches of this map concernthe use of x-ray crystallography to study thestructure of biological molecules (on theright), the biochemistry of connective tissues

(on the left), and isolation and purificationmethods (at the bottom). Zooming in on thelarge genetic engineering area takes us up tothe map shown in Figure 2, containing thebeta-thalassemia cluster, Figure 3.

Moving up again to the global map, we candescend into the circle left and slightly belowcenter representing the social and behavioralsciences, and systematic biology (Figure 6).The map for this region is clearly divided intoa social/behavioral region on the left and abiology/ecology region on the right. Span-ning these regions is work on multivanate sta-tistics and classification. Thk somewhat ten-uous link is the main bridge between the so-

330

Page 8: EUGENE GARFIELDgarfield.library.upenn.edu/essays/v9p324y1986.pdf · 2001-03-01 · EUGENE GARFIELD INSTITUTE FOR SCientifiC IN F0fIMA710N” 3501 MA RKETST, PHILADELPHIA PA 19104

Figure 3. 1983 C 1 cluster map, cluster no, 1010 beta-thalassemia gene,

cial and biological sciences, at least in 1983.It is interesting to note that thk conjunctionof social and biological sciences is mirroredin the administrative structure of a majorU.S. funding agency, the National ScienceFoundation.

The C5 map holds special interest becauseit is the most inclusive mapin the hierarchy,containing roughly one-thhd of the Cl chss-ters within its 14 C4macro-chrsters. First wenote that it is a combination of a few verylarge areas and several smaller ones. Thesesize differentials reflect the essential hyper-bolic nature of all bibliometric distributions,including the distributions of cluster size.Second we note that the pattern of links ispredominantly center-periphery, with thelarge central area serving to link the varioussmaller ones around it. Very often the natureof these peripheral areas is more applied andless basic than the central areas, although thkis not always the case. Examining the subjectmatter arrangement, we note that there is arough division between physical and life-sci-ence areas on either side of a line drawn fromthe lower left to the upper right-hand corner.Only relative locations of objects on thesemaps are significant, however.

The scale and inclusiveness of areas on theglobal map vary widely. The two large centralareas (#1 and #2) are multidisciplinary incharacter, encompassing biomedical, chemi-cal, physical science, and mathematics in onecase and social and behavioral sciences, sys-tematic biology, and ecology in the othercase. Most of the medium-sized areas (agri-culture #6, earth science #4, materials sci-ence #3, pharmacology #7)correspondtofa-miliar disciplinary groupings, with the excep-tion of crisis medicine #5. Thk area mightalso be called emergency or trauma medicineincluding aspects of intensive care, and itsemergence shows how an automatic classifi-cation method can create new categorieswhich reflect contemporary concerns. Thesmallest areas are more akin to specialtieswhich, forreasonswe do not yet fully under-stand, have remained separate from thelarger disciplinary groups, yet have links tothem. Some of these small peripheral areasare applied science topics with connectionsto basic science.

The reasons for the links between themacro-clusters are as important as theirspecific contents. These links are the threadswhich hold the fabric of science together. We

331

Page 9: EUGENE GARFIELDgarfield.library.upenn.edu/essays/v9p324y1986.pdf · 2001-03-01 · EUGENE GARFIELD INSTITUTE FOR SCientifiC IN F0fIMA710N” 3501 MA RKETST, PHILADELPHIA PA 19104

Figure 4. 1983 C4 cluster map, cluster no. (XJO1:biomedical and physical science.— .

%

13

41

125

23

68

91316

1718

19202122

242627

-----47

I

cancerimmunr-complexes &

autoimmunityneuroscience

protein stmcture &dynamics

physics, generalthermodynamicsmolecular genetics &

immunologyelectrochemistryassays for carcinogens &

mutagenssurface physicsOrganOmetaUlc complexescomputer programmingcoagtdation & platelet

dkrmderstransition metal complexesendocrinologyreaction kinetics

--@a

K ‘7’:3%7

28323840414243495365666871747787

94

104

,/D135

P21

6513

130

28

\ /’

KEY

group theoryphotochemistrymolecular orbhal theoryparasitic infectionspolymers & glasseseardlology: diagnosisBanach-spacesalgebraic topologycardIOlOgy: treatmentcomputational algorithmsnuclear physicsquantum field theorycritical phenomenainformation theoryfiberopticdiet, metabolism & ion

transportprostaglandms &

inflammatoryresponse

aUOgraft survival &transplantation

109110

116118119121125127130133135137147153156158

166

control theoryimmunoglobulin &

secretory immunesystem

pulmonary dwaaemacrocyclic compoundsviraf dwesnatural productshydrodynamics & rheologynumerical methodsuncertain dynamic systemsC,3 NMRheteroaromstic compoundsgraph theo~ algorithmsmetabofic enzymesstochastic processesliver metabolismelectronic properties of

solidsfipids & tosicity of foods

can analyze the nature of these links by sam-pling papers which cite core documents in theadjacent areas, and then examining the por-tion of their texts in which the specific refer-ences are made. The technique, known aa ci-tation context analysis, 13has been carried outfor aU links shown on the global map (Fig-ure 1) and the phrase best describing thenature of the fink is indicated. In addition,certain links on the map have been ‘doubled.’

These links define a minimal spanning treethrough the network beginning with the foodscience node in the upper right. We will fol-low this path to organize our description ofthe kinds of links which tie together theselarge areas.

Food supplies are subject to bacterial con-tamination by various organisms, for ex-ample, Salmonella, and the fink to microbiol-

ogy is due to the use in food *cience of aaaays

332

Page 10: EUGENE GARFIELDgarfield.library.upenn.edu/essays/v9p324y1986.pdf · 2001-03-01 · EUGENE GARFIELD INSTITUTE FOR SCientifiC IN F0fIMA710N” 3501 MA RKETST, PHILADELPHIA PA 19104

;ure 5. 1983 C3 cluster map, cluster no. CQ16:molecular genetics and immunology.—.

KEY

31 T & B cell subsets in 365 lipoproteins of bacteria fM8 atherosclerosisimmunotherapy 401 DNAsecondsry structure 874 anti-tumor activity of T&

38

42

44

57

64

145246

249

344

345

NMR of biOlOgiCalsystems

structnre of RNAsprotein biosynthesisiron-overload disordersx-ray& NMR

conformation analysisgenetic engineeringreceptor binding of

hormonesangiogenesis by vascutsr

exsdothelial celfsx-ray crystaUographyof

orgmometalliccomplexes

contractile proteirrs

& binding403 Epstein-Barr virus454 bone marrow

transplmrtation487 tumor promoters547 NMR & protein structure575 histocompatibility genes577 ribosomal-RNA &

proteins622 nuclear-Overhauscr-

enhancement659 insufin receptors &

disbetes748 tumor virus expression793 culture growth of

fibroblasts

B cells893 glycossminoglycans &

proteoglycans940 isolation & pufi]cation of

proteins948 colfagen & basement

membranes9% suppressor cell activity976 automated analyses of

amino-acids1165 aflograft-rejection &

monoclonsl antibodes1361 glycoprotein biosynthesis1376 nrised-lymphmyte

reactivity

developed in microbiology. In this caxe thelink is the result of one field using the meth-ods of another. The techniques of microbiol-

ogy are ako important to orthopedics in thediagnosis of bone infections, caused by or-ganisms such as Staphylococcus aureus, andtheir treatment with antibiotics. Orthopedicslinks to the large central area of biomedicaland physical science, and thk is due to in-terest in various kinds of bone analysis whichhave found application in biomedical re-

search areas such as platelet deposition inbone marrow and rheumatoid arthritis.

The central area of biomedical and physi-cal science has finks to nearly all peripheralareas on the map. The ‘double’ links will bediscussed here. Its link to pharmacology is inpart due to basic research on the active trans-port of ions across cell membranes and thesubstances which control that procesx. Itslink to chronobiology is the study of how cir-

cadian rhythms control the release of sub

333

Page 11: EUGENE GARFIELDgarfield.library.upenn.edu/essays/v9p324y1986.pdf · 2001-03-01 · EUGENE GARFIELD INSTITUTE FOR SCientifiC IN F0fIMA710N” 3501 MA RKETST, PHILADELPHIA PA 19104

Figure 6. 1983 C4 clus:er map, cluster no. 0002: social, behavioral, and biological sciences.

stances in the body such as the hormone pro-lactin. Its link to thermal biology concernshow blood flow regulates body temperature.

The link of the central biomedical andphysical science cluster to physical topics be-gins with chromatography at the top of themap. This is due to a method in analyticalchemistry, high performance liquid chroma-tography, used to study the concentration ofdrugs and toxic chemicals in the body. An-other physically oriented link is to quantumoptics on the Ief t, which concerns basic and

applied aspects of multiphoton interactionswith matter, specifically as these relate to thedevelopment of high powered lasers un-doubtedly relevant to recent ‘star wars’ re-search. Back in the central region of the map,the link from biomedical and physical sci-ences to social, behavioral, and biologicalscience is the common concern with complexsystems, as manifest in economic behavior,models of the brain, and the statistical me-chanics of so-called ‘self-organizing’ systems.Complex systems appear to be an area ofshared interest among social, physical, andbiological researchers.

We will now switch over to the social, be-

havioral, and biological sciences cluster inthe center and follow its links to other areas.

First we arrive at crisis medicine to the lowerright, which is linked to social science by dis-cussions of policy-related health statistics, in-cluding the incidence of diseases and acci-dents, use of medical technologies, andhealth care costs. Moving toward the leftfrom social, behavioral, and biological sci-ence we arrive at agriculture via the ecologyof plant populations, this link having a theo-retical emphasis on the biological side and an

applied emphasis on the agricultural side.Moving left from agriculture we arrive atearth science via the dynamics of soil, includ-ing erosion, decomposition of biomass, andsediments. Along thk link the emphasis shiftsfrom the short term perspective of agricul-ture to the longer term (geological) perspec-tive of earth science. Finally, the path leadsfrom earth science to matenats science by aconcern with wave motion in solids. Theshtiting interest along this link appears to beone of scale: earth science concerned withmotions of solids on a large scale (e.g., tec-tonics), and materials science concernedwith behavior of solids on a smaller scale.

This walk along a minimal spanning treethrough the global map has shown that thelinks which bind the research areas have aspecific, content-related character and rep-

334

Page 12: EUGENE GARFIELDgarfield.library.upenn.edu/essays/v9p324y1986.pdf · 2001-03-01 · EUGENE GARFIELD INSTITUTE FOR SCientifiC IN F0fIMA710N” 3501 MA RKETST, PHILADELPHIA PA 19104

resent substantive shared interests. It shouldbe noted, however, that the nature of co-cita-tion links between such macro-clusters isquite different from the finks encountered atthe document level, for example, on a Clmap. At this low level it is possible to isolatesingle concepts and specify the nature of eachdocument-to-document link with little ambi-guity. At the macro-structural level, howev-er, multiple reasons for association are therule rather than the exception, and general-ization is more difficult.

In the previous description we have seenthat methods sometimes are the basis of thelinkage, and sometimes empirical data.Occasionally one area provides theory, andthe other area applies it to a practical prob-lem. It is often difficult to ascertain the ‘direc-tion’ of dependence or knowledge transfer,i.e., whether one area is the source, the otherthe recipient of information. In most cases itis possible to see a mutual benefit to the rela-tion, an exchange of physical or intellectualresources. In some cases it is possible to dis-cern a differential concern or interest acrossthe link regarding the topic: e.g., basic versus

applied, long range versus short range per-spective, large scale versus small scale, socialversus physical, methodology versus data,etc.

Some general findings of our mappingwork are that research effort is not uniformly

distributed across fields or countries, butrather obeys well known hyperbolic laws. Instructure this is reflected in few rather largeareas and many smaller ones, often arrangedin center-periphery patterns. Thk is not adramatic finding since we have known forsome time that scientometric distributionsare almost invariably hyperbolic. What wedid not appreciate was how maps of sciencewould have to reflect this mix of few large andmany small areas in their structure. We alsofind that peripheral areas are often those with

applied or technological emphasis, while

central ones are usually academic fields ofbasic research.

We have also found that by analyzing link-ages between areas, it is possible to discernthe nature of the relationships among them,in terms of scientific problems involved, theresources exchanged, and the different per-spectives of the scientific groups involved.The next step in our mapping work will be torelate and coordinate maps derived from dif-ferent time periods, so that changes in themicro- and macro-structures may be ana-lyzed. Then we will be doing a new kind ofhistory of science.

Releremces

E. Garneld. 1.H. Sherand R.]. Torpe, The U,. of Ci-mtion Data (n Wn”fing the His(ory of Sctence, 1S1Monograph (Institute for Scientific Information, Phil-adelphia, 1964).D.J. DeSoOa Price, Networks of scientific papers,Science 149 (1965) [email protected]. Smalt, C*citation in the scientific literature: anew measure of the relationship between twodocuments, J. A mer Soc Information Sri 24 ( 1973)265-269,D.J. DeSofla Price, The Science of Scientists,Medical Opinion and Review / (10) (1966) 88-97.SC. Bradford, Documenta[ton (Crosby Lockwoodand Sons, London, 1948) 137.V. Bush, Science-the endless frontier, A report tothe President on a Program for Postwar Scientific Re-search, July 1945.P, Gould and R. White, Menfa/ Map, (Harmonds-worth, EnSland: Penguin Books, 1974).D. Crane, Invisible Co/leRes. Diffusion of Knowledgein Scientific Communities (University of ChicasoPress, Chicqqo, 1972).H. Smaff and E. Sweeney, Clustering the “ScienceCitation Index’< using co-citations. 1. A comparison ofmethods, Scientome/rics 7 ( 1985) 391-402,G. Salton and D. BerSmark, A citation study of com-puter science literature, IEEE Tmns. ProfessionalCommunica~ion. PC.22 ( 1979) 146-158.H. Smaff, E, Sweeney and E. Greenfee, Clustering the“Science Citation Index” using cc-citations. 11. Mapping science. Scientometric$ 8 ( 1985) 321-340.J.B. Kruskal. Multidimensional scaling by optimizinggmxfness-of-fit to a non-metric hypothesis, Psycho.me(rika 29 ( 1964) 1-27.H, Small, Citation context analysis, Progress in Corn.m unication Sciences 3 (1982) 287-310.

335