

The control and communication of bio-medical information (*)

HERBERT COBLANS

Formerly Research Department, Association of Special Libraries and Information Bureaux (Aslib), London, Great Britain

It is fashionable to speak of a crisis in documentation. Superficially it looks like a problem of sheer quantity arising from the exponential growth rate of the literature. The great Hungarian biochemist Albert Szent-Györgyi, who was awarded the Nobel prize for his work on vitamin C, recently provided a delightful definition. « A drug is a substance which, if injected into an animal, produces a paper ». Each year more drugs are injected into more animals and so there are more papers.

There are from four to six thousand periodicals dealing mainly with bio-medicine all over the world. Their bibliographical control is handled by about 450 abstracting and indexing services, but of these only 6 are all-embracing and aim to be exhaustive in coverage. None the less, bulk is only the superficial cause; the real problem is one of quality rather than quantity. Bio-medicine has a vast spread, encompassing and intertwining with a large range of scientific disciplines and technological applications. These different but related subject fields must be brought together and analysed by classification and indexing and this is where the real problems arise. How can a documentalist, an indexer, a subject analyst (whatever he or she is called) specify the contents of a scientific paper so that in any subsequent search, perhaps years afterwards, in some distant country, what is relevant can be retrieved by a different person anywhere in the world? Ideally we would need an international standardised meta-language, as natural language is so full of traps and pitfalls. In reality this approach is not a practical proposition for a number of reasons, partly because of the way in which scientists work and communicate and partly because of costs.

(*) Lecture given at the Istituto Superiore di Sanità, Rome, on 12 November 1969.

Ann. Ist. Super. Sanità (1970) 6, 138-150.


Traditional indexing uses mainly subject headings in natural language but also a range of conventional devices, understood only by the narrow subject specialist.

a) Systematic names (mainly in chemistry and taxonomy). They are part of a symbolic language which is valuable as it is fairly standardised. However, in certain subjects like chemistry there are many variant forms which greatly complicate retrieval. Thus there are 21 different, but all correct, systematic names for « aspirin », although there is a unique structural formula for the compound;

b) Trivial names, e.g. aspirin;

c) Proprietary names, e.g. DDT, nylon, Microcard;

d) Acronyms, e.g. A.T.P., D.N.A.

The inherent difficulties of indexing can be illustrated by an example of a fictitious research project of a pharmaceutical manufacturer on « The side effects produced by aspirin used as a drug ». In retrieving all documents which refer to aspirin and its biological action there would be much « noise » (papers in which the relevance is minimal) and some significant papers would be missed. Thus the specification « aspirin » would not normally pull out those papers dealing with the more general level of aspirin-like drugs, or lower level constituents. Obviously if there is to be searching of free text (in centres using duplicate magnetic tapes based on computer stores) all the 21 systematic names for aspirin would have to be programmed in.
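The need to programme in every variant name for a free-text search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 21 systematic names are not listed here, so the three names below are an illustrative subset, and the sample documents and the functions `build_matcher` and `search_free_text` are hypothetical.

```python
import re

# Illustrative subset only: the text mentions 21 systematic names for
# aspirin but does not list them, so these three entries are assumptions.
ASPIRIN_NAMES = [
    "aspirin",
    "acetylsalicylic acid",
    "2-acetoxybenzoic acid",
]


def build_matcher(names):
    """Compile one case-insensitive pattern that matches any listed name."""
    # Longest names first, so a longer variant wins over a shorter prefix.
    ordered = sorted(names, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    # The lookarounds stop a name from matching inside a longer word.
    return re.compile(
        r"(?<![A-Za-z0-9])(?:" + alternatives + r")(?![A-Za-z0-9])",
        re.IGNORECASE,
    )


def search_free_text(documents, names):
    """Return the ids of documents whose text mentions any of the names."""
    matcher = build_matcher(names)
    return [doc_id for doc_id, text in documents.items() if matcher.search(text)]


# Hypothetical document store: id -> free text (for example a title).
documents = {
    "d1": "Gastric bleeding after Aspirin in adults.",
    "d2": "Hydrolysis of acetylsalicylic acid in plasma.",
    "d3": "Salicylate levels in rats.",
}

print(search_free_text(documents, ["aspirin"]))    # trivial name only
print(search_free_text(documents, ASPIRIN_NAMES))  # with variant names
```

With the trivial name alone only the first document is found; with the variant list the second is found as well. The third, which sits at a different level of generality, is found by neither query, which is the kind of miss described above.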

In practice an experienced documentalist would probe by discussion with the project staff and break down the field, identifying related subjects that have a bearing on the inquiry. For example the special relevance of pregnancy might be established and would lead on to retrieving papers dealing with women, mares or rats; possibly under conditions of malnutrition, mainly in the Tropics. This means searching far beyond the conventional literature likely to be available in the documentation centre, going to look at and borrow government reports and technical reports of surveys carried out by WHO or FAO in say, India, or Biafra, or even the southern States of the U.S.A. There are trails which must be blazed in many directions and at many different levels going back in time even to the end of the nineteenth century. Even then only a fraction of the papers finally found may have some direct value. It thus becomes clear why the subject control and specification of the papers when they appear and are indexed is of such basic importance in all IR systems, whether they are manual or mechanised.


What does the bibliographical control, the documentation of the literature of bio-medicine, involve in practice? Broadly the problems can be divided into 2 classes - the published and the semi-published or unpublished material.

I. Published. - This includes books and pamphlets, especially the proceedings of conferences, and periodicals or serials - as it were, the ensemble of « papers » or communications to learned societies. (It has been estimated that from the beginnings in the seventeenth century some 10 million scientific papers have been published (1) and the present rate of growth is about 6 % annually. It is rather difficult to even estimate how many of these new papers each year might be of interest to bio-medicine).

II. Semi-published. - a) The research and development report was originally an American expedient in World War II, but it has come to stay and brought a great deal of confusion into bibliographical control (2,3).

Since then about a million have been issued and at present several hundred thousand new ones appear each year (1). As, in the main, they represent the « shadow » or « underground » literature of the technologist, they do not have a major importance for bio-medicine except in the fields of nuclear medicine, radiobiology and instrumentation.

b) On the other hand the widespread dissemination of « preprints » (unpublished when sent by authors and perhaps never to be accepted by a reputable periodical) is very common in the biological sciences. In the early sixties Information Exchange Groups (IEG) were founded to formalise the sending out of preprints through a central clearing house in the National Institutes of Health. Thus in 1966 more than one and a half million copies of these preprints were distributed (1). However editors of periodicals and others finally managed to stop this flood of unrefereed, uncontrolled material which at best should have the status of correspondence between friends rather than organised distribution (4,5).

The next question is how this mass of incoming raw material is organised so that its content becomes available by selection, processing, indexing, storing and dissemination, to meet the special interests of its potential users. There are basically two ways in which documentation centres work: a) through printed indexes and abstracting periodicals, and b) in-house indexing etc. recorded mainly in card files, whether they be standard library cards or punched cards. In recent years duplicate magnetic tapes storing indexed information can be obtained from some of the large subject services but this is just another form of a). Ideally all the literature which has been published, i.e. in periodicals, would be covered in a) and would represent the bibliographical record of what is valuable in science and technology. Under b) all the ephemeral and semi-published material of direct interest to the centre, which varies from time to time and from place to place, would be captured, worked up for current awareness and finally discarded by regular weeding.

The printed index is cheap, well suited to retrospective searching through quinquennial indexes, but requires trained information workers to exploit it thoroughly, since each index is a law unto itself in its organisation and method of indexing. Furthermore printed indexes are slow and in certain fields, e.g. the pharmaceutical industry, delays of 3 to 6 months from the time of the original appearance of the literature are quite unacceptable. Thus already in 1954 the first reported experiment in mechanised IR was carried out on an IBM 701, at the U.S. Naval Ordnance Test Station at China Lake in California, for indexing 1400 items. (It is interesting to note that the first commercial computer, the Eckert-Mauchly UNIVAC I, had only become available on the market 3 years previously. Further by 1964, that is in ten years, the expenditure on computers for IR had reached the figure of 200 million dollars in the U.S.A.). The attraction of the computer is speed and the hope that indexing can be simplified by using more natural language terms as entry points into the index.

The KWIC Index developed by Luhn (6) certainly greatly speeded up indexing since it used only the informative words in the title. The fact that the American Chemical Society has been producing the monthly « Chemical titles » since 1961 shows that this is the cheapest way to use the computer and that there is a large body of users who are fairly satisfied with this « quick and dirty » tool. However, it must be realised that KWIC is cheap only in comparison with other computer methods for IR (1). To overcome its basic weakness, that its indexing depth is limited to the inadequacies of a title, the Centre d'Études nucléaires at Saclay started producing « Physindex » (8) in 1963. Here additions were made to the title where necessary so as to increase its informative content. But it is doubtful whether this extra effort is worth while in terms of the additional cost of human indexers spending time on evaluating each title for adequacy (1). Partly because of cost, but also for other reasons, Physindex ceased publication in 1967.
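The principle of a KWIC (keyword-in-context) index can be sketched as follows. This is a minimal Python illustration and not Luhn's program: the short stop list, the sample titles and the function names `kwic_entries` and `format_index` are all assumptions made for the example.

```python
# Assumption: a tiny illustrative stop list of non-informative words;
# a production KWIC system would use a much longer one.
STOP_WORDS = {"a", "an", "and", "as", "by", "for", "in", "of", "on", "the", "to", "with"}


def kwic_entries(titles, width=30):
    """Build keyword-in-context entries from a list of titles.

    Every informative word of every title yields one entry made of the
    keyword, the context to its left and right, and the title number.
    Entries are returned sorted alphabetically by keyword.
    """
    entries = []
    for number, title in enumerate(titles, start=1):
        words = title.split()
        for i, word in enumerate(words):
            keyword = word.strip(".,;:()").lower()
            if not keyword or keyword in STOP_WORDS:
                continue
            left = " ".join(words[:i])[-width:]       # context before the keyword
            right = " ".join(words[i + 1:])[:width]   # context after the keyword
            entries.append((keyword, left, right, number))
    return sorted(entries)


def format_index(entries, width=30):
    """Align all keywords in one central column, as on a printed page."""
    lines = []
    for keyword, left, right, number in entries:
        lines.append(f"{left:>{width}}  {keyword.upper()}  {right:<{width}}  [{number}]")
    return "\n".join(lines)


# Hypothetical titles, used only to show the layout.
titles = [
    "Side effects of aspirin in pregnancy",
    "Metabolism of amino acids in tumours",
]
print(format_index(kwic_entries(titles)))
```

The sketch also makes the basic weakness visible: a paper can only be found under words that happen to occur in its title, so the depth of indexing is no better than the title itself.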

A more refined but much more expensive approach to the KWIC technique was pioneered by the American Chemical Society in its « Chemical-biological activities » (CBAC) which appeared in January 1965 as a fortnightly periodical. This is entirely compiled and printed by computer, covering some 600 periodicals in depth for any work on the interaction of organic compounds (drugs, pesticides etc.) with biological systems (mammalia, plants, micro-organisms). The relevant parts of each paper are rewritten in the form of a digest, which together with the bibliographical details, chemical formulae and structures provide input to the computer. After ordering and processing the machine prints out the digests and a series of indexes (author, molecular formulae and KWIC index of the statements in the digest), which are then reproduced by photo-offset. These indexes are automatically cumulated every six months. CBAC is indeed a significant example of what can be done both for document and data retrieval using full computerisation in a highly specialised field. The subscription price of the 26 issues per year is 1100 dollars and although this may seem to be high, it represents a non-profit situation which provides a realistic estimate of what computer operations cost. In addition CBAC duplicate tapes are available on subscription and this means that fast searches can be made on the full store of the whole file since 1965.

However it is in the production of an index for the whole of bio-medicine that the real test for mechanised retrieval is posed. Index Medicus was started in 1879 and the National Library of Medicine (at that time the Library of the United States Surgeon-General's Office) has had a statutory responsibility for its monthly issue and its cumulations. By the late fifties its Director, Dr. Frank Rogers, began to realise that the rate of growth of the literature, the need to extend coverage and the long delays in the appearance of the annual index were crucial factors demanding a new approach. Thus in the early sixties a semi-mechanised system based on punched cards and the Listomatic camera was introduced (9), but it soon became clear that it would not be able to cope with the quarter of a million entries per annum estimated for around 1970. And so the stage was set for the planning and execution of MEDLARS I - a computer-aided system of compilation and printing based on input from human indexing.

The first fully mechanised issue of Index Medicus was that of August 1964, a veritable incunabulum of the twentieth century! From the input of bibliographical data and index terms for each article, produced by tape typewriter, to the final printed page with its justified margins, hyphenation, range of type faces and layout, all processing and sorting and collation was done by computer which finally produced a tape for driving the Photon-900 Computer Phototypesetter. This Graphic Arts Composing Equipment (GRACE) provided the film page from which the printed page was manufactured by photolithography. The whole cycle of compilation and printing was reduced for each monthly issue from the former 22 days to 5 days (10,11).

A few figures will show the size of the problem and the enormous achievement at the mechanical level. Thus for the full year of 1966 there were 164,000 references from 2400 periodicals (55 % in English and 45 % in other languages). The cumulation for 1966 was contained in four volumes (1 author, 3 subject) of about 7000 pages available for distribution in February 1967. The capital cost for designing and installing the whole system was 3 million dollars in hardware (Honeywell computer and GRACE) and 30 man-years of programming effort, costing something around half a million dollars. In the MEDLARS store, starting with the entries for 1963, there will be some one million entries (with on the average 10 index terms per entry) some time early in 1970, an accession rate of 6 reels of magnetic tape per annum.

The primary responsibility of the National Library of Medicine was the production of the monthly Index Medicus and its cumulations. But it soon became clear that there were two important by-products which justified its name - MEDical Literature Analysis and Retrieval System - both for the selective dissemination of information (SDI) and retrospective searching.

1) Recurring bibliographies. - These are specialised indexes selected for a limited audience from the common MEDLARS base. At input all the references that are of concern to, say, dentists or rheumatologists are tagged in such a way that the computer can print them out selectively. This means that medical specialities that have had no regular indexing service can at a purely nominal cost have all the relevant items appearing in Index Medicus. In this way there are now some 10 different indexes produced regularly. A typical example is the quarterly « International nursing index » (with its annual cumulation) which started in 1966 and for the first time provided an international service in this field.

2) Demand searches. - Perhaps the most significant resource provided by the growing store of indexed references is the answering of specific questions by reference retrieval. Thus the figure of 4000 searches in 1967 was almost doubled in 1968, partly as a result of decentralisation. The economics of the hardware are attractive. One reel of magnetic tape costs 50 dollars and copying from an existing tape is a simple operation. Thus for something like 1300 dollars per annum a duplicate set of tapes can be made available.

It was soon realised that the real cost of searching lay in the highly experienced search analysts needed for formulating and programming the search questions. Therefore this load on the NLM was reduced by the creation of 11 regional centres (12) in the U.S.A., each with a full set of tapes. The next step, extension to international participation, follows logically, but raises a number of problems, technically, economically and politically. By the end of 1966 MEDLARS tapes were being exploited (on a free service basis at first) in the U.K. in a joint experimental project by the University of Newcastle Computing Laboratory and the National Lending Library. Around 100 questions per month were being processed (13). Similarly Sweden, for the whole of Scandinavia, and France have completed negotiations to obtain these tapes. Also the NLM has set up an international regional centre in Brazil to serve the whole of Latin America. The World Health Organisation in Geneva will soon be another European centre at which searching on these tapes will be carried out.

At a first glance all this seems to be the perfect answer to the problems of the documentation of bio-medicine. Nevertheless it is becoming clear that there are formidable difficulties, both at the hardware and administrative levels and, more seriously, the unsolved intellectual problems of classification and indexing. On the hardware side there are the chaotic conditions arising from the lack of standardisation and compatibility. The NLM chose for very good reasons to equip MEDLARS I with a Honeywell computer. This meant that, when regional centres were started, the tapes had to be converted to IBM. In the U.K. the project only had access to an English Electric KDF9, and so another conversion became necessary. Each conversion inevitably produces losses in content and involves considerable programming costs. Fortunately these difficulties are continually becoming less serious.

However, the prospects on the « liveware » side, the intellectual effort, are far less encouraging, because the « man and woman »-power needed are still grossly underrated. To appreciate the importance of the human analysts (indexers and searchers) it might be useful to look at a typical search procedure as outlined in an article by Harley and Barraclough (14). The search question is for papers on the « metabolism of sulphur-containing amino acids in tumours », preferably in man, but also in primates and other vertebrates in descending order of interest. Only papers in English, French or German are wanted. All indexing is based on the controlled vocabulary of the NLM, Medical Subject Headings (MeSH), which contains some 7000 terms which are revised and supplemented as necessary each year. In the first place terms relevant to the question must be selected.

M1 Methionine

M2 Cystine

M3 Cysteine

M4 M1 or M2 or M3

M5 amino acid metabolism

M6 Primates

M7 Apes

M8 Monkeys

M9 M6 or M7 or M8

L1 English

L2 French

L3 German

L4 L1 or L2 or L3


There are a number of relevant categories in MeSH.

C2 Cysts, neoplasms and granulomatous diseases (approx. 300 terms)

B1 Invertebrates

B2 Vertebrates (this does not include « Man » which does not occur in MeSH)

The statements for the programming of the search are divided into 3 sub-searches, the most general first.

1) Ra = M4 and M5 and C2 not B1 and L4

2) Rb = not M9

3) Rc = not B2

Thus the first search selects by matching all references with

a) one or more of the amino acids

b) the indexing term « amino acid metabolism »

c) any term from the 300 in Cysts etc.

d) no term from invertebrates

e) one of the 3 languages

The second search selects from the products of the first search all those NOT labelled with the terms Primates, Apes and Monkeys.

The third search selects from the results of the second with the elimination of those labelled with the term Vertebrates.

In this way the final print-out produces 3 sets of relevant references, those referring to Man, non-primate Vertebrates, and Primates.
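The three sub-searches can be sketched with ordinary set operations. This is a hedged Python illustration and not the MEDLARS search program: the stand-in terms for the categories C2, B1 and B2, the sample references and the function name `search` are hypothetical.

```python
# Term groups from the search formulation. C2, B1 and B2 are whole MeSH
# categories; the few terms shown for them are hypothetical stand-ins.
M4 = {"Methionine", "Cystine", "Cysteine"}   # M1 or M2 or M3
M5 = {"Amino acid metabolism"}
M9 = {"Primates", "Apes", "Monkeys"}         # M6 or M7 or M8
L4 = {"English", "French", "German"}         # L1 or L2 or L3
C2 = {"Neoplasms", "Cysts"}                  # stand-in for about 300 terms
B1 = {"Insects", "Molluscs"}                 # stand-in for Invertebrates
B2 = {"Rats", "Dogs"}                        # stand-in for Vertebrates


def search(references):
    """Run the three nested sub-searches; return (Ra, Rb, Rc) as sets of ids."""
    ra = {
        ref_id
        for ref_id, (terms, language) in references.items()
        if terms & M4 and terms & M5 and terms & C2
        and not terms & B1 and language in L4
    }
    rb = {r for r in ra if not references[r][0] & M9}  # Ra, NOT primate terms
    rc = {r for r in rb if not references[r][0] & B2}  # Rb, NOT vertebrate terms
    return ra, rb, rc


# Hypothetical indexed references: id -> (index terms, language).
references = {
    "r1": ({"Methionine", "Amino acid metabolism", "Neoplasms"}, "English"),
    "r2": ({"Cystine", "Amino acid metabolism", "Neoplasms", "Rats"}, "German"),
    "r3": ({"Cysteine", "Amino acid metabolism", "Neoplasms", "Monkeys"}, "French"),
    "r4": ({"Methionine", "Amino acid metabolism", "Neoplasms"}, "Russian"),
    "r5": ({"Methionine", "Amino acid metabolism", "Neoplasms", "Rats", "Insects"}, "English"),
}

ra, rb, rc = search(references)
print("Man:", sorted(rc))
print("Non-primate vertebrates:", sorted(rb - rc))
print("Primates:", sorted(ra - rb))
```

In this toy store r4 is rejected on language and r5 by NOT B1, although r5 also carries a vertebrate term. Because « Man » is not an index term, the references about man are simply those left over when every animal term has been excluded.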

The above example illustrates that the adequacy of MeSH is a limiting factor of fundamental importance. Relevant papers which for any reason have not been indexed under one of the terms (say M5), as they seem unsuitable to the indexer, will be lost. Furthermore, and this is the danger of the use of NOT in the Boolean logic of computer searching, the introduction of NOT B1 in Ra could eliminate a highly relevant paper contrasting amino acid metabolism in vertebrates and invertebrates, as it would in all likelihood be indexed with some term for an invertebrate creature.

The language and the logic of the system are so important that their adequacy has been examined in a major evaluation test carried out at the NLM by F. W. Lancaster. Actual search questions (302 searches made between 1966 and 1967) were taken and reasons for the 4000 failures (non-relevant documents retrieved and relevant documents which had been missed) were analysed. The Report (15) explains that the main sources of error and failure lie in the interpretation of the needs of the user into a good search strategy. This underlines once again how immensely important the « liveware » is; in fact, important as it is in a manual situation it becomes even more crucial as the scale (the total stock of searchable entries in the system) increases and as mechanisation is intensified.
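The two kinds of failure counted in such an evaluation, non-relevant documents retrieved and relevant documents missed, can be made concrete with a short sketch. It assumes the usual « precision » and « recall » ratios, which are not named in the text above; the function `evaluate_search` and the sample identifiers are hypothetical.

```python
def evaluate_search(retrieved, relevant):
    """Compare what a search retrieved with what was actually relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant      # relevant and retrieved
    noise = retrieved - relevant     # retrieved but not relevant
    missed = relevant - retrieved    # relevant but not retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return {
        "noise": noise,
        "missed": missed,
        "precision": precision,
        "recall": recall,
    }


# Hypothetical outcome of one demand search.
result = evaluate_search(
    retrieved={"r1", "r2", "r3", "r4"},
    relevant={"r1", "r2", "r5"},
)
print(result)
```

Here two of the four retrieved references are noise and one of the three relevant references is missed. Each such failure would then be traced back to its cause, whether in the indexing, the vocabulary or the search formulation.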

MEDLARS I was planned to last 5 years, 1964 to 1969. The structure for MEDLARS II (16) is well on the way to realisation. The computer will be changed to an IBM of a size which will make possible greater capacity and higher speed, time-sharing and on-line access from remote terminals. This will make possible the storage of abstracts in addition to the references; a graphic image (diagrams, photographs etc.) storage and retrieval system; and quicker response to the growing volume of requests including demand searches and recurring bibliographies.

There is another fully mechanised system of comparable size, Excerpta Medica (17), with its headquarters in Amsterdam, and it is in fact more comprehensive in that it stores the abstracts as well as the bibliographical references and index terms. Thus some 200,000 citations and 80,000 abstracts from more than 3000 periodicals are stored in the E. M. Data Bank, providing the contents of the 33 E. M. monthly abstract periodicals (in English) in all the fields of medicine and public health. Excerpta Medica was created in 1946 as a commercial enterprise, but subsequently it became an international non-profit making foundation with regional offices in six cities in the different continents. Since the beginning of 1969 they have been using an NCR 315-RMC computer (with CRAM units for auxiliary storage) and a Digiset Electronic Composition Machine for producing their periodicals.

Over the past few years they have compiled a thesaurus of 40,000 terms and about 500,000 synonyms and word forms. Furthermore in collaboration with the University of Leyden, using their IBM 360/50 computer, the Foundation is developing an on-line network for bio-medical information. At present they can provide duplicate tapes from their data bank, citations and the corresponding abstracts on specified subjects as part of an SDI service. However, no published report on evaluation tests on the indexing system, comparable with what has been done with Index Medicus (15), has as yet become available.

The existence of comprehensive computerised services, in the form of duplicate magnetic tapes or SDI and question/answer services in the English language, certainly has important implications for the smaller European countries, or even the larger developing countries. Should each such country attempt to partially and inefficiently cover the world's literature? How essential is it for such non-English speaking countries to provide indexing and abstracting services in the local language, say Italian or Hebrew, or even German? What should be the size and scope of local or regional units in relation to national centralisation (18)?


These are all questions for which there are no generally valid answers. They must be considered in relation to some unpleasant facts about mechanised IR. To use duplicate tape stores effectively, whatever the indexing system, requires a highly trained staff of search analysts. To make a computer search worth while questions must be fairly complex (having numerous facets), specific and cast in a suitable form. Otherwise the output of references would be overwhelming and self-defeating. These difficulties increase exponentially with the size of the store, as for example the million entries which MEDLARS now has. There must also be a large enough number of suitable questions forthcoming to warrant a computer run. Therefore with all its disadvantages of lack of direct dialogue between the users and the system, one national centre handling all questions is probably the inevitable solution for most small countries. In fact, for some developing countries and even whole regions, some international centre, like the World Health Organisation, might provide the most suitable form of help.

Summary. - Superficially the nature of the problem of modern documentation seems to be the sheer quantity, the much quoted « information explosion ». However it is basically a matter of quality rather than quantity. Firstly there is the large spread of the bio-medical complex and the scatter of its literature in fringe periodicals. Secondly there is the problem of subject categorisation - classification and indexing. There is a real difference between the universe of knowledge, facts and critical data, and their representation in documentary form, the way in which scientists and technologists communicate their work, an essentially human and social phenomenon. The documentalist must be able to specify the content of a scientific paper so that in any subsequent search what is relevant can be retrieved. Indexing must take account not only of standardised subject headings or classification codes but also of a whole range of conventional, mainly « language » devices such as systematic names, trivial and proprietary names, acronyms etc.

The bibliographical control of bio-medicine must ensure the coverage of 2 classes of material: the published (books, periodicals etc.) and the semi-published (reports and pre-prints). In practice documentation centres use printed indexes and abstracting periodicals and in-house indexing of the more ephemeral materials. From a large number of overlapping services which index or abstract the literature by conventional methods an increasing number of mechanised products have speeded up the documentation process without adding much to the quality of the indexing. Typical examples are the KWIC index (e.g. « Chemical titles ») and « Chemical-biological activities » (American Chemical Society). The development since 1964 of « Index medicus » as the main product of MEDLARS has been the pioneering achievement in the fuller mechanisation of documentation and in posing some of the problems of international co-operation. Its store of references, which will reach 1 million entries (with on the average 10 index terms per entry) in 1970, provides 2 valuable by-products: recurring bibliographies and the facility for demand searches. The latter has been largely decentralised by the distribution of duplicate magnetic tapes in the U.S.A. and in Europe.

Apart from the problems of hardware (compatibility and conversion) which are slowly becoming less difficult, the intellectual techniques for retrieval need highly trained indexers and searchers, as is demonstrated by following through the steps in a search for papers in a typical question, « the metabolism of sulphur-containing amino acids in tumours ». Improvements to the « Medical subject headings » (MeSH) are also essential. MEDLARS I is now being replaced by MEDLARS II, providing greater capacity and speed, and using newer methods of graphic image storage.

Since 1969 the other comprehensive and more complete system, «Excerpta medica» (it stores abstracts as well), has been similarly mechanised. Based on a different indexing approach, it also provides duplicate tapes and abstracts on specified subjects as part of a service for the selective dissemination of information.

Summary, translated from the Italian (The control and communication of bio-medical information).

- At first sight the problem of modern documentation seems to consist simply in the quantity of the scientific literature, better known as the «information explosion». However, it is above all a matter of quality rather than quantity. In the first place there is the extent of the bio-medical field and the dispersion of the literature in secondary periodicals. In the second place there is the problem of subject categorisation, classification and indexing. There is a real difference between the world of knowledge, the facts and the critical data on the one hand and, on the other, their representation in documentary form, the way in which scholars and technologists communicate their work, which is a typically human and social phenomenon. The documentalist must be able to bring out the content of a scientific paper so that in a subsequent search what is relevant can be retrieved. Indexing must take account not only of standard subject headings or classification codes, but also of a whole set of conventional devices, above all linguistic ones, such as systematic names, common names, trade names, acronyms, etc.

The bibliographical control of the bio-medical sciences must ensure the coverage of two types of material: the published (books, periodicals, etc.) and the semi-published (reports and pre-prints). In practice documentation centres use printed indexes, abstracting periodicals and in-house indexes of the more ephemeral material. Alongside a large number of partly overlapping services which index or abstract the literature by traditional methods, an ever-increasing number of mechanised products have given impetus to the documentation process without adding much to the quality of the indexing. Typical examples are the KWIC indexes (for example «Chemical titles» and «Chemical-biological activities» of the American Chemical Society). The development, since 1964, of «Index medicus» as the main product of MEDLARS has been a first step towards a fuller mechanisation of documentation and towards the framing of some problems of international co-operation. Its collection of references, which will reach one million entries in 1970 (with an average of 10 subject terms per entry), provides two notable by-products: recurring bibliographies and the possibility of searches on demand. The latter has been largely decentralised through the distribution of duplicate magnetic tapes in the United States and in Europe. Leaving aside the problems of «hardware» (compatibility and conversion), which are gradually becoming less difficult, the intellectual techniques for information retrieval require highly trained staff for indexing and searching, as is shown by following step by step the search for articles on a typical topic such as «the metabolism of sulphur-containing amino acids in tumours». Improvements to the «Medical subject headings» (MeSH) are also essential. MEDLARS I is currently being replaced by MEDLARS II, which has greater capacity and speed and is based on newer methods for the storage of graphic images.

Since 1969 the other more complete and wide-ranging service, «Excerpta medica» (it also stores abstracts), has likewise been mechanised. Based on a different indexing system, it also supplies duplicate tapes and abstracts on specified subjects as part of a service for the selective dissemination of information.

REFERENCES

(1) PASSMAN, S. Scientific and technological communication. Pergamon, New York, 1969.

(2) COBLANS, H. Méthodes et techniques nouvelles de diffusion des connaissances. Bull. Unesco à l'intention des Bibliothèques, 11, 153-179 (1957).

(3) COBLANS, H. Ordnungsformen von Informationsgut - Versuch einer Systematik. Nachr. Dok., 13, 8-12 (1962).

(4) Four years of information exchange (groups). Nature, 211, 901-905 (1966).

(5) GREEN, D. Death of an experiment. Intern. Sci. Technol., no. 65, 82-88, May 1967.

(6) LUHN, H. P. Keyword-in-context index for technical literature (KWIC index). Am. Doc., 11, 288-295 (1960).


(7) COBLANS, H. Use of mechanised methods in documentation. Aslib, London, 1966.

(8) CHONEZ, N. Préparation des bibliographies auto-indexées «Physindex». Saclay, CEA, Mai 1964 (Note AFD 44).

(9) NATIONAL LIBRARY OF MEDICINE. Index mechanisation project. Bull. Med. Library Assoc., 49, no. 1, part 2 (1961).

(10) The MEDLARS story at the National Library of Medicine. Public Health Service, Washington, 1963.

(11) AUSTIN, C. J. MEDLARS 1963-1967. National Library of Medicine, Bethesda, 1968.

(12) Search boundaries realigned. Natl. Library Med. News, 24 (1), 2 (1969).

(13) PAGE, E. S. MEDLARS in the United Kingdom. In: Library systems and information services. Proceedings 2nd Anglo-Czech Conference of Information Specialists. Crosby Lockwood, London, 1970, p. 116-120.

(14) HARLEY, A. J. & E. D. BARRACLOUGH. MEDLARS information retrieval in Britain. Postgraduate Med. J., 42, 69-73 (1966).

(15) LANCASTER, F. W. Evaluation of the MEDLARS demand search service. National Library of Medicine, Bethesda, 1968.

(16) MEDLARS II. Natl. Library Med. News, 23 (1), 6 (1968).

(17) EXCERPTA MEDICA FOUNDATION. Excerpta Medica: automated storage and retrieval program of biomedical information. Excerpta Mark I System. Amsterdam, 1969.

(18) COBLANS, H. The mechanisation of documentation - a tentative balance sheet. In: Ciba Foundation Symposium on Communication in Science. Churchill, London, 1967, p. 78-83.
