05 rdb productivity

Upload: jagiovannyvalenciavelazquez

Post on 02-Jun-2018

225 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/10/2019 05 RDB Productivity

    1/9

    Th e 98 A C M Turing Award Lecture

    Delivered at ACM '81, Los Angeles , ,California , November 9, 1981

    Th e 1981 A CM Tur i ng A w ar d w as pr esen ted to Edgar F . Codd, an I B M

    Fellow of the San Jose Re search La bora tory, by Pres ident Peter Denr~ing on

    No vem ber 9, 1981 at the AC M Annu al Conference in Los Angeles , California .

    ~t is the Associat ion' s foremost award fbr technical contr ibutions to the com-

    p u t i n g c o m m u n i t y .

    Codd w as se lec ted by the A CM G ener a l Technica l A chievement A w ar d

    Com mit te e f or h is f und amen ta l and cont inu ing cont r ibu t ions to the theor y

    and pr ac t ice of da tabase m anag emen t sys tems . The or ig ina tor of the r e la t iona l

    mode l f or da tabases , Codd has made t~ r the r impor tan t cont r ibu t ions in the

    deve lopm ent of r e la tiona l a lgebr a , r e la t iona l ca lcu lus , and nor mal iza t ion of

    relations.

    Edgar F . Codd jo ined I BM in 1949 to pr epar e pr ogr ams f or the Se lec t ive

    Sequence Electronic Calculator . S ince then, his work in computing has encom-

    passed ~ogicat des ign of comp uters ( IB M 701 and Stretch) , man agi ng a comp uter

    cente r in Canad a , heading the d eve lopment of one of the f ir st oper a t ing sys tems

    w i th a gener a l mu i t ip r o gr am min g capabi l i ty , cont r ibu t ing to the log ic of s el t:

    r epr oduc ing au to mata , deve loping h igh leve l t echniques f or sof tw ar e spec if ica -

    t ion , c r ea t ing and ex tending the r e la t iona l appr oach to da tabase management , and deve loping an Engl i sh ana lyz ing

    and syn thesizing subsy stem for casual users of re lat ional databa ses . He is a lso the au thor of Cellular Automata,an ear ly

    volume in the A CM Monogr aph Ser ies .

    Codd r ece ived h i s B .A. and M.A . in Ma them at ics f r om O xf or d U niver s i ty in England , and h is M.Sc. and Ph .D .

    in Co mp ute r and Com mun ica t ion Sc iences f rom the U niver s i ty of Michigan . H e i s a Me mbe r of the N a t iona l

    A cademy of Engineer ing ( U SA) and a Fe llow of the Br i t ish Co mpu te r Soc ie ty .

    The A C M Tu r ing A w ar d i s p r esen ted each yea r in comm emo r a t ion of A . M. Tur ing , the Engl ish mathe mat i c ian

    w ho made major cont r ibu t ions to the comput ing sc iences .

    Re lation al Database A Practical Fou nda tion for

    Productivity

    E. F. Codd

    IB M S an Jose R esea rch L abora to ry

    I t i s w e l l know n tha i the gr ow th in demand s f r om end

    user s f or new appl ica t ions i s ou tst r ipp ing the capabi l i ty

    of da ta pr oc ess ing de par tme nts to implement the cor r e -

    sponding appl ica t ion pr ogr ams . Ther e a r e tw o

    comple-

    mentary a p p r o a c h e s t o attacking this problem (and both

    approaches are needed): one is to put end users into

    di r ec t touch w i th the

    information stored in computers;

    the n the r i s to inc r ease the productivity of data process-

    ing professionals in the development of application pro-

    grams. It is less well known that a single technology,

    Author's Present Address: E. F. Co dd, IBM Rese arch Laboratory,

    5600 Cottle Road , San Jose, CA 95193.

    Permission to copy without fee all or part of this material is

    granted provided t hat the copies are not made or distributed for direct

    commercial advan tage, the ACM co pyrightnotice and the title of the

    publication and its d ate app ear, and notice is given hat co pying is by

    permission of the Association for Computing Machinery. To copy

    otherwise, or to republish , requires a fee and/or specificpermission.

    1982 ACM 0001-0782/82/0200-0109 $00.75

    11}9

    r e la t iona l

    database management,

    provides a pract ical

    f ounda t ion tbr bo th appr oaches . I t i s expla ined w hy th i s

    is so.

    Whi le deve loping th i s

    productivity theme, it

    is noted

    tha t the t ime has com e to dr aw a ve r y sha r p l ine between

    r e la t iona l and non-relational database sys tems , so tha i

    the label re la t ion al wil l not be used in mis leadin g ways.

    The key to drawing this l ine is something cal led a

    relational processing capability.

    CR Categor ies and Su bjec t D esc r ip tor s : H .2 ,0 [ D a tabas~

    Management]: General; H.2.1 [Database Management]:

    Logica l Design-data models; H . 2 . 4 [ D a t a b a s e M a n a g e -

    mea t ] : Sys tems

    G e n e r a l T e r m s : H u m a n F a c t o r s , L a n g u a g e s

    A ddi t iona l K ey Wor ds and Phr ases : da tabase , r e la t iona l

    da tabase , r e la t iona l mode l , da ta s t r uc tur e , da ta manip-

    u la t ion , da ta in tegr i ty , p r oduc t iv i ty

    Communications Februa ry 1982

    of Volume 25

    the ACM Number 2

  • 8/10/2019 05 RDB Productivity

    2/9

    1t. ilntroduc*ien

    [t is general ly admitted that there is a productivi ty

    c r i s i s in the deve lopment o f " running code" for corn-

    merc ia l and indus t r ia l app l ica t ions . The growth in end

    use r demands fbr new appl ica t ions i s ou ts t r ipp ing the

    capabi l i ty o f da ta processing depa r tments to im plem ent

    the corresponding applicat ion programs. In the la te s ix-

    t ie s and ea r ly sevent ie s many people in the comput ing

    f ie ld hoped tha t the in t roduc t ion of da tabase manage -

    ment sys tems (commonly abbrev ia ted DBMS) would

    markedly inc rease the produc t iv i ty o f app l ica t ion pro-

    grammers by removing many of the i r prob lems in han-

    d l ing input and output f i l e s . DBMS (a long wi th da ta

    dict ionar ies) appear to have been highly successful as

    ins t ruments o f da ta contro l , and they d id remove m any

    of the f i l e handl ing de ta i l s f rom the concern o f app l ica-

    t ion programmers . Why then have they fa i led a s pro-

    ductivi ty boosters?

    There are three pr incipal reasons:

    (1 ) These sys tems burdened ap pl ica t ion programm ers

    wi th n um erous concepts tha t were i r r e levant to the i r da ta

    re t r ieva l and manipu la t ion ta sks , forc ing them to th ink

    and code at a needlessly low level of structura l deta i l ( the

    " o w n e r - m e m b e r s et " o f C O D A S Y L D B T G i s a n ou t-

    s t a n d in g e xa m p le ) ;

    (2 ) No commands were prov ided for process ing mul -

    t ip te r ecords a t a t ime-- in o the r words , DBMS did not

    suppor t

    set p rocessing

    and, as a result, progra mm ers were

    %rced to th ink and code in te rms o f i te rat ive loops tha t

    were often unnecessary (here we use the word "set" in

    i ts tradit ional mathematica l sense , not the l inked struc-

    t u re s e ns e o f C O D A S Y L D B T G ) ;

    (3) Th e :needs of end u sers for direct interaction w ith

    da tabases , pa r t icu la r ly in te rac t ion of an unant ic ipa ted

    na ture , were inadequa te ly recognize d--a q ue ry capabi l -

    i ty was a ssumed to be som eth ing one could add on to a

    DB MS a t some la ter t ime .

    Looking back a t the da tabase management sys tems

    of the la te s ixt ies , we may readi ly observe that there was

    no sha rp d is t inc t ion be tween the program mer ' s ( log ica l)

    v iew of the da ta and the (phys ica l ) r epresenta t ion o f da ta

    in s torage. Ev en th oug h w ha t w as ca l led the log ica l l eve l

    usua l ly p rov ided pro tec t ion f rom p lacement expressed in

    terms of storage addresses and byte offsets , many stor-

    age--oriented concepts w ere an integra l p ar t o f this level~

    The adverse impac t on deve lopment produc t iv i ty o f

    requ i r ing program mers to nav iga te a long access pa ths to

    T h e c r u x o f t h e p r ob l e m w i t h t h e t he C O D A S Y L D B T G o w n e r -

    memb er s e t i s t h a t i t co mb i n es i n to o n e co n s t r u c t th r ee o r th o g o n a l

    co n cep t s : o n e - to - m an y r e l a t i on s h i p , ex i s t en ce d ep en d en cy , an d a u s e r -

    v i s i b l e l i n k ed s t r u c tu r e to b e t r av e r s ed b y ap p l i ca t i o n p r o g r ams . I t i s

    th e l a s t o f th es e th r ee co n cep t s th a t p l aces a h eav y an d u n n eces s a r y

    n av i g a t i o n b u r d en o n ap p l i ca t i o n p r o g r ammer s , I t a l s o p r es en t s an

    i ~ l s u r mo u n tab l e o b s tac l e f b r en d u s e r s .

    110

    reach t lae target dat a ( in som e cases having to deal

    directly with the layout of data ir~ storage and in others

    having to fb l low po in te r cha ins ) was

    enormous.

    In ad-

    dit ion, i t was not possibte to make sl ight changes in the

    layout in s torage wi tho ut s im ul taneou s ly hav ing to r ev ise

    at1 programs that re l ied of~ the previous structure . The

    in t roduc t ion of a s. index mi ght hav e a s im i la r e f f?ct . As

    a resta i t , f ia t too much manpower was being invested in

    contin ual (ar id avoi dable) maintenar~ce of ' app l icat ion

    programs.

    An othe r consequence was tha t ins ta l la t ion of these

    sys tems was o t ten agoniz ing ly s low, due to the la rge

    am oun t o f t ime spent in 1ea rn ing about the sys tems and

    in p lanning the organ iza t ion of the da ta a t bo th log ica l

    and phys ica l l eve ls, p r ior to da tabase ac t iva t ion . The a im

    of" this prepl annin g was to "g et i t r ight once an d for a l l"

    so as to avoid the need fbr subsequent changes in the

    da ta desc r ip t ion tha t , in turn , wo uld force coding changes

    in app l ica t ion programs. Such an ob jec t ive was , o f

    course , a mirag e , even if sou nd pr incip les for database

    des ign had been kn own a t the t ime (and , o f course , they

    were not).

    To show how re la t iona l da tabase management sys -

    tems avoid the three pi tfa l ls c i ted above, we shal l f i rst

    r ev iew the m ot iva t ion of the re la t iona l m ode l an d d iscuss

    some of i ts features. We shal l then c lassify systems that

    are based ctpon that model. As we proceed, we shal l

    s t re ss app l ica t ion program mer p roduc t iv i ty , even though

    the benefi ts fbr end users are ju st as great , because mu ch

    has a l r eady been sa id and demonst ra ted rega rd ing the

    valu e of re la t ional datab ase to end u sers (see [23] and

    the papers c i ted there in).

    2. Met ivat ieri

    The m ost im por tan t m ot iva t ion for the re search work

    tha t r e su l ted in the re la t iona l mod e l w as the ob jec tive o f

    provid ing a sha rp and c lea r bou ndary be tween the log ica l

    and phys ica l aspec ts o f da tabase m anag em ent ( inc lud ing

    da tabase design , da ta r e t rieva l , and da ta manipu la t ion) .

    We cal l this the

    data independence objective.

    A second ob jec t ive was to make the mode l s t ruc tm-

    a l ly simpte , so tha t a l l k inds o f use r s and p rogramm ers

    c o u ld ha ve a c o m m o n un d e r s t a n d in g o f t h e d at a , a n d

    could the re fore commu nica te wi th one anothe r about the

    database. We cal l this the

    communicability objective.

    A th i rd ob jec t ive was to in t roduce h ig h leve l language

    concepts (but not specific syntax) to enable users to

    express ope ra t ions upon la rge chunks of in forma t ion a t

    a t ime . Th is en ta i led prov id ing a founda t ion for se t -

    or iented p rocessing (i .e. , the abi l i t y to express in a s ingle

    sta tement the processing of mult iple se ts of records a t a

    t ime). W e cal l this the

    set-processing objective.

    There were other objectives, such. as providing a

    sound theore t ica l founda t ion for da tabase organ iza t ion

    and management, but these objectives are less re levant

    to our present produc t iv i ty theme .

    C o m m u n i c a t i o n s F e b r u a r y 1 9 82

    o f V o l u m e 2 5

    t h e A C M N u m b e r 2

  • 8/10/2019 05 RDB Productivity

    3/9

    3~ The Relational l~ 'iode~

    To sat is f ) / these three object ives , i t was necessary to

    discard a l l tlhose da ta s t ructu r ing concepts (e .g. , repeat ing

    groups , l i nked s t ruc tures ) t ha t were no t f ami l i a r t o end

    users and to tak e a f resh lo ok a t the address ing of data .

    Pos i t ional concepts have a lways played a s igni f icant

    ro l e i n compute r addres s ing , beg inn ing wi th p lugboard

    address ing, then absolute numeric address ing, re la t ive

    num er i c addres sing , and sym bol i c addres s ing wi th a r i th -

    met ic prop ert ies (e .g. , the sym bol ic address A + 3 in

    as semble r l anguage ; t he addres s X( I + t , Y - 2 ) o f an

    element in a ~ : :or t ran, Algol , or PL/I array named X). In

    the re l a t iona l m ode l we rep l ace p os i t iona l addres s ing by

    tota l ly associa t ive address ing. Ever y datu m in a re la-

    t i ona l da t abase can be un ique ly addres sed by means o f

    the re l a t i on , name , p r imary key va lue , and a t t r i bu te

    name . Assoc i a t i ve addres sing o f th i s fo rm enab les use rs

    (yes, and e ven pro gra m m ers a lso ) to leave i t to the

    sys tem to ( t ) de t e rm ine the de ta i l s o f p l acement o f a x lew

    piece o f i n fo rm a t ion tha t i s be ing inse r t ed in to a da tabase

    and (2 ) s e l ec t appropr i a t e acces s pa ths when re t r i ev ing

    data.

    A l l i n form a t ion in a re l a t i ona l da t abase i s r epresen ted

    by va lues i n t ab l e s ( even t ab l e names app ea r a s cha rac te r

    s t r ings in a t leas t one table) . Address ing data by value ,

    ra the r than by pos i t i on , boos t s the produ c t i v i ty o f p ro -

    gram m ers as wet1 as end u sers (pos i t ions of i tems in

    sequences are usual ly subject to change and are not easy

    for a person to keep t rack of , especia l ly i f the sequences

    conta in many i t ems ) . Moreove r , t he f ac t t ha t p rogram-

    mers and end users a l l address data in the same way goes

    a long way to mee t ing the communicab i l i t y ob jec t i ve .

    The n-ary re la t ion was chosen as the s ingle aggregate

    s t ruc ture fo r t he re l a t i ona l mode l , because wi th appro-

    pr i a t e ope ra tors and an ap prop r i a t e conceptua l represen-

    ta t ion ( the table) i t sa ti s f ies a l l three of the c i ted objec-

    t ives . Note that an n-ary re ta t ion is a mathemat ica l se t ,

    i n w hich the orde r ing o f rows i s imm ate r ia l .

    Som et imes the fo l l owing q ues t ions a r i se : Wh y ca ll i t

    t he re l a ti ona l m ode l ? W hy n ot ca l l i t t he t abu la r m ode l ?

    There are two reasons : (1) At the t ime the re la t ional

    mo de l was i n t roduced , m any peop le i n da ta proces s ing

    fe l t t ha t a re l a t i on (o r re l a t i onsh ip) among two or more

    ob jec t s mus t be represen ted by a l i nked da ta s t ruc ture

    ( so the na m e was s e l ec t ed to cou nte r t h is mi sconcept ion) ;

    (2) Tab les are a t a lower level of abs t ract ion than re la-

    t ions , s ince they give the impress ion that pos i t ional (ar-

    ray- type ) addres s ing i s app l i cab le (which i s no t t rue o f

    n -a ry re l a ti ons ) , and they f a i l to show tha t t he i n form a-

    t ion conten t o f a t ab l e i s i ndependent o f row orde r .

    Nevertheless , even wi th these minor f laws , tables are the

    mos t im po r t an t conceptu a l represen ta tion o f re l a ti ons ,

    because they a re un ive rsa l l y unders tood .

    Inc iden ta l l y , i f a da t a m ode l i s to be cons ide red a s a

    se rious a l t e rna t i ve fo r t he re l a t i ona l mode l , i t t oo should

    have a c l ea r ly de f ined conceptua l represen ta t ion fo r

    database ins tances . Such a representat ion fac i l i ta tes

    111

    think: iag abou t the ef fects of whatev er operat io ns are

    und er cons ide ra tion . I t i s a requ i rem ent ~br prog ram me r

    and end-user product ivi ty. Such a representat ion is

    rare ly, i f ever, discussed in data mo dels that use concepts

    :such as ent i t ies and re la t ionships , or in funct ional data

    mode l s . Such mode l s f requent ly do no t have any ope r -

    a tors ei ther Nev ertheless , they may be useful tbr certa in

    k inds o f da ta type analysi s encounte red in the p roces s o f

    es tabl ishing a new database , especia l ly in the very ear ly

    s tages o f de t e rmin ing a p re l imina ry in form a l o rgan i za -

    t ion. This leads to the ques t ion: What i s a data model?

    A data m odel i s, of course , not jus t a d ata s t ructure ,

    a s many people s eem to th inL I t i s na tura l t ha t t he

    pr inc ipa l da t a mode l s a re named a f t e r t he i r p r inc ipa l

    s t ructures , but that i s not the whole s tory.

    A data m ode l [9] is a com binat io n of a t leas t three

    component s :

    (1) A col lect ion of data s t ructure types ( the database

    building, blocks);

    (2) A col lect ion of operators or rules of inference ,

    which can be ap pl ied to .any val id ins tances of the data

    types t i s ted in (1) , to re t r ieve , der ive , or modify data

    f rom any pa r t s o f t hose s t ruc tures i n any combina t ions

    des i red;

    (3) A col lect ion of genera l in tegr i ty rules , which im-

    pl ic i t ly or expl ic i t ly def ine the se t of cons is tent database

    s ta tes or changes of sta te or bo th- - th ese rules are genera l

    in the sense that they apply to any database us ing this

    mode l ( i nc iden ta l l y , t hey may somet imes be expres sed

    as i nse r t -up da te -de l e t e ru l es ) .

    The re la t ional model i s a data model in this sense ,

    and was the f i rs t such to be def ined. W e do no t prop ose

    to g ive a de ta i l ed de f in i t i on o f t he re l a t i ona l m ode l

    he re - - the or i g ina l de f in i t i on appea red in [7 ] , and an

    improved one in Secs . 2 and 3 of [8] . I t s structuralpart

    cons is ts of domains , re la t ions of assorted degrees (w i th

    tables as thei r pr incipal conceptual representat ion) , a t -

    t r ibutes , tuples , candidate keys , and pr imary keys . Under

    the pr inc ipa l r epresen tat ion , a t t r ibu te s becom e co lumns

    of t ab l es and t up l e s becom e rows, bu t t he re i s no no t ion

    of one co lum n succeed ing anothe r o r o f one row suc -

    ceeding another as t~r as the database tables are con-

    ce rned . In o the r words , t he l e f t to r i gh t o rde r o f co lum ns

    and the top to bo t tom orde r o f rows in those t ab le s a re

    a rb i t ra ry and i r re levan t .

    T h e manipuIativepart of the re l a t iona l m ode l consi s ts

    of the a lg ebra ic op erators (se lect, project , join, e tc . ) wh ich

    t rans form re l a t ions i n to re l a t i ons ( and he nce t ab l e s i n to

    tables).

    T h e integrity par t cons ists of two integr i ty rules : ent i ty

    integr i ty and referent ia l integr i ty (see [8 , 11] tbr recent

    deve lopment s i n th i s l a t t e r a rea ) . In any pa r t i cu l a r ap-

    p l i ca t ion o f a da ta m ode l i t ma y be neces sa ry to impose

    further (database-speci f ic) integr i ty cons tra ints , and

    there by def ine a smal ler se t of cons is tent database s ta tes

    or changes of s ta te .

    In the deve lop m ent o f t he re l at i ona l mode l , t he re has

    a lways been a s t rong coupl ing be tween the s t ruc tura l ,

    Comm unications Febru ary 1982

    of Volume 25

    the ACM Num ber 2

  • 8/10/2019 05 RDB Productivity

    4/9

    m anip ula t ive , and integr i ty aspects . If" the s t ructures are

    de f ined a tone and s epa rat e ly , t he i r behav iora l p rope~ ti e s

    a re no t p inn ed d own, i n f in i t ely man y pos s ib il i ti e s p resen t

    themselv es , and end less specula t ion resul ts . It is therefore

    no surpr i s e tha t a tt empt s s uch a s those o f CO D A SY L

    and ANS-[ to deve lop da ta s t ruc ture de f in i ti on l anguage

    ( D D L ) a n d d a t a m a n i p u l a t i o n l a n g u a g e ( D M L ) i n s e p -

    a ra t e commi t t ees have y i e tded many mi sunders t and ings

    and incompat ibi l i t ies .

    4o The Relat ienal Precess i rag CapabWty

    The re l a t i ona l mode l ca l l s no t on ly fo r re l a t i ona l

    s t ructure s (which can be tho ug ht of as tables) , but a lso

    fbr a part ic ular kinnd of se t process ing ca l led

    relational

    proee.rsing. Rela t iona l p roces s ing en ta i l s t r ea t ing whole

    re l a t ions a s ope rands . I t s p r imary purp ose i s l oop-avo id -

    ance , an abso lu t e req u i rem ent fo r end use rs to be pro -

    duc t i ve a t a l l, and a c t ea r p roduc t i v i ty boos te r fo r app l i -

    c a t io n p r o g r a m m e r s .

    T h e S E L E C T o p e r a t o r ( a l s o c a l l e d R E S T R I C T ) o f

    the re l a t i ona l a l gebra t akes one re la t ion ( table) as oper-

    and and produces a new re l a t i on ( t ab l e ) cons i s t i ng o f

    s e l ect ed tup l e s ( rows ) o f the f i rs t. The P RO JEC T ope r -

    a tor a l so t rans forms one re la t ion ( table) into a new one,

    this t ime however cons is t ing of se lected a t t r ibutes (col -

    u m n s ) o f t h e fi rs t. T h e E Q U I - J O I N o p e r a t o r t a k es

    two

    re l a t i ons ( t ab l e s) a s ope rands a nd p roduces a th i rd con-

    s i st i ng o f rows o f t he f i r st conca tena ted w i th rows o f t he

    second , bu t o n ly where spec i f ied co lumns in the f i rs t and

    spec i f i ed co lumns in the s econd have ma tch ing va lues .

    t f r edundancy in co lumns i s removed , t he ope ra tor i s

    c a l l e d N A T U R A L J O I N . I n w h a t f o l l o w s , w e u s e t h e

    t e rm " jo in " to ret%r to e i the r the equ i - j o in o r the na tura l

    j o in .

    The re l a t i ona l a l gebra , which inc ludes these and

    othe r op e ra tors , i s i n t ended as a ya rds t i ck o f power . I t i s

    not i n t ended to be a s t anda rd l anguage , t o which a l l

    r e l a t i ona l sys tems should adhe re . T he s e t -proces s ing ob-

    j ec t i ve o f t he re l a t i ona l mode l i s i n t ended to be me t by

    means o f a da ta s ub languag e 2 hav ing a t l ea s t t he pow er

    of the re l a t i ona l a l gebra without making use of i terat ion

    or recurs ion s tatements .

    Mu ch of the de r i vab i l i ty pow er o f the re l a t i ona l

    a l g e b r a i s o b t a i n e d f r o m t h e S E L E C T , P R O J E C T , a n d

    JOIN ope ra tors a lone , p rov ided the JOIN i s no t s ub jec t

    to any implementa t ion re s t r i c t i ons hav ing to do wi th

    prede f in i t i on o f s upp or t ing phys i ca l access pa ths. A sys-

    t em has an

    unres tr ic ted join capab i l i ty

    i f i t a l lows joins to

    be t aken w here in any pa i r o f a t t r i bu te s m ay be ma tched ,

    prov id ing on ly tha t t hey a re de f ined on the s ame dom a in

    or da ta type ( fo r our presen t purpose , i t does no t ma t t e r

    2 A data sublang uage is a special ized language for database man -

    agement , suppor t ing at least data def in i t ion, data ret r ieval , inser t ion,

    update , and delet ion. I t need not be computat ional ly complete , and

    usual ly is not. In the context of appl icat ion p rogramm ing, i t i s in tended

    to be used in con junc t ion w i th one o r m or e p r ogr am m ing l anguages .

    ~12

    whethe r the doma in i s syn tac t i c o r s emant i c and i t does

    not ma t t e r whe the r the da ta type i s weak or s t rong , bu t

    s ee [10 j fb r c i rcums tances i n which i t does ma t t e r ) .

    Occas iona l ly , one f i nds sys t ems in which jo in i s

    sup por t ed o n ly i f t he a t t r ibu te s to be ma tc hed have the

    same na m e or a re supp or t ed by a ce r t a in type o f p re -

    declared access path. Such res t r ic t ions s igni f icant ly im-

    pa i r t he pow er o f t he sys tem to de r i ve re l a t i ons f rom the

    base re l a t i ons . These re s t r i c t i ons conseq~ent ty reduce

    the sys t em 's capab i l i t y to hand le unant i c i pa t ed que r i e s

    by end use rs and reduce the chances fb r app l i ca t ion

    p r o g r a m m e r s t o a v o i d c o d i n g i t e r a ti v e l o o p s .

    Thus , we s ay tha t a da ta s ub language L has a re / a -

    tional processing cc~pability i f t he t rans forma t ions spec i -

    f i e d b y t h e S E L E C T , P R O J E C T , a n d u n r e s t r ic t e d J O I N

    opera tors o f ' t he re l a t i ona l a l gebra can be spec i f i ed in L

    wi thou t re sort i ng to comm and s fo r i t e ra t i on or recurs ion .

    F o r a d a t a b a s e m a n a g e m e n t s y s t e m t o b e c a t l e d rela-

    tional i t mus t s uppor t :

    ( 1 ) Tab les w i thou t use r -v i s ib l e n av ig a t ion l i nks be -

    tween them;

    (2) A da ta sub lang uag e w i th a t l ea s t th i s (min ima l )

    re l a t i ona l p roces s ing capab i l i t y .

    One con seque nce o f th i s is t ha t a D BM S tha t does

    not suppor t r e l a t i ona l p roces s ing should be cons ide red

    non-relational. S u c h a s y s t e m m i g h t b e m o r e a p p r o p r i -

    a te ly ca l led tabular, prov id ing tha t i t s uppor t s t ab l e s

    wi th out u se r -v i sib l e nav ig a t ion l i nks be tw een t ab l e s. Th i s

    t e r m s h o u l d r e p l a c e t h e t e r m " s e m i - r e l a t i o n a l " u s e d i n

    [8 ] , because the re is a l a rge d i f f e rence in imp lem enta t ion

    com plex i ty be tween t abu la r sys t ems , in which the pro-

    grammer does h i s own nav iga t ion , and re l a t i ona l sys -

    t ems , i n which the sys t em does the nav iga t ion fo r h im,

    i .e . , the sys tem provides automatic navigat ion.

    T h e d e f i n i t i o n o f . r e l a t i o n a l D B M S g i v e n a b o v e i n -

    t en t iona l ly pe rm i t s a l o t o f l a t i t ude in the s e rv i ces pro -

    v ided . For example , i t i s no t requ i red tha t t he fu l l

    r e l a t i ona l a l gebra be suppor t ed , and the re i s no requ i re -

    ment i n rega rd to suppor t o f t he two in t egr i ty ru l e s o f

    the re l a t i ona l mode l ( en t i t y i n t egr i ty and re fe ren t i a l i n -

    t egr i ty ) . Fu l l s up por t b y a re l a t i ona l sys t em of these

    l a t t e r two pa r t s o f t he m ode l j us t i f i e s ca l l i ng tha t syst em

    fu l l y r e la t iona l

    [ 8 ] . A l t h o u g h w e k n o w o f n o s y st e m s th a t

    qua l i f y a s tMly re l a t i ona l t oday , some a re qu i t e c lose to

    qua l i fy ing , and no doubt w i l l soon do so .

    In F ig . 1 we i l l us t ra t e the d i s t i nc t ion b e tween the

    va r ious k inds o f re l a t i ona l and t abu la r sys tems . For each

    c las s the ex ten t o f shad ing in the S box i s i n t ended to

    show the degree o f f i de l i ty o f mem bers o f t ha t c la ss to

    the s t ruc tura l r equ i rem ent s o f the re l a t i ona l m ode l . A

    s imi l a r remark app l i e s to the M box wi th re spec t t o the

    manipu l a t i ve requ i rement s , and to the I box wi th re spec t

    to the i n t egr i ty requ i rement s ,

    m denotes the min ima l re l a t i ona l p roces s ing capab i l -

    i t y . denotes re l a t i ona l comple t enes s ( a capab i l i t y cor -

    re sponding to a two-va lued f i r s t o rde r pred ica t e l og i c

    w i t h o u t n u l ls ) . W h e n t h e m a n i p u l a t i o n b o x M i s f u l ly

    shaded , t h i s denotes a capab i l i t y cor re sponding to the

    Com m un ica t ions F eb r ua r y 1982

    o f V o l u m e 2 5

    t h e A C M N u m b e r 2

  • 8/10/2019 05 RDB Productivity

    5/9

    Fig , 1 . Class i f i cat io~ of DBM S,

    5 . The Uni form Re la t iona l P rope r ty

    S = S t r u c t u r a l

    M = M a n i p u l a t i v e

    ] =

    I n t e g r i t y

    T abu l a r

    (prev ious ly ca l led

    semi -re lat ional )

    M i n i m a l l y

    Relat ional

    g

    Relat ional l y

    , ~ Co mp l e te

    72

    F u l l y

    R e l a t i o n a l

    c = Re l a t iona l c omp l e tenes s

    m =: M in imal re la t ional

    p roc es s i ng c ap ab i lhy

    M

    m ~ .

    S

    1

    M

    C ---~-

    S m ~ l l

    M

    m . ~

    1

    M

    m 4~.

    S I

    f u t l r e l a t i ona l a l gebra de f ined in [ 8 ] ( a th ree -va lued

    pred ica t e l og i c w i th a s ing l e k ind o f nu ll ) . The ques t ion

    m ark in the i n t egr i ty box for each c la ss except the fu l l y

    re l a t i ona l i s an ind i ca t ion o f t he presen t i nadequa te

    sup por t fo r i n t egr i ty i n re l a t i ona l syst ems . S t ronge r sup-

    por t t b r doma ins and pr imary keys i s needed [10 ] , a s

    wel l as the kin d of fac i l i ty discussed in [ 14] .

    Note tha t a re l a t i ona l DBMS may package i t s r e l a -

    t i ona l p roces s ing capab i l i t y i n any conven ien t way . For

    e x a m p l e , i n t h e I N G R E S s y st e m o f R e l a ti o n a l T e c h n o l-

    o g y , In c ., t h e R E T R I E V E s t at e m e n t o f Q U E L [ 2 9 ]

    em bod ies a l l three op erators (se lect , project , join) in one

    s t a t ement , i n s uch a way tha t one can ob ta in the s ame

    e f fec t a s any one o f t he ope ra tors o r any combina t ion o f

    them.

    In the de f in i t i on o f t he re l a t i ona l mode l t he re a re

    severa l prohibi t ions . To c i te two examples : user-vis ible

    nav ig a t ion l i nks b e tween t ab l e s a re ru l ed ou t , and da ta-

    base i n form a t ion m us t no t be represen ted (o r h idden) in

    the orde r ing o f t up l e s w i th in base re l a ti ons. O ur expe r i -

    ence i s t ha t DBMS des igne rs who have implemented

    non- re l a t i ona l sys t ems do no t read i ly unders t and and

    accept these prohibi t ions . By contras t , users enthus ias t i -

    ca l l y un ders t an d and accept t he enhanced ease o f l ea rn -

    ing and ease o f use re su l t i ng f rom these proh ib i ti ons .

    Inc iden ta l l y , the Re la t iona l Task G rou p o f t he Am er -

    ican Nat i ona l S tandar ds Ins t i tute has recent ly i ssued a

    repor t [ 4 ] on the f eas ib i l i ty o f deve lop ing a s t anda rd fo r

    re la t ional database sys tems. This report conta ins an en-

    l i gh ten ing ana lys i s o f t he f ea tures o f a dozen re l a ti ona l

    sys tems , and i t s authors c lear ly unders tand the re la t ional

    mode l .

    In o rde r to have wide app l i cab i l i t y mos t re l a t i ona l

    DBM S have a da ta s ub languag e which can be in t e r faced

    w i t h o n e o r m o r e o f th e c o m m o n l y u s e d p r o g r a m m i n g

    languages (e .g. , Cobol , Fort ran, P L / I , APL) . We sha l l

    refer to these la t ter languages as

    host languages.

    A rela-

    t i ona l DBMS usua l ly suppor t s a t l ea s t one end-use r

    or i en ted da ta s ub language - - some t imes s eve ra l , because

    the needs of these users may vary. Som e pr efer s tr ing

    l anguages such a s QU EL or SQL [5 ] , wh i l e o the rs pre fe r

    the s c reen-or i en ted two-d imens iona l da t a s ub language

    o f Q u e r y - b y - E x a m p l e [ 3 3 ] .

    Now, some re la t ional sys tems (e .g. , Sys tem R [6] ,

    IN G RE S [29 ]) s uppor t a da ta s ub language tha t i s usab le

    in two modes : ( t ) in teract ive ly a t a termina l and (2)

    embedded in an app l i ca t ion program wr i t t en in a hos t

    l anguage . The re a re s trong a rgum ent s fo r s uch a

    double-

    mode

    da ta sub language :

    (1 ) Wi th such a l anguage app l i ca t ion programmers

    can separate ly debug a t a terminal the database s ta te-

    ment s they w i sh to i ncorpora t e i n the ir app l i ca t ion pro -

    g r a m s - - p e o p l e w h o h a v e u s e d S Q L t o d e v e l o p a p p l ic a -

    t i on programs c l a im tha t t he doub le -mode f ea ture s i g -

    n i f i can t ly enhances the i r p rodu c t i v i ty ;

    (2 ) Such a l anguage s i gn i f ican t ly enhances comm u-

    n ica t ion among program mers , ana lys ts , end u se rs , da t a -

    base administration staft ; etc.;

    (3) Fr ivol ous dis t inct ions betw een the langu ages used

    in these two modes p l ace an unneces sa ry l ea rn ing and

    m em ory burden on those use rs who hav e to work in bo th

    modes .

    The imp or t ance o f t hi s f ea ture i n produc t i v i ty sug-

    ges ts that re la t ional DBMS be c lass i f ied according to

    wh ether they possess this feature or not . Accordin gly, we

    ca ll t hose re l a t i onal DB MS tha t s uppor t a doub le -m ode

    sub language

    uniform relational.

    Thus , a un i fo rm re l a -

    t i ona l DBMS suppor t s re l a t i ona l p roces s ing a t bo th an

    end-use r i n t e r f ace and a t an app l i ca t ion programming

    interface using a data sublanguage com mon to both inter-

    fdce

    The na tura l t e rm for a l l o the r re l a t i ona l DBMS i s

    non-uniform relational. A n e x a m p l e o f a n o n - u n i t b r m

    r e la t io n a l D B M S i s t h e T A N D E M E N C O M P A S S [ 19 ].

    With this sys tem, when re t r ieving data interact ive ly a t a

    t e rmina l , one uses the re l a t i ona l da t a s ub language EN-

    FORM (a l anguage wi th re l a t i ona l p roces s ing capab i l -

    i t y ) . When wr i t i ng a program to re t r i eve or manipu l a t e

    da ta, on e uses an ex tended ve rs ion o f Cobol ( a l angu age

    that does not possess the re la t ion al process ing capabi l i ty) .

    Com m on to bo th l evel s o f use a re the s t ruc tures: t ab le s

    wi thout use r -v is ib le nav iga t ion l i nks be tween them.

    A ques t ion that immediate ly ar ises i s this : how can a

    da ta sub language wi th re l a t i ona l p roces s ing capab i l i t y

    be in t e rf aced wi th a l anguage such a s Cobol o r P L / I t ha t

    can handle data one record a t a t ime only ( i .e . , that i s

    incapab le of t reat ing a se t of records as a s ingle operand )?

    To so lve th i s p rob lem we mus t s epa ra t e the fo l l owing

    113

    Co mm u n i ca t i o n s F eb r u a r y 1 982

    o f V o l u m e 2 5

    t h e A C M N u m b e r 2

  • 8/10/2019 05 RDB Productivity

    6/9

    two ac t ions f ?om one anothe r : ( 1 ) de f in i t i on o f t he re l a -

    t i on to be de r i ved ; ( 2 ) p re sen ta t ion o f t he de r i ved re l a t i on

    to the hos t l anguage program.

    One so lu t ion ( adop ted in the Pe te r l ee Re la t iona l Tes t

    V ehic le [3 l ] ) i s to cas t a der iv ed re la t ion in the fbrm of

    a f i t e t ha t can be read record-by- record by means o f hos t

    lang uag e s ta tements . In this case det ive~ 7 of records i s

    delegated to the f i le sys tem used by the pert inent hos t

    l a n g u a g e .

    Anothe r so lu t ion ( adopted by Sys tem R) i s t o keep

    the de l i v e ry o f records under the cont ro l o f da t a s ub lam.

    guag e s t a t ement s and , hence , unde r the cont ro l o f t he

    re l a t i ona l DB MS opt i rai ze r . A q ue ry s t a temen t Q of

    S Q L ( t h e d a t a s u b l a n g u a g e o f

    Sys tem

    R ) m a y b e e m -

    bedded in a hos t l anguage program, us ing the fb l l owing

    kind o f phrase ( fo r expds i to ry reasons , t he syn tax i s no t

    e x a c t l y th a t o f S Q L )

    D E C L A R E C C U R S O R F O R Q

    w h e r e C s t a nd s f o r a n y n a m e c h o s e n b y t h e p r o g r a m m e r .

    Such a s ta tement associa tes a

    cursor

    named C wi th the

    de f in ing expres s ion Q , Tup les f rom the de r i ved ro t a t i on

    de f ined by Q a re presen ted to the prog ram one a t a t ime

    b y m e a n s o f th e n a m e d c u r so r . E a c h t i m e a K E T C H p e r

    th i s cursor i s executed , t he sys t em de l i ve rs ano the r t up l e

    f l ' om the de r i ved re l a ti on . T he orde r o f de l i ve ry is sys -

    t em-de te rmined -un le s s the SQL s t a t ement Q de f in ing

    the de r i ved re l a t i on conta ins an ORDER BY c l ause .

    t t is imp or t an t t o no te tha t i n advanc ing a cursor ove r

    a de r i ved re l a t i on the programmer i s

    not

    engag ing in

    nav iga t ion to some t a rge t da t a . The de r i ved re l a t i on i s

    i t s e l f t he t arge t da t a I t i s t he DBM S tha t de t e rm ines

    whe the r the de r i ved re l a t i on should be ma te r i a l i zed en

    bloc

    pr io r to the cursor -cont ro l l ed s can or ma te r i a l i zed

    piecemeal dur ing the scan. In e i ther case , i t i s the sys tem

    (not the programmer ) tha t s e l ec t s t he acces s pa ths by

    which the de r i ved da ta i s t o be gene ra t ed . Th i s t akes a

    s i gn i f i can t burden o f f t he pro gram m er ' s shoulde rs ,

    t he reby inc reas ing h i s p roduc t i v i ty .

    6o S kep~ticism A bo st R elat,~ena] Sy stem s

    Th ere has been no shor t age o f s kept ic i sm conce rn ing

    the prac t i ca l i t y o f t he re l a t i ona l approach to da tabase

    m a n a g e m e n t . M u c h o f t h is s k e p ti c is m s t em s f r o m a l a ck

    o f u n d e r s t an d i n g , s o m e f r o m a f e a r o f t h e n u m e r o u s

    theoret ica l inves t igat ions that are based on the re la t ional

    m ode l [1, 2 , 15 , 16, 24] . Ins tead o f we lcom ing a theoret. .

    i ca l fbunda t ion a s prov id ing soundnes s , t he a t t i t ude

    seems to be: i f i t ' s theoret ica l , i t cannot be pract ica l . T he

    absence o f a theore t i ca l Ibun da t ion fo r a lmos t a l l non-

    re l a t i ona l DB MS i s t he pr ime cause o f t he i r

    ungepotchket

    qua l i t y . (T h i s i s a Y idd i sh word , one o f whose mean ings

    i s pa t ched up . )

    On the o the r hand , i t s eems reasonab le to pose the

    fo l lowing two ques t ions :

    (1 ) Can a re l at i ona l syst em prov ide the range o f set-

    H 4

    vices

    that

    w e h a v e g r o w n t o e x p e c t f r o m o t h e r D B M S ?

    (2) I f (1) is m~swered af~i rm adv dy, can such a system

    pe rfb rm as well as non~relatior~.a/ D BM S? ~

    W e took at each. of the se i~"~ t~rm

    6.~ Rang e o f Se rv i ces

    A fu l l - s ca l e DBMS prov ides the %l lowing capab i l i -

    ties:

    o da ta s to rage , r e t ri eva l , and upd a te ;

    a use r -acces s ib l e ca t a log fb r dam desc r i p t i on ;

    t ransac t ion suppor t t o ensure t h a t all o r none o f a

    s equence o f da tabase cha rg es a re re f l ec t ed in the

    per t inent datab ases (see [17] fbr an ~dp- to-date sum-

    m a r y o f t r an s a c t i o n t e c h n o l o g y ) ;

    o recov ery services in case of fhi ture (sys tem , media ,

    o r p r o g r a m ) ;

    o concu r rency con t ro l s e rv i ces to ensure tt~a t concur -

    ren t t r ansac t ions be have the s a rae way as i f run in

    some sequent i a l o rde r ;

    o au thori zat i on services to ensu re th at a l l access to

    a n d m a n i p u l a t i o n o f d a t a b e i n a c c o r d a n c e w it h

    spec i f i ed cons t ra in t s on u se rs and prog ram s [18 ];

    i n t egra t ion wi th su pp or t t k~ r da ta comm unica t ion ;

    integr i ty services to ensure that database s ta tes and

    changes o f s t a te con fbrm to spec i f i ed rubs .

    Cer t a in re l a t i ona l p ro to types deve loped in the ea r ly

    sevent ies fe l t far short of providing a l t these services

    (pos s ib ly fo r good reasons ) . N ow , h ow ever , s eve ral r e-

    l a t i ona l sys t ems a re ava i l ab l e a s so f tware produc t s and

    prov ide a l l t hese s e rv i ces w i th the e xcep t ion o f t he l as t.

    P resen t ve rs ions o f t hese produ c t s a re adm i t t ed ly weak

    in the prov i s ion o f i n t egr i ty s e rv i ces , bu t t h i s i s r ap id ly

    being rem edied [ 10]~

    S o m e r e l a t i o n a l D B M S a c t u a l l y p r o v i d e m o r e c o m -

    p le t e da ta s e rv i ces than the non- re l a t i on a l sys tems . Three

    examples fo l l ow.

    A s a f i r s t e x a m p l e , r e l a t i o n a l D B M S s u p p o r t t h e

    ext rac t ion o f a l l m ean in gf u l r e l a t i ons f rom a da tabase ,

    whereas non- re l a t i ona l sys t ems suppor t ex t rac t ion on ly

    where the re ex i s t s t a t i ca l l y pre de f i ned access pa ths .

    As a s econd exam ple o f" the add i t i ona l s e rv i ces pro-

    v ided by som e re l a t i ona l sys t ems , cons ide r v i ews . A view

    i s a v i r t ua l r e l a t i on ( t ab l e ) de f i ned b y m eans o f an

    e x p r e s si o n o r se q u e n c e o f c o m m a n d s . A l t h o u g h n o t di -

    rec t ly sup por t ed by ac tua l da t a , a v i ew app ea rs to a user

    as i f i t we re an add i t i on a l base t ab l e k ep t u p- to - da te and

    in a s t a te o f i n t egr i ty w i th the o the r base t ab l e s. V iews

    a re use fu .1 fo r pe rm i t t i ng ap p l i ca t ion pro gram s an d use rs

    a t t e rmina l s t o i n t e rac t w i th cons tan t v i ew s t ruc tures ,

    even when the base t ab l e s themse lves a re undergo ing

    s t ruc tura l changes a t t he

    logical

    l eve l ( p rov id ing tha t t he

    pe r t i nen t v i ews a re s t i l l de f inab le f rom the new base

    tab l e s ) . They a re a l so use fu l i n re s t r i c t i ng the s cope o f

    a One sho uld bear in mi nd tha t the n on-re l a t iona l ones always

    employ compara t ive ly k)w leve l da ta sublanguages ibr appl ica t ion

    p rog ra mming .

    Comm unic a dons K e brua ry t 982

    of Volum e 25

    t h e A C M N u m b e r 2

  • 8/10/2019 05 RDB Productivity

    7/9

    access of programs and users. Non-re la t ional systems

    either do not support views at a l l or e tse support much

    more pr im i t ive counte rpa r ts , such as the CODASYL

    s.ubschema.

    As a th i rd example , some sys tems (e4 . , SQ L/D S [28 ]

    and i ts proto type p redecessor System R) perm it a var ie ty

    of" changes to be m ade to the logical and physical orga-

    n iza t ion of ' the d a ta dynam ica l ly - - wh i le t r ansac tions a re

    in progress. These clhanges rarely require application

    prog ram s to be recoded~ Th us, there is less of a progra m

    maintenance bv,rden , leav ing programmers to be more

    produc t ive do ing dev e lopm ent r a the r than m a in tenance ,

    This capabi l i ty i s mad e poss ib le in SQ L/D S by the t~c t

    that the system has complete control over access path

    selection.

    In non-re la t iona l sys tems such changes would nor -

    matly require a l l other database activi t ies including

    transactions in progress to be brought to a halt . The

    da tabase then rema ins out o f ac t ion unt i l the organiza -

    t ional changes are completed and any necessary recom-

    pil ing done.

    6.2 PerfOrmance

    Natura l ly, people would hesi ta te to use re la t ional

    systems if these systems were sluggish in performance.

    All too oRen, erroneous conclusions are draw n about the

    pe r formance of r e la t iona l systems by compar ing the t ime

    it might take for one of these systems to execute a

    complex transaction with the t ime a non-re la t ional sys-

    tem might take to execute an extremely simple transac-

    t ion. To arr ive a t a fa ir performance comparison, one

    must compare these systems on the same tasks or appl i-

    cat ions. We shal l present arguments to show why re la-

    t ional systems should be able to compete successful ly

    wi th non- re la t iona l sys tems.

    G ood pe r form ance i s de te rmined by two fac tor s : ( t )

    the sys tem m ust suppor t pe r formance -or ien ted phys ica l

    data structures; (2) high-lev el language requests for data

    must be compiled into lower-level code sequences a t

    least as goo d as the average applicat ion prog ram me r can

    produce by hand.

    The first s tep in the argument is that a program

    wr i t ten in a Cobol - leve l language can be made to pe r-

    form -efic iently on large databases conta ining production

    da ta s t ruc tured in tabula r form wi th no use r -v is ib le

    naviga t ion l inks be tween them. This s tep in the a rgum ent

    i s suppor ted by the fo l lowing in form a t ion [19 ] : a s o f

    Augus t t 9 8 1 , T a n d e m C o m pu te r C o r p . h a d m a n u fa c -

    tured an d in sta l led 760 systems; of these, over 700 w ere

    m a k in g u se o f th e T a n d e m E NC O MP A S S r e l a ti o n al

    da tabase m anag em ent sys tem to suppor t da tabases con-

    ta in ing 1produc t ion da ta . T andem has comm it ted i ts own

    man ufac tu r ing da tabase to the ca re o f ENC OM PA SS.

    EN CO MP A SS does not suppor t l inks be tween the data -

    base tables, e i ther user-visible (navigation) l inks or user-

    invisible (access 'method) l inks.

    In the second step of the argument, suppose w e take

    the applicat ion programs in the above-c ited insta l la t ions

    1t5

    and replace the database re tr ieval and manipulat ion

    sta tements by sta tements in a database sublangu age w ith

    a relational processing capability (e.g. , SQL). Clearly, to

    obta in good performance with such a high level lan-

    guage, i t is essentia l that i t be com piled into object code

    (instead of being interpreted), and i t is essential that that

    object code be efficient.

    Compila t ion is used in System R and i ts product

    ve r sion SQ L/D S. In 1976 Raym ond Lor ie deve loped an

    ingenious pre- and post-compil ing scheme ~br coping

    with dynam ic changes in access paths [21]. It a lso copes

    with ear ly (and hence effic ient) author iz at ion and integ-

    r i ty checking (the la t ter , however , is not yet imple-

    mented). This scheme cal ls for compil ing in a ra ther

    specia l way the SQL sta tements embedded in a host

    language program. T his comp i la t ion s tep t r ansforms the

    SQL s ta tements in to appropr ia te CALLs wi th in the

    source program together with access modu les conta ining

    object code. These m odules are then stored in the data-

    base tbr la ter use a t runtime. The code in these access

    modu les is generated by the system so as to optim ize the

    sequencing of the majo r operations and the se lection of

    access paths to prov ide ru ntim e effic iency. Al ter this pre-

    compila t ion step, the applicat ion program is compiled

    by a regular com piler tbr the per t inent .host languag e. I f

    a t any subsequent t ime one o r more o f the access paths

    is removed and an a t tempt i s made to run the program,

    enough source information has been re ta ined in the

    access mod ule to enable the system to re-compile a new

    access m odul e that exploits the now ex ist ing access paths

    without requiring a re-compilation of the application pro-

    gram.

    Incidenta l ly, the same data sublanguage compiler is

    used on ad hoc quer ies submitted interactively t~om a

    terminal and a lso on quer ies that are dynamical ly gen-

    erated dur ing the execution of a program (e .g. , i~om

    parameters submitted interactively). Immediate ly after

    compila t ion, such quer ies are executed and, with the

    exception of the simplest of quer ies, the p erform ance is

    better than that of an interpreter .

    The generation of access modules (whether a t the

    init ia l compil ing or re-compil ing stage) enta i ts a quite

    sophist icated optimi zation scheme [27], wh ich mak es use

    of sys tem-ma in ta ined s ta t is t ic s that would not norma l ly

    be wi th in the programm er ' s knowledge . Thus , on ly on

    the simplest of a l l transactions would i t be possible for

    an ave rage app l ica t ion programm er to compe te w i th th i s

    optimizer in generation of eff ic ient code. Any attempts

    to compe te are bou nd to reduce the programm er ' s pro-

    ductivi ty. Thus, the pr ice paid for extra compile-t ime

    overhead wou ld seem to be we l l wor th pay ing .

    Assu ming no n-l in ked tabul ar structures in both cases,

    we can expec t SQL/DS to genera te code comparab le

    with average hand-writ ten code in many simple cases,

    and supe r ior in man y comp lex cases . Many comm erc ial

    transactions are extremely simple . For exam ple , one m ay

    need to look up a record for a par t icular ra i lroad w agon

    to find out where i t is or f ind the balance in someone 's

    Communications February 1982

    of Volume 25

    the ACM Number 2

  • 8/10/2019 05 RDB Productivity

    8/9

    savings account, if suitably t~st access paths are sup-

    por ted (e.g., hashing), there is no reason wh y a high-lev el

    language such as SQL, QUEL, or QBE should resut t in

    less efficient ram.time code for these simple transactions

    than a lower level language, even though such transac--

    t ions m ake l it tle u se of the opt imizing capabi l i ty o f the

    high- level data sublanguage compi ler .

    7 . Future Direct ions

    If we are to use relat ional database as a %undat ion

    for product iv i ty , w e need to know what sort of develop -

    m ents m ay l ie ahead tbr relat ional systems.

    Let us deal with near-term developments first . In

    some relat ional systems s t ronger support i s needed for

    domains and prim ary keys per suggest ions in [10] . As

    al ready noted, at l relat ional systems need upg rading wi th

    regard to automatic adherence to integrity constraints.

    Exis ting constraints on u pdat ing jo in- type views need to

    be relaxed (where theoretically possible), and progress is

    being m ade on this problem [20]. Supp ort for outer jo ins

    is needed.

    Marked improvements a re be ing made in op timiz ing

    technology, so we may reasonably expect further im-

    provem ents in per%finance. In certain products , such as

    the ICL CA FS [22 ] and the Br i t ton -Lee IDM5 00 [13 ],

    special hardw are sup port has been implem ented. Special

    hardware may help performance in certain types of

    appl icat ions . However, in the majori ty of appl icat ions

    deal ing wi th format ted databases , sof tware- implemented

    relat ional systems can compete in performance wi th

    software- implemented non-relat ional systems.

    At present , most relat ional systems do not provide

    any special support for engineering and scient i f ic data-

    bases . Such supp ort , including interfacing w i th Fort ran,

    is clearly needed and can be expected.

    Catalogs in relational systems already consist of ad-

    ditional relations that can be interrogated just l ike the

    rest of the database us ing the same query language. A

    n.atural develop me nt that can and should b e swif t ly pu t

    in place is the exp ansion o f these catalogs into full-

    f ledged act ive dict ionaries to p rovide addi t ional on- l ine

    data control.

    Final ly , in the near term, we may expect database

    design aids suited for use with relational systems both at

    the logical and physical levels.

    In the longer term we may expect support for rela-

    t ional databases dis t r ibuted over a communicat ions net -

    work [25 , 30, 32] and m anage d in such a w ay that

    appl icat ion programs and interact ive users can manipu-

    late the dat a (1) as if al l of i t were stored at the local

    node--location transparency--and (2) as i f no data w ere

    replicated

    anywhere--replication transyaremy.

    All three

    of the projects ci ted above are based on the relat ional

    model . One important reason for th is i s that relat ional

    databases offer great decomposi t ion f lexibi l i ty when

    planning how a database is to be dis t r ibuted over a

    H6

    network o f comp~ ter sys tems , and g rea t recompos i ti on

    p o w e r f o r d y n a mi c combination of decentralized infor-

    mat ion . By con trast , CO D A SY L D BT G databases a re

    very d i f f i cuk to decompose and recompose due to the

    er~tanglement of" the ow ner -m em ber navigat ion l inks.

    Th i s p roper ty makes the CO D A SY L app roach ex t remely

    di f f icul t to adapt to a d is t r ibuted database environment

    and m ay w ell prov e to be i ts downfh11. A secor~d reason

    for u se of ' the rela tional m ode l is that i t off?rs concise

    high level data subtanguages for t ransmit t ing requests

    tbr data fYom node to node.

    The ongoing work in extending the relat ional model

    to cap tu re in a fo rmal w ay m ore m ean ing o f the data

    can be exp ected to lead to the incorporat ion of this

    raeaning in the database catalog in order to thctor i t out

    o f app l i ca t ion p rogram s and make these p rograms even

    m ore concise and s imple. Here, we are, of course, talk ing

    abou t m eaning that i s represented in such a way that the

    system can unders tand i t and act upon i t .

    1reproved theories are being developed %r handl ing

    miss ing data and inapp l icable data (see for example

    [3]) . This w ork should yield imp roved t reatment of null

    values .

    As it stands today, relational database is best suited

    to data wi th a rather regular or homogeneous s t ructure.

    Can we retain the advantages of the relat ional approach

    whi le handl ing heterogeneous data also? Such data may

    include images , text , and m iscel laneous facts . An aff irm-

    at ive ans we r is expected, and some research is in progress

    on this subject , but more is needed.

    Considerable research is needed to achieve a rap-

    p rochem ent be tween da tabase l anguages and p rogram-

    ming l anguages . Pasca l /R [26 ] i s a good exam ple o f work

    in this direction. Ongoing investigations focus on the

    incorp orat ion of abstract data types into database lan-

    gu ages o n the one hand [12] and relat ional processing

    in to p rogramm ing l anguages on the o ther .

    8. Conclus ions

    W e have p resen ted a ser ies o f a rgum ents to suppor t

    the c l a im

    that

    relat ional databa se technology offers dra-

    mat i c improvements in p roduct i v i ty bo th fo r end users

    and fo r app l i ca t ion p rogramm ers . T he argum ents cen ter

    on the data independence, s t ructural s impl ici ty , and

    relat ional process ing def ined in the relat ional model and

    implemented in re l a t iona l da tabase management sys -

    tem s. A tt three o f these fF,atures s impl i fy the task of

    deve lop ing app l i ca t ion p rograms and the fo rmula t ion o f

    queries and updates to be submit ted from a terminal . In

    addi t ion, the f i rs t feature tends to keep programs viable

    in the face of organizat ional and descript ive changes in

    the database and therefore reduces the effort that i s

    norm al ly d iver t ed into the main tenance o f p rograms .

    W hy, then, does the t i t le of this pap er sugg est that

    relat ional database provides only a tbundat ion for im-

    proved product iv i ty and not the total solut ion? The

    Comm unic atio ns February 1982

    of Volume 25

    the AC M N umber 2

  • 8/10/2019 05 RDB Productivity

    9/9

    reason is s imple : re la t ional database deals only with the

    sha red da ta component o f app l ica t ion programs and

    end-use r in terac tions . There a re num erous com plemen-

    ta ry technologies tha t m ay he lp wi th o the r components

    or aspects , { ior exam ple , p rogram m ing lang uag es that

    suppor t r e la t iona l process ing and improved checking of

    da ta types , improv ed ed i tor s tha t u nders tand more o f the

    language be ing used , e tc . We use the te rm " tbu nda t ion , "

    because interaction with shared data (whether by pro-

    gram or via term inal) represents the core of so mu ch

    data processing activi ty.

    The practica l i ty of the re la t ional approach has been

    proven by the test and production insta l la t ions that are

    already in operation. Accordingly, with re la t ional sys-

    tems we can f low look forward to the produc t iv i ty boos t

    that we a l l hop ed DB MS w oul d provide in the f irst place.

    Acknowledgmems. I woald l ike to express my in-

    debtedness to the System R development team at IBM

    Research, San Jose for developing a ful l -scale , uniform

    re la t iona l pro to type tha t en ta i led numerous language

    and sys tem innova t ions ; to the deve lopm ent team a t the

    IBM Labora tory, Endicott , N .Y. tbr the professional way

    in which they converted System R into product tbrm; to

    the va r ious teams a t un ive r s i t ie s , ha rdware manufac -

    turers , softw are f irm s, an d user inta t la tions," wh o de-

    s igned and imp lem ented work ing re la tiona l sys tems; to

    the QBE team a t IBM Yorktown He ights , N.Y. ; to the

    PRTV team a t the tBM Sc ient i f ic Centre in England;

    and to the numerous cont r ibutor s to da tabase theory

    who have used the re la t ional model as a cornerstone. A

    specia l acknowledgement is due to the very few col-

    leagues who saw someth ing w or th suppor t ing in the ea r ly

    stages, par t icular ly, Chris Date and Sharon Weinberg.

    Final ly, i t was Sharon Weinberg who suggested the

    theme of this paper .

    Received 10/81 ; revised and accepted 12/81

    ~.eferences

    1. Beeri, C., Bernstei n, P., Oo odm an, N. A sophisticate 's intro ducti on

    to da tabase normaliza t i on theory. Proc. Very Large Data Bases, West

    Berlin, Ge rm any, Sept. 1978.

    2 . Bernste in, P .A., G oodm an, N. , La i, M-Y. Laying phan tom s to

    res t. Report TR-03- 8 t, Center tbr R esearch in Computing

    Technology, Harvard Univers i ty, Cambridge , Mass. , 1981.

    3 . Biskup, J.A. A form al approach to nul l va lues in da tabase

    relations. Proc. W orksh op on F orma l Bases fo r Data Bases, Toulouse ,

    France , Dec 1979; published in [16] (see be low) pp 29 9-342.

    4 . Brodie, M. and Schmidt , J . (Eds) , Report of the AN SI Rela t iona l

    Ta sk G roup . , ( to be pub l i she d A CM SIG MO D Re c ord) .

    5. Chamb erl in, D.D. , e t aL SEQUEL2: A unif ied approach to da ta

    def ini t ion, manipula t ion, and control . I B M J. Res. Dev., 20, 6,

    (Nov. 1976) 560-5 65.

    6 . Chamb erl in, D.D. , e t al . A his tory and eva lua t ion of system R.

    Comm. ACM , 24 , 10, (Oct. 198 1) 632-646.

    7 . Codd, E.F. A re la t iona l mo del of da ta for la rge shared data

    banks. Comm. ACM , 13 , 6, (June 1970) 377-387.

    8, Codd, E.F. Exten ding the datab ase relational mo del to capture

    more meaning. AC M TODS, 4 , 4 , (Dec. 19 79 ) 397-.-434.

    9. Codd, E.F. Data models in da tabase manag ement. A C M

    SIG MO D Re cord , 1 t , 2, (Feb. 1 981 ) 112-114.

    10. Codd, E.F. The capabilities of relational database management

    systems. Proc. Convencio I nformation Llatina, Barceiona, Spain, June

    ,.-12, 1981,,pp 13-26; also available as Report 3132, IBM Research

    Lab., San Jose, Calif.

    11. Date, C.J. Referential integrity. Proc . Very Large Data Bases,

    Cannes , France , Septem ber 9-11, 1981, pp 2-12.

    12. Ehrig, H ., and W eber, H. Alg ebraic specification schemes for

    data base systems. Proc. Very Large Data Bases, West Berlin,

    Germ any, Sept 13-15, 1978, 427-440.

    :13. Epstein, R., a nd Haw thorne, P. D esign decisions for tile

    intelligent database machine. Proe. N CC 1980 , AFIP S , Vol . 49 , May

    1980, pp 237-241.

    14. Eswaran, K.P., and Chamberlin, D.D. Functional specifications

    of a su bsystem fo r database integrity. Proc. Very Large Data Bases,

    Fram ingha m, M ass., Sept. 1975 , pp 48--68.

    15. Fagin, R. Horn clauses and database dependencies. Proc. 1980

    A CM SIG A CT Syrup . on Theory of Computing, Los Angeles, CA, pp

    123-134,

    16, Gallaire, H., Mink er, J., and N icolas, J.M. Advanc es in Data Base

    Theory. Vol 1, Plenum Press, New York, 1981.

    17. G ray, J. Th e transaction concept: virtues and limitation s, Proc.

    Very Large Data Bases, Cannes , France , September 9- t 1 , 1981, pp

    I44-154.

    18. G f if f i ths , P .G. , and W ade, B.W. An author iza t ion mechanism for

    a relational data base system. A C M T O D S , 1, 3, (Sept 1 976) 242-255.

    19. Held, G, ENCOM PASS: A re la t ional da ta manager . Data Base /

    81, Western Institu te of Com put er Science, Univ . of Sant a Clara,

    Santa Clara, Calif. , August 24-28, 1981.

    20. Keller, A.M. Up dates to relational databases thro ug h views

    involving oins. Report RJ3282, IBM Research Laboratory, San Jose,

    Calif., October 27, 198l.

    21. Lorie, R.A , and N ilsson, J.F. A n access specification language

    for a relational data base system. IB M J. Res. Dev., 23, 3, (May

    1979) 286-298.

    22. M aller, V.A.J. The content addressable file store- -CA FS. I C L

    g?chnical J. , 1, 3, (Nov. 1979 ) 265-279.

    23. Reisner, P. Hu m an factors studies of database query languages::

    A survey and assessment. AC M Computing Surveys', I3, 1, (March

    1981) 13-31.

    24. Rissanen, J. Theory of relations for datab ases- -A tutoria l smvey.

    Proc. Syrup. on M athematical Foundations of Computer Science,

    Zakopane , P oland, September 1978, Lecture Notes in Compu ter

    Science, No. 64, Springer Verlag, New Y ork, 1978.

    28. Rothnie, J.B., Jr, et al. Introd uctio n to a system for distributed

    databases (SDD-1). A C M T O D S , 5, 1, (March 1980) 1-17.

    26. Schmidt, J.W. Some hig h level langu age constructs tbr data of

    type relation. ACM TODS, 2 , 3, (Sept 1977) 247-261.

    27. Selinger, P .G., et al. Access path selection in a relationa l database

    system. Proe. 1979 A CM SI G M O D International Conference on

    Man agement of Data, Boston, MA , May 1979, pp 23-34 .

    2K - - - , SQL /Da ta system for VSE: A re la t ional da ta system for

    appl ica t ion deve lopment. IBM Corp. D ata Process ing Divis ion,

    W hite Plains, N.Y., G320-6590, Feb 1981.

    29. Stonebraker , M.R. , e ta l . T he des ign and implem enta t ion of

    IN G RES, A C M T O D S, 1, 3, (Sept. 19 76) 189-22 2.

    30, Stonebraker, M.R., and Neuhold, E.J. A distributed data base

    vers ion of ING RES. Proe. Second Berk eley Workshop on Distributed

    Data M anagement and Computer N etworks, Lawrence-Berkeley Lab.,

    Berkeley, Calif., M ay 19 77, pp 19-36.

    31l. Todd., S.J.P. The Peterlee relational test vehicle--A system

    overview. IB M Systems J . , 15 , 4 , 1976, 285-308.

    32. Williams, R. et al. R*: An ov erview of the architecture. Repo rt

    RJ3325, IBM Research Laboratory, San Jose, Calif. , October 27,

    1981.

    33. Zloof, M.M. Query by example. Proc . N CC, AF IP S Vol 44 , Ma y

    197 5, pp 431--..438.

    117

    Comm unica t ions February 1982

    of Volume 25

    the A CM N um be r 2