Gray, N. (2015) RDF, the Semantic Web, Jordan, Jordan and Jordan. In: Moss, M. and Currall, J. (eds.) Transition to the Digital. Facet Publishing. Copyright © 2015 The Author. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. Content must not be changed in any way or reproduced in any format or medium without the formal permission of the copyright holder(s). http://eprints.gla.ac.uk/101484/ Deposited on: 10 April 2015. Enlighten – Research publications by members of the University of Glasgow, http://eprints.gla.ac.uk


RDF, the Semantic Web, Jordan, Jordan and Jordan

Norman Gray*

24 August 2014

DRAFT chapter to appear in ‘Transition to the Digital’, Michael Moss and James Currall (eds.) (2015?). This collection is addressed to archivists and library professionals, and so has a slight focus on implications for them. This chapter is nonetheless intended to be a more-or-less generic introduction to the Semantic Web and RDF, which isn't specific to that domain. Copyright Norman Gray 2014.

This chapter is about the key novelties of the Semantic Web – the novel ideas, and the novel opportunities. But we will discuss these digital novelties in the context of the Semantic Web’s continuities with other features of the information world.

Our most obvious antecedent is not that old – the (non-semantic) Web didn’t exist before the 90s – and we will learn about the very close technical overlap between the Semantic Web and the ‘textual Web’ of our now-usual experience. The Semantic Web is, closer than a cousin, the sibling of the textual web.

Other antecedents have a history as old as the first library index. The Semantic Web has, we might say, a ‘logical wing’ and an ‘information wing’. These are not primarily distinguished by their technical or organisational features, but by the largely disjoint research questions they address, and by their motivations. While the ‘logical wing’ is characterised by a concern for formal logic and its implementations, rich in the theory of computing science,^1 the ‘information wing’, with a sturdily pragmatic focus, can be regarded as continuous with the information-organising goals of the world of library science, sharing its aspiration to systematize and share information, and its acknowledgement that such sharing is always approximate and never unmediated, and that one must aim for a balance between faithfulness to sources, and what is actually usable by the information’s actual audience.

Below, we will start with a broad introduction to the Semantic Web. From there we can move briskly on to practice, and the question of where, how and whether the semantic web might appear in technological fact. Our goal is to indicate the continuities with the textual web, and thus to indicate the novelties of the Semantic Web, and so suggest why they are important. After a brief parenthesis on ‘Web 2.0’ (in Sect. 2), we describe ‘linked data’ in Sect. 3.

1 What is the Semantic Web?

The Semantic Web is simple in summary:

    The Semantic Web is the emerging next stage of the web, designed to transmit machine-processable meaning, through a logical framework named RDF, enhanced by machine inference based on OWL and other ontologies.

* Physics and Astronomy, University of Glasgow, UK; http://nxg.me.uk

^1 This strand can with at least a little justice be regarded as the most recent last hurrah of AI, a tradition pronounced dead more often than South Park’s Kenny.

Though I assert that this modest statement is plausible, and the outcome desirable, the reader may be disinclined to agree, on the grounds that the statement is on the face of things gobbledegook. Over the course of this chapter, I intend to explain each component of this remark, step by step, in the hope that it is a short hop from there to plausibility.

1.1 The Semantic Web…

First, a general lament.

The term ‘semantic web’ is an unfortunate one, since it makes the topic sound much more arcane than it really is. It is arguably simple: the next stage in a long-term vision of the web, originally formulated by Tim Berners-Lee, and elaborated by the World Wide Web Consortium (W3C) which he was instrumental in founding.

1.2 …is the emerging next stage of the web…

Before we can fully understand the ‘semantic web’, we need a clear idea of what the ‘World Wide Web’ is, even though, nowadays, such a question may seem as odd as asking what ‘air’ is.

The web is remarkably homogeneous: it consists of one protocol, one bit of glue, and markup.

Everything on the web is connected by the Hypertext Transfer Protocol (HTTP) [1], and if you look at a web page, update a podcast, talk on Jabber, or download videos, on a computer or a mobile phone, the bytes come to your computer via HTTP. Indeed, a debatable but plausible definition of the web is as the set of things reachable through an HTTP request. The only bit of that protocol you ever notice is ‘404’, which is the HTTP error code meaning ‘I don’t have anything by that name’.

The bit of glue is the Uniform Resource Identifier (URI) [2]. That’s a uniform naming scheme for things on the web and beyond. Everything on the web has a URI, and most URIs refer to things on the web.^2

The markup is mostly Hypertext Markup Language (HTML), the angle-brackets which indicate how web pages should be formatted. But it’s also RSS and Atom feeds (ie, blogs and podcasts), wiki syntax, PDFs, and a few other more obscure alternatives.

These components come together when you click on a link in a web page (the following explanation of how the web works is probably not new to you, but I'm spelling it out in order to make clear how the various components work together). The web browser program on your computer or tablet or phone is a client, which displays a page previously sent to it by a program sitting on a server somewhere in the internet (we will use these latter terms repeatedly below). The page (typically) arrives at your computer in the form of HTML which the browser knows how to format and display as headings, sidebars, images, and links. The link associates some text on the page with a URI, and clicking on it tells the browser ‘go and look at this page instead’. The browser then examines the URI to discover which web server is providing that page, then immediately makes another HTTP request to that server to retrieve the page, display it to you, and start the cycle once more.
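The step in which the browser ‘examines the URI’ and then ‘makes another HTTP request’ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the address http://example.org/some/page is a placeholder, the function names are invented for this sketch, and a real browser sends many more headers and handles redirects, caching and encryption. Nothing is sent over the network here; the code only builds the text a client would send, and reads the status code from the first line of a reply.

```python
from urllib.parse import urlsplit


def build_get_request(uri):
    """Split a URI and return (host, port, request_text) for a minimal GET.

    Illustrative helper, not part of any library: it shows which part of
    the URI names the server and which part names the page on that server.
    """
    parts = urlsplit(uri)
    host = parts.hostname
    # Use the explicit port if the URI has one, else the scheme's default.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    request_text = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return host, port, request_text


def status_code(status_line):
    """Read the numeric code from a status line such as 'HTTP/1.1 404 Not Found'."""
    return int(status_line.split()[1])


# Placeholder address, assumed for the example only.
host, port, request = build_get_request("http://example.org/some/page")
print(host, port)
print(request)
```

The server is identified by the host part of the URI and the page by the path; the ‘404’ mentioned above is simply the number in the first line of the server's reply.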

^2 That’s a sort of ‘mindspace’ most: this statement probably isn’t numerically true if you go by numbers of objects or volume of data.

Before we go on, I should (parenthetically) be careful to distinguish the web from the internet. The internet is a set of protocols, plural, for exchanging material between networked computers. When you send an email or retrieve a web page, the material travels over the internet, but under the control of different programs – an email client and a web browser – using a combination of lower- and higher-level protocols, or languages.^3 Internet telephony (such as Skype) and time services are two reasonably visible internet services which are distinct from the web. The key point is that the web – and this includes the Semantic Web – is a notably simple structure sitting atop the internet.

There are a few key dates in the history of the web.

• 1990: Tim Berners-Lee and Robert Cailliau first proposed the system which became the World Wide Web [3].^4 The first server and client implementations appeared as CERN-internal services later that year.

• August 1991: First public web server. Initially, users interacted with the server by using Telnet (another non-Web internet protocol) to connect directly to a client program at CERN, and thence to the server.

• December 1991: First web server outside Europe, at the Stanford Linear Accelerator Center (SLAC, like CERN an experimental high-energy physics laboratory).

• April 1993: Mosaic, from the (US) National Center for Supercomputing Applications (NCSA), was the first graphical browser. Mosaic led to Mozilla, which led to today’s Firefox; Mozilla also led to the Netscape browser. About the same time, NCSA released their ‘httpd’ web server, which eventually mutated into the now-ubiquitous Apache.^5

• October 1994: The World Wide Web Consortium (W3C) was founded, with Berners-Lee as its director, to act as the standards body of the emerging system. Early standards included HTML 3.2 (1997),^6 the first version of XML (1998) [5], and the first model and syntax for RDF (1999) [6].

• 2006: Lolcats (and other ‘user-generated content’). We return to this in Sect. 2.

Although it may seem a slight tangent, it is worthwhile briefly discussing what it is that makes the web so special, and so very successful, since the semantic web is in protocol terms identical to the web we are familiar with, and so shares the same special features.

The web is not the first hypertext system (Vannevar Bush’s hypothetical Memex system [7], and Ted Nelson’s Xanadu system^7 can probably lay claim to that), and it’s not the first distributed hypertext system (VAX Notes and Lotus Notes can probably claim that), but it is the first truly successful worldwide distributed hypertext system.

^3 This point is rather muddied by the observation that many people now read email messages in a web browser; nonetheless the email is still transported between systems, in the background, using the decades-old email protocols.

^4 This 1990 document comes about 18 months after [4], which is a broad discussion of information management, in the context of “the management of general information about accelerators and experiments at CERN.” This document – on which Berners-Lee’s manager wrote “vague, but exciting” – is clearly ancestral to both the web and to [3], but it is particularly interesting from the point of view of this chapter, because in hindsight it more closely prefigures the structure and potential of RDF and the Semantic Web. One could almost call the textual web ‘Semantic Web 0.1’.

^5 Apache originally consisted of a set of software ‘patches’ to NCSA httpd, hence the (unfortunately apocryphal, it seems) etymology of its name as ‘a patchy server’.

^6 HTML 2.0 was an IETF standard; earlier versions were not formally standardised.

^7 http://xanadu.com/

The web gets some things right:

• It is distributed, or decentralised, so that there is no centre to the web – there is no single point which can fail, or be co-opted, or which can grant or refuse permission.

• It is non-proprietary, or open: the protocols which define the web are free to obtain, and may be implemented without any inhibitions from licences. Also, the web’s governing body (which the W3C is, in effect) does its work through an open process.

• The web is simple, in the sense that it is architecturally simple (the notion of the web, as servers plus browsers plus links, is easy to comprehend), and straightforward in protocol terms (the core protocol of the web, HTTP, is such that a crude server or client implementation can be developed in a relatively short time).

• It’s easy to join in: this is to a large extent a consequence of the simplicity and openness.

• You can waste hours on Wikipedia.

One sense of ‘easy to join in’ is that the web has always been read-write: Berners-Lee conceived it as a read-write medium, the first browsers could write to the web as well as read it, and ‘anyone’ can put up a web page. Now, ‘anyone’ here meant ‘anyone with access to a unix box connected to the internet, who can build, configure and install a web server’; so not quite everyone’s ‘anyone’, but the key point is that it was usability that stopped you from putting up a web page, not the need for permission. Thus the (genuinely) big innovation of what became known as ‘Web 2.0’, or ‘the read/write web’, was that it was ‘Web 1.0’ for everyone else.

The last point in the list is not frivolous, but is the point that, on the web, you can link to anything else on the web, and that this is both culturally acceptable and encouraged. You can wander through a lot of web servers by starting at one page and ‘following your nose’. This is a consequence of the simplicity and openness, which together mean that there are very many web servers, from very heterogeneous sources, serving web pages written by experts and non-experts, and that linking between these is not inhibited by any requirement for pre-coordination.

The web’s success is also partly due to getting some things wrong:

1. links can break, and pages disappear;

2. links are all ‘see also’ (as opposed to ‘parent’, ‘next’, ‘author’ or anything more informative);

3. everything is a string – the information on the web is, fundamentally, communicated through text.^8

One might also say that ‘no quality control’ is a vice, but since one can simultaneously claim that ‘no control at all’ is a virtue, this is at least debatable.

Xanadu, for example, guaranteed link integrity, had typed links, and tried to develop a new intellectual property model, all at the same time. Berners-Lee and Cailliau’s insight, in [3], was that these are simply ignorable problems – it’s acceptable for things to break, and occasional 404s are a price worth paying to avoid the need for pre-coordination or registration; and the web’s type-less and one-directional link is such a powerful notion that its intrinsic vagueness is only a detail. The point of the semantic web is that it starts to address the second and third problems.

^8 The HTTP protocol can be, and is, used to transport digital media of all types – plain text, images, PDFs, audio, and other data; for our present purposes, I take the term ‘the web’ to refer to the global hypertext system of Berners-Lee’s original conception; this does not affect the essential point.

Why is ‘everything is a string’ a problem? If you search on the web for ‘jordan’, you get links about the country, the river, the glamour model, the breakfast cereal, the brand of shoes, various small businesses, the basketball player, the mathematician, and more. These are not all the same thing.

This doesn’t matter, however, because we as humans know they’re different things, and we’re not likely to get confused (and if we are, briefly, confused we think it’s our fault, rather than Google’s). We can, in other words, add the semantics ourselves, in exactly the same way that we do so when reading text anywhere else. But this means that computers are flying blind when they try to perform actions on the web on our behalf. They have no idea what all these strings mean, nor (more importantly) how they relate to one another.

The web works very well for many things, and search engines of course work spookily well in many cases.^9 But it only really works when there’s a human in the loop, or where there’s a lot of statistics to build on, and you wouldn’t want to let your computer unsupervised onto the web, with your credit card.

But in fact you do want to let your computer onto the web with your credit card. In a famous and seminal paper, Berners-Lee, Hendler and Lassila [8] describe an extended scenario in which computers are able to interact with each other to organise travel and other appointments. This scenario has come about to some extent: price-comparison websites now routinely interact with other websites to extract price data; and sites like tripit.com work by parsing the confirmation emails from travel websites such as Expedia, to extract the details of flights and hotel stays. The problem is that these services work by painfully scanning the content of web pages or emails directed at humans (this is known as ‘screen scraping’), heuristically parsing them, and acting on what may or may not be the intended meaning.

This is hard work for limited results, but (at least until computers somehow manage to understand text) it is a fundamental limitation of the web of strings.

1.3 …designed…

I have stressed that Berners-Lee and Cailliau’s original conception is architecturally still very close to the web we see now, 25 years later. Berners-Lee went on to form the W3C, and it is this body which by general consent – for no compulsion is possible here – still shepherds the development of the collection of standards which underlies the web (other bodies, most notably the IETF, are responsible for the technical governance and development of the internet, by a similar process of general consent). Although the core standards are easily identified – HTTP, URIs and HTML – these are accompanied by a blizzard of other agreements, major and minor, ranging from the syntax of XML to consensus on the formatting of dates. These agreements take the form of ‘W3C Recommendations’,^10 collaboratively authored by W3C Working Groups formally drawn from academic and commercial W3C member organisations, but with wide and occasionally noisy participation from interested individuals world-wide.

Much of the development work for the semantic web, in particular, has come from universities and a small number of research-active commercial organisations, often with quite close links to academia. There is a strong inheritance from preceding decades of research on Artificial Intelligence (AI); this inheritance included crucial experience of the logical underpinnings of what became RDF (see Sect. 1.5 below), and experience with related technologies, which meant that the first RDF standards were remarkably self-consistent and well-developed. However this same process meant that these first standards were rather hermetic, which will have contributed to their early reputation for incomprehensibility.

^9 If you actually search for ‘jordan’ in, for example, Google, Bing, and DuckDuckGo, you will find that several of these distinct senses appear on the first page, so that there is more variety than would result from simply listing the most popular pages with that string in them. This is because search engines can statistically identify that there are multiple clusters of related pages and thus, for example, display one hit from each cluster in a situation like this. The search engines are understood to improve their performance here with some lightweight semantics, but the core work of a search engine is at present still probabilistic rather than logical.

^10 http://www.w3.org/standards/

However comprehensible these standards are – and we will touch on the core ideas below – it is important to stress that one does not need a deep understanding of the mathematico-logical foundations of RDF in order to describe, for example, a book’s bibliographical details. In recent years, the ‘linked data’ paradigm has emerged, again with both academic and industry backing, with a focus on the practical steps to bring about the immediate-term payoffs of the semantic web. We discuss this in a little more detail below, in Sect. 3.

1.4 …to transmit machine-processable meaning…

The Semantic Web has some family relationships with the world of AI, and indeed has suffered, in marketing terms, from its association with that discipline’s repeated postponements of its great promises. The Semantic Web is not concerned with machine understanding, or with machines’ creation of meaning, but instead with the more modest goal of transmitting meaning across the web, in a more reliable way than is possible with the web of strings.

Our confusion, above, about the different things called ‘jordan’ arises because these various very different things all share a single label – ‘jordan’ – and it is only our ability to understand the context, and our knowledge of the structure of the world, that allows us in practical fact to distinguish the various denotations (we do not get confused when the term ‘jordan’ appears on a news website’s politics and celebrity pages, denoting different things). Faced with the same text, computers can do little more than (be programmed to) rely on statistics and heuristics. Search engines show that this approach is more successful than one might expect, but until computers do finally manage to ‘understand’ things (if they ever do), there is little more we can do, if we stick with just the text.

Semantic Web technologies allow us to take a step beyond these limitations, by

• providing a foundation with which to define more specific labels for the concepts and categories which make up our world; and

• allowing us to manipulate these labels (and by analogy manipulate the concepts and categories) in a more or less simple calculus of relationships.

An example is useful here.

One name for the River Jordan is http://dbpedia.org/resource/Jordan_River. This derives from DBPedia [9], which is a collection of Semantic Web names derived directly from Wikipedia. The name http://sws.geonames.org/7874114/ is another one, which derives from the Geonames database.^11 The resemblance of these ‘names’ to ordinary web URLs is not a coincidence – all the names within the Semantic Web are syntactically of this form. One of the reasons for this is that it preserves the ‘decentred’ property of the (text) web: the owners of the geonames.org domain can create what names they like within that domain, since ‘creating a name’ in this context consists solely of deciding that a particular URI should act as such a persistent name, and providing RDF (see Sect. 1.5) to describe it. This has the immediate attractive property that (usually) one can find more about a particular name by typing the name into an ordinary web browser; although it’s not a requirement, the usual good practice is for such a URI to produce a human-readable description of some type.

^11 http://www.geonames.org

Anyone can create such a name: I own the domain nxg.me.uk and decided, by personal fiat, that http://nxg.me.uk/norman/ should be my ‘name’ in this context. After a short bit of website configuration, it was so.

Once we have names, we can start to describe the things so named.^12

Since anyone can – and many do – create online names for things, one of the first things we might want to do is to address the apparent problem of duplication. So, for example, we might want to say that http://dbpedia.org/resource/Jordan_River is the same thing as http://sws.geonames.org/7874114/. This will be true in some contexts, but false in others. In this case, it turns out that the geonames.org designers have decided that http://sws.geonames.org/7874114/ refers to the River Jordan in Jordan, and that the ‘same’ river in Israel is named http://sws.geonames.org/294624/. Therefore, in some contexts, it would be simply false (and importantly false) to state that http://dbpedia.org/resource/Jordan_River is ‘the same as’ one or the other, and we might find it better to say that http://sws.geonames.org/294624/ is ‘part of’ http://dbpedia.org/resource/Jordan_River. This sort of (often barely consistent) subtlety is something which we are comfortable with in human conversation, but which we have to spell out in pedantic detail as soon as computers are involved.^13

Other things we might want to say are that http://dbpedia.org/resource/Jordan_River has a certain length, that it is a member of the category of ‘rivers’, that ‘rivers’ are a type of geographical feature, that the river is located within (as opposed to being a component part of) both Israel and Jordan; we might want a computer to ‘know’ that the categories of ‘rivers’ and (for example) ‘cats’ are disjoint (computers are very ignorant).

The term ‘know’, here, does not of course refer to any cognition on the part of a machine. Instead, it refers to a calculus of manipulations which would allow it to detect that a statement that ‘http://dbpedia.org/resource/Jordan_River is a cat’ is inconsistent with what it has already stored, or that a search for ‘things labelled “jordan” which are geographical features’ should not return any basketball players.
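A toy version of such a ‘calculus of manipulations’ can be written in a few lines of Python. This is a sketch under stated assumptions, not how any real reasoner is built: the category names beginning `ex:` and the basketball-player resource are hypothetical, invented for the example, and the short strings `rdf:type`, `rdfs:label`, `rdfs:subClassOf` and `owl:disjointWith` merely stand in for the standard terms of those names. The code stores statements as plain tuples, rejects a statement that the river is a cat, and answers the ‘labelled “jordan”’ search without returning the basketball player.

```python
RDF_TYPE = "rdf:type"
RDFS_LABEL = "rdfs:label"
SUBCLASS_OF = "rdfs:subClassOf"
DISJOINT_WITH = "owl:disjointWith"

RIVER = "http://dbpedia.org/resource/Jordan_River"
PLAYER = "ex:aBasketballPlayer"  # hypothetical name, for illustration only

facts = {
    (RIVER, RDF_TYPE, "ex:River"),
    (RIVER, RDFS_LABEL, "jordan"),
    (PLAYER, RDF_TYPE, "ex:Person"),
    (PLAYER, RDFS_LABEL, "jordan"),
    ("ex:River", SUBCLASS_OF, "ex:GeographicalFeature"),
    ("ex:River", DISJOINT_WITH, "ex:Cat"),
}


def superclasses(cls, triples):
    """Return cls together with every class reachable via subClassOf."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        if current not in found:
            found.add(current)
            todo.extend(o for s, p, o in triples
                        if s == current and p == SUBCLASS_OF)
    return found


def types_of(resource, triples):
    """All categories a resource belongs to, stated or inherited."""
    types = set()
    for s, p, o in triples:
        if s == resource and p == RDF_TYPE:
            types |= superclasses(o, triples)
    return types


def is_consistent(triples):
    """False if any resource belongs to two categories declared disjoint."""
    disjoint_pairs = [(s, o) for s, p, o in triples if p == DISJOINT_WITH]
    typed = {s for s, p, o in triples if p == RDF_TYPE}
    for resource in typed:
        types = types_of(resource, triples)
        if any(a in types and b in types for a, b in disjoint_pairs):
            return False
    return True


def search(label, category, triples):
    """Resources carrying `label` that also belong to `category`."""
    return sorted(s for s, p, o in triples
                  if p == RDFS_LABEL and o == label
                  and category in types_of(s, triples))


print(is_consistent(facts))
print(is_consistent(facts | {(RIVER, RDF_TYPE, "ex:Cat")}))
print(search("jordan", "ex:GeographicalFeature", facts))
```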

This last point indicates one fragment of where the applications of the Semantic Web might lie. We know how to store and manipulate facts on computer – we can match an employee number with a name and department, or a star at particular coordinates with brightness and colour – but often these facts lack the real-world structure and interrelationships that are so important to how we as humans want to handle them. Even when such knowledge is structured in some contexts, so that the internal structure of a library catalogue might reflect the real-world structure of the organisation that maintains it, this structuring is not shareable without close prior cooperation. If I emailed you a copy of such a catalogue as a spreadsheet, you as a human would swiftly work out what this document was and what some of the columns meant, though you might (for example) have difficulty telling apart the columns with publication year, acquisition year and, say, the year the book was first borrowed. I would have to talk to you to tell you which column was which; neither of us would expect the spreadsheet to be usable on a computer without this human-to-human interaction.

The goal of the Semantic Web is to make this sort of structured knowledge – about geography, stars, people, or retail opportunities – generally shareable and manipulable by computers, while preserving as many as possible of the (text) web’s properties of decentralisation (there is no permission or coordination required), openness (what we can communicate is not limited a priori), and simplicity (there is still only one protocol, HTTP).

That is, the Semantic Web is not about machine understanding, but about the technology required to let computers manipulate these URI-based names in ways which are, as far as possible, not inconsistent with the properties of the real-world objects for which they are analogues. This analogy – the functional consistency between the behaviour of real-world objects and the declared rules for manipulating their names in the machine – is the ‘semantics’ in the Semantic Web.

^12 I note parenthetically – for there is a very very long distraction possible here – that the URIs http://dbpedia.org/resource/Jordan_River and http://nxg.me.uk/norman/ are names for the physical things themselves, and not for any online resource describing the things; in the same sense, a DOI is a digital identifier for an object such as an article, and not for the record describing the article. This is true even though both of them have the form of a URI which is retrievable (or ‘dereferenceable’) in the sense that a web browser can retrieve something from that address. If you retrieve either of those URLs by typing it into a browser, you are redirected to another web page which describes the resource (that is, a resource with a different name, which is a web page rather than a person or concept); in neither case do you get me, or a river-ful of water, delivered to your computer. If you are interested, and have a day or so to burn, search for ‘httprange-14’.

^13 We make some extra remarks about the logic of RDF, and identity, in Sect. 1.5.

1.5 …through a logical framework named RDF…

So far, so abstract. How do we actually write down the statement that ‘http://dbpedia.org/resource/Jordan_River is a river’? For this, we must examine the Resource Description Framework (RDF).

RDF is a framework for describing things, and their mutual relations. It’s a rather abstract framework for thinking about the mechanics here, and is not, itself, a specific data format or language (though there are formats and languages intimately associated with it).

In the logical language in which it was first described, RDF is rather simple (or at least compact, which is not necessarily the same thing). It seems best, therefore, to follow the overall plan of this section and give the compact explanation first, and subsequently gloss it at a little more length. So (omitting a few details which are unimportant at this point):

• The world is described by a mixture of resources and literals.

• Literals are simply strings, such as “River Jordan” or “Río Jordán”.

• Resources are things in the world such as individuals, categories of things (such as people or rivers), abstract concepts (‘world peace’), things on the web or off it, or indeed anything that can be given a name.

• Resources are given names which are all syntactically URIs.

• One can make statements about resources, all of which are of the form of a triple of subject (the resource in question), predicate, and object. Subjects and predicates are always resources; objects may be resources or literals.
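The five points above translate almost directly into a small data structure. The following Python sketch is an illustration of this simplified model only (it shares the list's omission of ‘a few details’, and the class names are chosen for this example rather than taken from any RDF library). It enforces the one structural rule stated in the list: subjects and predicates must be resources, while objects may be resources or literals. The predicate used in the example, http://www.w3.org/2000/01/rdf-schema#label, is the standard RDFS ‘label’ term.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    """Anything that can be given a name; the name is syntactically a URI."""
    uri: str


@dataclass(frozen=True)
class Literal:
    """A plain string value such as "River Jordan"."""
    value: str


@dataclass(frozen=True)
class Triple:
    """One statement: subject, predicate, object."""
    subject: Resource
    predicate: Resource
    obj: Union[Resource, Literal]

    def __post_init__(self):
        # Subjects and predicates are always resources.
        if not isinstance(self.subject, Resource):
            raise TypeError("subject must be a Resource")
        if not isinstance(self.predicate, Resource):
            raise TypeError("predicate must be a Resource")
        # Objects may be resources or literals, and nothing else.
        if not isinstance(self.obj, (Resource, Literal)):
            raise TypeError("object must be a Resource or a Literal")


RIVER = Resource("http://dbpedia.org/resource/Jordan_River")
LABEL = Resource("http://www.w3.org/2000/01/rdf-schema#label")

statement = Triple(RIVER, LABEL, Literal("River Jordan"))
print(statement)
```

Attempting `Triple(Literal("River Jordan"), LABEL, RIVER)` raises a `TypeError`, since in this model a literal cannot be the subject of a statement.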

RDF version 1.1 is formally described in [10].^14

^14 This replaces the original version 1.0; there are further documents, including links to primers at different technical levels, at http://www.w3.org/standards/techs/rdf, and a broader collection of resources at http://www.w3.org/RDF/. The RDF suite of documents was comprehensively refreshed in February 2014, with the release of new versions of many of the standards which had accumulated over the previous 15 years.

To say more, we will have to write down RDF. There is no single RDF syntax; we will pick the syntax Turtle [11] from the available options. In this syntax, we can write

    <http://nxg.me.uk/norman/>
        <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
        <http://xmlns.com/foaf/0.1/Person>.

    <http://nxg.me.uk/norman/>
        <http://xmlns.com/foaf/0.1/name>
        "Norman Gray".

There are two statements (ie, triples) here, one stating that http://nxg.me.uk/norman/ is a Person, and the other stating that resource’s name. In each case, we can see the structure of the subject-predicate-object triple, with the resource http://nxg.me.uk/norman/ being the subject in both cases, and the resource http://xmlns.com/foaf/0.1/Person and literal "Norman Gray" being the two objects. The first predicate – stating the type of the resource – is one of the predicates defined in the W3C’s RDF ‘Schema’ standard (RDFS) [12]; the type in question is one defined by the independently defined Friend of a Friend (FOAF) schema (which we will return to below). The FOAF schema also defines a ‘name’ predicate, which gives the literal, conventional, name of the resource that is its subject.

This notation is obviously rather cumbersome. The Turtle syntax states that we can abbreviate the ‘type’ predicate by just ‘a’, that we can collapse repeated subjects, and that we can provide abbreviations for cumbersomely long URIs; as a result, the more idiomatic way of representing the statements above is just

    @prefix foaf: <http://xmlns.com/foaf/0.1/>.

    <http://nxg.me.uk/norman/>
        a foaf:Person;
        foaf:name "Norman Gray".

There are a few more subtleties to this notation, which can be found in [11], but they need not detain us, since our goal here is simply to concretize the generalities of earlier sections, and to allow us to use this notation in examples below.
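To make the abbreviations concrete, the following Python sketch expands the short form back into the long one. It is emphatically not a Turtle parser, and the `expand` function is an illustrative helper written for this example: it only undoes the ‘a’ shorthand and the `foaf:` prefix, and it assumes the statements have already been split into terms by hand.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PREFIXES = {"foaf": "http://xmlns.com/foaf/0.1/"}


def expand(term, prefixes=PREFIXES):
    """Expand one abbreviated term to its full form.

    Handles only the three abbreviation rules named in the text;
    literals (in double quotes) are returned unchanged.
    """
    if term == "a":                       # shorthand for the 'type' predicate
        return RDF_TYPE
    if term.startswith('"'):              # a literal, not a name
        return term
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]                 # already a full URI
    prefix, _, local_name = term.partition(":")
    return prefixes[prefix] + local_name  # prefixed name such as foaf:name


# The idiomatic form, split by hand: one subject, two predicate-object pairs.
subject = "<http://nxg.me.uk/norman/>"
pairs = [("a", "foaf:Person"), ("foaf:name", '"Norman Gray"')]

triples = [(expand(subject), expand(p), expand(o)) for p, o in pairs]
for triple in triples:
    print(triple)
```

The two tuples produced carry the same subject, predicates and objects as the long-hand statements given earlier, which is the sense in which the two notations are equivalent.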

You may object that this seems a terribly long-winded way to indicate someone’s name, and you would be right. There is another notation called RDF/XML [13] which is even more long-winded, which only its mother could love, and which we will pass over in silence. And there is yet another RDF notation called RDFa^15 which is concerned with embedding RDF statements within other documents, and most particularly within human-readable HTML pages; the intention is that the same document that a human reads describing, for example, a library book or a ‘retail opportunity’, is at the same time reliably interpretable by machine. But we must not allow ourselves to be distracted by syntax.

The central goal of RDF is not to provide a data-transport format (and here is a good point to emphasise that the ‘F’ in RDF stands for ‘framework’, and not ‘format’), but to provide a set of primitive notions with which we can represent a wide breadth of knowledge. Those notions are simple enough that it is feasible to translate a wide variety of other, possibly more natural, formats into RDF – for example the rows in a database table, or the elements in an XML file; and they are well-enough defined that we can use logical tools to process the results and draw out their implications, while standing a good chance of preserving their real-world meanings (cf. Sect. 1.6).
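As an illustration of how mechanical such a translation can be, the Python sketch below turns one row of a hypothetical catalogue table into triples. Everything specific here is an assumption made for the example: the base address http://example.org/catalogue/, the column names, and the choice to map two columns onto the ‘title’ and ‘publisher’ terms of the Dublin Core namespace http://purl.org/dc/terms/. Columns with no agreed meaning are simply left out.

```python
DCTERMS = "http://purl.org/dc/terms/"
BASE = "http://example.org/catalogue/"  # hypothetical base for row names

# Which column corresponds to which predicate: an assumption of this sketch.
COLUMN_TO_PREDICATE = {
    "title": DCTERMS + "title",
    "publisher": DCTERMS + "publisher",
}


def row_to_triples(row, key="id"):
    """Translate one table row (a dict) into (subject, predicate, object) tuples.

    The row's key column supplies the subject's name; every other column
    with a known predicate becomes one statement about that subject.
    """
    subject = BASE + row[key]
    return [
        (subject, COLUMN_TO_PREDICATE[column], value)
        for column, value in row.items()
        if column in COLUMN_TO_PREDICATE
    ]


# A made-up row; the 'shelf' column has no mapping and is skipped.
row = {
    "id": "b1",
    "title": "Transition to the Digital",
    "publisher": "Facet Publishing",
    "shelf": "A3",
}

for triple in row_to_triples(row):
    print(triple)
```

The interesting design decision is the mapping table, not the code: it records, once and explicitly, the column meanings that would otherwise have to be explained person to person.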

Saying that we need not store or transport knowledge in this form is not saying that we should not do so, and it is perfectly reasonable to store and transport RDF if that is what is convenient. The databases in which one stores RDF are called ‘triple-stores’, after the subject-predicate-object triples they contain; they are architecturally different from the relational databases of tables, rows and columns, with which you might be more familiar, and although they are not quite as technically mature as relational databases, they are steadily improving.
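The architectural difference can be suggested with a deliberately tiny in-memory example. This Python sketch is a toy, with a class name and `match` method invented for illustration; real triple-stores add indexing, persistence and a query language. What it does show is that there is no table schema: every fact has the same three-part shape, and a query is a pattern in which any of the three parts may be left open.

```python
class TripleStore:
    """A toy triple-store: a set of (subject, predicate, object) tuples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        """Store one statement; adding the same statement twice has no effect."""
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return all stored triples matching the pattern, sorted.

        A part given as None is a wildcard and matches anything.
        """
        pattern = (subject, predicate, obj)
        return sorted(
            triple for triple in self._triples
            if all(want is None or want == got
                   for want, got in zip(pattern, triple))
        )


FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
NORMAN = "http://nxg.me.uk/norman/"

store = TripleStore()
store.add(NORMAN, RDF_TYPE, FOAF + "Person")
store.add(NORMAN, FOAF + "name", "Norman Gray")

# Everything known about one resource, then everything with a given predicate.
print(store.match(subject=NORMAN))
print(store.match(predicate=FOAF + "name"))
```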

As a final point, the statements of RDF are not required to be true, consistent, or even meaningful (you can say ‘Truth smells of muesli’ if you want to, or ‘the present King of France is bald’) – anxieties about such statements are for the higher levels of inference and ontologies that we will come to shortly.

^15 https://www.w3.org/standards/techs/rdfa

A brief logical excursion, for the enthusiast: RDF by itself has an exceedingly simple core semantics. The link between a name (in the form of a URI) and its reference (in RDF terms, a ‘resource’) is made in natural language, and is not expected to be meaningful to a machine. RDF does not distinguish between, for example, sense and denotation (in Frege’s terminology), and is uncommitted about the identity or otherwise of resources; thus it is possible to name both Mark Twain and Samuel Clemens with URIs, and it is not possible, within RDF alone, to make statements about their mutual identity or otherwise. OWL ontologies (see below) are able to make statements about equivalence, but even here the framework is uncommitted about what equivalence means (it does not, for example, distinguish sense and denotation), and in one context it may, and in another may not, be useful for the author of a statement/triple to state that Mark Twain and Samuel Clemens are ‘equivalent resources’, or the model Jordan and the écrivaine Katie Price, or the Geonames and DBPedia names for the River Jordan. Once an association has been made between a name and a referent, the goal of RDF is to allow this association to be preserved across machines and networks, and to allow machines provided with both RDF statements and ontologies (which may come from different sources) to discover the statements entailed by the ontologies, which have the same truth values as the initial RDF. For further detailed logical discussion, see [14].

1.6 …enhanced by machine inference…

In the Turtle example above, I ‘described’ the resource http://nxg.me.uk/norman/ by saying that the thing with that name is a foaf:Person, whose foaf:name is “Norman Gray”. These are two terms from a simple ontology (see Sect. 1.7) called FOAF.¹⁶ The FOAF ontology has a namespace http://xmlns.com/foaf/0.1/, meaning that all of its types and predicates start with that URI. It includes a number of types such as ‘Person’ and ‘Project’, and a number of relations such as ‘name’, ‘mbox’ (email address), ‘publications’ and so on. Another ontology, well-known in the library community, is Dublin Core (DC). This is described authoritatively in http://dublincore.org/documents/dcmi-terms/, and the RDF expression of this vocabulary describes the namespace http://purl.org/dc/terms/, so that the DC ‘title’ predicate is, in full, http://purl.org/dc/terms/title, which is most often seen abbreviated to just dc:title. The DC ontology is focused, at least initially, on metadata for bibliographic and archival resources, and includes relations such as ‘title’, ‘creator’, and so on.
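The abbreviation of a full URI to a form such as dc:title is purely mechanical, and can be sketched in a few lines of Python. The `PREFIXES` table and the `expand` function are hypothetical names for this illustration; the two namespace URIs are the ones given in the text.

```python
# Prefix table using the two namespaces named in the text.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/terms/",
}


def expand(abbreviated, prefixes=PREFIXES):
    """Expand a prefixed name such as 'dc:title' to its full URI."""
    prefix, _, local = abbreviated.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return prefixes[prefix] + local


print(expand("dc:title"))
print(expand("foaf:name"))
```

Because the prefix is only shorthand, two vocabularies can use the same local name without any clash: the full URIs differ.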

Importantly, these various terms have no intrinsic meaning, and RDF places no restrictions on the predicates attached to a resource, nor who attaches them. Thus if you wish to say

    <http://nxg.me.uk/norman/>
        foaf:name "Capitania General Bernardo O'Higgins";
        myprops:shirtType "class C".

then you are at perfect liberty to do so, even although the first statement (stating that my name is Bernardo O’Higgins) is false, and the second (apparently saying something about my shirt) is meaningless to me, though doubtless somehow useful to you.

The primary goal of these various relations is to describe a relationship sufficiently unambiguously that even a machine can process it. Thus a ‘title’ in the DC sense – that is, the object of the dc:title predicate – is always a bibliographic title, such as dc:title "Transition to the Digital". The FOAF ontology also has a ‘title’ predicate, but in this case it is always and only a person’s honorific: <http://nxg.me.uk/norman/> foaf:title "Dr".

¹⁶ The ‘Friend of a Friend’ ontology was originally conceived of as a way of creating a peer-to-peer social network.

    <http://nxg.me.uk/norman/>
        foaf:name "Norman Gray";
        foaf:mbox <mailto:[email protected]>.

    <http://example.org/a.n.other>
        a foaf:Person;
        foaf:name "Aloysius Naismythe Other".

    <http://someone.org/ping>
        foaf:mbox <mailto:[email protected]>.

    <isbn:????>  # XXX INSERT BOOK'S ISBN HERE
        dc:title "Transition to the Digital";
        dc:contributor <http://someone.org/ping>.

Figure 1: Some statements about people.

But we can do more than this.

Consider Fig. 1: this provides a FOAF name and email address for http://nxg.me.uk/norman/, states that http://example.org/a.n.other is a (FOAF) Person, and provides an email address for an otherwise unidentified something named http://someone.org/ping. You might already have a picture of the entities being described here, in terms of their number and type.

If you were to put the descriptions in Fig. 1 into a triple-store, and then query it to retrieve all of the Persons described, you would obtain only a single Person, namely http://example.org/a.n.other, because this is the only resource which has been explicitly stated to be of type foaf:Person. It is obvious to us as humans, however, that if something has a name and an email address, then it’s almost certainly human, and the FOAF specification indeed makes this stipulation, that a foaf:mbox property can be attached only to a foaf:Person. In formal language, we can say that foaf:Person is the ‘domain’ of the foaf:mbox and foaf:name predicates, meaning that only things of type foaf:Person are permitted to have those predicates (this is distinct from the use of ‘domain’ to name a collection of machines on the internet, as in for example www.w3.org); similarly, we can say that dc:Agent is the ‘range’ of the dc:contributor predicate, meaning that any object of this predicate can be deduced to be a dc:Agent. In consequence, an RDF triple-store with this extra information (more precisely, a triple-store possessed of basic inferencing capabilities) can deduce that /norman/ and /ping are Persons, and so would give three answers, when asked to list the Persons it knows about.
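The domain and range deductions just described can be sketched as follows. This is a toy model, not a real reasoner: URIs are shortened to their final path segment, the mailbox address and the `isbn:placeholder` name are hypothetical stand-ins (Fig. 1 does not give usable values), and the `DOMAIN` and `RANGE` tables simply encode the stipulations quoted in the text.

```python
TYPE = "rdf:type"

# The statements of Fig. 1, with shortened names.
# The mailbox and ISBN values are hypothetical placeholders.
triples = {
    ("/norman/", "foaf:name", "Norman Gray"),
    ("/norman/", "foaf:mbox", "mailto:someone@example.org"),
    ("/a.n.other", TYPE, "foaf:Person"),
    ("/a.n.other", "foaf:name", "Aloysius Naismythe Other"),
    ("/ping", "foaf:mbox", "mailto:someone@example.org"),
    ("isbn:placeholder", "dc:title", "Transition to the Digital"),
    ("isbn:placeholder", "dc:contributor", "/ping"),
}

# Domain and range declarations, as stated in the text.
DOMAIN = {"foaf:name": "foaf:Person", "foaf:mbox": "foaf:Person"}
RANGE = {"dc:contributor": "dc:Agent"}


def infer_types(asserted):
    """Add the type statements implied by the domain and range tables."""
    inferred = set(asserted)
    for s, p, o in asserted:
        if p in DOMAIN:
            inferred.add((s, TYPE, DOMAIN[p]))  # the subject is in the domain
        if p in RANGE:
            inferred.add((o, TYPE, RANGE[p]))   # the object is in the range
    return inferred


def persons(statements):
    return sorted(s for s, p, o in statements
                  if p == TYPE and o == "foaf:Person")


print(persons(triples))               # only the explicitly typed resource
print(persons(infer_types(triples)))  # all three, after inference
```

Nothing is validated or rejected here: the declarations only ever add statements, which is the point of the contrast between asserted and deduced types.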

A further thing that humans know is that, while individuals may have multiple email addresses, an address is usually owned by a single person. FOAF agrees: only a single foaf:Person can be the subject of a foaf:mbox property with a given value. But both /norman/ and /ping have this property with the same value. The resolution is straightforward: these two URIs can be deduced to be merely different names for the same Person. Primed with this extra information about foaf:mbox (its domain, and the fact that each mailbox has a single owner), a triple-store, asked how many Persons were represented in Fig. 1, would answer ‘two’. Such a triple-store can answer more complicated queries without getting confused. Asked for the titles of things to which the person named "Norman Gray" has contributed, a triple-store provided with Fig. 1 could answer "Transition to the Digital", by following the chain of allowed inferences – it synthesises statements, such as “http://nxg.me.uk/norman/ is a foaf:Person”, and “/norman/ sameAs /ping”, which are only implicit in the original set of statements.

The end result, as may now be clear, is that a triple-store can aggregate information from multiple sources: the information in Fig. 1 might have been gathered from a publisher’s website, a membership database, and an individual’s home page, and may have started off in any or all of RDFa, XML, or a relational database. Once it is gathered, the triple-store can draw the conclusions which are not apparent from any data source by itself.
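The ‘same mailbox, same person’ deduction and the resulting query can be sketched in Python. As before this is a toy under stated assumptions: names are shortened, the mailbox address and `isbn:placeholder` are hypothetical stand-ins for values not given in Fig. 1, the set of Persons is taken as already deduced, and `build_canonical` and `titles_contributed_by` are names invented for the sketch.

```python
# Fig. 1 as (subject, predicate, object) tuples, with shortened names.
# The mailbox and ISBN values are hypothetical placeholders.
TRIPLES = [
    ("/norman/", "foaf:name", "Norman Gray"),
    ("/norman/", "foaf:mbox", "mailto:someone@example.org"),
    ("/a.n.other", "foaf:name", "Aloysius Naismythe Other"),
    ("/ping", "foaf:mbox", "mailto:someone@example.org"),
    ("isbn:placeholder", "dc:title", "Transition to the Digital"),
    ("isbn:placeholder", "dc:contributor", "/ping"),
]
# Assumed to have been deduced already from the domains of the predicates.
PERSONS = {"/norman/", "/a.n.other", "/ping"}


def build_canonical(triples):
    """Return a function mapping each resource to one representative name.

    Two resources sharing a foaf:mbox value are treated as the same thing.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    first_owner = {}
    for s, p, o in triples:
        if p != "foaf:mbox":
            continue
        if o in first_owner:
            parent[find(s)] = find(first_owner[o])  # merge the two names
        else:
            first_owner[o] = s
    return find


def titles_contributed_by(name, triples, canonical):
    """Titles of works whose contributor is (the same as) someone so named."""
    people = {canonical(s) for s, p, o in triples
              if p == "foaf:name" and o == name}
    works = {s for s, p, o in triples
             if p == "dc:contributor" and canonical(o) in people}
    return sorted(o for s, p, o in triples
                  if p == "dc:title" and s in works)


canonical = build_canonical(TRIPLES)
distinct_persons = {canonical(p) for p in PERSONS}
print(len(distinct_persons))
print(titles_contributed_by("Norman Gray", TRIPLES, canonical))
```

The query succeeds only because /ping and /norman/ have been merged: no single statement in the input connects the name "Norman Gray" to the book.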

Before going on to describe briefly just how these various relations are articulated, we have a few final remarks to make. Firstly, when we say, above, that “a foaf:mbox property can be attached only to a foaf:Person”, we are saying something different from the apparently similar statement we might find in an XML schema. In XML, such a statement is a syntactical one, saying that an ‘mbox’ element (in that example) is permitted to be attached only to a Person, so that if it is attached to something which isn’t a Person, or at least isn’t known to be a Person, then this is an error. In RDF, in contrast, the statement means that anything to which the foaf:mbox property is attached must be deduced to be a Person. It is not even illegitimate (although it is in fact untrue) to say that

    <http://nxg.me.uk/norman/>
        foaf:name "Norman Gray";
        dc:bibliographicCitation "Gray (2014)".

even though the domain of dc:bibliographicCitation allows a triple-store to deduce that /norman/ is of type dc:BibliographicResource (that is, something like a book, that can have ‘Gray (2014)’ as its citation). Only if the store is subsequently told that foaf:Person and dc:BibliographicResource are disjoint will it raise any objection, and that objection will not be a syntactic one, but the announcement of a logical inconsistency.¹⁷
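The difference between deducing a type and objecting to one can be sketched as follows. The sketch assumes shortened names and two hypothetical helpers, `types_of` and `inconsistencies`; it illustrates the logic only and is not the behaviour of any particular reasoner.

```python
TYPE = "rdf:type"

# Domain declarations, as described in the text.
DOMAIN = {
    "foaf:name": "foaf:Person",
    "dc:bibliographicCitation": "dc:BibliographicResource",
}

triples = {
    ("/norman/", "foaf:name", "Norman Gray"),
    ("/norman/", "dc:bibliographicCitation", "Gray (2014)"),
}


def types_of(statements):
    """Collect asserted types plus those deduced from predicate domains."""
    types = {}
    for s, p, o in statements:
        if p == TYPE:
            types.setdefault(s, set()).add(o)
        elif p in DOMAIN:
            types.setdefault(s, set()).add(DOMAIN[p])
    return types


def inconsistencies(statements, disjoint_pairs):
    """Report resources deduced to be in two classes declared disjoint."""
    found = []
    for resource, classes in sorted(types_of(statements).items()):
        for a, b in disjoint_pairs:
            if a in classes and b in classes:
                found.append((resource, a, b))
    return found


# With no disjointness declared, nothing is wrong: /norman/ is both.
print(inconsistencies(triples, []))
# Once the classes are declared disjoint, the inconsistency is reported.
print(inconsistencies(triples, [("foaf:Person", "dc:BibliographicResource")]))
```

The same two statements are accepted in both runs; only the extra declaration turns an odd but legitimate deduction into a reported contradiction.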

Secondly, we might reasonably object that, in the real world, a library (which is not a foaf:Person) might have an email address for enquiries, and that a role-based email address might in fact be shared by multiple people. This is accurate, but the restrictions we placed on the foaf:mbox property above are part of the approximation to the real world that we must make when we describe that world in terms simple enough for a computer. The library’s email address is therefore not the object of any foaf:mbox property, for precisely the reason that a library is not a foaf:Person; and similarly for the role-based address, for precisely the reason that it is incompatible with the definition of foaf:mbox for an email address to have multiple owners. A different ‘people’ ontology might have different restrictions, and so permit different implications.

Another logical excursion: Above, we mentioned the idea that (some of) the River Jordan is located within Jordan, as opposed to being part of it. The truth of this statement depends in part on what, precisely, ‘is part of’ means, and so potentially drags in a set of philosophical consequences, and indeed political and cultural ones; but the only ones that matter to the computer are the logical consequences: given the information that ‘A is part of B’, what other statements is it permitted to synthesise? That is a largely technical problem concerning what isPartOf or sameAs is intended to mean in a particular system, and it corresponds to the human problem of articulating what one knows within a formal system, in a way which allows the machine to draw the conclusions one expects. It is possible to spend a good deal of work-day effort arguing about just what ‘is’ is. These questions are what permit an ontology to have sufficient structure to make inferences possible and useful, and working out the necessary balance between approximation and expressiveness is what makes the design of ontologies challenging. All that said, such exotic questions are largely irrelevant to the users of an ontology, as long as they are broadly aware of the potential dislocation between statements in the formal language and timeless statements about the real world.

¹⁷ Saying that the two classes are disjoint is to say that there are no objects which are in both classes; this is incompatible with the deduction, above, that /norman/ is in both classes, and it is this inconsistency that the machine will object to.

Thus we can see that RDF statements, and the ontologies which structure them, are not really, or not only, about truth, other than in the abstract logical sense that valid arguments must preserve the truth values of their inputs. Similarly, we are not concerned with developing a single true ontology of the world, or even with the claim that an ontology which is useful in one context, will be useful or even meaningful in another, or that two ontologies describing one object will necessarily be commensurable. Developing ontologies is a type of programming activity, and the types and predicates, and the relationships between them, must be chosen to match the things being modelled, to accept the precision of what can be said about them, and to support the range of conclusions one may hope to draw.

FOAF and DC are relatively simple ontologies, with rather lightweight constraints; they are therefore very widely applicable for describing people and artefacts, respectively, and, in consequence, are very useful for integrating otherwise rather disconnected input datasets. In contrast, a large ontology such as the Gene Ontology [15, 16] is concerned with a logical structure which is sufficiently intricate, and sufficiently closely mapped to nature, that the ontology can draw scientifically valid conclusions, at the cost of being considerably harder to use.

In the examples above, we have talked of various ways of adding structure to the list of types and predicates in an ontology; now is the time to discuss briefly how this actually happens.

1.7 …based on OWL and other ontologies.

The internal structure of ontologies is the most technically intricate part of our story here. Fortunately, it is not necessary to discuss that structure in deep detail: our goal in this section is simply to concretize the rather general statements about ontology structure above, and to give general pointers towards more detailed advice.

There’s a well-known description of an ontology by Thomas Gruber [17],¹⁸ in which he declares that an ontology is

    “a formal specification of a shared conceptualization”

That appears opaque at first glance, but in fact it very concisely pulls together all of the key concepts. That is to say, an ontology is

conceptualization: a set of concepts

shared: …which at least two people agree about

specification: …and which has been written down

formal: …in a machine-readable way.

Like ‘semantics’, the term ‘ontology’ has a forbidding aspect. In this context, however, it simply labels one end of a spectrum of ways of structuring information. In Deborah McGuinness’s ‘semantic spectrum’ (in [18], and redrawn in Fig. 2) she illustrates a range of ‘shared conceptualizations’, and suggests that the term ‘ontology’ is most naturally restricted to those more formal structures to the right of the central line, dividing informal from formal ‘is-a’ relations; Gruber’s definition applies to most of this spectrum.

¹⁸ This is the by now apparently traditional slight misquotation of Gruber’s ‘explicit specification’ as ‘formal specification’.

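[Figure 2 shows a spectrum running from informal (left) to formal (right): Catalogue/ID; Terms, glossary, folksonomy; Thesauri (‘narrower’ relation); Informal is-a; Formal is-a; Properties; Value restrictions; General logical constraints. A central line divides ‘Informal is-a’ from ‘Formal is-a’.]

Figure 2: The semantic spectrum, simplified from [18]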

First, at the extreme left, simply listing identifiers for a set of objects is a sort of primitive shared conceptualization.

Next, and marginally more formally, a controlled vocabulary is simply a deliberate restriction of the terms we use to describe the world in some context. In the same general category, a folksonomy might be defined as a ‘collaboratively’ or ‘loosely’ controlled vocabulary, in which, ideally, some consensus emerges on the terms to use to describe some universe of resources. In both cases, though, there is no real structure to (the relationship between) the terms.

A thesaurus is a controlled vocabulary representing concepts, plus some declared relationship between the concepts. Most typically, a thesaurus is used for, or at least associated with, information retrieval: you might go into a library and ask to be directed to books about cats, or what we might call the concept of ‘Cats’, and a librarian might be able to take you to a shelf of useful books. If you’re interested in specifically ‘domestic cats’, they’re on a smaller part of the shelf; if you’re actually interested in ‘mammals’ in general, they might be spread over a few shelves. The relationship between these broader and narrower terms – for this is the most common structural relation in thesauri – is a practical one: any information retrieved by use of a given concept will also be retrieved by use of the ‘broader’ concept. These are ‘is-a’ relations, in the sense that every domestic cat is a cat, and each cat is a mammal; but this is an ‘informal is-a’, in the sense of Fig. 2, since in the layout of a pet-shop, for example, or the layout of its supplier’s catalogue, ‘cat collars’ may reasonably be a ‘narrower’ concept with respect to ‘cat’ without any suggestion that a cat collar is a type of cat, and a library book about cats may be labelled ‘cat’ (or for example with the Dewey notation 636.8) without anyone or anything being entitled to conclude that the book itself would be improved by the insertion of fish.

The W3C standard for thesauri is the Simple Knowledge Organization System (SKOS),¹⁹ which formalises these broader/narrower relations and a very small set of further ones, in a deliberately simple framework which is designed to be broadly deployable.
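The practical, retrieval-oriented meaning of ‘broader’ can be sketched in Python. Everything here is hypothetical illustration: the `BROADER` table, the catalogue of book titles, and the simplifying assumption that each concept has at most one broader concept (real thesauri, and SKOS, do not require this).

```python
# Each concept points to its (single, for this sketch) broader concept.
BROADER = {
    "domestic cats": "cats",
    "cat collars": "cats",   # narrower than 'cats', but not a kind of cat
    "cats": "mammals",
}

# Hypothetical catalogue: title -> the concept it is labelled with.
CATALOGUE = {
    "Care of the House Cat": "domestic cats",
    "Wild Cats of the World": "cats",
    "Collars and Leads": "cat collars",
    "Whales": "mammals",
}


def is_under(concept, ancestor):
    """True if 'ancestor' is the concept itself or one of its broader terms."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = BROADER.get(concept)
    return False


def retrieve(concept):
    """Everything labelled with the concept or with anything narrower."""
    return sorted(title for title, label in CATALOGUE.items()
                  if is_under(label, concept))


print(retrieve("domestic cats"))
print(retrieve("cats"))
print(retrieve("mammals"))
```

Searching on ‘cats’ finds the book on collars, which is useful for retrieval, and it is exactly why this relation cannot be read as a strict ‘is-a’.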

Next, and stepping over the dividing line of Fig. 2, the simplest ontologies are those where this hierarchy of ‘is-a’ relations is indeed expected to hold in a logically useful sense, and where labelling something with an ontology term is indeed an assertion that that thing is an instance of the labelled type, and therefore of any supertype. For example, the Linnaean name for cats is Felis catus; this is a species of the genus Felis in the family Felidae, in the order Carnivora, all the way up to the kingdom Animalia; and this hierarchy is explicitly intended to allow me to conclude, on being presented with something labelled Felis catus, that that thing is indeed a cat rather than a book about cats, and that the (by now indignantly wriggling) subject is a carnivorous animal, with claws.
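A ‘formal is-a’ licenses a machine to propagate types up the hierarchy. The following Python sketch shows that entailment on the Linnaean example; the individual `ex:tiddles` is hypothetical, the ranks between Carnivora and Animalia are omitted for brevity, and the fixed-point loop is a deliberately naive stand-in for what a reasoner does.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

triples = {
    ("Felis catus", SUBCLASS, "Felis"),
    ("Felis", SUBCLASS, "Felidae"),
    ("Felidae", SUBCLASS, "Carnivora"),
    ("Carnivora", SUBCLASS, "Animalia"),  # intermediate ranks omitted
    ("ex:tiddles", TYPE, "Felis catus"),  # a hypothetical individual cat
}


def entail_types(asserted):
    """Repeat 'instance of C, C subclass of D => instance of D' to a fixed point."""
    entailed = set(asserted)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(entailed):
            if p != TYPE:
                continue
            for c, q, d in list(entailed):
                if q == SUBCLASS and c == o and (s, TYPE, d) not in entailed:
                    entailed.add((s, TYPE, d))
                    changed = True
    return entailed


types = sorted(o for s, p, o in entail_types(triples)
               if s == "ex:tiddles" and p == TYPE)
print(types)
```

Only the single type statement was asserted; the other four are conclusions the hierarchy entitles the machine to draw.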

This ‘formal is-a’ is the intended interpretation – that is, the intended ‘semantics’ – of the statement in Fig. 1 that the thing named by http://example.org/a.n.other is of ‘type’ foaf:Person. The usual word for the type in this context is class, and the (FOAF) URI http://xmlns.com/foaf/0.1/Person (usually abbreviated to just foaf:Person) is the name of the abstract class of persons; individual people are instances of this class.

¹⁹ http://www.w3.org/2004/02/skos/intro

The next step along the semantic spectrum is to be able to declare that instances of some classes may possess properties (the term predicates is generally interchangeable) such as foaf:name. An ontology of this type may declare the domain and range of a predicate, restricting the type of subject or object that the predicate may have. A triple-store can then use this information to make certain deductions, as we illustrated in Sect. 1.6. However, there is no way, in this type of simple ontology, to require that an instance of a class must have a particular property (I can be a foaf:Person without having an email address, for example, bizarrely Victorian though that notion may seem), and a reasoner which does not know my email address, because it hasn’t been provided with a foaf:mbox property for me, is not entitled to conclude that I therefore don’t have one.²⁰
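The last point, that missing information is not negative information, can be sketched by contrasting two ways of answering the same question. Both functions are hypothetical illustrations written for this sketch; `None` is used here to stand for ‘unknown’.

```python
triples = {
    ("/norman/", "rdf:type", "foaf:Person"),
    ("/norman/", "foaf:name", "Norman Gray"),
}


def has_mbox_closed_world(resource, statements):
    """Database-style answer: what is not recorded is taken to be false."""
    return any(s == resource and p == "foaf:mbox" for s, p, o in statements)


def has_mbox_open_world(resource, statements):
    """RDF-style answer: True if a mailbox is recorded, otherwise unknown."""
    if any(s == resource and p == "foaf:mbox" for s, p, o in statements):
        return True
    return None  # unknown: absence of a statement proves nothing


print(has_mbox_closed_world("/norman/", triples))
print(has_mbox_open_world("/norman/", triples))
```

The data is identical in both cases; what differs is the conclusion each style of reasoning is entitled to draw from a gap.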

The simple ontological framework represented by RDF Schemas (RDFS) is capable of expressing no more than this: that classes exist and that some classes are subclasses of others, that resources can be instances (or members) of classes, that properties exist and have classes as domain and range. The payoff is that reasoners²¹ which implement this logic can be simpler and faster than a reasoner capable of more intricacy. Referring back to the discussion around Fig. 1, such a reasoner would be capable of concluding that http://nxg.me.uk/norman/ is a foaf:Person (because that resource is asserted to have a foaf:name, implying that /norman/ is in that property’s domain), and it could conclude that http://someone.org/ping is a dc:Agent; but it could not use the identity of the foaf:mbox properties to conclude that these are the same person, and it could not detect any inconsistency in a subsequent assertion or discovery that /norman/ is a dc:BibliographicResource.

To get such extra reasoning power, it is necessary to go beyond RDFS to the right-hand side of the ontology spectrum, towards more elaborate frameworks such as OWL. The Web Ontology Language (OWL²²) comes in several standard varieties – OWL Lite, OWL DL, and OWL Full – and further varieties associated with particular implementations (see [19] and references at http://www.w3.org/standards/techs/owl). The difference between these varieties is that some are more expressive, in the sense that it is possible to articulate more complicated relationships between resources; the practical tradeoff is that the more expressive variants are harder or more expensive to implement. It is in OWL, and not RDFS, that it is possible to say

dc:Agent owl:disjointWith dc:BibliographicResource.

A reasoner, once programmed to calculate the logical implications of the owl:disjointWith predicate, can thereafter conclude that if a resource such as http://nxg.me.uk/norman/ has been asserted or discovered to be in both classes, then it has detected a logical inconsistency. This might form part of a longer chain of reasoning, or it may be practically useful for discovering latent errors in a database: if a mapping database discovers a feature that is marked as both a river deep and mountain high, then its curators can thereby discover they have some repairing to do.

²⁰ This ‘permission for ignorance’ is known as the ‘open world assumption’. It is terribly important for the mathematical logic which a reasoner implements, but most non-specialist users of RDF can be forgiven for feeling it to be a rather abstract detail. It is in contrast to the ‘closed world assumption’ built into more traditional relational databases: if there is no salary beside my name in the personnel department’s database, then the system will conclude that I am a person without a salary, as opposed to a person whose salary is unknown (which is the only conclusion possible in an RDF analogue of this database); in this case, the personnel office is certain they’re not going to pay me.

²¹ A ‘reasoner’ in this context is a piece of software which can implement the logical calculus indicated by an ontology.

²² Yes, the standard is indeed cracking a Winnie-the-Pooh joke here.

1.8 So what, briefly, is the Semantic Web?

What we have ended up with is this: in RDF we have a flexible model for representing relatively simple statements about the world, in a primitive subject-predicate-object pattern. This framework has a number of syntaxes which are more or less suitable in particular contexts; these are most prominently RDF/XML, Turtle and RDFa, but there are also some emerging alternatives suitable for JSON. Despite this abundance of syntax, it is the simplicity and very broad applicability of the primitive triples that is key, since almost any reasonably structured data source can be wrangled into RDF form, one way or another, and this makes RDF an excellent mechanism for combining heterogeneous data sources.

An important step in this combining mechanism is the carefully-struck balance between the logical precision of the calculus of resources which ontologies express (given a set of RDF triples, what further triples are logically entailed?), and both the flexible imprecision of the natural-language link between URIs and their denotations, and the variable precision of ontological designs, which may range from the broad-brush utility of FOAF (‘only Persons have email addresses’) to the intricacy of the Gene Ontology.

Furthermore, the very close technical analogy between the Semantic Web and the Web of Strings reassures us that the properties which made the web successful (its openness, fault-tolerance, and so on, as discussed in Sect. 1.2) will support the Semantic Web, too.

Once information is in RDF form, and stored in a ‘triple-store’, it can be queried²³ either directly, or with the assistance of a ‘reasoner’, which synthesises extra triples based on the inferences obtained from the originally asserted triples, in the presence of the extra structure provided by a simple or a complex ontology.

We could say more about the ‘Semantic Web’. We could talk about some of the details of triple-stores and query engines, but that would very quickly immerse us in a level of technical detail which can verge on the impenetrable, and which is in any case still in active research-level development. We could introduce some of the features of ontology design: though this is now largely stable, it would take several more chapters to cover, without immediate intellectual profit. We could talk about deployment, but many of the main deployments of Semantic Web technology are in back-end systems supporting complicated data-integration projects, and do not make for vivid illustration.

Instead, the goal of this chapter has been to provide an introduction to the core concepts of the ‘Semantic Web’, at a level and to an extent that will allow the reader to understand the goals and starting-point of a project using these technologies, and to understand the way in which the ‘Semantic’ Web relates to the Web of Strings of our last two decades’ familiar experience. For more details, the W3C’s ‘Data Activity’²⁴ provides useful links to further reading; and for some practical details, including a rather more sceptical account of the Semantic Web than I have produced here, see [21].

1.9 Where is the Semantic Web?

It is harder to point to the Semantic Web than to the textual web. Although we encounter the text web whenever we look at Wikipedia, book a flight, or search for cat videos, the semantic version remains in the background.

²³ The standard W3C query language for RDF is called SPARQL [20], which is to a triple-store what SQL is to a relational database.

²⁴ http://www.w3.org/2013/data/

The French National Library (BnF) has a large catalogue, split into multiple heterogeneous parts, using various standards (such as EAD and MARC), and making links to a dozen or so external sites. A Semantic Web approach has made this information available through a homogeneous interface which provides intelligibility and syntactical consistency, without sacrificing the open-endedness which is necessary if such a complicated and long-accumulated information source is to be made fully available [22]. This catalogue is now in use by various applications and smaller libraries, or enhanced and expanded by more specialised catalogues elsewhere. Going from the catalogue to the instance, [23] describes a system which makes available, in machine-readable form, elements of the semantic content of papers in PubMed Central. Reference [22] is in the ‘in-use’ track of that year’s ESWC conference proceedings, where you might find other illustrative examples; there are further illustrations in the collection of open datasets maintained by the Semantic Web Journal.²⁵

At the time of writing (2014), it is not yet clear, at least to me, just when the Semantic Web will be manifestly ‘working’, nor how it will get there; it is not even transparently clear what ‘working’ would mean. The highly-integrated scenario of [8] is still in the future, not because it is infeasible, but most likely because it is predicated on a good deal of large-scale coordination which is difficult to force on a decentred web. Linked Data (see Sect. 3, below) shows a path into the future, but is not an end-point. But perhaps we are asking too much for a grand design. The evolution from web pages, to blogs, to Twitter is obvious only in retrospect; the fact that (for example) Instagram, Facebook, Twitter, Flickr, and up-to-date conference organisers all use hashtags is explicable, but could never have been predicted; conversely, it is rather surprising that we are still using SMTP-based email 32 years after it was first standardised, and in the face of numerous claims for killer alternatives. Perhaps the most likely future for the Semantic Web matches that of Artificial Intelligence research, even though the former community has always been nervous of the comparison: AI never obviously delivered its grand promises – it never ‘worked’ – but in a world of self-driving cars, face-recognition, and pocket-sized machine-translation devices, it's clear that it's been quietly engineering its way into successful deployment for decades.

The two final remarks to be made concern, in the first place, Web 2.0, which is sometimes linked to the Semantic Web, but which in fact has almost nothing to do with it; and in the second place the Linked Data paradigm, which is arguably the first appearance of the Semantic Web in more nearly everyday experience.

2 Web 2.0

The term ‘Web 2.0’ has disorientingly many meanings. To some, it refers to the cluster of mid-level technologies which let services such as Google Maps or Facebook act as an ‘application inside a web page’; to others it has meanings which can be grouped under the slogan ‘the read/write web’; another camp sees it as a marketing expression and is divided about whether this is an obviously good or an obviously bad thing; and for yet others it is a proxy for a broad change in society towards a more decentred, or engaged, or empowered, or open-licensed, or otherwise utopian future. These terms are not remotely equivalent: in some cases they may not even intersect, so something as obviously next-generation as Wikipedia is Web 2.0 under some definitions but not others.

The term is really only useful as an adjective, to denote something – anything – that is more than a collection of static web pages. However, even saying that much begs the question: what is it that’s so wrong with static web pages (which we’re now obliged to call Web 1.0)? Indeed, depending on your conception of what ‘Web 2.0’ means, podcasts are firmly Web 1.0, and if you turn off visitor editing on your wiki pages, or commenting on your blog, then those become Web 1.0, as well.

²⁵ http://www.semantic-web-journal.net/accepted-datasets

The term ‘Web 2.0’ appeared as the title of a conference series run by O’Reilly Media in 2004, and the term spread from there to wider and still vaguer usage. There is no technical difference between ‘Web 2.0’ and the original version of the web; instead, the term refers to a loose collection of techniques and even attitudes, which became increasingly popular in the first half of that decade, which were heavily oriented towards supporting, encouraging, and exploiting user interactivity on the web.

Above, I characterised the web as a space where anyone can join in, but admitted that initially this was more true in principle than in fact. The web of the 90s was, for most, a read-only medium: the web was full of text and images, but the only way to talk back was through email, or a few now-arcane spaces such as Usenet. Though both were effective, and though Usenet is still, in 2014, Not Quite Dead Yet, they are hardly a free-flowing world-wide conversation. In the broadest sense of the term, a blog is a Web 2.0 thing: it’s easy to set up for oneself, and if even that is too inconvenient, it’s easy to use a blog set up by a service such as Wordpress. With that step taken, it’s merely a matter of creative diligence to broadcast to the world, and to dispute, declaim, decry and generally bicker in comment threads. Broadly blog-like things such as Twitter and Facebook become much easier to conceive, as both provider and consumer, once the idea of a blog service plus comments has become part of the culture.

With that step taken, other technologies were able to collide; ‘AJAX’ was one such. The ‘J’ stands for Javascript, which originally appeared as a language supporting simple scripting within a web browser – providing an element of dynamism in web pages, so that they could adjust themselves to the browser’s environment rather than appearing as a simple block of text sent from a server. This language, and the set of practices which AJAX represents, have grown to the point where a web browser may now be seen as a separate programming ecology. Some web pages, such as a Google Mail page, arrive as a minimal amount of HTML, enclosing a Javascript program which, running entirely within the browser, interrogates a server, and synthesises from scratch the HTML which the browser finally displays. From a protocol point of view, there are minimal differences between the earliest web pages at http://info.cern.ch/hypertext/WWW/TheProject.html (with a ‘last modified’ date of 3 December 1992), and the twitter.com home page which creates the page programmatically and on the fly in the browser: both use HTTP to transport materials from a server, and both offer HTML for the browser to render.²⁶

One of the other long-latent possibilities which was explored in the early 2000s was the idea of uploaded and shared content. Flickr (https://www.flickr.com) was one of the first broadly-known services to support users’ uploading and sharing images. At a technical level (again), this is not much different from blogging, but Flickr and the social-bookmarking site delicious.com are interesting for this chapter’s purposes because they were among the first widespread services to support tagging. Flickr allowed users to add simple labels to pictures, and Delicious allowed users to similarly label web pages; there is presumably some indirect link here with the rapid adoption of Twitter hashtags. These tags – which were intended to support both categorisation and search – were taken from an unrestricted vocabulary but with the expectation that users would spontaneously cooperate to choose broadly compatible tags, in a lightweight shared semantic environment which was soon labelled a ‘folksonomy’ (as we briefly mentioned above). However this is as far towards the Semantic Web as Web 2.0 has ever moved, and so, despite the suggestion of the ‘Web 2.0’ name and some of the accompanying rhetoric, Web 2.0 has rather little to do with the Semantic Web. There's more to explore in the programming ecology, but in web terms, Web 2.0 is rather a dead end.

²⁶ Indeed there is something of a paradox here: the web of 2010 and the web of 2000 look hugely different, to the extent that they seem to come from different worlds. But these differences are to do with new social possibilities, and with the browser’s new and almost entirely unpredicted programming ecology, rather than arising from any change in the underlying nuts and bolts; and the closer we look at Web 2.0, the more out of focus it seems to become.

3 Linked Data

Even acknowledging that the Semantic Web will probably remain somewhat obscure, one can ask how broadly it is deployed, or to what extent its deployments are reachable. As we discussed above, the Semantic Web is still largely used in ‘server-side’ applications, rather than becoming part of the commerce of the open web, as was originally anticipated. Part of the reason for this is that the Semantic Web technology stack is challenging to use: the tools are of varying maturity, and use a broad range of underlying technologies, so that deployments are still to a large extent bespoke. Even when simply making data available, rather than attempting to process it, making data web-ready may require a quite profound engagement with the Semantic and textual webs’ conceptual foundations; many such deployments are still, even now, worthy of an academic paper describing the experience. Few non-specialist projects can make such investments.

An alternative approach is to use the Linked Data paradigm.27 At heart, the linked data paradigm is the Semantic Web, but with a practical focus on the irreducible minimum required to get data onto the web in a way which is compatible with the goals and practices described in Sect. 1.

Quoting [24], the linked data principles are:

1. “Use URIs as names for things.

2. Use HTTP URIs so that people can look up those names.

3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).

4. Include links to other URIs. so that they can discover more things.”

Notably, these principles have more to say about HTTP and URIs – that is, about the mechanics of retrieving this information, and by extension the mechanics of making it available – than they say about the semantics of the information being distributed.

Together, these say (in effect) that the ‘linked data web’ is just like the web of strings, familiar to us all, except that the raw materials are not HTML files, but files in one or other RDF syntax (so RDF/XML, Turtle, or RDFa). All four points are essentially adaptations of the design principles of Sect. 1.2 and best practices such as [25], which were, as it turned out, so very effective in making the web (of strings) so very successful. The third point states that looking up a URI should provide useful (RDF) information to a machine, but is also taken to suggest that putting a resource’s URI into a web browser should provide human-readable (HTML) information as well – this provides one bridge between the two webs.
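One common way of building that bridge is HTTP content negotiation, in which the client's Accept header decides which representation of a resource it receives. The following is a minimal sketch of the idea rather than a description of any particular server: the resource URI, the REPRESENTATIONS table and the look_up function are all hypothetical, and a real deployment following [25] would typically add redirects or hash URIs, which are omitted here.

```python
# Hypothetical in-memory "server": one resource, available in two forms.
REPRESENTATIONS = {
    "http://example.org/id/norman": {
        "text/turtle": '<http://example.org/id/norman> '
                       '<http://xmlns.com/foaf/0.1/name> "Norman" .',
        "text/html": "<html><body><h1>Norman</h1></body></html>",
    }
}


def parse_accept(header):
    """Return the media types named in an Accept header, most preferred first."""
    prefs = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type, q = parts[0], 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media_type and q > 0:
            # Sort by descending q, then by the order given in the header.
            prefs.append((-q, position, media_type))
    return [media_type for _, _, media_type in sorted(prefs)]


def look_up(uri, accept="text/html"):
    """Return (status, media type, body) for a URI and an Accept header."""
    available = REPRESENTATIONS.get(uri)
    if available is None:
        return 404, None, None
    for media_type in parse_accept(accept):
        if media_type in available:
            return 200, media_type, available[media_type]
        if media_type == "*/*":
            # No stated preference: assume a human reader and send HTML.
            return 200, "text/html", available["text/html"]
    return 406, None, None
```

Under these assumptions, a linked data client asking for text/turtle and a browser asking for text/html look up the same URI and each receive the form they can use.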

27 The word ‘paradigm’ is probably the best term in general, to the extent that it represents the result of accepting the insight of a set of rather high-level principles. However making these principles concrete is intricate enough that ‘Linked Data Best Practices’ sometimes seems more descriptive; and when defending the principles against other approaches, the word ‘dogma’ springs quickly to mind.


Figure 3: The Linking Open Data cloud diagram, as of April 2014. Although the text and most of the interconnections are too small to make out at this scale, the two largest central blobs – which act as rich sources of consensus names for objects in the world – represent DBPedia and GeoNames. The green-coloured clusters represent government (left) and bibliographic data (right); other clusters represent providers of life sciences, geographic, social networking, media, user-generated, and linguistic data. For details and credits, see http://lod-cloud.net.

Although the traditional web is generally thought of as delivering human-readable pages, it has from the very beginning been used for delivering machine-readable content, such as data files for analysis or PDF files for printing. This is not the machine-readability we mean in this context. Just as a human might read a Wikipedia article, internalise the results, and click on a link for more information, a linked data client – typically a component of a larger application – is expected to process the semantic content of the data it finds at one URI, and then follow further links for further information. Given the RDF of Fig. 1, for example, a linked data client might store the information listed, and then follow the link http://someone.org/ping to see if there is any further information available there. It is this final step, which lets the machine ‘follow its nose’, that represents the key insight of the linked data paradigm, and which is the machine analogue of ‘wasting hours on Wikipedia’.
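The ‘follow its nose’ step can be sketched in a few lines. This is an illustration under stated assumptions, not a real client: FAKE_WEB is a hypothetical stand-in for HTTP retrieval and RDF parsing, the triples and the use of rdfs:seeAlso as the linking predicate are invented for the example (they are not the contents of Fig. 1), and any object that looks like an HTTP URI is simply treated as a link worth following.

```python
from collections import deque

# Hypothetical stand-in for the web: URI -> triples obtained by looking it up.
FAKE_WEB = {
    "http://example.org/norman": [
        ("http://example.org/norman", "foaf:name", "Norman"),
        ("http://example.org/norman", "rdfs:seeAlso", "http://someone.org/ping"),
    ],
    "http://someone.org/ping": [
        ("http://someone.org/ping", "dc:title", "Ping"),
        ("http://someone.org/ping", "rdfs:seeAlso", "http://example.org/norman"),
    ],
}


def fetch(uri):
    """Pretend to dereference a URI; unknown URIs yield no triples."""
    return FAKE_WEB.get(uri, [])


def follow_your_nose(start, max_documents=10):
    """Store the triples found at `start`, then follow the links they contain."""
    store = set()      # every triple seen so far
    visited = []       # URIs looked up, in order
    seen = {start}     # URIs already queued, so that loops terminate
    queue = deque([start])
    while queue and len(visited) < max_documents:
        uri = queue.popleft()
        visited.append(uri)
        for subject, predicate, obj in fetch(uri):
            store.add((subject, predicate, obj))
            # Assumption: an object that looks like an HTTP URI is a link.
            if obj.startswith(("http://", "https://")) and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return store, visited
```

The max_documents limit is a design choice for the sketch: on the real web of data the links never run out, so a client needs some rule for when to stop.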

The network of links between ‘linked-data’-style data sources is illustrated in Fig. 3. A large fraction of the data in this cloud is in the form of SKOS vocabularies, or uses the simple FOAF or DC ontologies; but while these are useful for coordination between data sources, there is no restriction on the ontologies which are used in fact.

The archetypal Linked Data (client-side) application is a simple application – perhaps even running entirely within a browser – which reads the contents of a URI, interprets and perhaps displays what predicates it recognises (which will probably include at least FOAF), and lets a user move on to further related information. The archetypal Semantic Web application (if such a thing exists) might use SPARQL to query a large and semantically rich database, possibly requiring a fair amount of inferencing on the part of the service and within itself, and requiring significant effort on the part of its author, to understand the ontology or ontologies in which the information is encoded. Though these will be visibly different applications, and one is a lot easier to implement than the other, it is hard to articulate a fundamental difference between them.
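To make the simpler of these two applications concrete, the sketch below shows the ‘display what it recognises’ step for a handful of FOAF predicates. The sample triples, the RECOGNISED table and the describe function are hypothetical, and the triples are assumed to have been parsed already; only the FOAF namespace and the three property names are taken from the FOAF vocabulary itself.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# The predicates this hypothetical client understands, with display labels.
RECOGNISED = {
    FOAF + "name": "Name",
    FOAF + "homepage": "Home page",
    FOAF + "knows": "Knows",
}

# Sample triples, assumed to have been read from the subject's URI.
SAMPLE = [
    ("http://example.org/norman", FOAF + "name", "Norman"),
    ("http://example.org/norman", FOAF + "knows", "http://example.org/susan"),
    ("http://example.org/norman", "http://example.org/vocab#shoeSize", "43"),
]


def describe(subject, triples):
    """Return display lines for recognised predicates, plus links to follow."""
    lines, links = [], []
    for s, predicate, obj in triples:
        if s != subject:
            continue
        label = RECOGNISED.get(predicate)
        if label is None:
            continue  # silently ignore predicates we do not understand
        lines.append(f"{label}: {obj}")
        if obj.startswith(("http://", "https://")):
            links.append(obj)  # somewhere the user could move on to
    return lines, links
```

Ignoring unrecognised predicates, rather than failing on them, is what lets such a client cope with data sources that use ontologies it has never seen.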


There are further compact details in [26], a book-length discussion with a practical focus in [27], and pointers to current information at http://www.w3.org/standards/semanticweb/data.

4 Conclusion

One could derive the impression, from the long description of Sect. 1, that the Semantic Web is complicated and arcane. It can be, but the rude health of the linked data cloud shows that that complication is in the background of a simple and robust means of distributing machine-intelligible information. While we may not be quite at the stage of letting our computers run amok with our credit card, or organise our lives as the vision of [8] imagines, we have now confidently, if still rather discreetly, stepped into a web of data.

Acknowledgements

I am most grateful to Susan Stuart for exacting and detailed comments on the drafts of this chapter, to members of the [email protected] list for ‘in-use’ references, and to Chris Bizer for making available an early copy of the 2014 Linked Data cloud.

Glossary

AI Artificial Intelligence.

DC Dublin Core: a set of core metadata terms developed by and for the library community; see http://dublincore.org/.

FOAF Friend of a Friend: a vocabulary for attributes of, and relationships between, people; see http://www.foaf-project.org/.

HTML Hypertext Markup Language; see http://www.w3.org/html/wg/.

HTTP Hypertext Transfer Protocol: the underlying language of the web; see [1].

RDFS RDF Schemas: lightweight ontologies for RDF; see [12].

RDF Resource Description Framework: see [10].

URI Uniform Resource Identifier: the form of a resource name (almost the same as a URL); see [2].

W3C World Wide Web Consortium: the organisation which develops and standardises protocols for the web; see http://www.w3.org.

triple The triple of subject, predicate and object which is at the heart of the RDF model.

triple-store A database designed to store RDF triples, rather than the tabular data of a more common ‘relational’ database.

References

[1] Roy T Fielding, J Gettys, J Mogul, H Frystyk, Larry Masinter, P Leach, and Tim Berners-Lee. Hypertext transfer protocol – HTTP/1.1. RFC 2616, June 1999. URL: http://www.ietf.org/rfc/rfc2616.txt.


[2] Tim Berners-Lee, Roy Thomas Fielding, and Larry Masinter. Uniform resource identifier (URI): Generic syntax. RFC 3986, January 2005. URL: http://www.ietf.org/rfc/rfc3986.txt.

[3] Tim Berners-Lee and Robert Cailliau. WorldWideWeb: Proposal for a hypertext project. Web page, CERN, November 1990. URL: http://www.w3.org/Proposal.html.

[4] Tim Berners-Lee. Information management: A proposal. Technical report, CERN, March 1989. URL: http://info.cern.ch/Proposal.html.

[5] Tim Bray, Jean Paoli, and C M Sperberg-McQueen. Extensible markup language (XML) 1.0. W3C Recommendation, February 1998. First edition of http://www.w3.org/TR/REC-xml/. URL: http://www.w3.org/TR/1998/REC-xml-19980210.

[6] Ora Lassila and Ralph R Swick. Resource description framework (RDF) model and syntax specification. W3C Recommendation, February 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/.

[7] Vannevar Bush. As we may think. The Atlantic Monthly, July 1945. URL: http://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/.

[8] Tim Berners-Lee, James Hendler, and Ora Lassila. The Semantic Web. Scientific American, 284(5):34–43, May 2001. doi:10.1038/scientificamerican0501-34.

[9] Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N. Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick van Kleef, Sören Auer, and Christian Bizer. DBpedia – a large-scale, multilingual knowledge base extracted from wikipedia. Semantic Web Journal, 2014. To appear. doi:10.3233/SW-140134.

[10] RDF 1.1 concepts and abstract syntax. W3C Recommendation, February 2014. URL: http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/.

[11] RDF 1.1 Turtle. W3C Recommendation, February 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225/.

[12] RDF Schema 1.1. W3C Recommendation, February 2014. URL: http://www.w3.org/TR/2014/REC-rdf-schema-20140225/.

[13] RDF 1.1 XML syntax. W3C Recommendation, February 2014. URL: http://www.w3.org/TR/2014/REC-rdf-syntax-grammar-20140225/.

[14] RDF 1.1 semantics. W3C Recommendation, 2014. URL: http://www.w3.org/TR/rdf11-mt/.

[15] Michael Ashburner, Catherine A. Ball, Judith A. Blake, David Botstein, Heather Butler, J. Michael Cherry, Allan P. Davis, Kara Dolinski, Selina S. Dwight, Janan T. Eppig, Midori A. Harris, David P. Hill, Laurie Issel-Tarver, Andrew Kasarskis, Suzanna Lewis, John C. Matese, Joel E. Richardson, Martin Ringwald, Gerald M. Rubin, and Gavin Sherlock. Gene ontology: tool for the unification of biology. Nature Genetics, 25(1):25–29, May 2000. doi:10.1038/75556.

[16] Michael Bada, Robert Stevens, Carole Goble, Yolanda Gil, Michael Ashburner, Judith A. Blake, J. Michael Cherry, Midori Harris, and Suzanna Lewis. A short study on the success of the gene ontology. Journal of Web Semantics, 1(2), 2004. URL: http://www.websemanticsjournal.org/ps/pub/2004-9.

[17] Thomas R Gruber. A translation approach to portable ontology specification. Knowledge Acquisition, 5(2):199–220, 1993. doi:10.1006/knac.1993.1008.

[18] Deborah L. McGuinness. Ontologies come of age. In Dieter Fensel, Jim Hendler, Henry Lieberman, and Wolfgang Wahlster, editors, Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential. MIT Press, 2003. URL: http://www-ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-mit-press-(with-citation).htm.

[19] W3C OWL Working Group. OWL 2 web ontology language document overview (second edition). W3C Recommendation, December 2012. URL: http://www.w3.org/TR/owl2-overview/.

[20] The W3C SPARQL Working Group. SPARQL 1.1 overview. W3C Recommendation, March 2013. URL: http://www.w3.org/TR/sparql11-overview/.

[21] Aaron Swartz. A Programmable Web: An Unfinished Work. Synthesis Lectures on the Semantic Web: Theory and Technology. Morgan & Claypool, 2013. doi:10.2200/S00481ED1V01Y201302WBE005.

[22] Agnès Simon, Romain Wenz, Vincent Michel, and Adrien Di Mascio. Publishing bibliographic records on the web of data: Opportunities for the BnF (French National Library). In Philipp Cimiano, Oscar Corcho, Valentina Presutti, Laura Hollink, and Sebastian Rudolph, editors, The Semantic Web: Semantics and Big Data, volume 7882 of Lecture Notes in Computer Science, pages 563–577. Springer Berlin Heidelberg, 2013. 10th International Conference, ESWC 2013, Montpellier, France, May 26-30, 2013. URL: http://2013.eswc-conferences.org/program/accepted-papers, doi:10.1007/978-3-642-38288-8_38.

[23] L Garcia Castro, C McLaughlin, and A Garcia. Biotea: Rdfizing pubmed central in support for the paper as an interface to the web of data. Journal of Biomedical Semantics, 4(Suppl 1):S5, 2013. URL: http://www.jbiomedsem.com/content/4/S1/S5, doi:10.1186/2041-1480-4-S1-S5.

[24] Tim Berners-Lee. Linked data. Web page, 2006. URL: http://www.w3.org/DesignIssues/LinkedData.html.

[25] Leo Sauermann, Richard Cyganiak, Danny Ayers, and Max Völkel. Cool URIs for the semantic web. W3C Interest Group Note, December 2008. URL: http://www.w3.org/TR/2008/NOTE-cooluris-20081203/.

[26] Christian Bizer, Tom Heath, and Tim Berners-Lee. Linked data – the story so far. International Journal On Semantic Web and Information Systems, 5(3):1–22, 2009. URL: http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf, doi:10.4018/jswis.2009081901.

[27] Tom Heath and Christian Bizer. Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures on the Semantic Web: Theory and Technology. Morgan & Claypool, 2011. URL: http://linkeddatabook.com, doi:10.2200/S00334ED1V01Y201102WBE001.

Version 4fa6c06eeaa4, 2015-01-18