rdf, the semantic web, jordan, jordan and jordan · the web. the key point is that the web – and...
TRANSCRIPT
,,nn
Gray, N. (2015) RDF, the Semantic Web, Jordan, Jordan and Jordan. In: Moss, M. and Currall, J. (eds.) Transition to the Digital. Facet Publishing.
Copyright © 2015 The Author
A copy can be downloaded for personal non-commercial research or study, without prior permission or charge
Content must not be changed in any way or reproduced in any format or medium without the formal permission of the copyright holder(s)
http://eprints.gla.ac.uk/101484/
Deposited on: 10 April 2015
Enlighten – Research publications by members of the University of Glasgow http://eprints.gla.ac.uk
RDF,theSemanticWeb, Jordan, JordanandJordanNormanGray1
24August2014
DRAFT chaptertoappearin‘TransitiontotheDigital’, MichaelMossandJamesCurrall(eds.) (2015?). Thiscollectionisaddressedtoarchivistsandlibraryprofessionals, andsohasaslightfocusonimplicationsimplicationsforthem. Thischapterisnonethelessintendedtobeamore-or-lessgenericintroductiontotheSemanticWebandRDF,whichisn'tspecifictothatdomain. CopyrightNormanGray2014.
ThischapterisaboutthekeynoveltiesoftheSemanticWeb–thenovelideas,andthenovelopportunities. ButwewilldiscussthesedigitalnoveltiesinthecontextoftheSemanticWeb’s continuities withotherfeaturesoftheinformationworld.
Ourmostobviousantecedentisnotthatold–the(non-semantic)Webdidn’texistbeforethe90s–andwewilllearnabouttheveryclosetechnicaloverlapbetweentheSemanticWebandthe‘textualWeb’ofournow-usualexperience.TheSemanticWebis, closerthanacousin, thesiblingofthetextualweb.
Otherantecedentshaveahistoryasoldasthefirstlibraryindex. TheSe-manticWebhas, wemightsay, a‘logicalwing’andan‘informationwing’. Thesearenotprimarilydistinguishedbytheirtechnicalororganisationalfeatures, butby the largely disjoint research questions they address, and by theirmotiva-tions. While the ‘logicalwing’ ischaracterisedbyaconcern for formal logicanditsimplementations, richinthetheoryofcomputingscience1, the‘informa-tionwing’, withasturdilypragmaticfocus, canberegardedascontinuouswiththeinformation-organisinggoalsoftheworldoflibraryscience, sharingitsaspi-rationtosystematizeandshareinformation, anditsacknowledgementthatsuchsharingisalwaysapproximateandneverunmediated, andthatonemustaimforabalancebetweenfaithfulness tosources, andwhat isactuallyusablebytheinformation’sactualaudience.
Below, wewillstartwithabroadintroductiontotheSemanticWeb. Fromtherewecanmovebrisklyontopractice, andthequestionofwhere, howandwhether thesemanticwebmightappear intechnological fact. Ourgoal is toindicatethecontinuitieswiththetextualweb, andthustoindicatethenovel-tiesoftheSemanticWeb, andsosuggestwhytheyareimportant. Afterabriefparenthesison‘Web2.0’(inSect. 2), wedescribe‘linkeddata’inSect. 3.
1 WhatistheSemanticWeb?
TheSemanticWebissimpleinsummary:
TheSemanticWebistheemergingnextstageoftheweb, designedtotransmitmachine-processablemeaning, through a logical framework
1PhysicsandAstronomy, UniversityofGlasgow, UK; http://nxg.me.uk1ThisstrandcanwithatleastalittlejusticeberegardedasthemostrecentlasthurrahofAI,a
traditionpronounceddeadmoreoftenthan SouthPark’s Kenny.
RDF,theSemanticWeb, Jordan, JordanandJordan
namedRDF,enhancedbymachineinferencebasedonOWL andotherontologies.
ThoughI assertthatthismodeststatementisplausible, andtheoutcomedesir-able, thereadermaybedisinclinedtoagree, onthegroundsthatthestatementisonthefaceofthingsgobbledegook. Overthecourseofthischapter, I intendtoexplaineachcomponentofthisremark, stepbystep, inthehopethatitisashorthopfromtheretoplausibility.
1.1 TheSemanticWeb…
First, agenerallament.The term ‘semanticweb’ is anunfortunateone, since itmakes the topic
soundmuchmorearcanethanitreallyis. Itisarguablysimple: thenextstageinalong-termvisionoftheweb, originallyformulatedbyTimBerners-Lee, andelaboratedbythe WorldWideWebConsortium(W3C) whichhewasinstrumen-talinfounding.
1.2 …istheemergingnextstageoftheweb…
Beforewecanfullyunderstandthe‘semanticweb’, weneedaclearideaofwhatthe‘WorldWideWeb’is, eventhough, nowadays, suchaquestionmayseemasoddasaskingwhat‘air’is.
Thewebis remarkably homogeneous: itconsistsof one protocol, one bitofglue, andmarkup.
Everythingon theweb is connectedby theHypertextTransport Protocol(HTTP) [1], andifyoulookatawebpage, updateapodcast, talkonJabber, ordownloadvideos, onacomputeroramobilephone, thebytescometoyourcomputerviaHTTP.Indeed, adebatablebutplausibledefinitionofthewebisas the setof things reachable throughanHTTP request. Theonlybitof thatprotocolyouevernoticeis‘404’, whichistheHTTP errorcodemeaning‘I don’thaveanythingbythatname’.
Thebitofglueisthe UniformResourceIdentifier(URI) [2]. That’sauniformnaming schemeforthingsonthewebandbeyond. EverythingonthewebhasaURI,andmostURIsrefertothingsontheweb.2
Themarkupismostly HypertextMarkupLanguage(HTML),theangle-bracketswhichindicatehowwebpagesshouldbeformatted. Butit’salsoRSS andAtomfeeds(ie, blogsandpodcasts), wikisyntax, PDFs, andafewothermoreobscurealternatives.
Thesecomponentscometogetherwhenyouclickonalinkinawebpage(thefollowingexplanationofhowthewebworksisprobablynotnewtoyou,butI'mspellingitoutinordertomakeclearhowthevariouscomponentsworktogether). Thewebbrowserprogramonyourcomputerortabletorphoneisaclient, whichdisplaysapagepreviouslysenttoitbyaprogramsittingona serversomewhereintheinternet(wewillusetheselattertermsrepeatedlybelow). Thepage(typically)arrivesatyourcomputerintheformofHTML whichthebrowserknowshowtoformatanddisplayasheadings, sidebars, images, andlinks. Thelinkassociates some texton thepagewitha URI,andclickingon it tells thebrowser‘goandlookatthispageinstead’. ThebrowserthenexaminestheURItodiscoverwhichwebserverisprovidingthatpage, thenimmediatelymakesanotherHTTP requesttothatservertoretrievethepage, displayittoyou, andstartthecycleoncemore.
2That’sasortof ‘mindspace’most: thisstatementprobably isn’tnumerically trueifyougobynumbersofobjectsorvolumeofdata.
2
RDF,theSemanticWeb, Jordan, JordanandJordan
Beforewegoon, I should(parenthetically)becarefultodistinguishthewebfromthe internet. Theinternetisasetofprotocols, plural, forexchangingmate-rialbetweennetworkedcomputers. Whenyousendanemailorretrieveawebpage, thematerialtravelsovertheinternet, butunderthecontrolofdifferentpro-grams–anemailclientandawebbrowser–usingacombinationoflower-andhigher-levelprotocols, or languages.3 Internet telephony (suchasSkype)andtimeservicesaretworeasonablyvisibleinternetserviceswhicharedistinctfromtheweb. Thekeypointisthattheweb–andthisincludestheSemanticWeb–isanotablysimplestructuresittingatoptheinternet.
Thereareafewkeydatesinthehistoryoftheweb.
• 1990: TimBerners-LeeandRobertCaillaufirstproposedthesystemwhichbecametheWorldWideWeb [3].4 Thefirstserverandclientimplementa-tionsappearedasCERN-internalserviceslaterthatyear.
• August1991: Firstpublicweb server. Initially, users interactedwith theserverbyusingTelnet(anothernon-Webinternetprotocol)toconnectdi-rectlytoaclientprogramatCERN,andthencetotheserver.
• December1991: Firstweb serveroutsideEurope, at theStanfordLinearAcceleratorCenter(SLAC,likeCERN anexperimentalhigh-energyphysicslaboratory).
• April1993: Mosaic, fromthe(US) NationalCenterforSupercomputingAp-plications(NCSA),wasthefirstgraphicalbrowser. MosaicledtoMozilla,which led to today’s Firefox; Mozilla also led to the Netscape browser.Aboutthesametime, NCSA releasedtheir‘httpd’webserver, whicheven-tuallymutatedintothenow-ubiquitousApache.5
• October1994: TheWorldWideWebConsortium(W3C) wasfounded, withBerners-Lee as its director, to act as the standards bodyof the emergingsystem. EarlystandardsincludedHTML 3.2(1997),6 thefirstversionofXML(1998) [5], andthefirstmodelandsyntaxforRDF (1999) [6].
• 2006: Lolcats (andother ‘user-generated content’). We return to this inSect. 2
Although itmayseemaslight tangent, it isworthwhilebrieflydiscussingwhatitisthatmakesthewebsospecial, andsoverysuccessful, sincethese-manticwebisinprotocoltermsidenticaltothewebwearefamiliarwith, andsosharesthesamespecialfeatures.
Thewebisnotthefirsthypertextsystem(VannevarBush’shypotheticalMemexsystem [7], andTedNelson’s Xanadu system7 canprobablylayclaimtothat),and it’snot thefirstdistributedhypertext system (VAX NotesandLotusNotescanprobablyclaimthat), butitisthefirsttrulysuccessfulworldwidedistributedhypertextsystem.
Thewebgetssomethingsright:
3Thispointisrathermuddiedbytheobservationthatmanypeoplenowreademailmessagesinawebbrowser; nonethelesstheemailisstilltransportedbetweensystems, inthebackground, usingthedecades-oldemailprotocols.
4This1990documentcomesabout18monthsafter[4], whichisabroaddiscussionofinforma-tionmanagement, inthecontextof“themanagementofgeneralinformationaboutacceleratorsandexperimentsatCERN.”Thisdocument–onwhichBerners-Lee’smanagerwrote“vague, butexciting”–isclearlyancestraltoboththewebandto[3], butitisparticularlyinterestingfromthepointofviewofthischapter, becauseinhindsightitmorecloselyprefiguresthestructureandpotentialofRDF andthe Semantic Web. Onecouldalmostcallthetextualweb‘SemanticWeb0.1’.
5Apacheoriginallyconsistedofasetofsoftware‘patches’toNCSA httpd, hencethe(unfortunatelyapocryphal, itseems)etymologyofitsnameas‘apatchyserver’.
6HTML 2.0wasanIETF standard; earlierversionswerenotformallystandardised.7http://xanadu.com/
3
RDF,theSemanticWeb, Jordan, JordanandJordan
• It isdistributed, ordecentralised, so that there isno centre to theweb–thereisnosinglepointwhichcanfail, orbeco-opted, orwhichcangrantorrefusepermission.
• Itisnon-proprietary, oropen: theprotocolswhichdefinethewebarefreetoobtain, andmaybeimplementedwithoutanyinhibitionsfromlicences.Also, theweb’sgoverningbody(whichthe W3C is, ineffect)doesitsworkthroughanopenprocess.
• Thewebissimple, inthesensethatitisarchitecturallysimple(thenotionoftheweb, asserversplusbrowserspluslinks, iseasytocomprehend), andstraightforwardinprotocol terms(thecoreprotocolof theweb, HTTP, issuch thatacrudeserverorclient implementationcanbedeveloped inarelativelyshorttime).
• It’seasytojoinin: thisistoalargeextentaconsequenceofthesimplicityandopenness.
• YoucanwastehoursonWikipedia.
Onesenseof‘easytojoinin’isthatthewebhas always beenread-write:Berners-Leeconceiveditasaread-writemedium, thefirstbrowserscouldwritetothewebaswellasreadit, and‘anyone’canputupawebpage. Now, ‘anyone’heremeant‘anyonewithaccesstoaunixboxconnectedtotheinternet, whocanbuild, configureandinstallawebserver’; sonotquiteeveryone’s‘anyone’, butthekeypoint is that itwasusability that stoppedyou fromputtingupawebpage, nottheneedforpermission. Thusthe(genuinely)biginnovationofwhatbecameknownas‘Web2.0’, or‘theread/writeweb’, wasthatitwas‘Web1.0’foreveryoneelse.
Thelastpointinthelistisnotfrivolous, butisthepointthat, ontheweb,youcanlinkto anythingelseontheweb, andthatthisisbothculturallyaccept-ableandencouraged. Youcanwanderthroughalotofwebserversbystartingatonepageand‘followingyournose’. This isaconsequenceof thesimplic-ityandopenness, whichtogethermeanthatthereareverymanywebservers,fromveryheterogeneoussources, servingwebpageswrittenbyexpertsandnon-experts, andthatlinkingbetweentheseisnotinhibitedbyanyrequirementforpre-coordination.
Theweb’ssuccessisalsopartlyduetogettingsomethings wrong:
1. linkscanbreak, andpagesdisappear;
2. linksare all ‘seealso’(asopposedto‘parent’, ‘next’, ‘author’oranythingmoreinformative);
3. everythingisastring–theinformationonthewebis, fundamentally, com-municatedthroughtext.8
Onemightalsosaythat‘noqualitycontrol’isavice, butsinceonecansimulta-neouslyclaimthat‘nocontrolatall’isavirtue, thisisatleastdebatable.
Xanadu, forexample, guaranteedlinkintegrity, hadtypedlinks, andtriedtodevelopanewintellectualpropertymodel, allatthesametime. Berners-LeeandCaillau’sinsight, in [3], wasthatthesearesimplyignorableproblems–it’sacceptableforthingstobreak, andoccasional404sareapriceworthpayingtoavoidtheneedforpre-coordinationorregistration; andtheweb’stype-lessandone-directionallinkissuchapowerfulnotionthatitsintrinsicvaguenessisonly
8TheHTTP protocolcanbe, andis, usedtotransportdigitalmediaofalltypes–plaintext, images,PDFs, audio, andotherdata; forourpresentpurposes, I taketheterm‘theweb’torefertotheglobalhypertextsystemofBerners-Lee’soriginalconception; thisdoesnotaffecttheessentialpoint.
4
RDF,theSemanticWeb, Jordan, JordanandJordan
adetail. Thepointofthesemanticwebisthatitstartstoaddressthesecondandthirdproblems.
Whyis‘everythingisastring’aproblem? Ifyousearchonthewebfor‘jor-dan’, yougetlinksaboutthecountry, theriver, theglamourmodel, thebreakfastcereal, thebrandofshoes, varioussmallbusinesses, thebasketballplayer, themathematician, andmore. Thesearenotallthesamething.
Thisdoesn’tmatter, however, becauseweashumansknowthey’redifferentthings, andwe’re not likely to get confused (and ifwe are, briefly, confusedwethinkit’sourfault, ratherthanGoogle’s). Wecan, inotherwords, addthesemanticsourselves, inexactlythesamewaythatwedosowhenreadingtextanywhereelse. Butthismeansthatcomputersareflyingblindwhentheytrytoperformactionsonthewebonourbehalf. Theyhavenoideawhatall thesestringsmean, nor(moreimportantly)howtheyrelatetooneanother.
Thewebworksverywellformanythings, andsearchenginesofcourseworkspookilywellinmanycases.9 Butitonlyreallyworkswhenthere’sahumanintheloop, orwherethere’salotofstatisticstobuildon, andyouwouldn’twanttoletyourcomputerunsupervisedontotheweb, withyourcreditcard.
Butinfactyou do wanttoletyourcomputerontothewebwithyourcreditcard. Inafamousandseminalpaper, Berners-Lee, HendlerandLassila [8]de-scribeanextendedscenarioinwhichcomputersareabletointeractwitheachothertoorganisetravelandotherappointments. Thisscenariohascomeaboutto someextent: price-comparisonwebsitesnow routinely interactwithotherwebsites to extract price data; and sites like tripit.com workbyparsing theconfirmationemailsfromtravelwebsitessuchasExpedia, toextractthedetailsofflightsandhotelstays. Theproblemisthattheseservicesworkbypainfullyscanningthecontentofwebpagesoremailsdirectedathumans(thisisknownas‘screenscraping’), heuristicallyparsingthem, andactingonwhatmayormaynotbetheintendedmeaning.
Thisishardworkforlimitedresults, but(atleastuntilcomputerssomehowmanagetounderstandtext)itisafundamentallimitationofthe webofstrings.
1.3 …designed…
I have stressed thatBerners-LeeandCaillau’soriginal conception is architec-turallystillveryclosetothewebweseenow, 25yearslater. Berners-Leewentontoformthe W3C,anditisthisbodywhichbygeneralconsent–fornocom-pulsionispossiblehere–stillshepherdsthedevelopmentofthecollectionofstandardswhichunderliestheweb(otherbodies, mostnotablytheIETF,arere-sponsible for the technicalgovernanceanddevelopmentof the internet, byasimilarprocessofgeneralconsent). Althoughthecorestandardsareeasilyiden-tified– HTTP, URIs and HTML –theseareaccompaniedbyablizzardofotheragreements, majorandminor, rangingfromthesyntaxofXML toconsensusontheformattingofdates. Theseagreementstaketheformof‘W3C Recommen-dations’,10 collaborativelyauthoredbyW3C WorkingGroups formallydrawnfromacademicandcommercialW3C memberorganisations, butwithwideandoccasionallynoisyparticipationfrominterestedindividualsworld-wide.
Muchof the developmentwork for the semanticweb, in particular, hascomefromuniversitiesandasmallnumberofresearch-activecommercialorgan-isations, oftenwithquitecloselinkstoacademia. Thereisastronginheritance
9Ifyouactuallysearchfor‘jordan’in, forexample, Google, Bing, andDuckDuckGo, youwillfindthat several ofthesedistinctsensesappearonthefirstpage, sothatthereismorevarietythanwouldresultfromsimplylistingthemostpopularpageswiththatstringinthem. Thisisbecausesearchenginescan statistically identifythattherearemultipleclustersofrelatedpagesandthus, forexample, displayonehitfromeachclusterinasituationlikethis. Thesearchenginesareunderstoodtoimprovetheirperformanceherewithsomelightweightsemantics, butthecoreworkofasearchengineisatpresentstillprobabilisticratherthanlogical.
10http://www.w3.org/standards/
5
RDF,theSemanticWeb, Jordan, JordanandJordan
fromprecedingdecadesofresearchon ArtificialIntelligence(AI);thisinheritanceincludedcrucialexperienceofthelogicalunderpinningsofwhatbecameRDF(seeSect. 1.5 below), andexperiencewithrelatedtechnologies, whichmeantthatthefirstRDF standardswereremarkablyself-consistentandwell-developed.Howeverthissameprocessmeantthatthesefirststandardswereratherhermetic,whichwillhavecontributedtotheirearlyreputationforincomprehensibility.
Howevercomprehensiblethesestandardsare–andwewilltouchonthecoreideasbelow–itisimportanttostressthatitdoesnotrequiresomefoun-dationalunderstandingofthemathematico-logicalfoundationsofRDF inordertodescribe, forexample, abook’sbibliographicaldetails. Inrecentyears, the‘linkeddata’ paradigmhas emerged, againwith both academic and industrybacking, withafocusonthepracticalstepstobringabouttheimmediate-termpayoffsof thesemanticweb. Wediscussthis inalittlemoredetailbelow, inSect. 3.
1.4 …totransmitmachineprocessablemeaning…
TheSemanticWebhassomefamilyrelationshipswiththeworldof AI,andin-deedhassuffered, inmarketingterms, fromitsassociationwiththatdiscipline’srepeatedpostponementsof itsgreatpromises. TheSemanticWebisnotcon-cernedwithmachineunderstanding, orwithmachines’creationofmeaning, butinsteadwiththemoremodestgoaloftransmittingmeaningacrosstheweb, inamorereliablewaythanispossiblewiththewebofstrings.
Ourconfusion, above, aboutthedifferentthingscalled‘jordan’arisesbe-causethesevariousverydifferentthingsallshareasinglelabel–‘jordan’–anditisonlyourabilitytounderstandthecontext, andourknowledgeofthestructureoftheworld, thatallowsusinpracticalfacttodistinguishthevariousdenota-tions(wedonotgetconfusedwhentheterm‘jordan’appearsonanewsweb-site’spoliticsandcelebritypages, denotingdifferentthings). Facedwiththesametext, computerscandolittlemorethan(beprogrammedto)relyonstatisticsandheuristics. Searchenginesshowthatthisapproachismoresuccessfulthanonemightexpect, butuntilcomputersdofinallymanageto‘understand’things(iftheyeverdo), thereislittlemorewecando, ifwestickwithjustthetext.
SemanticWebtechnologiesallowustotakeastepbeyondtheselimitations,by
• providinga foundationwithwhich todefinemore specific labels for theconceptsandcategorieswhichmakeupourworld; and
• allowingustomanipulatetheselabels(andbyanalogymanipulatethecon-ceptsandcategories)inamoreorlesssimplecalculusofrelationships.
Anexampleisusefulhere.Onename for theRiver Jordan is http://dbpedia.org/resource/Jordan_
River. ThisderivesfromDBPedia [9], whichisacollectionofSemanticWebnamesderiveddirectlyfromWikipedia. Thename http://sws.geonames.org/7874114/ isanotherone, whichderivesfromtheGeonamesdatabase.11 There-semblanceofthese‘names’toordinarywebURLsisnotacoincidence–allthenameswithintheSemanticWebaresyntacticallyofthisform. Oneoftherea-sonsforthisisthatitpreservesthe‘decentred’propertyofthe(text)web: theownersofthe geonames.org domaincancreatewhatnamestheylikewithinthatdomain, since‘creatinganame’inthiscontextconsistssolelyofdecidingthataparticular URI shouldactassuchapersistentname, andprovidingRDF (seeSect. 1.5)todescribeit. Thishastheimmediateattractivepropertythat(usually)onecanfindmoreaboutaparticularnamebytypingthenameintoanordinary
11http://www.geonames.org
6
RDF,theSemanticWeb, Jordan, JordanandJordan
webbrowser; althoughit’snotarequirement, theusualgoodpracticeisforsuchaURI toproduceahuman-readabledescriptionofsometype.
Anyonecancreatesuchaname: I ownthedomain nxg.me.uk anddecided,bypersonal fiat, that http://nxg.me.uk/norman/ shouldbemy ‘name’ in thiscontext. Afterashortbitofwebsiteconfiguration, itwasso.
Oncewehavenames, wecanstarttodescribethethingssonamed.12
Sinceanyonecan–andmanydo–createonlinenamesforthings, oneofthefirstthingswemightwanttodoistoaddresstheapparentproblemofduplica-tion. So, forexample, wemightwanttosaythat http://dbpedia.org/resource/Jordan_River isthesamethingas http://sws.geonames.org/7874114/. Thiswillbetrueinsomecontexts, butfalseinothers. Inthiscase, itturnsoutthatthegeonames.orgdesignershavedecidedthat http://sws.geonames.org/7874114/referstotheRiverJordan inJordan, andthatthe‘same’riverinIsraelisnamedhttp://sws.geonames.org/294624/. Therefore, in some contexts, itwould besimplyfalse(andimportantlyfalse)tostatethat http://dbpedia.org/resource/Jordan_River is ‘the same as’ one or the other, and wemight find it bettertosaythat http://sws.geonames.org/294624/ is ‘partof’ http://dbpedia.org/resource/Jordan_River. Thissortof(oftenbarelyconsistent)subtletyissome-thingwhichwearecomfortablewithinhumanconversation, butwhichwehavetospelloutinpedanticdetailassoonascomputersareinvolved.13
Otherthingswemightwanttosayarethat http://dbpedia.org/resource/Jordan_River hasacertainlength, thatitisamemberofthecategoryof‘rivers’,that‘rivers’areatypeofgeographicalfeature, thattheriveris locatedwithin (asopposedtobeingacomponentpartof)bothIsraelandJordan; wemightwantacomputerto‘know’thatthecategoriesof‘rivers’and(forexample)‘cats’aredisjoint(computersare very ignorant).
The term ‘know’, here, doesnotof course refer to anycognitionon thepartofamachine. Instead, itreferstoacalculusofmanipulationswhichwouldallowittodetectthatastatementthat‘http://dbpedia.org/resource/Jordan_River isacat’isinconsistentwithwhatithasalreadystored, orthatasearchfor‘thingslabelled“jordan”whicharegeographicalfeatures’shouldnotreturnanybasketballplayers.
ThislastpointindicatesonefragmentofwheretheapplicationsoftheSe-manticWebmightlie. Weknowhowtostoreandmanipulate facts oncomputer–wecanmatchanemployeenumberwithanameanddepartment, orastaratparticularcoordinateswithbrightnessandcolour–butoftenthesefactslackthereal-worldstructureandinterrelationshipsthataresoimportanttohowweashumanswanttohandlethem. Evenwhensuchknowledgeisstructuredinsomecontexts, sothattheinternalstructureofalibrarycataloguemightreflectthereal-worldstructureoftheorganisationthatmaintainsit, thisstructuringisnotshareablewithoutclosepriorcooperation. IfI emailedyouacopyofsuchacatalogueasaspreadsheet, youasahumanwouldswiftlyworkoutwhatthisdocumentwasandwhatsomeofthecolumnsmeant, thoughyoumight(forex-ample)havedifficultytellingapartthecolumnswithpublicationyear, acquisitionyearand, say, theyearthebookwasfirstborrowed. I wouldhavetotalktoyoutotellyouwhichcolumnwaswhich; neitherofuswouldexpectthespreadsheet
12I noteparenthetically–forthereisa veryverylong distractionpossiblehere–thattheURIs http://dbpedia.org/resource/Jordan_River and http://nxg.me.uk/norman/ arenamesfor thephysicalthingsthemselves, andnotforanyonlineresourcedescribingthethings; inthesamesense, aDOI isadigitalidentifierforanobjectsuchasanarticle, andnotfortherecorddescribingthearticle. ThisistrueeventhoughbothofthemhavetheformofaURI whichisretrievable(or‘dereferenceable’)inthesensethatawebbrowsercanretrievesomethingfromthataddress. IfyouretrieveeitherofthoseURLsby typing it intoabrowser, youare redirected toanotherwebpagewhichdescribestheresource(thatis, aresourcewithadifferentname, which is awebpageratherthanapersonorconcept); inneithercasedoyougetme, orariver-fulofwater, deliveredtoyourcomputer. Ifyouareinterested, andhaveadayorsotoburn, searchfor‘httprange-14’.
13WemakesomeextraremarksaboutthelogicofRDF,andidentity, inSect. 1.5.
7
RDF,theSemanticWeb, Jordan, JordanandJordan
tobeusableonacomputerwithoutthishuman-to-humaninteraction.ThegoaloftheSemanticWebistomakethissortofstructuredknowledge
–aboutgeography, stars, people, or retailopportunities–generallyshareableandmanipulable by computers, while preserving asmany as possible of the(text)web’spropertiesofdecentralisation(there isnopermissionorcoordina-tionrequired), openness(whatwecancommunicateisnotlimited apriori), andsimplicity(thereisstillonlyoneprotocol, HTTP).
Thatis, theSemanticWebisnotaboutmachine understanding, butaboutthetechnologyrequiredtoletcomputersmanipulatetheseURI-basednamesinwayswhichare, as faraspossible, not inconsistentwith thepropertiesof thereal-worldobjectsforwhichtheyareanalogues. Thisanalogy–thefunctionalconsistencybetweenthebehaviourofreal-worldobjectsandthedeclaredrulesformanipulatingtheirnamesinthemachine–isthe‘semantics’intheSemanticWeb.
1.5 …throughalogicalframeworknamedRDF…
Sofar, soabstract. Howdoweactually writedown thestatementthat‘http://dbpedia.org/resource/Jordan_River isa river’? For this, wemustexaminethe ResourceDescriptionFramework(RDF).
RDF isa framework fordescribingthings, andtheirmutualrelations. It’saratherabstractframeworkforthinkingaboutthemechanicshere, andisnot,itself, aspecificdata format orlanguage(thoughthereareformatsandlanguagesintimatelyassociatedwithit).
Inthelogicallanguageinwhichitwasfirstdescribed, RDF israthersim-ple(oratleastcompact, whichisnotnecessarilythesamething). Itseemsbest,therefore, tofollowtheoverallplanofthissectionandgivethecompactexpla-nationfirst, andsubsequentlyglossitatalittlemorelength. So(omittingafewdetailswhichareunimportantatthispoint):
• Theworldisdescribedbyamixtureof resources and literals.
• Literals aresimplystrings, suchas“RiverJordan”or“RíoJordán”.
• Resources arethingsintheworldsuchasindividuals, categoriesofthings(suchaspeopleorrivers), abstractconcepts(‘worldpeace’), thingsontheweboroffit, orindeed anythingthatcanbegivenaname.
• Resourcesaregiven names whichareallsyntacticallyURIs.
• Onecanmake statements aboutresources, allofwhichareoftheformofatriple ofsubject(theresourceinquestion), predicate, andobject. Subjectsandpredicatesarealwaysresources; objectsmayberesourcesorliterals.
RDF version1.1isformallydescribedin [10].14
Tosaymore, wewillhavetowritedownRDF.ThereisnosingleRDF syntax;wewillpickthesyntaxTurtle [11]fromtheavailableoptions. Inthissyntax, wecanwrite
<http://nxg.me.uk/norman/><http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://xmlns.com/foaf/0.1/Person>.
<http://nxg.me.uk/norman/>14Thisreplacestheoriginalversion1.0; therearefurtherdocuments, includinglinkstoprimersat
differenttechnicallevels, at http://www.w3.org/standards/techs/rdf, andabroadercollectionofresourcesat http://www.w3.org/RDF/. TheRDF suiteofdocumentswascomprehensivelyrefreshedinFebruary2014, withthereleaseofnewversionsofmanyofthestandardswhichhadaccumulatedovertheprevious15years.
8
RDF,theSemanticWeb, Jordan, JordanandJordan
<http://xmlns.com/foaf/0.1/name>"Norman Gray".
Therearetwostatements (ie, triples)here, onestatingthat http://nxg.me.uk/norman/ isaPerson, andtheotherstatingthatresource’sname. Ineachcase,wecanseethestructureofthesubject-predicate-object triple, withtheresourcehttp://nxg.me.uk/norman/ being the subject in both cases, and the resourcehttp://xmlns.com/foaf/0.1/Person andliteral "Norman Gray" beingtwo objects.Thefirstpredicate–statingthetypeoftheresource–isoneofthepredicatesde-fined in theW3C‘sRDF ‘Schema’ standard (RDFS) [12]; the type inquestionisonedefinedbytheindependentlydefined FriendofaFriend(FOAF) schema(whichwewillreturntobelow). TheFOAF schemaalsodefinesa‘name’predi-cate, whichgivestheliteral, conventional, nameoftheresourcethatisitssubject.
Thisnotationisobviouslyrathercumbersome. TheTurtlesyntaxstatesthatwecanabbreviatethe‘type’predicatebyjust‘a’, thatwecancollapserepeatedsubjects, andthatwecanprovideabbreviationsforcumbersomelylongURIs; asaresult, themoreidiomaticwayofrepresentingthestatementsaboveisjust
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
<http://nxg.me.uk/norman/>a foaf:Person;foaf:name "Norman Gray".
Thereareafewmoresubtletiestothisnotation, whichcanbefoundin [11], buttheyneednotdetainus, sinceourgoalhereissimplytoconcretizethegeneral-itiesofearliersections, andtoallowustousethisnotationinexamplesbelow.
Youmayobjectthatthisseemsaterriblylong-windedwaytoindicatesome-one’sname, andyouwouldberight. ThereisanothernotationcalledRDF/XML [13]whichisevenmorelong-winded, whichonlyitsmothercouldlove, andwhichwewillpassoverinsilence. AndthereisyetanotherRDF notationcalledRDFa15
whichisconcernedwithembeddingRDF statementswithinotherdocuments,andmostparticularlywithinhuman-readable HTML pages; theintentionisthatthesamedocumentthatahumanreadsdescribing, forexample, alibrarybookora‘retailopportunity’, isatthesametimereliablyinterpretablebymachine.Butwemustnotallowourselvesbedistractedbysyntax.
ThecentralgoalofRDF isnottoprovideadata-transportformat(andhereisagoodpointtoemphasisethatthe‘F’inRDF standsfor‘framework’, andnot‘format’), buttoprovideasetofprimitivenotionswithwhichwecanrepresentawidebreadthofknowledge. Thosenotionsaresimpleenoughthatitisfeasibletotranslateawidevarietyofother, possiblymorenatural, formatsintoRDF –forexampletherowsinadatabasetable, ortheelementsinanXML file; andtheyarewell-enoughdefinedthatwecanuselogicaltoolstoprocesstheresultsanddrawout their implications, whilestandingagoodchanceofpreservingtheirreal-worldmeanings(cf. Sect. 1.6).
Saying thatweneednot storeor transportknowledge in this form isnotsayingthatweshouldnotdoso, anditisperfectlyreasonabletostoreandtrans-portRDF ifthatiswhatisconvenient. ThedatabasesinwhichonestoresRDFiscalleda‘triple-store’, after thesubject-predicate-object triples theycontain;theyarearchitecturallydifferent from the relationaldatabasesof tables, rowsandcolumns, withwhichyoumightbemorefamiliar, andalthoughtheyarenotquiteastechnicallymatureasrelationaldatabases, theyaresteadilyimproving.
Asafinalpoint, thestatementsofRDF arenotrequiredtobetrue, consistent,orevenmeaningful(youcansay‘Truthsmellsofmuesli’ifyouwantto, or‘thepresentKingof France isbald’) – anxieties about such statements are for thehigherlevelsofinferenceandontologiesthatwewillcometoshortly.
15https://www.w3.org/standards/techs/rdfa
9
RDF,theSemanticWeb, Jordan, JordanandJordan
A brieflogicalexcursion, fortheenthusiast: RDF byitselfhasanexceed-inglysimplecoresemantics. Thelinkbetweenaname(intheformofaURI)anditsreference(inRDF terms, a‘resource’)ismadeinnaturallanguage, andisnotexpectedtobemeaningfultoamachine. RDF doesnotdistinguishbetween,forexample, senseanddenotation(inFrege’s terminology), andisuncommit-tedabouttheidentityorotherwiseofresources; thusitispossibletonamebothMarkTwainandSamuelClemenswithURIs, anditisnotpossible, withinRDFalone, tomakestatementsabouttheirmutualidentityorotherwise. OWL on-tologies(seebelow)areabletomakestatementsaboutequivalence, butevenheretheframeworkisuncommittedaboutwhatequivalencemeans(itdoesnot,forexample, distinguishsenseanddenotation), andinonecontextitmay, andinanothermaynot, beuseful for theauthorofastatement/triple tostate thatMarkTwainandSamuelClemens, are‘equivalentresources’, orthemodelJor-danandtheécrivaineKatiePrice, ortheGeonamesandDBPedianamesfortheRiverJordan. Onceanassociationhasbeenmadebetweenanameandarefer-ent, thegoalofRDF istoallowthisassociationtobepreservedacrossmachinesandnetworks, andtoallowmachinesprovidedwithbothRDF statementsandontologies(whichmaycomefromdifferentsources)todiscoverthestatementsentailedbytheontologies, whichhavethesametruthvaluesastheinitialRDF.Forfurtherdetailedlogicaldiscussion, see [14].
1.6 …enhancedbymachineinference…
IntheTurtleexampleabove, I ‘described’theresource http://nxg.me.uk/norman/bysaying that the thingwith thatnameisa foaf:Person, whose foaf:name is“NormanGray”. Theseare two terms froma simple ontology (seeSect. 1.7)called FOAF.16 TheFOAF ontologyhasa namespace http://xmlns.com/foaf/0.1/, meaningthatallofitstypesandpredicatesstartwiththat URI.Itincludesa number of types such as ‘Person’ and ‘Project’, and a number of relationssuchas‘name’, ‘mbox’(emailaddress), ‘publications’andsoon. Anotheron-tology, well-knowninthelibrarycommunity, is DublinCore(DC).This isde-scribedauthoritativelyin http://dublincore.org/documents/dcmi-terms/, andtheRDF expressionof thisvocabularydescribesthenamespace http://purl.org/dc/terms/, sothattheDC ‘title’predicateis, infull, http://purl.org/dc/terms/title, which ismostoftenseenabbreviated to just dc:title. TheDContologyisfocused, atleastinitially, onmetadataforbibliographicandarchivalresources, andincludesrelationssuchas‘title’, ‘creator’, andsoon.
Importantly, thesevarioustermshavenointrinsicmeaning, andRDF placesnorestrictionsonthepredicatesattachedtoaresource, norwhoattachesthem.Thusifyouwishtosay
<http://nxg.me.uk/norman/>foaf:name "Capitania General Bernardo O'Higgins";myprops:shirtType "class C".
thenyouareatperfectlibertytodoso, evenalthoughthefirststatement(statingthatmynameisBernardoO’Higgins)isfalse, andthesecond(apparentlysay-ingsomethingaboutmyshirt)ismeaninglesstome, thoughdoubtlesssomehowusefultoyou.
Theprimarygoalofthesevariousrelationsistodescribearelationshipsuf-ficientlyunambiguouslythatevenamachinecanprocessit. Thusa‘title’intheDC sense–thatis, theobjectofthe dc:title predicate–isalwaysabibliographictitle, suchas dc:title "Transition to the Digital". TheFOAF ontologyalsohasa‘title’predicate, butinthiscaseitisalwaysandonlyaperson’shonorific:<http://nxg.me.uk/norman/> foaf:title "Dr".
16The‘FriendofaFriend’ontologywasoriginallyconceivedofasawayofcreatingapeer-to-peersocialnetwork.
10
RDF,theSemanticWeb, Jordan, JordanandJordan
<http://nxg.me.uk/norman/>foaf:name "Norman Gray";foaf:mbox <mailto:[email protected]>.
<http://example.org/a.n.other>a foaf:Person;foaf:name "Aloysius Naismythe Other".
<http://someone.org/ping>foaf:mbox <mailto:[email protected]>.
<isbn:????> # XXX INSERT BOOK'S ISBN HEREdc:title "Transition to the Digital;dc:contributor <http://someone.org/ping>.
Figure 1: Somestatementsaboutpeople.
Butwecandomorethanthis.ConsiderFig. 1: thisprovidesaFOAF nameandemailaddress for http:
//nxg.me.uk/norman/, statesthat http://example.org/a.n.other isa(FOAF) Per-son, and provides an email address for an otherwise unidentified somethingnamed http://someone.org/ping. Youmightalreadyhaveapictureoftheenti-tiesbeingdescribedhere, intermsoftheirnumberandtype.
Ifyouwere toput thisdescriptions inFig. 1 intoa triple-store, and thenqueryittoretrieveallofthePersonsdescribed, youwouldobtainonlyasinglePerson, namely http://example.org/a.n.other, becausethisistheonlyresourcewhichhasbeenexplicitlystatedtobeoftype foaf:Person. Itisobvioustousashumans, however, thatifsomethinghasanameandanemailaddress, thenit’salmostcertainlyhuman, andtheFOAF specificationindeedmakesthisstip-ulation, thata foaf:mbox propertycanbeattachedonly toa foaf:Person. Informallanguage, wecansaythat foaf:Person isthe‘domain’ofthe foaf:mboxand foaf:name predicates, meaningthatonlythingsoftype foaf:Person areper-mittedtohavethosepredicates(thisisdistinctfromtheuseof‘domain’tonameacollectionofmachinesontheinternet, asinforexample www.w3.org); simi-larly, wecansaythat dc:Agent isthe‘range’ofthe dc:contributor predicate,meaningthatanyobjectofthispredicatecanbededucedtobea dc:Agent. Inconsequence, anRDF triple-store withthisextrainformation(moreprecisely, atriple-store possessedofbasicinferencingcapabilities)can deduce that /norman/and /ping arePersons, andsowouldgivethreeanswers, whenaskedtolistthePersonsitknowsabout.
A furtherthingthathumansknowisthat, whileindividualsmayhavemul-tipleemailaddresses, anaddress isusuallyownedbyasingleperson. FOAFagrees: onlyasingle foaf:Person canbethesubjectofa foaf:mbox property.Butboth /norman/ and /ping havethispropertywiththesamevalue. Thereso-lutionisstraightforward: thesetwoURIscanbededucedtobemerelydifferentnamesforthesamePerson. Primedwiththisextrainformationaboutthedomainof foaf:mbox, a triple-store, askedhowmanyPersonswererepresentedinFig. 1,wouldanswer‘two’. Sucha triple-store cananswermorecomplicatedquerieswithoutgettingconfused. Askedforthetitlesofthingstowhichthepersonnamed"Norman Gray" hascontributed, a triple-store providedwithFig. 1 couldanswer"Transition to the Digital", byfollowingthechainofallowed inferences –itsynthesisesstatements, suchas“http://nxg.me.uk/norman/ isa foaf:Person”,and“/norman/ sameAs /ping”, whichareonlyimplicitintheoriginalsetofstate-
11
RDF,theSemanticWeb, Jordan, JordanandJordan
ments.Theend result, asmaynowbeclear, is that a triple-store canaggregate
informationfrommultiplesources: theinformationinFig. 1 mighthavebeengatheredfromapublisher’swebsite, amembershipdatabase, andanindividual’shomepage, andmayhavestartedoffinanyorallofRDFa, XML,orarelationaldatabase. Onceitisgathered, the triple-store candrawtheconclusionswhicharenotapparentfromanydatasourcebyitself.
Beforegoingontodescribebrieflyjusthowthesevariousrelationsarear-ticulated, wehavea fewfinal remarks tomake. Firstly, whenwesay, above,that“a foaf:mbox propertycanbeattachedonlytoa foaf:Person”, wearesay-ingsomethingdifferentfromtheapparentlysimilarstatementwemightfindinanXML schema. InXML,suchastatementisasyntacticalone, sayingthatan‘mbox’element(inthatexample)is permitted tobeattachedonlytoaPerson, sothatifitisattachedtosomethingwhichisn’taPerson, oratleastisn’tknowntobeaPerson, thenthisisanerror. InRDF,incontrast, thestatementmeansthatanythingtowhichthe foaf:mbox propertyisattachedmustbe deduced tobeaPerson. Itisnotevenillegitimate(althoughitisinfactuntrue)tosaythat
<http://nxg.me.uk/norman/> foaf:name "Norman Gray";dc:bibliographicCitation "Gray (2014)".
even though thedomainof dc:bibliographicCitation allowsa triple-store todeducethat /norman/ isof type dc:BibliographicResource (that is, somethinglike a book, that canhave ‘Gray (2014)’ as its citation). Only if the store issubsequentlytoldthat foaf:Person and dc:BibliographicResource are disjointwillitraiseanyobjection, andthatobjectionwillnotbeasyntacticone, buttheannouncementofalogicalinconsistency.17
Secondly, wemightreasonablyobjectthat, intherealworld, alibrary(whichisnota foaf:Person)mighthaveanemailaddressforenquiries, andthatarole-basedemailaddressmightinfactbesharedbymultiplepeople. Thisisaccurate,buttherestrictionsweplacedonthe foaf:mbox propertyabovearepartoftheapproximationtotherealworldthatwemustmakewhenwedescribethatworldintermssimpleenoughforacomputer. Thelibrary’semailaddressisthereforenottheobjectofany foaf:mbox property, forpreciselythereasonthatalibraryisnota foaf:Person; andsimilarlyfortherole-basedaddress, forpreciselythereasonthatitisincompatiblewiththedefinitionof foaf:mbox foranemailad-dresstohavemultipleowners. A different‘people’ontologymighthavedifferentrestrictions, andsopermitdifferentimplications.
Anotherlogicalexcursion: Above, wementionedtheideathat(someof)theRiverJordanislocatedwithinJordan, asopposedtobeingpartofit. Thetruthofthisstatementsdependsinpartonwhat, precisely, ‘ispartof’means, andsopotentiallydrags inasetofphilosophicalconsequences, andindeedpoliticalandculturalones; buttheonlyonesthatmatterstothecomputerarethelogicalconsequences: giventheinformationthat‘A ispartofB’, whatotherstatementsisitpermittedtosynthesise? ThatisalargelytechnicalproblemconcerningwhatisPartOf or sameAs isintendedtomeaninaparticularsystem, anditcorrespondstothehumanproblemofarticulatingwhatoneknowswithinaformalsystem,inawaywhichallowsthemachinetodrawtheconclusionsoneexpects. Itispossibletospendagooddealofwork-dayeffortarguingaboutjustwhat‘is’is.Thesequestionsarewhatpermitanontologytohavesufficientstructuretomakeinferencespossibleanduseful, andworkingoutthenecessarybalancebetweenapproximationandexpressivenessiswhatmakesthedesignofontologieschal-lenging. Allthatsaid, suchexoticquestionsarelargelyirrelevanttothe users of
17Saying that the twoclassesare disjoint is to say that therearenoobjectswhichare inbothclasses; thisisincompatiblewiththededuction, above, that /norman/ is inbothclasses, anditisthisinconsistencythatmachinewillobjectto.
12
RDF,theSemanticWeb, Jordan, JordanandJordan
anontology, aslongastheyarebroadlyawareofthepotentialdislocationbe-tweenstatementsintheformallanguageandtimelessstatementsabouttherealworld.
ThuswecanseethatRDF statements, andtheontologieswhichstructurethem, arenotreally, ornotonly, about truth, otherthanintheabstractlogicalsensethatvalidargumentsmustpreservethetruthvaluesoftheirinputs. Simi-larly, wearenotconcernedwithdevelopingasingletrueontologyoftheworld,orevenwiththeclaimthatanontologywhichisusefulinonecontext, willbeusefulorevenmeaningfulinanother, orthattwoontologiesdescribingoneob-jectwillnecessarilybecommensurable. Developingontologiesisatypeofpro-grammingactivity, andthetypesandpredicates, andtherelationshipsbetweenthem, mustbechosentomatchthethingsbeingmodelled, toacceptthepreci-sionofwhatcanbesaidaboutthem, andtosupporttherangeofconclusionsonemayhopetodraw.
FOAF andDC arerelativelysimpleontologies, withratherlightweightcon-straints; theyarethereforeverywidelyapplicablefordescribingpeopleandarte-facts, respectively, and, inconsequence, areveryusefulforintegratingotherwiseratherdisconnectedinputdatasets. Incontrast, alargeontologysuchastheGeneOntology [15, 16]isconcernedwithalogicalstructurewhichissufficientlyin-tricate, andsufficientlycloselymappedtonature, that theontologycandrawscientificallyvalidconclusions, atthecostofbeingconsiderablyhardertouse.
Intheexamplesabove, wehavetalkedofvariouswaysofaddingstructuretothelistoftypesandpredicatesinanontology; nowisthetimetodiscussbrieflyhowthisactuallyhappens.
1.7 …basedonOWL andotherontologies.
Theinternalstructureofontologiesisthemosttechnicallyintricatepartofourstoryhere. Fortunately, itisnotnecessarytodiscussthatstructureindeepdetail:ourgoalinthissectionissimplytoconcretizetherathergeneralstatementsaboutontologystructureabove, and togivegeneralpointers towardsmoredetailedadvice.
There’sawell-knowndescriptionofanontologybyThomasGruber[17],18
inwhichhedeclaresthatanontologyis
“aformalspecificationofasharedconceptualization”
Thatappearsopaqueatfirstglance, butinfactit very conciselypullstogetherallofthekeyconcepts. Thatistosay, anontologyis
conceptualization asetofconcepts
shared …whichatleasttwopeopleagreeabout
specification …andwhichhasbeenwrittendown
formal …inamachine-readableway.
Like‘semantics’, theterm‘ontology’hasaforbiddingaspect. Inthiscontext,however, itsimplylabelsoneendofaspectrumofwaysofstructuringinforma-tion. InDeborahMcGuinness’s ‘semanticspectrum’ (in [18], andredrawninFig. 2)sheillustratesarangeof ‘sharedconceptualizations’, andsuggests thattheterm‘ontology’ismostnaturallyrestrictedtothosemoreformalstructurestotherightofthecentralline, dividinginformalfromformal‘is-a’relations; Gru-ber’sdefinitionappliestomostofthisspectrum.
18ThisisthebynowapparentlytraditionalslightmisquotationofGruber’s‘explicitspecification’as‘formalspecification’.
13
RDF,theSemanticWeb, Jordan, JordanandJordan
Catalogue/ID
Terms, glossary,folksonomy
Thesauri(‘narrower’
relation)
Informal is-a
Formal is-aValue
restrictions
Generallogical
constraintsProperties
Figure 2: Thesemanticspectrum, simplifiedfrom[18]
First, attheextremeleft, simplylistingidentifiersforasetofobjectsisasortofprimitivesharedconceptualization.
Next, andmarginallymore formally, a controlledvocabulary is simplyadeliberaterestrictionofthetermsweusetodescribetheworldinsomecontext.Inthesamegeneralcategory, a folksonomymightbedefinedasa‘collaboratively’or‘loosely’controlledvocabulary, inwhich, ideally, someconsensusemergesonthetermstousetodescribesomeuniverseofresources. Inbothcases, though,thereisnorealstructureto(therelationshipbetween)theterms.
A thesaurus is acontrolledvocabulary representingconcepts, plus somedeclaredrelationshipbetweentheconcepts. Mosttypically, athesaurusisusedfor, oratleastassociatedwith, informationretrieval: youmightgointoalibraryandasktobedirectedtobooksaboutcats, orwhatwemightcalltheconceptof‘Cats’, andalibrarianmightbeabletotakeyoutoashelfofusefulbooks. Ifyou’reinterestedinspecifically‘domesticcats’, they’reonasmallerpartoftheshelf; ifyou’reactuallyinterestedin‘mammals’ingeneral, theymightbespreadoverafewshelves. Therelationshipsbetweenthese broader and narrower terms–forthisisthemostcommonstructuralrelationinthesauri–isapracticalone:any information retrievedbyuseofagivenconceptwillalsobe retrievedbyuseofthe‘broader’concept. Theseare‘is-a’relations, inthesensethateverydomesticcat isa cat, andeachcat isa mammal; butthisisan‘informalis-a’, inthesenseofFig. 2, sinceinthelayoutofapet-shop, forexample, orthelayoutofitssupplier’scatalogue, ‘catcollars’mayreasonablybea‘narrower’conceptwithrespectto‘cat’withoutanysuggestionthatacatcollarisatypeofcat, andalibrarybookaboutcatsmaybelabelled‘cat’(orforexamplewiththeDeweynotation636.8)withoutanyoneoranythingbeingentitledtoconcludethatthebookitselfwouldbeimprovedbytheinsertionoffish.
The W3C standardforthesauriistheSimpleKnowledgeOrganizationSys-tem(SKOS),19 whichformalisesthesebroader/narrowerrelationsandaverysmallsetoffurtherones, inadeliberatelysimpleframeworkwhichisdesignedtobebroadlydeployable.
Next, andsteppingoverthedividinglineofFig. 2, thesimplest ontologiesarethosewherethishierarchyof‘is-a’relationsisindeedexpectedtoholdinalogicallyusefulsense, andwherelabellingsomethingwithanontologytermis indeedanassertion that that thing is an instanceof the labelled type, andthereforeofanysupertype. Forexample, theLinnaeannameforcatsis Feliscatus;thisisaspeciesofthegenus Felis inthefamily Felidae, intheorder Carnivora, allthewayuptothekingdom Animalia; andthishierarchyisexplictlyintendedtoallowmetoconclude, onbeingpresentedwithsomethinglabelled Feliscatus,thatthatthingisindeedacatratherthanabookaboutcats, andthatthe(bynowindignantlywriggling)subjectisacarnivorousanimal, withclaws.
This‘formalis-a’istheintendedinterpretation–thatis, theintended‘seman-
19http://www.w3.org/2004/02/skos/intro
14
RDF,theSemanticWeb, Jordan, JordanandJordan
tics’–ofthestatementinFig. 1 thatthethingnamedby http://example.org/a.n.other isof‘type’ foaf:Person. Theusualwordforthetypeinthiscontextisclass, andthe(FOAF) URI http://xmlns.com/foaf/0.1/Person (usuallyabbrevi-atedtojust foaf:Person)isthenameoftheabstractclassofpersons; individualpeopleare instances ofthisclass.
Thenextstepalongthesemanticspectrumistobeabletodeclarethatin-stancesofsomeclassesmaypossess properties (theterm predicates isgenerallyinterchangeable)suchas foaf:name. Anontologyofthistypemaydeclarethedomainandrangeofapredicate, restrictingthetypeofsubjectorobjectthatthepredicatemayhave. A triple-store canthenusethisinformationtomakecertaindeductions, asweillustratedinSect. 1.6 onp.11. However, thereisnoway,inthistypeofsimpleontology, to require thataninstanceofaclassmusthaveaparticularproperty(I canbea foaf:Person withouthavinganemailaddress,forexample, bizarrelyvictorianthoughthatnotionmayseem), andareasonerwhichdoesnotknowmyemailaddress, becauseithasn’tbeenprovidedwithanfoaf:mbox propertyforme, isnotentitledtoconcludethatI thereforedon’thaveone.20
Thesimpleontologicalframeworkrepresentedby RDF Schemas(RDFS) iscapableofexpressingnomorethanthis: thatclassesexistandthatsomeclassesaresubclassesofothers, thatresourcescanbeinstances(ormembers)ofclasses,thatpropertiesexistandhaveclassesasdomainandrange. Thepayoffisthatreasoners21 whichimplementthislogiccanbesimplerandfasterthanareasonercapableofmoreintricacy. ReferringbacktothediscussionaroundFig. 1, suchareasonerwouldbecapableofconcludingthat http://nxg.me.uk/norman/ isa foaf:Person (becausethatresourceisassertedtohavea foaf:name, implyingthat /norman/ is in thatproperty’sdomain), and it couldconclude that http://someone.org/ping isa dc:Agent; butitcouldnotusetheidentityofthe foaf:mbox properties toconclude that theseare the sameperson, and it couldnotdetectanyinconsistencyinasubsequentassertionordiscoverythat /norman/ isa dc:BibliographicResource.
Togetsuchextrareasoningpower, itisnecessarytogobeyond RDFS totheright-handsideoftheontologyspectrum, towardsmoreelaborateframeworkssuchasOWL.TheWebOntologyLanguage(OWL22) comesinseveralstandardvarieties–OWL Lite, OWL DL,andOWL Full–andfurthervarietiesassociatedwithparticularimplementations(see [19]andreferencesat http://www.w3.org/standards/techs/owl). Thedifferencebetweenthesevarietiesisthatsomearemore expressive, inthesensethatitispossibletoarticulatemorecomplicatedrelationshipsbetweenresources; thepracticaltradeoffisthatthemoreexpressivevariantsareharderormoreexpensivetoimplement. ItisinOWL,andnotRDFS,thatitispossibletosay
dc:Agent owl:disjointWith dc:BibliographicResource.
A reasoner, onceprogrammedtocalculatethelogicalimplicationsofthe owl:disjointWith predicate, canthereafterconcludethatifaresourcesuchas http://nxg.me.uk/norman/ hasbeenassertedordiscoveredtobeinbothclasses, thenithasdetectedalogicalinconsistency. Thismightformpartofalongerchainof reasoning, or itmaybepracticallyuseful fordiscovering latent errors ina
20This‘permissionforignorance’isknownasthe‘openworldassumption’. Itisterriblyimportantforthemathematicallogicwhichareasonerimplements, butmostnon-specialistusersofRDF canbeforgivenforfeelingittobearatherabstractdetail. Itisincontrasttothe‘closedworldassumption’builtintomoretraditionalrelationaldatabases: ifthereisnosalarybesidemynameinthepersonneldepartment’sdatabase, thenthesystemwillconcludethatI amapersonwithoutasalary, asopposedtoapersonwhosesalaryisunknown(whichistheonlyconclusionpossibleinanRDF analogueofthisdatabase); inthiscase, thepersonnelofficeiscertainthey’renotgoingtopayme.
21A ‘reasoner’ in thiscontext is apieceof softwarewhichcan implement the logicalcalculusindicatedbyanontology.
22Yes, thestandardisindeedcrackingaWinnie-the-Poohjokehere.
15
RDF,theSemanticWeb, Jordan, JordanandJordan
database: ifamappingdatabasediscoversa feature that ismarkedasbothariverdeepandmountainhigh, thenitscuratorscantherebydiscovertheyhavesomerepairingtodo.
1.8 Sowhat, briefly, istheSemanticWeb?
Whatwehaveendedupwithisthis: inRDF wehaveaflexiblemodelforrep-resenting relatively simplestatementsabout theworld, inaprimitive subject-predicate-objectpattern. Thisframeworkhasanumberofsyntaxeswhicharemoreorlesssuitableinparticularcontexts; thesearemostprominentlyRDF/XML,TurtleandRDFa, buttherearealsosomeemergingalternativessuitableforJSON.Despitethisabundanceofsyntax, itisthesimplicityandverybroadapplicabilityoftheprimitive triples thatiskey, sincealmostanyreasonablystructureddatasourcecanbewrangledintoRDF form, onewayoranother, andthismakesRDFanexcellentmechanismforcombiningheterogeneousdatasources.
Animportantstepinthiscombiningmechanismisthecarefully-struckbal-ancebetweenthelogicalprecisionofthecalculusofresourceswhichontologiesexpress(givenasetofRDF triples, whatfurthertriplesarelogicallyentailed?),andboththeflexible imprecisionofthenatural-languagelinkbetweenURIsandtheirdenotations, andthevariableprecisionofontologicaldesigns, whichmayrangefromthebroad-brushutilityofFOAF (‘only Personshaveemailaddresses’)totheintricacyoftheGeneOntology.
Furthermore, theveryclosetechnicalanalogybetweentheSemanticWeband theWebof Strings reassuresus that thepropertieswhichmade thewebsuccessful (itsopenness, fault-tolerance, andsoon, asdiscussed inSect. 1.2)willsupporttheSemanticWeb, too.
Onceinformationis inRDF form, andstoredina ‘triple-store’, itcanbequeried23 eitherdirectly, orwiththeassistanceofa‘reasoner’, whichsynthesisesextratriplesbasedontheinferencesobtainedfromtheoriginallyassertedtriples,inthepresenceoftheextrastructureprovidedbyasimpleoracomplex ontology.
Wecouldsaymoreaboutthe‘SemanticWeb’. Wecouldtalkaboutsomeofthedetailsof triple-stores andqueryengines, butthatwouldveryquicklyim-merseus ina levelof technicaldetailwhichcanvergeon the impenetrable,andwhichisinanycasestillinactiveresearch-leveldevelopment. Wecouldintroducesomeof thefeaturesofontologydesign: thoughthis isnowlargelystable, itwouldtakeseveralmorechapterstocover, withoutimmediateintellec-tualprofit. Wecouldtalkaboutdeployment, butmanyofthemaindeploymentsofSemanticWebtechnologyareinback-endsystemssupportingcomplicateddata-integrationprojects, anddonotmakeforvividillustration.
Instead, thegoalofthischapterhasbeentoprovideanintroductiontothecoreconceptsofthe‘SemanticWeb’, atalevelandtoanextentthatwillallowthe reader tounderstand thegoalsand starting-pointof aprojectusing thesetechnologies, andtounderstandthewayinwhichthe‘Semantic’WebrelatestotheWebofStringsofourlasttwodecades’familiarexperience. Formoredetails,the W3C’s‘DataActivity’24 providesusefullinkstofurtherreading; andforsomepracticaldetails, includingarathermorescepticalaccountoftheSemanticWebthanI haveproducedhere, see [21].
1.9 WhereistheSemanticWeb?
ItishardertopointtotheSemanticWebthantothetextualweb. AlthoughweencounterthetextwebwheneverwelookatWikipedia, bookaflight, orsearchforcatvideos, thesemanticversionremainsinthebackground.
23ThestandardW3C querylanguageforRDF iscalledSPARQL [20], whichistoa triple-store whatSQL istoarelationaldatabase.
24http://www.w3.org/2013/data/
16
RDF,theSemanticWeb, Jordan, JordanandJordan
TheFrenchNationalLibrary(BnF) hasalargecatalogue, splitintomultipleheterogeneousparts, usingvariousstandards(suchasEAD andMARC),andmak-inglinkstoadozenorsoexternalsites. A SemanticWebapproachhasmadethisinformationavailablethroughahomogeneousinterfacewhichprovidesintelligi-bilityandsyntacticalconsistency, withoutsacrificingtheopen-endednesswhichisnecessaryifsuchacomplicatedandlong-accumulatedinformationsourceistobemadefullyavailable [22]. Thiscatalogueisnowinusebyvariousapplica-tionsandsmallerlibraries, orenhancedandexpandedbymorespecialisedcata-logueselsewhere. Goingfromthecataloguetotheinstance, [23]describesasys-temwhichmakesavailable, inmachine-readableform, elementsofthesemanticcontentofpapersinPubMedCentral. Reference [22], isinthe‘in-use’trackofthatyear’sESCW conferenceproceedings, whereyoumightfindotherillustra-tiveexamples; thereare further illustrations in thecollectionofopendatasetsmaintainedbytheSemanticWebJournal.25
Atthetimeofwriting(2014), it isnotyetclear, atleasttome, justwhentheSemanticWebwillbemanifestly‘working’, norhowitwillgetthere; itisnoteventransparentlyclearwhat‘working’wouldmean. Thehighly-integratedscenarioof [8]isstillinthefuture, notbecauseitisinfeasible, butmostlikelybecauseitispredicatedonagooddealoflarge-scalecoordinationwhichisdiffi-culttoforceonadecentredweb. LinkedData(seeSect. 3, below)showsapathintothefuture, butisnotanend-point. Butperhapsweareaskingtoomuchforagranddesign. Theevolutionfromwebpages, toblogs, toTwitterisobviousonlyin retrospect; the fact that (for example) Instagram, Facebook, Twitter, Flickr,andup-to-dateconferenceorganisersallusehashtagsisexplicable, butcouldneverhavebeenpredicted; conversely, itisrathersurprisingthatwearestillus-ingSMTP-basedemail32yearsafteritwasfirststandardised, andinthefaceofnumerousclaimsforkilleralternatives. PerhapsthemostlikelyfuturefortheSe-manticWebmatchesthatofArtificialIntelligenceresearch, eventhoughthefor-mercommunityhasalwaysbeennervousofthecomparison: AI neverobviouslydelivereditsgrandpromises–itnever‘worked’–butinaworldofself-drivingcars, face-recognition, andpocket-sizedmachine-translationdevices, it'sclearthatit'sbeenquietlyengineeringitswayintosuccessfuldeploymentfordecades.
Thetwofinalremarkstobemadeconcern, inthefirstplace, Web 2.0, whichissometimeslinkedtotheSemanticWeb, butwhichinfacthasalmostnothingtodowithit; andinthesecondplacetheLinkedDataparadigm, whichisarguablythefirstappearanceoftheSemanticWebinmorenearlyeverydayexperience.
2 Web2.0
Theterm‘Web2.0’hasdisorientinglymanymeanings. Tosome, itreferstotheclusterofmid-leveltechnologieswhichletservicessuchasGoogleMapsorFace-bookactasan‘applicationinsideawebpage’; toothersithasmeaningswhichcanbegroupedundertheslogan‘theread/writeweb’; anothercampseesitasamarketingexpressionandisdividedaboutwhetherthisisanobviouslygoodoranobviouslybadthing; andforyetothersitisaproxyforabroadchangeinsocietytowardsamoredecentred, orengaged, orempowered, oropen-licensed,orotherwiseutopianfuture. Thesetermsarenotremotelyequivalent: insomecasestheymaynotevenintersect, sosomethingasobviouslynext-generationasWikipediaisWeb 2.0undersomedefinitionsbutnotothers.
Thetermisreallyonlyusefulasanadjective, todenotesomething–anything–thatismorethanacollectionofstaticwebpages. However, evensayingthat
25http://www.semantic-web-journal.net/accepted-datasets
17
RDF,theSemanticWeb, Jordan, JordanandJordan
muchbegsthequestion: whatisitthat’sso wrong withstaticwebpages(whichwe’renowobligedtocallWeb1.0)? Indeed, dependingonyourconceptionofwhat‘Web2.0’means, podcastsarefirmlyWeb1.0, andifyouturnoffvisitoreditingonyourwikipages, orcommentingonyourblog, thenthosebecomeWeb1.0, aswell.
The term ‘Web 2.0’ appeared as the title of a conference series run byO’ReillyMediain2004, andthetermspreadfromtheretowiderandstillva-guerusage. Thereisnotechnicaldifferencebetween‘Web 2.0’andtheoriginalversionoftheweb; instead, thetermreferstoaloosecollectionoftechniquesandevenattitudes, whichbecameincreasinglypopularinthefirsthalfofthatdecade, whichwereheavilyorientedtowardssupporting, encouraging, andex-ploitinguser interactivity ontheweb.
Above, I characterisedthewebasaspacewhereanyonecanjoinin, butadmittedthatinitiallythiswasmoretrueinprinciplethaninfact. Thewebofthe90swas, formost, aread-onlymedium: thewebwasfulloftextandimages,buttheonlywaytotalkbackwasthroughemail, orafewnow-arcanespacessuchasUsenet. Thoughbothwereeffective, andthoughUsenetisstill, in2014,NotQuiteDeadYet, theyarehardlyafree-flowingworld-wideconversation. Inthebroadest senseof the term, ablog is aWeb 2.0 thing: it’seasy to setupforoneself, andifeventhatis tooinconvenient, it’seasytouseablogsetupbya service suchasWordpress. With thatstep taken, it’smerelyamatterofcreativediligencetobroadcasttotheworld, andtodispute, declaim, decryandgenerallybickerincommentthreads. Broadlyblog-like thingssuchasTwitterandFacebookbecomemucheasiertoconceive, asbothproviderandconsumer,oncetheideaofablogservicepluscommentshasbecomepartoftheculture.
Withthatsteptaken, othertechnologieswereabletocollide; ‘AJAX’wasonesuch. The‘J’standsforJavascript, whichoriginallyappearedasalanguagesupportingsimplescriptingwithinawebbrowser–providinganelementofdy-namisminwebpages, sothattheycouldadjustthemselvestothebrowser’sen-vironment rather thanappearingasa simpleblockof text sent froma server.Thislanguage, andthesetofpracticeswhichAJAX represents, havegrowntothepointwhereawebbrowsermaynowbeseenasaseparateprogrammingecology. Somewebpages, suchasaGoogleMailpage, arriveasaminimalamountof HTML,enclosingaJavascriptprogramwhich, runningentirelywithinthebrowser, interrogatesaserver, andsynthesisesfromscratchtheHTML whichthebrowserfinallydisplays. Fromaprotocolpointofview, thereareminimaldifferencesbetweentheearliestwebpagesat http://info.cern.ch/hypertext/WWW/TheProject.html (witha‘lastmodified’dateof3December1992), andthetwitter.com homepagewhichcreatesthepageprogrammaticallyandontheflyin thebrowser: bothuseHTTP to transportmaterials fromaserver, andbothofferHTML forthebrowsertorender.26
Oneoftheotherlong-latentpossibilitieswhichwasexploredintheearly2000swastheideaofuploadedandsharedcontent. Flickr(https://www.flickr.com)wasoneofthefirstbroadly-knownservicestosupportusers’uploadingandsharingimages. Atatechnicallevel(again), thisisnotmuchdifferentfromblog-ging, butFlickrandthesocial-bookmarkingsite delicious.com areinterestingforthischapter’spurposesbecausetheywereamongthefirstwidespreadservicestosupport tagging. Flickrallowedusers toaddsimple labels topictures, andDeliciousalloweduserstosimilarlylabelwebpages; thereispresumablysomeindirectlinkherewiththerapidadoptionofTwitterhashtags. Thesetags–whichwereintendedtosupportbothcategorisationandsearch–weretakenfroman
26Indeedthereissomethingofaparadoxhere: thewebof2010andthewebof2000lookhugelydifferent, totheextentthattheyseemtocomefromdifferentworlds. Butthesedifferencesaretodowithnewsocialpossibilities, andwiththebrowser’snewandalmostentirelyunpredictedprogram-mingecology, ratherthanarisingfromanychangeintheunderlyingnutsandbolts; andthecloserwelookatWeb 2.0, themoreoutoffocusitseemstobecome.
18
RDF,theSemanticWeb, Jordan, JordanandJordan
unrestrictedvocabularybutwiththeexpectationthatuserswouldspontaneouslycooperatetochoosebroadlycompatibletags, inalightweightsharedsemanticenvironmentwhichwassoonlabelleda‘folksonomy’(aswebrieflymentionedabove). HoweverthisisasfartowardstheSemanticWebasWeb 2.0hasevermoved, andso, despitethesuggestionofthe‘Web 2.0’nameandsomeoftheaccompanyingrhetoric, Web 2.0hasratherlittletodowiththeSemanticWeb.There'smoretoexploreintheprogrammingecology, butinwebterms, Web 2.0isratheradeadend.
3 LinkedData
EvenacknowledgingthattheSemanticWebwillprobablyremainsomewhatob-scure, onecanaskhowbroadlyitisdeployed, ortowhatextentitsdeploymentsarereachable. Aswediscussedabove, theSemanticWebisstilllargelyusedin‘server-side’applications, ratherthanbecomingpartofthecommerceoftheopenweb, aswasoriginallyanticipated. PartofthereasonforthisisthattheSemanticWebtechnologystackischallengingtouse: thetoolsareofvaryingmaturity,anduseabroadrangeofunderlyingtechnologies, sothatdeploymentsarestilltoalargeextentbespoke. Evenwhensimplymakingdataavailable, ratherthanattemptingtoprocessit, makingdataweb-readymayrequireaquiteprofoundengagementwiththeSemanticandtextualwebs’conceptualfoundations; manysuchdeploymentsarestill, evennow, worthyofanacademicpaperdescribingtheexperience. Fewnon-specialistprojectscanmakesuchinvestments.
Analternativeapproach is touse the LinkedData paradigm.27 Atheart,the linkeddataparadigm is the SemanticWeb, butwithapractical focusontheirreducibleminimumrequiredtogetdataontothewebinawaywhichiscompatiblewiththegoalsandpracticesdescribedinSect. 1.
Quoting [24], thelinkeddataprinciplesare:
1. “UseURIsasnamesforthings.
2. UseHTTP URIssothatpeoplecanlookupthosenames.
3. Whensomeonelooksupa URI,provideusefulinformation, usingthestan-dards(RDF*, SPARQL).
4. IncludelinkstootherURIs. sothattheycandiscovermorethings.”
Notably, theseprincipleshavemoretosayaboutHTTP andURIs–thatis, aboutthe mechanics ofretrievingthisinformation, andbyextensionthemechanicsofmakingitavailable–thantheysayaboutthesemanticsoftheinformationbeingdistributed.
Together, thesesay(ineffect)thatthe‘linkeddataweb’isjustlikethewebofstrings, familiar tousall, except that therawmaterialsarenot HTML files,butfiles inoneorotherRDF syntax (soRDF/XML,Turtle, orRDFa). All fourpointsareessentiallyadaptationsofthedesignprinciplesofSect. 1.2 andbestpracticessuchas [25], whichwere, asitturnedout, soveryeffectiveinmakingtheweb(ofstrings)soverysuccessful. ThethirdpointstatesthatlookingupaURI shouldprovideuseful(RDF) information toamachine, butisalsotakentosuggestthatputtingaresource’sURI intoawebbrowsershouldprovidehuman-readable(HTML) informationaswell–thisprovidesonebridgebetweenthetwowebs.
27Theword‘paradigm’isprobablythebesttermingeneral, totheextentthatitrepresentstheresultofacceptingtheinsightofasetofratherhigh-levelprinciples. Howevermakingtheseprinciplesconcrete is intricateenough that ‘LinkedDataBestPractices’sometimesseemsmoredescriptive;andwhendefendingtheprinciplesagainstotherapproaches, theword‘dogma’springsquicklytomind.
19
RDF,theSemanticWeb, Jordan, JordanandJordan
Figure 3: TheLinkingOpenDataclouddiagram, asofApril2014. Althoughthe text andmost of the interconnections are too small tomakeout at thisscale, thetwolargestcentralblobs–whichactasrichsourcesofconsensusnamesforobjectsintheworld–representDBPediaandGeoNames. Thegreen-colouredclusters representgovernment (left)andbibliographicdata (right);otherclustersrepresentprovidersoflifesciences, geographic, socialnetwork-ing, media, user-generated, and linguistic data. For details and credits, seehttp://lod-cloud.net.
Althoughthetraditionalwebisgenerallythoughtofasdeliveringhuman-readablepages, ithasfromtheverybeginningbeenusedfordeliveringmachine-readablecontent, suchasdatafilesforanalysisorPDF filesforprinting. Thisisnot themachine-readabilitywemean in thiscontext. Justasahumanmightread aWikipedia article, internalise the results, andclickona link formoreinformation, alinkeddataclient–typicallyacomponentofalargerapplication– isexpected toprocess thesemanticcontentof thedata itfindsatoneURI,andthenfollowfurtherlinksforfurtherinformation. GiventheRDF ofFig. 1,forexample, a linkeddataclientmight store the information listed, and thenfollowthelink http://someone.org/ping toseeifthereisanyfurtherinformationavailablethere. Itisthisfinalstep, whichletsthemachine‘followitsnose’thatrepresentsthekeyinsightofthelinkeddataparadigm, andwhichisthemachineanalogueof‘wastinghoursonWikipedia’.
Thenetworkoflinksbetween‘linked-data’-styledatasourcesisillustratedinFig. 3. A largefractionofthedatainthiscloudisintheformofSKOS vocab-ularies, orusesthesimpleFOAF orDC ontologies; butwhiletheseareusefulforcoordinationbetweendatasources, thereisnorestrictionontheontologieswhichareusedinfact.
ThearchetypalLinkedData(client-side)applicationisasimpleapplication–perhapsevenrunningentirelywithinabrowser–whichreadsthecontentsofaURI,interpretsandperhapsdisplayswhatpredicatesitrecognises(whichwillprobablyincludeatleastFOAF),andletsausermoveontofurtherrelatedinfor-mation. ThearchetypalSemanticWebapplication(ifsuchathingexists)mightuseSPARQL toqueryalargeandsemanticallyrichdatabase, possiblyrequiringafairamountofinferencingonthepartoftheserviceandwithinitself, andre-quiringsignificanteffortonthepartofitsauthor, tounderstandtheontologyorontologiesinwhichtheinformationisencoded. Thoughthesewillbevisiblydifferentapplications, andoneisaloteasiertoimplementthantheother, itishardtoarticulateafundamentaldifferencebetweenthem.
20
RDF,theSemanticWeb, Jordan, JordanandJordan
Therearefurthercompactdetailsin [26], abook-lengthdiscussionwithapractical focus in [27], andpointers tocurrent informationat http://www.w3.org/standards/semanticweb/data.
4 Conclusion
Onecouldderivetheimpression, fromthelongdescriptionofSect. 1, thattheSemanticWebiscomplicatedandarcane. Itcanbe, buttherudehealthofthelinkeddatacloudshowsthatthatcomplicationisinthebackgroundofasimpleandrobustmeansofdistributingmachine-intelligibleinformation. Whilewemaynotbequiteatthestageoflettingourcomputersrunamokwithourcreditcard,ororganiseourlivesasthevisionof[8]imagines, wehavenowconfidently, ifstillratherdiscreetly, steppedintoawebofdata.
Acknowledgements
I ammostgratefultoSusanStuart forexactinganddetailedcommentsonthedraftsof this chapter, tomembersof the [email protected] list for ‘in-use’references, andtoChrisBizerformakingavailableanearlycopyofthe2014LinkedDatacloud.
Glossary
AI ArtificialIntelligence.
DC DublinCore: asetofcoremetadatatermsdevelopedbyandforthelibrarycommunity; see http://dublincore.org/.
FOAF FriendofaFriend: avocabularyforattributesof, andrelationshipsbe-tween, people; see http://www.foaf-project.org/.
HTML HypertextMarkupLanguage; see http://www.w3.org/html/wg/.
HTTP HypertextTransportProtocol: theunderlyinglanguageoftheweb; see [1].
RDFS RDF Schemas: lightweightontologiesforRDF;see [12].
RDF ResourceDescriptionFramework: see [10].
URI UniformResourceIdentifier: theformofaresourcename(almostthesameasaURL);see [2].
W3C WorldWideWebConsortium: theorganisationwhichdevelopsandstan-dardisesprotocolsfortheweb; see http://www.w3.org.
triple Thetripleofsubject, predicateandobjectwhichisattheheartoftheRDFmodel.
triple-store A databasedesignedtostoreRDF triples, ratherthanthetabulardataofamorecommon‘relational’database.
References
[1] Roy T Fielding, J Gettys, J Mogul, H Frystyk, LarryMasinter, P Leach, andTimBerners-Lee. Hypertexttransferprotocol–HTTP/1.1. RFC 2616, June1999. URL: http://www.ietf.org/rfc/rfc2616.txt.
21
RDF,theSemanticWeb, Jordan, JordanandJordan
[2] TimBerners-Lee, Roy ThomasFielding, andLarryMasinter. Uniformresourceidentifier(URI):Genericsyntax. RFC3986, January2005. URL:http://www.ietf.org/rfc/rfc3986.txt.
[3] TimBerners-LeeandRobertCailliau. WorldWideWeb: Proposalforahypertextproject. Webpage, CERN,November1990. URL:http://www.w3.org/Proposal.html.
[4] TimBerners-Lee. Informationmanagement: A proposal. Technicalreport,CERN,March1989. URL: http://info.cern.ch/Proposal.html.
[5] TimBray, JeanPaoli, andC M Sperberg-McQueen. Extensiblemarkuplanguage(XML) 1.0. W3C Recommendation, February1998. Firsteditionof http://www.w3.org/TR/REC-xml/. URL:http://www.w3.org/TR/1998/REC-xml-19980210.
[6] OraLassilaandRalph R Swick. Resourcedescriptionframework(RDF)modelandsyntaxspecification. W3C Recommendation, February1999.URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/.
[7] VannevarBush. Aswemaythink. TheAtlanticMonthly, July1945. URL:http://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/.
[8] TimBerners-Lee, JamesHendler, andOraLassila. TheSemanticWeb.ScientificAmerican, 284(5):34–43, May2001.doi:10.1038/scientificamerican0501-34.
[9] JensLehmann, RobertIsele, MaxJakob, AnjaJentzsch, DimitrisKontokostas, Pablo N.Mendes, SebastianHellmann, MohamedMorsey,PatrickvanKleef, SörenAuer, andChristianBizer. DBpedia–alarge-scale, multilingualknowledgebaseextractedfromwikipedia.SemanticWebJournal, 2014. Toappear. doi:10.3233/SW-140134.
[10] RDF 1.1conceptsandabstractsyntax. W3C Recommendation, February2014. URL: http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/.
[11] RDF 1.1Turtle. W3C Recommendation, February2014. URL:http://www.w3.org/TR/2014/REC-turtle-20140225/.
[12] RDF Schema1.1. W3C Recommendation, February2014. URL:http://www.w3.org/TR/2014/REC-rdf-schema-20140225/.
[13] RDF 1.1XML syntax. W3C Recommendation, February2014. URL:http://www.w3.org/TR/2014/REC-rdf-syntax-grammar-20140225/.
[14] RDF 1.1semantics. W3C Recommendation, 2014. URL:http://www.w3.org/TR/rdf11-mt/.
[15] MichaelAshburner, Catherine A.Ball, Judith A.Blake, DavidBotstein,HeatherButler, J. MichaelCherry, Allan P.Davis, KaraDolinski, Selina S.Dwight, Janan T.Eppig, Midori A.Harris, David P.Hill, LaurieIssel-Tarver,AndrewKasarskis, SuzannaLewis, John C.Matese, Joel E.Richardson,MartinRingwald, Gerald M.Rubin, andGavinSherlock. Geneontology:toolfortheunificationofbiology. NatureGenetics, 25(1):25–29, May2000. doi:10.1038/75556.
[16] MichaelBada, RobertStevens, CaroleGoble, YolandaGil, MichaelAshburner, Judith A.Blake, J. MichaelCherry, MidoriHarris, andSuzannaLewis. A shortstudyonthesuccessofthegeneontology. JournalofWebSemantics, 1(2), 2004. URL:http://www.websemanticsjournal.org/ps/pub/2004-9.
[17] Thomas R Gruber. A translationapproachtoportableontologyspecification. KnowledgeAcquisition, 5(2):199–220, 1993.doi:10.1006/knac.1993.1008.
[18] Deborah L.McGuinness. Ontologiescomeofage. InDieterFensel, JimHendler, HenryLieberman, andWolfgangWahlster, editors, Spinningthe
22
RDF,theSemanticWeb, Jordan, JordanandJordan
SemanticWeb: BringingtheWorldWideWebtoItsFullPotential.MITPress, 2003. URL: http://www-ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-mit-press-(with-citation).htm.
[19] W3C OWL WorkingGroup. OWL 2webontologylanguagedocumentoverview(secondedition). W3C Recommendation, December2012.URL: http://www.w3.org/TR/owl2-overview/.
[20] TheW3C SPARQL WorkingGroup. SPARQL 1.1overview. W3CRecommendation, March2013. URL:http://www.w3.org/TR/sparql11-overview/.
[21] AaronSwartz. A ProgrammableWeb: AnUnfinishedWork. SynthesisLecturesontheSemanticWeb: TheoryandTechnology.Morgan&Claypool, 2013. doi:10.2200/S00481ED1V01Y201302WBE005.
[22] AgnèsSimon, RomainWenz, VincentMichel, andAdrienDi Mascio.Publishingbibliographicrecordsonthewebofdata: OpportunitiesfortheBnF (FrenchNationalLibrary). InPhilippCimiano, OscarCorcho,ValentinaPresutti, LauraHollink, andSebastianRudolph, editors, TheSemanticWeb: SemanticsandBigData, volume7882of LectureNotesinComputerScience, pages563–577.SpringerBerlinHeidelberg, 2013.10thInternationalConference, ESWC 2013, Montpellier, France, May26-30, 2013. URL:http://2013.eswc-conferences.org/program/accepted-papers,doi:10.1007/978-3-642-38288-8_38.
[23] L Garcia Castro, C McLaughlin, andA Garcia. Biotea: Rdfizingpubmedcentralinsupportforthepaperasaninterfacetothewebofdata. JournalofBiomedicalSemantics, 4(Suppl1):S5, 2013. URL:http://www.jbiomedsem.com/content/4/S1/S5,doi:10.1186/2041-1480-4-S1-S5.
[24] TimBerners-Lee. Linkeddata. Webpage, 2006. URL:http://www.w3.org/DesignIssues/LinkedData.html.
[25] LeoSauermann, RichardCyganiak, DannyAyers, andMaxVölkel. CoolURIsforthesemanticweb. W3C InterestGroupNote, December2008.URL: http://www.w3.org/TR/2008/NOTE-cooluris-20081203/.
[26] ChristianBizer, TomHeath, andTimBerners-Lee. Linkeddata–thestorysofar. InternationalJournalOnSemanticWebandInformationSystems,5(3):1–22, 2009. URL: http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf,doi:10.4018/jswis.2009081901.
[27] TomHeathandChristianBizer. LinkedData: EvolvingtheWebintoaGlobalDataSpace. SynthesisLecturesontheSemanticWeb: TheoryandTechnology.Morgan&Claypool, 2011. URL:http://linkeddatabook.com, doi:10.2200/S00334ED1V01Y201102WBE001.
Version4fa6c06eeaa4, 2015-01-18 23