RDFa - csee.umbc.edu
RDFa: Embedding RDF Knowledge in HTML
Some content from a presentation by Ivan Herman of the W3C, Introduction to RDFa, given at the 2011 Semantic Technologies Conference.
What is RDFa?
• Serialization of RDF embedded in XHTML, HTML or XML: provides a set of attributes (the "a" in RDFa) to use with existing tags to carry RDF metadata
• 2004: work on developing standards began
• 2008: RDFa 1.0 a recommendation (but only in XHTML, which failed to launch)
• 2012-15: RDFa 1.1 recommendation (works in HTML4, HTML5)
• See http://rdfa.info/
Principles of RDFa
• RDF content is specified in attributes of XML/HTML tags rather than in elements
• The XML/HTML tree structure is used as context, when appropriate
• Some new attributes are introduced and some existing ones (@href, @rel) are reused
• When possible, HTML text content is used for literal values
→ The same file is used by the browser and by the RDF extractor
Web page viewed by a person
http://www.w3.org/ns/entailment/data/RDFS.html
The source
<p about="http://www.w3.org/ns/entailment/RDFS"
property="http://purl.org/dc/terms/description">Unique identifier for <em>RDFS Entailment</em>.</p>
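As a rough illustration of how an extractor reads this source, here is a minimal Python sketch built only on the standard library's html.parser. It is a toy, not a conforming RDFa processor: the class name PropertyExtractor is hypothetical, the markup is assumed to be well-formed, and only elements carrying both @about and @property are handled, with the enclosed text (nested tags such as <em> flattened) becoming the literal.

```python
from html.parser import HTMLParser

# Elements with no closing tag; they must never be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class PropertyExtractor(HTMLParser):
    """Toy sketch: <x about="S" property="P">text</x> yields (S, P, "text")."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._frames = []  # one entry per open element: capture tuple or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        a = dict(attrs)
        if a.get("about") and a.get("property"):
            # Start collecting a literal: (subject, predicate, text chunks)
            self._frames.append((a["about"], a["property"], []))
        else:
            self._frames.append(None)

    def handle_endtag(self, tag):
        if tag in VOID or not self._frames:
            return
        frame = self._frames.pop()
        if frame is not None:
            subject, predicate, chunks = frame
            self.triples.append((subject, predicate, "".join(chunks)))

    def handle_data(self, data):
        # Text belongs to every enclosing element that is collecting a literal.
        for frame in self._frames:
            if frame is not None:
                frame[2].append(data)


source = '''<p about="http://www.w3.org/ns/entailment/RDFS"
property="http://purl.org/dc/terms/description">Unique identifier for <em>RDFS Entailment</em>.</p>'''

extractor = PropertyExtractor()
extractor.feed(source)
extractor.close()
for s, p, o in extractor.triples:
    print(f'<{s}> <{p}> "{o}" .')
```

Running it prints one N-Triples line: the @about value as subject, the @property value as predicate, and "Unique identifier for RDFS Entailment." as the literal.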
Source and generated RDF…
<p about="http://www.w3.org/ns/entailment/RDFS"
property="http://purl.org/dc/terms/description">Unique identifier for <em>RDFS Entailment</em>.</p>
<http://www.w3.org/ns/entailment/RDFS> <http://purl.org/dc/terms/description>
"Unique identifier for RDFS Entailment." .
The web page viewed by a person
The source
<a about="http://www.w3.org/ns/entailment/RDFS"
rel="http://www.w3.org/2000/01/rdf-schema#seeAlso" href="http://www.w3.org/TR/2004/REC-rdf-mt-20040210/">
RDF Semantics.</a>
Source and generated RDF…
<a about="http://www.w3.org/ns/entailment/RDFS"
rel="http://www.w3.org/2000/01/rdf-schema#seeAlso" href="http://www.w3.org/TR/2004/REC-rdf-mt-20040210/">
RDF Semantics.</a>
<http://www.w3.org/ns/entailment/RDFS> <http://www.w3.org/2000/01/rdf-schema#seeAlso>
<http://www.w3.org/TR/2004/REC-rdf-mt-20040210/> .
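The @rel/@href case needs no text content at all: subject, predicate and object all sit in the start tag, and the link text is only for the human reader. A minimal sketch under the same caveats as any toy extractor (hypothetical class name RelHrefExtractor, standard library only, full URIs rather than CURIEs):

```python
from html.parser import HTMLParser


class RelHrefExtractor(HTMLParser):
    """Toy sketch: <a about="S" rel="P" href="O"> yields the triple (S, P, O)."""

    def __init__(self):
        super().__init__()
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if a.get("about") and a.get("rel") and a.get("href"):
            # @rel may list several space-separated predicates.
            for predicate in a["rel"].split():
                self.triples.append((a["about"], predicate, a["href"]))


source = '''<a about="http://www.w3.org/ns/entailment/RDFS"
rel="http://www.w3.org/2000/01/rdf-schema#seeAlso" href="http://www.w3.org/TR/2004/REC-rdf-mt-20040210/">
RDF Semantics.</a>'''

extractor = RelHrefExtractor()
extractor.feed(source)
extractor.close()
for s, p, o in extractor.triples:
    print(f"<{s}> <{p}> <{o}> .")
```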
N-Triples in HTML
<http://www.w3.org/ns/entailment/RDFS> <http://purl.org/dc/terms/description> "Unique identifier for RDFS Entailment." .
<http://www.w3.org/ns/entailment/RDFS> <http://www.w3.org/2000/01/rdf-schema#seeAlso> <http://www.w3.org/TR/2004/REC-rdf-mt-20040210/> .
• Maybe we can do better: instead of this, allow URI prefixes and a shared subject, like this:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
<http://www.w3.org/ns/entailment/RDFS>
    rdfs:seeAlso <http://www.w3.org/TR/2004/REC-rdf-mt-20040210/> ;
    dcterms:description "Unique identifier for RDFS Entailment." .
Turtlizing RDFa
• Turtle supports several simplifying ideas:
  – use compact URIs (CURIEs) when possible, i.e., a URI with a prefix defined elsewhere, e.g., foaf:mbox
  – make use of the natural structure for shared subjects, shared predicates, creating blank nodes, etc.
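The two Turtle shortcuts discussed here, prefixes and a shared subject, can be sketched as a small serializer. This is an illustration, not a Turtle library: to_turtle and compact are hypothetical names, literal objects are assumed to arrive already quoted, and only simple local names are abbreviated to prefix:local form.

```python
import re

# Turtle prefixed names need a simple local part; this check is deliberately strict.
LOCAL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")


def compact(uri, prefixes):
    """Write uri as prefix:local when a namespace matches, else as <uri>."""
    for prefix, ns in prefixes.items():
        local = uri[len(ns):]
        if uri.startswith(ns) and LOCAL_NAME.fullmatch(local):
            return f"{prefix}:{local}"
    return f"<{uri}>"


def to_turtle(triples, prefixes):
    """Serialize (s, p, o) triples, grouping predicates under a shared subject.

    Objects starting with a double quote are treated as ready-made literals;
    every other object is treated as a URI.
    """
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in prefixes.items()]
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))
    for s, pairs in by_subject.items():
        lines.append(compact(s, prefixes))
        for i, (p, o) in enumerate(pairs):
            obj = o if o.startswith('"') else compact(o, prefixes)
            end = " ;" if i < len(pairs) - 1 else " ."
            lines.append(f"    {compact(p, prefixes)} {obj}{end}")
    return "\n".join(lines)


ntriples = [
    ("http://www.w3.org/ns/entailment/RDFS",
     "http://www.w3.org/2000/01/rdf-schema#seeAlso",
     "http://www.w3.org/TR/2004/REC-rdf-mt-20040210/"),
    ("http://www.w3.org/ns/entailment/RDFS",
     "http://purl.org/dc/terms/description",
     '"Unique identifier for RDFS Entailment."'),
]
prefixes = {"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
            "dcterms": "http://purl.org/dc/terms/"}
print(to_turtle(ntriples, prefixes))
```

With these two triples and prefixes, the output matches the Turtle version shown above line for line.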
CURIE definition and usage
<html>
…
<p about="http://www.w3.org/ns/entailment/RDFS"
property="http://purl.org/dc/terms/description">Unique identifier for <em>RDFS Entailment</em>.</p>
…
</html>
• can be replaced by:
<html prefix="dcterms: http://purl.org/dc/terms/">
…
<p about="http://www.w3.org/ns/entailment/RDFS"
property="dcterms:description">Unique identifier for <em>RDFS Entailment</em>.</p>
…
</html>
Details on @prefix in RDFa
• Can be anywhere in the HTML tree and is valid for the entire sub-tree
  – i.e., the html element is not the only place to have it
• The same @prefix attribute can hold several definitions:
  – prefix="dcterms: http://purl.org… foaf: http://…"
• CURIEs and "real" URIs can usually be mixed
• CURIEs cannot be used on @href
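A sketch of how a processor might read @prefix and expand CURIEs, assuming hypothetical helper names parse_prefix_attr and expand. It ignores RDFa 1.1 details such as safe CURIEs, the default prefixes of the initial context, and scoping to sub-trees; since CURIEs cannot be used on @href, expand would not be applied to @href values.

```python
def parse_prefix_attr(value):
    """Parse an @prefix value such as 'dcterms: http://... foaf: http://...'.

    Each definition is a 'name:' token followed by whitespace and a URI; a
    token without the separating space (e.g. 'dc:http://...') is skipped.
    """
    tokens = value.split()
    mapping = {}
    i = 0
    while i + 1 < len(tokens):
        if tokens[i].endswith(":"):
            mapping[tokens[i][:-1]] = tokens[i + 1]
            i += 2
        else:
            i += 1  # malformed token: move on
    return mapping


def expand(term, mapping):
    """Expand a CURIE like 'dcterms:description'; return anything else unchanged."""
    prefix, sep, reference = term.partition(":")
    if sep and prefix in mapping:
        return mapping[prefix] + reference
    # Unknown prefix (including 'http'), so treat the term as a full URI.
    return term


mapping = parse_prefix_attr(
    "dcterms: http://purl.org/dc/terms/ foaf: http://xmlns.com/foaf/0.1/")
print(expand("dcterms:description", mapping))
print(expand("http://www.w3.org/2000/01/rdf-schema#seeAlso", mapping))
```

The second call shows why CURIEs and full URIs can be mixed: "http" is not a declared prefix, so the value passes through untouched.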
Sharing subjects
<html prefix="dcterms: http://purl.org/dc/terms/ rdfs: http://www.w3.org/2000/01/rdf-schema#">
…
<body about="http://www.w3.org/ns/entailment/RDFS">
…
<p property="dcterms:description">Unique identifier for <em>RDFS Entailment</em>.</p>
<p>…<a rel="rdfs:seeAlso" href="http://www.w3.org/TR/2004/REC-rdf-mt-20040210/">RDFS Semantics</a>…</p>
• Basic principle: @about is inherited by child nodes, so there is no reason to repeat it
…yielding
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
<http://www.w3.org/ns/entailment/RDFS>
    rdfs:seeAlso <http://www.w3.org/TR/2004/REC-rdf-mt-20040210/> ;
    dcterms:description "Unique identifier for RDFS Entailment." .
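Subject inheritance is what lets the page state @about once. The toy extractor below (hypothetical class InheritingExtractor, standard library only, not a conforming RDFa 1.1 processor) keeps a stack of open elements, so each element sees the nearest @about and the @prefix mappings of its ancestors. Run on a completed version of the markup above, it yields the same two triples.

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class InheritingExtractor(HTMLParser):
    """Toy RDFa subset: @about and @prefix apply to the whole sub-tree.

    Handles @prefix, @about, @property (text literal) and @rel + @href only.
    """

    def __init__(self):
        super().__init__()
        self.triples = []
        # One frame per open element: (subject, prefix mapping, literal capture)
        self._frames = [(None, {}, None)]

    @staticmethod
    def _expand(term, prefixes):
        prefix, sep, ref = term.partition(":")
        return prefixes[prefix] + ref if sep and prefix in prefixes else term

    def handle_starttag(self, tag, attrs):
        a = {k: v or "" for k, v in attrs}
        parent_subject, parent_prefixes, _ = self._frames[-1]
        prefixes = dict(parent_prefixes)
        tokens = a.get("prefix", "").split()
        # Assumes well-formed 'name: uri' pairs.
        for name, uri in zip(tokens[0::2], tokens[1::2]):
            prefixes[name.rstrip(":")] = uri
        # No @about here means the subject is inherited from the parent.
        subject = a.get("about") or parent_subject
        if subject and a.get("rel") and a.get("href"):
            for rel in a["rel"].split():
                self.triples.append((subject, self._expand(rel, prefixes), a["href"]))
        capture = None
        if subject and a.get("property"):
            capture = (subject, self._expand(a["property"], prefixes), [])
        if tag not in VOID:
            self._frames.append((subject, prefixes, capture))

    def handle_endtag(self, tag):
        if tag in VOID or len(self._frames) == 1:
            return
        _, _, capture = self._frames.pop()
        if capture is not None:
            subject, predicate, chunks = capture
            self.triples.append((subject, predicate, '"' + "".join(chunks) + '"'))

    def handle_data(self, data):
        for _, _, capture in self._frames:
            if capture is not None:
                capture[2].append(data)


page = '''<html prefix="dcterms: http://purl.org/dc/terms/ rdfs: http://www.w3.org/2000/01/rdf-schema#">
<body about="http://www.w3.org/ns/entailment/RDFS">
<p property="dcterms:description">Unique identifier for <em>RDFS Entailment</em>.</p>
<p><a rel="rdfs:seeAlso" href="http://www.w3.org/TR/2004/REC-rdf-mt-20040210/">RDFS Semantics</a></p>
</body>
</html>'''

extractor = InheritingExtractor()
extractor.feed(page)
extractor.close()
for triple in extractor.triples:
    print(triple)
```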
On reusing literals
• Reusing literals is a plus, but you don't always want to do it
• The basic rule says: the (RDF) literal is the enclosed text from the HTML content
• This is fine in 80% of the cases, but…
• …it may not be natural in many cases!
Example: dates
<body about=".." prefix="dcterms: http://… xsd: http://…">
<address><p property="dcterms:date" datatype="xsd:date">2010-07-05</p>
</address></body>
• This leads to:
@prefix dcterms: <http://…> .
@prefix xsd: <http://…> .
<..> dcterms:date "2010-07-05"^^xsd:date .
• 2010-07-05 is the official ISO format (for xsd:date), but "July 5, 2010" is preferred by people
Usage of @content
<body about=".." prefix="dcterms: http://… xsd: http://…">
<address><p property="dcterms:date" datatype="xsd:date"
content="2010-07-05">July 5, 2010</p></address>
</body>
• Also leads to:
@prefix dcterms: <http://…> .
@prefix xsd: <http://…> .
<..> dcterms:date "2010-07-05"^^xsd:date .
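The literal rule with @content and @datatype fits in one function. A minimal sketch, assuming a hypothetical literal_from helper that receives the element's text and attribute values; it escapes quotes and backslashes for Turtle but does not check that the value actually matches the datatype.

```python
def literal_from(text, content=None, datatype=None):
    """Turtle-style literal for an element carrying @property (simplified).

    @content, when present, replaces the element's text as the value, so
    people can read "July 5, 2010" while machines get the ISO date; a
    non-empty @datatype is attached as ^^datatype.
    """
    value = content if content is not None else text
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    literal = f'"{escaped}"'
    return f"{literal}^^{datatype}" if datatype else literal


# <p property="dcterms:date" datatype="xsd:date">2010-07-05</p>
print(literal_from("2010-07-05", datatype="xsd:date"))
# <p property="dcterms:date" datatype="xsd:date" content="2010-07-05">July 5, 2010</p>
print(literal_from("July 5, 2010", content="2010-07-05", datatype="xsd:date"))
```

Both calls print "2010-07-05"^^xsd:date, which is the point of the slide: the two markups differ for the reader but not for the RDF.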
On subjects and objects
• Here is our rule so far:
  – @about sets the subject
  – @href sets the object
• But that is not always good enough:
  – we may not want to introduce an active link (i.e., an "a" element) on the web page
  – what about other links in HTML?
We may not always want links…
<span about="http://www.ivan-herman.net/foaf#me"><span rel="rdfs:seeAlso"
resource="http://www.w3.org/People/Ivan/">Activity Lead</span></span>
• The RDFa @resource attribute plays the same role as @href
• It sets the object, just like @href, but is ignored by browsers, as in the snippet above
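A sketch of how @resource slots into object selection, assuming hypothetical names object_uri and RelExtractor. In RDFa 1.1 the object of @rel is taken from @resource if present, otherwise @href, otherwise @src; the sketch follows that order, inherits @about from ancestors, and leaves CURIEs such as rdfs:seeAlso unexpanded.

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


def object_uri(attrs):
    """Object of an @rel triple: @resource first, then @href, then @src."""
    for name in ("resource", "href", "src"):
        if attrs.get(name):
            return attrs[name]
    return None


class RelExtractor(HTMLParser):
    """Toy sketch: @about is inherited; @rel plus an object URI gives a triple."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._subjects = [None]  # subject in force for each open element

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        subject = a.get("about") or self._subjects[-1]
        obj = object_uri(a)
        if subject and a.get("rel") and obj:
            for rel in a["rel"].split():
                self.triples.append((subject, rel, obj))
        if tag not in VOID:
            self._subjects.append(subject)

    def handle_endtag(self, tag):
        if tag not in VOID and len(self._subjects) > 1:
            self._subjects.pop()


snippet = '''<span about="http://www.ivan-herman.net/foaf#me"><span rel="rdfs:seeAlso"
resource="http://www.w3.org/People/Ivan/">Activity Lead</span></span>'''

extractor = RelExtractor()
extractor.feed(snippet)
extractor.close()
print(extractor.triples)
```

Because the inner span is not a link, a browser shows plain text, while the extractor still records the rdfs:seeAlso triple.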
More features
• RDFa 1.1 has more features that make it easier to represent knowledge compactly in HTML
• These take advantage of the HTML tree context
• We'll skip the details, which you can find in:
  – RDFa 1.1 Primer
  – RDFa 1.1 Core
Authoring RDFa
• Some tools already have RDFa facilities:
  – e.g., it is possible to add the right DTD to Dreamweaver, Amaya has it at its core, etc.
• There are plugins for, e.g., WordPress, to generate RDFa markup
• CMS systems (like Drupal 7) may have RDFa built into their publication system
  – users generate RDFa whether they know about it or not…
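To make the CMS point concrete, here is a hypothetical template function of the kind a publishing system might use; render_post, the example.org URL and the markup layout are illustrative, not taken from Drupal or any WordPress plugin. The author supplies plain data and the template emits both the human-readable text and the RDFa, including an ISO date in @content.

```python
from datetime import date
from html import escape


def render_post(about, title, published):
    """Render a hypothetical blog-post fragment with RDFa, as a CMS template might.

    The visible text is written for people; @about, @property, @datatype and
    @content carry the same facts for RDF extractors.
    """
    human_date = f"{published:%B} {published.day}, {published.year}"
    return "\n".join([
        f'<div prefix="dcterms: http://purl.org/dc/terms/ '
        f'xsd: http://www.w3.org/2001/XMLSchema#" about="{escape(about)}">',
        f'  <h1 property="dcterms:title">{escape(title)}</h1>',
        f'  <p property="dcterms:date" datatype="xsd:date" '
        f'content="{published.isoformat()}">{human_date}</p>',
        "</div>",
    ])


print(render_post("http://example.org/posts/1", "Hello & welcome", date(2010, 7, 5)))
```

The month name comes from strftime, so it follows the process locale; in the default C locale it is "July".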
Consuming RDFa
• Major search engines (Google, Yahoo) process RDFa and can use it for the vocabularies they understand
• There are libraries, distillers, etc., to extract RDFa information
  – may be part of RDF development environments like Redland, RDFLib
  – see, for further references, http://rdfa.info/wiki/Consume
• Facebook's "social graph" is based on RDFa
FB's Open Graph Protocol
A page from Best Buy: RDFa for Facebook markup, JSON-LD for search engines
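Open Graph markup is RDFa placed in meta elements of the page head. A minimal reader, assuming a hypothetical class OpenGraphReader and placeholder values rather than Best Buy's actual markup; it matches the literal og: prefix instead of resolving @prefix the way an RDFa processor would.

```python
from html.parser import HTMLParser


class OpenGraphReader(HTMLParser):
    """Collect Open Graph properties from <meta property="og:..." content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("property") or ""
        if tag == "meta" and prop.startswith("og:") and a.get("content") is not None:
            self.properties[prop] = a["content"]


# Hypothetical page head; the values are placeholders.
head = '''<html prefix="og: http://ogp.me/ns#">
<head>
<meta property="og:title" content="Example Product" />
<meta property="og:type" content="website" />
<meta property="og:url" content="http://example.com/product/1" />
</head>
</html>'''

reader = OpenGraphReader()
reader.feed(head)
reader.close()
print(reader.properties)
```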
Publishing RDFa
• An RDFa + HTML file can just be on a server
  – the client extracts the RDF content
• Content negotiation can be set up on the server side
  – the client gets the format he/she asks for
  – the RDF content can either be generated on the fly or stored on the server statically
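On the server side, the negotiation step can be sketched as a function that reads the Accept header and decides whether to return the RDFa page or a Turtle serialization of the same data. This is a simplified illustration (hypothetical name choose_format; only two offered types; no type/* ranges), not a complete HTTP implementation.

```python
def choose_format(accept, offered=("text/html", "text/turtle")):
    """Pick the offered media type the client rates highest (simplified).

    Honours q-values and the */* wildcard; ignores type/* ranges and other
    parameters. On a tie the first match found wins, and the first offered
    type is the fallback when nothing matches.
    """
    best, best_q = offered[0], 0.0
    for part in accept.split(","):
        fields = [f.strip() for f in part.split(";")]
        media, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if q <= 0:
            continue  # q=0 means "not acceptable"
        matches = offered if media == "*/*" else [m for m in offered if m == media]
        for m in matches:
            if q > best_q:
                best, best_q = m, q
    return best


print(choose_format("text/html;q=0.5, text/turtle;q=0.9"))
```

A browser sending its usual Accept header gets the RDFa page; an RDF client asking for text/turtle gets Turtle, whether that is generated on the fly or stored statically.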
Google's rich snippets
• Embedded metadata (microdata or RDFa) is used to improve the search result page
  – at the moment only a few vocabularies are recognized, but that is evolving continually
Effects of, e.g., Google or Facebook
A number of popular sites publish RDFa as part of their normal pages:
  – Tesco, Best Buy, Slideshare, The London Gazette, Newsweek, MSNBC, O'Reilly Catalog, the White House…
  – Creative Commons snippets are in RDFa (e.g., on Flickr)
Best Buy example of RDFa use
Courtesy of Jay Myers, Best Buy, SemTech 2010 Presentation
Effects on Best Buy
• Reported in a Best Buy blog:
  – GoodRelations + RDFa improved Google rank tremendously
  – 30% increase in traffic on Best Buy store pages
  – Yahoo observed a 15% increase in click-through rate
• Today, Best Buy uses RDFa for much more than just snippets
  – e.g., to locate shops that have certain products in stock…
Library of Congress RDFa use
Overstock.com example
Drupal content management system
• RDF support in Drupal v.7
• Major CMS system
• Has RDFa at its core; pages contain RDFa
• In one step, millions of pages of additional RDF data!
The Examiner.com
Extracting the data
rdfa> python getdata.py "http://www.w3.org/ns/entailment/data/RDFS.html"
@prefix dc: <http://purl.org/dc/terms/> .
@prefix ent: <http://www.w3.org/ns/entailment/> .
…
ent:RDFS a ent:Entailment ;
    dc:creator <http://www.ivan-herman.net/foaf#me> ;
    dc:date "2010-05-03"^^xsd:date ;
    dc:description "Unique identifier for RDFS Entailment" ;
    rdfs:comment "The specification for the RDFS entailment is … Semantics W3C Recommendation." ;
    rdfs:isDefinedBy <http://www.w3.org/TR/2004/REC-rdf-mt-20040210/#rdfs_entailment> ;
    rdfs:seeAlso <http://www.w3.org/TR/2004/REC-rdf-mt-20040210/> .
<http://www.w3.org/ns/entailment/data/RDFS.html> dc:title "Information Resource RDFS Entailment" ;
    xhv:stylesheet <http://www.w3.org/StyleSheets/TR/base> .
<http://www.ivan-herman.net/foaf#me> a foaf:Person ;
    rdfs:seeAlso <http://www.ivan-herman.net/foaf> ;
    foaf:mbox <mailto:…> ;
    foaf:name "Ivan Herman" ;
    foaf:title "Semantic Web Activity Lead" ;
    foaf:workplaceHomepage <http://www.w3.org> .
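getdata.py is very simple:
# Assumes an rdflib release that still ships an RDFa parser (the 5.x series did);
# recent releases no longer include one in the core library.
import sys

import rdflib

if not (1 < len(sys.argv) < 4):
    print("usage: python getdata.py url ['json-ld'|rdfa|rdfa1.1|microdata|html]")
    print('eg: python getdata.py "http://www.w3.org/ns/entailment/data/RDFS.html"')
    sys.exit(1)

url = sys.argv[1]
fmt = sys.argv[2] if len(sys.argv) == 3 else "rdfa1.1"  # default: RDFa 1.1 parser
g = rdflib.Graph()
g.parse(url, format=fmt)  # fetch the page and extract the RDF it carries
out = g.serialize(format="n3")
# serialize() returns bytes in rdflib 5.x and str in later releases
print(out.decode("utf-8") if isinstance(out, bytes) else out)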
OpenLink Structured Data Sniffer*
* http://osds.openlinksw.com/
Conclusions
• Web developers want content providers to add structured data to HTML pages
• Content providers are incentivized to do so because their content will be better understood, ranked higher, more useful, etc.
• RDFa is the most powerful & flexible knowledge markup standard understood by search engines
• RDFa is also an alternative serialization of full RDF