embedding knowledge in html - inspiring innovation
TRANSCRIPT
Embedding Knowledge in HTML
Some content from presentations by Ivan Herman of the W3C
Overview
• Why we want to embed structured data in HTML
• RDFa
• Microdata and schema.org
• RDFa Lite as an encoding for Microdata
• JSON-LD as an encoding for RDF and Microdata
• Use cases and examples
HTML is Everywhere
• We usually think of HTML as the language of Web pages
• But it’s also widely used on/for mobile devices and tablets
  – It readily adapts for different screen sizes/orientations
• And is the basis of many ebook formats
  – E.g. Kindle’s formats, mobi, epub
• How can we add knowledge to HTML pages?
Adding RDF-like data to HTML
• We’d like to add semi-structured knowledge to a conventional HTML document
  – Humans see and understand regular HTML content (text, images, videos, audio)
  – Machines see and understand data markup in XML, RDF or some other format
• Possibilities include
  – Add a link to a separate document with knowledge
  – Embed knowledge as comments, JavaScript, etc.
  – Distribute knowledge markup throughout HTML as attributes of existing HTML tags
One page, not two
• Content providers prefer not to generate multiple pages, one for humans (HTML) and another for machines (RDF)
  – RDF serializations are complex
  – Requires separate storage, generation, etc. mechanisms
  – Introduces redundancy, which can lead to errors if we change one page but not the other
• A single page simplifies the job of search engines as well
General approach
• Provide or reuse tag attributes to encode the metadata
  – Browsers & apps ignore attributes they don’t understand
• Three approaches have been developed
  – Microformats (~2005)
  – RDFa (~2007)
  – Microdata (aka schema.org) (~2012)
• Status 2014/5 (IMHO)
  – Microformats used but future is limited
  – RDFa becoming the encoding of choice
  – Schema.org vocabularies getting large uptake
Microformats
• Earliest idea, supplanted by RDFa and Microdata
• Reuses HTML attributes like @class, @title
• Separate vocabularies developed for common use cases, e.g., address, CV, recipes…
• Difficult to mix microformats (no concept of namespaces)
• Doesn’t define an RDF representation; it is possible to transform via, e.g., XSLT + GRDDL, but transformations are vocabulary dependent
• vCard: popular format for “business card” data
• Example use case for email
  – Sender attaches vCard to email message
  – Recipient detaches it to a contact app
• hCard is a Microformat based on vCard
  – Provides a way to embed vCard data in a web page
BEGIN:VCARD
VERSION:4.0
N:Forrest;Gump;;Mr.;
FN:Forrest Gump
ORG:Bubba Gump Shrimp Co.
TITLE:Shrimp Man
PHOTO;MEDIATYPE=image/gif:http://www.example.com/dir_photos/my_photo.gif
TEL;TYPE=work,voice;VALUE=uri:tel:+1-111-555-1212
TEL;TYPE=home,voice;VALUE=uri:tel:+1-404-555-1212
ADR;TYPE=WORK,PREF:;;100 Waters Edge;Baytown;LA;30314;United States of America
LABEL;TYPE=WORK,PREF:100 Waters Edge\nBaytown\, LA 30314\nUnited States of America
ADR;TYPE=HOME:;;42 Plantation St.;Baytown;LA;30314;United States of America
LABEL;TYPE=HOME:42 Plantation St.\nBaytown\, LA 30314\nUnited States of America
EMAIL:[email protected]
REV:20080424T195243Z
END:VCARD
<ul class="vcard">
  <li class="fn">Forrest Gump</li>
  <li class="org">Bubba Gump Shrimp Co.</li>
  <li class="tel">1-111-555-1212</li>
  <li><a class="url" href="http://bubbagump.com/">http://bubbagump.com/</a></li>
</ul>
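To make the idea of reading data out of @class values concrete, here is a minimal sketch of how a consumer might pull the hCard properties out of the markup above. It uses only Python's standard `html.parser`; the names `HCardSketch`, `parse_hcard` and `HCARD_PROPERTIES` are hypothetical, and the property set is limited to the four class names that appear in the example. A real hCard parser covers many more properties and nesting rules.

```python
from html.parser import HTMLParser

# The hCard markup from the example above.
HCARD_HTML = """
<ul class="vcard">
  <li class="fn">Forrest Gump</li>
  <li class="org">Bubba Gump Shrimp Co.</li>
  <li class="tel">1-111-555-1212</li>
  <li><a class="url" href="http://bubbagump.com/">http://bubbagump.com/</a></li>
</ul>
"""

# Assumption: only the class names used in the example are recognised.
HCARD_PROPERTIES = {"fn", "org", "tel", "url"}


class HCardSketch(HTMLParser):
    """Collects the value of each element whose class names an hCard property."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self._current = None  # property whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = set((attr_map.get("class") or "").split())
        matched = classes & HCARD_PROPERTIES
        if not matched:
            return
        prop = matched.pop()
        if prop == "url" and attr_map.get("href"):
            # For links, the value is the href rather than the link text.
            self.card[prop] = attr_map["href"]
        else:
            self._current = prop
            self.card[prop] = ""

    def handle_data(self, data):
        if self._current:
            self.card[self._current] += data

    def handle_endtag(self, tag):
        self._current = None


def parse_hcard(html):
    parser = HCardSketch()
    parser.feed(html)
    return {name: value.strip() for name, value in parser.card.items()}


card = parse_hcard(HCARD_HTML)
print(card)
```

Because the property names are plain class values with no namespace, two microformats that both used a class such as `url` would be indistinguishable to a parser like this one.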
Microdata approach
• Defined and supported by Google, Bing, Yahoo and Yandex
• Adds new attributes to HTML5 to express metadata
• Works well for simpler “single-vocabulary” cases, but not well suited for mixing vocabularies or for complex vocabularies
• No notion of datatypes or namespaces
• Defines a generic mapping to RDF
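As an illustration of the Microdata attributes, the sketch below embeds a small, hypothetical page fragment and reads it back with Python's standard `html.parser`. The attributes `itemscope`, `itemtype` and `itemprop` are the ones Microdata adds to HTML5; the choice of the schema.org `Person` type, the property values (borrowed from the vCard example) and the class name `MicrodataSketch` are assumptions made for this example. It handles one flat item only and ignores nested items, `itemref` and `itemid`.

```python
from html.parser import HTMLParser

# Hypothetical fragment: a single schema.org Person item.
MICRODATA_HTML = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Forrest Gump</span>
  <span itemprop="jobTitle">Shrimp Man</span>
  <a itemprop="url" href="http://bubbagump.com/">home page</a>
</div>
"""


class MicrodataSketch(HTMLParser):
    """Reads one flat Microdata item (no nesting, no itemref, no itemid)."""

    def __init__(self):
        super().__init__()
        self.item = {"type": None, "properties": {}}
        self._current = None  # itemprop whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if "itemscope" in attr_map:
            # itemscope opens an item; itemtype names its vocabulary type.
            self.item["type"] = attr_map.get("itemtype")
        prop = attr_map.get("itemprop")
        if prop is None:
            return
        if "href" in attr_map:
            # Reuse of a core HTML attribute: the value is the link target.
            self.item["properties"][prop] = attr_map["href"]
        else:
            self._current = prop
            self.item["properties"][prop] = ""

    def handle_data(self, data):
        if self._current:
            self.item["properties"][self._current] += data

    def handle_endtag(self, tag):
        self._current = None


parser = MicrodataSketch()
parser.feed(MICRODATA_HTML)
item = parser.item
print(item)
```

Every property name here is interpreted relative to the single `itemtype`, which is why this style suits the one-vocabulary case and becomes awkward once vocabularies need to be mixed.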
RDFa approach
• Adds new (X)HTML/XML attributes
• Has namespaces and URIs at its core
  – So mixing vocabularies is easy, as in RDF
• Complete flexibility for using literals or URI resources
• Is a complete serialization of RDF
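The point about namespaces can be sketched in a few lines: a prefixed name such as `schema:alumniOf` is shorthand for a full URI, so terms from different vocabularies never collide. The prefix table below is an assumption made for illustration (the slides do not show the declarations a page would carry), and `expand` is a hypothetical helper, not part of any RDFa library.

```python
# Assumed prefix-to-namespace table; a real page declares or inherits its own.
PREFIXES = {
    "schema": "http://schema.org/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/terms/",
}


def expand(term, prefixes=PREFIXES):
    """Expand a prefixed name to a full URI; leave anything else unchanged."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return term


# Properties from two vocabularies used side by side on one page.
mixed = [expand(term) for term in ("schema:alumniOf", "dc:title")]
print(mixed)
```

A value that is already a full URI, such as `http://www.elte.hu`, passes through untouched because `http` is not a declared prefix.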
Yielding this RDF
<http://www.ivan-herman.net/foaf#me>
    schema:alumniOf <http://www.elte.hu> ;
    foaf:schoolHomePage <http://www.elte.hu> ;
    schema:worksFor <http://www.w3.org/W3C#data> ;
    …
<http://www.elte.hu>
    dc:title "Eötvös Loránd University of Budapest" .
…
<http://www.w3.org/W3C#data>
    dc:title "World Wide Web Consortium (W3C)"
…
Yielding this RDF
[ rdf:type schema:Review ;
  schema:name "Oscars 2012: The Artist, review" ;
  schema:description "The Artist, an utterly beguiling…" ;
  schema:ratingValue "5" ;
  …
]
Rich Snippets
• Search engines add text under results to preview what’s on the page and why it’s relevant
• Text often extracted from structured data embedded on the page
• See http://bit.ly/RichSN for more information
RDFa and Microdata: similarities
• RDFa and Microdata are modern options
• Both have similar approaches
  – Structured data encoded in HTML attributes only – no new elements
  – Define some special attributes, e.g., itemscope for Microdata, resource for RDFa
  – Reuse some HTML core attributes (e.g., href)
  – Use textual content of HTML source, if needed
• RDF data can be extracted from both
RDFa and Microdata: differences
• Microdata optimized for simpler use cases:
  – One vocabulary at a time
  – Tree shaped data
  – No datatypes
• RDFa provides full serialization of RDF in XML or HTML
  – Price is extra complexity over Microdata
• RDFa 1.1 Lite is a simplified authoring profile of RDFa, very similar to Microdata
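How close RDFa 1.1 Lite is to Microdata can be shown with a rough attribute-by-attribute translation. The function below is a hypothetical sketch, not a conforming converter: it assumes `itemtype` is a single URI whose last path segment is the type name, maps `itemprop` to `property` and `itemid` to `resource`, and simply drops `itemscope` because `typeof` takes over the role of opening a new item.

```python
def microdata_to_rdfa_lite(attrs):
    """Translate one element's Microdata attributes to RDFa Lite attributes.

    Simplified sketch: assumes a single itemtype URI ending in "/TypeName".
    """
    out = {}
    for name, value in attrs.items():
        if name == "itemscope":
            continue  # typeof (set from itemtype) opens the item instead
        if name == "itemtype":
            vocab, _, type_name = value.rpartition("/")
            out["vocab"] = vocab + "/"
            out["typeof"] = type_name
        elif name == "itemprop":
            out["property"] = value
        elif name == "itemid":
            out["resource"] = value
        else:
            out[name] = value  # core HTML attributes such as href pass through
    return out


person = microdata_to_rdfa_lite(
    {"itemscope": None, "itemtype": "http://schema.org/Person"}
)
link = microdata_to_rdfa_lite({"itemprop": "url", "href": "http://bubbagump.com/"})
print(person, link)
```

The translation only runs cleanly in this direction for simple cases; RDFa features such as datatypes and multiple prefixes have no Microdata counterpart.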
Amount of structured data on Web?
• Web Data Commons project uses Common Crawl data to estimate amount of structured data on Web
• Looked for Microdata, RDFa, other formats (e.g., hCalendar, hCard) in URLs parsable as HTML
• November 2015 crawl found
  – 541M pages out of 1.77B (30%) with structured data in 2.7M domains of 14.4M (19%)
  – 24.2B triples about 6.1B entities
• Data can be downloaded
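The shares quoted for the November 2015 crawl can be rechecked from the rounded counts on the slide. This is only a back-of-the-envelope sketch: the inputs are the rounded figures above, so the recomputed values can differ slightly from percentages derived from the exact crawl counts.

```python
# Rounded counts as quoted on the slide.
pages_with_data = 541_000_000
pages_total = 1_770_000_000
domains_with_data = 2_700_000
domains_total = 14_400_000
triples = 24_200_000_000
entities = 6_100_000_000

page_share = pages_with_data / pages_total        # a little over 0.30
domain_share = domains_with_data / domains_total  # just under 0.19
triples_per_entity = triples / entities           # a little under 4

print(f"pages with structured data:   {page_share:.1%}")
print(f"domains with structured data: {domain_share:.1%}")
print(f"triples per entity:           {triples_per_entity:.2f}")
```

On these figures each described entity carries roughly four triples on average, which suggests most embedded descriptions are small.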
Conclusions
• The amount of structured data on the web is growing steadily
• Microdata shows the strongest growth
• RDFa also common
• Microformat data is probably not growing as much