The Semantic Web and The Web Of Commerce
The Semantic Web has the potential to be completely disruptive or completely opportune to online commerce.
Barbara Starr Email: [email protected]
Twitter: @BarbaraStarr
Disruptive Innovation
A disruptive innovation is an innovation that disrupts an existing market. The term is used in business and technology literature to describe innovations that improve a product or service in ways that the market does not expect, typically by lowering price or designing for a different set of consumers. In contrast to "disruptive" innovation, a "sustaining" innovation does not have an effect on existing markets. Sustaining innovations may be either "discontinuous"[1] (i.e. "transformational") or "continuous" (i.e. "evolutionary"). Transformational innovations are not always disruptive. Although the automobile was a transformational innovation, it was not a disruptive innovation, because early automobiles were expensive luxury items that did not disrupt the market for horse-drawn vehicles. The market for transportation essentially remained intact until the debut of the lower priced Ford Model T in 1908, which made higher speed, motorized transportation available to the masses.[2]
Disruptive innovation
Christensen defines a disruptive innovation as a product or service designed for a new set of customers.
Christensen argues that disruptive innovations can hurt successful, well managed companies that are responsive to their customers and have excellent research and development. These companies tend to ignore the markets most susceptible to disruptive innovations, because the markets have very tight profit margins and are too small to represent significant growth.[5]
The Theory
What is the Semantic Web
Is the Semantic Web synonymous with Web 3.0?
Putting structured information into the web in some machine readable format
Semantic meaning
Not about the relationships between links, but about the relationships between things, and the properties of those things
A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities By Tim Berners-Lee, James Hendler and Ora Lassila
What is the Semantic Web (cont)
“The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners. It is based on the Resource Description Framework (RDF)”
World Wide Web Consortium
http://www.w3.org/2001/sw/
RDF - Triples
• RDF (Resource Description Framework): a resource is anything you want to describe. An RDF triple contains a subject, predicate, and object.
e.g.
Michael (subject) knows (predicate) David (object)
Michael (subject) is-a (predicate) boy (object)
RDF - Triples
translates into
(michael knows David)
(michael is-a boy)
and, with the right ontology and inferencing mechanism, we can infer
(michael is-a person)
RDF - Triples
Triple store, or Web 3.0 database
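The triple and inference slides above can be sketched in a few lines of Python. This is only an illustration, not a real triple store: the `subclass_of` table stands in for an ontology, and the `infer` and `match` helpers are hypothetical names introduced here.

```python
# Triples are (subject, predicate, object) tuples, as on the slides.
triples = {
    ("michael", "knows", "David"),
    ("michael", "is-a", "boy"),
}

# Hypothetical one-rule "ontology": every boy is a person.
subclass_of = {"boy": "person"}


def infer(facts, subclasses):
    """Add (s, is-a, parent) for each (s, is-a, child) until nothing new appears."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p == "is-a" and o in subclasses:
                new = (s, "is-a", subclasses[o])
                if new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred


def match(facts, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in facts
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


store = infer(triples, subclass_of)
print(match(store, s="michael", p="is-a"))
```

The pattern lookup in `match` is a toy version of what a query language does against a real store: fix some positions of the triple and leave the others open.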
Wikipedia Definition of an Ontology
In computer science and information science, an ontology is a formal representation of the knowledge by a set of concepts within a domain and the relationships between those concepts. It is used to reason about the properties of that domain, and may be used to describe the domain.
In theory, an ontology is a "formal, explicit specification of a shared conceptualization".[1] An ontology provides a shared vocabulary, which can be used to model a domain, that is, the type of objects and/or concepts that exist, and their properties and relations.[2]
Ontologies are used in artificial intelligence, the Semantic Web, systems engineering, software engineering, biomedical informatics, library science, enterprise bookmarking, and information architecture as a form of knowledge representation about the world or some part of it. The creation of domain ontologies is also fundamental to the definition and use of an enterprise architecture framework.
Not restricted to a hierarchical structure as with a taxonomy
Ontologies
• OWL – Web Ontology Language (OWL 2.0 released)
• RDFS – RDF Schema
• Some existing standard ontologies:
– FOAF - Friend of a Friend - for social networks
– SIOC - Semantically Interlinked Online Communities
– GoodRelations for e-commerce
– Geodata
– Upper Level Ontology
– Google Ontology
– ...
RDFa
• RDFa is simply RDF in attributes. It adds a set of attribute level extensions to HTML, enabling rich metadata to be embedded within web pages.
• It not only enables triples to be embedded in web pages but also ultimately enables the extraction of triples
In short, this is the means by which we add structured markup to web pages
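To make "RDF in attributes" concrete, here is a minimal sketch that pulls triples out of a tiny page fragment. It handles only a small subset of RDFa (`about`, `typeof`, and `property` with either text content or a `content` attribute) and ignores nesting rules, so it is not a conforming RDFa parser. The page fragment, its URL, and the `TinyRDFaExtractor` class are made up for illustration; the FOAF terms are from the FOAF vocabulary.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; the URL and values are made up.
PAGE = """
<div xmlns:foaf="http://xmlns.com/foaf/0.1/"
     about="http://example.com/people#michael" typeof="foaf:Person">
  <span property="foaf:name">Michael</span>
  <span property="foaf:nick" content="mike"></span>
</div>
"""


class TinyRDFaExtractor(HTMLParser):
    """Collects (subject, predicate, object) triples from a subset of RDFa."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self.subject = None   # set by the most recent `about` attribute
        self.pending = None   # property still waiting for its text content

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "about" in a:
            self.subject = a["about"]
        if "typeof" in a and self.subject:
            self.triples.append((self.subject, "rdf:type", a["typeof"]))
        if "property" in a and self.subject:
            if "content" in a:
                self.triples.append((self.subject, a["property"], a["content"]))
            else:
                self.pending = a["property"]

    def handle_data(self, data):
        if self.pending and data.strip():
            self.triples.append((self.subject, self.pending, data.strip()))
            self.pending = None

    def handle_endtag(self, tag):
        # A property with no text before its closing tag yields no triple.
        self.pending = None


extractor = TinyRDFaExtractor()
extractor.feed(PAGE)
for triple in extractor.triples:
    print(triple)
```

The same markup that a browser renders as ordinary text is what a consumer reads back as triples, which is the produce/consume pairing discussed later in the deck.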
SPARQL
• SPARQL is an RDF Query Language.
• It is a recursive acronym and stands for SPARQL Protocol And RDF Query Language.
• Information from linked datasets can be accessed via SPARQL queries.
• Most linked data sources provide SPARQL endpoints to enable access.
• A SPARQL endpoint provides access to its data via the supported SPARQL protocol
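As a rough sketch of what talking to an endpoint involves, the snippet below encodes a SPARQL query as an HTTP GET URL using the `query` parameter from the SPARQL Protocol. It only builds the URL and does not send a request. The choice of DBpedia's public endpoint and the FOAF-based query are assumptions made for illustration, and `build_request_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumed endpoint for illustration; any SPARQL endpoint URL could go here.
ENDPOINT = "https://dbpedia.org/sparql"

# Example query: who knows whom, limited to ten rows.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?friend
WHERE { ?person foaf:knows ?friend . }
LIMIT 10
"""


def build_request_url(endpoint, query):
    """Encode a SPARQL query as a GET URL with a `query` parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_request_url(ENDPOINT, QUERY)
print(url)

# Round trip: decoding the URL gives back the original query text.
decoded = parse_qs(urlparse(url).query)["query"][0]
print(decoded == QUERY.strip())
```

The triple pattern inside `WHERE { ... }` has the same subject, predicate, object shape as the RDF triples shown earlier, with variables in the open positions.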
To cut a loooong story short:
We go from: a web of documents or hyperlinks, to: a web of data or semantic links
with: linked data and linked datasets.
We use RDF to represent the data on the web
and we use SPARQL to query the data
And RDFa is simply RDF in attributes.
Core Concepts:
• Not that we have forgotten about:
– RDFS (RDF Schema)
– OWL (Web Ontology Language)
• And then, in summary:
– RDF (central to all)
– Linked Data
– SPARQL
• RDFa (simply stated, it is HTML markup)
So at this point either your head is spinning
or
You are bored to death because you already know about the Semantic Web
SO ...
- How is this being used?
- What is the extent of adoption?
- Who is using it?
- How can it be leveraged?
LOD Cloud Evolution
The rate of growth has been remarkable
Source maintained by: Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net
March 5, 2009
[Figure: LOD cloud diagram as of March 2009]
March 27, 2009
[Figure: LOD cloud diagram as of March 2009]
Sept 22, 2010
[Figure: LOD cloud diagram as of September 2010]
LOD cloud - Sept 22, 2010
[Figure: LOD cloud diagram as of September 2010, with datasets grouped by domain: Media, Geographic, Publications, Government, Cross-domain, Life sciences, User-generated content]
latest LOD cloud
Adopters?
• UK Government
• US Government
• BBC (FIFA World Cup site dynamically generated using linked data)
• Thomson Reuters
• Freebase
• NY Times
• Best Buy
• Tesco
• Google (more to follow: http://rdf.data-vocabulary.org/#)
• Yahoo
• Facebook
• Oracle
• Tons more – just look at the diversity in the LOD data cloud
• ...
What is Semantic Search
• Semantic Search is basically the notion of improving search by using metadata or searching on that metadata.
• There are several ways that the search engines on the web may use this to enhance search results.
– FIND, rather than SEARCH.
• Searching directly on the metadata can yield specific answers or results, as demonstrated in the following example:
Query “Barack Obama Birthday”
Results on
What is Semantic Search (cont)
• Ran the query “Barack Obama Birthday” on both Google and Bing. Obtained the following:
– Answer engines rather than search engines?
• At this point, really, a definitive answer followed by the standard search result set for that query
What is Semantic Search (cont)
– Another aspect of using metadata, such as embedding metadata or semantic markup in web pages, is demonstrated by enhanced displays in search results (e.g. rich snippets in Google). Both Google and Yahoo support enhanced displays for RDFa markup.
Rich Snippets
• Google now supports rich snippets for
– People
– Events
– Businesses and organizations
– Reviews
– Recipes
– Products
– Breadcrumbs
– Local Search
– Video
– Images
http://rdf.data-vocabulary.org/#
Social Networks
• While search engines can benefit from access to social networks, social networks can benefit from semantic metadata in web pages
– An example is Facebook’s Open Graph Protocol (also supports RDFa), which allows users to share and like objects (such as products) as opposed to web pages. This enables “Semantic Profiling” of the users by Facebook.
Generic Web Benefits / Uses
• Yahoo stated a 15% increase in CTR as a result of enhanced displays (cf. rich snippets in Google)
• Definitive answers enabled by understanding and leveraging how search engines are searching directly on metadata
• Semantic Profiling and adoption by social networks
• Embedding semantic markup in web pages and product pages ultimately makes information “findable” by search engines, enabling them to provide improvements such as definitive answers, enhanced displays, etc
Google supports 3 alternatives for microformats for products
• Google format
• hProduct
• GoodRelations
• Which do we use & why?
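For a feel of what the GoodRelations option looks like on a product page, here is a hedged sketch that renders one offer as an RDFa fragment. The GoodRelations terms used (`gr:Offering`, `gr:name`, `gr:hasPriceSpecification`, `gr:UnitPriceSpecification`, `gr:hasCurrency`, `gr:hasCurrencyValue`) come from the vocabulary; the `offering_rdfa` helper, the product URL, the name, and the price are made up, and the output has not been checked against any search engine's validator.

```python
from html import escape

GR_NS = "http://purl.org/goodrelations/v1#"


def offering_rdfa(url, name, price, currency):
    """Render one product offer as an RDFa fragment using GoodRelations terms."""
    return (
        f'<div xmlns:gr="{GR_NS}" about="{escape(url)}#offer" typeof="gr:Offering">\n'
        f'  <span property="gr:name">{escape(name)}</span>\n'
        f'  <div rel="gr:hasPriceSpecification">\n'
        f'    <div typeof="gr:UnitPriceSpecification">\n'
        f'      <span property="gr:hasCurrency" content="{escape(currency)}"></span>\n'
        f'      <span property="gr:hasCurrencyValue" content="{price:.2f}"></span>\n'
        f'    </div>\n'
        f'  </div>\n'
        f'</div>'
    )


# Hypothetical product data, for illustration only.
snippet = offering_rdfa("http://example.com/coat", "Women's Military Coat", 59.99, "USD")
print(snippet)
```

The nested `rel` and `typeof` elements are where relationships between the offer and its price are expressed, which is why nesting of divs matters when the markup is later consumed as triples.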
What can we do with RDFa?
• Produce RDFa
• Consume RDFa
• Build smart applications with consumed RDFa, as we now have a triple store and all the reasoning and decision making tools such as those provided by AllegroGraph
http://www.overstock.com/Clothing-Shoes/Grane-Womens-Double-breasted-Military-Coat/5237784/product.html
Jans will demonstrate how to navigate this in AllegroGraph to ensure relationships are set correctly re: nested divs, etc.
1. Optimized RDFa in all pages that
- serves Google Rich Snippets
- serves Yahoo SearchMonkey
- is GoodRelations valid
- is accessible to all client-side browser extensions (there will be a lot of such services soon, believe me)
- is perfect for any semantics-aware search engines so that overstock.com offers are visible
2. Novel recommender systems based on GoodRelations
I have quite a lot of novel approaches for GoodRelations-based recommender systems (in-site), which would be very useful for consumer electronics and fashion, in particular.
3. Precision E-Commerce Content Syndication
Soon, owners of small sites will be able to syndicate very specific e-commerce content based on GoodRelations (same as currently supported by eBay and Amazon, but then across the whole WWW).
We have a prototype for this already running; it could be a great new direction for overstock (besides the positive buzz about it).
http://www.nytimes.com/2010/11/25/technology/personaltech/25smart.html?src=busln
Myriad of competitive shopping apps already out there, but always room for one more?
How is this potentially disruptive?
The hypothesis: Possible new consumer market. Good for mama & papa shops and small business. Also for Overstock. Data is exposed that may be lost if only consumed by larger users, e.g. Google
[Diagram labels: Mama & Papa Online Store; RDFa; Web of data (LOD cloud); Semantic GR-based shopping apps; Random Shopper; Shoppers]
Where do we go from here?
• Other means of supplying GR markup: http://www.jarltech.de is exposing its full catalog with ca. 4,000 auto-ID products in RDF/XML (updated several times a week!) (not as silly as I first thought)
http://www.jarltech.de/goodrelations.rdf
Other Verticals within retail
• New Vertical Markets
– Automotive (a large automotive manufacturer recently adopted GR: announcement coming soon)
– Financial
– Fashion
– Real Estate
– Tickets
– Electronics