It’s the A-box, stupid! (free after Carville/Clinton) Frank van Harmelen, Vrije Universiteit Amsterdam. Creative Commons License: allowed to share & remix, but must attribute & non-commercial


Post on 25-Apr-2020


It’s the A-box, stupid!
(free after Carville/Clinton)

Frank van Harmelen
Vrije Universiteit Amsterdam

Creative Commons License: allowed to share & remix, but must attribute & non-commercial


Semantic Web

News Headlines


• energy consumption
• unemployment rates
• river elevations
• social benefits
• trade statistics
• assaults on police
• tornado reports
• crime statistics
• consumer price index
• recent earthquakes
• consumer expenditure
• toxic releases


<rdf:RDF>
  <rdf:Description rdf:about="/music/artists/584c04d2-4acc-491b-8a0a-e63133f4bfc4.rdf
    <rdfs:label>Description of the artist Yeah Yeah Yeahs</rdfs:label>
    <foaf:primaryTopic rdf:resource="/music/artists/584c04d2-4acc-491b-8a0a-e63133f4bf
  </rdf:Description>
  <mo:MusicArtist rdf:about="/music/artists/584c04d2-4acc-491b-8a0a-e63133f4bfc4#a
    <rdf:type rdf:resource="http://purl.org/ontology/mo/MusicGroup"/>
    <foaf:name>Yeah Yeah Yeahs</foaf:name>
    <ov:sortLabel>Yeah Yeah Yeahs</ov:sortLabel>
    <bio:event>
      <bio:Birth>
        <bio:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime
    </bio:event>
    <owl:sameAs rdf:resource="http://dbpedia.org/resource/Yeah_Yeah_Yeahs"/>
    <mo:image rdf:resource="/music/images/artists/7col_in/584c04d2-4acc-491b-8a0a-e63
    <foaf:page rdf:resource="/music/artists/584c04d2-4acc-491b-8a0a-e63133f4bfc4.html"/
    <mo:musicbrainz rdf:resource="http://musicbrainz.org/artist/584c04d2-4acc-491b-8a0a-
    <foaf:homepage rdf:resource="http://www.yeahyeahyeahs.com/"/>
    <mo:wikipedia rdf:resource="http://en.wikipedia.org/wiki/Yeah_Yeah_Yeahs"/>
    <mo:myspace rdf:resource="http://www.myspace.com/yeahyeahyeahs"/>
    <mo:member rdf:resource="/music/artists/a1439b8d-672a-446f-a7ff-6f09d68254b3#art
    <mo:member rdf:resource="/music/artists/14d44067-99c2-4f77-b58b-138f0b6911fa#ar
    <mo:member rdf:resource="/music/artists/20dc35ec-6cc1-4c66-98a3-4a6116cb3869#ar
    ...


    <foaf:made>
      <mo:Record>
        <dc:title>It's Blitz!</dc:title>
        <mo:musicbrainz rdf:resource="http://musicbrainz.org/release/9c4177fe-bdce-4f9d-ab
        <rev:hasReview rdf:resource="/music/reviews/hnp2#review"/>
      </mo:Record>
    </foaf:made>
    .....
  <mo:MusicArtist rdf:about="/music/artists/a1439b8d-672a-446f-a7ff-6f09d68254b3#artis
    <foaf:name>Brian Chase</foaf:name>
  </mo:MusicArtist>
  <mo:MusicArtist rdf:about="/music/artists/14d44067-99c2-4f77-b58b-138f0b6911fa#arti
    <foaf:name>Karen O</foaf:name>
  </mo:MusicArtist>
  <mo:MusicArtist rdf:about="/music/artists/20dc35ec-6cc1-4c66-98a3-4a6116cb3869#art
    <foaf:name>Nick Zinner</foaf:name>
  </mo:MusicArtist>
</rdf:RDF>


<rdf:RDF>
  <rdf:Description rdf:about="/music/reviews/h24h.rdf">
    <rdfs:label>Description of a review of Fever To Tell</rdfs:label>
    <foaf:primaryTopic rdf:resource="/music/reviews/h24h#review"/>
  </rdf:Description>
  <rev:Review rdf:about="/music/reviews/h24h#review">
    <rev:title>Fever To Tell</rev:title>
    <rdfs:label> Review of Fever To Tell - Yeah Yeah Yeahs, by Nick Reynolds</rdfs:label
    <rev:createdOn rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2003−
    <foaf:primaryTopic>
      <mo:Record>
        <dc:title>Fever to Tell</dc:title>
        <owl:sameAs rdf:resource="http://dbpedia.org/resource/Fever_to_Tell"/>
        <mo:musicbrainz rdf:resource="http://musicbrainz.org/release/f4783344-6746-4938
        <foaf:maker rdf:resource="/music/artists/584c04d2-4acc-491b-8a0a-e63133f4bfc4#
      </mo:Record>
    </foaf:primaryTopic>
    <rev:reviewer>
      <foaf:Person><foaf:name>Nick Reynolds</foaf:name></foaf:Person>
    </rev:reviewer>
    <rev:text><p>When the Yeah Yeah Yeahs stormed into the UK...
    </rev:text>
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc-sa/3.0/"/>
  </rev:Review>


meta-lex + RDF-a

hosting LOD

EU tenders

property+attribute RDF-a

RDF export


When success is becoming a problem...


Success is becoming a problem

Gartner (May 2007): “By 2012, 70% of public Web pages will have some level of semantic markup, 20% will use more extensive Semantic Web-based ontologies”

• Semantic Technologies at Web Scale?
– 20% of 30 billion pages @ 1000 triples per page = 6 trillion triples
– 30 billion and 1000 are underestimates, imagine in 6 years from now…
– data-integration and semantic search at web-scale?
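
The estimate on this slide is plain arithmetic. A minimal sketch that reproduces it, using only the figures quoted above (the variable names are illustrative, and integer arithmetic is used so the result is exact):

```python
# Back-of-the-envelope estimate; every input is a figure quoted on the slide.
pages = 30_000_000_000      # 30 billion public Web pages
ontology_percent = 20       # share of pages using Semantic Web-based ontologies
triples_per_page = 1_000    # triples per page assumed on the slide

# Pages carrying ontology-based markup, then the triples they contribute.
annotated_pages = pages * ontology_percent // 100
total_triples = annotated_pages * triples_per_page

print(f"{annotated_pages:,} annotated pages")
print(f"{total_triples:,} triples")
```

Since the slide calls both 30 billion and 1000 underestimates, the result is best read as a lower bound.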


Denny Vrandečić – AIFB, Universität Karlsruhe (TH) 17 http://www.aifb.uni-karlsruhe.de/WBS

1 triple:


10^7 triples [OWLIM]

Suez Canal


RDF Store, subsecond querying: 10^8 triples [Ingenta]

Moon


~10^9 triples

Earth


[LarKC proposal] ~10^10 triples ≈ 1 triple per web-page

Jupiter


~10^11 triples


Distance Sun – Pluto

Fensel / Harmelen estimate: ~10^14 triples


What to do when success becomes a problem?

The Large Knowledge Collider

a platform for infinitely scalable reasoning on the data-web


Why “LarKC”?

• The Large Knowledge Collider

A configurable platform for experimentation by others


08/01/09

Part I: LarKC platform

LarKC = a platform for large scale reasoning

Quote from US high-tech CTO:

“Semantic web research is stifled by the complexity of writing a large scale engine, with services for data access, storage, aggregation, inference, transport, transformation, etc.

Physics research has dealt with a similar problem by providing large scale infrastructure into which experiments can be plugged.

The idea behind LarKC, which I found so compelling, is that people who wanted to build small scale plugins, for example, plugins for some non-standard deduction, or transformation of text to triples, or estimating the weights for relational models, could do so, taking advantage of the EU's investment in a platform with significant capabilities.”


Part I: LarKC platform

LarKC = a platform for large scale reasoning

Quote from EU Project Officer:

“LarKC's value is as an experimental platform. LarKC is an environment where people can go to replicate (or extend) their results in an environment where all the infrastructural heavy lifting has already been taken care of”


“Configurable platform”

“a configurable platform for infinitely scalable semantic web reasoning”


What do we mean by:

LarKC = a platform for large scale reasoning

• reusable components
• reconfigurable workflows
• provide infrastructure needed by all users:
– storage and retrieval
– communication (between plugins, plugins and datalayer)
– synchronisation (support for anytime behaviour)
– registration (of plugins)
– abstracts from local or remote data-storage
– abstracts from local or remote plugin-invocation
– (will) provide instrumentation & measuring
– (will) provide for caching and data-locality
• integration of very heterogeneous components
– heterogeneous data: unstructured text, (semi)structured data
– heterogeneous code: Java, scripts, remote services ("wrap & integrate")


Infinite scalability?

parallelisation
• cluster computing

distribution
• “Thinking@home”, “self-computing semantic Web”

approximation
• “almost” is often good enough
• gets better with more resources


What do we mean by:

LarKC = a platform for large scale reasoning

not only: deductive inference over given axioms
but also:
where do the axioms come from? (IDENTIFY)
which part of knowledge & data is required (SELECTion)
when is an answer "good enough" or "best possible" (DECIDEr)
non-deductive inference (inductive, statistical) (REASONer)

Remember: “ReaSearch: integrating reasoning and search"
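
One way to picture how the four roles on this slide fit together is a loop that widens the selection until the decider is satisfied. The sketch below is an assumption-laden illustration, not LarKC code: the function names mirror the slide's roles, but their signatures, the toy triples, the `budget` parameter and the choice of a single RDFS subclass rule for the reasoner are all made up for the example.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def identify() -> List[Triple]:
    """IDENTIFY: where the statements come from (here: a hard-coded toy source)."""
    return [
        ("YeahYeahYeahs", "rdf:type", "mo:MusicGroup"),
        ("mo:MusicGroup", "rdfs:subClassOf", "mo:MusicArtist"),
        ("mo:MusicArtist", "rdfs:subClassOf", "foaf:Agent"),
        ("SuezCanal", "rdf:type", "geo:Canal"),
    ]


def select(source: List[Triple], budget: int) -> List[Triple]:
    """SELECT: use only part of the data (here: the first `budget` triples)."""
    return source[:budget]


def reason(triples: List[Triple]) -> Set[Triple]:
    """REASON: repeat (x type C), (C subClassOf D) => (x type D) until nothing new."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for (x, p, c) in list(facts):
            if p != "rdf:type":
                continue
            for (c2, q, d) in list(facts):
                if q == "rdfs:subClassOf" and c2 == c and (x, "rdf:type", d) not in facts:
                    facts.add((x, "rdf:type", d))
                    changed = True
    return facts


def decide(facts: Set[Triple], goal: Triple) -> bool:
    """DECIDE: is the answer good enough? (here: has the goal been derived?)"""
    return goal in facts


def answer(goal: Triple, max_budget: int) -> Tuple[bool, int]:
    source = identify()
    for budget in range(1, max_budget + 1):
        if decide(reason(select(source, budget)), goal):
            return True, budget
    return False, max_budget


found, used = answer(("YeahYeahYeahs", "rdf:type", "foaf:Agent"), max_budget=4)
print(found, used)
```

In this toy run the goal is derived from the first three triples, so the unrelated fourth statement is never touched. That is the point of selecting before reasoning when the data is web-scale.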


25 working plugins

• IDENTIFY data sources (e.g. Sindice, Swoogle)
– note: use existing web-service
• IDENTIFY text sources (Google)
– note: use existing non-semantic search engine
• TRANSFORM text to triples (GATE, Open Calais)
– note: use a very large (pipeline-based) system (GATE)
• TRANSFORM XML data to RDF triples (XSLT)
• SELECT geographically relevant triples (AllegroGraph)
– note: use another RDF store as SELECT
• SELECT semantically relevant triples
– tokens, key phrases, prior knowledge, ranked
• SELECT structurally relevant triples through spreading activation
• REASON with very different reasoners (Jena, Cyc, IRIS, DIG)
• REASON over inconsistent ontologies (PION)
• ......


First result: MaRVIN

[Figure: MaRVIN architecture. Many nodes, each with an InputPool, Reasoning, Routing and an OutputPool; DataPreparation; ResultStorage; statistics & visualisation]

MaRVIN scales by:
• distribution (over many nodes)
• approximation (sound but incomplete)
• anytime convergence (more complete over time)

brain the size of a planet
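
The three scaling properties can be illustrated with a toy simulation. This is a sketch under strong assumptions and not MaRVIN's implementation: the only inference rule is transitivity of `rdfs:subClassOf`, routing is uniformly random with a fixed seed, and the names `derive`, `full_closure` and `run` are invented for the example.

```python
import random
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
SUB = "rdfs:subClassOf"


def derive(pool: Set[Triple]) -> Set[Triple]:
    """One pass of subclass transitivity over the triples a node holds locally."""
    return {
        (a, SUB, d)
        for (a, p, b) in pool if p == SUB
        for (c, q, d) in pool if q == SUB and b == c
    }


def full_closure(triples: Set[Triple]) -> Set[Triple]:
    """Centralised reference closure, used only to check soundness."""
    facts = set(triples)
    while True:
        new = derive(facts) - facts
        if not new:
            return facts
        facts |= new


def run(triples: Set[Triple], n_nodes: int, rounds: int, seed: int = 0):
    rng = random.Random(seed)
    # Distribution: scatter the input over the nodes' input pools.
    pools: List[Set[Triple]] = [set() for _ in range(n_nodes)]
    for t in sorted(triples):
        pools[rng.randrange(n_nodes)].add(t)
    store = set(triples)      # result storage shared by all nodes
    sizes = []                # size of the result store after each round
    for _ in range(rounds):
        next_pools: List[Set[Triple]] = [set() for _ in range(n_nodes)]
        for pool in pools:
            # Approximation: each node reasons only over what it holds locally.
            produced = pool | derive(pool)
            store |= produced
            # Routing: send every triple to a randomly chosen node.
            for t in sorted(produced):
                next_pools[rng.randrange(n_nodes)].add(t)
        pools = next_pools
        sizes.append(len(store))
    return store, sizes


chain = {(f"C{i}", SUB, f"C{i + 1}") for i in range(6)}
store, sizes = run(chain, n_nodes=3, rounds=20)
print(len(store), "of", len(full_closure(chain)), "triples in the closure")
```

Every triple in `store` is a valid consequence (sound), nothing guarantees that all consequences are found within a given number of rounds (incomplete), and the store never shrinks, so stopping at any time yields a usable partial result that more rounds can only improve (anytime).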


Some muscle rolling

• “Lazy semantic pipes” for efficiently handling O(10^10) triples
• Peak inference rates at 8M triples/sec
• Sustained inference rates at 4M triples/sec
• deployed on laptops, servers, clusters
– remote deployment on 64 nodes
• 25 plugins, 3 pipelines
– WebSPARQL
– Alpha Urban LarKC
– NewsSPARQL
• Substantial new datasets:
– Linked Life Data (1.4B explicit, 2.3B closure, 1.3M links)
– Milan traffic grid (2M explicit + 2Tb sensor-data (to come))
– Interest-enhanced DBLP (615k authors + interests)
– LDSR (358M explicit + 512 inferred, 100m URIs)


We encourage further contributions from external parties:

• The Large Knowledge Collider is an open and configurable platform.
• The first public version of the Large Knowledge Collider is available now.

1. Organisations from outside the consortium can use the LarKC platform for their own purposes.
2. LarKC has formed an "early adopters group".
– LarKC will actively support this group in using the Large Knowledge Collider platform.
– This group will be given access to a high-performance computing cluster in Germany for running LarKC on their own problems.

External parties (both academic and commercial) are welcome to contact us about this opportunity.


Historical perspective

Implications for tools?

(thinking aloud...)


Any patterns in the applications?

They all read from and/or write to the Linked Open Data cloud
They all do some reasoning
The reasoning is very lightweight
But happens over very many instances

Notice stark difference with Guus:

"If we cannot show added value in knowledge-rich domains, then it may have no value at all".


Two tribes?

Maximum Fidelity tribe: capture the conceptual relationships within knowledge and domains

Maximum Scalability tribe: link and describe as many things as possible


A history of KR?

'70s: foundational theories
small axiom sets (< 10^2), highly intensive reasoning
Hayes’ Naive Physics Manifesto
few but very generic rules, (almost) no instance data

'80s-'90s: KBS (10^2-10^3 rules)
moderately intensive reasoning: frames, rules, semantic networks
Feigenbaum’s "the power is in the knowledge"

'00-'10: Web of Data (10^10 facts, 10^7 rules)
many, many instances
Linked Open Data cloud
“it’s the A-box, stupid!"


Open questions

Is this shift saying something fundamental
about knowledge representation?
about knowledge engineering?
about knowledge?

Should this have consequences for the tools we develop?