LarKC: The Large Knowledge Collider

Post on 06-Dec-2014

DESCRIPTION

Brief presentation of vision and mission of the LarKC project, building the Large Knowledge Collider http://www.larkc.eu

TRANSCRIPT

Frank van Harmelen, Vrije Universiteit Amsterdam

The Large Knowledge Collider

Creative Commons License: allowed to share & remix, but must attribute & non-commercial

• The vision

• The project

• The consortium

• The plan

Yes! Oh Shit…

The Vision

“a configurable platform for infinitely scalable semantic web reasoning”

27-June-07

Why we need The Large Knowledge Collider

Gartner (May 2007): “By 2012, 70% of public Web pages will have some level of semantic markup, 20% will use more extensive Semantic Web-based ontologies”

• Semantic Technologies at Web Scale?

– 20% of 30 billion pages @ 1000 triples per page = 6 trillion triples

– 30 billion and 1000 are underestimates, imagine in 6 years from now…

– data-integration and semantic search at web-scale?
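
The 6 trillion figure above can be checked with a few lines of arithmetic. This is only a back-of-envelope sketch using the numbers quoted on the slide; the function and constant names are illustrative and not part of the LarKC project.

```python
# Back-of-envelope estimate of Semantic Web size, using the slide's figures.
PAGES_ON_WEB = 30_000_000_000   # 30 billion pages (slide figure)
ONTOLOGY_SHARE_PERCENT = 20     # Gartner: 20% use Semantic Web-based ontologies
TRIPLES_PER_PAGE = 1_000        # slide assumption


def estimate_triples(pages: int, share_percent: int, triples_per_page: int) -> int:
    """Estimate the total number of triples with exact integer arithmetic."""
    annotated_pages = pages * share_percent // 100
    return annotated_pages * triples_per_page


total = estimate_triples(PAGES_ON_WEB, ONTOLOGY_SHARE_PERCENT, TRIPLES_PER_PAGE)
print(f"{total:,} triples ({total:.1e})")
```

Since the slide calls both 30 billion and 1000 underestimates, the result is best read as a lower bound.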

Denny Vrandečić – AIFB, Universität Karlsruhe (TH), http://www.aifb.uni-karlsruhe.de/WBS

1 triple:

10^7 triples [OWLIM]: Suez Canal

10^8 triples [Ingenta], RDF store with subsecond querying: Moon

~10^9 triples: Earth

~10^10 triples [LarKC proposal] ≈ 1 triple per web page: Jupiter

~10^11 triples

~10^14 triples [Fensel / Harmelen estimate]: distance Sun – Pluto

Infinitely scalable (1/2)

• by giving up 100% correctness:

– trading quality for size

– often completeness is not needed

– sometimes even correctness is not needed

[Figure: axes precision (soundness) and recall (completeness); regions labelled logic, IR and Semantic Web]

A logician’s nightmare (Dieter Fensel)
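
The trade-off between soundness and completeness can be made concrete with the standard IR definitions of precision and recall. The sketch below compares a hypothetical approximate answer set with the exact one; the answer sets are invented for illustration and do not come from the talk.

```python
def precision_recall(returned: set, correct: set) -> tuple[float, float]:
    """Return (precision, recall) of a returned answer set.

    precision = share of returned answers that are correct (soundness)
    recall    = share of correct answers that were returned (completeness)
    """
    true_positives = len(returned & correct)
    precision = true_positives / len(returned) if returned else 1.0
    recall = true_positives / len(correct) if correct else 1.0
    return precision, recall


# Hypothetical example: the exact reasoner would return five answers,
# the approximate one returns four, one of which is wrong.
exact_answers = {"a", "b", "c", "d", "e"}
approximate_answers = {"a", "b", "c", "x"}

precision, recall = precision_recall(approximate_answers, exact_answers)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A classical logic reasoner aims at 1.0 on both axes; giving up that requirement is what allows answers to be traded for scale.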

Infinitely scalable (2/2)

• by parallelisation:

– cluster computing

– wide area distribution “Thinking@home”, “self-computing semantic Web”

– cloud computing? (Amazon now, Google soon?)
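
As a rough illustration of the parallelisation idea, a triple set can be split into shards that are searched independently and then merged. This is a minimal single-machine sketch using Python threads, not the LarKC platform's design; the shard count, the triples and the function names are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

Triple = tuple[str, str, str]


def match(shard: list[Triple], predicate: str) -> list[Triple]:
    """Return the triples in one shard whose predicate matches."""
    return [triple for triple in shard if triple[1] == predicate]


def parallel_match(triples: list[Triple], predicate: str, n_shards: int = 4) -> list[Triple]:
    """Split the triples into shards, match each shard concurrently, merge the results."""
    shards = [triples[i::n_shards] for i in range(n_shards)]
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        parts = list(pool.map(match, shards, [predicate] * n_shards))
    return sorted(triple for part in parts for triple in part)


# Hypothetical data, invented for this example.
triples = [
    ("amsterdam", "type", "City"),
    ("a10", "type", "Road"),
    ("a10", "locatedIn", "amsterdam"),
    ("vu", "locatedIn", "amsterdam"),
    ("vu", "type", "University"),
]
print(parallel_match(triples, "type"))
```

Threads in CPython do not speed up CPU-bound matching; the point here is only the split, match and merge structure, which carries over to clusters or wide-area distribution where each shard lives on a different machine.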

“Configurable platform”

“a configurable platform for infinitely scalable semantic web reasoning”

Why “LarKC”?

• The Large Knowledge Collider

A configurable platform for experimentation by others

Why “LarKC”?

and also:

1. a merry, carefree adventure.

2. innocent or good-natured mischief; a prank.

3. something extremely easy to accomplish

But also:

• The vision

• The consortium

• The project

• The plan

The consortium

50 people present

The Consortium

• Combining consortium competence

– IR, Cognition

– ML, Ontologies

– Statistics, ML, Cognition, DB

– Logic, DB, Probabilistic Inference

– Economics, Decision Theory

The Consortium

 

[Table: consortium partners against competence areas]

Competence areas: Semantic Web, Logic, Distributed Computing, Information Retrieval, human problem solving, Machine Learning, Probabilistic Inference, RDF technology, Database Technology, Use Case 1, Use Case 2

Partners: UIBK, VUA, CycEur, HLRS, USFD, MPG, WICI, Siemens, Ontotext, CEFRIEL, Saltlux, WHO-IARC

• The vision

• The consortium

• The project

• The plan

Oh Shit…

The project

• 10M€ budget

• 3.5 years

• 80 person years

• 3 case studies

• 14 partners

• obtained in FP7 Call 1:

– overall < 10% funding rate

– LarKC has highest funding, longest runtime

Project Workpackages & timeline

WP 1: Conceptual Framework & Evaluation

WP 2: Retrieval and Selection

WP 3: Abstraction and Learning

WP 4: Reasoning and Deciding

WP 5: Collider Platform

WP 6: Use case: Real Time City

WP 7a: Use case: Early Clinical Development

WP 7b: Use case: Carcinogenesis Reference Production

WP 8: Training, dissemination, community building

WP 9: Exploitation and standards

WP 10: Project Management

Use case: Drug Discovery

• Problem: pharmaceutical R&D in early clinical development is stagnating

(Q1, Q2, Q3)

FDA white paper Innovation or Stagnation (March 2004):

“developers have no choice but to use the tools of the last century to assess this century's candidate solutions.”

“industry scientists often lack cross-cutting information about an entire product area, or information about techniques that may be used in areas other than theirs”

“Show me any potential liver toxicity associated with the compound’s drug class, target, structure and disease.”

Q1 (Genetics): “Show me all liver toxicity associated with the target or the pathway.”

Q2 (Chemistry): “Show me all liver toxicity associated with compounds with similar structure”

Q3 (Literature): “Show me all liver toxicity from the public literature and internal reports that are related to the drug class, disease and patient population”

Current NCBI: linking but no inference
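
The difference between linking and inference can be sketched on a toy graph. All names below are hypothetical and come neither from NCBI nor from the LarKC use case, and following link chains is only a crude stand-in for what a real reasoner does with schema-level rules.

```python
# Hypothetical facts, invented for illustration only.
links = {
    ("compoundA", "inhibits", "targetT"),
    ("targetT", "partOf", "pathwayP"),
    ("pathwayP", "associatedWith", "liverToxicity"),
}


def direct_links(node: str) -> set[str]:
    """Linking only: what the node points to in a single step."""
    return {obj for subj, _, obj in links if subj == node}


def reachable(node: str) -> set[str]:
    """Simple inference: everything reachable by following chains of links."""
    seen: set[str] = set()
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for neighbour in direct_links(current):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append(neighbour)
    return seen


print("liverToxicity" in direct_links("compoundA"))  # linking alone misses it
print("liverToxicity" in reachable("compoundA"))     # chained links find it
```

With linking alone, the user has to walk from compound to target to pathway by hand; the questions Q1 to Q3 ask the system to make those steps itself.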

Use Case: City on-line

• Our cities face many challenges

• Urban Computing is the ICT way to address them

• How can we redevelop existing neighborhoods and business districts to improve the quality of life?

• How can we create more choices in housing, accommodating diverse lifestyles and all income levels?

• How can we reduce traffic congestion yet stay connected?

• How can we include citizens in planning their communities rather than limiting input to only those affected by the next project?

• How can we fund schools, bridges, roads, and clean water while meeting short-term costs of increased security?

Is public transportation where the people are?

Which landmarks attract more people?

Where are people concentrating?

Where is traffic moving?

• The vision

• The consortium

• The project

• The plan

Oh Shit…

Project Timeline

• Surveys (plugins, platform)

• Requirements (use cases)

Releases: Prototype, Internal Release, Public Release, Final Release

Use Cases V1

Use Cases V2

Use Cases V3

Months: 0, 6, 10, 18, 33, 42

Communication

• Early Access Group

• Usage Competition

– “we will win if we start to lose”

• We deliver:

– software

– publications

– not “deliverables”

And Finally…

• People are already looking at us:

– “Damn... the EU is where all the cool semweb work is happening these days”

– “This kind of infrastructure is exactly the kind of rocket fuel that is needed at this stage of semweb maturity.”

– “The LarKC-inspired workshop on new forms of reasoning for the semantic web was a conference highlight for me”

– “With the current growth rates of RDF on the Web, LarKC which started out as technologically possible will quickly become operationally necessary”

– “this project really has it all (potentially) in terms of both science and impact”

• projects already seeking collaboration: OKKAM, MUSING

“This project has the potential to change the way people work in this area”
