slider: an efficient incremental reasoner, by jules chevalier

30
Slider: an Efficient Incremental Reasoner Jules Chevalier [email protected] LT2C, Télécom Saint Etienne, Université Jean Monnet Octobre 2014 Supervisors : Fréférique Laforest Christophe Gravier Julien Subercaze

Upload: opencloudware

Post on 13-Jul-2015

71 views

Category:

Internet


0 download

TRANSCRIPT

Slider: an Efficient Incremental Reasoner

Jules [email protected]

LT2C, Télécom Saint Etienne, Université Jean Monnet

Octobre 2014

Supervisors :Fréférique LaforestChristophe GravierJulien Subercaze

Summary

Introduction

State of the art

Contribution

Experimental results

Conclusion

2 / 27

Semantic Web & Description Logic

I Formalises concepts to represent themI Standardizes this representationI Makes it readable for both humans and computersI Links these data togetherI Allows automatic operations on these data

I Integrity constraint validationI Query the knowledge baseI Extraction of implicit data

3 / 27

Semantic Web & Description Logic

I Formalises concepts to represent themI Standardizes this representationI Makes it readable for both humans and computersI Links these data togetherI Allows automatic operations on these data

I Integrity constraint validationI Query the knowledge baseI Extraction of implicit data = Reasoning

4 / 27

Reasoning : Forward Chaining VS Backward Chaining

Abraham

Homer Marge

Liza Bart

I What we know :I Abraham father HomerI Homer father LizaI Homer father BartI Marge mother LizaI Marge mother bart

I What Forward Chaining do :I Abraham grandfather LizaI Abraham grandfather BartI ...I Abraham grandfather Liza ? → yes

I What Backward Chaining do :I Abraham grandfather Liza ?I Abraham father X & X father Liza ?I Abraham father Homer &

Homer father Liza → yes

5 / 27

Reasoning : Forward Chaining VS Backward Chaining

Abraham

Homer Marge

Liza Bart

I What we know :I Abraham father HomerI Homer father LizaI Homer father BartI Marge mother LizaI Marge mother bart

I What Forward Chaining do :I Abraham grandfather LizaI Abraham grandfather BartI ...I Abraham grandfather Liza ? → yes

I What Backward Chaining do :I Abraham grandfather Liza ?I Abraham father X & X father Liza ?I Abraham father Homer &

Homer father Liza → yes

5 / 27

Reasoning : Forward Chaining VS Backward Chaining

Abraham

Homer Marge

Liza Bart

I What we know :I Abraham father HomerI Homer father LizaI Homer father BartI Marge mother LizaI Marge mother bart

I What Forward Chaining do :I Abraham grandfather LizaI Abraham grandfather BartI ...I Abraham grandfather Liza ? → yes

I What Backward Chaining do :I Abraham grandfather Liza ?I Abraham father X & X father Liza ?I Abraham father Homer &

Homer father Liza → yes5 / 27

Rule-based ReasoningRules

I An antecedent: Allows the rule to be executedI A consequent: The statement inferred

c1 subClassOf c2, x type c1 (cax-sco)x type c2

FragmentsI A fragment is a set of inference rulesI Semantic Web standards suggest different pre defined

fragments (RDFS, OWL Lite, OWL Full, OWL DL, ...)I The more they have a high expressivity, the more the

operations are complex (from P to NEXPTIME)I Choosing one fragment is trade off between expressivity and

computational complexity

6 / 27

Problematic

What we want to doI Efficient forward-reasoning over Big Data (NP complete for

most complex ontologies)

What are the problemsI Rules form a cyclic graph

I Complexity depends on the fragment !I The amount of triples generated is quite unpredictable

I The complexity also depends on data !I Big Data is not static

I We need to handle data streams !

7 / 27

Problematic

What we want to doI Efficient forward-reasoning over Big Data (NP complete for

most complex ontologies)

What are the problemsI Rules form a cyclic graph

I Complexity depends on the fragment !I The amount of triples generated is quite unpredictable

I The complexity also depends on data !I Big Data is not static

I We need to handle data streams !

7 / 27

Summary

Introduction

State of the art

Contribution

Experimental results

Conclusion

8 / 27

Batch reasoning approachesWebPie : a Web-scale Parallel Inference Engine

I 2009 - Jacopo Urbani Thesis [6]I Uses MapReduce for OWL Horst and RDFS reasoning

I 2011 - Fix some issues to improve OWL Horst reasoning [7]I Duplicates limitationI Indexation for sameAsI Greedy schedulingI Cleaner Job after some rules, or at the end

MapResolve [5]

I Based their work on WebPie to develop a OWL Horst reasonerI Use 3 sets for triples : usable, used, inferredI Limits overheads, optimiseI Points out MapReduce limitations

9 / 27

Analysis : MapReduce approaches

MapReduceFramework

I Allows to implement distributedtasks

I The Hadoop frameworkI Best suited to batch process huge

amounts of data

I MapReduce requires an acyclicdataflow

I Jobs run in isolationI Not suitable network shufflingI Hadoop distributed file system

WebPie and MapResolveContributions

I Only provide batch reasoningI Nodes must wait for each otherI Generate a lot of duplicatesI Fragment dependantI Naive partitioning

I Critical letter for WebPie [4]

10 / 27

Incremental solutionsHistory Matters: Incremental Ontology Reasoning UsingModules [2]

I Maintains classification of ontologies as they evolveI Provides encouraging resultsI Not viable for static hierarchy of ontologiesI Not adapted on high number of nominals

Incremental Reasoning in OWL EL without Bookkeeping [3]

I Handles both addition and deletion of knowledgeI Incremental classification of TBoxI Limited to the classification on the TBoxI Dedicated to the EL+ fragment

11 / 27

Summary

Introduction

State of the art

Contribution

Experimental results

Conclusion

12 / 27

Proposed solution

SliderI Parallel and Scalable Execution

I Rules mapped to independent modulesI Multiple rule instances allowed to run in parallel

I Duplicates LimitationI Shared triple storeI Vertical partitioning [1] and multiple indexing

I Data Stream SupportI Streamed architectureI Parallel parsing/reasoning

I Fragment’s CustomizationI Dynamic support of rulesetI ρdf and RDFS natively supportedI Extendible to any other fragment

13 / 27

Architecture

TRIPLE STORE

EvolvingData

Newtriples

Explicit Triples

Implicit Triples

Streamed Triples

R2

Buffer R3Distributor R3

Distributor R2

Distributor R1

Buffer R2

Buffer R1

R2R1

R2

R1R3

Input Manager Rules Buffers Thread Pool Distributors

R1

R1

R2

R2

R3

R3

Incomingtriples

Concurrent Access

Rule Modules

InputManager

R1

R2

R3

R3

R1

R2

14 / 27

Architecture

Input ManagerI Receives incoming triplesI Sends them to

I The triple storeI The rules buffers

Rules BuffersI A buffer for each ruleI Run the rule when fullI Run the rule when

timed-outI Ensures completeness

Thread PoolI Manages a pool instancesI Ensures scalability

Rule instanceI Execute the inferenceI Access concurrently the

triple store

DistributorI Stores inferred triplesI Dispatches them to the

buffers

15 / 27

Inference: cax-sco

16 / 27

Triple Store

Vertical Partitioning

21

34

57

689

(1,2,3)(4,2,5)(6,7,8)(6,7,9)

TRIPLES ENCODING

Near-optimal indexingI Indexing by predicates, subjects

and objectsI Best trade-off for nearly all rules

from the OWL fragments

Concurrent AccessI ReentrantReadWriteLocks

ensure concurrencyI Write lock to add triplesI Read lock for other methods

Duplicates EliminationI HashMap of MultiMaps∗

I Bans duplicatesI Ensures uniqueness of triples

∗Google’s Guava libraries

17 / 27

Rules Dependency Graph

I Directed graphI Edges represent rulesI A → B: B can use the output

of A

I Created at initialisation timeI Used to route new triples by

I The input managerI The distributors

PRP-DOM

CAX-SCO

PRP-RNG

PRP-SPO1

SCM-SCO

SCM-DOM2

SCM-RNG2

SCM-SPO

Universal Input

Rules Dependency Graph for ρdf

18 / 27

Architecture

TRIPLE STORE

EvolvingData

Newtriples

Explicit Triples

Implicit Triples

Streamed Triples

R2

Buffer R3Distributor R3

Distributor R2

Distributor R1

Buffer R2

Buffer R1

R2R1

R2

R1R3

Input Manager Rules Buffers Thread Pool Distributors

R1

R1

R2

R2

R3

R3

Incomingtriples

Concurrent Access

Rule Modules

InputManager

R1

R2

R3

R3

R1

R2

19 / 27

Summary

Introduction

State of the art

Contribution

Experimental results

Conclusion

20 / 27

Experimentations

BaselineI OWLIM-SE (Standard Edition)I Semantic repository with reasoning featuresI Fastest reasoner available to the best of our knowledgeI Outperforms Jena and SesameI Natively supports RDFS, custom rule configuration for ρdf

DatasetI 13 ontologies from 3 sets:

I 2 Real life ontologies: WordNet and WikipediaI 5 generated by BSBM, from 100,000 to 5 million triplesI 6 subClassOf ontologies (closure computation, duplicates

intensive)

21 / 27

Experiments

ρdf reasoning RDFS reasoningOntology OWLIM Slider OWLIM SliderBSBM_100k 9.907s 4.636s 7.487s 4.558sBSBM_200k 13.338s 6.059s 11.064s 6.198sBSBM_500k 23.595s 11.133s 20.580s 10.984sBSBM_1M 39.364s 22.357s 35.602s 22.192sBSBM_5M 170.151s 126.292s 160.699s 127.037swikipedia 18.802s 17.422s 17.186s 22.443swordnet - - 15.075s 8.828ssubClassOf10 3.507s 1.209s 1.423s 1.216ssubClassOf20 3.730s 1.316s 1.536s 1.330ssubClassOf50 4.159s 1.615s 1.865s 1.583ssubClassOf100 4.397s 1.827s 2.242s 1.805ssubClassOf200 4.962s 2.210s 2.837s 2.170ssubClassOf500 9.862s 8.102s 7.584s 7.625s

ImprovementI Average 71.47%I RDFS 36.08%I ρdf 106.86%

RDFS

owlimse

slider

5

10

15

20

25

30

35

40

Infe

rence

Tim

e(i

n s

eco

nds)

ρdf

51015202530354045

BSBM

_100

k

BSBM

_200

k

BSBM

_500

k

BSBM

_1M

wikip

edia

wordn

et

subC

lass

Of10

subC

lass

Of20

subC

lass

Of50

subC

lass

Of100

subC

lass

Of200

subC

lass

Of500

owlimse

slider

Infe

rence

Tim

e(i

n s

eco

nds)

Inference time for Slider and OWLIM-SE on ρdf and RDFS

22 / 27

Demonstration

23 / 27

Summary

Introduction

State of the art

Contribution

Experimental results

Conclusion

24 / 27

Conclusion and Future Work

SliderI Efficient incremental rule-based reasoningI Fragment agnocismI Data streams supportI Improvement of 71.47% in average against baseline

Future WorkI Implementation of new rulesetsI Just-in-time optimisation of rules schedulingI Use of historical statistics for adaptation

25 / 27

Bibliography I

[1] Abadi, D. J., Marcus, A., Madden, S. R., and Hollenbach, K.Scalable semantic web data management using vertical partitioning.In Proceedings of the 33rd International Conference on Very Large Data Bases (2007), VLDB ’07, VLDBEndowment, pp. 411–422.

[2] Cuenca Grau, B., Halaschek-Wiener, C., and Kazakov, Y.History matters: Incremental ontology reasoning using modules.In The Semantic Web. 2007.

[3] Kazakov, Y., and Klinov, P.Incremental reasoning in owl el without bookkeeping.In ISWC 2013. 2013.

[4] Patel-Schneider, P.Letter: Comments on WebPIE: A Web-scale parallel inference engine using MapReduce.Web Semantics: Science, Services and Agents on . . . 15 (Sept. 2012), 69–70.

[5] Schlicht, A., and Stuckenschmidt, H.Mapresolve.Web Reasoning and Rule Systems 6902 (2011), 294–299.

[6] Urbani, J.RDFS/OWL reasoning using the MapReduce framework.PhD thesis, Vrije Universiteit - Faculty of Sciences, 2009.

[7] Urbani, J., Kotoulas, S., Maassen, J., van Harmelen, F., and Bal, H. E.WebPIE: A Web-scale parallel inference engine using MapReduce.J. Web Sem. 10 (2012), 59–75.

26 / 27

Slider: an Efficient Incremental Reasoner

Thank you for your [email protected]

https://github.com/juleschevalier/sliderhttp://demo-satin.telecom-st-etienne.fr/slider/

27 / 27