Designing Indexing Structure for Discovering Relationships in RDF Graphs Stanislav Bartoň


Post on 16-Dec-2015



Designing Indexing Structure for Discovering Relationships in RDF Graphs

Stanislav Bartoň


RDF & RDF Schema

• Triple (Subject, Property, Object)

• Resources are identified by their URIs

• Object – a resource or an explicit (literal) value

• Special semantics are added to certain resources (e.g. rdfs:Class, rdfs:subClassOf)


Known approaches to discovering associations in RDF graphs

• Using graph algorithms on real data, or

• Path and Schema indices

  – 2D array of paths between classes i and j within one schema

  – An array of interconnections between schemas


Tree Signatures

• Based on the Dietz numbering scheme

• Immediate knowledge of the mutual position of any two nodes within a signature
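
As an illustration of how a Dietz-style numbering gives the mutual position of two nodes, here is a minimal sketch. It assumes a tree stored as a hypothetical `children` mapping and assigns each node a (preorder, postorder) rank pair; node `u` is an ancestor of node `v` exactly when `u` has the smaller preorder rank and the larger postorder rank. The function names and the example tree are assumptions made for this sketch, not taken from the slides.

```python
from typing import Dict, List, Tuple


def dietz_ranks(children: Dict[str, List[str]], root: str) -> Dict[str, Tuple[int, int]]:
    """Return {node: (preorder rank, postorder rank)} for the tree under root."""
    ranks: Dict[str, Tuple[int, int]] = {}
    counters = {"pre": 0, "post": 0}

    def visit(node: str) -> None:
        pre = counters["pre"]          # rank assigned when the node is entered
        counters["pre"] += 1
        for child in children.get(node, []):
            visit(child)
        ranks[node] = (pre, counters["post"])  # postorder rank assigned on exit
        counters["post"] += 1

    visit(root)
    return ranks


def is_ancestor(ranks: Dict[str, Tuple[int, int]], u: str, v: str) -> bool:
    """True if u is a proper ancestor of v, decided from the two rank pairs only."""
    return ranks[u][0] < ranks[v][0] and ranks[u][1] > ranks[v][1]


# Hypothetical example tree: a -> b, c and b -> d
tree = {"a": ["b", "c"], "b": ["d"]}
ranks = dietz_ranks(tree, "a")
```

With the ranks stored, the ancestor test needs no traversal: `is_ancestor(ranks, "a", "d")` compares two pairs of integers.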


Transforming the graph into a forest of trees

• The RDF graph is a generic directed graph, possibly containing cycles

  – Two situations can violate the tree structure:

    • Cycles

    • Nodes with in-degree > 1

=> Transform the graph into a forest of trees to which the tree signatures can be applied


Transforming the graph into a forest of trees

• In-degree > 1 transformation:


Transforming the graph into a forest of trees

• Cycle transformation:


Transforming the graph into a forest of trees

1. The first transformation breaks the graph into several components.

2. Individual components within the graph are identified via reachability.

3. Cycles are detected within a component by an inappropriate number of edges (a tree on k nodes has exactly k − 1 edges).

4. A signature is then built for each component.

The total time complexity is then O(4n) = O(n).
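
Steps 2 and 3 can be illustrated with a short sketch. It assumes the graph is given as a node list plus a list of directed edges whose endpoints all appear in the node list, identifies components by reachability while ignoring edge direction, and flags a component as cyclic when it has more than k − 1 edges for its k nodes. The function name `analyse_components` and the example data are hypothetical; this is not the author's implementation.

```python
from collections import defaultdict
from typing import Iterable, List, Set, Tuple


def analyse_components(
    nodes: Iterable[str], edges: Iterable[Tuple[str, str]]
) -> List[Tuple[Set[str], bool]]:
    """Return (members, has_cycle) for every component, in one linear pass each."""
    edges = list(edges)
    neighbours = defaultdict(set)
    for u, v in edges:                 # direction is ignored for reachability
        neighbours[u].add(v)
        neighbours[v].add(u)

    component_of = {}                  # node -> index of its component
    components: List[Set[str]] = []
    for start in nodes:
        if start in component_of:
            continue
        index = len(components)
        members = {start}
        component_of[start] = index
        stack = [start]
        while stack:                   # depth-first search from the start node
            u = stack.pop()
            for w in neighbours[u]:
                if w not in component_of:
                    component_of[w] = index
                    members.add(w)
                    stack.append(w)
        components.append(members)

    edge_count = [0] * len(components)
    for u, _ in edges:                 # each edge lies within exactly one component
        edge_count[component_of[u]] += 1

    # A connected component on k nodes is a tree only if it has k - 1 edges.
    return [
        (members, edge_count[i] > len(members) - 1)
        for i, members in enumerate(components)
    ]


# Hypothetical example: one tree-shaped component and one 3-cycle
example_nodes = ["a", "b", "c", "x", "y", "z"]
example_edges = [("a", "b"), ("a", "c"), ("x", "y"), ("y", "z"), ("z", "x")]
report = analyse_components(example_nodes, example_edges)
```

Both the search and the edge count touch every node and edge a constant number of times, which is consistent with the linear bound stated above.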


Tree Signature indices

• Two indices are built to keep track of

  – which nodes have been ‘divided’ and to which signatures they belong, and

  – which multiple nodes are contained in each signature

• The indices are built along with the creation of the signatures
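
The two indices can be pictured as a pair of dictionaries. The sketch below makes two assumptions that are not stated in the slides: the copies of a divided node keep the original node's identifier, and a node counts as a multiple node when it occurs in more than one signature. For simplicity it builds the indices after the signatures exist rather than alongside their creation; `build_indices` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def build_indices(
    signatures: Dict[int, Set[str]]
) -> Tuple[Dict[str, Set[int]], Dict[int, Set[str]]]:
    """Build (node index, signature index) from {signature id: nodes in it}."""
    membership = defaultdict(set)
    for sig_id, nodes in signatures.items():
        for node in nodes:
            membership[node].add(sig_id)

    # Index 1: divided (multiple) node -> signatures it belongs to
    node_index = {n: sigs for n, sigs in membership.items() if len(sigs) > 1}

    # Index 2: signature -> multiple nodes it contains
    signature_index = {
        sig_id: {n for n in nodes if n in node_index}
        for sig_id, nodes in signatures.items()
    }
    return node_index, signature_index


# Hypothetical example: node "m" was divided between signatures 0 and 1
sample = {0: {"a", "b", "m"}, 1: {"m", "c"}, 2: {"d"}}
node_index, signature_index = build_indices(sample)
```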


Path algorithm

1. Takes the start and end node as input.

2. Current node = start node, current signature = start signature.

3. Finds all the multiple nodes above the current node in the current signature.

4. Traverses all the new possibilities until it either finds the end node or has no possibilities left.
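
The steps above can be sketched as a breadth-first search over (signature, node) pairs. This is a simplified illustration under stated assumptions, not the author's algorithm: each signature is a hypothetical `children` mapping with edges pointing from parent to child, the nodes reachable inside one signature are taken to be the current node's subtree, and a multiple node is one that occurs in several signatures. A real implementation would use the signature ranks instead of walking the subtree.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set


def subtree(children: Dict[str, List[str]], node: str) -> Set[str]:
    """All nodes strictly below node in one tree."""
    found: Set[str] = set()
    stack = [node]
    while stack:
        u = stack.pop()
        for child in children.get(u, []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def path_exists(signatures: Dict[int, Dict[str, List[str]]], start: str, end: str) -> bool:
    """True if end can be reached from start by hopping between signatures."""
    membership = defaultdict(set)      # node -> signatures containing it
    for sig_id, children in signatures.items():
        for parent, kids in children.items():
            membership[parent].add(sig_id)
            for kid in kids:
                membership[kid].add(sig_id)

    queue = deque((sig_id, start) for sig_id in membership[start])
    visited = set(queue)
    while queue:
        sig_id, node = queue.popleft()
        reachable = subtree(signatures[sig_id], node)
        if end in reachable:
            return True
        for m in reachable:            # continue from multiple nodes elsewhere
            for other in membership[m]:
                if other != sig_id and (other, m) not in visited:
                    visited.add((other, m))
                    queue.append((other, m))
    return False                       # no possibilities left


# Hypothetical example: "m" links signature 0 to signature 1
forest = {0: {"a": ["b", "m"]}, 1: {"m": ["c"], "c": ["d"]}}
```

Here `path_exists(forest, "a", "d")` succeeds by leaving signature 0 through the multiple node `m`, while the reverse query fails because `d` has nothing below it.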


Connection algorithm

• The problem of finding intersecting paths is reduced to finding a multiple node to which a path exists from both starting nodes.

• The algorithm keeps a set of reachable multiple nodes for each starting node.

• Each node gets one turn to enlarge its set of usable multiples in each iteration.

• After each iteration the sets of reachable multiples are intersected.
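
A minimal sketch of this alternating expansion follows. It assumes a hypothetical precomputed mapping `step` from a node to the multiple nodes directly reachable from it, and two distinct starting nodes; in the indexing structure itself this information would come from the signatures and their indices. The function name and the data are illustrative only.

```python
from typing import Dict, Set


def find_connection(step: Dict[str, Set[str]], a: str, b: str) -> Set[str]:
    """Return the multiple nodes reachable from both a and b, or an empty set."""
    reached = {a: set(step.get(a, ())), b: set(step.get(b, ()))}
    frontier = {a: set(reached[a]), b: set(reached[b])}

    while True:
        common = reached[a] & reached[b]   # intersect after every iteration
        if common:
            return common
        if not frontier[a] and not frontier[b]:
            return set()                   # neither side can grow any further
        for start in (a, b):               # each starting node gets one turn
            new: Set[str] = set()
            for m in frontier[start]:
                new |= set(step.get(m, ()))
            new -= reached[start]
            reached[start] |= new
            frontier[start] = new


# Hypothetical example: both starting nodes reach the multiple node "m3"
step = {"x": {"m1"}, "y": {"m2"}, "m1": {"m3"}, "m2": {"m3"}}
connected = find_connection(step, "x", "y")
disconnected = find_connection({"x": {"m1"}, "y": {"m2"}}, "x", "y")
```

Giving each side one expansion per round keeps the two sets growing at a similar pace, so a nearby common multiple node is found before either side explores far.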


Conclusion and future work

• The algorithms are time-intensive on large-scale data => further optimization.

• Both algorithms suffer from the inability to tell the mutual position of two nodes within the graph => second-level indexing structure.

• The proposed indexing structure is less memory-intensive than the Path and Schema indices.

• Further support for the Rho iso operator.


Thank you for your attention.