On Graph Query Optimization in Large Networks
![Page 1: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/1.jpg)
On Graph Query Optimization in Large Networks
Alice Leung
ICS 624
4/14/2011
![Page 2: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/2.jpg)
The problem
• Dramatic proliferation of sophisticated networks
– Need for effective querying and mining methods for large-scale graph-structured data
![Page 3: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/3.jpg)
The problem
• How to search graph structures efficiently within a large network?
• Two main challenges:
– Graph query: NP-complete
– Networks are heterogeneous and large, hindering direct application of well-known graph matching methods
• This paper focuses on connected, undirected simple graphs with no weights assigned to edges
![Page 4: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/4.jpg)
Example network graphs:
![Page 5: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/5.jpg)
![Page 6: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/6.jpg)
![Page 7: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/7.jpg)
Proposed solution: SPath
• A graph indexing technique that uses neighborhood signatures of vertices for indexing
• Decompose a query graph into a set of shortest paths, then pick a subset of candidate paths with high selectivity
• Join those candidate paths to reconstruct the original query graph
• Graph matching is performed in a path-at-a-time manner, unlike the usual vertex-at-a-time approach
![Page 8: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/8.jpg)
Problem definition
![Page 9: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/9.jpg)
![Page 10: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/10.jpg)
The Pattern-based Graph Indexing Framework
• Introduces a baseline algorithmic framework that exploits no indexing techniques.
• To improve query performance, the framework is extended with structural patterns for graph indexing.
• As a result, a path-based graph indexing mechanism is selected as a feasible solution for large networks.
![Page 11: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/11.jpg)
The baseline algorithm framework
![Page 12: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/12.jpg)
Start by finding the matching candidates C(v) for each vertex v of Q
Match v_i over C(v_i), then recursively match the subsequent vertex v_{i+1}, or output f if every vertex of Q is matched in G
Check whether v_i can be mapped to u by considering the preservation of structural connectivity
If there is no match, go back to the previous stage
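The callouts above can be sketched as a small backtracking search. This is a minimal illustration and not the paper's pseudocode: it assumes graphs stored as adjacency sets keyed by integer vertex ids, and the names `baseline_match`, `q_adj`, `g_adj`, `q_lab`, and `g_lab` are hypothetical.

```python
from typing import Dict, Iterator, List, Set

Graph = Dict[int, Set[int]]   # adjacency sets of an undirected graph (assumed layout)
Labels = Dict[int, str]       # vertex id -> vertex label


def baseline_match(q_adj: Graph, q_lab: Labels,
                   g_adj: Graph, g_lab: Labels) -> Iterator[Dict[int, int]]:
    """Yield every mapping f: V(Q) -> V(G) that preserves labels and edges."""
    order: List[int] = sorted(q_adj)
    # Candidates C(v): network vertices that carry the same label as v.
    cand = {v: [u for u in g_adj if g_lab[u] == q_lab[v]] for v in order}

    def search(i: int, f: Dict[int, int]) -> Iterator[Dict[int, int]]:
        if i == len(order):              # every vertex of Q is matched
            yield dict(f)
            return
        v = order[i]
        for u in cand[v]:
            if u in f.values():          # keep the mapping injective
                continue
            # Connectivity check: matched neighbors of v must map to neighbors of u.
            if all(f[w] in g_adj[u] for w in q_adj[v] if w in f):
                f[v] = u
                yield from search(i + 1, f)
                del f[v]                 # backtrack to the previous stage

    yield from search(0, {})
```

The search visits one query vertex per recursion level, which is why its cost depends directly on the number of query vertices and on the size of each candidate list.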
![Page 13: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/13.jpg)
Structural pattern based graph indexing
• Answering graph queries is very costly, and it becomes even more challenging when the network is large and diverse
• To alleviate the time-consuming exhaustive search in graph query processing, the aim is to minimize the size of the search space.
![Page 14: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/14.jpg)
Minimizing search space size
![Page 15: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/15.jpg)
Baseline algorithm: Search space size
• N = |V(Q)|, since matching is performed in a vertex-at-a-time manner
– Can be reduced by indexing a set of structural patterns first, then matching path-at-a-time
• Every u ∈ C(v_i) is a potential matching vertex, so there are many false positives
– It helps if we can pre-prune false positives
– Consider the k-neighborhood induced subgraph G^k_u, which contains all vertices within k hops of u
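The k-neighborhood induced subgraph can be computed with a depth-limited breadth-first search. The sketch below assumes an adjacency-set representation; the function name `k_neighborhood` is hypothetical.

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[int, Set[int]]   # adjacency sets of an undirected graph (assumed layout)


def k_neighborhood(adj: Graph, u: int, k: int) -> Graph:
    """Return the subgraph induced by all vertices within k hops of u."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if dist[x] == k:          # do not expand beyond k hops
            continue
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    keep = set(dist)
    # Induced subgraph: keep only edges whose endpoints are both within k hops.
    return {x: adj[x] & keep for x in keep}
```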
![Page 16: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/16.jpg)
Picking structural patterns
• The baseline algorithm indexes vertex labels only and does not consider any structural patterns.
![Page 17: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/17.jpg)
Picking structural patterns
• Which kinds of structural patterns are the most suitable for graph indexing on large networks?
• The number of possible patterns is exponential, even for small k.
• Need careful selection of an indexing solution that lies between indexing nothing and indexing everything
![Page 18: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/18.jpg)
Structural Pattern Evaluation Model
• Focus on two cost-sensitive aspects:
– Feature selection cost (Cs): for identifying a pattern from the k-neighborhood subgraph
– Feature pruning cost (Cp): for checking whether there exists a pattern p’ in the k-neighborhood subgraph
• n(n’) is the number of such patterns
![Page 19: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/19.jpg)
Structural Pattern Evaluation Model
• Paths excel over trees and graphs as indexing patterns in large networks
– Use shortest paths for graph indexing, which can be easily reconstructed during graph query processing
![Page 20: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/20.jpg)
SPath
• A path-based graph indexing technique for large networks
• Its principle is to use shortest paths within the k-neighborhood subgraph of each vertex to capture the local structural information around that vertex.
![Page 21: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/21.jpg)
SPath
• Neighborhood signatures of vertices are built to maintain indexing features, giving effective search-space pruning ability
• Processing (query decomposition): decompose the query graph into a set of shortest paths indexed in SPath
![Page 22: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/22.jpg)
Start query processing
Pruning: get the reduced matching candidates
Select an optimal path to initiate the recursive search
Instantiate the path, then test joinability between the new path and all previously instantiated paths
Check the join predicate between p_u and every path p_i in Q
If every edge in Q has been covered by some path in I, a matching f is found and output. If not, pick another path
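The joinability and coverage tests in these callouts can be illustrated as follows. This is a sketch under stated assumptions, not the paper's definition: an instantiated path is represented as a dict from query vertices to network vertices, two paths are taken to be joinable when they agree on shared query vertices and send distinct query vertices to distinct network vertices, and the names `joinable` and `covers_all_edges` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def joinable(f1: Dict[int, int], f2: Dict[int, int]) -> bool:
    """Assumed join predicate for two instantiated paths."""
    # Shared query vertices must map to the same network vertex.
    for v, u in f2.items():
        if v in f1 and f1[v] != u:
            return False
    # Distinct query vertices must map to distinct network vertices.
    merged = {**f1, **f2}
    return len(set(merged.values())) == len(merged)


def covers_all_edges(query_edges: Iterable[FrozenSet[int]],
                     paths: List[List[int]]) -> bool:
    """True if every query edge lies on at least one selected path."""
    covered: Set[FrozenSet[int]] = set()
    for p in paths:
        covered.update(frozenset(e) for e in zip(p, p[1:]))
    return all(e in covered for e in query_edges)
```

When `covers_all_edges` holds for the selected paths and all of them are pairwise joinable, the merged mapping is a candidate output f.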
![Page 23: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/23.jpg)
Neighborhood Signature (cont.)
• S^l_k(u) is the set of vertices that are k hops away from u and have the vertex label l.
![Page 24: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/24.jpg)
Neighborhood Signature (cont.)
• NS(u) maintains all k-distance sets of u from k = 0 up to k = k0.
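A neighborhood signature can be built with one breadth-first search per vertex. The sketch below assumes adjacency sets and a label dict; the function name and the 6-vertex example network (including its edges) are hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]
Signature = List[Dict[str, Set[int]]]   # entry k maps label l to S^l_k(u)


def neighborhood_signature(adj: Graph, labels: Dict[int, str],
                           u: int, k0: int) -> Signature:
    """Collect the k-distance sets of u, grouped by label, for k = 0..k0."""
    dist = {u: 0}
    queue = deque([u])
    ns: Signature = [dict() for _ in range(k0 + 1)]
    ns[0][labels[u]] = {u}
    while queue:
        x = queue.popleft()
        if dist[x] == k0:                 # stop expanding at k0 hops
            continue
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                ns[dist[y]].setdefault(labels[y], set()).add(y)
                queue.append(y)
    return ns


# Hypothetical network; the edges are assumed for this example.
adj = {1: {2, 3}, 2: {1, 4, 5}, 3: {1, 6}, 4: {2}, 5: {2}, 6: {3}}
labels = {1: "A", 2: "B", 3: "C", 4: "A", 5: "B", 6: "A"}
ns = neighborhood_signature(adj, labels, 1, 2)
# ns == [{"A": {1}}, {"B": {2}, "C": {3}}, {"A": {4, 6}, "B": {5}}]
```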
![Page 25: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/25.jpg)
Neighborhood Signature (Example)
• Distance sets:
– 1-distance set S1(u) is {B: {2}, C: {3}}
– 2-distance set S2(u) is {A: {4,6}, B: {5}}
• If k0 is set to 2,
– NS(u1) = { {A: {1}}, {B: {2}, C: {3}}, {A: {4,6}, B: {5}} }
– NS(v1) = { {A: {1}}, {B: {2}, C: {3}}, {C: {4}} }
![Page 26: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/26.jpg)
Neighborhood Signature (cont.)
• Based on Theorem 2, if NS(v) is not contained in NS(u), then u is a false positive and can be pruned, thus reducing the search space
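A containment test of this kind can be sketched as follows. The exact definition is not given in these slides, so this code assumes that containment compares, for every label and every k up to k0, the cumulative number of vertices within k hops; both signatures are assumed to use the same k0. The names `ns_contained` and `prune_candidates` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set

Signature = List[Dict[str, Set[int]]]   # entry k maps label l to S^l_k


def ns_contained(ns_v: Signature, ns_u: Signature) -> bool:
    """Assumed test: per label, v never sees more vertices within k hops than u."""
    cum_v: Counter = Counter()
    cum_u: Counter = Counter()
    for level_v, level_u in zip(ns_v, ns_u):
        for label, ids in level_v.items():
            cum_v[label] += len(ids)
        for label, ids in level_u.items():
            cum_u[label] += len(ids)
        if any(cum_v[label] > cum_u[label] for label in cum_v):
            return False
    return True


def prune_candidates(ns_v: Signature,
                     candidates: Dict[int, Signature]) -> List[int]:
    """Keep only the network vertices whose signature contains NS(v)."""
    return [u for u, ns_u in candidates.items() if ns_contained(ns_v, ns_u)]
```

Only the sizes of the distance sets are read here, never the vertex ids themselves.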
![Page 27: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/27.jpg)
SPath Implementation
![Page 28: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/28.jpg)
SPath Implementation (example)
[Figure panels: Network; Query; A global lookup table; Neighborhood signature of v3]
![Page 29: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/29.jpg)
SPath Implementation (cont.)
• Principle: both the lookup table and the histograms can be maintained as space-efficient data structures. NS containment testing can be performed without referring to the exact vertex information stored in the ID-lists.
• ID-lists can be very large, but they are only accessed during the graph query processing phase.
![Page 30: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/30.jpg)
Graph Query Processing
![Page 31: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/31.jpg)
Query Decomposition
![Page 32: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/32.jpg)
Query Decomposition (cont.)
• Based on this, the shortest paths originating from v with length no greater than k* are selected.
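Enumerating bounded-length shortest paths from a query vertex can be sketched with a breadth-first search that records parents. This is a simplification: it returns a single shortest path per reachable vertex and does not rank paths by selectivity. The function name and the parameter `k_max`, which stands in for k*, are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set

Graph = Dict[int, Set[int]]   # adjacency sets of the query graph (assumed layout)


def shortest_paths_from(adj: Graph, v: int, k_max: int) -> List[List[int]]:
    """Return one shortest path from v to each vertex at distance 1..k_max."""
    parent: Dict[int, Optional[int]] = {v: None}
    dist = {v: 0}
    queue = deque([v])
    paths: List[List[int]] = []
    while queue:
        x = queue.popleft()
        if dist[x] == k_max:              # length bound reached
            continue
        for y in sorted(adj[x]):          # sorted for a deterministic order
            if y not in dist:
                dist[y] = dist[x] + 1
                parent[y] = x
                queue.append(y)
                # Walk the parent pointers back to v to recover the path.
                path: List[int] = []
                node: Optional[int] = y
                while node is not None:
                    path.append(node)
                    node = parent[node]
                paths.append(path[::-1])
    return paths
```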
![Page 33: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/33.jpg)
Query Decomposition (Example)
![Page 34: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/34.jpg)
Path Selection and Join
![Page 35: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/35.jpg)
Path Instantiation
![Page 36: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/36.jpg)
Experimental evaluation
• Compared SPath with GraphQL using a yeast protein interaction network as the dataset
• 1) Index construction cost:
– For SPath, the cost grows linearly as k0 increases from 0 to 4
![Page 37: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/37.jpg)
Experimental evaluation (cont.)
• Tested clique queries.
• Instantiation takes up the majority of the time
![Page 38: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/38.jpg)
Experimental evaluation (cont.)
• Tested path and subgraph queries
• SPath has a speedup of up to 4 times, due to signature containment pruning
• Each step takes less time than the corresponding step for clique queries
![Page 39: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/39.jpg)
Conclusion
![Page 40: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/40.jpg)
Problems
• Many large networks change rapidly, so incremental update of graph indexing structures becomes important.
• Need to extend the method to support approximate graph queries as well, to accommodate noise and failure in the networks
![Page 41: On Graph Query Optimization in Large Networks](https://reader035.vdocument.in/reader035/viewer/2022062517/56812eff550346895d949da3/html5/thumbnails/41.jpg)
Questions
• Is modeling networks as large graphs the most efficient approach?
• How can SPath deal with incremental updates?