Post on 19-Dec-2015
Evaluating Reachability Queries over Path Collections*
P. Bouros (1), S. Skiadopoulos (2), T. Dalamagas (3), D. Sacharidis (3), T. Sellis (1,3)
(1) National Technical University of Athens
(2) University of Peloponnese
(3) Institute for Management of Information Systems – R.C. Athena
HDMS'09
* Long version of SSDBM’09 paper
Introduction (I)
• Several applications store and query large collections of data sequences
  – Recent advances in GIS and geoservices resulted in large volumes of routes (e.g., sequences of Points of Interest (POIs))
• Route collections
  – Points => nodes
  – Sequences => routes
Introduction (II)
• Web sites retain huge collections of routes
  – ShareMyRoutes.com
  – TravelByGPS.com
• People visiting Athens
  – Track their sightseeing
  – Create routes of interesting places
• Frequent updates
  – Users upload new routes
Problem
• Route collections
  1. Too large to fit in main memory
  2. Frequently updated, adding new routes
• Reachability queries
  – Q: path from Academy to Zappeion
  – A: Academy -> University of Athens (change to route p2) -> Parliament -> Zappeion
Why not a graph-based solution?
• Transform route collection P into graph GP
1) Searching: depth-first or breadth-first search
  • Low storage and maintenance cost
  • Slow query evaluation
2) Encoding transitive closure
  • Fast query evaluation
  • Expensive precomputation, not suitable for frequently updated graphs
  – 2-hop [CH+02], HOPI [STW05]
  – DAGs: geometric-based & partitioning 2-hop [CY+06,08], interval LB [AB+89]
  – GRIPP [TL07]
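The searching option can be sketched as follows: flatten the routes into adjacency lists and run a plain depth-first search. This is only an illustrative sketch; the two routes and the names `routes_to_graph` and `dfs_path` are assumptions made for the example (only the Academy-to-Zappeion answer with its change to route p2 comes from the slides).

```python
from collections import defaultdict

# Hypothetical routes, chosen so that the Academy -> Zappeion answer
# requires changing from p1 to p2 at "University of Athens".
ROUTES = {
    "p1": ["Academy", "University of Athens", "National Library"],
    "p2": ["University of Athens", "Parliament", "Zappeion"],
}

def routes_to_graph(routes):
    """Turn every pair of consecutive route nodes into a directed edge."""
    adj = defaultdict(list)
    for nodes in routes.values():
        for u, v in zip(nodes, nodes[1:]):
            if v not in adj[u]:
                adj[u].append(v)
    return adj

def dfs_path(adj, source, target):
    """Iterative depth-first search; returns one path or None."""
    parent = {source: None}
    stack = [source]
    while stack:
        v = stack.pop()
        if v == target:
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        for w in adj.get(v, []):
            if w not in parent:
                parent[w] = v
                stack.append(w)
    return None

adj = routes_to_graph(ROUTES)
print(dfs_path(adj, "Academy", "Zappeion"))
```

The graph itself is cheap to store and update, but every query walks it edge by edge, which is the slow query evaluation noted above.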
Outline
• The pfs algorithm
  – Indexing route collections
  – Indexing route transitions
• Index maintenance
• Experimental evaluation
• Conclusions and Further work
The pfs algorithm (I)
• Path-first search, basic idea:
  – Examine parts of routes at once, not single nodes
• Extend depth-first search
  – Work with routes instead of graph edges
• For each route p containing current node v
  – Visit each node after v (successor) in p
  – Push the set of successors to the dfs stack at once
The pfs algorithm (II)
• Find a path from node F to C
p1 (A, B, C, D, J)
p2 (A, F, D, N, B, T)
p3 (N, L, M)
p4 (D, N, B, F, K)
p5 (A, F, K)
• Answer: (F, D, N, B, C)
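A minimal sketch of path-first search on the five example routes, assuming an in-memory dictionary of routes. The function name `pfs`, the 0-based positions and the choice to skip already-visited successors while still walking the rest of the route are assumptions of this sketch, not details given on the slides.

```python
ROUTES = {
    "p1": ["A", "B", "C", "D", "J"],
    "p2": ["A", "F", "D", "N", "B", "T"],
    "p3": ["N", "L", "M"],
    "p4": ["D", "N", "B", "F", "K"],
    "p5": ["A", "F", "K"],
}

def pfs(routes, source, target):
    """Depth-first search that expands whole route suffixes at once."""
    # occurrences[node] = list of (route id, 0-based position)
    occurrences = {}
    for rid, nodes in routes.items():
        for pos, node in enumerate(nodes):
            occurrences.setdefault(node, []).append((rid, pos))

    parent = {source: None}          # also serves as the visited set
    stack = [source]
    while stack:
        v = stack.pop()
        if v == target:
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        for rid, pos in occurrences.get(v, []):
            prev = v
            # Push every successor of v in this route in one go.
            for succ in routes[rid][pos + 1:]:
                if succ not in parent:
                    parent[succ] = prev
                    stack.append(succ)
                prev = succ
    return None

print(pfs(ROUTES, "F", "C"))  # ['F', 'D', 'N', 'B', 'C']
```

With this particular traversal order the sketch returns the same path as the answer above; a different push order could return another valid path.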
P-Index
• Inverted index on route collections
  – For each node, store the routes containing it
• Access routes containing current node
• Better termination condition => pfsP
  – Identify a route that contains the current node before the target
p1 (A, B, C, D, J)
p2 (A, F, D, N, B, T)
p3 (N, L, M)
p4 (D, N, B, F, K)
p5 (A, F, K)
node routes list
A <p1,1>, <p2,1>, <p5,1>
B <p1,2>, <p2,5>, <p4,3>
C <p1,3>
D <p1,4>, <p2,3>, <p4,1>
… …
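The table can be reproduced with a few lines, shown here as an in-memory sketch; the slides describe a disk-based inverted file, so the dictionary layout and the name `build_p_index` are assumptions. Positions are 1-based, as in the table.

```python
ROUTES = {
    "p1": ["A", "B", "C", "D", "J"],
    "p2": ["A", "F", "D", "N", "B", "T"],
    "p3": ["N", "L", "M"],
    "p4": ["D", "N", "B", "F", "K"],
    "p5": ["A", "F", "K"],
}

def build_p_index(routes):
    """Map each node to its routes list of <route id, position> entries."""
    index = {}
    for rid, nodes in routes.items():
        for pos, node in enumerate(nodes, start=1):
            index.setdefault(node, []).append((rid, pos))
    return index

p_index = build_p_index(ROUTES)
for node in "ABCD":
    print(node, p_index[node])
```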
The pfsP algorithm
• Find a path from F to T
  – JOIN routes[F] with routes[T]: route p2 contains F (position 2) before T (position 6)
• Answer: (F, D, N, B, T)
node routes list
… …
F <p2,2>, <p4,4>, <p5,2>
… …
T <p2,6>
p1 (A, B, C, D, J)
p2 (A, F, D, N, B, T)
p3 (N, L, M)
p4 (D, N, B, F, K)
p5 (A, F, K)
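A sketch of pfsP under the same in-memory assumptions: before expanding the current node, its routes list is joined with the routes list of the target, and the search stops as soon as some route contains the current node before the target. The names `join_before` and `pfs_p` are hypothetical, and the sketch assumes a node appears at most once per route.

```python
ROUTES = {
    "p1": ["A", "B", "C", "D", "J"],
    "p2": ["A", "F", "D", "N", "B", "T"],
    "p3": ["N", "L", "M"],
    "p4": ["D", "N", "B", "F", "K"],
    "p5": ["A", "F", "K"],
}

def build_p_index(routes):
    index = {}
    for rid, nodes in routes.items():
        for pos, node in enumerate(nodes, start=1):   # 1-based positions
            index.setdefault(node, []).append((rid, pos))
    return index

def join_before(p_index, current, target):
    """Return (route, i, j) if some route has current at i before target at j."""
    target_pos = dict(p_index.get(target, []))        # route id -> position
    for rid, i in p_index.get(current, []):
        j = target_pos.get(rid)
        if j is not None and i < j:
            return rid, i, j
    return None

def path_to(parent, node):
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]

def pfs_p(routes, source, target):
    if source == target:
        return [source]
    p_index = build_p_index(routes)
    parent = {source: None}
    stack = [source]
    while stack:
        v = stack.pop()
        hit = join_before(p_index, v, target)
        if hit is not None:
            rid, i, j = hit
            # Nodes at 1-based positions i+1..j of the route complete the path.
            return path_to(parent, v) + routes[rid][i:j]
        for rid, i in p_index.get(v, []):
            prev = v
            for succ in routes[rid][i:]:
                if succ not in parent:
                    parent[succ] = prev
                    stack.append(succ)
                prev = succ
    return None

print(pfs_p(ROUTES, "F", "T"))  # ['F', 'D', 'N', 'B', 'T']
```

For the query from F to T the join succeeds on the very first node, so no successors are pushed at all.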
H-graph (I)
• Graph representation of collection
  – Nodes
    • Routes of the collection
  – Edges (pi, pj, v)
    • All possible transitions among routes
    • Edge label v => shared node (link)
• Better termination condition => pfsH
  – Identify an edge on H-graph
p1 (A, B, C, D, J)
p4 (D, N, B, F, K)
H-graph (II)
• Find a path from node F to J
p1 (A, B, C, D, J)
p2 (A, F, D, N, B, T)
p3 (N, L, M)
p4 (D, N, B, F, K)
p5 (A, F, K)
• Answer: (F, D, J)
H-Index
• In practice: H-Index, the adjacency lists of the H-graph
  – Entry <pj, v:i:k> in the list of pi: link v is at position i in pi and at position k in pj
route edges list
p1 <p2, B:2:5>, <p2, D:4:3>, <p4, B:2:3>, <p4, D:4:1>
p2 <p1, D:3:4>, <p1, B:5:2>, <p3, N:4:1>, <p4, F:2:4>, <p4, D:3:1>, <p4, N:4:2>, <p4, B:5:3>, <p5, F:2:2>
… …
p1 (A, B, C, D, J)
p2 (A, F, D, N, B, T)
p3 (N, L, M)
p4 (D, N, B, F, K)
p5 (A, F, K)
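The edges lists can be rebuilt from the routes with the sketch below. One rule in it is an assumption inferred from the example entries rather than stated on the slides: a shared node v yields an edge from pi to pj only if v is not the first node of pi and not the last node of pj (for instance, A never appears as a link, and neither does K). The name `build_h_index` and the tuple layout are likewise hypothetical.

```python
ROUTES = {
    "p1": ["A", "B", "C", "D", "J"],
    "p2": ["A", "F", "D", "N", "B", "T"],
    "p3": ["N", "L", "M"],
    "p4": ["D", "N", "B", "F", "K"],
    "p5": ["A", "F", "K"],
}

def build_h_index(routes):
    """For each route pi, list (pj, v, i, k): v is at i in pi and at k in pj."""
    h_index = {rid: [] for rid in routes}
    for pi, nodes_i in routes.items():
        for pj, nodes_j in routes.items():
            if pi == pj:
                continue
            pos_j = {node: k for k, node in enumerate(nodes_j, start=1)}
            for i, v in enumerate(nodes_i, start=1):
                k = pos_j.get(v)
                if k is None:
                    continue
                # Assumed pruning, inferred from the example table:
                # skip links that start pi or end pj.
                if i == 1 or k == len(nodes_j):
                    continue
                h_index[pi].append((pj, v, i, k))
    return h_index

h_index = build_h_index(ROUTES)
print(h_index["p1"])
```

Under this assumed rule the sketch produces exactly the entries listed for p1 and p2 above.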
The pfsH algorithm
• Find a path from F to J
  – routes[F] = {<p2,2>, <p4,4>, <p5,2>}
  – routes[J] = {<p1,5>}
  – JOIN the edges lists of routes[F] with routes[J]: edge <p1, D:3:4> of p2 leads from F (position 2 in p2) through D to J (position 5 in p1)
route edges list
p2 <p1, D:3:4>, <p1, B:5:2>, <p3, N:4:1>, <p4, F:2:4>, <p4, D:3:1>, <p4, N:4:2>, <p4, B:5:3>, <p5, F:2:2>
p4 <p1, B:3:2>, <p2, N:2:4>, <p2, B:3:5>, <p2, F:4:2>, <p3, N:2:1>, <p5, F:4:2>
p5 <p2, F:2:2>, <p4, F:2:4>
… …
p1 (A, B, C, D, J)
p2 (A, F, D, N, B, T)
p3 (N, L, M)
p4 (D, N, B, F, K)
p5 (A, F, K)
• Answer: (F, D, J)
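The termination test of pfsH can be sketched as a join between the edges lists of the routes containing the current node and the routes list of the target. The code covers only this check for a single current node, not the full search loop. The function names, the strict position comparisons and the pruning rule inside `build_h_index` (skip links that start pi or end pj, inferred from the example entries) are assumptions.

```python
ROUTES = {
    "p1": ["A", "B", "C", "D", "J"],
    "p2": ["A", "F", "D", "N", "B", "T"],
    "p3": ["N", "L", "M"],
    "p4": ["D", "N", "B", "F", "K"],
    "p5": ["A", "F", "K"],
}

def build_p_index(routes):
    index = {}
    for rid, nodes in routes.items():
        for pos, node in enumerate(nodes, start=1):   # 1-based positions
            index.setdefault(node, []).append((rid, pos))
    return index

def build_h_index(routes):
    h_index = {rid: [] for rid in routes}
    for pi, nodes_i in routes.items():
        for pj, nodes_j in routes.items():
            if pi == pj:
                continue
            pos_j = {node: k for k, node in enumerate(nodes_j, start=1)}
            for i, v in enumerate(nodes_i, start=1):
                k = pos_j.get(v)
                # Assumed pruning: skip links that start pi or end pj.
                if k is None or i == 1 or k == len(nodes_j):
                    continue
                h_index[pi].append((pj, v, i, k))
    return h_index

def h_check(routes, p_index, h_index, current, target):
    """Look for an H-graph edge that leads from current to target.

    Returns the node sequence current..link..target, or None.
    Cases where both nodes lie on the same route are left to the
    pfsP-style join and are not handled here.
    """
    target_pos = dict(p_index.get(target, []))        # route id -> position
    for pi, i in p_index.get(current, []):
        for pj, _link, a, b in h_index[pi]:
            j = target_pos.get(pj)
            # The link must come after current in pi and before target in pj.
            if j is not None and a > i and b < j:
                return routes[pi][i - 1:a] + routes[pj][b:j]
    return None

p_index = build_p_index(ROUTES)
h_index = build_h_index(ROUTES)
print(h_check(ROUTES, p_index, h_index, "F", "J"))  # ['F', 'D', 'J']
```

For F and J the first edge inspected, <p1, D:3:4> in the list of p2, already satisfies the condition, so the answer is found without pushing any successors.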
Index maintenance
• P-Index, H-Index as inverted files on disk
  – Updates -> adding new routes
  – Do not consider each new route separately
  – Batch updates: consider a set of new routes
• Basic idea:
  – Build memory-resident P-Index, H-Index for the new routes
  – Merge the disk-based indices with the memory-resident ones
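A rough sketch of the merge step for P-Index only, assuming both indices are available as streams of (node, routes list) records sorted by node, so that one sequential pass suffices. The record layout, the function names and the new route p6 are hypothetical. The H-Index merge is not shown, since the slides do not detail how edges between old and new routes are produced.

```python
def p_index_records(routes):
    """Build P-Index records (node, routes list), sorted by node."""
    index = {}
    for rid, nodes in routes.items():
        for pos, node in enumerate(nodes, start=1):
            index.setdefault(node, []).append((rid, pos))
    return sorted(index.items())

def merge_inverted(old_records, new_records):
    """Single-pass merge of two record streams sorted by key."""
    merged = []
    old_it, new_it = iter(old_records), iter(new_records)
    old, new = next(old_it, None), next(new_it, None)
    while old is not None or new is not None:
        if new is None or (old is not None and old[0] < new[0]):
            merged.append(old)
            old = next(old_it, None)
        elif old is None or new[0] < old[0]:
            merged.append(new)
            new = next(new_it, None)
        else:
            # Same node in both indices: append the new postings.
            merged.append((old[0], old[1] + new[1]))
            old, new = next(old_it, None), next(new_it, None)
    return merged

# Stand-in for the disk-based index (two of the example routes).
OLD_ROUTES = {"p1": ["A", "B", "C", "D", "J"], "p3": ["N", "L", "M"]}
# Hypothetical batch of newly uploaded routes.
NEW_ROUTES = {"p6": ["C", "N", "X"]}

merged = merge_inverted(p_index_records(OLD_ROUTES), p_index_records(NEW_ROUTES))
for node, postings in merged:
    print(node, postings)
```

Processing the whole batch in one pass reads and rewrites each routes list once, instead of once per new route.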
Setup
• Synthetic route collections
  – |P|, lavg, |V|, zipf, U
• Compare
  – Convert collection to graph, dfs & adjacency lists
  – pfsP & P-Index
  – pfsH & P-Index, H-Index
• Construction cost, query evaluation: vary one of |P|, lavg, |V|, zipf
• Maintenance cost: vary U
Index construction
[Figure: index construction, two panels. Left: varying |P| (x 10^3), with lavg = 10, |V| = 100000, zipf = 0.8. Right: varying |V| (x 10^3), with |P| = 100000, lavg = 10, zipf = 0.8.]
Query evaluation (I)
[Figure: query evaluation, two panels. Left: varying |P| (x 10^3), with lavg = 10, |V| = 100000, zipf = 0.8. Right: varying lavg, with |P| = 100000, |V| = 100000, zipf = 0.8.]
Query evaluation (II)
[Figure: query evaluation, two panels. Left: varying |V| (x 10^3), with |P| = 100000, lavg = 10, zipf = 0.8. Right: varying zipf, with |P| = 100000, lavg = 10, |V| = 100000.]
Conclusions
• Reachability queries over frequently updated route collections
• The path-first search (pfs) algorithm
  – Indexing route collections: P-Index & pfsP
  – Indexing route transitions: H-Index & pfsH
• Handling frequent updates, adding new routes
• Experimental evaluation
  – P-Index & pfsP: low construction & maintenance cost
  – H-Index, P-Index & pfsH: fast query evaluation
Further work
• Ongoing
  – New index that combines P-Index & H-Index advantages
    • Low construction and maintenance cost
    • Fast query evaluation
• Future work
  – Other types of queries
    • Considering constraints