1 qsx: querying social graphs graph queries and algorithms graph search (traversal) pagerank nearest...

50
1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

Upload: charleen-rogers

Post on 21-Jan-2016

232 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

1

QSX: Querying Social Graphs

Graph Queries and Algorithms

Graph search (traversal)

PageRank

Nearest neighbors

Keyword search

Graph pattern matching

Page 2: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

2

Basic graph queries and algorithms

Graph search (traversal)

PageRank

Nearest neighbors

Keyword search

Graph pattern matching (a full treatment of itself)

Widely used in graph algorithms

Algorithms in MapReduce (lecture 4) and other parallel models (lecture 5)Algorithms in MapReduce (lecture 4) and other parallel models (lecture 5)

Page 3: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

Graph representation

33

Page 4: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

Gen

4

Directed graph G = (V, E, fA) attributes fA(u): a tuple (A1 = a1, ..., An = an)

Social Graphs

Med

Soc Eco

AI

Chem

(‘dept’=CS, ‘field’=AI) (‘dept’=CS, ‘field’=AI)

(‘dept’=CS, ‘field’=DB) (‘dept’=CS, ‘field’=DB) (‘dept’=Bio, ‘field’=Gen) (‘dept’=Bio, ‘field’=Gen)

(‘dept’=Bio, ‘field’=Eco) (‘dept’=Bio, ‘field’=Eco)

Social graphs: attributes, extensible with edge labels

DBDB

label, keywords, blogs, comments, rating …label, keywords, blogs, comments, rating …

4

EcoEco

Page 5: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

5

Relational representation

Node relation: node( nodeId, label, attributes)e.g., node(02, book, “Graphs”), node(03, author, “John Doe”)

Edge relation: edge( sid, did, label)sid, did: source and destination nodes; e.g., edge(03, 02, write)

Possibly with an attribute relation

Nontrivial to leverage existing relational techniques

Pros and cons Lossless: the original graph can be reconstructed Ignore topological structure Querying efficiency: Requires multi-table joins or self joins for

simple queries, e.g., book[author=“Bush”]/chapter/title requires 3 joins of the edge relation!

Page 6: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

Adjacency Matrices

Represent a graph as an n x n square matrix M– n = |V|– Mij = 1 means a link from node i to j

1 2 3 41 0 1 0 12 1 0 1 13 1 0 0 04 1 0 1 0

1

2

3

4

6

Pros and cons Connectivity: O(1) time (rows: outlinks, columns: inlinks)

Too costly to be practical: when G is large, M is |V|2

Inefficiency: real-life graphs are often sparse

Page 7: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

Adjacency Lists

Each node carries a list of adjacent edges

1: 2, 42: 1, 3, 43: 14: 1, 3

1 2 3 41 0 1 0 12 1 0 1 13 1 0 0 04 1 0 1 0

7

Pros and cons

efficient, especially for sparse graphs;

easy to compute over outlinks

difficult to compute over inlinks

Page 8: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

Graph search (traversal)

88

Page 9: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

9

Path queries

Reachability• Input: A directed graph G, and a pair of nodes s and t in G• Question: Does there exist a path from s to t in G?

What do you know about these? 9

Distance• Input: A directed weighted graph G, and a node s in G• Output: The lengths of shortest paths from s to all nodes in G

Regular path• Input: A node-labeled directed graph G, a pair of nodes s and t

in G, and a regular expression R• Question: Does there exist a path p from s to t that satisfies R?

Page 10: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

10

Reachability queries

How to evaluate reachability queries?10

Applications: a routine operation• Social graphs: are two people related for security reasons?

• Biological networks: find genes that are (directly or indirectly)

influenced by a given molecule

Nodes: molecules, reactions or physical interections

Edges: interactions

Reachability• Input: A directed graph G, and a pair of nodes s and t in G• Question: Does there exist a path from s to t in G?

Page 11: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

11

Breadth-first search

What is the complexity?11

BFS (G, s, t):

1. while Que is nonempty do

a. v Que.dequeue();

b. if v = t then return true;

c. for all adjacent edges e = (v, u) of v do

a) if not flag(u)

then flag(u) true; enqueue u onto Que;

2. return false

Use (a) a queue Que, initialized with s, (b) flag(v) for each node, initially false; and (c) adjacency lists to store G

Use (a) a queue Que, initialized with s, (b) flag(v) for each node, initially false; and (c) adjacency lists to store G

Why do we need the test?Why do we need the test?

Breadth-first, by using a queue Breadth-first, by using a queue

Complexity: each node and edge is examined at most once

Page 12: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

12

Breadth-first search

How to strike a balance?12

BFS (G, s, t): O(|V| + |E|) time and space

1. while Que is nonempty do

a. v Que.dequeue();

b. if v = t then return true;

c. for all adjacent edges e = (v, u) of v do

a) if not flag(u)

then flag(u) true; enqueue u onto Que;

2. return false

Reachability: NL-complete

O(1) time?

Too costly as a routine operation when G is largeToo costly as a routine operation when G is large

Yes, adjacency matrix, but O(|V|2) space

Page 13: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

13

2-hop cover

A number of algorithms for reachability queries (see reading list)13

For each node v in G,

2hop(v) = (Lin(v), Lout(v))

•Lin(v): a set of nodes in G that can reach v

•Lout(v): a set of nodes in G that v can reach To ensure: node s can reach t if and only if

Lout(s) Lin(t) •Testing: better than O(|V| + |E|) on average •Space: O(|V| |E|1/2)

Find a minimum 2-hop cover? NP-hardFind a minimum 2-hop cover? NP-hard

Maintenance cost in response to changes to G? Left as a project (LN 7)Maintenance cost in response to changes to G? Left as a project (LN 7)

Page 14: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

14

Distance queries

14

Distance: single-source shortest-path problem• Input: A directed weighted graph G, and a node s in G• Output: The lengths of shortest paths from s to all nodes in G

Application: transportation networks

Dijkstra (G, s, w):

1. for all nodes v in V do

a. d[v] ;

2. d[s] 0; Que V;

3. while Que is nonempty do

a. u ExtractMin(Que);

b. for all nodes v in adj(u) do

a) if d[v] > d[u] + w(u, v) then d[v] d[u] + w(u, v);

Use a priority queue Que; w(u, v): weight of edge (u, v); d(u): the distance from s to u

Use a priority queue Que; w(u, v): weight of edge (u, v); d(u): the distance from s to u

Extract one with the minimum d(u)Extract one with the minimum d(u)

Page 15: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

Example: Dijkstra’s algorithm

15

0s

a b

c d

10

5

2 3

1

9

7

2

4 6

Q = {s,a,b,c,d}d: {(a,∞), (b,∞), (c,∞), (d,∞)}

Exa

mpl

e fr

om C

LR

2nd

ed.

p. 5

28

Page 16: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

16

0s

10

5

a b

c d

10

5

2 3

1

9

7

2

4 6

Q = {a,b,c,d} d: {(a,10), (b,∞), (c,5), (d,∞)}

Exa

mpl

e fr

om C

LR

2nd

ed.

p. 5

28

Example: Dijkstra’s algorithm

Page 17: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

17

0s

8

5

14

7

a b

c d

10

5

2 3

1

9

7

2

4 6

Q = {a,b,d} d: {(a,8), (b,14), (c,5), (d,7)}

Exa

mpl

e fr

om C

LR

2nd

ed.

p. 5

28

Example: Dijkstra’s algorithm

Page 18: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

18

0s

8

5

13

7

a b

c d

10

5

2 3

1

9

7

2

4 6

Q = {a,b}d: {(a,8), (b,13), (c,5), (d,7)}

Exa

mpl

e fr

om C

LR

2nd

ed.

p. 5

28

Example: Dijkstra’s algorithm

Page 19: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

19

0s

8

5

9

7

a b

c d

10

5

2 3

1

9

7

2

4 6

Q = {b} d: {(a,8), (b,9), (c,5), (d,7)}

Exa

mpl

e fr

om C

LR

2nd

ed.

p. 5

28

Example: Dijkstra’s algorithm

Page 20: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

20

0s

8

5

9

7

a b

c d

10

5

2 3

1

9

7

2

4 6

Q = {} d: {(a,8), (b,9), (c,5), (d,7)}

Exa

mpl

e fr

om C

LR

2nd

ed.

p. 5

28

Example: Dijkstra’s algorithm

O(|V| log|V| + |E|). A beaten-to-death topic ?

How to speed it up by leveraging parallelism?How to speed it up by leveraging parallelism?Shortest paths in terriain? New spatial constraintsShortest paths in terriain? New spatial constraints

Page 21: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

21

Why do we care about regular path queries?21

Regular simple path• Input: A node-labeled directed graph G, a pair of nodes s and t

in G, and a regular expression R• Question: Does there exist a simple path p from s to t such

that the labels of adjacent nodes on p form a string in R?

Regular path queries

What is the complexity?

NP-complete, even when R is a fixed regular expression (00)* or 0*10*.

In PTIME when G is a DAG (directed acyclic graph)

A. Mendelzon, and P. T. Wood. Finding Regular Simple Paths In Graph Databases. SICOMP 24(6), 1995

Patterns of social linksPatterns of social links

Page 22: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

22Graph queries are nontrivial, even for path queries

22

Regular path• Input: A node-labeled directed graph G, a pair of nodes s and t

in G, and a regular expression R• Question: Does there exist a path p from s to t such that the

labels of adjacent nodes on p form a string in R?

Regular path queries

What is the complexity?

PTIME

Show that the regular path problem is in O(|G| |R|) timeShow that the regular path problem is in O(|G| |R|) time

Page 23: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

23O(|V| + |E|)

23

A strongly connected component in a direct graph G is a set V of nodes in G such that

• for any pair (u, v) in V, u can reach v and v can reach u; and • V is maximal: adding any node to V makes it no longer strongly

connected

Strongly connected components

What is the complexity?

by extending search algorithms, e.g., BFSby extending search algorithms, e.g., BFS

SCC• Input: A graph G• Question: all strongly connected components of G

Find social circles: how large? How many?Find social circles: how large? How many?

Page 24: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

PageRank

2424

Page 25: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

25

Introduction to PageRank

To measure the “quality” of a Web page• Input: A directed graph G modelling the Web, in which nodes

represent Web pages, and edges indicate hyperlinks • Output: For each node v in G, P(v): the likelihood that a

random walk over G will arrive at v.

25

Intuition: how a random walk can reach v?• A random jump: (1/|V|)

• : random jump factor (teleportation factor)

• Following a hyperlink: (1 - ) _(u L(v)) P(u)/C(u)

• (1 - ): damping factor

• (1 - ) _(u L(v)) P(u)/C(u): the chances for one to click a hyperlink at a page u and reach v

The chances of hitting v among |V| pages The chances of hitting v among |V| pages

Two parts

Page 26: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

26

Intuition

26

Following a hyperlink: (1 - ) _(u L(v)) P(u)/C(u)

• L(v): the set of pages that link to v; • C(u): the out-degree of node u (the number of links on u)• P(u)/C(u):

• the probability of u being visited itself• the probability of clicking the link to v among C(u) many links

on page u

The chances of reaching page v from page uThe chances of reaching page v from page u

Intuition: • the more pages link to v, and • the more popular those pages that link to v,

v has a higher chance to be visited

One of the models

Page 27: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

27

Putting together

27

The likelihood that page v is visited by a random walk:

P(v) = (1/|V|) + (1 - ) _(u L(v)) P(u)/C(u)

Recursive computation: for each page v in G,• compute P(v) by using P(u) for all u L(v)

until• converge: no changes to any P(v)• after a fixed number of iterations

random jumprandom jump following a link from other pagesfollowing a link from other pages

too expensive; use an error factortoo expensive; use an error factor

How to speed it up?

costly: trillions of pagescostly: trillions of pages Parallel computationParallel computation

Page 28: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

Example: PageRank

28

Exa

mpl

e fr

om C

LR

2nd

ed.

p. 5

28

(d, 0.2)

(a, 0.2)

(b, 0.2)

(c, 0.2)

(e, 0.2)

0.1

0.1

0.10.1

0.2

0.066

0.066

0.066

0.2

Initial stage (assume = 0, P(v) = 0.2)

3-way split3-way split

Page 29: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

Example: PageRank

29

Exa

mpl

e fr

om C

LR

2nd

ed.

p. 5

28

(a, 0.066)

(b, 0.166)

(c, 0.166)

(d, 0.3)

(e, 0.3)

(a, 0.2)(b, 0.2)

(c, 0.2)(e, 0.2)

0.1

0.1

0.10.1

0.2

0.066

0.066

0.066

0.2

Iteration 1

Page 30: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

Example: PageRank

30

Exa

mpl

e fr

om C

LR

2nd

ed.

p. 5

28

(a, 0.066)(b, 0.166)

(c, 0.166)

(d, 0.3)

(e, 0.3)

0.033

0.033

0.0830.083

0.166

0.1

0.1

0.1

0.3

Iteration 2

(a, 0.1)(b, 0.133)

(c, 0.183)

(d, 0.2)

(e, 0.383)

Page 31: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

Find nearest neighbors

3131

Page 32: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

32

Nearest neighbor

Nearest neighbor (kNN)• Input: A set S of points in a space M, a query point p in M, a

distance function dist(u, v), and a positive integer k• Output: Find top-k points in S that are closest to p based on

dist(p, u)

A number of techniques 32

Applications• POI recommendation: find me top-k restaurants close to where

I am • Classification: classify an object based on its nearest neighbors• Regression: property value as the average of the values of its k

nearest neighbors Linear search, space partitioning, locality sensitive hashing, compression/clustering based search, …

Euclidean distance, Hamming distance, continuous variables, …

Page 33: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

33

kNN join

kNN join• Input: Two datasets R and S, a distance function dist(r, s),

and a positive integer k• Output: pairs (r, s) for all r in R, where s is in S, and is one of

the k-nearest neighbors of r

Can we do better?33

A naive algorithm• Scanning S once for each object in R• O(|R| |S|): expensive when R or S is large

Pairwise comparison

Page 34: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

3434

Blocking and windowing

Several indexing and ordering techniques

only pairs in the same block are

compared

windowing

D B2B1

B3 partitioning

pairs into blocks

D D sorting

slidingwindow

window of a fixed size; only pairs in the same window are compared;

blocking

Grid order: rectangle cells, sorted by surrounding points GORDER: An Efficient Method for KNN Join

Processing. VLDB 2004.

Page 35: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

Keyword search

3535

Page 36: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

Keyword search

36

Ford company

Michigan , city history

Jaguar XJ

USA

result 1

Black Jaguar

history

North America

habitat

result 2

offer

New York, city

United States

Jaguar XK 001

result 3

White Jaguar

historyhabitat

South America

result 4

offer

Chicago

USA

Jaguar XK 007

result 5

Michigan

Query Q: ‘[Jaguar’, ‘America’, ‘history’]

Input: A list Q of keywords, a graph G, a positive integer k Output: top-k “matches” of Q in G

Information retrieval

What makes a match? How to sort the matches? How to efficiently find top-k matches?

Questions to answer

Page 37: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

37

Semantics: Steiner tree

What can we do about it?37

Match: a subtree T of G such that • each keyword in Q is contained in a leaf of T

Ranking: • The total weight of T (the sum of w(e) for all edges e in T)

Input: A list Q of keywords, a graph G, a weight function w(e) on the edges on G, and a positive integer k

Output: top-k Steiner trees that match Q PageRank scores

The cost to connect the keywords Complexity?

NP-complete

Page 38: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

38

Semantics: distinct-root (tree)

O(|Q| (|V| log |V| + |E|))38

Match: a subtree T of G such that • each keyword in Q is contained in a leaf of T

Ranking: • dist(r, q): from the root of T to a leaf q• The sum of distances from the root to all leaves of T

Input: A list Q of keywords, a graph G, and a positive integer k Output: top-k distinct trees that match Q

To avoid too homogeneous result, e.g., a hub as a root Diversification:

• each match in the top-k answer has a distinct rootHow many candidate matches for root?

Page 39: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

39

Semantics: Steiner graphs

Revision: minimum subgraphs39

Match: a subgraph G’ of G such that it is • r-radius: the shortest distance between any pair of nodes in G is

at most r (at least one pair with the distance); and• each keyword is contained in a content node • a Steiner node: on a simple path between a pair of content nodes

Input: A list Q of keywords, an undirected (unweighted) graph G, a positive integer r, and a positive integer k

Output: Find all r-radius Steiner graphs that match Q

Computation: Mr, the r-th power of adjacency graph of G

Page 40: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

40

Answering keyword queries

A well studied topic 40

G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using BANKS. ICDE 2002.

V. Kacholia, S. Pandit, S. Chakrabarti, S. Sudarshan, R. Desai, and H. Karambelkar. Bidirectional expansion for keyword search on graph databases. VLDB 2005.

H. He, H. Wang, J. Yang, and P. S. Yu. BLINKS: ranked keyword searches on graphs. SIGMOD 2007.

A host of techniques• Backward search• Bidirectional search• Bi-level indexing• …

Page 41: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

However, …

41

Ford company

Michigan , city history

Jaguar XJ

USA

result 1

Black Jaguar

history

North America

habitat

result 2

offer

New York, city

United States

Jaguar XK 001

result 3

White Jaguar

historyhabitat

South America

result 4

offer

Chicago

USA

Jaguar XK 007

result 5

Michigan

Query Q: ‘[Jaguar’, ‘America’, ‘history’]

The semantics is rather “ad hoc”

What does the user really want to find? Tree or graph? How to

explain matches found?

Add semantics to keyword search

company

dealer

history

Americacountry

city

cars

A long way to go

Page 42: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

Summing up

4242

Page 43: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

43

Graph query languages

Unfortunately, no “standard” query language for social graphs, yet43

SoQL: an SQL-like language to retrieve paths

CRPQ: extending conjunctive queries with regular path expressions• R. Ronen and O. Shmueli. SoQL: A language for querying and creating

data in social networks. ICDE, 2009.• P. Barceló, C. A. Hurtado, L. Libkin, and P. T. Wood. Expressive

languages for path queries over graph-structured data. In PODS, 2010

There have been efforts to develop graph query languages.Querying topological structures

SPARQL: for RDF data• http://www.w3.org/TR/rdf-sparql-query/

Read this

Page 44: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

44

Summary and review

Why are reachability queries? Regular path queries? Connected

components? Complexity? Algorithms?

What are factors for PageRank? How does PageRank work?

What are kNN queries? What is a kNN join? Complexity?

What are keyword queries? What is its Steiner tree semantics?

Distinct-root semantics? Steiner graph semantics? Complexity?

Name a few applications of graph queries you have learned.

Find graph queries that are not covered in the lecture.

Page 45: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

45

Project (1)

Recall regular path queries:

45

Develop two algorithms for evaluating regular path queries:• a sequential algorithm by using 2-hop covers; and • an algorithm in MapReduce

Prove the correctness of your algorithms and give complexity analysisExperimentally evaluate your algorithms, especially their scalability

• Input: A node-labeled directed graph G, a pair of nodes s and t in G, and a regular expression R

• Question: Does there exist a path p from s to t that satisfies R?

A development project

Page 46: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

46

Project (2)

GPath. Extend XPath to query directed, node-labeled graphs.

46

Design GPath, a query language for graphs. A GPath query Q starts

from a context node v in a graph G, traverses G and returns all the

nodes that are reachable from v by following Q. GPath should

support the child axis, wildcard *, self-or-descendants (//), and filters

(aka qualifiers, such as [p = c]). Justify your design.

Develop an algorithm that, given a GPath query Q, a graph G, and

a context node v in G, computes Q(G), the set of nodes reachable

from v in G by following Q.

Give a complexity analysis of your algorithm and show its

correctness.

Experimentally evaluate your algorithm

A research project

Page 47: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

47

Project (3)

Study keyword search in graphs.

47

Pick a semantics for keyword search and an algorithm for

implementing keyword search based on the semantics

Justify your choice: semantics and algorithm

Implement the algorithm in whatever language you like

Experimentally evaluate your implementation

Demonstrate your keyword search support

A development project

Page 48: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

Reading

48

• A. Mendelzon, and P. Wood. Finding regular simple paths in graph databases. SICOMP, 24(6), 1995. ftp://ftp.db.toronto.edu/pub/papers/sicomp95.ps.Z

• J. Xu and J. Chang. Graph reachability queries: a survey. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.352.2250

• J. Xu, L. Qin, and J. Chang. Keyword Search in Relational Databases: A Survey.

http://sites.computer.org/debull/A10mar/yu-paper.pdf

Acknowledgments: some animation slides are borrowed from

www.cs.kent.edu/~jin/Cloud12Spring/GraphAlgorithms.pptx

Page 49: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

Papers for you to review

49

• E. Cohen, E. Halperin, H. Kaplan, and U. Zwick. Reachability and distance queries via 2-hop labels. SODA 2002. http://www.cs.tau.ac.il/~heran/cozygene/publications/papers/labels.pdf

• R. Bramandia, B. Choi, and W. Ng. On Incremental Maintenance of 2-hop Labeling of Graphs. WWW 2008.

http://wwwconference.org/www2008/papers/pdf/p845-bramandia.pdf

• M. Kaul, R.Wong, B. Yang, C. Jensen. Finding Shortest Paths on Terrains by Killing Two Birds with One Stone. VLDB 2014. http://www.vldb.org/pvldb/vol7/p73-kaul.pdf

• C. Xia, H. Lu, B. Ooi and J. Hu. GORDER: An Efficient Method for KNN Join Processing. VLDB 2004. http://www.vldb.org/conf/2004/RS20P2.PDF

Page 50: 1 QSX: Querying Social Graphs Graph Queries and Algorithms Graph search (traversal) PageRank Nearest neighbors Keyword search Graph pattern matching

Papers for you to review

50

• G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using BANKS. ICDE 2002.

http://www.cse.iitb.ac.in/~sudarsha/Pubs-dir/BanksICDE2002.pdfR• V. Kacholia, S. Pandit, S. Chakrabarti, S. Sudarshan, R. Desai, and H.

Karambelkar. Bidirectional expansion for keyword search on graph databases. VLDB 2005.

http://www.researchgate.net/publication/221310807_Bidirectional_Expansion_For_Keyword_Search_on_Graph_Databases

•H. He, H. Wang, J. Yang, and P. S. Yu. BLINKS: ranked keyword searches on graphs. SIGMOD 2007.

http://db.cs.duke.edu/papers/2007-SIGMOD-hwyy-kwgraph.pdf

•Y. Wu, S. Yang, M. Srivatsa, A. Iyengar, X. Yan. Summarizing Answer Graphs Induced by Keyword Queries.. VLDB 2014. http://www.cs.ucsb.edu/~yinghui/mat/papers/gsum_vldb14.pdf