graph algorithms ch. 5 lin and dyer. graphs are everywhere manifest in the flow of emails...
TRANSCRIPT
![Page 1: Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs:](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e045503460f94af0a6c/html5/thumbnails/1.jpg)
Graph Algorithms
Ch. 5 Lin and Dyer
![Page 2: Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs:](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e045503460f94af0a6c/html5/thumbnails/2.jpg)
Graphs
• Are everywhere• Manifest in the flow of emails• Connections on social network• Bus or flight routes• Social graphs: twitter friends and followers• Take a look at Jon Kleinger’s page and book on
Networks, Crowds and Markets Reasoning about a highly connected world.
![Page 3: Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs:](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e045503460f94af0a6c/html5/thumbnails/3.jpg)
Graph algorithms– Graph search and path planning:: shortest path to a node– Graph clustering:: diving the graphs into smaller related
clusters– Minimum spanning tree:: graph that covers the nodes in an
efficient way– Bipartite graph match:: div graph into two mapping sets: job
seekers and employers– Maximum flow:: designate source and sink; determine max
flow between the two: transportation– Identifying special nodes: authoritative nodes: containment of
spread of diseases; Broad street water pump in London, cholera and beginnings of epidemiology
![Page 4: Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs:](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e045503460f94af0a6c/html5/thumbnails/4.jpg)
Graph Representations
n1 n2
n3
n4
n5
How do you represent this visual diagram as data?
![Page 5: Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs:](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e045503460f94af0a6c/html5/thumbnails/5.jpg)
Simple, Baseline Data Structure
0 1 0 1 0
0 0 1 0 1
0 0 0 1 0
0 0 0 0 1
1 1 1 0 0
n1
n2
n3
n4
n5
n1 n2 n3 n4 n5
(i) Adjacency matrix – thisis good for linear algebra;But most web links and social Networks are sparsex/ 1000000000Space req. is O(n2)
n1n2
n1 [n2, n4]n2 [n3, n5]n3 [n4]n4 [n5]n5 [n1,n2,n3]
(ii) Adjacency lists
![Page 6: Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs:](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e045503460f94af0a6c/html5/thumbnails/6.jpg)
Problem definition: intuition
• Input: graph adjacency list with edges and vertices, w edges distances, starting vertex
• Output(goal): label the nodes/vertices with the shortest distance value from the starting node
![Page 7: Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs:](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e045503460f94af0a6c/html5/thumbnails/7.jpg)
single source shortest path problem
• Sequential solution: Dijkstra’s algorithm 5.2Dijkstra (G, w, s) // w edge distances list, s starting node, G graph d[s] 0 for all other vertices d[v] ∞Q {V} // Q is priority queue based on distanceswhile Q # 0 u min(Q) // node with min d value for all vertex v in u.adjacencyList if d[v] > d[u] + w[u,v] d[v] d[u] + w[u,v] mark u and remove from Q
At each iteration of while loop, the algorithm expands the node with the shortest distance and updates distances to all reachable nodes
![Page 8: Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs:](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e045503460f94af0a6c/html5/thumbnails/8.jpg)
Sample graph : lets apply the algorithm 5.2
∞
n1
n2
n3
n4
n5
0
∞
∞∞
10
5
2 3
1
9
4 67
2
![Page 9: Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs:](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e045503460f94af0a6c/html5/thumbnails/9.jpg)
Issues
• Sequential• Need to keep global state: not possible with
MR• Lets see how we can handle this graph
problem for parallel processing with MR
![Page 10: Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs:](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e045503460f94af0a6c/html5/thumbnails/10.jpg)
Parallel Breadth First
• Assume distance of 1 for all edges (simplifying assumption): later we will expand it to other distances
![Page 11: Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs:](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e045503460f94af0a6c/html5/thumbnails/11.jpg)
Issues in processing a graph in MR
• Goal: start from a given node and label all the nodes in the graph so that we can determine the shortest distance
• Representation of the graph (of course, generation of a synthetic graph)
• Determining the <key,value> pair• Iterating through various stages of processing
and intermediate data• When to terminate the execution
![Page 12: Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs:](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e045503460f94af0a6c/html5/thumbnails/12.jpg)
Input data format for MR• Node: nodeId, distanceLabel, adjancency list {nodeId, distance}• This is one split• Input as text and parse it to determine <key, value>• From mapper to reducer two types of <key, value> pairs• <nodeid n, Node N>• <nodeid n, distance until now label>• Need to keep the termination condition in the Node class• Terminate MR iterations when none of the labels change, or when
the graph has reached a steady state or all the nodes have been labeled with min distance or other conditions using the counters can be used.
• Now lets look at the algorithm given in the book
![Page 13: Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs:](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e045503460f94af0a6c/html5/thumbnails/13.jpg)
Mapper
Class Mapper method map (nid n, Node N) d N.distance emit(nid n, N) // type 1 for all m in N. Adjacencylist emit(nid m, d+1) // type 2
![Page 14: Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs:](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e045503460f94af0a6c/html5/thumbnails/14.jpg)
Reducer
Class Reducer method Reduce(nid m, [d1, d2, d3..]) dmin = ∞; // or a large # Node M null for all d in [d1,d2, ..] { if IsNode(d) then M d else if d < dmin then dmin d}
M.distance dmin // update the shortest distance in M emit (nid m, Node M)
![Page 15: Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs:](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e045503460f94af0a6c/html5/thumbnails/15.jpg)
Trace with sample Data
1 0 2:3:2 10000 3:4:3 10000 2:4:54 10000 5:5 10000 1:4
![Page 16: Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs:](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e045503460f94af0a6c/html5/thumbnails/16.jpg)
Intermediate data
1 0 2:3:2 1 3:4:3 1 2:4:5:4 10000 5:5 10000 1:4:
![Page 17: Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs:](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e045503460f94af0a6c/html5/thumbnails/17.jpg)
Intermediate Data
1 0 2:3:2 1 3:4:3 1 2:4:5:4 2 5:5 2 1:4:
![Page 18: Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs:](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e045503460f94af0a6c/html5/thumbnails/18.jpg)
Final Data
1 0 2:3:2 1 3:4:3 1 2:4:5:4 2 5:5 2 1:4:
![Page 19: Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs:](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e045503460f94af0a6c/html5/thumbnails/19.jpg)
Sample Data1 0 2:3:
2 10000 3:4: 3 10000 2:4:5
4 10000 5: 5 10000 1:4
1 0 2:3: 2 1 3:4:
3 1 2:4:5 4 10000 5: 5 10000 1:4
1 0 2:3: 2 1 3:4:
3 1 2:4:5 4 2 5: 5 2 1:4
![Page 20: Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs:](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e045503460f94af0a6c/html5/thumbnails/20.jpg)
PageRank
• Original algorithm (huge matrix and Eigen vector problem.)
• Larry Page and Sergei Brin (Standford Ph.D. students)• Rajeev Motwani and Terry Winograd (Standford
Profs)
![Page 21: Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs:](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e045503460f94af0a6c/html5/thumbnails/21.jpg)
General idea
• Consider the world wide web with all its links.• Now imagine a random web surfer who visits a page
and clicks a link on the page• Repeats this to infinity• Pagerank is a measure of how frequently will a page
will be encountered.• In other words it is a probability distribution over
nodes in the graph representing the likelihood that a random walk over the linked structure will arrive at a particular node.
![Page 22: Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs:](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e045503460f94af0a6c/html5/thumbnails/22.jpg)
PageRank Formula
P(n) = α randomness factor G is the total number of nodes in the graph L(n) is all the pages that link to n C(m) is the number of outgoing links of the page mNote that PageRank is recursively defined.It is implemented by iterative MRs.
![Page 23: Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs:](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e045503460f94af0a6c/html5/thumbnails/23.jpg)
Example
• Figure 5.7• Lets assume alpha as zero• Lets look at the MR
![Page 24: Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs:](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e045503460f94af0a6c/html5/thumbnails/24.jpg)
Mapper for PageRank
Class Mapper method map (nid n, Node N) p N.Pagerank/|N.AdajacencyList| emit(nid n, N) for all m in N. AdjacencyList emit(nid m, p)“divider”
![Page 25: Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs:](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e045503460f94af0a6c/html5/thumbnails/25.jpg)
Reducer for Pagerank
Class Reducer method Reduce(nid m, [p1, p2, p3..]) node M null; s = 0; for all p in [p1,p2, ..] { if p is a Node then M p else s s+p } M.pagerank s emit (nid m, node M)
“aggregator”
![Page 26: Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs:](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e045503460f94af0a6c/html5/thumbnails/26.jpg)
Lets trace with sample data
12
4
3
![Page 27: Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs:](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e045503460f94af0a6c/html5/thumbnails/27.jpg)
Discussion
• How to account for dangling nodes: one that has many incoming links and no outgoing links– Simply redistributes its pagerank to all– One iteration requires pagerank computation + redistribution of
“unused” pagerank• Pagerank is iterated until convergence: when is convergence
reached?• Probability distribution over a large network means
underflow of the value of pagerank.. Use log based computation
• MR: How do PRAM alg. translate to MR? how about other math algorithms?