presented by ozgur d. sahin. outline introduction neighborhood functions anf algorithm modifications...
Post on 21-Dec-2015
215 views
TRANSCRIPT
![Page 1: Presented by Ozgur D. Sahin. Outline Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d565503460f94a339dc/html5/thumbnails/1.jpg)
Presented by Ozgur D. Sahin
![Page 2: Presented by Ozgur D. Sahin. Outline Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d565503460f94a339dc/html5/thumbnails/2.jpg)
Outline
Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions
![Page 3: Presented by Ozgur D. Sahin. Outline Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d565503460f94a339dc/html5/thumbnails/3.jpg)
Introduction & Motivation
Graph-based data is becoming more importatnt Internet modeling, academic citations, phone records,
movie databases, CAD circuits Example Questions:
How robust is the Internet to failures? What are the most influential database papers? What is the best opening move in tic-tac-toe? Are phone call patterns in Asia similar to those in the U.S.?
Goal: Quickly answer questions on graph- represented data
![Page 4: Presented by Ozgur D. Sahin. Outline Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d565503460f94a339dc/html5/thumbnails/4.jpg)
Answering Questions
We can answer these questions if we can compute following three properties related to connectivity and neighborhood structure: Graph Similarity: Decide if two graphs have similar
connectivity/neighborhood structure Subgraph Similarity: Compare how two subgraphs of a
given graph are connected Vertex Importance: Assign an importance to each node
based on its connectivity
This paper provides such a tool: ANF (Approximate Neighborhood Function)
![Page 5: Presented by Ozgur D. Sahin. Outline Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d565503460f94a339dc/html5/thumbnails/5.jpg)
Challenges
Following properties should be satisfied: Error Guarantees: Accurate estimates Fast: Scale linearly with n (# of nodes) and m (# of edges)
Low Storage Adapts to available memory Parallelizable Sequential scan of the edge file Estimates per node
![Page 6: Presented by Ozgur D. Sahin. Outline Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d565503460f94a339dc/html5/thumbnails/6.jpg)
Definitions - Neighborhood Functionsdist(u,v): # of edges on the shortest path from u to v
Define following neighborhood functions:
![Page 7: Presented by Ozgur D. Sahin. Outline Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d565503460f94a339dc/html5/thumbnails/7.jpg)
Definitions - Neighborhood FunctionsGeneralize these two definitions to deal with subgraphs:
![Page 8: Presented by Ozgur D. Sahin. Outline Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d565503460f94a339dc/html5/thumbnails/8.jpg)
Basic ANF Algorithm
N(h) can be computed by a graph traversal Graph traversal accesses edges in random order Running time is O(nm)
Access edges in sequential order: M(x,h) is the set of nodes within distance h of node x
![Page 9: Presented by Ozgur D. Sahin. Outline Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d565503460f94a339dc/html5/thumbnails/9.jpg)
Basic ANF Algorithm
How to compute the number of distinct elements in the set M(x,h): A dictionary data structure: O(n2log n) time/space Use bits to mark membership: O(n2) space Use ‘probabilistic counting algorithm’
Approximate set sizes using ‘log n+r’ bits
![Page 10: Presented by Ozgur D. Sahin. Outline Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d565503460f94a339dc/html5/thumbnails/10.jpg)
Probabilistic counting algorithm Approximate set sizes using ‘log n+r’ bits Instead of one bit per node, give half the nodes bit 0, a
quarter of them bit 1, and so on (A node is given bit i with probability 1/2i+1)
The approximation of the size of a set is proportional to 2b, where b is the least bit that has not been set in the bit representation of this set
Use k parallel approximations M(x,h) is represented by k(log n+r) bits
![Page 11: Presented by Ozgur D. Sahin. Outline Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d565503460f94a339dc/html5/thumbnails/11.jpg)
Basic ANF Algorithm Consider a ring with 5 nodes
Example for k=3 and r=0 Bit 0 is the leftmost bit in each 3-bit mask
M(2,1) is the union of M(2,0), M(1,0), and M(3,0): M(2,1)=M(2,0) OR M(1,0) OR M(3,0)
IN(2,1) is computed from the average of the least zero bit positions: Avg=(2+1+1)/3=4/3 IN(2,1) = (24/3)/0.77359 = 3.25
![Page 12: Presented by Ozgur D. Sahin. Outline Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d565503460f94a339dc/html5/thumbnails/12.jpg)
Basic ANF Algorithm
![Page 13: Presented by Ozgur D. Sahin. Outline Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d565503460f94a339dc/html5/thumbnails/13.jpg)
Modifications
M(x,h) uses M(y,h-1) but not M(y,h-2), so just keep the M(y,h-1) during iteration h.
Include a mark bit to handle generalized neighborhood functions
Break bit masks into smaller pieces if they are larger than the available memory
![Page 14: Presented by Ozgur D. Sahin. Outline Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d565503460f94a339dc/html5/thumbnails/14.jpg)
Leading Ones Compression
As ANF runs, most bit masks will have many leading 1’s
Compress bit masks by including a counter of the leading ones
Bit shuffling of k parallel bit masks enables further compression: 11010,11100 1111011000
Provides up to 23% speed-up
![Page 15: Presented by Ozgur D. Sahin. Outline Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d565503460f94a339dc/html5/thumbnails/15.jpg)
Experiments
Data Sets: 3 real (Router, Cornell, Cora) and 4 synthetic
Evaluation Metric:
![Page 16: Presented by Ozgur D. Sahin. Outline Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d565503460f94a339dc/html5/thumbnails/16.jpg)
Experiments - Accuracy
k=64: - ANF achieves less than 7% error
- ANF’s error is independent of the data set
![Page 17: Presented by Ozgur D. Sahin. Outline Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d565503460f94a339dc/html5/thumbnails/17.jpg)
Experiments - Time
![Page 18: Presented by Ozgur D. Sahin. Outline Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d565503460f94a339dc/html5/thumbnails/18.jpg)
Experiments - Scalability
![Page 19: Presented by Ozgur D. Sahin. Outline Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d565503460f94a339dc/html5/thumbnails/19.jpg)
Data Mining with ANF
ANF tool can be used to answer graph mining problems: Best opening move for Tic-Tac-Toe game Clustering movie classes Measuring the robustness of the Internet
Use summarized statistics derived from neighborhood function: Many real graphs follow a power law:
N(h) hH, where H is defined as the ‘hop exponent’ Use ‘individual hop exponent’ as a measure of importance
![Page 20: Presented by Ozgur D. Sahin. Outline Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d565503460f94a339dc/html5/thumbnails/20.jpg)
Tic-Tac-Toe
Show: The best opening move is the center square Each possible board configuration is a node and
there is an edge from board x to board y if it is a possible move
Compute individual neighborhood functions for each of the 9 possible first moves
![Page 21: Presented by Ozgur D. Sahin. Outline Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d565503460f94a339dc/html5/thumbnails/21.jpg)
Clustering Movies
Consider IMDB (Internet Movie Data Base) where each movie is identified as being in one or more classes (such as documentaries, dramas, comedies, etc)
Construct a graph for each class and cluster similar ones
![Page 22: Presented by Ozgur D. Sahin. Outline Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d565503460f94a339dc/html5/thumbnails/22.jpg)
Internet Router Data How robust the Internet is to router failures
Delete some number of routers and measure connectivity
-Random failures do not disrupt the Internet
-Targeted failures can dramatically disrupt it
![Page 23: Presented by Ozgur D. Sahin. Outline Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d565503460f94a339dc/html5/thumbnails/23.jpg)
Conclusions
ANF uses an efficient and accurate approximation algorithm
ANF tool provides several advantages including following: Accurate Fast Low storage requirements Parallelizable
ANF makes it possible to answer many interesting questions