
Data mining --- mining graphs

CAP5771.1, University of South Florida

Xiaoning Qian


Xiaoning Qian ([email protected]) 10/13/11

Today’s Lecture

1. Complex networks

2. Graph representation for networks

3. Markov chain

4. Viral propagation

5. Google’s PageRank


“New” Science of Networks

1. M Faloutsos, P Faloutsos, C Faloutsos, On power-law relationships of the Internet topology. Comput. Commun. Rev. 29:251-262, 1999

2. JM Kleinberg, Navigation in a small world: it is easier to find short chains between points in some networks than others. Nature, 406:845, 2000

3. AL Barabasi, Linked: The New Science of Networks. Cambridge, MA, 2002

4. DJ Watts, The “new” science of networks. Annu. Rev. Sociol. 30:243–70, 2004

5. DL Alderson, Catching the “network science” bug: Insight and opportunity for the operations researcher. Operations Research 56(5): 1047-1065, 2008

6. MEJ Newman, Networks: An Introduction. Oxford, 2010

Network Science

The first network/graph problem: the Bridges of Königsberg

Find a tour that crosses every bridge exactly once (Euler, 1735)
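The bridge problem can be checked with Euler's degree condition. The sketch below is only an illustration: the land-mass labels A to D and the edge list are assumptions standing in for the usual picture of the seven bridges, and the check relies on the standard result that a connected multigraph has a trail using every edge exactly once only if zero or two vertices have odd degree.

```python
from collections import Counter

# Königsberg as a multigraph: four land masses joined by seven bridges.
# The labels A-D are hypothetical names chosen for this sketch.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]


def degrees(edges):
    """Count how many bridge ends touch each land mass."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def has_euler_trail(edges):
    """Degree condition for a connected multigraph:
    a trail using every edge once needs 0 or 2 odd-degree vertices."""
    odd = sum(1 for d in degrees(edges).values() if d % 2 == 1)
    return odd in (0, 2)


print(dict(degrees(BRIDGES)))
print(has_euler_trail(BRIDGES))
```

All four land masses have odd degree here, so the condition fails and no such tour exists.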

“New” science
  Unprecedented number of empirical networks
  Much larger-scale networks
  Visualization does not convey enough information
  Computers are much more powerful
  Highly interdisciplinary

1. S Milgram, The small world problem. Psychol. Today 2:60–67, 1967

Mining networks/graphs

Topology/structure of complex networks
  Global: degrees, centrality, connectivity, etc.
    Scale-free (power-law) networks: six degrees of separation?
  Local: clustering (community), network motifs, etc.

Dynamics/behavior of complex networks
  Global: the topological effect on dynamics
    How do information, viruses, diseases, rumors, etc. propagate?
  Local: how individual nodes behave

Complex Networks

[Figures: example networks. Yeast PPI; yeast signaling; food web; friendship; romantic relation; author citation; Internet; Web]

Mathematics of Networks (Graphs)

What is a network/graph?
  A collection of vertices/nodes joined by edges

Different types of vertices and edges:
  Directed vs. undirected
  Weighted vs. binary
  Labeled vs. unlabeled
  Bipartite graphs
  Hypergraphs

Mathematically, G = {V, E}, with vertex set V and edge set E

Undirected network: $\langle v_i, v_j \rangle \in E \Rightarrow \langle v_j, v_i \rangle \in E$

Adjacency matrix L
  Symmetric for undirected graphs
  Square matrix for (self-)graphs; rectangular for bipartite graphs
  $L_{ij} = e_{ij}$ if $\langle v_i, v_j \rangle \in E$ (and 0 otherwise)

Matrix analysis for graph mining!

Simple graphs, connected graphs, complete graphs, …
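As a minimal sketch of the adjacency-matrix representation, the snippet below builds L from a weighted edge list with NumPy and checks the symmetry property for the undirected case. The helper name `adjacency_matrix` and the toy edge list are illustrative assumptions, not part of the lecture.

```python
import numpy as np


def adjacency_matrix(n, edges, directed=False):
    """Build L with L[i, j] = weight if there is an edge (i, j), else 0."""
    L = np.zeros((n, n))
    for i, j, w in edges:
        L[i, j] = w
        if not directed:
            L[j, i] = w  # undirected edges are stored in both directions
    return L


# Hypothetical toy graph: (source, target, weight) triples on 4 vertices.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0), (2, 3, 1.0)]

L_undirected = adjacency_matrix(4, edges)
L_directed = adjacency_matrix(4, edges, directed=True)

print(np.array_equal(L_undirected, L_undirected.T))  # symmetry check
print(np.array_equal(L_directed, L_directed.T))
```

A dense array is convenient for small examples; for large sparse networks a sparse matrix format would usually be the more practical choice.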

Node degree $c_i$
  The number of edges incident with vertex $v_i$
  Neighbor set
  In- and out-degrees
  Degree distributions (power-law)
  …

Trail (distinct edges), path (distinct nodes), cycle, cut, …
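Degrees, neighbor sets, and an empirical degree distribution can all be read off the adjacency matrix. The following is a small sketch on a hypothetical undirected, unweighted graph; the variable names are chosen for illustration only.

```python
from collections import Counter

import numpy as np

# Hypothetical undirected, unweighted toy graph on 4 vertices.
L = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])

# Node degree c_i: number of edges incident with vertex i (row sum).
degree = L.sum(axis=1)

# Neighbor set of each vertex: column indices of the nonzero entries in row i.
neighbors = {i: np.flatnonzero(L[i]).tolist() for i in range(len(L))}

# Empirical degree distribution: degree value -> number of vertices.
degree_counts = Counter(degree.tolist())

print(degree.tolist())
print(neighbors)
print(dict(degree_counts))
```

For a directed graph stored with L[i, j] marking an edge from i to j, the row sums would give out-degrees and the column sums in-degrees.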

Markov chain

Sequential data

What is a Markov chain?

Finite Markov chain -- (Q, P)
  Q = {q1, q2, …, qs}: a finite set of states
  P: state transition probability matrix

Given a sequence of observations, the probability of the sequence is the joint probability of the observed states.

For a first-order time-homogeneous Markov chain, each state depends only on the previous state, and the transition probabilities do not change over time.

Hence, the probability of the sequence is a product of one-step transition probabilities.
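The equations on this slide did not survive in the transcript. A plausible reconstruction, assuming the observed states are written $x_1, \dots, x_n$ (this notation is an assumption, chosen to match the $p(x_n)$ and $p(x_1)$ used later), is the standard factorization:

```latex
% Assumed notation: x_1, ..., x_n are the observed states.
\begin{align}
p(x_1, \dots, x_n)
  &= p(x_1) \prod_{t=2}^{n} p(x_t \mid x_1, \dots, x_{t-1})
  && \text{(product rule)} \\
p(x_t \mid x_1, \dots, x_{t-1})
  &= p(x_t \mid x_{t-1})
  && \text{(first-order Markov property)} \\
p(x_1, \dots, x_n)
  &= p(x_1) \prod_{t=2}^{n} p(x_t \mid x_{t-1})
\end{align}
```

Time homogeneity means the factors $p(x_t \mid x_{t-1})$ are read from the same matrix P at every step.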

Finite Markov chain -- (Q, P)
  Q = {B, q1, q2, …, qs}: a finite set of states
  P: state transition probability matrix
  Initial state probability
  The probability of a sequence can be expressed with P

Note: the outputs are the states at each time step, so the states are observable!

An example

3-state Markov chain model for the weather:
  Q = {Rain (or snow), Cloudy, Sunny}
  P is given in the figure; initial state probability

[Figure: state-transition diagram over the states R, C, S, with edge probabilities 0.4, 0.3, 0.3, 0.2, 0.2, 0.6, 0.8, 0.1, 0.1]
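To make the weather example concrete, the sketch below scores a state sequence under a 3-state chain. The transcript lists the nine edge probabilities but not which edge each belongs to, so the matrix here is an assumption: it uses the arrangement commonly seen for this textbook example (rows are the current state in the order R, C, S). The uniform initial distribution is likewise a placeholder, since the original values were lost.

```python
import numpy as np

STATES = ["R", "C", "S"]  # Rain (or snow), Cloudy, Sunny

# ASSUMED assignment of the listed probabilities to edges:
# A[i, j] = p(next state = j | current state = i), each row sums to 1.
A = np.array([
    [0.4, 0.3, 0.3],   # from R
    [0.2, 0.6, 0.2],   # from C
    [0.1, 0.1, 0.8],   # from S
])

# ASSUMED initial state probability (uniform placeholder).
initial = np.full(3, 1.0 / 3.0)


def sequence_probability(seq, A, initial):
    """p(x_1) times the product of one-step transition probabilities."""
    idx = [STATES.index(s) for s in seq]
    prob = initial[idx[0]]
    for prev, cur in zip(idx, idx[1:]):
        prob *= A[prev, cur]
    return prob


print(sequence_probability(["S", "S", "R", "C"], A, initial))
```

Note that A is written row-wise here; under the column convention p = P p used for the stationary distribution, the corresponding matrix would be the transpose of A.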

Chapman-Kolmogorov Equation

$p(x_n) = P^{(n-1)} p(x_1)$

Limiting distribution (stationary/steady-state distribution)
  Irreducibility, periodicity, ergodicity
  $p = P p$

How to solve for p?
  Eigen-decomposition of P
  Power method
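Both solution routes for p = P p can be sketched in a few lines. The code assumes the column convention implied by that equation (each column of P sums to 1), and the 2-state matrix is a made-up example rather than one from the lecture.

```python
import numpy as np


def power_method(P, tol=1e-12, max_iter=10_000):
    """Iterate p <- P p from a uniform start until p stops changing."""
    n = P.shape[0]
    p = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        p_next = P @ p
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def stationary_by_eig(P):
    """Take the eigenvector whose eigenvalue is closest to 1 and normalize it."""
    vals, vecs = np.linalg.eig(P)
    k = np.argmin(np.abs(vals - 1.0))
    v = np.real(vecs[:, k])
    return v / v.sum()


# Hypothetical column-stochastic transition matrix.
P = np.array([
    [0.9, 0.5],
    [0.1, 0.5],
])

p1 = np.array([0.0, 1.0])                      # start in state 2
p5 = np.linalg.matrix_power(P, 4) @ p1         # p(x_5) = P^(4) p(x_1)

print(power_method(P))
print(stationary_by_eig(P))
print(p5)
```

The power method only needs matrix-vector products, which is why it scales to very large sparse graphs where a full eigen-decomposition would be impractical.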

Random walk on graphs

Random walk on graphs (network diffusion) is a Markov process.
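A random walk turns an adjacency matrix into a Markov chain by normalizing each column by the degree of the vertex being left. The sketch below uses a hypothetical undirected toy graph and checks a standard property: for an undirected graph the degree-proportional vector is stationary. The function name `simulate_walk` and its parameters are illustrative assumptions.

```python
import numpy as np

# Hypothetical undirected, unweighted toy graph.
L = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

degree = L.sum(axis=0)        # c_j: degree of vertex j
P = L / degree                # P[i, j] = L[i, j] / c_j, columns sum to 1

# For an undirected graph, the degree-proportional vector satisfies p = P p.
pi = degree / degree.sum()


def simulate_walk(P, start, steps, seed=0):
    """Simulate the walk and count visits to each vertex."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    counts = np.zeros(n, dtype=int)
    current = start
    for _ in range(steps):
        current = rng.choice(n, p=P[:, current])  # move to a random neighbor
        counts[current] += 1
    return counts


counts = simulate_walk(P, start=0, steps=10_000)
print(pi)
print(counts / counts.sum())
```

On a connected, non-bipartite graph such as this one, the visit frequencies should approach pi as the number of steps grows.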

What’s behind Google?

The algorithm of Google: PageRank

PageRank

What is an important Webpage?
  There are many Webpages pointing to it
  Important Webpages point to more important Webpages
  Importance diffuses based on links between Webpages

Vertices: Webpages; edges: hyperlinks

HITS: JM Kleinberg

Diffusion (random walk) on the Web

$p_i$: importance of page i; $L_{ij}$: link from page j to page i

Hence, the problem becomes a Markov chain problem (diffusion process).

Diffusion (random walk with restart) on the Web

$p_i$: importance of page i; $L_{ij}$: link from page j to page i

Matrix form, with a pseudocount term and a diffusion factor

How do we solve this?

Note that p is simply for ranking and the absolute values are not critical! WLOG, we assume a fixed normalization of p.

Hence, the problem becomes a Markov chain problem (diffusion process).
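The slide equations are missing from the transcript, so the following is a sketch under a common formulation rather than the lecture's exact one. It assumes $p_i = (1-d)/N + d \sum_j L_{ij} p_j / c_j$, where $c_j$ is the number of outgoing links of page j, d is a damping (diffusion) factor, and p is normalized to sum to 1. The symbol d, the default value 0.85, and the toy link matrix are all assumptions.

```python
import numpy as np


def pagerank(L, d=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for random walk with restart.

    L[i, j] = 1 if page j links to page i (the slide's convention).
    d is an assumed damping factor; (1 - d) / n plays the pseudocount role.
    """
    n = L.shape[0]
    c = L.sum(axis=0)  # outgoing links per page
    if np.any(c == 0):
        raise ValueError("this sketch assumes every page has an outgoing link")
    # Column-stochastic matrix: uniform restart plus link-following diffusion.
    A = (1.0 - d) / n * np.ones((n, n)) + d * L / c
    p = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        p_next = A @ p
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Hypothetical 4-page web: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, 3 -> 2.
L = np.array([
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
], dtype=float)

p = pagerank(L)
print(p)
print(np.argsort(-p))  # pages ranked from most to least important
```

Because the combined matrix A has strictly positive entries and columns that sum to 1, the iteration converges to a unique ranking vector; pages without outgoing links would need separate handling, which this sketch deliberately leaves out.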

Viral propagation

How does the virus spread over the network?
Will it become an “epidemic” outbreak?
How fast will the virus die out or become “epidemic”?
How should we design “robust” networks to prevent cascading failures?

* A Ganesh et al., The effect of network topology on the spread of epidemics. INFOCOM, 2005

* D Chakrabarti, Tools for large graph mining. Ph.D. Thesis, CMU, 2005

Mathematical Epidemiology

SIR (Susceptible-Infective-Recovered) model

SIS (Susceptible-Infective-Susceptible) model
  Catching the disease from infective neighbors (birth rate): β
  Recovery rate: δ
  Epidemic threshold: τ

SIS model

Sum and product rules in probability!

SIS model is again a Markov process!

With appropriate approximations, we can derive

$p(v_i^t = \text{susceptible}) = p(v_i^{t-1} = \text{susceptible})\,\zeta_i + p(v_i^{t-1} = \text{infective})\,\delta$

$1 - p(v_i^t) = [1 - p(v_i^{t-1})]\,\zeta_i + p(v_i^{t-1})\,\delta$

and

$p^t = [\beta L + (1-\delta) I]\, p^{t-1}$
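The step from the node-level equation to the matrix form is not spelled out in the surviving text. One way to fill it in, assuming $\zeta_i$ denotes the probability that node i receives no infection from its neighbors in one step (a definition used in the epidemic-threshold literature cited earlier, and an assumption here), is:

```latex
% Assumption: \zeta_i is the probability that node i is not infected by any neighbor,
% and p_i^t abbreviates p(v_i^t = infective).
\begin{align}
\zeta_i &= \prod_{j} \bigl(1 - \beta L_{ij}\, p_j^{t-1}\bigr)
         \approx 1 - \beta \sum_{j} L_{ij}\, p_j^{t-1}
         && \text{(first order in } \beta\text{)} \\
1 - p_i^{t} &= \bigl(1 - p_i^{t-1}\bigr)\Bigl(1 - \beta \sum_{j} L_{ij}\, p_j^{t-1}\Bigr)
             + \delta\, p_i^{t-1} \\
p_i^{t} &\approx (1 - \delta)\, p_i^{t-1} + \beta \sum_{j} L_{ij}\, p_j^{t-1}
         && \text{(dropping the second-order term } p_i^{t-1} \beta \textstyle\sum_j L_{ij}\, p_j^{t-1}\text{)}
\end{align}
```

Stacking the last line over all nodes gives $p^t = [\beta L + (1-\delta) I]\, p^{t-1}$, which matches the matrix form above.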

Eigen-decomposition of the matrix $S = \beta L + (1-\delta) I$ in $p^t = S\, p^{t-1}$

Hence, $p^t = S^t p^0$

Epidemic threshold: determined by the leading eigenvalue of L
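The role of the eigen-decomposition can be checked numerically. For symmetric L, the eigenvalues of $S = \beta L + (1-\delta) I$ are $1 - \delta + \beta \lambda_i(L)$, so the linearized infection probabilities decay exactly when $\beta/\delta < 1/\lambda_1(L)$, with $\lambda_1(L)$ the largest eigenvalue of L. The sketch below takes that value as the threshold τ, following the result reported in the works cited on the viral propagation slide; the toy graph, rates, and function names are assumptions.

```python
import numpy as np


def sis_linear(L, beta, delta, p0, steps):
    """Iterate the linearized SIS update p^t = S^t p^0."""
    S = beta * L + (1.0 - delta) * np.eye(L.shape[0])
    return np.linalg.matrix_power(S, steps) @ p0


def epidemic_threshold(L):
    """tau = 1 / lambda_1(L) for a symmetric adjacency matrix L."""
    return 1.0 / np.max(np.linalg.eigvalsh(L))


# Hypothetical undirected toy graph.
L = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

tau = epidemic_threshold(L)
delta = 0.4
p0 = np.full(4, 0.1)  # assumed initial infection probabilities

# beta / delta below the threshold: the infection dies out.
below = sis_linear(L, beta=0.5 * tau * delta, delta=delta, p0=p0, steps=200)

# beta / delta above the threshold: the linearized system grows.
above = sis_linear(L, beta=2.0 * tau * delta, delta=delta, p0=p0, steps=50)

print(tau)
print(below.max())
print(above.sum() > p0.sum())
```

Above the threshold the linear recursion grows without bound, so its values stop being valid probabilities; it only indicates that the infection does not die out under this approximation.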

Summary

Networks/graphs are everywhere and require new tools to study them efficiently and effectively.

Random walk (Markov chain) on graphs and its extensions can be a useful technique to “mine” complex networks/graphs:
  PageRank
  Viral propagation

Have you learned anything? :)

I am teaching Biological Network Analysis, Spring 2012.