Liang Ge
Introduction
Important Concepts in MCL Algorithm
MCL Algorithm
The Features of MCL Algorithm
Summary
Simulation of Random Flow in Graph
Two Operations: Expansion and Inflation
Intrinsic relationship between MCL process result and cluster structure
Popular description: partition the graph so that
Intra-partition similarity is the highest
Inter-partition similarity is the lowest
Observation 1:
The number of higher-length paths in G is large for pairs of vertices lying in the same dense cluster
and small for pairs of vertices belonging to different clusters
Observation 2:
A Random Walk in G that visits a dense cluster will likely not leave the cluster until many of its vertices have been visited
Measure or sample any of these (higher-length paths, random walks) and deduce the cluster structure from the behavior of the sampled quantities.
Cluster structure will show itself as a peaked distribution of the quantities
A lack of cluster structure will result in a flat distribution
Markov Chain
Random Walk on Graph
Some Definitions in MCL
A Random Process with Markov Property
Markov Property: given the present state, future states are independent of the past states
At each step the process may change its state from the current state to another state, or remain in the same state, according to a certain probability distribution.
A walker takes off from some arbitrary vertex
He successively visits new vertices by arbitrarily selecting one of the outgoing edges
There is not much difference between random walk and finite Markov chain.
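The walker described above can be sketched in a few lines. This is only an illustration: the adjacency list, the function name random_walk and the fixed seed are assumptions made for the example, not part of the original slides.

```python
import random

def random_walk(adjacency, start, steps, rng):
    """Walk `steps` edges from `start`, choosing each outgoing edge uniformly."""
    path = [start]
    for _ in range(steps):
        # The next state depends only on the current vertex (Markov property).
        path.append(rng.choice(adjacency[path[-1]]))
    return path

# Hypothetical graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
adjacency = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
rng = random.Random(0)  # seeded only to make the sketch reproducible
walk = random_walk(adjacency, start=0, steps=20, rng=rng)
print(walk)
```

Because only the bridge 2-3 connects the two triangles, a walk of this kind tends to spend several consecutive steps inside one triangle before crossing.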
Simple Graph
A simple graph is an undirected graph in which every nonzero weight equals 1.
Associated Matrix
The associated matrix of G, denoted M_G, is defined by setting the entry (M_G)_pq equal to w(v_p, v_q)
Markov Matrix
The Markov matrix associated with a graph G is denoted by T_G and is formally defined by letting its qth column be the qth column of M_G, normalized so that it sums to 1
In MCL, the associated matrix and the Markov matrix are actually computed for the matrix M_G + I
I denotes the identity matrix (a diagonal matrix whose diagonal entries all equal 1)
This adds a loop to every vertex of the graph, because a walker may stay in the same place in his next step
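The definitions above can be sketched as follows. The function names associated_matrix and markov_matrix and the three-vertex path graph are assumptions chosen for illustration.

```python
def associated_matrix(n, edges, add_loops=True):
    """Associated matrix of a simple graph on n vertices, as nested lists."""
    m = [[0.0] * n for _ in range(n)]
    for p, q in edges:
        m[p][q] = 1.0
        m[q][p] = 1.0  # undirected graph, so the matrix is symmetric
    if add_loops:
        for v in range(n):
            m[v][v] = 1.0  # corresponds to using M_G + I
    return m

def markov_matrix(m):
    """Normalize every column so that it sums to 1."""
    n = len(m)
    t = [[0.0] * n for _ in range(n)]
    for q in range(n):
        col_sum = sum(m[p][q] for p in range(n))
        for p in range(n):
            # Columns that sum to zero are left as zeros (cannot happen with loops).
            t[p][q] = m[p][q] / col_sum if col_sum else 0.0
    return t

# Hypothetical example: the path graph 0 - 1 - 2.
M = associated_matrix(3, [(0, 1), (1, 2)])
T = markov_matrix(M)
for row in T:
    print(row)
```

Column q of T can then be read as the probability distribution of the walker's next position when he currently stands on vertex q.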
Find Higher-Length Path
Starting point: in the associated matrix, the quantity (M^k)_pq has a straightforward interpretation as the number of paths of length k between v_p and v_q
[Figure: the matrices M_G and (M_G + I)^2]
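As a small check of this interpretation, the sketch below squares M_G + I for a hypothetical graph made of two triangles joined by a single edge. The helper names and the graph are assumptions for illustration. The paths counted here may revisit vertices, and the added loops allow a path to stay in place for a step.

```python
def mat_mul(a, b):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(m, k):
    """k-th power of m by repeated multiplication (k >= 0)."""
    n = len(m)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, m)
    return result

# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5}, bridge 2-3, loops added.
n = 6
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
for p, q in edges:
    A[p][q] = A[q][p] = 1

A2 = mat_pow(A, 2)
print(A2[0][1])  # 3: vertices 0 and 1 lie in the same triangle
print(A2[0][3])  # 1: the only route goes over the bridge
print(A2[0][4])  # 0: no path of length 2 reaches the far triangle
```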
Flow is easier within dense regions than across sparse boundaries;
however, in the long run, this effect disappears.
Powers of the matrix can be used to find higher-length paths, but the effect diminishes as the flow goes on.
Idea: how can we change the distribution of transition probabilities such that preferred neighbours are further favoured and less popular neighbours are demoted?
MCL solution: raise all the entries in a given column to a certain power greater than 1 (e.g. squaring), then rescale the column so that it sums to 1 again.
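A minimal sketch of this rescaling on a single column, assuming the column is a list of non-negative probabilities with at least one positive entry; the name inflate_column and the sample values are illustrative only.

```python
def inflate_column(column, r=2.0):
    """Raise each entry to the power r, then rescale so the column sums to 1."""
    powered = [x ** r for x in column]
    total = sum(powered)
    return [x / total for x in powered]

col = [0.5, 0.25, 0.25]
inflated = inflate_column(col)
print(inflated)  # approximately [0.667, 0.167, 0.167]
```

The preferred neighbour grows from 0.5 to about 0.67 while the two less popular neighbours drop from 0.25 to about 0.17 each.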
Expansion operation: taking a power of the matrix; expansion of dense regions
Inflation operation: as described above; elimination of unfavoured regions
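Alternating the two operations gives the overall process. The loop below is a simplified sketch rather than a reference implementation: it assumes squaring as the expansion step, an inflation power r = 2, a stopping tolerance of 1e-9 and a hypothetical two-triangle graph, none of which are prescribed by the slides.

```python
def normalize_columns(m):
    """Rescale every column to sum 1 (assumes no column is all zeros)."""
    n = len(m)
    sums = [sum(m[p][q] for p in range(n)) for q in range(n)]
    return [[m[p][q] / sums[q] for q in range(n)] for p in range(n)]

def expand(t):
    """Expansion: square the matrix."""
    n = len(t)
    return [[sum(t[i][k] * t[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inflate(t, r):
    """Inflation: entrywise power r followed by column rescaling."""
    return normalize_columns([[x ** r for x in row] for row in t])

def mcl(m, r=2.0, max_iter=100, tol=1e-9):
    """Alternate expansion and inflation until the matrix stops changing."""
    t = normalize_columns(m)
    for step in range(1, max_iter + 1):
        nxt = inflate(expand(t), r)
        change = max(abs(a - b) for ra, rb in zip(nxt, t) for a, b in zip(ra, rb))
        t = nxt
        if change < tol:
            return t, step
    return t, max_iter

# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5}, bridge 2-3, loops added.
n = 6
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
M = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
for p, q in edges:
    M[p][q] = M[q][p] = 1.0

limit, steps = mcl(M)
print(steps)
for row in limit:
    print([round(x, 3) for x in row])
```

On this small graph the flow from each triangle collects on its bridge endpoint (vertices 2 and 3), so the rows of those two vertices carry the two clusters.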
http://www.micans.org/mcl/ani/mcl-animation.html
Find attractors: a node a is an attractor if M_aa is nonzero
Find attractor systems: if a is an attractor, then the set of its neighbours is called an attractor system.
If a node has an arc to any node of an attractor system, it belongs to the same cluster as that attractor system.
Attractor set = {1,2,3,4,5,6,7,8,9,10}
The attractor systems are {1,2,3}, {4,5,6,7}, {8,9}, {10}
The overlapping clusters are {1,2,3,11,12,15}, {4,5,6,7,13}, {8,9,12,13,14,15}, {10,12,13}
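The interpretation rules above can be sketched as code. Two assumptions are made here and are not stated in the slides: the limit matrix is column-stochastic, so a nonzero entry in row p and column q is read as an arc from node q to node p, and attractors that are linked to each other are grouped into one attractor system. The small limit matrix is a made-up example with 0-based node numbers.

```python
def attractors(m, eps=1e-9):
    """Nodes whose diagonal entry is nonzero."""
    return [a for a in range(len(m)) if m[a][a] > eps]

def attractor_systems(m, eps=1e-9):
    """Group attractors that are linked to each other into systems."""
    atts = attractors(m, eps)
    seen, systems = set(), []
    for a in atts:
        if a in seen:
            continue
        system, stack = set(), [a]
        while stack:
            u = stack.pop()
            if u in system:
                continue
            system.add(u)
            for v in atts:
                if v not in system and (m[u][v] > eps or m[v][u] > eps):
                    stack.append(v)
        seen |= system
        systems.append(system)
    return systems

def clusters(m, eps=1e-9):
    """Each system plus every node with an arc into it; clusters may overlap."""
    n = len(m)
    result = []
    for system in attractor_systems(m, eps):
        members = set(system)
        for q in range(n):
            if any(m[p][q] > eps for p in system):
                members.add(q)
        result.append(sorted(members))
    return result

# Hypothetical limit matrix: nodes 0 and 2 are attractors, node 1 flows to 0,
# and node 3 splits its flow between both attractors.
limit = [
    [1.0, 1.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.0, 0.0],
]
print(clusters(limit))  # [[0, 1, 3], [2, 3]]
```

Node 3 appears in both clusters, which is how overlapping clusters such as those listed above arise.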
How many steps are required before the algorithm converges to an idempotent matrix?
The number is typically somewhere between 10 and 100
The effect of inflation on cluster granularity
R denotes the inflation constant; A denotes the loop weight.
MCL simulates random walks on a graph to find clusters
Expansion promotes dense regions, while inflation demotes the less favoured regions
There is an intrinsic relationship between the MCL result and the cluster structure