biological graph models
Post on 02-Jul-2015
Biological random graph models
Georgios Drakopoulos
Graph mining
Characteristics
Distributions
Models
Biological random graph models
Georgios Drakopoulos
CEID
April 2, 2014
Agenda
Topics
Graph mining
Characteristics
Distributions
Models
Graph mining
Overview
Applications
Distributed processing.
Social media.
Spatial databases.
Protein networks.
Queries
Neighborhood size.
Shortest paths.
Minimum cuts.
Maximum flow.
Connected components.
Partitioning.
Bipartiteness.
Cycles.
Challenges
Computational
Memory requirements.
Data is distributed across the memory hierarchy, multiple disks, and the network.
Algorithmic
Reorganization.
Randomization.
Heuristics.
Visualization.
Characteristics
Overview
Characteristics
Self similar and scale free.
Shrinking diameter.
Giant component.
Communities.
Densification.
Preferential attachment.
Heavy tails in degree distributions.
Scale free graphs
Definition
Let $g(n)$ be the graph growth function. Then
$$g(\beta_0 n) = g(\beta_0)\, g(n) \iff \lim_{t \to +\infty} \frac{g(\beta_0 n)}{g(n)} = \Theta(1)$$
Notes
The most fundamental property.
Allows self-similar graphs.
Contributes to overall graph robustness.
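As a quick check of the definition, any pure power function satisfies the multiplicative condition. The exponent $c$ below is an illustrative symbol that does not appear on the slide.

```latex
g(n) = n^{c}
\;\Longrightarrow\;
g(\beta_0 n) = (\beta_0 n)^{c} = \beta_0^{c}\, n^{c} = g(\beta_0)\, g(n),
\qquad
\frac{g(\beta_0 n)}{g(n)} = \beta_0^{c} = \Theta(1).
```

The ratio is a constant that depends only on the scaling factor $\beta_0$, which is why rescaling the graph size leaves the shape of the growth law unchanged.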
Shrinking diameter
Definition
The maximum shortest path length decreases over time.
Notes
Prone to outliers.
Effective diameter.
Small world phenomena.
Robustness.
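The notes above contrast the diameter, which a single long path can inflate, with the effective diameter. A minimal sketch follows, assuming the common convention that the effective diameter is the smallest hop count within which 90% of connected vertex pairs fall. The function names and the `quantile` parameter are illustrative, and the graph is assumed to be an undirected adjacency dictionary with at least one edge.

```python
import math
from collections import deque

def bfs_distances(adj, source):
    """Return hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameters(adj, quantile=0.9):
    """Return (diameter, effective diameter) over all connected ordered pairs."""
    lengths = []
    for s in adj:
        lengths.extend(d for t, d in bfs_distances(adj, s).items() if t != s)
    lengths.sort()
    # Smallest distance d such that at least `quantile` of the pairs lie within d hops.
    index = max(0, math.ceil(quantile * len(lengths)) - 1)
    return lengths[-1], lengths[index]

# Path graph 0-1-2-3-4: the two end vertices alone determine the diameter.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(diameters(path))  # (4, 3)
```

Running one breadth-first search per vertex is only practical for small graphs; for massive graphs the pairwise distances would have to be sampled or approximated.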
Densification
Definition
The ratio of edges to vertices is an increasing function of time.
Notes
Average degree 〈d〉 increases.
The exact behaviour of 〈d〉 is of interest.
Degree distribution is of interest too.
Giant connected component
Definition
Graph connectivity threshold point t0:
Many, medium components before t0.
One, large component after t0.
Notes
t0 depends on average degree 〈d〉.
t0 is a phase transition point.
Phase transition in connectivity pattern.
Interesting patterns around t0.
Similar to phase change from complexity theory.
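The dependence of the threshold on the average degree can be observed numerically. The sketch below is an illustration under stated assumptions, not part of the slides: it samples a random graph in which every vertex pair is joined independently, and it tracks components with a union-find structure. For this particular model the giant component is known to emerge around 〈d〉 = 1. The function name and parameters are hypothetical.

```python
import random

def largest_component_fraction(n, avg_degree, seed=0):
    """Fraction of vertices in the largest component of a random graph
    where each pair is joined independently with p = avg_degree / (n - 1)."""
    rng = random.Random(seed)
    p = avg_degree / (n - 1)
    parent = list(range(n))
    size = [1] * n

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                ru, rv = find(u), find(v)
                if ru != rv:
                    if size[ru] < size[rv]:
                        ru, rv = rv, ru
                    parent[rv] = ru          # union by size
                    size[ru] += size[rv]
    return max(size[find(v)] for v in range(n)) / n

for d in (0.5, 1.0, 1.5, 3.0):
    print(d, round(largest_component_fraction(1000, d), 3))
```

Below the threshold the largest component holds only a small fraction of the vertices; above it, one component absorbs most of the graph.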
Distributions
Overview
Significant graph metric
Defines graph topology.
Affects graph growth rate.
Affects self-similarity.
Notes
Degree distribution alone cannot describe growth pattern.
Vertex degree-degree correlation is more informative.
Vertex degree ranking is of interest.
No completely satisfactory model exists.
Exponential
Definition
$$\mathrm{prob}\{X = k\} = \lambda_0 e^{-\lambda_0 k}, \quad \lambda_0 > 0$$
Notes
Tractable bounds.
Closed forms.
Memoryless property.
Quick decay.
The last two properties are undesirable for self-similarity.
Binomial
Definition
$$\mathrm{prob}\{X = k\} = \binom{n}{k} p_0^{k} (1 - p_0)^{n-k}, \quad 0 < p_0 < 1$$
Notes
n is the number of vertices.
Some early models lead to it.
When n grows large, it is approximated by exponential.
Quick decay.
Poisson
Definition
$$\mathrm{prob}\{X = k\} = \frac{e^{-\langle d \rangle} \langle d \rangle^{k}}{k!}$$
Notes
Other early models lead to it.
Also approximated by exponential.
Also quick decay.
Power law
Definition
$$\mathrm{prob}\{X = k\} = P(k) = \alpha_0 k^{-\gamma_0}, \quad \alpha_0 > 0, \ \gamma_0 \geq 1$$
Notes
α0 is a normalization constant.
γ0 is termed the system exponent.
Characterizes graph subclasses.
Connections to Lyapunov exponent.
Zipf law for γ0 = 1.
Information retrieval.
Lotka law for γ0 = 2.
Scientific citation networks.
Power law
Notes
When γ0 ≤ 2, the average degree diverges.
When γ0 ≤ 3, the standard deviation diverges.
Empirically for most scale free graphs 2 < γ0 < 3.
Real world graph properties have been attributed to divergence.
Forces graph to be finite.
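The divergence notes can be illustrated by truncating the power law at a maximum degree, which is what a finite graph does implicitly. The sketch below is a numerical illustration only; `truncated_power_law_mean` and `k_max` are hypothetical names, and the support is assumed to start at k = 1.

```python
def truncated_power_law_mean(gamma, k_max):
    """Mean of P(k) proportional to k^(-gamma), restricted to k = 1..k_max."""
    norm = sum(k ** -gamma for k in range(1, k_max + 1))
    first_moment = sum(k ** (1.0 - gamma) for k in range(1, k_max + 1))
    return first_moment / norm

for gamma in (1.5, 2.5):
    means = [truncated_power_law_mean(gamma, k_max) for k_max in (10**3, 10**4, 10**5)]
    print(gamma, [round(m, 2) for m in means])
```

For an exponent below 2 the mean keeps growing as the cutoff is raised, whereas for an exponent above 2 it settles quickly. A finite graph therefore always has a finite average degree, but that value is dictated by the graph size rather than by the exponent alone.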
Cutoff power law
Definition
$$\mathrm{prob}\{X = k\} = \alpha_0 (k + k_0)^{-\gamma_0} e^{-k / k_1}, \quad \alpha_0 > 0, \ \gamma_0 \geq 1$$
Notes
α0 and γ0 as before.
k0 and k1 need to be determined.
k0 is the degree threshold.
Exponential decay through k1 only for large degrees.
Fits best to actual data.
Upper and lower degree bounds are important.
Multifractal
Definition
$$\mathrm{prob}\{X = k\} = \sum_{i=1}^{p} w_i P_i(k), \quad \sum_{i=1}^{p} w_i = 1$$
Notes
Weighted power law sum.
More general than power law.
Harder to fit.
Recent model.
Promising early results.
Models
Complementary approaches
Power distributions are given
Locate large graphs that possess power distributions.
Derive properties concerning deeper structure.
Black box approach.
Closer to real world problems.
Domain specific.
Power distributions are generated
Generate small graphs that possess power distributions.
Find evolution model ensuring power distributions.
Monitor graph properties and structure.
White box approach.
Generic and flexible.
More demanding.
Overmodeling and undermodeling issues.
Overview
Random models
$G_{n,p}$ or Erdős-Rényi.
Albert-Barabási.
Gene duplication.
Kronecker: deterministic and probabilistic variants.
Aiello family.
Other models
$G_n$.
$G_{n,m}$.
Kumar family.
R-Mat.
Jelly point.
$G_{n,p}$ model
Overview
Static.
Probabilistic.
Each of the $\binom{n}{2}$ possible edges exists independently with probability p.
Notes
Easy interpretation.
Easy generation.
Closed forms for most characteristics.
Independent edges lead to exponential distribution.
No preferential attachment.
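Generation is indeed short to write down. The following sketch, with the hypothetical names `gnp_edges` and `degree_sequence`, draws each vertex pair independently and compares the empirical mean degree with the closed form (n - 1) p.

```python
import random
from itertools import combinations

def gnp_edges(n, p, seed=0):
    """Sample the edge list of an undirected graph on vertices 0..n-1,
    keeping each of the n(n-1)/2 pairs independently with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

def degree_sequence(n, edges):
    """Return the degree of every vertex for an undirected edge list."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree

n, p = 500, 0.02
degrees = degree_sequence(n, gnp_edges(n, p))
print("empirical mean degree:", sum(degrees) / n)
print("expected mean degree: ", (n - 1) * p)
print("maximum degree:       ", max(degrees))
```

Because every vertex is treated identically, the degrees concentrate around the mean and no hubs appear, in line with the absence of preferential attachment noted above.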
Albert-Barabási model
Overview
Dynamic.
Probabilistic.
Number of vertices and edges evolves.
Start with n0 connected vertices.
Add a new vertex of random degree τ < n0.
Select τ vertices with probability proportional to their degree.
Notes
Easy interpretation.
Preferential attachment.
Uniform selection probability leads to exponential distribution.
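The growth steps above can be sketched as follows. Two choices here are assumptions that the slide does not fix: the n0 initial vertices form a complete graph, and τ is drawn uniformly from 1 to n0 - 1. The function name is hypothetical.

```python
import random
from collections import Counter

def preferential_attachment(n_total, n0=4, seed=0):
    """Grow an undirected graph to n_total vertices by preferential attachment."""
    rng = random.Random(seed)
    # Assumption: the n0 initial vertices form a complete graph.
    edges = [(u, v) for u in range(n0) for v in range(u + 1, n0)]
    # Every vertex appears in `endpoints` once per incident edge, so a uniform
    # draw from this list picks a vertex with probability proportional to its degree.
    endpoints = [x for edge in edges for x in edge]
    for new in range(n0, n_total):
        tau = rng.randint(1, n0 - 1)          # assumption: tau uniform, tau < n0
        targets = set()
        while len(targets) < tau:             # draw tau distinct existing vertices
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges

edges = preferential_attachment(2000)
degree = Counter(x for edge in edges for x in edge)
print("mean degree:", 2 * len(edges) / len(degree))
print("max degree: ", max(degree.values()))
```

Replacing `rng.choice(endpoints)` with a uniform draw over existing vertices removes the preference for high-degree vertices, which is the variant the last note refers to.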
Gene duplication model
Overview
Start with a graph of n0 genes.
Select uniformly a gene g0.
Duplicate g0.
With probability p0 remove an edge from g0.
Edges are uniformly selected.
With probability q0 add an edge to g0.
Genes are uniformly selected.
Notes
Dynamic.
Probabilistic.
Preferential attachment.
Leads to power law.
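A minimal simulation sketch of the duplication steps follows. The slide leaves several details open, so the code makes explicit assumptions: the initial genes form a ring, the duplicate inherits all edges of g0, and the removal and addition steps act on the duplicate rather than on the original. All names and default values are hypothetical.

```python
import random

def gene_duplication(n_total, n0=5, p0=0.5, q0=0.1, seed=0):
    """Grow an undirected gene network to n_total genes by duplication."""
    rng = random.Random(seed)
    # Assumption: the n0 initial genes form a ring.
    adj = {g: {(g - 1) % n0, (g + 1) % n0} for g in range(n0)}
    for new in range(n0, n_total):
        g0 = rng.randrange(new)              # select a gene uniformly
        adj[new] = set(adj[g0])              # duplicate: copy inherits g0's edges
        for neighbour in adj[new]:
            adj[neighbour].add(new)
        if adj[new] and rng.random() < p0:   # remove one uniformly selected edge
            neighbour = rng.choice(sorted(adj[new]))
            adj[new].discard(neighbour)
            adj[neighbour].discard(new)
        if rng.random() < q0:                # add an edge to a uniformly selected gene
            other = rng.randrange(new)
            adj[new].add(other)
            adj[other].add(new)
    return adj

network = gene_duplication(1000)
degrees = sorted((len(nbrs) for nbrs in network.values()), reverse=True)
print("largest degrees:", degrees[:5])
```

Preferential attachment arises implicitly: a gene with many neighbours is more likely to be adjacent to the gene being duplicated, so it gains edges more often.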
Kronecker model: deterministic variant
Overview
Graphs are represented by adjacency matrices.
Kronecker products are heavily used.
Generation
Select a generator graph.
Not an easy topic.
Leads to self similar graphs.
Has most known properties.
Generalizes R-Mat model.
Kronecker model: deterministic variant
Adjacency matrix
$$A[i, j] \triangleq \begin{cases} 1, & (v_i, v_j) \in E \\ 0, & (v_i, v_j) \notin E \end{cases}$$
Linear-algebraic aspect
Time evolving.
Adjacency matrix density.
Power distribution in:
Adjacency matrix eigenvalues.
Principal eigenvector components.
Alternative representations:
Incidence matrix.
Circuit matrix.
(Normalized) Laplace matrix.
Konig matrix.
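As an illustration of two of these representations, the sketch below builds the adjacency matrix from an edge list and derives the unnormalized Laplace matrix as degree matrix minus adjacency matrix. It assumes an undirected graph without self-loops, and the function names are hypothetical.

```python
def adjacency_matrix(n, edges):
    """Adjacency matrix of an undirected graph on vertices 0..n-1."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = 1
        a[j][i] = 1
    return a

def laplacian_matrix(a):
    """Unnormalized Laplace matrix L = D - A, where D holds the vertex degrees."""
    n = len(a)
    return [
        [(sum(a[i]) if i == j else 0) - a[i][j] for j in range(n)]
        for i in range(n)
    ]

# A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
a = adjacency_matrix(4, [(0, 1), (1, 2), (2, 0), (2, 3)])
for row in laplacian_matrix(a):
    print(row)
```

Every row of the Laplace matrix sums to zero, which gives a simple sanity check on the construction.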
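Kronecker model: analysis
Kronecker product
For $A \in \mathbb{R}^{m_1 \times n_1}$ and $B \in \mathbb{R}^{m_2 \times n_2}$:
$$A \otimes B \triangleq \begin{bmatrix} a[1,1] B & a[1,2] B & \dots & a[1,n_1] B \\ a[2,1] B & a[2,2] B & \dots & a[2,n_1] B \\ \vdots & \vdots & \ddots & \vdots \\ a[m_1,1] B & a[m_1,2] B & \dots & a[m_1,n_1] B \end{bmatrix} \in \mathbb{R}^{m_1 m_2 \times n_1 n_2}$$
where $a[i, j]$ denotes the entry of $A$ at row $i$ and column $j$.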
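Kronecker model: example
Generator graph
$$A[0] = A[1] = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}$$
Next step
$$A[2] = A[1] \otimes A[0] = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 1 & 1 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 & 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 & 0 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 1 \end{bmatrix}$$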
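The step from the generator to the next matrix can be reproduced with a few lines of code. This is a plain-Python sketch that stores matrices as lists of rows; `kron` and `generator` are illustrative names.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]

generator = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
]

second = kron(generator, generator)
for row in second:
    print(" ".join(str(x) for x in row))
print("nonzero entries:", sum(map(sum, second)))  # 49
```

Each 3 x 3 block of the result is either a copy of the generator or all zeros, depending on the corresponding generator entry. The number of nonzero entries is therefore the square of the generator's count, 7 x 7 = 49, and this recursive block pattern is the source of the self-similarity.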
Kronecker model: probabilistic variant
Overview
Replace adjacency matrix with probability matrix.
Each row sums to one.
Nonzero entries.
Row stochastic matrix.
Doubly stochastic matrix if symmetric.
Follow deterministic procedure.
Result is distribution matrix.
Notes
More general.
More flexible.
Numerical challenges.
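One property worth checking is that the deterministic procedure preserves row stochasticity: the Kronecker product of two row stochastic matrices is again row stochastic. The sketch below verifies this numerically. The 2 x 2 matrix `P` and the helper names are made-up examples, not values from the slides.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]

def kron_power(p, k):
    """k-fold Kronecker product of p with itself, for k >= 1."""
    result = p
    for _ in range(k - 1):
        result = kron(result, p)
    return result

# Hypothetical row stochastic generator with nonzero entries.
P = [
    [0.7, 0.3],
    [0.4, 0.6],
]

P3 = kron_power(P, 3)
print("size:", len(P3), "x", len(P3[0]))
print("row sums:", [round(sum(row), 12) for row in P3])
```

The entries of the k-th power are products of k probabilities, so they shrink geometrically with k. This is one plausible reading of the numerical challenges mentioned in the notes.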
Aiello model family
Overview
Reviews existing random graph models.
Provides a partial framework for building random graphs.
Introduces four new random graph models termed A, B, C, and D:
Models A and B are simple models for directed graphs.
Model C encompasses and extends Models A and B.
Model D generalizes Model C for undirected graphs.
Model A: overview
Graph evolution
Start with n0 connected vertices.
With probability (1 − p0):
Add a new vertex v0.
Set the in- and out-degree of v0 to one.
Select one vertex vout and form an edge (vout, v0).
Increase the out-degree of vout by one.
Select one vertex vin and form an edge (v0, vin).
Increase the in-degree of vin by one.
Both vin and vout are chosen with probability proportional to their in- or out-degree.
With probability p0:
Select two vertices vin and vout.
Add the edge (vin, vout).
Both vin and vout are chosen with probability proportional to their in- or out-degree.
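The evolution rule can be sketched in a few lines. The code below is an interpretation under stated assumptions rather than the authors' procedure: the n0 initial vertices form a directed cycle, vout is drawn with probability proportional to out-degree and vin proportional to in-degree, and in the edge-only step the new edge is oriented from vout to vin. The name `model_a` and the default values are hypothetical.

```python
import random
from collections import Counter

def model_a(steps, n0=3, p0=0.3, seed=0):
    """Run `steps` evolution steps; return (number of vertices, directed edges)."""
    rng = random.Random(seed)
    # Assumption: the initial vertices form a directed cycle, so every vertex
    # starts with in-degree one and out-degree one.
    edges = [(v, (v + 1) % n0) for v in range(n0)]
    n = n0
    for _ in range(steps):
        # The source of a uniformly chosen edge is distributed proportionally to
        # out-degree, and the target of one proportionally to in-degree.
        v_out = rng.choice(edges)[0]
        v_in = rng.choice(edges)[1]
        if rng.random() < p0:
            edges.append((v_out, v_in))   # edge-only step (orientation assumed)
        else:
            v0 = n                        # vertex step: v0 gets one in- and one out-edge
            n += 1
            edges.append((v_out, v0))
            edges.append((v0, v_in))
    return n, edges

n, edges = model_a(5000)
out_degree = Counter(u for u, _ in edges)
in_degree = Counter(v for _, v in edges)
print("vertices:", n, "edges:", len(edges))
print("max out-degree:", max(out_degree.values()))
print("max in-degree: ", max(in_degree.values()))
```

Both endpoints are drawn before any edge is added, so a new vertex never links to itself in the step that creates it.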
Model A: analysis
Key points
Power distributions for both in- and out-degrees.
Simplistic model.
The same exponent for in- and out-degree distributions.
Model B: overview
Graph evolution
Almost the same dynamics as model A.
Use in- and out-weights to select nodes instead of in- and out-degrees.
Model B: analysis
Key points
Power distributions for both in- and out-degrees.
More flexible model.
Different exponents for in- and out-degree distributions.
Model C: overview
Graph evolution
Almost the same dynamics as model B.
When adding a new vertex:
$m_{e,e}$ edges randomly.
$m_{n,e}$ to the new vertex randomly.
$m_{e,n}$ from the new vertex randomly.
$m_{n,n}$ loops to the new vertex.
Model C: analysis
Key points
Power distributions for both in- and out-degrees.
Far more flexible model.
Different exponents for in- and out-degree distributions.
Model D: overview
Graph evolution
Undirected version of model B.
Still use two weights internally.
Bibliography
Aiello W., Chung F., and Lu L., Random evolution in massive graphs, in Abello J., Pardalos P., and Resende M., editors, Handbook of massive data sets, Kluwer, 2002.
Vitter J., Algorithms and data structures for external memory: Dealing with massive data, ACM, 2001.
Leskovec J., Chakrabarti D., Kleinberg J., and Faloutsos C., Realistic, mathematically tractable graph generation and evolution using Kronecker multiplication, PKDD05, Springer, 2005.