biological graph models

Biological random graph models, Georgios Drakopoulos, CEID, April 2, 2014

Post on 02-Jul-2015

DESCRIPTION

Presentation to the bioinformatics and computational biology group of CEID regarding biological graph models. April 2014.

TRANSCRIPT

Biological random graph models

Georgios Drakopoulos

CEID

April 2, 2014

Agenda

Topics

Graph mining

Characteristics

Distributions

Models

Graph mining

Overview

Applications

Distributed processing.

Social media.

Spatial databases.

Protein networks.

Queries

Neighborhood size.

Shortest paths.

Minimum cuts.

Maximum flow.

Connected components.

Partitioning.

Bipartiteness.

Cycles.

Challenges

Computational

Memory requirements.

Data is distributed across:

Memory hierarchy.

Multiple disks.

Network.

Algorithmic

Reorganization.

Randomization.

Heuristics.

Visualization.

Characteristics

Overview

Characteristics

Self similar and scale free.

Shrinking diameter.

Giant component.

Communities.

Densification.

Preferential attachment.

Heavy tails in degree distributions.

Scale free graphs

Definition

Let $g(n)$ be the graph growth function. Then

$$g(\beta_0 n) = g(\beta_0)\, g(n)$$

$$\lim_{t \to +\infty} \frac{g(\beta_0 n)}{g(n)} = \Theta(1)$$

Notes

The most fundamental property.

Allows self-similar graphs.

Contributes to overall graph robustness.
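
As a quick check of the definition, a power-type growth function satisfies the multiplicative property. The exponent $c$ below is a symbol introduced only for this illustration and does not appear on the slides.

```latex
g(n) = n^{c}
\;\Longrightarrow\;
g(\beta_0 n) = (\beta_0 n)^{c} = \beta_0^{c}\, n^{c} = g(\beta_0)\, g(n),
\qquad
\frac{g(\beta_0 n)}{g(n)} = \beta_0^{c} = \Theta(1).
```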

Shrinking diameter

Definition

The maximum shortest path length decreases as the graph grows.

Notes

Prone to outliers.

Effective diameter.

Small world phenomena.

Robustness.
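
The difference between the plain diameter and an effective diameter can be sketched in a few lines. This is a minimal illustration, assuming an undirected graph stored as an adjacency dictionary and assuming the effective diameter is taken as the 90th percentile of pairwise distances; the 0.9 quantile and the function names are choices made here, not taken from the slides.

```python
from collections import deque
import math

def bfs_distances(adj, source):
    """Return hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameters(adj, quantile=0.9):
    """Return (diameter, effective diameter) over connected ordered pairs."""
    lengths = []
    for s in adj:
        lengths.extend(d for t, d in bfs_distances(adj, s).items() if t != s)
    lengths.sort()
    if not lengths:
        return 0, 0
    # Smallest distance d such that at least `quantile` of the pairs lie within d.
    index = max(0, math.ceil(quantile * len(lengths)) - 1)
    return lengths[-1], lengths[index]

# Toy input: a simple path 0-1-2-3-4-5.
path = {i: [] for i in range(6)}
for i in range(5):
    path[i].append(i + 1)
    path[i + 1].append(i)

print(diameters(path))
```

Because the effective diameter ignores the longest few percent of paths, it is less sensitive to the outliers mentioned in the notes.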

Densification

Definition

The ratio of edges to vertices is an increasing function of time.

Notes

Average degree 〈d〉 increases.

The exact behaviour of 〈d〉 is of interest.

Degree distribution is of interest too.
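
For an undirected graph the link between the definition and the first note is direct. The symbols $V_t$ and $E_t$, denoting the vertex and edge sets at time $t$, are notation assumed here for illustration.

```latex
\langle d \rangle_t
= \frac{1}{|V_t|} \sum_{v \in V_t} \deg(v)
= \frac{2\,|E_t|}{|V_t|}
```

Each edge contributes to the degree of two vertices, so the average degree grows exactly when the edge-to-vertex ratio grows.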

Giant connected component

Definition

Graph connectivity threshold point t0:

Many medium-sized components before t0.

One large component after t0.

Notes

t0 depends on the average degree 〈d〉.

t0 is a phase transition point.

Phase transition in connectivity pattern.

Interesting patterns around t0.

Similar to phase change from complexity theory.
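
The transition can be observed numerically. The sketch below assumes the Gn,p model, where the giant component is known to emerge near 〈d〉 = 1, and uses a union-find structure to measure the largest component. The function name, the graph size, and the probed degrees are illustrative choices.

```python
import random

def largest_component_fraction(n, avg_degree, seed=0):
    """Sample G(n, p) with p = avg_degree / (n - 1); return |largest component| / n."""
    rng = random.Random(seed)
    p = avg_degree / (n - 1)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Each vertex pair is joined independently with probability p.
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv

    sizes = {}
    for x in range(n):
        root = find(x)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n

for d in (0.3, 0.8, 1.0, 1.5, 3.0):
    print(d, round(largest_component_fraction(500, d), 3))
```

Below the threshold the largest component covers only a small share of the vertices, while well above it a single component should hold most of them.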

Distributions

Overview

Significant graph metric

Defines graph topology.

Affects graph growth rate.

Affects self-similarity.

Notes

Degree distribution alone cannot describe growth pattern.

Vertex degree-degree correlation is more informative.

Vertex degree ranking is of interest.

No completely satisfactory model exists.

Exponential

Definition

$$\mathrm{prob}\{X = k\} = \lambda_0 e^{-\lambda_0 k}, \qquad \lambda_0 > 0$$

Notes

Tractable bounds.

Closed forms.

Memoryless property.

Quick decay.

The last two properties are undesirable for self-similarity.

Binomial

Definition

$$\mathrm{prob}\{X = k\} = \binom{n}{k} p_0^{k} (1 - p_0)^{n-k}, \qquad 0 < p_0 < 1$$

Notes

n is the number of vertices.

Some early models lead to it.

When n grows large, it is approximated by exponential.

Quick decay.

Poisson

Definition

$$\mathrm{prob}\{X = k\} = e^{-\langle d \rangle} \frac{\langle d \rangle^{k}}{k!}$$

Notes

Other early models lead to it.

Also approximated by exponential.

Also quick decay.
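
The binomial and Poisson forms are closely related: with the mean n·p0 held fixed at 〈d〉, the binomial probabilities approach the Poisson ones as n grows. The sketch below checks this numerically; the mean degree of 4 and the range of k are arbitrary illustrative values.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, mean):
    """Poisson probability of the value k for the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def max_gap(n, mean, k_max=10):
    """Largest absolute difference between the two pmfs for k = 0..k_max."""
    p = mean / n
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, mean))
               for k in range(k_max + 1))

mean_degree = 4.0
for n in (10, 100, 10_000):
    print(n, max_gap(n, mean_degree))
```

The printed gap should shrink as n increases, which is why both distributions share the same quick tail decay.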

Power law

Definition

$$\mathrm{prob}\{X = k\} = P(k) = \alpha_0 k^{-\gamma_0}, \qquad \alpha_0 > 0,\ \gamma_0 \ge 1$$

Notes

α0 is a normalization constant.

γ0 is termed the system exponent.

Characterizes graph subclasses.

Connections to Lyapunov exponent.

Zipf law for γ0 = 1.

Information retrieval.

Lotka law for γ0 = 2.

Scientific citation networks.

Power law

Notes

When γ0 < 2, then the average degree diverges.

When γ0 < 3, then the standard deviation diverges.

Empirically for most scale free graphs 2 < γ0 < 3.

Real world graph properties have been attributed to divergence.

Forces graph to be finite.
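
The divergence statements follow from comparing the moment sums with a p-series. A sketch for the unbounded power law $P(k) = \alpha_0 k^{-\gamma_0}$, writing $\langle d^2 \rangle$ for the second moment (notation assumed here):

```latex
\langle d \rangle
= \sum_{k \ge 1} k \, \alpha_0 k^{-\gamma_0}
= \alpha_0 \sum_{k \ge 1} k^{1-\gamma_0} < \infty
\iff \gamma_0 > 2,
\qquad
\langle d^2 \rangle
= \alpha_0 \sum_{k \ge 1} k^{2-\gamma_0} < \infty
\iff \gamma_0 > 3 .
```

With a finite maximum degree both sums are finite, which is one way to read the note that divergence forces the graph to be finite.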

Cutoff power law

Definition

$$\mathrm{prob}\{X = k\} = \alpha_0 (k + k_0)^{-\gamma_0} e^{-k / k_1}, \qquad \alpha_0 > 0,\ \gamma_0 \ge 1$$

Notes

α0 and γ0 as before.

k0 and k1 need to be determined.

k0 is the degree threshold.

Exponential decay through k1 only for large degrees.

Fits best to actual data.

Upper and lower degree bounds are important.

Multifractal

Definition

$$\mathrm{prob}\{X = k\} = \sum_{i=1}^{p} w_i P_i(k), \qquad \sum_{i=1}^{p} w_i = 1$$

Notes

Weighted power law sum.

More general than power law.

Harder to fit.

Recent model.

Promising early results.

Models

Complementary approaches

Power distributions are given

Locate large graphs that possess power distributions.

Derive properties concerning deeper structure.

Black box approach.

Closer to real world problems.

Domain specific.

Power distributions are generated

Generate small graphs that possess power distributions.

Find evolution model ensuring power distributions.

Monitor graph properties and structure.

White box approach.

Generic and flexible.

More demanding.

Overmodeling and undermodeling issues.

Overview

Random models

Gn,p or Erdős-Rényi.

Albert-Barabási.

Gene duplication.

Kronecker.

Deterministic.

Probabilistic.

Aiello family.

Other models

Gn.

Gn,m.

Kumar family.

R-Mat.

Jelly point.

Gn,p model

Overview

Static.

Probabilistic.

Each of the $\binom{n}{2}$ possible edges exists independently with probability p.

Notes

Easy interpretation.

Easy generation.

Closed forms for most characteristics.

Independent edges lead to exponential distribution.

No preferential attachment.
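
Generation is indeed short to write down. A minimal sketch, assuming vertices are labelled 0..n-1 and edges are returned as a list of pairs; the function names and the sample parameters are illustrative.

```python
import itertools
import random

def gnp(n, p, seed=None):
    """Return the edge list of one G(n, p) sample on vertices 0..n-1."""
    rng = random.Random(seed)
    # Every unordered vertex pair is kept independently with probability p.
    return [(u, v) for u, v in itertools.combinations(range(n), 2)
            if rng.random() < p]

def degree_histogram(n, edges):
    """Map each degree value to the number of vertices having it."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    hist = {}
    for d in degree:
        hist[d] = hist.get(d, 0) + 1
    return dict(sorted(hist.items()))

edges = gnp(200, 0.05, seed=1)
print(len(edges), degree_histogram(200, edges))
```

The histogram concentrates around the mean degree p(n-1) and has no heavy tail, in line with the absence of preferential attachment.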

Albert-Barabási model

Overview

Dynamic.

Probabilistic.

Number of vertices and edges evolves.

Start with n0 connected vertices.

Add a new vertex of random degree τ < n0.

Select τ vertices with probability proportional to their degree.

Notes

Easy interpretation.

Preferential attachment.

Uniform selection probability leads to exponential distribution.
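
The growth rule can be sketched as follows. Several details are assumptions made for this illustration: the n0 seed vertices form a path, τ is drawn uniformly from 1..n0-1, and repeated targets are rejected, which only approximates strict degree-proportional selection without replacement.

```python
import random

def preferential_attachment(n0, steps, seed=None):
    """Grow a graph by `steps` vertices; return its edge list. Requires n0 >= 2."""
    rng = random.Random(seed)
    # Assumption: the n0 seed vertices form a path, so they are connected.
    edges = [(i, i + 1) for i in range(n0 - 1)]
    # Each vertex appears in `stubs` once per incident edge, so a uniform
    # draw from `stubs` picks a vertex with probability proportional to degree.
    stubs = [v for e in edges for v in e]
    for new in range(n0, n0 + steps):
        tau = rng.randint(1, n0 - 1)  # random degree with 1 <= tau < n0
        targets = set()
        while len(targets) < tau:
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

edges = preferential_attachment(5, 50, seed=3)
print(len(edges))
```

Replacing `rng.choice(stubs)` with a uniform draw over vertex labels gives the uniform-selection variant mentioned in the notes.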

Gene duplication model

Overview

Start with a graph of n0 genes.

Select uniformly a gene g0.

Duplicate g0.

With probability p0 remove an edge from g0.

Edges are uniformly selected.

With probability q0 add an edge to g0.

Genes are uniformly selected.

Notes

Dynamic.

Probabilistic.

Preferential attachment.

Leads to power law.
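
The steps listed in the overview can be sketched as below. The sketch follows the slide literally, applying the edge removal and addition to g0 itself. The ring-shaped seed graph, the undirected representation, and all names are assumptions made for illustration.

```python
import random

def gene_duplication(n0, steps, p0, q0, seed=None):
    """Return an adjacency dict after `steps` duplications. Requires n0 >= 3."""
    rng = random.Random(seed)
    # Assumption: the initial n0 genes form a ring (any seed graph would do).
    adj = {i: {(i - 1) % n0, (i + 1) % n0} for i in range(n0)}
    for new in range(n0, n0 + steps):
        g0 = rng.randrange(new)            # uniform over existing genes
        adj[new] = set(adj[g0])            # the copy inherits g0's edges
        for nb in adj[new]:
            adj[nb].add(new)
        if adj[g0] and rng.random() < p0:  # drop one uniformly chosen edge of g0
            nb = rng.choice(sorted(adj[g0]))
            adj[g0].discard(nb)
            adj[nb].discard(g0)
        if rng.random() < q0:              # add an edge from g0 to a uniform gene
            other = rng.randrange(new + 1)
            if other != g0:
                adj[g0].add(other)
                adj[other].add(g0)
    return adj

adj = gene_duplication(4, 30, 0.5, 0.2, seed=1)
print(sorted(len(nbrs) for nbrs in adj.values()))
```

A gene with many neighbours is more likely to be adjacent to the duplicated gene and so to gain an edge, which is how the model produces preferential attachment without selecting by degree explicitly.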

Kronecker model

Deterministic variant

Overview

Graphs are represented by adjacency matrices.

Kronecker products are heavily used.

Generation

Select a generator graph.

Not an easy topic.

Leads to self similar graphs.

Has most known properties.

Generalizes R-Mat model.

Kronecker model

Deterministic variant

Adjacency matrix

$$A[i, j] \triangleq \begin{cases} 1, & (v_i, v_j) \in E \\ 0, & (v_i, v_j) \notin E \end{cases}$$

Linear-algebraic aspect

Time evolving.

Adjacency matrix density.

Power distribution in:

Adjacency matrix eigenvalues.

Principal eigenvector components.

Alternative representations:

Incidence matrix.

Circuit matrix.

(Normalized) Laplace matrix.

Konig matrix.

Kronecker model

Analysis

Kronecker product

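For $A \in \mathbb{R}^{m_1 \times n_1}$ and $B \in \mathbb{R}^{m_2 \times n_2}$:

$$A \otimes B \triangleq \begin{bmatrix}
a[1,1]B & a[1,2]B & \cdots & a[1,n_1]B \\
a[2,1]B & a[2,2]B & \cdots & a[2,n_1]B \\
\vdots & \vdots & \ddots & \vdots \\
a[m_1,1]B & a[m_1,2]B & \cdots & a[m_1,n_1]B
\end{bmatrix} \in \mathbb{R}^{m_1 m_2 \times n_1 n_2}$$

where $a[i, j]$ denotes the entry of A at row i and column j.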

Kronecker model

Example

Generator graph

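$$A[0] = A[1] = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}$$

Next step

$$A[2] = A[1] \otimes A[0] = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 1 & 1 & 0 & 1 & 1 & 0 \\
1 & 0 & 1 & 1 & 0 & 1 & 1 & 0 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 0 & 1 & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 1
\end{bmatrix}$$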
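
The product can be checked mechanically. The sketch below is a plain-Python Kronecker product on lists of rows, applied to the 3×3 generator; the function and variable names are illustrative.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]

generator = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
]

# One Kronecker step: every 1 in the generator becomes a copy of the
# generator and every 0 becomes a 3x3 block of zeros.
step2 = kron(generator, generator)
for row in step2:
    print(" ".join(map(str, row)))
```

Repeating `kron(step2, generator)` gives the next graph in the sequence, which is where the self-similar structure comes from.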

Kronecker model

Probabilistic variant

Overview

Replace adjacency matrix with probability matrix.

Each row sums to one.

Nonzero entries.

Row stochastic matrix.

Doubly stochastic matrix if symmetric.

Follow deterministic procedure.

Result is distribution matrix.

Notes

More general.

More flexible.

Numerical challenges.
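
One property worth checking is that the deterministic procedure preserves row stochasticity: the Kronecker product of two row-stochastic matrices is again row stochastic, since each row sum of the product is a product of row sums. The 2×2 initiator values below are hypothetical and chosen only for illustration.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]

# Hypothetical row-stochastic initiator: nonzero entries, each row sums to one.
initiator = [
    [0.7, 0.3],
    [0.4, 0.6],
]

p2 = kron(initiator, initiator)
p3 = kron(p2, initiator)
row_sums = [sum(row) for row in p3]
print(len(p3), row_sums)
```

The entries shrink geometrically with each step, which hints at the numerical challenges listed in the notes when many steps are taken.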

Aiello model family

Overview

Reviews existing random graph models.

Provides a partial framework for building random graphs.

Introduces four new random graph models termed A, B, C, and D:

Models A and B are simple models for directed graphs.

Model C encompasses and extends Models A and B.

Model D generalizes Model C for undirected graphs.

Model A

Overview

Graph evolution

Start with n0 connected vertices.

With probability (1− p0):

Add a new vertex v0.

Set the in- and out-degree of v0 to one.

Select one vertex vout and form an edge (vout, v0).

Increase the out-degree of vout by one.

Select one vertex vin and form an edge (v0, vin).

Increase the in-degree of vin by one.

Both vin and vout are chosen with probability proportional to their in- or out-degree.

With probability p0:

Select two vertices vin and vout.

Add the edge (vin, vout).

Both vin and vout are chosen with probability proportional to their in- or out-degree.
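
The evolution rule can be sketched as below. Several details are assumptions made for this illustration: the n0 seed vertices form a directed cycle, vout is drawn proportionally to out-degree and vin proportionally to in-degree, and in the second branch the extra edge is oriented from vout to vin so that vout is the source in both branches.

```python
import random

def model_a(n0, steps, p0, seed=None):
    """Return (vertex count, directed edge list) after `steps` steps. Requires n0 >= 2."""
    rng = random.Random(seed)
    # Assumption: the seed graph is a directed cycle, so every vertex
    # starts with in-degree and out-degree equal to one.
    edges = [(i, (i + 1) % n0) for i in range(n0)]
    n = n0
    for _ in range(steps):
        # The source of a uniformly drawn edge is a vertex chosen with
        # probability proportional to out-degree; the target of a uniformly
        # drawn edge is chosen with probability proportional to in-degree.
        v_out = rng.choice(edges)[0]
        v_in = rng.choice(edges)[1]
        if rng.random() < p0:
            edges.append((v_out, v_in))   # extra edge between existing vertices
        else:
            v0 = n                        # new vertex with in- and out-degree one
            n += 1
            edges.append((v_out, v0))
            edges.append((v0, v_in))
    return n, edges

n, edges = model_a(4, 200, 0.3, seed=7)
print(n, len(edges))
```

Drawing endpoints from the edge list avoids maintaining explicit degree counters, since each vertex appears as a source once per outgoing edge and as a target once per incoming edge.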

Model A

Analysis

Key points

Power distributions for both in- and out-degrees.

Simplistic model.

The same exponent for in- and out-degree distributions.

Model B

Overview

Graph evolution

Almost the same dynamics as model A.

Use in- and out-weights to select nodes instead of in- and out-degrees.

Model B

Analysis

Key points

Power distributions for both in- and out-degrees.

More flexible model.

Different exponents for in- and out-degree distributions.

Model C

Overview

Graph evolution

Almost the same dynamics as model B.

When adding a new vertex:

me,e edges randomly.

mn,e to the new vertex randomly.

me,n from the new vertex randomly.

mn,n loops to the new vertex.

Model C

Analysis

Key points

Power distributions for both in- and out-degrees.

Far more flexible model.

Different exponents for in- and out-degree distributions.

Model D

Overview

Graph evolution

Undirected version of model B.

Still use two weights internally.

Bibliography

Aiello W., Chung F., and Lu L., Random evolution in massive graphs, in Abello J., Pardalos P., and Resende M., editors, Handbook of massive data sets, Kluwer, 2002.

Vitter J., Algorithms and data structures for external memory: Dealing with massive data, ACM, 2001.

Leskovec J., Chakrabarti D., Kleinberg J., and Faloutsos C., Realistic, mathematically tractable graph generation and evolution using Kronecker multiplication, PKDD05, Springer, 2005.

Questions?

Thank you for your time