Graph Spectra through Network Complexity Measures: Information Content of Eigenvalues


Graph Spectra through Network Complexity Measures

Information Content of Eigenvalues

Hector Zenil (joint work with Narsis Kiani and Jesper Tegnér)

Unit of Computational Medicine, Karolinska Institutet

@ Department of Mathematics, Stockholm University

May 27, 2015

Outline:

1. Estimating Kolmogorov complexity
2. n-dimensional complexity
3. Graph Algorithmic Probability and Kolmogorov complexity of networks
4. Applications to complex networks and graph spectra

Material mostly drawn from:

1. joint with Soler et al., Computability (2013). [1]
2. joint with Gauvrit et al., Behavior Research Methods (2013). [3]
3. Zenil et al., Physica A (2014). [4]
4. joint with Soler et al., PLoS ONE (2014). [6]
5. Zenil, Kiani and Tegner, LNCS 9044 (2015). [2]
6. Zenil and Tegner, Symmetry (forthcoming).
7. Zenil, Kiani and Tegner, Seminars in Cell and Developmental Biology (in revision).


Main goal

Main goal throughout this talk: to study properties of graphs and networks with measures from information theory and algorithmic complexity.

Table : Numerical calculations of (mostly) uncomputable functions:

Busy Beaver problem: lower semi-computable
Kolmogorov-Chaitin complexity: upper semi-computable
Algorithmic Probability (Solomonoff-Levin): lower semi-computable
Bennett's Logical Depth: uncomputable

Upper semi-computable: can be approximated from above.
Lower semi-computable: can be approximated from below.


The basic unit in Theoretical Computer Science

The cell (the smallest unit of life) is to Biology what the Turing machine is to Theoretical Computer Science.

Finite state diagram

[A.M. Turing (1936)]

One machine for everything

Computation (Turing-)universality

(a) Turing proved that a machine M with input x can be encoded as an input M(x) for a machine U such that if M(x) = y then U(M(x)) = y, for any Turing machine M.

You do not need a computer for each different task, only one! There is no distinction between software/hardware or data/program.

Together with Church’s thesis:

The Church-Turing thesis

(b) Every effectively computable function is computable by a Turing machine.

Together, the two suggest that:

Anything can be programmed/simulated/emulated by a universal Turing machine.


The undecidability of the Halting Problem

The existence of the universal Turing machine U brings a fundamental Gödel-type contradiction about the power of U (any universal machine):

Let's say we want to know whether a machine M will halt for input x.

Assumption:

We can program U in such a way that if M(x) halts then U(M(x)) = 0, otherwise U(M(x)) = 1. So U is a (halting) decider.

Contradiction:

Let M(x) = U(x); then U(U(x)) = 0 if and only if U(x) = 1, and U(U(x)) = 1 if and only if U(x) = 0.

Therefore the assumption that we can know whether a Turing machine halts in general is not true.

There is also a non-constructive proof using Cantor’s diagonalisation method.
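
The contradiction can be rendered as a small, hedged Python sketch: given any candidate halting decider, one constructs a program on which the decider must be wrong. The decider naive_halts below is a made-up placeholder, used only so the construction runs.

def make_counterexample(halts):
    # Build a zero-argument program g that defeats the candidate decider `halts`.
    def g():
        if halts(g):          # if the decider claims g halts...
            while True:       # ...then g loops forever
                pass
        return 0              # ...otherwise g halts immediately
    return g

naive_halts = lambda f: True  # hypothetical (and necessarily wrong) decider
g = make_counterexample(naive_halts)
print(naive_halts(g))         # the decider says True, yet by construction g would never halt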


Computational irreducibility

(1) Most fundamental irreducibility:

If M halts for input x, you have to run either M(x) or U(M(x)) to know it, but if M does not halt, neither running M(x) nor U(M(x)) will tell you that they do not halt.

Most uncomputability results are of this type: you can know in one direction but not the other (e.g. whether a string is random, as we will see).

(2) Secondary irreducibility (corollary):

Running U(M(x)) instead of M(x) can change the running time only by a constant factor; there is no computational speed-up in general (connected to time complexity, e.g. P ≠ NP results), especially for (1). In other words, O(U(M(x))) ∼ O(M(x)), or O(U(M(x))) = c × O(M(x)), with c a constant.

(2) is believed to be more pervasive than what (1) implies.


Complexity and information content of strings

Example (3 strings of length 40)

a: 1111111111111111111111111111111111111111
b: 11001010110010010100111000101010100101011
c: 0101010101010101010101010101010101010101

According to Shannon (1948):

(a) has minimum Entropy (only one micro-state).
(b) has maximum Entropy (two micro-states with the same frequency each).
(c) has also maximum Entropy! (two micro-states with the same frequency each).

Shannon Entropy inherits from classical probability

Shannon Entropy suffers from similar limitations: strings (b) and (c) have the same Shannon Entropy (same number of 0s and 1s), but they appear to be of very different natures to us.
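
A minimal Python sketch of the computation (block size 1, i.e. symbol frequencies only; the strings are those above):

from collections import Counter
from math import log2

def entropy(s):
    # Shannon entropy over single-symbol frequencies (micro-states of size 1)
    return -sum(c / len(s) * log2(c / len(s)) for c in Counter(s).values())

a = "1" * 40
b = "11001010110010010100111000101010100101011"
c = "01" * 20
for name, s in [("a", a), ("b", b), ("c", c)]:
    print(name, round(entropy(s), 4))   # (a) ~ 0, (b) and (c) ~ 1 bit per symbol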


Statistical v algorithmic

Entropy rate can only capture statistical regularities, not correlation:

Thue-Morse sequence: 01101001100101101001011001101001
Segment of π in binary: 0010010000111111011010101000100
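
For instance, the Thue-Morse sequence is generated by a very short program (bit n is the parity of the number of 1s in the binary expansion of n), so its Kolmogorov complexity is low whatever its symbol statistics suggest; a short Python sketch:

thue_morse = ''.join(str(bin(n).count('1') % 2) for n in range(32))
print(thue_morse)   # 01101001100101101001011001101001, the segment shown above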

Definition

Kolmogorov(-Chaitin) complexity (1965,1966):

K_U(s) = min{|p| : U(p) = s}

Algorithmic Randomness (also Martin-Löf and Schnorr)

A string s is random if K(s) (in bits) ∼ |s|.

Correlation versus causation

Shannon Entropy is to correlation what Kolmogorov is to causation!


Example of an evaluation of K

The string 01010101...01 can be produced by the following program:

Program A:
1: n := 0
2: Print n
3: n := n+1 mod 2
4: Goto 2

The length of A (in bits) is an upper bound of K(010101...01) (+ the halting condition).
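
A direct Python transcription of Program A, with an explicit output length playing the role of the halting condition:

def program_a(length=40):
    n, out = 0, []
    for _ in range(length):     # bounded loop = the added halting condition
        out.append(str(n))      # 2: Print n
        n = (n + 1) % 2         # 3: n := n+1 mod 2
    return ''.join(out)         # 4: Goto 2 (until the bound is reached)

print(program_a())              # 0101...01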

Semi-computability of K

Exhibiting a short program for a string is a sufficient test for non-randomness, but failing to find a short description (program) is not a sufficient test for randomness.


The founding theorem of K complexity: Invariance to choice of U

Does it matter whether we measure K with programming language (or universal TM) U1 or U2?

|K_U1(s) − K_U2(s)| < c_{U1,U2}

It is not relevant in the limit: the difference is bounded by a constant, which becomes negligible the longer the strings.

Rate of convergence of K and the behaviour of c with respect to |s|

The Invariance theorem in practice is a negative result

The constant involved can be arbitrarily large; the theorem tells us nothing about the rate of convergence. Any method for estimating K is subject to it.


Compression is Entropy rate not K

Actual implementations of lossless compression have 2 main drawbacks and pitfalls:

Lossless compression as entropy rate estimators

Actual implementations of lossless compression algorithms (e.g. Lempel-Ziv, BZip2, PNG) look for statistical regularities, i.e. repetitions within a sliding fixed-length window of size w; hence they are entropy rate estimators up to block (micro-state) length w. Their success is based on only one side of the non-randomness test, i.e. low entropy implies low K.

Compressing short strings

The compressor also adds the decompression instructions to the file. Strings shorter than, say, 100 bits are impossible to compress further, and no meaningful ranking can be obtained from compressing them (a 100 bp string in structural molecular biology is long).
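
A quick Python sketch of this second pitfall, using zlib as a stand-in for any real lossless compressor: the fixed decompression overhead dominates for short inputs, so compressed lengths of short strings carry little ranking information.

import zlib

for s in [b"0" * 5, b"0" * 40, b"01" * 20, b"1100101011001001010011100010101010010101"]:
    print(len(s), "bytes ->", len(zlib.compress(s, 9)), "bytes compressed")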


Alternative to lossless compression algorithms

Figure : (originally Émile Borel's infinite monkey theorem): A monkey on a computer produces more structure by chance than a monkey on a typewriter.

[Inspired by a sketch from C. Bennett]

Algorithmic Probability (semi-measure, Levin's Universal Distribution)

Definition

The classical probability of producing a bit string s among all 2^n bit strings of size n (classical monkey theorem):

Pr(s) = 1/2^n (1)

Definition

Let U be a prefix-free universal Turing machine (so that Kraft's inequality applies) and p a program that produces s running on U; then

m(s) = ∑_{p : U(p)=s} 1/2^|p| < 1 (2)


The algorithmic Coding theorem

Connection to K !

The greatest contributor in the definition of m(s) is the shortest program p, i.e. the one of length K(s).

The algorithmic Coding theorem describes the reverse connection between K(s) and m(s):

Theorem

K(s) = −log2(m(s)) + O(1) (3)

Frequency and complexity are related

If a string s is produced by many programs, then there is also a short program that produces s (Cover & Thomas, 1991).

[Solomonoff (1964); Levin (1974); Chaitin (1976)]

The Coding Theorem Method (CTM) flow chart

Enumerate and run every TM ∈ (n, m) for increasing n and m, using Busy Beaver values to determine halting times where known, otherwise an informed runtime cutoff value (see e.g. Calude & Stay, Most programs stop quickly or never halt, 2006).

[Soler-Toscano, Zenil et al., PLoS ONE (2014)]
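
A toy illustration of the CTM pipeline in Python, assuming a simple 2-state, 2-symbol Turing machine convention and a fixed runtime cutoff instead of exact Busy Beaver values (the real computation uses the full (n, m) enumeration and is far larger):

from collections import Counter, defaultdict
from itertools import product
from math import log2

STATES, SYMBOLS, MAX_STEPS = 2, 2, 100

# a transition is either ('halt', write) or (write, move, next_state)
choices = [('halt', w) for w in range(SYMBOLS)] + \
          [(w, d, q) for w in range(SYMBOLS) for d in (-1, 1) for q in range(STATES)]

def run(table):
    # run one machine on a blank tape; return the visited tape segment if it halts
    tape, head, state = defaultdict(int), 0, 0
    for _ in range(MAX_STEPS):
        rule = table[(state, tape[head])]
        if rule[0] == 'halt':
            tape[head] = rule[1]
            cells = sorted(tape)
            return ''.join(str(tape[i]) for i in range(cells[0], cells[-1] + 1))
        write, move, state = rule
        tape[head] = write
        head += move
    return None   # treated as non-halting (runtime cutoff)

keys = list(product(range(STATES), range(SYMBOLS)))
freq = Counter()
for rules in product(choices, repeat=len(keys)):
    out = run(dict(zip(keys, rules)))
    if out is not None:
        freq[out] += 1

total = sum(freq.values())
for s, n in freq.most_common(5):
    print(s, round(-log2(n / total), 2))   # CTM estimate: K(s) ~ -log2 m(s)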

Changes in computational formalism

[H. Zenil and J-P. Delahaye, On the Algorithmic Nature of the World; 2010]


Elementary Cellular Automata

An elementary cellular automaton (ECA) is defined by a local function f : {0,1}^3 → {0,1}.

Figure : Space-time evolution of a cellular automaton (ECA rule 30).

f maps the state of a cell and its two immediate neighbours (range = 1) to a new cell state: f_t : (r_−1, r_0, r_+1) → r_0. Cells are updated synchronously according to f over all cells in a row.

[Wolfram (1994)]
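
A minimal Python sketch of an ECA evolution (rule 30, single black cell; periodic boundary conditions are an assumption of this sketch):

def eca_step(row, rule=30):
    n = len(row)
    # new cell i is bit (left*4 + centre*2 + right) of the rule number
    return [(rule >> (row[(i - 1) % n] * 4 + row[i] * 2 + row[(i + 1) % n])) & 1
            for i in range(n)]

row = [0] * 31
row[15] = 1                        # single black cell in the middle
for _ in range(16):
    print(''.join('#' if c else '.' for c in row))
    row = eca_step(row)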

Convergence in ECA classification (CTM v Compress)

Scatterplot of ECA classification: CTM (x-axis) versus Compress (y-axis).

[Soler-Toscano et al., Computability ; 2013]


Part II

GRAPH ENTROPY, GRAPH ALGORITHMIC PROBABILITY AND GRAPH KOLMOGOROV COMPLEXITY


Graph Entropy definitions are not robust

Several definitions of Graph Entropy (e.g. from molecular biology) have been proposed, e.g.:

A complete graph has highest entropy H if H is defined over all possible subgraphs it contains, up to the graph size, i.e.

H(G) = −∑_{i}^{|G|} P(G_i) log2 P(G_i)

where G_i is a subgraph of increasing size i in G. However, H(Adj(G)) = −P(Adj(G)) log2 P(Adj(G)) = 0! (and likewise for the adjacency matrices of all the subgraphs, so the sum would be 0 too!)

Graph Entropy

Complete and disconnected graphs then have maximal and minimal entropy, respectively. Alternative definitions include, for example, the number of bifurcations encountered when traversing the graph starting from any random node, etc.


Graph Kolmogorov complexity (Physica A)

Unlike Graph Entropy, Graph Kolmogorov complexity is very robust:

Complete graph: K ∼ log(|N|); E-R random graph: K ∼ |E|.

M. Gell-Mann (Nobel Prize 1969) thought that any reasonable measure of complexity of graphs should assign minimal complexity to both completely disconnected and completely connected graphs (The Quark and the Jaguar, 1994).

Graph Kolmogorov complexity

Complete and disconnected graphs with |N| nodes have low (algorithmic) information content. In a random graph every edge e ∈ E requires some information to be described. In both cases K(G) ∼ K(Adj(G))!


Numerical estimation of K(G)

A labelled graph is uniquely represented by its adjacency matrix. So the question is: what is the Kolmogorov complexity of an adjacency matrix?

Figure : Two-dimensional Turing machines, also known as Turmites (Langton, Physica D, 1986).

We will provide the definition of Kolmogorov complexity for unlabelled graphs later.

[Zenil et al. Physica A, 2014]


An Information-theoretic Divide-and-Conquer Algorithm!

The Block Decomposition method uses the Coding Theorem method. Formally, we will say that an object c has (2D) Kolmogorov complexity:

K2D_{d×d}(c) = ∑_{(r_u, n_u) ∈ c_{d×d}} [K2D(r_u) + log2(n_u)] (4)

where c_{d×d} represents the set with elements (r_u, n_u), obtained from decomposing the object into (overlapping) blocks of size d × d with boundary conditions. In each (r_u, n_u) pair, r_u is one such square and n_u its multiplicity.

[Zenil et al., Two-Dimensional Kolmogorov Complexity and Validation of the Coding Theorem Method by Compressibility (2012)]
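
A hedged Python sketch of the idea, using a non-overlapping partition into 2 × 2 blocks (the slides mention overlapping blocks with boundary conditions) and made-up placeholder CTM values; a real implementation looks the blocks up in the precomputed CTM tables (complexitycalculator.com):

from collections import Counter
from math import log2
import numpy as np

def bdm2d(m, ctm2d, d=2):
    # sum, over distinct d x d blocks r with multiplicity n, of CTM(r) + log2(n)
    blocks = Counter()
    for i in range(0, m.shape[0] - d + 1, d):
        for j in range(0, m.shape[1] - d + 1, d):
            blocks[m[i:i+d, j:j+d].tobytes()] += 1
    return sum(ctm2d.get(r, 8.0) + log2(n) for r, n in blocks.items())  # 8.0: placeholder default

# placeholder CTM values for two 2x2 blocks (illustration only, not real CTM numbers)
ctm2d = {np.zeros((2, 2), np.uint8).tobytes(): 3.3,
         np.ones((2, 2), np.uint8).tobytes(): 3.3}

adj = np.zeros((8, 8), dtype=np.uint8)   # adjacency matrix of an 8-node empty graph
print(bdm2d(adj, ctm2d))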


Classification of ECA by BDM (= Km) and Compress

Representative ECAs sorted by BDM (top row) and Compress (bottom row).

[H. Zenil, F. Soler-Toscano, J.-P. Delahaye and N. Gauvrit, Two-Dimensional Kolmogorov Complexity and Validation of the Coding Theorem Method by Compressibility (2012)]


Complementary methods for different object lengths

The methods coexist and complement each other for different string lengths (transitions are also smooth).

method                                   < 100 bits   > 100 bits   scalability   time     domain
Lossless compression                     ✗            ✓            ✓             O(n)     H
Coding Theorem method (CTM)              ✓            ✗            ✗             O(exp)   K
CTM + Block Decomposition method (BDM)   ✓            ✓            ✓             O(n)     K → H

Table : H stands for Shannon Entropy and K for Kolmogorov complexity. BDM can therefore be taken as an improvement to (Block) Entropy rate for a fixed block size. For CTM: http://www.complexitycalculator.com


Graph algorithmic probability

Works on directed and undirected graphs.

Torus boundary conditions provide a solution to the boundary problem.

Overlapping submatrices mitigate the lack of permutation invariance but lead to overfitting.

The best option is to recursively divide into square matrices for which exact complexity estimations are known.

[Zenil et al. Physica A (2014)]

K and graph automorphism group (Physica A)

Figure : Left: An adjacency matrix is not a graph invariant, yet isomorphic graphs have similar K. Right: Graphs with large automorphism group size (group symmetry) have lower K.

This correlation suggests that the complexity of unlabelled graphs is captured by the complexity of their adjacency matrix (which is a labelled graph object). Indeed, in Zenil et al. LNCS we show that the complexity of a labelled graph is a good approximation to its unlabelled graph complexity.

[Zenil et al. Physica A (2014)]


Unlabelled Graph Complexity

The proof sketch of labelled graph complexity ∼ unlabelled graph complexity uses the fact that there is an algorithm (e.g. brute force) of finite (small) size that produces any isomorphic graph from any other.

Yet, one can define unlabelled graph Kolmogorov complexity as follows:

Definition

Unlabelled Graph Kolmogorov Complexity: Let Adj(G) be the adjacency matrix of G and Aut(G) its automorphism group; then

K(G) = min{K(Adj(G)) | Adj(G) ∈ A(Aut(G))}

where A(Aut(G)) is the set of adjacency matrices of all graphs isomorphic to G. (The problem is believed to be in NP but not NP-complete.)

[Zenil, Kiani and Tegner (forthcoming)]
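
An illustrative brute-force Python sketch of this definition for a tiny graph, taking the minimum over all relabelings and using compressed length as a crude stand-in for K (only feasible for very small graphs; in practice the labelled estimate above is used instead):

import zlib
from itertools import permutations
import numpy as np

def est_K(adj):
    # crude complexity proxy: compressed length of the packed adjacency matrix
    return len(zlib.compress(np.packbits(adj).tobytes()))

adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=np.uint8)   # a 4-node path graph

best = min(est_K(adj[np.ix_(p, p)]) for p in permutations(range(len(adj))))
print(best)   # minimum estimate over all adjacency matrices in the isomorphism class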


Graph automorphisms and algorithmic complexity by BDM

Classifying (and clustering) ∼ 250 graphs (no Aut(G) correction) with different topological properties by K (BDM):

[Zenil et al. Physica A (2014)]

Graph definitions

Definition

Dual graph: A dual graph of a plane graph G is a graph that has a vertex corresponding to each face of G, and an edge joining two neighboring faces for each edge in G.

Definition

Graph spectrum: The set of eigenvalues of the adjacency matrix is called the spectrum of the graph. The spectrum of the Laplacian matrix is sometimes also used as the graph's spectrum.

Definition

Cospectral graphs: Two graphs are called isospectral or cospectral if they have the same spectrum.
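
For concreteness, a short numpy sketch computing both the adjacency and the Laplacian spectrum of the complete graph K4:

import numpy as np

adj = np.ones((4, 4), dtype=int) - np.eye(4, dtype=int)   # complete graph K4
spectrum = np.linalg.eigvalsh(adj)                          # adjacency spectrum: [-1, -1, -1, 3]
laplacian = np.diag(adj.sum(axis=1)) - adj
print(spectrum)
print(np.linalg.eigvalsh(laplacian))                        # Laplacian spectrum: [0, 4, 4, 4]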


Testing compression and BDM on dual graphs

[Zenil et al. Physica A (2014)]


H, compression and BDM on cospectral graphs

Figure : Information content of graphs and networks. (A) Asymptotic behavior of BDM and of Shannon's entropy when adding disconnected nodes to a cycle graph of size 4, as a test of error estimation of graph entropy and graph complexity. Examples of cospectral graphs, with entropy (B) and algorithmic complexity estimations by Bzip2 (C) and BDM (D) for a set of 180 graphs and their cospectrals.


Quantifying Loss of Information in Network-based Dimensionality Reduction Techniques

Figure : Flowchart of Quantifying Loss of Information in Network-based Dimensionality Reduction Techniques.


Methods of (Algorithmic) Information Theory in network dimensionality reduction

Figure : Information content of graph spectra and graph motif analysis. Information content of 16 graphs of different types and the information content of their graph spectra approximated by Bzip2, Compress and BDM.


Methods of (Algorithmic) Information Theory in network dimensionality reduction

Figure : Information content progression of sparsification. Information loss after keeping from 20 to 80% of the graph edges (100% corresponds to the information content of the original graph).


Methods of (Algorithmic) Information Theory in network dimensionality reduction

Figure : Plot comparing all methods as applied to 4 artificial networks. The information content, measured as normalized complexity with two different lossless compression algorithms, was used to assess the sparsification, graph spectra and graph motif methods. The 6 networks from the Mendes DB are of the same size and each method displays different phenomena.


Eigenvalue information weight effect on graph spectra

In graph spectra analysis, either only the largest eigenvalue (λ1) is considered, or all eigenvalues (λ1...λn) are given the same weight. Yet eigenvalues capture different properties and are sensitive to graph specificity; e.g. in a complete graph λ1 provides the graph size.

Figure : Graph spectra can be plotted in an n-dimensional space where n is the graph node size (and the number of eigenvalues). When a graph G evolves, its spectrum changes from Spec1(G) to Spec2(G′) as in the figure; but if not all eigenvalues are equally important, the distance d(Spec1(G), Spec2(G′)) lives on a manifold rather than in Euclidean space.


Eigenvalues in Graph Spectra are not all the same

Nor is their magnitude alone of any special relevance (e.g. taking only the largest one):

Figure : Statistics (ρ) and p-value plots between graph complexity (BDM) and the largest, second largest and smallest eigenvalues of 204 different graph classes comprising 4913 graphs. Clearly the graph class complexity correlates in different ways with different eigenvalues.

[Source: Zenil, Kiani and Tegner LNCS (2015)]


Eigenvalues of evolving networks

Most informative eigenvalues to characterize a family of networks and individuals in such a family:

Figure : The complexity of the graph versus the complexity of the list of eigenvalues per position (rows) provides information about the amount and kind of information stored in each eigenvalue; the maximum-entropy row also identifies the eigenvalue that best characterizes the changes in an evolving network that otherwise displays very little topological change.


Entropy and complexity of Eigenvalue families

Let n be the number of datapoints of an evolving graph (or of a family of graphs to study), H the Shannon Entropy, K Kolmogorov complexity and KS the Kolmogorov-Sinai Entropy (∼ interval Shannon Entropy); then we are interested in:

H(Spec(G^i)), K(Spec(G^i)), KS(Spec(G^i))

where i ∈ {1, ..., n}, to study the eigenvalue behavior with respect to K_BDM(G^i), and in

KS(λ_1^1, λ_1^2, ..., λ_1^n)
KS(λ_2^1, λ_2^2, ..., λ_2^n)
...

where λ_j^i is the j-th eigenvalue of G^i, maximizing the differences between the G^i and hence characterizing G in time.
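
A hedged Python sketch of this per-eigenvalue analysis: a small random graph is evolved by single-edge rewirings (an assumption made for the demo), its sorted spectrum is recorded at each step, and a binned Shannon entropy is computed for each eigenvalue position; the maximum-entropy row indicates the most informative eigenvalue.

import numpy as np

def shannon(xs, bins=8):
    counts, _ = np.histogram(xs, bins=bins)
    p = counts[counts > 0] / len(xs)
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
adj = np.triu((rng.random((20, 20)) < 0.2).astype(int), 1)
adj += adj.T                                   # random symmetric starting graph

spectra = []
for _ in range(30):                            # evolve by toggling one random edge per step
    i, j = rng.integers(20, size=2)
    if i != j:
        adj[i, j] = adj[j, i] = 1 - adj[i, j]
    spectra.append(np.linalg.eigvalsh(adj))    # sorted spectrum of the current graph

rows = np.array(spectra).T                     # row j = eigenvalue position j over time
best = max(range(rows.shape[0]), key=lambda j: shannon(rows[j]))
print(best, shannon(rows[best]))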

Part 2 Summary

1. We have a sound, robust and native 2-dimensional complexity measure applicable to graphs and networks.

2. The method is scalable, e.g. to 3 dimensions; I call CTM3D the "3D printing complexity measure" because it only requires the Turing machine to operate on a 3D grid, effectively the probability of a random computer program printing a 3D object!

3. The defined graph complexity measure captures algebraic, topological (and, in forthcoming work, even physical) properties of graphs and networks.

4. There is potential for applications in network and synthetic biology.

5. The method may prove to be very effective at giving proper weight to eigenvalues and even shedding light on their meaning and information content.


F. Soler-Toscano, H. Zenil, J.-P. Delahaye and N. Gauvrit, Correspondence and Independence of Numerical Evaluations of Algorithmic Information Measures, Computability, vol. 2, no. 2, pp. 125–140, 2013.

H. Zenil, N.A. Kiani and J. Tegner, Numerical Investigation of Graph Spectra and Information Interpretability of Eigenvalues, IWBBIO 2015, LNCS 9044, pp. 395–405, Springer, 2015.

N. Gauvrit, H. Zenil, F. Soler-Toscano and J.-P. Delahaye, Algorithmic complexity for short binary strings applied to psychology: a primer, Behavior Research Methods, vol. 46, no. 3, pp. 732–744, 2013.

H. Zenil, F. Soler-Toscano, K. Dingle and A. Louis, Correlation of Automorphism Group Size and Topological Properties with Program-size Complexity Evaluations of Graphs and Complex Networks, Physica A: Statistical Mechanics and its Applications, vol. 404, pp. 341–358, 2014.

J.-P. Delahaye and H. Zenil, Numerical Evaluation of the Complexity of Short Strings, Applied Mathematics and Computation, 2011.

F. Soler-Toscano, H. Zenil, J.-P. Delahaye and N. Gauvrit, Calculating Kolmogorov Complexity from the Output Frequency Distributions of Small Turing Machines, PLoS ONE, 9(5): e96223, 2014.


J.-P. Delahaye and H. Zenil, On the Kolmogorov-Chaitin complexity for short sequences, in C.S. Calude (ed.), Randomness and Complexity: From Leibniz to Chaitin, World Scientific, 2007.

G.J. Chaitin, A Theory of Program Size Formally Identical to Information Theory, J. Assoc. Comput. Mach., vol. 22, pp. 329–340, 1975.

R. Cilibrasi and P. Vitányi, Clustering by compression, IEEE Transactions on Information Theory, 51(4), 2005.

A.N. Kolmogorov, Three approaches to the quantitative definition of information, Problems of Information Transmission, 1(1):1–7, 1965.

L. Levin, Laws of information conservation (non-growth) and aspects of the foundation of probability theory, Problems of Information Transmission, 10(3):206–210, 1974.

R.J. Solomonoff, A formal theory of inductive inference: Parts 1 and 2, Information and Control, 7:1–22 and 224–254, 1964.

S. Wolfram, A New Kind of Science, Wolfram Media, 2002.
