detecting topological patterns in protein...

84
Detecting topological patterns in complex networks Sergei Maslov Brookhaven National Laboratory

Upload: others

Post on 08-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Detecting topological patterns in complex networks

Sergei MaslovBrookhaven National Laboratory

Page 2: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Networks in complex systemsComplex systems

Large number of componentsThey interact with each otherAll components and/or interactions are differentfrom each other (unlike in traditional physics)104 types of proteins in an organism,106 routers, 109 web pages, 1011 neurons

The simplest property of a complex system: who interacts with whom? can be visualized as a networkBackbone for real dynamical processes

Page 3: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Why study the topology of complex networks?

Lots of easily available data: that’s where the state of the art information is (at least in biology)Large networks may contain information about basic design principles and/or evolutionary history of the complex systemThis is similar to paleontology: learning about an animal from its backbone

Page 4: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Plan of my lecturesPart 1: Bio-molecular (protein) networks

What is a protein network?Degree distributions in real and random networksBeyond degree distributions: How it all is wired together? Correlations in degreesHow do protein networks evolve?

Part 2: Interplay between an auxiliary diffusion process (random walk) and modules and communities on the Internet and WWW

Page 5: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Collaborators:Biology:

Kim Sneppen – U. of CopenhagenKasper Eriksen – U. of LundKoon-Kiu Yan – Stony Brook U.Ilya Mazo – Ariadne GenomicsIaroslav Ispolatov – Ariadne Genomics

Internet and WWW:Kasper Eriksen – U. of LundKim Sneppen – U. of CopenhagenIngve Simonsen – NORDITAKoon-Kiu Yan – Stony Brook U.Huafeng Xie –City University of New York

Page 6: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Postdoc positionLooking for a postdoc to work in my group at Brookhaven National Laboratory in New Yorkstarting Fall 2005Topic - large-scale properties of (mostly) bionetworks (partially supported by a NIH/NSF grant with Ariadne Genomics)E-mail CV and 3 letters of recommendation to: [email protected] www.cmth.bnl.gov/~maslov

Page 7: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Protein networksNodes – proteins Edges – interactions between proteins

Metabolic (all metabolic pathways of an organism)Bindings (physical interactions)Regulations (transcriptional regulation, protein modifications)Disruptions in expression of genes (remove or overexpress one gene and see which other genes are affected) Co-expression networks from microarray data (connect genes with similar expression patterns under many conditions)Synthetic lethals (removal of A doesn’t kill, removal of B doesn’t kill but removal of both does)Etc, etc, etc.

Page 8: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Sources of data on protein networks

Genome-wide experimentsBinding – two-hybrid high throughput experiments Transcriptional regulation – ChIP-on-chip, or ChIP-then-SAGEExpression, disruption networks – microarraysLethality of genes (including synthetic lethals):

Gene knockout – yeast RNAi –worm, fly

Many small or intermediate-scale experimentsPublic databases: DIP, BIND, YPD (no longer public), SGD, Flybase, Ecocyc, etc.

Page 9: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

MAPK signalingInhibition of apoptosis

Images from ResNet3.0 by Ariadne Genomics

Page 10: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Pathway network paradigm shift

Pathway Network

Page 11: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Transcription regulatory networks

Bacterium: E. coli3:2 ratio

Single-celled eukaryote:S. cerevisiae; 3:1 ratio

Page 12: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Data from Ariadne Genomics

Homo sapiens

Total: 120,000 interacting protein pairs extracted from PubMed as of 8/2004

Page 13: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Degree (or connectivity) of a node – the # of neighbors

DegreeK=4

DegreeK=2

Page 14: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Directed networks havein- and out-degrees

Out-degreeKout=5

In-degreeKin=2

Page 15: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

How many transcriptional regulators are out there?

Page 16: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Fraction of transcriptional regulators in bacteria

from Stover et al., Nature (2000)

Page 17: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Complexity of regulation grows with complexity of organism

NR<Kout>=N<Kin>=number of edgesNR/N= <Kin>/<Kout> increases with N

<Kin> grows with NIn bacteria NR~N2 (Stover, et al. 2000) In eucaryots NR~N1.3 (van Nimwengen, 2002)

Networks in more complex organisms are more interconnected then in simpler ones

Page 18: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Complexity is manifested in Kin distribution

E. coli vs H. sapiens

Page 19: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Figure from Erik van Nimwegen, TIG 2003

Page 20: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Table from Erik van Nimwegen, TIG 2003

Page 21: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Protein-protein binding (physical interaction)

networks

Page 22: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Two-hybrid experiment

To test if A interacts with B create two hybridsA* (with Gal4p DNA-binding domain) and B* (with Gal4p activation domain)

High-throughput: all pairs among 6300 yeast proteins are tested: Yeast: Uetz, et al. Nature (2000), Ito, et al., PNAS (2001); Fly: L. Giot et al. Science (2003)

Gal4-activated reporter gene, say GAL2::ADE2Gal4-binding domain

A* B*

Page 23: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Protein binding network in yeast

Page 24: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Protein binding networks

Two-hybrid nuclear Database (DIP) core set

S. cerevisiae

Page 25: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Degree distributionsin random and real networks

Page 26: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Degree distribution in a random network

Randomly throw E edges among N nodesSolomonoff, Rapaport, Bull. Math. Biophysics (1951)Erdos-Renyi (1960)Degree distribution –PoissonK~λ±√λ with no hubs(fast decay of N(K))

( ) exp( )!

2 /

K

N K NK

K E N

λ λ

λ

= −

= =

Page 27: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Degree distribution in real protein networks

Histogram N(K) is broad: most nodes have low degree ~ 1, few nodes – high degree ~100Can be approximately fitted with N(K)~K-γ

functional formwith γ~=2.5

H. Jeong et al. (2001)100 101 102 10310-4

10-2

100

102

104

x-2.5

Page 28: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Network properties of self-binding proteins AKA homodimers

I. Ispolatov, A. Yuryev, I. Mazo, SMq-bio.GN/0501004.

Page 29: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

There are just TOO MANY homodimers

• Null-model • Pself ~<k>/N• Ndimer=N • Pself= <k>• Not surprising ashomodimers have many functional roles

Page 30: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Network properties around homodimers

Page 31: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Fly: two-hybrid data Human: database data

Likelihood to self-interact vs. K

Pself~0.05, Pothers~0.0002 Pself~0.003, Pothers~0.0002

Page 32: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

What we think it means?In random networks pdimer(K)~K2 not ~K like our empirical observationK is proportional to the “stickiness” of the protein which in its turn scales with

the area of hydrophobic residues on the surface# copies/cellits popularity (in datasets taken from a database)Etc.

Real interacting pairs consists of an “active” and “passive” protein and binding probability scales only with the “stickiness” of the active protein“Stickiness” fully accounts for higher than average connectivity of homodimers

Page 33: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Propagation of signals and perturbations in networks

Page 34: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Naïve argumentAn average node has <K> first neighbors, <K><K-1> second neighbors, <K><K-1><K-1> third neighborsTotal number of neighbors affected is1+<K> + <K><K-1> +<K><K-1><K-1>+…If <K-1> ≥ 1 a perturbation starting at a single node MAY affect the whole network

Page 35: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Where is it wrong?

Probability to arrive to a node with K neighbors is proportional to K!The right answer: <K(K-1)>/<K> ≥ 1a perturbation would spreadCorrelations between degrees of neighbors would affect the answer

Page 36: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

How many clusters?

If <K(K-1)>/<K> >> 1 there is one “giant” cluster and few small ones. If <K(K-1)>/<K> ∼ 1 small clusters have a broad (power-law) distributionPerturbation which affects neighbors with probability p propagates if p<K(K-1)>/<K> ≥ 1For scale-free networks P(K)~K-γ with γ<3, <K2>=∞ ⇒ perturbation always spreads in a large enough network

Problem 2: For a given p and γ how large should be the network for the perturbation to spread

Page 37: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Amplification ratios

• A(dir): 1.08 - E. Coli, 0.58 - Yeast• A(undir): 10.5 - E. Coli, 13.4 – Yeast• A(PPI): ? - E. Coli, 26.3 - Yeast

Problem 3: derive the above formula for directed networks

Page 38: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Beyond degree distributions:How is it all wired together?

Page 39: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Central vs peripheral network architecture

central(hierarchical)

peripheral(anti-hierarchical)

From A. Trusina, P. Minnhagen, SM, K. Sneppen, Phys. Rev. Lett. (2004)

random

Page 40: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Correlation profile

Count N(k0,k1) – the number of links between nodes with connectivities k0 and k1

Compare it to Nr(k0,k1) – the same property in a random networkQualitative features are very noise-tolerant with respect to both false positives and false negatives

Page 41: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

PRP45

SYF1

RFA1RFA3

SMT3

HTB2

ALPHA1

FUS3DIG2

DIG1

APN2

GCN4

RRN10

RPN10

LSM2PAT1

RVB1

SKI6

MED6

RFC5RFC2

DCP1

SIF2RNA14

MGA1

NPL4CDC48

YAP6

STD1

NTC20

MSI1

NEM1CDC73

TAF90PIF1

NUP100

YAP1

YBR223C

YBR239CYPL133C

NUP192

NIP29

BBP1

DPB3ADA2

HAS1

KAR4

IME4

ALPHA2TOP1

POL4GZF3

CIN8

POP3

RSC6RSC8

UME6

PRP11

PRP21

PRP9

UBC9

RRP42

MTR3

NIP100

NUP84PDS1

NUP145

DPB11RRS1

SAS10

DEG1PRP43

NET1MPP10

RAP1

KSS1

PBP1

CDC13

STN1

HSP26YDR026C

SRP40

LYS14YDR449C

GTS1

DAM1

CHA4

SEN2

CTF13CBC2

RPO26

RAD55RAD51

TRM1

GBP2

NGG1

TID3

RVB2

NUP42

SPC19

SPC34

HMT1

FKH1

RRP45

NUP53

TFB1PTA1

JNM1MCM21

SNU23SNM1 SMB1

SPO12

SKP1SGT1

MSN5

LSM6LSM5

RPT3

NCB2

BUR6

REC107

PRP3

PHO85

PRP4

CCL1

NUP82

SLT2

SRB2

RPC40

SRB4

DOA4HTA1YFR038W

CRM1

IME1CBF5

TSP1MCM1

MED11

RIM11

MER1POP2

CIN5

CKB2

TAF47

MAF1

PRE10

PUP3

GLE2

DMC1

GCD14

MSH4

MSH5

UPF3

GAT1SCL1

YFL052W

NIC96

SFH1

NUP57

RNA15NUP2RRN9

RAD6

TOA1DUO1

RPB9

POP7CDC34 ATC1

SWM1FLO8

RPS22A

TAF17

NAB2GAR1

FZF1

CTK3

SPT2

DOT5SFL1

RDH54

HAP2HAP3

SNU71

PRP40

REC102

NUP49

NSP1

SYF2

ULP1

ZPR1HRT1

RIM101

SMD3

POP8

PRE8

TFC7

ARP1

CDC23

LEU3

STB5

CDC9

MEK1

RPT5

SIK1

FPR4

SNP1

HDA1

LSM1

HYS2

POL32

SRP1

LSM8

ISY1

RPA12

RPA14

TAF25

SPC97

RAD27

PHD1 NUP120

HMO1

DBR1

DAL80SAP1

NUP133

STU2

CDC45

DOT6

MSL5

SME1

RED1

SMD2

SMX3YSH1

RPC11

KAP95

RIF2 SIR2

PRI1

SWI3

CNA1

YOX1

RNH35RRP6

SLK19YOR206W

WTM1

YPR144C

NBP1

NDC1

OGG1

RCL1

MFT1PHO23

CUP2SNF6

RAD10

SPT4

TAF19

CAC2

RLF2CTK2

NUP116SPC72

SWH1CAD1

YAP3SPS18

YNG1

PUP1

SAS5

TYE7

SEN54

MEX67

STO1

ASM4RGM1

CEF1

KRI1

CUS1

CAT8

TFC5

IMP4

TOP2

TFG2

FIP1

LSM7

DAL82

NUF2

RFC4

RFC3

TRF4 NVJ1

HRP1

SWI6

BUB1

WHI2

PRE9

RGT1

CKA2

RPT4

RIS1

CDC31

KAR1

YKR017C

HSH49

RTF1

NOP4

KAP120

PRP46

RAD53

HRR25

TFB3

NOT5

SUA7

RNR3

ESS1

GCD10

SPT20CDC21

PRE2

FHL1

YTH1

RPN7 RPN3

DBF20

Pajek

Page 42: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Correlation profile of the protein interaction network

R(k0,k1)=N(k0,k1)/Nr(k0,k1) Z(k0,k1) =(N(k0,k1)-Nr(k0,k1))/∆Nr(k0,k1)

Similar profile is seen in the yeast regulatory network

Page 43: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Pajek

Page 44: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Correlation profile of the yeast regulatory network

R(kout, kin)=N(kout, kin)/Nr(kout,kin) Z(kout,kin)=(N(kout,kin)-Nr(kout,kin))/ ∆Nr(kout,kin)

Page 45: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Some scale-free networks may appear similar

In both networks the degree distribution is scale-free P(k)~ k-γ with γ~2.2-2.5

Page 46: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

But: correlation profiles give them unique identities

InternetProtein interactions

Page 47: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

How to construct a proper random network?

Page 48: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Randomization of a network

given complex network random

Page 49: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Basic rewiring algorithm

Randomly select and rewire two edges Repeat many times

• R. Kannan, P. Tetali, and S. Vempala, Random Structures and Algorithms (1999)

• SM, K. Sneppen, Science (2002)

Page 50: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Metropolis rewiring algorithm

Randomly select two edgesCalculate change ∆E in “energy function” E=(Nactual-Ndesired)2/Ndesired

Rewire with probability p=exp(-∆E/T)

“energy” E “energy” E+∆E

SM, K. Sneppen:cond-mat preprint (2002),Physica A (2004)

Page 51: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

How do protein networks evolve?

Page 52: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Gene duplication

Pair of duplicated proteins

2 Shared interactionsOverlap=2/5=0.4

Pair of duplicated proteins

5 Shared interactionsOverlap=5/5=1.0

Right after duplication After some time

Page 53: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Yeast regulatory network

SM, K. Sneppen, K.Eriksen, K-K. Yan, BMC Evolutionary Biology (2003)

Page 54: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

100 million years ago

Page 55: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Protein interaction networksSM, K. Sneppen, K.Eriksen, K-K. Yan, BMC Evolutionary Biology (2003)

Page 56: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Why so many genes could be deleted without any

consequences?

Page 57: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Possible sources of robustness to gene deletionsBackup via the network (e.g. metabolic network could have several pathways for the production of the necessary metabolite)Not all genes are needed under a given condition (say rich growth medium)Affects fitness but not enough to killProtection by closely related homologs in the genome

Page 58: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Protective effect of duplicates

Gu, et al 2003Maslov, Sneppen, Eriksen, Yan 2003

Yeast Worm

Maslov, Sneppen, Eriksen, Yan 2003

Z. Gu, … W.-H. Li, Nature (2003)

Maslov, Sneppen, Eriksen, Yan BMC Evol. Biol.(2003)

Page 59: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

SummaryThere are many kinds of protein networksNetworks in more complex organisms are more interconnectedMost have hubs – highly connected proteinsHubs often avoid each other (networks are anti-hierarchical)There are many self-interacting proteins. Probability to self-interact linearly scales with the degree K.Networks evolve by gene duplicationsRobustness is often (but not always) provided by gene duplicates

Page 60: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Part 2

Page 61: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Modules in networks and how to detect them using the Random walks/diffusion

K. Eriksen, I. Simonsen, SM, K. Sneppen, PRL (2003)

Page 62: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

What is a module?Nodes in a given module (or community group or functional unit) tend to connect with other nodes in the same module

Biology: proteins of the same function or sub-cellular localizationWWW – websites on a common topicInternet – geography or organization (e.g. military)

Page 63: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Do you see any modules here?

Page 64: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Random walkers on a networkStudy the behavior of many VIRTUAL random walkers on a networkAt each time step each random walker steps on a randomly selected neighborThey equilibrate to a steady state ni ~ ki (solid state physics: ni = const)Slow modes allow to detect modules and extreme edges

Page 65: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Matrix formalism

ˆ( 1) ( )

1/ if j iˆ0 otherwise

i ij jj

jij

n t T n t

KT

+ =

↔⎧= ⎨⎩

Page 66: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Eigenvectors of the transfer matrix Tij

( )

( ) ( ) ( )

( ) ( )

( )

ˆ

( )

1 1

i ij jj

t

i i

v T v

n t v

α α α

α α

α

λ

λ

λ

=

=

− ≤ ≤

Page 67: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

US Military

Russia

Page 68: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

2 0.9626 RU RU RU RU CA RU RU ?? ?? US US US US ??

(US Department of Defence)

3 0.9561 ?? FR FR FR ?? FR ??

RU RU RU ?? ?? RU ??

4 0.9523 US ?? US ?? ?? ?? ?? (US Navy)

NZ NZ NZ NZ NZ NZ NZ

5. 0.9474 KR KR KR KR KR ?? KR

UA UA UA UA UA UA UA

Page 69: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Hacked site

Page 70: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins
Page 71: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

How Modules in the WWW influence Google ranking

H. Xie, K.-K. Yan, SM, cond-mat/0409087

Page 72: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

How Google works?Too many hits to a typical query: need to rank One could rank the importance of webpages by Kin, but:

Too democratic: It doesn’t take into account importance of webpages sending links it’s easy to trick and artificially boost the rank

Google’s solution: simulate the behavior of many “random surfers” and then count the number of hits for every webpage

Popular pages send more surfers your way the Google weight is proportional to Kin weighted by popularity Surfer get bored following links use α=0.15 for random jumps without any linksLast rule also solves the problem that some pages have no out-links

Page 73: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Mathematics of the Google

Google solves a self-consistent Eq.:

Gi ~ Σj Tij Gj - find principal eigenvector Tij= Aij/Kout (j)Model with random jumps:

Gi ~ (1-α)Σj Tij Gj + α Σj Gj

Page 74: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

How do WWW communities (modules) influence the Gi?

Naïve argument: communities tend to “trap” random surfers they should increase the Google ranking of nodes in the community

Page 75: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

log 1

0(G

c/G

w)

Ecc

Test of a naïve argument

Naïve argument is completely wrong!

Page 76: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

EccEww

Page 77: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Gc – average Google rank of pages in the community; Gw – in the outside world

Ecw Gc/<Kout>c – current from C to Wmust be equal to: Ewc Gw/<Kout>w – current from W to C

Gc /Gw depends on the ratio between Ecwand Ewc – the number of edges (hyperlinks) between the community and the world

Page 78: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins
Page 79: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins
Page 80: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins
Page 81: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Networks with artificial communities

To test let’s generate a scale-free network with an artificial community of Nc pre-selected nodes Use metropolis algorithm with H=-(# of intra-community nodes) and some inverse temperature βDetailed balance:

Page 82: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Collaborators:

Kim Sneppen – Nordita + NBIKasper Eriksen –Lund U.Ingve Simonsen – U. of Trondheim

Huafen Xie – City University of NYKoon-Kiu Yan - Stony Brook U.

Page 83: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

Summary

Diffusion process and modules (communities) in a network influence each otherIn the “hardware” part of the Internet – the network of routers (Autonomous systems) --diffusion allows one to detect modules and extreme edgesIn the “software” part – WWW communities affect Google ranking in a non-trivial way

Page 84: Detecting topological patterns in protein networksmaslov.bioengineering.illinois.edu/geilo_lecture_2005.pdf · Protein networks Nodes – proteins Edges – interactions between proteins

THE END