FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

Post on 19-Dec-2015


Page 1: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

FunCoup: data integration and networks of functional coupling in eukaryotes

Andrey Alexeyenko

Page 2: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

FunCoup is a data integration framework to discover functional coupling in eukaryotic proteomes with data from model organisms.

[Figure: human proteins A and B with an unknown link (?); "find orthologs*" in mouse, worm, fly, and yeast; high-throughput evidence.]

* Remm M, Storm CE, Sonnhammer ELL (2001) Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J. Mol. Biol. 314:1041-1052.

Page 3: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

FunCoup is a naïve Bayesian network (NBN)

Bayesian inference:

C: genes A and B are functionally coupled (A<->B)

E: genes A and B are co-expressed

P(C|E) = (P(C) * P(E|C)) / P(E)
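A minimal sketch of the rule above, with P(E) expanded over coupled and uncoupled pairs. The function name and all numbers are hypothetical and chosen only for illustration.

```python
def posterior(prior_c, p_e_given_c, p_e_given_not_c):
    """Return P(C|E) by Bayes' rule, expanding P(E) over C and not-C."""
    p_e = prior_c * p_e_given_c + (1.0 - prior_c) * p_e_given_not_c
    return prior_c * p_e_given_c / p_e


# Hypothetical numbers: 1% of gene pairs are coupled; co-expression is seen
# in 30% of coupled pairs and in 5% of uncoupled pairs.
p_coupled_given_coexpressed = posterior(0.01, 0.30, 0.05)
```

Even strong evidence leaves the posterior low when the prior is small, which is one reason to combine many evidence types.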

Page 4: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

Problem: Absolute probabilities of FC are intractable. The full Bayesian network is impossible.

Solution: Naïve Bayesian network. Calculate a belief change instead (likelihood ratios, LR).

[Figure: a full network over A, B, C, D with every pairwise conditional probability: P(B|A), P(A|B), P(A|C), P(C|A), P(A|D), P(D|A), P(B|C), P(C|B), P(B|D), P(D|B), P(D|C), P(C|D). Next to it, the naïve network for A<->B with one P(E|+) / P(E|-) node per evidence type.]
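A minimal sketch of how likelihood ratios combine in a naïve Bayesian network. It assumes the evidence types are independent and that the score is the sum of natural-log LRs; the log base and the function names are assumptions, not taken from the slides.

```python
import math


def final_bayesian_score(likelihood_ratios):
    """Sum of log likelihood ratios over independent evidence types."""
    return sum(math.log(lr) for lr in likelihood_ratios)


def posterior_odds(prior_odds, likelihood_ratios):
    """Belief change: posterior odds = prior odds * product of the LRs."""
    return prior_odds * math.exp(final_bayesian_score(likelihood_ratios))
```

For example, prior odds of 0.001 and two LRs of 2 and 5 give posterior odds of about 0.01; an LR of 1 leaves the belief unchanged.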

Page 5: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

Problem: How to establish optimal bridges between species?

Solution: Via groups of orthologs that emerged via speciation.

[Figure: gene evolution; functional link.]

Page 6: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

Problem: In situations with multiple inparalogs, how to deal with alternative evidence?

Solution: Treat ALL inparalogs equally, and choose the BEST value.
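A minimal sketch of the "best value" rule. It assumes that "best" means the maximum score and that evidence is stored per unordered protein pair; the data layout and names are hypothetical.

```python
from itertools import product


def best_evidence(inparalogs_a, inparalogs_b, evidence):
    """Return the highest score over all inparalog pairs, or None if no data.

    evidence maps frozenset({protein_1, protein_2}) to a score.
    """
    values = [
        evidence[frozenset((a, b))]
        for a, b in product(inparalogs_a, inparalogs_b)
        if frozenset((a, b)) in evidence
    ]
    return max(values) if values else None


# Hypothetical example: two mouse inparalogs of A, one mouse ortholog of B.
mouse_evidence = {
    frozenset(("m1", "m3")): 0.2,
    frozenset(("m2", "m3")): 0.7,
}
best = best_evidence(["m1", "m2"], ["m3"], mouse_evidence)
```

Returning None for pairs without data keeps missing evidence distinct from weak evidence.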

Page 7: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

Problem: Collected features often tell the same story (they are correlated), which fits an NBN badly.

Solution: Render data uncorrelated with principal components analysis (PCA).

PC1 = α11 X + α12 Y

PC2 = α21 X + α22 Y

[Figure: pairs of proteins plotted by Feature A (X) against Feature B (Y), with the PC1 and PC2 axes.]
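A minimal sketch of PCA for two features, written as a plain rotation so that it needs no external library. The feature values are hypothetical; a real pipeline would use many features and a linear algebra package.

```python
import math


def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def pca_2d(xs, ys):
    """Rotate centred (X, Y) onto its principal axes; return (pc1, pc2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a, b, c = covariance(xs, xs), covariance(xs, ys), covariance(ys, ys)
    theta = 0.5 * math.atan2(2.0 * b, a - c)  # angle of the first principal axis
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    pc1 = [cos_t * (x - mx) + sin_t * (y - my) for x, y in zip(xs, ys)]
    pc2 = [-sin_t * (x - mx) + cos_t * (y - my) for x, y in zip(xs, ys)]
    return pc1, pc2


# Hypothetical, strongly correlated features for five protein pairs.
feature_a = [1.0, 2.0, 3.0, 4.0, 5.0]
feature_b = [1.1, 1.9, 3.2, 3.8, 5.1]
pc1, pc2 = pca_2d(feature_a, feature_b)
```

The rotated scores have zero covariance, so they can enter the NBN as separate evidence without double counting.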

Page 8: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

Problem: A feature distribution shape may be unpredictable: hard to learn the “feature -> evidence” mapping.

Solution: Render features discrete.
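A minimal sketch of discretization: a continuous feature is cut into bins and each bin receives its own likelihood ratio. The bin edges, the pseudocount, and the comparison against a random set are assumptions made for illustration.

```python
import bisect


def bin_index(value, edges):
    """Index of the bin holding value, given sorted inner bin edges."""
    return bisect.bisect_right(edges, value)


def bin_likelihood_ratios(pos_values, rnd_values, edges, pseudocount=1.0):
    """Per-bin LR = P(bin | positive set) / P(bin | random set)."""
    n_bins = len(edges) + 1
    pos = [pseudocount] * n_bins  # pseudocount avoids division by zero
    rnd = [pseudocount] * n_bins
    for v in pos_values:
        pos[bin_index(v, edges)] += 1
    for v in rnd_values:
        rnd[bin_index(v, edges)] += 1
    pos_total, rnd_total = sum(pos), sum(rnd)
    return [(p / pos_total) / (r / rnd_total) for p, r in zip(pos, rnd)]


# Hypothetical Pearson r values; bins are r < 0, 0 <= r < 0.5, r >= 0.5.
edges = [0.0, 0.5]
lrs = bin_likelihood_ratios(
    [0.9, 0.8, 0.7, 0.6, 0.1], [-0.5, -0.2, 0.1, 0.3, 0.7], edges
)
```

Because each bin has its own LR, no assumption about the shape of the feature distribution is needed.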

Page 9: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

Problem: Distribution areas informative of FC may vary.

Solution: Find them individually for each data set and FC class, accounting for the joint “feature – class” distribution.

[Figure: positive (+) and negative (-) examples spread along the Pearson r axis from -1 to 1.]

Page 10: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

Problem: Impossible to guarantee absence of FC in negative training sets.

Solution: Replace negative sets with randomly picked ones.

[Figure: "negative set (not coupled proteins) vs. positive set (coupled proteins)" is replaced by "random set vs. positive set".]
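A minimal sketch of drawing a random set of protein pairs to stand in for a negative set. The function name, the seed, and the assumption that protein identifiers are unique are hypothetical.

```python
import random


def random_pairs(proteins, n_pairs, seed=0):
    """Draw n_pairs distinct unordered protein pairs uniformly at random."""
    max_pairs = len(proteins) * (len(proteins) - 1) // 2
    if n_pairs > max_pairs:
        raise ValueError("more pairs requested than exist")
    rng = random.Random(seed)
    pairs = set()
    while len(pairs) < n_pairs:
        a, b = rng.sample(proteins, 2)  # two different proteins
        pairs.add(frozenset((a, b)))    # unordered, so (a, b) equals (b, a)
    return pairs


random_set = random_pairs(["p1", "p2", "p3", "p4"], 5)
```

A random set may still contain some coupled pairs, so the resulting ratios compare the positive set with the background rather than with proven negatives.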

Page 11: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

Problem: Some LR are weak and arise due to non-representative sampling.

Solution: Enforce a confidence check and remove insignificant nodes.

[Figure: A<->B with its P(E|+) / P(E|-) nodes; a test is applied to a node.]
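The slides do not name the test. As one possible confidence check, the sketch below applies a Pearson chi-square test to a 2x2 table (inside or outside a feature bin, positive or random set) and keeps the node only if the statistic exceeds the 5% critical value for one degree of freedom. The choice of test, the threshold, and the names are assumptions.

```python
CHI2_CRITICAL_1DF_5PCT = 3.841  # chi-square critical value, 1 d.o.f., alpha = 0.05


def chi_square_2x2(pos_in, pos_out, rnd_in, rnd_out):
    """Pearson chi-square statistic for a 2x2 table, no continuity correction."""
    n = pos_in + pos_out + rnd_in + rnd_out
    row1, row2 = pos_in + pos_out, rnd_in + rnd_out
    col1, col2 = pos_in + rnd_in, pos_out + rnd_out
    if 0 in (row1, row2, col1, col2):
        return 0.0  # degenerate table: no evidence of a difference
    return n * (pos_in * rnd_out - pos_out * rnd_in) ** 2 / (row1 * row2 * col1 * col2)


def keep_node(pos_in, pos_out, rnd_in, rnd_out):
    """True if the bin separates positive from random pairs significantly."""
    return chi_square_2x2(pos_in, pos_out, rnd_in, rnd_out) > CHI2_CRITICAL_1DF_5PCT
```

With 40 of 100 positive pairs and 10 of 100 random pairs in a bin the statistic is 24 and the node is kept; with 11 versus 10 it is about 0.05 and the node is removed.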

Page 12: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

Problem: Definitions and notions of FC vary.

Solution: Multinet. Decide which types of FC are needed (provide them as positive training sets) and perform the previous steps customized for each.

[Figure: separate sets of P(E|+) / P(E|-) nodes for the link types A<>B, A||B, and A|B.]

Page 13: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

FunCoup’s web interface (new!)

http://www.sbc.su.se/~andale/funcoup.html

Hooper S., Bork P. Medusa: a simple tool for interaction graph analysis. Bioinformatics. 2005 Dec 15;21(24):4432-3. Epub 2005 Sep 27.

Page 14: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

Proteins of the Parkinson’s disease pathway (KEGG #05020)

Physical protein-protein interaction

“Signaling” link

Metabolic “non-signaling” link

Multinet presents several link types in parallel

Page 15: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

Multilateral data transfer

[Figure: data transfer among human, mouse, rat, Ciona, fly, worm, yeast, and Arabidopsis, with PCA and NBN stages.]

Data from the same species is an important but not indispensable component of the framework. Hence, a network can be constructed for an organism with no experimental datasets at all.

Page 16: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

FunCoup builds a network for an uncharacterized organism (C. intestinalis)

1. Build multi-species clusters of orthologs (e.g. human + C. intestinalis + D. melanogaster + C. elegans) [*]

2. Extend known metabolic pathway assignments to the novel organism (e.g. Ciona)

3. Collect well-studied organisms’ data

4. Using this data, train FunCoup on the set created in (2)

5. Test each pair of proteins in the novel organism for being coupled

*Alexeyenko A, Tamas I, Liu G, Sonnhammer ELL Automatic clustering of orthologs and inparalogs shared by multiple proteomes. Bioinformatics. 2006 15;22(14):e9-e15.

Page 17: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

Reconstructing the “regulatory blueprint”* in C. intestinalis

*Imai KS, Levine M, Satoh N, Satou Y (2006) Regulatory blueprint for a chordate embryo. Science, 26:1183-7.

Proteins of the “Regulatory Blueprint for a Chordate Embryo” [*]

18 links mentioned in [*] AND found by FunCoup

Links found by FunCoup (about 140)

The remaining 202 links from [*], which FunCoup did not find, are not shown

Page 18: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

Set of outgoing links of the “regulatory blueprint”

Page 19: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

…and a tight cluster from it:

The Ciona genes were not described, but may receive this annotation via orthology:

Inferred KEGG pathways:

00562 Inositol phosphate metabolism
00632 Benzoate degradation via CoA ligation
00760 Nicotinate and nicotinamide metabolism
04310 Wnt signaling pathway
04330 Notch signaling pathway
04350 TGF-beta signaling pathway
04360 Axon guidance
04510 Focal adhesion
04512 ECM-receptor interaction
04514 Cell adhesion molecules
04520 Adherens junction
04530 Tight junction
04630 Jak-STAT signaling pathway
04640 Hematopoietic cell lineage
04670 Leukocyte transendothelial migration
04810 Regulation of actin cytoskeleton

…and annotations of human orthologs:

ADAM10
Myosin light chain 2
Cadherin EGF LAG seven-pass G-type receptor 2
Neurotrophic tyrosine kinase, receptor-related 3

Page 20: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

The limits of data integration

[Figure: two panels showing area under the ROC curve at specificity > 96% (y-axis, 0.004 to 0.013) against the number of species (1 to 5) and against the number of features (4 to 44), each for PCA-processed and raw data.]

Page 21: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

Confidence estimation

Sensitivity (from a “gold standard” set of FC):

Sens = TP / (TP + FN)

Specificity (from a set of “No / not known FC”):

Spec = TN / (TN + FP)

Positive Predictive Value (from everything predicted by FunCoup):

PPV = TP / (TP + FP)

PPV answers the question: “How much should we trust the FunCoup predictions?”
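The three measures translate directly into code. The counts in the example are hypothetical.

```python
def sensitivity(tp, fn):
    """Fraction of truly coupled pairs that were predicted as coupled."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of uncoupled (or not known) pairs that were not predicted."""
    return tn / (tn + fp)


def positive_predictive_value(tp, fp):
    """Fraction of all predictions that are true."""
    return tp / (tp + fp)


# Hypothetical counts: 80 TP, 20 FN, 950 TN, 50 FP.
sens = sensitivity(80, 20)
spec = specificity(950, 50)
ppv = positive_predictive_value(80, 50)
```

Note that a specificity of 95% still yields a PPV of only about 62% here, because uncoupled pairs greatly outnumber coupled ones.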

Page 22: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

[Figure: PPV estimate (0.0 to 1.0) against the final Bayesian score (2 to 12) for signaling, metabolic, and PPI links.]

Page 23: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

[Figure: confidence score (0.1 to 0.8) against the number of links / individual proteins in the human network (320,000 / 12,750; 94,000 / 11,100; 37,000 / 8,500; 17,000 / 5,450), for members of the same signaling pathway and for physical protein-protein interaction.]

Page 24: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

Correction of confidence by amount of evidence

1. Record the amount of evidence (AOE ~ number of non-empty values) that describes each pair of proteins A<->B

2. Correct each final Bayesian score (FBS):

FBS’(A<->B) = FBS(A<->B) + beta * (M(AOE) – AOE(A<->B))

where beta is the linear regression coefficient of: FBS = alpha + beta * AOE
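A minimal sketch of the correction, assuming that M(AOE) is the mean AOE over all pairs (the slides do not say whether M is a mean or a median) and that beta comes from ordinary least squares. The data are hypothetical.

```python
def linear_regression(xs, ys):
    """Ordinary least squares for ys = alpha + beta * xs; returns (alpha, beta)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    beta = sxy / sxx
    return my - beta * mx, beta


def correct_scores(fbs, aoe):
    """FBS' = FBS + beta * (M(AOE) - AOE), with M taken as the mean (assumption)."""
    _, beta = linear_regression(aoe, fbs)
    mean_aoe = sum(aoe) / len(aoe)
    return [s + beta * (mean_aoe - a) for s, a in zip(fbs, aoe)]


# Hypothetical pairs: scores tend to grow with the amount of evidence.
aoe = [1, 2, 3, 4, 5]
fbs = [2.0, 3.1, 3.9, 5.2, 6.0]
corrected = correct_scores(fbs, aoe)
```

After the correction, the regression slope of the scores on AOE is zero while the mean score is unchanged.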

Page 25: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

Confidence saturated at FBS = 12.5

[Figure: PPV estimate (~confidence, 0% to 100%) against the FunCoup final Bayesian score (2 to 12) for three settings: no correction -> O prior = 0.0008; correction = 0.007 -> O prior = 0.0008; correction = 0.015 -> O prior = 0.0007.]

Page 26: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

How are the yeast complex entities conserved?

Log overlap between KEGG and Gavin et al., 2006

[Figure: log overlap of KEGG vs. "Gavin et al., 2006" for yeast, worm, fly, mouse, human, and thaliana, across seven pair categories: Core-Core, Core-Modu, Core-Attr, Modu-Modu, Modu-Attr, DiffModules, Attr-Attr.]

Page 27: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

Conclusions

http://FunCoup.sbc.su.se

• After optimization, the naïve Bayesian network is well suited for collection/evaluation of sparse, diverse, and noisy features, and is, in itself, efficient at discovering novel cases of FC

• Orthologs are optimal for transferring information across species

• Multiple-class training enabled specific prediction of different types of functional coupling

• Across-species information flow is not symmetrical but reversible, hence the networks of uncharacterized proteomes in FunCoup

• In the Bayesian output, no missing values exist, so a multivariate classification technique may be applied as a post-processor

Page 28: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

Acknowledgements:

• Erik Sonnhammer
• Tomas Ohlson
• Mats Lindskog
• Kristoffer Forslund
• Gabriel Östlund
• Kevin O’Brien
• Carsten Daub

Page 29: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

Validation

Jack-knife procedure:

• Take “positive” and “negative” sets

• Split each randomly as 50:50

• Use the first parts to train the algorithm, the second to test the performance

• Repeat a number of times
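The jack-knife steps above can be sketched as a generator of repeated 50:50 splits. Training and testing are left out; the function names and the seed are hypothetical.

```python
import random


def split_half(items, rng):
    """Shuffle a copy of items and cut it into two halves."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def jackknife_splits(positive, negative, n_repeats, seed=0):
    """Yield ((pos_train, neg_train), (pos_test, neg_test)) n_repeats times."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        pos_train, pos_test = split_half(positive, rng)
        neg_train, neg_test = split_half(negative, rng)
        yield (pos_train, neg_train), (pos_test, neg_test)
```

Each repeat uses a fresh random split, so the spread of the test results across repeats indicates how stable the performance estimate is.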

Analysis of variance (ANOVA):

• Introduce features A, B, C in the workflow of FunCoup (e.g., using PCA, selecting nodes of BN by relevance, ways of using ortholog data etc.)

• Run FunCoup with all possible combinations of absence/presence of A, B, C to produce a balanced and orthogonal ANOVA design with replicates

• Study effects of A, B, C or their combinations AxB, BxC, ..., AxBxC to see if they influence the performance significantly (as if all other effects did not exist)
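A minimal sketch of the balanced design: every combination of absence/presence of each workflow feature, repeated for a fixed number of replicates. Running FunCoup and the ANOVA itself are not shown; the names are hypothetical.

```python
from itertools import product


def factorial_design(factors, n_replicates):
    """List every on/off combination of the factors, n_replicates times each."""
    runs = []
    for levels in product((False, True), repeat=len(factors)):
        for replicate in range(n_replicates):
            run = dict(zip(factors, levels))
            run["replicate"] = replicate
            runs.append(run)
    return runs


design = factorial_design(["A", "B", "C"], 2)
```

Three factors with two replicates give 2^3 * 2 = 16 runs, and each factor is present in exactly half of them, which is what makes the design balanced and orthogonal.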

Page 30: FunCoup data integration and networks of functional coupling in eukaryotes Andrey Alexeyenko

Estimating quality of prediction

Sensitivity: TP / (TP + FN)

1 - Specificity: FP / (FP + TN)

Individual points represent varying cut-offs
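A minimal sketch of how such a curve is built: each distinct score serves as a cut-off, and pairs scoring at or above it count as predicted. The scores are hypothetical.

```python
def roc_points(scores_pos, scores_neg):
    """Return (1 - specificity, sensitivity) for every distinct score cut-off."""
    points = []
    cutoffs = sorted(set(scores_pos) | set(scores_neg), reverse=True)
    for cutoff in cutoffs:
        tp = sum(s >= cutoff for s in scores_pos)  # coupled pairs predicted
        fp = sum(s >= cutoff for s in scores_neg)  # uncoupled pairs predicted
        points.append((fp / len(scores_neg), tp / len(scores_pos)))
    return points


# Hypothetical scores for two coupled and two uncoupled pairs.
curve = roc_points([3, 2], [2, 1])
```

Lowering the cut-off moves along the curve from the strict end (few predictions) to the permissive end (everything predicted).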
