Building biological networks from diverse genomic data Chad Myers Department of Computer Science, Lewis-Sigler Institute for Integrative Genomics Princeton University PRIME Workshop on Pathway Databases and Modeling Tools June 16, 2006



Building biological networks from diverse genomic data

Chad Myers

Department of Computer Science, Lewis-Sigler Institute for Integrative Genomics

Princeton University

PRIME Workshop on Pathway Databases and Modeling Tools June 16, 2006

2

Motivation: building biological networks from experimental data

Explosion of functional genomic DATA

KNOWLEDGE of components and inter-relationships that lead to function

Find missing pathway components

Detect uncharacterized crosstalk between pathways

Discover novel pathways

3

Motivation: building biological networks from experimental data

Functional genomic data are noisy

How can we harness this information without sacrificing precision?

4

Directed network discovery: involving the biologist in the search process

Previous approaches to network analysis from genomic data:

largely undirected global approaches that detect interesting network features

Incorporating expert direction can:

Improve sensitivity and precision by using context information

Focus on relevant information for biologist user (allows interactivity)

Figure: two-hybrid interaction network, yeast (SH3 domain), Boone lab

Previous work: Bader et al. (2003), Asthana et al. (2004), Yamanashi et al. (2004, 2005), Kato et al. (2005)

5

bioPIXIE system overview

bioPIXIE: Pathway Inference from eXperimental Interaction Evidence

6

Overview

How do we integrate heterogeneous evidence?

Expert-driven network discovery

Making it usable: practical visualization and other interface considerations

Does it work? (evaluation experiments and biological validation)

Challenges/opportunities and future work

7

Heterogeneous data integration

Diverse forms of data: what’s a unifying framework?

Variable coverage, reliability, and relevance

Integration scheme should utilize information in data when available, but be robust when missing

physical binding

genetic interaction

cellular localization

expression

sequence (TF motifs, coding, …)

Bayes net

Map to associations of genes/proteins

8

Bayes net for evidence integration

Figure: Bayes net in which the "Functional Relationship" node is connected to the evidence nodes: microarray correlation, shared transcription factors, purified complex, affinity precipitation, two-hybrid, synthetic lethality, synthetic rescue, co-localization

We infer: P(protein i is functionally related to protein j | evidence)

Input evidence: grouped by lab (source) and by type

Structure: Naïve Bayes (~60 nodes) (also tried TAN)

CPTs: learned from GO gold standard

Fully connected, weighted graph of proteins
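The naïve Bayes integration described on this slide can be illustrated with a minimal sketch. The function name, the two evidence names, and all probabilities below are hypothetical, chosen only for illustration; the talk states that the real conditional probability tables are learned from a GO gold standard and that the network has about 60 nodes. Evidence that is missing for a protein pair is simply left out of the product, which is one simple way to stay robust to missing data.

```python
def posterior_related(prior, cpts, observed):
    """P(functionally related | evidence) under a naive Bayes model.

    prior    -- P(related) before seeing any evidence
    cpts     -- {evidence_name: (P(obs=1 | related), P(obs=1 | unrelated))}
    observed -- {evidence_name: 0 or 1}; missing evidence is simply absent
    """
    like_rel, like_unrel = prior, 1.0 - prior
    for name, value in observed.items():
        p_rel, p_unrel = cpts[name]
        # Multiply in the likelihood of this observation under each hypothesis.
        like_rel *= p_rel if value else 1.0 - p_rel
        like_unrel *= p_unrel if value else 1.0 - p_unrel
    # Normalize so the two hypotheses sum to one.
    return like_rel / (like_rel + like_unrel)


# Hypothetical numbers, for illustration only (not from the talk).
cpts = {
    "two_hybrid": (0.20, 0.01),
    "coexpression": (0.50, 0.10),
}
p = posterior_related(0.05, cpts, {"two_hybrid": 1})
```

Computing this posterior for every protein pair would give the edge weights of the fully connected graph mentioned above.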

9

Overview

How do we integrate heterogeneous evidence?

Expert-driven network discovery

Making it usable: practical visualization and other interface considerations

Does it work? (evaluation experiments and biological validation)

Challenges/opportunities and future work

10

Expert-driven network discovery

Local search in the PPI network centered at the query

Which proteins should we extract as a single, functionally coherent group?

Should consider: confidence in links and topology surrounding query group

11

Extracting relevant proteins

Basic idea: compute expected linkage to query set

$e_{ij}$ = P(protein i is functionally related to protein j | evidence)

$X_{ij}$: binary RV with prob. $e_{ij}$

$S_Q(p_i)$: # of links from protein i to query set, Q

Find proteins that maximize:

$$E[S_Q(p_i)] = E\left[\sum_{p_j \in Q} X_{ij}\right] = \sum_{p_j \in Q} E[X_{ij}] = \sum_{p_j \in Q} e_{ij}$$
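The expected-linkage score reduces to summing the pairwise posteriors between a candidate protein and each query protein. A minimal sketch follows, assuming the posteriors are stored in a dictionary keyed by unordered protein pairs; the function name, the data layout, and the toy values are illustrative assumptions, not part of bioPIXIE.

```python
def expected_linkage(e, query):
    """Return {protein: expected number of links to the query set}.

    e     -- dict mapping frozenset({a, b}) to the posterior that a and b
             are functionally related; absent pairs are treated as 0
    query -- set of query protein names
    """
    proteins = {p for pair in e for p in pair}
    return {
        p: sum(e.get(frozenset((p, q)), 0.0) for q in query)
        for p in proteins - set(query)
    }


# Toy posteriors, for illustration only.
e = {
    frozenset(("A", "X")): 0.9,
    frozenset(("B", "X")): 0.8,
    frozenset(("A", "Y")): 0.3,
    frozenset(("Y", "Z")): 0.95,
}
scores = expected_linkage(e, {"A", "B"})
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph, Z scores zero despite its strong link to Y, because only direct links to the query are counted.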

What about indirect links to the query set?

12

Graph search: handling indirect links

Solution: iterative expanding search where indirect links to the query through high confidence neighbors are counted
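The slide does not spell out the search procedure, so the following is only one possible reading of it. In this sketch, proteins whose expected linkage to the current set clears a threshold are promoted into a working set, and the remaining candidates are then rescored against the enlarged set. The parameters `rounds`, `promote`, and `min_score` are assumptions introduced for illustration.

```python
def expanding_search(e, query, rounds=2, promote=1, min_score=1.0):
    """Rank non-query proteins, counting links through promoted neighbors.

    e         -- dict mapping frozenset({a, b}) to link confidence
    query     -- set of query proteins
    rounds    -- maximum number of expansion rounds (assumed parameter)
    promote   -- proteins promoted per round (assumed parameter)
    min_score -- expected linkage needed for promotion (assumed parameter)
    """
    proteins = {p for pair in e for p in pair}
    working = set(query)

    def score(p):
        # Expected number of links from p to the current working set.
        return sum(e.get(frozenset((p, q)), 0.0) for q in working)

    for _ in range(rounds):
        scored = sorted(((score(p), p) for p in proteins - working), reverse=True)
        strong = [p for s, p in scored[:promote] if s >= min_score]
        if not strong:
            break  # no high-confidence neighbor left to expand through
        working.update(strong)

    # Final ranking: highest score first, name as a deterministic tie-break.
    return sorted(proteins - set(query), key=lambda p: (-score(p), p))


# Toy link confidences, for illustration only.
e = {
    frozenset(("A", "X")): 0.9,
    frozenset(("B", "X")): 0.8,
    frozenset(("X", "Z")): 0.85,
    frozenset(("A", "W")): 0.3,
}
direct = expanding_search(e, {"A", "B"}, rounds=0)
expanded = expanding_search(e, {"A", "B"})
```

In this toy graph, Z has no direct link to the query but is strongly linked to the high-confidence neighbor X, so it moves ahead of the weakly linked W once expansion is enabled.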

13

Overview

How do we integrate heterogeneous evidence?

Expert-driven network discovery

Making it usable: practical visualization and other interface considerations

Does it work? (evaluation experiments and biological validation)

Challenges/opportunities and future work

14

Making bioPIXIE usable

Guiding principles:

Accessibility (users can access most recent data with little effort)

Simplicity vs. flexibility

Drill-down (details, e.g. supporting exp. data, hidden until requested)

Browseable

15

Graph visualization

16

Overview

How do we integrate heterogeneous evidence?

Expert-driven network discovery

Making it usable: practical visualization and other interface considerations

Does it work? (evaluation experiments and biological validation)

Challenges/opportunities and future work

17

Evaluation experiments

Recovering known network components:

How much does integration help?

Results averaged over 31 pathways, processes, and complexes (KEGG, GO, MIPS)

Use 10 random proteins as the query set and try to recover the remaining members
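The hold-out protocol on this slide can be sketched as follows. The choice of average precision as the score is an assumption made here for illustration; the slide itself does not name the metric. `rank_fn` is a hypothetical stand-in for whatever search method is being evaluated.

```python
import random


def average_precision(ranked, positives):
    """Mean of the precision values at the rank of each recovered positive."""
    hits, total = 0, 0.0
    for rank, protein in enumerate(ranked, start=1):
        if protein in positives:
            hits += 1
            total += hits / rank
    return total / len(positives) if positives else 0.0


def holdout_trial(members, rank_fn, n_query=10, seed=0):
    """Use n_query random members as the query; score recovery of the rest.

    members -- set of proteins in one known pathway, process, or complex
    rank_fn -- callable taking the query set and returning a ranked list
    """
    rng = random.Random(seed)
    query = set(rng.sample(sorted(members), n_query))
    held_out = set(members) - query
    # Query proteins are given, so they do not count as recoveries.
    ranked = [p for p in rank_fn(query) if p not in query]
    return average_precision(ranked, held_out)
```

Repeating `holdout_trial` over each known group and averaging the scores would mirror the "averaged over 31 pathways, processes, and complexes" summary.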

18

Evaluation experiments (2)

Recovering known network components:

Do naïve methods of integration/search work just as well?

Results averaged over 31 pathways, processes, and complexes (KEGG, GO, MIPS)

Use 10 random proteins as the query set and try to recover the remaining members

19

Biological validation: finding new components

S. cerevisiae uncharacterized gene, YPL077C

Predicted involvement in chromosome segregation

Using bioPIXIE to characterize unknown genes

20

Biological validation: finding new components

P-value based on blind counting: 1.98 × 10^-7 (Fisher's exact test)
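For readers unfamiliar with the test, a one-sided Fisher's exact test on a 2x2 table of counts can be computed directly from the hypergeometric distribution. This is a generic sketch: the counts behind the reported p-value are not given in the transcript, and whether the original analysis was one-sided or two-sided is not stated, so the one-sided form here is an assumption.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Returns the probability of seeing a or more counts in the top-left
    cell under the hypergeometric null with all margins held fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(
        comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1)
    )
    return tail / comb(n, row1)
```

Here the rows could be, for example, mutant versus wild-type cells and the columns cells with versus without a scored phenotype; that labeling is illustrative only.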

21

(Helmut Pospiech)

Biological validation: novel links between pathways

DNA replication initiation:

Cdc7: “switch” that starts replication (activated by Dbf4)

Linked to Hsp90 complex by our method

Hsp90 (yeast: hsc82, hsp82): cytosolic molecular chaperone that participates in the folding of several signaling kinases and hormone receptors

22

Genetic analysis of DNA replication-Hsp90 link

Figure: strains compared at RT, 30°C, and 37°C (10^5 cells). Panels: (1) wt, dbf4Δ, hsp82Δ, dbf4Δ hsp82Δ; (2) wt, dbf4Δ, hsc82Δ, dbf4Δ hsc82Δ; (3) wt, dbf4Δ, cpr7Δ, dbf4Δ cpr7Δ

YKO Dbf4 vs. hsp82, hsc82 and co-chaperones: cpr7, sti1, cdc37

23

Overview

How do we integrate heterogeneous evidence?

Expert-driven network discovery

Making it usable: practical visualization and other interface considerations

Does it work? (evaluation experiments and biological validation)

Challenges/opportunities and future work

24

Practical challenges/opportunities

Visualizing complex networks of interactions in a meaningful way

how does it scale with added data?

easy user navigation around the network

Data-centric vs. established knowledge views

How do we overlay current knowledge of pathways with predictions derived from experimental data?

25

Future work

An observation:

The more specific we can be about the end goal, the better the accuracy of our prediction

26

Future work

Exploiting relevance and reliability variation: context-specific integration

27

Summary

bioPIXIE can facilitate precise network discovery from experimental data using:

Bayesian data integration

Expert-directed search

Web-based dynamic interface

bioPIXIE is an effective tool for browsing genomic evidence and generating specific, testable hypotheses

http://pixie.princeton.edu

28

Acknowledgements

http://pixie.princeton.edu

Olga Troyanskaya

Drew Robson

Adam Wible

Kara Dolinski

Camelia Chiriac

Matt Hibbs

Curtis Huttenhower

David Botstein Lab

Leonid Kruglyak Lab

Thank you!

29

Evaluation experiments (3): what about noise in the query set?

Figure axes: AUPRC; # of random proteins out of 20 total query proteins

31

Hydroxyurea sensitivity (replication inhibitor)

Figure: strains compared at 30°C and 37°C with HU at 0 mM, 50 mM, and 100 mM (10^6 cells). Strain labels: wt, cpr7Δ, sti1Δ, dbf4Δ, hsp8…, hsc8…, and dbf4Δ combined with each of cpr7Δ, sti1Δ, hsp8…, hsc8…

32

Is this interaction specific to DNA replication?

Figure: strains compared at 37°C (10^6 cells). Strain labels: wt, cpr7Δ, sti1Δ, dbf4Δ, hsp8…, hsc8…, and dbf4Δ combined with each of cpr7Δ, sti1Δ, hsp8…, hsc8…

MMS treatment has no apparent effect at RT, 30°C or 37°C (shown)

MMS sensitivity (induces DNA damage)

Conclusions:

The Hsp90 complex plays a specific role in DNA replication

Hsc82 and Hsp82 do not have identical functions

Possible new link between signaling cascades, stress, and DNA replication

Our system generates specific, testable hypotheses
