Making sense of large amounts of molecular data


Page 1: Making sense of large amounts of molecular data

Making sense of large amounts of molecular data

Jason E. McDermott, PhD
Research Scientist

Computational Biology and Bioinformatics Group
Pacific Northwest National Laboratory

Page 2: Making sense of large amounts of molecular data

Proteins

Nucleic Acids

Macromolecular Complex

How do components of biological systems interact to produce behavior?

Page 3: Making sense of large amounts of molecular data

Molecular pathways

mTOR pathway
EGFR pathway

http://biocarta.com

Page 4: Making sense of large amounts of molecular data

A Mammoth Problem

Page 5: Making sense of large amounts of molecular data

Scientific Method Overview


Hypothesis

Experimental design

Data generation

Analysis/modeling

Predictions

Interpretation


Page 6: Making sense of large amounts of molecular data

Circumstantial Evidence

Traditional experimental approach
  Cigarette butt on street
  Neighbor was eyewitness to crime
  Missing jewelry from the house
  Fingerprints on doorknob

High-throughput experimental approach
  Cigarette sales in city
  Testimony from everyone on the block
  All diamonds sold over last year in 10 mile radius
  Fingerprints on every surface in the house

Page 7: Making sense of large amounts of molecular data

Problem
  New methods generating mountains of data
  Very complex systems
  Traditional methods fail in some cases
  Progress will be made through better use of this data

Objectives
  Formulate hypotheses for further investigation
  Identify gene/protein ‘targets’
  Identify pathways that drive disease
  Develop systems-level biological understanding

Page 8: Making sense of large amounts of molecular data

What is a ‘target’?
  ‘Critical nodes’
  Regulators of important processes
  Outcome of modeling (a prediction) that can be used to formulate a hypothesis

What are targets used for?
  Mechanistic understanding of disease processes
  Potential biomarkers of disease
  Potential therapeutic treatments: drug development

Page 9: Making sense of large amounts of molecular data

Examples I’ll be talking about
  Bacterial virulence (Salmonella Typhimurium)
  Viral pathogenesis (avian flu and SARS)
  Ovarian cancer

Approaches I’ll be talking about
  Machine learning
  Biological networks
  Data integration

Page 10: Making sense of large amounts of molecular data

Salmonella Typhimurium

Pathogen / Host

[Figure: Salmonella Typhimurium pathogen-host diagram. Labels: LPS, TLR4, MEK, ERK, Egr-1, iNOS, NRAMP; pH, Mg2+, Fe2+, ROS/RNS; ssrA/B, phoP/Q, ompR/envZ, ydgT; SPI1, SPI2-T3S, SCV; effectors (e.g. SifA, SlrP, SseJ, SspH2). Process labels: bacterial detection, host defense, environmental response, virulence activation, invasion, bacterial survival, environmental modulation, pathogen directed, host directed.]

Page 11: Making sense of large amounts of molecular data

Karou Geddes

Type-III secretion system secreted effectors

SlrP, SspH2

SseI, SseJ, SifA, SifB, SpvB

SseK-1, SopD-1

InvJ, SipC

+25 other known effectors
+??? other unknown effectors

http://en.wikipedia.org/

Page 12: Making sense of large amounts of molecular data

Overview of the SVM-based Identification and Evaluation of Virulence Effectors (SIEVE) Method

Page 13: Making sense of large amounts of molecular data

SVM-based Discrimination

[Figure: positive and negative examples plotted against two axes, D1 and D2.]
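As a rough illustration of SVM-based discrimination between positive and negative examples, here is a minimal sketch in plain Python. It fits a linear SVM by batch sub-gradient descent on the regularized hinge loss. This is not the SIEVE implementation: the function names, the two-feature toy data (standing in for the D1 and D2 axes), and the hyperparameters are all assumptions made for the example.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=100):
    """Fit a linear SVM (weights w, bias b) by batch sub-gradient descent.

    Minimizes lam/2 * |w|^2 + mean(max(0, 1 - y * (w.x + b))).
    Labels must be +1 (positive) or -1 (negative).
    """
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        # Gradient of the regularization term.
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1.0:
                # Example violates the margin: add the hinge sub-gradient.
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_score(w, b, x):
    """Signed score; positive values are classified as positive examples."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Hypothetical toy data: two features per protein (D1, D2).
positives = [(2, 2), (3, 2), (2, 3), (3, 3)]
negatives = [(-2, -2), (-3, -2), (-2, -3), (-3, -3)]
points = positives + negatives
labels = [1] * len(positives) + [-1] * len(negatives)

w, b = train_linear_svm(points, labels)
print(svm_score(w, b, (2.5, 2.5)) > 0)   # scored as a positive example
print(svm_score(w, b, (-1, -4)) > 0)     # scored as a negative example
```

In practice a method like this would use many sequence-derived features and a tested SVM library; the sketch only shows how a separating boundary yields a score that can be ranked.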

Page 14: Making sense of large amounts of molecular data

SIEVE Validation Using CyaA Fusions

[Figure: Secretion versus SIEVE score. Axes: CyaA Activity (relative to SrfH) and SIEVE Z-score.]

McDermott, et al. 2011. Infection and Immunity. 79(1):23-32
Niemann, et al. 2011. Infection and Immunity. 79(1):33-43

Page 15: Making sense of large amounts of molecular data

Biological Networks

Types of networks
  Regulatory networks
  Protein-protein interaction networks
  Biochemical reaction networks
  Association networks

Network
  Node = gene/protein or other component
  Edge = inferred relationship between components

McDermott JE, et al. 2010. Drug Markers, 28(4):253-66.

Page 16: Making sense of large amounts of molecular data

Merging disparate observations of a system to produce a single, more informative view

[Figure: data types (SNVs, CNVs, mRNA, miRNA, methylation, protein, phosphorylation, metabolome) and analyses (genome comparison, pathway enrichment, LEAP, network analysis).]

Page 17: Making sense of large amounts of molecular data

Can we infer a relationship between two genes or proteins based on their expression profiles over a large number of different conditions?

Network inference method

[Figure: panels A, B, and C; axes labeled "conditions" and "gene".]

Faith, J., et al. “Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles.” 2007. PLoS Biology 5:e8
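The question on this slide, whether a relationship between two genes can be inferred from their expression profiles across many conditions, can be sketched with a simple association network. The example below links two genes when the absolute Pearson correlation of their profiles passes a threshold. This is a deliberately simpler stand-in, not the method of Faith et al., which is based on mutual information with a background correction. The gene names, expression values, and the 0.9 threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat profile carries no association signal
    return cov / sqrt(var_a * var_b)


def infer_edges(profiles, threshold=0.9):
    """Return (gene1, gene2, r) for every pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression values: one list per gene, one entry per condition.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}

for g1, g2, r in infer_edges(profiles):
    print(g1, g2, round(r, 3))
```

Only geneA and geneB are linked here, because their profiles rise together across conditions. An edge of this kind is an inferred association, not evidence of direct regulation.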

Page 18: Making sense of large amounts of molecular data

What are networks useful for?

Networks can be used for:
  Pretty figures
  Hypothesis generation
  Functional modules and their organization
  Topological identification of target critical nodes
  Predicting future states of the network

Networks are NOT useful for:
  Final mechanistic insight
  Fine distinction of types of interactions between components
  Causality

Page 19: Making sense of large amounts of molecular data

Yu H et al. PLoS Comp Biol 2007, 3(4):e59

Hubs
  High centrality, highly connected
  Exert regulatory influences
  Vulnerable

Bottlenecks
  High betweenness
  Regulate information flow within network
  Removal could partition network
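To make the hub and bottleneck definitions concrete, here is a minimal sketch that computes degree (for hubs) and betweenness centrality (for bottlenecks) on a small undirected network, using Brandes' algorithm for unweighted graphs. The toy network and node names are hypothetical: two fully connected clusters joined only through a single "bridge" node.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    """Number of neighbors per node; high values indicate hubs."""
    return {node: len(neigh) for node, neigh in adj.items()}


def betweenness(adj):
    """Betweenness centrality (unnormalized) via Brandes' algorithm."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate path dependencies from the farthest nodes.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected shortest path was counted from both endpoints.
    return {v: c / 2.0 for v, c in score.items()}


# Hypothetical network: two 4-node cliques joined through one bridge node.
cluster_a = ["a1", "a2", "a3", "a4"]
cluster_b = ["b1", "b2", "b3", "b4"]
edges = [(u, v) for i, u in enumerate(cluster_a) for v in cluster_a[i + 1:]]
edges += [(u, v) for i, u in enumerate(cluster_b) for v in cluster_b[i + 1:]]
edges += [("a1", "bridge"), ("bridge", "b1")]

adj = build_adjacency(edges)
deg = degree(adj)
btw = betweenness(adj)
print("degree of bridge:", deg["bridge"])
print("betweenness of bridge:", btw["bridge"])
```

In this toy network the bridge node has the lowest degree but the highest betweenness, so it is a bottleneck without being a hub, and removing it would partition the network.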

Page 20: Making sense of large amounts of molecular data


Bottlenecks in Salmonella are essential for virulence

McDermott J, et al. 2009. J. Comp. Bio. 16(2):169-180

Page 21: Making sense of large amounts of molecular data


Discovery of a novel class of effectors by integrating transcriptomic and proteomic networks

Page 22: Making sense of large amounts of molecular data

Respiratory virus pathogenesis

What are the causes of pathogenesis in respiratory viruses?

Goal: Identify and prioritize potential mediators of pathogenesis that are common and unique to influenza and SARS
Goal: Identify and prioritize potential mediators of high-pathogenicity viral infection

Approach:
  Mouse models of infection
  Transcriptomics
  Network-based approach
  Topological network analysis to define targets
  Validation studies

Page 23: Making sense of large amounts of molecular data

Ido1/Tnfrsf1b Module
Kepi Module

SARS-CoV-infected Wild type Mouse Inferred Network

Page 24: Making sense of large amounts of molecular data

Hypotheses for Validation

KO Mouse

Infection

Phenotype:  Survival  Death    Negative  Negative
Network:    Altered   Altered  Altered   Negative

Page 25: Making sense of large amounts of molecular data

Predicted targets abrogate influenza pathogenesis

Tnfrsf1b (a.k.a. Tnfr2)
  Predicted common regulator for influenza and SARS pathogenesis
  Tnfa binding
  Negatively regulate TNFR1 signaling, which is proinflammatory
  Promote endothelial cell activation/migration
  Activation and proliferation of immune cells

[Figure: panels "H5N1 infection" and "SARS infection". Y-axis: Percent Starting Weight. Legend: B6, Tnfrsf.]

Page 26: Making sense of large amounts of molecular data


Page 27: Making sense of large amounts of molecular data

Biological Drivers in Ovarian Cancer

What genomic characteristics of ovarian cancer are executed at the protein level?
  Can protein expression be used to identify the most important genomic changes?

How can we improve the survival of women with ovarian cancer?
  Can proteomics provide insight into the biological processes associated with poor survival?
  Can we use a pathway-based approach to suggest novel therapeutic targets?

Page 28: Making sense of large amounts of molecular data

Proteomics

Chemoresistance in ovarian and breast cancer
Tumor samples from The Cancer Genome Atlas
  Depth of genomic characterization
  Many tumors
Proteomics and phosphoproteomics characterization of these tumors
Pathway/network analysis to reveal patterns and biomarkers
Integrate data into single view of the system

Page 29: Making sense of large amounts of molecular data

Clustering of Proteins and Phosphoproteins

[Figure: clustering panels for Proteins and Phosphoproteins. Annotation tracks: iTRAQ Batch, Proteomic Subtypes, Transcriptomic Subtype. Scale: Log2 abundance relative to universal reference pool.]

Page 30: Making sense of large amounts of molecular data

A Subset of Proteins and Phosphopeptides Correlate with Patient Survival

Linear regression of abundance versus days-to-death suggests possible correlations with patient survival

[Figure panels: Protein Abundance; Phosphorylation (normalized to abundance).]
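The regression of abundance against days-to-death can be sketched as an ordinary least-squares fit for one protein at a time. The function name and the five-patient toy data below are hypothetical, and the sketch ignores censoring (patients still alive have no days-to-death value), which a real analysis would need to handle.

```python
from math import sqrt


def linear_fit(x, y):
    """Least-squares fit y = slope * x + intercept, plus Pearson r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical data for one protein: days-to-death and log2 abundance
# for five patients.
days_to_death = [400, 600, 800, 1000, 1200]
log2_abundance = [1.0, 0.5, 0.0, -0.5, -1.0]

slope, intercept, r = linear_fit(days_to_death, log2_abundance)
print(slope, intercept, r)
```

A negative slope, as in this toy example, means higher abundance goes with shorter survival. Repeating the fit for every protein and phosphopeptide gives a ranked list of candidates rather than proof of an effect.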

Page 31: Making sense of large amounts of molecular data

PDGFRB Pathway

[Figure legend: Correlated with short survival; Correlated with long survival; Weak correlation; Not observed. Measurements shown: mRNA abundance, protein abundance, phosphorylation.]

Page 32: Making sense of large amounts of molecular data

Integrated Co-abundance Network for Ovarian Cancer

Module 1 (short survival)
  AP-1 pathway
  NFAT TF pathway

Module 2 (long survival)
  CD8 T cell receptor downstream pathway
  Il12-2 pathway
  Il12-STAT4 pathway

[Figure legend: Correlated with short survival; Correlated with long survival. Node types: Protein, Phosphorylated protein, mRNA.]

Page 33: Making sense of large amounts of molecular data

Survival Analysis from Network Targets

Kaplan-Meier plots from integrated CNV, mRNA expression, and mutations

P-value 0.007: IGKV1-5, LAX1, AMPD1, IGHM, SLAMF7
P-value 0.005: ATF3, DUSP1, FOSB, ZFP36

[Figure: two Kaplan-Meier panels. X-axis: Months survival. Y-axis: % survival.]
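For readers unfamiliar with Kaplan-Meier plots, the following minimal sketch computes the Kaplan-Meier survival estimate for one patient group. The function name and the toy follow-up data are hypothetical. It does not compute the P-values shown on the slide, which would require a separate comparison between groups (commonly a log-rank test).

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate for one group.

    times:  follow-up time per patient (e.g. months)
    events: True if death was observed at that time, False if censored
    Returns [(time, survival_probability)] at each observed death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= removed
    return curve


# Hypothetical follow-up data for five patients (months, death observed?).
months = [6, 12, 12, 20, 30]
died = [True, True, False, True, False]

for t, s in kaplan_meier(months, died):
    print(t, round(s, 2))
```

Censored patients count as at risk up to their last follow-up but never trigger a drop in the curve, which is why the estimate steps down only at the three observed deaths.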

Page 34: Making sense of large amounts of molecular data

Conclusions

Several effective ways of big data integration
  Machine learning approaches
  Biological network representation
  Data integration

Understanding of disease requires system-level views
Relatively simple approaches can yield novel insight
Combining different views of system can improve insight
Data analysis and modeling is a starting point, not an end point

Page 35: Making sense of large amounts of molecular data

Acknowledgements

SysBEP (http://www.sysbep.org)
  NIAID/NIH Y1-AI-8401
  PI: Josh Adkins, PNNL

Systems Virology (http://www.systemsvirology.org)
  NIAID/NIH HHSN272200800060C
  PI: Michael Katze, UW

Clinical Proteomics Tumor Analysis Consortium
  NCI/NIH 1U24CA160019
  PIs: Richard Smith, PNNL; Karin Rodland, PNNL

Many, many people in these and other projects who helped with this work and made it possible

Page 36: Making sense of large amounts of molecular data

About Me

Email: [email protected]
http://www.jasonya.com/wp/about/
Twitter: @BioDataGanache
Blog: The Mad Scientist’s Confectioner’s Club
http://www.jasonya.com/wp/