making sense of large amounts of molecular data

Post on 24-Feb-2016


Making sense of large amounts of molecular data

Jason E. McDermott, PhD
Research Scientist

Computational Biology and Bioinformatics Group
Pacific Northwest National Laboratory


Proteins

Nucleic Acids

Macromolecular Complex

How do components of biological systems interact to produce behavior?


Molecular pathways

mTOR pathway
EGFR pathway

http://biocarta.com

A Mammoth Problem

Scientific Method Overview


Hypothesis

Experimental design

Data generation

Analysis/modeling

Predictions

Interpretation

Hypothesis


Circumstantial Evidence

Traditional experimental approach
Cigarette butt on street
Neighbor was eyewitness to crime
Missing jewelry from the house
Fingerprints on doorknob

High-throughput experimental approach
Cigarette sales in city
Testimony from everyone on the block
All diamonds sold over last year in 10 mile radius
Fingerprints on every surface in the house


Problem
New methods generating mountains of data
Very complex systems
Traditional methods fail in some cases
Progress will be made through better use of this data

Objectives
Formulate hypotheses for further investigation
Identify gene/protein ‘targets’
Identify pathways that drive disease
Develop systems-level biological understanding


What is a ‘target’?

‘Critical nodes’
Regulators of important processes
Outcome of modeling (a prediction) that can be used to formulate a hypothesis

What are targets used for?
Mechanistic understanding of disease processes
Potential biomarkers of disease
Potential therapeutic treatments: drug development


Examples I’ll be talking about
Bacterial virulence (Salmonella Typhimurium)
Viral pathogenesis (avian flu and SARS)
Ovarian cancer

Approaches I’ll be talking about
Machine learning
Biological networks
Data integration

[Figure: Salmonella Typhimurium pathogen-host interaction diagram. Labels: LPS, TLR4, MEK, ERK, Egr-1, iNOS, NRAMP, pH, Mg2+, Fe2+, ROS/RNS, SPI1, SPI2-T3S, SCV, ssrA/B, phoP/Q, ompR/envZ, ydgT, Effectors (e.g. SifA, SlrP, SseJ, SspH2). Section labels: Bacterial detection, Host defense, Environmental response, Virulence activation, Bacterial survival, Invasion, Effectors, Environmental modulation, Pathogen directed, Host directed.]

Karou Geddes

Type-III secretion system secreted effectors

SlrP, SspH2, SseI, SseJ, SifA, SifB, SpvB, SseK-1, SopD-1, InvJ, SipC

+25 other known effectors
+??? other unknown effectors

http://en.wikipedia.org/

Overview of the SVM-based Identification and Evaluation of Virulence Effectors (SIEVE) Method

[Figure: SVM-based discrimination of positive and negative examples; axis labels D1 and D2]
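The slides do not give SIEVE's features, kernel, or training data, so the following is only a minimal sketch of the general idea of SVM-based discrimination: a linear SVM fitted by subgradient descent on the regularized hinge loss. The function names (`train_linear_svm`, `svm_score`), the hyperparameters, and the two-feature toy data are all assumptions made for illustration.

```python
def train_linear_svm(points, labels, lam=0.01, eta=0.1, epochs=200):
    """Fit a linear SVM by subgradient descent on the L2-regularized hinge loss.

    labels must be +1 (positive class) or -1 (negative class).
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # L2 regularization shrinks the weights on every step.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Points inside the margin (or misclassified) push the boundary away.
            if margin < 1.0:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
                b += eta * y
    return w, b


def svm_score(w, b, x):
    """Signed score: positive values fall on the positive side of the boundary."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Toy two-feature data standing in for the D1/D2 axes; all values are made up.
positives = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (3.0, 2.0)]
negatives = [(-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0), (-3.0, -2.0)]
points = positives + negatives
labels = [1] * len(positives) + [-1] * len(negatives)

w, b = train_linear_svm(points, labels)
print(svm_score(w, b, (2.5, 2.5)) > 0)    # is this candidate on the positive side?
print(svm_score(w, b, (-2.5, -2.5)) > 0)  # is this candidate on the positive side?
```

In a real setting each point would be a feature vector derived from a protein, and the signed score could be used to rank candidates rather than only to label them.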

SIEVE Validation Using CyaA Fusions

[Figure: Secretion versus SIEVE score. Axes: CyaA Activity (relative to SrfH) and SIEVE Z-score]

McDermott, et al. 2011. Infection and Immunity. 79(1):23-32
Niemann, et al. 2011. Infection and Immunity. 79(1):33-43

Biological Networks

Types of networks
Regulatory networks
Protein-protein interaction networks
Biochemical reaction networks
Association networks

Network
Node = gene/protein or other component
Edge = inferred relationship between components


McDermott JE, et al. 2010. Disease Markers, 28(4):253-66.

Merging disparate observations of a system to produce a single, more informative view


[Figure: data integration diagram. Labels: SNVs, CNVs, mRNA, methylation, protein, phosphorylation, miRNA, metabolome, Genome, Comparison, Pathway enrichment, LEAP, Network analysis]

Can we infer a relationship between two genes or proteins based on their expression profiles over a large number of different conditions?

[Figure: panels A, B, and C]

Faith, J., et al. “Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles.” 2007. PLoS Biology 5:e8

Network inference method

[Figure: expression matrix with axes labeled gene and conditions]
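The question above can be illustrated with a deliberately simple stand-in: link two genes when their expression profiles across conditions are strongly correlated. This is not the method of Faith et al. (which the slide cites but does not describe in detail); it is a plain Pearson-correlation threshold, and the gene names, values, and the 0.9 cutoff are invented for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sx * sy)


def infer_edges(profiles, threshold=0.9):
    """Return (gene_a, gene_b, r) for every pair with |r| >= threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges


# Rows are genes, columns are conditions; all values are hypothetical.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],  # rises with geneA
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],  # no clear relationship
}

edges = infer_edges(profiles)
for a, b, r in edges:
    print(a, b, round(r, 3))
```

An edge found this way is an inferred association only; it says nothing about which gene regulates which.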


What are networks useful for?

Networks can be used for:
Pretty figures
Hypothesis generation
Functional modules and their organization
Topological identification of target critical nodes
Predicting future states of the network

Networks are NOT useful for:
Final mechanistic insight
Fine distinction of types of interactions between components
Causality

Yu H et al. PLoS Comp Biol 2007, 3(4):e59

Hubs
High centrality, highly connected
Exert regulatory influences
Vulnerable

Bottlenecks
High betweenness
Regulate information flow within network
Removal could partition network
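The hub and bottleneck definitions above map onto two standard graph measures: degree (number of neighbors) and betweenness centrality (how many shortest paths pass through a node). The sketch below computes both on a made-up network of two tightly connected clusters joined through a single node; the node names and edges are hypothetical, and betweenness is computed with Brandes' algorithm for unweighted, undirected graphs.

```python
from collections import deque


def build_adjacency(edges):
    """Turn an undirected edge list into a node -> set-of-neighbors map."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    """Number of neighbors per node; high values indicate hubs."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies, farthest first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected path was counted once from each endpoint.
    return {v: c / 2.0 for v, c in score.items()}


# Two fully connected clusters (a1-a4, c1-c4) joined only through node B.
edges = [
    ("a1", "a2"), ("a1", "a3"), ("a1", "a4"),
    ("a2", "a3"), ("a2", "a4"), ("a3", "a4"),
    ("c1", "c2"), ("c1", "c3"), ("c1", "c4"),
    ("c2", "c3"), ("c2", "c4"), ("c3", "c4"),
    ("a1", "B"), ("B", "c1"),
]
adj = build_adjacency(edges)
deg = degree(adj)
bc = betweenness(adj)
print("highest degree:", max(deg.values()), "degree of B:", deg["B"])
print("highest betweenness:", max(bc, key=bc.get))
```

In this toy network node B has the fewest connections but the highest betweenness, and removing it splits the network in two, which is the sense of "bottleneck" used on the slide.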


Bottlenecks in Salmonella are essential for virulence

McDermott J, et al. 2009. J. Comp. Bio. 16(2):169-180


Discovery of a novel class of effectors by integrating transcriptomic and proteomic networks

Respiratory virus pathogenesis
What are the causes of pathogenesis in respiratory viruses?
Goal: Identify and prioritize potential mediators of pathogenesis that are common and unique to influenza and SARS
Goal: Identify and prioritize potential mediators of high-pathogenicity viral infection

Approach:
Mouse models of infection
Transcriptomics
Network-based approach
Topological network analysis to define targets
Validation studies

Ido1/Tnfrsf1b Module
Kepi Module

SARS-CoV-infected Wild type Mouse Inferred Network

Hypotheses for Validation

KO Mouse Infection

Phenotype:  Survival  Death    Negative  Negative
Network:    Altered   Altered  Altered   Negative

Predicted targets abrogate influenza pathogenesis

Tnfrsf1b (a.k.a. Tnfr2)
Predicted common regulator for influenza and SARS pathogenesis
Tnfa binding
Negatively regulates TNFR1 signaling, which is proinflammatory
Promotes endothelial cell activation/migration
Activation and proliferation of immune cells


[Figure: two panels, H5N1 infection and SARS infection. Axis: Percent Starting Weight. Legend: B6, Tnfrsf]

Biological Drivers in Ovarian Cancer

What genomic characteristics of ovarian cancer are executed at the protein level?

Can protein expression be used to identify the most important genomic changes?

How can we improve the survival of women with ovarian cancer?

Can proteomics provide insight into the biological processes associated with poor survival?

Can we use a pathway-based approach to suggest novel therapeutic targets?


Proteomics

Chemoresistance in ovarian and breast cancer

Tumor samples from The Cancer Genome Atlas
Depth of genomic characterization
Many tumors

Proteomics and phosphoproteomics characterization of these tumors
Pathway/network analysis to reveal patterns and biomarkers
Integrate data into single view of the system


Clustering of Proteins and Phosphoproteins

[Figure: clustering panels for Proteins and Phosphoproteins. Annotation tracks: iTRAQ Batch, Proteomic Subtypes, Transcriptomic Subtype. Scale: Log2 abundance relative to universal reference pool]

Linear regression of abundance versus days-to-death suggests possible correlations with patient survival
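As a rough illustration of that kind of screen, the sketch below fits an ordinary least-squares line of abundance against days-to-death for a single protein and reports the slope, intercept, and Pearson correlation. The function name `linear_fit` and all numbers are made up, and the slides do not say how censored patients or multiple testing were handled, so neither is addressed here.

```python
from math import sqrt


def linear_fit(xs, ys):
    """Ordinary least-squares fit of ys on xs; returns (slope, intercept, r).

    Assumes both xs and ys vary (non-zero variance).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical values for one protein across five patients.
days_to_death = [200.0, 400.0, 600.0, 800.0, 1000.0]
log2_abundance = [1.0, 0.5, 0.0, -0.5, -1.0]

slope, intercept, r = linear_fit(days_to_death, log2_abundance)
# A negative slope means higher abundance goes with shorter survival in this toy data.
print(slope, intercept, r)
```

Running the same fit for every protein and phosphopeptide and ranking by the correlation would give a candidate list of the kind the slide describes.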

[Figure panels: Protein Abundance; Phosphorylation (normalized to abundance)]

A Subset of Proteins and Phosphopeptides Correlate with Patient Survival

PDGFRB Pathway

[Figure legend: correlated with short survival; correlated with long survival; weak correlation; not observed. Data layers: mRNA abundance, protein abundance, phosphorylation]

Module 1 (short survival)

[Figure legend: correlated with short survival; correlated with long survival. Node types: protein, phosphorylated protein, mRNA]

AP-1 pathway
NFAT TF pathway

Module 2 (long survival)

CD8 T cell receptor downstream pathway
Il12-2 pathway
Il12-STAT4 pathway

Integrated Co-abundance Network for Ovarian Cancer

P-value 0.007: IGKV1-5, LAX1, AMPD1, IGHM, SLAMF7

P-value 0.005: ATF3, DUSP1, FOSB, ZFP36

Kaplan-Meier plots from integrated CNV, mRNA expression, and mutations

[Figure: two Kaplan-Meier panels. Axes: % survival versus months survival]

Survival Analysis from Network Targets
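For readers unfamiliar with Kaplan-Meier curves, the sketch below computes the standard product-limit survival estimate from follow-up times and censoring flags. The slides do not say how patients were split into groups or which software was used, so the function name `kaplan_meier` and the five-patient data set are assumptions for illustration only.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with an observed death.

    events[i] is True if death was observed at times[i], False if the
    patient was censored (lost to follow-up or still alive) at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group everyone who leaves the study at the same time point.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up in months; False marks a censored patient.
months = [6, 12, 12, 20, 30]
died = [True, True, False, True, False]

curve = kaplan_meier(months, died)
for t, s in curve:
    print(t, round(s, 3))
```

Computing one such curve per patient group (for example, high versus low abundance of a network target) and comparing them gives plots of the kind shown on this slide; the comparison test itself is not sketched here.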


Conclusions

Several effective ways of big data integration
Machine learning approaches
Biological network representation
Data integration

Understanding of disease requires system-level views
Relatively simple approaches can yield novel insight
Combining different views of a system can improve insight
Data analysis and modeling are a starting point, not an end point


Acknowledgements

SysBEP (http://www.sysbep.org)
NIAID/NIH Y1-AI-8401
PI: Josh Adkins, PNNL

Systems Virology (http://www.systemsvirology.org)
NIAID/NIH HHSN272200800060C
PI: Michael Katze, UW

Clinical Proteomics Tumor Analysis Consortium
NCI/NIH 1U24CA160019
PIs: Richard Smith, PNNL; Karin Rodland, PNNL

Many, many people in these and other projects who helped with this work and made it possible

About Me

Email: Jason.McDermott@pnnl.gov
About: http://www.jasonya.com/wp/about/
Twitter: @BioDataGanache
Blog: The Mad Scientist’s Confectioner’s Club

http://www.jasonya.com/wp/
