Modeling Functional Genomics Datasets

CVM8890-101, Lesson 6

11 July 2007, Bindu Nanduri

Lesson 6: Functional genomics modeling II: a pathway analysis example.

Introduction to protein interaction networks

[Figure: cell fates (proliferation, differentiation, quiescence, programmed cell death, anergy, activation) for a cell and cancer, and for a CD4+ T "helper" lymphocyte and lymphoma]

AgBase protein annotation process

Input: protein identifiers or FASTA format

GORetriever: annotated proteins

GOanna: proteins with no annotations

GOSlimViewer

[Figure: Potential CD4+ T lymphocyte biological processes, with per-process percentages: proliferation, angiogenesis, apoptosis, migration, quiescence, differentiation, anergy, activation, senescence, cell cycle]

[Figure: Integrin signaling pathway; labels: AP-1, AP-1 dependent gene expression, metastasis, tumor invasion]

Hypothesis-driven data analysis

Exploration of data to identify pathways of interacting proteins

Protein-protein interaction (PPI) networks

Why study PPIs

Proteins do not function alone!

PPIs are inherent to the function of multiprotein complexes

PPIs can help infer function where functional information is available for one partner

Changes in normal PPIs can result in disease
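The idea that a PPI can suggest a function when one partner is already annotated can be sketched as a simple vote among interaction partners. This is only an illustrative sketch: the protein names, annotations, and the `infer_function` helper are hypothetical, and real tools weigh the evidence far more carefully.

```python
from collections import Counter

def infer_function(protein, interactions, annotations):
    """Return the most common annotation among a protein's annotated
    interaction partners, or None if no partner is annotated."""
    partners = set()
    for a, b in interactions:
        if a == protein:
            partners.add(b)
        elif b == protein:
            partners.add(a)
    # Each annotated partner votes once per annotation term it carries.
    votes = Counter(
        term for partner in partners for term in annotations.get(partner, ())
    )
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical toy data: P1 has no annotation of its own.
interactions = [("P1", "P2"), ("P1", "P3"), ("P4", "P1")]
annotations = {"P2": ["DNA repair"], "P3": ["DNA repair"], "P4": ["transport"]}
prediction = infer_function("P1", interactions, annotations)
```

Here two of the three partners of P1 carry the same term, so that term wins the vote.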

Types of PPI

PPI categories are based on composition, affinity and timescale of interaction

Homo- and hetero-oligomeric complexes: interactions between identical or non-identical chains

Obligate PPI: protomers do not exist as stable structures in vivo; these complexes are functionally obligate

Non-obligate PPI: protomers can exist as stable structures on their own; they may co-localize for function or already be co-localized

Obligate homodimer: Arc repressor dimer, necessary for DNA binding

Non-obligate homodimer: sperm lysin

PPI based on the lifetime of the complex: transient or permanent

Permanent interactions are stable; the proteins exist only as a complex

Transient interactions are marked by association/dissociation cycles in vivo

Weak transient interactions (sperm lysin) associate and dissociate

Strong transient interactions require a molecular trigger: the heterotrimeric G protein dissociates into G-alpha and G-beta-gamma when it binds GTP; the GDP-bound form is a trimer

Control of protein oligomerization

PPIs form a continuum between obligate and non-obligate states

Complex formation is driven by concentration and by the free energy of the complex relative to alternate states

Take home message of PPI types

PPIs form a continuum between obligate and non-obligate states

Complex formation is driven by concentration and by the free energy of the complex relative to alternate states
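How concentration and free energy together set the oligomeric state can be illustrated with a simple two-state homodimer equilibrium (2 M <-> D). This is a minimal sketch under stated assumptions: a 1 M standard state, ideal behavior, and hypothetical function names; it is not taken from the lecture.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)

def dissociation_constant(delta_g_assoc_kj, temperature_k=298.15):
    """Kd (in M) from the standard free energy of association (kJ/mol).
    A more negative free energy gives a smaller Kd (tighter complex)."""
    return math.exp(delta_g_assoc_kj / (R_KJ * temperature_k))

def fraction_in_dimer(total_conc_molar, kd_molar):
    """Fraction of protomers found in the dimer for 2 M <-> D,
    with Kd = [M]^2 / [D] and total = [M] + 2 [D]."""
    ratio = 8 * total_conc_molar / kd_molar
    monomer = kd_molar * (math.sqrt(1 + ratio) - 1) / 4
    return 1 - monomer / total_conc_molar

kd = dissociation_constant(-35.0)       # hypothetical free energy value
low = fraction_in_dimer(1e-9, kd)       # dilute protein: mostly monomer
high = fraction_in_dimer(1e-3, kd)      # concentrated protein: mostly dimer
```

The same pair of protomers therefore looks "non-obligate" when dilute and close to "permanent" when concentrated, which is one way to read the continuum described above.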

How to identify PPI

Experimental: gene coexpression, TAP assays, yeast two hybrid (Y2H), protein arrays

Computational: sequence coevolution, phylogenetic profile, gene cluster, Rosetta Stone method, text mining

PLoS Computational Biology March 2007, Volume 3 e42

Y2H Assay

Eukaryotic transcription factors have a DNA-binding domain (BD) and an activation domain (AD)

Physical association of these domains activates transcription

Create chimeric proteins fused to either the BD or the AD and transfect yeast

Gal4/LexA-based reporters

In vivo method that can detect transient PPIs

TAP Assay

The TAP tag consists of two IgG-binding domains of Staphylococcus aureus protein A and a calmodulin-binding peptide, separated by a tobacco etch virus (TEV) protease cleavage site

TAP provides direct information on protein complexes

O. Puig et al., Methods, 2001

PLoS Computational Biology March 2007, Volume 3 e42

Gene Coexpression

Expression profile similarity is measured as the correlation coefficient between the relative expression levels of two genes/proteins, or as the normalized difference between their absolute expression levels

The distribution for target proteins is compared with the distributions for random noninteracting protein pairs

Expression levels of physically interacting proteins coevolve

Coevolution of gene expression is a better predictor of protein interactions than coevolution of amino acid sequences

Good for studying permanent complexes: ribosome, proteasome
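The two similarity measures can be sketched in a few lines. This is an illustrative sketch only: the expression values are made up, and the exact form of the normalized difference, |a - b| / (a + b), is an assumption rather than the definition used in the cited review.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def normalized_difference(level_a, level_b):
    """Assumed form: absolute difference scaled by the summed levels."""
    return abs(level_a - level_b) / (level_a + level_b)

# Hypothetical relative expression levels across five conditions.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.0, 4.0, 6.0, 8.0, 10.0]   # rises and falls with gene_a
gene_c = [5.0, 3.0, 4.0, 1.0, 2.0]    # tends to move the opposite way

r_ab = pearson(gene_a, gene_b)
r_ac = pearson(gene_a, gene_c)
```

In practice the score for a candidate pair would be judged against the score distribution of random noninteracting pairs rather than against a fixed cutoff.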

PLoS Computational Biology March 2007, Volume 3 e42

Protein microarrays/chips

Protein chips are disposable arrays of microwells in silicone elastomer sheets placed on top of microscope slides

Target proteins are overexpressed, immobilized, and probed with fluorescently labeled proteins

H Zhu et al (2000) “Analysis of yeast protein kinases using protein chips” Nature Genetics 26: 283-289

Can detect PPIs between actual proteins

PLoS Computational Biology March 2007, Volume 3 e42

Database    URL/FTP    Type

DIP    http://dip.doe-mbi.ucla.edu    E,S
BIND    http://bind.ca    E,C,S
MPact/MIPS    http://mips.gsf.de/services/ppi    E,C,F
STRING    http://string.embl.de    E,P,F
MINT    http://mint.bio.uniroma2.it/mint    E,C
IntAct    http://www.ebi.ac.uk/intact    E,C
BioGRID    http://www.thebiogrid.org    E,C
HPRD    http://www.hprd.org    E,C
ProtCom    http://www.ces.clemson.edu/compbio/ProtCom    S,H
3did, Interprets    http://gatealoy.pcb.ub.es/3did/    S,H
Pibase, Modbase    http://alto.compbio.ucsf.edu/pibase    S,H
CBM    ftp://ftp.ncbi.nlm.nih.gov/pub/cbm    S
SCOPPI    http://www.scoppi.org/    S
iPfam    http://www.sanger.ac.uk/Software/Pfam/iPfam    S
InterDom    http://interdom.lit.org.sg    P
DIMA    http://mips.gsf.de/genre/proj/dima/index.html    F,S
Prolinks    http://prolinks.doe-mbi.ucla.edu/cgibin/functionator/pronav/    F
Predictome    http://predictome.bu.edu/    F

Type of data: high-throughput experimental data (E), structural data (S), manual curation (C), functional predictions (F), and interface homology modeling (H). Unit of interaction: P is protein.

PLoS Computational Biology March 2007, Volume 3 e42

PPI database comparisons

Proteins: Structure, Function and Bioinformatics 63:490-500 2006

Experimental PPI dataset overlap is small

High false positive (FP) rate in high-throughput experiments

Difficult to confirm by multiple sources
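Dataset overlap of this kind can be measured by treating each interaction as an unordered protein pair and comparing the resulting sets. The sketch below uses hypothetical toy data and a hypothetical `overlap` helper; it assumes identifiers have already been mapped to a common namespace, which is a large part of the real problem.

```python
def normalize(pairs):
    """Treat each interaction as an unordered pair, so A-B equals B-A."""
    return {frozenset(pair) for pair in pairs}

def overlap(dataset_a, dataset_b):
    """Return (number of shared interactions, Jaccard index)."""
    a, b = normalize(dataset_a), normalize(dataset_b)
    union = a | b
    shared = a & b
    jaccard = len(shared) / len(union) if union else 0.0
    return len(shared), jaccard

# Hypothetical interaction lists from two databases.
db1 = [("A", "B"), ("C", "D"), ("E", "F")]
db2 = [("B", "A"), ("G", "H")]
shared_count, jaccard = overlap(db1, db2)
```

Only the A-B interaction is reported by both toy datasets, so one of the four distinct interactions is confirmed by a second source.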

How to identify PPI

Experimental: gene coexpression, TAP assays, yeast two hybrid (Y2H), protein arrays

Computational: sequence coevolution, phylogenetic profile, gene cluster/neighborhood, Rosetta Stone method, text mining

PLoS Computational Biology March 2007, Volume 3 e43

Phylogenetic profile (PP)

Hypothesis: functionally linked and potentially interacting nonhomologous proteins co-evolve and have orthologs in the same subset of fully sequenced organisms
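A phylogenetic profile can be represented as a 0/1 vector recording whether a protein has an ortholog in each genome; proteins with matching profiles become candidates for a functional link. The sketch below is illustrative only: the genome names, presence sets, and the Hamming-distance cutoff are assumptions.

```python
from itertools import combinations

GENOMES = ["genome1", "genome2", "genome3", "genome4"]

# Hypothetical ortholog presence for three nonhomologous proteins.
ORTHOLOG_PRESENCE = {
    "protA": {"genome1", "genome3"},
    "protB": {"genome1", "genome3"},
    "protC": {"genome2", "genome3", "genome4"},
}

def build_profile(present_in, genomes):
    """1 if an ortholog exists in the genome, else 0, in genome order."""
    return tuple(int(g in present_in) for g in genomes)

def hamming_distance(p, q):
    """Number of genomes in which the two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))

def linked_pairs(presence, genomes, max_distance=0):
    """Protein pairs whose profiles differ in at most max_distance genomes."""
    profiles = {name: build_profile(s, genomes) for name, s in presence.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if hamming_distance(profiles[a], profiles[b]) <= max_distance
    ]

candidates = linked_pairs(ORTHOLOG_PRESENCE, GENOMES)
```

With an exact-match cutoff, only protA and protB share a profile and are reported as potentially linked.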

PLoS Computational Biology March 2007, Volume 3 e43

Gene Cluster, Gene Neighborhood

Genes in the gene cluster/operon are co-regulated and participate in the same biological function

PLoS Computational Biology March 2007, Volume 3 e43

Sequence Co-evolution

Interacting proteins very often co-evolve

Changes in one protein (leading to loss of function or interaction) are compensated by correlated changes in another protein

The orthologs of co-evolving proteins tend to interact, thereby making it possible to infer unknown interactions in other genomes

Co-evolution can be reflected in the similarity between the phylogenetic trees of two non-homologous interacting protein families
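Tree similarity is often approximated by correlating the pairwise evolutionary distances of the two families over the same set of species. The sketch below follows that idea with made-up distance matrices; the function names are hypothetical and the species ordering is assumed to be identical in every matrix.

```python
import math

def upper_triangle(matrix):
    """Flatten the distances above the diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def tree_similarity(dist_a, dist_b):
    """Correlation between the distance matrices of two protein families."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))

# Hypothetical distances between orthologs from the same four species.
family_a = [[0, 1, 2, 4], [1, 0, 2, 4], [2, 2, 0, 4], [4, 4, 4, 0]]
family_b = [[0, 2, 4, 8], [2, 0, 4, 8], [4, 4, 0, 8], [8, 8, 8, 0]]  # same shape, faster clock
family_c = [[0, 4, 4, 1], [4, 0, 2, 2], [4, 2, 0, 1], [1, 2, 1, 0]]  # unrelated history

score_ab = tree_similarity(family_a, family_b)
score_ac = tree_similarity(family_a, family_c)
```

A high correlation (family_a versus family_b) suggests the families evolved in step; a low or negative one (family_a versus family_c) does not.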

PLoS Computational Biology March 2007, Volume 3 e43

Rosetta Stone method

Interacting proteins/domains have homologs in other genomes that are fused into one protein chain, a Rosetta Stone protein

Gene fusion occurs to optimize co-expression of genes encoding interacting proteins.
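The search for Rosetta Stone proteins can be sketched as a lookup: two separate proteins in the query genome are linked if some protein in another genome contains homologs of both. In this sketch homology is reduced to a shared family label, which is a strong simplifying assumption, and all names are hypothetical.

```python
from itertools import combinations

def rosetta_stone_pairs(query_proteins, other_genome_proteins):
    """query_proteins: {protein name: family label}.
    other_genome_proteins: {protein name: set of family labels in the chain}.
    Returns (protein 1, protein 2, fused protein) triples."""
    predictions = []
    for (p, fam_p), (q, fam_q) in combinations(sorted(query_proteins.items()), 2):
        if fam_p == fam_q:
            continue  # homologous pair; a fusion would not be informative
        for fused, families in other_genome_proteins.items():
            if fam_p in families and fam_q in families:
                predictions.append((p, q, fused))
    return predictions

# Hypothetical toy genomes.
query = {"protA": "famA", "protB": "famB", "protC": "famC"}
other = {"fusedAB": {"famA", "famB"}, "single": {"famC"}}
links = rosetta_stone_pairs(query, other)
```

Only protA and protB have homologs fused into one chain (fusedAB), so they are the only predicted pair.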

Text Mining

Utilizing the wealth of publicly available data

Search Medline or PubMed for words or word combinations

Co-occurrence of words is a simple metric, but it is prone to high false positive rates

Natural Language Processing (NLP) methods are more specific: they look for statements such as “A binds to B”, “A interacts with B”, “A associates with B”

Such statements are difficult to detect, so NLP has a higher false negative rate

Normally requires a list of known gene names or protein names for a given organism
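The trade-off between co-occurrence and pattern-based extraction can be shown on a few sentences. This is a toy sketch: the protein names and sentences are invented, and the three-phrase regular expression stands in for a real NLP pipeline.

```python
import re
from itertools import combinations

INTERACTION_VERBS = r"(?:binds to|interacts with|associates with)"

def cooccurring_pairs(sentence, names):
    """Every pair of known names that appear in the same sentence."""
    found = sorted(n for n in names if re.search(rf"\b{re.escape(n)}\b", sentence))
    return set(combinations(found, 2))

def pattern_pairs(sentence, names):
    """Only pairs stated as 'A <interaction verb> B'."""
    alternation = "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))
    pattern = re.compile(
        rf"\b({alternation})\b\s+{INTERACTION_VERBS}\s+\b({alternation})\b"
    )
    return {tuple(sorted(m.groups())) for m in pattern.finditer(sentence)}

NAMES = {"ProtA", "ProtB", "ProtC"}  # hypothetical list of known protein names
stated = "ProtA binds to ProtB in vitro."
unrelated = "ProtA and ProtC were both measured."
indirect = "ProtA was found in a complex with ProtB."
```

Co-occurrence reports a pair for all three sentences, including the unrelated one (a false positive). The pattern matcher reports only the first and misses the indirectly phrased interaction (a false negative).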

GO ToolBox: Genome Biol. 2004;5(12):R101.

ProtQuant tool