ISMB, June.26.2005

From protein sequences…to protein networks

• Database: DNA and protein sequences
• Query Sequence: GACTGCATTAC
• Family of homologous genes

• Cellular response of interest
• Interaction pathways associated with cellular response
• Database / Scaffold of Molecular Interactions

www.cytoscape.org

Cytoscape.org

Cytoscape is a freely available (open-source, Java-based) bioinformatics software platform for visualizing biological networks (e.g. molecular interaction networks) and analyzing networks with gene expression profiles and other state data.

Additional features are available as plugins.

• jActiveModules: identify significant “active” subnetworks
• Expression Correlation Network: cluster expression data
• Agilent Literature Search: build networks by extracting interactions from scientific literature
• MCODE: find clusters of highly interconnected regions in networks
• cPath: query, retrieve and visualize interactions from the MSKCC Cancer Pathway database
• BiNGO: determine which Gene Ontology (GO) categories are statistically over-represented in a set of genes
• Motif Finder: run a Gibbs sampling motif detector on sequences for nodes in a Cytoscape network
• CytoTalk: interact with Cytoscape from Perl, Python, R, shell scripts, or C or C++ programs
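To make "statistically over-represented" concrete, here is a minimal sketch of a one-sided hypergeometric test, the kind of test commonly used for GO enrichment. It is not BiNGO's code; the function name, parameter names, and the numbers in the example are assumptions for illustration.

```python
from math import comb

def hypergeom_pvalue(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the GO category."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass of every outcome at least as extreme as `hits`.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Made-up numbers: 6000 genes in the genome, 100 annotated to the category,
# 50 genes selected in the network, 5 of them annotated.
p_value = hypergeom_pvalue(6000, 100, 50, 5)
print(f"p = {p_value:.3g}")
```

A small p-value suggests the category appears in the selected genes more often than chance would explain. In practice, a correction for testing many categories at once would also be needed.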

Core Features

• Customize network data display using visual styles

• Powerful graph layout tools

• Easily organize multiple networks

• Easily navigate large networks

• Filter the network

• Plugin API

Input/Output

• Protein-protein interactions from the BIND and TRANSFAC databases

• Gene functional annotations from Gene Ontology (GO) and KEGG databases

• Biological models from Systems Biology Markup Language (SBML)

• cPath: Cancer Pathway database

• Proteomics Standards Initiative Molecular Interaction (PSI-MI) or Biopathway Exchange Language (BioPAX) formats

• Oracle Spatial Network data model

Outline

• Introduction (5 min)
– Cytoscape as a network integration and query tool

• Basic features demo (15 min)
– Load network
– Navigate/Zoom/Select/Filter Nodes
– Create subnetworks
– Visual styles
– Layout

• Plugin demo (25 min)
– MCODE and BiNGO
– Agilent Literature Search Plug-in
– cPath

• Future work (5 min)

cPath PlugIn

• cPath: Overview

• cPath: XML Web Service

• cPath Cytoscape PlugIn
– Demo: Download sample protein-protein interaction network.
– Demo: Drill down to protein details.

cPath: XML Web Services API

• Provides a URL-HTTP XML Web Services API to all cPath Data.

• Formats:
– PSI-MI: Proteomics Standards Initiative Molecular Interaction Format
– BioPAX: Biological Pathway Exchange Format

• Commands:
– Query by keyword; query by interactor name; query by PubMed ID, etc.

• Example Query:
– http://www.cbio.mskcc.org/cpath/webservice.do?version=1.0&cmd=get_by_interactor_name_xref&q=P04273&format=psi_mi&startIndex=0&organism=&maxHits=10
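The example query can be assembled programmatically. The sketch below only builds the URL from the parameters visible in the example above; the function name and defaults are assumptions, and it does not contact the server, since the 2005-era endpoint may no longer respond.

```python
from urllib.parse import urlencode

# Endpoint and parameter names are copied from the example query on this slide.
CPATH_ENDPOINT = "http://www.cbio.mskcc.org/cpath/webservice.do"

def build_cpath_query(query, cmd="get_by_interactor_name_xref", fmt="psi_mi",
                      start_index=0, organism="", max_hits=10, version="1.0"):
    """Return a cPath web-service URL (hypothetical helper, not part of cPath)."""
    params = [
        ("version", version),
        ("cmd", cmd),
        ("q", query),
        ("format", fmt),
        ("startIndex", start_index),
        ("organism", organism),
        ("maxHits", max_hits),
    ]
    # urlencode escapes values and joins key=value pairs with "&".
    return CPATH_ENDPOINT + "?" + urlencode(params)

url = build_cpath_query("P04273")
print(url)
```

Keeping the parameters in an ordered list reproduces the slide's URL exactly, which makes it easy to compare against the original query string.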

cPath Cytoscape PlugIn

• Enables Cytoscape users to easily query, download and visually render interactions stored in cPath.

• Utilizes the cPath XML Web Service

• Automatically bundled with Cytoscape 2.1
– Works out of the box

• Additional details available on the Cytoscape PlugIn home page:
– http://cytoscape.org/plugins2.php

cPath PlugIn Demo

FUTURE DIRECTIONS: Cross-comparison of networks

(1) Alignment of networks across species (network conservation)
(2) Correspondence between physical and genetic networks
(3) Conserved regions in the presence vs. absence of stimulus

Species shown:
• Baker’s yeast (Saccharomyces cerevisiae)
• Nematode worm (Caenorhabditis elegans)
• Fruit fly (Drosophila melanogaster)

http://www.pathblast.org

Network alignment with PathBLAST

$$
S(P) = \sum_{v \in P} \log_{10} \frac{p(v)}{p_{\mathrm{random}}} + \sum_{e \in P} \log_{10} \frac{q(e)}{q_{\mathrm{random}}}
$$

P is a path in the global alignment graph.

The v and e represent vertices and edges in P.

The value p(v) is the prob. of true homology for the proteins aligned at v.

The value q(e) is the prob. that the protein interaction at e is real, i.e., not a false-positive.
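As a rough illustration of how such a path score could be evaluated, the sketch below assumes a log-odds form: each vertex contributes log10(p(v) / p_random) and each edge contributes log10(q(e) / q_random). The background values p_random and q_random, the function name, and all numbers are assumptions for illustration, not values from the talk.

```python
import math

def path_score(p_values, q_values, p_random, q_random):
    """Log-odds score of an alignment path.

    p_values: p(v) for each vertex v on the path (probability of true homology).
    q_values: q(e) for each edge e on the path (probability the interaction is real).
    p_random, q_random: assumed background probabilities for a random path.
    """
    vertex_term = sum(math.log10(p / p_random) for p in p_values)
    edge_term = sum(math.log10(q / q_random) for q in q_values)
    return vertex_term + edge_term

# Made-up path with three aligned protein pairs and two interactions.
score = path_score([0.9, 0.8, 0.95], [0.7, 0.6], p_random=0.1, q_random=0.1)
print(f"S(P) = {score:.2f}")
```

Under this form, a path scores above zero only when its homology and interaction probabilities beat the random background on balance.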

Example yeast/worm/fly alignments

Roded Sharan et al. PNAS 2005

Integration of genetic and physical interactions

160 between-pathway models

101 within-pathway models

Num interactions: 1,102 genetic; 933 physical

Ryan Kelley et al. Nature Biotechnology 2005

A between-pathway model

Upcoming Events

• Cytoscape Conference: Nov 30th and Dec 1st, 2005

• RECOMB Satellite Conference on Network Biology and Gene Regulation: Dec 2nd-4th, 2005

Mailing lists
– [email protected]
– [email protected]

Cytoscape Team

Trey Ideker, Mark Anderson, Nerius Landys, Ryan Kelley, Chris Workman

Past contributors: Nada Amin, Owen Ozier, Jonathan Wang

Benno Schwikowski, Lee Hood, Richard Bonneau, Rowan Christmas

Past contributors: Iliana Avila-Campillo, Larissa Kamenkovich, Andrew Markiel, Paul Shannon

Chris Sander, Gary Bader, Ethan Cerami, Rob Sheridan

Agilent: Annette Adler, Allan Kuchinsky, Aditya Vailaya, Mike Creech

Funding Sources

• NIH (NIGMS) R01 GM070743-01
– Program Manager: John Whitmarsh

• NCI caBIG
– Ken Buetow, Peter Kovitz

• Unilever, PLC
– Guy Werner

• PathBLAST network comparison
– NSF Quantitative Systems Biology
– Program Manager: Mitra Basu

Extensible Architecture: 100% open source Java

– Core + plugin API
– Plugins are independently licensed
– “Just need to write the algorithm”
– Template code samples

Plugin

Layout

• 16 algorithms available through plugins

• Zooming, hide/show, alignment

yFiles Organic

yFiles Circular

Visual Styles

• Map graph attributes to visual attributes

• Define visual styles for later use

• Graph has node and edge attributes
– E.g. expression data, interaction type, GO function

• Mapped to visual attributes
– E.g. node/edge size, shape, color, font…

• Take continuous gene expression data and visualize it as continuous node colors
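The idea of a continuous mapping from expression values to node colors can be sketched in a few lines. This is not Cytoscape's implementation; the function name, the green-to-red gradient, the value range, and the gene names are assumptions for illustration.

```python
def expression_to_color(value, vmin, vmax,
                        low_rgb=(0, 255, 0), high_rgb=(255, 0, 0)):
    """Linearly interpolate between two RGB colors.

    Values outside [vmin, vmax] are clamped to the nearest end of the gradient.
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    t = (value - vmin) / (vmax - vmin)
    t = max(0.0, min(1.0, t))
    channels = [round(lo + (hi - lo) * t) for lo, hi in zip(low_rgb, high_rgb)]
    return "#{:02x}{:02x}{:02x}".format(*channels)

# Hypothetical expression log-ratios keyed by made-up node names.
expression = {"geneA": -2.0, "geneB": 0.5, "geneC": 3.1}
node_colors = {node: expression_to_color(v, -2.0, 2.0)
               for node, v in expression.items()}
print(node_colors)
```

Clamping keeps outliers from stretching the gradient, so most nodes still use the full color range.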

Visual Styles

Load “Your Favorite Network”

Visual Styles

Load “Your Favorite Expression” Dataset

Visual Styles

Map expression values to node colors using a continuous mapper

Visual Styles

Expression data mapped to node colors

Visual Styles

• Node attributes: node color, border color, border type, node shape, size, label, font

• Edge attributes: edge color, line types, arrows, label, font

• Multidimensional visual attribute mapping soon

MCODE and Biomodules Plugins (MSKCC and ISB)

• Clusters in a protein-protein interaction network have been shown to represent protein complexes and parts of pathways

• Clusters in a protein similarity network represent protein families

• Network clustering is available through the MCODE Cytoscape plugin
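To give a feel for what "highly interconnected regions" means, the sketch below extracts a k-core: the largest subgraph in which every node has at least k neighbors. This is a simpler stand-in and not the MCODE algorithm itself, which uses core-based density as one ingredient of its vertex weighting. The function name and the toy network are assumptions.

```python
from collections import defaultdict

def k_core(edges, k):
    """Return the set of nodes in the k-core of an undirected graph."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)
    adj = dict(neighbors)
    changed = True
    while changed:
        changed = False
        # Peel off every node whose current degree is below k, then repeat.
        for node in [n for n, nbrs in adj.items() if len(nbrs) < k]:
            for nbr in adj.pop(node):
                adj[nbr].discard(node)
            changed = True
    return set(adj)

# Toy network: a 4-protein clique (A-D) with a two-protein tail (E, F).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
         ("B", "D"), ("C", "D"), ("D", "E"), ("E", "F")]
dense_region = k_core(edges, 3)
print(sorted(dense_region))
```

In this toy network the tail is peeled away and only the clique survives, which is the kind of densely connected region that often corresponds to a protein complex.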

[Figure: network clusters labeled Proteasome 26S, Proteasome 20S, Ribosome, RNA Pol core, RNA Splicing]

Biomodules (ISB)

Prinz S, Avila-Campillo I, Aldridge C, Srinivasan A, Dimitrov K, Siegel AF, and Galitski T. Genome Res. 2004 14: 380-390

Agilent Literature Search Plugin for Cytoscape

[Flowchart: Agilent Literature Search pipeline]

Query Interface
• Query (Terms, Context)
• Meta-Search
• Retrieved Documents

Information Extraction Routine
• Get Document
• Sentence Tokenization
• Is Interesting Sentence? (Yes / No)
• Extract Nouns/Verbs (User Context/BNS)
• Normalize Nouns (User Context/BNS)
• Classify Sentence Into Interaction Type: Bind, Cleave, Inhibit, Promote, Catalyze
• Convert to ALFA
• Output ALFA Network
• Output Cytoscape Network
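The flowchart boxes on this slide (sentence tokenization, an "interesting sentence" test, and classification into an interaction type) can be imitated with a deliberately naive sketch. It is not the plugin's actual logic: the regular expressions, the rule that a sentence is interesting when it names two known proteins, and the sample text are all assumptions.

```python
import re

# Interaction types from the slide, each matched by a simple verb pattern.
VERB_PATTERNS = {
    "Bind": re.compile(r"\b(bind(s|ing)?|bound)\b", re.I),
    "Cleave": re.compile(r"\bcleav(e|es|ed|ing)\b", re.I),
    "Inhibit": re.compile(r"\binhibit(s|ed|ing)?\b", re.I),
    "Promote": re.compile(r"\bpromot(e|es|ed|ing)\b", re.I),
    "Catalyze": re.compile(r"\bcataly[sz](e|es|ed|ing)\b", re.I),
}

def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def extract_interactions(text, known_names):
    """Return (source, interaction_type, target) triples, one per sentence."""
    edges = []
    for sentence in split_sentences(text):
        words = re.findall(r"[A-Za-z0-9\-]+", sentence)
        # Keep known names in order of first appearance.
        found = list(dict.fromkeys(w for w in words if w in known_names))
        if len(found) < 2:
            continue  # not an "interesting" sentence under this toy rule
        for interaction_type, pattern in VERB_PATTERNS.items():
            if pattern.search(sentence):
                edges.append((found[0], interaction_type, found[1]))
                break
    return edges

# Made-up abstract text for illustration.
abstract = "Mdm2 binds p53. The weather was fine. Caspase-3 cleaves PARP."
names = {"Mdm2", "p53", "Caspase-3", "PARP"}
network_edges = extract_interactions(abstract, names)
print(network_edges)
```

Each triple maps naturally onto a network edge, with the interaction type as an edge attribute and the source sentence available as supporting evidence.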

Cytoscape Network produced by Literature Search.

Abstract from the scientific literature

Sentences for an edge

Active Modules (UCSD)

Ideker T, Ozier O, Schwikowski B, Siegel AF. Bioinformatics. 2002;18 Suppl 1:S233-40

Active Modules

Biomodules (ISB)

Prinz S, Avila-Campillo I, Aldridge C, Srinivasan A, Dimitrov K, Siegel AF, and Galitski T. Genome Res. 2004 14: 380-390