netbiosig2013-talk gang su

Post on 10-May-2015

DESCRIPTION

Presentation for Network Biology SIG 2013 by Gang Su, University of Michigan, USA. “CoolMap Cytoscape App: Flexible Multi-scale Heatmap-Driven Molecular Network Exploration”

TRANSCRIPT

A ‘Cool’ Heatmap: and its Applications in Flexible Multi-scale Molecular Network Exploration

Molecular Behavioral Neuroscience Institute, Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor 48109, sugang@umich.edu

Gang Su, PhD

Network Biology SIG 2013, Friday July 19th, Berlin, Germany

Heatmap… What is it? ‘CoolMap.. I, am your father’

¤  One of the most popular ways of visualizing tabular data

¤  X column, Y row, value color

¤  Trees for hierarchical clustering, or groups are often drawn along the sides

¤  Great format for visual exploration and pattern discovery

¤  Used along with node-edge network views such as Cytoscape-clusterExplorer

¤  The paradigm remains largely unchanged

The American Statistician, 2009; PNAS Dec. 8, 1998 Vol. 95 No. 25 14863-14868

Czekanowski (1909); Brinton (1914); Loua (1873); Eisen (1998), 12k citations

The Good, the Bad, and the Ugly… of the conventional heatmaps

¤  The Good

¤  Mapping number to color makes it intuitive

¤  Clustering patterns become conspicuous and interpretable

¤  The Bad

¤  Increasingly difficult to visualize and explore big datasets

¤  Difficult for data other than numeric

¤  The Ugly

¤  Difficult to incorporate existing annotations such as pathways and ontologies

¤  Difficult to visualize high-level relationships such as overall pathway to pathway correlations

The “Figure 1” Phenomenon

There are known knowns, and there are known unknowns.

PLoS Genet. 2008 Mar 14;4(3):e1000034; BMC Bioinformatics. 2011; 12(Suppl 1); 2011

How do we relate the unknown to the known: From observed patterns to existing knowledge interactively and intuitively?

The $$$ Solution

There are only so many screens you can buy

The CoolMap Solution: Nuts and Bolts

¤  Core concept: ‘Collapsible Heatmap’

¤  The tree nodes can be expanded/collapsed at any level:

¤  Think about a two-way multi tree

¤  Collapsed data are represented using aggregation functions (mean, median, etc.)

¤  The aggregation enables the user to explore data at multiple levels:

¤  Identify potential signals from high level aggregated views

¤  Expand nodes of interest, while keeping the context around

Using mean to collapse four numeric cells

The two way tree can be expanded and collapsed at multiple levels
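The collapsing step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CoolMap's actual implementation: the function name `collapse`, the dictionary-based group representation, and the sample numbers are all hypothetical.

```python
from statistics import mean

def collapse(matrix, row_groups, col_groups, aggregate=mean):
    """Collapse a 2-D list into one cell per (row group, column group).

    row_groups / col_groups map a group name to the row / column indices
    it covers (a hypothetical stand-in for collapsed tree nodes).
    """
    view = {}
    for r_name, rows in row_groups.items():
        for c_name, cols in col_groups.items():
            # Gather every base cell under this pair of collapsed nodes.
            cells = [matrix[r][c] for r in rows for c in cols]
            view[(r_name, c_name)] = aggregate(cells)
    return view

# Hypothetical base matrix: 2 genes x 3 samples.
base = [
    [1.0, 3.0, 10.0],
    [5.0, 7.0, 20.0],
]
row_groups = {"pathway A": [0, 1]}                 # both genes collapsed
col_groups = {"group B": [0, 1], "sample 3": [2]}  # two samples collapsed

view = collapse(base, row_groups, col_groups)
# The four cells under ("pathway A", "group B") are replaced by their mean.
```

Passing a different `aggregate` (for example `statistics.median`) changes how collapsed cells are summarized without touching the base matrix.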

CoolMap: Core Design Concepts

¤  Extensible Interfaces:

¤  A Loader that imports custom data objects into a ‘base’ matrix

¤  An aggregator that transforms a group of ‘base’ data objects into a ‘view’ data object

¤  A renderer that renders the ‘view’ data object to the designated region in the interactive view

Example:

¤  Gene expression values of all genes in pathway A, sample group B, aggregated using median, and rendered in color: [0.5, 1, 2.1, 3.2, 4.3] → [2.1]

¤  Nucleotide sequences belonging to the same transcription factor binding sites, aggregated using IUPAC consensus code to a single letter, and rendered in text: [A,A,A,A,T] → [A] → A

¤  The ‘base’ matrix can use a variety of data structures, such as arrays, lists, sparse matrices or even remote services
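The aggregator and renderer roles above can be sketched as plain Python functions. Everything here is an illustrative assumption rather than the CoolMap API: the names `median_aggregator`, `consensus_aggregator`, `text_renderer` and `view_cell` are hypothetical, and the 25% frequency threshold used to pick bases for the consensus is one possible rule, chosen so that the slide's [A,A,A,A,T] example yields A.

```python
from collections import Counter
from statistics import median

# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}

def median_aggregator(values):
    """Aggregate numeric base cells into one view value."""
    return median(values)

def consensus_aggregator(bases, threshold=0.25):
    """Aggregate a column of bases into one IUPAC letter.

    Bases whose frequency reaches `threshold` (an assumed cutoff) are kept.
    """
    counts = Counter(bases)
    kept = frozenset(b for b, n in counts.items() if n / len(bases) >= threshold)
    return IUPAC.get(kept, "N")

def text_renderer(value):
    """Render a view value as text (a color renderer would map it to a color)."""
    return str(value)

def view_cell(base_values, aggregator, renderer):
    """base objects -> aggregator -> view object -> renderer."""
    return renderer(aggregator(base_values))

expression_cell = view_cell([0.5, 1, 2.1, 3.2, 4.3], median_aggregator, text_renderer)
sequence_cell = view_cell(list("AAAAT"), consensus_aggregator, text_renderer)
```

Keeping the aggregator and renderer as separate, swappable callables mirrors the extensibility idea on the slide: a new data type only needs its own pair of functions.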

¤  Flexible Row/Column Ontological Trees:

¤  Multiple-inheritance tree

¤  Genes or metabolites may be shared by multiple pathways or ontological terms, and may occur more than once.

¤  Trees from different sources

¤  Side by side comparison of different ontologies (GO, KEGG, Hierarchical Clustering)

¤  Trees may be used at any level

¤  Tree nodes at any level can be inserted into any place in the tree.
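A multiple-inheritance tree with expand/collapse state can be sketched as follows. This is a minimal sketch assuming the tree is acyclic and stored as a parent-to-children dictionary; the function name `visible_labels` and the pathway and gene names are hypothetical.

```python
def visible_labels(tree, roots, expanded):
    """Return the row (or column) labels currently shown, in display order.

    tree     -- dict mapping a node to its ordered list of children
    roots    -- nodes placed at the top level of the axis
    expanded -- set of nodes the user has expanded
    """
    shown = []
    for node in roots:
        if node in expanded and node in tree:
            # An expanded node is replaced by its children (recursively).
            shown.extend(visible_labels(tree, tree[node], expanded))
        else:
            # A collapsed node or a leaf occupies a single row/column.
            shown.append(node)
    return shown

# Hypothetical ontology: geneB belongs to both pathways.
tree = {
    "Pathway 1": ["geneA", "geneB"],
    "Pathway 2": ["geneB", "geneC"],
}
roots = ["Pathway 1", "Pathway 2"]

collapsed = visible_labels(tree, roots, expanded=set())
partial = visible_labels(tree, roots, expanded={"Pathway 1"})
full = visible_labels(tree, roots, expanded={"Pathway 1", "Pathway 2"})
```

In the fully expanded view `geneB` appears twice, once under each parent, which is the "may occur more than once" behavior described above.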

Near-ready Releases

¤  CoolMap Core

¤  Core interfaces, data structures and utility functions for base matrix, view matrix, ontology trees, renderers, interactive view panels, etc.

¤  CoolMap Application

¤  An application with auxiliary modules such as dynamic multiple dataset synchronization, searcher, filters, sorters, data persistence etc.

¤  Followed many best practices from Cytoscape

¤  CoolMap Cytoscape Prototype Plugin

¤  A Cytoscape plugin that enables two-way communication between Cytoscape and CoolMap

Our classroom user study of a group of undergraduate students with preliminary computer and bioinformatics backgrounds shows:

65% found it easy or not difficult to learn

74% highly enjoyed or enjoyed the software

Screenshot

Case Study 1: Eisen Yeast Data

Eisen (1998)

Gene expression fold change of selected gene groups and experiment conditions

CoolMap makes it easier to interpret data from the higher concept levels

CoolMap

Case Study 1: Eisen Yeast Data (cont’d)

CoolMap reveals more than meets the eye from conventional heatmaps

The peculiar outlier sample of spo5 2: fold change reversed across many pathways. Easier to identify in the aggregated view.

Case Study 1: Eisen Yeast Data (cont’d)

Using CoolMap’s multi-view link functions to compare different ontology definitions

Left: GO 6096: Glycolysis

Right: Eisen’s annotated Glycolysis cluster

Integrate existing knowledge with observed data for hypothesis generation

Case Study 2: Diet Induced Differential Gene Expression

¤  Individuals fed on Saturated Fatty Acid (SFA) and Monounsaturated Fatty Acid (MUFA) diets demonstrate differential gene expression over an 8-week span

¤  The authors picked a list of immune-related genes and showed up-regulation of these genes

The American Journal of Clinical Nutrition 90, 1656-64 (2009)

CoolMap

Probe level expression profiles can be maintained

Case Study 2: Diet Induced Differential Gene Expression (cont’d)

Using ontology groups (genders) leads to new discoveries: up-regulated gene groups and gender-specific responses (weaker patterns). Total of 25k probes.

Case Study 2: Diet Induced Differential Gene Expression (cont’d)

Up-regulated clusters | Female-specific | Male-specific

Case Study 3: Mother-Child Nutrition Data (Unpublished)

¤  The aggregated group view makes it much easier to interpret at concept level

¤  We can immediately identify that:

¤  BCAA AcylCarnitines (0.45), Long Chain AcylCarnitines (0.34), PPARa methylation (0.52), ESR methylation (0.32) are highly correlated between mother and child

Burant C. Unpublished data

Case Study 3: Mother-Child Nutrition Data (Unpublished) PPARa: One Level Down

¤  Validation

¤  Boxplot overlay (left) and expanded view (right) show that the high correlation (mean 0.52) is unlikely to be a result of error, outliers or noise

¤  Strong association of PPARa methylation levels in mother and child.

¤  Hypothesis

¤  As PPARa regulates genes involved in cell proliferation, cell differentiation and inflammation responses, the expression profile of these genes may also be correlated in mother and child.

http://www.ncbi.nlm.nih.gov/gene/5465

Burant C. Unpublished data

Case Study 3: Mother-Child Nutrition Data (Unpublished) BCAA AcylCarnitines

¤  The mother-child correlation is lower (mean 0.45)

¤  The BCAA AcylCarnitines intra-child group has a larger variance compared with the mother group

¤  While C3 is highly correlated, C4 has low correlation

Case Study 4: DNA Methylation, Missing Values and Ragged Data (Unpublished)

¤  Sparse or ragged matrix

¤  Normalized methylation data: every gene has a different number of methylation sites.

¤  Collapsing by cell line (Caski.1 and Caski.2 cell lines) reveals the aggregated (mean, etc.) normalized methylation value. Expansion by cell line reveals details for each methylation site.

Sartor M. Unpublished data
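Aggregating a ragged matrix with missing values could look like the sketch below. It assumes missing sites are stored as `None` and that an all-missing cell stays missing; the helper name `mean_ignoring_missing`, the gene names and the numbers are hypothetical, and only the cell line labels come from the slide.

```python
def mean_ignoring_missing(values):
    """Mean of the values that are present; None if nothing is present."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None

# Hypothetical ragged data: each gene has a different number of sites
# per cell line, and some site measurements are missing (None).
sites = {
    "gene1": {"Caski.1": [0.25, 0.75, None], "Caski.2": [0.5]},
    "gene2": {"Caski.1": [1.0], "Caski.2": [None]},
}

# Collapsed view: one aggregated value per (gene, cell line).
collapsed = {
    gene: {line: mean_ignoring_missing(vals) for line, vals in by_line.items()}
    for gene, by_line in sites.items()
}
```

Because the aggregator works on whatever values exist under a node, rows with different numbers of sites need no padding before they are collapsed.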

Case Study 5: Continuous Glucose Monitoring (CGM)

Display glucose level at:

•  a variety of time resolutions: from 5 min to 1 month

•  and sample groups: age groups, gender

Link hypoglycemia events to blood sugar changes.
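Viewing the same readings at a coarser time resolution amounts to re-binning and aggregating them. The sketch below assumes readings arrive as (minute, value) pairs; the function name `rebin` and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def rebin(readings, bin_minutes, aggregate=mean):
    """Group (minute, value) readings into bins of `bin_minutes` and
    aggregate each bin. Keys of the result are bin start times in minutes."""
    buckets = defaultdict(list)
    for minute, value in readings:
        buckets[minute // bin_minutes].append(value)
    return {b * bin_minutes: aggregate(vals) for b, vals in sorted(buckets.items())}

# Hypothetical 5-minute glucose readings.
readings = [(0, 100), (5, 110), (10, 120), (60, 80), (65, 60)]

hourly_mean = rebin(readings, 60)                # coarse overview
hourly_min = rebin(readings, 60, aggregate=min)  # keeps the lowest reading visible
```

Using `min` as the aggregator is one way a low-glucose event could remain visible in a collapsed view, where a mean would smooth it away.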

Case Study 6: Sequence Analysis Example

¤  Interactive consensus sequence exploration: CRP (cAMP receptor protein, also known as catabolite activator protein, CAP) binding site, 49 sequences in dozens of promoters | ChIP-seq

¤  Extend CoolMap: Loader, Aggregator, Renderer [Annotator]

Full Sequence View

Sequence Logo

Consensus View

Consensus View with base percentage overlay

Consensus View with GC content overlay

Genome Res. 2004 June; 14(6): 1188-1190

Case Study 7: Network Analysis

¤  Link Cytoscape with CoolMap:

¤  Network nodes link with CoolMap views by ID, attribute names, etc.

¤  Relate patterns identified in an experiment to curated networks (an alternative to JTreeView); create correlation matrices from Cytoscape numeric attributes

¤  Use pathways and ontologies to view sub-network to sub-network connectivity

¤  Cluster network based on attributes, and compare unsupervised clustering vs. annotated pathways and ontologies.

Need two monitors!

Case Study 7: Network Analysis (cont’d)

Top Left: MAPK pathway in ‘galFiltered.cys’ network from Cytoscape

Bottom Left: Part of the same network arranged with pathways and the adjacency matrix, and sum as aggregator. Each cell shows the number of edges within each pathway, as well as the number of inter-pathway edges. A good ‘community’ clustering will have most of the green dots along the diagonal

Right: The same view with MAPK pathway expanded, showing dense intra-cluster connectivity
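The pathway-level edge counts described above can be sketched directly from an edge list. This is a simplified illustration, not the plugin's code: it assumes an undirected network in which every node belongs to exactly one pathway, and the function name `pathway_edge_counts` and the toy network are hypothetical.

```python
from collections import Counter

def pathway_edge_counts(edges, pathway_of):
    """Count edges per unordered pair of pathways.

    Keys with two identical pathways are intra-pathway edges (the diagonal
    of the collapsed adjacency view); other keys are inter-pathway edges.
    """
    counts = Counter()
    for u, v in edges:
        key = tuple(sorted((pathway_of[u], pathway_of[v])))
        counts[key] += 1
    return counts

# Hypothetical toy network.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]
pathway_of = {"a": "P1", "b": "P1", "c": "P2", "d": "P2"}

counts = pathway_edge_counts(edges, pathway_of)
```

Each undirected edge is counted once here. Summing a diagonal block of a symmetric adjacency matrix would instead count every intra-pathway edge twice, so the two conventions differ by a factor of two on the diagonal.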

Case Study 7: Network Analysis (cont’d)

Left: A correlation matrix can be created from gal expression profiles, and pathways can then be used to arrange it into a condensed concept correlation view. Hierarchical clustering can be run from the concept level.

Right: The selected region contains nodes that are annotated with the KEGG pathway ‘Cell cycle’ and are close to each other in the network

Acknowledgement

Thank you!

Primary Advisor

Dr Fan Meng

Committee Mentors

Dr Brian D. Athey (Co-chair)

Dr Charles F. Burant and his lab

Dr Barbara Mirel

Dr Maureen Sartor

Testers

Usability testers and software testers, fellow Bioinformatics brethren.

Development

Please contact me if you are interested in development or testing:

sugang@umich.edu
