Post on 14-Jan-2016

Computing Co-Expression Relationships

Wen-Dar Lin

Contents

• Motivation
• Basic Idea
• Case Studies
  – An Example of Single Experiment
  – An Example of Time-Course Experiment
• Potential Applications
• Availability
• Future Works

Motivation

• Given a set of differentially displayed genes reported by an array experiment:
  – We would like to know the relationships among these genes.
  – These relationships may recover important modules or motifs with respect to the experiment.

Motivation

• Co-expression relationships are among the most biologically meaningful and easily computable relationships.
  – Co-expression relationships form modules that may suggest important biological information.
  – They can be computed from the large amount of publicly available array data.

Basic Idea

• Array data can be retrieved from publicly available data repositories
  – such as the NASCarrays, NCBI GEO, and EMBL-EBI ArrayExpress
• The data should be normalized before computing the co-expression relationships.
  – e.g. normalized by the RMA method

Basic Idea

• Defining co-expression relationships
  – We define a co-expression relationship between two genes to exist if the Pearson correlation coefficient between their normalized expression levels is greater than or equal to a certain threshold.

slide #   1   2   3    4   …
gene X    1   2   10   3   …
gene Y    5   2   12   4   …

[Figure: plot with axes X and Y]
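The definition above can be sketched in a few lines of Python. This is only an illustration, not the tool kit's code: the function names `pearson` and `co_expressed` are made up here, the 0.7 default is borrowed from the case studies, and the four values per gene are the example values from the table above (a real computation would use hundreds of slides).

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)

def co_expressed(xs, ys, threshold=0.7):
    """True if the two expression profiles meet the correlation threshold."""
    return pearson(xs, ys) >= threshold

# Example values from the table (slides 1 to 4 only).
gene_x = [1, 2, 10, 3]
gene_y = [5, 2, 12, 4]
r = pearson(gene_x, gene_y)
print(round(r, 2), co_expressed(gene_x, gene_y))
```

With these four values the coefficient is roughly 0.92, so the pair would count as co-expressed at a threshold of 0.7.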

Basic Idea

• Properties of the Pearson correlation coefficient
  – Let Correl(A, B) be the Pearson correlation coefficient between the normalized expression levels of gene A and gene B.
  – −1 ≤ Correl(A, B) ≤ 1

[Figure: correlation illustration, including a negative-correlation example, from http://www.gseis.ucla.edu/courses/ed230bc1/notes1/var1.html]
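For reference, the standard textbook definition of the coefficient can be written as follows. The symbols are assumptions introduced here, not taken from the slides: a_i and b_i are the normalized expression levels of genes A and B on slide i, n is the number of slides, and the bars denote means over all slides.

```latex
\mathrm{Correl}(A, B) =
  \frac{\sum_{i=1}^{n} (a_i - \bar{a})(b_i - \bar{b})}
       {\sqrt{\sum_{i=1}^{n} (a_i - \bar{a})^2}\;\sqrt{\sum_{i=1}^{n} (b_i - \bar{b})^2}}
```

Values near 1 indicate that the two genes rise and fall together across slides, and values near −1 indicate opposite behavior.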

Basic Idea

• The computational assistance
  – Given a set of genes of interest
  – Compute co-expression relationships among them
  – Identify co-expression clusters
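The three steps above might be sketched as follows. The slides do not say how clusters are identified, so this sketch assumes that a cluster is a connected component of the co-expression graph; the function names, gene labels, and expression values are all hypothetical.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def co_expression_edges(profiles, threshold):
    """All gene pairs whose correlation meets the threshold."""
    edges = set()
    for a, b in combinations(sorted(profiles), 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            edges.add((a, b))
    return edges

def clusters(genes, edges):
    """Connected components of the graph; genes without edges are dropped."""
    neighbours = {g: set() for g in genes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, result = set(), []
    for gene in sorted(genes):
        if gene in seen or not neighbours[gene]:
            continue
        stack, component = [gene], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        result.append(sorted(component))
    return result

# Hypothetical normalized expression levels over five slides.
profiles = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 11],
    "g3": [5, 1, 4, 2, 3],
    "g4": [10, 2, 8, 4, 7],
    "g5": [3, 3, 4, 3, 3],
}
edges = co_expression_edges(profiles, threshold=0.7)
found = clusters(profiles, edges)
print(found)
```

Here g1 and g2 form one cluster, g3 and g4 another, and g5 is left out because it has no partner above the threshold.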

Case Studies

• We have implemented the aforementioned ideas in a tool kit and applied it to two case studies.
  – A single experiment
  – A time-course experiment

A Single Experiment

• In this example, an array experiment was performed
  – 178 differentially displayed genes were identified.
  – Co-expression relationships were computed from RMA array data of 300 ATH1 slides downloaded from the NASCarrays
    • The sample on each slide was derived, though not exclusively, from roots
    • Threshold for Pearson correlation coefficient = 0.7

A Single Experiment

[Figure: co-expression graph; labels: "Two larger clusters", "One minor subcluster"]

A Single Experiment

• We may compute co-expression relationships based on all kinds of array experiment data
  – Co-expression relationships were identified based on RMA array data of 1436 ATH1 slides downloaded from the TAIR
    • Threshold for Pearson correlation coefficient = 0.7

A Single Experiment

[Figure: co-expression graph; label: "Two larger clusters"]

A Single Experiment

• Is there any difference between the graph based on root-array data and the graph based on all-array data?
  – This can be checked by differentially marking the clusters of one graph onto the other graph.
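One way to picture the marking step is sketched below, assuming each graph has already been reduced to a list of clusters (lists of gene names). The function names and the gene labels are hypothetical, and the tool kit's own marking may work differently.

```python
def mark_clusters(target_clusters, source_clusters):
    """Label every gene in the target clusters with the index of the
    source cluster it belongs to, or None if it is in no source cluster."""
    label_of = {
        gene: index
        for index, cluster in enumerate(source_clusters)
        for gene in cluster
    }
    return [
        {gene: label_of.get(gene) for gene in cluster}
        for cluster in target_clusters
    ]

def unmarked_clusters(marked):
    """Target clusters in which no gene received a mark."""
    return [
        sorted(cluster)
        for cluster in marked
        if all(label is None for label in cluster.values())
    ]

# Hypothetical clusters from the root-array graph and the all-array graph.
root_clusters = [["a", "b", "c"], ["d", "e"], ["f", "g"]]
all_clusters = [["a", "b"], ["d", "e", "h"]]

marked = mark_clusters(root_clusters, all_clusters)
specific = unmarked_clusters(marked)
print(marked)
print(specific)
```

A root-graph cluster that receives no mark at all, such as `["f", "g"]` here, is a candidate for a relationship that only shows up in the root-related data.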

A Single Experiment

[Figure: co-expression graph; labels: "Two clusters mapped by the other graph", "One cluster that should be root-specific"]

A Single Experiment

[Figure: co-expression graph; labels: "Cluster sizes: 47 & 14", "Cluster size: 9"]

A Single Experiment

• Some remarks
  – The number of differentially displayed genes reported by the experiment is 178
  – The number of clustered genes is 47+14+9 = 70
    • Reduced by more than 50%
  – The co-expression relationships are recovered
    • Each cluster may be a module whose genes usually work together.
  – Finding tissue-specific co-expression relationships
    • Can be done by mapping the graph based on all-array data onto the graph based on tissue-related-array data.
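The "reduced by more than 50%" remark follows directly from the numbers on this slide, as the short check below shows. Only the variable names are introduced here.

```python
reported = 178               # differentially displayed genes reported
cluster_sizes = [47, 14, 9]  # sizes of the three clusters

clustered = sum(cluster_sizes)
reduction = 1 - clustered / reported

print(f"{clustered} clustered genes, reduction of {reduction:.1%}")
```

That is 70 of 178 genes kept, a reduction of about 61%.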

A Single Experiment

• In addition to clustering genes according to co-expression relationships, we also fished for genes that may potentially be co-expressed.
  – These genes may not be identified as differentially displayed in the experiment.
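Fishing can be pictured as a search over all other genes on the array for profiles that correlate with a bait gene. The sketch below assumes exactly that; the name `fish`, the gene labels, and the values are hypothetical, and 0.7 is simply the threshold used in this case study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def fish(bait_profiles, candidate_profiles, threshold=0.7):
    """Map each candidate gene to the bait genes it is co-expressed with.
    Candidates with no co-expressed bait are left out."""
    hits = {}
    for candidate, profile in candidate_profiles.items():
        if candidate in bait_profiles:
            continue  # a bait is not fished against itself
        partners = sorted(
            bait
            for bait, bait_profile in bait_profiles.items()
            if pearson(bait_profile, profile) >= threshold
        )
        if partners:
            hits[candidate] = partners
    return hits

# Hypothetical profiles: one bait gene and two genes that were not
# reported as differentially displayed.
baits = {"bait1": [1, 2, 3, 4, 5]}
candidates = {"c1": [2, 4, 6, 8, 11], "c2": [5, 1, 4, 2, 3]}

hits = fish(baits, candidates)
print(hits)
```

Only `c1` follows the bait closely enough to be fished out; `c2` is negatively correlated with it and is ignored.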

A Single Experiment

• A GO enrichment analysis was also carried out
  – using the GOBU software (gobu.iis.sinica.edu.tw)
  – which should give a conceptual view of clustered genes.

A time-course experiment

• In this example, a time-course array experiment was performed
  – Three time points
  – About 800 genes were differentially displayed at one or more time points.
  – Co-expression relationships were computed from array data of 300 ATH1 slides extracted from RMA array data of about 2600 ATH1 slides downloaded from the NASCarrays
    • Threshold for Pearson correlation coefficient = 0.8

A time-course experiment

[Figure: co-expression graph, time point 1; two groups, each labeled "About 100 genes"]

A time-course experiment

[Figure: co-expression graph, time point 2; two groups, each labeled "About 100 genes"]

A time-course experiment

[Figure: co-expression graph, time point 3; two groups, each labeled "About 100 genes"]

A time-course experiment

• Though this clustering and the time-course expression data show some biological meaning,
  – the number of clustered genes (more than 200)
    • makes the graph too complex and
    • is too large to be understood in a short time.

A time-course experiment

• Reducing the number of clustered genes may help
  – reduce the complexity of the graph and
  – understand the revealed co-expression modules
• We reduced the graph by removing co-expression relationships that generally exist in the entire plant
  – based on RMA array data of about 2600 ATH1 slides downloaded from the NASCarrays
  – Threshold for Pearson correlation coefficient = 0.7
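Viewed as sets of edges, this reduction is a set difference: keep the edges found in the root-related graph (threshold 0.8 in this example) that do not also appear in the whole-plant graph (threshold 0.7). The sketch below assumes both graphs are already available as lists of gene pairs; the function names and gene labels are hypothetical.

```python
def as_edge_set(edges):
    """Store each edge as a frozenset so that (X, Y) and (Y, X) are equal."""
    return {frozenset(edge) for edge in edges}

def retain_specific(tissue_edges, general_edges):
    """Edges present in the tissue graph but absent from the general graph."""
    return as_edge_set(tissue_edges) - as_edge_set(general_edges)

def genes_in(edges):
    """All genes that still take part in at least one edge."""
    return set().union(*edges) if edges else set()

# Hypothetical edges: the root-related graph and the whole-plant graph.
root_edges = [("X", "Y"), ("Y", "Z"), ("Z", "W")]
plant_edges = [("Y", "X"), ("W", "V")]

retained = retain_specific(root_edges, plant_edges)
print(sorted(sorted(edge) for edge in retained))
print(sorted(genes_in(retained)))
```

The X-Y edge is removed because it also exists in the whole-plant graph. Gene X then has no edges left, which is how the number of clustered genes shrinks.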

A time-course experiment

• Edges (relationships) to be removed

[Figure: plot with axes X and Y; legend: "root-related", "others"]

A time-course experiment

• Edges (relationships) to be retained

[Figure: plot with axes X and Y; legend: "root-related", "others"]

A time-course experiment

[Figure: co-expression graph, time point 1; groups labeled "About 20 genes", "About 60 genes", "About 50 genes"]

A time-course experiment

[Figure: co-expression graph, time point 2; groups labeled "About 20 genes", "About 60 genes", "About 50 genes"]

A time-course experiment

[Figure: co-expression graph, time point 3; groups labeled "About 20 genes", "About 60 genes", "About 50 genes"]

A time-course experiment

• Some remarks
  – The number of genes differentially displayed at one or more time points is about 800.
  – The number of clustered genes is about 60+50+20 = 130
    • Reduced by more than 80%
  – The retained graph contains edges, i.e., gene pairs, that are co-expressed in root but not in the entire plant
    • The recovered clusters should be root-specific.

Potential Applications

• We have created a tool kit that
  – computes co-expression relationships based on array data
    • where probe names can be replaced by aliases produced by something like orthologous mapping
    • so it can be used for studying a non-model organism using array data of a model organism.
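The alias idea can be sketched as a relabeling step applied before any correlation is computed. Everything below is hypothetical: the function name `relabel`, the probe and gene identifiers, and the choice to keep every profile when several probes map to the same alias are assumptions, not a description of how the tool kit handles these cases.

```python
def relabel(profiles, alias_of):
    """Replace probe names by aliases (e.g. orthologs in another organism).
    Probes without an alias are dropped; probes sharing an alias are
    collected in a list under that alias."""
    relabeled = {}
    for probe, profile in profiles.items():
        alias = alias_of.get(probe)
        if alias is None:
            continue  # no ortholog known for this probe
        relabeled.setdefault(alias, []).append(profile)
    return relabeled

# Hypothetical model-organism probes and an orthologous mapping to
# genes of a non-model organism.
profiles = {
    "probe_1": [1, 2],
    "probe_2": [3, 4],
    "probe_3": [5, 6],
}
alias_of = {"probe_1": "geneA", "probe_2": "geneA"}

relabeled = relabel(profiles, alias_of)
print(relabeled)
```

After relabeling, co-expression relationships computed on the model organism's array data are expressed in terms of the non-model organism's gene names.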

Potential Applications

• We have created a tool kit that
  – fills colors in graphs according to
    • intensity fold-changes, or
    • clusters in another graph
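As an illustration of coloring by fold-change, the sketch below maps an intensity fold-change to a node color. The color scheme (white for no change, red for up, blue for down, saturating at an 8-fold change) and the function name are assumptions made for this example; the slides do not specify the tool kit's scheme.

```python
import math

def fold_change_color(fold_change, max_log2=3.0):
    """Hex color for a fold-change: white at 1x, pure red at or above
    2**max_log2, pure blue at or below 2**-max_log2."""
    if fold_change <= 0:
        raise ValueError("fold-change must be positive")
    # Scale log2 fold-change into [-1, 1].
    level = max(-1.0, min(1.0, math.log2(fold_change) / max_log2))
    fade = int(round(255 * (1 - abs(level))))
    if level >= 0:
        return "#ff{:02x}{:02x}".format(fade, fade)  # shades of red
    return "#{:02x}{:02x}ff".format(fade, fade)      # shades of blue

for fc in (0.125, 1, 8):
    print(fc, fold_change_color(fc))
```

Working on the log2 scale treats a 2-fold increase and a 2-fold decrease symmetrically, which is why the mapping takes the logarithm first.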

Potential Applications

• We have created a tool kit that
  – removes/retains co-expression relationships that are present in another graph
  – finds specific or common co-expression relationships

[Figure: graphs labeled "200 genes" and "120 genes"]

Potential Applications

• We have created a tool kit that
  – fishes genes that are potentially co-expressed with assigned bait

Future Works

• Incorporate a pathway database
  – like the AraCyc
  – for finding relationships between co-expression clusters and known pathways
• A user-friendly interface which would
  – facilitate using this tool kit and
  – help manage output data

Availability

• The tool kit is now an open-source project
  – http://maccu.sourceforge.net
  – Project name: MACCU
    • Multi-Array Correlation Computation Utility
  – A detailed description of each program module has been created.
  – A running script with an example is provided.

Special Thanks

• I would like to thank
  – Drs. Chang (Bill), Schmidt & Wu
    • for raising this idea,
    • the initial implementation, and
    • valuable comments.