
Investigating the correspondence between transcriptomic and proteomic expression profiles using coupled cluster models Simon Rogers, Mark Girolami, Walter Kolch, Katrina M. Waters, Tao Liu, Brian Thrall and H. Steven Wiley BIOINFORMATICS Vol. 24 no. 24 2008, pages 2894-2900

Post on 15-Jan-2016


Page 1: Investigating the correspondence between transcriptomic and proteomic expression profiles using coupled cluster models Simon Rogers, Mark Girolami, Walter

Investigating the correspondence between transcriptomic and proteomic expression profiles using coupled cluster models

Simon Rogers, Mark Girolami, Walter Kolch, Katrina M. Waters, Tao Liu, Brian Thrall and H. Steven Wiley

BIOINFORMATICS Vol. 24 no. 24 2008, pages 2894-2900


Outline

● Introduction
● The coupled mixture model
● Results and discussion
● Conclusions


Introduction

● Proteomics:
– A blend of "protein" and "genome"
– Large-scale study of proteins
– More complicated than genomics
– mRNA is not always translated into protein; the amount of protein produced for a given amount of mRNA depends on the gene it is transcribed from and on the current physiological state of the cell.

● Modern transcriptomics and proteomics enable us to survey the expression of RNAs and proteins at large scales.


Introduction

● There is increasing interest in comparing and co-analyzing transcriptome and proteome expression data.
– A major open question is whether transcriptome and proteome expression are linked and how they are coordinated.

– Make inferences and predictions about how the network of regulatory control varies at the mRNA and protein levels.


Introduction

● Two strategies:
– Concatenating:

● Given mRNA and proteomic data for some set of N genes at T time points.

● Combine both data sets into one vector of length 2T.
● This groups together genes that share similar mRNA and protein profiles.
● Inflexible! What about genes that share similar mRNA profiles but have very different protein profiles (and vice versa)?
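A minimal sketch of the concatenation strategy, assuming each profile is a plain list of T numbers. The helper names and toy values are hypothetical. Two genes with identical mRNA profiles but opposite protein profiles are far apart once each is joined into one vector of length 2T, which is the inflexibility noted above.

```python
import math

def concatenate_profiles(mrna, protein):
    """Join an mRNA profile and a protein profile (each of length T) into one 2T vector."""
    if len(mrna) != len(protein):
        raise ValueError("both profiles must cover the same T time points")
    return list(mrna) + list(protein)

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical toy genes measured at T = 3 time points.
gene_a = concatenate_profiles([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
gene_b = concatenate_profiles([0.0, 1.0, 2.0], [2.0, 1.0, 0.0])

print(len(gene_a))                          # 6, i.e. 2T
print(euclidean(gene_a[:3], gene_b[:3]))    # 0.0: the mRNA halves are identical
print(round(euclidean(gene_a, gene_b), 3))  # 2.828: the protein half separates them
```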


Introduction

● There are a great many more clusters in the concatenated space than in either individual representation (the size of the feature space doubles without any increase in the number of data instances).

– Clustering completely independently:
● Loses the explicit relationship between the two data sets.


Introduction

● A probabilistic clustering model that couples together transcriptomic and proteomic profiles
– Based on two coupled statistical mixture models.

[Figure: a scale with "Clustering independently" at one end and "Concatenating" at the other]

For a particular dataset, where on this scale does our model naturally sit?


The coupled mixture model

● Assume we have two separate mixture models: one for the mRNA data (with K components) and one for the proteomic data (with J components).

● Place a prior distribution over both sets of components: p(k, j)

● Factorize the joint prior as p(k, j) = p(k) p(j|k)
– The components of p(j|k) provide us with details of the relationship between expression at the mRNA and protein levels.
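A small numeric illustration of the factorization p(k, j) = p(k) p(j|k). The K = 2, J = 3 values are hypothetical toy numbers, not taken from the paper. It builds the K×J joint prior and checks two things: the prior sums to 1, and summing over j recovers p(k).

```python
# Hypothetical toy prior with K = 2 mRNA clusters and J = 3 protein clusters.
p_k = [0.6, 0.4]                   # p(k)
p_j_given_k = [[0.7, 0.2, 0.1],    # p(j | k = 0); every row sums to 1
               [0.1, 0.1, 0.8]]    # p(j | k = 1)

def joint_prior(p_k, p_j_given_k):
    """K x J matrix of p(k, j) = p(k) p(j | k)."""
    return [[pk * pjk for pjk in row] for pk, row in zip(p_k, p_j_given_k)]

joint = joint_prior(p_k, p_j_given_k)
print(round(sum(map(sum, joint)), 10))         # 1.0: a proper joint distribution
print([round(sum(row), 10) for row in joint])  # [0.6, 0.4]: summing over j recovers p(k)
```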


The coupled mixture model

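● Define p(k) as πk and p(j|k) as θjk, and denote the complete sets of these parameters as π and θ, respectively. The likelihood of a dataset (X) of G genes is

$$p(\mathbf{X} \mid \pi, \theta, \Delta) = \prod_{g=1}^{G} \sum_{k=1}^{K} \pi_k \, p(\mathbf{x}^{m}_{g} \mid \Delta_k) \sum_{j=1}^{J} \theta_{jk} \, p(\mathbf{x}^{p}_{g} \mid \Delta_j)$$

where x_g^m and x_g^p are the mRNA and protein profiles of gene g, and Δk and Δj correspond to any parameters unique to the k-th mRNA and j-th protein cluster, respectively.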
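A sketch of how this likelihood can be evaluated in log space. It assumes the standard coupled-mixture form: a product over genes of Σ_k πk p(mRNA_g | Δk) Σ_j θjk p(protein_g | Δj). Treating each Δ as the (mean, variance) of a spherical Gaussian is an illustrative assumption. The slides only state that the components are Gaussian. All function names are hypothetical. The code uses log-sum-exp to avoid underflow, and it assumes every πk and θjk is strictly positive.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a spherical Gaussian N(x | mean, var * I)."""
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * (len(x) * math.log(2 * math.pi * var) + sq / var)

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def coupled_log_likelihood(mrna, protein, pi, theta, mrna_params, prot_params):
    """sum_g log sum_k pi_k p(m_g | k) sum_j theta[j][k] p(p_g | j).

    theta[j][k] = p(j | k), so each column of theta sums to 1.
    mrna_params[k] and prot_params[j] are (mean vector, variance) pairs.
    """
    total = 0.0
    for m_g, p_g in zip(mrna, protein):
        prot_terms = [log_gauss(p_g, mu, v) for mu, v in prot_params]
        per_k = []
        for k, (mu_k, v_k) in enumerate(mrna_params):
            inner = logsumexp([math.log(theta[j][k]) + prot_terms[j]
                               for j in range(len(prot_params))])
            per_k.append(math.log(pi[k]) + log_gauss(m_g, mu_k, v_k) + inner)
        total += logsumexp(per_k)
    return total
```

With a single gene and K = J = 1, the expression reduces to log N(mRNA) + log N(protein), which is a quick sanity check.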


The coupled mixture model

● Initialization and restarting
– The expectation-maximization (EM) algorithm can be used to find a local maximum of a lower bound on the likelihood function.

– It is sensitive to initial conditions.
● Run the algorithm from 100 random initializations and keep the one that gives the highest value of the lower bound.
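A compact sketch of EM with random restarts for a coupled mixture. It assumes spherical Gaussian components, plain maximum-likelihood EM, and a small variance floor. None of these details are confirmed by the slides. With an exact E-step the EM lower bound equals the log-likelihood, so the code scores each restart by its final log-likelihood. All function names are hypothetical.

```python
import math
import random

EPS = 1e-10  # small floor that keeps logs and divisions finite

def log_gauss(x, mean, var):
    """Log density of a spherical Gaussian N(x | mean, var * I)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, mean))
    return -0.5 * (len(x) * math.log(2 * math.pi * var) + sq / var)

def weighted_gauss(X, weights):
    """Weighted mean and spherical variance of the rows of X (variance floored at 1e-3)."""
    total = sum(weights) + EPS
    d = len(X[0])
    mean = [sum(w * x[i] for w, x in zip(weights, X)) / total for i in range(d)]
    sq = sum(w * sum((x[i] - mean[i]) ** 2 for i in range(d)) for w, x in zip(weights, X))
    return mean, max(sq / (d * total), 1e-3)

def e_step(M, P, pi, theta, mu_m, var_m, mu_p, var_p):
    """Responsibilities r[g][k][j] for each (k, j) pair and the current log-likelihood."""
    K, J = len(pi), len(mu_p)
    R, loglik = [], 0.0
    for m_g, p_g in zip(M, P):
        lm = [log_gauss(m_g, mu_m[k], var_m[k]) for k in range(K)]
        lp = [log_gauss(p_g, mu_p[j], var_p[j]) for j in range(J)]
        logs = [[math.log(pi[k] + EPS) + math.log(theta[j][k] + EPS) + lm[k] + lp[j]
                 for j in range(J)] for k in range(K)]
        top = max(max(row) for row in logs)
        w = [[math.exp(v - top) for v in row] for row in logs]
        z = sum(sum(row) for row in w)
        loglik += top + math.log(z)
        R.append([[v / z for v in row] for row in w])
    return R, loglik

def fit_once(M, P, K, J, rng, n_iter=50):
    """One EM run from a random initialization; returns (log-likelihood, parameters)."""
    G = len(M)
    mu_m = [list(M[i]) for i in rng.sample(range(G), K)]  # means start at random genes
    mu_p = [list(P[i]) for i in rng.sample(range(G), J)]
    var_m, var_p = [1.0] * K, [1.0] * J
    pi = [1.0 / K] * K
    theta = [[1.0 / J] * K for _ in range(J)]  # theta[j][k] = p(j | k)
    for _ in range(n_iter):
        R, _ = e_step(M, P, pi, theta, mu_m, var_m, mu_p, var_p)
        r_k = [[sum(R[g][k]) for g in range(G)] for k in range(K)]
        r_j = [[sum(R[g][k][j] for k in range(K)) for g in range(G)] for j in range(J)]
        pi = [sum(r_k[k]) / G for k in range(K)]
        theta = [[sum(R[g][k][j] for g in range(G)) / (sum(r_k[k]) + EPS)
                  for k in range(K)] for j in range(J)]
        mu_m, var_m = map(list, zip(*(weighted_gauss(M, r_k[k]) for k in range(K))))
        mu_p, var_p = map(list, zip(*(weighted_gauss(P, r_j[j]) for j in range(J))))
    _, loglik = e_step(M, P, pi, theta, mu_m, var_m, mu_p, var_p)
    return loglik, (pi, theta, mu_m, var_m, mu_p, var_p)

def fit_with_restarts(M, P, K, J, n_restarts=100, seed=0):
    """Run EM from n_restarts random initializations and keep the best-scoring run."""
    rng = random.Random(seed)
    return max((fit_once(M, P, K, J, rng) for _ in range(n_restarts)), key=lambda run: run[0])
```

The default of 100 restarts mirrors the slide. For large G, a vectorized implementation would be far faster than these nested lists.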


The coupled mixture model

● Reproducibility
– The symmetry of the likelihood with respect to permutations of the component labels (j and k) makes it very difficult to compare results produced from multiple restarts.

– Compare the enriched GO terms across multiple restarts instead.

– If the results are reproducible, we would expect a significant proportion of GO terms to be enriched over many random initializations.
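A minimal sketch of this reproducibility check. Matching restarts by enriched GO term rather than by cluster label sidesteps the label-permutation problem. The input format (one set of enriched term IDs per restart), the 0.5 threshold, and the placeholder term names are all hypothetical.

```python
from collections import Counter

def reproducible_terms(enriched_per_restart, min_fraction=0.5):
    """GO terms enriched in at least `min_fraction` of the restarts, with their frequency.

    enriched_per_restart: one set of enriched GO term IDs per random initialization.
    """
    n = len(enriched_per_restart)
    counts = Counter(term for terms in enriched_per_restart for term in set(terms))
    return {t: c / n for t, c in counts.items() if c / n >= min_fraction}

# Hypothetical placeholder term IDs from four restarts.
runs = [{"term_a", "term_b"}, {"term_a", "term_c"}, {"term_a", "term_b"}, {"term_d"}]
print(sorted(reproducible_terms(runs).items()))  # [('term_a', 0.75), ('term_b', 0.5)]
```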


Results and Discussion

● Apply to a large dataset of quantitative transcriptomic and proteomic expression data obtained from a human breast epithelial cell line stimulated by epidermal growth factor (EGF) over a series of timepoints corresponding to one cell cycle.

● The numbers of components K and J were determined individually using the Bayesian Information Criterion (BIC): K = 15, J = 20.
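A sketch of BIC-based model selection, using BIC = (free parameters) × ln(n) − 2 × (maximized log-likelihood), where lower is better. The parameter count assumes spherical Gaussian components, which the slides do not specify. The candidate log-likelihoods in the test are hypothetical, not the paper's values. The function names are also hypothetical.

```python
import math

def bic(log_likelihood, n_params, n_points):
    """Bayesian Information Criterion; lower values are preferred."""
    return n_params * math.log(n_points) - 2.0 * log_likelihood

def spherical_gmm_params(K, D):
    """Free parameters of a K-component spherical Gaussian mixture in D dimensions:
    K*D means, K variances and K-1 mixing weights."""
    return K * D + K + (K - 1)

def choose_k(fitted, D, n_points):
    """fitted maps each candidate K to the maximized log-likelihood of a K-component fit."""
    return min(fitted, key=lambda K: bic(fitted[K], spherical_gmm_params(K, D), n_points))
```

Run separately on the mRNA profiles and on the protein profiles, this selects K and J independently, as on the slide.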


Results and Discussion

● Preliminary analysis
– We first clustered the two data sets separately and analyzed the similarity between the obtained clusterings, finding a very low level of agreement.

– We looked at the number of enriched Gene Ontology (GO) terms found when the two representations are clustered individually and when they are concatenated. Significantly fewer were found when concatenated than when the data sets are analyzed individually.
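Two standard tools that fit this preliminary analysis, sketched under assumptions. The slides name neither the similarity measure nor the enrichment test. The adjusted Rand index (ARI) is one common way to score agreement between two clusterings of the same genes. A one-sided hypergeometric test is a common way to call a GO term enriched in a cluster. The function names are hypothetical.

```python
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two clusterings of the same genes (1.0 = identical)."""
    n = len(labels_a)
    pairs, rows, cols = {}, {}, {}
    for a, b in zip(labels_a, labels_b):
        pairs[(a, b)] = pairs.get((a, b), 0) + 1
        rows[a] = rows.get(a, 0) + 1
        cols[b] = cols.get(b, 0) + 1
    index = sum(comb(c, 2) for c in pairs.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. every gene in one cluster in both
        return 1.0
    return (index - expected) / (max_index - expected)

def enrichment_p_value(hits, cluster_size, annotated, population):
    """One-sided hypergeometric p-value P(X >= hits): a GO term annotates `annotated`
    of `population` genes, and `hits` of the `cluster_size` genes in a cluster carry it."""
    upper = min(cluster_size, annotated)
    tail = sum(comb(annotated, i) * comb(population - annotated, cluster_size - i)
               for i in range(hits, upper + 1))
    return tail / comb(population, cluster_size)
```

In practice, the enrichment p-values would also need a multiple-testing correction across terms and clusters.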


Results and Discussion

• High-level observations
– The model defines a prior distribution, p(j|k), over the component to which a protein profile should be assigned, conditioned on the component to which the associated mRNA profile belongs.

– The K×J matrix p(j|k) provides some insight into the level of connectivity between the two representations.


Results and Discussion

– p(j|k) is very diffuse rather than being dominated by a small number of protein clusters.

– Each mRNA cluster is connected to a large number of protein clusters, and vice versa, suggesting that the relationship between transcriptional and translational control is a very complex one.

– Quantify the level of complexity by analyzing the entropy of p(j|k).

• If there were a one-to-one relationship between mRNA and protein clusters, the entropy would be close to 0.
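A sketch of the entropy measure, assuming "the entropy of p(j|k)" means the entropy of each row k. The code also provides the p(k)-weighted average H(J|K). That averaging choice is an assumption, and the function names are hypothetical. A one-to-one mapping gives 0. Perfectly diffuse rows give the maximum, log J.

```python
import math

def row_entropies(p_j_given_k):
    """Entropy (in nats) of p(j | k) for each mRNA cluster k; rows must sum to 1."""
    return [sum(p * math.log(1.0 / p) for p in row if p > 0) for row in p_j_given_k]

def conditional_entropy(p_k, p_j_given_k):
    """H(J | K) = sum_k p(k) H(p(j | k)): 0 for a one-to-one mapping, log J for uniform rows."""
    return sum(pk * h for pk, h in zip(p_k, row_entropies(p_j_given_k)))

one_to_one = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
diffuse = [[1 / 3] * 3 for _ in range(3)]
uniform_k = [1 / 3] * 3
print(conditional_entropy(uniform_k, one_to_one))         # 0.0
print(round(conditional_entropy(uniform_k, diffuse), 4))  # 1.0986, i.e. log 3
```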


Results and Discussion

● The fact that the decrease is so small can be partly explained by the observation that the genes appear to be organized into many small groups with homogeneous mRNA and protein profiles.

[Figure: the left curve gives the true entropy; the right gives the entropy obtained when the proteins are permuted.]
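A sketch of the permutation baseline. Shuffling the protein assignments across genes breaks the gene-level pairing between mRNA and protein. The entropy after shuffling therefore shows what "no relationship" looks like. For simplicity, the code estimates H(J|K) from hard cluster assignments. That is an assumption: the model itself works with the soft prior p(j|k). The function names are hypothetical.

```python
import math
import random
from collections import Counter

def empirical_conditional_entropy(mrna_labels, protein_labels):
    """H(J | K) in nats estimated from the hard cluster assignment of each gene."""
    n = len(mrna_labels)
    joint = Counter(zip(mrna_labels, protein_labels))
    k_counts = Counter(mrna_labels)
    # sum over (k, j) of p(k, j) * log(1 / p(j | k))
    return sum(c / n * math.log(k_counts[k] / c) for (k, j), c in joint.items())

def permuted_entropies(mrna_labels, protein_labels, n_perm=100, seed=0):
    """Entropies after shuffling protein labels across genes, breaking the gene-level pairing."""
    rng = random.Random(seed)
    shuffled = list(protein_labels)
    out = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        out.append(empirical_conditional_entropy(mrna_labels, shuffled))
    return out

# Hypothetical toy assignments for six genes.
mrna = [0, 0, 1, 1, 2, 2]
protein = [0, 0, 1, 1, 2, 2]
observed = empirical_conditional_entropy(mrna, protein)
baseline = permuted_entropies(mrna, protein)
print(observed, sum(baseline) / len(baseline))
```

A true entropy only slightly below the permuted baseline corresponds to the small decrease discussed above.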


[Figure: cluster-cluster connections with p(j|k) > 0.1]
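A short sketch of how a figure like this can be derived from the K×J matrix: keep only the (k, j) pairs whose p(j|k) exceeds the 0.1 threshold from the slide. The matrix values and the function name are hypothetical.

```python
def cluster_links(p_j_given_k, threshold=0.1):
    """Edges (k, j, weight) of the mRNA-protein cluster network with p(j | k) > threshold."""
    return [(k, j, p) for k, row in enumerate(p_j_given_k)
            for j, p in enumerate(row) if p > threshold]

# Hypothetical 2 x 3 matrix of p(j | k); rows are indexed by mRNA cluster k.
p = [[0.70, 0.25, 0.05],
     [0.05, 0.10, 0.85]]
print(cluster_links(p))  # [(0, 0, 0.7), (0, 1, 0.25), (1, 2, 0.85)]
```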


Results and Discussion

● Cluster-cluster relationships
– The ribosomes:

● In the highly complicated network, one very strong connection stands out: the connection between protein cluster j=4 and mRNA clusters k=3 and k=11.

● P(k=3|j=4) = 0.3656 and P(k=11|j=4) = 0.2316, both among the 10 highest of the K×J = 300 values.

● A total of 18 out of the 19 proteins in j=4 are ribosomal and they exhibit an exceptionally high expression homogeneity.

– These proteins must act together to form the large and small ribosomal subunits.
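The slide quotes P(k|j), while the model is parameterized by p(k) and p(j|k). One way to obtain such values is Bayes' rule, p(k|j) = p(k) p(j|k) / Σ_k' p(k') p(j|k'). Ranking the K×J entries then picks out the strongest links. This is a sketch under that assumption, not necessarily how the paper computed its numbers. The toy values and function names are hypothetical.

```python
def p_k_given_j(p_k, p_j_given_k):
    """Invert the prior with Bayes' rule: p(k | j) = p(k) p(j | k) / sum_k' p(k') p(j | k')."""
    K, J = len(p_k), len(p_j_given_k[0])
    joint = [[p_k[k] * p_j_given_k[k][j] for j in range(J)] for k in range(K)]
    col = [sum(joint[k][j] for k in range(K)) for j in range(J)]
    return [[joint[k][j] / col[j] for j in range(J)] for k in range(K)]

def top_links(matrix, n=10):
    """The n largest entries of a K x J matrix as (value, k, j), largest first."""
    cells = [(v, k, j) for k, row in enumerate(matrix) for j, v in enumerate(row)]
    return sorted(cells, reverse=True)[:n]

# Hypothetical toy prior with K = 2 and J = 2.
post = p_k_given_j([0.75, 0.25], [[0.5, 0.5], [0.1, 0.9]])
print(top_links(post, n=2))
```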


Copyright restrictions may apply.

Rogers, S. et al. Bioinformatics 2008 24:2894-2900; doi:10.1093/bioinformatics/btn553

[Figure annotations: "Rather similar profile"; "Quite diverse expression profiles"]



[Figure: isolating mRNA cluster k=3 reveals an enormous diversity of protein profiles]

It does not seem unreasonable from these observations to suggest that all of these processes are heavily regulated at the protein level.


Results and Discussion

– Cell adhesion



Results and Discussion

– The chaperonin TCP-1 complex


Results and Discussion

● Summary
– The correlation between transcription and translation seems to be generally low and to diverge with evolution.
– This correlation becomes very limited in mammals.
– These results indicate that transcriptional (mRNA) and translational (protein) networks may have evolved independently, except on the rare occasions where strong selection in favor of correlation between gene transcription and protein translation was present.


Conclusion

● The model consists of two Gaussian mixtures coupled through a joint prior on the mixture components. It allows us to find clusters of genes that are similar at the mRNA and protein levels and to unravel the links between them.
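A generative reading of the coupled model, sketched under assumptions. For each gene, draw an mRNA cluster k from p(k). Then draw a protein cluster j from p(j|k). Finally, add independent Gaussian noise with a shared standard deviation (an assumption) around each cluster's mean profile. The function and parameter names are hypothetical.

```python
import random

def sample_gene(pi, p_j_given_k, mrna_means, protein_means, sd, rng):
    """Draw one gene: k ~ p(k), j ~ p(j | k), then Gaussian-noised mRNA and protein profiles."""
    k = rng.choices(range(len(pi)), weights=pi)[0]
    j = rng.choices(range(len(p_j_given_k[k])), weights=p_j_given_k[k])[0]
    mrna = [rng.gauss(mu, sd) for mu in mrna_means[k]]
    protein = [rng.gauss(mu, sd) for mu in protein_means[j]]
    return k, j, mrna, protein

# Hypothetical toy parameters: K = J = 2, T = 3 time points.
rng = random.Random(0)
genes = [sample_gene([0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]],
                     [[0.0, 1.0, 2.0], [2.0, 1.0, 0.0]],
                     [[1.0, 1.0, 1.0], [0.0, 2.0, 0.0]], 0.1, rng) for _ in range(5)]
```

Sampling from the model this way is a cheap check that a fitting routine can recover known parameters.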


Introduction to GO

● The Gene Ontology project, or GO, provides a controlled vocabulary to describe gene and gene product attributes in any organism.

● http://www.geneontology.org/ (the GO portal website)

● Three important databases in GO:
  1. FlyBase (Drosophila)
  2. Saccharomyces Genome Database
  3. Mouse Genome Database


Ontologies

● The GO files are organized into the following three ontologies:
  1. molecular function
  2. biological process
  3. cellular component


● A gene product has one or more molecular functions and is used in one or more biological processes; it may also be associated with one or more cellular components.


Molecular function

● In GO, molecular function describes activities rather than entities (molecules or complexes).

  Examples of broad functional terms are catalytic activity, transporter activity, or binding;

  Examples of narrower functional terms are adenylate cyclase activity or Toll receptor binding.


Biological process

● A biological process is a series of events accomplished by one or more ordered assemblies of molecular functions.

  Examples of broad biological process terms are cellular physiological process or signal transduction.

  Examples of more specific terms are pyrimidine metabolism or alpha-glucoside transport.


Cellular component

● A cellular component is just that, a component of a cell but with the proviso that it is part of some larger object, which may be an anatomical structure (e.g. rough endoplasmic reticulum or nucleus) or a gene product group (e.g. ribosome, proteasome or a protein dimer).


Gene Ontology term

● Each GO term has a unique alphanumeric identifier.

● When a term has multiple meanings depending on species, the GO uses a "sensu" tag to differentiate among them.

● Each term belongs to only one of the three ontologies, each of which is structured as a directed acyclic graph.
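Because each ontology is a directed acyclic graph, a term can have several parents. Enrichment analyses commonly propagate a gene's annotation to all ancestors of the annotated term. The sketch below collects those ancestors with an iterative depth-first search. The toy graph and term names are hypothetical placeholders, not real GO identifiers.

```python
def ancestors(term, parents):
    """All terms reachable by following parent links upward in a DAG (a term may have several parents)."""
    seen, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# Hypothetical toy ontology: "child" has two parents, which share a root.
parents = {"child": ["parent_a", "parent_b"], "parent_a": ["root"], "parent_b": ["root"]}
print(sorted(ancestors("child", parents)))  # ['parent_a', 'parent_b', 'root']
```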