Gene Expression Clustering
The Main Goal
Gain insight into the gene’s function.
Using: Sequence. Transcription levels.
Microarray Technology
Microarray - standard laboratory technique. Information about gene expression. Tens of thousands of data points. Analyze by computational methods.
Gene Clustering
To cluster genes means to group together genes with similarity in their expression patterns.
Why do we need to cluster genes?
Unknown gene function. Common regulatory elements. Pathways and biological processes. Defining new disease subclasses. Predict categorization of new samples. Data reduction and visualization.
Gene Clustering
Clustering methods can be divided into two major groups:
Supervised clustering – classify according to previous knowledge (group prediction).
Unsupervised clustering – no previous knowledge is used (pattern discovery).
Unsupervised clustering
In many cases we have little a priori knowledge about genes.
There are many different methods of unsupervised clustering.
We will present Hierarchical clustering.
The Method
Hierarchical clustering
All data instances start in their own clusters. The two most closely related clusters are merged. This is repeated until a single cluster remains.
Arranges the data into a tree structure. Can be broken into the desired number of clusters.
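The merge loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the studies discussed later: the function name `agglomerate`, the `target` parameter and the toy values are all assumptions, and the cluster-to-cluster distance is taken to be the mean of the pairwise item distances.

```python
from itertools import combinations


def agglomerate(dist, labels, target=1):
    """Merge the two closest clusters until `target` clusters remain.

    dist maps frozenset({a, b}) to the distance between items a and b.
    Cluster-to-cluster distance is the mean of all pairwise item
    distances (an assumption; other linkage rules are possible).
    Returns the final clusters and the merge history.
    """
    clusters = [(label,) for label in labels]  # every item starts alone
    merges = []

    def cluster_distance(c1, c2):
        pairs = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > target:
        # Find the closest pair of clusters and merge them.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2))
    return clusters, merges


# Toy one-dimensional "expression" values, made up for illustration.
values = {"g1": 0.0, "g2": 0.1, "g3": 1.0, "g4": 1.2}
dist = {frozenset((a, b)): abs(values[a] - values[b])
        for a, b in combinations(values, 2)}

clusters, merges = agglomerate(dist, list(values), target=2)
print(clusters)
```

Stopping at `target=2` corresponds to breaking the tree into two clusters; running with `target=1` records the full merge history in `merges`.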
Hierarchical clustering: The raw data

| Gene | Chip1 | Chip2 | … | Chip20 |
|---|---|---|---|---|
| 1 | x1,1 | x1,2 | … | x1,20 |
| 2 | x2,1 | x2,2 | … | x2,20 |
| 3 | x3,1 | x3,2 | … | x3,20 |
| ⋮ | ⋮ | ⋮ | | ⋮ |
| 12,000 | x12000,1 | x12000,2 | … | x12000,20 |
Hierarchical clustering: Normalized data
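Hierarchical clustering: Calculate the Distance Matrix

Euclidean distance formula:

$$x = (x_1, x_2, \ldots, x_n), \qquad y = (y_1, y_2, \ldots, y_n)$$

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

Correlation coefficient:

$$d(x, y) = \frac{\frac{1}{N}\sum_{i=1}^{N} \bigl(X_i - E(X)\bigr)\bigl(Y_i - E(Y)\bigr)}{\sqrt{V(X)\,V(Y)}}$$

$$E(X) = \frac{1}{N}\sum_{i=1}^{N} X_i, \qquad V(X) = \frac{1}{N}\sum_{i=1}^{N} \bigl(X_i - E(X)\bigr)^2$$

[Figure labels: A, B, C]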
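The two measures on this slide can be written directly in Python. This is a sketch using only the standard library; the function names are illustrative and the sample vectors are made up. Note that the correlation coefficient is a similarity (1 means identical shape), so a quantity such as `1 - correlation` is often used when a distance is needed; that conversion is an assumption here, not something stated on the slide.

```python
import math


def euclidean_distance(x, y):
    """Square root of the summed squared differences between profiles."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance V(X), dividing by N as on the slide."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def correlation(x, y):
    """Pearson correlation: covariance divided by sqrt(V(X) * V(Y))."""
    ex, ey = mean(x), mean(y)
    covariance = sum((xi - ex) * (yi - ey) for xi, yi in zip(x, y)) / len(x)
    return covariance / math.sqrt(variance(x) * variance(y))


# Two made-up profiles with the same shape but different magnitude.
profile_a = [1.0, 2.0, 3.0]
profile_b = [2.0, 4.0, 6.0]

print(euclidean_distance(profile_a, profile_b))
print(correlation(profile_a, profile_b))
```

The example shows why the choice matters: the two profiles are far apart in Euclidean terms but perfectly correlated, so a correlation-based measure groups genes by the shape of their expression pattern rather than by its magnitude.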
Hierarchical clustering: Calculate the Distance Matrix
Average linkage – midpoint. Single linkage – smallest distance. Complete linkage – largest distance.
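The three linkage rules differ only in how the item-to-item distances between two clusters are summarized. A minimal sketch, assuming that "midpoint" for average linkage means the mean of all pairwise distances (a centroid-based reading is also possible); the function names and sample points are illustrative.

```python
def pairwise_distances(cluster_a, cluster_b, dist):
    """All distances between a member of cluster_a and a member of cluster_b."""
    return [dist(a, b) for a in cluster_a for b in cluster_b]


def single_linkage(cluster_a, cluster_b, dist):
    """Smallest distance between any two members."""
    return min(pairwise_distances(cluster_a, cluster_b, dist))


def complete_linkage(cluster_a, cluster_b, dist):
    """Largest distance between any two members."""
    return max(pairwise_distances(cluster_a, cluster_b, dist))


def average_linkage(cluster_a, cluster_b, dist):
    """Mean of all member-to-member distances (assumed reading of 'midpoint')."""
    distances = pairwise_distances(cluster_a, cluster_b, dist)
    return sum(distances) / len(distances)


def absolute_difference(a, b):
    return abs(a - b)


# Two made-up clusters of one-dimensional points.
left = [0.0, 1.0]
right = [3.0, 5.0]

print(single_linkage(left, right, absolute_difference))
print(complete_linkage(left, right, absolute_difference))
print(average_linkage(left, right, absolute_difference))
```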
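Hierarchical clustering: Calculate the Distance Matrix

| Gene | Chip1 | Chip2 |
|---|---|---|
| A | -2.0 | 1.0 |
| B | -1.5 | -0.5 |
| C | 1.0 | 0.25 |

| | A | B | C |
|---|---|---|---|
| A | 0.00 | 1.58 | 3.09 |
| B | 1.58 | 0.00 | 2.61 |
| C | 3.09 | 2.61 | 0.00 |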
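The distance matrix for genes A, B and C follows from the two chip values with the Euclidean formula. A short sketch that recomputes it (the variable names are illustrative):

```python
import math

# Expression values for each gene on Chip1 and Chip2, from the slide.
genes = {"A": (-2.0, 1.0), "B": (-1.5, -0.5), "C": (1.0, 0.25)}


def euclidean(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# Symmetric matrix of pairwise distances, rounded to two decimals.
matrix = {
    a: {b: round(euclidean(genes[a], genes[b]), 2) for b in genes}
    for a in genes
}

for name, row in matrix.items():
    print(name, row)
```

For example, d(A, B) = sqrt(0.5² + 1.5²) = sqrt(2.5), which rounds to 1.58.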
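Hierarchical clustering: Average Linkage Algorithm

| | A | B | C | D |
|---|---|---|---|---|
| A | 0.00 | 1.58 | 3.09 | 4.74 |
| B | 1.58 | 0.00 | 2.61 | 5.00 |
| C | 3.09 | 2.61 | 0.00 | 2.70 |
| D | 4.74 | 5.00 | 2.70 | 0.00 |

[Figure labels: A, D, B, C]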
Hierarchical clustering: Average Linkage Algorithm

| | AB | C | D |
|---|---|---|---|
| AB | 0.00 | 2.85 | 4.81 |
| C | 2.85 | 0.00 | 2.70 |
| D | 4.81 | 2.70 | 0.00 |

[Figure labels: AB, D, C]
Hierarchical clustering: Average Linkage Algorithm

| | AB | CD |
|---|---|---|
| AB | 0.00 | 3.83 |
| CD | 3.83 | 0.00 |

[Figure labels: AB, CD]
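Two of the merged entries can be checked by hand. As a sketch, assuming the distance to a merged cluster is the plain average of the corresponding entries in the previous matrix:

$$d(AB, C) = \frac{d(A, C) + d(B, C)}{2} = \frac{3.09 + 2.61}{2} = 2.85$$

$$d(AB, CD) = \frac{d(AB, C) + d(AB, D)}{2} = \frac{2.85 + 4.81}{2} = 3.83$$

The AB to D entry (4.81) does not follow from the same average, which would give (4.74 + 5.00)/2 = 4.87. It is close to what a midpoint (centroid) calculation gives, so the slides may mix the two conventions. The merge order (A with B, then C with D, then AB with CD) is the same either way.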
Hierarchical clustering: Dendrogram
[Figure: dendrogram with leaves A, B, C, D]
Hierarchical clustering: Heat maps
Red corresponds to high expression levels.
Green corresponds to low expression levels.
Black corresponds to intermediate expression levels.
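The colour scheme can be sketched as a function from an expression value to an RGB triple. This assumes values centred on zero (for example log ratios), so that zero is "intermediate"; the function name and the saturation `limit` are hypothetical choices for illustration.

```python
def expression_to_rgb(value, limit=2.0):
    """Map an expression value to an (R, G, B) tuple with channels 0-255.

    Positive values shade towards red, negative values towards green,
    and zero is black. Values beyond +/- limit are clipped.
    """
    clipped = max(-limit, min(limit, value))
    intensity = round(255 * abs(clipped) / limit)
    if clipped > 0:
        return (intensity, 0, 0)
    if clipped < 0:
        return (0, intensity, 0)
    return (0, 0, 0)


for sample in (2.0, 0.0, -2.0):
    print(sample, expression_to_rgb(sample))
```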
Hierarchical clustering: Experiment Control
Random 1 – randomized by rows.
Random 2 – randomized by columns.
Random 3 – randomized by both rows and columns.
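The three randomized controls can be sketched as shuffles of the expression matrix. The interpretation is an assumption: "by rows" is read here as permuting the values within each row, and "by columns" as permuting the values within each column. The function names and the small matrix are illustrative.

```python
import random


def shuffle_within_rows(matrix, rng):
    """Random 1 (assumed meaning): permute the values inside each row."""
    return [rng.sample(row, len(row)) for row in matrix]


def shuffle_within_columns(matrix, rng):
    """Random 2 (assumed meaning): permute the values inside each column."""
    columns = [rng.sample(list(column), len(column)) for column in zip(*matrix)]
    return [list(row) for row in zip(*columns)]


def shuffle_both(matrix, rng):
    """Random 3: apply the row shuffle and then the column shuffle."""
    return shuffle_within_columns(shuffle_within_rows(matrix, rng), rng)


rng = random.Random(0)  # fixed seed so the control is reproducible
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

random_1 = shuffle_within_rows(data, rng)
random_2 = shuffle_within_columns(data, rng)
random_3 = shuffle_both(data, rng)
print(random_1, random_2, random_3, sep="\n")
```

Clustering each randomized matrix and comparing with the clustering of the real data gives a visual check that the structure seen in the real data is not produced by the algorithm alone.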
Examples
Example I
We present here an experiment by Spellman et al., published in Mol. Biol. Cell 9, 3273-3297 (1998).
Goals of the experiment: Identify all cell cycle regulated genes in yeast. Show clustering at work.
Example I: Cell Cycle
Example I: Methods
DNA microarrays covered the whole yeast genome.
Measure levels of mRNA as a function of time.
Example I: Methods
Synchronization: α factor. Elutriation – size based. cdc15 – temperature-sensitive mutation.
Factors: Cln3p, Clb2p deletion. Induced with these factors.
Data from a previously published study (Cho et al. 1998)
Control sample: asynchronous cultures.
Example I: Methods
Measurements analyzed based on:
Fourier algorithm - assesses periodicity.
Correlation measurement - compared with previously identified cell cycle regulated genes.
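A Fourier-style periodicity score can be sketched as the magnitude of a time series projected onto a sine and a cosine of the expected cell cycle period. This is a simplified illustration, not the exact scoring procedure of the published study; the function name, the 60-minute period and the sample series are assumptions.

```python
import math


def periodicity_score(times, values, period):
    """Magnitude of the Fourier component of `values` at the given period.

    The series is mean-centred first, so a constant profile scores zero.
    """
    omega = 2 * math.pi / period
    centre = sum(values) / len(values)
    sine_part = sum((v - centre) * math.sin(omega * t)
                    for t, v in zip(times, values))
    cosine_part = sum((v - centre) * math.cos(omega * t)
                      for t, v in zip(times, values))
    return math.sqrt(sine_part ** 2 + cosine_part ** 2) / len(values)


# Made-up sampling times (minutes) and two made-up expression profiles.
times = list(range(0, 120, 10))
cycling = [math.sin(2 * math.pi * t / 60) for t in times]  # period of 60 min
drifting = [t / 120 for t in times]                        # steady increase

cycling_score = periodicity_score(times, cycling, period=60)
drifting_score = periodicity_score(times, drifting, period=60)
print(cycling_score, drifting_score)
```

A gene whose expression oscillates with the cell cycle period scores much higher than one that merely drifts, which is the property the periodicity assessment relies on.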
Example I: Methods
Calculate a score for each gene - "CDC score".
Threshold CDC value.
91% of the genes previously shown to be cell cycle regulated are included.
About 800 genes were identified as cell cycle regulated.
Example I: Phasing
By time of peak expression:
Example I: Clustering
By similarity of expression across the measurements:
Example I: Clustering
Hierarchical clustering. Identified 9 clusters.
Genes in each cluster share: Common upstream elements. Regulation by similar transcription factors. Common function (only for known genes). Cln3p and Clb2p have the same effect on the genes in a cluster.
Example I: Clustering
Histone cluster: A very tight cluster. Repeated SCB motif in promoter. Induced by Cln3. Unaffected by Clb2. Peak during S phase.
Example I: Results
Genes with known functionality:
Cell cycle regulated functions. The MET cluster. Genes involved in secretion and lipid synthesis.
Known genes discovered as cell cycle regulated.
Example I: Results
New binding sites for regulators.
The CLB cluster is highly regulated. Aligning the genes in the cluster. New consensus for MCM1+SFF binding site.
Example I: Results
MCM1: T-T-A-C-C-N-A-A-T-T-N-G-G-T-A-A
SFF: G-T-M-A-A-C-A-A
New motif: T-T-W-C-C-Y-A-A-W-N-N-G-G-W-A-A-W-W-N-R-T-A-A-A-Y-A-A
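The motifs use IUPAC nucleotide codes (W = A or T, Y = C or T, R = A or G, M = A or C, N = any base). A short sketch of how such a consensus can be turned into a regular expression for scanning promoter sequences; the helper name `consensus_to_regex` and the example site are made up for illustration.

```python
import re

# IUPAC codes that appear in the motifs on this slide.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "Y": "[CT]", "R": "[AG]", "M": "[AC]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Compile a dash-separated IUPAC consensus into a regular expression."""
    symbols = consensus.split("-")
    return re.compile("".join(IUPAC[symbol] for symbol in symbols))


new_motif = "T-T-W-C-C-Y-A-A-W-N-N-G-G-W-A-A-W-W-N-R-T-A-A-A-Y-A-A"
pattern = consensus_to_regex(new_motif)

# A hypothetical 27-base site written to satisfy the consensus.
site = "TTACCCAATTAGGTAAATGGTAAACAA"
print(bool(pattern.fullmatch(site)))
```

Using `pattern.search` on a longer upstream sequence would locate candidate sites anywhere within it.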
Example II
Gasch AP. et al. Genomic expression programs in the response of yeast cells to environmental changes. Mol Biol Cell. 2000; 11(12): 4241-57.
Main Goal: Characterize the yeast response to environmental changes, and particularly to stress conditions.
Example II: Methods
Yeast cells responding to diverse environmental stresses.
Microarray contained all yeast genes. Results were organized by hierarchical clustering.
Example II: General features of the stress response
Massive and rapid changes. Transient changes.
Correlated with the magnitude of the shift: Duration. Amplitude. Steady-state difference.
Example II: General features of the stress response
Some genes responded in a stereotypical manner.
Some genes had a unique response. No two expression programs were identical.
Example II: The Environmental Stress Response (ESR)
About 900 genes responded in a stereotypical manner.
ESR – Environmental Stress Response.
Two large clusters of genes: repressed genes (~600) and induced genes (~300).
The two clusters showed reciprocal responses.
Example II: The Environmental Stress Response (ESR)
Response to different shifts in: Temperature. Osmolarity.
Example II: The Environmental Stress Response (ESR)
[Figure panels: osmolarity; heat shock]
The ESR is not a response to all environmental changes.
Example II: The Environmental Stress Response (ESR)
Shift between two equally stressful environments: 29°C and hyper-osmotic medium. 33°C with normal osmolarity.
The observed response was the sum of the responses to the individual changes.
Independent response to each of the changes.
Example II: The Environmental Stress Response (ESR)
Previously known: STRE promoter. Recognized by Msn2p and Msn4p.
One all-purpose regulatory system?
Example II: The Environmental Stress Response (ESR)
TRX2 cluster genes: Dependent on Msn2p/Msn4p in response to heat shock. Unaffected by Msn2p/Msn4p in response to H2O2.
Contained a binding site for Yap1p.
Yap1p deletion strain.
Example II: The Environmental Stress Response (ESR)
The deletion revealed that TRX2 cluster genes are: Induced by Yap1p in response to H2O2 treatment. Unaffected by the deletion in response to heat shock.
The ESR is regulated by different transcription factors.
Regulation is condition-specific and gene-specific.
Example II: Specific Response
Response to stress: Stereotypic response (ESR). Specific response.
The specific response characterizes the cell’s response to a specific stress.
Example: Heat-shock response. ESR initiated fast (minutes). Induction of chaperones. Alternative carbon source utilization.
Conclusions
Hierarchical clustering: Conclusion
Difficulty: Post-transcriptional regulation.
Solution: Use the method in cases where the main regulation is at the transcriptional level (example: yeast cell cycle).
Hierarchical clustering: Conclusion
Difficulty: No statistical foundation for the decision of where to cut the dendrogram.
Solution: Split the tree in a way that produces homogeneous clusters of genes. Such a split is considered evidence that the grouping is correct.
Hierarchical clustering: Conclusion
Difficulty: The algorithm will produce clusters in any case.
Solution: Introduce a small amount of random noise to the data, re-cluster the data and compare the results to the original clustering. If the results change substantially, the clustering probably does not represent true biological meaning.
Hierarchical clustering: Conclusion
Discover gene’s function. Status of cellular processes. Information on regulatory mechanisms. General cell behaviors. Assign genes to pathways. Unknown biological pathways.
References
Eisen M. B., Spellman P. T., Brown P. O., Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95: 14863-14868, 1998.
Spellman P. T. et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell 9: 3273-3297, 1998.
Gasch A. P., Spellman P. T., Kao C. M., Carmel-Harel O., Eisen M. B., Storz G., Botstein D., Brown P. O. Genomic expression programs in the response of yeast cells to environmental changes. Mol. Biol. Cell 11(12): 4241-4257, 2000.
Shannon W., Culverhouse R., Duncan J. Analyzing microarray data using cluster analysis. Pharmacogenomics 4(1): 41-51, 2003. Review.
Kaminski N., Friedman N. Practical approaches to analyzing results of microarray experiments. American Journal of Respiratory Cell and Molecular Biology 27: 125-132, 2002. Review.