lecture 4 microarray & analysis alizadeh et al. nature 403 (2000) 503-511

Post on 22-Dec-2015

TRANSCRIPT

  • Slide 1
  • Lecture 4 Microarray & Analysis Alizadeh et al. Nature 403 (2000) 503-511
  • Slide 2
  • Microarrays revolutionized research in biology and medicine. Before: one gene at a time. Now: tens of thousands simultaneously - PROTEOMICS
      • Gene expression
      • Gene-disease relation
      • Gene-gene interaction
      • Finding co-regulated genes
      • Understanding gene regulatory networks
      • Many, many more
  • Slide 3
  • Basic idea of Microarray (figure labels: probe, microchip, sample, hybridization)
  • Slide 4
  • Basic idea of Microarray
      • Construction
          • Place an array of probes on a microchip
          • A probe is (for example) an oligonucleotide ~25 bases long that characterizes a gene or genome
          • Each probe has many, many clones
          • The chip is about 2 cm by 2 cm
      • Application principle
          • Put a (liquid) sample containing genes on the microarray, allow probe and gene sequences to hybridize, and wash away the rest
          • Analyze the hybridization pattern
  • Slide 5
  • cDNA microarray schema (figure)
  • Slide 6
  • Microarray analysis. Operation principle: samples are tagged with fluorescent material to show the pattern of sample-probe interaction (hybridization). A microarray may have 60K probes.
  • Slide 7
  • Microarray processing sequence (figure). From: Shin-Mu Tseng
  • Slide 8
  • Gene Expression Data: gene expression data on p genes for n samples (rows: genes; columns: mRNA samples)

      Gene   sample1   sample2   sample3   sample4   sample5
      1         0.46      0.30      0.80      1.51      0.90   ...
      2        -0.10      0.49      0.24      0.06      0.46   ...
      3         0.15      0.74      0.04      0.10      0.20   ...
      4        -0.45     -1.03     -0.79     -0.56     -0.32   ...
      5        -0.06      1.06      1.35      1.09     -1.09   ...

    Gene expression level of gene i in mRNA sample j = Log(Red intensity / Green intensity) for two-color arrays, or Log(Avg. PM - Avg. MM) for oligonucleotide arrays with perfect-match (PM) and mismatch (MM) probes.
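The two expression measures on this slide can be sketched in a few lines. This is only an illustration: the slide does not state the base of the logarithm, so base 2 is an assumption here, and the function names and intensity values are hypothetical.

```python
import math

def log_ratio(red, green):
    """Two-color array: log2(Red intensity / Green intensity).
    Base 2 is an assumption; the slide only says "Log"."""
    return math.log2(red / green)

def log_pm_minus_mm(pm_values, mm_values):
    """Oligonucleotide array: log2(Avg. PM - Avg. MM).
    Returns None when the difference is not positive (log undefined)."""
    avg_pm = sum(pm_values) / len(pm_values)
    avg_mm = sum(mm_values) / len(mm_values)
    diff = avg_pm - avg_mm
    if diff <= 0:
        return None
    return math.log2(diff)

# Hypothetical intensities, for illustration only.
up_regulated = log_ratio(800.0, 200.0)      # red stronger than green: positive
down_regulated = log_ratio(200.0, 800.0)    # green stronger than red: negative
oligo_level = log_pm_minus_mm([300.0, 340.0], [40.0, 88.0])
```

A positive log ratio means the red-labeled sample has more of the transcript than the green-labeled one, and a negative value means the opposite, which is why the table above contains values of both signs.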
  • Slide 9
  • Some possible applications
      • Sample from a specific organ to show which genes are expressed
      • Compare samples from healthy and sick hosts to find gene-disease connections
      • Probes are sets of human pathogens, for disease detection
  • Slide 10
  • Amount of data from a single microarray is huge. If just two colors, then the amount of data on an array with N probes is 2 N. Cannot analyze pixel by pixel; analyze by pattern: cluster analysis.
  • Slide 11
  • Major Data Mining Techniques: Link Analysis; Associations Discovery; Sequential Pattern Discovery; Similar Time Series Discovery; Predictive Modeling; Classification; Clustering
  • Slide 12
  • Cluster Analysis: grouping similarly expressed genes, cell samples, or both
      • Strengthens signal when averages are taken within clusters of genes (Eisen)
      • Useful (essential?) when seeking new subclasses of cells, tumours, etc.
      • Leads to readily interpreted figures
  • Slide 13
  • Some clustering methods and software
      • Partitioning: K-Means, K-Medoids, PAM, CLARA
      • Hierarchical: Cluster, HAC, BIRCH, CURE, ROCK
      • Density-based: CAST, DBSCAN, OPTICS, CLIQUE
      • Grid-based: STING, CLIQUE, WaveCluster
      • Model-based: SOM (self-organizing map), COBWEB, CLASSIT, AutoClass
      • Two-way clustering
      • Block clustering
  • Slide 14
  • A review paper assessing various methods: "Algorithmic Approaches to Clustering Gene Expression Data", Ron Shamir, School of Computer Science, Tel-Aviv University, Tel-Aviv. http://citeseer.nj.nec.com/shamir01algorithmic.html Conclusion: hierarchical clustering exceptional
  • Slide 15
  • Partitioning
  • Slide 16
  • Density-based clustering
  • Slide 17
  • Hierarchical (used most often): agglomerative or divisive
  • Slide 18
  • Hierarchical Clustering: grouping similarly expressed genes. Gene Expression Profile Analysis (rows: samples; columns: genes). From: Shin-Mu Tseng

      Sample \ Gene     1     2     3     4   ..   1000
      A               0.6   0.2     0   0.7   ..    0.3
      B               0.4   0.9     0   0.5   ..    0.8
      C               0.2   0.8   0.3   0.2   ..    0.7
      .
  • Slide 19
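  • After Clustering: the gene columns are reordered so that similarly expressed genes are adjacent. Gene Expression Profile Analysis (rows: samples; columns: genes). From: Shin-Mu Tseng

      Sample \ Gene   ..     3     1     4   ..     2   1000
      A               ..     0   0.6   0.7   ..   0.2    0.3
      B               ..     0   0.4   0.5   ..   0.9    0.8
      C               ..   0.3   0.2   0.2   ..   0.8    0.7
      .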
  • Slide 20
  • Eisen et al., Proc. Natl. Acad. Sci. USA 95 (1998). Figure panels: clustered data; data randomized by row, by column, and both. Axis label: time.
  • Slide 21
  • Types of Similarity Measurements: distance measurements; correlation coefficients; association coefficients; probabilistic similarity coefficients
  • Slide 22
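  • Correlation Coefficients. The most popular correlation coefficient is the Pearson correlation coefficient (1892). The correlation between $X = \{X_1, X_2, \ldots, X_n\}$ and $Y = \{Y_1, Y_2, \ldots, Y_n\}$ is $s_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$, where $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ and $\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$. $s_{XY}$ is the similarity between X and Y. From: Shin-Mu Tseng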
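As a minimal sketch of the Pearson coefficient in its standard textbook form, the function below computes the similarity between two expression profiles. The function name is hypothetical, and the two example profiles are simply the first and third gene rows of the expression table on Slide 8, used for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)

# Rows for gene 1 and gene 3 from the Slide 8 table (illustration only).
gene1 = [0.46, 0.30, 0.80, 1.51, 0.90]
gene3 = [0.15, 0.74, 0.04, 0.10, 0.20]

s_13 = pearson(gene1, gene3)   # similarity between gene 1 and gene 3
s_11 = pearson(gene1, gene1)   # a profile compared with itself
```

The value always lies between -1 and 1, and a profile compared with itself gives 1 up to rounding, which is the property the tree construction on the next slide relies on.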
  • Slide 23
  • Now can use similarity for tree construction
      • Normalize similarity so that $s_{XX} = 1$
      • Then have an n × n similarity matrix S whose diagonal elements are 1
      • Define the distance matrix by (for example) D = 1 - S
      • Diagonal elements of D are 0
      • Now use the distance matrix to build a tree (using some tree-building software; recall the lecture on phylogeny)
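The steps above can be sketched as follows. This is a rough illustration, not the tree-building software the lecture refers to: the average-linkage merge rule is an assumption (the slide does not name a linkage), the function names are hypothetical, and the profiles are the five gene rows of the Slide 8 table.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def distance_matrix(profiles):
    """D = 1 - S, where S is the matrix of pairwise Pearson similarities."""
    n = len(profiles)
    return [[1.0 - pearson(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]

def agglomerate(dist):
    """Agglomerative clustering with average linkage (an assumption).
    Returns the merges in order as (members_a, members_b, distance)."""
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                d = sum(dist[i][j] for i, j in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = tuple(sorted(clusters[a] + clusters[b]))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges

# The five gene rows of the Slide 8 table (illustration only).
profiles = [
    [0.46, 0.30, 0.80, 1.51, 0.90],
    [-0.10, 0.49, 0.24, 0.06, 0.46],
    [0.15, 0.74, 0.04, 0.10, 0.20],
    [-0.45, -1.03, -0.79, -0.56, -0.32],
    [-0.06, 1.06, 1.35, 1.09, -1.09],
]
dist = distance_matrix(profiles)
merges = agglomerate(dist)
```

Each entry of `merges` corresponds to one internal node of the tree, so p genes always yield p - 1 merges. With D = 1 - S, perfectly correlated profiles are at distance 0 and perfectly anti-correlated profiles are at distance 2.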
  • Slide 24
  • A dendrogram (tree) for clustered genes. E.g. p = 5: leaves 1, 2, 3, 4, 5; Cluster 6 = (1,2); Cluster 7 = (1,2,3); Cluster 8 = (4,5); Cluster 9 = (1,2,3,4,5). Let p = number of genes.
      1. Calculate within class correlation.
      2. Perform hierarchical clustering, which will produce (2p-1) clusters of genes.
      3. Average within clusters of genes.
      4. Perform testing on averages of clusters of genes as if they were single genes.
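A small sketch of steps 2 and 3 for the p = 5 example: it rebuilds the 2p - 1 = 9 clusters from the merges shown in the dendrogram and averages the expression profiles within each cluster. Pairing the dendrogram with the Slide 8 expression values is an assumption made only to have numbers to average, and the variable names are hypothetical.

```python
# Expression profiles for genes 1..5 (Slide 8 table, illustration only).
genes = {
    1: [0.46, 0.30, 0.80, 1.51, 0.90],
    2: [-0.10, 0.49, 0.24, 0.06, 0.46],
    3: [0.15, 0.74, 0.04, 0.10, 0.20],
    4: [-0.45, -1.03, -0.79, -0.56, -0.32],
    5: [-0.06, 1.06, 1.35, 1.09, -1.09],
}

# Merges read off the dendrogram: 6=(1,2), 7=(6,3), 8=(4,5), 9=(7,8).
merges = [(1, 2), (6, 3), (4, 5), (7, 8)]

# Clusters 1..p are the single genes; each merge adds one new cluster.
clusters = {g: (g,) for g in genes}
next_id = len(genes) + 1
for left, right in merges:
    clusters[next_id] = clusters[left] + clusters[right]
    next_id += 1

def cluster_average(members):
    """Average the profiles of the member genes, sample by sample."""
    n_samples = len(genes[members[0]])
    return [sum(genes[g][s] for g in members) / len(members)
            for s in range(n_samples)]

averages = {cid: cluster_average(members) for cid, members in clusters.items()}
```

The resulting `averages` has one profile per cluster, single genes included, so step 4 can treat every cluster average as if it were a single gene.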
  • Slide 25
  • A real case: Nature, Feb 2000. Paper by Alizadeh, A. et al., "Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling"
  • Slide 26
  • Validation Techniques: Hubert's Γ Statistic
      • X = [X(i, j)] and Y = [Y(i, j)] are two n × n matrices
      • X(i, j): similarity of gene i and gene j
      • Y(i, j) = 1 if genes i and j are in the same cluster, 0 otherwise
      • Hubert's Γ statistic represents the point serial correlation between X and Y, where M = n(n - 1)/2 is the number of gene pairs
      • A higher value of Γ represents better clustering quality.
      • From: Shin-Mu Tseng
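The formula itself did not survive in the transcript, so the sketch below assumes the commonly used normalized form of Hubert's Γ, i.e. the correlation between X(i, j) and Y(i, j) taken over the M = n(n - 1)/2 pairs with i < j. The function name, the similarity matrix, and the cluster labels are all hypothetical.

```python
import math

def hubert_gamma(similarity, labels):
    """Normalized Hubert's Gamma (assumed form): correlation, over all
    pairs i < j, between X(i, j) and the same-cluster indicator Y(i, j)."""
    n = len(labels)
    xs, ys = [], []
    for i in range(n - 1):
        for j in range(i + 1, n):
            xs.append(similarity[i][j])
            ys.append(1.0 if labels[i] == labels[j] else 0.0)
    m = len(xs)  # M = n(n - 1) / 2
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / m)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / m)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("Gamma is undefined when X or Y is constant")
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (m * sd_x * sd_y)

# Hypothetical similarities: genes 0 and 1 are alike, genes 2 and 3 are alike.
S = [
    [1.00, 0.90, 0.10, 0.20],
    [0.90, 1.00, 0.15, 0.10],
    [0.10, 0.15, 1.00, 0.80],
    [0.20, 0.10, 0.80, 1.00],
]
gamma_good = hubert_gamma(S, [0, 0, 1, 1])  # clusters follow the similarities
gamma_bad = hubert_gamma(S, [0, 1, 0, 1])   # clusters cut across them
```

On this toy input the clustering that groups similar genes together gets the higher Γ, in line with the slide's statement that a higher value represents better clustering quality.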
  • Slide 27
  • Discovering sub-groups
  • Slide 28
  • Time Course Data: gene expression is time-dependent
  • Slide 29
  • Sample of time course of clustered genes (figure; axis: time)
  • Slide 30
  • Limitations
      • Cluster analyses:
          • Usually outside the normal framework of statistical inference
          • Less appropriate when only a few genes are likely to change
          • Need lots of experiments
      • Single gene tests:
          • May be too noisy in general to show much
          • May not reveal coordinated effects of positively correlated genes
          • Hard to relate to pathways
  • Slide 31
  • Some useful links
      • Affymetrix: www.affymetrix.com
      • Michael Eisen Lab at LBL (hierarchical clustering software Cluster and Tree View (Windows)): rana.lbl.gov/
      • Stanford MicroArray Database (Xcluster (Linux)): genome-www4.stanford.edu/MicroArray/SMD/
      • Review of Currently Available Microarray Software: www.the-scientist.com/yr2001/apr/profile1_010430.html
      • Microarray DB: www.biologie.ens.fr/en/genetiqu/puces/bddeng.html
  • Slide 32
  • Some papers
      • Eisen, M. B. et al. (1998). "Cluster analysis and display of genome-wide expression patterns." Proc Natl Acad Sci U S A 95(25): 14863-8.
      • Wen, X. et al. (1998). "Large-scale temporal gene expression mapping of central nervous system development." Proc Natl Acad Sci U S A 95(1): 334-9.
      • Alon, U. et al. (1999). "Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays." Proc Natl Acad Sci U S A 96: 6745-6750.
      • Spellman, P. T. et al. (1998). "Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization." Mol Biol Cell 9(12): 3273-97.