Introduction to Microarray Analysis and Technology, Dave Lin - November 5, 2001

  • Introduction to Microarray Analysis and Technology

    Dave Lin - November 5, 2001

  • Overview

    Why biologists care about genomics
    Why statisticians/computer scientists may care about genomics
    Preprocessing issues
    Sources of variability in constructing microarrays
    Postprocessing issues
    Analysis of data

  • What makes one cell different from another?

    Liver vs. brain
    Cancerous vs. non-cancerous
    Treatment vs. control

  • Old Days

    100,000 genes in mammalian genome
    Each cell expresses 15,000 of these genes
    Each gene is expressed at a different level
    Estimated total of 100,000 copies of mRNA/cell
    1-5 copies/cell - rare (~30% of all genes)
    10-200 copies/cell - moderate
    200 copies/cell and up - abundant

  • Cells can be defined by:

    Complement of genes (which genes are expressed)
    How much of each gene is expressed (quantity)

    What makes one cell different from another?

    Try to find genes that are differentially expressed
    Study the function of these genes
    Find which genes interact with your favorite gene
    Extremely time-consuming.
    Huge amounts of effort expended to find individual genes that may differ between two conditions

  • Genomics. Almost useless term: it defines many different concepts and applications.

    Microarrays
    -massively parallel analysis of gene expression
    -screen an entire genome at once
    -find not only individual genes that differ, but groups of genes that differ
    -find relative expression level differences
    -how quantitative can they be?

  • Microarrays

    -Based on old technique
    -Many flavors; majority are of two essential varieties

    cDNA arrays
    -printing on glass slides
    -miniaturization, throughput
    -fluorescence-based detection

    Affymetrix arrays
    -in situ synthesis of oligonucleotides
    -will not consider Affymetrix arrays further

  • Department of Statistics, University of California, Berkeley, and Division of Genetics and Bioinformatics, The Walter and Eliza Hall Institute of Medical Research,

    Building the Chip:

    MASSIVE PCR: Full yeast genome = 6,500 reactions
    PCR PURIFICATION and PREPARATION: IPA precipitation + EtOH washes + 384-well format
    PREPARING SLIDES: Polylysine coating for adhering PCR products to glass slides
    PRINTING: The arrayer, a high-precision spotting device capable of printing 10,000 products in 14 hrs, with a plate change every 25 mins
    POST PROCESSING: Chemically converting the positive polylysine surface to prevent non-specific hybridization

  • Fabrication of Spotted Arrays

    Arrayed library (normalized/subtracted) -> 20,000 PCR reactions -> 20,000 precipitations -> 20,000 resuspensions -> consolidate for printing -> spot on glass slides

  • Micro Spotting pin

  • Hybing the Chip:

    PROBE LABELING: Two RNA samples are labeled with Cy3 or Cy5 monofunctional dyes via a chemical coupling to AA-dUTP. Samples are purified using a PCR cleanup kit.
    ARRAY HYBRIDIZATION: Cy3 and Cy5 RNA samples are simultaneously hybridized to the chip. Hybs are performed for 5-12 hours and then chips are washed.
    DATA ANALYSIS: Ratio measurements are determined via quantification of fluorescence in the 532 nm (Cy3) and 635 nm (Cy5) scanner channels. Data are uploaded to the appropriate database where statistical and other analyses can then be performed.
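
The ratio measurement described in the DATA ANALYSIS step can be sketched as follows. This is a minimal illustration, not the scanner software's method: the function name, the choice of Cy5 over Cy3 as the ratio direction, and the use of log2 with the conventional M (log ratio) and A (average log intensity) quantities are all assumptions that do not appear on the slide.

```python
import math


def log_ratio(cy5, cy3):
    """Return (M, A) for one spot from its two channel intensities.

    M = log2(Cy5 / Cy3) is the log ratio; A is the mean log2 intensity.
    The Cy5/Cy3 direction is an assumption; some labs report Cy3/Cy5.
    """
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("intensities must be positive to take a log ratio")
    m = math.log2(cy5 / cy3)
    a = 0.5 * (math.log2(cy5) + math.log2(cy3))
    return m, a


# Hypothetical spot: 635 nm channel reads 800, 532 nm channel reads 200.
m, a = log_ratio(800.0, 200.0)
print(f"M = {m:.3f}, A = {a:.3f}")
```

Working on the log scale makes a 4-fold increase (M = 2) and a 4-fold decrease (M = -2) symmetric around zero, which is why ratios are usually logged before any further analysis.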

  • Labeling of RNAs with Cy3 or Cy5

    Two general methods

    -Dye conjugated nucleotide

    -Amino-allyl indirect labeling

  • Direct labeling of RNA

    [Figure: poly(A) RNA primed with oligo(dT); cDNA synthesis incorporates Cy5-dUTP or Cy3-dUTP directly into the cDNA.]

  • Indirect labeling of RNA

    [Figure: poly(A) RNA primed with oligo(dT); cDNA synthesis incorporates a modified nucleotide, followed by Cy3 addition.]

  • Dye effect issues

    Direct method
    -Unequal incorporation of Cy5 vs. Cy3
    -Very poor overall incorporation of direct-conjugated nucleotide = more starting RNA needed for labeling

    Indirect method
    -Presumably less bias in initial incorporation of activated nucleotide, but not clear if more or less dye is added

    Both methods
    -Cy3 fluoresces more brightly than Cy5
    -Labeling is very highly sequence dependent

    Micrograph of a portion of hybridization probe from a yeast microarray (after hybridization).

  • Layout of the cDNA Microarrays

    Sequence-verified, normalized mouse cDNAs
    19,200 spots in two print groups of 9,600 each
    4 x 4 grid, each with 25 x 24 spots
    Controls on the first 2 rows of each grid
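
The layout arithmetic above can be checked with a short sketch. The counts come from the slide; everything else is an assumption made for illustration: that "25 x 24" means 25 rows by 24 columns, that spots are numbered row by row within a grid and grids row by row within a print group, and the function names `locate_spot` and `is_control`.

```python
GRID_ROWS, GRID_COLS = 4, 4      # 4 x 4 grids per print group (from the slide)
SPOT_ROWS, SPOT_COLS = 25, 24    # assumption: 25 rows x 24 columns per grid
CONTROL_ROWS = 2                 # controls on the first 2 rows of each grid
PRINT_GROUPS = 2

SPOTS_PER_GRID = SPOT_ROWS * SPOT_COLS
SPOTS_PER_PRINT_GROUP = GRID_ROWS * GRID_COLS * SPOTS_PER_GRID
TOTAL_SPOTS = PRINT_GROUPS * SPOTS_PER_PRINT_GROUP


def locate_spot(index):
    """Map a 0-based spot index within one print group to
    (grid_row, grid_col, spot_row, spot_col).

    The numbering order is hypothetical; real arrayers differ.
    """
    if not 0 <= index < SPOTS_PER_PRINT_GROUP:
        raise IndexError("spot index outside the print group")
    grid, within_grid = divmod(index, SPOTS_PER_GRID)
    grid_row, grid_col = divmod(grid, GRID_COLS)
    spot_row, spot_col = divmod(within_grid, SPOT_COLS)
    return grid_row, grid_col, spot_row, spot_col


def is_control(index):
    """True if the spot sits in one of the first CONTROL_ROWS rows of its grid."""
    return locate_spot(index)[2] < CONTROL_ROWS


print(SPOTS_PER_PRINT_GROUP, TOTAL_SPOTS)
```

Under these assumptions the grid index doubles as a print-tip identifier (one of 16 per print group), which is the grouping that print-tip normalization relies on.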

  • Practical Problems 1: Comet tails

    Likely caused by insufficiently rapid immersion of the slides in the succinic anhydride blocking solution.

    Practical Problems 2

    Practical Problems 3: High background

    2 likely causes: insufficient blocking; precipitation of the labeled probe.

    Weak signals

    Practical Problems 4: Spot overlap

    Likely cause: too much rehydration during post-processing.

    Practical Problems 5: Dust

  • Pin-specific printing differences

    Normalization - lowess
    Global lowess
    Assumption: changes roughly symmetric at all intensities.
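
A global lowess normalization can be sketched in plain Python as below. This is a simplified illustration under stated assumptions, not the procedure used for the slides: it fits one pass of locally weighted linear regression with tricube weights (no robustness iterations, which full lowess implementations add), it takes M (log ratio) and A (average log intensity) per spot as input, and the function names and the `frac` default are hypothetical.

```python
import math


def tricube(u):
    """Tricube kernel: weight falls smoothly to 0 at |u| = 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear_fit(a_values, m_values, frac=0.4):
    """Fitted M at each A from a weighted straight-line fit over the
    nearest `frac` share of the spots (single pass, no robustness step)."""
    n = len(a_values)
    k = min(n, max(2, math.ceil(frac * n)))
    fitted = []
    for x0 in a_values:
        # Bandwidth: distance to the k-th nearest spot along the A axis.
        h = sorted(abs(x - x0) for x in a_values)[k - 1]
        if h == 0.0:
            weights = [1.0 if x == x0 else 0.0 for x in a_values]
        else:
            weights = [tricube((x - x0) / h) for x in a_values]
        sw = sum(weights)
        swx = sum(w * x for w, x in zip(weights, a_values))
        swy = sum(w * y for w, y in zip(weights, m_values))
        swxx = sum(w * x * x for w, x in zip(weights, a_values))
        swxy = sum(w * x * y for w, x, y in zip(weights, a_values, m_values))
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)          # too few distinct points: local mean
        else:
            slope = (sw * swxy - swx * swy) / denom
            fitted.append((swy - slope * swx) / sw + slope * x0)
    return fitted


def normalize_global_lowess(a_values, m_values, frac=0.4):
    """Subtract the intensity-dependent trend so M is centered at every A."""
    trend = local_linear_fit(a_values, m_values, frac)
    return [m - t for m, t in zip(m_values, trend)]
```

The stated assumption matters here: subtracting the fitted curve is only sensible if up- and down-regulated genes roughly balance at every intensity, so that the trend reflects dye bias and not biology.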

    Normalization - print-tip-group
    Assumption: for every print group, changes roughly symmetric at all intensities.
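
The per-group idea can be illustrated with a deliberately simpler stand-in: centering the log ratios of each print-tip group on that group's median. This is an assumption-laden sketch, not the method on the slide, which fits an intensity-dependent curve within each group; median centering removes only a constant per-tip offset. The function name and the tip identifiers are hypothetical.

```python
from collections import defaultdict
from statistics import median


def center_by_print_tip(m_values, tip_ids):
    """Subtract each print-tip group's median log ratio from its spots.

    A constant-offset simplification: a per-group lowess fit would also
    remove intensity-dependent trends within each group.
    """
    groups = defaultdict(list)
    for m, tip in zip(m_values, tip_ids):
        groups[tip].append(m)
    centers = {tip: median(values) for tip, values in groups.items()}
    return [m - centers[tip] for m, tip in zip(m_values, tip_ids)]


# Hypothetical log ratios from two print tips with different offsets.
log_ratios = [1.0, 1.2, 0.8, -0.5, -0.3, -0.7]
tips = [0, 0, 0, 1, 1, 1]
print(center_by_print_tip(log_ratios, tips))
```

Replacing the median with a curve fitted to each group's (A, M) pairs turns this into the intensity-dependent version, at the cost of needing enough spots per tip to fit the curve reliably.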

  • Pre-processing Issues

    -Definition of what a real signal is: what is a spot, and how to determine what should be included in the analysis?
    -How to determine background: local (surrounding spot) vs. global (across slide)
    -How to correct for dye effect
    -How to correct for spatial effect, e.g. print-tip, others
    -How to correct for differences between slides, e.g. scale normalization
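
Two of the choices listed above, local vs. global background and scale normalization between slides, can be sketched as follows. These are illustrative assumptions and not the talk's procedures: the global background is taken as the median of the local estimates, spread is measured by the median absolute deviation (MAD), slides are rescaled to the mean MAD, and all function names are hypothetical.

```python
from statistics import mean, median


def subtract_local_background(foreground, local_background):
    """Subtract each spot's own surrounding background estimate."""
    return [f - b for f, b in zip(foreground, local_background)]


def subtract_global_background(foreground, local_background):
    """Subtract one slide-wide value (here: the median local estimate)."""
    slide_level = median(local_background)
    return [f - slide_level for f in foreground]


def mad(values):
    """Median absolute deviation, a spread estimate that resists outliers."""
    center = median(values)
    return median([abs(v - center) for v in values])


def scale_normalize(slides):
    """Rescale each slide's log ratios so all slides share the same MAD.

    `slides` is a list of lists of log ratios, one inner list per slide.
    The common target (mean of the MADs) is one choice among several.
    """
    spreads = [mad(slide) for slide in slides]
    if any(s == 0 for s in spreads):
        raise ValueError("a slide with zero spread cannot be rescaled")
    target = mean(spreads)
    return [[v * target / s for v in slide] for slide, s in zip(slides, spreads)]


print(subtract_local_background([500, 300], [100, 50]))
print(scale_normalize([[-1, 0, 1], [-2, 0, 2]]))
```

Local subtraction adapts to uneven backgrounds but adds noise for weak spots, while a global value is more stable and ignores spatial variation, which is the trade-off the slide raises.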

  • Experimental Design Issues

    What is the best means of performing the experiment to obtain the desired answer?

    Biologists' and statisticians' assumptions differ.

    Biologist viewpoint: make everything exactly the same so that differences will stand out

    Statistician viewpoint: make everything as random as possible so that real trends will stand out

  • Most biologists will ask: what are the differences between two samples?

    Implicit questions associated with microarrays:

    What is the best way to determine this? e.g. design; replicates; conditions.
    How do I obtain the most reliable results? e.g. measurements, normalization
    How do I determine what a significant difference is? Do I care about subtle changes, or just the extremes?
    How is information best extracted? Is correlation useful? What type of clustering?
    How is information combined? How do you model the interactions of 1000s of genes?

  • Design: Two Ways to Do the Comparisons

  • Advantages of Our Design

    Lower variability
    Increased precision
    Increase in measurement of expression -> increased precision
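
One common reading of "more measurements -> increased precision" is that the standard error of an averaged log ratio shrinks as replicate measurements are added. The sketch below illustrates only that general statistical point; the transcript does not describe the design itself, so the function name and the replicate values are hypothetical, and the replicates are assumed to be independent.

```python
import math
from statistics import mean, stdev


def mean_and_standard_error(log_ratios):
    """Average replicate log ratios and estimate the precision of the mean.

    Standard error = sample standard deviation / sqrt(n), which assumes
    the replicate measurements are independent.
    """
    n = len(log_ratios)
    if n < 2:
        raise ValueError("need at least two replicates to estimate spread")
    return mean(log_ratios), stdev(log_ratios) / math.sqrt(n)


# Hypothetical replicate log ratios for one gene.
estimate, se = mean_and_standard_error([1.0, 2.0, 3.0, 4.0])
print(f"mean = {estimate:.2f}, standard error = {se:.3f}")
```

Because the standard error falls with the square root of the number of replicates, quadrupling the measurements halves it, which is why replication is weighed against slide cost at the design stage.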