Introduction to Microarray Analysis and Technology
Dave Lin - November 5, 2001
Overview
- Why biologists care about genomics
- Why statisticians/computer scientists may care about genomics
• Preprocessing issues
• Sources of variability in constructing microarrays
• Postprocessing issues
• Analysis of data
What makes one cell different from another?
liver vs. brain
Cancerous vs. non-cancerous
Treatment vs. control
Old Days
100,000 genes in mammalian genome
each cell expresses 15,000 of these genes
each gene is expressed at a different level
estimated total of 100,000 copies of mRNA/cell
1-5 copies/cell - “rare” - ~30% of all genes
10-200 copies/cell - “moderate”
200 copies/cell and up - “abundant”
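The abundance classes above can be written down as a small lookup. This is only a sketch: the function name is hypothetical, the slide leaves the 5-10 copies range undefined (kept here as "unclassified"), and placing exactly 200 copies in "abundant" is an assumption based on "200 copies/cell and up".

```python
def abundance_class(copies_per_cell):
    """Return the slide's abundance label for an mRNA copy number per cell."""
    if copies_per_cell < 1:
        return "not expressed"   # assumption: this label is not on the slide
    if copies_per_cell <= 5:
        return "rare"
    if copies_per_cell < 10:
        return "unclassified"    # the slide does not define 5-10 copies/cell
    if copies_per_cell < 200:
        return "moderate"
    return "abundant"            # "200 copies/cell and up"

# Average copies per expressed gene implied by the slide's totals:
# 100,000 mRNA copies/cell spread over 15,000 expressed genes.
mean_copies_per_gene = 100_000 / 15_000

for copies in (3, 50, 500):
    print(copies, abundance_class(copies))
print(round(mean_copies_per_gene, 2))
```

The average of roughly 7 copies per expressed gene suggests why so many genes end up in or near the "rare" class.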
Cells can be defined by:
- Complement of genes (which genes are expressed)
- How much of each gene is expressed (quantity)
What makes one cell different from another?
- Try to find genes that are differentially expressed
- Study the function of these genes
- Find which genes interact with your favorite gene
Extremely time-consuming.
Huge amounts of effort expended to find individual genes that may differ between two conditions
Genomics: an almost useless term - it defines many different concepts and applications.
Microarrays
- massively parallel analysis of gene expression
- screen an entire genome at once
- find not only individual genes that differ, but groups of genes that differ
- find relative expression level differences
- how quantitative can they be?
Microarrays
- Based on an old technique
- Many flavors; the majority are of two essential varieties
cDNA Arrays
- printing on glass slides
- miniaturization, throughput
- fluorescence-based detection
Affymetrix Arrays
- in situ synthesis of oligonucleotides
- will not consider Affymetrix arrays further.
Department of Statistics, University of California, Berkeley, and Division of Genetics and Bioinformatics, The Walter and Eliza Hall Institute of Medical Research,
THE PROCESS
Building the Chip:
- MASSIVE PCR
- PCR PURIFICATION and PREPARATION
- PREPARING SLIDES
- PRINTING
Preparing RNA:
- CELL CULTURE AND HARVEST
- RNA ISOLATION
- cDNA PRODUCTION
Hybing the Chip:
- POST PROCESSING
- ARRAY HYBRIDIZATION
- PROBE LABELING
- DATA ANALYSIS
Building the Chip:
- MASSIVE PCR: Full yeast genome = 6,500 reactions
- PCR PURIFICATION and PREPARATION: IPA precipitation + EtOH washes + 384-well format
- PREPARING SLIDES: Polylysine coating for adhering PCR products to glass slides
- PRINTING: The arrayer: high precision spotting device capable of printing 10,000 products in 14 hrs, with a plate change every 25 mins
- POST PROCESSING: Chemically converting the positive polylysine surface to prevent non-specific hybridization
Fabrication of “Spotted Arrays”
- Arrayed library (normalized/subtracted)
- 20,000 PCR reactions
- 20,000 precipitations
- 20,000 resuspensions
- Consolidate for printing
- Spot on glass slides
Printing Approaches
Non-contact
• Piezoelectric dispenser
• Syringe-solenoid ink-jet dispenser
Contact (using rigid pin tools, similar to filter array)
• Tweezer
• Split pin
• Micro spotting pin
Micro Spotting pin
Microarray Gridder
Practical Problems
- Surface chemistry: an uneven surface may lead to high background.
- Dipping the pin into a large volume -> pre-printing to drain off excess sample.
- Spot variation can be due to mechanical differences between pins. Pins could be clogged during the printing process.
- Spot size and density depend on surface and solution properties.
- Pins need good washing between samples to prevent sample carryover.
Hybing the Chip:
- ARRAY HYBRIDIZATION: Cy3 and Cy5 RNA samples are simultaneously hybridized to the chip. Hybs are performed for 5-12 hours and then chips are washed.
- PROBE LABELING: Two RNA samples are labelled with Cy3 or Cy5 monofunctional dyes via a chemical coupling to AA-dUTP. Samples are purified using a PCR cleanup kit.
- DATA ANALYSIS: Ratio measurements are determined via quantification of 532 nm and 635 nm emission values. Data are uploaded to the appropriate database where statistical and other analyses can then be performed.
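As a rough illustration of the ratio measurement, the sketch below turns the two channel intensities of each spot into a log ratio M and an average log intensity A. The M/A naming, the choice of Cy5 (635 nm) as numerator and Cy3 (532 nm) as denominator, and the spot values are all assumptions for illustration, not taken from the slides.

```python
import math

def log_ratio(cy5, cy3):
    """M = log2(Cy5 / Cy3): up- and down-regulation become symmetric around 0."""
    return math.log2(cy5 / cy3)

def mean_log_intensity(cy5, cy3):
    """A = average of the two log2 channel intensities (overall spot brightness)."""
    return 0.5 * (math.log2(cy5) + math.log2(cy3))

# Hypothetical background-corrected intensities: (635 nm channel, 532 nm channel)
spots = {
    "gene_a": (4000.0, 1000.0),   # 4-fold higher in the Cy5 sample
    "gene_b": (500.0, 500.0),     # unchanged
    "gene_c": (250.0, 1000.0),    # 4-fold lower in the Cy5 sample
}

for name, (red, green) in spots.items():
    print(name, log_ratio(red, green), mean_log_intensity(red, green))
```

Working on the log2 scale means a 4-fold increase and a 4-fold decrease sit at +2 and -2 rather than at 4 and 0.25.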
Labeling of RNAs with Cy3 or Cy5
Two general methods
-Dye conjugated nucleotide
-Amino-allyl indirect labeling
Direct labeling of RNA
[Diagram: poly(A) RNA primed with oligo(dT); cDNA synthesis incorporates Cy5-dUTP or Cy3-dUTP directly into the cDNA.]
Indirect labeling of RNA
[Diagram: cDNA synthesis incorporates a modified nucleotide; the Cy3 dye is added to the cDNA afterwards.]
Dye effect issues
Direct method
- Unequal incorporation of Cy5 vs. Cy3
- Very poor overall incorporation of direct-conjugated nucleotide = more starting RNA for labeling.
Indirect method
- Presumably less bias in initial incorporation of activated nucleotide, but not clear if more or less dye is added
Both methods
- Cy3 fluoresces more brightly than Cy5
- Labeling is very highly sequence dependent
Micrograph of a portion of hybridization probe from a yeast microarray (after hybridization).
Layout of the cDNA Microarrays
- Sequence verified, normalized mouse cDNAs
- 19,200 spots in two print groups of 9,600 each
– 4 x 4 grid, each with 25 x 24 spots
– Controls on the first 2 rows of each grid.
[Figure: array layout showing print groups pg1 and pg2]
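The layout numbers can be checked with a little arithmetic. The sketch below also maps a spot index to its position on the slide; the index ordering (print group, then grid, then row by row within a grid) and reading "25 x 24" as 25 rows by 24 columns are assumptions, and the function names are hypothetical.

```python
GRID_ROWS, GRID_COLS = 4, 4     # 4 x 4 grids per print group
SPOT_ROWS, SPOT_COLS = 25, 24   # 25 x 24 spots per grid (row/column reading assumed)
PRINT_GROUPS = 2

spots_per_grid = SPOT_ROWS * SPOT_COLS                     # 600
spots_per_group = GRID_ROWS * GRID_COLS * spots_per_grid   # 9,600
total_spots = PRINT_GROUPS * spots_per_group               # 19,200

def locate(spot_index):
    """Map a 0-based spot index to
    (print_group, grid_row, grid_col, spot_row, spot_col).
    The ordering is a hypothetical convention, not the authors' file format."""
    group, rest = divmod(spot_index, spots_per_group)
    grid, within = divmod(rest, spots_per_grid)
    grid_row, grid_col = divmod(grid, GRID_COLS)
    spot_row, spot_col = divmod(within, SPOT_COLS)
    return group, grid_row, grid_col, spot_row, spot_col

def is_control(spot_index):
    """Controls sit on the first 2 rows of each grid, per the slide."""
    return locate(spot_index)[3] < 2

print(spots_per_grid, spots_per_group, total_spots)
print(locate(0), locate(total_spots - 1))
```

Keeping the grid position of every spot is what later makes per-grid (print-tip) corrections possible.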
Practical Problems 1
• Comet Tails
• Likely caused by insufficiently rapid immersion of the slides in the succinic anhydride blocking solution.
Practical Problems 2
Practical Problems 3
High Background
• 2 likely causes:
– Insufficient blocking.
– Precipitation of the labeled probe.
Weak Signals
Practical Problems 4
Spot overlap
Likely cause: too much rehydration during post-processing.
Practical Problems 5
Dust
Pin-specific printing differences
Normalization - lowess
• Global lowess
• Assumption: changes roughly symmetric at all intensities.
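A minimal sketch of what a global lowess normalization might look like: fit a smooth trend of the log ratio M against the average log intensity A over all spots, then subtract it. This is a simplified, single-pass local linear fit with tricube weights (no robustness iterations), written in plain Python for illustration; it is not the authors' implementation, and the function names, the `frac` value, and the synthetic data are assumptions.

```python
import math

def lowess_fit(x, y, frac=0.4):
    """Locally weighted linear fit of y on x with tricube weights.
    Simplified: one pass, no robustness iterations, O(n^2) for clarity."""
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))     # points in each local window
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12               # window half-width
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def normalize_global(a_values, m_values, frac=0.4):
    """Subtract the intensity-dependent trend c(A) from every log ratio M."""
    trend = lowess_fit(a_values, m_values, frac)
    return [m - c for m, c in zip(m_values, trend)]

# Synthetic example: a purely linear, intensity-dependent dye bias.
a = [6.0 + 0.5 * i for i in range(20)]
m = [0.5 + 0.1 * ai for ai in a]
normalized = normalize_global(a, m)
print(max(abs(v) for v in normalized))   # residuals are ~0 up to rounding error
```

Subtracting a fitted trend only makes sense under the stated assumption: if most changes were in one direction, the fit would remove real signal along with the dye bias.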
Normalisation - print-tip-group
Assumption: For every print group, changes roughly symmetric at all intensities.
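The print-tip-group idea is to fit and subtract a separate trend for the spots printed by each tip. The sketch below shows only the grouping logic; to stay short it uses the group median of M as a stand-in trend rather than a lowess curve, which is a deliberate simplification. All names and data are hypothetical, and a lowess-style fitting function with the same signature could be passed as `fit` instead.

```python
from collections import defaultdict
from statistics import median

def median_trend(a_values, m_values):
    """Stand-in for a lowess curve: a constant trend equal to the median of M."""
    centre = median(m_values)
    return [centre for _ in a_values]

def normalize_by_print_tip(tips, a_values, m_values, fit=median_trend):
    """Fit and subtract a separate trend for the spots printed by each tip."""
    groups = defaultdict(list)
    for i, tip in enumerate(tips):
        groups[tip].append(i)
    normalized = [0.0] * len(m_values)
    for tip, idx in groups.items():
        trend = fit([a_values[i] for i in idx], [m_values[i] for i in idx])
        for i, c in zip(idx, trend):
            normalized[i] = m_values[i] - c
    return normalized

# Hypothetical data: tip 1 reads systematically high, tip 2 systematically low.
tips = [1, 1, 1, 2, 2, 2]
a = [8.0, 9.0, 10.0, 8.0, 9.0, 10.0]
m = [0.4, 0.5, 0.6, -0.3, -0.2, -0.1]
print(normalize_by_print_tip(tips, a, m))
```

After normalization both groups are centred on zero, so pin-specific offsets no longer masquerade as expression differences.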
Pre-processing Issues
- Definition of what a real signal is: what is a spot, and how to determine what should be included in the analysis?
- How to determine background: local (surrounding spot) vs. global (across slide)
- How to correct for dye effect
- How to correct for spatial effect, e.g. print-tip, others
- How to correct for differences between slides, e.g. scale normalization
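To make the local vs. global background question concrete, here is a small sketch of both options. The function name and the intensities are hypothetical, and using the median of the local backgrounds as the slide-wide value is an assumption; other global estimates are possible.

```python
from statistics import median

def subtract_background(foreground, local_background, method="local"):
    """Background-correct spot intensities.
    method="local":  subtract each spot's own surrounding background.
    method="global": subtract one slide-wide value (here the median of the
                     local backgrounds, which is an assumption)."""
    if method == "local":
        bg = list(local_background)
    elif method == "global":
        bg = [median(local_background)] * len(foreground)
    else:
        raise ValueError("method must be 'local' or 'global'")
    return [f - b for f, b in zip(foreground, bg)]

# Hypothetical spot and surrounding-background intensities from one channel.
fg = [1200.0, 800.0, 300.0]
bg = [100.0, 150.0, 250.0]
print(subtract_background(fg, bg, "local"))    # [1100.0, 650.0, 50.0]
print(subtract_background(fg, bg, "global"))   # [1050.0, 650.0, 150.0]
```

The two choices differ most for weak spots in noisy regions (the third spot here), which is exactly where ratio estimates are least stable.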
Experimental Design Issues
What is the best means of performing the experiment to obtain the desired answer?
Biologists’ assumptions and statisticians’ differ.
Biologist viewpoint: make everything exactly the same so that differences will stand out
Statistician viewpoint: make everything as random as possible so that real trends will stand out
Most biologists will ask: what are the differences between two samples?
Implicit questions associated with microarrays:
- What is the best way to determine this? e.g. design; replicates; conditions.
- How do I obtain the most reliable results? e.g. measurements, normalization
- How do I determine what a significant difference is? Do I care about “subtle” changes, or just the extremes?
- How is information best extracted? Is correlation useful? What type of clustering?
- How is information combined? How do you model the interactions of 1000s of genes?
Design: Two Ways to Do the Comparisons
Advantages of Our Design
- Lower variability
- Increased precision
- Increase in measurement of expression -> increased precision
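One way to read the link between more measurements and precision is the standard error of an average, which shrinks with the square root of the number of independent measurements. The sketch below assumes independent replicate measurements of the same log ratio with a common standard deviation; the value 0.4 is hypothetical.

```python
import math

def standard_error(spot_sd, n_measurements):
    """Standard error of the mean of n independent measurements."""
    return spot_sd / math.sqrt(n_measurements)

# Hypothetical per-measurement standard deviation of a log ratio.
sd = 0.4
for n in (1, 2, 4, 8):
    print(n, round(standard_error(sd, n), 3))
```

Quadrupling the number of measurements halves the standard error, so the gain from each extra replicate gets smaller as replicates accumulate.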