gene expression microarrays microarray normalization stat 115 2012
![Page 1: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/1.jpg)
Gene Expression Microarrays
Microarray Normalization
Stat 115
2012
![Page 2: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/2.jpg)
Outline
• Gene expression microarrays
– Differential Expression
– Spotted cDNA and oligonucleotide arrays
• Microarray normalization methods
– Median scaling, Lowess, and Qnorm
– MA plots
• Microarray databases
![Page 3: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/3.jpg)
Central Dogma of Molecular Biology
[Diagram: DNA (DNA replication) → Transcription → RNA → Translation → Protein, folded with function → Physiology; Reverse transcription: RNA → DNA]
![Page 4: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/4.jpg)
Imagine a Chef
Restaurant Dinner Home Lunch
Certain recipes used to make certain dishes
![Page 5: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/5.jpg)
Each Cell Is Like a Chef
![Page 6: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/6.jpg)
Each Cell Is Like a Chef
[Figure: Infant Skin (Glucose, Oxygen, Amino Acid) → Healthy Skin Cell State; Adult Liver (Fat, Alcohol, Nicotine) → Disease Liver Cell State]
Certain genes expressed to make certain proteins
![Page 7: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/7.jpg)
Differential Expression
• Understand the transcription level of gene(s) under different conditions
– Cell types (brain vs. liver)
– Developmental (fetal vs. adult)
– Response to stimulus (rich vs. poor media)
– Gene activity (wild type vs. mutant)
– Disease states (healthy vs. diseased)
![Page 8: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/8.jpg)
High Throughput Measures of Gene Expression
• Measure gene expression: quasi-estimate of the protein level and cell state
• High throughput: measure mRNA level of all the genes in the genome together
• Checking what the chef is making in many different situations
• Different microarrays:
– Spotted cDNA microarrays
– Oligonucleotide arrays
![Page 9: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/9.jpg)
Microarrays
• Grow cells under a certain condition, collect the mRNA population, and label it
• A microarray has high-density, sequence-specific probes at a known location for each gene/RNA
• Sample hybridizes to the microarray probes by DNA (A-T, G-C) base pairing; non-specific binding is washed away
• Measure the sample mRNA level by reading the labeled signal at each probe location
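The base-pairing rule behind hybridization can be sketched in a few lines. This is only an illustration: the function names `reverse_complement` and `can_hybridize` and the short sequences are made up for the example, and real hybridization tolerates mismatches and depends on temperature and salt conditions, which this ignores.

```python
# Watson-Crick pairing used by DNA probes: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the strand that would base-pair with `seq` (both read 5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def can_hybridize(probe, target):
    """True if `target` contains a stretch perfectly complementary to `probe`.

    Simplifying assumption: only perfect matches count.
    """
    return reverse_complement(probe) in target.upper()


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(can_hybridize("ATGC", "TTGCATAA"))
```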
![Page 10: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/10.jpg)
Spotted cDNA Arrays
• Pat Brown Lab, Stanford University
• Robotic spotting of cDNA (mRNA converted back to DNA, no introns)
• Several thousands of probes / array
• One long probe per gene
![Page 11: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/11.jpg)
Spotted cDNA Arrays
• Competing hybridization
– Control
– Treatment
• Detection
– Green: high control
– Red: high treatment
– Yellow: equally high
– Black: equally low
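As a rough sketch of how the four colors relate to the two channel intensities, assuming the treatment is labeled red and the control green as on the slide. The function name `spot_color` and the cutoffs `low` and `fold` are hypothetical values chosen for illustration, not scanner defaults.

```python
import math


def spot_color(red, green, low=200.0, fold=2.0):
    """Classify a spot from its red (treatment) and green (control) intensity.

    `low` and `fold` are illustrative thresholds, not values from the slides.
    """
    if red < low and green < low:
        return "black"                     # equally low in both channels
    # Guard against division by zero for an empty green channel.
    m = math.log2(max(red, 1.0) / max(green, 1.0))
    if m > math.log2(fold):
        return "red"                       # high treatment
    if m < -math.log2(fold):
        return "green"                     # high control
    return "yellow"                        # equally high


if __name__ == "__main__":
    for r, g in [(5000, 400), (400, 5000), (5000, 4800), (50, 60)]:
        print(r, g, spot_color(r, g))
```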
![Page 12: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/12.jpg)
Why Competing Hybridization?
• DNA concentration in probes not the same, probes not spotted evenly
![Page 13: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/13.jpg)
cDNA Microarray Readout
• Result often viewed with Excel or WordPad
![Page 14: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/14.jpg)
Oligonucleotide Arrays
• GeneChip® by Affymetrix
• Parallel synthesis of oligonucleotide probes (25-mer) on a slide using photolithographic methods
• Millions of probes / microarray
• Multiple probes per gene
• One-color arrays
![Page 15: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/15.jpg)
Affymetrix GeneChip Probes
![Page 16: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/16.jpg)
Labeled Samples Hybridize to DNA Probes on GeneChip
![Page 17: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/17.jpg)
Shining Laser Light Causes Tagged Fragments to Glow
![Page 18: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/18.jpg)
Perfect Match (PM) vs MisMatch (MM) (control for cross hybridization)
![Page 19: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/19.jpg)
Affymetrix Microarray Image Analysis
• Gridding: based on spike-in DNA
• Affymetrix GeneChip Operating System (GCOS)
– cel file
X Y MEAN STDV NPIXELS
701 523 311.0 76.5 16
702 523 48.0 10.5 16
– cdf file
• Which probe at (X,Y) corresponds to which probe sequence and targeted transcript
• The MM probe is always at (X,Y+1) relative to its PM probe at (X,Y)
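A minimal sketch of reading intensity rows like the ones above and looking up a PM/MM pair by the (X, Y+1) rule. It assumes a simplified whitespace-separated table with only the five columns shown; real cel files contain additional sections. The rows in `CEL_TEXT` and the helper names are hypothetical, chosen so that a PM and its MM are both present.

```python
# Hypothetical rows in the five-column layout shown on the slide.
CEL_TEXT = """\
X Y MEAN STDV NPIXELS
701 523 311.0 76.5 16
701 524 48.0 10.5 16
"""


def parse_cel_rows(text):
    """Map (x, y) -> intensity record, skipping the header line."""
    cells = {}
    for line in text.strip().splitlines()[1:]:
        x, y, mean, stdv, npixels = line.split()
        cells[(int(x), int(y))] = {
            "mean": float(mean),
            "stdv": float(stdv),
            "npixels": int(npixels),
        }
    return cells


def pm_mm_pair(cells, x, y):
    """Return (PM mean, MM mean), taking the MM probe at (x, y + 1)."""
    return cells[(x, y)]["mean"], cells[(x, y + 1)]["mean"]


if __name__ == "__main__":
    cells = parse_cel_rows(CEL_TEXT)
    print(pm_mm_pair(cells, 701, 523))
```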
![Page 20: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/20.jpg)
Array Platform Comparisons
• cDNA microarrays:
– Two-color assay, comparative hybridization
– Cheaper ($50-$200 / chip)
– Flexibility of custom-made array: do not need whole sequence
• Oligonucleotide GeneChip:
– One-color assay, absolute expression level
– A little more expensive ($200-500 / chip)
– Automated: better quality control, less variability
– Easier to compare results from different experiments
• Many more commercial array platforms
– Agilent, ABI, Amgen, NimbleGen…
– Some use long oligo probes: 30-70 nt
![Page 21: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/21.jpg)
Experimental Design Issues
• Replicates: always preferred
• Biological replicates: repetition of the experiment prior to extracting mRNA
– Multiple cell conditions & individuals
• Technical replicates: repetition of experimental conditions after mRNA extraction
– Include reverse transcription, probe labeling, and hybridization
![Page 22: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/22.jpg)
Normalization
• Try to preserve biological variation and minimize experimental variation, so different experiments can be compared
• Consideration: scale, dye bias, location bias, probe bias, …
• Assumption: most genes / probes don’t change between two conditions
• Normalization can have larger effect on analysis than downstream steps (e.g. group comparisons)
![Page 23: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/23.jpg)
Dye Swap in cDNA Microarrays
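• Cy5 and Cy3 dyes do not label equally
– $\log_2(R/G) \rightarrow \log_2(R_{\mathrm{TRUTH}}/G_{\mathrm{TRUTH}}) - c$
• So swap the dyes in a replicate experiment, ideally
• Combine by subtracting the normalized log-ratios (the dye bias cancels when $c \approx c'$):
$[(\log_2(R/G) - c) - (\log_2(R'/G') - c')] / 2$
$\approx [\log_2(R/G) + \log_2(G'/R')] / 2$
$= [\log_2(RG'/GR')] / 2$
• For gene A: $[\log_2(\mathrm{Ratio}^{\mathrm{Exp}\ R/G}_{\mathrm{GeneA}}) - \log_2(\mathrm{Ratio}^{\mathrm{swap\ Exp}\ R'/G'}_{\mathrm{GeneA}})] / 2$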
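A small worked example of the dye-swap combination, assuming the dye bias is the same on both arrays. The function name `dye_swap_log_ratio` and the intensities are invented for illustration: the true treatment/control ratio is 4 and the red channel reads 2x too bright on both arrays.

```python
import math


def dye_swap_log_ratio(r, g, r_swap, g_swap):
    """Combine a forward array (R, G) and its dye-swapped replicate (R', G').

    Computes [log2(R/G) - log2(R'/G')] / 2, which cancels a dye bias that
    is shared by both arrays (an assumption of this sketch).
    """
    return (math.log2(r / g) - math.log2(r_swap / g_swap)) / 2


if __name__ == "__main__":
    # Forward array: treatment in red  -> R/G   = 4 * 2   = 8
    # Swapped array: treatment in green -> R'/G' = 1/4 * 2 = 0.5
    combined = dye_swap_log_ratio(8.0, 1.0, 1.0, 2.0)
    print(combined)  # log2 of the true ratio 4
```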
![Page 24: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/24.jpg)
Median Scaling
• Linear scaling
– Ensure the different arrays have the same median value and same dynamic range
– X' = (X – c1) * c2
[Figure: two scatter plots of array1 (vertical axis) vs. array2 (horizontal axis)]
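One possible way to pick c1 and c2 in X' = (X – c1) * c2 is sketched below. The slide does not say how the dynamic range is measured; this sketch assumes the median absolute deviation (MAD) as the spread, and the names `mad` and `median_scale` are hypothetical. It also assumes neither array is constant, since a zero MAD would cause a division by zero.

```python
from statistics import median


def mad(values):
    """Median absolute deviation, used here as a robust measure of spread."""
    m = median(values)
    return median([abs(v - m) for v in values])


def median_scale(x, reference):
    """Linearly rescale `x` so its median and MAD match `reference`.

    Implements X' = (X - c1) * c2 with
      c2 = MAD(reference) / MAD(x)
      c1 = median(x) - median(reference) / c2
    """
    c2 = mad(reference) / mad(x)
    c1 = median(x) - median(reference) / c2
    return [(v - c1) * c2 for v in x]


if __name__ == "__main__":
    array1 = [2.0, 4.0, 6.0, 8.0, 10.0]       # reference array
    array2 = [10.0, 20.0, 30.0, 40.0, 50.0]   # array to rescale
    print(median_scale(array2, array1))
```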
![Page 25: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/25.jpg)
Loess
• LOcally WEighted Scatterplot Smoothing
• Fit a smooth curve
– Use robust local linear fits
– Effectively applies different scaling factors at different intensity levels
– Y = f(X)
– Transform X to X' = f(X)
– Y and X' are comparable
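The local linear fit can be sketched in plain Python as follows. This is a simplified version: it uses tricube weights over the nearest fraction `frac` of the points and omits the robustness iterations of full lowess, so it is not the robust fit named above. The function names and the default `frac=0.5` are assumptions made for this example.

```python
import math


def tricube(u):
    """Tricube kernel: weight falls from 1 at u=0 to 0 at |u|>=1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_fit(x, y, frac=0.5):
    """Return f(x_i) for each x_i, where f is a locally weighted linear fit of y on x."""
    n = len(x)
    k = min(n, max(3, math.ceil(frac * n)))    # neighbours used per local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12              # bandwidth: distance to the k-th nearest point
        w = [tricube((xi - x0) / h) for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0  # weighted least-squares slope
        fitted.append(my + slope * (x0 - mx))
    return fitted


if __name__ == "__main__":
    x = [float(i) for i in range(10)]
    y = [2.0 * xi + 1.0 for xi in x]
    x_prime = loess_fit(x, y)                  # X' = f(X), comparable to Y
    print(x_prime)
```

With Y = f(X) fitted this way, the returned values play the role of X' on the slide.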
![Page 26: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/26.jpg)
Reference for Normalization
• Need to pick one reference sample
– “Middle” chip: median of median
– Pooled reference RNA sample
– Selection of baseline chip influences the results
• Need to pick a subset of genes to estimate the scaling factor or smooth curve
– Housekeeping genes: present at constant levels
– Invariant rank: If a gene is not differentially expressed, its rank in the two arrays (or colors) should be similar
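The invariant-rank idea can be illustrated with a simple filter: keep the genes whose rank moves by at most some fraction of the array size. The threshold `max_shift=0.25`, the function names, and the toy intensities are hypothetical; published invariant-set methods use more refined, iterative criteria.

```python
def ranks(values):
    """Rank of each value (0 = smallest); ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def rank_invariant_genes(array1, array2, max_shift=0.25):
    """Indices of genes whose rank differs by at most max_shift * n between arrays."""
    n = len(array1)
    r1, r2 = ranks(array1), ranks(array2)
    return [i for i in range(n) if abs(r1[i] - r2[i]) <= max_shift * n]


if __name__ == "__main__":
    a = [1.0, 5.0, 3.0, 9.0, 7.0]
    b = [1.2, 9.5, 3.1, 5.0, 7.3]   # genes 1 and 3 change rank noticeably
    print(rank_invariant_genes(a, b))
```

The selected genes would then be the ones used to estimate the scaling factor or smooth curve.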
![Page 27: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/27.jpg)
Quantile Normalization
[Figure: matrix of probes (rows) by experiments (columns), with a column of means]
• Bolstad et al., Bioinformatics 2003
– Currently considered the best normalization method
– Assume most of the probes/genes don’t change between samples
• Calculate the mean for each quantile and reassign each probe the quantile mean
• No experiment retains its original values, but all experiments end up with exactly the same distribution
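The two steps above can be sketched in plain Python. Each inner list is one experiment (one column of the probes-by-experiments matrix). The function name `quantile_normalize` and the toy values are assumptions for illustration, and ties are broken by position rather than averaged, which differs from some implementations.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equally long columns (one per experiment)."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the i-th smallest value across all experiments.
    quantile_means = [
        sum(col[i] for col in sorted_cols) / len(columns) for i in range(n)
    ]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        new_col = [0.0] * n
        for rank, idx in enumerate(order):
            new_col[idx] = quantile_means[rank]   # probe keeps its rank, gets the quantile mean
        normalized.append(new_col)
    return normalized


if __name__ == "__main__":
    experiments = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0], [3.0, 8.0, 7.0]]
    for col in quantile_normalize(experiments):
        print(col)
```

After the call, every experiment holds the same set of values, only arranged according to each experiment's own probe ranking.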
![Page 28: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/28.jpg)
Dilution Series
• RNA sample in 5 different concentrations
• 5 replicates scanned on 5 different scanners
• Before and after quantile normalization
![Page 29: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/29.jpg)
Normalization Quality Check: MA Plot
log2R vs log2G: values should be on the diagonal
M = log2R - log2G
A = (log2R + log2G)/2
Values should scatter around 0
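Computing M and A from two channels is a direct translation of the definitions above. The function name `ma_values` and the intensities are made up for the example; plotting is left out so the sketch needs only the standard library.

```python
import math


def ma_values(red, green):
    """Return (M, A) lists with M = log2 R - log2 G and A = (log2 R + log2 G) / 2."""
    m = [math.log2(r) - math.log2(g) for r, g in zip(red, green)]
    a = [(math.log2(r) + math.log2(g)) / 2 for r, g in zip(red, green)]
    return m, a


if __name__ == "__main__":
    m, a = ma_values([8.0, 4.0], [2.0, 4.0])
    print(m)  # log-ratios; well-normalized data should scatter around 0
    print(a)  # average log-intensities
```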
![Page 30: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/30.jpg)
Before Normalization
• Pairwise MA plot for 5 arrays, probe (PM)
$M = \log_2(PM_i / PM_j)$
$A = \tfrac{1}{2}\log_2(PM_i \cdot PM_j)$
![Page 31: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/31.jpg)
After Normalization
• Pairwise MA plot for 5 arrays, probe (PM)
$M = \log_2(PM_i / PM_j)$
$A = \tfrac{1}{2}\log_2(PM_i \cdot PM_j)$
![Page 32: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/32.jpg)
Public Microarray Databases
• SMD: Stanford Microarray Database, most Stanford and collaborators’ cDNA arrays
• GEO: Gene Expression Omnibus, an NCBI repository for gene expression and hybridization data, growing quickly
• Oncomine: Cancer Microarray Database
– Published cancer related microarrays
– Raw data all processed, nice interface
![Page 33: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/33.jpg)
Homework
• How many data series are there on GEO with Affymetrix gene expression profiles of
– Human breasts
– Human prostates
– Human brains
– Mouse liver
– Just the numbers
• Which series have > 10 samples
– Use the DataSet Browser format
![Page 34: Gene Expression Microarrays Microarray Normalization Stat 115 2012](https://reader036.vdocument.in/reader036/viewer/2022081416/56649daa5503460f94a98ab3/html5/thumbnails/34.jpg)
Acknowledgment
• Terry Speed, Rafael Irizarry & group
• Kevin Coombes & Keith Baggerly
• Erick Rouchka
• Wing Wong & Cheng Li
• Mark Reimers
• Erin Conlon
• Larry Hunter
• Zhijin Wu
• Wei Li