Microarray Design with an Illumina focus
Andy Lynch, 23/07/08
Overview
• The BeadArray Technology
• Sources of variance
• Bead-level data
• Prior information
• Specific experiment types
• Reasons for choosing ‘sub-optimal’ designs
• Closing thoughts
The Technology
The Bead
Each silica bead is 3 microns in diameter
700,000 copies of the same probe sequence are covalently attached to each bead for hybridisation & decoding
Bead-bound probe: AACGTATACGACTATCGTGTACAGTATAGC
– bases used to identify the bead-type
– 50 bases that target the RNA (for example) of interest
Complementary RNA with dye attached: UUGCAUAUGCUGAUAGCACAUGUCAUAUCG
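As a quick check of the illustration, the two sequences on this slide pair base for base. The sketch below is only a check of the example strings; the helper name `rna_complement` is made up for illustration, and the sequences are compared in the left-to-right order shown on the slide, so strand orientation is left implicit.

```python
# Watson-Crick pairing from a DNA probe base to the RNA base that binds it.
PAIRS = {"A": "U", "C": "G", "G": "C", "T": "A"}


def rna_complement(dna):
    """Return the RNA bases that pair with each DNA base, position by position."""
    return "".join(PAIRS[base] for base in dna)


probe = "AACGTATACGACTATCGTGTACAGTATAGC"   # probe sequence shown on the slide
target = rna_complement(probe)             # should equal the labelled RNA
print(target)
```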
Human expression beadchips
HumanRef-8
• 8 parallel arrays on the chip
• Each array has ~24,000 'high-quality' RefSeq-derived probes
• Approx. 30 copies of each bead type
HumanWG-6 V1
• 6 parallel arrays on the chip, each consisting of 2 parallel strips
• Strip 1 has the ~24,000 RefSeq-derived probes
• Strip 2 has ~24,000 other probes (some RefSeq-derived)
• Approx. 30 copies of each bead type
HumanWG-6 V2, V3
• 6 parallel arrays on the chip, each consisting of 2 parallel strips
• Each strip has ~48,000 probes
• Approx. 30 copies of each bead type
HumanHT-12
• 12 parallel arrays on the chip, each consisting of 1 strip
• Each strip has ~48,000 probes*
• Fewer copies (?~15) of each bead type
Control beads
Many negative controls (~1000, depending on chip type), each with replicates
Some house-keeping, biotin, and “high stringency” controls
Labelling controls (may not be used)
Some perfect-match/mis-match pairs (useless in HumanWG-6 V3)
Some general hybridization controls
SAMs
• Each array is on the end of a fibre-optic cable
• 96 arrays in a module
• Each array has about 1500 probe-types, with about 30 replicates of each
• Used for specialist probe panels; can be custom made
• Often used for two-colour work
• Used for genotyping, allele-specific expression, methylation, expression (esp. with poor-quality RNA) and microRNAs
The process
• Beads are allocated at random to the wells
– Presumed independently
• Address sequences are used to identify the beads
– Some beads will fail to be identified
– Presume this is independent of bead-type
• Array rejected if not all bead-types are present in suitable numbers
– Applies to HumanWG-6 and HumanRef-8
– At least 5 replicates on the array?
– Seems to have at least one bead on each strip of two-strip arrays
• Sample hybridized to array
• Can either return “bead-level” intensities/locations or Illumina summaries
Illumina Summaries
For each bead type…
…on the original (i.e. non-logged) scale…
… outliers are removed (>3 MAD from the median)
… number of beads is reported
… mean intensity
… s.e. of intensity
… p-value for comparison with negative controls
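The summary steps listed on this slide can be sketched roughly as follows. This is an illustration under stated assumptions, not Illumina's actual implementation: the function name `summarise_bead_type` is hypothetical, the MAD is used unscaled (whether the scanner software rescales it is not stated here), and the detection p-value against the negative controls is left out.

```python
import math
import statistics


def summarise_bead_type(intensities, n_mads=3.0):
    """Summarise one bead type's intensities on the original (unlogged) scale."""
    med = statistics.median(intensities)
    # Assumption: plain (unscaled) median absolute deviation.
    mad = statistics.median(abs(x - med) for x in intensities)
    # Drop beads more than n_mads MADs away from the median.
    kept = [x for x in intensities if abs(x - med) <= n_mads * mad]
    n = len(kept)
    mean = statistics.fmean(kept)
    # Standard error of the mean of the retained beads.
    se = statistics.stdev(kept) / math.sqrt(n) if n > 1 else float("nan")
    return {"n_beads": n, "mean": mean, "se": se}


# Made-up intensities: five well-behaved beads and one obvious outlier.
summary = summarise_bead_type([100, 102, 98, 101, 99, 500])
print(summary)
```

Because everything is computed on the unlogged scale, a user who prefers the log scale cannot recover the equivalent log-scale summary from these three numbers alone.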
Illumina Summaries
For two-colour platforms we may wish to then calculate…
… log-ratio (log(R/G))
… beta (R/(R+G))
… sum (R+G)
… theta (2*arctan(R/G)/π)
However we can’t get very good estimates of the confidence in these values since the covariance of the red and the green signals is not reported in the summary information.
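For reference, the four two-colour quantities can be computed from a pair of summarised intensities as below. This is a minimal sketch: the function name is hypothetical, and base 2 is assumed for the logarithm since the slide does not specify a base.

```python
import math


def two_colour_summaries(red, green):
    """Derived quantities for one bead type from red and green intensities."""
    return {
        "log_ratio": math.log2(red / green),          # assumption: log base 2
        "beta": red / (red + green),
        "sum": red + green,
        "theta": 2.0 * math.atan(red / green) / math.pi,
    }


# Equal signal in both channels gives a log-ratio of 0 and beta = theta = 0.5.
balanced = two_colour_summaries(1000.0, 1000.0)
print(balanced)
```

Each quantity is a function of both R and G, so its variance depends on how the two channels co-vary across beads, which is exactly the piece the summary output omits.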
Substructures
Array
Strip
Strip segment
Each strip (which makes up a whole array, or half of one) itself consists of 9 sub-sections (segments)
We probably shouldn't treat a two-strip array as 18 technical replicates, but we need to be aware of the issue
Substructures
The 96 arrays in a SAM are arranged in a 12x8 layout
Each individual array consists of an approximate hexagon of 49,777 beads arranged in 547 hexagons of 91 beads
[Figure: hexagonal array layout showing 547 sub-units of 91 beads each, with dimensions labelled 14, 14 and 27]
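The bead counts on this slide are consistent with centred hexagonal numbers, 3n(n-1)+1 for a hexagon with n points per side. The check below is a sketch of that arithmetic only; reading the figure labels 14 and 27 as the side length and the widest row of sub-units is an interpretation, not something the slide states.

```python
def centred_hexagonal(n):
    """Number of points in a hexagonal packing with n points along each side."""
    return 3 * n * (n - 1) + 1


beads_per_subunit = centred_hexagonal(6)     # hexagon with 6 beads per side
subunits_per_array = centred_hexagonal(14)   # hexagon with 14 sub-units per side
beads_per_array = beads_per_subunit * subunits_per_array
widest_row = 2 * 14 - 1                      # sub-units across the middle row

print(beads_per_subunit, subunits_per_array, beads_per_array, widest_row)
```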
Sources of variation
Differences between probes
• Not all probes are equally well designed
• There are thermodynamic differences between probes
• The additional probes on the HumanWG-6 arrays are a priori less likely to see expression
• Some probes contain SNPs, mismatches, splice junctions etc.
• Some probes target the 3’ end of a gene, some the 5’ end
• Some probes have multiple matches in the transcriptome; others have no good match
Sources of variation
• Variation enters at many levels
bead < probe < strip < array < chip
• Random numbers of beads mean that some arrays provide more evidence than others
Sources of variation
• Differences between chips (as expected)
• Gradients within chips (widely reported)
– known that there is a between-array gradient
– also a perpendicular (along-array) gradient in many chips (not observable with summary data)
• Quality of final array on chip has been questioned on occasion
• Differences between strips
– not surprising given the gradient (not observable with summary data)
Negative control intensities
Positive control intensities
Bead-level data
• As an alternative to the summary data
– can obtain bead-level data,
– or the raw images and a list of bead locations and identities
• Need to adjust the scanner settings to achieve this
• The beadarray bioconductor package is available to handle the data
Bead-Level Advantages
• Can perform better quality control
• Can rescue arrays/strips that might otherwise need to be discarded
Bead-Level Advantages
• Can separate the two strips
– Either normalize them while combining
– Or take two technical replicates
• Can analyse the data on the scale of our choice
– Usually log
– Includes outlier removal
• For two-colour arrays, can calculate standard errors of beta, theta etc.
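To illustrate the last point: with paired red and green intensities for every bead, beta can be computed bead by bead and its standard error estimated directly, which the summary data do not allow. This is a minimal sketch with a hypothetical function name and made-up intensities; it ignores outlier removal and any normalisation.

```python
import math
import statistics


def beta_with_se(red_beads, green_beads):
    """Mean beta and its standard error from paired per-bead intensities."""
    # Pairing the channels bead by bead carries their covariance implicitly.
    betas = [r / (r + g) for r, g in zip(red_beads, green_beads)]
    mean_beta = statistics.fmean(betas)
    se_beta = statistics.stdev(betas) / math.sqrt(len(betas))
    return mean_beta, se_beta


# Three made-up beads of one bead type.
mean_beta, se_beta = beta_with_se([300.0, 100.0, 200.0], [100.0, 100.0, 200.0])
print(mean_beta, se_beta)
```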
Eliciting priors
Eliciting prior information
• Default ‘LIMMA’ analysis returning the log-odds of being differentially expressed essentially assumes a uniform prior for the probes
• Certainly with the HumanWG-6, the RefSeq and non-RefSeq probes would have different a priori odds
• May wish to elicit more specific priors, but can’t get 48,000!
• Priors by pathway?
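One way to act on this is to re-weight the reported log-odds by probe group. limma's B-statistic is a natural-log posterior odds computed with a single assumed proportion of differentially expressed probes (the `proportion` argument of `eBayes`, 0.01 by default). Since posterior odds are the Bayes factor times the prior odds, swapping the prior amounts to shifting B by a difference of logits. The sketch below assumes the rest of the fitted model is held fixed; the function names and the prior values used are hypothetical.

```python
import math


def logit(p):
    """Natural-log odds of a probability."""
    return math.log(p / (1.0 - p))


def reweight_log_odds(b_stat, old_prior, new_prior):
    """Shift a log-odds score from one prior proportion to another."""
    return b_stat - logit(old_prior) + logit(new_prior)


def posterior_probability(b_stat):
    """Convert natural-log posterior odds into a posterior probability."""
    return 1.0 / (1.0 + math.exp(-b_stat))


# Hypothetical priors: RefSeq probes more likely to be expressed than the rest.
b_refseq = reweight_log_odds(0.0, old_prior=0.01, new_prior=0.02)
b_other = reweight_log_odds(0.0, old_prior=0.01, new_prior=0.005)
print(posterior_probability(b_refseq), posterior_probability(b_other))
```

The same shift would apply to priors elicited per pathway, which needs far fewer elicited values than one per probe.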
Eliciting prior information
• While we are about it, can try to gauge
– Which contrasts are more important?
– Which ‘treatments’ are expected to be similar?
Summary
• Not all arrays will provide equal amounts of evidence
– Numbers of beads will vary from chip to chip
• Some 'arrays' may provide no evidence for certain probe types
– In HumanHT-12 this is a 'feature'
– In HumanWG-6 V2/V3 may result from treating the two strips as technical replicates
– May result from excising part of the array in quality control
• Block designs required
– may need to consider blocks of 6, 8, or 12
• Need to know if we will have raw or summarized data
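As a rough illustration of blocking by chip, the sketch below deals samples out to chips so that each treatment is spread as evenly as possible, then randomises positions within each chip. It is one simple scheme under stated assumptions, not a recommended or optimal design: the function name and sample labels are hypothetical, and it ignores strips, array position effects and unequal priorities between contrasts.

```python
import random
from collections import Counter


def allocate_to_chips(samples, arrays_per_chip, seed=0):
    """samples: list of (sample_id, treatment). Returns one list per chip."""
    rng = random.Random(seed)
    by_treatment = {}
    for sample_id, treatment in samples:
        by_treatment.setdefault(treatment, []).append(sample_id)
    # Shuffle within each treatment, then line the treatments up end to end.
    ordered = []
    for treatment in sorted(by_treatment):
        ids = by_treatment[treatment]
        rng.shuffle(ids)
        ordered.extend((sample_id, treatment) for sample_id in ids)
    n_chips = -(-len(ordered) // arrays_per_chip)  # ceiling division
    chips = [[] for _ in range(n_chips)]
    # Dealing round-robin keeps per-chip treatment counts within one of each other.
    for i, item in enumerate(ordered):
        chips[i % n_chips].append(item)
    for chip in chips:
        rng.shuffle(chip)  # randomise array position within the chip
    return chips


# Twelve made-up samples, six per treatment, on chips with six arrays each.
samples = [(f"s{i}", "A" if i < 6 else "B") for i in range(12)]
chips = allocate_to_chips(samples, arrays_per_chip=6)
for chip in chips:
    print(Counter(treatment for _, treatment in chip))
```

Changing `arrays_per_chip` to 8 or 12 shows how the same samples fall into different block structures on the other chip types.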
First design question
• If using Illumina for expression, which array to use?
• The 6 has extra probes (but these are just as likely to hinder) and is expensive
• The 8 only has good quality probes, is cheaper, but lacks some probes on the 6
• The 12 is cheapest, but risks having no or few beads for some probes
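To get a feel for the last risk, one can assume (purely for illustration) that the number of beads of a given type on an array is Poisson distributed around its expected count. Under that assumption, the chance of ending up with fewer than 5 beads can be compared for an expected count of about 30 and of about 15. The Poisson model and the function name are assumptions, not a description of how the arrays are manufactured or filtered.

```python
import math


def prob_fewer_than(k, mean_beads):
    """P(N < k) when the bead count N is Poisson with the given mean."""
    return sum(
        math.exp(-mean_beads) * mean_beads**i / math.factorial(i)
        for i in range(k)
    )


p_mean_30 = prob_fewer_than(5, 30.0)  # roughly the 6- and 8-array chips
p_mean_15 = prob_fewer_than(5, 15.0)  # roughly the 12-array chip
print(p_mean_30, p_mean_15)
```

Even a small per-probe probability matters when it is multiplied across tens of thousands of bead types.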
Some specific types of experiment
Platform Comparison Studies
• E.g. MAQC (Nature Biotechnology, 2006, 24, 1140-1150)
• How do you decide on the number of arrays to compare?
• How do you choose an analysis method that isn’t biased towards one of the platforms?
Platform Evaluation
• How do we determine absolutely the performance of a platform?
• Titration series? (e.g. BMC Bioinformatics, 2006, 7, 511)
– What levels of dilution?
• Spiked-in probes? (e.g. Affymetrix Latin Square data for expression algorithm assessment, 2001)
– How many and at what levels?
Logical experiments
• Often want to find genes that show up with one treatment but not another
• Extreme example is identification of siRNA off-targets, as in Nature Methods (2006) 3, 199-204
• They had 4 siRNAs with the same target and replicates for each.
• The question is what genes are differentially expressed only by one siRNA?
• Need to weigh up number of alternative treatments, FPR, FNR, and number of biological replicates
Time Series
• Choice of time points
• Replicate the same time points or intervening ones?
• Control series?
– Same time points?
• Cell cycle?
Reasons for departing from the theoretically optimal design
Robustness
• Quite common to design experiments to be robust to losing a single array
• Now, may need to be robust to losing a chip
• In SAM experiments, may need to be robust to losing the edge rows and columns.
• Can cause tension if there is a shortage of samples for some treatments
Validation
• May want to sacrifice the ability to estimate our quantity of interest in order to be able to evaluate performance
• For classifications such as CNV calls might want to include a series of many replicates
• Can estimate false calling rates by analysis of the consistency of calls within the replicates
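As a toy version of this idea, suppose calls are binary, each replicate miscalls independently with the same rate e, and errors are symmetric. Two replicates of the same sample then disagree with probability d = 2e(1-e), which can be inverted to estimate e from the observed discordance. All of these modelling assumptions, and the function name, are introduced here for illustration; real CNV calling errors are unlikely to be this tidy.

```python
import math


def error_rate_from_discordance(discordance):
    """Per-call error rate e solving discordance = 2e(1 - e), taking e <= 0.5."""
    if not 0.0 <= discordance <= 0.5:
        raise ValueError("discordance must lie between 0 and 0.5 under this model")
    return (1.0 - math.sqrt(1.0 - 2.0 * discordance)) / 2.0


# If 18% of replicate pairs disagree, the implied per-call error rate is 10%.
estimated_error = error_rate_from_discordance(0.18)
print(estimated_error)
```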
Validation
• Some genomic information (SNPs, CNVs etc.) we expect to be inherited at a certain rate.
• Inclusion of pedigrees can allow estimation of inheritance rates
• Discrepancies between the expected and observed rates can allow for estimation of the false calling rates
Validation
• The gold standard of validation is to use a lower-throughput, high-performance technology such as RT-PCR
• Expensive to do, can only validate a small subset of probes
• Need to choose which ones
• Need to decide how many
• The more validation assays we anticipate running, the fewer microarrays we can afford
Other…
• May wish to include arrays that
– allow for ongoing QC of the microarray facility
– gain information to facilitate planning future experiments
– ‘complete’ the data set for future data mining
Closing thoughts
Thoughts
• If we are concerned about the block effects, we might want to construct log-ratios within chips
• Can even split the two strips
• If we could successfully control for block effects and batch effects then sequential designs would potentially play a role
[Figure: three copies of a diagram with positions labelled A to F]
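The first thought on this slide can be illustrated with a toy calculation: if a chip effect multiplies every intensity on that chip by the same factor, it cancels when a log-ratio is formed between two samples on the same chip. The multiplicative chip effect, the function name and the numbers below are assumptions made for the sake of the example.

```python
import math


def within_chip_log_ratios(chips):
    """chips: list of dicts holding 'treated' and 'control' intensities per chip."""
    return [math.log2(chip["treated"] / chip["control"]) for chip in chips]


# Made-up data: a true two-fold change, distorted by three different chip effects.
chip_effects = [1.0, 1.5, 0.7]
chips = [
    {"treated": 200.0 * effect, "control": 100.0 * effect}
    for effect in chip_effects
]
log_ratios = within_chip_log_ratios(chips)
print(log_ratios)
```

The raw intensities differ from chip to chip, but every within-chip log-ratio recovers the same fold change.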
Acknowledgements
Thanks to:
Mark Dunning, Matt Ritchie, Nat Thorne for slides
Illumina for some of the pictures
Ian Mills, Charlie Massie, Mahesh Iddawela for some of the illustrative data