MMI Genomics/UCD & MSU Visits – May 2003
Design and Analysis of cDNA Microarray Experiments at CSIRO Livestock Industries
Toni Reverter ([email protected]), CLI Bioinformatics, Queensland Bioscience Precinct, Brisbane, 4067 Australia
Contents
Introduction: Analysis possibilities; Challenges; Process for microarray
Technical Concerns: Design; Image (data) quality; Data analysis
Analysis Possibilities (adapted from Hongzhe Li, 2002)
Determine genes which are differentially expressed (DE).
Connect DE genes to sequence databases to search for common upstream regions.
Overlay DE genes on pathway diagrams.
Relate expression levels to other information on cells, e.g. tumor types.
Identify temporal and spatial trends in gene expression.
Seek roles of genes based on patterns of co-regulation.
…many more!!!
Challenges
Time Dependent: Chronology, Logical
1800s – DATA; 30-60s – METHODS; 50-70s – SOFTWARE; 1980s – COMPUTER
cDNA
Human Dependent: Skill Integration
Quantitative (BANANA): Computer Sci., Statisticians, Mathematicians, …
Non-Q (EGG): Biochemists, Physiologists, Pathologists, …
“banana omelette”
Historical, Excitement, Balance, Interdisciplinary
Data Dependent: Paradigm, Distribution, Source, Size
Array Process
Tissue Samples: Treat A, Treat B
mRNA Extraction & Amplification
cDNA “A” (Cy5), cDNA “B” (Cy3)
Hybridization
Optical Scanner: Laser 1 + Laser 2
Image Capture
Analysis
What you see is what you get
Technical Concerns
Egg Level (Biochemist):
1. Preparation (Printing) of the Chip
2. RNA Extraction, Amplification and Hybridisation
3. Optical Scanner (Reading)
Banana Level (Quantitative):
1. Design
2. Image (data) Quality
3. Data Analysis
Replication:
1. Animal
2. Sample
3. Array
4. Spot
Technical Concerns: Image (Data) Quality
GP3xCLI

##############################################################
#                          GP3xCLI                           #
#  GenePix Processing Program by CSIRO Livestock Industries  #
#                                                            #
#  Enquiries: [email protected]                              #
#  Copyright (c) 2003 CSIRO-LI                               #
##############################################################

GPR Input:    F12.gpr
Processed on: Tue Apr 8 13:40:01 EST 2003

=-=-=-=-=-=-= IMAGE QUALITY =-=-=-=-=-=-=-=
Total No. of Spots ------------------------> 19200
Spots with Flag = -50 --------------------> 4720
Spots with Flag = -100 --------------------> 12
Red dye with Background >= Foreground ---> 892
Green dye with Background >= Foreground ---> 915

Median to Mean Correlation Analysis:
                 DATA LEFT
              RED            GREEN
Corr       Raw    Log2    Raw    Log2
______________________________________
> 0.00   19200   19200   19200   19200
> 0.20   19199   19200   19199   19200
> 0.40   19183   19200   19192   19200
> 0.60   19008   19200   19102   19200
> 0.80   17061   19199   18541   19198
> 0.85   14466   19193   17872   19196
> 0.90   10491   19137   15786   19181

=-=-=-=-=-=-= VALID SPOTS* =-=-=-=-=-=-=-=
Total No. of Valid Spots -----------------> 14433
Percentage of Valid Spots -----------------> 75.2
Total No. of Genes ------------------------> 7220
Mean No. Repetitions ----->  2 for 6600 Genes
Min. No. Repetitions ----->  1 for  580 Genes
Max. No. Repetitions -----> 24 for    8 Genes

              Log(R/G)  vs  0.5*Log(R*G)
              ________      ____________
N                14433             14433
Mean            -0.017            10.327
Std              0.617             2.079
Min             -8.711             3.246
Max              4.030            15.994
Correlation      0.362

Log(R/G) across Intensity Values
Intensity    Spots    % <0    % >0
__________________________________
( 0 ,  4)        4   100.0     0.0
( 4 ,  8)     1499    74.1    25.9
( 8 , 12)     9847    40.4    59.6
(12 , 16)     3083    17.3    82.7
__________________________________
*NB: Valid Spot defined as spots with Background < Foreground for both Red and Green channels and with a Quality Flag of 0.
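The valid-spot rule in the footnote of the report can be sketched in Python. This is an illustration only, not the GP3xCLI code: the `Spot` record and its field names are hypothetical, and base-2 logs are assumed for Log(R/G) and 0.5*Log(R*G).

```python
from dataclasses import dataclass
from math import log2
from typing import Tuple


@dataclass
class Spot:
    """Hypothetical per-spot record; field names are assumptions, not GenePix columns."""
    gene: str
    red_fg: float
    red_bg: float
    green_fg: float
    green_bg: float
    flag: int


def is_valid(spot: Spot) -> bool:
    # Valid spot: Background < Foreground in both channels and Quality Flag of 0.
    return (spot.red_bg < spot.red_fg
            and spot.green_bg < spot.green_fg
            and spot.flag == 0)


def log_ratio_and_intensity(spot: Spot) -> Tuple[float, float]:
    # Background-corrected signals, then Log(R/G) and 0.5*Log(R*G), base 2 assumed.
    r = spot.red_fg - spot.red_bg
    g = spot.green_fg - spot.green_bg
    return log2(r / g), 0.5 * log2(r * g)


# Made-up spots for illustration.
spots = [
    Spot("gene_a", 900.0, 100.0, 300.0, 100.0, 0),    # valid
    Spot("gene_b", 120.0, 150.0, 400.0, 100.0, 0),    # red background >= foreground
    Spot("gene_c", 500.0, 100.0, 500.0, 100.0, -50),  # flagged
]
valid = [s for s in spots if is_valid(s)]
percent_valid = 100.0 * len(valid) / len(spots)
m, a = log_ratio_and_intensity(valid[0])
print(f"{len(valid)} valid of {len(spots)} ({percent_valid:.1f}%), M = {m:.3f}, A = {a:.3f}")
```

Only valid spots are passed to `log_ratio_and_intensity`, so both background-corrected signals are positive and the logs are defined.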
Clever Programming Tailored to your needs
N=1
for filename in R16T0S1.gpr R16T0S2.gpr R16T24S1.gpr R16T24S2.gpr \
                S32T0S1.gpr S32T0S2.gpr S32T24S1.gpr S32T24S2.gpr
do
  # Get valid readings (foreground above background in both channels), compute log2 values
  awk 'NR>30 && $NF>=0 && $4!="no_spot" && substr($4,1,5)!="score" && \
       substr($4,1,5)!="custo" && substr($4,1,6)!="spotre" && \
       $9>$12 && $18>$21 {print $4, $9-$12, $18-$21, log($9-$12)/log(2.0), \
       log($18-$21)/log(2.0)}' "$filename" | sort > junk1
  # Append the log ratio (field 6) and the mean log intensity (field 7)
  awk '$2!=$3 {print $0, $4-$5, 0.5*($4+$5)}' junk1 > junk2
  # Get the median of the log ratios (sort -k replaces the obsolete "sort +5" syntax)
  REC=$(wc -l < junk2 | awk '{print int($1/2)}')
  MED=$(sort -n -k6,6 junk2 | awk -v rec="$REC" 'NR==rec {print $6}')
  echo "Median of file" "$filename" " = " "$MED"
  # Global normalization: subtract the median from each log ratio
  awk -v median="$MED" -v slide="$N" '{print "Slide_"slide, int(slide/2+.5), \
       $1, $6-median}' junk2 | sort -k3 > "dat.$N"
  N=$((N + 1))
done
cat dat.* > total.dat
Technical Concerns: Experimental Design
[Figure: three designs for the four samples O, A, B and AB: Reference, Loop, All-Pairs]
Variance of Estimated Effects (Relative to the All-Pairs)
                    Reference   Loop   All-Pairs
Main effect of A    1           4/3    1
Main effect of B    1           1      1
Interaction AB      3           8/3    2
Contrast A-B        2           1      1
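The variance table can be reproduced under one set of assumptions, which is a reading and not stated on the slide. O is the baseline, with A = O + a, B = O + b and AB = O + a + b + ab. Each slide yields one log ratio with unit variance. The loop visits O, B, A, AB in that order. Each variance is multiplied by (number of slides)/3, so that designs with more slides are compared on an equal footing. The helper names below are hypothetical.

```python
from fractions import Fraction


def inverse(matrix):
    """Gauss-Jordan inverse with exact fractions (assumes the matrix is invertible)."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def scaled_variances(design):
    """Variances of a, b, ab and a-b from (X'X)^-1, scaled by slides/3 (assumed scaling)."""
    cols = list(zip(*design))
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    cov = inverse(xtx)
    contrast = cov[0][0] + cov[1][1] - 2 * cov[0][1]
    scale = Fraction(len(design), 3)
    return [scale * v for v in (cov[0][0], cov[1][1], cov[2][2], contrast)]


# One row per slide; columns are the coefficients of (a, b, ab) in the log ratio.
designs = {
    "Reference": [(1, 0, 0), (0, 1, 0), (1, 1, 1)],                # A-O, B-O, AB-O
    "Loop": [(0, 1, 0), (1, -1, 0), (0, 1, 1), (-1, -1, -1)],      # B-O, A-B, AB-A, O-AB
    "All-Pairs": [(1, 0, 0), (0, 1, 0), (1, 1, 1),
                  (0, 1, 1), (1, 0, 1), (1, -1, 0)],               # all six pairs
}
table = {name: scaled_variances(rows) for name, rows in designs.items()}
for name, values in table.items():
    print(name, [str(v) for v in values])
```

Exact fractions are used so that entries such as 4/3 and 8/3 come out without rounding.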
Technical Concerns: Experimental Design
Glonek & Solomon, Factorial and Time Course Designs for cDNA Microarray Experiments
• Read pp 1-2
• Definition: A design with a total of n slides and design matrix X is said to be admissible if there exists no other design with n slides and design matrix X* such that $c_i^* \le c_i$ for all i, with strict inequality for at least one i, where $c_i^*$ and $c_i$ are respectively the diagonal elements of $(X^{*\prime}X^*)^{-1}$ and $(X'X)^{-1}$.
• Read pp 24
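The admissibility definition amounts to a dominance check on the diagonals of the two inverse matrices. A minimal sketch follows, with hypothetical function names. The two example vectors were worked out by hand for two 3-slide designs on a 2x2 factorial, with O as baseline and parameters a, b, ab; both the designs and the parameterisation are assumptions made for illustration.

```python
from fractions import Fraction
from typing import Sequence


def dominates(c_star: Sequence[Fraction], c: Sequence[Fraction]) -> bool:
    """True if c_star[i] <= c[i] for all i, with strict inequality for at least one i."""
    if len(c_star) != len(c):
        raise ValueError("both designs must estimate the same parameters")
    pairs = list(zip(c_star, c))
    return all(s <= t for s, t in pairs) and any(s < t for s, t in pairs)


def is_admissible(c: Sequence[Fraction], competitors: Sequence[Sequence[Fraction]]) -> bool:
    """Admissible relative to the listed competitors only (all with the same n slides)."""
    return not any(dominates(other, c) for other in competitors)


# Diagonals of (X'X)^-1 for two hypothetical 3-slide designs:
#   reference:   A vs O, B vs O, AB vs O  -> variances of (a, b, ab) = (1, 1, 3)
#   alternative: A vs O, B vs O, AB vs B  -> variances of (a, b, ab) = (1, 1, 2)
c_reference = [Fraction(1), Fraction(1), Fraction(3)]
c_alternative = [Fraction(1), Fraction(1), Fraction(2)]

print(is_admissible(c_reference, [c_alternative]))
print(is_admissible(c_alternative, [c_reference]))
```

A full check would have to enumerate every design with n slides; the function only reports admissibility against the competitors it is given.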
What is the No. of Possible Configurations?
No. of Arrays: (S-1) to S·(S-1)
S = 3: 2 to 6
S = 4: 3 to 12
S = 12: 11 to 132
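The range of array counts follows directly from the two bounds. A small sketch, with a hypothetical function name; taking S as the number of samples is an assumption.

```python
from typing import Tuple


def array_range(s: int) -> Tuple[int, int]:
    """Return (minimum, maximum) number of arrays for s samples: (S-1, S*(S-1))."""
    if s < 2:
        raise ValueError("need at least two samples to hybridise a two-colour array")
    return s - 1, s * (s - 1)


for s in (3, 4, 12):
    low, high = array_range(s)
    print(f"S = {s}: {low} to {high}")
```

One way to read the bounds: S-1 arrays is the fewest that can connect all S samples, and S·(S-1) counts every ordered pair of samples, that is, every pair in both dye orientations.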
S^(A-1)
Pie-Bald black; Non-Pie-Bald black; Normal; White; Recessive
S^(A-1) = 5^3 = 125
[Figure: 25 configurations, each marked x5]
0 hr 24 hr
S^(A-1) = 12^10 = 62 Billion!
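The count of configurations can be checked numerically. The sketch assumes S is the number of samples and A the number of arrays; the slides give only the formula and two evaluated cases, 5^3 = 125 and 12^10. Both cases are consistent with A = S - 1, the minimum number of arrays.

```python
def n_configurations(s: int, a: int) -> int:
    """Number of possible configurations as given on the slide: S ** (A - 1)."""
    return s ** (a - 1)


small = n_configurations(5, 4)     # 5 samples, 4 arrays
large = n_configurations(12, 11)   # 12 samples, 11 arrays
print(f"{small:,}")
print(f"{large:,}  (about {large / 1e9:.0f} billion)")
```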
[Figure: arrays linking the 0 hr and 24 hr samples, each labelled with its dye, R or G]
[Figure: samples labelled F HS, M TM, F HS, M HS, F TM, M HS, F HS, M HS, with each array labelled by dye, R or G]
S = 24: 23 to 552
S = 14: 13 to 182
pooling
Slide RES SUS 0 3 24 M F HS TM
1 0 0 1 0 -1 0.066 -0.066 0.266 -0.266
2 0 0 -1 1 0 0.600 -0.600 -0.600 0.600
3 1 -1 1 -1 0 -0.600 0.600 -0.400 0.400
4 -1 1 -1 0 1 0.600 -0.600 0.400 -0.400
5 -1 1 1 -1 0 -0.600 0.600 1 -1
6 1 -1 1 -1 0 0.666 -0.666 -0.400 0.400
7 1 -1 -1 0 1 -0.666 0.666 0 0
8 0 0 1 0 -1 -0.333 0.333 0 0
9 0 0 0 -1 1 0.333 -0.333 -0.666 0.666
10 0 0 0 1 -1 -1.000 1.000 0 0
11 -1 1 0 -1 1 -0.500 0.500 0 0
12 1 -1 0 1 -1 -1.000 1.000 -1 1
13 -1 1 0 1 -1 0.666 -0.666 0.666 -0.666
14 0 0 0 1 -1 -1.000 1.000 0 0
               RES     SUS       0       3      24       M       F      HS      TM
RES              8      -8       1       0      -1  -1.766   1.766  -3.866   3.866
SUS                      8      -1       0       1   1.766  -1.766   3.866  -3.866
0                                8      -4      -4  -1.335   1.335   0.666  -0.666
3                                       10      -6  -1.033   1.033  -0.468   0.468
24                                              10   2.368  -2.368  -0.198   0.198
M                                                    6.247  -6.247   0.493  -0.493
F                                                            6.247  -0.493   0.493
HS                                                                   3.798  -3.798
TM                                                                           3.798
Sum(ABS)      29.3    29.3    22.0    23.0    27.1    21.7    21.7    17.6    17.6
Sum(ABS)      26.8    26.8    39.1    23.1    17.3     7.1     7.1    14.3    14.3   (Reference Design)
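The triangular table is consistent with X'X for the 14-slide design matrix shown before it, and the first Sum(ABS) row with the column sums of absolute values of the full symmetric matrix. The sketch below checks this reading, which is inferred from the numbers and is not stated on the slides. The variable names are hypothetical.

```python
HEADERS = ["RES", "SUS", "0", "3", "24", "M", "F", "HS", "TM"]

# Design matrix copied from the slide: one row per slide, one column per effect.
X = [
    [0, 0, 1, 0, -1, 0.066, -0.066, 0.266, -0.266],
    [0, 0, -1, 1, 0, 0.600, -0.600, -0.600, 0.600],
    [1, -1, 1, -1, 0, -0.600, 0.600, -0.400, 0.400],
    [-1, 1, -1, 0, 1, 0.600, -0.600, 0.400, -0.400],
    [-1, 1, 1, -1, 0, -0.600, 0.600, 1, -1],
    [1, -1, 1, -1, 0, 0.666, -0.666, -0.400, 0.400],
    [1, -1, -1, 0, 1, -0.666, 0.666, 0, 0],
    [0, 0, 1, 0, -1, -0.333, 0.333, 0, 0],
    [0, 0, 0, -1, 1, 0.333, -0.333, -0.666, 0.666],
    [0, 0, 0, 1, -1, -1.000, 1.000, 0, 0],
    [-1, 1, 0, -1, 1, -0.500, 0.500, 0, 0],
    [1, -1, 0, 1, -1, -1.000, 1.000, -1, 1],
    [-1, 1, 0, 1, -1, 0.666, -0.666, 0.666, -0.666],
    [0, 0, 0, 1, -1, -1.000, 1.000, 0, 0],
]

# X'X: inner products between every pair of columns.
columns = list(zip(*X))
xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]

# Sum of absolute values over each row of the symmetric matrix (same as each column).
sum_abs = [round(sum(abs(v) for v in row), 1) for row in xtx]

for name, total in zip(HEADERS, sum_abs):
    print(f"{name:>4} {total:5.1f}")
```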
Technical Concerns: Statistical Analysis
Data Beautifying Techniques
Technique            Choice              Aim (Real)              Aim (Ideal)
1. Transformation    Base-2 Log          Numerically tractable   Gaussian
2. Normalisation     Location: M - c     Systematic effects      Gaussian
   2.i. Global:
        - Mean
        - Median
        - Regr. Coeff (LOWESS)
   2.ii. Local:
        - LOWESS within print-tip-group
3. Standardisation   Scale Parameter     Stabilise variance      Gaussian
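The three techniques can be sketched for a single slide as follows. This is an illustration under stated assumptions: global median normalisation is used for the location step (LOWESS is not shown), and the median absolute deviation is chosen as the scale parameter, which the slide does not specify. The function name and the intensities are made up.

```python
from math import log2
from statistics import median
from typing import List, Sequence


def beautify(red: Sequence[float], green: Sequence[float]) -> List[float]:
    """Log2 ratio, global median normalisation, then scaling by the MAD (assumed scale)."""
    # 1. Transformation: base-2 log ratio M = log2(R/G)
    m = [log2(r / g) for r, g in zip(red, green)]
    # 2. Normalisation (global, location): M - c with c the median of M
    centre = median(m)
    centred = [v - centre for v in m]
    # 3. Standardisation: divide by a scale parameter (here the median absolute deviation)
    mad = median([abs(v) for v in centred])
    if mad == 0:
        raise ValueError("scale parameter is zero; cannot standardise")
    return [v / mad for v in centred]


# Made-up background-corrected intensities for five spots.
red = [200.0, 400.0, 800.0, 100.0, 1600.0]
green = [100.0, 100.0, 100.0, 100.0, 100.0]
standardised = beautify(red, green)
print(standardised)
```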
Pin group (sub-array) effects
Boxplots of log ratios by pin group
Lowess lines through points from pin groups
Technical Concerns: Statistical Analysis
Assumption: The proportion of genes that are DE is minimal
Q: Which genes to use?
A: Only the ones (housekeeping) that we know are not DE
NB: “Boutique” arrays become a nuisance
Adapted from T Speed 2002
Technical Concerns: Statistical Analysis
Data Beautifying Techniques
Except Log2, everything else applies only to Ratios:
M = log2(R/G)
Except Log2, everything else applies only within slide
Everything is beautified to identify DE genes straight from “M vs A” plot (A = Average) from a single slide or from a function of M’s (t-stat) across slides
Technical Concerns: Statistical Analysis
Include the “possible systematic sources of variation” in a model-based (e.g. ANOVA) analysis and the data will be implicitly normalised. Then check the residuals.
Whenever possible, avoid ratios.
Log2(Intensity) = Array + Dye + A*D + Treatment + Sample + Gene + Gene*Treatment + Gene*Sample + Residuals
Normalisation Model
Gene Model
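One way to read the Normalisation Model / Gene Model split is as two stages: first remove Array, Dye and Array*Dye, then look at Gene*Treatment in what is left. The sketch below takes that reading as an assumption. It uses cell means, which are the fitted values of Array + Dye + Array*Dye when no other terms are present. The Sample terms are omitted, and fitting the full model jointly would be a different computation. The records and helper names are made up.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (array, dye, treatment, gene, log2 intensity)
records = [
    ("a1", "R", "T1", "g1", 10.0), ("a1", "R", "T1", "g2", 12.0),
    ("a1", "G", "T2", "g1", 9.0), ("a1", "G", "T2", "g2", 9.0),
    ("a2", "R", "T2", "g1", 11.0), ("a2", "R", "T2", "g2", 11.0),
    ("a2", "G", "T1", "g1", 8.0), ("a2", "G", "T1", "g2", 10.0),
]


def group_means(rows, key, value):
    """Mean of value(row) within each group defined by key(row)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(value(row))
    return {k: mean(v) for k, v in groups.items()}


# Stage 1 (normalisation): subtract each array-by-dye cell mean.
cell = group_means(records, lambda r: (r[0], r[1]), lambda r: r[4])
residuals = [(r[2], r[3], r[4] - cell[(r[0], r[1])]) for r in records]

# Stage 2 (gene model): Gene*Treatment means of the normalised values.
gene_by_treatment = group_means(residuals, lambda r: (r[1], r[0]), lambda r: r[2])

for key in sorted(gene_by_treatment):
    print(key, gene_by_treatment[key])
```

Checking the residuals, as the slide advises, would start from the `residuals` list: within each array-by-dye cell they sum to zero by construction.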
Rockhampton Model
Log2(Intensity) = Design + Array + Dye + Array*Dye + (Diet +) Gene + Gene*Diet + Residuals
N Levels:
Design        2
Array         14
Dye           2
Array*Dye     28
(Diet)        3
Gene          7,347
Gene*Diet     22,041
Residuals     300,936
[Figure: Rockhampton design. Labels: MEDIUM (4 Animals); LOW (3 Ani, 1 Rep); (pooled 3 Anim); HIGH (pooled 2 Anim); MEDIUM (Pooled & Ampl); LOW (Pooled & Ampl); Reference; All Pairs]
Japanese Model
Log2(Intensity) = Array + Dye + Array*Dye + Breed + Time + Breed*Time + Gene + Gene*Br*Ti + Residuals
N Levels:
Array         12
Dye           2
Array*Dye     24
Breed         2
Time          3
Breed*Time    6
Gene          5900
Gene*Br*Ti    35,400
Residuals     259,080
[Figure: Japanese design. Time points JANUARY 01, JUNE 01 and OCTOBER 01, each with JAPANESE and HOLSTEIN samples]
REML
                                   Rockhampton   Japanese
Residual                           0.720         0.889
Gene                               3.664         2.883
Gene*Treatment                     0.137         0.133
% Total Variance Explained by
  Gene                             81.1          73.8
  Gene*Treatment                   3.0           3.4
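The percentages can be recovered from the variance components, assuming each is the component divided by the sum of the three components (Residual, Gene, Gene*Treatment). With the rounded components shown on the slide, the results agree with the reported figures to within rounding.

```python
components = {
    "Rockhampton": {"Residual": 0.720, "Gene": 3.664, "Gene*Treatment": 0.137},
    "Japanese": {"Residual": 0.889, "Gene": 2.883, "Gene*Treatment": 0.133},
}


def percent_of_total(parts):
    """Each variance component as a percentage of their sum."""
    total = sum(parts.values())
    return {name: 100.0 * value / total for name, value in parts.items()}


percentages = {exp: percent_of_total(parts) for exp, parts in components.items()}
for exp, pct in percentages.items():
    print(f"{exp}: Gene {pct['Gene']:.1f}%, Gene*Treatment {pct['Gene*Treatment']:.1f}%")
```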
Clever Programming Tailored to your needs
Interaction Solutions
[Plot: T24 - T0, Resistant vs Disease, both axes from -4 to 4]
Your Needs: “Important values are…”
1. Away from (0,0)
2. In quadrants 1 and 4.
Generate a new variable:
+1.0*[(R24-R0)+(S0-S24)] if R0<R24 & S0>S24
+0.5*[(R24-R0)+(S24-S0)] if R0<R24 & S0<S24
-0.5*[(R0-R24)+(S0-S24)] if R0>R24 & S0>S24
-1.0*[(R0-R24)+(S24-S0)] if R0>R24 & S0<S24
…then apply model-based clustering.
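The new variable can be written as a small function. This sketch reads R0, R24, S0 and S24 as the solutions for the two groups at 0 and 24 hr, which is an assumption from context. The slide does not say what happens when R0 = R24 or S0 = S24, so the function returns `None` there; that choice is also an assumption. The function name is hypothetical.

```python
from typing import Optional


def interaction_score(r0: float, r24: float, s0: float, s24: float) -> Optional[float]:
    """Piecewise variable from the slide; None where the four cases do not apply (ties)."""
    if r0 < r24 and s0 > s24:
        return 1.0 * ((r24 - r0) + (s0 - s24))
    if r0 < r24 and s0 < s24:
        return 0.5 * ((r24 - r0) + (s24 - s0))
    if r0 > r24 and s0 > s24:
        return -0.5 * ((r0 - r24) + (s0 - s24))
    if r0 > r24 and s0 < s24:
        return -1.0 * ((r0 - r24) + (s24 - s0))
    return None  # ties are not covered by the four cases


# Made-up values, one per case.
print(interaction_score(0.0, 2.0, 1.0, 0.0))
print(interaction_score(0.0, 2.0, 0.0, 1.0))
print(interaction_score(2.0, 0.0, 1.0, 0.0))
print(interaction_score(2.0, 0.0, 0.0, 1.0))
```

The weights give full credit when the two groups move in opposite directions and half credit when they move together, so the values passed to clustering separate genes by the kind of interaction and not only by its size.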
Clever Programming BAYESMIX
Challenges
Human Dependent: Interdisciplinary Skills
Minimal knowledge of the application discipline is needed
…failing that, the Statisticians will win, …but with the wrong weapons.
1. Amount of Expression = Amount of Response
2. Same cut-off point to judge all genes
3. Over-emphasis on normalization (hence, despise “Boutique Arrays”)
4. Over-emphasis on variance stabilization
5. Over-emphasis on controlling false-positives
Conclusion
The Statistical Analysis of cDNA Microarray Data:
General:
1. Still in its infancy (…possibly even embryonic stage)
2. Many decisions have a heuristic rather than a theoretical foundation
3. No hope for a “One size fits all” software (even method)
4. Safer to aim towards “Tailor to one’s needs”
5. Integration of interdisciplinary skills is a must
Livestock Species:
1. Tailing humans (…at the moment)
2. Strong background knowledge of genetics accumulated
3. Journals will soon be inundated
4. We have the opportunity to participate
Thank You!