design of microarray experiments - college of agriculture ... · pdf filesame rna; multiple...

Design of Microarray Experiments

Xiangqin Cui

Experimental design

• Experimental design: is a term used about efficient methods for planning the collection ofdata, in order to obtain the maximum amount of information for the least amount of work. Anyone collecting and analyzing data, be it in the lab, the field or the production plant, can benefit from knowledge about experimental design.http://www.stat.sdu.dk/matstat/Design/index.html

Experimental Design Principles

• Randomization• Replication• Blocking• Use of factorial experiments instead of the

one-factor-at-a-time methods.• Orthogonality

Randomization

• The experimental treatments are assigned to the experimental units (subjects) in a random fashion. It helps to eliminate effect of "lurking variables", uncontrolled factors which might vary over the length of the experiment.

Commonly Used Randomization Methods

• Number the subjects to be randomized and then randomly draw the numbers using paper pieces in a hat or computer random number generator.

1 2 3 4 5 6

Hormone treatment : (1,3,4); (1,2,6)

Control : (2,5,6); (3,4,5)

Plan 1 plan 2

Randomization in Microarray Experiments

• Randomize arrays in respect to samples• Randomize samples in respect to

treatments• Randomize the order of performing the

labeling, hybridizations, scanning ……

Replication• Replication is repeating the creation of a

phenomenon, so that the variability associated with the phenomenon can be estimated.

Replications should not be confused with repeated measurements which refer to literally taking several measurements of a single occurrence of a phenomenon.

Replication in microarray experiments

• What to replicate?– Biological replicates (replicates at the experimental

unit level, e.g. mouse, plant, pot of plants…)• Experimental unit is the unit that the experiment treatment or

condition is directly applied to, e.g. a plant if hormone is sprayed to individual plants; a pot of seedlings if different fertilizers are applied to different pots.

– Technical replicates• Any replicates below the experimental unit, e.g. different

leaves from the same plant sprayed with one hormone level; different seedlings from the same pot; Different aliquots of the same RNA extraction; multiple arrays hybridized to the same RNA; multiple spots on the same array.

Power: the probability for a real positive to be identified

It depends on: 1) The size of difference (log ratio) to be detected

The bigger the difference, the higher the power.

2) The sample sizeThe larger the sample size, the higher the power.

3) The error variance of the treatmentThe bigger the error variance, the lower the power.

How Many Replicates

How Many Replicates

* Minimum number of replication (r) per group: 3 – 5 (two groups)

(reliable variance estimates, permutation test, df error, etc.)

rrB2

rrB1

A2A1

Factorial 2 x 2

1173Total

840Error

111A*B

111B

111A

(r=3)(r=2)(r=1)S.V.

* Sample size calculation for experiments with two groups

REFERENCE:

BALANCED BLOCK:

( )2

2)1()2/1(

)/(zz4

nσδ

+= β−α−

( )2

2)1()2/1(

)/(zz

nτδ

+= β−α−

# arrays (total)log(FC)

SD

(in general, τ > σ)

Example (Dobbin and Simon, 2003):

δ=1, α=0.001, β=0.05:σ ≈ 0.50 → n = 30 (i.e. 15A + 15B + 30R)

τ ≈ 0.67 → n = 17 (i.e. 17A + 17B)

Error variance (EV) of the estimated treatment effect:

)(2 22

422

eA

eeM

mnmnmEV

σσσσσ

+−+=

m : number of mice per treatmentn : number of array pairs per mouse

Reference DesignTrt A Trt B

M1 M2 M3 M4

R R R R

Error variance of the estimated treatment effect:

mnmEV eM

22 σσ+=Approximately

Resource allocation:

AM nCmmCCost ⋅+=

m mice / trtn array pairs / mouse

CM cost / mouseCA cost / array pair

The optimum number of array pairs per mouse:

A

M

M

e

CCn ⋅= 2

2

σσ

mnmEV eM

22 σσ+=

Examples for resource allocation

Mouse price Array price/pair # of array pairsper mouse

$15 $600 1

$300 $600 1

$1500 $600 2

• Using variance components estimated from kidney in Project Normal data.• r = 1 ( no replicated spots on array )• Reference design

More efficient array level designs, such as direct comparisons and loop designs, can reduce the optimum number of arrays per mouse.

Pooling Samples

22 1Mpool k

σσ α=

k : pool size

α : constant for the effect of pooling. 0 < α < 1α = 0, pooling has no effect.α = 1, pooling has maximum effect.

Pooling can reduce the biological variance but not the technicalvariances. The biological variance will be replaced by:

Note: More pools with fewer arrays per pool is better than fewer pools with more arrays per pools when fixed number of arrays and mice are to be used.

Power Increase to Detect 2 fold change by Pooling

2 pools / trt4 pools / trt


2 mice / trt4 mice / trt


( Pool size k = 3, α = 1 )

Significance level:0.05 after Bonferroni correction

PowerAtlas• It is an online tool to calculate sample size based on

similar data set in GEO or user provided pilot data. http://www.poweratlas.org/

Page et al., BMC Bioinformatics. 2006 Feb 22;7:84.

Gadbury GL, et al. Stat Meth Med Res 13: 325-338, 2004.

EDR is the proportion of genes that are truly differentially expressed that will be called significant at the significance level of 0.05. This is analogous to average power. TP is the proportion of genes called 'significant' that are truly differentially expressed. TN is the proportion of genes called 'not significant' that are truly not differentially expressed between the conditions.

Blocking

• Some identified uninteresting but varying factors can be controlled through blocking.

• Completely randomized design

• Complete block design

• Incompletely block designs

Completely Randomized Design

There is no blocking

Example

Compare two hormone treatments (trt and control) using 6 Arabidopsis plants.

1 2 3 4 5 6

Hormone trt: (1,3,4); (1,2,6)Control : (2,5,6); (3,4,5)

Note: Design with one-color microarrays is often completely randomized design.

Complete Block Design

There is blocking and the block size is equal to the number of treatments.

Example:

Compare two hormone treatments (trt and control) using 6 Arabidopsis plants. For some reason plant 1 and 2 are taller, plant 5 and 6 are thinner.

Randomization within blocks

1 2 3 4 5 6Hormone treatment: (1,4,5); (1,3,6)Control : (2,3,6); (2,4,5)

Incomplete Block Design

There is blocking and the block size is smaller than the number of treatments.

Example:

Compare three hormone treatments (hormone level 1, hormone level 2, and control) using 6 Arabidopsis plants. For some reason plant 1 and 2 are taller, plant 5 and 6 are thinner.

Randomization within blocks

1 2 3 4 5 6Hormone level1: (1,4); (2,4)Hormone level2: (2,5); (1,6)Control : (3,6); (3.5)

Designs for two-color microarrays

• Experimental design for one color microarray array is straight forward with out the required blocking at the array level.

• For two color microarrays, we need to pair the samples and label then with Cy3 and Cy5 for hybridization to one array.

• How do we pair?

Blocking in microarray experiments• In two-color platform, the arrays vary due to the loading of DNA quantity, spot morphology, hybridization condition, scanning setting…It is treated as a block of size 2, the samples are compared within each array.

• If there are two lots of chips in the experiment and there is large variation across chip lots. We can treat chip lots as a blocking factor.

Example: compare three samples: A, B, C

Block (array) 1 Block (array) 2 Block (array) 3

A

B

B

C

C

A

A1 B1Cy5 Cy3

A1 B1

Dye-swap

• Dye swappingSeparates the dye effect from the sample effect.Results are less sensitive to the data transformation processes

Kerr and Churchill(2001) Biostatistics, 2:183-201.

A1 B1

Replicated Dye-Swap

A2 B2

Reference designAll samples are compared to a single reference sample.

Single reference

A B C

R

Double reference

A B C

R

Example Microarray Experiment

comparing 3 treatments

Note: Arrows are used to represent a two-color microarray. The arrow head represents the red channel and the tail represents the green channel. T1 to T3, the three treatments; M1 to M15, the 15 plants; R, reference sample.

T1 T2 T3

M1 M2M3 M4M5 M6 M7 M8M9 M10 M11 M12M13 M14 M15

R R R R R R R R R R R R R R R

Loop Design

A

C

D B

Basic types of loop designs

A B

C

Single loop Double loop

A B

C

Complex loop designs

A1 B1

C1 F1

D1 E1

Pera I DBAHigh fat

Low fat

A B C

FE D

A2 B2

C2 F2

D2 E2

Two-factor factorial design at the high level

Reference or Loop

• Reference design: It has lower overall efficiency, but it has equal efficiency for all sample comparisons, and is easy to extend.

• Loop design: higher overall efficiency, harder to extend, unequal efficiency for all comparisons,

Comparing the efficiency of reference versus loop designs

Yang and Speed (2002) Nat. Rev. Genet 3:579

A B

C

Loop:

Reference:

A

R

A

R

B

R

B

R

A1 B1

C1

Connected Loop

A1 B2

C1 C2

A2 B1

Regular Loop

A1

R1

A2

R2

B1

R3

B2

R4

Classical Reference

A1

R1

A2

R1

B1

R1

B2

R1 Common Reference

Alternative Designs

Experiments With Two Groups

Reference with dye-swap

A1 B1

R R R R

A2 B2

A1 B1

R R R R

A2 B2A3 B3

R R R R

A4 B4Reference with alternating dye

Is alternating dye really necessary ?

Experiments with Two Groups (Direct comparison)

“Complete” block with dye-swap

“Complete” block design; Row-column design

A1 A2

B1 B2

A3 A4

B3 B4

A1 A5A2 A6A3 A7A4 A8

B1 B5B2 B6B3 B7B4 B8

Experiments with Three Groups

Reference with dye-swap

Reference with alternating dye

A1 B1

R R R R

A2 B2 C1

R R

C2

A1 B1

R R R R

A2 B2A3 B3

R R R R

A4 B4 C1

R R

C2 C3

R R

C4

Experiments With Three Groups

A1 B1

C1

A3 B3

C3

A4 B4

C4

A2 B2

C2

A1 B2

C1 C2

A2 B1A1 B1

B2 C2

C1

A2

=

Multiple connected loops

Multiple regular loops = Incomplete block structure

Pooling mice

22 1Mpool k

σσ α=

k : pool size

α : constant for the effect of pooling. 0 < α < 1α = 0, pooling has no effect.α = 1, pooling has maximum effect.

Pooling can reduce the mouse variance but not the technical variances. The mouse variance will be replaced by:

Note: More pools with fewer arrays per pool is better than fewer pools with more arrays per pools when fixed number of arrays and mice are to be used.

Power Increase to Detect 2 fold change by Pooling





( Pool size k = 3, α = 1 )

Significance level:0.05 after Bonferroni correction

• Use of factorial experiments instead of the one-factor-at-a-time methods.

• Orthogonality: each level of one factor has all levels of the other factor.

Example: To compare two treatments (T1, T2) and two strains (S1, S2)

T2S2T1S2S2

T2S1T1S1S1

T2T1

Design Principle Continues

ReferencesChurchill GA. Fundamentals of experimental design for cDNA microarrays. Nature Genet. 32: 490-495, 2002.

Cui X and Churchill GA. How many mice and how many arrays? Replication in mouse cDNA microarray experiments, in "Methods of Microarray Data Analysis III", Edited by KF Johnson and SM Lin. Kluwer Academic Publishers, Norwell, MA. pp 139-154, 2003.

Gadbury GL, et al. Power and sample size estimation in high dimensional biology. Stat Meth Med Res 13: 325-338, 2004.

Kerr MK. Design considerations for efficient and effective microarray studies. Biometrics 59: 822-828, 2003.

Kerr MK and Churchill GA. Statistical design and the analysis of gene expression microarray data. Genet. Res. 77: 123-128, 2001.

Kuehl RO. Design of experiments: statistical principles of research design and analysis, 2nd ed., 1994, (Brooks/cole) Duxbry Press, Pacific Grove, CA.

Page GP et al. The PowerAtlas: a power and sample size atlas for microarray experimental design and research. BMC Bioinformatics. 2006 Feb 22;7:84.

Rosa GJM, et al. Reassessing design and analysis of two-colour microarray experiments using mixed effects models. Comp. Funct. Genomics 6: 123-131. 2005.

Wit E, et al. Near-optimal designs for dual channel microarray studies. Appl. Statist. 54: 817-830, 2005.

Yand YH and Speed T. Design issues for cDNA microarray experiments. Nat. Rev. Genet. 3: 570-588, 2002.

design of microarray experiments - college of agriculture ... · pdf filesame rna; multiple...

Documents