
Post on 20-Dec-2015


Page 1: RE-INTERPRETING DNA MICROARRAY DATA Sungchul Ji, Ph.D. Department of Pharmacology and Toxicology Rutgers University Piscataway, N.J. 08855 sji@rci.rutgers.edu

RE-INTERPRETING DNA MICROARRAY DATA

Sungchul Ji, Ph.D.

Department of Pharmacology and Toxicology

Rutgers University

Piscataway, N.J. 08855

sji@rci.rutgers.edu


The Cell as the Smallest DNA-Based Molecular Computer (S. Ji, BioSystems 52, 123-133, 1999)

[Figure: diagram linking DNA, Cells, Brains, and Computers through relations numbered 1 to 5.]

Ontogeny = 1, 2

Epistemology = 3, 4, and 5


            RP                  RT                  H
(+) DNA ---------> (-) mRNA ---------> (+) DNA ---------> (-) DNA
   |
   |  DNA polymerase
   |  or synthesizer
   \/
(-) DNA (used to fabricate DNA microarrays)

RP = RNA polymerase
RT = Reverse transcriptase
H = Hybridizes to; no enzymes needed

The Complementary (+/-) Relations among the various DNA and RNA molecules involved in Microarray Experiments


DNA Microarrays

• There are two kinds of DNA microarrays: cDNA (or EST) microarrays and gene chips. cDNA/EST microarrays are discussed first.

• One microarray can measure about 10^4 (10,000) mRNA levels simultaneously.

• mRNA levels in the cell are determined by mRNA synthesis (Vsyn) and mRNA hydrolysis (Vhyd), because the rate of change in mRNA levels inside the cell is always:

dR/dt = Vsyn - Vhyd

• Only when certain kinetic conditions are met (to be discussed later) can the DNA microarray technique measure rates of gene expression [1].

• Each square can recognize one kind of mRNA molecule.


How to Make Oligonucleotide Gene Chips [1]


Preparation of Oligonucleotide Arrays (or Gene Chips) [2]

1) “Gene chips” is a phrase coined by Affymetrix, Inc., a commercial organization in California. Unlike DNA microarrays, which are produced by individual scientists and by biotechnology companies, gene chips are an exclusive product of Affymetrix.

2) The technical basis for producing gene chips originated from the computer chip industry and the DNA synthesis laboratory. The key steps involved in producing gene chips are schematically shown in the next slide. The surface of the glass base is treated with chemicals so that it can bind single nucleotides (A, T, G, or C). In a step-wise process, each small area of the chip (also 10,000 squares, each 100 micron x 100 micron) serves as the site for the synthesis of short cDNA fragments (oligonucleotides on the order of 10 to 20 nucleotides each), one nucleotide at a time.

A first mask covers most of the glass surface of the chip except for the sites destined to contain an oligonucleotide that begins with A, and these exposed sites are then coupled to adenosine (A). The next mask (after some chemistry) allows other sites to couple C; a third mask allows the coupling of T, and a final mask allows the coupling of G. The combination of all four masks completes the attachment of the first nucleotide to the glass surface.

A second nucleotide is then added to each site with four other masks. Thus, for a chip bearing 10-base-long oligonucleotides, it is necessary to use 40 masks in all to direct each nucleotide to its proper area.

3) Affymetrix can synthesize 10,000 to 1,000,000 different oligonucleotides on each chip!
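The mask arithmetic described above (four masks per nucleotide position, one for each base) can be written out as a small sketch. The function name masks_required is hypothetical and only restates the counting rule given on this slide.

```python
# One mask per base (A, C, T, G) is needed for every nucleotide position.
BASES = ("A", "C", "T", "G")


def masks_required(oligo_length):
    """Number of masks needed for oligonucleotides of the given length."""
    if oligo_length < 0:
        raise ValueError("oligo_length must be non-negative")
    return len(BASES) * oligo_length


# The slide's example: 10-base-long oligonucleotides need 40 masks.
print(masks_required(10))
```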


How DNA Microarray Experiments are Done

1. Isolate mRNA from broken cells.

2. Synthesize fluorescently labeled cDNA from mRNA using reverse transcriptase and fluorescent nucleotides.

3. Prepare a microarray with either ESTs or oligonucleotides (the latter synthesized directly on the microarray surface by Affymetrix, Inc.).

4. Pour the fluorescently labeled cDNA preparations over the microarray surface to effect hybridization. Wash off excess debris.

5. Measure fluorescently labeled cDNA using a computer-assisted microscope.

6. The final result is a table of numbers. Each number registers the fluorescence intensity, which is in turn proportional to the concentration of cDNA (and ultimately mRNA), located at row x and column y, where x indicates the identity of the gene and y the condition under which the mRNA level was measured.
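The table produced in step 6 can be pictured as a two-dimensional array indexed by gene (row x) and condition (column y). The sketch below is only an illustration: the gene labels, condition labels, intensity values, and the helper intensity_at are all hypothetical.

```python
# Hypothetical labels and fluorescence intensities, for illustration only.
genes = ["gene_1", "gene_2", "gene_3"]   # row x: identity of the gene
conditions = ["t0", "t1"]                # column y: measurement condition
intensities = [
    [120.0, 250.0],
    [80.0, 75.0],
    [300.0, 140.0],
]


def intensity_at(gene, condition):
    """Look up the fluorescence intensity at row x (gene), column y (condition)."""
    x = genes.index(gene)
    y = conditions.index(condition)
    return intensities[x][y]


print(intensity_at("gene_3", "t1"))
```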



Covalent and Noncovalent Interactions in Microarray Experiments

[Figure: the original DNA sequence CTAATGT carried through steps 1, 2, and 3.]

1) Transcription inside the cell

2) Reverse transcription inside the test tube

3) Hybridization on the microarray surface

4) Probably millions of cDNA molecules are adsorbed on each square on a DNA microarray.

5) If the mRNA formed inside the cell were stable, the amount of mRNA formed during Step 1 could be estimated from the amount of cDNA bound to the microarray surface in Step 3.

6) But mRNA molecules inside the cell are unstable, because they are rapidly hydrolyzed into ribonucleotides by various ribonucleases. Therefore, it is impossible to estimate how many mRNA molecules are formed in Step 1 by measuring how many cDNA molecules are bound to the microarray surface in Step 3 (more on this later).


Figure 1. The molecular model of the cell known as the Bhopalator [S. Ji, J. theoret. Biol. 116:399-426 (1985)].

[Diagram labels: Input, Output, Genes, RNA, Proteins, Gradients, Amino Acids, Ribonucleotides; processes numbered 1 to 10.]


The changes in mRNA levels of human fibroblasts (cells of connective tissues that synthesize and secrete fibrillar procollagen, fibronectin, and collagenase) measured with DNA microarrays over a time period of 24 hours.

Green represents a decrease in mRNA levels (often called “genes”, a term that is strictly speaking incorrect; see below), black no change, and red an increase.

Each mRNA molecule is represented by a single row of colored boxes, and each measuring time point is represented by a single column.

Notice that the mRNA molecules belonging to cluster A started to decrease around 8 hours after the beginning of the experiment.

The mRNA molecules belonging to cluster E began to increase at around 5 hours after the beginning of the experiment.
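The color coding described above (green for a decrease, black for no change, red for an increase) can be sketched as a simple mapping. This is only an illustration: the function name heatmap_color and the tolerance parameter, which decides how small a change still counts as "no change", are assumptions and are not taken from the slide.

```python
def heatmap_color(change, tolerance=0.0):
    """Map a change in mRNA level to the heat-map color used on this slide.

    `tolerance` is a hypothetical cutoff below which a change counts as none.
    """
    if change > tolerance:
        return "red"      # increase
    if change < -tolerance:
        return "green"    # decrease
    return "black"        # no change


# One row of the heat map: hypothetical changes at successive time points.
row = [0.0, -0.4, -1.2, 0.8]
print([heatmap_color(c, tolerance=0.1) for c in row])
```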

Cluster Analysis


How to Interpret DNA Microarray Data (I)

What we measure with DNA microarrays are changes in fluorescence intensities.

The changes in fluorescence intensities can be divided into two categories: artifactual and non-artifactual. At the present state of development of the microarray technique, artifactual changes in fluorescence intensity amount to about 50%. This is why it is common practice to use the notion of “fold changes”, referring to fluorescence intensity changes greater than 100% (a 100% change being a one-fold change).
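The fold-change convention above can be sketched in a few lines. This is a minimal illustration, assuming that a fold change is the relative change in intensity (so a 100% increase is a one-fold change); the function names and the example intensities are hypothetical.

```python
def fold_change(before, after):
    """Relative change in fluorescence intensity; 1.0 means a 100% increase."""
    if before <= 0:
        raise ValueError("baseline intensity must be positive")
    return (after - before) / before


def exceeds_one_fold(before, after):
    """True when the intensity change is greater than 100% of the baseline."""
    return abs(fold_change(before, after)) > 1.0


# Hypothetical intensities: a 50% rise does not pass the one-fold threshold,
# while a 150% rise does.
print(exceeds_one_fold(100.0, 150.0))
print(exceeds_one_fold(100.0, 250.0))
```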

Only the non-artifactual fluorescence intensities can be related to mRNA levels.

mRNA levels measured with DNA microarrays can be divided into two categories: steady state and non-steady state. The difference between these two categories of mRNA levels can be represented mathematically as follows, where R is an mRNA level and t is time:

Steady state: dR/dt = 0    (1)

Non-steady state: dR/dt ≠ 0

The steady-state mRNA levels divide into two categories – dynamic and equilibrium. The intracellular levels of mRNA molecules are always determined by two terms – the source term (i.e., the rate of mRNA synthesis, denoted by V_syn) and the sink term (i.e., the rate of mRNA hydrolysis into smaller fragments, denoted as V_hyd):

dR/dt = V_syn - V_hyd    (2)

There are two ways of making the right-hand side of Eq. (2) equal to zero: when V_syn and V_hyd are equal and nonzero, and when V_syn and V_hyd are both zero:

Dynamic steady state: V_syn = V_hyd    (3)

Equilibrium steady state: V_syn = V_hyd = 0    (4)


How to Interpret DNA Microarray Data (II)

The non-steady-state mRNA levels divide into two categories:

On-the-way-up: dR/dt > 0    (5)

On-the-way-down: dR/dt < 0    (6)

It is probably safe to assume that V_syn is always independent of R (i.e., gene expression is turned on or off by factors other than the intracellular levels of the corresponding mRNA). But V_hyd may often depend on R, leading to the conclusion that there are at least two categories of dynamic steady states:

Zero-order dynamic steady state: V_hyd = kR^0 = k    (7)

First-order dynamic steady state: V_hyd = kR^1 = kR    (8)

These results can be summarized as follows:

Combining Equations (3) and (8) leads to the following useful relation:

V_syn = V_hyd = kR    (9)

Equation (9) states that, under the conditions of the first-order dynamic steady state, the mRNA levels, R, measured with DNA microarrays are directly proportional to the rates of expression of their corresponding genes, V_syn, or equivalently to the rates of transcript degradation, V_hyd.

An important corollary of Equation (9) is that, under all other conditions, there is no direct proportionality relation between mRNA levels and the rates of expression of their corresponding genes.
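Equation (9) and its corollary can be checked numerically. The sketch below integrates dR/dt = V_syn - kR with a simple Euler scheme, assuming first-order hydrolysis; the function name, the step size, and all rate values are hypothetical and chosen only for illustration.

```python
def simulate_first_order(v_syn, k, r0=0.0, dt=0.01, t_end=200.0):
    """Euler integration of dR/dt = v_syn - k * R (first-order hydrolysis)."""
    r = r0
    for _ in range(int(t_end / dt)):
        r += (v_syn - k * r) * dt
    return r


# Same rate constant k: doubling V_syn doubles the steady-state level R = V_syn / k.
r_a = simulate_first_order(v_syn=2.0, k=0.1)
r_b = simulate_first_order(v_syn=4.0, k=0.1)

# Different k: the same steady-state level as r_a arises from twice the V_syn,
# so R alone does not reveal the rate of gene expression.
r_c = simulate_first_order(v_syn=4.0, k=0.2)

print(r_a, r_b, r_c)
```

With a common k, the ratio of the two steady-state levels tracks the ratio of the synthesis rates; once k differs between transcripts, equal levels no longer imply equal synthesis rates.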


Six Categories of Microarray Fluorescence Intensities
(Only non-artifactual fluorescence intensities, i.e., A through D, measure mRNA levels)

Microarray Fluorescence Intensity
    Non-Artifactual
        Steady State
            Dynamic Steady State (A)
                Zero-Order (A_0): V_hyd = k
                1st Order (A_1): V_hyd = kR
            Equilibrium Steady State (B): V_syn = V_hyd = 0
        Non-Steady State
            On-the-Way-Up (C): V_syn - V_hyd > 0
            On-the-Way-Down (D): V_syn - V_hyd < 0
    Artifactual (E)
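The classification above can be expressed as a small decision function. This is a sketch under stated assumptions: the function name, the artifactual and first_order flags, and the numerical tolerance tol are hypothetical, and in practice V_syn and V_hyd are not measured directly by a microarray.

```python
def classify_intensity(v_syn, v_hyd, artifactual=False, first_order=None, tol=1e-9):
    """Return the category label (A, A_0, A_1, B, C, D, or E) for one measurement.

    first_order: True for V_hyd = kR, False for V_hyd = k, None if unknown.
    """
    if artifactual:
        return "E"
    net = v_syn - v_hyd            # dR/dt
    if net > tol:
        return "C"                 # on-the-way-up
    if net < -tol:
        return "D"                 # on-the-way-down
    if abs(v_syn) <= tol and abs(v_hyd) <= tol:
        return "B"                 # equilibrium steady state
    if first_order is None:
        return "A"                 # dynamic steady state, order unknown
    return "A_1" if first_order else "A_0"


print(classify_intensity(2.0, 2.0, first_order=True))
print(classify_intensity(3.0, 1.0))
```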


[Figure: four panels with axes TL and TR, one per transcript: (a) YBL091C-A, (b) YNL162W, (c) YLR084C, (d) YHR029C. Points labeled 1 and 6 are marked in each panel.]


Table 1. The frequency distributions of the 8 modules of RNA metabolism (defined in the legend to Figure 1) as functions of the 5 time periods following the glucose-galactose shift. If the angles are homogeneously distributed over 360°, the expected distributions can be calculated as shown in the 7th row. The p-values for the difference between the observed and the expected distributions are given in the last row. The differences are all significant, except for Mechanism 5.

Mech              Segments
                  1        2        3        4        5        6        7        8        Total
1                 0        142      234      3470     96       1732     12       39       5725
2                 14       18       3        37       5        3729     617      1302     5725
3                 340      1914     52       638      314      1471     28       968      5725
4                 477      4237     21       151      61       143      19       616      5725
5                 12       1151     238      4213     38       56       4        13       5725
Total, Observed   843      7462     548      8509     514      7131     680      2938     28625
  (%)             (2.94)   (26.07)  (1.91)   (29.73)  (1.80)   (24.91)  (2.38)   (10.26)
Total, Expected   477      6678     477      6678     477      6678     477      6678     28625
  (%)             (1.67)   (23.33)  (1.67)   (23.33)  (1.67)   (23.33)  (1.67)   (23.33)
p-value           0        0        0.0011   0        0.0919   0.0000   0        0
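The observed totals and percentages in Table 1 can be recomputed from the five rows of counts. The sketch below copies the counts from the table; the variable names are hypothetical, and the expected counts and p-values are not reproduced because the slide does not state how they were computed.

```python
# Counts copied from the five numbered rows of Table 1 (eight columns each).
counts = [
    [0, 142, 234, 3470, 96, 1732, 12, 39],
    [14, 18, 3, 37, 5, 3729, 617, 1302],
    [340, 1914, 52, 638, 314, 1471, 28, 968],
    [477, 4237, 21, 151, 61, 143, 19, 616],
    [12, 1151, 238, 4213, 38, 56, 4, 13],
]

# Sum each column across the five rows, then convert to percent of the grand total.
column_totals = [sum(column) for column in zip(*counts)]
grand_total = sum(column_totals)
percentages = [round(100 * total / grand_total, 2) for total in column_totals]

print(column_totals, grand_total)
print(percentages)
```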


[Figure: three time-course panels, each showing two series, Glycolysis and Oxphos.
(a) Average mRNA Levels: mRNA (molecules/cell) vs. time (min).
(b) Time - v_S plots: v_S (molecules/cell/min) vs. time (min).
(c) Degradation/Transcription (D/T) Ratios vs Time: D/T ratios vs. time (min).]


ΔX    ΔXS   ΔXD          ΔXS/ΔXD                        Module (or Mechanism)
+     +     ΔXD < ΔXS    < 1                            A
+     0     ΔXD < 0      (Physically Not Allowed)       (Physically Not Allowed)
0     +     ΔXD = ΔXS    = 1                            B
0     0     ΔXD = 0      (Mathematically Not Allowed)   C
-     +     ΔXD > ΔXS    > 1                            D
-     0     ΔXD > 0      0                              E

Table 2. The five mechanisms (or modules) of the changes (Δ) in transcript abundances (X) in cells due to transcript synthesis (ΔXS) and transcript degradation (ΔXD) :

ΔX = ΔXS – ΔXD
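The assignment of mechanisms in Table 2 can be sketched as a function of ΔXS and ΔXD alone. The function name classify_mechanism is hypothetical, and treating any negative ΔXS as not allowed is an assumption (the table lists only positive and zero ΔXS).

```python
def classify_mechanism(dxs, dxd):
    """Assign the Table 2 mechanism from synthesis (dxs) and degradation (dxd)."""
    if dxs < 0 or dxd < 0:
        return "not allowed"       # negative amounts are physically not allowed
    if dxs == 0:
        return "C" if dxd == 0 else "E"
    if dxd < dxs:
        return "A"                 # net increase: dX = dxs - dxd > 0
    if dxd == dxs:
        return "B"                 # no net change
    return "D"                     # net decrease


for dxs, dxd in [(3, 1), (2, 2), (0, 0), (1, 3), (0, 2)]:
    print(dxs, dxd, dxs - dxd, classify_mechanism(dxs, dxd))
```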


References:

[1] Ji, S. (2004). Molecular Information Theory: Solving the Mysteries of DNA. In: Modeling in Molecular Biology (G. Ciobanu, ed.), Elsevier (in press).

[2] Watson, S. J., and Akil, H. (1999). Gene Chips and Arrays Revealed: A Primer on Their Power and Their Uses. Biol. Psychiatry 45:533-543.