Gene Expression Data Analyses (1)

Trupti Joshi

Computer Science Department, 317 Engineering Building North

E-mail: [email protected], 573-884-3528 (O)

Post on 22-Dec-2015


Lecture Schedule for Gene Expression Analyses

Concept of microarray and experimental design for DNA microarray (9/6/05)

Data transformation and normalization for DNA microarray (9/8/05)

Statistical analysis for DNA microarray and Software comparison (9/13/05)

Clustering Techniques for DNA microarray (Dr. Dong Xu 9/15/05)

Lecture Outline

Central Dogma of Molecular Biology

Introduction to Gene Expression and Microarray

Experimental Design

Central Dogma of Molecular Biology

Gene Expression

mRNA level

Protein level

Introduction: Gene Expression

Every cell contains the same DNA, but only a few percent of genes (the housekeeping genes) are expressed in common across all cells.

A few examples:

(1) Specialized cells: hemoglobin is over-represented in blood cells.

(2) Different stages of the life cycle: hemoglobins before and after birth; caterpillar vs. butterfly.

(3) Different environments: microbes in nutrient-poor vs. nutrient-rich environments.

(4) Diversity of life.

Microarrays are about gene expression.

All information about a living being is encoded in its DNA as a set of genes.

Each gene contains structural information about the protein sequence and regulatory information about protein expression.

The intermediate step between gene and protein is mRNA.

A microarray measures the concentration of mRNA.

Problem

RNA levels and protein levels are not always directly correlated.

Without mRNA there is no protein, but beyond that the relationship is neither simple nor universal.

Functional genomics fills the gap between gene expression and organism function.

The meaning of life is hidden in gene expression values, but it is not easy to extract.

Eukaryotic Gene Expression Control

[Figure: DNA → primary RNA transcript → mRNA (nucleus) → mRNA (cytosol, across the nuclear membrane) → protein. Control points: transcriptional control, RNA processing control, RNA transport control, mRNA degradation control (inactive mRNA), translation control, and protein activity control (inactive protein). Microarrays measure mRNA; mass spectrometry measures protein.]

Principle of DNA Microarray

Complementary hybridization is the basis of RNA measurement.

Base-pairing rules: DNA: A-T and G-C; RNA: A-U, G-C, G-U

Example pairs: A-T, G-C, T-A, C-G
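The base-pairing rules above can be illustrated with a small Python sketch. It is a simplified model and assumes several things. Pairing is perfect Watson-Crick DNA pairing, and the RNA G-U wobble pair is ignored. Both strands are written 5'→3'. Because the strands are antiparallel, a target hybridizes perfectly to a probe when it equals the probe's reverse complement. The function names are illustrative only and do not come from any microarray package.

```python
# Watson-Crick partners for DNA bases (A-T, G-C).
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def is_perfect_match(probe: str, target: str) -> bool:
    """True if the target pairs base-for-base with the probe (antiparallel).

    Simplified: ignores mismatches, G-U wobble pairs and partial overlaps,
    all of which occur in real hybridization.
    """
    return len(probe) == len(target) and reverse_complement(probe) == target.upper()


if __name__ == "__main__":
    probe = "ATGC"
    print(reverse_complement(probe))        # GCAT
    print(is_perfect_match(probe, "GCAT"))  # True
    print(is_perfect_match(probe, "GCAA"))  # False
```

Real probe design also weighs mismatches and binding energy. This sketch only shows the exact-complement rule.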

Microarray Technology

Macroarray: sample spot sizes ≥ 300 microns

Microarray: spot sizes typically < 200 microns. Also called biochip, DNA chip, DNA microarray, gene array, genome array, or gene chip.

Initial Ideas of DNA Microarray

Immunoassay

Ekins, R. and F. W. Chu. Microarrays: their origins and applications. Trends in Biotech. 17: 217-218

Application of DNA Microarray Technology

Gene discovery

Biological mechanisms (gene regulatory networks, etc.)

Disease diagnosis (cancer, infectious disease, etc.)

Drug discovery: pharmacogenomics

Toxicological research: toxicogenomics

Microbial diversity in the environment

…

Increasing Microarray Applications

Advantages and Disadvantages of Microarray

Advantages: High throughput

Gene expression of different cells, or of cells under different conditions, can be analyzed simultaneously

Disadvantages: High noise

Relatively high cost

Categories of DNA Microarray

Probe based

cDNA microarray: cDNA (500~5,000 bases) as probe; 10,000-20,000 spots/slide.

Oligo microarray (Affymetrix microarray): oligonucleotides (20~80-mer oligos) as probe; 200,000-500,000 spots/slide.

Dye based

Double label, for example Cy3 and Cy5. One sample is labeled with a "green" dye and the other with a "red" dye; the relative fluorescence intensity of red and green is measured from the same spot.

Single label. All samples are labeled with one color, and absolute fluorescence intensities are compared between different slides. This does not control for the amount of DNA in each spot.
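For a double-label (Cy3/Cy5) spot, the red-to-green comparison is usually summarized on a log scale. The sketch below assumes the red (Cy5) and green (Cy3) intensities have already been background-corrected. It computes two values:

- the log-ratio M = log2(R/G);
- the average log-intensity A = ½ log2(R·G).

The M/A naming follows common microarray practice and is not defined in these slides.

```python
import math


def ma_values(red: float, green: float) -> tuple[float, float]:
    """Return (M, A) for one spot of a two-color array.

    red:   Cy5 intensity (assumed background-corrected, > 0)
    green: Cy3 intensity (assumed background-corrected, > 0)
    M = log2(red / green)        relative expression; 0 means no change
    A = 0.5 * log2(red * green)  overall brightness of the spot
    """
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive to take logs")
    m = math.log2(red / green)
    a = 0.5 * math.log2(red * green)
    return m, a


if __name__ == "__main__":
    m, a = ma_values(4000.0, 1000.0)
    print(m)  # 2.0: four-fold higher in the red-labeled sample
```

Working with log-ratios makes up- and down-regulation symmetric. A two-fold increase gives M = 1 and a two-fold decrease gives M = -1.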

Chips: typically a glass slide carrying cDNA or oligo probes

Printed by a robot or synthesized by photolithography

Typical arrays are 25 x 75 mm and contain up to 500,000 probed gene fragments.

Probe Layout on Chips

Positive controls: genomic DNA; housekeeping genes

Negative controls: spots with cDNA from very different species

Blank spots: spots with buffer

Samples: technical replicates
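One common use of negative-control and blank spots is to estimate background noise. The sketch below makes a detection call under a hypothetical rule: a spot counts as "detected" when its intensity exceeds the mean of the negative-control spots plus k standard deviations. The value k = 3 is an illustrative choice, not one taken from this lecture. Positive controls should pass this test; if they do not, the hybridization is suspect.

```python
import statistics


def detection_threshold(negative_controls: list[float], k: float = 3.0) -> float:
    """Background cutoff: mean + k * SD of negative-control/blank intensities.

    The mean + 3 SD rule is a hypothetical, illustrative choice.
    """
    mean = statistics.mean(negative_controls)
    sd = statistics.stdev(negative_controls)  # sample SD; needs >= 2 spots
    return mean + k * sd


def detected(spots: dict[str, float], threshold: float) -> dict[str, bool]:
    """Flag each spot whose intensity is above the background threshold."""
    return {name: value > threshold for name, value in spots.items()}


if __name__ == "__main__":
    negatives = [100.0, 110.0, 90.0, 100.0]  # made-up blank/buffer intensities
    cutoff = detection_threshold(negatives)  # about 124.5
    # Hypothetical spot names: a housekeeping positive control and a sample gene.
    spots = {"housekeeping_1": 5200.0, "sample_gene_7": 120.0}
    print(detected(spots, cutoff))  # {'housekeeping_1': True, 'sample_gene_7': False}
```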

Microarray Procedures

RNA extraction, cDNA preparation

cDNA labeling

Sample mixing

Hybridization

Scanning

Image Analysis

Data transformation and Normalization

Statistical analysis

Experimental Design

Data interpretation

Molecular Interaction on microarray

1 molecule per square angstroms

Large molecules easily fold onto themselves

Short targets interact with tethered oligos more readily than long targets

Ideally, target and probe should have the same length

Molecular interactions are dynamic

Competitive hybridization

Experimental Protocol

A. Synthesis of cDNA

Synthesis of the second strand DNA

B. Labeling

C. Hybridization

D. Scanning

Rationale for Experimental Design

Scientific constraints: scientific aims and their priorities

Physical constraints: number of slides; amount of mRNA

Goal of an optimal design: minimize costs (money, time) and maximize the useful information

Issues for Experimental Design

Scientific: specific questions and their priorities.

Practical (logistic):

Types of mRNA samples: reference, control, treatment.

Amount of material available (mRNA, slides, dyes).

Other factors:

The experimental process before hybridization: sample isolation, mRNA extraction, amplification, and labeling.

Controls planned: positive, negative, ratio, and so on.

Verification method: northern blot, reverse transcriptase PCR (RT-PCR), in situ hybridization, and so on.

Variability and Replicates

The measured expression level of the same gene may differ from slide to slide.

Replicates:

Technical replicates: the target mRNA comes from the same pool (the same RNA extraction). They reduce variability.

Biological replicates: the target mRNA comes from different individual extractions. They provide averages of independent data and validate the generality of conclusions.

Variation within technical replicates is smaller than variation within biological replicates.
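A simple two-level variance model shows why biological replicates vary more than technical ones. This is a sketch under stated assumptions, not a result from the lecture:

- each biological sample deviates from the population mean with variance sigma_b²;
- each hybridization adds independent technical noise with variance sigma_t².

Under this model, averaging n technical replicates of one extraction shrinks only the technical part. Averaging n biological replicates shrinks both parts.

```python
def within_variance(sigma_b2: float, sigma_t2: float, biological: bool) -> float:
    """Expected spread of individual replicates around their own group mean.

    Technical replicates share one biological sample, so they differ only
    by technical noise; biological replicates differ by both components.
    """
    return sigma_b2 + sigma_t2 if biological else sigma_t2


def var_mean_technical(sigma_b2: float, sigma_t2: float, n: int) -> float:
    """Variance of the mean of n technical replicates (one RNA extraction)."""
    return sigma_b2 + sigma_t2 / n


def var_mean_biological(sigma_b2: float, sigma_t2: float, n: int) -> float:
    """Variance of the mean of n biological replicates (one slide each)."""
    return (sigma_b2 + sigma_t2) / n


if __name__ == "__main__":
    sb2, st2 = 0.2, 0.05  # made-up variances on the log2 scale
    print(within_variance(sb2, st2, False), within_variance(sb2, st2, True))
    for n in (1, 2, 4):
        print(n, var_mean_technical(sb2, st2, n), var_mean_biological(sb2, st2, n))
```

With n = 1 the two means have the same variance. As n grows, only biological replication drives the biological component toward zero, which is why it supports general conclusions.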

Importance of Replicates

Graphical Representation of Design

Use directed graphs

Node: sample

Edge: a hybridization of two samples on one slide, one labeled with Cy3 and the other with Cy5

Weight: number of replicates

Cy3: green; Cy5: red

Cy3 + Cy5: blue
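The graph notation can be captured directly in code. The sketch below assumes a hypothetical encoding: each edge is a (Cy3 sample, Cy5 sample) pair, and its weight is the number of replicate slides. From this we can count slides and check how often each sample carries each dye. Here "dye-balanced" is an illustrative term meaning that every sample is labeled equally often with Cy3 and Cy5.

```python
from collections import Counter

# Hypothetical encoding of a design graph:
#   key   = (sample labeled Cy3, sample labeled Cy5)  -> one edge
#   value = number of replicate slides                -> edge weight
Design = dict[tuple[str, str], int]


def slide_count(design: Design) -> int:
    """Total number of hybridizations (slides) = sum of edge weights."""
    return sum(design.values())


def dye_usage(design: Design) -> dict[str, Counter]:
    """For each sample (node), count slides on which it is labeled Cy3 vs Cy5."""
    usage: dict[str, Counter] = {}
    for (cy3_sample, cy5_sample), n in design.items():
        usage.setdefault(cy3_sample, Counter())["Cy3"] += n
        usage.setdefault(cy5_sample, Counter())["Cy5"] += n
    return usage


def is_dye_balanced(design: Design) -> bool:
    """True if every sample is labeled with Cy3 and Cy5 equally often."""
    return all(c["Cy3"] == c["Cy5"] for c in dye_usage(design).values())


if __name__ == "__main__":
    dye_swap = {("T", "C"): 1, ("C", "T"): 1}   # direct design with dye swap
    reference = {("R", "T"): 2, ("R", "C"): 2}  # indirect design, 2 replicates each
    print(slide_count(dye_swap), is_dye_balanced(dye_swap))    # 2 True
    print(slide_count(reference), is_dye_balanced(reference))  # 4 False
```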

Direct & Indirect Comparison

Compared samples: T and C

Direct design: T and C are on the same slide

Indirect design: T and R are on one slide and C and R on another, so T and C are never on the same slide; they are compared through the common reference R

Variance & Std Deviation

Variance

The most common statistical measure of the variability of a random quantity or random sample about its mean. Its units are the square of the units of the quantity or sample.

Standard Deviation

Standard deviation is the square root of the variance. It measures the spread of a set of observations: the larger the standard deviation, the more spread out the observations.
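The two definitions translate directly into Python's standard `statistics` module:

- `pvariance` treats the data as a whole population;
- `variance` and `stdev` give the sample versions, with an n − 1 denominator.

The intensity values are made up for illustration.

```python
import math
import statistics

# Made-up log2 intensities for one gene across four slides.
values = [8.0, 9.0, 10.0, 11.0]

mean = statistics.mean(values)            # 9.5
pop_var = statistics.pvariance(values)    # mean squared deviation from the mean
sample_var = statistics.variance(values)  # sum of squared deviations / (n - 1)
sample_sd = statistics.stdev(values)      # square root of the sample variance

# The standard deviation is the square root of the variance, in the data's own units.
assert math.isclose(sample_sd, math.sqrt(sample_var))
print(mean, pop_var, sample_var)  # 9.5 1.25 1.6666666666666667
```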

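Variance for Indirect Design

For samples T and C:

Differential expression: $\alpha - \beta$, where $\alpha = \log_2 T$ and $\beta = \log_2 C$

Direct design: $\hat{D} = \frac{1}{2}\left[\log_2(T/C) + \log_2(T'/C')\right]$, so $\operatorname{var}(\hat{D}) = \sigma^2/2$

Indirect design: $\hat{D} = \log_2(T/R) - \log_2(C/R')$, so $\operatorname{var}(\hat{D}) = 2\sigma^2$

Here $\sigma^2$ is the variance of a single log-ratio, with slides assumed independent. With two slides each, the direct design is four times as precise as the indirect design.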

α and β are means of log intensities across slides for a typical gene.
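The two variances (σ²/2 for the two-slide direct design, 2σ² for the two-slide indirect design) can be checked with a least-squares sketch. The model is an assumption in the spirit of Yang and Speed:

- each slide yields y = log2(Cy5/Cy3) = β[Cy5 sample] − β[Cy3 sample] + error;
- the errors are independent with equal variance σ²;
- there is no dye bias.

The variance of an estimated contrast is then σ² cᵀ(XᵀX)⁻¹c. The code uses exact fractions so the results are exact. All function names are illustrative.

```python
from fractions import Fraction


def _solve(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination (a must be invertible)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def contrast_variance(slides, contrast, sigma2=Fraction(1)):
    """Variance of the least-squares estimate of a contrast between samples.

    slides:   list of (cy5_sample, cy3_sample) pairs, one per slide.
    contrast: dict sample -> coefficient, coefficients summing to zero,
              e.g. {"T": 1, "C": -1} for log2(T/C).
    The design graph must be connected, or the system is singular.
    """
    samples = sorted({s for pair in slides for s in pair})
    baseline, free = samples[0], samples[1:]  # fix beta[baseline] = 0
    idx = {s: i for i, s in enumerate(free)}
    # Design matrix X: one row per slide, one column per non-baseline sample.
    X = []
    for red, green in slides:
        row = [Fraction(0)] * len(free)
        if red != baseline:
            row[idx[red]] += 1
        if green != baseline:
            row[idx[green]] -= 1
        X.append(row)
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(len(free))]
           for i in range(len(free))]
    # A zero-sum contrast does not depend on the baseline, so its
    # coefficient can be dropped along with the baseline column.
    c = [Fraction(contrast.get(s, 0)) for s in free]
    v = _solve(xtx, c)  # v = (X^T X)^{-1} c
    return sigma2 * sum(ci * vi for ci, vi in zip(c, v))


if __name__ == "__main__":
    direct = [("T", "C"), ("C", "T")]    # two slides, dye-swapped
    indirect = [("T", "R"), ("C", "R")]  # T and C each against reference R
    print(contrast_variance(direct, {"T": 1, "C": -1}))    # 1/2 (sigma^2 / 2)
    print(contrast_variance(indirect, {"T": 1, "C": -1}))  # 2   (2 sigma^2)
```

The same function can score any connected design graph with any number of samples, which is how alternative designs can be compared before buying slides.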

Dye-swapped Replication

Two hybridizations of the same two mRNA samples on two slides, with the dyes swapped. For example, sample A is labeled with Cy3 and sample B with Cy5 in the first hybridization (slide 1); then A is labeled with Cy5 and B with Cy3 in the second hybridization (slide 2).

Advantage: reduces systematic bias (e.g., dye bias)

[Figure panels: two sets of replications; dye-swapped replications]
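The dye swap cancels dye bias through simple arithmetic on the two log-ratios. The sketch below assumes three things:

- each slide reports M = log2(Cy5/Cy3);
- A is Cy3 and B is Cy5 on slide 1, with the labels swapped on slide 2;
- the dye bias d adds the same amount to M on both slides.

Subtracting the two M values removes d, while adding them isolates d.

```python
def dye_swap_estimate(m_slide1: float, m_slide2: float) -> float:
    """Estimate log2(A/B) from a dye-swapped pair of slides.

    slide 1 (A=Cy3, B=Cy5): M1 = log2(B/A) + d = -log2(A/B) + d
    slide 2 (A=Cy5, B=Cy3): M2 = log2(A/B) + d
    (M2 - M1) / 2 = log2(A/B); the dye bias d cancels.
    """
    return (m_slide2 - m_slide1) / 2


def dye_bias_estimate(m_slide1: float, m_slide2: float) -> float:
    """Adding the two slides cancels the biology and leaves the dye bias d."""
    return (m_slide1 + m_slide2) / 2


if __name__ == "__main__":
    true_log_ratio, bias = 1.0, 0.25  # made-up values
    m1 = -true_log_ratio + bias       # -0.75
    m2 = true_log_ratio + bias        # 1.25
    print(dye_swap_estimate(m1, m2), dye_bias_estimate(m1, m2))  # 1.0 0.25
```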

Reference Design

A direct design may not be feasible when there are more than three experimental conditions.
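The feasibility argument is mostly arithmetic. Assume one slide per comparison and no replicates. Comparing every pair of n conditions directly then needs n(n − 1)/2 slides. A reference design needs only n, with each condition hybridized once against the common reference. At n = 3 both designs need three slides; beyond that, the all-pairs count grows quadratically.

```python
def all_pairs_direct_slides(n_conditions: int) -> int:
    """Slides needed to hybridize every pair of conditions together once."""
    return n_conditions * (n_conditions - 1) // 2


def reference_design_slides(n_conditions: int) -> int:
    """Slides needed when each condition is hybridized once against a reference."""
    return n_conditions


if __name__ == "__main__":
    for n in (2, 3, 4, 6, 10):
        print(n, all_pairs_direct_slides(n), reference_design_slides(n))
```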

Factors in the design

Single factor

Two factors

Multiple factors

Single Factor Experiments

Time-course Experiments

2x2 factorial experiments

Lecture Outline

Central Dogma of Molecular Biology

Introduction to Microarray: application; advantages vs. disadvantages; chips; microarray procedure

Experimental design: rationale; variability and replication; graphical representation; direct and indirect comparison; dye swap; reference design; single-factor design; multifactorial design

Reading Assignments

Suggested reading: Yang, Y. H. and T. Speed. 2002. Design issues for cDNA microarray experiments. Nature Reviews Genetics, 3: 579-588.

Statistical analysis of gene expression microarray data. Chapter 2. pp. 35-92. Chapman & Hall/CRC Press, 2003.