
Post on 12-Mar-2020


Microarray

Mitesh Shrestha

Genes can be regulated at many levels

DNA → (transcription) → RNA → (translation) → PROTEIN

• transcription

• post-transcription (RNA stability)

• post-transcription (translational control)

• post-translation (not considered gene regulation)

Usually, when we speak of gene regulation, we are referring to transcriptional regulation. The complete set of transcripts from all genes being transcribed is referred to as the “transcriptome.”

In the last dozen years, it has become possible to look at the entire transcriptome in a single experiment!

While there are a number of variations, there are essentially two basic ways of doing this: sequencing-based methods and microarrays. These have largely replaced older methods such as subtractive hybridization and differential display.

Sequencing-based methods are very powerful but have typically been prohibitively expensive. However, with recent advances in low-cost, high-throughput next-generation sequencing, these methods, referred to as “RNA-seq,” are becoming more common and may soon be dominant.

Genomic analysis of gene expression

• Methods capable of giving a “snapshot” of RNA expression of all genes

• Can be used as diagnostic profile – Example: cancer diagnosis

• Can show how RNA levels change during development, after exposure to stimulus, during cell cycle, etc.

• Provides large amounts of data

• Can help us start to understand how whole systems function

Benfey and Protopapas, "Genomics" © 2005 Prentice Hall Inc. / A Pearson Education Company / Upper Saddle River, New Jersey 07458

RNA-seq

Although details of the methods vary, the concept behind RNA-seq is simple:

• isolate all mRNA

• convert to cDNA using reverse transcriptase

• sequence the cDNA

• map sequences to the genome

The more times a given sequence is detected, the more abundantly transcribed it is. If enough sequences are generated, a comprehensive and quantitative view of the entire transcriptome of an organism or tissue can be obtained.
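The counting idea behind RNA-seq can be sketched in a few lines of Python. This is only an illustration: the gene names and the list of mapped reads are made up, and scaling to counts per million (CPM) is one common convention assumed here, not something the slides prescribe.

```python
from collections import Counter


def counts_per_million(read_gene_assignments):
    """Count mapped reads per gene and scale to counts per million (CPM)."""
    counts = Counter(read_gene_assignments)
    total = sum(counts.values())
    # An empty input yields an empty dict, so no division by zero occurs.
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


# Hypothetical mapping result: one gene name per mapped read.
mapped_reads = ["geneA", "geneB", "geneA", "geneA",
                "geneC", "geneB", "geneA", "geneA"]
cpm = counts_per_million(mapped_reads)
print(cpm)  # geneA has the most reads, so it gets the highest CPM
```

The more reads that map to a gene, the higher its value, which is the "more times detected, more abundantly transcribed" logic stated above.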

Nucleic acid hybridization

Gene expression assays

The main types of gene expression assays:

– Serial analysis of gene expression (SAGE);

– Short oligonucleotide arrays (Affymetrix);

– Long oligonucleotide arrays (Agilent);

– Fibre optic arrays (Illumina);

– cDNA arrays (Brown/Botstein)*.

Microarray life cycle

[Figure, taken from Schena & Davis: Biological Question → Sample Preparation → Microarray Reaction → Microarray Detection → Data Analysis & Modelling → back to Biological Question]

Evolution & Industrialization

1989: First Affymetrix GeneChip prototype

1994: First commercial Affymetrix GeneChip

1994: First cDNA arrays developed at Stanford University

1994: First commercial scanner (Affymetrix)

1996: Commercialization of arrays

1997: Genome-wide expression monitoring in S. cerevisiae

Types of Microarrays

• Expression arrays

• Protein microarrays (proteomics)

• Resequencing arrays

• CGH arrays (comparative genomic hybridization)

• SNP arrays

• Antibody arrays

• Exon arrays (alternative splice variant detection)

• Tissue arrays

DNA microarrays

Microarrays may eventually be eclipsed by sequence-based methods, but meanwhile have become incredibly popular since their inception in 1995 (Schena et al. (1995) Science 270:467-70).

Microarrays are based on the ability of complementary strands of DNA (or DNA and RNA) to hybridize to one another in solution with high specificity, so they can be used to measure DNA or RNA abundance on a genomic scale in different types of cells.

There are now many variations. We’ll take a quick look at the two basic types: Affymetrix (high-density oligonucleotide) and glass slide (cDNA, long oligo, etc.). Both are conceptually similar, with differences in manufacture and details of design and analysis.

Idea of Microarray

[Figure: labeled cDNA from geneX, prepared from Cell A and Cell B, is hybridized to the chip. The spot of geneX carries the sequence complementary to the colored cDNA. This spot shows red color after scanning.]

Several Types of Arrays

• Spotted DNA arrays

  – Developed by Pat Brown’s lab at Stanford

  – PCR products of full-length genes (>100 nt)

• Affymetrix gene chips

  – Photolithography technology from the computer industry allows building many 25-mers

• Ink-jet microarrays from Agilent

  – 25-60-mers “printed” directly on glass slides

  – Flexible, rapid, but expensive

Array Fabrication Spotting

• Use PCR to amplify DNA

• Robotic "pen" deposits DNA at defined coordinates

• approximately 1-10 ng per spot

• Experimentation with oligos (40, 70 bp)

This machine can make 48 microarrays simultaneously.

Array Fabrication Photolithography

• Light-activated synthesis

• Synthesize oligonucleotides on glass slides

• 10^7 copies per oligo in a 24 x 24 µm square

• Use 20 pairs of different 25-mers per gene

• Perfect match and mismatch

Affymetrix Microarrays

[Figure: raw image of an Affymetrix chip; labeled dimensions 1.28 cm and 50 µm]

~10^7 oligonucleotides, half perfectly match mRNA (PM), half have one mismatch (MM)

Raw gene expression is the intensity difference: PM - MM
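As a rough illustration of the PM - MM idea, the sketch below averages the perfect-match minus mismatch differences over the probe pairs of one gene. Averaging across pairs is an assumption made here for illustration (the slides only state the difference itself), and the intensities are invented.

```python
def average_difference(pm, mm):
    """Mean of (PM - MM) over the probe pairs of one gene."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("need equally sized, non-empty PM and MM lists")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)


# Hypothetical intensities for four probe pairs (a real probe set
# would have about 20 pairs per gene, as noted above).
pm = [1200.0, 980.0, 1510.0, 1105.0]
mm = [300.0, 410.0, 350.0, 295.0]
signal = average_difference(pm, mm)
print(signal)  # 860.0
```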

Agilent cDNA microarrays and oligonucleotide microarrays

• Agilent delivers printed 60-mer microarrays in addition to 25-mer formats.

• The inkjet process uses standard phosphoramidite chemistry to deliver extremely small volumes (picoliters) of the chemicals to be spotted.

Microarray Experiments

[Figure: flowchart. Biological question → Experimental design → Microarray experiment → 16-bit TIFF files → Image analysis → (Rfg, Rbg), (Gfg, Gbg) → Normalization → R, G → Estimation / Testing / Clustering / Discrimination (differentially expressed genes, sample class prediction etc.) → Biological verification and interpretation]

Experimental Design for Microarrays

There are a number of important experimental design considerations for a microarray experiment:

• technical vs biological replicates

• amplification of RNA

• dye swaps

• reference samples

Technical vs biological replicates

• technical replicates are repeat hybridizations using the same RNA isolate

• biological replicates use RNA isolated from separate experiments/experimental organisms

Although technical replicates can be useful for reducing variation due to hybridization, imaging, etc., biological replicates are necessary for a properly controlled experiment.

Amplification of RNA

• Linear amplification methods can be used to increase the amount of RNA so that microarray experiments can be performed using very small numbers of cells. It’s not clear to what degree this affects results, especially with respect to rare transcripts, but it seems to be generally OK if done correctly.

Dye swaps

When using 2-color arrays, it’s important to hybridize replicates using a dye-swap strategy in which the colors (labels) are reversed between the two replicates. This is because there can be biases in hybridization intensity due to which dye is used (even when the sequence is the same).

[Figure: samples S1 and S2 hybridized on two arrays, with the dye labels reversed on the second array]
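One way to see why the dye swap helps is a small calculation, sketched here under a simplifying assumption that is not in the slides: the dye bias adds the same amount to the log ratio on both arrays. Under that assumption, half the difference of the two log ratios recovers the S1/S2 log ratio and half the sum estimates the bias. The intensities are invented.

```python
import math


def dye_swap_estimate(red1, green1, red2, green2):
    """Combine a dye-swapped pair of arrays for one spot.

    Array 1: S1 labeled red, S2 labeled green.
    Array 2: labels reversed (S2 red, S1 green).
    Returns (log2 ratio S1/S2, estimated dye bias), assuming the bias
    is additive on the log2 scale and identical on both arrays.
    """
    m1 = math.log2(red1 / green1)  # log2(S1/S2) + bias
    m2 = math.log2(red2 / green2)  # log2(S2/S1) + bias
    return (m1 - m2) / 2, (m1 + m2) / 2


# Hypothetical spot: true S1/S2 ratio of 4, red channel reads 2x too bright.
log_ratio, bias = dye_swap_estimate(800.0, 100.0, 200.0, 400.0)
print(log_ratio, bias)  # 2.0 1.0
```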

Reference samples

• One common strategy is to use a reference sample in one channel on each array. This is usually something that will hybridize to most of the features (e.g., a complex RNA mixture). Using a reference sample allows comparisons to be made between different experimental conditions, as each is compared to the common reference.

[Figure: samples S1, S2 and S3 are each hybridized against the reference R on a separate array; compare S1/R vs. S2/R vs. S3/R]
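The comparison through a common reference can be illustrated with log ratios: subtracting log2(S2/R) from log2(S1/R) cancels the reference and leaves log2(S1/S2). The sketch below assumes the reference gives the same signal for a spot on both arrays, which is an idealization, and the intensities are invented.

```python
import math


def indirect_log_ratio(sample_a, ref_a, sample_b, ref_b):
    """log2(A/B) for one spot, estimated via a common reference.

    sample_a/ref_a come from the array carrying sample A,
    sample_b/ref_b from the array carrying sample B.
    """
    return math.log2(sample_a / ref_a) - math.log2(sample_b / ref_b)


# Hypothetical intensities for one spot on two arrays.
s1_vs_s2 = indirect_log_ratio(400.0, 100.0, 50.0, 100.0)
print(s1_vs_s2)  # 3.0, i.e. S1 is 8-fold higher than S2 for this gene
```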

The Workflow of Microarray

[Figure: workflow. Array side: plate preparation → array fabrication → array. Sample side: RNA extraction → cDNA synthesis and labeling → labeled cDNA sample. Both meet at hybridization → hybridized array → scanning.]

cDNA Synthesis and Direct Labeling

Cy3 and Cy5 cDNA Hybridization onto the Chip

Sample loading

Samples: e.g. treatment / control, normal / tumor tissue

1. Load the sample from the corner of the cover slip. This is time-consuming and easily produces bubbles.

2. Load the sample at the center of the array, then lower the cover slip smoothly. This is faster and less likely to produce bubbles than method 1.

3. Load the sample at the side of the array, then put the cover slip on. The solution attaches to the slip as soon as the slip touches it, and spreads along with the slip as it is slowly lowered.

Scan

RESULTS

Green: down-regulated; Red: up-regulated; Yellow: equal level

The colors denote the degree of expression in the experimental versus the control cells.

[Figure legend, one color per category:]

• Gene not expressed in control or in experimental cells

• Only in control cells

• Mostly in control cells

• Only in experimental cells

• Mostly in experimental cells

• Same in both cells

Image analysis

• The raw data from a cDNA microarray experiment consist of pairs of image files, 16-bit TIFFs, one for each of the dyes.

• Image analysis is required to extract measures of the red and green fluorescence intensities for each spot on the array.

Steps in image analysis

1. Addressing. Estimate location of spot centers.

2. Segmentation. Classify pixels as foreground (signal) or background.

3. Information extraction. For each spot on the array and each dye:

• foreground intensities;

• background intensities;

• quality measures.

Why do we calculate the background intensities?

• Motivation behind background adjustment: A spot’s measured fluorescence intensity includes a contribution that is not specifically due to the hybridization of the target to the probe, but to something else, e.g. the chemical treatment of the slide, autofluorescence, etc. We want to estimate and remove this unwanted contribution.

Quantification of expression

For each spot on the slide we calculate

Red intensity = Rfg - Rbg

Green intensity = Gfg - Gbg

where fg = foreground and bg = background.
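The background subtraction above is simple enough to sketch directly. The `floor` parameter is an assumption added here, not part of the slides: it keeps the corrected intensity positive when the background estimate exceeds the foreground, so that a log ratio can still be taken later. The intensity values are invented.

```python
def background_corrected(fg, bg, floor=1.0):
    """Foreground minus background for one spot and one dye.

    `floor` (an illustrative choice) prevents zero or negative values
    when the estimated background is at or above the foreground.
    """
    return max(fg - bg, floor)


# Hypothetical spot: (Rfg, Rbg) and (Gfg, Gbg) from image analysis.
red = background_corrected(1500.0, 200.0)
green = background_corrected(450.0, 500.0)
print(red, green)  # 1300.0 1.0
```

How to handle spots whose background exceeds the foreground is a design choice; flooring is only one option, and flagging such spots as low quality is another.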

cDNA gene expression data

Data on p genes for n mRNA samples (rows: genes; columns: mRNA samples). Each entry is log2(Red intensity / Green intensity); for example, the entry in row 5, column 4 is the gene expression level of gene 5 in mRNA sample 4.

Gene  sample1  sample2  sample3  sample4  sample5  …
1      0.46     0.30     0.80     1.51     0.00    ...
2     -0.10     0.49     0.24     0.06     0.46    ...
3      0.15     0.74     0.04     0.10     0.20    ...
4     -0.45    -1.03    -0.79    -0.56    -0.32    ...
5     -0.06     1.06     1.35     1.09    -1.09    ...

With the experimental sample in the red channel, negative values indicate a down-regulated gene, positive values an up-regulated gene, and values near 0 unchanged expression.
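A short sketch of how one such entry is computed and labeled. The twofold cutoff (|log2 ratio| >= 1) is an assumption chosen for illustration; the slides do not specify a threshold, and what fold change is biologically relevant remains an open question.

```python
import math


def log_ratio(red, green):
    """log2(Red intensity / Green intensity) for one spot."""
    return math.log2(red / green)


def call(m, threshold=1.0):
    """Label a log2 ratio; `threshold` is an illustrative cutoff."""
    if m >= threshold:
        return "up-regulated"
    if m <= -threshold:
        return "down-regulated"
    return "unchanged"


print(log_ratio(800.0, 200.0))  # 2.0
# Values taken from the table: gene 5/sample 4, gene 4/sample 2, gene 1/sample 5.
for m in (1.09, -1.03, 0.00):
    print(m, call(m))
```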

Homogeneity and Separation Principles

• Homogeneity: Elements within a cluster are close to each other

• Separation: Elements in different clusters are further apart from each other

• …clustering is not an easy task!

Given these points, a clustering algorithm might make two distinct clusters as follows:

Bad Clustering

This clustering violates both the Homogeneity and Separation principles:

• Close distances between points in separate clusters

• Far distances between points in the same cluster

Good Clustering

This clustering satisfies both the Homogeneity and Separation principles
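The two principles can be checked numerically. The sketch below uses one possible pair of measures, assumed here for illustration: the largest distance within any cluster (homogeneity) and the smallest distance between points of different clusters (separation), both Euclidean. The points are invented.

```python
import math
from itertools import combinations


def max_within(clusters):
    """Largest distance between two points in the same cluster."""
    return max(
        (math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )


def min_between(clusters):
    """Smallest distance between points in different clusters."""
    return min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )


# The same five hypothetical points, grouped two different ways.
good = [[(0, 0), (0, 1), (1, 0)], [(10, 10), (10, 11)]]
bad = [[(0, 0), (10, 10)], [(0, 1), (1, 0), (10, 11)]]

for name, clusters in (("good", good), ("bad", bad)):
    print(name, max_within(clusters), min_between(clusters))
```

For the good clustering the within-cluster distances are all smaller than the between-cluster distances; for the bad clustering the opposite holds.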

Clustering Techniques

• Agglomerative: Start with every element in its own cluster, and iteratively join clusters together

• Divisive: Start with one cluster and iteratively divide it into smaller clusters

• Hierarchical: Organize elements into a tree; leaves represent genes, and the length of the paths between leaves represents the distances between genes. Similar genes lie within the same subtrees.

Hierarchical Clustering

[Figure: nine points, numbered 1 to 9, and the tree built from them; leaf order in the tree: 7 9 8 4 5 1 2 3 6]
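A minimal sketch of the agglomerative approach described above: every element starts in its own cluster and the two closest clusters are merged repeatedly. Single linkage (distance between clusters = distance between their closest members) and Euclidean distance are assumptions made here; the slides do not name a linkage rule. The points are invented, and the recorded merge order is what a tree (dendrogram) would be drawn from.

```python
import math


def single_linkage(points, k):
    """Merge the two closest clusters until k clusters remain.

    Returns (clusters, merges); clusters hold point indices and
    merges records (cluster_a, cluster_b, distance) in merge order.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: closest pair of members decides.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges


# Hypothetical 2-D points; in practice each point would be a gene's
# vector of log ratios across samples.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
clusters, merges = single_linkage(points, k=2)
print(clusters)
for a, b, d in merges:
    print(a, "+", b, "at distance", round(d, 2))
```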

Validation of data

There’s no way that all of your microarray data can be validated.

It’s strongly recommended that any key findings be verified by independent means.

Northern blots and quantitative RT-PCR are the typical ways of doing this; real-time, quantitative RT-PCR is generally the method of choice.

MIAME (Minimal Information About a Microarray Experiment)

When you publish a microarray experiment, you are expected to make available the following minimal information. This allows others to evaluate your data and compare it to other experimental results:

• EXPERIMENT DESIGN: type, factors, number of arrays, reference sample, qc, database accession (ArrayExpress, GEO)

• SAMPLES USED, PREPARATION AND LABELING

• HYBRIDIZATION PROCEDURES AND PARAMETERS

• MEASUREMENT DATA AND SPECIFICATIONS: quantitations, hardware & software used for scanning and analysis, raw measurements, data selection and transformation procedures, final expression data

• ARRAY DESIGN: platform type, features and locations, manufacturing protocols or commercial p/n

Repositories of Microarray Studies

• Due to the widespread use of microarrays, data repositories have flourished worldwide. Three of the largest databases of gene expression are:

1. The Gene Expression Omnibus (GEO), hosted by the National Center for Biotechnology Information (NCBI)

2. ArrayExpress, hosted by the European Bioinformatics Institute (EBI)

3. Stanford Microarray Database (SMD)

And for plants: the Plant Expression Database (PLEXdb)

Tiled microarrays

So-called tiled microarrays cover a genomic region (or the whole genome!) at high coverage. Probes are designed to cover virtually every base pair of the sequence, usually excluding only simple sequence repeats. In this way, there is no bias toward known transcribed regions.

[Figure: genomic sequence with the probes on the array aligned along it]

Probe size and spacing determine the resolution of the array.
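To make the role of probe size and spacing concrete, the sketch below lists probe start positions along a region for a given probe length and step. The function name, the 0-based coordinates and the example numbers are all illustrative assumptions; real designs also skip repeats and apply sequence-quality filters not modeled here.

```python
def tile_starts(region_length, probe_length, step):
    """0-based start positions of probes tiled along a region.

    A step smaller than probe_length gives overlapping probes;
    a larger step leaves gaps between probes.
    """
    if probe_length > region_length:
        return []
    return list(range(0, region_length - probe_length + 1, step))


# Hypothetical design: 25-mers every 10 bases across a 100-base region.
starts = tile_starts(region_length=100, probe_length=25, step=10)
print(starts)       # [0, 10, 20, 30, 40, 50, 60, 70]
print(len(starts))  # 8 probes
```

A smaller step means more probes per region and finer resolution, at the cost of more features on the array.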

Expression Arrays

• Most common type of microarray

• Spotted glass, cartridge, and electronic

• Involves extracting RNA from a sample and converting it to cDNA by priming off of the poly(A) tail of mRNA for eukaryotes and using random hexamers for prokaryotes [WHY?]

• Measures the amount and type of mRNA transcripts

• Provides information on whether genes are up- or down-regulated in a specific condition

• Can find novel changes in ESTs for specific conditions

Protein Microarrays

• True protein microarrays are evolving very slowly and only a few exist

• The technology is not straightforward due to inherent characteristics of proteins [e.g. available ligands, folding, drying…]

• Most are designed to detect antibodies or enzymes in a biological system

• Protein is on the microarray

• Some detect protein-protein interaction by surface plasmon resonance; others use a fluorescence-based approach

• The Invitrogen Human Protein Microarray is a high-density microarray

• It contains thousands of unique human proteins [kinases, phosphatases, GPCRs, nuclear receptors, and proteases]

Antibody Arrays

• Assay hundreds of native proteins simultaneously

• Compare protein abundances in a variety of biological samples

• GenTel and BD Biosciences

• Antibody or ligand is on the microarray

[Figure: Antibody Arrays, labeling scheme]

SNP, Genotyping, and DNA Mapping Arrays

• Target DNA, not RNA as expression arrays do

• Require amplification of target DNA

• Use multiple probe sets to determine the base change at a specific nucleotide position in the genomic DNA

• Use thousands of oligos that “tile” or span the genomic DNA for characterization

• Provide sequence and genotyping data, including LOH, linkage analysis and single nucleotide polymorphisms

Resequencing Arrays [Affy]

• Enable the analysis of up to 300,000+ bases of double-stranded sequence (600,000 bases total) on a single Affy array

• Used for large-scale resequencing of organisms’ genomes and organelles

• Faster and cheaper than sequencing, but limited to a few organisms and/or organelles

• Large potential

Exon Arrays-Alternative splice variant detection

Probes are designed for hybridizing to individual exons of genomic DNA

Tissue or development specific splicing leads to normal or expected protein diversity

Defective splicing can lead to disease

CGH Arrays- Comparative Genomic Hybridization

• CGH (comparative genomic hybridization) looks at cytogenetic abnormalities

• Provides DNA and chromosomal information: DNA copy number and allele-specific information

• Enables the identification of critical gene(s) that have altered copy number and may be responsible for the development and progression of a particular disease

• Determines regions of chromosomal deletion (LOH) or amplification

• Genomic DNA is hybridized to the array

• Often uses large clones (e.g., BACs) as array features

Tissue Arrays

Slide based “spotted” tissues (not really)

Applications of microarrays

• Measuring transcript abundance (cDNA arrays);

• Genotyping;

• Estimating DNA copy number (CGH);

• Determining identity by descent (GMS);

• Measuring mRNA decay rates;

• Identifying protein binding sites;

• Determining sub-cellular localization of gene products;

• Classification – there’s a lot of promise in medicine (especially cancer research) for this

Other types and uses of microarrays: ChIP-chip

Other types and uses of microarrays: RIP-chip

Similar to ChIP-chip, but identifies the RNAs bound by RNA-binding proteins rather than the DNA bound by DNA-binding proteins

Other types and uses of microarrays: PBMs

Protein-binding microarrays can be used to identify transcription factor binding sequences (motifs)

• double-stranded DNA probes used on array

• purified protein hybridized to array

• detected by antibody to protein or to epitope tag

• can use real genomic sequence or carefully designed oligonucleotides

• possible to look at all possible 10-mer nucleotide sequences on a single array!

Berger, M.F. and M.L. Bulyk. 2006. Methods Mol Biol 338: 245-260.

Berger, M.F., A.A. Philippakis, A.M. Qureshi, F.S. He, P.W. Estep, 3rd, and M.L. Bulyk. 2006. Nat Biotechnol 24: 1429-1435.

Microarray Limitations

Cross-hybridization of sequences with high identity

Chip to chip variation

True measure of abundance?

Do mRNA levels reflect protein levels?

Generally, microarrays do not “prove” new biology; they simply suggest genes involved in a process, a hypothesis that will require traditional experimental verification.

What fold change has biological relevance?

Need cloned EST or some sequence knowledge -- rare messages may be undetected

Expensive!! Not every lab can afford to repeat experiments.

The real limitation is Bioinformatics
