DNA microarray and array data analysis

Some of the slides are adapted from the lecture notes of Dr. Patrick Leahy of the Gene Expression Array Core Facility at CWRU

Post on 19-Dec-2015



What is DNA Microarray

DNA microarray is a new technology to measure the level of the mRNA gene products of a living cell.

A microarray chip is a rectangular chip on which is imposed a grid of DNA spots. These spots form a two-dimensional array.

Each spot in the array contains millions of copies of some DNA strand, bonded to the chip.

Chips are made tiny so that a small amount of RNA is needed from experimental cells.


DNA Microarray

Many applications in both basic and clinical research: determining the role a gene plays in a pathway, disease, diagnostics and pharmacology, …

There are three main platforms for performing microarray analyses:

cDNA arrays (generic, multiple manufacturers)

Oligonucleotide arrays (genechips) (Affymetrix)

cDNA membranes (radioactive detection)


cDNA Microarray

Spot cloned cDNAs onto a glass/nylon microscope slide (usually PCR amplified segments of plasmids)

Complementary hybridization

-- CTAGCAGG actual gene

-- GATCGTCC cDNA (Reverse transcriptase)

-- CUAGCAGG mRNA
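The base pairing in the three sequences above can be checked with a few lines of Python. This is a minimal sketch; the helper names are illustrative and not part of the slides. It pairs bases position by position, as the slide writes them, and ignores strand orientation.

```python
# Watson-Crick pairing for DNA. Helper names are illustrative only.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_dna(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand, in the same left-to-right order."""
    return "".join(DNA_COMPLEMENT[base] for base in strand)


def transcribe(gene_strand: str) -> str:
    """Write the mRNA matching the gene strand: the same sequence with U in place of T."""
    return gene_strand.replace("T", "U")


gene = "CTAGCAGG"
print(complement_dna(gene))  # GATCGTCC, the cDNA line on the slide
print(transcribe(gene))      # CUAGCAGG, the mRNA line on the slide
```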

Label 2 mRNA samples with 2 different colors of fluorescent dye -- control vs. experimental

Mix two labeled mRNAs and hybridize to the chip

Make two scans - one for each color

Combine the images to calculate ratios of amounts of each mRNA that bind to each spot
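The last step, turning the two scans into per-spot ratios, might look like the sketch below. The slides only say "ratios"; taking the base-2 logarithm, flooring the intensities, and the spot names and values are all assumptions made for illustration.

```python
import math


def log2_ratio(test_intensity: float, control_intensity: float, floor: float = 1.0) -> float:
    """Return log2(test / control) for one spot.

    Intensities are floored at `floor` so that an empty spot does not cause
    a division by zero or a logarithm of zero (an assumption of this sketch).
    """
    test = max(test_intensity, floor)
    control = max(control_intensity, floor)
    return math.log2(test / control)


# Hypothetical (test, control) intensities read from the two scans; not real data.
spots = {
    "spot_A": (2000.0, 500.0),   # more signal in the experimental sample
    "spot_B": (300.0, 1200.0),   # more signal in the control sample
    "spot_C": (800.0, 800.0),    # unchanged
}

for name, (test, control) in spots.items():
    print(name, round(log2_ratio(test, control), 2))
```

On a log2 scale a positive value means more mRNA bound from the experimental sample, a negative value means more from the control, and zero means no change.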


Spotted Microarray Process

[Figure: spotted microarray process with control (CTRL) and test (TEST) samples]


cDNA Array Experiment Movie

http://www.bio.davidson.edu/courses/genomics/chip/chip.html


“Long Oligos”

Like cDNAs, but instead of using a cloned gene, design a 40-70 base probe to represent each gene

Relies on genome sequence database and bioinformatics

Reduces cross hybridization

Cheaper and possibly more sensitive than Affy. system


Affymetrix

Uses 25 base oligos synthesized in place on a chip (20 pairs of oligos for each gene)

cRNA labeled and scanned in a single “color” (one sample per chip)

Can have as many as 47,000 probes on a chip (HG-U133 Plus 2.0 Array)

Arrays get smaller every year (more genes)

Chips are expensive (about $400/chip)

Proprietary system: “black box” software, can only use their chips


Affymetrix Genome Arrays


Affymetrix GeneChip® Probe Array


Affymetrix GeneChip® Probe Arrays

[Figure: GeneChip Probe Array (1.28 cm), a hybridized probe cell (24~50 µm), and an image of the hybridized probe array. Labels: single stranded, fluorescently labeled cRNA target; oligonucleotide probe]

Each probe cell or feature contains millions of copies of a specific oligonucleotide probe


Affymetrix GeneChip

Probe: 25 bases long single stranded DNA oligos

Probe Cell: Single square-shaped feature on an array containing one type of probe. Contains millions of probe molecules

Probe Pair: Perfect Match/Mismatch

Probe Set


Array Design

Twenty oligo probes are selected from the last 600 bases from the 3’ end of the gene

For each probe selected (Perfect Match, 25 mer DNA oligo), a partner containing a central mutation is also made (Mismatch)

For each gene a total of 20 probe pairs are arrayed on the chip

[Figure labels: 5’, 3’, Perfect Match (PM), Mismatch (MM), Probe Pair, Probe Set, Probe Cell (24 µm x 24 µm)]
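The perfect match / mismatch pairing can be sketched as follows. The slide says only that the partner carries "a central mutation"; replacing the middle base with its complement is an assumption of this sketch, and the 25-mer is made up for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(perfect_match: str) -> str:
    """Return the MM partner: the PM sequence with its central base complemented.

    Complementing the central base is an assumption; the slides only state
    that the partner contains a central mutation.
    """
    if len(perfect_match) % 2 == 0:
        raise ValueError("an odd-length probe is needed for a single central base")
    middle = len(perfect_match) // 2  # index 12, the 13th base, for a 25-mer
    swapped = COMPLEMENT[perfect_match[middle]]
    return perfect_match[:middle] + swapped + perfect_match[middle + 1:]


# Made-up 25-mer, for illustration only.
pm = "ATGCCGTTAGCAACGTTAGGCTAAC"
mm = mismatch_probe(pm)
print(pm)
print(mm)
```

The two sequences differ at exactly one position, so a target that binds the PM probe specifically should bind the MM probe less strongly.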


Probe Sub-types on chips

Known genes

Specific transcripts

Exemplars

Consensus

Housekeeping genes

Expressed sequence tags (ESTs)

Spiked control transcripts


cRNA preparation

Total RNA (5-8 µg), with poly(A) tail

cDNA Strand 1 synthesis: SS II reverse transcriptase, T7 RNA pol. promoter

cDNA Strand 2 synthesis: E. coli DNA pol. I

IVT cRNA synthesis (T7 RNA pol.) amplifies and labels transcripts with Biotin

Fragmented cRNA

cRNA is now ready for hybridization to test chip


[Figure: cDNA probes and biotin-labeled (B) cRNA targets, showing specific binding and non-specific binding; post-hybridization washes; SFL]


[Figure: biotin (B) labels with SFL; Streptavidin]


Microarray experiment

Cells → Poly (A)+ RNA → cDNA → IVT (B-UTP) → Biotin-Labeled cRNA transcript → Fragment (heat, Mg2+) → Biotin-Labeled cRNA fragments → Hybridize (1-18 hours) → Wash, Stain → Scan


The chip image data file (or “.dat” file) is the first part of data acquisition and appears on the computer screen upon completion of the laser scan.

Here, we zoom in to see an individual probe set that has been highlighted

Probe set

.dat file


The first image is “sample1.dat.” Note the pixel-to-pixel variation within a probe cell.

A “*.cel” file is automatically generated when the “*.dat” image first appears on the screen. Note that this derivative file has homogeneous signal intensity within its probe cells.
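One way to picture how a probe cell's many pixels become a single homogeneous value is sketched below. The slides do not say how the value is computed; dropping the border pixels and taking the 75th percentile of the rest are assumptions of this sketch, and the pixel grid is made up.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in 0..100) of an already sorted list."""
    position = (len(sorted_values) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def cell_intensity(pixels, border=1, q=75):
    """Collapse one probe cell's pixel grid to a single intensity.

    Border pixels are discarded and the q-th percentile of the remaining
    pixels is returned. Both choices are assumptions of this sketch.
    """
    inner_rows = pixels[border:len(pixels) - border]
    values = sorted(v for row in inner_rows for v in row[border:len(row) - border])
    if not values:
        raise ValueError("cell is too small for the requested border")
    return percentile(values, q)


# Hypothetical 4x4 probe cell: noisy edge pixels around four inner pixels.
cell = [
    [5, 7, 6, 4],
    [6, 10, 20, 5],
    [7, 30, 40, 6],
    [4, 5, 6, 5],
]
print(cell_intensity(cell))
```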

.cel file


Affymetrix Algorithms

1. Signal

1.1 Adjusting MMs to purge negative values

All MMs < PMs: no adjustment necessary

Few MMs > PMs: change MMs based on weighted mean of other MMs

Most MMs > PMs: change MMs to be slightly less than PM
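The three cases above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Affymetrix's implementation: the typical PM/MM log ratio of the probe set is estimated with a plain median instead of a robust weighted mean, and the constants `CONTRAST_TAU` and `SCALE_TAU` are assumed values.

```python
import math
import statistics

CONTRAST_TAU = 0.03  # assumed threshold separating the "few" and "most" cases
SCALE_TAU = 10.0     # assumed scale for the "slightly less than PM" case


def adjust_mismatches(pm, mm):
    """Return adjusted MM values that are always below their PM partner.

    All intensities must be positive. The median of log2(PM/MM) stands in
    for the robust average of the probe set (an assumption of this sketch).
    """
    typical = statistics.median(math.log2(p / m) for p, m in zip(pm, mm))
    adjusted = []
    for p, m in zip(pm, mm):
        if m < p:
            # MM already below PM: no adjustment necessary.
            adjusted.append(float(m))
        elif typical > CONTRAST_TAU:
            # Only a few MMs exceed their PM: borrow the typical PM/MM ratio.
            adjusted.append(p / 2 ** typical)
        else:
            # Most MMs exceed their PM: use a value slightly less than PM.
            shrink = CONTRAST_TAU / (1 + (CONTRAST_TAU - typical) / SCALE_TAU)
            adjusted.append(p / 2 ** shrink)
    return adjusted


# Hypothetical probe set in which one MM exceeds its PM.
print(adjust_mismatches([100, 200, 300], [50, 250, 100]))
```

Because every adjusted MM ends up below its PM, every PM-MM difference used in the signal calculation is positive.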


Affymetrix Algorithms

Signal Calculation. Having adjusted the MM values, we now calculate the signal.

Probe pair   1     2     3    4    5    6   7     8    9      10
PM           1000  5000  430  765  355  98  3005  413  20333  590
MM           900   2000  230  25   331  40  1200  203  6197   230
PM-MM        100   3000  200  740  24   58  1805  210  14136  360

Unweighted mean = 2063

Using Tukey’s biweight mean = 1780

Signal (expression level) = 1780

The unweighted mean is vulnerable to outlier data. In order to protect against this, we dampen the effect of outliers by using the Tukey bi-weight mean. PM-MM values that are a number of standard deviations away from the mean are given low weights in accordance with the graph shown here. Individual PM-MM data are multiplied by the weight factor before calculation of the mean. The weighted mean is then called the “signal.”

[Figure: weight factor (0 to 1) plotted against standard deviations (1 to 6)]
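A one-step Tukey biweight can be sketched in a few lines of Python using the PM-MM values from the slide. The sketch measures distance from the median in units of the median absolute deviation, with tuning constants `c` and `epsilon` chosen as assumptions, so it is not expected to reproduce the slide's illustrative value of 1780. It does show the outliers being down-weighted.

```python
import statistics


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight average.

    Distances are measured from the median and scaled by the median absolute
    deviation (MAD). The constants c and epsilon are assumptions of this sketch.
    """
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    weights = []
    for v in values:
        u = (v - center) / (c * mad + epsilon)
        # Points far from the center get weight 0; near points get weight close to 1.
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# PM-MM values from the slide.
pm_minus_mm = [100, 3000, 200, 740, 24, 58, 1805, 210, 14136, 360]

unweighted = sum(pm_minus_mm) / len(pm_minus_mm)
print(round(unweighted))            # 2063, the unweighted mean on the slide
print(tukey_biweight(pm_minus_mm))  # pulled toward the bulk of the data
```

The single value 14136 dominates the unweighted mean; in the biweight it receives a weight of zero.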


.xls file


ALL_vs_AML_train_set_38_sorted.res


ALL_vs_AML_train_set_38_sorted.cls

38 2 1

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1

27 11 (27 samples labeled 0, 11 samples labeled 1)
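A class file of this shape can be read with a short parser. This is a sketch under assumptions: the header is taken to mean "number of samples, number of classes, 1", the labels are taken to follow as whitespace-separated integers, and lines starting with `#` are skipped. The function name `parse_cls` is hypothetical.

```python
from collections import Counter


def parse_cls(text):
    """Parse a minimal .cls file: a header line, then one class label per sample.

    Assumed layout: "<n_samples> <n_classes> 1" on the first line, followed by
    integer labels. Lines starting with '#' are ignored.
    """
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, rest = rows[0], rows[1:]
    n_samples, n_classes = int(header[0]), int(header[1])
    labels = [int(tok) for row in rest if not row[0].startswith("#") for tok in row]
    if len(labels) != n_samples:
        raise ValueError(f"expected {n_samples} labels, found {len(labels)}")
    if len(set(labels)) != n_classes:
        raise ValueError(f"expected {n_classes} classes, found {len(set(labels))}")
    return labels


# The header and label line as shown on the slide: 27 zeros followed by 11 ones.
cls_text = "38 2 1\n" + " ".join(["0"] * 27 + ["1"] * 11)

labels = parse_cls(cls_text)
print(Counter(labels))
```

The two counts match the "27 11" shown on the slide.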
