Expression Data and Microarrays CMMB November 29, 2001 Todd Scheetz

Page 1: Expression Data and Microarrays CMMB November 29, 2001 Todd Scheetz

Expression Data and Microarrays

CMMB

November 29, 2001

Todd Scheetz

Page 2

Overview

Gene expression
– mRNA
– protein

Northern Blots

RT-PCR

SAGE

MicroArray

Page 3

Gene Expression Review

Transcription
– generation of mRNA from genomic DNA
– a complete copy (the pre-mRNA) is made, including both introns and exons.

[Figure: genomic DNA transcribed into a pre-mRNA with a poly(A) tail (AAAA...)]

Page 4

Gene Expression Review

Processing / Splicing
– removal of the introns from the pre-mRNA

mature mRNA
– also exported from the nucleus to the cytoplasm
– alternative splicing

[Figure: one pre-mRNA (AAAA...) spliced into several mature mRNAs (splice variants), each with a poly(A) tail]
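As a rough illustration of splicing and alternative splicing, the sketch below joins exon segments out of a pre-mRNA string. The sequence, the exon coordinates, and the `splice` helper are all made up for this example.

```python
def splice(pre_mrna, exons):
    """Join the exon segments (0-based, end-exclusive) of a pre-mRNA."""
    return "".join(pre_mrna[start:end] for start, end in exons)

# Hypothetical pre-mRNA: exons in upper case, introns in lower case.
pre_mrna = "AUGGCC" + "guaagu" + "UUUGGA" + "guccag" + "CCAUAA"
exons = [(0, 6), (12, 18), (24, 30)]

full_length = splice(pre_mrna, exons)                   # all three exons
exon_skipped = splice(pre_mrna, [exons[0], exons[2]])   # a splice variant

print(full_length)
print(exon_skipped)
```

Dropping the middle exon gives a second mature mRNA from the same pre-mRNA, which is the essence of a splice variant.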

Page 5

Gene Expression Review

Translation

– takes an mRNA molecule and uses it to construct an amino acid sequence.

– the ribosome is the underlying machinery used in the process of translation.
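A minimal sketch of what translation computes, assuming the standard genetic code and a simplified ribosome that starts at the first AUG and stops at the first in-frame stop codon. The `translate` function and the example sequence are illustrative only.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(mrna):
    """Translate from the first AUG until a stop codon ('*')."""
    start = mrna.find("AUG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate("GGAUGGCCUUUUAAGG"))
```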

Page 6

Measuring Gene Expression

Two major differentiating factors…
– Quantitative vs. Qualitative
– mRNA vs protein

Most techniques can be used to determine quantitative expression levels.

Ex. EST sequencing

Page 7

Measuring Gene Expression

More sophisticated experiments…
– Comparing expression levels of multiple genes
– Comparing co-regulation or differential regulation.

Ex. EST sequencing

Page 8

Northern Blot

Measure relative expression levels of mRNA

1. Isolate and purify mRNA

2. Separate by size with gel electrophoresis

3. Transfer (blot) onto a membrane, then probe by hybridizing with a labeled clone for the gene under study.

Page 9

Northern Blot

Page 10

Northern Blot

Page 11

RT-PCR

Measures relative expression of mRNA

1. Isolate and purify mRNA

2. Reverse transcription (mRNA to cDNA)

3. PCR amplification

4. Run on gel and probe/hybridize

Page 12

RT-PCR

Page 13

RT-PCR

Why use RT?
– Can observe very low levels of expression
– Requires very small amounts of mRNA

The bad…
– Potential expression-level skew due to non-linearity of PCR
– Have to design multiple custom primers for each gene.
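One way to see the expression-level skew is an idealized exponential model in which each cycle multiplies the template by (1 + efficiency). This is only a sketch: the `amplify` helper, the starting copy numbers, and the efficiencies are assumptions, and real reactions also plateau in later cycles.

```python
def amplify(copies, efficiency, cycles):
    """Idealized exponential PCR: each cycle multiplies by (1 + efficiency)."""
    return copies * (1 + efficiency) ** cycles

# Hypothetical case: gene A starts at twice the abundance of gene B,
# but amplifies slightly less efficiently.
a = amplify(2000, 0.85, 30)
b = amplify(1000, 0.95, 30)
print(f"true ratio A/B = 2.0, observed after 30 cycles = {a / b:.2f}")
```

A small per-cycle difference in efficiency compounds over the cycles, so the end-point ratio can differ substantially from the starting ratio.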

Page 14

SAGE

Page 15

SAGE

Page 16

SAGE

Tags are isolated and concatemerized.

Relative expression levels can be compared between cells in different states.
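Once the concatemers are sequenced and split back into tags, comparing two cell states comes down to counting. The sketch below assumes the tags have already been extracted; the tag sequences and the `tag_frequencies` helper are invented for illustration.

```python
from collections import Counter

def tag_frequencies(tags):
    """Convert a list of observed SAGE tags into relative frequencies."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

# Hypothetical tag lists from two cell states
state_a = ["AAACCCGGGT"] * 6 + ["TTTGGGCCCA"] * 4
state_b = ["AAACCCGGGT"] * 2 + ["TTTGGGCCCA"] * 8

freq_a = tag_frequencies(state_a)
freq_b = tag_frequencies(state_b)
for tag in sorted(freq_a):
    print(tag, freq_a[tag], freq_b.get(tag, 0.0))
```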

Page 18

MicroArray

What are they?
– allow 1000’s of expression analyses to be performed concurrently.

What technologies are used?

How to analyze the image?

How to analyze the expression data?

What bioinformatics challenges are there?

Page 19

Potential Microarray Applications

• Drug discovery / toxicology studies

• Mutation/polymorphism detection

• Differing expression of genes over:
– Time
– Tissues
– Disease States

• Sub-typing complex genetic diseases

Page 20

DNA Array Technology

Array Type               Spot Density (per cm²)   Probe    Target   Labeling
Nylon Macroarrays        < 100                    cDNA     RNA      Radioactive
Nylon Microarrays        < 5,000                  cDNA     mRNA     Radioactive/Fluorescent
Glass Microarrays        < 10,000                 cDNA     mRNA     Fluorescent
Oligonucleotide Chips    < 250,000                oligos   mRNA     Fluorescent

Page 21

Physical Spotting

Page 22

MicroArray

Page 23

Glass Microarray

326 Rat Heart Genes, 2x spotting

Page 24

Photolithographic

Page 25

MicroArray

Page 26

MicroArray

Page 27

MicroArray

Page 28

MicroArray

Overview of data capture
– two different mRNA populations, labeled with different fluors
– excited by a laser
– each fluor emits at a different wavelength, which is captured using a photodetector attached to a filter tuned to the particular fluor

Page 29

MicroArray

Overview of image analysis
– spot identification
– grid alignment
– skew
– image normalization
  – variable background
  – uneven hybridization

Page 30

Microarray Data Pipeline

Page 31

Image Analysis/Data Quantization

• Feature (target probe) segmentation

• Data extraction and quantization of:
– Background
– Feature

• Correlation of feature identity and location within image

• Display of pseudo-color image
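A minimal sketch of the extraction step for a single spot, assuming segmentation has already assigned pixels to the feature and to its local background. Using the median as the summary statistic is one common choice, not something the slides specify; `quantify_spot` and the pixel values are hypothetical.

```python
from statistics import median

def quantify_spot(feature_pixels, background_pixels):
    """Return (feature, background, background-corrected signal) using medians."""
    feature = median(feature_pixels)
    background = median(background_pixels)
    return feature, background, max(feature - background, 0)

# Hypothetical pixel intensities for one spot and its surrounding ring
feature_pixels = [1200, 1350, 1280, 1310, 1295]
background_pixels = [210, 190, 205, 900, 200]  # one bright dust speck

print(quantify_spot(feature_pixels, background_pixels))
```

The median keeps the single bright background pixel from inflating the background estimate.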

Page 32

Image Segmentation


Page 33

Microarray Experiment Design

• Type I: (n = 2)
– How is this gene expressed in target 1 as compared to target 2?
– Which genes show up/down regulation between the two targets?

• Type II: (n > 2)
– How does the expression of gene A vary over time, tissues, or treatments?
– Do any of the expression profiles exhibit similar patterns of expression?

Page 34

Motivation & Design Constraints

• Probe set design involves the prioritizing and parsing of an initial data set containing potentially hundreds of thousands of probe candidates to define a reasonably sized set for use in a microarray experiment

• A single hybridization can produce several thousand data tuples, each containing multiple (n>10) measurements

• No “All-in-one” software package is currently available; therefore, communication of data between the packages must be facilitated by the pipeline

Page 35

Probe Set Design

• The goal of probe set design is to identify a reasonably sized subset of probes from a much larger starting set drawn from a variety of sources

• By defining a set of criteria, an investigator should be able to create new probe sets or refine existing sets

• Pruning a data set should be done in several stages:
– Use readily available information to limit scope of data
– Obtain more information about remaining probes
– Narrow focus based on additional information
– Iterate until desired data set is obtained
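The staged pruning can be sketched as a loop over (annotate, keep) pairs: each stage first attaches more information to the surviving probes and then filters on it. The probe records, the made-up e-values, and the `prune` helper are all hypothetical.

```python
def prune(probes, stages):
    """Apply each stage (an annotate function and a keep predicate) in turn."""
    for annotate, keep in stages:
        probes = [p for p in map(annotate, probes) if keep(p)]
    return probes

# Hypothetical probe records and criteria
probes = [
    {"id": "c1", "species": "rat", "tissue": "heart"},
    {"id": "c2", "species": "rat", "tissue": "liver"},
    {"id": "c3", "species": "mouse", "tissue": "heart"},
]
blast_hits = {"c1": 1e-40, "c2": 1e-3}  # made-up e-values

stages = [
    # Stage 1: use readily available information
    (lambda p: p, lambda p: p["species"] == "rat"),
    # Stage 2: obtain more information, then narrow on it
    (lambda p: {**p, "evalue": blast_hits.get(p["id"])},
     lambda p: p["evalue"] is not None and p["evalue"] < 1e-10),
]
selected = prune(probes, stages)
print([p["id"] for p in selected])
```

Putting the cheap filters first means the more expensive annotation only runs on probes that are still candidates.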

Page 36

Sample Probe Set Design Criteria

• 1° -- Direct
– Species
– Tissue
– Chromosome
– Sequence Available
  • Quality
  • Tail/Poly(A) signal
– Map position known?
– Cluster size

• 2° -- Indirect
– Blast results
  • Confidence value
  • Homology (or lack of)
  • Annotation contains words like “transfer”
  • 3’ & 5’ EST reads hit same gene
– Syntenic Map Information
– Known phenotypes in other species

Page 37

cDNA Microarray Slide Creation

• cDNA clones defining a probe set must be re-arrayed from their sources (e.g. local storage or commercial) into a format suitable for amplification and printing (e.g. 96-well microtiter plates)

• Based on the size of the probe set and the limitations of the printer, a parameter set (# of pens, spot spacing, grid dimensions,…) must be defined for printing the probe set onto the slide(s)

• A mapping operation must be performed to track each probe from source to destination, so that known information can be correlated with a particular “spot” in a microarray image

Page 38

MicroArray

Overview of data analysis
– vs. time
– vs. other genes
  – co-regulation
  – differential regulation
– pathway identification

Page 39

Data Analysis

• Data analysis consists of several post-quantization steps:
– Statistics/Metrics Calculations
– Scaling/Normalization of the Data
– Differential Expression
– Coordinated Gene Expression (aka clustering)

• Most software packages perform only a limited number of analysis tasks

• Databases can facilitate the movement of data between packages

Page 40

Scaling and/or Normalization

• Positive Controls
– ‘Spiked’ DNA
– Housekeeping Genes
– Total Array

• Negative Controls
– Foreign DNA
– ‘Empty’ spots

Page 41

Scaling and/or Normalization

• Linear regression

• Log-linear regression

• Ratio statistics

• Log(ratio) mean/median centering

• Nonlinear regression
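Of the methods listed for this slide, log(ratio) median centering is the simplest to sketch: take log2(red/green) for every spot and subtract the array-wide median, on the assumption that most genes are unchanged between the two samples. The channel names, the intensities, and the `median_center` helper are illustrative.

```python
from math import log2
from statistics import median

def median_center(red, green):
    """Return log2(red/green) per spot, shifted so the median is zero."""
    log_ratios = [log2(r / g) for r, g in zip(red, green)]
    shift = median(log_ratios)
    return [lr - shift for lr in log_ratios]

# Hypothetical intensities; the red channel reads about 2x too bright
red = [2000, 4000, 8000, 1000, 16000]
green = [1000, 2000, 1000, 500, 8000]
print(median_center(red, green))
```

After centering, only the third spot stands out, which is the one whose ratio differs from the global channel bias.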

Page 42

MicroArray

Bioinformatics challenges

1. data management

2. utilizing data from multiple experiments (type II)

3. utilizing data from multiple groups

* with different technologies

* with only processed data available

Page 43

[Figure: up/down (+/-) expression calls stored in database(s); expression level plotted against timepoints 1–4 and conditions 1–4 for genes A–E; expression level vs. time (0–180); local alignment of three 3’ … 5’ sequences (… A C G G G C … ATG …) within a search window; inferred +/0/- relationships among genes A, B, and C]

Page 44

MicroArray

data management

clone - spot

clone - gene

raw expression level

normalized expression level

annotation/links

expression profile

Page 45

MArray Expt Mgmt Redux

Experiment 5-Tuple: (Probe Set_ID, Target_ID, Hyb Condition_ID, Hyb Iteration_ID, GenePix_Analysis_ID)
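One possible in-memory representation of this 5-tuple is a named tuple, which can also serve as a lookup key for results. The field names are adapted from the slide; the integer ID types and the example values are assumptions.

```python
from typing import NamedTuple

class Experiment(NamedTuple):
    probe_set_id: int
    target_id: int
    hyb_condition_id: int
    hyb_iteration_id: int
    genepix_analysis_id: int

# Hypothetical IDs; the tuple can serve directly as a dictionary key
results = {}
key = Experiment(1, 7, 2, 1, 3)
results[key] = "quantized output for this hybridization"  # placeholder value
print(key.target_id, key in results)
```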

Page 46

Database Support (EBI Schema)

http://www.ebi.ac.uk/arrayexpress/
http://www.bioinf.man.ac.uk/microarray/maxd

Page 47

Differential Expression

• Type I analysis

• Look for genes with vastly different expression under different conditions
– How do you measure “vastly different”?
– What role should derived statistics play?
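One simple (and debatable) answer to "vastly different" is a fold-change cutoff on the log ratio. The sketch below uses a two-fold threshold purely as an example; the cutoff, the gene names, and the ratios are assumptions, and a fixed cutoff ignores measurement variance, which is where derived statistics come in.

```python
from math import log2

def differentially_expressed(ratios, fold=2.0):
    """Return genes whose expression ratio is at least `fold` up or down."""
    cutoff = log2(fold)
    return {gene: log2(r) for gene, r in ratios.items()
            if abs(log2(r)) >= cutoff}

# Hypothetical condition-1 / condition-2 ratios after normalization
ratios = {"geneA": 4.0, "geneB": 1.1, "geneC": 0.25, "geneD": 0.8}
print(differentially_expressed(ratios))
```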

Page 48

Type I: Differential Expression

[Figure: scatter plot, Gene 1 vs Gene 2; both axes run from 0 to 60,000]

Page 49

Coordinated Gene Expression

• Type II analysis

• “Eisen”ized data (dendrograms)

• Self-Organizing Maps

• Principal Component Analysis

• k-means Clustering
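As a small illustration of the last item, here is a plain k-means pass over expression profiles. It is a sketch under simplifying assumptions: squared Euclidean distance, caller-supplied starting centroids, and a fixed number of iterations. The profiles are invented.

```python
def dist2(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(profiles, centroids, iterations=10):
    """Plain k-means; returns a cluster index for each profile."""
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid for each profile
        labels = [min(range(len(centroids)), key=lambda k: dist2(p, centroids[k]))
                  for p in profiles]
        # Update step: move each centroid to the mean of its members
        for k in range(len(centroids)):
            members = [p for p, lab in zip(profiles, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical log-ratio profiles over four timepoints
profiles = [[0, 1, 2, 3], [0, 1.2, 2.1, 2.9], [3, 2, 1, 0], [2.8, 2.1, 0.9, 0]]
labels = kmeans(profiles, centroids=[profiles[0][:], profiles[2][:]])
print(labels)
```

The rising profiles end up in one cluster and the falling profiles in the other.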

Page 50

Hierarchical Clustering

Page 51

Self Organizing Maps

Page 52

Current Software

Capability columns: Probe Set Design, Quantization, Statistics, Normalization, Diff Exp, CGE
(the per-column placement of the X marks is not recoverable here; only the number of marks per package is shown)

Software Name        Provider                    Capabilities marked
Array Explorer       Spotfire Inc                4
Array Gauge          FujiFilm                    4
ArrayDesigner        Premier Biosoft Intl Inc    1
ArraySCOUT           Lion Bioscience             3
ArrayStat            Imaging Research Inc        3
ArrayViewer          TIGR                        3
ArrayVision          Imaging Research Inc        2
arrayWoRx            Applied Precision Inc       3
Cluster/Xcluster     Stanford University         2
Crazy Quant          U of Washington             1
GeneCluster          MIT                         2
GenePix Pro          Axon Instruments            3
GeneSight            Biodiscovery                4
GeneSpring           Silicon Genetics            4
GeneTAC              Genomic Solutions           2
Imagene              Biodiscovery                4
MicroArray Suite     Scanalytics                 4
Micromax             NEN                         3
OmniGrid             GeneMachines                1
Pathways Analysis    Research Genetics           5
Quant Array          Packard Instrument Co       4
ScanAlyze            Stanford University         3
SeqArray             GCG                         2
Spotfinder           TIGR                        3
DotsReader           Cose                        3
Resolver             Rosetta                     none shown

Page 53

Software/Pipeline Integration

• A centralized database facilitates the archival, manipulation, and mining of all microarray data

• Most analysis programs can output data in a textual format which is easily input into the database

• Output from one program can be used as input to a second program either directly or through a filtering operation facilitated by the database and a set of programs to mine and manipulate the data

• Data from multiple hybridizations may need to be combined in order to perform coordinated gene expression analysis

Page 54

Standards...

Want ability to exchange microarray experiment data using a common format.

MGED -- Microarray Gene Expression Group
www.mged.org

MAGE-ML

Rosetta Inpharmatics
GEML -- www.geml.org

MIAME - Minimum Information About a Microarray Experiment

Page 55

Data and Limitations

Current Controversy:
– Should the raw data be archived?
– If so, who should do it?

Each slide (25 mm x 75 mm) is scanned at 200 pixels per mm.
– Typical spot size = 100 µm
– Center-to-center = 195 µm
– Potential spots = 42,000
– “Raw” image size = ~250 MB
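The figures above can be roughly reproduced with a back-of-the-envelope calculation. The bit depth (16 bits per pixel) and the number of channels (two) are assumptions that the slide does not state, and the spot count is an upper bound that treats the whole slide surface as printable.

```python
WIDTH_MM, HEIGHT_MM = 25, 75
PIXELS_PER_MM = 200
PITCH_UM = 195

pixels = (WIDTH_MM * PIXELS_PER_MM) * (HEIGHT_MM * PIXELS_PER_MM)

# Assumptions not stated on the slide: 16-bit pixels, two channels
BYTES_PER_PIXEL, CHANNELS = 2, 2
raw_bytes = pixels * BYTES_PER_PIXEL * CHANNELS

# Upper bound on spots if the whole slide surface were printable
max_spots = (WIDTH_MM * 1000 // PITCH_UM) * (HEIGHT_MM * 1000 // PITCH_UM)

print(f"{pixels:,} pixels, {raw_bytes / 1e6:.0f} MB, up to {max_spots:,} spots")
```

Under these assumptions both results land somewhat above the slide's numbers (42,000 spots, ~250 MB), which would be consistent with part of the slide surface being left unprinted and unscanned.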

Page 56

Other Types of Microarrays

• Genomic BAC arrays
– allow assessment of “small” deletions

• Tissue arrays
– allow assessment of protein expression

Page 57

Type II: Data Partitioning

• Identify genes with similar expression

• Grouping unknown genes with known genes may provide insight into function of unknown genes

• Only useful for genes with varying expression levels
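A common way to score "similar expression" is the Pearson correlation between two profiles, which also shows why flat profiles are not useful: with no variation the correlation is undefined. The choice of Pearson correlation, the `pearson` helper, and the profiles are assumptions made for this sketch.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # flat profile: correlation is undefined
    return sxy / sqrt(sxx * syy)

known = [1.0, 2.0, 3.0, 4.0]      # hypothetical annotated gene
unknown = [2.0, 4.0, 6.0, 8.0]    # same shape, different scale
flat = [5.0, 5.0, 5.0, 5.0]       # no variation across conditions
print(pearson(known, unknown), pearson(known, flat))
```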

Page 58

Protein Expression

Protein expression may not correlate with mRNA expression.

How to measure levels of protein expression?

Immunochemistry
– 2-antibody approach

Page 59

Protein Expression

Indirect Immunofluorescence

cells are fixed

permeabilize the cells

incubate with primary antibody

incubate with secondary antibody

Page 60

Protein Expression

Page 61

Protein Expression

Immunofluorescence

green -- tubulin

red -- gamma tubulin

blue -- DNA

Page 62

Protein Expression

Immunofluorescence

red -- alpha tubulin

green -- vimentin (cytoskeletal protein)

blue -- DNA

Page 63

Protein Expression

High-throughput methods

array multiple tissue samples onto slide, and hybridize