Post on 17-Dec-2015

BIOINFORMATICS

DR. VÍCTOR TREVIÑO

VTREVINO@ITESM.MX

A7-421

Functional Genomics I - Microarrays

vtrevino@itesm.mx

FUNCTIONAL GENOMICS TECHNOLOGIES

Transcriptomics Proteomics Metabolomics Genomics

SNP (Single Nucleotide Polymorphisms) CNV (Copy Number Variation, CGH)

Epigenomics

MICROARRAYS

Technology that provides measurements of thousands of molecules in the same experiment at a reasonable price and precision

Generally the size of a typical microscope slide (75 x 25 mm, or 3" x 1", and about 1.0 mm thick)

Workflow: Biological Question → Experimental Design → Microarray Experiment → Pre-processing → Differential Expression / Clustering / Prediction → Biology: Verification and Interpretation

Pre-processing steps: Image Analysis, Background, Normalization, Summarization, Transformation

MICROARRAYS

Google Images

GENE EXPRESSION

Molecular Cell Biology [Lodish,Berk,Matsudaira,Kayser,Kreiger,Scott,Zipursky,Danell] (5th Ed)

Gene Expression

MEASURING GENE EXPRESSION

PCR: [Figure: gel of mRNA, Gene X, in RWPE-1, DU-145 and PC-3 cells (- and + lanes), with a 100 bp ladder marking the 100 bp and 200 bp bands. Source: http://www.bio168.com/mag/1B8B368B092A/20-3.jpg]

QPCR: [Figure: curves labelled 10^7, 10^6, 10^5, 10^4, 10^3, 10^2 and 10 copies]

MICROARRAY - HYBRIDISATION

Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003

http://www.well.ox.ac.uk/genomics/facilitites/Microarray/Welcome.shtml

DNA MICROARRAY TECHNOLOGY

MICROARRAYS

Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003

www.niaid.nih.gov/dir/services/rtb/microarray/overview.asp

http://metherall.genetics.utah.edu/Protocols/Microarray-Spotting.html

http://www.lbl.gov/Science-Articles/Archive/cardiac-hyper-genes.html

http://www.nrc-cnrc.gc.ca/multimedia/picture/life/nrc-bri_micro-array_e.html

http://learn.genetics.utah.edu/units/biotech/microarray/genechip.jpg

Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003

MICROARRAYS – PROBE PRODUCTION

MICROARRAY TECHNOLOGIES

[Images: Affymetrix (1 dye) and two-dye arrays]

MICROARRAY QUALITY

Affymetrix Spotted Arrays Inkjet arrays

Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003

MICROARRAYS

Dr. Hugo Barrera, Microarrays Course EMBO-INER 2005, Mexico City

PROCESS → PRODUCT
mRNA Extraction (and amplification) → mRNA/cDNA
Labelling → Labeled mRNA
Hybridization → Microarray
Scanning → Digital Image
Image Analysis & Data Processing → Data
Statistical Analysis → Selected Genes

TWO-DYES
Samples: Healthy/Control (REFERENCE) and Disease/Treatment (TEST)
Data: Gene A 1-1, B 1-0, C 3-3, D 0-3, E 3-0, F 0-1, G 1-1, H 2-0, I 2-2, J 0-0, K 3-0, L 2-1
Selected genes: Gene D 0.001, Gene E 0.005, Gene K 0.001

ONE-DYE
Sample: TEST
Data: Gene A 1, B 1, C 1, D 0, E 4, F 1, G 1, H 2, I 2, J 0, K 5, L 2
Selected genes: Gene D 0.001, Gene E 0.005, Gene K 0.001, Gene J 0.003

MICROARRAY SCANNING

Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003

MICROARRAY – LASER AND THE SCANNED IMAGE

Dr. Hugo Barrera, Microarrays Course EMBO-INER 2005, Mexico City

Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003

5 µm Laser / 10 µm Laser

Microarray - Pre-Processing Purpose

Input: Scanned Image File

Pre-processing: Image Analysis, Background, Normalization, Summarization, Transformation

Output: Data File (unique "global relative" measure of expression for every gene with minimal experimental error)

MICROARRAY IMAGE ANALYSIS - TECHNOLOGIES

Probes: DNA / Oligos (~20-40 nt)
Target: cDNA, PCR products, etc.
Copies per gene: usually 1 / usually 3
Organization: sectors (print-tip) / n x m probesets
Sectors: x sectors (~3) by y sectors (~3), each with i x j spots (18x20)
Probesets: n probesets (~100) by m probesets (~100)
Probeset: perfect match probes (pm) and mismatch probes (mm)
Empty spots: landing lights
Controls

MICROARRAY - IMAGE ANALYSIS - TECHNOLOGIES

Two dyes: 10,000 genes * 2 dyes * 3 copies/gene * ~40 pixels/gene = 2,400,000 values, but only 10,000 values are wanted

Oligonucleotide arrays: 10,000 genes * 20 oligos * 2 (pm, mm) * ~36 pixels/gene = 14,400,000 values, but only 10,000 values are wanted

RAW DATA: Image Analysis, Pre-processing
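The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function and parameter names are made up for illustration, and the counts are the approximate figures quoted on the slide.

```python
# Raw pixel-level values per array versus the one value per gene that
# pre-processing has to deliver (numbers as quoted on the slide).
GENES = 10_000

def raw_values_two_dye(genes=GENES, dyes=2, copies_per_gene=3, pixels=40):
    """Pixel values on a two-dye array."""
    return genes * dyes * copies_per_gene * pixels

def raw_values_oligo(genes=GENES, oligos=20, probe_types=2, pixels=36):
    """Pixel values on an oligonucleotide array with pm and mm probes."""
    return genes * oligos * probe_types * pixels

two_dye = raw_values_two_dye()
oligo = raw_values_oligo()
print(two_dye, oligo)                    # 2400000 14400000
print(two_dye // GENES, oligo // GENES)  # 240 1440 raw values per gene
```

Image analysis and pre-processing therefore reduce a few hundred to more than a thousand raw values to a single number per gene.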

IMAGE ANALYSIS

Addressing: Estimate location of spot centers.
Segmentation: Classify pixels as foreground or background.
Extraction: For each spot on the array and each dye
• foreground intensities
• background intensities
• quality measures

Addressing: done by the Affymetrix GeneChip software

Addressing (by grid, GenePix)

Segmentation

Circular feature / Irregular feature shape

Extraction: Determining Background, Background Reduction, finally compute Average

Results, 2-Color (GenePix): .gpr file, "results" for one array; 10,000 genes ~ 30,000 values (.gal files: 1 file for a "list" of arrays)

Results, Affymetrix: .cel file, "results" for one array (raw, no background reduction); 10,000 genes ~ 400,000 values

IMAGE ANALYSIS

Segmentation (Spot detection) → Background Estimation → Value

Value = Spot Intensity – Spot Background

Gene      Sample 1   Sample 1
Gene 1    100        984
Gene 2    209        209
Gene 3    -7         2
...
Gene k    9882       9711
...
Gene N    2298       28
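A minimal sketch of the "Value = Spot Intensity – Spot Background" step follows. The foreground and background numbers are hypothetical, chosen so that the results match the first column of the table, and the function name is an assumption for illustration.

```python
def background_corrected(foreground, background):
    """Value = Spot Intensity - Spot Background, spot by spot."""
    if len(foreground) != len(background):
        raise ValueError("one background estimate is needed per spot")
    return [f - b for f, b in zip(foreground, background)]

# Hypothetical spot and local background intensities (not from the slides).
fg = [350, 410, 95, 10200]
bg = [250, 201, 102, 318]
values = background_corrected(fg, bg)
print(values)  # [100, 209, -7, 9882]
```

Negative values such as -7 appear whenever the background estimate exceeds the spot intensity, which is why they need special care before a log transformation.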

DATA TRANSFORMATION – TWO DYES

[Data table: Gene 1 ... Gene N with two "Sample 1" intensity columns, G and R]

[Figure: two scatter plots of R = Sample 1 against G = Sample 1; axes labelled Log2]

DATA TRANSFORMATION – TWO DYES

[Data table: Gene 1 ... Gene N with two "Sample 1" intensity columns, G and R]

R and G (log2 scale): how to get 1 value?

A = (log2(G) + log2(R)) / 2

M = log2(R / G) = log2(R) - log2(G)

[Figure: MA-Plot for G = Sample 1 and R = Sample 1; x axis A = (log2(G)+log2(R)) / 2, ticks 8 to 16; y axis M = log2(R)-log2(G), ticks -4 to 1]
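The M and A definitions from the plot axes can be written as a short function. This is an illustrative sketch; the function name and the example intensities are assumptions, and non-positive intensities are simply rejected here rather than handled.

```python
import math

def ma_values(r, g):
    """Return (M, A) for one spot from its red and green intensities."""
    if r <= 0 or g <= 0:
        raise ValueError("log2 needs positive intensities")
    m = math.log2(r) - math.log2(g)        # log-ratio
    a = (math.log2(r) + math.log2(g)) / 2  # average log-intensity
    return m, a

m, a = ma_values(1024, 256)
print(m, a)  # 2.0 9.0
```

Here M = 2 means the red channel is four times brighter than the green one, and A = 9 places the spot on the intensity axis of the MA-plot.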

"With-in"(2 color technologies)

Normalization – 2 dyes

(assumption: Majority No change)

[Figure: Before and After "Within" normalization]

Normalization – 2 dyes, "Within" Spatial (2 color technologies)

[Figure: Before Normalization; After loess Global Normalization; After loess by Sector (print-tip) Normalization]
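The "majority no change" assumption can be illustrated with the simplest within-array correction: shifting all log-ratios so that their median is zero. Note that this is global median centering, a deliberately simpler stand-in for the loess fits named on the slides, and the example log-ratios are invented.

```python
import statistics

def median_center(m_values):
    """Shift log-ratios so that their median is 0 (majority unchanged)."""
    shift = statistics.median(m_values)
    return [m - shift for m in m_values]

# Hypothetical log-ratios M with a dye bias of about +1.
m_raw = [0.9, 1.1, 1.0, 3.0, -1.0]
m_norm = median_center(m_raw)
print(statistics.median(m_norm))  # 0.0
```

A loess correction does the same kind of shift, but lets it vary with intensity A (global) or separately within each print-tip sector.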

DATA TRANSFORMATION – ONE DYE

[Data table: Gene 1 ... Gene N with one "Sample 1" intensity column (100, 209, -7, ..., 9882, ..., 2298), transformed with Log2]

[Figure: density(x = log2(t[, 15] + 200), adjust = 0.475); N = 3840, Bandwidth = 0.1051]

[Figure: density against log intensity, Before normalization and After normalization]

Normalization – 1 or 2 dyes: Between-slides

Methods:
• quantile
• MAD (median absolute deviation)
• scale
• qspline
• invariantset
• loess
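Of the between-slide methods listed, quantile normalization is the easiest to sketch. The version below is a plain-Python illustration under simplifying assumptions (no missing values, ties broken by position); the function name and the toy arrays are not from the slides.

```python
def quantile_normalize(arrays):
    """Give every array the same distribution: the mean of the sorted arrays.

    `arrays` is a list of equal-length lists, one list per array.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays need the same number of genes")
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # gene indices, low to high
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized

arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
norm = quantile_normalize(arrays)
print(norm)  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization each array contains exactly the same set of values, so the density curves of all arrays overlap; only the ranking of genes within each array is preserved.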

Summarization – Affymetrix (Oligonucleotide dependent technologies)

Summarization = "Average"(Intensities)

Usual Methods:
• tukey-biweight
• av-diff
• median-polish

[Figure: PM and MM probes]

The "summarization" equivalent in two-dye technologies is the average of gene replicates within the slide.
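As an illustration of summarization, the sketch below computes a simplified average difference for one probeset: the mean of PM minus MM over its probe pairs. It omits the outlier handling of the original av-diff procedure, and the intensities and function name are hypothetical.

```python
def average_difference(pm, mm):
    """Simplified av-diff: mean of PM - MM over the probe pairs of a probeset."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("need one MM value for every PM value")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)

# Hypothetical probe-level intensities for one probeset.
pm = [1200, 900, 1500, 1100]
mm = [300, 400, 500, 200]
signal = average_difference(pm, mm)
print(signal)  # 825.0
```

Tukey biweight and median polish follow the same idea of collapsing many probe values into one, but use robust estimators that are less sensitive to a single bad probe.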

MICROARRAYS – FILTERING / TREATING UNDEFINED VALUES

Problems:
• Some spots may be defective from the printing process
• Some spots could not be detected
• Some spots may be damaged during the assay
• Artefacts may be present (bubbles, etc.)

Options:
• Use replicated spots as averages
• Remove unrecoverable genes
• Remove problematic spots in all arrays
• Infer values using computational methods (warning)
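Two of these options, averaging the usable replicate spots and removing genes with no usable spot, can be sketched as follows. The gene names, the values, and the use of None to mark an undefined spot are assumptions made for the example.

```python
def average_replicates(replicates):
    """Average the usable replicate spots of one gene; None marks a bad spot."""
    usable = [v for v in replicates if v is not None]
    return sum(usable) / len(usable) if usable else None

# Hypothetical log2 values for three replicate spots per gene.
spots = {
    "gene_a": [10.0, 12.0, None],   # one damaged spot
    "gene_b": [None, None, None],   # unrecoverable gene
    "gene_c": [7.0, 7.5, 8.0],
}
averaged = {gene: average_replicates(v) for gene, v in spots.items()}
kept = {gene: v for gene, v in averaged.items() if v is not None}
print(kept)  # {'gene_a': 11.0, 'gene_c': 7.5}
```

Inferring (imputing) missing values is the alternative flagged with a warning on the slide, since imputed numbers are estimates rather than measurements.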

MICROARRAY – DATA FILTERING

More than 10,000 genes

Too much data increases computation time and analysis complexity

Remove:
• Genes that do not change significantly
• Undefined genes
• Low expression

Keep:
• Large signal to noise ratio
• Large statistical significance
• Large variability
• Large expression
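A filter along these lines might keep only genes with large expression and large variability. The thresholds, gene names and values below are purely illustrative and would need tuning for real data.

```python
import statistics

def filter_genes(matrix, min_mean=7.0, min_sd=0.5):
    """Keep genes whose mean log2 expression and standard deviation are large.

    The default thresholds are illustrative only.
    """
    kept = {}
    for gene, values in matrix.items():
        if statistics.mean(values) >= min_mean and statistics.stdev(values) >= min_sd:
            kept[gene] = values
    return kept

# Hypothetical log2 expression values across four samples.
matrix = {
    "low_expr": [4.0, 4.5, 5.0, 4.2],
    "flat": [10.0, 10.1, 9.9, 10.0],
    "variable": [8.0, 11.0, 8.5, 11.5],
}
selected = filter_genes(matrix)
print(list(selected))  # ['variable']
```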

MICROARRAY PRE-PROCESSING SUMMARY

Data Processing: Image Analysis, Background Subtraction, Normalization, Summarization, Transformation, Filtering

[Figure, panels a) to d): Microarray, Image Scanning, Spot Detection, Intensity Value, Background Detection & Subtraction; Image Analysis and Background Subtraction (Affymetrix, Two-dyes); Transformation with A = log2(R*G)/2 and M = log2(R/G); Normalization (Within, Between)]

MICROARRAY REPOSITORIES

MICROARRAY APPLICATIONS

Microarray Technology Through Applications, F. Falciani, Taylor & Francis 2007

MICROARRAY DATA MATRIX

Rows: Gene 1, Gene 2, Gene 3, ..., Gene N

Columns: Class A Samples and Class B Samples

Class A: Normal Tissue, Cancer A, Untreated, Reference, …

Class B: Tumour Tissue, Cancer B, Treated, Strains, …

MICROARRAYS – WHAT CAN BE DONE WITH DATA?

• Differential Expression
• Unsupervised Classification
• Biomarker detection
• Identifying genes related to survival times
• Regression Analysis
• Gene Copy Number and Comparative Genomic Hybridization
• Epigenetics and Methylation
• Genetic Polymorphisms and SNPs
• Chromatin Immuno-Precipitation
• On-Chip Pathogen Detection
• …

DIFFERENTIAL EXPRESSION

Gene Selection: Positive and Negative

[Figure: Expression Level of Samples A versus Samples B for a Positive and a Negative gene; panels labelled µ=d]

[Data matrix: Gene 1 ... Gene N by Class A Samples and Class B Samples, followed by p-value, FDR, q-Value]
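The step from p-values to FDR-adjusted values can be sketched with the Benjamini-Hochberg procedure. The slides do not say which FDR method was used, so this choice is an assumption, and the p-values below are invented for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so the result stays monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values, one per gene.
p = [0.01, 0.04, 0.03, 0.005]
q = benjamini_hochberg(p)
print([round(v, 4) for v in q])  # [0.02, 0.04, 0.04, 0.02]
```

In a real experiment the correction runs over all genes tested, not only over the few that look interesting, because the number of tests n drives the adjustment.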

Biomarker Discovery

Biomarker Detection, Gene Selection: Positive and Negative

[Figure: Expression Level of Samples Class A versus Samples Class B for a Positive and a Negative gene; panels labelled µ=d]

[Data matrix: Gene 1 ... Gene N by Class A Samples and Class B Samples]

UNSUPERVISED CLASSIFICATION

Unsupervised Sample Classification; Co-Expressed Genes

[Figure: schematic with Samples A C G B H E D I K M L]

[Figure: heatmaps (panels a and b, with groups numbered 1 to 9) of Expression from Low to High. Samples: HJ2.b, HJ0, He0, He2.b, Hh6.tw, Hh4.b, Hh2.b, Hh4.tw, Hh2.tw, Hh0, Hh6.b. Gene labels run from IL-8 to BNIP3.]
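Unsupervised classification groups samples (or genes) by the similarity of their expression profiles. The slides do not name a specific algorithm, so the sketch below only shows one common building block: a correlation-based distance, 1 - r, and the search for the closest pair of samples, which is the first merge an agglomerative clustering would make. Sample names and profiles are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def closest_pair(profiles):
    """Return the two sample names with the smallest distance 1 - r."""
    names = sorted(profiles)
    pairs = [(1 - pearson(profiles[a], profiles[b]), a, b)
             for i, a in enumerate(names) for b in names[i + 1:]]
    return min(pairs)[1:]

# Hypothetical expression profiles (four genes per sample).
profiles = {
    "s1": [1.0, 2.0, 3.0, 4.0],
    "s2": [2.0, 4.0, 6.1, 8.0],
    "s3": [4.0, 3.0, 2.0, 1.0],
}
pair = closest_pair(profiles)
print(pair)  # ('s1', 's2')
```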

SURVIVAL TIMES

Genes Associated to Survival Times and Risk

Gene Selection: Positive and Negative

[Figure: two Kaplan-Meier Plots; x axis Time, y axis labelled Hazard, from 0.0 to 1.0; + marks on the curves]

[Data matrix: Gene 1 ... Gene N by Class A Samples and Class B Samples]
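A Kaplan-Meier curve can be computed directly from follow-up times and event indicators. The sketch below is a minimal estimator of the survival probability; the times and censoring flags are invented, and the function name is an assumption.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with an event.

    events[i] is True for an observed event and False for a censored sample.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        same_time = [e for tt, e in data if tt == t]
        deaths = sum(same_time)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= len(same_time)  # events and censored samples leave the risk set
        i += len(same_time)
    return curve

# Hypothetical follow-up times; False marks a censored sample.
curve = kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False])
print([(t, round(s, 3)) for t, s in curve])  # [(2, 0.8), (3, 0.6), (5, 0.3)]
```

To relate a gene to survival, samples are typically split by the expression of that gene (for example high versus low) and one curve is drawn per group.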

REGRESSION

Regression: Gene Association to outcome

Gene Selection: Positive (Slope ≠ 0) and Negative (Slope = 0)

[Figure: Dependent Variable against Gene Expression for each case]

[Data matrix: Gene 1 ... Gene N by Class A Samples and Class B Samples]
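The "slope ≠ 0 versus slope = 0" idea can be sketched with an ordinary least-squares slope of the dependent variable on the expression of one gene. The data are invented, and a real analysis would also test whether the slope differs significantly from zero.

```python
def least_squares_slope(x, y):
    """Slope of the least-squares line y = intercept + slope * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx

# Hypothetical gene expression and two outcome variables.
expression = [1.0, 2.0, 3.0, 4.0]
outcome_linked = [2.1, 3.9, 6.1, 7.9]  # rises with expression
outcome_flat = [5.0, 5.0, 5.0, 5.0]    # unrelated to expression

slope_linked = least_squares_slope(expression, outcome_linked)
slope_flat = least_squares_slope(expression, outcome_flat)
print(round(slope_linked, 2), slope_flat)  # 1.96 0.0
```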

CPG METHYLATION

[Figure: DNA with methylated (M) sites; two protocols, each applied to Sample and Control]

Unmethylated Fraction:
• Cleavage with methylation-sensitive restriction enzyme
• Adaptor Ligation
• CpG specific cleavage with McrBC
• Adaptor-specific amplification

Hypermethylated Fraction:
• Cleavage with TasI, Csp6I
• Adaptor Ligation
• Cleavage with methylation-sensitive restriction enzyme
• Adaptor-specific amplification

Each fraction: labelling with Cy5 (red) and Cy3 (green), then Microarray

[Figure, panels a to c: SNP genotyping on microarrays for SNP1, SNP2 and SNP3 (genotypes AA, CG, CC). Steps shown: Labelling, Hybridisation, Detection. Labels include: Extension (1nt) with PCR products + DNA polymerase and Labelled ddNTPs; Transcribed RNA + reverse transcriptase with Extension using ddNTPs (one labelled); Products of 1nt primer extension (in solution) followed by Capture.]

Chromatin Immuno-Precipitation (ChIP-on-Chip)

• Fusion of Tag sequence into TF (Transcription Factor) gene
• Incubation of DNA with Tagged TF
• Precipitation of Antibody-TF-DNA complex (Antibody against tag peptide)
• Labelling of precipitated DNA
• Microarray Hybridisation

PATHOGEN/PARASITES DETECTION

Spotted: (1) ACGGCTAGTCACAAC... (2) GCTAGTCACAACCCA... (3) GCTAGTCCGGCACAG... ...

[Figure: Sample Hybridized to spots (1) (2) (3)]

EXAMPLE 1: DIFFERENTIAL EXPRESSION (Dr. Hugo Barrera)

• mRNA Extraction: Placenta 1, Placenta 2, Reference Pool
• Labelling (Green, Red)
• Microarray Hybridization (by duplicates)
• Scanning & Data Processing: Image Analysis, Within Normalization (per array), Between Normalization (all arrays)
• Detection of Differentially Expressed Genes: t-test H0: µ = 0; p-values correction: False Discovery Rate
• Validation and Analysis: Comparison With Known Tissue Specific Genes (controls)
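The t-test with H0: µ = 0 applied to the log-ratios of one gene can be sketched as below. Only the t statistic is computed; turning it into a p-value needs the t distribution with n - 1 degrees of freedom, which is left out here. The log-ratios are hypothetical, not data from the placenta experiment.

```python
import math
import statistics

def one_sample_t(log_ratios, mu0=0.0):
    """t statistic for H0: the mean log-ratio equals mu0."""
    n = len(log_ratios)
    mean = statistics.mean(log_ratios)
    sd = statistics.stdev(log_ratios)
    return (mean - mu0) / (sd / math.sqrt(n))

# Hypothetical log2(Placenta/Reference) ratios over four arrays.
t_changed = one_sample_t([2.0, 2.2, 1.8, 2.0])
t_unchanged = one_sample_t([0.1, -0.1, 0.2, -0.2])
print(round(t_changed, 2), round(t_unchanged, 2))
```

A large absolute t value, as for the first gene, indicates a consistent deviation from a log-ratio of zero across the replicate arrays.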

[Figure, panels a to d: Placenta/Reference and Control/Control arrays (51, 52, 56, 54)]

[Figure: (a) Microarray Experiment, Ratio (log2) scale from 10 to -6, arrays Placenta 1 Replicate 1, Placenta 1 Replicate 2, Placenta 2 Replicate 1, Placenta 2 Replicate 2; (b) T1dbase, T1 score from 1 to 0, across tissues: Placenta, Lung, Thalamus, Amygdala, Spinal Cord, Testis, Kidney, Liver, Pituitary, Thyroid, Cerebellum, Hypothalamus, Caudate Nucleus, Exocrine Pancreas, Lymph Node, Frontal Cortex, Stomach, Breast, Bone Marrow, Pancreatic Islets, Uterus, Ovary, Skin, Heart, Skeletal Muscle, Prostate, Thymus, Salivary Gland, Trachea]

OTHER MICROARRAYS

Microarray Technology Through Applications, F. Falciani, Taylor & Francis 2007

ANTIBODIES MICROARRAYS

Microarray Technology Through Applications, F. Falciani, Taylor & Francis 2007

PROTEIN MICROARRAYS

Microarray Technology Through Applications, F. Falciani, Taylor & Francis 2007

CARBOHYDRATE MICROARRAY

SMALL-MOLECULE MICROARRAYS
