Post on 12-Jan-2016

Detection and Restoration of Hybridization Problems in Affymetrix GeneChip Data by Parametric Scanning

Tomokazu Konishi

Akita Pref. Univ.

a monolith?

!

praise and censure for microarray technology

Admired: the comprehensiveness

Criticized: the low reproducibility

Origin of the failure: the intelligent framework for data processing

values without units

obtained from hybridization images

Index  CHI1
1      69715.25
2      55335.52
3      89216.68
4      145717.8
5      128202.2
6      114373
7      75725.44
8      39021.06
9      115491.5
10     97384.47
11     101823
12     194268.7
13     114838.7
14     118911.3
15     45748.05
16     113630.7
17     53177.55
18     65225.8
19     117009.4
20     32688.52

excitation light fluorescence

Requirements for understanding unitless values

philosophy or

metaphysics

framework

e.g. International System of Units (SI)

desirable framework for microarray

• Measurements
– Standard
– Scale

• Interpretations
– direct link to cell functions

Measurements

• Standard
• Scale

are available from the data distribution

Parametric Normalization (2003-4)

data service is available through Skylight Biotech Inc.

Human fibroblast

(Iyer et al. 1999 Science)

[Figure: ordered response value plotted against theoretical z-score; axis ticks from -3 to 3]

raw data

SuperNORM

lognormal distribution can be found by subtracting proper background

distorted by saturation

noise-affected

normalized
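
The idea on this slide (subtract a background, check for a lognormal distribution, then express each value as a z-score) can be sketched as follows. This is only a minimal illustration and not the SuperNORM implementation: the function name, the fixed `background` value, and the choice of median and interquartile range as the robust standard and scale are all assumptions made for the example.

```python
import math
from statistics import NormalDist, median, quantiles

# Interquartile range of the standard normal distribution (about 1.349).
IQR_STD_NORMAL = NormalDist().inv_cdf(0.75) - NormalDist().inv_cdf(0.25)


def parametric_z_scores(intensities, background):
    """Subtract a background, take logs, and rescale to z-scores.

    `background` is a hypothetical, user-supplied value; how the proper
    background is found is not described on the slide.
    Values at or below the background are dropped as noise-affected.
    """
    logs = [math.log(x - background) for x in intensities if x > background]
    q1, _, q3 = quantiles(logs, n=4)
    centre = median(logs)               # robust estimate of the mean
    scale = (q3 - q1) / IQR_STD_NORMAL  # robust estimate of the SD
    return [(v - centre) / scale for v in logs]


# Illustrative unitless intensities and an assumed background of 1000.
raw = [69715.25, 55335.52, 89216.68, 145717.8, 128202.2,
       114373.0, 75725.44, 39021.06, 115491.5, 97384.47]
z = parametric_z_scores(raw, background=1000.0)
print([round(v, 2) for v in z])
```

Because the standard (centre) and the scale both come from the data distribution itself, the resulting z-scores are comparable between chips without an external unit.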

statistical framework

• Standard
• Scale

are available from the data distribution

Parametric Normalization (2002-4)

data service is available through Skylight Biotech Inc.

desirable framework for microarray

• Measurements
– Standard
– Scale

• Interpretations
– direct link to cell functions

statistical framework

cell

link to cell functions

• Most of the factors are well characterized

• Bottom-Up approach

– nucleotide sequence recognition factors

– controlling the rate limiting steps

– concrete physics

The cell is not a black-box

cell

[mRNA]s are on the balance: rates of synthesis and degradation

Pseudo Equilibrium

Cytosol

Interactions among factors change the rate

energies describe the interactions

[Equations: the energies E and ΔG_p, each expressed as a function of [regulator]]

Rate of synthesis

Rate of degradation

energies = k [factors]

energies determine the [mRNA]

k_c = A_p P_0 [polymerase] / A_d

[Equation: [mRNA] = k_c exp(energy terms in E and ΔG, divided by RT)]

link to cell functions

Thermodynamic Model of

Transcriptome Formation (2005)

(the theory and some supporting evidence)

desirable framework for microarray

• Measurements
– Standard
– Scale

• Interpretations
– direct link to cell functions

physical framework

The framework for microarray data

science

Common to physics and biochemistry

feedback to the wet methods

Parametric Scanning

The GeneChip system

• detects nucleotide hybridization
– Measure mRNA levels
– Find SNPs

• has 1,000,000 probes (=cells) synthesized in situ

column ~ 1,000 cells

row ~ 1,000 cells

pseudo image: comparison with a standard

examples of problems: scratchy noise, malfunction region, air bubble?

troubles should be removed prior to data analysis

ideal standard

group of chips → brief normalization & trimmed mean → ideal standard

finding problems

Δz: difference between the z-score of each chip and the z-score of the ideal standard

Normalize with robust parameters

distribution of Δz

in >85% of data, Δz ~ N(0, 1²)
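
The two steps above, building an ideal standard from a group of chips and taking each chip's difference from it, could look roughly like this. It is a sketch under stated assumptions rather than the method used in the talk: each chip is taken to be a flat list of z-scores in the same cell order, the 20% trimming fraction is arbitrary, and "robust parameters" is interpreted as the median and interquartile range. All function names are hypothetical.

```python
from statistics import NormalDist, median, quantiles

# Interquartile range of the standard normal distribution (about 1.349).
IQR_STD_NORMAL = NormalDist().inv_cdf(0.75) - NormalDist().inv_cdf(0.25)


def trimmed_mean(values, trim=0.2):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)


def ideal_standard(chips, trim=0.2):
    """Cell-wise trimmed mean over a group of z-normalized chips."""
    return [trimmed_mean(cell, trim) for cell in zip(*chips)]


def delta_z(chip, standard):
    """Difference from the standard, rescaled with the median and IQR."""
    diff = [c - s for c, s in zip(chip, standard)]
    q1, _, q3 = quantiles(diff, n=4)
    centre = median(diff)
    scale = (q3 - q1) / IQR_STD_NORMAL
    return [(d - centre) / scale for d in diff]


# Five tiny "chips" of two cells each; the last chip has a defect in cell 0.
chips = [[0, 1], [0, 1], [0, 1], [0, 1], [9, 1]]
print(ideal_standard(chips))  # the defect does not reach the standard
```

The trimmed mean is what keeps a defect on one chip from leaking into the standard, and the robust rescaling keeps the bulk of Δz close to a standard normal even when a minority of cells are damaged.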

scanning the image by using moving windows

scanning

• Any single large Δz can be a genuine large signal

• Clusters of large Δz should be extremely rare

the median in each moving window is challenged by the test

distribution of the medians of windows: N(0, 1²)

(central limit theorem)
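
A moving-window scan over the Δz image might be sketched as below. The window size, the dictionary layout, and the rescaling of window medians by their own median and interquartile range (so that they can be compared with a standard-normal threshold) are assumptions for illustration; the slides do not give these details, and the function names are hypothetical.

```python
from statistics import NormalDist, median, quantiles

# Interquartile range of the standard normal distribution (about 1.349).
IQR_STD_NORMAL = NormalDist().inv_cdf(0.75) - NormalDist().inv_cdf(0.25)


def window_medians(dz, size):
    """Median Δz of every size x size window; keys are top-left (row, col)."""
    rows, cols = len(dz), len(dz[0])
    meds = {}
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            cells = [dz[r + i][c + j] for i in range(size) for j in range(size)]
            meds[(r, c)] = median(cells)
    return meds


def flag_windows(dz, size, threshold):
    """Return the windows whose standardized median exceeds |threshold|."""
    meds = window_medians(dz, size)
    values = list(meds.values())
    q1, _, q3 = quantiles(values, n=4)
    centre = median(values)
    scale = (q3 - q1) / IQR_STD_NORMAL  # robust SD of the window medians
    limit = abs(threshold)              # two-sided test
    return {pos for pos, m in meds.items() if abs((m - centre) / scale) > limit}
```

A single extreme cell barely moves a window median, whereas a cluster of shifted cells moves it a great deal, which is why the median is a natural statistic for separating real signals from localized hybridization problems.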

test level is set from the expected number of false cancellations

expectation = 2

Number of windows: n = 500,000

Two-sided test

test = qnorm(1/n) = -4.61
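
The threshold arithmetic can be checked with Python's standard library, where `NormalDist().inv_cdf` plays the role of R's `qnorm`. The variable names are illustrative only.

```python
from statistics import NormalDist

n_windows = 500_000
expected_false_cancellations = 2

# Two-sided test: the expectation is split evenly between the two tails,
# so each tail gets probability 2 / (2 * n) = 1 / n.
tail_probability = expected_false_cancellations / (2 * n_windows)

threshold = NormalDist().inv_cdf(tail_probability)
print(round(threshold, 2))  # -4.61
```

With this threshold, a chip without any defect would be expected to lose about two windows by chance, one in each tail.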

improvement in reproducibility

PM data, repeated measurements

Some innocent data are cancelled along with the problems, but only a limited portion!

Now we can handle the .cel data

[Figure: exp 1 versus exp 2 scatter plots for two probe sets, labelled "22 4 7 3 9 3 _ a t" and "22 4 6 0 0 4 _ a t" in the extracted text; axis ticks -2, 0, 2]

verification by the cells

finding splicing variants

Cancellation by dChip package

PM data, repeated measurements

Panels: After cancellation, Cancelled data

improved reproducibility

[Figure: Exp. 1 (log ratio) versus Exp. 2 (log ratio); panels: MAS 5.0, SuperNORM]

commercially available from …

Skylight-biotech Inc.

http://www.super-norm.com
