A Routine Approach to Quality Control
Peter Haberl 19. 11. 2001
Content
The GDE Controller
- Workflow
- Gradients
- Distortions
- Local defects
- Condensing
Playing with negative AvgDiff values
GDE Controller Workflow
... is part of the GD Expressionist™ system
[Diagram: feature data (.CEL files), upload server, GD CoBi™ Database (DB), GD Expressionist™ Controller, GD Expressionist™ Analyst; file types shown: .CEL, .ABS, .REL]
... extends the conventional data flow
[Diagram labelled ‘Affymetrix’ and ‘GeneData’. Elements shown: .DAT, .CEL, .CDF, .INS, .CHP, .ABS, Condensation, Quality Control (intensity values, flagging of outliers, ‘Outliers ok’), DB, GD Expressionist™ Analyst]
GDE Controller Workflow
- login
- options and thresholds
- available chip layouts (.CDF files)
- available experiments (.CEL files)
The Controller is about ...
... detection
- of location-dependent systematic effects (gradients)
- of intensity-dependent systematic effects (distortions)
- of local defects
... correction
- of global gradients
- of global distortions
... condensing
- constructing expression values using different algorithms
Gradients:
incomplete washing?
thermal effects?
... ?
GDE Controller Gradients
Idea *) (single-chip version):
...
- divide the chip into 4 x 4 sectors (as for the background determination)
- look at the feature distribution in each sector, in particular at the mode (position of the maximum) and the width
[Plot: sector histograms; x-axis ln(intensity), y-axis ln(counts)]
*) developed in discussions with H. Seidel (Schering, Berlin)
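The sector statistics described above can be sketched roughly as follows. This is a minimal illustration, not the Controller's implementation: the function names `sector_log_intensities` and `mode_and_width`, the bin count, and the use of the interquartile range as the ‘width’ are all assumptions made for the example.

```python
import math
from statistics import quantiles

def sector_log_intensities(chip, n=4):
    """Split a 2-D grid of intensities into n x n sectors of ln(intensity) values."""
    rows, cols = len(chip), len(chip[0])
    sectors = [[[] for _ in range(n)] for _ in range(n)]
    for r, row in enumerate(chip):
        for c, value in enumerate(row):
            if value > 0:  # ln is undefined for non-positive intensities
                sectors[r * n // rows][c * n // cols].append(math.log(value))
    return sectors

def mode_and_width(log_values, bins=20):
    """Histogram mode (centre of the fullest bin) and width (interquartile range).

    Needs at least two values; bin count and width measure are assumptions.
    """
    lo, hi = min(log_values), max(log_values)
    if hi == lo:  # degenerate sector: all features identical
        return lo, 0.0
    step = (hi - lo) / bins
    counts = [0] * bins
    for v in log_values:
        counts[min(int((v - lo) / step), bins - 1)] += 1
    peak = counts.index(max(counts))
    q1, _, q3 = quantiles(log_values, n=4)
    return lo + (peak + 0.5) * step, q3 - q1
```

Comparing the returned (mode, width) pairs across the 16 sectors gives a first impression of whether a location-dependent gradient is present.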
In an iterative process, transform the intensities
I(x,y) → I’(x,y) = a(x,y) I(x,y) + b(x,y)
such that the sector histograms become aligned.
[Figure panels: scale factor a(x,y) in first step; offset b(x,y) in first step; all sector histograms after first step; all sector histograms after third step]
It was later decided to perform only a multiplicative correction,
I(x,y) → I’(x,y) = a(x,y) I(x,y)
for two reasons:
- practical application showed that the scale factor is the dominant effect;
- the observable AvgDiff is insensitive to the offset b(x,y).
A basic assumption of the ‘single-chip’ version is that the distribution of bright and dark features is random. If this assumption is violated (e.g. for the yeast chip), the ‘single-chip’ version encounters problems.
The ‘multi-chip’ version compares the sector histograms not among themselves, but to the sector histograms of a ‘reference chip’. (This is of course only possible if enough ‘similar’ chips are available.)
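A purely multiplicative correction of this kind could look like the sketch below. It assumes that a mode of ln(intensity) has already been estimated for every sector, that the common target is the mean of the sector modes, and that a(x,y) is constant within a sector; the slides do not specify any of these, and the names `sector_scale_factors` and `apply_scale` are hypothetical.

```python
import math
from statistics import mean

def sector_scale_factors(sector_modes):
    """Scale factor per sector that moves each ln-intensity mode to the mean mode.

    Using the mean of all sector modes as the target is an assumption.
    """
    target = mean(m for row in sector_modes for m in row)
    return [[math.exp(target - m) for m in row] for row in sector_modes]

def apply_scale(chip, factors):
    """Multiplicative correction I'(x,y) = a(x,y) * I(x,y), constant per sector."""
    n = len(factors)
    rows, cols = len(chip), len(chip[0])
    return [[value * factors[r * n // rows][c * n // cols]
             for c, value in enumerate(row)]
            for r, row in enumerate(chip)]
```

For the ‘multi-chip’ version, the target for each sector would instead be the mode of the corresponding sector on the reference chip.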
Result of Gradient Correction:
[Figure: chip images, original and corrected, with a ‘heat map’ of the scale factor a(x,y)]
Further example of Gradient Correction:
[Figure: chip images, original and corrected, with a ‘heat map’ of the scale factor]
Distortions:
A log-log plot of coding (i.e. PM and MM) features can show a nonlinear relationship when compared to the features of a ‘reference chip’.
One of the reasons can be that chips from different chip lots are combined into one series.
(Again, the reference chip can only be constructed if enough ‘similar’ chips are available.)
GDE Controller Distortions
Idea:
- divide the reference signal region into stripes containing the same number of points (red lines)
- in each stripe, determine the median of experiment signals (or, equivalently, the point of maximum density)
- force this median line to be the diagonal of the new point cloud; this determines the (intensity-dependent) transformation
[Plot: experiment signal vs. reference signal, with stripe boundaries as red lines]
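The stripe-and-median idea can be illustrated with a short sketch. It assumes the inputs are already log signals (as in the log-log plot), that the median line is monotonic, and that the transformation between stripe medians is interpolated linearly; the function names `median_line` and `correct` and the default stripe count are hypothetical.

```python
from statistics import median

def median_line(reference, experiment, stripes=10):
    """Anchor points (experiment median, reference median), one per equal-count stripe."""
    pairs = sorted(zip(reference, experiment))  # order the points by reference signal
    size = len(pairs) // stripes
    anchors = []
    for k in range(stripes):
        # the last stripe takes any leftover points
        chunk = pairs[k * size:(k + 1) * size] if k < stripes - 1 else pairs[k * size:]
        anchors.append((median(e for _, e in chunk), median(r for r, _ in chunk)))
    return sorted(anchors)

def correct(value, anchors):
    """Map an experiment signal onto the diagonal (piecewise-linear, extrapolating at the ends).

    Needs at least two anchors with distinct experiment medians.
    """
    last = len(anchors) - 2
    for i in range(len(anchors) - 1):
        (x0, y0), (x1, y1) = anchors[i], anchors[i + 1]
        if value <= x1 or i == last:
            return y0 + (value - x0) * (y1 - y0) / (x1 - x0)
```

After the correction, the median of the experiment signals in each stripe coincides with the median reference signal of that stripe, which places the median line on the diagonal.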
Result of Distortion Correction:
[Figure: result plots; annotation: ‘impossible to correct’]
Both gradient and distortion detection/correction require the concept of a reference chip:
- it serves as a ‘virtual standard’ for a given experiment set
- the experiment set should be homogeneous:
  - chips from the same production lot
  - probes from the same tissue
  - a small number of differentially expressed genes doesn’t change the characteristic pattern
- the chips have to be made comparable, for instance with a global logarithmic-mean normalization
- the reference chip is computed featurewise (as mean or median)
[Diagram: normalized set → reference chip]
GDE Controller Reference Chip
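As a rough sketch of how such a reference chip might be built: each chip is represented here as a flat list of positive feature intensities, and ‘global logarithmic-mean normalization’ is read as dividing every feature by the chip's geometric mean. That reading, and the names `log_mean_normalize` and `reference_chip`, are assumptions for illustration only.

```python
import math
from statistics import mean, median

def log_mean_normalize(chip):
    """Scale a chip so that its mean ln(intensity) becomes 0 (assumed interpretation)."""
    shift = mean(math.log(v) for v in chip)
    return [v / math.exp(shift) for v in chip]

def reference_chip(chips):
    """Featurewise median over the normalized chips of a homogeneous experiment set."""
    normalized = [log_mean_normalize(chip) for chip in chips]
    return [median(values) for values in zip(*normalized)]
```

Replacing `median` by `mean` gives the featurewise mean variant mentioned on the slide; the median is less sensitive to a single defective chip.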
Local defects:
There are local defects which are already visible in a global chip view:
Aim: Can we reliably detect smaller local defects, if possible automatically?
view of outlier locations:
GDE Controller Local Defects
Idea:
- construct a ‘ratio chip’ by dividing each feature by its counterpart on the reference chip
- for visualisation purposes, show in
  - green: features which are brighter
  - red: features which are darker
  - black: features that don’t change
- local defects should show up as speckles of homogeneous color, with diameters of at least several features
[Schematic: feature grids with rows 0-1 and columns 0-2]
experiment:  x00 x01 x02  |  x10 x11 x12
reference:   y00 y01 y02  |  y10 y11 y12
ratio chip:  x00/y00 x01/y01 x02/y02  |  x10/y10 x11/y11 x12/y12
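The ratio-chip construction and a simple search for speckles can be sketched as below. The colour threshold (`factor`), the minimum speckle size, the 4-neighbourhood rule, and all function names are assumptions chosen for illustration; the slides do not state how the Controller detects speckles.

```python
def ratio_chip(experiment, reference):
    """Divide each feature by its counterpart on the reference chip."""
    return [[x / y for x, y in zip(x_row, y_row)]
            for x_row, y_row in zip(experiment, reference)]

def colour(ratio, factor=1.5):
    """Green = brighter, red = darker, black = unchanged (threshold is an assumption)."""
    if ratio > factor:
        return "green"
    if ratio < 1.0 / factor:
        return "red"
    return "black"

def speckles(colours, min_size=4):
    """Connected same-colour, non-black regions (4-neighbourhood) of at least min_size features."""
    rows, cols = len(colours), len(colours[0])
    seen, found = set(), []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or colours[r][c] == "black":
                continue
            region, stack = [], [(r, c)]
            seen.add((r, c))
            while stack:  # flood fill from the seed feature
                i, j = stack.pop()
                region.append((i, j))
                for a, b in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if (0 <= a < rows and 0 <= b < cols and (a, b) not in seen
                            and colours[a][b] == colours[r][c]):
                        seen.add((a, b))
                        stack.append((a, b))
            if len(region) >= min_size:
                found.append((colours[r][c], sorted(region)))
    return found
```

Isolated single features that differ from the reference are ignored by the size filter, which reflects the expectation that real defects span several neighbouring features.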
[Figure: ratio chip view; annotations: ‘differential regulation’, ‘actual defects’]
This method can identify defects which would be hard to find ...
... or invisible, even in a zoomed view:
For old (row-wise spotted) chips, there is a danger that differentially expressed genes are detected as chip artefacts.
Applying pattern search algorithms can solve this problem.
[Figure annotation: ‘differential regulation’]
Further example of a local defect:
Defects can have a certain spatial extension:
Most frequent structures:
... and others:
An interactive chip viewer allows the user to
- view identified mask areas
- zoom and find out which genes are affected by masking
- manually edit the masked areas
GDE Controller Workflow
- reporting
- export to database, into analysis software, or as .CEL files
- choose between different condensing algorithms: MAS4, MAS5, GeneData (= trimmed mean of log(PM))
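The GeneData condensing option is described only as a trimmed mean of log(PM). A minimal sketch of that idea follows; the trim fraction, the use of the natural logarithm, and the names `trimmed_mean` and `condense_log_pm` are assumptions, since the slide gives no further detail.

```python
import math

def trimmed_mean(values, trim=0.2):
    """Mean after dropping the lowest and highest `trim` fraction (trim < 0.5)."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

def condense_log_pm(pm_intensities, trim=0.2):
    """Expression value for one probe set: trimmed mean of ln(PM)."""
    return trimmed_mean([math.log(pm) for pm in pm_intensities], trim)
```

Because only the PM features enter, this value is always defined for positive intensities, in contrast to AvgDiff, which can become negative.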
Playing with negative AvgDiff values
log-log plot (panels: replicates; large differential expression):
- correlation of large values is visible
- only positive values can be displayed
linear-linear plot (panels: replicates; large differential expression):
- negative values can be displayed
- poor resolution for small values
- large values appear scattered
‘cube-root’ plot (panel: replicates): y = AvgDiff^(1/3)
- damping at large values
- display of positive and negative values
- ‘zero density regions’ (artefact)
‘lin-log’ transformation: y = sign(x)*ln( 1 + |x| )
- damping of high values
- interpolates smoothly between linear (for small values) and logarithmic (for large values) behaviour

$$
\operatorname{sign}(x)\,\ln(1+|x|) =
\begin{cases}
x \mp \dfrac{x^2}{2} + O(x^3), & |x| \ll 1 \\[4pt]
\pm \ln|x| + \dfrac{1}{x} + O\!\left(\dfrac{1}{x^2}\right), & |x| \gg 1
\end{cases}
$$

(upper sign for x > 0, lower sign for x < 0)
[Plot: the ‘lin-log’ curve compared with y = x and y = ln(x)]
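Both signed transformations are easy to write down in code. The sketch below uses only the Python standard library; the function names `lin_log` and `cube_root` are chosen here for illustration.

```python
import math

def lin_log(x):
    """y = sign(x) * ln(1 + |x|): linear near zero, logarithmic for large |x|."""
    return math.copysign(math.log1p(abs(x)), x)

def cube_root(x):
    """Signed cube root, defined for negative values as well."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)
```

Both functions are odd, so positive and negative AvgDiff values are treated symmetrically. For plotting, one possible call is `lin_log(avg_diff / target)`, where `avg_diff` and `target` are hypothetical variable names for an AvgDiff value and the target intensity.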
‘lin-log’ plot (panels: replicates; large differential expression):
- a good choice is x = AvgDiff / Target, i.e. the target intensity sets the scale
- lines of constant factors are shown in blue (2), red (5) and green (10)
The ‘lin-log’ plots allow us to look at positive and negative AvgDiff values simultaneously. But why would we want to look at the negatives at all?
Consider the following ‘experiment’: construct faked .CEL files in which all PM-MM pairs are interchanged, and condense them with the old Affymetrix algorithm (ignoring AbsCall).
Amusing observation: if one ignores that the scale factor becomes negative (MAS doesn’t: “Failed to analyze due to invalid Scale Factor”), the old (MAS4) algorithm would be invariant under PM ↔ MM!
SF = Target / TrimmedMean(AvgDiff)
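The invariance can be checked numerically with a simplified model of the old condensing step. In this sketch AvgDiff is taken as the plain average of PM − MM over the probe pairs of a gene (the real MAS4 algorithm also discards outlier pairs), and the trim fraction and the names `trimmed_mean` and `scaled_avg_diffs` are assumptions.

```python
def trimmed_mean(values, trim=0.02):
    """Symmetric trimmed mean: drop the same number of values at both ends."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

def scaled_avg_diffs(pairs_per_gene, target=100.0):
    """Scaled AvgDiff per gene; pairs_per_gene holds one list of (PM, MM) pairs per gene."""
    avg_diffs = [sum(pm - mm for pm, mm in pairs) / len(pairs)
                 for pairs in pairs_per_gene]
    sf = target / trimmed_mean(avg_diffs)  # SF = Target / TrimmedMean(AvgDiff)
    return [sf * a for a in avg_diffs]

genes = [[(120.0, 100.0), (150.0, 90.0)],
         [(80.0, 100.0), (60.0, 90.0)],
         [(300.0, 100.0), (500.0, 200.0)]]
swapped = [[(mm, pm) for pm, mm in pairs] for pairs in genes]  # PM <-> MM
original_values = scaled_avg_diffs(genes)
swapped_values = scaled_avg_diffs(swapped)
```

Swapping PM and MM flips the sign of every AvgDiff and, because the trimming is symmetric, also of the trimmed mean. The scale factor therefore changes sign as well, and the two sign changes cancel in the scaled values.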
Original data: the ‘three-tissue dataset’: 3 groups with 6 replicates each
[Figure: plots within replicate groups and across replicate groups; annotation: ‘perfect group separation’]
PM ↔ MM data:
These are log-log plots of negative AvgDiffs.
The good correlation at high values indicates that these numbers are reproducible.
The difference between replicate groups is not so obvious, but ...
... clustering again results in a complete group separation:
Take-home message: The mismatches carry information which can be measured reproducibly and can be used (at least) for pattern comparisons.