The following slides have been adapted from
http://www.tm4.org/ to be presented at the
Follow-up course on Microarray Data Analysis
(Nov 20-24 2006, PICB Shanghai) by Peter Serocka
The Institute for Genomic Research (TIGR)
TIGR Spotfinder: a tool for microarray image processing
Developer: Vasily Sharov
Microarray Data Flow
(TIGR file formats first; formats used by other tools in parentheses)
Scanner → Image File (.tif) → Image Analysis → Raw Gene Expression Data (.mev (.gpr)) → Normalization / Filtering → Normalized Data with Gene Annotation (.mev (.gpr, .txt)) → Expression Analysis → Interpretation of Analysis Results
Printer → Gene Annotation (.ann (.gal)) → joined with the data at Normalization / Filtering
Data File Formats
Slide images: TIGR .tif; others .tif
Gene expression tables: TIGR .mev (.tav is outdated); others .gpr (GenePix), .txt (tab-delimited, Excel)
Gene annotations and array layout information: TIGR .ann; others .gal
Process Overview
Figure: Sample 1 mRNA and Sample 2 mRNA are reverse-transcribed (RT) into labeled cDNA (Cy3-cDNA and Cy5-cDNA), hybridized to a cDNA array, and scanned to measure the Cy3 and Cy5 intensities.
Basic Steps from Image to File
1.) Image File Loading
2.) Construct or Apply an Overlay Grid
3.) Computations
  • Find Spot Boundary and Area
  • Intensity Calculation
  • Background Calculation and Correction
4.) Quality Control
5.) Text File Output
Basic Demonstration: Exploring the Interface
(Using an Existing Grid File)
Microarray Image Parameters
The microarray (MA) scanner generates two 16-bit grayscale TIFF images, one for each labeling dye (Cy3 and Cy5).
The 16-bit scheme provides a signal dynamic range of $2^{16} = 65536$ levels (pixel values 0 to 65535).
Each image is 20 to 30 MB at a scanning resolution of 10 µm/pixel.
Figure: typical layout of a microarray image (two example images of 22 MB and 28 MB, scanned at 10 µm/pixel resolution).
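The image sizes above follow from the bit depth: a 16-bit pixel takes 2 bytes, so an uncompressed single-channel image is roughly width × height × 2 bytes. The sketch below estimates that size from a scanned area and resolution. It assumes no compression, ignores TIFF header overhead, and uses MB = 10^6 bytes; the function name and the example scan area are hypothetical.

```python
def tiff_size_mb(width_mm: float, height_mm: float,
                 resolution_um: float = 10.0, bytes_per_pixel: int = 2) -> float:
    """Approximate uncompressed size (MB = 10**6 bytes) of one single-channel scan.

    Assumptions: no compression, TIFF header/tags ignored, 16-bit pixels (2 bytes).
    """
    # Pixels per axis: physical length (1 mm = 1000 um) divided by the pixel size.
    width_px = round(width_mm * 1000 / resolution_um)
    height_px = round(height_mm * 1000 / resolution_um)
    return width_px * height_px * bytes_per_pixel / 1e6


# Hypothetical 22 mm x 60 mm scan area at 10 um/pixel: 2200 x 6000 pixels.
print(tiff_size_mb(22, 60))  # -> 26.4
```

Two such images (Cy3 and Cy5) are written per slide, so the total per hybridization is about twice this figure.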
Processing Overview
Apply the Grid
Determine Spot Boundary
Calculate Spot Intensity
Determine Background and Correct Intensity
Applying an Overlay Grid
• What does it accomplish?
– The grid cells set a boundary for the spot finding algorithms.
– The grid cells also define an area for background correction.
Gridding Dimension Parameters
Figure: grid dimensions set by the pin X and pin Y parameters.
Spot Spacing Parameter
Figure: distance between adjacent spots (spot spacing).
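To make the role of the grid concrete, here is a minimal sketch of how an overlay grid can be laid out for one block of spots. It assumes an axis-aligned, unrotated grid with square cells whose side equals the spot spacing; the parameter names `origin_x`, `origin_y`, `rows`, `cols` are hypothetical and are not SpotFinder's own settings. A full layout would repeat this per block (e.g. per print pin).

```python
from typing import List, Tuple

# (left, top, right, bottom) in pixels; right and bottom are exclusive.
Box = Tuple[int, int, int, int]


def grid_cells(origin_x: int, origin_y: int, rows: int, cols: int,
               spot_spacing: int) -> List[List[Box]]:
    """Square cells of side `spot_spacing`, one per spot, for a single block.

    Assumption: the grid is axis-aligned and unrotated (hypothetical simplification).
    """
    cells: List[List[Box]] = []
    for r in range(rows):
        row: List[Box] = []
        for c in range(cols):
            left = origin_x + c * spot_spacing
            top = origin_y + r * spot_spacing
            row.append((left, top, left + spot_spacing, top + spot_spacing))
        cells.append(row)
    return cells


cells = grid_cells(origin_x=100, origin_y=200, rows=2, cols=3, spot_spacing=20)
print(cells[1][2])  # cell in the second row, third column
```

Each returned box is the region within which spot finding and local background estimation then operate.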
Spot Finding
Spot finding requires an estimated spot size. The spot can be drawn as an irregular contour, as an ellipse, or as unconnected pixels.
The area inside the contour is used for the spot intensity calculation.
The area outside the contour is used for the local background calculation.
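As an illustration of the inside/outside split, the sketch below divides the pixels of one grid cell into spot and background sets. It assumes the simplest spot model, a circle of the estimated radius around a given center; SpotFinder can also use irregular contours or ellipses, which this sketch does not reproduce. The function name is hypothetical.

```python
def split_cell(cell, center_r, center_c, radius):
    """Split a 2-D cell (list of rows of pixel values) into spot and background pixels.

    Assumption (hypothetical simplification): the spot is a circle of `radius`
    pixels centered at (center_r, center_c).
    """
    spot, background = [], []
    for r, row in enumerate(cell):
        for c, value in enumerate(row):
            inside = (r - center_r) ** 2 + (c - center_c) ** 2 <= radius ** 2
            (spot if inside else background).append(value)
    return spot, background


cell = [[0] * 5 for _ in range(5)]
cell[2][2], cell[2][3] = 100, 50
spot, background = split_cell(cell, center_r=2, center_c=2, radius=1)
print(len(spot), len(background))  # 5 spot pixels, 20 background pixels
```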
Background Calculation
Background intensity is calculated as the median pixel intensity from the area within the square and outside the spot.
A separate local background is calculated for each spot using the non-spot pixels from its square.
Figure: local background area.
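The local background rule (median of the cell pixels outside the spot) can be sketched directly. This assumes the spot boundary is already available as a boolean mask aligned with the cell; the function name is hypothetical. The median, rather than the mean, keeps a few bright dust or neighbour-bleed pixels from inflating the background.

```python
from statistics import median


def local_background(cell, spot_mask):
    """Median of the cell pixels that lie outside the spot (spot_mask is False there)."""
    outside = [value
               for row, mask_row in zip(cell, spot_mask)
               for value, in_spot in zip(row, mask_row)
               if not in_spot]
    return median(outside)


cell = [[10, 12, 11],
        [13, 500, 9],
        [10, 11, 12]]
mask = [[False, False, False],
        [False, True, False],
        [False, False, False]]
print(local_background(cell, mask))  # the bright spot pixel (500) is ignored
```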
Spot Definition and Calculations
Spot area, $A$ = number of pixels within the defined spot boundary
$\mathrm{BKG}$ = median pixel value within the cell (excluding the spot pixels)
$\mathrm{Integral}$ = sum of all spot pixels, excluding saturated pixels
Reported “Intensity” = $\mathrm{Integral} - \mathrm{BKG} \cdot A$
Spot Integration with Background Correction
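The formulas above can be combined into one small function. This is a sketch, not SpotFinder's code. It assumes a pixel is saturated when it equals the 16-bit maximum (65535), and, following the Saturation Factor slide ("excluding the saturated pixels from spot area and intensity calculations"), it drops saturated pixels from both $A$ and the Integral so that the background is subtracted over the same pixels that were summed.

```python
from statistics import median

SATURATED = 65535  # assumed saturation value: the maximum 16-bit pixel value


def spot_intensity(spot_pixels, background_pixels, saturated=SATURATED):
    """Return (A, BKG, Integral, Intensity) for one spot in one channel.

    Assumption: saturated pixels are excluded from both A and Integral.
    """
    good = [v for v in spot_pixels if v < saturated]
    area = len(good)                    # A: unsaturated pixels inside the spot boundary
    bkg = median(background_pixels)     # BKG: median of the non-spot pixels in the cell
    integral = sum(good)                # Integral: sum of unsaturated spot pixels
    intensity = integral - bkg * area   # reported Intensity = Integral - BKG * A
    return area, bkg, integral, intensity


print(spot_intensity([100, 200, 65535], [10, 20, 30]))  # (2, 20, 300, 260)
```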
Quality Control Issues
Two measures of spot quality are reported by SpotFinder:
• Saturation Factor
• QC Score: reflects spot shape and signal-to-noise ratio
Saturation Examples
Partially saturated spots can look like this:
Figure: partially saturated spot, with its saturated and non-saturated areas marked.
Completely saturated spots can look like this:
Figure: fully saturated spot.
Saturation, Pixel Value Limit
Figure: output pixel value versus input fluorescent dye light signal; the output levels off at the 16-bit limit ($2^{16} = 65536$ levels).
Saturation Factor
- Partially saturated spots can be handled in SpotFinder by excluding the saturated pixels from the spot area and intensity calculations.
- Fully saturated spots cannot be recovered in SpotFinder. In this case, rescanning with lower excitation power or PMT gain could be considered.*
*Faint spots may be lost at the lower settings.
$$\text{Saturation Factor} = \frac{\#\,\text{good pixels in spot}}{\text{total number of spot pixels}}$$
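The saturation factor is a simple ratio, sketched below. The assumption here is that a "good" pixel is one below the 16-bit maximum (65535); the spot must contain at least one pixel. A value of 1.0 means no saturation, and 0.0 means a fully saturated spot.

```python
def saturation_factor(spot_pixels, saturated=65535):
    """Fraction of spot pixels that are not saturated (1.0 = no saturated pixels).

    Assumption: a pixel counts as saturated when it reaches `saturated`.
    """
    if not spot_pixels:
        raise ValueError("spot has no pixels")
    good = sum(1 for v in spot_pixels if v < saturated)
    return good / len(spot_pixels)


print(saturation_factor([100, 65535, 200, 300]))  # 3 of 4 pixels are good -> 0.75
```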
Saturation, RI Plot
The RI plot, $\log(I_B/I_A)$ versus $\tfrac{1}{2}\log(I_A I_B)$, clearly displays the saturation limits.
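Each spot becomes one point on the RI plot. The sketch below computes its coordinates from the two channel intensities. The log base is not stated on the slides; base 2 is assumed here (a common choice for expression ratios). Both intensities must be positive, which background-corrected values are not always.

```python
import math


def ri_point(ia, ib, base=2.0):
    """Return (I, R) for one spot.

    I = 1/2 * log(IA * IB) is the x-axis (overall intensity),
    R = log(IB / IA) is the y-axis (log ratio). Base 2 is an assumption.
    """
    if ia <= 0 or ib <= 0:
        raise ValueError("RI plot needs positive intensities in both channels")
    r = math.log(ib / ia, base)
    i = 0.5 * math.log(ia * ib, base)
    return i, r


print(ri_point(64, 256))  # approximately (7.0, 2.0)
```

Spots near the saturation limit pile up along a diagonal edge of the plot, because once one channel is capped at its maximum, the ratio can only change through the other channel.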
Quality Control, QC Score
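A QC score is generated for each spot and is based on the spot shape and a measure of signal-to-noise ratio.
Figure: shape and signal/noise for channel A combine into QCA; shape and signal/noise for channel B combine into QCB; QCA and QCB combine into the QC score.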
Spot Shape Parameter
$\text{Shape Factor} = \text{Spot Area} / \text{Perimeter}$
Spots with large perimeters relative to spot area will have a low shape factor.
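A sketch of the shape factor on a binary spot mask follows. The slides do not define how the perimeter is measured, so this assumes perimeter = number of spot pixels with at least one 4-connected neighbour outside the spot (pixels on the mask edge count as boundary). Any normalization SpotFinder applies before using the value in the QC score is not reproduced.

```python
def shape_factor(mask):
    """Spot area / perimeter for a binary mask (list of lists of bools).

    Assumption: perimeter counts spot pixels with at least one 4-neighbour
    that is outside the spot or outside the mask.
    """
    rows, cols = len(mask), len(mask[0])
    area = perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            area += 1
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if not (0 <= nr < rows and 0 <= nc < cols) or not mask[nr][nc]:
                    perimeter += 1
                    break  # count each boundary pixel once
    return area / perimeter if perimeter else 0.0


square = [[1 <= r <= 3 and 1 <= c <= 3 for c in range(5)] for r in range(5)]
line = [[r == 1 and 1 <= c <= 5 for c in range(7)] for r in range(3)]
print(shape_factor(square), shape_factor(line))  # compact spot scores higher
```

The compact 3×3 spot has 9 pixels and 8 boundary pixels, while the thin streak has 5 pixels that are all boundary, so the streak gets the lower factor, as the slide describes.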
Signal to Noise Ratio
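S/N factor = fraction of spot pixels exceeding $\alpha \cdot \mathrm{med}(\mathrm{BKG}) + \beta \cdot \mathrm{SD}(\mathrm{BKG})$, where med(BKG) and SD(BKG) are the median and standard deviation of the local background pixels and $\alpha$, $\beta$ are weighting coefficients.
Figure: pixel-value axis from 0 to $2^{16}$, marking med(BKG), SD(BKG) and the resulting threshold.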
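The S/N factor counts how many spot pixels rise clearly above the local background. In the sketch below, `alpha` and `beta` are placeholder coefficients with hypothetical default values (the slides do not give them), and SD(BKG) is taken as the population standard deviation, which is also an assumption.

```python
from statistics import median, pstdev


def sn_factor(spot_pixels, background_pixels, alpha=1.0, beta=2.0):
    """Fraction of spot pixels above alpha * med(BKG) + beta * SD(BKG).

    alpha and beta are hypothetical placeholder values, not SpotFinder defaults.
    """
    threshold = alpha * median(background_pixels) + beta * pstdev(background_pixels)
    above = sum(1 for v in spot_pixels if v > threshold)
    return above / len(spot_pixels)


# Flat background of 10 -> threshold 10; two of the four spot pixels exceed it.
print(sn_factor([5, 10, 11, 50], [10, 10, 10, 10]))  # 0.5
```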
Quality Control Calculation
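$\text{QC Score} = (\mathrm{QCA} + \mathrm{QCB})/2$
$\mathrm{QCA} = \sqrt{\mathrm{QC}_{\text{shape}} \cdot \mathrm{QC}_{S/N}}$ for channel A
$\mathrm{QCB} = \sqrt{\mathrm{QC}_{\text{shape}} \cdot \mathrm{QC}_{S/N}}$ for channel B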
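The combination step is a geometric mean per channel followed by an arithmetic mean across channels. The sketch below assumes the per-channel shape and S/N scores have already been computed and scaled to comparable ranges; how SpotFinder scales them is not stated on the slides.

```python
import math


def qc_score(shape_a, sn_a, shape_b, sn_b):
    """Return (QC, QCA, QCB) with QCX = sqrt(shape_X * S/N_X) and QC = (QCA + QCB) / 2."""
    qca = math.sqrt(shape_a * sn_a)  # channel A: geometric mean of shape and S/N
    qcb = math.sqrt(shape_b * sn_b)  # channel B: geometric mean of shape and S/N
    return (qca + qcb) / 2, qca, qcb


print(qc_score(1.0, 1.0, 0.25, 1.0))  # (0.75, 1.0, 0.5)
```

Using the geometric mean within a channel means a spot that fails badly on either shape or S/N gets a low channel score, even if the other measure is perfect.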
Quality Control, RI Plot
The RI plot, $\log(I_B/I_A)$ versus $\tfrac{1}{2}\log(I_A I_B)$, plotted for mean intensities, clearly shows low-intensity distortion due to background overestimation.
Data from an earlier slide, processed without the QC filter.
Quality Control
(data provided by E. Snesrud)
SpotFinder Flag Descriptions
A - Spot area is larger than 50 pixels
B - Spot area is between 30 and 50 pixels
C - Spot area is smaller than 30 pixels
X - Spot rejected by QC based on spot shape and spot intensity relative to surrounding background
U - Spot rejected (“flagged”) by user
Y - Bad spot: background is higher than spot intensity
Z - Spot was not detected by the program
S - Warning: some spot pixels are saturated
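The flag list can be read as a small decision procedure. The sketch below is illustrative only: the precedence (Z, then U, then Y, then X, then the area classes A/B/C) and the inclusive 30 to 50 pixel range for B are assumptions, and the saturation warning S is returned separately because it can accompany any other flag. All names are hypothetical.

```python
def spot_flag(area, detected=True, user_flagged=False,
              bkg_above_spot=False, qc_rejected=False):
    """Return one flag letter for a spot in one channel.

    Assumptions: precedence Z > U > Y > X > area classes, and B covers 30..50 inclusive.
    """
    if not detected:
        return "Z"  # spot was not detected by the program
    if user_flagged:
        return "U"  # rejected by the user
    if bkg_above_spot:
        return "Y"  # background higher than spot intensity
    if qc_rejected:
        return "X"  # rejected by QC (shape, intensity vs. background)
    if area > 50:
        return "A"
    if area >= 30:
        return "B"
    return "C"


def saturation_warning(saturation_factor):
    """'S' if any spot pixel is saturated (saturation factor below 1.0), else ''."""
    return "S" if saturation_factor < 1.0 else ""


print(spot_flag(60), spot_flag(40), spot_flag(12), saturation_warning(0.9))
```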
Output data (.mev) per spot:
UID      Unique identifier for this spot
IA       Intensity value in channel A
IB       Intensity value in channel B
R        Row (slide row)
C        Column (slide column)
MR       Meta-row (block row)
MC       Meta-column (block column)
SR       Sub-row
SC       Sub-column
FlagA    TIGR Spotfinder flag value in channel A
FlagB    TIGR Spotfinder flag value in channel B
SA       Actual spot area (in pixels)
SF       Saturation factor
QC       Cumulative quality control score
QCA      Quality control score in channel A
QCB      Quality control score in channel B
BkgA     Background value in channel A
BkgB     Background value in channel B
SDA      Standard deviation of the spot pixels in channel A
SDB      Standard deviation of the spot pixels in channel B
SDBkgA   Standard deviation of the background in channel A
SDBkgB   Standard deviation of the background in channel B
MedA     Median intensity value in channel A
MedB     Median intensity value in channel B
MNA      Mean intensity value in channel A
MNB      Mean intensity value in channel B
X, Y     X and Y coordinates of the spot cell
PValueA  P-value in channel A
PValueB  P-value in channel B
DBID     Database ID (if UID is substituted)
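For downstream analysis, the per-spot table can be loaded into one record per spot. The sketch below assumes a tab-delimited file in which comment lines start with `#` and the first remaining line is a header using the column names above; these format details, and which columns are integers versus floats, are assumptions rather than a description of the official .mev specification.

```python
import csv

# Assumed column types; any column not listed stays a string (UID, FlagA, FlagB, DBID, ...).
INT_FIELDS = {"R", "C", "MR", "MC", "SR", "SC"}
FLOAT_FIELDS = {"IA", "IB", "SA", "SF", "QC", "QCA", "QCB", "BkgA", "BkgB",
                "SDA", "SDB", "SDBkgA", "SDBkgB", "MedA", "MedB", "MNA", "MNB",
                "X", "Y", "PValueA", "PValueB"}


def _convert(key, value):
    """Convert one cell; empty or missing cells become None."""
    if value is None or value == "":
        return None
    if key in INT_FIELDS:
        return int(value)
    if key in FLOAT_FIELDS:
        return float(value)
    return value


def read_mev(text):
    """Parse tab-delimited .mev-style text into a list of per-spot dicts.

    Assumptions: '#' starts a comment line; the first other line is the header.
    """
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    return [{key: _convert(key, value) for key, value in record.items()}
            for record in csv.DictReader(lines, delimiter="\t")]
```

Usage: `read_mev(open(path).read())` returns dicts such as `{"UID": "1", "IA": 1200.5, ...}`, from which a log ratio per spot can be computed from `IA` and `IB`.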