

IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, VOL. 8, NO. 2, JUNE 2004

A Prototype for Unsupervised Analysis of Tissue Microarrays for Cancer Research and Diagnostics

Wenjin Chen, Michael Reiss, and David J. Foran, Member, IEEE

Abstract: The tissue microarray (TMA) technique enables researchers to extract small cylinders of tissue from histological sections and arrange them in a matrix configuration on a recipient paraffin block such that hundreds can be analyzed simultaneously. TMAs offer several advantages over traditional specimen preparation by maximizing limited tissue resources and providing a highly efficient means for visualizing molecular targets. By enabling researchers to reliably determine the protein expression profile for specific types of cancer, it may be possible to elucidate the mechanism by which healthy tissues are transformed into malignancies. Currently, the primary methods used to evaluate arrays involve the interactive review of TMA samples while they are viewed under a microscope and subjectively evaluated and scored by a technician. This process is extremely slow, tedious, and prone to error. To facilitate large-scale, multi-institutional studies, a more automated and reliable means for analyzing TMAs is needed. We report here a web-based prototype which features automated imaging, registration, and distributed archiving of TMAs in multiuser network environments. The system utilizes a principal color decomposition approach to identify and characterize the predominant staining signatures of specimens in color space. This strategy was shown to be reliable for detecting and quantifying the immunohistochemical expression levels for TMAs.

Index Terms: Automated analysis, breast cancer, tissue microarrays (TMAs).

I. INTRODUCTION

The tissue microarray (TMA) represents a powerful new technology designed to efficiently and economically assess the expression of proteins or genes across large sets of tissue specimens assembled onto a single glass microscope slide [1], [2]. This new technique should not be confused with DNA microarrays, in which each tiny spot on the grid represents a unique cloned complementary DNA (cDNA) or oligonucleotide. One of the advantages of TMA technology is that it allows amplification of limited tissue resources by providing the means for producing large numbers of small core biopsies, rather than a single section. Using this technology, a carefully planned array can be constructed with cases from pathology tissue block archives, such that a 20-year survival analysis can be performed on a cohort of 600 or more patients using only a few microliters of antibody. Another major advantage of the TMA technique is that each specimen is treated in an identical manner. Consequently, reagent concentrations are consistent across discs within each TMA specimen, as are the incubation times, temperatures, and washing conditions. Using conventional protocols, a study composed of 300 tissue samples would involve processing 300 slides, which is at least 20 batches of 15 slides. Using TMAs, the entire cohort can be processed on a single slide.

Manuscript received December 15, 2002; revised October 3, 2003. This work was supported in part by NIH Contract 1 RO1 LM007455-01A1 from the National Library of Medicine, and in part by the Cancer Institute of New Jersey.

W. Chen and D. J. Foran are with the Center of Biomedical Imaging and Informatics, University of Medicine and Dentistry of New Jersey, Piscataway, NJ 08854 USA (e-mail: [email protected], [email protected]).

M. Reiss is with the Cancer Institute of New Jersey, University of Medicine and Dentistry of New Jersey, New Brunswick, NJ 08901 USA (e-mail: [email protected]).

Digital Object Identifier 10.1109/TITB.2004.828891

Currently, the primary methods used to evaluate the arrays involve manual interactive review of TMA samples while they are subjectively evaluated and scored. An alternative, but less utilized, approach for evaluation is to sequentially digitize each specimen for subsequent semiquantitative assessment [3]. Both procedures ultimately involve the interactive evaluation of TMA samples, which is a slow, tedious process that is prone to error. For many researchers and technicians, even simply navigating among the regularly arranged tissue cores under a microscope is difficult, because it is hard to keep track of one's current disc position within the array. This is especially problematic at high magnifications. To address these issues, the system that we have developed features computer-assisted navigation tools for traversing the specimen.

Beyond the algorithmic and software development that is required for analyzing TMAs, reliable tools are also needed to facilitate large-scale multisite collaboration for a broad spectrum of research and clinical activities including tissue banking, proteomics, and outcome studies. Future progress in several key areas will rely upon the capacity of individuals to dynamically acquire, share, and assess microarrays and correlated data.

We have developed a web-based prototype for automatically imaging, analyzing, and archiving TMAs. The system consists of a robotic microscope interfaced with a JAVA-based microcontroller and an imaging workstation. The software is both platform- and operating-system-independent, and with minor modifications can interface with any commercially available robotic microscopy equipment. The system utilizes a combination of sophisticated image processing and pattern recognition strategies to coregister specimens while the software directs a robotic microscope to systematically image specimens at multiple optical magnifications, delineate array discs, extract spectral and spatial signatures of the specimen, and populate local and/or distributed relational databases with the resulting data, including pointers to imaged arrays. The prototype features both stand-alone and network modes. A visually intuitive interface was developed to enable local and remote users to manipulate digitized arrays, to facilitate reorganization of specimens in support of new experiments, and to provide a means for data assimilation. The system utilizes color decomposition of the staining characteristics of specimens to reliably detect and quantify the immunohistochemical expression levels of tissue discs within the array.

1089-7771/04$20.00 © 2004 IEEE

II. MATERIALS

Four TMAs consisting of approximately 130 discs each were generated from 57 primary breast carcinomas and 16 associated normal breast tissue specimens. All specimens were obtained from women under the age of 35. Consecutive microarray sections were stained with antibodies for phosphorylated Smad2, and Smad4, and counterstained with hematoxylin. A fourth slide was stained with hematoxylin alone. All tissue array discs were evaluated by pathologists under a microscope, as reported in a previous study [4].

The prototype TMA analysis system was developed using an Olympus AX70 microscope (Olympus America Inc., Melville, NY) equipped with a Prior six-way robotic stage and motorized turret (Prior Scientific, Inc., Rockland, MA). The server workstation was developed using a standard Pentium II computer, equipped with 256 MB of random access memory (RAM) and a Windows 98 operating system (Microsoft, Corp., Redmond, WA). The TMA software automatically images and digitizes the tissue microarrays using an Olympus DC330 720-line, 3-chip video camera (Olympus America Inc., Melville, NY) and a Flashpoint 128 high-resolution frame grabber (Integral Technologies, Inc., Indianapolis, IN).

A JAVA-based microscope controller was developed to allow remote users to control the movement of the stage, the selection of objectives, the illumination conditions, the shutter speed and gain of the video camera, and an entropy-based autofocusing option which was developed for the system. During the entire course of these operations, remote users are provided with updated broadcasts of the digitized specimen across networks (Internet, local area network, or wide area network) while it is manipulated.

Middleware was developed to provide communications among the robotics, the image processing modules, and an Oracle 8i Database Management System (Redwood City, CA). The database resides on a networked Pentium III computer equipped with 256 MB of RAM and a Windows NT 4.0 workstation operating system (Microsoft, Corp., Redmond, WA).

Based upon preliminary experiments, it is recommended that client computers be equipped with a minimum of 64 MB of RAM and a clock speed of at least 200 MHz. The client application has been successfully tested on Windows (Microsoft, Corp., Redmond, WA), Solaris (Sun Microsystems, Inc., Santa Clara, CA), RedHat Linux (RedHat, Inc., Raleigh, NC), and Macintosh OS X (Apple, Cupertino, CA).

III. METHODS

A. Unsupervised Array Registration

In order to develop a reliable means for performing unsupervised registration of arrays, it was necessary to devise an algorithm which could accurately extract the exact grid location of each disc throughout the specimen. To achieve this objective, the system performs a low-resolution pilot scan while generating an image map of the array. During the course of the scan, the system automatically locates, delineates, and indexes each disc using column and row indexes.

1) Protocol for Unsupervised Array Registration: The optical and mechanical components of the system are calibrated to ensure accurate stage locations and measurements. Slight errors in lens cofocal and cocentering alignment are compensated for using empirical data derived from previous experiments.

A quilted digital version of the array is automatically generated by the system using slightly overlapping frames of consecutive optical fields to stitch together the composite image. Image quality was maintained throughout the studies using entropy-based autofocusing.
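The text does not define the entropy measure behind the autofocusing option. The sketch below assumes one common reading: the Shannon entropy of the gray-level histogram is computed for every frame of a focal stack, and the frame with the highest entropy is taken as the best-focused one. The function names and the tiny focal stack are hypothetical and are not taken from the authors' software.

```python
import math
from collections import Counter


def image_entropy(pixels):
    """Shannon entropy (in bits) of the gray-level histogram of a flat pixel list."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def best_focus_index(z_stack):
    """Index of the frame with the highest histogram entropy (assumed focus criterion)."""
    scores = [image_entropy(frame) for frame in z_stack]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical focal stack of three 8-pixel frames: flat, two-level, and fully spread.
blurred = [128] * 8
partly_focused = [120, 120, 130, 130, 120, 120, 130, 130]
sharp = [0, 255, 64, 192, 32, 224, 96, 160]
print(best_focus_index([blurred, partly_focused, sharp]))
```

In a real scan the frames would come from the camera at successive stage heights; only the scoring rule is illustrated here.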

The size of tissue discs in the image map is computed using the core diameter of the physical array, the microscope magnification, and the scan settings. A disc template is automatically generated by encoding 1s onto the area corresponding to specimen discs. The template is completed by inserting a two-pixel-wide boundary of 1s around the cluster of 1s [5].
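A minimal sketch of the template construction follows. It assumes that magnification and scan settings can be folded into a single micrometers-per-pixel factor, and that the two-pixel boundary carries the opposite sign to the disc interior so that it penalizes tissue outside the disc; the extracted text does not preserve the sign, so `ring_value` is left as a parameter. All names and the numeric values are hypothetical.

```python
import math


def disc_diameter_px(core_diameter_um, um_per_pixel):
    """Disc diameter in map-image pixels; um_per_pixel is an assumed combined scale factor."""
    return round(core_diameter_um / um_per_pixel)


def make_disc_template(diameter_px, ring_px=2, ring_value=-1):
    """Square template: 1 inside the disc, ring_value in a ring_px-wide boundary, 0 elsewhere."""
    radius = diameter_px / 2.0
    half = math.ceil(radius) + ring_px
    size = 2 * half + 1
    template = []
    for y in range(size):
        row = []
        for x in range(size):
            distance = math.hypot(x - half, y - half)
            if distance <= radius:
                row.append(1)
            elif distance <= radius + ring_px:
                row.append(ring_value)
            else:
                row.append(0)
        template.append(row)
    return template


# Hypothetical numbers: a 600 um core imaged at 50 um per map pixel.
diameter = disc_diameter_px(600, 50)
template = make_disc_template(diameter)
print(diameter, len(template))
```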

Disc centroids are identified by convolving the map image with the disc template and then applying a Mexican-hat operator and spatial filtering to ensure that only one centroid point is generated for each disc.
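The centroid search can be sketched as template matching followed by a step that keeps one point per disc. In the sketch below the Mexican-hat operator and spatial filtering are replaced by a simpler stand-in, greedy non-maximum suppression with a minimum spacing, so this is not the authors' filter. For a symmetric template, correlation and convolution give the same response. The function names, the toy image, and the thresholds are hypothetical.

```python
import math


def correlate(image, template):
    """Valid-mode cross-correlation of a 2-D list image with a 2-D list template."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    return [[sum(image[y + j][x + i] * template[j][i]
                 for j in range(th) for i in range(tw))
             for x in range(iw - tw + 1)]
            for y in range(ih - th + 1)]


def find_centroids(image, template, min_score, min_dist):
    """Strongest responses first; drop any candidate closer than min_dist to a kept one."""
    th, tw = len(template), len(template[0])
    response = correlate(image, template)
    candidates = sorted(((score, y + th // 2, x + tw // 2)
                         for y, row in enumerate(response)
                         for x, score in enumerate(row) if score >= min_score),
                        reverse=True)
    centroids = []
    for _, cy, cx in candidates:
        if all(math.hypot(cy - py, cx - px) >= min_dist for py, px in centroids):
            centroids.append((cy, cx))
    return sorted(centroids)


# Toy map image with two 3x3 "discs" centered at (row 4, col 4) and (row 4, col 14).
image = [[0] * 19 for _ in range(9)]
for cy, cx in [(4, 4), (4, 14)]:
    for y in range(cy - 1, cy + 2):
        for x in range(cx - 1, cx + 2):
            image[y][x] = 1
# 5x5 template: 3x3 core of 1s with a one-pixel ring of -1s (ring sign is an assumption).
small_template = [[-1] * 5] + [[-1, 1, 1, 1, -1] for _ in range(3)] + [[-1] * 5]
print(find_centroids(image, small_template, min_score=5, min_dist=4))
```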

The grid structure of the microarray is determined by detecting the alignment of candidate disc locations, as described above, and utilizing a modified Hough transformation, as described in Section IV.

2) Alignment of Candidate Discs: One way to detect straight lines in Cartesian coordinate systems is to map the candidate line defined by each pair of candidate points into a point in Hough space [6]. Utilizing two-dimensional (2-D) peak detection algorithms, it is then possible to identify points which exhibit local maxima corresponding to lines in the original image.

Due to the grid-like appearance of TMAs, when the resulting Hough space is projected onto the angle axis, two peaks can be detected 90° from one another. These correspond to the column and row orientations (Fig. 1), i.e., the overall rotation of the array. The intercepts are then computed by least-squares fitting. By doing so, the 2-D peak detection process is simplified into two one-dimensional operations, and computation is thereby reduced.
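The angle projection can be illustrated with a simplified stand-in for the modified Hough transformation: every pair of candidate centroids votes for the direction of the line joining them, directions are folded modulo 90° so that row and column votes reinforce each other, and the most popular bin gives the array rotation. This is a sketch under those assumptions, not the authors' implementation; the function name, the bin width, and the synthetic grid are hypothetical, and the least-squares intercept step is omitted.

```python
import math
from collections import Counter


def grid_rotation_deg(points, bin_deg=1.0):
    """Estimate array rotation (0 <= angle < 90) from pairwise direction votes."""
    n_bins = round(90 / bin_deg)
    votes = Counter()
    for i in range(len(points)):
        x1, y1 = points[i]
        for j in range(i + 1, len(points)):
            x2, y2 = points[j]
            # Row and column directions are 90 degrees apart, so fold angles modulo 90.
            angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 90.0
            votes[round(angle / bin_deg) % n_bins] += 1
    best_bin, _ = votes.most_common(1)[0]
    return best_bin * bin_deg


# Synthetic 4 x 5 grid of centroids, 50 pixels apart, rotated by 10 degrees.
theta = math.radians(10)
grid_points = []
for r in range(4):
    for c in range(5):
        x, y = 50 * c, 50 * r
        grid_points.append((x * math.cos(theta) - y * math.sin(theta),
                            x * math.sin(theta) + y * math.cos(theta)))
print(grid_rotation_deg(grid_points))
```

Voting over pairs costs time quadratic in the number of discs, which is acceptable for arrays of a few hundred cores.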

B. Principal Color and Color Decomposition

As in most immunohistochemical applications, the TMA images in our studies exhibit multiple shades and combinations of two or more distinct dyes. In this particular example, the specimens were stained with diaminobenzidine (DAB) chromogen and counterstained with hematoxylin.

Fig. 2 shows the plots for approximately 500 000 pixel colors which were randomly selected from a representative TMA image. When viewing Fig. 2, please note that each of the plots (a, b, c) is based upon a projection of a three-dimensional (3-D) distribution; therefore, those color points which lie closest to the point of viewing tend to obscure those which are located behind them. Each of the plots has been rotated to provide an optimal viewing angle in order to demonstrate the degree of separation of colors in each of the three color spaces [red, green, blue (RGB), $L^*u^*v^*$, and $L^*h^*c^*$ [7]]. The hue axis in Fig. 2(c) is shown in polar representation; thus, the increment at the highest end of the plot actually connects with the first peak. To exploit the excellent color separation illustrated in Fig. 2(c), the principal color vector (PCV) determinations and color decompositions are performed in color space as described below.

Fig. 1. Unsupervised array registration. (a) Superimposed onto the map image, the stars show the detected candidate center points for each disc, and the intersecting lines show the orientation of rows and columns. (b) Angle-count diagram showing two peaks 90° apart corresponding to the row and column orientations within the array.

Fig. 2. Color characteristics of an immunohistochemically stained TMA specimen shown in different color spaces. (a) RGB color space. (b) $L^*u^*v^*$ color space showing spreads of color points. (c) $L^*h^*c^*$ color representation. (d) Another view angle of $L^*u^*v^*$ color space. Please note that the solid lines in (b) and (d) show the PCVs, and the thickness of the color superplane in inset (d) is mainly caused by the fact that images are acquired in 16-bit color instead of 24-bit.

We have developed protocols to detect and characterize the staining signatures for each dye within the specimen based upon a vector decomposition in color space. As shown in Fig. 2(b) and (d), color vectors that were generated for a representative TMA specimen which had been stained with DAB and hematoxylin distribute along a hyperplane in $L^*u^*v^*$ color space [7]. By performing a polar transformation of the color space plot and identifying the two principal peak colors in this representation as the PCVs, the stain signatures of each dye can be extracted. This is accomplished by decomposing all color vectors according to the two PCVs and a third vector which lies perpendicular to both PCVs. The protocols for PCV extraction and color decomposition are as follows.

Fig. 3. Example results of color decomposition. (a) One subfield of an original disc image from a DAB/hematoxylin stained specimen. (b) The DAB staining map of (a). (c) The hematoxylin staining map of (a). Please note that some hematoxylin stained nuclei (Arrow 1) are absent in the DAB staining map and some DAB stained nuclei (Arrow 2) are absent in the hematoxylin map. The majority of nuclei, however, bear a combination of the two stains (Arrow 3). (d) Subfield of an original disc image from a NovaRed/hematoxylin stained specimen. (e) The NovaRed staining map. (f) The hematoxylin staining map.

1) Protocol for Identifying PCVs:

• Software developed in JAVA directs the robotic microscope and image acquisition module to systematically extract representative chromatic information from pixels throughout the TMA. During feasibility studies, 500 000 RGB color vectors were generated for a representative imaged array. The dataset consisted of 140 images (1368 × 1232 pixels each), each corresponding to six digitally quilted microscopic fields of the specimen. The images were automatically captured at an instrument setting of 10×.
• The algorithm computes the average background color and subtracts it from each RGB color vector to produce background-corrected values for each RGB vector. Any pixel which has an adjusted RGB value which falls outside of the 0–255 range is assigned the white value 255.
• All adjusted color vectors are converted to $L^*u^*v^*$ color space and subsequently mapped into $L^*h^*c^*$ space [7].
• During feasibility studies, peak detection was applied to the resulting plot, and the corresponding values for 20 color points on each peak were averaged to produce the two PCVs. Regression analysis is currently being integrated into the algorithms to further improve accuracy and reproducibility.
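The mapping formulas did not survive in this copy of the text, so the sketch below assumes the conventional polar form of the chromatic plane, with hue taken as the angle of $(u^*, v^*)$ and chroma as its length, and it picks the two principal hues as circular local maxima of a hue histogram. The wrap-around handling reflects the remark that the top of the hue axis connects with the first peak. Function names, bin width, and sample hues are hypothetical; averaging the 20 color points on each peak is not shown.

```python
import math


def hue_chroma(u, v):
    """Assumed polar form of (u*, v*): hue in degrees within [0, 360), chroma >= 0."""
    return math.degrees(math.atan2(v, u)) % 360.0, math.hypot(u, v)


def hue_peaks(hues_deg, bin_deg=10, n_peaks=2):
    """Centers of the n_peaks strongest circular local maxima of the hue histogram."""
    n_bins = 360 // bin_deg
    hist = [0] * n_bins
    for hue in hues_deg:
        hist[int(hue // bin_deg) % n_bins] += 1
    # hist[i - 1] wraps to the last bin when i == 0, so the histogram is treated as circular.
    peaks = [i for i in range(n_bins)
             if hist[i] > 0 and hist[i] >= hist[i - 1] and hist[i] > hist[(i + 1) % n_bins]]
    peaks.sort(key=lambda i: hist[i], reverse=True)
    return [(i + 0.5) * bin_deg for i in peaks[:n_peaks]]


# Hypothetical hue samples: one cluster straddling the 0/360 wrap, one near 205 degrees.
hues = [355.0] * 30 + [5.0] * 10 + [345.0] * 10 + [205.0] * 20 + [195.0] * 5 + [215.0] * 5
print(hue_peaks(hues))
```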

2) Protocol for Color Decomposition:

• The color values are corrected based upon absolute white.
• In order to decompose the 3-D color vector, three vectors are required; therefore, a third vector is determined perpendicular to both PCVs.
• Each adjusted color from Step 2.2 is decomposed into components along the two PCVs and the third vector, and the resulting PCV components are indexed into a data structure which is consistent with the original RGB color vector. The component along the third vector has been determined to be negligible.
• For each disc within the imaged array, the PCV components for each pixel correspond to the staining signatures of the two dyes. Please see Section IV for performance data.
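The decomposition step amounts to solving a small linear system. The sketch below assumes the third vector is the cross product of the two PCVs and solves for the three coefficients with Cramer's rule; the symbols, function names, and example vectors are hypothetical because the original equations are not legible in this copy.

```python
def cross(a, b):
    """Cross product of two 3-vectors; perpendicular to both inputs."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def triple(a, b, c):
    """Scalar triple product a . (b x c), i.e. the determinant with columns a, b, c."""
    bc = cross(b, c)
    return a[0] * bc[0] + a[1] * bc[1] + a[2] * bc[2]


def decompose(color, pcv1, pcv2):
    """Coefficients (a, b, c) such that color = a*pcv1 + b*pcv2 + c*(pcv1 x pcv2)."""
    pcv3 = cross(pcv1, pcv2)
    det = triple(pcv1, pcv2, pcv3)  # equals |pcv1 x pcv2|^2, nonzero unless PCVs are parallel
    if det == 0:
        raise ValueError("principal color vectors must not be parallel")
    return (triple(color, pcv2, pcv3) / det,
            triple(pcv1, color, pcv3) / det,
            triple(pcv1, pcv2, color) / det)


# Hypothetical PCVs and a color that is an exact mixture 2*pcv1 + 3*pcv2.
pcv1 = (1.0, 2.0, 0.5)
pcv2 = (0.0, 1.0, 3.0)
mixed = tuple(2 * p + 3 * q for p, q in zip(pcv1, pcv2))
print(decompose(mixed, pcv1, pcv2))
```

For a color lying in the plane of the two PCVs, the third coefficient comes out as zero, which matches the observation that the out-of-plane component is negligible.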

IV. RESULTS

A. Unsupervised Registration

Currently there are relatively few facilities which have the necessary expertise and specialized equipment to create TMAs. Consequently, histological samples are often forwarded from satellite locations to specialized centers for microarray preparation. In order to enable other researchers and clinicians to utilize centralized resources located at the Cancer Institute of New Jersey while providing quick access to image data, a distributed telemicroscopy subsystem [8] was integrated with the TMA analysis prototype described above. This hybrid system enables individuals to operate the systems from remote locations. Whether the user is located locally or is logged in from a remote site, once the microarray has been loaded onto the microscope stage and the "Registration" command is issued, the robotic microscope automatically begins acquiring digital images of slightly overlapping frames of the TMA sample in a raster pattern.
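The raster acquisition of slightly overlapping frames can be sketched as a list of stage offsets. The overlap fraction, the field dimensions, and the function name below are assumptions made for illustration; the paper does not state the values used by the system.

```python
import math


def raster_positions(width_um, height_um, field_w_um, field_h_um, overlap=0.1):
    """Top-left stage offsets, row by row, so that overlapping fields cover the whole area."""
    step_x = field_w_um * (1 - overlap)
    step_y = field_h_um * (1 - overlap)
    cols = max(1, math.ceil((width_um - field_w_um) / step_x) + 1)
    rows = max(1, math.ceil((height_um - field_h_um) / step_y) + 1)
    return [(c * step_x, r * step_y) for r in range(rows) for c in range(cols)]


# Hypothetical scan: a 1000 x 500 um region, 400 x 300 um fields, 25% overlap.
positions = raster_positions(1000, 500, 400, 300, overlap=0.25)
print(len(positions), positions[-1])
```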

The grid recovery module of the software, which was described in Section III, was shown to automatically correct for mechanical distortions while accurately inserting place-holders at any location within the specimen where discs had been dislodged during the physical processing stage. The system also provided reliable reporting for all of the discs within the microarray which exhibited no expression. Within the graphical user interface, true discs and place-holders are encircled with different colors to distinguish one from the other [5]. To test the prototype system, remote registration was performed on four TMA sample slides taken from the same recipient block. Whether a user operates the microscope locally or remotely, the client application receives the scaled frame images and automatically stitches them together, giving rise to a map image which subsequently undergoes the unsupervised registration protocol described in Section III. Within seconds, the registration results are displayed on the client interface while the recovered grid is superposed on the original map image.

B. Distributed Imaging

Throughout the performance studies, the unsupervised TMA registration protocol and the low-resolution pilot scan were able to reliably recover all grid rotations. Upon completion of the registration procedure, the software directs the robotic microscope to systematically digitize each tissue core at multiple magnifications, which are preselected by the user at the client side. Each imaged disc is automatically archived while an Oracle 8i database is updated with a pointer to the corresponding image file, which consists of a full-size, 24-bit color, noncompressed imaged disc. In the case of remote operation, the images are transmitted to the client computer using the TCP network protocol. Each image frame corresponds to one entire tissue disc at an instrument setting of 4× magnification. When higher magnifications are selected, multiple image frames are transmitted to the client and automatically stitched together by the software. Experiments conducted using a TMA with a 10 × 14 grid configuration and an instrument setting of 10× required six microscopic frames for each disc. Image data for the entire specimen occupied more than 600 MB of hard disk space. Distribution of the data can be accomplished through a direct Internet connection, or the information can be transferred to a recordable compact disk.
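The reported storage figure can be checked with simple arithmetic. The sketch assumes that each of the 10 × 14 discs is archived as one uncompressed 24-bit image of 1368 × 1232 pixels, the image size quoted for the feasibility dataset; whether every archived disc image has exactly that size is an assumption.

```python
def archive_size_bytes(grid_rows, grid_cols, width_px, height_px, bytes_per_pixel=3):
    """Uncompressed size of one stitched image per disc (24-bit color = 3 bytes per pixel)."""
    return grid_rows * grid_cols * width_px * height_px * bytes_per_pixel


total_bytes = archive_size_bytes(10, 14, 1368, 1232)
print(total_bytes, round(total_bytes / 2**20, 1))  # bytes, then mebibytes
```

Under these assumptions the total is about 708 million bytes, which agrees with the statement that the specimen occupied more than 600 MB.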

C. Quantitative Interpretation of Protein Expression

Utilizing the color decomposition strategy detailed in Section III, each of the discs within the arrays was split into DAB and hematoxylin staining maps based upon their profiles in color space. Fig. 3(a)–(c) shows an original stained section of a disc and the corresponding output images after they have undergone quantitative analysis. As shown, the staining characteristics of each nucleus appear as a continuous representation of staining intensity for each of the dyes. It is interesting to note that the protocol that we have reported is able to unveil and quantify the underlying staining characteristics of even those cells which suffer from visual masking due to the counterstaining.

The following measurements are automatically generated for each tissue disc: 1) integrated staining intensity, which is computed as the sum of the DAB staining intensity over the entire disc; 2) effective staining area, which counts only those pixels whose expression exceeds the threshold (the system labels discs which exhibit an effective staining area of more than 10 000 pixels as effectively stained discs); and 3) effective staining intensity, which is computed as the average staining intensity across the effectively stained pixels. Noneffectively stained discs are automatically assigned an effective staining intensity of 0.
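The three measurements can be sketched directly from their definitions. The threshold value is not legible in this copy, so it is left as a required parameter; the function name, the dictionary keys, and the sample intensities are hypothetical.

```python
def disc_measurements(dab_intensities, threshold, min_effective_area=10_000):
    """Per-disc measurements from a flat list of per-pixel DAB staining intensities."""
    integrated = sum(dab_intensities)
    effective = [value for value in dab_intensities if value > threshold]
    area = len(effective)
    effectively_stained = area > min_effective_area
    # Discs that are not effectively stained get an effective intensity of 0.
    effective_intensity = sum(effective) / area if effectively_stained else 0.0
    return {
        "integrated_intensity": integrated,
        "effective_area": area,
        "effectively_stained": effectively_stained,
        "effective_intensity": effective_intensity,
    }


# Toy disc of four pixels; the area cutoff is lowered so the example is small.
toy = disc_measurements([0, 20, 80, 60], threshold=50, min_effective_area=1)
print(toy)
```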

Using the measurements described above, we analyzed the available breast cancer arrays stained with Smad2, phosphorylated Smad2, and Smad4. The results of these experiments were consistent with those reported in previous human evaluation studies [4], with most of the cancer and control discs expressing Smad2 and Smad4, and phospho-Smad2 expression observed in both cancer and control tissue samples (see Table I). Due to the proliferation of epithelial tissue in these specimens, however, the effective staining areas and, hence, integrated staining intensities of cancer tissue discs were larger than those of normal tissue discs.

In the TMAs under study, two cores from each cancer specimen were arranged onto the array. The set of six columns of discs located on the left side of Fig. 1 is referred to as Set 1, and the set of six columns of discs located on the right side is labeled Set 2. Discs in Set 1 and Set 2 have one-to-one correspondences. Integrated staining intensity, effective staining area, and effective staining intensity of discs coming from the same source were plotted in Fig. 4 with Set 1 on the ordinate and Set 2 on the abscissa. Referring to the graph, data points between the two dashed lines were within 20% of each other. As can be seen in Fig. 4, whereas measurements for integrated staining intensity and effective staining area vary between discs originating from the same tissue source, effective staining intensities were well correlated, suggesting that tissue discs coming from the same tissue source tend to have similar levels of staining. Fig. 5 shows an example of a pair of tissue discs coming from the same tissue source having different effective staining areas but similar effective staining intensity, resulting in different integrated staining intensities. Please also note that, on the Smad4 specimen, tissue discs from Set 1 were shown to have a slightly higher staining level than those from Set 2, suggesting uneven staining of the specimen along the long axis. This observation was subsequently confirmed by human inspection after the phenomenon had been detected quantitatively.

TABLE I. STATISTICS ON MEASUREMENTS OF TISSUE ARRAY SPECIMEN

* The mean and standard deviation (number in parentheses) of effective staining intensity on effectively stained discs.

** Count of discs that are effectively stained over total number of tissue discs present on the specimen. When an entire or a large portion of a tissue disc is missing from the specimen, the disc is not counted.

Fig. 4. Correlation of tissue discs which came from the same cancer tissue source on measurements: (a) Integrated staining intensity. (b) Effective staining area. (c) Effective staining intensity. Points between the dashed lines are considered to be within 20% vicinity to each other (please see text).
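The 20% criterion for paired discs can be expressed as a small check. The paper does not give the exact definition of the dashed lines, so the sketch assumes that two measurements agree when their difference is at most 20% of the larger one; the function names and sample pairs are hypothetical.

```python
def within_vicinity(set1_value, set2_value, tolerance=0.20):
    """True when the paired values differ by at most `tolerance` of the larger magnitude."""
    larger = max(abs(set1_value), abs(set2_value))
    if larger == 0:
        return True
    return abs(set1_value - set2_value) <= tolerance * larger


def fraction_within(pairs, tolerance=0.20):
    """Share of (Set 1, Set 2) pairs that fall inside the assumed agreement band."""
    if not pairs:
        return 0.0
    return sum(within_vicinity(a, b, tolerance) for a, b in pairs) / len(pairs)


# Hypothetical effective staining intensities for four paired discs.
pairs = [(100, 85), (100, 75), (40, 44), (0, 0)]
print(fraction_within(pairs))
```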

D. Database Design

The Oracle 8i database used to house the TMA data consists of a physical specimen layer (PSL), a digital sample layer (DSL), and a quantification layer (QL).

The PSL of the database relates to the construction and preparation of the actual TMA sample. The data housed in this layer are referred to as the “array profile.” The fields contained in this layer include 1) the recipient array format information, e.g., array dimensionality, cylinder diameter, and interval; 2) the donor block information; and 3) the array construction information, which records the correspondence between each cylinder grid location and its donor. A visually intuitive array profile editor has been developed to facilitate the design, editing, and management of array profiles.

The DSL of the database stores pointers to the digital images of the constituent tissue discs archived at multiple resolutions, along with the correlated disc locations (within the TMA), scan settings at the time of acquisition, and corresponding image maps.

Since the TMA technique results in a standardized set of tissue samples, it provides an ideal data set for developing image processing and computer vision protocols [8] and for evaluating their reliability in performing quantitative immunohistochemistry. The third layer of the database, the QL,


Fig. 5. Due to heterogeneity of tissue composition, two discs that came from the same tissue source can present similar effective staining intensity but different effective staining area and integrated staining intensity. Shown in (a) and (b) are two discs that came from the same tissue source and were stained for Smad2; they have the same effective staining intensity (e) but different integrated staining intensities (c) due to differences in effective stained area (d).

supports automated segmentation and computation of protein expression levels across each disc, as specified in Section III.
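The three layers described above might be modeled roughly as follows. This is only an illustrative sketch using Python dataclasses, not the Oracle 8i schema itself: every class name, field name, and sample value is hypothetical, and the only structure taken from the text is that each layer is tied to a cylinder grid location.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

GridPos = Tuple[int, int]  # (row, column) of a cylinder on the recipient block


@dataclass
class ArrayProfile:
    """Physical specimen layer (PSL): how the TMA was constructed."""
    rows: int
    cols: int
    cylinder_diameter_mm: float
    interval_mm: float
    # Array construction information: grid location -> donor block ID.
    donors: Dict[GridPos, str] = field(default_factory=dict)


@dataclass
class DiscImage:
    """Digital sample layer (DSL): pointers to images, not the pixels."""
    position: GridPos
    image_paths: Dict[str, str]    # resolution label -> file path (hypothetical)
    scan_settings: Dict[str, str]  # acquisition settings at scan time


@dataclass
class DiscQuantification:
    """Quantification layer (QL): per-disc expression measurements."""
    position: GridPos
    integrated_intensity: float
    effective_area: float
    effective_intensity: float


def donor_for(profile: ArrayProfile, quant: DiscQuantification) -> Optional[str]:
    """Trace a measurement back to its donor block via the grid position."""
    return profile.donors.get(quant.position)


# Hypothetical 2 x 2 array with two populated positions.
profile = ArrayProfile(rows=2, cols=2, cylinder_diameter_mm=0.6, interval_mm=1.0,
                       donors={(0, 0): "donor-A", (0, 1): "donor-B"})
quant = DiscQuantification(position=(0, 1), integrated_intensity=1200.0,
                           effective_area=40.0, effective_intensity=30.0)
print(donor_for(profile, quant))
```

Keying every layer on the grid position is one simple way to let a quantified value be traced back to its donor block; a relational schema would express the same link with foreign keys.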

V. DISCUSSION AND FUTURE DIRECTIONS

Although some DNA microarray readers are capable of reading TMAs, automatic imaging and evaluation of TMA samples present a host of unique technical challenges. First, TMA samples often exhibit morphological irregularities. For example, aside from the overall rotation of the grid on the slide, tissue discs are sometimes shifted from regular grid positions as a result of mechanical deformations which can arise during construction of the physical TMA. Such artifacts are often introduced during the slicing stage of specimen preparation. In addition, discs occasionally become detached and fall out during preparation. To address these issues, it was necessary to develop a robust alignment algorithm which could reliably recover sample grids and compensate for detached discs.

Another important distinction which makes the analysis of TMAs especially challenging is that, unlike cDNA microarrays, which can generally be considered homogeneous across a given well, the discs comprising an array typically consist of heterogeneous stained tissues, which renders simple image analysis strategies inadequate for any meaningful assessment. The analysis is further complicated by the fact that, depending on the type of tumor or tissue section analyzed, the area of interest may represent nearly the entire disc or only a small percentage thereof. For example, a pancreatic carcinoma or lobular carcinoma of the breast with substantial dysplastic response may show stromal tissue representing a large percentage of the total area of the disc. If the goal of the assay is to determine epithelial cell expression of a given marker, a protocol must be used that evaluates only that component of the disc. The protocol must not only be able to identify the region of interest but must also perform normalization operations so that the expression level read from any given disc can be compared with those reported for others.

Modern cancer research will rely increasingly on large-scale immunohistochemical studies to reveal the underlying expression patterns of malignancies. However, such studies are difficult to conduct because of the current lack of standards in the preparation of tissue microarrays and the limited reliability of the available methods for quantitative analysis.

The heterogeneous nature of histological samples makes it necessary to utilize a number of different criteria in order to accurately evaluate immunohistochemical staining characteristics [9]. For example, at the specimen level, it is important to determine the integrated staining intensity, the stained area for which pixel color exceeds a specific threshold value, and the average staining intensity within the stained region. At the tissue level, it would be desirable to be able to reliably distinguish among the staining characteristics of each type of tissue and microanatomic structure while computing each of those staining intensities separately. At the cellular level, it is informative to report the proportion of positively stained cells. Although it was beyond the scope of this current study, work is underway in our lab to develop a set of optimized algorithms which can accurately report the exact location of staining antigens within the nucleus and cytoplasm, since both can potentially lead to further diagnostic and research insight.
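The three specimen-level measures named above can be illustrated with a short sketch. It assumes that a per-pixel stain intensity map has already been derived from the color image, that a pixel counts as stained when its value exceeds the threshold, and that the integrated intensity is summed over stained pixels only; the function name, the threshold, and the toy discs are hypothetical.

```python
from typing import List, Tuple


def staining_measures(intensity: List[List[float]],
                      threshold: float) -> Tuple[float, int, float]:
    """Return (integrated intensity, stained area in pixels, mean intensity).

    Assumption: `intensity` is a per-pixel stain intensity map, and only
    pixels whose value exceeds `threshold` contribute to any measure.
    """
    stained = [v for row in intensity for v in row if v > threshold]
    area = len(stained)
    integrated = sum(stained)
    mean_in_region = integrated / area if area else 0.0
    return integrated, area, mean_in_region


# Two hypothetical discs with the same mean intensity inside the stained
# region but different stained areas, as in the situation shown in Fig. 5.
disc_a = [[0.0, 0.5, 0.5], [0.0, 0.5, 0.5]]
disc_b = [[0.0, 0.0, 0.5], [0.0, 0.0, 0.5]]
print(staining_measures(disc_a, threshold=0.25))
print(staining_measures(disc_b, threshold=0.25))
```

Under these assumptions the integrated intensity equals the stained area multiplied by the mean intensity within the stained region, which is why two discs can agree on the third measure while differing on the first two.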

Our group is currently exploring the use of the color decomposition algorithms for a broader set of immunohistochemical staining agents. Recent feasibility studies showed that this approach was effective in analyzing histological specimens stained with Nova Red and Vector Red against hematoxylin [please refer to Fig. 3(d) and (e)], even though these stains exhibit closer color temperatures than the original DAB experiments which were detailed in this manuscript. One of the advantages of the decomposition algorithm is that it does not force each pixel within an imaged specimen to be classified as having


been stained by only one dye or another, as some recently reported color-thresholding approaches do [10]. The inherent limitation of those thresholding strategies is that they cause the nondominant color information to be lost. Furthermore, there is no proven method for setting thresholds reproducibly when using such methods. A limitation of the decomposition algorithm is that, since it separates colors based on characteristics of the constituent stains, it would not be appropriate for use with staining agents whose colors fluctuate as a function of changes in local chemical conditions across the specimen, e.g., Giemsa stains.
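The difference between keeping both stain contributions and forcing a single label per pixel can be illustrated with a generic two-stain linear unmixing. This sketch is a stand-in chosen for brevity and is not the decomposition algorithm used in this work: the stain vectors, the pixel value, and the function names are all hypothetical, and the least-squares formulation is an assumption.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def unmix_two_stains(pixel: Vec3, stain_a: Vec3, stain_b: Vec3) -> Tuple[float, float]:
    """Least-squares amounts (a, b) with pixel ~= a*stain_a + b*stain_b.

    Solves the 2x2 normal equations directly; negative amounts are clipped
    to zero as a simple approximation of a non-negativity constraint.
    """
    aa, bb, ab = dot(stain_a, stain_a), dot(stain_b, stain_b), dot(stain_a, stain_b)
    pa, pb = dot(pixel, stain_a), dot(pixel, stain_b)
    det = aa * bb - ab * ab
    if abs(det) < 1e-12:
        raise ValueError("stain vectors are (nearly) collinear")
    a = (pa * bb - pb * ab) / det
    b = (pb * aa - pa * ab) / det
    return max(a, 0.0), max(b, 0.0)


def hard_label(pixel: Vec3, stain_a: Vec3, stain_b: Vec3) -> str:
    """One-stain-per-pixel labeling, shown only for contrast."""
    a, b = unmix_two_stains(pixel, stain_a, stain_b)
    return "A" if a >= b else "B"


# Hypothetical stain vectors and a pixel that mixes both stains.
stain_a: Vec3 = (1.0, 1.0, 0.0)
stain_b: Vec3 = (0.0, 1.0, 1.0)
pixel: Vec3 = (2.0, 3.0, 1.0)  # equals 2*stain_a + 1*stain_b

print(unmix_two_stains(pixel, stain_a, stain_b))  # both contributions kept
print(hard_label(pixel, stain_a, stain_b))        # nondominant stain discarded
```

In this toy case the unmixing reports both stain amounts, whereas the hard label records only the dominant stain, so the contribution of the second stain is lost.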

Throughout the course of our study, both cancer and control specimens stained positively; however, cancer tissue discs were shown to have larger effective staining area and integrated staining intensity. Our experiments further showed that tissue discs originating from the same tissue source had correlated effective staining intensity measures. One specimen used in our experiment presented slightly uneven staining along the long axis of the slide; this was reflected in the results of the correlation study. In order to account accurately for such situations in future experiments, we recommend that positive control tissue cores be used throughout a given TMA specimen.

While we are encouraged by the results of the studies that we have conducted thus far, we are planning to continue testing and refining the algorithms that were used to quantify expression levels. In addition, we plan to explore the use of advanced computer vision techniques to discriminate among specific staining profiles in order to improve reliability in subcellular localization and in the comparison of nuclear and membranous staining patterns.

REFERENCES

[1] J. Kononen et al., “Tissue microarrays for high-throughput molecular profiling of tumor specimens,” Nature Med., vol. 4, pp. 844–844, 1998.

[2] D. L. Rimm, R. L. Camp, L. A. Charette, J. Costa, D. A. Olsen, and M. Reiss, “Tissue microarray: A new technology for amplification of tissue resources,” Cancer J., vol. 7, no. 1, pp. 24–31, 2001.

[3] N. R. Mucci, G. Akdas, S. Manely, and M. Rubin, “Neuroendocrine expression in metastatic prostate cancer: Evaluation of high throughput tissue microarrays to detect heterogeneous protein expression,” Human Pathology, vol. 31, no. 4, pp. 406–414, 2000.

[4] W. Xie, J. C. Mertens, D. J. Reiss, D. L. Rimm, R. L. Camp, B. G. Haffty, and M. Reiss, “Alterations of Smad signaling in human breast carcinoma are associated with poor outcome: A tissue microarray study,” Cancer Research, vol. 62, no. 2, pp. 497–505, Jan. 2002.

[5] W. Chen, D. J. Foran, and M. Reiss, “Unsupervised imaging, registration and archiving of tissue microarrays,” in Proc. AMIA 2002 Symp., pp. 136–139.

[6] J. C. Ross, Image Processing Handbook, 2nd ed. Boca Raton, FL: CRC, 1995.

[7] G. Wyszecki and W. S. Stiles, Color Science: Concepts and Methods, Quantitative Data and Formulae. New York: Wiley, 1982, pp. 165–168.

[8] D. J. Foran, D. Comaniciu, P. Meer, and L. A. Goodell, “Computer-assisted discrimination among malignant lymphomas and leukemia using immunophenotyping, intelligent image repositories, and telemicroscopy,” IEEE Trans. Inform. Technol. Biomed., vol. 4, pp. 265–273, Dec. 2000.

[9] T. Seidal, A. J. Balaton, and H. Battifora, “Interpretation and quantification of immunostains,” Amer. J. Surgical Pathology, vol. 25, no. 9, pp. 1204–1207, 2001.

[10] F. O. Ranelletti, G. Almadori, B. Rocca, G. Ferrandina, G. Ciabattoni, and A. Habib, “Prognostic significance of cyclooxygenase-2 in laryngeal squamous cell carcinoma,” Int. J. Cancer (Pred. Oncol.), vol. 95, pp. 343–349, 2001.

Wenjin Chen received the Bachelor of Medicine degree from Beijing Medical University, Beijing, China, in 1997. She is currently working toward the Ph.D. degree in computational molecular biology at the University of Medicine and Dentistry of New Jersey (UMDNJ), Piscataway, NJ.

Her research interests include biomedical imaging, image pattern recognition, biomedical instrumentation, and informatics.

Michael Reiss received the medical degree from the University of Amsterdam, The Netherlands, in 1976.

From 1977 to 1982, he served as a Resident in Internal Medicine at University Hospital Binnengasthuis and Academic Medical Center, Amsterdam, The Netherlands. He became a Fellow in the Medical Oncology Program at Yale University School of Medicine, New Haven, CT, in 1985. He served on the faculty of the Department of Internal Medicine, Yale University School of Medicine, from 1985 to 2000, and was recruited by the Cancer Institute of New Jersey (CINJ) at the University of Medicine and Dentistry of New Jersey-Robert Wood Johnson Medical School in January 2001, where he holds the rank of Professor in Medicine and in Molecular Genetics and Microbiology. He is the Director of the Breast Cancer Research Program, Director of the Tissue Microarray Shared Resource, and Codirector of Translational Research at CINJ. His expertise lies primarily in the area of the cellular and molecular biology of keratinocytes and human squamous carcinomas, as well as the role of TGFβ in human cancer.

David J. Foran (S’89–M’91) received the B.S. degree from Rutgers University, New Brunswick, NJ, in 1983, and the Ph.D. degree in biomedical engineering from the University of Medicine and Dentistry of New Jersey (UMDNJ) and Rutgers University, Piscataway, NJ, in 1992.

He served as a Physics Instructor at New Jersey Institute of Technology, Newark, NJ, from 1984 to 1985, and worked as a Junior Scientist at Johnson & Johnson Research, Inc., North Brunswick, NJ, from 1986 to 1988. He received one year of postdoctoral training at the Department of Biochemistry at UMDNJ-Robert Wood Johnson Medical School (RWJMS), in 1993. He joined the faculty at RWJMS in 1994, where he is currently an Associate Professor of Pathology and Radiology and the Director of the interdepartmental Center for Biomedical Imaging and Informatics. He also serves as the Associate Director for Research for the university-wide Informatics Institute. He is a Member of the Graduate Faculty in the Program in Computational Molecular Biology and Genetics and he is a Research Associate Professor at the Center for Advanced Information Processing, both at Rutgers University. His research interests include quantitative biomedical imaging, computer-assisted diagnosis, and medical informatics.