


Unsupervised image classification over supercomputers Kraken, Keeneland and Beacon

Xuan Shi (a)*, Miaoqing Huang (b), Haihang You (c), Chenggang Lai (b) and Zhong Chen (d)

(a) Department of Geosciences, University of Arkansas, Fayetteville, AR 72701, USA; (b) Department of Computer Science and Computer Engineering, University of Arkansas, Fayetteville, AR 72701, USA; (c) Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; (d) School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA

(Received 18 October 2013; accepted 21 April 2014)

The iterative self-organizing data analysis technique algorithm (ISODATA) was implemented over the supercomputers Kraken, Keeneland and Beacon to explore scalable and high-performance solutions for image processing and analytics using emerging advanced computer architectures. When 10 classes are extracted from one 18-GB image tile, the calculation can be reduced from several hours to no more than 90 seconds when 100 CPU, GPU or MIC processors are utilized. High-performance scalability tests were further implemented over Kraken using 10,800 processors to extract various numbers of classes from 12 image tiles totalling 216 gigabytes. As the first geospatial computations over GPU clusters (Keeneland) and MIC clusters (Beacon), the success of this research lays a solid foundation for exploring the potential of scalable and high-performance geospatial computation for the next generation of cyber-enabled image analytics.

Keywords: high-performance computing; image classification; GPU; MIC

1. Introduction

Satellite imagery and aerial photos have been a source of geospatial data that is extensively utilized in geospatial analytics for a variety of research activities related to the dynamics of coupled natural environment and human systems. With the rapid advance in sensor technology, many satellites, such as SPOT-5, IKONOS-2, QuickBird, WorldView-2, etc., can generate high spatial resolution images that are competitive with aerial photos (Jacobsen 2011). Such fast development and the accompanying increase in the amount of image data lead to a variety of research challenges in image processing and analytics using high spatial resolution remotely sensed data.

Today, advanced computer architectures and high-performance computing technologies have been utilized in processing high spatial resolution imagery and hyperspectral data with notable success. Scalability of image processing is often achieved through various applications of batch processing. However, there remains a great challenge in both geographic information science (GIScience) and remote sensing to achieve both high performance and scalability in many applications. This challenge is especially manifest when dependency between data and computation leads to specific communication requirements throughout the course of problem solving. High performance can often be achieved by deploying more computing resources to accelerate the calculation processes.

*Corresponding author. Email: [email protected]

GIScience & Remote Sensing, 2014, Vol. 51, No. 3, 321–338, http://dx.doi.org/10.1080/15481603.2014.920229

© 2014 Taylor & Francis


However, when data and the computing process have a strong dependency among different segments of data or computational procedures, solutions are not as easily scalable. Solutions that are applicable to processing small-scale data in batch mode may not generate quality products that are consistent and comparable with the product derived from a single process. Image classification is a prime example and the focus of this article. Although the quality of the classification output for each individual piece of data or imagery may be logical, the results when individual outputs are combined may not be consistent and correct. This is in direct contrast to the output generated from a single process: the classification result depends strongly on the global image data involved in the calculation process. Although algorithms for image classification may be improved to achieve high performance, many of them are not easily scalable to handle a larger study area of interest.

Cyberinfrastructure (CI), a term coined by the National Science Foundation (National Science Foundation 2007) covering advanced and powerful computing systems with scalable data storage, repositories and visualization environments, is a key resource to enable geospatial computation with scalability and high performance. Although CI is characterized by its massively parallel computing environment, geographic information systems (GIS) and remote sensing software have been mainly developed for desktop applications using sequential algorithms. A number of research challenges are arising in response to the emerging computing infrastructure, and the solution may have to be explored even at the interface between software and hardware, with significant software re-design and re-engineering expected.

Three supercomputers were deployed in this pilot study. Kraken was the first supercomputer to achieve a petaflop (10^15 floating point operations per second) capability for academic research and employs 112,896 processor cores. Keeneland was the first hybrid computer system to employ 240 central processing units (CPUs) and 360 graphics processing units (GPUs) in the Keeneland Initial Delivery System (KIDS). The Beacon system has 48 compute nodes, and each node is equipped with two 8-core processors and four many-integrated-core (MIC) Xeon Phi coprocessors (60 cores each), for a total of 768 (48 × 16) conventional cores and 11,520 (48 × 4 × 60) coprocessor or accelerator cores. We successfully implemented the iterative self-organizing data analysis technique algorithm (ISODATA) for unsupervised image classification over one 18-GB image tile at a spatial resolution of 0.5 meter through MPI + CPU, MPI + GPU and MPI + MIC solutions based on the message passing interface (MPI; Gropp, Lusk, and Skjellum 1994; Pacheco 1997; Snir et al. 1995), achieving high performance when 100 CPUs, GPUs or MICs were used. Although enabled by multithreading technology, ERDAS IMAGINE may require approximately 6.5 hours to read and classify one such image into 10 classes, while the hybrid MPI solutions can complete the same task in no more than 90 seconds.

The solution is scalable when sufficient computing resources are available. When 900 processors on Kraken were applied, unsupervised image classification of one 18-GB image tile could be accomplished in 6 seconds. Furthermore, we were able to classify 2, 4, 8 and 12 tiles of images, totalling up to 216 gigabytes, into 5, 10, 15 and 20 classes, which would be impractical using commercial remote sensing software over desktop computer systems.

This article will first review previous works on scalable and high-performance computing in remote sensing applications, and the arising scalability problem in image classification, to establish the research context. Research challenges in scalable and high-performance image processing and analytics are then identified, along with a review of the study area and the ISODATA algorithm, followed by an introduction to the computing infrastructures of Kraken, Keeneland and Beacon.


The high-performance solutions through MPI + CPU, MPI + GPU and MPI + MIC are then introduced with a performance analysis, followed by an introduction to the scalability benchmark tests. Future research directions are discussed in the conclusion.

2. Previous works and research context

Processing remotely sensed data is both data and computing intensive. Analysis of the huge amounts of data routinely associated with remotely sensed images has been a challenge to the research community. Efforts to explore scalable and high-performance solutions through the processing of large volumes of data can be traced back to the 1990s. Initially, researchers used multiple computers to process the data separately by running the same scripts or programs on an individual machine. Later, attempts involved the use of a local network to link multiple computers into a cluster, and grid systems were constructed by integrating remotely distributed homogeneous or heterogeneous computers and workstations (Foster and Kesselman 1999; Foster, Kesselman, and Tuecke 2001). In recent years, cloud computing has emerged, which is based on the concept of service computing by offering the infrastructure, platform and data as services (Armbrust et al. 2010; Yang et al. 2011). Advances in hardware development have transformed the computing infrastructure by shifting from homogeneous systems employing identical processing elements to hybrid computing architectures that employ multi-core processors in combination with special-purpose chips and accelerators, such as the GPU and Field-Programmable Gate Array (FPGA). New multi-core architectures combined with application accelerators hold the promise of increasing performance by exploiting levels of parallelism not supported by conventional systems.

Following the trajectory of the evolving cyberinfrastructure for scientific computation, many researchers have applied parallel and distributed computing technologies for scalable and high-performance data processing and analytics over remotely sensed data (Filippi et al. 2012; Zhang et al. 2013). Although considerable publications have documented the achievements of previous works, only a few are referenced in this article to highlight the literature in this domain. In earlier works, a few computer nodes and clusters were used to exemplify such capability (Wang et al. 2004; Zhang and Tsou 2009), with limited performance improvement. This included work on image classification on distributed workstations (Dhodhi et al. 1999) that achieved a speedup factor of only 8 on a 12-spectral-band image with a size of 512 × 512 pixels in each band. Following the trend in grid computing, researchers deployed grid systems for high-performance image processing and analytics, as well as for large-scale data storage, and also offered data and analytical services through grid systems (Nico, Fusco, and Linford 2003; Yang et al. 2004; Petcu et al. 2007). By deploying Microsoft's Azure cloud computing resources, researchers re-projected large satellite datasets that would otherwise take tens of months of continuous processing on a high-end quad-core desktop machine (Li et al. 2010). In hyperspectral image processing, cutting-edge GPU, FPGA and hybrid cluster technologies have been applied to achieve high-performance solutions (Valencia et al. 2008; Plaza and Chang 2008; Plaza, Plaza, and Vegas 2010; Paz and Plaza 2010; Gonzalez et al. 2013).

To the best of our knowledge, previous works only deal with small-scale data of tens or hundreds of megabytes, or one to two thousand pixels in each of the two dimensions in one band of a multi-band image.


In the case of hyperspectral data processing, a typical dataset discussed in previous works (Plaza and Chang 2008; Plaza, Plaza, and Vegas 2010; Paz and Plaza 2010) is 614 × 512 pixels with 224 spectral bands and a total size of 140 MB. As demonstrated by Li et al. (2010), a large amount of data can be processed with high-performance computing resources in batch mode only if there is no dependency between tasks. Today, the sizes of high spatial resolution imagery and aerial photos are often measured in gigabytes or terabytes. When hundreds of gigabytes of data are available for processing and analytics, scalable computing capability is critical to derive consistent, quality output products. One obvious constraint is that a computer may not have sufficient memory to process a large remote sensing image. A large image may be split into pieces (e.g., image tiles), and algorithms can be designed and run in batch mode to process each smaller piece of the image data. Unfortunately, if dependencies between image tile processes exist, the output generated from the separated processes will not be consistent.

A comparison between the results of unsupervised image classification over high spatial resolution imagery using (a) two separate processes versus (b) a single process completing the same task as a whole illustrates such between-tile dependencies (Figure 1). When the whole image is split into two pieces, the outputs of the two separate classification processes are not consistent and comparable with the result generated from a single classification process covering the entire image. When the entire image is used in a single classification process, pixel information from both pieces contributes to the classification through the global pixel statistics, which differs from the output products derived from two separate processes using only the local pixel values in each piece. Because the classification is based on the statistics of the image data, the pixel values in separate tiles, that is, local information, may not correctly represent the multiple tiles of imagery as a whole. As a result, the classification result is neither comparable nor consistent, either between the two separate but neighbouring images or between these two separate images and the merged single image (Figure 1).

Whereas previous works focused on techniques to achieve high performance for a sequential algorithm, a parallel algorithm has to be devised for an extremely large dataset, one that can scale up to tens of thousands of compute cores on a supercomputer. In this article, we introduce a highly parallel and high-performance solution for data processing and analytics over gigabytes of high-resolution satellite imagery and aerial photos, with consistent quality, on the supercomputers Kraken, Keeneland and Beacon hosted at the National Institute for Computational Sciences (NICS).

Figure 1. Inconsistency in unsupervised image classification (ISODATA as implemented in ERDAS Imagine 2010) through processing of distinct tiles in separate processes versus processing of merged imagery in a single process.


3. Research challenge in scalable and high-performance imagery processing and analytics

Consider an aerial image source covering the greater metropolitan area of Atlanta (Figure 2). The imagery has a spatial resolution of 0.5 meter and is stored and organized by tiles. In each tile, the imagery has three bands, each with a dimension of 80,000 × 80,000 pixels and covering 1600 km². The total number of pixels in each tile is 19,200,000,000, and the file size of each tile is about 18 GB when stored in ERDAS Imagine IMG format. The total amount of image data is over 500 gigabytes. At such a scale, image processing and analytics can hardly be implemented efficiently on a desktop computer. Computers with limited memory cannot even load and read the data, and although the latest desktop machines may have much larger memory and higher CPU clock speeds, it still takes a long time to process one tile of such large imagery.

We choose unsupervised image classification as the research theme in response to the scalability challenge in high-performance computing. It is also a foundation for other classification approaches (Congalton 2010), such as supervised classification and object-oriented classification (Weber and Langille 2007; Tang and Pannell 2009; Stine et al. 2010; Tenenbaum, Yang, and Zhou 2011). The iterative self-organizing data analysis technique algorithm (ISODATA; Jensen 1999) is one of the most frequently used algorithms for unsupervised image classification in remote sensing applications. The time complexity of the ISODATA algorithm is O(M*N), where M is the maximum number of iterations and N is the total number of objects (e.g., the total number of pixels in an image). It is a data-intensive algorithm whose computation is similar to level-one Basic Linear Algebra Subprograms (BLAS). For large-scale images, the size of the image can reach tens of gigabytes to terabytes and the number of pixels can reach billions. Therefore, a sequential implementation on a single desktop computer takes a very long time to finish even one iteration.

Figure 2. Data source and coverage of the study area.


For a parallel implementation using many nodes on a high-performance computer, the performance barrier is data input and output (I/O), especially when each MPI process uses the Geospatial Data Abstraction Library (GDAL) application programming interface (API; GDAL 2012) to read data from files at the initialization stage. At that point, I/O contention between the processes will cause serious performance degradation. In this article, we show that, with a minimum of performance-tuning effort and sufficient parallel I/O tuning on a Lustre parallel file system, an efficient MPI implementation can achieve good performance.
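As an illustration of this I/O pattern, the following minimal C sketch (our assumptions: the GDAL C API and MPI are available, the image is 8-bit, the file name is a placeholder, and the simple one-dimensional row decomposition is only for illustration rather than the exact partitioning used in the article) shows every MPI process opening the same image and reading only its own block of rows through GDALRasterIO.

#include <mpi.h>
#include <gdal.h>
#include <cpl_conv.h>                    /* CPLMalloc/CPLFree */

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    GDALAllRegister();
    /* every process opens the same file read-only; "tile.img" is a placeholder */
    GDALDatasetH ds = GDALOpen("tile.img", GA_ReadOnly);
    int xsize  = GDALGetRasterXSize(ds);
    int ysize  = GDALGetRasterYSize(ds);
    int nbands = GDALGetRasterCount(ds);

    /* simple row decomposition; rank 0 also absorbs the residual rows */
    int rows = ysize / nprocs;
    int y0   = rank * rows + (rank == 0 ? 0 : ysize % nprocs);
    if (rank == 0) rows += ysize % nprocs;

    /* read this rank's rows of every band into one local buffer */
    unsigned char *buf = (unsigned char *) CPLMalloc((size_t) rows * xsize * nbands);
    for (int b = 0; b < nbands; b++) {
        GDALRasterBandH band = GDALGetRasterBand(ds, b + 1);
        GDALRasterIO(band, GF_Read, 0, y0, xsize, rows,
                     buf + (size_t) b * rows * xsize, xsize, rows,
                     GDT_Byte, 0, 0);
    }

    /* ... local statistics and the ISODATA iterations would follow here ... */

    CPLFree(buf);
    GDALClose(ds);
    MPI_Finalize();
    return 0;
}

Because every process issues its own GDALRasterIO calls against the same file, the read requests contend for the underlying storage, which is exactly why the Lustre striping parameters discussed later matter.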

In general, ISODATA can be implemented in three steps: (1) calculate the initial mean vector of each class, (2) classify each pixel into the nearest class and (3) calculate the new class means based on all pixels in each class. The second and third steps are repeated until the change between two iterations becomes sufficiently small. To perform ISODATA classification, three parameters need to be specified: (1) the number of classes to be created, (2) the convergence threshold, i.e., the percentage of pixels whose class values must remain unchanged between iterations for the process to stop, and (3) the maximum number of iterations to be performed.
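A minimal serial sketch of these three steps in C is given below (our own illustration, not the authors' released code: pixels are assumed to be 8-bit and band-interleaved, the initial means are assumed to be seeded from the image statistics beforehand, and labels are assumed to be initialized to -1 by the caller). The parallel versions described later distribute exactly these loops over many cores.

#include <stdlib.h>
#include <string.h>

/* One ISODATA run over npix pixels with nb bands: assign each pixel to the
   nearest class mean, recompute the means, and stop when the fraction of
   unchanged pixels reaches the convergence threshold or max_iter is hit.   */
void isodata(const unsigned char *pix, long npix, int nb,
             int nclass, int max_iter, double threshold,
             double *mean /* nclass*nb, pre-seeded */, int *label /* npix, init -1 */)
{
    double *sum = malloc(sizeof(double) * nclass * nb);
    long   *cnt = malloc(sizeof(long) * nclass);

    for (int it = 0; it < max_iter; it++) {
        memset(sum, 0, sizeof(double) * nclass * nb);
        memset(cnt, 0, sizeof(long) * nclass);
        long unchanged = 0;

        for (long i = 0; i < npix; i++) {            /* step 2: nearest class mean */
            int best = 0;
            double bestd = 1e300;
            for (int c = 0; c < nclass; c++) {
                double d = 0.0;
                for (int b = 0; b < nb; b++) {
                    double diff = pix[i * nb + b] - mean[c * nb + b];
                    d += diff * diff;                /* squared Euclidean distance */
                }
                if (d < bestd) { bestd = d; best = c; }
            }
            if (label[i] == best) unchanged++;
            label[i] = best;
            cnt[best]++;
            for (int b = 0; b < nb; b++)
                sum[best * nb + b] += pix[i * nb + b];
        }

        for (int c = 0; c < nclass; c++)             /* step 3: new class means */
            for (int b = 0; b < nb; b++)
                if (cnt[c] > 0)
                    mean[c * nb + b] = sum[c * nb + b] / cnt[c];

        if ((double) unchanged / npix >= threshold)  /* convergence check */
            break;
    }
    free(sum);
    free(cnt);
}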

The initial class means are derived from the statistics of the original data sets, although the initial means can also be assigned arbitrarily (Jensen 1999). Throughout the iterative procedure, as long as the maximum number of iterations (M) and the convergence threshold (T) have not been reached, the means of all classes are recalculated, causing the class means to shift in the feature space. During the iterative calculations, each pixel is compared with the new class means and is assigned to the nearest class mean. During the classification process, each class is labelled as a certain type of object. The change between two consecutive iterations can be measured either as the percentage of pixels whose class labels have changed between the two iterations or as the accumulated distance by which the class means have moved in the feature space between the two iterations. The iterative process is terminated once either the maximum number of iterations or the convergence threshold (T) is reached or exceeded, which means that the change between two iterations is small enough, based on the percentage of pixels whose class values are unchanged between iterations.

We used ERDAS IMAGINE (Intergraph 2013), a well-known commercial software package for processing remotely sensed data, for comparison of unsupervised image classification quality and performance. IMAGINE was installed on a dual-core desktop computer with a 64-bit operating system. An Intel Xeon Processor W3535 with 6 GB RAM and a 2.8 GHz clock speed was utilized to process the 18-GB image of interest (Figure 3) in this research.

As introduced previously (Figure 1), the quality of the classification products may not be consistent and comparable if the data are processed separately in pieces. When pieces of data are merged and processed in a single procedure, the statistics of the whole dataset differ from those of the separate pieces. Such a difference changes the calculation of the class means and thus generates different classification results. With the aim of processing large imagery in a single process to obtain a consistent classification product, we tested three supercomputers to implement ISODATA over one tile of the 18-GB image. Furthermore, we tested the scalability of the parallel solution on Kraken to process multiple tiles of the 18-GB imagery. Details about the three supercomputers are introduced in the next section.

To ensure that our high-performance solution delivers the same quality of classification, we first compared the classification products using small datasets. The classification process and the output product from our solution are exactly the same as those derived from ERDAS because the number of iterations, the convergence threshold and the minimum number of pixels in each class are all the same.


In the case of large imagery, however, ERDAS adopts a hybrid algorithm that is not released publicly. When the file is smaller, the resulting differences were not observed, but when a larger image was tested, the differences were magnified. This means that the criteria for classification and the quality of the output product are not consistent when data at different scales are processed.

When supercomputers were utilized in the classification process, the methodology remained consistent as the scale of the data and the computational process increased, and the performance achievement was impressive. In the case of this 18-GB image, IMAGINE required about 3 hours 44 minutes 37 seconds (13,477 seconds in total) to read the data and classify the imagery into five classes (Figure 4). Because IMAGINE incorporates some extra procedures in generating the output product of the classified imagery, such as writing signature files, the performance of writing the output result was not compared in this case. The classified product generated from Kraken (Figure 5) is slightly different from the product derived from IMAGINE (Figure 4), for example at the lower right corner of both figures.

4. Advanced computing infrastructure and system

4.1. Kraken

Kraken is a supercomputer managed by the National Institute for Computational Sciences (NICS) at the Oak Ridge National Laboratory (ORNL) in the United States, and is funded by the National Science Foundation (NSF). It provides a petascale computing environment that is fully integrated with the Extreme Science and Engineering Discovery Environment (XSEDE).

Figure 3. Portion of the 18-GB image data used in the research (row 1, column 4 of the index in Figure 2).


Figure 4. Classification result generated from ERDAS IMAGINE (five classes).

Figure 5. Classification result generated from the high-performance solution on Kraken (five classes).


Kraken is a Cray XT5 system consisting of 9408 compute nodes with 112,896 compute cores, 147 terabytes of memory and 3.3 petabytes of storage. Each compute node contains 12 AMD 2.6 GHz Istanbul compute cores and 16 GB of memory. The peak performance of the Kraken supercomputer is 1.17 petaflops. Kraken was deployed for both performance and scalability tests through the MPI + CPU solution because it has sufficient computing resources (cores) for the scalability test.

4.2. GPU and Keeneland

Keeneland is a powerful hybrid supercomputing system jointly developed by the Georgia Institute of Technology, the University of Tennessee at Knoxville and the Oak Ridge National Laboratory, sponsored by the NSF. At the time of its release in 2010, it ranked 118th in the list of the top 500 supercomputers in the world. Keeneland is composed of an HP SL-390 (Ariston) cluster with Intel Westmere hex-core CPUs, NVIDIA 6-GB Fermi GPUs and a QLogic QDR InfiniBand interconnect. The KIDS system has 120 nodes with 240 CPUs and 360 GPUs. Each node has two Westmere hex-core CPUs and three GPUs; each CPU delivers 67 GFLOPS of computing power and each GPU 515 GFLOPS. Every four nodes are placed in an HP S6500 chassis and every six chassis are placed in a rack; in total, seven racks make up the Keeneland system. The full-scale Keeneland system was added to XSEDE in 2012. Keeneland was deployed for performance tests through both the MPI + CPU and the MPI + GPU solutions.

4.3. MIC and Beacon

Beacon is a supercomputing system that offers access to 48 compute nodes and 6 I/O nodes joined by an FDR InfiniBand interconnect providing 56 GB/s of bi-directional bandwidth. Each compute node is equipped with two Intel® Xeon® E5-2670 processors, four Intel® Xeon Phi™ 5110P coprocessors, 256 GB of RAM and 960 GB of SSD storage. Each I/O node provides access to an additional 4.8 TB of SSD storage. Beacon provides 768 conventional cores and 11,520 accelerator cores that deliver over 210 TFLOP/s of combined computational performance, 12 TB of system memory, 1.5 TB of coprocessor memory and over 73 TB of SSD storage in aggregate. Beacon was deployed for performance tests through an MPI + MIC solution.

4.4. The Lustre parallel file system

Considering the large imagery data utilized in this research, file reading can take longer than the parallel classification computation itself. When such large data are processed, the data have to be read from the storage disks into memory first, and I/O operations have a significant impact on the overall performance. Because a Lustre parallel file system (Cluster File Systems, Inc 2002) is installed on Kraken, we applied a technique called file striping that significantly improved the I/O performance. Nowadays, a parallel shared file system is a minimal requirement for a supercomputer. Lustre is a distributed file system used for large-scale cluster computing. It can support up to tens of thousands of client systems and serve petabytes of storage with tens of gigabytes per second of I/O throughput. As of June 2010, 15 of the top 30 supercomputers in the world used the Lustre file system.


A Lustre file system consists of two major units (Figure 6): a single metadata target (MDT) per file system that stores metadata, such as filenames, directories, permissions and file layout, on the metadata server (MDS), plus one or more object storage servers (OSSs) that store file data on one or more object storage targets (OSTs). An OSS typically serves between two and eight targets, with each target being a local disk file system up to 8 terabytes (TB) in size. The capacity of a Lustre file system is the sum of the capacities provided by the targets. One big file can be striped across multiple OSTs, which can then be accessed concurrently for better I/O performance.

File striping is a process in which Lustre automatically divides data into chunks and distributes them across multiple OSTs. It plays a vital role in running large-scale computations by reducing the time spent on reading or writing a big data file, thereby significantly improving the file I/O performance. By setting the stripe count and stripe size, the tuneable parameters of the Lustre file system, we successfully optimized the I/O process that is critical to the performance of the entire computation. The stripe count is the number of OSTs across which a file is stored; for example, if the stripe count is set to 10, the file will be partitioned in approximately equal portions over 10 different OSTs. The stripe size is the chunk size into which a file is split when distributed across the OSTs. When an application performs I/O on a Lustre file system, choosing different values of stripe size and stripe count can affect the I/O performance dramatically, with order-of-magnitude differences in performance possible.

5. High-performance computation of ISODATA over Keeneland and Beacon

Dhodhi et al. (1999) implemented a distributed ISODATA application over a cluster of eight workstations to process a 12-spectral-band image with a size of 512 × 512 pixels in each band. Following a master-slave approach, a supervisor was applied to read the image, partition it into blocks and send the blocks to each worker. The supervisor also had to gather information about all pixel labels from each worker in order to calculate new centroids for each cluster, while the workers would calculate the number of pixels belonging to each cluster and the sum of the feature vectors of all pixels assigned to each cluster.

Figure 6. Lustre file system architecture (Cray 2009).


For such a small imagery dataset, it required 1297 seconds (about 21.6 minutes) to finish the classification process.

In the high-performance and parallel computing context, domain decomposition, functional decomposition and the master-slave model are, theoretically, the three common approaches to employing multiple cores and processors in a parallel computing environment. In the master-slave model, one node acts as the master or the system host and assigns jobs to the other worker nodes. Considering the supercomputing capability of Kraken, Keeneland and Beacon, processing gigabytes of imagery data is feasible, but the master-slave approach is not a reasonable solution. Given the 18-GB file size of the imagery, using one core to read the entire dataset and distribute blocks of data to the other cores would obviously increase the I/O time. During the iterative classification calculations, passing messages between the workers and the master core would increase the time for communication and synchronization when hundreds or thousands of cores are deployed, while the master core would carry a heavy load summarizing the information from the other cores. For this reason, we apply a parallel reading solution that directs all cores to read a segment of the data simultaneously, and all cores then do the same job in parallel throughout the course of image classification. By using the MPI_Allreduce function, the information required for classification can be broadcast to all cores, which can then perform the iterative calculation simultaneously. In this way, both the I/O and the iterative classification are parallelized.
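The following minimal C sketch illustrates this pattern for the initial band statistics (our own illustration under simplifying assumptions: 8-bit band-interleaved pixels already resident in each rank's memory, no more than 16 bands, and illustrative variable names). Each rank accumulates local sums and sums of squares, MPI_Allreduce produces identical global values on every rank, and each rank then derives the same per-band mean and standard deviation without any master process.

#include <math.h>
#include <mpi.h>

/* Derive the global per-band mean and standard deviation of a distributed
   image, identically on every rank, from local sums and sums of squares.   */
void band_stats(const unsigned char *pix, long n_local, int nb,
                double *mean, double *stdev)
{
    double lsum[16] = {0}, lsq[16] = {0}, gsum[16], gsq[16];  /* nb <= 16 assumed */
    long long nl = n_local, nt;

    for (long i = 0; i < n_local; i++)
        for (int b = 0; b < nb; b++) {
            double v = pix[i * nb + b];
            lsum[b] += v;                 /* local sum per band            */
            lsq[b]  += v * v;             /* local sum of squares per band */
        }

    /* every rank receives the identical global sums */
    MPI_Allreduce(lsum, gsum, nb, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    MPI_Allreduce(lsq,  gsq,  nb, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    MPI_Allreduce(&nl, &nt, 1, MPI_LONG_LONG, MPI_SUM, MPI_COMM_WORLD);

    for (int b = 0; b < nb; b++) {
        mean[b]  = gsum[b] / nt;
        stdev[b] = sqrt(gsq[b] / nt - mean[b] * mean[b]);  /* E[x^2] - E[x]^2 */
    }
}

The initial class centres can then be seeded from these means and standard deviations, and every rank arrives at exactly the same centres because the reduced values are identical everywhere.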

Figure 7 displays the general workflow of implementing ISODATA with MPI. We virtually cut the entire image into pieces and used tens to hundreds of cores to read the data through the GDAL API and to perform unsupervised image classification according to the classic ISODATA algorithm on CPU, GPU or MIC processors.

Figure 7. Workflow of implementing ISODATA over the supercomputers via MPI.


The workflow (Figure 7) starts by setting the three required input variables: the number of classes, the maximum number of iterations and the convergence threshold. Once each processor has read its part of the data from the image file, the sum and the square sum of each band assigned to that computing node are first calculated to determine the mean and the standard deviation of each band. Based on these statistics, the initial centres of each cluster or class are computed. Once each pixel has been assigned to a cluster based on the Euclidean distance between the pixel and the centre of the class, a series of calculations derives the sum of each cluster, the number of pixels belonging to each cluster, the number of pixels that did not change class and the ratio of unchanged pixels. If the ratio is larger than the threshold, or the number of iterations exceeds the maximum number, the classification process stops. Otherwise, new centres of each cluster are calculated and the values are distributed onto the computing nodes; the classification process is then repeated by assigning each pixel to a class based on the Euclidean distance between the pixel and the new class centres. Initially, we divided the image by the number of cores used in the calculation, and the first process in the MPI program handles all residual pixels in the classification calculation. During each iteration, the statistical information of the pixels on each core is collected through MPI_Allreduce and then summarized by every core, so that classification follows a consistent standard in the distributed computing environment.

Obviously, classification over supercomputers carries a strong dependency among the pieces of data if a consistent result is to be generated. When such dependency is involved, there are additional challenges in handling a variety of complicated issues, such as communication, synchronization and load-balance strategies among the parallelized and distributed computing cores, all of which have a significant impact on the final performance. As described previously (Figure 7), each core reads a segment of the three-band image and computes the sum and the square sum of each band assigned to that core. The mean value and standard deviation of each band are retrieved through the MPI_Allreduce function. In this way, each core can calculate the initial centre of each cluster and assign each pixel to one cluster based on the Euclidean distance between the pixel and the cluster centre. As long as the convergence threshold and the maximum number of iterations have not been reached, each core counts the number of pixels that do not change label, recalculates the sum of each cluster and the number of pixels assigned to the cluster, derives the new mean value and standard deviation of each band through the MPI_Allreduce function and determines the new centres of each cluster. In the case of Keeneland, the hybrid MPI + GPU program needs some extra procedures to copy the data back and forth between the CPUs and GPUs (Figure 7).
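A compact sketch of this per-iteration exchange is shown below (again our own illustration with hypothetical names, assuming the local cluster sums, pixel counts and unchanged-pixel count have been accumulated by the assignment pass on each rank, and that the class and band counts fit the fixed-size buffers). MPI_Allreduce turns the local statistics into identical global values, so every rank computes the same new class centres and the same convergence ratio.

#include <mpi.h>

/* Reduce this rank's cluster statistics into identical global values and
   update the class means; returns the global fraction of unchanged pixels. */
double reduce_and_update(double *lsum, long long *lcnt, long long lunchg,
                         long long n_total, int nclass, int nb, double *mean)
{
    double    gsum[20 * 3];                    /* assumes nclass <= 20, nb <= 3 */
    long long gcnt[20], gunchg;

    MPI_Allreduce(lsum, gsum, nclass * nb, MPI_DOUBLE,    MPI_SUM, MPI_COMM_WORLD);
    MPI_Allreduce(lcnt, gcnt, nclass,      MPI_LONG_LONG, MPI_SUM, MPI_COMM_WORLD);
    MPI_Allreduce(&lunchg, &gunchg, 1,     MPI_LONG_LONG, MPI_SUM, MPI_COMM_WORLD);

    for (int c = 0; c < nclass; c++)           /* identical new class centres on every rank */
        for (int b = 0; b < nb; b++)
            if (gcnt[c] > 0)
                mean[c * nb + b] = gsum[c * nb + b] / gcnt[c];

    return (double) gunchg / (double) n_total; /* caller stops when this reaches the threshold */
}

On Keeneland, the assignment pass that produces lsum, lcnt and lunchg runs on the GPUs, so the pixel buffers and the updated class centres have to be copied between host and device memory around this reduction, which is the extra step noted above.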

To validate the solution derived from this pilot study, we first developed a serial C program as well as a GPU program running on a desktop computer that implemented the ISODATA algorithm, so as to establish a standard for performance comparison and quality control (Ye and Shi 2012). The performance of the hybrid MPI + CPU, MPI + GPU and MPI + MIC solutions on Keeneland and Beacon is reported in Table 1 for one tile of the 18-GB image classified into 10 classes. When 100 GPUs on Keeneland or 100 MICs on Beacon are used, unsupervised image classification can be completed within one and a half minutes. When ERDAS IMAGINE was used to classify an 18-GB image into 10 classes, it took about 5.5 hours to complete the task.


6. I/O optimization and scalability benchmark tests over Kraken

Because the supercomputers Keeneland and Beacon have limited numbers of compute nodes, Kraken was used to validate the scalability of the proposed solution through a series of benchmark tests. As observed on Keeneland and Beacon (Table 1), although the computing time was reduced as more processors were deployed, the time used in file reading did not scale consistently with the increasing number of processors. For this reason, I/O optimization is significant when larger datasets are utilized in unsupervised image classification.

Domain decomposition is designed and achieved by distributing the computation task appropriately across multiple nodes on Kraken. Although each node only processes a subset of the big dataset, and considering that each tile of the 18-GB image has 80,000 rows and 80,000 columns, even a small difference in the data partition may generate a load-balance issue. For example, if the data are partitioned into 600 columns or rows using 600 cores, on average each core will process 133 rows or columns; in that case, one core will process 333 rows or columns, while the other 599 cores will process 133 rows or columns each. Because each row or column has 80,000 pixels in three bands, that one core has to process (333 − 133) × 80,000 × 3 = 48,000,000 extra pixels while all the others may have to wait until synchronization is accomplished. For this reason, we designed a specific strategy for data partitioning or domain decomposition to manage the load-balance issue by choosing the number of cores to be a perfect square (i.e., n × n), such as 576, which means the data are partitioned into about 24 × 24 segments. Because one 576th of the image data contains about 80,000 × 80,000 × 3 / 576 = 33,333,333 pixels, each core will read 33,333,333 pixels, while only one core needs to process the residual and read an extra 64 × 3 = 192 pixels more than the other cores. Considering that 192 pixels is about 0.000001% of the total number of pixels (80,000 × 80,000 × 3) in each 18-GB tile, the impact of the residual pixels on the workload imbalance is trivial (about 0.0006% of 33,333,333, the average workload on each core), although the data partitioning or domain decomposition approach can be further improved and optimized.
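The partitioning arithmetic can be expressed in a few lines of C (a sketch with illustrative values; the article's actual decomposition is two-dimensional, while the sketch reduces it to a flat pixel count for brevity):

#include <stdio.h>

int main(void)
{
    long long total  = 80000LL * 80000LL * 3LL;  /* pixels in one 18-GB, 3-band tile */
    long long ncores = 576;                      /* 24 x 24 decomposition            */

    long long per_core = total / ncores;         /* 33,333,333 pixels per core       */
    long long residual = total % ncores;         /* 192 pixels left over             */

    /* the first MPI process absorbs the residual pixels */
    printf("per core: %lld, residual: %lld (%.6f%% of one core's load)\n",
           per_core, residual, 100.0 * residual / per_core);
    return 0;
}

Running it prints a residual of 192 pixels, roughly 0.0006% of the 33,333,333 pixels handled by each core, which is why the imbalance is negligible for this dataset.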

When the Lustre file system is utilized in the optimization process, the stripe count and the stripe size, the two Lustre parameters, have a significant impact on the overall performance of the entire operation. The performance measurements of image reading and classification are listed in Table 2. All benchmark runs use 576 compute cores, and the image size is 18 GB. On Kraken, the default stripe size is 1 MB and the default stripe count is four. Except for the first test with the default settings, we set the stripe size to 20 MB. We observed the performance improving as the stripe count increased and more OSTs were used.

Table 1. Performance comparison (time is measured in seconds; classification computation can be accomplished in about half a minute when 100 GPUs and 100 MICs are used).

Number of     Keeneland (KIDS), MPI + CPU    Keeneland (KIDS), MPI + GPU    Beacon, MPI + MIC
processors    Read     Comp.    Total        Read     Comp.    Total        Read     Comp.    Total
36            54.59    114.16   168.75       44.90    71.76    116.66       21.88    69.38    91.26
60            44.58    83.95    128.53       43.88    54.45    98.32        874.51   54.03    928.54
64            49.15    65.90    115.05       52.10    46.00    98.10        32.72    39.71    72.43
80            52.50    51.18    103.67       48.97    36.39    85.37        15.37    39.07    54.44
100           1.29     81.31    82.59        36.35    33.81    70.16        41.99    31.52    73.51


As the Lustre file system is shared across the whole machine by all users, however, using more OSTs does not always bring further improvement: the performance dropped when the stripe count was increased from 80 to 160.

A third set of benchmark tests was conducted (Table 3). This time, we examined the impact of different stripe sizes on the I/O performance while fixing the stripe count at 80, the optimum identified previously (Table 2). The performance is not affected much by increasing the stripe size, and beyond 10 MB a larger stripe size does not yield better performance. With a stripe size of 10 MB and a stripe count of 80, we are satisfied with the result that the I/O time is less than the classification time, while the total time for file reading and classification is the minimum of all the benchmark tests (Tables 3 and 4).

Table 4 shows the performance comparison for different numbers of compute cores with the optimized stripe size and stripe count. The implementation of ISODATA over Kraken scales well once the right Lustre parameters are set, which is not surprising because the whole process is both data and compute intensive. The optimal number of processors on Kraken for processing this specific set of image data is determined through this fourth benchmark test (Table 4). If all the processors on Kraken could be utilized, the proposed solution would have the capability to process more than 2 TB of image data for unsupervised image classification in a single process and to achieve high performance through the combination of optimal stripe count, stripe size and number of processors. Adjustment and adaptation have to be made for different computer systems and different sets of image data.

The above optimized solution has been extended to process multiple tiles of the 18-GB imagery files in a single process. A performance comparison of the classification of multiple tiles into 5, 10, 15 and 20 classes in a single process, using the optimized number of compute cores with the optimized stripe size and stripe count, is reported in Table 5.

Table 2. Performance comparison with different stripe counts and 576 cores.

Stripe count                  4       10      20      40      80      160
Stripe size (MB)              1       20      20      20      20      20
Read time (sec)               63.03   22.77   8.77    5.14    3.5     4.35
Classification time (sec)     6.45    4.69    3.99    3.74    3.94    4.38

Table 3. Performance comparison with different stripe sizes and 576 cores.

Stripe count                  4       80      80      80      80      80
Stripe size (MB)              1       5       10      20      40      80
Read time (sec)               63.03   3.85    2.94    3.5     4       4.66
Classification time (sec)     6.45    4.24    3.56    3.94    4.06    4.02

Table 4. Performance comparison with different numbers of cores and the optimized stripe count and stripe size. The best total time is about 6.08 seconds.

Number of cores               144     324     576     900
Stripe count                  80      80      80      80
Stripe size (MB)              10      10      10      10
Read time (sec)               5.66    5.13    2.94    2.77
Classification time (sec)     13.72   6.15    3.56    3.31


Table 5. Performance comparison of the classification on multiple tiles (time is measured in seconds for data reading [I/O], classification [CLS] and the total time; IR is the number of iterations; in all cases the total time is within about 1 minute).

# of      5 classes                  10 classes                 15 classes                 20 classes
tiles     I/O    CLS    Total   IR   I/O    CLS    Total   IR   I/O    CLS    Total   IR   I/O    CLS    Total   IR
1         4.32   2.13   6.45    4    4.25   8.62   12.87   11   5.51   12.07  17.57   11   6.00   18.13  24.13   13
2         8.94   2.16   11.10   4    20.31  7.92   28.23   10   17.16  11.32  28.47   11   9.02   15.09  24.11   12
4         21.01  2.21   23.23   4    16.40  7.95   24.35   10   14.80  13.41  28.21   13   16.40  7.95   24.35   10
8         28.83  2.23   31.06   4    28.95  7.41   36.36   9    28.67  14.78  43.46   14   29.52  15.34  44.86   12
12        44.86  2.29   47.15   4    45.92  6.57   52.49   8    58.31  9.43   67.74   9    41.56  15.37  56.93   12


For each number of classes, Table 5 shows the time used in file reading (I/O), classification (CLS), the total time (Total) and the number of iterations (IR) of the classification process; time is measured in seconds. When ERDAS is used to classify one tile of the imagery data into 10, 15 and 20 classes, the corresponding times needed to accomplish the task on a desktop computer are about 19,800 seconds, 23,400 seconds and 27,000 seconds.

We noticed inconsistencies in the I/O times when reading the same data. Although there is a general and reasonable trend of the I/O time increasing with the size of the data, the inconsistencies in I/O times may be caused by internal issues in Kraken that are beyond the control of users as application developers. In all cases, the 216 GB of imagery can be classified into 5, 10, 15 or 20 classes within about 1 minute.

7. Conclusion

Our pilot study has demonstrated the unique value and capability of the emerging advanced computing infrastructures and systems for manipulating complex geospatial computations over large data with high performance. In this research, we processed one 18-GB tile of high-resolution imagery data and successfully expanded the solution to process 216 GB of data in a single process, although the proposed strategy for data partitioning or domain decomposition can be further improved and optimized. This pilot study has laid a foundation for expanding this initiative to other imagery processing tasks, such as supervised or object-based image classification and change detection, image transformations (smoothing, enhancement, compression, pyramid generation, Fourier and wavelet transformation), image correction and rectification (correcting the influence of atmosphere, terrain, position and attitude), image re-projection from one coordinate system to another, and image fusion between low-resolution and high-resolution images. However, for different kinds of applications or algorithms, the parallel solution could be dissimilar. In extreme cases, parallelizing some sequential algorithms could amount to a complete re-engineering process. For example, the affinity propagation (AP) algorithm is a clustering approach in spatial statistics. Transforming the original serial C program for AP into a GPU program may need significant algorithm re-design because the original serial program may not be intuitively transformable into parallel programs. In particular, if data communication is necessary between distributed computing nodes, MPI's SEND and RECV functions have to be applied.

The potential of new computer architectures and computing technologies still needs further exploration and validation. For example, the latest Kepler GPUs have the capability of direct cross-GPU communication, while our solution targets the previous generation of GPUs with the Fermi architecture. As a result, communication between GPUs is implemented through MPI, which increases the communication time because data are copied back and forth between the CPUs and GPUs. In the case of the MIC, although it is straightforward to port MPI + CPU code to a MIC cluster and achieve a significant performance improvement, a lot of the on-board memory is used for the operating system and MPI support. The offloading model can be further explored and compared with the native model to understand which approach is more efficient in such computations.


Acknowledgements

This research was supported partially by the National Science Foundation through the award OCI-1047916. The computations were first performed on Kraken at the National Institute for Computational Sciences. This research used resources of the Keeneland Computing Facility at the Georgia Institute of Technology, which is supported by the National Science Foundation under Contract OCI-0910735. This research also used Beacon, which is a Cray CS300-AC™ Cluster Supercomputer. The Beacon Project is supported by the National Science Foundation and the State of Tennessee.

References

Armbrust, M., I. Stoica, M. Zaharia, A. Fox, R. Griffith, A. D. Joseph, R. Katz, et al. 2010. "A View of Cloud Computing." Communications of the ACM 53 (4): 50–58. doi:10.1145/1721654.1721672.

Cluster File Systems, Inc. 2002. "Lustre: A Scalable, High Performance File System." Technical Report, White Paper. http://www.cse.buffalo.edu/faculty/tkosar/cse710/papers/lustre-whitepaper.pdf

Congalton, R. G. 2010. "Remote Sensing: An Overview." GIScience & Remote Sensing 47 (4). doi:10.2747/1548-1603.47.4.443.

Cray. 2009. "Getting Started on MPI I/O. S–2490–40." Accessed May 14, 2014. http://docs.cray.com/books/S-2490-40/S-2490-40.pdf

Dhodhi, M. K., J. A. Saghri, I. Ahmad, and R. Ul-Mustafa. 1999. "D-ISODATA: A Distributed Algorithm for Unsupervised Classification of Remotely Sensed Data on Network of Workstations." Journal of Parallel and Distributed Computing 59 (2): 280–301. doi:10.1006/jpdc.1999.1573.

Filippi, A. M., B. L. Bhaduri, T. Naughton, A. L. King, S. L. Scott, and I. Güneralp. 2012. "Hyperspectral Aquatic Radiative Transfer Modeling Using a High-Performance Cluster Computing-Based Approach." GIScience & Remote Sensing 49 (2). doi:10.2747/1548-1603.49.2.275.

Foster, I., and C. Kesselman, eds. 1999. The Grid: Blueprint for a New Computing Infrastructure. San Francisco, CA: Morgan Kaufmann.

Foster, I., C. Kesselman, and S. Tuecke. 2001. "The Anatomy of the Grid." International Journal of Supercomputer Applications 15 (3): 200–222.

GDAL (Geospatial Data Abstraction Library). 2012. Accessed May 14, 2014. http://www.gdal.org/

Gonzalez, C., S. Sanchez, A. Paz, J. Resano, D. Mozos, and A. Plaza. 2013. "Use of FPGA or GPU-Based Architectures for Remotely Sensed Hyperspectral Image Processing." Integration, the VLSI Journal 46 (2): 89–103.

Gropp, W., E. Lusk, and A. Skjellum. 1994. Using MPI: Portable Parallel Programming with the Message-Passing Interface. Cambridge, MA: MIT Press Scientific and Engineering Computation Series. ISBN 0-262-57104-8.

Intergraph. 2013. ERDAS Field Guide. Huntsville, AL: Intergraph Corporation.

Jacobsen, K. 2011. "Characteristics of Very High Resolution Optical Satellites for Topographic Mapping." In IntArchPhRS Vol. XXXVIII-4/W19, Hannover, 2011, 6 S. CD.

Jensen, J. R. 1999. Introductory Digital Image Processing: A Remote Sensing Perspective. 2nd ed., 231–239. Englewood Cliffs, NJ: Prentice-Hall.

Li, J., D. Agarwal, M. Humphrey, C. van Ingen, K. Jackson, and Y. Ryu. 2010. "Escience in the Cloud: A MODIS Satellite Data Reprojection and Reduction Pipeline in the Windows Azure Platform." In Proceedings of the 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2010), Atlanta, GA, April 19–23. IEEE.

National Science Foundation. 2007. "Cyberinfrastructure Vision for 21st Century Discovery." Accessed May 14, 2014. http://www.nsf.gov/pubs/2007/nsf0728/nsf0728.pdf

Nico, G., L. Fusco, and J. Linford. 2003. "Grid Technology for the Storage and Processing of Remote Sensing Data: Description of an Application." In SPIE Proceedings Vol. 4881: Sensors, Systems, and Next-Generation Satellites, 677–685. doi:10.1117/12.462921.

GIScience & Remote Sensing 337

Dow

nloa

ded

by [

Uni

vers

ity o

f A

rkan

sas

Lib

rari

es -

Fay

ette

ville

] at

12:

00 3

0 M

ay 2

014

Page 20: Unsupervised image classification over supercomputers ...mqhuang/papers/2014_Image... · academic research and employs more than 112,896 processors. Keeneland was the first hybrid

Pacheco, P. S. 1997. Parallel Programming with MPI. San Francisco, CA: Morgan Kaufmann.ISBN 1-55860-339-5

Paz, A., and A. Plaza. 2010. “Clusters versus Gpus for Parallel Automatic Target Detection inRemotely Sensed Hyperspectral Images.” EURASIP Journal on Advances in Signal Processing2010: 18. doi:10.1155/2010/915639.

Petcu, D., D. Zaharie, D. Gorgan, F. Pop, and D. Tudor. 2007. “MedioGrid: A Grid-based Platformfor Satellite Image Processing.” In Proceeding of the 4th IEEE International Workshop onIntelligent Data Acquisition and Advanced Computing Systems: Technology and Applications,September 6–8, 137–142. Dortmund: IEEE Xplore Press. doi:10.1109/IDAACS.2007.4488392.

Plaza, A., and C.-I. Chang. 2008. “Clusters Versus FPGA for Parallel Processing of HyperspectralImagery.” International Journal of High Performance Computing Applications 22 (4): 366–385.doi:10.1177/1094342007088376.

Plaza, A., J. Plaza, and H. Vegas. 2010. “Improving the Performance of Hyperspectral Image andSignal Processing Algorithms Using Parallel, Distributed and Specialized Hardware-BasedSystems.” Journal of Signal Processing Systems 61: 293–315. doi:10.1007/s11265-010-0453-1.

Snir, M., S. Otto, S. Huss-Lederman, D. Walker, and J. Dongarra. 1995. MPI: The CompleteReference. Cambridge, MA: MIT Press. ISBN 0-262-69215-5.

Stine, R. S., D. Chaudhuri, P. Ray, P. Pathak, and M. Hall-Brown. 2010. “Comparison of DigitalImage Processing Techniques for Classifying Arctic Tundra.” Giscience & Remote Sensing 47(1). doi:10.2747/1548-1603.47.1.78.

Tang, Y., and C. W. Pannell. 2009. “A Hybrid Approach for Land Use/Land Cover Classification.”Giscience & Remote Sensing 46 (4). doi:10.2747/1548-1603.46.4.365.

Tenenbaum, D. E., Y. Yang, and W. Zhou. 2011. “A Comparison of Object-Oriented ImageClassification and Transect Sampling Methods for Obtaining Land Cover Information fromDigital Orthophotography.” Giscience & Remote Sensing 48 (1). doi:10.2747/1548-1603.48.1.112.

Valencia, D., A. Lastovetsky, M. O’Flynn, A. Plaza, and J. Plaza. 2008. “Parallel Processing ofRemotely Sensed Hyperspectral Images on Heterogeneous Networks of Workstations UsingHeterompi.” International Journal of High Performance Computing Applications 22 (4): 386–407. doi:10.1177/1094342007088377.

Wang, J., X. Sun, Y. Xue, Y. Hu, Y. Luo, Y. Wang, S. Zhong, A. Zhang, J. Tang, and G. Cai. 2004.“Preliminary Study on Unsupervised Classification of Remotely Sensed Images on the Grid.”Lncs 3039: 981–988.

Weber, K. T., and J. Langille. 2007. “Improving Classification Accuracy Assessments withStatistical Bootstrap Resampling Techniques.” Giscience & Remote Sensing 44 (3).doi:10.2747/1548-1603.44.3.237.

Yang, C., M. Goodchild, Q. Huang, D. Nebert, R. Raskin, Y. Xu, M. Bambacus, and D. Fay. 2011.“Spatial Cloud Computing: How Can the Geospatial Sciences Use and Help Shape CloudComputing?” International Journal of Digital Earth 4 (4): 305–329. doi:10.1080/17538947.2011.587547.

Yang, X. J., Z. M. Chang, H. Zhou, X. Qu, and C. J. Li. 2004. “Services for Parallel Remote-Sensing Image Processing Based on Computational Grid.” Lecture Notes in Computer Science3252: 689–696.

Ye, F., and X. Shi. 2012. “Parallelizing ISODATA Algorithm for Unsupervised Image Classificationon GPU.” In Modern Accelerator Technologies for Geographic Information Science, edited byX. Shi, V. Kindratenko, and C. Yang. Berlin: Springer.

Zhang, T., and M. H. Tsou. 2009. “Developing a Grid Enabled Spatial Web Portal for InternetGIServices and Geospatial Cyberinfrastructure.” International Journal of GeographicalInformation Science 23 (5): 605–630.

Zhang, W., L. Wang, D. Liu, W. Song, Y. Ma, P. Liu, D. Chen, et al. 2013. “Towards Building aMulti-Datacenter Infrastructure for Massive Remote Sensing Image Processing.” Concurrency610 and Computation: Practice and Experience 25 (12): 1798–1812. doi:10.1002/cpe.2966.
