

IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. 6, NO. 4, AUGUST 2013 1799

GPU-Accelerated Computation for Electromagnetic Scattering of a Double-Layer Vegetation Model

Xiang Su, Jiaji Wu, Bormin Huang, and Zhensen Wu

Abstract: In this paper we develop a graphics processing unit (GPU)-based massively parallel approach for efficient computation of electromagnetic scattering via a proposed double-layer vegetation model composed of vegetation and ground layers. The proposed vector radiative transfer (VRT) model for vegetation scattering considers different sizes and orientations of the leaves. It uses the Monte Carlo method to calculate the backward scattering coefficients of rough ground and vegetation, where the leaves are approximated as a large number of randomly oriented flat ellipsoids and the ground is treated as a Gaussian random rough surface. In the original CPU-based sequential code, the Monte Carlo simulation to calculate the electromagnetic scattering of vegetation takes up 97.2% of the total execution time. In this paper we take advantage of the massively parallel compute capability of the NVIDIA Fermi GTX 480 with the Compute Unified Device Architecture (CUDA) to compute the multiple scattering of all the leaf groups simultaneously. Our parallel design includes registers for faster memory access, shared memory for parallel reduction, pipelined multiple-stream asynchronous transfer, a parallel random number generator, and CPU-GPU heterogeneous computation. By using these techniques, we achieved a speedup of 213-fold on the NVIDIA GTX 480 GPU and 291-fold on the NVIDIA GTX 590 GPU as compared with the single-core CPU counterpart.

Index Terms: Compute unified device architecture (CUDA), electromagnetic scattering, equation double-layer vegetation model, graphics processing unit (GPU), Monte Carlo method, parallel computing.

I. INTRODUCTION

The characteristics of electromagnetic scattering and propagation have attracted great interest in such fields as modern communications, remote sensing, target identification, and environmental monitoring because of the complexity of the ground and vegetation layers. Researchers investigate the problem of vegetation scattering primarily via experiments and theoretical modeling. The main purpose of the experiments is to determine the quantitative relationship among the scattering coefficients, vegetation types, frequencies, polarizations, and incident angles [1]–[5].

The scattering characteristics of Earth’s vegetation have broad applications in agricultural remote sensing. For example, the radiation and scattering data of crops in different growth stages can help quantitatively investigate biomass in vegetation and moisture content in soil, which are vital growth parameters for crops. The polarization of microwave scattering by different types of crops can help classify the crops. By observing the mechanism of interaction between electromagnetic waves and vegetation, the vegetation parameters can be inferred.

Experimental results are often subject to weather and environmental conditions, so their validity and applicability are limited. In a theoretical model the variables can be controlled manually, so it is easier to evaluate the impact of a variable on the result. Vegetation scattering models [6]–[14] have been proposed with different levels of complexity. For example, Richards established a backscattering model of forest for the L-band sensor [10]. Karam proposed a vegetation model which only considered the scattering of leaves [11]. Durden built a forest model consisting of a layer of cylinders [12]. The double-layer vegetation model used in the current study was derived from the Michigan microwave canopy scattering (MIMICS) three-layer vegetation model [13] proposed by Ulaby et al. With the development of computer technology in recent years, the Monte Carlo method [15]–[17] has been widely used to simulate scattering in random media. This paper uses a model based on Monte Carlo simulation techniques. It is a double-layer coherent scattering model composed of a layer of leaves satisfying a uniform random distribution and a layer of soil with a rough surface [18]–[20].

In the current study, the Monte Carlo method is used to process a large number of leaves with stochastic orientations and construct a vegetation layer. The vector radiative transfer (VRT) equation is then established and solved iteratively. In the original serial method, the leaves are divided into different groups according to their orientations and sizes. The phase matrices of the groups are then individually calculated and the sum of the individual results is obtained. The problem is suitable for parallel implementation by simultaneously creating and computing all leaf groups, resulting in a dramatic decrease in the execution time.

In recent years GPUs have been a disruptive technology in high performance computing. GPUs use the ever increasing number of transistors for adding more compute cores [21]–[23]. GPUs also have very high memory bandwidth. Therefore, GPUs are well suited for massively data-parallel processing with high floating-point arithmetic intensity. In November 2007, NVIDIA introduced Compute Unified Device Architecture (CUDA), a general-purpose parallel computing architecture and language for GPUs. Because of these advantages, many successful CUDA programs have appeared in the literature. A recent JSTARS special issue devoted to high performance computing in remote sensing features several papers dealing with GPUs [24]–[28]. In this study we design a GPU-based massively parallel algorithm for computing electromagnetic scattering of the proposed double-layer vegetation model.

The rest of the paper is organized as follows. Section II describes using the Monte Carlo method and vector radiative transfer to calculate the electromagnetic scattering of the double-layer vegetation model. Sections III and IV respectively explain the serial program and the CUDA program with many optimization techniques. Finally, Section V concludes the paper.

Manuscript received March 13, 2012; revised August 01, 2012; accepted August 28, 2012. Date of publication April 29, 2013; date of current version July 22, 2013. This work was supported in part by the Natural Science Foundation of China (61172031, 61077009) and the Fundamental Research Funds for the Central Universities (K50510020002).

X. Su is with the School of Science, Xidian University, Xi’an, Shaanxi Province, China.

J. Wu and Z. Wu are with the Institute of Intelligent Information Processing, Xidian University, Xi’an, Shaanxi Province, China (corresponding author).

B. Huang is with the Space Science and Engineering Center, University of Wisconsin-Madison, USA.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSTARS.2012.2219508

1939-1404/$31.00 © 2012 IEEE

Fig. 1. Double-layer vegetation model.

II. DOUBLE-LAYER VEGETATION MODEL

The MIMICS model simulates a forest as a three-layer vegetation model consisting of the canopy, trunk, and ground layers. In our study we focus on the scattering of such crops as rice and soybean, which can be simulated by a double-layer vegetation model with the vegetation and ground layers (Fig. 1). Thus, the MIMICS model is modified for this purpose by setting the height of the trunk layer to zero. The vegetation layer is modeled using discrete leaves, stems, and other components, which are simulated using flat ellipsoids with different shapes and orientations. Each leaf is approximated as a disc-like ellipsoid by letting one of the ellipsoid axes be very small. If another axis is elongated, the ellipsoid degenerates into a stick-like shape resembling the stem of a crop. The ground layer is modeled using a Gaussian random rough surface. In addition, we use the zero- and first-order solutions to obtain the scattering coefficients of the rough ground, the vegetation, and their combination.

The far-field scattering by a small particle located at the origin of the coordinate system can be described as [29]

(1)

where the internal field is an unknown quantity. Using the general Rayleigh–Gans approximation, the internal field can be replaced by the incident field. Let the incident wave be a plane wave so that

(2)

where the polarization tensor is defined as

(3)

in terms of the three components of the demagnetization factor. Substituting (2) in (1) gives

(4)

Equation (4) contains the scattering amplitude matrix of the ellipsoid:

(5)

which depends on the volume of the ellipsoid, the polarization unit vector of the scattered wave, and the polarization unit vector of the incident wave; the polarization subscripts denote vertical or horizontal polarization.

In the vector radiative transfer (VRT) theory, the intensity is used to describe the scattering, absorption, and propagation of an electromagnetic wave in random media [30]. The intensity consists of four polarization parameters called the Stokes parameters. The Stokes matrix relates the Stokes parameters of the scattered wave to those of the incident wave. An elliptically polarized wave can be decomposed into the sum of its vertical and horizontal polarization components. The most important property of the Stokes parameters is incoherent addition: when the scattering fields of the discrete random scatterers are mutually non-coherent, the Stokes parameters of the total scattering field can be calculated by adding the Stokes parameters of every individual scattering field.

For a single particle, the relationship between the Stokes parameters of the scattering field and those of the incident field can be expressed as [31]

(6)

where the Stokes matrix is given in (7). By substituting (5) into (7), the Stokes matrix for the scattering by an ellipsoid is obtained.

In active remote sensing, the radiative intensity satisfies an integro-differential equation called the VRT equation, which is given by

(8)

where the phase matrix provides the contributions scattered from the incident direction into the observation direction, the extinction matrix is determined by the Stokes parameters of the scatterers, and the position vector points from the observation point to the scatterer.
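Relation (6) can be checked numerically. The sketch below is not the authors' code: it assumes the modified Stokes convention (I_v, I_h, U, V) with common prefactors dropped, a 2×2 scattering amplitude matrix with entries f_vv, f_vh, f_hv, f_hh, and hypothetical function names. It builds a 4×4 Stokes matrix from the amplitudes and confirms that applying it to the incident Stokes vector gives the same result as scattering the field first and forming the Stokes vector afterwards.

```python
import random


def stokes_vector(e_v, e_h):
    """Modified Stokes vector (I_v, I_h, U, V); common prefactors dropped."""
    cross = e_v * e_h.conjugate()
    return [abs(e_v) ** 2, abs(e_h) ** 2, 2.0 * cross.real, 2.0 * cross.imag]


def stokes_matrix(f_vv, f_vh, f_hv, f_hh):
    """4x4 matrix mapping incident to scattered modified Stokes vectors."""
    c = lambda z: z.conjugate()
    co_plus = f_vv * c(f_hh) + f_vh * c(f_hv)
    co_minus = f_vv * c(f_hh) - f_vh * c(f_hv)
    return [
        [abs(f_vv) ** 2, abs(f_vh) ** 2,
         (f_vv * c(f_vh)).real, -(f_vv * c(f_vh)).imag],
        [abs(f_hv) ** 2, abs(f_hh) ** 2,
         (f_hv * c(f_hh)).real, -(f_hv * c(f_hh)).imag],
        [2.0 * (f_vv * c(f_hv)).real, 2.0 * (f_vh * c(f_hh)).real,
         co_plus.real, -co_minus.imag],
        [2.0 * (f_vv * c(f_hv)).imag, 2.0 * (f_vh * c(f_hh)).imag,
         co_plus.imag, co_minus.real],
    ]


def mat_vec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def max_mismatch(seed=0):
    """Largest difference between the two routes for random test data."""
    rng = random.Random(seed)
    rand_c = lambda: complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
    f_vv, f_vh, f_hv, f_hh = (rand_c() for _ in range(4))
    e_v, e_h = rand_c(), rand_c()
    # Route 1: scatter the field, then form its Stokes vector.
    s_v = f_vv * e_v + f_vh * e_h
    s_h = f_hv * e_v + f_hh * e_h
    direct = stokes_vector(s_v, s_h)
    # Route 2: form the incident Stokes vector, then apply the Stokes matrix.
    via_matrix = mat_vec(stokes_matrix(f_vv, f_vh, f_hv, f_hh),
                         stokes_vector(e_v, e_h))
    return max(abs(a - b) for a, b in zip(direct, via_matrix))
```

Calling `max_mismatch()` with different seeds should return values at the level of floating-point round-off. With a different Stokes convention the matrix entries change, but the same two-route check still applies.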

The phase and extinction matrices, which contain information about radiation propagation and scattering, need to be known in order to obtain the solution of the VRT equation. The orientations, sizes, and relative permittivity values of the leaves in this vegetation model are random quantities, and their statistical average can be used to obtain the scattering properties of the vegetation layer. The leaves are divided into groups based on their sizes and orientations. The leaves in each group have the same size and orientation, as depicted in Fig. 2. Each group is characterized by the number of scatterers in a unit volume and by the orientation density function of the scatterers. The three Euler angles are assumed to be independently and uniformly distributed, so that the orientation density takes the following form:

(9)

Fig. 2. Dividing leaves into groups.

Considering the incoherent addition of the Stokes parameters, the phase matrix of the vegetation layer can be expressed as follows:

(10)

In (9) and (10), the subscript identifies the group, and the averaged quantity is the mean phase matrix of that group. This phase matrix is expressed as

(11)

in terms of the Stokes matrix of the group.

The extinction matrix can be approximated as follows:

(12)

where the four coefficients can be obtained from the forward scattering theory and are expressed as

(13)

TABLE I. PARAMETERS OF RICE

Substituting (10) and (13) into (8) and solving the VRT equation iteratively yields

(14)

Only the zero- and first-order iterative solutions, which respectively correspond to the scattering of the ground and the interaction between the vegetation and ground layers, are necessary when the scatterers are not dense. Using these iterative solutions, the bistatic scattering coefficient can be expressed as follows:

(15)

where the subscripts indicate the direction of polarization (vertical or horizontal), and the remaining quantities are the bistatic angle and the iterative solution of the VRT equation of the corresponding order.
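The incoherent addition that underlies (10) can be illustrated with a small Monte Carlo experiment. The sketch below is an illustration under stated assumptions, not part of the authors' model: each scatterer contributes a field of fixed amplitude with an independent, uniformly random phase, and all names are hypothetical. Averaged over many realizations, the intensity of the summed field approaches the sum of the individual intensities, which is why per-group Stokes quantities can simply be added.

```python
import cmath
import math
import random


def mean_total_intensity(amplitudes, realizations=20000, seed=1):
    """Average |sum of fields|^2 over random, independent phases."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(realizations):
        # One realization: every scatterer gets a fresh uniform phase.
        field = sum(a * cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
                    for a in amplitudes)
        total += abs(field) ** 2
    return total / realizations


amplitudes = [1.0, 0.5, 2.0, 1.5]                 # assumed example values
incoherent_sum = sum(a * a for a in amplitudes)   # 1 + 0.25 + 4 + 2.25 = 7.5
estimate = mean_total_intensity(amplitudes)       # Monte Carlo estimate
```

The estimate fluctuates around `incoherent_sum` with a statistical error that shrinks as the number of realizations grows. If the phases were correlated instead, interference terms would survive the average and simple addition would no longer hold.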

III. VEGETATION SIMULATION

This section describes the double-layer vegetation model used to compute the electromagnetic scattering of the vegetation. We use rice as an example. The scattering coefficients in the simulation are calculated for incident angles ranging from 0° to 85° at 5° intervals. The parameters of rice are listed in Table I.

Fig. 3 illustrates the flowchart of the original CPU code. The calculation consists of three parts: i) initialization of some important parameters and computation of the relative permittivity and the demagnetization factors; ii) calculation of the phase matrix with the incident angle ranging from 0° to 85°, by first grouping the leaves based on their orientations, then computing the phase matrix of each group, and finally adding the individual results to obtain the total phase matrix; and iii) computation of the backscattering coefficient based on the total phase matrix. Performance profiling of the CPU code shows that the second part took up 97.8% of the total time, whereas the first and third parts took only 2% and 0.2%, respectively.

Fig. 3. Flowchart of the original code to run on a CPU.

(7)

Table II shows the pseudo code for calculating the phase matrix. The value in line 1 gives the number of leaf groups sampled by the Monte Carlo method. Line 4 indicates that the incident angle varies from 0° to 85° at 5° intervals. The uniform_rand calls in lines 7, 8, and 9 return uniformly distributed random numbers generated by Blitz-0.9, a C++ template library that efficiently implements matrix operations and random number generation. The code from line 6 to line 12 repeats 32768 times. The next section shows how to use the GPU to accelerate this part.

TABLE II. PSEUDO CODE FOR THE CALCULATION OF THE PHASE MATRIX

TABLE III. EXECUTION TIME OF THE CPU CODE

TABLE IV. SPECIFICATION OF NVIDIA GTX 480

To further improve the performance, the C code was compiled with the compiler's optimization option. Table III shows the CPU times for different numbers of leaf groups with and without the optimization. As seen, the CPU code with the optimization yields about a 30× speedup. The GPU speedup reported in Section IV is with respect to the CPU code with the optimization.
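The loop structure described for Table II can be sketched as follows. Since the pseudo code itself is not reproduced here, this is a hypothetical reconstruction: `group_contribution` is a stand-in for the per-group phase matrix calculation (a toy scalar whose orientation average is known), and the Euler angle ranges are assumptions. What the sketch preserves is the shape of the computation: an outer loop over the 18 incident angles and an inner Monte Carlo loop over independent leaf groups whose results are summed.

```python
import math
import random

INCIDENT_ANGLES_DEG = list(range(0, 90, 5))  # 0, 5, ..., 85 degrees


def group_contribution(alpha, beta, gamma, incident_deg):
    """Toy stand-in for one group's phase matrix (hypothetical)."""
    return math.cos(beta) ** 2 * math.cos(math.radians(incident_deg))


def serial_sweep(num_groups, seed=0):
    """Average the per-group contribution for every incident angle."""
    rng = random.Random(seed)
    results = {}
    for incident_deg in INCIDENT_ANGLES_DEG:
        total = 0.0
        for _ in range(num_groups):
            # Each iteration is independent of the others, which is what
            # makes the inner loop a candidate for one-thread-per-group.
            alpha = rng.uniform(0.0, 2.0 * math.pi)   # assumed range
            beta = rng.uniform(0.0, 0.5 * math.pi)    # assumed range
            gamma = rng.uniform(0.0, 2.0 * math.pi)   # assumed range
            total += group_contribution(alpha, beta, gamma, incident_deg)
        results[incident_deg] = total / num_groups
    return results
```

For the toy integrand the exact orientation average at normal incidence is 1/2, so `serial_sweep(4096)[0]` should land close to 0.5. With 32768 groups the inner body runs 18 × 32768 times, which matches the part of the original code that dominates the run time.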

IV. GPU CUDA IMPLEMENTATION

In 2007, NVIDIA introduced CUDA, an extension to the C programming language, for general-purpose computing on GPUs. It is designed so that its constructs allow for natural expression of data-level parallelism. A CUDA program is organized into two parts: a serial program running on the CPU and a parallel part running on the GPU [32], [33]. The parallel part is called a kernel. A CUDA program automatically uses more parallelism on GPUs that have more processor cores. A C program using CUDA extensions distributes a large number of copies of the kernel to the available multiprocessors (MPs) to be executed simultaneously. The CUDA code consists of three computational phases: transfer of data into the global memory of the GPU, execution of the CUDA kernel, and transfer of results from the GPU into the memory of the CPU.

The proposed GPU-based program for the double-layer vegetation modeling is executed on a personal computer with one Intel i7-930 2.8 GHz quad-core CPU and one NVIDIA Fermi GTX 480 GPU. The GTX 480 has 480 compute cores and high memory bandwidth. The GTX 480 specification is listed in Table IV.

As previously mentioned, the calculation of the phase matrix is the most time-consuming part. As shown in Fig. 4, we implement it on a GPU with 128 blocks and 256 threads per block. Each block is processed by one MP. Because the scattering coefficients must be calculated for each of 18 scattering angles, the kernel is executed eighteen times. Each thread calculates the phase matrix of one leaf group. The result is transferred from the registers to the shared memory and all the phase matrices are summed in each block. The final result of each block is transferred back to the host memory.

Fig. 4. Framework of the CUDA code.

Taking the backscattering of rice in the L-band as an example, the results of the serial and parallel programs are shown in Fig. 5.

Fig. 5. Comparison of the serial and parallel programs.

As listed in Table IV, the Fermi GTX 480 GPU has 32768 registers and 49152 bytes of shared memory per MP. All intermediate variables are stored in registers for faster access and only the phase matrices are stored in shared memory. Profiling our CUDA code showed that we used 32256 registers and 8196 bytes of shared memory per MP. The threads copy the calculated phase matrices of the leaves into the shared memory. The reduction can also be executed in parallel because all the threads in the same block can access the shared memory on the same multiprocessor, and the complexity of $O(N)$ can be reduced to $O(\log N)$. Table V shows how to implement parallel reduction by using shared memory.

TABLE V. PARALLEL REDUCTION

The performance of the preliminary CUDA code is listed in Table VI. As shown, the execution time grows roughly in proportion to the number of groups, so the speedup stays about the same.

TABLE VI. A PRELIMINARY CUDA CODE PERFORMANCE

The CUDA code consists of three phases, namely, data transfer from the CPU host to the GPU device, kernel execution on the GPU, and data transfer from the GPU device to the CPU host. Decreasing the data transfer time or its ratio to the total program execution time can improve the performance. The data transfer time can be reduced using asynchronous transfer [34], [35], which allows host-device data transfer and GPU kernel execution at the same time and thus conceals part of the data transfer time [36], as illustrated in Fig. 6.

Fig. 6. Asynchronous data transfer.

The scattering coefficient computation is packaged in the stream to update the previous result. Considering that the code needs to calculate the scattering coefficient at different scattering angles, several streams can be used to conceal host-device data transfer. Table VII lists the performance of the code using asynchronous data transfer.

TABLE VII. SPEEDUP IMPROVEMENT WITH ASYNCHRONOUS DATA TRANSFER

The performance of the CUDA code does not improve much even though the asynchronous transfer conceals part of the data transfer, because a serial method was used to generate the random numbers and transfer them to the device memory. Assuming that 32,768 leaf groups are computed and each group needs 5 random numbers, 640 KB of data must be transferred (32,768 × 5 values at 4 bytes each). NVIDIA introduced the CUDA Random Number Generation (cuRAND) library to generate random numbers of a specific distribution. In this work every CUDA thread generates 5 random numbers. The cuRAND library has two ways to generate random numbers: i) serial generation in host memory and ii) parallel generation in device memory. We use the second way because it is more efficient. After a random number is generated, there are three ways to handle it: i) storing it in global memory, ii) storing it in a register, and iii) using it directly without storage. Here we choose the last method to get the fastest access speed. As shown in Table VIII, the Euler angles of every leaf group can be obtained by using these random numbers directly.

TABLE VIII. GENERATING RANDOM NUMBER

Fig. 6 shows the process of storing and transferring data to improve speedup performance. First, the incident angles, 8 bytes in total, are transferred from host memory to device memory. The kernel is then executed to obtain the relative permittivity, the demagnetization factors, and the phase matrix. Finally, the scattering coefficient for one scattering angle, 16 bytes in total, is transferred from device memory to host memory.

Table IX lists the significantly improved speedup using the parallel random number generator, as compared to the speedup results in Table VII.
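The transfer volumes quoted above follow from simple arithmetic. The sketch below assumes 4-byte single-precision values, which is consistent with the 640 KB figure but is not stated explicitly in the text; the constant names are hypothetical, and the per-sweep total assumes that the 8-byte input and 16-byte output are moved once per scattering angle.

```python
BYTES_PER_FLOAT = 4        # assumption: single precision
NUM_GROUPS = 32768         # leaf groups, one per CUDA thread
RANDOMS_PER_GROUP = 5      # random numbers needed by each group
NUM_ANGLES = 18            # scattering angles, one kernel launch each

# Host-side generation: every random number has to cross the PCI-E bus.
host_rng_bytes = NUM_GROUPS * RANDOMS_PER_GROUP * BYTES_PER_FLOAT
host_rng_kib = host_rng_bytes / 1024

# Device-side generation: only the per-angle input and output remain.
per_angle_in_bytes = 8     # incident angles (figure taken from the text)
per_angle_out_bytes = 16   # scattering coefficients (figure taken from the text)
device_rng_bytes = NUM_ANGLES * (per_angle_in_bytes + per_angle_out_bytes)

print(host_rng_kib, device_rng_bytes)  # prints: 640.0 432
```

Under these assumptions, generating the random numbers on the device shrinks the traffic from hundreds of kilobytes to a few hundred bytes for the whole sweep, which is in line with the improvement the text attributes to the parallel generator.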

TABLE IX. IMPROVED SPEEDUP WITH THE CURAND LIBRARY

At this point the proposed vegetation scattering model is fully parallel. Although there is not much data transfer in the model, all the threads repeat some parts of the code, such as the computation of the soil relative permittivity, and all of them obtain the same results. All the Euler angles are scalars. There are 32,768 leaf groups and each group has its own unique values of the Euler angles. Fig. 7 shows the process of storing and transferring data to improve program performance. Initially the incident angles, 8 bytes in total, are transferred from host memory to device memory. The CUDA kernel then computes the relative permittivity, the demagnetization factors, and the phase matrix. Finally the scattering coefficients for each scattering angle, 16 bytes in total, are transferred from device memory to host memory.

Fig. 7. Data access.

Modern high-performance computing promotes heterogeneous computation, in which the CPU mainly executes the serial calculation and the GPU focuses on the massively parallel computation. The calculation of the soil relative permittivity and the demagnetization factors of each group is executed only once by the CPU, and these parameters along with the incident angles, 48 bytes in total, are then transferred to the global memory (Fig. 8), thereby avoiding the repeated computation in the kernel function. The kernel only needs to calculate the phase matrix of every group. Table X shows the performance of the final code on one NVIDIA GTX 480 GPU. As seen, the speedup increases with the number of groups (threads) because more registers are used.

Fig. 8. Process of data transfer in the proposed GPU model.

TABLE X. TIMING RESULT ON GTX 480

The single-GPU version is further extended to execute on the NVIDIA GTX 590 card with two 512-core GPUs sharing the same PCI-E bus bandwidth. To calculate 18 scattering angles, the kernel function must be executed eighteen times on one GPU. We distributed the executions over the two GPUs so that each GPU runs the kernel nine times, as illustrated in Fig. 9. The dual-GPU execution time and speedup results for different numbers of leaf groups are listed in Table XI.

Fig. 9. Multi-GPU configuration.

TABLE XI. EXECUTION TIMING AND SPEEDUP BENCHMARK WITH THE NVIDIA DUAL-GPU GTX 590 CARD

The final version of the CUDA program includes several ways to improve the speedup: i) using the registers for faster memory access and the shared memory for parallel reduction, ii) developing multiple pipelined streams and asynchronous transfer for simultaneous host-device data transfer and kernel execution to conceal data transfer time, iii) adopting a parallel version of the random number generator, and iv) performing the CPU-GPU heterogeneous computation. By using these techniques, the final speedup is 213× on the NVIDIA GTX 480 GPU and 291× on the dual-GPU GTX 590 card.
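Technique i) relies on a tree-shaped reduction in shared memory. The sketch below emulates that access pattern on the host in Python rather than CUDA; it is a generic illustration of the stride-halving scheme, not a transcription of Table V, and the function name is hypothetical. Each pass of the outer loop stands for one synchronized step in which the first `stride` threads each add one partner element, so a block of 256 values is summed in 8 steps instead of 255 sequential additions.

```python
def block_reduce(values):
    """Sum a power-of-two-sized sequence with a stride-halving tree reduction.

    Returns (total, steps). The list plays the role of a block's shared memory.
    """
    shared = list(values)
    n = len(shared)
    if n == 0 or n & (n - 1):
        raise ValueError("block size must be a power of two")
    steps = 0
    stride = n // 2
    while stride > 0:
        # On a GPU these additions would run concurrently, one per thread,
        # followed by a barrier before the next step. Reads come from the
        # upper half and writes go to the lower half, so they never collide.
        for tid in range(stride):
            shared[tid] += shared[tid + stride]
        stride //= 2
        steps += 1
    return shared[0], steps


# 128 blocks of 256 threads, matching the kernel configuration in the text.
per_block = [block_reduce(range(b * 256, (b + 1) * 256)) for b in range(128)]
# The per-block partial sums are combined afterwards, here on the host.
grand_total = sum(total for total, _ in per_block)
```

In this emulation each block needs log2(256) = 8 steps, and `grand_total` equals the plain sum of all 32768 inputs. In a real kernel the elements would be phase matrices rather than integers, but the indexing pattern is the same.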

V. CONCLUSION

In this paper, we have designed a highly parallel GPU-based algorithm for the Monte Carlo-based electromagnetic scattering of a double-layer vegetation model to run on the NVIDIA Fermi GTX 480 with 480 cores and the NVIDIA dual-GPU GTX 590 with 1024 cores. In this approach, the leaves are first divided into different groups according to their orientations and sizes. The phase matrix of each leaf group is then calculated, and the sum of the individual phase matrices is computed to obtain the backscattering coefficient. The large number of leaf groups and the independence of the groups make the problem suitable for massive parallelization using GPU/CUDA. We detailed the techniques to improve the speedup for the NVIDIA Fermi GPUs. We used as many registers as possible for faster memory access. Asynchronous transfer was introduced to conceal the data transfer time, and the cuRAND library was used to generate random numbers efficiently in parallel. Heterogeneous computation was also implemented to allow the CPU and GPU to execute the serial and parallel tasks, respectively. By using these techniques, the final speedup is 213× on the NVIDIA GTX 480 GPU and 291× on the dual-GPU GTX 590 card with respect to the pure-CPU counterpart compiled with optimization.

REFERENCES[1] Trevor, “UHF propagation through woods and underbrush,” RCA Rev.,

vol. 5, 1940.[2] S. Seker and A. Schneider, “Stochastic model for pulsed radio trans-

mission through stratified forests,” IEE Prod., vol. 134, Aug. 1987.[3] N. C. Curri, E. E. Martin, and F. B. Dyer, “Radar Foliage Penetration

Measurements at Millimeter Wavelengths,” Georgia Inst. Technol.,Eng. Experiment Station, Atlanta, GA, 1975.

[4] D. L. Hogan, “An Analysis of Foliage Effects on Long-Range Surveil-lance,” MIT Lincoln Lab., Lexington, MA, 1980.

[5] A. Tavakoli, K. Sarabandi, and F. T. Ulaby, “Horizontal propagationthrough periodic vegetation canopies,” IEEE Trans. Antennas Propag.,vol. 39, pp. 1014–1023, 1991.

1806 IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. 6, NO. 4, AUGUST 2013

[6] E. P. W. Attema and F. T. Ulaby, “Vegetation modeled as a watercloud,” Radio Sci., vol. 13, pp. 357–364, 1981.

[7] L. Tsang, M. C. Kubacsi, and J. A. Kong, “Radiative transfer theoryfor active remote sensing of a layer of small ellipsoidal scatters,” RadioSci., vol. 16, pp. 321–329, 1981.

[8] L. H. Lang, “Electromagnetic backscattering from a sparse distributionof lossy dielectric scatters,” Radio Sci., vol. 16, pp. 15–30, 1981.

[9] A. K. Fung and H. S. Fung, “Application of first-order renormalizationmethod to scattering from a vegetation-like half-space,” IEEE Trans.Geosci. Electr., vol. GE-15, no. 4, pp. 189–195, Oct. 1977.

[10] J. A. Richards, G. Sun, and D. S. Simonett, “L-band radar backscattermodeling of forest stands,” IEEE Trans. Geosci. Remote Sens., vol.GE-25, pp. 487–498, 1987.

[11] M. A. Karam and A. K. Fung, “Leaf-shape effects in electromagneticwave scattering from vegetation,” IEEE Trans. Geosci. Remote Sens.,vol. 27, pp. 187–697, 1989.

[12] S. L. Durden, J. J. Van Zyl, and H. A. Zebker, “Modeling and observa-tion of the radar polarization signature of forested areas,” IEEE Trans.Geosci. Remote Sens., vol. 37, pp. 290–310, 1989.

[13] F. T. Ulaby, K. Sarabandy, K. McDonald, M.Whitt, andM. C. Kobson,“Michigan microwave canopy scattering model,” Int. J. Remote Sens.,vol. 11, pp. 1223–1253, 1990.

[14] R. H. Lang and J. S. Sighu, “Electromagnetic backscattering from alayer of vegetation: A discrete approach,” IEEE Trans. Geosci. RemoteSens., vol. GE-21, pp. 62–71, 1983.

[15] K. Sarabandi, P. F. Polatin, and F. T. Ulaby, “Monte Carlo simulationof scattering from a layer of vertical cylinders,” IEEE Trans. AntennasPropag., vol. 41, pp. 465–473, 1993.

[16] P. F. Polatin, K. Sarabandi, and F. T. Ulaby, “Monte Carlo simula-tion of electromagnetic scattering from a heterogeneous two-compo-nent medium,” IEEE Trans. Antennas Propag., vol. 43, pp. 1048–1057,1995.

[17] L. Tsang, K. H. Ding, G. Zhang, C. C. Hsu, and J. A. Kong,“Backscattering enhancement and clustering effects of randomlydistributed dielectric cylinders overlying a dielectric half space basedon Monte-Carlo simulations,” IEEE Trans. Antennas Propag., vol. 43,pp. 488–498, 1995.

[18] M. Zhang, Y. X. Song, Z. S. Wu, and A. Y. Ma, “Simulation of low-grazing scattering properties of vegetation,” Chinese Phys. Lett., pp.502–505.

[19] Y.-X. Song, M. Zhang, and Z.-S. Wu, “Analysis the electromagneticscattering of the double-layer vegetation,” J. Xidian University, pp.410–414, 2003.

[20] M. A. Karam,M. A. Karam, A. K. Fung, and Y.M.M. Antar, “Electro-magnetic wave scattering from some vegetation sample,” IEEE Trans.Geosci. Remote Sens., vol. 26, pp. 799–808, 1988.

[21] NVIDIA CUDA C Programming Guide. NVIDIA Corp., 2011.[22] CUDA C Best Practices Guide. NVIDIA Corp., 2011.[23] Tuning CUDA Application for Fermi. NVIDIA Corp., 2011.[24] C.-C. Chang, Y.-L. Chang, M.-Y. Huang, and B. Huang, “Acceler-

ating regular LDPC code decoders on GPUs,” IEEE J. Sel. Topics Appl.Earth Observ. Remote Sens. (JSTARS), vol. 4, no. 3, p. 653, Sep. 2011.

[25] S.-C. Wei and B. Huang, “GPU acceleration of predictive partitionedvector quantization for ultraspectral sounder data compression,” IEEEJ. Sel. Topics Appl. Earth Observ. Remote Sens. (JSTARS), vol. 4, no.3, p. 677, Sep. 2011.

[26] C. Song, Y. Li, and B. Huang, “A GPU-accelerated wavelet decom-pression system with SPIHT and Reed-Solomon decoding for satelliteimages,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. (JS-TARS), vol. 4, no. 3, p. 683, Sep. 2011.

[27] J. Mielikainen, B. Huang, and H.-L. A. Huang, “GPU-acceleratedmulti-profile radiative transfer model for the infrared atmosphericsounding interferometer,” IEEE J. Sel. Topics Appl. Earth Observ.Remote Sens. (JSTARS), vol. 4, no. 3, p. 691, Sep. 2011.

[28] C. A. Lee, S. D. Gasster, A. Plaza, C.-I Chang, and B. Huang, “Re-cent developments in high performance computing for remote sensing:A review,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. (JS-TARS), vol. 4, pp. 508–527, Sep. 2011.

[29] L. Tsang, J. A. Kong, and R. T. Shin, Theory of Microwave RemoteSensing. New York: Wiley-Interscience, 1985.

[30] A. K. Fung, Microwave Scattering and Emission Models and TheirApplications. Boston, MA: Artech House, 1994.

[31] L. Tsang, J. A. Kong, and K.-H. Ding, Scattering of ElectromagneticWaves: Theories and Applications. New York: Wiley, 2000.

[32] J. Sanders and E. Kandrot, CUDA by Example an Introduction to Gen-eral-Purpose GPU Programming. Boston, MA: Addison-Wesley,2010.

[33] D. B. Kirk and W. W. Hwu, Programming Massively Parallel Proces-sors: A Hands-on Approach. New York: Elsevier, 2009.

[34] Whitepaper, NVIDIA GF100. NVIDIA Corp., 2010.[35] NVIDIA’s Next Generation CUDA Compute Architecture: Fermi.

NVIDIA Corp., 2010.[36] NVIDIAN GeForce GTX480/470/465 GPU Datasheet. NVIDIA

Corp., 2010.

Xiang Su received the B.S. degree in 2009 from the School of Science, Xidian University, China, where he is currently pursuing the Ph.D. degree. His research interests include GPU high-performance computing in remote sensing and computational electromagnetics.

Jiaji Wu received the B.S. degree in electrical engineering from Xidian University, Xi’an, China, in 1996, the M.S. degree from the National Time Service Center (NTSC), Chinese Academy of Sciences, in 2002, and the Ph.D. degree in electrical engineering from Xidian University in 2005. He currently is a Professor at Xidian University. His current research interests include still image coding, hyperspectral/multispectral image compression, communication, and high-performance computing.

Bormin Huang received the M.S.E. degree in aerospace engineering from the University of Michigan, Ann Arbor, and the Ph.D. degree in the area of satellite remote sensing from the University of Wisconsin-Madison. He was in NASA Langley Research Center during 1998–2001 for the NASA New Millennium Program’s Geosynchronous Imaging Fourier Transform Spectrometer (GIFTS). He is currently a research scientist and principal investigator at the Space Science and Engineering Center, the University of Wisconsin-Madison, where he advises and supports both national and international graduate students and visiting scientists. He has authored or coauthored over 100 scientific and technical publications, including the book “Satellite Data Compression” (Springer, 2011). He has broad interests and experiences in remote sensing science and technology, including satellite data compression and communications, remote sensing image processing, remote sensing forward modeling and inverse problems, and high-performance computing in remote sensing.

Dr. Huang has been serving as a Chair for the SPIE Europe Conference on High Performance Computing in Remote Sensing since 2011 and for the SPIE Conference on Satellite Data Compression, Communications, and Processing since 2005. He currently serves as an Associate Editor for the Journal of Applied Remote Sensing and an Editor for the Journal of Geophysics and Remote Sensing. He has also served as a program committee member and session chair for several IEEE or SPIE conferences.

Zhensen Wu is a Professor at the Institute of Intelligent Information Processing, Xidian University, China.