This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON ELECTROMAGNETIC COMPATIBILITY

GPU-Based Calculation of Lightning-Generated Electromagnetic Fields in 3-D Problems With Statistically Defined Uncertainties

Georgios G. Pyrialakos, Theodoros T. Zygiridis, Member, IEEE, Nikolaos V. Kantartzis, Senior Member, IEEE, and Theodoros D. Tsiboukis, Senior Member, IEEE

Abstract—A complete computational framework for the efficient study of lightning-induced electromagnetic fields and solution of pertinent problems with uncertainties in realistic environments is presented in this paper. The latter often involve various factors, such as material inhomogeneities, rough terrain surfaces, and irregular lightning channels that may inhibit the utilization of simplified approaches. To deal with these situations of augmented complexity, the finite-difference time-domain method is applied in 3-D curvilinear formulation, ensuring that all the important details are taken into account. As the study of real-life lightning problems involves intense computations, the algorithm is accelerated by exploiting the computing capabilities of contemporary graphics processing units. Our implementation relies on a massive parallelization approach, introduces several new optimized practices, and ensures significant shortening of the simulations' duration. Hence, the investigation of configurations with uncertainties and the extraction of statistical features are greatly facilitated. In other words, the proposed approach comprises an instructive contribution toward the foundation of a useful tool for the in-depth investigation of lightning-related phenomena.

Index Terms—Finite-difference time-domain method (FDTD), graphics processing unit (GPU) computing, lightning, nonorthogonal grids, stochastic properties.

I. INTRODUCTION

The reliable calculation of lightning-generated electromagnetic pulses, the assessment of their consequences, and the impact of statistically varying factors are subjects of considerable importance for the engineering community. In essence, the consistent investigation of pertinent phenomena constitutes a key element of studies in several problems, including overhead transmission lines [1], [2], underground cables [3]–[5], human-safety issues [6], electrical wiring [7] and circuitry protection, quality issues in distribution networks [8], grounding systems [9], as well as other significant configurations [10]–[13].

Manuscript received March 17, 2015; revised May 13, 2015; accepted June 19, 2015. This work was supported by the European Union (European Social Fund, ESF) and Greek National Funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF), Research Funding Program: Aristeia. Investing in knowledge society through the European Social Fund.

G. Pyrialakos, N. Kantartzis, and T. Tsiboukis are with the Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece (e-mail: [email protected]; [email protected]; [email protected]).

T. Zygiridis is with the Department of Informatics and Telecommunications Engineering, University of Western Macedonia, Kozani 50100, Greece (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TEMC.2015.2450496

Over the past years, various analytical [14]–[16] and numerical methods for predicting electromagnetic fields due to lightning strikes have been presented and investigated. The popular Cooray–Rubinstein approximate formula [17] can be used for calculating the horizontal electric component over lossy grounds. Results from the formula's time-domain counterpart can also be obtained [18] through a procedure that necessitates the calculation of several integrals. Other methodologies [19], [20] aim directly at the computation of Sommerfeld integrals, which describe the behavior of electric dipoles (treated as building blocks of the lightning channel). Although fast, the aforementioned approaches cannot easily take into account a number of important factors that are present in real-world problems. Hence, computer simulations are deemed more suitable for modeling lightning in complex environments. Regarding these solutions, the finite-difference time-domain (FDTD) method [21] is the most widespread choice. In the ideal case of geometries with rotational symmetry, a two-dimensional (2-D) implementation suffices [22], even if mixed propagation paths are present [23]. In more advanced contexts [3], the generated fields are computed with a 2-D approach, and then embedded into a three-dimensional (3-D) simulation, to predict induced voltages. FDTD studies in 3-D are also performed in [24] and [25], for calculating the horizontal electric field and assessing the reliability of the Cooray–Rubinstein formula over rough grounds. In [26], a treatment of large-scale problems is presented, which incorporates moving windows in the computational domain, and performs parallel computations based on a message-passing interface. Apart from the FDTD technique, other alternatives based on the multiresolution time-domain scheme [27], the finite-element technique [28], the transmission-line-matrix method [29], and the constrained-interpolation-profile approach [30] have been applied to problems of similar complexity. Nevertheless, none of these methodologies investigates the statistical properties of the produced electromagnetic fields.

Evidently, the development of a computational methodology that delivers all the necessary calculations, both fast and reliably under realistic constraints, remains an open problem. Moreover, available approaches do not tackle the fact that some aspects of lightning problems exhibit a certain level of randomness, which induces uncertainty in the outputs. In this paper, we present a solution that, to the best of our knowledge, fulfills these two objectives for the first time. Its basic element is a set of full-wave FDTD simulations, conducted within a 3-D curvilinear framework. This choice is highly flexible: it can model nonsymmetric configurations, rough terrains, and inhomogeneous soils with Cartesian meshes, while the more demanding case of tilted or irregular lightning channels is handled with nonorthogonal grids. Furthermore, computations are parallelized by implementing and optimizing the algorithm on graphics processing units (GPUs) [31]–[33]. Their multitude of cores allows us to exploit the parallelization potential of the FDTD method, while advanced programming practices contribute toward the most efficient resource utilization. In this way, a marked reduction of computational times is accomplished, which is crucial for the extraction of statistical information. Numerical tests verify the accuracy and speed of the GPU implementations. Also, the impact of one or more factors on the induced fields is assessed in several cases. Overall, we show that the proposed approach can solve, both reliably and efficiently, real-life applications that incorporate lightning-induced phenomena, even when uncertainties are involved.

0018-9375 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Fig. 1. Typical computational domain of the lightning problem, for setups with inhomogeneous grounds, nonflat terrains, and irregular lightning channels.

II. PROBLEM MODELING

The typical layout of the lightning problems is sketched in Fig. 1. The 3-D computational domain is occupied by either air or ground material. The sources of the electromagnetic fields appear on the lightning channel only, which either is assumed straight, or may have a more complex pattern (most studies to date consider only the straight-channel case). The current distribution is determined according to the modified transmission line model with exponential decay [34],

$$
I(z, t) = I(0, t - z/v)\, e^{-\alpha z}\, u(t - z/v) \tag{1}
$$

where $u(t)$ is the unit-step function and $I(0, t)$ is the time signature of the channel base current ($z = 0$ indicates the earth's surface). The speed of the return stroke is $v = 1.5 \times 10^{8}$ m/s and the constant $\alpha$ of the exponential decay is $1/2000\ \mathrm{m}^{-1}$. The above formula can also be used for tilted straight channels, after replacing $z$ with $z'$, which describes the position along the new axis of the skewed mesh. Irregular channels are treated similarly, after defining a length-measuring variable along the channel curve. Complying with a common practice, the base current is represented by two Heidler functions:

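$$
I(0, t) = \sum_{\ell=1}^{2} \frac{I_{0\ell}}{\eta_\ell}\, \frac{(t/\tau_{\ell 1})^{2}\, e^{-t/\tau_{\ell 2}}}{1 + (t/\tau_{\ell 1})^{2}} \tag{2}
$$

where $\eta_\ell = \exp\!\left[-(\tau_{\ell 1}/\tau_{\ell 2})(2\tau_{\ell 2}/\tau_{\ell 1})^{1/2}\right]$, $\ell = 1, 2$. The values of the parameters in (2) for first and subsequent strokes can be found in [20].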
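As an illustration of (1) and (2), the following minimal sketch evaluates the channel current at a given height and time. The return-stroke speed and decay constant are the values quoted above; the Heidler parameters are illustrative placeholders assumed here (the paper defers to [20] for the actual ones), and the function names are hypothetical.

```python
import math

# Assumed, illustrative two-term Heidler parameters (I0 [A], tau1 [s], tau2 [s]).
# The paper refers to [20] for the values actually used.
HEIDLER_TERMS = [
    (10.7e3, 0.25e-6, 2.5e-6),
    (6.5e3, 2.1e-6, 230e-6),
]
V_RETURN = 1.5e8      # return-stroke speed v [m/s], as stated in the text
ALPHA = 1.0 / 2000.0  # exponential decay constant alpha [1/m], as stated in the text


def base_current(t, terms=HEIDLER_TERMS):
    """Channel-base current I(0, t) of Eq. (2); taken as zero for t <= 0."""
    if t <= 0.0:
        return 0.0
    total = 0.0
    for i0, tau1, tau2 in terms:
        eta = math.exp(-(tau1 / tau2) * math.sqrt(2.0 * tau2 / tau1))
        x2 = (t / tau1) ** 2
        total += (i0 / eta) * x2 / (1.0 + x2) * math.exp(-t / tau2)
    return total


def channel_current(z, t, v=V_RETURN, alpha=ALPHA):
    """MTLE current I(z, t) of Eq. (1) at height z [m] and time t [s]."""
    retarded = t - z / v
    if retarded < 0.0:  # unit step u(t - z/v): the front has not arrived yet
        return 0.0
    return base_current(retarded) * math.exp(-alpha * z)
```

For a tilted or irregular channel, the same function would be called with the length-measuring variable along the channel in place of z, as described in the text.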

As mentioned, realistic factors such as medium inhomogeneities, rough soil surface, random channel geometries, etc., practically necessitate the implementation of 3-D simulations. Herein, spatial steps of 1-m length are sufficient, and the computational domains are surrounded by a 16-cell convolution perfectly matched layer (CPML) [35], capable of matching lossless and lossy media. When a problem's geometry permits, a perfect magnetic conductor is inserted at the xz plane of the channel, accompanied by image values for certain field components. In this way, redundant calculations due to symmetry are avoided and memory requirements are minimized. Parallel computations are performed in single precision, exploiting the full potential of the GPUs. Tests with double-precision variables do not produce any noteworthy changes, while some performance loss is observed.

Before proceeding, we would like to elaborate further on some aspects that will be explored. The first one refers to the geometry of the lightning channel. As our purpose is to provide a thorough investigation, the simulations consider, apart from the completely straight perpendicular channels, the cases where: 1) the channel is completely straight, but forms an oblique angle with the horizontal plane, and 2) the channel is not straight, but exhibits a more irregular pattern. Given that the standard FDTD meshes lack the necessary level of flexibility, the algorithm is modified, so that it allows grid deformation according to the channel's geometry. In this way, we avoid staircase approximations in the vicinity of the problem's source that are likely to produce additional errors. At the same time, the implementation remains relatively simple, because the structured nature of the mesh is preserved and only modifications along the z-axis are called for. Fig. 2 depicts the local grid distortion in the general case, so that cell edges change in accordance with the channel pattern.

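Following [21], the update equations can be derived from the integral form of Maxwell's equations and involve the necessary metric coefficients. For instance, the $E_x$ update is

$$
\begin{aligned}
E_x\big|^{n+1}_{i+\frac{1}{2},j,k} ={}& C_a\big|_{i+\frac{1}{2},j,k}\, E_x\big|^{n}_{i+\frac{1}{2},j,k} \\
&+ \frac{C_b\big|_{i+\frac{1}{2},j,k}}{\Delta y \sqrt{g_{i+\frac{1}{2},j,k}}} \left( H_{\tilde{z}'}\big|^{n+\frac{1}{2}}_{i+\frac{1}{2},j+\frac{1}{2},k} - H_{\tilde{z}'}\big|^{n+\frac{1}{2}}_{i+\frac{1}{2},j-\frac{1}{2},k} \right) \\
&- \frac{C_b\big|_{i+\frac{1}{2},j,k}}{\Delta z' \sqrt{g_{i+\frac{1}{2},j,k}}} \left( H_y\big|^{n+\frac{1}{2}}_{i+\frac{1}{2},j,k-\frac{1}{2}} - H_y\big|^{n+\frac{1}{2}}_{i+\frac{1}{2},j,k+\frac{1}{2}} \right)
\end{aligned}
\tag{3}
$$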

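where $C_a$ and $C_b$ are standard spatially varying constants,

$$
C_a\big|_{i+\frac{1}{2},j,k} = \frac{2\varepsilon\big|_{i+\frac{1}{2},j,k} - \sigma\big|_{i+\frac{1}{2},j,k}\,\Delta t}{2\varepsilon\big|_{i+\frac{1}{2},j,k} + \sigma\big|_{i+\frac{1}{2},j,k}\,\Delta t} \tag{4}
$$

$$
C_b\big|_{i+\frac{1}{2},j,k} = \frac{2\Delta t}{2\varepsilon\big|_{i+\frac{1}{2},j,k} + \sigma\big|_{i+\frac{1}{2},j,k}\,\Delta t}. \tag{5}
$$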
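A minimal sketch of how the coefficients in (4) and (5) could be evaluated for one cell is given below. The function name and the sample material values are assumptions made for illustration; only the two formulas come from the text.

```python
EPS0 = 8.8541878128e-12  # vacuum permittivity [F/m]


def update_coefficients(eps_r, sigma, dt):
    """Return (Ca, Cb) of Eqs. (4)-(5) for one cell.

    eps_r: relative permittivity, sigma: conductivity [S/m], dt: time step [s].
    """
    eps = EPS0 * eps_r
    denom = 2.0 * eps + sigma * dt
    ca = (2.0 * eps - sigma * dt) / denom
    cb = 2.0 * dt / denom
    return ca, cb
```

In a lossless cell (sigma = 0) the pair reduces to Ca = 1 and Cb = dt/eps, which is a convenient sanity check before filling the coefficient arrays.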

Here, $g_{i+\frac{1}{2},j,k}$ corresponds to the determinant of the metric tensor, $\Delta z' = \Delta z / \cos\theta_k$, and $\theta_k$ is the tilt angle of the individual cell with respect to the original z-axis. The necessary projections used in (3) are obtained according to

Fig. 2. Nonorthogonal FDTD grid that conforms to the irregular shape of a lightning channel.

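$$
\begin{aligned}
H_{\tilde{z}'}\big|^{n+\frac{1}{2}}_{i+\frac{1}{2},j+\frac{1}{2},k} ={}& \left(\tan^{2}\theta_k + 1\right) H_{z'}\big|^{n+\frac{1}{2}}_{i+\frac{1}{2},j+\frac{1}{2},k} \\
&+ \frac{1}{4}\tan\theta_k \left( H_x\big|^{n+\frac{1}{2}}_{i,j+\frac{1}{2},k-\frac{1}{2}} + H_x\big|^{n+\frac{1}{2}}_{i+1,j+\frac{1}{2},k-\frac{1}{2}} + H_x\big|^{n+\frac{1}{2}}_{i,j+\frac{1}{2},k+\frac{1}{2}} + H_x\big|^{n+\frac{1}{2}}_{i+1,j+\frac{1}{2},k+\frac{1}{2}} \right).
\end{aligned}
\tag{6}
$$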
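The projection in (6) combines the local $H_{z'}$ sample with the average of its four neighboring $H_x$ samples. A minimal per-node sketch, with a hypothetical function name and the four neighbors passed in explicitly, could look as follows.

```python
import math


def project_hz(hz, hx_neighbors, theta):
    """Eq. (6): projected component from H_z' and four surrounding H_x samples.

    hz: H_z' at (i+1/2, j+1/2, k)
    hx_neighbors: the four H_x values at (i or i+1, j+1/2, k-1/2 or k+1/2)
    theta: tilt angle theta_k of the cell [rad]
    """
    if len(hx_neighbors) != 4:
        raise ValueError("exactly four H_x neighbors are expected")
    t = math.tan(theta)
    return (t * t + 1.0) * hz + 0.25 * t * sum(hx_neighbors)
```

For an untilted cell (theta = 0) the function returns hz unchanged, so the orthogonal update is recovered.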

Similar equations are introduced for the CPML updates as well. The modified stability criterion is now described by

$$
\Delta t \le \left[ c\, \max \left\{ \sqrt{\sum_{\ell=1}^{3} \sum_{m=1}^{3} g_{\ell,m}} \right\}_{i,j,k} \right]^{-1}. \tag{7}
$$

Since the cell distortion in this case is only pertinent to the z-axis, we have $g_{13} = \tan\theta_k$, $g_{33} = \tan^{2}\theta_k + 1$, $g_{11} = g_{22} = 1$, while the rest of the metric coefficients are equal to zero.
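The bound in (7) can be evaluated once per mesh from the tilt angles of the cells. The sketch below is a literal reading under two assumptions that are not stated in the text: the metric tensor is taken as symmetric (so that $g_{31} = g_{13}$), and the coefficients are treated as normalized by a uniform cell size `delta`, so that the untilted case reduces to the familiar Courant limit of a cubic grid. All names are hypothetical.

```python
import math

C0 = 299792458.0  # speed of light in vacuum [m/s]


def metric_sum(theta):
    """Sum of all metric coefficients for a cell tilted by theta [rad].

    Assumes a symmetric tensor with g13 = g31 = tan(theta).
    """
    t = math.tan(theta)
    g = [
        [1.0, 0.0, t],
        [0.0, 1.0, 0.0],
        [t, 0.0, t * t + 1.0],
    ]
    return sum(sum(row) for row in g)


def max_time_step(tilt_angles, delta=1.0, c=C0):
    """Largest time step allowed by Eq. (7), scaled by the cell size delta [m]."""
    worst = max(math.sqrt(metric_sum(theta)) for theta in tilt_angles)
    return delta / (c * worst)
```

With all angles equal to zero the result is delta / (c * sqrt(3)); any cell with a positive tilt tightens the limit for the whole mesh.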

The second factor that needs to be addressed is the modeling of distorted nonflat terrains. In this case, we use simple averaging of material parameters at positions on the air–ground interface. Specifically, if $s_{\text{air}}$ and $s_{\text{ground}}$ denote either the dielectric constant or the conductivity of the air and the ground, respectively, and $v$ is the fraction of a cell volume occupied by ground material, then the effective value of the parameter assigned to this specific cell is set according to

$$
s_{\text{eff}} = v\, s_{\text{ground}} + (1 - v)\, s_{\text{air}}. \tag{8}
$$
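As a quick illustration of (8), the sketch below computes the effective parameter of an interface cell. The function name and the sample ground values are assumptions; the paper does not specify them at this point.

```python
def effective_parameter(v, s_ground, s_air):
    """Eq. (8): volume-weighted average of a material parameter.

    v is the fraction (0..1) of the cell volume occupied by ground.
    """
    if not 0.0 <= v <= 1.0:
        raise ValueError("v must lie between 0 and 1")
    return v * s_ground + (1.0 - v) * s_air
```

For example, a cell that is one quarter ground with an assumed relative permittivity of 10 (air: 1) would be assigned `effective_parameter(0.25, 10.0, 1.0)`, i.e., 3.25. The same call applies to the conductivity.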

III. PARALLEL GPU IMPLEMENTATION

GPU computing is nowadays deemed a highly effective solution for the acceleration of scientific applications. This is hardly surprising, if one bears in mind the constantly growing capabilities of GPUs, their relatively low cost, and their expanding utilization in general-purpose computing. Due to continuous research and development, modern GPU hardware introduces a multitude of processing cores and various levels in the memory hierarchy. Such features render GPUs most suitable for the parallelization of the FDTD method [36], [37], as its equations exhibit a nontrivial degree of independence of the involved field elements, which facilitates the execution of many simultaneous updates. Several studies have pointed out that GPU programming can be very useful in reducing FDTD computing times; hence, its application in repeated 3-D simulations appears to be a consistent choice. The following literature review describes how FDTD implementations have benefited from the development of GPU computing.

A. Available Parallelization Strategies on GPUs

Early implementations of the FDTD method on GPUs were highly complicated, due to the lack of properly structured programming languages. In one of the first attempts [38], texture memory was used for array storage, and a moderate acceleration factor of 7 was accomplished for a 2-D problem. With the availability of a more easy-to-use programming tool ("Brook") that allowed the incorporation of kernels, GPU exploitation became simpler [39]. In the latter publication, tiling was applied to create 2-D arrays from 3-D ones, and speedups up to 25 times were reported. Later, the advent of today's programming platforms [40], combined with the upgrade of GPU architectures, made room for more efficient implementations. For example, the issue of shared-memory utilization is addressed in [41], along with appropriate schemes for data transfers. The work presented in [42] stresses the significance of coalesced memory accesses, and investigates different organizations of a block's threads in 1-D arrays. Another implementation of the 2-D algorithm is presented in [43], where texture as well as shared memories are utilized, and acceleration up to 100 times is accomplished. The application of the 3-D method to MRI problems is presented in [44], where a 45× acceleration is noted. Three different approaches regarding the mapping of Yee cells to threads are examined in [45] without using shared memory, while the integration of streams is mentioned in [46], to enable concurrency and hide memory latencies during upload and calculation of data.

Consequently, the parallelization of the standard FDTD scheme on GPUs has clearly reached an adequate level of maturity today. Nonetheless, developing a parallel numerical code should not be considered a trivial task, especially if full optimization and utilization of available resources are sought. In this paper, we have performed a critical compilation of some of the developed best practices, and combined them for Monte Carlo simulations of lightning problems, using curvilinear grids and CPML boundaries. In addition, we describe a number of different programming strategies, concerning specific aspects of the algorithm's parallelization, and provide evidence that verifies the optimality of the proposed actions.

B. Proposed GPU Implementation

In our case, the developed 3-D code is based on the CUDA 5.5 programming platform [40]. A flowchart that describes the implemented algorithm is depicted in Fig. 3. The computational approach consists of a loop in time, during which the main update of the electric and the magnetic field components is completed. In the case of fully orthogonal grids, the use of one kernel for each update suffices; however, we have found that a multikernel implementation (exploiting streams, as discussed later) is more beneficial in the case of the curvilinear FDTD algorithm. To calculate the field in the surrounding absorbing layers, two more groups of kernels are introduced. Each one comprises 12 CPML kernels and is initiated after the execution of the corresponding main one. This structure of the CPML update is the result of our search for optimum performance, according to which it was determined that each kernel should be responsible for one boundary layer and one specific component of the electromagnetic field. In particular, the update procedure within each one of the domain's six sides requires four kernels (two for electric and two for magnetic components), in order for the additional calculations (due to the extra CPML terms) to be completed.

Fig. 3. Flowchart describing the GPU implementation of the algorithm that enables the calculation of statistical properties.

Fig. 4. Improvement due to the use of different streams for magnetic field update, compared to the single-kernel implementation.
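The kernel count described above (six sides, two electric and two magnetic components per side, hence two groups of 12) can be reproduced with a short enumeration. The sketch assumes that the two components handled on each side are those tangential to that boundary, which is the usual situation in CPML formulations but is not stated explicitly in the text; the names are hypothetical.

```python
AXES = "xyz"


def cpml_kernels(field):
    """List one (side, component) pair per CPML kernel for field 'E' or 'H'.

    Assumption: on a boundary normal to a given axis, only the two
    tangential components receive extra CPML terms.
    """
    kernels = []
    for normal in AXES:
        tangential = [axis for axis in AXES if axis != normal]
        for side in ("-", "+"):
            for comp in tangential:
                kernels.append((side + normal, field + comp))
    return kernels
```

Calling `cpml_kernels("E")` and `cpml_kernels("H")` yields the two groups, each of which would be launched after the corresponding main update.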

In the following paragraphs, the selection of the above, as well as other programming strategies, is justified and tested.

1) Use of Streams: Aiming at the highest level of parallelization, we have utilized streams at various points. Streams refer to different and independent flow sequences, programmed to be executed at the same time, avoiding serialization. Thanks to them, we have partially achieved the concurrent execution of the main, as well as the CPML kernels (note that such a solution is very rarely analyzed in pertinent publications). Their benefit is evident in Fig. 4 for the magnetic field update of the curvilinear algorithm, where the required calculations are completed in 38% less time when streams are used (the timings refer to a single time-step update). Practically, performance is improved by "hiding" the kernel and variable reinitialization overhead. As far as the CPML updates are concerned, the implementation of the aforementioned kernel groups with the aid of parallel streams also allows, to a certain degree, their concurrent execution, and facilitates memory coalescing, as explained later. Fig. 5 verifies the gain due to this multiple-kernel implementation, since the required time is reduced by a factor of 3.6.

Fig. 5. Performance differences between single- and multiple-kernel implementations of the CPML updates.

2) Memory Considerations: A successful GPU implementation depends on several factors, and the incorporation of streams may not be sufficient if other aspects of parallelization are neglected. In essence, one has to take into consideration the type and size of the available hardware memories, in order to exploit their strengths and, at the same time, avoid the corresponding shortcomings. In the proposed implementation, the global memory is used for storing the main field components, as well as the CPML variables. We have paid attention to the proper matrix alignment in memory, which ensures that adjacent threads access adjacent elements in memory. Only then are transfers of 32 elements from global memory (in the case of floats) performed in a single memory access cycle (coalesced access). Especially for the CPML updates, specific CUDA grid alignments have to be selected; otherwise, a naive implementation of a unique kernel for all CPML areas would render coalesced access almost impossible. This is due to the utilization of a small grid size along the perpendicular (with respect to the CPML interfaces) directions, which in certain cases does not allow assigning the first dimension of the grid to the first dimension of the CPML auxiliary field matrices. However, the latter action is required for the coalescing mechanism to work. A simple, yet inefficient solution would make use of a full-size grid, which would nevertheless introduce a large number of idle threads. Hence, dividing the algorithm into individual parts with different grid requirements that exploit streams appears as the most appropriate choice.
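The coalescing requirement can be visualized by listing the element offsets touched by the 32 threads of one warp under different thread-to-index mappings. The sketch below assumes a flattened storage with the first array index varying fastest; this layout, like the function names, is an assumption made for illustration rather than a detail given in the text.

```python
def linear_index(i, j, k, nx, ny):
    """Offset of element (i, j, k) when the first index varies fastest."""
    return i + nx * (j + ny * k)


def warp_offsets(nx, ny, thread_axis, warp_size=32):
    """Offsets read by one warp whose consecutive threads step along thread_axis.

    thread_axis: 0, 1, or 2 for the i, j, or k direction.
    """
    offsets = []
    for lane in range(warp_size):
        ijk = [0, 0, 0]
        ijk[thread_axis] = lane
        offsets.append(linear_index(ijk[0], ijk[1], ijk[2], nx, ny))
    return offsets


def is_coalesced(offsets):
    """True when consecutive threads touch consecutive elements."""
    return all(b - a == 1 for a, b in zip(offsets, offsets[1:]))
```

On a 101 × 101 × 101 array, threads that step along the first index read contiguous elements, whereas threads that step along the second index are 101 elements apart. This is why a CPML kernel whose thin direction coincides with the first array dimension cannot simply reuse the mapping of the main kernels.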

Furthermore, the constant memory is employed for storing the constants of each simulation, while parameter matrices are loaded to global memory and mapped to a surface reference, exploiting texture memory. The latter constitutes a buffer optimized for read-only memory access of this type. In order to identify the parts of the algorithm that can actually benefit from texture utilization, several trials have been conducted for the constant matrices, which correspond to material as well as geometrical parameters. We have found that speedups of 10% can be accomplished in the case of the curvilinear FDTD algorithm. Tests have also pointed out that the use of texture memory can be beneficial in cases where field matrices are treated as read only. Evidently, this is a kernel-dependent case, i.e., magnetic field matrices can be treated as read only within any electric-update kernel.

TABLE I
COMPUTATIONAL TIMES OF A SIMPLE TEST SIMULATION FOR VARIOUS CHOICES OF THE KERNEL SIZE

Size      Time (ms)   Size      Time (ms)   Size      Time (ms)
8 × 16    711         8 × 8     807         8 × 32    799
16 × 16   625         16 × 8    697         16 × 32   651
32 × 16   589         32 × 8    595         32 × 32   733
64 × 16   731         64 × 8    621         64 × 32   −

3) Kernel Size and Register Usage: To maintain a high degree of parallelization, the kernels have been optimized in terms of the block size. In essence, a grouping of 32 × 16 threads per block has been found to guarantee very good performance in the case of Cartesian grids. Numerical evidence is presented in Table I, which displays the computational times in the case of a 101 × 101 × 101 grid and 250 time-steps, for different kernel dimensions. Note that the GPUs used in this paper can handle a maximum of 1024 threads per block. After performing a similar study, the most suitable kernel size for the curvilinear algorithm was found to be 16 × 8.
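The trade-off behind the block-size study can be made concrete by counting blocks and idle threads for a given choice. The sketch below assumes a 2-D launch that covers one 101 × 101 plane of the grid, which is an assumption about the kernel layout and not something the text specifies; the 1024-thread limit is the one quoted above, and the function name is hypothetical.

```python
MAX_THREADS_PER_BLOCK = 1024  # limit quoted in the text


def launch_config(nx, ny, block_x, block_y):
    """Blocks and idle threads for an nx-by-ny plane, or None if not launchable."""
    threads = block_x * block_y
    if threads > MAX_THREADS_PER_BLOCK:
        return None
    grid_x = -(-nx // block_x)  # ceiling division
    grid_y = -(-ny // block_y)
    launched = grid_x * grid_y * threads
    return {
        "grid": (grid_x, grid_y),
        "threads_per_block": threads,
        "idle_threads": launched - nx * ny,
    }
```

Under this assumption, `launch_config(101, 101, 32, 16)` needs a 4 × 7 grid of 512-thread blocks, while `launch_config(101, 101, 64, 32)` returns `None` because 2048 threads exceed the per-block limit, which is consistent with the missing entry for 64 × 32 in Table I.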

Moreover, it has been ensured that register usage remains below a certain threshold for the entire implementation. This effectively allows more threads to be active at a given time for each streaming processor, thus increasing the pool of available "ready to be executed" commands. The complexity of the curvilinear algorithm did, however, require more extensive manipulation of the kernel variables. To achieve operation just below a certain register limit, the GPU implementation has been designed to minimize the performance loss caused by unnecessary global memory accesses or extra arithmetic commands.

4) Shared-Memory Exploitation: Possible benefits from the on-chip shared memory have been investigated, since such a practice is quite common in relevant studies. As shared memory is available only to the threads of the same block, the main principle regarding its utilization is to either minimize redundant global memory accesses, or to replace those that cannot be optimized (via coalesced access). However, for the simple FDTD algorithm (orthogonal grids), shared memory has been found to provide virtually no benefit. The reason behind this is that our implementation practically accomplishes the maximum number of instructions per cycle (IPC); hence, no room for improvement is left. If shared memory is used, the additional overhead (e.g., loading of halo elements) unavoidably produces performance degradation. On the other hand, this is not the case when implementing the curvilinear FDTD version, as a small, but nonnegligible, speedup is then achieved. This can be attributed to the fact that elements of the field matrices are requested more than once (on average), due to the algorithm's spatial field averaging requirements. In fact, the maximum number of IPC cannot be reached in this case, and incorporating shared memory contributes positively.

5) Use of Atomics: Due to the selected kernel organization, atomic operators have been applied at some points to assist with the proper CPML kernel execution. Specifically, atomics are necessary in rare conditions occurring during the simultaneous update of elements at the mesh corners. In general, the incorporation of atomics prevents access to an element residing in global memory until all other operations involving this element have been completed. In this way, certain parallelization errors are prevented. Fig. 6 clarifies the correction provided in our algorithm, when the race between concurrent threads is likely to produce undesirable miscalculations (top and bottom PML layers are considered in this figure). With the proposed solution, correct updating is guaranteed, without any performance degradation.

Fig. 6. Example of error that may emerge in the case of parallel CPML updates, and solution provided by the use of atomic operations. Ty and Tz denote terms appearing in the update of Ex in the CPML corner regions.

Fig. 7. Errors due to reflections from the CPML, where implementations with and without atomics are compared.

The necessity of atomics incorporation is also verified via another test simulation. The specific computational setup consists of a 101 × 101 × 101 space with a hard source located at the center. Fig. 7 displays the absolute error after 250 time-steps regarding the Hz component, along a line close to the CPML corners. As seen, incorrect handling of the parallelization strategy may result in additional flaws, which in this case are interpreted as artificial reflections from the corners of the surrounding medium. Note that the levels of these errors are not trivial; hence, the application of atomics is not just complementary here, but rather necessary.
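The lost-update hazard discussed above can be mimicked on the CPU. The sketch below is a deterministic, single-threaded model rather than GPU code: two hypothetical threads each add one term (named Ty and Tz after Fig. 6) to the same Ex value, first through an unprotected read-modify-write and then through an indivisible update that plays the role of an atomic addition. The numerical values are arbitrary.

```python
def racy_update(ex, ty, tz):
    """Both threads read Ex before either writes, so one contribution is lost."""
    read_a = ex          # thread A reads Ex
    read_b = ex          # thread B reads the same, soon stale, value
    ex = read_a + ty     # thread A writes Ex + Ty
    ex = read_b + tz     # thread B overwrites it with Ex + Tz
    return ex


def atomic_update(ex, ty, tz):
    """Each read-modify-write completes before the next one starts."""
    ex = ex + ty
    ex = ex + tz
    return ex


if __name__ == "__main__":
    ex0, ty, tz = 1.0, 0.25, 0.5   # arbitrary illustrative values
    print("racy  :", racy_update(ex0, ty, tz))    # Ty contribution is missing
    print("atomic:", atomic_update(ex0, ty, tz))  # both contributions present
```

In CUDA, an operation such as `atomicAdd` provides this kind of guarantee for a value in global memory; which atomic operator is used in the implementation described here is not specified in the text.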

IV. CODE VALIDATION AND MEMORY UTILIZATION

Before proceeding to the investigation of specific lightning problems with uncertainties, we test the reliability of the parallel 3-D FDTD code, by performing comparisons with other numerical as well as analytical solutions in simple configurations. In addition, a convergence study determines the smallest size of the lightning channel that ensures sufficient accuracy.

Fig. 8. Waveforms from serial 2-D and parallel 3-D simulations, for the radial component of the electric field intensity at (r, z) = (150, 12 m) (first stroke).

First, parallel 3-D computations are validated by comparing the waveform at a specific point with the one calculated with a serial 2-D CPU code. The latter applies the FDTD discretization in cylindrical coordinates, taking into account the problem’s symmetry. For this test, we consider a configuration where the ground parameters are invariant, equal to σ = 0.001 S/m, εr = 5. Furthermore, the lightning channel is straight and vertical, and the terrain is flat (hence, rotational symmetry is guaranteed). The 3-D computational space comprises 200 × 53 × 5001 cells. Such a large number of cells along the z-axis is not actually needed; however, we also wish to verify the potential of modeling large-scale problems. Fig. 8 compares the results regarding the radial component of the electric field intensity at (r, z) = (150, 12 m), when the first-stroke current is considered. It is observed that 3-D and 2-D computations coincide, thus verifying the reliability of the GPU implementation.

Another validation test is next performed, where the GPU computations are compared to analytical calculations. Specifically, reference solutions are obtained using the technique described in [20], which transforms Sommerfeld integrals by expressing them in terms of Hankel functions, and then follows a modified integration path on the complex plane. Two examination points are considered ((r, z) = (100, 10 m) and (200, 15 m)), for the case of the subsequent lightning stroke. The comparison with the GPU-obtained values is depicted in Fig. 9, where, as previously, agreement is verified. Given that the technique in [20] is an analytical one, it should produce results very close to the exact ones; hence, the accuracy of the proposed numerical model can be considered very promising.

As already mentioned, a critical issue in parallel 3-D simulations is the available memory of a single GPU. Given that a realistic estimate of a channel’s size is of the order of kilometers, domains with a large number of cells in the z-direction cannot be avoided. Hence, it is crucial to determine the minimum channel length that actually needs to be modeled, to ensure both reliable computations and a balanced treatment of memory resources.
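To give a feeling for the memory constraint, the sketch below estimates the storage taken by the field arrays alone for a given grid. It assumes six field components stored in single precision (4 bytes per value); neither assumption is confirmed in the text, and material coefficients and CPML auxiliary variables would add to the total, so the result is best read as a lower bound. The function names are hypothetical.

```python
GIB = 2**30  # bytes in one GiB


def field_memory_gib(nx, ny, nz, arrays=6, bytes_per_value=4):
    """Memory (GiB) for `arrays` full-grid arrays; defaults are assumptions."""
    return nx * ny * nz * arrays * bytes_per_value / GIB


def max_cells(memory_gib, arrays=6, bytes_per_value=4):
    """Largest cell count whose field arrays fit in `memory_gib` GiB."""
    return int(memory_gib * GIB) // (arrays * bytes_per_value)


if __name__ == "__main__":
    # Grid used in the 2-D versus 3-D validation test.
    print(f"200 x 53 x 5001 cells: {field_memory_gib(200, 53, 5001):.2f} GiB")
    for mem in (3, 6):
        print(f"{mem} GiB holds field arrays for at most {max_cells(mem):,} cells")
```

Raising `arrays` to account for coefficient and CPML storage shrinks the admissible grid accordingly, which is why limiting the modeled channel length matters.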

Fig. 9. Waveforms from analytical computations and parallel 3-D simulations, for the radial component of the electric field intensity (subsequent stroke).

Fig. 10. Effect of the lightning-channel length on the radial component of the electric field intensity at (r, z) = (150, 12 m) (first stroke).

Bearing this in mind, the predicted radial field of the previous examples (first-stroke case) is examined for various lightning channels. According to Fig. 10, the results converge after a certain source size, at least within the time period examined here. Therefore, we safely conclude that the incorporation of at least 1500 m of the channel is necessary, so that the validity of the simulation is not compromised.

Complementary to the previous test, the following one reveals the effect of the channel truncation at different observation heights. Specifically, the relative (%) error at positions z = 50, 100, and 150 m is depicted in Fig. 11 (r = 100 m, the subsequent stroke is considered, and ground parameters are the same as previously). Reference waveforms are obtained with a much longer channel. It becomes apparent that the effect of reducing the actual channel size appears more pronounced at higher positions. Nevertheless, the corresponding values are practically insignificant, as they remain lower than 0.2%, proving that our assumption for the necessary channel length remains valid not only for positions close to the ground. Additional tests revealed that the error remains lower than 1% even with 10× more time steps.


Fig. 11. Relative (%) error at different observation heights, due to the truncation of the lightning channel (r = 100 m, εr = 5, σ = 0.001 S/m). The case of subsequent stroke is considered.

TABLE II
SIMULATION TIMES (IN SEC) OF SERIALIZED AND PARALLEL FDTD IMPLEMENTATIONS FOR DIFFERENT GRIDS

Grid size          | CPU time | GPU1 time (×speedup) | GPU2 time (×speedup)
2.5 × 10^6 cells   | 1795 s   | 51 s (×35)           | 20 s (×90)
5 × 10^6 cells     | 3651 s   | 96 s (×38)           | 36 s (×101)
10^7 cells         | 7722 s   | 158 s (×49)          | 61 s (×127)
2 × 10^7 cells     | 15130 s  | 309 s (×49)          | 120 s (×126)
4 × 10^7 cells     | 31093 s  | −                    | 243 s (×128)
8 × 10^7 cells     | 63375 s  | −                    | 528 s (×120)

V. ASSESSMENT OF GPU ACCELERATION

In problems with uncertainties, the extraction of statistical properties requires performing several simulations, which can be extremely time consuming in the case of 3-D problems. This drawback has been our main motivation for the GPU parallelization, as our intention is to perform Monte Carlo analyses for lightning-induced electromagnetic fields. In this section, the potential of GPU-FDTD implementations to solve the aforementioned problems rapidly is investigated. We consider Cartesian meshes with different sizes and perform simulations with 12 000 iterations. The CPU code is executed on an Intel Core i7-3820 processor at 3.6 GHz, while parallel computations are carried out on two different GPUs. The first one (GPU1) is an NVIDIA Tesla C2050 unit, with 448 cores and 3 GB of global memory. The second GPU (GPU2) is an NVIDIA GTX Titan, and offers 2688 cores and 6 GB of memory capacity. The timings are compared in Table II. Although the larger problems could not be handled by the older GPU, the obtained results fully justify our decision regarding the parallelization of the FDTD approach. Compared to the serial code, GPU1 is able to attain speedups up to 49×. Even greater reduction of the computational times is accomplished with the newer GPU, as we find acceleration factors as high as 128. Consequently, the significant time reduction enables the execution of repeated simulations within reasonable time limits, facilitating the investigation of statistical properties.
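The speedup factors quoted above follow directly from the timings of Table II. The short sketch below recomputes them as the ratio of CPU time to GPU time; the timings are transcribed from the table, while the data layout and function name are merely illustrative.

```python
# Timings in seconds, transcribed from Table II (12 000 iterations each).
# None marks the grids that did not fit on GPU1.
TABLE_II = [
    ("2.5e6", 1795, 51, 20),
    ("5e6", 3651, 96, 36),
    ("1e7", 7722, 158, 61),
    ("2e7", 15130, 309, 120),
    ("4e7", 31093, None, 243),
    ("8e7", 63375, None, 528),
]


def speedup(cpu_seconds, gpu_seconds):
    """CPU-to-GPU time ratio, or None when no GPU timing is available."""
    if gpu_seconds is None:
        return None
    return cpu_seconds / gpu_seconds


if __name__ == "__main__":
    for cells, cpu, gpu1, gpu2 in TABLE_II:
        s1, s2 = speedup(cpu, gpu1), speedup(cpu, gpu2)
        s1_text = "-" if s1 is None else f"x{s1:.0f}"
        print(f"{cells:>6} cells: GPU1 {s1_text:>4}, GPU2 x{s2:.0f}")
```

As a purely arithmetic illustration, 100 repetitions of the largest case would take 100 × 63375 s ≈ 1760 h on the CPU, against 100 × 528 s ≈ 14.7 h on GPU2. The Monte Carlo runs of Section VI use different grids and step counts, so these figures only indicate the order of magnitude.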

VI. NUMERICAL RESULTS

The rest of the paper is devoted to numerical results, where the stochastic nature of various aspects of the lightning problem is investigated. Our main goal is twofold: first, to demonstrate that the simplified models used in many studies to date may not correspond reliably to real-life situations; second, to provide an assessment of the impact that the fluctuations of modeling variables have on the final results. Pertinent statistics regarding the electric field intensity are extracted after multiple simulations, in the context of a Monte Carlo procedure. The quantity of interest is the horizontal (radial) electric field, as this is more likely to cause detrimental effects on specific structures, e.g., power lines [47].

Fig. 12. Mean value (black lines) and standard deviation (blue bars) of the horizontal components of the electric- and magnetic field intensities at point (r, z) = (100, 2 m), due to subsequent stroke current.
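The statistics of such a Monte Carlo procedure can be gathered by collecting the waveform recorded at the observation point in every run and reducing the ensemble one time sample at a time. A minimal sketch follows; it assumes the sample (n − 1) form of the standard deviation, which the text does not specify, and the function name `ensemble_stats` is hypothetical.

```python
import statistics


def ensemble_stats(waveforms):
    """Per-time-sample mean and sample standard deviation over Monte Carlo runs.

    `waveforms` is a list of equal-length sequences, one per simulation.
    """
    if len(waveforms) < 2:
        raise ValueError("need at least two runs to estimate a deviation")
    length = len(waveforms[0])
    if any(len(w) != length for w in waveforms):
        raise ValueError("all waveforms must have the same length")
    means, stds = [], []
    for samples in zip(*waveforms):      # one time instant across all runs
        means.append(statistics.mean(samples))
        stds.append(statistics.stdev(samples))
    return means, stds


if __name__ == "__main__":
    # Three toy "runs" with two time samples each (arbitrary numbers).
    runs = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]
    means, stds = ensemble_stats(runs)
    print("mean:", means)
    print("std :", stds)
```

Dividing each deviation by the corresponding mean gives the percentage form in which some of the results below are reported.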

A. Effect of Ground Inhomogeneities

Let us, initially, examine the case where the ground has a completely flat surface, but is made up of inhomogeneous lossy material. This is in contrast to many other approaches, where the soil is considered completely homogeneous. Specifically, the mean values for the ground parameters are set to ⟨εr⟩ = 5 and ⟨σ⟩ = 0.001 S/m, with their standard deviations being 1 and 5 × 10^−4 S/m, respectively. Moreover, the necessary random values have normal distributions and are obtained with the Box–Muller technique [48]. Normally, the extraction of reliable statistics calls for a high number of individual tests. Yet, the considered level of randomness in this problem allows us to draw consistent conclusions with a relatively low number of simulations (60 to 100). The latter have been carried out for 6000 steps, in a computational domain comprising 150 × 100 × 2000 cells. The average values of the horizontal electric and magnetic field at point (r, z) = (100, 2 m), together with error bars denoting the corresponding standard deviations, are depicted in Fig. 12. Evidently, even small spatial fluctuations of the material parameters trigger nontrivial changes of the field magnitude. In this case, we find a maximum absolute value of 541.2 V/m for the electric field, and a maximum value of 24.36 V/m for its standard deviation. Similar properties are exhibited by the magnetic field, whose standard deviation becomes as large as 15.5% of the mean value at early times, and drops below 3% when t > 2 μs.
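For reference, the Box–Muller transform maps two independent uniform deviates to two independent standard-normal deviates, which are then shifted and scaled to the desired mean and standard deviation. The sketch below applies it to the ground parameters quoted above; the function names, the seed, and the sample count are illustrative choices, not taken from the paper.

```python
import math
import random


def box_muller(rng):
    """Return two independent standard-normal deviates from two uniform ones."""
    u1 = 1.0 - rng.random()      # lies in (0, 1], so log(u1) is finite
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def normal_samples(mean, std, count, rng):
    """Draw `count` normally distributed values with the given mean and std."""
    out = []
    while len(out) < count:
        z0, z1 = box_muller(rng)
        out.extend((mean + std * z0, mean + std * z1))
    return out[:count]


if __name__ == "__main__":
    rng = random.Random(0)       # fixed seed, for repeatability only
    eps_r = normal_samples(5.0, 1.0, 20000, rng)
    sigma = normal_samples(1e-3, 5e-4, 20000, rng)
    print("mean eps_r :", sum(eps_r) / len(eps_r))
    print("mean sigma :", sum(sigma) / len(sigma))
    print("sigma < 0  :", sum(s < 0 for s in sigma), "samples")
```

Because the conductivity deviation is half of its mean, a normal distribution occasionally yields negative values (about 2% of the draws lie more than two deviations below the mean); how such samples are handled is not described in the text.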


Fig. 13. Representative geometry of a rough terrain and snapshot of the electric field distortion at z = 10 m.

Fig. 14. Statistical parameters of the radial electric field due to the subsequent stroke at (r, z) = (100, 10 m), when the terrain is not flat. Cases where the correlation length l is set to different values are displayed.

B. Effect of Terrain Roughness

The next aspect of our problem is the roughness of the ground, which has been taken into consideration in very few instances to date [49], [50]. A technique similar to the one used for generating randomly inhomogeneous material is also applied herein, in order to introduce varying fluctuations on the ground surface [51]. Essentially, the height of the terrain is selected to be locally modified by only ±1 m. A representative portion of the anomalous terrain is given in Fig. 13, together with a snapshot of the electric field distortion on the z = 10 m plane. The latter is obtained from the calculated field, after subtracting the values that correspond to a completely flat ground surface. Although only one of the various cases examined is depicted, it is evident that the ground irregularity induces local fluctuations to the field distribution that should not be overlooked. The mean value and standard deviation of the radial electric field at a specific observation point are given in Fig. 14, for three different values of the correlation length l [51] and a vertical straight channel configuration. It is noticed that the standard deviation of the field values is quite significant, and may reach levels up to 10% of the mean value at the corresponding time instants. In any case, it becomes evident that even small deviations from a completely flat earth surface can induce nontrivial changes to the magnitude of the lightning-produced pulses. Additionally, we may conclude that larger correlation lengths trigger higher field levels. This seems expected, since shorter correlation lengths imply “rougher” surfaces; hence, stronger scattering at random directions occurs in these cases.

Fig. 15. Radial component of the electric field intensity, for different angular positions of a straight lightning channel (first-stroke case).

Fig. 16. Radial component of the electric field intensity, for different angular positions of a straight lightning channel (subsequent-stroke case).
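The link between correlation length and roughness can be illustrated with a simple 1-D height profile. The sketch below is not the procedure of [51]: it substitutes a generic stand-in, namely white Gaussian noise smoothed by a Gaussian kernel whose width is set by the correlation length, followed by rescaling so that the heights stay within ±1 m. Grid spacing, profile length, and all names are illustrative assumptions.

```python
import math
import random


def rough_profile(n, dx, corr_len, max_height, rng):
    """1-D random height profile: white noise smoothed by a Gaussian kernel.

    NOT the method of [51]; a generic stand-in chosen for brevity.
    `corr_len` sets the smoothing scale; heights are rescaled to +/- max_height.
    """
    half = max(1, round(3 * corr_len / dx))           # kernel half-width in samples
    kernel = [math.exp(-((k * dx) / corr_len) ** 2) for k in range(-half, half + 1)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n + 2 * half)]
    heights = [
        sum(w * noise[i + j] for j, w in enumerate(kernel))
        for i in range(n)
    ]
    peak = max(abs(h) for h in heights)
    return [max_height * h / peak for h in heights]


def mean_abs_slope(heights, dx):
    """Average |dh/dx| between neighbouring samples: a crude roughness measure."""
    return sum(abs(b - a) for a, b in zip(heights, heights[1:])) / (dx * (len(heights) - 1))


if __name__ == "__main__":
    for corr_len in (2.0, 20.0):                      # metres, arbitrary values
        profile = rough_profile(2000, 1.0, corr_len, 1.0, random.Random(7))
        print(f"l = {corr_len:4.1f} m: mean |slope| = {mean_abs_slope(profile, 1.0):.3f}")
```

With the same ±1 m height bound, the shorter correlation length yields markedly steeper local slopes, which is the sense in which such surfaces are described as rougher above.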

C. Effect of Straight Channel’s Inclination

It is a common approach, especially, but not only, in calculations via integral expressions, to consider the lightning source in a completely vertical position. This practice facilitates analytical computations, but also introduces an unnecessary simplification. To assess the effect of the channel’s angular position on the outcome, we present various results with straight lightning channels at tilted placements. We avoid staircase approximations to the channel’s geometry by using skewed meshes, where the cells’ tilt angle is selected according to the direction of the channel. Representative results of the horizontal electric field component are depicted in Figs. 15 (first stroke) and 16 (subsequent stroke). The examined angles, which are positive when the channel is tilted toward the observation point, fall within the range from −π/15 to π/15 rad. As observed, the magnitude of the induced field waveforms is very sensitive to the channel’s slope. For example, a π/15 angular modification may cause up to four times more intense horizontal fields. Compared to the case of a completely vertical channel, the radial electric field now is a combination of two components, one parallel and one perpendicular to the channel, resulting in considerable changes that cannot be neglected.

Fig. 17. Examples of lightning channels with irregular geometries.

D. Effect of Channel Irregularity

Apart from inclined straight channels, a more realistic approach should include lightning geometries with more irregular shapes. Our framework can incorporate nonorthogonal meshes, where cells may be continuously and randomly tilted along the z-axis. This permits us to study several different models of the lightning channel, and evaluate their effect on the excited fields. Some examples of nonstraight channels are depicted in Fig. 17. Such geometries are produced via a randomized algorithm, which initially divides the channel into smaller linear parts (with lengths from 1 to 75 m in the displayed cases). Then, different skew angles are assigned to individual pieces, resulting in more realistic patterns. For the examples in Fig. 17, the average value of the tilt angles (generally forced to be ≤ π/10) is selected zero.
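The two-step construction described above (split the channel into linear parts, then tilt each part) can be sketched as follows. Several details are assumptions made only for illustration: the channel is drawn in a single x-z plane, segment extents are measured along z and drawn uniformly between the two bounds, and tilt angles are drawn uniformly in [−π/10, π/10], which gives a zero average. The distributions actually used by the authors are not stated.

```python
import math
import random


def random_channel(height, min_len, max_len, max_tilt, rng):
    """Return the (x, z) nodes of a piecewise-linear channel in the x-z plane.

    Assumptions (not from the paper): uniform segment extents along z in
    [min_len, max_len] and uniform tilt angles in [-max_tilt, +max_tilt].
    The final segment is clipped so that the channel ends exactly at `height`.
    """
    nodes = [(0.0, 0.0)]
    x, z = 0.0, 0.0
    while z < height:
        remaining = height - z
        dz = rng.uniform(min_len, max_len)
        if dz >= remaining:              # last piece: clip to the channel top
            dz, z_next = remaining, height
        else:
            z_next = z + dz
        tilt = rng.uniform(-max_tilt, max_tilt)
        x += dz * math.tan(tilt)         # a tilted piece shifts the channel sideways
        z = z_next
        nodes.append((x, z))
    return nodes


if __name__ == "__main__":
    channel = random_channel(1500.0, 1.0, 75.0, math.pi / 10, random.Random(1))
    print(len(channel) - 1, "segments; top node at", channel[-1])
```

A nonzero average inclination, as in Case II of the following results, could be obtained by adding a constant offset to the drawn tilt angles.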

The impact of the channel geometry on the electric field is shown in Figs. 18 and 19. Specifically, Fig. 18 displays the predicted curves when the first stroke is considered with irregular channels similar to those in Fig. 17. The observation point is (r, z) = (100, 10 m). Two different cases are tested: Case I refers to a zero average segment inclination with its standard deviation set to π/10, while Case II assumes an average of the angular positions equal to π/15 and values within a range of 2π/7.5 rad. Statistics are obtained after 80 simulations. As seen, the standard deviations (now shown as a percentage of the corresponding mean values) exhibit trends close to those of the mean-value curves. Specifically, for the considered period, the field produced by the first-stroke current continues to grow in time, as the mean value has not started reducing to zero yet. Similar observations can be made for Fig. 19, where the subsequent stroke is considered. Now, Case I refers to the same parameter selection as Fig. 18, while Case II assumes an average of the angular positions equal to π/15, and values within a range of 2π/7.5 rad. As mentioned earlier, an overall positive tilt angle results in stronger fields at the point of interest. Regarding the computational cost, distorted meshes require more time-steps due to stability restrictions, thus rendering GPU parallelization even more valuable.

Fig. 18. Statistical parameters of the radial electric field due to the first stroke at (r, z) = (100, 10 m). Tilt angles of channel segments have values between ±π/10 rad in Case I, and between π/15 ± π/10 rad in Case II.

Fig. 19. Statistical parameters of the radial electric field due to the subsequent stroke at (r, z) = (100, 10 m). Tilt angles of channel segments have values between ±π/10 rad in Case I, and between π/15 ± π/7.5 rad in Case II.

E. Impact of Combined Factors

The last numerical result is concerned with the overall impact on the field’s statistical properties when the uncertainty of various factors is taken into account within a single configuration. In essence, we perform simulations where the stochastic nature of the ground’s electric parameters, the terrain’s roughness, and the channel’s irregularity are considered to be present at the same time. The standard deviation of the ground’s dielectric permittivity is set to 10% of its mean value (equal to 5), its conductivity is set to 0.01 S/m, and the terrain’s surface is distorted as described in Section VI-B. In addition, the tilt angles of the channel’s individual elements have a zero mean value and a standard deviation equal to π/10. The statistical parameters of both the horizontal electric- and magnetic field intensities due to the subsequent stroke are illustrated in Fig. 20 for the observation point of previous tests ((r, z) = (100, 10 m)). Comparing with Case I of Fig. 19, where only the channel irregularity is considered, the mean value of the electric field displays a similar pattern, but with lower levels (around 25%). This observation implies that among the three individual factors, the produced field is influenced primarily by the channel’s geometry and secondarily by the ground’s inhomogeneity and roughness. Interestingly, if the obtained standard deviation is expressed as a percentage of the corresponding mean value, it appears higher than all previous cases, which is a conclusion consistent with the presence of uncertainty in various aspects of the lightning problem.

Fig. 20. Statistical parameters of the radial electric and horizontal magnetic field due to the subsequent stroke at (r, z) = (100, 10 m), when inhomogeneous ground, rough terrain and irregular channel are combined (left axis: electric field, right axis: magnetic field).

VII. CONCLUSION

We have presented a parallel computational approach for the statistical study of lightning-generated electromagnetic fields in complex 3-D environments, exploiting the potential of modern GPUs. Compared to standard serialized solutions, the proposed implementation can accelerate computations by factors higher than 100 in cases of large grids. We have also demonstrated the utility of the rapid simulations in problems where modeling uncertainties need to be considered, and statistical parameters can be extracted after performing a multitude of trials in reasonable computing times. Results have verified that the stochastic character of electric and geometric features of lightning problems has a significant impact on the produced electromagnetic fields, and failure to take these into account may degrade the reliability of results. The proposed computational framework has a very broad range and is suitable for the efficient investigation of diverse engineering problems with random characteristics in other EMC applications, as well.

ACKNOWLEDGMENT

One of the GTX Titan GPUs used for this research was donated by the NVIDIA Corporation. The authors would like to thank Dr. K. Rallis for providing the programming code for analytically calculating field values, according to [20].

REFERENCES

[1] T. Thang, Y. Baba, N. Nagaoka, A. Ametani, J. Takami, S. Okabe, and V. A. Rakov, “FDTD simulation of lightning surges on overhead wires in the presence of corona discharge,” IEEE Trans. Electromagn. Compat., vol. 54, no. 6, pp. 1234–1243, Dec. 2012.

[2] H. Sumitani, T. Takeshima, Y. Baba, N. Nagaoka, A. Ametani, J. Takami, S. Okabe, and V. A. Rakov, “3-D FDTD computation of lightning-induced voltages on an overhead two-wire distribution line,” IEEE Trans. Electromagn. Compat., vol. 54, no. 5, pp. 1161–1168, Oct. 2012.

[3] B. Yang, B.-H. Zhou, B. Chen, J.-B. Wang, and X. Meng, “Numerical study of lightning-induced currents on buried cables and shield wire protection method,” IEEE Trans. Electromagn. Compat., vol. 54, no. 2, pp. 323–331, Apr. 2012.

[4] J. Paknahad, K. Sheshyekani, and F. Rachidi, “Lightning electromagnetic fields and their induced currents on buried cables. Part I: The effect of an ocean-land mixed propagation path,” IEEE Trans. Electromagn. Compat., vol. 56, no. 5, pp. 1137–1145, Oct. 2014.

[5] J. Paknahad, K. Sheshyekani, F. Rachidi, M. Paolone, and A. Mimouni, “Evaluation of lightning-induced currents on cables buried in a lossy dispersive ground,” IEEE Trans. Electromagn. Compat., vol. 56, no. 6, pp. 1522–1529, Dec. 2014.

[6] M. Becerra and V. Cooray, “On the interaction of lightning upward connecting positive leaders with humans,” IEEE Trans. Electromagn. Compat., vol. 51, no. 4, pp. 1001–1008, Nov. 2009.

[7] M. Ishii, K. Miyabea, and A. Tatematsu, “Induced voltages and currents on electrical wirings in building directly hit by lightning,” Electr. Power Syst. Res., vol. 85, pp. 2–6, 2012.

[8] C. Yao, H. Wu, Y. Mi, Y. Ma, Y. Shen, and L. Wang, “Finite difference time domain simulation of lightning transient electromagnetic fields on transmission lines,” IEEE Trans. Dielectr. Electr. Insul., vol. 20, no. 4, pp. 1239–1246, Aug. 2013.

[9] M. Khosravi-Fasrani, R. Moini, S. H. H. Sadeghi, and F. Rachidi, “On the validity of approximate formulas for the evaluation of the lightning electromagnetic fields in the presence of a lossy ground,” IEEE Trans. Electromagn. Compat., vol. 55, no. 2, pp. 362–370, Apr. 2013.

[10] E. Perrin, C. Guiffaut, A. Reineix, and F. Tristant, “Using a design-of-experiment technique to consider the wire harness load impedances in the FDTD model of an aircraft struck by lightning,” IEEE Trans. Electromagn. Compat., vol. 55, no. 4, pp. 747–753, Aug. 2013.

[11] P. D. Kannu and M. J. Thomas, “Lightning-induced voltages in a satellite launch-pad protection system,” IEEE Trans. Electromagn. Compat., vol. 45, no. 4, pp. 644–651, Nov. 2003.

[12] A. Shoory, F. Vega, P. Yutthagowith, F. Rachidi, M. Rubinstein, Y. Baba, V. A. Rakov, K. Sheshyekani, and A. Ametani, “On the mechanism of current pulse propagation along conical structures: Application to tall towers struck by lightning,” IEEE Trans. Electromagn. Compat., vol. 54, no. 2, pp. 332–342, Apr. 2012.

[13] Y. Baba and V. A. Rakov, “Applications of the FDTD method to lightning electromagnetic pulse and surge simulations,” IEEE Trans. Electromagn. Compat., vol. 56, no. 6, pp. 1506–1521, Dec. 2014.

[14] V. Cooray, “On the accuracy of several approximate theories used in quantifying the propagation effects on lightning generated electromagnetic fields,” IEEE Trans. Antennas Propag., vol. 56, no. 7, pp. 1960–1967, Jul. 2008.

[15] A. Shoory, A. Mimouni, F. Rachidi, V. Cooray, and M. Rubinstein, “On the accuracy of approximate techniques for the evaluation of lightning electromagnetic fields along a mixed propagation path,” Radio Sci., vol. 46, no. 2, pp. 1–8, Apr. 2011.

[16] A. Shoory, F. Rachidi, F. Delfino, R. Procopio, and M. Rossi, “Lightning electromagnetic radiation over a stratified conducting ground: 2. Validity of simplified approaches,” J. Geophys. Res., Atmos., vol. 116, no. D11, pp. 1–10, Jun. 2011.

[17] M. Rubinstein, “An approximate formula for the calculation of the horizontal electric field from lightning at close, intermediate, and long ranges,” IEEE Trans. Electromagn. Compat., vol. 38, no. 3, pp. 531–535, Aug. 1996.

[18] C. Caligaris, F. Delfino, and R. Procopio, “Cooray–Rubinstein formula for the evaluation of lightning radial electric fields: Derivation and implementation in the time domain,” IEEE Trans. Electromagn. Compat., vol. 50, no. 1, pp. 194–197, Feb. 2008.


[19] F. Delfino, R. Procopio, M. Rossi, F. Rachidi, and C. A. Nucci,“An algorithm for the exact evaluation of the underground lightning elec-tromagnetic fields,” IEEE Trans. Electromagn. Compat., vol. 49, no. 2,pp. 401–411, May 2007.

[20] K. Rallis, T. Theodoulidis, and T. Zygiridis, “Efficient calculation of thelightning generated electric field above ground,” in Proc. Int. Symp. Elec-tromagn. Compat., Rome, Italy, Sep. 17–21, 2012, pp. 1–5.

[21] A. Taflove and S. C. Hagness, Computational Electrodynamics: TheFinite-Difference Time-Domain Method, 3rd ed. Norwood, MA, USA:Artech House, 2005.

[22] Q. Zhang, D. Li, Y. Zhang, J. Gao, and Z. Wang, “On the accuracy ofWait’s formula along a mixed propagation path within 1 km from thelightning channel,” IEEE Trans. Electromagn. Compat., vol. 54, no. 5,pp. 1042–1047, Oct. 2012.

[23] Q. Zhang, D. Li, Y. Fan, Y. Zhang, and J. Gao, “Examination of theCooray–Rubinstein (C-R) formula for a mixed propagation path by usingFDTD,” J. Geophys. Res., Atmos., vol. 117, no. D15, pp. 1–7, Aug. 2012.

[24] D. Li, Q. Zhang, Z. Wang, and T. Liu, “Computation of lightning hori-zontal field over the two-dimensional rough ground by using the three-dimensional FDTD,” IEEE Trans. Electromagn. Compat., vol. 56, no. 1,pp. 143–148, Feb. 2014.

[25] D. Li, Q. Zhang, T. Liu, and Z. Wang, “Validation of the Cooray–Rubinstein (C-R) formula for a rough ground surface by using three-dimensional (3-D) FDTD,” J. Geophys. Res., Atmos., vol. 118, no. 22,pp. 1–6, Nov. 2013.

[26] T. Oikawa, J. Sonoda, M. Sato, N. Honma, and Y. Ikegawa, “Analysis oflightning electromagnetic field on large-scale terrain model using three-dimensional MW-FDTD parallel computation,” Electr. Eng. Jpn., vol. 184,no. 2, pp. 20–27, 2013.

[27] Z.-D. Jiang, B.-H. Zhou, Y.-W. Liu, and B. Yang, “A multiresolution time-domain method for LEMP calculation and comparison with FDTD,” IEEETrans. Electromagn. Compat., vol. 56, no. 2, pp. 419–426, Apr. 2014.

[28] F. Napolitano, A. Borghetti, C. A. Nucci, F. Rachidi, and M. Paolone, “Useof the full-wave finite element method for the numerical electromagneticanalysis of LEMP and its coupling to overhead lines,” Electr. Power Syst.Res., vol. 94, pp. 24–29, Jan. 2013.

[29] Y. Tanaka, Y. Baba, N. Nagaoka, and A. Ametani, “Computation of light-ning electromagnetic pulses with the TLM method in the 2-D cylindricalcoordinate system,” IEEE Trans. Electromagn. Compat., vol. 56, no. 4,pp. 949–955, Aug. 2014.

[30] Y. Tanaka, Y. Baba, N. Nagaoka, and A. Ametani, “Computation of light-ning electromagnetic pulses with the constrained interpolation profilemethod in the 2-D cylindrical coordinate system,” IEEE Trans. Elec-tromagn. Compat., vol. 56, no. 6, pp. 1497–1505, Dec. 2014.

[31] D. De Donno, A. Esposito, G. Monti, and L. Tarricone, “MPIE/MoM ac-celeration with a general-purpose graphics processing unit,” IEEE Trans.Microw. Theory Technol., vol. 60, no. 9, pp. 2693–2701, Sep. 2012.

[32] C. Potratz, H.-W. Glock, and U. van Rienen, “Time-domain field and scat-tering parameter computation in waveguide structures by GPU-accelerateddiscontinuous-Galerkin method,” IEEE Trans. Microw. Theory Tech.,vol. 59, no. 11, pp. 2788–2797, Nov. 2011.

[33] G. G. Pyrialakos, T. T. Zygiridis, N. V. Kantartzis, and T. D. Tsiboukis,“GPU-based three-dimensional calculation of lightning-generated electro-magnetic fields,” in Proc. IEEE Int. Conf. Numerical Electromagn. Model.Optimization RF, Microw. Terahertz Appl., Pavia, Italy, May 14–16, 2014,pp. 1–4.

[34] C. A. Nucci, G. Diendorfer, M. A. Uman, F. Rachidi, M. Ianoz, andC. Mazzetti, “Lightning return stroke current models with specifiedchannel-base current: A review and comparison,” J. Geophys. Res., Atmos.,vol. 95, no. D12, pp. 20395–20408, Nov. 1990.

[35] J. A. Roden and S. D. Gedney, “Convolution PML (CPML): An efficientFDTD implementation of the CFS-PML for arbitrary media,” Microw.Opt. Technol. Lett., vol. 27, no. 5, pp. 334–339, Dec. 2000.

[36] D. Donno, A. Esposito, L. Tarricone, and L. Catarinucci, “Introductionto GPU computing and CUDA programming: A case study on FDTD,”IEEE Antennas Propag. Mag., vol. 52, no. 3, pp. 116–122, Jun. 2010.

[37] W. Yu, X. Yang, Y. Liu, R. Mittra, D.-C. Chang, C.-H. Liao, M. Akira, W.Li, and L. Zhao, “New development of parallel conformal FDTD methodin computational electromagnetics engineering,” IEEE Antennas Propag.Mag., vol. 53, no. 3, pp. 15–41, Jun. 2011.

[38] S. E. Krakiwsky, L. E. Turner, and M. M. Okoniewski, “Accelerationof finite-difference time-domain (FDTD) using graphics processor units(GPU),” in Proc. IEEE MTT-S Int. Microw. Symp. Dig., 6–11 Jun., 2004,vol. 2, pp. 1033–1036.

[39] M. J. Inman and A. Z. Elsherbeni, “Optimization and parameter explo-ration using GPU based FDTD solvers,” in Proc. IEEE MTT-S Int. Microw.Symp. Dig., Atlanta, GA, USA, 15–20 Jun. 2008, pp. 149–152.

[40] NVIDIA. (2013). CUDA C Programming Guide [Online]. Available:http://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf

[41] P. Sypek, A. Dziekonski, and M. Mrozowski, “How to render FDTDcomputations more effective using a graphics accelerator,” IEEE Trans.Magn., vol. 45, no. 3, pp. 1324–1327, Mar. 2009.

[42] V. Demir and A. Z. Elsherbeni, “Compute unified device architecture(CUDA) based finite-difference time-domain (FDTD) implementation,”ACES J., vol. 25, no. 4, pp. 303–314, Apr. 2010.

[43] M. R. Zunoubi, J. Payne, and W. P. Roach, “CUDA implementation ofTEz -FDTD solution of Maxwell’s equations in dispersive media,” IEEEAntennas Wireless Propag. Lett., vol. 9, pp. 756–759, 2010.

[44] J. Chi, F. Liu, E. Weber, Yu Li, and S. Crozier, “GPU-accelerated FDTDmodeling of radio-frequency field-tissue interactions in high-field MRI,”IEEE Trans. Biomed. Eng., vol. 58, no. 6, pp. 1789–1796, Jun. 2011.

[45] M. Livesey, J. F. Stack, Jr., F. Costen, T. Nanri, N. Nakashima, and S. Fu-jino, “Development of a CUDA implementation of the 3D FDTD method,”IEEE Antennas Propag. Mag., vol. 54, no. 5, pp. 186–195, Oct. 2012.

[46] C. Richter, S. Schöps, and M. Clemens, “GPU acceleration of finite difference schemes used in coupled electromagnetic/thermal field simulations,” IEEE Trans. Magn., vol. 49, no. 5, pp. 1649–1652, May 2013.

[47] V. Cooray, “Horizontal electric field above- and underground produced by lightning flashes,” IEEE Trans. Electromagn. Compat., vol. 52, no. 4, pp. 936–943, Nov. 2010.

[48] G. E. P. Box and M. E. Muller, “A note on the generation of random normal deviates,” Ann. Math. Stat., vol. 29, no. 2, pp. 610–611, 1958.

[49] Q. Zhang, J. Yang, X. Jing, D. Li, and Z. Wang, “Propagation effect of a fractal rough ground boundary on the lightning-radiated vertical electric field,” Atmos. Res., vol. 104–105, pp. 202–208, Feb. 2012.

[50] Q. Zhang, X. Jing, J. Yang, D. Li, and X. Tang, “Numerical simulation of the lightning electromagnetic fields along a rough and ocean-land mixed propagation path,” J. Geophys. Res., Atmos., vol. 117, no. D20, pp. 1–7, Oct. 2012.

[51] K. Uchida, M. Takematsu, J.-H. Lee, K. Shigetomi, and J. Honda, “An analytic procedure to generate inhomogeneous random rough surface,” in Proc. 16th Int. Conf. Netw.-Based Inf. Syst., Gwangju, Korea, 4–6 Sep. 2013, pp. 494–501.

Georgios G. Pyrialakos was born in Thessaloniki, Greece, in 1990. He received the Diploma degree in electrical and computer engineering from the Aristotle University of Thessaloniki, Thessaloniki, Greece, in 2013, where he is currently working toward the Ph.D. degree at the Applied Computational Electromagnetics Laboratory.

His research interests include the advancement of computational electromagnetics, focusing mainly on the FDTD method and its application to realistic EMC problems, and the hardware acceleration of the related algorithms. His other fields of research include the numerical evaluation of graphene and other advanced materials.

Theodoros T. Zygiridis (M’13) received the Diploma and Ph.D. degrees in electrical and computer engineering, both from Aristotle University of Thessaloniki, Thessaloniki, Greece, in 2000 and 2006, respectively.

He is currently an Assistant Professor at the Department of Informatics and Telecommunications Engineering, University of Western Macedonia, Kozani, Greece. His research interests include the area of computational electromagnetics, and mainly focus on FDTD methods, error-optimized techniques, unconditionally stable schemes, parallelization on GPUs, lightning problems, simulations with uncertainties, etc.

Nikolaos V. Kantartzis (SM’12) received the Diploma and Ph.D. degrees in electrical and computer engineering from the Aristotle University of Thessaloniki (AUTH), Thessaloniki, Greece, in 1994 and 1999, respectively.

In 1999, he joined the Applied and Computational Electromagnetics Laboratory, Department of Electrical and Computer Engineering, AUTH, as a Postdoctoral Research Fellow, where he is currently an Associate Professor. He has authored or coauthored three books and several refereed journal papers in the area of computational electromagnetics, EMC analysis, metamaterials, and absorbing boundary conditions. His main research interests include EMC modeling, time- and frequency-domain algorithms, metamaterials, advanced microwave components, and antenna applications.

Theodoros D. Tsiboukis (SM’99) received the Diploma degree in electrical and mechanical engineering from the National Technical University of Athens, Athens, Greece, in 1971, and the Doctor Eng. degree from the Aristotle University of Thessaloniki (AUTH), Thessaloniki, Greece, in 1981.

From 1981 to 1982, he was with the Electrical Engineering Department, University of Southampton, Southampton, U.K., as a Senior Research Fellow. Since 1982, he has been with the Department of Electrical and Computer Engineering (DECE), AUTH, where he is currently a Professor. He has served in numerous administrative positions, including Director of the Division of Telecommunications, DECE (1993–1997) and Chairman, DECE (1997–2001). He was the Chairman of the local organizing committee of the 8th International Symposium on Theoretical Electrical Engineering (1995). He has authored or coauthored eight books and over 350 refereed journal and conference papers. He was the Guest Editor of a special issue of the International Journal of Theoretical Electrotechnics (1996). He is currently the Head of the Advanced and Computational Electromagnetics Laboratory, DECE. His main research interests include electromagnetic-field analysis by energy methods, computational electromagnetics (FEM, vector finite elements, MoM, FDTD method, integral equations, absorbing boundary conditions), metamaterials, photonic crystals, inverse and EMC problems.

Prof. Tsiboukis received several awards and distinctions. He is a Member of various societies, associations, chambers, and institutions.