
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON COMPONENTS, PACKAGING AND MANUFACTURING TECHNOLOGY 1

New Method for Fast Transient Simulation of Large Linear Circuits Using High-Order Stable Methods

Mina A. Farhan, Student Member, IEEE, Emad Gad, Member, IEEE, Michel S. Nakhla, Life Fellow, IEEE, and Ramachandra Achar, Fellow, IEEE

Abstract— A new algorithm based on A-stable and L-stable high-order time-domain integration methods is presented for the simulation of large linear circuits such as those occurring in modeling chip interconnects and packaging structures. The proposed method takes advantage of the special structure of the mathematical formulation of circuits encountered in these applications to reduce the computational cost significantly. Several circuit examples are presented to demonstrate the speedup achieved by the proposed algorithm.

Index Terms— A-stability, circuit simulation, high-order integration methods, high-speed circuits, linear circuits, L-stability, multiderivative methods, numerical solution of differential equations, stiff circuits.

I. INTRODUCTION

TIME-DOMAIN transient simulation is a fundamental component in computer-aided design tools for high-speed circuits. The main advantage of transient circuit simulators based on the SPICE paradigm [1] is their ability to easily incorporate models with arbitrary levels of detail.

In many packaging and microwave applications, however, the underlying circuits are composed of large linear circuits with few nonlinear terminations representing either drivers or receivers. Such applications are frequently encountered when modeling high-speed circuits. For example, 3-D electromagnetic methods such as partial element equivalent circuits [2] have been instrumental in modeling printed circuit boards, interconnects, and power systems [3], [4]. Another example can be found in recent trends of modeling microwave and package structures using data obtained either from measurements or from electromagnetic simulations, via approaches such as vector fitting [5].

The circuit models arising in these situations are mainly linear, with tens of thousands of components. Simulation of such circuits is typically carried out using the classical transient simulators based on SPICE, which interprets the circuit as a mixed system of differential algebraic equations (DAEs) and solves it numerically at discrete time points.

Manuscript received February 25, 2012; revised July 30, 2012; accepted November 30, 2012. Recommended for publication by Associate Editor J. Tan upon evaluation of reviewers' comments.

M. A. Farhan, M. S. Nakhla, and R. Achar are with the Department of Electronics, Carleton University, Ottawa, ON K1S 5B6, Canada.

E. Gad is with the School of Electrical Engineering and Computer Science (SEECS), University of Ottawa, Ottawa, ON K2J 0G4, Canada.

Digital Object Identifier 10.1109/TCPMT.2012.2233545

Nonetheless, traditional DAE solvers used in current commercial packages are mainly derived from the general class known as linear multistep methods (LMS) [6], which suffer from an inherent conflict between the "order" and the "stability" of the particular method [7] that is used to numerically approximate the solution of the DAE.

The order of the method (or order of convergence) quantifies how fast the numerical approximation converges to the exact solution of the differential equation as the size of the time step is reduced, and therefore reflects the computational effort needed to reach a given accuracy. A high-order method does not require an excessively small step size to obtain an acceptably accurate solution. In contrast, a low-order method requires a significant reduction in the length of the time step, so that the solution must be computed at a large number of time points. High-order methods, which naturally lead to a reduction in the number of time steps, are therefore crucial for efficient simulation.
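The effect of the order can be illustrated with a minimal sketch on the scalar test problem $x' = -x$, $x(0) = 1$, which is an assumption chosen here for illustration and is not one of the circuits considered in this paper. Halving the step size should roughly halve the error of the first-order backward Euler (BE) method and quarter the error of the second-order TR. The function names are hypothetical.

```python
import math

def integrate(method: str, h: float, t_end: float = 1.0) -> float:
    """Integrate x' = -x, x(0) = 1 with a fixed step h and return x(t_end)."""
    steps = round(t_end / h)
    x = 1.0
    for _ in range(steps):
        if method == "BE":    # backward Euler: x_{n+1} = x_n - h*x_{n+1}
            x = x / (1.0 + h)
        elif method == "TR":  # trapezoidal rule: average of both slopes
            x = x * (1.0 - h / 2.0) / (1.0 + h / 2.0)
        else:
            raise ValueError(f"unknown method: {method}")
    return x

def error_ratio(method: str, h: float = 0.1) -> float:
    """Ratio of the error at step h to the error at step h/2 (about 2**order)."""
    exact = math.exp(-1.0)
    err_h = abs(integrate(method, h) - exact)
    err_half = abs(integrate(method, h / 2.0) - exact)
    return err_h / err_half

if __name__ == "__main__":
    for name in ("BE", "TR"):
        print(name, round(error_ratio(name), 2))
```

A ratio near $2^p$ indicates order $p$, which is why a higher-order method can reach the same accuracy with far fewer time points.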

Nevertheless, high-order methods lacking numerical stability have little value. In general, it is desired that the integration method be guaranteed stable for all types of stable circuits (i.e., for all circuits with poles in the left half of the complex plane). This concept is known in the mathematical literature as A-stability, and an integration method exhibiting this feature is called A-stable. In addition, maintaining stability at $s = \infty$ in the complex plane (L-stability [8]) is a crucial requirement for handling stiff circuits. It should be noted that an L-stable method is necessarily A-stable. Hence, an ideal method should be L-stable with high order and minimal computational effort. However, L-stable methods cannot have an order higher than 2 within the LMS class [9]. It is not surprising that such a conflict has often been referred to as a barrier because of the difficulties it produces.

In a recent work [10], it was demonstrated that such a barrier does not exist if one considers a totally different class of methods based on the Obreshkov formula (ObF) [11]. It was shown that with proper formulation and implementation, ObF-based methods can surpass the performance of conventional simulators, such as SPICE [which is based on low-order methods, e.g., the trapezoidal rule (TR)]. Furthermore, a structural characterization of these methods as applied to nonlinear circuit simulation was presented more recently in [12]. In addition, this class of methods has also been used in [13] for discrete modeling of continuous time-domain systems for real-time simulation [14]–[16].

2156–3950/$31.00 © 2013 IEEE



The main objective of this paper is to present a new implementation algorithm for the ObF-based high-order methods that is tailored to the specific types of circuits encountered in high-speed packaging and signal/power integrity applications. The proposed algorithm demonstrates how the linear nature of such circuits can be used to significantly reduce the computational overhead involved in the ObF-based DAE solvers, thereby yielding high speedup.

The rest of this paper is organized as follows. Section II presents a brief background on the ObF-based high-order circuit simulation. Section III describes the development of the proposed method. Section IV discusses implementation strategies aimed at improving the performance. Sections V and VI present numerical examples and concluding remarks, respectively.

II. REVIEW OF THE OBRESHKOV-BASED HIGH-ORDER METHODS

A general linear circuit is described in the time domain using the modified nodal analysis (MNA) as follows:

$$G x(t) + C \frac{dx(t)}{dt} = b(t) \tag{1}$$

where $G$ and $C \in \mathbb{R}^{N \times N}$ are matrices describing the memoryless and memory elements in the circuit, respectively, $x(t) \in \mathbb{R}^{N}$ is the vector of unknown node voltages and currents, and $b(t) \in \mathbb{R}^{N}$ is the vector of independent stimuli to the circuit.
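As a small illustration of (1), the following sketch stamps the $G$ and $C$ matrices of a hypothetical two-node circuit: a 50 Ω resistor between nodes 1 and 2, a 50 Ω load from node 2 to ground, and a 1 pF capacitor from each node to ground. The circuit, the element values, and the helper name `stamp` are assumptions made for illustration only.

```python
def stamp(matrix, i, j, value):
    """Stamp a two-terminal element of the given value between nodes i and j.

    Node 0 is ground and has no row or column in the MNA matrix.
    """
    for a, b, sign in ((i, i, 1.0), (j, j, 1.0), (i, j, -1.0), (j, i, -1.0)):
        if a > 0 and b > 0:
            matrix[a - 1][b - 1] += sign * value

def build_mna(num_nodes, resistors, capacitors):
    """Return (G, C) for lists of (node_i, node_j, value) elements."""
    G = [[0.0] * num_nodes for _ in range(num_nodes)]
    C = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i, j, ohms in resistors:
        stamp(G, i, j, 1.0 / ohms)   # memoryless element: conductance
    for i, j, farads in capacitors:
        stamp(C, i, j, farads)       # memory element: capacitance
    return G, C

# Hypothetical example circuit (values are illustrative assumptions).
G_demo, C_demo = build_mna(
    2,
    resistors=[(1, 2, 50.0), (2, 0, 50.0)],
    capacitors=[(1, 0, 1e-12), (2, 0, 1e-12)],
)
```

In this sketch a current source driving node 1 would contribute the stimulus vector $b(t) = [i_s(t), 0]^T$; voltage sources and inductors would add extra current unknowns, which are omitted here for brevity.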

Applying an Obreshkov-based high-order integration method to numerically approximate the transient solution of the MNA equations in (1) at $t = t_{n+1}$ results in the following system of equations:

$$\mathcal{G}\,\xi_{n+1} + \mathcal{C}\,\xi_{n+1} = b_{n+1} \tag{2}$$

where $\mathcal{G}$ and $\mathcal{C} \in \mathbb{R}^{Nk \times Nk}$ are block-structured matrices in which the block at the $(i, j)$ entry (with $i, j = 1, \ldots, N$) is a $k \times k$ matrix whose structure depends on the nature of the $i$th and $j$th variables in the MNA formulation. In the case of linear circuits, a block in $\mathcal{G}$, denoted by $[\mathcal{G}]_{(i,j)}$, is a diagonal block of the form

$$[\mathcal{G}]_{(i,j)} = g_{i,j} I_k \tag{3}$$

where $g_{i,j}$ is the value of the conductance connecting nodes $i$ and $j$, and $I_k$ is the identity matrix of size $k \times k$.

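On the other hand, a block in $\mathcal{C}$ is given by

$$[\mathcal{C}]_{(i,j)} = \frac{c_{i,j}}{h} T \tag{4}$$

where $h$ is the size of the time step, i.e., $h = t_{n+1} - t_n$, $c_{i,j}$ is the value of the capacitance connecting nodes $i$ and $j$, and $T \in \mathbb{R}^{k \times k}$ is given by

$$T = \begin{bmatrix} 0 & 1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & 1 \\ -\dfrac{\alpha_0}{\alpha_k} & -\dfrac{\alpha_1}{\alpha_k} & \cdots & -\dfrac{\alpha_{k-1}}{\alpha_k} \end{bmatrix} \tag{5}$$

where

$$\alpha_i = \frac{(-1)^{i}\,(m + k - i)!\,k!}{i!\,(m + k)!\,(k - i)!}. \tag{6}$$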

If the $i$th or $j$th variables of the MNA formulation represent currents, such as those in inductors or voltage sources, then the above blocks will have the same structure but with different values of $g_{i,j}$ and $c_{i,j}$.
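The coefficients in (6) and the matrix $T$ in (5) can be evaluated exactly with rational arithmetic. The following is a minimal sketch; the function names `alpha` and `companion_T` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from math import factorial

def alpha(i: int, k: int, m: int) -> Fraction:
    """Coefficient alpha_i of (6), evaluated exactly."""
    return Fraction(
        (-1) ** i * factorial(m + k - i) * factorial(k),
        factorial(i) * factorial(m + k) * factorial(k - i),
    )

def companion_T(k: int, m: int):
    """k x k matrix T of (5): ones on the superdiagonal, scaled alphas in the last row."""
    a = [alpha(i, k, m) for i in range(k + 1)]
    T = [[Fraction(0)] * k for _ in range(k)]
    for r in range(k - 1):
        T[r][r + 1] = Fraction(1)
    for c in range(k):
        T[k - 1][c] = -a[c] / a[k]
    return T

if __name__ == "__main__":
    print(companion_T(1, 1))  # TR case: a single entry equal to 2
    print(companion_T(2, 2))
```

For $k = m = 1$ the matrix reduces to the scalar 2, so a block of $\mathcal{G} + \mathcal{C}$ becomes $g_{i,j} + 2c_{i,j}/h$, which is the familiar TR companion model. For $k = 1$, $m = 0$ it reduces to 1, giving the BE form $g_{i,j} + c_{i,j}/h$.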

The vectors $\xi_{n+1}$ and $b_{n+1}$ represent expanded versions of the vector of unknowns and independent stimuli, respectively. More precisely, these two vectors are also structured as a sequence of $N$ subvectors of $k$ components each in the following manner:

$$\xi_{n+1} = \left[ x_{n+1,1}^{T} \;\cdots\; x_{n+1,N}^{T} \right]^{T} \tag{7}$$

and

$$b_{n+1} = \left[ b_{n+1,1}^{T} \;\cdots\; b_{n+1,N}^{T} \right]^{T} \tag{8}$$

where $x_{n+1,i} \in \mathbb{R}^{k}$, $i = 1, \ldots, N$, is given by

$$x_{n+1,i} = \left[ x_{n+1,i}^{(0)} \;\; x_{n+1,i}^{(1)} \;\cdots\; x_{n+1,i}^{(k-1)} \right]^{T} \tag{9}$$

with $x_{n+1,i}^{(j)}$ denoting the numerical approximation to the $j$th-order derivative of the $i$th component of the exact solution $x(t)$ at $t = t_{n+1}$, and $b_{n+1,i} \in \mathbb{R}^{k}$, $i = 1, \ldots, N$, is given by

$$b_{n+1,i} = \left[ b_{n+1,i}^{(0)} \;\; b_{n+1,i}^{(1)} \;\cdots\; b_{n+1,i}^{(k-1)} - u_{n,i} \right]^{T} \tag{10}$$

with $b_{n+1,i}^{(j)}$ denoting the numerical approximation to the $j$th-order derivative of the $i$th component of the independent stimulus $b(t)$ at $t = t_{n+1}$ and $u_{n,i}$ being the $i$th component of the vector $u_n \in \mathbb{R}^{N}$, computed from the previous solution obtained at $t = t_n$ as follows:

$$u_n = \frac{C}{h\,\alpha_k} \sum_{j=0}^{m} \beta_j h^{j} x_n^{(j)} \tag{11}$$

where

$$\beta_i = \frac{(m + k - i)!\,m!}{i!\,(m + k)!\,(m - i)!}. \tag{12}$$

The integers $k$ used in (5) and $m$ used in (12) determine the order and the stability characteristics of the method. For example, it can be shown that the method is A-stable if $k \ge m \ge k - 2$ and L-stable if and only if $k > m \ge k - 2$ [10]. In addition, its order of integration is given by $k + m$. It should be stressed at this point that the order of the method imposes no restriction on its stability characteristics. For example, Gear's method [17] loses A-stability for orders higher than 2, whereas the Obreshkov method remains A-stable as long as $k = m$.
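The conditions above can be checked numerically with a minimal sketch. It assumes that the underlying one-step formula has the standard Obreshkov form $\sum_{i=0}^{k} \alpha_i h^i x_{n+1}^{(i)} = \sum_{j=0}^{m} \beta_j h^j x_n^{(j)}$, which is consistent with (6) and (12) but is not written out explicitly in this section. Under that assumption, applying the method to the scalar test equation $x' = \lambda x$ gives $x_{n+1} = R(h\lambda)\,x_n$ with $R(z) = \sum_j \beta_j z^j / \sum_i \alpha_i z^i$. All function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def alpha(i, k, m):
    """Coefficient alpha_i of (6)."""
    return Fraction((-1) ** i * factorial(m + k - i) * factorial(k),
                    factorial(i) * factorial(m + k) * factorial(k - i))

def beta(j, k, m):
    """Coefficient beta_j of (12)."""
    return Fraction(factorial(m + k - j) * factorial(m),
                    factorial(j) * factorial(m + k) * factorial(m - j))

def classify(k, m):
    """Order and stability class, using the conditions quoted from [10]."""
    return {
        "order": k + m,
        "A-stable": k >= m >= k - 2,
        "L-stable": k > m >= k - 2,
    }

def stability_function(z, k, m):
    """R(z) for the scalar test equation x' = lambda*x, with z = h*lambda (assumed form)."""
    num = sum(float(beta(j, k, m)) * z ** j for j in range(m + 1))
    den = sum(float(alpha(i, k, m)) * z ** i for i in range(k + 1))
    return num / den

if __name__ == "__main__":
    print(classify(1, 1))                    # TR
    print(classify(1, 0))                    # BE
    print(stability_function(-1e6, 2, 1))    # decays toward 0 (L-stable pair)
    print(stability_function(-1e6, 2, 2))    # magnitude stays near 1 (A-stable only)
```

The contrast between the last two values illustrates the practical meaning of L-stability: very fast (stiff) modes are damped out when $k > m$, whereas for $k = m$ they are preserved in magnitude.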

Transient simulation of the circuit proceeds by solving the linear system of equations in (2) for $n = 1, 2, \ldots$, starting from initial conditions computed as shown in [10]. At each time step, the matrix $\mathcal{J} = \mathcal{G} + \mathcal{C}$ is factorized using a block form of the LU factorization, which was implemented using the industry-standard KLU algorithm [18]. The block LU (BLU) factorization can be succinctly described as a standard LU technique in which the scalar arithmetic operations of division, multiplication, and subtraction are replaced by their matrix counterparts on the $k \times k$ matrix blocks given by (3), (4), or a combination thereof.



III. DEVELOPMENT OF THE PROPOSED METHOD

In this section, details of the proposed algorithm for efficient analysis of large linear circuits using L-stable high-order integration methods are given.

The main objective of this paper is best highlighted by first estimating the computational complexity of the BLU factorization scheme used in [12], in order to contrast it with the complexity of the proposed algorithm. This task can be easily carried out because the matrix $\mathcal{J}$ has the same structure as the typical $N \times N$ MNA matrices, e.g., $G + (2/h)C$, which arise when using the TR method, except that the entries in the latter are expanded to block matrices of size $k$ in $\mathcal{J}$. It is known that the complexity of a single standard LU factorization for a typical matrix with MNA-like sparsity and size $N$ grows in proportion to $N^{\alpha}$, where $\alpha$ is a factor approximately equal to 1.2, as has been reported recently in [19]. Hence, using a sparse ordering for the block entries of $\mathcal{J}$ and noting that each scalar operation in the standard LU factorization is replaced by a matrix block operation of the same kind, it is reasonable to conclude that the cost of BLU factorization will scale in proportion to $N^{\alpha}k^{3}$. This estimate of the computational complexity is typically denoted by $O(N^{\alpha}k^{3})$.

It is important to note that the increase in the computational effort from $O(N^{\alpha})$ in a low-order method (such as the TR) to $O(N^{\alpha}k^{3})$ in an ObF-based high-order method is part of the computational overhead that is naturally involved in any high-order scheme for the numerical solution of DAEs. However, it is the saving in the number of time steps, made possible by the larger step sizes, that typically offsets this overhead and results in an overall gain in the simulation time.

The central contribution of this paper aims at reducing the complexity of the overhead from $O(N^{\alpha}k^{3})$ to $O(N^{\alpha}k)$ for the case of linear circuits. The proposed algorithm takes advantage of the special structure of the blocks in $\mathcal{J}$ to achieve this goal.
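The two estimates can be compared with a rough cost model. The sketch below ignores all constant factors and simply evaluates $N^{\alpha}k^{3}$ against $N^{\alpha}k$ with $\alpha = 1.2$ as quoted from [19]; the function name and the sample sizes are illustrative assumptions, and the output is a scaling estimate rather than a timing prediction.

```python
def blu_cost(N: int, k: int, alpha: float = 1.2, diagonal_blocks: bool = False) -> float:
    """Scaling estimate of one block-LU factorization (constants ignored).

    Full k x k blocks cost about k**3 per block operation,
    diagonal blocks cost about k per block operation.
    """
    per_block = k if diagonal_blocks else k ** 3
    return (N ** alpha) * per_block

def estimated_gain(N: int, k: int) -> float:
    """Ratio of the full-block estimate to the diagonal-block estimate."""
    return blu_cost(N, k) / blu_cost(N, k, diagonal_blocks=True)

if __name__ == "__main__":
    for k in (2, 4, 6):
        print(k, estimated_gain(8000, k))
```

In this model the ratio is $k^{2}$ regardless of $N$, so the benefit of diagonal blocks grows quickly with the order parameter.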

Consider the system in (2) rewritten in the form

$$\mathcal{J}\,\xi_{n+1} = b_{n+1}. \tag{13}$$

An arbitrary block in $\mathcal{J}$ can be written as

$$[\mathcal{J}]_{(i,j)} = g_{i,j} I + \frac{c_{i,j}}{h} T. \tag{14}$$

Using an eigendecomposition of $[\mathcal{J}]_{(i,j)}$, each block can be represented as

$$[\mathcal{J}]_{(i,j)} = V \lambda_T V^{-1} \tag{15}$$

where $V \in \mathbb{R}^{k \times k}$ is the matrix of eigenvectors and $\lambda_T$ is a diagonal matrix having the eigenvalues of $[\mathcal{J}]_{(i,j)}$ as its diagonal elements.

The key idea in the proposed algorithm is motivated by the observation that the matrix $V$ is independent of the block-specific entries $g_{i,j}$, $c_{i,j}$ and of the step size $h$. Therefore, the matrix $V$ is shared by all the blocks in $\mathcal{J}$.
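This observation can be checked numerically for $k = 2$, where the eigenvalues of $T$ follow from the quadratic formula. The sketch below uses the fact that a companion matrix of the form (5) has eigenvectors $[1, \lambda]^T$, and verifies that the same vectors are eigenvectors of $g I + (c/h) T$ for arbitrary $g$, $c$, and $h$. Note that for $k = 2$ the eigenvalues come out as a complex-conjugate pair, so the sketch works in complex arithmetic. All names are hypothetical.

```python
import cmath
from fractions import Fraction
from math import factorial

def alpha(i, k, m):
    """Coefficient alpha_i of (6), as a float."""
    return float(Fraction((-1) ** i * factorial(m + k - i) * factorial(k),
                          factorial(i) * factorial(m + k) * factorial(k - i)))

def eigenvalues_T_k2(m):
    """Eigenvalues of the 2 x 2 matrix T: roots of a2*l**2 + a1*l + a0 = 0."""
    a0, a1, a2 = (alpha(i, 2, m) for i in range(3))
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2 * a0)
    return [(-a1 + disc) / (2.0 * a2), (-a1 - disc) / (2.0 * a2)]

def block(g, c, h, m):
    """Block g*I + (c/h)*T of (14) for k = 2."""
    a0, a1, a2 = (alpha(i, 2, m) for i in range(3))
    T = [[0.0, 1.0], [-a0 / a2, -a1 / a2]]
    return [[g * (r == s) + (c / h) * T[r][s] for s in range(2)] for r in range(2)]

def eigen_residual(g, c, h, m):
    """Largest entry of |J v - mu v| over both eigenpairs.

    v = [1, lam] depends only on T; mu = g + (c/h)*lam carries g, c and h.
    """
    J = block(g, c, h, m)
    worst = 0.0
    for lam in eigenvalues_T_k2(m):
        v = [1.0, lam]
        mu = g + (c / h) * lam
        Jv = [J[r][0] * v[0] + J[r][1] * v[1] for r in range(2)]
        worst = max(worst, max(abs(Jv[r] - mu * v[r]) for r in range(2)))
    return worst

if __name__ == "__main__":
    print(eigen_residual(0.02, 1e-12, 1e-11, 1))
    print(eigen_residual(5.0, 3.0, 0.5, 2))
```

Because the eigenvectors never change, only the $k$ eigenvalues $g_{i,j} + (c_{i,j}/h)\lambda_p$ need to be formed for each block.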

In fact, $V$ depends only on the integers $k$ and $m$ that define the method, and is available in closed form or can be computed offline prior to any computation. Indeed, this fact enables rewriting (13) as

$$\Upsilon \Lambda \Upsilon^{-1} \xi_{n+1} = b_{n+1} \tag{16}$$

where

$$\Upsilon = \underbrace{\begin{bmatrix} V & & \\ & \ddots & \\ & & V \end{bmatrix}}_{Nk \times Nk} \tag{17}$$

and $\Lambda$ has the same block-structured form as $\mathcal{J}$, with the exception that each block is now a diagonal block whose diagonal entries are given by $g_{i,j} + (c_{i,j}/h)\lambda_p$, with $\lambda_p$ being the $p$th eigenvalue of $T$ and $p = 1, \ldots, k$.

Algorithm 1: Block-LU Factorization
input : Λ ∈ R^{kN×kN}
output: L, U ∈ R^{kN×kN}
1  begin
2    L ← I_{kN};
3    for p ← 1 to N do
4      r1 ← 1, r2 ← kN;
5      c1 ← (p − 1)k + 1, c2 ← pk;
6      y ← BFS(L, Λ(r1 : r2, c1 : c2));
7      r2 ← pk;
8      U(r1 : r2, c1 : c2) ← y(r1 : r2, :);
9      H ← BPivot(y);   // Choose the block pivot
10     for q ← p + 1 to N do
11       r1 ← (q − 1)k + 1, r2 ← qk;
12       L(r1 : r2, c1 : c2) ← y(r1 : r2, :) H^{−1}

To take advantage of the formulation in (16), we use the following change of variables:

$$\rho_{n+1} = \Upsilon^{-1} \xi_{n+1}. \tag{18}$$

Then, the system of equations in (16) can be written as

$$\Lambda \rho_{n+1} = \Upsilon^{-1} b_{n+1}. \tag{19}$$

Solving the system in (19) for $\rho_{n+1}$ can be carried out using the BLU factorization. However, in contrast to the previous formulation in (13), the blocks in $\Lambda$ of (19) are all diagonal, therefore requiring at most $k$ computations during any block operation such as inversion, multiplication, or addition. This fact implies that the overall computational complexity will be $O(N^{\alpha}k)$, as against the $O(N^{\alpha}k^{3})$ incurred in the previous approaches [12].

In order to emphasize the advantage of this new formulation, we use the pseudo-code shown in Algorithm 1 to illustrate the fundamental computational operations in the block form of LU factorization, where the BFS procedure at line 6 is illustrated by the pseudo-code in Algorithm 2. Algorithm 3 describes the operation of the block forward/backward substitution module.

The above pseudo-code illustrations make it clear that the basic computational effort in the BLU factorization involves matrix–matrix multiplication (step 7 in Algorithm 2 and in Algorithm 3), matrix inversion (step 12 in Algorithm 1 and step 15 in Algorithm 3), and matrix–vector multiplications for the matrices of the individual blocks, which are diagonal with size $k$. Therefore, each one of those computations can be carried out with a complexity that scales as $k$ per block.

Algorithm 2: BFS
input : L ∈ R^{kN×kN}, A ∈ R^{kN×k}
output: y ∈ R^{kN×k}
1  begin
2    y ← A;
3    for p ← 2 to N do
4      r1 ← (p − 1)k + 1, r2 ← pk;
5      for q ← 1 to p − 1 do
6        c1 ← (q − 1)k + 1, c2 ← qk;
7        y(r1 : r2, :) ← y(r1 : r2, :) −
8            L(r1 : r2, c1 : c2) y(c1 : c2, :)
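The per-block cost of $k$ can be made concrete with a simplified sketch in which every block is stored only by its $k$ diagonal entries. Unlike Algorithm 1, this illustration is a dense, right-looking elimination without pivoting and without any sparsity handling, so it should be read as an assumption-laden toy version rather than the implementation used in the paper. The function names are hypothetical.

```python
def diag_block_lu(A):
    """Block LU where A[i][j] holds the k diagonal entries of block (i, j).

    Returns (L, U) in the same storage. No pivoting is performed, so the
    diagonal blocks must stay nonzero during elimination.
    """
    N = len(A)
    k = len(A[0][0])
    U = [[list(blk) for blk in row] for row in A]
    L = [[[1.0 if i == j else 0.0] * k for j in range(N)] for i in range(N)]
    for p in range(N):
        for i in range(p + 1, N):
            # Block "division" by the pivot: k scalar divisions, no k x k inverse.
            f = [U[i][p][d] / U[p][p][d] for d in range(k)]
            L[i][p] = f
            for j in range(p, N):
                # Block multiply-subtract: k scalar operations per block.
                U[i][j] = [U[i][j][d] - f[d] * U[p][j][d] for d in range(k)]
    return L, U

def diag_block_matmul(X, Y):
    """Product of two block matrices with diagonal blocks (same storage)."""
    N = len(X)
    k = len(X[0][0])
    return [[[sum(X[i][q][d] * Y[q][j][d] for q in range(N)) for d in range(k)]
             for j in range(N)] for i in range(N)]

if __name__ == "__main__":
    A_demo = [[[4.0, 2.0], [1.0, 1.0]],
              [[2.0, 1.0], [3.0, 5.0]]]
    L_demo, U_demo = diag_block_lu(A_demo)
    print(U_demo[1][1])  # Schur complement block
```

Since diagonal blocks act entrywise, the factorization is equivalent to $k$ independent scalar LU factorizations that share one sparsity pattern, which is where the factor $k$ (rather than $k^{3}$) comes from.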

The memory required to store the matrix resulting from the proposed method is $k$ times larger than that of the conventional TR method. However, note that in the proposed method only the diagonals of the blocks need to be stored. Therefore, less memory is required to factorize the matrix than if the entries were treated as full blocks. The LU factorization of the matrix using the proposed method is $O(kN^{\alpha})$, where $k$ is the integer that determines the order of integration and $N$ is the number of unknown variables. These minor increases in the memory and factorization costs are far outweighed by the significant CPU-time savings of the proposed method due to its ability to take larger time steps.

It must be stressed that the above pseudo-code illustrations do not take into account the sparse structure of the matrix $\Lambda$. In the practical implementation, an additional layer is added to these algorithms to take advantage of the sparsity of the matrix arising naturally from the circuit topology. Sparsity-enabled BLU factorization can be briefly described as a two-step process. The first step is symbolic and determines the nonzero and expected fill-in patterns of a scalar matrix $J \in \mathbb{R}^{N \times N}$, whose structure mirrors the structure of the block matrix $\mathcal{J} = \mathcal{G} + \mathcal{C}$, in the sense that an entry $(i, j)$ in $J$ is nonzero if, and only if, the block entry $(i, j)$ in $\mathcal{J}$ is nonzero.

The second step is numerical and is carried out in the manner illustrated by Algorithm 1, with the exception that only the blocks of $\Lambda$ that have been identified by the fill-in pattern [12], deduced in the first step, are used in the computation.
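The symbolic step can be sketched as follows, assuming a plain elimination order without pivoting or reordering (a production solver such as KLU applies a fill-reducing ordering first, which is not modeled here). The input lists, for each block row, the set of block columns that are nonzero; the function name is hypothetical.

```python
def symbolic_fill(pattern):
    """Return the block pattern of L + U, including fill-in.

    pattern[i] is the set of column indices with a nonzero block in row i.
    Elimination proceeds in natural order without pivoting.
    """
    rows = [set(r) for r in pattern]
    n = len(rows)
    for p in range(n):
        upper = {j for j in rows[p] if j > p}
        for i in range(p + 1, n):
            if p in rows[i]:
                # Eliminating entry (i, p) creates entries wherever row p is nonzero.
                rows[i] |= upper
    return rows

if __name__ == "__main__":
    arrow_first = [{0, 1, 2, 3}, {0, 1}, {0, 2}, {0, 3}]
    arrow_last = [{0, 3}, {1, 3}, {2, 3}, {0, 1, 2, 3}]
    print(symbolic_fill(arrow_first))  # fills in completely
    print(symbolic_fill(arrow_last))   # no fill-in at all
```

The two arrow-shaped patterns describe the same coupling under different orderings, which illustrates why the symbolic step and the ordering matter as much as the per-block cost.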

IV. PROPOSED IMPLEMENTATION STRATEGIES

It is important to stress that solving for $\rho_{n+1}$ in (19) through the BLU factorization of $\Lambda$ is only one component of the transient simulation. In order to maintain the same complexity of $O(N^{\alpha}k)$ in the other computational components involved, several implementation issues have to be carefully considered. This section discusses those issues.

In addition to solving for $\rho_{n+1}$, the basic time-stepping mechanism involves several other steps. More precisely, those steps are:

1) the mapping from $\rho_{n+1}$ to $\xi_{n+1}$;
2) computing the new value for the source vector $b_{n+1}$;
3) computing the right-hand side vector $\Upsilon^{-1} b_{n+1}$ in (19).

The following subsections address handling the above computational issues efficiently.

Algorithm 3: Block Forward/Backward Substitution
input : L, U ∈ R^{kN×kN}, b ∈ R^{kN×k}
output: ρ ∈ R^{kN×k}
1  begin
2    y ← b;
3    for p ← 2 to N do
4      r1 ← (p − 1)k + 1, r2 ← pk;
5      for q ← 1 to p − 1 do
6        c1 ← (q − 1)k + 1, c2 ← qk;
7        y(r1 : r2, :) ← y(r1 : r2, :) −
8            L(r1 : r2, c1 : c2) y(c1 : c2, :)
9    ρ ← y;
10   for p ← N down to 1 do
11     r1 ← (p − 1)k + 1, r2 ← pk;
12     for q ← p + 1 to N do
13       c1 ← (q − 1)k + 1, c2 ← qk;
14       ρ(r1 : r2, :) ← ρ(r1 : r2, :) − U(r1 : r2, c1 : c2) ρ(c1 : c2, :)
15     D ← U(r1 : r2, r1 : r2); ρ(r1 : r2, :) ← D^{−1} ρ(r1 : r2, :)

A. Mapping $\rho_{n+1}$ to $\xi_{n+1}$

The vector $\xi_{n+1}$ represents the physical variables in which a designer is typically interested, whereas $\rho_{n+1}$ represents a transformation of those variables. Mapping from $\rho_{n+1}$ to $\xi_{n+1}$ requires multiplying the former by $\Upsilon$ to obtain the latter. This process, however, scales as $O(Nk^{2})$ and could, therefore, make the complexity of the overall algorithm proportional to $Nk^{2}$.

However, it is typically the case that the designer is interested in viewing the response at a small set of selected variables, e.g., the voltage at the receiving end of a victim line in a huge interconnect network. Knowing which voltages and currents are of interest makes it possible to map only those variables from the $\rho_{n+1}$ space to the $\xi_{n+1}$ space using $V$. The transformation from $\rho_{n+1}$ to $\xi_{n+1}$ can thus be limited to the variables indicated by the designer.
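A minimal sketch of this selective mapping is given below. It assumes that $\rho_{n+1}$ is stored as a list of $N$ subvectors of length $k$ and that $V$ is available as a $k \times k$ nested list; both storage choices and the function names are assumptions made for illustration.

```python
def map_selected(rho, V, selected):
    """Map only the selected subvectors: x_i = V * rho_i for i in selected.

    Costs about k**2 operations per selected variable instead of N*k**2 in total.
    """
    k = len(V)
    return {
        i: [sum(V[r][c] * rho[i][c] for c in range(k)) for r in range(k)]
        for i in selected
    }

def map_selected_values(rho, V, selected):
    """Map only the zeroth-derivative entry (the waveform value) of each selected variable.

    Uses just the first row of V, i.e. about k operations per selected variable.
    """
    k = len(V)
    return {i: sum(V[0][c] * rho[i][c] for c in range(k)) for i in selected}

if __name__ == "__main__":
    V_demo = [[1.0, 1.0], [2.0, 3.0]]
    rho_demo = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(map_selected(rho_demo, V_demo, [2]))
    print(map_selected_values(rho_demo, V_demo, [0, 2]))
```

When only a plotted waveform is needed, the second variant is sufficient, since the higher derivatives of the observed variable are not displayed.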

B. Computing $b_{n+1}$

As shown in (10) and (11), computing $b_{n+1}$ requires having access to all the components in $\xi_n$. Unlike the previous issue, computing the full vectors $x_n^{(i)}$ is mandatory to form $b_{n+1}$, thereby requiring an $O(Nk^{2})$ computation. In order to address this issue, we show next that by adopting a careful ordering strategy, the cost of this computational step can be reduced to just $O(Nk)$.

To this end, we note that the relation between $\rho_n$ and $\xi_n$ is through the transformation

$$\xi_n = \Upsilon \rho_n. \tag{20}$$



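Using the fact that $\xi_n$ is partitioned into $N$ vectors, with $k$ components each,

$$\xi_n = \left[ x_{n,1}^{(0)} \;\cdots\; h^{k-1} x_{n,1}^{(k-1)} \;\cdots\;\cdots\; x_{n,N}^{(0)} \;\cdots\; h^{k-1} x_{n,N}^{(k-1)} \right]^{T} \tag{21}$$

and adopting a similar partitioning for $\rho_n$,

$$\rho_n = \left[ \rho_{n,1}^{(0)} \;\cdots\; h^{k-1} \rho_{n,1}^{(k-1)} \;\cdots\;\cdots\; \rho_{n,N}^{(0)} \;\cdots\; h^{k-1} \rho_{n,N}^{(k-1)} \right]^{T} \tag{22}$$

shows that

$$h^{j} x_{n,p}^{(j)} = \sum_{i=0}^{k-1} V_{j,i}\, h^{i} \rho_{n,p}^{(i)} \tag{23}$$

where $V_{j,i}$ is the component at the $j$th row and $i$th column of the matrix $V$.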

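Now, consider the vector $\sum_{i=0}^{m} \beta_i h^{i} x_n^{(i)}$ needed to compute $b_{n+1}$. This vector can be formulated in the following form:

$$\sum_{i=0}^{m} \beta_i h^{i} x_n^{(i)} = \begin{bmatrix} \sum_{i=0}^{m} \beta_i h^{i} x_{n,1}^{(i)} \\ \vdots \\ \sum_{i=0}^{m} \beta_i h^{i} x_{n,N}^{(i)} \end{bmatrix}. \tag{24}$$

Substituting from (23) into (24) shows that

$$\sum_{i=0}^{m} \beta_i h^{i} x_n^{(i)} = \begin{bmatrix} \sum_{l=0}^{m} \beta_l \sum_{i=0}^{k-1} V_{l,i}\, h^{i} \rho_{n,1}^{(i)} \\ \vdots \\ \sum_{l=0}^{m} \beta_l \sum_{i=0}^{k-1} V_{l,i}\, h^{i} \rho_{n,N}^{(i)} \end{bmatrix}. \tag{25}$$

Interchanging the double summation yields

$$\sum_{i=0}^{m} \beta_i h^{i} x_n^{(i)} = \begin{bmatrix} \sum_{i=0}^{k-1} \overbrace{\sum_{l=0}^{m} \beta_l V_{l,i}}^{\Gamma_i}\, h^{i} \rho_{n,1}^{(i)} \\ \vdots \\ \sum_{i=0}^{k-1} \underbrace{\sum_{l=0}^{m} \beta_l V_{l,i}}_{\Gamma_i}\, h^{i} \rho_{n,N}^{(i)} \end{bmatrix} \tag{26}$$

where $\Gamma_i$ is defined as

$$\Gamma_i = \sum_{l=0}^{m} \beta_l V_{l,i} \tag{27}$$

and needs to be evaluated only once before the beginning of the transient simulation.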

Equation (26) shows that the vector $\sum_{i=0}^{m} \beta_i h^{i} x_n^{(i)}$ can be computed using $Nk$ multiplications and additions, thereby making the overall complexity of this step $O(Nk)$.
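The reordering in (26) and (27) can be verified with a short sketch that compares the precomputed-weights route against the direct route through (23). The sketch assumes $m \le k - 1$ so that every row index $l$ used in (27) exists in $V$, takes the entries of each subvector to be already scaled by $h^{i}$ as in (22), and uses an arbitrary matrix in place of the true eigenvector matrix. All names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def beta(j, k, m):
    """Coefficient beta_j of (12), as a float."""
    return float(Fraction(factorial(m + k - j) * factorial(m),
                          factorial(j) * factorial(m + k) * factorial(m - j)))

def precompute_weights(V, k, m):
    """Weights of (27): one sum over l for each column i of V (done once)."""
    return [sum(beta(l, k, m) * V[l][i] for l in range(m + 1)) for i in range(k)]

def weighted_sum_fast(rho_scaled, weights):
    """Route of (26): k multiply-adds per variable, O(N*k) overall."""
    return [sum(w * r for w, r in zip(weights, sub)) for sub in rho_scaled]

def weighted_sum_direct(rho_scaled, V, k, m):
    """Route of (24)-(25): map every subvector with V first, O(N*k**2) overall."""
    out = []
    for sub in rho_scaled:
        x_scaled = [sum(V[j][i] * sub[i] for i in range(k)) for j in range(k)]  # (23)
        out.append(sum(beta(l, k, m) * x_scaled[l] for l in range(m + 1)))
    return out

if __name__ == "__main__":
    V_demo = [[1.0, 2.0, 3.0], [0.0, 1.0, 4.0], [5.0, 6.0, 0.0]]
    rho_demo = [[1.0, 0.5, -2.0], [0.0, 3.0, 1.0]]
    w = precompute_weights(V_demo, 3, 2)
    print(weighted_sum_fast(rho_demo, w))
    print(weighted_sum_direct(rho_demo, V_demo, 3, 2))
```

Both routes return the same values; the fast route simply moves the work that does not depend on the time step out of the time-stepping loop.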

C. Computing the Right-Hand Side Vector

In order to solve for $\rho_{n+1}$ in (19), the vector $b_{n+1}$ is multiplied by $\Upsilon^{-1}$ to form the right-hand side. This multiplication scales as $O(N^{\alpha}k^{2})$ and may increase the overall complexity of the algorithm.

Fig. 1. Physical parameters of Example 1.

Fig. 2. Interconnects circuit for Example 1.

However, the vector $b_{n+1}$ represents the independent sources in the circuit, and in most practical circuits the number of sources is very small compared to the number of nodes. Utilizing this fact, the multiplication $\Upsilon^{-1} b_{n+1}$ can be computed in almost $O(N^{\alpha}k)$ by mapping the few entries in $b_{n+1}$ due to the independent stimuli and multiplying only the last component in $b_{n+1,i}$ by the last column in $V$ for all $i = 1, \ldots, N$. This strategy keeps the computational complexity very close to $O(N^{\alpha}k)$.
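The strategy can be sketched as follows. By (10), a subvector $b_{n+1,i}$ that carries no independent source has only its last entry nonzero, so its transform needs a single column. Since the right-hand side in (19) involves $\Upsilon^{-1}$, this sketch reads the column in question as the last column of $V^{-1}$; that reading, the storage layout, and the function name are assumptions.

```python
def transform_rhs(b_sub, Vinv, has_source):
    """Compute V^{-1} b_i for every subvector b_i of the right-hand side.

    b_sub[i]      : length-k subvector b_{n+1,i}
    Vinv          : V^{-1} as a k x k nested list (computed once, offline)
    has_source[i] : True if an independent source contributes to row i
    """
    k = len(Vinv)
    out = []
    for i, bi in enumerate(b_sub):
        if has_source[i]:
            # Few rows: full k x k product, about k**2 operations.
            out.append([sum(Vinv[r][c] * bi[c] for c in range(k)) for r in range(k)])
        else:
            # Most rows: only the last entry is nonzero, about k operations.
            last = bi[k - 1]
            out.append([Vinv[r][k - 1] * last for r in range(k)])
    return out

if __name__ == "__main__":
    Vinv_demo = [[1.0, 2.0], [3.0, 4.0]]
    b_demo = [[5.0, 6.0], [0.0, 7.0]]
    print(transform_rhs(b_demo, Vinv_demo, [True, False]))
```

With $S$ source rows the cost is about $Sk^{2} + (N - S)k$ operations, which stays close to linear in $k$ as long as $S$ is small compared to $N$.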

V. NUMERICAL EXAMPLES

This section presents the results of the numerical simulations using the proposed algorithm. We compare the CPU time taken by the proposed algorithm with the time taken by the low-order methods used in traditional SPICE-like simulators. For the latter, we use the TR, which represents the most accurate A-stable method in the whole class of LMS methods [20]. It should be noted that the proposed algorithm reduces to the TR when executed with $k = m = 1$, in which case the order becomes equal to 2. We also employ the notation $(i/j)$ to denote running the proposed algorithm using $m = i$ and $k = j$, thereby making the order equal to $i + j$.

A. Example 1: Two Coupled Transmission Lines

The circuit considered in this example is a coupled two-conductor transmission line of length 8 cm whose physical cross-section is shown in Fig. 1. Fig. 2 shows the connections at the terminals of the lines. The per-unit-length (p.u.l.) parameters of the coupled lines are extracted using field solvers in HSPICE [21]. The lines are then modeled using a uniform lumped segmentation approach [22] with 80 lumped sections for each line.

Table I summarizes the number of time points and the CPU time taken by the proposed technique at different orders, starting with order 2, which represents the TR. The last column shows the speedup achieved by the high-order method relative to the low-order TR.



Fig. 3. Waveform comparison at the far end of victim line 2 (Example 1). [Plot: voltage (V) versus time (s), 0 to 10 ns; curves: proposed method (order 6/6), proposed method (order 3/3), TR.]

TABLE I
CPU TIME COMPARISON OF EXAMPLE 1

Order     Number of Time Steps    Proposed Method (s)    Speedup w.r.t. TR
2 (TR)    480                     0.18                   1.00
4         170                     0.07                   2.57
6         140                     0.04                   4.50
8         105                     0.03                   6.00
10        85                      0.028                  6.43
12        65                      0.021                  8.57

Fig. 4. Interconnect circuit for Example 2.

Fig. 3 shows a sample time-domain waveform at the far end of the victim line (second line) using different orders. As seen from Table I, the number of time points required decreases significantly at higher orders.

B. Example 2: Massively Coupled Interconnect Bus

In this example, we consider the coupled interconnect circuit shown in Fig. 4. The length of the lines is 6 cm. The coupled lines are modeled using the lumped segmentation approach [22] (40 sections). The lines are excited with a voltage source attached to every fourth line, with 0.5 ns fall/rise time, 10 ns pulse width, and 3.5 V magnitude. The size of the resulting matrix is $Nk \times Nk$, with $N = 8000$ and $k$ representing the order of the method being used.

To further illustrate the computational advantage of the proposed method, Fig. 5 shows the CPU time of one LU matrix factorization (versus $k$) using the proposed algorithm, and contrasts it with the regular block LU factorization used previously [12]. For $k > 1$, this figure depicts the major part of the additional computations incurred by the high-order scheme when compared with the low-order ($k = 1$) one. This computational growth is part of the overhead needed to implement the high-order scheme. However, it is usually the savings in the number of time points, made possible by the larger step sizes, that significantly offset this overhead and result in an overall saving of computational time.

Fig. 5. CPU time comparison of one LU factorization (Example 2). [Plot: time (s) versus $k$ for $k = 1$ to 6; curves: previous method in [12], proposed method.]

TABLE II
CPU TIME COMPARISON OF EXAMPLE 2 FOR L-STABLE METHODS

Order       Number of Time Steps    Proposed Method (s)    Speedup w.r.t. BE
0/1 (BE)    550                     215.41                 1.00
1/3         285                     99.99                  2.15
2/4         105                     57.71                  3.73
3/5         67                      46.56                  4.63
4/6         35                      30.55                  7.05

Fig. 6. Waveform comparison at the near end of aggressor line 12 (Example 2). [Plot: voltage (V) versus time (s), 0 to 10 ns; curves: proposed method (order 6/6), proposed method (order 4/4), TR.]

What Fig. 5 clearly demonstrates is that the proposed algorithm makes the computational overhead grow linearly with the order parameter $k$ of the method, which agrees with the analysis presented in the previous section with regard to the computational complexity. This should be contrasted with the quadratic growth in computational overhead that results from applying a regular BLU factorization [12] without taking advantage of the special structure of the underlying matrix blocks.

Fig. 7. Large coupled interconnects network (Example 3). [Schematic: three segments (Segment 1, Segment 2, Segment 3), each with coupled lines Line 1 to Line 64, with 50 Ω and 10 Ω resistors and 1 pF capacitors.]

Fig. 8. Waveform comparison at the far end of victim line 6 in segment 2 (Example 3). [Plot: voltage (V) versus time (s), 0 to 10 ns.]

TABLE III
CPU TIME COMPARISON OF EXAMPLE 3

Order     Number of Time Steps    Proposed Method (s)    Speedup w.r.t. TR
2 (TR)    205                     261.09                 1.00
4         102                     78.33                  3.33
6         81                      57.86                  4.51
8         67                      40.35                  6.47
10        51                      31.28                  8.34
12        33                      26.84                  9.72

Table II summarizes the results obtained when the proposed algorithm is executed in its L-stable mode ($k - 2 \le m < k$). It also shows the achieved speedup relative to the low-order L-stable backward Euler (BE) method, which is obtained as the special case $m = 0$, $k = 1$.

Fig. 9. Eight-port linear circuit (Example 4).

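TABLE IV
CPU TIME COMPARISON OF EXAMPLE 4

Order     Number of Time Steps    Proposed Method (s)    Speedup w.r.t. TR
2 (TR)    652                     2.35                   1.00
4         306                     1.12                   2.09
6         204                     0.92                   2.55
8         101                     0.60                   3.91
10        62                      0.32                   7.34
12        35                      0.22                   10.68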

Fig. 10. Waveform comparison at port 2 (Example 4). [Plot: voltage (V) versus time (s), 0 to 20 ns.]

Fig. 6 shows a sample time-domain waveform at the near end of aggressor line 12 using different integration orders. As can be observed, the high-order methods approximate the waveforms using a much lower number of time points.

C. Example 3: Network of Massively Coupled Interconnects

The interconnect network shown in Fig. 7 is considered for this example. The length of the lines is 4 cm. The coupled lines are modeled using the lumped segmentation approach [22] (40 sections). The circuit is excited with a current source attached to every fourth line in segment 1, with 0.5 ns fall/rise time, 10 ns pulse width, and 0.06 A magnitude.

Fig. 8 shows a sample comparison of time-domain responses at the far end of line 6 in segment 2 using the proposed method with different integration orders.

Table III shows the total CPU time required to perform the transient simulation with different orders. The last column shows the speedup obtained using the proposed method compared to the conventional TR method.



D. Example 4: S-Parameters-Based Macromodel

In this example, we consider an interconnect circuit characterized by tabulated data of the S-parameters at different frequency points. The data are first fitted using a vector-fitting algorithm [5], and the results are then synthesized using linear components. The input ports are excited with voltage sources of 0.1 ns rise/fall time, 10 ns pulse width, and 3 V magnitude. The output ports are terminated with linear capacitors of 1 pF, as shown in Fig. 9. The CPU time comparison is shown in Table IV. In addition, Fig. 10 shows the time-domain comparison of the waveforms at port 2 using the TR method and orders 4 and 6.

VI. CONCLUSION

In this paper, an efficient BLU factorization algorithm was proposed to accelerate the performance of the Obreshkov-based high-order methods for simulation of large linear circuits that typically arise from high-speed interconnects and packages. The core of the proposed technique is a new algorithm to perform the factorization of the system matrix. This factorization takes advantage of the special matrix structure that arises from the linearity of the underlying circuit. It was shown, analytically and through numerical simulations, that the proposed technique reduces the factorization complexity from $O(N^{\alpha}k^{3})$ to only $O(N^{\alpha}k)$, where $N$ is the size of the circuit MNA formulation and $k$ is an integer that corresponds to the order used.

REFERENCES

[1] K. Kundert, The Designer’s Guide to SPICE and Spectre. Norwell, MA:Kluwer, 1995.

[2] C. Balanis, Advanced Engineering Electromagnetics. New York: Wiley,1989.

[3] H. Heeb and A. E. Ruheli, “Three dimensional interconnect analysisusing partial element equivalent circuits,” IEEE Trans. Circuits Syst. I,Fundam. Theory Appl., vol. 39, no. 11, pp. 974–982, Nov. 1992.

[4] A. E. Ruehli, “Equivalent circuit models for three dimentional multi-conductor systems,” IEEE Trans. Microw. Theory Tech., vol. 22, no. 3,pp. 216–221, Mar. 1974.

[5] B. Gustavsen and A. Semlyen, “Rational approximation of frequencyresponses by vector fitting,” IEEE Trans. Power Del., vol. 14, no. 3,pp. 1052–1061, Jul. 1999.

[6] J. Vlach and K. Singhal, Computer Methods for Circuit Analysis andDesign. New York: Van Nostrand, 1983.

[7] G. Dahlquist, “A special stability problem for linear multistep methods,”BIT Numer. Math., vol. 3, no. 1, pp. 27–43, 1963.

[8] U. A. Ascher and L. R. Petzold, Computer Methods for Ordinary Dif-ferential Equations and Differential-Algebraic Equations. Philadelphia,PA: Soc. Ind. Appl. Math., 1998.

[9] A. Iserles and S. P. Nørsett, Order Stars. London, U.K.: Chapman &Hall, 1991.

[10] E. Gad, M. Nakhla, R. Achar, and Y. Zhou, “A-stable and L-stablehigh-order integration methods for solving stiff differential equations,”IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 28, no. 9,pp. 1359–1372, Sep. 2009.

[11] N. Obreshkov, “Sur les quadrature mecanique,” Akad. Nauk., vol. 65,no. 3, pp. 191–289, 1942.

[12] Y. Zhou, E. Gad, M. S. Nakhla, and R. Achar, “Structural charac-terization and efficient implementation techniques for A-stable high-order integration methods,” IEEE Trans. Comput.-Aided Design Integr.Circuits Syst., vol. 31, no. 1, pp. 101–108, Jan. 2012.

[13] J. C. G. Pimentel, E. Gad, and S. Roy, “High-order A-stable andL-stable state-space discrete modeling of continuous systems,” IEEETrans. Circuits Syst. I, Reg. Papers, vol. 59, no. 2, pp. 346–359,Feb. 2012.

[14] Y. Zhao, J. Feng, and C. Tse, “Discrete-time modeling and stabil-ity analysis of periodic orbits with sliding for switched linear sys-tems,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 11,pp. 2948–2955, Nov. 2010.

[15] F. del Aguila-Lopez, P. Pala-Schonwalder, P. Molina-Gaudo, andA. Mediano-Heredia, “A discrete-time technique for the steady-stateanalysis of nonlinear class-e amplifiers,” IEEE Trans. Circuits Syst. I,Reg. Papers, vol. 54, no. 6, pp. 1358–1366, Jun. 2007.

[16] W. Melvin and D. Frey, “Continuous-time to discrete-time conversionvia a novel parametrized s-to-z-plane mapping,” IEEE Trans. CircuitsSyst. II, Analog Digit. Signal Process., vol. 44, no. 10, pp. 829–834,Oct. 1997.

[17] C. Gear, “Simultaneous numerical solution of differential-algebraicequations,” IEEE Trans. Circuit Theory, vol. 18, no. 1, pp. 89–95,Jan. 1971.

[18] T. A. Davis and E. Palamadai, User Guide for KLU and BTF., Dept.Comput. Inf. Sci. Eng., Univ. Florida, Gainesville, 2009.

[19] N. Kapre and A. DeHon, “SPICE2 spatial processors interconnected forconcurrent execution for accelerating SPICE circuit simulator using anFPGA,” IEEE Trans. Comput.-Aided Design Integr. Circuit Syst., vol. 31,no. 1, pp. 9–22, Jan. 2012.

[20] G. Wanner, E. Hairer, and S. P. Nøsett, “Order stars and stabilitytheorems,” BIT Numer. Math., vol. 18, no. 4, pp. 475–489, 1978.

[21] HSPICE, Star-Hspice Manual, Meta-Software Inc., Mountain View, CA,2010.

[22] C. Paul, Analysis of Multiconductor Transmission Lines. New York:Wiley, 1994.

Mina A. Farhan (S’09) received the B.Eng. degree in electrical engineering from Minia University, Minya, Egypt, in 2006, and the M.A.Sc. degree from Carleton University, Ottawa, ON, Canada, in 2010, where he is currently pursuing the Ph.D. degree in electrical engineering with the Department of Electronics.

His current research interests include computer-aided design of very large scale integration circuits, simulation and modeling of high-speed interconnects, model order reduction techniques, parallel processing, and numerical techniques.

Emad Gad (S’99–M’04) received the B.Sc. degree from Alexandria University, Alexandria, Egypt, and the M.Sc. degree from Cairo University, Giza, Egypt, in 1991 and 1997, respectively, both in electrical engineering, and the Ph.D. degree from Carleton University, Ottawa, ON, Canada, in 2003.

He is currently an Associate Professor with the School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa. His current research interests include numerical simulation and modeling approaches of high-speed and radio frequency circuits.

Dr. Gad was a co-recipient of the 2002 IEEE Microwave Prize for a significant contribution to the field of endeavor of the IEEE MTT Society, the Governor General Gold medal, the Carleton University Medal for outstanding academic performance at the graduate level, and the Ottawa Center for Research and Innovation (OCRI 2003) Award as a student researcher of the year. He is a practicing Professional Engineer of Ontario.



Michel S. Nakhla (S’73–M’75–SM’88–F’98–LF’12) received the Ph.D. degree in electrical engineering from the University of Waterloo, Waterloo, ON, Canada, in 1975.

He is currently a Chancellor’s Professor of electrical engineering with Carleton University, Ottawa, ON, Canada. From 1976 to 1988, he was with Bell-Northern Research, Ottawa, as the Senior Manager of the Computer-Aided Engineering Group. In 1988, he joined Carleton University as a Professor and the Computer-Aided Engineering Senior Industrial Chair established by Bell-Northern Research and the Natural Sciences and Engineering Research Council of Canada. He is the Founder of the High-Speed CAD Research Group at Carleton University. He has authored or co-authored more than 300 peer-reviewed papers in journals and conferences, two books, six multimedia books on signal integrity, and seven chapters in different books. His current research interests include modeling and simulation of high-speed circuits and interconnects, nonlinear circuits, radio frequency and microwave circuits, parallel processing, multidisciplinary optimization, and neural networks.

Dr. Nakhla is on various international committees, including the Standing Committee of the IEEE International Signal Propagation on Interconnects Workshop, the Technical Program Committee of the IEEE International Microwave Symposium, the Technical Program Committee of the IEEE Conference on Electrical Performance of Electronic Packaging and Systems, and the CAD Committee (MTT-1) of the IEEE Microwave Theory and Techniques Society. He is an Associate Editor of the IEEE TRANSACTIONS ON ADVANCED PACKAGING and was an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS. He was a member of many Canadian and international government-sponsored research grants selection panels. He was a technical consultant for several industrial organizations and is the principal investigator for several major sponsored research projects.

Ramachandra Achar (S’95–M’00–SM’04–F’13) received the B.Eng. degree in electronics engineering from Bangalore University, Bangalore, India, the M.Eng. degree in microelectronics from the Birla Institute of Technology and Science, Pilani, India, and the Ph.D. degree from Carleton University, Ottawa, ON, Canada, in 1990, 1992, and 1998, respectively.

He is currently a Professor with the Department of Electronics Engineering, Carleton University, where he has been since 2000. He has held positions in leading research labs, including T. J. Watson Research Center, IBM, New York, in 1995, Larsen and Toubro Engineers Ltd., Mysore, India, in 1992, Central Electronics Engineering Research Institute, Pilani, India, in 1992, and the Indian Institute of Science, Bangalore, India, in 1990. He has authored or co-authored over 180 peer-reviewed articles in international journals and conferences, six multimedia books on signal integrity, and five chapters in different books. His current research interests include signal and power integrity analysis, circuit simulation, parallel and numerical algorithms, EMC/EMI analysis, microwave and RF algorithms, modeling and simulation methodologies for sustainable and renewable energy, and mixed-domain analysis.

Dr. Achar was a recipient of several prestigious awards, including the Carleton University Research Achievement Award in 2010 and 2004, the Natural Sciences and Engineering Research Council Doctoral Medal in 2000, the University Medal for the Outstanding Doctoral Work in 1998, the Strategic Microelectronics Corporation Award in 1997, and the Canadian Microelectronics Corporation Award in 1996. He was also a co-recipient of the IEEE Advanced Packaging Best Transactions Paper Award in 2007. His students have won numerous Best Student Paper Awards in international forums. Dr. Achar is currently a Distinguished Lecturer of the IEEE Circuits and Systems Society and a Guest Editor of the IEEE TRANSACTIONS ON COMPONENTS, PACKAGING AND MANUFACTURING TECHNOLOGY for two special issues, “Variability Analysis” and “3DICs/Interconnects.” He was the General Co-Chair of the IEEE International Conference on Electrical Performance of Electronic Packages & Systems (EPEPS-2010, 2011) and served as an International Guest Faculty on the invitation of the Department of Information Technology of Government of India, under the SMDP-II Program. He is currently the General Chair for HPCPS-2012 and on the executive/steering/technical-program committees of several leading IEEE international conferences, such as EPEPS, EDAPS, ECTC, SPI, ASP-DAC, etc., and on the technical committees of the EDMS (TC-12 of CPMT) and CAD (MTT-1). He is a Founding Faculty Member of the Canada-India Center of Excellence, a member of the Canadian Standards Committee on Nanotechnology, the Chair of the joint chapters of CAS/EDS/SSC societies of the IEEE Ottawa Section, and is a consultant for several leading industries focused on high-frequency circuits, systems and tools. He is a practicing Professional Engineer of Ontario.
