
Efficient Model Order Reduction of Electrical Networks with Many Ports

by

Denis Oyaro

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science

Graduate Department of The Edward S. Rogers Sr. Department of Electrical and Computer Engineering
University of Toronto

© Copyright 2015 by Denis Oyaro

Abstract

Efficient Model Order Reduction of Electrical Networks with Many Ports

Denis Oyaro

Master of Applied Science

Graduate Department of The Edward S. Rogers Sr. Department of Electrical and Computer

Engineering

University of Toronto

2015

Model order reduction of electrical networks plays a key role in accelerating circuit simulations.

Traditional reduction methods, however, do not scale well with network size and number of

ports, since they utilize dense matrix manipulations that require a lot of memory and CPU

time. In addition, the obtained reduced models are dense and thus inefficient in subsequent

simulations. In this thesis, we propose TurboMOR, an efficient reduction technique for RC

networks with many ports. TurboMOR performs reduction efficiently by avoiding dense matrix manipulations, and overcomes the scalability and accuracy limitations of previously proposed methods. It guarantees passivity by construction and leads to sparse models that run efficiently in conventional circuit simulators.


Dedication

To the most High, my parents, and sisters.


Acknowledgements

I would like to express my sincere appreciation to Prof. Piero Triverio, my research and

thesis supervisor, for his invaluable support, encouragement, and mentorship. I have learnt a

lot under his supervision, and I am honored to have been part of his research group these past

two years.

I would also like to thank Prof. Hum, Prof. Sarris and Prof. Adve for being on my thesis

committee. Their insightful feedback and suggestions have helped improve this thesis.

Finally, I would like to acknowledge the Graduate Student Assistantship, ECE fellowship,

and the Rogers Scholarship, which have all supported me financially throughout my Master’s

degree program.


Contents

1 Introduction
   1.1 Compact Modeling of On-chip Interconnects
   1.2 Model Order Reduction
      1.2.1 Problem Formulation
      1.2.2 Moment Matching
      1.2.3 Classes of Model Order Reduction Techniques
      1.2.4 PRIMA
      1.2.5 Recent Techniques For Electrical Networks with Many Ports
   1.3 Thesis Goal
   1.4 Thesis Outline

2 TurboMOR: Reduction of RC Networks with Many Ports
   2.1 Theoretical Derivation
      2.1.1 Matching Two Moments
      2.1.2 Matching Four Moments
      2.1.3 Matching More Than Four Moments
   2.2 Practical Implementation
      2.2.1 First Iteration
      2.2.2 From the Second to the Second Last Iteration
      2.2.3 Last Iteration
   2.3 On the Singularity of Matrix G
   2.4 Proof of Moment Matching
   2.5 Partitioning for Very Large Networks
   2.6 Numerical Results
      2.6.1 Reduction Time
      2.6.2 Scalability
      2.6.3 Efficiency of the Reduced Models
      2.6.4 Accuracy
   2.7 Concluding Remarks

3 On Reduction in Presence of Inductors
   3.1 Theory
      3.1.1 Reducing RLC Networks with SIP
      3.1.2 The Proposed LDL Method
   3.2 Numerical Results
      3.2.1 Non-singular Case
      3.2.2 Singular Case
   3.3 Concluding Remarks

4 Conclusions
   4.1 Summary
   4.2 Contributions
   4.3 Future Work

Bibliography


List of Tables

1.1 Number of nodes, resistors, capacitors and inductors of power grid benchmarks from real industrial designs.

1.2 Number of nodes, resistors and capacitors of power grid benchmarks from real 3D integrated circuits.

2.1 Reduction time for the different methods on various test networks (time in seconds). The acronym P stands for partitioning.

2.2 Reduction time of TurboMOR with partitioning and SparseRC, for very large power grid benchmarks. All times in seconds.

2.3 Simulation time of the ROMs obtained with the different methods (time in seconds). The acronyms S.T and P stand for simulation time and partitioning, respectively, and speedup denotes the speedup factor with respect to the original simulation.

2.4 Simulation time of reduced models of very large power grids (in seconds). An asterisk stands for numerical problems.


List of Figures

1.1 A magnified view of interconnect metallization layers in an integrated circuit.

1.2 A simple model of a power distribution network.

1.3 A graphical illustration of the concept of model order reduction of interconnect networks.

2.1 A decomposition of system (2.7a), (2.7b) into two dynamically coupled subsystems, Σ_1^(1) and Σ_2^(1). At DC, the two subsystems are completely decoupled.

2.2 A decomposition of system (2.7a), (2.7b) into three dynamically coupled subsystems. At DC, all three subsystems are decoupled.

2.3 Reduction time of PRIMA and TurboMOR without partitioning vs number of ports. In both methods, reduction is done to match six moments (q = 3).

2.4 Reduction time of SparseRC and TurboMOR with partitioning vs number of ports. In both methods, reduction is done to match six moments (q = 3).

2.5 Reduction time of PRIMA and TurboMOR without partitioning vs the node-to-port ratio. Both methods match six moments.

2.6 Reduction time of SparseRC and TurboMOR with partitioning vs the node-to-port ratio. Both methods match six moments.

2.7 Transient response of original network and reduced models from TurboMOR and PRIMA. Both reduced models match two moments at DC.

2.8 Maximum absolute transient error computed across all ports of the original network, due to reduced models from TurboMOR and PRIMA. Both reduced models match two moments at DC.

2.9 Transient response of original network and reduced models from TurboMOR and PRIMA. Both reduced models match four moments at DC.

2.10 Maximum absolute transient error computed across all ports of the original network, due to reduced models from TurboMOR and PRIMA. Both reduced models match four moments at DC.

3.1 A decomposition of system (3.9a)-(3.9b) into a cascade of two subsystems, Σ_1 and Σ_2. At DC, subsystem Σ_2 has no influence on the transfer function of (3.9a)-(3.9b).

3.2 Sample RLC circuit that demonstrates that the G_4 block in (3.2a) can be singular.

3.3 Reduction time of the proposed LDL method, SIP and PRIMA vs number of ports. All methods match one moment at DC.

3.4 Reduction time of the proposed LDL method, SIP and PRIMA vs the node-to-port ratio. All methods match one moment at DC.

3.5 Magnitude of the near-end coupling coefficient between two adjacent lines. The response is generated using the original network, and the reduced models obtained from the proposed LDL method and PRIMA. Both reduced models match one moment at DC.

3.6 The same response as in Fig. 3.5. This figure also includes the result from SIP. Both reduced models match one moment at DC.

3.7 The magnitude of the transmission coefficient between two ports of a network with 2 lines. The response is generated using the original network, and the reduced model from the proposed LDL method. The reduced model matches one moment at DC.

3.8 The magnitude of the transmission coefficient between two ports of a network with 3 lines. The response is generated using the original network, and the reduced model from the proposed LDL method. The reduced model matches one moment at DC.


Chapter 1

Introduction

1.1 Compact Modeling of On-chip Interconnects

Integrated circuits can be found in almost all electronic devices and systems in use today. They

are a key part of consumer products like computers, cellphones, global positioning systems,

digital cameras and many others. Examples of such chips include memory chips and micro-

processors. An integrated circuit consists of non-linear elements, like transistors and diodes, connected by a dense network of tiny wires, typically known as the interconnect network or simply the interconnect. On-chip interconnects distribute signals and power to all components of the

chip. Figure 1.1 shows a small portion of a typical interconnect in an integrated circuit, taken

from [1]. An example of an interconnect network is the power distribution network or power

grid, which distributes power to the circuits on a chip [2]. Power grids typically consist of an

array of orthogonal wires that distribute electrical power from the voltage regulators to the logic

gates. An example of a power grid is depicted in Figure 1.2, taken from [3]. Typically, power

grids comprise several metal layers, with wires at different layers connected using vias [3].

Another example is the on-chip bus, which consists of lines that carry signals from one part of

a chip to another part. For instance, CPU-memory buses transfer signals between the central

processing unit (CPU) and memory.

The recent advancements in integrated circuit design have led to chips with smaller feature

sizes and dense integration. A chip today comprises billions of transistors. This trend is

expected to continue in the years to come, based on observations from the International Technology Roadmap for Semiconductors [4].

Figure 1.1: A magnified view of interconnect metallization layers in an integrated circuit.

Figure 1.2: A simple model of a power distribution network.

The density and size of components on a chip have led

to interconnect networks of great complexity and density [5]. The advent of 3D integration [6,7]

is also significantly increasing the complexity of signal and power networks on a chip, since several dies are stacked on top of each other. The AMD Fiji is a prominent example of a 3D integrated circuit, being the first graphics processing unit to feature this technology. The

complexity of interconnect networks in 3D integrated circuits is unprecedented, due to the

addition of the vertical dimension. Advances in interconnect design, together with the increase

of transistor switching speeds, have greatly affected signal and power integrity on chips [8–10].

The electromagnetic effects of the interconnects, due to the resistance and capacitance of the

wires and vias, have influenced signal propagation and power distribution. Effects such as

reflections, crosstalk, and attenuation cause delays and distortion of signals on interconnect

lines. At higher frequencies of operation, the inductances of the wires can cause ringing [8],


Benchmarks   #Nodes      #Resistors   #Capacitors   #Inductors
ibmpg2t        164,238      245,163       36,838        330
ibmpg3t      1,041,535    1,602,626      201,054        955
ibmpg4t      1,212,365    1,826,589      265,944        962
ibmpg6t      2,367,183    2,410,486      761,484        381

Table 1.1: Number of nodes, resistors, capacitors and inductors of power grid benchmarks from real industrial designs.

and further compromise signal integrity. For power grid networks, the DC and time-varying

voltage drops along the wire lines, as circuit blocks on the chip draw current, lead to voltage

fluctuations or power supply noise at the terminals of the circuits. Such fluctuations affect the

switching speeds and reliability of the chips [2,11]. If the voltages at the terminals drop too low,

the switching of the transistors could slow down, or they could even fail to switch. A high rise

in voltage, on the other hand, could cause circuit malfunction as a result of an excessive electric

field. Correctly designing interconnects has therefore become critical to satisfying signal and

power integrity requirements. The International Technology Roadmap for Semiconductors [4]

lists the design of interconnects as one of the main factors affecting overall chip performance.

The computer-aided design of integrated circuits involves a post-layout verification stage in

which chip performance is analyzed while accounting for the electromagnetic effects due to the

interconnect parasitics. During this stage, the parasitic resistance, capacitance and inductance

of the interconnects are extracted using electromagnetic solvers to obtain RC or RLC equivalent

network models [12–15]. Some of the extraction tools used by industry include Ansys Q3D

Extractor and Mentor Graphics Calibre xRC. The choice of an RC or RLC model depends on

the frequency of operation and length of the interconnect lines [8]. At higher frequency and

for long interconnect lines, inductive effects become prominent. For on-chip interconnects, RC

models are the most popular choice, followed by RLC models for critical, high-performance

networks. Inclusion of inductance is instead typically mandatory at the package and printed-

circuit-board level. After extraction, the model for the parasitic network is connected to the

other chip components, and a system-level simulation is performed to verify the design. However,

parasitic networks can be very large and complex. They can easily consist of several million

passive elements and nodes. Table 1.1 shows the number of nodes and components that can


Benchmarks   #Nodes      #Resistors    #Capacitors
3D-µPD       1,021,203    1,178,097      967,057
3D-SAP       4,589,076    7,639,042    4,147,603
3D-PAC       8,988,071   13,238,494    6,068,910

Table 1.2: Number of nodes, resistors and capacitors of power grid benchmarks from real 3D integrated circuits.

Figure 1.3: A graphical illustration of the concept of model order reduction of interconnect networks.

be found in a typical model for the power grid alone. These numbers are taken from a popular

set of benchmark models from IBM [3,16]. The structure of these power grids is similar to the

example in Figure 1.2. Table 1.2 gives the number of nodes and components of some power

grid models from 3D integrated circuit designs [17]. By comparing the numbers in Tables 1.1

and 1.2, we can notice the substantial increase in complexity from 2D to 3D integration!

Direct simulations involving large parasitic networks can require a lot of memory and CPU

time. Even with the most scalable circuit simulators available on the market, such simulations

can easily take several days or can even be infeasible [18–20]. As a consequence, there has

been a growing demand by design engineers for reduced-order models for parasitic networks.

Typical parasitic networks can be reduced substantially since their behavior is mostly influenced

by a handful of dominant poles. Many poles present in the original network indeed play a


significant role only at very high frequency, well beyond the bandwidth actually excited during

the operation of the integrated circuit. It is therefore possible to accurately approximate the

network response with only a small set of poles, while neglecting all the others. One of the

most popular approaches to generate such compact models of interconnect networks is model

order reduction [21–24].

The electrical behavior of an interconnect network can be modeled mathematically using

linear differential equations, with the unknowns referred to as state variables of the network.

The number of state variables is the order of the network, and is very high due to the network

size. Model order reduction seeks a mathematical model with a much smaller number of state

variables that still accurately approximates the electrical behavior of the network while also

preserving its physical properties like passivity. Figure 1.3 illustrates the concept of model order

reduction of interconnect networks. Applying model order reduction in interconnect simulation

can be summarized as follows:

1. Extract an equivalent electrical network (RC or RLC) of the on-chip interconnect.

2. Generate an accurate reduced order approximation of the electrical network.

3. Convert the reduced order approximation to an equivalent circuit. This can be done using

the procedure in [25].

4. Connect the equivalent circuit to the rest of the components on the chip and simulate for

power or signal integrity verification.

In the next section, we review the most popular model order reduction methods available

today, with special emphasis on those that are typically applied to interconnect networks. Then,

we discuss the limitations of existing methods in handling the interconnect models that arise in

industrial designs. These limitations are becoming a significant issue for many microelectronic

companies, and are the main motivation behind this work.


1.2 Model Order Reduction

1.2.1 Problem Formulation

Parasitic networks consist of resistors, capacitors and, if needed, inductors. Ports are defined

where the interconnect network is connected to a non-linear device, such as a transistor. Con-

sider a parasitic network of order m and with p ports. Using modified nodal analysis [26],

such a network can be represented in the Laplace domain under the impedance or admittance

representation as

Gx(s) + sCx(s) = Bu(s)
y(s) = L^T x(s)                      (1.1)

where matrices G, C ∈ R^{m×m} are the conductance and capacitance matrices, respectively. Matrix G is in general indefinite¹ while matrix C is symmetric and non-negative definite². Vectors u(s), y(s) ∈ R^p represent the inputs and outputs of the network, respectively. In the impedance representation, the port currents and voltages are the inputs and outputs, respectively. The reverse holds for the admittance representation. Matrix B ∈ R^{m×p} maps the ports to the nodal equations, and vector x(s) ∈ R^m collects all nodal voltages, voltage source currents and inductor currents. Matrix L ∈ R^{m×p} selects the outputs from x(s). The variable s denotes the Laplace variable, and superscript T stands for matrix transpose. The inputs and outputs of (1.1) are

related by

y(s) = H(s)u(s) (1.2)

where

H(s) = L^T (G + sC)^{-1} B                      (1.3)

is the transfer function matrix of the network.
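As a concrete illustration of this formulation (not taken from the thesis), the MNA matrices of a small RC ladder can be assembled and the transfer function (1.3) evaluated directly; the topology and element values below are arbitrary:

```python
import numpy as np

# Hypothetical example: a 3-node RC ladder in impedance representation.
# One port: a current source drives node 0; the node-0 voltage is observed.
R, Cval = 1e3, 1e-12           # 1 kOhm series segments, 1 pF to ground per node
m, g = 3, 1.0 / R

# Conductance matrix G: nodal stamps of the three series resistors
# (node0-node1, node1-node2, node2-ground)
G = g * np.array([[ 1.0, -1.0,  0.0],
                  [-1.0,  2.0, -1.0],
                  [ 0.0, -1.0,  2.0]])
# Capacitance matrix C: one shunt capacitor to ground at each node
C = Cval * np.eye(m)

B = np.array([[1.0], [0.0], [0.0]])  # current injected at node 0
L = B.copy()                          # L = B for this passive impedance model

def H(s):
    """Transfer function H(s) = L^T (G + sC)^{-1} B of (1.3)."""
    return L.T @ np.linalg.solve(G + s * C, B)

print(H(0))                  # DC input resistance: 3 series resistors = 3000 ohms
print(H(2j * np.pi * 1e9))   # complex response at 1 GHz
```

Note that solving the linear system (G + sC)x = B is preferred over forming the inverse explicitly, which mirrors how such transfer functions are evaluated in practice.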

Model order reduction techniques aim at approximating network (1.1) with a model

Ĝx̂(s) + sĈx̂(s) = B̂u(s)
y(s) = L̂^T x̂(s)                      (1.4)

¹A matrix is indefinite when it is neither positive nor negative definite.
²A matrix A is non-negative definite (equivalently: positive semi-definite) if it is symmetric and x^T Ax ≥ 0 for any vector x.

where Ĝ, Ĉ ∈ R^{n×n}, B̂, L̂ ∈ R^{n×p} and x̂(s) ∈ R^n. The order of this new model is n, and typically n ≪ m. The input-output relation of (1.4) is

y(s) = Ĥ(s)u(s)                      (1.5)

where

Ĥ(s) = L̂^T (Ĝ + sĈ)^{-1} B̂                      (1.6)

is the transfer function matrix of the reduced model.

The ideal model order reduction technique for electrical networks must satisfy the following

requirements:

1. The reduction technique must accurately approximate the input-output characteristics of

the original network (1.1). The error in approximation

e(s) = ‖H(s) − Ĥ(s)‖                      (1.7)

must be small enough in a suitable norm, over the frequency range of interest. Typically

the ∞-norm, or the Euclidean norm is used. Such an accuracy requirement ensures that

the reduced model correctly predicts the response of the original network.

2. The reduction technique must generate passive reduced models. Since interconnects are

passive systems, and therefore can not produce energy on their own, it is reasonable to

generate reduced models that are passive. Moreover, non-passive models could lead to

unstable transient simulations [27]. For networks in impedance or admittance representa-

tion, a necessary and sufficient condition for passivity is that the transfer function matrix,

H(s), must be positive real [28]. Matrix H(s) is positive real if and only if it satisfies the

following conditions [24]:

(a) H(s) is regular (analytic), for Re(s) > 0.

(b) H(s̄) = H̄(s), for Re(s) > 0.

(c) H(s) + H(s)^H ≥ 0, for Re(s) > 0.


Here, the overbar and the superscript H denote complex conjugate and Hermitian transpose, respectively,

while symbol ≥ denotes non-negative definiteness. For networks in the modified nodal

analysis form (1.1), sufficient but not necessary conditions for positive realness are [18]

G + G^T ≥ 0                      (1.8)

C = C^T ≥ 0                      (1.9)

L = B                      (1.10)

The interconnect networks are passive, and therefore satisfy conditions (1.8), (1.9) and (1.10)

in the modified nodal analysis form. One way to make the reduced model passive is to

ensure it satisfies

Ĝ + Ĝ^T ≥ 0                      (1.11)

Ĉ = Ĉ^T ≥ 0                      (1.12)

L̂ = B̂                      (1.13)

3. The reduction technique must be computationally efficient. Because computational resources are limited, we seek reduction techniques that use the available memory effectively and whose algorithms do not require a prohibitive amount of CPU time. Such methods, in addition, must scale well with the network size

and number of ports. If the model order reduction technique is too slow, it may not be

worth reducing the network.

4. Lastly, the reduced models obtained must be faster to simulate than the original networks.

This is actually the goal of employing model order reduction in VLSI computer-aided

design. The original network matrices in (1.1), though large in size, are generally sparse.

We would like methods that generate reduced model matrices that are also sparse. This

is because the use of reduced but dense matrices in simulations could turn out to be

computationally costly, even more than with the original matrices, due to expensive matrix

inversions and factorizations. This would defeat the purpose of model order reduction.
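The passivity conditions (1.8)-(1.10) in requirement 2 are straightforward to test numerically with symmetric eigenvalue computations. A minimal sketch (not part of the thesis; the matrices below are arbitrary examples):

```python
import numpy as np

def is_passive_mna(G, C, B, L, tol=1e-10):
    """Check the sufficient passivity conditions (1.8)-(1.10):
    G + G^T >= 0, C = C^T >= 0, and L = B."""
    cond1 = np.all(np.linalg.eigvalsh(G + G.T) >= -tol)   # G + G^T >= 0
    cond2 = (np.allclose(C, C.T) and                      # C symmetric ...
             np.all(np.linalg.eigvalsh(C) >= -tol))       # ... and C >= 0
    cond3 = np.allclose(L, B)                             # L = B
    return bool(cond1 and cond2 and cond3)

# Arbitrary 2-node RC example (hypothetical values)
G = np.array([[ 2.0, -1.0],
              [-1.0,  2.0]])
C = np.diag([1e-12, 2e-12])
B = np.array([[1.0], [0.0]])
print(is_passive_mna(G, C, B, B))  # True for this passive RC network
```

The same check applies unchanged to the reduced matrices of (1.11)-(1.13), which is one reason these algebraic conditions are convenient in practice.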


1.2.2 Moment Matching

Before reviewing the different classes of model order reduction techniques in the literature, we

first discuss the concept of moment matching, central to most techniques. Moment matching

is one of the most popular ways of preserving accuracy in the reduced model and dates back

to the earliest technique, the asymptotic waveform evaluation (AWE) technique [29, 30]. In

order to explain it, let us consider the Taylor series expansion of the original network’s transfer

function H(s) at a fixed frequency point, say s = 0.

H(s) = ∑_{k=0}^{+∞} M_k s^k                      (1.14)

The coefficients M_k of the Taylor series expansion are referred to as the moments of the network. These coefficients have a physical meaning: for the expansion point s = 0, the 0-th moment is the response of the network at DC, and the low-order moments represent the behavior of the network at low frequencies. The moments M_k can be calculated as

M_k = (1/k!) d^k H(s)/ds^k |_{s=0}   ∀ k = 0, 1, 2, ...                      (1.15)

Through substitution of H(s) using (1.3) and performing matrix derivatives, moments in (1.15)

can be expressed in terms of the network matrices [31]

M_k = L^T (−G^{-1}C)^k G^{-1} B   ∀ k = 0, 1, 2, ...                      (1.16)
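In practice, the moments in (1.16) are evaluated with a recurrence rather than explicit matrix powers: compute r_0 = G^{-1}B once, then repeatedly solve r_{k+1} = -G^{-1}(C r_k). A minimal dense sketch (illustrative only; a real implementation would factorize a sparse G once and reuse the factors):

```python
import numpy as np

def moments(G, C, B, L, q):
    """First q moments of (1.16): M_k = L^T (-G^{-1} C)^k G^{-1} B,
    via the recurrence r_0 = G^{-1} B, r_{k+1} = -G^{-1} (C r_k)."""
    r = np.linalg.solve(G, B)             # r_0 = G^{-1} B
    M = []
    for _ in range(q):
        M.append(L.T @ r)                 # M_k = L^T r_k
        r = -np.linalg.solve(G, C @ r)    # r_{k+1} = -G^{-1} C r_k
    return M

# 1x1 sanity check: H(s) = 1/(2 + s) has moments 1/2, -1/4, 1/8, ...
G1 = np.array([[2.0]]); C1 = np.array([[1.0]]); B1 = np.array([[1.0]])
print([m[0, 0] for m in moments(G1, C1, B1, B1, 3)])  # [0.5, -0.25, 0.125]
```

The vectors r_k generated by this recurrence are exactly the Krylov vectors that implicit moment matching methods, discussed below, orthonormalize instead of using directly.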

Similarly, for the transfer function Ĥ(s) of the reduced model, the Taylor series expansion at s = 0 is

Ĥ(s) = ∑_{k=0}^{+∞} M̂_k s^k                      (1.17)

where the coefficients M̂_k are the moments of the reduced model. They describe the frequency response of the model; the low-order moments capture the response of the reduced model at low frequencies. We can calculate M̂_k as

M̂_k = (1/k!) d^k Ĥ(s)/ds^k |_{s=0}   ∀ k = 0, 1, 2, ...                      (1.18)

Using (1.6) and performing matrix derivatives, the moments (1.18) can be expressed in terms

of the reduced model matrices

M̂_k = L̂^T (−Ĝ^{-1}Ĉ)^k Ĝ^{-1} B̂   ∀ k = 0, 1, 2, ...                      (1.19)

As mentioned earlier, a model order reduction technique aims at capturing the input-output behavior of the original network by minimizing the norm of the error between H(s) and Ĥ(s), defined as e(s) in (1.7), over a frequency range of interest. One way of doing this is by moment matching

M̂_k = M_k   ∀ k = 0, 1, 2, ..., q − 1                      (1.20)

up to a desired order q. The relation (1.20) implies that the reduced model is a low-pass

approximation of the original network. Typically, matching more moments of the original

network improves the accuracy of the reduced model over a wide frequency band, but increases

its size.

1.2.3 Classes of Model Order Reduction Techniques

Explicit Moment Matching methods

Several classes of model order reduction methods exist in the literature [23]. The earliest class is

known as explicit moment matching methods, and the most popular of these is the Asymptotic

Waveform Evaluation (AWE) technique [29, 30, 32]. This class of methods explicitly computes the moments of the original network. Using these moments, it then computes the transfer function \hat{H}(s) of the reduced model as a Padé approximation [33] of the transfer function H(s) of the original network. For simplicity, let us consider the case of a one-port network with transfer function h(s) and moments m_k. After explicitly computing the moments of the original network, the transfer function \hat{h}(s) of the reduced model is computed as

\hat{h}(s) = \frac{\alpha_0 + \alpha_1 s + \alpha_2 s^2 + \cdots + \alpha_{n-1} s^{n-1}}{1 + \beta_1 s + \beta_2 s^2 + \cdots + \beta_n s^n} \approx m_0 + m_1 s + m_2 s^2 + \cdots + m_q s^q \quad (1.21)

with q = 2n − 1. The process of explicitly computing the moments of the original network and then solving for the numerator and denominator coefficients in (1.21) is numerically unstable, especially when dealing with high order moments [24]. Ill-conditioning can arise at moments as low as the 4th or 5th order. As a result, explicit moment matching methods tend to lose accuracy in applications where high order moment approximations are required.
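The source of this instability is visible in the linear system AWE solves for the denominator coefficients in (1.21), whose matrix is a Hankel matrix built from the moments. Because the moments of a smooth transfer function decay geometrically, this matrix loses rank numerically almost immediately. A small synthetic illustration (the moment sequence below is invented for the example, not taken from any circuit in this thesis):

```python
import numpy as np

# Synthetic moments m_k = sum_i lambda_i^k of a smooth 8-pole transfer function
lam = np.array([1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3])
m = np.array([np.sum(lam ** k) for k in range(16)])

# AWE solves a Hankel system of moments for the denominator coefficients;
# its condition number explodes as the number of matched moments grows
conds = [np.linalg.cond(np.array([[m[i + j] for j in range(n)]
                                  for i in range(n)]))
         for n in range(2, 6)]
```

Already at the 5×5 Hankel matrix (high order moments), the condition number is orders of magnitude larger than at 2×2, which is the numerical mechanism behind the accuracy loss described above.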

Implicit Moment Matching Methods

This class still achieves moment matching, but without explicitly computing the moments of the original network [23, 31, 34, 35]. An implicit moment matching method performs reduction as follows. The first step is to apply a change of variable x = Q\hat{x} in the original network (1.1), where Q ∈ R^{m×n} and \hat{x} ∈ R^n. This is followed by premultiplying (1.1) with matrix W^T, where W ∈ R^{m×n}. The reduced model obtained is

W^T G Q \hat{x}(s) + s W^T C Q \hat{x}(s) = W^T B u(s)
\hat{y}(s) = L^T Q \hat{x}(s) \quad (1.22)

and the reduced model matrices are

\hat{G} = W^T G Q, \quad \hat{C} = W^T C Q, \quad \hat{B} = W^T B, \quad \hat{L} = W^T L \quad (1.23)

The matrices W and Q are referred to as projection matrices, and are usually full rank to avoid generating singular reduced models [34]. In addition, they satisfy the relation W^T Q = I, where I ∈ R^{n×n} is the identity matrix. If W = Q, (1.22) is an orthogonal projection, commonly known as a Galerkin projection. Otherwise, it is a biorthogonal projection. Matrices W and Q

known as Galerkin projection. Otherwise, it is a biorthogonal projection. Matrices W and Q

are constructed using the concept of Krylov subspace [34]. A Krylov subspace is in general

constructed from two matrices, that we denote here as A and R. For implicit moment matching

methods, A and R are related to the network matrices in (1.1). A Krylov subspace of order q

Chapter 1. Introduction 12

is defined as

Kq(A,R) = span{R,AR,A2R, ...,Aq−1R} (1.24)

Matrices W and Q are constructed in such a way that their columns form basis vectors of

Krylov subspaces in the form (1.24). By constructing the projection matrices in this way, the

reduced model (1.22) will match moments of the original network up to a given order [34]. In

a biorthogonal projection, the reduced model will match the first 2q moments of the original

network [36]. In an orthogonal projection, it will match the first q moments [18]. The computation of the basis vectors is numerically stable, which makes implicit moment matching methods

more robust and accurate than explicit moment matching methods [24]. In a biorthogonal

projection, the basis vectors are computed using the Lanczos process [36]. In an orthogonal

projection, they are computed using the Arnoldi process [37]. Due to the construction of the

reduced matrices in a biorthogonal projection, passivity is not always guaranteed. Later in

section 1.2.4, we will discuss the method PRIMA [31], an implicit moment matching method,

that uses an orthogonal projection to generate passive reduced models. This method is based

on the Arnoldi process [37], and we will often refer to it in this thesis.

Truncated Balanced Realization Methods

The other major class of model order reduction techniques comprises the truncated balanced realization methods [22, 38–40]. This class makes use of control theory. Instead of using moment matching, accuracy is achieved by identifying and discarding the weakly controllable and observable states in the original system. These correspond to states that are difficult to excite, and that, when excited, do not contribute significantly to the output. Discarding such states in the reduced model does not significantly affect the input-output behavior of the original system. Truncated balanced methods provide an error metric, related to the number of discarded states, that can be used a priori to set the level of accuracy in the reduced models. This is not the case for moment matching methods, because it is hard to know in advance the number of moments required for sufficient accuracy. The process of identifying states to discard is, however, computationally expensive, especially for large systems. Because of this, implicit moment matching methods are generally preferred to truncated balanced realization methods.


The list of classes discussed here is not exhaustive. There are other classes, such as elimination methods [41, 42], that work directly on reducing certain parts of the original network. For such methods, knowledge of the network topology helps in reduction, but this is not always available. We focus in this thesis on implicit moment matching methods because of their efficiency and accuracy over truncated balanced realization methods and explicit moment matching methods, respectively. Next, we discuss one of the most popular implicit moment matching methods in use today, and highlight the limitations that motivate the development in this thesis.

1.2.4 PRIMA

The Passive Reduced-Order Interconnect Macromodeling Algorithm (PRIMA) [31] is one of the

most popular model order reduction methods in use today, owing to its accuracy and preservation of passivity. It is an implicit moment matching method. PRIMA uses an orthogonal projection to generate the reduced matrices (1.23), i.e., W = Q and Q^T Q = I. The projection matrix Q is constructed in such a way that its columns form basis vectors of the Krylov subspace (1.24), of order q, with A = −G^{-1}C and R = G^{-1}B. This is illustrated by the relation

\mathrm{colspan}(Q) = K_q(-G^{-1}C, \, G^{-1}B) \quad (1.25)

where Q ∈ R^{m×pq}. Replacing W with Q in (1.23), the reduced matrices generated by PRIMA are given by

\hat{G} = Q^T G Q \quad (1.26)
\hat{C} = Q^T C Q \quad (1.27)
\hat{B} = Q^T B \quad (1.28)
\hat{L} = Q^T L \quad (1.29)

and the order of the reduced model is pq. Due to the relation (1.25), the reduced model from PRIMA will match the first q moments of the original network [18]. In the particular case of RC networks, a model of order pq will match the first 2q moments [43]. To compute the columns of Q, PRIMA employs a numerically stable algorithm, the block-Arnoldi process [35]. This process generates the columns in such a way that each column is orthogonal to all other columns. Numerically, this means that the dot product between any two different columns is approximately zero. This ensures that Q is full rank, and does not lead to singular reduced models. Due to the orthogonalization procedure used in the block-Arnoldi method [35], matrix Q is dense.
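The overall flow can be sketched in a few lines of dense NumPy. This is a simplified illustration of the projection (1.25)-(1.29), not PRIMA's actual implementation: a single QR pass stands in for the block-Arnoldi orthogonalization, and the function name is ours.

```python
import numpy as np

def prima_like_reduce(G, C, B, q):
    """Build an orthonormal basis Q for K_q(-G^{-1}C, G^{-1}B), as in (1.25),
    then form the congruence (1.26)-(1.29). A plain QR factorization replaces
    the block-Arnoldi process of PRIMA in this sketch."""
    R = np.linalg.solve(G, B)               # G^{-1} B
    blocks = [R]
    for _ in range(q - 1):
        R = -np.linalg.solve(G, C @ R)      # next Krylov block
        blocks.append(R)
    Q, _ = np.linalg.qr(np.hstack(blocks))  # dense orthonormal basis, Q^T Q = I
    return Q.T @ G @ Q, Q.T @ C @ Q, Q.T @ B, Q
```

Note that Q and the reduced matrices come out dense even when G and C are sparse, which is precisely the scalability issue discussed next.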

Passivity

A transformation of the form (1.26), with Q full rank, is called a congruence [44]. A congruence transformation has the special property that it preserves the sign of the original matrix in the transformed matrix. For instance, if the original matrix is non-negative definite, the transformed matrix will also be non-negative definite. It is because of this special property that PRIMA is able to preserve passivity in the reduced model. As a simple proof of the passivity-preserving property of PRIMA, let us consider the passivity conditions (1.8)-(1.10), which as we know are satisfied by the original network. The goal is to show that the reduced model matrices (1.26)-(1.29) also satisfy the same conditions. If we substitute (1.26) into (1.8), we have

\hat{G} + \hat{G}^T = Q^T G Q + Q^T G^T Q \quad (1.30)
= Q^T (G + G^T) Q \quad (1.31)

Equation (1.31) is a congruence transformation of matrix (G + G^T). Since we know that matrix (G + G^T) ≥ 0 and that a congruence preserves matrix sign, then matrix (\hat{G} + \hat{G}^T) ≥ 0. Condition (1.8) is therefore satisfied. For condition (1.9), the reduced matrix \hat{C} in (1.27) is a congruence transformation of the original matrix C, for which we know C = C^T ≥ 0. This implies that matrix \hat{C} is also symmetric and non-negative definite. Finally, we know that the original network satisfies condition (1.10), L = B. From (1.28)-(1.29), we see that this implies that matrix \hat{L} = \hat{B}. PRIMA therefore preserves passivity by construction.

Our next discussion looks at how the explicit construction of Q and the projections used

for the reduced models limit the use of PRIMA on electrical networks with many ports.


Limitations of PRIMA

The PRIMA method does not scale well with the size and number of ports of a network. This

makes it inefficient for macromodeling interconnects in today’s VLSI chips. The inefficiency is

mainly due to the following factors:

1. The orthogonalization procedure used to generate the projection matrix leads to a dense matrix Q. This means that all the mpq elements in Q have to be stored in memory before performing the congruence transformations. For large networks, such as the parasitic networks of interconnects, the matrix order m can exceed several million while the number of ports p can exceed several thousand [16]. Storing the projection matrix for such networks requires a prohibitive amount of computer memory.

2. The dense structure of the projection matrix leads to dense matrix manipulations in (1.26)-

(1.29). Generating the reduced models involves matrix products between large and dense

matrices. Such products are computationally expensive to perform, requiring a lot of

CPU time and memory [43]. For some numerical examples that we look at later in the

thesis, PRIMA actually runs out of memory.

3. The reduced model matrices are also dense. Transient simulations involving such matrices are computationally costly, and in some cases more costly than with the original parasitic network. This is due to the expensive matrix factorizations and inversions involved.

The drawbacks of existing methods, like PRIMA, call for better and more efficient model

order reduction techniques for electrical networks with many ports [45]. In the next subsection,

we look at some of the recent methods that have been developed to address these drawbacks.

We point out their individual limitations, which motivate the work done in this thesis.

1.2.5 Recent Techniques For Electrical Networks with Many Ports

A number of techniques have been developed in recent years to tackle the challenges of reducing

electrical networks with many ports. One group of techniques aims at reducing the number

of ports before applying a model order reduction method like PRIMA. The hope is that this

will reduce the computational cost of model order reduction. Methods in this group include


SVDMOR [46], ESVDMOR [47], RecMOR [48], and others like [49,50]. Reduction of the ports

is done by exploiting the correlation between different ports of the network. In cases of high

correlation, a significant reduction in the number of ports can be achieved. Unfortunately, for

practical networks such as modern VLSI interconnects, there is little correlation between the

network ports. This therefore limits the use of these methods.

A second group divides up the input ports into a number of smaller clusters, prior to model

order reduction. Methods here include [51–53]. Each cluster of inputs makes up a subsystem

of the network. These subsystems are reduced individually with any model order reduction

technique. The reduced models obtained have a block diagonal structure, which makes them

sparse. As a result, they are faster to simulate than PRIMA’s. The setup of the subsystems

however does not guarantee passivity of the reduced models.

Another method, known as SIP [43], avoids explicitly constructing the projection matrix.

The reduced matrices (1.26)-(1.29) are instead computed through sparse matrix manipulations,

using the concept of Schur complement [54]. This is closely related to what is done in PACT [55].

Such a process of generating the reduced models is more efficient, making SIP more scalable

to networks with many ports. SIP, however, can only match at most two moments of the

original network per frequency point. For RLC networks, it only matches the first moment

while for RC networks, it matches the first two moments. This level of accuracy is not always

sufficient for reduced order modeling. The authors of [43] suggest using multi-point moment

matching [18,34] to improve accuracy, but this could lead to singular reduced models. A solution

for this singularity problem does not seem trivial.

The most recent method, SparseRC [56], uses graph-partitioning together with model order

reduction to efficiently reduce very large RC networks with many ports. The original network is

partitioned into smaller subnetworks that can be reduced, individually, using a technique similar

to SIP [43]. This simplifies the problem of reducing a very large network to that of reducing

relatively smaller networks, which are easily managed. The reduced model obtained is passive

and sparse. However, just like SIP, SparseRC can only match two moments per frequency point.

PRIMA could be used to match additional moments as suggested by the authors of [56]. This

however reduces the efficiency of SparseRC due to the inherent problems of PRIMA.

In conclusion, existing model order reduction methods can hardly handle the complexity of


parasitic networks that arise in modern industrial designs. These limitations present a challenge

in quickly predicting power and signal integrity issues in future integrated circuits. Therefore,

there is a need for efficient model order reduction methods for electrical networks with many

ports, that can generate accurate, passive and sparse reduced models.

1.3 Thesis Goal

Our goal in this thesis is to develop an accurate, passive and efficient model order reduction

technique for large electrical networks with many ports. The strategy we use to accomplish this

goal is as follows:

1. Accuracy of reduced models. We aim at achieving accuracy in the reduced models

through moment matching. Our reduction technique must be able to match an arbitrary

number of moments of the original network at DC. In this way, there is full control over

the level of accuracy in the reduced models.

2. Preserving passivity. Our reduction technique must also generate a passive reduced

model if the original network is passive. To achieve this, we will use only congruence

transformations. As already mentioned, congruence transformations have the special

property of maintaining the signs of the original network matrices in the reduced matrices.

Therefore, if the original matrices satisfy the passivity conditions, the reduced matrices will implicitly satisfy them as well.

3. Efficiency. We plan to accomplish this in two ways.

(a) efficient reduction process. Here the objective is to speed up reduction, and properly

manage the use of memory. We will achieve this by avoiding the explicit construction

of a large and dense projection matrix, and also by avoiding matrix products between

large and dense matrices.

(b) efficiency of the reduced models. We aim to obtain sparse reduced models that are

faster to simulate.


1.4 Thesis Outline

This thesis is structured as follows. In Chapter 2, we discuss a novel model order reduction technique that is efficient for large RC networks with many ports. We present the theory, implementation details, and numerical results that show the superior performance of our method over the state of the art. The work done here is the core part of the thesis. Chapter 3 looks at reduction in the presence of inductors. This part of the thesis is still a work in progress, but we can already share some interesting insights. We conclude in Chapter 4 with a summary of our work, of our

contributions and of possible future directions of investigation.

Chapter 2

TurboMOR: Reduction of RC Networks with Many Ports

As discussed in Chapter 1, PRIMA [31] does not scale well with the order and number of ports of electrical networks, like RC networks. The reduction consumes a lot of memory and CPU time. In addition, the reduced model generated is large and dense, and therefore slows down subsequent simulations. Storing such reduced models is also a challenge. Techniques like SIP [43] and SparseRC [56] are efficient for such networks, but are limited in accuracy. The method SIP [43] can only match the first two moments at DC. Any attempt to improve accuracy, using multipoint SIP [43], may generate singularities in the reduced model. SparseRC [56], on the other hand, matches more than two moments by using PRIMA. This reduces its efficiency, since it inherits the challenges of PRIMA. In this chapter, we propose a novel technique for the efficient model order reduction of RC networks with many ports, named TurboMOR. The proposed method can efficiently match an arbitrary number of moments at DC. The first two moments are matched in a way similar to SIP [43] and SparseRC [56]. Additional moments are, however, matched without use of PRIMA. Instead, reduction is done with efficient computations that do not involve explicitly constructing a large and dense matrix. The reduced models generated by TurboMOR are sparse, and can therefore accelerate subsequent simulations. The sparsity of the models also reduces the memory required for storage. The models are also passive by construction. The use of passive models prevents unstable transient simulations [27]. TurboMOR can also be combined with partitioning to handle reduction of very large networks. As

numerical results will show, the proposed method scales well with the size and number of ports

of a network, outperforming the state of the art. An additional feature of TurboMOR is that

the reduction process, as we will show, offers a novel and insightful interpretation of moment

matching, from a system theory perspective.

The rest of the chapter is organized as follows. We will first derive the proposed method

in Section 2.1, then discuss the implementation details in Section 2.2. In Section 2.3, we will

discuss how to handle a singularity problem that may arise during reduction. A proof of moment

matching will be given in Section 2.4. In Section 2.5, we will discuss how to integrate graph-

partitioning in the proposed method. We will then present numerical results in Section 2.6.

Finally, we will conclude in Section 2.7.

2.1 Theoretical Derivation

To begin our discussion of TurboMOR, let us consider a passive RC network using an impedance representation. The network equations in the Laplace domain are in the form of (1.1), repeated here for convenience

G x(s) + s C x(s) = B u(s)
y(s) = B^T x(s) \quad (2.1)

The conductance matrix G ∈ R^{m×m} is symmetric and non-negative definite, and the vector x(s) ∈ R^m collects only nodal voltages. To shorten notation, we omit from here on the explicit dependence of the vectors y, x and u on the Laplace variable s.

The nodes of network (2.1) can be reordered such that port nodes come first, followed by

internal nodes. Reordering of nodes will induce a reordering of the nodal voltages in x, and a

corresponding reordering of the system equations. The resulting nodal equation reads

\begin{bmatrix} G_{11} & * \\ G_{21} & G_{22} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
+ s \begin{bmatrix} C_{11} & * \\ C_{21} & C_{22} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} B_1 \\ 0 \end{bmatrix} u \quad (2.2a)

y = \begin{bmatrix} B_1^T & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \quad (2.2b)


where vectors x_1 ∈ R^p and x_2 ∈ R^{m−p} are the port and internal nodal voltages, respectively, and the symbol ∗ denotes the transpose of the block across the diagonal of a symmetric matrix.

This reordering leads to a zero block in matrix B, since the port currents enter directly only in

the first p equations. Now we have two groups of nodes. The blocks G11 and C11 represent the

resistive and capacitive couplings, respectively, among only the port nodes. Blocks G22 and C22

respectively encode the resistive and capacitive couplings among only internal nodes. Lastly,

blocks G21 and C21 represent the resistive and capacitive couplings, respectively, between the

ports and internal nodes.

We are now in a position to derive the TurboMOR algorithm. As we will show, the proposed method uses memory-conscious Householder reflections [44] to efficiently match more than two moments at DC. Moments are matched two at a time, in an iterative fashion.

is also preserved in the reduced models due to use of only congruence transformations. Graph-

partitioning can be integrated in the proposed algorithm to handle reduction of very large

networks. We also show a novel and insightful way of interpreting moment matching, from a

system theory perspective.

2.1.1 Matching Two Moments

The first iteration of TurboMOR is similar to the reduction in SIP [43] and SparseRC [56]. We start by eliminating all resistive couplings between the internal nodes and port nodes. This is done by eliminating the blocks G_{21} and G_{21}^T in (2.2a) using Gaussian elimination. Let us assume for now that G_{22} is positive definite, and so invertible. This is true in most cases. Later, in Section 2.3, we show how to handle cases in which G_{22} is singular. We first compute

the Cholesky factorization [44]

G_{22} = K K^T \quad (2.3)

where K ∈ R^{(m−p)×(m−p)} is a lower triangular matrix, and is invertible. Matrix K is referred to

as the Cholesky factor of G22. We then apply a change of variable

\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = Q^{(1)} \begin{bmatrix} x_1 \\ x_2^{(1)} \end{bmatrix} \quad (2.4)

in (2.2a)-(2.2b), where

Q^{(1)} = \begin{bmatrix} I_p & 0 \\ -K^{-T} K^{-1} G_{21} & I_{m-p} \end{bmatrix} \quad (2.5)

with I_p ∈ R^{p×p} and I_{m−p} ∈ R^{(m−p)×(m−p)} as identity matrices. We follow this with a premultiplication of (2.2a) by (Q^{(1)})^T to obtain

(Q^{(1)})^T \begin{bmatrix} G_{11} & * \\ G_{21} & G_{22} \end{bmatrix} Q^{(1)}
\begin{bmatrix} x_1 \\ x_2^{(1)} \end{bmatrix}
+ s \, (Q^{(1)})^T \begin{bmatrix} C_{11} & * \\ C_{21} & C_{22} \end{bmatrix} Q^{(1)}
\begin{bmatrix} x_1 \\ x_2^{(1)} \end{bmatrix}
= (Q^{(1)})^T \begin{bmatrix} B_1 \\ 0 \end{bmatrix} u \quad (2.6a)

y = \begin{bmatrix} B_1^T & 0 \end{bmatrix} Q^{(1)}
\begin{bmatrix} x_1 \\ x_2^{(1)} \end{bmatrix} \quad (2.6b)

The new system (2.6a)-(2.6b) is a congruence transformation of (2.2a)-(2.2b). This transformation does not change the transfer function or the moments of the original network (2.2a)-(2.2b), since Q^{(1)} is invertible. We can simplify (2.6a)-(2.6b), through matrix multiplication, to obtain

\begin{bmatrix} G_{11}^{(1)} & 0 \\ 0 & G_{22} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2^{(1)} \end{bmatrix}
+ s \begin{bmatrix} C_{11}^{(1)} & * \\ C_{21}^{(1)} & C_{22} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2^{(1)} \end{bmatrix}
= \begin{bmatrix} B_1 \\ 0 \end{bmatrix} u \quad (2.7a)

y = \begin{bmatrix} B_1^T & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2^{(1)} \end{bmatrix} \quad (2.7b)

Figure 2.1: A decomposition of system (2.7a), (2.7b) into two dynamically coupled subsystems, Σ_1^{(1)} and Σ_2^{(1)}. At DC, the two subsystems are completely decoupled.

where

G_{11}^{(1)} = G_{11} - G_{21}^T K^{-T} K^{-1} G_{21} \quad (2.8)

C_{11}^{(1)} = C_{11} - G_{21}^T K^{-T} K^{-1} C_{21} - C_{21}^T K^{-T} K^{-1} G_{21} + G_{21}^T K^{-T} K^{-1} C_{22} K^{-T} K^{-1} G_{21} \quad (2.9)

C_{21}^{(1)} = C_{21} - C_{22} K^{-T} K^{-1} G_{21} \quad (2.10)

There is now no resistive coupling between the internal nodes and port nodes, only capacitive coupling. We can interpret system (2.7a), (2.7b) as a cascade of two subsystems. A first subsystem Σ_1^{(1)} of order p

Σ_1^{(1)}: \quad G_{11}^{(1)} x_1 + s C_{11}^{(1)} x_1 = u_1^{(1)} + B_1 u, \qquad y = B_1^T x_1 \quad (2.11)

and a second subsystem Σ_2^{(1)} of order m − p

Σ_2^{(1)}: \quad G_{22} x_2^{(1)} + s C_{22} x_2^{(1)} = -C_{21}^{(1)} u_2^{(1)}, \qquad y_2^{(1)} = -(C_{21}^{(1)})^T x_2^{(1)} \quad (2.12)

This is shown graphically in Fig. 2.1. The two subsystems are only coupled dynamically through the time derivatives u_1^{(1)} = s y_2^{(1)} and u_2^{(1)} = s x_1. Only the first subsystem Σ_1^{(1)} is directly connected to the input and output ports of the original network. At DC, Σ_1^{(1)} and Σ_2^{(1)} are completely decoupled, and the input-output behavior of the original network depends only on Σ_1^{(1)}. At low frequencies, there is weak coupling between the two subsystems, so the overall network response depends mainly on Σ_1^{(1)}. Therefore, we can use subsystem Σ_1^{(1)} alone

G_{11}^{(1)} x_1 + s C_{11}^{(1)} x_1 = B_1 u, \qquad y = B_1^T x_1 \quad (2.13)

as a reduced order model of dimension p. This model matches the first two moments of the original system (2.7a), (2.7b) at DC. The first two moments of the original network are the constant term and the coefficient of s in the Taylor series expansion (1.14). From Fig. 2.1, the transfer function of subsystem Σ_2^{(1)} gets multiplied by s^2, and so it is not surprising that Σ_2^{(1)} only affects the third and higher moments of system (2.7a), (2.7b). A proof of moment matching is given in Section 2.4.

The model (2.13) therefore shares the same accuracy and size as models from existing techniques like PRIMA. Computing the reduced model matrices G_{11}^{(1)} and C_{11}^{(1)}, using (2.8) and (2.9) respectively, is however computationally cheap. We only need to compute the Cholesky factorization of the sparse matrix G_{22}, and then perform forward and backward substitutions using the sparse Cholesky factor K. All these operations involve only sparse matrices, and can be efficiently handled with sparse linear algebra routines.
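As a concrete sketch, the first iteration can be written directly from (2.3) and (2.8)-(2.10). The dense NumPy version below is our own illustration (the actual algorithm uses sparse factorizations and triangular substitutions; the function name is invented):

```python
import numpy as np

def turbomor_first_iteration(G, C, p):
    """Eliminate the resistive port/internal coupling G21 via the Cholesky
    factor of G22, returning the blocks (2.8)-(2.10). The order-p model (2.13)
    built from G11_1 and C11_1 matches two moments at DC."""
    G11, G21, G22 = G[:p, :p], G[p:, :p], G[p:, p:]
    C11, C21, C22 = C[:p, :p], C[p:, :p], C[p:, p:]
    K = np.linalg.cholesky(G22)             # G22 = K K^T, eq. (2.3)
    W = np.linalg.solve(K, G21)             # K^{-1} G21 (forward substitution)
    X = np.linalg.solve(K, C21)             # K^{-1} C21
    V = np.linalg.solve(K.T, W)             # K^{-T} K^{-1} G21 (backward subst.)
    G11_1 = G11 - W.T @ W                   # eq. (2.8)
    C11_1 = C11 - X.T @ W - W.T @ X + V.T @ C22 @ V   # eq. (2.9)
    C21_1 = C21 - C22 @ V                   # eq. (2.10)
    return G11_1, C11_1, C21_1, K
```

Each product here involves at most p columns, which is the reason this step stays cheap even for large m when sparse routines are used.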

To match additional moments at DC, which SIP [43] fails to do, we must consider the effect of the second subsystem Σ_2^{(1)} on the network response. Rather than applying PRIMA to reduce Σ_2^{(1)}, as done in SparseRC [56], we pursue a more efficient approach that avoids the problems associated with PRIMA. We process Σ_2^{(1)} so as to decompose it into two other subsystems.

2.1.2 Matching Four Moments

In the second iteration, we work on decomposing subsystem Σ_2^{(1)}. We start by applying a change of variable x_2^{(1)} = K^{-T} z_2^{(1)} in (2.12). We then left multiply (2.12) by K^{-1} to obtain

I_{m-p} z_2^{(1)} + s K^{-1} C_{22} K^{-T} z_2^{(1)} = -K^{-1} C_{21}^{(1)} u_2^{(1)}, \qquad y_2^{(1)} = -(C_{21}^{(1)})^T K^{-T} z_2^{(1)} \quad (2.14)


System (2.14) is a congruence transformation of (2.12), and turns G_{22} into the identity matrix. We keep the matrix K^{-1} C_{22} K^{-T} in factored form throughout the entire algorithm, since explicitly computing it would lead to a huge and dense matrix. This matrix will only be required in a few matrix products with smaller matrices, with at most p columns. By keeping K^{-1} C_{22} K^{-T} in factored form, such products can be computed efficiently. We therefore only explicitly compute the matrix K^{-1} C_{21}^{(1)}, which involves only a forward substitution with the sparse matrix K. The Cholesky factor K has already been computed in the first iteration.

Our next step exploits the fact that a Householder reflector can be used to introduce zeros within a column vector [44]. We compute the QR factorization¹ of the input-to-state matrix K^{-1} C_{21}^{(1)} in (2.14), by applying a series of Householder reflectors [44]

(Q^{(2)})^T K^{-1} C_{21}^{(1)} = \begin{bmatrix} R^{(2)} \\ 0 \end{bmatrix} \quad (2.15)

Matrix R^{(2)} ∈ R^{p×p} is an upper triangular matrix, and Q^{(2)} ∈ R^{(m−p)×(m−p)} is an orthogonal matrix given by a product of p Householder reflectors. Each Householder reflector Q_i^{(2)} is defined as

Q_i^{(2)} = I - \beta_i^{(2)} v_i^{(2)} (v_i^{(2)})^T \quad \forall i = 1, 2, \ldots, p \quad (2.16)

where the scalar \beta_i^{(2)} is the Householder constant and the vector v_i^{(2)} is the Householder vector. Therefore, by applying the change of variable

z_2^{(1)} = Q^{(2)} \begin{bmatrix} x_1^{(2)} \\ x_2^{(2)} \end{bmatrix} \quad (2.17)

¹A QR factorization decomposes a matrix into the product of an orthogonal matrix Q (Q^T Q = I) and an upper triangular matrix R [44].


in (2.14), followed by a left multiplication of (2.14) by (Q^{(2)})^T, we obtain

\begin{bmatrix} I_p & 0 \\ 0 & I_{m-2p} \end{bmatrix}
\begin{bmatrix} x_1^{(2)} \\ x_2^{(2)} \end{bmatrix}
+ s \begin{bmatrix} C_{11}^{(2)} & * \\ C_{21}^{(2)} & C_{22}^{(2)} \end{bmatrix}
\begin{bmatrix} x_1^{(2)} \\ x_2^{(2)} \end{bmatrix}
= \begin{bmatrix} -R^{(2)} \\ 0 \end{bmatrix} u_2^{(1)} \quad (2.18a)

y_2^{(1)} = \begin{bmatrix} -R^{(2)} \\ 0 \end{bmatrix}^T
\begin{bmatrix} x_1^{(2)} \\ x_2^{(2)} \end{bmatrix} \quad (2.18b)

where

C_{22}^{(2)} = \begin{bmatrix} 0 & I_{m-2p} \end{bmatrix} (Q^{(2)})^T K^{-1} C_{22} K^{-T} Q^{(2)} \begin{bmatrix} 0 \\ I_{m-2p} \end{bmatrix} \quad (2.19)

C_{21}^{(2)} = \begin{bmatrix} 0 & I_{m-2p} \end{bmatrix} (Q^{(2)})^T K^{-1} C_{22} K^{-T} Q^{(2)} \begin{bmatrix} I_p \\ 0 \end{bmatrix} \quad (2.20)

C_{11}^{(2)} = \begin{bmatrix} I_p & 0 \end{bmatrix} (Q^{(2)})^T K^{-1} C_{22} K^{-T} Q^{(2)} \begin{bmatrix} I_p \\ 0 \end{bmatrix} \quad (2.21)

System (2.18a), (2.18b) is a congruence transformation of (2.14), and is in the form of (2.7a), (2.7b). The interpretation of (2.7a), (2.7b) in the first iteration, shown in Fig. 2.1, can also be applied here. We can interpret (2.18a), (2.18b) as a cascade of two subsystems, a first subsystem Σ_1^{(2)}

Σ_1^{(2)}: \quad I_p x_1^{(2)} + s C_{11}^{(2)} x_1^{(2)} = u_1^{(2)} - R^{(2)} u_2^{(1)}, \qquad y_2^{(1)} = -(R^{(2)})^T x_1^{(2)} \quad (2.22)

of order p, and a second subsystem Σ_2^{(2)}

Σ_2^{(2)}: \quad I_{m-2p} x_2^{(2)} + s C_{22}^{(2)} x_2^{(2)} = -C_{21}^{(2)} u_2^{(2)}, \qquad y_2^{(2)} = -(C_{21}^{(2)})^T x_2^{(2)} \quad (2.23)

of order m − 2p. Subsystems Σ_1^{(2)} and Σ_2^{(2)} are only coupled dynamically through u_1^{(2)} = s y_2^{(2)} and u_2^{(2)} = s x_1^{(2)}. We can replace subsystem Σ_2^{(1)}, in Fig. 2.1, by Σ_1^{(2)} and Σ_2^{(2)} to give Fig. 2.2. The original system (2.7a), (2.7b) can now be interpreted as made up of three subsystems that


Figure 2.2: A decomposition of system (2.7a), (2.7b) into three dynamically coupled subsystems. At DC, all three subsystems are decoupled.

are all coupled dynamically. Just like in the first iteration, subsystem Σ_1^{(2)} alone approximates Σ_2^{(1)} by matching its first two moments at DC. If we use Σ_1^{(2)} alone to approximate Σ_2^{(1)} in Fig. 2.1, we obtain a reduced model

\begin{bmatrix} G_{11}^{(1)} & 0 \\ 0 & I_p \end{bmatrix}
\begin{bmatrix} x_1 \\ x_1^{(2)} \end{bmatrix}
+ s \begin{bmatrix} C_{11}^{(1)} & * \\ R^{(2)} & C_{11}^{(2)} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_1^{(2)} \end{bmatrix}
= \begin{bmatrix} B_1 \\ 0 \end{bmatrix} u \quad (2.24a)

y = \begin{bmatrix} B_1^T & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_1^{(2)} \end{bmatrix} \quad (2.24b)

of dimension 2p. This model matches four moments of the original system (2.7a), (2.7b) at DC, as proved in Section 2.4. It has a block-diagonal structure, which makes it sparse. Such a sparse model is faster to simulate than a corresponding dense model, of the same size and accuracy, generated by PRIMA [31]. This is because it is cheaper, during simulation, to perform matrix factorizations and inversions of sparse matrices. In addition, storing sparse matrices consumes less memory than storing dense matrices.
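The Householder sweep of (2.15)-(2.16) can be made concrete with a short, textbook-style sketch of how p reflectors zero everything below the top p × p block. This is our own minimal illustration, not the thesis implementation:

```python
import numpy as np

def householder_qr(F):
    """Apply p reflectors Q_i = I - beta_i v_i v_i^T, as in (2.16), so that
    Q^T F = [[R], [0]] with R upper triangular, as in (2.15). Returns Q, R."""
    m, p = F.shape
    Q = np.eye(m)
    A = F.copy()
    for i in range(p):
        x = A[i:, i]
        v = x.copy()
        v[0] += (1.0 if x[0] >= 0 else -1.0) * np.linalg.norm(x)  # avoid cancellation
        beta = 2.0 / (v @ v)                 # the Householder constant beta_i
        H = np.eye(m)
        H[i:, i:] -= beta * np.outer(v, v)   # reflector acting on rows i onward
        A = H @ A                            # introduce zeros below A[i, i]
        Q = Q @ H                            # accumulate the product of reflectors
    return Q, A[:p, :]
```

Applying the resulting orthogonal Q as a congruence to the internal subsystem is what produces the structured right-hand side of (2.18a).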

We can repeat the process to match more moments at DC. This is done by processing subsystem Σ_2^{(2)} in Fig. 2.2.


2.1.3 Matching More Than Four Moments

To match more than four moments, we will have to decompose subsystem Σ_2^{(2)} in Fig. 2.2. Generally, for each iteration j ≥ 3, we first apply a series of p Householder reflectors to compute the QR factorization of the input-to-state matrix C_{21}^{(j-1)} of subsystem Σ_2^{(j-1)}

(Q^{(j)})^T C_{21}^{(j-1)} = \begin{bmatrix} R^{(j)} \\ 0 \end{bmatrix} \quad (2.25)

where R^{(j)} is upper triangular, and Q^{(j)} is orthogonal. Matrix Q^{(j)} is a product of the Householder reflectors. We then apply a congruence transformation to subsystem Σ_2^{(j-1)} using Q^{(j)}, just like in the second iteration. This results in a new system in the form (2.18a), (2.18b). We can therefore apply the reduction used in the first two iterations. The new system can be interpreted as a cascade of two subsystems, a first subsystem Σ_1^{(j)} of order p and a second subsystem Σ_2^{(j)} of order m − jp. Subsystem Σ_1^{(j)} alone will form part of the reduced model, adding two moments to the already matched moments. The second subsystem Σ_2^{(j)} will only be further decomposed if j < q; otherwise it is discarded. After q iterations, the reduced model obtained will be of order qp, and will be in the form

\begin{bmatrix} G_{11}^{(1)} & & & \\ & I_p & & \\ & & \ddots & \\ & & & I_p \end{bmatrix}
\begin{bmatrix} x_1 \\ x_1^{(2)} \\ \vdots \\ x_1^{(q)} \end{bmatrix}
+ s \begin{bmatrix} C_{11}^{(1)} & * & & \\ R^{(2)} & C_{11}^{(2)} & \ddots & \\ & \ddots & \ddots & * \\ & & R^{(q)} & C_{11}^{(q)} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_1^{(2)} \\ \vdots \\ x_1^{(q)} \end{bmatrix}
= \begin{bmatrix} B_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} u \quad (2.26a)

y = \begin{bmatrix} B_1^T & 0 & \cdots & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_1^{(2)} \\ \vdots \\ x_1^{(q)} \end{bmatrix} \quad (2.26b)

The reduced model (2.26a), (2.26b) comprises $q$ subsystems, each of order $p$, that are only dynamically coupled through the blocks $R^{(j)}$ ($j = 2, \ldots, q$). As proved in section 2.4, this reduced model matches the first $2q$ moments of the original system (2.7a), (2.7b) at DC.
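To make the structure of (2.26a) concrete, the following sketch (illustrative only; the block values are random placeholders, not matrices produced by TurboMOR) assembles the block-diagonal G matrix and the block-tridiagonal C matrix and checks that everything outside the tridiagonal band is zero:

```python
import numpy as np
from scipy.linalg import block_diag

# Reduced-model pattern of (2.26a) for q = 3 blocks of size p. The blocks
# stand in for G11^(1), C11^(j) and R^(j); only the sparsity pattern matters.
rng = np.random.default_rng(0)
p, q = 4, 3

G11 = np.diag(rng.uniform(1.0, 2.0, p))                 # placeholder G11^(1)
C11 = [(lambda a: a + a.T)(rng.standard_normal((p, p))) for _ in range(q)]
R = [np.triu(rng.standard_normal((p, p))) for _ in range(q - 1)]  # R^(2)..R^(q)

# G-hat: block diagonal with G11^(1) followed by identity blocks
G_hat = block_diag(G11, *([np.eye(p)] * (q - 1)))

# C-hat: symmetric block tridiagonal, R^(j) below and (R^(j))^T above
C_hat = np.zeros((q * p, q * p))
for j in range(q):
    C_hat[j*p:(j+1)*p, j*p:(j+1)*p] = C11[j]
for j in range(1, q):
    C_hat[j*p:(j+1)*p, (j-1)*p:j*p] = R[j-1]
    C_hat[(j-1)*p:j*p, j*p:(j+1)*p] = R[j-1].T
```

Because all blocks outside the tridiagonal band are identically zero, fill-in from factorizing G_hat + sC_hat during simulation stays confined to the band, which is the structural source of the speedups reported in section 2.6.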

Therefore, with TurboMOR, we generate reduced models of the same size and accuracy as PRIMA. We however generate the reduced models more efficiently, avoiding any explicit computations of huge and dense matrices. The models from TurboMOR are sparse, owing to their block-diagonal structure. As a result, they are faster to simulate than the corresponding dense models from PRIMA, as we will demonstrate in section 2.6. Even though models from PRIMA can be made sparse using an eigenvalue decomposition, this requires extra CPU time. The sparsity of TurboMOR's models also makes storing them cheaper than models from PRIMA, since less memory is consumed. Compared to SIP [43], which only matches two moments at DC, TurboMOR can generate models that match an arbitrary number of moments. Moreover, additional accuracy is attained without any singularity problems, unlike in multipoint SIP [43]. The advantage of TurboMOR over SparseRC [56] is that, in matching more than two moments, we avoid the use of PRIMA. TurboMOR also generates passive models, which prevents unstable transient simulations [27]. The use of only congruence transformations preserves the positive definiteness of matrices G and C of the original network (2.1). We can also observe in model (2.26a), (2.26b) that TurboMOR preserves $B_1$, which is the mapping of ports to state equations. This property, as discussed in [25], simplifies the connection of the reduced model to other components. The model (2.26a), (2.26b) can also be converted into an equivalent RC network, using the synthesis method in [25]. This enables integration into existing electronic design tools, like SPICE.

For interconnects modeled as RC networks, the low-pass type approximation (2.26a), (2.26b)

is sufficient because the dominant poles of the transfer function of an RC network are the

low-frequency poles. The magnitude of the transfer function is highly attenuated at higher

frequencies, which implies that the response of the network to high frequency components of

an input signal can be safely neglected.

We do not need to use frequency shifting [57] in TurboMOR when reducing RC networks

because of two main reasons. First, the input-output behavior of an RC network is described

by a smooth frequency response that can be accurately predicted by only matching a number

of moments at DC, as done in the reduced model (2.26a), (2.26b). We will show a numerical

example in Section 2.6, where we accurately predict the response of an RC network by only

matching moments at DC. This is not the case for an RLC network, where the input-output


behavior is described by a complex frequency response with several resonances. In such a case,

frequency shifting would be useful for improving the accuracy of the reduced model over a wider

frequency band [18,34]. The second reason is that for RC networks in which matrix G in (2.1)

is singular, techniques like PRIMA [31] can only carry out reduction by employing frequency

shifting so as to avoid inverting a singular matrix. Reduction in TurboMOR, however, does

not require matrix G to be invertible. We therefore do not need to use frequency shifting even

when reducing RC networks in which matrix G is singular.

2.2 Practical Implementation

We now discuss how TurboMOR can be implemented for speed and efficient memory consumption. The reduction process is implemented practically as follows.

2.2.1 First Iteration

In iteration 1, we only need to compute matrices $G_{11}^{(1)}$ and $C_{11}^{(1)}$ in (2.13). We compute matrix $C_{21}^{(1)}$ in (2.7a), (2.7b) if and only if we want to match more than two moments. We start by computing the Cholesky factor $K$ of matrix $G_{22}$ in (2.2a), (2.2b). This is done efficiently using a sparse factorization routine for positive definite matrices, such as the supernodal method [58] that is employed by the chol routine in MATLAB. After obtaining factor $K$, we then compute matrices $G_{11}^{(1)}$ and $C_{11}^{(1)}$ using (2.8) and (2.9), respectively. The computation involves forward substitutions, backward substitutions and products of sparse matrices, which are fast to implement in a numerical language such as MATLAB. At this point, if we only want to match two moments, we have all the matrices needed to form the reduced model (2.13). If we need to match more than two moments, we have to also compute matrix $C_{21}^{(1)}$ using (2.10), and carry it on to the next iteration.
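As a sketch of these mechanics (our reconstruction for illustration, with dense NumPy/SciPy stand-ins for the sparse MATLAB routines; we assume the usual Schur-complement form of the update, while the thesis' own update equations are (2.8)-(2.10)), the Cholesky-plus-forward-substitution pattern looks as follows:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

# First-iteration mechanics with dense stand-ins. We assume the usual
# Schur-complement update G11 - G12 G22^{-1} G21, evaluated through the
# Cholesky factor K of G22 and one forward substitution.
rng = np.random.default_rng(1)
p, ni = 3, 12
A = rng.standard_normal((p + ni, p + ni))
G = A @ A.T + (p + ni) * np.eye(p + ni)     # SPD conductance matrix
G11, G21, G22 = G[:p, :p], G[p:, :p], G[p:, p:]

K = cholesky(G22, lower=True)               # sparse chol in the real code
Y = solve_triangular(K, G21, lower=True)    # forward substitution: K Y = G21
G11_red = G11 - Y.T @ Y                     # equals G11 - G12 G22^{-1} G21
```

Since the update is a congruence transformation, G11_red stays symmetric positive definite, so it can itself be factored in later steps.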

2.2.2 From the Second to the Second Last Iteration

For every iteration $j = 2, \ldots, q - 1$, only the matrices $R^{(j)}$ and $C_{11}^{(j)}$, associated with the first subsystem $\Sigma_1^{(j)}$, enter the reduced model (2.26a), (2.26b). The matrices $C_{21}^{(j)}$ and $C_{22}^{(j)}$ of the second subsystem $\Sigma_2^{(j)}$ are carried to the next iteration. First, we perform the QR decomposition (2.25) of the input-to-state matrix of subsystem $\Sigma_2^{(j-1)}$ by the Householder method [44]. We implement this efficiently using a direct call² to the LAPACK routine DGEQRF [59]. The DGEQRF routine returns the upper triangular matrix $R^{(j)}$ and a factorization of the orthogonal matrix $Q^{(j)}$ in (2.25). This factorization consists only of the Householder vectors $v_i^{(j)}$ and constants $\beta_i^{(j)}$ in (2.16). We therefore never have to compute matrix $Q^{(j)}$ explicitly.

Next, we compute matrices $C_{11}^{(j)}$ and $C_{21}^{(j)}$ using the recursive relation

\[
\begin{bmatrix} C_{11}^{(j)} \\ C_{21}^{(j)} \end{bmatrix}
= \bigl(Q^{(j)}\bigr)^T C_{22}^{(j-1)} Q^{(j)}
\begin{bmatrix} I_p \\ 0 \end{bmatrix}
\qquad (2.27)
\]

that can be derived from (2.20) and (2.21). Note that $C_{22}^{(1)} = K^{-1} C_{22} K^{-T}$. Matrix products involving $Q^{(j)}$ or $(Q^{(j)})^T$ are handled efficiently with a direct call to the LAPACK routine DORMQR [59], using the factored form of $Q^{(j)}$ returned by the routine DGEQRF. Lastly, the matrix $C_{22}^{(j)}$ is never computed explicitly; rather, its factorization

\[
C_{22}^{(j)} =
\begin{bmatrix} 0 & I_{(m-jp)} \end{bmatrix}
\bigl(Q^{(j)}\bigr)^T C_{22}^{(j-1)} Q^{(j)}
\begin{bmatrix} 0 \\ I_{(m-jp)} \end{bmatrix}
\qquad (2.28)
\]

derived recursively from (2.19), is carried on to the next iteration, together with matrix $C_{21}^{(j)}$.
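A small Python analogue of this DGEQRF/DORMQR pattern (illustrative; SciPy exposes the same LAPACK routines that the MATLAB implementation calls directly, and the sizes here are made up) keeps $Q^{(j)}$ in factored Householder form while still applying it from both sides:

```python
import numpy as np
from scipy.linalg import lapack

rng = np.random.default_rng(2)
p, m_rem = 3, 8                                   # illustrative sizes
C21 = rng.standard_normal((m_rem, p))             # input-to-state block
M = rng.standard_normal((m_rem, m_rem))
M = M + M.T                                       # stand-in for C22^{(j-1)}

# QR factorization in factored form: Householder vectors + tau, as in (2.25)
qr_fac, tau, _, info = lapack.dgeqrf(C21)
assert info == 0
R = np.triu(qr_fac)[:p, :]                        # upper-triangular R^{(j)}

# apply Q^T from the left and Q from the right without ever forming Q
lwork = 64 * m_rem
QtM, _, _ = lapack.dormqr('L', 'T', qr_fac, tau, M, lwork)
QtMQ, _, _ = lapack.dormqr('R', 'N', qr_fac, tau, QtM, lwork)
```

QtMQ now holds $(Q^{(j)})^T C_{22}^{(j-1)} Q^{(j)}$, from which the blocks in (2.27) and (2.28) are sliced; the dense m_rem-by-m_rem matrix Q is never materialized.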

2.2.3 Last Iteration

In the last iteration $q$, we only need matrices $C_{11}^{(q)}$ and $R^{(q)}$ in (2.26a), (2.26b). Matrix $R^{(q)}$ is computed from the QR decomposition of $C_{21}^{(q-1)}$ in (2.25), which also returns the orthogonal matrix $Q^{(q)}$. Matrix $C_{11}^{(q)}$ is then computed, if $q > 2$, as

\[
C_{11}^{(q)} = \bigl(Q^{(q)}\bigr)^T C_{22}^{(q-1)} Q^{(q)}
\qquad (2.29)
\]

In the case $q = 2$,

\[
C_{11}^{(2)} = \bigl(Q^{(2)}\bigr)^T K^{-1} C_{22} K^{-T} Q^{(2)}
\qquad (2.30)
\]

²We use the direct call to avoid explicitly computing the orthogonal matrix in a QR factorization, which would otherwise be huge and dense. The direct call returns instead factors of the orthogonal matrix. Another direct call uses these factors to implement efficiently any operations involving the orthogonal matrix.


The above practical implementation details allow TurboMOR to avoid huge and dense

matrix products, which require a lot of memory and CPU time. This, as numerical results will

demonstrate, makes TurboMOR more scalable than existing techniques.

2.3 On the Singularity of Matrix G

In our theoretical derivation of TurboMOR, we assume the conductance matrix G is positive

definite, and so invertible. This is true in most cases. For cases where matrix G is singular,

TurboMOR can still be used for reduction but a preprocessing step is required. Note that

reduction with TurboMOR only requires that the block matrix G22, in G, is invertible, and not

the whole of G. We can therefore use a strategy similar to what is done in SparseRC [56] of

first identifying the rows and columns in G22 responsible for its singularity, and then promoting

such rows and columns to the first set of equations in the original network (2.2a). Identifying

the singular rows and columns is simple due to the nature of G22. Matrix G22 is diagonally

dominant, that is, for every row $i$ in $G_{22}$,

\[
|g_{ii}| \ge \sum_{j \ne i} |g_{ij}|
\qquad (2.31)
\]

where $g_{ij}$ is the $ij$-th element. In addition, $G_{22}$ has only positive or zero real diagonal entries and negative or zero real off-diagonal entries. This means that relation (2.31) can be written as

\[
g_{ii} + \sum_{j \ne i} g_{ij} \ge 0
\qquad (2.32)
\]

for every row $i$, and also that $G_{22}$ is non-negative definite. For matrix $G_{22}$ to be invertible, and therefore positive definite, it must be strictly diagonally dominant; relation (2.32) must then hold as a strict inequality. We can therefore identify the rows in $G_{22}$ for which the left-hand side of (2.32) is equal to zero, and promote them to the first set of equations in (2.2a), (2.2b). The remaining

rows satisfy (2.32) in a strict sense, making G22 invertible. Retaining the singular rows increases

the size of the reduced model (2.26a), (2.26b). If the number of singular rows is d, the reduced

model size will be q(p+ d). In practice, however, the number of singular rows is very low, and

so the model size does not increase substantially.
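The row test above is cheap to implement. A minimal sketch (illustrative; the matrix is hand-made so that exactly one row of G22 has a zero row sum):

```python
import numpy as np

# Identify rows of G22 where (2.32) holds with equality (zero row sum);
# these are the rows/columns to promote to the retained equations.
G22 = np.array([[ 2.0, -1.0,  0.0, -1.0],
                [-1.0,  3.0, -1.0,  0.0],
                [ 0.0, -1.0,  2.0,  0.0],
                [-1.0,  0.0,  0.0,  1.5]])

row_sums = G22.sum(axis=1)                       # g_ii + sum_{j != i} g_ij
promote = np.flatnonzero(np.isclose(row_sums, 0.0))
keep = np.setdiff1d(np.arange(G22.shape[0]), promote)

# after promotion, the remaining block is strictly diagonally dominant
G22_kept = G22[np.ix_(keep, keep)]
```

Here only row 0 has a zero row sum, so only one node is promoted; the kept block has strictly positive row sums and is therefore positive definite and safe to factor.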


2.4 Proof of Moment Matching

Our goal here is to prove that TurboMOR can match an arbitrary number of moments of the

original network. We show that the reduced model (2.26a)-(2.26b), obtained after q iterations,

matches 2q moments. Throughout this proof, we assume that matrix G is positive definite and

therefore invertible. If matrix G is singular, the moments at s = 0 are undefined and we can no

longer establish accuracy in terms of moment matching. However, TurboMOR can still be used

to reduce the network. The starting point of our proof is from system (2.7a)-(2.7b), which is

generated by applying a congruence transformation to the original network (2.2a)-(2.2b). The

congruence matrix used is invertible, and so the transformation does not change the transfer

function of the original network nor its moments. We therefore only have to prove that the

reduced model from TurboMOR matches moments of system (2.7a)-(2.7b). Our proof is based

on deriving a relation between moments Mk of original system (2.7a)-(2.7b) and the moments

of the inner subsystem Σ(1)2 (2.12), resulting from the first iteration of TurboMOR. The transfer

function of system (2.7a)-(2.7b) can be written, just like in [55,56], as

H(s) = BT1

[G

(1)11 + sC

(1)11 − s2H1(s)

]−1B1 (2.33)

where

H1(s) =(C

(1)21

)T(G22 + sC22)

−1C(1)21 (2.34)

is the transfer function of the inner subsystem Σ(1)2 (2.12). Let us denote the moments of

subsystem Σ(1)2 as Nl. The Taylor series expansion of H1(s) about s = 0 is

H1(s) =

+∞∑l=0

Nlsl (2.35)

By applying equations (2.35) and (1.14) to (2.33), we get

\[
\sum_{k=0}^{+\infty} M_k s^k
= B_1^T \Bigl[ G_{11}^{(1)} + s C_{11}^{(1)} - \sum_{l=0}^{+\infty} N_l s^{l+2} \Bigr]^{-1} B_1
\qquad (2.36)
\]


Equation (2.36) can also be written as

\[
\Bigl[ G_{11}^{(1)} + s C_{11}^{(1)} - \sum_{l=0}^{+\infty} N_l s^{l+2} \Bigr]
\sum_{k=0}^{+\infty} B_1^{-T} M_k s^k = B_1
\qquad (2.37)
\]

where $B_1^{-T}$ is the inverse transpose of matrix $B_1$. Typically, for circuits, matrix $B_1$ is a

permutation of an identity matrix and therefore invertible. In cases where matrix B1 is singular,

the dependent columns represent correlation between some inputs. We can reduce the number

of inputs before model order reduction using the technique in [46]. This leads to a new matrix

B1 which is full rank, and therefore invertible. By expanding the left hand side of (2.37), we

obtain

+∞∑k=0

G(1)11 B−T1 Mks

k ++∞∑k=0

C(1)11 B−T1 Mks

k+1 −+∞∑k=0

+∞∑l=0

NlB−T1 Mks

k+l+2 = B1 (2.38)

The right and left hand sides of (2.38) are polynomials of Laplace variable s. For these two

sides to be equal, the coefficients of each power of s on either side must be equal. Equating the

coefficients of $s^0$, we get

\[
G_{11}^{(1)} B_1^{-T} M_0 = B_1
\;\Rightarrow\;
M_0 = B_1^T \bigl(G_{11}^{(1)}\bigr)^{-1} B_1
\qquad (2.39)
\]

Matrix $G_{11}^{(1)}$ is positive definite and invertible, since it is a diagonal block of a matrix obtained from a congruence transformation of the positive definite matrix $G$. Equating also the coefficients of $s^1$, we obtain

\[
M_1 = -B_1^T \bigl(G_{11}^{(1)}\bigr)^{-1} C_{11}^{(1)} \bigl(G_{11}^{(1)}\bigr)^{-1} B_1
\qquad (2.40)
\]

From (2.39) and (2.40), we see that the first two moments of the original system (2.7a)-(2.7b) only depend on matrices $G_{11}^{(1)}$, $B_1$ and $C_{11}^{(1)}$, which are all present in the reduced model (2.13). The reduced model (2.13) therefore matches the first two moments of the original system (2.7a)-(2.7b) at DC. By now equating coefficients of a generic power $s^r$ for $r \ge 2$, we obtain

\[
M_r = -B_1^T \bigl(G_{11}^{(1)}\bigr)^{-1} C_{11}^{(1)} B_1^{-T} M_{r-1}
+ B_1^T \bigl(G_{11}^{(1)}\bigr)^{-1} \sum_{l=0}^{r-2} N_l B_1^{-T} M_{r-l-2}
\qquad (2.41)
\]


The relationship in (2.41) shows that moment $M_r$ of the original system (2.7a)-(2.7b) depends on:

1. the matrices $G_{11}^{(1)}$, $B_1$ and $C_{11}^{(1)}$ of subsystem $\Sigma_1^{(1)}$;

2. the moments $N_l$ of subsystem $\Sigma_2^{(1)}$, up to the order $r - 2$.

If we replace $\Sigma_2^{(1)}$ with a model that matches its first $r - 2$ moments, we are able to match the first $r$ moments of the original system. The relation (2.41) can be used at each iteration of TurboMOR. Therefore, after $q$ iterations, the reduced model (2.26a)-(2.26b) will match the first $2q$ moments of the original system (2.7a)-(2.7b) at DC. The established recursive relation between a system and its inner subsystem $\Sigma_2^{(j)}$ enables TurboMOR to match two moments for each iterative step $j$.
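The recursion can be checked numerically. The sketch below (a sanity check we add for illustration, not thesis code) builds a small two-block system whose transfer function has exactly the form (2.33)-(2.34), computes the inner moments N_l in closed form, runs the recursion (2.41) seeded by (2.39)-(2.40), and compares against the moments of the assembled block system:

```python
import numpy as np

rng = np.random.default_rng(3)
p, n2, K = 2, 6, 8        # port count, inner size, number of moments checked

def spd(n):
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

# outer blocks (G11, C11, B1) coupled to an inner subsystem (G22, C22) via C21
G11, C11, G22, C22 = spd(p), spd(p), spd(n2), spd(n2)
C21 = rng.standard_normal((n2, p))
B1 = np.eye(p)

# moments N_l of H1(s) = C21^T (G22 + s C22)^{-1} C21, cf. (2.34)-(2.35)
T = np.linalg.solve(G22, C22)
N = [(-1)**l * C21.T @ np.linalg.matrix_power(T, l) @ np.linalg.solve(G22, C21)
     for l in range(K)]

# recursion (2.41), seeded with M0 and M1 from (2.39) and (2.40)
Gi, BmT = np.linalg.inv(G11), np.linalg.inv(B1).T
M = [B1.T @ Gi @ B1, -B1.T @ Gi @ C11 @ Gi @ B1]
for r in range(2, K):
    Mr = -B1.T @ Gi @ C11 @ BmT @ M[r - 1]
    Mr = Mr + B1.T @ Gi @ sum(N[l] @ BmT @ M[r - l - 2] for l in range(r - 1))
    M.append(Mr)

# reference: direct moments of the assembled block system, whose Schur
# complement reproduces (2.33): M_k = (-1)^k B^T (G^{-1} C)^k G^{-1} B
Gh = np.block([[G11, np.zeros((p, n2))], [np.zeros((n2, p)), G22]])
Ch = np.block([[C11, C21.T], [C21, C22]])
Bh = np.vstack([B1, np.zeros((n2, p))])
S = np.linalg.solve(Gh, Ch)
Md = [(-1)**k * Bh.T @ np.linalg.matrix_power(S, k) @ np.linalg.solve(Gh, Bh)
      for k in range(K)]
```

The two moment sequences agree to numerical precision, which mirrors the statement proved above: keeping r − 2 moments of the inner subsystem preserves the first r moments of the outer system.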

Our proposed proof generalizes the proof of moment matching in SparseRC [56], which only

considers matching the first two moments. Using the developed framework, we can compare

the accuracy of fast model order reduction methods, like TurboMOR and SparseRC, to that of

PRIMA.

2.5 Partitioning for Very Large Networks

Direct reduction of very large networks can be challenging, and in some cases infeasible. Exam-

ples of such networks include some of the power grid benchmarks in [16] that contain millions

of nodes and components. Computing the reduced model matrices, for such networks, may not

be possible even for the simple case of matching two moments. This is because the matrices

get huge and we can not perform directly the forward substitutions, backward substitutions

and matrix products such as in the update equations (2.8)-(2.9). Another problem is that the

direct reduction of such networks introduces a lot of fill-ins in the reduced models, which make

them completely inefficient for use in subsequent simulations.

A possible procedure for handling such a network is to first partition it into smaller subnetworks, and then reduce each subnetwork individually. This is known as a divide-and-conquer approach. The problem of reducing the very large network simplifies to that of reducing smaller networks that can be easily managed. The partitioning strategy adopted, however, is crucial to the success of such a procedure.


We choose the same partitioning strategy that SparseRC [56] uses. We first identify subsets

of nodes in the network that are only connected through a small group of nodes. These subsets

are the subnetworks to be reduced. The small group of nodes that connect them, referred to as

separator nodes, are to be retained in the reduced model so as to avoid introducing fill-ins. To

demonstrate this way of partitioning, let us consider a case of three partitions of the original

network (2.1)

\[
\begin{bmatrix}
G_1 & 0 & * \\
0 & G_2 & * \\
G_{31} & G_{32} & G_3
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
+ s
\begin{bmatrix}
C_1 & 0 & * \\
0 & C_2 & * \\
C_{31} & C_{32} & C_3
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} B_1 \\ B_2 \\ B_3 \end{bmatrix} u
\qquad (2.42a)
\]

\[
y =
\begin{bmatrix} B_1^T & B_2^T & B_3^T \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
\qquad (2.42b)
\]

Matrices $G_1$, $C_1$, $B_1$ and $G_2$, $C_2$, $B_2$ correspond to subnetworks 1 and 2, respectively, while $G_3$, $C_3$ correspond to a group of separator nodes. Subnetworks 1 and 2 are directly decoupled, and are only indirectly connected through the group of separator nodes $x_3$. The form (2.42a)-(2.42b) is known as the bordered block diagonal form [60]. Since the separator nodes are retained in the reduced model, it is desirable to find a partitioning that minimizes the number of separator nodes. For this purpose, we use the nested dissection algorithm nesdis, proposed in SparseRC [56], from the SuiteSparse package [58]. The nesdis algorithm will permute a network into the bordered block diagonal form while minimizing the number of separator nodes.

After partitioning, we can reduce (2.42a)-(2.42b) using TurboMOR. Each subnetwork is

reduced independently while considering its connections to the separator nodes. To illustrate

this, let us consider subnetwork 1. The reduction of subnetwork 1 corresponds to reducing

\[
\begin{bmatrix}
G_1 & * \\
G_{31} & G_3
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_3 \end{bmatrix}
+ s
\begin{bmatrix}
C_1 & * \\
C_{31} & C_3
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} B_1 \\ B_3 \end{bmatrix} u
\qquad (2.43)
\]

The nodes in (2.43) can be reordered such that the original network's port nodes present in subnetwork 1 and the separator nodes come first, followed by internal nodes. This is shown below:

\[
\begin{bmatrix}
G_1^p & * & * \\
G_{31}^p & G_3 & * \\
G_1^{ip} & \bigl(G_{31}^i\bigr)^T & G_1^i
\end{bmatrix}
\begin{bmatrix} x_1^p \\ x_3 \\ x_1^i \end{bmatrix}
+ s
\begin{bmatrix}
C_1^p & * & * \\
C_{31}^p & C_3 & * \\
C_1^{ip} & \bigl(C_{31}^i\bigr)^T & C_1^i
\end{bmatrix}
\begin{bmatrix} x_1^p \\ x_3 \\ x_1^i \end{bmatrix}
=
\begin{bmatrix} B_1^p \\ B_3 \\ 0 \end{bmatrix} u
\qquad (2.44)
\]

The state vectors $x_1^p$ and $x_1^i$ correspond to the ports of the original network and internal nodes, respectively, that are present in subnetwork 1. We can now apply TurboMOR to (2.44), where

• vectors $x_1^p$ and $x_3$ form vector $x_1$ in (2.2a)-(2.2b);

• vector $x_1^i$ forms vector $x_2$ in (2.2a)-(2.2b).

The same procedure is repeated for subnetwork 2. The overall reduced model, obtained after

all subnetworks have been reduced, is

\[
\begin{bmatrix}
G_1 & 0 & * \\
0 & G_2 & * \\
G_{31} & G_{32} & G_3
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
+ s
\begin{bmatrix}
C_1 & 0 & * \\
0 & C_2 & * \\
C_{31} & C_{32} & C_3
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} B_1 \\ B_2 \\ B_3 \end{bmatrix} u
\qquad (2.45a)
\]

\[
y =
\begin{bmatrix} B_1^T & B_2^T & B_3^T \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
\qquad (2.45b)
\]

The reduced model (2.45a)-(2.45b) maintains the same bordered block diagonal form as the orig-

inal network (2.42a)-(2.42b). Such a reduced model structure is also attained by SparseRC [56].

The novel feature of our proposed approach, however, is that the reduced model of each subnetwork is block-diagonal, unlike in SparseRC, where the reduced model of each subnetwork is dense.

This, as numerical results will show, makes the overall reduced model from our approach sparser

than SparseRC’s, and so faster to simulate.
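The bordered block diagonal idea is easy to visualize on a toy graph. In the sketch below (illustrative; the separator is chosen by hand, whereas the thesis uses the nesdis routine) two resistor chains touch only at one separator node, and ordering the separator last exposes the decoupled blocks of (2.42a):

```python
import numpy as np

# Two 3-node resistor chains, nodes {0,2,4} and {1,3,5}, that touch only
# at separator node 6. Node numbering is deliberately interleaved so the
# permutation below has something to do.
n = 7
G = np.zeros((n, n))
edges = [(0, 2), (2, 4), (4, 6),      # subnetwork 1, tied to the separator
         (1, 3), (3, 5), (5, 6)]      # subnetwork 2, tied to the separator
for i, j in edges:                    # stamp 1-ohm resistors
    G[i, i] += 1.0; G[j, j] += 1.0
    G[i, j] -= 1.0; G[j, i] -= 1.0

# order: subnetwork 1 nodes, then subnetwork 2 nodes, then the separator
order = [0, 2, 4, 1, 3, 5, 6]
P = np.eye(n)[order]
G_bbd = P @ G @ P.T

blk1, blk2 = slice(0, 3), slice(3, 6)   # the two decoupled diagonal blocks
```

Because G_bbd[blk1, blk2] is identically zero, each subnetwork can be reduced on its own; fill-in from the reduction stays inside its diagonal block and the separator border.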

2.6 Numerical Results

We now compare the performance of TurboMOR, with and without partitioning, against

PRIMA and SparseRC on industrial examples. All methods are implemented in MATLAB.


For TurboMOR, the implementation involves direct calls to compiled LAPACK routines DGE-

QRF and DORMQR. These calls are for efficiently handling the QR decompositions (2.25), and

matrix products involving Householder matrix Q(j). All experiments are run in MATLAB on a

3.40 GHz Intel i7 computer, with 16 GB of memory. The industrial examples used are an on-chip bus from [15] and power grid benchmarks from [16].

The on-chip bus that we use corresponds to interconnects found in the top metal layer

of a chip manufactured using 65 nm technology [15]. The on-chip bus comprises coupled signal lines, which are modeled as parallel transmission lines. The following are the physical and

geometrical properties of the bus:

• wire material is copper.

• wire width is 0.45 um.

• line spacing is 0.45 um.

• wire thickness is 1.20 um.

• dielectric thickness is 0.20 um.

• dielectric constant is 2.2.

Each transmission line is modeled as a cascade of RC π segments with the following per unit

length parameters:

• line resistance R = 40.74 ohm/mm.

• ground capacitance Cg = 82.03 fF/mm.

• coupling capacitance between adjacent lines is Cc = 73.22 fF/mm.

The per unit length parameters are computed using empirical formulas based on the geometrical

and physical properties of the bus [15]. The length of each segment is 0.2 mm. This is 50 times

smaller than the wavelength at the maximum frequency of operation of 20 GHz [19], making

the lumped RC approximation valid.
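As an illustration (our own sketch, for a single line only; the thesis generator also stamps the coupling capacitance Cc between adjacent lines), the MNA matrices of one line built from these per-unit-length values are:

```python
import numpy as np

# Stamp the MNA matrices of one signal line modeled as N cascaded RC "pi"
# segments, using the per-unit-length values quoted above (single line,
# no coupling to neighbors in this sketch).
R_pul, Cg_pul = 40.74, 82.03e-15          # ohm/mm, F/mm
seg_len, N = 0.2, 300                      # mm, and segments per 60 mm line

Rseg = R_pul * seg_len                     # series resistance of one segment
Cseg = Cg_pul * seg_len                    # shunt capacitance of one segment

n = N + 1                                  # node count for one line
G = np.zeros((n, n)); C = np.zeros((n, n))
g = 1.0 / Rseg
for k in range(N):
    # series resistor between nodes k and k+1
    G[k, k] += g; G[k+1, k+1] += g
    G[k, k+1] -= g; G[k+1, k] -= g
    # pi model: half of the segment capacitance at each end node
    C[k, k] += Cseg / 2; C[k+1, k+1] += Cseg / 2
```

Grounding the near-end node and solving G for a 1 A injection at the far end recovers the expected end-to-end DC resistance of N times Rseg, a quick consistency check on the stamping.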

The power grid benchmarks are obtained from [16]. These benchmarks consist of some power grid models in the form of Figure 1.2. They are described in a Spice file format. We use an internal generator that reads the Spice file of each benchmark and generates the system of equations (2.1) describing the benchmark. The original benchmarks include inductors, which we have neglected. We do some preprocessing on the benchmarks by combining nodes that are connected by short circuits, and eliminating nodes shorted to the ground. This does not change the electrical behavior of the power grids. We also use a variable subset of the current sources to study the scalability of the reduction methods with the number of ports.

| Example | q | PRIMA CPU time | SparseRC CPU time (speedup w.r.t. PRIMA) | TurboMOR CPU time (speedup) | TurboMOR + P CPU time (speedup) |
|---|---|---|---|---|---|
| 1. On-chip bus, p = 256, m = 38,528 | 1 | 0.94 | 0.40 (2.35×) | 0.28 (3.36×) | 0.37 (2.54×) |
| | 2 | 2.84 | 1.83 (1.55×) | 1.48 (1.92×) | 1.58 (1.80×) |
| | 3 | 4.78 | 3.63 (1.32×) | 3.00 (1.59×) | 2.94 (1.63×) |
| 2. ibmpg1t, p = 200, m = 25,195 | 1 | 0.41 | 0.16 (2.56×) | 0.19 (2.16×) | 0.18 (2.28×) |
| | 2 | 1.10 | 0.51 (2.16×) | 0.66 (1.67×) | 0.45 (2.44×) |
| | 3 | 1.98 | 0.94 (2.11×) | 1.38 (1.43×) | 0.85 (2.33×) |
| 3. ibmpg2t, p = 800, m = 163,697 | 1 | 22.00 | 6.24 (3.53×) | 10.84 (2.03×) | 6.28 (3.50×) |
| | 2 | 65.55 | 20.89 (3.14×) | 37.82 (1.73×) | 18.89 (3.47×) |
| | 3 | 118.64 | 39.48 (3.01×) | 78.38 (1.51×) | 35.28 (3.36×) |
| 4. ibmpg2t, p = 1200, m = 163,697 | 1 | 35.51 | 9.53 (3.73×) | 16.64 (2.13×) | 9.52 (3.73×) |
| | 2 | 109.41 | 32.63 (3.35×) | 60.60 (1.81×) | 29.19 (3.75×) |
| | 3 | 224.06 | 64.41 (3.48×) | 132.55 (1.69×) | 56.58 (3.96×) |
| 5. ibmpg2t, p = 1500, m = 163,697 | 1 | 49.50 | 11.66 (4.25×) | 21.29 (2.33×) | 11.76 (4.21×) |
| | 2 | 152.81 | 43.06 (3.55×) | 83.66 (1.83×) | 37.58 (4.07×) |
| | 3 | 729.92 | 83.76 (8.71×) | 186.95 (3.90×) | 72.32 (10.09×) |
| 6. ibmpg2t, p = 2000, m = 163,697 | 1 | 73.17 | 16.29 (4.49×) | 31.23 (2.34×) | 16.29 (4.49×) |
| | 2 | 340.18 | 62.12 (5.48×) | 228.32 (1.49×) | 54.76 (6.21×) |
| | 3 | 9807.11 | 122.12 (80.31×) | 1051.72 (9.32×) | 115.71 (84.76×) |

Table 2.1: Reduction time for the different methods on various test networks (time in seconds). The acronym P stands for partitioning.


2.6.1 Reduction Time

Our first test is to compare the time spent by each method in reducing each of the test networks.

Table 2.1 shows the reduction times of all methods on various examples. Example 1 is an on-chip bus with 128 signal lines (256 ports). Examples 2-6 are power grid benchmarks. Reduction

is done to generate models that match 2, 4 and 6 moments, which can be achieved with q = 1,

q = 2 and q = 3 iterations of TurboMOR, respectively.

We start by comparing TurboMOR without partitioning against PRIMA, in order to estab-

lish its intrinsic efficiency in moment matching. In Table 2.1, we observe that TurboMOR is

always significantly faster than PRIMA with a maximum speedup of 9.32×. This is because

TurboMOR avoids computations that involve large and dense matrices. Reduction is done by

employing sparse matrix techniques and Householder matrices that are always kept in a fac-

tored form. All intermediate computations that would result in dense matrices are carried on

to later iterations. PRIMA, on the other hand, does reduction by explicitly computing a dense

projection matrix and performing dense matrix products. For large networks with many ports,

the reduction consumes a lot of memory and CPU time. In example 6 with q = 3, the high

memory consumption in PRIMA led to use of swap memory. This resulted in a reduction time

of 2 hours and 43 minutes (9807 s), while TurboMOR took only 17.5 minutes (1052 s).

We also observe in Table 2.1 that more speedup is achieved, especially for the larger ex-

amples 3-6, when using either TurboMOR with partitioning or SparseRC. This is because with

partitioning, reduction of a large network simplifies to reduction of smaller subnetworks that

can be efficiently managed. In both TurboMOR with partitioning and SparseRC, the same

number of partitions is used for a fair comparison. The number of partitions is chosen, through

experiments, for the best tradeoff between the reduction time and size of the reduced model.

Looking at results in Table 2.1, we observe that for q = 1, TurboMOR with partitioning and

SparseRC have almost the same reduction time. This is expected since they use similar opera-

tions to match the first two moments. For q > 1, we observe that TurboMOR with partitioning

is consistently faster than SparseRC, which uses PRIMA to match additional moments. This

demonstrates the efficiency of using Householder reflectors, in TurboMOR, to match more than

two moments.


| Example | q | SparseRC CPU time | TurboMOR CPU time | Speedup |
|---|---|---|---|---|
| 1. ibmpg3t, p = 1,200, m = 1,040,612 | 1 | 99.2 | 99.7 | 1.0× |
| | 2 | 482.8 | 379.2 | 1.3× |
| | 3 | 2737.4 | 791.4 | 3.5× |
| 2. ibmpg4t, p = 920, m = 1,211,064 | 1 | 106.4 | 106.6 | 1.0× |
| | 2 | 1254.6 | 417.0 | 3.0× |
| | 3 | 2106.9 | 1505.1 | 1.4× |
| 3. ibmpg4t, p = 2,470, m = 1,211,064 | 1 | 238.7 | 238.6 | 1.0× |
| | 2 | 1678.8 | 1079.6 | 1.6× |
| | 3 | 7930.5 | 2726.2 | 2.9× |

Table 2.2: Reduction time of TurboMOR with partitioning and SparseRC, for very large power grid benchmarks. All times in seconds.

Reduction Time For Very Large Networks

We also experiment on reducing the power grid networks “ibmpg3t” and “ibmpg4t” from [16],

which comprise millions of nodes and components. For these networks, direct reduction with

PRIMA or TurboMOR without partitioning was not possible. The memory requirements for

the update equations exceeded the 16 GB of memory available. By employing partitioning, in

TurboMOR and SparseRC, we managed to reduce these networks. Table 2.2 shows results of

the reduction time of TurboMOR and SparseRC for these networks. We observe that for q = 1,

both methods have almost the same reduction time. This is because reduction is done in the

same way for the first two moments. For additional moments (q = 2 and q = 3), however,

TurboMOR is always significantly faster with a speedup of up to 3.5×, demonstrating how

well it scales with network size when matching additional moments. SparseRC loses efficiency

because it employs PRIMA for additional moment matching.

2.6.2 Scalability

To further demonstrate the reduction efficiency of TurboMOR, we compare its scalability

against that of PRIMA and SparseRC. We employ two kinds of tests. First we study the

reduction time while varying the number of network ports, with the node-to-port ratio constant. Then we consider the reverse, varying the node-to-port ratio with the number of ports constant. We experiment on the on-chip bus introduced at the beginning of section 2.6. All reductions are done to match six moments. For an on-chip bus with l signal lines, the number of ports is p = 2l and the number of nodes is m = (N + 1)l, where N is the number of segments per line. Parameter N is computed as the ratio of the length of a signal line to the length of a segment. To carry out the scalability tests, we use an internal generator that changes the length or the number of signal lines of the on-chip bus.

[Figure omitted: reduction time (s) vs. number of ports, with curves for TurboMOR and PRIMA.]

Figure 2.3: Reduction time of PRIMA and TurboMOR without partitioning vs number of ports. In both methods, reduction is done to match six moments (q = 3).

Varying Number of Ports, Constant Node-to-Port Ratio

In our first test, we vary the number of signal lines of the on-chip bus, which corresponds to

varying the number of ports. The length of each line is kept fixed at 60 mm, and the segment

length is also fixed at 0.2 mm. The number of segments per line is as a result fixed at N = 300.

The node-to-port ratio is therefore constant and equal to 150.5. Figure 2.3 shows the reduction

time of TurboMOR (without partitioning) and PRIMA versus number of ports. We observe

that TurboMOR scales better than PRIMA. We repeat the same analysis in Figure 2.4 for

TurboMOR with partitioning and SparseRC. Here again we observe that TurboMOR scales

better as the number of ports increases.


[Figure omitted: reduction time (s) vs. number of ports, with curves for TurboMOR with partitioning and SparseRC.]

Figure 2.4: Reduction time of SparseRC and TurboMOR with partitioning vs number of ports. In both methods, reduction is done to match six moments (q = 3).

[Figure omitted: reduction time (s) vs. node-to-port ratio, with curves for TurboMOR and PRIMA.]

Figure 2.5: Reduction time of PRIMA and TurboMOR without partitioning vs the node to port ratio. Both methods match six moments.

Varying Node-to-Port Ratio, Constant Number of Ports

In the second test, we keep the number of signal lines of the on-chip bus constant at 512. This

corresponds to 1024 ports. We vary the length of each signal line while keeping the segment

length fixed at 0.2 mm. This varies the number of nodes in the network, thereby varying the


[Figure omitted: reduction time (s) vs. node-to-port ratio, with curves for TurboMOR with partitioning and SparseRC.]

Figure 2.6: Reduction time of SparseRC and TurboMOR with partitioning vs the node to port ratio. Both methods match six moments.

node-to-port ratio. Figure 2.5 shows the reduction time of TurboMOR (without partitioning)

and PRIMA versus the node-to-port ratio. We observe that beyond a node-to-port ratio of approximately 200, the reduction time of PRIMA increases drastically. In this case,

reduction with PRIMA involves very large and dense matrices. To store these matrices, the

computer resorts to swap memory, a very slow process. TurboMOR, on the other hand, does not

use large and dense matrices for reduction. In intermediate iterations, any matrix products that

would result in a dense matrix are always kept in a factored form. In later iterations, updates

involving a result of such products are computed using the factors. This makes TurboMOR

scale well even for high node-to-port ratios. We repeat the analysis in Figure 2.6 for TurboMOR

with partitioning and SparseRC. We again observe that TurboMOR outperforms SparseRC and

the performance gets better as node-to-port ratio increases.

2.6.3 Efficiency of the Reduced Models

We now study the efficiency of the reduced models from TurboMOR, PRIMA and SparseRC.

To evaluate efficiency, we compare the time it takes to perform a transient simulation on each

reduced model versus the time it takes to perform the same simulation on the original network.

The simulation involves computing time responses at the ports through numerical integration

Chapter 2. TurboMOR: Reduction of RC Networks with Many Ports 45

Examples qPRIMA SparseRC TurboMOR TurboMOR + P

S.T speed S.T Speed S.T Speed S.T Speed

1. Busp = 256

m = 38, 528S.T= 3.00 s

1 0.07 42.86× 0.06 50.00× 0.07 42.86× 0.06 50.00×2 0.76 3.95× 0.80 3.75× 0.24 12.50× 0.60 5.00×3 3.23 0.93× 2.38 1.26× 0.54 5.56× 1.48 2.03×

2. ibmpg1tp = 200

m = 25, 195S.T= 2.26 s

1 0.23 9.83× 0.14 16.14× 0.13 17.38× 0.14 16.14×2 0.76 2.97× 0.32 7.06× 0.33 6.85× 0.27 8.37×3 1.92 1.18× 1.19 1.90× 0.61 3.70× 0.58 3.90×

3. ibmpg2tp = 800

m = 163, 697S.T= 29.84 s

1 4.32 6.91× 1.58 18.89× 1.60 18.65× 1.61 18.53×2 12.98 2.30× 5.94 5.02× 8.11 3.68× 4.78 6.24×3 27.28 1.09× 12.68 2.35× 13.63 2.19× 7.55 3.95×

4. ibmpg2tp = 1200

m = 163, 697S.T= 30.24 s

1 8.58 3.52× 3.56 8.49× 3.67 8.24× 3.56 8.49×2 28.75 1.05× 12.98 2.33× 20.30 1.49× 10.10 2.99×3 62.51 0.48× 28.40 1.06× 34.98 0.86× 16.21 1.87×

5. ibmpg2tp = 1500

m = 163, 697S.T= 30.69 s

1 13.31 2.31× 5.55 5.53× 5.98 5.13× 5.60 5.48×2 45.32 0.68× 20.27 1.51× 35.23 0.87× 15.55 1.97×3 104.12 0.29× 42.93 0.71× 60.40 0.51× 24.21 1.27×

6. ibmpg2tp = 2000

m = 163, 697S.T= 30.80 s

1 23.08 1.33× 9.45 3.26× 10.89 2.83× 9.58 3.22×2 81.00 0.38× 34.84 0.88× 73.17 0.42× 26.80 1.15×3 173.67 0.18× 77.28 0.40× 121.54 0.25× 43.18 0.71×

Table 2.3: Simulation time of the ROMs obtained with the different methods (time in seconds).The acronyms S.T and P stand for simulation time and partitioning, respectively, and speedrepresents speedup factor with respect to original simulation.

of the modified nodal analysis equations. We use the trapezoidal rule, a popular numerical

integration technique. For the power grids, a small number of the ports are terminated with

1.8 V voltage sources, while the rest are terminated with current sources whose waveforms are

pulse trains. For the on-chip bus, the near ends of the signal lines are terminated with 1 V

voltage sources, while the far ends are terminated with capacitive loads.
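The time-stepping scheme just described can be sketched in a few lines. This is our own minimal dense-matrix illustration (not the thesis implementation) of the trapezoidal rule applied to MNA equations of the form C·dx/dt + G·x = B·u(t); the function name and the one-node RC demo are hypothetical, and a production simulator would reuse a sparse LU factorization instead of an explicit inverse.

```python
import numpy as np

def trapezoidal_transient(G, C, B, u_of_t, x0, dt, nsteps):
    """Integrate C*dx/dt + G*x = B*u(t) with the trapezoidal rule.

    Returns the state trajectory, shape (nsteps+1, n). The left-hand
    matrix (C/dt + G/2) is factored once and reused at every step.
    """
    n = G.shape[0]
    A_left = C / dt + G / 2.0
    A_right = C / dt - G / 2.0
    A_inv = np.linalg.inv(A_left)       # small dense demo; use sparse LU in practice
    X = np.zeros((nsteps + 1, n))
    X[0] = x0
    for k in range(nsteps):
        u_avg = 0.5 * (u_of_t(k * dt) + u_of_t((k + 1) * dt))
        rhs = A_right @ X[k] + (B @ u_avg).ravel()
        X[k + 1] = A_inv @ rhs
    return X

# Demo: one-node RC circuit driven by a 1 mA step; the voltage settles to i/g.
g, c = 1e-3, 1e-12                      # 1 kOhm to ground, 1 pF
G = np.array([[g]]); C = np.array([[c]]); B = np.array([[1.0]])
u = lambda t: np.array([1e-3])          # 1 mA current source
X = trapezoidal_transient(G, C, B, u, x0=np.zeros(1), dt=1e-11, nsteps=2000)
print(X[-1, 0])                         # ≈ 1.0 V after many time constants
```

The same recursion applies unchanged to a reduced model; the block-diagonal structure of TurboMOR's matrices is what makes the per-step solves cheap.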

The results in Table 2.3 show the simulation times of reduced models associated with each

method, and also the simulation times of the original networks. The speedups are calculated

with respect to the simulation time of the original network. We observe from Table 2.3 that

for most examples, simulation of the reduced models from all methods is substantially faster


than of the original network. This demonstrates the vital role model order reduction plays in

interconnect modeling. Now let us compare the models from TurboMOR (without partitioning)

and PRIMA. We observe in Table 2.3 that simulations of TurboMOR's models are always significantly faster than PRIMA's, by up to a factor of about 5×. This is attributed to the block-diagonal structure of TurboMOR's reduced models, which makes them sparser than PRIMA's.

The reduced models obtained from PRIMA are always dense. Transient simulations with dense

matrices are computationally costly due to the expensive matrix factorizations, substitutions

and products involved.

From Table 2.3, we also observe that models from the partitioning methods are faster to

simulate, especially for examples 3-6. This is because the fill-ins, introduced during reduction,

only occur in certain blocks of the network. The overall reduced model is sparser and faster to

simulate than a model from PRIMA, or even TurboMOR without partitioning.

Comparing the simulation times of models from TurboMOR with partitioning and SparseRC,

we observe that for two moments matched (q = 1), the reduced models from both methods have

almost the same simulation time. This is because when matching two moments, both methods

employ similar operations to generate the reduced model. In the case of models that match

more than two moments (q = 2 and q = 3), we observe that models from TurboMOR with

partitioning are consistently faster to simulate than SparseRC’s. This is because in TurboMOR

with partitioning, the block-diagonal structure of the reduced model associated with each sub-

network introduces more sparsity in the overall reduced model. SparseRC uses PRIMA to

match additional moments, which introduces some dense blocks in the reduced model.

Efficiency of Reduced Models from Very Large Networks

We also study the simulation time associated with the reduced models of the very large networks

in Table 2.2. These reduced models are generated using TurboMOR with partitioning and

SparseRC. Table 2.4 shows the simulation time of these models and of the original network.

Here too, speedup is calculated with respect to the simulation time of the original network. We

observe in all 3 examples that the reduced models are much faster to simulate than the original

networks.

We now compare the simulation time of reduced models generated by TurboMOR with


| Example | q | SparseRC Sim. time (speedup) | TurboMOR Sim. time (speedup) |
|---|---|---|---|
| 1. ibmpg3t (p = 1,200, m = 1,040,612, Sim. time = 342.9 s) | 1 | 3.1 (110.6×) | 3.1 (110.6×) |
| | 2 | 23.0 (14.9×) | 11.8 (29.1×) |
| | 3 | 354.8* (0.97×) | 13.8 (24.8×) |
| 2. ibmpg4t (p = 920, m = 1,211,064, Sim. time = 418.0 s) | 1 | 2.8 (149.3×) | 2.9 (144.1×) |
| | 2 | 11.8 (35.4×) | 8.0 (52.3×) |
| | 3 | 21.1 (19.8×) | 12.4 (33.7×) |
| 3. ibmpg4t (p = 2,470, m = 1,211,064, Sim. time = 420.3 s) | 1 | 31.5 (13.3×) | 31.6 (13.3×) |
| | 2 | 167.2 (2.5×) | 120.3 (3.5×) |
| | 3 | 417.6 (1.01×) | 265.9 (1.6×) |

Table 2.4: Simulation time of reduced models of very large power grids (in seconds). An asterisk stands for numerical problems.

partitioning and those of SparseRC. We observe in Table 2.4 that for q > 1, models from

TurboMOR with partitioning are always faster to simulate by a factor of up to 2×. In example

1 with q = 3, the reduced model from SparseRC is ill-conditioned, which results in an abnormal

simulation time.

2.6.4 Accuracy

Finally, we demonstrate that TurboMOR is just as accurate as PRIMA when matching the same

number of moments. We also show that matching only two moments at a given frequency point,

which is what SIP [43] does, is not always sufficient for accurately approximating the input-

output behavior of some parasitic networks. For this purpose, we use power grid “ibmpg2t”,

which corresponds to example 3 in Table 2.3. In this power grid, 120 ports are terminated with

1.8 V voltage sources, which correspond to the supply voltage of the network. The remaining

680 ports are terminated with current sources whose waveforms are pulse trains. The current

sources model non-linear blocks that draw current from the power grid. We evaluate the

voltage drop across the current sources. The transient response of the original network and of

the corresponding reduced models, obtained with TurboMOR and PRIMA, are compared.

Figure 2.7 shows the voltage drop at one of the ports of the power grid, computed for

the original network and reduced models from TurboMOR and PRIMA. Both reduced models


Figure 2.7: Transient response of original network and reduced models from TurboMOR and PRIMA. Both reduced models match two moments at DC.

Figure 2.8: Maximum absolute transient error computed across all ports of the original network, due to reduced models from TurboMOR and PRIMA. Both reduced models match two moments at DC.

are obtained by matching two moments. We can observe that the reduced model responses are

identical, but they inaccurately predict the original response. Figure 2.8 shows the worst case

transient error across all power grid ports, for the two reduced models. We observe that it is

the same for both models, but the voltage drop across the power grid ports is underestimated


Figure 2.9: Transient response of original network and reduced models from TurboMOR and PRIMA. Both reduced models match four moments at DC.

Figure 2.10: Maximum absolute transient error computed across all ports of the original network, due to reduced models from TurboMOR and PRIMA. Both reduced models match four moments at DC.

by as much as 40 mV. Matching only two moments is therefore not enough to correctly predict

the response at all ports of the power grid.

In Figure 2.9, we show the response at the same port but this time for the reduced models


that match four moments. We can observe that the responses of the two reduced models from

TurboMOR and PRIMA are identical, and correctly predict the original response. The worst

case transient errors of the two models, in Fig. 2.10, are also identical, and their responses

across all ports deviate from the original response by at most 3 mV. Therefore, by matching

four moments, we can correctly predict the transient response at all ports of the power grid.

TurboMOR can therefore generate models of any level of accuracy, unlike SIP [43].

2.7 Concluding Remarks

We presented in this chapter TurboMOR, a novel reduction technique for RC networks with

many ports. The proposed method achieves accuracy by matching an arbitrary number of

moments. Unlike PRIMA, the explicit computation and storage of large and dense matrices

is avoided. The first two moments are matched using sparse matrix techniques. Additional

moments are matched using efficient Householder reflections. This reduction feature enables the

proposed method to scale well with the network size and number of ports. Graph-partitioning

can be integrated to handle reduction of very large networks. The reduced model obtained has

a block-diagonal structure. This structure makes the reduced model sparse, and accelerates

subsequent simulations. The reduced model is also passive, by construction, and can be easily

converted to an equivalent RC network for seamless integration in electronic design automation

tools. We also showed a novel interpretation of moment matching in terms of system theory.

We tested in this chapter the performance of the proposed method in terms of reduction time,

simulation time and accuracy, through numerical experiments. As the results demonstrated,

the proposed method is superior to existing techniques such as PRIMA, SIP and the state of

the art SparseRC method. TurboMOR can be used to quickly and accurately predict signal

and power integrity issues in next-generation chips.

Chapter 3

On Reduction in Presence of Inductors

The presence of inductors in parasitic networks limits a direct extension of TurboMOR. This is

due to the asymmetric nature of the conductance matrix, which poses a challenge in simultane-

ously eliminating resistive couplings between port nodes and internal nodes, and any inductors

attached to port nodes. As a result, it is challenging to decompose the original RLC network

into two subsystems coupled only dynamically. We can however still employ Gaussian elimina-

tion to efficiently reduce RLC parasitic networks, avoiding the explicit construction of a large

and dense projection matrix or matrix products between large and dense matrices. This is what

is done in SIP [43]. SIP efficiently generates reduced models of RLC networks by

using sparse matrix manipulation techniques. However, the models obtained only match one

moment of the original system at a given frequency point. This level of accuracy is not always

sufficient. One way of improving accuracy would be to match moments at multiple frequency

points [18, 34], which is what the authors of [43] suggest. This approach however could lead

to singular reduced models. Therefore, the efficient reduction of RLC networks is still an open

problem. Currently, RLC networks can only be reduced using “classical” methods like PRIMA,

which suffer from the scalability limitations mentioned in Sec. 1.2.4.

There are several technical challenges in extending SIP and TurboMOR to the RLC case.

One of them, tackled in this chapter, is the possible singularity of the matrix block of G that



must be inverted in order to start the reduction. This step is crucial in all fast reduction

methods that have been recently proposed in the literature (SIP [43], SparseRC [56]) as well as

for the TurboMOR method proposed in Sec. 2.1. For RC circuits, the block to be inverted is

non-singular in most cases. In the few cases where this block is singular, a simple workaround

can be found, as described in Sec. 2.3. For RLC circuits, singularity arises more frequently,

and the lack of symmetry of G rules out a simple extension of the workaround available for

the RC case. In this chapter, we propose using an alternative factorization known as the LDL

decomposition to handle the case of a singular block. This factorization clearly reveals the rows

and columns responsible for the block singularity. Once such rows and columns are identified,

they can be excluded from the reduction. With this procedure, the reduction strategy proposed

in SIP [43] for RLC circuits can be applied to a wider set of cases, and does not break down in the presence of singularities. It is hoped that this contribution will facilitate a complete extension

of the TurboMOR reduction method of Sec. 2.1 to the RLC case.

The rest of the chapter is organized as follows. In Section 3.1, we first show how RLC

networks can be reduced using Gaussian elimination. Then, this idea will be linked to the

SIP method, and we will prove that this reduction matches one moment. Finally, we describe

the proposed LDL method. Section 3.2 shows numerical results comparing the performance, in

terms of accuracy and speed, of the LDL method, SIP and PRIMA. In Section 3.3, we conclude.

3.1 Theory

Let us consider a passive RLC network in impedance representation with p ports, l inductors,

and of order m. The modified nodal analysis equations (1.1) can be written such that port


nodes come first, then internal nodes, and lastly the inductor currents:

$$
\begin{bmatrix} G_{11} & G_{12} & G_{13} \\ G_{12}^T & G_{22} & G_{23} \\ -G_{13}^T & -G_{23}^T & 0 \end{bmatrix}
\begin{bmatrix} x_p \\ x_i \\ x_l \end{bmatrix}
+ s
\begin{bmatrix} C_{11} & C_{12} & 0 \\ C_{12}^T & C_{22} & 0 \\ 0 & 0 & D_L \end{bmatrix}
\begin{bmatrix} x_p \\ x_i \\ x_l \end{bmatrix}
=
\begin{bmatrix} B_1 \\ 0 \\ 0 \end{bmatrix} u \tag{3.1a}
$$
$$
y = \begin{bmatrix} B_1^T & 0 & 0 \end{bmatrix}
\begin{bmatrix} x_p \\ x_i \\ x_l \end{bmatrix} \tag{3.1b}
$$

The conductance matrix $G \in \mathbb{R}^{m\times m}$ is asymmetric and indefinite, while matrix $C \in \mathbb{R}^{m\times m}$ is symmetric and non-negative definite. Vectors $x_p \in \mathbb{R}^p$, $x_i \in \mathbb{R}^{m-(p+l)}$ and $x_l \in \mathbb{R}^l$ collect the port voltages, internal node voltages and inductor currents, respectively, and submatrix $D_L \in \mathbb{R}^{l\times l}$ collects the inductance values of the inductors. The goal of all reduction strategies that will be discussed is to eliminate all nodes except the port ones. With this in mind, we group together the equations and unknowns that will be eliminated, i.e. the second and third rows in (3.1a)-(3.1b). This results in the compact form

$$
\begin{bmatrix} G_1 & G_2 \\ G_3 & G_4 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
+ s
\begin{bmatrix} C_1 & C_2 \\ C_2^T & C_4 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
=
\begin{bmatrix} B_1 \\ 0 \end{bmatrix} u \tag{3.2a}
$$
$$
y = \begin{bmatrix} B_1^T & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \tag{3.2b}
$$

where vector $x_1 \in \mathbb{R}^p$ corresponds to vector $x_p$ in (3.1a)-(3.1b), and $x_2 \in \mathbb{R}^{m-p}$ comprises vectors $x_i$ and $x_l$ in (3.1a)-(3.1b). Matrices $G_1$, $G_2$, $G_3$ and $G_4$ are written out explicitly below, using (3.1a), to help with discussions in later sections.

$$G_1 = G_{11} \tag{3.3}$$
$$G_2 = \begin{bmatrix} G_{12} & G_{13} \end{bmatrix} \tag{3.4}$$
$$G_3 = \begin{bmatrix} G_{12}^T \\ -G_{13}^T \end{bmatrix} \tag{3.5}$$
$$G_4 = \begin{bmatrix} G_{22} & G_{23} \\ -G_{23}^T & 0 \end{bmatrix} \tag{3.6}$$

3.1.1 Reducing RLC Networks with SIP

Matrices G2 and G3, in equations (3.2a)-(3.2b), represent the resistive and inductive couplings

between port nodes (to be kept) and internal nodes (to be eliminated). We would like to apply

Gaussian elimination to eliminate the coupling block G3, so as to decompose the original system into two

subsystems decoupled at DC. By applying a congruence transform to (3.2a)-(3.2b), with the

congruence matrix

$$W = \begin{bmatrix} I & 0 \\ -G_4^{-1}G_3 & I \end{bmatrix} \tag{3.7}$$

we obtain

$$
W^T\begin{bmatrix} G_1 & G_2 \\ G_3 & G_4 \end{bmatrix}W
\begin{bmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{bmatrix}
+ sW^T\begin{bmatrix} C_1 & C_2 \\ C_2^T & C_4 \end{bmatrix}W
\begin{bmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{bmatrix}
= W^T\begin{bmatrix} B_1 \\ 0 \end{bmatrix}u \tag{3.8a}
$$
$$
y = \begin{bmatrix} B_1^T & 0 \end{bmatrix}W\begin{bmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{bmatrix} \tag{3.8b}
$$


which simplifies, through matrix multiplication, to

$$
\begin{bmatrix} \tilde{G}_1 & \tilde{G}_2 \\ 0 & G_4 \end{bmatrix}
\begin{bmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{bmatrix}
+ s\begin{bmatrix} \tilde{C}_1 & \tilde{C}_2 \\ \tilde{C}_2^T & C_4 \end{bmatrix}
\begin{bmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{bmatrix}
= \begin{bmatrix} B_1 \\ 0 \end{bmatrix}u \tag{3.9a}
$$
$$
y = \begin{bmatrix} B_1^T & 0 \end{bmatrix}\begin{bmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{bmatrix} \tag{3.9b}
$$

where
$$\tilde{G}_1 = G_1 - G_2G_4^{-1}G_3 \tag{3.10}$$
$$\tilde{C}_1 = C_1 - C_2G_4^{-1}G_3 - G_3^TG_4^{-T}C_2^T + G_3^TG_4^{-T}C_4G_4^{-1}G_3 \tag{3.11}$$
$$\tilde{G}_2 = G_2 - G_3^TG_4^{-T}G_4 \tag{3.12}$$
$$\tilde{C}_2 = C_2 - G_3^TG_4^{-T}C_4 \tag{3.13}$$
$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = W\begin{bmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{bmatrix} \tag{3.14}$$
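The block elimination performed by the congruence transform (3.7) can be checked numerically on a small random system. The sketch below is our own (toy dimensions, random data): it verifies that $W^TGW$ has a zero lower-left block and that the transformed blocks agree with the closed forms (3.10)-(3.13).

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 2, 6                                   # toy port count and order
G = rng.standard_normal((n, n))               # asymmetric, like an RLC conductance
C = rng.standard_normal((n, n)); C = C @ C.T  # symmetric C, as in (3.2a)

G1, G2 = G[:p, :p], G[:p, p:]
G3, G4 = G[p:, :p], G[p:, p:]
C1, C2, C4 = C[:p, :p], C[:p, p:], C[p:, p:]

# Congruence matrix of (3.7)
W = np.eye(n)
W[p:, :p] = -np.linalg.solve(G4, G3)

Gt, Ct = W.T @ G @ W, W.T @ C @ W
iG4 = np.linalg.inv(G4)                       # fine at this toy size

assert np.allclose(Gt[p:, :p], 0)             # lower-left block eliminated, cf. (3.9a)
assert np.allclose(Gt[:p, :p], G1 - G2 @ iG4 @ G3)                     # (3.10)
assert np.allclose(Gt[:p, p:], G2 - G3.T @ iG4.T @ G4)                 # (3.12)
assert np.allclose(Ct[:p, p:], C2 - G3.T @ iG4.T @ C4)                 # (3.13)
assert np.allclose(Ct[:p, :p],
                   C1 - C2 @ iG4 @ G3 - G3.T @ iG4.T @ C2.T
                   + G3.T @ iG4.T @ C4 @ iG4 @ G3)                     # (3.11)
print("congruence checks passed")
```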

The transformed system (3.9a)-(3.9b) can be interpreted as a cascade of two subsystems. A

first subsystem Σ1 of order p

$$
\Sigma_1:\;
\begin{cases}
\tilde{G}_1\tilde{x}_1 + s\tilde{C}_1\tilde{x}_1 = u_1 - \tilde{G}_2\tilde{x}_2 + B_1u \\
y = B_1^T\tilde{x}_1
\end{cases} \tag{3.15}
$$

and a second subsystem $\Sigma_2$ of order $m - p$

$$
\Sigma_2:\;
\begin{cases}
G_4\tilde{x}_2 + sC_4\tilde{x}_2 = -\tilde{C}_2^Tu_2 \\
y_2 = -\tilde{C}_2\tilde{x}_2
\end{cases} \tag{3.16}
$$

where
$$u_1 = sy_2 \tag{3.17}$$
$$u_2 = s\tilde{x}_1 \tag{3.18}$$


Figure 3.1: A decomposition of system (3.9a)-(3.9b) into a cascade of two subsystems, Σ1 and Σ2. At DC, subsystem Σ2 has no influence on the transfer function of (3.9a)-(3.9b).

The two subsystems are decoupled at DC, as can be seen in Fig. 3.1. At DC ($s = 0$), the second subsystem is not excited, since its input $u_2 = s\tilde{x}_1$ is zero. The output $y_2$ of (3.16) will also be zero, since the system is stable by construction. Therefore, at DC, the second subsystem has no influence on the relation between port currents $u$ and port voltages $y$. This fact can also be seen from (3.9a) using the concept of controllability from control theory. At DC, the term

in s in (3.9a) vanishes, and we have

$$
\begin{bmatrix} \tilde{G}_1 & \tilde{G}_2 \\ 0 & G_4 \end{bmatrix}
\begin{bmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{bmatrix}
= \begin{bmatrix} B_1 \\ 0 \end{bmatrix}u \tag{3.19}
$$

The state variables $\tilde{x}_2$ cannot be excited directly by the system input $u$, because of the zero block below $B_1$ in the right-hand side. They also cannot be excited indirectly through $\tilde{x}_1$, because of the zero block in the second row of the conductance matrix. In such a case, the state $\tilde{x}_2$ is called non-controllable, and has no influence on the transfer function of the system (3.9a)-(3.9b), i.e. on the impedance matrix of the network relating port currents $u$ and port voltages $y$. Therefore, the state variables $\tilde{x}_2$ can be eliminated at DC without introducing any error, obtaining a reduced model

$$
\tilde{G}_1\tilde{x}_1 + s\tilde{C}_1\tilde{x}_1 = B_1u, \qquad y = B_1^T\tilde{x}_1 \tag{3.20}
$$


of order p for the original system. The approximation error introduced with this reduction

progressively grows as frequency increases. The reduced model (3.20) is thus a “low-pass type”

approximation of the original system (3.9a)-(3.9b). This model matches the first moment of

the original network at DC. The proof is given next. The reduced model is also passive due to

use of congruence transformations. The reduced matrices $\tilde{G}_1$ and $\tilde{C}_1$ are computed efficiently

from (3.10) and (3.11), respectively, using sparse matrix techniques.

Proof of Moment Matching

The goal here is to prove that the first moment
$$\hat{M}_0 = \hat{B}^T\hat{G}^{-1}\hat{B} \tag{3.21}$$
of the reduced model (3.20) from SIP is equal to the first moment
$$M_0 = B^TG^{-1}B \tag{3.22}$$
of the original network (3.9a)-(3.9b). These expressions for the moments are obtained from (1.16) and (1.19), respectively. Substituting the reduced model matrices of (3.20) into (3.21), we obtain
$$\hat{M}_0 = B_1^T\tilde{G}_1^{-1}B_1 \tag{3.23}$$

Let us now substitute the original network matrices of (3.9a)-(3.9b) into (3.22). We obtain
$$
M_0 = \begin{bmatrix} B_1^T & 0 \end{bmatrix}
\begin{bmatrix} \tilde{G}_1 & \tilde{G}_2 \\ 0 & G_4 \end{bmatrix}^{-1}
\begin{bmatrix} B_1 \\ 0 \end{bmatrix} \tag{3.24}
$$
After computing the inverse of the block upper-triangular matrix $G$, (3.24) becomes
$$
M_0 = \begin{bmatrix} B_1^T & 0 \end{bmatrix}
\begin{bmatrix} \tilde{G}_1^{-1} & -\tilde{G}_1^{-1}\tilde{G}_2G_4^{-1} \\ 0 & G_4^{-1} \end{bmatrix}
\begin{bmatrix} B_1 \\ 0 \end{bmatrix} \tag{3.25}
$$
which simplifies, through matrix multiplication, to
$$M_0 = B_1^T\tilde{G}_1^{-1}B_1 \tag{3.26}$$

The moment $\hat{M}_0$ in (3.23) thus matches $M_0$. SIP therefore matches the first moment of an RLC network at DC. The reduced model, of dimension $p$, is the same size as a model from PRIMA that also matches the first moment at DC. The model from SIP, however, is computed efficiently, avoiding the dense matrix manipulations that characterize PRIMA.
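The block-triangular argument behind (3.24)-(3.26) is easy to confirm numerically. This small self-contained check (our own, with random toy blocks) shows that the full-system moment equals the reduced one whenever the conductance matrix is block upper triangular:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 2, 6                                   # toy sizes
Gt1 = rng.standard_normal((p, p))             # Gtilde_1 block of (3.9a)
Gt2 = rng.standard_normal((p, n - p))
G4  = rng.standard_normal((n - p, n - p))
B1  = rng.standard_normal((p, p))

G = np.block([[Gt1, Gt2],
              [np.zeros((n - p, p)), G4]])    # block upper triangular, as in (3.19)
B = np.vstack([B1, np.zeros((n - p, p))])

M0_full = B.T @ np.linalg.solve(G, B)         # first moment of the full system, (3.22)
M0_red  = B1.T @ np.linalg.solve(Gt1, B1)     # first moment of the reduced model, (3.23)
assert np.allclose(M0_full, M0_red)
print("first moments match")
```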

On the Singularity of G4

The reduction in SIP assumes that submatrix $G_4$ in (3.2a) is invertible. In the case of RLC networks, SIP computes the matrix products in (3.10) and (3.11) that involve $G_4^{-1}$ by first computing the LU decomposition
$$G_4 = LU \tag{3.27}$$
where matrices $L, U \in \mathbb{R}^{(m-p)\times(m-p)}$ are lower and upper triangular, respectively. Then, forward and backward substitutions are used in (3.10) and (3.11) to obtain the reduced matrices $\tilde{G}_1$ and $\tilde{C}_1$, respectively, as shown below:
$$\tilde{G}_1 = G_1 - G_2U^{-1}L^{-1}G_3 \tag{3.28}$$
$$\tilde{C}_1 = C_1 - C_2U^{-1}L^{-1}G_3 - G_3^TL^{-T}U^{-T}C_2^T + G_3^TL^{-T}U^{-T}C_4U^{-1}L^{-1}G_3 \tag{3.29}$$
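For a sparse $G_4$, the forward/backward substitutions in (3.28)-(3.29) map directly onto a sparse LU object. The sketch below is our own illustration (random toy matrix, not a real netlist): it computes the Schur complement (3.28) through sparse triangular solves and cross-checks it against the dense formula.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

p, n = 2, 50                                  # toy sizes
A = (sp.random(n, n, density=0.1, random_state=2) + sp.eye(n)).tocsc()
G1 = A[:p, :p].toarray(); G2 = A[:p, p:].toarray()
G3 = A[p:, :p].toarray(); G4 = A[p:, p:].tocsc()

lu = splu(G4)                                 # sparse LU of G4, as in (3.27)
G1_red = G1 - G2 @ lu.solve(G3)               # Schur complement (3.28) via fwd/bwd substitution

# Cross-check against the dense formula
assert np.allclose(G1_red, G1 - G2 @ np.linalg.solve(G4.toarray(), G3))
print("sparse and dense Schur complements agree")
```

The same `lu.solve` calls evaluate every $G_4^{-1}(\cdot)$ and $G_4^{-T}(\cdot)$ product appearing in (3.29), so $G_4^{-1}$ is never formed explicitly.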

If matrix $G_4$ is singular or ill-conditioned, the forward/backward substitutions in (3.28) and (3.29) cannot be performed, since there will be zeros or very small elements on the diagonal of matrix $U$, and we cannot divide by them. We show that $G_4$ can be singular through a simple circuit example, inspired by a practical scenario that arises during parasitic extraction. The circuit is depicted in Fig. 3.2 and consists of an upper portion and a lower portion. The upper portion describes, with a simple π model, a short segment of interconnect, with two ports at the ends. With the two current sources connected at both ends, modified nodal analysis leads to a model for the circuit impedance. The lower portion of the circuit models another interconnect or metallic object nearby, with its own parasitics.


Figure 3.2: Sample RLC circuit that demonstrates that the G4 block in (3.2a) can be singular.

Although this interconnect is not directly excited, it affects signal propagation along the upper

interconnect, through capacitive couplings described by c14 and c36. Inductive coupling could

also be introduced.

For this circuit, the G matrix of (3.2a) reads
$$
G = \begin{bmatrix}
g_{12} & -g_{12} & & & & & & \\
-g_{12} & g_{12} & & & & & +1 & \\
 & & & & & & -1 & \\
 & & & g_{45} & -g_{45} & & & \\
 & & & -g_{45} & g_{45} & & & +1 \\
 & & & & & & & -1 \\
 & -1 & +1 & & & & & \\
 & & & & -1 & +1 & &
\end{bmatrix} \tag{3.30}
$$
where zero entries are omitted, and rows/columns are ordered as nodes 1-6 followed by the currents of inductors $l_{23}$ and $l_{56}$.

The lower-right corner block of this matrix corresponds to block $G_4$ in (3.2a). As can be verified by direct calculation of its determinant, this block is singular. The SIP reduction process for RLC circuits [43] does not provide a way of handling singularities in $G_4$, and therefore breaks down in this case. Since this scenario can occur during parasitic extraction, we devise a method to identify and extract the singular part of $G_4$, so that the reduction can proceed. The


proposed method makes use of the LDL decomposition [44] of a real and symmetric matrix,

which we discuss next.
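Before turning to the LDL decomposition, the singularity claim for (3.30) can be checked numerically. The sketch below is our own: it uses $g_{12} = g_{45} = 1$ (the values are immaterial), orders rows/columns as nodes 1-6 followed by the two inductor currents, and takes the trailing 6-by-6 block as the part to be eliminated.

```python
import numpy as np

g12, g45 = 1.0, 1.0                       # conductance values are immaterial here

# G of (3.30): nodes 1..6, then the currents of inductors l23 and l56
G = np.zeros((8, 8))
G[0, 0], G[0, 1] = g12, -g12              # node 1
G[1, 0], G[1, 1], G[1, 6] = -g12, g12, 1  # node 2 (carries i_l23)
G[2, 6] = -1                              # node 3
G[3, 3], G[3, 4] = g45, -g45              # node 4
G[4, 3], G[4, 4], G[4, 7] = -g45, g45, 1  # node 5 (carries i_l56)
G[5, 7] = -1                              # node 6
G[6, 1], G[6, 2] = -1, 1                  # inductor row for l23 (the -G13^T pattern)
G[7, 4], G[7, 5] = -1, 1                  # inductor row for l56

G4 = G[2:, 2:]                            # 6x6 block to be eliminated (p = 2 kept rows)
print(np.linalg.matrix_rank(G4))          # 5 < 6: the block is rank deficient
```

The rank deficiency comes from the fact that the rows of nodes 4, 5 and 6 inside the block are linearly dependent, which is exactly the situation the LDL-based procedure is designed to detect.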

3.1.2 The Proposed LDL Method

For any real and symmetric matrix $A \in \mathbb{R}^{n\times n}$, there exists a permutation matrix $P \in \mathbb{R}^{n\times n}$ such that
$$P^TAP = LDL^T \tag{3.31}$$
where $L \in \mathbb{R}^{n\times n}$ is a lower triangular matrix with ones on its main diagonal, and $D \in \mathbb{R}^{n\times n}$ is a block-diagonal matrix comprising 1-by-1 and 2-by-2 blocks on its diagonal. This factorization is known as the LDL decomposition [44] of $A$. The matrix $D$ can be constructed in such a way that all the singular or ill-conditioned 1-by-1 and 2-by-2 blocks, which would make $D$ singular or ill-conditioned, are placed at the end of $D$. We can thus write $D$ as
$$D = \begin{bmatrix} D^a & 0 \\ 0 & D^b \end{bmatrix} \tag{3.32}$$
where submatrix $D^a \in \mathbb{R}^{r\times r}$ is an invertible block-diagonal matrix of 1-by-1 and 2-by-2 blocks, and $D^b \in \mathbb{R}^{(n-r)\times(n-r)}$ contains all the singular or ill-conditioned blocks. The rank of matrix $D$ is denoted by $r$. Our goal is to use the LDL decomposition and the structure of $D$ to reveal any singularities or ill-conditioning in the block $G_4$ in (3.2a).
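In Python, a comparable factorization is available as `scipy.linalg.ldl` (Bunch-Kaufman pivoting). Note that, unlike the MA57-based routine used later in the numerical experiments, SciPy does not guarantee that singular blocks are sorted to the end of $D$, so a scan over the blocks is still required. A minimal sketch of the factorization itself:

```python
import numpy as np
from scipy.linalg import ldl

# A real, symmetric, indefinite matrix (eigenvalues 2, -1, -1)
A = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])

lu, d, perm = ldl(A, lower=True)      # factorization of (3.31): P^T A P = L D L^T
assert np.allclose(lu @ d @ lu.T, A)  # reconstruction check
# D is block diagonal with 1x1 and 2x2 blocks, hence at most tridiagonal:
assert np.allclose(d, np.tril(np.triu(d, -1), 1))
```

Because the returned factor is unit triangular up to the permutation, any rank deficiency of $A$ shows up directly as singular blocks of $D$, which is what the proposed method exploits.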

From (3.6), matrix $G_4$ is real and asymmetric. In order to apply the LDL decomposition, we must first turn $G_4$ into a real and symmetric matrix. We can accomplish this by using the matrix
$$J = \begin{bmatrix} I_{m-(p+l)} & 0 \\ 0 & -I_l \end{bmatrix} \tag{3.33}$$
where $I_{m-(p+l)} \in \mathbb{R}^{(m-(p+l))\times(m-(p+l))}$ and $I_l \in \mathbb{R}^{l\times l}$ are identity matrices, such that
$$JG_4 = \begin{bmatrix} G_{22} & G_{23} \\ G_{23}^T & 0 \end{bmatrix} \tag{3.34}$$


The new matrix $JG_4$ is real and symmetric, and we can apply the LDL decomposition, obtaining
$$P^TJG_4P = LDL^T \tag{3.35}$$
The question at this point is what kind of transformation we can apply to the original network (3.2a)-(3.2b) so as to easily identify the singular or ill-conditioned rows/columns in $G_4$. We would like to use a congruence transformation, since it will preserve passivity in the transformed system. To answer this, let us rewrite (3.35) as
$$G_4 = JPLDL^TP^T \tag{3.36}$$
using the fact that the permutation matrix $P$ and the diagonal matrix $J$ are orthogonal, and $J$ is symmetric. Notice that if we premultiply both sides of (3.36) by $P^TJ$ and postmultiply by $JP$, we get
$$P^TJG_4JP = LDL^T\bar{J} \tag{3.37}$$
where $\bar{J} = P^TJP$ is a permutation of the diagonal matrix $J$. Using (3.32), we can partition (3.37)

as
$$
\begin{bmatrix} G_4^a & G_4^{ab} \\ G_4^{ba} & G_4^b \end{bmatrix}
=
\begin{bmatrix} L^a & 0 \\ L^{ba} & L^b \end{bmatrix}
\begin{bmatrix} D^a & 0 \\ 0 & D^b \end{bmatrix}
\begin{bmatrix} (L^a)^T & (L^{ba})^T \\ 0 & (L^b)^T \end{bmatrix}
\begin{bmatrix} \bar{J}^a & 0 \\ 0 & \bar{J}^b \end{bmatrix} \tag{3.38}
$$

where submatrices $L^a \in \mathbb{R}^{r\times r}$ and $L^b \in \mathbb{R}^{(m-p-r)\times(m-p-r)}$ are unit lower triangular matrices, and $\bar{J}^a \in \mathbb{R}^{r\times r}$ and $\bar{J}^b \in \mathbb{R}^{(m-p-r)\times(m-p-r)}$ are diagonal matrices with $+1$ and/or $-1$ on the main diagonal. Equation (3.38) simplifies, through matrix products, to
$$
\begin{bmatrix} G_4^a & G_4^{ab} \\ G_4^{ba} & G_4^b \end{bmatrix}
=
\begin{bmatrix} L^aD^a(L^a)^T\bar{J}^a & L^aD^a(L^{ba})^T\bar{J}^b \\ L^{ba}D^a(L^a)^T\bar{J}^a & \left(L^{ba}D^a(L^{ba})^T + L^bD^b(L^b)^T\right)\bar{J}^b \end{bmatrix} \tag{3.39}
$$

Submatrix $G_4^a \in \mathbb{R}^{r\times r}$ is invertible, and corresponds to the factored form $L^aD^a(L^a)^T\bar{J}^a$. Therefore, using the LDL decomposition of $JG_4$, we can reveal the rows and columns that cannot be eliminated during the reduction, and extract a non-singular block that can be safely used for


Gaussian elimination. We can now apply a congruence to the original network (3.2a)-(3.2b),

using matrix
$$V = \begin{bmatrix} I_p & 0 \\ 0 & JP \end{bmatrix} \tag{3.40}$$

to obtain
$$
V^T\begin{bmatrix} G_1 & G_2 \\ G_3 & G_4 \end{bmatrix}V
\begin{bmatrix} \bar{x}_1 \\ \bar{x}_2 \end{bmatrix}
+ sV^T\begin{bmatrix} C_1 & C_2 \\ C_2^T & C_4 \end{bmatrix}V
\begin{bmatrix} \bar{x}_1 \\ \bar{x}_2 \end{bmatrix}
= V^T\begin{bmatrix} B_1 \\ 0 \end{bmatrix}u \tag{3.41a}
$$
$$
y = \begin{bmatrix} B_1^T & 0 \end{bmatrix}V\begin{bmatrix} \bar{x}_1 \\ \bar{x}_2 \end{bmatrix} \tag{3.41b}
$$

which simplifies, through matrix products, to
$$
\begin{bmatrix} G_1 & G_2JP \\ P^TJG_3 & P^TJG_4JP \end{bmatrix}
\begin{bmatrix} \bar{x}_1 \\ \bar{x}_2 \end{bmatrix}
+ s\begin{bmatrix} C_1 & C_2JP \\ P^TJC_2^T & P^TJC_4JP \end{bmatrix}
\begin{bmatrix} \bar{x}_1 \\ \bar{x}_2 \end{bmatrix}
= \begin{bmatrix} B_1 \\ 0 \end{bmatrix}u \tag{3.42a}
$$
$$
y = \begin{bmatrix} B_1^T & 0 \end{bmatrix}\begin{bmatrix} \bar{x}_1 \\ \bar{x}_2 \end{bmatrix} \tag{3.42b}
$$

where
$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = V\begin{bmatrix} \bar{x}_1 \\ \bar{x}_2 \end{bmatrix} \tag{3.43}$$

By reordering the states in (3.42a)-(3.42b) such that the states corresponding to the invertible part of $P^TJG_4JP$ come last, we get
$$
\begin{bmatrix} G_1 & G_2^b & G_2^a \\ G_3^b & G_4^b & G_4^{ba} \\ G_3^a & G_4^{ab} & G_4^a \end{bmatrix}
\begin{bmatrix} \bar{x}_1 \\ \bar{x}_2^b \\ \bar{x}_2^a \end{bmatrix}
+ s\begin{bmatrix} C_1 & C_2^b & C_2^a \\ (C_2^b)^T & C_4^b & (C_4^{ab})^T \\ (C_2^a)^T & C_4^{ab} & C_4^a \end{bmatrix}
\begin{bmatrix} \bar{x}_1 \\ \bar{x}_2^b \\ \bar{x}_2^a \end{bmatrix}
= \begin{bmatrix} B_1 \\ 0 \\ 0 \end{bmatrix}u \tag{3.44a}
$$
$$
y = \begin{bmatrix} B_1^T & 0 & 0 \end{bmatrix}
\begin{bmatrix} \bar{x}_1 \\ \bar{x}_2^b \\ \bar{x}_2^a \end{bmatrix} \tag{3.44b}
$$


Equations (3.44a)-(3.44b) can be written in compact form as
$$
\begin{bmatrix} G_1' & G_2' \\ G_3' & G_4' \end{bmatrix}
\begin{bmatrix} x_1' \\ x_2' \end{bmatrix}
+ s\begin{bmatrix} C_1' & C_2' \\ (C_2')^T & C_4' \end{bmatrix}
\begin{bmatrix} x_1' \\ x_2' \end{bmatrix}
= \begin{bmatrix} B_1' \\ 0 \end{bmatrix}u \tag{3.45a}
$$
$$
y = \begin{bmatrix} (B_1')^T & 0 \end{bmatrix}\begin{bmatrix} x_1' \\ x_2' \end{bmatrix} \tag{3.45b}
$$

where
$$G_1' = \begin{bmatrix} G_1 & G_2^b \\ G_3^b & G_4^b \end{bmatrix} \tag{3.46}$$
$$G_2' = \begin{bmatrix} G_2^a \\ G_4^{ba} \end{bmatrix} \tag{3.47}$$
$$G_3' = \begin{bmatrix} G_3^a & G_4^{ab} \end{bmatrix} \tag{3.48}$$
$$G_4' = G_4^a \tag{3.49}$$
$$C_1' = \begin{bmatrix} C_1 & C_2^b \\ (C_2^b)^T & C_4^b \end{bmatrix} \tag{3.50}$$
$$C_2' = \begin{bmatrix} C_2^a \\ (C_4^{ab})^T \end{bmatrix} \tag{3.51}$$
$$C_4' = C_4^a \tag{3.52}$$
$$B_1' = \begin{bmatrix} B_1 \\ 0 \end{bmatrix} \tag{3.53}$$

Due to the use of full rank congruence matrices, the transformation from original network (3.2a)-

(3.2b) to system (3.45a)-(3.45b) does not change the transfer function of the original network,

or its moments. Therefore, by matching the moments of (3.45a)-(3.45b), we will also match the moments of the original system.

At this point, we can perform the order reduction. We apply Gaussian elimination to sys-

tem (3.45a)-(3.45b) to eliminate the coupling term G′3, similarly to what was done in Sec. 3.1.1.


The pivoting matrix $G_4'$ is invertible by construction, and is already available in the factored form $L^aD^a(L^a)^T\bar{J}^a$ as a by-product of the LDL decomposition that was performed to extract the singularity. We perform Gaussian elimination by applying a congruence transform to (3.45a)-(3.45b), using the matrix
$$V_2 = \begin{bmatrix} I_{m-r} & 0 \\ -(G_4')^{-1}G_3' & I_r \end{bmatrix} \tag{3.54}$$
where $I_{m-r} \in \mathbb{R}^{(m-r)\times(m-r)}$ and $I_r \in \mathbb{R}^{r\times r}$ are identity matrices, to obtain

$$
V_2^T\begin{bmatrix} G_1' & G_2' \\ G_3' & G_4' \end{bmatrix}V_2
\begin{bmatrix} x_1' \\ x_2'' \end{bmatrix}
+ sV_2^T\begin{bmatrix} C_1' & C_2' \\ (C_2')^T & C_4' \end{bmatrix}V_2
\begin{bmatrix} x_1' \\ x_2'' \end{bmatrix}
= V_2^T\begin{bmatrix} B_1' \\ 0 \end{bmatrix}u \tag{3.55a}
$$
$$
y = \begin{bmatrix} (B_1')^T & 0 \end{bmatrix}V_2\begin{bmatrix} x_1' \\ x_2'' \end{bmatrix} \tag{3.55b}
$$

Through matrix multiplication, (3.55a) and (3.55b) simplify to
$$
\begin{bmatrix} G_1'' & G_2'' \\ 0 & G_4' \end{bmatrix}
\begin{bmatrix} x_1' \\ x_2'' \end{bmatrix}
+ s\begin{bmatrix} C_1'' & C_2'' \\ (C_2'')^T & C_4' \end{bmatrix}
\begin{bmatrix} x_1' \\ x_2'' \end{bmatrix}
= \begin{bmatrix} B_1' \\ 0 \end{bmatrix}u \tag{3.56a}
$$
$$
y = \begin{bmatrix} (B_1')^T & 0 \end{bmatrix}\begin{bmatrix} x_1' \\ x_2'' \end{bmatrix} \tag{3.56b}
$$

where
$$G_1'' = G_1' - G_2'(G_4')^{-1}G_3' \tag{3.57}$$
$$C_1'' = C_1' - C_2'(G_4')^{-1}G_3' - (G_3')^T(G_4')^{-T}(C_2')^T + (G_3')^T(G_4')^{-T}C_4'(G_4')^{-1}G_3' \tag{3.58}$$
$$G_2'' = G_2' - (G_3')^T(G_4')^{-T}G_4' \tag{3.59}$$
$$C_2'' = C_2' - (G_3')^T(G_4')^{-T}C_4' \tag{3.60}$$
$$\begin{bmatrix} x_1' \\ x_2' \end{bmatrix} = V_2\begin{bmatrix} x_1' \\ x_2'' \end{bmatrix} \tag{3.61}$$

Using a system interpretation similar to that of (3.9a)-(3.9b), shown in Fig. 3.1, the state variables in $x_2''$ do not influence the transfer function of (3.56a)-(3.56b) at DC. We can therefore match the first moment of the original network using the reduced model
$$
G_1''x_1' + sC_1''x_1' = B_1'u, \qquad y = (B_1')^Tx_1' \tag{3.62}
$$
of order $m - r$. This model is passive, since we only employ congruence transformations. The reduced matrices $G_1''$ and $C_1''$ are computed using (3.57) and (3.58), respectively. Matrix products involving $(G_4')^{-1}$ or $(G_4')^{-T}$ in (3.57)-(3.58) are computed using the factorization $G_4' = L^aD^a(L^a)^T\bar{J}^a$. Note that if there are no singularities in the block $G_4$ in (3.2a), the reduced model (3.62) is of order $p$, just like the reduced model (3.20) from the SIP method.

Practical Implementation Details

The following practical implementation details make the proposed LDL method computationally efficient and competitive with SIP [43]:

1. The transformation of the original network (3.2a)-(3.2b) to system (3.42a)-(3.42b), using the congruence matrix (3.40), is essentially equivalent to a few sign changes in the conductance matrix $G$ in (3.2a). This operation is cheap to perform.

2. To identify the singular rows/columns, we first compute the singular values of the 1-by-1 and 2-by-2 blocks of the block-diagonal matrix $D$ in (3.35). For the 1-by-1 blocks, the singular values correspond to their magnitudes. Given that the 2-by-2 blocks are real and symmetric, their singular values correspond to the magnitudes of their eigenvalues. We then search for the first block (1-by-1 or 2-by-2) that satisfies the test
$$\gamma < \sigma\gamma_{\max} \tag{3.63}$$
where $\gamma$ denotes a singular value of a given block, $\gamma_{\max}$ is the maximum singular value among all blocks, and $\sigma$ is a threshold value between 0 and 1. In our tests, we used $\sigma = 10^{-4}$. The first block to satisfy (3.63) marks the starting point of singularity or ill-conditioning in $D$. All the blocks above this block form the invertible part of $D$, while the rest, including this block, constitute the singular part.

3. Once we have identified the invertible part of $D$, we only need to permute rows and columns in (3.42a)-(3.42b) to obtain (3.45a)-(3.45b). At this point, we can compute the reduced model matrices in (3.62) using equations (3.57) and (3.58). These equations involve only sparse matrix manipulations with the sparse factors of matrix $G_4'$, and so can be performed efficiently in terms of both computational cost and memory consumption.
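The scan of step 2 can be sketched as a short function. This is our own illustration (the function name and demo blocks are hypothetical): it assumes the 1-by-1 and 2-by-2 blocks of D are given in the order the factorization returns them, and applies the test (3.63) with σ = 10⁻⁴ as in the text.

```python
import numpy as np

def split_D(D_blocks, sigma=1e-4):
    """Find where the singular/ill-conditioned part of D starts, per (3.63).

    D_blocks: list of 1x1 or 2x2 real symmetric numpy arrays, in factor order.
    Returns the index of the first block failing the test (len(D_blocks) if none).
    """
    # Singular values: |value| for 1x1 blocks; for real symmetric 2x2 blocks,
    # the singular values are the magnitudes of the eigenvalues.
    svals = [np.abs(np.linalg.eigvalsh(blk)) for blk in D_blocks]
    gamma_max = max(s.max() for s in svals)
    for i, s in enumerate(svals):
        if s.min() < sigma * gamma_max:   # first block satisfying (3.63)
            return i
    return len(D_blocks)

blocks = [np.array([[4.0]]),
          np.array([[0.0, 1.0], [1.0, 0.0]]),   # indefinite but well-conditioned
          np.array([[1e-12]])]                  # numerically singular 1x1 block
print(split_D(blocks))                          # → 2: the last block is flagged
```

Everything before the returned index defines the invertible part D^a (and hence r); everything from the index on is excluded from the elimination.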

The process of identifying the singular rows/columns does not significantly affect the cost of reduction in the proposed LDL method. The computation of the reduced matrices involves sparse operations similar to those of SIP [43]. As the numerical results will show, this makes the proposed method as computationally efficient as SIP. Unlike SIP, however, the proposed method can be applied even to networks with a singular $G_4$ block in (3.2a).

3.2 Numerical Results

In this section, we compare the performance of the proposed LDL method, SIP and PRIMA. We test the methods on two cases: a network with an invertible $G_4$ block, and a network with a singular $G_4$ block. The LDL decomposition in the proposed method is computed in MATLAB using the routine ldl for sparse, real and symmetric matrices. This routine uses the MA57 algorithm [61], which returns the block-diagonal matrix D with all singularities placed at the end of the matrix. The SIP and PRIMA methods are also implemented in MATLAB. All experiments are run in MATLAB on a 3.4 GHz Intel i7 computer with 16 GB of memory.

3.2.1 Non-singular Case

We first experiment on a network in which the block G4 is non-singular. We compare the

reduction time and accuracy of all methods. The test example used is the same on-chip bus

in Sec. 2.6, but with inductors introduced. Each signal line is modeled as a cascade of RLC π

segments with the following per unit length parameters:


Figure 3.3: Reduction time of proposed LDL method, SIP and PRIMA vs number of ports. All methods match one moment at DC.

• line resistance is 40.74 Ohm/mm.

• ground capacitance is 82.03 fF/mm.

• coupling capacitance between adjacent lines is 73.22 fF/mm.

• line inductance is 1.7032 nH/mm.

• segment length is 0.2 mm.

All signal lines are directly excited from both ends.

Reduction Speed

We start by assessing the speed of the proposed LDL method relative to SIP and PRIMA. We evaluate the reduction times of all methods while varying the number of ports of the on-chip bus with the node-to-port ratio held constant, and then while varying the node-to-port ratio with the number of ports held constant.

Varying Number of Ports, Constant Node-to-Port Ratio

We vary the number of signal lines of the on-chip bus, which corresponds to varying the number

Figure 3.4: Reduction time of the proposed LDL method, SIP and PRIMA vs. the node-to-port ratio. All methods match one moment at DC.

of ports, while keeping the length of each signal line and segment fixed at 20 mm and 0.2 mm,

respectively. This keeps the node-to-port ratio constant at 50.5.
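The constant ratio of 50.5 follows from a quick count (illustrative calculation; the full MNA system also carries inductor current unknowns, which scale proportionally with the node count):

```python
# Node-to-port ratio of the on-chip bus in this sweep: each 20 mm line is
# split into 0.2 mm segments, giving 101 junction nodes per line, and each
# line is excited at both ends, giving 2 ports per line.
line_len, seg_len = 20.0, 0.2                 # mm
segments_per_line = round(line_len / seg_len) # 100
nodes_per_line = segments_per_line + 1        # 101
ports_per_line = 2
ratio = nodes_per_line / ports_per_line
print(ratio)  # 50.5
```

Because both counts scale linearly with the number of lines, the ratio is independent of the bus width, which is what makes this sweep a fair scalability test.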

Figure 3.3 shows the reduction time of the proposed method, SIP and PRIMA versus the

number of ports of the bus. We observe that the proposed method is faster than PRIMA, and the speedup increases with the number of ports. This is because the proposed LDL method uses sparse matrix techniques to avoid explicitly computing a dense projection matrix or performing dense matrix products, whereas PRIMA employs dense matrix operations to generate the reduced models; the memory requirements and CPU time of such operations grow as the number of ports increases. Comparing the proposed LDL method to SIP, we observe that they have almost the same reduction time, even as the number of ports increases. This shows that the combined cost of the LDL decomposition, the identification of singular rows/columns, and the sparse manipulations involved in computing the reduced matrices in the proposed method is comparable to that of the sparse operations in SIP, even for a high port count.

Varying Node-to-Port Ratio, Constant Number of Ports

We now vary the length of each signal line while keeping the segment length and the number

of lines fixed at 0.2 mm and 512 lines (1024 ports), respectively. This varies the node-to-port

Figure 3.5: Magnitude of the near-end coupling coefficient between two adjacent lines. The response is generated using the original network, and the reduced models obtained from the proposed LDL method and PRIMA. Both reduced models match one moment at DC.

ratio of the network.

Figure 3.4 shows the reduction time of the proposed LDL method, SIP and PRIMA versus the node-to-port ratio of the network. We observe that the proposed method outperforms PRIMA, and the speedup increases as the node-to-port ratio increases. Comparison of the proposed method with SIP shows that the two methods have almost the same reduction time for different node-to-port ratios. The process of identifying the singular rows/columns is computationally cheap and does not greatly influence the reduction time of the proposed method. All other operations in the proposed method have a similar complexity to the operations in SIP.

Accuracy

We now test the accuracy of all three methods. We evaluate the scattering matrix at the ports

of the on-chip bus, with 128 signal lines, and at the ports of the corresponding reduced models

generated using the proposed LDL method, SIP and PRIMA. All models match one moment

at DC.

Figure 3.5 shows the coupling coefficient between two ports of adjacent signal lines. The

results from the proposed method and PRIMA are identical. Figure 3.6 shows the same coupling

Figure 3.6: The same response as in Fig. 3.5. This figure also includes the result from SIP. Both reduced models match one moment at DC.

coefficient, evaluated also with the model from SIP. The result from SIP is likewise identical to that of the proposed method. All three methods match the low-frequency response of the original system, consistent with the fact that all three are capable of matching one moment around each expansion point, in this case s = 0. If accuracy is required over a wider bandwidth, one must match additional moments, for example around other expansion points, using multipoint matching [18, 34].
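The mechanism behind the DC agreement can be illustrated with a small numpy sketch (random stand-in matrices, not the thesis code or test circuits): projecting a symmetric descriptor system with a basis for G⁻¹B preserves the zeroth moment BᵀG⁻¹B exactly, so the full and reduced responses coincide at s = 0, while they diverge away from the expansion point.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 12, 2

# Random symmetric positive definite stand-ins for the G and C matrices
# of an RC-like descriptor system (illustrative only).
A = rng.standard_normal((n, n)); G = A @ A.T + n * np.eye(n)
Bc = rng.standard_normal((n, n)); C = Bc @ Bc.T + n * np.eye(n)
B = rng.standard_normal((n, p))

def H(s, G, C, B):
    # Port transfer matrix B^T (G + s C)^{-1} B.
    return B.T @ np.linalg.solve(G + s * C, B)

# One-moment projection about s = 0: V spans G^{-1} B, as in a
# single-block Krylov (PRIMA-style) congruence projection.
V, _ = np.linalg.qr(np.linalg.solve(G, B))
Gr, Cr, Br = V.T @ G @ V, V.T @ C @ V, V.T @ B

dc_err = np.abs(H(0.0, G, C, B) - H(0.0, Gr, Cr, Br)).max()
err1 = np.abs(H(1.0, G, C, B) - H(1.0, Gr, Cr, Br)).max()
print(dc_err, err1)  # dc_err is at round-off level; away from s = 0 the
                     # match is no longer exact
```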

3.2.2 Singular Case

Our next set of experiments is on RLC networks with singular G4 blocks. The test examples are similar to the circuit in Fig. 3.2, but with 300 RLC π segments on each line. We consider the case of two lines, and also three lines. In both cases, only one line is directly excited from both ends. Adjacent lines are capacitively coupled. These circuits represent a practical scenario on a chip, where a signal line runs close to metallic objects. Both SIP and PRIMA failed to perform reduction due to the singularities in the G4 block.

Figure 3.7: Magnitude of the transmission coefficient between two ports of a network with 2 lines. The response is generated using the original network, and the reduced model from the proposed LDL method. The reduced model matches one moment at DC.

The Case of Two Lines

The order of the network is m = 1802. The number of ports is p = 2, since only one line is excited. The proposed LDL method successfully performed reduction, generating a reduced model of order 3. The size of the reduced model implies that only one singular row was identified and retained. This single row in the G4 block barely increased the size of the reduced model produced by the proposed LDL method, yet was enough to cause the SIP and PRIMA methods to fail. The reduction time of the proposed method was 0.01 seconds, which demonstrates that the process of identifying the singular rows and computing the reduced model matrices is computationally efficient. Figure 3.7 shows the accuracy of the reduced model from the proposed method. We evaluate the transmission coefficient between the two ports of the network, and compare results from the original network and the reduced model. As can be observed, the result of the reduced model coincides with that of the original network at DC. This demonstrates that the reduced model matches the first moment, i.e., the constant term in the Taylor series expansion of the transfer function of the original network. To improve accuracy at other frequency points, we would have to capture more terms of the Taylor series expansion, which could be done by matching more moments.

Figure 3.8: Magnitude of the transmission coefficient between two ports of a network with 3 lines. The response is generated using the original network, and the reduced model from the proposed LDL method. The reduced model matches one moment at DC.

The Case of Three Lines

In the three-line case, the network order is m = 2703, and the number of ports is p = 2. The proposed LDL method again successfully reduced the original network, generating a reduced model of order 904. The model size implies that 902 singular rows were identified and retained. The number of singular rows in the G4 block of this network was quite high, and considerably increased the size of the reduced model. We are currently investigating how to proceed when the level of singularity is this high. Nevertheless, unlike the SIP method and PRIMA, the proposed LDL method was still able to reduce the network. The reduction time of the proposed method was 0.09 seconds, which demonstrates its computational efficiency. Figure 3.8 shows the accuracy of the model. We can observe that the result of the reduced model matches that of the original network at DC. This shows that the model captures the first moment of the original network.
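The bookkeeping behind these model orders can be checked directly. Assuming, consistently with the counts reported in both test cases, that the reduced order equals the number of ports plus the retained singular rows:

```python
# Reduced-order bookkeeping for the two singular test cases, using the
# orders reported in the text: retained singular rows = reduced order - p.
cases = {
    "2 lines": dict(m=1802, p=2, reduced=3),
    "3 lines": dict(m=2703, p=2, reduced=904),
}
for name, c in cases.items():
    singular_rows = c["reduced"] - c["p"]
    print(name, singular_rows)  # 1 and 902, matching the text
```

The contrast between the two cases (1 versus 902 retained rows) is what motivates the remark above: when the singular part of G4 is large, the retained rows dominate the reduced order.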

3.3 Concluding Remarks

In this chapter, we proposed an alternative method for efficiently reducing RLC networks with many ports. The proposed method uses Gaussian elimination, through sparse matrix manipulations, to generate the reduced model matrices. This is also done by other fast techniques

like SIP [43]. The novel feature of the proposed method, however, is that it can also reduce networks with singularities, which is not the case for the SIP method. Prior to reduction, we employ an LDL decomposition to identify and isolate any singular rows/columns in the pivot matrix that would prevent applying Gaussian elimination. Reduction is then performed using the invertible part only, while all the singular rows/columns are retained in the reduced model. The process of identifying singular rows/columns is computationally cheap, and does not greatly affect the reduction time. As the numerical results demonstrated, the proposed method is as computationally efficient as SIP [43], but can be applied to a wider set of parasitic networks from interconnect modeling. The numerical results also demonstrated that, just like SIP [43], the proposed method matches one moment at DC. Extending the proposed method to efficiently match additional moments of an RLC network is ongoing research.

Chapter 4

Conclusions

4.1 Summary

The focus of our work in this thesis has been on the efficient model order reduction of large RC

networks with many ports, which arise from the parasitic extraction of interconnects during the computer-aided design of VLSI chips. Such networks can comprise millions of components

and nodes, and several thousands of ports. Directly simulating these electrical networks with

other components, to assess power and signal integrity on chips, consumes a lot of memory and

CPU time, and is in some cases infeasible. Model order reduction is typically used to generate

accurate and passive reduced models to accelerate simulations. However, existing reduction

techniques do not scale well with the network order and port count. The memory requirements

and CPU time for reduction increase dramatically as the size of the network and number of ports

increase. Moreover, the obtained reduced models tend to be inefficient for use in subsequent

simulations. This is because the reduction process involves explicitly computing and storing a

dense projection matrix, and performing matrix products between dense matrices that result

in dense reduced models. As the network size and port count increase, such operations become

computationally costly, and lead to large and dense reduced models that can even be more

expensive to simulate than the original networks. The scalability of existing methods with respect to the number of ports is of particular concern for industry. Although a

number of techniques have been recently proposed to address this issue, they have limitations

in terms of accuracy or stability of the reduced model.


In Chapter 2, we developed a novel model order reduction technique for RC networks with

many ports, named TurboMOR. Numerical tests show that TurboMOR scales better than

existing methods with respect to both node and port count. TurboMOR was applied to net-

works with more than one million nodes and two thousand ports, which can be hardly handled

by existing methods. TurboMOR achieves such high scalability because it completely avoids

the computation and storage of a dense projection matrix, used by mainstream methods like

PRIMA. For large networks, such matrix becomes huge, and significantly increases both CPU

time and memory consumption. TurboMOR’s accuracy can be fully controlled, since the method

can match an arbitrary number of moments, unlike previous “fast” methods like SIP that can

match only two moments.

Another distinctive feature of TurboMOR is the sparse and block-diagonal structure of the

reduced models. This feature makes the proposed models faster to simulate than those produced

by existing techniques, which are typically dense. Improvements in simulation time of up to

2X with respect to the state of the art have been shown. From a theoretical viewpoint, the

structure of the proposed reduced model enables a novel interpretation of the reduction process

in terms of system theory, which we believe is far-reaching. TurboMOR’s models are passive by

construction, unlike those generated by some recently-proposed techniques, that trade passivity

for scalability. Being passive, the proposed models ensure stable transient simulations under

any condition, and are thus robust for industrial use.

Finally, in Chapter 3, we considered the open problem of the reduction of large RLC networks. The presence of inductors significantly increases the complexity of the mathematical

problems underlying model order reduction. The few “fast” reduction strategies proposed so

far in the literature do not always succeed in the RLC case, due to the possible singularity

of some matrix blocks that must be inverted. We proposed a new procedure, based on the

LDL decomposition, that is capable of identifying the nodes responsible for singularity. Such

nodes are excluded from the reduction, which thus does not break down. Numerical results

show that this approach enjoys the same scalability as previously-proposed techniques. The

proposed idea, in its current form, does not solve the problem of reducing RLC networks, since

only one moment can be preserved, which is typically insufficient to ensure good accuracy. Also,

in some cases, too many nodes must be excluded from the reduction in order to circumvent


the singularity, which partially defeats the purpose of model order reduction. In spite of these

limitations, we believe that the results in Chapter 3 are encouraging since they show a possible

pathway to extend the benefits of the proposed TurboMOR approach to the general case of

RLC networks.

4.2 Contributions

This thesis resulted in the following publications:

1. D. Oyaro and P. Triverio, “Fast Model Order Reduction of RC Networks with Very Large

Order and Port Count,” in 24th IEEE Conference on Electrical Performance of Electronic

Packaging and Systems, San Jose, CA, Oct. 25–28, 2015 (accepted).

2. D. Oyaro and P. Triverio, “TurboMOR: an Efficient Model Order Reduction Technique

for RC Networks with Many Ports,” IEEE Transactions on Computer-Aided Design of

Integrated Circuits and Systems, 2015, (submitted).

4.3 Future Work

The reduction method proposed in this thesis, with its novel features, opens several exciting

avenues for future research:

• The inclusion of inductors would be, without a doubt, the most compelling extension

of the proposed work. Currently, there is essentially no “fast” reduction method in the

literature that can handle inductors. Only “classical” methods like PRIMA can reduce

generic RLC networks, with the well-recognized scalability limitations that motivated this

thesis. We believe that the novel ideas and insights proposed in this work will facilitate

the development of a scalable reduction method for RLC circuits. Such activity is already

underway, and some preliminary results have been discussed in Chapter 3;

• Another interesting topic for future research is the development of a robust method to

automatically choose the order of the reduced model based on the user's requirements. Order

selection is an open problem in model order reduction. The structure of the reduced


models produced by TurboMOR seems to offer a novel and convenient representation to

develop an error control mechanism, and stop reduction when accuracy requirements have

been met;

• As shown in this thesis and in some earlier publications, graph partitioning techniques are

extremely useful to tackle the reduction of very large networks. However, it is currently

an open question how to determine the optimal number of partitions for a given network.

Answering this question will facilitate the integration of graph partitioning algorithms in

commercial codes for model order reduction.

Bibliography

[1] IBM, “Copper Interconnects,” http://www-03.ibm.com/ibm/history/ibm100/us/en/icons/copperchip/.

[2] M. Swaminathan and E. Engin, Power integrity modeling and design for semiconductors

and systems. Pearson Education, 2007.

[3] S. R. Nassif, “Power grid analysis benchmarks,” in Proceedings of the 2008 Asia and South

Pacific Design Automation Conference. IEEE Computer Society Press, 2008, pp. 376–381.

[4] Semiconductor Industry Association, “International Technology Roadmap for Semiconductors,” http://www.itrs.net/, 2013.

[5] R. Dhiman and R. Chandel, Compact Models and Performance Investigations for Subthreshold Interconnects. Springer, 2015.

[6] J. Power, Y. Li, M. D. Hill, J. M. Patel, and D. A. Wood, “Implications of emerging 3D GPU architecture on the scan primitive,” ACM SIGMOD Record, vol. 44, no. 1, pp. 18–23, 2015.

[7] Advanced Micro Devices, “High-Bandwidth Memory (HBM),” http://www.amd.com/Documents/High-Bandwidth-Memory-HBM.pdf, 2015.

[8] R. Achar and M. S. Nakhla, “Simulation of high-speed interconnects,” Proceedings of the

IEEE, vol. 89, no. 5, pp. 693–728, 2001.

[9] F. Caignet, S. D. Bendhia, and E. Sicard, “The challenge of signal integrity in deep-submicrometer CMOS technology,” Proceedings of the IEEE, vol. 89, no. 4, pp. 556–573, 2001.


[10] A. E. Ruehli and A. C. Cangellaris, “Progress in the methodologies for the electrical

modeling of interconnects and electronic packages,” Proceedings of the IEEE, vol. 89, no. 5,

pp. 740–771, 2001.

[11] R. Nair and D. Bennett, Power Integrity Analysis and Management for Integrated Circuits.

Prentice Hall, 2010.

[12] M. Kamon, S. McCormick, and K. Shepard, “Interconnect parasitic extraction in the

digital IC design methodology,” in Computer-Aided Design, 1999. Digest of Technical

Papers. 1999 IEEE/ACM International Conference on. IEEE, 1999, pp. 223–230.

[13] W. H. Kao, C.-Y. Lo, M. Basel, and R. Singh, “Parasitic extraction: current state of the

art and future trends,” Proceedings of the IEEE, vol. 89, no. 5, pp. 729–739, 2001.

[14] S.-P. Sim, S. Krishnan, D. M. Petranovic, N. D. Arora, K. Lee, and C. Y. Yang, “A unified

RLC model for high-speed on-chip interconnects,” IEEE Transactions on Electron Devices,

vol. 50, no. 6, pp. 1501–1510, 2003.

[15] Nanoscale Integration and Modeling (NIMO) Group, “Predictive technology model,” http://ptm.asu.edu/.

[16] Z. Li, P. Li, and S. R. Nassif, “IBM Power Grid Benchmarks,” http://dropzone.tamu.edu/~pli/PGBench/.

[17] P.-W. Luo, C. Zhang, Y.-T. Chang, L.-C. Cheng, H.-H. Lee, B.-L. Sheu, Y.-S. Su, D.-

M. Kwai, and Y. Shi, “Benchmarking for research in power delivery networks of three-

dimensional integrated circuits,” in Proceedings of the 2013 ACM international symposium

on International symposium on physical design. ACM, 2013, pp. 17–24.

[18] M. Celik, L. Pileggi, and A. Odabasioglu, IC interconnect analysis. Springer Science &

Business Media, 2002.

[19] K. Banerjee, S. Im, and N. Srivastava, “Interconnect modeling and analysis in the nanometer era: Cu and beyond,” in Proceedings of the 22nd Advanced Metallization Conference (AMC05), 2005, pp. 25–31.


[20] K. Lee, “On-chip interconnects: Giga Hertz and beyond,” in Interconnect Technology Conference, 1998. Proceedings of the IEEE 1998 International. IEEE, 1998, pp. 15–17.

[21] P. Benner, M. Hinze, and E. J. W. Ter Maten, Model reduction for circuit simulation.

Springer, 2011.

[22] A. C. Antoulas, Approximation of large-scale dynamical systems. SIAM, 2005, vol. 6.

[23] W. H. Schilders, H. A. Van der Vorst, and J. Rommes, Model order reduction: theory,

research aspects and applications. Springer, 2008, vol. 13.

[24] S. Tan and L. He, Advanced model order reduction techniques in VLSI design. Cambridge

University Press, 2007.

[25] F. Yang, X. Zeng, Y. Su, and D. Zhou, “RLCSYN: RLC equivalent circuit synthesis for

structure-preserved reduced-order model of interconnect,” in International Symposium on

Circuits and Systems (ISCAS). IEEE, 2007, pp. 2710–2713.

[26] C.-W. Ho, A. E. Ruehli, and P. A. Brennan, “The modified nodal approach to network

analysis,” IEEE Transactions on Circuits and Systems, vol. 22, no. 6, pp. 504–509, 1975.

[27] P. Triverio, S. Grivet-Talocia, M. S. Nakhla, F. G. Canavero, and R. Achar, “Stability,

causality, and passivity in electrical interconnect models,” IEEE Transactions on Advanced

Packaging, vol. 30, no. 4, pp. 795–808, 2007.

[28] O. Brune, “Synthesis of a finite two-terminal network whose driving-point impedance is a prescribed function of frequency,” Ph.D. dissertation, Massachusetts Institute of Technology, 1931.

[29] L. T. Pillage, R. Rohrer et al., “Asymptotic waveform evaluation for timing analysis,”

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 9,

no. 4, pp. 352–366, 1990.

[30] L. Pillage, Electronic Circuit & System Simulation Methods (SRE). McGraw-Hill, Inc.,

1998.


[31] A. Odabasioglu, M. Celik, and L. T. Pileggi, “PRIMA: passive reduced-order interconnect macromodeling algorithm,” in Proceedings of the 1997 IEEE/ACM International Conference on Computer-Aided Design. IEEE Computer Society, 1997, pp. 58–65.

[32] E. Chiprout and M. S. Nakhla, Asymptotic waveform evaluation. Springer, 1994.

[33] H. Padé, Sur la représentation approchée d'une fonction par des fractions rationnelles. Gauthier-Villars et fils, 1892.

[34] E. J. Grimme, “Krylov projection methods for model reduction,” Ph.D. dissertation, University of Illinois at Urbana-Champaign, 1997.

[35] D. L. Boley, “Krylov space methods on state-space control models,” Circuits, Systems and

Signal Processing, vol. 13, no. 6, pp. 733–758, 1994.

[36] P. Feldmann and R. W. Freund, “Efficient linear circuit analysis by Pade approximation

via the Lanczos process,” IEEE Transactions on Computer-Aided Design of Integrated

Circuits and Systems, vol. 14, no. 5, pp. 639–649, 1995.

[37] W. E. Arnoldi, “The principle of minimized iterations in the solution of the matrix eigen-

value problem,” Quarterly of Applied Mathematics, vol. 9, no. 1, pp. 17–29, 1951.

[38] B. C. Moore, “Principal component analysis in linear systems: Controllability, observabil-

ity, and model reduction,” IEEE Transactions on Automatic Control, vol. 26, no. 1, pp.

17–32, 1981.

[39] J. R. Phillips and L. M. Silveira, “Poor man’s TBR: a simple model reduction scheme,”

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 24,

no. 1, pp. 43–55, 2005.

[40] J. R. Phillips, L. Daniel, and L. M. Silveira, “Guaranteed passive balancing transformations

for model order reduction,” IEEE Transactions on Computer-Aided Design of Integrated

Circuits and Systems, vol. 22, no. 8, pp. 1027–1041, 2003.

[41] B. N. Sheehan, “Realizable reduction of RC networks,” IEEE Transactions on Computer-

Aided Design of Integrated Circuits and Systems, vol. 26, no. 8, pp. 1393–1407, 2007.


[42] C. S. Amin, M. H. Chowdhury, and Y. I. Ismail, “Realizable RLCK circuit crunching,” in Proceedings of the 40th annual Design Automation Conference. ACM, 2003, pp. 226–231.

[43] Z. Ye, D. Vasilyev, Z. Zhu, and J. R. Phillips, “Sparse implicit projection (SIP) for reduction of general many-terminal networks,” in Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design. IEEE Press, 2008, pp. 736–743.

[44] G. H. Golub and C. F. Van Loan, Matrix Computations. Johns Hopkins University Press, 2012, vol. 3.

[45] J. M. Silva, J. F. Villena, P. Flores, and L. M. Silveira, “Outstanding issues in model

order reduction,” in Scientific Computing in Electrical Engineering. Springer, 2007, pp.

139–152.

[46] P. Feldmann, “Model order reduction techniques for linear systems with large numbers of terminals,” in Proceedings of the Conference on Design, Automation and Test in Europe - Volume 2. IEEE Computer Society, 2004, p. 20944.

[47] P. Liu, S. X.-D. Tan, B. Yan, and B. McGaughy, “An efficient terminal and model order

reduction algorithm,” Integration, the VLSI journal, vol. 41, no. 2, pp. 210–218, 2008.

[48] P. Feldmann and F. Liu, “Sparse and efficient reduced order modeling of linear subcircuits

with large number of terminals,” in IEEE/ACM International Conference on Computer

Aided Design, 2004 (ICCAD-2004). IEEE, 2004, pp. 88–92.

[49] P. Benner and A. Schneider, “Model order and terminal reduction approaches via matrix decomposition and low rank approximation,” in Scientific Computing in Electrical Engineering SCEE 2008. Springer, 2010, pp. 523–530.

[50] P. Li and W. Shi, “Model order reduction of linear networks with massive ports via frequency-dependent port packing,” in Proceedings of the 43rd annual Design Automation Conference. ACM, 2006, pp. 267–272.

[51] P. Benner, L. Feng, and E. B. Rudnyi, “Using the superposition property for model reduction of linear systems with a large number of inputs,” in Proceedings of the 18th International Symposium on Mathematical Theory of Networks & Systems, 2008.


[52] Z. Zhang, X. Hu, C.-K. Cheng, and N. Wong, “A block-diagonal structured model reduction

scheme for power grid networks,” in Design, Automation & Test in Europe Conference &

Exhibition (DATE). IEEE, 2011, pp. 1–6.

[53] B. Nouri, M. S. Nakhla, and R. Achar, “Efficient reduced-order macromodels of massively coupled interconnect structures via clustering,” IEEE Transactions on Components, Packaging and Manufacturing Technology, vol. 3, no. 5, pp. 826–840, 2013.

[54] F. Zhang, The Schur complement and its applications. Springer Science & Business Media,

2006, vol. 4.

[55] K. J. Kerns and A. T. Yang, “Stable and efficient reduction of large, multiport RC networks

by pole analysis via congruence transformations,” IEEE Transactions on Computer-Aided

Design of Integrated Circuits and Systems, vol. 16, no. 7, pp. 734–744, 1997.

[56] R. Ionutiu, J. Rommes, and W. H. Schilders, “SparseRC: sparsity preserving model reduction for RC circuits with many terminals,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 30, no. 12, pp. 1828–1841, 2011.

[57] E. Chiprout and M. S. Nakhla, “Analysis of interconnect networks using complex frequency hopping (CFH),” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 14, no. 2, pp. 186–200, 1995.

[58] T. Davis, “SuiteSparse,” http://faculty.cse.tamu.edu/davis/suitesparse.html.

[59] Netlib, “LAPACK,” http://www.netlib.org/lapack/index.html.

[60] A. Zecevic and D. Siljak, “Balanced decompositions of sparse systems for multilevel parallel

processing,” IEEE Transactions on Circuits and Systems I, vol. 41, no. 3, pp. 220–233,

1994.

[61] I. S. Duff, “MA57—a code for the solution of sparse symmetric definite and indefinite

systems,” ACM Transactions on Mathematical Software (TOMS), vol. 30, no. 2, pp. 118–

144, 2004.