On the modeling and simulation of large-scale systems

Venkataramanan (Ragu) Balakrishnan, School of ECE, Purdue University

Joint work with Stephen Cauley, Jitesh Jain, Hong Li, Cheng-Kok Koh (Purdue) and M. P. Anantram (NASA)

Basic ideas

Many engineering system models are large-scale

However, most interactions are local

Problems:
• Modeling that captures locality
• Exploiting locality in simulation

Outline

• One modeling example
• One simulation example

VLSI interconnect modeling

Interconnects are relatively long wires connecting circuit elements

To account for distributed effects:
• Wires are broken into short segments
• Segments are further subdivided into filaments (if necessary)
• Surfaces are subdivided into panels (if necessary)

Result: a large-scale RLC model

Model parameterized by very large matrices (size $10^4$ or higher):
• $C$, obtained by inverting $P$, the potential matrix, which maps charges to voltages
• $L$, obtained directly from the magnetic vector potential; its entries are self- and mutual-inductances

The interconnect model

Model structure:
• $R$ is diagonal
• $C$ is approximated as sparse
• $L$ is dense, but $L^{-1}$ is approximately sparse

Sparsity structure:
• Entries of $P$ and $L$ are obtained via CAD tools
• $P^{-1}$ and $L^{-1}$ are approximately sparse

Modeling issues:
• Detecting the approximate sparsity pattern of $P^{-1}$ and $L^{-1}$
• Approximating $P$ and $L$ by matrices whose inverses are sparse
• Parameterizing these approximants
• Efficient computation (say, matrix-vector multiplies) with the approximants and their inverses

Model extraction

Some answers

Interconnect modeling is part of an engineering design flow

Partial answers are available from the design stage, e.g., the sparsity pattern of $P^{-1}$ and $L^{-1}$

Focus on the approximation problem. Begin with a simple case: given $X$ (e.g., $X = P$ or $X = L$), find an approximant $\hat{X}$ with $\hat{X}^{-1}$ tridiagonal

Tridiagonal case

Key result: Suppose $X = T^{-1}$ with $T$ tridiagonal. Under mild conditions, $X$ is completely specified by $O(N)$ parameters (a result from the late 1950s)

Only the tridiagonal entries of $X$ are needed to compute these parameters
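A standard form of this late-1950s result: if $T$ is symmetric, tridiagonal, and irreducible (all off-diagonal entries nonzero), then $X = T^{-1}$ satisfies $X_{ij} = u_i v_j$ for $i \le j$, so $2N$ numbers describe the whole matrix. The NumPy sketch below assumes exactly that setting. The helper `semiseparable_params` is hypothetical, and its recursion (which reads only the diagonal and first superdiagonal of $X$) is one way to compute the parameters, not necessarily the one used on the slide.

```python
import numpy as np

def semiseparable_params(X):
    """Hypothetical helper: return u, v with X[i, j] == u[i] * v[j] for i <= j.

    Reads only the diagonal and first superdiagonal of X. Assumes X is the
    inverse of an irreducible symmetric tridiagonal matrix, so the entries
    used as divisors are nonzero.
    """
    n = X.shape[0]
    u = np.empty(n)
    v = np.empty(n)
    u[0] = 1.0                                 # the factorization is unique only up to scaling
    v[0] = X[0, 0]                             # X[0, 0] = u[0] * v[0]
    for i in range(n - 1):
        v[i + 1] = X[i, i + 1] / u[i]          # X[i, i+1]   = u[i]   * v[i+1]
        u[i + 1] = X[i + 1, i + 1] / v[i + 1]  # X[i+1, i+1] = u[i+1] * v[i+1]
    return u, v

# Example: symmetric, diagonally dominant tridiagonal T with nonzero off-diagonals
rng = np.random.default_rng(0)
n = 6
off = rng.uniform(0.5, 1.0, n - 1)
T = 3.0 * np.eye(n) - np.diag(off, 1) - np.diag(off, -1)
X = np.linalg.inv(T)                           # dense, but fully described by u and v

u, v = semiseparable_params(X)
upper = np.triu(np.outer(u, v))                # entries with i <= j
X_rebuilt = upper + np.triu(upper, 1).T        # fill the lower triangle by symmetry
print(np.allclose(X, X_rebuilt))
```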

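Tridiagonal band-matching

Given $X$ ($N \times N$): construct $\hat{X}$ from the tridiagonal entries of $X$ alone

Then:
• $\hat{X}^{-1}$ is tridiagonal, and can be computed in $O(N)$ operations
• Products $\hat{X}y$ and $\hat{X}^{-1}y$ are computable in $O(N)$ operations
• $\hat{X}_{ij} = X_{ij}$ for $|i-j| \le 1$, i.e., the tridiagonal bands of $X$ and $\hat{X}$ match

Optimality: viewing $X$ and $\hat{X}$ as covariance matrices of zero-mean Gaussian distributions, $\hat{X}$ minimizes the Kullback-Leibler distance to $X$ among all matrices whose inverse is tridiagonal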
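One way to realize this construction, assumed here rather than taken from the slides, is the classical maximum-determinant completion: $\hat{X}^{-1}$ is assembled from the inverses of the $2 \times 2$ diagonal windows of $X$, minus the reciprocals of the interior diagonal entries, which takes $O(N)$ work. The sketch checks the three claims numerically: the bands match, $\hat{X}^{-1}$ is tridiagonal, and small tridiagonal perturbations of $\hat{X}^{-1}$ do not lower the divergence $D(\mathcal{N}(0,X)\,\|\,\mathcal{N}(0,\hat{X}))$. The function names and the test matrix are hypothetical.

```python
import numpy as np

def band_match_inverse(X):
    """Hypothetical helper: tridiagonal K = Xhat^{-1} such that Xhat agrees with X
    on the tridiagonal band. O(N) arithmetic; K is stored densely only for clarity."""
    n = X.shape[0]
    K = np.zeros_like(X)
    for i in range(n - 1):                       # add inverses of 2x2 diagonal windows
        K[i:i + 2, i:i + 2] += np.linalg.inv(X[i:i + 2, i:i + 2])
    for i in range(1, n - 1):                    # subtract the overlapping diagonal entries
        K[i, i] -= 1.0 / X[i, i]
    return K

def kl_gauss(X, Y):
    """KL divergence D(N(0, X) || N(0, Y)) for symmetric positive definite X, Y."""
    n = X.shape[0]
    _, logdet_x = np.linalg.slogdet(X)
    _, logdet_y = np.linalg.slogdet(Y)
    return 0.5 * (np.trace(np.linalg.solve(Y, X)) - n + logdet_y - logdet_x)

# Hypothetical dense SPD test matrix whose inverse is NOT tridiagonal
n = 7
idx = np.arange(n)
dist = np.abs(idx[:, None] - idx[None, :])
X = 0.8 ** dist + 0.2 * np.eye(n)

K = band_match_inverse(X)
X_hat = np.linalg.inv(K)
band = dist <= 1
print(np.allclose(X_hat[band], X[band]))         # in-band entries match
print(np.allclose(K[~band], 0.0))                # Xhat^{-1} is tridiagonal

# Small symmetric tridiagonal perturbations of K never reduce the KL divergence
rng = np.random.default_rng(1)
kl_best = kl_gauss(X, X_hat)
kl_others = []
for _ in range(5):
    e = rng.normal(size=n - 1)
    S = np.diag(rng.normal(size=n)) + np.diag(e, 1) + np.diag(e, -1)
    S /= np.linalg.norm(S, 2)                    # unit spectral norm keeps K + 0.01 S positive definite
    kl_others.append(kl_gauss(X, np.linalg.inv(K + 0.01 * S)))
print(kl_best <= min(kl_others))
```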

A simple example

[Slide figure: a 7 × 7 symmetric Toeplitz example showing a matrix $L$ and its band-matched approximant $\tilde{L}$, which agree on the tridiagonal band but differ elsewhere, together with the tridiagonal inverse of the approximant; entries displayed scaled by powers of ten]

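General case: given $X$, seek $\hat{X}$ with $\hat{X}^{-1}$ block-banded ($N$ blocks of size $m$, block bandwidth $p$)

“Band-matching” gives an approximant $\hat{X}$ such that:
• $\hat{X}^{-1}$ is block-banded, with block bandwidth $p$
• In-band entries of $X$ and $\hat{X}$ match
• $\hat{X}$ is an “optimal” approximant, specified by a set of parameters computed from the in-band entries of $X$
• Products $\hat{X}y$ and $\hat{X}^{-1}y$ can be computed efficiently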
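The same completion idea extends to the block-banded case, again as an assumed construction: add the inverses of all windows of $p+1$ consecutive diagonal blocks of $X$ and subtract the inverses of the overlapping windows of $p$ blocks. For fixed $m$ and $p$ this costs work linear in the number of blocks. The names below (`block_band_match_inverse`, `n_blocks`, the random test matrix) are hypothetical, and the sketch assumes $X$ is symmetric positive definite with $p \ge 1$.

```python
import numpy as np

def block_band_match_inverse(X, m, p):
    """Hypothetical helper: K = Xhat^{-1} with block bandwidth p (blocks of size m),
    such that Xhat matches X on the block band. Assumes X is SPD, its size is a
    multiple of m, and 1 <= p < number of blocks."""
    n_blocks = X.shape[0] // m
    K = np.zeros_like(X)

    def blocks(i, j):                      # scalar index range of blocks i, ..., j-1
        return slice(i * m, j * m)

    for i in range(n_blocks - p):          # windows of p+1 consecutive blocks
        s = blocks(i, i + p + 1)
        K[s, s] += np.linalg.inv(X[s, s])
    for i in range(1, n_blocks - p):       # overlaps of consecutive windows (p blocks)
        s = blocks(i, i + p)
        K[s, s] -= np.linalg.inv(X[s, s])
    return K

# Hypothetical example: 6 blocks of size 2, block bandwidth 1 (block-tridiagonal inverse)
rng = np.random.default_rng(3)
m, p, n_blocks = 2, 1, 6
B = rng.normal(size=(m * n_blocks, m * n_blocks))
X = B @ B.T + m * n_blocks * np.eye(m * n_blocks)   # dense SPD matrix

K = block_band_match_inverse(X, m, p)
X_hat = np.linalg.inv(K)
blk = np.arange(m * n_blocks) // m                  # block index of each row/column
in_band = np.abs(blk[:, None] - blk[None, :]) <= p
print(np.allclose(X_hat[in_band], X[in_band]))      # in-band entries match
print(np.allclose(K[~in_band], 0.0))                # Xhat^{-1} is block-banded
```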

Further issues
• Numerical stability. In the tridiagonal case, the parameterization of $\hat{X}$ is ill-conditioned; instead of the parameters themselves, use ratios of successive parameters, …
• Extension to the block-tridiagonal case is possible
• Simulation without explicit computation of the parameters
• Structure of matrices whose inverses have more general sparsity patterns
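To see why ratios help, note that for $X_{ij} = 0.5^{|i-j|}$ the factors in $X_{ij} = u_i v_j$ grow and decay like $2^{\pm i}$, so for $N = 5000$ they overflow double precision. The sketch below is one possible ratio-based scheme, not necessarily the authors': it forms $y = \hat{X}x$ in $O(N)$ using only $\alpha_k = X_{k,k+1}/X_{kk}$ and $\beta_k = X_{k,k+1}/X_{k+1,k+1}$. It relies on the identity $\hat{X}_{ij} = \hat{X}_{ii}\prod_{k=i}^{j-1}\alpha_k$ for $i<j$ (and the mirror-image identity with $\beta$), which holds for symmetric positive definite matrices with a tridiagonal inverse. The function name is hypothetical.

```python
import numpy as np

def band_matched_matvec(x_diag, x_super, x):
    """Hypothetical helper: y = Xhat @ x in O(n), where Xhat is the symmetric matrix
    with tridiagonal inverse whose diagonal is x_diag and first superdiagonal is
    x_super. Only ratios of band entries are used, so no quantity grows or
    decays geometrically along the chain."""
    n = len(x)
    alpha = x_super / x_diag[:-1]            # alpha[k] = X[k, k+1] / X[k, k]
    beta = x_super / x_diag[1:]              # beta[k]  = X[k, k+1] / X[k+1, k+1]
    s = np.empty(n)                          # backward sweep: coupling to entries j >= i
    s[-1] = x[-1]
    for i in range(n - 2, -1, -1):
        s[i] = x[i] + alpha[i] * s[i + 1]
    r = np.empty(n)                          # forward sweep: coupling to entries j <= i
    r[0] = x[0]
    for i in range(1, n):
        r[i] = x[i] + beta[i - 1] * r[i - 1]
    return x_diag * (s + r - x)              # x[i] was counted in both sweeps

# Check against a dense product for a small matrix with tridiagonal inverse
n = 8
T = 3.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
X = np.linalg.inv(T)
x = np.arange(1.0, n + 1)
y = band_matched_matvec(np.diag(X), np.diag(X, 1), x)
print(np.allclose(y, X @ x))

# Large case: X[i, j] = 0.5 ** |i - j|, where a u_i * v_j form would need 2**5000
n_big = 5000
y_big = band_matched_matvec(np.ones(n_big), np.full(n_big - 1, 0.5), np.ones(n_big))
print(np.isfinite(y_big).all(), y_big[n_big // 2])
```

Both sweeps are simple first-order recurrences, so the same pattern carries over to $\hat{X}^{-1}y$ being a plain tridiagonal product.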

Outline

• One modeling example
• One simulation example

Nano-scale simulation

Problem: determine and evaluate the dynamic behavior of the device

Macro-level simulation techniques give unacceptable accuracy

Need quantum mechanical modeling

Nonequilibrium Green’s Function approach:

• Form the Hamiltonian
• Write out the equations of motion for the retarded ($G^r$) and less-than ($G^<$) Green’s functions
• Solve for the density of states and charge density

2D Simulation of Nanotransistors

Need the diagonal entries of $G^r = A^{-1}$ and $G^< = G^r \Sigma^< (G^r)^\dagger$, where $A$ is block-tridiagonal
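Computing only the diagonal blocks of $A^{-1}$ does not require the full inverse. A common way to do this for block-tridiagonal $A$ is the recursive Green's function recursion: a forward sweep computes left-connected inverses $g_i$, and a backward sweep yields $G_{ii} = g_i + g_i A_{i,i+1} G_{i+1,i+1} A_{i+1,i} g_i$. The sketch below is a generic NumPy illustration of that recursion for $G^r$ only; it is an assumption that it mirrors the marching algorithm of [Anant02], and the helper names and test matrix are hypothetical. It keeps every $g_i$ in memory.

```python
import numpy as np

def diag_blocks_of_inverse(D, U, L):
    """Hypothetical helper: diagonal blocks of A^{-1} for block-tridiagonal A with
    D[i] = A[i, i], U[i] = A[i, i+1], L[i] = A[i+1, i] (block indices)."""
    n = len(D)
    g = [None] * n                                    # left-connected inverses
    g[0] = np.linalg.inv(D[0])
    for i in range(1, n):                             # forward sweep
        g[i] = np.linalg.inv(D[i] - L[i - 1] @ g[i - 1] @ U[i - 1])
    G = [None] * n
    G[-1] = g[-1]                                     # last block is already exact
    for i in range(n - 2, -1, -1):                    # backward sweep
        G[i] = g[i] + g[i] @ U[i] @ G[i + 1] @ L[i] @ g[i]
    return G

def assemble(D, U, L):
    """Dense block-tridiagonal matrix, used here only to check the result."""
    n, m = len(D), D[0].shape[0]
    A = np.zeros((n * m, n * m))
    for i in range(n):
        A[i * m:(i + 1) * m, i * m:(i + 1) * m] = D[i]
    for i in range(n - 1):
        A[i * m:(i + 1) * m, (i + 1) * m:(i + 2) * m] = U[i]
        A[(i + 1) * m:(i + 2) * m, i * m:(i + 1) * m] = L[i]
    return A

rng = np.random.default_rng(2)
nb, m = 5, 3                                          # hypothetical: 5 blocks of size 3
D = [rng.normal(size=(m, m)) + 10 * np.eye(m) for _ in range(nb)]
U = [rng.normal(size=(m, m)) for _ in range(nb - 1)]
L = [rng.normal(size=(m, m)) for _ in range(nb - 1)]

G = diag_blocks_of_inverse(D, U, L)
A_inv = np.linalg.inv(assemble(D, U, L))
print(all(np.allclose(G[i], A_inv[i * m:(i + 1) * m, i * m:(i + 1) * m]) for i in range(nb)))
```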

Mathematical Formulation

Current state of the art

Marching algorithm of [Anant02]

Its computational cost and memory consumption grow with problem size; for large problems, this translates to 16 GB and 32 GB of memory!

[Anant02] - A. Svizhenko, M. P. Anantram, T. R. Govindan, B. Biegel, and R. Venugopal. Two-dimensional quantum mechanical modeling of nanotransistors. Journal of Applied Physics, 91(4):2343–2354, 2002.

• Comparable computational efficiency
• Similar numerical conditioning
• Significantly reduced memory requirements, allowing large problems to be run on a single desktop computer
• Flexibility to distribute the computation across multiple processors, since the algorithm is inherently parallelizable

New divide and conquer algorithm

• Compute inverses of block-tridiagonal matrices
• Adjust for a “low rank” correction term (the procedure can be continued recursively)
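The slides do not spell out the correction formula. The sketch below assumes the standard Sherman-Morrison-Woodbury identity: splitting a block-tridiagonal $A$ after block $k$ leaves two independent block-tridiagonal sub-problems $A_1$ and $A_2$ plus a coupling term of rank at most $2m$, and the diagonal of $A^{-1}$ follows from the last block-column and block-row of $A_1^{-1}$, the first block-column and block-row of $A_2^{-1}$, and one small $2m \times 2m$ solve. For brevity the sub-problems are inverted densely here; in a recursive version each would itself be split. All names are hypothetical.

```python
import numpy as np

def diag_inverse_by_splitting(A, m, k):
    """Hypothetical helper: diag(A^{-1}) for block-tridiagonal A (block size m),
    split after block k into A1 (first k blocks) and A2 (the rest)."""
    n = A.shape[0]
    c = k * m
    r1 = slice(c - m, c)                      # last block of A1
    r2 = slice(c, c + m)                      # first block of A2
    P_inv = np.zeros_like(A)                  # inverse of blockdiag(A1, A2)
    P_inv[:c, :c] = np.linalg.inv(A[:c, :c])  # dense here only for illustration
    P_inv[c:, c:] = np.linalg.inv(A[c:, c:])
    U = np.zeros((n, 2 * m))                  # coupling term E = U @ Vt
    U[r1, :m] = np.eye(m)
    U[r2, m:] = np.eye(m)
    Vt = np.zeros((2 * m, n))
    Vt[:m, r2] = A[r1, r2]
    Vt[m:, r1] = A[r2, r1]
    W = P_inv @ U    # last block-column of A1^{-1}, first block-column of A2^{-1}
    Z = Vt @ P_inv   # coupling blocks times first block-row of A2^{-1} / last block-row of A1^{-1}
    M = np.eye(2 * m) + Vt @ W                # small 2m x 2m matrix
    # Woodbury: A^{-1} = P^{-1} - W M^{-1} Z; keep only the diagonal of the correction
    return np.diag(P_inv) - np.einsum("ij,ji->i", W, np.linalg.solve(M, Z))

rng = np.random.default_rng(4)
nb, m = 6, 2                                  # hypothetical: 6 blocks of size 2
n = nb * m
blk = np.arange(n) // m
mask = np.abs(blk[:, None] - blk[None, :]) <= 1           # block-tridiagonal pattern
A = np.where(mask, rng.normal(size=(n, n)), 0.0) + 8 * np.eye(n)
d = diag_inverse_by_splitting(A, m, k=3)
print(np.allclose(d, np.diag(np.linalg.inv(A))))
```

Since $W$ and $Z$ involve only boundary block-rows and block-columns of the sub-problem inverses, the correction stays small no matter how large the sub-problems are.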

Approach

[Figure: the approach, with panels labeled “Inverting”, “Correction terms”, and “Low rank”]

Computing $G^r$ and $G^<$:
• Updating the first block-row and last block-column is too costly
• Instead, accumulate the low-rank maps that underlie the updates

[Figure: matrix maps for sub-problem $A_1$]

Applying low-rank corrections

• Used for combining sub-problems, and for computing the diagonal entries of the inverse
• The maps depend on the corner blocks of the sub-problem solutions

Matrix maps

[Figure: matrix maps for sub-problem $A_1$]

Parallel Implementation

• Separate the problem into divisions
• Data passed to the first division is shown in panels 1–4 of the figure
• Each CPU only modifies its own matrix maps
• Information exchanged at each combining step: a few matrices

[Figure: four divisions labeled I–IV, with panels numbered 1–4]


Single computer implementation

• Problem separated into divisions
• First pass: divisions solved one after the other, and matrix maps computed
• Second pass: divisions re-solved for the first block-row and last block-column of their inverses, and matrix maps applied to get the final answer

Computation
• For each division, the first block-row and last block-column of its inverse must be computed
• Combining proceeds in stages; during each stage, the map update for each division requires additional computation
• Total cost: tallied for a single computer and for multiple CPUs

Memory

• For each division, the first block-row and last block-column of its inverse must be stored
• For each division, the matrix maps must also be stored
• Total memory: tallied for a single computer and for multiple CPUs

Results

All results are reported for the retarded Green’s function ($G^r$)

[Figure: run times in minutes on a single computer and on a multi-processor system, compared with [Anant02] for $N_x = 100$]

Conclusions
• The mathematical problems underlying the applications presented are well studied. Examples of recent work: hierarchical (H) matrices (Hackbusch et al.) and nested dissection (Darve et al.)
• Much of the recent work is in general settings
• The presented work is closer to the application end, with the potential to exploit problem-specific information at the expense of generality
