![Page 1: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/1.jpg)
On the modeling and simulation of large-scale systems
Venkataramanan (Ragu) Balakrishnan School of ECE, Purdue University
Joint work with Stephen Cauley, Jitesh Jain, Hong Li, Cheng-Kok Koh (Purdue) and M. P. Anantram (NASA)
![Page 2: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/2.jpg)
Basic ideas
• Many engineering system models are large-scale
• However, most interactions are local
• Problems:
  • Modeling that captures locality
  • Exploitation of locality in simulation
![Page 3: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/3.jpg)
Outline
• One modeling example
• One simulation example
![Page 4: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/4.jpg)
VLSI interconnect modeling
• Interconnects are relatively long wires connecting circuit elements
• To account for distributed effects:
  • Wires broken into short segments
  • Segments further subdivided into filaments (if necessary)
  • Surfaces subdivided into panels (if necessary)
• Result: a large-scale RLC model
![Page 5: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/5.jpg)
The interconnect model
• Model parameterized by very large matrices (size $10^4$ or higher)
• $C = P^{-1}$: obtained by inverting $P$, the potential matrix, which maps charges to voltages
• $L$: obtained directly from the magnetic vector potential; entries are self- and mutual inductances
![Page 6: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/6.jpg)
Model structure
• $R$ is diagonal
• $C$ is approximated as sparse
• $L$ is dense, but its inverse is approximately sparse
• Sparsity structure: [figure]
![Page 7: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/7.jpg)
Model extraction
• Entries of $L$ and $P$ obtained via CAD tools
• $L^{-1}$ and $P^{-1}$ approx. sparse
• Modeling issues:
  • Detecting the approx. sparsity pattern of $L^{-1}$ and $P^{-1}$
  • Approximating $L$ and $P$ s.t. the inverses of the approximants are sparse
  • Parameterization of the approximants
  • Efficient computation (say, matrix-vector multiplies) with the approximants
![Page 8: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/8.jpg)
Some answers
• Interconnect modeling is part of an engineering design flow
• Partial answers available from the design stage; e.g., sparsity pattern in $L^{-1}$ and $P^{-1}$
• Focus on the approximation problem
• Begin with a simple case: given $L$, find $\tilde{L} \approx L$ with $\tilde{L}^{-1}$ tridiagonal
![Page 9: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/9.jpg)
Tridiagonal case
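• Key result: suppose a symmetric matrix $H$ has a tridiagonal inverse. Under mild conditions, $H_{ij} = u_i v_j$ for $i \le j$, for some vectors $u$ and $v$
• Result from late 1950s
• Parameters $u$, $v$ are computed from $H$; only the tridiagonal entries of $H$ are needed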
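The structure behind this result can be checked numerically. The sketch below assumes the classical form of the result (entry $(i, j)$ of the matrix equals $u_i v_j$ for $i \le j$) and the normalization $v_1 = 1$; the function name `semiseparable_params` and the test matrix are illustrative choices, not taken from the slides.

```python
import numpy as np

def semiseparable_params(H):
    """Recover u, v with H[i, j] = u[i] * v[j] for i <= j.

    Only the diagonal and first superdiagonal of H are read.
    Assumes H is symmetric with a tridiagonal inverse and nonzero band entries.
    """
    n = H.shape[0]
    v = np.ones(n)                      # normalization v[0] = 1 (an arbitrary choice)
    for i in range(n - 1):
        v[i + 1] = v[i] * H[i, i + 1] / H[i, i]
    u = np.diag(H) / v
    return u, v

# Illustrative test matrix: H is the inverse of a diagonally dominant tridiagonal T.
rng = np.random.default_rng(0)
n = 6
off = -rng.uniform(0.5, 1.0, n - 1)
T = np.diag(np.full(n, 3.0)) + np.diag(off, 1) + np.diag(off, -1)
H = np.linalg.inv(T)

u, v = semiseparable_params(H)
upper = np.triu(np.outer(u, v))                      # u[i] * v[j] for i <= j
H_rebuilt = upper + np.triu(np.outer(u, v), 1).T     # mirror to the lower triangle
max_err = np.max(np.abs(H - H_rebuilt))
print(max_err)
```

The full dense matrix is rebuilt from $2n$ numbers read off its tridiagonal band, which is what makes a compact parameterization possible.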
![Page 10: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/10.jpg)
Tridiagonal band-matching
• Given $L$: construct the parameters from the tridiagonal entries of $L$
• Define $\tilde{L}$ as the matrix with these parameters
![Page 11: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/11.jpg)
Tridiagonal band-matching
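• Then, for an $n \times n$ matrix $L$:
  • $\tilde{L}^{-1}$ is tridiagonal
  • $\tilde{L}^{-1}$ can be computed in $O(n)$
  • Products $\tilde{L}x$ and $\tilde{L}^{-1}x$ computable in $O(n)$
  • We have $\tilde{L}_{ij} = L_{ij}$ for $|i - j| \le 1$
• Optimality? $\tilde{L}$ minimizes the Kullback-Leibler distance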
![Page 12: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/12.jpg)
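A simple example

$$L = 10^{-11} \times \begin{bmatrix}
10.80 & 8.51 & 7.22 & 6.45 & 5.90 & 5.47 & 5.13 \\
8.51 & 10.80 & 8.51 & 7.22 & 6.45 & 5.90 & 5.47 \\
7.22 & 8.51 & 10.80 & 8.51 & 7.22 & 6.45 & 5.90 \\
6.45 & 7.22 & 8.51 & 10.80 & 8.51 & 7.22 & 6.45 \\
5.90 & 6.45 & 7.22 & 8.51 & 10.80 & 8.51 & 7.22 \\
5.47 & 5.90 & 6.45 & 7.22 & 8.51 & 10.80 & 8.51 \\
5.13 & 5.47 & 5.90 & 6.45 & 7.22 & 8.51 & 10.80
\end{bmatrix}$$

$$\tilde{L} = 10^{-11} \times \begin{bmatrix}
10.80 & 8.51 & 6.70 & 5.28 & 4.16 & 3.28 & 2.58 \\
8.51 & 10.80 & 8.51 & 6.70 & 5.28 & 4.16 & 3.28 \\
6.70 & 8.51 & 10.80 & 8.51 & 6.70 & 5.28 & 4.16 \\
5.28 & 6.70 & 8.51 & 10.80 & 8.51 & 6.70 & 5.28 \\
4.16 & 5.28 & 6.70 & 8.51 & 10.80 & 8.51 & 6.70 \\
3.28 & 4.16 & 5.28 & 6.70 & 8.51 & 10.80 & 8.51 \\
2.58 & 3.28 & 4.16 & 5.28 & 6.70 & 8.51 & 10.80
\end{bmatrix}$$

$$\tilde{L}^{-1} = 10^{10} \times \begin{bmatrix}
2.44 & -1.92 & 0 & 0 & 0 & 0 & 0 \\
-1.92 & 3.95 & -1.92 & 0 & 0 & 0 & 0 \\
0 & -1.92 & 3.95 & -1.92 & 0 & 0 & 0 \\
0 & 0 & -1.92 & 3.95 & -1.92 & 0 & 0 \\
0 & 0 & 0 & -1.92 & 3.95 & -1.92 & 0 \\
0 & 0 & 0 & 0 & -1.92 & 3.95 & -1.92 \\
0 & 0 & 0 & 0 & 0 & -1.92 & 2.44
\end{bmatrix}$$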
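The numbers in this example can be reproduced with a short band-matching sketch. It assumes the approximant is built by extending each row one entry at a time from the tridiagonal band of the original matrix, which is one standard way to obtain a matrix with a tridiagonal inverse; the function name `band_match_tridiagonal` is hypothetical, and the matrix is formed densely here only for illustration.

```python
import numpy as np

def band_match_tridiagonal(L):
    """Return Lt that equals L on the tridiagonal band and has a tridiagonal inverse.

    Only L[i, i] and L[i, i + 1] are read. The dense result is formed
    for illustration; it costs O(n^2) and is not needed in practice.
    """
    n = L.shape[0]
    Lt = np.zeros((n, n))
    for i in range(n):
        Lt[i, i] = L[i, i]
        for j in range(i + 1, n):
            # Extend the row by one entry using band entries of L only.
            Lt[i, j] = Lt[i, j - 1] * L[j - 1, j] / L[j - 1, j - 1]
            Lt[j, i] = Lt[i, j]
    return Lt

# First row of the 7 x 7 example matrix, in units of 1e-11 (symmetric Toeplitz).
first_row = np.array([10.80, 8.51, 7.22, 6.45, 5.90, 5.47, 5.13])
idx = np.arange(7)
L = first_row[np.abs(idx[:, None] - idx[None, :])]

Lt = band_match_tridiagonal(L)
Lt_inv = np.linalg.inv(Lt)          # in units of 1e11

print(np.round(Lt[0], 2))           # first row of the approximant
print(np.round(10 * Lt_inv[:2, :3], 2))   # rescaled to units of 1e10
```

The first row of `Lt` decays geometrically, unlike the slower decay of the original first row, and `Lt_inv` is tridiagonal up to rounding error.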
![Page 13: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/13.jpg)
General case
• Given $L$, seek $\tilde{L} \approx L$ with $\tilde{L}^{-1}$ block-banded (characterized by number of blocks, block size, and block bandwidth)
• “Band-matching” gives approximant $\tilde{L}$ s.t.:
  • $\tilde{L}^{-1}$ is block-banded
  • In-band entries of $\tilde{L}$ and $L$ match
  • $\tilde{L}$ is an “optimal” approximant
  • Products $\tilde{L}x$ and $\tilde{L}^{-1}x$ can be computed efficiently
![Page 14: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/14.jpg)
Further issues
• Numerical stability:
  • Tridiagonal case: direct parameterization is ill-conditioned; instead of the parameters themselves, use their ratios
  • Extension possible to block-tridiagonal case
• Simulation without explicit computation of parameters
• Structure of matrices whose inverses have more general sparsity patterns
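One way to work with ratios rather than the raw parameters is sketched below: a linear-time product with the band-matched approximant that uses only the diagonal, the superdiagonal, and their ratios. The specific ratio chosen here (superdiagonal entry over the diagonal entry in the same row) is an assumption for illustration, as is the function name `band_matched_matvec`.

```python
import numpy as np

def band_matched_matvec(d, e, x):
    """Compute y = Lt @ x in O(n), where Lt is the symmetric matrix with
    diagonal d, superdiagonal e, and a tridiagonal inverse.

    Uses Lt[i, j] = d[i] * rho[i] * ... * rho[j - 1] for i <= j,
    with rho[k] = e[k] / d[k].
    """
    n = len(d)
    rho = e / d[:-1]
    s = np.empty(n)                 # s[i] = sum over j >= i of (rho[i]..rho[j-1]) * x[j]
    s[-1] = x[-1]
    for i in range(n - 2, -1, -1):
        s[i] = x[i] + rho[i] * s[i + 1]
    t = np.zeros(n)                 # t[i] = sum over j < i of Lt[i, j] * x[j]
    for i in range(1, n):
        t[i] = rho[i - 1] * (d[i - 1] * x[i - 1] + t[i - 1])
    return d * s + t

# Check against a dense reference built from the same formula.
n = 7
d = np.full(n, 10.80)
e = np.full(n - 1, 8.51)
x = np.arange(1.0, n + 1)
y_fast = band_matched_matvec(d, e, x)

rho = e / d[:-1]
Lt = np.array([[d[min(i, j)] * np.prod(rho[min(i, j):max(i, j)]) for j in range(n)]
               for i in range(n)])
matvec_err = np.max(np.abs(y_fast - Lt @ x))
print(matvec_err)
```

Each ratio has magnitude below one for a positive definite matrix, so the two sweeps never form the rapidly growing or decaying parameter vectors explicitly.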
![Page 15: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/15.jpg)
Outline
• One modeling example
• One simulation example
![Page 16: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/16.jpg)
Nano-scale simulation
Problem: Determine and evaluate dynamic behavior of the device
Macro-level simulation techniques of unacceptable accuracy
Need quantum mechanical modeling
![Page 17: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/17.jpg)
2D Simulation of Nanotransistors
Nonequilibrium Green’s Function approach:
• Form Hamiltonian
• Write out the equations of motion for the retarded ($G^r$) and less-than ($G^<$) Green’s functions
• Solve for density of states and charge density
![Page 18: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/18.jpg)
Mathematical Formulation
• Need diagonal entries of $G^r$ and $G^<$, where $G^r = A^{-1}$
• $A$ is block-tridiagonal
![Page 19: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/19.jpg)
Current state of the art
• Marching algorithm due to Anantram et al. [Anant02]
• Computational complexity and memory consumption both grow with problem size
• For the problem size considered, this translates to 16 GB and 32 GB of memory!
[Anant02] - A. Svizhenko, M. P. Anantram, T. R. Govindan, B. Biegel, and R. Venugopal. Two-dimensional quantum mechanical modeling of nanotransistors. Journal of Applied Physics, 91(4):2343–2354, 2002.
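For orientation, a generic marching scheme for the diagonal blocks of the inverse of a block-tridiagonal matrix can be sketched as follows. This is the textbook forward/backward recursion based on Schur complements and is not claimed to match the implementation in [Anant02]; the names `diag_blocks_of_inverse`, `D`, `U`, and `Lw` are hypothetical, and real-valued random blocks stand in for the physical matrices.

```python
import numpy as np

def diag_blocks_of_inverse(D, U, Lw):
    """Diagonal blocks of inv(A) for block-tridiagonal A.

    D[i] = A[i, i], U[i] = A[i, i+1], Lw[i] = A[i+1, i] (block indices).
    """
    N = len(D)
    g = [None] * N                       # forward sweep: one block stored per block-row
    g[0] = np.linalg.inv(D[0])
    for i in range(1, N):
        g[i] = np.linalg.inv(D[i] - Lw[i - 1] @ g[i - 1] @ U[i - 1])
    G = [None] * N                       # backward sweep: true diagonal blocks
    G[-1] = g[-1]
    for i in range(N - 2, -1, -1):
        G[i] = g[i] + g[i] @ U[i] @ G[i + 1] @ Lw[i] @ g[i]
    return G

# Illustrative sizes: 5 blocks of size 3, strongly diagonal so A is well conditioned.
rng = np.random.default_rng(1)
N, m = 5, 3
D = [rng.standard_normal((m, m)) + 4 * m * np.eye(m) for _ in range(N)]
U = [rng.standard_normal((m, m)) for _ in range(N - 1)]
Lw = [rng.standard_normal((m, m)) for _ in range(N - 1)]

A = np.zeros((N * m, N * m))
for i in range(N):
    A[i*m:(i+1)*m, i*m:(i+1)*m] = D[i]
for i in range(N - 1):
    A[i*m:(i+1)*m, (i+1)*m:(i+2)*m] = U[i]
    A[(i+1)*m:(i+2)*m, i*m:(i+1)*m] = Lw[i]

G = diag_blocks_of_inverse(D, U, Lw)
A_inv = np.linalg.inv(A)
max_err = max(np.max(np.abs(G[i] - A_inv[i*m:(i+1)*m, i*m:(i+1)*m])) for i in range(N))
print(max_err)
```

The forward sweep keeps one dense block per block-row until the backward sweep has used it, which is the kind of storage that grows with the number of blocks.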
![Page 20: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/20.jpg)
New divide and conquer algorithm
• Comparable computational efficiency
• Similar numerical conditioning
• Significantly reduced memory requirements, allowing large problems to be run on a single desktop computer
• Flexibility to distribute computation across multiple processors, since the algorithm parallelizes naturally
![Page 21: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/21.jpg)
Approach
• Compute inverse of block-tridiagonal matrices
• Adjust for “low rank” correction term (procedure can be continued recursively)
[Figure: correction terms are low rank]
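A single level of this idea can be sketched with a Woodbury-type identity: split the block-tridiagonal matrix into two independent halves, invert each, and then correct for the two coupling blocks that the split removed. The formulation below is an assumption about how such a correction can be written, not the authors' implementation; `inverse_by_splitting` and its arguments are hypothetical names, and dense inverses are used only to keep the sketch short.

```python
import numpy as np

def inverse_by_splitting(A, m, k):
    """Invert block-tridiagonal A (block size m) by splitting after block k.

    Writes A = D + U M U^T, where D is block-diagonal (two sub-problems) and
    U M U^T holds the two coupling blocks, a correction of rank at most 2m.
    """
    c = k * m                                    # scalar index of the split
    D_inv = np.zeros_like(A)
    D_inv[:c, :c] = np.linalg.inv(A[:c, :c])     # sub-problem 1
    D_inv[c:, c:] = np.linalg.inv(A[c:, c:])     # sub-problem 2
    idx = np.arange(c - m, c + m)                # rows/columns touched by the coupling
    M = np.zeros((2 * m, 2 * m))
    M[:m, m:] = A[c - m:c, c:c + m]              # upper coupling block
    M[m:, :m] = A[c:c + m, c - m:c]              # lower coupling block
    cols = D_inv[:, idx]                         # D^{-1} U: block-columns next to the split
    rows = D_inv[idx, :]                         # U^T D^{-1}: block-rows next to the split
    K = D_inv[np.ix_(idx, idx)]                  # U^T D^{-1} U: corner blocks only
    corr = cols @ M @ np.linalg.solve(np.eye(2 * m) + K @ M, rows)
    return D_inv - corr

# Illustrative sizes: 6 blocks of size 2, split in the middle.
rng = np.random.default_rng(2)
N, m = 6, 2
A = np.zeros((N * m, N * m))
for i in range(N):
    A[i*m:(i+1)*m, i*m:(i+1)*m] = rng.standard_normal((m, m)) + 4 * m * np.eye(m)
for i in range(N - 1):
    A[i*m:(i+1)*m, (i+1)*m:(i+2)*m] = rng.standard_normal((m, m))
    A[(i+1)*m:(i+2)*m, i*m:(i+1)*m] = rng.standard_normal((m, m))

split_err = np.max(np.abs(inverse_by_splitting(A, m, 3) - np.linalg.inv(A)))
print(split_err)
```

The small matrix `K` is assembled from corner blocks of the two sub-problem inverses alone, and the correction only needs the block-rows and block-columns adjacent to the split. Each half is itself block-tridiagonal, so the same split can be applied again.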
![Page 22: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/22.jpg)
Inverting
Computing and :
![Page 23: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/23.jpg)
Applying low-rank corrections
• Updating first block-row and last block-column too costly
• Instead, accumulate low-rank maps that underlie updates
![Page 24: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/24.jpg)
Matrix maps
• Used for combining sub-problems, and for computing diagonal entries of the inverse
• Maps depend on corner blocks of sub-problem solutions
![Page 25: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/25.jpg)
Parallel Implementation
• Separate problem into divisions
• Data passed to first division: [figure showing divisions 1 to 4]
![Page 26: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/26.jpg)
Parallel Implementation
• Each CPU only modifies its matrix maps
• Information exchanged at each combining step: a few matrices
[Figure: divisions I to IV]
![Page 27: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/27.jpg)
Single computer implementation
• Problem separated into divisions
• First pass: divisions solved one after the other, and matrix maps computed
• Second pass: divisions re-solved for first block-row and last block-column of the inverses, and matrix maps applied to get final answer
![Page 28: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/28.jpg)
Computation
• For each division, the first block-row and last block-column must be computed
• At each combining stage, the matrix maps of each division are updated
• Totals are given separately for a single computer and for multiple CPUs
![Page 29: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/29.jpg)
Memory
• For each division, the first block-row and last block-column must be stored
• For each division, the matrix maps must be stored
• Totals are given separately for a single computer and for multiple CPUs
![Page 30: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/30.jpg)
Results*
[Table: run times in minutes, Single Computer and Multi-processor]
* All results are reported for the retarded Green’s function ($G^r$)
** As compared to [Anant02] with $N_x = 100$
![Page 31: On the modeling and simulation of large-scale systems](https://reader038.vdocument.in/reader038/viewer/2022110102/568135fa550346895d9d6ce8/html5/thumbnails/31.jpg)
Conclusions
• Mathematical problems underlying the applications presented are well-studied
• Examples of recent work:
  • Hierarchical (H) matrices (Hackbusch et al.)
  • Nested dissection (Darve et al.)
• Much of the recent work is in general settings
• Presented work is closer to the application end; potential to exploit problem-specific information at the expense of generality