TRANSCRIPT
Stochastic Optimization of Complex Energy Systems on High-Performance Computers
Cosmin G. Petra
Mathematics and Computer Science Division
Argonne National Laboratory, [email protected]
SIAM CSE 2013
Joint work with Olaf Schenk (USI Lugano), Miles Lubin (MIT), Klaus Gaertner (WIAS Berlin)
Outline
Application of HPC to power-grid optimization under uncertainty
Parallel interior-point solver (PIPS-IPM) – structure exploiting
Revisiting linear algebra
Experiments on BG/P with the new features
Stochastic unit commitment with wind power
Wind forecast – WRF (Weather Research and Forecasting) model
– Real-time grid-nested 24h simulation
– 30 samples require 1h on 500 CPUs (Jazz@Argonne)
\min \; \frac{1}{N_S} \sum_{s \in S} \sum_{k \in T} \sum_{j \in N} \left( c_j \, p_{sjk} + c^u_j \, u_{jk} + c^d_j \, d_{jk} \right)

s.t. \quad \sum_{j \in N} p_{sjk} + \sum_{j} p^{wind}_{sjk} = D_k, \quad \forall s \in S, \; k \in T

\qquad \sum_{j \in N} \bar p_j \, u_{jk} \ge D_k + R_k, \quad \forall s \in S, \; k \in T

\qquad 0 \le p^{wind}_{sjk} \le \bar p^{wind}_{sjk}

\qquad ramping constr., min. up/down constr.

(S: scenario set, N: generator set, T: time horizon; p: dispatch levels, u/d: start-up/shut-down decisions)
Slide courtesy of V. Zavala & E. Constantinescu
(Figure: wind farm and thermal generator)
Stochastic Formulation
Discrete distribution leads to block-angular LP
Large-scale (dual) block-angular LPs
• In terminology of stochastic LPs:
• First-stage variables (decision now): x0
• Second-stage variables (recourse decision): x1, …, xN
• Each diagonal block is a realization of a random variable (scenario)
Extensive form
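The extensive form can be sketched with SciPy's sparse block assembly. The toy data below (A, Ti, Wi and their sizes) are made up for illustration and are not from the talk:

```python
import numpy as np
import scipy.sparse as sp

def extensive_form(A, Ts, Ws):
    """Assemble the dual block-angular constraint matrix
         [ A              ]
         [ T1  W1         ]
         [ T2      W2     ]
         [ T3          W3 ]
    where A constrains only the first-stage variables x0 and scenario i
    contributes the coupled block row  Ti x0 + Wi xi = hi."""
    N = len(Ws)
    rows = [[A] + [None] * N]
    for i, (T, W) in enumerate(zip(Ts, Ws)):
        row = [T] + [None] * N
        row[1 + i] = W
        rows.append(row)
    return sp.bmat(rows, format="csr")

# Toy instance: 2 first-stage variables, 3 scenarios, 2 recourse variables each.
A = sp.csr_matrix([[1.0, 1.0]])
Ts = [sp.csr_matrix([[1.0, 0.0], [0.0, 1.0]]) for _ in range(3)]
Ws = [sp.identity(2, format="csr") for _ in range(3)]
M = extensive_form(A, Ts, Ws)
print(M.shape)  # (7, 8): 1 + 3*2 rows, 2 + 3*2 columns
```

Note how `None` blocks encode the zeros: each scenario column is touched only by its own Wi, which is exactly the block-angular sparsity that PIPS-IPM exploits.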
Computational challenges and difficulties
May require many scenarios (100s, 1,000s, 10,000s, …) to accurately model uncertainty
"Large" scenarios (Wi up to 250,000 x 250,000)
"Large" 1st stage (1,000s, 10,000s of variables)
Easy to build a practical instance that requires 100+ GB of RAM to solve
Requires distributed memory
Real-time solution needed in our applications
Linear algebra of primal-dual interior-point methods (IPM)
Convex quadratic problem:

\min_x \; \tfrac{1}{2} x^T Q x + c^T x \quad \text{s.t.} \quad Ax = b, \; x \ge 0

IPM linear system:

\begin{bmatrix} Q & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix} = \text{rhs}
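A minimal dense sketch of one such linear-system solve, on a made-up toy QP (a real primal-dual IPM augments Q with the barrier term X^{-1}S and iterates with predictor/corrector right-hand sides):

```python
import numpy as np

# Hypothetical toy QP (data invented for illustration):
#   min 1/2 x^T Q x + c^T x   s.t.  Ax = b, x >= 0
Q = np.array([[2.0, 0.0], [0.0, 2.0]])
c = np.array([-2.0, -4.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# Solve the saddle-point system  [Q A^T; A 0] [dx; dy] = rhs.
n, m = Q.shape[0], A.shape[0]
K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
rhs = np.concatenate([-c, b])
sol = np.linalg.solve(K, rhs)
dx, dy = sol[:n], sol[n:]
print(dx, dy)  # [0. 1.] [2.]
```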
Two-stage SP: arrow-shaped linear system (modulo a permutation)

\begin{bmatrix}
K_1 &     &        &     & B_1 \\
    & K_2 &        &     & B_2 \\
    &     & \ddots &     & \vdots \\
    &     &        & K_N & B_N \\
B_1^T & B_2^T & \cdots & B_N^T & K_0
\end{bmatrix}

Each diagonal block K_i is itself a saddle-point matrix built from the scenario Hessian H_i and Jacobian A_i; K_0 is the first-stage block.

Multi-stage SP: nested arrow-shaped systems

N is the number of scenarios

2 solves per IPM iteration:
– predictor directions
– corrector directions
Special Structure of KKT System (Arrow-shaped)
Parallel Solution Procedure for KKT System
The solve proceeds in five steps: 1) compute the scenario contributions B_i^T K_i^{-1} B_i, 2) form the Schur complement C, 3) factorize C, 4) solve the first-stage (Schur) system, 5) back-substitute for the scenario variables.
Steps 1 and 5 trivially parallel – "scenario-based decomposition"
Steps 1, 2, 3 are >95% of total execution time.
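The five steps can be sketched serially in NumPy on toy dense blocks (in PIPS-IPM each scenario's steps 1 and 5 run on their own MPI process and the blocks are sparse; all sizes and data below are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n0, ni = 3, 2, 4   # scenarios, first-stage size, per-scenario size (toy)

# Symmetric indefinite toy blocks: K_i on the diagonal, borders B_i,
# first-stage block K_0, and a right-hand side (r_1, ..., r_N, r_0).
Ks = [np.diag(rng.uniform(1.0, 2.0, ni) * np.array([1, 1, -1, -1])) for _ in range(N)]
Bs = [rng.standard_normal((ni, n0)) for _ in range(N)]
K0 = np.diag([5.0, -5.0])
rs = [rng.standard_normal(ni) for _ in range(N)]
r0 = rng.standard_normal(n0)

# Steps 1-2: scenario contributions and the Schur complement
#   C = K_0 - sum_i B_i^T K_i^{-1} B_i
# (each loop iteration is independent -> one scenario per process).
C = K0.copy()
rhs0 = r0.copy()
for Ki, Bi, ri in zip(Ks, Bs, rs):
    C -= Bi.T @ np.linalg.solve(Ki, Bi)
    rhs0 -= Bi.T @ np.linalg.solve(Ki, ri)

# Steps 3-4: factorize/solve the dense first-stage system C x0 = rhs0.
x0 = np.linalg.solve(C, rhs0)

# Step 5: recover scenario unknowns x_i = K_i^{-1} (r_i - B_i x0), in parallel.
xs = [np.linalg.solve(Ki, ri - Bi @ x0) for Ki, Bi, ri in zip(Ks, Bs, rs)]
```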
Components of Execution Time
Notice break in y-axis scale
Scenario Calculations – Steps 1 and 5
Each scenario is assigned to an MPI process, which locally performs steps 1 and 5.
Matrices are sparse and symmetric indefinite (symmetric with positive and negative eigenvalues).
Computing B_i^T K_i^{-1} B_i is very expensive: it requires solving with the factors of K_i against the non-zero columns of B_i and multiplying from the left with B_i^T.

4 hours 10 minutes wall time to solve a 4h-horizon problem with 8k scenarios on 8k nodes.

Need to run under strict time requirements
– For example, solve 24h-horizon problem in less than 1h
Revisiting scenario computations for shared-memory
"Compute SC" (Step 1): B_i^T K_i^{-1} B_i
– Multiple sparse right-hand sides
– Triangular-solve phase hard to parallelize in shared memory (multi-core)
– Factorization phase speeds up very well and achieves considerable peak performance

Our approach: incomplete factorization of the augmented matrix

M_i = \begin{bmatrix} K_i & B_i \\ B_i^T & 0 \end{bmatrix}

Stop the factorization after the elimination of the (1,1) block; -B_i^T K_i^{-1} B_i will sit in the (2,2) block (Schur complement).
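The trick can be checked with a dense stand-in for the incomplete factorization: block elimination of the (1,1) block of M_i leaves exactly -B_i^T K_i^{-1} B_i in the (2,2) position. Toy data invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
ni, n0 = 5, 2
K = rng.standard_normal((ni, ni))
K = K + K.T + 10.0 * np.eye(ni)      # symmetric, safely nonsingular toy K_i
B = rng.standard_normal((ni, n0))    # toy border block B_i

# Augmented matrix M_i = [[K_i, B_i], [B_i^T, 0]].
M = np.block([[K, B], [B.T, np.zeros((n0, n0))]])

# Dense stand-in for PARDISO-SC's incomplete factorization: eliminate
# only the first ni pivots (the (1,1) block).  What remains in the
# (2,2) position is the Schur complement  0 - B^T K^{-1} B.
schur_22 = M[ni:, ni:] - M[ni:, :ni] @ np.linalg.solve(M[:ni, :ni], M[:ni, ni:])

# Identical to the "naive" route: solve with the factors of K against
# the columns of B, then multiply by B^T from the left.
naive = -B.T @ np.linalg.solve(K, B)
print(np.allclose(schur_22, naive))  # True
```

The payoff is that the work moves from the hard-to-parallelize triangular solves into the factorization phase, which scales well on multi-core nodes.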
Implementation
Requires modification of the linear solver PARDISO (Schenk) -> PARDISO-SC
Pivot perturbations during factorization are needed to maintain numerical stability.
Errors due to perturbations are normally absorbed by iterative refinement.
This would be extremely expensive in our case (many right-hand sides).
We let errors propagate into the "global" Schur complement C (Step 2):

C = K_0 - \sum_{i=1}^{N} B_i^T \tilde K_i^{-1} B_i

Factorize the perturbed C (denoted \tilde C) (Step 3).
After Steps 1, 2 and 3, we have the factorization of an approximation matrix

\tilde K = \begin{bmatrix}
\tilde K_1 &        &            & B_1 \\
           & \ddots &            & \vdots \\
           &        & \tilde K_N & B_N \\
B_1^T      & \cdots & B_N^T      & K_0
\end{bmatrix}, \qquad \tilde K_i = K_i + \text{pivot perturbations}
Pivot error absorption by preconditioned BiCGStab
We still have to solve with the unperturbed matrix K.
"Absorb errors" by solving Kz = r using preconditioned BiCGStab
– Numerical experiments showed it is more robust than iterative refinement.
The preconditioner is \tilde K. Each BiCGStab iteration requires
– 2 mat-vecs: Kz
– 2 applications of the preconditioner: \tilde K^{-1} z
One application of the preconditioner amounts to performing "solve" steps 4 and 5 for \tilde K.
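A minimal sketch of this error absorption with SciPy's BiCGStab, using a toy dense K and a diagonally perturbed K~ as the preconditioner (stand-ins, not the actual PIPS matrices):

```python
import numpy as np
import scipy.linalg as sla
from scipy.sparse.linalg import LinearOperator, bicgstab

rng = np.random.default_rng(2)
n = 50
K = rng.standard_normal((n, n))
K = K + K.T + n * np.eye(n)          # toy stand-in for the exact matrix K

# K~: the matrix we actually factorized, with small diagonal shifts
# playing the role of PARDISO's pivot perturbations.
Ktilde = K + np.diag(1e-4 * rng.uniform(size=n))
lu, piv = sla.lu_factor(Ktilde)

# One preconditioner application = one solve with the factors of K~
# (in PIPS this is the Schur-complement "solve" steps 4 and 5).
M = LinearOperator((n, n), matvec=lambda v: sla.lu_solve((lu, piv), v))

r = rng.standard_normal(n)
z, info = bicgstab(K, r, M=M, atol=1e-12)
print(info)  # 0 -> converged
```

Because K~ differs from K only by the tiny perturbations, the preconditioned operator is close to the identity and BiCGStab converges in very few iterations.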
Summary of the new approach
1. Calculate B_i^T \tilde K_i^{-1} B_i, i = 1, …, N  ("Compute S.C.")
2. Form C = K_0 - \sum_{i=1}^{N} B_i^T \tilde K_i^{-1} B_i  ("Form S.C.")
3. Factorize C = L_C D_C L_C^T  ("Factor S.C.")
do BiCGStab iteration on Kz = r
   apply preconditioner \tilde K^{-1} (steps 4 and 5)
   mat-vec Kz
   apply preconditioner (steps 4 and 5)
   mat-vec Kz
while BiCGStab has not converged
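The whole loop can be sketched end-to-end on a toy arrow-shaped system: steps 1-3 build the Schur complement from the perturbed blocks K~_i, steps 4-5 implement the preconditioner application, and BiCGStab runs against the exact K. All sizes and data are invented for illustration:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, bicgstab

rng = np.random.default_rng(3)
N, ni, n0 = 4, 6, 3                               # toy sizes
Ks = [rng.standard_normal((ni, ni)) for _ in range(N)]
Ks = [Ki + Ki.T + 10.0 * np.eye(ni) for Ki in Ks]  # exact diagonal blocks
Bs = [rng.standard_normal((ni, n0)) for _ in range(N)]
K0 = 10.0 * np.eye(n0)

# K~_i: pivot-perturbed blocks (stand-in for PARDISO-SC's perturbations).
Kts = [Ki + np.diag(1e-3 * rng.uniform(size=ni)) for Ki in Ks]

# Steps 1-3 on the perturbed blocks: form the (perturbed) Schur complement.
C = K0 - sum(Bi.T @ np.linalg.solve(Kti, Bi) for Kti, Bi in zip(Kts, Bs))

def apply_Ktilde_inv(r):
    """Steps 4 and 5: solve K~ z = r through the Schur complement C."""
    rs = [r[i * ni:(i + 1) * ni] for i in range(N)]
    r0 = r[N * ni:]
    rhs0 = r0 - sum(Bi.T @ np.linalg.solve(Kti, ri)
                    for Kti, Bi, ri in zip(Kts, Bs, rs))
    z0 = np.linalg.solve(C, rhs0)
    zs = [np.linalg.solve(Kti, ri - Bi @ z0)
          for Kti, Bi, ri in zip(Kts, Bs, rs)]
    return np.concatenate(zs + [z0])

# Exact arrow-shaped K (unperturbed blocks), used only for the mat-vecs Kz.
dim = N * ni + n0
K = np.zeros((dim, dim))
for i, (Ki, Bi) in enumerate(zip(Ks, Bs)):
    K[i * ni:(i + 1) * ni, i * ni:(i + 1) * ni] = Ki
    K[i * ni:(i + 1) * ni, N * ni:] = Bi
    K[N * ni:, i * ni:(i + 1) * ni] = Bi.T
K[N * ni:, N * ni:] = K0

b = rng.standard_normal(dim)
x, info = bicgstab(K, b, M=LinearOperator((dim, dim), matvec=apply_Ktilde_inv),
                   atol=1e-12)
```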
Test architecture
"Intrepid" Blue Gene/P supercomputer
– 40,960 nodes
– Custom interconnect
– Each node has a quad-core 850 MHz PowerPC processor, 2 GB RAM
DOE INCITE Award 2012-2013 – 24 million core hours
Numerical experiments
4h (UC4), 12h (UC12), 24h (UC24) horizon problems

1 scenario per node (4 cores per scenario)

Large-scale: 12h horizon, up to 32k scenarios and 128k cores (k = 1,024)
– 16k scenarios – 2.08 billion variables, 1.81 billion constraints, KKT system size = 3.89 billion

LAPACK + SMP ESSL BLAS for first-stage linear systems
PARDISO-SC for second-stage linear systems
Compute SC Times
Time per IPM iteration UC12, 32k scenarios, 32k nodes (128k cores)
BiCGStab iteration count ranges from 0 to 1.5
Cost of absorbing factorization perturbation errors is between 10 and 30% of total iteration cost
Solve to completion – UC12
Nodes/scens   Wall time (sec)   IPM iterations   Time per IPM iteration (sec)
4096          3548.5            103              33.57
8192          3883.7            112              34.67
16384         4208.8            123              34.80
32768         4781.7            133              35.95
Before: 4 hours 10 minutes wall time to solve the UC4 problem with 8k scenarios on 8k nodes
Now: the 3x-longer-horizon UC12 problem with 8k scenarios on 8k nodes solves in about 65 minutes (table above)
Weak scaling
Strong scaling
Conclusions and Future Considerations
Multicore-friendly reformulation of sparse linear algebra computations leads to one order of magnitude faster execution times:
– Fast factorization-based computation of the SC
– Robust and cheap absorption of pivot errors via Krylov iterative methods
Parallel efficiency of PIPS remains good.
Performance evaluation on today's supercomputers
– IBM BG/Q
– Cray XK7, XC30