SuperLU – supernodal sparse LU direct solver




TRANSCRIPT

Page 1: SuperLU – supernodal sparse LU direct solver

Capabilities:
• Serial (thread-safe), shared-memory (SuperLU_MT, OpenMP or Pthreads), distributed-memory (SuperLU_DIST, hybrid MPI + OpenMP + CUDA). All implemented in C, with a Fortran interface.
• Sparse LU decomposition, triangular solution with multiple right-hand sides.
• Incomplete LU (ILU) preconditioner in serial SuperLU.
• Sparsity-preserving ordering:
  • Minimum degree ordering applied to A^T A or A^T + A [MMD, Liu '85]
  • Nested dissection ordering applied to A^T A or A^T + A [(Par)METIS, (PT-)Scotch]
• User-controllable pivoting: partial pivoting, threshold pivoting, static pivoting.
• Condition number estimation.
• Iterative refinement.
• Componentwise error bounds.

Download: www.crd.lbl.gov/~xiaoye/SuperLU

Further information:
• Contact: Sherry Li, [email protected]
• Developers: Sherry Li, Jim Demmel, John Gilbert, Laura Grigori, Piyush Sao, Meiyue Shao, Ichitaro Yamazaki
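Several of the capabilities listed on this slide can be tried from Python, since SciPy's `splu` and `spilu` wrap serial SuperLU. The following is a minimal sketch, not an official SuperLU example: the tridiagonal test matrix, its size, and the `drop_tol` and `fill_factor` values are arbitrary assumptions chosen only to make the script small and well conditioned.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import LinearOperator, gmres, spilu, splu

n = 200
# Hypothetical test matrix: nonsymmetric, diagonally dominant, tridiagonal (CSC format).
A = diags([-1.0, 4.0, -2.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")

# Sparse LU with a sparsity-preserving column ordering (minimum degree on A^T + A)
# and threshold pivoting (1.0 requests classical partial pivoting).
lu = splu(A, permc_spec="MMD_AT_PLUS_A", diag_pivot_thresh=1.0)

# Triangular solves with multiple right-hand sides: one column of B per system.
X_true = np.random.default_rng(0).standard_normal((n, 3))
B = A @ X_true
X = lu.solve(B)
direct_error = np.max(np.abs(X - X_true))

# Incomplete LU factorization used as a preconditioner for GMRES.
ilu = spilu(A, drop_tol=1e-4, fill_factor=10)
M = LinearOperator((n, n), matvec=ilu.solve)
b = np.ascontiguousarray(B[:, 0])
x_iter, info = gmres(A, b, M=M)  # info == 0 signals convergence
iterative_residual = np.linalg.norm(A @ x_iter - b) / np.linalg.norm(b)

print(direct_error, info, iterative_residual)
```

Other `permc_spec` values accepted by SciPy are `"NATURAL"`, `"MMD_ATA"`, and `"COLAMD"`. The nested dissection orderings, SuperLU_MT, and SuperLU_DIST are not reachable through this interface.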

Page 2: SuperLU_DIST: Recent advances

Increased scalability via new DAG-based scheduling algorithms that shorten the critical path.
• Idle time (MPI_Wait) was significantly reduced (2.6x faster using thousands of cores).

Architecture-aware: exploits heterogeneous nodes.
• Offloads fine-grained Schur-complement updates to GPU or MIC accelerators.
  • Programming: MPI + OpenMP + CUDA
  • Pipelined execution of CPU and GPU tasks
• 3x faster on multi-GPU or multi-Xeon Phi clusters, 2-5x reduction in memory usage.

“A distributed CPU-GPU sparse direct solver”, P. Sao, R. Vuduc, and X.S. Li, Euro-Par 2014, LNCS Vol. 8632, Porto, Portugal, August 25-29, 2014.
“A Sparse Direct Solver for Distributed Memory Xeon Phi-accelerated Systems”, P. Sao, X. Liu, R. Vuduc, and X.S. Li, IPDPS 2015, May 25-29, 2015.

[Figure: Pipeline execution: CPU & Accelerator (labels: CPU copy, Accelerator copy)]

Page 3: SuperLU usage and impact

• Over 26,000 downloads in FY 2014.
• SuperLU is mentioned in 5% of the NERSC projects (weighted by allocation size).

Used in many high-end simulation codes:
• ASCEM/Amanzi: Advanced Simulation Capability for Environmental Management, DOE
• Denovo: radiation transport simulations for nuclear reactors, DOE
• DGDFT: Discontinuous Galerkin Method for Density Functional Theory, DOE
• FEAP: finite element analysis, UC Berkeley
• H2plus: water simulation code, DOE
• HiFi: multi-fluid modeling for plasma applications, U. Washington
• M3D-C1: plasma fusion energy, DOE
• NekTar: high-order spectral-element Navier-Stokes solver, NCAR
• NIMROD: plasma fusion energy, DOE
• Omega3P: accelerator cavity design, DOE
• OpenSees: earthquake engineering, Pacific Earthquake Engineering Research Center
• PMAMR: CCSE code for carbon sequestration, DOE
• PHOENIX: stellar and planetary atmosphere code
• QUEST: Quantum electron simulation toolbox, UC Davis
• VORPAL: plasma physics simulation code, Tech-X

Adopted in many commercial mathematical libraries and simulation software, including AMD (circuit simulation), Boeing (aircraft design), Chevron, ExxonMobil (geology), Cray's LibSci, FEMLAB, HP's MathLib, IMSL, NAG, OptimaNumerics, Python (SciPy), Walt Disney Feature Animation.