Transcript of SC15 poster (sc15.supercomputing.org)

A High-Performance Preconditioned SVD Solver for Accurately Computing Large-Scale Singular Value Problems in PRIMME

Lingfei Wu, Andreas Stathopoulos (Advisor), Eloy Romero (Advisor), College of William and Mary, Williamsburg, Virginia, USA

Introduction

v Experimental setup:
Ø  Compare PRIMME_SVDS against TRLBD, Krylov-Schur (KS), Generalized Davidson (GD), and Jacobi-Davidson (JD) in SLEPc
Ø  Seek a small number of extreme singular triplets with and without preconditioning
Ø  Matrices and vectors distributed over 1024 processes on Edison (NERSC) or 96 processes on SciClone (W&M)

v Our approach: a preconditioned hybrid, two-stage SVD method on top of the state-of-the-art eigensolver PRIMME

Experiments and Results

PRIMME_SVDS: A High-Performance SVD Solver

State-of-The-Art SVD Solvers

v The problem: find k extreme singular triplets of A_{m×n}: A v_i = σ_i u_i, A^T u_i = σ_i v_i, i = 1, 2, …, k, k ≪ min(m, n)

v Iterative methods for computing the SVD:
Ø  Hermitian eigenvalue problem on C = A^T A or C = A A^T
Ø  Hermitian eigenvalue problem on B = [0 A^T; A 0]
Ø  Lanczos bidiagonalization (LBD): A = P B_d Q^T with B_d = X Σ Y^T, where U = PX and V = QY

v Comparison of different approaches
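The equivalences in this list can be sanity-checked on a toy example (illustrative Python, not part of PRIMME; the diagonal matrix is made up): for A = diag(3, 1), C = A^T A has eigenpair (σ², v), and the augmented matrix B has eigenpair (σ, [v; u]).

```python
def matvec(M, x):
    """Dense matrix-vector product on nested lists."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

A = [[3.0, 0.0],
     [0.0, 1.0]]
sigma = 3.0                      # largest singular value of A

# C = A^T A: eigenvalues are the squared singular values of A
C = [[9.0, 0.0],
     [0.0, 1.0]]
v = [1.0, 0.0]                   # right singular vector for sigma = 3
assert matvec(C, v) == [sigma**2 * vi for vi in v]   # C v = sigma^2 v

# Augmented matrix B = [0 A^T; A 0]: eigenvalues are +/- the singular values
B = [[0.0, 0.0, 3.0, 0.0],
     [0.0, 0.0, 0.0, 1.0],
     [3.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0]]
x = [1.0, 0.0, 1.0, 0.0]         # stacked vector [v; u], u = A v / sigma
assert matvec(B, x) == [sigma * xi for xi in x]      # B [v;u] = sigma [v;u]
```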

Fig. 3: Comparing different methods on relat9 and delaunay_n24

v Our goal: solve large-scale, sparse SVD problems with unprecedented efficiency, robustness and accuracy

v Scarcity of SVD software for large-scale problems

v A clear need for high-quality SVD solver software:
Ø  Implement state-of-the-art methods with massive parallelism
Ø  Compute the smallest singular triplets effectively and accurately
Ø  Leverage preconditioning to deal with the growing size and difficulty of real-world problems
Ø  Flexible interface suitable for both black-box and advanced usage

| Software    | Method(s)   | Lang   | MPI/SMP | Precond. | Extreme SVs | Fast Full Accuracy |
|-------------|-------------|--------|---------|----------|-------------|--------------------|
| PRIMME_SVDS | Multimethod | C      | Both    | Y        | Y           | Y                  |
| IRRHLB      | N/A         | Matlab | N       | N        | Y           | Y                  |
| IRLBA       | N/A         | Matlab | N       | N        | Y           | Y                  |
| JDSVD       | N/A         | Matlab | N       | Y        | Y           | Y                  |
| SVDIFP      | N/A         | Matlab | N       | Y        | Y           | N                  |
| SLEPc       | Many        | C      | MPI     | Y        | Y           | N                  |
| PROPACK     | LBD         | F77    | SMP     | N        | Y           | N                  |
| SVDPACK     | Lanczos     | F77    | N       | N        | Y           | N                  |

| Matrix          | relat9             | cage15        | delaunay_n24     | Laplacian       |
|-----------------|--------------------|---------------|------------------|-----------------|
| Application     | DNA                | Least squares | Graph clustering | QCD             |
| Dimension       | 12,360,060×549,336 | 5,154,859     | 16,777,216       | 8,000 per proc  |
| NNZ             | 38,955,420         | 99,199,551    | 100,663,202      | 55,760 per proc |
| Target          | Smallest           | Smallest      | Largest          | Largest         |
| Preconditioning | N/A                | BoomerAMG     | N/A              | N/A             |

v Eigenmethod on C = A^T A:
•  Fast for largest SVs
•  Slow for smallest SVs
•  Achieves accuracy of O(κ(A) ||A|| ε_mach)

v Eigenmethod on B = [0 A^T; A 0]:
•  Slow for largest SVs
•  Extremely slow for smallest SVs (interior eigenvalue problem)
•  Achieves accuracy of O(||A|| ε_mach)

v LBD on A:
•  Fast for largest SVs
•  Similar to C, but exhibits irregular convergence for smallest SVs
•  Achieves accuracy of O(||A|| ε_mach)

v PRIMME_SVDS: a preconditioned two-stage SVD method
Ø  Stage I: GD+k on C with dynamic tolerance
Ø  Stage II: JDQMR on B with initial guesses from C and a refined projection
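The O(κ(A) ||A|| ε_mach) limit of the C approach follows from simple floating-point arithmetic. A back-of-the-envelope sketch (illustrative values only, assuming ||A|| = 1 and a small singular value σ = 1e-8): an eigensolver on C can resolve σ² only up to a backward error of about ε||C|| = ε||A||², and taking the square root inflates that error by 1/(2σ).

```python
import math

eps = 2.2e-16                    # double-precision machine epsilon
norm_A = 1.0                     # assume ||A|| = 1
sigma = 1e-8                     # a small exact singular value

# C approach: the eigenvalue sigma^2 carries a backward error eps*||A||^2
lam_computed = sigma**2 + eps * norm_A**2
sigma_from_C = math.sqrt(lam_computed)
rel_err_C = abs(sigma_from_C - sigma) / sigma    # ~0.8: almost no digits left

# B approach: B has eigenvalue sigma itself, so the same relative backward
# error eps*||B|| = eps*||A|| perturbs sigma directly
sigma_from_B = sigma + eps * norm_A
rel_err_B = abs(sigma_from_B - sigma) / sigma    # ~2e-8: full accuracy

assert rel_err_C > 0.1 and rel_err_B < 1e-6
```

This is exactly the gap the two-stage design targets: converge fast on C first, then restore the lost digits on B.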

v Results I: methods comparison (low accuracy)

Fig. 2: Hybrid, two-stage SVD method and an example
[Plot: residual norm ||A^T u − σv|| vs. matrix-vector products for A = diag([1:10 1000:100:1e6]) with preconditioner A + rand(1,10000)*1e4; curves: GD+1 on B; GD+1 on C to limit; GD+1 on C to ε||A||_2, then switch to GD+1 on B]

This work is supported by NSF under grants No. CCF-1218349 and ACI SI2-SSE-1440700, and by DOE under grant No. DE-FC02-12ER41890

v Key components:
Ø  Stage I: fast convergence on matrix C to compute approximate eigenvalues and eigenvectors
Ø  Stage II: improve accuracy by exploiting the power of PRIMME and a specialized refined projection on matrix B

v Parallel implementation of PRIMME_SVDS:
Ø  User-defined data distribution among processes
Ø  User-defined parallel matrix-vector and preconditioning functions
Ø  User-provided global summation for dot products
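A minimal caricature of the two-stage scheme (illustrative only: power iteration stands in for PRIMME's GD+k and JDQMR, and the 2×2 matrix is made up; this is not the PRIMME_SVDS algorithm itself, only its shape):

```python
import math

def matvec(M, x):
    """Dense matrix-vector product on nested lists."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))

A  = [[2.0, 0.0], [1.0, 1.0]]
At = [[2.0, 1.0], [0.0, 1.0]]    # A^T
C  = [[5.0, 1.0], [1.0, 1.0]]    # C = A^T A

# Stage I: iterate on C, whose extreme eigenvalues separate fast, to get
# the right singular vector v and sigma^2 = lambda_max(C)
v = [1.0, 1.0]
for _ in range(100):
    w = matvec(C, v)
    v = [wi / norm(w) for wi in w]

# Stage II: recover u = A v / sigma and check the second residual
# A^T u - sigma v, which the B-stage drives down to full accuracy
Av = matvec(A, v)
sigma = norm(Av)
u = [x / sigma for x in Av]
r = [matvec(At, u)[i] - sigma * v[i] for i in range(2)]

sigma_exact = math.sqrt(3.0 + math.sqrt(5.0))   # largest SV of this A
assert abs(sigma - sigma_exact) < 1e-8 and norm(r) < 1e-8
```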

Table 3: Dimension of the matrices and parameters used in the experiments

Fig. 1: Advantages and disadvantages of different approaches

Table 1: Functionalities of different SVD solvers

v Results II: scalability performance (high accuracy)

Fig. 4: Strong scaling: seek smallest on relat9 and cage15

Fig. 5: Strong and Weak scaling: seek largest on delaunay_n24 and Laplacian

v Highlights of PRIMME_SVDS:
Ø  Among the fastest and most robust production-level software for computing a small number of singular triplets
Ø  Exploits preconditioning for large-scale problems
Ø  Computes the smallest SVs efficiently and to full accuracy
Ø  Demonstrates good scalability under both strong and weak scaling, even for highly sparse matrices
Ø  Free software, available at: https://github.com/primme/primme

[Plots for Figs. 4 and 5: run time (seconds) and parallel speedup/efficiency vs. number of MPI processes; panels: "Seeking Smallest without Preconditioning" (16–512 processes), "Seeking Smallest with Preconditioning" (16–1024 processes), "Seeking Largest without Preconditioning" (strong scaling, 16–512 processes; weak scaling, 64–1000 processes with parallel efficiency 0–1)]

| Operations                                   | Kernels or Libs                                     | Cost per iteration           | Scalability           |
|----------------------------------------------|-----------------------------------------------------|------------------------------|-----------------------|
| Dense algebra: MV, MM, inner products, scale | BLAS (e.g., MKL, ESSL, OpenBLAS, ACML)              | O((m+n)·numSVs)              | Good                  |
| Sparse algebra: SpMV, SpMM, preconditioner   | User defined (e.g., PETSc, Trilinos, HYPRE, librsb) | O(1) calls                   | Application dependent |
| Reduction                                    | User defined (e.g., MPI_Allreduce)                  | O(1) calls of size O(numSVs) | Machine dependent     |

Table 2: Parallel characteristics of PRIMME operations
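These parallel characteristics reflect PRIMME's callback design: the solver performs only dense algebra itself and delegates the operator and the reductions to user code. A hypothetical Python sketch of that inversion of control (function names invented, not PRIMME's actual API):

```python
import math

def power_iteration(n, matvec, global_sum, iters=200):
    """Largest eigenvalue of a symmetric operator given only callbacks.
    The solver never touches the matrix or the network directly."""
    x = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        y = matvec(x)
        # All reductions (dot products) go through global_sum, which an
        # MPI user would implement with MPI_Allreduce over local partials
        lam = global_sum(sum(xi * yi for xi, yi in zip(x, y)))
        nrm = math.sqrt(global_sum(sum(yi * yi for yi in y)))
        x = [yi / nrm for yi in y]
    return lam

# Serial "user side": a local matvec and a trivial global sum
C = [[5.0, 1.0], [1.0, 1.0]]
lam = power_iteration(
    2,
    matvec=lambda x: [sum(C[i][j] * x[j] for j in range(2)) for i in range(2)],
    global_sum=lambda s: s,      # with MPI this would be an all-reduce
)
assert abs(lam - (3.0 + math.sqrt(5.0))) < 1e-10
```

Because only these two callbacks see the distributed data, any data distribution or sparse-matrix library can be plugged in without changing the solver.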

[Plots for Fig. 3: speedup over TRLBD vs. number of MPI processes (20–100); left panel "Seeking Smallest without Preconditioning" (log scale, 10^0–10^2; PRIMME_SVDS, JD on C, KS on C, TRLBD), right panel "Seeking Largest without Preconditioning" (speedup 1–4; PRIMME_SVDS, GD on C, KS on C, TRLBD)]
