parallel multi-reference configuration interaction on jazz ron shepard (chm) mike minkoff (mcs) mike...

30
Parallel Multi- Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

Upload: jasmin-gibbs

Post on 30-Dec-2015

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

Parallel Multi-Reference Configuration Interaction on

JAZZ

Ron Shepard (CHM)Mike Minkoff (MCS)Mike Dvorak (MCS)

Page 2: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

The COLUMBUS Program System

• Molecular Electronic Structure• Collection of individual programs that

communicate through external files• 1: Atomic-Orbital Integral Generation

2: Orbital Optimization (MCSCF, SCF)3: Integral Transformation4: MR-SDCI5: CI Density6: Properties (energy gradient, geometry optimization)

Page 3: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

Real Symmetric Eigenvalue Problem

• Use the iterative Davidson Method for the lowest (or lowest few) eigenpairs

• Direct CI: H is not explicitly constructed, w=Hv are constructed in “operator” form

• Matrix dimensions are 104 to 109

• All floating point calculations are 64-bit

Page 4: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

Davidson MethodGenerate an initial vector x1

MAINLOOP: DO n=1, NITER Compute and save wn = H xn

Compute the nth row and column of G = XTHX = WTX Compute the subspace Ritz pair: (G – 1) c = 0 Compute the residual vector r = W c – X c Check for convergence using |r|, c, , etc. IF (converged) THEN EXIT MAINLOOP ELSE Generate a new expansion vector xn+1 from r, , v=Xc, etc. ENDIFENDDO MAINLOOP

Page 5: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

Matrix Elements

• Hmn = <m| Hop |n>

• |n> = |(r1) 1 (r2)2 … (rn)n | with j=,

H op =−h2

2mej

n

∑ ∇ j2 +

ZeZa

r j − Raa

Nuc

∑j

n

∑ +Ze

2

r j − rkj<k

n

≡ … ∫∫ dr1dr2…drn∫

Page 6: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

Hmn = hpq m E pq np,q

Norb

∑ + 12 gpqrs m epqrs n

p,q,r,s

Norb

…Matrix Elements

• hpq and gpqrs are computed and stored as arrays (with index symmetry)

• <m|Epq|n> and <m|epqrs|n> are coupling coefficients; these are sparse and are recomputed as needed

Page 7: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

wm = Hmn xn

n

Ncsf

= hpq m E pq n xn

p,q

Norb

∑n

Ncsf

∑ + 12 gpqrs m epqrs n xn

p,q,r,s

Norb

∑n

Ncsf

Matrix-Vector Products

• The challenge is to bring together the different factors in order to compute w efficiently

• w = H x

Page 8: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

Coupling Coefficient Evaluation

• Graphical Unitary Group Approach (GUGA)• Define a directed graph with nodes and arcs:

Shavitt Graph• Nodes correspond to spin-coupled states

consisting of a subset of the total number of orbitals

• Arcs correspond to the (up to) four allowed spin couplings when an orbital is added to the graph

Page 9: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

…Coupling Coefficient Evaluation

w,x,y,z

graph head

graph tail

Internal orbitals

External orbitals

Page 10: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

…Coupling Coefficient Evaluation

m E ij ′ m

Page 11: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

Integral Types

• 0: gpqrs

• 1: gpqra

• 2: gpqab, gpa,qb

• 3: gpabc

• 4: gabcd

z y x w

z 0 1 2 2

y 1 0,2 1,3 1,3

x 2 1,3 0,2,4 2

w 2 1,3 2 0,2,4

Page 12: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

Original Program (1980)

• Need to optimize wave functions for Ncsf=105 to 106

• Available memory was typically 105 words

• Must segment the vectors, v and w, and partition the matrix H into subblocks, then work with one subblock at a time.

Page 13: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

…First Parallel Program (1990)

• Networked workstations using TCGMSG• Each matrix subblock corresponds to a compute

task• Different tasks require different resources (pay

attention to load balancing)

• Same vector segmentation for all gpqrs types

• gpqrs, <m| epqrs |n>, w, and v were stored on external shared files (file contention bottlenecks)

Page 14: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

Current Parallel Program• Eliminate shared file I/O by distributing data

across the nodes with the GA Library• Parallel efficiency depends on the vector

segmentation and corresponding H subblocking• Apply different vector segmentation for different

gpqrs types• Tasks are timed each Davidson iteration, then

sorted into decreasing order and reassigned for the next iteration in order to optimize load balancing

• Manual tuning of the segmentation is required for optimal performance

• Capable of optimizing expansions up to Ncsf=109

Page 15: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

COLUMBUS-PetaflopsApplication

Mike Dvorak, Mike MinkoffMCS Division

Ron ShepardChemistry Division

Argonne National Lab

Page 16: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

Notes on software engineering

• PCIUDG parallel code– Fortran 77/90– Compiled with Intel/Myrinet on Jazz

• 70k lines in PCIUDG– 14 files containing ~205 subroutines

• Versioning system– Currently distributed in a tar file– Created a LCRC CVS repository for personal code

mods

Page 17: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

Notes on Software Engineering (cont)

• Homegrown preprocessing system– Uses “*mdc*if parallel” statements to

comment/uncomment parts of the code– Could/should be replaced with CPP directives

• Global Arrays library– Provides global address space for matrix computation– Used mainly for chemistry codes but applicable for

other applications– Ran with most current version --> no perf gain– Installed on Softenv on Jazz (version 3.2.6)

Page 18: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

Gprof Output• 270 subroutines called• loopcalc subroutine using ~20% of

simulation time• Added user defined MPE states to 50

loopcalc calls– Challenge due to large number of subroutines in

file– 2 GB file size severe limiter on number of procs– Broken logging

• Show actual output

Page 19: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

Jumpshot/MPE Instrumentation

Live Demo of a 20 proc run

Page 20: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

Using FPMPI

• Relinked code with FPMPI• Tell you total number of MPE calls made• Output file size smalled (compared to other tools

i.e. Jumpshot)• Produces a histogram of message sizes• Not installed in Softenv on Jazz yet

– ~riley/fpmpi-2.0

• Problem for runs– Double Zeta C2H4 without optimizing the load balance

Page 21: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

Total Number of MPI calls

Page 22: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

Max/Avg MPI Complete Time

Page 23: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

Avg/Max Time MPI Barrier

Page 24: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

COLUMBUS Performance Results

Page 25: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

COLUMBUS Performance Data

R. Shepard, M. Dvorak, M. Minkoff

Page 26: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

Timing of Steps (Sec.)

Time

Basis Set

Integral Time

Orbital

Opt. Time

CI Time

QZ 388 11806 382,221

TZ 26 104 31,415

DZ 1 34 3,281

Page 27: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

Walks Vs. Basis Set (Millions)

Walk Type

Basis SetZ Y X W

Matrix

Dim.

cc-pVQZ.08 15 536 305 858

cc-pVTZ.08 7 120 69 198

cc-pVDZ.08 2 13 8 24

Page 28: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

Timing of CI Iteration

Page 29: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

Basic Model of PerformanceTime = C1+C2*N+C3/N

Page 30: Parallel Multi-Reference Configuration Interaction on JAZZ Ron Shepard (CHM) Mike Minkoff (MCS) Mike Dvorak (MCS)

Constrained Linear TermC2 > 0