Tensor contraction engine & extensible many-electron theory module in NWChem So Hirata Pacific Northwest National Laboratory MSS group meeting (24 Oct, 2002)


Page 1: Tensor contraction engine & extensible many-electron theory module in NWChem

So Hirata
Pacific Northwest National Laboratory

MSS group meeting (24 Oct, 2002)

Page 2: Collaborators & Sponsors

• M. Nooijen (Princeton University)
• R. J. Harrison & D. Bernholdt (Oak Ridge National Laboratory)
• D. Cociorva, G. Baumgartner, R. Pitzer, & P. Sadayappan (Ohio State University)
• J. Ramanujam (Louisiana State University)

• Office of Basic Energy Sciences, Department of Energy
• Office of Biological and Environmental Research, Department of Energy

Page 3: Purpose of this project

• Create a high-level symbolic manipulation language that derives the working equations of second-quantized many-electron theories and implements them automatically
  • Expedites complex and error-prone many-electron theory implementation
  • Helps develop and examine new theories or approximations
  • Facilitates parallelization and other laborious code optimizations
  • CCSDT T3 code is >18000 lines in Fortran77!

Page 4: Operator contraction engine (OCE)

• Object-oriented symbolic manipulation program that derives working equations from any well-defined second-quantized many-electron theory ansatz

• Performs valid contractions of normal-ordered operators according to Wick’s theorem and reduces any given ansatz into the simplest form of tensor contraction expressions

• Consolidates identical terms and recognizes terms that are related by permutation symmetry

Page 5: Tensor contraction engine (TCE)

• Object-oriented symbolic manipulation program that analyzes tensor contraction expressions and implements them as efficient programs

• Breaks down multiple tensor contractions (A=BCDE) into a sequence of elementary tensor contractions (X=DE; Y=BX; A=YC) with minimal operation costs

• Factorizes common contractions [X=BC+BD into X=B(C+D)]

• Generates debug-level Fortran90 programs and release-level parallel Fortran77 programs
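The breakdown of a multiple contraction into binary contractions can be illustrated with a brute-force search. This is only a minimal sketch, not TCE's actual algorithm: the function name `best_order`, the cost model (one multiply-add per combination of the indices involved in a binary step), and the index ranges are all assumptions made for illustration.

```python
from itertools import combinations
from math import prod


def best_order(tensors, output, dims):
    """Exhaustively search binary contraction orders.

    tensors: dict mapping a tensor name to a frozenset of its index labels
    output:  iterable of index labels that survive in the final result
    dims:    dict mapping each index label to its range
    Returns (total_cost, steps), where steps lists (intermediate, cost).
    """
    if len(tensors) == 1:
        return 0, []
    best = None
    for x, y in combinations(sorted(tensors), 2):
        rest = {k: v for k, v in tensors.items() if k not in (x, y)}
        involved = tensors[x] | tensors[y]
        # An index stays on the intermediate only if something else still needs it.
        needed = set(output).union(*rest.values())
        keep = frozenset(involved & needed)
        # Cost model (assumption): one multiply-add per combination of involved indices.
        cost = prod(dims[i] for i in involved)
        name = f"({x}*{y})"
        sub_cost, sub_steps = best_order({**rest, name: keep}, output, dims)
        if best is None or cost + sub_cost < best[0]:
            best = (cost + sub_cost, [(name, cost)] + sub_steps)
    return best


# A[i,m] = sum_{k,l} B[i,k] C[k,l] D[l,m], a three-tensor instance of A = BCD.
dims = {"i": 10, "k": 100, "l": 5, "m": 50}
tensors = {"B": frozenset("ik"), "C": frozenset("kl"), "D": frozenset("lm")}
cost, steps = best_order(tensors, "im", dims)
print(cost, steps)
```

With these hypothetical ranges the search contracts B with C first (5000 operations) and then the result with D (2500 operations), for a total of 7500; contracting C with D first would cost 75000 under the same model.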

Page 6: OCE & TCE demonstration

Page 7: What is new?

• Full exploitation of index permutation symmetry
  • By also taking advantage of spin and spatial symmetry, the programs generated by TCE are theoretically operation-cost minimal
• OCE extracts permutation symmetries among working equations
• TCE breaks down permutation operators into elementary permutation operators, analyzes which permutation symmetries can be exploited, and reflects the result in the generated code

Page 8: Permutation symmetry

• Primitive tensors that appear in many-electron theories possess “permutation anti-symmetry.” For example, a two-electron integral tensor and a three-electron excitation amplitude tensor have the following properties:

$$v^{pq}_{rs} = -v^{pq}_{sr} = -v^{qp}_{rs} = v^{qp}_{sr}$$

$$t^{abc}_{ijk} = -t^{abc}_{ikj} = -t^{abc}_{jik} = t^{abc}_{jki} = t^{abc}_{kij} = -t^{abc}_{kji}$$

The same relations hold for permutations of the superscripts $a$, $b$, $c$, so all 36 elements are related by $t^{P(abc)}_{Q(ijk)} = \mathrm{sgn}(P)\,\mathrm{sgn}(Q)\,t^{abc}_{ijk}$.
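The 36 related elements of the triples amplitude can be enumerated mechanically. The following is a minimal sketch under the assumption that every transposition of two superscripts or two subscripts contributes a factor of -1; the function names `parity` and `related_elements` are made up for this illustration.

```python
from itertools import permutations


def parity(perm):
    """Return +1 for an even permutation of range(n), -1 for an odd one."""
    perm = list(perm)
    sign = 1
    for i in range(len(perm)):
        # Swap entries into place; every swap flips the sign.
        while perm[i] != i:
            j = perm[i]
            perm[i], perm[j] = perm[j], perm[i]
            sign = -sign
    return sign


def related_elements(upper, lower):
    """List (sign, permuted upper, permuted lower) for an antisymmetric tensor."""
    out = []
    for pu in permutations(range(len(upper))):
        for pl in permutations(range(len(lower))):
            sign = parity(pu) * parity(pl)
            out.append((sign,
                        "".join(upper[k] for k in pu),
                        "".join(lower[k] for k in pl)))
    return out


for sign, up, lo in related_elements("pq", "rs"):
    print(f"{sign:+d} v^{up}_{lo}")
print(len(related_elements("abc", "ijk")), "related elements of t^abc_ijk")
```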

Page 9: Implication

• Reduced storage size
  • Instead of storing full $t^{ab}_{ij}$, we may keep only $t^{a<b}_{i<j}$
• Reduced operation cost by shorter summation index ranges

$$\sum_{c,d} t^{cd}_{ij}\, v^{ab}_{cd} = 2 \sum_{c<d} t^{cd}_{ij}\, v^{ab}_{cd}$$

• Reduced operation cost by shorter target index ranges
  • Instead of computing full $2\sum_{c<d} t^{cd}_{ij}\, v^{ab}_{cd}$, we may obtain just $2\sum_{c<d} t^{cd}_{i<j}\, v^{a<b}_{cd}$
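The factor of 2 in the restricted summation and the storage saving can be checked numerically. This is a small sketch with random antisymmetric tensors; for simplicity it assumes (unlike a real calculation) that every index runs over the same range `n`, and the helper names are hypothetical.

```python
import random
from itertools import product


def random_antisymmetric(n, rng):
    """Return a[p][q][r][s] that changes sign under p<->q and under r<->s."""
    a = [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in range(n)]
    for p, q, r, s in product(range(n), repeat=4):
        if p < q and r < s:
            x = rng.uniform(-1.0, 1.0)
            a[p][q][r][s] = a[q][p][s][r] = x
            a[q][p][r][s] = a[p][q][s][r] = -x
    return a


def full_sum(t, v, a, b, i, j, n):
    """Sum over all c, d of t^{cd}_{ij} v^{ab}_{cd}."""
    return sum(t[c][d][i][j] * v[a][b][c][d] for c in range(n) for d in range(n))


def restricted_sum(t, v, a, b, i, j, n):
    """Twice the sum over c < d only."""
    return 2.0 * sum(t[c][d][i][j] * v[a][b][c][d]
                     for c in range(n) for d in range(c + 1, n))


n = 4
rng = random.Random(0)
t = random_antisymmetric(n, rng)   # t[c][d][i][j] stands for t^{cd}_{ij}
v = random_antisymmetric(n, rng)   # v[a][b][c][d] stands for v^{ab}_{cd}
max_diff = max(abs(full_sum(t, v, a, b, i, j, n) - restricted_sum(t, v, a, b, i, j, n))
               for a, b, i, j in product(range(n), repeat=4))
stored = (n * (n - 1) // 2) ** 2   # elements with a < b and i < j
print(max_diff < 1e-12, stored, n ** 4)
```

For `n = 4` only 36 of the 256 elements need to be stored, and the restricted sum visits 6 of the 16 (c, d) pairs.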

Page 10: Challenges

• What is the index permutation symmetry of an intermediate tensor?
  • Consider the intermediate $i^{ab}_{ij} = t^{a}_{i}\, t^{b}_{j}$
• What is the best way to store just the non-redundant elements of tensors?
• What is the operation cost minimal contraction of two tensors with permutation symmetry?
• How can TCE generate code that exploits spin, spatial, and permutation symmetries at the same time?
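A short worked check, using only the definition of the product of two singles amplitudes, suggests why the first question is not trivial. The intermediate is not anti-symmetric under a swap of the superscripts alone, but it is symmetric when superscripts and subscripts are swapped together:

```latex
i^{ab}_{ij} = t^{a}_{i}\, t^{b}_{j}, \qquad
i^{ba}_{ij} = t^{b}_{i}\, t^{a}_{j} \neq -\,i^{ab}_{ij} \ \text{(in general)}, \qquad
i^{ba}_{ji} = t^{b}_{j}\, t^{a}_{i} = i^{ab}_{ij}
```

Anti-symmetry in $a$ and $b$ is recovered only after the permutation is applied explicitly, for example in $i^{ab}_{ij} - i^{ba}_{ij}$.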

Page 11: Index permutation symmetry versus permutation symmetry of tensor contraction expressions

• Index permutation anti-symmetry ultimately reflects the Pauli principle of fermions; any tensor having electron indices (such as integrals, excitation amplitudes) is anti-symmetric
  • When there is such a multiple tensor contraction as

$$i^{abc}_{ijk} = \sum_{d,m,n} t^{abd}_{jkn}\, t^{c}_{m}\, v^{mn}_{id},$$

there “must” be also

$$
\begin{aligned}
&P(ac)\sum_{d,m,n} t^{abd}_{jkn} t^{c}_{m} v^{mn}_{id},\quad
 P(bc)\sum_{d,m,n} t^{abd}_{jkn} t^{c}_{m} v^{mn}_{id},\\
&P(ij)\sum_{d,m,n} t^{abd}_{jkn} t^{c}_{m} v^{mn}_{id},\quad
 P(ik)\sum_{d,m,n} t^{abd}_{jkn} t^{c}_{m} v^{mn}_{id},\\
&P(ac)P(ij)\sum_{d,m,n} t^{abd}_{jkn} t^{c}_{m} v^{mn}_{id},\quad
 P(bc)P(ij)\sum_{d,m,n} t^{abd}_{jkn} t^{c}_{m} v^{mn}_{id},\\
&P(ac)P(ik)\sum_{d,m,n} t^{abd}_{jkn} t^{c}_{m} v^{mn}_{id},\quad
 P(bc)P(ik)\sum_{d,m,n} t^{abd}_{jkn} t^{c}_{m} v^{mn}_{id}
\end{aligned}
$$

Page 12: Break down of permutation operators

• When breaking down a multiple tensor contraction into a sequence of binary tensor contractions, we should break down the permutation operators appropriately, so that each intermediate has maximum index permutation symmetries

$$r^{abc}_{ijk} = P(c/ab)\,P(i/jk) \sum_{d,m,n} t^{abd}_{jkn}\, t^{c}_{m}\, v^{mn}_{id}$$

$$i^{abm}_{ijk} = P(i/jk) \sum_{d,n} t^{abd}_{jkn}\, v^{mn}_{id}, \qquad
r^{abc}_{ijk} = P(c/ab) \sum_{m} t^{c}_{m}\, i^{abm}_{ijk}$$
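The expansion of a composite permutation operator into elementary terms can be sketched as follows. The sketch assumes the usual anti-symmetrizer convention, in which P(c/ab) anti-symmetrizes c against the already anti-symmetric pair ab and every transposition carries a factor of -1; the function names are hypothetical and this is not TCE's own code.

```python
from itertools import permutations


def sign(perm):
    """Parity of a permutation given as a tuple of positions: +1 even, -1 odd."""
    inversions = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def antisymmetrizer_terms(groups):
    """Expand P(g1/g2/...) for index groups that are each already anti-symmetric.

    Returns (sign, relabelled index string) for one representative per coset,
    i.e. only arrangements that keep every group internally ordered.
    """
    labels = [x for g in groups for x in g]
    terms = []
    for perm in permutations(range(len(labels))):
        pos, ordered = 0, True
        for g in groups:
            chunk = perm[pos:pos + len(g)]
            ordered = ordered and list(chunk) == sorted(chunk)
            pos += len(g)
        if ordered:
            terms.append((sign(perm), "".join(labels[k] for k in perm)))
    return terms


upper = antisymmetrizer_terms(["c", "ab"])   # P(c/ab)
lower = antisymmetrizer_terms(["i", "jk"])   # P(i/jk)
for su, up in upper:
    for sl, lo in lower:
        print(f"{su * sl:+d}  upper={up}  lower={lo}")
```

Each operator expands into three terms, so the product yields nine terms in total: the original contraction plus the eight permuted ones.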

Page 13: What is the best way to store an intermediate?

• An intermediate tensor has much more limited index permutation symmetries. Super (sub) indices are categorized into global targets and local targets, and permutation anti-symmetry exists among just global targets and among just local targets. So in general, the non-redundant elements are:

[Equation garbled in the transcript: an intermediate $i$ whose superscripts and subscripts each consist of an ordered group of global target indices, $g_1 < g_2 < g_3 < \cdots$, and an ordered group of local target indices]

Page 14: What is the general form of tensor contraction with permutation symmetry?

• Expansion

[Equations garbled in the transcript: the intermediate $i$ and the excitation amplitude $t$ rewritten with their indices arranged into groups labelled $g$ and $c$]

Note that an excitation amplitude tensor will not have local target indices. This is because two excitation amplitudes cannot contract (as they have super particles, sub holes structures).

Page 15: What is the general form of tensor contraction with permutation symmetry?

• Contraction

[Equation garbled in the transcript: the product of $i$ and $t$ summed over the index groups labelled $c$, with factorial prefactors]

Note that at least one of the two tensors is always an excitation amplitude tensor.

Page 16: What is the general form of tensor contraction with permutation symmetry?

• Compression

[Equation garbled in the transcript: a permutation operator $P$ applied to the intermediate $i$]

Page 17: Spin & spatial symmetry

• Spin symmetry

$$\sum_{p \,\in\, \text{superscript indices}} s_p \;=\; \sum_{q \,\in\, \text{subscript indices}} s_q$$

• Spatial symmetry

$$\Gamma_p \otimes \Gamma_q \otimes \cdots \otimes \Gamma_z = \text{Totally symmetric}$$
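The two conditions can be turned into a simple screening test for a tensor block. The following is only a sketch under stated assumptions: spins are encoded as +1 (alpha) and -1 (beta), and irreducible representations of an Abelian point group (D2h or a subgroup) are encoded as integers whose direct product is a bitwise XOR, with 0 as the totally symmetric irrep. Neither encoding is taken from the slides.

```python
from functools import reduce
from operator import xor


def spin_allowed(upper_spins, lower_spins):
    """Spin is conserved when superscript and subscript spins sum to the same value."""
    return sum(upper_spins) == sum(lower_spins)


def spatial_allowed(irreps, totally_symmetric=0):
    """For Abelian groups (assumed encoding) the product of irreps is a running XOR."""
    return reduce(xor, irreps, 0) == totally_symmetric


def block_is_nonzero(upper, lower):
    """upper, lower: lists of (spin, irrep) pairs, one pair per index of the block."""
    spins_ok = spin_allowed([s for s, _ in upper], [s for s, _ in lower])
    irreps_ok = spatial_allowed([g for _, g in upper + lower])
    return spins_ok and irreps_ok


# A doubles-like block with indices (alpha, irrep 0), (beta, irrep 3) over
# (alpha, irrep 1), (beta, irrep 2): spin sums match and 0^3^1^2 == 0.
print(block_is_nonzero([(1, 0), (-1, 3)], [(1, 1), (-1, 2)]))
```

A block that fails either test is identically zero and can be skipped without being read or computed.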

Page 18: An example

$$x^{lbc}_{ijk} = P(i/jk)\,P(b/c) \sum_{d} t^{bd}_{jk}\, v^{lc}_{id}$$

    LOOP OVER b,j<=k BLOCKS
      LOOP OVER l,c,i BLOCKS
        LOOP OVER d BLOCKS
          IF (b<=d) READ t(b<=d,j<=k)
          IF (d<b) READ t(d<b,j<=k)
          READ v(l<c,i<d)                 ! Always holes < particles
          IF (spin/spatial sym block of t is non-zero) THEN
            IF (spin/spatial sym block of v is non-zero) THEN
              MAKE x(l,b,c,i,j<=k) BLOCK BY DGEMM
              IF (b<=c and i<=j)       ACCUM  x(l,b<=c,i<=j<=k)
              IF (b<=c and j<=i,i<=k)  ACCUM -x(l,b<=c,j<=i<=k)
              IF (b<=c and k<=i)       ACCUM  x(l,b<=c,j<=k<=i)
              IF (c<=b and i<=j)       ACCUM -x(l,c<=b,i<=j<=k)
              IF (c<=b and j<=i,i<=k)  ACCUM  x(l,c<=b,j<=i<=k)
              IF (c<=b and k<=i)       ACCUM -x(l,c<=b,j<=k<=i)
            END IF   ! Note that b=c, i=j block is accumulated
          END IF     ! multiple times
        END LOOP
      END LOOP
    END LOOP
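The six ACCUM branches of the pseudo-code above follow one pattern: sort the block indices into canonical order and multiply the signs of the two sorts. The sketch below reproduces that logic for hypothetical integer block (tile) indices; like the pseudo-code, it lets boundary blocks (equal tile indices) match more than one branch.

```python
def accumulation_targets(b, c, i, j, k):
    """Return [(sign, (b', c'), (i', j', k'))] for a freshly made x(l,b,c,i,j<=k) block.

    b, c, i, j, k are block (tile) indices, with j <= k already guaranteed
    by the loop structure. Signs follow the ACCUM lines of the pseudo-code.
    """
    assert j <= k
    upper = []
    if b <= c:
        upper.append((+1, (b, c)))
    if c <= b:
        upper.append((-1, (c, b)))
    lower = []
    if i <= j:
        lower.append((+1, (i, j, k)))      # i already in front
    if j <= i <= k:
        lower.append((-1, (j, i, k)))      # one transposition
    if k <= i:
        lower.append((+1, (j, k, i)))      # cyclic shift, two transpositions
    return [(su * sl, u, lo) for su, u in upper for sl, lo in lower]


print(accumulation_targets(1, 2, 0, 1, 2))   # interior block: one target
print(accumulation_targets(1, 1, 1, 1, 2))   # boundary block: several targets
```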

Page 19: Extensible many-electron theory module in NWChem

• “Extensible” because a new many-electron method can be added relatively easily by TCE

• Very general tensor storage interface (needs only the sizes & offsets of one-dimensional compressed tensor arrays; the offsets of intermediate arrays are also computed at run time by programs generated by TCE)

• Compatible one- and two-electron integral transformation codes and offset generators
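To make the idea of "sizes & offsets of one-dimensional compressed tensor arrays" concrete, here is a minimal sketch for a two-index quantity whose non-redundant tile pairs (p <= q) are packed one after another. The function name `block_offsets` and the packing order are assumptions for illustration; the layout actually used by the TCE-generated programs is not described in the slides.

```python
def block_offsets(tile_sizes):
    """Offsets of the non-redundant (p <= q) tile pairs in a packed 1-D array.

    tile_sizes: number of orbitals in each tile.
    Returns (offsets, total), where offsets maps (p, q) to the position of
    the first element of that block and total is the packed array length.
    """
    offsets = {}
    pos = 0
    for p in range(len(tile_sizes)):
        for q in range(p, len(tile_sizes)):
            offsets[(p, q)] = pos
            pos += tile_sizes[p] * tile_sizes[q]
    return offsets, pos


# Hypothetical tiles of 5, 5, 2, and 2 orbitals.
offsets, total = block_offsets([5, 5, 2, 2])
print(offsets[(1, 1)], total, sum([5, 5, 2, 2]) ** 2)
```

With these tiles the packed array holds 127 elements instead of the 196 of the full square array, and a block is located by a single dictionary lookup.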

Page 20: Optimizations

• Spin, spatial, permutation symmetries
• Dynamic tiling (orbital ranges are “tiled” (or blocked) into smaller sections so that the peak local memory usage does not exceed the user-specified limit)

• Dynamic load balancing parallelism (each tile-level tensor contraction is carried out in one processor with virtually no communication)

• Disk I/O is based on Shared File Library of ParSoft, which allows one-sided (independent) read/write without Global Array cache

• Local sorting of array elements (so that the composite summation indices become contiguous in memory) followed by local DGEMM (with absolutely no communication in this critical step)
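Dynamic tiling can be sketched in a few lines. This is an illustrative sketch only: it assumes that a block of a rank-`rank` tensor built from tiles of size `s` needs `s**rank` doubles, and the names `largest_tile` and `tile_range` are hypothetical.

```python
def largest_tile(rank, max_doubles):
    """Largest tile size s such that a rank-`rank` block of s**rank doubles fits."""
    s = 1
    while (s + 1) ** rank <= max_doubles:
        s += 1
    return s


def tile_range(n_orbitals, tile_size):
    """Split range(n_orbitals) into consecutive (start, stop) tiles."""
    return [(start, min(start + tile_size, n_orbitals))
            for start in range(0, n_orbitals, tile_size)]


tile = largest_tile(rank=4, max_doubles=10_000)   # hypothetical memory limit
print(tile, tile_range(23, tile))
```

Each tile-level contraction then touches only blocks of bounded size, which is what allows it to run on one processor with local memory only.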

Page 21: Previous & new algorithms

[Diagram] Previous algorithm: DRA → GA → MA
  • DRA to GA: Collective I/O (synchronization!) & GA cache
  • GA to MA sort (communications!)

[Diagram] New algorithm: SF → MA → MA
  • SF to MA: One-sided I/O (no synchronization!)
  • MA to MA sort (no communications!)

Page 22: Methods available

• Various spin-unrestricted coupled-cluster methods
  • LCCD, CCD, LCCSD, CCSD, CCSDT
  • More to follow (higher CC, CI, MBPT, EOM-CC, etc.)
• Input syntax
  • Uses NWDFT module for the ground state

    dft
      xc Hfexch 1.0
    end

    tce
      ccsd
      thresh 1e-6
      maxiter 100
    end

    task tce energy

Page 23: A sample output (water CCSD/sto-3g)

    NWChem General Electron-Correlation Theory Module
    -------------------------------------------------
    Programs generated by a Tensor Contraction Engine

    General Information
    -------------------
    Wavefunction type : Restricted
    No. of electrons  : 10
    Alpha electrons   : 5
    Beta electrons    : 5

    /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\

    Correlation Information
    -----------------------
    Calculation type   : Coupled-cluster singles & doubles (CCSD)
    Max iterations     : 100
    Residual threshold : 0.10E-09

    Memory Information
    ------------------
    Available GA+MA space size is 26213624 doubles
    Maximum block size 50 doubles

Page 24: A sample output (continued)

    Suggested orbital blocking

    Block   Spin    Irrep   Size        Offset
    -----------------------------------------
    1       alpha   a       5 doubles   0
    2       beta    a       5 doubles   5
    3       alpha   a       2 doubles   10
    4       beta    a       2 doubles   12

    /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\

    2-e file size = 5443
    2-e file name = ./temp.v2
    Cpu time / sec 0.0

    /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\

    t2 file size = 300
    t2 file name = ./temp.t2
    Cpu time / sec 0.0

    MBPT(2) correlation energy = -0.035867246917899 hartree
    MBPT(2) total energy       = -74.998530309066552 hartree
    Cpu time / sec 0.0

Page 25: A sample output (continued)

    -------------------------------------------------------
    Iter   Residuum            Correlation          Cpu/Sec
    -------------------------------------------------------
       1   0.089123237955088   -0.035867246917899   0.1
       2   0.031759620132034   -0.045406888265697   0.1
       3   0.012682891602275   -0.048387005902666   0.1
       4   0.005383277884425   -0.049437059764660   0.1
       5   0.002395445228466   -0.049839118488995   0.1
       6   0.001110827268269   -0.050002172402908   0.1

    /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\

      26   0.000000002031284   -0.050127328255753   0.1
      27   0.000000001066715   -0.050127328323605   0.0
      28   0.000000000560286   -0.050127328359134   0.1
      29   0.000000000294338   -0.050127328377747   0.1
      30   0.000000000154649   -0.050127328387501   0.1
      31   0.000000000081266   -0.050127328392616   0.1
    -------------------------------------------------------
    CC iteration converged
    CCSD correlation energy = -0.050127328392616 hartree
    CCSD total energy       = -75.012790390541269 hartree

    Task times cpu: 2.0s wall: 2.4s

Page 26: Performance

• Titan spin-adapted parallel CCSD code
  • H2O CCSD/cc-pVTZ, Energy = -0.2850225 hartree

    1 node    sym=off   16.8 secs/iter
    1 node    sym=on    16.6 secs/iter
    2 nodes   sym=off    8.2 secs/iter
    2 nodes   sym=on     8.3 secs/iter

• Present spin-unrestricted parallel CCSD code
  • H2O CCSD/cc-pVTZ, Energy = -0.2850225 hartree

    1 node    sym=off   49.1 secs/iter
    1 node    sym=on    14.5 secs/iter
    2 nodes   sym=off   25.2 secs/iter
    2 nodes   sym=on     7.5 secs/iter

Spin-unrestricted code has to deal with 3 times as many t-amplitudes as does spin-adapted code, so theoretically spin-adapted code should be 3 times as fast as spin-unrestricted code.
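The ratios implied by the timings above can be worked out directly. The short sketch below only restates the numbers from the slide; the dictionary layout and the labels "spin-adapted" and "spin-unrestricted" are choices made for this illustration.

```python
# Seconds per iteration copied from the slide: (code, nodes, symmetry) -> time.
timings = {
    ("spin-adapted", 1, "off"): 16.8,
    ("spin-adapted", 1, "on"): 16.6,
    ("spin-adapted", 2, "off"): 8.2,
    ("spin-adapted", 2, "on"): 8.3,
    ("spin-unrestricted", 1, "off"): 49.1,
    ("spin-unrestricted", 1, "on"): 14.5,
    ("spin-unrestricted", 2, "off"): 25.2,
    ("spin-unrestricted", 2, "on"): 7.5,
}


def ratio(code, slow, fast):
    """Time of the `slow` (nodes, sym) setting divided by that of the `fast` one."""
    return timings[(code, *slow)] / timings[(code, *fast)]


sym_gain = ratio("spin-unrestricted", (1, "off"), (1, "on"))
node_gain = ratio("spin-unrestricted", (1, "on"), (2, "on"))
vs_adapted = timings[("spin-unrestricted", 1, "off")] / timings[("spin-adapted", 1, "off")]
print(f"symmetry speed-up on 1 node:    {sym_gain:.2f}")
print(f"1 -> 2 node speed-up (sym=on):  {node_gain:.2f}")
print(f"unrestricted / adapted, no sym: {vs_adapted:.2f}")
```

Without symmetry the spin-unrestricted code is about 2.9 times slower than the spin-adapted one, close to the factor of 3 expected from the amplitude count; with symmetry switched on it gains a factor of about 3.4.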

Page 27: Future plans

• CCSDTQ, CI, MBPT, EOM-CC implementation
  • What is the appropriate tensor formulation for MBPT? (are the MBPT denominators tensors?) See Head-Gordon et al.
  • “Persistent intermediates” (or the so-called similarity transformed Hamiltonian matrix elements) in EOM-CC
• CC(2)PT(2) implementation
  • Post-CCSD(T) O(n^7) method that includes perturbative quadruples
• Further optimization (loop fusion, more aggressive factorization, space-time tradeoffs, etc.) by computer scientist colleagues
• Modular extensibility of operator contraction engine
  • Active spaces (multi-reference methods)
  • Orbital rotations (atomic-orbital-based or local correlation methods)