puffin:&an&embedded&domain1specific&language&for&...

35
LLNL-PRES-679383 This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC Puffin: An Embedded DomainSpecific Language for Exis<ng Unstructured Hydrodynamics Codes WOLFHPC 2015 Christopher Earl November 16 th , 2015

Upload: others

Post on 24-Jun-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC

Puffin:  An  Embedded  Domain-­‐Specific  Language  for  Exis<ng  Unstructured  Hydrodynamics  Codes    WOLFHPC  2015  

Christopher  Earl  November 16th, 2015

Page 2: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 2  

Puffin  is  an  experimental  C++98  template  meta-­‐programming  language,  targeCng:    

Unstructured  hydrodynamics  calculaCons  IniCally  LULESH  2.0,  a  Lawrence  Livermore  proxy  applicaCon  

 

MulCple  architectures  Currently,  just  supports  single-­‐thread  CPU  

Eventually,  mulC-­‐thread  CPU,  GPU,  and  Xeon  Phi    

ExisCng  (C++)  projects  /  codes  

Overview  

Page 3: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 3  

In  relevant  C++  header  and  source  files,  add:  #include <Puffin.h> !

!

To  the  exisCng  build  system,  add  dependencies  to  Puffin*.h  (16  files)    

Future  addiCons  may  require  opConal  dependencies  For  example,  NVIDIA’s  nvcc  will  be  required  for  GPU  execuCon  

Adding  Puffin  to  an  Exis<ng  Project  

Page 4: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 4  

Puffin’s  basic  “space”  abstracCon  is  an  aspect:  Data/memory  “space:”  dimensions  of  arrays  

IteraCon  “space:”  Loop  dimensions    

Aspects  are  user/project  defined:  

typedef PuffinAspect<0, 3, PuffinStyleContainer> DimAspect; ! typedef PuffinAspect<1, -1, PuffinStyleArray> NodeAspect; ! typedef PuffinAspect<2, -1, PuffinStyleArray> ElemAspect; !

 

Puffin  Aspects  

Unique  ID/integer  

Extent/size  (-­‐1  means  non-­‐constant  size)  

Memory  layout/style  (ignore  for  now)  

Page 5: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 5  

Puffin  Arrays  Puffin’s  Basic  Variables    

Single-­‐  and  mul<ple-­‐aspect  arrays:  

typedef PuffinArray<NodeAspect> !" " NodeArray; !

typedef PuffinArray<ElemAspect> !" " ElemArray; !

!typedef PuffinAspects<DimAspect, NodeAspect> !

" " DimNodeAspects; !typedef PuffinArray<DimNodeAspects> !

" " DimNodeArray; !

0  1   2  

N-­‐3  N-­‐2  

N-­‐1  NodeArray  

1   2  

3  

E-­‐4  E-­‐3  

E-­‐2  

0  

E-­‐1  

ElemArray  

DimNodeArray  

0  

1   2  

N-­‐3  N-­‐2  

N-­‐1  

x  y  z  

Page 6: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 6  

Puffin  Arrays  Puffin’s  Basic  Variables    

Declara<on  of  Puffin  arrays:  

NodeAspect Nodes (N); !ElemAspect Elem (E); !DimAspect Dim; !DimNodeAspects DimNodes(Dim, Nodes); !!NodeArray nodalMass(Nodes); !ElemArray energy (Elems); !DimNodeArray positions(DimNodes); !

0  1   2  

N-­‐3  N-­‐2  

N-­‐1  nodalMass  

1   2  

3  

E-­‐4  E-­‐3  

E-­‐2  

0  

E-­‐1  

energy  

posiCons  

0  

1   2  

N-­‐3  N-­‐2  

N-­‐1  

x  y  z  

Page 7: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 7  

Puffin  Provided  Arrays  

To  use  exis<ng  arrays  with  Puffin:  

typedef PuffinProvidedArray<ElemAspect> !" " ElemProvidedArray; !

double* m_energy = ...; //lulesh allocation !

ElemProvidedArray energy(Elems, m_energy); !!

1   2  

3  

E-­‐4  E-­‐3  

E-­‐2  

0  

E-­‐1  energy  

Page 8: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 8  

Unstructured  Meshes    

Order  of  values  (nodes,  elements,  etc.)  in  arrays  need  not  correspond  to  any  ordering  of  simulated  space.  

Nodes

0  1   2  

N-­‐3  N-­‐2  

N-­‐1  

1   2  

3  

E-­‐4  E-­‐3  

E-­‐2  

0  

E-­‐1  

Elements

Page 9: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 9  

Single  Assignment  Example  

for (Index_t i = 0; i < N; ++i) { ! domain.xdd(i) = domain.fx(i) / domain.nodalMass(i); ! domain.ydd(i) = domain.fy(i) / domain.nodalMass(i); ! domain.zdd(i) = domain.fz(i) / domain.nodalMass(i); !} !

force   mass  acceleraCon  

Accelera<on  calcula<on  in  LULESH  2.0:  

Page 10: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 10  

Single  Assignment  Example  

Dimensions  

Nodes  

force   mass  acceleraCon  

Accelera<on  calcula<on  in  LULESH  2.0:  

for (Index_t i = 0; i < N; ++i) { ! domain.xdd(i) = domain.fx(i) / domain.nodalMass(i); ! domain.ydd(i) = domain.fy(i) / domain.nodalMass(i); ! domain.zdd(i) = domain.fz(i) / domain.nodalMass(i); !} !

Page 11: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 11  

for (Index_t i = 0; i < N; ++i) { ! domain.xdd(i) = domain.fx(i) / domain.nodalMass(i); ! domain.ydd(i) = domain.fy(i) / domain.nodalMass(i); ! domain.zdd(i) = domain.fz(i) / domain.nodalMass(i); !} !

Single  Assignment  Example  

force   mass  acceleraCon  

domain.ddd()[Nodes][Dim] <<= domain.fd() / domain.nodalMass(); !

Accelera<on  calcula<on  in  LULESH  2.0:  

Page 12: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 12  

Single  Assignment  Example  How  it  works  

domain.ddd()[Nodes][Dim] <<= domain.fd() / domain.nodalMass(); !{ {

PuffinLhs<…> ! PuffinExpression<PuffinDivide<…>, …> !{ PuffinAssignment<…> ! (Immediately  executes)  

Page 13: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 13  

for( Index_t k=0 ; k<E ; ++k ) { ! Real_t vdov = domain.dxx(k) + domain.dyy(k) + domain.dzz(k) ; ! Real_t vdovthird = vdov/Real_t(3.0) ; !! domain.vdov(k) = vdov ; ! domain.dxx(k) -= vdovthird ; ! domain.dyy(k) -= vdovthird ; ! domain.dzz(k) -= vdovthird ; !} !

Mul<ple  Assignment  Example  Plus  temporary  values  and  summaCon  

Update  of  volume  deriva<ve  and  strains  in  LULESH  2.0:  

Volume  derivaCve  

Strains  {

Temp  value  

Page 14: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 14  

Volume  derivaCve  

Strains  {

Dimensions  

sum_over(Dim, domain.strains()) !

Temp  value  

Mul<ple  Assignment  Example  Plus  temporary  values  and  summaCon  

Update  of  volume  deriva<ve  and  strains  in  LULESH  2.0:  Elements  

for( Index_t k=0 ; k<E ; ++k ) { ! Real_t vdov = domain.dxx(k) + domain.dyy(k) + domain.dzz(k) ; ! Real_t vdovthird = vdov/Real_t(3.0) ; !! domain.vdov(k) = vdov ; ! domain.dxx(k) -= vdovthird ; ! domain.dyy(k) -= vdovthird ; ! domain.dzz(k) -= vdovthird ; !} !

Page 15: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 15  

ScalarValue vdovthird; !!puffin_foreach(Elems) ! (domain.vdov() |= sum_over(Dim, domain.strains())) ! (vdovthird |= domain.vdov() / 3.0) ! (domain.strains()[Dim] |= domain.strains() – vdovthird) ! .execute(); !

Mul<ple  Assignment  Example  Plus  temporary  values  and  summaCon  

Update  of  volume  deriva<ve  and  strains  in  LULESH  2.0:  

Page 16: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 16  

Mul<ple  Assignment  Example  How  it  works  

Foreach  Functor  

PuffinAssignment<…> !

PuffinAssignment<…> !

PuffinAssignment<…> !

puffin_foreach(Elems) !! (domain.vdov() |= ...) !! (vdovthird |= ...) !! (domain.strains()[Dim] |= ...) !! .execute(); !

Page 17: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 17  

Mul<ple  Assignment  Example  How  it  works  

} Foreach  Functor  (1  assignment)  puffin_foreach(Elems) !! (domain.vdov() |= ...) !! (vdovthird |= ...) !! (domain.strains()[Dim] |= ...) !! .execute(); !

Page 18: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 18  

Mul<ple  Assignment  Example  How  it  works  

} Foreach  Functor  (2  assignments)  

puffin_foreach(Elems) !! (domain.vdov() |= ...) !! (vdovthird |= ...) !! (domain.strains()[Dim] |= ...) !! .execute(); !

Page 19: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 19  

puffin_foreach(Elems) !! (domain.vdov() |= ...) !! (vdovthird |= ...) !! (domain.strains()[Dim] |= ...) !! .execute(); !

Mul<ple  Assignment  Example  How  it  works  

} Foreach  Functor  (3  assignments)  

ExecuCon  of  all  assignments  happens  in  a  single  loop  within  execute()  call.  

Page 20: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 20  

Indirect  Array  Example  

for (Index_t i = 0; i < domain.regElemSize(r) ; ++i) { ! Index_t elem = domain.regElemlist(r)[i]; ! Real_t vchalf ; ! compression[i] = Real_t(1.) / vnewc[elem] - Real_t(1.); ! vchalf = vnewc[elem] - delvc[i] * Real_t(.5); ! compHalfStep[i] = Real_t(1.) / vchalf - Real_t(1.); !} !

Region  indirecCon  

Start  of  EOS  calcula<on  in  LULESH  2.0:  

Page 21: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 21  

Indirect  Array  Example  

Region  indirecCon  

0  1   2  

E-­‐3  E-­‐2  

E-­‐1  

0  1  

R-­‐2  R-­‐1  

Start  of  EOS  calcula<on  in  LULESH  2.0:  Current  region  Element  

for (Index_t i = 0; i < domain.regElemSize(r) ; ++i) { ! Index_t elem = domain.regElemlist(r)[i]; ! Real_t vchalf ; ! compression[i] = Real_t(1.) / vnewc[elem] - Real_t(1.); ! vchalf = vnewc[elem] - delvc[elem] * Real_t(.5); ! compHalfStep[i] = Real_t(1.) / vchalf - Real_t(1.); !} !

Page 22: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 22  

Indirect  Array  Example  

typedef PuffinAspect<3, -1, PuffinStyleRelatedArray<ElemAspect> > ! SingleRegionAspect; !

Defini<on  of  region  aspect  and  array,  related  to  ElemAspect:  

typedef PuffinArray<SingleRegionAspect> ! SingleRegionArray; !

SingleRegionAspect CurReg(domain.regElemSize(r), ! domain.regElemlist(r));!

Based  on  r,  an  aspect  for  the  current  region  (CurReg)  can  be  instan<ated:  

Page 23: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 23  

Indirect  Array  Example  

ElemProvidedArray vnewc_p (Elems, vnewc); !ElemProvidedArray delvc_p (Elems, delvc); !SingleRegionArray compression (CurReg); !SingleRegionArray compHalfStep(CurReg); !ScalarValue vchalf; !!puffin_foreach(CurReg) ! (compression |= 1.0 / vnewc_p - 1.0) ! (vchalf |= vnewc_p – delvc_p * 0.5) ! (compHalfStep |= 1.0 / vchalf - 1.0) ! .execute(); !

Start  of  EOS  calcula<on  in  LULESH  2.0:  

Page 24: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 24  

Affilia<on  Example  

for( Index_t i=0; i<E; ++i ) { ! const Index_t *elemToNode = domain.nodelist(i); !! domain.fx(elemToNode[0]) += hgfx[0]; ! domain.fy(elemToNode[0]) += hgfy[0]; ! domain.fz(elemToNode[0]) += hgfz[0]; !! //... !! domain.fx(elemToNode[7]) += hgfx[7]; ! domain.fy(elemToNode[7]) += hgfy[7]; ! domain.fz(elemToNode[7]) += hgfz[7]; !} !

Update  of  force  in  LULESH  2.0  (scaSer):  

Element  to  node  indirecCon  

Page 25: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 25  

Affilia<on  Example  

Update  of  force  in  LULESH  2.0  (scaSer):  

Element  to  node  indirecCon  

Elements  

Dimensions  

Corners  

Element/Node  indirecCon  

for( Index_t i=0; i<E; ++i ) { ! const Index_t *elemToNode = domain.nodelist(i); !! domain.fx(elemToNode[0]) += hgfx[0]; ! domain.fy(elemToNode[0]) += hgfy[0]; ! domain.fz(elemToNode[0]) += hgfz[0]; !! //... !! domain.fx(elemToNode[7]) += hgfx[7]; ! domain.fy(elemToNode[7]) += hgfy[7]; ! domain.fz(elemToNode[7]) += hgfz[7]; !} !

Page 26: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 26  

Affilia<on  Example  

1   2  

3  

E-­‐4  E-­‐3  

E-­‐2  

0  

E-­‐1  

0   1  2  

N-­‐3  N-­‐2  

N-­‐1  Force  (Nodes)  

Hourglass  force  update  (Elems)  

Scaier  operaCon  is  efficient  but  not  thread-­‐safe.  

Page 27: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 27  

Affilia<on  Example  

1   2  

3  

E-­‐4  E-­‐3  

E-­‐2  

0  

E-­‐1  

0   1  2  

N-­‐3  N-­‐2  

N-­‐1  Force  (Nodes)  

Hourglass  force  update  (Elems)  

Gather  operaCon  is  thread-­‐safe.  

Page 28: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 28  

Affilia<on  Example  

1   2  

3  

E-­‐4  E-­‐3  

E-­‐2  

0  

E-­‐1  

0   1  2  

N-­‐3  N-­‐2  

N-­‐1  Force  (Nodes)  

Hourglass  force  update  (Elems)  

Gather  operaCon  is  thread-­‐safe.  

Page 29: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 29  

Affilia<on  Example  

Defini<on  of  corner  aspect:  typedef PuffinAspect<4, 8, PuffinStyleFixedArray> ! CornerAspect; !

typedef PuffinArray<DimElemCornerAspects> ! DimElemCornerArray; !

DimElemCornerArray hgfd_p(DimElemCorner); !

Defini<on  of  array  type  for  hgfd:  

Instan<a<on  of  hgfd:  

Page 30: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 30  

Affilia<on  Example  

typedef PuffinAffiliation<ElemAspect, NodeAspect, CornerAspect, true> ! ElemToNodeAffiliation; !typedef PuffinAffiliation<NodeAspect, ElemAspect, CornerAspect, false> ! NodeToElemAffiliation; !

Defini<on  of  affilia<ons  between  nodes  and  elements:  

DesCnaCon  aspect  

RelaConal  aspect  

Source  aspect  

Size  fixed  to  relaConal  aspect?  

An  affiliaCon  describes  how  one  aspect  relates  to  another.  

Page 31: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 31  

Affilia<on  Example  

ScaSer  version;  ElemToNode  affilia<on  used  over  a  statement:  

puffin_foreach(Elems) ! (ElemToNode(domain.fd()[Dim] |= domain.fd() + hgfd_p)) ! .execute(); !

puffin_foreach(Nodes)(Dim) ! (domain.fd() |= domain.fd() + NodeToElem.sum(hgfd_p)) ! .execute(); !

Gather  version;  NodeToElem  affilia<on  summing  over  an  expression:  

Current  syntax  is  not  final,  but  is  sufficient  for  now.  

Page 32: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 32  

Puffin  supports  all  but  about  10  loops  of  LULESH  2.0  Remaining  loops  require  “whole-­‐aspect”  calculaCons  

 

Puffin  slows  compile-­‐Cme  by  1.33x  to  3x  Mostly  depends  upon  compile-­‐Cme  opCons  

 

Puffin  slows  runCme  performance  by  5-­‐11%  Depends  upon  problem  size,  used  features,  and  internal  Puffin  opCmizaCons  

 

Results  can  be  maintained  (down  to  the  bit)  Requires  certain  compiler  flags  (to  maintain  consistent  64-­‐bit  precision)  

Puffin  version  of  LULESH  2.0  

Page 33: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 33  

LULESH  2.0  Performance  Results  

Puffin  has  roughly  5-­‐11%  runCme  overhead.  

10 20 30 40 50

0.8

0.9

1

1.1

1.2

0.99

1.04

0.9

1.03 1.031.041.08

0.95

1.1 1.11

Problem size (size input parameter)

Normalized

runtime(low

eris

better)

Comparison of Original LULESH 2.0 and Puffin Versions

Original Simple Features All Features

Figure 1: Comparison of the Original version ofLULESH 2.0 with the Simple Features and All Fea-tures versions. These numbers have been normal-ized based the Original version of LULESH 2.0.Thus, a time of 1.08 means that the All Featuresversion ran 8% slower than the Original version ofLULESH 2.0 for the test with problem size of 20.For the problem size 30 tests, the Original versionruns slower than expected, which explains why thePuffin versions are 5-10% faster. See Figure 2 formore typical results.

However, different compilers and different optimizations canproduce different results because of extra-precision oper-ations such as fused multiply-add (FMA). All versions ofLULESH 2.0 return different results because of these extra-precision operations, when used with different compilers ordifferent optimizations. Some compilers have flags that pre-vent the use of extra-precision operations, such as gcc’s float-store flag. When compiled with these flags, all version ofLULESH 2.0 presented in this paper produce the same re-sults, down to the same rounding error. Without these flags,all version of LULESH 2.0 presented in this paper producesame energy at the origin, which is LULESH 2.0’s standardcorrectness measurement, that ignores rounding error.

5.2 PerformanceFigures 1 and 2 show the runtime overhead of Puffin us-ing both the original and optimized loop structure. Thesetests were run on a BG/Q system with PPC A2 CPUs, us-ing Clang version 3.7 (optimization level 3 and aggressiveinlining). The “simple” version of Puffin has no more than5% runtime overhead, and the “full” version of Puffin has nomore than 11% runtime overhead.

The biggest cost to using Puffin is the compile-time over-head: As shown in Table 1, the “simple” version of Puf-fin takes nearly 2X time to compile, and the “full” ver-sion of Puffin takes almost 3X as long to compile. While

10 20 30 40 50

0.8

0.9

1

1.1

1.2

1.031.05 1.04 1.03 1.03

1.08 1.09 1.1 1.11 1.11

Problem size (input parameter)

Normalized

runtime(low

eris

better)

Comparison of Optimized Loop Structure Versions

Original Simple Features All Features

Figure 2: Comparison of the Optimized Originalversion of LULESH 2.0 with the Optimized Sim-ple Features and Optimized All Features versions.These numbers have been normalized based the Op-timized Original version of LULESH 2.0. Thus, atime of 1.08 means that the All Features versionran 8% slower than the Original version of LULESH2.0 for the test with problem size of 10.

this compile-time overhead is significant, most modern buildsystems support parallel compilation of multiple compila-tion units. This parallel compilation limits the impact ofthe compile-time overhead. Likewise, modern build sys-tems only recompile compilation units whose source codehas changed. Thus, during active development using Puffin,only the files that are modified must be recompiled.

6. RELATED WORKThere are many other domain-specific languages that havefunctionality and domains similar to Puffin, but none con-tain all of Puffin’s features. Nebo [4] and Liszt [3] are prob-ably the two closest projects. Nebo is a domain-specific lan-guage for solving partial differential equations on structuredmeshes. Nebo uses template meta-programming the same asPuffin. Nebo supports multiple backends, including a GPUbackend. The main difference between them is that Puffinis targeting unstructured meshes, whereas Nebo is strictlylimited to structured meshes.

Liszt [3] represents calculations by abstracting based on ge-ometry and spatial reasoning, similar to Puffin affiliations.Liszt supports both CPU- and GPU-based parallelism, butdoes not support incremental adoption. Thus, entire appli-cations must be written in Liszt to use Liszt.

RAJA [5] is an abstraction layer with similar goals to Puffin.Like Puffin, RAJA’s main goal is modifying existing projectso that the project is portable to multiple architectures. Un-like Puffin, RAJA focuses on changing as little about the ex-

Page 34: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of

LLNL-PRES-679383 34  

Puffin  is  currently  a  prototype    

Puffin  can  be  adopted  incrementally  Useful  for  conCnuous  use  and  maintaining  results  

 

Main  benefit  is  low-­‐overhead  portable  code  with  potenCal  for  performance  Plan  to  support  single-­‐thread  CPU,  mulC-­‐thread  CPU,  GPU,  and  Xeon  Phi  

Summary  

Page 35: Puffin:&An&Embedded&Domain1Specific&Language&for& …hpc.pnl.gov/conf/wolfhpc/2015/talks/earl.pdf · 2015-12-28 · LLNL-PRES-679383 This work was performed under the auspices of