CcaEcloud Phase I Wrap-up Phase I DOE SBIR Stefan Muszala, PI DOE Grant No DE-FG02-08ER85152 Tech-X Corporation Boulder, CO Updates: onRamp, FACETS+Babel, Babel Structs

Upload: jody-hamilton

Post on 30-Dec-2015



CcaEcloud Phase I Wrap-up
Phase I DOE SBIR

Stefan Muszala, PI
DOE Grant No DE-FG02-08ER85152

Tech-X Corporation
Boulder, CO

Updates: onRamp, FACETS+Babel, Babel Structs


Accelerator simulations play vital near-, medium-, and long-term roles

• Particle accelerator programs play a significant role in 14 out of 28 DOE laboratories, which span a number of DOE offices such as the Offices of High Energy Physics (HEP), Nuclear Physics (NP), and Basic Energy Sciences (BES) (Facilities for the Future of Science)

• Accelerator simulation is required throughout the life-cycle of accelerators in four areas:
  - Design
  - Analysis
  - Optimization
  - Upgrading


High-performance accelerator software should allow complex applications while promoting good software engineering practices

• Software reuse and common interfaces

• Ability to compose simulations

• Portability

• Mixed language programming interoperability

• Performance analysis of composed simulations


Since the last CCA meeting, we finished Phase I work and prepared for Phase II

• Channel Driver Component for isolating the space charge kick calculation
  - Tweaked SIDL interfaces over what was used for the electron cloud component

• Apply performance analysis
  - Model 1 processor performance while increasing problem size
  - Model 1-4 processor performance (two dual core AMD Opterons)

• Modifications to Bocca by Stephen Tramer
  - Splicer block protection
  - Bocca change
  - Bocca copy


The original Synergia2 channel driver exercises different space charge routines and provides a concise test-bed for a CCA implementation

Results are comparable

[Figure: two plots of horizontal width (m) versus longitudinal position (m)]


After instrumentation, we can see Solve and Kick behavior even with substantial call overhead


The overhead is due to the existing Synergia2 method structure


Single-processor performance of the space charge calculation goes as $N^3$

• Number of particles = $N^3$, where $N$ = grid size and particles/cell = 1

• Core work consists of two triple-nested for loops over 3 dimensions: $6N^3$

• $T_1 = T_u \cdot 6N^3 + \dots$

• $T_u$ = {min | max | average} for cell update. We use min.

• Need to study why … and min $T_u$ …
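
The scaling on this slide can be sketched numerically. The Python below is only an illustration of the leading term $T_u \cdot 6N^3$: the function names and the value of `t_u` are hypothetical, and any further terms of $T_1$ are left out because they are not recoverable from the slide text.

```python
def particle_count(n: int, particles_per_cell: int = 1) -> int:
    """Number of particles on an n x n x n grid."""
    return particles_per_cell * n ** 3


def core_work(n: int) -> int:
    """Cell updates in the core work: two triple-nested loops
    over 3 dimensions, i.e. 6 * n^3."""
    return 6 * n ** 3


def leading_serial_time(n: int, t_u: float) -> float:
    """Leading term T_u * 6 * N^3 of the single-processor model.

    Any additional terms of T_1 are deliberately not modeled here.
    """
    return t_u * core_work(n)


if __name__ == "__main__":
    # Hypothetical per-cell update time in seconds, not a measured value.
    t_u = 1.0e-8
    for n in (16, 32, 64):
        print(n, particle_count(n), core_work(n), leading_serial_time(n, t_u))
```

Doubling the grid size multiplies the leading term by eight, which is the $N^3$ behavior named in the slide title.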


Multi-processor performance starts with Amdahl’s law

• Start with Amdahl’s law, $T_P = S + Q/P$, but let $f = S/(S + Q)$ be the fraction of serial work.

• Amdahl’s law is now: $T_P = f T_1 + (1 - f) T_1 / P$

• Account for communication for PEs > 0

• Substitute for $T_1$ and $T_{comm}$

• $T_u$ = {min | max | average} for cell update. As in the serial model, we use min for now.

• Need to actually quantify $f$ (cycle and instruction count) and understand messaging better
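
The rewrite of Amdahl's law above follows from taking T_1 = S + Q: the definition f = S/(S + Q) then gives S = f T_1 and Q = (1 - f) T_1, so S + Q/P becomes f T_1 + (1 - f) T_1 / P. A minimal Python sketch of that form is given below. The function names are hypothetical, and the communication term is treated as a simple additive `t_comm`, which is an assumption; the slide's own communication expression is not reproduced in the text.

```python
def amdahl_time(t1: float, f: float, p: int, t_comm: float = 0.0) -> float:
    """Run time on p processors: f*T1 + (1 - f)*T1/p + t_comm.

    t1     -- single-processor time (S + Q)
    f      -- serial fraction S / (S + Q), between 0 and 1
    p      -- number of processors, at least 1
    t_comm -- communication cost; the additive form is an assumption
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    return f * t1 + (1.0 - f) * t1 / p + t_comm


def speedup(t1: float, f: float, p: int, t_comm: float = 0.0) -> float:
    """Speedup T1 / T_P under the same model."""
    return t1 / amdahl_time(t1, f, p, t_comm)


if __name__ == "__main__":
    # Hypothetical serial and parallel work, chosen only for illustration.
    s, q = 2.0, 8.0
    t1 = s + q
    f = s / (s + q)
    for p in (1, 2, 4):
        print(p, amdahl_time(t1, f, p), speedup(t1, f, p))
```

With `t_comm` left at zero the function reduces to plain Amdahl's law, so the effect of any assumed communication cost can be seen by comparing the two.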


Multi-processor performance model matches real data for a $32^3$ size problem (32,768 particles)


Stephen Tramer was able to add features to Bocca

• Bocca splicer block protection when using bocca-merge in target files
  - --no-preserve option
  - 4 regression tests:
    - merging a protected block inside a key block
    - ignoring a protected block outside a key block
    - standard merge
    - preservation turned off

• Bocca change now supports:
  - --remove-implements
  - --remove-requires

• Bocca copy operation:
  - may now create exact duplicates of {component, class, interface, port, enum}


Update on other projects using CCA tools

• OnRamp:
  - Need to write a prototype with the onRamp mini-tutorial and autotools
  - Possible use in Synergia2/TxPhysics as well as in FACETS transport model integration

• FACETS transport model integration:
  - Will work with Tom to implement an alternative struct passing mechanism
    - Legacy Fortran allocates memory as the callee, and there is reluctance to change contributed codes to fix this
    - Concern over F2003 compilers available on Franklin and Jacquard in the near term
    - Did not want to rewrite existing f90 code to use the Array type
    - Babel doesn’t support arrays of structs (and deeper nesting), so a derived type refactor would be needed in the transport models if using the current Babel struct implementation

• Babel Structs:
  - Few hours so far. By the end of May, should be well on the way to implementing F77 & F90 struct support and tests. Java struct support and tests are the next target.