international exascale software program

26
National Science Foundation Where Discoveries Begin Office of Cyberinfrastructure Abani K. Patra [email protected] International Exascale Software Program Abani K. Patra, J. Dongarra, P. Beckman …. Office of Cyberinfrastructure, National Science Foundation

Upload: pierce

Post on 17-Jan-2016

36 views

Category:

Documents


0 download

DESCRIPTION

International Exascale Software Program. Abani K. Patra, J. Dongarra, P. Beckman …. Office of Cyberinfrastructure, National Science Foundation. Science Case For Exascale. DOE Workshops @ ~100 People Climate Science (11/08) High Energy Physics (12/08) Nuclear physics (1/09) - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: International Exascale Software Program

National Science FoundationWhere Discoveries Begin

Office of Cyberinfrastructure

Abani K. [email protected]

Abani K. [email protected]

International Exascale Software Program

International Exascale Software Program

Abani K. Patra, J. Dongarra, P. Beckman …. Office of Cyberinfrastructure, National Science Foundation

Abani K. Patra, J. Dongarra, P. Beckman …. Office of Cyberinfrastructure, National Science Foundation

Page 2: International Exascale Software Program

National Science FoundationWhere Discoveries Begin

Office of Cyberinfrastructure

Abani K. [email protected]

Abani K. [email protected]

Science Case For Exascale

Strong science case for the continued

escalation of high-end computing.Broad consensus of

need to redesign and replace many of the algorithms and software infrastructure that HPC has built on for more than a decade.

DOE Workshops @ ~100 People Climate Science (11/08) High Energy Physics (12/08) Nuclear physics (1/09) Fusion Energy (3/09) Nuclear Energy (5/09) Biology (8/09) Basic Energy Science (8/09) Joint National Security (10/09) Computer Science Mathematics Computer Architecture

www.exascale.org

2

Page 3: International Exascale Software Program

National Science FoundationWhere Discoveries Begin

Office of Cyberinfrastructure

Abani K. [email protected]

Abani K. [email protected]

QuickTime™ and a decompressor

are needed to see this picture.

Page 4: International Exascale Software Program

National Science FoundationWhere Discoveries Begin

Office of Cyberinfrastructure

Abani K. [email protected]

Abani K. [email protected]

Questions?Questions?• What are the new applications that are emerging or likely to

emerge in the coming decade?

• How can NSF best stimulate development of exascale software applications?

• How can useful software that has been developed as part of the exascale effort be sustained beyond the development period?

• What systems software will be required? Distributed systems support, programming environments, runtime support, data-management user tools?

• What are the new applications that are emerging or likely to emerge in the coming decade?

• How can NSF best stimulate development of exascale software applications?

• How can useful software that has been developed as part of the exascale effort be sustained beyond the development period?

• What systems software will be required? Distributed systems support, programming environments, runtime support, data-management user tools?

Page 5: International Exascale Software Program

National Science FoundationWhere Discoveries Begin

Office of Cyberinfrastructure

Abani K. [email protected]

Abani K. [email protected]

Questions?Questions?• What application support environments will be needed?

Application packages, numeric and non-numeric library packages, problem-solving environments?

• How can NSF aid or catalyze developments that make it possible to use the same tools, including compilers, debuggers and performance tools, on system scales all the way down to the typical researcher’s laptop or desktop?

• What education and training actions should be considered to prepare researchers, students and educators for future cyberinfrastructure?

• What application support environments will be needed? Application packages, numeric and non-numeric library packages, problem-solving environments?

• How can NSF aid or catalyze developments that make it possible to use the same tools, including compilers, debuggers and performance tools, on system scales all the way down to the typical researcher’s laptop or desktop?

• What education and training actions should be considered to prepare researchers, students and educators for future cyberinfrastructure?

Page 6: International Exascale Software Program

IESP Executive CommitteeJack Dongarra, Pete Beckman, Patrick Aerts, Frank Cappello, Thomas Lippert, Satoshi Matsuoka, Paul Messina, Anne Trefethen, Mateo Valero

www.exascale.org

6

Page 7: International Exascale Software Program

www.exascale.org

Performance Development in Top500

1 Gflop/s

1 Tflop/s

100 Mflop/s

100 Gflop/s

100 Tflop/s

10 Gflop/s

10 Tflop/s

1 Pflop/s

100 Pflop/s

10 Pflop/s SUM

N=1

N=500

GordonBell

Winners

GordonBell

Winners

www.exascale.org

7

Page 8: International Exascale Software Program

Exponential growth in parallelism for the foreseeable future

www.exascale.org8

Page 9: International Exascale Software Program

National Science FoundationWhere Discoveries Begin

Office of Cyberinfrastructure

Abani K. [email protected]

Abani K. [email protected]

Factors that Necessitate RedesignFactors that Necessitate Redesign

• Extreme parallelism and hybrid design

• Preparing for million/billion way parallelism

• Tightening memory/bandwidth bottleneck

• Limits on power/clock speed implication on multicore

• Reducing communication will become much more intense

• Extreme parallelism and hybrid design

• Preparing for million/billion way parallelism

• Tightening memory/bandwidth bottleneck

• Limits on power/clock speed implication on multicore

• Reducing communication will become much more intense

Tera→ Peta ≠ Peta→ Exa

• Memory per core changes, byte-to-flop ratio will change

• Necessary Fault Tolerance

• MTTF will drop

• Checkpoint/restart has limitations

Software infrastructure does not exist today

• Memory per core changes, byte-to-flop ratio will change

• Necessary Fault Tolerance

• MTTF will drop

• Checkpoint/restart has limitations

Software infrastructure does not exist today

Page 10: International Exascale Software Program

National Science FoundationWhere Discoveries Begin

Office of Cyberinfrastructure

Abani K. [email protected]

Abani K. [email protected]

Factors that Necessitate RedesignFactors that Necessitate Redesign

QuickTime™ and aTIFF (LZW) decompressor

are needed to see this picture.

Advances in most branches of science and engineering are critically dependent on increasingly complex multi-scale, multi-physics, data driven computations and analysis.

Complexity of Systems Data Intensive Scalable ComputingWorkflows, Grids, Clouds ...

First Cosmological simulations to include black hole physics by Di Matteo et. al. at Carnegie Mellon funded by OCI and MPS/AST.

Optimal siting of oil exploration platform estimated by using simulation and optimization tools to maximize product

Optimal siting of oil exploration platform estimated by using simulation and optimization tools to maximize product

http://www.amd.com/us-en/assets/content_type/DigitalMedia/43264A_hi_res.jpg

AMD Phenom

http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html?open

IBM Cell Broadband Engine

QuickTime™ and a decompressor

are needed to see this picture.

All this complexity dealt with by software and tools! All this complexity dealt with by software and tools!

Page 11: International Exascale Software Program

National Science FoundationWhere Discoveries Begin

Office of Cyberinfrastructure

Abani K. [email protected]

Abani K. [email protected]

Exascale ComputingExascale Computing

• Exascale systems are likely feasible by 2017±2

• 10-100x106 processing elements (cores or mini-cores) with chips of 1,000 cores per socket, clock rates will grow slower; Heterogeneous cores

• 3D packaging likely perhaps with optics based interconnects

• 10-100 PB of aggregate memory

• Hardware and software based fault management

• Performance per watt — stretch goal 100 GF/watt Þ >> 10 – 100 MW Exascale system

• Power, area and capital costs (?) will be significantly higher than for petascale

• Exascale systems are likely feasible by 2017±2

• 10-100x106 processing elements (cores or mini-cores) with chips of 1,000 cores per socket, clock rates will grow slower; Heterogeneous cores

• 3D packaging likely perhaps with optics based interconnects

• 10-100 PB of aggregate memory

• Hardware and software based fault management

• Performance per watt — stretch goal 100 GF/watt Þ >> 10 – 100 MW Exascale system

• Power, area and capital costs (?) will be significantly higher than for petascale

Google: exascale computing study

Page 12: International Exascale Software Program

National Science FoundationWhere Discoveries Begin

Office of Cyberinfrastructure

Abani K. [email protected]

Abani K. [email protected]

A Call to ActionA Call to Action

• Hardware has changed dramatically while software ecosystem has remained stagnant

• Previous approaches have not looked at co-design of multiple levels in the system software stack (OS, runtime, compiler, libraries, application frameworks)

• Need to exploit new hardware trends (e.g., manycore, heterogeneity) that cannot be handled by existing software stack, memory per socket trends

• Emerging software technologies exist, but have not been fully integrated with system software, e.g., UPC, Cilk, CUDA, HPCS

• Community codes unprepared for sea change in architecture

• Hardware has changed dramatically while software ecosystem has remained stagnant

• Previous approaches have not looked at co-design of multiple levels in the system software stack (OS, runtime, compiler, libraries, application frameworks)

• Need to exploit new hardware trends (e.g., manycore, heterogeneity) that cannot be handled by existing software stack, memory per socket trends

• Emerging software technologies exist, but have not been fully integrated with system software, e.g., UPC, Cilk, CUDA, HPCS

• Community codes unprepared for sea change in architecture

Page 13: International Exascale Software Program

National Science FoundationWhere Discoveries Begin

Office of Cyberinfrastructure

Abani K. [email protected]

Abani K. [email protected]

IESP GoalIESP Goal

• Improve the world’s simulation and modeling capability by improving the coordination and development of the HPC software environment

• Improve the world’s simulation and modeling capability by improving the coordination and development of the HPC software environment

Build an international plan for developing the next generation open source software for scientific high-

performance computing

Page 14: International Exascale Software Program

National Science FoundationWhere Discoveries Begin

Office of Cyberinfrastructure

Abani K. [email protected]

Abani K. [email protected]

PurposePurpose• The IESP software roadmap is a planning instrument

designed to enable the international HPC community to improve, coordinate and leverage their collective investments and development efforts.

• After needs are determined, the task will be to construct the organizational structures suitable to accomplish the work

• The IESP software roadmap is a planning instrument designed to enable the international HPC community to improve, coordinate and leverage their collective investments and development efforts.

• After needs are determined, the task will be to construct the organizational structures suitable to accomplish the work

Page 15: International Exascale Software Program

National Science FoundationWhere Discoveries Begin

Office of Cyberinfrastructure

Abani K. [email protected]

Abani K. [email protected]

International Community Effort

This needs to be an international collaboration because of the:

•Scale of investment

•Need for international input on requirements

•No global evaluation of key missing components

•Hardware features are uncoordinated with software development

•It’s a “flat world” in software -- need to harness all available intellectual resources -- US, Asia, Europe …

www.exascale.org

15

Page 16: International Exascale Software Program

National Science FoundationWhere Discoveries Begin

Office of Cyberinfrastructure

Abani K. [email protected]

Abani K. [email protected]

Timeline:Timeline:• SC08 (Austin TX) meeting to generate interest

• Funding from DOE’s Office of Science & NSF Office of Cyberinfratructure and sponsorship by Europeans and Asians

• US meeting (Santa Fe, NM) April 6-8, 2009

65 people

• NSF’s Office of Cyberinfrastructure funding

• European meeting (Paris, France) June 28-29, 2009

70 people

Outline Report

• Asian meeting (Tsukuba Japan) October 18-20, 2009

Draft roadmap

Refine Report

• SC09 (Portland OR) BOF to inform others

Public Comment

Draft Report presented

Nov 2008

Apr 2009

Jun 2009

Oct 2009

Nov 2009

www.exascale.org

16

Page 17: International Exascale Software Program

National Science FoundationWhere Discoveries Begin

Office of Cyberinfrastructure

Abani K. [email protected]

Abani K. [email protected]

Four Goals for IESP• Strategy for determining requirements

•clarity in scope is the issue

• Comprehensive software roadmap

•goals, challenges, barriers and options

• Resource estimate and schedule

•scale and risk relative to hardware and applications

• A governance and project coordination model

•Is the community ready for a project of this scale, complexity and importance?

•“Can we be trusted to pull this off?”ale.org

17

Page 18: International Exascale Software Program

National Science FoundationWhere Discoveries Begin

Office of Cyberinfrastructure

Abani K. [email protected]

Abani K. [email protected]

Key Trends

• Increasing Concurrency

•Reliability Challenging

•Power dominating designs

•Heterogeneity in a node

• I/O and Memory: ratios and breakthroughs

Requirements on X-stack

Programming models, applications, and tools must address concurrency

Software and tools must manage power directly

Software must be resilient

Software must address change to heterogeneous nodes

Software must be optimized for new Memory ratios and need to solve parallel I/O bottleneck

Page 19: International Exascale Software Program

National Science FoundationWhere Discoveries Begin

Office of Cyberinfrastructure

Abani K. [email protected]

Abani K. [email protected]

Roadmap ComponentsRoadmap Components

Page 20: International Exascale Software Program

Priority Research Direction (one for each component)

Key challenges

What will you do to address the challenges?Brief overview of the barriers and gaps

What capabilities will result?

What new methods and components will be developed?

How will this impact the range of applications that may benefit from exascale systems?

What’s the timescale in which that impact may be felt?

Summary of research direction

Potential impact on software componentPotential impact on usability, capability,

and breadth of community

Page 21: International Exascale Software Program

4.x <component>Technology drivers

Alternative R&D strategies

Recommended research agenda

Cross-cutting considerations

Page 22: International Exascale Software Program

4.2.4 Numerical LibrariesTechnology drivers

Hybrid architectures Programming models/languages Precision Fault detection Energy budget Memory hierarchy Standards

Alternative R&D strategies Message passing Global address space Message-driven work-queue

Recommended research agenda Hybrid and hierarchical based software (eg linear

algebra split across multi-core / accelerator) Autotuning Fault oblivious sw, Error tolerant sw Mixed arithmetic Architectural aware libraries Energy efficient implementation Algorithms that minimize communications

Crosscutting considerations Performance Fault tolerance Power management Arch characteristics

Page 23: International Exascale Software Program

Priority Research Direction

Key challenges

•Fault oblivious, Error tolerant software•Hybrid and hierarchical based algorithms (eg linear algebra split across multi-core and gpu, self-adapting)•Mixed arithmetic•Energy efficient algorithms•Algorithms that minimize communications•Autotuning based software•Architectural aware algorithms/libraries•Standardization activities •Async methods

•Overlap data and computation

•Adaptivity for architectural environment •Scalability : need algorithms with minimal amount of communication•Increasing the level of asynchronous behavior •Fault resistant software– bit flipping and loosing data (due to failures). Algorithms that detect and carry on or detect and correct and carry on (for one or more)•Heterogeneous architectures•Languages•Accumulation of round-off errors

•Efficient libraries of numerical routines

•Agnostic of platforms

•Self adapting to the environment

•Libraries will be impacted by compilers, OS, runtime, prog env etc

•Standards: FT, Power Management, Hybrid Programming, arch characteristics

•Make systems more usable by a wider group of applications•Enhance programmability

Summary of research direction

Potential impact on software component Potential impact on usability, capability, and breadth of community

Page 24: International Exascale Software Program

National Science FoundationWhere Discoveries Begin

Office of Cyberinfrastructure

Abani K. [email protected]

Abani K. [email protected]

4.2.4 Numerical Libraries4.2.4 Numerical Libraries

Energy aware

Fault tolerant

Heterogeneous sw

Self adapting for precision

Scaling to billion way

2010 2011 2012 2013 2014 2015 2016 2017 2018 2019

Complexity of system

Architectural transparency

Self Adapting for performance

Numerical LibrariesStructured gridsUnstructured gridsFFTsDense LASparse LAMonte CarloOptimization

Language issuesStd: Fault tolerant

Std: Energy aware

Std: Arch characteristicsStd: Hybrid Progm

Page 25: International Exascale Software Program

National Science FoundationWhere Discoveries Begin

Office of Cyberinfrastructure

Abani K. [email protected]

Abani K. [email protected]

• www.exascale.org• www.exascale.org

Page 26: International Exascale Software Program

National Science FoundationWhere Discoveries Begin

Office of Cyberinfrastructure

Abani K. [email protected]

Abani K. [email protected]

Next StepsNext Steps• Revise and extend initial draft

• Build management and collaboration plans

• Work with all stakeholders to plan research activities

• Next workshop(s) in the spring -- greater application focus

• NSF/OCI -- exascale exploration awards

• Revise and extend initial draft

• Build management and collaboration plans

• Work with all stakeholders to plan research activities

• Next workshop(s) in the spring -- greater application focus

• NSF/OCI -- exascale exploration awards