international exascale software program
DESCRIPTION
International Exascale Software Program. Abani K. Patra, J. Dongarra, P. Beckman …. Office of Cyberinfrastructure, National Science Foundation. Science Case For Exascale. DOE Workshops @ ~100 People Climate Science (11/08) High Energy Physics (12/08) Nuclear physics (1/09) - PowerPoint PPT PresentationTRANSCRIPT
National Science FoundationWhere Discoveries Begin
Office of Cyberinfrastructure
Abani K. [email protected]
Abani K. [email protected]
International Exascale Software Program
International Exascale Software Program
Abani K. Patra, J. Dongarra, P. Beckman …. Office of Cyberinfrastructure, National Science Foundation
Abani K. Patra, J. Dongarra, P. Beckman …. Office of Cyberinfrastructure, National Science Foundation
National Science FoundationWhere Discoveries Begin
Office of Cyberinfrastructure
Abani K. [email protected]
Abani K. [email protected]
Science Case For Exascale
Strong science case for the continued
escalation of high-end computing.Broad consensus of
need to redesign and replace many of the algorithms and software infrastructure that HPC has built on for more than a decade.
DOE Workshops @ ~100 People Climate Science (11/08) High Energy Physics (12/08) Nuclear physics (1/09) Fusion Energy (3/09) Nuclear Energy (5/09) Biology (8/09) Basic Energy Science (8/09) Joint National Security (10/09) Computer Science Mathematics Computer Architecture
www.exascale.org
2
National Science FoundationWhere Discoveries Begin
Office of Cyberinfrastructure
Abani K. [email protected]
Abani K. [email protected]
QuickTime™ and a decompressor
are needed to see this picture.
National Science FoundationWhere Discoveries Begin
Office of Cyberinfrastructure
Abani K. [email protected]
Abani K. [email protected]
Questions?Questions?• What are the new applications that are emerging or likely to
emerge in the coming decade?
• How can NSF best stimulate development of exascale software applications?
• How can useful software that has been developed as part of the exascale effort be sustained beyond the development period?
• What systems software will be required? Distributed systems support, programming environments, runtime support, data-management user tools?
• What are the new applications that are emerging or likely to emerge in the coming decade?
• How can NSF best stimulate development of exascale software applications?
• How can useful software that has been developed as part of the exascale effort be sustained beyond the development period?
• What systems software will be required? Distributed systems support, programming environments, runtime support, data-management user tools?
National Science FoundationWhere Discoveries Begin
Office of Cyberinfrastructure
Abani K. [email protected]
Abani K. [email protected]
Questions?Questions?• What application support environments will be needed?
Application packages, numeric and non-numeric library packages, problem-solving environments?
• How can NSF aid or catalyze developments that make it possible to use the same tools, including compilers, debuggers and performance tools, on system scales all the way down to the typical researcher’s laptop or desktop?
• What education and training actions should be considered to prepare researchers, students and educators for future cyberinfrastructure?
• What application support environments will be needed? Application packages, numeric and non-numeric library packages, problem-solving environments?
• How can NSF aid or catalyze developments that make it possible to use the same tools, including compilers, debuggers and performance tools, on system scales all the way down to the typical researcher’s laptop or desktop?
• What education and training actions should be considered to prepare researchers, students and educators for future cyberinfrastructure?
IESP Executive CommitteeJack Dongarra, Pete Beckman, Patrick Aerts, Frank Cappello, Thomas Lippert, Satoshi Matsuoka, Paul Messina, Anne Trefethen, Mateo Valero
www.exascale.org
6
www.exascale.org
Performance Development in Top500
1 Gflop/s
1 Tflop/s
100 Mflop/s
100 Gflop/s
100 Tflop/s
10 Gflop/s
10 Tflop/s
1 Pflop/s
100 Pflop/s
10 Pflop/s SUM
N=1
N=500
GordonBell
Winners
GordonBell
Winners
www.exascale.org
7
Exponential growth in parallelism for the foreseeable future
www.exascale.org8
National Science FoundationWhere Discoveries Begin
Office of Cyberinfrastructure
Abani K. [email protected]
Abani K. [email protected]
Factors that Necessitate RedesignFactors that Necessitate Redesign
• Extreme parallelism and hybrid design
• Preparing for million/billion way parallelism
• Tightening memory/bandwidth bottleneck
• Limits on power/clock speed implication on multicore
• Reducing communication will become much more intense
• Extreme parallelism and hybrid design
• Preparing for million/billion way parallelism
• Tightening memory/bandwidth bottleneck
• Limits on power/clock speed implication on multicore
• Reducing communication will become much more intense
€
Tera→ Peta ≠ Peta→ Exa
• Memory per core changes, byte-to-flop ratio will change
• Necessary Fault Tolerance
• MTTF will drop
• Checkpoint/restart has limitations
Software infrastructure does not exist today
• Memory per core changes, byte-to-flop ratio will change
• Necessary Fault Tolerance
• MTTF will drop
• Checkpoint/restart has limitations
Software infrastructure does not exist today
National Science FoundationWhere Discoveries Begin
Office of Cyberinfrastructure
Abani K. [email protected]
Abani K. [email protected]
Factors that Necessitate RedesignFactors that Necessitate Redesign
QuickTime™ and aTIFF (LZW) decompressor
are needed to see this picture.
Advances in most branches of science and engineering are critically dependent on increasingly complex multi-scale, multi-physics, data driven computations and analysis.
Complexity of Systems Data Intensive Scalable ComputingWorkflows, Grids, Clouds ...
First Cosmological simulations to include black hole physics by Di Matteo et. al. at Carnegie Mellon funded by OCI and MPS/AST.
Optimal siting of oil exploration platform estimated by using simulation and optimization tools to maximize product
Optimal siting of oil exploration platform estimated by using simulation and optimization tools to maximize product
http://www.amd.com/us-en/assets/content_type/DigitalMedia/43264A_hi_res.jpg
AMD Phenom
http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html?open
IBM Cell Broadband Engine
QuickTime™ and a decompressor
are needed to see this picture.
All this complexity dealt with by software and tools! All this complexity dealt with by software and tools!
National Science FoundationWhere Discoveries Begin
Office of Cyberinfrastructure
Abani K. [email protected]
Abani K. [email protected]
Exascale ComputingExascale Computing
• Exascale systems are likely feasible by 2017±2
• 10-100x106 processing elements (cores or mini-cores) with chips of 1,000 cores per socket, clock rates will grow slower; Heterogeneous cores
• 3D packaging likely perhaps with optics based interconnects
• 10-100 PB of aggregate memory
• Hardware and software based fault management
• Performance per watt — stretch goal 100 GF/watt Þ >> 10 – 100 MW Exascale system
• Power, area and capital costs (?) will be significantly higher than for petascale
• Exascale systems are likely feasible by 2017±2
• 10-100x106 processing elements (cores or mini-cores) with chips of 1,000 cores per socket, clock rates will grow slower; Heterogeneous cores
• 3D packaging likely perhaps with optics based interconnects
• 10-100 PB of aggregate memory
• Hardware and software based fault management
• Performance per watt — stretch goal 100 GF/watt Þ >> 10 – 100 MW Exascale system
• Power, area and capital costs (?) will be significantly higher than for petascale
Google: exascale computing study
National Science FoundationWhere Discoveries Begin
Office of Cyberinfrastructure
Abani K. [email protected]
Abani K. [email protected]
A Call to ActionA Call to Action
• Hardware has changed dramatically while software ecosystem has remained stagnant
• Previous approaches have not looked at co-design of multiple levels in the system software stack (OS, runtime, compiler, libraries, application frameworks)
• Need to exploit new hardware trends (e.g., manycore, heterogeneity) that cannot be handled by existing software stack, memory per socket trends
• Emerging software technologies exist, but have not been fully integrated with system software, e.g., UPC, Cilk, CUDA, HPCS
• Community codes unprepared for sea change in architecture
• Hardware has changed dramatically while software ecosystem has remained stagnant
• Previous approaches have not looked at co-design of multiple levels in the system software stack (OS, runtime, compiler, libraries, application frameworks)
• Need to exploit new hardware trends (e.g., manycore, heterogeneity) that cannot be handled by existing software stack, memory per socket trends
• Emerging software technologies exist, but have not been fully integrated with system software, e.g., UPC, Cilk, CUDA, HPCS
• Community codes unprepared for sea change in architecture
National Science FoundationWhere Discoveries Begin
Office of Cyberinfrastructure
Abani K. [email protected]
Abani K. [email protected]
IESP GoalIESP Goal
• Improve the world’s simulation and modeling capability by improving the coordination and development of the HPC software environment
• Improve the world’s simulation and modeling capability by improving the coordination and development of the HPC software environment
Build an international plan for developing the next generation open source software for scientific high-
performance computing
National Science FoundationWhere Discoveries Begin
Office of Cyberinfrastructure
Abani K. [email protected]
Abani K. [email protected]
PurposePurpose• The IESP software roadmap is a planning instrument
designed to enable the international HPC community to improve, coordinate and leverage their collective investments and development efforts.
• After needs are determined, the task will be to construct the organizational structures suitable to accomplish the work
• The IESP software roadmap is a planning instrument designed to enable the international HPC community to improve, coordinate and leverage their collective investments and development efforts.
• After needs are determined, the task will be to construct the organizational structures suitable to accomplish the work
National Science FoundationWhere Discoveries Begin
Office of Cyberinfrastructure
Abani K. [email protected]
Abani K. [email protected]
International Community Effort
This needs to be an international collaboration because of the:
•Scale of investment
•Need for international input on requirements
•No global evaluation of key missing components
•Hardware features are uncoordinated with software development
•It’s a “flat world” in software -- need to harness all available intellectual resources -- US, Asia, Europe …
www.exascale.org
15
National Science FoundationWhere Discoveries Begin
Office of Cyberinfrastructure
Abani K. [email protected]
Abani K. [email protected]
Timeline:Timeline:• SC08 (Austin TX) meeting to generate interest
• Funding from DOE’s Office of Science & NSF Office of Cyberinfratructure and sponsorship by Europeans and Asians
• US meeting (Santa Fe, NM) April 6-8, 2009
65 people
• NSF’s Office of Cyberinfrastructure funding
• European meeting (Paris, France) June 28-29, 2009
70 people
Outline Report
• Asian meeting (Tsukuba Japan) October 18-20, 2009
Draft roadmap
Refine Report
• SC09 (Portland OR) BOF to inform others
Public Comment
Draft Report presented
Nov 2008
Apr 2009
Jun 2009
Oct 2009
Nov 2009
www.exascale.org
16
National Science FoundationWhere Discoveries Begin
Office of Cyberinfrastructure
Abani K. [email protected]
Abani K. [email protected]
Four Goals for IESP• Strategy for determining requirements
•clarity in scope is the issue
• Comprehensive software roadmap
•goals, challenges, barriers and options
• Resource estimate and schedule
•scale and risk relative to hardware and applications
• A governance and project coordination model
•Is the community ready for a project of this scale, complexity and importance?
•“Can we be trusted to pull this off?”ale.org
17
National Science FoundationWhere Discoveries Begin
Office of Cyberinfrastructure
Abani K. [email protected]
Abani K. [email protected]
Key Trends
• Increasing Concurrency
•Reliability Challenging
•Power dominating designs
•Heterogeneity in a node
• I/O and Memory: ratios and breakthroughs
Requirements on X-stack
Programming models, applications, and tools must address concurrency
Software and tools must manage power directly
Software must be resilient
Software must address change to heterogeneous nodes
Software must be optimized for new Memory ratios and need to solve parallel I/O bottleneck
National Science FoundationWhere Discoveries Begin
Office of Cyberinfrastructure
Abani K. [email protected]
Abani K. [email protected]
Roadmap ComponentsRoadmap Components
Priority Research Direction (one for each component)
Key challenges
What will you do to address the challenges?Brief overview of the barriers and gaps
What capabilities will result?
What new methods and components will be developed?
How will this impact the range of applications that may benefit from exascale systems?
What’s the timescale in which that impact may be felt?
Summary of research direction
Potential impact on software componentPotential impact on usability, capability,
and breadth of community
4.x <component>Technology drivers
Alternative R&D strategies
Recommended research agenda
Cross-cutting considerations
4.2.4 Numerical LibrariesTechnology drivers
Hybrid architectures Programming models/languages Precision Fault detection Energy budget Memory hierarchy Standards
Alternative R&D strategies Message passing Global address space Message-driven work-queue
Recommended research agenda Hybrid and hierarchical based software (eg linear
algebra split across multi-core / accelerator) Autotuning Fault oblivious sw, Error tolerant sw Mixed arithmetic Architectural aware libraries Energy efficient implementation Algorithms that minimize communications
Crosscutting considerations Performance Fault tolerance Power management Arch characteristics
Priority Research Direction
Key challenges
•Fault oblivious, Error tolerant software•Hybrid and hierarchical based algorithms (eg linear algebra split across multi-core and gpu, self-adapting)•Mixed arithmetic•Energy efficient algorithms•Algorithms that minimize communications•Autotuning based software•Architectural aware algorithms/libraries•Standardization activities •Async methods
•Overlap data and computation
•Adaptivity for architectural environment •Scalability : need algorithms with minimal amount of communication•Increasing the level of asynchronous behavior •Fault resistant software– bit flipping and loosing data (due to failures). Algorithms that detect and carry on or detect and correct and carry on (for one or more)•Heterogeneous architectures•Languages•Accumulation of round-off errors
•Efficient libraries of numerical routines
•Agnostic of platforms
•Self adapting to the environment
•Libraries will be impacted by compilers, OS, runtime, prog env etc
•Standards: FT, Power Management, Hybrid Programming, arch characteristics
•Make systems more usable by a wider group of applications•Enhance programmability
Summary of research direction
Potential impact on software component Potential impact on usability, capability, and breadth of community
National Science FoundationWhere Discoveries Begin
Office of Cyberinfrastructure
Abani K. [email protected]
Abani K. [email protected]
4.2.4 Numerical Libraries4.2.4 Numerical Libraries
Energy aware
Fault tolerant
Heterogeneous sw
Self adapting for precision
Scaling to billion way
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
Complexity of system
Architectural transparency
Self Adapting for performance
Numerical LibrariesStructured gridsUnstructured gridsFFTsDense LASparse LAMonte CarloOptimization
Language issuesStd: Fault tolerant
Std: Energy aware
Std: Arch characteristicsStd: Hybrid Progm
National Science FoundationWhere Discoveries Begin
Office of Cyberinfrastructure
Abani K. [email protected]
Abani K. [email protected]
• www.exascale.org• www.exascale.org
National Science FoundationWhere Discoveries Begin
Office of Cyberinfrastructure
Abani K. [email protected]
Abani K. [email protected]
Next StepsNext Steps• Revise and extend initial draft
• Build management and collaboration plans
• Work with all stakeholders to plan research activities
• Next workshop(s) in the spring -- greater application focus
• NSF/OCI -- exascale exploration awards
• Revise and extend initial draft
• Build management and collaboration plans
• Work with all stakeholders to plan research activities
• Next workshop(s) in the spring -- greater application focus
• NSF/OCI -- exascale exploration awards