1
The future of (super) Computer Programming
Bent Thomsen
[email protected]
Department of Computer Science
Aalborg University
2
eScience: Simulation - The Third Pillar of Science
• Traditional scientific and engineering paradigm:
1) Do theory or paper design.
2) Perform experiments or build system.
• Limitations:
- Too difficult: build large wind tunnels.
- Too expensive: build a throw-away passenger jet.
- Too slow: wait for climate or galactic evolution.
- Too dangerous: weapons, drug design, climate experimentation.
• Computational science paradigm:
3) Use high performance computer systems to simulate the phenomenon
- Based on known physical laws and efficient numerical methods.
Exascale computing
The United States has put aside $126 million for exascale computing beginning in 2012, in an attempt to overtake China's Tianhe-1A supercomputer as the fastest computing platform in the world.
3
February 21, 2011
How to spend a billion dollars
The exascale programme builds on the HPCS programme.
In Phase I (June 2002 - June 2003)
Cray, IBM, SUN, HP, SGI and MITRE spent $250 million.
In Phase II (July 2003 - June 2006)
Cray was awarded $43.1 million, IBM was awarded $53.3 million, SUN was awarded $49.7 million.
In Phase III (July 2006 - December 2010)
Cray has been awarded $250 million and IBM has been awarded $244 million.
High Productivity Computing Systems
[Program overview diagram: Phase 1 (concept study), Phase 2 (2003-2005, advanced design and prototypes, with a half-way point technology assessment review), Phase 3 (2006-2010, full-scale development of petascale/s systems by the vendors, plus new and test evaluation frameworks leading to a validated procurement evaluation methodology).]
Mission: Create a new generation of economically viable computing systems (2010) and a procurement methodology (2007-2010) for the security/industrial community.
Petascale Computers
• Roadrunner, built by IBM
- first computer to go petascale, May 25, 2008; performance of 1.026 petaflops.
• XT5 "Jaguar", built by Cray
- later in 2008; after an update in 2009 its performance reached 1.759 petaflops.
• Nebulae, built by Dawning
- third petascale computer and the first built by China; performance of 1.271 petaflops in 2010.
• Tianhe-1A, built by NUDT
- the fastest supercomputer in the world, at 2.566 petaflops in 2010.
6
High Productivity Computing Systems
Impact:
• Performance (time-to-solution): speed up critical national security applications by a factor of 10X to 40X
• Programmability (idea-to-first-solution): reduce cost and time of developing application solutions
• Portability (transparency): insulate research and operational application software from the system
• Robustness (reliability): apply all known techniques to protect against outside attacks, hardware faults, and programming errors
Fill the critical technology and capability gap: from today (late 80's HPC technology) to the future (quantum/bio computing)
Applications: intelligence/surveillance, reconnaissance, cryptanalysis, weapons analysis, airborne contaminant modeling and biotechnology
HPCS Program Focus Areas
Create a new generation of economically viable computing systems (2010) and a procurement methodology (2007-2010) for the security/industrial community
HPCS Program Goals
Productivity Goals
• HPCS overall productivity goals:
- Execution (sustained performance): 1 Petaflop/s, scalable to greater than 4 Petaflop/s
- Development: 10X over today's systems
Reference: lone researcher and enterprise workflows
10x improvement in time to first solution!
[Workflow diagram: the lone researcher cycle (theory, experiment, design, simulation, visualize) versus the enterprise workflow (design, simulation, visualize, port legacy software, execution, development).]
9
How to increase Programmer Productivity?
3 ways of increasing programmer productivity:
1. Process (software engineering)
- Controlling programmers
- Good process can yield up to a 20% increase
2. Tools (verification, static analysis, program generation)
- Good tools can yield up to a 10% increase
3. Language design --- the center of the universe!
- Core abstractions, mechanisms, services, guarantees
- Affects how programmers approach a task (C vs. SML)
- New languages can yield a 700% increase
10
High Productivity Computing Systems
A large part of the HPCS Program focused on programming language development:
• X10 from IBM
- Extended subset of Java based on Non-Uniform Computing Clusters (NUCCs), where different memory locations incur different costs
• Chapel from CRAY
- Built on HPF and ZPL (based on Modula-2, Pascal, Algol)
• Fortress from SUN
- Based on the "Growing a Language" philosophy
11
New Programming Languages
Why should I bother?
• Fortran has been with us since 1954
• C has been with us since 1971
• C++ has been with us since 1983
• Java has been with us since 1995
• C# has been with us since 2000
12
New Programming Languages
Why should I bother?
• Every generation improves:
- Programmer productivity
• Higher level of abstraction
• Thus reduced time-to-market
- Program reuse
• Libraries, components, patterns
- Program reliability
• Thus fewer bugs make it through to the product
- But usually not performance
• Usually lagging five years behind, but it will catch up
13
Programming Language Genealogy
Diagram by Peter Sestoft (LangHistory.htm)
14
But why do we need new (HPCS) languages now?
• Until about 20 years ago there was a neat correspondence between the Fortran/C/C++/Java/C# programming model and the underlying machines
• The only thing that (apparently) changed was that processors got faster
• Moore's Law (misinterpreted):
- Processor speed doubles every 18 months
- Almost every measure of the capabilities of digital electronic devices is linked to Moore's Law: processing speed, memory capacity, ... (source: Wikipedia)
15
The Hardware world is changing!
16
Moore's Law
• Popular belief:
- Moore's Law stopped working in 2005!
• Moore's Law (misinterpreted):
- Processor speed doubles every 18 months
• Moore's Law is still going strong:
- the number of transistors per unit area on a chip doubles every 18 months
• Instead of spending more and more hardware real estate on cache memory, it is now used for multiple cores
17
The IT industry wake-up call
• The supercomputing community discovered the change in hardware first
• The rest of the computing industry has started to worry
“Multicore: This is the one which will have the biggest impact on us. We have never had a problem to solve like this. A breakthrough is needed in how applications are done on multicore devices.” – Bill Gates
18
What is the most expensive operation in this line of C code?
• int x = (3.14 * r) + (x * y);
19
A programmer’s view of memory
This model was pretty accurate in 1985. Processors (386, ARM, MIPS, SPARC) all ran at 1-10 MHz clock speed and could access external memory in 1 cycle; and most instructions took 1 cycle. Indeed, the C language was as expressively time-accurate as a language could be: almost all C operators took one or two cycles. But this model is no longer accurate!
20
A modern view of memory timings
So what happened? On-chip computation (clock speed) sped up faster (1985-2005) than off-chip communication (with memory) as feature sizes shrank. The gap was filled by spending transistor budget on caches, which (statistically) masked the mismatch until 2005 or so. Techniques like caches, deep pipelining with bypasses, and superscalar instruction issue burned power to preserve our illusions. 2005 or so was the crunch point, as faster, hotter, single-CPU Pentiums were scrapped. These techniques had delayed the inevitable.
21
The Current Mainstream Processor
This will scale to 2, 4, maybe 8 processors, but ultimately shared memory becomes the bottleneck (1024 processors?!?).
22
Angela C. Sodan, Jacob Machina, Arash Deshmeh, Kevin Macnaughton, Bryan Esbaugh, "Parallelism via Multithreaded and Multicore CPUs," Computer, pp. 24-32, March, 2010
23
Angela C. Sodan, Jacob Machina, Arash Deshmeh, Kevin Macnaughton, Bryan Esbaugh, "Parallelism via Multithreaded and Multicore CPUs," Computer, pp. 24-32, March, 2010
24
Hardware will change
• Cell - multi-core with 1 PPC + 8 (6) SPEs (SIMD)
- 3-level memory hierarchy
- broadcast communication
• GPU
- 256 SIMD HW threads
- Data-parallel memory
• FPGA ... (build your own hardware)
25
Super Computer Organisation
26
Locality and Parallelism
• Large memories are slow; fast memories are small.
• Storage hierarchies are large and fast on average.
• Parallel processors, collectively, have large, fast memories -- the slow accesses to "remote" data we call "communication".
• Algorithms should do most work on local data.
[Diagram: two processors, each with its own on-chip cache, L2 cache, L3 cache and memory, joined by potential interconnects.]
27
Memory Hierarchy
• Most programs have a high degree of locality in their accesses
- spatial locality: accessing things near previous accesses
- temporal locality: reusing an item that was previously accessed
• The memory hierarchy tries to exploit locality (see the C sketch below)
[Diagram: processor (registers, datapath, control) -> on-chip cache -> second-level cache (SRAM) -> main memory (DRAM) -> secondary storage (disk) -> tertiary storage (disk/tape).]

Level                         Speed    Size
Registers / on-chip cache     1 ns     100s of bytes
Second-level cache (SRAM)     10 ns    KB
Main memory (DRAM)            100 ns   MB
Secondary storage (disk)      10 ms    GB
Tertiary storage (disk/tape)  10 s     TB
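To make the locality point concrete, here is a small C sketch (not from the slides) contrasting row-order and column-order traversal of the same matrix; the row-order loop walks consecutive addresses and is typically several times faster on cached hardware:

#include <stdio.h>
#include <time.h>

#define N 4096
static double a[N][N];

int main(void) {
    double sum = 0.0;
    clock_t t0 = clock();
    for (int i = 0; i < N; i++)         /* row order: consecutive addresses, */
        for (int j = 0; j < N; j++)     /* good spatial locality             */
            sum += a[i][j];
    clock_t t1 = clock();
    for (int j = 0; j < N; j++)         /* column order: strides of N*8 bytes, */
        for (int i = 0; i < N; i++)     /* a cache miss on almost every access */
            sum += a[i][j];
    clock_t t2 = clock();
    printf("row order: %.2fs, column order: %.2fs (sum=%g)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, sum);
    return 0;
}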
28
Programming model(s) reflecting the new world are called for
• Algorithms should do most work on local data!
• Programmers need to
- make decisions on parallel execution
- know what is local and what is not
- deal with communication
• But how can the poor programmer ensure this?
• She/he has to exploit:
- data parallelism and memory parallelism
- task parallelism and instruction parallelism
• She/he needs programming language constructs to help her/him
29
Domain decomposition
30
Domain decomposition methods
31
Functional decomposition
32
Types of Parallelism
Task Parallelism
- Parallelism explicit in algorithm
- Between filters without producer/consumer relationship
Data Parallelism
- Peel iterations of filter, place within scatter/gather pair (fission)
- parallelize filters with state
Pipeline Parallelism
- Between producers and consumers
- Stateful filters can be parallelized
[Diagram: stream graph with a scatter/gather pair and a task annotation.]
33
Types of Parallelism
Task Parallelism
- Parallelism explicit in algorithm
- Between filters without producer/consumer relationship
Data Parallelism
- Between iterations of a stateless filter
- Place within scatter/gather pair (fission)
- Can't parallelize filters with state
Pipeline Parallelism
- Between producers and consumers
- Stateful filters can be parallelized
[Diagram: stream graph annotated with task, pipeline and data-parallel regions.]
34
Types of Parallelism
Traditionally:
Task Parallelism
- Thread (fork/join) parallelism
Data Parallelism
- Data-parallel loop (forall), as in the C sketch below
Pipeline Parallelism
- Usually exploited in hardware
[Diagram: stream graph annotated with task, pipeline and data-parallel regions.]
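As a minimal sketch (not from the slides) of the two traditional software forms in C: fork/join task parallelism with POSIX threads, and a data-parallel forall loop using OpenMP's parallel for pragma:

#include <pthread.h>
#include <stdio.h>

#define N 1000000
static double data[N];

static void *task(void *arg) {                    /* task parallelism: */
    printf("task %ld running\n", (long)arg);      /* independent work  */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, task, (void *)1L);  /* fork */
    pthread_create(&t2, NULL, task, (void *)2L);
    pthread_join(t1, NULL);                       /* join */
    pthread_join(t2, NULL);

    /* data parallelism: a forall loop; OpenMP splits the iteration
       space across cores (compile with -fopenmp -pthread) */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        data[i] = data[i] * 2.0 + 1.0;
    return 0;
}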
35
New HPCS Languages
• Constructs for expressing data parallelism
- In Chapel, distribution is a separate annotation
- In X10, there is no direct remote access to distributed array data
- In Fortress, user-defined distributed data structures without explicit layout control
- All have a partitioned global address space (PGAS)
• Constructs for expressing task parallelism
- All three languages support atomic blocks
• None of the languages have locks
• (Semantically, locks are more powerful, but harder to manage than atomic sections)
- Other mechanisms
• X10 has "clocks" (barriers with dynamically attached tasks), conditional atomic sections, synchronization variables
• Chapel has "single" (single writer) and "sync" (multiple readers and writers) variables
• Fortress has abortable atomic sections and a mechanism for waiting on individual spawned threads
• Language designers could not resist the temptation to also address some general programming language design issues
GPGPU (the poor man's HPC)
36
CUDA and OpenCL
• CUDA from NVIDIA
- Extension to C for programming NVIDIA GPUs
• OpenCL
- Initiated by Apple
- Developed by the Khronos Group
- Extension to C generalizing CUDA concepts from GPUs to multi-core and Cell processors
37
OpenCL and CUDA
Image from: http://developer.amd.com/zones/OpenCLZone/courses/pages/Introductory-OpenCL-SAAHPC10.aspx
CUDA          OpenCL
Kernel        Kernel
Host program  Host program
Thread        Work item
Block         Work group
Grid          NDRange (index space)
OpenCL and CUDA
In CUDA:

__global__ void vecAdd(float *a, float *b, float *c)
{
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}
OpenCL and CUDA
In OpenCL:

__kernel void vecAdd(__global const float *a,
                     __global const float *b,
                     __global float *c)
{
    int i = get_global_id(0);
    c[i] = a[i] + b[i];
}
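For context, a hedged sketch of the host-side CUDA C code that could launch the vecAdd kernel above (not from the slides; the single-block launch configuration is an assumption matching the kernel's use of threadIdx.x):

#include <cuda_runtime.h>
#include <stdio.h>

#define N 256

int main(void) {
    float ha[N], hb[N], hc[N];
    for (int i = 0; i < N; i++) { ha[i] = (float)i; hb[i] = 2.0f * i; }

    float *da, *db, *dc;
    size_t bytes = N * sizeof(float);
    cudaMalloc((void **)&da, bytes);              /* device allocations */
    cudaMalloc((void **)&db, bytes);
    cudaMalloc((void **)&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    vecAdd<<<1, N>>>(da, db, dc);                 /* one block of N threads */

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("hc[10] = %f\n", hc[10]);              /* expect 30.0 */
    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}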
OpenCL and CUDA

CUDA             OpenCL
Global memory    Global memory
Constant memory  Constant memory
Shared memory    Local memory
Local memory     Private memory
42
Other Programming Language Trends
• Conventional wisdom is out of the window
- Cheaper to re-compute than to store and fetch
• Declarative programming
- Mainly functional programming
- "Why functional programming matters" (again)
• Hardware again influences language design
• Need for correct programs
- eScience: what good is an eScience result if we cannot trust the computational results?
- In the mainstream we need to provide applications that will not let in viruses and worms
- Software used in safety-critical systems
- Software used in (high) finance
- Software used in eGovernment (e.g. online voting)
- Implies lots of work on semantics
43
Source: http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html, January 2011
44
Source: http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html, January 2011
45
Conclusions
• Nothing has changed much
• Main languages have their domain
- Java: for web applications
- C: for system programming
- (Visual) Basic: for desktop Windows apps
- PHP: for server-side scripting
- C++: when Java is (perceived as) too slow
• We shouldn't bother with new languages!

• Wait a minute!
• Something is changing
- Software is getting more and more complex
- Hardware has changed
46
Which languages are discussed?
Source: http://langpop.com
47
Source: http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html
48
Three Trends
• Declarative programming languages are in vogue again
- Especially functional
• Dynamic programming languages are gaining momentum
• Concurrent programming languages are back on the agenda
49
Declarative Programming
• Lots of talk about declarative languages:
- Haskell
- Scheme, Lisp, Clojure
- F#, O'Caml, SML
- Scala, Fortress
• Lots of talk about declarative constructs in traditional languages
- C# (and Java and C++)
50
What do we mean by declarative/functional?
• Say what you want, without saying how
Or as Anders Hejlsberg, inventor of C#, puts it:

"programmers need to talk less about how to do things and more about what they want done and have computers reason it out."
51
Quicksort in C
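The C code from this slide is not in the transcript; a standard in-place C quicksort of the kind typically shown for this comparison might look like this (hypothetical reconstruction, not the slide's exact code):

#include <stdio.h>

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* sort v[lo..hi] in place (Lomuto partition) */
static void qsort_c(int v[], int lo, int hi) {
    if (lo >= hi) return;
    int pivot = v[hi], i = lo;
    for (int j = lo; j < hi; j++)
        if (v[j] < pivot) swap(&v[i++], &v[j]);
    swap(&v[i], &v[hi]);                  /* pivot into final position */
    qsort_c(v, lo, i - 1);
    qsort_c(v, i + 1, hi);
}

int main(void) {
    int v[] = { 5, 1, 4, 2, 3 };
    qsort_c(v, 0, 4);
    for (int i = 0; i < 5; i++) printf("%d ", v[i]);  /* 1 2 3 4 5 */
    printf("\n");
    return 0;
}

Note how much of the code is about how (index manipulation, swapping in place) rather than what, which is the contrast the next slide draws.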
52
Quicksort in Haskell
qsort [] = []
qsort (x:xs) =
    qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)
53
What do we mean by declarative/functional?
• Say what you want, without saying how
- Not quite true; it is more a question of saying how implicitly
54
What do we mean by declarative/functional?
• Say what you want, without saying how
• Functions as first-class entities
• Lazy or/and eager evaluation
• Pure vs. impure
• Value-oriented (vs. state-oriented)
• Pattern matching
• Generics (or parametric polymorphism)
55
Mainstream programming is going declarative
In 2005 Anders Hejlsberg (designer of C#) said:

"Generally speaking, it's interesting to think about more declarative styles of programming vs. imperative styles. ... Functional programming languages and queries are actually a more declarative style of programming."

"Programmers have to unlearn ... and learn to trust that when they're just stating the 'what', the machine is smart enough to do the 'how' the way they want it done, or the most efficient way." - Anders Hejlsberg

http://www.microsoft-watch.com/content/operating_systems/the_father_of_c_on_the_past_present_and_future_of_programming.html
56
Name the language...
Func<intlist, intlist> Sort = xs =>
    xs.Case(
        () => xs,
        (head, tail) => Sort(tail.Where(x => x < head))
                        .Concat(Single(head))
                        .Concat(Sort(tail.Where(x => x >= head))));

• Quicksort revisited in C# 3.0
Callouts on the slide: type inference, append, higher-order function, parameterized type of functions, recursion, filter, lambda expression.
57
C# 3.0 Language Extensions

var contacts =
    from c in customers
    where c.State == "WA"
    select new { c.Name, c.Phone };

var contacts =
    customers
    .Where(c => c.State == "WA")
    .Select(c => new { c.Name, c.Phone });

Features on display: query expressions, extension methods, lambda expressions, object initializers, anonymous types, local variable type inference.
58
C# 3.0 Features
- Implicitly typed local variables
- Lambda expressions
- Anonymous types
- Expression trees
- Query expressions
- Extension methods
- Object initializers
- Collection initializers
- Iterators
- Lazy streams
- Nullable value types
- C# 2.0 already had:
• Generics
• Structured value types
• First-class anonymous functions (called delegates)
59
F#
• A .NET language (developed by Don Syme)
- Connects with all Microsoft foundation technologies
- 3rd official MS language, shipped with VS2010
• Aims to combine the best of Lisp, ML, Scheme and Haskell in the context of .NET
- Actually based on O'Caml
• Functional, math-oriented, scalable
• Aimed particularly at the "Symbolic Programming" niche at Microsoft
60
F# on one slide

let data = (1, 2, 3)                    // NOTE: type inferred:
                                        //   val data : int * int * int
let sqr x = x * x                       //   val sqr : int -> int

let f (x, y, z) = (sqr x, sqr y, sqr z)

let sx, sy, sz = f (10, 20, 30)         // NOTE: pattern matching

print "hello world"; 1 + 2              // NOTE: sequencing

let show x y z =                        // NOTE: local binding, sequencing, return
    printf "x = %d y = %d z = %d \n" x y z;
    let sqrs = f (x, y, z) in
    print "Hello world\n";
    sqrs

let (|>) x f = f x                      // NOTE: pipelining operator

NOTE: parentheses are optional on application
61
Java Future
• Since its launch in 1995, Java has been the darling of industry and academia
• Java is now more than 15 years old
• The pace of language innovation is slowing down
- Java 6 SE released Dec. 2006
- Java 6 EE released Dec. 2009
• Waiting for Java 7 SE / JDK 7
- Work started in 2006
- Forecast Feb. 2010
- Postponed till summer 2011
62
Guy Steele theorizes that programming languages are finite, and argues that the time is right for a successor to Java, which has another two decades of life left. Sun is investigating whether aligning programming languages more closely to traditional mathematical notation can reduce the burden for scientific programmers
"A Conversation With Guy Steele Jr."Dr. Dobb's Journal (04/05) Vol. 30, No. 4, P. 17; Woehr, Jack J.
Guy Steele co-wrote the original Java specifications and in 1996 was awarded the ACM SIGPLAN Programming Language Achievement Award. Steele is a distinguished engineer and principal investigator at Sun Microsystems Laboratories, where he heads the company's Programming Language Research Group.
Beyond Java
63
Fortress
• One of the three languages DARPA spent $1BN on
- Actually SUN only got $49.7M (IBM and CRAY got the rest)
• First-class higher-order functions
• Type inference
• Immutable and mutable variables
• Traits
- Like Java interfaces with code; classes without fields
• Objects
- Consist of fields and methods
• Designed to be parallel unless explicitly sequential
- For loops and generators, tuples
- Transactional memory
- PGAS (Partitioned Global Address Space)
• Runs on top of the JVM
64
"Advances" in Syntax
• Extensible syntax - follows Guy Steele's vision of "Growing a Language"
- The only language I know of with overloadable whitespace!
- Syntax based on Parsing Expression Grammars (PEG)
• Syntax resembling mathematical notation
65
Scala
• Scala is an object-oriented and functional language which is completely interoperable with Java
- Developed by Martin Odersky, EPFL, Lausanne, Switzerland
• Uniform object model
- Everything is an object
- Class-based, single inheritance
- Mixins and traits
- Singleton objects defined directly
• Higher-order and anonymous functions with pattern matching
• Genericity
• Extensible
- All operators are overloadable; function symbols can be prefix, postfix or infix
- New control structures can be defined without using macros
66
Scala is Object Oriented
Scala programs interoperate seamlessly with Java class libraries:
- Method calls
- Field accesses
- Class inheritance
- Interface implementation
all work as in Java. Scala programs compile to JVM bytecodes.

Scala's syntax resembles Java's, but there are also some differences.

object Example1 {
  def main(args: Array[String]) {
    val b = new StringBuilder()
    for (i <- 0 until args.length) {
      if (i > 0) b.append(" ")
      b.append(args(i).toUpperCase)
    }
    Console.println(b.toString)
  }
}

Notes: object instead of static members; var: Type instead of Type var; Scala's version of the extended for loop uses <- (an ASCII alias for the left arrow); arrays are indexed args(i) instead of args[i].
67
Scala is functional

The last program can also be written in a completely different style:
- Treat arrays as instances of general sequence abstractions.
- Use higher-order functions instead of loops.

object Example2 {
  def main(args: Array[String]) {
    println(args map (_.toUpperCase) mkString " ")
  }
}

Notes: arrays are instances of sequences with map and mkString methods; _.toUpperCase is a closure which applies the toUpperCase method to its String argument; map is a method of Array which applies the function on its right to each array element; mkString is a method of Array which forms a string of all elements with a given separator between them.
68
Scala’s approach
• Scala applies Tennent's design principles:
- Concentrate on abstraction and composition capabilities instead of basic language constructs
- Minimal orthogonal set of core language constructs
• But it is European
69
Clojure
• Concurrent Lisp-like language on the JVM
- Developed by Rich Hickey
• Functions are first-class values
• Everything is an expression, except:
- Symbols
- Operations (op ...)
- Special operations:
• def if fn let loop recur do new . throw try set! quote var
• Code is expressed in data structures
- Clojure is homoiconic

Homoiconicity is a property of some programming languages in which the primary representation of programs is also a data structure in a primitive type of the language itself. - Wikipedia
70
Java vs. Clojure
[Side-by-side code comparison; the code is not captured in the transcript.]
71
Dynamic Programming
• Lots of talk about dynamic languages
- PHP, Perl, Ruby
- JavaScript
- Lisp/Scheme
- Erlang
- Groovy
- Clojure
- Python
• Jython for the JVM and IronPython for .NET
• Real programmers don't need types
72
Dynamic Language characteristics
• (Perceived) to be less verbose
- Come with good libraries/frameworks
• Interpreted or JIT-compiled to bytecode
• Eval: string -> code
• REPL-style programming
• Embeddable in larger applications as a scripting language
• Support higher-order functions!
• Object-oriented
- JavaScript, Ruby and Python
- Based on Self and Smalltalk, respectively
• Metaprogramming made easier
75
Dynamic Programming in C# 4.0
- Dynamic lookup
• A new static type called: dynamic
• No static typing of operations with dynamic
• Exceptions on invalid usage at runtime
- Optional and named parameters
- COM interop features
- (Co- and contra-variance)

dynamic d = GetDynamicObject(...);
d.M(7);              // calling methods
d.f = d.P;           // getting and setting fields and properties
d["one"] = d["two"]; // getting and setting through indexers
int i = d + 3;       // calling operators
string s = d(5,7);   // invoking as a delegate
76
Concurrent Programming
• Lots of talk about Erlang
• Fortress, X10 and Chapel
• java.util.concurrent
• Actors in Scala
• Clojure
• C omega
• F# / Axum
• .NET Parallel Extensions
77
The problem with Threads
• Threads
- Program counter
- Own stack
- Shared memory
- Create, start (stop), yield ...
• Locks
- wait, notify, notifyAll
- manually lock and unlock
• or implicitly via synchronized
- lock ordering is a big problem (see the sketch below)
- granularity is important
- not compositional
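A small C/pthreads sketch (not from the slides) of the lock-ordering problem: two threads take the same two mutexes in opposite orders, so each can end up holding one lock while waiting forever for the other:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t b = PTHREAD_MUTEX_INITIALIZER;

static void *t1(void *arg) {
    pthread_mutex_lock(&a);          /* takes a, then b */
    pthread_mutex_lock(&b);
    puts("t1 holds both");
    pthread_mutex_unlock(&b);
    pthread_mutex_unlock(&a);
    return NULL;
}

static void *t2(void *arg) {
    pthread_mutex_lock(&b);          /* takes b, then a: opposite order, */
    pthread_mutex_lock(&a);          /* so t1 and t2 can deadlock        */
    puts("t2 holds both");
    pthread_mutex_unlock(&a);
    pthread_mutex_unlock(&b);
    return NULL;
}

int main(void) {
    pthread_t x, y;
    pthread_create(&x, NULL, t1, NULL);
    pthread_create(&y, NULL, t2, NULL);
    pthread_join(x, NULL);
    pthread_join(y, NULL);
    return 0;
}

Nothing in the program text marks the bug; the fix (a global lock order) is a convention the compiler cannot check, which is one reason locks do not compose.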
78
Several directions
• (Software) Transactional Memory
- Enclose code in begin/end blocks or atomic blocks (a C sketch follows below)
- Variations:
• specify manual abort/retry
• specify an alternate path (a way of controlling manual abort)
- Java STM2 library
- Clojure, Fortress, X10, Chapel
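None of the languages above is C, but GCC's experimental transactional memory extension (-fgnu-tm, available since GCC 4.7) gives a flavour of atomic blocks; a hedged sketch, not taken from the slides:

/* compile with: gcc -fgnu-tm tm.c -o tm */
#include <stdio.h>

static long balance = 100;

static void deposit(long amount) {
    /* the block executes atomically; conflicting transactions
       are rolled back and retried by the runtime */
    __transaction_atomic {
        balance = balance - amount;
        balance = balance + 2 * amount;
    }
}

int main(void) {
    deposit(10);
    printf("balance = %ld\n", balance);   /* 110 */
    return 0;
}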
79
Message Passing / Actors
- Erlang
- Scala Actors
- F#/Axum
- GO!
80
Theoretical Models
• Actors• CSP• CCS• pi-calculus• join-calculus
• All tried and tested in many languages over the years, but …
81
Problems with Actor-like models
• Actors (agents, processes or threads) are not free
• Message sending is not free
• Context switching is not free
• Lock acquire/release is still needed at some level, and it is not free
• Multiple-actor coordination
- reinvent transactions?
- Actors can still deadlock and starve
- The programmer defines granularity by choosing what is an actor
82
Other concurrency models
• Dataflow
- Stream-processing functions; fits nicely with GPUs
• Futures
• Tuple spaces
• Stop-gap solutions based on parallelised libraries
• Lots of experiments with embedded DSLs
• Lots of R&D (again) in this area!!!
83
Other trends worth watching
• Development methods
- Away from waterfall, top-down
- Towards agile/XP/Scrum
- Refactoring
- Frameworks, patterns
- Test-driven development
• Tools
- Powerful IDEs with plug-ins
- Frameworks
- VM and OS integrations
• MS PowerShell, V8 in Android
84
Promises for Programming
• New ways of programming are back on the agenda
• Understanding of HW has (again) become necessary
• Semantics is back on the agenda
- SOS/calculi for Fortress, Scala, F#
- Advanced type systems and type inference
• Program analysis and verification
- JML and Spec# (design by contract)
- SPIN, Blast, UPPAAL
- ProVerif (Microsoft)
- Java PathFinder (NASA, Fujitsu)
- WALA (IBM), etc.
85
Promises for Programming Language Development
• Programming language construction is becoming easier
- Extensible open (source) compilers for most mainstream languages
- ASTs (or expression trees in C#/F#, Fortress and Scala)
- Generic code generators
- VMs and JITs
- Parsing Expression Grammars (PEG)
What about performance?
• Case studies in C, C# and Java by Peter Sestoft
- Matrix multiplication
- A division-intensive series
- Polynomial evaluation
- A statistical function (NORMDIST)
• Execution platforms:
- C: gcc 4.2.1, MacOS
- C#: Microsoft .NET 4.0 and Mono 2.6
- Java: Sun HotSpot server (unfortunately not the IBM JVM)
- Hardware: Intel Core 2 Duo, 2660 MHz
86
[Slides 87-93: performance charts from the case studies; not captured in the transcript.]
Is performance an issue?
• The case studies by Sestoft confirm an older case study of Java for DSP
- On the IBM JVM, Java beat C on all DSP algorithms!
94
DARPA HPCS Language Project
IBM
- X10 (now on version 2.1.2)
CRAY
- Chapel (now on version 1.3.0)
SUN
- Fortress
- Became open source in 2007
These languages were expected to run well on and exploit the HPCS hardware platforms being developed by all vendors
- But it is recognized that, in order to be adopted, any HPCS language will also have to be effective on other parallel architectures.
- (And, ironically, the HPCS hardware will have to run Fortran + MPI programs well.)
"HPCS" languages have been tried before...
The Japanese 5th Generation project had many similarities and near-similarities to the DARPA program:
- 10-year program
- A new parallel language (concurrent logic programming) expressing a new programming model
• The new language was presented as more productive
- Multiple vendors, each implementing hardware support for the language, to address performance problems.
Ada was intended to replace both Fortran and Cobol. It included built-in parallelism. It was successful in its way: many (defense) applications were implemented in Ada, and a large community of Ada programmers was created.
Neither language is with us today. It is not clear whether either even had any influence on our current parallel programming landscape.
Other (new) Programming Languages worth keeping an eye on
• Scala
- Already able to run 1.2 million actors in 1 JVM
- Has a DSL for running on GPUs (ScalaCL)
- Recent grant with Stanford for HPC
• C# 3.0/4.0 or F#
- LINQ and/or Accelerator
• Python/Ruby
- Going from strength to strength
Eh ... nothing changes
• Fortran will be with us for a long time
• C/C++ with OpenMP and MPI will be with us for a long time
• Java: still the most widely used language
- The volatile keyword to ensure memory consistency
- Lots of (distributed) STM libraries
Discussion
• Anybody tempted to use any of these languages?
• Anybody tempted to extend or create new languages for HPC?
Assignment
• Try one of the assignments in
- Fortress, Scala and/or F#
• And for the die-hards:
- Try CUDA and/or OpenCL
- Try ScalaCL and/or F# with Accelerator