1
The future of (super) Computer Programming
Bent Thomsen
[email protected]
Department of Computer Science
Aalborg University
2
eScience: Simulation - The Third Pillar of Science
• Traditional scientific and engineering paradigm:
1) Do theory or paper design.
2) Perform experiments or build system.
• Limitations:
- Too difficult: build large wind tunnels.
- Too expensive: build a throw-away passenger jet.
- Too slow: wait for climate or galactic evolution.
- Too dangerous: weapons, drug design, climate experimentation.
• Computational science paradigm:
3) Use high performance computer systems to simulate the phenomenon
- Based on known physical laws and efficient numerical methods.
Exascale computing
The United States has put aside $126 million for exascale computing beginning in 2012, in an attempt to overtake China's Tianhe-1A supercomputer as the fastest computing platform in the world.
3
February 21, 2011
How to spend a billion dollars
The exascale programme builds on the HPCS programme.
In Phase I (June 2002 - June 2003)
Cray, IBM, SUN, HP, SGI and MITRE spent $250 million.
In Phase II (July 2003 - June 2006)
Cray was awarded $43.1 million, IBM was awarded $53.3 million, SUN was awarded $49.7 million.
In Phase III (July 2006 - December 2010)
Cray has been awarded $250 million and IBM has been awarded $244 million.
High Productivity Computing Systems
[Program overview diagram: Phase 1 (concept study), Phase 2 (2003-2005, advanced design and prototypes, with a half-way point technology assessment review), Phase 3 (2006-2010, full-scale development of petascale/s systems by the vendors, plus new and test evaluation frameworks leading to a validated procurement evaluation methodology).]
Mission: Create a new generation of economically viable computing systems (2010) and a procurement methodology (2007-2010) for the security/industrial community.
Petascale Computers
• Roadrunner, built by IBM
- first computer to go petascale, May 25, 2008; performance of 1.026 petaflops.
• XT5 "Jaguar", built by Cray
- later in 2008; after an update in 2009 its performance reached 1.759 petaflops.
• Nebulae, built by Dawning
- third petascale computer and the first built by China; performance of 1.271 petaflops in 2010.
• Tianhe-1A, built by NUDT
- the fastest supercomputer in the world, at 2.566 petaflops in 2010.
6
High Productivity Computing Systems
Impact:
• Performance (time-to-solution): speed up critical national security applications by a factor of 10X to 40X
• Programmability (idea-to-first-solution): reduce cost and time of developing application solutions
• Portability (transparency): insulate research and operational application software from the system
• Robustness (reliability): apply all known techniques to protect against outside attacks, hardware faults, and programming errors
Fill the critical technology and capability gap: from today (late 80's HPC technology) to the future (quantum/bio computing)
Applications: intelligence/surveillance, reconnaissance, cryptanalysis, weapons analysis, airborne contaminant modeling and biotechnology
HPCS Program Focus Areas
Create a new generation of economically viable computing systems (2010) and a procurement methodology (2007-2010) for the security/industrial community
HPCS Program Goals
Productivity Goals
• HPCS overall productivity goals:
- Execution (sustained performance): 1 Petaflop/s, scalable to greater than 4 Petaflop/s
- Development: 10X over today's systems
Reference: lone researcher and enterprise workflows
10x improvement in time to first solution!
[Workflow diagram: the lone researcher cycle (theory, experiment, design, simulation, visualize) versus the enterprise workflow (design, simulation, visualize, port legacy software, execution, development).]
9
How to increase Programmer Productivity?
3 ways of increasing programmer productivity:
1. Process (software engineering)
- Controlling programmers
- Good process can yield up to a 20% increase
2. Tools (verification, static analysis, program generation)
- Good tools can yield up to a 10% increase
3. Language design --- the center of the universe!
- Core abstractions, mechanisms, services, guarantees
- Affects how programmers approach a task (C vs. SML)
- New languages can yield a 700% increase
10
High Productivity Computing Systems
A large part of the HPCS Program focused on programming language development:
• X10 from IBM
- Extended subset of Java based on Non-Uniform Computing Clusters (NUCCs), where different memory locations incur different costs
• Chapel from CRAY
- Built on HPF and ZPL (based on Modula-2, Pascal, Algol)
• Fortress from SUN
- Based on the "Growing a Language" philosophy
11
New Programming Languages
Why should I bother?
• Fortran has been with us since 1954
• C has been with us since 1971
• C++ has been with us since 1983
• Java has been with us since 1995
• C# has been with us since 2000
12
New Programming Languages
Why should I bother?
• Every generation improves:
- Programmer productivity
• Higher level of abstraction
• Thus reduced time-to-market
- Program reuse
• Libraries, components, patterns
- Program reliability
• Thus fewer bugs make it through to the product
- But usually not performance
• Usually lagging five years behind, but it will catch up
13
Programming Language Genealogy
Diagram by Peter Sestoft (LangHistory.htm)
14
But why do we need new (HPCS) languages now?
• Until about 20 years ago there was a neat correspondence between the Fortran/C/C++/Java/C# programming model and the underlying machines
• The only thing that (apparently) changed was that processors got faster
• Moore's Law (misinterpreted):
- Processor speed doubles every 18 months
- Almost every measure of the capabilities of digital electronic devices is linked to Moore's Law: processing speed, memory capacity, ... (source: Wikipedia)
15
The Hardware world is changing!
16
Moore's Law
• Popular belief:
- Moore's Law stopped working in 2005!
• Moore's Law (misinterpreted):
- Processor speed doubles every 18 months
• Moore's Law is still going strong:
- the number of transistors per unit area on a chip doubles every 18 months
• Instead of spending more and more hardware real estate on cache memory, it is now used for multiple cores
17
The IT industry wake-up call
• The supercomputing community discovered the change in hardware first
• The rest of the computing industry has started to worry
“Multicore: This is the one which will have the biggest impact on us. We have never had a problem to solve like this. A breakthrough is needed in how applications are done on multicore devices.” – Bill Gates
18
What is the most expensive operation in this line of C code?
• int x = (3.14 * r) + (x * y);
19
A programmer’s view of memory
This model was pretty accurate in 1985. Processors (386, ARM, MIPS, SPARC) all ran at 1-10 MHz clock speed and could access external memory in 1 cycle; and most instructions took 1 cycle. Indeed, the C language was as expressively time-accurate as a language could be: almost all C operators took one or two cycles. But this model is no longer accurate!
20
A modern view of memory timings
So what happened? On-chip computation (clock speed) sped up faster (1985-2005) than off-chip communication (with memory) as feature sizes shrank. The gap was filled by spending transistor budget on caches, which (statistically) masked the mismatch until 2005 or so. Techniques like caches, deep pipelining with bypasses, and superscalar instruction issue burned power to preserve our illusions. 2005 or so was the crunch point, as faster, hotter, single-CPU Pentiums were scrapped. These techniques had delayed the inevitable.
21
The Current Mainstream Processor
This will scale to 2, 4, maybe 8 processors, but ultimately shared memory becomes the bottleneck (1024 processors?!?).
22
Angela C. Sodan, Jacob Machina, Arash Deshmeh, Kevin Macnaughton, Bryan Esbaugh, "Parallelism via Multithreaded and Multicore CPUs," Computer, pp. 24-32, March, 2010
23
Angela C. Sodan, Jacob Machina, Arash Deshmeh, Kevin Macnaughton, Bryan Esbaugh, "Parallelism via Multithreaded and Multicore CPUs," Computer, pp. 24-32, March, 2010
24
Hardware will change
• Cell - multi-core with 1 PPC + 8 (6) SPEs (SIMD)
- 3-level memory hierarchy
- broadcast communication
• GPU
- 256 SIMD HW threads
- Data-parallel memory
• FPGA ... (build your own hardware)
25
Super Computer Organisation
26
Locality and Parallelism
• Large memories are slow; fast memories are small.
• Storage hierarchies are large and fast on average.
• Parallel processors, collectively, have large, fast memories -- the slow accesses to "remote" data we call "communication".
• Algorithms should do most work on local data.
[Diagram: two processors, each with its own on-chip cache, L2 cache, L3 cache and memory, joined by potential interconnects.]
27
Memory Hierarchy
• Most programs have a high degree of locality in their accesses
- spatial locality: accessing things near previous accesses
- temporal locality: reusing an item that was previously accessed
• The memory hierarchy tries to exploit locality (see the C sketch below)
[Diagram: processor (registers, datapath, control) -> on-chip cache -> second-level cache (SRAM) -> main memory (DRAM) -> secondary storage (disk) -> tertiary storage (disk/tape).]

Level                         Speed    Size
Registers / on-chip cache     1 ns     100s of bytes
Second-level cache (SRAM)     10 ns    KB
Main memory (DRAM)            100 ns   MB
Secondary storage (disk)      10 ms    GB
Tertiary storage (disk/tape)  10 s     TB
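To make the locality point concrete, here is a small C sketch (not from the slides) contrasting row-order and column-order traversal of the same matrix; the row-order loop walks consecutive addresses and is typically several times faster on cached hardware:

#include <stdio.h>
#include <time.h>

#define N 4096
static double a[N][N];

int main(void) {
    double sum = 0.0;
    clock_t t0 = clock();
    for (int i = 0; i < N; i++)         /* row order: consecutive addresses, */
        for (int j = 0; j < N; j++)     /* good spatial locality             */
            sum += a[i][j];
    clock_t t1 = clock();
    for (int j = 0; j < N; j++)         /* column order: strides of N*8 bytes, */
        for (int i = 0; i < N; i++)     /* a cache miss on almost every access */
            sum += a[i][j];
    clock_t t2 = clock();
    printf("row order: %.2fs, column order: %.2fs (sum=%g)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, sum);
    return 0;
}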
28
Programming model(s) reflecting the new world are called for
• Algorithms should do most work on local data!
• Programmers need to
- make decisions on parallel execution
- know what is local and what is not
- deal with communication
• But how can the poor programmer ensure this?
• She/he has to exploit:
- data parallelism and memory parallelism
- task parallelism and instruction parallelism
• She/he needs programming language constructs to help her/him
29
Domain decomposition
30
Domain decomposition methods
31
Functional decomposition
32
Types of Parallelism
Task Parallelism
- Parallelism explicit in algorithm
- Between filters without producer/consumer relationship
Data Parallelism
- Peel iterations of filter, place within scatter/gather pair (fission)
- parallelize filters with state
Pipeline Parallelism
- Between producers and consumers
- Stateful filters can be parallelized
[Diagram: stream graph with a scatter/gather pair and a task annotation.]
33
Types of Parallelism
Task Parallelism
- Parallelism explicit in algorithm
- Between filters without producer/consumer relationship
Data Parallelism
- Between iterations of a stateless filter
- Place within scatter/gather pair (fission)
- Can't parallelize filters with state
Pipeline Parallelism
- Between producers and consumers
- Stateful filters can be parallelized
[Diagram: stream graph annotated with task, pipeline and data-parallel regions.]
34
Types of Parallelism
Traditionally:
Task Parallelism
- Thread (fork/join) parallelism
Data Parallelism
- Data-parallel loop (forall), as in the C sketch below
Pipeline Parallelism
- Usually exploited in hardware
[Diagram: stream graph annotated with task, pipeline and data-parallel regions.]
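As a minimal sketch (not from the slides) of the two traditional software forms in C: fork/join task parallelism with POSIX threads, and a data-parallel forall loop using OpenMP's parallel for pragma:

#include <pthread.h>
#include <stdio.h>

#define N 1000000
static double data[N];

static void *task(void *arg) {                    /* task parallelism: */
    printf("task %ld running\n", (long)arg);      /* independent work  */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, task, (void *)1L);  /* fork */
    pthread_create(&t2, NULL, task, (void *)2L);
    pthread_join(t1, NULL);                       /* join */
    pthread_join(t2, NULL);

    /* data parallelism: a forall loop; OpenMP splits the iteration
       space across cores (compile with -fopenmp -pthread) */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        data[i] = data[i] * 2.0 + 1.0;
    return 0;
}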
35
New HPCS Languages
• Constructs for expressing data parallelism
- In Chapel, distribution is a separate annotation
- In X10, there is no direct remote access to distributed array data
- In Fortress, user-defined distributed data structures without explicit layout control
- All have a partitioned global address space (PGAS)
• Constructs for expressing task parallelism
- All three languages support atomic blocks
• None of the languages have locks
• (Semantically, locks are more powerful, but harder to manage than atomic sections)
- Other mechanisms
• X10 has "clocks" (barriers with dynamically attached tasks), conditional atomic sections, synchronization variables
• Chapel has "single" (single writer) and "sync" (multiple readers and writers) variables
• Fortress has abortable atomic sections and a mechanism for waiting on individual spawned threads
• Language designers could not resist the temptation to also address some general programming language design issues
GPGPU (the poor man's HPC)
36
CUDA and OpenCL
• CUDA from NVIDIA
- Extension to C for programming NVIDIA GPUs
• OpenCL
- Initiated by Apple
- Developed by the Khronos Group
- Extension to C generalizing CUDA concepts from GPUs to multi-core and Cell processors
37
OpenCL and CUDA
Image from: http://developer.amd.com/zones/OpenCLZone/courses/pages/Introductory-OpenCL-SAAHPC10.aspx
CUDA          OpenCL
Kernel        Kernel
Host program  Host program
Thread        Work item
Block         Work group
Grid          NDRange (index space)
OpenCL and CUDA
In CUDA:

__global__ void vecAdd(float *a, float *b, float *c)
{
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}
OpenCL and CUDA
In OpenCL:

__kernel void vecAdd(__global const float *a,
                     __global const float *b,
                     __global float *c)
{
    int i = get_global_id(0);
    c[i] = a[i] + b[i];
}
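For context, a hedged sketch of the host-side CUDA C code that could launch the vecAdd kernel above (not from the slides; the single-block launch configuration is an assumption matching the kernel's use of threadIdx.x):

#include <cuda_runtime.h>
#include <stdio.h>

#define N 256

int main(void) {
    float ha[N], hb[N], hc[N];
    for (int i = 0; i < N; i++) { ha[i] = (float)i; hb[i] = 2.0f * i; }

    float *da, *db, *dc;
    size_t bytes = N * sizeof(float);
    cudaMalloc((void **)&da, bytes);              /* device allocations */
    cudaMalloc((void **)&db, bytes);
    cudaMalloc((void **)&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    vecAdd<<<1, N>>>(da, db, dc);                 /* one block of N threads */

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("hc[10] = %f\n", hc[10]);              /* expect 30.0 */
    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}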
OpenCL and CUDA

CUDA             OpenCL
Global memory    Global memory
Constant memory  Constant memory
Shared memory    Local memory
Local memory     Private memory
42
Other Programming Language Trends
• Conventional wisdom is out of the window
- Cheaper to re-compute than to store and fetch
• Declarative programming
- Mainly functional programming
- "Why functional programming matters" (again)
• Hardware again influences language design
• Need for correct programs
- eScience: what good is an eScience result if we cannot trust the computational results?
- In the mainstream we need to provide applications that will not let in viruses and worms
- Software used in safety-critical systems
- Software used in (high) finance
- Software used in eGovernment (e.g. online voting)
- Implies lots of work on semantics
43
Source: http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html, January 2011
44
Source: http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html, January 2011
45
Conclusions
• Nothing has changed much
• Main languages have their domain
- Java: for web applications
- C: for system programming
- (Visual) Basic: for desktop Windows apps
- PHP: for server-side scripting
- C++: when Java is (perceived as) too slow
• We shouldn't bother with new languages!

• Wait a minute!
• Something is changing
- Software is getting more and more complex
- Hardware has changed
46
Which languages are discussed?
Source: http://langpop.com
47
Source: http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html
48
Three Trends
• Declarative programming languages are in vogue again
- Especially functional
• Dynamic programming languages are gaining momentum
• Concurrent programming languages are back on the agenda
49
Declarative Programming
• Lots of talk about declarative languages:
- Haskell
- Scheme, Lisp, Clojure
- F#, O'Caml, SML
- Scala, Fortress
• Lots of talk about declarative constructs in traditional languages
- C# (and Java and C++)
50
What do we mean by declarative/functional?
• Say what you want, without saying how
Or as Anders Hejlsberg, inventor of C#, puts it:

"programmers need to talk less about how to do things and more about what they want done and have computers reason it out."
51
Quicksort in C
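The C code from this slide is not in the transcript; a standard in-place C quicksort of the kind typically shown for this comparison might look like this (hypothetical reconstruction, not the slide's exact code):

#include <stdio.h>

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* sort v[lo..hi] in place (Lomuto partition) */
static void qsort_c(int v[], int lo, int hi) {
    if (lo >= hi) return;
    int pivot = v[hi], i = lo;
    for (int j = lo; j < hi; j++)
        if (v[j] < pivot) swap(&v[i++], &v[j]);
    swap(&v[i], &v[hi]);                  /* pivot into final position */
    qsort_c(v, lo, i - 1);
    qsort_c(v, i + 1, hi);
}

int main(void) {
    int v[] = { 5, 1, 4, 2, 3 };
    qsort_c(v, 0, 4);
    for (int i = 0; i < 5; i++) printf("%d ", v[i]);  /* 1 2 3 4 5 */
    printf("\n");
    return 0;
}

Note how much of the code is about how (index manipulation, swapping in place) rather than what, which is the contrast the next slide draws.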
52
Quicksort in Haskell
qsort [] = []
qsort (x:xs) =
    qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)
53
What do we mean by declarative/functional?
• Say what you want, without saying how
- Not quite true; it is more a question of saying how implicitly
54
What do we mean by declarative/functional?
• Say what you want, without saying how
• Functions as first-class entities
• Lazy or/and eager evaluation
• Pure vs. impure
• Value-oriented (vs. state-oriented)
• Pattern matching
• Generics (or parametric polymorphism)
55
Mainstream programming is going declarative
In 2005 Anders Hejlsberg (designer of C#) said:

"Generally speaking, it's interesting to think about more declarative styles of programming vs. imperative styles. ... Functional programming languages and queries are actually a more declarative style of programming."

"Programmers have to unlearn ... and learn to trust that when they're just stating the 'what', the machine is smart enough to do the 'how' the way they want it done, or the most efficient way." - Anders Hejlsberg

http://www.microsoft-watch.com/content/operating_systems/the_father_of_c_on_the_past_present_and_future_of_programming.html
56
Name the language...
Func<intlist, intlist> Sort = xs =>
    xs.Case(
        () => xs,
        (head, tail) => Sort(tail.Where(x => x < head))
                        .Concat(Single(head))
                        .Concat(Sort(tail.Where(x => x >= head))));

• Quicksort revisited in C# 3.0
Callouts on the slide: type inference, append, higher-order function, parameterized type of functions, recursion, filter, lambda expression.
57
C# 3.0 Language Extensions

var contacts =
    from c in customers
    where c.State == "WA"
    select new { c.Name, c.Phone };

var contacts =
    customers
    .Where(c => c.State == "WA")
    .Select(c => new { c.Name, c.Phone });

Features on display: query expressions, extension methods, lambda expressions, object initializers, anonymous types, local variable type inference.
58
C# 3.0 Features
- Implicitly typed local variables
- Lambda expressions
- Anonymous types
- Expression trees
- Query expressions
- Extension methods
- Object initializers
- Collection initializers
- Iterators
- Lazy streams
- Nullable value types
- C# 2.0 already had:
• Generics
• Structured value types
• First-class anonymous functions (called delegates)
59
F#
• A .NET language (developed by Don Syme)
- Connects with all Microsoft foundation technologies
- 3rd official MS language, shipped with VS2010
• Aims to combine the best of Lisp, ML, Scheme and Haskell in the context of .NET
- Actually based on O'Caml
• Functional, math-oriented, scalable
• Aimed particularly at the "Symbolic Programming" niche at Microsoft
60
F# on one slide

let data = (1, 2, 3)                    // NOTE: type inferred:
                                        //   val data : int * int * int
let sqr x = x * x                       //   val sqr : int -> int

let f (x, y, z) = (sqr x, sqr y, sqr z)

let sx, sy, sz = f (10, 20, 30)         // NOTE: pattern matching

print "hello world"; 1 + 2              // NOTE: sequencing

let show x y z =                        // NOTE: local binding, sequencing, return
    printf "x = %d y = %d z = %d \n" x y z;
    let sqrs = f (x, y, z) in
    print "Hello world\n";
    sqrs

let (|>) x f = f x                      // NOTE: pipelining operator

NOTE: parentheses are optional on application
61
Java Future
• Since its launch in 1995, Java has been the darling of industry and academia
• Java is now more than 15 years old
• The pace of language innovation is slowing down
- Java 6 SE released Dec. 2006
- Java 6 EE released Dec. 2009
• Waiting for Java 7 SE / JDK 7
- Work started in 2006
- Forecast Feb. 2010
- Postponed till summer 2011
62
Guy Steele theorizes that programming languages are finite, and argues that the time is right for a successor to Java, which has another two decades of life left. Sun is investigating whether aligning programming languages more closely to traditional mathematical notation can reduce the burden for scientific programmers
"A Conversation With Guy Steele Jr."Dr. Dobb's Journal (04/05) Vol. 30, No. 4, P. 17; Woehr, Jack J.
Guy Steele co-wrote the original Java specifications and in 1996 was awarded the ACM SIGPLAN Programming Language Achievement Award. Steele is a distinguished engineer and principal investigator at Sun Microsystems Laboratories, where he heads the company's Programming Language Research Group.
Beyond Java
63
Fortress
• One of the three languages DARPA spent $1BN on
- Actually SUN only got $49.7M (IBM and CRAY got the rest)
• First-class higher-order functions
• Type inference
• Immutable and mutable variables
• Traits
- Like Java interfaces with code; classes without fields
• Objects
- Consist of fields and methods
• Designed to be parallel unless explicitly sequential
- For loops and generators, tuples
- Transactional memory
- PGAS (Partitioned Global Address Space)
• Runs on top of the JVM
64
"Advances" in Syntax
• Extensible syntax - follows Guy Steele's vision of "Growing a Language"
- The only language I know of with overloadable whitespace!
- Syntax based on Parsing Expression Grammars (PEG)
• Syntax resembling mathematical notation
65
Scala
• Scala is an object-oriented and functional language which is completely interoperable with Java
- Developed by Martin Odersky, EPFL, Lausanne, Switzerland
• Uniform object model
- Everything is an object
- Class-based, single inheritance
- Mixins and traits
- Singleton objects defined directly
• Higher-order and anonymous functions with pattern matching
• Genericity
• Extensible
- All operators are overloadable; function symbols can be prefix, postfix or infix
- New control structures can be defined without using macros
66
Scala is Object Oriented
Scala programs interoperate seamlessly with Java class libraries:
- Method calls
- Field accesses
- Class inheritance
- Interface implementation
all work as in Java. Scala programs compile to JVM bytecodes.

Scala's syntax resembles Java's, but there are also some differences.

object Example1 {
  def main(args: Array[String]) {
    val b = new StringBuilder()
    for (i <- 0 until args.length) {
      if (i > 0) b.append(" ")
      b.append(args(i).toUpperCase)
    }
    Console.println(b.toString)
  }
}

Notes: object instead of static members; var: Type instead of Type var; Scala's version of the extended for loop uses <- (an ASCII alias for the left arrow); arrays are indexed args(i) instead of args[i].
67
Scala is functional

The last program can also be written in a completely different style:
- Treat arrays as instances of general sequence abstractions.
- Use higher-order functions instead of loops.

object Example2 {
  def main(args: Array[String]) {
    println(args map (_.toUpperCase) mkString " ")
  }
}

Notes: arrays are instances of sequences with map and mkString methods; _.toUpperCase is a closure which applies the toUpperCase method to its String argument; map is a method of Array which applies the function on its right to each array element; mkString is a method of Array which forms a string of all elements with a given separator between them.
68
Scala’s approach
• Scala applies Tennent's design principles:
- Concentrate on abstraction and composition capabilities instead of basic language constructs
- Minimal orthogonal set of core language constructs
• But it is European
69
Clojure
• Concurrent Lisp-like language on the JVM
- Developed by Rich Hickey
• Functions are first-class values
• Everything is an expression, except:
- Symbols
- Operations (op ...)
- Special operations:
• def if fn let loop recur do new . throw try set! quote var
• Code is expressed in data structures
- Clojure is homoiconic

Homoiconicity is a property of some programming languages in which the primary representation of programs is also a data structure in a primitive type of the language itself. - Wikipedia
70
Java vs. Clojure
[Side-by-side code comparison; the code is not captured in the transcript.]
71
Dynamic Programming
• Lots of talk about dynamic languages
- PHP, Perl, Ruby
- JavaScript
- Lisp/Scheme
- Erlang
- Groovy
- Clojure
- Python
• Jython for the JVM and IronPython for .NET
• Real programmers don't need types
72
Dynamic Language characteristics
• (Perceived) to be less verbose
- Come with good libraries/frameworks
• Interpreted or JIT-compiled to bytecode
• Eval: string -> code
• REPL-style programming
• Embeddable in larger applications as a scripting language
• Support higher-order functions!
• Object-oriented
- JavaScript, Ruby and Python
- Based on Self and Smalltalk, respectively
• Metaprogramming made easier
75
Dynamic Programming in C# 4.0
- Dynamic lookup
• A new static type called: dynamic
• No static typing of operations with dynamic
• Exceptions on invalid usage at runtime
- Optional and named parameters
- COM interop features
- (Co- and contra-variance)

dynamic d = GetDynamicObject(...);
d.M(7);              // calling methods
d.f = d.P;           // getting and setting fields and properties
d["one"] = d["two"]; // getting and setting through indexers
int i = d + 3;       // calling operators
string s = d(5,7);   // invoking as a delegate
76
Concurrent Programming
• Lots of talk about Erlang
• Fortress, X10 and Chapel
• java.util.concurrent
• Actors in Scala
• Clojure
• C omega
• F# / Axum
• .NET Parallel Extensions
77
The problem with Threads
• Threads
- Program counter
- Own stack
- Shared memory
- Create, start (stop), yield ...
• Locks
- wait, notify, notifyAll
- manually lock and unlock
• or implicitly via synchronized
- lock ordering is a big problem (see the sketch below)
- granularity is important
- not compositional
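A small C/pthreads sketch (not from the slides) of the lock-ordering problem: two threads take the same two mutexes in opposite orders, so each can end up holding one lock while waiting forever for the other:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t b = PTHREAD_MUTEX_INITIALIZER;

static void *t1(void *arg) {
    pthread_mutex_lock(&a);          /* takes a, then b */
    pthread_mutex_lock(&b);
    puts("t1 holds both");
    pthread_mutex_unlock(&b);
    pthread_mutex_unlock(&a);
    return NULL;
}

static void *t2(void *arg) {
    pthread_mutex_lock(&b);          /* takes b, then a: opposite order, */
    pthread_mutex_lock(&a);          /* so t1 and t2 can deadlock        */
    puts("t2 holds both");
    pthread_mutex_unlock(&a);
    pthread_mutex_unlock(&b);
    return NULL;
}

int main(void) {
    pthread_t x, y;
    pthread_create(&x, NULL, t1, NULL);
    pthread_create(&y, NULL, t2, NULL);
    pthread_join(x, NULL);
    pthread_join(y, NULL);
    return 0;
}

Nothing in the program text marks the bug; the fix (a global lock order) is a convention the compiler cannot check, which is one reason locks do not compose.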
78
Several directions
• (Software) Transactional Memory
- Enclose code in begin/end blocks or atomic blocks (a C sketch follows below)
- Variations:
• specify manual abort/retry
• specify an alternate path (a way of controlling manual abort)
- Java STM2 library
- Clojure, Fortress, X10, Chapel
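None of the languages above is C, but GCC's experimental transactional memory extension (-fgnu-tm, available since GCC 4.7) gives a flavour of atomic blocks; a hedged sketch, not taken from the slides:

/* compile with: gcc -fgnu-tm tm.c -o tm */
#include <stdio.h>

static long balance = 100;

static void deposit(long amount) {
    /* the block executes atomically; conflicting transactions
       are rolled back and retried by the runtime */
    __transaction_atomic {
        balance = balance - amount;
        balance = balance + 2 * amount;
    }
}

int main(void) {
    deposit(10);
    printf("balance = %ld\n", balance);   /* 110 */
    return 0;
}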
79
Message Passing / Actors
- Erlang
- Scala Actors
- F#/Axum
- GO!
80
Theoretical Models
• Actors• CSP• CCS• pi-calculus• join-calculus
• All tried and tested in many languages over the years, but …
81
Problems with Actor-like models
• Actors (agents, processes or threads) are not free
• Message sending is not free
• Context switching is not free
• Lock acquire/release is still needed at some level, and it is not free
• Multiple-actor coordination
- reinvent transactions?
- Actors can still deadlock and starve
- The programmer defines granularity by choosing what is an actor
82
Other concurrency models
• Dataflow
- Stream-processing functions; fits nicely with GPUs
• Futures
• Tuple spaces
• Stop-gap solutions based on parallelised libraries
• Lots of experiments with embedded DSLs
• Lots of R&D (again) in this area!!!
83
Other trends worth watching
• Development methods
- Away from waterfall, top-down
- Towards agile/XP/Scrum
- Refactoring
- Frameworks, patterns
- Test-driven development
• Tools
- Powerful IDEs with plug-ins
- Frameworks
- VM and OS integrations
• MS PowerShell, V8 in Android
84
Promises for Programming
• New ways of programming are back on the agenda
• Understanding of HW has (again) become necessary
• Semantics is back on the agenda
- SOS/calculi for Fortress, Scala, F#
- Advanced type systems and type inference
• Program analysis and verification
- JML and Spec# (design by contract)
- SPIN, Blast, UPPAAL
- ProVerif (Microsoft)
- Java PathFinder (NASA, Fujitsu)
- WALA (IBM), etc.
85
Promises for Programming Language Development
• Programming language construction is becoming easier
- Extensible open (source) compilers for most mainstream languages
- ASTs (or expression trees in C#/F#, Fortress and Scala)
- Generic code generators
- VMs and JITs
- Parsing Expression Grammars (PEG)
What about performance?
• Case studies in C, C# and Java by Peter Sestoft
- Matrix multiplication
- A division-intensive series
- Polynomial evaluation
- A statistical function (NORMDIST)
• Execution platforms:
- C: gcc 4.2.1, MacOS
- C#: Microsoft .NET 4.0 and Mono 2.6
- Java: Sun HotSpot server (unfortunately not the IBM JVM)
- Hardware: Intel Core 2 Duo, 2660 MHz
86
[Slides 87-93: performance charts from the case studies; not captured in the transcript.]
Is performance an issue?
• The case studies by Sestoft confirm an older case study of Java for DSP
- On the IBM JVM, Java beat C on all DSP algorithms!
94
DARPA HPCS Language Project
IBM
- X10 (now on version 2.1.2)
CRAY
- Chapel (now on version 1.3.0)
SUN
- Fortress
- Became open source in 2007
These languages were expected to run well on and exploit the HPCS hardware platforms being developed by all vendors
- But it is recognized that, in order to be adopted, any HPCS language will also have to be effective on other parallel architectures.
- (And, ironically, the HPCS hardware will have to run Fortran + MPI programs well.)
"HPCS" languages have been tried before...
The Japanese 5th Generation project had many similarities and near-similarities to the DARPA program:
- 10-year program
- A new parallel language (concurrent logic programming) expressing a new programming model
• The new language was presented as more productive
- Multiple vendors, each implementing hardware support for the language, to address performance problems.
Ada was intended to replace both Fortran and Cobol. It included built-in parallelism. It was successful in its way: many (defense) applications were implemented in Ada, and a large community of Ada programmers was created.
Neither language is with us today. It is not clear whether either even had any influence on our current parallel programming landscape.
Other (new) Programming Languages worth keeping an eye on
• Scala
- Already able to run 1.2 million actors in 1 JVM
- Has a DSL for running on GPUs (ScalaCL)
- Recent grant with Stanford for HPC
• C# 3.0/4.0 or F#
- LINQ and/or Accelerator
• Python/Ruby
- Going from strength to strength
Eh ... nothing changes
• Fortran will be with us for a long time
• C/C++ with OpenMP and MPI will be with us for a long time
• Java: still the most widely used language
- The volatile keyword to ensure memory consistency
- Lots of (distributed) STM libraries
Discussion
• Anybody tempted to use any of these languages?
• Anybody tempted to extend or create new languages for HPC?
Assignment
• Try one of the assignments in
- Fortress, Scala and/or F#
• And for the die-hards:
- Try CUDA and/or OpenCL
- Try ScalaCL and/or F# with Accelerator