Program Systems Institute, Russian Academy of Sciences
Open TS: an Advanced Tool for Parallel and Distributed Computing
2006-11-20 (Redmond, USA)

Presentation Outline
  Short self-introduction
  Open TS outline
  Few sample programs
  Inside Open TS
  MPI vs Open TS case study
  OpenTS@WinCCS (academic)
  T-System Simplified
  Open TS Gadgets
  Conference rating
  Future work

Short Self-Introduction

Pereslavl-Zalesski
  Russian Golden Ring city: 857 years old
  Hometown of the Great Dukes of Russia
  The first building site of Peter the Great's navy
  Ancient capital of the Russian Orthodox Church

PSI RAS, Pereslavl-Zalesski

Flagship "SKIF K-1000"
  Peak performance: 2.5 Tflops
  Linpack performance: 2.0 Tflops
  Efficiency ratio: 80.1%
  November 2004: the most powerful supercomputer in the ex-USSR
  November 2004: rank 98 in the Top500

T-System History
  Mid-1980s: basic ideas of T-System
  1990s: first implementation of T-System
  2001-2002, "SKIF": GRACE (Graph Reduction Applied to Cluster Environment)
  2003-current, "SKIF", cooperation with Microsoft: Open TS (Open T-system)

Open TS Overview

Comparison: T-System and MPI

              Sequential    Parallel
  High-level  C/Fortran     T-System (a few keywords)
  Low-level   Assembler     MPI (hundred(s) of primitives)

T-System in Comparison
Related work:
  Charm++
  UPC, mpC++
  Glasgow Parallel Haskell
  OMPC++
  Cilk
Open TS differentiators:
  FP-based approach
  Implicit parallelism
  Allows C/C++ based low-level optimization
  Provides both a language and a C++ templates library
  Supports SMP, MPI, PVM, and GRID platforms

Open TS: an Outline
  High-performance computing
  "Automatic dynamic parallelization"
  Combining functional and imperative approaches, high-level parallel programming
  T++ language: "parallel dialect" of C++, an approach popular in the 1990s

T-Approach
  Invocations of "pure" functions (tfunctions) produce grains of parallelism
  A T-program is:
    Functional on the higher level
    Imperative on the low level (optimization)
  C-compatible execution model
  Non-ready variables, multiple assignment
  "Seamless" C-extension (or Fortran-extension)

T++ Keywords
  tfun: T-function
  tval: T-variable
  tptr: T-pointer
  tout: output parameter (like &)
  tdrop: make ready
  twait: wait for readiness
  tct: T-context

Short Introduction (Sample Programs)

Sample Program (C++)

  #include <stdio.h>
  #include <stdlib.h>  /* atoi */

  int fib (int n) {
    return n < 2 ? n : fib(n-1) + fib(n-2);
  }

  int main (int argc, char **argv) {
    if (argc != 2) { printf("Usage: fib <n>\n"); return 1; }
    int n = atoi(argv[1]);
    printf("fib(%d) = %d\n", n, fib(n));
    return 0;
  }

Sample Program (T++)

  #include <stdio.h>
  #include <stdlib.h>  /* atoi */

  tfun int fib (int n) {
    return n < 2 ? n : fib(n-1) + fib(n-2);
  }

  tfun int main (int argc, char **argv) {
    if (argc != 2) { printf("Usage: fib <n>\n"); return 1; }
    int n = atoi(argv[1]);
    printf("fib(%d) = %d\n", n, (int)fib(n));
    return 0;
  }

Sample Program (T++)
  [Chart: Time(%) and CoE versus number of CPU cores]
  WinCCS cluster, 4 nodes
  CPU: AMD Athlon 64 X2 Dual Core Processor 4400+, 2.21 GHz
  Gigabit Ethernet
  time% = time_tapp(N) / time_tapp(1)
  CoE = 1 / (n × time%)

Approximate calculation of Pi (C++)

  #include <math.h>
  #include <stdio.h>
  #include <stdlib.h>

  /* f is defined before isum so that isum can call it */
  double f(double x) {
    return 4/(1+x*x);
  }

  double isum(double begin, double finish, double d) {
    double dl = finish - begin;
    double mid = (begin + finish) / 2;
    if (fabs(dl) > d)
      return isum(begin, mid, d) + isum(mid, finish, d);
    return f(mid) * dl;
  }

  int main(int argc, char* argv[]) {
    unsigned long h;
    double a, b, d, sum;
    if (argc < 2) { return 0; }
    a = 0; b = 1; h = atol(argv[1]);
    d = fabs(b - a) / h;
    sum = isum(a, b, d);
    printf("PI is approximately %15.15lf\n", sum);
    return 0;
  }
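
For reference, the program appears to rely on the standard identity below: `f` is the integrand, and `isum` halves the interval until its length is at most `d`, then applies the midpoint rule to each piece. The symbols a and b here denote the ends of one such piece and are introduced only for this note.

```latex
\pi = \int_0^1 \frac{4}{1+x^2}\,dx = 4\arctan(1),
\qquad
\int_a^b f(x)\,dx \approx f\!\left(\frac{a+b}{2}\right)(b-a)
```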

Approximate calculation of Pi (T++)

  #include <math.h>
  #include <stdio.h>
  #include <stdlib.h>

  /* f is defined before isum so that isum can call it */
  tfun double f(double x) {
    return 4/(1+x*x);
  }

  tfun double isum(double begin, double finish, double d) {
    double dl = finish - begin;
    double mid = (begin + finish) / 2;
    if (fabs(dl) > d)
      return isum(begin, mid, d) + isum(mid, finish, d);
    return (double)f(mid) * dl;
  }

  tfun int main(int argc, char* argv[]) {
    unsigned long h;
    double a, b, d, sum;
    if (argc < 2) { return 0; }
    a = 0; b = 1; h = atol(argv[1]);
    d = fabs(b - a) / h;
    sum = isum(a, b, d);
    printf("PI is approximately %15.15lf\n", sum);
    return 0;
  }

Calculation of Pi (T++)
  [Chart: Time(%) and CoE versus number of CPU cores]
  WinCCS cluster, 4 nodes
  CPU: AMD Athlon 64 X2 Dual Core Processor 4400+, 2.21 GHz
  Gigabit Ethernet
  time% = time_tapp(N) / time_tapp(1)
  CoE = 1 / (n × time%)

Map-Reduce
  ----- Original Message -----
  From: Alexy Maykov
  Sent: Monday, October 02, 2006 11:58 PM
  Subject: MCCS projects
  …
  I work in Microsoft Live Labs … I have several questions below:
  1. How would you implement Map-Reduce in OpenTS?
  …

Map-Reduce (C++)

  #include <vector>
  #include <algorithm>
  #include <functional>
  #include <iostream>
  #include <iterator>  // ostream_iterator
  #include <ctime>
  using namespace std;

  int fib (int n)
  {
    return (n < 2) ? n : fib(n-1) + fib(n-2);
  }

  int plus (int val1, int val2)
  {
    return val1 + val2;
  }

  int main (int argc, char *argv[])
  {
    const int factor = 23;
    const int vector_size = 40;
    vector<int> a, b, c;
    vector<int> fa, fb;

    cout << " Filling vectors..." << endl;
    for (int i = 1; i <= vector_size; i++)
    {
      a.push_back(i % factor);
      b.push_back((vector_size + 1 - i) % factor);
      c.push_back(0);
      fa.push_back(0);
      fb.push_back(0);
    }

    cout << " Mapping..." << endl;
    transform(a.begin(), a.end(), fa.begin(), fib);
    cout << " Mapping..." << endl;
    transform(b.begin(), b.end(), fb.begin(), fib);
    cout << " Reducing..." << endl;
    // ::plus selects the function above rather than std::plus
    transform(fa.begin(), fa.end(), fb.begin(), c.begin(), ::plus);

    cout << endl << " Result: (" ;
    ostream_iterator<int> output(cout, " ");
    copy(c.begin(), c.end(), output);
    cout << "\b)" << endl;
    return 0;
  }

Map-Reduce (C++): notes on the program above
  fib: Fibonacci
  plus: just "plus"
  Five vectors: a, b, fa, fb, c
  Filling vectors:
    a = [ k%23 | k ∈ [1..40] ]
    b = [ (41-k)%23 | k ∈ [1..40] ]
  Transform vectors:
    fa = map fib a
    fb = map fib b
    c = zipWith plus fa fb

Map-Reduce (T++)

  #include <vector>
  #include <algorithm>
  #include <functional>
  #include <iostream>
  #include <iterator>  // ostream_iterator
  #include <ctime>
  using namespace std;

  tfun int fib (int n)
  {
    return (n < 2) ? n : fib(n-1) + fib(n-2);
  }

  tfun int plus (int val1, int val2)
  {
    return val1 + val2;
  }

  tfun int main (int argc, char *argv[])
  {
    const int factor = 23;
    const int vector_size = 40;
    vector<int> a, b, c;
    vector<tval int> fa, fb;  // vectors of T-values

    cout << " Filling vectors..." << endl;
    for (int i = 1; i <= vector_size; i++)
    {
      a.push_back(i % factor);
      b.push_back((vector_size + 1 - i) % factor);
      c.push_back(0);
      fa.push_back(0);
      fb.push_back(0);
    }

    cout << " Mapping..." << endl;
    transform(a.begin(), a.end(), fa.begin(), fib);
    cout << " Mapping..." << endl;
    transform(b.begin(), b.end(), fb.begin(), fib);
    cout << " Reducing..." << endl;
    transform(fa.begin(), fa.end(), fb.begin(), c.begin(), ::plus);

    cout << endl << " Result: (" ;
    ostream_iterator<int> output(cout, " ");
    copy(c.begin(), c.end(), output);
    cout << "\b)" << endl;
    return 0;
  }

Map-Reduce (T++): "Laziness"
  Filling, mapping: all T-functions are invoked, no T-functions are calculated: 0 seconds
  Calculating all T-functions, printing out: 8 seconds

Map-Reduce (T++)
  [Chart: Time(%) and CoE versus number of CPU cores]
  WinCCS cluster, 4 nodes
  CPU: AMD Athlon 64 X2 Dual Core Processor 4400+, 2.21 GHz
  Gigabit Ethernet
  time% = time_tapp(N) / time_tapp(1)
  CoE = 1 / (n × time%)

Inside OpenTS

Open TS: Environment
  Supports more than 1,000,000 threads per core

Supermemory
  Utilization: non-ready values, resource and status information, etc.
  Object-Oriented Distributed Shared Memory (OO DSM)
  Global address space
  DSM-cell versioning
  On top: automatic garbage collection

Multithreading & Communications
  Lightweight threads
    PIXELS (1 000 000 threads)
  Asynchronous communications
    A thread "A" asks for a non-ready value (or a new job)
    Asynchronous request sent: active messages and signals are delivered over the network to stimulate data transfer to the thread "A"
    Context switches (including a quantum for communications)
  Latency hiding for node-node exchange

Open TS applications (selected)

MultiGen
Chelyabinsk State University
  Multi-conformation model
  [Figure: tree of conformations. Level 0: K0. Level 1: K11, K12. Level 2: K21, K22]

MultiGen: Speedup

  Substance        Atom    Rotations  Conformers  Execution time (min:s)
                   number  number                 1 node   4 nodes  16 nodes
  NCI-609067       28      4          13          9:33     3:21     1:22
  TOSLAB A2-0261   82      18         49          115:27   39:23    16:09
  NCI-641295       126     25         74          266:19   95:57    34:48

  NCI-609067: National Cancer Institute (USA), Reg. No. NCI-609067 (AIDS drug lead)
  TOSLAB A2-0261: TOSLAB company (Russia-Belgium), Reg. No. TOSLAB A2-0261 (antiphlogistic drug lead)
  NCI-641295: National Cancer Institute (USA), Reg. No. NCI-641295 (AIDS drug lead)

Aeromechanics
Institute of Mechanics, MSU

Belocerkovski's approach
  The flow is presented as a collection of small elementary whirlwinds (colours: clockwise and counter-clockwise rotation)

Creating space-borne radar image from hologram
  Space Research Institute development

Simulating broadband radar signal
  Graphical User Interface
  Non-PSI RAS development team (Space Research Institute of Khrunichev corp.)
  [Chart: two plots with axis ticks 0 to 300 and 1 to 28; axis labels not preserved]

Landsat Image Classification
  Computational "web-service"

Open TS vs MPI case study

Applications
  Popular and widely used
  Developed by independent teams (MPI experts)
  PovRay: Persistence of Vision Ray-tracer, enabled for parallel run by a patch
  ALCMD/MP_lite: molecular dynamics package (Ames Lab)

T-PovRay vs MPI PovRay: code complexity

  Program                                         Source code volume
  MPI modules for PovRay 3.10g                    1,500 lines
  MPI patch for PovRay 3.50c                      3,000 lines
  T++ modules (for both versions 3.10g & 3.50c)   200 lines

  The T++ code is ~7-15 times smaller.

T-PovRay vs MPI PovRay: performance
  [Chart: Time MPI / Time OpenTS (90% to 210%) versus number of processors (1 to 16)]
  2 CPUs AMD Opteron 248 2.2 GHz, RAM 4 GB, GigE, LAM 7.1.1
48
ALCMD/MPI vs ALCMD/OpenTS
MP_Lite component of ALCMD rewritten in T++
Fortran code is left intact
[Diagram: two software stacks. Original: ALCMD on MP_Lite on MPI. Ported: ALCMD on MP_Lite on OpenTS]
49
ALCMD/MPI vs ALCMD/OpenTS: code complexity
Program | Source code volume
MP_Lite total / MPI | ~20,000 lines
MP_Lite, ALCMD-related / MPI | ~3,500 lines
MP_Lite, ALCMD-related / OpenTS | 500 lines
The OpenTS code is ~7 times smaller
50
ALCMD/MPI vs ALCMD/OpenTS: performance
[Chart: Time MPI / Time OpenTS (80% to 110%) vs. number of processors (1 to 16)]
16 dual Athlon 1800, AMD Athlon MP 1800+, RAM 1 GB, FastEthernet, LAM 7.0.6, Lennard-Jones MD, 512000 atoms
51
ALCMD/MPI vs ALCMD/OpenTS: performance
[Chart: Time MPI / Time OpenTS (80% to 110%) vs. number of processors (1 to 16)]
2 CPUs AMD Opteron 248 2.2 GHz, RAM 4 GB, GigE, LAM 7.1.1, Lennard-Jones MD, 512000 atoms
52
ALCMD/MPI vs ALCMD/OpenTS: performance
[Chart: Time MPI / Time OpenTS (80% to 110%) vs. number of processors (1 to 16)]
2 CPUs AMD Opteron 248 2.2 GHz, RAM 4 GB, InfiniBand, MVAPICH 0.9.4, Lennard-Jones MD, 512000 atoms
53
ALCMD/MPI vs ALCMD/OpenTS: performance
[Chart: Time MPI / Time OpenTS (80% to 110%) vs. number of processors (1 to 16)]
2 CPUs AMD Opteron 248 2.2 GHz, RAM 4 GB, GigE, LAM 7.1.1, Lennard-Jones MD, 512000 atoms
54
ALCMD/MPI vs ALCMD/OpenTS: performance
[Chart: Time MPI / Time OpenTS (80% to 110%) vs. number of processors (1 to 16)]
2 CPUs AMD Opteron 248 2.2 GHz, RAM 4 GB, InfiniBand, MVAPICH 0.9.4, Lennard-Jones MD, 512000 atoms
55
Porting OpenTS to MS Windows CCS
56
2006: contract with Microsoft "Porting OpenTS to Windows Compute Cluster Server"
OpenTS@WinCCS
  inherits all basic features of the original Linux version
  is available under FreeBSD license
  does not require any commercial compiler for T-program development: it is enough to install Visual C++ 2005 Express Edition (available for free on the Microsoft website) and PSDK
57
OpenTS@WinCCS
AMD64 and x86 platforms are currently supported
Integration into Microsoft Visual Studio 2005
Two ways for building T-applications: command line and Visual Studio IDE
An installer of OpenTS for Windows XP/2003/WCCS
Installation of WCCS SDK (including MS-MPI), if necessary
OpenTS self-testing procedure
58
Installer of OpenTS for Windows XP/2003/WCCS
59
OpenTS integration into Microsoft Visual Studio 2005
60
T++ demo applications
POVRay and ALCMD were ported to Windows
Benchmark testing
  Both Windows and Linux were tested
  Same hardware used
  Same OpenTS kernel source code used (cross-platform academic OpenTS microkernel)
  Same applications (POVRay and ALCMD) source code used for Windows and Linux
61
Benchmark notations
time(N): execution time (in seconds) of the T++ demo, where N is the number of CPU cores
time_c: execution time of the C implementation (in seconds, one CPU core used)
time%(N) = time(N) / time_c
CoE = 1 / (N * time%(N)): coefficient of efficiency
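As a rough illustration of the notation above, the two ratios can be computed as follows. This is a minimal C++ sketch: the function names `time_percent` and `coe` are hypothetical and are not part of OpenTS or of the benchmark itself.

```cpp
#include <cassert>

// time%(N): run time of the T++ demo on N cores, relative to the
// single-core C implementation (time_c).
double time_percent(double time_n, double time_c) {
    assert(time_c > 0.0);
    return time_n / time_c;
}

// CoE = 1 / (N * time%(N)). A value of 1.0 means the T++ demo on N cores
// is exactly N times faster than the C version on one core.
double coe(int n_cores, double time_n, double time_c) {
    assert(n_cores > 0 && time_n > 0.0);
    return 1.0 / (n_cores * time_percent(time_n, time_c));
}
```

With made-up timings time_c = 100 s and time(4) = 25 s, time%(4) = 0.25 and CoE = 1 / (4 * 0.25) = 1.0. The same 25 s on 8 cores would give CoE = 0.5.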
62
POVRay time(%)
[Chart: Time(%) (0% to 120%) vs. CPUs (1 to 8); series: Linux MPI, Linux OpenTS, Windows MPI, Windows OpenTS]
63
POVRay CoE
[Chart: CoE (0% to 120%) vs. CPUs (1 to 8); series: Linux MPI, Linux OpenTS, Windows MPI, Windows OpenTS]
64
Performance issues
Academic OpenTS:
  CoE for POVRay is decreasing (as well as for Fib, Pi, ...)
  Reason (proof: next slide): asynchronous communications unsupported in academic OpenTS
  Possible subject for future work
65
T-POVRay: time%(sync) / time%(async)
[Chart: Time Sync / Time Async (60% to 220%) vs. CPUs (0 to 9)]
66
ALCMD time(%)
[Chart: Time(%) (0% to 100%) vs. CPUs (1 to 8); series: Linux MPI, Linux OpenTS, Windows MPI, Windows OpenTS]
67
ALCMD CoE
[Chart: CoE (30% to 120%) vs. CPUs (1 to 8); series: Linux MPI, Linux OpenTS, Windows MPI, Windows OpenTS]
68
Open TS "Gadgets"
69
Web-services, Live documents

    tfun int fib (int n) {
      return n < 2 ? n : fib(n-1)+fib(n-2);
    }

twsgen Perl script

    <operation name="wstfib">
      <SOAP:operation style="rpc" soapAction=""/>
      <input>
        <SOAP:body use="encoded" namespace="urn:myservice"
                   encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"/>
      </input>
      <output>
        <SOAP:body use="encoded" namespace="urn:myservice"
                   encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"/>
      </output>
    </operation>
70
Trace visualizer
Collect trace of T-program execution
Visualize performance metrics of OpenTS runtime
71
Fault-tolerance
Recalculation-based fault-tolerance
  (+) Very simple (in comparison with full transactional model)
  (+) Efficient (only a minimal set of damaged functions is recalculated)
  (–) Applicable only for functional programs
Fault-tolerant communications needed (e.g. DMPI v1.0)
Implemented (experimental version on Linux)
Subject for future work for OpenTS@WinCCS
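The recalculation idea can be sketched in plain C++. This is only an illustration under stated assumptions, not the OpenTS mechanism: `NodeFailure`, `run_with_recalculation` and `demo_recalculation` are hypothetical names, and a lost node is simulated by an exception.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for "the node computing this function was lost".
struct NodeFailure : std::runtime_error {
    NodeFailure() : std::runtime_error("node failure") {}
};

// Run a side-effect-free task; if its result is lost, simply run it again.
// This is safe only because a pure function returns the same value each time.
template <typename T>
T run_with_recalculation(const std::function<T()>& task, int max_attempts) {
    assert(max_attempts > 0);
    for (int attempt = 1; ; ++attempt) {
        try {
            return task();
        } catch (const NodeFailure&) {
            if (attempt == max_attempts) throw;  // give up after max_attempts
        }
    }
}

// Demo: a task whose first `failures` runs are lost and which then returns 42.
// The counter exists only to simulate failures; the computed value is pure.
int demo_recalculation(int failures, int max_attempts) {
    std::function<int()> task = [&failures]() -> int {
        if (failures > 0) { --failures; throw NodeFailure(); }
        return 42;
    };
    return run_with_recalculation<int>(task, max_attempts);
}
```

Only the failed call is repeated; callers that already hold results are untouched, which is the "minimal set of damaged functions" property in miniature.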
72
Some other Gadgets
Other T-languages: T-Refal, T-Fortran
Memoization
Automatically choosing between call-style and fork-style of function invocation
Checkpointing
Heartbeat mechanism
Flavours of data references: "normal", "glue" and "magnetic", giving lazy, eager and ultra-eager (speculative) data transfer
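Memoization pays off for the same reason recalculation is safe: a function without side effects always returns the same value for the same argument, so results can be cached. A minimal C++ sketch follows, using an explicit cache rather than whatever the OpenTS runtime does internally; `fib_memo` and `fib` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Fibonacci with a cache of already computed results, keyed by the argument.
std::uint64_t fib_memo(unsigned n,
                       std::unordered_map<unsigned, std::uint64_t>& cache) {
    assert(n <= 93);  // fib(94) no longer fits into 64 bits
    if (n < 2) return n;
    auto it = cache.find(n);
    if (it != cache.end()) return it->second;  // reuse an earlier result
    std::uint64_t value = fib_memo(n - 1, cache) + fib_memo(n - 2, cache);
    cache.emplace(n, value);
    return value;
}

// Convenience wrapper with a fresh cache per top-level call.
std::uint64_t fib(unsigned n) {
    std::unordered_map<unsigned, std::uint64_t> cache;
    return fib_memo(n, cache);
}
```

With the cache, each fib(k) is computed once, so the number of additions grows linearly with n instead of exponentially.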
73
T-System "Simplified"
74
T-Sim library
C++ templates, spun off from Open TS
Simplistic implementation
  no light-weight threads (NPTL threads are used instead)
  no multiple-assignment variables
Features
  XML-RPC for WANs, MPI for LAN, meta-cluster support
  compatible load-balancing model
  scheduler template library
75
T-Sim vs Open TS
Feature | Open TS | T-Sim
Language | T++: C++ extension; compiler (GCC), converter (Windows) | C++: static library
Data transfer | Dynamic MPI (multiple implementations support), TCP | XML-RPC, MPI (experimental)
Serialization | |
Synchronization | Non-ready variables, multiple assignment | Non-ready variables, single assignment
Granule of parallelism | T-functions: lightweight, non-preemptive threads | C++ "binders" (or closures), started in a separate OS-level thread (NPTL)
Memory management | Distributed reference count | User-level
Scheduler | Dynamic load-balancing, plug-ins mechanism | C++ templates: strategies, "lego" to construct app-specific schedulers
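The "non-ready variable, single assignment" row can be approximated with the C++ standard library. The sketch below is an assumption-laden analogue, not T-Sim's `TVal`: the class name `SingleAssignment` is hypothetical, and it is built on `std::promise` and `std::shared_future`.

```cpp
#include <chrono>
#include <future>

// A write-once cell. Until set() is called the value is "non-ready" and
// get() blocks the reading thread; after that every reader sees the value.
template <typename T>
class SingleAssignment {
public:
    SingleAssignment() : future_(promise_.get_future().share()) {}

    // The single assignment; a second call throws std::future_error.
    void set(const T& value) { promise_.set_value(value); }

    // Blocks while the value is non-ready.
    T get() const { return future_.get(); }

    // Non-blocking readiness check.
    bool ready() const {
        return future_.wait_for(std::chrono::seconds(0)) ==
               std::future_status::ready;
    }

private:
    std::promise<T> promise_;        // declared first, so constructed first
    std::shared_future<T> future_;
};
```

In a parallel program the producer would typically call `set` from another thread while consumers block in `get`. Multiple assignment, as in the Open TS column, has no direct counterpart here.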
76
T-Sim: sample program

    typedef TVal<int> TInt;

    TSIM_TFUNDEF_2(fib,int,TInt,TFib)
    void fib(int in, TInt __out)
    {
      int out;
      if (in < 2) {
        out = in;
      }
      else {
        TInt o1, o2;
        TFib(in-1, o1);
        TFib(in-2, o2);
        out = o1 + o2;
        o1.release();
        o2.release();
      }
      __out = out;
      return;
    }

    int main (int argc, char *argv[])
    {
      int t, res;
      TSimRuntime rt;
      TInt _res;
      if (argc < 2) t = 10;
      else t = atoi(argv[1]);
      fib(t, _res);
      res = _res;
      _res.release();
      cerr << "The FIB " << t << "th is " << res << endl;
      return 0;
    }
77
Important Features
Scheduler "lego"
  "Strategies" of task distribution: round-robin, on data location, on CPU power available, etc.
  Resource gathering pluggable (static/dynamic implemented)
Map/Reduce implementation exists
Active messages template
Still experimental
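The scheduler "lego" idea, where a scheduler is assembled from interchangeable strategy templates, might look roughly like this in C++. All names here (`Node`, `RoundRobin`, `MostCpuAvailable`, `Scheduler`, `assign_tasks`) are hypothetical and do not reflect the actual T-Sim interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative node description.
struct Node {
    double cpu_power;  // relative speed
    int queued;        // tasks already assigned
};

// Strategy 1: round-robin over the nodes.
struct RoundRobin {
    std::size_t next = 0;
    std::size_t pick(const std::vector<Node>& nodes) {
        std::size_t chosen = next;
        next = (next + 1) % nodes.size();
        return chosen;
    }
};

// Strategy 2: the node with the most CPU power per (queued + 1) tasks.
struct MostCpuAvailable {
    std::size_t pick(const std::vector<Node>& nodes) {
        std::size_t best = 0;
        for (std::size_t i = 1; i < nodes.size(); ++i) {
            double a = nodes[i].cpu_power / (nodes[i].queued + 1);
            double b = nodes[best].cpu_power / (nodes[best].queued + 1);
            if (a > b) best = i;
        }
        return best;
    }
};

// The scheduler is assembled from a strategy at compile time.
template <typename Strategy>
class Scheduler {
public:
    explicit Scheduler(std::vector<Node> nodes) : nodes_(std::move(nodes)) {
        assert(!nodes_.empty());
    }
    // Assign one task and return the index of the chosen node.
    std::size_t submit() {
        std::size_t i = strategy_.pick(nodes_);
        ++nodes_[i].queued;
        return i;
    }
private:
    std::vector<Node> nodes_;
    Strategy strategy_;
};

// Usage helper: distribute `tasks` tasks and record where each one went.
template <typename Strategy>
std::vector<std::size_t> assign_tasks(std::vector<Node> nodes, int tasks) {
    Scheduler<Strategy> scheduler(std::move(nodes));
    std::vector<std::size_t> placement;
    for (int t = 0; t < tasks; ++t) placement.push_back(scheduler.submit());
    return placement;
}
```

Swapping `Scheduler<RoundRobin>` for `Scheduler<MostCpuAvailable>` changes the distribution policy without touching the scheduler code, which is the point of the strategy-template design.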
78
"Cooperative (or Conference) Rating" Project
79
Project Outline
Goals:
  Familiarize with Open TS@WinCCS
  Demonstrate programming techniques: safe use of functions with side effects, "monotonic" global object
Branch-and-Bound search for a shortest Hamilton path in a complete graph
Two developers (Alexander & Sergey)
Timeframe: 13-16 November (at SC06, Microsoft booth, in background)
80
Algorithm
"Conference": experts reviewing papers
Each expert provides an order of papers (A better than B)
Find an order that minimizes conflicts
Algorithm: recursion
  Check if the current cost is greater than the current record
  If it is not, add another node
  Start from an empty order
Static Global Monotonic Object
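The recursion and the monotonic record can be sketched in standard C++. This is a sequential illustration under assumptions, not the project's T++ code: the input format `w[a][b]` (number of experts ranking paper a above paper b) and all function names are hypothetical. The record is a `std::atomic<int>` that is only ever lowered, which is what makes concurrent updates harmless.

```cpp
#include <atomic>
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// w[a][b] = number of experts who rank paper a above paper b (assumed format).
using Prefs = std::vector<std::vector<int>>;

// Monotonic global record: the best (lowest) cost found so far.
std::atomic<int> g_record{INT_MAX};

// Lower the record to `cost` if `cost` is better; never raises it.
void update_record(int cost) {
    assert(cost >= 0);
    int seen = g_record.load();
    while (cost < seen && !g_record.compare_exchange_weak(seen, cost)) {
        // `seen` was refreshed by compare_exchange_weak; retry if still better.
    }
}

// Extend the partial order by one paper at a time, cutting branches whose
// cost has already reached the current record.
void search(const Prefs& w, std::vector<bool>& placed,
            std::size_t count, int cost) {
    if (cost >= g_record.load()) return;                  // bound
    if (count == w.size()) { update_record(cost); return; }
    for (std::size_t x = 0; x < w.size(); ++x) {
        if (placed[x]) continue;
        int extra = 0;  // experts who wanted x above an already placed paper
        for (std::size_t y = 0; y < w.size(); ++y)
            if (placed[y]) extra += w[x][y];
        placed[x] = true;
        search(w, placed, count + 1, cost + extra);
        placed[x] = false;
    }
}

// Start from an empty order and return the minimal number of conflicts.
int min_conflicts(const Prefs& w) {
    g_record.store(INT_MAX);
    std::vector<bool> placed(w.size(), false);
    search(w, placed, 0, 0);
    return g_record.load();
}
```

Because the record only moves in one direction, a stale local copy can at worst delay a cut, never produce a wrong answer. That is why such an object stays safe even though updating it is a side effect.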
81
Code Analysis
C version
  File input: 63 lines of code
  Algorithm implementation: 98
  Global variable to store record value
T++ version
  File input: 63 (same)
  Algorithm implementation: 165 = 98 + 67
  Record update (67): start function that updates local copies of global monotonic objects on each node
Efficient support of global monotonic objects needed: possible future work
82
Proposal For Future Cooperation with Microsoft
83
Open TS "Windows"-ization
More efficient use of the Windows API
Asynchronous communications support
SMP mode
T-program trace visualizer
Generating web-services for T-functions
DMPI
Fault tolerance for T++ applications
Different schedulers
In future: OpenTS/.NET (T#)
84
Templates and Skeletons
Development in collaboration with interested MS teams
  Gather requirements
  PSI RAS implementation
  Result: generic parallel solutions
  Map-reduce as the first candidate
C++ templates for using the OpenTS kernel without the (T++ → C++) converter
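A map-reduce skeleton written as a C++ template could take roughly the following shape. This is a sequential sketch under assumptions, not an OpenTS interface: the name `map_reduce`, its parameters and the chunking scheme are illustrative, and `reduce_fn` is assumed to be associative with `identity` as its neutral element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Generic map-reduce skeleton. The input is split into `chunks` pieces; each
// piece is mapped and reduced independently, then the partial results are
// combined. The per-chunk loop body is the unit a parallel runtime could
// execute concurrently; here it simply runs in order.
template <typename In, typename Out, typename MapFn, typename ReduceFn>
Out map_reduce(const std::vector<In>& input, Out identity,
               MapFn map_fn, ReduceFn reduce_fn, std::size_t chunks) {
    assert(chunks > 0);
    std::vector<Out> partial(chunks, identity);
    for (std::size_t c = 0; c < chunks; ++c) {
        std::size_t begin = input.size() * c / chunks;
        std::size_t end = input.size() * (c + 1) / chunks;
        for (std::size_t i = begin; i < end; ++i)
            partial[c] = reduce_fn(partial[c], map_fn(input[i]));
    }
    Out result = identity;
    for (const Out& p : partial) result = reduce_fn(result, p);
    return result;
}

// Usage example: sum of squares.
int sum_of_squares(const std::vector<int>& values, std::size_t chunks) {
    return map_reduce(values, 0,
                      [](int x) { return x * x; },
                      [](int a, int b) { return a + b; },
                      chunks);
}
```

The user supplies only the two small functions; how chunks are distributed is left to the skeleton, which is what makes it a candidate for a generic parallel solution.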
85
THANKS
ANY QUESTIONS ???
[email protected]@opents.nett