CS 498DP Lecture 1: Introduction
TRANSCRIPT
![Page 1: Cs498DP 1 Introduction](https://reader033.vdocument.in/reader033/viewer/2022050916/577ce33a1a28abf1038b9d14/html5/thumbnails/1.jpg)
7/30/2019 Cs498DP 1 Introduction
http://slidepdf.com/reader/full/cs498dp-1-introduction 1/52
CS 498DP 3/4/O: INTRODUCTION TO PARALLEL PROGRAMMING
SPRING 2013
Department of Computer Science
University of Illinois at Urbana-Champaign
Topics covered
• Parallel algorithms
• Parallel programming languages
• Parallel programming techniques, focusing on tuning programs for performance.
• The course will build on your knowledge of algorithms, data structures, and programming. This is an advanced course in Computer Science for CS students.
• This course is a more advanced version of CS420.
Why parallel programming ?
• For science and engineering
  • Science and engineering computations are often lengthy.
  • Parallel machines have more computational power than their sequential counterparts.
  • Faster computing → faster science/design.
  • If fixed resources: better science/engineering.
• For everyday computing
  • Scalable software will get faster with increased parallelism.
  • Better power consumption.
• Yesterday: top-of-the-line machines were parallel.
• Today: parallelism is the norm for all classes of machines, from mobile devices to the fastest machines.
CS498DP3/4/O
• A parallel programming course for Computer Science students.
• Assumes students are proficient programmers with
knowledge of algorithms and data structures.
Course organization
Course website: http://courses.engr.illinois.edu/cs498dp3/
Instructor: David Padua, 4227 SC, 3-4223
Office Hours: TBA
TA: G. Carl Evans

Grading:
  7-10 Machine Problems (MPs)                              40%
  Homework                                                 not graded
  Midterm (Wednesday, Feb 27)                              30%
  Final (comprehensive, 8:00-11:00 AM, Wednesday, May 8)   30%

Graduate students registered for 4 credits must complete additional work (assigned as part of some of the MPs).
MPs
• Several programming models:
  • Sequential (locality)
  • Vector
  • Shared memory
  • Distributed memory
• Common language will be C with extensions.
• Target machines will be:
  • Engineering workstations for development
  • I2PC5 in single-user mode for measurements
Textbook
• Introduction to Parallel Computing by Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar. Addison-Wesley, 2nd edition (January 26, 2003).
Specific topics covered
• Material from the textbook, plus papers on specific topics, including:
  • Locality
  • Vector computing
  • Compiler technology
PARALLEL COMPUTING
An active subdiscipline
• The history of computing is intertwined with parallelism.
• Parallelism has become an extremely active discipline
within Computer Science.
What makes parallelism so important ?
• One reason is its impact on performance.
• For a long time, parallelism was the technology of high-end machines.
• Today it is an important strategy to improve performance for all classes of machines.
Parallelism in hardware
• Parallelism is pervasive. It appears at all levels:
  • Within a processor
    • Basic operations
    • Multiple functional units
    • Pipelining
    • SIMD
  • Multiprocessors
• Multiplicative effect on performance.
Parallelism in hardware (Adders)
• Adders could be serial
• Parallel
• Or highly parallel
Carry lookahead logic
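The lookahead idea can be sketched in software (an illustrative 8-bit model with a hypothetical name, not how hardware is built; real adders evaluate the carry recurrence c[i+1] = g[i] | (p[i] & c[i]) with a parallel-prefix circuit in O(log n) gate delay rather than with this loop):

```c
#include <stdint.h>

/* Carry-lookahead sketch: instead of rippling the carry bit serially
   through a chain of full adders, compute per-bit generate (g) and
   propagate (p) signals, then derive every carry from them. */
uint8_t cla_add(uint8_t a, uint8_t b)
{
    uint8_t g = a & b;   /* bit i generates a carry                 */
    uint8_t p = a ^ b;   /* bit i propagates an incoming carry      */
    uint8_t c = 0;       /* carry into each bit; carry-in is 0      */
    for (int i = 1; i < 8; i++)
        /* c[i] = g[i-1] | (p[i-1] & c[i-1]) */
        c |= (uint8_t)((((g >> (i - 1)) | ((p >> (i - 1)) & (c >> (i - 1)))) & 1) << i);
    return p ^ c;        /* sum bit i = p[i] XOR carry-in to bit i  */
}
```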
Parallelism in hardware
(Scalar vs SIMD array operations)

    for (i=0; i<n; i++)
        c[i] = a[i] + b[i];

Scalar code (executed n times):

    ld  r1, addr1
    ld  r2, addr2
    add r3, r1, r2
    st  r3, addr3

Vector (SIMD) code (executed n/4 times):

    ldv  vr1, addr1
    ldv  vr2, addr2
    addv vr3, vr1, vr2
    stv  vr3, addr3

[Figure: register file feeding an adder, with 32-bit operands X1 and Y1 producing Z1]
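The vector code above can be approximated in portable C, in the spirit of the course's "C with extensions". This sketch assumes GCC/Clang vector extensions (any SIMD notation would do); one vector operation replaces four scalar ones, so the main loop runs n/4 times as on the slide:

```c
#include <stddef.h>

/* 4-wide SIMD add using GCC/Clang vector extensions (illustrative). */
typedef float v4sf __attribute__((vector_size(16)));

void vec_add(const float *a, const float *b, float *c, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {                  /* n/4 vector iterations */
        v4sf va, vb, vc;
        __builtin_memcpy(&va, a + i, sizeof va);  /* ldv vr1, addr1 */
        __builtin_memcpy(&vb, b + i, sizeof vb);  /* ldv vr2, addr2 */
        vc = va + vb;                             /* addv vr3, vr1, vr2 */
        __builtin_memcpy(c + i, &vc, sizeof vc);  /* stv vr3, addr3 */
    }
    for (; i < n; i++)                            /* scalar cleanup */
        c[i] = a[i] + b[i];
}
```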
Parallelism in hardware (Multiprocessors)
• Multiprocessing is the characteristic that is most evident in
clients and high-end machines.
Power (1/3)
• With recent increases in frequency, there was also an increase in energy consumption.
• Power ∝ V² × frequency, and since voltage and frequency depend on each other:

    Power ∝ frequency^2.5
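The 2.5 exponent follows from the proportionality above if voltage scales sublinearly with frequency; with the illustrative assumption V ∝ f^0.75:

```latex
P \propto V^2 f, \qquad V \propto f^{0.75}
\;\Longrightarrow\; P \propto \left(f^{0.75}\right)^2 f = f^{2.5}
```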
Power (2/3)
D. Yen, “Chip multithreading processors enable reliable high throughput computing,”
Keynote speech at International Symposium on Reliability Physics (IRPS), April 2005.
From Pradip Bose. Power Wall. In Encyclopedia of Parallel Computing. Springer-Verlag.
Challenges in Power
• Energy consumption imposes limits at the high end. ("You would need a good-size nuclear power plant next door [for an exascale machine]" P. Kogge)
• It also imposes limits on mobile and other personal devices because of batteries. More processors imply more power (albeit only linear increases?).
• This is a tremendous challenge at both ends of the computing spectrum.
• New architectures
  • Heterogeneous systems
  • No caches
  • Ability to switch off parts of processors
• New hardware technology
Power (3/3)
• At the same time, Moore's Law is still going strong.
• Therefore increased parallelism is possible.
From Wikipedia
Parallelism is the norm
• Despite all limitations, there is much parallelism today and more is coming.
• It is the most effective path toward performance gains.
Clients: Intel microprocessor performance
(Graph from Markus Püschel, ETH)
[Chart annotation: Knights Ferry MIC co-processor]
High-end machines: Top 500 number 1
[Chart: Gflop/s on a logarithmic scale (0.1 to 10^8) from June 1999 to June 2011, showing theoretical peak performance, theoretical peak performance per core, and number of cores]
• How can parallelism be accessed? In increasing degrees of complexity:
  • Applications
  • Programming
    • Libraries
    • Implicitly parallel
    • Explicitly parallel
1. ISSUES IN APPLICATIONS
Applications at the high-end
• Numerous applications have been developed in a wide
range of areas.
• Science
• Engineering
• Search engines
• Experimental AI
• Tuning for performance requires expertise.
• Although additional computing power is expected to help
advances in science and engineering, it is not that simple:
More computational power is only part of the story

• "increase in computing power will need to be accompanied by changes in code architecture to improve the scalability, … and by the recalibration of model physics and overall forecast performance in response to increased spatial resolution" *
• "…there will be an increased need to work toward balanced systems with components that are relatively similar in their parallelizability and scalability".*
• Parallelism is an enabling technology, but much more is needed.

* National Research Council: The Potential Impact of High-End Capability Computing on Four Illustrative Fields of Science and Engineering. 2008.
Applications for clients / mobile devices
• A few cores can be justified to support execution of multiple applications.
• But beyond that, … what app will drive the need for increased parallelism?
• New machines will improve performance by adding cores. Therefore, in the new business model, software scalability is needed to make new machines desirable.
• We need an app that must be executed locally and requires increasing amounts of computation.
• Today, many applications ship computations to servers (e.g. Apple's Siri). Is that the future? Will bandwidth limitations force local computations?
2A. ISSUES IN PARALLEL
PROGRAMMING:
LIBRARIES
Library routines
• Easy access to parallelism. Already available in some libraries (e.g. Intel's MKL).
• Same conventional programming style. Parallel programs would look identical to today's programs, with parallelism encapsulated in library routines.
• But, …
  • Libraries are not always easy to use (data structures), and hence not always used.
  • Locality across invocations is an issue.
  • In fact, composability for performance is not effective today.
2B. ISSUES IN PARALLEL
PROGRAMMING:
IMPLICIT PARALLELISM
Objective: Compiling conventional code

• Since the Illiac IV times:
• "The ILLIAC IV Fortran compiler's Parallelism Analyzer and Synthesizer (mnemonicized as the Paralyzer) detects computations in Fortran DO loops which can be performed in parallel." (*)

(*) David L. Presberg. 1975. The Paralyzer: Ivtran's Parallelism Analyzer and Synthesizer. In Proceedings of the Conference on Programming Languages and Compilers for Parallel and Vector Machines. ACM, New York, NY, USA, 9-16.
Benefits
• Same conventional programming style. Parallel programs
would look identical to today’s programs with parallelism
extracted by the compiler.
• Machine independence.
• Compiler optimizes program.
• Additional benefit: legacy codes.
• Much work in this area over the past 40 years, mainly at universities.
• Pioneered at Illinois in the 1970s.
The technology
• Dependence analysis is the foundation.
• It computes relations between statement instances.
• These relations are used to transform programs:
  • for locality (tiling),
  • parallelism (vectorization, parallelization),
  • communication (message aggregation),
  • reliability (automatic checkpoints),
  • power, …
The technology
Example of use of dependence
• Consider the loop:

    for (i=1; i<n; i++) {
        for (j=1; j<n; j++) {
            a[i][j] = a[i][j-1] + a[i-1][j];
        }
    }
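The dependences the following slides derive by hand can also be observed by instrumenting the loop. This sketch (the function name is illustrative) records which iteration last wrote each element and computes, for every value read, the difference of the two iteration vectors, i.e. the dependence distance:

```c
#define N 4

/* For each value read by  a[i][j] = a[i][j-1] + a[i-1][j] , look up the
   iteration that wrote it; the difference of the reading and writing
   iteration vectors is the dependence distance. Returns 1 if every
   distance observed is (0,1) or (1,0). */
int distances_are_01_and_10(void)
{
    int wi[N][N], wj[N][N];                /* last writer of a[r][c]  */
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++)
            wi[r][c] = wj[r][c] = -1;      /* -1: set before the loop */

    for (int i = 1; i < N; i++)
        for (int j = 1; j < N; j++) {
            int src[2][2] = { { i, j - 1 }, { i - 1, j } };  /* the two reads */
            for (int r = 0; r < 2; r++) {
                int si = wi[src[r][0]][src[r][1]];
                int sj = wj[src[r][0]][src[r][1]];
                if (si < 0)
                    continue;              /* value came from outside the loop */
                int di = i - si, dj = j - sj;
                if (!((di == 0 && dj == 1) || (di == 1 && dj == 0)))
                    return 0;
            }
            wi[i][j] = i;                  /* record this write */
            wj[i][j] = j;
        }
    return 1;
}
```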
The technology
Example of use of dependence

• Compute dependences (part 1)

    for (i=1; i<n; i++) {
        for (j=1; j<n; j++) {
            a[i][j] = a[i][j-1] + a[i-1][j];
        }
    }

    i=1:                                i=2:
    j=1: a[1][1] = a[1][0] + a[0][1]    a[2][1] = a[2][0] + a[1][1]
    j=2: a[1][2] = a[1][1] + a[0][2]    a[2][2] = a[2][1] + a[1][2]
    j=3: a[1][3] = a[1][2] + a[0][3]    a[2][3] = a[2][2] + a[1][3]
    j=4: a[1][4] = a[1][3] + a[0][4]    a[2][4] = a[2][3] + a[1][4]
The technology
Example of use of dependence
• Compute dependences (part 2)

    for (i=1; i<n; i++) {
        for (j=1; j<n; j++) {
            a[i][j] = a[i][j-1] + a[i-1][j];
        }
    }

    i=1:                                i=2:
    j=1: a[1][1] = a[1][0] + a[0][1]    a[2][1] = a[2][0] + a[1][1]
    j=2: a[1][2] = a[1][1] + a[0][2]    a[2][2] = a[2][1] + a[1][2]
    j=3: a[1][3] = a[1][2] + a[0][3]    a[2][3] = a[2][2] + a[1][3]
    j=4: a[1][4] = a[1][3] + a[0][4]    a[2][4] = a[2][3] + a[1][4]
The technology
Example of use of dependence
    for (i=1; i<n; i++) {
        for (j=1; j<n; j++) {
            a[i][j] = a[i][j-1] + a[i-1][j];
        }
    }

[Diagram: the (i, j) iteration space, with j = 1, 2, 3, 4, … on one axis and i = 1, 2, 3, 4 on the other; each iteration has dependence arcs to its neighbors, labeled with the corresponding distance vectors]
The technology
Example of use of dependence

• Find parallelism

    for (i=1; i<n; i++) {
        for (j=1; j<n; j++) {
            a[i][j] = a[i][j-1] + a[i-1][j];
        }
    }
The technology
Example of use of dependence

• Transform the code:

    for (i=1; i<n; i++) {
        for (j=1; j<n; j++) {
            a[i][j] = a[i][j-1] + a[i-1][j];
        }
    }

becomes

    for (k=4; k<2*n; k++)
        forall (i = max(2,k-n) : min(n,k-2))
            a[i][k-i] = ...
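A minimal runnable C sketch of this wavefront (diagonal) schedule, with bounds adjusted to match the original loop's ranges (i, j from 1 to n-1) and unit borders assumed for illustration. The slide's forall is written here as a plain sequential loop; its iterations all lie on one diagonal i+j = k, depend only on diagonal k-1, and could therefore run in parallel:

```c
#define N 6

static int imax(int a, int b) { return a > b ? a : b; }
static int imin(int a, int b) { return a < b ? a : b; }

/* Wavefront schedule for  a[i][j] = a[i][j-1] + a[i-1][j] :
   process diagonals k = i + j in increasing order; the inner loop's
   iterations are mutually independent (the slide's forall). */
void wavefront(int a[N][N])
{
    for (int k = 2; k <= 2 * (N - 1); k++)
        for (int i = imax(1, k - (N - 1)); i <= imin(N - 1, k - 1); i++)
            a[i][k - i] = a[i][k - i - 1] + a[i - 1][k - i];
}
```

With row 0 and column 0 initialized to 1, this recurrence fills a[i][j] with the binomial coefficient C(i+j, i), which makes it easy to check against the original i-then-j loop.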
How well does it work ?
• Depends on three factors:
1. The accuracy of the dependence analysis
2. The set of transformations available to the compiler
3. The sequence of transformations
How well does it work ?
Our focus here is on vectorization.

• Vectorization is important:
  • Vector extensions are of great importance. Easy parallelism. They will continue to evolve.
    • SSE
    • AltiVec
  • Longest experience.
  • Most widely used. All compilers have a vectorization pass (parallelization is less popular).
  • Easier than parallelization/localization.
• Best way to access vector extensions in a portable manner.
  • Alternatives: assembly language or machine-specific macros.
How well does it work ?
Vectorizers - 2005
[Chart: speedups (0 to 4) of manual vectorization vs. ICC 8.0 automatic vectorization on multimedia kernels, including Calculation_of_the_LTP..., Short_Term_Analysis_Filter, Short_Term_Synthesis_Filter, calc_noise2, synth_1to1, jpeg_idct_islow, dist1, fdct, form_component_prediction, idct, IWPixmap::init, persp_textured_triangle, gl_depth_test_span_generic, and mix_mystery_signal]

G. Ren, P. Wu, and D. Padua: An Empirical Study on the Vectorization of Multimedia Applications for Multimedia Extensions. IPDPS 2005.
How well does it work ?
Vectorizers - 2010

S. Maleki, Y. Gao, T. Wong, M. Garzarán, and D. Padua. An Evaluation of Vectorizing Compilers. International Conference on Parallel Architecture and Compilation Techniques (PACT), 2011.
Going forward
• It is a great success story. Practically all compilers today have a vectorization pass (and a parallelization pass).
• But… research in this area stopped a few years back, although all compilers do vectorization and it is a very desirable capability.
• Some researchers thought that the problem was impossible to solve.
• However, the work has not been as extensive nor as long as the work done in AI for chess or question answering.
• No doubt that significant advances are possible.
What next ?
3-10-2011
Inventor, futurist predicts dawn of total artificial intelligence

Brooklyn, New York (VBS.TV) -- ...Computers will be able to improve their own source codes ... in ways we puny humans could never conceive.
2C. ISSUES IN PARALLEL
PROGRAMMING:
EXPLICIT PARALLELISM
Accomplishments of the last decades in programming notation

• Much has been accomplished.
• Widely used parallel programming notations:
  • Distributed memory (SPMD/MPI), and
  • Shared memory (pthreads/OpenMP/TBB/Cilk/ArBB).
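A shared-memory example in the pthreads notation mentioned above can be sketched as follows (the function names, the task struct, and the fixed two-thread split are all illustrative, not a prescribed pattern): each thread sums half of an array, and the partial sums are combined after the joins.

```c
#include <pthread.h>

#define NTHREADS 2

struct task { const int *data; int lo, hi, sum; };

/* Thread body: sum data[lo .. hi-1] into the task's private field. */
static void *partial_sum(void *arg)
{
    struct task *t = arg;
    t->sum = 0;
    for (int i = t->lo; i < t->hi; i++)
        t->sum += t->data[i];
    return NULL;
}

int parallel_sum(const int *data, int n)
{
    pthread_t tid[NTHREADS];
    struct task t[NTHREADS] = {
        { data, 0,     n / 2, 0 },
        { data, n / 2, n,     0 },
    };
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, partial_sum, &t[i]);
    for (int i = 0; i < NTHREADS; i++)   /* wait, then combine */
        pthread_join(tid[i], NULL);
    return t[0].sum + t[1].sum;
}
```

Note how much bookkeeping (thread handles, task structs, explicit join) this "low-level" notation requires compared to, say, an OpenMP reduction clause over the same loop.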
Languages

• OpenMP constitutes an important advance, but its most important contribution was to unify the syntax of the 1980s (Cray, Sequent, Alliant, Convex, IBM, …).
• MPI has been extraordinarily effective.
• Both have mainly been used for numerical computing. Both are widely considered "low level".
The future
• Higher level notations.
• Libraries are a higher-level solution, but perhaps too high-level.
• We want something at a lower level that can still be used to program in parallel.
• The solution is to use abstractions.
Array operations in MATLAB
• An example of abstractions is array operations.
• They are not only appropriate for parallelism, but also better represent computations.
• In fact, the first uses of array operations do not seem to be related to parallelism, e.g. Iverson's APL (ca. 1960). Array operations are also powerful higher-level abstractions for sequential computing.
• Today, MATLAB is a good example of language extensions for vector operations.
Array operations in MATLAB
Matrix addition in scalar mode:

    for i=1:m,
        for j=1:l,
            c(i,j) = a(i,j) + b(i,j);
        end
    end

Matrix addition in array notation:

    c = a + b;