PRINCIPLES OF PARALLEL
PROGRAMMING
Calvin Lin
Department of Computer Sciences
The University of Texas at Austin

Lawrence Snyder
Department of Computer Science and Engineering
University of Washington, Seattle
PEARSON
Addison Wesley
Boston San Francisco New York
London Toronto Sydney Tokyo Singapore Madrid
Mexico City Munich Paris Cape Town Hong Kong Montreal
Contents
PART 1 Foundations
Chapter 1 Introduction
The Power and Potential of Parallelism 15
Parallelism, a Familiar Concept 16
Parallelism in Computer Programs 16
Multi-Core Computers, an Opportunity 17
Even More Opportunities to Use Parallel Hardware 18
Parallel Computing versus Distributed Computing 19
System Level Parallelism 20
Convenience of Parallel Abstractions 20
Examining Sequential and Parallel Programs 22
Parallelizing Compilers 22
A Paradigm Shift 22
Parallel Prefix Sum 23
Parallelism Using Multiple Instruction Streams 27
The Concept of a Thread 29
A Multithreaded Solution to Counting 3s 29
The Goals: Scalability and Performance Portability 39
Scalability 39
Performance Portability 40
Principles First 41
Chapter Summary 41
Historical Perspective 42
Exercises 42
Chapter 2 Understanding Parallel Computers 44
Balancing Machine Specifics with Portability 44
A Look at Six Parallel Computers 45
Chip Multiprocessors 45
Symmetric Multiprocessor Architectures 48
Heterogeneous Chip Designs 50
Clusters 53
Supercomputers 54
Observations from Our Six Parallel Computers 57
An Abstraction of a Sequential Computer 58
Applying the RAM Model 58
Evaluating the RAM Model 59
The PRAM: A Parallel Computer Model 60
The CTA: A Practical Parallel Computer Model 61
The CTA Model 61
Communication Latency 63
Properties of the CTA 66
Memory Reference Mechanisms 67
Shared Memory 67
One-Sided Communication 68
Message Passing 68
Memory Consistency Models 69
Programming Models 70
A Closer Look at Communication 71
Applying the CTA Model 72
Chapter Summary 73
Historical Perspective 73
Exercises 73
Chapter 3 Reasoning about Performance 75
Motivation and Basic Concepts 75
Parallelism versus Performance 75
Threads and Processes 76
Latency and Throughput 76
Sources of Performance Loss 78
Overhead 78
Non-Parallelizable Code 79
Contention 81
Idle Time 81
Parallel Structure 82
Dependences 82
Dependences Limit Parallelism 84
Granularity 86
Locality 87
Performance Trade-Offs 87
Communication versus Computation 88
Memory versus Parallelism 89
Overhead versus Parallelism 89
Measuring Performance 91
Execution Time 91
Speedup 92
Superlinear Speedup 92
Efficiency 93
Concerns with Speedup 93
Scaled Speedup versus Fixed-Size Speedup 95
Scalable Performance 95
Scalable Performance Is Difficult to Achieve 95
Implications for Hardware 96
Implications for Software 97
Scaling the Problem Size 97
Chapter Summary 98
Historical Perspective 98
Exercises 99

PART 2 Parallel Abstractions 101

Chapter 4 First Steps Toward Parallel Programming 102
Data and Task Parallelism 102
Definitions 102
Illustrating Data and Task Parallelism 103
The Peril-L Notation 103
Extending C 104
Parallel Threads 104
Synchronization and Coordination 105
Memory Model 106
Synchronized Memory 108
Reduce and Scan 109
The Reduce Abstraction 110
The Count 3s Example 111
Formulating Parallelism 111
Fixed Parallelism 111
Unlimited Parallelism 112
Scalable Parallelism 113
Alphabetizing Example 114
Unlimited Parallelism 115
Fixed Parallelism 116
Scalable Parallelism 118
Comparing the Three Solutions 123
Chapter Summary 124
Historical Perspective 124
Exercises 124
Chapter 5 Scalable Algorithmic Techniques 126
Blocks of Independent Computation 127
Schwartz' Algorithm 129
The Reduce and Scan Abstractions 130
Example of Generalized Reduces and Scans 132
The Basic Structure 133
Structure for Generalized Reduce 136
Example of Components of a Generalized Scan 138
Applying the Generalized Scan 139
Generalized Vector Operations 139
Assigning Work to Processes Statically 140
Block Allocations 142
Overlap Regions 143
Cyclic and Block Cyclic Allocations 146
Irregular Allocations 148
Assigning Work to Processes Dynamically 148
Work Queues 151
Variations of Work Queues 151
Case Study: Concurrent Memory Allocation 153
Trees 153
Allocation by Sub-Tree 154
Dynamic Allocations 155
Chapter Summary 156
Historical Perspective 156
Exercises 157

PART 3 Parallel Programming Languages 159

Chapter 6 Programming with Threads 159
POSIX Threads 159
Thread Creation and Destruction 160
Mutual Exclusion 164
Synchronization 167
Safety Issues 177
Performance Issues 181
Case Study: Successive Over-Relaxation 188
Case Study: Overlapping Synchronization with Computation 193
Case Study: Streaming Computations on a Multi-Core Chip 201
Java Threads 201
Synchronized Methods 203
Synchronized Statements 203
The Count 3s Example 204
Volatile Memory 206
Atomic Objects 206
Lock Objects 207
Executors 207
Concurrent Collections 207
OpenMP 207
The Count 3s Example 208
Semantic Limitations on parallel for 209
Reduction 210
Thread Behavior and Interaction 211
Sections 213
Summary of OpenMP 213
Chapter Summary 214
Historical Perspective 214
Exercises 214
Chapter 7 MPI and Other Local View Languages
MPI: The Message Passing Interface 216
The Count 3s Example 217
Groups and Communicators 225
Point-to-Point Communication 226
Collective Communication 228
Example: Successive Over-Relaxation 233
Performance Issues 236
Safety Issues 242
Partitioned Global Address Space Languages 243
Co-Array Fortran 244
Unified Parallel C 245
Titanium 246
Chapter Summary 247
Historical Perspective 248
Exercises 248

Chapter 8 ZPL and Other Global View Languages 250
The ZPL Programming Language 250
Basic Concepts of ZPL 251
Regions 251
Array Computation 254
Life, an Example 256
The Problem 256
The Solution 256
How It Works 257
The Philosophy of Life 259
Distinguishing Features of ZPL 259
Regions 259
Statement-Level Indexing 259
Restrictions Imposed by Regions 260
Performance Model 260
Addition by Subtraction 261
Manipulating Arrays of Different Ranks 261
Partial Reduce 262
Flooding 263
The Flooding Principle 264
Data Manipulation, an Example 265
Flood Regions 266
Matrix Multiplication 267
Reordering Data with Remap 269
Index Arrays 269
Remap 270
Ordering Example 272
Parallel Execution of ZPL Programs 274
Role of the Compiler 274
Specifying the Number of Processes 275
Assigning Regions to Processes 275
Array Allocation 276
Scalar Allocation 277
Work Assignment 277
Performance Model 278
Applying the Performance Model: Life 279
Applying the Performance Model: SUMMA 280
Summary of the Performance Model 280
NESL Parallel Language 281
Language Concepts 281
Matrix Product Using Nested Parallelism 282
NESL Complexity Model 283
Chapter Summary 283
Historical Perspective 283
Exercises 284

Chapter 9 Assessing the State of the Art 285
Four Important Properties of Parallel Languages 285
Correctness 285
Performance 287
Scalability 288
Portability 288
Evaluating Existing Approaches 289
POSIX Threads 289
Java Threads 290
OpenMP 290
MPI 290
PGAS Languages 291
ZPL 292
NESL 292
Lessons for the Future 293
Hidden Parallelism 293
Transparent Performance 294
Locality 294
Constrained Parallelism 294
Implicit versus Explicit Parallelism 295
Chapter Summary 296
Historical Perspective 296
Exercises 296
PART 4 Looking Forward 297

Chapter 10 Future Directions in Parallel Programming
Attached Processors 298
Graphics Processing Units 299
Cell Processors 302
Attached Processors Summary 302
Grid Computing 304
Transactional Memory 305
Comparison with Locks 306
Implementation Issues 307
Open Research Issues 309
MapReduce 310
Problem Space Promotion 312
Emerging Languages 313
Chapel 314
Fortress 314
X10 316
Chapter Summary 318
Historical Perspective 318
Exercises 318

Chapter 11 Writing Parallel Programs 319
Getting Started 319
Access and Software 319
Hello, World 320
Parallel Programming Recommendations 321
Incremental Development 321
Focus on the Parallel Structure 321
Testing the Parallel Structure 322
Sequential Programming 323
Be Willing to Write Extra Code 323
Controlling Parameters during Testing 324
Functional Debugging 324
Capstone Project Ideas 325
Implementing Existing Parallel Algorithms 325
Competing with Standard Benchmarks 326
Developing New Parallel Computations 327
Performance Measurement 328
Comparing against a Sequential Solution 329
Maintaining a Fair Experimental Setting 329
Understanding Parallel Performance 330
Performance Analysis 331
Experimental Methodology 332
Portability and Tuning 333
Chapter Summary 333
Historical Perspective 333
Exercises 334

Glossary 335
References 339
Index 342