multicore, parallelism, and multithreading
MULTICORE, PARALLELISM, AND MULTITHREADING
By: Eric Boren, Charles Noneman, and Kristen Janick
MULTICORE PROCESSING
Why we care
What is it?
A processor with more than one core on a single chip
Core: An independent system capable of processing instructions and modifying registers and memory
Motivation
Advancements in component technology and circuit optimization now contribute only limited gains to processor speed.
Many CPU-bound applications attempt to do multiple things at once: video editing, multi-agent simulation.
So, use multiple cores to get the work done faster.
Hurdles
Instruction assignment (who does what?)
Mostly delegated to the operating system
Can be done to a small degree through dependency analysis on the chip
Cores must still communicate at times. How?
Shared memory
Message passing
Advantages
Multiple programs:
Can be separated between cores
Other programs don’t suffer when one hogs the CPU
Multi-threaded applications:
Independent threads don’t have to wait as long for each other, resulting in faster overall execution
Versus multiple processors:
Cores on the same chip are closer together than separate processor chips, so communication is faster and a higher maximum clock rate is possible
Less expensive due to smaller overall chip area and shared components (caches, etc.)
Disadvantages
The OS and programs must be optimized for multiple cores, or no gain will be seen
A single-threaded application sees little to no improvement
There is overhead in assigning tasks to cores
The real bottleneck is typically memory and disk access time, which is independent of the number of cores
Amdahl’s Law
The potential performance increase on a parallel computing platform is given by Amdahl’s law. Large problems are made up of parallelizable parts and non-parallelizable parts.
S = 1 / (1 - P)
S = maximum speed-up of the program (with unlimited cores)
P = fraction of the program that is parallelizable
Current State of the Art
Commercial processors:
Most have at least 2 cores
Quad-core is highly popular for desktop applications
6-core processors have recently appeared on the market (Intel’s i7 980X)
8-core processors exist but are less common
Academic and research:
MIT RAW: 16 cores
Intel Polaris: 80 cores
UC Davis AsAP: 36 and 167 cores, individually clocked
PARALLELISM
What is Parallel Computing?
A form of computation in which many calculations are carried out simultaneously.
It operates on the principle that large problems can often be divided into smaller ones, which are solved concurrently.
Types of Parallelism
Bit-level parallelism: increase the processor word size
Instruction-level parallelism: instructions combined into groups and executed together
Data parallelism: distribute data over different computing environments
Task parallelism: distribute threads across different computing environments
Flynn’s Taxonomy
Classifies computer architectures by the number of concurrent instruction streams and data streams: SISD, MISD, SIMD, and MIMD.
Single Instruction, Single Data (SISD)
Provides no parallelism in hardware
1 data stream processed by the CPU in 1 clock cycle
Instructions executed in serial fashion
Multiple Instruction, Single Data (MISD)
Process single data stream using multiple instruction streams simultaneously
More theoretical model than practical model
Single Instruction, Multiple Data (SIMD)
A single instruction stream has the ability to process multiple data streams in 1 clock cycle
Takes operation specified in one instruction and applies it to more than 1 set of data elements at 1 time
Suitable for graphics and image processing
Multiple Instruction, Multiple Data (MIMD)
Different processors can execute different instructions on different pieces of data
Each processor can run an independent task
Automatic parallelization
The goal is to relieve programmers from the tedious and error-prone manual parallelization process.
A parallelizing compiler tries to split up a loop so that its iterations can be executed on separate processors concurrently
It identifies dependences between references; independent actions can operate in parallel
Parallel Programming Languages
Concurrent programming languages, libraries, APIs, and parallel programming models have been created for programming parallel computers.
Parallel languages make it easier to write parallel algorithms.
The resulting code will run more efficiently because the compiler has more information to work with.
It is easier to identify data dependencies, so the runtime system can implicitly schedule independent work.
MULTITHREADING TECHNIQUES
fork()
Make a (nearly) exact duplicate of the process
Good when there is no or almost no need to communicate between processes
Often used for servers
fork()
Each child process receives its own copy of the parent’s globals, heap, and stack.
(Diagram: one parent process and four child processes, each with separate globals, heap, and stack.)
fork()
    pid_t pID = fork();
    if (pID < 0) {
        // fork failed
    } else if (pID == 0) {
        // child
    } else {
        // parent; pID is the child's process ID
    }
POSIX Threads
C library for threading
Available in Linux, OS X
Shared memory
Threads are created and destroyed manually
Has mechanisms for locking memory
POSIX Threads
All threads in a process share the globals and heap; each thread has its own stack.
(Diagram: one process with shared globals and heap, and four threads, each with a separate stack.)
POSIX Threads

    pthread_t thread;
    pthread_create(&thread, NULL, function_to_call, (void*) data);

    // Do stuff

    pthread_join(thread, NULL);
POSIX Threads: a race condition

    int total = 0;

    void do_work() {
        // Do stuff to create "result"
        total = total + result;
    }

Thread 1 reads total (0)
Thread 2 reads total (0)
Thread 1 does its add and saves total (1)
Thread 2 does its add and saves total (1, not 2 — one update is lost)
POSIX Threads: fixing the race with a mutex

    int total = 0;
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

    void do_work() {
        // Do stuff to create "result"
        pthread_mutex_lock(&mutex);
        total = total + result;
        pthread_mutex_unlock(&mutex);
    }
OpenMP
Library and compiler directives for multi-threading
Supported in Visual C++ and gcc
Code compiles even if the compiler doesn’t support OpenMP
Popular in high-performance communities
Easy to add parallelism to existing code
OpenMP: Initialize an Array

    const int array_size = 100000;
    int i, a[array_size];

    #pragma omp parallel for
    for (i = 0; i < array_size; i++) {
        a[i] = 2 * i;
    }
OpenMP: Reduction

    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < array_size; i++) {
        total = total + a[i];
    }
Grand Central Dispatch
Apple technology for multi-threading
The programmer puts work into queues
A central system process determines the number of threads to give to each queue
Code is added to queues using a closure (a “block”)
Currently Mac-only, but open source
Easy to add parallelism to existing code
Grand Central Dispatch: Initialize an Array

    dispatch_apply(array_size, dispatch_get_global_queue(0, 0),
                   ^(size_t i) {
        a[i] = 2 * i;
    });
Grand Central Dispatch: GUI Example

    void analyzeDocument(doc) {
        do_analysis(doc);  // may take a very long time
        update_display();
    }
Grand Central Dispatch: GUI Example, made asynchronous

    void analyzeDocument(doc) {
        dispatch_async(dispatch_get_global_queue(0, 0), ^{
            do_analysis(doc);  // now runs off the main thread
            dispatch_async(dispatch_get_main_queue(), ^{
                update_display();  // UI updates go back to the main queue
            });
        });
    }
Other Technologies
Threading in Java, Python, etc.
MPI – for clusters
QUESTIONS?
Supplemental Reading
Introduction to Parallel Computing
https://computing.llnl.gov/tutorials/parallel_comp/#Abstract
Introduction to Multi-Core Architecture
http://www.intel.com/intelpress/samples/mcp_samplech01.pdf
CPU History: A timeline of microprocessors
http://everything2.com/title/CPU+history%253A+A+timeline+of+microprocessors