parallel processing in python - cosmos

PARALLEL PROCESSING IN PYTHON COSMOS - 1/28/2020 BY JOSEPH KREADY




LAYOUT

What is Parallel Processing

History of Parallel Computation

Parallel Processing and Python

Google Colab Example


SERIAL PROCESSING VS. PARALLEL PROCESSING

Serial Processing: One task at a time, each starting only after the previous one finishes

Parallel Processing: Multiple tasks at a time, running simultaneously
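A minimal sketch of the difference in Python, assuming a hypothetical CPU-bound task `work` (a sum of square roots). The names `work`, `run_serial`, and `run_parallel` are illustrative and not from the talk; `multiprocessing.Pool` is the standard-library process pool. Actual timings depend on the machine and its core count.

```python
import math
import time
from multiprocessing import Pool


def work(n):
    """Hypothetical CPU-bound task: sum of square roots of 0..n-1."""
    return sum(math.sqrt(i) for i in range(n))


def run_serial(tasks):
    # Serial processing: each task starts only after the previous one finishes.
    return [work(n) for n in tasks]


def run_parallel(tasks, processes=4):
    # Parallel processing: a pool of worker processes runs several tasks at once.
    with Pool(processes=processes) as pool:
        return pool.map(work, tasks)


if __name__ == "__main__":
    tasks = [1_000_000] * 8
    for runner in (run_serial, run_parallel):
        start = time.perf_counter()
        runner(tasks)
        print(f"{runner.__name__}: {time.perf_counter() - start:.2f}s")
```

The `if __name__ == "__main__":` guard matters: on platforms that start workers by re-importing the main module (the "spawn" start method), code outside the guard would run again in every worker.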


CPU ARCHITECTURE

Serial Computation


FREQUENCY SCALING

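The main method used for improving computer performance from the 1980s to the 2000s: raise the processor's clock frequency

Frequency Scaling Equation: Power consumption = capacitance * voltage^2 * frequency

Power consumption grows with frequency (and with the square of voltage); the resulting increase in power and heat led to the demise of frequency scaling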
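The equation can be explored numerically. This is a minimal sketch with hypothetical, normalized values (1.0 stands for a baseline chip); the 20% voltage increase is an illustrative assumption, not a figure from the talk.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Power consumption = capacitance * voltage^2 * frequency."""
    return capacitance * voltage ** 2 * frequency


if __name__ == "__main__":
    # Hypothetical, normalized values: 1.0 stands for a baseline chip.
    base = dynamic_power(1.0, 1.0, 1.0)
    print(dynamic_power(1.0, 1.0, 2.0) / base)  # 2.0: double the frequency, double the power
    print(dynamic_power(1.0, 1.2, 1.0) / base)  # ~1.44: 20% more voltage, ~44% more power
    print(dynamic_power(1.0, 1.2, 2.0) / base)  # ~2.88: both changes together
```

Because voltage enters squared, any clock increase that also needs a voltage increase costs disproportionately more power.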


SUPERSCALAR ARCHITECTURE

Executes more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor.


MULTI-CORE PROCESSORS

Each CPU is made of independent ‘Cores’ that can access the same memory concurrently

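Moore’s Law: The number of transistors on a chip doubles roughly every 18-24 months; with frequency scaling stalled, many of those extra transistors now go into additional cores per processor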

Operating systems ensure programs run on available cores, but developers must design their programs to take advantage of parallel processing.
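Before designing for parallelism, a program can ask how many cores it may use. A minimal sketch using the standard `os` module; `usable_cores` is a hypothetical helper name. `os.cpu_count()` reports logical cores (hyper-threads included), while `os.sched_getaffinity` exists only on some platforms (e.g. Linux) and respects affinity limits.

```python
import os


def usable_cores():
    """Number of CPU cores this process may run on (falls back to the total count)."""
    if hasattr(os, "sched_getaffinity"):  # Linux: respects CPU affinity limits
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1  # os.cpu_count() can return None


if __name__ == "__main__":
    print("logical cores:", os.cpu_count())
    print("usable by this process:", usable_cores())
```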


PROCESS VS. THREAD VS. MULTI-THREADING VS. HYPER-THREADING

• A process contains one or more threads

• Threads of the same process share memory.

• Processes run in separate memory.

• Hyper-threading: lets 1 physical CPU core run 2 threads at once (the operating system sees 2 logical cores)

• Multi-threading: multiple instruction streams operate on separate data in parallel
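The shared-versus-separate memory distinction can be observed directly. In this minimal sketch (all names are hypothetical), each worker appends to a module-level list: threads modify the parent's list, while each process modifies only its own copy.

```python
import multiprocessing
import threading

# A module-level list that every worker tries to modify.
shared = []


def append_item(item):
    """Append to the module-level list in whatever memory space we run in."""
    shared.append(item)


def run_in_threads(n):
    """Threads of one process share memory: all appends land in this list."""
    shared.clear()
    threads = [threading.Thread(target=append_item, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(shared)


def run_in_processes(n):
    """Each process appends to its own copy of `shared`; the parent's list is untouched."""
    shared.clear()
    procs = [multiprocessing.Process(target=append_item, args=(i,)) for i in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return len(shared)


if __name__ == "__main__":
    print("after threads:", run_in_threads(4))      # 4
    print("after processes:", run_in_processes(4))  # 0
```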


THE GLOBAL INTERPRETER LOCK!!!

• Global Interpreter Lock (GIL): within a CPython process, a thread must hold the GIL to execute Python bytecode; thus threads execute Python code 1 at a time

• Compared with fine-grained locking, the GIL makes CPython:

• Faster in the single-threaded case.

• Faster in the multi-threaded case for I/O-bound programs (threads release the GIL while they wait on I/O).

• Faster in the multi-threaded case for CPU-bound programs that do their compute-intensive work in C libraries, e.g. NumPy (which can release the GIL).

• The GIL only becomes a problem when doing CPU-intensive work in pure Python.

• Not all Python implementations use a GIL: Jython, IronPython, PyPy-STM
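The I/O-bound versus CPU-bound behavior above can be checked with a small timing experiment. This is a minimal sketch: `time.sleep` stands in for real I/O (an assumption for illustration), and `count_down` is a hypothetical pure-Python CPU-bound loop. Exact timings vary by machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    """Stand-in for an I/O wait; time.sleep releases the GIL while sleeping."""
    time.sleep(delay)
    return delay


def count_down(n):
    """Pure-Python CPU-bound loop; the running thread holds the GIL throughout."""
    while n > 0:
        n -= 1
    return n


def timed(fn, *args):
    """Return how many seconds fn(*args) took."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def serial_io(delays):
    return [fake_io(d) for d in delays]


def threaded_io(delays):
    with ThreadPoolExecutor(max_workers=max(1, len(delays))) as pool:
        return list(pool.map(fake_io, delays))


def threaded_cpu(n, workers=4):
    # Split the same total amount of work across several threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_down, [n // workers] * workers))


if __name__ == "__main__":
    delays = [0.2] * 4
    print(f"serial I/O:   {timed(serial_io, delays):.2f}s")    # about 0.8s
    print(f"threaded I/O: {timed(threaded_io, delays):.2f}s")  # about 0.2s
    n = 5_000_000
    print(f"serial CPU:   {timed(count_down, n):.2f}s")
    print(f"threaded CPU: {timed(threaded_cpu, n):.2f}s")  # typically no faster with the GIL
```

The sleeping threads overlap because each one releases the GIL while waiting, whereas the counting threads must take turns holding it.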


YOUR CHOICES

MULTI-THREADING

• Threads act as sub-tasks of a single process

• Threads share the same memory space

• Great for background tasks / waiting on I/O or asynchronous functions

• Can lead to conflicts (race conditions) when threads write to the same memory location at the same time

MULTI-PROCESSING

• Separate processes act as individual jobs

• Processes run in their own memory space

• Great for complex calculations / running multiple instances of a whole project

• Higher overhead, but more isolated: one process cannot corrupt another's memory
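The two choices map onto the standard `concurrent.futures` executors, which share the same interface. A minimal sketch; `run_tasks`, `fetch`, and `count_primes` are hypothetical names, with `fetch` simulating network I/O by sleeping.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def fetch(item_id):
    """Hypothetical I/O-bound task: sleeps to simulate waiting on a network call."""
    time.sleep(0.1)
    return item_id


def count_primes(limit):
    """Hypothetical CPU-bound task: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_tasks(func, items, cpu_bound, workers=4):
    """Threads for I/O-bound work, processes for CPU-bound pure-Python work."""
    executor_cls = ProcessPoolExecutor if cpu_bound else ThreadPoolExecutor
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(func, items))


if __name__ == "__main__":
    print(run_tasks(fetch, range(8), cpu_bound=False))
    print(run_tasks(count_primes, [30_000] * 4, cpu_bound=True))
```

Because both executors expose `map`, switching between threads and processes is a one-word change, which makes it cheap to measure both.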


WHY DON’T WE JUST USE THREADS?

Problem

• Race Conditions: multiple threads reading and writing to the same object can cause unexpected results

• The operating system schedules threads dynamically. There’s no way to guarantee the order of computation.

Solution

• Synchronization using a Lock: wrap one part of your function in ‘with lock:’ (where lock = threading.Lock()), so only 1 thread can execute that part at a time.

• This can lead to Deadlocks: a lock is never released properly, or a thread calls sub-functions that try to acquire a lock already held by another thread (or, with a plain Lock, by the same thread)
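A minimal sketch of the lock-based solution and one deadlock hazard, using the standard `threading` module. `Counter`, `hammer`, and `reacquire` are hypothetical names. Whether the unsafe version actually loses updates depends on the Python version and timing, so its result is not guaranteed either way.

```python
import threading


class Counter:
    """A shared counter with an unsafe and a lock-protected increment."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment_unsafe(self):
        # Read, add, write: another thread may run between these steps,
        # so concurrent updates can be lost (a race condition).
        self.value += 1

    def increment(self):
        # Only one thread at a time may execute this critical section.
        with self._lock:
            self.value += 1


def hammer(counter, method_name, n_threads=8, n_iters=50_000):
    """Start n_threads threads that each call counter.<method_name>() n_iters times."""
    method = getattr(counter, method_name)

    def work():
        for _ in range(n_iters):
            method()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


def reacquire(lock):
    """Try to acquire `lock` again while this thread already holds it.

    A plain Lock is not re-entrant: without the timeout this would block
    forever (a self-deadlock). An RLock lets the owning thread re-acquire it.
    """
    with lock:
        got_it = lock.acquire(timeout=0.1)
        if got_it:
            lock.release()
        return got_it


if __name__ == "__main__":
    print("unsafe:", hammer(Counter(), "increment_unsafe"))  # may fall short of 400000
    print("locked:", hammer(Counter(), "increment"))         # 400000
    print("Lock re-acquire:", reacquire(threading.Lock()))    # False
    print("RLock re-acquire:", reacquire(threading.RLock()))  # True
```

Passing a timeout to `acquire` is one way to turn a silent hang into a visible failure while debugging lock ordering.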


WHY DON’T WE JUST USE PROCESSES?

Problem

• Serialization using pickle: converting Python objects to byte streams

• Individual processes run in separate memory spaces and need a way to communicate. Python’s multiprocessing does this by pickling the data it sends between them.

• Certain limitations: Pool.map passes only one argument per call, and pickle cannot serialize lambdas, nested (locally defined) functions or classes, or objects such as open files. Therefore, you must design your functions with pickling in mind.
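Designing "with pickling in mind" can be tested directly. This minimal sketch checks which objects the standard `pickle` module accepts; `power`, `make_adder`, and `can_pickle` are hypothetical names. The exception raised for unpicklable objects differs across object types and Python versions, so several types are caught.

```python
import pickle
from functools import partial


def power(base, exponent):
    """Top-level function: pickle stores it by reference (module + name)."""
    return base ** exponent


def make_adder(n):
    """Returns a nested function, which the standard pickle module cannot serialize."""
    def add(x):
        return x + n
    return add


def can_pickle(obj):
    """Return True if the standard pickle module can serialize `obj`."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exact exception type varies with the object and the Python version.
        return False


if __name__ == "__main__":
    print(can_pickle(power))                       # True
    print(can_pickle(partial(power, exponent=3)))  # True
    print(can_pickle(lambda x: x ** 3))            # False
    print(can_pickle(make_adder(1)))               # False
```

For the one-argument limit, two picklable workarounds are `pool.map(partial(power, exponent=3), range(10))` and `pool.starmap(power, [(2, 3), (3, 2)])`, both built on a top-level function.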

Solution

• Serialization using dill

• dill extends pickle, allowing arbitrary classes and functions (including lambdas and nested functions) to be sent as byte streams. dill can serialize most Python objects!

• The pathos library’s pathos.multiprocessing builds on multiprocess, a fork of Python’s multiprocessing that uses dill instead of pickle.


AMDAHL’S LAW

• The small part of a program that cannot be parallelized will limit the overall speedup

• S_latency(s) = 1 / ((1 - p) + p / s), where:

• S_latency is the potential speedup in latency of the execution of the whole task;

• s is the speedup in latency of the execution of the parallelizable part of the task;

• p is the fraction of the execution time of the whole task spent on the parallelizable part, measured before parallelization.

• As s grows without bound, S_latency approaches 1 / (1 - p): if 95% of a task is parallelizable, the speedup can never exceed 20x.
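The formula translates directly into a function. A minimal sketch; `amdahl_speedup` is a hypothetical name, and treating s as "number of cores" assumes the parallel part scales perfectly, which real programs rarely achieve.

```python
import math


def amdahl_speedup(p, s):
    """Amdahl's law: S_latency(s) = 1 / ((1 - p) + p / s).

    p: fraction (0..1) of the original runtime that can be parallelized
    s: speedup of that parallelizable part (e.g. number of cores); may be math.inf
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    denominator = (1.0 - p) + p / s  # p / math.inf evaluates to 0.0
    return math.inf if denominator == 0 else 1.0 / denominator


if __name__ == "__main__":
    for cores in (2, 4, 8, 64, math.inf):
        print(f"p=0.95, s={cores}: {amdahl_speedup(0.95, cores):.2f}x")
```

With p = 0.95, going from 8 cores (about 5.9x) to 64 cores only reaches about 15.4x, and no number of cores passes 20x.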


GOOGLE COLAB EXAMPLE

https://colab.research.google.com/drive/1TkjjiIrzq5wE1BF2DbOAqTKhmRgzSYVh
