Page 1: Concurrency and Distributed systems

Concurrency and Distributed systems

... With Python today.

Jesse Noller

Saturday, March 28, 2009

Page 2: Concurrency and Distributed systems

30,000 Foot View

• Introduction

• Concurrency/Parallelism

• Distributed Systems

• Where Python is today

• Ecosystem

• Where can we go?

• Questions

Page 3: Concurrency and Distributed systems

Hello there!

• Who am I?

• Why am I doing this?

• Email: [email protected]

• Blog - http://www.jessenoller.com

• PyCon - http://jessenoller.com/category/pycon-2009/

Page 4: Concurrency and Distributed systems

Most of all, it’s fun!

Page 5: Concurrency and Distributed systems

No Code, Why?

Page 6: Concurrency and Distributed systems

Bike sheds

Page 7: Concurrency and Distributed systems

Concurrency

• What is it?

• Doing many things “at once”

• Typically local to the machine running the app.

• Implementation Options:

• threads / multiple processes

• cooperative multitasking

• coroutines

• asynchronous programming

Page 8: Concurrency and Distributed systems

... vs Parallelism

• What is it?

• Doing many things simultaneously

• Implementation options:

• threads

• multiple processes

• distributed systems

Page 9: Concurrency and Distributed systems

... vs Distributed Systems

• What is it?

• Doing many things, across multiple machines, simultaneously

• Many cores, on many machines

• There are many designs

• There are eight fallacies...

Page 10: Concurrency and Distributed systems

8 fallacies of distributed systems

Page 11: Concurrency and Distributed systems

The network is reliable

Page 12: Concurrency and Distributed systems

Latency is zero

Page 13: Concurrency and Distributed systems

Bandwidth is infinite

Page 14: Concurrency and Distributed systems

The network is secure

Page 15: Concurrency and Distributed systems

Topology doesn’t change

Page 16: Concurrency and Distributed systems

There is only one administrator

Page 17: Concurrency and Distributed systems

Transport cost is zero

Page 18: Concurrency and Distributed systems

The network is homogeneous

Page 19: Concurrency and Distributed systems

Summary

• All 3 are related to one another; their fundamental goals are to:

• Decrease latency

• Increase throughput

• Applications start simple, progress to concurrent systems and evolve into parallel, distributed systems

• As the system evolves, the fallacies become more pertinent, so you have to account for them early
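
One way to account for "the network is reliable" early is to wrap every remote call in a bounded retry with backoff. The sketch below is only an illustration: `TransientNetworkError`, `call_with_retries` and `make_flaky` are hypothetical names, not from the talk or from any library, and the "network" is simulated.

```python
import time


class TransientNetworkError(Exception):
    """Hypothetical stand-in for a timeout or dropped connection."""


def call_with_retries(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Run operation(), retrying on TransientNetworkError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientNetworkError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            sleep(base_delay * (2 ** attempt))  # wait longer after each failure


def make_flaky(failures):
    """Return a callable that fails `failures` times, then succeeds (simulation only)."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientNetworkError("simulated drop")
        return "ok after %d calls" % state["calls"]

    return operation
```

Calling `call_with_retries(make_flaky(2))` succeeds on the third try; with more failures than attempts, the error reaches the caller instead of being swallowed.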

Page 21: Concurrency and Distributed systems

Where is (C)Python?

• We have threads. Shiny, real OS ones

• Except for the Global Interpreter Lock

• The GIL makes the interpreter easier to maintain

• ...And it simplifies extension module code

Page 22: Concurrency and Distributed systems

Is the GIL a problem?

• Yes. Sorta. Maybe. It depends.

• I/O-bound code and C extensions can release it!

• Most applications are I/O bound

• The GIL still has non-zero overhead

• The GIL is not going away*

• You can build concurrent applications regardless of the GIL

* ... more on this in a moment, dun dun dun.
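
A small sketch of why I/O-bound threads still help under the GIL: a blocking wait releases the lock, so several waits overlap. Here `time.sleep` stands in for a real blocking I/O call (an assumption for the demo), and `fake_io` and `run_threaded` are hypothetical helper names.

```python
import threading
import time


def fake_io(delay, results, index):
    """Simulate a blocking I/O call; time.sleep releases the GIL while waiting."""
    time.sleep(delay)
    results[index] = index


def run_threaded(count=4, delay=0.2):
    """Start `count` threads that each wait `delay` seconds; return results and wall time."""
    results = [None] * count
    threads = [
        threading.Thread(target=fake_io, args=(delay, results, i))
        for i in range(count)
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, time.perf_counter() - start
```

With four threads the total wall time should be close to one delay rather than four, because the waits overlap. A CPU-bound loop in pure Python would not see that benefit on CPython.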

Page 23: Concurrency and Distributed systems

Multiprocessing!

• Added in the 2.6/3.0 timeline, PEP 371

• Processes and IPC (via pipes) to allow parallelism

• Same(ish) API as threading and queue

• Includes Pool, remote Managers for data sharing over a network, etc

• Multiprocessing “outperforms” threading

• IPC requires pickle-ability. Incurs overhead
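
The pickle-ability requirement can be checked without starting any processes. The sketch below only uses the standard `pickle` module; `is_picklable` and `pickled_size` are hypothetical helpers written for this illustration, not part of multiprocessing.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if obj could be sent between processes via pickle."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


def pickled_size(obj):
    """Bytes that would cross the pipe for this object (a rough proxy for IPC overhead)."""
    return len(pickle.dumps(obj))


# Plain data works; locks and lambdas do not.
message = {"job": 1, "args": [1, 2, 3]}
checks = {
    "dict": is_picklable(message),
    "lock": is_picklable(threading.Lock()),
    "lambda": is_picklable(lambda x: x),
}
```

In practice this means work items and results should be plain data, and that large payloads cost serialization time on every hop.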

Page 24: Concurrency and Distributed systems

Summary

• We have the Global Interpreter Lock

• We also have multiprocessing (no GIL)

• Threads (as an approach) are good for some problems

• They’re not impossible to use correctly

• While hampered, Python threads are still useful

• Python still allows you to leverage other approaches to concurrency

Page 25: Concurrency and Distributed systems

(remember that asterisk?)*

Page 26: Concurrency and Distributed systems

Jython

• Python on the JVM (in Java)

• 2.5-Compatible

• Frank and the others are awesome for resurrecting this project

• May get Python in the Java door

• Pros:

• Unrestricted threading

• Hooray java.util.concurrent!

• Cons:

• No C extensions

Page 27: Concurrency and Distributed systems

IronPython

• Python on the .NET CLR

• 2.5.2 Compatible

• Matured rapidly, highly usable

• Great for Windows environments

• Pros:

• Unrestricted threading

• Some C extensions via Ironclad

• Cons:

• Mostly Windows-only, barring Mono

Page 28: Concurrency and Distributed systems

Stackless

• Modified CPython interpreter

• Offers Coroutines, Channels - “lightweight threads”

• Cooperative multitasking (single thread executes)

• (mostly) Still alive courtesy of CCP Games

• Still has a GIL

• “Stackless is dead, long live PyPy”

Page 29: Concurrency and Distributed systems

PyPy

• Python written in (R)Python

• Getting close to 2.5-Compatibility

• Complete “rethink” of the interpreter

• Focusing on JIT/interpreter speed right now

• Still has the GIL

• Some Stackless features (e.g. coroutines, channels)

• Not mature

Page 30: Concurrency and Distributed systems

The Ecosystem

Page 31: Concurrency and Distributed systems

That’s a lot of nuts!

• When I started, I had around 40 libraries on my list

• Coroutines, messaging, frameworks, etc

• Python has a huge ecosystem of “stuff”

• Unfortunately, much of it is long in the tooth, or of beta quality

• New libraries/frameworks/approaches are coming out every week

Page 32: Concurrency and Distributed systems

Concurrency Frameworks

Page 33: Concurrency and Distributed systems

Twisted

• “OK, who hasn’t tried Twisted?”

• Asynchronous, event-driven multitasking

• Vast networking library, large ecosystem

• Supports thread usage, but Twisted code may not be thread safe

• Supports using processes (not multiprocessing).

• Can be mind-bending

Page 34: Concurrency and Distributed systems

Kamaelia

• Came out of BBC Research

• Uses an easy to understand “components talking via mailboxes” approach

• Cooperative multitasking via generators by default.

• Honkin’ library of cool things

• Supports thread-based components as well

• Very easy to get up and running

• Abstracts IPC, Process, Threads, etc “away”

Page 35: Concurrency and Distributed systems

Frameworks

• Both Kamaelia and Twisted have nice networking support

• Both use schedulers which allow scheduled items to schedule other items

• Two different approaches to thinking about the problem

• Both can be used to build distributed apps

• Like all frameworks, you adopt the methodology

Page 36: Concurrency and Distributed systems

New: Concurrence

• New on the scene (’09) version 0.3

• Lightweight tasks-with-message passing

• Has a main scheduler/dispatcher

• Built on top of stackless/greenlets/libevent

• Network-oriented (HTTP, WSGI servers)

• Still raw (more docs please)

• Very promising (minus compilation problems)

Page 37: Concurrency and Distributed systems

Coroutines

• Coroutines are essentially lightweight threads of control. Think micro/green threads

• Typically use explicit task switching (cooperative)

• Most implementations have a scheduler, and some communications method (e.g. pipes)

• Not parallel unless used in a distributed fashion

• Both Kamaelia and Twisted “fit” here

• Enhanced generators make these easy to build
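
To make the "scheduler plus explicit task switching" idea concrete, here is a minimal sketch built on plain generators. It is not taken from any of the libraries in this talk; `worker` and `run_round_robin` are hypothetical names, and a real library would add channels, I/O waiting and error handling.

```python
from collections import deque


def worker(name, steps, log):
    """A cooperative task: do one unit of work, then yield control."""
    for step in range(steps):
        log.append((name, step))
        yield  # explicit task switch back to the scheduler


def run_round_robin(tasks):
    """Tiny scheduler: resume each task in turn until all have finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task up to its next yield
        except StopIteration:
            continue            # task finished: do not reschedule it
        ready.append(task)      # still alive: back of the queue


log = []
run_round_robin([worker("a", 2, log), worker("b", 3, log)])
```

After the run, `log` holds `[("a", 0), ("b", 0), ("a", 1), ("b", 1), ("b", 2)]`: the two tasks interleave, but only one ever runs at a time, which is why this is concurrency and not parallelism.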

Page 38: Concurrency and Distributed systems

Coroutine libraries

• Fibra: microthreads, tubes, scheduler

• Greenlet: C based, microthreads, no scheduler

• Eventlet: Network “framework” layer on top of greenlet. Has an Actor implementation \o/

• Circuits: Event-based, components/microthreads

• Cogen: network oriented, scheduler, microthreads

• Multitask: microthreads, no channels (it’s dead, Jim)

Page 39: Concurrency and Distributed systems

Actors

• Isolated, self reliant components

• Can spawn other Actors

• Communicate via message passing only (by value)

• Operate in parallel

• Communication is asynchronous

• A good model to overcome the fallacies

• See also: Erlang, Scala
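
A rough sketch of the actor properties listed above, using only the standard library: each actor owns a mailbox and a thread, `send` is asynchronous, and messages are deep-copied to approximate pass-by-value. The `Actor` and `Collector` classes are hypothetical and are not the API of Dramatis, Parley or Candygram.

```python
import copy
import queue
import threading


class Actor:
    """Minimal actor: one thread, one mailbox, no shared state (sketch only)."""

    _STOP = object()  # private sentinel that asks the actor to shut down

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        """Asynchronous send: enqueue a private copy and return immediately."""
        self._mailbox.put(copy.deepcopy(message))

    def stop(self):
        """Let the actor process everything already queued, then shut down."""
        self._mailbox.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                break
            self.receive(message)

    def receive(self, message):
        raise NotImplementedError


class Collector(Actor):
    """Example actor that records every message it receives."""

    def __init__(self):
        self.seen = []        # set up state before the thread starts
        super().__init__()

    def receive(self, message):
        self.seen.append(message)
```

Because `send` copies the message, the sender can keep mutating its own object without affecting what the actor sees. Threads are used here for simplicity; the same shape works with processes or greenlets.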

Page 40: Concurrency and Distributed systems

Actor Libraries

• Dramatis (alpha quality)

• Great start, excellent base to start working with them

• Parley (alpha quality)

• Another excellent start, supports actors in threads, greenlets or stackless tasklets

• Candygram (2004)

• Old, implements Erlang primitives, spawns in threads

• Kamaelia components can fit here(ish)

Page 41: Concurrency and Distributed systems

(local) Parallelism

• Multiprocessing

• Processes and IPC via a threading-like API, in Python-Core as of 2.6

• Parallel Python

• Allows local parallelism, but also distributed parallelism in a “full” package

• pprocess

• Another easy to use fork/process based package

• Has IPC mechanisms

Page 42: Concurrency and Distributed systems

Distributed Systems

• Lots of various technologies to help build something

• communications libraries

• socket/networking libraries

• message queues

• some shared memory implementations

• No “full stack” approach

• Most users end up rolling their own, using some combinations of libraries and tools

Page 43: Concurrency and Distributed systems

Distributed Processing

• Frameworks:

• Parallel Python is the closest for a processing cluster

• The Disco Project is an Erlang-based (with Python bindings) map-reduce framework

Page 44: Concurrency and Distributed systems

RPC/Messaging

• RPC:

• Pyro

• RPyC

• Thrift

• Messaging:

• pySage

• python-spread

• XMPP

• Protocol Buffers

Page 45: Concurrency and Distributed systems

Shared Memory/Message Qs

• Shared Memory

• Posh (dead)

• Memcached

• posix_ipc

• Message Queues

• Apache ActiveMQ

• RabbitMQ

• Stomp

• MemcacheQ

• Beanstalkd

Page 46: Concurrency and Distributed systems

So...

• Where the hell do we point new users?

• While good, Twisted and Kamaelia have a documentation problem

• The rest is a mish-mash of technologies

• Concurrency is hard, let’s go shopping!

Page 47: Concurrency and Distributed systems

Where does this leave us?

• The GIL is here for the foreseeable future

• Not entirely a bad thing (extensions!)

• Python-Core is not the right place for much of this, but can provide some basics

• Actor implementation

• java.util.concurrent-like abstractions

• Anything going in must make this work safe
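
For a feel of what "java.util.concurrent-like abstractions" look like in Python, later releases (3.2 onward) ship `concurrent.futures`, which postdates this talk. The sketch below uses that module; `fetch_length` and `lengths_in_parallel` are hypothetical names, and the "work" is a trivial stand-in for an I/O-bound call.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_length(text):
    """Hypothetical stand-in for an I/O-bound task such as a network fetch."""
    return len(text)


def lengths_in_parallel(items, max_workers=4):
    """Run fetch_length over items on a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_length, items))


def length_via_future(text):
    """Submit one task and block on its future, the other common usage style."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_length, text)
        return future.result()
```

The caller never touches a thread or a lock directly, which is the kind of safety the last bullet asks for.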

Page 48: Concurrency and Distributed systems

Where does this leave us?

• Lots of great community work

• Continued room for growth, adoption of other languages’ technologies

• If we can build a stack of reusable, swappable components for all three areas: everyone wins

• Anyone for a “distributed Django”?

• “loose coupling and tight cohesion”

• Must take the fallacies into account

Page 49: Concurrency and Distributed systems

Django?

• The point of a framework is to make the easy things easy, and the hard things easier

• The abstractions must be leaky

• Go see abstractions as leverage!

• It must be safe

• It cannot ignore the fallacies

• I shall call it Mustaine (Megadeth)

Page 50: Concurrency and Distributed systems

Questions?

Page 51: Concurrency and Distributed systems

Fin.
