Beating Python's GIL to Max Out Your CPUs
TRANSCRIPT
![Page 1: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/1.jpg)
Beating Python's GIL to Max Out Your CPUs
Andrew Montalenti
CTO, Parse.ly @amontalenti
![Page 2: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/2.jpg)
OR:
Scaling Python to 3,000 Cores
Andrew Montalenti
CTO, Parse.ly @amontalenti
![Page 3: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/3.jpg)
![Page 4: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/4.jpg)
![Page 5: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/5.jpg)
![Page 6: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/6.jpg)
![Page 7: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/7.jpg)
What happens when you have 153 TB of compressed customer data that may need to be reprocessed at any time,
and it’s now growing at 10-20 TB per month?
![Page 8: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/8.jpg)
![Page 9: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/9.jpg)
![Page 10: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/10.jpg)
@dabeaz = “the GIL guy”
![Page 11: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/11.jpg)
![Page 12: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/12.jpg)
Is the GIL a feature, not a bug?
In one Python process,
at any one time,
only one Python bytecode instruction
is executing.
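A minimal sketch of what that statement means in practice, assuming a standard CPython build with the GIL enabled. The function names and the loop size are hypothetical and not from the talk. It times the same CPU-bound countdown run twice in a row and then in two threads; on a GIL build the threaded run is generally not expected to be faster, but the exact timings depend on the machine and interpreter version.

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; it holds the GIL while it runs."""
    while n > 0:
        n -= 1
    return n


def run_sequential(n, workers):
    """Run the countdown `workers` times, one after another."""
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, workers):
    """Run the countdown in `workers` threads at the same time."""
    threads = [threading.Thread(target=count_down, args=(n,))
               for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000_000  # arbitrary size, chosen only for illustration
    print(f"sequential: {run_sequential(n, 2):.3f}s")
    print(f"threaded:   {run_threaded(n, 2):.3f}s")
```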
![Page 13: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/13.jpg)
![Page 14: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/14.jpg)
![Page 15: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/15.jpg)
should we just rewrite it in Go?
![Page 16: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/16.jpg)
![Page 17: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/17.jpg)
![Page 18: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/18.jpg)
![Page 19: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/19.jpg)
![Page 20: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/20.jpg)
fast functions running in parallel
![Page 21: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/21.jpg)
[Diagram: one Python process (State, Code) alongside Server 1, Server 2, and Server 3, each with Core 1 and Core 2]

    from urllib.parse import urlparse
    urls = ["http://arstechnica.com/", "http://ars.to/1234", "http://ars.to/5678", ...]
![Page 22: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/22.jpg)
[Diagram: one Python process (State, Code) alongside Server 1, Server 2, and Server 3, each with Core 1 and Core 2]

    from urllib.parse import urlparse
    urls = ["http://arstechnica.com/", "http://ars.to/1234", "http://ars.to/5678", ...]
    map(urlparse, urls)
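As a single-core baseline for the slides that follow, here is a small runnable version of the snippet. It assumes only the three URLs that are spelled out on the slide (the elided entries are left out). Note that in Python 3 `map` returns a lazy iterator, so the sketch wraps it in `list` to force the work to happen.

```python
from urllib.parse import urlparse

# Only the URLs shown on the slide; the elided "..." entries are omitted.
urls = [
    "http://arstechnica.com/",
    "http://ars.to/1234",
    "http://ars.to/5678",
]

# map() is lazy in Python 3; list() runs urlparse over every URL,
# one at a time, on a single core.
parsed = list(map(urlparse, urls))

hosts = [p.netloc for p in parsed]
paths = [p.path for p in parsed]
print(hosts)
print(paths)
```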
![Page 23: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/23.jpg)
Cython: speeding up functions on a single core
![Page 24: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/24.jpg)
![Page 25: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/25.jpg)
concurrent.futures: good map API, but odd implementation details
![Page 26: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/26.jpg)
[Diagram: one Python process (State, Code) alongside Server 1, Server 2, and Server 3, each with Core 1 and Core 2]

    from concurrent.futures import ThreadPoolExecutor
    executor = ThreadPoolExecutor()
    executor.map(urlparse, urls)
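A self-contained sketch of the thread-pool version, assuming the three URLs shown earlier in the deck. The helper name `parse_with_threads` and the `max_workers=2` setting are illustrative choices, not from the talk. All worker threads live inside the one Python process, so for CPU-bound work like this they still take turns holding the GIL.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

urls = [
    "http://arstechnica.com/",
    "http://ars.to/1234",
    "http://ars.to/5678",
]


def parse_with_threads(urls, max_workers=2):
    """Map urlparse over urls using a pool of threads in this process."""
    # The context manager shuts the pool down once the work is finished.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in input order.
        return list(executor.map(urlparse, urls))


hosts = [p.netloc for p in parse_with_threads(urls)]
print(hosts)
```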
![Page 27: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/27.jpg)
[Diagram: parent Python process (State, Code) and two Python subprocesses (State, Code each), labeled pickle.dumps() and os.fork(); Server 1, Server 2, and Server 3, each with Core 1 and Core 2]

    from concurrent.futures import ProcessPoolExecutor
    executor = ProcessPoolExecutor()
    executor.map(urlparse, urls)
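A sketch of the process-pool version, assuming a POSIX system where the `fork` start method is available (this mirrors the `os.fork()` label on the slide; other platforms would need a different start method and a `__main__` guard). The helper name and worker count are hypothetical. The pickle round trip at the end illustrates, under the same assumption, what has to happen to every argument and result that crosses the process boundary.

```python
import multiprocessing
import pickle
from concurrent.futures import ProcessPoolExecutor
from urllib.parse import urlparse

urls = [
    "http://arstechnica.com/",
    "http://ars.to/1234",
    "http://ars.to/5678",
]


def parse_with_processes(urls, max_workers=2):
    """Map urlparse over urls using forked worker processes."""
    # Assumption: the "fork" start method exists on this platform.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=max_workers, mp_context=ctx) as executor:
        # Each URL is pickled and sent to a worker; each ParseResult is
        # pickled and sent back to the parent.
        return list(executor.map(urlparse, urls))


# What crosses the process boundary: bytes produced by pickle.
result = urlparse(urls[0])
round_tripped = pickle.loads(pickle.dumps(result))

if __name__ == "__main__":
    print([p.netloc for p in parse_with_processes(urls)])
```

Unlike the thread pool, each worker here has its own interpreter and its own GIL, at the cost of serializing everything that moves between processes.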
![Page 28: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/28.jpg)
![Page 29: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/29.jpg)
joblib: map functions over local machine cores
by cleaning up stdlib facilities
![Page 30: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/30.jpg)
[Diagram: parent Python process (State, Code) and two Python subprocesses (State, Code each), labeled pickle.dumps() and os.fork(); Server 1, Server 2, and Server 3, each with Core 1 and Core 2]

    from joblib import Parallel, delayed
    par = Parallel(n_jobs=2)
    do_urlparse = delayed(urlparse)
    par(do_urlparse(url) for url in urls)
![Page 31: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/31.jpg)
ipyparallel: map functions over a pet compute cluster
![Page 32: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/32.jpg)
[Diagram: client Python process (State, Code), an ipcontroller, and three ipengine groups of Python processes (State, Code each) spread over Server 1, Server 2, and Server 3 (Core 1 and Core 2 each); two pickle.dumps() labels]

    from ipyparallel import Client
    rc = Client()
    rc[:].map_sync(urlparse, urls)
![Page 33: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/33.jpg)
![Page 34: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/34.jpg)
pykafka: map functions over a multi-consumer log
![Page 35: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/35.jpg)
[Diagram: a pykafka.producer Python process (State, Code) and several consumer Python processes (State, Code each) spread over Server 1, Server 2, and Server 3 (Core 1 and Core 2 each)]

    import json
    consumer = ... # balanced
    while True:
        msg = consumer.consume()
        # pykafka messages carry their payload in .value
        msg = json.loads(msg.value)
        urlparse(msg["url"])
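The per-message work in that loop can be sketched without a Kafka cluster. In this sketch the consumer is replaced by a plain Python list of byte payloads, which is an assumption made purely for illustration; the `handle_message` name and the `{"url": ...}` message shape follow the slide's `msg["url"]` access but are otherwise hypothetical.

```python
import json
from urllib.parse import urlparse


def handle_message(raw):
    """Decode one JSON payload and parse the URL it carries."""
    event = json.loads(raw)  # json.loads accepts bytes as well as str
    return urlparse(event["url"])


# Stand-in for the log: in the talk these payloads would come from a
# balanced consumer, with each consumer process receiving a share of them.
fake_log = [
    json.dumps({"url": url}).encode("utf-8")
    for url in ("http://arstechnica.com/", "http://ars.to/1234")
]

hosts = [handle_message(raw).netloc for raw in fake_log]
print(hosts)
```

Because each message is handled independently, the same function can run unchanged in as many consumer processes as there are cores to fill.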
![Page 36: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/36.jpg)
![Page 37: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/37.jpg)
pystorm: map functions over a stream of inputs
to generate a stream of outputs
![Page 38: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/38.jpg)
[Diagram: a pykafka.producer Python process (State, Code) and several Python processes (State, Code each) spread over Server 1, Server 2, and Server 3 (Core 1 and Core 2 each), connected by the multi-lang json protocol]

    class UrlParser(Topology):
        url_spout = UrlSpout.spec(p=1)
        url_bolt = UrlBolt.spec(p=4, input=url_spout)
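The spout-to-bolt shape of that topology can be approximated in plain Python with generators. This is not the Storm, pystorm, or streamparse API; it is a single-process sketch under that stated simplification, and the function names only echo the slide's `url_spout` and `url_bolt`. In the real topology the bolt would run as several parallel processes rather than as one generator.

```python
from urllib.parse import urlparse


def url_spout(urls):
    """Source of the stream: emits one URL at a time."""
    for url in urls:
        yield url


def url_bolt(stream):
    """Transforms each input tuple into an output tuple."""
    for url in stream:
        yield url, urlparse(url).netloc


urls = ["http://arstechnica.com/", "http://ars.to/1234"]

# Wire the spout into the bolt; nothing runs until the stream is consumed.
outputs = list(url_bolt(url_spout(urls)))
print(outputs)
```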
![Page 39: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/39.jpg)
![Page 40: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/40.jpg)
![Page 41: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/41.jpg)
pyspark: map functions over a dataset representation
to perform transformations and actions
![Page 42: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/42.jpg)
[Diagram: a driver Python process (State, Code) holding pyspark.SparkContext, and several Python processes (State, Code each) spread over Server 1, Server 2, and Server 3 (Core 1 and Core 2 each); labels: cloudpickle, py4j and binary pipes]

    from pyspark import SparkContext
    sc = SparkContext()
    file_rdd = sc.textFile(files)
    file_rdd.map(urlparse).take(1)
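One likely reason the slide calls out cloudpickle: the standard library's `pickle` serializes functions by reference (module plus name), so it can ship an importable function such as `urlparse` but not an anonymous lambda, whereas cloudpickle can serialize such functions by value. The sketch below only demonstrates the stdlib limitation; it does not use Spark or cloudpickle, and the `can_pickle` helper is hypothetical.

```python
import pickle
from urllib.parse import urlparse


def can_pickle(obj):
    """Return True if the stdlib pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# An importable function is pickled as a reference to urllib.parse.urlparse.
named_ok = can_pickle(urlparse)

# A lambda has no importable name, so stdlib pickle refuses it.
lambda_ok = can_pickle(lambda line: urlparse(line).netloc)

print(named_ok, lambda_ok)
```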
![Page 43: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/43.jpg)
![Page 44: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/44.jpg)
![Page 45: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/45.jpg)
![Page 46: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/46.jpg)
![Page 47: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/47.jpg)
"lambda architecture"
![Page 48: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/48.jpg)
![Page 49: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/49.jpg)
Parse.ly "Batch Layer" Topologies with Spark & S3
Parse.ly "Speed Layer" Topologies with Storm & Kafka
Parse.ly Dashboards and APIs with Elasticsearch & Cassandra
Parse.ly Raw Data Warehouse with Streaming & SQL Access
Technology Component Summary
![Page 50: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/50.jpg)
parting thoughts
![Page 51: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/51.jpg)
![Page 52: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/52.jpg)
![Page 53: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/53.jpg)
the free lunch is over, but not how we thought
![Page 54: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/54.jpg)
multi-process, not multi-thread
multi-node, not multi-core
message passing, not shared memory
heaps of data and streams of data
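A minimal single-machine sketch of "message passing, not shared memory", assuming a POSIX system where the `fork` start method is available. The queue-based design and all names here are illustrative assumptions, not the talk's code: the parent and child share no Python objects and communicate only by sending pickled messages through queues.

```python
import multiprocessing
from urllib.parse import urlparse


def worker(inbox, outbox):
    """Child process: receive URLs as messages, send (url, host) back."""
    # iter(callable, sentinel) keeps calling inbox.get() until it gets None.
    for url in iter(inbox.get, None):
        outbox.put((url, urlparse(url).netloc))


def hosts_via_messages(urls):
    """Parse urls in a separate process using only queues to communicate."""
    ctx = multiprocessing.get_context("fork")  # assumption: fork is available
    inbox, outbox = ctx.Queue(), ctx.Queue()
    child = ctx.Process(target=worker, args=(inbox, outbox))
    child.start()
    for url in urls:
        inbox.put(url)
    inbox.put(None)  # stop sentinel
    # Drain the results before joining so the child is never blocked.
    results = dict(outbox.get(timeout=10) for _ in urls)
    child.join()
    return results


if __name__ == "__main__":
    print(hosts_via_messages(["http://arstechnica.com/", "http://ars.to/1234"]))
```

The same pattern carries over to multiple nodes once the queues are replaced by a network transport, which is the role Kafka and Storm play in the earlier slides.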
![Page 55: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/55.jpg)
![Page 56: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/56.jpg)
GIL: it's a feature, not a bug.
help us!
pystorm pykafka
streamparse
![Page 57: Beating Pythons GIL to Max Out Your CPUs](https://reader034.vdocument.in/reader034/viewer/2022050614/58ad89e51a28ab662a8b55db/html5/thumbnails/57.jpg)
Questions?
tweet at @amontalenti