SaltConf 2016: SaltStack Transport and Concurrency
Salt Transport Modularity and Concurrency for Performance and Scale
Thomas Jackson, Staff Site Reliability Engineer, LinkedIn
Agenda
• for item in ('transport', 'concurrency'):
  • History
  • Problems
  • Options
  • Solution
Salt Transport: a history
Transport in Salt
• In the beginning Salt was primarily a remote execution engine
  • Send jobs from the master to N minions (defined by some target)
• In the beginning there was...
"ZeroMQ (also spelled ØMQ, 0MQ or ZMQ) is a high-performance asynchronous messaging library, aimed at use in distributed or concurrent applications."
- Wikipedia (https://en.wikipedia.org/wiki/ZeroMQ)
We took a normal TCP socket, injected it with a mix of radioactive isotopes stolen from a secret Soviet atomic
research project, bombarded it with 1950-era cosmic rays, and put it into the hands of a drug-addled comic book
author with a badly-disguised fetish for bulging muscles clad in spandex. Yes, ZeroMQ sockets are the world-saving
superheroes of the networking world.
- http://zguide.zeromq.org/page:all#How-It-Began
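Salt Transport: a history
How ZMQ PUB/SUB looks

Server:
import zmq
context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind("tcp://*:12345")
socket.send(b"Message")  # dropped silently if no subscriber is connected yet

Client:
import zmq
context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.connect("tcp://localhost:12345")
socket.setsockopt(zmq.SUBSCRIBE, b"")  # subscribe to everything; without this a SUB socket receives nothing
print(socket.recv())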
Salt Transport: a history
How ZMQ REQ/REP looks

Server:
import zmq
context = zmq.Context()
socket = context.socket(zmq.REP)
socket.bind("tcp://*:12345")
message = socket.recv()
socket.send(b"got message")  # a REP socket must reply before it can recv() again

Client:
import zmq
context = zmq.Context()
socket = context.socket(zmq.REQ)
socket.connect("tcp://localhost:12345")
socket.send(b"Hello")
message = socket.recv()
Salt Transport: a history
Request lifecycle (Master / Minion)
1. Job publish
2. Sign-in (optional; potentially reused or cached)
3. Pillar fetch
4. SLS/file fetch (optional)
5. Return
Salt Transport: a history
Initial ZeroMQ implementation
• Master-initiated messages
  • Using the pub/sub socket pair in zmq
  • All broadcast messages from the master to the minions
• Minion-initiated messages
  • Using the req/rep socket pair in zmq
  • All messages initiated by the minion, such as:
    • Sign-in
    • Job return
    • Module sync
    • Pillar
    • Etc.
Salt Transport: a history
Initial problems
• Message loss
• Broadcasts were filtered client-side
  • Added zmq filtering: https://github.com/saltstack/salt/pull/13285
• Etc.
Salt Transport: a history
Larger problems
• Huge ZMQ publisher memory leak (https://github.com/zeromq/libzmq/issues/954)
  • Workaround: process manager in Salt
• No concept of client state
  • When messages arrive, there is no way to see if the client is still connected, which leads to auth storms
  • Workaround: exponential backoff on the minion side
• No synchronous "connect" (https://github.com/saltstack/salt/pull/21570)
  • Workaround: fire an event and wait for it to return (or for the timeout to expire)
• Some users have issues with the LGPL license
  • Workaround: n/a
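The minion-side workaround for auth storms is exponential backoff: after each failed sign-in, wait longer before retrying, so thousands of minions do not hammer a restarted master at the same instant. Below is a minimal sketch, assuming a hypothetical `try_auth()` callable and illustrative delay values; these are not Salt's actual config options or defaults. Optional random jitter spreads retries out further.

```python
import random
import time


def backoff_delays(base=1.0, cap=60.0, attempts=6, jitter=0.0, rng=None):
    """Yield `attempts` retry delays that double each time, capped at `cap`.

    `jitter` is the fraction of each delay to randomize away (0.0 means
    deterministic), so that many minions retrying at once spread out.
    """
    rng = rng or random.Random()
    delay = base
    for _ in range(attempts):
        yield delay * (1 - jitter * rng.random())
        delay = min(delay * 2, cap)


def auth_with_backoff(try_auth, sleep=time.sleep, **kwargs):
    """Try once, then retry after each backoff delay; return True on success.

    `try_auth` is a hypothetical callable that returns True once the
    master accepts the sign-in.
    """
    if try_auth():
        return True
    for delay in backoff_delays(**kwargs):
        sleep(delay)
        if try_auth():
            return True
    return False
```

Passing `sleep` in as a parameter keeps the retry logic testable without actually waiting.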
Salt Transport: previous attempt
The Reliable Asynchronous Event Transport, or RAET, is an alternative transport medium developed specifically with Salt in mind. It has been developed to allow queuing to happen up on the application layer and comes with socket layer encryption. It also abstracts a great deal of control over the socket layer and makes it easy to bubble up errors and exceptions.
- docs.saltstack.com
Salt Transport: previous attempt
RAET
• The good
  • No ZMQ!
• The bad
  • Effectively a re-implementation of the daemons (separate files, etc.)
  • Unable to run ZMQ and RAET simultaneously (initially; hydra was added later, which just runs both daemons at once)
• The different
  • Changed the model from "minions always connect" to "minions are listening", meaning minions have a socket to attack
Salt Transport: back to basics
What do we really need?
• Salt is a platform, not a specific transport: we need transports to be modular
• Some requirements:
  • Simple interface to implement (such that other modules can be written)
  • Test coverage (including pre-canned tests for new modules)
  • Support N transports simultaneously (for ramps, and complex infra)
  • Clear contract of the security/privacy requirements of the various methods
Salt Transport: Channels!
• ReqChannel: minion-to-master messages
  • Master
    • pre_fork(self, process_manager)
    • post_fork(self, payload_handler, io_loop)
  • Minion
    • send(self, load, tries=3, timeout=60)
    • crypted_transfer_decode_dictentry(self, load, dictkey=None, tries=3, timeout=60)
Salt Transport: Channels!
• PubChannel: broadcasts to the appropriate minions
  • Master
    • pre_fork(self, process_manager)
    • publish(self, load)
  • Minion
    • on_recv(self, callback)
Salt Transport: Channels!
Responsibilities
• Serialization
• Encryption
• Targeting (pub channel only)
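A minimal sketch of the PubChannel contract, written as Python abstract base classes whose method names follow the slides, plus a toy in-process transport covering serialization and targeting. Everything beyond those method names is an illustrative assumption: the class names, the extra `connect()` method, JSON standing in for msgpack, glob matching of minion IDs with `fnmatch`, and the complete omission of encryption. This is not Salt's actual implementation.

```python
import abc
import fnmatch
import json


class PubServerChannel(abc.ABC):
    """Master side of a pub channel (method names follow the slide)."""

    @abc.abstractmethod
    def pre_fork(self, process_manager):
        """Start any processes the transport needs before workers fork."""

    @abc.abstractmethod
    def publish(self, load):
        """Broadcast `load` to the minions matched by load['tgt']."""


class PubClientChannel(abc.ABC):
    """Minion side of a pub channel."""

    @abc.abstractmethod
    def on_recv(self, callback):
        """Register `callback` to be called with each received payload."""


class InMemoryPubClient(PubClientChannel):
    """Hypothetical toy client: just remembers its ID and callback."""

    def __init__(self, minion_id):
        self.minion_id = minion_id
        self.callback = None

    def on_recv(self, callback):
        self.callback = callback


class InMemoryPubServer(PubServerChannel):
    """Hypothetical toy transport: 'sockets' are objects in one process."""

    def __init__(self):
        self.clients = []

    def pre_fork(self, process_manager):
        pass  # nothing to start for an in-process toy

    def connect(self, client):  # hypothetical; not part of the slide's interface
        self.clients.append(client)

    def publish(self, load):
        wire = json.dumps(load).encode("utf-8")  # serialization (msgpack stand-in)
        for client in self.clients:
            # Targeting: only matching minions ever receive the payload.
            if client.callback and fnmatch.fnmatch(client.minion_id, load["tgt"]):
                client.callback(json.loads(wire.decode("utf-8")))
```

Because the master only calls `publish()` and the minion only calls `on_recv()`, a different transport can be swapped in without changing either side, which is the modularity requirement listed earlier.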
Salt Transport: Channels!
TCP channel
• Wire protocol: msgpack({'head': SOMEHEADER, 'body': SOMEBODY})
• Main advantage over ZMQ? Better failure modes
  • Faster failure detection (if the minion isn't connected to the master, you don't have to wait for the timeouts)
  • True link status (no more auth storms!)
  • Basically, we have sockets again!
• https://docs.saltstack.com/en/develop/topics/transports/tcp.html
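TCP delivers a byte stream, not messages, so any TCP transport needs framing to know where one `{'head': ..., 'body': ...}` message ends and the next begins. msgpack objects are self-delimiting, which lets a reader pull whole messages off the stream. To stay standard-library only, the sketch below swaps in a different technique, a 4-byte length prefix around JSON; the `frame` helper and `Deframer` class are hypothetical and do not reproduce Salt's actual wire format.

```python
import json
import struct


def frame(head, body):
    """Encode one {'head': ..., 'body': ...} message with a 4-byte length prefix."""
    payload = json.dumps({"head": head, "body": body}).encode("utf-8")
    return struct.pack("!I", len(payload)) + payload


class Deframer:
    """Reassemble whole messages from arbitrarily split TCP reads."""

    def __init__(self):
        self.buf = b""

    def feed(self, data):
        """Add newly received bytes; return every message now complete."""
        self.buf += data
        messages = []
        while len(self.buf) >= 4:
            (size,) = struct.unpack("!I", self.buf[:4])
            if len(self.buf) < 4 + size:
                break  # wait for the rest of this message
            payload, self.buf = self.buf[4:4 + size], self.buf[4 + size:]
            messages.append(json.loads(payload.decode("utf-8")))
        return messages
```

Feeding the reader one tiny chunk at a time still yields complete messages, which is exactly the property a stream transport needs.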
Salt Transport: Channels!
TCP: How does it look?
# inside a @tornado.gen.coroutine-decorated function:
async_channel = salt.transport.client.AsyncReqChannel.factory(minion_opts)
ret = yield async_channel.send(msg)
Salt Transport: Channels!
TCP: How accurate?
• ZeroMQ
  • Total jobs: 1000
  • Completed jobs: 171
  • Hit rate: 17.1%
• TCP
  • Total jobs: 1000
  • Completed jobs: 1000
  • Hit rate: 100%
Salt Transport: Channels!
TCP: How does it perform?
• 15-byte message
  • ZeroMQ*
    • Average time: 0.00295809405715
    • QPS: 2246.952241147
  • TCP
    • Average time: 0.0023341544863
    • QPS: 2580.04452801
Salt Transport: Channels!
TCP: How does it perform?
• 1053-byte message
  • ZeroMQ*
    • Average time: 0.00278297542184
    • QPS: 2489.300394919
  • TCP
    • Average time: 0.00251070397869
    • QPS: 2602.4855051
Salt Transport: Channels!
Awesome!
• Definitely awesome!
• But async? What was that about?
• Before we get into specifics, let's talk about concurrency
Concurrency
The general problem
We have lots of things to do, some of which are blocking calls to remote things that are "slow". It is more efficient (and overall "faster") to work on something else while we wait for that "slow" call.
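The payoff can be sketched with Python 3's standard `asyncio` (standing in here for the Tornado-based approach described later): four simulated slow calls awaited one at a time take roughly four times as long as the same calls awaited together. `slow_call` is a hypothetical placeholder that sleeps instead of doing real network I/O, and the 0.1 s delay is arbitrary.

```python
import asyncio
import time


async def slow_call(name, delay=0.1):
    """Hypothetical stand-in for a slow remote call."""
    await asyncio.sleep(delay)  # gives control back to the event loop while waiting
    return name


async def one_at_a_time(names):
    # Each call must finish before the next one starts.
    return [await slow_call(n) for n in names]


async def all_at_once(names):
    # Start every call, then wait for all of them; the waits overlap.
    return await asyncio.gather(*(slow_call(n) for n in names))


def timed(coro):
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


names = ["a", "b", "c", "d"]
seq, t_seq = timed(one_at_a_time(names))  # roughly 4 x 0.1 s
con, t_con = timed(all_at_once(names))    # roughly 0.1 s
```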
Concurrency
Current state of concurrency in Salt
• Master-side:
  • The master creates N MWorkers to process N requests in parallel
  • Interfaces with non-blocking code as well, using `while True:` loops to implement timeouts, etc.
• Minion-side:
  • Threads used in MultiMaster for managing the multiple master connections
Concurrency
Problems
• No unified approach (multiprocessing, threading, non-blocking "loops": all in use)
• Slow and/or blocking operations hold a process/thread while waiting
• No consistent use of non-blocking libraries, so the code is a mix of loops and blocking calls
• Limited scalability (each approach scales differently)
Concurrency
Common solutions in Python
• Threading
• Multiprocessing
• User-space "threads": coroutines / stackless threads
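Concurrency
Threading
• Some isolation between threads
• Pre-emptive scheduling

import threading

import requests

NUM_REQUESTS = 10  # placeholder; the slide leaves the count unspecified

def handle_request():
    # This thread blocks here; the OS schedules the other threads meanwhile.
    ret = requests.get('http://slowthing/')
    # do something else

threads = []
for x in range(NUM_REQUESTS):
    t = threading.Thread(target=handle_request)
    t.start()
    threads.append(t)
for t in threads:
    t.join()  # wait for every request to finish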
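Concurrency
Multiprocessing
• Complete isolation
• Pre-emptive scheduling

import multiprocessing

import requests

NUM_REQUESTS = 10  # placeholder; the slide leaves the count unspecified

def handle():
    ret = requests.get('http://slowthing/')
    # do something else

if __name__ == '__main__':  # required on platforms that spawn child processes
    processes = []
    for x in range(NUM_REQUESTS):
        p = multiprocessing.Process(target=handle)
        p.start()
        processes.append(p)
    for p in processes:
        p.join()  # wait for every child process to exit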
Concurrency
User-space "threads": coroutines / stackless threads
• Some libraries you may have heard of:
  • gevent
  • Stackless Python
  • greenlet
  • Twisted
  • Tornado
• How are these implemented?
  • Green threads
  • Callbacks
  • Coroutines
Concurrency
Why coroutines?
• Coroutines have been in use in Python for a while (e.g. Tornado)
• asyncio, new in Python 3.4 (codenamed Tulip), is built on coroutines (https://docs.python.org/3/library/asyncio.html)
Concurrency
Coroutines are computer program components that generalize subroutines for nonpreemptive multitasking, by allowing multiple entry points for suspending and resuming execution at certain locations.
- https://en.wikipedia.org/wiki/Coroutine
Concurrency
Coroutines: what is this magic?
def item_of_work():
    while True:
        input = yield               # pause until the caller send()s a value
        yield do_something(input)   # hand the result back, then pause again
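Concurrency
Coroutines: what is this magic?
def some_complex_handle():
    input = yield               # pause until the caller send()s the input
    out1 = do_something(input)
    yield None                  # pause: let other work run
    out2 = do_something2(out1)
    yield None                  # pause again
    return do_something3(out2)  # Python 3: delivered as StopIteration.value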
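To see what drives generators like these, here is a toy round-robin scheduler in plain Python: it primes each generator, sends in its input, and keeps resuming whichever job is next until each one returns. The `handle` job and `run_round_robin` scheduler are hypothetical illustrations, but they show the idea event loops such as Tornado's build on: every `yield` is a point where another job may run. Python 3 is assumed, since the return value is read from `StopIteration.value`.

```python
from collections import deque


def handle(name, log):
    """Hypothetical three-step job that pauses (yields) between steps."""
    data = yield              # the first resume delivers the input
    log.append((name, "step1", data))
    yield                     # hand control back to the scheduler
    log.append((name, "step2"))
    yield
    log.append((name, "step3"))
    return name.upper()


def run_round_robin(jobs):
    """Tiny cooperative scheduler: resume each generator in turn until all finish.

    `jobs` is a list of (name, generator, initial_input) tuples.
    """
    results = {}
    ready = deque()
    for name, gen, initial in jobs:
        next(gen)             # prime: run up to the first `yield`
        ready.append((name, gen, initial))
    while ready:
        name, gen, value = ready.popleft()
        try:
            gen.send(value)
        except StopIteration as stop:
            results[name] = stop.value  # the generator's return value
        else:
            ready.append((name, gen, None))
    return results
```

Running two jobs interleaves their steps (a step1, b step1, a step2, ...) even though nothing here uses threads.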
Concurrency
Tornado coroutines
• Some isolation between coroutines
• Explicit yield
• Light "threads"

import tornado.gen
import tornado.httpclient
import tornado.ioloop

@tornado.gen.coroutine
def handle_request():
    # yield hands control back to the IOLoop until the response arrives
    ret = yield tornado.httpclient.AsyncHTTPClient().fetch('http://slow/')
    # do something else

loop = tornado.ioloop.IOLoop.current()
loop.spawn_callback(handle_request)
loop.start()
Concurrency
Coroutines: futures
• Futures are just objects that represent a thing that will complete in the future
• This allows methods to return immediately, but finish the task in the future
• This allows callers to yield execution until the futures they depend on complete
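A minimal `asyncio` sketch of the same idea (Python 3.7+): `start_slow_job`, a hypothetical function, returns a future immediately and arranges for its result to be set later, while the caller suspends on `await` until the future completes. Tornado's futures play this role in the Salt code; `asyncio` is used here only because it ships with Python.

```python
import asyncio


def start_slow_job(loop, value, delay=0.05):
    """Return a Future right away; its result is filled in `delay` seconds later.

    Hypothetical example, not a Salt or Tornado API.
    """
    fut = loop.create_future()
    loop.call_later(delay, fut.set_result, value * 2)
    return fut


async def caller():
    loop = asyncio.get_running_loop()
    fut = start_slow_job(loop, 21)
    done_at_start = fut.done()  # False: the call returned before the work finished
    result = await fut          # suspend this coroutine until the future resolves
    return done_at_start, result
```

`asyncio.run(caller())` returns `(False, 42)`.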
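Concurrency
Coroutines: with futures
• Yield execution, and get return values
• Method looks fairly normal
• Stack traces in here have context
• Easy chaining of futures

@tornado.gen.coroutine
def some_complex_handle(request):
    a = yield is_authd(request)
    if not a:
        return False  # Python 3; on Python 2 use: raise tornado.gen.Return(False)
    ret = yield do_request(request)
    yield [save1(ret), save2(ret)]  # wait for both saves, which run concurrently
    return ret  # Python 3; on Python 2 use: raise tornado.gen.Return(ret)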
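For comparison, here is the same control flow written with Python 3's native `async`/`await`, using `asyncio.gather` as the counterpart of yielding a list of futures. The helpers `is_authd`, `do_request`, `save1`, and `save2` are undefined on the slide, so the stubs below are hypothetical placeholders that only yield to the event loop.

```python
import asyncio

SAVED = []  # records what the hypothetical save helpers stored


async def is_authd(request):
    await asyncio.sleep(0)
    return request.get("token") == "secret"


async def do_request(request):
    await asyncio.sleep(0)
    return {"echo": request["payload"]}


async def save1(ret):
    await asyncio.sleep(0)
    SAVED.append(("save1", ret["echo"]))


async def save2(ret):
    await asyncio.sleep(0)
    SAVED.append(("save2", ret["echo"]))


async def some_complex_handle(request):
    if not await is_authd(request):
        return False
    ret = await do_request(request)
    await asyncio.gather(save1(ret), save2(ret))  # run both saves concurrently
    return ret
```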
Concurrency
Tornado in Salt
• What is Tornado?
  • A Python web framework and asynchronous networking library
• Why Tornado and not asyncio?
  • Free Python 2.x compatibility!
  • A fairly comprehensive set of libraries for it (HTTP, locks, queues, etc.)
Concurrency
Back to the transport interfaces
• AsyncReqChannel
  • send: returns a future
  • crypted_transfer_decode_dictentry: returns a future

ret = yield channel.send(load, timeout=timeout)
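A minimal sketch of what a future-returning `send` with the slide's `tries` and `timeout` parameters could look like in `asyncio`. `FakeAsyncReqChannel` and its simulated `_transmit` are hypothetical; the real Salt channels are Tornado-based and also handle serialization and encryption, which are omitted here. Each try is bounded with `asyncio.wait_for`, and a timed-out try is retried until `tries` is exhausted.

```python
import asyncio


class FakeAsyncReqChannel:
    """Hypothetical stand-in for an async request channel."""

    def __init__(self, latencies):
        self.latencies = list(latencies)  # simulated round-trip time for each try
        self.attempts = 0

    async def _transmit(self, load):
        # Simulated network round trip; not part of any Salt API.
        delay = self.latencies[min(self.attempts, len(self.latencies) - 1)]
        self.attempts += 1
        await asyncio.sleep(delay)
        return {"ret": load}

    async def send(self, load, tries=3, timeout=60):
        for _ in range(tries):
            try:
                # wait_for cancels the in-flight try if it exceeds `timeout`.
                return await asyncio.wait_for(self._transmit(load), timeout)
            except asyncio.TimeoutError:
                continue  # try again
        raise asyncio.TimeoutError("no reply after %d tries" % tries)


async def main():
    channel = FakeAsyncReqChannel([1.0, 1.0, 0.0])  # first two tries are too slow
    ret = await channel.send("ping", tries=3, timeout=0.05)
    return ret, channel.attempts
```

The caller simply writes `ret = await channel.send(load, timeout=...)`, the asyncio spelling of the slide's `ret = yield channel.send(load, timeout=timeout)`.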
Concurrency
Now what?
• Now that we have a real concurrency model, what have we done with it?
  • MultiMinion in a single process (coroutine per connection)
  • Easily implement concurrent networking within Salt:
    • TCP transport
    • IPC
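A sketch of the "coroutine per connection" idea: one process, one event loop, and one coroutine per master, each waiting on its own connection without blocking the others. The connections are simulated with hypothetical `asyncio.Queue` inboxes, using `None` as a shutdown sentinel; this is an illustration, not Salt's MultiMinion code.

```python
import asyncio


async def master_connection(master, inbox, handled):
    """One coroutine per master: process that master's jobs until told to stop."""
    while True:
        job = await inbox.get()  # waits without blocking the other connections
        if job is None:          # hypothetical shutdown sentinel
            return
        handled.append((master, job))


async def multi_minion(jobs_by_master):
    handled = []
    inboxes = {m: asyncio.Queue() for m in jobs_by_master}
    tasks = [asyncio.create_task(master_connection(m, q, handled))
             for m, q in inboxes.items()]
    # Simulate each master sending its jobs, followed by a shutdown sentinel.
    for m, jobs in jobs_by_master.items():
        for job in jobs + [None]:
            inboxes[m].put_nowait(job)
    await asyncio.gather(*tasks)
    return handled
```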
Concurrency problems
Really? Problems?
• Most common pitfalls of concurrent programming:
  • Race conditions and memory collisions
  • Deadlocks
Concurrency problems
Race conditions
• Weird data problems in the reactor: https://github.com/saltstack/salt/issues/23373
• The underlying problem: objects injected into modules (__salt__, etc.) were plain dicts, which aren't thread-safe (or coroutine-safe!)
• The solution? `ContextDict`
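This class of bug can be reproduced with coroutines alone, no threads needed. In the sketch below, two jobs share one module-level dict (a hypothetical stand-in for an injected global, not how Salt populates `__salt__`), and because each `await` lets the other job run, the first job reads back the second job's value. The deterministic result assumes CPython's default `asyncio` scheduling.

```python
import asyncio

# Hypothetical shared global, standing in for an injected module dict.
SHARED = {}


async def job(name):
    SHARED["current_job"] = name
    await asyncio.sleep(0)        # any await lets another coroutine run...
    return SHARED["current_job"]  # ...so this may now hold another job's value


async def run_jobs():
    return await asyncio.gather(job("a"), job("b"))
```

`asyncio.run(run_jobs())` returns `['b', 'b']`: job "a" saw job "b"'s write.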
ContextDict
Copy-on-write thread/coroutine-specific dict
• Works just like a dict
• Exposes a clone() method, which creates a `ChildContextDict`: a thread/coroutine-local copy
• With Tornado's StackContext, a context manager swaps the parent's backing dict for your child copy

from salt.utils.context import ContextDict
import tornado.stack_context

cd = ContextDict(foo='bar')
print(cd['foo'])  # will be bar
with tornado.stack_context.StackContext(cd.clone):
    print(cd['foo'])  # will be bar
    cd['foo'] = 'baz'
    print(cd['foo'])  # will be baz
print(cd['foo'])  # will be bar

More examples: https://github.com/saltstack/salt/blob/develop/tests/unit/context_test.py
Concurrency problems
Deadlocks
• Haven't seen any yet *knock on wood*; in general we avoid these since each coroutine is more or less independent of the others
Concurrency problems
Layers!
• Don't forget: concurrency exists at all layers, including your DC-wide state execution
  • For example: automated highstate enforcement of your whole DC
    • Does it matter if all DB hosts update at once?
    • Does it matter if all web servers update at once?
    • Does it matter if all edge boxes update at once?
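Within a single process, this kind of cap can be sketched with `asyncio.Semaphore`: at most `limit` hosts update at once. `update_host` and `rolling_update` are hypothetical, and the sleep stands in for a state run. Across a whole datacenter, a distributed lock is needed instead, which is what the zk_concurrency states provide (Salt's batch mode, `salt -b`, is another way to cap how many minions run at once).

```python
import asyncio


async def update_host(host, sem, state):
    """Hypothetical rolling update of one host, gated by the semaphore."""
    async with sem:                    # waits while `limit` updates are running
        state["running"] += 1
        state["peak"] = max(state["peak"], state["running"])
        await asyncio.sleep(0.01)      # stand-in for running the highstate
        state["running"] -= 1
        return host


async def rolling_update(hosts, limit):
    sem = asyncio.Semaphore(limit)
    state = {"running": 0, "peak": 0}  # tracks how many updates overlap
    done = await asyncio.gather(*(update_host(h, sem, state) for h in hosts))
    return done, state["peak"]
```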
zk_concurrency
Concurrency controls for state execution

acquire_lock:
  zk_concurrency.lock:
    - name: /trafficserver
    - zk_hosts: 'zookeeper:2181'
    - max_concurrency: 4
    - prereq:
      - service: trafficserver

trafficserver:
  service.running: []

release_lock:
  zk_concurrency.unlock:
    - name: /trafficserver
    - require:
      - service: trafficserver
Future Awesomeness
Things on my "list"
• Transport
  • Failover groups
  • Even better HA (https://github.com/saltstack/salt/issues/25700; get involved in the conversation)
• Concurrency
  • Async ext_pillar
  • Partially concurrent state execution (prefetch, etc.)?
  • Coroutine-based:
    • Reactor
    • Engines
    • Beacons
    • Thorium
©2014 LinkedIn Corporation. All Rights Reserved.