MongoTorino 2013 - BSON Mad Science for fun and profit
DESCRIPTION
The talk will cover how to use BSON directly as an exchange protocol to gain speed and advanced types. BSON is the underlying serialization protocol used by MongoDB to store and represent data. Whenever we retrieve data from MongoDB we get it as BSON, then our drivers decode it just so that our web service can re-encode it as JSON. We will see how to take advantage of BSON for fun and speed, skipping this double step by fetching BSON directly and decoding it on the client side.
TRANSCRIPT
![Page 2: MongoTorino 2013 - BSON Mad Science for fun and profit](https://reader034.vdocument.in/reader034/viewer/2022052523/55515a3cb4c905a8768b4bab/html5/thumbnails/2.jpg)
Who am I
● CTO @ Axant.it, mostly Python company,
with some iOS and Android development.
● Mostly relying on MySQL, MongoDB, Redis
(and SQLite!) for day-to-day data storage
● TurboGears web framework team member
● Contributions to Ming MongoDB ODM
![Page 3: MongoTorino 2013 - BSON Mad Science for fun and profit](https://reader034.vdocument.in/reader034/viewer/2022052523/55515a3cb4c905a8768b4bab/html5/thumbnails/3.jpg)
The Reason
● EuroPython 2013
○ JSON WebServices with Python best practices
talk
● Question raised
○ “We have a service where our bottleneck is
actually the JSON encoding itself, what can we
do?”
![Page 4: MongoTorino 2013 - BSON Mad Science for fun and profit](https://reader034.vdocument.in/reader034/viewer/2022052523/55515a3cb4c905a8768b4bab/html5/thumbnails/4.jpg)
First obvious answer
● Avoid encoding whole data in memory
○ iterencode yields the encoded output one chunk at a time
instead of encoding everything at once.
● Use a faster encoder!
○ There are projects with custom encoders like
GPSD that are very fast and very memory
conservative.
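A minimal sketch of the iterencode suggestion, using only the standard library json module. The sample documents and the stream_json helper are made up for illustration. Note that iterencode yields string chunks of the output, not whole objects, so the response can be written out piece by piece.

```python
import json


def stream_json(documents):
    """Yield the JSON encoding of `documents` one string chunk at a time."""
    encoder = json.JSONEncoder()
    for chunk in encoder.iterencode(documents):
        yield chunk


# Hypothetical sample data, only here to exercise the helper.
docs = [
    {"author": "Mike", "text": "My first blog post!"},
    {"author": "Mike", "text": "My second blog post!"},
]

# A web framework would write each chunk to the response as it arrives;
# here we simply collect them.
chunks = list(stream_json(docs))
```

Joining the chunks gives the same string json.dumps would return, so this only changes when memory is used, not what is sent.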
![Page 5: MongoTorino 2013 - BSON Mad Science for fun and profit](https://reader034.vdocument.in/reader034/viewer/2022052523/55515a3cb4c905a8768b4bab/html5/thumbnails/5.jpg)
Mad answer
● If the JSON encoder is too slow for you
○ Remove JSON encoding
● Looking for the fastest encoding?
○ Don’t encode data at all!
![Page 6: MongoTorino 2013 - BSON Mad Science for fun and profit](https://reader034.vdocument.in/reader034/viewer/2022052523/55515a3cb4c905a8768b4bab/html5/thumbnails/6.jpg)
MongoDB flow
● BSON is the serialization format used by
MongoDB to talk with its clients
● Serving a web client involves decoding BSON
and then re-encoding the data as JSON
MongoDB -> (BSON) -> Driver -> (native) -> WebService -> (JSON) -> Client
![Page 7: MongoTorino 2013 - BSON Mad Science for fun and profit](https://reader034.vdocument.in/reader034/viewer/2022052523/55515a3cb4c905a8768b4bab/html5/thumbnails/7.jpg)
Using BSON!
● Can we totally skip the “BSON decoding” and
“JSON encoding” dance and directly use
BSON?
“BSON [bee · sahn], short for Binary JSON, is a binary-encoded serialization
of JSON-like documents. Like JSON, BSON supports the embedding of
documents and arrays within other documents and arrays. BSON also contains
extensions that allow representation of data types that are not part of the JSON
spec. For example, BSON has a Date type and a BinData type.”
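To make the "advanced types" point concrete, here is a small sketch using only the standard library. It shows the json module rejecting a datetime, then hand-encodes a one-field BSON document with a UTC datetime (type 0x09, a little-endian int64 of milliseconds since the Unix epoch, as described on bsonspec.org). The encode_datetime_doc helper and the sample date are assumptions made for illustration. A real service would use the driver's bson package.

```python
import json
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
when = datetime(2013, 10, 23, tzinfo=timezone.utc)  # illustrative value

# Plain JSON has no date type: the stdlib encoder refuses a datetime.
try:
    json.dumps({"when": when})
    json_accepts_datetime = True
except TypeError:
    json_accepts_datetime = False


def encode_datetime_doc(key, value):
    """Hand-encode {key: value} as BSON, where value is an aware datetime."""
    millis = (value - EPOCH) // timedelta(milliseconds=1)
    # element = type byte 0x09 + key as C string + int64 milliseconds
    element = b"\x09" + key.encode("utf-8") + b"\x00" + struct.pack("<q", millis)
    body = element + b"\x00"  # the document ends with a null byte
    # The leading int32 size counts itself (4 bytes) plus the body.
    return struct.pack("<i", 4 + len(body)) + body


doc = encode_datetime_doc("when", when)
```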
![Page 8: MongoTorino 2013 - BSON Mad Science for fun and profit](https://reader034.vdocument.in/reader034/viewer/2022052523/55515a3cb4c905a8768b4bab/html5/thumbnails/8.jpg)
Target Flow
● BSON decoding on the client can happen
using the js-bson library (or equivalent)
● Skipping BSON decoding on the server is hard
○ It’s built into the MongoDB driver
MongoDB -> (BSON) -> Driver -> (BSON) -> WebService -> (BSON) -> Client
![Page 9: MongoTorino 2013 - BSON Mad Science for fun and profit](https://reader034.vdocument.in/reader034/viewer/2022052523/55515a3cb4c905a8768b4bab/html5/thumbnails/9.jpg)
The Python Driver
MongoDB -> Cursor -> _unpack_response -> bson.decode_all -> _elements_to_dict -> _element_to_dict
![Page 10: MongoTorino 2013 - BSON Mad Science for fun and profit](https://reader034.vdocument.in/reader034/viewer/2022052523/55515a3cb4c905a8768b4bab/html5/thumbnails/10.jpg)
Custom decoding
● bson.decode_all is the function in charge
of decoding BSON objects.
● We need a decoder that partially decodes
the query response but leaves the actual
documents encoded.
● Full BSON spec available on bsonspec.org
![Page 11: MongoTorino 2013 - BSON Mad Science for fun and profit](https://reader034.vdocument.in/reader034/viewer/2022052523/55515a3cb4c905a8768b4bab/html5/thumbnails/11.jpg)
Custom bson.decode_all
$ python test.py
{u'text': u'My first blog post!', u'_id': ObjectId('5267f71a0e9ce56fe55bdc4b'), u'author': u'Mike'}
$ python test.py
'E\x00\x00\x00\x07_id\x00Rg\xf7\x1a\x0e\x9c\xe5o\xe5[\xdcK\x02text\x00\x14\x00\x00\x00My first blog post!\x00\x02author\x00\x05\x00\x00\x00Mike\x00\x00'
![Page 12: MongoTorino 2013 - BSON Mad Science for fun and profit](https://reader034.vdocument.in/reader034/viewer/2022052523/55515a3cb4c905a8768b4bab/html5/thumbnails/12.jpg)
BSON format
Document: SIZE | ONE OR MORE KEY-VALUE ENTRIES | \0
Entry:    TYPE | KEY NAME | \0 | VALUE
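The layout can be checked against the raw bytes printed by test.py on the earlier slide. The following is a rough sketch (Python 3, standard library only) that walks the entries of that document. The walk_document helper is hypothetical and handles only the two value types present in the sample: 0x02 (string) and 0x07 (ObjectId).

```python
import struct

# The raw document shown earlier, as a Python 3 bytes literal.
SAMPLE = (
    b"E\x00\x00\x00"
    b"\x07_id\x00Rg\xf7\x1a\x0e\x9c\xe5o\xe5[\xdcK"
    b"\x02text\x00\x14\x00\x00\x00My first blog post!\x00"
    b"\x02author\x00\x05\x00\x00\x00Mike\x00"
    b"\x00"
)


def walk_document(data):
    """Return [(type_byte, key, value)] for a flat BSON document."""
    size = struct.unpack("<i", data[:4])[0]  # SIZE, counts the whole document
    if size != len(data) or data[-1:] != b"\x00":
        raise ValueError("not a single well-formed BSON document")
    entries = []
    position = 4
    while position < size - 1:  # stop before the trailing \0
        type_byte = data[position]                   # TYPE
        key_end = data.index(b"\x00", position + 1)  # KEY NAME ends at \0
        key = data[position + 1:key_end].decode("utf-8")
        position = key_end + 1
        if type_byte == 0x02:
            # string: int32 length (including its \0), then the bytes
            length = struct.unpack("<i", data[position:position + 4])[0]
            start = position + 4
            value = data[start:start + length - 1].decode("utf-8")
            position = start + length
        elif type_byte == 0x07:
            # ObjectId: 12 raw bytes, shown here as hex
            value = data[position:position + 12].hex()
            position += 12
        else:
            raise NotImplementedError("type 0x%02x not handled" % type_byte)
        entries.append((type_byte, key, value))
    return entries


entries = walk_document(SAMPLE)
```

The leading 'E' is simply the byte 0x45, that is 69, the total size of the document.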
![Page 13: MongoTorino 2013 - BSON Mad Science for fun and profit](https://reader034.vdocument.in/reader034/viewer/2022052523/55515a3cb4c905a8768b4bab/html5/thumbnails/13.jpg)
Custom bson.decode_all
# Original: decode every document into a Python object
obj_size = struct.unpack("<i", data[position:position + 4])[0]
elements = data[position + 4:position + obj_size - 1]
position += obj_size
docs.append(_elements_to_dict(elements, as_class, ...))

# Custom: keep every document as raw BSON bytes
obj_size = struct.unpack("<i", data[position:position + 4])[0]
elements = data[position:position + obj_size]
position += obj_size
docs.append(elements)
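Put together as a complete function, the custom variant could look like the sketch below. The loop and the bounds checks are assumptions added around the fragment shown on the slide, and the two tiny test documents are built by hand.

```python
import struct


def custom_decode_all(data):
    """Split a buffer of concatenated BSON documents without decoding them."""
    docs = []
    position = 0
    end = len(data)
    while position < end:
        if end - position < 4:
            raise ValueError("truncated BSON stream")
        obj_size = struct.unpack("<i", data[position:position + 4])[0]
        # 5 bytes is the smallest legal document: int32 size + trailing \0
        if obj_size < 5 or position + obj_size > end:
            raise ValueError("corrupt BSON stream")
        # Keep size prefix and trailing \0 so each slice is valid BSON.
        docs.append(data[position:position + obj_size])
        position += obj_size
    return docs


# Hand-built documents: {} and {"a": "b"}
EMPTY = b"\x05\x00\x00\x00\x00"
A_B = (b"\x0e\x00\x00\x00"               # size = 14
       b"\x02" b"a\x00"                  # type string, key "a"
       b"\x02\x00\x00\x00" b"b\x00"      # length 2, value "b"
       b"\x00")                          # end of document

raw_docs = custom_decode_all(EMPTY + A_B)
```

Unlike the original, the slice keeps the 4-byte size and the final null byte, which is what lets a client-side decoder consume each document unchanged.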
![Page 14: MongoTorino 2013 - BSON Mad Science for fun and profit](https://reader034.vdocument.in/reader034/viewer/2022052523/55515a3cb4c905a8768b4bab/html5/thumbnails/14.jpg)
Enforcing in PyMongo
● Now that we have a custom decoding
function that leaves the documents
encoded in BSON, we need to force
PyMongo to use it
● _unpack_response is the function in
charge of calling decode_all;
we must convince it to call our version
![Page 15: MongoTorino 2013 - BSON Mad Science for fun and profit](https://reader034.vdocument.in/reader034/viewer/2022052523/55515a3cb4c905a8768b4bab/html5/thumbnails/15.jpg)
MonkeyPatching
and this is the reason why it’s mad science and you should avoid doing it!
![Page 16: MongoTorino 2013 - BSON Mad Science for fun and profit](https://reader034.vdocument.in/reader034/viewer/2022052523/55515a3cb4c905a8768b4bab/html5/thumbnails/16.jpg)
Hijacking decoding
● _unpack_response
○ Called by pymongo to unpack responses received
from the server.
○ Some information is provided, such as the current
cursor id in case of getMore, and other parameters
○ We can use the provided parameters to guess whether
we are decoding a query response or something else.
![Page 17: MongoTorino 2013 - BSON Mad Science for fun and profit](https://reader034.vdocument.in/reader034/viewer/2022052523/55515a3cb4c905a8768b4bab/html5/thumbnails/17.jpg)
Custom unpack_response
import struct
import pymongo.helpers

_real_unpack_response = pymongo.helpers._unpack_response

def custom_unpack_response(response, cursor_id=None, as_class=None, *args, **kw):
    if as_class is None:
        # Not a query, here lies the real trick
        return _real_unpack_response(response, cursor_id, dict, *args, **kw)

    response_flag = struct.unpack("<i", response[:4])[0]
    if response_flag & 2:
        # In case it's an error report
        return _real_unpack_response(response, cursor_id, as_class, *args, **kw)

    result = {}
    result["cursor_id"] = struct.unpack("<q", response[4:12])[0]
    result["starting_from"] = struct.unpack("<i", response[12:16])[0]
    result["number_returned"] = struct.unpack("<i", response[16:20])[0]
    result["data"] = custom_decode_all(response[20:])
    return result

pymongo.helpers._unpack_response = custom_unpack_response
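The offsets used above (4, 12, 16, 20) follow the fixed 20-byte header that precedes the documents in a reply. The sketch below parses that header in one go with the standard library, and can be tried without a server or pymongo. The split_reply helper and the synthetic reply are assumptions made for illustration.

```python
import struct

# flags (int32), cursor id (int64), starting from (int32),
# number returned (int32), all little-endian with no padding.
REPLY_HEADER = struct.Struct("<iqii")


def split_reply(response):
    """Separate the reply header fields from the raw BSON payload."""
    flags, cursor_id, starting_from, number_returned = (
        REPLY_HEADER.unpack_from(response)
    )
    return {
        "query_failure": bool(flags & 2),  # same bit tested on the slide
        "cursor_id": cursor_id,
        "starting_from": starting_from,
        "number_returned": number_returned,
        "payload": response[REPLY_HEADER.size:],  # still-encoded documents
    }


# Synthetic reply: no flags set, cursor id 42, one empty document.
fake_reply = REPLY_HEADER.pack(0, 42, 0, 1) + b"\x05\x00\x00\x00\x00"
parsed = split_reply(fake_reply)
```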
![Page 18: MongoTorino 2013 - BSON Mad Science for fun and profit](https://reader034.vdocument.in/reader034/viewer/2022052523/55515a3cb4c905a8768b4bab/html5/thumbnails/18.jpg)
Fetching BSON
● Our PyMongo queries will now return
BSON encoded data we can then push to
the client
● Let’s fetch the data from the client to close
the loop
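On the server side, pushing the data could be as simple as concatenating the raw documents into the response body. The sketch below is a bare WSGI application, not the service used in the talk: the make_bson_app factory, the fake_fetch stand-in for the patched PyMongo query, and the application/octet-stream content type are all assumptions. Only the /results_bson path comes from the client code in this deck.

```python
def make_bson_app(fetch_raw_documents):
    """Build a WSGI app that serves raw BSON documents back to back."""

    def app(environ, start_response):
        if environ.get("PATH_INFO") != "/results_bson":
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"not found"]
        # No decoding and no JSON encoding: just join the raw bytes.
        body = b"".join(fetch_raw_documents())
        start_response("200 OK", [
            ("Content-Type", "application/octet-stream"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return app


def fake_fetch():
    """Hypothetical stand-in for a query returning raw BSON documents."""
    return [b"\x05\x00\x00\x00\x00", b"\x05\x00\x00\x00\x00"]


app = make_bson_app(fake_fetch)
```

For a quick local try, the app can be served with wsgiref.simple_server.make_server('', 8080, app).serve_forever() from the standard library.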
![Page 19: MongoTorino 2013 - BSON Mad Science for fun and profit](https://reader034.vdocument.in/reader034/viewer/2022052523/55515a3cb4c905a8768b4bab/html5/thumbnails/19.jpg)
Fetching BSON
function fetch_bson() {
    var BSON = bson().BSON;

    var oReq = new XMLHttpRequest();
    oReq.open("GET", 'http://localhost:8080/results_bson', true);
    oReq.responseType = "arraybuffer";
    oReq.onload = function(e) {
        var data = new Uint8Array(oReq.response);
        var offset = 0;
        var results = [];

        while (offset < data.length)
            offset = BSON.deserializeStream(data, offset, 1, results, results.length, {});

        show_output(results);
    }

    oReq.send();
}
![Page 20: MongoTorino 2013 - BSON Mad Science for fun and profit](https://reader034.vdocument.in/reader034/viewer/2022052523/55515a3cb4c905a8768b4bab/html5/thumbnails/20.jpg)
See it in action
![Page 21: MongoTorino 2013 - BSON Mad Science for fun and profit](https://reader034.vdocument.in/reader034/viewer/2022052523/55515a3cb4c905a8768b4bab/html5/thumbnails/21.jpg)
Performance Gain
● It all started as a quest for a performance
boost; how much did it improve?
JSON: 1239.72 req/sec
BSON: 2079.75 req/sec
![Page 22: MongoTorino 2013 - BSON Mad Science for fun and profit](https://reader034.vdocument.in/reader034/viewer/2022052523/55515a3cb4c905a8768b4bab/html5/thumbnails/22.jpg)
False Benchmark
● Benchmark is actually pointless
○ as usual ;)
● We replaced bson.decode_all, which is
written in C, with custom_decode_all, which
is written in Python
○ The two are not really comparable
● Wanna try with PyPy?
![Page 23: MongoTorino 2013 - BSON Mad Science for fun and profit](https://reader034.vdocument.in/reader034/viewer/2022052523/55515a3cb4c905a8768b4bab/html5/thumbnails/23.jpg)
Questions?