python and mongodb
TRANSCRIPT
Agenda
• Introduction to MongoDB • pymongo • CRUD • Aggregation • GridFS • Indexes • ODMs
Ola, I'm Norberto
Norberto Leite, Technical Evangelist
Madrid, Spain @nleite [email protected] http://www.mongodb.com/norberto
MongoDB
MongoDB
GENERAL PURPOSE DOCUMENT DATABASE OPEN-SOURCE
Fully Featured
MongoDB Features
JSON Document Model with Dynamic Schemas
Auto-Sharding for Horizontal Scalability
Text Search
Aggregation Framework and MapReduce
Full, Flexible Index Support and Rich Queries
Built-In Replication for High Availability
Advanced Security
Large Media Storage with GridFS
MongoDB Inc.
400+ employees 2,000+ customers
Over $311 million in funding 13 offices around the world
THE LARGEST ECOSYSTEM
9,000,000+ MongoDB Downloads
250,000+ Online Education Registrants
35,000+ MongoDB User Group Members
35,000+ MongoDB Management Service (MMS) Users
750+ Technology and Services Partners
2,000+ Customers Across All Industries
pymongo
pymongo
• Official MongoDB driver for Python • Rockstar developer team
• Jesse Jiryu Davis, Bernie Hackett • One of the oldest and best-maintained drivers • Python and MongoDB are a natural fit
• BSON is very similar to dictionaries • (everyone likes dictionaries)
• http://api.mongodb.org/python/current/ • https://github.com/mongodb/mongo-python-driver
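To illustrate the "BSON is very similar to dictionaries" point, here is a minimal sketch of a document written as a plain Python dict. The field names and values are invented for illustration; the only assumption about pymongo is its standard type mapping (dict to embedded document, list to array, datetime.datetime to BSON date).

```python
import datetime

# A hypothetical document. Nested dicts become embedded documents,
# lists become arrays, and datetime objects become BSON dates.
talk = {
    "title": "python and mongodb",
    "speaker": {"name": "Norberto", "city": "Madrid"},
    "tags": ["pymongo", "gridfs", "indexes"],
    "created_at": datetime.datetime(2000, 1, 1),  # placeholder date
}

# Reading fields back is ordinary dict and list access.
print(talk["speaker"]["name"])
print(len(talk["tags"]))
```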
pymongo 3.0
• Server discovery spec • Monitoring spec • Faster client startup when connecting to Replica Set • Faster failover • More robust replica set connections • API clean up
Connecting
Connecting
    #!/bin/python
    from pymongo import MongoClient

    # No arguments: connect to localhost on the default port
    mc = MongoClient()
client instance
Connecting
    #!/bin/python
    from pymongo import MongoClient

    uri = 'mongodb://127.0.0.1'
    mc = MongoClient(uri)
Connecting
    #!/bin/python
    from pymongo import MongoClient

    uri = 'mongodb://127.0.0.1'
    # pymongo 3.0 spells the pool option maxPoolSize (max_pool_size in 2.x)
    mc = MongoClient(host=uri, maxPoolSize=10)
Connecting to Replica Set
    #!/bin/python
    from pymongo import MongoClient

    # URI options need a '/' before the '?'
    uri = 'mongodb://127.0.0.1/?replicaSet=MYREPLICA'
    mc = MongoClient(uri)
Connecting to Replica Set
    #!/bin/python
    from pymongo import MongoClient

    uri = 'mongodb://127.0.0.1'
    mc = MongoClient(host=uri, replicaSet='MYREPLICA')
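A replica set URI usually lists more than one seed host, separated by commas, with the options after "/?". The helper below is a hypothetical sketch (build_mongodb_uri is not part of pymongo, and the second port is invented) that assembles such a string with the standard library only. It does not handle credentials or escaping of special characters in host names.

```python
from urllib.parse import urlencode


def build_mongodb_uri(hosts, database="", **options):
    """Assemble a mongodb:// URI from seed hosts and query options."""
    uri = "mongodb://" + ",".join(hosts) + "/" + database
    if options:
        uri += "?" + urlencode(options)
    return uri


uri = build_mongodb_uri(
    ["127.0.0.1:27017", "127.0.0.1:27018"],  # hypothetical seed list
    replicaSet="MYREPLICA",
)
print(uri)
```

The resulting string can be passed to MongoClient(uri) exactly like the hand-written URIs on the slides.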
Database Instance
    #!/bin/python
    from pymongo import MongoClient

    mc = MongoClient()
    db = mc['madrid_pug']
    # or
    db = mc.madrid_pug
database instance
Collection Instance
    #!/bin/python
    from pymongo import MongoClient

    mc = MongoClient()
    coll = mc['madrid_pug']['testcollection']
    # or
    coll = mc.madrid_pug.testcollection
collection instance
CRUD
http://www.ferdychristant.com/blog//resources/Web/$FILE/crud.jpg
Operations
• Insert • Remove • Update • Query • Aggregate • Create Indexes • …
CRUD
• Insert • Remove • Update • Query • Aggregate • Create Indexes • …
Insert
    #!/bin/python
    from pymongo import MongoClient

    mc = MongoClient()
    coll = mc['madrid_pug']['testcollection']

    # insert_one is the pymongo 3.0 replacement for the deprecated insert
    coll.insert_one({'field_one': 'some value'})
Find
    #!/bin/python
    from pymongo import MongoClient

    mc = MongoClient()
    coll = mc['madrid_pug']['testcollection']

    # find() returns a cursor; find_one() would return a single document
    cur = coll.find({'field_one': 'some value'})
    for d in cur:
        print(d)
Update
    #!/bin/python
    from pymongo import MongoClient

    mc = MongoClient()
    coll = mc['madrid_pug']['testcollection']

    # Update the first matching document
    result = coll.update_one(
        {'field_one': 'some value'},
        {"$set": {'field_one': 'new_value'}})
    # or update every matching document
    result = coll.update_many(
        {'field_one': 'some value'},
        {"$set": {'field_one': 'new_value'}})
    print(result)
Remove
    #!/bin/python
    from pymongo import MongoClient

    mc = MongoClient()
    coll = mc['madrid_pug']['testcollection']

    # Delete the first matching document
    result = coll.delete_one({'field_one': 'some value'})
    # or delete every matching document
    result = coll.delete_many({'field_one': 'some value'})
    print(result)
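Printing the result object only shows its repr. The sketch below, assuming pymongo 3.0 or later, runs one insert, find, update, and delete and returns the counters that the result objects expose (inserted_id, matched_count, modified_count, deleted_count). The function name crud_round_trip is made up for this example, and coll is assumed to be a pymongo Collection.

```python
def crud_round_trip(coll):
    """Run one insert, find, update, and delete against `coll`.

    `coll` is assumed to be a pymongo Collection (pymongo 3.0+).
    """
    inserted = coll.insert_one({"field_one": "some value"})
    by_id = {"_id": inserted.inserted_id}

    found = coll.find_one(by_id)  # a dict, or None if nothing matches
    updated = coll.update_one(by_id, {"$set": {"field_one": "new_value"}})
    deleted = coll.delete_one(by_id)

    return {
        "found": found,
        "matched": updated.matched_count,
        "modified": updated.modified_count,
        "deleted": deleted.deleted_count,
    }
```

With a mongod running locally, it could be called as crud_round_trip(MongoClient().madrid_pug.testcollection).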
Aggregate
http://4.bp.blogspot.com/-0IT3rIJkAtM/Uud2pTrGCbI/AAAAAAAABZM/-XUK7j4ZHmI/s1600/snowflakes.jpg
Aggregation Framework
• Analytical workload solution • Pipeline processing • Several Stages
• $match • $group • $project • $unwind • $sort • $limit • $skip • $out
• http://docs.mongodb.org/manual/aggregation/
Aggregation Framework
    #!/bin/python
    from pymongo import MongoClient

    mc = MongoClient()
    coll = mc['madrid_pug']['testcollection']

    # Keep documents that have field_one, then expose it as new_label
    cur = coll.aggregate([
        {"$match": {'field_one': {"$exists": True}}},
        {"$project": {"new_label": "$field_one"}}
    ])
    for d in cur:
        print(d)
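A pipeline is just a list of stage dicts, so it can be built and inspected without a server. The sketch below chains four of the stages listed above ($match, $group, $sort, $limit) to count documents per distinct value of a field. The helper name count_by_field is hypothetical, and $sum is the standard accumulator used inside $group.

```python
def count_by_field(field, top=5):
    """Build a pipeline that counts documents per distinct value of `field`."""
    return [
        {"$match": {field: {"$exists": True}}},                  # skip docs without the field
        {"$group": {"_id": "$" + field, "count": {"$sum": 1}}},  # one bucket per value
        {"$sort": {"count": -1}},                                # most frequent first
        {"$limit": top},
    ]


pipeline = count_by_field("field_one", top=3)
for stage in pipeline:
    print(stage)
```

Against a live collection the list is passed unchanged: for d in coll.aggregate(pipeline): print(d).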
GridFS
http://www.appuntidigitali.it/site/wp-content/uploads/rawdata.png
GridFS
• MongoDB has a 16 MB document size limit • So how can we store data bigger than 16 MB? • Media files (images, PDFs, long binary files …)
• GridFS • Convention more than a feature • All drivers implement this convention
• pymongo is no different • Very flexible approach • Handy out-of-the-box solution
GridFS
    #!/bin/python
    from pymongo import MongoClient
    import gridfs

    mc = MongoClient()
    database = mc.grid_example

    gfs = gridfs.GridFS(database)

    # Open in binary mode: GridFS stores raw bytes
    read_file = open('/tmp/somefile', 'rb')

    # Extra keyword arguments are stored as metadata on the file document
    gfs.put(read_file, author='Norberto', tags=['awesome', 'madrid', 'pug'])
call gridfs lib w/ database
open file for reading
call put to store file and metadata
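The slides stop at put. As a sketch of the other direction, the hypothetical helper below stores some bytes and reads them straight back. It assumes gfs is a gridfs.GridFS instance, that put returns the new file's _id, and that get returns a file-like object with a read method.

```python
def store_and_read_back(gfs, data, **metadata):
    """Store `data` (bytes) through GridFS and read it back by _id.

    `gfs` is assumed to be a gridfs.GridFS instance.
    """
    file_id = gfs.put(data, **metadata)  # put returns the stored file's _id
    grid_out = gfs.get(file_id)          # read-only, file-like object
    return file_id, grid_out.read()
```

For example: store_and_read_back(gridfs.GridFS(MongoClient().grid_example), b'hello', author='Norberto').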
GridFS
    mongo
    nair(mongod-3.1.0-pre-) grid_sample> show dbs
    grid_sample  0.246GB
    local        0.000GB
    nair(mongod-3.1.0-pre-) grid_sample> show collections
    fs.chunks  258.995MB / 252.070MB
    fs.files   0.000MB / 0.016MB
database created
2 collections
chunks collection holds binary data
files collection holds metadata
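The split between the two collections can be pictured with a pure-Python sketch. This is a simplified illustration of the convention, not the driver's code: one metadata document plus a list of numbered chunk documents, using the 255 KiB default chunk size. Real fs.files documents carry more fields (uploadDate, for example), and split_into_gridfs_docs is a made-up name.

```python
CHUNK_SIZE = 255 * 1024  # GridFS default chunk size, in bytes


def split_into_gridfs_docs(file_id, filename, data, chunk_size=CHUNK_SIZE):
    """Return (files_doc, chunk_docs) shaped like fs.files / fs.chunks entries."""
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
    }
    chunk_docs = [
        # files_id points back at the metadata document; n orders the chunks
        {"files_id": file_id, "n": n, "data": data[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunk_docs


payload = b"x" * 600000
files_doc, chunks = split_into_gridfs_docs(1, "somefile", payload)
print(files_doc["length"], len(chunks))
```

Each chunk stays far below the 16 MB document limit, which is why a file of any size fits.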
Indexes
Indexes
• Single Field • Compound • Multikey • Geospatial
• 2d • 2dSphere - GeoJSON
• Full Text • Hash Based • TTL indexes • Unique • Sparse
Single Field Index
    from pymongo import ASCENDING, MongoClient

    mc = MongoClient()
    coll = mc.madrid_pug.testcollection

    # create_index replaces the deprecated ensure_index in pymongo 3.0
    coll.create_index([('some_single_field', ASCENDING)])
indexed field indexing order
Compound Field Index
    from pymongo import ASCENDING, DESCENDING, MongoClient

    mc = MongoClient()
    coll = mc.madrid_pug.testcollection

    # One (field, order) pair per indexed field
    coll.create_index([('field_ascending', ASCENDING),
                       ('field_descending', DESCENDING)])
indexed fields indexing order
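Multikey Field Index
    from pymongo import MongoClient

    mc = MongoClient()
    coll = mc.madrid_pug.testcollection

    # Indexing an array field produces one index entry per array element
    coll.insert_one({'array_field': [1, 2, 54, 89]})
    coll.create_index('array_field')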
indexed field
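Geospatial Field Index
    from pymongo import GEOSPHERE, MongoClient
    import geojson

    mc = MongoClient()
    coll = mc.madrid_pug.testcollection

    # GeoJSON points are [longitude, latitude]
    p = geojson.Point([-73.9858603477478, 40.75929362758241])
    coll.insert_one({'point': p})
    coll.create_index([('point', GEOSPHERE)])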
index type
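The slides show code for four of the listed index types. The remaining ones (unique, sparse, TTL, hash based, full text) are selected through keyword options or the index type string of create_index. The sketch below is a hedged example with invented field names; coll is assumed to be a pymongo Collection, and the literals 1, "hashed", and "text" are the values of pymongo.ASCENDING, pymongo.HASHED, and pymongo.TEXT.

```python
def create_example_indexes(coll):
    """Create several index variants on `coll` and return their names.

    `coll` is assumed to be a pymongo Collection; field names are hypothetical.
    """
    return [
        # Unique: reject documents that repeat an existing value
        coll.create_index([("email", 1)], unique=True),
        # Sparse: only index documents that actually contain the field
        coll.create_index([("nickname", 1)], sparse=True),
        # TTL: documents expire about an hour after their created_at date
        coll.create_index([("created_at", 1)], expireAfterSeconds=3600),
        # Hash based and full text are chosen by the index type string
        coll.create_index([("user_id", "hashed")]),
        coll.create_index([("body", "text")]),
    ]
```

create_index returns the name of the index, so the returned list can be logged or compared with the collection's existing indexes.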
ODM and others
Friends
• mongoengine • http://mongoengine.org/
• Motor • http://motor.readthedocs.org/en/stable/ • async driver • Tornado • Greenlets
• ming • http://sourceforge.net/projects/merciless/
Let's recap
Recap
• MongoDB is awesome • Especially when working with Python
• pymongo • super well supported • fully in sync with MongoDB server
MongoDB 3.0 is here!
Go and Play!
https://www.mongodb.com/lp/download/mongodb-enterprise?jmp=homepage
http://www.mongodb.com/norberto