introducing mongodb
TRANSCRIPT
8/12/2019 Introducing MongoDB
http://slidepdf.com/reader/full/introducing-mongodb 1/57
Introducing: MongoDB
David J. C. Beach
Sunday, August 1, 2010
David Beach
Software Consultant (past 6 years)
Python since v1.4 (late 90’s)
Design, Algorithms, Data Structures
Sometimes Database stuff
not a “frameworks” guy
Organizer: Front Range Pythoneers
Outline
Part I: Trends in Databases
Part II: Mongo Basic Usage
Part III: Advanced Features
Part I: Trends in Databases
Database Trends
Past: “Relational” (RDBMS)
Data stored in Tables, Rows, Columns
Relationships designated by Primary, Foreign keys
Data is controlled & queried via SQL
Trends:
Criticisms of RDBMS
Rigid data model
Hard to scale / distribute
Slow (transactions, disk seeks)
SQL not well standardized
Awkward for modern/dynamic languages
Trends:
Fragmentation
Relational with ORM (Hibernate, SQLAlchemy)
ODBMS / ORDBMS (push OO-concepts into database)
Key-Value Stores (MemcacheDB, Redis, Cassandra)
Graph (neo4j)
Document Oriented (Mongo, Couch, etc...)
Where Mongo Fits
“The Best Features of Document Databases, Key-Value Stores, and RDBMSes.”
What is Mongo
Document-Oriented Database
Produced by 10gen / Implemented in C++
Source Code Available
Runs on Linux, Mac, Windows, Solaris
Database: GNU AGPL v3.0 License
Drivers: Apache License v2.0
Mongo
Advantages
json-style documents (dynamic schemas)
flexible indexing (B-Tree)
replication and high-availability (HA)
automatic sharding support (v1.6)*
easy-to-use API
fast queries (auto-tuning planner)
fast insert & deletes (sometimes trade-offs)
Mongo
Language Bindings
C, C++, Java
Python, Ruby, Perl
PHP, JavaScript
(many more community supported ones)
Mongo
Disadvantages
No Relational Model / SQL
No Explicit Transactions / ACID
Limited Query API
When to use Mongo
Rich semistructured records (Documents)
Transaction isolation not essential
Humongous amounts of data
Need for extreme speed
You hate schema migrations
Part II: Mongo Basic Usage
Installing Mongo
Use a 64-bit OS (Linux, Mac, Windows)
Get Binaries: www.mongodb.org
Run “mongod” process
Installing PyMongo
Download: http://pypi.python.org/pypi/pymongo/1.7
Build with setuptools
(includes C extension for speed)
# python setup.py install
# python setup.py --no-ext install
Mongo Anatomy
Database
Collection
Document
Mongo Server
Getting a Connection
Connection required for using Mongo
>>> import pymongo
>>> connection = pymongo.Connection("localhost")
Finding a Database
Databases = logically separate stores
Navigation using properties
Will create DB if not found
>>> db = connection.mydatabase
Using a Collection
Collection is analogous to Table
Contains documents
Will create collection if not found
>>> blog = db.blog
Inserting
collection.insert(document) => document_id
>>> entry1 = {"title": "Mongo Tutorial",
...           "body": "Here's a document to insert."}   # the document
>>> blog.insert(entry1)
ObjectId('4c3a12eb1d41c82762000001')
Inserting (contd.)
Documents must have ‘_id’ field
Automatically generated unless assigned
12-byte unique binary value
>>> entry1
{'_id': ObjectId('4c3a12eb1d41c82762000001'), 'body': "Here's a document to insert.", 'title': 'Mongo Tutorial'}
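As a rough illustration of the 12-byte ‘_id’, the sketch below decodes the ObjectId shown on this slide using only the standard library. It relies on the documented ObjectId layout, in which the first 4 bytes are a big-endian Unix timestamp. The helper name objectid_timestamp is made up for this example and is not part of PyMongo.

```python
import binascii
import datetime
import struct

def objectid_timestamp(hex_id):
    """Return the creation time embedded in a 24-hex-digit ObjectId string."""
    raw = binascii.unhexlify(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    # The first 4 bytes are seconds since the Unix epoch, big-endian.
    (seconds,) = struct.unpack(">I", raw[:4])
    return datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)

created = objectid_timestamp("4c3a12eb1d41c82762000001")
print(created.isoformat())
```

The remaining 8 bytes are what keep ids unique across machines and processes; this sketch does not interpret them.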
Inserting (contd.)
Documents may have different properties
Properties may be atomic, lists, dictionaries
>>> entry2 = {"title": "Another Post", "body": "Mongo is powerful",
...           "author": "David", "tags": ["Mongo", "Power"]}   # another document
>>> blog.insert(entry2)
ObjectId('4c3a1a501d41c82762000002')
Indexing
May create index on any field
If field is list => index associates all values
>>> blog.ensure_index("author")   # index by single value
>>> blog.ensure_index("tags")     # by multiple values
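To make "index associates all values" concrete, here is a toy in-memory sketch of indexing a list-valued field. It is only an illustration: Mongo's real indexes are B-Trees, and build_index and the sample documents are hypothetical names invented for this example.

```python
from collections import defaultdict

def build_index(documents, field):
    """Map each indexed value to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        if field not in doc:
            continue
        value = doc[field]
        # A list-valued field contributes one index entry per element.
        values = value if isinstance(value, list) else [value]
        for item in values:
            index[item].add(doc_id)
    return index

documents = {
    1: {"author": "David", "tags": ["Mongo", "Power"]},
    2: {"author": "Robot", "tags": ["Mongo"]},
}
tag_index = build_index(documents, "tags")
author_index = build_index(documents, "author")
```

Looking up tag_index["Mongo"] then finds both documents without scanning the collection, which is the point of indexing "tags".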
Bulk Insert
Let’s produce 100,000 fake posts
import random

bulk_entries = []
for i in range(100000):
    entry = {
        "title": "Bulk Entry #%i" % (i + 1),
        "body": "What Content!",
        "author": random.choice(["David", "Robot"]),
        "tags": ["bulk", random.choice(["Red", "Blue", "Green"])],
    }
    bulk_entries.append(entry)
Bulk Insert (contd.)
collection.insert(list_of_documents)
Inserts 100,000 entries into blog
Returns in 2.11 seconds
>>> blog.insert(bulk_entries)
[ObjectId(...), ObjectId(...), ...]
Bulk Insert (contd.)
returns in 7.90 seconds (vs. 2.11 seconds)
driver returns early; DB is still working...
unless you specify “safe=True”
>>> blog.remove()   # clear everything
>>> blog.insert(bulk_entries, safe=True)
Querying
collection.find_one(spec) => document
spec = document of query parameters
>>> blog.find_one({"title": "Bulk Entry #12253"})
{u'_id': ObjectId('4c3a1e411d41c82762018a89'), u'author': u'Robot', u'body': u'What Content!', u'tags': [u'bulk', u'Green'], u'title': u'Bulk Entry #12253'}
Querying (Specs)
Multiple conditions on document => “AND”
Value for tags is an “ANY” match
>>> blog.find_one({"title": "Bulk Entry #12253", "tags": "Green"})
{u'_id': ObjectId('4c3a1e411d41c82762018a89'), u'author': u'Robot', u'body': u'What Content!', u'tags': [u'bulk', u'Green'], u'title': u'Bulk Entry #12253'}
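The “AND” and “ANY” rules can be sketched in plain Python. This is only a model of the matching semantics for simple equality specs, not how the server evaluates queries; the function name matches is an assumption made for illustration.

```python
def matches(document, spec):
    """Return True if every field condition in spec holds for document (AND)."""
    for field, wanted in spec.items():
        if field not in document:
            return False
        actual = document[field]
        if isinstance(actual, list):
            # List-valued field: succeed if ANY element equals the wanted value
            # (or the whole list is equal to it).
            if not (wanted == actual or wanted in actual):
                return False
        elif actual != wanted:
            return False
    return True

post = {
    "title": "Bulk Entry #12253",
    "author": "Robot",
    "tags": ["bulk", "Green"],
}
both_hold = matches(post, {"title": "Bulk Entry #12253", "tags": "Green"})
one_fails = matches(post, {"title": "Bulk Entry #12253", "tags": "Red"})
```

Here both_hold is True because the title is equal and "Green" is one of the tags, while one_fails is False because a single failing condition rejects the document.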
Querying (Multiple)
collection.find(spec) => cursor
new items are fetched in bulk (behind the scenes)
>>> green_items = []
>>> for item in blog.find({"tags": "Green"}):
...     green_items.append(item)
- or -
>>> green_items = list(blog.find({"tags": "Green"}))
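The idea that a cursor hands out items one at a time while fetching them in bulk can be sketched with a generator. This is a toy model under stated assumptions: fake_fetch stands in for the server, offsets stand in for whatever the real wire protocol uses, and none of these names come from PyMongo.

```python
def batched_cursor(fetch_batch, batch_size=2):
    """Yield items one at a time while fetching them in batches."""
    position = 0
    while True:
        batch = fetch_batch(position, batch_size)
        if not batch:
            return
        for item in batch:
            yield item
        position += len(batch)

# Stand-in for the server: a plain list plus a log of round trips.
stored = [{"n": i, "tags": ["Green"]} for i in range(5)]
round_trips = []

def fake_fetch(position, batch_size):
    round_trips.append(position)
    return stored[position:position + batch_size]

green_items = list(batched_cursor(fake_fetch))
```

The caller iterates over five items, but only four requests are made (the last one comes back empty and ends the iteration).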
Querying (Counting)
Use the find() method + count()
Returns number of matches found
>>> blog.find({"tags": "Green"}).count()
16646
Updating
collection.update(spec, document)
updates single document matching spec
“multi=True” => updates all matching docs
>>> item = blog.find_one({"title": "Bulk Entry #12253"})
>>> item["tags"].append("New")   # documents come back as dicts
>>> blog.update({"_id": item["_id"]}, item)
Deleting
use remove(...)
it works like find(...)
>>> blog.remove({"author": "Robot"}, safe=True)
Part III: Advanced Features
Advanced Querying
Regular Expressions {"tags": re.compile(r"^Green|Blue$")}
Nested Values {"foo.bar.x": 3}
$where Clause (JavaScript)
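One detail of the regular expression on this slide is worth checking locally: alternation binds more loosely than the anchors, so ^Green|Blue$ means "starts with Green OR ends with Blue". The sketch below uses Python's re module on a hypothetical list of tag strings; the grouped pattern is an alternative shown for comparison, not something from the slides.

```python
import re

loose = re.compile(r"^Green|Blue$")       # pattern as written on the slide
strict = re.compile(r"^(Green|Blue)$")    # anchors applied to both alternatives

# Hypothetical tag values used only to compare the two patterns.
tags = ["Green", "Blue", "Greenish", "Sky Blue", "Red"]

loose_hits = [tag for tag in tags if loose.search(tag)]
strict_hits = [tag for tag in tags if strict.search(tag)]
print(loose_hits)
print(strict_hits)
```

With the slide's sample data (tags are exactly "Red", "Blue", or "Green") both patterns select the same documents; they only differ once longer tag strings appear.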
Advanced Querying
$lt, $gt, $lte, $gte, $ne
$in, $nin, $mod, $all, $size, $exists, $type
$or, $not
$elemMatch
>>> blog.find({"$or": [{"tags": "Green"}, {"tags": "Blue"}]})
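A few of these operators can be modeled in plain Python to show how operator documents nest inside a spec. This is a simplified sketch for scalar fields only (it ignores list fields and treats a missing field as a non-match, which is not exactly what the server does for every operator). The names matches_spec and field_matches and the "views" field are hypothetical.

```python
import operator

COMPARATORS = {
    "$lt": operator.lt,
    "$lte": operator.le,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$ne": operator.ne,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}

def field_matches(value, condition):
    """A dict condition is a set of operators that must all hold; else equality."""
    if isinstance(condition, dict):
        return all(COMPARATORS[op](value, arg) for op, arg in condition.items())
    return value == condition

def matches_spec(document, spec):
    for key, condition in spec.items():
        if key == "$or":
            # "$or" takes a list of sub-specs; at least one must match.
            if not any(matches_spec(document, sub) for sub in condition):
                return False
        elif key not in document or not field_matches(document[key], condition):
            return False
    return True

post = {"author": "David", "views": 42}
in_range = matches_spec(post, {"views": {"$gt": 10, "$lte": 42}})
either = matches_spec(post, {"$or": [{"author": "Robot"}, {"views": {"$lt": 50}}]})
excluded = matches_spec(post, {"author": {"$in": ["Robot"]}})
```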
Advanced Querying
collection.find(...)
sort(“name”) - sorting
limit(...) & skip(...) [like LIMIT & OFFSET]
distinct(...) [like SQL’s DISTINCT]
collection.group(...) - like SQL’s GROUP BY
>>> blog.find().limit(50)                  # find 50 articles
>>> blog.find().sort("title").limit(30)    # 30 titles
>>> blog.find().distinct("author")         # unique author names
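For readers more at home in Python than SQL, the same four ideas can be tried on an in-memory list. This sketch only mirrors what sort, skip/limit, distinct, and a grouped count compute; it does not use PyMongo, and the sample posts are invented.

```python
from collections import Counter

posts = [
    {"title": "Delta", "author": "David"},
    {"title": "Alpha", "author": "Robot"},
    {"title": "Charlie", "author": "David"},
    {"title": "Bravo", "author": "Robot"},
    {"title": "Echo", "author": "David"},
]

# sort("title")
by_title = sorted(posts, key=lambda post: post["title"])

# skip(1).limit(2): drop the first result, then keep the next two
page = by_title[1:1 + 2]

# distinct("author")
authors = sorted({post["author"] for post in posts})

# a GROUP BY style count of posts per author
posts_per_author = Counter(post["author"] for post in posts)
```

The difference in practice is that the database applies these steps server-side, so only the requested page crosses the network.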
Map/Reduce
collection.map_reduce(mapper, reducer)
ultimate in querying power
distribute across multiple nodes
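The map/reduce pattern itself fits in a few lines of single-process Python. The sketch below is a local model of the idea (map emits key/value pairs, values are grouped by key, reduce collapses each group); it is not the collection.map_reduce API, and the helper names and sample documents are assumptions for illustration.

```python
from collections import defaultdict

def map_reduce(documents, mapper, reducer):
    """Run mapper over every document, group emitted values by key, reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}

def tag_mapper(doc):
    # Emit (tag, 1) once per tag on the document.
    for tag in doc["tags"]:
        yield tag, 1

def count_reducer(key, values):
    return sum(values)

posts = [
    {"tags": ["bulk", "Green"]},
    {"tags": ["bulk", "Red"]},
    {"tags": ["bulk", "Green"]},
]
tag_counts = map_reduce(posts, tag_mapper, count_reducer)
```

Because each key's group can be reduced independently, the grouping step is what lets the work be spread across nodes.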
Map/Reduce
Visualized
[Diagram not reproduced]
Diagram Credit:
by Tom White; O’Reilly Books, Chapter 2, page 20
also see: Map/Reduce: A Visual Explanation
mySQL
    SELECT
      Dim1, Dim2,
      SUM(Measure1) AS MSum,
      COUNT(*) AS RecordCount,
      AVG(Measure2) AS MAvg,
      MIN(Measure1) AS MMin,
      MAX(CASE WHEN Measure2 < 100 THEN Measure2 END) AS MMax
    FROM DenormAggTable
    WHERE (Filter1 IN ('A', 'B'))
      AND (Filter2 = 'C')
      AND (Filter3 > 123)
    GROUP BY Dim1, Dim2
    HAVING (MMin > 0)
    ORDER BY RecordCount DESC
    LIMIT 4, 8

MongoDB
    db.runCommand({
      mapreduce: "DenormAggCollection",
      query: {
        filter1: { '$in': [ 'A', 'B' ] },
        filter2: 'C',
        filter3: { '$gt': 123 }
      },
      map: function () {
        emit(
          { d1: this.Dim1, d2: this.Dim2 },
          { msum: this.measure1, recs: 1, mmin: this.measure1,
            mmax: this.measure2 < 100 ? this.measure2 : 0 }
        );
      },
      reduce: function (key, vals) {
        var ret = { msum: 0, recs: 0, mmin: 0, mmax: 0 };
        for (var i = 0; i < vals.length; i++) {
          ret.msum += vals[i].msum;
          ret.recs += vals[i].recs;
          if (vals[i].mmin < ret.mmin) ret.mmin = vals[i].mmin;
          if ((vals[i].mmax < 100) && (vals[i].mmax > ret.mmax))
            ret.mmax = vals[i].mmax;
        }
        return ret;
      },
      finalize: function (key, val) {
        val.mavg = val.msum / val.recs;
        return val;
      },
      out: 'result1',
      verbose: true
    });
    db.result1.
      find({ mmin: { '$gt': 0 } }).
      sort({ recs: -1 }).
      skip(4).
      limit(8);

Notes
Grouped dimension columns are pulled out as keys in the map function, reducing the size of the working set.
Measures must be manually aggregated.
Aggregates depending on record counts must wait until finalization.
Measures can use procedural logic.
Filters have an ORM/ActiveRecord-looking style.
Aggregate filtering must be applied to the result set, not in the map/reduce.
Ascending: 1; Descending: -1

Revision 4, created March 2010. Rick Osborne, rickosborne.org
http://rickosborne.org/download/SQL-to-MongoDB.pdf
Map/Reduce Examples
Health Clinic Example
Person registers with the Clinic
Weighs in on the scale
1 year => comes in 100 times
Health Clinic Example
person = {
    "name": "Bob",
    "weighings": [
        {"date": date(2009, 1, 15), "weight": 165.0},
        {"date": date(2009, 2, 12), "weight": 163.2},
        ...
    ]
}
Map/Reduce
Insert Script
import datetime
import random

all_people = []
for i in range(N):   # N = number of people to generate
    person = {'name': 'person%04i' % i}
    weighings = person['weighings'] = []
    std_weight = random.uniform(100, 200)
    for w in range(100):
        date = (datetime.datetime(2009, 1, 1) +
                datetime.timedelta(days=random.randint(0, 365)))
        weight = random.normalvariate(std_weight, 5.0)
        weighings.append({'date': date, 'weight': weight})
    # keep each person's weighings in date order (needed later for bsearch)
    weighings.sort(key=lambda x: x['date'])
    all_people.append(person)
Insert Data
Performance
[Chart: Insert time, log scale]
1k: 3.14s
10k: 29.5s
100k: 292s
Map/Reduce
Total Weight by Day
map_fn = Code("""function () {
    this.weighings.forEach(function(z) {
        emit(z.date, z.weight);
    });
}""")

reduce_fn = Code("""function (key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        total += values[i];
    }
    return total;
}""")

result = people.map_reduce(map_fn, reduce_fn)
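A property worth checking for any reduce function is that it gives the same answer when applied to partial results, since MongoDB may call reduce more than once for the same key. The sketch below tests that for a Python stand-in of the summing reducer; the weights are made-up sample values and reduce_total is a hypothetical name.

```python
def reduce_total(key, values):
    """Python stand-in for the summing reduce function."""
    return sum(values)

day = "2009-01-15"
weights = [165.0, 163.25, 170.5, 158.0]

# Reduce everything in one call ...
one_pass = reduce_total(day, weights)

# ... or reduce two halves separately, then reduce the partial totals.
partials = [reduce_total(day, weights[:2]), reduce_total(day, weights[2:])]
two_pass = reduce_total(day, partials)
```

A reducer that returned an average instead of a total would fail this check, which is why averages are usually carried as a sum and a count and divided only at the end.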
Map/Reduce
Total Weight by Day
>>> for doc in result.find():
...     print doc
{u'_id': datetime.datetime(2009, 1, 1, 0, 0), u'value': 39136.600753163315}
{u'_id': datetime.datetime(2009, 1, 2, 0, 0), u'value': 41685.341024046182}
{u'_id': datetime.datetime(2009, 1, 3, 0, 0), u'value': 38232.326554504165}
... lots more ...
Total Weight by Day
Performance
[Chart: MapReduce time, log scale]
1k: 4.29s
10k: 38.8s
100k: 384s
Map/Reduce
Weight on Day
map_fn = Code("""function () {
    var target_date = new Date(2009, 9, 5);
    var pos = bsearch(this.weighings, "date", target_date);
    var recent = this.weighings[pos];
    emit(this._id, { name: this.name, date: recent.date, weight: recent.weight });
};""")

reduce_fn = Code("""function (key, values) {
    return values[0];
};""")

result = people.map_reduce(map_fn, reduce_fn, scope={"bsearch": bsearch})
Map/Reduce
bsearch() function
bsearch = Code("""function(array, prop, value) {
    var min, max, mid, midval;
    for (min = 0, max = array.length - 1; min <= max; ) {
        mid = min + Math.floor((max - min) / 2);
        midval = array[mid][prop];
        if (value === midval) {
            break;
        } else if (value > midval) {
            min = mid + 1;
        } else {
            max = mid - 1;
        }
    }
    return (midval > value) ? mid - 1 : mid;
};""")
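To see what bsearch() returns, here is a line-by-line Python port that can be run without a database, compared against the standard library's bisect module. It is a sketch assuming the array is sorted and its values are distinct; the empty-array guard and the sample weighings are additions for this example.

```python
import bisect
import datetime

def bsearch(array, prop, value):
    """Index of the last item with array[i][prop] <= value, or -1 if none.

    Assumes array is sorted by prop and the prop values are distinct.
    """
    if not array:
        return -1
    low, high = 0, len(array) - 1
    while low <= high:
        mid = low + (high - low) // 2
        midval = array[mid][prop]
        if value == midval:
            break
        elif value > midval:
            low = mid + 1
        else:
            high = mid - 1
    return mid - 1 if midval > value else mid

weighings = [
    {"date": datetime.date(2009, 1, 15), "weight": 165.0},
    {"date": datetime.date(2009, 2, 12), "weight": 163.2},
    {"date": datetime.date(2009, 6, 30), "weight": 161.5},
]
dates = [w["date"] for w in weighings]
target = datetime.date(2009, 3, 1)

pos = bsearch(weighings, "date", target)
same_pos = bisect.bisect_right(dates, target) - 1
```

Both approaches pick the February weighing as the most recent one on or before the target date; note that bisect_right returns an insertion point, so one must be subtracted to get the matching index.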
Weight on Day
Performance
[Chart: MapReduce time, log scale]
1k: 1.23s
10k: 10s
100k: 108s
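Weight on Day
(Python Version)
import bisect
import datetime

target_date = datetime.datetime(2009, 10, 5)
for person in people.find():
    dates = [w['date'] for w in person['weighings']]
    # bisect_right returns the insertion point, so pos - 1 is the most recent
    # weighing on or before target_date (the same item bsearch() selects).
    pos = bisect.bisect_right(dates, target_date)
    val = person['weighings'][pos - 1]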
Map/Reduce
Performance
[Chart: MapReduce vs. Python, log scale]
        MapReduce   Python
1k      1.23s       0.37s
10k     10s         2.2s
100k    108s        26s
Summary
Resources
www.10gen.com
www.mongodb.org
MongoDB: The Definitive Guide (O’Reilly)
PyMongo: api.mongodb.org/python
END OF SLIDES
Chalkboard
is not Comic Sans
This is Chalkboard, not Comic Sans.
This isn’t Chalkboard, it’s Comic Sans.
does it matter, anyway?