webinar: transitioning from sql to mongodb
TRANSCRIPT
Transitioning from SQL to MongoDB
Joe Drumgoole, Director of Developer Advocacy, EMEA
V1.2
Before We Begin
• This webinar is being recorded
• Use the chat window for
  • Technical assistance
  • Q&A
• The MongoDB team will answer quick questions in real time
• “Common” questions will be reviewed at the end of the webinar
Who is your Presenter?
• Programmer
• Developer Manager
• Entrepreneur
• Geek
• Sometime pre-sales guy
MongoDB: The New Default Database
Document Data Model
Open-Source
Fully Featured
High Performance
Scalable
{
  name: "John Smith",
  pfxs: ["Dr.", "Mr."],
  address: "10 3rd St.",
  phone: {
    home: 1234567890,
    mobile: 1234568138
  }
}
It’s a JSON Database
{u'_id': ObjectId('58511bfbb26a8803b6b4d56c'),
 u'batchID': 108,
 u'member': {u'chapters': [{u'id': 1775736,
                            u'name': u'London MongoDB User Group',
                            u'urlname': u'London-MongoDB-User-Group'},
                           {u'id': 1780459,
                            u'name': u'Stockholm MongoDB User Group',
                            u'urlname': u'Stockholm-MongoDB-User-Group'},
                           {u'id': 3478392,
                            u'name': u'Dublin MongoDB User Group',
                            u'urlname': u'DublinMUG'},
                           {…, u'urlname': u'Mannheim-MongoDB-User-Group'}],
             u'city': u'Dublin',
             u'country': u'Ireland',
             u'events_attended': 13,
             u'is_organizer': True,
             u'join_time': datetime.datetime(2013, 10, 30, 17, 5, 31),
             u'last_access_time': datetime.datetime(2016, 12, 13, 15, 45, 27),
             u'location': {u'coordinates': [-6.25, 53.33000183105469],
                           u'type': u'Point'},
             u'member_id': 99473492,
             u'member_name': u'Joe Drumgoole',
             u'photo_thumb_url': u'http://photos2.meetupstatic.com/photos/member/e/5/0/1/thumb_255178625.jpeg'},
 u'timestamp': datetime.datetime(2016, 12, 14, 10, 16, 27, 607000)}
Typed
Hierarchical, with lists and maps
Geo-Spatial
Functions of a Database
• Durable data storage
• Structural representation of data
• CRUD operations
• Authentication and authorization
• Programmer Efficiency?
What Are Your Developers Doing All Day?
1964 - IMS
1977 - Oracle
1984 - dBASE
1991 - MySQL
2009 - MongoDB
The Challenge is Product Development
Category            | 1976                                                      | 2016
--------------------+-----------------------------------------------------------+------------------------------------------------------------------
Business Data Goals | Process payroll monthly                                   | Process real-time billing to the minute for 1m customers
Release Schedule    | Semi-annually                                             | Monthly
Application/Code    | COBOL, Fortran, Algol, PL/1, assembler, proprietary tools | Python, Java, Node.js, Ruby, PHP, Perl, Scala, Erlang and the rest
Tools               | None                                                      | Apache, LAMP, MEAN, Eclipse, IntelliJ, SourceForge etc.
Database            | I/VSAM, early RDBMS                                       | RDBMS, NoSQL
Rectangles are 1976. Maps and Lists are 2016
{
  customer_id: 1,
  first_name: "Mark",
  last_name: "Smith",
  city: "San Francisco",
  phones: [
    { type: "work", number: "1-800-555-1212" },
    { type: "home", number: "1-800-555-1313", DNC: true },
    { type: "home", number: "1-800-555-1414", DNC: true }
  ]
}
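In Java terms, the document above is nothing more than nested Maps and Lists. The following is a minimal sketch using only java.util; the class name ContactShape and its helper methods are illustrative and not part of the webinar.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: builds the customer document from the slide as plain
// java.util Maps and Lists, with no database involved.
class ContactShape {

    // Hypothetical helper: one entry of the "phones" list.
    static Map<String, Object> phone(String type, String number, Boolean dnc) {
        Map<String, Object> p = new LinkedHashMap<>();
        p.put("type", type);
        p.put("number", number);
        if (dnc != null) {
            p.put("DNC", dnc); // optional field: only present when set
        }
        return p;
    }

    static Map<String, Object> build() {
        List<Map<String, Object>> phones = new ArrayList<>();
        phones.add(phone("work", "1-800-555-1212", null));
        phones.add(phone("home", "1-800-555-1313", true));
        phones.add(phone("home", "1-800-555-1414", true));

        Map<String, Object> customer = new LinkedHashMap<>();
        customer.put("customer_id", 1);
        customer.put("first_name", "Mark");
        customer.put("last_name", "Smith");
        customer.put("city", "San Francisco");
        customer.put("phones", phones); // a list nested inside the map
        return customer;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

Note that the work phone simply has no DNC key; nothing in the shape forces every list entry to carry the same fields.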
An Actual Code Example
Let’s compare and contrast RDBMS/SQL to MongoDB development using Java over the course of a few weeks.
Some ground rules:
1. Observe rules of Software Engineering 101: assume separation of application, Data Access Layer, and database implementation
2. Data Access Layer must be able to
   a. Expose simple, functional, data-only interfaces to the application
      • No ORM, frameworks, compile-time bindings, special tools
   b. Exploit high performance features of database
3. Focus on core data handling code and avoid distractions that require the same amount of work in both technologies
   a. No exception or error handling
   b. Leave out DB connection and other setup resources
4. Day counts are a proxy for progress, not actual time to complete indicated task
5. Don’t expect to cut and paste this code
The Task: Saving and Fetching Contact data
Start with this simple, flat shape in the Data Access Layer:
    Map m = new HashMap();
    m.put("name", "Joe D");
    m.put("id", "K1");
And assume we save it in this way:
    id = save(Map m)
And assume we fetch one by primary key in this way:
    Map m = fetch(String id)
Brace yourself…..
Day 1: Initial efforts for both technologies
SQL
DDL: create table contact ( … )
init() {
    contactInsertStmt = connection.prepareStatement(
        "insert into contact ( id, name ) values ( ?,? )");
    fetchStmt = connection.prepareStatement(
        "select id, name from contact where id = ?");
}
save(Map m) {
    contactInsertStmt.setString(1, m.get("id"));
    contactInsertStmt.setString(2, m.get("name"));
    contactInsertStmt.execute();
}
Map fetch(String id) {
    Map m = null;
    fetchStmt.setString(1, id);
    rs = fetchStmt.executeQuery();
    if (rs.next()) {
        m = new HashMap();
        m.put("id", rs.getString(1));
        m.put("name", rs.getString(2));
    }
    return m;
}
MongoDB
DDL: none
save(Map m) {
    collection.insertOne(new Document(m));
}
Map fetch(String id) {
    Map m = null;
    c = collection.find(eq("id", id)).iterator();
    if (c.hasNext()) {
        m = (Map) c.next();
    }
    return m;
}
{ "name" : "Joe D", "id" : "K1" }
Day 2: Add simple fields
    m.put("name", "Joe D");
    m.put("id", "K1");
    m.put("title", "Mr.");
    m.put("hireDate", new Date(2011, 11, 1));
• Capturing title and hireDate is part of adding a new business feature
• It was pretty easy to add two fields to the structure
• …but now we have to change our persistence code
SQL Day 2 (changed lines marked)
DDL: alter table contact add title varchar(8);
     alter table contact add hireDate date;
init() {
    contactInsertStmt = connection.prepareStatement(
        "insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )");   // changed
    fetchStmt = connection.prepareStatement(
        "select id, name, title, hiredate from contact where id = ?");            // changed
}
save(Map m) {
    contactInsertStmt.setString(1, m.get("id"));
    contactInsertStmt.setString(2, m.get("name"));
    contactInsertStmt.setString(3, m.get("title"));      // new
    contactInsertStmt.setDate(4, m.get("hireDate"));     // new
    contactInsertStmt.execute();
}
Map fetch(String id) {
    Map m = null;
    fetchStmt.setString(1, id);
    rs = fetchStmt.executeQuery();
    if (rs.next()) {
        m = new HashMap();
        m.put("id", rs.getString(1));
        m.put("name", rs.getString(2));
        m.put("title", rs.getString(3));                 // new
        m.put("hireDate", rs.getDate(4));                // new
    }
    return m;
}
Consequences:
1. Code release schedule linked to database upgrade (new code cannot run on old schema)
2. Issues with case sensitivity starting to creep in (many RDBMS are case insensitive for column names, but code is case sensitive)
3. Changes require careful mods in 4 places
4. Beginning of technical debt
MongoDB Day 2
save(Map m) {
    collection.insertOne(new Document(m));
}
Map fetch(String id) {
    Map m = null;
    c = collection.find(eq("id", id)).iterator();
    if (c.hasNext()) {
        m = (Map) c.next();
    }
    return m;
}
Advantages:
1. Zero time and money spent on overhead code
2. Code and database not physically linked
3. New material with more fields can be added into existing collections; backfill is optional
4. Names of fields in database precisely match key names in code layer and directly match on name, not indirectly via positional offset
5. No technical debt is created
✔ NO CHANGE
Day 3: Add list of phone numbers
    m.put("name", "Joe D");
    m.put("id", "K1");
    m.put("title", "Mr.");
    m.put("hireDate", new Date(2011, 11, 1));
    n1.put("type", "work");
    n1.put("number", "1-800-555-1212");
    list.add(n1);
    n2.put("type", "home");
    n2.put("number", "1-866-444-3131");
    list.add(n2);
    m.put("phones", list);
• It was still pretty easy to add this data to the structure
• …but meanwhile, in the persistence code…
REALLY brace yourself…
SQL Day 3 changes: Option 1: Assume just 1 work and 1 home phone number
DDL: alter table contact add work_phone varchar(16);
     alter table contact add home_phone varchar(16);
init() {
    contactInsertStmt = connection.prepareStatement(
        "insert into contact ( id, name, title, hiredate, work_phone, home_phone ) values ( ?,?,?,?,?,? )");
    fetchStmt = connection.prepareStatement(
        "select id, name, title, hiredate, work_phone, home_phone from contact where id = ?");
}
save(Map m) {
    contactInsertStmt.setString(1, m.get("id"));
    contactInsertStmt.setString(2, m.get("name"));
    contactInsertStmt.setString(3, m.get("title"));
    contactInsertStmt.setDate(4, m.get("hireDate"));
    for (Map onePhone : m.get("phones")) {
        String t = onePhone.get("type");
        String n = onePhone.get("number");
        if (t.equals("work")) {
            contactInsertStmt.setString(5, n);
        } else if (t.equals("home")) {
            contactInsertStmt.setString(6, n);
        }
    }
    contactInsertStmt.execute();
}
Map fetch(String id) {
    Map m = null;
    fetchStmt.setString(1, id);
    rs = fetchStmt.executeQuery();
    if (rs.next()) {
        m = new HashMap();
        m.put("id", rs.getString(1));
        m.put("name", rs.getString(2));
        m.put("title", rs.getString(3));
        m.put("hireDate", rs.getDate(4));
        List list = new ArrayList();
        Map onePhone;
        onePhone = new HashMap();
        onePhone.put("type", "work");
        onePhone.put("number", rs.getString(5));
        list.add(onePhone);
        onePhone = new HashMap();
        onePhone.put("type", "home");
        onePhone.put("number", rs.getString(6));
        list.add(onePhone);
        m.put("phones", list);
    }
    return m;
}
This is just plain bad….
SQL Day 3 changes: Option 2: Proper approach with multiple phone numbers
DDL: create table phones ( … )
init() {
    contactInsertStmt = connection.prepareStatement(
        "insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )");
    c2stmt = connection.prepareStatement(
        "insert into phones (id, type, number) values (?, ?, ?)");
    fetchStmt = connection.prepareStatement(
        "select id, name, title, hiredate, type, number from contact, phones where phones.id = contact.id and contact.id = ?");
}
save(Map m) {
    startTrans();
    contactInsertStmt.setString(1, m.get("id"));
    contactInsertStmt.setString(2, m.get("name"));
    contactInsertStmt.setString(3, m.get("title"));
    contactInsertStmt.setDate(4, m.get("hireDate"));
    for (Map onePhone : m.get("phones")) {
        c2stmt.setString(1, m.get("id"));
        c2stmt.setString(2, onePhone.get("type"));
        c2stmt.setString(3, onePhone.get("number"));
        c2stmt.execute();
    }
    contactInsertStmt.execute();
    endTrans();
}
Map fetch(String id) {
    Map m = null;
    fetchStmt.setString(1, id);
    rs = fetchStmt.executeQuery();
    int i = 0;
    List list = new ArrayList();
    while (rs.next()) {
        if (i == 0) {
            m = new HashMap();
            m.put("id", rs.getString(1));
            m.put("name", rs.getString(2));
            m.put("title", rs.getString(3));
            m.put("hireDate", rs.getDate(4));
            m.put("phones", list);
        }
        Map onePhone = new HashMap();
        onePhone.put("type", rs.getString(5));
        onePhone.put("number", rs.getString(6));
        list.add(onePhone);
        i++;
    }
    return m;
}
This took time and money
SQL Day 5: Zero or More Entries
Whoops! And it’s also wrong! We did not design the query accounting for contacts that have no phone number. Thus, we have to change the join to an outer join.
init() {
    contactInsertStmt = connection.prepareStatement(
        "insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )");
    c2stmt = connection.prepareStatement(
        "insert into phones (id, type, number) values (?, ?, ?)");
    fetchStmt = connection.prepareStatement(
        "select A.id, A.name, A.title, A.hiredate, B.type, B.number from contact A left outer join phones B on (A.id = B.id) where A.id = ?");
}
But this ALSO means we have to change the unwind logic:
while (rs.next()) {
    if (i == 0) {
        // …
    }
    String s = rs.getString(5);
    if (s != null) {
        Map onePhone = new HashMap();
        onePhone.put("type", s);
        onePhone.put("number", rs.getString(6));
        list.add(onePhone);
    }
    i++;
}
This took more time and money!
…but at least we have a DAL…right?
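To make the “zero or more” problem concrete, here is a minimal, runnable sketch of the Day 5 unwind logic. It assumes each joined row is modeled as an Object[] of {id, name, title, type, number} in place of a JDBC ResultSet (hireDate is left out for brevity); the class name OuterJoinUnwind and the sample contact "K2" are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: rows stand in for the result of
// "contact A left outer join phones B", one Object[] per row.
class OuterJoinUnwind {

    static Map<String, Object> unwind(List<Object[]> rows) {
        Map<String, Object> m = null;
        List<Map<String, Object>> phones = new ArrayList<>();
        for (Object[] row : rows) {
            if (m == null) {
                // Contact columns repeat on every row; read them once.
                m = new HashMap<>();
                m.put("id", row[0]);
                m.put("name", row[1]);
                m.put("title", row[2]);
                m.put("phones", phones);
            }
            // The outer join yields null phone columns for a contact with
            // no phones; skip those instead of adding an empty entry.
            if (row[3] != null) {
                Map<String, Object> onePhone = new HashMap<>();
                onePhone.put("type", row[3]);
                onePhone.put("number", row[4]);
                phones.add(onePhone);
            }
        }
        return m; // null when the query returned no rows at all
    }

    public static void main(String[] args) {
        List<Object[]> withPhones = new ArrayList<>();
        withPhones.add(new Object[] {"K1", "Joe D", "Mr.", "work", "1-800-555-1212"});
        withPhones.add(new Object[] {"K1", "Joe D", "Mr.", "home", "1-866-444-3131"});

        List<Object[]> noPhones = new ArrayList<>();
        noPhones.add(new Object[] {"K2", "Jane D", "Ms.", null, null}); // hypothetical contact

        System.out.println(unwind(withPhones));
        System.out.println(unwind(noPhones));
    }
}
```

The three outcomes (no rows, one row with null phone columns, several rows) each need their own handling, which is the extra logic the outer join forces into the DAL.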
MongoDB Day 3
Advantages:
1. Zero time and money spent on overhead code
2. No need to fear fields that are “naturally occurring” lists containing data specific to the parent structure and thus do not benefit from normalization and referential integrity
3. Safe from “Zero or More” entities
save(Map m) {
    collection.insertOne(new Document(m));
}
Map fetch(String id) {
    Map m = null;
    c = collection.find(eq("id", id)).iterator();
    if (c.hasNext()) {
        m = (Map) c.next();
    }
    return m;
}
✔ NO CHANGE
By Day 14, our structure looks like this:
    n4.put("geo", "US-EAST");
    n4.put("startupApps", new String[] { "app1", "app2", "app3" });
    list2.add(n4);
    n5.put("geo", "EMEA");
    n5.put("startupApps", new String[] { "app6" });
    n5.put("useLocalNumberFormats", false);
    list2.add(n5);
    m.put("preferences", list2);
    n6.put("optOut", true);
    n6.put("assertDate", someDate);
    seclist.add(n6);
    m.put("attestations", seclist);
    m.put("security", mapOfDataCreatedByExternalSource);
SQL Day 14
Error: Could not fit all the code into this space.
But very likely, among other things:
• n4.put("startupApps", new String[] { "app1", "app2", "app3" });
  was implemented as a single semicolon-delimited string, or we had to create another table and change the DAL
• m.put("security", anotherMapOfData);
  was implemented by flattening it out and storing a subset of fields, or as a blob
MongoDB Day 14 – and every other day
Advantages:
1. Zero time and money spent on overhead code
2. Persistence is so easy, flexible and backward compatible that the persistor does not upward-influence the shapes we want to persist, i.e. the tail does not wag the dog
save(Map m) {
    collection.insertOne(new Document(m));
}
Map fetch(String id) {
    Map m = null;
    c = collection.find(eq("id", id)).iterator();
    if (c.hasNext()) {
        m = (Map) c.next();
    }
    return m;
}
✔ NO CHANGE
But what if we must do a join?
Both RDBMS and MongoDB will have a PhoneTransactions table/collection
{
  customer_id: 1,
  first_name: "Mark",
  last_name: "Smith",
  city: "San Francisco",
  phones: [
    { type: "work", number: "1-800-555-1212" },
    { type: "home", number: "1-800-555-1313", DNC: true },
    { type: "home", number: "1-800-555-1414", DNC: true }
  ]
}
PhoneTransactions
{ number: "1-800-555-1212", target: "1-999-238-3423", duration: 20 }
{ number: "1-800-555-1212", target: "1-444-785-6611", duration: 243 }
{ number: "1-800-555-1414", target: "1-645-331-4345", duration: 132 }
{ number: "1-800-555-1414", target: "1-990-875-2134", duration: 71 }
SQL Join Attempt #1
select A.id, A.lname, B.type, B.number, C.target, C.duration
from contact A, phones B, phonestx C
where A.id = B.id and B.number = C.number
 id  | lname     | type | number         | target         | duration
-----+-----------+------+----------------+----------------+----------
 g9  | Moschetti | home | 1-900-555-1212 | 1-222-707-7070 | 23
 g10 | Kalan     | work | 1-999-444-9999 | 1-222-907-7071 | 7
 g9  | Moschetti | work | 1-800-989-2231 | 1-987-707-7072 | 9
 g9  | Moschetti | home | 1-900-555-1212 | 1-222-707-7071 | 7
 g9  | Moschetti | home | 1-777-999-1212 | 1-222-807-7070 | 23
 g9  | Moschetti | home | 1-777-999-1212 | 1-222-807-7071 | 7
 g10 | Kalan     | work | 1-999-444-9999 | 1-222-907-7070 | 23
 g9  | Moschetti | home | 1-900-555-1212 | 1-222-707-7072 | 9
 g10 | Kalan     | work | 1-999-444-9999 | 1-222-907-7072 | 9
 g9  | Moschetti | home | 1-777-999-1212 | 1-222-807-7072 | 9
How do we turn this into a list of names, each with a list of numbers, each of those with a list of target numbers?
SQL Unwind Attempt #1
Map idmap = new HashMap();
ResultSet rs = fetchStmt.executeQuery();
while (rs.next()) {
    String id = rs.getString("id");
    String nmbr = rs.getString("number");
    List tnum;
    Map snum;
    // First level: one map of numbers per contact id
    if ((snum = (Map) idmap.get(id)) == null) {
        snum = new HashMap();
        idmap.put(id, snum);
    }
    // Second level: one list of target calls per source number
    if ((tnum = (List) snum.get(nmbr)) == null) {
        tnum = new ArrayList();
        snum.put(nmbr, tnum);
    }
    Map info = new HashMap();
    info.put("target", rs.getString("target"));
    info.put("duration", rs.getInt("duration"));
    tnum.add(info);
}
// idmap["g9"]["1-900-555-1212"] = ({target:1-222-707-7070,duration:23…)
SQL Join Attempt #2
select A.id, A.lname, B.type, B.number, C.target, C.duration
from contact A, phones B, phonestx C
where A.id = B.id and B.number = C.number
order by A.id, B.number
 id  | lname     | type | number         | target         | duration
-----+-----------+------+----------------+----------------+----------
 g10 | Kalan     | work | 1-999-444-9999 | 1-222-907-7072 | 9
 g10 | Kalan     | work | 1-999-444-9999 | 1-222-907-7070 | 23
 g10 | Kalan     | work | 1-999-444-9999 | 1-222-907-7071 | 7
 g9  | Moschetti | home | 1-777-999-1212 | 1-222-807-7072 | 9
 g9  | Moschetti | home | 1-777-999-1212 | 1-222-807-7070 | 23
 g9  | Moschetti | home | 1-777-999-1212 | 1-222-807-7071 | 7
 g9  | Moschetti | work | 1-800-989-2231 | 1-987-707-7072 | 9
 g9  | Moschetti | home | 1-900-555-1212 | 1-222-707-7071 | 7
 g9  | Moschetti | home | 1-900-555-1212 | 1-222-707-7072 | 9
 g9  | Moschetti | home | 1-900-555-1212 | 1-222-707-7070 | 23
“Early bail out” from cursor is now possible, but the logic to construct the list of source and target numbers is similar
SQL is about Disassembly
Design a Big Query including business logic to grab all the data up front:
    String s = "select A, B, C, D, E, F from T1,T2,T3 where T1.col = T2.col and T2.col2 = T3.col2 and X = Y and X2 != Y2 and G > 10 and G < 100 and TO_DATE(' …";
Throw it at the engine:
    ResultSet rs = execute(s);
Disassemble Big Rectangle into usable objects with logic implicit in change in column values:
    while (rs.next()) {
        if (new column1 value from T1) { set up new Object; }
        if (new column2 value from T2) { set up new Object2 }
        if (new column3 value from T3) { set up new Object3 }
        populate maps, lists and scalars
    }
MongoDB is about Assembly
DIY:
    Cursor c = coll1.find({"X":"Y"});
    while (c.hasNext()) {
        populate maps, lists and scalars;
        Cursor c2 = coll2.find(logic+key from c);
        while (c2.hasNext()) {
            populate maps, lists and scalars;
            Cursor c3 = coll3.find(logic+key from c2);
            while (c3.hasNext()) {
                populate maps, lists and scalars;
            }
        }
    }
OR assemble usable objects incrementally with explicit calls to $lookup and $graphLookup
MongoDB ”Join”
db.contacts.aggregate([
  { $unwind: "$phones" },
  { $lookup: { from: "phonestx", localField: "phones.number", foreignField: "number", as: "TX" } }
]);
{
  "customer_id" : 1,
  "first_name" : "Mark",
  "last_name" : "Smith",
  "city" : "San Francisco",
  "phones" : {
    "type" : "home",
    "number" : "1-800-555-1414",
    "DNC" : true
  },
  "TX" : [
    { "number" : "1-800-555-1414", "target" : "1-645-331-4345", "duration" : 132 },
    { "number" : "1-800-555-1414", "target" : "1-990-875-2134", "duration" : 71 }
  ]
}
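To see why the output above has a single phones object and a TX array, it can help to mimic the two stages on in-memory Maps. This is only a sketch of the effect of $unwind followed by $lookup, not of how the server implements them; the class LookupSketch and its helpers are hypothetical, and it assumes every contact has a phones list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: mimics $unwind + $lookup on plain Maps and Lists.
class LookupSketch {

    // $unwind: "$phones" -> one output document per element of the list.
    static List<Map<String, Object>> unwindPhones(Map<String, Object> contact) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Object phone : (List<?>) contact.get("phones")) {
            Map<String, Object> copy = new LinkedHashMap<>(contact);
            copy.put("phones", phone); // list replaced by a single element
            out.add(copy);
        }
        return out;
    }

    // $lookup: attach every transaction whose "number" equals phones.number.
    static void lookupTx(List<Map<String, Object>> docs, List<Map<String, Object>> phonestx) {
        for (Map<String, Object> doc : docs) {
            Object number = ((Map<?, ?>) doc.get("phones")).get("number");
            List<Map<String, Object>> matches = new ArrayList<>();
            for (Map<String, Object> tx : phonestx) {
                if (number != null && number.equals(tx.get("number"))) {
                    matches.add(tx);
                }
            }
            doc.put("TX", matches); // stays empty when nothing matches
        }
    }

    // Hypothetical helper: builds a map from alternating keys and values.
    static Map<String, Object> doc(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put((String) kv[i], kv[i + 1]);
        }
        return m;
    }

    static List<Map<String, Object>> demo() {
        List<Object> phones = new ArrayList<>();
        phones.add(doc("type", "work", "number", "1-800-555-1212"));
        phones.add(doc("type", "home", "number", "1-800-555-1313", "DNC", true));
        phones.add(doc("type", "home", "number", "1-800-555-1414", "DNC", true));
        Map<String, Object> contact = doc("customer_id", 1, "last_name", "Smith", "phones", phones);

        List<Map<String, Object>> phonestx = new ArrayList<>();
        phonestx.add(doc("number", "1-800-555-1212", "target", "1-999-238-3423", "duration", 20));
        phonestx.add(doc("number", "1-800-555-1212", "target", "1-444-785-6611", "duration", 243));
        phonestx.add(doc("number", "1-800-555-1414", "target", "1-645-331-4345", "duration", 132));
        phonestx.add(doc("number", "1-800-555-1414", "target", "1-990-875-2134", "duration", 71));

        List<Map<String, Object>> docs = unwindPhones(contact);
        lookupTx(docs, phonestx);
        return docs;
    }

    public static void main(String[] args) {
        for (Map<String, Object> d : demo()) {
            System.out.println(d);
        }
    }
}
```

With the sample data from the slides this yields three documents, one per phone; the one for 1-800-555-1313 carries an empty TX list, which is the left-outer behaviour of the join.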
But what about “real” queries?
• MongoDB query language is a physical map-of-map based structure, not a String
• Operators (e.g. AND, OR, GT, EQ, etc.) and arguments are keys and values in a cascade of Maps
• No grammar to parse, no templates to fill in, no whitespace, no escaping quotes, no parentheses, no punctuation
• Same paradigm to manipulate data is used to manipulate query expressions
• …which is also, by the way, the same paradigm for working with MongoDB metadata and explain()
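As a rough illustration of “operators are keys and values in a cascade of Maps”, the sketch below builds a query with $or and $gt from plain java.util collections and then edits it like any other data. The class QueryAsMaps, its method names, and the "city" condition are hypothetical; with the MongoDB Java driver a map like this would typically be wrapped in a Document before being passed to find(), a step omitted here so the sketch runs without the driver.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a query expression is ordinary nested data.
class QueryAsMaps {

    // Builds { "$or": [ { "phones.type": "work" },
    //                   { "hiredate": { "$gt": cutoff } } ] }
    static Map<String, Object> workPhoneOrHiredAfter(Date cutoff) {
        Map<String, Object> byPhone = new LinkedHashMap<>();
        byPhone.put("phones.type", "work");

        Map<String, Object> gt = new LinkedHashMap<>();
        gt.put("$gt", cutoff);            // operator is just a key
        Map<String, Object> byHireDate = new LinkedHashMap<>();
        byHireDate.put("hiredate", gt);

        Map<String, Object> query = new LinkedHashMap<>();
        query.put("$or", Arrays.asList(byPhone, byHireDate));
        return query;
    }

    // The query is manipulated with the same Map operations as the data:
    // adding a top-level key adds an implicit AND condition.
    static Map<String, Object> alsoRequireCity(Map<String, Object> query, String city) {
        Map<String, Object> q = new LinkedHashMap<>(query);
        q.put("city", city);
        return q;
    }

    public static void main(String[] args) {
        Date cutoff = Date.from(Instant.parse("2014-02-02T00:00:00Z"));
        Map<String, Object> q = workPhoneOrHiredAfter(cutoff);
        System.out.println(q);
        System.out.println(alsoRequireCity(q, "Dublin"));
    }
}
```

No string concatenation or quote escaping is involved at any point, which is the contrast with building SQL text that the bullets draw.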
Mongo Shell
JD10Gen:mugalyser jdrumgoole$ mongo
MongoDB shell version: 3.2.7
connecting to: test
MongoDB Enterprise > use MUGS
switched to db MUGS
MongoDB Enterprise > show collections
attendees
audit
groups
members
past_events
upcoming_events
MongoDB Enterprise >
MongoDB Enterprise > db.members.find( { "batchID" : 108, "member.member_name" : "Joe Drumgoole" }).pretty()
{
    "_id" : ObjectId("58511bfbb26a8803b6b4d56c"),
    "member" : {
        "city" : "Dublin",
        "events_attended" : 13,
        "last_access_time" : ISODate("2016-12-13T15:45:27Z"),
        "country" : "Ireland",
        "member_id" : 99473492,
        "chapters" : [
            {
                "urlname" : "London-MongoDB-User-Group",
                "name" : "London MongoDB User Group",
                "id" : 1775736
…
MongoDB Query Examples
Find all contacts with at least one work phone
SQL CLI:
    select * from contact A, phones B where A.did = B.did and B.type = 'work';
MongoDB CLI:
    db.contact.find({"phones.type": "work"});
SQL in Java:
    String s = "select * from contact A, phones B where A.did = B.did and B.type = 'work'";
    ResultSet rs = execute(s);
MongoDB via Java driver:
    Cursor c = contact.find(eq("phones.type", "work"));
MongoDB Query Examples
Find all contacts with at least one work phone or hired after 2014-02-02
SQL:
    select A.did, A.lname, A.hiredate, B.type, B.number from contact A left outer join phones B on (B.did = A.did) where B.type = 'work' or A.hiredate > '2014-02-02'::date
Java:
    contacts.find(or(eq("phones.type", "work"), gt("hiredate", Date.from(Instant.parse("2014-02-02T00:00:00Z")))));
CLI:
    db.contacts.find({ $or : [ { "phones.type" : "work" }, { "hiredate" : { $gt : new Date("2014-02-02T00:00:00.000Z") } } ] })
…and before you ask…
Yes, MongoDB query expressions support
1. Sorting
2. Cursor size limit
3. Projection (asking for only parts of the rich shape to be returned)
4. Aggregation (“GROUP BY”) functions
Maybe even MORE powerful than SQL…?
> db.results.values.aggregate([{$match: { runnum:23, timeSeriesPath: "CDSSpread.12M//1909468128" }
,{$project: { timeSeriesPath: "$timeSeriesPath", values: foml }}
,{$unwind: {path: "$values", idx: "v_idx"}}
,{$match: {values: {$gt: 60}, {$or: [ {idx: 0}, {idx: {$size: . . .}
,{$group: {_id: {a: "$timeSeriesPath", b: term: "$idx"}, n: {$sum:1}, max: {$max: "$values"}, min: {$min: "$values"}}, sdev: {$stdDevPop: "$values"}}
,{$lookup: { from: "deskLimits", localField: "instID", foreignField: "instID", as: "inst"}}
,{$match: {maxDeskLimit: {$gt: {$cond: [ {$gt: [2, $max]}, 2, $max]}}}},{$group: {_id: "$deskID", total: {$sum: "$max"}}}]);
What is an Aggregation Pipeline
Match
Project
Join
Graph
Sort
View
Aggregation Pipeline
[Diagram, built up over six slides: a set of documents passes through the stages $match, $project, $lookup and $group in turn.]
Aggregation Pipeline Stages
• $match: Filter documents
• $geoNear: Geospherical query
• $project: Reshape documents
• $lookup: Left-outer equi joins
• $unwind: Expand documents
• $group: Summarize documents
• $sample: Randomly selects a subset of documents
• $sort: Order documents
• $skip: Jump over a number of documents
• $limit: Limit number of documents
• $redact: Restrict documents
• $out: Sends results to a new collection
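The idea that each stage takes a list of documents and hands a new list to the next stage can be sketched in plain Java. The example below imitates a $match followed by a $group with $sum over the PhoneTransactions sample data; PipelineSketch and its methods are hypothetical stand-ins, and the threshold of 50 is an arbitrary choice for illustration. The real stages run inside the server.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Illustrative only. Roughly corresponds to the shell pipeline
//   [ {$match: {duration: {$gt: 50}}},
//     {$group: {_id: "$number", total: {$sum: "$duration"}}} ]
class PipelineSketch {

    static Map<String, Object> tx(String number, String target, int duration) {
        Map<String, Object> d = new LinkedHashMap<>();
        d.put("number", number);
        d.put("target", target);
        d.put("duration", duration);
        return d;
    }

    // $match: keep only the documents that satisfy the filter.
    static List<Map<String, Object>> match(List<Map<String, Object>> docs,
                                           Predicate<Map<String, Object>> filter) {
        return docs.stream().filter(filter).collect(Collectors.toList());
    }

    // $group with $sum: one total per distinct value of keyField.
    static Map<Object, Integer> groupSum(List<Map<String, Object>> docs,
                                         String keyField, String sumField) {
        Map<Object, Integer> totals = new LinkedHashMap<>();
        for (Map<String, Object> d : docs) {
            totals.merge(d.get(keyField), (Integer) d.get(sumField), Integer::sum);
        }
        return totals;
    }

    static Map<Object, Integer> demo() {
        List<Map<String, Object>> phonestx = new ArrayList<>();
        phonestx.add(tx("1-800-555-1212", "1-999-238-3423", 20));
        phonestx.add(tx("1-800-555-1212", "1-444-785-6611", 243));
        phonestx.add(tx("1-800-555-1414", "1-645-331-4345", 132));
        phonestx.add(tx("1-800-555-1414", "1-990-875-2134", 71));

        // Stage 1 feeds stage 2, exactly like adjacent pipeline stages.
        List<Map<String, Object>> longCalls =
            match(phonestx, d -> ((Integer) d.get("duration")) > 50);
        return groupSum(longCalls, "number", "duration");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because every stage has the same input and output type, stages can be reordered or chained freely, which is what makes the pipeline model composable.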
The Fundamental Change with mongoDB
RDBMS designed in era when:
• CPU and disk were slow & expensive
• Memory was VERY expensive
• Network? What network?
• Languages had limited means to dynamically reflect on their types
• Languages had poor support for richly structured types
Thus, the database had to
• Act as combiner-coordinator of simpler types
• Define a rigid schema
• (Together with the code) optimize at compile-time, not run-time
In mongoDB, the data is the schema!
MongoDB and the Rich Map Ecosystem
Generic comparison of two records:
    Map expr = new HashMap();
    expr.put("myKey", "K1");
    DBObject a = collection.findOne(expr);
    expr.put("myKey", "K2");
    DBObject b = collection.findOne(expr);
    List<MapDiff.Difference> d = MapDiff.diff((Map)a, (Map)b);
Getting default values for a thing on a certain date and then overlaying user preferences (like for a calculation run):
    Map expr = new HashMap();
    expr.put("myKey", "DEFAULT");
    expr.put("createDate", new Date(2013, 11, 1));
    DBObject a = collection.findOne(expr);
    expr.clear();
    expr.put("myKey", "user1");
    DBObject b = otherCollectionPerhaps.findOne(expr);
    MapStack s = new MapStack();
    s.push((Map)a);
    s.push((Map)b);
    Map merged = s.project();
Runtime reflection of Maps and Lists enables generic, powerful utilities (MapDiff, MapStack) to be created once and used for all kinds of shapes, saving time and money
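The webinar does not show how MapDiff is implemented, so the following is only a guess at the kind of utility meant: a small recursive comparison that works on any pair of nested Maps without knowing their fields in advance. MapDiffSketch is a hypothetical name; the sketch assumes String keys, reports differences as text lines rather than Difference objects, and compares lists as whole values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

// Illustrative only: not the MapDiff utility referenced in the slides.
class MapDiffSketch {

    // Returns one line per differing path, e.g. "phone.mobile: 2 -> 3".
    static List<String> diff(Map<?, ?> a, Map<?, ?> b) {
        List<String> out = new ArrayList<>();
        walk("", a, b, out);
        return out;
    }

    private static void walk(String prefix, Map<?, ?> a, Map<?, ?> b, List<String> out) {
        // Union of keys from both sides, sorted for stable output.
        TreeSet<String> keys = new TreeSet<>();
        for (Object k : a.keySet()) keys.add(String.valueOf(k));
        for (Object k : b.keySet()) keys.add(String.valueOf(k));

        for (String k : keys) {
            Object va = a.get(k);
            Object vb = b.get(k);
            String path = prefix.isEmpty() ? k : prefix + "." + k;
            if (va instanceof Map && vb instanceof Map) {
                // Both sides are nested maps: recurse instead of comparing whole.
                walk(path, (Map<?, ?>) va, (Map<?, ?>) vb, out);
            } else if (!Objects.equals(va, vb)) {
                out.add(path + ": " + va + " -> " + vb);
            }
        }
    }
}
```

The code never names a field of the records it compares, which is the point being made: one utility written against Map covers every shape stored in the database.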
Lastly: A CLI with teeth
Try a query and show the diagnostics:
    > db.contact.find({"SeqNum": {"$gt": 10000}}).explain();
    {
        "cursor" : "BasicCursor",
        "n" : 200000,
        //...
        "millis" : 223
    }
Run it 3 times with smaller and smaller chunks and create a vector of timing result pairs (size, time):
    > for(v=[],i=0;i<3;i++) {
    … n = i*50000;
    … expr = {"SeqNum": {"$gt": n}};
    … v.push([n, db.contact.find(expr).explain().millis]); }
Let’s see that vector:
    > v
    [ [ 0, 225 ], [ 50000, 222 ], [ 100000, 220 ] ]
Use any other JavaScript you want inside the shell:
    > load("jStat.js")
    > jStat.stdev(v.map(function(p){return p[1];}))
    2.0548046676563256
Party trick: save the explain() output back into a collection!
    > for(i=0;i<3;i++) {
    … expr = {"SeqNum": {"$gt": i*1000}};
    … db.foo.insert(db.contact.find(expr).explain()); }
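For readers who want to check the jStat figure by hand, the three timings 225, 222 and 220 can be run through the population form of the standard deviation (divide by n, not n - 1), which reproduces the value shown in the shell session. A small Java sketch, with TimingStats as a hypothetical name:

```java
// Illustrative only: recomputes the standard deviation of the three
// explain() timings from the shell session.
class TimingStats {

    // Population standard deviation: sqrt(mean((x - mean)^2)).
    static double populationStdDev(double[] xs) {
        double mean = 0;
        for (double x : xs) {
            mean += x;
        }
        mean /= xs.length;

        double sumSq = 0;
        for (double x : xs) {
            sumSq += (x - mean) * (x - mean);
        }
        return Math.sqrt(sumSq / xs.length);
    }

    public static void main(String[] args) {
        System.out.println(populationStdDev(new double[] {225, 222, 220}));
    }
}
```

Worked through: the mean is 667/3, the squared deviations sum to 114/9, and dividing by 3 gives a variance of 38/9, whose square root is approximately 2.0548.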
And There is More – Compass and Atlas
[Screenshots: Compass, Atlas]
What Does This Add Up To?
Relational
• Expressive Query Language
• Strong Consistency
• Secondary Indexes
NoSQL
• Flexibility
• Scalability
• Performance