MongoDB How to model and extract your data
whoami
Francesco Lo Franco
Software developer
Francesco Lo Franco - @__kekko | MongoDB Aggregation Framework
@__kekko
it.linkedin.com/in/francescolofranco/
What is MongoDB?
MongoDB is an open source database that uses a document-oriented data model.
MongoDB Data Model
MongoDB uses a JSON-like representation of its data (BSON)
BSON > JSON
● custom types (Date, ObjectId...)
● faster
● lightweight
collections
documents
key-value pairs
MongoDB Data Model example (BLOG POST):
{
  "_id": ObjectId("508d27069cc1ae293b36928d"),
  "title": "This is the title",
  "tags": [ "chocolate", "milk" ],
  "created_date": ISODate("2012-10-28T12:41:39.110Z"),
  "author_id": ObjectId("508d280e9cc1ae293b36928e"),
  "comments": [
    { "content": "This is the body of comment", "author_id": ObjectId("508d34"), "tag": "coffee" },
    { "content": "This is the body of comment", "author_id": ObjectId("508d35") }
  ]
}
REFERENCING vs EMBEDDING
One to few
> db.employee.findOne()
{
  name: 'Kate Monster',
  ssn: '123-456-7890',
  addresses: [
    { street: 'Lombard Street, 26', zip_code: '22545' },
    { street: 'Abbey Road, 99', zip_code: '33568' }
  ]
}
Disadvantages:
- It is hard to access the embedded details as stand-alone entities
Example: “Show all addresses with a certain zip code”
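The difficulty can be illustrated with a minimal in-memory sketch in plain JavaScript (no MongoDB involved). The second employee and the helper `addressesWithZip` are hypothetical, added only for illustration.

```javascript
// In-memory stand-in for the employee collection; the second employee is made up.
const employees = [
  {
    name: "Kate Monster",
    ssn: "123-456-7890",
    addresses: [
      { street: "Lombard Street, 26", zip_code: "22545" },
      { street: "Abbey Road, 99", zip_code: "33568" },
    ],
  },
  {
    name: "John Doe",
    ssn: "987-654-3210",
    addresses: [{ street: "Baker Street, 221", zip_code: "22545" }],
  },
];

// Embedded addresses are not stand-alone entities: to list them we have to
// walk every parent document and unwind its addresses array.
function addressesWithZip(docs, zip) {
  return docs.flatMap((doc) =>
    doc.addresses
      .filter((address) => address.zip_code === zip)
      .map((address) => ({ owner: doc.name, ...address }))
  );
}

const matches = addressesWithZip(employees, "22545");
console.log(matches);
```

Every parent document has to be visited even though only the addresses are of interest, which is the cost the slide points at.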
Advantages:
- One query to get them all
- embedded + value object =
One to many
> db.parts.findOne()
{
  _id: ObjectID('AAAAF17CD2AAAAAAF17CD2'),
  partno: '123-aff-456',
  name: '#4 grommet',
  qty: 94,
  cost: 0.94,
  price: 3.99
}
One to many
> db.products.findOne()
{
  name: 'smoke shifter',
  manufacturer: 'Acme Corp',
  catalog_number: 1234,
  parts: [
    ObjectID('AAAAF17CD2AAAAAAF17CD2AA'),
    ObjectID('F17CD2AAAAAAF17CD2AAAAAA'),
    ObjectID('D2AAAAAAF17CD2AAAAAAF17C'),
    // etc
  ]
}
Disadvantages:
- “Find all parts that compose a product” takes two queries (an application-level join)
> product = db.products.findOne({ catalog_number: 1234 });
> product_parts = db.parts.find({ _id: { $in: product.parts } }).toArray();
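The two-step lookup can be sketched in memory with plain JavaScript arrays. This is only an illustration: ids are shortened strings rather than real ObjectIDs, and the second and third part are hypothetical.

```javascript
// In-memory stand-ins for the two collections; ids are shortened strings
// instead of real ObjectIDs, and the second and third part are made up.
const parts = [
  { _id: "AAAA", partno: "123-aff-456", name: "#4 grommet" },
  { _id: "F17C", partno: "000-hyp-001", name: "hypothetical part 2" },
  { _id: "D2AA", partno: "000-hyp-002", name: "hypothetical part 3" },
];
const products = [
  { name: "smoke shifter", catalog_number: 1234, parts: ["AAAA", "F17C"] },
];

// Query 1: fetch the product (mirrors findOne({ catalog_number: 1234 })).
const product = products.find((p) => p.catalog_number === 1234);

// Query 2: fetch the referenced parts (mirrors find({ _id: { $in: ... } })).
const wanted = new Set(product.parts);
const productParts = parts.filter((part) => wanted.has(part._id));

console.log(productParts.map((part) => part.name));
```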
DENORMALIZATION
Advantages:
- Easy to search and update an individual referenced document (a single part)
- free N-to-N schema without join table
parts: [ ObjectID('AAAAF17CD2AAAAAAF17CD2AA'), ObjectID('F17CD2AAAAAAF17CD2AAAAAA'), ObjectID('D2AAAAAAF17CD2AAAAAAF17C')]
One to squillions (Logging)
- document size limit = 16 MB
- the limit can be reached even if the referencing array contains only the ObjectID field (~1,300,000 references at 12 bytes each, before counting BSON array overhead)
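As a rough back-of-the-envelope check, the sketch below compares the naive estimate (limit divided by 12 bytes per ObjectId) with an estimate that also counts per-element array overhead. It assumes the standard BSON array layout, where each element is stored with a type byte and its index as a null-terminated string key, and it ignores the document header and any other fields, so the numbers are indicative only. The function name `maxReferences` is hypothetical.

```javascript
// Estimate how many ObjectIds fit in one array before the 16 MB limit.
// Assumption: each BSON array element costs 1 type byte + the index as a
// null-terminated string key + 12 bytes for the ObjectId itself.
const LIMIT_BYTES = 16 * 1024 * 1024;
const OBJECT_ID_BYTES = 12;

function maxReferences(limitBytes) {
  let used = 0;
  let count = 0;
  while (true) {
    const keyBytes = String(count).length + 1; // "0", "1", ... plus the null terminator
    const elementBytes = 1 + keyBytes + OBJECT_ID_BYTES;
    if (used + elementBytes > limitBytes) return count;
    used += elementBytes;
    count += 1;
  }
}

const naiveEstimate = Math.floor(LIMIT_BYTES / OBJECT_ID_BYTES);
const withArrayOverhead = maxReferences(LIMIT_BYTES);
console.log({ naiveEstimate, withArrayOverhead });
```

Under these assumptions the practical ceiling is noticeably lower than the naive figure, which only strengthens the point that an unbounded reference array is a poor fit for logging.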
Parent Referencing
> db.hosts.findOne()
{
  _id: ObjectID('AAAB'),
  name: 'goofy.example.com',
  ipaddr: '127.66.66.66'
}
> db.logmsg.findOne()
{
  time: ISODate("2014-03-28T09:42:41.382Z"),
  message: 'cpu is on fire!',
  host: ObjectID('AAAB')
}
Disadvantages:
- “Find the most recent 5K messages for a host” takes two queries
> host = db.hosts.findOne({ ipaddr: '127.66.66.66' });
> last_5k_msg = db.logmsg.find({ host: host._id }).sort({ time: -1 }).limit(5000).toArray()
DENORMALIZATION
NORMALIZATION
To be denormalized
> db.products.findOne()
{
  name: 'smoke shifter',
  manufacturer: 'Acme Corp',
  catalog_number: 1234,
  parts: [
    ObjectID('AAAA'),
    ObjectID('F17C'),
    ObjectID('D2AA'),
    // etc
  ]
}
Denormalized (partial + one side)
> db.products.findOne()
{
  name: 'smoke shifter',
  manufacturer: 'Acme Corp',
  catalog_number: 1234,
  parts: [
    { id: ObjectID('AAAA'), name: 'part1' },
    { id: ObjectID('F17C'), name: 'part2' },
    { id: ObjectID('D2AA'), name: 'part3' },
    // etc
  ]
}
Advantages:
- Easy query to get product part name
Disadvantages:
- Updates become more expensive
- Cannot assure atomic and isolated updates
MongoDB is not A.C.I.D. compliant
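Why updates get more expensive can be sketched in memory: once the part name is copied into products, renaming a part means one write for the part plus one write per product that embeds a copy. The data and the helper `renamePart` below are hypothetical and only mimic the shape of the documents shown earlier.

```javascript
// In-memory stand-ins; "hypothetical widget" is made up for illustration.
const parts = [
  { _id: "AAAA", name: "part1" },
  { _id: "F17C", name: "part2" },
];
const products = [
  { name: "smoke shifter", parts: [{ id: "AAAA", name: "part1" }, { id: "F17C", name: "part2" }] },
  { name: "hypothetical widget", parts: [{ id: "AAAA", name: "part1" }] },
];

// Renaming a part touches the part itself plus every product that embeds a copy.
function renamePart(partId, newName) {
  let writes = 0;
  const part = parts.find((p) => p._id === partId);
  part.name = newName;
  writes += 1;
  for (const product of products) {
    const copy = product.parts.find((p) => p.id === partId);
    if (copy) {
      copy.name = newName; // a failure before this loop ends would leave stale copies
      writes += 1;
    }
  }
  return writes;
}

const writes = renamePart("AAAA", "part1 (renamed)");
console.log(writes);
```

Because the writes are separate, a failure halfway through leaves some copies stale, which is the atomicity concern listed above.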
MongoDB supports only single-document-level transactions (multi-document transactions were added later, in MongoDB 4.0)
So, how can I have an (almost) ACID Mongo?
1. Two Phase Commit (A+C)
2. $isolated operator (I)
3. Enable journaling (D)
Two Phase Commit (A+C)
If we make a multi-document update, a system failure between the 2 separate updates can lead to an unrecoverable inconsistency
Create a transaction document tracking all the needed data
Two Phase Commit Example
Uses a bridge “transaction” document for retrying or rolling back operations not completed due to a system failure
Two Phase Commit Example
TODO: transfer $100 from A to B
Account A: { total: 1000, on_going_transactions: [] }
Account B: { total: 500, on_going_transactions: [] }
Transaction document:
{ from: "A", to: "B", amount: 100, status: "initial", datetime: new Date() }
Step 1: Update the transaction
{ _id: "zzzz", from: "A", to: "B", amount: 100, status: "pending", datetime: new Date() }
Step 2: Update Account A
update total: -100; push on_going_transactions: {transaction where _id = "zzzz"}
Step 3: Update Account B
update total: +100; push on_going_transactions: {transaction where _id = "zzzz"}
Step 4: Update the transaction
{ _id: "zzzz", from: "A", to: "B", amount: 100, status: "applied", datetime: new Date() }
Step 5: Update Account A
pull on_going_transactions: {transaction where _id = "zzzz"}
Step 6: Update Account B
pull on_going_transactions: {transaction where _id = "zzzz"}
Step 7: Update the transaction
{ _id: "zzzz", from: "A", to: "B", amount: 100, status: "done", datetime: new Date() }
Two Phase Commit
This pattern emulates SQL transaction management, achieving Atomicity + Consistency
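The pattern can be sketched as a small in-memory simulation in plain JavaScript. It is not MongoDB code: the helpers `applyOnce`, `release` and `run`, and the `crashAfter` parameter, are hypothetical names used to show how the transaction document's status lets an interrupted transfer be resumed without applying a change twice.

```javascript
const accounts = {
  A: { total: 1000, on_going_transactions: [] },
  B: { total: 500, on_going_transactions: [] },
};
const transactions = [];

// Apply a balance change only if this transaction has not touched the account
// yet, so that retrying after a crash never applies the same change twice.
function applyOnce(account, txId, delta) {
  if (!account.on_going_transactions.includes(txId)) {
    account.total += delta;
    account.on_going_transactions.push(txId);
  }
}

// Remove the transaction marker from the account (safe to repeat).
function release(account, txId) {
  account.on_going_transactions = account.on_going_transactions.filter((id) => id !== txId);
}

// Drive a transaction forward from whatever status it is in.
// `crashAfter` simulates a failure once that many steps have completed.
function run(tx, crashAfter = Infinity) {
  let steps = 0;
  const step = (fn) => {
    if (steps >= crashAfter) throw new Error("simulated crash");
    fn();
    steps += 1;
  };
  if (tx.status === "initial") step(() => { tx.status = "pending"; });
  if (tx.status === "pending") {
    step(() => applyOnce(accounts[tx.from], tx._id, -tx.amount));
    step(() => applyOnce(accounts[tx.to], tx._id, tx.amount));
    step(() => { tx.status = "applied"; });
  }
  if (tx.status === "applied") {
    step(() => release(accounts[tx.from], tx._id));
    step(() => release(accounts[tx.to], tx._id));
    step(() => { tx.status = "done"; });
  }
}

const tx = { _id: "zzzz", from: "A", to: "B", amount: 100, status: "initial", datetime: new Date() };
transactions.push(tx);

try {
  run(tx, 2); // crash after A was debited but before B was credited
} catch (err) {
  console.log(err.message, "with status", tx.status);
}

// Recovery: resume every transaction that is not "done".
transactions.filter((t) => t.status !== "done").forEach((t) => run(t));
console.log(accounts.A.total, accounts.B.total, tx.status);
```

The guard in `applyOnce` is what makes the retry safe: on recovery, account A is recognised as already debited and only account B is credited.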
$isolated operator (I)
“You can ensure that no client sees the changes until the operation completes or errors out.”
db.car.update(
  { color: "RED", $isolated: 1 },
  { $inc: { count: 1 } },
  { multi: true }
)
Journaling (D)
Journaling logs all writes (every 100 ms) for recovery purposes in case of a system failure (crash)
After a clean shutdown, the journal files are erased
Aggregation Framework (finally)
def: “Aggregations are operations that process data records and return computed results.”
Aggregation Framework
1) C.R.U.D.
2) single purpose aggregation operators
3) pipeline
4) map reduce
Aggregation Framework
CRUD Operators:
- insert()
- find() / findOne()
- update()
- remove()
SPAO (single purpose aggregation operators)
a) count
b) distinct
c) group
count
{ a: 1, b: 0 }
{ a: 1, b: 1 }
{ a: 1, b: 4 }
{ a: 2, b: 2 }
db.records.count( { a: 1 } ) = 3
distinct
{ name: "jim", age: 0 }
{ name: "kim", age: 1 }
{ name: "dim", age: 4 }
{ name: "sim", age: 2 }
db.foe.distinct("age") = [0, 1, 4, 2]
group
{ age: 12, count: 4 }
{ age: 12, count: 2 }
{ age: 14, count: 3 }
{ age: 14, count: 4 }
{ age: 16, count: 6 }
{ age: 18, count: 8 }
db.records.group({
  key: { age: 1 },
  cond: { age: { $lt: 16 } },
  reduce: function(cur, result) { result.count += cur.count },
  initial: { count: 0 }
})
Result:
[ { age: 12, count: 6 }, { age: 14, count: 7 } ]
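To make the roles of key, cond, reduce and initial concrete, here is a minimal in-memory sketch in plain JavaScript. The helper `group` is hypothetical and only imitates the shape of the options; it is not the MongoDB implementation.

```javascript
const records = [
  { age: 12, count: 4 }, { age: 12, count: 2 }, { age: 14, count: 3 },
  { age: 14, count: 4 }, { age: 16, count: 6 }, { age: 18, count: 8 },
];

// Hypothetical helper mirroring the key / cond / reduce / initial options.
function group(docs, { key, cond, reduce, initial }) {
  const buckets = new Map();
  for (const doc of docs.filter(cond)) {
    const id = doc[key];
    // Each new key starts from a fresh copy of `initial`.
    if (!buckets.has(id)) buckets.set(id, { [key]: id, ...initial });
    // `reduce` folds the current document into the running result.
    reduce(doc, buckets.get(id));
  }
  return [...buckets.values()];
}

const grouped = group(records, {
  key: "age",
  cond: (doc) => doc.age < 16,
  reduce: (cur, result) => { result.count += cur.count; },
  initial: { count: 0 },
});
console.log(grouped);
```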
Pipeline
“Documents enter a multi-stage pipeline that transforms the documents into aggregated results”
Pipeline
initial_doc -> $group -> result1 -> $match -> ... -> resultN -> $project -> final
Pipeline Example
> db.logs.findOne()
{
  _id: ObjectId('a23ad345frt4'),
  os: 'android',
  token_id: 'ds2f43s4df',
  at: ISODate("2012-10-28T12:41:39.110Z"),
  event: "something just happened"
}
Pipeline Example
“We need logs to be grouped by os, and count how many in a single day interval, sort by time”
Expected result:
{
  os: 'android',
  date: { 'year': 2012, 'month': 10, 'day': 28 },
  count: 125
}
Pipeline Example
$collection->aggregate(
    array(
        // $project: keep os and split the timestamp into year / month / day
        array('$project' => array(
            'os' => 1,
            'days' => array(
                'year' => array('$year' => '$at'),
                'month' => array('$month' => '$at'),
                'day' => array('$dayOfMonth' => '$at')
            )
        )),
        // $group: count the documents for each (os, date) pair
        array('$group' => array(
            '_id' => array(
                'os' => '$os',
                'date' => '$days',
            ),
            'count' => array('$sum' => 1)
        )),
        // $sort: ascending by date
        array('$sort' => array(
            '_id.date' => 1
        ))
    )
);
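What each stage does to the documents can be sketched in memory with plain JavaScript. The sample logs are made up, and the code only imitates the three stages on an array; it does not call MongoDB.

```javascript
// Made-up sample logs; only `os` and `at` matter for this pipeline.
const logs = [
  { os: "android", at: new Date("2012-10-28T12:41:39.110Z") },
  { os: "android", at: new Date("2012-10-28T18:00:00.000Z") },
  { os: "ios", at: new Date("2012-10-28T09:00:00.000Z") },
  { os: "android", at: new Date("2012-10-29T07:30:00.000Z") },
];

// $project: keep `os` and split the timestamp into year / month / day (UTC).
const projected = logs.map((log) => ({
  os: log.os,
  days: {
    year: log.at.getUTCFullYear(),
    month: log.at.getUTCMonth() + 1,
    day: log.at.getUTCDate(),
  },
}));

// $group: one bucket per (os, date) pair, counting the documents in each.
const buckets = new Map();
for (const doc of projected) {
  const id = { os: doc.os, date: doc.days };
  const bucketKey = JSON.stringify(id);
  const bucket = buckets.get(bucketKey) ?? { _id: id, count: 0 };
  bucket.count += 1;
  buckets.set(bucketKey, bucket);
}

// $sort: ascending by date (year, then month, then day).
const result = [...buckets.values()].sort((a, b) => {
  const x = a._id.date;
  const y = b._id.date;
  return x.year - y.year || x.month - y.month || x.day - y.day;
});
console.log(JSON.stringify(result, null, 2));
```

Note that the grouped fields end up under `_id`, as in the `$group` stage above.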
Pipeline Optimization
Original stages:
…{ $limit: 100 }, { $skip: 5 }, { $limit: 10 }, { $skip: 2 }...
A $skip followed by a $limit is reordered (the $limit grows by the $skip amount):
…{ $limit: 100 }, { $limit: 15 }, { $skip: 5 }, { $skip: 2 }...
Adjacent stages are coalesced (the smaller $limit wins, the $skip amounts add up):
…{ $limit: 15 }, { $skip: 7 }...
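That the three stage lists are equivalent can be checked with a small in-memory sketch. The helper `runStages` is hypothetical and supports only `$limit` and `$skip` on a plain array of numbers.

```javascript
// Apply a list of { $limit } / { $skip } stages to an in-memory array.
function runStages(docs, stages) {
  return stages.reduce((current, stage) => {
    if ("$limit" in stage) return current.slice(0, stage.$limit);
    if ("$skip" in stage) return current.slice(stage.$skip);
    throw new Error("unsupported stage");
  }, docs);
}

// 200 stand-in documents, represented by their position.
const docs = Array.from({ length: 200 }, (_, i) => i);

const original = [{ $limit: 100 }, { $skip: 5 }, { $limit: 10 }, { $skip: 2 }];
const reordered = [{ $limit: 100 }, { $limit: 15 }, { $skip: 5 }, { $skip: 2 }];
const coalesced = [{ $limit: 15 }, { $skip: 7 }];

const results = [original, reordered, coalesced].map((stages) => runStages(docs, stages));
console.log(results.map((r) => r.join(",")));
```

All three variants select the same eight documents, but the coalesced form needs only two stages.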
Map Reduce
“Map reduce is a data processing paradigm for condensing large volumes of data into useful aggregated results”
Map Reduce Example
> db.orders.find()
{ sku: "01A", qty: 8, total: 88 },
{ sku: "01A", qty: 7, total: 79 },
{ sku: "02B", qty: 9, total: 27 },
{ sku: "03C", qty: 8, total: 24 },
{ sku: "03C", qty: 3, total: 12 }
Map Reduce Example
“Calculate the average price at which we sell our products, grouped by sku code, with total quantity and total income, starting from 1/1/2015”
Map Reduce Example
db.orders.mapReduce(
  mapFunction,
  reduceFunction,
  {
    out: { merge: "reduced_orders" },
    query: { date: { $gt: new Date('01/01/2015') } },
    finalize: finalizeFunction
  }
)
Map Reduce Example
var mapFunction = function() {
  var key = this.sku;
  var value = { tot: this.total, qty: this.qty };
  emit(key, value);
};
Result:
{ 01A: [{tot: 88, qty: 8}, {tot: 79, qty: 7}] },
{ 02B: {tot: 27, qty: 9} },
{ 03C: [{tot: 24, qty: 8}, {tot: 12, qty: 3}] }
Map Reduce Example
var reduceFunction = function(key, values) {
  // Sum quantities and totals of all values emitted for this key
  var reducedVal = { qty: 0, tot: 0 };
  for (var i = 0; i < values.length; i++) {
    reducedVal.qty += values[i].qty;
    reducedVal.tot += values[i].tot;
  }
  return reducedVal;
};
Map Reduce Example
Result:
{ 01A: {tot: 167, qty: 15} },
{ 02B: {tot: 27, qty: 9} },
{ 03C: {tot: 36, qty: 11} }
Map Reduce Example
var finalizeFunction = function(key, reducedVal) {
  reducedVal.avg = reducedVal.tot / reducedVal.qty;
  return reducedVal;
};
Map Reduce Example
Result:
{ 01A: {tot: 167, qty: 15, avg: 11.13} },
{ 02B: {tot: 27, qty: 9, avg: 3} },
{ 03C: {tot: 36, qty: 11, avg: 3.27} }
Map Reduce Example
> db.reduced_orders.find()
{ 01A: {tot: 167, qty: 15, avg: 11.13} },
{ 02B: {tot: 27, qty: 9, avg: 3} },
{ 03C: {tot: 36, qty: 11, avg: 3.27} }
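The whole map, reduce, finalize flow can be reproduced in memory with plain JavaScript. The runner `mapReduce` below is a hypothetical miniature, not the MongoDB command: the map function receives the document as an argument instead of `this`, the date filter is left out because the sample orders carry no date, and the average is rounded to two decimals to match the figures shown.

```javascript
const orders = [
  { sku: "01A", qty: 8, total: 88 },
  { sku: "01A", qty: 7, total: 79 },
  { sku: "02B", qty: 9, total: 27 },
  { sku: "03C", qty: 8, total: 24 },
  { sku: "03C", qty: 3, total: 12 },
];

// Hypothetical mini map-reduce runner: map emits (key, value) pairs,
// reduce folds each key's values, finalize post-processes the reduced value.
function mapReduce(docs, map, reduce, finalize) {
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  docs.forEach((doc) => map(doc, emit));
  const out = {};
  for (const [key, values] of emitted) {
    // Like MongoDB, skip reduce when a key has a single value.
    const reduced = values.length > 1 ? reduce(key, values) : values[0];
    out[key] = finalize(key, reduced);
  }
  return out;
}

const reducedOrders = mapReduce(
  orders,
  (doc, emit) => emit(doc.sku, { tot: doc.total, qty: doc.qty }),
  (key, values) =>
    values.reduce((acc, v) => ({ tot: acc.tot + v.tot, qty: acc.qty + v.qty }), { tot: 0, qty: 0 }),
  (key, value) => ({ ...value, avg: Math.round((value.tot / value.qty) * 100) / 100 })
);
console.log(reducedOrders);
```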
thanks
References:
➔ http://docs.mongodb.org/manual
➔ http://blog.mongodb.org/post/87200945828/
➔ http://thejackalofjavascript.com/mapreduce-in-mongodb/