MongoDB How to model and extract your data

Upload: francesco-lo-franco

Post on 15-Jul-2015


Page 1: MongoDB - How to model and extract your data

MongoDB How to model and extract your data

Page 2: MongoDB - How to model and extract your data

whoami

Francesco Lo Franco

Software developer

Francesco Lo Franco - @__kekko | MongoDB Aggregation Framework

@__kekko

it.linkedin.com/in/francescolofranco/

Page 3: MongoDB - How to model and extract your data

What is MongoDB?


Page 4: MongoDB - How to model and extract your data

MongoDB is an open source database that uses a document-oriented data model.

Page 5: MongoDB - How to model and extract your data

MongoDB Data Model


Page 6: MongoDB - How to model and extract your data

MongoDB uses a JSON-like representation of its data (BSON)

BSON > JSON
● custom types (Date, ObjectId...)
● faster
● lightweight

Page 7: MongoDB - How to model and extract your data


collections

documents

key-value pairs

Page 8: MongoDB - How to model and extract your data

MongoDB Data Model example (BLOG POST):

{
  "_id": ObjectId("508d27069cc1ae293b36928d"),
  "title": "This is the title",
  "tags": [ "chocolate", "milk" ],
  "created_date": ISODate("2012-10-28T12:41:39.110Z"),
  "author_id": ObjectId("508d280e9cc1ae293b36928e"),
  "comments": [
    { "content": "This is the body of comment", "author_id": ObjectId("508d34"), "tag": "coffee" },
    { "content": "This is the body of comment", "author_id": ObjectId("508d35") }
  ]
}


Page 11: MongoDB - How to model and extract your data

REFERENCING vs EMBEDDING

Page 12: MongoDB - How to model and extract your data

One to few

> db.employee.findOne()
{
  name: 'Kate Monster',
  ssn: '123-456-7890',
  addresses: [
    { street: 'Lombard Street, 26', zip_code: '22545' },
    { street: 'Abbey Road, 99', zip_code: '33568' }
  ]
}

Page 13: MongoDB - How to model and extract your data

Disadvantages:

- It is hard to access the embedded details as stand-alone entities

Example: "Show all addresses with a certain zip code"
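To make this disadvantage concrete, here is a minimal in-memory sketch in plain JavaScript (no MongoDB involved). The `employees` array, the second employee and the helper `addressesWithZip` are hypothetical and only illustrate the idea: since addresses live inside their parent documents, collecting them means walking every parent.

```javascript
// Hypothetical in-memory stand-in for the employee collection.
const employees = [
  {
    name: 'Kate Monster',
    addresses: [
      { street: 'Lombard Street, 26', zip_code: '22545' },
      { street: 'Abbey Road, 99', zip_code: '33568' }
    ]
  },
  {
    // Invented second employee, for illustration only.
    name: 'Sample Person',
    addresses: [{ street: 'Example Avenue, 1', zip_code: '22545' }]
  }
];

// Addresses are not stand-alone entities: every parent document must be
// visited and its embedded array filtered.
function addressesWithZip(docs, zip) {
  return docs.flatMap(doc => doc.addresses.filter(a => a.zip_code === zip));
}

const matches = addressesWithZip(employees, '22545');
console.log(matches.map(a => a.street));
```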

Page 14: MongoDB - How to model and extract your data

Advantages:

- One query to get them all

- embedded + value object =


Page 15: MongoDB - How to model and extract your data

One to many

> db.parts.findOne()
{
  _id: ObjectID('AAAAF17CD2AAAAAAF17CD2AA'),
  partno: '123-aff-456',
  name: '#4 grommet',
  qty: 94,
  cost: 0.94,
  price: 3.99
}

Page 16: MongoDB - How to model and extract your data

One to many

> db.products.findOne()
{
  name: 'smoke shifter',
  manufacturer: 'Acme Corp',
  catalog_number: 1234,
  parts: [
    ObjectID('AAAAF17CD2AAAAAAF17CD2AA'),
    ObjectID('F17CD2AAAAAAF17CD2AAAAAA'),
    ObjectID('D2AAAAAAF17CD2AAAAAAF17C'),
    // etc
  ]
}

Page 17: MongoDB - How to model and extract your data

Disadvantages:

"find all parts that compose a product"

> product = db.products.findOne({ catalog_number: 1234 });
> product_parts = db.parts.find({ _id: { $in: product.parts } }).toArray();

DENORMALIZATION

Page 18: MongoDB - How to model and extract your data

Advantages:

- Easy to search and update an individual referenced document (a single part)
- N-to-N schema for free, without a join table

parts: [
  ObjectID('AAAAF17CD2AAAAAAF17CD2AA'),
  ObjectID('F17CD2AAAAAAF17CD2AAAAAA'),
  ObjectID('D2AAAAAAF17CD2AAAAAAF17C')
]

Page 19: MongoDB - How to model and extract your data

One to squillions (Logging)

- document size limit = 16 MB
- it can be reached even if the referencing array contains only ObjectId values (~1,300,000 references)
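A rough back-of-the-envelope check of that figure, as a sketch: it assumes the 16 MB limit is 16 * 1024 * 1024 bytes and counts only the 12 bytes of each ObjectId value. BSON arrays also store a type byte and an index key per element, so this is an upper bound and the real ceiling is lower.

```javascript
// Assumption: the 16 MB document limit expressed in bytes.
const MAX_DOC_BYTES = 16 * 1024 * 1024;
// An ObjectId value occupies 12 bytes.
const OBJECT_ID_BYTES = 12;

// Upper bound on references per document, ignoring per-element array overhead.
const upperBound = Math.floor(MAX_DOC_BYTES / OBJECT_ID_BYTES);
console.log(upperBound); // 1398101
```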

Page 20: MongoDB - How to model and extract your data

Parent Referencing

> db.hosts.findOne()
{
  _id: ObjectID('AAAB'),
  name: 'goofy.example.com',
  ipaddr: '127.66.66.66'
}

> db.logmsg.findOne()
{
  time: ISODate("2014-03-28T09:42:41.382Z"),
  message: 'cpu is on fire!',
  host: ObjectID('AAAB')
}

Page 21: MongoDB - How to model and extract your data

Disadvantages:

"find the most recent 5K messages for a host"

> host = db.hosts.findOne({ ipaddr: '127.66.66.66' });
> last_5k_msg = db.logmsg.find({ host: host._id }).sort({ time: -1 }).limit(5000).toArray()

DENORMALIZATION

Page 22: MongoDB - How to model and extract your data

DENORMALIZATION


NORMALIZATION

Page 23: MongoDB - How to model and extract your data

To be denormalized

> db.products.findOne()
{
  name: 'smoke shifter',
  manufacturer: 'Acme Corp',
  catalog_number: 1234,
  parts: [
    ObjectID('AAAA'),
    ObjectID('F17C'),
    ObjectID('D2AA'),
    // etc
  ]
}

Page 24: MongoDB - How to model and extract your data

Denormalized (partial + one side)

> db.products.findOne()
{
  name: 'smoke shifter',
  manufacturer: 'Acme Corp',
  catalog_number: 1234,
  parts: [
    { id: ObjectID('AAAA'), name: 'part1' },
    { id: ObjectID('F17C'), name: 'part2' },
    { id: ObjectID('D2AA'), name: 'part3' },
    // etc
  ]
}

Page 25: MongoDB - How to model and extract your data

Advantages:

- Easy query to get product part name


Page 26: MongoDB - How to model and extract your data

Disadvantages:

- Updates become more expensive
- Atomic and isolated updates cannot be guaranteed

MongoDB is not A.C.I.D. compliant
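Why updates get more expensive can be sketched in memory with plain JavaScript. The `partsCollection` and `productsCollection` arrays, the second product and the helper `renamePart` are hypothetical; the point is only that a denormalized part name must be rewritten in every document that copied it, and those are separate writes.

```javascript
// Hypothetical in-memory stand-ins for the parts and products collections.
const partsCollection = [{ id: 'AAAA', name: 'part1' }];
const productsCollection = [
  { name: 'smoke shifter', parts: [{ id: 'AAAA', name: 'part1' }, { id: 'F17C', name: 'part2' }] },
  // Invented second product, for illustration only.
  { name: 'hypothetical widget', parts: [{ id: 'AAAA', name: 'part1' }] }
];

// Renames a part and returns how many documents had to be written.
function renamePart(partId, newName) {
  let writes = 0;
  for (const part of partsCollection) {
    if (part.id === partId) { part.name = newName; writes++; }
  }
  // Every product that embeds a copy of the name needs its own update.
  for (const product of productsCollection) {
    let touched = false;
    for (const ref of product.parts) {
      if (ref.id === partId) { ref.name = newName; touched = true; }
    }
    if (touched) writes++;
  }
  return writes;
}

const writes = renamePart('AAAA', 'grommet');
console.log(writes); // 3: one part document plus two product documents
```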

Page 27: MongoDB - How to model and extract your data

MongoDB supports transactions only at the single-document level

Page 28: MongoDB - How to model and extract your data

So, how can I have an (almost) ACID Mongo?

Page 29: MongoDB - How to model and extract your data

1. Two Phase Commit (A+C)
2. $isolated operator (I)
3. Enable journaling (D)

Page 30: MongoDB - How to model and extract your data

Two Phase Commit (A+C)

If we make a multi-document update, a system failure between the two separate updates can lead to unrecoverable inconsistency

Create a transaction document tracking all the needed data

Page 31: MongoDB - How to model and extract your data

Two Phase Commit Example

Uses a bridge "transaction" document to retry or roll back operations not completed due to a system failure

Page 32: MongoDB - How to model and extract your data

Two Phase Commit Example

TODO: transfer $100 from A to B

Account A:
  total: 1000,
  on_going_transactions: [];

Account B:
  total: 500,
  on_going_transactions: [];

Page 33: MongoDB - How to model and extract your data

Two Phase Commit Example

Transaction document

from: "A",
to: "B",
amount: 100,
status: "initial",
datetime: new Date();

Page 34: MongoDB - How to model and extract your data

Two Phase Commit Example

Step 1: Update the transaction

_id: "zzzz",
from: "A",
to: "B",
amount: 100,
status: "pending",
datetime: new Date();

Page 35: MongoDB - How to model and extract your data

Two Phase Commit Example

Step 2: Update Account A

update total: -100;
push on_going_transactions: {transaction where _id = "zzzz"}

Page 36: MongoDB - How to model and extract your data

Two Phase Commit Example

Step 3: Update Account B

update total: +100;
push on_going_transactions: {transaction where _id = "zzzz"}

Page 37: MongoDB - How to model and extract your data

Two Phase Commit Example

Step 4: Update the transaction

_id: "zzzz",
from: "A",
to: "B",
amount: 100,
status: "applied",
datetime: new Date();

Page 38: MongoDB - How to model and extract your data

Two Phase Commit Example

Step 5: Update Account A

pull on_going_transactions: {transaction where _id = "zzzz"}

Page 39: MongoDB - How to model and extract your data

Two Phase Commit Example

Step 6: Update Account B

pull on_going_transactions: {transaction where _id = "zzzz"}

Page 40: MongoDB - How to model and extract your data

Two Phase Commit Example

Step 7: Update the transaction

_id: "zzzz",
from: "A",
to: "B",
amount: 100,
status: "done",
datetime: new Date();

Page 41: MongoDB - How to model and extract your data

Two Phase Commit

This pattern emulates SQL transaction management, achieving Atomicity + Consistency
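The seven steps can be traced with a small in-memory sketch in plain JavaScript. It uses no MongoDB calls and simulates no crash, so it only shows the order of the state transitions. The helpers `applyTo`, `release` and `transfer` are hypothetical names, and storing only the transaction id in `on_going_transactions` is an assumption made for brevity.

```javascript
// In-memory stand-ins for the two account documents from the example.
const accounts = {
  A: { total: 1000, on_going_transactions: [] },
  B: { total: 500, on_going_transactions: [] }
};
const txn = { _id: 'zzzz', from: 'A', to: 'B', amount: 100, status: 'initial' };

// Applies the delta once; the guard makes a retry after a failure harmless.
function applyTo(account, txnId, delta) {
  if (!account.on_going_transactions.includes(txnId)) {
    account.total += delta;
    account.on_going_transactions.push(txnId);
  }
}

// Removes the transaction marker from the account.
function release(account, txnId) {
  account.on_going_transactions =
    account.on_going_transactions.filter(id => id !== txnId);
}

function transfer(t) {
  t.status = 'pending';                        // Step 1
  applyTo(accounts[t.from], t._id, -t.amount); // Step 2
  applyTo(accounts[t.to], t._id, +t.amount);   // Step 3
  t.status = 'applied';                        // Step 4
  release(accounts[t.from], t._id);            // Step 5
  release(accounts[t.to], t._id);              // Step 6
  t.status = 'done';                           // Step 7
}

transfer(txn);
console.log(accounts.A.total, accounts.B.total, txn.status); // 900 600 done
```

In a real deployment each step would be a separate single-document write, and a recovery job would look for transactions stuck in "pending" or "applied" and resume them from there.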

Page 42: MongoDB - How to model and extract your data

$isolated operator (I)

"You can ensure that no client sees the changes until the operation completes or errors out."

db.car.update(
  { color: "RED", $isolated: 1 },
  { $inc: { count: 1 } },
  { multi: true }
)

Page 43: MongoDB - How to model and extract your data

Journaling (D)

Journaling logs all writes (every 100ms) for recovery purposes in case of system failure (crash)

After a clean shutdown, the journal files are erased

Page 44: MongoDB - How to model and extract your data

Aggregation Framework (finally)

def: "Aggregations are operations that process data records and return computed results."

Page 45: MongoDB - How to model and extract your data

Aggregation Framework

1) C.R.U.D.
2) single purpose aggregation operators
3) pipeline
4) map reduce

Page 46: MongoDB - How to model and extract your data

Aggregation Framework

CRUD Operators:

- insert()
- find() / findOne()
- update()
- remove()


Page 48: MongoDB - How to model and extract your data

SPAO (single purpose aggregation operators)

a) count
b) distinct
c) group

Page 49: MongoDB - How to model and extract your data

count

{ a: 1, b: 0 }
{ a: 1, b: 1 }
{ a: 1, b: 4 }
{ a: 2, b: 2 }

db.records.count( { a: 1 } ) = 3

Page 50: MongoDB - How to model and extract your data

distinct

{ name: "jim", age: 0 }
{ name: "kim", age: 1 }
{ name: "dim", age: 4 }
{ name: "sim", age: 2 }

db.foe.distinct("age") = [0, 1, 4, 2]

Page 51: MongoDB - How to model and extract your data

group

{ age: 12, count: 4 }
{ age: 12, count: 2 }
{ age: 14, count: 3 }
{ age: 14, count: 4 }
{ age: 16, count: 6 }
{ age: 18, count: 8 }

Page 52: MongoDB - How to model and extract your data

group

db.records.group({
  key: { age: 1 },
  cond: { age: { $lt: 16 } },
  reduce: function(cur, result) { result.count += cur.count },
  initial: { count: 0 }
})

Page 53: MongoDB - How to model and extract your data

group

[ { age: 12, count: 6 }, { age: 14, count: 7 }]

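The same grouping can be reproduced in memory with plain JavaScript, as a sketch of what the key, cond, reduce and initial arguments each contribute. The helper `groupByAge` is a hypothetical name and not part of any MongoDB API.

```javascript
// The six sample documents from the group example.
const records = [
  { age: 12, count: 4 }, { age: 12, count: 2 },
  { age: 14, count: 3 }, { age: 14, count: 4 },
  { age: 16, count: 6 }, { age: 18, count: 8 }
];

// Filter with cond, bucket by age, and fold each document into its bucket
// with reduce, starting every bucket from a copy of initial.
function groupByAge(docs, cond, reduce, initial) {
  const groups = new Map();
  for (const doc of docs) {
    if (!cond(doc)) continue;
    if (!groups.has(doc.age)) groups.set(doc.age, { age: doc.age, ...initial });
    reduce(doc, groups.get(doc.age));
  }
  return [...groups.values()];
}

const grouped = groupByAge(
  records,
  doc => doc.age < 16,
  (cur, result) => { result.count += cur.count; },
  { count: 0 }
);
console.log(grouped); // [ { age: 12, count: 6 }, { age: 14, count: 7 } ]
```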


Page 55: MongoDB - How to model and extract your data

Pipeline

"Documents enter a multi-stage pipeline that transforms the documents into aggregated results"

Page 56: MongoDB - How to model and extract your data

Pipeline

initial_doc → $group → result1 → $match → ... → resultN → $project → final

Page 57: MongoDB - How to model and extract your data

Pipeline Example

> db.logs.findOne()
{
  _id: ObjectId('a23ad345frt4'),
  os: 'android',
  token_id: 'ds2f43s4df',
  at: ISODate("2012-10-28T12:41:39.110Z"),
  event: "something just happened"
}

"We need the logs grouped by os, counted per single-day interval, and sorted by time"

Page 58: MongoDB - How to model and extract your data

Pipeline Example

Expected result:

os: 'android',
date: {
  'year': 2012,
  'month': 10,
  'day': 28
},
count: 125

Page 59: MongoDB - How to model and extract your data

Pipeline Example

$collection->aggregate(
  array(
    array('$project' => array(
      'os' => 1,
      'days' => array(
        'year' => array('$year' => '$at'),
        'month' => array('$month' => '$at'),
        'day' => array('$dayOfMonth' => '$at')
      )
    )),
    ...

Page 60: MongoDB - How to model and extract your data

Pipeline Example

    ...
    array('$group' => array(
      '_id' => array(
        'os' => '$os',
        'date' => '$days',
      ),
      'count' => array('$sum' => 1)
    )),
    ...

Page 61: MongoDB - How to model and extract your data

Pipeline Example

    ...
    array('$sort' => array(
      '_id.date' => 1
    ))
  )
);
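What the three stages do to the data can be followed with an in-memory sketch in plain JavaScript. The four sample log entries are invented for illustration, and the array operations only imitate $project, $group and $sort; they are not MongoDB calls. Dates are read in UTC, which is an assumption about how the date operators evaluate `at`.

```javascript
// Hypothetical sample logs (only the fields the pipeline uses).
const logs = [
  { os: 'android', at: new Date('2012-10-28T12:41:39.110Z') },
  { os: 'android', at: new Date('2012-10-28T18:05:00.000Z') },
  { os: 'ios',     at: new Date('2012-10-28T09:00:00.000Z') },
  { os: 'android', at: new Date('2012-10-29T07:30:00.000Z') }
];

// $project: keep os and split the timestamp into year / month / day.
const projected = logs.map(log => ({
  os: log.os,
  days: {
    year: log.at.getUTCFullYear(),
    month: log.at.getUTCMonth() + 1,
    day: log.at.getUTCDate()
  }
}));

// $group: one bucket per (os, date) pair, counting its documents.
const counts = new Map();
for (const doc of projected) {
  const key = JSON.stringify({ os: doc.os, date: doc.days });
  counts.set(key, (counts.get(key) || 0) + 1);
}
const groupedLogs = [...counts].map(([key, count]) => ({ _id: JSON.parse(key), count }));

// $sort: ascending by _id.date, compared as year, then month, then day.
const dateKey = d => d.year * 10000 + d.month * 100 + d.day;
groupedLogs.sort((a, b) => dateKey(a._id.date) - dateKey(b._id.date));

console.log(JSON.stringify(groupedLogs, null, 2));
```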

Page 62: MongoDB - How to model and extract your data

Pipeline Optimization

…
{ $limit: 100 },
{ $skip: 5 },
{ $limit: 10 },
{ $skip: 2 }
...

Page 63: MongoDB - How to model and extract your data

Pipeline Optimization

…
{ $limit: 100 },
{ $limit: 15 },
{ $skip: 5 },
{ $skip: 2 }
...

Page 64: MongoDB - How to model and extract your data

Pipeline Optimization

…
{ $limit: 15 },
{ $skip: 7 }
...
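That the rewritten stages are equivalent to the original ones can be checked with a small in-memory sketch in plain JavaScript. The helper `runStages` is hypothetical and treats $limit and $skip as array slices over 200 numbered stand-in documents.

```javascript
// Applies a sequence of $limit / $skip stages to an in-memory array.
function runStages(docs, stages) {
  return stages.reduce(
    (acc, stage) => ('$limit' in stage ? acc.slice(0, stage.$limit) : acc.slice(stage.$skip)),
    docs
  );
}

// 200 numbered stand-in documents: 0, 1, ..., 199.
const numbered = Array.from({ length: 200 }, (_, i) => i);

const originalStages = [{ $limit: 100 }, { $skip: 5 }, { $limit: 10 }, { $skip: 2 }];
const optimizedStages = [{ $limit: 15 }, { $skip: 7 }];

const resultOriginal = runStages(numbered, originalStages);
const resultOptimized = runStages(numbered, optimizedStages);

console.log(resultOriginal);  // [ 7, 8, 9, 10, 11, 12, 13, 14 ]
console.log(resultOptimized); // [ 7, 8, 9, 10, 11, 12, 13, 14 ]
```

Swapping "$skip 5, $limit 10" into "$limit 15, $skip 5" lets the two limits collapse to the smaller one and the two skips add up.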


Page 66: MongoDB - How to model and extract your data

Map Reduce

"Map reduce is a data processing paradigm for condensing large volumes of data into useful aggregated results"

Page 67: MongoDB - How to model and extract your data

Map Reduce Example

> db.orders.find()

{ sku: "01A", qty: 8, total: 88 },
{ sku: "01A", qty: 7, total: 79 },
{ sku: "02B", qty: 9, total: 27 },
{ sku: "03C", qty: 8, total: 24 },
{ sku: "03C", qty: 3, total: 12 }

Page 68: MongoDB - How to model and extract your data

Map Reduce Example

"Calculate the average price at which we sell our products, grouped by sku code, with total quantity and total income, starting from 1/1/2015"

Page 69: MongoDB - How to model and extract your data

Map Reduce Example

db.orders.mapReduce(
  mapFunction,
  reduceFunction,
  {
    out: { merge: "reduced_orders" },
    query: { date: { $gt: new Date('01/01/2015') } },
    finalize: finalizeFunction
  }
)

Page 70: MongoDB - How to model and extract your data

Map Reduce Example

var mapFunction = function() {
  var key = this.sku;
  var value = {
    tot: this.total,
    qty: this.qty
  };
  emit(key, value);
}

Result:

{ 01A: [{tot: 88, qty: 8}, {tot: 79, qty: 7}] },
{ 02B: {tot: 27, qty: 9} },
{ 03C: [{tot: 24, qty: 8}, {tot: 12, qty: 3}] }


Page 72: MongoDB - How to model and extract your data

Map Reduce Example

var reduceFunction = function(key, values) {
  // accumulator for all the values emitted for this key
  var reducedVal = { qty: 0, tot: 0 };
  for (var i = 0; i < values.length; i++) {
    reducedVal.qty += values[i].qty;
    reducedVal.tot += values[i].tot;
  }
  return reducedVal;
};

Page 73: MongoDB - How to model and extract your data

Map Reduce Example

Result:

{ 01A: {tot: 167, qty: 15} },
{ 02B: {tot: 27, qty: 9} },
{ 03C: {tot: 36, qty: 11} }


Page 75: MongoDB - How to model and extract your data

Map Reduce Example

var finalizeFunction = function(key, reducedVal) {
  reducedVal.avg = reducedVal.tot / reducedVal.qty;
  return reducedVal;
};

Page 76: MongoDB - How to model and extract your data

Map Reduce Example

Result:

{ 01A: {tot: 167, qty: 15, avg: 11.13} },
{ 02B: {tot: 27, qty: 9, avg: 3} },
{ 03C: {tot: 36, qty: 11, avg: 3.27} }


Page 78: MongoDB - How to model and extract your data

Map Reduce Example

> db.reduced_orders.find()

{ _id: "01A", value: {tot: 167, qty: 15, avg: 11.13} },
{ _id: "02B", value: {tot: 27, qty: 9, avg: 3} },
{ _id: "03C", value: {tot: 36, qty: 11, avg: 3.27} }
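The whole map / reduce / finalize flow can be replayed in memory with plain JavaScript to check the numbers above. This is a sketch and not the MongoDB API: `runMapReduce` is a hypothetical helper, the map step returns a [key, value] pair instead of calling emit, reduce is also called for keys with a single value, and the average is rounded to two decimals only to match the figures shown on the slides.

```javascript
// The five sample orders from the example.
const orders = [
  { sku: '01A', qty: 8, total: 88 },
  { sku: '01A', qty: 7, total: 79 },
  { sku: '02B', qty: 9, total: 27 },
  { sku: '03C', qty: 8, total: 24 },
  { sku: '03C', qty: 3, total: 12 }
];

// map: one (key, value) pair per order.
const mapOrder = order => [order.sku, { tot: order.total, qty: order.qty }];

// reduce: sum quantities and totals for one key.
const reduceValues = (key, values) =>
  values.reduce((acc, v) => ({ tot: acc.tot + v.tot, qty: acc.qty + v.qty }), { tot: 0, qty: 0 });

// finalize: add the average price (rounded here for display only).
const finalizeValue = (key, reduced) =>
  ({ ...reduced, avg: Number((reduced.tot / reduced.qty).toFixed(2)) });

function runMapReduce(docs) {
  const buckets = new Map();
  for (const doc of docs) {
    const [key, value] = mapOrder(doc);
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(value);
  }
  return [...buckets].map(([key, values]) =>
    ({ _id: key, value: finalizeValue(key, reduceValues(key, values)) }));
}

const reducedOrders = runMapReduce(orders);
console.log(reducedOrders);
```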

Page 79: MongoDB - How to model and extract your data

thanks


Page 80: MongoDB - How to model and extract your data

References:

➔ http://docs.mongodb.org/manual
➔ http://blog.mongodb.org/post/87200945828/
➔ http://thejackalofjavascript.com/mapreduce-in-mongodb/