MongoSV Schema Workshop
![Page 1: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/1.jpg)
Schema Design Workshop
Sridhar Nanjundeswaran
Software Engineer, [email protected]
@snanjund
Wednesday, December 5, 12
![Page 2: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/2.jpg)
Agenda
• Part One - Basic Schema & Patterns
• Part Two - Schema Design
• Part Three - Sharding
• Part Four - Replication
![Page 3: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/3.jpg)
Why is schema design different?
• In RDBMS design you ask "what answers do I have?"
• In MongoDB you ask "what questions will I have?"
![Page 4: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/4.jpg)
Goals
• Learn Data Modeling with MongoDB
• Labs to try to solve problems
• Understand implications of
  • Replication
  • Sharding
Please, ask many, many questions!
![Page 5: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/5.jpg)
Part One - Basic Schema & Patterns
![Page 6: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/6.jpg)
So why model data?
http://bit.ly/SSs7QB
![Page 7: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/7.jpg)
Normalization
• 1970 E.F. Codd introduces 1st Normal Form (1NF)
• 1971 E.F. Codd introduces 2nd and 3rd Normal Form (2NF, 3NF)
• 1974 Codd & Boyce define Boyce/Codd Normal Form (BCNF)
• 2002 Date, Darwen, Lorentzos define 6th Normal Form (6NF)
Goals:
• Avoid anomalies when inserting, updating or deleting
• Minimize redesign when extending the schema
• Make the model informative to users
• Avoid bias towards a particular style of query
* source: Wikipedia
![Page 8: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/8.jpg)
So today’s example will use...
http://bit.ly/RyIOvO
![Page 9: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/9.jpg)
Terminology
RDBMS            MongoDB
Table            Collection
Row(s)           JSON Document
Index            Index
Join             Embedding & Linking
Partition        Shard
Partition Key    Shard Key
![Page 10: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/10.jpg)
Schema Design - Relational Database
![Page 11: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/11.jpg)
![Page 12: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/12.jpg)
![Page 13: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/13.jpg)
Schema Design - MongoDB
[Diagram labels: embedding, linking]
![Page 14: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/14.jpg)
Basic schema
Design documents that simply map to your application
> post = { author: "Hergé",
           date: ISODate("2011-09-18T09:56:06.298Z"),
           text: "Destination Moon",
           tags: ["comic", "movie"] }
> db.blogs.save(post)
![Page 15: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/15.jpg)
Find the document
> db.blogs.find()
  { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
    author: "Hergé",
    date: ISODate("2011-09-18T09:56:06.298Z"),
    text: "Destination Moon",
    tags: [ "comic", "movie" ] }
Notes:
• ID must be unique, but can be anything you’d like
• MongoDB will generate a default ID if one is not supplied
![Page 16: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/16.jpg)
Add an index, find via Index
Secondary index for "author"
// 1 means ascending, -1 means descending
> db.blogs.ensureIndex( { author: 1 } )
> db.blogs.find( { author: 'Hergé' } )
  { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
    date: ISODate("2011-09-18T09:56:06.298Z"),
    author: "Hergé",
    ... }
![Page 17: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/17.jpg)
![Page 18: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/18.jpg)
![Page 19: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/19.jpg)
Examine the query plan
> db.blogs.find( { author: "Hergé" } ).explain()
{
    "cursor" : "BtreeCursor author_1",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,          // Number of objects returned
    "millis" : 5,     // How long it took
    "indexBounds" : {
        "author" : [
            [ "Hergé", "Hergé" ]
        ]
    }
}
![Page 20: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/20.jpg)
Query operators
Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, $lt, $lte, $gt, $gte, ...
// find posts with any tags
> db.blogs.find( { tags: { $exists: true } } )
Regular expressions:
// posts where author starts with h
> db.blogs.find( { author: /^h/i } )
Counting:
// number of posts written by Hergé
> db.blogs.find( { author: "Hergé" } ).count()
![Page 21: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/21.jpg)
Extending the Schema
http://bit.ly/PpjT1l
![Page 22: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/22.jpg)
![Page 23: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/23.jpg)
Extending the Schema
> new_comment = { author: "Kyle", date: new Date(), text: "great book" }
> db.blogs.update( { text: "Destination Moon" },
                   { "$push": { comments: new_comment },    // Add element to array
                     "$inc": { comments_count: 1 } } )      // Increment counter
![Page 24: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/24.jpg)
Extending the Schema
> db.blogs.find( { author: "Hergé" } )
  { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
    author: "Hergé",
    date: ISODate("2011-09-18T09:56:06.298Z"),
    text: "Destination Moon",
    tags: [ "comic", "movie" ],
    comments: [
      { author: "Kyle",
        date: ISODate("2011-09-19T09:56:06.298Z"),
        text: "great book" }
    ],
    comments_count: 1 }
![Page 25: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/25.jpg)
Extending the Schema
// create index on nested documents:
> db.blogs.ensureIndex( { "comments.author": 1 } )
> db.blogs.find( { "comments.author": "Kyle" } )
// find last 5 posts:
> db.blogs.find().sort( { date: -1 } ).limit(5)
// most commented post:
> db.blogs.find().sort( { comments_count: -1 } ).limit(1)
When sorting, check if you need an index
![Page 26: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/26.jpg)
Common Patterns
http://bit.ly/SNnt4z
![Page 27: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/27.jpg)
Inheritance
http://bit.ly/T7MqUz
![Page 28: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/28.jpg)
![Page 29: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/29.jpg)
Single Table Inheritance - RDBMS
select * from shapes;
id   type     area   radius   length   width
1    circle   3.14   1
2    square   4               2
3    rect     10              5        2
![Page 30: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/30.jpg)
Single Table Inheritance - MongoDB
> db.shapes.find()
  { _id: "1", type: "c", area: 3.14, radius: 1 }
  { _id: "2", type: "s", area: 4, length: 2 }
  { _id: "3", type: "r", area: 10, length: 5, width: 2 }
missing values not stored!
![Page 31: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/31.jpg)
![Page 32: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/32.jpg)
Single Table Inheritance - MongoDB
// find shapes where radius > 0
> db.shapes.find( { radius: { $gt: 0 } } )
// create index
> db.shapes.ensureIndex( { radius: 1 }, { sparse: true } )
index only values present!
![Page 33: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/33.jpg)
One to Many
http://bit.ly/Oqbt8z
![Page 34: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/34.jpg)
One to Many
One to Many relationships can specify
• degree of association between objects
• containment
• life-cycle
![Page 35: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/35.jpg)
One to Many
Embedded Array
• $slice operator to return subset of comments
• some queries harder
  • e.g. find latest comments across all blogs
blogs: { author: "Hergé",
         date: ISODate("2011-09-18T09:56:06.298Z"),
         comments: [
           { author: "Kyle",
             date: ISODate("2011-09-19T09:56:06.298Z"),
             text: "great book" } ] }
> db.blogs.find( { author: "Hergé" }, { comments: { $slice: 10 } } )
![Page 36: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/36.jpg)
One to Many
Normalized (2 collections)
• most flexible
• more queries
blogs: { _id: 1000,
         author: "Hergé",
         date: ISODate("2011-09-18T09:56:06.298Z"),
         comments: [ { comment: 1 } ] }
comments: { _id: 1,
            blog: 1000,
            author: "Kyle",
            date: ISODate("2011-09-19T09:56:06.298Z") }
// findOne returns the document itself, so blog._id is defined
> blog = db.blogs.findOne( { text: "Destination Moon" } );
> db.comments.find( { blog: blog._id } ).limit(5);
![Page 37: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/37.jpg)
Many to Many
http://bit.ly/QTzhBF
![Page 38: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/38.jpg)
Many - Many
Example:
• Blog can have many Tags
• Tag can be used by many Blogs
![Page 39: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/39.jpg)
![Page 40: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/40.jpg)
![Page 41: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/41.jpg)
Many - Many
// Each Tag lists the "_id" of the Blogs that use it
tags:
  { _id: 20,
    name: "comic",              // Unique
    blog_ids: [ 10, 11, 12 ] }
  { _id: 30,
    name: "movie",              // Unique
    blog_ids: [ 10 ] }
// Each Blog lists the "name" of its Tags
blogs:
  { _id: 10,
    name: "Destination Moon",
    tags: [ "comic", "movie" ] }
links via a unique key, in this case the tag name stored in "tags"; it could be the tag "_id" instead
![Page 42: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/42.jpg)
Many - Many
// All Tags for a given Blog
> db.tags.find( { blog_ids: 10 } )
![Page 43: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/43.jpg)
Use _id or not?
Link by tag name:
blogs: { _id: 10, name: "...", tags: [ "comic", "movie" ] }
Pros:
• Single query
Cons:
• Cascade any changes
Link by tag _id:
blogs: { _id: 10, name: "...", tags: [ 20, 30 ] }
Pros:
• Single update
Cons:
• Second query required
![Page 44: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/44.jpg)
![Page 45: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/45.jpg)
![Page 46: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/46.jpg)
Alternative
// Each Blog lists the _id of the Tag
blogs: { _id: 10, name: "Destination Moon", tag_ids: [ 20, 30 ] }
// Association not stored on the Tag
tags: { _id: 20, name: "comic" }
// All Blogs for a given Tag
> db.blogs.find( { tag_ids: 20 } )
// All Tags for a given Blog
> blog = db.blogs.findOne( { _id: 10 } )
> db.tags.find( { _id: { $in: blog.tag_ids } } )
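The two lookups of the "Alternative" layout can be tried without a database. The sketch below mirrors them over plain JavaScript arrays; the second blog and the helper names blogsForTag and tagsForBlog are invented for illustration and are not part of the workshop material.

```javascript
// In-memory stand-ins for the blogs and tags collections.
const blogs = [
  { _id: 10, name: "Destination Moon", tag_ids: [20, 30] },
  { _id: 11, name: "Hypothetical second post", tag_ids: [20] }, // invented example
];
const tags = [
  { _id: 20, name: "comic" },
  { _id: 30, name: "movie" },
];

// Mirrors db.blogs.find( { tag_ids: tagId } ): one query, served by the blogs side.
function blogsForTag(tagId) {
  return blogs.filter((b) => b.tag_ids.includes(tagId));
}

// Mirrors findOne on blogs followed by an $in query on tags: two steps.
function tagsForBlog(blogId) {
  const blog = blogs.find((b) => b._id === blogId);
  return blog ? tags.filter((t) => blog.tag_ids.includes(t._id)) : [];
}

console.log(blogsForTag(20).map((b) => b.name));
console.log(tagsForBlog(10).map((t) => t.name));
```

Storing the association on one side only means a tag change touches one document, at the price of the second lookup shown in tagsForBlog.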
![Page 47: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/47.jpg)
Many - Many Intersection Attributes
Example:
• Blog can have many Tags
• Tag can be used by many Blogs
• When a Tag is used, record the usage date
![Page 48: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/48.jpg)
Many - Many Normalized
// Each Blog lists the _id of the Tag
blogs: { _id: 10, name: "...", tag_ids: [ 20, 30 ] }
// Association not stored on the Tag
tags: { _id: 20, name: "comic" }
// Store the interaction and usage date
usages: { blog_id: 10,      // Blog _id
          tag_id: 20,       // Tag _id
          usage: ISODate("2012-10-12...") }
// Find the Tags for a Blog
for ( var c = db.usages.find( { blog_id: 10 } ); c.hasNext(); ) {
  var u = c.next();
  var t = db.tags.findOne( { _id: u.tag_id } );   // look up the tag named by this usage
  printjson( { tag: t.name, usage: u.usage } );
}
![Page 49: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/49.jpg)
Many - Many Intersection Attributes
// Each Blog lists the Blog Usage Object
blogs: { _id: 10,
         name: "Destination Moon",
         tags: [ { tag: "comic", usage: ISODate("2012-10-12...") },
                 { tag: "movie", usage: ISODate("2012-09-11...") } ] }
// Find the Tags for a Blog
> db.blogs.find( { _id: 10 }, { tags: 1 } )
Pros:
• Usage object encapsulated where used
Cons:
• If updates allowed, changes will have to be cascaded
![Page 50: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/50.jpg)
Summary
• Schema design is the single biggest performance factor
• More choices than in an RDBMS
• Embedding, index design, shard keys
![Page 51: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/51.jpg)
Part Two - Schema Design
![Page 52: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/52.jpg)
Lab #1
Design Schema for Twitter
• Model each user's activity stream
• Users
  • Name, email address, display name
• Tweets
  • Text
  • Who
  • Timestamp
![Page 53: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/53.jpg)
Lab #1 - Solution A
Two Collections
// users - one doc per user
{ _id: "alvin",
  email: "[email protected]",
  display: "jonnyeight" }
// tweets - one doc per user per tweet
{ user: "bob",
  for: "alvin",
  tweet: "20111209-1231",
  text: "Best Tweet Ever!",
  ts: ISODate("2011-09-18T09:56:06.298Z") }
![Page 54: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/54.jpg)
Lab #1 - Solution B
Embedded Tweets
// users - one doc per user with all tweets
{ _id: "alvin",
  email: "[email protected]",
  display: "jonnyeight",
  tweets: [
    { user: "bob",
      tweet: "20111209-1231",
      text: "Best Tweet Ever!",
      ts: ISODate("2011-09-18T09:56:06.298Z") }
  ] }
![Page 55: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/55.jpg)
Embedding
• Great for read performance
• One seek to load entire object
• One roundtrip to database
• Writes can be slow if adding to objects all the time
![Page 56: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/56.jpg)
Linking or Embedding?
Linking can make some queries easy
// Find latest 50 tweets for "alvin" (the stream owner is stored in "for")
> db.tweets.find( { for: "alvin" } ).sort( { ts: -1 } ).limit(50)
But what effect does this have on the system?
![Page 57: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/57.jpg)
[Diagram, built up over five slides: Collection 1 and Index 1 are mapped into Virtual Address Space 1; part of that space is held in Physical RAM, the rest is read from Disk.]
• Virtual Address Space 1: this is your virtual memory size (mapped)
• Physical RAM: this is your resident memory size
• Physical RAM access = 100 ns
• Disk access = 10,000,000 ns
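To put the two access times from the diagram in perspective, here is a rough back-of-the-envelope sketch. The 100 ns and 10,000,000 ns figures come from the slide; the assumption that every uncached document costs one disk seek, and the helper name worstCaseReadMs, are illustrative only.

```javascript
// Access times taken from the slide, in nanoseconds.
const RAM_ACCESS_NS = 100;
const DISK_ACCESS_NS = 10000000;

// How many RAM accesses fit into one disk access.
const diskVsRam = DISK_ACCESS_NS / RAM_ACCESS_NS;

// Hypothetical worst case: each document read needs its own disk access.
function worstCaseReadMs(diskReads) {
  return (diskReads * DISK_ACCESS_NS) / 1e6;
}

console.log(diskVsRam);            // ratio of disk to RAM access time
console.log(worstCaseReadMs(10));  // ten scattered documents
console.log(worstCaseReadMs(1));   // one document holding everything
```

Under these assumptions a query that touches ten scattered documents spends roughly ten times as long waiting on disk as one that reads a single document.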
![Page 62: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/62.jpg)
Linking = Many seeks + random reads
> db.tweets.find( { for: "alvin" } ).sort( { ts: -1 } ).limit(10)
[Diagram: Virtual Address Space 1, Physical RAM and Disk holding Collection 1 and Index 1, with three separate reads marked 1, 2, 3.]
![Page 63: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/63.jpg)
Embedding = Large Sequential Read
> db.tweets.find( { _id: "alvin" } )
[Diagram: Virtual Address Space 1, Physical RAM and Disk holding Collection 1 and Index 1, with a single read marked 1.]
![Page 64: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/64.jpg)
Lab #2
Alternative Schema
• Display last 10 tweets from today
• Efficiently use memory and disk seeks / IOPS
![Page 65: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/65.jpg)
Lab #2 - Solution
Buckets
// tweets : one doc per user per day
> db.tweets.findOne()
{ _id: "alvin-2011/12/09",
  email: "[email protected]",
  tweets: [
    { user: "Bob",
      tweet: "20111209-1231",
      text: "Best Tweet Ever!" },
    { author: "Joe",
      tweet: "20111210-9025",
      date: "May 27 2011",
      text: "Stuck in traffic (again)" } ] }
![Page 66: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/66.jpg)
Lab #2 - Solution
Last 10 Tweets
// $push appends, so the newest tweets sit at the end of the array;
// a negative $slice returns the last 10 elements
> db.tweets.find( { _id: "alvin-2011/12/09" }, { tweets: { $slice: -10 } } ).sort( { _id: -1 } ).limit(1)
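The sign of the $slice count decides which end of the array comes back: a positive count keeps the first elements, a negative count keeps the last ones. The sketch below imitates that behaviour with Array.prototype.slice on an in-memory array; sliceProjection is a hypothetical helper, not a MongoDB API.

```javascript
// Plain-JavaScript stand-in for a { $slice: n } projection on one array field.
function sliceProjection(array, n) {
  // n >= 0: first n elements; n < 0: last |n| elements.
  return n >= 0 ? array.slice(0, n) : array.slice(n);
}

// Twelve tweets in insertion order, oldest first (as $push would leave them).
const tweets = Array.from({ length: 12 }, (_, i) => "tweet-" + (i + 1));

const firstTen = sliceProjection(tweets, 10);  // the oldest ten
const lastTen = sliceProjection(tweets, -10);  // the newest ten

console.log(firstTen[0], firstTen[firstTen.length - 1]);
console.log(lastTen[0], lastTen[lastTen.length - 1]);
```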
![Page 67: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/67.jpg)
Lab #2 - Solution
Adding a Tweet
> tweet = { user: "Bob",
            tweet: "20111209-1231",
            text: "Best Tweet Ever!" }
> db.tweets.update( { _id: "alvin-2011/12/09" }, { $push: { tweets: tweet } } );
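The bucket key is simply the user name plus the calendar day. A minimal sketch of how an application might derive that key and append to the right bucket follows, using an in-memory Map in place of the collection; bucketId and addTweet are hypothetical names, and using the UTC day is an assumption.

```javascript
// Build a bucket _id such as "alvin-2011/12/09" (assumes buckets follow UTC days).
function bucketId(user, date) {
  const y = date.getUTCFullYear();
  const m = String(date.getUTCMonth() + 1).padStart(2, "0");
  const d = String(date.getUTCDate()).padStart(2, "0");
  return `${user}-${y}/${m}/${d}`;
}

// In-memory stand-in for the tweets collection, keyed by bucket _id.
const buckets = new Map();

// Append a tweet to the day's bucket, creating the bucket on first use
// (roughly what an upserting $push would do on the server).
function addTweet(user, date, tweet) {
  const id = bucketId(user, date);
  if (!buckets.has(id)) {
    buckets.set(id, { _id: id, tweets: [] });
  }
  buckets.get(id).tweets.push(tweet);
  return id;
}
```

All tweets from one day land in one document, which is what lets the read side fetch a day's stream with a single lookup.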
![Page 68: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/68.jpg)
Lab #2 - Solution
Getting All Tweets
// anchor on "alvin-" so that other user names starting with "alvin" do not match
> cursor = db.tweets.find( { _id: /^alvin-/ } ).sort( { _id: -1 } )
> while ( cursor.hasNext() ) {
    doc = cursor.next();
    for ( var i = 0; i < doc.tweets.length; i++ )
      printjson( doc.tweets[i] )
  }
![Page 69: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/69.jpg)
Lab #2 - Solution
Deleting a Tweet
> db.tweets.update( { _id: "alvin-2011/12/09" }, { $pull: { tweets: { tweet: "20111209-1231" } } } )
![Page 70: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/70.jpg)
Bucket = 1 seek + 1 sequential read
> db.tweets.find( { _id: "alvin-2011/12/09" }, { tweets: { $slice: -10 } } ).sort( { _id: -1 } ).limit(1)
[Diagram: Virtual Address Space 1, Physical RAM and Disk holding Collection 1 and Index 1, with a single read marked 1.]
![Page 72: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/72.jpg)
Trees
Hierarchical information
![Page 73: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/73.jpg)
Trees
Full Tree in Document
{ retweet: [
    { who: "Kyle", text: "...",
      retweet: [
        { who: "James", text: "...", retweet: [] }
      ] }
  ] }
Pros: Single Document, Performance, Intuitive
Cons: Hard to search, Partial Results, 16MB limit
![Page 74: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/74.jpg)
![Page 75: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/75.jpg)
Array of Ancestors
// Store all Ancestors of a node
{ _id: "a" }
{ _id: "b", tree: [ "a" ], retweet: "a" }
{ _id: "c", tree: [ "a", "b" ], retweet: "b" }
{ _id: "d", tree: [ "a", "b" ], retweet: "b" }
{ _id: "e", tree: [ "a" ], retweet: "a" }
{ _id: "f", tree: [ "a", "e" ], retweet: "e" }
// find all direct retweets of "b"
> db.tweets.find( { retweet: "b" } )
// find all retweets of "e" anywhere in tree
> db.tweets.find( { tree: "e" } )
// find tweet history of f:
> tweets = db.tweets.findOne( { _id: "f" } ).tree
> db.tweets.find( { _id: { $in: tweets } } )
[Diagram: tree with root A; A has children B and E; B has children C and D; E has child F.]
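The three ancestor-array queries can be checked against the six sample documents without a server. The sketch below runs the same lookups over a plain array; directRetweets, allRetweets and history are hypothetical helper names.

```javascript
// The six sample documents from the slide.
const tweets = [
  { _id: "a" },
  { _id: "b", tree: ["a"], retweet: "a" },
  { _id: "c", tree: ["a", "b"], retweet: "b" },
  { _id: "d", tree: ["a", "b"], retweet: "b" },
  { _id: "e", tree: ["a"], retweet: "a" },
  { _id: "f", tree: ["a", "e"], retweet: "e" },
];

// Mirrors find( { retweet: id } ): direct children only.
const directRetweets = (id) =>
  tweets.filter((t) => t.retweet === id).map((t) => t._id);

// Mirrors find( { tree: id } ): every descendant, at any depth.
const allRetweets = (id) =>
  tweets.filter((t) => (t.tree || []).includes(id)).map((t) => t._id);

// Mirrors findOne( { _id: id } ).tree: the chain of ancestors.
const history = (id) => {
  const doc = tweets.find((t) => t._id === id);
  return doc && doc.tree ? doc.tree : [];
};

console.log(directRetweets("b"), allRetweets("e"), history("f"));
```

Each question is answered by one equality match, which is the attraction of storing the full ancestor list on every node.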
![Page 76: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/76.jpg)
Trees as Paths
Store hierarchy as a path expression
• Separate each node by a delimiter, e.g. "/"
• Use a regular-expression prefix match to find parts of a tree
{ retweets: [
    { _id: "a", text: "initial tweet", path: "a" },
    { _id: "b", text: "retweet with comment", path: "a/b" },
    { _id: "c", text: "reply to retweet", path: "a/b/c" } ] }
// Find the conversations "a" started
> db.tweets.find( { path: /^a/i } )
[Diagram: tree with nodes A to F.]
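A bare prefix such as /^a/ also matches any path whose first node merely starts with "a". The sketch below shows one way to anchor the match on the delimiter; the fourth document with path "ab/x" and the helper name subtree are invented for illustration.

```javascript
// Sample paths from the slide plus one invented look-alike root ("ab").
const docs = [
  { _id: "a", path: "a" },
  { _id: "b", path: "a/b" },
  { _id: "c", path: "a/b/c" },
  { _id: "x", path: "ab/x" }, // hypothetical: a different tree whose root starts with "a"
];

// Return the ids of rootPath and everything below it.
function subtree(allDocs, rootPath) {
  // Escape regex metacharacters, then require end-of-string or "/" after the prefix.
  const escaped = rootPath.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp("^" + escaped + "(/|$)");
  return allDocs.filter((d) => pattern.test(d.path)).map((d) => d._id);
}

const loose = docs.filter((d) => /^a/.test(d.path)).map((d) => d._id);
console.log(loose);               // the bare prefix also picks up "x"
console.log(subtree(docs, "a"));  // only the tree rooted at "a"
```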
![Page 77: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/77.jpg)
Queues & Workflows
http://bit.ly/QeNsPX
![Page 78: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/78.jpg)
Lab #3
Following Requests
• Users are allowed to "follow" another user
  • User sends a "follow" request
  • Follower approves or not
  • Requests are timed out after 7 days
• The approval is an async process
![Page 79: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/79.jpg)
![Page 80: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/80.jpg)
Lab #3 - Solution
Queues & Workflows
• Need to maintain order and state
• Ensure that updates are atomic
> db.approvals.insert( { inprogress: false,
                         approved: false,
                         priority: 1,
                         text: "Hey Jim, want to follow you!" } );
// find highest priority approval and mark as in-progress
> job = db.approvals.findAndModify( {
    query: { inprogress: false },
    sort: { priority: -1 },
    update: { $set: { inprogress: true, started: new Date() } },
    new: true
  } )
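The selection logic of the findAndModify call (filter on inprogress, sort by priority descending, mark the winner and return it) can be mirrored in plain JavaScript. This sketch only reproduces the logic over an array; the atomicity that makes the pattern safe with several workers comes from the server, not from this code. claimNext is a hypothetical name.

```javascript
// Claim the highest-priority job that nobody is working on yet.
// Returns the updated job (like new: true), or null when the queue is drained.
function claimNext(queue, now) {
  const waiting = queue.filter((job) => !job.inprogress);   // query
  if (waiting.length === 0) return null;
  waiting.sort((a, b) => b.priority - a.priority);          // sort: { priority: -1 }
  const job = waiting[0];
  job.inprogress = true;                                    // update: $set
  job.started = now;
  return job;
}

// Example queue with three pending approvals (invented data).
const queue = [
  { _id: 1, inprogress: false, approved: false, priority: 1 },
  { _id: 2, inprogress: false, approved: false, priority: 5 },
  { _id: 3, inprogress: false, approved: false, priority: 3 },
];
const first = claimNext(queue, new Date());
console.log(first._id, first.inprogress);
```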
![Page 81: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/81.jpg)
Lab #3 - Solution
Queues & Workflows
{ inprogress: true,                                 // updated
  priority: 1,
  approved: false,
  started: ISODate("2011-09-18T09:56:06.298Z")      // added
  ... }
![Page 82: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/82.jpg)
Lab #3 - Solution
Queues & Workflows
• Follower approves request
// update approval after receiving approval
> job = db.approvals.update( { _id: "1234" }, { $set: { approved: true } } )
• System times out request after 7 days
> var limit = new Date(); limit.setDate( limit.getDate() - 7 );
// requests started before the 7-day cutoff have timed out;
// the last argument (multi) applies the update to all of them
> job = db.approvals.update( { inprogress: true, started: { $lt: limit } },
                             { $set: { approved: false } },
                             false, true )
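A request has timed out when it started before the cutoff, that is, when its start time is less than "now minus 7 days". The sketch below spells out that comparison; isTimedOut and the sample dates are illustrative assumptions.

```javascript
const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// True when an in-progress request started more than seven days before `now`.
function isTimedOut(request, now) {
  const cutoff = new Date(now.getTime() - SEVEN_DAYS_MS);
  return request.inprogress === true && request.started < cutoff;
}

// Invented sample data: `now` is fixed so the result does not depend on the clock.
const now = new Date("2011-09-26T00:00:00.000Z");
const oldRequest = { inprogress: true, started: new Date("2011-09-18T09:56:06.298Z") };
const recentRequest = { inprogress: true, started: new Date("2011-09-20T00:00:00.000Z") };

console.log(isTimedOut(oldRequest, now), isTimedOut(recentRequest, now));
```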
![Page 83: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/83.jpg)
Lab #4
Voting
Twitter meets Stack Overflow
• Users can "vote" for a tweet
• A user can "vote" once and only once
• Need to display current votes
![Page 84: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/84.jpg)
Lab #4 - Solution
Votes
// One document per voter per tweet
> db.votes.insert( { tweet: "20111209-1231", voter: "alvin" } );
// Unique index guarantees the user can't vote twice
> db.votes.ensureIndex( { tweet: 1, voter: 1 }, { unique: true } );
// Count will return the number of votes cast
> db.votes.find( { tweet: "20111209-1231" } ).count()
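The effect of the unique compound index on (tweet, voter) can be imitated with a set of voters per tweet: a second vote by the same user for the same tweet is rejected, and the count is the size of the set. VoteStore below is a hypothetical in-memory stand-in, not how the server implements the index.

```javascript
// In-memory stand-in for the votes collection with a unique (tweet, voter) key.
class VoteStore {
  constructor() {
    this.votersByTweet = new Map();
  }

  // Returns true if the vote was recorded, false if it would be a duplicate.
  vote(tweet, voter) {
    if (!this.votersByTweet.has(tweet)) {
      this.votersByTweet.set(tweet, new Set());
    }
    const voters = this.votersByTweet.get(tweet);
    if (voters.has(voter)) return false; // the unique index would reject this insert
    voters.add(voter);
    return true;
  }

  // Number of votes cast for a tweet.
  count(tweet) {
    const voters = this.votersByTweet.get(tweet);
    return voters ? voters.size : 0;
  }
}

const store = new VoteStore();
console.log(store.vote("20111209-1231", "alvin"));
console.log(store.vote("20111209-1231", "alvin"));
console.log(store.count("20111209-1231"));
```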
![Page 85: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/85.jpg)
Count or Not?
• Indexes in MongoDB do not maintain counts
• The count has to be computed via an index scan
// One summary document per tweet, no "voter" key
> db.votes.update( { tweet: "20111209-1231", voter: { $exists: false } },
                   { "$inc": { count: 1 } },
                   true, false );
// Return the count for the no "voter" document
> db.votes.find( { tweet: "20111209-1231", voter: { $exists: false } },
                 { count: 1, _id: 0 } )
![Page 86: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/86.jpg)
Lab #5
Time Series
• Record votes by
  • Day, Hour, Minute
• Show time series of votes cast
![Page 87: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/87.jpg)
Lab #5 - Solution ATime Series// Time series buckets, hour and minute sub-docs{ _id: "20111209-1231", ts: ISODate("2011-12-09T00:00:00.000Z") daily: 67, hourly: { 0: 23, 1: 14, 2: 19 ... 23: 72 }, minute: { 0: 0, 1: 4, 2: 6 ... 1439: 0 }}
![Page 88: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/88.jpg)
Lab #5 - Solution A: Time Series
// Add one to the last minute before midnight
// A single $inc carries all three counters
> db.votes.update(
    { _id: "20111209-1231", ts: ISODate("2011-12-09T00:00:00.000Z") },
    { $inc: { daily: 1, "hourly.23": 1, "minute.1439": 1 } } )
What is the cost of updating the minute before midnight?
![Page 89: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/89.jpg)
BSON Storage
• Sequence of key/value pairs
• NOT a hash map
• Optimized to scan quickly
• 1439 skips to reach minute 1439
[Diagram: flat sequence of minute keys 0, 1, 2, 3 ... 1439]
![Page 90: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/90.jpg)
BSON Storage
• Can skip sub-documents
• 23 skips (hours) + 59 skips (minutes) = 82 skips
[Diagram: hour sub-documents 0, 1 ... 23, each holding 60 minute keys (0 ... 59, 60 ... 119, up to 1380 ... 1439)]
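The two skip counts can be reproduced with a small sketch. It assumes a document is modelled as an ordered array of [key, value] pairs (to mimic a sequential scan rather than a hash lookup), with a nested array of pairs standing in for a sub-document; `skipsToReach` is a hypothetical helper, not part of any driver.

```javascript
// Count how many elements a linear scan passes over before reaching `path`.
// Skipping a whole sub-document counts as one skip.
function skipsToReach(doc, path) {
  let skips = 0;
  let level = doc;
  for (const wanted of path) {
    let found = false;
    for (const [key, value] of level) {
      if (key === wanted) {
        level = value; // descend into the sub-document (or reach the leaf)
        found = true;
        break;
      }
      skips++;
    }
    if (!found) throw new Error("key not found: " + wanted);
  }
  return skips;
}

// Solution A layout: 1440 minute keys in one flat sub-document.
const flat = Array.from({ length: 1440 }, (_, m) => [String(m), 0]);

// Hour-bucketed layout: 24 hour sub-documents of 60 minute keys each.
const nested = Array.from({ length: 24 }, (_, h) => [
  String(h),
  Array.from({ length: 60 }, (_, m) => [String(m), 0]),
]);

console.log(skipsToReach(flat, ["1439"]));       // 1439
console.log(skipsToReach(nested, ["23", "59"])); // 82
```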
![Page 91: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/91.jpg)
Lab #5 - Solution B: Time Series
// Time series buckets, each hour a sub-document
{ _id: "20111209-1231",
  ts: ISODate("2011-12-09T00:00:00.000Z"),
  daily: 67,
  minute: { 0: { 0: 0, 1: 7, ... 59: 2 }, ... 23: { 0: 15, ... 59: 6 } } }
// Add one to the last minute before midnight
> db.votes.update(
    { _id: "20111209-1231", ts: ISODate("2011-12-09T00:00:00.000Z") },
    { $inc: { daily: 1, "minute.23.59": 1 } } )
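An application would normally derive the bucket path from the vote's timestamp. A minimal sketch, assuming buckets are kept in UTC; `dayStart` and `minuteBucketUpdate` are hypothetical helper names.

```javascript
// Midnight (UTC) of the day a vote was cast, matching the "ts" field.
function dayStart(when) {
  return new Date(Date.UTC(
    when.getUTCFullYear(), when.getUTCMonth(), when.getUTCDate()));
}

// Build the update document for the hour-bucketed layout:
// one $inc that bumps the daily total and the "minute.<hour>.<minute>" slot.
function minuteBucketUpdate(when) {
  const inc = { daily: 1 };
  inc["minute." + when.getUTCHours() + "." + when.getUTCMinutes()] = 1;
  return { $inc: inc };
}
```

The returned object is what would be passed as the second argument of the update shown above.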
![Page 92: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/92.jpg)
Lab #6 - Inventory
• User has a number of "votes" they can use
![Page 93: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/93.jpg)
Lab #6 - Solution: Inventory
// Number of votes left and the tweets already voted for
{ _id: "alvin", votes: 42, voted_for: [] }
// Subtract a vote and add the voted-for tweet "20111209-1231"
> db.user.update(
    { _id: "alvin", votes: { $gt: 0 }, voted_for: { $ne: "20111209-1231" } },
    { $push: { voted_for: "20111209-1231" }, $inc: { votes: -1 } } )
![Page 94: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/94.jpg)
Lab #6 - Solution: Inventory
// After vote: votes decremented, tweet added to voted_for
> db.user.findOne()
{ _id: "alvin", votes: 41, voted_for: ["20111209-1231"] }
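The guard in the update's query is what makes this safe: the document only matches while votes remain and the tweet has not been voted for yet. A minimal in-memory sketch of the same logic, where `castVote` is a hypothetical function and not a MongoDB API:

```javascript
// Applies the same conditions and modifiers as the guarded update,
// but against a plain object. Returns true if the vote was applied.
function castVote(user, tweetId) {
  const hasVotesLeft = user.votes > 0;                   // votes: { $gt: 0 }
  const notYetVoted = !user.voted_for.includes(tweetId); // voted_for: { $ne: tweetId }
  if (!hasVotesLeft || !notYetVoted) return false;       // query matches nothing
  user.voted_for.push(tweetId);                          // $push
  user.votes -= 1;                                       // $inc: { votes: -1 }
  return true;
}
```

On the server the check and the change happen in one single-document update, so two concurrent votes cannot both pass the guard.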
![Page 95: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/95.jpg)
Lab #7 - Statistic Buckets
• Record referring web sites on customer sign up
• Independent counter for each web site
![Page 99: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/99.jpg)
Lab #7 - Solution A: Statistic Buckets
{ _id: "alvin",
  referrers: [ { domain: "www.google.co.uk", count: 4 },
               { domain: "www.yahoo.com", count: 1 } ] }
// Increment the counter of the matching array element
> db.referers.update( { "referrers.domain": "www.google.co.uk" }, { $inc: { "referrers.$.count": 1 } } )
// After the update
{ _id: "alvin",
  referrers: [ { domain: "www.google.co.uk", count: 5 },
               { domain: "www.yahoo.com", count: 1 } ] }
![Page 101: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/101.jpg)
Lab #7 - Solution A: Statistic Buckets
> db.referers.update( { "referrers.domain": "www.bing.com" }, { $inc: { "referrers.$.count": 1 } }, false, true )
What happens if a new referring site is used?
No array element matches "www.bing.com", so nothing is updated and the new site is not counted.
![Page 102: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/102.jpg)
Lab #7 - Solution B: Statistic Buckets
// Need to replace dots with underscores
{ _id: "alvin", referrers: { "www_google_co_uk": 4, "www_yahoo_com": 1 } }
// A simple $inc will add www_bing_com if not present
> db.referers.update( { _id: "alvin" }, { $inc: { "referrers.www_bing_com": 1 } }, true, false );
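The dot replacement is easy to get wrong by hand, so it may help to build the update document in one place. A minimal sketch; `referrerIncrement` is a hypothetical helper name.

```javascript
// Build the $inc document for one referring domain.
// Dots are replaced because a dot in a field path means "descend into a sub-document".
function referrerIncrement(domain) {
  const key = domain.replace(/\./g, "_");
  const inc = {};
  inc["referrers." + key] = 1;
  return { $inc: inc };
}

console.log(JSON.stringify(referrerIncrement("www.bing.com")));
// {"$inc":{"referrers.www_bing_com":1}}
```

One caveat of this encoding: a domain that already contains an underscore cannot be told apart from one that had a dot in that position.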
![Page 103: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/103.jpg)
Part Three - Sharding
![Page 104: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/104.jpg)
What is Sharding?
• Ad-hoc partitioning
• Consistent hashing
  • Amazon Dynamo
• Range based partitioning
  • Google BigTable
  • Yahoo! PNUTS
  • MongoDB
![Page 105: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/105.jpg)
MongoDB Sharding
• Automatic partitioning and management
• Range based
• Convert to sharded system with no downtime
• Fully consistent
• No code changes required
![Page 106: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/106.jpg)
Sharding - Range distribution
shard01 shard02 shard03
sh.shardCollection("mydb.tweets", {_id: 1} , false)
![Page 107: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/107.jpg)
Sharding - Range distribution
shard01 shard02 shard03
a-i j-r s-z
![Page 108: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/108.jpg)
Sharding - Splits
[Diagram: shard01 holds a-i and shard03 holds s-z; on shard02 the j-r chunk has split into ja-jz and k-r]
![Page 109: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/109.jpg)
Sharding - Splits
[Diagram: shard01 holds a-i and shard03 holds s-z; after further splits shard02 holds four chunks: ja-ji, ji-js, js-jw, jz-r]
![Page 110: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/110.jpg)
Sharding - Auto Balancing
[Diagram: shard01 holds a-i, shard02 holds ja-ji, ji-js, js-jw, jz-r, shard03 holds s-z; the balancer starts migrating chunks js-jw and jz-r off shard02]
![Page 111: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/111.jpg)
Sharding - Auto Balancing
[Diagram: after balancing, the chunks a-i, ja-ji, ji-js, js-jw, jz-r and s-z are spread across shard01, shard02 and shard03]
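The idea behind auto balancing can be sketched as "move chunks from the fullest shard to the emptiest until the counts are close". This is an illustration only: the real balancer uses its own thresholds and migration rules, and `balance` is a hypothetical function.

```javascript
// assignment maps shard name -> array of chunk labels.
// Moves one chunk at a time until no two shards differ by more than one chunk.
// Returns the list of migrations performed.
function balance(assignment) {
  const moves = [];
  for (;;) {
    const shards = Object.keys(assignment).sort(
      (a, b) => assignment[a].length - assignment[b].length);
    const least = shards[0];
    const most = shards[shards.length - 1];
    if (assignment[most].length - assignment[least].length <= 1) break;
    const chunk = assignment[most].pop();
    assignment[least].push(chunk);
    moves.push({ chunk, from: most, to: least });
  }
  return moves;
}

const layout = {
  shard01: ["a-i"],
  shard02: ["ja-ji", "ji-js", "js-jw", "jz-r"],
  shard03: ["s-z"],
};
const moves = balance(layout);
console.log(moves.map(m => m.chunk)); // [ 'jz-r', 'js-jw' ]
```

With the layout from the slides, two chunks leave shard02 and every shard ends up with two chunks.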
![Page 112: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/112.jpg)
Sharding for caching
![Page 113: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/113.jpg)
Sharding for caching
[Diagram: a single shard01 holds all ranges (a-i, j-r, s-z): 300 GB data, 96 GB memory, 3:1 data/memory]
![Page 114: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/114.jpg)
Aggregate Horizontal Resources
[Diagram: the 300 GB of data is spread over shard01 (a-i), shard02 (j-r) and shard03 (s-z); each shard holds 100 GB with 96 GB memory, 1:1 data/memory]
![Page 115: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/115.jpg)
Sharding Features
• Shard data with no downtime
• Automatic balancing as data is written
• Commands routed (switched) to correct node
  • Inserts - must have the Shard Key
  • Updates - can have the Shard Key
  • Queries
    • With Shard Key - routed to nodes
    • Without Shard Key - scatter gather
  • Indexed / Sorted Queries
    • With Shard Key - routed in order
    • Without Shard Key - distributed sort merge
![Page 116: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/116.jpg)
Lab #8 - Sharding Twitter Pictures
User can upload pictures to Twitter feed
{ photo_id : ???? , data : <binary> }
What should photo_id be?
How will photo_id be sharded?
![Page 117: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/117.jpg)
Lab #8 - Sharding Key
{ photo_id : ???? , data : <binary> }
What's the right key?
• auto increment
• MD5( data )
• month() + MD5( data )
![Page 118: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/118.jpg)
Right balanced access
• Only have to keep small portion in RAM
• Right shard "hot"
• Time Based
  • ObjectId
  • Auto Increment
![Page 119: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/119.jpg)
Random access
• Have to keep entire index in RAM
• All shards "warm"
• Hash
![Page 120: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/120.jpg)
Segmented access
• Have to keep some index in RAM
• Some shards "warm"
• Month + Hash
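The three candidate keys from the lab can be written out to see how each one orders new photos. A minimal sketch using Node's built-in crypto module; the helper names are hypothetical, and the month prefix is assumed to include the year so that keys keep increasing across years.

```javascript
const crypto = require("crypto");

// MD5 of the picture bytes, as a 32-character hex string.
function md5Hex(data) {
  return crypto.createHash("md5").update(data).digest("hex");
}

// Auto increment: every new key sorts after all earlier ones (right balanced access).
function makeCounter() {
  let n = 0;
  return () => ++n;
}

// Hash only: new keys land anywhere in the key space (random access).
function hashKey(data) {
  return md5Hex(data);
}

// Month + hash: new keys land anywhere within the current month's segment.
function monthHashKey(data, when) {
  const yyyy = when.getUTCFullYear();
  const mm = String(when.getUTCMonth() + 1).padStart(2, "0");
  return `${yyyy}${mm}-${md5Hex(data)}`;
}
```

For example, `monthHashKey("abc", new Date("2012-12-05T00:00:00Z"))` yields a key starting with `201212-`, so all December 2012 uploads share one prefix while the hash spreads them within it.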
![Page 121: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/121.jpg)
Lab #9 - Single Identities
// Shard by _id
ids:
{ _id: "alvin",
  email: "[email protected]",
  addresses: [ { state: "CA", country: "USA" }, { country: "UK" } ] }
How would the following queries be executed?
> db.ids.find( { _id: "alvin" } )
> db.ids.find( { email: "[email protected]" } )
![Page 122: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/122.jpg)
Sharding - Routed Query
[Diagram: find( { _id: "alvin" } ) arrives at the cluster; chunks are a-i on shard01, ja-ji, ji-js, js-jw, jz-r on shard02, s-z on shard03]
![Page 123: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/123.jpg)
Sharding - Routed Query
[Diagram: find( { _id: "alvin" } ) is routed only to the shard that owns the chunk containing "alvin" (a-i)]
![Page 124: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/124.jpg)
Sharding - Scatter Gather
[Diagram: find( { email: "[email protected]" } ) does not contain the shard key, so it is sent to shard01, shard02 and shard03 and the results are gathered]
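The difference between a routed query and scatter gather comes down to whether the query carries the shard key. A minimal sketch of that decision; the chunk bounds, shard names and `targetShards` are illustrative assumptions, not how the router is actually implemented.

```javascript
// Chunk table: each chunk owns keys in [min, max).
const chunks = [
  { min: "a", max: "j", shard: "shard01" },
  { min: "j", max: "s", shard: "shard02" },
  { min: "s", max: "{", shard: "shard03" }, // "{" sorts just after "z"
];

// Returns the shards a query has to visit.
function targetShards(query, shardKey) {
  if (Object.prototype.hasOwnProperty.call(query, shardKey)) {
    // Shard key present: route to the one chunk that covers the value.
    const value = query[shardKey];
    const chunk = chunks.find(c => value >= c.min && value < c.max);
    return chunk ? [chunk.shard] : [];
  }
  // No shard key: every shard has to be asked (scatter gather).
  return [...new Set(chunks.map(c => c.shard))];
}

console.log(targetShards({ _id: "alvin" }, "_id"));          // [ 'shard01' ]
console.log(targetShards({ email: "placeholder" }, "_id"));  // all three shards
```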
![Page 126: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/126.jpg)
Lab #9 - Multiple Identities
User can have multiple identities
• twitter name
• email address
• facebook name
• etc.
What is the best sharding key & schema design?
![Page 127: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/127.jpg)
Lab #9 - Solution A: Multiple Identities
// Shard by _id
{ _id: "alvin",
  email: "[email protected]",
  fb: "alvin.richards",   // facebook
  li: "alvin.j.richards", // linkedin
  tweets: [ ... ] }
• Lookup by _id hits 1 node
• Lookup by email, li or fb is scatter gather
• Cannot create a unique index on email, li or fb
![Page 128: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/128.jpg)
Lab #9 - Solution B: Multiple Identities
identities
{ _id: { _id: "alvin" }, info: "1200-42" }
{ _id: { em: "[email protected]" }, info: "1200-42" }
{ _id: { li: "alvin.j.richards" }, info: "1200-42" }
tweets
{ _id: "1200-42", tweets: [ ... ] }
• Shard identities on { _id: 1 }
• Can create unique index on _id
• Shard tweets on { _id: 1 }
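With this layout every lookup becomes two routed reads: identity to info, then info to tweets. A minimal in-memory sketch of that two-step read; the Maps stand in for the two collections, `findTweetsByIdentity` is a hypothetical helper, and the e-mail value is a placeholder because the original address is not available.

```javascript
// identities: one entry per identity, keyed by the whole _id sub-document.
const identities = new Map([
  [JSON.stringify({ _id: "alvin" }), "1200-42"],
  [JSON.stringify({ em: "alvin@example.com" }), "1200-42"], // placeholder address
  [JSON.stringify({ li: "alvin.j.richards" }), "1200-42"],
]);

// tweets: keyed by the shared "info" value.
const tweets = new Map([
  ["1200-42", { _id: "1200-42", tweets: [] }],
]);

function findTweetsByIdentity(identity) {
  // Step 1: routed lookup on the identities shard key.
  const info = identities.get(JSON.stringify(identity));
  if (info === undefined) return null;
  // Step 2: routed lookup on the tweets shard key.
  return tweets.get(info) || null;
}

console.log(findTweetsByIdentity({ li: "alvin.j.richards" })._id); // 1200-42
```

Because each identity is its own _id, uniqueness comes for free, at the price of a second read.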
![Page 129: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/129.jpg)
Sharding - Multiple Identities
[Diagram: across shard01, shard02 and shard03, the ids collection is split into chunks by identity key (em: a-q, em: r-z, _id: a-z, li: a-c, li: d-r, li: s-z) and the tweets collection is split by _id ("Min"-"1100", "1100"-"1200", "1200"-"Max")]
![Page 130: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/130.jpg)
Sharding - Multiple Identities
ids.find( { _id: { em: "[email protected]" } } )
[Diagram: same chunk layout of the ids and tweets collections across shard01, shard02 and shard03; the ids query is routed to the shard holding the matching em chunk]
![Page 131: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/131.jpg)
Sharding - Multiple Identities
ids.find( { _id: { em: "[email protected]" } } )
tweets.find( { _id: "1200-42" } )
[Diagram: same chunk layout of the ids and tweets collections across shard01, shard02 and shard03; the tweets query is routed to the shard holding the "1200"-"Max" chunk]
![Page 132: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/132.jpg)
Part Four - Replication
![Page 133: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/133.jpg)
Types of outage
• Planned
  • Hardware upgrade
  • O/S or file-system tuning
  • Relocation of data to new file-system / storage
  • Software upgrade
• Unplanned
  • Hardware failure
  • Data center failure
  • Region outage
  • Human error
  • Application corruption
![Page 134: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/134.jpg)
Replica Sets
• Data Protection
  • Multiple copies of the data
  • Spread across Data Centers, AZs
• High Availability
  • Automated Failover
  • Automated Recovery
![Page 135: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/135.jpg)
Replica Sets
[Diagram: the App writes to the Primary and reads from the Primary and both Secondaries; the Secondaries are kept up to date by asynchronous replication]
![Page 136: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/136.jpg)
Replica Sets
[Diagram: the App writes to the Primary and reads from the Primary and both Secondaries]
![Page 137: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/137.jpg)
Replica Sets
[Diagram: Automatic Election of new Primary. One Secondary becomes the new Primary; the App writes to it and reads from it and from the remaining Secondary]
![Page 138: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/138.jpg)
Replica Sets
[Diagram: New primary serves data. The former Primary is in the Recovering state; the App writes to the new Primary and reads from it and from the Secondary]
![Page 139: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/139.jpg)
Replica Sets
[Diagram: the recovered node has rejoined as a Secondary; the App writes to the Primary and reads from all three members]
![Page 140: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/140.jpg)
Elections
During an election
• Most up to date
• Highest priority
• Less than 10s behind failed Primary
![Page 141: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/141.jpg)
Types of Durability with MongoDB
• Fire and forget
• Wait for error
• Wait for fsync
• Wait for journal sync
• Wait for replication
![Page 142: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/142.jpg)
Network Ack - Old Default
[Diagram: Driver sends the write to the Primary; the Primary applies it in memory]
![Page 143: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/143.jpg)
Get last error - New default
[Diagram: Driver sends the write to the Primary, followed by getLastError; the Primary applies the write in memory]
![Page 144: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/144.jpg)
Wait for Journal Sync
[Diagram: Driver sends the write to the Primary, followed by getLastError with j:true; the Primary applies the write in memory and writes it to the journal]
![Page 145: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/145.jpg)
Wait for replication
[Diagram: Driver sends the write to the Primary, followed by getLastError with w:2; the Primary applies the write in memory and replicates it to a Secondary]
![Page 146: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/146.jpg)
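Tunable Data Durability
[Diagram: durability scale from Less to More across four stages: Memory, Journal, Secondary, Other Data Center. Settings shown along the scale: network ACK, w=1, w=1 j=true, w="majority" or w=n, w="myTag"; the scale runs from async to sync, with RDBMS marked for comparison]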
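The settings on the scale map onto options of the getLastError command used by drivers of that era. A minimal sketch that only builds the command documents; the level names and `getLastErrorCommand` are hypothetical, while `fsync`, `j` and `w` are the command's option names. "Fire and forget" is simply not sending the command at all.

```javascript
// Build the getLastError command document for a chosen durability level.
function getLastErrorCommand(level) {
  switch (level) {
    case "error":    return { getLastError: 1 };                // wait for error
    case "fsync":    return { getLastError: 1, fsync: true };   // wait for fsync
    case "journal":  return { getLastError: 1, j: true };       // wait for journal sync
    case "replica":  return { getLastError: 1, w: 2 };          // primary + one secondary
    case "majority": return { getLastError: 1, w: "majority" }; // majority of the set
    default: throw new Error("unknown durability level: " + level);
  }
}

console.log(JSON.stringify(getLastErrorCommand("journal")));
// {"getLastError":1,"j":true}
```

In the shell of that period such a document would be passed to db.runCommand after the write; later releases express the same choice as a write concern on the write itself.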
![Page 147: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/147.jpg)
Eventual Consistency
Using Replicas for Reads
Read preference
• primary (only)
• primaryPreferred
• secondary (only)
• secondaryPreferred
• nearest
![Page 148: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/148.jpg)
Immediate Consistency
[Diagram: Thread #1 inserts v1 and then updates to v2 on the Primary; each read from the Primary returns the version just written (✔ v1, then ✔ v2)]
![Page 149: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/149.jpg)
Eventual Consistency
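[Diagram: Thread #1 inserts v1 and then updates to v2 on the Primary. Thread #2 reads from the Secondary and, depending on how far replication has progressed, sees: ✖ v1 does not exist, ✔ reads v1, ✖ reads v1 (the Primary already holds v2), ✔ reads v2]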
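The stale reads in the diagram follow directly from replication being asynchronous. A minimal in-memory sketch, where `ReplicaPair` is a hypothetical class and `replicateOne` stands in for the secondary applying one pending operation:

```javascript
// One primary and one secondary with an explicit replication step.
class ReplicaPair {
  constructor() {
    this.primary = new Map();
    this.secondary = new Map();
    this.pending = []; // operations written on the primary, not yet applied on the secondary
  }

  // Writes always go to the primary and are queued for replication.
  write(key, value) {
    this.primary.set(key, value);
    this.pending.push([key, value]);
  }

  // Apply the oldest pending operation on the secondary.
  replicateOne() {
    const op = this.pending.shift();
    if (op) this.secondary.set(op[0], op[1]);
    return Boolean(op);
  }

  // Read from the chosen member: "primary" or "secondary".
  read(key, from) {
    const node = from === "secondary" ? this.secondary : this.primary;
    return node.get(key);
  }
}

const set = new ReplicaPair();
set.write("doc", "v1");
console.log(set.read("doc", "primary"));   // v1
console.log(set.read("doc", "secondary")); // undefined: v1 does not exist yet
set.replicateOne();
set.write("doc", "v2");
console.log(set.read("doc", "secondary")); // v1: stale, the primary already holds v2
set.replicateOne();
console.log(set.read("doc", "secondary")); // v2
```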
![Page 150: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/150.jpg)
Lab #10 - Replication
Primary, Secondary or both?
• Show the latest "votes" for a tweet and/or user
• Changing your profile picture
• Showing your thumbnail with a tweet
![Page 151: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/151.jpg)
Summary
• Schema design is different in MongoDB
• Basic data design principles stay the same
• Focus on how the application manipulates data
• Rapidly evolve schema to meet your requirements
• Consider sharding early
• Understand the impact of eventual consistency
![Page 152: MongoSV Schema Workshop](https://reader034.vdocument.in/reader034/viewer/2022052618/54c90fab4a79594f398b4590/html5/thumbnails/152.jpg)
@mongodb
conferences, appearances, and meetups
http://www.10gen.com/events
http://bit.ly/mongo> Facebook | Twitter | LinkedIn
http://linkd.in/joinmongo
download at mongodb.org