webinar: schema design
Post on 10-Dec-2014
Schema Design
Jay Runkel
Solutions Architect, MongoDB
#MongoDB
First a story:
Once upon a time there was a medical records company…
Agenda
• Schema Design Challenge
• Modeling Relationships in MongoDB
• An Example
• General Recommendations
Schema Design Challenges
• Flexibility – Easily adapt to new requirements
• Agility – Rapid application development
• Scalability – Support large data and query volumes
Schema Design Challenge
• How do we model data and relationships to ensure:
  – Flexibility
  – Agility
  – Scalability
Schema Design: MongoDB vs. Relational
MongoDB                  | Relational
Collections              | Tables
Documents                | Rows
Data Use                 | Data Storage
What questions do I have? | What answers do I have?
MongoDB versus Relational
Attribute    | MongoDB               | Relational
Storage      | N-dimensional         | Two-dimensional
Field Values | 0, 1, many, or embed  | Single value
Query        | Any field or level    | Any field
Schema       | Flexible              | Very structured
Updates      | In line               | In place
With relational, this is hard
Long development times
Inflexible
Doesn’t scale
Document model is much easier
Shorter development times
Flexible
Scalable
{
  "patient_id": "1177099",
  "first_name": "John",
  "last_name": "Doe",
  "middle_initial": "A",
  "dob": "2000-01-25",
  "gender": "Male",
  "blood_type": "B+",
  "address": "123 Elm St., Chicago, IL 59923",
  "height": "66",
  "weight": "110",
  "allergies": ["Nuts", "Penicillin", "Pet Dander"],
  "current_medications": [
    {"name": "Zoloft", "dosage": "2mg", "frequency": "daily", "route": "orally"}
  ],
  "complaint": [
    {"entered": "2000-11-03", "onset": "2000-11-03", "prob_desc": "", "icd": 250.00, "status": "Active"},
    {"entered": "2000-02-04", "onset": "2000-02-04", "prob_desc": "in spite of regular exercise, ...", "icd": 401.9, "status": "Active"}
  ],
  "diagnosis": [
    {"visit": "2005-07-22", "narrative": "Fractured femur", "icd": "9999", "priority": "Primary"},
    {"visit": "2005-07-22", "narrative": "Type II Diabetes", "icd": "250.00", "priority": "Secondary"}
  ]
}
Modeling Entities and Relationships
Let’s model something together
How about a business card?
Business Card
Address Book Entity-Relationship
[Entity-relationship diagram; the 1 and N cardinality labels on the connectors are not recoverable as text]
• Contacts: name, company, title
• Addresses: type, street, city, state, zip_code
• Phones: type, number
• Emails: type, address
• Thumbnails: mime_type, data
• Portraits: mime_type, data
• Groups: name
• Twitters: name, location, web, bio
Modeling One-to-One Relationships
Document Schema
Referencing
• Contact: name, company, title, phone
• Address: street, city, state, zip_code
Use two collections with a reference
Similar to relational
Embedding
• Contact: name, company, title, phone, address (street, city, state, zip_code)
Referencing
Contacts
{
  "_id": 2,
  "name": "Steven Jobs",
  "title": "VP, New Product Development",
  "company": "Apple Computer",
  "phone": "408-996-1010",
  "address_id": 1
}
Addresses
{
  "_id": 1,
  "street": "10260 Bandley Dr",
  "city": "Cupertino",
  "state": "CA",
  "zip_code": "95014",
  "country": "USA"
}
Embedding
Contacts
{
  "_id": 2,
  "name": "Steven Jobs",
  "title": "VP, New Product Development",
  "company": "Apple Computer",
  "address": {
    "street": "10260 Bandley Dr",
    "city": "Cupertino",
    "state": "CA",
    "zip_code": "95014",
    "country": "USA"
  },
  "phone": "408-996-1010"
}
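The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration: plain dicts stand in for the Contacts and Addresses collections, no MongoDB server or driver is involved, and the helper names city_by_reference and city_by_embedding are made up for the example.

```python
# Plain dicts stand in for collections, keyed by _id (an assumption for
# illustration; this is not how a MongoDB driver is called).
addresses = {
    1: {"_id": 1, "street": "10260 Bandley Dr", "city": "Cupertino",
        "state": "CA", "zip_code": "95014", "country": "USA"},
}

# Referencing: the contact stores only the address _id.
contacts_referencing = {
    2: {"_id": 2, "name": "Steven Jobs", "phone": "408-996-1010",
        "address_id": 1},
}

# Embedding: the address fields live inside the contact document.
embedded_address = {k: v for k, v in addresses[1].items() if k != "_id"}
contacts_embedding = {
    2: {"_id": 2, "name": "Steven Jobs", "phone": "408-996-1010",
        "address": embedded_address},
}


def city_by_reference(contact_id):
    """Two lookups: fetch the contact, then follow address_id."""
    contact = contacts_referencing[contact_id]
    address = addresses[contact["address_id"]]
    return contact["name"], address["city"]


def city_by_embedding(contact_id):
    """One lookup: the address travels with the contact."""
    contact = contacts_embedding[contact_id]
    return contact["name"], contact["address"]["city"]


print(city_by_reference(2))
print(city_by_embedding(2))
```

Both functions return the same answer; the referencing version needs a second fetch to get it.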
How are they different? Why?
Referencing:
• Contact: name, company, title, phone
• Address: street, city, state, zip_code
Embedding:
• Contact: name, company, title, phone, address (street, city, state, zip_code)
Schema Flexibility
{
  "name": "Steven Jobs",
  "title": "VP, New Product Development",
  "company": "Apple Computer",
  "address": {
    "street": "10260 Bandley Dr",
    "city": "Cupertino",
    "state": "CA",
    "zip_code": "95014"
  },
  "phone": "408-996-1010"
}
{
  "name": "Larry Page",
  "url": "http://google.com",
  "title": "CEO",
  "company": "Google!",
  "address": {
    "street": "555 Bryant, #106",
    "city": "Palo Alto",
    "state": "CA",
    "zip_code": "94301"
  },
  "phone": "650-330-0100",
  "fax": "650-330-1499"
}
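A minimal Python sketch of what this flexibility means for application code, assuming the two documents above sit in the same collection (modeled here as a plain list). The helper extra_fields is hypothetical and only serves the illustration.

```python
steve = {
    "name": "Steven Jobs",
    "title": "VP, New Product Development",
    "company": "Apple Computer",
    "address": {"street": "10260 Bandley Dr", "city": "Cupertino",
                "state": "CA", "zip_code": "95014"},
    "phone": "408-996-1010",
}
larry = {
    "name": "Larry Page",
    "url": "http://google.com",
    "title": "CEO",
    "company": "Google!",
    "address": {"street": "555 Bryant, #106", "city": "Palo Alto",
                "state": "CA", "zip_code": "94301"},
    "phone": "650-330-0100",
    "fax": "650-330-1499",
}

# One "collection" holding documents with different sets of fields.
contacts = [steve, larry]


def extra_fields(doc, baseline):
    """Top-level fields present in doc but absent from baseline."""
    return sorted(set(doc) - set(baseline))


# Readers of optional fields should tolerate their absence.
fax_numbers = [c.get("fax") for c in contacts]

print(extra_fields(larry, steve))
print(fax_numbers)
```

No schema migration was needed to add url and fax to one contact; the cost is that readers must handle the missing case.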
One to One: Schema Design Choices
• contact holds twitter_id, referencing twitter (1:1)
• twitter holds contact_id, referencing contact (1:1)
• Redundant to track relationship on both sides
• May save a fetch?
• Contact embeds twitter (1:1)
One to One: General Recommendations
• Embed
  – Full contact info all at once
  – Parent-child relationship “contains”
  – No additional data duplication
  – Can query or index on embedded field
    • e.g., “twitter.name”
• Exceptional cases…
  • Embedding results in large documents
[Diagram: Contact embeds twitter (1:1)]
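To make the “twitter.name” style of addressing concrete, here is a small Python sketch that walks a dotted path through nested dicts. The functions get_path and find are hypothetical helpers written for this illustration; they imitate the idea of dot notation and are not MongoDB's query engine.

```python
def get_path(doc, dotted):
    """Follow a dotted path such as 'twitter.name' through nested dicts.

    Returns None when any step along the path is missing.
    """
    current = doc
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def find(docs, dotted, value):
    """Return the documents whose field at the dotted path equals value."""
    return [d for d in docs if get_path(d, dotted) == value]


contacts = [
    {"name": "Gary J. Murakami, Ph.D.",
     "twitter": {"name": "GaryMurakami", "location": "New Providence, NJ"}},
    {"name": "Steven Jobs"},  # no embedded twitter sub-document
]

print(find(contacts, "twitter.name", "GaryMurakami"))
```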
Modeling One-to-Many Relationships
One to Many: Schema Design Choices
• contact holds phone_ids: [ ], referencing phone (1:N)
• phone holds contact_id, referencing contact (1:N)
• Redundant to track relationship on both sides
• Not possible in relational DBs
• Contact embeds phones (1:N)
One-to-many embedding vs. referencing
Embedding
{
  "name": "Larry Page",
  "url": "http://google.com/",
  "title": "CEO",
  "company": "Google!",
  "email": "larry@google.com",
  "address": [
    {"street": "555 Bryant, #106", "city": "Palo Alto", "state": "CA", "zip_code": "94301"}
  ],
  "phones": [
    {"type": "Office", "number": "650-618-1499"},
    {"type": "fax", "number": "650-330-0100"}
  ]
}
Referencing
{
  "name": "Larry Page",
  "url": "http://google.com/",
  "title": "CEO",
  "company": "Google!",
  "email": "larry@google.com",
  "address": ["addr99"],
  "phones": ["ph23", "ph49"]
}
{"_id": "addr99", "street": "555 Bryant, #106", "city": "Palo Alto", "state": "CA", "zip_code": "94301"}
{"_id": "ph23", "type": "Office", "number": "650-618-1499"}
{"_id": "ph49", "type": "fax", "number": "650-330-0100"}
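With the referencing layout, the application has to resolve each id itself. A rough Python sketch of that step follows, with dicts keyed by _id standing in for the address and phone collections; resolve and without_id are hypothetical names introduced for the example.

```python
addresses = {
    "addr99": {"_id": "addr99", "street": "555 Bryant, #106",
               "city": "Palo Alto", "state": "CA", "zip_code": "94301"},
}
phones = {
    "ph23": {"_id": "ph23", "type": "Office", "number": "650-618-1499"},
    "ph49": {"_id": "ph49", "type": "fax", "number": "650-330-0100"},
}
contact = {
    "name": "Larry Page",
    "address": ["addr99"],
    "phones": ["ph23", "ph49"],
}


def without_id(doc):
    """Copy of doc without its _id, matching the embedded form."""
    return {k: v for k, v in doc.items() if k != "_id"}


def resolve(contact):
    """Build the embedded-style document by following every reference.

    Each id costs one extra lookup, which is the price of referencing.
    """
    resolved = dict(contact)
    resolved["address"] = [without_id(addresses[i]) for i in contact["address"]]
    resolved["phones"] = [without_id(phones[i]) for i in contact["phones"]]
    return resolved


print(resolve(contact))
```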
One to Many: General Recommendation
• Embed when possible
  – Full contact info all at once
  – Parent-children relationship “contains”
  – No additional data duplication
  – Can query or index on any field
    • e.g., { “phones.type”: “mobile” }
• Exceptional cases…
  • Scaling: maximum document size is 16MB
[Diagram: Contact embeds phones (1:N)]
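A query such as { “phones.type”: “mobile” } selects contacts that have at least one matching element in the embedded phones array. The Python sketch below imitates that behaviour on plain dicts; has_phone_type is a hypothetical helper, and the second contact is invented for the example.

```python
def has_phone_type(contact, phone_type):
    """True if any embedded phone sub-document has the given type."""
    return any(p.get("type") == phone_type for p in contact.get("phones", []))


contacts = [
    {"name": "Larry Page",
     "phones": [{"type": "Office", "number": "650-618-1499"},
                {"type": "fax", "number": "650-330-0100"}]},
    # Hypothetical contact, added only so the "mobile" query has a match.
    {"name": "Example Person",
     "phones": [{"type": "mobile", "number": "555-0100"}]},
    {"name": "No Phones"},  # the phones array is optional
]

mobile_owners = [c["name"] for c in contacts if has_phone_type(c, "mobile")]
print(mobile_owners)
```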
Modeling Many-to-Many Relationships
Many to Many: Traditional Relational Association
• Contacts: name, company, title, phone
• Groups: name
• Join table GroupContacts: group_id, contact_id
Use arrays instead
Many to Many: Schema Design Choices
• group holds contact_ids: [ ], referencing contact (N:N)
• contact holds group_ids: [ ], referencing group (N:N)
• Redundant to track relationship on both sides
  • Both references must be updated for consistency
• group embeds contacts (N:N)
• contact embeds groups (N:N)
• Redundant to track relationship on both sides
  • Duplicated data must be updated for consistency
Many to Many: General Recommendation
• Use case determines whether to reference or embed:
  1. Simple address book
     • Contact references groups
  2. Corporate email groups
     • Group embeds contacts for performance
• Exceptional cases
  – Scaling: maximum document size is 16MB
  – Scaling may affect performance and working set
[Diagram: contact holds group_ids: [ ], referencing group (N:N)]
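The "contact references groups" choice can be sketched as follows in Python, with dicts standing in for the two collections. The group names, the ids, and the helpers members and groups_of are assumptions made for the illustration.

```python
groups = {
    "g1": {"_id": "g1", "name": "Friends"},
    "g2": {"_id": "g2", "name": "Work"},
}
contacts = {
    "c1": {"_id": "c1", "name": "Steven Jobs", "group_ids": ["g1", "g2"]},
    "c2": {"_id": "c2", "name": "Larry Page", "group_ids": ["g2"]},
}


def groups_of(contact_id):
    """Follow the array of references stored on the contact."""
    return [groups[g]["name"] for g in contacts[contact_id]["group_ids"]]


def members(group_id):
    """Reverse direction: scan contacts for the group id.

    The relationship is stored on one side only, so nothing has to be
    kept in sync, at the cost of a scan (or an index) in this direction.
    """
    return sorted(c["name"] for c in contacts.values()
                  if group_id in c["group_ids"])


print(groups_of("c1"))
print(members("g2"))
```

Storing the array on one side only avoids the double update that the both-sides variants require.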
Address Book Entity-Relationship
[Entity-relationship diagram; the 1 and N cardinality labels on the connectors are not recoverable as text]
• Contacts: name, company, title
• Addresses: type, street, city, state, zip_code
• Phones: type, number
• Emails: type, address
• Thumbnails: mime_type, data
• Portraits: mime_type, data
• Groups: name
• Twitters: name, location, web, bio
Document model - holistic and efficient representation
[Diagram; the 1 and N cardinality labels on the connectors are not recoverable as text]
• Contacts: name, company, title
  • addresses: type, street, city, state, zip_code
  • phones: type, number
  • emails: type, address
  • thumbnail: mime_type, data
  • twitter: name, location, web, bio
• Portraits: mime_type, data
• Groups: name
Contact document example
{
  "name": "Gary J. Murakami, Ph.D.",
  "company": "MongoDB, Inc",
  "title": "Lead Engineer and Ruby Evangelist",
  "twitter": {
    "name": "GaryMurakami",
    "location": "New Providence, NJ",
    "web": "http://www.nobell.org"
  },
  "portrait_id": 1,
  "addresses": [
    {"type": "work", "street": "229 W 43rd St.", "city": "New York", "zip_code": "10036"}
  ],
  "phones": [
    {"type": "work", "number": "1-866-237-8815 x8015"}
  ],
  "emails": [
    {"type": "work", "address": "gary.murakami@mongodb.com"},
    {"type": "home", "address": "gjm@nobell.org"}
  ]
}
General Recommendations
Legacy Migration
1. Copy existing schema & some data to MongoDB
2. Iterative schema design development
   – Measure performance, find bottlenecks, and embed
     1. one to one associations first
     2. one to many associations next
     3. many to many associations
   – Eliminate join table using array of references or embedded documents
   – Measure and analyze, review concerns, scaling
New Software Application
• Embed by default
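The legacy-migration step of eliminating a join table with an array of references might look roughly like this in Python. The rows, the ids, and the function fold_join_table are hypothetical; a real migration would read from the relational database and write to MongoDB.

```python
from collections import defaultdict

# Rows of a relational join table: (group_id, contact_id).
group_contacts = [(10, 1), (10, 2), (20, 2)]

# Contacts as first copied over, still without any group information.
contacts = [
    {"_id": 1, "name": "Steven Jobs"},
    {"_id": 2, "name": "Larry Page"},
]


def fold_join_table(contacts, rows):
    """Return new contact documents carrying a group_ids array.

    The join table disappears; each contact holds its own references.
    """
    by_contact = defaultdict(list)
    for group_id, contact_id in rows:
        by_contact[contact_id].append(group_id)
    return [{**c, "group_ids": by_contact.get(c["_id"], [])} for c in contacts]


migrated = fold_join_table(contacts, group_contacts)
print(migrated)
```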
Embedding over Referencing
• Embedding is a bit like pre-joined data
  – BSON (Binary JSON) document ops are easy for the server
• Embed (90/10 following rule of thumb)
  – When the “one” or “many” objects are viewed in the context of their parent
  – For performance
  – For atomicity
• Reference
  – When you need more scaling
  – For easy consistency with “many to many” associations without duplicated data
It’s All About Your Application
• Programs + Databases = (Big) Data Applications
• Your schema is the impedance matcher
  – Design choices: normalize/denormalize, reference/embed
  – Melds programming with MongoDB for best of both
  – Flexible for development and change
• Programs × MongoDB = Great Big Data Applications
Questions?
Thank You
Jay Runkel
Solutions Architect, MongoDB
jay.runkel@mongodb.com
@jayrunkel
#MongoDB