tranSMART Community Meeting, 5-7 Nov 13, Session 2: MongoDB: What, Why and When
Posted on 19-Jan-2015
What, When and Why of MongoDB
Massimo Brignoli
Solutions Architect, MongoDB Inc.
@mongodb
Agenda
About MongoDB Inc.
Data and Query Model
Scalability
Availability
Deployment Architectures
Schema Design Challenges
Use Cases
About MongoDB
MongoDB Inc. Overview
• 300+ employees, 600+ customers
• Offices in New York, Palo Alto, Washington DC, London, Dublin, Barcelona and Sydney
• Over $231 million in funding
Global Community
• 6,000,000+ MongoDB downloads
• 100,000+ online education registrants
• 20,000+ MongoDB User Group members
• 20,000+ MongoDB Days attendees
• 15,000+ MongoDB Management Service (MMS) users
MongoDB Inc. Products and Services
• Training: online and in-person, for developers and administrators
• MongoDB Management Service (MMS): cloud-based service for monitoring, alerts, backup and restore
• Subscriptions: MongoDB Enterprise, on-prem monitoring, professional support and commercial license
• Consulting: expert resources for all phases of MongoDB implementations
Data & Query Model
Operational Database Landscape
Document Data Model
Relational vs. MongoDB
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}
Document Model Benefits
• Agility and flexibility
– Data models can evolve easily
– Companies can adapt to changes quickly
• Intuitive, natural data representation
– Developers are more productive
– Many types of applications are a good fit
• Reduces the need for joins and disk seeks
– Programming is simpler
– Performance can be delivered at scale
Developers are more productive
MongoDB is full featured
Rich queries
• Find Paul's cars
• Find everybody in London with a car built between 1970 and 1980
Geospatial
• Find all of the car owners within 5 km of Trafalgar Sq.
Text search
• Find all the cars described as having leather seats
Aggregation
• Calculate the average value of Paul's car collection
Map Reduce
• What is the ownership pattern of colors by geography over time? (Is purple trending up in China?)
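As a rough illustration of the first and fourth questions: in the mongo shell they could be written as `db.people.find({city: "London", cars: {$elemMatch: {year: {$gte: 1970, $lte: 1980}}}})` and `db.people.aggregate([{$match: {first_name: "Paul"}}, {$unwind: "$cars"}, {$group: {_id: null, avgValue: {$avg: "$cars.value"}}}])`, assuming a collection named `people`. The sketch below needs no server: it only mimics those semantics in plain JavaScript on the Paul Miller document, and the second person is a hypothetical addition so the filter has something to reject.

```javascript
// In-memory stand-in for a hypothetical `people` collection (no MongoDB involved).
const people = [
  {
    first_name: "Paul", surname: "Miller", city: "London",
    location: [45.123, 47.232],
    cars: [
      { model: "Bentley", year: 1973, value: 100000 },
      { model: "Rolls Royce", year: 1965, value: 330000 },
    ],
  },
  // Hypothetical second owner whose only car is outside the 1970-1980 window.
  {
    first_name: "Anna", surname: "Rossi", city: "London",
    cars: [{ model: "Fiat 500", year: 1985, value: 15000 }],
  },
];

// Mimics {city: "London", cars: {$elemMatch: {year: {$gte: 1970, $lte: 1980}}}}:
// a person matches only if one single car satisfies both bounds.
function londonOwnersOfSeventiesCars(docs) {
  return docs.filter(
    (p) => p.city === "London" &&
      p.cars.some((c) => c.year >= 1970 && c.year <= 1980)
  );
}

// Mimics $match on first_name, $unwind "$cars", then $group with $avg "$cars.value".
function averageCarValue(docs, firstName) {
  const values = docs
    .filter((p) => p.first_name === firstName)
    .flatMap((p) => p.cars.map((c) => c.value));
  if (values.length === 0) return null; // the pipeline would return no result document
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

console.log(londonOwnersOfSeventiesCars(people).map((p) => p.first_name).join(", ")); // Paul
console.log(averageCarValue(people, "Paul")); // 215000
```

`$elemMatch` matters here: without it, `{"cars.year": {$gte: 1970, $lte: 1980}}` can match a person whose cars satisfy the two bounds separately rather than one car satisfying both.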
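For the geospatial question, MongoDB can use a `2dsphere` index with a query such as `db.people.find({location: {$geoWithin: {$centerSphere: [[-0.1281, 51.5080], 5 / 6378.1]}}})`, where points are `[longitude, latitude]` and the radius is given in radians (kilometres divided by the Earth radius). The sketch below reproduces the distance test with the haversine formula; the Trafalgar Square coordinates are approximate and the owners and their locations are hypothetical (the `location` array in the sample document is illustrative and not near London).

```javascript
const EARTH_RADIUS_KM = 6378.1; // radius MongoDB's docs use for km-to-radian conversion

function toRadians(deg) {
  return (deg * Math.PI) / 180;
}

// Great-circle distance in km between two [longitude, latitude] points (haversine).
function haversineKm([lon1, lat1], [lon2, lat2]) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a = Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Approximate coordinates of Trafalgar Square (assumption, for illustration).
const TRAFALGAR_SQ = [-0.1281, 51.5080];

// Hypothetical car owners with [longitude, latitude] locations.
const owners = [
  { name: "Alice", location: [-0.1419, 51.5014] }, // about 1 km away
  { name: "Bob", location: [-0.0877, 51.5055] },   // about 3 km away
  { name: "Carol", location: [-0.3089, 51.4816] }, // roughly 13 km away
];

// The radius $centerSphere would expect for 5 km.
const radiusRadians = 5 / EARTH_RADIUS_KM;

function ownersWithinKm(docs, center, km) {
  return docs.filter((d) => haversineKm(center, d.location) <= km);
}

console.log(radiusRadians.toFixed(6));
console.log(ownersWithinKm(owners, TRAFALGAR_SQ, 5).map((o) => o.name).join(", ")); // Alice, Bob
```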
Shell and Drivers
Shell: command-line shell for interacting directly with the database
Drivers: drivers for most popular programming languages and frameworks
> db.collection.insert({company: "10gen", product: "MongoDB"})
> db.collection.findOne()
{
    "_id" : ObjectId("5106c1c2fc629bfe52792e86"),
    "company" : "10gen",
    "product" : "MongoDB"
}
Java
Python
Perl
Ruby
Haskell
JavaScript
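The `_id` in the shell output above is an ObjectId, whose first four bytes are the creation time in seconds since the Unix epoch. The shell exposes this as `ObjectId(...).getTimestamp()`; the sketch below decodes the same value from the hex string in plain JavaScript so it runs without a server. The function name `objectIdTimestamp` is hypothetical.

```javascript
// An ObjectId is 12 bytes, printed as 24 hex characters. The first 4 bytes are a
// big-endian count of seconds since the Unix epoch, i.e. the creation time.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdTimestamp("5106c1c2fc629bfe52792e86");
console.log(created.toISOString()); // 2013-01-28T18:21:54.000Z
```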
Scalability
Automatic Sharding
• Three types of sharding: hash-based, range-based, tag-aware
• Increase or decrease capacity as you go
• Automatic balancing
Query Routing
• Multiple query optimization models
• Each sharding option appropriate for different apps
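The three sharding types differ mainly in how a shard key value is mapped to a shard, which is what query routing relies on. The sketch below illustrates only that mapping; the shard names, chunk boundaries, region tags and the FNV-1a hash are assumptions for illustration (MongoDB's hashed sharding uses its own hash function, and its balancer moves chunks automatically).

```javascript
// Hypothetical three-shard cluster; names and boundaries are illustrative only.
const SHARDS = ["shardA", "shardB", "shardC"];

// Range-based: each chunk owns [min, max) of a numeric shard key, so
// neighbouring keys land on the same shard and range queries can be targeted.
const RANGE_CHUNKS = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 2000, shard: "shardB" },
  { min: 2000, max: Infinity, shard: "shardC" },
];

function rangeShard(key) {
  const chunk = RANGE_CHUNKS.find((c) => key >= c.min && key < c.max);
  if (chunk === undefined) throw new Error(`no chunk for key ${key}`);
  return chunk.shard;
}

// Hash-based: hash the key, then split the 32-bit hash space into equal ranges.
// FNV-1a is a stand-in here; it is not the hash function MongoDB uses.
function fnv1a32(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // unsigned 32-bit result
}

function hashedShard(key) {
  const h = fnv1a32(String(key));
  return SHARDS[Math.floor((h / 2 ** 32) * SHARDS.length)];
}

// Tag-aware: documents tagged with a region are pinned to that region's shard.
const REGION_TAGS = { NYC: "shardA", LON: "shardB", SYD: "shardC" };

function taggedShard(region) {
  const shard = REGION_TAGS[region];
  if (shard === undefined) throw new Error(`no tag range for region ${region}`);
  return shard;
}

console.log(rangeShard(1500), hashedShard(1500), taggedShard("LON"));
```

Sequential keys such as 1, 2, 3 all fall into `shardA` under range sharding, which keeps range scans on one shard but concentrates new inserts there; hashing scatters them, at the cost of sending range queries to every shard.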
Availability
• High availability: ensure application availability during many types of failures
• Disaster recovery: meet the recovery time objective (RTO) and recovery point objective (RPO) goals for business continuity
• Maintenance: perform upgrades and other maintenance operations with no application downtime
Availability Considerations
Replica Sets
• Replica Set – two or more copies
• “Self-healing” shard
• Addresses many concerns:
- High Availability
- Disaster Recovery
- Maintenance
Replica Set Benefits
Business need | Replica set benefit
High availability | Automated failover
Disaster recovery | Hot backups offsite
Maintenance | Rolling upgrades
Low latency | Locate data near users
Workload isolation | Read from non-primary replicas
Data privacy | Restrict data to physical location
Data consistency | Tunable consistency
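The low-latency and workload-isolation rows depend on where reads are sent, which MongoDB drivers control through read preference modes such as `primary`, `secondaryPreferred` and `nearest`. The sketch below is a deliberately simplified, in-memory stand-in: real drivers choose randomly among eligible members within a latency window instead of always taking the single fastest one, and the host names and round-trip times are hypothetical.

```javascript
// Hypothetical replica set members with measured round-trip times in ms.
const members = [
  { host: "ny1", state: "PRIMARY", rttMs: 70 },
  { host: "lon1", state: "SECONDARY", rttMs: 4 },
  { host: "syd1", state: "SECONDARY", rttMs: 260 },
];

// Lowest-latency member of a list, or null if the list is empty.
function fastest(list) {
  return list.reduce((best, m) => (best === null || m.rttMs < best.rttMs ? m : best), null);
}

// Simplified member selection for three read preference modes.
function selectMember(mode, set) {
  const primary = set.find((m) => m.state === "PRIMARY") ?? null;
  const secondaries = set.filter((m) => m.state === "SECONDARY");
  switch (mode) {
    case "primary":
      return primary; // strongest consistency: always read the primary
    case "secondaryPreferred":
      return fastest(secondaries) ?? primary; // offload reads, fall back to primary
    case "nearest":
      return fastest(set); // lowest latency, primary or secondary
    default:
      throw new Error(`unsupported mode: ${mode}`);
  }
}

console.log(selectMember("nearest", members).host); // lon1
```

Reads routed to secondaries may return slightly stale data. On the write side, the tunable part is the write concern, for example `{w: "majority"}` to wait until a majority of members have applied the write.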
Deployment Architectures
Single Data Center
• Automated failover
• Tolerates server failures
• Tolerates rack failures
• Number of replicas defines failure tolerance
Diagram: three shards (A, B and C), each a replica set with one primary and two secondaries, all in one data center.
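The last bullet follows from majority voting: a replica set can elect a primary, or keep one, only while a strict majority of its voting members can reach each other. The arithmetic below is a small sketch of that rule, assuming every member has one vote; the function names are illustrative.

```javascript
// Votes needed to elect or keep a primary: a strict majority.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many members can fail while a majority is still reachable.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [2, 3, 4, 5, 7]) {
  console.log(`${n} members: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`);
}
```

Four members tolerate no more failures than three, which is why odd-sized sets are the usual recommendation, and a two-member set cannot fail over at all without a third voter such as an arbiter.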
Active/Standby Data Center
• Tolerates server and rack failure
• Standby data center
Diagram: in Data Center - West, each shard (A, B and C) has its primary and one secondary; in Data Center - East, each shard has one more secondary.
Active/Active Data Center
• Tolerates server, rack, data center failures, network partitions
Diagram: in Data Center - West, each shard (A, B and C) has its primary and one secondary; in Data Center - East, each shard has two secondaries; in Data Center - Central, each shard has an arbiter.
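Applying the same majority rule per data center explains the difference between the active/standby and active/active layouts. The member counts below are read off the two diagrams for a single shard (two voters in West and one in East for active/standby; two in West, two in East and an arbiter in Central for active/active), assuming one vote per member, arbiter included.

```javascript
// Voting members per data center for one shard's replica set.
const activeStandby = { West: 2, East: 1 };
const activeActive = { West: 2, East: 2, Central: 1 }; // Central holds only the arbiter

// True if the members outside the failed data center still form a strict majority.
function survivesLossOf(layout, failedDc) {
  const total = Object.values(layout).reduce((a, b) => a + b, 0);
  const remaining = total - (layout[failedDc] ?? 0);
  return remaining >= Math.floor(total / 2) + 1;
}

// For every data center, can a primary still be elected if that one goes down?
function dcFailureReport(layout) {
  return Object.fromEntries(
    Object.keys(layout).map((dc) => [dc, survivesLossOf(layout, dc)])
  );
}

console.log(dcFailureReport(activeStandby)); // West: false, East: true
console.log(dcFailureReport(activeActive));  // every data center: true
```

Losing West in the active/standby layout leaves one of three voters, so the East secondary cannot be elected automatically; that is what makes East a standby rather than an active site.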
Global Data Distribution
Diagram: one primary replicating in real time to seven secondaries distributed around the world.
Read Global/Write Local
Diagram: one primary each in NYC, LON and SYD, with two secondaries in each of those locations.
Schema Design Challenges
First a story:
Once upon a time there was a medical records company…
Schema Design Challenge
• Flexibility: easily adapt to new requirements
• Agility: rapid application development
• Scalability: support large data and query volumes
Schema Design: MongoDB vs. Relational
MongoDB | Relational
Collections | Tables
Documents | Rows
Data use | Data storage
What questions do I have? | What answers do I have?
MongoDB versus Relational
Attribute | MongoDB | Relational
Storage | N-dimensional | Two-dimensional
Field values | 0, 1, many, or embedded | Single value
Query | Any field or level | Any field
Schema | Flexible | Very structured
Updates | In line | In place
With relational, this is hard
Long development times
Inflexible
Doesn’t scale
Document model is much easier
Shorter development times
Flexible
Scalable
{
  "patient_id": "1177099",
  "first_name": "John",
  "last_name": "Doe",
  "middle_initial": "A",
  "dob": "2000-01-25",
  "gender": "Male",
  "blood_type": "B+",
  "address": "123 Elm St., Chicago, IL 59923",
  "height": "66",
  "weight": "110",
  "allergies": ["Nuts", "Penicillin", "Pet Dander"],
  "current_medications": [
    {"name": "Zoloft", "dosage": "2mg", "frequency": "daily", "route": "orally"}
  ],
  "complaint": [
    {"entered": "2000-11-03", "onset": "2000-11-03", "prob_desc": "", "icd": "250.00", "status": "Active"},
    {"entered": "2000-02-04", "onset": "2000-02-04", "prob_desc": "in spite of regular exercise, ...", "icd": "401.9", "status": "Active"}
  ],
  "diagnosis": [
    {"visit": "2005-07-22", "narrative": "Fractured femur", "icd": "9999", "priority": "Primary"},
    {"visit": "2005-07-22", "narrative": "Type II Diabetes", "icd": "250.00", "priority": "Secondary"}
  ]
}
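To make the N-dimensional vs. two-dimensional contrast concrete, the sketch below takes a trimmed copy of the patient record, lists the rows a normalized relational design would need for it, and mimics a query on a nested field, `db.patients.find({"diagnosis.icd": "250.00"})`, assuming a `patients` collection. The table names such as `patient_allergies` are hypothetical, and the code runs in memory without MongoDB.

```javascript
// Trimmed copy of the patient document (only the fields used here).
const patient = {
  patient_id: "1177099",
  first_name: "John",
  last_name: "Doe",
  allergies: ["Nuts", "Penicillin", "Pet Dander"],
  current_medications: [
    { name: "Zoloft", dosage: "2mg", frequency: "daily", route: "orally" },
  ],
  diagnosis: [
    { visit: "2005-07-22", narrative: "Fractured femur", icd: "9999", priority: "Primary" },
    { visit: "2005-07-22", narrative: "Type II Diabetes", icd: "250.00", priority: "Secondary" },
  ],
};

// Rows a normalized relational schema would need for the same data:
// every array becomes its own table keyed back to the patient.
function toRelationalRows(doc) {
  const id = doc.patient_id;
  return {
    patients: [{ patient_id: id, first_name: doc.first_name, last_name: doc.last_name }],
    patient_allergies: doc.allergies.map((a) => ({ patient_id: id, allergy: a })),
    patient_medications: doc.current_medications.map((m) => ({ patient_id: id, ...m })),
    patient_diagnoses: doc.diagnosis.map((d) => ({ patient_id: id, ...d })),
  };
}

// Mimics db.patients.find({"diagnosis.icd": icd}): a dotted path into an
// array of subdocuments matches when any element has that value.
function findByDiagnosisIcd(docs, icd) {
  return docs.filter((d) => (d.diagnosis ?? []).some((dx) => dx.icd === icd));
}

const rows = toRelationalRows(patient);
console.log(Object.keys(rows).length, "tables,", rows.patient_allergies.length, "allergy rows");
console.log(findByDiagnosisIcd([patient], "250.00").length); // 1
```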
Let’s model something together
How about a business card?
Business Card
Address Book Entity-Relationship (diagram): Contacts (name, company, title), Addresses (type, street, city, state, zip_code), Phones (type, number), Emails (type, address), Thumbnails (mime_type, data), Portraits (mime_type, data), Groups (name) and Twitters (name, location, web, bio), linked by one-to-one and one-to-many relationships.
Referencing (diagram): a Contact (name, company, title, phone) and an Address (street, city, state, zip_code) stored in two collections with a reference, similar to relational.
Embedding (diagram): a single Contact (name, company, title, phone) with its address (street, city, state, zip_code) embedded inside it.
Document Schema
Referencing
Contacts
{
  "_id": 2,
  "name": "Steven Jobs",
  "title": "VP, New Product Development",
  "company": "Apple Computer",
  "phone": "408-996-1010",
  "address_id": 1
}
Addresses
{
  "_id": 1,
  "street": "10260 Bandley Dr",
  "city": "Cupertino",
  "state": "CA",
  "zip_code": "95014",
  "country": "USA"
}
Embedding
Contacts
{
  "_id": 2,
  "name": "Steven Jobs",
  "title": "VP, New Product Development",
  "company": "Apple Computer",
  "address": {
    "street": "10260 Bandley Dr",
    "city": "Cupertino",
    "state": "CA",
    "zip_code": "95014",
    "country": "USA"
  },
  "phone": "408-996-1010"
}
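The practical difference between the two Steven Jobs variants is how many reads it takes to show a complete contact. The in-memory sketch below counts simulated round trips; the `Map`-based collections and the `findById` helper are hypothetical stand-ins for queries against two collections, not a MongoDB API.

```javascript
// Referencing: two in-memory "collections" keyed by _id.
const contacts = new Map([
  [2, { _id: 2, name: "Steven Jobs", phone: "408-996-1010", address_id: 1 }],
]);
const addresses = new Map([
  [1, { _id: 1, street: "10260 Bandley Dr", city: "Cupertino", state: "CA", zip_code: "95014", country: "USA" }],
]);

// Embedding: one collection whose documents already contain the address.
const embeddedContacts = new Map([
  [2, {
    _id: 2, name: "Steven Jobs", phone: "408-996-1010",
    address: { street: "10260 Bandley Dr", city: "Cupertino", state: "CA", zip_code: "95014", country: "USA" },
  }],
]);

let lookups = 0; // counts simulated round trips to the database

function findById(collection, id) {
  lookups += 1;
  return collection.get(id) ?? null;
}

// Referencing: the application resolves address_id with a second read.
function contactWithAddressReferenced(id) {
  const c = findById(contacts, id);
  if (c === null) return null;
  const { address_id, ...rest } = c;
  return { ...rest, address: findById(addresses, address_id) };
}

// Embedding: a single read returns the complete contact.
function contactWithAddressEmbedded(id) {
  return findById(embeddedContacts, id);
}

contactWithAddressReferenced(2);
console.log("referenced reads:", lookups); // 2
```

Embedding suits an address that is always read together with its contact; referencing fits better when the address is shared by many contacts or updated independently.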
How are they different? Why?
Schema Flexibility
{
  "name": "Steven Jobs",
  "title": "VP, New Product Development",
  "company": "Apple Computer",
  "address": {
    "street": "10260 Bandley Dr",
    "city": "Cupertino",
    "state": "CA",
    "zip_code": "95014"
  },
  "phone": "408-996-1010"
}
{
  "name": "Larry Page",
  "url": "http://google.com",
  "title": "CEO",
  "company": "Google!",
  "address": {
    "street": "555 Bryant, #106",
    "city": "Palo Alto",
    "state": "CA",
    "zip_code": "94301"
  },
  "phone": "650-330-0100",
  "fax": "650-330-1499"
}
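Because no fixed schema is enforced, both business cards can live in one collection, and queries simply test for optional fields. In MongoDB that could look like `db.cards.find({fax: {$exists: true}})`, where the `cards` collection name is an assumption; the sketch below mimics it in plain JavaScript on trimmed copies of the two documents.

```javascript
// Trimmed copies of the two cards; only the second has "url" and "fax".
const cards = [
  { name: "Steven Jobs", title: "VP, New Product Development", company: "Apple Computer", phone: "408-996-1010" },
  { name: "Larry Page", url: "http://google.com", title: "CEO", company: "Google!", phone: "650-330-0100", fax: "650-330-1499" },
];

// Mimics {field: {$exists: true}}: keep documents that contain the field at all.
function withField(docs, field) {
  return docs.filter((d) => Object.prototype.hasOwnProperty.call(d, field));
}

// Application code reads an optional field with a fallback instead of a schema migration.
function faxOrNone(doc) {
  return doc.fax ?? "(no fax)";
}

console.log(withField(cards, "fax").map((c) => c.name).join(", ")); // Larry Page
console.log(cards.map(faxOrNone).join(" | ")); // (no fax) | 650-330-1499
```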
One-to-many embedding vs. referencing
Embedded:
{
  "name": "Larry Page",
  "url": "http://google.com/",
  "title": "CEO",
  "company": "Google!",
  "email": "larry@google.com",
  "address": [{"street": "555 Bryant, #106", "city": "Palo Alto", "state": "CA", "zip_code": "94301"}],
  "phones": [{"type": "Office", "number": "650-618-1499"}, {"type": "fax", "number": "650-330-0100"}]
}
Referenced:
{
  "name": "Larry Page",
  "url": "http://google.com/",
  "title": "CEO",
  "company": "Google!",
  "email": "larry@google.com",
  "address": ["addr99"],
  "phones": ["ph23", "ph49"]
}
{"_id": "addr99", "street": "555 Bryant, #106", "city": "Palo Alto", "state": "CA", "zip_code": "94301"}
{"_id": "ph23", "type": "Office", "number": "650-618-1499"}
{"_id": "ph49", "type": "fax", "number": "650-330-0100"}
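Turning the referenced form into the embedded one is essentially a batched lookup followed by denormalization. In the shell each batch could be a query such as `db.phones.find({_id: {$in: ["ph23", "ph49"]}})`; the collection names and the `embedReferences` helper below are hypothetical, and since `$in` does not guarantee result order, the sketch restores the order of the id array itself.

```javascript
// Referenced form of the contact, plus the separate address and phone documents.
const person = {
  name: "Larry Page", email: "larry@google.com",
  address: ["addr99"], phones: ["ph23", "ph49"],
};
const addressDocs = [
  { _id: "addr99", street: "555 Bryant, #106", city: "Palo Alto", state: "CA", zip_code: "94301" },
];
const phoneDocs = [
  { _id: "ph23", type: "Office", number: "650-618-1499" },
  { _id: "ph49", type: "fax", number: "650-330-0100" },
];

// Mimics one {_id: {$in: ids}} query per collection, then puts results back in
// id-array order and drops _id, as in the embedded form.
function resolveIds(ids, docs) {
  const byId = new Map(docs.filter((d) => ids.includes(d._id)).map((d) => [d._id, d]));
  return ids
    .map((id) => byId.get(id))
    .filter((d) => d !== undefined) // skip dangling references
    .map(({ _id, ...rest }) => rest);
}

function embedReferences(doc, addrs, phones) {
  return {
    ...doc,
    address: resolveIds(doc.address, addrs),
    phones: resolveIds(doc.phones, phones),
  };
}

const embedded = embedReferences(person, addressDocs, phoneDocs);
console.log(embedded.phones.map((p) => p.type).join(", ")); // Office, fax
```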
Many to Many
Traditional relational association: a join table
Contacts (name, company, title, phone)
Groups (name)
GroupContacts (group_id, contact_id)
Use arrays instead
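Using arrays instead means each contact stores the ids of its groups, replacing the GroupContacts join table. An equality match on an array field matches when any element equals the value, so `db.contacts.find({group_ids: 2})` finds every member of group 2, and `$addToSet`, for example `db.contacts.update({_id: 12}, {$addToSet: {group_ids: 2}})`, adds a membership without creating duplicates. The field name `group_ids` and the sample data are hypothetical; the code below mimics the reads in memory.

```javascript
// Hypothetical groups and contacts; each contact lists its group ids.
const groups = [
  { _id: 1, name: "Friends" },
  { _id: 2, name: "Work" },
];
const contactDocs = [
  { _id: 10, name: "Alice", group_ids: [2] },
  { _id: 11, name: "Bob", group_ids: [1, 2] },
  { _id: 12, name: "Carol", group_ids: [1] },
];

// Mimics db.contacts.find({group_ids: groupId}).
function contactsInGroup(contacts, groupId) {
  return contacts.filter((c) => c.group_ids.includes(groupId));
}

// Reverse direction: the names of the groups a contact belongs to.
function groupNamesOf(contact, allGroups) {
  return allGroups.filter((g) => contact.group_ids.includes(g._id)).map((g) => g.name);
}

// Mimics $addToSet: append the id only if it is not already present.
function addToGroup(contact, groupId) {
  if (!contact.group_ids.includes(groupId)) contact.group_ids.push(groupId);
  return contact;
}

console.log(contactsInGroup(contactDocs, 2).map((c) => c.name).join(", ")); // Alice, Bob
```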
Document model (diagram): each Contacts document (name, company, title) embeds addresses (type, street, city, state, zip_code), phones (type, number), emails (type, address), a thumbnail (mime_type, data) and twitter (name, location, web, bio); Portraits (mime_type, data) and Groups (name) remain separate collections.
Document model - holistic and efficient representation
Contact document example
{
  "name" : "Gary J. Murakami, Ph.D.",
  "company" : "MongoDB, Inc",
  "title" : "Lead Engineer and Ruby Evangelist",
  "twitter" : {
    "name" : "GaryMurakami",
    "location" : "New Providence, NJ",
    "web" : "http://www.nobell.org"
  },
  "portrait_id" : 1,
  "addresses" : [
    { "type" : "work", "street" : "229 W 43rd St.", "city" : "New York", "zip_code" : "10036" }
  ],
  "phones" : [
    { "type" : "work", "number" : "1-866-237-8815 x8015" }
  ],
  "emails" : [
    { "type" : "work", "address" : "gary.murakami@mongodb.com" },
    { "type" : "home", "address" : "gjm@nobell.org" }
  ]
}
Health Care Use Cases
360-Degree Patient View
• Healthcare provider networks have massive amounts of patient data
– Both structured and unstructured
– Basic patient information
– Lab results
– MRI images
• Centralization of data is needed
– Aggregation of all the data in one repository
• Analytics
Population Management for At-Risk Demographics
• Certain populations are known to be prone to certain diseases.
• By analyzing data, insurers can help people take preventive measures, for example by reminding them to get regularly scheduled colonoscopies.
• This helps insurers reduce costs and expand margins.
Lab Data Management and Analytics
• Strain on traditional technological systems:
– Rising number of tests conducted
– Rising variety of data collected
– Lack of flexibility
• With MongoDB's flexible data model, providers of lab testing, genomics and clinical pathology can:
– Ingest, store and analyze a variety of data types coming from numerous sources, all in a single data store
• This enables these companies to generate new insights and revenue streams
Other use cases for MongoDB in healthcare include:
• Fraud Detection
• Remote Monitoring and Body Area Networks
• Mobile Apps for Doctors and Nurses
• Pandemic Detection with Real-Time Geospatial Analytics
• Electronic Healthcare Records (EHR)
• Advanced Auditing Systems for Compliance
• Hospital Equipment Management and Optimization
Thank You
Solutions Architect, MongoDB
Massimo Brignoli
massimo@mongodb.com
@massimobrignoli
#MongoDB