tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why and When

What, When and Why of MongoDB

Solution Architect, MongoDB Inc.

Massimo Brignoli

@mongodb

Agenda

About MongoDB Inc.

Data and Query Model

Scalability

Availability

Deployment Architectures

Schema Design Challenges

Use Cases

About MongoDB

MongoDB Inc. Overview

300+ employees

600+ customers

Offices in New York, Palo Alto, Washington DC, London, Dublin, Barcelona and Sydney

Over $231 million in funding

Global Community

6,000,000+ MongoDB Downloads

100,000+ Online Education Registrants

20,000+ MongoDB User Group Members

20,000+ MongoDB Days Attendees

15,000+ MongoDB Management Service (MMS) Users

MongoDB Inc. Products and Services

Training: Online and In-Person for Developers and Administrators

MongoDB Monitoring Service: Cloud-Based Service for Monitoring, Alerts, Backup and Restore

Subscriptions: MongoDB Enterprise, On-Prem Monitoring, Professional Support and Commercial License

Consulting: Expert Resources for All Phases of MongoDB Implementations

Data & Query Model

Operational Database Landscape

Document Data Model

Relational MongoDB

{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}

Document Model Benefits

• Agility and flexibility
– Data models can evolve easily
– Companies can adapt to changes quickly

• Intuitive, natural data representation
– Developers are more productive
– Many types of applications are a good fit

• Reduces the need for joins, disk seeks
– Programming is simpler
– Performance can be delivered at scale

Developers are more productive

MongoDB is full featured

Rich Queries
• Find Paul's cars
• Find everybody in London with a car built between 1970 and 1980

Geospatial
• Find all of the car owners within 5km of Trafalgar Sq.

Text Search
• Find all the cars described as having leather seats

Aggregation
• Calculate the average value of Paul's car collection

Map Reduce
• What is the ownership pattern of colors by geography over time? (Is purple trending up in China?)

{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}
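The first two rich queries and the aggregation listed above can be sketched in plain Python over an in-memory list, just to show what each question asks of the document. This is an illustration of the semantics only, not MongoDB's query API; the second person ("Anna Jones") and the helper names are hypothetical, and the `location` field is left out for brevity.

```python
from statistics import mean

# Hypothetical in-memory "collection" shaped like the slide's example document.
people = [
    {
        "first_name": "Paul", "surname": "Miller", "city": "London",
        "cars": [
            {"model": "Bentley", "year": 1973, "value": 100000},
            {"model": "Rolls Royce", "year": 1965, "value": 330000},
        ],
    },
    {   # invented second document, only here to give the filters something to exclude
        "first_name": "Anna", "surname": "Jones", "city": "London",
        "cars": [{"model": "Mini", "year": 1995, "value": 4000}],
    },
]

def cars_of(first_name):
    """Rich query: the embedded cars of every person with this first name."""
    return [car for p in people if p["first_name"] == first_name for car in p["cars"]]

def owners_in_city_with_car_between(city, start_year, end_year):
    """Rich query: people in `city` owning a car built in [start_year, end_year]."""
    return [
        p for p in people
        if p["city"] == city
        and any(start_year <= car["year"] <= end_year for car in p["cars"])
    ]

def average_car_value(first_name):
    """Aggregation: average value over the person's embedded cars."""
    return mean(car["value"] for car in cars_of(first_name))
```

Because the cars are embedded in the owner's document, none of these questions needs a join: each one is answered by looking inside a single document at a time.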

Shell and Drivers

Shell: Command-line shell for interacting directly with database

Drivers: Drivers for most popular programming languages and frameworks

> db.collection.insert({company: "10gen", product: "MongoDB"})
>
> db.collection.findOne()
{
  "_id" : ObjectId("5106c1c2fc629bfe52792e86"),
  "company" : "10gen",
  "product" : "MongoDB"
}

Python

Haskell

JavaScript

Scalability

Automatic Sharding

• Three types of sharding: hash-based, range-based, tag-aware

• Increase or decrease capacity as you go

• Automatic balancing

Query Routing

• Multiple query optimization models

• Each sharding option appropriate for different apps
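The difference between hash-based and range-based sharding can be sketched as two routing functions that map a shard key to a shard. This is a simplified illustration under stated assumptions: the shard names and split points are hypothetical, MD5 stands in for whatever hash the database uses internally, and real MongoDB sharding works on chunks that the balancer moves between shards. Tag-aware sharding is not shown.

```python
import bisect
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical shard names

def hash_shard(key):
    """Hash-based: a stable hash of the shard key picks the shard,
    so neighbouring keys are scattered across shards."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Range-based: each shard owns a contiguous key range.
# Hypothetical split points: keys < "h" go to shard-a, keys < "q" to shard-b,
# everything else to shard-c.
RANGE_BOUNDS = ["h", "q"]

def range_shard(key):
    """Range-based: binary-search the split points to find the owning shard,
    so neighbouring keys stay together."""
    return SHARDS[bisect.bisect_right(RANGE_BOUNDS, key)]
```

Range-based routing keeps nearby keys on one shard, which suits range scans; hash-based routing spreads them out, which tends to even out write load. That trade-off is why each option fits different applications.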

Availability

Availability Considerations

• High Availability – Ensure application availability during many types of failures

• Disaster Recovery – Address the RTO and RPO goals for business continuity

• Maintenance – Perform upgrades and other maintenance operations with no application downtime

Replica Sets

• Replica Set – two or more copies

• “Self-healing” shard

• Addresses many concerns:

- High Availability

- Disaster Recovery

- Maintenance

Replica Set Benefits

Business Needs | Replica Set Benefits
High Availability | Automated failover
Disaster Recovery | Hot backups offsite
Maintenance | Rolling upgrades
Low Latency | Locate data near users
Workload Isolation | Read from non-primary replicas
Data Privacy | Restrict data to physical location
Data Consistency | Tunable consistency

Deployment Architectures

Single Data Center

• Automated failover

• Tolerates server failures

• Tolerates rack failures

• Number of replicas defines failure tolerance
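How the number of replicas defines failure tolerance can be worked out with a little arithmetic, assuming (as in MongoDB replica sets) that a primary can only be elected while a majority of voting members is reachable. The function names below are illustrative.

```python
def majority(voting_members):
    """Smallest number of members that forms a strict majority."""
    return voting_members // 2 + 1

def fault_tolerance(voting_members):
    """How many members can fail while a majority is still available."""
    return voting_members - majority(voting_members)

# Print a small table for common replica set sizes.
for n in (2, 3, 4, 5):
    print(n, "members -> majority", majority(n), "-> tolerates", fault_tolerance(n), "failure(s)")
```

Under this assumption an even-sized set tolerates no more failures than the odd-sized set one member smaller, which is one reason odd member counts are commonly used.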

[Diagram: shards A, B and C, each with a primary and a secondary]

Active/Standby Data Center

• Tolerates server and rack failure

• Standby data center

[Diagram: Data Center - West and Data Center - East, each holding replicas of shards A, B and C]

Active/Active Data Center

• Tolerates server, rack, data center failures, network partitions

[Diagram: Data Center - West and Data Center - East, each holding replicas of shards A, B and C; Data Center - Central holding arbiters A, B and C]

Global Data Distribution

[Diagram: one primary replicating in real time to secondaries around the world]

Read Global/Write Local

[Diagram: primaries in NYC, LON and SYD, with secondaries in NYC, LON and SYD]

Schema Design Challenges

First a story:

Once upon a time there was a medical records company…

Schema Design Challenge

• Flexibility – Easily adapt to new requirements

• Agility – Rapid application development

• Scalability – Support large data and query volumes

Schema Design: MongoDB vs. Relational

MongoDB | Relational
Collections | Tables
Documents | Rows
Data Use | Data Storage
What questions do I have? | What answers do I have?

MongoDB versus Relational

Attribute | MongoDB | Relational
Storage | N-dimensional | Two-dimensional
Field Values | 0, 1, many, or embed | Single value
Query | Any field or level | Any field
Schema | Flexible | Very structured
Updates | In line | In place

With relational, this is hard

Long development times

Inflexible

Doesn’t scale

Document model is much easier

Shorter development times

Flexible

Scalable

{
  "patient_id": "1177099",
  "first_name": "John",
  "last_name": "Doe",
  "middle_initial": "A",
  "dob": "2000-01-25",
  "gender": "Male",
  "blood_type": "B+",
  "address": "123 Elm St., Chicago, IL 59923",
  "height": "66",
  "weight": "110",
  "allergies": ["Nuts", "Penicillin", "Pet Dander"],
  "current_medications": [
    {"name": "Zoloft", "dosage": "2mg", "frequency": "daily", "route": "orally"}
  ],
  "complaint": [
    {"entered": "2000-11-03", "onset": "2000-11-03", "prob_desc": "", "icd": 250.00, "status": "Active"},
    {"entered": "2000-02-04", "onset": "2000-02-04", "prob_desc": "in spite of regular exercise, ...", "icd": 401.9, "status": "Active"}
  ],
  "diagnosis": [
    {"visit": "2005-07-22", "narrative": "Fractured femur", "icd": "9999", "priority": "Primary"},
    {"visit": "2005-07-22", "narrative": "Type II Diabetes", "icd": "250.00", "priority": "Secondary"}
  ]
}

Let’s model something together

How about a business card?

Business Card

Address Book Entity-Relationship

Contacts • name • company • title

Addresses • type • street • city • state • zip_code

Phones • type • number

Emails • type • address

Thumbnails • mime_type • data

Portraits • mime_type • data

Groups • name

Twitters • name • location • web • bio

Referencing

Contact
• name
• company
• title
• phone

Address
• street
• city
• state
• zip_code

Use two collections with a reference

Similar to relational

Embedding

Contact
• name
• company
• address
  • street
  • city
  • state
  • zip
• title
• phone

Document Schema

Referencing

Contacts

{
  "_id": 2,
  "name": "Steven Jobs",
  "title": "VP, New Product Development",
  "company": "Apple Computer",
  "phone": "408-996-1010",
  "address_id": 1
}

Addresses

{
  "_id": 1,
  "street": "10260 Bandley Dr",
  "city": "Cupertino",
  "state": "CA",
  "zip_code": "95014",
  "country": "USA"
}

Embedding

Contacts

{
  "_id": 2,
  "name": "Steven Jobs",
  "title": "VP, New Product Development",
  "company": "Apple Computer",
  "address": {
    "street": "10260 Bandley Dr",
    "city": "Cupertino",
    "state": "CA",
    "zip_code": "95014",
    "country": "USA"
  },
  "phone": "408-996-1010"
}
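The practical difference between the two layouts shows up at read time. The sketch below uses plain Python dictionaries keyed by `_id` as stand-ins for collections (an assumption for illustration, not the MongoDB driver API); the function names are hypothetical.

```python
# Referencing: two "collections", linked through address_id.
addresses = {
    1: {"street": "10260 Bandley Dr", "city": "Cupertino", "state": "CA",
        "zip_code": "95014", "country": "USA"},
}
contacts_referenced = {
    2: {"name": "Steven Jobs", "title": "VP, New Product Development",
        "company": "Apple Computer", "phone": "408-996-1010", "address_id": 1},
}

# Embedding: one "collection", the address lives inside the contact.
contacts_embedded = {
    2: {"name": "Steven Jobs", "title": "VP, New Product Development",
        "company": "Apple Computer", "phone": "408-996-1010",
        "address": {"street": "10260 Bandley Dr", "city": "Cupertino",
                    "state": "CA", "zip_code": "95014", "country": "USA"}},
}

def city_by_reference(contact_id):
    """Two lookups: fetch the contact, then follow address_id to the address."""
    contact = contacts_referenced[contact_id]
    address = addresses[contact["address_id"]]
    return address["city"]

def city_embedded(contact_id):
    """One lookup: the address arrives together with the contact."""
    return contacts_embedded[contact_id]["address"]["city"]
```

Both functions return the same city. Referencing pays a second round trip but lets several contacts share one address record; embedding returns everything in a single read.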

How are they different? Why?

Contact
• name
• company
• title
• phone

Address
• street
• city
• state
• zip_code

Contact
• name
• company
• address
  • street
  • city
  • state
  • zip
• title
• phone

Schema Flexibility

{
  "name": "Steven Jobs",
  "title": "VP, New Product Development",
  "company": "Apple Computer",
  "address": {
    "street": "10260 Bandley Dr",
    "city": "Cupertino",
    "state": "CA",
    "zip_code": "95014"
  },
  "phone": "408-996-1010"
}

{
  "name": "Larry Page",
  "url": "http://google.com",
  "title": "CEO",
  "company": "Google!",
  "address": {
    "street": "555 Bryant, #106",
    "city": "Palo Alto",
    "state": "CA",
    "zip_code": "94301"
  },
  "phone": "650-330-0100",
  "fax": "650-330-1499"
}
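Schema flexibility means the two contacts above can sit in the same collection even though only one of them has `url` and `fax`. A minimal plain-Python sketch of how application code might cope with optional fields (the helper names are hypothetical, and a list of dictionaries stands in for the collection; addresses are omitted for brevity):

```python
# Two documents with different sets of fields in one "collection".
contacts = [
    {"name": "Steven Jobs", "title": "VP, New Product Development",
     "company": "Apple Computer", "phone": "408-996-1010"},
    {"name": "Larry Page", "url": "http://google.com", "title": "CEO",
     "company": "Google!", "phone": "650-330-0100", "fax": "650-330-1499"},
]

def with_field(docs, field):
    """Return only the documents that actually carry `field`."""
    return [d for d in docs if field in d]

def fax_or_default(doc, default=None):
    """Read an optional field without assuming every document has it."""
    return doc.get("fax", default)
```

Adding a new field to new documents requires no migration of the older ones; readers simply treat the field as optional.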

One-to-many embedding vs. referencing

{
  "name": "Larry Page",
  "url": "http://google.com/",
  "title": "CEO",
  "company": "Google!",
  "email": "larry@google.com",
  "address": [
    { "street": "555 Bryant, #106", "city": "Palo Alto", "state": "CA", "zip_code": "94301" }
  ],
  "phones": [
    { "type": "Office", "number": "650-618-1499" },
    { "type": "fax", "number": "650-330-0100" }
  ]
}

{
  "name": "Larry Page",
  "url": "http://google.com/",
  "title": "CEO",
  "company": "Google!",
  "email": "larry@google.com",
  "address": ["addr99"],
  "phones": ["ph23", "ph49"]
}

{ "_id": "addr99", "street": "555 Bryant, #106", "city": "Palo Alto", "state": "CA", "zip_code": "94301" }

{ "_id": "ph23", "type": "Office", "number": "650-618-1499" },
{ "_id": "ph49", "type": "fax", "number": "650-330-0100" }

Many to Many

Traditional Relational Association: Join table

Contacts • name • company • title • phone

Groups • name

GroupContacts • group_id • contact_id

Use arrays instead
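Replacing the join table with an array can be sketched as follows: each contact carries the list of groups it belongs to, and both directions of the many-to-many relationship are answered from that one array. The group names and helper functions are hypothetical, and a plain Python list stands in for the collection.

```python
# Each contact embeds an array of group names instead of rows in a join table.
contacts = [
    {"_id": 1, "name": "Steven Jobs", "groups": ["founders", "apple"]},
    {"_id": 2, "name": "Larry Page", "groups": ["founders", "google"]},
]

def contacts_in_group(group):
    """Group -> contacts: match documents whose array contains the group."""
    return [c["name"] for c in contacts if group in c["groups"]]

def groups_of(name):
    """Contact -> groups: read the array straight off the document."""
    for c in contacts:
        if c["name"] == name:
            return c["groups"]
    return []
```

In a database, the group-to-contacts direction would typically rely on an index over the array field so that it does not scan every contact.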

Address Book Entity-Relationship

Addresses • type • street • city • state • zip_code

Phones • type • number

Emails • type • address

Thumbnails • mime_type • data

Portraits • mime_type • data

Groups • name

Twitters • name • location • web • bio

addresses • type • street • city • state • zip_code

phones • type • number

emails • type • address

thumbnail • mime_type • data

Portraits • mime_type • data

Groups • name

twitter • name • location • web • bio

Document model - holistic and efficient representation

Contact document example

{
  "name" : "Gary J. Murakami, Ph.D.",
  "company" : "MongoDB, Inc",
  "title" : "Lead Engineer and Ruby Evangelist",
  "twitter" : {
    "name" : "GaryMurakami",
    "location" : "New Providence, NJ",
    "web" : "http://www.nobell.org"
  },
  "portrait_id" : 1,
  "addresses" : [
    { "type" : "work", "street" : "229 W 43rd St.", "city" : "New York", "zip_code" : "10036" }
  ],
  "phones" : [
    { "type" : "work", "number" : "1-866-237-8815 x8015" }
  ],
  "emails" : [
    { "type" : "work", "address" : "gary.murakami@mongodb.com" },
    { "type" : "home", "address" : "gjm@nobell.org" }
  ]
}

Health Care Use Cases

360-Degree Patient View

• Healthcare provider networks have massive amounts of patient data

– Both structured and unstructured
– Basic patient information
– Lab results
– MRI images

• Centralization of data needed
– Aggregation of all the data in one repository

• Analytics

Population Management for At-Risk Demographics

• Certain populations are known to be prone to certain diseases.

• By analyzing data, insurers can help people take preventive measures

– For example, reminding them to get regularly scheduled colonoscopies

• This helps insurers reduce costs and expand margins

Lab Data Management and Analytics

• Strain on traditional technological systems:
– Rise in the number of tests conducted
– Rise in the variety of data collected
– Lack of flexibility

• With MongoDB's flexible data model, providers of lab testing, genomics and clinical pathology can:
– Ingest, store and analyze a variety of data types
– Coming from numerous sources, all in a single data store

• This enables these companies to generate new insights and revenue streams

Other use cases for MongoDB in healthcare include:

• Fraud Detection

• Remote Monitoring and Body Area Networks

• Mobile Apps for Doctors and Nurses

• Pandemic Detection with Real-Time Geospatial Analytics

• Electronic Healthcare Records (EHR)

• Advanced Auditing Systems for Compliance

• Hospital Equipment Management and Optimization

Thank You

Solutions Architect, MongoDB

Massimo Brignoli

massimo@mongodb.com

@massimobrignoli

#MongoDB
