Post on 21-May-2020

SCHEMA ON READ

Index everything: one query type, low latency, high concurrency

Index nothing: queries as programs, high latency, low concurrency

IT’S POPULAR, BUT WHY?

Diverse operational workloads are common

|           | Top 5 Marketing Firm | Government Agency | Top 5 Investment Bank |
| Data      | Key / Value | 10+ fields, arrays, nested documents | 20+ fields, arrays, nested documents |
| Queries   | Key-based; 1-100 docs / query; 80/20 read/write | Compound queries; range queries; MapReduce; 20/80 read/write | Compound queries; range queries; 50/50 read/write |
| Servers   | ~250 | ~50 | 4 |
| Ops / Sec | 1,200,000 | 500,000 | 30,000 |

Some deployments are large

|                        | Cluster Scale  | Performance Scale       | Data Scale                 |
| Entertainment Company  | 1,400 servers  | 250 Million Ticks / Sec | Petabytes                  |
| Asian Internet Company | 1,000+ servers | 300k Ops / Sec          | 10s of billions of objects |
| Federal Agency         | 250+ servers   | 500k Ops / Sec          | 13 billion documents       |

Multiple indicators suggest adoption is strong

| RANK | DBMS                 | MODEL           | SCORE | GROWTH (20 MO) |
| 1    | Oracle               | Relational DBMS | 1,442 | -5%            |
| 2    | MySQL                | Relational DBMS | 1,294 | 2%             |
| 3    | Microsoft SQL Server | Relational DBMS | 1,131 | -10%           |
| 4    | MongoDB              | Document Store  | 277   | 172%           |
| 5    | PostgreSQL           | Relational DBMS | 273   | 40%            |
| 6    | DB2                  | Relational DBMS | 201   | 11%            |
| 7    | Microsoft Access     | Relational DBMS | 146   | -26%           |
| 8    | Cassandra            | Wide Column     | 107   | 87%            |
| 9    | SQLite               | Relational DBMS | 105   | 19%            |

Source: DB-engines database popularity rankings; May 2015

Source: Stack Overflow via Stackoverkill.com

TO ME, THREE THINGS DRIVE THIS ADOPTION

We asked users why, here’s what they told us

[Diagram labels: APPLICATION, { CODE }, XML CONFIG, OBJECT RELATIONAL MAPPING, RELATIONAL DATABASE, DB SCHEMA]

#1 The data model

| RDBMS    | MongoDB             |
| Database | Database            |
| Table    | Collection          |
| Index    | Index               |
| Row      | Document            |
| Join     | Embedding & Linking |

Documents are rich data structures

{
  first_name: 'Paul',                             // typed field value: String
  surname: 'Miller',
  cell: 447557505611,                             // typed field value: Number
  city: 'London',
  location: [45.123, 47.232],                     // typed field value: Geo-Location
  Profession: ['banking', 'finance', 'trader'],   // fields can contain arrays
  cars: [                                         // fields can contain an array of sub-documents
    { model: 'Bentley', year: 1973, value: 100000 },
    { model: 'Rolls Royce', year: 1965, value: 330000 }
  ]
}

Documents are self-describing

{
  product_name: 'Acme Paint',
  color: ['Red', 'Green'],
  size_oz: [8, 32],
  finish: ['satin', 'eggshell']
}

{
  product_name: 'T-shirt',
  size: ['S', 'M', 'L', 'XL'],
  color: ['Heather Gray' … ],
  material: '100% cotton',
  wash: 'cold',
  dry: 'tumble dry low'
}

{
  product_name: 'Mountain Bike',
  brake_style: 'mechanical disc',
  color: 'grey',
  frame_material: 'aluminum',
  no_speeds: 21,
  package_height: '7.5x32.9x55',
  weight_lbs: 44.05,
  suspension_type: 'dual',
  wheel_size_in: 26
}

Documents in the same product catalog collection in MongoDB
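
Because every document carries its own field names, the reader decides how to interpret each shape at query time. A minimal Python sketch of that idea, using trimmed copies of the three catalog documents as plain dictionaries; the helpers `describe` and `colors` are hypothetical names for illustration, not MongoDB APIs:

```python
# Trimmed copies of the three catalog documents (assumption: plain dicts
# stand in for documents fetched from the collection).
catalog = [
    {"product_name": "Acme Paint", "color": ["Red", "Green"],
     "size_oz": [8, 32], "finish": ["satin", "eggshell"]},
    {"product_name": "T-shirt", "size": ["S", "M", "L", "XL"],
     "material": "100% cotton"},
    {"product_name": "Mountain Bike", "color": "grey",
     "no_speeds": 21, "weight_lbs": 44.05},
]


def describe(doc):
    """Return {field name: Python type name} for one document."""
    return {field: type(value).__name__ for field, value in doc.items()}


def colors(doc):
    """Normalise 'color' at read time: absent -> [], scalar -> [value]."""
    value = doc.get("color", [])
    return value if isinstance(value, list) else [value]


# Fields present in every document of the collection.
shared = set.intersection(*(set(doc) for doc in catalog))

for doc in catalog:
    print(doc["product_name"], describe(doc), colors(doc))
```

Only `product_name` is shared by all three documents; every other field is interpreted per document when it is read.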

#2 Idiomatic drivers & frameworks

Morphia

MEAN Stack

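Documents map to language constructs

// Java: maps (legacy driver API)
import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;
import java.util.Date;
import java.util.Map;

DBObject query = new BasicDBObject("publisher.founded", 1980);
Map m = collection.findOne(query).toMap();   // findOne returns a DBObject; toMap() exposes it as a Map
Date pubDate = (Date) m.get("published_date");

// JavaScript: objects
m = collection.findOne({"publisher.founded": 1980});
pubDate = m.published_date;        // ISODate
year = pubDate.getUTCFullYear();

# Python: dictionaries
m = coll.find_one({"publisher.founded": 1980})
pub_year = m["published_date"].year   # m["published_date"] is a datetime.datetime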

#3 It’s easy…and fun

•  Easy to acquire – AGPL license
•  Easy to install and configure – up and running in <5 min
•  Easy to get high performance – no black magic for millisecond latency, scale out architecture
•  Easy to deliver “always on” – replication and automatic failover built in
•  Easy to add, query data – no complex modeling, no DDL

BUT WHAT ABOUT
•  Data governance?
•  Referential integrity?
•  Analytics?

DOCUMENT VALIDATION

Data governance: document validation

Implement data governance without sacrificing the agility that comes from schema on read

Document validation gives you flexible control

•  Use familiar MongoDB Query Language
•  Automatically tests each insert/update; delivers warning or error if a rule is broken
•  You choose what keys to validate and how

db.runCommand({
  collMod: "contacts",
  validator: { $and: [
    { year_of_birth: { $lte: 1994 } },
    { $or: [
      { phone: { $type: "string" } },
      { email: { $type: "string" } }
    ] }
  ] }
})

Example validation failure

db.contacts.insert({
  name: "Fred",
  email: "fred@clusterdb.com",
  year_of_birth: 2012
})

Document failed validation
WriteResult({
  "nInserted": 0,
  "writeError": {
    "code": 121,
    "errmsg": "Document failed validation"
  }
})

Many ways to validate, no foreign keys yet

•  Can check most things that work with a find expression
   –  Existence
   –  Non-existence
   –  Data type of values
   –  <, <=, >, >=, ==, !=
   –  AND, OR
   –  Regular expressions
   –  Some geospatial operators (e.g. $geoWithin & $geoIntersects)
•  Validate existing data by wrapping expression in $not
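
The kinds of checks listed above can be illustrated with a small pure-Python matcher. This is a sketch of the semantics only, not MongoDB's implementation: it models a handful of operators, knows just the "string" type name, and the sample documents (including the phone number) are made up. It also assumes the negation used to find existing invalid documents is spelled `$nor` at the top level, since in MQL `$not` applies to a single field's operator expression.

```python
import re

_MISSING = object()

# Comparison operators, keyed by their MQL names.
COMPARISONS = {
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
    "$eq": lambda a, b: a == b,
    "$ne": lambda a, b: a != b,
}

# Only one type name is modelled in this sketch.
TYPES = {"string": str}


def field_matches(doc, field, cond):
    """Check one {field: condition} pair against a document."""
    value = doc.get(field, _MISSING)
    if not isinstance(cond, dict):            # {field: literal} means equality
        return value is not _MISSING and value == cond
    for op, arg in cond.items():
        if op == "$exists":
            ok = (value is not _MISSING) == arg
        elif value is _MISSING:
            ok = op == "$ne"                  # a missing field only satisfies $ne
        elif op == "$type":
            ok = isinstance(value, TYPES[arg])
        elif op == "$regex":
            ok = isinstance(value, str) and re.search(arg, value) is not None
        else:
            try:
                ok = COMPARISONS[op](value, arg)
            except TypeError:                 # mismatched types never match
                ok = False
        if not ok:
            return False
    return True


def matches(doc, expr):
    """Evaluate a (small subset of a) find expression against a document."""
    for key, cond in expr.items():
        if key == "$and":
            ok = all(matches(doc, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(doc, sub) for sub in cond)
        elif key == "$nor":
            ok = not any(matches(doc, sub) for sub in cond)
        else:
            ok = field_matches(doc, key, cond)
        if not ok:
            return False
    return True


validator = {"$and": [
    {"year_of_birth": {"$lte": 1994}},
    {"$or": [{"phone": {"$type": "string"}}, {"email": {"$type": "string"}}]},
]}

existing = [  # hypothetical documents already in the collection
    {"name": "Ann", "year_of_birth": 1980, "phone": "555-0100"},
    {"name": "Fred", "email": "fred@clusterdb.com", "year_of_birth": 2012},
]

# Negating the validator selects the documents that would fail it.
invalid = [d["name"] for d in existing if matches(d, {"$nor": [validator]})]
print(invalid)
```

Against a real collection the same idea would be a single find with the negated validator; the matcher here only exists to make the logic visible without a server.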

Where MongoDB validation excels (vs. RDBMS)

•  Simple
   –  Use familiar search expressions (MQL)
   –  No need for stored procedures
•  Flexible
   –  Only enforced on mandatory parts of the schema
   –  Can start adding new data at any point and then add validation later if needed
•  Practical to deploy
   –  Simple to roll out new rules across thousands of production servers
•  Lightweight
   –  Negligible impact on performance

Controlling validation

| validationAction | validationLevel: off | validationLevel: moderate | validationLevel: strict |
| warn  | No checks | Warn on validation failure for inserts & updates to existing valid documents. Updates to existing invalid docs OK. | Warn on any validation failure for any insert or update. |
| error | No checks | Reject invalid inserts & updates to existing valid documents. Updates to existing invalid docs OK. | Reject any violation of validation rules for any insert or update. (DEFAULT) |
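
The level/action combinations can be read as a small decision function. The sketch below restates the table in Python; `check_write` and its parameter names are hypothetical, and "warn" here means the write goes through with a warning rather than being rejected.

```python
def check_write(level, action, new_doc_valid, is_update=False, old_doc_valid=True):
    """Return 'accept', 'warn' or 'reject' for a single insert or update.

    level:  'off', 'moderate' or 'strict'   (validationLevel)
    action: 'warn' or 'error'               (validationAction)
    """
    # No checks at all, or the resulting document satisfies the rules.
    if level == "off" or new_doc_valid:
        return "accept"
    # moderate: updates to documents that were already invalid are let through.
    if level == "moderate" and is_update and not old_doc_valid:
        return "accept"
    # Otherwise the configured action decides what happens to the invalid write.
    return "reject" if action == "error" else "warn"


# A few cells of the table:
print(check_write("strict", "error", new_doc_valid=False))
print(check_write("moderate", "error", new_doc_valid=False,
                  is_update=True, old_doc_valid=False))
print(check_write("moderate", "warn", new_doc_valid=False))
```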

Versioning of validators (optional)

•  Application can lazily update documents with an older version or with no version set at all

db.runCommand({
  collMod: "contacts",
  validator: { $or: [
    { version: { "$exists": false } },
    { version: 1, Name: { "$exists": true } },
    { version: 2, Name: { "$type": "string" } }
  ] }
})
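
A rough Python restatement of the versioned rule may help show how lazy upgrades fit in. It is a sketch under assumptions: `passes_versioned_validator` mirrors the three `$or` branches, while `upgrade_to_v2` is a hypothetical application-side migration (coercing `Name` to a string, defaulting to an empty string) that the slides do not specify.

```python
def passes_versioned_validator(doc):
    """Mirror the $or branches: no version, version 1, or version 2."""
    if "version" not in doc:
        return True                          # unversioned documents are accepted as-is
    if doc["version"] == 1:
        return "Name" in doc                 # v1: Name must exist
    if doc["version"] == 2:
        return isinstance(doc.get("Name"), str)   # v2: Name must be a string
    return False                             # unknown version: no branch matches


def upgrade_to_v2(doc):
    """Hypothetical lazy migration the application runs when it touches a document."""
    upgraded = dict(doc)
    upgraded["Name"] = str(upgraded.get("Name", ""))
    upgraded["version"] = 2
    return upgraded


old = {"Name": 42, "version": 1}             # valid under v1, not under v2
new = upgrade_to_v2(old)
print(passes_versioned_validator(old), passes_versioned_validator(new))
```

Documents written by older application code keep passing under their own version's rule until the application next rewrites them.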

SCHEMA DISCOVERY

FUTURE DECISIONS

Still lots of hard problems to solve

•  Schema evolution
•  Specialized storage engines
   –  WORM
   –  Blockchain
   –  Proprietary hardware
   –  Integrated data warehouse
•  Complex transactions

One surface fits all

[Architecture diagram, top to bottom]

Applications: Content Repo, IoT Sensor Backend, Ad Service, Customer Analytics, Archive

MongoDB Query Language (MQL) + Native Drivers

MongoDB Document Data Model

Storage engines: BTree, LSM, In-memory, WORM, Archive

Alongside all layers: Management, Security
