Taking care about your schema in the MongoDB’s schemaless world

Alessandro Palumbo
[email protected]
http://it.linkedin.com/in/alessandropalumbo/
http://www.byte-code.com

Except where otherwise noted, this work is licensed under: http://creativecommons.org/licenses/by/3.0/

Upload: mongodb-milan

Post on 14-Jul-2015



MongoDB

From "humongous": huge, enormous

NoSQL

Open source

Document-oriented

JSON-style documents


JSON-style documents

{
    "_id" : "6c85fa4c-fa64-44e2-89c9-e5eb7f306ed7",
    "code" : "CRS0001",
    "name" : "Test",
    "description" : "Test description",
    "active" : true,
    "scheduledDate" : {
        "from" : ISODate("2013-09-12T00:00:00.000Z"),
        "to" : ISODate("2013-10-31T00:00:00.000Z")
    },
    "version" : NumberLong(1)
}


Don’t be relational

No joins: we can embed

No full transactions: document-level transactions

No schema: is it really an issue?


DESIGN
    Design for query
    Embedded data vs references
    Dynamic schema vs static languages

FRIENDLY FIRE (aka RTFM)
    Avoid natural keys as identifiers

PERFORMANCE
    Preallocate fields?
    Tuning updates and inserts
    Document moving slows you


FRIENDLY FIRE


Avoid natural keys as identifiers

Every collection has a default index on the _id field. If no _id is provided, the driver or mongod creates one with an ObjectId value.

Keep that generated _id as the identifier and add a unique index on the natural key: the application domain can evolve in unexpected ways, and a natural key may have to change.

Remember that on a sharded collection a unique index (apart from the default _id index) is allowed only when the shard key is a prefix of the index.
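
The idea of a generated identifier plus a unique index on the natural key can be sketched in plain Python. This is only an illustration under stated assumptions: `InMemoryCollection`, `DuplicateKeyError` and `rename_natural_key` are hypothetical names for a toy in-memory store, not the MongoDB driver API, and a UUID string stands in for an ObjectId.

```python
import uuid


class DuplicateKeyError(Exception):
    """Raised when a unique index would be violated (hypothetical name)."""


class InMemoryCollection:
    """Toy stand-in for a collection; NOT the MongoDB driver API."""

    def __init__(self, unique_fields=()):
        self._docs = {}                                  # _id -> document
        self._unique = {f: {} for f in unique_fields}    # field -> value -> _id

    def insert(self, doc):
        doc = dict(doc)
        # Generate an _id when none is provided (a UUID string here,
        # standing in for the ObjectId created by the driver or mongod).
        doc.setdefault("_id", str(uuid.uuid4()))
        if doc["_id"] in self._docs:
            raise DuplicateKeyError("_id")
        # Check every unique index before touching any of them.
        for field, index in self._unique.items():
            if field in doc and doc[field] in index:
                raise DuplicateKeyError(field)
        for field, index in self._unique.items():
            if field in doc:
                index[doc[field]] = doc["_id"]
        self._docs[doc["_id"]] = doc
        return doc["_id"]

    def rename_natural_key(self, _id, field, new_value):
        """Change the natural key without touching the identifier."""
        index = self._unique[field]
        if new_value in index and index[new_value] != _id:
            raise DuplicateKeyError(field)
        doc = self._docs[_id]
        index.pop(doc.get(field), None)
        doc[field] = new_value
        index[new_value] = _id

    def get(self, _id):
        return self._docs[_id]


courses = InMemoryCollection(unique_fields=["code"])
course_id = courses.insert({"code": "CRS0001", "name": "Test"})
# The business code changes; anything that stored course_id stays valid.
courses.rename_natural_key(course_id, "code", "CRS-2013-0001")
```

Had "code" been used as the _id, the rename would have required rewriting the document under a new identifier and updating every reference to it.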


DESIGN


Design for query

Document design follows from the queries the application will run.

Reference or embed documents: "denormalized" is not always a bad word.

Your document design determines which operations will be safe, since transactions are limited to a single document.


Embedded data vs references

Embedded data models let applications store related pieces of information in the same database record.

Usually there is a "contains" relation between the embedding object and the embedded one.

The maximum BSON document size is 16 megabytes, and embedding can cause performance problems when misused, for example when an embedded array keeps growing.


Embedded data vs references

Normalized data models describe relationships using references between documents.

Referential integrity is not supported: a reference can point to an object that does not exist.

References provide more flexibility than embedding, but client applications have to resolve referenced objects with additional queries.
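
The trade-off between the two models can be made concrete with a small sketch. Plain dictionaries stand in for collections here, and `find_one` and `load_course_with_teacher` are hypothetical helpers, not driver calls; the field names other than "code" are invented for the example.

```python
# Embedded: the course "contains" its lessons, so one read returns everything.
course_embedded = {
    "_id": "c1",
    "code": "CRS0001",
    "lessons": [
        {"title": "Intro", "minutes": 45},
        {"title": "Schema design", "minutes": 60},
    ],
}

# Referenced: the teacher lives in another collection and the course
# only stores its identifier.
courses = {"c1": {"_id": "c1", "code": "CRS0001", "teacher_id": "t9"}}
teachers = {"t1": {"_id": "t1", "name": "Ada"}}  # note: there is no "t9"


def find_one(collection, _id):
    """Stand-in for a single query by _id; returns None when nothing matches."""
    return collection.get(_id)


def load_course_with_teacher(course_id):
    """Resolve the reference client-side: two lookups instead of one."""
    course = find_one(courses, course_id)                  # query 1
    if course is None:
        return None
    teacher = find_one(teachers, course["teacher_id"])     # query 2
    # Nothing enforces referential integrity, so teacher may be None.
    return {**course, "teacher": teacher}


result = load_course_with_teacher("c1")
dangling = result["teacher"] is None  # True: the reference points nowhere
```

The application has to decide what a dangling reference means, because the database will not reject it.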


Dynamic schema vs static languages

Why use a dynamic schema if we are not using a dynamic programming language?

Inheritance is not only a matter of hierarchy; it can also be a matter of composition.

Composition is the key to introducing a dynamic schema in a static programming language.
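
One possible reading of the composition idea is a class with a fixed, typed core that holds an open-ended bag of extra attributes. The sketch below uses a type-hinted Python dataclass as a stand-in for a class in a static language; `Course`, `extras`, `to_document` and `from_document` are names assumed for illustration, not taken from the talk.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Course:
    # Fixed, typed part of the schema.
    code: str
    name: str
    active: bool = True
    # Dynamic part held by composition: arbitrary extra attributes.
    extras: Dict[str, Any] = field(default_factory=dict)

    def to_document(self) -> Dict[str, Any]:
        """Map the typed fields plus an 'extras' subdocument to a document."""
        return {
            "code": self.code,
            "name": self.name,
            "active": self.active,
            "extras": dict(self.extras),
        }

    @classmethod
    def from_document(cls, doc: Dict[str, Any]) -> "Course":
        """Rebuild the object; a missing 'extras' becomes an empty dict."""
        return cls(
            code=doc["code"],
            name=doc["name"],
            active=doc.get("active", True),
            extras=dict(doc.get("extras", {})),
        )


course = Course("CRS0001", "Test", extras={"room": "A1", "maxSeats": 20})
round_trip = Course.from_document(course.to_document())
```

The typed fields keep compile-time (here: type-checker) support, while new attributes can appear in `extras` without a class change.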


PERFORMANCE


Document moving slows you

MongoDB allocates the space for a record taking a padding factor into account as well.

When an updated document no longer fits in its record space, it is moved.

A dynamic schema is the first cause of document moving.
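
A rough simulation shows why adding fields triggers moves while same-shape updates do not. Everything here is a simplification: `bson_size` covers only a few BSON types and stores every integer as int64, `Record` is a hypothetical model of a record slot, and the padding factor of 1.5 is an arbitrary value chosen for the example, not the server's actual allocation strategy.

```python
def bson_size(doc):
    """Size in bytes of a document, for a small subset of BSON types."""
    size = 4 + 1  # int32 length prefix + trailing 0x00
    for name, value in doc.items():
        size += 1 + len(name.encode("utf-8")) + 1  # type byte + C-string key
        if value is None:
            pass                    # null carries no value bytes
        elif isinstance(value, bool):
            size += 1
        elif isinstance(value, int):
            size += 8               # assumption: integers stored as int64
        elif isinstance(value, str):
            size += 4 + len(value.encode("utf-8")) + 1
        elif isinstance(value, dict):
            size += bson_size(value)
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
    return size


class Record:
    """Hypothetical record slot: document size times a padding factor."""

    def __init__(self, doc, padding_factor=1.5):
        self.doc = dict(doc)
        self.padding_factor = padding_factor
        self.allocated = int(bson_size(self.doc) * padding_factor)
        self.moves = 0

    def update(self, changes):
        self.doc.update(changes)
        needed = bson_size(self.doc)
        if needed > self.allocated:
            # The document no longer fits: relocate it to a larger slot.
            self.allocated = int(needed * self.padding_factor)
            self.moves += 1


record = Record({"code": "CRS0001", "active": True})   # 32 bytes, 48 allocated
record.update({"active": False})                       # same shape, fits
moves_after_same_shape = record.moves                  # 0
record.update({"description": "x" * 100})              # new field, 150 bytes
moves_after_new_field = record.moves                   # 1
```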


Preallocate fields?

Field preallocation can fix document-moving issues in some use cases.

Default values must be used to preallocate, and this must be handled in the application.

NULL is not a default value :-) because it has its own type.
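
The point about NULL can be checked with simple arithmetic on the BSON layout: a null element has no value bytes, so it reserves nothing for the number that will replace it. The sketch below assumes a hypothetical per-day statistics document with one counter per hour (`daily_stats`), and a reduced `bson_size` that handles only null, integers (assumed to be stored as int64) and subdocuments.

```python
def bson_size(doc):
    """Size in bytes for documents made of null, int and subdocuments only."""
    size = 4 + 1  # int32 length prefix + trailing 0x00
    for name, value in doc.items():
        size += 1 + len(name.encode("utf-8")) + 1  # type byte + C-string key
        if value is None:
            pass                    # null: no value bytes at all
        elif isinstance(value, int):
            size += 8               # assumption: integers stored as int64
        elif isinstance(value, dict):
            size += bson_size(value)
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
    return size


def daily_stats(day, default):
    """Hypothetical preallocated document: one counter per hour."""
    return {"day": day, "hits": {str(hour): default for hour in range(24)}}


zeros = daily_stats(20130912, 0)       # preallocated with a typed default
nulls = daily_stats(20130912, None)    # "preallocated" with null
zeros_before, nulls_before = bson_size(zeros), bson_size(nulls)

zeros["hits"]["13"] = 1500
nulls["hits"]["13"] = 1500

zeros_growth = bson_size(zeros) - zeros_before   # 0: updated in place
nulls_growth = bson_size(nulls) - nulls_before   # 8: the document grew
```

With a default of the final type the update leaves the size unchanged, which is what keeps the document inside its record space.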


Tuning updates and inserts

MongoDB stores BSON documents as a sequence of fields and values, not as a hash table.

Writing the first field of a document (or of a nested document) is considerably faster than writing the last.

An intra-document hierarchy can help with this issue.
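
Because fields are stored in sequence, reaching a field means walking past the ones before it. The following sketch counts the fields visited by such a linear scan, using insertion-ordered Python dictionaries as a stand-in for BSON; `fields_scanned` and the per-minute counter layout are assumptions made for the example.

```python
def fields_scanned(doc, path):
    """Count the fields a linear scan visits to reach a dotted path."""
    scanned = 0
    current = doc
    for part in path.split("."):
        for key in current:          # dicts preserve insertion order
            scanned += 1
            if key == part:
                current = current[key]   # descend into the matched field
                break
        else:
            raise KeyError(path)
    return scanned


# Flat layout: one field per minute of the day.
flat = {str(minute): 0 for minute in range(24 * 60)}

# Hierarchical layout: 24 hour subdocuments of 60 minute fields each.
nested = {
    str(hour): {str(minute): 0 for minute in range(60)}
    for hour in range(24)
}

flat_cost = fields_scanned(flat, "1439")      # 1440 fields visited
nested_cost = fields_scanned(nested, "23.59")  # 24 + 60 = 84 fields visited
```

In the nested layout a non-matching hour subdocument is skipped as a whole, so the worst case drops from 1440 visited fields to 84.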


Any questions?