mongodb as a nosql database - kit · 2014-09-05 · introduction marek krzysztof szuba (...
TRANSCRIPT
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
MongoDB as a NoSQL Database
Parinaz Ameri, Marek Szuba
GridKa School, 03.09.2014
1/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Introduction
Parinaz Ameri ([email protected])
PhD researcher in computer science at KIT
Working at DLCL Climatology of the LSDMA project
Working on two projects using MongoDB
Focus on geospatial / meteorological data
2/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Introduction
Marek Krzysztof Szuba ([email protected])
PhD in high-energy particle physics
Currently working at DLCL Climatology of LSDMA
Focus: visualisation of satellite climate dataMongoDB storage with a Web front-end
3/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Round of Introduction
Let us get to know you!
4/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Outline
1 Introduction
2 SQL vs. NoSQL
3 Getting started
4 CRUD
5 Indexing
6 Schema Design
7 MongoDB on the Web
8 Authentication, Authorisation and Encryption
9 Accounting
10 Replication
11 Sharding
5/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
ACID Transactions
DANGER
ACIDKEEP AWAY
Atomicity: �all or nothing�
no interruptions mid-transaction
Consistency: �valid data�
takes the database from one consistentstate to another consistent state
Isolation: �one after the other�
concurrent transactions will be treatedas if they were serial
Durability: �no loss�
ensures that any transaction committedto the database will not be lost
6/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Where Did the Problem Start?
SQLRDBMS and the Web 2.0 era (Big Data)
RDBMS and transactions (Scalability)
RDBMS and JOIN (Partitioning)
RDBMS and new hardware
7/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
CAP Theorem (Brewer, Lynch)
Consistency:Do all applications see all the same data?
Availability:Can I interact with the system in the presence of a failure?
Partitioning:If two segments of your system can not talk to each other, canthey progress by they own?
if yes, you sacri�ce consistencyif no, you sacri�ce availability
8/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
CAP Theorem (Brewer, Lynch)
Consistency:Do all applications see all the same data?
Availability:Can I interact with the system in the presence of a failure?
Partitioning:If two segments of your system can not talk to each other, canthey progress by they own?
if yes, you sacri�ce consistencyif no, you sacri�ce availability
8/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
CAP Theorem (Brewer, Lynch)
Consistency:Do all applications see all the same data?
Availability:Can I interact with the system in the presence of a failure?
Partitioning:If two segments of your system can not talk to each other, canthey progress by they own?
if yes, you sacri�ce consistencyif no, you sacri�ce availability
8/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
CAP Theorem (Brewer, Lynch)
Consistency:Do all applications see all the same data?
Availability:Can I interact with the system in the presence of a failure?
Partitioning:If two segments of your system can not talk to each other, canthey progress by they own?
if yes, you sacri�ce consistencyif no, you sacri�ce availability
8/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
CAP Theorem
9/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
ACID vs. BASE
BASE:
BasicAvailability
Soft-state
Eventual Consistency
Inconsistencyis the worst thing
Being Unavailable
is the worst thing
10/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
NoSQL Databases
Key/ValueDynamoRedis
ColumnBigTableHBase
Cassandra
DocumentMongoDBCouchDB
Riak
GraphNeo4jFockDB
Typical Usage:
Key/Value: distributed hash table, caching
Column families: distributed data storage
Document Store: Web applications
Graph databases: social networking, Recommendations
11/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Intro to Mongo
MongoDB from "humongous"
Open source, written in C++
Developed by company: MongoDB Inc. (formerly 10gen)
First release in 2009
Last released stable version is 2.6.4
mongo shell � an interactive JavaScript shell
Both MongoDB and its drivers are available under free licenses
Commercial support, MongoDB Enterprise
12/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
MongoDB: Overview
Document-oriented database system:
schema-lesskey-value storeJSON documents instead of rows of a table:{ mykey: myvalue, answer: 42,myarray: [ "red", "green", "blue" ] }
Storage in BSON (short for Binary JSON):
a serialization format used to store documents and makeremote procedure calls in MongoDB
Excellent documentation:
www.mongodb.org
Web pages, videos, workshops, training, newsletters, . . .
13/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
MongoDB: Overview
Document-oriented database system:
schema-lesskey-value storeJSON documents instead of rows of a table:{ mykey: myvalue, answer: 42,myarray: [ "red", "green", "blue" ] }
Storage in BSON (short for Binary JSON):
a serialization format used to store documents and makeremote procedure calls in MongoDB
Excellent documentation:
www.mongodb.org
Web pages, videos, workshops, training, newsletters, . . .
13/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
MongoDB: Overview
Document-oriented database system:
schema-lesskey-value storeJSON documents instead of rows of a table:{ mykey: myvalue, answer: 42,myarray: [ "red", "green", "blue" ] }
Storage in BSON (short for Binary JSON):
a serialization format used to store documents and makeremote procedure calls in MongoDB
Excellent documentation:
www.mongodb.org
Web pages, videos, workshops, training, newsletters, . . .
13/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Terminology
RDBMS Terms/concepts MongoDB Terms/concepts
database database
collection
document
field
table
row
column
Join Embedding and Linking
Partition Shard
14/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Installation and Con�guration
To install:
get repositories, packages or source code fromhttp://www.mongodb.org
Red Hat/Fedora/CentOS:yum install mongodb-org-server (from repo)rpm -ihv mongodb-org-server-2.6.1-2.x86_64.rpm (from �le)
Ubuntu/Debian:apt-get install mongodb-org (from repo)dpkg -i mongodb-org_2.6.1_amd64.deb (from �le)
To con�gure:
edit /etc/mongod.con�g
Start the mongo service with e.g.:
service mongod start or systemctl start mongodb.service
15/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
ObjectID
ObjectID is a 12-byte BSON type, constructed using:
a 4-byte timestamp,
a 3-byte machine identi�er,
a 2-byte process id, and
a 3-byte counter, starting with a random value.
MongoDB uses ObjectId values as the default values for _id �elds{"_id" : ObjectId("52308273355b531d2aeb768b"),"loc" : {
"lat" : 68.91,"lon" : 46.98},
"id" : "MLS-Aura_L2GP-O3_v03-30-c01_2005d001.0","time" : ISODate("2004-12-31T23:59:41Z")}
16/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
CreateRUD
1 db.collection.insert()
insert one or more documents into a collectionThe primary key_id is automatically added if _id �eld is notspeci�ed
2 db.collection.save()
depending on the document parameters, insert a newdocument or update an existing one
3 mongoimport/mongoexport
import content from a JSON, CSV, or TSV created bymongoexporta single threaded methodinserts one document at a time
4 mongodump/mongorestore
create a binary export of the database content/use a binarydump to import to databasea backup method
17/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Create in Perspective
db.authors.insert({id: 2,�rst_name: "Arthur",last_name: "Miller",age: 25,})
CREATE TABLE authors(id MEDIUMINT NOT NULLAUTO_INCREMENT,user_id Varchar(30),�rst_name char(15),last_name char(15),age Integer,PRIMARY KEY (id))
18/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
CReadUD
db.collection.�nd(criteria, projection)
select documents in a collectionreturn a cursor (a pointer to the result set of a query)db.collection.�ndone() is its limited version
Cursor methods:
count(), limit(), skip(), snapshot(), sort(),batchSize(), explain(), hint(), forEach(), ...db.collection.�nd().count()
e.g. db.cities.�nd({"city": "KARLSRUHE"})
19/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Conditional Operators
Comparison Query Operators
$gt, $gte, $lt, $lte, $ne, $all, $in, $nin
Logical Query Operators
$and, $or, $nor, $not
Element Query Operators
$exists, $type
Geospatial Query Operators
$geoWithin, $centerSphere
e.g.db. cities.�nd({"loc": { $geoWithin: { $center: [[8.4, 49.017 ],0.1] }}})
20/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Read in Perspective
db.authors.�nd({last_name:"Miller"})SELECT *FROM authorsWHERE last_name = �Miller�
21/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
CRUpadateD
1 db.collection.update(query, update, options):
modify existing documentsbased on query speci�cations can modify a �eld in or thewhole documentoptions are:
upsert: boolean insert new document if no match existsmulti: booleanwriteConcerns: document concern levels: (unacknowledged,acknowledged, journaled, ...)
2 db.collection.save()
if the document contains _id �eldsave will call update, with upsert option
22/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Atomic Modi�ers
Field operators:
$inc, $mul, $set, $unset,$rename
Array operators:
$, $push, $pull
db.books.update({ item: "Divine Comedy" },{ $set: { price: 18 },$inc: { stock: 5 }})
23/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
CRUDelete
1 db.collection.remove(query, options):
removes all documents from a collection, w/o indexesoptions are:
justOne booleanwriteConcerns
2 db.collection.drop()
drops the entire collection, including the indexes
24/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Delete in Perspective
db.authors.remove({ _id: 1})DELETE FROM authorsWHERE id = �1"
25/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Index Introduction
Index is special data structure at collection level.
Using indexes ensures that MongoDB only scans smallestpossible number of documents.
Any �eld or sub-�eld of documents in a collection can beindexed.
No more than 64 indexes per collection are possible. (MS SQLserver supports 1000 per table)
MongoDB does not support clustered indexes.
26/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Index Introduction
Index is special data structure at collection level.
Using indexes ensures that MongoDB only scans smallestpossible number of documents.
Any �eld or sub-�eld of documents in a collection can beindexed.
No more than 64 indexes per collection are possible. (MS SQLserver supports 1000 per table)
MongoDB does not support clustered indexes.
26/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Index Introduction
Index is special data structure at collection level.
Using indexes ensures that MongoDB only scans smallestpossible number of documents.
Any �eld or sub-�eld of documents in a collection can beindexed.
No more than 64 indexes per collection are possible. (MS SQLserver supports 1000 per table)
MongoDB does not support clustered indexes.
26/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Index Introduction
Index is special data structure at collection level.
Using indexes ensures that MongoDB only scans smallestpossible number of documents.
Any �eld or sub-�eld of documents in a collection can beindexed.
No more than 64 indexes per collection are possible. (MS SQLserver supports 1000 per table)
MongoDB does not support clustered indexes.
26/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Index Introduction
Index is special data structure at collection level.
Using indexes ensures that MongoDB only scans smallestpossible number of documents.
Any �eld or sub-�eld of documents in a collection can beindexed.
No more than 64 indexes per collection are possible. (MS SQLserver supports 1000 per table)
MongoDB does not support clustered indexes.
26/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Index Behaviour
All indexes are B-trees in MongoDB � support both equalityand range-based queries.
Index stores items internally in order, stored by the value of�eld.
Order of indexes can be ascending or descending.
MongoDB may traverse an index in either direction.
No scanning of any documents when the query and itsprojection are included in index.
27/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Index Behaviour
All indexes are B-trees in MongoDB � support both equalityand range-based queries.
Index stores items internally in order, stored by the value of�eld.
Order of indexes can be ascending or descending.
MongoDB may traverse an index in either direction.
No scanning of any documents when the query and itsprojection are included in index.
27/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Index Behaviour
All indexes are B-trees in MongoDB � support both equalityand range-based queries.
Index stores items internally in order, stored by the value of�eld.
Order of indexes can be ascending or descending.
MongoDB may traverse an index in either direction.
No scanning of any documents when the query and itsprojection are included in index.
27/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Index Behaviour
All indexes are B-trees in MongoDB � support both equalityand range-based queries.
Index stores items internally in order, stored by the value of�eld.
Order of indexes can be ascending or descending.
MongoDB may traverse an index in either direction.
No scanning of any documents when the query and itsprojection are included in index.
27/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Index Behaviour
All indexes are B-trees in MongoDB � support both equalityand range-based queries.
Index stores items internally in order, stored by the value of�eld.
Order of indexes can be ascending or descending.
MongoDB may traverse an index in either direction.
No scanning of any documents when the query and itsprojection are included in index.
27/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Index Handling in MongoDB
Building an index on a collection:
db.collection.createIndex()
Creating an index if it does not currently exists:
db.collection.ensureIndex()
Getting description of the existing indexes:
db.germancities.getIndexes()
Having a report on the query execution plan, including index use:
cursor.explain()
Forcing MongoDB to use a speci�c index for a query:
cursor.hint()
28/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Index Types(1)
1 Default _id:
All document have a generated index with ObjectId value forthis �eld_id unique index prevents users to generate two documentswith same values
2 Single-Key index:
Supports user-de�ned index on a single keydb.books.ensureIndex({ "author": 1 } )
29/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Index Types(2)
3 Compound Indexes
To support several di�erent queriesIf some queries use x key and some queries use x key combinedwith y keydb.books.ensureIndex( { "author": 1, "price": 1 } )A single compound index on multiple �elds can support all thequeries that search a �pre�x� subset of those �elds.Maximum 31 �elds can be combined in one compound index.Two important factors play their role:
list order (i.e. the order in which the keys are listed in theindex)sort order (i.e. ascending or descending)
30/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Compound Indexes
If the collection orders has the following compound index:{ shell: 1, ord_date: -1 }The compound index can support the following queries:
db.orders.�nd({ shell: { $in: ["B", "S" ] } } )
db.orders.�nd( {shell: {$in:[ "S", "B" ] },ord_date: { $gt: new Date("2014-02-01") } })
But not the following two queries:
db.orders.�nd({ ord_date: { $gt: new Date("2014-09-07") }} )
db.orders.�nd( { } ).sort({ ord_date: 1 } )
31/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Index Intersection
Before version 2.6, only a single index to ful�ll most queries.
The exception was just for $or clauses, which could use asingle index for each $or clause.
Intersection can employ multiple/nested index intersections toresolve a query.
Note: Index intersection does not apply when the sort()operation requires an index completely separate from the querypredicate.
32/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Intersection and Sort
The orders collection has the following indexes:
{ qty: 1 }
{ shell: 1, ord_date: -1 }
{ shell: 1 }
{ ord_date: -1 }
MongoDB cannot use index intersection for the following querywith sort:
db.orders.�nd({ qty: { $gt: 10 } } ).sort({ shell: 1 } )
MongoDB can use index intersection for the following query andful�ll part of the query predicate:
db.orders.�nd( { qty: { $gt: 10 } , shell: "B" } ).sort({ord_date: -1 } )
33/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Index Types(3)
4 Multikey Index:
index contents of arrays.separate index entries for every element of the array.Automatically generated for array �elds.
5 Geospatial Index:
To support e�cient queries of geospatial coordinate data.2d index support two dimensional geometry queries2dspher index support there dimensional geometry queries
6 Text Indexes:
supports searching for string content in a collection.
7 Hash Indexes:
more random distribution of values along their range.Just equality matches, not range-based queries
34/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Index Properties
Unique Indexes:
causes MongoDB to reject all documents that contain aduplicate value for the indexed �eld.db.members.ensureIndex( { "usr_id": 1 }, { unique: true } )
Sparse Indexes:
Only contains documents that contains indexed �led. (even ifit contains null value)db.addresses.ensureIndex( { "usr_tel": 1 }, { sparse: true } )
35/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Covered Query
Covered Query:
all the �elds in the query are part of an index, and
all the �elds returned in the results are in the same index.
Note: Try to Create Indexes that Support Covered Queries
36/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Covered Query Bene�ts
Querying the index can be much faster than document that are notin index, because:
Index keys are smaller than the documents they catalog.
Indexes are usually either available in RAM or locatedsequentially on disk.
37/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Covered Query Checking
To determine covered query:
use cursor.explain()
indexOnly: true covered query
Covered query cannot support:
multi-key index (If an indexed �eld is an array)
any of the indexed �elds are �elds in subdocuments.
38/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Index Strategy
Depends on:
kinds of queries you expect
the ratio of reads to writes
amount of free memory on the system
Best Index Design Strategy:
Pro�le variety of index con�gurations
39/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Traditional Schema Design
Put the data into First Normal Form (NF1)
Putting data in �xed tablesEach cell should have just a single scalar valueRemove redundancy from a set of tables
40/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Denormalization and Performance
What is the problem?
Seek takes over 99% of the the time spent reading a row.JOINs typically require random seeks.
41/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
MongoDB and Schema Design
Has performance bene�ts of redundant denormalized format
Complicated schema design process
Decide if you have to Embed or Reference
General answer to schema design problem with MongoDB:
It depends!!!
42/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Embedding Bene�ts
Read Performance:
data locality: documents are stored continuously on diskone seek to load the entire documentone round trip to the database
Atomicity and Isolation:
with Mongo not supporting multidocument transactionsguarantee consistency
43/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Embedding Bene�ts
Read Performance:
data locality: documents are stored continuously on diskone seek to load the entire documentone round trip to the database
Atomicity and Isolation:
with Mongo not supporting multidocument transactionsguarantee consistency
43/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Embedding Penalties
Write operations may take long.
The larger a document is, the more RAM it uses.
Growing documents must eventually get copied to largerspaces.
MongoDB documents have a hard size limit of 16 MB.
but: GridFS
44/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Embedding Penalties
Write operations may take long.
The larger a document is, the more RAM it uses.
Growing documents must eventually get copied to largerspaces.
MongoDB documents have a hard size limit of 16 MB.
but: GridFS
44/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Embedding Penalties
Write operations may take long.
The larger a document is, the more RAM it uses.
Growing documents must eventually get copied to largerspaces.
MongoDB documents have a hard size limit of 16 MB.
but: GridFS
44/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Embedding Penalties
Write operations may take long.
The larger a document is, the more RAM it uses.
Growing documents must eventually get copied to largerspaces.
MongoDB documents have a hard size limit of 16 MB.
but: GridFS
44/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Embedding Penalties
Write operations may take long.
The larger a document is, the more RAM it uses.
Growing documents must eventually get copied to largerspaces.
MongoDB documents have a hard size limit of 16 MB.
but: GridFS
44/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Where to use Linking?
Referencing for �exibility:
if your application may query data in many di�erent waysif you are not able to anticipate the patterns in which datamay be queried
45/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Embedding vs. Linking
Decide based on more READs or more WRITEs that yourapplication may have.
46/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Polymorphic Schema
Schemaless: not enforcing a particular structure ondocuments in a collection
Polymorphic schema: all documents in a collection hassimilar, but not identical structure
47/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Polymorphism
Support Object-Oriented Programming
Enable Schema Evolution
Its major drawback is storage e�ciency
48/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Databases in Web ApplicationsA Bit of History
The LAMP stack:
Linux, Apache, MySQL and PHP
Established work horse of Web applications worldwide
Free and Open Source
Fast
Simple to deploy and maintain
but
Three di�erent languages: JavaScript, PHP, SQL
Data always as tables
Problems scaling to big-data applications
49/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Databases in Web ApplicationsEnter MongoDB
Alternative solution: the MEAN stack
MongoDBExpress � Node.js Web-application frameworkAngularJS � JavaScript Model-View-Whatever frameworkNode.js � high-performance server platform for Webapplications
Flexible data structure
Optimised for big data and high load
One language throughout the stack
Free and Open Source
50/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
MongoDB and Node
O�cial native Node.js driver since 2012
Several higher-level wrappers. Examples:
Mongoose � o�cial, extensive Object Document MapperMongoskin � lightweight, schema-less, similar to pymongo
usage similar to the mongo shellasynchronous: methods take callback functions
51/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Example: MongoDB REST API
Problem: cannot talk to MongoDB from Web browser
Solution: provide a representational state-transfer API
Intermediate Node.js Web server
Mongoskin to talk to the database
Express to provide REST plumbing
Client I/O with JSON
52/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Example: MongoDB REST API
Goal: map HTTP requests for URIs likehttp://myMongo/collections/myColl/docId to MongoDB CRUD
With Express, just a few lines of code:
initialisationinclude JSON body parserde�ne handlers for appropriate URIs and HTTP verbsstart listening for requests!
Example code: https: // github. com/ mkszuba/ mongoREST. git
53/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Overview
Default con�guration:
no user authorisation � full access for all connections
full trust between cluster members
plain-text network communication
Unsuitable for production systems!
54/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Encryption
Data transfer: supports SSL
�sslMode on the command line, net.ssl.mode in con�gurationdi�erent degrees of enforcementvalid certi�cate(s) requirednot available in binary builds from mongodb.orb
Data at rest: not encrypted
use third-party solutions, e.g. LUKS or BitLocker
55/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Authorisation
Role-Based Access Control
an user assigned to one or more rolesa role contains a set of privilegesa privilege: a resource plus a set of allowed actionsa resource: collection, set of collections, database or cluster
Some built-in roles:
user access: read, readWritedatabase administration: dbAdmin, userAdmincluster administration: clusterManagerbackup, restoreroot
Custom roles possible
Users are added per-database but can have roles spanningother databases
56/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Authorisation
To enable:
pass �auth on the command line,set security.authorization in con�guration, orcreate a key �le, set security.keyFile in con�guration
Disables anonymous access
localhost exception: local access and no users de�ned
also applies to shards
57/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Authentication
Two modes
user accesscluster membership
Several mechanisms available
standard: challenge-response, X.509 certi�catesEnterprise: Kerberos, LDAP
One user per connection
58/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Challenge-Response Authentication
Users: validate password against user database
Clusters: same key �le on di�erent machines
contains a (preferably random) key 6�1024 bytes long, Base64character set
Transmits modi�ed MD5 password hash
SSL recommended � hash protected against replay attacksbut could be brute-forced
59/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
X.509 Authentication
As seen on the Web and elsewhere
SSL required
Users: certi�cate subject registered in user database
Clusters: unique certi�cate and key for each machine, in aPEM �le
Constraints on certi�cates
users: appropriate key-usage dataclusters: same CA for all members, subject must match hostnamemustn't re-use subject for user and cluster authentication
60/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Accounting
monitor database use, performance studies etc.
standard: basic logging, journal monitor, pro�ling
Enterprise: full event audit system
61/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Accounting
Journal Monitor
Journalling � a mechanism for ensuring data consistency
Provides basic monitoring features: serverStatus,journalLatencyTest
Useful for assessing overall performance
Pro�ling
estimate performance/cost of database operations
enabled per-instance or per-database
stores results in the system.pro�le collection
62/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Replication
Database Replication ensures:
redundancy
backup
automatic failover
Replication occurs through groups of servers known asreplica sets.
63/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Replica Set Members
Primary/Master:
The only node that accepts writesThe default read node
Secondary/Slave:
The read-only membersReplicate from primary
Arbiter:
Doesn't replicate a copy of data setCannot become PrimaryMay be used to add votes for reelection process
64/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Primary and Oplog
Primary saves all write operations in its OplogOplog is a capped collection which records all of themodi�cationsOplog by default usually 5% of disk spaceSecondaries copy the Oplog from Primary asynchronously
Writes Reads
Arbiter
Secondary Secondary
Secondary
Primary
Client Application/ Driver
Replication
Replication
Replication
65/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Secondary and Read Preferences
Read Preferences:
Primary
PrimaryPreferred
Secondary
SecondaryPreferred
Nearest
Client applications can be con�gured with a Read Preference, on a:
per-connection,
per-collection,
or per-operation basis.
66/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Election Process
Election starts every time there isn't any Primary.
Arbiter
Secondary Secondary
Secondary
Primary
Heartbeat
Heartbeat
Heartbeat
Heartbeat
Heartbeat
67/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
E�ective Factors
Priority:
Members will vote for the node with highest prioritySet priority to 0 to prevent Secondary to ever become Primary
Optime:
Timestamp of last operation applied from Oplog for eachmemberThe member with the most recent optime from all visiblemembers will get elected
Connections:
Primary candidate must be able to connect with majority ofthe membersMajority: total number of accessible voting membersChoose number of Replica set members by considering faulttolerance.
68/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Arbiter and Election
Maximum seven voting Members.
Use Arbiter to get odd number of members.
Arbiter
Secondary Secondary
Heartbeat
Heartbeat
Heartbeat
Heartbeat
Primary
69/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Primary/Secondary vs. Master/Slave
Primary/Secondary is the preferred MongoDB replicationmethod.
Primary/Secondary can not support more than 12 nodes.
Master/Slave doesn't support automatic failover.
70/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Introduction to Sharding
Automatic horizontal partitioning and management
No downtime is needed to convert to a sharded system
Fully consistent
No code changes required
At collection level
71/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Sharding Components
Con�g server: holds a complete copy of cluster metadata
Mongos: a lightweight routing service
Shards:
mongod: the primary daemon process for MongoDB systemreplica set
72/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
Sharding
client
mongos mongos
mongod
mongod
mongod
mongod
Config Srv
Config Srv
Config Srv
Replica Set Replica Set
Shard Shard
73/75 Parinaz Ameri, Marek Szuba 03.09.2014
Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding
References
1 www.mongodb.org
2 MongoDB Applied Design Patterns, Rick Copeland, 2013
74/75 Parinaz Ameri, Marek Szuba 03.09.2014