mongodb as a nosql database - kit · 2014-09-05 · introduction marek krzysztof szuba (...

Page 1: MongoDB as a NoSQL Database - KIT · 2014-09-05 · Introduction Marek Krzysztof Szuba ( marek.szuba@kit.edu ) PhD in high-energy particle physics Currently working at DLCL Climatology

Intro (No)SQL Start CRUD Indexing Schemas Web Apps AAE Accounting Replication Sharding

MongoDB as a NoSQL Database

Parinaz Ameri, Marek Szuba

GridKa School, 03.09.2014

1/75 Parinaz Ameri, Marek Szuba 03.09.2014

Introduction

Parinaz Ameri ([email protected])

PhD researcher in computer science at KIT

Working at DLCL Climatology of the LSDMA project

Working on two projects using MongoDB

Focus on geospatial / meteorological data

Introduction

Marek Krzysztof Szuba (marek.szuba@kit.edu)

PhD in high-energy particle physics

Currently working at DLCL Climatology of LSDMA

Focus: visualisation of satellite climate data

MongoDB storage with a Web front-end

Round of Introduction

Let us get to know you!

Outline

1 Introduction

2 SQL vs. NoSQL

3 Getting started

4 CRUD

5 Indexing

6 Schema Design

7 MongoDB on the Web

8 Authentication, Authorisation and Encryption

9 Accounting

10 Replication

11 Sharding

ACID Transactions

Atomicity: "all or nothing"

no interruptions mid-transaction

Consistency: "valid data"

takes the database from one consistent state to another consistent state

Isolation: "one after the other"

concurrent transactions will be treated as if they were serial

Durability: "no loss"

ensures that any transaction committed to the database will not be lost

Where Did the Problem Start?

SQL RDBMS and the Web 2.0 era (Big Data)

RDBMS and transactions (Scalability)

RDBMS and JOIN (Partitioning)

RDBMS and new hardware

CAP Theorem (Brewer, Lynch)

Consistency: Do all applications see the same data?

Availability: Can I interact with the system in the presence of a failure?

Partitioning: If two segments of your system cannot talk to each other, can they progress on their own?

if yes, you sacrifice consistency

if no, you sacrifice availability

ACID vs. BASE

BASE:

Basic Availability

Soft state

Eventual consistency

ACID: inconsistency is the worst thing

BASE: being unavailable is the worst thing

NoSQL Databases

Key/Value: Dynamo, Redis

Column: BigTable, HBase, Cassandra

Document: MongoDB, CouchDB, Riak

Graph: Neo4j, FlockDB

Typical Usage:

Key/Value: distributed hash table, caching

Column families: distributed data storage

Document Store: Web applications

Graph databases: social networking, Recommendations

Intro to Mongo

MongoDB from "humongous"

Open source, written in C++

Developed by company: MongoDB Inc. (formerly 10gen)

First release in 2009

Latest stable release at the time of this talk: 2.6.4

mongo shell: an interactive JavaScript shell

Both MongoDB and its drivers are available under free licenses

Commercial support, MongoDB Enterprise

MongoDB: Overview

Document-oriented database system:

schema-less

key-value store

JSON documents instead of rows of a table:

{ mykey: myvalue, answer: 42, myarray: [ "red", "green", "blue" ] }

Storage in BSON (short for Binary JSON):

a serialization format used to store documents and make remote procedure calls in MongoDB

Excellent documentation:

www.mongodb.org

Web pages, videos, workshops, training, newsletters, . . .

Terminology

RDBMS terms/concepts    MongoDB terms/concepts
database                database
table                   collection
row                     document
column                  field
join                    embedding and linking
partition               shard

Installation and Configuration

To install:

get repositories, packages or source code from http://www.mongodb.org

Red Hat/Fedora/CentOS:
yum install mongodb-org-server (from repo)
rpm -ihv mongodb-org-server-2.6.1-2.x86_64.rpm (from file)

Ubuntu/Debian:
apt-get install mongodb-org (from repo)
dpkg -i mongodb-org_2.6.1_amd64.deb (from file)

To configure:

edit /etc/mongod.conf

Start the mongo service with e.g.:

service mongod start or systemctl start mongodb.service

ObjectID

ObjectID is a 12-byte BSON type, constructed using:

a 4-byte timestamp,

a 3-byte machine identifier,

a 2-byte process id, and

a 3-byte counter, starting with a random value.
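The byte layout listed above can be checked by slicing the 24-character hexadecimal form of an ObjectId. The sketch below is plain JavaScript rather than a MongoDB API: `parseObjectId` and the names of the fields it returns are made up for illustration, and the sample value is the ObjectId used in this deck's example document.

```javascript
// Split a 24-character ObjectId hex string into the four parts described
// on the slide: 4-byte timestamp, 3-byte machine identifier,
// 2-byte process id, 3-byte counter.
// parseObjectId is a hypothetical helper, not part of any MongoDB driver.
function parseObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hexadecimal characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // seconds since the Unix epoch
  return {
    timestampSeconds: seconds,
    created: new Date(seconds * 1000),
    machine: hex.slice(8, 14),                  // 3 bytes, kept as hex
    processId: parseInt(hex.slice(14, 18), 16), // 2 bytes
    counter: parseInt(hex.slice(18, 24), 16),   // 3 bytes
  };
}

const parts = parseObjectId("52308273355b531d2aeb768b");
console.log(parts.created.toISOString(), parts.machine, parts.processId);
```

Because the timestamp comes first, sorting documents by such an _id roughly orders them by creation time.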

MongoDB uses ObjectId values as the default values for _id fields:

{
  "_id" : ObjectId("52308273355b531d2aeb768b"),
  "loc" : { "lat" : 68.91, "lon" : 46.98 },
  "id" : "MLS-Aura_L2GP-O3_v03-30-c01_2005d001.0",
  "time" : ISODate("2004-12-31T23:59:41Z")
}

CRUD: Create

1 db.collection.insert()

insert one or more documents into a collection

The primary key _id is automatically added if the _id field is not specified

2 db.collection.save()

depending on the document parameters, insert a new document or update an existing one

3 mongoimport/mongoexport

import content from a JSON, CSV, or TSV file created by mongoexport

a single-threaded method

inserts one document at a time

4 mongodump/mongorestore

create a binary export of the database content / use a binary dump to import into a database

a backup method

Create in Perspective

MongoDB:

db.authors.insert({
  id: 2,
  first_name: "Arthur",
  last_name: "Miller",
  age: 25
})

SQL:

CREATE TABLE authors (
  id MEDIUMINT NOT NULL AUTO_INCREMENT,
  user_id VARCHAR(30),
  first_name CHAR(15),
  last_name CHAR(15),
  age INTEGER,
  PRIMARY KEY (id)
)

CRUD: Read

db.collection.find(criteria, projection)

select documents in a collection

return a cursor (a pointer to the result set of a query)

db.collection.findOne() is the variant that returns a single document instead of a cursor

Cursor methods:

count(), limit(), skip(), snapshot(), sort(), batchSize(), explain(), hint(), forEach(), ...

db.collection.find().count()

e.g. db.cities.find({ "city": "KARLSRUHE" })

Conditional Operators

Comparison Query Operators

$gt, $gte, $lt, $lte, $ne, $all, $in, $nin

Logical Query Operators

$and, $or, $nor, $not

Element Query Operators

$exists, $type

Geospatial Query Operators

$geoWithin, $centerSphere

e.g. db.cities.find({ "loc": { $geoWithin: { $center: [ [ 8.4, 49.017 ], 0.1 ] } } })
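To make the operator semantics concrete, here is a minimal in-memory sketch of how a few comparison and logical operators select documents. It is plain JavaScript, not MongoDB's implementation: `matches` is a hypothetical helper that covers only $gt, $gte, $lt, $lte, $ne, $in, $nin, $and and $or on top-level fields, and the `pop` values in the sample data are invented.

```javascript
// Hypothetical helper: evaluates a small subset of MongoDB-style query
// operators against a plain object. Only top-level fields are handled.
const comparators = {
  $gt: (v, arg) => v > arg,
  $gte: (v, arg) => v >= arg,
  $lt: (v, arg) => v < arg,
  $lte: (v, arg) => v <= arg,
  $ne: (v, arg) => v !== arg,
  $in: (v, arg) => arg.includes(v),
  $nin: (v, arg) => !arg.includes(v),
};

function matches(doc, query) {
  return Object.entries(query).every(([key, cond]) => {
    if (key === "$and") return cond.every((q) => matches(doc, q));
    if (key === "$or") return cond.some((q) => matches(doc, q));
    const value = doc[key];
    if (cond !== null && typeof cond === "object" && !Array.isArray(cond)) {
      // Operator document such as { $gt: 10, $lt: 20 }
      return Object.entries(cond).every(([op, arg]) => {
        if (!(op in comparators)) throw new Error("unsupported operator " + op);
        return comparators[op](value, arg);
      });
    }
    return value === cond; // plain equality, e.g. { city: "KARLSRUHE" }
  });
}

// Invented sample data
const cities = [
  { city: "KARLSRUHE", pop: 300000 },
  { city: "STUTTGART", pop: 600000 },
  { city: "ETTLINGEN", pop: 40000 },
];

const large = cities.filter((c) => matches(c, { pop: { $gte: 100000 } }));
const picked = cities.filter((c) =>
  matches(c, { $or: [{ city: "ETTLINGEN" }, { pop: { $gt: 500000 } }] })
);
console.log(large.map((c) => c.city), picked.map((c) => c.city));
```

Several operators inside one operator document are combined with a logical AND, which is why `{ pop: { $gt: 10, $lt: 20 } }` expresses a range.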

Read in Perspective

MongoDB:

db.authors.find({ last_name: "Miller" })

SQL:

SELECT * FROM authors WHERE last_name = 'Miller'

CRUD: Update

1 db.collection.update(query, update, options):

modify existing documents

based on the query specification, can modify a field in a document or the whole document

options are:

upsert: boolean; insert a new document if no match exists

multi: boolean; update all matching documents instead of only the first one

writeConcern: document; concern levels: unacknowledged, acknowledged, journaled, ...

2 db.collection.save()

if the document contains an _id field, save will call update with the upsert option
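The relationship between save() and update() with the upsert option can be sketched with an in-memory collection. This is an illustrative model only: the `Collection` class, its Map-based storage, the `updateById` method and the counter that stands in for ObjectId generation are all assumptions made for the example, not a description of how MongoDB stores data.

```javascript
// Toy in-memory model (hypothetical) of insert / update-with-upsert / save.
class Collection {
  constructor() {
    this.docs = new Map(); // _id -> document
    this.nextId = 1;       // stand-in for ObjectId generation
  }

  insert(doc) {
    const _id = doc._id !== undefined ? doc._id : this.nextId++;
    if (this.docs.has(_id)) throw new Error("duplicate _id " + _id);
    this.docs.set(_id, { ...doc, _id });
    return _id;
  }

  // Replace the document with the given _id; with upsert, insert it if missing.
  updateById(_id, doc, { upsert = false } = {}) {
    if (this.docs.has(_id) || upsert) {
      this.docs.set(_id, { ...doc, _id });
      return true;
    }
    return false;
  }

  // save(): no _id -> insert; _id present -> update with upsert.
  save(doc) {
    if (doc._id === undefined) return this.insert(doc);
    this.updateById(doc._id, doc, { upsert: true });
    return doc._id;
  }
}

const authors = new Collection();
const id = authors.save({ last_name: "Miller" });        // no _id: inserted
authors.save({ _id: id, last_name: "Miller", age: 25 }); // _id exists: replaced
authors.save({ _id: 42, last_name: "Adams" });           // _id unknown: upserted
console.log(authors.docs.size, authors.docs.get(id).age);
```

The third call shows why the upsert option matters: without it, saving a document with an unknown _id would silently do nothing.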

Atomic Modifiers

Field operators:

$inc, $mul, $set, $unset, $rename

Array operators:

$, $push, $pull

db.books.update(
  { item: "Divine Comedy" },
  { $set: { price: 18 }, $inc: { stock: 5 } }
)
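As a rough illustration of what the field operators do to a single document, the sketch below applies $set, $unset, $inc, $mul and $rename to a plain JavaScript object. `applyUpdate` is a hypothetical helper that handles top-level fields only, and it says nothing about how the server makes the operation atomic. The starting price and stock values are invented.

```javascript
// Hypothetical helper: applies a subset of MongoDB-style field update
// operators to a copy of the document and returns the copy.
function applyUpdate(doc, update) {
  const result = { ...doc }; // leave the input untouched
  for (const [op, fields] of Object.entries(update)) {
    for (const [field, arg] of Object.entries(fields)) {
      const current = field in result ? result[field] : 0;
      switch (op) {
        case "$set":
          result[field] = arg;
          break;
        case "$unset":
          delete result[field];
          break;
        case "$inc":
          result[field] = current + arg;
          break;
        case "$mul":
          result[field] = current * arg;
          break;
        case "$rename":
          if (field in result) {
            result[arg] = result[field];
            delete result[field];
          }
          break;
        default:
          throw new Error("unsupported operator " + op);
      }
    }
  }
  return result;
}

// Invented starting values
const book = { item: "Divine Comedy", price: 20, stock: 10 };
const updated = applyUpdate(book, { $set: { price: 18 }, $inc: { stock: 5 } });
console.log(updated);
```

The point of the operators is that the client sends the change itself ("add 5 to stock") rather than a value it read earlier, so two concurrent updates do not overwrite each other's result.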

CRUD: Delete

1 db.collection.remove(query, options):

removes the documents matching the query from a collection (all documents if the query is empty), leaving the indexes in place

options are:

justOne: boolean

writeConcern

2 db.collection.drop()

drops the entire collection, including the indexes

Delete in Perspective

MongoDB:

db.authors.remove({ _id: 1 })

SQL:

DELETE FROM authors WHERE id = 1

Index Introduction

An index is a special data structure defined at the collection level.

Using indexes lets MongoDB scan the smallest possible number of documents.

Any field or sub-field of the documents in a collection can be indexed.

No more than 64 indexes per collection are possible. (MS SQL Server supports 1000 per table.)

MongoDB does not support clustered indexes.

Index Behaviour

All indexes in MongoDB are B-trees, which support both equality and range-based queries.

An index stores its items internally in order, sorted by the value of the field.

The order of an index can be ascending or descending.

MongoDB may traverse an index in either direction.

No documents need to be scanned when the query and its projection are both covered by the index.

Index Handling in MongoDB

Building an index on a collection:

db.collection.createIndex()

Creating an index if it does not currently exist:

db.collection.ensureIndex()

Getting description of the existing indexes:

db.germancities.getIndexes()

Having a report on the query execution plan, including index use:

cursor.explain()

Forcing MongoDB to use a specific index for a query:

cursor.hint()

Index Types (1)

1 Default _id:

All documents have an automatically generated index on the _id field, which holds an ObjectId value by default

The unique _id index prevents users from creating two documents with the same _id value

2 Single-Key index:

Supports a user-defined index on a single key

db.books.ensureIndex({ "author": 1 })

Index Types (2)

3 Compound Indexes

To support several different queries

If some queries use key x and some queries use key x combined with key y

db.books.ensureIndex({ "author": 1, "price": 1 })

A single compound index on multiple fields can support all the queries that search a "prefix" subset of those fields.

A maximum of 31 fields can be combined in one compound index.

Two important factors play a role:

list order (i.e. the order in which the keys are listed in the index)

sort order (i.e. ascending or descending)

Compound Indexes

If the collection orders has the following compound index:

{ shell: 1, ord_date: -1 }

The compound index can support the following queries:

db.orders.find({ shell: { $in: [ "B", "S" ] } })

db.orders.find({ shell: { $in: [ "S", "B" ] }, ord_date: { $gt: new Date("2014-02-01") } })

But not the following two queries:

db.orders.find({ ord_date: { $gt: new Date("2014-09-07") } })

db.orders.find({ }).sort({ ord_date: 1 })
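The prefix rule behind these four examples can be written down as a small check. The sketch below is a deliberate simplification in plain JavaScript: `usesIndexPrefix` is a hypothetical helper that only asks whether the distinct field names used by a query (or sort) are exactly the leading keys of the index. It ignores sort direction and everything else a real query planner takes into account.

```javascript
// Returns true if the given (distinct) field names are exactly the first k
// keys of the index, in any order, for some k >= 1.
// Hypothetical helper for illustration; not how MongoDB's planner is written.
function usesIndexPrefix(indexKeys, fields) {
  if (fields.length === 0 || fields.length > indexKeys.length) return false;
  const prefix = new Set(indexKeys.slice(0, fields.length));
  return fields.every((f) => prefix.has(f));
}

const indexKeys = ["shell", "ord_date"]; // keys of { shell: 1, ord_date: -1 }

const onShell = usesIndexPrefix(indexKeys, ["shell"]);             // first query
const onBoth = usesIndexPrefix(indexKeys, ["shell", "ord_date"]);  // second query
const onDateOnly = usesIndexPrefix(indexKeys, ["ord_date"]);       // third query and the sort
console.log(onShell, onBoth, onDateOnly);
```

This is why list order matters when a compound index is designed: the field that every query filters on belongs at the front.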

Index Intersection

Before version 2.6, MongoDB could use only a single index to fulfil most queries.

The only exception was queries with $or clauses, which could use a single index for each $or clause.

Since version 2.6, MongoDB can employ multiple, possibly nested, index intersections to resolve a query.

Note: Index intersection does not apply when the sort() operation requires an index completely separate from the query predicate.

Intersection and Sort

The orders collection has the following indexes:

{ qty: 1 }

{ shell: 1, ord_date: -1 }

{ shell: 1 }

{ ord_date: -1 }

MongoDB cannot use index intersection for the following query with sort:

db.orders.find({ qty: { $gt: 10 } }).sort({ shell: 1 })

MongoDB can use index intersection for the following query and fulfil part of the query predicate:

db.orders.find({ qty: { $gt: 10 }, shell: "B" }).sort({ ord_date: -1 })

Index Types (3)

4 Multikey Index:

indexes the contents of arrays.

separate index entries for every element of the array.

Created automatically when an indexed field holds an array.

5 Geospatial Index:

To support efficient queries of geospatial coordinate data.

2d indexes support two-dimensional geometry queries

2dsphere indexes support queries on spherical (three-dimensional) geometry

6 Text Indexes:

support searching for string content in a collection.

7 Hashed Indexes:

more random distribution of values along their range.

Only equality matches, not range-based queries
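For the multikey case, the "separate index entries for every element" behaviour can be illustrated with a toy model. The sketch below is plain JavaScript with a hypothetical `indexEntries` helper and invented sample documents. It produces one (key, _id) pair per array element and keeps the pairs ordered by key, which is only a loose picture of what an ordered index holds.

```javascript
// Hypothetical helper: list the (key, _id) entries an index on `field`
// would need, expanding arrays into one entry per element.
function indexEntries(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const value = doc[field];
    const keys = Array.isArray(value) ? value : [value];
    for (const key of keys) entries.push({ key, _id: doc._id });
  }
  // Keep entries ordered by key, as an ordered index would
  entries.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return entries;
}

// Invented sample documents
const books = [
  { _id: 1, tags: ["poetry", "classic"] },
  { _id: 2, tags: ["classic"] },
];

const entries = indexEntries(books, "tags");
console.log(entries);
```

Document 1 appears twice in the result, once per tag, which is also why multikey indexes can grow much larger than the number of documents suggests.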

Index Properties

Unique Indexes:

cause MongoDB to reject all documents that contain a duplicate value for the indexed field.

db.members.ensureIndex({ "usr_id": 1 }, { unique: true })

Sparse Indexes:

Only contain entries for documents that have the indexed field (even if it contains a null value)

db.addresses.ensureIndex({ "usr_tel": 1 }, { sparse: true })
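A toy model can make the two properties concrete. In the sketch below (plain JavaScript, with a hypothetical `buildIndex` helper and invented sample documents), a sparse index skips documents that lack the field altogether but keeps those where the field is present with a null value, while a unique index refuses a second document with an already-indexed value. The non-sparse branch follows the documented MongoDB behaviour of indexing a missing field as null.

```javascript
// Hypothetical helper: build a Map from indexed value to document positions.
function buildIndex(docs, field, { unique = false, sparse = false } = {}) {
  const entries = new Map();
  docs.forEach((doc, pos) => {
    const present = Object.prototype.hasOwnProperty.call(doc, field);
    if (!present && sparse) return;          // sparse: no entry for a missing field
    const key = present ? doc[field] : null; // non-sparse: missing is indexed as null
    if (unique && entries.has(key)) {
      throw new Error("duplicate key for " + field + ": " + String(key));
    }
    if (!entries.has(key)) entries.set(key, []);
    entries.get(key).push(pos);
  });
  return entries;
}

// Invented sample documents
const addresses = [
  { usr_id: 1, usr_tel: "0721-000" },
  { usr_id: 2, usr_tel: null },
  { usr_id: 3 },
];

const sparseIdx = buildIndex(addresses, "usr_tel", { sparse: true });
const fullIdx = buildIndex(addresses, "usr_tel");
console.log(sparseIdx.size, fullIdx.get(null));

let rejected = false;
try {
  buildIndex([{ usr_id: 1 }, { usr_id: 1 }], "usr_id", { unique: true });
} catch (e) {
  rejected = true; // second document with usr_id 1 is refused
}
console.log(rejected);
```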

Covered Query

Covered Query:

all the fields in the query are part of an index, and

all the fields returned in the results are in the same index.

Note: try to create indexes that support covered queries.
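The two conditions can be expressed as a simple check. The sketch below is plain JavaScript with a hypothetical `isCovered` helper; it compares field names only. It also encodes one detail that is not on the slide: because _id is returned by default, the projection has to exclude it explicitly unless _id is itself part of the index.

```javascript
// indexKeys:   array of indexed field names
// queryFields: field names used in the query criteria
// projection:  inclusion projection, e.g. { author: 1, price: 1, _id: 0 }
// Hypothetical helper for illustration only.
function isCovered(indexKeys, queryFields, projection) {
  const indexed = new Set(indexKeys);
  const returned = Object.keys(projection).filter((f) => projection[f] === 1);
  if (returned.length === 0) return false; // no inclusion list: whole documents come back
  // _id is returned unless it is explicitly excluded
  if (projection._id !== 0 && !returned.includes("_id")) returned.push("_id");
  return (
    queryFields.every((f) => indexed.has(f)) &&
    returned.every((f) => indexed.has(f))
  );
}

const idx = ["author", "price"]; // keys of { author: 1, price: 1 }

const covered = isCovered(idx, ["author"], { author: 1, price: 1, _id: 0 });
const withId = isCovered(idx, ["author"], { author: 1, price: 1 });
const otherField = isCovered(idx, ["author"], { title: 1, _id: 0 });
console.log(covered, withId, otherField);
```

Only the first call describes a covered query; the second fails because of the implicit _id, the third because title is not in the index.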

Covered Query Benefits

Querying only the index can be much faster than querying documents outside of the index, because:

Index keys are smaller than the documents they catalog.

Indexes are usually either available in RAM or located sequentially on disk.

Covered Query Checking

To determine whether a query is covered:

use cursor.explain()

indexOnly: true indicates a covered query

A query cannot be covered if:

any of the indexed fields is an array (multi-key index)

any of the indexed fields are fields in subdocuments.

Index Strategy

Depends on:

kinds of queries you expect

the ratio of reads to writes

amount of free memory on the system

Best Index Design Strategy:

Profile a variety of index configurations

Traditional Schema Design

Put the data into First Normal Form (1NF)

Putting data in fixed tables

Each cell should have just a single scalar value

Remove redundancy from a set of tables

Denormalization and Performance

What is the problem?

A disk seek takes over 99% of the time spent reading a row.

JOINs typically require random seeks.

MongoDB and Schema Design

Has the performance benefits of a redundant, denormalized format

Complicated schema design process

Decide if you have to Embed or Reference

General answer to schema design problem with MongoDB:

It depends!!!
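
To make the embed-or-reference choice concrete, here is a small sketch of the same data in both shapes. All names and values (users, addresses, the cities) are made up for illustration, and plain arrays stand in for collections.

```javascript
// Embedded: the addresses live inside the user document, so one read returns everything.
const embeddedUser = {
  _id: 1,
  name: "Ada",
  addresses: [{ city: "Karlsruhe" }, { city: "Warsaw" }],
};

// Referenced: addresses are separate documents that point back at the user.
const users = [{ _id: 1, name: "Ada" }];
const addresses = [
  { _id: 10, usr_id: 1, city: "Karlsruhe" },
  { _id: 11, usr_id: 1, city: "Warsaw" },
];

function citiesEmbedded(user) {
  return user.addresses.map((a) => a.city);
}

function citiesReferenced(userId) {
  // Second lookup, standing in for a second query to the database.
  return addresses.filter((a) => a.usr_id === userId).map((a) => a.city);
}

console.log(citiesEmbedded(embeddedUser));
console.log(citiesReferenced(users[0]._id));
```

Both calls return the same cities; the difference is whether the application needs one lookup or two, and whether addresses can be queried independently of their user.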

Embedding Benefits

Read Performance:

data locality: documents are stored contiguously on disk

one seek to load the entire document

one round trip to the database

Atomicity and Isolation:

MongoDB does not support multi-document transactions

embedding related data in a single document guarantees its consistency

Embedding Penalties

Write operations may take a long time.

The larger a document is, the more RAM it uses.

Growing documents must eventually get copied to larger spaces.

MongoDB documents have a hard size limit of 16 MB.

but: GridFS
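
A rough guard against the size limit might look like the sketch below. It is only an approximation under a stated assumption: the length of the JSON text is used as a proxy, whereas the real limit applies to the BSON encoding, whose size differs. The function names are hypothetical.

```javascript
const MAX_DOC_BYTES = 16 * 1024 * 1024; // the 16 MB limit quoted above

// Assumption: JSON text length as a rough stand-in for the BSON size.
function approxSizeBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

function fitsInOneDocument(doc) {
  return approxSizeBytes(doc) <= MAX_DOC_BYTES;
}

console.log(fitsInOneDocument({ station: "KA-01", values: [1, 2, 3] }));
console.log(fitsInOneDocument({ blob: "x".repeat(17 * 1024 * 1024) }));
```

Data that fails such a check is a candidate for GridFS or for splitting into several referenced documents.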

Where to use Linking?

Referencing for flexibility:

if your application may query data in many different ways

if you are not able to anticipate the patterns in which data may be queried

Embedding vs. Linking

Decide based on whether your application performs more reads or more writes.

Polymorphic Schema

Schemaless: not enforcing a particular structure on documents in a collection

Polymorphic schema: all documents in a collection have a similar, but not identical, structure
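
As an illustration, the documents below share some fields but differ in others, and the application dispatches on a type field. The collection contents and the describe function are invented for this sketch.

```javascript
// Documents of one hypothetical collection: similar, but not identical, structure.
const measurements = [
  { type: "temperature", station: "KA-01", celsius: 21.5 },
  { type: "wind", station: "KA-01", speedMs: 4.2, directionDeg: 270 },
  { type: "temperature", station: "KA-02", celsius: 19.5, sensorModel: "T-1000" },
];

// The application, not the database, decides how to interpret each shape.
function describe(doc) {
  switch (doc.type) {
    case "temperature":
      return `${doc.station}: ${doc.celsius} C`;
    case "wind":
      return `${doc.station}: ${doc.speedMs} m/s from ${doc.directionDeg} deg`;
    default:
      return `${doc.station}: unknown measurement`;
  }
}

measurements.forEach((doc) => console.log(describe(doc)));
```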

Polymorphism

Supports Object-Oriented Programming

Enables Schema Evolution

Its major drawback is reduced storage efficiency

Databases in Web Applications: A Bit of History

The LAMP stack:

Linux, Apache, MySQL and PHP

Established work horse of Web applications worldwide

Free and Open Source

Fast

Simple to deploy and maintain

but

Three different languages: JavaScript, PHP, SQL

Data always as tables

Problems scaling to big-data applications

Databases in Web Applications: Enter MongoDB

Alternative solution: the MEAN stack

MongoDB

Express: Node.js Web-application framework

AngularJS: JavaScript Model-View-Whatever framework

Node.js: high-performance server platform for Web applications

Flexible data structure

Optimised for big data and high load

One language throughout the stack

Free and Open Source

MongoDB and Node

Official native Node.js driver since 2012

Several higher-level wrappers. Examples:

Mongoose: official, extensive Object Document Mapper

Mongoskin: lightweight, schema-less, similar to pymongo

usage similar to the mongo shell

asynchronous: methods take callback functions

Example: MongoDB REST API

Problem: cannot talk to MongoDB from Web browser

Solution: provide a representational state-transfer API

Intermediate Node.js Web server

Mongoskin to talk to the database

Express to provide REST plumbing

Client I/O with JSON

Goal: map HTTP requests for URIs like http://myMongo/collections/myColl/docId to MongoDB CRUD operations

With Express, just a few lines of code:

initialisation

include JSON body parser

define handlers for appropriate URIs and HTTP verbs

start listening for requests!

Example code: https://github.com/mkszuba/mongoREST.git
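
The core of such a mapping can be sketched without any framework. The code below is not taken from the mongoREST repository; the verb-to-operation table and the mapRequest function are assumptions made for illustration, and a real server would pass the result on to the database driver.

```javascript
// Assumed mapping from HTTP verbs to CRUD operations.
const VERB_TO_CRUD = { POST: "insert", GET: "find", PUT: "update", DELETE: "remove" };

// Parses paths of the form /collections/<collection>[/<document id>].
function mapRequest(method, path) {
  const match = /^\/collections\/([^/]+)(?:\/([^/]+))?\/?$/.exec(path);
  const operation = VERB_TO_CRUD[method.toUpperCase()];
  if (!match || !operation) return null; // unsupported URI or verb
  return { operation, collection: match[1], id: match[2] || null };
}

console.log(mapRequest("GET", "/collections/myColl/docId"));
console.log(mapRequest("POST", "/collections/myColl"));
console.log(mapRequest("GET", "/somewhere/else"));
```

With Express, each entry of the table would typically become one route handler for the corresponding verb.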

Overview

Default configuration:

no user authorisation: full access for all connections

full trust between cluster members

plain-text network communication

Unsuitable for production systems!

Encryption

Data transfer: supports SSL

--sslMode on the command line, net.ssl.mode in configuration

different degrees of enforcement

valid certificate(s) required

not available in binary builds from mongodb.org

Data at rest: not encrypted

use third-party solutions, e.g. LUKS or BitLocker

Authorisation

Role-Based Access Control

a user is assigned one or more roles

a role contains a set of privileges

a privilege: a resource plus a set of allowed actions

a resource: collection, set of collections, database or cluster

Some built-in roles:

user access: read, readWrite

database administration: dbAdmin, userAdmin

cluster administration: clusterManager

backup, restore

root

Custom roles possible

Users are added per-database but can have roles spanning other databases
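
The user, role, privilege and resource chain can be modelled in a few lines. This is a simplified sketch: the role names, the database name and the action lists are made up, and an empty collection name is assumed to mean "every collection in the database".

```javascript
// Hypothetical custom roles: each role is a list of privileges.
const ROLES = {
  reader: [{ resource: { db: "climate", collection: "" }, actions: ["find"] }],
  ingest: [
    {
      resource: { db: "climate", collection: "measurements" },
      actions: ["find", "insert", "update"],
    },
  ],
};

// Assumption: an empty string acts as a wildcard for that part of the resource.
function resourceCovers(granted, requested) {
  const dbOk = granted.db === "" || granted.db === requested.db;
  const collOk = granted.collection === "" || granted.collection === requested.collection;
  return dbOk && collOk;
}

// A user may act if any privilege of any of their roles allows it.
function isAllowed(user, action, resource) {
  return user.roles.some((roleName) =>
    (ROLES[roleName] || []).some(
      (privilege) =>
        privilege.actions.includes(action) && resourceCovers(privilege.resource, resource)
    )
  );
}

const alice = { name: "alice", roles: ["reader"] };
const bob = { name: "bob", roles: ["reader", "ingest"] };

console.log(isAllowed(alice, "insert", { db: "climate", collection: "measurements" })); // false
console.log(isAllowed(bob, "insert", { db: "climate", collection: "measurements" }));   // true
```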

To enable:

pass --auth on the command line,

set security.authorization in configuration, or

create a key file, set security.keyFile in configuration

Disables anonymous access

localhost exception: local access and no users defined

also applies to shards

Authentication

Two modes

user access

cluster membership

Several mechanisms available

standard: challenge-response, X.509 certificates

Enterprise: Kerberos, LDAP

One user per connection

Challenge-Response Authentication

Users: validate password against user database

Clusters: same key file on different machines

contains a (preferably random) key 6 to 1024 bytes long, Base64 character set

Transmits modified MD5 password hash

SSL recommended: hash is protected against replay attacks but could be brute-forced
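
One way to produce suitable key-file content is to Base64-encode random bytes. The sketch below assumes Node.js; the function name and the default of 756 random bytes (1008 Base64 characters) are choices made for this example, and writing the result to a file with restrictive permissions is left out.

```javascript
const crypto = require("crypto");

// Returns a random Base64 string within the 6 to 1024 length limit quoted above.
function generateKeyFileContent(randomByteCount = 756) {
  const key = crypto.randomBytes(randomByteCount).toString("base64");
  if (key.length < 6 || key.length > 1024) {
    throw new Error(`key length ${key.length} is outside the 6 to 1024 range`);
  }
  return key;
}

const key = generateKeyFileContent();
console.log(key.length); // 1008 for the default of 756 random bytes
```

The same content then has to be copied to every member of the cluster.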

X.509 Authentication

As seen on the Web and elsewhere

SSL required

Users: certificate subject registered in user database

Clusters: unique certificate and key for each machine, in a PEM file

Constraints on certificates

users: appropriate key-usage data

clusters: same CA for all members, subject must match hostname

mustn't re-use subject for user and cluster authentication

Accounting

monitor database use, performance studies etc.

standard: basic logging, journal monitor, profiling

Enterprise: full event audit system

Journal Monitor

Journalling: a mechanism for ensuring data consistency

Provides basic monitoring features: serverStatus, journalLatencyTest

Useful for assessing overall performance

Profiling

estimate performance/cost of database operations

enabled per-instance or per-database

stores results in the system.profile collection

Replication

Database Replication ensures:

redundancy

backup

automatic failover

Replication occurs through groups of servers known as replica sets.

Replica Set Members

Primary/Master:

The only node that accepts writes

The default read node

Secondary/Slave:

The read-only members

Replicate from the Primary

Arbiter:

Does not hold a copy of the data set

Cannot become Primary

May be used to add votes in the election process

Primary and Oplog

Primary saves all write operations in its Oplog

Oplog is a capped collection which records all of the modifications

By default, the Oplog usually takes 5% of disk space

Secondaries copy the Oplog from the Primary asynchronously

[Figure: replica set with a Primary, three Secondaries and an Arbiter; the client application/driver sends writes and reads to the Primary, and the Secondaries replicate from it.]
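
The Oplog mechanism can be mimicked with a fixed-size log and a secondary that remembers the last entry it applied. This is a toy model under simplifying assumptions: the ToyOplog and ToySecondary classes are hypothetical, and timestamps are plain counters.

```javascript
// Capped log: once full, the oldest entries are dropped.
class ToyOplog {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = [];
    this.nextTs = 1;
  }
  append(op) {
    this.entries.push({ ts: this.nextTs++, op });
    if (this.entries.length > this.capacity) this.entries.shift();
  }
  // Entries newer than the given timestamp.
  since(ts) {
    return this.entries.filter((entry) => entry.ts > ts);
  }
}

// A secondary pulls and applies whatever it has not seen yet.
class ToySecondary {
  constructor() {
    this.optime = 0; // timestamp of the last applied operation
    this.applied = [];
  }
  syncFrom(oplog) {
    for (const entry of oplog.since(this.optime)) {
      this.applied.push(entry.op);
      this.optime = entry.ts;
    }
  }
}

const oplog = new ToyOplog(3);
const secondary = new ToySecondary();

oplog.append("insert a");
oplog.append("insert b");
secondary.syncFrom(oplog); // the secondary catches up some time after the writes
oplog.append("update a");
secondary.syncFrom(oplog);

console.log(secondary.applied, secondary.optime);
```

If a secondary falls so far behind that the entries it needs have already been dropped from the capped log, it can no longer catch up from the log alone, which is why the Oplog size matters.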

Secondary and Read Preferences

Read Preferences:

Primary

PrimaryPreferred

Secondary

SecondaryPreferred

Nearest

Client applications can be configured with a Read Preference, on a:

per-connection,

per-collection,

or per-operation basis.
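
How the five modes differ can be sketched as a selection function over the members of a replica set. This is a simplification made for illustration: the member list is invented, and "nearest" here simply means the lowest ping time, whereas real drivers apply more elaborate rules.

```javascript
// Picks the member a read would go to under the given read preference mode.
function pickMember(members, mode) {
  const up = members.filter((m) => m.up);
  const primary = up.find((m) => m.role === "primary") || null;
  const secondaries = up.filter((m) => m.role === "secondary");
  // Assumption: "closest" means lowest ping time.
  const closest = (list) =>
    list.reduce((best, m) => (best === null || m.pingMs < best.pingMs ? m : best), null);

  switch (mode) {
    case "primary":
      return primary;
    case "primaryPreferred":
      return primary || closest(secondaries);
    case "secondary":
      return closest(secondaries);
    case "secondaryPreferred":
      return closest(secondaries) || primary;
    case "nearest":
      return closest(up);
    default:
      throw new Error(`unknown read preference: ${mode}`);
  }
}

const members = [
  { name: "a", role: "primary", up: true, pingMs: 20 },
  { name: "b", role: "secondary", up: true, pingMs: 5 },
  { name: "c", role: "secondary", up: true, pingMs: 12 },
];

for (const mode of ["primary", "primaryPreferred", "secondary", "secondaryPreferred", "nearest"]) {
  console.log(mode, "->", pickMember(members, mode).name);
}
```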

Election Process

An election starts whenever there is no Primary.

[Figure: replica set members (Primary, three Secondaries and an Arbiter) exchanging heartbeats with one another.]

Effective Factors

Priority:

Members will vote for the node with the highest priority

Set priority to 0 to prevent a Secondary from ever becoming Primary

Optime:

Timestamp of the last operation applied from the Oplog for each member

The member with the most recent optime among all visible members will get elected

Connections:

Primary candidate must be able to connect with a majority of the members

Majority: more than half of the total number of voting members

Choose the number of replica set members by considering fault tolerance.
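
The relation between set size, majority and fault tolerance is simple arithmetic. The function names below are hypothetical; the calculation assumes that every counted member has exactly one vote.

```javascript
// Smallest number of votes that is more than half of the voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members may be unreachable while a Primary can still be elected.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

// A candidate needs to reach a majority of all voting members, itself included.
function canElectPrimary(reachableVoting, totalVoting) {
  return reachableVoting >= majority(totalVoting);
}

for (const n of [3, 4, 5, 7]) {
  console.log(`${n} members: majority ${majority(n)}, fault tolerance ${faultTolerance(n)}`);
}
```

Going from three to four members raises the majority from two to three without improving fault tolerance, which is the reason for preferring an odd number of voting members.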

Arbiter and Election

A maximum of seven voting members.

Use an Arbiter to get an odd number of voting members.

[Figure: a Primary, two Secondaries and an Arbiter exchanging heartbeats.]

Primary/Secondary vs. Master/Slave

Primary/Secondary is the preferred MongoDB replication method.

Primary/Secondary cannot support more than 12 nodes.

Master/Slave doesn't support automatic failover.

Introduction to Sharding

Automatic horizontal partitioning and management

No downtime is needed to convert to a sharded system

Fully consistent

No code changes required

At collection level

Sharding Components

Config server: holds a complete copy of cluster metadata

Mongos: a lightweight routing service

Shards:

mongod: the primary daemon process for the MongoDB system

replica set
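
How the components cooperate can be sketched as a lookup of the shard key in range metadata. This is a toy model: the chunk boundaries and shard names are invented, and the routing function only imitates the idea of a router consulting cluster metadata.

```javascript
// Hypothetical cluster metadata of the kind a config server would hold:
// each chunk covers a half-open range [min, max) of shard-key values.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: Infinity, shard: "shardB" },
];

// What a routing service does conceptually: find the chunk that owns the key.
function routeByShardKey(chunkTable, key) {
  const chunk = chunkTable.find((c) => key >= c.min && key < c.max);
  return chunk ? chunk.shard : null;
}

console.log(routeByShardKey(chunks, 5));    // shardA
console.log(routeByShardKey(chunks, 1000)); // shardB
```

The client only talks to the router, so the application code does not need to know on which shard a document lives.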

Sharding

[Figure: a client connects to two mongos routers, which use three config servers and route requests to two shards; each shard is a replica set of mongod processes.]

References

1 www.mongodb.org

2 MongoDB Applied Design Patterns, Rick Copeland, 2013
