MongoDB NoSQL Database: A Deep Dive - My White Paper
TRANSCRIPT
Topic: NoSQL Database - MongoDB
Presenter: Rajesh Kumar
Sr. Data Architect - Big Data Analytics & Information Management
Agenda:
• What is NoSQL, why NoSQL
• The different types of NoSQL databases & data model approach
• Detailed overview of one of the most popular NoSQL databases: MongoDB
• Model: document-oriented database
• JSON
• CRUD operations
• Model data in MongoDB
• Data model design considerations
• Indexing
• Sharding
• Replication
• Use cases
• Reference Architecture
• Insurance Conceptual Data Model
Relational databases have served well, but..
The relational database has been excellent, but the world of data is changing rapidly. The amount of data created each year is almost doubling; it is a data explosion. And this data is not simply transactional, structured data. It includes new types of data generated from web logs, documents, clickstreams, devices, sensors and other IoT sources.
Traditional RDBMS systems were not designed to handle the volume, variety and velocity of this semi-structured and unstructured data, which is produced in enormous quantity. A traditional RDBMS cannot provide the scalability, performance and flexibility needed for modern distributed data storage and processing.
MongoDB
A document-based database
MongoDB - A NoSQL DB
What is NoSQL - Not Only SQL?
Non-relational,
distributed,
schema-free,
flexible,
horizontally scalable,
open source,
simple API
Why NoSQL?
Support for distributed platforms in the age of Big Data
Ability to deal effectively with all kinds of data formats: images, docs, streaming, text, web, geospatial, sensor, machine, real-time operational
Scalability and performance (low latency and faster data access)
Rapid scale: scale out as much as the business needs to support more users and growing data
24*7 data availability and global deployment
Data to support next-gen high-performance apps
Real-time reporting and analytics (predictive analytics, machine learning) support beyond what data warehouses support
Lowers data management cost
Types of NoSQL Databases
Key/Value Store - Memcached, DynamoDB
Column Store - Cassandra, HBase
Document Store - MongoDB, CouchDB, DynamoDB
Graph Store - Neo4j
Multi-Model databases - DynamoDB, CouchDB
MongoDB is a document-oriented database
Its data structure is composed of key/value pairs in JSON format
What is MongoDB?
An open-source, document-oriented NoSQL database that provides high performance, automatic scaling and flexible schema design.
MongoDB fulfills both traditional and new requirements
NoSQL but fully featured
A quick recap of MongoDB characteristics
Distributed document-oriented NoSQL database
MongoDB stores data as JSON documents, represented as BSON
Dynamic and flexible schema
Horizontal scaling, easy to scale
Supports a rich query language
Supports CRUD for read and write operations
Support for text search and geospatial queries
Efficient text and geospatial indexes
Very strong sharding and replication
_id: a special key assigned to each document
_id is unique across a collection
A record in MongoDB is a document, which is a data structure composed of field (key) and value pairs. The values of fields may include other nested documents, arrays, and arrays of documents.
MongoDB Data Model
MongoDB stores documents in JSON (BSON, actually)
JSON is short for JavaScript Object Notation
BSON is a binary serialization of JSON objects
A JSON object is a set of key-value pairs ("key" : "value") enclosed in curly braces { }
Document creation is free from schema: no structure, data type or size has to be predefined. You can create as many fields as you require dynamically.
Data types supported by JSON/BSON in MongoDB: Strings, Numbers (integer, long, double), Objects, Arrays, Boolean (true/false), Null, Date, Timestamp.
Other constructs in MongoDB are databases, collections, documents and fields
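As a rough illustration of the data types listed above, here is a minimal Python sketch of a document written as a dictionary and round-tripped through JSON text. The customer fields are invented for the example. Note that plain JSON has no date type; Date and Timestamp come from BSON, so they are left out here.

```python
import json

# Hypothetical customer document; every field name here is illustrative.
customer = {
    "_id": 1001,
    "name": "Jane Doe",                                 # string
    "age": 34,                                          # integer
    "balance": 1520.75,                                 # double
    "active": True,                                     # boolean
    "middle_name": None,                                # null
    "address": {"city": "New York", "zip": "10001"},    # embedded object
    "phones": ["212-555-0100", "212-555-0101"],         # array
}

# Serialize to JSON text and parse it back; the round trip keeps the structure.
encoded = json.dumps(customer)
decoded = json.loads(encoded)
print(encoded)
```

No schema was declared anywhere: adding another key to the dictionary is all it takes to add a field.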
MongoDB data model core concepts
Database - In MongoDB, a database is a physical container for collections, each of which holds documents.
Collection - A collection is a container of documents; a document can be anything.
Document - A document is a group of fields in key/value pairs, free from schema, table and column; a document can hold any type of data.
Think of collections and documents as the tables and rows of an RDBMS
Hierarchical
A document can reference other documents
A document can contain other embedded documents, arrays, and arrays of documents
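The two options above, embedding and referencing, can be sketched with plain Python dictionaries. This is only an illustration: the customer and policy fields are made up, and the lookup function stands in for the second query an application would issue.

```python
# Embedded design: the policy lives inside the customer document.
customer_embedded = {
    "_id": 1,
    "name": "Jane Doe",
    "policies": [{"policy_no": "P-100", "type": "auto"}],
}

# Referenced design: the policy is its own document and points at the
# customer through a customer_id field (a hypothetical name).
customers = {1: {"_id": 1, "name": "Jane Doe"}}
policies = [{"_id": 10, "policy_no": "P-100", "type": "auto", "customer_id": 1}]


def policies_for(customer_id):
    """Resolve the reference with a second lookup, as an application would."""
    return [p for p in policies if p["customer_id"] == customer_id]


embedded_numbers = [p["policy_no"] for p in customer_embedded["policies"]]
referenced_numbers = [p["policy_no"] for p in policies_for(1)]
print(embedded_numbers, referenced_numbers)
```

Embedding answers the question in one read; referencing keeps the policy independent at the cost of a second lookup.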
Collection and Document
MongoDB Data Model - A Document Store Model
Not PDF, Word, CSV or HTML: documents are nested structures created using JavaScript Object Notation (JSON). Think of a document as a record. In the example below, let's see what a document looks like in MongoDB.
MongoDB document types are
MongoDB system components
COMPONENTS
mongod - The database process.
mongo - The database shell (interactive JavaScript); the command-line shell for interacting directly with the database.
mongos - Sharding router
UTILITIES
mongostat - Show performance statistics
mongofiles - Utility for putting and getting files from MongoDB GridFS
mongoimport - Import into mongo from JSON or CSV
mongoexport - Export a single collection (JSON, CSV)
Basic Mongo Shell commands
MongoDB stores documents in collections. If a collection does not exist, MongoDB creates the collection when you first store data for that collection.
Select/create database: use customerdb
>db tells you the current database
List databases:
>show dbs
local 0.78125GB
test 0.23012GB
customerdb
myDB
Create collection:
db.createCollection("products")
List collections already created:
>show collections
Data Manipulation: Create & Read operations
Frequently used data manipulation methods
The createCollection() Method
db.createCollection(name, options)
The drop() Method
MongoDB's db.collection.drop() is used to drop a collection from the database.
Rename collection:
>db.collection.renameCollection("NewColName")
>db.customer.renameCollection("Customer")
The insert() Method
>db.COLLECTION_NAME.insert(document)
Query documents using the find() method:
>db.COLLECTION_NAME.find()
The update() Method
>db.COLLECTION_NAME.update(SELECTION_CRITERIA, UPDATED_DATA)
>db.col.update({"title": "MongoDB"}, {$set: {"title": "MongoDB Definitive Guide"}})
The remove() Method
>db.col.remove({"title": "MongoDB"})
The sort() Method
>db.COLLECTION_NAME.find().sort({KEY:1})
The sort order is given as 1 or -1: 1 sorts in ascending order, -1 in descending order.
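To make the semantics of these methods concrete without a running server, here is a small in-memory sketch in Python. It is not MongoDB code: the helper names (find, update_set, sort) and the likes field are invented, and the update helper assumes the shell default of changing only the first matching document.

```python
# In-memory stand-in for a collection; not MongoDB code.
col = [
    {"_id": 1, "title": "MongoDB", "likes": 10},
    {"_id": 2, "title": "NoSQL Overview", "likes": 25},
    {"_id": 3, "title": "Big Data", "likes": 5},
]


def find(docs, criteria):
    """Return documents whose fields equal every key/value in criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


def update_set(docs, criteria, new_values):
    """Mimic update(criteria, {$set: new_values}) on the first match only."""
    for d in find(docs, criteria):
        d.update(new_values)
        return 1
    return 0


def sort(docs, key, direction):
    """direction 1 sorts ascending, -1 descending, as in sort({KEY: 1})."""
    return sorted(docs, key=lambda d: d[key], reverse=(direction == -1))


modified = update_set(col, {"title": "MongoDB"}, {"title": "MongoDB Definitive Guide"})
ascending = [d["likes"] for d in sort(col, "likes", 1)]
descending = [d["likes"] for d in sort(col, "likes", -1)]
print(modified, ascending, descending)
```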
Basic DB operations in a complex document
Find operation
Querying in embedded objects
Comparison operators
Querying in arrays of documents
Indexing on embedded documents
Indexing on multiple keys
Model Your Data
Terminology:
Example Schema.
Model Data in MongoDB: Model your data the way it is used.
Let's model some more data ..
Some schema design considerations
What is the priority?
High consistency
High read performance
High write performance
ODS application
Real time
How does the application access and manipulate data?
Data access paths and types of queries
Read versus write ratio
Analytics (aggregation, video, images, machine, geospatial data)
Indexes - Indexes are special data structures that store a subset of your data in an efficient way for easy and faster access to the data
MongoDB stores indexes in a B-tree format, which allows efficient traversal of the index content
Proper index selection is important in MongoDB and makes the DB run optimally; improper indexing may cause many problems with read-write operations and with data distribution across a sharded cluster
Index types:
_id
Simple
Compound
Multi-key
Full text
Geospatial
Hashed
Index continued..
The _id index: It is created automatically, is immutable and can't be removed.
This is the same as a primary key in an RDBMS.
The default value is a 12-byte ObjectId:
4-byte timestamp, 3-byte machine id, 2-byte process id, 3-byte counter
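The 4-3-2-3 byte layout above can be sketched in Python with the standard struct module. This is an illustration of the layout as described on the slide, not the driver's actual generator; the function names and the sample machine id are made up. Newer MongoDB releases document the middle five bytes as a single random value rather than machine id plus process id.

```python
import os
import struct
import time


def make_object_id(timestamp, machine_id, process_id, counter):
    """Pack the four parts into 12 bytes using the 4-3-2-3 layout."""
    return (
        struct.pack(">I", timestamp)                 # 4-byte big-endian timestamp
        + machine_id[:3]                             # 3-byte machine identifier
        + struct.pack(">H", process_id & 0xFFFF)     # 2-byte process id
        + counter.to_bytes(3, "big")                 # 3-byte counter
    )


def timestamp_of(object_id):
    """Recover the creation time (seconds since epoch) from the first 4 bytes."""
    return struct.unpack(">I", object_id[:4])[0]


now = int(time.time())
oid = make_object_id(now, b"\x01\x02\x03", os.getpid(), 1)
print(oid.hex())
```

Because the timestamp comes first, ids generated later sort after earlier ones, and the creation time can be read back from the id itself.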
Simple Index: A simple index is an index on a single key
Compound Index: A compound index is created over two or more fields in a document
Multi-key Index: A multi-key index is an index created on a field that contains an array
Full-text search Index: This is an index over a text-based field, similar to how Google indexes web pages, e.g. find all tweets from the last 30 days that mention auto insurance, or search for "Big Data" in a blog post.
Geospatial Index: This index supports efficient queries of geospatial coordinate data. It is used when you need to query location-based spatial data. This index is a great feature because location-based data is some of the most valuable data being collected today, for targeting location-based customers and for location-based product analysis, e.g. find all customers that live within 50 miles of NY.
Hashed Index: Used mainly in hash-based sharding; allows for more randomized data distribution across shards
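The "within 50 miles of NY" query from the geospatial entry can be sketched as a brute-force distance check in Python. This only shows what the query computes; a geospatial index exists precisely so the database does not have to scan every document like this. The customer names and coordinates are illustrative, and the Earth radius is an approximation.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


NEW_YORK = (40.7128, -74.0060)

# Hypothetical customers with (latitude, longitude) locations.
customers = [
    {"name": "A", "loc": (40.7357, -74.1724)},  # Newark, NJ
    {"name": "B", "loc": (39.9526, -75.1652)},  # Philadelphia, PA
]

nearby = [c["name"] for c in customers
          if haversine_miles(*NEW_YORK, *c["loc"]) <= 50]
print(nearby)
```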
Create index syntax:
db.employee.ensureIndex({"email": 1}, {"unique": true})
db.employee.ensureIndex({"age": 1}, {"sparse": true})
db.employee.find({age: {$gte: 25}})
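Why an index helps a range query such as age >= 25 can be sketched in Python. A sorted list with binary search stands in for the B-tree here, which is a simplification; the employee records and the function name are invented. The index is built sparse, so the document without an age field gets no entry.

```python
import bisect

employees = [
    {"_id": 1, "email": "a@example.com", "age": 31},
    {"_id": 2, "email": "b@example.com"},             # no age field
    {"_id": 3, "email": "c@example.com", "age": 22},
    {"_id": 4, "email": "d@example.com", "age": 25},
]

# Sparse index on age: sorted (age, _id) entries, only for documents
# that actually have the field.
age_index = sorted((e["age"], e["_id"]) for e in employees if "age" in e)


def find_age_gte(min_age):
    """Binary-search to the first entry with age >= min_age, then read forward."""
    start = bisect.bisect_left(age_index, (min_age,))
    return [doc_id for _, doc_id in age_index[start:]]


matches = find_age_gte(25)
print(matches)
```

The search jumps straight to the first qualifying entry instead of testing every document, which is the point of indexing the field.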
Index continued..
Index properties:
TTL Index - TTL indexes are special indexes that MongoDB can use to automatically remove documents from a collection after a certain amount of time.
Sparse Index - The sparse property of an index ensures that the index only contains entries for documents that have the indexed field. The index skips documents that do not have the indexed field.
Unique Index - Enforces uniqueness of the values in the indexed field.
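The TTL behavior can be sketched as a filter over documents with a date field. This is a simplified model in Python, not how the server schedules expiry; the createdAt field and the expire function are assumptions made for the example, and documents without the date field are treated as never expiring.

```python
from datetime import datetime, timedelta


def expire(docs, field, expire_after_seconds, now):
    """Keep documents newer than the cutoff; documents without the field stay."""
    cutoff = now - timedelta(seconds=expire_after_seconds)
    return [d for d in docs if field not in d or d[field] > cutoff]


now = datetime(2024, 1, 1, 12, 0, 0)
sessions = [
    {"_id": 1, "createdAt": now - timedelta(hours=2)},     # older than one hour
    {"_id": 2, "createdAt": now - timedelta(minutes=10)},  # still fresh
    {"_id": 3},                                            # no date field
]

remaining = [d["_id"] for d in expire(sessions, "createdAt", 3600, now)]
print(remaining)
```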
Text search index:
MongoDB provides text indexes to support text search queries on text content. To perform text search queries, you must have a text index on your collection. A collection can have only one text search index, but that index can cover multiple fields.
Creating a text search index over the "title" and "content" fields:
db.blogpost.ensureIndex( { title: "text", content: "text" } )
Use the $text query operator to perform text searches on a collection with a text index.
$text performs a logical OR search over all the terms in the search string.
For example, we can use the following queries to find the terms MongoDB and BigData in the blogpost collection.
db.blogpost.find( { $text: { $search: "MongoDB" } } )
db.blogpost.find( { $text: { $search: "BigData" } } )
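The OR behavior of a multi-term search can be sketched with a tiny inverted index in Python. This is a toy model: the posts are invented, and it only lowercases and splits on whitespace, whereas a real text index also handles stemming and stop words.

```python
from collections import defaultdict

posts = {
    1: "MongoDB is a document database",
    2: "BigData needs horizontal scaling",
    3: "Relational databases use tables",
}

# Inverted index: lowercase term -> set of post ids containing it.
text_index = defaultdict(set)
for post_id, content in posts.items():
    for term in content.lower().split():
        text_index[term].add(post_id)


def text_search(search):
    """Return ids of posts containing ANY of the search terms (logical OR)."""
    result = set()
    for term in search.lower().split():
        result |= text_index.get(term, set())
    return sorted(result)


hits = text_search("MongoDB BigData")
print(hits)
```

A search string with two terms returns posts that match either one, which is why the single-term queries above are each a subset of the combined search.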
Deleting a text index: To delete an existing text index, first find the name of the index using the following query:
>db.blogpost.getIndexes()
Now you can drop the text index: >db.blogpost.dropIndex("title_text_content_text")
Text indexes to support text search analytics - By example
MongoDB Sharding
Sharding is a method for storing data across multiple machines in a clustered computing environment. MongoDB uses sharding to support deployments with very large data sets and high throughput operations.
Purpose of Sharding
When a database system grows very large, the capacity of a single server can be challenged by the increased workload and by many concurrent users demanding high throughput. Beyond a certain level you can't keep scaling vertically by adding more CPU, RAM and storage; vertical scaling has limits.
In contrast, sharding scales horizontally: it divides the data set and distributes the data over multiple shard servers. Each shard works as an independent database, and collectively all the shards make up a single logical database.
Sharding reduces the amount of data that each server needs to store. When data grows you can add more shards in the cluster and subsequently each shard stores less data as the cluster grows.
For example, if a database has a 1 terabyte data set, and there are 4 shards, then each shard might hold only 256GB of data. If there are 40 shards, then each shard might hold only 25GB of data
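The arithmetic in this example, and the idea of spreading documents by a hashed key, can be sketched in Python. Treat it as a model only: the modulo step stands in for the server's chunk-based assignment, the function names are invented, and 1 terabyte is taken as 1024 GB so that four shards come out at 256 GB each.

```python
import hashlib


def shard_for(key, shard_count):
    """Pick a shard from a stable hash of the shard key (simplified model)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def per_shard_gb(total_gb, shard_count):
    """Average data per shard, assuming a perfectly even distribution."""
    return total_gb / shard_count


# Distribute 10,000 hypothetical customer ids over 4 shards.
counts = [0] * 4
for customer_id in range(10000):
    counts[shard_for(customer_id, 4)] += 1

print(counts)
print(per_shard_gb(1024, 4), per_shard_gb(1024, 40))
```

With 40 shards the even split is 25.6 GB per shard, which the text rounds to 25GB.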
Shards in MongoDB Architecture
Replication
The primary accepts all write operations. The secondaries then replicate the oplog and apply it to their data sets.
Replication continued..
Replica set members
A replica set in MongoDB is a group of mongod processes that provide redundancy and high availability. The members of a replica set are:
Primary - It receives all write operations and records them in the primary's oplog.
Secondary - Secondary members replicate operations from the primary to maintain an identical copy of the data set, so the set can recover from failure.
Note: The minimum recommended configuration for a replica set is a primary, a secondary, and an arbiter. Most deployments will keep three members that store data: a primary and two secondary members.
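The primary/secondary flow described above can be sketched as a toy model in Python: writes go to the primary and are appended to an operation log, and each secondary replays the entries it has not yet applied. The class and method names are invented, and elections, arbiters and network failures are left out.

```python
class Member:
    """One data-bearing member of the toy replica set."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.applied = 0  # number of oplog entries applied so far

    def apply(self, entry):
        op, key, value = entry
        if op == "set":
            self.data[key] = value
        elif op == "delete":
            self.data.pop(key, None)
        self.applied += 1


class ReplicaSet:
    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = secondaries
        self.oplog = []  # ordered log of write operations

    def write(self, op, key, value=None):
        """All writes go to the primary and are recorded in the oplog."""
        entry = (op, key, value)
        self.oplog.append(entry)
        self.primary.apply(entry)

    def replicate(self):
        """Each secondary replays the oplog entries it has not applied yet."""
        for secondary in self.secondaries:
            for entry in self.oplog[secondary.applied:]:
                secondary.apply(entry)


rs = ReplicaSet(Member("primary"), [Member("s1"), Member("s2")])
rs.write("set", "a", 1)
rs.write("set", "b", 2)
rs.write("delete", "a")
rs.replicate()
print(rs.primary.data, [s.data for s in rs.secondaries])
```

Until replicate() runs, the secondaries lag behind the primary; afterwards all three members hold the same data.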
Use cases - Types of workload suitable for NoSQL
Mobile app development
Internet of things
Digital advertisement
Streaming application
Web application
Social applications
Gaming
Content management
Customer personalization
Recommendation engine
360-degree view of customer, business, product
Fraud detection
Real time analytics
MongoDB support for programming languages
Other cool stuff
Sharding
Aggregation and map/reduce
Storage engine - WiredTiger
Capped collection
GridFS
Text and GeoSpatial Index
Use of Python and JavaScript scripting languages for complex data handling
Replication
That's it. Thank you!
Email me: [email protected]
Follow me on Twitter: @rajesh14k