MongoDB
• MongoDB is an open-source database classified as a NoSQL database.
• MongoDB was developed primarily to make scaling easier and to handle semi-structured data.
• MongoDB is a document-oriented database: data is organized as JSON documents, which are stored in collections.
Architecture
• MongoDB is a NoSQL database, which means its mechanism for storing and retrieving data is modeled on something other than the tabular relations used in relational databases.
• It supports rich data structures with dynamic attributes, mixed structure, text, media, arrays and other complex types.
• The data model is flexible: it can evolve over time to accommodate new features and requirements.
• Object-oriented programming languages interact with data in structures that are dramatically different from the way data is stored in a relational database.
Features
• Data is stored in a structure that maps to objects in modern programming languages.
• Rich index and query support, including secondary, geospatial and text search indexes, native MapReduce…
• System capacity can be increased dynamically.
• Supports data replication and failure tolerance.
• Data is read and written through RAM, providing fast performance.
Data Model
• MongoDB stores data as documents in a binary representation called BSON (Binary JSON), which is why it is called a document-oriented database.
• BSON extends the JSON (JavaScript Object Notation) representation to include additional types such as int, long, and floating point.
• BSON documents contain one or more fields, and each field contains a value of a specific data type, including arrays, binary data and sub-documents.
• A document and a collection can be seen as roughly equivalent to a record and a table in a relational database system.
• A document is an ordered set of keys with associated values. The values can be of several different data types (string, integer, etc.), but the keys are strings, and a document in MongoDB cannot contain duplicate keys:
{"greeting" : "Hello, world!", "foo" : 3}
• A collection is a group of documents and has a dynamic schema.
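As a rough illustration of the data model, the sketch below represents documents as plain JavaScript objects and a collection as an array. This is not MongoDB code; the `tags` and `author` fields and the `collection` array are made up for the example. It only shows what a dynamic schema permits: documents of different shapes sitting in the same collection.

```javascript
// Two documents with different shapes; neither shape is declared in advance.
const post = {
  greeting: "Hello, world!",
  foo: 3,
  tags: ["intro", "demo"],          // array value (hypothetical field)
  author: { name: "A. Writer" }     // sub-document (hypothetical field)
};
const note = { greeting: "Hi" };    // fewer fields, same collection

// A collection is a group of documents; no shared schema is enforced.
const collection = [post, note];

// Keys are strings and are unique within a document.
const keysPerDocument = collection.map(doc => Object.keys(doc));
console.log(keysPerDocument);
```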
Storage Model
• MongoDB uses memory-mapped files that map a data file on disk directly to a byte array in memory, where data access is implemented using pointer arithmetic.
• Each document collection is stored in one namespace file as well as multiple extent data files.
• Each collection is organized as a linked list of extents, each of which represents a contiguous region of disk space. Each document contains links to other documents (forming a linked list) as well as the actual data encoded in BSON format.
• MongoDB’s high availability is achieved via replica sets, which provide data redundancy across multiple physical servers: a single primary DB and multiple secondary DBs.
• All modification requests go to the primary DB; each modification is applied there and then replicated asynchronously to the secondary DBs.
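The asynchronous replication described in the last two points can be pictured with a toy in-memory model. This is only a sketch under simplifying assumptions, not how MongoDB is implemented: the `Primary` and `Secondary` classes, the `oplog` array, and the `syncFrom` method are names chosen for illustration. The point it shows is that a secondary can briefly return stale data until it has replayed the primary's log.

```javascript
// Toy model of primary/secondary replication (not MongoDB code).
class Primary {
  constructor() {
    this.data = new Map();
    this.oplog = [];                 // ordered log of applied modifications
  }
  write(key, value) {
    this.data.set(key, value);       // apply on the primary first
    this.oplog.push({ key, value }); // then record it for the secondaries
  }
}

class Secondary {
  constructor() {
    this.data = new Map();
    this.applied = 0;                // number of log entries replayed so far
  }
  // Runs some time after the write; that delay is what makes it asynchronous.
  syncFrom(primary) {
    for (; this.applied < primary.oplog.length; this.applied++) {
      const { key, value } = primary.oplog[this.applied];
      this.data.set(key, value);
    }
  }
}

const primary = new Primary();
const secondary = new Secondary();
primary.write("title", "My Blog Post");
const staleRead = secondary.data.get("title"); // not replicated yet
secondary.syncFrom(primary);
const freshRead = secondary.data.get("title"); // now matches the primary
console.log(staleRead, freshRead);
```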
ACID in MongoDB
• Data that has been read is treated as a snapshot, which means it may have changed in the database since it was read.
• To maintain consistency, a condition is attached to the modification request so that the DB server can validate the condition before applying the modification.
• One way to achieve this isolation is the findAndModify operation, which atomically modifies a single document and returns either its previous or its updated version.
• The transaction concept is also missing in MongoDB (as of the 2.4 release covered here), so there is no atomicity guarantee for an update that spans multiple documents. Developers are responsible for implementing such multi-document updates themselves.
• One approach is to create a separate document that links all the documents to be modified, and then apply the modifications to each document in sequence.
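The idea of attaching a condition to a modification can be sketched in memory as follows. This is not MongoDB code: the `accounts` array, the `version` field, and the `conditionalUpdate` helper are assumptions made for the example. In the shell, the same idea would be expressed by putting the condition into the query document of an update or findAndModify call, so that the server checks it and applies the change in one step.

```javascript
// In-memory stand-in for a collection (not MongoDB code).
const accounts = [{ _id: 1, balance: 100, version: 7 }];

// Apply `changes` only if the document still has the version the client read.
// Returns the previous document on success, or null if the condition failed.
function conditionalUpdate(collection, id, expectedVersion, changes) {
  const doc = collection.find(d => d._id === id && d.version === expectedVersion);
  if (!doc) return null;                        // modified by someone else first
  const previous = { ...doc };
  Object.assign(doc, changes, { version: doc.version + 1 });
  return previous;
}

const snapshot = { ...accounts[0] };            // the client reads a snapshot
// The first writer succeeds; the second still holds the old version and is rejected.
const first = conditionalUpdate(accounts, 1, snapshot.version, { balance: 80 });
const second = conditionalUpdate(accounts, 1, snapshot.version, { balance: 50 });
console.log(first, second, accounts[0]);
```

A rejected writer would normally re-read the document and retry with the new version.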
Major Differences from RDBMS
• An RDBMS has a fixed set of data types, while MongoDB documents can contain multi-valued fields because they have a nested structure.
• Documents of any structure can be stored in the same collection without a defined schema.
• MongoDB has no join operations and no transactions; atomicity is guaranteed only at the document level.
• There is also no concept of isolation, which means any data read by one client may have its value modified by another concurrent client.
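The difference between joining tables and nesting a multi-valued field can be sketched with plain JavaScript data. The `posts`, `comments`, and `postDocument` values are invented for illustration, and nothing here calls MongoDB. The relational half combines two lists through a foreign key, while the document half keeps the comments inside the post itself.

```javascript
// Relational style: two "tables" linked by a foreign key, combined with a join.
const posts = [{ id: 1, title: "My Blog Post" }];
const comments = [
  { postId: 1, text: "Nice post" },
  { postId: 1, text: "Thanks for sharing" }
];
const joined = posts.map(p => ({
  ...p,
  comments: comments.filter(c => c.postId === p.id).map(c => c.text)
}));

// Document style: the multi-valued field is nested inside the document,
// so one read returns everything and the document is the unit of atomicity.
const postDocument = {
  title: "My Blog Post",
  comments: ["Nice post", "Thanks for sharing"]
};
console.log(joined[0].comments, postDocument.comments);
```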
Installation
MongoDB 2.4.9 (mongodb-osx-x86_64-2.4.9)
To start a MongoDB instance:
$ mongod
mongod --help for help and startup options
Tue Apr 1 15:19:17.445 [initandlisten] MongoDB starting : pid=616 port=27017 dbpath=/data/db/ 64-bit host=Thuats-MacBook-Pro.local
Tue Apr 1 15:19:17.445 [initandlisten]
Tue Apr 1 15:19:17.445 [initandlisten] ** WARNING: soft rlimits too low. Number of files is 256, should be at least 1000
Tue Apr 1 15:19:17.445 [initandlisten] db version v2.4.9
Tue Apr 1 15:19:17.445 [initandlisten] git version:
…
MongoDB Shell
• MongoDB comes with a JavaScript shell that allows interaction with a MongoDB instance from the command line.
• The shell is a full-featured JavaScript interpreter, capable of running JavaScript programs.
• To start the shell:
$ mongo
MongoDB shell version: 2.4.9
connecting to: test
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
http://docs.mongodb.org/
Questions? Try the support group
http://groups.google.com/group/mongodb-user
Server has startup warnings:
Tue Apr 1 15:19:17.445 [initandlisten]
Tue Apr 1 15:19:17.445 [initandlisten] ** WARNING: soft rlimits too low. Number of files is 256, should be at least 1000
>
MongoDB Command
• To show current databases:
> show dbs
local 0.078125GB
• To create a new database:
> use blog
If the database already exists, the shell switches to it; otherwise the database is created when the first document is stored in it.
MongoDB CRUD Query
• CRUD operations are used to manipulate and view data in the shell.
• Create a new document:
> post = {"title": "My Blog Post",
"content" : "This is a blog post.",
"data" : new Date()}
{
"title" : "My Blog Post",
"content" : "This is a blog post.",
"data" : ISODate("2014-04-01T19:39:36.521Z")
}
• ‘post’ is a JavaScript object that represents the document. It has three keys: ‘title’, ‘content’, and ‘data’ (the creation timestamp).
• Insert the document into a collection:
> db.blog.insert(post)
• To see the collection:
> db.blog.find()
{ "_id" : ObjectId("533b16898bce20d2fd851cfc"), "title" : "My Blog Post", "content" : "This is a blog post.", "data" : ISODate("2014-04-01T19:39:36.521Z") }
> db.blog.findOne()
{
"_id" : ObjectId("533b16898bce20d2fd851cfc"),
"title" : "My Blog Post",
"content" : "This is a blog post.",
"data" : ISODate("2014-04-01T19:39:36.521Z")
}
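The calls above pass no query, so they return every document (or the first one). How a query document narrows the result can be sketched in memory, assuming only simple equality matching. The `blog` array and the `matches`, `find`, and `findOne` helpers below are stand-ins written for this example, not the shell's own functions.

```javascript
// In-memory stand-in for db.blog (not MongoDB code).
const blog = [
  { _id: 1, title: "My Blog Post", content: "This is a blog post." },
  { _id: 2, title: "Another Post", content: "More text." }
];

// A document matches when every key in the query has an equal value.
function matches(doc, query) {
  return Object.keys(query).every(key => doc[key] === query[key]);
}
const find = (collection, query = {}) => collection.filter(d => matches(d, query));
const findOne = (collection, query = {}) => find(collection, query)[0] || null;

const all = find(blog);                                  // empty query matches everything
const one = findOne(blog, { title: "Another Post" });    // first matching document
const none = findOne(blog, { title: "Missing" });        // no match gives null
console.log(all.length, one, none);
```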
• To see how MongoDB executed that query:
> db.blog.find().explain()
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 1,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
},
"server" : "Thuats-MacBook-Pro.local:27017"
}
• To update:
> post.comments = []
[ ]
> db.blog.update({title: "My Blog Post"}, post)
> db.blog.findOne()
{
"_id" : ObjectId("533b16898bce20d2fd851cfc"),
"title" : "My Blog Post",
"content" : "This is a blog post.",
"data" : ISODate("2014-04-01T19:39:36.521Z"),
"comments" : [ ]
}
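The update above passes the whole modified `post` object because an update whose second argument is a plain document replaces the stored document rather than merging into it. The in-memory sketch below imitates that replacement behaviour; the `blog` array and the `replaceFirst` helper are assumptions made for illustration and are not part of the shell.

```javascript
// In-memory stand-in for a collection (not MongoDB code).
const blog = [{ _id: 1, title: "My Blog Post", content: "This is a blog post." }];

// Replacement-style update: the first matching document is swapped for
// `replacement`; only _id is carried over from the stored document.
function replaceFirst(collection, query, replacement) {
  const i = collection.findIndex(d =>
    Object.keys(query).every(k => d[k] === query[k]));
  if (i === -1) return false;
  collection[i] = { _id: collection[i]._id, ...replacement };
  return true;
}

// Passing only the new field drops title and content from the stored document.
const replaced = replaceFirst(blog, { title: "My Blog Post" }, { comments: [] });
console.log(replaced, blog[0]);
```

This is why the slide first adds `comments` to the full `post` object and then sends that object as the new version.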
• To delete:
> db.blog.remove({title : "My Blog Post"})
> db.blog.findOne()
null
• To build an index:
> db.blog.ensureIndex({title:1})
• To show all existing indexes:
> db.blog.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "blog.blog",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"title" : 1
},
"ns" : "blog.blog",
"name" : "title_1"
}
]
• To remove an index:
> db.blog.dropIndex({title:1})
{ "nIndexesWas" : 2, "ok" : 1 }
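What an ascending index such as {title:1} buys can be sketched with a sorted list of keys and a binary search. This is a deliberate simplification: a sorted array stands in for the tree structure a real database index would use, and the `docs` array and the `buildIndex` and `lookup` helpers are invented for the example.

```javascript
// In-memory stand-in for a collection (not MongoDB code).
const docs = [
  { _id: 1, title: "Zebra facts" },
  { _id: 2, title: "My Blog Post" },
  { _id: 3, title: "Apple pie" }
];

// Build an ascending index on `field`: sorted [key, position] pairs.
function buildIndex(collection, field) {
  return collection
    .map((doc, pos) => [doc[field], pos])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

// Binary search over the index instead of scanning every document.
function lookup(collection, index, key) {
  let lo = 0, hi = index.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] === key) return collection[index[mid][1]];
    if (index[mid][0] < key) lo = mid + 1; else hi = mid - 1;
  }
  return null;
}

const titleIndex = buildIndex(docs, "title");
const found = lookup(docs, titleIndex, "My Blog Post");
const missing = lookup(docs, titleIndex, "No such title");
console.log(found, missing);
```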
MongoDB Application
• MongoDB Drivers and Client Libraries:
• MongoDB drivers are available for a variety of modern programming languages, including C, C++, C#, Java, Node.js, PHP, Python…
MongoDB Import/Export
• MongoDB can import files in JSON, CSV or TSV format with mongoimport, and can export a collection to JSON or CSV with mongoexport.
• Syntax:
mongoimport --collection collection --file collection.json
mongoexport --collection collection --out collection.json
• Import a CSV file (NASDAQ_daily_prices_B.csv) into the collection nasdaq_daily_prices of the database stocks:
$ cat NASDAQ_daily_prices_B.csv
exchange,stock_symbol,date,stock_price_open,stock_price_high,stock_price_low,stock_price_close,stock_volume,stock_price_adj_close
NASDAQ,BBND,2010-02-08,2.92,2.98,2.86,2.96,483800,2.96
NASDAQ,BBND,2010-02-05,2.85,2.94,2.79,2.93,884000,2.93
NASDAQ,BBND,2010-02-04,2.83,2.88,2.78,2.83,1333300,2.83
NASDAQ,BBND,2010-02-03,2.98,3.03,2.80,2.83,1015800,2.83
NASDAQ,BBND,2010-02-02,3.05,3.10,2.96,2.97,513100,2.97
NASDAQ,BBND,2010-02-01,3.11,3.13,3.00,3.04,997000,3.04
NASDAQ,BBND,2010-01-29,3.01,3.14,2.96,3.14,1132900,3.14
…
$ mongoimport --db stocks --collection nasdaq_daily_prices --type csv --file /Users/nqt289/Desktop/NASDAQ_daily_prices_B.csv --headerline
connected to: 127.0.0.1
Thu Apr 10 05:24:46.009 Progress: 780677/21998523 3%
Thu Apr 10 05:24:46.009 14000 4666/second
Thu Apr 10 05:24:49.004 Progress: 2011431/21998523 9%
Thu Apr 10 05:24:49.004 36200 6033/second
Thu Apr 10 05:24:52.004 Progress: 3300955/21998523 15%
Thu Apr 10 05:24:52.004 58600 6511/second
Thu Apr 10 05:24:55.005 Progress: 4575925/21998523 20%
Thu Apr 10 05:24:55.006 81300 6775/second
Thu Apr 10 05:24:58.009 Progress: 5845580/21998523 26%
Thu Apr 10 05:24:58.009 104000 6933/second
Thu Apr 10 05:25:34.005 374000 7333/second
…
Thu Apr 10 05:25:35.956 check 9 388777
Thu Apr 10 05:25:35.956 imported 388776 objects
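What the --headerline option asks for can be sketched in a few lines: the first CSV line supplies the field names, and each following line becomes one document. The `csvToDocuments` helper below is written for this example and is not part of mongoimport. It assumes no quoted fields or embedded commas, and it converts numeric-looking values to numbers, which is an assumption based on how the imported documents look in the shell.

```javascript
// First lines of the CSV file shown earlier.
const csv = [
  "exchange,stock_symbol,date,stock_price_open,stock_price_high,stock_price_low,stock_price_close,stock_volume,stock_price_adj_close",
  "NASDAQ,BBND,2010-02-08,2.92,2.98,2.86,2.96,483800,2.96",
  "NASDAQ,BBND,2010-02-03,2.98,3.03,2.80,2.83,1015800,2.83"
].join("\n");

// Use the header line for field names; turn numeric-looking values into numbers.
function csvToDocuments(text) {
  const [header, ...rows] = text.trim().split("\n");
  const fields = header.split(",");
  return rows.map(row => {
    const doc = {};
    row.split(",").forEach((value, i) => {
      const isNumber = value !== "" && !Number.isNaN(Number(value));
      doc[fields[i]] = isNumber ? Number(value) : value;
    });
    return doc;
  });
}

const documents = csvToDocuments(csv);
console.log(documents[1]);
```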
• Check the resulting collection in the shell:
> show dbs
blog 0.203125GB
local 0.078125GB
stocks 0.453125GB
> use stocks
switched to db stocks
> show tables
nasdaq_daily_prices
system.indexes
> db.nasdaq_daily_prices.find().limit(5)
{ "_id" : ObjectId("5346635c6857e587111a2466"), "exchange" : "NASDAQ", "stock_symbol" : "BBND", "date" : "2010-02-08", "stock_price_open" : 2.92, "stock_price_high" : 2.98, "stock_price_low" : 2.86, "stock_price_close" : 2.96, "stock_volume" : 483800, "stock_price_adj_close" : 2.96 }
{ "_id" : ObjectId("5346635c6857e587111a2467"), "exchange" : "NASDAQ", "stock_symbol" : "BBND", "date" : "2010-02-05", "stock_price_open" : 2.85, "stock_price_high" : 2.94, "stock_price_low" : 2.79, "stock_price_close" : 2.93, "stock_volume" : 884000, "stock_price_adj_close" : 2.93 }
{ "_id" : ObjectId("5346635c6857e587111a2468"), "exchange" : "NASDAQ", "stock_symbol" : "BBND", "date" : "2010-02-04", "stock_price_open" : 2.83, "stock_price_high" : 2.88, "stock_price_low" : 2.78, "stock_price_close" : 2.83, "stock_volume" : 1333300, "stock_price_adj_close" : 2.83 }
{ "_id" : ObjectId("5346635c6857e587111a2469"), "exchange" : "NASDAQ", "stock_symbol" : "BBND", "date" : "2010-02-03", "stock_price_open" : 2.98, "stock_price_high" : 3.03, "stock_price_low" : 2.8, "stock_price_close" : 2.83, "stock_volume" : 1015800, "stock_price_adj_close" : 2.83 }
{ "_id" : ObjectId("5346635c6857e587111a246a"), "exchange" : "NASDAQ", "stock_symbol" : "BBND", "date" : "2010-02-02", "stock_price_open" : 3.05, "stock_price_high" : 3.1, "stock_price_low" : 2.96, "stock_price_close" : 2.97, "stock_volume" : 513100, "stock_price_adj_close" : 2.97 }
• Export the documents of that collection whose opening price is at least 50 to JSON format:
$ mongoexport -d stocks -c nasdaq_daily_prices -q "{stock_price_open: { \$gte: 50 }}" --out /Users/nqt289/Desktop/gte50.json
connected to: 127.0.0.1
exported 9911 records
$ cat gte50.json
{ "_id" : { "$oid" : "5346635d6857e587111a4cda" }, "exchange" : "NASDAQ", "stock_symbol" : "BOLT", "date" : "2007-07-25", "stock_price_open" : 51, "stock_price_high" : 51.47, "stock_price_low" : 44.1, "stock_price_close" : 47.04, "stock_volume" : 1109600, "stock_price_adj_close" : 31.36 }
{ "_id" : { "$oid" : "5346635d6857e587111a4cdb" }, "exchange" : "NASDAQ", "stock_symbol" : "BOLT", "date" : "2007-07-24", "stock_price_open" : 52.4, "stock_price_high" : 52.4, "stock_price_low" : 48.55, "stock_price_close" : 49.43, "stock_volume" : 650600, "stock_price_adj_close" : 32.95 }
…
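The effect of the -q filter used in the export can be sketched in memory. The `prices` array below holds a few trimmed-down documents with values taken from the slides, and the filter covers only the single $gte condition from the example; it is not a general query engine and does not call mongoexport.

```javascript
// A few documents in the shape of the imported collection (fields trimmed).
const prices = [
  { stock_symbol: "BBND", date: "2010-02-08", stock_price_open: 2.92 },
  { stock_symbol: "BOLT", date: "2007-07-25", stock_price_open: 51 },
  { stock_symbol: "BOLT", date: "2007-07-24", stock_price_open: 52.4 }
];

// Same selection as the query {stock_price_open: {$gte: 50}}.
const selected = prices.filter(doc => doc.stock_price_open >= 50);

// One JSON document per line, as in the exported file.
const exported = selected.map(doc => JSON.stringify(doc)).join("\n");
console.log(exported);
```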