![Page 1: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/1.jpg)
A Practical Look at the NOSQL and Big Data Hullabaloo
Level: Intermediate
Andrew J. Brust
CEO and Founder
Blue Badge Insights
Sam Bisbee
Senior Doing Stuff Person
Cloudant (In Absentia)
![Page 2: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/2.jpg)
Meet Andrew
• CEO and Founder, Blue Badge Insights
• Big Data blogger for ZDNet
• Microsoft Regional Director, MVP
• Co-chair VSLive! and 17 years as a speaker
• Founder, Microsoft BI User Group of NYC
– http://www.msbinyc.com
• Co-moderator, NYC .NET Developers Group
– http://www.nycdotnetdev.com
• “Redmond Review” columnist for Visual Studio Magazine and Redmond Developer News
• brustblog.com, Twitter: @andrewbrust
![Page 4: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/4.jpg)
Read all about it!
![Page 5: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/5.jpg)
Meet Sam
• Wait…you can’t. He’s not here.
• Sam Bisbee
– Director of Technical Business Development, Cloudant
– He prefers “Senior Doing Stuff Person”
Which is ironic
• I’ve preserved a few of his slides.
• Look for: From Sam in upper-right-hand corner
![Page 6: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/6.jpg)
Agenda
• Why NoSQL?
• NoSQL Definition(s)
• Concepts
• NoSQL Categories
• Provisioning, market, applicability
• Take-aways
![Page 8: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/8.jpg)
NoSQL Data Fodder
• Addresses
• Preferences
• Notes
• Friends, Followers
• Documents
![Page 9: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/9.jpg)
“Web Scale”
• This is the term used to justify NoSQL
• Scenario is simple needs but “made up for in volume”
– Millions of concurrent users
• Think of sites like Amazon or Google
• Think of non-transactional tasks like loading catalog data to display a product page, or environment preferences
![Page 10: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/10.jpg)
NOSQL DEFINITION(S)
![Page 11: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/11.jpg)
What is NOSQL?
• “Not Only SQL” - this is not a holy war
• 1870: Modern study of set theory begins
• 1970: Codd writes “A Relational Model of Data for Large Shared Data Banks”
• 1970 – 1980: Commercial implementations of Codd's theory are released
From Sam
![Page 12: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/12.jpg)
What is NOSQL?
• 1970 - ~2000: the same sorts of databases were made (plus a few niche products)
• Dot-Com Bubble forced the same data tier problems but at a new scale (Amazon), forcing innovation out of necessity
• 2000 – present: innovations are becoming open source and “main stream” (Hadoop)
From Sam
![Page 13: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/13.jpg)
So What is NOSQL Really?
New ways of looking at dynamic data storage
and querying for larger scale systems.
(scale = concurrent users and data size)
From Sam
![Page 14: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/14.jpg)
NoSQL Common Traits
• Non-relational
• Non-schematized/schema-free
• Open source
• Distributed
• Eventual consistency
• “Web scale”
• Developed at big Internet companies
![Page 15: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/15.jpg)
CONCEPTS
![Page 16: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/16.jpg)
Consistency
• CAP Theorem
– Databases may only excel at two of the following three attributes: consistency, availability and partition tolerance
• NoSQL does not offer “ACID” guarantees
– Atomicity, consistency, isolation and durability
• Instead offers “eventual consistency”
– Similar to DNS propagation
![Page 17: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/17.jpg)
Consistency
• Things like inventory, account balances should be consistent
– Imagine updating a server in Seattle that stock was depleted
– Imagine not updating the server in NY
– Customer in NY goes to order 50 pieces of the item
– Order processed even though no stock
• Things like catalog information don’t have to be, at least not immediately
– If a new item is entered into the catalog, it’s OK for some customers to see it even before the other customers’ servers know about it
• But catalog info must come up quickly
– Therefore don’t lock data in one location while waiting to update the other
• Therefore, OK to sacrifice consistency for speed, in some cases
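The Seattle/NY inventory scenario can be sketched in a few lines of Python. This is a toy model and not any product's replication protocol: the `Replica` class, the `replicate` helper and the stock figures are assumptions made up for illustration.

```python
class Replica:
    """One server's local copy of the inventory (hypothetical toy model)."""

    def __init__(self, name, stock):
        self.name = name
        self.stock = dict(stock)
        self.outbox = []  # local writes not yet shipped to other replicas

    def set_stock(self, item, quantity):
        # Accept the write locally and return at once; no cross-site lock.
        self.stock[item] = quantity
        self.outbox.append((item, quantity))

    def can_order(self, item, quantity):
        # Reads consult only local data, so the answer may be stale.
        return self.stock.get(item, 0) >= quantity


def replicate(source, target):
    """Ship pending writes from source to target (the 'eventual' part)."""
    for item, quantity in source.outbox:
        target.stock[item] = quantity
    source.outbox.clear()


seattle = Replica("Seattle", {"widget": 50})
new_york = Replica("New York", {"widget": 50})

seattle.set_stock("widget", 0)                   # stock depleted in Seattle
stale_answer = new_york.can_order("widget", 50)  # NY has not heard yet
replicate(seattle, new_york)                     # replicas converge
fresh_answer = new_york.can_order("widget", 50)

print(stale_answer, fresh_answer)
```

Before replication runs, New York would accept the order for 50 pieces; afterwards it would not. That window is acceptable for catalog data and a problem for inventory or account balances.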
![Page 18: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/18.jpg)
CAP Theorem
[Diagram: the three CAP attributes (Consistency, Availability, Partition Tolerance), with Relational and NoSQL databases positioned among them]
![Page 19: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/19.jpg)
Indexing
• Most NoSQL databases are indexed by key
• Some allow so-called “secondary” indexes
• Often the primary key indexes are clustered
• HBase uses Hadoop Distributed File System, which is append-only
– Writes are logged
– Logged writes are batched
– File is re-created and sorted
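The log, batch, and re-create-sorted sequence can be illustrated with a small in-memory model. This is a sketch of the general pattern only, not HBase's actual code; the class name `LogStructuredStore`, its methods and the `batch_size` parameter are all hypothetical.

```python
class LogStructuredStore:
    """Toy append-only store: log writes, batch them, rewrite sorted files."""

    def __init__(self, batch_size=3):
        self.batch_size = batch_size
        self.write_log = []     # every write is appended here first
        self.buffer = {}        # batched writes awaiting a flush
        self.sorted_files = []  # immutable, key-sorted lists of (key, value)

    def put(self, key, value):
        self.write_log.append((key, value))  # 1. the write is logged
        self.buffer[key] = value             # 2. logged writes are batched
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # 3. the batch is written out as a new file, sorted by key
        if self.buffer:
            self.sorted_files.append(sorted(self.buffer.items()))
            self.buffer = {}
            self.write_log = []  # flushed entries no longer need the log

    def compact(self):
        # Re-create a single sorted file; newer files override older ones.
        merged = {}
        for data_file in self.sorted_files:
            merged.update(dict(data_file))
        self.sorted_files = [sorted(merged.items())]

    def get(self, key):
        if key in self.buffer:
            return self.buffer[key]
        for data_file in reversed(self.sorted_files):  # newest file first
            for stored_key, value in data_file:
                if stored_key == key:
                    return value
        return None


store = LogStructuredStore(batch_size=2)
store.put("row3", "c")
store.put("row1", "a")   # second write fills the batch and triggers a flush
store.put("row1", "A")   # an update is a new write, never an in-place edit
store.put("row2", "b")
store.compact()
print(store.sorted_files)
```

Because existing files are never modified, an update is just another write, and the old value disappears only when the file is re-created.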
![Page 20: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/20.jpg)
Queries
• Typically no query language
• Instead, create procedural program
• Sometimes SQL is supported
• Sometimes MapReduce code is used…
![Page 21: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/21.jpg)
MapReduce
• Map step: pre-processes data
• Reduce step: summarizes/aggregates data
• Most typical of Hadoop and used with Wide Column Stores, esp. HBase
• Amazon Web Services’ Elastic MapReduce (EMR) can read/write DynamoDB, S3, Relational Database Service (RDS)
• “Hive” offers a HiveQL (SQL-like) abstraction over MR
– Use with Hive tables
– Use with HBase
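To make the map and reduce steps concrete, here is a minimal single-process sketch in plain Python. It does not use the Hadoop API; the function names are hypothetical, and order 1503 is invented so that one key has more than one value to aggregate.

```python
from collections import defaultdict

orders = [
    {"id": 1501, "price": 300, "currency": "USD"},
    {"id": 1502, "price": 2500, "currency": "GBP"},
    {"id": 1503, "price": 120, "currency": "USD"},  # invented for the example
]


def map_order(order):
    # Map step: pre-process one record into (key, value) pairs.
    yield order["currency"], order["price"]


def reduce_prices(currency, prices):
    # Reduce step: summarize all values that share a key.
    return currency, sum(prices)


def run_map_reduce(records, mapper, reducer):
    # Group mapper output by key (the "shuffle"), then reduce each group.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


totals = run_map_reduce(orders, map_order, reduce_prices)
print(totals)
```

A roughly equivalent SQL-like query would be a `SELECT currency, SUM(price) ... GROUP BY currency`, which is the kind of abstraction a layer such as Hive provides over hand-written map and reduce functions.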
![Page 22: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/22.jpg)
Sharding
• A partitioning pattern where separate servers store partitions
• Fan-out queries supported
• Partitions may be duplicated, so replication also provided
– Good for disaster recovery
• Since “shards” can be geographically distributed, sharding can act like a CDN
• Good for keeping data close to processing
– Reduces network traffic when MapReduce splitting takes place
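A minimal sketch of hash-based sharding with a fan-out query follows. Each "server" is just a Python dict here, and hashing the key is only one possible partitioning scheme; the `ShardedStore` class and its method names are assumptions for illustration, not any product's API.

```python
import hashlib


class ShardedStore:
    """Toy sharded table: each shard stands in for a separate server."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash of the key decides which shard owns the row.
        digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, row):
        self._shard_for(key)[key] = row

    def get(self, key):
        # A key lookup touches exactly one shard.
        return self._shard_for(key).get(key)

    def fan_out(self, predicate):
        # A non-key query must be sent to every shard and the results merged.
        results = []
        for shard in self.shards:
            results.extend(row for row in shard.values() if predicate(row))
        return results


store = ShardedStore(shard_count=3)
store.put(101, {"First_Name": "Andrew", "Last_Name": "Brust"})
store.put(202, {"First_Name": "Jane", "Last_Name": "Doe"})

matches = store.fan_out(lambda row: row["Last_Name"] == "Doe")
print(store.get(101), matches)
```

The contrast between `get` and `fan_out` is the point: lookups by key stay cheap as shards are added, while queries on anything else grow with the number of shards.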
![Page 23: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/23.jpg)
NOSQL CATEGORIES
![Page 24: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/24.jpg)
Key-Value Stores
• The most common; not necessarily the most popular
• Has rows, each with something like a big dictionary/associative array
– Schema may differ from row to row
• Common on Cloud platforms
– e.g. Amazon SimpleDB, Azure Table Storage
• MemcacheDB, Voldemort, Couchbase
• DynamoDB (AWS), Dynomite, Redis and Riak
![Page 25: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/25.jpg)
Key-Value Stores
Database
Table: Customers
Row ID: 101
• First_Name: Andrew
• Last_Name: Brust
• Address: 123 Main Street
• Last_Order: 1501
Row ID: 202
• First_Name: Jane
• Last_Name: Doe
• Address: 321 Elm Street
• Last_Order: 1502
Table: Orders
Row ID: 1501
• Price: 300 USD
• Item1: 52134
• Item2: 24457
Row ID: 1502
• Price: 2500 GBP
• Item1: 98456
• Item2: 59428
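The Customers and Orders tables on this slide map naturally onto nested dictionaries. The sketch below uses plain Python dicts rather than any particular key-value product; row 303 and its `Twitter` field are invented to show that the schema may differ from row to row.

```python
# Each table maps a row ID to a dictionary of that row's key/value pairs.
customers = {
    101: {"First_Name": "Andrew", "Last_Name": "Brust",
          "Address": "123 Main Street", "Last_Order": 1501},
    202: {"First_Name": "Jane", "Last_Name": "Doe",
          "Address": "321 Elm Street", "Last_Order": 1502},
}
orders = {
    1501: {"Price": "300 USD", "Item1": 52134, "Item2": 24457},
    1502: {"Price": "2500 GBP", "Item1": 98456, "Item2": 59428},
}

# Hypothetical extra row: no schema forces it to match the other rows.
customers[303] = {"First_Name": "Joe", "Twitter": "@joe"}

# There is no join: the application follows the stored key itself.
last_order = orders[customers[101]["Last_Order"]]
print(last_order["Price"])
```

Note that `Last_Order` is only a value that happens to match a key in another table; nothing in the store enforces that relationship.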
![Page 26: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/26.jpg)
Wide Column Stores
• Has tables with declared column families
– Each column family has “columns,” which are KV pairs that can vary from row to row
• These are the most foundational for large sites
– Big Table (Google)
– HBase (Originally part of Yahoo-dominated Hadoop project)
– Cassandra (Facebook)
Calls column families “super columns” and tables “super column families”
• They are the most “Big Data”-ready
– Especially HBase + Hadoop
![Page 27: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/27.jpg)
Wide Column Stores
Table: Customers
Row ID: 101
• Super Column: Name
– Column: First_Name: Andrew
– Column: Last_Name: Brust
• Super Column: Address
– Column: Number: 123
– Column: Street: Main Street
• Super Column: Orders
– Column: Last_Order: 1501
Row ID: 202
• Super Column: Name
– Column: First_Name: Jane
– Column: Last_Name: Doe
• Super Column: Address
– Column: Number: 321
– Column: Street: Elm Street
• Super Column: Orders
– Column: Last_Order: 1502
Table: Orders
Row ID: 1501
• Super Column: Pricing
– Column: Price: 300 USD
• Super Column: Items
– Column: Item1: 52134
– Column: Item2: 24457
Row ID: 1502
• Super Column: Pricing
– Column: Price: 2500 GBP
• Super Column: Items
– Column: Item1: 98456
– Column: Item2: 59428
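The extra level of nesting in a wide column store (row, then column family or "super column", then column) can be modeled as dictionaries of dictionaries. This is an illustrative sketch only: `get_column` and `put_column` are hypothetical helpers, not calls from HBase or Cassandra, and the `Apartment` column is invented.

```python
# table -> row ID -> column family ("super column") -> column -> value
customers = {
    101: {
        "Name": {"First_Name": "Andrew", "Last_Name": "Brust"},
        "Address": {"Number": 123, "Street": "Main Street"},
        "Orders": {"Last_Order": 1501},
    },
    202: {
        "Name": {"First_Name": "Jane", "Last_Name": "Doe"},
        "Address": {"Number": 321, "Street": "Elm Street"},
        "Orders": {"Last_Order": 1502},
    },
}


def get_column(table, row_id, family, column, default=None):
    # Missing rows, families, or columns simply yield the default.
    return table.get(row_id, {}).get(family, {}).get(column, default)


def put_column(table, row_id, family, column, value):
    # Families are fixed per table in a real store; columns are free-form.
    table.setdefault(row_id, {}).setdefault(family, {})[column] = value


# Invented column: only row 202 has it, and no other row is affected.
put_column(customers, 202, "Address", "Apartment", "4B")

print(get_column(customers, 202, "Address", "Apartment"))
print(get_column(customers, 101, "Address", "Apartment"))
```

The columns inside a family behave like the key/value pairs of a key-value store; the family is the part that is declared up front.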
![Page 28: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/28.jpg)
Wide Column Stores
![Page 29: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/29.jpg)
Document Stores
• Have “databases,” which are akin to tables
• Have “documents,” akin to rows
– Documents are typically JSON objects
– Each document has properties and values
– Values can be scalars, arrays, links to documents in other databases or sub-documents (i.e. contained JSON objects, which allows for hierarchical storage)
– Can have attachments as well
• Old versions are retained
– So Doc Stores work well for content management
• Some view doc stores as specialized KV stores
• Most popular with developers, startups, VCs
• The biggies:
– CouchDB
Derivatives
– MongoDB
![Page 30: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/30.jpg)
Document Store Application Orientation
• Documents can each be addressed by URIs
• CouchDB supports full REST interface
• Very geared towards JavaScript and JSON
– Documents are JSON objects
– CouchDB/MongoDB use JavaScript as native language
• In CouchDB, “view functions” also have unique URIs and they return HTML
– So you can build entire applications in the database
![Page 31: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/31.jpg)
Document Stores
Database: Customers
Document ID: 101
• First_Name: Andrew
• Last_Name: Brust
• Address:
– Number: 123
– Street: Main Street
• Orders:
– Most_recent: 1501
Document ID: 202
• First_Name: Jane
• Last_Name: Doe
• Address:
– Number: 321
– Street: Elm Street
• Orders:
– Most_recent: 1502
Database: Orders
Document ID: 1501
• Price: 300 USD
• Item1: 52134
• Item2: 24457
Document ID: 1502
• Price: 2500 GBP
• Item1: 98456
• Item2: 59428
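As a rough illustration of documents with sub-documents and links to another database, the customer and order documents can be written as JSON-compatible Python objects. The sketch uses only the standard `json` module and in-memory dicts; it is not the CouchDB or MongoDB API, and `most_recent_order` is a hypothetical helper.

```python
import json

# Each "database" maps a document ID to a JSON-style document.
customers_db = {
    "101": {
        "First_Name": "Andrew",
        "Last_Name": "Brust",
        "Address": {"Number": 123, "Street": "Main Street"},  # sub-document
        "Orders": {"Most_recent": "1501"},  # link to a doc in another database
    },
}
orders_db = {
    "1501": {"Price": "300 USD", "Item1": 52134, "Item2": 24457},
}


def most_recent_order(customer_id):
    # Follow the stored document ID into the other database.
    customer = customers_db[customer_id]
    return orders_db[customer["Orders"]["Most_recent"]]


# The whole hierarchy serializes to JSON and back without a mapping layer.
serialized = json.dumps(customers_db["101"], indent=2)
round_tripped = json.loads(serialized)

print(serialized)
print(most_recent_order("101")["Price"])
```

Unlike the flat key-value version, the address here is a nested object, so a whole customer can be stored and fetched as one hierarchical unit.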
![Page 32: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/32.jpg)
Document Stores
![Page 33: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/33.jpg)
Graph Databases
• Great for social network applications and others where relationships are important
• Nodes and edges
– Edge like a join
– Nodes like rows in a table
• Nodes can also have properties and values
• Neo4j is a popular graph db
![Page 34: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/34.jpg)
Graph Databases
[Diagram: a graph database. Nodes: Joe Smith, Jane Doe, Andrew Brust, George Washington; an address (Street: 123 Main Street, City: New York, State: NY, Zip: 10014); an order (ID: 252, Total Price: 300 USD); two items (ID: 52134, Type: Dress, Color: Blue; ID: 24457, Type: Shirt, Color: Red). Edge labels: Sent invitation to, Commented on photo by, Friend of, Address, Placed order, Item1, Item2.]
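A small sketch of nodes with properties and labeled edges follows. It is a plain adjacency-list model, not Neo4j's API. The slide's arrows did not survive in this text, so which node each edge connects is an assumption made here for illustration, as are the node IDs and the `Graph` class itself.

```python
from collections import defaultdict


class Graph:
    """Toy property graph: nodes carry properties, edges carry a label."""

    def __init__(self):
        self.nodes = {}
        self.edges = defaultdict(list)  # source ID -> [(label, target ID)]

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, label, target):
        self.edges[source].append((label, target))

    def neighbors(self, source, label):
        # Follow only the edges that carry the requested label.
        return [target for edge_label, target in self.edges[source]
                if edge_label == label]


graph = Graph()
graph.add_node("andrew", name="Andrew Brust")
graph.add_node("jane", name="Jane Doe")
graph.add_node("order252", total_price="300 USD")
graph.add_node("item52134", type="Dress", color="Blue")
graph.add_node("item24457", type="Shirt", color="Red")

# Assumed connections; the original diagram's arrows are not recoverable.
graph.add_edge("andrew", "Friend of", "jane")
graph.add_edge("andrew", "Placed order", "order252")
graph.add_edge("order252", "Item1", "item52134")
graph.add_edge("order252", "Item2", "item24457")

# Traversal in place of a join: items on every order Andrew placed.
item_types = [
    graph.nodes[item]["type"]
    for order in graph.neighbors("andrew", "Placed order")
    for label in ("Item1", "Item2")
    for item in graph.neighbors(order, label)
]
print(item_types)
```

The traversal at the end plays the role that a chain of joins would play in a relational query: each hop follows an edge instead of matching keys across tables.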
![Page 35: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/35.jpg)
PROVISIONING, MARKET, APPLICABILITY
![Page 36: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/36.jpg)
NoSQL on Windows Azure
• Platform as a Service
– Cloudant: https://cloudant.com/azure/
– MongoDB (via MongoLab): http://blog.mongolab.com/2012/10/azure/
• MongoDB, DIY:
– On an Azure Worker Role: http://www.mongodb.org/display/DOCS/MongoDB+on+Azure+Worker+Roles
– On a Windows VM: http://www.mongodb.org/display/DOCS/MongoDB+on+Azure+VM+-+Windows+Installer
– On a Linux VM: http://www.mongodb.org/display/DOCS/MongoDB+on+Azure+VM+-+Linux+Tutorial and http://www.windowsazure.com/en-us/manage/linux/common-tasks/mongodb-on-a-linux-vm/
![Page 37: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/37.jpg)
NoSQL on Windows Azure
• Others, DIY (Linux VMs):
– Couchbase: http://blog.couchbase.com/couchbase-server-new-windows-azure
– CouchDB: http://ossonazure.interoperabilitybridges.com/articles/couchdb-installer-for-windows-azure
– Riak: http://basho.com/blog/technical/2012/10/09/Riak-on-Microsoft-Azure/
– Redis: http://blogs.msdn.com/b/tconte/archive/2012/06/08/running-redis-on-a-centos-linux-vm-in-windows-azure.aspx
– Cassandra: http://www.windowsazure.com/en-us/manage/linux/other-resources/how-to-run-cassandra-with-linux/
![Page 38: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/38.jpg)
The High-Level Shake Out
• Hadoop will continue to crush data warehousing
• MongoDB will be the top MySQL / on-prem alternative
• Cloudant will be the top as-a-Service / Cloud database
• Basho is pivoting toward cloud object store
From Sam
![Page 39: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/39.jpg)
NoSQL + BI
• NoSQL databases are bad for ad hoc query and data warehousing
• BI applications involve models; models rely on schema
• Extract, transform and load (ETL) may be your friend
• Wide-column stores, however, are good for “Big Data”
– See next slide
• Wide-column stores and column-oriented databases are similar technologically
![Page 40: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/40.jpg)
NoSQL + Big Data
• Big Data and NoSQL are interrelated
• Typically, Wide-Column stores used in Big Data scenarios
• Prime example:
– HBase and Hadoop
• Why?
– Lack of indexing not a problem
– Consistency not an issue
– Fast reads very important
– Distributed file systems important too
– Commodity hardware and disk assumptions also important
– Not Web scale but massive scale-out, so similar concerns
![Page 41: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/41.jpg)
TAKE-AWAYS
![Page 42: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/42.jpg)
Compromises
• Eventual consistency
• Write buffering
• Only primary keys can be indexed
• Queries must be written as programs
• Tooling
– Productivity (= money)
![Page 43: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/43.jpg)
Summing Up
• Line of Business -> Relational
• Large, public (consumer)-facing sites -> NoSQL
• Complex data structures -> Relational
• Big Data -> NoSQL
• Transactional -> Relational
• Content Management -> NoSQL
• Enterprise -> Relational
• Consumer Web -> NoSQL
![Page 44: A Practical Look at the NOSQL and Big Data Hullabaloo](https://reader033.vdocument.in/reader033/viewer/2022061217/54b58ca24a79592a3e8b4604/html5/thumbnails/44.jpg)
Thank you
• [email protected]
• @andrewbrust on Twitter
• Want to get the free “Redmond Roundup Plus?” Text “bluebadge” to 22828