comparing sql and nosql dbs
TRANSCRIPT
![Page 1: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/1.jpg)
Comparing SQL and noSQL Databases
1
![Page 2: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/2.jpg)
The following is a short, incomplete history of the SQL1987 – Initial ISO/IEC Standard1989 – Referential Integrity1992 – SQL2
1995 SQL/CLI (ODBC)1996 SQL/PSM – Procedural Language extensions
1999 – User Defined Types2003 – SQL/XML2008 – Expansions and corrections2011 (or 2012) System Versioned and Application
Time Period Tables
2
Standard SQL
![Page 3: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/3.jpg)
Data stored in columns and tablesRelationships represented by dataData Manipulation LanguageData Definition Language TransactionsAbstraction from physical layer
3
SQL Characteristics
![Page 4: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/4.jpg)
Applications specify what, not howQuery optimization enginePhysical layer can change without modifying
applicationsCreate indexes to support queriesIn Memory databases
4
SQL Physical Layer Abstraction
![Page 5: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/5.jpg)
Data manipulated with Select, Insert, Update, & Delete statementsSelect T1.Column1, T2.Column2 …
From Table1, Table2 …Where T1.Column1 = T2.Column1 …
Data AggregationCompound statementsFunctions and ProceduresExplicit transaction control
5
Data Manipulation Language (DML)
![Page 6: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/6.jpg)
Schema defined at the startCreate Table (Column1 Datatype1, Column2
Datatype 2, …)Constraints to define and enforce relationships
Primary KeyForeign KeyEtc.
Triggers to respond to Insert, Update , & DeleteStored ModulesAlter …Drop …Security and Access Control
6
Data Definition Language
![Page 7: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/7.jpg)
Atomic – All of the work in a transaction completes (commit) or none of it completes
Consistent – A transaction transforms the database from one consistent state to another consistent state. Consistency is defined in terms of constraints.
Isolated – The results of any changes made during a transaction are not visible until the transaction has committed.
Durable – The results of a committed transaction survive failures
7
Transactions – ACID Properties
![Page 8: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/8.jpg)
CommercialIBM DB2Oracle RDMSMicrosoft SQL ServerSybase SQL Anywhere
Open Source (with commercial options) MySQLIngres
Significant portions of the world’s economy use SQL databases!
8
SQL Database Examples
![Page 9: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/9.jpg)
From www.nosql-database.org:Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontal scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more characteristics apply as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), a huge data amount, and more.
9
NoSQL Definition
![Page 10: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/10.jpg)
http://www.nosql-database.org/ lists 122 NoSQL DatabasesCassandraCouchDBHadoop & HbaseMongoDBStupidDBEtc.
10
NoSQL Products/Projects
![Page 11: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/11.jpg)
Large data volumesGoogle’s “big data”
Scalable replication and distributionPotentially thousands of machinesPotentially distributed around the world
Queries need to return answers quicklyMostly query, few updatesAsynchronous Inserts & UpdatesSchema-lessACID transaction properties are not needed – BASECAP TheoremOpen source development
11
NoSQL Distinguishing Characteristics
![Page 12: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/12.jpg)
Acronym contrived to be the opposite of ACIDBasically Available, Soft state,Eventually Consistent
CharacteristicsWeak consistency – stale data OKAvailability firstBest effortApproximate answers OKAggressive (optimistic)Simpler and faster
12
BASE Transactions
![Page 13: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/13.jpg)
A distributed system can support only two of the following characteristics: ConsistencyAvailabilityPartition toleranceThe slides from Brewer’s July 2000 talk do not define these characteristics.
13
Brewer’s CAP Theorem
![Page 14: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/14.jpg)
all nodes see the same data at the same time – Wikipedia
client perceives that a set of operations has occurred all at once – Pritchett
More like Atomic in ACID transaction properties
14
Consistency
![Page 15: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/15.jpg)
node failures do not prevent survivors from continuing to operate – Wikipedia
Every operation must terminate in an intended response – Pritchett
15
Availability
![Page 16: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/16.jpg)
the system continues to operate despite arbitrary message loss – Wikipedia
Operations will complete, even if individual components are unavailable – Pritchett
16
Partition Tolerance
![Page 17: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/17.jpg)
Discussing NoSQL databases is complicated because there are a variety of types:Column Store – Each storage block contains
data from only one columnDocument Store – stores documents made up
of tagged elementsKey-Value Store – Hash table of keys
17
NoSQL Database Types
![Page 18: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/18.jpg)
XML DatabasesGraph DatabasesCodasyl DatabasesObject Oriented DatabasesEtc…Will not address these today
18
Other Non-SQL Databases
![Page 19: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/19.jpg)
Each storage block contains data from only one column
Example: Hadoop/Hbasehttp://hadoop.apache.org/Yahoo, Facebook
Example: Ingres VectorWiseColumn Store integrated with an SQL databasehttp://www.ingres.com/products/vectorwise
19
NoSQL Example: Column Store
![Page 20: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/20.jpg)
More efficient than row (or document) store if:Multiple row/record/documents are inserted at
the same time so updates of column blocks can be aggregated
Retrievals access only some of the columns in a row/record/document
20
Column Store Comments
![Page 21: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/21.jpg)
Example: CouchDBhttp://couchdb.apache.org/BBC
Example: MongoDBhttp://www.mongodb.org/Foursquare, Shutterfly
JSON – JavaScript Object Notation
21
NoSQL Example: Document Store
![Page 22: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/22.jpg)
{ "_id": "guid goes here", "_rev": "314159",
"type": "abstract", "author": "Keith W. Hare"
"title": "SQL Standard and NoSQL Databases",
"body": "NoSQL databases (either no-SQL or Not Only SQL) are currently a hot topic in some parts of computing.", "creation_timestamp": "2011/05/10 13:30:00 +0004"}
22
CouchDB JSON Example
![Page 23: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/23.jpg)
"_id" GUID – Global Unique IdentifierPassed in or generated by CouchDB
"_rev"Revision numberVersioning mechanism
"type", "author", "title", etc. Arbitrary tagsSchema-lessCould be validated after the fact by user-written
routine23
CouchDB JSON Tags
![Page 24: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/24.jpg)
Hash tables of KeysValues stored with KeysFast access to small data valuesExample – Project-Voldemort
http://www.project-voldemort.com/Linkedin
Example – MemCacheDBhttp://memcachedb.org/Backend storage is Berkeley-DB
24
NoSQL Examples: Key-Value Store
![Page 25: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/25.jpg)
Technique for indexing and searching large data volumes
Two Phases, Map and ReduceMap
Extract sets of Key-Value pairs from underlying data
Potentially in Parallel on multiple machinesReduce
Merge and sort sets of Key-Value pairsResults may be useful for other searches
25
Map Reduce
![Page 26: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/26.jpg)
Map Reduce techniques differ across products
Implemented by application developers, not by underlying software
26
Map Reduce
![Page 27: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/27.jpg)
Google granted US Patent 7,650,331, January 2010System and method for efficient large-scale data processing
A large-scale data processing system and method includes one or more application-independent map modules configured to read input data and to apply at least one application-specific map operation to the input data to produce intermediate data values, wherein the map operation is automatically parallelized across multiple processors in the parallel processing environment. A plurality of intermediate data structures are used to store the intermediate data values. One or more application-independent reduce modules are configured to retrieve the intermediate data values and to apply at least one application-specific reduce operation to the intermediate data values to provide output data.
27
Map Reduce Patent
![Page 28: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/28.jpg)
Syntax variesHTMLJava ScriptEtc.
Asynchronous – Inserts and updates do not wait for confirmation
VersionedOptimistic Concurrency
28
Storing and Modifying Data
![Page 29: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/29.jpg)
Syntax VariesNo set-based query languageProcedural program languages such as Java, C,
etc.Application specifies retrieval pathNo query optimizerQuick answer is importantMay not be a single “right” answer
29
Retrieving Data
![Page 30: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/30.jpg)
Small upfront software costsSuitable for large scale distribution on
commodity hardware
30
Open Source
![Page 31: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/31.jpg)
NoSQL databases reject:Overhead of ACID transactions“Complexity” of SQLBurden of up-front schema designDeclarative query expression Yesterday’s technology
Programmer responsible forStep-by-step procedural languageNavigating access path
31
NoSQL Summary
![Page 32: Comparing sql and nosql dbs](https://reader036.vdocument.in/reader036/viewer/2022062822/588279b31a28ab24788b517b/html5/thumbnails/32.jpg)
SQL DatabasesPredefined SchemaStandard definition and interface languageTight consistencyWell defined semantics
NoSQL DatabaseNo predefined SchemaPer-product definition and interface languageGetting an answer quickly is more important
than getting a correct answer
32
Summary