NOSQL for Dummies
Tobias Ivarsson
Hacker @ Neo Technology
twitter: @thobe / #neo4j
email: [email protected]
web: http://www.neo4j.org/
web: http://www.thobe.org/
Image credit: http://browsertoolkit.com/fault-tolerance.png
This is still the view a lot of people have of NOSQL.
NOSQL - Defined by what it is Not
๏“Any database that is not a Relational Database”
๏The term was coined at a meetup with the creators behind some prominent emerging databases
๏“Non-Relational Databases” might be more correct, but it’s a mouthful!
๏ ... then there was a conference ...
๏ ... and a mailing list ...
๏ ... the name caught on ...
๏ ... then there were more conferences ...
๏ ... and here we are!
NOSQL - What’s in the name...
NO to SQL
It’s not about saying that SQL should never be used, or that SQL is dead...
Not Only SQL
It’s about recognizing that for some problems other storage solutions are better suited!
Four trends
NOSQL - Why now?
Trend 1: Data size
Exabytes (10¹⁸ bytes) of data stored per year (data source: IDC 2007):
2006: 161
2007: 253
2008: 397
2009: 623
2010: 988
Each year more and more digital data is created. Over two years we create more digital data than all the data created in history before that.
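The growth claim can be sanity-checked against the five figures in the chart. The sketch below is only an illustration: it assumes the values 161, 253, 397, 623 and 988 map to the years 2006 to 2010 in order, and it covers only that window, not "all history".

```python
# Exabytes stored per year as read off the chart (assumed year-to-value mapping).
exabytes_per_year = {2006: 161, 2007: 253, 2008: 397, 2009: 623, 2010: 988}

def growth_factors(series):
    """Year-over-year growth factor for each consecutive pair of years."""
    years = sorted(series)
    return {year: series[year] / series[prev] for prev, year in zip(years, years[1:])}

def last_two_vs_earlier(series):
    """Sum of the last two years, and the sum of all earlier years in the series."""
    years = sorted(series)
    recent = sum(series[year] for year in years[-2:])
    earlier = sum(series[year] for year in years[:-2])
    return recent, earlier

factors = growth_factors(exabytes_per_year)
recent, earlier = last_two_vs_earlier(exabytes_per_year)
print(factors)
print(recent, earlier)
```

Within this window the last two years add up to more than the three years before them, which is the pattern the slide describes.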
Trend 2: Connectedness
[Figure: information connectivity plotted over time, 1990 to 2020 (web 1.0, web 2.0, “web 3.0”). Items plotted: text documents, hypertext, blogs, RSS, wikis, user-generated content, tagging, folksonomies, ontologies, RDF, Giant Global Graph (GGG).]
Over time data has evolved to be more and more interlinked and connected. Hypertext has links, blogs have pingback, tagging groups all related data.
Trend 3: Semi-structure
๏ Individualization of content
• In the salary lists of the 1970s, all elements had exactly one job
• In the salary lists of the 2000s, we need 5 job columns! Or 8? Or 15?
๏All encompassing “entire world views”
• Store more data about each entity
๏Trend accelerated by the decentralization of content generation that is the hallmark of the age of participation (“web 2.0”)
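The salary-list example can be made concrete with a few records. This is a minimal sketch with made-up names and jobs (all hypothetical), contrasting a fixed set of job columns with records that carry only the fields they need.

```python
# Hypothetical salary-list records; names and jobs are made up for illustration.

# 1970s style: a fixed schema with exactly one job column per person.
fixed_row = {"name": "Alice", "job": "clerk"}

# 2000s style forced into fixed columns: how many job columns are enough?
wide_row = {"name": "Bob", "job1": "developer", "job2": "blogger",
            "job3": None, "job4": None, "job5": None}

# Semi-structured style: each record carries only the fields it needs.
records = [
    {"name": "Alice", "jobs": ["clerk"]},
    {"name": "Bob", "jobs": ["developer", "blogger"], "twitter": "@bob"},
]

def jobs_of(record):
    """Return a person's jobs, however many there are (possibly none)."""
    return record.get("jobs", [])

def unused_columns(row):
    """Count the padding columns that a fixed-width row leaves empty."""
    return sum(1 for value in row.values() if value is None)

print(jobs_of(records[1]))
print(unused_columns(wide_row))
```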
Trend 4: Architecture
1980s: Mainframe applications
[Figure: one Application on top of one DB]
1990s: Database as integration hub
[Figure: three Applications sharing one DB]
2000s: (moving towards) Decoupled services with their own backend
[Figure: three Applications, each with its own DB]
Why NOSQL Now?
๏Trend 1: Size
๏Trend 2: Connectedness
๏Trend 3: Semi-structure
๏Trend 4: Architecture
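RDBMS performance
[Figure: performance plotted against data complexity, with one curve labeled “Relational database” and one labeled “Requirement of application”. Labeled points: Salary List, Majority of Webapps, Social network, Semantic Trading, and a bracket labeled “custom”.]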
We are building applications today that have size and load requirements that
Four emerging NOSQL categories
Key-Value stores
๏Focus on scaling to huge amounts of data
๏Designed to handle massive load
๏Based on Amazon’s Dynamo paper
๏Data model: (global) collection of Key-Value pairs
๏Dynamo ring partitioning and replication
๏Examples:
•Dynomite
•Voldemort
•Tokyo{Tyrant, Cabinet, etc...}
[Figure: a ring of seven stores labeled A to G]
We find the position of each object by its key. Here the keys are the names of the objects, alphabetically sorted. Each object is replicated in a few other stores for redundancy; in this example we use 3 replicas.
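The ring placement described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Dynamo algorithm itself: the `Ring` class and its methods are hypothetical names, stores are labeled A to G as in the figure, and an object is placed by the first letter of its name. Dynamo-style stores normally hash the key onto the ring instead of using alphabetical order.

```python
from bisect import bisect_left

class Ring:
    """Toy ring of stores; an object is placed by the first letter of its key."""

    def __init__(self, stores, replicas=3):
        self.stores = sorted(stores)   # store labels in ring order
        self.replicas = replicas       # total copies kept of each object

    def position(self, key):
        """Index of the first store whose label sorts at or after the key (wraps around)."""
        return bisect_left(self.stores, key[:1].upper()) % len(self.stores)

    def stores_for(self, key):
        """The primary store plus the next stores clockwise, `replicas` in total."""
        start = self.position(key)
        count = min(self.replicas, len(self.stores))
        return [self.stores[(start + i) % len(self.stores)] for i in range(count)]

ring = Ring("ABCDEFG")
print(ring.stores_for("cat"))    # ['C', 'D', 'E']
print(ring.stores_for("zebra"))  # ['A', 'B', 'C'] (wraps around the ring)
```

Because the replicas are simply the next stores on the ring, a client that knows the ring layout can work out where every copy lives without asking a central directory.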
BigTable clones
๏Like column-oriented Relational Databases, but with a twist
๏Tables similar to an RDBMS, but able to handle semi-structured data
๏Based on Google’s BigTable paper
๏Data model:
‣Columns → column families → ACL
‣Datums keyed by: row, column, time, index
‣Row-range → tablet → distribution
๏Examples:
•HBase
•Hypertable
•Cassandra
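The data model above can be sketched as a sparse, versioned map. The `TinyTable` class below is a hypothetical illustration, not the API of HBase, Hypertable or Cassandra: it keys each datum by row, column (written as `family:qualifier`) and timestamp, leaves out the "index" part of the key, and treats a tablet as nothing more than a sorted row range.

```python
import time
from collections import defaultdict

class TinyTable:
    """Toy sparse table: values are keyed by (row, "family:qualifier", timestamp)."""

    def __init__(self):
        # row key -> column name -> list of (timestamp, value), ordered by time
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        versions = self._rows[row][column]
        versions.append((ts, value))
        versions.sort(key=lambda pair: pair[0])  # keep versions ordered by time

    def get(self, row, column, timestamp=None):
        """Latest value at or before `timestamp` (latest overall if None), else None."""
        versions = self._rows.get(row, {}).get(column, [])
        for ts, value in reversed(versions):
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def scan(self, start_row, end_row):
        """Sorted row keys in [start_row, end_row); a tablet holds one such row range."""
        return sorted(r for r in self._rows if start_row <= r < end_row)

table = TinyTable()
table.put("com.example/www", "contents:html", "<v1>", timestamp=1)
table.put("com.example/www", "contents:html", "<v2>", timestamp=2)
table.put("com.example/www", "anchor:home", "Example", timestamp=1)
table.put("org.example/blog", "contents:html", "<blog>", timestamp=1)
print(table.get("com.example/www", "contents:html"))
print(table.scan("com", "org"))
```

Rows do not need to share the same columns, which is where the semi-structured "twist" comes from.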
Document databases
๏Similar to Key-Value stores, but the DB knows what the Value is
๏ Inspired by Lotus Notes
๏Data model: Collections of Key-Value collections
๏Documents are often versioned
๏Examples:
•CouchDB
•MongoDB
•Redis
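What "the DB knows what the Value is" buys you can be sketched with a toy store. `TinyDocumentStore` is a hypothetical class written for this illustration, not the API of CouchDB or MongoDB; it keeps each document as a dict with a revision number and can search on a field inside the value.

```python
import copy

class TinyDocumentStore:
    """Toy document store: each key maps to a document (a dict) with a revision number."""

    def __init__(self):
        self._docs = {}  # key -> (revision, document)

    def put(self, key, document):
        """Store a new revision of the document and return its revision number."""
        revision = self._docs[key][0] + 1 if key in self._docs else 1
        self._docs[key] = (revision, copy.deepcopy(document))
        return revision

    def get(self, key):
        """Return (revision, document) for the key."""
        revision, document = self._docs[key]
        return revision, copy.deepcopy(document)

    def find(self, field, value):
        """Keys of documents whose `field` equals `value`. This works because the
        store can look inside the value, unlike an opaque key-value store."""
        return sorted(k for k, (_, doc) in self._docs.items() if doc.get(field) == value)

store = TinyDocumentStore()
store.put("post-1", {"type": "post", "title": "Hello"})
store.put("post-1", {"type": "post", "title": "Hello again"})
store.put("user-1", {"type": "user", "name": "thobe"})
print(store.get("post-1"))
print(store.find("type", "post"))
```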
Graph databases
๏Focus on modeling the structure of data - interconnectivity
๏Scales to the complexity of the data
๏Inspired by mathematical Graph Theory ( G=(V,E) )
๏Data model: “Property Graph”
‣Nodes
‣Relationships/Edges between Nodes (first class)
‣Key-Value pairs on both
‣Possibly Edge Labels and/or Node/Edge Types
๏Examples:
•Neo4j
•AllegroGraph
•Sones graphDB
Property Graph model
•Nodes
•Relationships between Nodes
•Relationships have Labels
•Relationships are directed, but traversed at equal speed in both directions
•The semantics of the direction is up to the application (LIVES WITH is symmetric, LOVES is not)
•Nodes have key-value properties
•Relationships have key-value properties
[Figure: example property graph built up step by step. Nodes: (name: “James”, age: 32, twitter: “@spam”), (name: “Mary”, age: 35), (brand: “Volvo”, model: “V70”). Relationship labels: LIVES WITH, LOVES (twice), OWNS, DRIVES. One relationship carries the property (property type: “car”).]
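The property graph model can be sketched with two small classes. This is an illustration, not the Neo4j API: `Node`, `Relationship`, `relate` and `neighbours` are hypothetical names, and the end points chosen for each relationship (who owns and who drives the car, and that the "car" property sits on OWNS) are assumptions, since the extracted slide text lists only the labels and properties.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # compare nodes by identity; the structure is cyclic
class Node:
    properties: dict = field(default_factory=dict)
    outgoing: list = field(default_factory=list)
    incoming: list = field(default_factory=list)

@dataclass(eq=False)
class Relationship:
    label: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)

def relate(start, label, end, properties=None):
    """Create a directed, labeled relationship and register it on both end nodes."""
    rel = Relationship(label, start, end, dict(properties or {}))
    start.outgoing.append(rel)
    end.incoming.append(rel)
    return rel

def neighbours(node, label, direction="both"):
    """Nodes connected over relationships with the given label.

    Both directions are a plain list lookup, so traversal costs the same either way.
    """
    found = []
    if direction in ("out", "both"):
        found += [rel.end for rel in node.outgoing if rel.label == label]
    if direction in ("in", "both"):
        found += [rel.start for rel in node.incoming if rel.label == label]
    return found

james = Node({"name": "James", "age": 32, "twitter": "@spam"})
mary = Node({"name": "Mary", "age": 35})
car = Node({"brand": "Volvo", "model": "V70"})

# Assumed end points; the slide text does not say which nodes each label connects.
relate(james, "LIVES WITH", mary)
relate(james, "LOVES", mary)
relate(mary, "LOVES", james)
owns = relate(mary, "OWNS", car, {"property type": "car"})
relate(james, "DRIVES", car)

# LIVES WITH is stored once but read in both directions by the application.
print([n.properties["name"] for n in neighbours(mary, "LIVES WITH")])
```

Storing LIVES WITH once and ignoring its direction when reading is the application-level choice the slide refers to; LOVES needs one relationship per direction because it is not symmetric.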
Graphs are whiteboard friendly
Image credits: Tobias Ivarsson
An application domain model outlined on a whiteboard or piece of paper would be translated to an ER-diagram, then normalized to fit a Relational Database. With a Graph Database the model from the whiteboard is implemented directly.
[Figure: the whiteboard model redrawn as an ER-diagram with 1 and * multiplicities]
[Figure: the same model drawn as a graph, with labels thobe, Joe project blog, Wardrobe Strength, Hello Joe, Neo4j performance analysis, Modularizing Jython]
Four emerging NOSQL categories
๏Key-Value stores
๏BigTable clones
๏Document databases
๏Graph databases
... and one that’s been around for a while
๏Object databases
•Neither gaining nor losing traction
•Not part of the NOSQL community
• Still a good solution to a lot of problems
• Focus on matching the object-oriented programming paradigm
‣Simplicity to integrate
‣Ease of use
Scaling to size vs. Scaling to complexity
[Figure: the four categories placed on axes of Size and Complexity: Key/Value stores, Bigtable clones, Document databases, Graph databases. Annotations: “> 90% of use cases” and “Billions of nodes and relationships”.]
Who is NOSQL?
A healthy mix of big players and independent vendors.
“Ok, it’s not a database. How do I query it?”
๏RESTful interfaces (HTTP as an access API)
๏Query languages other than SQL
•GQL - SQL-like QL for Google BigTable
• SPARQL - Query language for the Semantic Web
•Gremlin - the graph traversal language
• Sones Graph Query Language
๏Query APIs
•The Google BigTable DataStore API
•The Neo4j Traversal API
Why is the database RESTing?
[Figure: four hosts, http://one/, http://two/, http://three/ and http://four/. The resource http://one/fishie contains the text “My best friend is http://three/flounder!”]
Because hyperlinks make it possible to reference data on different hosts without hassle.
RESTful is really all about hypermedia!
How about Data Manipulation?
๏RESTful interfaces again (HTTP PUT, POST, DELETE)
๏Data Manipulation APIs
•Google BigTable DataStore API
•Neo4j GraphDatabase API
๏Serialization Formats
• JSON
•Thrift
• Protocol Buffers
•RDF
NOSQL in the Enterprise
๏Availability
๏Security
๏Correctness
๏Performance
This presentation does not cover Security. The interesting parts of Security are an application layer issue anyway.
Availability
๏Replication
•Write to many
• (Multi-)Master to Slave replication
๏Master re-election
๏Failover
• Either by another machine taking over
• or by the client knowing to attempt a replica
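The second failover style, where the client knows to attempt a replica, can be sketched as follows. The `Replica` class, the `Unavailable` exception and `read_with_failover` are hypothetical names for this illustration and do not correspond to any particular database driver.

```python
class Unavailable(Exception):
    """Raised by a replica that cannot serve the request."""

class Replica:
    """In-memory stand-in for one database server."""

    def __init__(self, name, data, up=True):
        self.name = name
        self.data = dict(data)
        self.up = up

    def read(self, key):
        if not self.up:
            raise Unavailable(self.name)
        return self.data[key]

def read_with_failover(replicas, key):
    """Try each replica in order; return (replica name, value) from the first that answers."""
    errors = []
    for replica in replicas:
        try:
            return replica.name, replica.read(key)
        except Unavailable as exc:
            errors.append(str(exc))
    raise Unavailable("all replicas failed: " + ", ".join(errors))

servers = [
    Replica("master", {"greeting": "hello"}, up=False),  # simulate a crashed master
    Replica("slave-1", {"greeting": "hello"}),
    Replica("slave-2", {"greeting": "hello"}),
]
print(read_with_failover(servers, "greeting"))  # ('slave-1', 'hello')
```

The first style on the slide, another machine taking over, moves the same decision to the server side so that clients keep talking to one address.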
Correctness
๏Brewer’s CAP theorem
•Most NOSQL databases sacrifice Consistency
‣Some use “read-correction”, treat read values as votes
๏Some NOSQL databases don’t have transactions
• Instead they have only atomic single operations
•This makes some operations impossible to implement
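"Read-correction" with read values treated as votes can be sketched like this. It is a simplified illustration under stated assumptions: replicas are plain dicts, a strict majority wins, and `read_with_correction` is a hypothetical function name. Real systems often compare version stamps or vector clocks rather than counting identical values.

```python
from collections import Counter

def read_with_correction(replicas, key):
    """Read `key` from every replica, take the majority value, and repair the rest.

    `replicas` is a list of plain dicts standing in for replica stores.
    Returns (value, number of replicas repaired).
    """
    votes = Counter(replica[key] for replica in replicas if key in replica)
    if not votes:
        raise KeyError(key)
    winner, count = votes.most_common(1)[0]
    if count <= len(replicas) // 2:
        raise ValueError("no majority for key %r" % (key,))
    repaired = 0
    for replica in replicas:
        if replica.get(key) != winner:
            replica[key] = winner  # write the winning value back to the stale replica
            repaired += 1
    return winner, repaired

stores = [{"colour": "blue"}, {"colour": "blue"}, {"colour": "red"}]
print(read_with_correction(stores, "colour"))  # ('blue', 1)
print(stores[2])                               # {'colour': 'blue'}
```

Each read still touches a single key, so this does not give multi-key transactions. An operation such as moving an amount between two keys remains out of reach with only atomic single operations.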
Performance
๏This is where all the focus seems to be
๏A surprising number sacrifice Durability for performance
•On-disk durability
•Multiple-replicas durability
๏All NOSQL databases outperform RDBMSes
• ... in their particular niche ...
One database to rule them all
Image credits: The Lord of the Rings, New Line Cinema
Up until recently there was only one Database, the RDBMS. The days of a single database that rules them all are over.
Use the best suited storage for each kind of data
The era of using RDBMSes for all problems is over. Instead we should use the database most suited for the problem at hand.
Polyglot persistence
... we could even use multiple databases in conjunction, and let each database handle the things it does best.
SQL && NOSQL
All databases are welcome! SQL and NOSQL - it is Not Only SQL!
Summary
๏Two steps forward ( but first one step back... )
๏The era of a single DBMS is over
๏Use the right tool for the right job
๏Polyglot persistence happens already, and will grow more common
๏Solves different scalability issues
• Scale to size - huge amounts of data, many many machines
• Scale to complexity - handle complicated schemas, avoid being bogged down by deep JOINs
๏Driven by big players and independent vendors - healthy community
Open source implementations to play with!
๏Neo4j - talk to me, or visit http://neo4j.org/
๏CouchDB - http://couchdb.apache.org/
๏Cassandra - http://cassandra.apache.org/
๏Hadoop + HBase (clones GFS + BigTable) - http://hadoop.apache.org/
๏MongoDB - http://www.mongodb.org/
๏Redis - http://code.google.com/p/redis/
๏Oracle Berkeley DB - http://www.oracle.com/database/berkeley-db/
๏FlockDB - http://github.com/twitter/flockdb
๏ ... and more ...
http://neotechnology.com