NoSQL Choices
What is MySQL?
What is NoSQL?
Types of NoSQL databases
Why NoSQL?
Differences between NoSQL and MySQL
Aggregated data vs tuples
ACID vs BASE transactions
• A – Atomicity
• C – Consistency
• I – Isolation
• D – Durability
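Atomicity, the "A" in ACID, means a transaction either applies completely or not at all. The sketch below shows this with Python's built-in sqlite3 module, assuming SQLite as a stand-in for MySQL so the example runs without a server. The accounts table and the transfer() helper are hypothetical: if the second update breaks a constraint, the first update is rolled back too.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; table and names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint fired on the debit, so the credit to dst was undone as well.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A successful transfer changes both balances; a transfer that would overdraw the source leaves both balances exactly as they were.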
Schema vs Schema-less
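To make the schema vs schema-less contrast concrete, here is a minimal sketch assuming SQLite (via Python's sqlite3) as a stand-in for a relational database; MySQL would likewise reject an insert into an undeclared column. The "users" table, the insert_with_unknown_column() helper and the sample documents are hypothetical.

```python
import sqlite3

# Schema: the table declares its columns up front (hypothetical "users" table).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
db.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")


def insert_with_unknown_column(db):
    """Try to store a field the schema doesn't declare; the database rejects it."""
    try:
        db.execute("INSERT INTO users (id, name, twitter) VALUES (2, 'Bob', '@bob')")
        return True
    except sqlite3.OperationalError:
        return False  # SQLite reports: table users has no column named twitter


# Schema-less: each document carries whatever keys it needs.
documents = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Bob", "twitter": "@bob"},         # extra field
    {"id": 3, "name": "Cy", "tags": ["admin", "beta"]},  # nested list
]
```

In the relational case the schema must change (for example with ALTER TABLE) before the new field can be stored; in the document case each record simply carries it.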
The 5 main data stores
• Relational databases
• Key-value stores
• Document databases
• Graph stores
• Column stores
Relational Databases (AKA RDBMS)
Why is it good?
• Super flexible
• Proven to work, dominant in the market for decades
• Robust, stable
• Very consistent
• Supports ACID transactions, making it the industry standard
Why is it bad?
• Strongly typed columns
• Inefficient with high volumes of data
• Not designed for clusters
• Only efficient with structured data
• Scales vertically: processing bigger data means buying a bigger computer
MySQL
NoSQL Databases
Key-value stores
Why is it good?
• Very fast data storage and retrieval
• Good for storing user session data, for example:
– User profiles on forums
– Shopping carts on websites
Why is it bad?
• Can’t query on the contents of a value
• You need to know the key to retrieve the data
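Both the strength and the weakness above come from the same design: lookups go through the key. The toy store below is a hypothetical sketch (not Redis's or any other product's API) built on a Python dict: get() is a direct hash lookup, while finding entries by something inside their values has no index to use and must scan every entry.

```python
from typing import Any, Callable, Optional


class KeyValueStore:
    """Toy in-memory key-value store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value  # average O(1): hash lookup on the key

    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)  # fast, but only because we know the key

    def find_by_value(self, predicate: Callable[[Any], bool]) -> list[str]:
        # The store can't look inside values, so answering
        # "which carts contain X?" means scanning every entry.
        return [k for k, v in self._data.items() if predicate(v)]


store = KeyValueStore()
store.put("cart:user42", {"items": ["book", "lamp"]})
store.put("cart:user7", {"items": ["lamp"]})
```

Here store.get("cart:user42") returns the cart directly, while store.find_by_value(lambda v: "lamp" in v["items"]) has to visit both carts to answer.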
Examples of key-value stores
• CouchDB
• Aerospike
• HyperDex
• Flare
• Dynamo
• Redis
Most popular key-value store: Redis
• Able to handle 114,293.71 write requests per second
• Able to handle 81,234.77 read requests per second
• Source: https://redis-docs.readthedocs.org/en/latest/Benchmarks.html
Companies that use Redis
• Twitter
• GitHub
• Pinterest
• Snapchat
• Flickr
• Hulu
• Vine
• Imgur
• Craigslist
Document Databases
Why is it good?
• Very easy to write
• Objects map directly to JSON documents, and JSON documents map easily back to objects
• Easy to store data: a document contains whatever keys and values you want
• No schema
• Documents are independent units, so they are easy to distribute
• No need for the data to be related at all
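The object-to-JSON round trip described above can be sketched with Python's standard json and dataclasses modules. The Product class and its fields are hypothetical; a real document store such as MongoDB persists its own binary format (BSON) and would normally be accessed through a driver, which this sketch leaves out.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Product:
    # Hypothetical catalog entry; field names are illustrative only.
    sku: str
    title: str
    price: float
    tags: list[str] = field(default_factory=list)


def to_document(product: Product) -> str:
    """Object -> JSON document, roughly what a document store would persist."""
    return json.dumps(asdict(product))


def from_document(doc: str) -> Product:
    """JSON document -> object."""
    return Product(**json.loads(doc))


lamp = Product(sku="L-100", title="Desk lamp", price=24.5, tags=["home", "lighting"])
doc = to_document(lamp)
```

The nested tags list travels inside the document itself, with no separate table or join needed to store it.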
Why is it good? (cont)
• Very, very programmer friendly
• Good for:
– Event logging
– Content management systems
– E-commerce applications
– Real-time analytics
Why is it bad?
• Tends to struggle when the database grows very large
• Not good at handling highly interrelated data
• Not designed to handle cross-document operations
• Can’t slice data
Examples of document stores
• MongoDB
• Lotus Notes
• Apache CouchDB
Most popular document store: MongoDB
Companies that use MongoDB
• Expedia
• The Weather Channel
• Forbes
• Otto
Graph Stores
“If you can whiteboard it, you can graph it”
Why is it good?
• Well suited for analyzing interconnections
• Very good for data that involves complex relationships
• Well suited to mining social media data, an area of high interest
• Used for creating “recommended products” on sales websites
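A "recommended products" feature boils down to a short traversal: from a customer to the products they bought, to other customers who bought those products, to what else those customers bought. The sketch below models the graph as plain Python dicts and sets rather than a graph database like Neo4j; the purchase data and the recommend() function are hypothetical.

```python
from collections import Counter

# Hypothetical purchase graph: customer -> products bought (the edges).
purchases = {
    "ann": {"tent", "stove", "lantern"},
    "ben": {"tent", "stove"},
    "cara": {"tent", "sleeping bag"},
    "dev": {"kayak"},
}


def recommend(customer: str, graph: dict[str, set[str]], limit: int = 3) -> list[str]:
    """'Customers who bought what you bought also bought...' via a 2-hop traversal."""
    mine = graph[customer]
    scores = Counter()
    for other, items in graph.items():
        if other == customer or not (items & mine):
            continue  # only follow customers connected through a shared product
        for item in items - mine:
            scores[item] += 1  # one vote per connected customer
    # Sort by score, then by name, so ties come out in a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:limit]]
```

For "cara", both "ann" and "ben" share the tent, so "stove" gets two votes and "lantern" one. A graph database stores these edges natively, so the same hop-by-hop query stays cheap as the graph grows, which is the strength the slide describes.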
Why is it bad?
• Not good at updating all entities, or a large subset of them, at once
• Changing a property on all nodes is not straightforward
• Some graph databases may not be able to handle large amounts of data
Most popular graph database: Neo4j
Companies that use Neo4j
• eBay
• TomTom
• HP
• Walmart
• eHarmony
Column Stores
Row vs Column store
Why is it good?
• Designed for gigantic amounts of data
• Faster than a row store for column-oriented queries, because it doesn’t waste time reading columns it doesn’t need
– Example: with 10,000 rows, if you are looking for a value in a single column, there is no need to read every full row
• Good for blogs and forums
• Event logging
• Good when you want to count and categorize certain values
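The row vs column distinction is about physical layout. The sketch below builds the same hypothetical 10,000-row event log both ways and counts events by type. In Python memory the speed difference is negligible; the point is which data each approach walks over. On disk, a column store would read only the "type" column, while a row store would read every full record.

```python
# Hypothetical event log with 10,000 rows, matching the size in the example above.
N = 10_000

# Row layout: each record's fields are stored together.
rows = [
    {"id": i, "user": f"u{i % 100}", "type": "click" if i % 3 else "view", "ms": i % 50}
    for i in range(N)
]

# Column layout: each field is stored as its own contiguous array.
columns = {name: [r[name] for r in rows] for name in ("id", "user", "type", "ms")}


def count_by_type_row_store(rows):
    counts = {}
    for r in rows:  # visits whole records even though only "type" is needed
        counts[r["type"]] = counts.get(r["type"], 0) + 1
    return counts


def count_by_type_column_store(columns):
    counts = {}
    for t in columns["type"]:  # visits only the one column the query needs
        counts[t] = counts.get(t, 0) + 1
    return counts
```

Both functions return the same counts. Only the amount of data touched differs, and that difference is what grows with wide tables and huge data sets.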
Why is it bad?
• Not good for systems that require ACID transactions for reads and writes
• If the data set is small, it is better to use a relational database
– If you need to look at whole rows, or at many columns at once, a relational database is much better
Most popular Column-family store: Cassandra
Companies that use Cassandra
• Walmart
• VMware
• Unity
• Ubisoft
• Sony
• Reddit
• PayPal
• Netflix
• NASA
• Instagram
• IBM
• FedEx
• eBay
• Call of Duty
Scaling in Cassandra
• Horizontal scaling
• A matter of adding more nodes
• More nodes = the cluster supports more writes and reads
• Nodes can be added while the cluster keeps running
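Adding nodes is cheap because Cassandra places data on a token ring: each key hashes to a token, and each node owns a range of the ring. The sketch below is a simplified consistent-hashing ring, assuming one token per node, no replication and MD5 hashing. Real Cassandra uses virtual nodes, replicas and, by default, the Murmur3 partitioner. It shows that a new node takes over only part of one existing range, so most keys stay where they are.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a ring position (MD5 here, purely for illustration)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class Ring:
    """Minimal consistent-hashing ring: a key belongs to the first node clockwise from its token."""

    def __init__(self, nodes):
        self._tokens = []  # sorted node tokens
        self._owner = {}   # token -> node name
        for n in nodes:
            self.add_node(n)

    def add_node(self, node: str) -> None:
        t = token(node)
        bisect.insort(self._tokens, t)
        self._owner[t] = node

    def node_for(self, key: str) -> str:
        i = bisect.bisect_right(self._tokens, token(key)) % len(self._tokens)
        return self._owner[self._tokens[i]]


keys = [f"user:{i}" for i in range(1000)]
ring = Ring(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")  # scale out: only keys in node-d's new range change owner
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in moved now lives on node-d, and no key moved between the original three nodes. That is why capacity can grow node by node without reshuffling the whole data set.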
Benchmark reports
Throughput
• The higher, the better
• Reflects the raw power of the database engine
Latency guidelines
• Excellent: < 1 ms
• Very good: < 5 ms
• Good: 5 – 10 ms
• Poor: 10 – 20 ms
• Bad: 20 – 100 ms
• Really bad: 100 – 500 ms
• OMG!: > 500 ms
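The two metrics above can be expressed as small helpers: throughput is operations completed per second, and latency is rated against the guideline bands. The function names are hypothetical. The slide's ranges share their endpoints, so treating each band as "below its upper bound" (10 ms counts as Poor, not Good) is an assumption of this sketch.

```python
# Upper bounds in milliseconds, taken from the latency guidelines above.
LATENCY_BANDS = [
    (1, "Excellent"),
    (5, "Very good"),
    (10, "Good"),
    (20, "Poor"),
    (100, "Bad"),
    (500, "Really bad"),
]


def rate_latency(ms: float) -> str:
    """Return the first band whose upper bound the latency is below."""
    for upper, label in LATENCY_BANDS:
        if ms < upper:
            return label
    return "OMG!"


def throughput(operations: int, seconds: float) -> float:
    """Operations per second: the higher, the better."""
    return operations / seconds
```

For example, a run that completes 9,000,000 operations in 1,800 seconds has a throughput of 5,000 operations per second. A 7 ms average latency rates as Good.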
The University of Toronto test (2012)
• Cassandra 1.0.0-rc2
• Redis 2.4.2
• HBase 0.90.4
• Voldemort 0.90.1
• MySQL 5.5.17
The tests
• Workload R (95% reads)
• Workload RW (50% writes, 50% reads)
• Workload W (99% writes)
Conclusion
• Cassandra – highest scalability, but suffered in latency
• Redis – highest initial throughput in read-intensive workloads; very low latency
Conclusion (cont.)
• MySQL – almost the same as Cassandra, but with better latency
• HBase – lowest throughput; highest read latency; lower write latency
EndPoint: Benchmarking Top NoSQL Databases
• Published: April 13, 2015
• Updated: May 27, 2015
• Cassandra (2.1.0)
• Couchbase (3.0.1)
• MongoDB (3.0)
• HBase (0.98.6-1, on Hadoop 2.6.0)
What was updated?
• Cassandra’s and HBase’s performance improved substantially in the updated results
Workload selection
• Workloads were selected to resemble today’s applications
• Database nodes: 30.5 GB RAM, 4 CPU cores, and a single 800 GB local SSD volume
• All tests ran with no data loss
• Data volumes exceeded the RAM capacity of each node
Workloads
• Read-mostly: 95% read, 5% update
• Read/write: 50% read, 50% update
• Read-modify-write: 50% read, 50% read-modify-write
• Insert-mostly: 90% insert, 10% read
• 9 million operations per workload
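Benchmark workloads like these are defined by an operation mix. The sketch below shows how a load generator might draw a stream of operation types matching each mix. The WORKLOADS table encodes the four mixes listed above. The generate_ops() function and the seeded random draw are hypothetical choices made for illustration, not the benchmark's actual tooling.

```python
import random
from collections import Counter

# Operation mixes from the workload list above (fractions sum to 1).
WORKLOADS = {
    "read-mostly": {"read": 0.95, "update": 0.05},
    "read/write": {"read": 0.50, "update": 0.50},
    "read-modify-write": {"read": 0.50, "read-modify-write": 0.50},
    "insert-mostly": {"insert": 0.90, "read": 0.10},
}


def generate_ops(workload: str, n: int, seed: int = 0) -> list[str]:
    """Draw n operation types at random according to the workload's mix."""
    mix = WORKLOADS[workload]
    rng = random.Random(seed)  # seeded so repeated runs produce the same stream
    return rng.choices(list(mix), weights=list(mix.values()), k=n)


ops = generate_ops("read-mostly", 100_000)
counts = Counter(ops)
```

With enough operations, the observed proportions settle close to the target mix (about 95% reads here). The real benchmark ran 9 million operations per workload.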
Problems
• Couchbase
• HBase
• MongoDB
Conclusion
• Cassandra outperformed the others by a wide margin in both latency and throughput
• HBase or Couchbase came second
• MongoDB came last in most test cases
Altoros: The NoSQL Technical Comparison Report
• Published September 2014
• Fairly unbiased
• Couchbase 2.5.1
• MongoDB 2.6.1
• Cassandra 2.0.8
Workload B
• 50% read operations
• 40% update operations
• 5% insert operations
• 5% delete operations
• 50 million 1 KB records
Workload B
• 3 million 10 KB records
Workload C
• 90% read operations
• 8% update operations
• 1% insert operations
• 1% delete operations
• 3 million 10 KB records (results with 50 million records were similar to those for Workload B)
Scalability
Conclusions
• Cassandra again shows amazing scalability
• Cassandra is weaker at reads in terms of latency
• MongoDB has the worst latency results in almost all categories
Overall conclusion
• No single NoSQL data model beats all the others
• How about combining them?
• POLYGLOT PERSISTENCE
Example: Shopping Site
E-Commerce platform
• Cart – Key/value
• Orders – RDBMS
• Catalog & Reviews – Document
• Suggestions – Graph
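Polyglot persistence means each part of the application uses the store that suits it. The sketch below reads the shopping-site diagram as cart → key/value, orders → RDBMS, catalog & reviews → document, and suggestions → graph. Each "store" is a plain Python container standing in for a separate database, and the ShoppingSite class and its methods are hypothetical.

```python
from dataclasses import dataclass, field


# Hypothetical sketch: each attribute stands in for a separate database.
@dataclass
class ShoppingSite:
    carts: dict = field(default_factory=dict)        # key/value: session id -> list of skus
    orders: list = field(default_factory=list)       # RDBMS: (order_id, session, sku) rows
    catalog: dict = field(default_factory=dict)      # document: sku -> JSON-like document
    also_bought: dict = field(default_factory=dict)  # graph: sku -> set of co-purchased skus

    def add_product(self, sku: str, doc: dict) -> None:
        self.catalog[sku] = doc  # free-form document; reviews can nest inside it

    def add_to_cart(self, session: str, sku: str) -> None:
        self.carts.setdefault(session, []).append(sku)

    def checkout(self, session: str) -> int:
        items = self.carts.pop(session, [])  # the cart is short-lived session data
        order_id = max((row[0] for row in self.orders), default=0) + 1
        for sku in items:
            self.orders.append((order_id, session, sku))  # one row per line item
        for a in items:  # co-purchases become graph edges for suggestions
            for b in items:
                if a != b:
                    self.also_bought.setdefault(a, set()).add(b)
        return order_id

    def suggestions(self, sku: str) -> list:
        return sorted(self.also_bought.get(sku, set()))


site = ShoppingSite()
site.add_product("tent", {"title": "2-person tent", "reviews": [{"stars": 5, "text": "Dry all night"}]})
site.add_to_cart("s1", "tent")
site.add_to_cart("s1", "stove")
first_order = site.checkout("s1")
```

Checkout moves data from the fast, disposable cart store into durable order rows and updates the suggestion graph. In a real deployment, each of these writes would go to a different database.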